You can run a transcription model and a language model (the AI you talk to) locally; however, you will need a beefy GPU, especially if you want to run the larger models for better results.
OpenAI’s Whisper is open source and does transcription, and you can run inference on language models like LLaMA (and its variants) or GPT4All locally. To store information long term (“AI memory”), you could use an open-source vector database, but I don’t have experience with this.
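To give a rough idea of what the "AI memory" part does, here is a minimal sketch in plain Python. It is not a real vector database: `embed` is a toy bag-of-words stand-in for a proper embedding model, and `ToyMemory` is a hypothetical name made up for this example. A real setup would swap both for an actual embedding model and an actual vector store, but the idea (store text as vectors, fetch the closest ones to a query) should be the same.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyMemory:
    """Hypothetical in-memory store: keeps (text, vector) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        # Rank every stored item by similarity to the query, best first.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = ToyMemory()
memory.add("My dog is named Biscuit")
memory.add("I prefer tea over coffee")
memory.add("The meeting is on Friday at noon")

print(memory.search("what is my dog called", k=1))
```

The retrieved text would then be pasted into the language model's prompt so it can "remember" it. Word counts only match on shared words, so this toy version misses paraphrases that a real embedding model would catch.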
This is a camera shake effect that has been added before the speed ramping. Most video editors can do this effect: