Can you manage your house with a local, no-cloud voice assistant? Mostly, yes.

catculation@lemmy.zip to Technology@lemmy.world – 496 points –
arstechnica.com

The problem I have always had with voice control is that it just doesn't really seem to fit into my home automation. I don't want to give Home Assistant a verbal command to turn on the lights. I want it to detect that I've entered the room and set the lights to the appropriate scene automatically; I haven't touched a light switch in weeks. For selecting an album or movie to play, it's easier to use a menu on a screen than to try to explain it verbally.

Don't get me wrong. I'm hugely in favor of anything that runs locally instead of using the "cloud." I think that the majority of people running a home automation server want to tinker with it and streamline it to do things on its own. I want it to "read my mind." The people who just want a basic solution probably aren't going to set up HA.

Maybe I'm missing a use case for voice control?

Funny...I'm the exact opposite. I don't want it to detect that I’ve entered the room and set the lights to the appropriate scene automatically. Unless it can detect when I don't want to go into a dark room and be blinded by lights I didn't want on, I want to control when it turns on. Unless it can determine that I'm only home from work for a few minutes to go to the bathroom, I don't want it to adjust the heat settings. In other words, until it can actually read my mind, I want to be able to control it and tell it what I want when I actually want it.

I'm looking into an HA setup specifically to get away from Alexa and host everything locally. I may only want simple controls, but I want to truly control everything myself.

I loved being able to control the dimmer level or color of the lights using voice controls.

I set up a few IFTTT recipes to create lighting and music scenes for things like reading, conversation, movie watching, date night, party time, and a few others and triggered them with a voice command.

It was always a hit with whoever I brought over, but mostly it just did 4 or 5 things with one voice command.

Same here.

  • I have no idea how to reliably sense who or how many people are in a room, going by questions here. The presence sensors I’ve tried so far are really inadequate
  • even if I knew who or how many are in the room, I have no idea if there is any logic to correctly decide whether I want the light on and how much
  • voice control of lights is more useful to me, although Alexa is slow and I haven’t yet tried other approaches
  • scheduled lighting has been surprisingly useful. That reminds me, I need to schedule dining room at 20% at 6am m,w,f
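A schedule like the dining room one in the last bullet comes down to a weekday-and-time check. Here is a minimal Python sketch of that check, assuming a hypothetical `ScheduledScene` record; none of these names come from Home Assistant or Alexa.

```python
from dataclasses import dataclass
from datetime import datetime, time

# datetime.weekday() counts Monday as 0, so Mon/Wed/Fri are 0, 2, 4.
MON, WED, FRI = 0, 2, 4


@dataclass(frozen=True)
class ScheduledScene:
    """Hypothetical record: which light, how bright, and when."""
    light: str
    brightness_pct: int
    at: time
    weekdays: frozenset

    def is_due(self, now: datetime) -> bool:
        """True when `now` is on a scheduled weekday at the scheduled minute."""
        return (
            now.weekday() in self.weekdays
            and (now.hour, now.minute) == (self.at.hour, self.at.minute)
        )


# Dining room at 20% at 6am on Monday, Wednesday and Friday.
dining_room = ScheduledScene(
    light="dining_room",
    brightness_pct=20,
    at=time(6, 0),
    weekdays=frozenset({MON, WED, FRI}),
)
```

A scheduler loop would call `dining_room.is_due(datetime.now())` once a minute and set the brightness when it returns `True`.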

My main use cases are Timers in Kitchen, Finding my wife's phone, and turning off music.

My main use case for voice is for things that I haven't been able to (reasonably) automate. For a couple of examples:

  1. Saying "turn on the TV" as I'm grabbing lunch and walking across the room
  2. "Turn on/off the stars" for bedroom mood lighting
  3. "Timer for x" is honestly probably the most used thing.

It's all fairly trivial stuff to do manually but I think that's probably true for the vast majority of home automation.

Try "start Netflix on the TV". Should work for most services. (At least it does with Chromecast and Home Assistant).

Even ignoring privacy arguments, I think that voice control is a great use case for running services locally - lower latency due to not having to upload your sample, and the option of having it learn your accent is very attractive.

That said, voice control is irritatingly error-prone and seems to be slower than just reaching for the remote control. I agree that automatic stuff would be best, but some stuff you can't have rules for.

Something that would be interesting is a more eye- and gesture-based system: I'm thinking something like you look at the camera and slice across your throat for stop or squeeze fingers together to reduce volume. This is definitely one to run locally, for privacy and performance reasons.

Assistive technology has been focused on this for a while.

My brother had severe cerebral palsy and for years (80s-90s) communicated via analog technology, a literal alpha/iconography communication board, which he could tap on with a head wand. By 2000 he had a digital voice, but still had to use a wand.

Stephen Hawking demonstrated eye sensing technology almost as soon as it was invented and that’s been over a decade ago.

In most cases, there is a definite aspect of “bespokeness” to implementing assistive consumer communication technology, but the barriers implementing the same for an able audience would appear much lower.

But where do you put the camera? If you're sitting in front of the TV, then near the TV makes sense. What if you're sitting facing a different direction with a book though? What if your hands are full?

A camera based system would be much more limited, and probably wouldn't work in the dark.

You're assuming that we can't have both. Why not have it as a complementary input?

I think looking at a device and talking is better than saying hey $brandname before everything, but having both would be better still.

My #1 use case is setting timers. My hands are messy in the kitchen, need to set 35 different timers to get the kids outta the house in the morning.

I love voice control specifically for telling the house to warm up before I get out of bed. I don't even have to grab my phone. I also use it almost daily to have music start playing from Spotify.

I thought of doing that but now it is essentially my alarm clock. I'm a bit annoyed at maintaining separate alarms but:

  • heat comes up 15 minutes before
  • watch vibrates to wake me
  • speaker plays music from Spotify for half an hour
  • if I suddenly hear silence when the music stops it’s time to panic because I’m late
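The separate alarms in that list could in principle be derived from a single wake time. A rough Python sketch, assuming the music starts at the wake time (the list does not say when it starts) and using made-up action labels:

```python
from datetime import datetime, timedelta


def wake_routine(wake_at: datetime) -> list:
    """Return (time, action) pairs for the routine, sorted by time."""
    steps = [
        # Heat comes up 15 minutes before the alarm.
        (wake_at - timedelta(minutes=15), "heat: raise setpoint"),
        (wake_at, "watch: vibrate"),
        # Assumption: music starts together with the watch alarm.
        (wake_at, "speaker: start music"),
        # Music plays for half an hour, then silence means "you're late".
        (wake_at + timedelta(minutes=30), "speaker: stop music"),
    ]
    return sorted(steps, key=lambda step: step[0])


for when, action in wake_routine(datetime(2024, 1, 1, 6, 30)):
    print(when.strftime("%H:%M"), action)
```

Changing the one `wake_at` value then moves every step together instead of editing each alarm by hand.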

How does it know what scene you want? If you walk in to a room and want to watch TV, you might want the lights to be dimmer than if you're going to read a book, for example.

For selecting an album or movie to play, it's easier to use a menu on a screen than to try to explain it verbally.

How? I can put on my best Captain Picard voice and say 'Computer, play the album Insomniac by Green Day' much faster and easier than I could pick up the remote, turn on the media player, scroll to music, scroll to G, find Green Day, scroll to Insomniac, and press play.

I've got Amazon devices (bought before I knew how bad both they and Amazon are), and they're not great. Even with them, I can walk in to my living room in the night with my hands full and tell them to turn my chosen lights on, set the brightness and colour, start playing my chosen music, or turn the TV on and start playing certain media, all while I'm walking to my seat.

The only media that I can't play is what I haven't set up to use with Alexa yet, but that would be the same for any automation.

When I get around to it, I'm going to add either Plex or Jellyfin to my voice control setup, and hopefully be able to play anything from my library in the same way :)

My kids like it sometimes for asking it to tell knock-knock jokes, that's about it though.

The majority of people do not have perfect whole house motion sensors setup to turn on and off lights. Congrats, you're the 0.0001%.

Not everyone even wants lights to turn on just because the room is occupied.

Maybe when the sensor can locally detect that I'm eating in front of the TV, so turn on the main lights, and that when I put the plate down it's time to turn those off and use the dimmer ambient lights.

Well, I bought a few Google speakers back in the day for easy voice commands, mostly for lights and weather.

I unplugged them for privacy. Still use one in the garage for music, but it might as well be a Bluetooth speaker at this point.

An easy-to-hook-up local home assistant would be ideal for me. I don't want to spend weeks of my time fussing with it. It's not a hobby to me, just a convenience.

want it to detect that I've entered the room

This is a thing. If you don't want motion sensors and you want it to know your cousin from your cat, Cisco and others can pinpoint the location of people on a real time map (like that map in Harry Potter) by your cell or Bluetooth.

Local cops use it to log where people were during concerts and events and see who was nearby when a bad thing happened.

When you don’t want lights on all the time, it’s a good option. You can’t program automations to do things you can’t detect automatically, such as eating vs snacking, or watching a scary movie with the spouse vs watching a “scary” movie with the kids, vs watching a kids movie.

You can automate a lot, but when you have overlapping routines that don’t work together, and don’t have a way to be detected, you’re limited in your options. Voice is one solution.

automations

Heh. That's like "traffics".

In this case, I was referring to Home Assistant automations, where an automation is a cause and effect configured by the end user. So “automations” is a plural referring to a group of singulars.

Edit: A word

Friends have voice stuff, it's pretty annoying

I don't use voice control since I'm against all cloud based services but am definitely interested and waiting for a good local option, paired with some decent devices that I can disseminate around the house.

I have been using ha for a very long time but am the opposite, haven't gone down the rabbit hole and have only half a dozen automations. With the exception of living room lights that come on 39 minutes before sunset, I don't control lights. I have absolutely no problem using a switch as I enter a room, and see no point overcomplicating things in trying to guess what should happen when someone enters a room.

Being able to look for a recipe, set a timer, choose a YouTube video, a song or a playlist while I am kneading dough and my hands are caked? I'd love voice control while I'm in the kitchen and can't touch a phone without having to clean, wash and dry my hands.


I've been doing home automation for a while now. A voice assistant is never anything I would consider. What problem does it solve that a button doesn't do with less hassle?

Also, note automation. The whole point is for the house to do its thing with minimal interaction based on triggers and states. Everyone leaves? Turn off the lights, lock the doors, turn down the heat. TV comes on after dusk? Dim the living room lights if they are on. Going down the basement stairs? Turn on the lights. Cat just used the litter box? Turn on the hepa filter for a bit.
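The trigger-and-state style described here can be pictured as a list of condition/action rules evaluated against the current state of the house. A minimal Python sketch covering two of the examples; the state fields, rule names, and action strings are all hypothetical and not taken from any particular automation platform:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HouseState:
    """Hypothetical snapshot of the sensors the rules read."""
    anyone_home: bool
    tv_on: bool
    after_dusk: bool
    living_room_lights_on: bool


@dataclass
class Rule:
    name: str
    condition: Callable[[HouseState], bool]
    actions: list


RULES = [
    # Everyone leaves? Turn off the lights, lock the doors, turn down the heat.
    Rule("everyone left", lambda s: not s.anyone_home,
         ["lights.off", "doors.lock", "heat.lower"]),
    # TV comes on after dusk? Dim the living room lights if they are on.
    Rule("tv after dusk",
         lambda s: s.tv_on and s.after_dusk and s.living_room_lights_on,
         ["living_room_lights.dim"]),
]


def evaluate(state: HouseState) -> list:
    """Collect the actions of every rule whose condition holds."""
    actions = []
    for rule in RULES:
        if rule.condition(state):
            actions.extend(rule.actions)
    return actions
```

Anything the state snapshot cannot represent never triggers a rule, which is the limit of this approach.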

What problem does it solve that a button doesn’t do with less hassle?

I'd sure like to know how you got your automation to function 100% perfectly based on simple triggers and states.

TV comes on after dusk?

What if the TV comes on in the late afternoon and normally it's bright enough that it needs to close the drapes, but today it's cloudy so you'd like them to be open instead so the room isn't pitch black? Or maybe you'd like to watch the backyard for some reason?

"House...open the living room drapes."

There are ALWAYS edge cases where Voice Control is useful to backstop "dumb" automations built on triggers and states.

Sometimes:

  1. I don't have an automation ready to go to do exactly the sequence of things I want to currently do.

  2. I'm warm in bed and I'm lazy and I don't have a phone or computer handy.

This is exactly me. I use voice command for the laziness feature. I don't often have my phone right at hand. And I just need to turn the lights up to 100%.

It's pretty handy for things like being able to just say "hey Google, unlock the door" when I'm carrying a dozen bags of groceries.

I use automations as well, but sometimes I need something done outside of my otherwise-considered parameters. And it's easier to just yell your wish into being than to take out your phone, open an app, select the device, then pick your command.

Telling your house to go to sleep when you’re ready is much easier than stopping the cuddle with your wife or animals to press a button.

Heat pumps are more efficient maintaining the same temp over the course of the day rather than heating in bursts.

Using Voice Assist pipeline via the HASS cloud subscription works a heck of a lot better than locally. Locally it takes about 15 seconds to respond, via the Nabu Casa server it's about 1 second. I've considered dedicating a box to the containers it's instantiating to do this to get faster response.

What hardware is it running on that takes 15 seconds? I've not actually tried it myself as I've got a poor little RPi 3, and I don't want to scare it.

The M5Stack Atom Echo. The hardware is the same, but if you change the pipeline in the back end between the two, that's where the delay happens. You can run the Whisper stack locally or on another box locally, but I think you'd want a good GPU on it to offload the NL processing to. Which is probably what happens when you're using the Nabu Casa pipeline.

Do you think throwing a coral TPU on there would help?

I saw it helps a ton with Frigate facial recognition.
I was planning to do that on my Yellow once I can get the display thing that's pictured in the article.

Idk if any LLMs are set up to operate on anything except GPUs, it's an interesting question.

So what is Home Assistant using for this?

If I were to build it myself I'd probably overcomplicate it by using multiple LLM agents on a local server. Probably use Whisper to do the speech to text and then Mistral fine-tuned on the Rosetta code dataset to send the API calls to HA. However, that wouldn't keep it from always listening to me and trying to interpret what I say into a command for HA. Is that just a prompting issue for Whisper or would I need another agent to turn on Whisper?

I could maybe get this to run without specialized hardware like a GPU, but it would be better to have something for the LLMs to be a bit more responsive.

There is no LLM, it's just used to recognize simple commands such as "turn on kitchen light". What the "conversation agent" can do is very limited, though you can extend it to recognize custom commands. It's not comparable to Google Assistant/Siri, let alone ChatGPT.
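For a sense of what template-based command matching of this kind looks like, here is a minimal Python sketch. It is not Home Assistant's actual implementation; the patterns, intent names, and `parse_command` are made up for illustration.

```python
import re
from typing import Optional

# Hypothetical sentence templates; the named group becomes the target name.
PATTERNS = [
    ("turn_on", re.compile(r"^turn on (?:the )?(?P<name>.+)$")),
    ("turn_off", re.compile(r"^turn off (?:the )?(?P<name>.+)$")),
]


def parse_command(text: str) -> Optional[dict]:
    """Return {'intent', 'name'} for a recognized sentence, else None."""
    # Lowercase and collapse whitespace so the templates stay simple.
    normalized = " ".join(text.lower().split())
    for intent, pattern in PATTERNS:
        match = pattern.match(normalized)
        if match:
            return {"intent": intent, "name": match.group("name")}
    return None


print(parse_command("Turn on kitchen light"))
print(parse_command("What's the weather like?"))
```

In this sketch, adding a custom command means appending another template to `PATTERNS`. Anything that matches no template returns `None`, which is why such a matcher feels limited next to a general assistant.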

I believe there is a ChatGPT integration in the works (optional, of course)

If it runs locally, that'll be awesome. I just hope it never decides to turn the heat up to 90F.

Ideally IMO you'd want a system with safeties in place. Like acceptable temperature ranges or durations for the oven to be on to avoid situations where the software misinterprets a command in a dangerous way.

Something like this:

User: Set temperature to 19 degrees. (Yeah it's on the cold side even for Celsius, but not a crazy amount as room temperature is around 22 degrees)

Assistant: Setting temperature to 90 degrees. (Deadly in Celsius... Water boils at around 100 degrees, depending on pressure)

Assistant: 90 degrees is outside of the safe range defined by your configuration. Intrusion suspected. Deploying sentry guns.
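Joking aside, the range check itself is simple. A minimal Python sketch, assuming a hypothetical `SafeRange` configuration in degrees Celsius; the names and limits are illustrative and not from any real thermostat API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeRange:
    """Hypothetical configured limits, in degrees Celsius."""
    low: float
    high: float


class UnsafeSetpoint(ValueError):
    """Raised when a requested setpoint falls outside the safe range."""


def checked_setpoint(requested: float, limits: SafeRange) -> float:
    """Return the setpoint unchanged if it is safe, otherwise refuse it."""
    if not limits.low <= requested <= limits.high:
        raise UnsafeSetpoint(
            f"{requested} is outside the safe range {limits.low}-{limits.high}"
        )
    return requested


limits = SafeRange(low=15, high=26)
print(checked_setpoint(19, limits))   # accepted
try:
    checked_setpoint(90, limits)      # the misheard command
except UnsafeSetpoint as err:
    print("refused:", err)
```

Putting the check in the one function every caller goes through means it applies no matter where the request came from, whether voice, UI, or script.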

Good question - I have an allowed range configured on my thermostat but I don’t know if it applies to API calls or is just for the UI

There's plenty of local LLM options these days. It's entirely feasible to run it in house.

And if someone can do it... I would suspect that there'll be a HACS module up about 2 weeks ago...

Ok, hmm I wonder how much work it would be to implement it using open source models. I think the hardest part would be translating the voice instructions to an API call that HA can use correctly.
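The last step of that translation, the API call itself, is the easy part. A minimal Python sketch that builds (without sending) a service call for Home Assistant's REST API, which takes a POST to `/api/services/<domain>/<service>` with a bearer token. The base URL, token, and entity ID below are placeholders, and `build_service_call` is a hypothetical helper:

```python
import json
import urllib.request


def build_service_call(base_url: str, token: str, domain: str,
                       service: str, entity_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/services/<domain>/<service>."""
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/services/{domain}/{service}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; substitute a real instance URL and long-lived token.
request = build_service_call(
    "http://homeassistant.local:8123", "YOUR_TOKEN",
    "light", "turn_on", "light.kitchen",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would actually send it.
```

The hard part remains getting from free-form speech to a correct `domain`, `service`, and `entity_id`.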

Then there is the whole hardware issue to fix too. I do know that some SoCs are getting good at running 7B parameter models locally, but the cost is still probably going to be prohibitive.

I don’t think Sonos gets enough credit for their local voice control capability. It can’t be integrated into home assistant to do anything beyond controlling the Sonos speakers, but I have been ABSOLUTELY blown away by how responsive the voice commands have been. Literally a 100% success rate after using it for a couple months now. It correctly interprets if you want to start/stop playing, can find music by the artist I want from Apple Music (not sure about other streaming services), and will correctly adjust playing status for a specific speaker if you say to adjust music on that speaker only - even if you command it from another room.

The best part - no bullshit worst responses about “by the way….” Like on Alexa. At most, you get a short response like “good choice” or “ok”.

Sonos isn’t cheap, but I would 100% buy them again every time because it just works.

Except it's been in the news that they only support their products for 5 years after release. So as Apple and other streaming services update over the years, your Sonos will stop working.

I think an important correction here is that they say they will only commit to a minimum of 5 years of software updates after they stop selling a particular model.

Even then, there is no reason the speaker wouldn’t still work. To me, that sounds perfectly reasonable. There’s posts about a 15 year old play 5 speaker getting a firmware update within the last year even, so I think Sonos deserves credit where credit is due. They have a proven track record so far, but no doubt it’s something to keep an eye on going forward.

Sonos deserves credit where credit is due.

For five years. I'm not buying shit on a hope and a wish that they'll still support it in 7 years. I'm old enough to know better.

Ok - agree to disagree! Just wanted to share my experience with the community on a quality local voice control that’s available today while the home assistant community works on these exciting developments.

Sonos can suck it for locking settings behind an app that requires your GPS in order to function.

Seriously, connecting to the device through your computer gives you less settings than on your phone.

Oh shit, I'm going to have to shop Sonos then

I never thought I’d be the kind of person to use voice commands, but it is so nice to tell it to turn on the radio while I’m cooking dinner!

I tried the local voice assistant on my Sonos a while back and did like how well it worked to play music but at the time it didn’t support Spotify so it was a no-go. Do they now?

Do you know whether it supports locations and timers? For example I can tell Alexa “Play Eminem in the Family Room for one hour”

Can confirm timer works - set a 5 minute timer for playing music in the family room.

Spotify does work with Sonos now (I don’t use it but I have my account set up with it), but voice control does not work with Spotify.

Thanks. Sorry for being unclear: I use Spotify on my Sonos speakers all the time but was concerned about Sonos Voice working with Spotify. I guess it’s still a “no”

FYI - I received an email the other day that Sonos officially added voice control for Spotify!

Excellent! Thanks for posting: I’ll have to give it a try

If only there was an alternative to voice-assisted house management.

Wife? :p

Ha ha ha casual sexism! I get it!

Got permission from the wife (Fucking joke jfc)

Is that supposed to mean something?
Bet you have a black friend too, lmao

Anyone else see a happy little house living in the mouth of an iDomokun?

This is the best summary I could come up with:


Right now, with some off-the-shelf gear and the patience to flash and fiddle, you can ask “Nabu” or “Jarvis” or any name you want to turn off some lights, set the thermostat, or run automations.

It’s not entirely fair to compare locally run, privacy-minded voice control to the “assistants” offered by globe-spanning tech companies with secondary motives.

While outgrowers are happy to leave behind the inconsistent behavior, privacy concerns, or limitations of their old systems, they can miss being able to just shout from anywhere in a room and have a device figure out their intent.

Here’s a look at what you can do today with your human voice and Home Assistant, what remains to be fixed and made easier, and how it got here.

“As it stands today, we’re not ready yet to tell people that our voice assistant is a replacement for Google/Amazon,” Schoutsen wrote.

All that said, it’s impressive how far Home Assistant has come since late 2022, when it made its pronouncement, despite not really having a clear path toward its end goal.


The original article contains 469 words, the summary contains 177 words. Saved 62%. I'm a bot and I'm open source!

I'm pretty new to all this. I just got a smart light and hub, etc., with the idea of using voice commands on my iPhone/iPad.

But I was really disappointed to find out that I can't voice activate the command "living room light on", because as soon as Siri hears this, it responds "oh you haven't set up a HomeKit device".

Homebridge is a way to get non-HomeKit devices into HomeKit. It’s what I am using for most of my stuff. It works pretty well in my opinion.

I used homebridge for a long time, but found maintaining it to be a bit of a chore. Home Assistant was easier to maintain and configure, thanks to its web-based interface. And it has a bridge to homekit that achieves basically everything that homebridge did. You may want to investigate it!

Thanks. I’ve found that once it is stable I just don’t bother with updates. I have to reboot the system maybe once every six months.

I am aware of and interested in home assistant and may make the switch when I move to a bigger residence. I do like the idea of having most of the logic on the HA side instead of having to script it all on the HomeKit side, which is just clunky and lacks any real backup options.

But homebridge has worked well for my first foray into home automation, and is pretty good for relatively simple setups.

I have this set up using the M5Stack device mentioned in the article. The biggest problem is that ESPHome and Home Assistant expect Home Assistant to be running on a dedicated device for this to work. The integration uses a random UDP port to communicate with the M5Stack device. I had to resort to patching the integration to use a couple of specific ports to get it working properly.

Unfortunately the fix didn't last long: an update to Home Assistant updated the integration, and now the text-to-speech response fails for ESPHome devices. My next plan is to try downgrading the ESPHome integration to the old version that was patched and working, and call it a day.

I run HassOS virtualized and have had no issues getting it to use the Atom Echos with ESPhome flashed to them, I have 3 of them in use and a wyoming install on one desktop that communicates as well. Not sure what you might have different. I can certainly see that patching the integration would go badly as a Supervised install will probably lose any patching on container restarts or upgrade.

If your setup allows random UDP ports from Home Assistant out to your network, it works fine. Mine's running in a Kubernetes cluster.

I used a ConfigMap that mounts the code to patch the integration so it doesn't get overwritten. I haven't had time to troubleshoot more, though I don't see why the patch would stop text to speech from working on the ESPHome devices. The code changed significantly and uses raw audio files now, and the only thing I am changing is making the ports not random, but a range instead. The M5Stack firmware appears to be up-to-date too.
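For anyone wondering what "a range instead of random" means in practice, here is a minimal Python sketch of the idea. It is not the actual integration code or the patch; `bind_udp_in_range` and the port numbers are hypothetical.

```python
import socket


def bind_udp_in_range(host: str, first_port: int, last_port: int) -> socket.socket:
    """Bind a UDP socket to the first free port in [first_port, last_port]."""
    for port in range(first_port, last_port + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            return sock
        except OSError:
            # Port already in use or not permitted; try the next one.
            sock.close()
    raise OSError(f"no free UDP port in {first_port}-{last_port}")


# Binding to port 0 would let the OS pick any ephemeral port, which is
# what a firewall or container port mapping cannot plan for.
sock = bind_udp_in_range("127.0.0.1", 47000, 47010)
print("listening on UDP port", sock.getsockname()[1])
sock.close()
```

With a known range, the same ports can be opened in a firewall rule or exposed from a container ahead of time.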

Issue on GitHub with all of this is here