Year of the Voice - Chapter 2: Let's talk


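This year is Home Assistant’s Year of the Voice. For 2023, our goal is to let users control Home Assistant in their own language. Today we’re presenting Chapter 2, our second milestone towards that goal.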

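In Chapter 1, we focused on intents: what the user wants to do. Today, the Home Assistant community has translated common smart home commands and responses into 45 languages, closing in on the 62 languages that Home Assistant supports.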

For Chapter 2, we’ve expanded beyond text to include audio; specifically, turning audio (speech) into text, and text back into speech. With this functionality, Home Assistant’s Assist feature can now provide a full voice interface for users to interact with.

A voice assistant also needs hardware, so today we’re launching ESPHome support for Assist and, to top it off, the World’s Most Private Voice Assistant. Keep reading to see what that entails.

To watch the video presentation of this blog post, including live demos, check out the recording of our live stream.


The new Assist Pipeline integration allows you to configure all the components that make up a voice assistant in a single place.


Screenshot of the new Assist configuration in Home Assistant.

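With the new Voice Assistant settings page, users can create multiple assistants, mixing and matching voice services. Want a U.S. English assistant that responds with a British accent? No problem. What about a second assistant that listens for Dutch, German, or French voice commands? Or maybe you want to throw ChatGPT into the mix. Create as many assistants as you want, and use them from the Assist dialog as well as from voice assistant hardware for Home Assistant.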
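To illustrate the idea of mixing and matching voice services, here is a minimal Python sketch of an assistant as three swappable stages: speech-to-text, intent handling, and text-to-speech. The `Pipeline` class, its field names, and the `fake_*` stage functions are all hypothetical and written for illustration only; they are not Home Assistant's actual integration code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical stand-in for one assistant: three swappable stages."""

    name: str
    language: str
    speech_to_text: Callable[[bytes], str]
    recognize_intent: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]

    def run(self, audio: bytes) -> bytes:
        """Pass audio through every stage and return the spoken response."""
        text = self.speech_to_text(audio)
        response = self.recognize_intent(text)
        return self.text_to_speech(response)


# Toy stages so the sketch runs without any real speech service.
def fake_stt(audio: bytes) -> str:
    # Pretend the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def fake_intent(text: str) -> str:
    # Recognize a single hard-coded command.
    if text.lower() == "turn on the lights":
        return "Turned on the lights"
    return "Sorry, I didn't understand that"


def fake_tts(text: str) -> bytes:
    # Pretend the synthesized "audio" is just UTF-8 text.
    return text.encode("utf-8")


english = Pipeline("English", "en", fake_stt, fake_intent, fake_tts)
reply = english.run(b"Turn on the lights")
print(reply.decode("utf-8"))
```

Because each stage is just a callable, a second assistant for another language could reuse the same intent stage while swapping in different speech services.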




The Home Assistant Cloud subscription, besides an end-to-end encrypted remote connection, includes state-of-the-art speech-to-text and text-to-speech services. This allows your voice assistant to speak 130+ languages (including dialects like Peruvian Spanish) and to respond extremely quickly.


In addition to high quality speech-to-text and text-to-speech for your voice assistants, you will also be supporting the development of Home Assistant itself.

Join Home Assistant Cloud today

The fully local voice assistant

With Home Assistant you can be guaranteed two things: there will be options and one of those options will be local. With our voice assistant that’s no different.


For high quality text-to-speech to run locally, we had to create our own text-to-speech system, optimized for running on a Raspberry Pi 4. It’s called Piper.


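Piper uses modern machine learning algorithms for realistic-sounding speech but can still generate audio quickly. On a Raspberry Pi 4, Piper can generate 2 seconds of audio with only 1 second of processing time. More powerful CPUs, such as the Intel Core i5, can generate 17 seconds of audio in the same amount of time.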

For audio samples, see the Piper website.
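The throughput figures quoted for Piper can be expressed as a real-time factor: seconds of processing per second of generated audio. The short Python sketch below works through that arithmetic. The function names and the 5-second sentence length are assumptions chosen for illustration; only the 2-seconds and 17-seconds-per-second figures come from the text above.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute needed per second of generated audio (lower is faster)."""
    return processing_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Estimated processing time for a clip of the given length."""
    return audio_seconds * rtf


# Figures from the post: audio generated per 1 second of processing.
pi4_rtf = real_time_factor(1.0, 2.0)   # Raspberry Pi 4
i5_rtf = real_time_factor(1.0, 17.0)   # Intel Core i5

sentence_seconds = 5.0  # assumed length of a spoken response
print(f"Raspberry Pi 4: {synthesis_time(sentence_seconds, pi4_rtf):.2f} s")
print(f"Intel Core i5:  {synthesis_time(sentence_seconds, i5_rtf):.2f} s")
```

A real-time factor below 1.0 means audio is produced faster than it plays back, which is what makes streaming a response feasible even on a Raspberry Pi 4.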

An add-on with Piper is available now for Home Assistant with over 40 voices across 18 languages, including: Catalan, Danish, German, English, Spanish, Finnish, French, Greek, Italian, Kazakh, Nepali, Dutch, Norwegian, Polish, Brazilian Portuguese, Ukrainian, Vietnamese, and Chinese. Voices for Piper are trained from open audio datasets, many of which come from free audiobooks read by volunteers. If you’re interested in contributing your voice, let us know!

You can also run Piper as a standalone Docker container.

Local speech-to-text with OpenAI Whisper

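Whisper is an open source speech-to-text model created by OpenAI that runs locally. Since its release in 2022, Whisper has been improved by the open source community through projects such as whisper.cpp and faster-whisper. In less than a year of progress, Whisper is now capable of providing speech-to-text for dozens of languages on small servers and single-board computers!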

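An add-on using faster-whisper is available now for Home Assistant. On a Raspberry Pi 4, voice commands can take around 7 seconds to process, using about 200 MB of RAM. An Intel Core i5 CPU or better is capable of sub-second response times and can run larger (and more accurate) versions of Whisper.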



Voice assistants share many common functions, such as speech-to-text, intent recognition, and text-to-speech. We created the Wyoming protocol to provide a small set of standard messages for talking to voice assistant services, including the ability to stream audio.

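Wyoming allows developers to focus on the core of a voice service without having to commit to a specific networking stack like HTTP or MQTT. The protocol is compatible with the upcoming version 3.0 of Rhasspy, so both projects can share voice services.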

With Wyoming, we’re trying to kickstart a more interoperable open voice ecosystem that makes sharing components across projects and platforms easy. Developers and scientists wishing to experiment with new voice technologies need only implement a small set of messages to integrate with other voice assistant projects.
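To give a feel for what "a small set of standard messages" with streamed audio can look like, here is a minimal Python sketch of a framing scheme: a one-line JSON header followed by an optional binary payload. This is an illustration only, not the Wyoming wire format; the header fields, the message type names, and the `write_message`/`read_message` helpers are assumptions made for this example, so consult the Wyoming specification for the real details.

```python
import io
import json
from typing import BinaryIO, Optional, Tuple


def write_message(stream: BinaryIO, msg_type: str, data: dict, payload: bytes = b"") -> None:
    """Write one message: a JSON header line, then the raw payload bytes."""
    header = {"type": msg_type, "data": data, "payload_length": len(payload)}
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(payload)


def read_message(stream: BinaryIO) -> Optional[Tuple[str, dict, bytes]]:
    """Read one message, or return None at end of stream."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    payload = stream.read(header["payload_length"])
    return header["type"], header["data"], payload


# Simulate streaming a short audio clip through an in-memory buffer.
buffer = io.BytesIO()
write_message(buffer, "audio-start", {"rate": 16000})        # hypothetical type names
write_message(buffer, "audio-chunk", {}, b"\x00\x01" * 4)    # fake PCM samples
write_message(buffer, "audio-stop", {})

buffer.seek(0)
messages = []
while (message := read_message(buffer)) is not None:
    messages.append(message)

print([msg_type for msg_type, _, _ in messages])
```

Because the framing only needs a byte stream, the same messages could travel over a TCP socket, a pipe, or standard input and output, which is the kind of transport independence described above.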


ESPHome powered voice assistants

ESPHome is our software for microcontrollers. Instead of programming, users define how their sensors are connected in a YAML file. ESPHome reads this file, then generates and installs software on your microcontroller to make the sensor data accessible in Home Assistant.


We’ve been focusing on the M5STACK ATOM Echo for testing and development. For $13 it comes with a microphone and a speaker in a nice little box. We’ve created a tutorial to turn this device into a voice remote directly from your browser!


ESPHome Voice Assistant documentation.


If you were designing the world’s most private voice assistant, what features would it have? To start, it should only listen when you’re ready to talk, rather than all the time. And when it responds, you should be the only one to hear it. This sounds strangely familiar…

A phone! No, not the featureless rectangle you have in your pocket; an analog phone. These great creatures once ruled the Earth with twisty cords and unique looks to match your style. Analog phones have a familiar interface that’s hard to beat: pick up the phone to listen/speak and put it down when done.

With Home Assistant’s new Voice-over-IP integration, you can now use an “old school” phone to control your smart home!

By configuring off-hook autodial, your phone will automatically call Home Assistant when you pick it up. Speak your voice command or question, and listen for the response. The conversation will continue as long as you please: speak more commands/questions, or simply hang up. Assign a unique voice assistant/pipeline to each VoIP adapter, enabling dedicated phones for specific languages.

We’ve focused our initial efforts on supporting the Grandstream HT801 Voice-over-IP box. It works with any phone with an RJ11 connector, and connects directly to Home Assistant. There is no need for an extra server.


Give your voice assistant personality using the OpenAI integration.

