Amazon’s Alexa is going to gain a more natural-sounding voice

Amazon’s voice assistant is going to sound more natural, in addition to receiving a generative AI-powered boost and the ability to carry on a conversation without the user having to repeat the wake word “Alexa.” Today, the company unveiled an improved “speech-to-speech” engine that is more sensitive to the user’s emotions and tone of voice. As a result, Alexa can respond to queries with a range of emotions of its own.

The company introduced the new voice, which it said is powered by large transformers trained on many languages and dialects. The result is a more expressive, less robotic-sounding Alexa.

For instance, if a customer asks for an update on their favourite sports team and the team has just won its most recent match, Alexa can respond in a happy voice. If the team has lost, Alexa will sound more sympathetic.

“And we’re working on a new model — which we refer to as speech-to-speech — again powered by massive transformers. Instead of first converting a customer’s audio request into text using speech recognition, and then using an LLM to generate a text response or an action, and then text-to-speech to produce audio back — this new model will unify these tasks, creating a much richer conversational experience,” said SVP of Alexa Rohit Prasad.

According to Amazon, Alexa will be able to express traits such as surprise and laughter, and even interject with “uh-huhs,” to keep customers engaged in the conversation.

All of this is enabled by Amazon’s Large Text-to-Speech (LTTS) and Speech-to-Speech (S2S) technologies. LTTS lets Alexa adapt its response using textual input, such as a user’s request or the topic being discussed, while S2S adds audio input on top of text so that Alexa can adapt its response with more conversational richness.