The quest to understand every voice worldwide

A speech recognition startup just landed $62 million in Series B funding. What will the money be used for? A quest to enable a computer to understand every voice in the world.

If that doesn’t strike you as overly ambitious, you haven’t spent enough time trying to get Siri to compose a text message. Speech recognition has been a huge challenge for developers, and it is a puzzle that is being closely watched in several industries. The technology has implications for human-machine interfaces in fields like robotics, autonomous vehicles, and personal computing, all of which will benefit from computers that can accurately interpret natural speech.

Voice recognition is, then, a kind of technological entry point: a market need that can spur the development of technologies with broad resonance and incalculable implications for how we interact with machines.

It is also a question of equity. Perhaps not surprisingly, speech recognition currently works well for only a small fraction of the world’s population.

A big part of the challenge is the training model. Most training data has to be labeled manually, meaning accuracy is only achievable for a very narrow set of speakers (unsurprisingly, that narrow set closely matches the most valuable consumers). Speechmatics takes a different approach in its pursuit of more representative speech recognition.

Based on datasets used in Stanford’s Racial Disparities in Speech Recognition study, Speechmatics recorded an overall accuracy of 82.8% for African American voices, compared to Google (68.6%) and Amazon (68.6%). This level of accuracy corresponds to a 45% reduction in speech recognition errors, the equivalent of three words in an average sentence.
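The 45% figure can be reproduced from the accuracy numbers quoted above if the error rate is taken to be 100 minus the accuracy. Here is a minimal sketch of that arithmetic in Python. It assumes that simple definition of error rate, and the function name is illustrative rather than anything published by Speechmatics:

```python
def relative_error_reduction(baseline_accuracy: float, new_accuracy: float) -> float:
    """Return the fractional drop in error rate when accuracy improves.

    Assumption: error rate = 100 - accuracy, with both values in percent.
    """
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Accuracy figures quoted in the article for African American voices:
# 68.6% for Google versus 82.8% for Speechmatics.
reduction = relative_error_reduction(baseline_accuracy=68.6, new_accuracy=82.8)
print(f"{reduction:.0%}")  # prints "45%"
```

Under that assumption, the error rate falls from 31.4% to 17.2%, which is a drop of roughly 45% relative to the baseline.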

The engine is exposed to hundreds of thousands of individual voices through unlabeled, more representative speech data that requires no human intervention. That has helped extend coverage beyond English speakers.

“Our progress over the years has flooded us with investor interest in our Series B fundraiser,” said Katy Wigdahl, CEO. “The Speechmatics team is extremely ambitious. We have combined a true heritage in speech technology with some of the world’s most talented speech and machine learning experts.”

At the moment the engine understands 34 languages, a small drop in a very large linguistic bucket (more than 7,000 languages are spoken worldwide). But the platform has made impressive strides in punctuation, numbers, currencies and addresses, which have traditionally tripped up speech recognition engines.

All of this has sparked great interest in the UK-based company. Companies such as 3Play Media, Veritone, Deloitte UK and Vonage, as well as government agencies around the world, use the platform.

In line with its global goals, Speechmatics is headquartered in the UK, but has offices in Boston (US), Chennai (India) and Brno (Czech Republic). The company will use the investment to support global expansion in the United States and Asia-Pacific.
