The World Wide Web (WWW) and the WWW browser have permeated our lives and have revolutionized the way we get information and entertainment, how we socialize and how we conduct business.
Using new tools that make it easy and inexpensive to develop speech-based agents, Stanford researchers are now proposing the creation of the World Wide Voice Web (WWvW), a new version of the World Wide Web that people can navigate entirely by voice.
About 90 million Americans already use smart speakers to stream music and news, as well as perform tasks such as ordering groceries, scheduling appointments and controlling their lights. But two companies essentially operate these gateways to the voice web, at least in the United States: Amazon, which pioneered Alexa, and Google, which developed Google Assistant. In effect, the two services are walled gardens. This duopoly creates major imbalances that allow the technology owners to favor their own products over those of competing companies. They determine what content they make available and what fees they charge for acting as an intermediary between companies and their customers. In addition, their proprietary smart speakers compromise privacy because they can eavesdrop on conversations as long as they are connected.
The Stanford team, led by computer science professor Monica Lam at the Stanford Open Virtual Assistant Laboratory (OVAL), has developed an open-source, privacy-protecting virtual assistant called Genie, along with cost-effective voice agent development tools that can provide an alternative to these proprietary platforms. The researchers also hosted a workshop on November 10 to discuss their work and introduce the design of the World Wide Voice Web.
What is the WWvW?
Like the World Wide Web, the new WWvW is decentralized. Organizations publish information about their voice agents on their websites, where it can be accessed by any virtual assistant. In the WWvW, Lam says, the voice agents are like web pages, providing information about their services and applications, and the virtual assistant is the browser. These voice agents can also be made available as chatbots or call center agents, making them accessible on the computer or over the phone as well.
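The publication model described here might look something like the sketch below: a hypothetical manifest that an organization could place on its website to advertise a voice agent to any visiting virtual assistant. The field names and URL are illustrative assumptions for explanation only, not a published WWvW specification.

```json
{
  "name": "Example Diner",
  "agent": {
    "languages": ["en", "fr"],
    "endpoint": "https://example-diner.example/voice-agent",
    "capabilities": ["menu_lookup", "table_reservation"]
  }
}
```

Under a scheme like this, a virtual assistant plays the role the browser plays on the web: it discovers the manifest, connects to the listed endpoint, and mediates the voice conversation with the organization's agent.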
“WWvW has the potential to reach even more people than the WWW, including those who are not tech savvy, those who cannot read and write well, or those who may not even speak a written language,” Lam says. For example, Stanford computer science assistant professor Chris Piech, with graduate students Moussa Doumbouya and Lisa Einstein, is working on developing speech technology for three African languages that could help bridge the gap between illiteracy and access to valuable resources, including agricultural information and medical care. “Unlike Amazon and Google’s commercial voice web, which is only available in certain markets and languages, the decentralized WWvW enables society to provide voice information and services in any language and for any use, including education and other humanitarian causes that do not have large monetary returns,” says Lam.
Why weren’t these tools created before? The Stanford team says it’s just really hard to make speech technology. Amazon and Google have invested huge amounts of money and resources to develop the AI natural language processing technologies behind their respective assistants, and they employ thousands of people to annotate the training data. “The technology development process has been expensive and extremely labor-intensive, creating a huge barrier to entry for anyone looking to offer commercial-grade smart voice assistants,” Lam says.
For the past six years, Lam has collaborated at OVAL with Stanford PhD student Giovanni Campagna, computer science professor James Landay, and Christopher Manning, professor of computer science and linguistics, to develop a new voice agent development methodology that is two orders of magnitude more efficient than current solutions. The open-source Genie Pre-trained Agent Generator they created dramatically reduces the cost and resources needed to develop voice agents in different languages.
Interoperability is an important part of ensuring that devices can communicate seamlessly with each other, Lam notes. At the heart of the Genie technology is a distributed programming language the team created for virtual assistants, called ThingTalk. It enables interoperability among multiple virtual assistants, web services and IoT devices. Stanford is currently offering the first course on ThingTalk, Conversational Virtual Assistants with Deep Learning, this fall.
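ThingTalk programs compose monitors over event streams, queries against web services, and actions on devices into a single statement, which is how one program can span multiple services and IoT devices. The snippet below is an illustrative sketch in the spirit of published ThingTalk examples; the device names (`@org.example.weather`, `@com.example.thermostat`) are hypothetical, and the exact syntax may not match the current ThingTalk grammar.

```
// Hypothetical sketch: watch a weather service and drive a thermostat.
// When the reported temperature drops below 15 °C, switch the
// (hypothetical) thermostat into heating mode.
monitor @org.example.weather.current() filter temperature <= 15C
  => @com.example.thermostat.set_mode(mode=enum heat);
```

The key design point is that the weather service and the thermostat come from different vendors, yet a single declarative statement connects them; the virtual assistant, rather than either vendor's cloud, executes the program.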
As of today, Genie has pre-trained agents for the most popular voice skills, such as music playback, podcasts, news, restaurant recommendations, reminders and timers, as well as support for more than 700 IoT devices. These agents are openly available and can be adapted to other similar services.
World Wide Voice Web Conference
The OVAL team presented these concepts at a workshop focused on the World Wide Voice Web on November 10.
The conference included speakers from academia and industry with expertise in machine learning, natural language processing, computer-human interaction and IoT devices, and panelists discussed building a speech ecosystem, pre-trained agents and the social value of a speech web. The Stanford team also gave a live demonstration of Genie.
“We want other people to join us in building the World Wide Voice Web,” said Lam, who is also a faculty member of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). “The original World Wide Web grew slowly at first, but once it took off, it was unstoppable. We hope to see the same with the World Wide Voice Web.”
Genie is an ongoing research project funded by the National Science Foundation, the Alfred P. Sloan Foundation, the Verdant Foundation, and Stanford HAI.