Language is our lifeline to the world. But because high-quality translation tools do not exist for hundreds of languages, billions of people today cannot access digital content or fully participate in conversations and communities online in their preferred or native language. This is a particular problem for the hundreds of millions of people who speak the many languages of Africa and Asia.
To help people better connect today and be part of tomorrow’s metaverse, our AI researchers have created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world’s languages. Today we are announcing a major breakthrough in NLLB: we have built a single AI model called NLLB-200 that translates 200 different languages with results that are far more accurate than what previous technology could achieve.
Compared with previous AI research, NLLB-200’s translations scored an average of 44% higher in quality. For some African and Indian languages, NLLB-200’s translations were over 70% more accurate.
To evaluate and improve NLLB-200 as thoroughly as possible, we use FLORES-200, a dataset that allows researchers to assess the model’s performance in 40,000 different language directions. With FLORES-200, we can measure NLLB-200’s performance in each language to confirm that the translations are of high quality.
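The figure of 40,000 language directions can be sanity-checked with simple arithmetic. The short sketch below assumes that a "direction" means an ordered pair of two distinct languages (source to target), which is an interpretation rather than something stated here; the language codes are hypothetical placeholders, not real FLORES-200 identifiers.

```python
from itertools import permutations

# Hypothetical placeholder codes standing in for the 200 languages;
# these are NOT the real FLORES-200 language identifiers.
languages = [f"lang_{i:03d}" for i in range(200)]

# Assumption: a "direction" is an ordered (source, target) pair of
# two distinct languages, so A -> B and B -> A count separately.
directions = list(permutations(languages, 2))

# 200 languages * 199 possible targets each
print(len(directions))
```

Under that assumption, 200 languages give 200 × 199 = 39,800 directions, which is consistent with the rounded figure of 40,000.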
And to help other researchers improve their translation tools and build on our work, we’re making NLLB-200 models and the FLORES-200 dataset available to developers, in addition to our model training code and code to recreate the training dataset.
We are also awarding up to $200,000 in grants for impactful use of NLLB-200 to researchers and non-profit organizations with initiatives focusing on sustainability, food security, gender-based violence, education or other areas in support of the UN Sustainable Development Goals. Non-profit organizations interested in using NLLB-200 to translate two or more African languages, as well as researchers working in linguistics, machine translation and language technology, are invited to apply.
These research results will support more than 25 billion translations per day in Feed on Facebook, Instagram and our other technologies. You can explore a demo of NLLB-200 and take a deeper dive into how we developed this model.
Extended translation and greater integration
A handful of languages, including English, Mandarin, Spanish, and Arabic, dominate the Internet. Native speakers of these very widely spoken languages can lose sight of how meaningful it is to read something in one’s own native language. NLLB will help more people read things in their preferred language, instead of always needing an intermediate language that often gets the sentiment or content wrong.
This work could also help advance other technologies, such as building assistants that work well in languages like Javanese and Uzbek, or creating systems that can take Bollywood movies and add accurate subtitles in Swahili or Oromo.
As the metaverse begins to take shape, the ability to build technologies that work well in a wider range of languages will help democratize access to immersive experiences in virtual worlds.
Learn more about our work to build NLLB-200, which will help make the metaverse accessible to more people around the world.