Meta has open sourced an AI model that can translate across 200 different languages, the company announced Wednesday, a move that should open up different technologies and digital content to a much wider audience. The model, called No Language Left Behind (NLLB), supports 55 African languages among its 200, with high-quality results.
“A handful of languages, including English, Mandarin, Spanish and Arabic, dominate the web,” the company noted in a blog post. “Native speakers of these very widely spoken languages can take for granted how meaningful it is to read something in your own native language. NLLB will help more people read things in their preferred language, rather than always needing an intermediate language that often gets the feeling or content wrong.”
Meta, of course, uses NLLB to improve its own products, but making the model open source lets technologists use it to build other tools, like an AI assistant that works well in languages like Javanese and Uzbek, or Swahili or Oromo subtitles for Bollywood movies.
NLLB nearly doubles the number of languages covered by a single, state-of-the-art AI model. Meta says that many of these languages, such as Kamba and Lao, were either unsupported or poorly supported by existing translation tools. Currently, fewer than 25 African languages are supported by widely used translation tools.
The model also improves translation quality by an average of 44% compared with previous AI research. For some African and Indian languages, NLLB-200’s translations were more than 70% more accurate. To determine translation quality, Meta ran both automated metric evaluations and human evaluations.
To help ensure the quality of translations, Meta researchers built FLORES-200, a dataset that helps them evaluate NLLB’s performance in 40,000 different language directions.
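The figure of roughly 40,000 directions is consistent with simple counting, assuming that a "direction" means an ordered source-to-target pair of two distinct languages: 200 languages give 200 × 199 = 39,800 such pairs. A minimal Python sketch of that arithmetic follows; the function name is illustrative and does not come from Meta's code.

```python
def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs of distinct languages.

    Assumption: each direction is a one-way pair, so A -> B and
    B -> A are counted separately and A -> A is excluded.
    """
    if num_languages < 0:
        raise ValueError("num_languages must be non-negative")
    return num_languages * (num_languages - 1)


print(translation_directions(200))  # 39800, i.e. roughly 40,000
```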
In addition to open sourcing the NLLB-200 models, Meta is also making the FLORES-200 dataset available to developers, along with the model training code and the code to recreate the training dataset.
The company is also awarding up to $200,000 in grants for impactful uses of NLLB-200 to researchers and non-profit organizations with initiatives focused on sustainability, food security, gender-based violence, education or other areas in support of the United Nations Sustainable Development Goals.
For its own products, Meta expects the model to support more than 25 billion translations every day. In addition to translating content and displaying better advertisements, the model will be used to detect malicious content and misinformation.
Meta’s NLLB research is also applied to translation systems used by Wikipedia editors. Meta has partnered with the Wikimedia Foundation, the non-profit organization that hosts Wikipedia and other free knowledge projects, to help improve Wikipedia’s translation systems. There are versions of Wikipedia in over 300 languages, but most have far fewer articles than the more than 6 million available in English.
Editors can use the technology behind NLLB-200, through the Wikimedia Foundation’s Content Translation Tool, to translate articles into more than 20 low-resource languages (those that don’t have extensive datasets to train AI systems).