Automatic universal translation has long been a science fiction dream. A new paper from artificial intelligence researchers at Facebook’s parent company, Meta, claims to have taken a step toward that goal.
The paper shows that machine learning, the technology behind AI, can translate 204 different languages, twice as many as ever before, and at a higher quality than previously achieved.
This includes more than a hundred rarely spoken languages, such as those of the Acehnese people of Indonesia and the Chokwe people of central and southern Africa, which have always been difficult for computers to translate because they have very little online presence.
Facebook and Meta CEO Mark Zuckerberg praised the feat, calling the AI translation a “superpower,” and the researchers themselves were only slightly less elated.
It is the latest development in artificial intelligence, a controversial area of science that was recently in the spotlight when a Google engineer was suspended after claiming a chatbot could express thoughts and feelings.
“The paper presents impressive work to push production-level translation quality to 200 languages,” said Philipp Koehn of Johns Hopkins University, one of the 38 academics and Meta researchers who collaborated on the work.
“There will also be many resources released that will allow everyone to use this model and retrain it on their own, encouraging research in this area.”
“So yeah, I think that’s a great thing.”
However, while the paper stated that it was “laying the important foundations for the realization of a universal translation system,” computer scientists who were not involved in the project stressed that it was a small step on a long and winding path, with no obvious end in sight.
“An awesome feat of engineering”
The paper’s central machine learning technique, a model known by the baroque term Sparsely Gated Mixture of Experts, was not new in itself, said Dr. Alexandra Birch-Mayne, a lecturer in natural language processing at the University of Edinburgh.
Its most important contribution, she said, was to gather, clean up and present new data on languages that do not appear widely on the internet, the main source of data for machine translation.
“It’s an impressive feat of engineering. It’s not necessarily a breakthrough when it comes to basic science,” Dr. Birch-Mayne told Sky News.
In addition to translating languages with fewer speakers, the paper also claimed to set a new bar for translation quality.
The data and algorithms will be made available to the public
Measuring progress in machine learning is a difficult task, but on a metric known as BLEU, the Meta paper improved translation quality over the previous state of the art by 44%.
“BLEU is an imperfect metric,” said Dr. Diptesh Kanojia, a professor of artificial intelligence for natural language processing at the University of Surrey. “However, it is standard practice in natural language processing research to quote BLEU scores.”
“If we look at this improvement only in statistical terms, the 44% improvement is quite significant.”
Although the work will be used to improve Facebook software, the language data and the translation algorithms will be made available to the public, so for the first time there will be authorized datasets for languages such as Eastern Yiddish, Northern Kurdish and Cape Verdean Creole for other researchers to use.
Crucially, Meta researchers found native speakers to check their translations, a time-consuming task that helps safeguard the quality of both the algorithm and the underlying language data.
“What’s admirable is engaging with the community. They’re not necessarily starting this trend, but they’re following good practices,” Dr. Birch-Mayne said, though she noted the limitations of the effort, which involved native speakers in Europe and the US more than in the languages’ countries of origin.
Some researchers criticized the fact that the paper had been published without peer review, accusing Meta of practicing “peer review by the media.”
Professor Koehn defended the approach, saying it was “a common practice in the field … for better or worse” and helped improve the speed of communication of research results.
Advances in machine learning
The paper is one of several recent advances in machine learning, which is improving at a much faster rate than researchers expected. A model released last week by Google solved a third of MIT undergraduate students’ math problems with 50% accuracy, a dramatic increase in performance.
But while each new breakthrough leads to speculation about new forms of consciousness, most people in the field believe that AI systems are neither sentient nor intelligent, saying that they only mimic the data they are given. A robot uprising is not on the cards.
The biggest danger of AI systems is that they will cause a disaster by giving humans false confidence in their still very limited capabilities, a very real prospect given the sensitivity of the tasks that translation may involve at Facebook, which has been criticized in the past for failing to have native-speaking moderators to detect calls for violence on its platform.
Zuckerberg promised that “advances here will allow more than 25 billion translations every day across our apps,” something Facebook said could include detecting harmful content, securing elections, and curbing online sexual exploitation.
Dr. Birch-Mayne, who has just completed a three-year project on 17 languages in Africa and India, working with the BBC, warned that machine translation should not be used for anything where accuracy really matters.
“You can’t trust these systems,” she said. “It may be right, but maybe not.”