ChatGPT is a marvel of multilingualism - Hindustan Times
The hype that followed ChatGPT’s public launch last year was, even by the standards of tech innovations, extreme. OpenAI’s natural-language system creates recipes, writes computer code and parodies literary styles. Its latest iteration can even describe photographs. It has been hailed as a technological breakthrough on a par with the printing press. But it has not taken long for huge flaws to emerge, too. It sometimes “hallucinates” non-facts that it pronounces with perfect confidence, insisting on those falsehoods when queried. It also fails basic logic tests.
ChatGPT - A Language Model
In other words, ChatGPT is not a general artificial intelligence, an independent thinking machine. It is, in the jargon, a large language model. That means it has been trained on a huge body of text (its developer, OpenAI, does not say exactly from where) to spot patterns, and is very good at predicting what kinds of words tend to follow which others.
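The idea of predicting which words tend to follow which others can be illustrated with a deliberately tiny sketch. This is not how ChatGPT is built (OpenAI's system is a large neural network, and its training data is not public); it is a toy word-pair counter, and the function names `train_bigrams` and `predict_next`, along with the sample sentence, are invented here purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical miniature "training corpus" (an assumption for this sketch).
corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

In this toy corpus "cat" follows "the" more often than "mat" does, so it is the prediction. A real large language model works with far more context than a single preceding word, but the underlying task of guessing a likely continuation from patterns in text is the same in spirit.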
Generating Human-like Language
Amid the hype, it is easy to forget a minor miracle. ChatGPT has aced a problem that long served as a far-off dream for engineers: generating human-like language. Unlike earlier versions of the system, it can go on doing so for paragraphs on end without descending into incoherence. And this achievement’s dimensions are even greater than they seem at first glance. ChatGPT is not only able to generate remarkably realistic English. It is also able to instantly blurt out text in more than 50 languages; the precise number is apparently unknown to the system itself.
Multilingual Capabilities
Asked (in Spanish) how many languages it can speak, ChatGPT replies, vaguely, “more than 50”, explaining that its ability to produce text will depend on how much training data is available for any given language. Then, asked a question in an unannounced switch to Portuguese, it offers up a sketch of your columnist’s biography in that language. Most of it was correct, but it had him studying the wrong subject at the wrong university. The language itself was impeccable.
Portuguese is one of the world’s biggest languages. Trying out a smaller language, your columnist probed ChatGPT in Danish, spoken by only about 5.5m people. Danes do much of their online writing in English, so the training data for Danish must be orders of magnitude scarcer than what is available for English, Spanish, or Portuguese. ChatGPT’s answers were factually askew but expressed in almost perfect Danish. Indeed, ChatGPT is too modest about its own abilities.
Small Language Support
On request, ChatGPT furnishes a list of 51 languages it can work in, including Esperanto, Kannada, and Zulu. It declines to say that it can “speak” these languages, saying instead that it “generates text” in them. This is too humble an answer. Addressed in Catalan, a language not on the list, it replies in that language with a cheerful “Yes, I do speak Catalan. What can I help you with?” A few follow-up questions do not trip it up in the slightest, including a query about whether it is merely translating answers first generated in another language into Catalan.
Implications for Language Technologies
Speakers of smaller languages have worried for years about language technologies passing them by. Somehow the developers of ChatGPT seem to have overcome such problems. It is too early to say what good the technology will do, but this alone gives one reason to be optimistic. As machine-learning techniques improve, they may not require the vast resources, in programming time or data, traditionally thought necessary to make sure smaller languages are not overlooked online.