AI'S GLOBAL VILLAGE OPENS WIDER TO MORE VOICES ...
Artificial intelligence engineer Jacky Chan Ho-kit has conflicting feelings about his industry. While he looks forward to a future where AI reaches its pinnacle — possessing humanlike cognitive capabilities — he is deeply concerned that it will only understand English. "Given the language status quo, this is highly likely to be a reality rather than just alarmism," he said.
Chan is the chief technology officer at Votee, a Hong Kong-based AI company. He is also a language enthusiast who in his free time follows language bloggers on social media, absorbing their linguistic insights. Through his research, he has learned that many languages are disappearing. Although around 7,000 languages are still in use globally, according to UNESCO's World Atlas of Languages, only 10 have more than 200 million speakers.
The Dominance of English Online
In the online realm, the disparity in language usage rates is even more pronounced. Over the last decade, English content has dominated the internet, accounting for 49.4 percent as of Nov 26 — more than eight times the use of Spanish, the second most prevalent online language at 6 percent, according to a report by W3Techs, a company that conducts global web surveys.

By contrast, the proportion of web pages that use Chinese, the second most spoken language in the physical world with more than 1.1 billion speakers, has plummeted from 4.3 percent in 2013 to 1.2 percent in 2024.
Impact of Mainstream AI Language Models
In the realm of AI, prominent large language models, or LLMs, such as OpenAI's GPT-4, Google's Gemini, and Anthropic's Claude all use English as their main language. Mainstream AI language models, particularly those originating in the West, are built for English-speaking audiences, with translation into other languages serving only as a supporting function.
Artificial intelligence is a field devoted to developing technologies that can replicate or even surpass human intelligence. Until that vision is realized, large AI companies will continue to prioritize improving AI's capabilities over expanding their services to cover more languages.
Challenges and Initiatives
A significant hurdle in advancing AI's linguistic prowess is the scarcity of data available in many languages. Of the roughly 7,000 languages spoken worldwide, nearly 99 percent are considered low-resource languages, meaning the data available for computational processing and analysis is limited.
Many smaller companies around the world are also venturing into the creation of small language models. Asiabots Ltd, a Hong Kong-based artificial intelligence company established in 2017, is one such company.

Preserving Endangered Languages
While commercial demand ensures the survival of languages with a large offline population, those with few speakers, limited commercial interest, and insufficient technological research are at risk of becoming endangered both online and offline.
With hundreds of indigenous languages in Africa at risk of extinction, Votee has worked with clients on the continent to assist in language preservation efforts. However, significant challenges stem from Africa's political instability, limited technological proficiency, and insufficient technology infrastructure.
In recent years, many clients have asked Asiabots to develop language models for the preservation of endangered languages.