Innovating with AI: Naver's HyperClova X Setting New Standards

Published On Thu Aug 22 2024
Naver recently announced a voice synthesis technology based on generative AI on the technology blog of its official Clova site. The announcement reflects Naver's continued investment in artificial intelligence research.

Advanced Voice AI Technology

The technology, known as HyperClova X-based voice AI, represents a significant step forward in speech recognition and synthesis. Because it is built on a large language model (LLM), it supports natural conversations with better language structure, more accurate pronunciation, and richer emotional expression.

Naver compares this technology to OpenAI's GPT-4o (Omni) model, highlighting the ability to process several types of data at once, including text, images, and voice. This opens up more complex interactions between AI systems and humans and makes room for applications that merge voice, vision, and text.

Integration of Technologies

Naver's voice synthesis technology combines HyperClova X, its latest large language model (LLM), with the Universal Speech Dialogue Model (USDM) to create Speech X. By integrating these models, Naver can handle speech understanding and processing within a single unified system, eliminating the need for separate modules.
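The difference between separate modules and a unified system can be illustrated with a toy sketch. Everything in it is hypothetical: the `Audio` type, the function names, and the assumption that the separate modules are speech recognition, text response, and speech synthesis. None of it reflects Naver's actual implementation or API. It only shows how a cue such as emotion can be lost when speech is reduced to text between modules, while a single model that sees the whole input can carry it through.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Stand-in for a waveform; a real system would hold samples, not text."""
    content: str
    emotion: str = "neutral"


# --- Cascaded design: three separate modules (hypothetical stubs) ---
def recognize(audio: Audio) -> str:
    # Speech-to-text keeps only the words; the speaker's emotion is dropped.
    return audio.content


def respond(text: str) -> str:
    # Text-only language model stub.
    return f"reply to: {text}"


def synthesize(text: str) -> Audio:
    # Text-to-speech stub; with no emotion cue it falls back to "neutral".
    return Audio(content=text)


def cascaded_pipeline(audio: Audio) -> Audio:
    return synthesize(respond(recognize(audio)))


# --- Unified design: one model maps speech to speech (hypothetical stub) ---
def unified_model(audio: Audio) -> Audio:
    # The model sees the full input, so non-text cues can shape the output.
    return Audio(content=f"reply to: {audio.content}", emotion=audio.emotion)


if __name__ == "__main__":
    question = Audio(content="where is my order", emotion="embarrassed")
    print(cascaded_pipeline(question))
    print(unified_model(question))
```

In this sketch both designs produce the same words, but only the unified stub keeps the speaker's emotion, which is the kind of information a text-only hand-off between modules tends to discard.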

Naver demonstrates the technology with examples such as the voice of a woman in her 40s speaking with embarrassment. The company has already proven its expertise with various AI services such as AI voice recording, hello phone assistance, and AI voice synthesis, and now aims to enhance user experiences with voice multimodal LLM technology.

Future Developments

Naver's HyperClova X is evolving into a Large Vision Language Model that incorporates image comprehension and voice multimodal language capabilities. This advancement will enhance Naver's services, including the interactive AI agent Clova X, to deliver new user value and expand the HyperClova X ecosystem.

Furthermore, Naver has enhanced Clova X's image comprehension function, allowing users to interact with the AI based on uploaded images. This feature enables Clova X to analyze images, describe what they show, and infer situations, expanding its utility beyond logical and code-based tasks.

Enhanced User Experience

By combining HyperClova X with AI-based document processing and character recognition, Naver aims to provide more accurate and reliable services. The technology has already demonstrated impressive results, such as achieving an 84% correct answer rate in solving exam questions, outperforming other AI models.

With its ongoing innovations in voice synthesis and AI technology, Naver continues to set new standards in the industry, promising exciting developments for the future.