Erik Bjorgan Makes Voice Cloning Easy with the Applio- and Piper-Based TextyMcSpeechy Project
Maker Erik Bjorgan has found a new use for his Raspberry Pi: voice cloning with a software workflow he calls TextyMcSpeechy. The project lets users clone their own voice, or any other voice, and use it in text-to-speech (TTS) applications.
Bjorgan explains that TextyMcSpeechy was born out of his need for a simple way to clone voices for the Piper TTS system. Unable to find an easy-to-use tool, he decided to create one himself. By combining Piper, a neural network-based engine for on-device text-to-speech, with the Applio transformer-based voice conversion tool, Bjorgan developed a software-only approach to generating speech in a chosen voice.

The Technology Behind TextyMcSpeechy
TextyMcSpeechy combines Piper and the Applio voice conversion tool to create custom voice models. Applio converts the recordings in an existing voice dataset so that they sound like the target voice, and the converted recordings are then used to train Piper. This means that users can create TTS voices that sound like themselves or any other person, even if the original dataset contains no recordings of the target voice.
Bjorgan emphasizes the importance of having a voice dataset with similar tone and accent to the target voice for optimal results. He also notes that some datasets may include audio from multiple speakers, which can be both a challenge and an opportunity for voice cloning.
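The dataset step can be pictured with a short Python sketch. It is an illustration under stated assumptions, not TextyMcSpeechy's actual code: it assumes an LJSpeech-style dataset layout (a metadata.csv of "clip_id|transcript" lines next to a wavs directory), and the names build_target_dataset and copy_only are hypothetical. The converter is a stand-in where a voice conversion tool such as Applio would do the real work. The point it shows is that the transcripts are reused unchanged, which is why no recordings of the target voice are needed in the original dataset.

```python
import shutil
from pathlib import Path
from typing import Callable

# Hypothetical converter signature: reads a source WAV and writes the
# voice-converted audio to the destination path.
Converter = Callable[[Path, Path], None]


def build_target_dataset(src: Path, dst: Path, convert: Converter) -> int:
    """Copy an LJSpeech-style dataset, passing every clip through `convert`.

    Assumed layout (an assumption, not taken from the project):
        src/metadata.csv        lines of "clip_id|transcript"
        src/wavs/<clip_id>.wav  one recording per transcript line
    Returns the number of clips processed.
    """
    (dst / "wavs").mkdir(parents=True, exist_ok=True)
    # The spoken words do not change, so the transcripts are reused verbatim.
    shutil.copyfile(src / "metadata.csv", dst / "metadata.csv")

    count = 0
    for wav in sorted((src / "wavs").glob("*.wav")):
        convert(wav, dst / "wavs" / wav.name)
        count += 1
    return count


def copy_only(src_wav: Path, dst_wav: Path) -> None:
    """Stand-in converter that only copies the file.

    A real converter would hand the clip to a voice conversion tool here.
    """
    shutil.copyfile(src_wav, dst_wav)
```

Calling build_target_dataset(Path("source_dataset"), Path("target_dataset"), copy_only) with a real converter in place of copy_only would produce a dataset in the target voice that a TTS trainer could consume in place of the original.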

Hardware Requirements and Applications
While training may require a powerful workstation with an NVIDIA GPU for acceleration, speech generation itself can run on more modest hardware such as a Raspberry Pi single-board computer. Bjorgan plans to use the software to build smart home assistants that speak in celebrity voices, by way of Home Assistant's OpenAI Conversation integration.
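For readers who want a feel for the generation side, the following Python sketch wraps Piper's command-line tool, which reads text on standard input and writes a WAV file. It assumes a piper executable is on the PATH and that a trained voice model has been exported as an .onnx file with its matching JSON config alongside it. The function names and file names are illustrative, not part of TextyMcSpeechy.

```python
import subprocess
from pathlib import Path


def build_piper_command(model: Path, output_wav: Path, piper_bin: str = "piper") -> list[str]:
    """Return the argument list for a one-shot Piper synthesis run."""
    return [piper_bin, "--model", str(model), "--output_file", str(output_wav)]


def synthesize(text: str, model: Path, output_wav: Path) -> None:
    """Send text to Piper on stdin and write the spoken result to a WAV file.

    Assumes the `piper` executable is installed; raises CalledProcessError
    if Piper exits with an error.
    """
    subprocess.run(
        build_piper_command(model, output_wav),
        input=text.encode("utf-8"),
        check=True,
    )
```

On a Raspberry Pi with Piper installed, a call such as synthesize("The front door is unlocked.", Path("my_voice.onnx"), Path("alert.wav")) would write the spoken sentence to alert.wav, where my_voice.onnx is a placeholder for whatever model the training step produced.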
For those interested in exploring TextyMcSpeechy, the project is available on Erik Bjorgan's GitHub repository under the permissive MIT license. More information can be found in the Reddit thread dedicated to the project.

With TextyMcSpeechy, Bjorgan makes voice cloning for TTS applications more accessible to anyone who wants a personalized synthetic voice.