Detecting Baby Cry with TinyML, ChatGPT, and Synthetic Data

Published on Sat, May 13, 2023

The combination of ChatGPT, TinyML, and text-to-audio technologies can be used to create synthetic data for detecting baby crying. TinyML, which brings the power of artificial intelligence to low-power devices, is particularly useful for applications that require real-time processing. In machine learning, locating and gathering datasets can be a challenge. Synthetic data, however, enables models to be trained in a cost-effective and adaptable manner, eliminating the need for large quantities of real-world data.

Creating a Baby Cry Detection System

Developing customized voice recognition models is now straightforward with the Arduino Nicla Voice development board. Built around Syntiant's ultra-low-power deep learning processor, the Nicla Voice provides always-on speech, gesture, and motion recognition at the edge. It can be incorporated into wearable devices, enabling AI integration with minimal energy consumption.

To create a baby cry detection system with the Edge Impulse platform, we will train a machine learning model on synthetic data to differentiate between a baby's cry and background noise. Using ChatGPT to generate prompts streamlines the prompt-writing process and yields a wider, more diverse set of prompts, which can improve the accuracy and effectiveness of the machine learning model.

Generating Prompts with ChatGPT

Here are text prompts for the baby crying and background noise scenarios, generated using ChatGPT.

  • Prompts for Baby Crying Scenario:
    1. What's making the baby cry?
    2. Is the baby crying?
    3. Does the baby need attention?
    4. Why is the baby crying?
    5. Is the baby in distress?
  • Prompts for Background Noise Scenario:
    1. Is there background noise?
    2. What kind of background noise is there?
    3. Is there a lot of noise in the background?
    4. What's causing the background noise?
    5. Can you tell me more about the background noise?

Converting Text to Audio with AudioLDM

To produce audio files from text, we will use AudioLDM, a text-to-audio generation tool that uses a latent diffusion model to generate high-quality audio from text prompts. You will need a standalone computer with a powerful CPU; a dedicated GPU is recommended but not mandatory. To try AudioLDM before installing it, you can test it online via Hugging Face.

The next step is configuring the Python environment. We will use virtualenvwrapper (which provides the mkvirtualenv command used later) to manage virtual environments; it can be installed with the following command:
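The original installation command is not shown; a typical pip-based installation looks like this:

```shell
# Install virtualenvwrapper, which provides the mkvirtualenv command
pip install virtualenvwrapper
```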

Once virtualenvwrapper is installed, add the following lines to the ~/.bashrc file:
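The exact lines are not shown in the original; a typical virtualenvwrapper configuration for ~/.bashrc looks like the sketch below. The interpreter and script paths are assumptions and may differ on your system:

```shell
# Directory where virtual environments will be stored
export WORKON_HOME=$HOME/.virtualenvs
# Python interpreter used by virtualenvwrapper (path may differ)
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
# Load the virtualenvwrapper shell functions (path may differ;
# find it with: which virtualenvwrapper.sh)
source /usr/local/bin/virtualenvwrapper.sh
```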

To activate the changes, execute the following command:
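Reloading the shell configuration applies the changes to the current session:

```shell
# Re-read ~/.bashrc so the virtualenvwrapper setup takes effect
source ~/.bashrc
```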

Now we can create a virtual environment using the mkvirtualenv command. Install PyTorch using pip, and then install the audioldm package. Run the following command to generate audio files from the text prompts generated with ChatGPT:
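The original commands are not shown; a sketch of the steps described above follows. The environment name and the example prompt are illustrative, and the audioldm CLI flags may vary between releases (check `audioldm --help` for your version):

```shell
# Create and activate a virtual environment (name is arbitrary)
mkvirtualenv audioldm-env

# Install PyTorch, then the audioldm package
pip install torch
pip install audioldm

# Generate an audio clip from a text prompt
# (-t passes the text prompt; the first run downloads the model weights)
audioldm -t "A baby is crying loudly in distress"
```

Repeat the generation step for each prompt in both scenarios to build up a balanced set of WAV samples for the two classes.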

Training a Machine Learning Model with Edge Impulse

Once the WAV audio samples have been collected, they can be fed into the neural network to train it to automatically detect whether a baby is crying or only background noise is present. Edge Impulse is a web-based tool that helps us quickly and easily create AI models for use in all kinds of projects. We can build custom machine learning classifiers in a few simple steps with nothing more than a web browser.

To train the model, navigate to Create Impulse in the left navigation menu. Add a processing block and select Audio (Syntiant), which is well suited to Syntiant NDP120-based development boards. Then add a learning block and select Classification with two output classes.

To generate features, navigate to the Syntiant tab, click Save parameters, and then click the Generate features button. Train the model by pressing the Start training button. If everything goes correctly, a validation accuracy of 90.7% can be obtained; the final quantized (int8) model file is around 5 KB in size and achieves an accuracy of almost 90%.

It is always interesting to take a look at a model architecture as well as its input and output formats and shapes. You can use a program like Netron to view the neural network.

Click serving_default_x:0 in Netron to view the model architecture and the neural network design.