Developing a Voice Changer App Using Superpowered Libraries

We developed a new voice changer feature for a smart robotics customer, using the FFmpeg and DTLN libraries to handle multimedia data.

Contributors
Shilpa Ramaswamy

The Client & the Challenge

Personal, companion and entertainment robots aim to bridge the gap between intelligent technologies and human behaviour. Our client, a smart toy company specializing in child-friendly robots for ages 1 to 6, wanted to develop an Android user interface that could clone, modify and customize voices. The idea was to create an app similar to the popular Talking Tom, making it easier for children to interact with the entertainment robots and supporting their social, emotional and creative development. The client wanted a voice changer that could produce the desired output within seconds. They also wanted a feature that could modify children's voices by adding accents and changing pitch and tone.

Industry Overview

Disruption

The Entertainment Robots market is expected to grow at a compound annual growth rate (CAGR) of 25.23%, from USD 1,526.66 million in 2022 to USD 5,887.84 million by 2028. The market is segmented by type (Smart Toys, Educational Robots, and Robotic Companion Pets) and by application (Media, Education, Retail, and Others). The Asia-Pacific region leads this market, driven by advances in camera and sensor technology. The integration of technology-led education is also driving strong growth by raising awareness of the value of technology in learning. As a result, demand is rising for smart toys with advanced voice recognition features and hardware sensors that add interactivity and make products more intelligent. Major players are investing heavily in research and development to capitalize on these trends.
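The 25.23% figure reads as a compound annual growth rate rather than total growth. It can be checked against the two endpoint values over the six years from 2022 to 2028:

```latex
\text{CAGR} = \left(\frac{V_{2028}}{V_{2022}}\right)^{1/6} - 1
            = \left(\frac{5887.84}{1526.66}\right)^{1/6} - 1
            \approx (3.857)^{1/6} - 1
            \approx 0.2523 = 25.23\%
```

By contrast, the total increase over the whole period is roughly 286%.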

Business Challenge

Customer Experience

From a business perspective, several key factors apply when building voice changer applications for entertainment robots. Brand identity, messaging consistency and the overall appeal of the voice must all be taken into account, as must the quality and clarity of the voice, which may require specialized technology or hardware. Making the modified voice viable for programming onto the robot's hardware or a software application will likely add time and cost to production. Finally, regulatory inspections should be in place to confirm that safety guidelines are met. Addressing these challenges can create a more enjoyable experience for children whilst giving manufacturers a high-calibre end product.


Solution

We used a combination of the Superpowered library and Android Studio to solve the customer's challenge. Here is a breakdown of the steps:

Step 1: Data preprocessing and libraries

The audio was preprocessed to minimize background noise. Five voice types (baby, chipmunk, robot, monster, and echo) were created, and sliders were added so users could adjust the parameters of each voice type, improving the UI/UX. The collected data sets were meticulously labeled and cleaned, and relevant features were selected. The preprocessed data sets were organized into libraries, and their functionality was thoroughly tested for performance.
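To make the five voice types more concrete, here is a minimal, framework-independent C++ sketch of how each might be built from basic DSP operations and driven by a single slider value. It is not the Superpowered API or the effects the team actually shipped: the names (`VoiceType`, `VoiceParams`, `applyVoice`) and every parameter range are hypothetical, and the code assumes mono float samples in the range [-1, 1].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical presets matching the five voice types; not Superpowered's API.
enum class VoiceType { Baby, Chipmunk, Robot, Monster, Echo };

// A UI slider value normalised to [0, 1] (assumed convention).
struct VoiceParams {
    float amount = 0.5f;
};

// Pitch change by naive resampling with linear interpolation.
// ratio > 1 raises pitch and shortens the clip; ratio < 1 lowers pitch and lengthens it.
std::vector<float> resample(const std::vector<float>& in, double ratio) {
    if (in.size() < 2 || ratio <= 0.0) return in;
    std::vector<float> out;
    const double last = static_cast<double>(in.size() - 1);
    for (double pos = 0.0; pos < last; pos += ratio) {
        const std::size_t i = static_cast<std::size_t>(pos);
        const float frac = static_cast<float>(pos - static_cast<double>(i));
        out.push_back(in[i] * (1.0f - frac) + in[i + 1] * frac);
    }
    return out;
}

// Ring modulation: blend the input with input * sine carrier for a metallic "robot" tone.
std::vector<float> ringModulate(const std::vector<float>& in, double carrierHz,
                                double sampleRate, float depth) {
    const double kTwoPi = 6.283185307179586;
    std::vector<float> out(in.size());
    for (std::size_t n = 0; n < in.size(); ++n) {
        const float carrier = static_cast<float>(
            std::sin(kTwoPi * carrierHz * static_cast<double>(n) / sampleRate));
        out[n] = in[n] * ((1.0f - depth) + depth * carrier);
    }
    return out;
}

// Feedback delay, y[n] = x[n] + feedback * y[n - delay]; keep feedback < 1 for stability.
std::vector<float> echo(const std::vector<float>& in, std::size_t delaySamples, float feedback) {
    std::vector<float> out(in);
    for (std::size_t n = delaySamples; n < out.size(); ++n) {
        out[n] += feedback * out[n - delaySamples];
    }
    return out;
}

// Map one slider value to a per-voice parameter range (ranges are illustrative guesses).
std::vector<float> applyVoice(const std::vector<float>& in, VoiceType type,
                              VoiceParams p, double sampleRate) {
    switch (type) {
        case VoiceType::Baby:     return resample(in, 1.0 + 0.5 * p.amount);  // 1.0x to 1.5x
        case VoiceType::Chipmunk: return resample(in, 1.5 + 0.5 * p.amount);  // 1.5x to 2.0x
        case VoiceType::Monster:  return resample(in, 0.8 - 0.3 * p.amount);  // 0.8x down to 0.5x
        case VoiceType::Robot:
            return ringModulate(in, 30.0 + 120.0 * p.amount, sampleRate, 1.0f);
        case VoiceType::Echo:
            return echo(in, static_cast<std::size_t>(0.25 * sampleRate), 0.7f * p.amount);
    }
    return in;
}
```

Resampling changes duration along with pitch, which can suit short, playful clips; a production app would more likely use a duration-preserving pitch shifter instead.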

Step 2: App development on Android Studio

The latest SDKs and tools were installed in Android Studio. The UI was designed in Android Studio's Layout Editor to make the app easy for users to interact with. Coding then began on the requested features and functionality. The Android framework was used to create voice effects such as pitch shifting, tone adjustment, and distortion that could be applied to the user's voice.
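The case study does not say which Android APIs produced these effects. As a hedged illustration only, the C++ sketch below shows two of the named effects in their simplest textbook form: tanh soft clipping for distortion and a one-pole low-pass filter as a tone control. The names `softClip`, `OnePoleTone`, `alphaForCutoff` and `distortAndShape` are hypothetical, and the code assumes mono float samples in [-1, 1].

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: tanh soft clipping. drive must be > 0; larger values distort more.
// Dividing by tanh(drive) keeps a full-scale input of 1.0 at an output of 1.0.
float softClip(float x, float drive) {
    return std::tanh(drive * x) / std::tanh(drive);
}

// Hypothetical tone control: one-pole low-pass, y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
// alpha is in (0, 1]; smaller alpha gives a darker, more muffled voice.
class OnePoleTone {
public:
    explicit OnePoleTone(float alpha) : alpha_(alpha) {}
    float process(float x) {
        state_ += alpha_ * (x - state_);
        return state_;
    }
private:
    float alpha_;
    float state_ = 0.0f;
};

// Common one-pole mapping from a cutoff frequency to alpha.
float alphaForCutoff(float cutoffHz, float sampleRate) {
    const float kTwoPi = 6.28318531f;
    return 1.0f - std::exp(-kTwoPi * cutoffHz / sampleRate);
}

// Distortion followed by tone, applied sample by sample to a mono buffer.
std::vector<float> distortAndShape(const std::vector<float>& in, float drive, float alpha) {
    OnePoleTone tone(alpha);
    std::vector<float> out;
    out.reserve(in.size());
    for (float s : in) {
        out.push_back(tone.process(softClip(s, drive)));
    }
    return out;
}
```

For example, `distortAndShape(samples, 3.0f, alphaForCutoff(2000.0f, 48000.0f))` would give a gritty, slightly muffled voice. Keeping the filter state in a small class lets the same tone control run across consecutive audio buffers without resetting.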

Step 3: Integration process for voice sampling

We integrated Superpowered, a third-party library that makes it straightforward to produce complex voice effects. The app was deployed to a physical device and tested for functionality, after which it was published on the Google Play Store for Android users to download.


Impact Delivered

  • Using the C++ Superpowered library, we launched the new voice-changing feature in our client's application in less than 2 months.

Top Benefits

  • We built the new voice-changing feature on the FFmpeg and DTLN libraries, with FFmpeg handling multimedia data, including transcoding, packaging, streaming, and audio playback.
  • Building on open-source libraries let the solution scale to meet the client's software specifications.
  • The application handled diverse media formats with minimal user intervention, saving time and effort.
  • The solution supported a wide range of container and file formats, including AVI, MP4, MP3, WMA, WAV, TS, FLV, and MKV.
  • The DTLN library allowed us to filter out background noise.
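DTLN is a neural noise-suppression model, and reproducing it is beyond a short example. To illustrate the general idea of filtering out background noise, the sketch below swaps in a far simpler technique: a frame-based noise gate that mutes frames whose RMS level falls below a threshold. It is not DTLN; the function name `noiseGate`, the frame size and the threshold are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical frame-based noise gate (NOT DTLN): frames whose RMS level is below
// thresholdRms are muted; louder frames pass through unchanged.
std::vector<float> noiseGate(const std::vector<float>& in, std::size_t frameSize,
                             float thresholdRms) {
    std::vector<float> out(in);
    if (frameSize == 0) return out;
    for (std::size_t start = 0; start < out.size(); start += frameSize) {
        const std::size_t end = std::min(start + frameSize, out.size());
        // Mean energy of the frame, accumulated in double for accuracy.
        double energy = 0.0;
        for (std::size_t n = start; n < end; ++n) {
            energy += static_cast<double>(out[n]) * out[n];
        }
        const double rms = std::sqrt(energy / static_cast<double>(end - start));
        if (rms < thresholdRms) {
            for (std::size_t n = start; n < end; ++n) out[n] = 0.0f;
        }
    }
    return out;
}
```

Hard muting can produce audible clicks at frame boundaries, so practical gates smooth the gain with attack and release times. A gate also cannot remove noise that overlaps speech, which is the case learned models such as DTLN are designed to handle.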

The Akaike Edge

Inbuilt libraries with transfer learning capabilities