Contxto – Back in the day, text-to-speech (TTS) technology was boring and monotonous. Not wanting to lull customers to sleep, some startups such as Vozy are developing solutions to make these machine-generated voices more lifelike.

The Colombian startup had already developed advanced English TTS technology, and now it offers Spanish services, too.

Drawing on the regional diversity of the Spanish language, Vozy’s bilingual virtual agents can now distinguish between various accents. This way, the AI technology leveraging neural TTS can adapt to customers, depending on how fast they speak or roll their “rs,” for example.

Neural TTS

Out with the old and in with the new. While Vozy has Colombian origins, the startup based in Miami intends to replace standard TTS models with neural upgrades.

First, let’s backtrack. By standard, I mean the agonizingly dull voices with little to no character that the newer models aim to replace with something more relatable. Since machine-generated voices follow text-to-speech scripts, the original system divided the text into small units.

Like a puzzle, the system would essentially join pieces of recorded audio according to those units. Typically, this required large amounts of recorded data to match the text accurately. Needless to say, this was often a long and complicated process.
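To make the puzzle analogy concrete, here is a minimal sketch of the older, piece-joining approach. It is a toy under stated assumptions, not any vendor’s system: the unit inventory, the sample values, and the function names are all made up for illustration.

```python
# Hypothetical inventory: each text unit maps to a pre-recorded audio snippet.
# The sample values are invented; a real inventory would hold hours of recordings.
UNIT_AUDIO = {
    "ho": [0.1, 0.3, 0.2],
    "la": [0.4, 0.1],
}


def split_into_units(text, inventory):
    """Greedily split text into the longest units present in the inventory."""
    units = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in inventory:
                units.append(text[i:j])
                i = j
                break
        else:
            # No recording covers this part of the text.
            raise KeyError(f"no recorded unit covers {text[i:]!r}")
    return units


def concatenate_units(text, inventory=UNIT_AUDIO):
    """Join the recorded snippets for each unit, end to end."""
    samples = []
    for unit in split_into_units(text, inventory):
        samples.extend(inventory[unit])
    return samples
```

Calling `concatenate_units("hola")` joins the snippets for "ho" and "la". Any text the inventory does not cover raises an error, which hints at why this approach needed so much recorded data.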

Neural TTS, by contrast, sounds more realistic because machine learning models convert the text to voice. First, the text goes into the system and passes through an acoustic generator. From there, the output goes to a vocoder, where the sound is produced.
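The flow of data through those stages can be sketched in a few lines. This is only an illustration of the pipeline’s shape: both the acoustic generator and the vocoder below are simple stand-ins (a lookup formula and a sine-wave renderer), not neural models, and every name and constant is an assumption of mine.

```python
import math

SAMPLE_RATE = 8000    # samples per second (assumed)
FRAME_SAMPLES = 400   # audio samples produced per symbol (assumed)


def text_frontend(text):
    """Stage 1: normalize the text into a sequence of symbols."""
    return [ch for ch in text.lower() if ch.isalpha() or ch == " "]


def acoustic_generator(symbols):
    """Stage 2 (stand-in): map each symbol to one acoustic feature, a pitch in Hz.

    A neural model would predict much richer features from context;
    this formula only keeps the example self-contained.
    """
    return [0.0 if s == " " else 200.0 + 10.0 * (ord(s) % 20) for s in symbols]


def vocoder(pitches):
    """Stage 3 (stand-in): render the features as waveform samples."""
    samples = []
    phase = 0.0
    for pitch in pitches:
        for _ in range(FRAME_SAMPLES):
            phase += 2 * math.pi * pitch / SAMPLE_RATE
            # A pitch of 0.0 marks silence.
            samples.append(math.sin(phase) if pitch else 0.0)
    return samples


def synthesize(text):
    """Run the three stages in order: text, acoustic features, audio."""
    return vocoder(acoustic_generator(text_frontend(text)))
```

In a real neural system, the second and third stages would be trained models, which is where the adaptability to accents and speaking styles would come from.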

With this comes the ability to train machines to adapt to unique speech styles, just as a human could. Where a person might spend a year in Argentina to learn the regional accent, the neural model lets the machine master these nuances in just a few hours. Overall, this process is more streamlined than its predecessor.

Behind this service are machine learning models that convert text into culturally specific voices. Once the text is encoded as a string of characters, the model turns it into a sequence of “cepstrum coefficients,” a compact description of the sound’s frequency content. The vocoder then turns these coefficients into a continuous audio signal.
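For readers curious about what a cepstrum coefficient is, the sketch below computes the textbook “real cepstrum” of a short audio frame: the inverse Fourier transform of the log of the frame’s magnitude spectrum. Vozy has not said how its coefficients are computed, so treat this as a generic illustration; the function names and the `floor` parameter are my own.

```python
import cmath
import math


def dft(frame):
    """Plain discrete Fourier transform (slow, but dependency-free)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Return the real cepstrum: inverse DFT of the log magnitude spectrum."""
    n = len(frame)
    # Clamp tiny magnitudes so that log() never receives zero.
    log_magnitude = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [
        sum(log_magnitude[k] * cmath.exp(2j * cmath.pi * k * t / n)
            for k in range(n)).real / n
        for t in range(n)
    ]
```

The first few coefficients summarize the broad shape of the spectrum, which is why a short list of numbers per frame can stand in for a slice of speech.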

Voice recognition

Equipped with this communication solution, companies will be better able to serve customers in the Spanish-speaking world. All in all, the Colombian startup combines voice technology, AI and human understanding to develop personalized customer interactions at scale.

So far, the neural TTS technology is available in eight accents. These reportedly include Colombian, Mexican, Argentine, Peruvian and Puerto Rican, among others. Today, Vozy has more than 200 customers in 15 countries, including MAPFRE and Infopáginas in Puerto Rico.

Recently, Vozy raised some funds from the Puerto Rico Science, Technology and Research Trust after collaborating with the Parallel18 accelerator. According to Vozy, it’s the only Latin American company providing this type of technology for the Spanish language.