Text To Speech in Arduino - Final Conclusions

In my last few blog posts I compared the following Text To Speech (TTS) libraries that are available for Arduino:

SAM (Software Automatic Mouth)
TTS (Text-to-Speech Library for Arduino)
Flite (Festival Lite)

I was hoping to find some TinyML-based implementations, but so far without success. I have put this on my to-do list for some long, cold winter days.

In conclusion, the sound quality is directly related to the memory consumption, so we might never get high-quality speech generated on microcontrollers because we just don’t have enough memory available. I think there is a good reason why Google and Amazon only provide their TTS functionality over the network.

An alternative approach might be to record all required words, store them on an SD card and just use these recordings to generate the sound output, as demonstrated in my arduino-simple-tts project.
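
The idea of speaking from recordings can be sketched in plain C++ roughly as follows. This is only an illustration and not the API of arduino-simple-tts: the function name `wordsToFiles`, the `/words/` directory and the one-file-per-word naming scheme are assumptions made for this example.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: turns a sentence into the list of WAV files to play.
// Assumes one recording per word, stored as /words/<lowercase word>.wav.
std::vector<std::string> wordsToFiles(const std::string& text) {
  std::vector<std::string> files;
  std::string word;
  for (char c : text) {
    unsigned char uc = static_cast<unsigned char>(c);
    if (std::isalnum(uc)) {
      // collect letters and digits of the current word in lower case
      word += static_cast<char>(std::tolower(uc));
    } else if (!word.empty()) {
      // a space or punctuation mark ends the current word
      files.push_back("/words/" + word + ".wav");
      word.clear();
    }
  }
  if (!word.empty()) files.push_back("/words/" + word + ".wav");
  return files;
}
```

Calling `wordsToFiles("Hello, my name is Alice")` gives five file names, starting with `/words/hello.wav`. Playing them one after the other produces the sentence, but only for words that have actually been recorded.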

I think the best option for dynamically generated TTS is to delegate the “Speech Generation” (and maybe even the output) to a separate machine. Even a Raspberry Pi makes all the difference, and there are plenty of resources on the internet that cover this topic.

My TTS projects of choice are:

Rhasspy, which provides several TTS implementations and a simple REST API.
Mozilla TTS, which implements some state-of-the-art models.

Sending a POST request to the Rhasspy URL “http://address:12101/api/text-to-speech” returns a WAV file. Here is the corresponding Arduino sketch, which sends the request to Rhasspy and passes the result to I2S:

#include "AudioTools.h"
#include "CodecWAV.h"

using namespace audio_tools;  

// UrlStream -copy-> AudioOutputStream -> WAVDecoder -> I2S
URLStream url("ssid","password");
I2SStream i2s;                  // I2S stream 
WAVDecoder decoder(i2s);        // decode wav to pcm and send it to I2S
AudioOutputStream out(decoder); // output to decoder
StreamCopy copier(out, url);    // copy in to out


void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Debug);  

  // setup i2s output
  auto config = i2s.defaultConfig(TX_MODE);
  config.sample_rate = 16000; 
  config.bits_per_sample = 16;
  config.channels = 1;
  i2s.begin(config);

  // rhasspy: post the text, the response is the WAV data
  url.begin("http://192.168.1.37:12101/api/text-to-speech?play=false", POST, "text/plain","Hallo, my name is Alice");
}

void loop(){
  // copy audio from url -> i2s
  if (!copier.copy()) {
    i2s.end();
    LOGI("stopped");
    stop();
  }
}

This sketch (which is part of the arduino-audio-tools library) is also available on GitHub.
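
The I2S configuration in the sketch is fixed at 16000 Hz, 16 bits and 1 channel, while a WAV file describes its own format in its header. If the audio plays too fast or too slow, it may be worth checking these header values. The following is a minimal sketch in plain C++ that reads them, assuming the canonical 44-byte PCM header layout. The names `WavInfo` and `parseWavHeader` are made up for this example and are not part of arduino-audio-tools, and real files may contain additional chunks that this sketch does not handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Format fields of a canonical 44-byte PCM WAV header
struct WavInfo {
  uint16_t channels = 0;
  uint32_t sampleRate = 0;
  uint16_t bitsPerSample = 0;
};

// WAV stores all numbers in little-endian byte order
static uint16_t readLE16(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

static uint32_t readLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: returns true and fills info if the buffer starts with
// "RIFF", "WAVE" and "fmt " at the offsets of the canonical layout.
bool parseWavHeader(const uint8_t* data, size_t len, WavInfo& info) {
  if (len < 44) return false;
  if (std::memcmp(data, "RIFF", 4) != 0) return false;
  if (std::memcmp(data + 8, "WAVE", 4) != 0) return false;
  if (std::memcmp(data + 12, "fmt ", 4) != 0) return false;
  info.channels = readLE16(data + 22);       // number of channels
  info.sampleRate = readLE32(data + 24);     // samples per second
  info.bitsPerSample = readLE16(data + 34);  // bits per sample
  return true;
}
```

On a microcontroller the first 44 bytes of the response could be passed to such a function before the audio data is forwarded, and the result compared with the values used in `i2s.begin(config)`.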

If your microcontroller does not support I2S you can use the following output classes instead:

AnalogAudioStream
PWMAudioOutput
VS1053Stream
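
Analog and PWM outputs usually work with unsigned values of a limited resolution (the built-in DAC of the ESP32, for example, has 8 bits), whereas the decoded WAV data consists of signed 16-bit samples. The following plain C++ sketch shows the kind of conversion this implies. It is an illustration under that assumption, not the code used by the output classes listed above, and the function name `pcm16ToUnsigned8` is made up for this example.

```cpp
#include <cstdint>

// Converts a signed 16-bit PCM sample (-32768..32767) to an unsigned 8-bit
// value (0..255): shift the range so it starts at 0, then keep the top 8 bits.
uint8_t pcm16ToUnsigned8(int16_t sample) {
  uint16_t shifted = static_cast<uint16_t>(static_cast<int32_t>(sample) + 32768);
  return static_cast<uint8_t>(shifted >> 8);
}
```

Silence (a sample value of 0) ends up in the middle of the output range at 128, and the lower 8 bits of each sample are simply dropped, which is one reason why such outputs sound noisier than I2S.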

Addendum

A lot has happened since I wrote this library. The generic TTS Arduino library with the best audio quality by far is my arduino-espeak-ng.

Here is the updated list of all my blog posts that cover TTS on microcontrollers.

Text To Speech in Arduino – Final Conclusions

Published by pschatzmann on 23. June 2021
