In my last couple of blog posts I compared the following text-to-speech (TTS) libraries that are available for Arduino:

  • SAM (Software Automatic Mouth)
  • TTS (Text-to-Speech Library for Arduino)
  • Flite (Festival Lite)

I was hoping to find some TinyML-based implementations, but so far without success. I have put this on my to-do list for some long, cold winter days.

In conclusion, the sound quality is directly related to the memory consumption, so we might never get high-quality speech generated on microcontrollers because we just don’t have enough memory available. I think there is a good reason why Google and Amazon provide their TTS functionality only over the network.

An alternative approach might be to record all required words, store them on an SD card and just play back these recordings to generate the sound output.
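
The recorded-words idea boils down to a lookup from each word to a file name. Here is a minimal sketch of that step only. The `/words/<word>.wav` layout and the `wordFiles` helper are assumptions made up for illustration; they are not part of any library, and the actual playback from the SD card is left out:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical layout: one recording per word, e.g. /words/hello.wav
// Splits a sentence into lower-case words and returns the file to play for each.
std::vector<std::string> wordFiles(const std::string& sentence,
                                   const std::string& dir = "/words") {
  std::vector<std::string> files;
  std::string word;

  // Turn the collected characters into a file name and start a new word
  auto flush = [&]() {
    if (!word.empty()) {
      files.push_back(dir + "/" + word + ".wav");
      word.clear();
    }
  };

  for (unsigned char c : sentence) {
    if (std::isalnum(c)) {
      word += static_cast<char>(std::tolower(c));
    } else {
      flush();  // whitespace or punctuation ends the current word
    }
  }
  flush();  // last word of the sentence
  return files;
}
```

Calling `wordFiles("Hallo, my name is Alice")` gives five file names, from `/words/hallo.wav` to `/words/alice.wav`, which could then be played one after the other. Any word without a recording would need some fallback, which is the main limitation of this approach.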

I think the best option for dynamically generated TTS is to delegate the “speech generation” (and maybe even the output) to a separate machine. A Raspberry Pi already makes all the difference, and there are plenty of resources on the internet that cover this topic.

My TTS projects of choice are:

  • Rhasspy, which provides multiple different TTS implementations and a simple REST API
  • Mozilla TTS, which implements some state-of-the-art models

Sending a POST request to the Rhasspy URL “http://address:12101/api/text-to-speech” returns a WAV file. Here is the corresponding Arduino sketch, which sends the request to Rhasspy and passes the output to I2S:

#include "AudioTools.h"
#include "CodecWAV.h"

using namespace audio_tools;  

// Data flow: URLStream -copy-> AudioOutputStream -> WAVDecoder -> I2SStream
URLStream url("ssid","password");
I2SStream i2s;                  // I2S stream 
WAVDecoder decoder(i2s);        // decode wav to pcm and send it to I2S
AudioOutputStream out(decoder); // output to decoder
StreamCopy copier(out, url);    // copy in to out


void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Debug);  

  // set up the I2S output (16 kHz, 16 bit, mono)
  auto config = i2s.defaultConfig(TX_MODE);
  config.sample_rate = 16000; 
  config.bits_per_sample = 16;
  config.channels = 1;
  i2s.begin(config);

  // send the text to Rhasspy; the response body is the WAV audio
  url.begin("http://192.168.1.37:12101/api/text-to-speech?play=false", POST, "text/plain","Hallo, my name is Alice");
}

void loop(){
  // copy audio from url -> i2s
  if (!copier.copy()) {
    i2s.end();
    LOGI("stopped");
    stop();
  }
}

This sketch (which is part of the arduino-audio-tools library) is also available on GitHub.
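
The sketch hard-codes 16 kHz, 16 bit and mono for I2S, so it only sounds right if the WAV file returned by the server uses the same format. The format is stored in the WAV header, and the following sketch shows how it could be read. It assumes the canonical 44-byte PCM header, where the "fmt " chunk directly follows "RIFF"/"WAVE"; real files may contain extra chunks. `WavInfo` and `parseWavHeader` are names made up for this illustration, and this is not how the library's WAVDecoder is implemented:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Audio format as described by a WAV header
struct WavInfo {
  uint16_t channels;
  uint32_t sampleRate;
  uint16_t bitsPerSample;
};

// WAV stores all numbers in little-endian byte order
static uint16_t readLE16(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

static uint32_t readLE32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Reads the format from a canonical 44-byte PCM WAV header.
// Returns false if the data is too short or does not look like PCM WAV.
bool parseWavHeader(const uint8_t* data, size_t len, WavInfo& info) {
  if (len < 44) return false;
  if (std::memcmp(data, "RIFF", 4) != 0) return false;
  if (std::memcmp(data + 8, "WAVE", 4) != 0) return false;
  if (std::memcmp(data + 12, "fmt ", 4) != 0) return false;
  if (readLE16(data + 20) != 1) return false;  // 1 = uncompressed PCM

  info.channels = readLE16(data + 22);
  info.sampleRate = readLE32(data + 24);
  info.bitsPerSample = readLE16(data + 34);
  return true;
}
```

With the first 44 bytes of the response in a buffer, the resulting `WavInfo` values could be compared against (or copied into) the I2S configuration instead of relying on fixed numbers.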
