In my last couple of blog posts I compared the following Text To Speech (TTS) libraries which are available on Arduino:
I was hoping to find some TinyML-based implementations, but so far without success, so I put this on my to-do list for some long, cold winter days.
In conclusion, the sound quality is directly related to the memory consumption, so we might never get high-quality speech generated on microcontrollers because we just don’t have enough memory available. I think there is a good reason why Google and Amazon only provide their TTS functionality over the network.
An alternative approach might be to record all required words, store them on an SD card and just play back these recordings to generate the sound output, as demonstrated in my arduino-simple-tts project.
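To make the idea of pre-recorded words a bit more concrete, here is a minimal sketch in plain C++ of how a sentence could be turned into a list of recordings to play. This is not the API of arduino-simple-tts: the function name `sentenceToFiles` and the `/words/<word>.wav` directory layout are assumptions made up for illustration only.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: maps each word of a sentence to one recording.
// The "/words/<word>.wav" layout is an assumption, not a library convention.
std::vector<std::string> sentenceToFiles(const std::string& sentence,
                                         const std::string& dir = "/words/") {
  std::vector<std::string> files;
  std::istringstream in(sentence);
  std::string token;
  while (in >> token) {  // split on whitespace
    std::string word;
    for (unsigned char c : token) {
      // keep letters and digits only, lower-cased, so "Alice," becomes "alice"
      if (std::isalnum(c)) word += static_cast<char>(std::tolower(c));
    }
    if (!word.empty()) files.push_back(dir + word + ".wav");
  }
  return files;
}
```

The obvious limitation of this approach is that every word has to be recorded up front: a word without a matching file simply cannot be spoken.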
I think the best option for dynamically generated TTS is to delegate the “speech generation” (and maybe even the output) to a separate machine: a Raspberry Pi already makes all the difference, and there are plenty of resources on the internet that cover this topic.
My TTS projects of choice are:
- Rhasspy, which provides multiple TTS implementations and a simple REST API.
- Mozilla TTS, which implements some state-of-the-art models.
Sending a POST request to the Rhasspy URL “http://address:12101/api/text-to-speech” returns a WAV file. Here is the corresponding Arduino sketch, which sends the request to Rhasspy and forwards the output to I2S:
#include "AudioTools.h"
#include "CodecWAV.h"
using namespace audio_tools;
// UrlStream -copy-> AudioOutputStream -> WAVDecoder -> I2S
URLStream url("ssid","password");
I2SStream i2s; // I2S stream
WAVDecoder decoder(i2s); // decode wav to pcm and send it to I2S
AudioOutputStream out(decoder); // output to decoder
StreamCopy copier(out, url); // copy in to out
void setup() {
  Serial.begin(115200);
  AudioLogger::instance().begin(Serial, AudioLogger::Debug);

  // setup i2s output
  auto config = i2s.defaultConfig(TX_MODE);
  config.sample_rate = 16000;
  config.bits_per_sample = 16;
  config.channels = 1;
  i2s.begin(config);

  // rhasspy
  url.begin("http://192.168.1.37:12101/api/text-to-speech?play=false", POST, "text/plain", "Hallo, my name is Alice");
}

void loop() {
  // copy audio from url -> i2s
  if (!copier.copy()) {
    i2s.end();
    LOGI("stopped");
    stop();
  }
}
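The I2S output in the sketch above is hard-coded to 16000 Hz, 16 bits and 1 channel, so it presumably expects the returned WAV file to use exactly this format. As a side note, here is a minimal sketch in plain C++ (not part of arduino-audio-tools) of how these values could be read from the start of a WAV file. It assumes the canonical 44-byte PCM header where the "fmt " chunk directly follows the RIFF header; real files may contain extra chunks that this does not handle. The names `WavInfo` and `parseWavHeader` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical container for the three values the I2S config needs
struct WavInfo {
  std::uint16_t channels;
  std::uint32_t sample_rate;
  std::uint16_t bits_per_sample;
};

// WAV stores all numbers in little-endian byte order
static std::uint16_t readLE16(const std::uint8_t* p) {
  return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

static std::uint32_t readLE32(const std::uint8_t* p) {
  return static_cast<std::uint32_t>(p[0]) |
         (static_cast<std::uint32_t>(p[1]) << 8) |
         (static_cast<std::uint32_t>(p[2]) << 16) |
         (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true and fills info if data starts with a canonical WAV header
bool parseWavHeader(const std::uint8_t* data, std::size_t len, WavInfo& info) {
  if (len < 44) return false;                                 // header incomplete
  if (std::memcmp(data, "RIFF", 4) != 0) return false;        // container id
  if (std::memcmp(data + 8, "WAVE", 4) != 0) return false;    // format id
  if (std::memcmp(data + 12, "fmt ", 4) != 0) return false;   // assumed chunk position
  info.channels = readLE16(data + 22);
  info.sample_rate = readLE32(data + 24);
  info.bits_per_sample = readLE16(data + 34);
  return true;
}
```

If the parsed values differ from the I2S configuration, the audio would most likely play at the wrong speed or sound distorted, so a check like this can help when debugging.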
The Arduino sketch above, which is part of the arduino-audio-tools library, is also available on GitHub.
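For readers who want to see what such a POST request looks like on the wire, here is a minimal sketch in plain C++ that builds the request text for the Rhasspy endpoint. It only illustrates the general shape of an HTTP/1.1 POST with a plain-text body. I have not verified that URLStream sends exactly these headers, and the function name `buildTtsRequest` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: builds the text of an HTTP/1.1 POST request
// that carries the sentence to speak as a plain-text body.
std::string buildTtsRequest(const std::string& host, int port,
                            const std::string& text) {
  std::string req;
  req += "POST /api/text-to-speech?play=false HTTP/1.1\r\n";
  req += "Host: " + host + ":" + std::to_string(port) + "\r\n";
  req += "Content-Type: text/plain\r\n";
  // the body length in bytes must be announced up front
  req += "Content-Length: " + std::to_string(text.size()) + "\r\n";
  req += "Connection: close\r\n";
  req += "\r\n";  // an empty line separates the headers from the body
  req += text;
  return req;
}
```

Calling `buildTtsRequest("192.168.1.37", 12101, "Hallo, my name is Alice")` gives the same kind of request that the sketch triggers with `url.begin(...)`; the response body is then the WAV data that gets decoded and copied to I2S.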