In my last blogs I looked at SAM and Arduino/TTS. I had high hopes for CMU Flite:

CMU Flite (festival-lite) is a small, fast run-time open source text to speech synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative text to speech synthesis engine to Festival for voices built using the FestVox suite of voice building tools.

I also extended the project to provide a simple API and added some additional output options, so that the audio data can be received as a stream. My extended project can be found on GitHub.
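To illustrate why stream output matters on a microcontroller, here is a minimal host-side sketch in plain C++. It is not the Flite or arduino-audio-tools API: `AudioSink`, `CountingSink` and `synthesizeChunked` are hypothetical names, loosely modeled on Arduino's `Print::write(buffer, size)`. A fake synthesizer writes 16-bit samples of silence in fixed-size chunks, so only one chunk has to be buffered instead of the whole utterance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical output interface, loosely modeled on Arduino's Print::write(buffer, size).
struct AudioSink {
  virtual ~AudioSink() = default;
  virtual size_t write(const uint8_t *data, size_t len) = 0;
};

// A sink that only counts bytes and remembers the largest single write.
struct CountingSink : AudioSink {
  size_t total = 0;
  size_t largestWrite = 0;
  size_t write(const uint8_t *data, size_t len) override {
    (void)data;  // a real sink would forward the bytes, e.g. to an HTTP client
    total += len;
    largestWrite = std::max(largestWrite, len);
    return len;
  }
};

// Hypothetical stand-in for a synthesizer: emits `samples` 16-bit samples of
// silence, at most `chunkSamples` at a time, so only one chunk is ever buffered.
void synthesizeChunked(AudioSink &out, size_t samples, size_t chunkSamples) {
  assert(chunkSamples > 0);
  std::vector<int16_t> chunk(chunkSamples, 0);
  size_t remaining = samples;
  while (remaining > 0) {
    size_t n = std::min(remaining, chunkSamples);
    out.write(reinterpret_cast<const uint8_t *>(chunk.data()), n * sizeof(int16_t));
    remaining -= n;
  }
}
```

For example, 3 seconds of mono audio at 8000 Hz (24000 samples) written in chunks of 256 samples produces 48000 bytes in total, but no single write is larger than 512 bytes.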

As with SAM and TTS, the Arduino sketch for the web server is equally small (thanks to my arduino-audio-tools):

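#include "flite_arduino.h"
#include "AudioServer.h"

using namespace audio_tools;

// WAV web server: replace with your WiFi credentials
AudioWAVServer server("ssid","password");

// Callback which provides the audio data: Flite writes the
// synthesized samples directly to the output stream
void outputData(Stream &out){
  Serial.print("providing data...");
  Flite flite(out);
  flite.say("Hallo, my name is Alice");
}

void setup(){
  Serial.begin(115200);
  // start data sink: 8000 Hz, 1 channel (mono), 16 bits per sample
  server.begin(outputData, 8000,1,16);
}

// Arduino loop
void loop() {
  // Handle new connections
  server.doLoop();
}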
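The three numbers passed to `server.begin()` are presumably the sample rate (8000 Hz), the number of channels (1) and the bits per sample (16). As its name suggests, `AudioWAVServer` presumably prefixes the PCM data with a WAV header. The plain C++ sketch below is not the library's code; `makeWavHeader` is a hypothetical helper that builds the canonical 44-byte RIFF/WAVE PCM header for these parameters. It also shows that this format needs 16000 bytes per second of speech.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Write a value of `bytes` bytes in little-endian order, as RIFF requires.
static void putLE(std::array<uint8_t, 44> &h, int pos, uint32_t value, int bytes) {
  for (int i = 0; i < bytes; i++) {
    h[pos + i] = static_cast<uint8_t>((value >> (8 * i)) & 0xFF);
  }
}

// Hypothetical helper: build a canonical 44-byte PCM WAV header.
// dataSize is the number of PCM bytes that follow the header.
std::array<uint8_t, 44> makeWavHeader(uint32_t sampleRate, uint16_t channels,
                                      uint16_t bitsPerSample, uint32_t dataSize) {
  assert(bitsPerSample % 8 == 0);
  const uint16_t blockAlign = static_cast<uint16_t>(channels * bitsPerSample / 8);  // bytes per frame
  const uint32_t byteRate = sampleRate * blockAlign;                                 // bytes per second

  std::array<uint8_t, 44> h{};
  auto putTag = [&h](int pos, const char *tag) {
    for (int i = 0; i < 4; i++) h[pos + i] = static_cast<uint8_t>(tag[i]);
  };
  putTag(0, "RIFF");
  putLE(h, 4, 36 + dataSize, 4);  // RIFF chunk size: everything after this field
  putTag(8, "WAVE");
  putTag(12, "fmt ");
  putLE(h, 16, 16, 4);            // size of the fmt chunk for PCM
  putLE(h, 20, 1, 2);             // audio format 1 = uncompressed PCM
  putLE(h, 22, channels, 2);
  putLE(h, 24, sampleRate, 4);
  putLE(h, 28, byteRate, 4);
  putLE(h, 32, blockAlign, 2);
  putLE(h, 34, bitsPerSample, 2);
  putTag(36, "data");
  putLE(h, 40, dataSize, 4);      // number of PCM bytes that follow
  return h;
}
```

Calling `makeWavHeader(8000, 1, 16, 16000)` describes exactly one second of audio in the format used by the sketch. When streaming, the total data size is not known up front; how the library fills that field is not shown here.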

I was not disappointed: this is the best voice quality so far.

But it comes at the cost of size:

Sketch uses 2730326 bytes (86%) of program storage space. Maximum is 3145728 bytes.
Global variables use 38956 bytes (11%) of dynamic memory, leaving 288724 bytes for local variables. Maximum is 327680 bytes.

This might be just at the limit for an ESP32, but it is already too much for a Raspberry Pi Pico, which has only 2 MB of flash…
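The percentages in the build output are used/maximum, truncated to a whole percent. The short C++ sketch below reproduces them and checks whether the same binary would fit into the Pico's flash. The helper names are illustrative, and the 2 MB Pico figure is an upper bound I added: the space actually available to a sketch can be smaller, depending on how much flash the core reserves for a file system.

```cpp
#include <cassert>
#include <cstdint>

// Percentage of a budget that is used, truncated to a whole number
// (integer division, matching the 86% and 11% in the build output).
uint32_t percentUsed(uint64_t used, uint64_t maximum) {
  assert(maximum > 0);
  return static_cast<uint32_t>(used * 100 / maximum);
}

// Figures from the ESP32 build output above.
constexpr uint64_t kSketchBytes = 2730326;
constexpr uint64_t kEsp32ProgramMax = 3145728;  // = 3 * 1024 * 1024
constexpr uint64_t kGlobalsBytes = 38956;
constexpr uint64_t kEsp32RamMax = 327680;

// Assumption: the Raspberry Pi Pico board has 2 MB of on-board flash,
// which is an upper bound for the space available to a sketch.
constexpr uint64_t kPicoFlashBytes = 2 * 1024 * 1024;

bool fitsInFlash(uint64_t sketchBytes, uint64_t flashBytes) {
  return sketchBytes <= flashBytes;
}
```

With these numbers the sketch is 633174 bytes larger than the Pico's entire 2 MB flash, so it cannot fit as it is.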

