ESpeak and MBROLA?

MBROLA is a speech synthesizer based on the concatenation of diphones. It takes a list of phonemes as input, together with prosodic information (duration of phonemes and a piecewise linear description of pitch), and produces speech samples on 16 bits (linear), at the sampling frequency of the diphone database.
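To make the input described above more concrete, here is a minimal sketch that writes a phoneme list in the plain-text `.pho` layout MBROLA reads: one phoneme per line, its duration in milliseconds, then optional pairs of (position in percent of the duration, pitch in Hz). The helper name `to_pho` and the phoneme symbols are my own illustrative assumptions; the symbols that are actually valid depend on the diphone database you use.

```python
from typing import List, Sequence, Tuple

# (phoneme symbol, duration in ms, [(position in % of duration, pitch in Hz), ...])
Phoneme = Tuple[str, int, Sequence[Tuple[int, float]]]


def to_pho(phonemes: List[Phoneme]) -> str:
    """Render phonemes as .pho text: 'symbol duration [pos pitch]...' per line."""
    lines = []
    for symbol, duration_ms, pitch_points in phonemes:
        fields = [symbol, str(duration_ms)]
        for position_percent, pitch_hz in pitch_points:
            fields.append(str(position_percent))
            fields.append(f"{pitch_hz:g}")  # drop a trailing .0 on whole numbers
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical phoneme sequence; "_" denotes silence in MBROLA input.
# The other symbols are illustrative and may not match a given voice.
hello = [
    ("_", 100, []),
    ("h", 80, [(0, 120)]),
    ("@", 60, []),
    ("l", 70, []),
    ("@U", 200, [(0, 125), (100, 100)]),
    ("_", 100, []),
]

if __name__ == "__main__":
    print(to_pho(hello), end="")
```

The pitch points are what the description calls the piecewise linear pitch: MBROLA interpolates between them, so a phoneme without any points simply continues the line drawn by its neighbours.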

Today I was playing around with MBROLA and was impressed by the quality of the speech. ESpeak provides an integration with the MBROLA speech synthesizer, so I was considering integrating this implementation into my library as well.

But then I realised that the corresponding voices need between 5 and 20 MB of memory. Unfortunately this is much more than we have available on any microcontroller!
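The arithmetic behind this is simple, but a rough sketch makes the gap visible. The voice sizes are the 5 and 20 MB bounds from above; the memory figures are assumptions chosen for illustration (264 KB is the on-chip SRAM of an RP2040, and 16 MB stands in for a board with a large external flash chip), and the names `fits` and `TARGETS` are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Voice database sizes taken from the range quoted in this post.
VOICE_SIZES = {"small voice": 5 * MB, "large voice": 20 * MB}

# Illustrative targets (assumptions, not measurements).
TARGETS = {
    "RP2040 on-chip SRAM": 264 * KB,
    "board with 16 MB external flash": 16 * MB,
}


def fits(voice_bytes: int, available_bytes: int) -> bool:
    """True if the voice data alone fits; ignores program code and buffers."""
    return voice_bytes <= available_bytes


if __name__ == "__main__":
    for target, available in TARGETS.items():
        for voice, size in VOICE_SIZES.items():
            verdict = "fits" if fits(size, available) else "does not fit"
            print(f"{voice} ({size // MB} MB) {verdict} in {target}")
```

Note that this only compares raw sizes: even where the data fits in flash, the engine and any other firmware would have to share that space.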

Bad luck…

2 Comments

knghtbrd · 26. August 2025 at 19:33

Notably, there are rp2040 boards which include 16MB of flash in addition to what the chip gives you. If you can load the voice data into the extra flash, you’ll likely have enough program space for the engine.

pschatzmann · 26. August 2025 at 22:22

Google gives this: For instance, a single voice file can be tens to hundreds of megabytes, depending on the complexity and extent of the language data.

Published by pschatzmann on 19. November 2022
