MBROLA is a speech synthesizer based on the concatenation of diphones. It takes a list of phonemes as input, together with prosodic information (duration of phonemes and a piecewise linear description of pitch), and produces speech samples on 16 bits (linear), at the sampling frequency of the diphone database.
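To make the input format more concrete, here is a minimal sketch of how such a phoneme list could be written out as MBROLA `.pho` text. Each line holds a phoneme symbol, its duration in milliseconds, and optional pairs of position (in percent of the duration) and pitch (in Hz). The helper name `pho_line` and the phoneme symbols in the example are my own illustrative assumptions: the valid symbols depend on the diphone database that is used.

```python
from typing import List, Sequence, Tuple

# One pitch point: (position in percent of the phoneme duration, pitch in Hz)
PitchPoint = Tuple[int, int]


def pho_line(phoneme: str, duration_ms: int,
             pitch: Sequence[PitchPoint] = ()) -> str:
    """Format one phoneme as a single line of .pho input (hypothetical helper)."""
    fields: List[str] = [phoneme, str(duration_ms)]
    for position_percent, pitch_hz in pitch:
        fields.append(str(position_percent))
        fields.append(str(pitch_hz))
    return " ".join(fields)


# Illustrative utterance; the phoneme symbols are assumptions, not taken
# from a specific voice. "_" denotes silence.
utterance = [
    ("_", 100, []),
    ("h", 60, []),
    ("a", 120, [(0, 110), (100, 130)]),  # pitch rises from 110 Hz to 130 Hz
    ("_", 100, []),
]

pho_text = "\n".join(pho_line(p, d, pts) for p, d, pts in utterance)
print(pho_text)
```

The resulting text is what would be passed to MBROLA together with a voice database; the synthesizer itself is not called here.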
Today I was playing around with MBROLA and was impressed by the quality of the speech. eSpeak provides an integration with the MBROLA speech synthesizer, so I was considering integrating this implementation into my library as well.
But then I realised that the corresponding voices need between 5 and 20 MB of memory. Unfortunately, this is much more than we have available on any microcontroller!