I converted eSpeak NG to an Arduino Library.

eSpeak NG is a compact open source software text-to-speech synthesizer for Linux, Windows, Android and other operating systems. It supports more than 100 languages and accents. It is based on the eSpeak engine created by Jonathan Duddington.

eSpeak NG uses a “formant synthesis” method. This allows many languages to be provided in a small size. The speech is clear and can be used at high speeds, but it is not as natural or smooth as larger synthesizers that are based on human speech recordings. It also supports Klatt formant synthesis and can use MBROLA as a backend speech synthesizer.

Overview

The first step in converting the espeak-ng project to an Arduino library was quite easy. I just needed to move the files into the src directory so that Arduino can find them, and delete some implementation files that got in the way.

The project now compiled, but nothing was working yet. The major challenges were:

  • The project reads and processes an unspecified number of configuration files with the C file API.
  • For the audio output, the project uses pcaudiolib.
  • The program crashed with memory errors.
  • The C++ API did not produce any audio output.

Handling The Audio Output

This was quite straightforward. All I needed to do was to provide my own pcaudiolib implementation. My first thought was to integrate the AudioTools library, but I did not like the idea of having this specific dependency. In the end I decided to provide an API which is based on Arduino Streams: we can register any Arduino Stream as output and query the audio parameters.

The relevant new methods are:

  audio_info espeak_info = espeak_get_audio_info();
  espeak_set_audio_output(&stream);
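
The design idea, routing the audio through a generic byte sink instead of a specific audio library, can be modeled on the desktop roughly as follows. This is only a sketch under assumptions: ByteSink, BufferSink, set_audio_output and emit_samples are hypothetical names and not part of the library; ByteSink merely stands in for the write side of an Arduino Stream.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Hypothetical stand-in for the write side of an Arduino Stream.
struct ByteSink {
  virtual ~ByteSink() = default;
  virtual size_t write(const uint8_t* data, size_t len) = 0;
};

// Simple sink that collects all bytes in memory (useful for testing).
struct BufferSink : ByteSink {
  std::vector<uint8_t> bytes;
  size_t write(const uint8_t* data, size_t len) override {
    bytes.insert(bytes.end(), data, data + len);
    return len;
  }
};

// Hypothetical: the currently registered output, if any.
static ByteSink* g_audio_out = nullptr;

void set_audio_output(ByteSink* sink) { g_audio_out = sink; }

// Forwards a block of 16-bit PCM samples as raw bytes.
// Returns the number of bytes written (0 if no output is registered).
size_t emit_samples(const int16_t* samples, size_t count) {
  assert(samples != nullptr || count == 0);
  if (g_audio_out == nullptr || count == 0) return 0;
  return g_audio_out->write(reinterpret_cast<const uint8_t*>(samples),
                            count * sizeof(int16_t));
}
```

Because the synthesizer side only depends on the sink interface, the same code can feed I2S, a file, or a network connection, as long as the receiving end knows the sample rate and format.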

Handling Configuration Files

The ESP32 provides a virtual file system, so we can use the regular C file API, e.g. together with an SD card. However, I soon realized that the data is loaded into RAM, and it is just too big for that!

Here is the minimum required data for the English language (sizes in bytes, 814346 bytes in total):

size file
166916 en_dict
2040 intonations
140 lang/en
550424 phondata
39062 phonindex
55764 phontab

The best place to store this constant information is in PROGMEM. I therefore created a new arduino-posix-fs project, whose goal is to make the PROGMEM data accessible through the regular file operations.

Here is the syntax to make the data available as files:

  file_systems::FileSystemMemory fsm("/mem"); // File system data in PROGMEM
  // set up the minimum file system
  fsm.add("/mem/data/phontab", espeak_ng_data_phontab, espeak_ng_data_phontab_len);
  fsm.add("/mem/data/phonindex", espeak_ng_data_phonindex, espeak_ng_data_phonindex_len);
  fsm.add("/mem/data/phondata", espeak_ng_data_phondata, espeak_ng_data_phondata_len);
  fsm.add("/mem/data/intonations", espeak_ng_data_intonations, espeak_ng_data_intonations_len);
  // add language specific files
  fsm.add("/mem/data/en_dict", espeak_ng_data_en_dict, espeak_ng_data_en_dict_len);
  fsm.add("/mem/data/lang/en", espeak_ng_data_lang_gmw_en, espeak_ng_data_lang_gmw_en_len);

The espeak-ng code then accesses this information either via regular file operations or, preferably, with mem_map(), which returns a pointer to the data without copying it into RAM:

  size_t result_size;
  // data points to the file content, result_size receives its length
  uint8_t* data = mem_map("/mem/data/phontab", &result_size);
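
To illustrate the idea behind such a memory-backed file lookup, here is a minimal desktop sketch. It is not the arduino-posix-fs implementation: MemFile, MemFileRegistry and demo_phontab are hypothetical names, and the registry only stores a pointer and a length per path, so no file content is ever copied.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <map>
#include <string>

// Hypothetical entry: a constant byte array and its length.
struct MemFile {
  const uint8_t* data;
  size_t len;
};

// Hypothetical registry that maps file paths to constant data.
class MemFileRegistry {
 public:
  void add(const std::string& path, const uint8_t* data, size_t len) {
    assert(data != nullptr);
    files_[path] = MemFile{data, len};
  }

  // Returns a pointer to the registered bytes and reports their size.
  // Returns nullptr (and size 0) if the path is unknown.
  const uint8_t* map(const std::string& path, size_t* size_out) const {
    auto it = files_.find(path);
    if (it == files_.end()) {
      if (size_out) *size_out = 0;
      return nullptr;
    }
    if (size_out) *size_out = it->second.len;
    return it->second.data;
  }

 private:
  std::map<std::string, MemFile> files_;
};

// Placeholder content; the real data would be a generated const array.
static const uint8_t demo_phontab[] = {0x01, 0x02, 0x03, 0x04};
```

A caller registers each array once at startup and later asks for a path; the returned pointer is the address of the original array, which is what keeps the large phoneme data out of RAM.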

Program Crashes

I traced the crashes to the use of big data structures on the stack. So I needed to make quite a few optimizations, which are activated with #define ESPEAK_STACK_HACK 1 in config.h.
The standard workaround was to move the big data structures from the stack to the heap.
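
The general pattern looks roughly like this. It is a sketch under assumptions, not code from the library: WorkArea and sum_via_heap_buffer are hypothetical, and the 16 KiB size is only chosen to be larger than a typical microcontroller task stack can comfortably hold.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <memory>
#include <new>

// Hypothetical large work area that used to live on the stack.
struct WorkArea {
  uint8_t buffer[16 * 1024];
};

// Before: "WorkArea work;" inside the function, i.e. 16 KiB of stack.
// After: the same structure is allocated on the heap and released
// automatically when the unique_ptr goes out of scope.
bool sum_via_heap_buffer(const uint8_t* input, size_t len, uint32_t* sum_out) {
  assert(sum_out != nullptr);
  assert(input != nullptr || len == 0);

  std::unique_ptr<WorkArea> work(new (std::nothrow) WorkArea());
  if (!work) return false;  // out of memory: report instead of crashing

  // Copy at most one buffer's worth of input into the work area.
  size_t n = len < sizeof(work->buffer) ? len : sizeof(work->buffer);
  if (n > 0) memcpy(work->buffer, input, n);

  uint32_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += work->buffer[i];
  *sum_out = sum;
  return true;
}
```

The trade-off is an allocation per call and the need to handle allocation failure, but the stack usage of the function drops to a few bytes.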

I also declared the provided fixed arrays as const, to make sure that they are stored in PROGMEM.

The API

eSpeak NG provides a function-based C API, so I decided to add a simple C++ API on top of it.

Here is the Arduino Example that is using my Arduino C++ API.

Here as well, I ran into some really strange issues: on the desktop and in PlatformIO the example worked as expected, but in the Arduino IDE it did not produce any audio output. This was pretty nasty to resolve, since I needed to add plenty of println() statements to find out which method was causing the issue. It turned out that the espeak-ng FindReplacementChars() method was running into an endless loop.

I still don’t know the root cause, but I added some additional break logic to prevent the issue.
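
As a generic illustration of this kind of defensive break logic (not the actual change made in FindReplacementChars(), whose internals are not shown here), a table scan can be given an upper bound so that a missing terminator cannot keep it running forever. The function find_replacement, the table layout and the max_pairs parameter are all hypothetical.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical table layout: (from, to) pairs, terminated by from == 0.
// Returns the replacement for c, or c itself if no entry matches or if
// the guard limit is reached before a terminator is seen.
uint32_t find_replacement(const uint32_t* pairs, uint32_t c, size_t max_pairs) {
  assert(pairs != nullptr);
  for (size_t i = 0; i < max_pairs; ++i) {
    uint32_t from = pairs[2 * i];
    if (from == 0) break;                    // regular end of table
    if (from == c) return pairs[2 * i + 1];  // match found
  }
  return c;  // no match, or guard limit hit
}
```

The guard does not explain why a loop fails to terminate, but it turns a hang into a harmless "no replacement" result.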

Source Code and Further Information

You can find the source code on GitHub. If you plan to use this library, please read the README.md of the project.

Conclusion

The undertaking turned out to be much more difficult than I initially thought, but I am happy that it is finally working now.

