Vibe Coding: Adding M4A Audio Playback Support to the AudioTools

Last week I was asked whether my Arduino Audio Tools project supports the playback of M4A files. It didn't, but I thought this would be a good opportunity to try out Vibe Coding: the problem space is well documented and well established, so it seemed to be a good fit. Needless to say, at that point I did not know anything about M4A. So I just asked Claude to generate an audio file extractor class for AAC that follows my Container API.

And it did generate quite a lot of detailed code that at first glance looked very promising. But running some tests proved that it did not work, and after looking at the code and the execution logic in the debugger I came to the conclusion that it was hopeless and impossible to fix!

Then I had a rather detailed discussion with ChatGPT, which led me to a better understanding of what needs to be done:

Parse the MP4 container format and handle only the relevant boxes
Build a sample table which gives the sizes of the records to be played back
For AAC, get the profile, sample rate index and channel index, and use them to build an ADTS header that is added in front of each sample
For ALAC, extract the magic cookie and provide it to the decoder
Return the audio data contained in the mdat box so that it can be sent to a decoder.
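
To make the AAC step in the list above more concrete, here is a minimal sketch of how a 7-byte ADTS header can be built from these three values. It is not the code from the AudioTools library: the function name `writeAdtsHeader` and its parameters are made up for this illustration, and it assumes a header without CRC and one AAC frame per ADTS frame.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not the AudioTools API): fills hdr with a 7-byte ADTS
// header for one raw AAC frame.
//   objectType : MPEG-4 audio object type (2 = AAC LC)
//   srIndex    : sampling frequency index (4 = 44100 Hz)
//   channelCfg : channel configuration (2 = stereo)
//   payloadLen : size of the raw AAC frame in bytes
static void writeAdtsHeader(uint8_t hdr[7], uint8_t objectType, uint8_t srIndex,
                            uint8_t channelCfg, size_t payloadLen) {
  // The 13-bit frame length field counts the header as well.
  const uint32_t frameLen = static_cast<uint32_t>(payloadLen) + 7;
  hdr[0] = 0xFF;  // syncword, upper 8 bits
  hdr[1] = 0xF1;  // syncword lower 4 bits, MPEG-4, layer 0, no CRC
  hdr[2] = static_cast<uint8_t>((((objectType - 1) & 0x03) << 6) |
                                ((srIndex & 0x0F) << 2) |
                                ((channelCfg >> 2) & 0x01));
  hdr[3] = static_cast<uint8_t>(((channelCfg & 0x03) << 6) |
                                ((frameLen >> 11) & 0x03));
  hdr[4] = static_cast<uint8_t>((frameLen >> 3) & 0xFF);
  // Lower 3 bits of the length, then buffer fullness set to all ones.
  hdr[5] = static_cast<uint8_t>(((frameLen & 0x07) << 5) | 0x1F);
  hdr[6] = 0xFC;  // rest of buffer fullness, one AAC frame per ADTS frame
}
```

The header is written once per sample and is followed by the sample bytes taken from the mdat box.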

The M4A Audio Format

The M4A audio format is based on the MP4 container structure, which organizes data into a hierarchy of boxes. Each box begins with a 4-byte big-endian size field, followed by a 4-byte type identifier. Container boxes can nest other boxes and typically contain no data of their own, although exceptions exist. In contrast, non-container boxes hold actual data.
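
The box header described above can be read with a few lines of code. The following is only a sketch: `BoxHeader`, `readU32BE` and `parseBoxHeader` are names invented for this example and are not the API of my MP4Parser class. It also handles the case where the size field is 1, which means that the real size follows as a 64-bit value after the type.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper type for this example.
struct BoxHeader {
  uint64_t size;      // total box size in bytes, header included
  char type[5];       // four-character type code, NUL-terminated
  size_t headerSize;  // 8 normally, 16 when a 64-bit size is present
};

// Read a big-endian 32-bit value.
static uint32_t readU32BE(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Parse the box header at the start of data.
// Returns false when not enough bytes are available yet.
static bool parseBoxHeader(const uint8_t* data, size_t len, BoxHeader& out) {
  if (len < 8) return false;
  out.size = readU32BE(data);
  std::memcpy(out.type, data + 4, 4);
  out.type[4] = '\0';
  out.headerSize = 8;
  if (out.size == 1) {  // the real size follows as a 64-bit field
    if (len < 16) return false;
    out.size = (static_cast<uint64_t>(readU32BE(data + 8)) << 32) |
               readU32BE(data + 12);
    out.headerSize = 16;
  }
  return true;
}
```

A size of 0 is also legal and means that the box extends to the end of the file; the sketch passes this value on unchanged and leaves the handling to the caller.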

Audio samples are stored in the mdat box, but to extract them correctly, information from the stsz box is required. The stsz box indicates the total number of samples and either a fixed sample size or, if that fixed size is zero, a table of individual sample sizes. This data is essential to determine how many bytes to read from mdat for each sample. Additionally, the child boxes within the stsd box reveal the audio format, such as alac (Apple Lossless), mp4a (AAC), or mp3.
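
As an illustration of the stsz layout, here is a small sketch that reads the payload of an stsz box (the bytes after the 8-byte box header): 1 byte version and 3 bytes flags, the fixed sample size, the sample count, and then one 32-bit entry per sample when the fixed size is zero. `SampleSizes` and `parseStsz` are hypothetical names for this example; using a `std::vector` is an assumption made for brevity, not a statement about how the library stores the table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a big-endian 32-bit value.
static uint32_t readU32BE(const uint8_t* p) {
  return (static_cast<uint32_t>(p[0]) << 24) |
         (static_cast<uint32_t>(p[1]) << 16) |
         (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// Hypothetical sample table for this example.
struct SampleSizes {
  uint32_t fixedSize = 0;       // non-zero: all samples have this size
  uint32_t count = 0;           // total number of samples
  std::vector<uint32_t> table;  // per-sample sizes when fixedSize == 0

  // Size in bytes of sample i (i must be < count).
  uint32_t sizeOf(uint32_t i) const {
    return fixedSize != 0 ? fixedSize : table[i];
  }
};

// Parse the stsz payload. Returns false when the data is truncated.
static bool parseStsz(const uint8_t* payload, size_t len, SampleSizes& out) {
  if (len < 12) return false;
  out.fixedSize = readU32BE(payload + 4);  // skip version (1) + flags (3)
  out.count = readU32BE(payload + 8);
  out.table.clear();
  if (out.fixedSize != 0) return true;  // no table in this case
  if ((len - 12) / 4 < out.count) return false;
  out.table.reserve(out.count);
  for (uint32_t i = 0; i < out.count; ++i) {
    out.table.push_back(readU32BE(payload + 12 + 4 * static_cast<size_t>(i)));
  }
  return true;
}
```

With 4 bytes per entry, the table for a long track quickly grows to several hundred kilobytes.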

How to eat an elephant: cut him up into little pieces!

So, I decided to split up the problem into different C++ classes:

The MP4 Parser

I expected to find an existing simple, lean and tested MP4 parser that runs on a microcontroller and provides a simple callback API, where I could just plug in some callbacks with the relevant logic. To my big surprise I found nothing!

So I was forced to do this myself, and I wrote and tested my MP4Parser class which, to keep things simple, expects each box to fit into the actual RAM buffer. Here I had quite a few surprises: in some rare cases it did not find a valid box at the expected location, so I added some fallback error handling that just scans for the next box and reports the skipped data under the last box type.
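
One possible way to implement such a fallback scan is sketched below. This is an assumption about how the resynchronization could be done, not the logic of the MP4Parser class: the function `findNextBox` and the list of type codes in `kKnownTypes` are invented for this example. The idea is to move forward byte by byte until a known four-character type code shows up at the position where a box type would be.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Illustrative selection of box types to resynchronize on (an assumption).
static const char* const kKnownTypes[] = {"moov", "trak", "mdia", "minf", "stbl",
                                          "stsd", "stsz", "mdat", "free"};

// Hypothetical helper: returns the offset of the next plausible box header at
// or after start, or len when none is found. A header is 4 bytes of size
// followed by the 4-byte type, so the type is compared at pos + 4.
static size_t findNextBox(const uint8_t* data, size_t len, size_t start) {
  for (size_t pos = start; pos + 8 <= len; ++pos) {
    for (const char* type : kKnownTypes) {
      if (std::memcmp(data + pos + 4, type, 4) == 0) return pos;
    }
  }
  return len;
}
```

The bytes between `start` and the returned offset are the skipped data. A type code can also occur by chance inside binary data, so a real implementation would want an additional plausibility check on the size field.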

As a next step, I extended this with an MP4ParserIncremental class which removes this restriction and can provide the box content incrementally when the buffer is too small. After that functionality was tested, I integrated it back into the MP4Parser.

This was quite simple to test: just execute the test sketch and check whether you find the boxes at the indicated locations!

Extracting the Audio Data

Next I decided to have a separate M4AAudioDemuxer class which implements all the necessary parser callbacks and the data extraction logic, and provides the result as frame entries in a callback. This class basically

forwards the written data to the parser
implements the M4A data extraction logic
and provides the result via a callback

So this class is where the major work was done. With the parsing issues out of the way, I could concentrate on the logical issues. The two major ones were the following:

The logic proposed by Claude for extracting the AAC profile, sample rate index and channel index just led to wrong data. I tried quite a few alternative AI models until one of them proposed a working solution.
The proposed logic for extracting the ALAC magic cookie also gave wrong values, until I figured out that there is an alac box inside the alac box that contains the correct data.
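
For orientation on the first point: in an MP4 file the three AAC values are stored in the AudioSpecificConfig, which is carried in the esds box as the decoder specific info. Its first two bytes hold 5 bits of audio object type, 4 bits of sampling frequency index and 4 bits of channel configuration. The sketch below decodes just these two bytes. The names `AacConfig` and `parseAudioSpecificConfig` are hypothetical, locating the config inside the esds descriptors is not shown, and this is not necessarily the solution that ended up in the M4AAudioDemuxer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this example.
struct AacConfig {
  uint8_t objectType;  // MPEG-4 audio object type (2 = AAC LC)
  uint8_t srIndex;     // sampling frequency index (4 = 44100 Hz)
  uint8_t channelCfg;  // channel configuration (2 = stereo)
};

// Decode the first two bytes of an AudioSpecificConfig.
// Bit layout: oooooiii icccc... (o = object type, i = sampling frequency
// index, c = channel configuration).
static bool parseAudioSpecificConfig(const uint8_t* asc, size_t len,
                                     AacConfig& out) {
  if (len < 2) return false;
  out.objectType = static_cast<uint8_t>(asc[0] >> 3);
  out.srIndex = static_cast<uint8_t>(((asc[0] & 0x07) << 1) | (asc[1] >> 7));
  out.channelCfg = static_cast<uint8_t>((asc[1] >> 3) & 0x0F);
  // The escape values 31 and 15 switch to a longer encoding that this
  // sketch does not handle.
  if (out.objectType == 31 || out.srIndex == 15) return false;
  return true;
}
```

The very common config bytes 0x12 0x10 decode to object type 2, sampling frequency index 4 and channel configuration 2, which is AAC LC at 44100 Hz in stereo.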

With these issues out of the way, the decoding of audio started to work…

The Container API

The final ContainerM4A, which is a ContainerDecoder subclass, was quite simple to implement:

Get a MultiDecoder via the constructor, so that we can support the relevant audio formats: AAC and ALAC
Just subscribe to the data provided by the M4AAudioDemuxer callback
When receiving a Frame from the callback, just select the right decoder and write the data to it.

Example Arduino Sketch

An example sketch can be found on GitHub: I was using an AudioKit for testing, but you can easily adapt this logic to work with any other supported output type.

And there is also an example using the AudioPlayer. You can use a single MultiDecoder for both the AudioPlayer and the ContainerM4A. Or you can use the MultiDecoder only for the ContainerM4A and provide the container to the AudioPlayer. Or, finally, you can use two separate MultiDecoders to specify clearly which audio types the AudioPlayer should support and which types the ContainerM4A should support.

Caveats

The M4A file format needs quite a lot of RAM to store the sample table, so don’t even try this without enough PSRAM!
The M4A file needs to be stored in streaming format, where the mdat box (with the audio data) is at the end. Some files do not follow this logic: those need to be processed twice!

Outlook

I am planning to provide an implementation where the API works with an Arduino File: with this we do not need to store the sample table in memory, but can read it directly from the file when needed. This is much more memory efficient…

Published by pschatzmann on 5. June 2025
