In my last post I wrote about audio containers. But containers are usually also used to mux audio and video together into one file. This made me wonder whether we could play movies on an ESP32.

In fact, there are some projects out there that do exactly this: they use Motion JPEG (MJPEG) for the video and store the audio in a separate file, so they avoid the complexity of containers altogether. An MJPEG video is basically just a sequence of JPEG images that need to be decoded and displayed one after the other.
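To make "just a sequence of JPEGs" concrete: a raw MJPEG byte stream can be split into frames by looking for the JPEG start-of-image marker (0xFF 0xD8) and end-of-image marker (0xFF 0xD9). The helper below is my own simplified illustration, not code from those projects. It assumes a plain concatenation of JPEG images and does not parse marker segments, so something like an embedded EXIF thumbnail could confuse it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Returns the [start, end) byte ranges of the JPEG images in a raw MJPEG buffer.
// Simplified: a frame starts at an SOI marker (FF D8) and ends right after the
// next EOI marker (FF D9). Marker segments are not parsed.
std::vector<std::pair<size_t, size_t>> splitMjpeg(const std::vector<uint8_t>& data) {
  std::vector<std::pair<size_t, size_t>> frames;
  bool inFrame = false;
  size_t start = 0;
  for (size_t i = 0; i + 1 < data.size(); ++i) {
    if (data[i] != 0xFF) continue;
    uint8_t marker = data[i + 1];
    if (!inFrame && marker == 0xD8) {        // SOI: a new image begins
      inFrame = true;
      start = i;
      ++i;                                   // skip the marker byte
    } else if (inFrame && marker == 0xD9) {  // EOI: the current image ends
      frames.emplace_back(start, i + 2);
      inFrame = false;
      ++i;
    }
  }
  return frames;
}
```

Each returned range can then be handed to a JPEG decoder as one complete image.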

But can we improve on this? Actually we can: the Microsoft AVI container format is simple enough to write a memory-efficient parser for, and that's what I have added to my Audio-Tools library. Here is the AVIDecoder class documentation.
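A memory-efficient parser is feasible because AVI is a RIFF file: everything is a chunk with an 8-byte header (a four-character code plus a little-endian 32-bit payload size), payloads are padded to an even length, and LIST chunks such as hdrl and movi simply nest further chunks. A parser therefore only needs to buffer these small headers and can stream each payload straight through. The helpers below are a minimal sketch of reading such headers; they are my own illustration, not taken from the AVIDecoder source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// One RIFF chunk header: a four-character code followed by a
// little-endian 32-bit payload size.
struct ChunkHeader {
  std::string id;     // e.g. "RIFF", "LIST", "00dc", "01wb"
  uint32_t size = 0;  // payload size, excluding the 8-byte header and the pad byte
};

// Decodes an 8-byte chunk header (the caller guarantees 8 readable bytes).
ChunkHeader readChunkHeader(const uint8_t* p) {
  ChunkHeader h;
  h.id.assign(reinterpret_cast<const char*>(p), 4);
  h.size = uint32_t(p[4]) | (uint32_t(p[5]) << 8) | (uint32_t(p[6]) << 16) |
           (uint32_t(p[7]) << 24);
  return h;
}

// Offset of the next sibling chunk: header + payload + pad byte for odd sizes.
size_t nextChunkOffset(size_t offset, const ChunkHeader& h) {
  return offset + 8 + h.size + (h.size & 1);
}
```

For RIFF and LIST chunks, the first 4 payload bytes are an additional form type (for example "AVI " or "movi"), after which the nested chunks follow.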

I started the development on the desktop, so that I could easily test and debug my code. All the code that you see in this post is therefore compiled and running on my Linux notebook.

A First Arduino Sketch for the AVI Container

For my tests I used this file, downloaded from here. It contains 8-bit unsigned PCM audio, and the video is in MJPEG. Since the AVI container can hold different codecs, you need to specify the codec that should be used to decode the audio. In our case we use the DecoderL8 class!

#include "AudioTools.h"
#include "AudioCodecs/ContainerAVI.h"
#include "AudioLibs/Desktop/File.h"
#include "AudioLibs/PortAudioStream.h"

PortAudioStream out;   // Output of sound on desktop 
AVIDecoder codec(new DecoderL8());
EncodedAudioOutput avi(&out, &codec);
File file;
StreamCopy copier(avi, file);


void setup() {
  AudioLogger::instance().begin(Serial, AudioLogger::Info);
  file.open("/data/resources/test1.avi",FILE_READ);
}

void loop() {
  if(!copier.copy()){
    stop();
  }
}

There are no surprises here: we just copy the file to the EncodedAudioOutput, which decodes the audio and sends the result to a PortAudioStream object.
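Inside the movi list, the id of each data chunk tells the decoder where it belongs: two digits for the stream number followed by a two-letter type code, e.g. 00dc for a compressed video frame or 01wb for audio data. This is what allows a single copy() loop to feed the audio decoder (and, later, a video output) from one file. The classifier below is a hedged sketch of that routing; the stream numbers depend on the order of the stream headers in the file, so 00 is not guaranteed to be the video stream.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum class ChunkKind { Video, Audio, Other };

// Classifies a movi-list chunk id such as "00dc" or "01wb".
// Layout: two ASCII digits (stream number) + two-letter type code.
ChunkKind classifyChunk(const std::string& id, int* streamNo = nullptr) {
  if (id.size() != 4 || !std::isdigit(static_cast<unsigned char>(id[0])) ||
      !std::isdigit(static_cast<unsigned char>(id[1]))) {
    return ChunkKind::Other;  // e.g. "LIST", "JUNK", "idx1"
  }
  if (streamNo) *streamNo = (id[0] - '0') * 10 + (id[1] - '0');
  std::string type = id.substr(2);
  if (type == "dc" || type == "db") return ChunkKind::Video;  // compressed / uncompressed frame
  if (type == "wb") return ChunkKind::Audio;                   // audio data
  return ChunkKind::Other;                                     // e.g. "pc" (palette change)
}
```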

So this plays the audio from the AVI file!
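Part of the work hidden in DecoderL8 is the sample format: 8-bit PCM in AVI (as in WAV) is unsigned, with silence at 128, while most audio outputs expect signed samples. The function below is a minimal sketch of such a conversion, assuming signed 16-bit output; it is an illustration, not the actual DecoderL8 code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts unsigned 8-bit PCM (silence = 128) to signed 16-bit PCM.
// Each sample is re-centered to -128..127 and scaled by 256.
std::vector<int16_t> u8ToS16(const uint8_t* in, size_t count) {
  std::vector<int16_t> out(count);
  for (size_t i = 0; i < count; ++i) {
    out[i] = static_cast<int16_t>((static_cast<int>(in[i]) - 128) * 256);
  }
  return out;
}
```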

Playing the Video

Now let's move on to the next step: we need some functionality to display the JPEGs on the screen. I created an interface for this in the form of the abstract VideoOutput class. The AVI decoder only provides individual chunks of video data, which we need to assemble into a complete video frame before we can display it on the screen. Here is the API:

class VideoOutput {
 public:
  virtual void beginFrame(size_t size) = 0;
  virtual size_t write(const uint8_t *data, size_t byteCount) = 0;
  virtual uint32_t endFrame() = 0;
};
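To illustrate the contract: the decoder announces a frame with beginFrame(size), delivers its data in one or more write() calls and closes it with endFrame(). A hypothetical implementation that just collects the chunks into one buffer could look like this. The FrameCollector name, the added virtual destructor and the value returned from endFrame() (here simply the number of bytes collected) are my assumptions for this sketch, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interface as shown above, plus a virtual destructor.
class VideoOutput {
 public:
  virtual ~VideoOutput() = default;
  virtual void beginFrame(size_t size) = 0;
  virtual size_t write(const uint8_t* data, size_t byteCount) = 0;
  virtual uint32_t endFrame() = 0;
};

// Hypothetical implementation: assembles the chunks of one frame into a buffer.
// A real display class would decode and show the JPEG in endFrame().
class FrameCollector : public VideoOutput {
 public:
  void beginFrame(size_t size) override {
    frame_.clear();
    frame_.reserve(size);  // total frame size announced by the decoder
  }
  size_t write(const uint8_t* data, size_t byteCount) override {
    frame_.insert(frame_.end(), data, data + byteCount);
    return byteCount;
  }
  uint32_t endFrame() override {
    ++frames_;
    // Assumption: return the number of bytes in the completed frame.
    return static_cast<uint32_t>(frame_.size());
  }
  const std::vector<uint8_t>& frame() const { return frame_; }
  uint32_t frameCount() const { return frames_; }

 private:
  std::vector<uint8_t> frame_;
  uint32_t frames_ = 0;
};
```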

To test the video output, I implemented the JpegOpenCV class, which simply uses OpenCV to display the JPEGs.

Here is the extended Arduino sketch that displays the video and plays the audio:

#include "AudioTools.h"
#include "AudioCodecs/ContainerAVI.h"
#include "AudioLibs/Desktop/File.h"
#include "AudioLibs/PortAudioStream.h"
#include "Video/JpegOpenCV.h"

PortAudioStream out;   // Output of sound on desktop 
JpegOpenCV jpegDisplay;
AVIDecoder codec(new DecoderL8(), &jpegDisplay);
EncodedAudioOutput avi(&out, &codec);
File file;
StreamCopy copier(avi, file);


void setup() {
  AudioLogger::instance().begin(Serial, AudioLogger::Info);
  file.open("/data/resources/test1.avi",FILE_READ);
  codec.setOutputVideoStream(jpegDisplay);
  //codec.setMute(true);
}

void loop() {
  if(!copier.copy()){
    stop();
  }
}

The call codec.setOutputVideoStream(jpegDisplay) tells the codec what needs to be done with the video data.

We have video and audio now, but the playback is anything but smooth!

If we deactivate the audio, the video plays perfectly. We need a better way to synchronize the audio and the video: currently there is no synchronization logic or buffering at all, and we simply play whatever we get, whenever we get it!
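One way to think about the missing logic is that both streams have their own clock. The audio position follows from the number of PCM bytes played divided by the byte rate, and the video position follows from the frame index times the frame duration, which the AVI main header stores in microseconds per frame (dwMicroSecPerFrame). The helpers below compute the drift between the two; the function names and the idea of acting on the difference are illustrative assumptions, not the library's algorithm.

```cpp
#include <cassert>
#include <cstdint>

// Playback position of the audio stream in milliseconds, derived from PCM bytes.
uint32_t audioTimeMs(uint64_t bytesPlayed, uint32_t sampleRate, uint16_t channels,
                     uint16_t bytesPerSample) {
  uint64_t bytesPerSecond = uint64_t(sampleRate) * channels * bytesPerSample;
  assert(bytesPerSecond > 0);
  return static_cast<uint32_t>(bytesPlayed * 1000 / bytesPerSecond);
}

// Presentation time of video frame `frameIndex`, given microseconds per frame.
uint32_t videoTimeMs(uint32_t frameIndex, uint32_t microSecPerFrame) {
  return static_cast<uint32_t>(uint64_t(frameIndex) * microSecPerFrame / 1000);
}

// Positive: the frame is early and could be held back by this many ms.
// Negative: the frame is late, i.e. the video lags behind the audio.
int32_t frameDriftMs(uint32_t videoMs, uint32_t audioMs) {
  return static_cast<int32_t>(videoMs) - static_cast<int32_t>(audioMs);
}
```

For example, with 8-bit mono audio at 44100 Hz, 44100 bytes correspond to 1000 ms, and at 40000 µs per frame (25 fps) frame 25 is due at 1000 ms as well.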

Improving the Synchronization

I was not sure about the best way to deal with this, so I moved the synchronization logic into a separate VideoAudioSync class. To improve things, I started to write the audio into a buffer and play it back from there, instead of delaying the video frames so that they are rendered at the correct speed. Here is the final, improved version, which uses the VideoAudioBufferedSync class. Feel free to use your own improved version instead!

#include "AudioTools.h"
#include "AudioCodecs/ContainerAVI.h"
#include "AudioLibs/Desktop/File.h"
#include "AudioLibs/PortAudioStream.h"
#include "Video/JpegOpenCV.h"

PortAudioStream out;   // Output of sound on desktop 
JpegOpenCV jpegDisplay;
AVIDecoder codec(new DecoderL8(), &jpegDisplay);
EncodedAudioOutput avi(&out, &codec);
File file;
StreamCopy copier(avi, file);
VideoAudioBufferedSync videoSync(10*1024, -20);


void setup() {
  AudioLogger::instance().begin(Serial, AudioLogger::Info);
  file.open("/data/resources/test1.avi",FILE_READ);
  codec.setOutputVideoStream(jpegDisplay);
  codec.setVideoAudioSync(&videoSync);
}

void loop() {
  if(!copier.copy()){
    stop();
  }
}

The call codec.setVideoAudioSync(&videoSync) defines which synchronization logic is used.
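The general idea behind buffering the audio can be illustrated with a plain ring buffer: decoded audio is written into it as chunks arrive from the file, and the audio output drains it at its own pace, so the time spent decoding a video frame does not immediately interrupt the sound. This is a generic, single-threaded sketch of the data structure, not the VideoAudioBufferedSync implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity byte ring buffer (single-threaded sketch).
class AudioRingBuffer {
 public:
  explicit AudioRingBuffer(size_t capacity) : buf_(capacity) { assert(capacity > 0); }

  // Copies as many bytes as fit; returns the number actually written.
  size_t write(const uint8_t* data, size_t len) {
    size_t n = 0;
    while (n < len && count_ < buf_.size()) {
      buf_[(head_ + count_) % buf_.size()] = data[n++];
      ++count_;
    }
    return n;
  }

  // Reads up to len bytes into out; returns the number actually read.
  size_t read(uint8_t* out, size_t len) {
    size_t n = 0;
    while (n < len && count_ > 0) {
      out[n++] = buf_[head_];
      head_ = (head_ + 1) % buf_.size();
      --count_;
    }
    return n;
  }

  size_t available() const { return count_; }
  size_t availableForWrite() const { return buf_.size() - count_; }

 private:
  std::vector<uint8_t> buf_;
  size_t head_ = 0;   // index of the oldest stored byte
  size_t count_ = 0;  // number of bytes currently stored
};
```

The buffer size is the trade-off: a larger buffer absorbs longer video decoding hiccups but also adds latency between the sound and the picture.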

It's quite impressive that, with just a few lines of code, we can actually play movies!
The next step is to make this run on an ESP32 using Arduino. This should be easy, because all I need to do is replace the audio and JPEG output classes with implementations that are supported on Arduino.

Source Code

Here is my source code on GitHub.


1 Comment

Dimitry · 15. August 2023 at 3:17

It would be nice to have the RTSP stream ( Video + Audio) on Ai Thinker board ( ESP32 CAM) utilizing 2 cores simultaneously.
