I’m happy to announce that the Arduino Audio Tools library now includes native USB Audio Class 2.0 (UAC2) support. Your ESP32-S3 or RP2040 board can show up as a standard USB microphone, speaker, or both — no drivers needed on the host side. Linux, macOS, and Windows 10+ recognize it out of the box.

Why USB Audio?

Most Arduino audio projects use I2S to talk to an external DAC or ADC. That works well, but it requires extra hardware and doesn’t help when you want to stream audio directly between a microphone and a computer, or use your board as a USB sound card.

With UAC2, the microcontroller is the sound card. Plug it in, and tools like arecord, aplay, Audacity, or any DAW can send and receive audio without any custom software on the host.

What’s Supported

  • TX mode (device → host): The board appears as a USB microphone. Feed it audio from any source — I2S, a sine generator, an MP3 decoder — and it streams to the host as a standard capture device.
  • RX mode (host → device): The board appears as a USB speaker. The host plays audio, and your sketch reads it with readBytes() to forward it to a DAC, process it, or analyze it.
  • RXTX mode: Both directions at once. The board appears as a composite capture + playback device.
  • Volume and mute control: The host can adjust per-channel volume and mute through the standard mixer interface (e.g. amixer, PulseAudio, Windows volume mixer).
  • Sample format and rate flexibility: 16-bit, 24-bit, and 32-bit PCM at any standard rate. An optional multi-rate mode advertises 14 discrete rates from 8 kHz to 192 kHz.
  • Composite USB: Works alongside USB CDC (serial), so you can debug over the same USB cable.
  • Cross-platform: Tested on Linux (snd-usb-audio) and designed to work with the macOS CoreAudio and Windows UAC2 drivers.

Supported Hardware

  • ESP32-S2 / ESP32-S3 — via the native USB OTG peripheral (DWC2 controller)
  • Boards supported by the Adafruit TinyUSB stack (e.g. the RP2040-based Raspberry Pi Pico)
  • Zephyr-based boards — via Zephyr’s native usbd_uac2 driver (nRF5340, STM32, etc.)

Getting Started

TX Mode — USB Microphone

A minimal example that streams a sine wave to the host:

#include "AudioTools.h"
#include "AudioTools/Communication/USB/USBAudioStream.h"

AudioInfo info(44100, 2, 16);
SineGenerator<int16_t> sineWave;
GeneratedSoundStream<int16_t> sound(sineWave);
USBAudioStream out;
StreamCopy copier(out, sound);

void setup() {
  // Required on cores without built-in TinyUSB support (e.g. mbed RP2040)
  if (!TinyUSBDevice.isInitialized()) {
    TinyUSBDevice.begin(0);
  }

  sineWave.begin(info, N_B4);
  out.addNotifyAudioChange(sound);

  auto config = out.defaultConfig(TX_MODE);
  config.copyFrom(info);
  out.begin(config);

  // Re-enumerate so the host picks up the new audio interface
  if (TinyUSBDevice.mounted()) {
    TinyUSBDevice.detach();
    delay(10);
    TinyUSBDevice.attach();
  }
}

void loop() {
  copier.copy();
}

On the host, the device appears immediately:

arecord -D hw:Audio -f S16_LE -r 44100 -c 2 -d 5 recording.wav

RX Mode — USB Speaker

The board receives audio from the host:

#include "AudioTools.h"
#include "AudioTools/Communication/USB/USBAudioStream.h"

AudioInfo info(44100, 2, 16);
USBAudioStream in;
I2SStream i2s;  // or any output
StreamCopy copier(i2s, in);

void setup() {
  // Required on cores without built-in TinyUSB support (e.g. mbed RP2040)
  if (!TinyUSBDevice.isInitialized()) {
    TinyUSBDevice.begin(0);
  }

  auto config = in.defaultConfig(RX_MODE);
  config.copyFrom(info);
  in.begin(config);

  auto i2s_cfg = i2s.defaultConfig(TX_MODE);
  i2s_cfg.copyFrom(info);
  i2s.begin(i2s_cfg);

  // Re-enumerate so the host picks up the new audio interface
  if (TinyUSBDevice.mounted()) {
    TinyUSBDevice.detach();
    delay(10);
    TinyUSBDevice.attach();
  }
}

void loop() {
  copier.copy();
}

Then on the host:

aplay -D hw:Audio my_song.wav

Configuration

All USB audio settings live in a single USBAudioConfig struct. You get a pre-filled instance from defaultConfig() and customize what you need:

auto config = usb.defaultConfig(TX_MODE);  // or RX_MODE, RXTX_MODE
config.copyFrom(info);                     // inherit sample_rate, channels, bits_per_sample

Audio Format

| Field | Default | Description |
|---|---|---|
| sample_rate | 44100 | Sample rate in Hz. Inherited from AudioInfo. |
| channels | 2 | Number of audio channels (1 = mono, 2 = stereo). |
| bits_per_sample | 16 | Bit depth per sample. Must be 16, 24, or 32. Maps to S16_LE, S24_3LE, S32_LE on the host. |

Direction

| Field | Default | Description |
|---|---|---|
| enable_ep_in | true | Enable the IN endpoint (device → host, capture/microphone). |
| enable_ep_out | true | Enable the OUT endpoint (host → device, playback/speaker). |

defaultConfig(TX_MODE) sets enable_ep_in=true, enable_ep_out=false. defaultConfig(RX_MODE) does the opposite. RXTX_MODE enables both.

Endpoint Addresses

| Field | Default | Description |
|---|---|---|
| ep_in | 0x83 | ISO IN endpoint address (capture). |
| ep_out | 0x03 | ISO OUT endpoint address (playback). |
| ep_fb | 0x84 | ISO IN feedback endpoint (RX-only mode). |
| ep_int | 0x85 | INT IN endpoint (AC change notifications). |

The defaults are chosen to avoid conflicts with CDC, which uses 0x81, 0x82, and 0x02.
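
If you change these addresses, it helps to remember how a USB endpoint address is encoded: bit 7 is the direction (1 = IN, device → host) and the low four bits are the endpoint number. The following plain C++ sketch decodes an address and checks a list for clashes. The function names are hypothetical helpers for illustration, not library API.

```cpp
#include <cstddef>
#include <cstdint>

// Bit 7 of a USB endpoint address is the direction (1 = IN, device -> host).
constexpr bool isInEndpoint(uint8_t addr) { return (addr & 0x80) != 0; }

// Bits 0..3 hold the endpoint number.
constexpr uint8_t endpointNumber(uint8_t addr) { return addr & 0x0F; }

// Returns true if any address appears more than once in the list.
bool hasDuplicateEndpoint(const uint8_t* addrs, size_t count) {
  for (size_t i = 0; i < count; ++i) {
    for (size_t j = i + 1; j < count; ++j) {
      if (addrs[i] == addrs[j]) return true;
    }
  }
  return false;
}
```

Note that 0x83 and 0x03 share endpoint number 3 but differ in direction, so they are distinct addresses. Running the CDC addresses (0x81, 0x82, 0x02) together with the four audio defaults through hasDuplicateEndpoint reports no clash.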

Buffering

| Field | Default | Description |
|---|---|---|
| fifo_packets | 16 | Number of 1 ms USB packets buffered. Higher values reduce the risk of underrun at the cost of latency. |
| use_linear_buffer_rx | true | Use a flat buffer for RX (required for DMA-based downstream drivers). |
| use_linear_buffer_tx | true | Use a flat buffer for TX. |
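
To get a feel for what fifo_packets costs, the arithmetic can be sketched in plain C++. This is a back-of-the-envelope estimate, not the library's actual sizing code: it assumes one packet per 1 ms frame, that each packet is sized for the rounded-up frame count, and the full-speed USB limit of 1023 bytes per isochronous packet. All function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Largest number of audio frames a 1 ms packet must carry
// (rounded up, so 44100 Hz needs room for 45 frames).
constexpr uint32_t maxFramesPerPacket(uint32_t sample_rate) {
  return (sample_rate + 999) / 1000;
}

// Largest packet payload in bytes for the given format.
constexpr size_t maxPacketBytes(uint32_t sample_rate, uint8_t channels,
                                uint8_t bits_per_sample) {
  return static_cast<size_t>(maxFramesPerPacket(sample_rate)) * channels *
         (bits_per_sample / 8);
}

// Memory needed to hold fifo_packets packets of maximum size.
constexpr size_t fifoBytes(uint32_t fifo_packets, uint32_t sample_rate,
                           uint8_t channels, uint8_t bits_per_sample) {
  return fifo_packets * maxPacketBytes(sample_rate, channels, bits_per_sample);
}

// Full-speed isochronous endpoints carry at most 1023 bytes per 1 ms frame.
constexpr bool fitsFullSpeed(uint32_t sample_rate, uint8_t channels,
                             uint8_t bits_per_sample) {
  return maxPacketBytes(sample_rate, channels, bits_per_sample) <= 1023;
}
```

With the defaults (44100 Hz, stereo, 16-bit, 16 packets) this gives 180 bytes per packet and 2880 bytes of FIFO, which corresponds to roughly 16 ms of audio when the buffer is full. Under the same assumptions, 192 kHz stereo at 32 bits would need 1536 bytes per packet and would not fit a full-speed endpoint.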

Device Identity

| Field | Default | Description |
|---|---|---|
| vid | 0xCafe | USB Vendor ID. |
| pid | 0x4002 | USB Product ID. |
| manufacturer | "Audio Tools" | Manufacturer string shown by the host. |
| product | "USB Audio" | Product name shown by the host. |
| serial | "000001" | Serial number string. |
| self_powered | true | Device is self-powered (not bus-powered). |
| max_power_ma | 100 | Maximum current draw in mA. |

Feature Flags

| Field | Default | Description |
|---|---|---|
| enable_feedback_ep | true | Enable the isochronous feedback endpoint so the host can adjust its clock. Only active in pure RX mode (no IN endpoint). |
| enable_multi_sample_rate | false | When true, the clock source is programmable and GET_RANGE advertises 14 discrete rates (8 kHz – 192 kHz). When false, only the configured rate is reported. |
| enable_interrupt_ep | false | Enable the AC interrupt endpoint for device-initiated volume, mute, and sample-rate change notifications. |
| enable_ep_in_flow_control | true | Vary the per-frame packet size so non-integer rates like 44100 Hz are delivered at the exact average rate. |
| volume_active | false | When true, the library applies volume and mute scaling to the audio samples directly. When false, values are available via volume() and isMuted() for external processing. |

ESP32-Specific

| Field | Default | Description |
|---|---|---|
| begin_usb | false | When true, beginUSB() calls USB.begin() automatically. Set to false for composite USB devices where you control the startup order. |

Example

auto config = usb.defaultConfig(TX_MODE);
config.copyFrom(info);
config.product = "My Synth";
config.vid = 0x1234;
config.pid = 0x5678;
config.fifo_packets = 32;         // more buffering
config.volume_active = true;      // let the library handle volume
config.enable_interrupt_ep = true; // push volume/rate changes to host
usb.begin(config);

Volume and Mute

The UAC2 descriptors advertise per-channel volume and mute controls. The host sees them as standard mixer controls:

# List controls
amixer -c Audio scontrols

# Set volume to 80%
amixer -c Audio sset 'Mic',0 80%

# Mute
amixer -c Audio sset 'Mic',0 mute

On the device side, you can react to volume changes with a callback:

out.setVolumeCallback([](float vol, uint8_t channel) {
  Serial.printf("Volume ch%d: %.0f%%\n", channel, vol * 100);
});

out.setMuteCallback([](bool muted, uint8_t channel) {
  Serial.printf("Mute ch%d: %s\n", channel, muted ? "on" : "off");
});

Or enable automatic volume processing so the library scales the audio samples directly:

config.volume_active = true;
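
If you leave volume_active off and want to do the scaling yourself, the processing could look roughly like the plain C++ sketch below. It is not the library's implementation: applyVolume is a hypothetical helper, the gain is assumed to be a linear factor where 1.0 means unchanged, and the audio is assumed to be interleaved 16-bit PCM.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Scales interleaved 16-bit samples in place.
// gain[ch] is a linear factor (1.0 = unchanged), muted[ch] silences a channel.
// Results are rounded and saturated so that gains above 1.0 cannot wrap around.
void applyVolume(int16_t* samples, size_t frames, size_t channels,
                 const float* gain, const bool* muted) {
  for (size_t f = 0; f < frames; ++f) {
    for (size_t ch = 0; ch < channels; ++ch) {
      const float g = muted[ch] ? 0.0f : gain[ch];
      float scaled = std::round(samples[f * channels + ch] * g);
      scaled = std::max(-32768.0f, std::min(32767.0f, scaled));
      samples[f * channels + ch] = static_cast<int16_t>(scaled);
    }
  }
}
```

The per-channel arrays would be filled from the volume and mute callbacks shown earlier, and applyVolume would be called on each buffer before it is handed to the next stream.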

Architecture

The implementation is split into layers:

  • USBAudioConfig — all configuration in one struct: sample rate, channels, bit depth, endpoint addresses, buffering, device identity, and feature flags.
  • USBAudio2DescriptorBuilder — generates the complete UAC2 descriptor block at runtime: IAD, Audio Control interface with Clock Source, Input/Output Terminals, and Feature Units, plus Audio Streaming interfaces with Format Type descriptors and isochronous endpoints.
  • USBAudioDeviceBase — the core class (~2200 lines) that implements the TinyUSB class driver interface: descriptor callbacks, control request handling (sample rate, volume, mute), isochronous endpoint management, flow control for accurate sample rates, and the AudioStream read/write API.
  • USBAudioDeviceESP32 / USBAudioDeviceTinyUSB / USBAudioDeviceZephyr — thin platform subclasses that provide the right buffer implementation and USB stack initialization for each target.
  • USBAudioStream — a dispatch header that resolves to the correct platform class automatically.

The key design goal was to make USB audio feel like any other AudioTools stream. StreamCopy works. addNotifyAudioChange works. Volume and mute integrate with the existing VolumeSupport interface. You can chain it with any source or sink in the library.

Flow Control

One subtle but important feature is TX flow control. At 44100 Hz with 1 ms USB frames, you need to send 44.1 samples per frame — not an integer. Without flow control, you’d send 45 samples every frame, and the effective rate would drift to 45000 Hz.

The flow control implementation uses a fractional accumulator: it mixes 44-sample and 45-sample packets (nine of the former for every one of the latter at 44100 Hz) so the average over time is exactly 44.1 samples per frame. This is enabled by default (enable_ep_in_flow_control = true) and is essential for glitch-free audio.
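
A fractional accumulator of this kind fits in a few lines. The class below is a minimal plain C++ sketch of the technique, assuming 1 ms frames; PacketSizer is a hypothetical name and the library's own code may be structured differently.

```cpp
#include <cstdint>

// Decides how many audio frames go into each 1 ms USB packet so that the
// long-term average equals sample_rate / 1000 exactly.
class PacketSizer {
 public:
  explicit PacketSizer(uint32_t sample_rate) : rate_(sample_rate) {}

  // Returns the frame count for the next packet.
  uint32_t next() {
    acc_ += rate_;                  // frames owed so far, scaled by 1000
    uint32_t frames = acc_ / 1000;  // whole frames that can be sent now
    acc_ -= frames * 1000;          // carry the fractional remainder forward
    return frames;
  }

 private:
  uint32_t rate_;
  uint32_t acc_ = 0;
};
```

At 44100 Hz this yields nine packets of 44 frames followed by one of 45, so every 10 packets carry exactly 441 frames. At 48000 Hz the remainder is always zero and every packet carries 48 frames.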

Buffering

Getting buffer management right on a dual-core chip like the ESP32-S3 was one of the bigger challenges. The USB stack runs on core 0, but the Arduino loop() typically runs on core 1. We need lock-free or RTOS-synchronized handoff between the cores.

The ESP32 subclass uses BufferRTOS — a FreeRTOS StreamBuffer wrapper that provides safe cross-core data transfer. The TX side uses a short write timeout (5 ms) so the audio copier blocks briefly when the buffer is full, while the USB callback side never blocks. The RX side is fully non-blocking on both ends.

On the RP2040, a simple RingBuffer is sufficient: the chip has two cores, but the sketch and the USB stack normally run on the same one, so there’s no cross-core contention.
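
For the case without an RTOS, a ring buffer with non-blocking reads and writes can be sketched as follows. ByteRing is a hypothetical stand-in written for illustration, not the library's RingBuffer class, and it is not safe for concurrent access from an interrupt handler and the main loop without extra protection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-capacity byte ring buffer with non-blocking read and write.
class ByteRing {
 public:
  explicit ByteRing(size_t capacity) : buf_(capacity) {}

  // Copies as many bytes as fit and returns that count; never blocks.
  size_t write(const uint8_t* data, size_t len) {
    size_t n = 0;
    while (n < len && used_ < buf_.size()) {
      buf_[head_] = data[n++];
      head_ = (head_ + 1) % buf_.size();
      ++used_;
    }
    return n;
  }

  // Copies up to len buffered bytes out and returns that count; never blocks.
  size_t read(uint8_t* out, size_t len) {
    size_t n = 0;
    while (n < len && used_ > 0) {
      out[n++] = buf_[tail_];
      tail_ = (tail_ + 1) % buf_.size();
      --used_;
    }
    return n;
  }

  // Number of bytes currently stored.
  size_t available() const { return used_; }

 private:
  std::vector<uint8_t> buf_;
  size_t head_ = 0;  // next write position
  size_t tail_ = 0;  // next read position
  size_t used_ = 0;  // bytes currently stored
};
```

A short write (return value smaller than len) is the signal that the consumer is falling behind, which is the same condition the 5 ms write timeout handles on the ESP32 side.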

Composite USB Devices

You can combine USB Audio with CDC serial for debugging. On ESP32-S3, set begin_usb = false in the config, register the CDC interface first, then start audio, and finally call USB.begin() yourself:

USBCDC MySerial;

void setup() {
  MySerial.begin(115200);

  auto config = out.defaultConfig(TX_MODE);
  config.copyFrom(info);
  config.begin_usb = false;  // we'll call USB.begin() manually
  out.begin(config);

  USB.begin();  // starts both CDC and Audio
}

The default endpoint addresses (0x83, 0x03, 0x84, 0x85) are chosen to avoid conflicts with CDC, which uses 0x81, 0x82, and 0x02.

Testing

The library includes a QualityAnalysisStream that can be wired into the RX path to detect audio problems:

AudioInfo info(44100, 2, 16);
USBAudioStream in;
QualityAnalysisStream quality(in);
uint8_t buf[1024];

void setup() {
  // Required on cores without built-in TinyUSB support (e.g. mbed RP2040)
  if (!TinyUSBDevice.isInitialized()) {
    TinyUSBDevice.begin(0);
  }

  auto config = in.defaultConfig(RX_MODE);
  config.copyFrom(info);
  in.begin(config);
  quality.setReporting(10000, Serial);
  quality.begin(info);

  // Re-enumerate so the host picks up the new audio interface
  if (TinyUSBDevice.mounted()) {
    TinyUSBDevice.detach();
    delay(10);
    TinyUSBDevice.attach();
  }
}

void loop() {
  quality.readBytes(buf, sizeof(buf));
}

On the host, play a test tone and watch Serial for results:

sox -n -r 44100 -c 2 -b 16 test_tone.wav synth 5 sine 440
aplay -D hw:Audio test_tone.wav

The code is available in the arduino-audio-tools library under src/AudioTools/Communication/USB/. Examples are in examples/examples-communication/usb/.

Feedback and bug reports are welcome — this is a complex protocol stack and real-world testing across different hosts and use cases is invaluable.

