
Audio codec analysis for MCU-based industrial streaming

Xylolabs API: Codec Feasibility Study
Revision: 2026-03-22

PATENT PENDING — XAP (Xylolabs Audio Protocol) and XMBP (Xylolabs Metadata Binary Protocol) are proprietary technologies of Xylolabs Inc. Patent applications have been filed.


1. Overview

Use case

Industrial audio monitoring with four-channel microphone arrays (two stereo pairs) sampled at 96 kHz. Audio is captured on an embedded MCU, compressed in real time, and streamed over LTE-M1 to a cloud ingest server, where it is decoded, stored, and made available for analysis.

LTE-M1 bandwidth constraint

LTE-M1 (LTE Cat-M1) provides approximately 375 kbps uplink throughput in good conditions, degrading to ~100–200 kbps under typical field conditions. With protocol overhead, the practical sustained audio payload budget is:

  • Best case: ~47 KB/s (375 kbps ÷ 8, no overhead)
  • Practical target: ~30–40 KB/s (allowing for retransmissions, headers, metadata)

Requirements

| Requirement | Value |
| --- | --- |
| Channels | 4 (two stereo pairs) |
| Sample rate | 96 kHz |
| Bit depth | 16-bit (24-bit optional) |
| Encode location | MCU (no OS, bare-metal or RTOS) |
| Decode location | Cloud server (Linux, no resource constraints) |
| Max sustained bandwidth | ~40 KB/s |
| Latency budget | < 100 ms end-to-end |
| Quality priority | Industrial monitoring: preserve full spectrum |

Raw PCM baseline

4 channels × 96,000 samples/s × 16 bits = 6,144,000 bits/s = 768 KB/s

This is about 19× the ~40 KB/s budget (768 ÷ 40 ≈ 19.2), so significant compression is required.
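The baseline arithmetic can be reproduced with a few lines of Rust. This is only a sketch of the calculation above: the function names are illustrative, and 1 KB is taken as 1000 bytes, which is the convention the figures in this study imply.

```rust
/// Raw PCM data rate in bits per second.
fn pcm_bits_per_sec(channels: u32, sample_rate_hz: u32, bits_per_sample: u32) -> u64 {
    channels as u64 * sample_rate_hz as u64 * bits_per_sample as u64
}

/// Bits per second to kilobytes per second (1 KB = 1000 bytes).
fn kb_per_sec(bits_per_sec: u64) -> f64 {
    bits_per_sec as f64 / 8.0 / 1000.0
}

fn main() {
    let budget_kb_s = 40.0; // "Max sustained bandwidth" from the requirements
    for bits in [16u32, 24] {
        let rate = kb_per_sec(pcm_bits_per_sec(4, 96_000, bits));
        println!(
            "{}-bit: {:.0} KB/s raw, {:.1}x the budget",
            bits,
            rate,
            rate / budget_kb_s
        );
    }
}
```

The same ratio is the minimum compression a codec must reach at that bit depth, which is why the 4:1 codecs below fall short.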


2. Codec comparison table

| Codec | Type | MIPS/ch @48 kHz | MIPS/ch @96 kHz | RAM/ch | Compression | Quality | License | MCU feasible? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| XAP | Lossy | 5–10 | 10–20 | ~8 KB | 10:1 | Excellent | Xylolabs proprietary (patent pending) | Yes |
| Opus (SILK) | Lossy | 15–25 | 30–50 | ~20 KB | 10:1+ | Excellent | BSD (royalty-free) | Marginal |
| Opus (CELT) | Lossy | 30–50 | 60–100 | ~30 KB | 10:1+ | Excellent | BSD (royalty-free) | No |
| MP3 (LAME) | Lossy | 20–40 | N/A (max 48 kHz) | ~30 KB | 10:1 | Good | Patent-free (2017+) | Marginal @48 kHz |
| AAC-LC | Lossy | 25–40 | 50–80 | ~25 KB | 10:1 | Very Good | ISO (Via Licensing) | No @96 kHz |
| HE-AAC v1 | Lossy | 30–50 | 60–100 | ~30 KB | 15:1 | Very Good | ISO (licensing required) | No |
| HE-AAC v2 | Lossy | 40–60 | 80–120 | ~40 KB | 20:1 | Good (stereo only) | ISO (licensing required) | No |
| xHE-AAC (USAC) | Lossy | 50–80 | N/A | ~50 KB | 25:1 | Excellent | ISO (licensing required) | No |
| SBC | Lossy | 3–5 | 6–10 | ~4 KB | 4–5:1 | Fair | Bluetooth SIG (royalty-free) | Yes |
| aptX | Lossy | 5–8 | 10–16 | ~8 KB | 4:1 | Very Good | Qualcomm (licensing required) | Yes |
| aptX HD | Lossy | 8–12 | 16–24 | ~12 KB | 4:1 | Excellent | Qualcomm (licensing required) | Yes (F411/ESP32-S3) |
| IMA-ADPCM | Lossy | < 1 | < 2 | < 1 KB | 4:1 | Fair | Public domain | Yes (all) |
| Speex | Lossy | 3–8 | N/A (max 32 kHz) | ~4 KB | 8–11:1 | Good (speech) | BSD (royalty-free) | Yes (voice only) |
| G.722 | Lossy | 3–5 | N/A (max 16 kHz) | ~2 KB | 4:1 | Good (wideband) | ITU-T (royalty-free) | Yes (voice only) |
| 3GPP EVS | Lossy | 10–20 | 15–30 | ~15 KB | 8–12:1 | Excellent (voice) | 3GPP (licensing required) | Marginal |
| AAC-ELD | Lossy | 15–25 | 20–30 | ~20 KB | 8–10:1 | Very Good | ISO (licensing required) | No @96 kHz |
| aptX Lossless | Lossless | N/A | N/A | N/A | ~2:1 | Perfect | Qualcomm (licensing required) | No |
| FLAC | Lossless | 30–50 | 60–100 | ~50 KB | 2:1 | Perfect | BSD (royalty-free) | No (too slow) |
| ALAC | Lossless | 25–40 | 50–80 | ~40 KB | 2:1 | Perfect | Apache (royalty-free) | No |

3. Detailed analysis per codec

3.1 XAP — Xylolabs Audio Protocol

  • Sample rates: 8, 16, 24, 32, 44.1, 48, 96 kHz
  • Frame sizes: 7.5 ms or 10 ms
  • Bitrate range: 16–320 kbps per channel (encoder-configurable)
  • MCU implementation: XAP decoder library — open-source reference implementation, ~5,000 lines of C, zero external dependencies, fixed-point arithmetic available
  • 4ch @96 kHz bandwidth: 4 × 80 kbps = 320 kbps = 40 KB/s — fits within LTE-M1 budget
  • CPU requirement: ~10 MIPS/ch @48 kHz, ~20 MIPS/ch @96 kHz → 80 MIPS total for 4ch @96 kHz → fits RP2350 (300 MIPS dual-core), fits ESP32-S3 (480 MIPS)
  • With DSP acceleration: ~7 MIPS/ch @48 kHz, ~14 MIPS/ch @96 kHz → 56 MIPS for 4ch @96 kHz using CMSIS-DSP optimized paths on Cortex-M33/M4F (see Section 6)
  • RAM: ~8 KB per channel encoder state → 32 KB for 4ch → fits all target MCUs
  • Latency: 7.5–10 ms algorithmic delay per frame
  • Quality: Near-transparent at 64 kbps/ch; excellent, broadcast-grade at 80 kbps/ch
  • Licensing: Xylolabs proprietary (patent pending)
  • Hardware acceleration: The XAP decoder library includes ARM-optimized fixed-point paths that use Cortex-M33/M4F DSP instructions (single-cycle MAC, saturating arithmetic). The MDCT, spectral analysis, and quantization loops see 25–40% speedup with DSP intrinsics.

Verdict: RECOMMENDED — best balance of quality, compression efficiency, CPU budget, RAM, and licensing for this use case. Already integrated in the Xylolabs Rust SDK (crates/xylolabs-sdk/) as the XAP codec module. DSP acceleration reduces CPU usage further on Cortex-M33/M4F platforms.
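To make the bitrate and frame-size figures concrete, the sketch below derives the per-frame payload and the number of PCM samples per frame. It assumes constant-bitrate frames with no per-frame header; XAP's actual framing is not described in this study, so treat the byte counts as payload-only estimates. The function names are illustrative.

```rust
/// Encoded payload bytes per frame for one channel at a constant bitrate.
/// frame_us is the frame duration in microseconds (7_500 or 10_000).
fn frame_payload_bytes(bitrate_bps: u64, frame_us: u64) -> u64 {
    bitrate_bps * frame_us / 8 / 1_000_000
}

/// PCM samples per channel consumed by one frame.
fn samples_per_frame(sample_rate_hz: u64, frame_us: u64) -> u64 {
    sample_rate_hz * frame_us / 1_000_000
}

fn main() {
    for frame_us in [7_500u64, 10_000] {
        let bytes = frame_payload_bytes(80_000, frame_us);
        let samples = samples_per_frame(96_000, frame_us);
        println!(
            "{} us frame: {} samples/ch in, {} bytes/ch out, {} bytes for 4ch",
            frame_us,
            samples,
            bytes,
            4 * bytes
        );
    }
}
```

At 80 kbps/ch and 10 ms frames, four channels produce 400 payload bytes every 10 ms, which is the 40 KB/s quoted above before any transport or XMBP overhead.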


3.2 Opus

  • Standard: RFC 6716 (IETF)
  • Modes:
      • SILK: Low complexity, designed for speech, up to 48 kHz input
      • CELT: Higher complexity, full-band music audio, up to 48 kHz
      • Hybrid: Combination of SILK + CELT, for 10 or 20 ms frames
  • Sample rates: 8, 12, 16, 24, 48 kHz — no native 96 kHz support
  • SILK mode CPU: ~20 MIPS/ch → 80 MIPS for 4ch → fits RP2350, but limited to 48 kHz
  • CELT mode CPU: ~40 MIPS/ch → 160 MIPS for 4ch → tight on RP2350, exceeds STM32
  • RAM (encoder): ~20–30 KB/ch → 80–120 KB for 4ch — tight on MCUs with limited SRAM
  • 96 kHz limitation: Opus internally resamples all input to 48 kHz. Encoding at 96 kHz input is not meaningful — the codec will downsample, discarding content above 24 kHz.
  • Quality: Best-in-class for 48 kHz audio; indistinguishable from original at 96–128 kbps/ch in CELT mode
  • Licensing: BSD-style (royalty-free, no patent encumbrances)

Verdict: GOOD for 48 kHz streams. Not suitable for true 96 kHz capture because the encoder discards high-frequency content. If the application requirement is relaxed to 48 kHz, Opus SILK is viable on RP2350.


3.3 MP3 (MPEG-1/2 Audio Layer III)

  • Standard: ISO/IEC 11172-3 (MPEG-1) / ISO/IEC 13818-3 (MPEG-2)
  • Maximum sample rate: 48 kHz (MPEG-1) — no 96 kHz support
  • CPU (encoder, e.g., LAME): ~30 MIPS/ch → 120 MIPS for 4ch @48 kHz — tight on most MCUs
  • RAM: ~30 KB/ch — significant
  • Quality: Acceptable at 128–192 kbps/ch; audible artifacts at ≤ 96 kbps
  • Patents: All relevant patents expired by 2017; fully royalty-free
  • MCU implementations: Very few production-quality MCU MP3 encoders exist; LAME is not designed for embedded

Verdict: NOT RECOMMENDED — no 96 kHz support, high CPU and RAM requirements, dated codec superseded by XAP and Opus. Encoding on MCU is not practical.


3.4 AAC-LC (Advanced Audio Coding — Low Complexity)

  • Standard: ISO/IEC 14496-3 (MPEG-4 Audio)
  • Sample rates: Up to 96 kHz supported natively
  • CPU: ~30 MIPS/ch @48 kHz, ~60 MIPS/ch @96 kHz → 240 MIPS for 4ch. That is 80% of the RP2350's 300 MIPS, which leaves too little headroom, and far more than the F411 (100 MIPS) or nRF52840 (64 MIPS) can supply
  • RAM: ~25 KB/ch → 100 KB for 4ch
  • Quality: Very good at 128 kbps; excellent at 192 kbps
  • Licensing: ISO/IEC patent pool managed by Via Licensing — royalty required per unit sold
  • MCU encoders: FDK-AAC by Fraunhofer is the best open-source encoder, but it is heavy and not optimized for bare-metal MCUs

Verdict: NOT RECOMMENDED for MCU — too CPU-heavy at 96 kHz, licensing costs are non-trivial for hardware products, and no well-tested MCU encoder libraries exist.


3.5 HE-AAC v1 (High Efficiency AAC + Spectral Band Replication)

  • Standard: ISO/IEC 14496-3
  • Enhancement: Spectral Band Replication (SBR) regenerates high frequencies from low-frequency data, enabling high quality at half the bitrate of AAC-LC
  • CPU: Higher than AAC-LC due to SBR analysis — ~50 MIPS/ch @48 kHz
  • Best operating range: 48–80 kbps/ch (delivers AAC-LC @96 kbps quality at half the bitrate)
  • Licensing: ISO/IEC + Dolby patent pool — royalty required
  • MCU encoder complexity: SBR encoder analysis is computationally intensive and does not fit in typical MCU SRAM

Verdict: NOT FEASIBLE on MCU — encoder is significantly more complex than AAC-LC. Server-side decode is fine (FFmpeg), but encoding on embedded hardware is not practical.


3.6 HE-AAC v2 (HE-AAC v1 + Parametric Stereo)

  • Standard: ISO/IEC 14496-3
  • Enhancement: Adds Parametric Stereo (PS) on top of HE-AAC v1, encoding stereo as mono + steering parameters
  • CPU: Highest among AAC variants — ~60–80 MIPS/ch equivalent
  • Best at: 24–48 kbps for stereo content (remarkable compression for speech and music)
  • Limitation: Designed for stereo (2-channel) only — not applicable to 4-channel configurations
  • Quality at low bitrate: Very good for speech; stereo imaging degrades at low bitrates

Verdict: NOT FEASIBLE — too computationally intensive for MCU encoding, stereo-only limitation excludes 4-channel use case.


3.7 xHE-AAC / USAC (Extended HE-AAC / Unified Speech and Audio Coding)

  • Standard: ISO/IEC 23003-3 (MPEG-D)
  • Technology: Combines ACELP (speech coding) with MDCT (music coding), with smooth switching and advanced noise shaping
  • CPU: Very high — designed for ARM Cortex-A class processors, not microcontrollers
  • Quality: State of the art; near-transparent at 32 kbps stereo
  • Licensing: ISO/IEC patent pool — royalty required

Verdict: NOT FEASIBLE on any MCU — encoder requires a full-featured CPU with OS-level memory management. Not relevant for this use case.


3.8 SBC (Subband Coding)

  • Standard: Bluetooth A2DP specification
  • Algorithm: Simple subband filter bank with ADPCM-like quantization
  • CPU: Very low — ~5 MIPS/ch, trivial even on STM32F103
  • Compression: Only 4–5:1, less efficient than XAP (10:1), so it needs roughly twice the bandwidth for the same source audio
  • RAM: ~4 KB/ch — excellent
  • Quality: Adequate for Bluetooth speakers; not suitable for high-fidelity archival or spectrum analysis
  • Licensing: Royalty-free under Bluetooth SIG terms

Verdict: FEASIBLE but strictly inferior to XAP in every dimension except simplicity. SBC was designed as a lowest-common-denominator codec before XAP existed. Use XAP instead when available; use SBC only as a final fallback if XAP cannot be integrated.


3.9 aptX / aptX HD

  • Standard: Qualcomm proprietary
  • Algorithm: ADPCM with enhanced subband processing
  • aptX CPU: ~8 MIPS/ch → 32 MIPS for 4ch — very feasible
  • aptX HD CPU: ~12 MIPS/ch → 48 MIPS for 4ch — feasible on most MCUs
  • Compression: 4:1 (same ratio as ADPCM but significantly better perceptual quality)
  • aptX quality: Very good — comparable to 16-bit CD quality
  • aptX HD quality: Near-lossless for human perception; 24-bit input support
  • RAM: ~8–12 KB/ch — manageable
  • Licensing: Qualcomm license agreement required — cost-prohibitive for custom hardware without Qualcomm silicon, and SDK access is restricted

Verdict: GOOD quality, impractical licensing. The Qualcomm licensing model ties aptX to their chipsets commercially. Not suitable for open custom hardware development.


3.10 IMA-ADPCM (Interactive Multimedia Association Adaptive Differential PCM)

  • Standard: IMA/DVI ADPCM specification
  • Algorithm: Encodes sample-to-sample deltas using a 4-bit quantizer with adaptive step size
  • CPU: Trivial: < 1 MIPS/ch @48 kHz, < 2 MIPS/ch @96 kHz → < 8 MIPS for 4ch @96 kHz
  • Compression: Fixed 4:1 (16-bit input → 4-bit output)
  • RAM: < 1 KB/ch — fits on any MCU
  • Quality: Fair — audible quantization noise, high-frequency artifacts, no psychoacoustic modeling
  • 4ch @96 kHz bandwidth: 768 KB/s ÷ 4 = 192 KB/s — still over the LTE-M1 budget
  • 4ch @48 kHz bandwidth: 384 KB/s ÷ 4 = 96 KB/s — still over budget
  • Licensing: Public domain

Verdict: BASELINE FALLBACK. Always feasible computationally and runs on any MCU, but 4:1 compression is insufficient for the 96 kHz / 4-channel use case over LTE-M1. Only viable if both channel count and sample rate are reduced (e.g., 2ch @24 kHz for voice-only monitoring).
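The algorithm described above is small enough to sketch in full. The following is a minimal single-channel IMA-ADPCM encoder and decoder using the standard IMA step and index tables. It is not the Xylolabs SDK's fallback encoder: the zero initial state, the absence of block headers, and the low-nibble-first packing are assumptions made for illustration.

```rust
const STEP_TABLE: [i32; 89] = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230,
    253, 279, 307, 337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963,
    1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066, 2272, 2499, 2749, 3024, 3327,
    3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442,
    11487, 12635, 13899, 15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794,
    32767,
];
const INDEX_TABLE: [i32; 8] = [-1, -1, -1, -1, 2, 4, 6, 8];

/// Predictor state shared by encoder and decoder (8 bytes per channel).
#[derive(Default, Clone, Copy)]
struct AdpcmState {
    predictor: i32,
    index: i32,
}

impl AdpcmState {
    /// Apply one 4-bit code: reconstruct the sample and adapt the step size.
    /// The encoder calls this too, so both sides stay in lockstep.
    fn update(&mut self, code: u8) -> i16 {
        let step = STEP_TABLE[self.index as usize];
        let mut diff = step >> 3;
        if code & 4 != 0 { diff += step; }
        if code & 2 != 0 { diff += step >> 1; }
        if code & 1 != 0 { diff += step >> 2; }
        if code & 8 != 0 { self.predictor -= diff; } else { self.predictor += diff; }
        self.predictor = self.predictor.clamp(-32768, 32767);
        self.index = (self.index + INDEX_TABLE[(code & 7) as usize]).clamp(0, 88);
        self.predictor as i16
    }

    /// Quantize the delta between the sample and the prediction to 4 bits.
    fn encode_sample(&mut self, sample: i16) -> u8 {
        let step = STEP_TABLE[self.index as usize];
        let mut diff = sample as i32 - self.predictor;
        let mut code = 0u8;
        if diff < 0 { code = 8; diff = -diff; }
        if diff >= step { code |= 4; diff -= step; }
        if diff >= step >> 1 { code |= 2; diff -= step >> 1; }
        if diff >= step >> 2 { code |= 1; }
        self.update(code);
        code
    }
}

/// Pack two 4-bit codes per byte, first sample in the low nibble.
fn encode(pcm: &[i16]) -> Vec<u8> {
    let mut st = AdpcmState::default();
    let mut out = Vec::with_capacity((pcm.len() + 1) / 2);
    for pair in pcm.chunks(2) {
        let lo = st.encode_sample(pair[0]);
        let hi = if pair.len() > 1 { st.encode_sample(pair[1]) } else { 0 };
        out.push(lo | (hi << 4));
    }
    out
}

fn decode(data: &[u8]) -> Vec<i16> {
    let mut st = AdpcmState::default();
    let mut out = Vec::with_capacity(data.len() * 2);
    for &b in data {
        out.push(st.update(b & 0x0F));
        out.push(st.update(b >> 4));
    }
    out
}

fn main() {
    let pcm: Vec<i16> = (0..64).map(|n| (n * 100) as i16).collect();
    let encoded = encode(&pcm);
    let decoded = decode(&encoded);
    println!("{} PCM bytes -> {} ADPCM bytes", pcm.len() * 2, encoded.len());
    println!("last sample: in {} / out {}", pcm[63], decoded[63]);
}
```

The encoder reuses the decoder's update step, so its prediction never drifts from what the server reconstructs. That property is also why a lost packet needs a state resync in a real stream, something this sketch does not handle.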


3.11 Speex

  • Standard: Xiph.Org open specification (RFC 5574)
  • Algorithm: CELP (Code-Excited Linear Prediction)
  • Maximum sample rate: 32 kHz (ultra-wideband) — no 48 kHz or 96 kHz support
  • Bitrate: 2.15–44.2 kbps (variable, mode-dependent)
  • CPU: ~3–8 MIPS/ch (narrowband to ultra-wideband)
  • RAM: ~4 KB/ch
  • Quality: Good for speech; includes noise suppression, echo cancellation, and VAD (Voice Activity Detection) — features absent from G.722
  • Licensing: BSD — fully royalty-free, no patent encumbrance
  • Note: The Speex project officially recommends Opus as its successor for new deployments. However, Speex remains a valid choice for MCU voice-only use cases where Opus is too heavy (>30 MIPS/ch) and G.722's 16 kHz ceiling is limiting.

Verdict: VOICE-ONLY ALTERNATIVE — viable for speech monitoring at 8–32 kHz with better compression than G.722 (8–11:1 vs 4:1) and built-in noise suppression. Superseded by Opus for general audio; XAP remains the recommendation for wideband industrial audio.


3.12 G.722

  • Standard: ITU-T G.722
  • Algorithm: Subband ADPCM, two-subband (low and high)
  • Maximum sample rate: 16 kHz (wideband voice) — no 48 kHz or 96 kHz support
  • CPU: ~5 MIPS/ch
  • Quality: Good for voice; completely unsuitable for full-band audio
  • Licensing: ITU-T — royalty-free

Verdict: VOICE-ONLY. Useful only if the application is wideband speech monitoring at a 16 kHz sample rate. Not applicable for 96 kHz industrial audio.


3.13 FLAC (Free Lossless Audio Codec)

  • Standard: Xiph.Org open specification
  • Algorithm: Linear predictive coding + Rice entropy coding
  • CPU: ~60–100 MIPS/ch @96 kHz → ~400 MIPS for 4ch — far exceeds all target MCUs
  • Compression: ~2:1 (lossless, content-dependent)
  • RAM: ~50 KB/ch — significant
  • Bandwidth for 4ch @96 kHz: ~384 KB/s — still 10× over LTE-M1 budget
  • Quality: Perfect — mathematically lossless

Verdict: NOT FEASIBLE. Both the CPU cost and the resulting bandwidth exceed the constraints. FLAC is excellent as a server-side storage format after transcoding, but it is not a workable on-device encoder for this embedded use case.


3.14 ALAC (Apple Lossless Audio Codec)

  • Standard: Apple open-sourced, Apache 2.0 license
  • Algorithm: Similar to FLAC; integer polynomial predictor + Rice coding
  • CPU: ~50–80 MIPS/ch @96 kHz → ~320 MIPS for 4ch — exceeds most MCUs
  • RAM: ~40 KB/ch
  • Quality: Perfect (lossless)
  • Licensing: Apache 2.0 (royalty-free)

Verdict: NOT FEASIBLE — same constraints as FLAC. No benefit over FLAC for this use case and less tooling support on Linux servers.


3.15 3GPP EVS (Enhanced Voice Services)

  • Standard: 3GPP TS 26.441 (Release 12+)
  • Sample rates: 8, 16, 32, 48 kHz — no native 96 kHz support
  • Bitrate: 5.9–128 kbps (mode-dependent)
  • CPU: ~15–30 MIPS/ch — moderate complexity
  • RAM: ~15 KB/ch
  • Quality: Excellent for voice — state-of-the-art speech codec with super-wideband support
  • Licensing: 3GPP patent pool — per-unit royalty required, complex licensing terms
  • MCU feasibility: Marginal on RP2350 for 4ch @48 kHz (~120 MIPS), but no 96 kHz support eliminates it from consideration

Verdict: NOT RECOMMENDED — excellent voice codec but limited to 48 kHz maximum, requires 3GPP licensing, and offers no advantage over XAP for industrial wideband audio monitoring at 96 kHz.


3.16 AAC-ELD (Advanced Audio Coding — Enhanced Low Delay)

  • Standard: ISO/IEC 14496-3 (MPEG-4 Audio, low-delay profile)
  • Algorithm: AAC-LC core with low-delay MDCT window and optional SBR
  • CPU: ~20–30 MIPS/ch @96 kHz — lower latency than AAC-LC but similar computational cost
  • RAM: ~20 KB/ch
  • Latency: ~15–30 ms (significantly lower than standard AAC-LC)
  • Quality: Very good — designed for real-time communication (FaceTime, VoLTE)
  • Licensing: ISO/IEC patent pool — royalty required

Verdict: NOT FEASIBLE for MCU @96 kHz — 4ch @96 kHz requires ~80–120 MIPS, tight on RP2350 and exceeding most single-core MCUs. Licensing cost and lack of MCU-optimized encoders make it impractical. XAP achieves similar low-latency goals with lower complexity.


3.17 aptX Lossless

  • Standard: Qualcomm proprietary (Snapdragon Sound)
  • Algorithm: Adaptive lossless/lossy hybrid — lossless at ~1 Mbps, lossy fallback at ~300 kbps
  • Bandwidth: ~1 Mbps for lossless mode — far exceeds LTE-M1 budget
  • CPU: High — designed for Qualcomm application processors, not MCUs
  • Licensing: Qualcomm proprietary — restricted to Qualcomm silicon ecosystem

Verdict: NOT FEASIBLE. The bandwidth requirement (~1 Mbps, about 125 KB/s) is roughly 3× the 40 KB/s LTE-M1 budget, the codec requires Qualcomm silicon, and no MCU implementation exists. For lossless needs, record to SD card and batch upload.


4. Bandwidth analysis table

| Codec | Config | Bitrate | KB/s | Fits LTE-M1? | Quality |
| --- | --- | --- | --- | --- | --- |
| Raw PCM 16-bit | 4ch @96 kHz | 6144 kbps | 768 KB/s | No | Perfect |
| Raw PCM 16-bit | 4ch @48 kHz | 3072 kbps | 384 KB/s | No | Perfect |
| FLAC | 4ch @96 kHz | ~3072 kbps | ~384 KB/s | No | Lossless |
| FLAC | 4ch @48 kHz | ~1536 kbps | ~192 KB/s | No | Lossless |
| IMA-ADPCM | 4ch @96 kHz | 1536 kbps | 192 KB/s | No | Fair |
| IMA-ADPCM | 4ch @48 kHz | 768 kbps | 96 KB/s | No (2× over) | Fair |
| aptX | 4ch @96 kHz | 1536 kbps | 192 KB/s | No | Very Good |
| aptX | 4ch @48 kHz | 768 kbps | 96 KB/s | No | Very Good |
| SBC | 4ch @48 kHz | ~600 kbps | ~75 KB/s | No | Fair |
| MP3 @128k | 4ch @48 kHz | 512 kbps | 64 KB/s | No (1.6× over; exceeds the 375 kbps best case) | Good |
| AAC-LC @128k | 4ch @96 kHz | 512 kbps | 64 KB/s | No (1.6× over; exceeds the 375 kbps best case) | Very Good |
| Opus SILK @64k | 4ch @48 kHz | 256 kbps | 32 KB/s | Yes | Excellent |
| XAP @80k | 4ch @96 kHz | 320 kbps | 40 KB/s | Yes | Excellent |
| XAP @64k | 4ch @96 kHz | 256 kbps | 32 KB/s | Yes | Very Good |
| HE-AAC @48k | 4ch @48 kHz | 192 kbps | 24 KB/s | Yes | Very Good |
| Speex @20k | 4ch @32 kHz | 80 kbps | 10 KB/s | Yes | Good (voice) |

Key finding: XAP at 80 kbps/ch is the only codec that simultaneously achieves 96 kHz support, fits LTE-M1 bandwidth, and can run on target MCUs.
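The bandwidth check behind this analysis is a one-line comparison against the ~40 KB/s payload ceiling from the requirements. The sketch below applies it to a few per-channel bitrates used in this study; the constant and function names are illustrative, and the check covers codec payload only, not headers or retransmissions.

```rust
/// Sustained audio payload ceiling from the requirements, in KB/s.
const PAYLOAD_BUDGET_KB_S: f64 = 40.0;

/// Total stream rate in KB/s for a given per-channel bitrate in kbps.
fn stream_kb_per_sec(channels: u32, kbps_per_channel: u32) -> f64 {
    (channels * kbps_per_channel) as f64 / 8.0
}

/// True when the codec payload alone stays within the budget.
fn fits_budget(channels: u32, kbps_per_channel: u32) -> bool {
    stream_kb_per_sec(channels, kbps_per_channel) <= PAYLOAD_BUDGET_KB_S
}

fn main() {
    let configs = [
        ("XAP @80k", 80),
        ("XAP @64k", 64),
        ("Opus SILK @64k", 64),
        ("MP3 / AAC-LC @128k", 128),
        ("HE-AAC @48k", 48),
    ];
    for (name, kbps) in configs {
        println!(
            "{}: {} KB/s, fits = {}",
            name,
            stream_kb_per_sec(4, kbps),
            fits_budget(4, kbps)
        );
    }
}
```

XAP at 80 kbps/ch lands exactly on the ceiling, so any sustained protocol overhead argues for the 64 kbps/ch operating point when the link is degraded.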


5. MCU platform feasibility matrix

CPU budgets assume worst-case single-core utilization. Dual-core MCUs (RP2350, ESP32-S3) can split encoder work across cores.

| Codec / Config | RP2350 (300 MIPS, 520 KB SRAM) | STM32F103 (72 MIPS, 20 KB) | STM32F411 (100 MIPS, 128 KB) | ESP32-S3 (480 MIPS, 512 KB+PSRAM) | nRF52840 (64 MIPS, 256 KB) |
| --- | --- | --- | --- | --- | --- |
| XAP 4ch @96 kHz (80 MIPS, 32 KB) | YES | NO (RAM) | Marginal | YES | NO |
| XAP 4ch @48 kHz (40 MIPS, 32 KB) | YES | NO (RAM) | YES | YES | Marginal |
| XAP 2ch @48 kHz (20 MIPS, 16 KB) | YES | YES | YES | YES | YES |
| Opus SILK 4ch @48 kHz (80 MIPS, 80 KB) | YES | NO | Marginal | YES | NO |
| SBC 4ch @48 kHz (20 MIPS, 16 KB) | YES | NO (RAM) | YES | YES | Marginal |
| IMA-ADPCM 4ch @96 kHz (< 8 MIPS, 4 KB) | YES | YES | YES | YES | YES |
| FLAC 4ch @96 kHz (~400 MIPS, 200 KB) | NO | NO | NO | Marginal | NO |
| AAC-LC 4ch @96 kHz (240 MIPS, 100 KB) | NO | NO | NO | Marginal | NO |

Notes

  • RP2350: Dual Cortex-M33 cores at 150 MHz each; 300 MIPS total. Each core includes the ARMv8-M DSP extension: single-cycle 32x32→64 MAC (SMLAL), dual 16x16 MAC (SMLAD/SMUAD), saturating arithmetic (QADD/QSUB/SSAT), and bit-field extract (SBFX/UBFX). These DSP instructions accelerate XAP's MDCT, spectral shaping, and FIR downsampling by 25-40%. With DSP-optimized XAP, 4ch @96 kHz drops from ~80 MIPS to ~56 MIPS, leaving ample headroom on a single core.
  • STM32F103: Cortex-M3 at 72 MHz, only 20 KB SRAM. No DSP extension, no FPU. Even XAP's 32 KB per 4ch encoder state exceeds available RAM. Maximum: 2ch XAP @48 kHz (16 KB state), or IMA-ADPCM (trivial CPU, < 1 KB state).
  • STM32F411: 100 MHz Cortex-M4F with single-precision FPU and full DSP extension. The FPU accelerates XAP floating-point paths, and the DSP instructions (the same set as on the M33) speed up fixed-point paths. With DSP-optimized XAP, 4ch @96 kHz uses ~56 MIPS, which is 56% CPU utilization against the 80% baseline estimate. The CMSIS-DSP library provides drop-in optimized FFT and FIR routines.
  • ESP32-S3: Dual Xtensa LX7 at 240 MHz (480 MIPS total) + 8 MB PSRAM. Features 128-bit SIMD via Processor Instruction Extensions (PIE), capable of processing 4x f32 or 8x i16 in parallel. PIE provides 2-4x speedup for FFT/MDCT batch operations. Hardware AES/SHA accelerators offload TLS overhead. Most capable platform — can run Opus CELT for 2ch, or XAP 4ch @96 kHz at under 20% CPU utilization with SIMD.
  • nRF52840: Cortex-M4F at 64 MHz with FPU and DSP extension. Adequate RAM (256 KB) but limited CPU. With DSP optimization, XAP 2ch @48 kHz uses ~14 MIPS (22% utilization) — comfortable. XAP 4ch @48 kHz (~28 MIPS, 44%) becomes feasible with DSP acceleration, though tight.
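The utilization percentages quoted in these notes follow from dividing the codec's MIPS requirement by what the MCU can supply, with a separate check that the encoder state fits in SRAM. The sketch below reproduces that arithmetic for the baseline XAP 4ch @96 kHz workload. The helper names are illustrative, and it deliberately does not assign YES/Marginal/NO verdicts, since the matrix above also reflects judgment about headroom that a single threshold would not capture.

```rust
/// CPU load as a percentage of the MIPS the MCU can supply.
fn cpu_utilization_pct(required_mips: u32, available_mips: u32) -> f64 {
    required_mips as f64 * 100.0 / available_mips as f64
}

/// True when the encoder state alone fits in SRAM. Stack, I/O buffers, and
/// the network stack are ignored, so a real budget needs extra margin.
fn ram_fits(required_kb: u32, sram_kb: u32) -> bool {
    required_kb <= sram_kb
}

fn main() {
    // (name, total MIPS, SRAM in KB) from the matrix header row.
    let mcus = [
        ("RP2350", 300, 520),
        ("STM32F103", 72, 20),
        ("STM32F411", 100, 128),
        ("ESP32-S3", 480, 512),
        ("nRF52840", 64, 256),
    ];
    // Baseline XAP 4ch @96 kHz: 80 MIPS, 32 KB encoder state.
    let (mips, ram_kb) = (80, 32);
    for (name, available, sram) in mcus {
        println!(
            "{}: {:.0}% CPU, RAM fits = {}",
            name,
            cpu_utilization_pct(mips, available),
            ram_fits(ram_kb, sram)
        );
    }
}
```

Swapping in the DSP-optimized figure of 56 MIPS gives the 56% quoted for the STM32F411, and 14 or 28 MIPS against 64 gives the nRF52840 percentages.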

CMSIS-DSP integration

ARM's CMSIS-DSP library provides hardware-optimized DSP routines for all Cortex-M cores. Key functions used by the XAP encoder:

| CMSIS-DSP Function | Used For | Speedup vs C |
| --- | --- | --- |
| arm_rfft_fast_f32 | MDCT/spectral analysis | 3-5x |
| arm_fir_f32 / arm_fir_q15 | FIR downsampling filter | 2-4x |
| arm_dot_prod_f32 | Inner products in quantization | 2-3x |
| arm_scale_f32 | Gain normalization | 2x |
| arm_fill_f32 / arm_copy_f32 | Buffer management | 1.5-2x |

The Xylolabs SDK links CMSIS-DSP on Cortex-M targets automatically when XYLOLABS_USE_CMSIS_DSP=1 is set in the build configuration.


6. DSP and hardware acceleration

Each target MCU offers hardware features that reduce codec CPU cost below the baseline MIPS estimates. These are not theoretical gains -- they reflect real instruction-level speedups from the silicon itself.

Cortex-M33 DSP extension (RP2350, nRF9160)

The DSP extension is an optional feature of the ARMv8-M Cortex-M33; it is present on both parts listed here and provides:

  • Single-cycle 32x32→64 MAC (SMLAL, UMLAL) -- the workhorse for FIR filter and MDCT accumulation
  • Dual 16x16→32 MAC (SMLAD, SMUAD) -- processes two 16-bit sample pairs per cycle, doubling throughput for 16-bit audio paths
  • Saturating arithmetic (QADD, QSUB, SSAT, USAT) -- eliminates branch-based clipping in ADPCM step-size adaptation and XAP quantization
  • Bit-field extract (SBFX, UBFX) -- efficient unpacking for XMBP binary protocol parsing
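The kind of loop these instructions target is a fixed-point multiply-accumulate followed by a saturating narrow. The sketch below shows one in portable Rust. It is not SDK or CMSIS-DSP code, and whether the compiler lowers it to SMLAD/SSAT depends on the target and optimization level; the function name is illustrative.

```rust
/// Q15 dot product: multiply sample/coefficient pairs, accumulate in 64 bits,
/// then round and saturate back to Q15.
fn dot_q15(samples: &[i16], coeffs: &[i16]) -> i16 {
    let acc: i64 = samples
        .iter()
        .zip(coeffs)
        .map(|(&s, &c)| s as i64 * c as i64)
        .sum();
    // Products are Q30; add half an LSB for rounding, then shift to Q15.
    let q15 = (acc + (1 << 14)) >> 15;
    // Saturate without branching on the sign in user code.
    q15.clamp(i16::MIN as i64, i16::MAX as i64) as i16
}

fn main() {
    let half = 16384i16; // 0.5 in Q15
    println!("0.5 * 0.5 = {} (Q15)", dot_q15(&[half], &[half]));
    // Saturating add clips at full scale instead of wrapping around.
    println!("MAX + 1 saturates to {}", i16::MAX.saturating_add(1));
}
```

Wrapping arithmetic would turn an overflow into a full-scale sign flip, which is audible as a click; saturation limits the error to clipping.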

Impact on codecs:

| Codec | Baseline MIPS/ch @96 kHz | With DSP | Reduction |
| --- | --- | --- | --- |
| XAP | ~20 | ~14 | -30% |
| IMA-ADPCM | < 2 | < 1 | -50% |
| SBC | ~10 | ~7 | -30% |

Cortex-M4F FPU + DSP (STM32F411, STM32WB55, nRF52840)

Includes everything from the M33 DSP extension plus:

  • Single-precision FPU -- hardware float multiply-accumulate in 1-3 cycles vs 10-20 cycle software emulation
  • Hardware integer divide (SDIV, UDIV) -- 2-12 cycles vs 20-100 cycle software divide (not unique to the M4F: the Cortex-M3 and Cortex-M33 provide these instructions as well)
  • FPU impact: XAP has both fixed-point and floating-point encoder paths. On M4F, the floating-point path can actually be faster than fixed-point because the FPU handles normalization, scaling, and spectral analysis natively

Impact on codecs:

| Codec | Baseline MIPS/ch @96 kHz | With FPU+DSP | Reduction |
| --- | --- | --- | --- |
| XAP (float path) | ~20 | ~12 | -40% |
| XAP (fixed path) | ~20 | ~14 | -30% |
| Opus SILK | ~50 | ~35 | -30% |

ESP32-S3 vector extensions (PIE)

The Xtensa LX7 cores include Processor Instruction Extensions (PIE):

  • 128-bit SIMD -- process 4x f32 or 8x i16 in a single instruction
  • Dedicated vector registers -- 16 x 128-bit registers for parallel DSP pipelines
  • Hardware AES/SHA -- offloads TLS handshake and encryption from the CPU, freeing cycles for codec work
  • PSRAM DMA -- large audio buffers can live in PSRAM with DMA transfer to/from internal SRAM for processing

Impact on codecs:

| Codec | Baseline MIPS/ch @96 kHz | With PIE SIMD | Reduction |
| --- | --- | --- | --- |
| XAP | ~20 | ~8 | -60% |
| Opus CELT | ~100 | ~50 | -50% |
| FLAC | ~100 | ~60 | -40% |

The ESP32-S3 with PIE can run codecs that are infeasible on other platforms -- Opus CELT 2ch @48 kHz, or even experimental FLAC 2ch @48 kHz for lossless capture.

Rust SDK DSP feature flags

The Rust SDK (crates/xylolabs-sdk/) exposes DSP acceleration via Cargo feature flags:

| Feature | Target | Effect |
| --- | --- | --- |
| cmsis-dsp | Cortex-M33 (RP2350, nRF9160), Cortex-M4F (STM32F411, nRF52840) | Enables CMSIS-DSP optimized MDCT and FIR paths |
| esp32-simd | ESP32-S3 (Xtensa LX7) | Enables PIE SIMD optimized MDCT paths |

```toml
# Cargo.toml -- enable DSP acceleration for RP2350
[dependencies]
xylolabs-sdk = { path = "../../crates/xylolabs-sdk", features = ["xap", "cmsis-dsp"] }
```

```toml
# Cargo.toml -- enable SIMD acceleration for ESP32-S3
[dependencies]
xylolabs-sdk = { path = "../../crates/xylolabs-sdk", features = ["xap", "esp32-simd"] }
```

For the C SDK, set XYLOLABS_USE_CMSIS_DSP=1 or XYLOLABS_USE_ESP32S3_SIMD=1 in the build configuration (auto-detected from compiler flags when not set explicitly).

See Performance Profile for per-target CPU budgets and DSP optimization details.

Practical recommendations

  1. Always enable CMSIS-DSP on Cortex-M targets: Rust SDK features = ["cmsis-dsp"], C SDK XYLOLABS_USE_CMSIS_DSP=1. The optimized FFT and FIR routines are drop-in replacements that require no code changes.
  2. Use the XAP floating-point path on Cortex-M4F targets (STM32F411). The FPU makes it faster than the fixed-point path despite the overhead of float conversion.
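  3. Default to the XAP fixed-point path on Cortex-M33 targets (RP2350), where the DSP extension provides significant speedup for MAC-heavy operations. The RP2350's M33 cores also include a single-precision FPU, so benchmark the floating-point path before ruling it out.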
  4. Enable PIE intrinsics on ESP32-S3 builds: Rust SDK features = ["esp32-simd"], C SDK XYLOLABS_USE_ESP32S3_SIMD=1. The ESP-IDF compiler auto-vectorizes some loops, but explicit PIE intrinsics in the XAP hot path yield an additional 20-30% gain.

7. Recommendation

Primary choice: XAP at 64–80 kbps/ch

XAP is the optimal codec for this use case across all evaluation dimensions:

| Criterion | XAP Result |
| --- | --- |
| Native 96 kHz support | Yes |
| 4ch @96 kHz fits LTE-M1 (@80k) | 40 KB/s (yes) |
| CPU for 4ch @96 kHz | 80 MIPS (fits RP2350, ESP32-S3) |
| RAM for 4ch | 32 KB (fits all target MCUs) |
| Licensing | Xylolabs proprietary (patent pending) |
| Latency | 7.5–10 ms frame (low latency) |
| SDK integration | XAP codec module in Rust SDK (crates/xylolabs-sdk/) |

Recommended operating point:

  • Bitrate: 80 kbps/ch for highest quality; reduce to 64 kbps/ch to save bandwidth at acceptable quality loss
  • Frame size: 10 ms (lower CPU than 7.5 ms, acceptable latency)
  • Total stream: 4 × 80 kbps = 320 kbps = 40 KB/s

Fallback: IMA-ADPCM

When XAP integration is not possible (resource-constrained platforms, STM32F103, nRF52840 with 4ch):

  • CPU: trivial (< 1 MIPS/ch)
  • RAM: < 1 KB/ch
  • Tradeoff: 4:1 compression only — requires reducing channel count or sample rate to fit LTE-M1
  • 2ch @24 kHz ADPCM = 192 kbps = 24 KB/s — fits budget

For 48 kHz-only use: Opus SILK

When the application permits 48 kHz sample rate and maximum codec quality is required:

  • Best-in-class quality for 8–48 kHz audio
  • Rich server-side ecosystem (FFmpeg, libopus, web browser native)
  • BSD license — royalty-free
  • SILK mode: 80 MIPS for 4ch @48 kHz — fits RP2350 and ESP32-S3

8. Server-side decode support matrix

| Codec | FFmpeg Support | Decode Library | Storage Format | Notes |
| --- | --- | --- | --- | --- |
| XAP | Yes (XAP decoder library, FFmpeg 7.1+) | XAP decoder library (C, Apache 2.0) | Transcode to FLAC/WAV | Build FFmpeg with XAP decoder support; supports 7.5/10 ms frames + HR mode (48/96 kHz) |
| Opus | Yes (libopus) | Native FFmpeg | Direct web playback | Best-supported codec on modern platforms |
| MP3 | Yes (libmp3lame) | Native FFmpeg | Direct web playback | Mature, universal support |
| AAC-LC | Yes (libfdk_aac) | Native FFmpeg | Direct web playback | fdk-aac GPL issue; use non-free FFmpeg build |
| HE-AAC | Yes (libfdk_aac) | Native FFmpeg | Direct web playback | Same fdk-aac requirement |
| SBC | Yes (built-in) | Native FFmpeg | Needs transcode to AAC/Opus | Lower quality; transcode for storage |
| Speex | Yes (libspeex) | libspeex (C, BSD) | Transcode to Opus/WAV | Build FFmpeg with --enable-libspeex; voice-only, max 32 kHz |
| IMA-ADPCM | Yes (adpcm_ima_wav) | Native FFmpeg | WAV container | Trivial decode, built into all FFmpeg builds |
| FLAC | Yes (built-in) | Native FFmpeg | Direct lossless storage | Standard archival format |

XAP server integration

Recommended approach: decode XAP on the server through the Rust xap crate, which provides bindings to the XAP decoder library. The C API is the alternative when the library has to be linked directly for lower-level control.

```rust
// Rust (recommended)
// Server-side XAP decode using the xap crate (bindings to XAP decoder library).
// frame_us, sample_rate, and xap_frame come from the surrounding ingest code.
use xap::{Decoder, PcmFormat};

let mut decoder = Decoder::new(frame_us, sample_rate);
let pcm_output = decoder.decode::<i16>(xap_frame).unwrap();
```

Legacy C equivalent:

```c
// Server-side XAP decode (C example)
// mem, pcm_output, and stride are set up by the caller.
#include "xap.h"

xap_decoder_t decoder = xap_setup_decoder(frame_us, sample_rate, 0, mem);
xap_decode(decoder, xap_frame, xap_frame_len, XAP_PCM_FORMAT_S16,
           pcm_output, stride);
```

After decoding, write to FLAC or WAV for archival, or transcode to Opus for web delivery.
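For the WAV archival path, the container is simple enough to write by hand. The sketch below builds a canonical 44-byte PCM WAV header and prepends it to interleaved 16-bit samples. It is a standard-library-only illustration, not part of the Xylolabs ingest server; the function names are hypothetical. It uses the plain PCM format tag, which most tools accept for four channels, although WAVE_FORMAT_EXTENSIBLE is the formally recommended layout above two channels.

```rust
/// Canonical 44-byte RIFF/WAVE header for integer PCM.
fn wav_header(channels: u16, sample_rate: u32, bits_per_sample: u16, data_len: u32) -> [u8; 44] {
    let block_align = channels * bits_per_sample / 8; // bytes per multi-channel frame
    let byte_rate = sample_rate * block_align as u32;
    let mut h = [0u8; 44];
    h[0..4].copy_from_slice(b"RIFF");
    h[4..8].copy_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    h[8..12].copy_from_slice(b"WAVE");
    h[12..16].copy_from_slice(b"fmt ");
    h[16..20].copy_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    h[20..22].copy_from_slice(&1u16.to_le_bytes()); // format tag 1 = PCM
    h[22..24].copy_from_slice(&channels.to_le_bytes());
    h[24..28].copy_from_slice(&sample_rate.to_le_bytes());
    h[28..32].copy_from_slice(&byte_rate.to_le_bytes());
    h[32..34].copy_from_slice(&block_align.to_le_bytes());
    h[34..36].copy_from_slice(&bits_per_sample.to_le_bytes());
    h[36..40].copy_from_slice(b"data");
    h[40..44].copy_from_slice(&data_len.to_le_bytes());
    h
}

/// Header plus interleaved little-endian 16-bit samples.
fn wav_bytes(channels: u16, sample_rate: u32, pcm: &[i16]) -> Vec<u8> {
    let data_len = (pcm.len() * 2) as u32;
    let mut out = Vec::with_capacity(44 + pcm.len() * 2);
    out.extend_from_slice(&wav_header(channels, sample_rate, 16, data_len));
    for s in pcm {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}

fn main() {
    // One decoded 10 ms frame: 960 samples per channel, 4 channels interleaved.
    let pcm = vec![0i16; 4 * 960];
    let wav = wav_bytes(4, 96_000, &pcm);
    println!("{} bytes including the 44-byte header", wav.len());
}
```

The byte-rate field for 4ch, 96 kHz, 16-bit comes out at 768,000, matching the raw PCM baseline from Section 1. A long-running capture would need the two length fields patched when the file is closed, which this sketch leaves out.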


9. Implementation roadmap

Current state

  1. Rust SDK (crates/xylolabs-sdk/) is the recommended implementation with XAP codec module
  2. Legacy C SDK (sdk/c/common/src/xap_encoder.c) available but scheduled for deprecation
  3. IMA-ADPCM encoder is available as a fallback
  4. E2E burn-in tests validate XAP encoding stability (4 scenarios, 10-device concurrency, QEMU ARM emulation)
  5. Server-side XAP decoder integration pending

Planned steps

| Step | Action | Priority |
| --- | --- | --- |
| 1 | Integrate XAP decoder library into Axum ingest server | High |
| 2 | Define XMBP chunk type for XAP frames (codec ID, bitrate, frame_us fields) | High |
| 3 | Implement XAP → FLAC transcode pipeline for archival storage | High |
| 4 | Add Opus encoder to ESP32-S3 SDK target (sufficient CPU) | Medium |
| 5 | Deprecate legacy C SDK (sdk/c/) in favor of Rust SDK | Medium |

Codec ID assignment (XMBP protocol)

| Codec ID | Codec | Notes |
| --- | --- | --- |
| 0x01 | Raw PCM S16LE | Debug/development only |
| 0x02 | IMA-ADPCM | Fallback baseline |
| 0x03 | XAP | Primary production codec |
| 0x04 | Opus | ESP32-S3 / high-CPU platforms |
| 0x05 | Speex | Voice-only (max 32 kHz) |
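On the ingest side, these IDs map naturally onto a Rust enum with a checked conversion from the wire byte. The sketch below is an assumption about how that could look: the type and method names are hypothetical, and only the ID values come from the table above. The layout of the XMBP chunk itself is not defined here, so nothing about it is implied.

```rust
/// Codec identifiers as assigned in the XMBP codec ID table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum CodecId {
    RawPcmS16Le = 0x01,
    ImaAdpcm = 0x02,
    Xap = 0x03,
    Opus = 0x04,
    Speex = 0x05,
}

impl CodecId {
    /// Parse a wire byte; unknown IDs return None so the caller can reject the chunk.
    fn from_u8(value: u8) -> Option<CodecId> {
        match value {
            0x01 => Some(CodecId::RawPcmS16Le),
            0x02 => Some(CodecId::ImaAdpcm),
            0x03 => Some(CodecId::Xap),
            0x04 => Some(CodecId::Opus),
            0x05 => Some(CodecId::Speex),
            _ => None,
        }
    }
}

fn main() {
    for byte in [0x03u8, 0x02, 0x7F] {
        match CodecId::from_u8(byte) {
            Some(codec) => println!("0x{:02X} -> {:?}", byte, codec),
            None => println!("0x{:02X} -> unknown codec, chunk rejected", byte),
        }
    }
}
```

Returning None for unassigned values keeps the server from guessing at a decoder when a newer device firmware introduces an ID it does not know yet.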

References

  • RFC 6716 (Opus): https://www.rfc-editor.org/rfc/rfc6716
  • IMA ADPCM: IMA Digital Audio Focus and Technical Working Group, 1992
  • LTE-M1 (LTE Cat-M1) specifications: 3GPP TS 36.300, Release 13