Audio codec analysis for MCU-based industrial streaming

Xylolabs API: Codec Feasibility Study
Revision: 2026-03-22

PATENT PENDING — XAP (Xylolabs Audio Protocol) and XMBP (Xylolabs Metadata Binary Protocol) are proprietary technologies of Xylolabs Inc. Patent applications have been filed.

1. Overview

Use case

Industrial audio monitoring with four-channel microphone arrays (two stereo pairs) operating at a 96 kHz sample rate. Audio is captured on an embedded MCU, compressed in real time, and streamed over LTE-M1 to a cloud ingest server, where it is decoded, stored, and made available for analysis.

LTE-M1 bandwidth constraint

LTE-M1 (LTE Cat-M1) provides approximately 375 kbps uplink throughput in good conditions, degrading to ~100–200 kbps under typical field conditions. The resulting audio payload budget is:

Best case: ~47 KB/s (375 kbps ÷ 8, no overhead)
Practical target: ~30–40 KB/s (allowing for retransmissions, headers, and metadata; this assumes good coverage, because ~100–200 kbps of typical field throughput is only ~12–25 KB/s)

Requirements

Requirement	Value
Channels	4 (two stereo pairs)
Sample rate	96 kHz
Bit depth	16-bit (24-bit optional)
Encode location	MCU (no OS, bare-metal or RTOS)
Decode location	Cloud server (Linux, no resource constraints)
Max sustained bandwidth	~40 KB/s
Latency budget	< 100 ms end-to-end
Quality priority	Industrial monitoring — preserve full spectrum

Raw PCM baseline

4 channels × 96,000 samples/s × 16 bits = 6,144,000 bits/s = 768 KB/s

This is 19× over budget. Significant compression is required.
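The raw-rate arithmetic above is easy to rerun for other configurations (for example 24-bit samples or fewer channels). A minimal Rust sketch, using decimal units (1 KB = 1000 bytes) as this document does; the function names are illustrative and not part of the Xylolabs SDK:

```rust
/// Raw PCM data rate in bytes per second (decimal units, 1 KB = 1000 bytes).
pub fn pcm_bytes_per_sec(channels: u32, sample_rate_hz: u32, bits_per_sample: u32) -> u64 {
    channels as u64 * sample_rate_hz as u64 * bits_per_sample as u64 / 8
}

/// Compression ratio needed for a raw stream to fit a byte-rate budget.
pub fn required_ratio(raw_bytes_per_sec: u64, budget_bytes_per_sec: u64) -> f64 {
    raw_bytes_per_sec as f64 / budget_bytes_per_sec as f64
}
```

For 4 channels at 96 kHz and 16 bits this gives 768,000 B/s and, against the ~40 KB/s target, a required ratio of about 19.2:1. Switching to 24-bit samples raises the raw rate to 1,152,000 B/s (roughly 29:1).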

2. Codec comparison table

Codec	Type	MIPS/ch @48kHz	MIPS/ch @96kHz	RAM/ch	Compression	Quality	License	MCU Feasible?
XAP	Lossy	5–10	10–20	~8 KB	10:1	Excellent	Xylolabs proprietary (patent pending)	Yes
Opus (SILK)	Lossy	15–25	30–50	~20 KB	10:1+	Excellent	BSD (royalty-free)	Marginal
Opus (CELT)	Lossy	30–50	60–100	~30 KB	10:1+	Excellent	BSD (royalty-free)	No
MP3 (LAME)	Lossy	20–40	N/A (max 48 kHz)	~30 KB	10:1	Good	Patent-free (2017+)	Marginal @48 kHz
AAC-LC	Lossy	25–40	50–80	~25 KB	10:1	Very Good	ISO (Via Licensing)	No @96 kHz
HE-AAC v1	Lossy	30–50	60–100	~30 KB	15:1	Very Good	ISO (licensing required)	No
HE-AAC v2	Lossy	40–60	80–120	~40 KB	20:1	Good (stereo only)	ISO (licensing required)	No
xHE-AAC (USAC)	Lossy	50–80	N/A	~50 KB	25:1	Excellent	ISO (licensing required)	No
SBC	Lossy	3–5	6–10	~4 KB	4–5:1	Fair	Bluetooth SIG (royalty-free)	Yes
aptX	Lossy	5–8	10–16	~8 KB	4:1	Very Good	Qualcomm (licensing required)	Yes
aptX HD	Lossy	8–12	16–24	~12 KB	4:1	Excellent	Qualcomm (licensing required)	Yes (F411/ESP32-S3)
IMA-ADPCM	Lossy	< 1	< 2	< 1 KB	4:1	Fair	Public domain	Yes (all)
Speex	Lossy	3–8	N/A (max 32 kHz)	~4 KB	8–11:1	Good (speech)	BSD (royalty-free)	Yes (voice only)
G.722	Lossy	3–5	N/A (max 16 kHz)	~2 KB	4:1	Good (wideband)	ITU-T (royalty-free)	Yes (voice only)
3GPP EVS	Lossy	10–20	15–30	~15 KB	8–12:1	Excellent (voice)	3GPP (licensing required)	Marginal
AAC-ELD	Lossy	15–25	20–30	~20 KB	8–10:1	Very Good	ISO (licensing required)	No @96 kHz
aptX Lossless	Lossless	N/A	N/A	N/A	~2:1	Perfect	Qualcomm (licensing required)	No
FLAC	Lossless	30–50	60–100	~50 KB	2:1	Perfect	BSD (royalty-free)	No (too slow)
ALAC	Lossless	25–40	50–80	~40 KB	2:1	Perfect	Apache (royalty-free)	No

3. Detailed analysis per codec

3.1 XAP (Xylolabs Audio Protocol)

Sample rates: 8, 16, 24, 32, 44.1, 48, 96 kHz
Frame sizes: 7.5 ms or 10 ms
Bitrate range: 16–320 kbps per channel (encoder-configurable)
MCU implementation: XAP decoder library — open-source reference implementation, ~5,000 lines of C, zero external dependencies, fixed-point arithmetic available
4ch @96 kHz bandwidth: 4 × 80 kbps = 320 kbps = 40 KB/s — fits within LTE-M1 budget
CPU requirement: ~10 MIPS/ch @48 kHz, ~20 MIPS/ch @96 kHz → 80 MIPS total for 4ch @96 kHz → fits RP2350 (300 MIPS dual-core), fits ESP32-S3 (480 MIPS)
With DSP acceleration: ~7 MIPS/ch @48 kHz, ~14 MIPS/ch @96 kHz → 56 MIPS for 4ch @96 kHz using CMSIS-DSP optimized paths on Cortex-M33/M4F (see Section 6)
RAM: ~8 KB per channel encoder state → 32 KB for 4ch → fits all target MCUs
Latency: 7.5–10 ms algorithmic delay per frame
Quality: Near-transparent at 64 kbps/ch; excellent, broadcast-grade at 80 kbps/ch
Licensing: Xylolabs proprietary (patent pending)
Hardware acceleration: The XAP decoder library includes ARM-optimized fixed-point paths that use Cortex-M33/M4F DSP instructions (single-cycle MAC, saturating arithmetic). The MDCT, spectral analysis, and quantization loops see 25–40% speedup with DSP intrinsics.

Verdict: RECOMMENDED — best balance of quality, compression efficiency, CPU budget, RAM, and licensing for this use case. Already integrated in the Xylolabs Rust SDK (crates/xylolabs-sdk/) as the XAP codec module. DSP acceleration reduces CPU usage further on Cortex-M33/M4F platforms.

3.2 Opus

Standard: RFC 6716 (IETF)
Modes:
SILK: Low complexity, designed for speech; limited to narrowband through wideband (16 kHz maximum internal sample rate)
CELT: Higher complexity, full-band music audio, up to 48 kHz
Hybrid: SILK below 8 kHz combined with CELT above 8 kHz, for super-wideband and full-band audio in 10 or 20 ms frames
Sample rates: 8, 12, 16, 24, 48 kHz — no native 96 kHz support
SILK mode CPU: ~20 MIPS/ch → 80 MIPS for 4ch → fits RP2350, but audio bandwidth is limited to wideband (~8 kHz)
CELT mode CPU: ~40 MIPS/ch → 160 MIPS for 4ch → tight on RP2350, exceeds STM32
RAM (encoder): ~20–30 KB/ch → 80–120 KB for 4ch — tight on MCUs with limited SRAM
96 kHz limitation: The Opus encoder accepts only 8, 12, 16, 24, or 48 kHz input, so 96 kHz capture must be resampled to 48 kHz before encoding, discarding all content above 24 kHz.
Quality: Best-in-class for 48 kHz audio; indistinguishable from original at 96–128 kbps/ch in CELT mode
Licensing: BSD-style (royalty-free, no patent encumbrances)

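Verdict: GOOD for 48 kHz streams. Not suitable for true 96 kHz capture, because the input must be downsampled to 48 kHz and everything above 24 kHz is lost. If the application requirement is relaxed to 48 kHz, Opus in SILK mode is viable on RP2350, but SILK alone covers only wideband (~8 kHz) audio; full-band content needs hybrid or CELT mode at higher CPU cost.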
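Because the Opus encoder will not take 96 kHz input, a 96 kHz capture path has to decimate by 2 before encoding. The sketch below shows the shape of that step: a low-pass FIR followed by keeping every second sample. The 5-tap binomial kernel is an assumption chosen for readability; it places a null at 48 kHz but attenuates 24 kHz by only about 12 dB, so a production resampler would use a much longer half-band filter. The function name is illustrative.

```rust
/// 5-tap binomial low-pass kernel (1, 4, 6, 4, 1) / 16. Illustrative only: it has a null at the
/// old Nyquist frequency (48 kHz at a 96 kHz input rate) but only ~-12 dB at the new one (24 kHz).
const TAPS: [f32; 5] = [0.0625, 0.25, 0.375, 0.25, 0.0625];

/// Decimate by 2 (e.g. 96 kHz -> 48 kHz): low-pass filter, then keep every second sample.
/// Samples beyond the buffer edges are clamped to the nearest edge sample.
pub fn decimate_by_2(input: &[f32]) -> Vec<f32> {
    let half = (TAPS.len() / 2) as isize;
    let last = input.len() as isize - 1;
    let mut out = Vec::with_capacity((input.len() + 1) / 2);
    for n in (0..input.len()).step_by(2) {
        let mut acc = 0.0f32;
        for (k, &tap) in TAPS.iter().enumerate() {
            // Centered filter: tap k looks at input[n + k - 2], clamped to the buffer.
            let idx = (n as isize + k as isize - half).clamp(0, last) as usize;
            acc += tap * input[idx];
        }
        out.push(acc);
    }
    out
}
```

A constant (DC) input passes through unchanged because the taps sum to 1, while a tone at 48 kHz (alternating +1/-1 samples) is cancelled in the interior of the buffer.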

3.3 MP3 (MPEG-1/2 Audio Layer III)

Standard: ISO 11172-3 (MPEG-1) / ISO 13818-3 (MPEG-2)
Maximum sample rate: 48 kHz (MPEG-1) — no 96 kHz support
CPU (encoder, e.g., LAME): ~30 MIPS/ch → 120 MIPS for 4ch @48 kHz — tight on most MCUs
RAM: ~30 KB/ch — significant
Quality: Acceptable at 128–192 kbps/ch; audible artifacts at ≤ 96 kbps
Patents: All relevant patents expired by 2017; fully royalty-free
MCU implementations: Very few production-quality MCU MP3 encoders exist; LAME is not designed for embedded

Verdict: NOT RECOMMENDED — no 96 kHz support, high CPU and RAM requirements, dated codec superseded by XAP and Opus. Encoding on MCU is not practical.

3.4 AAC-LC (Advanced Audio Coding, Low Complexity)

Standard: ISO/IEC 14496-3 (MPEG-4 Audio)
Sample rates: Up to 96 kHz supported natively
CPU: ~30 MIPS/ch @48 kHz, ~60 MIPS/ch @96 kHz → 240 MIPS for 4ch, which leaves too little headroom on RP2350 (300 MIPS total) and far exceeds F411 and nRF52840
RAM: ~25 KB/ch → 100 KB for 4ch
Quality: Very good at 128 kbps; excellent at 192 kbps
Licensing: ISO/IEC patent pool managed by Via Licensing — royalty required per unit sold
MCU encoders: FDK-AAC (Fraunhofer) is the best-known freely available AAC encoder, but it is heavy and not optimized for bare-metal MCUs

Verdict: NOT RECOMMENDED for MCU — too CPU-heavy at 96 kHz, licensing costs are non-trivial for hardware products, and no well-tested MCU encoder libraries exist.

3.5 HE-AAC v1 (High Efficiency AAC + Spectral Band Replication)

Standard: ISO/IEC 14496-3
Enhancement: Spectral Band Replication (SBR) regenerates high frequencies from low-frequency data, enabling high quality at half the bitrate of AAC-LC
CPU: Higher than AAC-LC due to SBR analysis — ~50 MIPS/ch @48 kHz
Best operating range: 48–80 kbps/ch (delivers AAC-LC @96 kbps quality at half the bitrate)
Licensing: ISO/IEC + Dolby patent pool — royalty required
MCU encoder complexity: SBR encoder analysis is computationally intensive and does not fit in typical MCU SRAM

Verdict: NOT FEASIBLE on MCU — encoder is significantly more complex than AAC-LC. Server-side decode is fine (FFmpeg), but encoding on embedded hardware is not practical.

3.6 HE-AAC v2 (HE-AAC v1 + Parametric Stereo)

Standard: ISO/IEC 14496-3
Enhancement: Adds Parametric Stereo (PS) on top of HE-AAC v1, encoding stereo as mono + steering parameters
CPU: Highest among AAC variants — ~60–80 MIPS/ch equivalent
Best at: 24–48 kbps for stereo content (remarkable compression for speech and music)
Limitation: Designed for stereo (2-channel) only — not applicable to 4-channel configurations
Quality at low bitrate: Very good for speech; stereo imaging degrades at low bitrates

Verdict: NOT FEASIBLE — too computationally intensive for MCU encoding, stereo-only limitation excludes 4-channel use case.

3.7 xHE-AAC / USAC (Extended HE-AAC / Unified Speech and Audio Coding)

Standard: ISO/IEC 23003-3 (MPEG-D)
Technology: Combines ACELP (speech coding) with MDCT (music coding), with smooth switching and advanced noise shaping
CPU: Very high — designed for ARM Cortex-A class processors, not microcontrollers
Quality: State of the art; near-transparent at 32 kbps stereo
Licensing: ISO/IEC patent pool — royalty required

Verdict: NOT FEASIBLE on any MCU — encoder requires a full-featured CPU with OS-level memory management. Not relevant for this use case.

3.8 SBC (Subband Coding)

Standard: Bluetooth A2DP specification
Algorithm: Simple 4- or 8-subband filter bank with adaptive bit allocation and scale-factor-based PCM quantization of each subband
CPU: Very low — ~5 MIPS/ch, trivial even on STM32F103
Compression: Only 4–5:1, less efficient than XAP, so it needs a higher bitrate for comparable quality
RAM: ~4 KB/ch — excellent
Quality: Adequate for Bluetooth speakers; not suitable for high-fidelity archival or spectrum analysis
Licensing: Royalty-free under Bluetooth SIG terms

Verdict: FEASIBLE but strictly inferior to XAP in every dimension except simplicity. SBC was designed as a lowest-common-denominator codec before XAP existed. Use XAP instead when available; use SBC only as a final fallback if XAP cannot be integrated.

3.9 aptX / aptX HD

Standard: Qualcomm proprietary
Algorithm: ADPCM with enhanced subband processing
aptX CPU: ~8 MIPS/ch → 32 MIPS for 4ch — very feasible
aptX HD CPU: ~12 MIPS/ch → 48 MIPS for 4ch — feasible on most MCUs
Compression: 4:1 (same ratio as ADPCM but significantly better perceptual quality)
aptX quality: Very good — comparable to 16-bit CD quality
aptX HD quality: Near-lossless for human perception; 24-bit input support
RAM: ~8–12 KB/ch — manageable
Licensing: Qualcomm license agreement required — cost-prohibitive for custom hardware without Qualcomm silicon, and SDK access is restricted

Verdict: GOOD quality, impractical licensing. The Qualcomm licensing model ties aptX to their chipsets commercially. Not suitable for open custom hardware development.

3.10 IMA-ADPCM (Interactive Multimedia Association Adaptive Differential PCM)

Standard: IMA/DVI ADPCM specification
Algorithm: Encodes sample-to-sample deltas using a 4-bit quantizer with adaptive step size
CPU: Trivial (< 1 MIPS/ch @48 kHz, < 2 MIPS/ch @96 kHz) → < 8 MIPS for 4ch @96 kHz
Compression: Fixed 4:1 (16-bit input → 4-bit output)
RAM: < 1 KB/ch — fits on any MCU
Quality: Fair — audible quantization noise, high-frequency artifacts, no psychoacoustic modeling
4ch @96 kHz bandwidth: 768 KB/s ÷ 4 = 192 KB/s — still over the LTE-M1 budget
4ch @48 kHz bandwidth: 384 KB/s ÷ 4 = 96 KB/s — still over budget
Licensing: Public domain

Verdict: BASELINE FALLBACK — always feasible computationally, runs on any MCU. However, 4:1 compression is insufficient for the 96 kHz / 4-channel use case over LTE-M1. Only viable if the sample rate is further reduced (e.g., 2ch @24 kHz for voice-only monitoring).
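The IMA-ADPCM algorithm is small enough to show in full. The sketch below follows the standard IMA/DVI step-size and index tables; the struct and method names are illustrative and are not the API of the SDK's fallback encoder. Encoder and decoder share one state-update routine, which is what keeps the two sides in lockstep.

```rust
/// Step-index adjustment per 4-bit code (entries 8-15 mirror 0-7; bit 3 is the sign bit).
const INDEX_TABLE: [i32; 16] = [-1, -1, -1, -1, 2, 4, 6, 8, -1, -1, -1, -1, 2, 4, 6, 8];

/// Standard IMA/DVI ADPCM quantizer step sizes (89 entries).
const STEP_TABLE: [i32; 89] = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
];

/// Shared encoder/decoder state: current predicted sample and step-table index.
#[derive(Default)]
pub struct ImaAdpcm {
    predictor: i32,
    index: i32,
}

impl ImaAdpcm {
    /// Apply one 4-bit code: reconstruct the sample and adapt the step size.
    fn update(&mut self, code: u8) -> i16 {
        let step = STEP_TABLE[self.index as usize];
        let mut delta = step >> 3;
        if code & 4 != 0 { delta += step; }
        if code & 2 != 0 { delta += step >> 1; }
        if code & 1 != 0 { delta += step >> 2; }
        self.predictor += if code & 8 != 0 { -delta } else { delta };
        self.predictor = self.predictor.clamp(i16::MIN as i32, i16::MAX as i32);
        self.index = (self.index + INDEX_TABLE[code as usize]).clamp(0, 88);
        self.predictor as i16
    }

    /// Quantize the difference between `sample` and the prediction to a 4-bit code.
    pub fn encode(&mut self, sample: i16) -> u8 {
        let step = STEP_TABLE[self.index as usize];
        let mut diff = sample as i32 - self.predictor;
        let mut code = 0u8;
        if diff < 0 {
            code = 8;
            diff = -diff;
        }
        if diff >= step { code |= 4; diff -= step; }
        if diff >= step >> 1 { code |= 2; diff -= step >> 1; }
        if diff >= step >> 2 { code |= 1; }
        self.update(code); // keep the encoder's prediction identical to the decoder's
        code
    }

    /// Reconstruct one sample from a 4-bit code.
    pub fn decode(&mut self, code: u8) -> i16 {
        self.update(code & 0x0F)
    }
}
```

Packing two codes per byte gives the fixed 4:1 ratio quoted above. Container formats such as IMA ADPCM in WAV also store the predictor and step index at the start of each block, so a decoder can resynchronize after a lost packet.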

3.11 Speex

Standard: Xiph.Org open specification (RTP payload format defined in RFC 5574)
Algorithm: CELP (Code-Excited Linear Prediction)
Maximum sample rate: 32 kHz (ultra-wideband) — no 48 kHz or 96 kHz support
Bitrate: 2.15–44.2 kbps (variable, mode-dependent)
CPU: ~3–8 MIPS/ch (narrowband to ultra-wideband)
RAM: ~4 KB/ch
Quality: Good for speech; includes noise suppression, echo cancellation, and VAD (Voice Activity Detection) — features absent from G.722
Licensing: BSD — fully royalty-free, no patent encumbrance
Note: The Speex project officially recommends Opus as its successor for new deployments. However, Speex remains a valid choice for MCU voice-only use cases where Opus is too heavy (>30 MIPS/ch) and G.722's 16 kHz ceiling is limiting.

Verdict: VOICE-ONLY ALTERNATIVE — viable for speech monitoring at 8–32 kHz with better compression than G.722 (8–11:1 vs 4:1) and built-in noise suppression. Superseded by Opus for general audio; XAP remains the recommendation for wideband industrial audio.

3.12 G.722

Standard: ITU-T G.722
Algorithm: Subband ADPCM, two-subband (low and high)
Maximum sample rate: 16 kHz (wideband voice) — no 48 kHz or 96 kHz support
CPU: ~5 MIPS/ch
Quality: Good for voice; completely unsuitable for full-band audio
Licensing: ITU-T — royalty-free

Verdict: VOICE-ONLY. Useful only if the application is wideband (50 Hz–7 kHz) speech monitoring; not applicable to 96 kHz industrial audio.

3.13 FLAC (Free Lossless Audio Codec)

Standard: Xiph.Org open specification
Algorithm: Linear predictive coding + Rice entropy coding
CPU: ~60–100 MIPS/ch @96 kHz → ~400 MIPS for 4ch — far exceeds all target MCUs
Compression: ~2:1 (lossless, content-dependent)
RAM: ~50 KB/ch — significant
Bandwidth for 4ch @96 kHz: ~384 KB/s — still 10× over LTE-M1 budget
Quality: Perfect — mathematically lossless

Verdict: NOT FEASIBLE. Both the CPU cost and the resulting bandwidth exceed the constraints. FLAC is an excellent server-side format for archival storage and transcoding, but it is not a viable on-device encoder for this embedded use case.
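FLAC's second stage, Rice coding of the prediction residual, shows where its savings come from: each residual costs a unary quotient plus k raw bits, so the gain depends entirely on how small the residuals are. The sketch below illustrates FLAC-style zig-zag mapping, Rice coding, and a brute-force search for the parameter k; it is an illustration of the technique, not a FLAC encoder (there is no LPC stage, partitioning, or bitstream framing), and the function names are illustrative.

```rust
/// FLAC-style zig-zag fold of a signed residual: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4.
pub fn zigzag(r: i32) -> u32 {
    ((r << 1) ^ (r >> 31)) as u32
}

/// Bits spent on one residual by a Rice code with parameter k:
/// quotient in unary (q zero bits plus a terminating 1) followed by k low-order bits.
pub fn rice_bits(r: i32, k: u32) -> u64 {
    (zigzag(r) >> k) as u64 + 1 + k as u64
}

/// Encode residuals as a bit vector (true = 1): q zeros, a 1, then the k LSBs MSB-first.
pub fn rice_encode(residuals: &[i32], k: u32) -> Vec<bool> {
    let mut bits = Vec::new();
    for &r in residuals {
        let u = zigzag(r);
        bits.extend(std::iter::repeat(false).take((u >> k) as usize));
        bits.push(true);
        for i in (0..k).rev() {
            bits.push((u >> i) & 1 == 1);
        }
    }
    bits
}

/// Choose the k in 0..=14 (FLAC's 4-bit Rice parameter; 15 is the escape code)
/// that minimizes the total coded size of a block of residuals.
pub fn best_rice_k(residuals: &[i32]) -> u32 {
    (0..=14u32)
        .min_by_key(|&k| residuals.iter().map(|&r| rice_bits(r, k)).sum::<u64>())
        .unwrap()
}
```

Small residuals from a good predictor favor a small k and cost only a few bits each; broadband, noise-like content leaves larger residuals, which is one reason lossless ratios on such material stay near 2:1.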

3.14 ALAC (Apple Lossless Audio Codec)

Standard: Apple open-sourced, Apache 2.0 license
Algorithm: Similar to FLAC; integer polynomial predictor + Rice coding
CPU: ~50–80 MIPS/ch @96 kHz → ~320 MIPS for 4ch — exceeds most MCUs
RAM: ~40 KB/ch
Quality: Perfect (lossless)
Licensing: Apache 2.0 (royalty-free)

Verdict: NOT FEASIBLE — same constraints as FLAC. No benefit over FLAC for this use case and less tooling support on Linux servers.

3.15 3GPP EVS (Enhanced Voice Services)

Standard: 3GPP TS 26.441 (Release 12+)
Sample rates: 8, 16, 32, 48 kHz — no native 96 kHz support
Bitrate: 5.9–128 kbps (mode-dependent)
CPU: ~15–30 MIPS/ch — moderate complexity
RAM: ~15 KB/ch
Quality: Excellent for voice — state-of-the-art speech codec with super-wideband support
Licensing: 3GPP patent pool — per-unit royalty required, complex licensing terms
MCU feasibility: Marginal on RP2350 for 4ch @48 kHz (~120 MIPS), but no 96 kHz support eliminates it from consideration

Verdict: NOT RECOMMENDED — excellent voice codec but limited to 48 kHz maximum, requires 3GPP licensing, and offers no advantage over XAP for industrial wideband audio monitoring at 96 kHz.

3.16 AAC-ELD (Advanced Audio Coding, Enhanced Low Delay)

Standard: ISO/IEC 14496-3 (MPEG-4 Audio, low-delay profile)
Algorithm: AAC-LC core with low-delay MDCT window and optional SBR
CPU: ~20–30 MIPS/ch @96 kHz — lower latency than AAC-LC but similar computational cost
RAM: ~20 KB/ch
Latency: ~15–30 ms (significantly lower than standard AAC-LC)
Quality: Very good; designed for real-time communication (e.g., FaceTime)
Licensing: ISO/IEC patent pool — royalty required

Verdict: NOT FEASIBLE for MCU @96 kHz — 4ch @96 kHz requires ~80–120 MIPS, tight on RP2350 and exceeding most single-core MCUs. Licensing cost and lack of MCU-optimized encoders make it impractical. XAP achieves similar low-latency goals with lower complexity.

3.17 aptX Lossless

Standard: Qualcomm proprietary (Snapdragon Sound)
Algorithm: Adaptive lossless/lossy hybrid — lossless at ~1 Mbps, lossy fallback at ~300 kbps
Bandwidth: ~1 Mbps for lossless mode — far exceeds LTE-M1 budget
CPU: High — designed for Qualcomm application processors, not MCUs
Licensing: Qualcomm proprietary — restricted to Qualcomm silicon ecosystem

Verdict: NOT FEASIBLE. The ~1 Mbps bandwidth requirement is roughly 3× LTE-M1's ~375 kbps peak uplink, it requires Qualcomm silicon, and no MCU implementation exists. For lossless needs, record to SD card and batch upload.

4. Bandwidth analysis table

Codec	Config	Bitrate	KB/s	Fits LTE-M1?	Quality
Raw PCM 16-bit	4ch @96 kHz	6144 kbps	768 KB/s	No	Perfect
Raw PCM 16-bit	4ch @48 kHz	3072 kbps	384 KB/s	No	Perfect
FLAC	4ch @96 kHz	~3072 kbps	~384 KB/s	No	Lossless
FLAC	4ch @48 kHz	~1536 kbps	~192 KB/s	No	Lossless
IMA-ADPCM	4ch @96 kHz	1536 kbps	192 KB/s	No	Fair
IMA-ADPCM	4ch @48 kHz	768 kbps	96 KB/s	No (2× over)	Fair
aptX	4ch @96 kHz	1536 kbps	192 KB/s	No	Very Good
aptX	4ch @48 kHz	768 kbps	96 KB/s	No	Very Good
SBC	4ch @48 kHz	~600 kbps	~77 KB/s	No	Fair
MP3 @128k	4ch @48 kHz	512 kbps	64 KB/s	No (exceeds 375 kbps peak)	Good
AAC-LC @128k	4ch @96 kHz	512 kbps	64 KB/s	No (exceeds 375 kbps peak)	Very Good
Opus SILK @64k	4ch @48 kHz	256 kbps	32 KB/s	YES	Excellent
XAP @80k	4ch @96 kHz	320 kbps	40 KB/s	YES	Excellent
XAP @64k	4ch @96 kHz	256 kbps	32 KB/s	YES	Very Good
HE-AAC @48k	4ch @48 kHz	192 kbps	24 KB/s	Yes	Very Good
Speex @20k	4ch @32 kHz	80 kbps	10 KB/s	YES	Good (voice)

Key finding: XAP at 80 kbps/ch is the only codec that simultaneously achieves 96 kHz support, fits LTE-M1 bandwidth, and can run on target MCUs.
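Every row of the table reduces to channels × per-channel bitrate compared against the Section 1 budgets. A small sketch of that check, with the 375 kbps LTE-M1 peak and the ~40 KB/s (320 kbps) practical target hard-coded from this document; the type and function names are illustrative:

```rust
/// Budgets from Section 1 (decimal units): LTE-M1 peak uplink and the practical audio target.
const LTE_M1_PEAK_KBPS: u32 = 375;
const PRACTICAL_TARGET_KBPS: u32 = 320; // ~40 KB/s

#[derive(Debug, PartialEq)]
pub enum Fit {
    Yes,
    OverPracticalTarget,
    OverPeak,
}

/// Aggregate stream bitrate for a multi-channel configuration.
pub fn total_kbps(channels: u32, kbps_per_channel: u32) -> u32 {
    channels * kbps_per_channel
}

/// Convert kbps to KB/s (8 bits per byte, decimal units).
pub fn kb_per_sec(kbps: u32) -> f64 {
    kbps as f64 / 8.0
}

/// Classify a stream against the practical target and the hard LTE-M1 peak.
pub fn classify(total_kbps: u32) -> Fit {
    if total_kbps > LTE_M1_PEAK_KBPS {
        Fit::OverPeak
    } else if total_kbps > PRACTICAL_TARGET_KBPS {
        Fit::OverPracticalTarget
    } else {
        Fit::Yes
    }
}
```

XAP at 80 kbps/ch lands exactly on the 320 kbps (40 KB/s) target, while any 4-channel configuration at 128 kbps/ch (512 kbps) exceeds even the best-case peak.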

5. MCU platform feasibility matrix

CPU budgets assume worst-case single-core utilization. Dual-core MCUs (RP2350, ESP32-S3) can split encoder work across cores.

Codec / Config	RP2350 (300 MIPS, 520 KB SRAM)	STM32F103 (72 MIPS, 20 KB)	STM32F411 (100 MIPS, 128 KB)	ESP32-S3 (480 MIPS, 512 KB+PSRAM)	nRF52840 (64 MIPS, 256 KB)
XAP 4ch @96 kHz (80 MIPS, 32 KB)	YES	NO (RAM)	Marginal	YES	NO
XAP 4ch @48 kHz (40 MIPS, 32 KB)	YES	NO (RAM)	YES	YES	Marginal
XAP 2ch @48 kHz (20 MIPS, 16 KB)	YES	YES	YES	YES	YES
Opus SILK 4ch @48 kHz (80 MIPS, 80 KB)	YES	NO	Marginal	YES	NO
SBC 4ch @48 kHz (20 MIPS, 16 KB)	YES	NO (RAM)	YES	YES	Marginal
IMA-ADPCM 4ch @96 kHz (< 8 MIPS, 4 KB)	YES	YES	YES	YES	YES
FLAC 4ch @96 kHz (~400 MIPS, 200 KB)	NO	NO	NO	Marginal	NO
AAC-LC 4ch @96 kHz (240 MIPS, 100 KB)	NO	NO	NO	Marginal	NO

Notes

RP2350: Dual Cortex-M33 cores at 150 MHz each; 300 MIPS total. Each core includes a single-precision FPU and the ARMv8-M DSP extension: single-cycle 32x32→64 MAC (SMLAL), dual 16x16 multiply-accumulate and multiply-add (SMLAD/SMUAD), and saturating arithmetic (QADD/QSUB/SSAT); bit-field extract (SBFX/UBFX) is also available from the base instruction set. These instructions accelerate XAP's MDCT, spectral shaping, and FIR downsampling by 25-40%. With DSP-optimized XAP, 4ch @96 kHz drops from ~80 MIPS to ~56 MIPS, leaving ample headroom on a single core.
STM32F103: Cortex-M3 at 72 MHz, only 20 KB SRAM. No DSP extension, no FPU. Even XAP's 32 KB per 4ch encoder state exceeds available RAM. Maximum: 2ch XAP @48 kHz (16 KB state), or IMA-ADPCM (trivial CPU, < 1 KB state).
STM32F411: 100 MHz Cortex-M4F with single-precision FPU and the full DSP extension. The FPU accelerates XAP floating-point paths, and the DSP instructions (the same set as on the M33) speed up fixed-point paths. With DSP-optimized XAP, 4ch @96 kHz uses ~56 MIPS, feasible at 56% CPU utilization and a significant improvement over the 80% baseline estimate. The CMSIS-DSP library provides drop-in optimized FFT and FIR routines.
ESP32-S3: Dual Xtensa LX7 at 240 MHz (480 MIPS total) + 8 MB PSRAM. Features 128-bit SIMD via Processor Instruction Extensions (PIE), capable of processing 4x f32 or 8x i16 in parallel. PIE provides 2-4x speedup for FFT/MDCT batch operations. Hardware AES/SHA accelerators offload TLS overhead. Most capable platform — can run Opus CELT for 2ch, or XAP 4ch @96 kHz at under 20% CPU utilization with SIMD.
nRF52840: Cortex-M4F at 64 MHz with FPU and DSP extension. Adequate RAM (256 KB) but limited CPU. With DSP optimization, XAP 2ch @48 kHz uses ~14 MIPS (22% utilization) — comfortable. XAP 4ch @48 kHz (~28 MIPS, 44%) becomes feasible with DSP acceleration, though tight.
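The utilization percentages in these notes follow from channels × per-channel MIPS ÷ platform MIPS, using the DSP-optimized estimates (~14 MIPS/ch at 96 kHz, ~7 MIPS/ch at 48 kHz). A short sketch that reproduces them; the platform figures are copied from the matrix above and the function names are illustrative:

```rust
/// Total MIPS per platform as listed in the feasibility matrix. Dual-core parts are summed
/// across both cores, so an encoder pinned to one core sees half of the RP2350/ESP32-S3 figure.
const PLATFORMS: [(&str, f64); 5] = [
    ("RP2350", 300.0),
    ("STM32F103", 72.0),
    ("STM32F411", 100.0),
    ("ESP32-S3", 480.0),
    ("nRF52840", 64.0),
];

/// CPU utilization (percent) for `channels` encoder instances at `mips_per_channel` each.
pub fn utilization_pct(channels: u32, mips_per_channel: f64, platform_mips: f64) -> f64 {
    100.0 * channels as f64 * mips_per_channel / platform_mips
}

/// Utilization of one codec configuration on every platform in the matrix.
pub fn utilization_table(channels: u32, mips_per_channel: f64) -> Vec<(&'static str, f64)> {
    PLATFORMS
        .iter()
        .map(|&(name, mips)| (name, utilization_pct(channels, mips_per_channel, mips)))
        .collect()
}
```

For example, `utilization_table(4, 14.0)` reports 56% for the STM32F411 and about 18.7% of the RP2350's combined budget.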

CMSIS-DSP integration

ARM's CMSIS-DSP library provides hardware-optimized DSP routines for all Cortex-M cores. Key functions used by the XAP encoder:

CMSIS-DSP Function	Used For	Speedup vs C
`arm_rfft_fast_f32`	MDCT/spectral analysis	3-5x
`arm_fir_f32` / `arm_fir_q15`	FIR downsampling filter	2-4x
`arm_dot_prod_f32`	Inner products in quantization	2-3x
`arm_scale_f32`	Gain normalization	2x
`arm_fill_f32` / `arm_copy_f32`	Buffer management	1.5-2x

The Xylolabs SDK links CMSIS-DSP on Cortex-M targets automatically when XYLOLABS_USE_CMSIS_DSP=1 is set in the build configuration.

6. DSP and hardware acceleration

Each target MCU offers hardware features that reduce codec CPU cost below the baseline MIPS estimates. These are not theoretical gains -- they reflect real instruction-level speedups from the silicon itself.

Cortex-M33 DSP extension (RP2350, nRF9160)

The Cortex-M33 (ARMv8-M Mainline) offers the DSP extension as a configuration option; both the RP2350 and the nRF9160 implement it, providing:

Single-cycle 32x32→64 MAC (SMLAL, UMLAL) -- the workhorse for FIR filter and MDCT accumulation
Dual 16x16→32 MAC (SMLAD, SMUAD) -- processes two 16-bit sample pairs per cycle, doubling throughput for 16-bit audio paths
Saturating arithmetic (QADD, QSUB, SSAT, USAT) -- eliminates branch-based clipping in ADPCM step-size adaptation and XAP quantization
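Bit-field extract (SBFX, UBFX) -- efficient unpacking for XMBP binary protocol parsing (these belong to the base Thumb-2 instruction set rather than the DSP extension, so they are also available on Cortex-M3)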
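To make the instruction list concrete, here is the kind of inner loop these instructions replace, written in plain Rust as a Q15 FIR filter: each tap is one multiply-accumulate (SMLAD retires two of them per instruction on packed 16-bit data), and the final clamp is the job SSAT #16 does in a single instruction. This illustrates the arithmetic only; it is not the XAP or CMSIS-DSP implementation, and `arm_fir_q15` is the CMSIS-DSP routine that performs this job on Cortex-M. The function names are illustrative.

```rust
/// Saturate a wide accumulator to the signed 16-bit (Q15) range, as SSAT #16 does.
fn sat_q15(x: i64) -> i16 {
    x.clamp(i16::MIN as i64, i16::MAX as i64) as i16
}

/// Direct-form Q15 FIR: y[n] = sum_k h[k] * x[n - k], with zero initial history.
/// Products are Q30, accumulated in 64 bits, shifted back to Q15, then saturated.
pub fn fir_q15(input: &[i16], taps: &[i16]) -> Vec<i16> {
    (0..input.len())
        .map(|n| {
            let mut acc: i64 = 0;
            for (k, &h) in taps.iter().enumerate().take(n + 1) {
                acc += input[n - k] as i64 * h as i64; // one MAC per tap
            }
            sat_q15(acc >> 15)
        })
        .collect()
}
```

With two taps of 0.5 (16384 in Q15) the filter averages adjacent samples; with near-unity taps on full-scale input the sum overflows Q15 and is clamped instead of wrapping.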

Impact on codecs:

Codec	Baseline MIPS/ch @96 kHz	With DSP	Reduction
XAP	~20	~14	-30%
IMA-ADPCM	< 2	< 1	-50%
SBC	~10	~7	-30%

Cortex-M4F FPU + DSP (STM32F411, STM32WB55, nRF52840)

Includes everything from the M33 DSP extension plus:

Single-precision FPU -- hardware float multiply-accumulate in 1-3 cycles vs 10-20 cycle software emulation
Hardware integer divide (SDIV, UDIV) -- 2-12 cycles vs 20-100 cycle software divide (also present on Cortex-M3 and Cortex-M33, so this is not an M4F-specific advantage)
FPU impact: XAP has both fixed-point and floating-point encoder paths. On M4F, the floating-point path can actually be faster than fixed-point because the FPU handles normalization, scaling, and spectral analysis natively

Impact on codecs:

Codec	Baseline MIPS/ch @96 kHz	With FPU+DSP	Reduction
XAP (float path)	~20	~12	-40%
XAP (fixed path)	~20	~14	-30%
Opus SILK	~50	~35	-30%

ESP32-S3 vector extensions (PIE)

The Xtensa LX7 cores include Processor Instruction Extensions (PIE):

128-bit SIMD -- process 4x f32 or 8x i16 in a single instruction
Dedicated vector registers -- 16 x 128-bit registers for parallel DSP pipelines
Hardware AES/SHA -- offloads TLS handshake and encryption from the CPU, freeing cycles for codec work
PSRAM DMA -- large audio buffers can live in PSRAM with DMA transfer to/from internal SRAM for processing

Impact on codecs:

Codec	Baseline MIPS/ch @96 kHz	With PIE SIMD	Reduction
XAP	~20	~8	-60%
Opus CELT	~100	~50	-50%
FLAC	~100	~60	-40%

The ESP32-S3 with PIE can run codecs that are infeasible on other platforms -- Opus CELT 2ch @48 kHz, or even experimental FLAC 2ch @48 kHz for lossless capture.

Rust SDK DSP feature flags

The Rust SDK (crates/xylolabs-sdk/) exposes DSP acceleration via Cargo feature flags:

Feature	Target	Effect
`cmsis-dsp`	Cortex-M33 (RP2350, nRF9160), Cortex-M4F (STM32F411, nRF52840)	Enables CMSIS-DSP optimized MDCT and FIR paths
`esp32-simd`	ESP32-S3 (Xtensa LX7)	Enables PIE SIMD optimized MDCT paths

```toml
# Cargo.toml -- enable DSP acceleration for RP2350
[dependencies]
xylolabs-sdk = { path = "../../crates/xylolabs-sdk", features = ["xap", "cmsis-dsp"] }
```

```toml
# Cargo.toml -- enable SIMD acceleration for ESP32-S3
[dependencies]
xylolabs-sdk = { path = "../../crates/xylolabs-sdk", features = ["xap", "esp32-simd"] }
```

For the C SDK, set XYLOLABS_USE_CMSIS_DSP=1 or XYLOLABS_USE_ESP32S3_SIMD=1 in the build configuration (auto-detected from compiler flags when not set explicitly).

See Performance Profile for per-target CPU budgets and DSP optimization details.

Practical recommendations

Always enable CMSIS-DSP on Cortex-M targets: Rust SDK features = ["cmsis-dsp"], C SDK XYLOLABS_USE_CMSIS_DSP=1. The optimized FFT and FIR routines are drop-in replacements that require no code changes.
Use the XAP floating-point path on Cortex-M4F targets (STM32F411). The FPU makes it faster than the fixed-point path despite the overhead of float conversion.
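On Cortex-M33 targets (RP2350), the DSP extension provides significant speedup for MAC-heavy operations in the XAP fixed-point path. The RP2350's M33 cores also include a single-precision FPU, so profile the floating-point path there as well rather than assuming fixed-point is faster.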
Enable PIE intrinsics on ESP32-S3 builds: Rust SDK features = ["esp32-simd"], C SDK XYLOLABS_USE_ESP32S3_SIMD=1. The ESP-IDF compiler auto-vectorizes some loops, but explicit PIE intrinsics in the XAP hot path yield an additional 20-30% gain.
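The float-conversion overhead mentioned in these recommendations is a per-sample scale (and, on the way back, a round-and-saturate) at each fixed/float boundary. A plain-Rust sketch of both directions; CMSIS-DSP provides `arm_q15_to_float` and `arm_float_to_q15` for the same job, but its rounding behavior depends on build options, so the functions here are illustrative rather than bit-exact equivalents:

```rust
/// Q15 -> f32: scale by 2^-15, giving values in [-1.0, 1.0).
pub fn q15_to_f32(src: &[i16], dst: &mut [f32]) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = s as f32 / 32768.0;
    }
}

/// f32 -> Q15: scale by 2^15, round to nearest, and saturate to the i16 range.
pub fn f32_to_q15(src: &[f32], dst: &mut [i16]) {
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = (s * 32768.0).round().clamp(-32768.0, 32767.0) as i16;
    }
}
```

Because scaling by a power of two is exact in f32, a Q15 → f32 → Q15 round trip returns the original samples; the cost is one multiply (plus a round and clamp on the return path) per sample at every boundary crossing.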

7. Recommendation

Primary choice: XAP at 64–80 kbps/ch

XAP is the optimal codec for this use case across all evaluation dimensions:

Criterion	XAP Result
Native 96 kHz support	Yes
4ch @96 kHz fits LTE-M1 (@80k)	40 KB/s — Yes
CPU for 4ch @96 kHz	80 MIPS — fits RP2350, ESP32-S3
RAM for 4ch	32 KB — fits all target MCUs
Licensing	Xylolabs proprietary (patent pending)
Latency	7.5–10 ms frame — low latency
SDK integration	XAP codec module in Rust SDK (`crates/xylolabs-sdk/`)

Recommended operating point:

Bitrate: 80 kbps/ch for highest quality; reduce to 64 kbps/ch to save bandwidth with an acceptable quality loss
Frame size: 10 ms (lower CPU than 7.5 ms, acceptable latency)
Total stream: 4 × 80 kbps = 320 kbps = 40 KB/s
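The operating point translates directly into per-frame sizes, which is useful when sizing buffers or XMBP chunks. A sketch of the arithmetic, assuming constant-bitrate frames; the struct and method names are hypothetical and not the SDK's XAP configuration API:

```rust
/// Hypothetical description of an XAP stream operating point (not the SDK's API).
pub struct OperatingPoint {
    pub channels: u32,
    pub sample_rate_hz: u32,
    pub frame_us: u32,
    pub kbps_per_channel: u32,
}

impl OperatingPoint {
    /// PCM samples per channel in one frame, e.g. 10 ms at 96 kHz = 960.
    pub fn samples_per_frame(&self) -> u32 {
        (self.sample_rate_hz as u64 * self.frame_us as u64 / 1_000_000) as u32
    }

    /// Encoded bytes per channel per frame, assuming constant bitrate.
    pub fn bytes_per_channel_frame(&self) -> u32 {
        (self.kbps_per_channel as u64 * 1000 * self.frame_us as u64 / 8 / 1_000_000) as u32
    }

    /// Aggregate stream rate in bytes per second (decimal units).
    pub fn stream_bytes_per_sec(&self) -> u32 {
        self.channels * self.kbps_per_channel * 1000 / 8
    }
}
```

At 10 ms and 80 kbps/ch that is 960 samples in and 100 bytes out per channel per frame; 7.5 ms frames give 720 samples and 75 bytes.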

Fallback: IMA-ADPCM

When XAP integration is not possible (resource-constrained platforms, STM32F103, nRF52840 with 4ch):

CPU: trivial (< 1 MIPS/ch)
RAM: < 1 KB/ch
Tradeoff: 4:1 compression only — requires reducing channel count or sample rate to fit LTE-M1
2ch @24 kHz ADPCM = 192 kbps = 24 KB/s — fits budget

For 48 kHz-only use: Opus SILK

When the application permits 48 kHz sample rate and maximum codec quality is required:

Best-in-class quality for 8–48 kHz audio
Rich server-side ecosystem (FFmpeg, libopus, web browser native)
BSD license — royalty-free
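SILK mode: 80 MIPS for 4ch @48 kHz fits RP2350 and ESP32-S3; note that SILK alone covers only wideband (~8 kHz) audio, so full-band content requires hybrid or CELT mode at higher CPU cost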

8. Server-side decode support matrix

Codec	FFmpeg Support	Decode Library	Storage Format	Notes
XAP	Yes (XAP decoder library, FFmpeg 7.1+)	XAP decoder library (C, Apache 2.0)	Transcode to FLAC/WAV	Build FFmpeg with XAP decoder support; supports 7.5/10 ms frames + HR mode (48/96 kHz)
Opus	Yes (`libopus`)	Native FFmpeg	Direct web playback	Best-supported codec on modern platforms
MP3	Yes (`libmp3lame`)	Native FFmpeg	Direct web playback	Mature, universal support
AAC-LC	Yes (`libfdk_aac`)	Native FFmpeg	Direct web playback	fdk-aac GPL issue — use non-free FFmpeg build
HE-AAC	Yes (`libfdk_aac`)	Native FFmpeg	Direct web playback	Same fdk-aac requirement
SBC	Yes (built-in)	Native FFmpeg	Needs transcode to AAC/Opus	Lower quality — transcode for storage
Speex	Yes (`libspeex`)	`libspeex` (C, BSD)	Transcode to Opus/WAV	Build FFmpeg with `--enable-libspeex`; voice-only, max 32 kHz
IMA-ADPCM	Yes (`adpcm_ima_wav`)	Native FFmpeg	WAV container	Trivial decode, built into all FFmpeg builds
FLAC	Yes (built-in)	Native FFmpeg	Direct lossless storage	Standard archival format

XAP server integration

Recommended approach: use the XAP decoder library for server-side XAP decoding, either through an FFmpeg build with XAP decoder support or by linking the library directly for lower-level, programmatic control. The Rust bindings are the recommended interface; the C API is the legacy equivalent:

```rust
// Rust - Recommended
// Server-side XAP decode using the xap crate (bindings to XAP decoder library).
// frame_us, sample_rate, and xap_frame are provided by the ingest pipeline.
use xap::{Decoder, PcmFormat};

let mut decoder = Decoder::new(frame_us, sample_rate);
let pcm_output = decoder.decode::<i16>(xap_frame).unwrap();
```

Legacy C equivalent:

```c
// Server-side XAP decode (C example).
// frame_us, sample_rate, mem, xap_frame, xap_frame_len, pcm_output, and stride
// are provided by the caller.
#include "xap.h"

xap_decoder_t decoder = xap_setup_decoder(frame_us, sample_rate, 0, mem);
xap_decode(decoder, xap_frame, xap_frame_len, XAP_PCM_FORMAT_S16,
           pcm_output, stride);
```

After decoding, write to FLAC or WAV for archival, or transcode to Opus for web delivery.
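For the WAV archival path, decoded 16-bit PCM only needs a 44-byte RIFF header in front of it. A std-only sketch, assuming the decoder output is interleaved `i16` samples; the function name is illustrative. For more than two channels, WAVE_FORMAT_EXTENSIBLE is the formally recommended header; the plain PCM format tag used here for brevity is widely accepted but is a simplification.

```rust
/// Build a canonical WAV file (44-byte header) from interleaved 16-bit PCM.
pub fn wav_bytes(samples: &[i16], channels: u16, sample_rate: u32) -> Vec<u8> {
    let bytes_per_sample: u16 = 2;
    let block_align = channels * bytes_per_sample;
    let byte_rate = sample_rate * block_align as u32;
    let data_len = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // RIFF chunk size
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size for plain PCM
    out.extend_from_slice(&1u16.to_le_bytes()); // format tag 1 = integer PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&16u16.to_le_bytes()); // bits per sample
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

For 4 channels at 96 kHz the byte-rate field comes out to 768,000 B/s, the raw PCM figure from Section 1.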

9. Implementation roadmap¶

Current state

Rust SDK (crates/xylolabs-sdk/) is the recommended implementation with XAP codec module
Legacy C SDK (sdk/c/common/src/xap_encoder.c) available but scheduled for deprecation
IMA-ADPCM encoder is available as a fallback
E2E burn-in tests validate XAP encoding stability (4 scenarios, 10-device concurrency, QEMU ARM emulation)
Server-side XAP decoder integration pending

Planned steps

Step	Action	Priority
1	Integrate XAP decoder library into Axum ingest server	High
2	Define XMBP chunk type for XAP frames (codec ID, bitrate, frame_us fields)	High
3	Implement XAP → FLAC transcode pipeline for archival storage	High
4	Add Opus encoder to ESP32-S3 SDK target (sufficient CPU)	Medium
5	Deprecate legacy C SDK (`sdk/c/`) in favor of Rust SDK	Medium

Codec ID assignment (XMBP protocol)

Codec ID	Codec	Notes
`0x01`	Raw PCM S16LE	Debug/development only
`0x02`	IMA-ADPCM	Fallback baseline
`0x03`	XAP	Primary production codec
`0x04`	Opus	ESP32-S3 / high-CPU platforms
`0x05`	Speex	Voice-only (max 32 kHz)
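On the ingest side, the codec ID byte can be mapped to a typed enum so that unassigned IDs are rejected explicitly. A sketch with the ID values taken from the table above; the enum and its variant names are illustrative, and the XMBP chunk layout itself is still to be defined (step 2 of the roadmap):

```rust
use std::convert::TryFrom;

/// Codec IDs carried in XMBP chunks (values from the assignment table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CodecId {
    RawPcmS16le = 0x01,
    ImaAdpcm = 0x02,
    Xap = 0x03,
    Opus = 0x04,
    Speex = 0x05,
}

impl TryFrom<u8> for CodecId {
    /// The offending byte is returned for unassigned IDs.
    type Error = u8;

    fn try_from(id: u8) -> Result<Self, Self::Error> {
        match id {
            0x01 => Ok(CodecId::RawPcmS16le),
            0x02 => Ok(CodecId::ImaAdpcm),
            0x03 => Ok(CodecId::Xap),
            0x04 => Ok(CodecId::Opus),
            0x05 => Ok(CodecId::Speex),
            other => Err(other),
        }
    }
}
```

Returning the raw byte as the error lets the ingest server log exactly which unknown codec a device sent before dropping the chunk.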

References

RFC 6716 (Opus): https://www.rfc-editor.org/rfc/rfc6716
IMA ADPCM: IMA Digital Audio Focus and Technical Working Group, 1992
LTE-M1 (LTE Cat-M1) specifications: 3GPP TS 36.300, Release 13