
Performance Evaluation — MCU Targets & Server Concurrency

Xylolabs API — Revision: 2026-03-23


1. XAP Codec Benchmark Results

All measurements taken on Apple M-series host in --release mode. The XAP encoder is implemented in crates/xylolabs-sdk/src/codec/xap.rs. ADPCM encoder is the IMA-ADPCM implementation in crates/xylolabs-sdk/src/codec/adpcm.rs.

1.1 Encode Time Per Frame — Mono, 10ms

Rate Ch Samples Frame Avg (us) Min (us) Max (us) Budget%
8000 1 80 10ms 0.5 0.4 12.7 0.005%
16000 1 160 10ms 1.1 1.0 1.2 0.011%
24000 1 240 10ms 1.9 1.6 6.8 0.019%
32000 1 320 10ms 3.0 2.6 9.0 0.030%
48000 1 480 10ms 512.3 451.1 700.0 5.123%
96000 1 960 10ms 1954.0 1763.8 2207.8 19.540%

Critical Observation — Cosine Table Threshold: There is a 170x discontinuity between 32kHz (3.0 us) and 48kHz (512.3 us). This is caused by the precomputed cosine table cutoff at XAP_PRECOMPUTE_MAX_N = 320 samples. Frames with N <= 320 (sample rates up to 32kHz at 10ms) use O(N^2) table lookup with zero trigonometric function calls. Frames with N > 320 (48kHz = 480, 96kHz = 960) fall back to runtime cosf() per MDCT coefficient, which is dramatically slower.

On real MCU targets with DSP extensions, the CMSIS-DSP arm_rfft_fast_f32 replaces this entire MDCT path, eliminating the discontinuity. The host benchmark reflects the software-only encoder behavior.
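The threshold behavior described above can be sketched directly. This is an illustrative model, not the actual encoder in crates/xylolabs-sdk/src/codec/xap.rs; only the constant name XAP_PRECOMPUTE_MAX_N and the 320-sample cutoff come from the text.

```rust
// Illustrative sketch of the precompute cutoff that produces the
// 170x cliff between 32kHz and 48kHz at 10ms frames.
const XAP_PRECOMPUTE_MAX_N: usize = 320;

#[derive(Debug, PartialEq)]
enum MdctPath {
    /// O(N^2) lookups into a precomputed cosine table; zero trig calls.
    PrecomputedTable,
    /// O(N^2) with a runtime cosf() per coefficient; dramatically slower.
    RuntimeCos,
}

fn mdct_path_for(frame_samples: usize) -> MdctPath {
    if frame_samples <= XAP_PRECOMPUTE_MAX_N {
        MdctPath::PrecomputedTable
    } else {
        MdctPath::RuntimeCos
    }
}

/// Frame length for a 10 ms frame at the given sample rate.
fn samples_per_10ms_frame(sample_rate_hz: usize) -> usize {
    sample_rate_hz / 100
}
```

32kHz at 10ms lands exactly on the 320-sample limit; 48kHz (480 samples) falls off the table onto the cosf() path.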

1.2 Channel Scaling — 16kHz, 10ms

Rate Ch Samples/ch Avg (us) Per-ch (us) Scaling
16000 1 160 1.0 1.0 1.00x
16000 2 160 1.9 1.0 1.90x
16000 3 160 2.8 0.9 2.80x
16000 4 160 3.7 0.9 3.70x

Channel scaling is sub-linear (3.7x for 4 channels instead of 4.0x). The encoder processes channels independently with shared infrastructure (header write, de-interleave). The per-channel MDCT cost dominates, and the slight sub-linearity comes from amortized header/setup overhead.

1.3 Four-Channel Stress — Key Sample Rates

Rate Ch Samples/ch Avg (us) Min (us) Max (us) Budget%
16000 4 160 3.6 3.2 4.6 0.036%
48000 4 480 2004.7 1829.7 2378.2 20.047%
96000 4 960 7930.9 7390.5 10632.4 79.309%

The 4ch@96kHz configuration consumes 79.3% of the 10ms frame budget on the host. This leaves only 2.1ms headroom for XMBP encoding, I/O, sensors, and housekeeping on the host. On MCU targets with DSP acceleration, the MDCT is replaced by hardware-accelerated FFT paths that reduce this to 37-56 MIPS (see Section 2).

1.4 Frame Duration Comparison — 48kHz, 2ch

Rate Ch Duration Samples/ch Avg (us) Budget%
48000 2 7.5ms 360 581.7 7.756%
48000 2 10ms 480 1007.6 10.076%

The 7.5ms frame processes fewer samples (360 vs 480) and is 42% faster in absolute time; because the O(N^2) MDCT cost shrinks faster than the budget does, it also consumes a slightly lower fraction of its budget (7.8% vs 10.1%). The 10ms frame is nonetheless preferred for MCU targets because the longer budget window provides more headroom for scheduling jitter and interrupt latency.

1.5 ADPCM Encode Time Per Frame — 10ms

Rate Ch Samples Avg (us) Min (us) Max (us) Budget%
16000 1 160 0.93 0.83 7.21 0.009%
16000 2 320 1.83 1.71 2.58 0.018%
16000 4 640 3.70 3.50 10.21 0.037%
48000 1 480 2.87 2.62 9.58 0.029%
48000 2 960 5.68 5.38 12.33 0.057%
48000 4 1920 11.56 10.79 19.67 0.116%
96000 1 960 5.79 5.38 13.50 0.058%
96000 2 1920 11.51 10.83 20.42 0.115%
96000 4 3840 23.00 21.75 48.79 0.230%

ADPCM is trivially cheap at all configurations. Even 4ch@96kHz costs only 23 us (0.23% of budget). This is because ADPCM uses pure integer arithmetic with no spectral transform -- it encodes sample-by-sample deltas using a lookup table.
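The sample-by-sample delta scheme can be sketched as a standard textbook IMA-ADPCM encode step. This is a generic reference implementation, not the code in crates/xylolabs-sdk/src/codec/adpcm.rs; the names AdpcmState and encode_sample are illustrative.

```rust
// Standard IMA-ADPCM tables: 89 quantizer step sizes and the
// step-index adjustment per 3-bit code magnitude.
const STEP_TABLE: [i32; 89] = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
];
const INDEX_TABLE: [i32; 8] = [-1, -1, -1, -1, 2, 4, 6, 8];

struct AdpcmState {
    predictor: i32, // reconstructed previous sample
    index: i32,     // position in STEP_TABLE
}

/// Encode one 16-bit sample into a 4-bit code (sign + 3-bit magnitude).
/// Pure integer arithmetic: no multiply beyond shifts, no transform.
fn encode_sample(state: &mut AdpcmState, sample: i16) -> u8 {
    let step = STEP_TABLE[state.index as usize];
    let mut diff = sample as i32 - state.predictor;
    let mut code: u8 = 0;
    if diff < 0 {
        code = 8; // sign bit
        diff = -diff;
    }
    // Quantize the magnitude into 3 bits against the current step.
    let mut bound = step;
    if diff >= bound { code |= 4; diff -= bound; }
    bound >>= 1;
    if diff >= bound { code |= 2; diff -= bound; }
    bound >>= 1;
    if diff >= bound { code |= 1; }
    // Reconstruct the predictor exactly as the decoder will.
    let mut delta = step >> 3;
    if code & 4 != 0 { delta += step; }
    if code & 2 != 0 { delta += step >> 1; }
    if code & 1 != 0 { delta += step >> 2; }
    if code & 8 != 0 { state.predictor -= delta; } else { state.predictor += delta; }
    state.predictor = state.predictor.clamp(-32768, 32767);
    state.index = (state.index + INDEX_TABLE[(code & 7) as usize]).clamp(0, 88);
    code
}
```

Each sample costs a handful of integer compares, shifts, and adds, which is why the 4ch@96kHz configuration stays at 23 us per frame.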

1.6 XAP vs ADPCM Cost Ratio

Config XAP (us) ADPCM (us) Ratio
1ch@16kHz 1.0 0.93 1x
4ch@16kHz 3.6 3.66 1x
4ch@48kHz 2054.1 11.72 175x
4ch@96kHz 7973.9 22.91 348x

At low sample rates (<=32kHz), XAP and ADPCM have comparable cost because the precomputed cosine table eliminates runtime trig. At 48kHz and above, XAP becomes 175-348x more expensive due to the O(N^2) MDCT with runtime cosf(). On MCU targets, DSP-accelerated FFT paths close this gap significantly (to approximately 20-40x), but ADPCM remains the lightest option for CPU-constrained targets.


2. MCU Feasibility Matrix

2.1 Maximum Sustainable Configuration Per Target

Evaluated using documented MIPS profiles from PERFORMANCE-PROFILE.md and CODEC-ANALYSIS.md, cross-referenced with benchmark measurements. CPU% includes codec encoding, I/O stack (I2S DMA, transport protocol, sensor sampling), and system overhead. RAM% includes SDK client state, codec buffers, ring buffers, XMBP framing, transport stack, and task stacks.

Target Clock SRAM DSP/FPU Max Audio Config CPU% RAM% Verdict
RP2350 (Pico 2) 150 MHz 520 KB M33 DSP+FPU 4ch @96kHz XAP 46.0% 16.9% COMFORTABLE
ESP32-S3 240 MHz 512 KB + 8MB PSRAM PIE SIMD+FPU 4ch @96kHz XAP 17.7% 24.2% COMFORTABLE
STM32F411 100 MHz 128 KB M4F DSP+FPU 4ch @48kHz XAP 40.0% 34.4% FEASIBLE
nRF52840 64 MHz 256 KB M4F DSP+FPU 2ch @48kHz XAP 42.2% 21.9% FEASIBLE
nRF9160 64 MHz 256 KB M33 DSP+FPU 2ch @48kHz XAP 44.5% 21.9% FEASIBLE
STM32WB55 64 MHz 256 KB M4F DSP+FPU 2ch @48kHz XAP 42.2% 17.2% FEASIBLE
RP2040 (Pico) 133 MHz 264 KB None ADPCM 4ch @96kHz 3.0% 12.1% ADPCM ONLY
ESP32-C3 160 MHz 400 KB M ext only ADPCM 4ch @96kHz 2.5% 8.0% ADPCM ONLY
STM32F103 72 MHz 20 KB None ADPCM 2ch @24kHz 1.4% 80.0% SENSOR ONLY

2.2 Verdict Definitions

Verdict CPU Utilization Meaning
COMFORTABLE < 50% Ample headroom for additional processing, OTA updates, or future features.
FEASIBLE 50-70% Sufficient headroom for stable operation with careful scheduling.
TIGHT 70-85% Operational but may exhibit jitter under worst-case interrupt latency.
MARGINAL 85-100% Risk of frame drops under load. Not recommended for production.
ADPCM ONLY N/A (XAP infeasible) No DSP/FPU; cannot run XAP encoder. ADPCM at 4:1 compression only.
SENSOR ONLY N/A (audio limited) Extreme SRAM constraint. ADPCM 1-2ch + sensor telemetry only.

Note: the CPU bands are guidelines, not the sole criterion. Several targets in Section 2.1 land below 50% CPU yet are rated FEASIBLE rather than COMFORTABLE because RAM headroom (e.g. no room for OTA staging on STM32F411, see Section 4.2) or scheduling margin constrains them.

2.3 Detailed CPU Budget Per Target

RP2350 (Pico 2) — 4ch XAP @96kHz

Dual-core Cortex-M33 at 150 MHz. Core 0 handles I2S DMA + XAP encoding, Core 1 handles XMBP + HTTP + sensors.

Component Baseline MIPS With DSP MIPS % of 150 MHz
I2S DMA handling 2 2 1.3%
XAP MDCT forward 50 35 23.3%
XAP quantize+pack 15 10 6.7%
XMBP batch encode 5 5 3.3%
HTTP transport 10 10 6.7%
Sensor sampling (26ch) 5 5 3.3%
Watchdog + housekeeping 2 2 1.3%
Total 89 69 46.0%
Available headroom 61 81 54.0%

With dual-core split: system utilization drops to ~39.3%.
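The utilization figures in these budget tables are simple MIPS sums against the core clock. As a sanity check, the helper below (an illustrative function, not SDK code) reproduces the RP2350 with-DSP total:

```rust
// Sum per-component MIPS and express them as a percentage of the
// core clock (1 MIPS per MHz assumed, as in the tables above).
fn utilization_pct(component_mips: &[u32], clock_mhz: u32) -> f64 {
    let total: u32 = component_mips.iter().sum();
    100.0 * total as f64 / clock_mhz as f64
}
```

Feeding in the RP2350 with-DSP column (2 + 35 + 10 + 5 + 10 + 5 + 2 = 69 MIPS at 150 MHz) yields the 46.0% figure in the table.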

ESP32-S3 — 4ch XAP @96kHz

Dual-core Xtensa LX7 at 240 MHz (480 MIPS total).

Component Baseline MIPS With PIE MIPS % of 480 MIPS
I2S DMA handling 2 2 0.4%
XAP MDCT forward 50 20 4.2%
XAP quantize+pack 15 6 1.3%
WiFi stack (FreeRTOS) 30 30 6.3%
XMBP batch encode 5 5 1.0%
HTTP/TLS transport 20 12 2.5%
Sensor sampling (26ch) 5 5 1.0%
PSRAM DMA management 3 3 0.6%
Watchdog + housekeeping 2 2 0.4%
Total 132 85 17.7%
Available headroom 348 395 82.3%

The ESP32-S3 has the most comfortable margin by far, primarily due to 128-bit PIE SIMD on the MDCT inner loop and hardware AES/SHA offload for TLS.

STM32F411 — 4ch XAP @48kHz

Single-core Cortex-M4F at 100 MHz.

Component Baseline MIPS With DSP MIPS % of 100 MHz
I2S DMA handling 2 2 2.0%
XAP MDCT forward 25 15 15.0%
XAP quantize+pack 8 5 5.0%
XMBP batch encode 3 3 3.0%
UART LTE-M1 transport 8 8 8.0%
Sensor sampling (26ch) 5 5 5.0%
Watchdog + housekeeping 2 2 2.0%
Total 53 40 40.0%
Available headroom 47 60 60.0%

The M4F FPU enables the XAP floating-point encoder path, which is faster than fixed-point on this core. 4ch@96kHz is not recommended (would exceed 80% utilization).

nRF52840 — 2ch XAP @48kHz

Single-core Cortex-M4F at 64 MHz.

Component Baseline MIPS With DSP MIPS % of 64 MHz
I2S DMA handling 1 1 1.6%
XAP MDCT forward 12 7 10.9%
XAP quantize+pack 4 3 4.7%
BLE GATT stack 10 10 15.6%
XMBP batch encode 2 2 3.1%
Sensor sampling (4ch) 2 2 3.1%
Watchdog + housekeeping 2 2 3.1%
Total 33 27 42.2%
Available headroom 31 37 57.8%

BLE stack overhead is the single largest non-codec consumer (15.6%). 4ch@48kHz is possible (~28 MIPS with DSP, 44% utilization) but leaves minimal headroom.

STM32F103 — Sensor-only + ADPCM fallback

Single-core Cortex-M3 at 72 MHz, 20 KB SRAM.

Component MIPS % of 72 MHz
ADPCM encode 2ch @24kHz 1 1.4%
XMBP batch encode 2 2.8%
UART LTE-M1 transport 8 11.1%
Sensor sampling (4ch) 3 4.2%
Watchdog + housekeeping 2 2.8%
Total 16 22.2%
Available headroom 56 77.8%

XAP is not feasible: the 32 KB encoder state for 4 channels exceeds the entire 20 KB SRAM. Even mono XAP (8 KB encoder state) leaves only 12 KB for everything else. ADPCM at 2ch @24kHz is the maximum practical audio configuration.


3. DSP/FPU Impact Analysis

3.1 Speedup by DSP Architecture

DSP Architecture Platforms Key Instructions XAP Speedup Mechanism
ARMv8-M DSP (Cortex-M33) RP2350, nRF9160 SMLAD dual MAC, QADD, SSAT ~30% Dual 16x16 MAC doubles throughput on 16-bit audio. Saturating arithmetic eliminates branch-based clipping. Fixed-point (Q15) path preferred.
Cortex-M4F FPU+DSP STM32F411, STM32WB55, nRF52840 FPU + SMLAD + SDIV ~35-40% Hardware float multiply-accumulate in 1-3 cycles. Float encoder path is faster than fixed-point. CMSIS-DSP arm_rfft_fast_f32 provides 3-5x speedup for MDCT.
Xtensa PIE SIMD (ESP32-S3) ESP32-S3 128-bit 4x f32, 8x i16 ~60% 4-wide vector operations on dedicated 128-bit registers. Hardware AES/SHA offloads TLS from CPU. PSRAM DMA for large buffer transfers.
No DSP RP2040, ESP32-C3, STM32F103 Software multiply only 0% (baseline) All operations in software. No SIMD, no hardware float. XAP MDCT requires software cosf() which is 10-50x slower than DSP-accelerated FFT.

3.2 MDCT Path Selection

The MDCT forward transform is the encoder hot path, consuming 60-70% of total XAP encode time. The SDK selects the optimal path at compile time:

Core FPU Compile Path MDCT Strategy
Cortex-M33 (RP2350, nRF9160) Single-precision cmsis-dsp feature Fixed-point Q15 with SMLAD dual MAC. Precomputed cosine table in Q15 format.
Cortex-M4F (F411, WB55, nRF52840) Single-precision cmsis-dsp feature Floating-point with arm_rfft_fast_f32. Hardware FPU makes float path faster than fixed-point.
Xtensa LX7 (ESP32-S3) Single-precision esp32-simd feature Float with PIE 128-bit SIMD. 4 f32 values processed per vector instruction.
Cortex-M0+ (RP2040) None default (no DSP) Not feasible. Software float MDCT exceeds CPU budget. ADPCM only.
Cortex-M3 (STM32F103) None default (no DSP) Not feasible. Same as M0+. ADPCM only.
RISC-V (ESP32-C3) None default (no DSP) Not feasible. M extension provides integer multiply but no SIMD. ADPCM only.
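The compile-time decision in the table above can be modeled at runtime for illustration. The real selection uses Cargo features ("cmsis-dsp", "esp32-simd") and cfg attributes as the table states; the enum and function names below are hypothetical.

```rust
// Runtime model of the compile-time MDCT path selection table.
#[derive(Debug, PartialEq)]
enum MdctStrategy {
    FixedPointQ15, // Cortex-M33: SMLAD dual MAC, Q15 cosine table
    FloatRfftFast, // Cortex-M4F: arm_rfft_fast_f32 on the hardware FPU
    FloatPieSimd,  // Xtensa LX7: 4x f32 per 128-bit PIE vector op
    NotFeasible,   // no DSP/FPU: software MDCT exceeds budget, ADPCM only
}

struct CoreCaps {
    has_fpu: bool,
    has_dsp_ext: bool,
    has_pie_simd: bool,
    is_cortex_m33: bool,
}

fn select_mdct_strategy(c: &CoreCaps) -> MdctStrategy {
    if c.has_pie_simd {
        MdctStrategy::FloatPieSimd
    } else if c.has_fpu && c.has_dsp_ext {
        // Per Section 3.2: M33 prefers the fixed-point Q15 path,
        // M4F prefers the float path via CMSIS-DSP.
        if c.is_cortex_m33 {
            MdctStrategy::FixedPointQ15
        } else {
            MdctStrategy::FloatRfftFast
        }
    } else {
        MdctStrategy::NotFeasible
    }
}
```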

3.3 Benchmark-Measured Cosine Table Impact

The host benchmark reveals the dramatic impact of the precomputed cosine table:

Sample Rate Frame Samples Table Used Encode Time (1ch) Notes
8 kHz 80 Yes 0.5 us O(N^2) table lookup, zero trig calls
16 kHz 160 Yes 1.1 us O(N^2) table lookup
24 kHz 240 Yes 1.9 us O(N^2) table lookup
32 kHz 320 Yes (limit) 3.0 us Max N for precomputed table
48 kHz 480 No 512.3 us Runtime cosf() per coefficient -- 170x slower
96 kHz 960 No 1954.0 us Runtime cosf() -- 651x slower than 32kHz

The table limit (XAP_PRECOMPUTE_MAX_N = 320) caps the table at 200 KB ((320/2)*320 = 51,200 i32 entries x 4 bytes). Extending it to 480 would require (480/2)*480 = 115,200 entries = 450 KB, which exceeds the SRAM of most MCU targets. On MCU targets, CMSIS-DSP replaces the entire MDCT with hardware-accelerated FFT, making this table irrelevant.
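The sizing arithmetic checks out directly, assuming (as the text states) one i32 per table entry; the helper name is illustrative:

```rust
// Cosine-table footprint: (N/2) * N i32 entries, 4 bytes each.
fn cosine_table_bytes(max_n: usize) -> usize {
    (max_n / 2) * max_n * std::mem::size_of::<i32>()
}
```

320 samples gives 204,800 bytes (200 KB); 480 gives 460,800 bytes (450 KB).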


4. Memory Budget Analysis

4.1 Per-Target Memory Breakdown

All values in KB. Configurations shown are the maximum recommended per Section 2.

Target SRAM SDK+Codec Ring Buf XMBP HTTP Stack Used Avail RAM%
RP2350 (Pico 2) 520 20 32 16 4 16 88 432 16.9%
ESP32-S3 512+8M 20 64* 16 8 16 124 388+ 24.2%
STM32F411 128 20 8 4 4 8 44 84 34.4%
nRF52840 256 20 16 8 4 8 56 200 21.9%
nRF9160 256 20 16 8 4 8 56 200 21.9%
STM32WB55 256 20 8 4 4 8 44 212 17.2%
RP2040 (Pico) 264 8 8 4 4 8 32 232 12.1%
ESP32-C3 400 8 8 4 4 8 32 368 8.0%
STM32F103 20 4 4 2 2 4 16 4 80.0%

*ESP32-S3 ring buffer placed in PSRAM via DMA.

4.2 Memory-Limited Scenarios

RP2350 (520 KB SRAM): 4ch @96kHz + 26 sensors + HTTP = 88 KB (16.9%). Has 432 KB remaining for application logic, OTA staging buffer, and filesystem. The generous SRAM headroom makes RP2350 the best balanced target for feature-rich deployments.

STM32F411 (128 KB SRAM): 4ch @48kHz XAP = 44 KB (34.4%). The 84 KB remaining is adequate for the application but leaves no room for OTA staging. Firmware updates must use external flash or a swap-based bootloader. Dropping to 2ch@48kHz reduces usage to 36 KB, freeing 8 KB.

nRF52840 (256 KB SRAM): 2ch @48kHz + BLE GATT stack = 56 KB (21.9%). The SoftDevice BLE stack itself consumes an additional ~30 KB. With SoftDevice: total ~86 KB used, 170 KB available. Comfortable for BLE-based deployments.

STM32F103 (20 KB SRAM): Sensor-only ADPCM = 16 KB (80.0%). Only 4 KB remains for application logic. This is the absolute minimum viable configuration. Even adding a single additional sensor stream would require careful stack optimization. XAP is completely impossible: the encoder state alone (8 KB per channel, 32 KB for 4ch) exceeds total SRAM.

ESP32-S3 (512 KB + 8 MB PSRAM): The PSRAM massively extends available memory. Audio ring buffers (64 KB+) and XMBP batch buffers can reside in PSRAM with DMA access, keeping fast SRAM free for codec state and stack. This makes ESP32-S3 the only target that could feasibly support configurations beyond 4ch@96kHz (e.g., 8ch@48kHz for surround monitoring) without memory pressure.


5. Server Concurrency Evaluation

5.1 Architecture Overview

The Xylolabs API server is built on Tokio async runtime with Axum. The ingest pipeline (crates/xylolabs-server/src/ingest/manager.rs) processes XMBP batches from MCU devices, buffers samples in memory, compresses via zstd in spawn_blocking, and flushes to S3 and PostgreSQL.

5.2 Connection Parameters

Parameter Value Notes
Runtime Tokio multi-threaded Worker threads = CPU cores
DB connection pool 20 (default, configurable) DATABASE_MAX_CONNECTIONS env var
Per-session memory ~1-4 KB per stream buffer Excluding accumulated samples
Upload body limit Up to 2 GB Currently full-buffer (not streaming)
SSE live connections Unbounded Broadcast channels per session
HTTP keep-alive 75 seconds Axum default
Auth rate limit 10 attempts/IP/minute In-memory cache, 60s TTL, 10K IP capacity
Ingest flush window 10 seconds (configurable) Accumulates samples before S3 write
Session timeout 300 seconds (configurable) Auto-close stale sessions

5.3 Ingest Pipeline Throughput

Stage Latency Concurrency Model Notes
XMBP batch decode < 100 us per batch Inline async Pure CPU, no I/O
Sample buffering < 1 us per sample Mutex-guarded HashMap Per-stream Vec
Live event broadcast < 10 us per event broadcast::channel Only if subscribers exist
zstd compression ~200 us per chunk spawn_blocking Offloaded from async runtime
S3 upload ~5-20 ms per chunk Async I/O Network-bound; MinIO local ~2 ms
DB insert (chunk record) ~1-2 ms per record Async sqlx Batched within flush window
DB stats update ~1 ms per batch Inline async Single UPDATE per batch

Estimated throughput per core: ~500 XMBP batches/sec. The heaviest CPU stage, zstd compression at ~200 us/chunk, is offloaded to the blocking pool, leaving the async workers with batch decode and buffering.

5.4 Concurrent Session Capacity

Scenario Sessions Audio Config Sensor Streams Server CPU DB Load RAM
Light 10 10 x 2ch @16kHz 40 @100Hz < 5% Low ~40 KB
Standard 50 50 x 4ch @48kHz 200 @100Hz ~20% Medium ~200 KB
Heavy 100 100 x 4ch @96kHz 2600 @100Hz ~60% High ~400 KB
Limit ~200 Limited by DB pool Limited by DB pool ~90% Saturated ~800 KB

The limiting factor is the PostgreSQL connection pool (default 20). Each flush operation requires a DB connection for the chunk INSERT. With 100 sessions flushing every 10 seconds, the pool sees ~10 flush operations/sec in aggregate (~0.5/sec per connection), which is well within capacity. At 200+ sessions with aggressive flush windows, pool exhaustion becomes the bottleneck.
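The pool-pressure arithmetic can be modeled in two lines (illustrative helper names, not server code):

```rust
// Aggregate flush operations per second across all sessions.
fn flush_ops_per_sec(sessions: u32, flush_window_secs: u32) -> f64 {
    sessions as f64 / flush_window_secs as f64
}

// Average flush rate each pooled DB connection must absorb.
fn per_connection_rate(sessions: u32, flush_window_secs: u32, pool_size: u32) -> f64 {
    flush_ops_per_sec(sessions, flush_window_secs) / pool_size as f64
}
```

At 100 sessions with a 10-second flush window and a pool of 20, that is 10 flushes/sec in aggregate, or 0.5/sec per connection.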

5.5 Concurrency Hazards — Identified and Resolved

Issue Severity Status Resolution
ConfigManager blocking RwLock in async context P2 Fixed Migrated from std::sync::RwLock to tokio::sync::RwLock. Blocking lock in async runtime caused thread starvation under contention.
Ingest flush data loss on S3 failure P0 Fixed Previously drained samples before confirming S3 write. Now clones samples before flush; originals retained on failure for retry. flushing flag prevents concurrent flushes of same buffer.
N+1 tag queries in list_uploads P1 Fixed Replaced per-upload tag fetch (1+N queries) with single JOIN batch query.
Sequential stats_overview queries P2 Fixed Six independent COUNT queries now run concurrently via tokio::try_join!.
S3 full-file buffering on upload P1 Open 2 GB upload = 2 GB RAM. Needs streaming multipart upload with backpressure.
Upload body buffered in memory P1 Open Large upload bodies held entirely in memory before S3 write. Needs streaming body handling.
No connection rate limiting (general) P3 Open Auth endpoints have IP-based rate limiting (10/min). General API endpoints lack rate limiting.
File descriptor leak via mem::forget P0 Fixed Temp files leaked via mem::forget. Fixed with into_temp_path for deterministic cleanup.
Session lock contention P2 Mitigated IngestManager.sessions uses tokio::sync::RwLock. Individual sessions use tokio::sync::Mutex. Flush operations drop the session lock before performing I/O. Stale session check uses atomic last_activity_ms without locking session mutexes.
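The "clone before flush, retain on failure" fix (the P0 ingest data-loss item above) can be sketched as follows. Types and names here are illustrative, not the actual ingest/manager.rs API, and the sketch is synchronous for clarity:

```rust
// Minimal model of the flush-retention fix: samples are cloned for
// the S3 write and only drained once the write succeeds.
struct StreamBuffer {
    samples: Vec<i16>,
    flushing: bool, // prevents concurrent flushes of the same buffer
}

#[derive(Debug)]
enum FlushError {
    S3WriteFailed,
}

impl StreamBuffer {
    fn flush<F>(&mut self, write_to_s3: F) -> Result<usize, FlushError>
    where
        F: FnOnce(&[i16]) -> Result<(), FlushError>,
    {
        if self.flushing || self.samples.is_empty() {
            return Ok(0);
        }
        self.flushing = true;
        // Clone instead of draining, so a failed S3 write loses nothing.
        let snapshot = self.samples.clone();
        let result = write_to_s3(&snapshot);
        self.flushing = false;
        match result {
            Ok(()) => {
                // Drop only what the snapshot covered; samples appended
                // during the flush (in the real async code) survive.
                self.samples.drain(..snapshot.len());
                Ok(snapshot.len())
            }
            // Originals retained for retry on the next flush window.
            Err(e) => Err(e),
        }
    }
}
```

In the real server this runs under tokio::sync locks with the session lock dropped before I/O (per the "Session lock contention" row); the sketch shows only the retention logic.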

5.6 Remaining Performance Risks

S3 full-file buffering: The most significant open issue. A single 2 GB upload consumes 2 GB of server RAM. With 10 concurrent large uploads, the server requires 20 GB RAM just for upload buffering. The fix requires streaming multipart upload to S3 with bounded memory buffers and backpressure signaling to the client.

DB pool starvation: The default pool of 20 connections supports approximately 200 concurrent sessions at the current flush interval. Beyond this, flush operations queue behind the pool, increasing latency and sample buffer memory. Increasing the pool size requires corresponding PostgreSQL max_connections tuning.

Broadcast channel unbounded subscribers: The SSE live event endpoint creates broadcast receivers with no limit on subscriber count. A malicious client opening thousands of SSE connections could exhaust memory with broadcast channel buffers.


6. Recommendations

6.1 Platform Selection by Use Case

Use Case Recommended Platform Codec Max Config Rationale
Full-spectrum industrial monitoring ESP32-S3 XAP 4ch @96kHz 82% headroom, WiFi built-in, PSRAM for large buffers
Battery-powered field sensor RP2350 (Pico 2) XAP 4ch @96kHz 54% headroom, lowest active power (25 mA), dual-core
Compact industrial node STM32F411 XAP 4ch @48kHz 60% headroom, proven M4F ecosystem, UART LTE-M
BLE wearable / beacon nRF52840 XAP 2ch @48kHz 58% headroom, BLE transport, ultra-low sleep (1.5 uA)
Cellular IoT (LTE-M) nRF9160 XAP 2ch @48kHz 55% headroom, integrated LTE-M modem
BLE + Thread mesh STM32WB55 XAP 2ch @48kHz 58% headroom, dual-protocol BLE + 802.15.4
Voice-only / legacy sensor STM32F103 IMA-ADPCM 2ch @24kHz 78% headroom, ADPCM only, 20 KB SRAM limit
Low-cost WiFi sensor ESP32-C3 IMA-ADPCM 4ch @96kHz 98% headroom, ADPCM only, RISC-V, WiFi built-in
Education / prototyping RP2040 (Pico) IMA-ADPCM 4ch @96kHz 97% headroom, ADPCM only, lowest cost ($1)

6.2 Server Scaling Recommendations

Load Tier Sessions Server Config DB Pool Notes
Development 1-10 Single instance, 2 CPU 10 Default configuration sufficient
Small deployment 10-50 Single instance, 4 CPU 20 Default pool adequate
Medium deployment 50-200 Single instance, 8 CPU 40 Increase pool, monitor flush latency
Large deployment 200-1000 Multiple instances + LB 60/instance Requires streaming S3 upload fix, horizontal scaling
Enterprise 1000+ Kubernetes pods, auto-scale Connection pooler (PgBouncer) Requires all P1 issues resolved

6.3 Priority Engineering Work

  1. P1 — Streaming S3 upload: Replace full-file buffering with streaming multipart upload. Eliminates the O(file_size) memory consumption. Critical for any deployment handling files larger than 100 MB.

  2. P1 — Streaming upload body: Implement backpressure-aware streaming from HTTP body to S3, never holding more than a bounded buffer (e.g., 4 MB) in memory.

  3. P2 — SSE subscriber limits: Cap broadcast channel receivers per session (e.g., 100). Return 429 when exceeded.

  4. P3 — General API rate limiting: Add Tower rate-limit middleware to all API endpoints, not just auth.

  5. Optimization — Cosine table extension: Consider extending XAP_PRECOMPUTE_MAX_N to 480 on targets with sufficient SRAM (ESP32-S3 with PSRAM) for 48kHz table-based encoding without DSP dependency. This would eliminate the 170x discontinuity for the 48kHz use case.