Live Audio Streaming — Design Spec
Date: 2026-05-22
Status: Draft for user review
Scope: server (xylolabs-server), SDK (xylolabs-sdk), HAL (xylolabs-hal-esp), transcoder (xylolabs-transcode), admin frontend (frontend/), operator app (frontend-app/)
Coexists with: existing session-based chunked upload path (/api/v1/ingest/sessions/*) — unchanged
1. Motivation
The current ingestion pipeline is session-based: a device opens an ingest_session, batches audio + metadata for ~500 ms at a time, posts via HTTP, and the server flushes to S3 every ~10 s. There is no live audio egress; the SSE live tail is metadata-only and capped at 30 minutes.
We want to add continuous live audio streaming: a device pushes a never-ending stream of LC3 frames, multiple listeners (browser, mobile app, server consumers) tune in at <3 s latency, and every byte is simultaneously archived to S3 so the timeline view can offer scrub-back.
Hardware target for the first cut: ESP32-S3 + 4-channel PDM mic array, LC3 codec (XAP). CPU/RAM headroom is comfortable (~17% CPU @ 96 kHz × 4ch, ~32 KB RAM per CODEC-ANALYSIS.md).
2. Decisions (from brainstorming)
| Question | Decision |
|---|---|
| Live-listen latency target | 1–3 s glass-to-ear (PDM capture → encoder → server → fan-out → listener decode + playback) |
| Listener clients | Operator browser dashboard, frontend-app SPA, mobile app (native LC3), server-side consumers; needs LC3 / PCM / AAC / Opus outputs |
| Concurrent listeners | ~10 concurrent listening sessions per server node; server-side transcoding budget acceptable (ARM Neoverse, 4 cores) |
| Archive policy | Dual path: zstd-compressed LC3 (bit-exact copy of the device bitstream, no re-encode) + AAC HLS segments (browser-playable), both to S3 |
| Relationship to batch sessions | Coexist. New live_streams resource; existing ingest_sessions untouched |
| Browser playback transport | LL-HLS (AAC fMP4 segments, hls.js) |
| Stream identity | Stable per-device logical channel: stream_key = "{facility_slug}/{device_uid}/{port_index}". Reboots / WiFi flaps keep the same stream_id. Manifest uses EXT-X-DISCONTINUITY to mark gaps |
| Integration with existing device timeline | The new live_streams and per-stream archive segments must surface in the existing GET /api/v1/devices/{id}/timeseries response so clicking a device shows live hero + scrub-back track |
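The stream identity decision can be illustrated with a small helper. This is a minimal sketch, assuming the key is always exactly three slash-separated parts and that port_index fits in a u16; the StreamKey type and its methods are hypothetical names, not part of the existing codebase.

```rust
/// Parsed form of a stream key: "{facility_slug}/{device_uid}/{port_index}".
/// Hypothetical helper type used only for illustration.
#[derive(Debug, PartialEq, Eq)]
pub struct StreamKey {
    pub facility_slug: String,
    pub device_uid: String,
    pub port_index: u16,
}

impl StreamKey {
    /// Renders the key in the three-part form used by the spec.
    pub fn to_key(&self) -> String {
        format!("{}/{}/{}", self.facility_slug, self.device_uid, self.port_index)
    }

    /// Parses a key, rejecting missing parts, extra parts, empty slugs,
    /// and ports that are not valid unsigned integers.
    pub fn parse(key: &str) -> Option<StreamKey> {
        let mut parts = key.split('/');
        let facility_slug = parts.next()?;
        let device_uid = parts.next()?;
        let port = parts.next()?;
        if parts.next().is_some() || facility_slug.is_empty() || device_uid.is_empty() {
            return None;
        }
        Some(StreamKey {
            facility_slug: facility_slug.to_string(),
            device_uid: device_uid.to_string(),
            port_index: port.parse().ok()?,
        })
    }
}
```

Because the key is derived only from facility, device, and port, a device that reboots produces the same string and therefore lands on the same live_streams row.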
3. Architecture (high-level)
DEVICE (ESP32-S3)
4ch PDM mic → I2S DMA → ring → LC3 enc (4ch interleaved) → WS push
│
50–100 ms batch, XMBP framing
│
▼
SERVER
ingest_ws → LiveAudioManager
├─ broadcast<LiveAudioFrame> per stream
│
├─ archive_lc3 → S3 (lossless, zstd chunks)
├─ archive_hls → FFmpeg(LC3→AAC fMP4)
│ → memory ring (60 s) + S3 PUT (.m4s)
│
└─ on-demand encoders, lifecycle = subscriber count:
├─ listen.ws?format=lc3 (passthrough)
├─ listen.ws?format=pcm (LC3 → s16le)
├─ listen.ws?format=aac (FFmpeg LC3 → AAC ADTS)
├─ listen.ws?format=opus (FFmpeg LC3 → Ogg/Opus)
└─ listen.m3u8 + segments (LL-HLS, served from ring)
Single fan-out channel — LiveAudioManager demuxes each device's WS once; all output paths (LC3/PCM/AAC/Opus subscribers + the two archive tasks) attach as broadcast subscribers. PCM decode is lazy: only runs when at least one downstream needs it.
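The fan-out idea can be sketched with the standard library alone. This is an illustrative stand-in, not the planned implementation: the runtime struct in §5 uses a broadcast::Sender, whereas the hypothetical FanOut type below gives each subscriber its own unbounded std::sync::mpsc queue and has no backpressure or lag handling.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Minimal stand-in for a broadcast channel: one queue per subscriber.
/// Hypothetical type for illustration only.
pub struct FanOut<T: Clone> {
    subscribers: Vec<Sender<T>>,
}

impl<T: Clone> FanOut<T> {
    pub fn new() -> Self {
        FanOut { subscribers: Vec::new() }
    }

    /// Attaches a new subscriber (a listener or an archive task).
    pub fn subscribe(&mut self) -> Receiver<T> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Sends one frame to every subscriber and forgets those whose receiver
    /// has been dropped. Returns the number of subscribers still attached.
    pub fn publish(&mut self, frame: T) -> usize {
        self.subscribers.retain(|tx| tx.send(frame.clone()).is_ok());
        self.subscribers.len()
    }
}
```

The device's WebSocket is read once and each frame is published once; how many listeners exist only changes the number of queue writes, not the ingest path.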
HLS segment dual-write: newly encoded fMP4 segments go into both a 60-second in-memory ring (for low-latency live playback) and, asynchronously, to S3 (for DVR + replay). Live listeners never wait on S3 propagation.
4. Data model
Three new tables, none of which touch existing ingest_sessions.
live_streams — logical channel (one row per device port)
| column | type | notes |
|---|---|---|
| id | uuid PK | |
| facility_id | uuid → facilities | RBAC scope |
| device_id | uuid → devices | |
| port_index | smallint | for multi-mic boards (default 0) |
| stream_key | text UNIQUE | {facility_slug}/{device_uid}/{port_index} |
| display_name | text | operator label |
| channels | smallint CHECK 1..=8 | 1–4 for now |
| channel_names | text[] | e.g. {front, rear, L, R} |
| sample_rate_hz | int | 16/24/48/96 kHz |
| codec | text DEFAULT 'lc3' | |
| bitrate_per_channel_bps | int DEFAULT 64000 | |
| frame_duration_us | int DEFAULT 10000 | |
| transcode_profile | text DEFAULT 'default' | per-stream AAC/Opus bitrate override |
| state | text | idle / live / paused |
| last_connected_at | timestamptz | |
| last_disconnected_at | timestamptz | |
| total_seconds_live | bigint | cumulative uptime |
| retention_days | int | facility default override |
| metadata | jsonb | |
| created_at / updated_at / deleted_at | timestamptz | soft delete |
UNIQUE (facility_id, device_id, port_index) WHERE deleted_at IS NULL;
INDEX (facility_id, state) WHERE deleted_at IS NULL;
INDEX (stream_key);
live_stream_connections — per-WS audit log
| column | type | notes |
|---|---|---|
| id | uuid PK | |
| stream_id | uuid → live_streams | |
| api_key_id | uuid → api_keys | |
| started_at | timestamptz NOT NULL DEFAULT now() | |
| ended_at | timestamptz | NULL = in progress |
| client_addr | inet | |
| disconnect_reason | text | client_close/idle_timeout/server_close/error/replaced |
| samples_received | bigint | |
| bytes_received | bigint | |
| last_batch_seq | int | XMBP batch seq tracking |
live_archive_segments — unified LC3 + HLS segment index
| column | type | notes |
|---|---|---|
| id | uuid PK | |
| stream_id | uuid → live_streams | |
| kind | text | lc3_zstd / hls_init / hls_m4s |
| s3_key | text | object key |
| sequence_num | bigint | HLS media seq (NULL for lc3_zstd) |
| start_us | bigint | unix microseconds |
| duration_us | bigint | |
| byte_size | bigint | |
| discontinuity | bool DEFAULT false | HLS discontinuity marker |
| created_at | timestamptz NOT NULL DEFAULT now() | |
INDEX (stream_id, start_us);
INDEX (stream_id, kind, sequence_num);
INDEX (created_at); -- retention prune
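The (stream_id, start_us) index exists to answer scrub-back queries of the form "which segments cover this time window". A minimal in-memory sketch of that overlap test follows; the Segment struct mirrors only three of the table's columns, and the function name is hypothetical.

```rust
/// In-memory mirror of a few live_archive_segments columns (illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub start_us: i64,
    pub duration_us: i64,
    pub discontinuity: bool,
}

/// Returns the segments that overlap the half-open window [from_us, to_us),
/// ordered by start time. A segment overlaps when it starts before the window
/// ends and ends after the window starts.
pub fn segments_in_range(segments: &[Segment], from_us: i64, to_us: i64) -> Vec<&Segment> {
    let mut hits: Vec<&Segment> = segments
        .iter()
        .filter(|s| s.start_us < to_us && s.start_us + s.duration_us > from_us)
        .collect();
    hits.sort_by_key(|s| s.start_us);
    hits
}
```

In SQL the same predicate would be applied to start_us and start_us + duration_us for a single stream_id.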
api_keys scope additions (text[], no schema change)
- live:ingest: device push permission
- live:listen: listener pull permission
Both follow the existing media:read convention (cycle 5 2026-05-22, P672 followup).
Migration files
20260522120000_create_live_streams.sql
20260522120100_create_live_stream_connections.sql
20260522120200_create_live_archive_segments.sql
Strictly monotonic per CLAUDE.md migration rule. Verify before commit:
ls crates/xylolabs-db/migrations | awk -F_ '{print $1}' | sort -cu
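The same strict-ordering rule could also be asserted from a test. The sketch below assumes every migration file name starts with a fixed-width numeric timestamp followed by an underscore, so plain string comparison orders the prefixes correctly; the function name is hypothetical.

```rust
/// Returns true when the timestamp prefixes (text before the first '_') are
/// strictly increasing in the given order. Assumes fixed-width numeric
/// prefixes, so lexicographic comparison matches numeric comparison.
pub fn strictly_increasing_prefixes(file_names: &[&str]) -> bool {
    let prefixes: Vec<&str> = file_names
        .iter()
        .map(|name| name.split('_').next().unwrap_or(""))
        .collect();
    // Every adjacent pair must be strictly ordered; duplicates fail.
    prefixes.windows(2).all(|pair| pair[0] < pair[1])
}
```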
5. Server components
New module: crates/xylolabs-server/src/live/
src/live/
├── manager.rs # LiveAudioManager — orchestrates WS + fan-out + archive
├── frame.rs # LiveAudioFrame { stream_id, seq, pts_us, lc3_payload, channels }
├── transcode.rs # PerFormatEncoder trait + FFmpeg subprocess (AAC, Opus)
├── pcm.rs # LC3 → PCM s16le (in-process, liblc3 Rust)
├── archive_lc3.rs # LC3 zstd chunked archiver (re-uses IngestManager logic)
├── archive_hls.rs # HLS fMP4 segmenter + S3 PUT + memory ring
├── hls_playlist.rs # LL-HLS .m3u8 builder (live + DVR window)
└── connection.rs # per-device WS handler (BatchSequence tracking)
LiveAudioManager
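pub struct LiveAudioManager {
    // One runtime per live stream, keyed by stream_id.
    streams: DashMap<Uuid, Arc<LiveStreamRuntime>>,
    s3: S3Client,
    db: PgPool,
    config: LiveConfig,
}
struct LiveStreamRuntime {
    stream_id: Uuid,
    facility_id: Uuid,
    // LC3 frames from the device; every listener and archive task subscribes here.
    audio_tx: broadcast::Sender<LiveAudioFrame>,
    // Decoded s16le PCM; only fed while the lazy PCM decode task is running.
    pcm_tx: broadcast::Sender<Bytes>,
    // On-demand per-format encoders (AAC, Opus), reaped when idle.
    encoders: RwLock<HashMap<EncoderKey, Arc<EncoderTask>>>,
    // Most recent ~60 s of fMP4 segments for LL-HLS playback.
    hls_ring: Arc<HlsMemoryRing>,
    state: RwLock<LiveStreamState>,
}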
Single-connection-per-stream invariant
When a second WS arrives for the same stream_key, the existing connection is closed with code 4001 "replaced" and the UI shows a warning. This prevents two devices from pushing into the same logical channel.
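A minimal sketch of the invariant, assuming connections are identified by a numeric id and that the caller is responsible for actually closing the displaced socket. IngestRegistry and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// WS close code sent to a displaced ingest connection (from the spec).
pub const CLOSE_REPLACED: u16 = 4001;

/// Tracks the single active ingest connection per stream_key.
/// Hypothetical type for illustration only.
#[derive(Default)]
pub struct IngestRegistry {
    active: HashMap<String, u64>, // stream_key -> connection id
}

impl IngestRegistry {
    /// Makes `conn_id` the active connection for `stream_key`. If another
    /// connection was active, returns its id and the close code to send it.
    pub fn attach(&mut self, stream_key: &str, conn_id: u64) -> Option<(u64, u16)> {
        self.active
            .insert(stream_key.to_string(), conn_id)
            .map(|old| (old, CLOSE_REPLACED))
    }

    /// Removes the entry only if `conn_id` is still the active connection, so
    /// a replaced connection that closes late cannot evict its successor.
    pub fn detach(&mut self, stream_key: &str, conn_id: u64) -> bool {
        if self.active.get(stream_key) == Some(&conn_id) {
            self.active.remove(stream_key);
            true
        } else {
            false
        }
    }
}
```

The guard in detach matters because the old connection's cleanup usually runs after the new connection has already registered.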
Lazy encoder lifecycle
- PCM decode task spawns when any of {PCM listener, AAC encoder, Opus encoder} attaches.
- AAC/Opus encoder spawns on first listener subscription, dies 30 s after last unsubscribe.
- HLS archive encoder runs continuously while state=live (segments needed for both archive + future late joiners).
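The spawn-on-subscribe, reap-after-30-s rule can be modelled as a small state machine. This is a sketch under the assumption that a periodic tick drives reaping and that time is passed in explicitly (as a Duration since some fixed origin) so the logic is testable; EncoderLifecycle is a hypothetical name and no real process is spawned here.

```rust
use std::time::Duration;

/// Grace period after the last unsubscribe before the encoder is reaped.
pub const REAP_AFTER: Duration = Duration::from_secs(30);

/// Tracks whether an on-demand encoder should be running (illustrative only).
#[derive(Debug, Default)]
pub struct EncoderLifecycle {
    subscribers: usize,
    idle_since: Option<Duration>, // set while running with zero subscribers
    running: bool,
}

impl EncoderLifecycle {
    /// First subscriber starts the encoder; any subscribe cancels a pending reap.
    pub fn subscribe(&mut self) {
        self.subscribers += 1;
        self.idle_since = None;
        self.running = true;
    }

    /// Last unsubscribe starts the 30 s idle timer.
    pub fn unsubscribe(&mut self, now: Duration) {
        self.subscribers = self.subscribers.saturating_sub(1);
        if self.subscribers == 0 && self.running {
            self.idle_since = Some(now);
        }
    }

    /// Periodic check; returns true when the encoder is reaped on this tick.
    pub fn tick(&mut self, now: Duration) -> bool {
        match self.idle_since {
            Some(since) if now.saturating_sub(since) >= REAP_AFTER => {
                self.running = false;
                self.idle_since = None;
                true
            }
            _ => false,
        }
    }

    pub fn is_running(&self) -> bool {
        self.running
    }
}
```

The grace period avoids respawning FFmpeg when a listener reloads the page or briefly loses its connection.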
HlsMemoryRing
struct HlsMemoryRing {
    init_segment: ArcSwap<Bytes>,
    segments: parking_lot::Mutex<VecDeque<HlsSegment>>, // last ~60 s
    parts: parking_lot::Mutex<VecDeque<HlsPart>>,       // LL-HLS partials (200 ms)
    media_sequence: AtomicU64,
}
Live listeners get segments from memory; the same bytes are PUT to S3 in the background for DVR.
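The eviction rule for a time-bounded ring can be sketched without locking or LL-HLS parts. This is an illustration only: SegmentRing and RingSegment are hypothetical names, and the policy shown (drop the oldest segment whenever the remainder still covers the full window) is one reasonable reading of "last ~60 s".

```rust
use std::collections::VecDeque;

/// One buffered media segment (illustrative fields only).
pub struct RingSegment {
    pub seq: u64,
    pub duration_us: u64,
    pub bytes: Vec<u8>,
}

/// Keeps roughly the most recent `window_us` of segments in memory.
pub struct SegmentRing {
    window_us: u64,
    buffered_us: u64,
    segments: VecDeque<RingSegment>,
}

impl SegmentRing {
    pub fn new(window_us: u64) -> Self {
        SegmentRing { window_us, buffered_us: 0, segments: VecDeque::new() }
    }

    /// Appends a segment, then evicts from the front while the remaining
    /// segments would still cover at least the configured window.
    pub fn push(&mut self, seg: RingSegment) {
        self.buffered_us += seg.duration_us;
        self.segments.push_back(seg);
        while let Some(front_us) = self.segments.front().map(|s| s.duration_us) {
            if self.buffered_us - front_us < self.window_us {
                break;
            }
            self.buffered_us -= front_us;
            self.segments.pop_front();
        }
    }

    /// Looks up a segment by media sequence number; None means it has been
    /// evicted (or never existed) and must be fetched from S3 instead.
    pub fn get(&self, seq: u64) -> Option<&RingSegment> {
        self.segments.iter().find(|s| s.seq == seq)
    }

    pub fn len(&self) -> usize {
        self.segments.len()
    }
}
```

A miss in get corresponds to the "served from memory ring (<60 s old) or S3" fallback described for the segment endpoint.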
Existing module touches
- state.rs: add live: Arc<LiveAudioManager> to AppState
- router.rs: register new routes (see §6)
- routes/device_timeline.rs: extend response with live_streams + live_segments
- middleware/rbac.rs: add live:read / live:listen / live:manage perms
Background workers
- LiveRetentionWorker: every 30 min, prune live_archive_segments older than facility.retention_days + S3 batch delete.
- LiveHealthMonitor: every 10 s, transition state=live → idle when last_connected_at < now - 30 s. Attaches EXT-X-DISCONTINUITY on the next segment after gap recovery.
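How a gap surfaces in the manifest can be shown with a reduced playlist renderer. This is a sketch, not the planned hls_playlist.rs: it emits only basic media-playlist tags, omits LL-HLS parts, EXT-X-PROGRAM-DATE-TIME, and EXT-X-DISCONTINUITY-SEQUENCE, and assumes the init segment and segment URIs are named as in §6. PlaylistEntry and render_playlist are hypothetical names.

```rust
/// One playlist row (illustrative subset of the segment index).
pub struct PlaylistEntry {
    pub seq: u64,
    pub duration_s: f64,
    pub discontinuity: bool,
}

/// Renders a live media playlist. A segment flagged `discontinuity` is
/// preceded by #EXT-X-DISCONTINUITY, marking a gap before it.
pub fn render_playlist(entries: &[PlaylistEntry], target_duration_s: u32) -> String {
    let mut out = String::new();
    out.push_str("#EXTM3U\n#EXT-X-VERSION:7\n");
    out.push_str(&format!("#EXT-X-TARGETDURATION:{}\n", target_duration_s));
    // Media sequence is the sequence number of the first listed segment.
    let first_seq = entries.first().map(|e| e.seq).unwrap_or(0);
    out.push_str(&format!("#EXT-X-MEDIA-SEQUENCE:{}\n", first_seq));
    out.push_str("#EXT-X-MAP:URI=\"init.mp4\"\n");
    for entry in entries {
        if entry.discontinuity {
            out.push_str("#EXT-X-DISCONTINUITY\n");
        }
        out.push_str(&format!(
            "#EXTINF:{:.3},\nsegments/{}.m4s\n",
            entry.duration_s, entry.seq
        ));
    }
    // No #EXT-X-ENDLIST: the stream is live and the playlist keeps growing.
    out
}
```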
6. API surface
Device ingestion
| method | path | auth | notes |
|---|---|---|---|
| POST | /api/v1/live/streams | JWT | operator creates a logical stream (optional; auto-provision also supported) |
| GET (WS) | /api/v1/live/streams/{key}/ingest | API key, live:ingest | one connection per stream_key; 4001 "replaced" on dup |
Listener egress
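All require JWT OR API key with live:listen (same dual-auth pattern as media:read). Paths are relative to the stream's base URL under /api/v1/live/streams/ (compare the nginx locations in §10).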
| method | path | format | notes |
|---|---|---|---|
| GET (WS) | /listen.ws?format={lc3\|pcm\|aac\|opus}&bitrate={kbps}&channels=... | per format | first frame is JSON hello, then binary u64 BE pts_us + payload |
| GET | /listen.m3u8?codec=aac&bitrate=128 | LL-HLS | accepts ?token=jwt for <video> element |
| GET | /init.mp4 | fMP4 | immutable per stream, 1y cache |
| GET | /segments/{seq}.m4s | fMP4 segment | served from memory ring (<60 s old) or S3 |
| GET | /segments/{seq}.{part}.m4s | LL-HLS partial | 200 ms |
| GET | /segments/{seq}.m4s?vod={start_us}-{end_us} | fMP4 | DVR within retention_days |
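The binary WS message layout (u64 big-endian pts_us followed by the payload) can be sketched as a pair of helpers. The function names are hypothetical, and the JSON hello frame is not covered here.

```rust
/// Builds one binary listener message: 8-byte big-endian pts_us, then payload.
pub fn encode_frame(pts_us: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&pts_us.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Splits a binary listener message into (pts_us, payload).
/// Returns None when the message is shorter than the 8-byte header.
pub fn decode_frame(msg: &[u8]) -> Option<(u64, &[u8])> {
    if msg.len() < 8 {
        return None;
    }
    let mut header = [0u8; 8];
    header.copy_from_slice(&msg[..8]);
    Some((u64::from_be_bytes(header), &msg[8..]))
}
```

A fixed 8-byte header keeps the passthrough path allocation-light: the server only prepends the timestamp and never inspects the codec payload.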
CRUD
| method | path | auth |
|---|---|---|
| GET | /api/v1/live/streams?facility_id=…&device_id=…&state=live | JWT, live:read |
| GET | /api/v1/live/streams/{id} | JWT |
| PATCH | /api/v1/live/streams/{id} | JWT, live:manage |
| DELETE | /api/v1/live/streams/{id} | JWT, live:manage (soft delete) |
| POST | /api/v1/auth/live-token | JWT |
Scope wildcards
The existing API-key scope check (crates/xylolabs-server/src/middleware/api_key_auth.rs::api_key_has_scope) accepts "*" as a wildcard. Keys minted with ["*"] automatically satisfy both live:ingest and live:listen; no migration of existing super-keys is required.
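The wildcard behavior described above amounts to the following check. This is a sketch of the described semantics, not the actual body of api_key_has_scope, whose signature and implementation are not shown in this spec; has_scope is a hypothetical name.

```rust
/// Returns true when `granted` contains either the exact required scope or
/// the "*" wildcard. Illustrative only; the real check lives in
/// middleware/api_key_auth.rs and may differ in detail.
pub fn has_scope(granted: &[&str], required: &str) -> bool {
    granted.iter().any(|scope| *scope == "*" || *scope == required)
}
```

Under this rule a key holding only media:read gains neither live scope, so existing narrowly scoped keys are unaffected.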
Existing endpoint extension
GET /api/v1/devices/{id}/timeseries response gains:
{
// existing fields…
"live_streams": [{ "stream_id", "stream_key", "display_name", "channels", "state", "last_connected_at" }],
"live_segments": [{ "stream_id", "start_us", "end_us", "kind", "discontinuity" }]
}
Single fetch → metadata charts + recording events + live hero + scrub track.
Error surface (new i18n keys)
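- errors.live.streamNotLive: "The stream is not currently broadcasting"
- errors.live.formatUnsupported: "Unsupported format: {format}"
- errors.live.replaced: "The same device connected from another location, so the previous session was closed"
- errors.live.subscriberLimit: "The per-stream concurrent listener limit has been exceeded"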
WS close codes
| code | meaning |
|---|---|
| 4001 | replaced (ingest) |
| 4002 | auth_revoked |
| 4003 | facility_mismatch |
| 4004 | stream_not_live (listen) |
| 4005 | format_unsupported (listen) |
| 1011 | internal error |
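The table maps naturally onto an enum shared by the ingest and listen handlers. A minimal sketch, with LiveCloseCode as a hypothetical name:

```rust
/// WebSocket close codes used by the live endpoints (values from the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LiveCloseCode {
    Replaced,
    AuthRevoked,
    FacilityMismatch,
    StreamNotLive,
    FormatUnsupported,
    InternalError,
}

impl LiveCloseCode {
    /// Numeric code sent in the WS close frame.
    pub fn code(self) -> u16 {
        match self {
            LiveCloseCode::Replaced => 4001,
            LiveCloseCode::AuthRevoked => 4002,
            LiveCloseCode::FacilityMismatch => 4003,
            LiveCloseCode::StreamNotLive => 4004,
            LiveCloseCode::FormatUnsupported => 4005,
            LiveCloseCode::InternalError => 1011,
        }
    }

    /// Maps a received close code back to a variant; unknown codes give None.
    pub fn from_code(code: u16) -> Option<Self> {
        match code {
            4001 => Some(LiveCloseCode::Replaced),
            4002 => Some(LiveCloseCode::AuthRevoked),
            4003 => Some(LiveCloseCode::FacilityMismatch),
            4004 => Some(LiveCloseCode::StreamNotLive),
            4005 => Some(LiveCloseCode::FormatUnsupported),
            1011 => Some(LiveCloseCode::InternalError),
            _ => None,
        }
    }
}
```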
7. SDK changes (Rust no_std)
New module: xylolabs-sdk/src/live/
src/live/
├── client.rs # LiveClient — long-lived WS
├── transport_ws.rs # embedded-websocket over embedded-tls + embassy-net
├── encoder_4ch.rs # N-channel interleaved LC3
└── reconnect.rs # exponential backoff, stable stream_key
LiveClient
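pub struct LiveClient<P: Platform, T: WsTransport, const N_CH: usize> {
    transport: T,
    encoder: Lc3MultiChannel<N_CH>,
    stream_key: heapless::String<128>,
    // Note: a const argument computed from N_CH needs the unstable
    // generic_const_exprs feature; on stable Rust the ring capacity would
    // have to be passed as its own const parameter.
    ring: RingBuffer<{ N_CH * STANDARD_MAX_FRAME_SAMPLES }>,
    batch_ms: u16,
    api_key: heapless::String<64>,
}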
- Configurable batch_ms (50–100 ms for live; default 50).
- Reconnect: 200 ms → 2× → 30 s cap. Same stream_key across reboots → same stream_id → listener URL stable.
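The reconnect schedule (start at 200 ms, double each attempt, cap at 30 s) can be written as a pure function that uses only core integer arithmetic, so it fits a no_std build. The names are hypothetical, and no jitter is applied because the spec does not mention any.

```rust
pub const INITIAL_BACKOFF_MS: u32 = 200;
pub const MAX_BACKOFF_MS: u32 = 30_000;

/// Delay before reconnect attempt number `attempt` (0-based):
/// 200 ms, 400 ms, 800 ms, ... capped at 30 s.
pub fn backoff_ms(attempt: u32) -> u32 {
    // Clamp the shift so the intermediate value cannot overflow u64.
    let shift = attempt.min(16);
    let delay = (INITIAL_BACKOFF_MS as u64) << shift;
    delay.min(MAX_BACKOFF_MS as u64) as u32
}
```

With these constants the cap is reached on the ninth attempt (attempt index 8), after which every retry waits 30 s.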
ESP32-S3 4ch PDM example
New: sdk/examples/esp32-s3-4ch-live/. Adds to xylolabs-hal-esp:
pub fn new_pdm_4ch(pio: PIO0, clk_pin: u8, data_pins: [u8; 4]) -> Pdm4ChDriver;
DMA double-buffer [i16; 4 * 480] × 2. Embassy task: PDM DMA → client.push_pcm().
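The buffer size follows from the frame parameters: 48 kHz × 10 ms = 480 samples per channel, times 4 channels. The sketch below shows that arithmetic plus a deinterleave step, assuming the DMA buffer holds samples interleaved per channel (ch0, ch1, ch2, ch3, ch0, ...); that layout and both function names are assumptions for illustration.

```rust
/// Samples per channel in one codec frame, e.g. 48_000 Hz x 10_000 us = 480.
pub const fn samples_per_frame(sample_rate_hz: u32, frame_duration_us: u32) -> usize {
    (sample_rate_hz as u64 * frame_duration_us as u64 / 1_000_000) as usize
}

/// Splits an interleaved buffer into one array per channel.
/// Returns false (leaving `out` untouched) when the input length does not
/// equal N_CH * FRAME.
pub fn deinterleave<const N_CH: usize, const FRAME: usize>(
    interleaved: &[i16],
    out: &mut [[i16; FRAME]; N_CH],
) -> bool {
    if interleaved.len() != N_CH * FRAME {
        return false;
    }
    for (i, sample) in interleaved.iter().enumerate() {
        // Sample i belongs to channel i % N_CH at frame position i / N_CH.
        out[i % N_CH][i / N_CH] = *sample;
    }
    true
}
```

Double-buffering lets the DMA engine fill one 4 × 480 buffer while the task processes the other.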
Existing SDK unchanged
StreamingClient (session/batch), HttpTransport, single-channel LC3 — untouched. Opt-in live path.
8. Frontend changes
Admin frontend (frontend/)
New components: LiveAudioPlayer, LiveStreamBadge, LiveSegmentTrack, ChannelMixer.
LiveAudioPlayer uses hls.js@^1.5 with lowLatencyMode: true. Auth: short-lived (5 min) JWT issued by POST /api/v1/auth/live-token, embedded in M3U8 URL as ?token=... (because <video> cannot carry custom headers). Media Session API for OS lockscreen / Bluetooth controls.
DeviceTimelinePage.tsx integration
- Top hero: 🔴 LIVE badge + display_name + channel count + Listen button if the device has a state=live stream.
- Live segment track: filled bar overlay on the recording_events track. Gaps render as EXT-X-DISCONTINUITY markers. Click → opens player in VOD mode at that timestamp.
frontend-app/
Symmetric integration with a reduced, operator-facing UI. The same LiveAudioPlayer is lifted to a shared package if duplication grows.
9. Transcoding pipeline
FFmpeg subprocess strategy
FFmpeg ≥ 6 required (for LL-HLS EXT-X-PART + independent_segments HLS flag combination). The production Docker image already ships FFmpeg ≥ 7 per the existing xylolabs-transcode pipeline; the live path reuses the same binary.
Long-running encoder per (stream_id, format) (not segment-by-segment spawn). stdin = PCM s16le, stdout = AAC fMP4 or Ogg/Opus. Idle 30 s after last subscriber → reaped. HLS archive encoder runs continuously while state=live.
AAC HLS command (template)
ffmpeg -fflags +nobuffer -flush_packets 1 -muxdelay 0 \
-f s16le -ar 48000 -ac 4 -i pipe:0 \
-c:a aac -b:a 128k -ac 2 -af "pan=stereo|c0<c0+c1|c1<c2+c3" \
-f hls \
-hls_segment_type fmp4 -hls_time 1.0 \
-hls_flags omit_endlist+independent_segments+delete_segments+temp_file+program_date_time \
-hls_segment_filename "segments/%d.m4s" \
-hls_list_size 60 -hls_playlist_type event \
/dev/null
4ch → stereo downmix for first cut; multi-channel AAC (HE-AAC v2) deferred.
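In FFmpeg's pan filter, writing "<" instead of "=" renormalizes the gains so they sum to 1, so c0<c0+c1 is the average of the two inputs. The sketch below reproduces that downmix on interleaved s16le samples. It is an approximation for illustration: it uses integer division rather than FFmpeg's internal arithmetic, and the function name is hypothetical.

```rust
/// Downmixes interleaved 4-channel PCM to interleaved stereo:
/// left = (c0 + c1) / 2, right = (c2 + c3) / 2.
/// Trailing samples that do not form a full 4-channel frame are ignored.
pub fn downmix_4ch_to_stereo(interleaved: &[i16]) -> Vec<i16> {
    interleaved
        .chunks_exact(4)
        .flat_map(|frame| {
            // Widen to i32 so the sum cannot overflow before halving.
            let left = (frame[0] as i32 + frame[1] as i32) / 2;
            let right = (frame[2] as i32 + frame[3] as i32) / 2;
            [left as i16, right as i16]
        })
        .collect()
}
```

Averaging each pair cannot clip, which is the reason for the renormalizing form of the filter.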
Resource budget
- Memory: ~15 MB FFmpeg RSS per encoder × ~1.5 formats avg × 10 streams ≈ 225 MB, well within the 4 GiB of a t4g.medium.
- CPU: AAC encode at 48 kHz stereo costs ≈ 5% of one Graviton2 core per stream, so 10 streams ≈ 50% of one core.
10. Operations
CSP (crates/xylolabs-server/src/router.rs)
connect-src 'self' wss://api.xylolabs.com blob:;
media-src 'self' blob:;
(LL-HLS is same-origin so no extra origin needed.)
nginx (deploy/nginx/)
# Regex locations (~) are needed here: a plain prefix location would match ".*" literally.
location ~ ^/api/v1/live/streams/.+/ingest$ {
    proxy_pass http://app:3000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 7d; proxy_send_timeout 7d;
    proxy_buffering off;
}
location ~ ^/api/v1/live/streams/.+/listen\.ws$ { ... same ... }
location ~ ^/api/v1/live/streams/.+/listen\.m3u8$ {
    proxy_pass http://app:3000;
    proxy_cache off;
    add_header Cache-Control "no-cache" always;
}
location ~ ^/api/v1/live/streams/.+/segments/ {
    proxy_pass http://app:3000;
    proxy_cache_valid 200 60s;
}
Migration / deploy order
- SQL migrations (3 files, strictly monotonic)
- Server with new routes + manager
- nginx reload (new locations)
- CSP header update (in server router code)
- Frontend (LiveAudioPlayer + DeviceTimelinePage integration)
- SDK firmware OTA (opt-in flag)
Each step is independently revertable.
11. Testing
Unit (server lib)
- HlsPlaylist builder: various seq/discontinuity combos → canonical M3U8 text
- pcm.rs LC3 → s16le round-trip accuracy (samples_in == samples_out within float epsilon for sample data)
- LiveAudioManager pubsub fan-out: subscribers count == sender attached count
- Encoder lifecycle: spawn on subscribe, reap 30 s after last unsubscribe
Integration (server tests/)
- api_live_ingest.rs: WS connect → push 100 frames → WS listener receives them. Docker postgres + minio.
- api_live_listen_ws.rs: 4 formats each: hello JSON + binary frames
- api_live_hls.rs: m3u8 fetch → init.mp4 → segment stream; assert EXT-X-PROGRAM-DATE-TIME accuracy ≤ 1 s
- api_live_archive.rs: stream close → both LC3 zstd + HLS segments PUT to S3 + DB rows indexed
- api_live_replaced.rs: second WS to same stream_key closes first with 4001
E2E
tests/e2e-live-stream/ — ESP32 simulator → live push → headless browser → assert end-to-end glass-to-ear latency 1–3 s.
Load
- tests/burnin-live/: 50 concurrent streams × 4ch × 1h → memory/CPU snapshot
- tests/listener-fanout/: 100 concurrent listeners × 1 stream → broadcast channel backpressure
12. Rollout (phased, ~8–9 weeks)
| Phase | Scope | Duration |
|---|---|---|
| 0 | DB migrations, scope additions, transcode skeleton, LiveStreamActor | 1 wk |
| 1 | LiveAudioManager, WS ingest, LC3 zstd archive, integration tests | 1 wk |
| 2 | broadcast fan-out, per-format encoders, WS egress (4 formats), token endpoint | 1 wk |
| 3 | HLS segmenter, memory ring, LL-HLS playlist, DVR window | 1 wk |
| 4 | Frontend LiveAudioPlayer, DeviceTimelinePage hero + segment track | 1.5 wk |
| 5 | SDK LiveClient, ESP32-S3 4ch PDM HAL, 4ch LC3 encoder, OTA flag | 2 wk |
| 6 | E2E + burn-in + docs (docs/LIVE-STREAMING.{en,ko}.md) + API.md update | 1 wk |
Phases 0–3 are server-only and can ship without affecting users. 4–5 can run in parallel (web vs firmware teams).
Per-phase gate
- All new + existing tests pass
- cargo clippy -- -D warnings clean
- Browser 3 viewports (mobile/tablet/desktop) zero pageerror
- Deploy + health check 200 + endpoint auth matrix verified
13. Out of scope (deferred to subsequent specs)
- WebRTC sub-second listener path (would require SFU)
- Adaptive bitrate (multi-quality HLS rendition)
- Multi-channel AAC (HE-AAC v2)
- Cross-facility public share links (per-stream token grants)
- "Recording window" UI (explicit record start/stop on top of always-archive)
- Listener-side encryption beyond TLS (audio-payload-level)