Architecture

System Overview

See the rendered system diagram: architecture.svg.

At a high level:

- firmware application code runs on top of xylolabs-sdk
- xylolabs-protocol provides the shared XMBP wire format
- xylolabs-hal-* crates implement target-specific platform adapters
- xylolabs-server accepts HTTP/WebSocket ingest, enforces auth, and coordinates persistence
- PostgreSQL stores metadata/index state while MinIO stores audio/blob payloads

Data Flow

The MCU captures audio (I2S) and sensor data and feeds both into XylolabsClient.
XylolabsClient encodes audio via XAP or ADPCM into the ring buffer and accumulates metadata samples.
On each tick(), the client batches metadata into an XMBP packet and sends audio + metadata over HTTP POST (or WebSocket) to the server.
The server authenticates via API key, decodes XMBP, buffers samples in IngestManager, and flushes to PostgreSQL (metadata index records) and MinIO (metadata chunks in XMCH format, plus audio payloads).

Crate Dependency Graph

Dependency flow summary:

- xylolabs-server depends on xylolabs-core, xylolabs-db, xylolabs-storage, and xylolabs-transcode
- xylolabs-transcode also depends on xylolabs-core, xylolabs-db, and xylolabs-storage
- xylolabs-sdk depends on xylolabs-protocol
- xylolabs-hal-rp and xylolabs-hal-stm32 sit on top of xylolabs-sdk and also use xylolabs-modem (UART modem driver); xylolabs-hal-esp/xylolabs-hal-nrf sit on xylolabs-sdk only

Crate Roles

Crate	Role	`no_std`
`xylolabs-core`	Domain models, DTOs, error types, chunk format (XMCH), XMBP server-side codec, downsampling	No
`xylolabs-db`	SQLx repository layer + PostgreSQL migrations (140 files; verify: `ls crates/xylolabs-db/migrations/*.sql | wc -l`)	No
`xylolabs-storage`	S3/MinIO client abstraction	No
`xylolabs-transcode`	FFmpeg worker, XAP server-side decoder, stale job reaper, transcoding pipeline	No
`xylolabs-server`	Axum web server, routes, middleware, ingest engine	No
`xylolabs-protocol`	XMBP wire protocol encoder/decoder (shared with SDK)	Yes
`xylolabs-sdk`	Embedded SDK core: client, session, codecs, transport	Yes
`xylolabs-modem`	Shared AT command modem driver (LTE-M1/NB-IoT) consumed by HAL crates that use a UART modem	Yes
`xylolabs-hal-*`	Per-target Platform trait implementations (not in workspace)	Yes

Proprietary Protocols (Summary)

XMBP -- Xylolabs Metadata Batch Protocol

Binary big-endian wire protocol for batched sensor/metadata transmission from MCU to server. Shared between the no_std SDK encoder and the server decoder.

Magic: 0x584D4250 ("XMBP"), Version: 1
Supports 13 value types (F64, F32, I64, I32, I16, I8, Bool, String, Bytes, F64Array, F32Array, I32Array, Json)
Up to 256 streams per batch, microsecond timestamps
Encoder: zero-copy, zero-alloc into borrowed buffer with sticky overflow
Decoder: validates all lengths before allocation (DoS-safe)
Batch sequence wraps at u16::MAX with forward/backward gap detection

Full specification: XMBP-SPECIFICATION.md
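
The "zero-alloc into borrowed buffer with sticky overflow" behaviour can be illustrated with a small writer. This is a minimal sketch under stated assumptions, not the xylolabs-protocol API: `BatchWriter` and its methods are hypothetical names, and only the magic and version values come from this section.

```rust
/// Hypothetical zero-alloc writer over a caller-provided buffer.
pub struct BatchWriter<'a> {
    buf: &'a mut [u8],
    len: usize,
    /// Sticky flag: once set, every later write is a no-op.
    overflowed: bool,
}

pub const XMBP_MAGIC: u32 = 0x584D_4250; // "XMBP"
pub const XMBP_VERSION: u8 = 1;

impl<'a> BatchWriter<'a> {
    pub fn new(buf: &'a mut [u8]) -> Self {
        Self { buf, len: 0, overflowed: false }
    }

    fn put(&mut self, bytes: &[u8]) {
        if self.overflowed {
            return;
        }
        let end = self.len + bytes.len();
        if end > self.buf.len() {
            self.overflowed = true;
            return;
        }
        self.buf[self.len..end].copy_from_slice(bytes);
        self.len = end;
    }

    // Multi-byte integers are written big-endian, as the wire format requires.
    pub fn put_u8(&mut self, v: u8) { self.put(&[v]); }
    pub fn put_u16(&mut self, v: u16) { self.put(&v.to_be_bytes()); }
    pub fn put_u32(&mut self, v: u32) { self.put(&v.to_be_bytes()); }

    /// Returns the encoded bytes, or None if any write overflowed.
    pub fn finish(self) -> Option<&'a [u8]> {
        let Self { buf, len, overflowed } = self;
        if overflowed {
            return None;
        }
        let shared: &'a [u8] = buf;
        Some(&shared[..len])
    }
}
```

With a sticky flag the caller checks for overflow once, at `finish()`, instead of after every field write.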

XAP -- Xylolabs Audio Protocol

Proprietary MDCT-based spectral audio codec providing ~8:1 to 10:1 compression of 16-bit PCM.

Sample rates: 8, 16, 24, 32, 48, 96 kHz
Channels: 1--4 (encoded independently)
Frame durations: 7.5 ms, 10 ms
~10 MIPS per channel (with DSP acceleration), ~8 KB RAM per channel
Requires FPU; platforms without FPU use IMA-ADPCM instead
Codec ID in XMBP: 0x03

Server-side decoder: The xylolabs-transcode crate includes an XAP decoder (xap_decode.rs) that performs inverse MDCT + dequantization. When a .xap file is uploaded, the transcode pipeline automatically decodes it to PCM WAV before passing to FFmpeg for transcoding to the target format. Sample rate is inferred from the frame header's frame_samples field.

Full specification: XAP-SPECIFICATION.md

XMCH -- Xylolabs Metadata Chunk Format

Server-side binary format for storing flushed metadata chunks in S3/MinIO.

Layout:

- 32-byte header
  - magic: 0x584D4348 (XMCH)
  - version: 1
  - value type: wire tag
  - reserved: 2 bytes
  - sample count: u32
  - start timestamp: u64 microseconds
  - end timestamp: u64 microseconds
  - data size: u32
- timestamps column: N × u64 big-endian
- values column: type-dependent payload

Chunks are optionally zstd-compressed via encode_chunk_compressed() before S3 upload.
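
The header fields listed above add up to 32 bytes only if version and value type occupy one byte each (4 + 1 + 1 + 2 + 4 + 8 + 8 + 4). The sketch below encodes a header on that assumption; the field widths for version and value type, the big-endian encoding of header integers, and the `ChunkHeader` type are all illustrative guesses rather than the xylolabs-core definition.

```rust
pub const XMCH_MAGIC: u32 = 0x584D_4348; // "XMCH"
pub const XMCH_HEADER_LEN: usize = 32;

/// Hypothetical in-memory view of the XMCH header.
pub struct ChunkHeader {
    pub value_type: u8, // wire tag (assumed to be one byte)
    pub sample_count: u32,
    pub start_us: u64,
    pub end_us: u64,
    pub data_size: u32,
}

impl ChunkHeader {
    /// Serializes the header; integers are assumed to be big-endian.
    pub fn encode(&self) -> [u8; XMCH_HEADER_LEN] {
        let mut out = [0u8; XMCH_HEADER_LEN];
        out[0..4].copy_from_slice(&XMCH_MAGIC.to_be_bytes());
        out[4] = 1; // version (assumed to be one byte)
        out[5] = self.value_type;
        // out[6..8] is the reserved field and stays zero.
        out[8..12].copy_from_slice(&self.sample_count.to_be_bytes());
        out[12..20].copy_from_slice(&self.start_us.to_be_bytes());
        out[20..28].copy_from_slice(&self.end_us.to_be_bytes());
        out[28..32].copy_from_slice(&self.data_size.to_be_bytes());
        out
    }
}
```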

Server Ingestion Pipeline

IngestManager

Located at crates/xylolabs-server/src/ingest/manager.rs.

Maintains per-session state with per-stream StreamBuffers.
Buffers up to 500,000 samples per stream before flush.
Flush writes encoded XMCH chunks to S3 and metadata records to PostgreSQL.
Concurrent flush guard (flushing flag) prevents double-flush. Samples arriving during flush are accumulated separately and merged on completion.
On flush failure, original samples are prepended back for retry.
Supports live event broadcasting via tokio::sync::broadcast for real-time WebSocket subscribers.
Stale session eviction based on last activity timestamp.
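
The flush guard and retry behaviour described above can be sketched with plain vectors. This is an illustration only: the struct below is a hypothetical stand-in for the real StreamBuffer, samples are reduced to `u64` values, and the S3/PostgreSQL writes are represented by the `Result` handed to `end_flush`.

```rust
/// Hypothetical, simplified stand-in for a per-stream buffer.
#[derive(Default)]
pub struct StreamBuffer {
    samples: Vec<u64>,  // buffered samples awaiting flush
    arrivals: Vec<u64>, // samples that arrived while a flush was in flight
    flushing: bool,     // guard against a concurrent second flush
}

impl StreamBuffer {
    pub fn push(&mut self, sample: u64) {
        if self.flushing {
            self.arrivals.push(sample);
        } else {
            self.samples.push(sample);
        }
    }

    /// Starts a flush. Returns None if one is already running or nothing is buffered.
    pub fn begin_flush(&mut self) -> Option<Vec<u64>> {
        if self.flushing || self.samples.is_empty() {
            return None;
        }
        self.flushing = true;
        Some(std::mem::take(&mut self.samples))
    }

    /// Ends a flush. On failure the original batch goes back in front of
    /// whatever arrived in the meantime, so ordering is preserved for retry.
    pub fn end_flush(&mut self, result: Result<(), Vec<u64>>) {
        let mut merged = match result {
            Ok(()) => Vec::new(),
            Err(failed_batch) => failed_batch,
        };
        merged.append(&mut self.arrivals);
        self.samples = merged;
        self.flushing = false;
    }

    pub fn buffered(&self) -> &[u64] {
        &self.samples
    }
}
```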

Device Latest-Sample Projection

device_latest_samples (repo::device_latest_sample) is a durable (device_id, stream_name) → newest sample projection, upserted in batches with a monotonic guard (WHERE EXCLUDED.timestamp_us > device_latest_samples.timestamp_us) so a backfill of older data can never regress the stored "latest." IngestManager keeps its own in-memory latest_samples map (seeded from this projection on boot or on a cold-cache miss) as a fast serving mirror for the facility dashboard's live feed. The projection is the durable source of truth for "newest"; the in-memory cache is a monotonic mirror of it and must preserve the same ordering rather than blindly overwriting on every batch.
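
The in-memory mirror has to apply the same strict "newer wins" rule as the SQL guard. A minimal sketch of that rule, assuming a hypothetical `LatestCache` keyed by `(device_id, stream_name)` with a `u32` device id and a single `f64` value per sample:

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Latest {
    pub timestamp_us: u64,
    pub value: f64,
}

/// Hypothetical monotonic cache; not the IngestManager type itself.
#[derive(Default)]
pub struct LatestCache {
    map: HashMap<(u32, String), Latest>,
}

impl LatestCache {
    /// Stores `sample` only if it is strictly newer than the cached one,
    /// mirroring `EXCLUDED.timestamp_us > device_latest_samples.timestamp_us`.
    /// Returns true when the entry was inserted or advanced.
    pub fn offer(&mut self, device_id: u32, stream: &str, sample: Latest) -> bool {
        match self.map.entry((device_id, stream.to_string())) {
            Entry::Vacant(slot) => {
                slot.insert(sample);
                true
            }
            Entry::Occupied(mut slot) => {
                if sample.timestamp_us > slot.get().timestamp_us {
                    slot.insert(sample);
                    true
                } else {
                    false // backfill of older data: keep the newer value
                }
            }
        }
    }

    pub fn get(&self, device_id: u32, stream: &str) -> Option<Latest> {
        self.map.get(&(device_id, stream.to_string())).copied()
    }
}
```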

Batch Sequence Tracking

The server uses classify_batch_sequence() with forward/backward gap heuristics to distinguish reordering from genuine gaps at u16 wraparound boundaries.
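
One common way to classify a wrapping u16 sequence is to look at the wrapping forward distance from the last accepted value. The sketch below shows that general technique; the half-range window, the `SeqClass` enum, and the function name are assumptions and may not match the heuristics inside classify_batch_sequence().

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum SeqClass {
    InOrder,
    Duplicate,
    /// Number of batches missing between `last` and `incoming`.
    ForwardGap(u16),
    /// How far behind the last accepted sequence the batch is.
    Reordered(u16),
}

/// Assumed window: forward distances up to half the sequence space count
/// as gaps; anything larger is treated as a late (reordered) batch.
const FORWARD_WINDOW: u16 = 0x7FFF;

pub fn classify(last: u16, incoming: u16) -> SeqClass {
    // wrapping_sub makes 65535 -> 0 a forward distance of 1.
    let forward = incoming.wrapping_sub(last);
    match forward {
        0 => SeqClass::Duplicate,
        1 => SeqClass::InOrder,
        d if d <= FORWARD_WINDOW => SeqClass::ForwardGap(d - 1),
        _ => SeqClass::Reordered(last.wrapping_sub(incoming)),
    }
}
```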

Downsampling

Two server-side reduction strategies live in crates/xylolabs-core/src/downsample.rs (C3-ARCH-4):

LTTB (Largest Triangle Three Buckets) — query-time visual downsampling on the raw path. Preserves visual shape while reducing point count; applied in the timeline endpoint before returning points to the frontend.
bucket_by_time / TimeBucketAccumulator (P1023 WP5) — fixed-width time-bucket averaging for wide ranges. Under agg=auto (the default on GET .../streams/{id}/data and GET /devices/{id}/timeseries), a range that would exceed the raw chunk/sample caps folds into per-bucket means (aggregated: true, bucket_us, optional min/max bands) instead of returning 400. Memory is O(buckets), independent of range width.
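
Fixed-width time bucketing can be sketched as follows. This is not the bucket_by_time implementation: `bucket_means` and `Bucket` are hypothetical names, and the sketch takes a slice for brevity, whereas a streaming accumulator would fold samples one at a time to keep memory at O(buckets).

```rust
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bucket {
    pub start_us: u64,
    pub mean: f64,
    pub min: f64,
    pub max: f64,
    pub count: u64,
}

/// Folds (timestamp_us, value) samples into fixed-width buckets.
pub fn bucket_means(samples: &[(u64, f64)], bucket_us: u64) -> Vec<Bucket> {
    assert!(bucket_us > 0, "bucket width must be non-zero");
    // Per bucket start: (sum, min, max, count).
    let mut acc: BTreeMap<u64, (f64, f64, f64, u64)> = BTreeMap::new();
    for &(ts, v) in samples {
        let start = ts - ts % bucket_us;
        let e = acc
            .entry(start)
            .or_insert((0.0, f64::INFINITY, f64::NEG_INFINITY, 0));
        e.0 += v;
        e.1 = e.1.min(v);
        e.2 = e.2.max(v);
        e.3 += 1;
    }
    // BTreeMap iteration yields buckets in ascending time order.
    acc.into_iter()
        .map(|(start_us, (sum, min, max, count))| Bucket {
            start_us,
            mean: sum / count as f64,
            min,
            max,
            count,
        })
        .collect()
}
```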

Both paths are fed by the shared cache-aware chunk service services::chunk_decode::decode_cached (backed by AppState::timeline_chunk_cache; aggregation folds read through it without inserting, and the CPU-bound decode stage is capped by a process-global semaphore). The per-device rollup feed (GET /api/v1/metadata/device-rollups) is a separate single-query SQL aggregate in repo::ingest_session::device_rollups.

Device Timeline Performance

The GET /api/v1/devices/{id}/timeseries endpoint uses a three-phase pipeline to minimize latency:

Phase 1 — single batch DB query: repo::metadata_chunk::list_by_streams(stream_ids[]) fetches all chunks for all stream names in one query, eliminating N+1 per-stream DB roundtrips.
Phase 2 — parallel S3 download: all numeric chunks across all stream names are downloaded in one buffered(32) concurrent pass (C3-PERF-3: downloads are I/O-bound; the CPU-bound decode stage is separately capped by the global chunk_decode::DECODE_PERMITS semaphore). Cache lookups happen first: decoded chunks are cached by S3 key in AppState::timeline_chunk_cache (in-memory mini_moka, 1 h time_to_idle, Arc<Vec<MetadataSample>> values for O(1) arc-clone hits). Decoding is offloaded to spawn_blocking.
Phase 3 — anchor, filter, downsample: per-session clock-anchor correction is applied (device clocks may be ahead of server time), then the requested time range filter, then LTTB downsampling to the requested downsample target.

F32Array, F64Array, and I32Array samples are mean-aggregated to a single F64 value per sample before downsampling. This prevents 2 M-point OOM conditions from multi-axis accelerometer streams.

Frontend Charts

Both dashboards (frontend/ admin and frontend-app/ operator) use uPlot (canvas-based) for all time-series rendering: DeviceTimelineChart, StreamChart (numeric and array paths), and VectorTripletChart. uPlot handles large point counts efficiently on canvas. WaveformPlayer (WaveSurfer), FFT spectrogram, boolean bar charts, and data tables are not migrated and remain as-is.

Device Command Channel & Event Telemetry (SP4 / SP5 / SP9)

Two device-bound control-plane subsystems ride on the existing health POST (report_health):

SP4 command channel (repo::device_command, models::device_command, device_commands table): a per-device one-shot command (today: siren, live_listen, voice — SP4/SP9) delivered via a monotonic nonce/ack handshake. queue() (under a per-device advisory lock) assigns the next command_nonce; the command is embedded in the device's next health response; the device acks via last_cmd_nonce on its following heartbeat; record_ack advances acked_nonce (clamped to command_nonce) and clears pending. Admin endpoints: POST/GET /api/v1/admin/devices/{id}/command.
SP5 event telemetry (repo::device_event, models::device_event, device_events table): smart-sensor event sub-objects embedded in the health body are parsed (parse_health_events, 9 known keys) into rows. One-shot evidence events are de-duplicated by a partial unique index; every-cycle summaries are bounded by the services::device_events_retention background worker (30-min cadence, DEVICE_EVENTS_RETENTION_DAYS). Read endpoints: GET /api/v1/admin/devices/{id}/events{,/latest}.
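
The SP4 nonce/ack handshake reduces to a small state machine. The sketch below models it in memory; `CommandState` is a hypothetical type, nonces are assumed to be `u64`, and the advisory lock and database row are omitted.

```rust
/// Hypothetical in-memory model of one device's command row.
#[derive(Debug, Default)]
pub struct CommandState {
    pub command_nonce: u64,
    pub acked_nonce: u64,
    pub pending: Option<String>,
}

impl CommandState {
    /// Assigns the next nonce to a new one-shot command.
    pub fn queue(&mut self, command: &str) -> u64 {
        self.command_nonce += 1;
        self.pending = Some(command.to_string());
        self.command_nonce
    }

    /// The command (if any) to embed in the next health response.
    pub fn for_health_response(&self) -> Option<(u64, &str)> {
        self.pending.as_deref().map(|c| (self.command_nonce, c))
    }

    /// Applies `last_cmd_nonce` from a heartbeat. The ack is clamped so a
    /// device can never acknowledge a nonce that was not issued.
    pub fn record_ack(&mut self, last_cmd_nonce: u64) {
        let clamped = last_cmd_nonce.min(self.command_nonce);
        if clamped > self.acked_nonce {
            self.acked_nonce = clamped;
        }
        if self.acked_nonce == self.command_nonce {
            self.pending = None;
        }
    }
}
```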

Device Fitment Telemetry, Identity-Collision Flag & Location

Three device-health-adjacent columns round out the fleet-management surface (migrations 20260709000001/20260709000002/20260709130000). uwb_fitment / sd_fitment (0 = unknown, 1 = present, 2 = absent) plus the raw uwb_dev_id DW3000 register and cumulative sd_fs_errors are persisted on devices (latest snapshot) and device_health_history (time series) — fields the board1 firmware always sends in the health POST body that were previously discarded by the DTO's untyped extra-fields map. identity_collision_suspected_at flags a device row the server suspects is receiving interleaved health posts from multiple physical boards sharing one (api_key, dongle_id) identity (detected via repeated uptime inversions inconsistent with a genuine reboot); a flagged device's admin-issued commands are rejected with 409 unless the request explicitly sets confirm_identity_collision: true, and the flag itself is cleared via PATCH /devices/{id} (clear_identity_collision: true) after re-provisioning — every fresh detection and clear is audit-logged. location is a free-text operator-set installation location (e.g. "Building A / Line 3 / Pump 2"), distinct from alias and description, surfaced in the admin fleet list.

Live Audio Streaming Pipeline

A second ingest path runs alongside IngestManager for continuous low-latency audio. Located at crates/xylolabs-server/src/live/manager.rs (LiveAudioManager).

Wire path: WebSocket ingest at /api/v1/live/streams/{stream_key}/ingest (API key + live:ingest) → per-stream tokio::sync::broadcast::Sender<Bytes> (256-slot capacity) → WebSocket listener at /api/v1/live/streams/{id}/listen.ws (JWT or API key + live:listen). Browsers mint a 5-min listener JWT at POST /api/v1/auth/live-token.
PCM decode listener (cycle 3, P1025 WP2; see API reference §25.4): listen.ws?format=pcm runs a per-listener incremental LC3→s16le-mono decode server-side (Lc3StreamDecoder, shared xylolabs-transcode crate) so browsers can play live audio without a WASM codec (LiveAudioPlayer, admin SPA). The decode is CPU-bound on the 2-vCPU box: capped per facility with a small global ceiling (503 past it), each frame decodes on the blocking pool under the background decode-permit pool (services::chunk_decode), and the decoder resets on broadcast lag. format=lc3 passthrough remains uncapped.
Archive: every 60 s, raw LC3 frames are zstd-compressed inside tokio::task::spawn_blocking, written to s3://.../live/{facility}/{stream}/lc3/{start}_{end}.bin.zst, and indexed in live_archive_segments. A 5-minute mini_moka cache for stream_id → facility_id eliminates the per-flush DB roundtrip.
Retention: 30-minute worker prunes segments older than retention_days (default 30) and deletes S3 objects with buffer_unordered(8) parallelism.
Cleanup: 30-minute worker removes broadcast senders that have no subscribers and no recent producer.
Orphan reaper (three mechanisms): (1) server startup closes any live_stream_connections left open by a prior process (mirrors the ingest_sessions cleanup); (2) a periodic 15-minute worker (spawn_orphan_reaper, live/manager.rs, wired at main.rs) sweeps stale open connection rows via close_orphaned_connections for streams that are no longer present in the in-memory broadcast map; (3) orphans are also healed mid-connect: open_ingest catches the partial-unique-index collision left by a crashed ingest, closes the stale row(s) via close_all_open_for_stream (reason "replaced"), and retries the INSERT once (last-wins) so the device reconnects instead of getting a permanent 409. Operator stream-delete closes open rows eagerly (reason "stream_deleted").
Observability: atomic counters (LIVE_ARCHIVE_FLUSH_SUCCESSES/FAILURES, LIVE_ARCHIVE_FLUSH_LAST_US, LIVE_RETENTION_ROWS_PRUNED) surface at GET /api/v1/live/metrics (SuperAdmin) and as a dashboard panel.
Audit: every mutation on live_streams (POST/PATCH/DELETE) is written to audit_log with actor (user or API key) + before/after state.
Validation gate: channels 1–8, sample_rate_hz ∈ {8/16/24/32/48/96 kHz}, bitrate_per_channel_bps 16k–320k, display_name ≤ 200 B, channel_names length matches channels and each ≤ 32 B, control chars rejected, transcode_profile ∈ {default, low, high}, retention_days 1–3650.
Nginx log scrubbing: a dedicated log_format live_scrubbed strips the listener JWT query param from /listen.ws access logs (path-only). Same format applied defensively to ingest.
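
The validation gate listed above maps naturally onto a single checking function. The sketch below is an illustration: `LiveStreamConfig` and `validate` are hypothetical, the bitrate bounds are read as 16,000 to 320,000 bps, and control characters are assumed to be rejected in display_name and channel_names.

```rust
/// Hypothetical request shape for creating or updating a live stream.
pub struct LiveStreamConfig<'a> {
    pub channels: u8,
    pub sample_rate_hz: u32,
    pub bitrate_per_channel_bps: u32,
    pub display_name: &'a str,
    pub channel_names: &'a [&'a str],
    pub transcode_profile: &'a str,
    pub retention_days: u32,
}

fn has_control(s: &str) -> bool {
    s.chars().any(|c| c.is_control())
}

/// Returns the name of the first offending field, if any.
pub fn validate(c: &LiveStreamConfig<'_>) -> Result<(), &'static str> {
    if !(1..=8).contains(&c.channels) {
        return Err("channels");
    }
    if ![8_000, 16_000, 24_000, 32_000, 48_000, 96_000].contains(&c.sample_rate_hz) {
        return Err("sample_rate_hz");
    }
    if !(16_000..=320_000).contains(&c.bitrate_per_channel_bps) {
        return Err("bitrate_per_channel_bps");
    }
    // Limits are in bytes, so str::len() (UTF-8 byte length) is the right measure.
    if c.display_name.len() > 200 || has_control(c.display_name) {
        return Err("display_name");
    }
    if c.channel_names.len() != usize::from(c.channels)
        || c.channel_names.iter().any(|n| n.len() > 32 || has_control(n))
    {
        return Err("channel_names");
    }
    if !["default", "low", "high"].contains(&c.transcode_profile) {
        return Err("transcode_profile");
    }
    if !(1..=3650).contains(&c.retention_days) {
        return Err("retention_days");
    }
    Ok(())
}
```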

Database tables: live_streams, live_stream_connections, live_archive_segments (migrations 20260522130000*).

UWB Self-Localization Subsystem

A device-fed anchor-localization path lets a fleet of UWB-equipped boards discover their own 3D positions without a manual site survey.

Edge ingest (routes/uwb_ingest.rs): the board firmware POSTs the range edges it measured in a self-localization survey to POST /api/v1/ingest/uwb-edges?device_id=<u32> (device X-Api-Key + ingest scope, facility resolved from the key context, 2 MB body limit). Edges arrive as a bare JSON array and are grouped into a survey by arrival-time window (the firmware sends no survey id). An edge whose node index would exceed MAX_NODES is dropped rather than persisted, so it cannot poison a later solve.
Solver (xylolabs-core/src/localization.rs): iterative 3D multilateration (localize_3d / assemble_mesh). It clamps the problem to MAX_NODES (512, bounds the O(n²) memory) and drops any edge whose dist_m is non-finite, ≤ 0, or > MAX_RANGE_M (1e6 m), and sorts with NaN-safe f64::total_cmp — closing the earlier NaN-sort DoS. The math is CPU-bound and O(n³), so both HTTP solve entry points run it inside tokio::task::spawn_blocking to avoid starving the async runtime on the 2-vCPU box.
Solve endpoints (routes/localization.rs): POST /api/v1/localization/solve (Role::User, 4 MB body cap) solves an ad-hoc mesh supplied in the request; POST /api/v1/localization/surveys/{id}/solve solves a previously-ingested survey.
Tables: uwb_surveys, uwb_survey_edges, uwb_survey_solutions. Old surveys are pruned by services/uwb_surveys_retention.rs (default 90-day horizon), which cascades to uwb_survey_edges and uwb_survey_solutions via ON DELETE CASCADE; see Background Retention Workers.
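
The solver's input hardening can be sketched as a filter followed by a NaN-safe sort. The constants come from the text above; the `Edge` type, the assumption that valid node indices are 0..MAX_NODES, and the choice of distance as the sort key are illustrative.

```rust
pub const MAX_NODES: usize = 512;
pub const MAX_RANGE_M: f64 = 1e6;

/// Hypothetical range edge between two node indices.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Edge {
    pub a: usize,
    pub b: usize,
    pub dist_m: f64,
}

/// Drops unusable edges, then sorts the rest by distance.
pub fn sanitize_edges(edges: &[Edge]) -> Vec<Edge> {
    let mut kept: Vec<Edge> = edges
        .iter()
        .copied()
        // Node indices outside the clamped problem size are dropped.
        .filter(|e| e.a < MAX_NODES && e.b < MAX_NODES)
        // NaN and infinities fail is_finite(); zero and negatives fail > 0.0.
        .filter(|e| e.dist_m.is_finite() && e.dist_m > 0.0 && e.dist_m <= MAX_RANGE_M)
        .collect();
    // total_cmp is a total order on f64, so the comparator can never panic
    // or behave inconsistently, even if a NaN slipped through.
    kept.sort_by(|x, y| x.dist_m.total_cmp(&y.dist_m));
    kept
}
```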

Auth Architecture

Dual Auth Model

Path	Auth method	Details
`/api/v1/ingest/*`	API Key	`X-Api-Key` header, scoped to facility
`/api/v1/auth/*`	Public	Rate-limited (100 req/min per IP)
All other `/api/v1/*`	JWT	Bearer token, RBAC-enforced

RBAC Hierarchy

super_admin -- global access, manages all facilities
facility_admin -- manages own facility's data and users (read + write)
user -- read-only access to own facility's data

All JWT-authenticated endpoints enforce require_role() explicitly. Read endpoints require at least Role::User; mutating endpoints (create, update, delete, acknowledge, resolve, retranscode) require at least Role::FacilityAdmin.
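
The role hierarchy can be expressed as an ordered enum, so "at least Role::User" becomes a plain comparison. This is a sketch under assumptions: the signatures of `require_role` and `can_access_facility` below are hypothetical and do not reproduce the server's middleware.

```rust
/// Variants are declared from least to most privileged so that the derived
/// ordering matches the hierarchy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Role {
    User,
    FacilityAdmin,
    SuperAdmin,
}

/// Hypothetical check: passes when `actual` is at least `minimum`.
pub fn require_role(actual: Role, minimum: Role) -> Result<(), &'static str> {
    if actual >= minimum {
        Ok(())
    } else {
        Err("forbidden")
    }
}

/// super_admin has global access; other roles are limited to their own facility.
pub fn can_access_facility(role: Role, own_facility: u32, target_facility: u32) -> bool {
    role == Role::SuperAdmin || own_facility == target_facility
}
```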

Rate Limiting

Auth endpoints use in-memory rate limiting via mini_moka::sync::Cache with 60-second TTL windows. Max 100 attempts per IP per minute. Trusted proxy IPs can be configured for correct X-Forwarded-For extraction.
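
A fixed 60-second window per IP can be sketched with a plain map. Note the substitution: the server uses mini_moka::sync::Cache with a TTL, while this illustration uses std::collections::HashMap and an injected clock so it stays self-contained; `FixedWindowLimiter` is a hypothetical name.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Hypothetical fixed-window limiter (HashMap stand-in for a TTL cache).
pub struct FixedWindowLimiter {
    max: u32,
    window_secs: u64,
    // Per IP: (window start in seconds, attempts seen in that window).
    hits: HashMap<IpAddr, (u64, u32)>,
}

impl FixedWindowLimiter {
    pub fn new(max: u32, window_secs: u64) -> Self {
        Self { max, window_secs, hits: HashMap::new() }
    }

    /// Returns true if the request is allowed. `now_secs` is passed in so
    /// the behaviour is deterministic and easy to test.
    pub fn allow(&mut self, ip: IpAddr, now_secs: u64) -> bool {
        let entry = self.hits.entry(ip).or_insert((now_secs, 0));
        if now_secs.saturating_sub(entry.0) >= self.window_secs {
            *entry = (now_secs, 0); // window expired: start a new one
        }
        if entry.1 >= self.max {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```

Unlike a TTL cache, this map never evicts idle entries; a production limiter needs eviction to bound memory.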

Inference Pipeline Data Flow

External ML clients (GPU servers, cloud functions, or edge workers) fetch closed sessions, run local models, and post results back to the platform.


Authentication: all /api/internal/* endpoints require X-Api-Key with the internal scope. Every key is facility-scoped; cross-facility access is impossible.

Result submission (POST /api/internal/inference/results): the client submits a list of events, each carrying anomaly_type, severity, title, and optional confidence/details/time window. The handler creates one anomaly_reports row per event and broadcasts each as an AnomalyEvent on the facility's broadcast channel.

Batch submission (POST /api/internal/inference/results/batch): up to 100 result objects in a single request. Entries are processed independently; partial failures are reported per-entry without rolling back successes.

Report lifecycle:

- Created with is_resolved: false.
- POST /api/internal/anomaly/reports/{id}/resolve sets is_resolved: true and resolved_at.
- PATCH /api/internal/anomaly/reports/{id} reclassifies anomaly_type, severity, confidence, or description for post-hoc human correction.

Inference bundle (GET /api/internal/sessions/{id}/inference-bundle): aggregates all numeric streams (LTTB-downsampled) and audio stream headers for a session in a single response, minimising round trips for the common fetch-everything pattern.

Internal API & Inference Pipeline

The Internal API (/api/internal/*) provides machine-to-machine endpoints for GPU server management, inference model/job/proxy handling, and anomaly detection. Authentication is via API keys carrying the internal scope and a fixed facility_id; every response is scoped to that facility. See API Reference §24 for the full endpoint list and DTOs.

GPU Server State Model

Each gpu_servers row carries two orthogonal flags:

status (online / offline / draining / error) gates job scheduling and proxy selection.
health_status (healthy / degraded / unknown) is the observability signal driven by gpu_health_checker.

Direct status overrides via PATCH /gpu-servers/{id} bypass the health-checker state machine and are recorded in the audit log.

GPU Health Checker

crates/xylolabs-server/src/services/gpu_health_checker.rs. Runs on GPU_HEALTH_CHECK_INTERVAL_SECS (default 60, minimum 10). Selects every gpu_servers row with status='online' and probes it via GET http://<ip>:<port>/health with a buffer_unordered(10) concurrency cap. Successful responses are parsed (≤64 KB) and persisted via update_health. Persistent non-success statuses or network errors call mark_server_error; HTTP 429/503 are treated as transient and ignored. All error bodies are streamed under a 10 KB cap to prevent OOM from compromised upstreams.

Inference Worker

crates/xylolabs-server/src/services/inference_worker.rs. The bootstrap spawns INFERENCE_WORKER_CONCURRENCY (default 4) independent tokio tasks. Each task polls inference_jobs for facilities with queued work and claims a single job per loop iteration via repo::inference_job::claim_next_queued (SELECT ... FOR UPDATE SKIP LOCKED). Empty-queue backoff is exponential (2 → 30 s) and resets on a successful claim. Once claimed:

Resolve the target GPU (the job's pinned gpu_server_id, otherwise gpu_server::find_first_available).
Transition the job to running only if mark_running returns Some — if the job was cancelled between claim and start, the worker drops it without contacting the GPU.
POST /v1/inference to the GPU. Read the response body through a streaming bounded reader (1 MB cap) to bound memory exposure to a compromised GPU. Persist the parsed result via mark_completed.
On HTTP 429/503, mark the job failed but do not take the GPU offline. On any other failure, mark both the job and the GPU server error; failure messages are truncated to 256 characters before storage.
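
Three small pieces of the worker loop lend themselves to a sketch: the empty-queue backoff, the transient-status check, and message truncation. The doubling factor is an assumption (the text only says the backoff is exponential from 2 s to 30 s), and all names below are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical empty-queue backoff: 2 s, doubling, capped at 30 s.
pub struct Backoff {
    current: Duration,
}

impl Backoff {
    const MIN: Duration = Duration::from_secs(2);
    const MAX: Duration = Duration::from_secs(30);

    pub fn new() -> Self {
        Self { current: Self::MIN }
    }

    /// Delay to sleep after an empty poll; grows on each call.
    pub fn next_delay(&mut self) -> Duration {
        let delay = self.current;
        self.current = (self.current * 2).min(Self::MAX);
        delay
    }

    /// Called after a successful claim.
    pub fn reset(&mut self) {
        self.current = Self::MIN;
    }
}

/// 429 and 503 fail the job but leave the GPU server in rotation.
pub fn is_transient(status: u16) -> bool {
    matches!(status, 429 | 503)
}

/// Truncates on character boundaries, so multi-byte text is never split.
pub fn truncate_message(msg: &str) -> String {
    msg.chars().take(256).collect()
}
```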

Inference Proxy

POST /api/internal/proxy/inference (routes/inference_proxy.rs) is the synchronous, low-latency path. The handler resolves (model_name, model_version) to an active model row, picks a GPU via find_first_available, and forwards the request with a 30 s timeout. The upstream response is read with the same 1 MB streaming bound; oversize responses fail closed with 400 Bad Request. Persistent upstream errors mark the GPU server with last_error so the health checker can rotate it out.

Real-time Anomaly Detection

IngestManager::process_batch performs inline anomaly detection. For every metadata sample, sensor values are checked against configurable thresholds. When a threshold trips:

An anomaly_reports row is created with source='realtime', the offending stream, value, threshold, and timestamp.
An AnomalyEvent is broadcast on a tokio::sync::broadcast channel sized by ANOMALY_BROADCAST_CAPACITY (default 10000). The GET /api/internal/anomaly/live SSE endpoint forwards events filtered by the subscriber's facility_id and emits an event: lag notice when the channel overflows.

The hook runs synchronously inside the ingest path, so threshold checks must remain O(1). The retrospective batch path (POST /api/internal/anomaly/batch) creates a placeholder report and broadcasts a batch_analysis event; automated batch analysis itself is not yet wired up.

Anomaly Threshold Registry & Overrides

The "configurable thresholds" above are not hardcoded: threshold_registry.rs holds the in-code registry of anomaly-detection keys (defaults, unit, [min, max] bounds), and services/threshold_resolver.rs (ThresholdCache) resolves each key through a device → facility → hardware_version → global → registry default hierarchy from DB-backed override rows (repo::threshold_override, admin CRUD at routes/inference_ops.rs). The cache refreshes on a 30 s TTL and on explicit invalidate() after an admin write; its snapshot content hash doubles as the config_version served with the GPU-fleet rubric (services/rubric.rs) so an unchanged override set yields a cheap 304. The ingest detector and the inference-result overheating guard resolve per-device (folding in hardware_version); the facility admin view's effective value and the rubric preview resolve facility-wide only, with no device in context — see API §29 for the full contract.
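
The resolution order (device → facility → hardware_version → global → registry default) is a first-match chain. A minimal sketch, assuming each level holds at most one `f64` override for a given key; `Overrides` and `resolve` are hypothetical names, not the ThresholdCache API.

```rust
/// Hypothetical override values for one threshold key, one slot per level.
#[derive(Debug, Default, Clone, Copy)]
pub struct Overrides {
    pub device: Option<f64>,
    pub facility: Option<f64>,
    pub hardware_version: Option<f64>,
    pub global: Option<f64>,
}

/// Most specific level wins; the registry default is the final fallback.
pub fn resolve(o: &Overrides, registry_default: f64) -> f64 {
    o.device
        .or(o.facility)
        .or(o.hardware_version)
        .or(o.global)
        .unwrap_or(registry_default)
}

/// Facility-wide resolution has no device in context, so the device and
/// hardware_version levels are skipped.
pub fn resolve_facility_wide(o: &Overrides, registry_default: f64) -> f64 {
    o.facility.or(o.global).unwrap_or(registry_default)
}
```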

Naming note: four modules share registry/config/key/scope/version vocabulary but are unrelated systems — config.rs (static startup config read from env vars), config_manager.rs (the runtime-mutable system_config DB table, hot-reloaded via NOTIFY config_changed, admin-editable without a restart), config_registry.rs (the fixed per-device NVS config-key schema — duty + smart-sensor keys — that drives per-device remote-config PUT validation), and threshold_registry.rs (the fixed anomaly-threshold key schema above). Two unrelated things are also both called config_version: the per-device remote-config version (an integer bumped in device_config per admin PUT, exposed by routes/device_config.rs) and the threshold resolver's snapshot content hash (exposed as the rubric's config_version/ETag by services/rubric.rs) — they share nothing beyond the name.

Detection & Derivation Workers

Besides the inline real-time hook, several gated background workers run on fixed cadences (each downloads audio chunks and decodes them off the async runtime via spawn_blocking):

services/noise_level.rs — derives a per-device noise-level (dBFS) timeseries from session audio. Per-pass chunk count is bounded by MAX_CHUNKS_PER_STREAM.
services/acoustic_detector.rs (Phase 3, gated by ACOUSTIC_DETECTOR_ENABLED) — acoustic predictive-maintenance detector.
services/fusion_detector.rs (Phase 2, gated by FUSION_DETECTOR_ENABLED) — multi-sensor fusion anomaly detector (Mahalanobis distance).
services/detector_common.rs — shared watermark-advance helpers (contiguous-prefix rule, no-leapfrog invariant) reused by both acoustic_detector and fusion_detector; pure functions with unit coverage and no DB dependency.
services/fingerprint_worker.rs (Phase 7) — fills audio_fingerprints for search-by-sound.
services/voice.rs (Phase 12, gated by VOICE_ESCALATION_ENABLED) — LLM voice-call escalation via Twilio.

Watermark cursor pattern (AGG-C5, cycle 5): the codebase uses two distinct watermark primitives depending on cursor granularity. worker_watermarks (a single global row per worker, keyed by worker name) is for global single-cursor workers — services/timeline_rollup_worker.rs (below) folds metadata_chunks into timeline_rollups (300 s) AND timeline_rollups_fine (60 s) across every device from one process-wide cursor. Per-device watermark state rows (device_feature_baselines.last_session_us / last_acoustic_us) are for per-device workers — acoustic_detector and fusion_detector must track progress independently per device to preserve the contiguous-prefix / no-leapfrog invariant (services/detector_common.rs) on a per-device basis. New background workers should pick the primitive by cursor granularity: one process-wide cursor → worker_watermarks; one cursor per entity → a per-entity watermark column/table.
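
One reading of the contiguous-prefix / no-leapfrog rule is that a watermark may only advance across an unbroken run of successfully processed items, stopping at the first failure so that item is retried on the next pass. The sketch below illustrates that interpretation; the function and its input shape are hypothetical and are not taken from services/detector_common.rs.

```rust
/// `items` must be sorted by end timestamp. Each entry is
/// (end_timestamp_us, processed_ok) for one unit of work in this pass.
pub fn advance_watermark(current_us: u64, items: &[(u64, bool)]) -> u64 {
    let mut watermark = current_us;
    for &(end_us, ok) in items {
        if !ok {
            // Stop at the first failure: advancing past it would leapfrog
            // work that still has to be retried.
            break;
        }
        watermark = watermark.max(end_us);
    }
    watermark
}
```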

Background Retention Workers

Eight high-volume tables are bounded by dedicated retention workers spawned at boot (each with an owned handle that is aborted+joined on graceful shutdown). Workers are staggered with per-worker startup offsets (C4-PERF-1 + C5-PERF-1) to avoid a thundering herd at the 30-min mark, and DELETEs are batched (C4-PERF-2) with a 50ms inter-batch pause (C5-PERF-2) to prevent table-lock and pool-pin on backlog drain.

Each retention worker used to hand-roll its own ~60-line loop (disable-on-zero guard, startup-offset sleep, interval ticking, first-tick consume, Ok(0)/Ok(n)/Err prune-logging match). services/periodic.rs now folds that skeleton into two generic primitives: spawn_periodic (the staggered-tick loop, routed through services::supervision::spawn_supervised) and spawn_retention (adds the disable-on-zero guard and the cutoff/prune-logging match on top, taking each worker's own delete_*_older_than call as a closure). Each *_retention.rs module is now a thin wrapper: it keeps its own STARTUP_OFFSET_SECS const and exposes a spawn(pool, retention_days, health) -> JoinHandle<()> that just calls periodic::spawn_retention with its name, offset, and prune closure.

periodic::retention_worker_registry() is the single source of truth for all eight workers — name, startup offset, a retention_days accessor into AppConfig, and the module's spawn function pointer. main.rs's boot sequence loops over this table to perform the actual spawn (the registry IS the spawn list, closing the AGG-C3-29 gap where a registry entry could silently drift from what boot actually spawned), collecting each returned handle into a retention_handles vec that is drained by the same abort+join shutdown path as every other boot-spawned worker. There is no longer a per-worker cooperative shutdown token — spawn_periodic dropped that parameter (WP10) since no production caller ever wired it to a real, externally-held token; the workers are still drained at shutdown, just via main.rs's abort-based registry.
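
The retention skeleton boils down to a cutoff computation with a disable-on-zero guard and a batched delete loop driven by a closure. The sketch below is synchronous for self-containment (the real workers are async tokio tasks), and `cutoff_us`, `prune_in_batches`, and the batch-size parameter are hypothetical.

```rust
use std::time::Duration;

/// Returns the prune cutoff, or None when retention is disabled (0 days).
pub fn cutoff_us(now_us: u64, retention_days: u32) -> Option<u64> {
    if retention_days == 0 {
        return None;
    }
    let horizon_us = u64::from(retention_days) * 86_400 * 1_000_000;
    Some(now_us.saturating_sub(horizon_us))
}

/// Calls `delete_batch(limit)` until a short batch signals the backlog is
/// drained, pausing between batches. Returns the total rows deleted.
pub fn prune_in_batches<F>(
    mut delete_batch: F,
    batch_size: u64,
    pause: Duration,
) -> Result<u64, String>
where
    F: FnMut(u64) -> Result<u64, String>,
{
    let mut total = 0;
    loop {
        let deleted = delete_batch(batch_size)?;
        total += deleted;
        if deleted < batch_size {
            return Ok(total);
        }
        // Blocking sleep keeps the sketch std-only; an async worker would
        // await a timer here instead.
        std::thread::sleep(pause);
    }
}
```

Passing the delete call as a closure is what lets each *_retention.rs module stay a thin wrapper around one shared loop.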

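The eight workers (name, table, cadence, default horizon), listed together with the request-log middleware that writes the rows api_request_log_retention prunes: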

services/device_events_retention.rs — prunes device_events on a 30-minute cadence (DEVICE_EVENTS_RETENTION_DAYS, default 30).
middleware/api_request_log.rs — Tower middleware layer that captures every inbound HTTP request (method, path, status, latency, request body up to a configurable size cap) and writes a row to api_request_logs. This is the write (capture) side of the request-log subsystem.
services/api_request_log_retention.rs — prunes api_request_logs (which grows at ~4.3M rows/day with multi-KiB payloads) on a 30-minute cadence (default 30-day horizon). Added C18-AGG-8 to prevent unbounded disk growth on the production t4g.medium host. This is the retention (prune) side paired with the middleware above.
services/audit_log_retention.rs — prunes audit_log (append-only BIGSERIAL, default 365-day horizon for compliance; S8-2).
services/anomaly_report_retention.rs — prunes anomaly_reports (~26.5k rows/24h, default 90-day horizon; S8-1).
services/ingest_sessions_retention.rs — prunes closed ingest_sessions (cascades to metadata_chunks via ON DELETE CASCADE, default 90-day horizon; S8-10).
services/device_health_history_retention.rs — prunes device_health_history (every health POST writes a row, default 90-day horizon; S8-10).
services/live_stream_connections_retention.rs — prunes CLOSED live_stream_connections rows (default 90-day horizon; S8-10).
services/uwb_surveys_retention.rs — prunes uwb_surveys (cascades to uwb_survey_edges + uwb_survey_solutions via ON DELETE CASCADE, default 90-day horizon; C4-DEF-4). Surveys accumulate at the firmware's ~5-min cadence × facility count.

C5-PERF-1 also added startup offsets to the three live-stream background tasks (spawn_retention_task = 210s, spawn_cleanup_task = 240s, spawn_orphan_reaper = 15s) so they no longer collide with the offset-0 anomaly_report worker on the 30-min boundary. These three (plus the rollup/detector/noise/fingerprint workers, not yet converted to spawn_periodic) are still listed in periodic::worker_registry() alongside the eight retention entries so the boot-time assert_startup_offsets_unique() fail-fast and the services::mod stagger tripwire cover the whole fleet.
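
A fail-fast uniqueness check over startup offsets needs only a set. This sketch is illustrative: `find_duplicate_offset` is a hypothetical helper, not the body of assert_startup_offsets_unique().

```rust
use std::collections::HashSet;

/// Returns the first startup offset (in seconds) that appears more than
/// once in the (worker name, offset) table, or None if all are unique.
pub fn find_duplicate_offset(workers: &[(&str, u64)]) -> Option<u64> {
    let mut seen = HashSet::new();
    workers
        .iter()
        .map(|&(_, offset)| offset)
        // HashSet::insert returns false when the value was already present.
        .find(|offset| !seen.insert(*offset))
}
```

A boot-time caller could panic on `Some(offset)` so a colliding stagger is caught before any worker starts.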

The live-archive retention worker is described above under Live Audio Streaming Pipeline.

Worker Supervision and `/health/ready`

The eight retention workers above are spawned via services::periodic::spawn_retention (built on spawn_periodic, which itself calls services::supervision::spawn_supervised — not bare tokio::spawn). Each worker future is wrapped in AssertUnwindSafe(fut).catch_unwind(). On panic the supervisor emits a structured tracing::error! (with the worker name for log aggregation) and bumps a process-wide WorkerHealth.degraded counter (Arc<AtomicU64>, relaxed ordering — observability only). The counter is reflected in /health/ready's 200-response body as {"workers":{"degraded":N}} WITHOUT changing the HTTP status — a panicked retention worker is not a request-serving failure, so the load-balancer contract is preserved. Operator dashboards alert on degraded > 0 and the pod is not taken out of rotation. The supervisor returns the actual JoinHandle (single-task catch_unwind, not a two-layer spawn), so main.rs's shutdown contract (handle.abort() + timeout(5s, handle)) works unchanged. The server crate is built with panic = "unwind" (workspace Cargo.toml), so catch_unwind is operative. See services/supervision.rs for the design rationale (the C4-DEF-6 sub 4 "silently-panicked worker" risk) and services/periodic.rs for the shared staggered-driver rationale (ARCH-3/ARCH-5, AGG-C3-29/WP11, AGG-C3-30/WP10).

The three live-streaming background tasks (retention, broadcast-map cleanup, orphan-reaper) are also spawned via spawn_supervised inside live/manager.rs (as live_retention, live_cleanup, and live_orphan_reaper), so a panic in any of them is caught, logged, and reflected in /health/ready's {"workers":{"degraded":N}} counter exactly like the retention workers above. The C4-DEF-6 sub 4 retrofit that once tracked them as a residual (bare tokio::spawn) is complete.
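
The supervision idea (catch the panic, log it, bump a counter, keep serving) can be shown with the standard library's synchronous catch_unwind. This is an analogue only: the server wraps futures and returns a JoinHandle, whereas `run_supervised` below runs a plain closure, and the `WorkerHealth` shape is inferred from the description above.

```rust
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Process-wide degraded counter, shared by cloning.
#[derive(Clone, Default)]
pub struct WorkerHealth {
    degraded: Arc<AtomicU64>,
}

impl WorkerHealth {
    pub fn degraded(&self) -> u64 {
        // Relaxed is enough: the counter is observability-only.
        self.degraded.load(Ordering::Relaxed)
    }
}

/// Runs `work`; a panic is caught, logged with the worker name, and
/// counted instead of unwinding further.
pub fn run_supervised<F: FnOnce()>(name: &str, health: &WorkerHealth, work: F) {
    if catch_unwind(AssertUnwindSafe(work)).is_err() {
        eprintln!("worker {name} panicked");
        health.degraded.fetch_add(1, Ordering::Relaxed);
    }
}

/// Body fragment reported by the readiness endpoint; the HTTP status is
/// unaffected by the counter.
pub fn ready_body(health: &WorkerHealth) -> String {
    format!("{{\"workers\":{{\"degraded\":{}}}}}", health.degraded())
}
```

As with the real supervisor, this only works when the binary is built with panic = "unwind".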

Other Boot-Spawned Workers

Additional workers spawned at boot (each with an owned handle aborted+joined on graceful shutdown) that are not covered by the sections above:

services/alert_trigger.rs — the anomaly-to-alert bridge worker (described in the next section).
services/gpu_health_checker.rs — periodic GPU health probe; flips GPU server status to unhealthy after consecutive probe failures.
Inference reaper (main.rs) — reclaims stale inference_jobs rows whose claimed worker_id has not heartbeated within INFERENCE_STALE_TIMEOUT_SECS, returning them to the queue.
Transcode worker (xylolabs-transcode::worker) — polls transcode_jobs on a fixed cadence and runs FFmpeg, gated by TRANSCODE_STALE_TIMEOUT_SECS for stale-job reaping.
Config listener (config_listener_handle at main.rs:113) — watches the runtime-config reload channel (set by config_manager.rs) and re-applies updated values to live subsystems.
Ingest manager flusher + watchdog (flusher_handle + watchdog_handle at main.rs:246,250) — periodic flush of buffered metadata chunks to Postgres and stale-session reaper for the ingest/manager.rs engine.
Facility live pruner (facility_live_pruner_handle at main.rs:258) — per-facility live-stream housekeeping (in addition to the three live tasks spawned by live/manager.rs).
services/timeline_rollup_worker.rs (AGG-C4-2, dual-grain fold added AGG-C6-TL2) — folds each decoded sample into BOTH 300 s timeline_rollups buckets AND 60 s timeline_rollups_fine buckets in the same pass (one decode, one global worker_watermarks cursor), serving wide-window agg=auto GET /devices/{id}/timeseries requests without re-scanning raw chunks; also prunes both rollup tables past their respective retention horizons (the fine grain on a much shorter horizon than the coarse grain — see the fine-rollup section below).
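The dual-grain fold performed by the timeline rollup worker can be sketched as follows. This is a minimal in-memory illustration for a single device and stream, not the worker's actual code: the real tables also key on device and stream, and persistence, the worker_watermarks cursor, and pruning are omitted. The type and function names (Rollup, fold_samples, bucket_start_us as a function) are assumptions made for the example; only the 300 s and 60 s bucket widths and the aggregate columns come from the text.

```rust
use std::collections::BTreeMap;

pub const COARSE_BUCKET_US: i64 = 300 * 1_000_000; // 300 s buckets
pub const FINE_BUCKET_US: i64 = 60 * 1_000_000; // 60 s buckets

/// Aggregates kept per bucket, mirroring the rollup payload columns.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rollup {
    pub sample_count: u64,
    pub value_sum: f64,
    pub value_min: f64,
    pub value_max: f64,
}

impl Rollup {
    fn first(value: f64) -> Self {
        Rollup { sample_count: 1, value_sum: value, value_min: value, value_max: value }
    }

    fn fold(&mut self, value: f64) {
        self.sample_count += 1;
        self.value_sum += value;
        self.value_min = self.value_min.min(value);
        self.value_max = self.value_max.max(value);
    }
}

pub type Buckets = BTreeMap<i64, Rollup>;

/// Floors a microsecond timestamp to the start of its bucket.
pub fn bucket_start_us(ts_us: i64, width_us: i64) -> i64 {
    ts_us.div_euclid(width_us) * width_us
}

fn fold_into(buckets: &mut Buckets, bucket_start: i64, value: f64) {
    buckets
        .entry(bucket_start)
        .and_modify(|r| r.fold(value))
        .or_insert_with(|| Rollup::first(value));
}

/// Folds every sample into both grains in a single pass over the input.
/// Returns (coarse, fine) bucket maps keyed by bucket start.
pub fn fold_samples(samples: &[(i64, f64)]) -> (Buckets, Buckets) {
    let mut coarse = Buckets::new();
    let mut fine = Buckets::new();
    for &(ts_us, value) in samples {
        fold_into(&mut coarse, bucket_start_us(ts_us, COARSE_BUCKET_US), value);
        fold_into(&mut fine, bucket_start_us(ts_us, FINE_BUCKET_US), value);
    }
    (coarse, fine)
}
```

Since 300 is a multiple of 60, every fine bucket falls entirely inside one coarse bucket, so the sample counts of the five fine buckets under a coarse bucket add up to that coarse bucket's count.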

Timeline Rollup Grains & Indexing

timeline_rollups (300 s buckets, 90-day retention) and timeline_rollups_fine (60 s buckets, 14-day retention) are folded from the SAME decode pass over metadata_chunks (AGG-C6-TL2). The fine grid nests evenly inside the coarse grid (a 300 s bucket is the exact union of five 60 s buckets), so agg=auto can pick the tightest grain that satisfies a requested window without a second raw-chunk scan. Both tables share the same serving-index shape: (device_id, bucket_start_us) with an INCLUDE payload (stream_name, sample_count, value_sum, value_min, value_max), so a windowed query is an index-only range seek rather than a per-row heap fetch. The fine table's covering index (migration 20260715120000) was added after the coarse table's cycle-5 index, once fine-grain query volume made the heap-fetch cost of the missing INCLUDE payload measurable; the fine table serves windows up to 3 days and is 5x denser than the coarse one.
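As a rough illustration of how a grain could be chosen for agg=auto, the sketch below picks the fine grain when the requested window is at most 3 days long and still starts inside the fine table's 14-day retention, and falls back to the coarse grain otherwise. The way these two limits combine is an assumption, as are the names Grain and pick_grain; the server's actual selection logic (including any raw-chunk path for very short windows) is not shown in this document.

```rust
const DAY_US: i64 = 24 * 3600 * 1_000_000;

/// Longest window served from the fine grain (3 days, per the text).
pub const FINE_MAX_WINDOW_US: i64 = 3 * DAY_US;
/// Retention horizon of the fine table (14 days, per the text).
pub const FINE_RETENTION_US: i64 = 14 * DAY_US;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Grain {
    Fine60s,
    Coarse300s,
}

impl Grain {
    /// Bucket width of this grain in microseconds.
    pub fn bucket_us(self) -> i64 {
        match self {
            Grain::Fine60s => 60 * 1_000_000,
            Grain::Coarse300s => 300 * 1_000_000,
        }
    }
}

/// Hypothetical selection rule: prefer the tightest grain that can cover
/// the whole window [start_us, end_us) given the current time `now_us`.
pub fn pick_grain(start_us: i64, end_us: i64, now_us: i64) -> Grain {
    let short_enough = end_us - start_us <= FINE_MAX_WINDOW_US;
    let still_retained = start_us >= now_us - FINE_RETENTION_US;
    if short_enough && still_retained {
        Grain::Fine60s
    } else {
        Grain::Coarse300s
    }
}
```

Under this rule, a one-day window ending now resolves to the 60 s grain, while a one-day window from three weeks ago resolves to the 300 s grain because the fine rows for that period would already have been pruned.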

Anomaly-to-Alert Bridge

When an anomaly report is created, the system evaluates whether it matches any configured alert rule (threshold, duration, severity). If a rule matches, the user-facing alert pipeline (/api/v1/alerts) is invoked, creating an alert record and dispatching notifications (email, SMS, push) per the facility's alert configuration. This bridges the internal anomaly subsystem with the user-facing alerting system.

Native Push (FCM) & Notification Locale

Besides browser Web Push (VAPID, services/push.rs), native Android installations register an FCM token via POST /api/v1/push/tokens (repo::device_push_token, device_push_tokens table). services/fcm.rs sends through FCM HTTP v1: a Google service-account JWT is exchanged for a short-lived OAuth2 access token (cached in-process), and messages are POSTed to messages:send; a token that FCM reports as gone is pruned by the dispatch loop. A registration token identifies one app installation, so the latest authenticated registration atomically claims it, and a delayed logout from a prior user cannot delete a token that another user has since claimed. Each token (and each Web Push subscription) carries its own locale ("ko" | "en", default "ko", normalized by services::alert_text::normalize_locale), so server-composed push copy renders in the recipient's own UI language rather than a single facility-wide language.
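The token-ownership rule can be made concrete with a small in-memory sketch. It is an illustration only: the real state lives in the device_push_tokens table behind repo::device_push_token, and the names PushTokens, TokenRow, and normalize_locale_sketch are hypothetical. In particular, normalize_locale_sketch only assumes that "en" (ignoring case and surrounding whitespace) maps to "en" and everything else falls back to "ko"; the actual services::alert_text::normalize_locale may accept more inputs.

```rust
use std::collections::HashMap;

/// Hypothetical locale normalizer: "en" stays "en", anything else is "ko".
pub fn normalize_locale_sketch(raw: Option<&str>) -> &'static str {
    match raw.map(|s| s.trim().to_ascii_lowercase()).as_deref() {
        Some("en") => "en",
        _ => "ko",
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenRow {
    pub user_id: u64,
    pub locale: &'static str,
}

/// In-memory stand-in for the token table, keyed by registration token.
#[derive(Default)]
pub struct PushTokens {
    rows: HashMap<String, TokenRow>,
}

impl PushTokens {
    /// The latest authenticated registration claims the token outright,
    /// replacing whichever user held it before.
    pub fn register(&mut self, token: &str, user_id: u64, locale: Option<&str>) {
        let row = TokenRow { user_id, locale: normalize_locale_sketch(locale) };
        self.rows.insert(token.to_owned(), row);
    }

    /// Logout removes the token only if this user still owns it, so a
    /// delayed logout cannot delete a token someone else has since claimed.
    pub fn logout(&mut self, token: &str, user_id: u64) -> bool {
        let owned = self.rows.get(token).map_or(false, |row| row.user_id == user_id);
        if owned {
            self.rows.remove(token);
        }
        owned
    }

    pub fn owner(&self, token: &str) -> Option<&TokenRow> {
        self.rows.get(token)
    }
}
```

In a database-backed version, the same guarantee would typically come from an upsert keyed on the token for registration and a delete that filters on both token and user for logout; whether the repository does exactly that is not stated here.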