Rockbox open source high quality audio player as a Music Player Daemon

Add AirPlay crate README

+473
crates/airplay/README.md
# rockbox-airplay: AirPlay PCM Sink

This document traces every hop an audio frame takes from the Rockbox C firmware
through the `rockbox-airplay` Rust crate to an AirPlay (RAOP) receiver.

---

## Table of contents

1. [Overview](#overview)
2. [Layer map](#layer-map)
3. [PCM sink vtable (`pcm-airplay.c`)](#pcm-sink-vtable-pcm-airplayc)
4. [The DMA thread](#the-dma-thread)
5. [FFI boundary](#ffi-boundary)
6. [Session lifecycle (`lib.rs`)](#session-lifecycle-librs)
7. [RTSP handshake (`rtsp.rs`)](#rtsp-handshake-rtsprs)
8. [ALAC encoding (`alac.rs`)](#alac-encoding-alacrs)
9. [RTP audio stream (`rtp.rs`)](#rtp-audio-stream-rtprs)
10. [RTCP synchronisation](#rtcp-synchronisation)
11. [NTP timing responder](#ntp-timing-responder)
12. [Track transitions](#track-transitions)
13. [Configuration](#configuration)
14. [AirPlay 2 probe](#airplay-2-probe)
15. [Gotchas and known limits](#gotchas-and-known-limits)

---

## Overview

The AirPlay sink lets Rockbox stream audio to any RAOP-compatible receiver:
Apple TV, HomePod, AirPort Express, or third-party software such as
[shairport-sync](https://github.com/mikebrady/shairport-sync). It implements
**AirPlay 1 (RAOP)** entirely in pure Rust with no external C libraries.

The protocol stack looks like:

```
RTSP/TCP  ── session negotiation (ANNOUNCE, SETUP, RECORD, TEARDOWN)
RTP/UDP   ── ALAC-encoded audio frames
RTCP/UDP  ── synchronisation (NTP send-report) every ~350 ms
UDP       ── NTP timing response service
```

---

## Layer map

```
┌────────────────────────────────────────────────────────┐
│  Rockbox C firmware (pcm.c, audio thread)              │
│  pcm_play_data() → sink.ops.play()                     │
│  pcm_play_dma_complete_callback() per chunk            │
└───────────────────┬────────────────────────────────────┘
                    │  raw S16LE stereo PCM chunks
┌───────────────────▼────────────────────────────────────┐
│  firmware/target/hosted/pcm-airplay.c                  │
│  sink_dma_start()   → pcm_airplay_connect()            │
│  airplay_thread()   → pcm_airplay_write()              │
│  sink_dma_stop()    → pcm_airplay_stop()               │
└───────────────────┬────────────────────────────────────┘
                    │  extern "C" FFI
┌───────────────────▼────────────────────────────────────┐
│  crates/airplay/src/lib.rs                             │
│  AirPlaySession { sender, rtsp, buf, first_frame }     │
│  pcm_airplay_connect()  - RTSP handshake               │
│  pcm_airplay_write()    - ALAC frame dispatch          │
│  pcm_airplay_stop()     - TEARDOWN + session clear     │
└───────┬───────────────────────┬────────────────────────┘
        │  RTSP/TCP             │  ALAC frames
┌───────▼────────────┐  ┌───────▼──────────────────────┐
│  rtsp.rs           │  │  alac.rs                     │
│  RtspClient        │  │  encode_frame()              │
│  ANNOUNCE / SETUP  │  │  BitWriter                   │
│  RECORD / TEARDOWN │  │  352 S16LE → 1411-byte frame │
└────────────────────┘  └───────┬──────────────────────┘
                                │  encoded frames
                        ┌───────▼──────────────────────┐
                        │  rtp.rs                      │
                        │  RtpSender                   │
                        │  send_audio()    - RTP/UDP   │
                        │  send_sync()     - RTCP      │
                        │  timing_responder() - NTP    │
                        └──────────────────────────────┘
                                │  UDP packets
                        ┌───────▼──────────────────────┐
                        │  AirPlay receiver            │
                        │  (Apple TV, shairport-sync…) │
                        └──────────────────────────────┘
```

---

## PCM sink vtable (`pcm-airplay.c`)

`firmware/target/hosted/pcm-airplay.c` implements `struct pcm_sink` with the
following vtable:

| Op                | Implementation                                                      |
|-------------------|---------------------------------------------------------------------|
| `init`            | `pthread_mutex_init` (recursive)                                    |
| `postinit`        | no-op                                                               |
| `set_freq`        | records `current_sample_rate` from `hw_freq_sampr[freq]`            |
| `lock` / `unlock` | `pthread_mutex_lock/unlock`                                         |
| `play`            | `sink_dma_start`: connects, spawns `airplay_thread`                 |
| `stop`            | `sink_dma_stop`: signals thread, joins, calls `pcm_airplay_stop()`  |

`airplay_pcm_sink` is registered at index `PCM_SINK_AIRPLAY = 2` in the
`sinks[]` array in `firmware/pcm.c`.

---

## The DMA thread

`sink_dma_start(addr, size)` stores the initial PCM pointer/length under the
mutex, then spawns `airplay_thread`. The thread mimics a hardware DMA
interrupt loop:

```
while not stopped:
    1. lock → grab (data, size) → clear pcm_data/pcm_size → unlock
    2. if data: pcm_airplay_write(data, size)
    3. lock → pcm_play_dma_complete_callback(OK, &pcm_data, &pcm_size) → unlock
    4. if no more data: break
    5. pcm_play_dma_status_callback(STARTED)  ← tells audio engine chunk consumed
```

Unlike the FIFO sink, there is **no explicit real-time pacing** in C. Pacing is
handled inside `rtp.rs`: the RTP sender sleeps to maintain the correct
wall-clock transmission rate based on the RTP timestamp increment.

---

## FFI boundary

`crates/airplay/src/lib.rs` exports four `#[no_mangle] extern "C"` functions:

| C symbol               | Rust function          | Purpose                              |
|------------------------|------------------------|--------------------------------------|
| `pcm_airplay_set_host` | `pcm_airplay_set_host` | Store `HOST` + `PORT` atomics/mutex  |
| `pcm_airplay_connect`  | `pcm_airplay_connect`  | Open RTSP + RTP session (idempotent) |
| `pcm_airplay_write`    | `pcm_airplay_write`    | Buffer PCM, encode ALAC, send RTP    |
| `pcm_airplay_stop`     | `pcm_airplay_stop`     | Send TEARDOWN, clear session         |

`HOST` is a `Mutex<Option<String>>` and `PORT` is an `AtomicU16` (default
5000). `SESSION` is a `Mutex<Option<AirPlaySession>>`: the session is
created once and reused across `write` calls for the lifetime of a track.

### Force-link shim

Because `rockbox-airplay` is an `rlib`, its symbols are only included in
`librockbox_cli.a` if something references them. `crates/cli/src/lib.rs`
contains:

```rust
use rockbox_airplay::_link_airplay as _;
```

where `_link_airplay` is a public no-op function in `lib.rs`. This is enough
to pull the entire crate into the link graph.

---

## Session lifecycle (`lib.rs`)

`pcm_airplay_connect()` is called from `sink_dma_start()` at the start of
every track. It is guarded by `SESSION`:

```
if SESSION is already Some → return OK immediately (idempotent)

1. Probe AirPlay 2 (non-fatal; logs and falls through on failure)
2. RtpSender::bind(host, ports)   ← binds three UDP sockets
3. RtspClient::new(host, port)    ← opens TCP connection to receiver
4. rtsp.announce(sdp)             ← sends SDP describing the ALAC stream
5. rtsp.setup(transport)          ← negotiates UDP port numbers
6. rtsp.record()                  ← starts the session
7. sender.send_initial_sync()     ← sends first RTCP sync packet
8. SESSION = Some(AirPlaySession { sender, rtsp, buf: [], first_frame: true })
```

`pcm_airplay_write(data, len)` appends the incoming PCM bytes to `buf`, then
drains complete 352-sample (1408-byte) frames in a loop:

```rust
// Simplified: error handling and the byte-to-sample conversion are omitted.
while buf.len() >= FRAME_SIZE {
    let frame_pcm: Vec<u8> = buf.drain(..FRAME_SIZE).collect();
    let alac_frame = alac::encode_frame(&frame_pcm);
    sender.send_audio(&alac_frame, first_frame);
    first_frame = false;
}
```

`pcm_airplay_stop()` sends RTSP TEARDOWN and sets `SESSION = None`.

---

## RTSP handshake (`rtsp.rs`)

`RtspClient` speaks synchronous RTSP over a single TCP connection. The full
exchange for one session is:

### 1. ANNOUNCE

Sends an SDP body describing the ALAC codec:

```
v=0
o=iTunes <session_id> 0 IN IP4 <local_ip>
s=iTunes
c=IN IP4 <receiver_ip>
t=0 0
m=audio 0 RTP/AVP 96
a=rtpmap:96 AppleLossless
a=fmtp:96 352 0 16 40 10 14 2 255 0 0 44100
```

The `fmtp` parameters encode:
`<frames_per_packet> <version> <bit_depth> <rice_history_mult>
<rice_initial_history> <rice_limit> <channels> <max_run> <max_frame_bytes>
<avg_bit_rate> <sample_rate>`

### 2. SETUP

Sends a `Transport` header requesting UDP:

```
Transport: RTP/AVP/UDP;unicast;interleaved=0-1;
           client_port=<audio_port>-<ctrl_port>
```

`interleaved=0-1` is required by many receivers even though the transport is
UDP (not RTSP interleaved). The response carries the server's UDP port pair,
extracted by `parse_port()`.

### 3. RECORD

Starts the stream. Sends `RTP-Info` with sequence number and RTP timestamp.

### 4. SET_PARAMETER (volume)

Sets playback volume. Sent as a float string in a `text/parameters` body:
`volume: -20.0` (0 is full volume, −30 is the quietest audible level, and
−144 mutes).

### 5. TEARDOWN

Gracefully terminates the session. Called from `pcm_airplay_stop()`.

---

## ALAC encoding (`alac.rs`)

`encode_frame(samples: &[i16])` encodes exactly **352 stereo S16LE samples**
(1408 bytes of PCM) into an ALAC verbatim ("uncompressed escape") frame.

### Frame format

The Hammerton ALAC decoder reads a 23-bit header followed by the raw samples:

```
Bits    Width  Field
0–2     3      channels − 1  (= 1 for stereo)
3–6     4      discarded (0)
7–18    12     discarded (0)
19      1      hassize = 0
20–21   2      uncompressed_bytes = 0
22      1      isNotCompressed = 1  ← verbatim frame flag
23+     32     each sample as big-endian signed 16-bit, left then right
```

Output size = 23 header bits + 352 × 2 channels × 16 bits/sample
= 11,287 bits, which rounds up to **1411 bytes** at the byte boundary. This
matches the frame size shown in the layer map.

### BitWriter

`BitWriter` accumulates bits MSB-first into a `Vec<u8>`:

```rust
fn write(&mut self, value: u64, nbits: u32)
fn align(&mut self)  // zero-pad to next byte boundary
```

The encoder calls `write` for the 23 bits of header fields and then for each
sample (16 bits per channel, interleaved L/R), then `align()` to flush the
final byte.
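
The verbatim encoding described in this section can be sketched in a few lines. This is a minimal illustration and not the crate's actual code: `BitWriter` here is a hypothetical stand-in with the same two methods, `encode_verbatim` is an invented name, and the header is assumed to be the 23-bit layout read by the Hammerton decoder (3 + 4 + 12 + 1 + 2 + 1 bits), which yields the 1411-byte frame shown in the layer map.

```rust
/// MSB-first bit accumulator (hypothetical stand-in for the crate's `BitWriter`).
struct BitWriter {
    bytes: Vec<u8>,
    acc: u8,     // bits collected for the byte in progress
    filled: u32, // number of valid bits in `acc` (0..=7)
}

impl BitWriter {
    fn new() -> Self {
        BitWriter { bytes: Vec::new(), acc: 0, filled: 0 }
    }

    /// Append the low `nbits` bits of `value`, most significant bit first.
    fn write(&mut self, value: u64, nbits: u32) {
        for i in (0..nbits).rev() {
            let bit = ((value >> i) & 1) as u8;
            self.acc = (self.acc << 1) | bit;
            self.filled += 1;
            if self.filled == 8 {
                self.bytes.push(self.acc);
                self.acc = 0;
                self.filled = 0;
            }
        }
    }

    /// Zero-pad to the next byte boundary.
    fn align(&mut self) {
        if self.filled > 0 {
            self.bytes.push(self.acc << (8 - self.filled));
            self.acc = 0;
            self.filled = 0;
        }
    }
}

/// Encode interleaved L/R samples as one verbatim ALAC frame.
/// Assumes the 23-bit header layout; the name is illustrative only.
fn encode_verbatim(samples: &[i16]) -> Vec<u8> {
    let mut w = BitWriter::new();
    w.write(1, 3);  // channels - 1 (stereo)
    w.write(0, 4);  // discarded
    w.write(0, 12); // discarded
    w.write(0, 1);  // hassize
    w.write(0, 2);  // uncompressed_bytes
    w.write(1, 1);  // isNotCompressed
    for &s in samples {
        // Reinterpret the signed sample as 16 raw bits, big-endian on the wire.
        w.write(s as u16 as u64, 16);
    }
    w.align();
    w.bytes
}

fn main() {
    // One frame of silence: 352 stereo samples = 704 interleaved values.
    let frame = encode_verbatim(&vec![0i16; 352 * 2]);
    assert_eq!(frame.len(), 1411);
    assert_eq!(frame[0], 0x20); // 001 (stereo) followed by five zero bits
}
```

Because the header is not a whole number of bytes, every sample ends up shifted by one bit relative to byte boundaries, which is why a bit-level writer is needed even though nothing is compressed.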

---

## RTP audio stream (`rtp.rs`)

`RtpSender` opens **three UDP sockets** at construction time:

| Socket        | Direction               | Purpose             |
|---------------|-------------------------|---------------------|
| `audio_sock`  | → receiver audio port   | RTP audio frames    |
| `ctrl_sock`   | ↔ receiver control port | RTCP sync packets   |
| `timing_sock` | ↔ receiver timing port  | NTP timing exchange |

### `send_audio(frame, marker)`

Builds a 12-byte RTP header:

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│V=2│P│X│  CC   │M│    PT=96    │        Sequence Number        │
├───────────────────────────────┼───────────────────────────────┤
│                  Timestamp (RTP clock units)                  │
├───────────────────────────────────────────────────────────────┤
│                             SSRC                              │
└───────────────────────────────────────────────────────────────┘
```

- `M` (marker) = 1 on the first frame of a session, 0 thereafter.
- Timestamp increments by **352** per frame (one ALAC frame = 352 samples).
- SSRC is a random 32-bit value chosen at sender creation.

**Real-time pacing**: `send_audio` tracks the expected transmission instant
using `Instant` and `frame_count × Duration_per_frame` and calls
`thread::sleep` when the sender is running ahead.
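
The header layout and pacing rule above can be sketched as follows. This is an illustration under stated assumptions and not the code in `rtp.rs`: the function names `rtp_header`, `due_offset`, and `sleep_needed` are hypothetical, and the constants assume the 44100 Hz, 352-samples-per-frame configuration described in this README.

```rust
use std::time::{Duration, Instant};

const SAMPLE_RATE: u128 = 44_100;
const SAMPLES_PER_FRAME: u128 = 352;

/// Build a 12-byte RTP header with V=2, no padding, no extension, CC=0, PT=96.
fn rtp_header(marker: bool, seq: u16, timestamp: u32, ssrc: u32) -> [u8; 12] {
    let mut h = [0u8; 12];
    h[0] = 0x80; // V=2, P=0, X=0, CC=0
    h[1] = ((marker as u8) << 7) | 96; // marker bit + payload type 96
    h[2..4].copy_from_slice(&seq.to_be_bytes());
    h[4..8].copy_from_slice(&timestamp.to_be_bytes());
    h[8..12].copy_from_slice(&ssrc.to_be_bytes());
    h
}

/// Offset from stream start at which frame number `frame_count` is due.
fn due_offset(frame_count: u64) -> Duration {
    // u128 arithmetic avoids overflow on long-running streams.
    let nanos = frame_count as u128 * SAMPLES_PER_FRAME * 1_000_000_000 / SAMPLE_RATE;
    Duration::from_nanos(nanos as u64)
}

/// Time to sleep before sending frame `frame_count`; zero if already late.
fn sleep_needed(start: Instant, now: Instant, frame_count: u64) -> Duration {
    (start + due_offset(frame_count)).saturating_duration_since(now)
}

fn main() {
    let start = Instant::now();
    let mut timestamp: u32 = 0;
    for n in 0..5u64 {
        let wait = sleep_needed(start, Instant::now(), n);
        if !wait.is_zero() {
            std::thread::sleep(wait);
        }
        // A real sender would prepend this header to the ALAC frame and
        // write the packet to the audio socket here.
        let _header = rtp_header(n == 0, n as u16, timestamp, 0x1234_5678);
        timestamp = timestamp.wrapping_add(352);
    }
}
```

Computing each deadline from the stream start, instead of sleeping a fixed interval per frame, keeps scheduling jitter from accumulating into drift over a long track.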

---

## RTCP synchronisation

`send_sync(first)` sends a 20-byte RTCP NTP Send Report to the control socket
every **44 frames** (~350 ms at 44100 Hz):

```
Byte   Field
0      V=2, P=0, RC=0
1      PT=200 (SR) or 0xD4 (first sync)
2–3    length = 4 (words after fixed header)
4–7    SSRC
8–11   NTP timestamp seconds (since 1900-01-01)
12–15  NTP timestamp fraction (units of 2^-32 s)
16–19  RTP timestamp (matching the next audio frame's timestamp)
```

`NTP_EPOCH_DELTA = 0x83AA_7E80` converts UNIX time (seconds since 1970) to NTP
time (seconds since 1900).

The first sync packet (`first=true`) uses PT=`0xD4` (not standard SR); some
receivers require this to accept the initial synchronisation.

---

## NTP timing responder

A background thread (`timing_responder`) listens on `timing_sock` and answers
NTP timing requests from the receiver:

```
Request  PT = 0xD2  (timing request)
Response PT = 0xD3  (timing response)

Response body (32 bytes):
  [0–3]   SSRC
  [4–7]   0 (reference seconds)
  [8–11]  0 (reference fraction)
  [12–15] received seconds  (echoed from request)
  [16–19] received fraction (echoed from request)
  [20–23] send seconds      (current NTP time)
  [24–27] send fraction     (current NTP time)
```

Many receivers stall playback if timing responses stop arriving. The thread
runs for the entire duration of the session.

---

## Track transitions

When Rockbox moves to the next track:

1. `sink_dma_stop()` is called → `pcm_airplay_stop()` → RTSP TEARDOWN →
   `SESSION = None`.
2. `sink_dma_start()` is called for the new track → `pcm_airplay_connect()` →
   new RTSP session with fresh RTP sequence/timestamp counters.

There is a brief gap (TEARDOWN round-trip + new ANNOUNCE/SETUP/RECORD) between
tracks. This is inherent to RAOP and is typically inaudible (<100 ms).

---

## Configuration

In `~/.config/rockbox.org/settings.toml`:

```toml
audio_output = "airplay"
airplay_host = "192.168.1.x"   # IP of the AirPlay receiver
airplay_port = 5000            # optional, default 5000
```

`crates/settings/src/lib.rs:load_settings()` reads these values and calls:

```rust
pcm::airplay_set_host(&host, port);
pcm::switch_sink(PCM_SINK_AIRPLAY);
```

`airplay_set_host` stores the host in `HOST: Mutex<Option<String>>` and the
port in `PORT: AtomicU16`. These are read by `pcm_airplay_connect()` at the
start of each track.

---

## AirPlay 2 probe

`pcm_airplay_connect()` first attempts an AirPlay 2 handshake (PTP-based). If
it fails (connection refused, or the receiver does not support AirPlay 2) the
error is logged at `tracing::debug!` level and the function falls through to the
AirPlay 1 / RAOP path. This makes the probe transparent to the user.

The AirPlay 2 path uses the cryptographic dependencies declared in
`Cargo.toml`:

```toml
x25519-dalek      # key exchange
ed25519-dalek     # signature
chacha20poly1305  # AEAD encryption
sha2, hkdf, hmac  # key derivation
num-bigint        # SRP big-integer arithmetic
```

None of these are needed for the AirPlay 1 code path.

---

## Gotchas and known limits

### 1. Only one simultaneous receiver

The `SESSION` mutex holds a single `AirPlaySession`. Sending to multiple
AirPlay devices simultaneously is not supported. For multi-room output use
the Squeezelite sink with multiple clients, or run multiple rockboxd instances.

### 2. Receiver must be on the local network

RAOP uses UDP with no NAT traversal. The receiver must be directly reachable
at the configured IP. Multicast discovery (mDNS/Bonjour) is not implemented,
so you must supply the IP manually.

### 3. `interleaved=0-1` in Transport header

Even though the transport is plain UDP, most receivers require the
`interleaved=0-1` parameter in the SETUP `Transport` header. Omitting it causes
the receiver to ignore the `RECORD` command silently.

### 4. Verbatim ALAC only (no compression)

`alac.rs` only implements the verbatim escape frame (`isNotCompressed=1`).
The audio payload rate is fixed at `sample_rate × 4 bytes/s = 176,400 bytes/s`
at 44.1 kHz, plus a few bytes of header per frame. This is fine for LAN
streaming but wasteful compared to the compressed ALAC path.

### 5. Fixed 44100 Hz sample rate

The RTSP SDP and ALAC frame size constants are hard-coded for 44100 Hz.
Playback of 48 kHz or 96 kHz tracks is not tested and may produce incorrect
pitch or receiver errors.

### 6. Logging uses `tracing`, never `println!`

All diagnostic output is routed through the `tracing` crate. To see the full
AirPlay negotiation:

```sh
RUST_LOG=rockbox_airplay=debug rockboxd
```

Never add `println!` or `eprintln!`. Both bypass the log filter, and
`println!` pollutes stdout, breaking FIFO/pipe mode.