# rockbox-airplay: AirPlay PCM Sink

This document traces every hop an audio frame takes from the Rockbox C firmware
through the `rockbox-airplay` Rust crate to an AirPlay (RAOP) receiver.

---

## Table of contents

1. [Overview](#overview)
2. [Layer map](#layer-map)
3. [PCM sink vtable (`pcm-airplay.c`)](#pcm-sink-vtable-pcm-airplayc)
4. [The DMA thread](#the-dma-thread)
5. [FFI boundary](#ffi-boundary)
6. [Session lifecycle (`lib.rs`)](#session-lifecycle-librs)
7. [RTSP handshake (`rtsp.rs`)](#rtsp-handshake-rtsprs)
8. [ALAC encoding (`alac.rs`)](#alac-encoding-alacrs)
9. [RTP audio stream (`rtp.rs`)](#rtp-audio-stream-rtprs)
10. [RTCP synchronisation](#rtcp-synchronisation)
11. [NTP timing responder](#ntp-timing-responder)
12. [Track transitions](#track-transitions)
13. [Configuration](#configuration)
14. [AirPlay 2 probe](#airplay-2-probe)
15. [Gotchas and known limits](#gotchas-and-known-limits)

---

## Overview

The AirPlay sink lets Rockbox stream audio to any RAOP-compatible receiver:
Apple TV, HomePod, AirPort Express, or third-party software such as
[shairport-sync](https://github.com/mikebrady/shairport-sync). It implements
**AirPlay 1 (RAOP)** in pure Rust with no external C libraries.

The protocol stack looks like this:

```
RTSP/TCP  ── session negotiation (ANNOUNCE, SETUP, RECORD, TEARDOWN)
RTP/UDP   ── ALAC-encoded audio frames
RTCP/UDP  ── synchronisation (NTP send-report) every ~350 ms
UDP       ── NTP timing response service
```

---

## Layer map

```
┌────────────────────────────────────────────────────────┐
│  Rockbox C firmware (pcm.c, audio thread)              │
│    pcm_play_data() → sink.ops.play()                   │
│    pcm_play_dma_complete_callback() per chunk          │
└───────────────────┬────────────────────────────────────┘
                    │ raw S16LE stereo PCM chunks
┌───────────────────▼────────────────────────────────────┐
│  firmware/target/hosted/pcm-airplay.c                  │
│    sink_dma_start() → pcm_airplay_connect()            │
│    airplay_thread() → pcm_airplay_write()              │
│    sink_dma_stop()  → pcm_airplay_stop()               │
└───────────────────┬────────────────────────────────────┘
                    │ extern "C" FFI
┌───────────────────▼────────────────────────────────────┐
│  crates/airplay/src/lib.rs                             │
│    AirPlaySession { sender, rtsp, buf, first_frame }   │
│    pcm_airplay_connect(): RTSP handshake               │
│    pcm_airplay_write():   ALAC frame dispatch          │
│    pcm_airplay_stop():    TEARDOWN + session clear     │
└───────┬───────────────────────┬────────────────────────┘
        │ RTSP/TCP              │ ALAC frames
┌───────▼────────────┐  ┌───────▼──────────────────────┐
│  rtsp.rs           │  │  alac.rs                     │
│  RtspClient        │  │  encode_frame()              │
│  ANNOUNCE / SETUP  │  │  BitWriter                   │
│  RECORD / TEARDOWN │  │  352 S16LE → 1411-byte frame │
└────────────────────┘  └───────┬──────────────────────┘
                                │ encoded frames
                        ┌───────▼──────────────────────┐
                        │  rtp.rs                      │
                        │  RtpSender                   │
                        │  send_audio():       RTP/UDP │
                        │  send_sync():        RTCP    │
                        │  timing_responder(): NTP     │
                        └───────┬──────────────────────┘
                                │ UDP packets
                        ┌───────▼──────────────────────┐
                        │  AirPlay receiver            │
                        │  (Apple TV, shairport-sync…) │
                        └──────────────────────────────┘
```

---

## PCM sink vtable (`pcm-airplay.c`)

`firmware/target/hosted/pcm-airplay.c` implements `struct pcm_sink` with the
following vtable:

| Op                | Implementation                                                      |
|-------------------|---------------------------------------------------------------------|
| `init`            | `pthread_mutex_init` (recursive)                                    |
| `postinit`        | no-op                                                               |
| `set_freq`        | records `current_sample_rate` from `hw_freq_sampr[freq]`            |
| `lock` / `unlock` | `pthread_mutex_lock/unlock`                                         |
| `play`            | `sink_dma_start`: connects, spawns `airplay_thread`                 |
| `stop`            | `sink_dma_stop`: signals thread, joins, calls `pcm_airplay_stop()`  |

`airplay_pcm_sink` is registered at index `PCM_SINK_AIRPLAY = 2` in the
`sinks[]` array in `firmware/pcm.c`.

---

## The DMA thread

`sink_dma_start(addr, size)` stores the initial PCM pointer/length under the
mutex, then spawns `airplay_thread`. The thread mimics a hardware DMA
interrupt loop:

```
while not stopped:
    1. lock → grab (data, size) → clear pcm_data/pcm_size → unlock
    2. if data: pcm_airplay_write(data, size)
    3. lock → pcm_play_dma_complete_callback(OK, &pcm_data, &pcm_size) → unlock
    4. if no more data: break
    5. pcm_play_dma_status_callback(STARTED)   ← tells audio engine chunk consumed
```

Unlike the FIFO sink, there is **no explicit real-time pacing** in C. Pacing is
handled inside `rtp.rs`: the RTP sender sleeps to maintain the correct
wall-clock transmission rate based on the RTP timestamp increment.

---

## FFI boundary

`crates/airplay/src/lib.rs` exports four `#[no_mangle] extern "C"` functions:

| C symbol               | Rust function          | Purpose                                        |
|------------------------|------------------------|------------------------------------------------|
| `pcm_airplay_set_host` | `pcm_airplay_set_host` | Store the receiver in `HOST` (mutex) and `PORT` (atomic) |
| `pcm_airplay_connect`  | `pcm_airplay_connect`  | Open RTSP + RTP session (idempotent)           |
| `pcm_airplay_write`    | `pcm_airplay_write`    | Buffer PCM, encode ALAC, send RTP              |
| `pcm_airplay_stop`     | `pcm_airplay_stop`     | Send TEARDOWN, clear session                   |

`HOST` is a `Mutex<Option<String>>` and `PORT` is an `AtomicU16` (default
5000). `SESSION` is a `Mutex<Option<AirPlaySession>>`: the session is
created once and reused across `write` calls for the lifetime of a track.
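
As a rough illustration of how the host and port might be stored, the sketch below shows one possible shape for `pcm_airplay_set_host`. The argument types (`*const c_char`, `u16`) and the `configured_endpoint` helper are assumptions made for this example; the actual signatures in `lib.rs` may differ.

```rust
use std::ffi::{c_char, CStr};
use std::sync::atomic::{AtomicU16, Ordering};
use std::sync::Mutex;

// Receiver address shared between the settings loader and the connect path.
static HOST: Mutex<Option<String>> = Mutex::new(None);
static PORT: AtomicU16 = AtomicU16::new(5000);

/// Store the receiver host and port (assumed signature, for illustration).
///
/// # Safety
/// `host` must be null or point to a valid NUL-terminated C string.
#[no_mangle]
pub unsafe extern "C" fn pcm_airplay_set_host(host: *const c_char, port: u16) {
    if host.is_null() {
        return; // nothing to store; keep the previous configuration
    }
    // Copy into an owned String so the C caller may free its buffer afterwards.
    let owned = unsafe { CStr::from_ptr(host) }.to_string_lossy().into_owned();
    *HOST.lock().unwrap() = Some(owned);
    PORT.store(port, Ordering::Relaxed);
}

/// Hypothetical helper: snapshot of the endpoint as a connect path might read it.
pub fn configured_endpoint() -> Option<(String, u16)> {
    let host = HOST.lock().unwrap().clone()?;
    Some((host, PORT.load(Ordering::Relaxed)))
}
```

Copying the string at the boundary keeps ownership on the Rust side, so the C caller does not need to keep its buffer alive after the call returns.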

### Force-link shim

Because `rockbox-airplay` is an `rlib`, its symbols are only included in
`librockbox_cli.a` if something references them. `crates/cli/src/lib.rs`
contains:

```rust
use rockbox_airplay::_link_airplay as _;
```

where `_link_airplay` is a public no-op function in `lib.rs`. This is enough
to pull the entire crate into the link graph.

---

## Session lifecycle (`lib.rs`)

`pcm_airplay_connect()` is called from `sink_dma_start()` at the start of
every track. It is guarded by `SESSION`:

```
if SESSION is already Some → return OK immediately (idempotent)

1. Probe AirPlay 2 (non-fatal: logs and falls through on failure)
2. RtpSender::bind(host, ports)   ← binds three UDP sockets
3. RtspClient::new(host, port)    ← opens TCP connection to receiver
4. rtsp.announce(sdp)             ← sends SDP describing the ALAC stream
5. rtsp.setup(transport)          ← negotiates UDP port numbers
6. rtsp.record()                  ← starts the session
7. sender.send_initial_sync()     ← sends first RTCP sync packet
8. SESSION = Some(AirPlaySession { sender, rtsp, buf: [], first_frame: true })
```

`pcm_airplay_write(data, len)` appends the incoming PCM bytes to `buf`, then
drains complete 352-sample (1408-byte) frames in a loop:

```rust
// Fragment: `buf`, `sender` and `first_frame` are the AirPlaySession fields;
// error handling is elided.
while buf.len() >= FRAME_SIZE {
    let frame_pcm = buf.drain(..FRAME_SIZE).collect::<Vec<_>>();
    let alac_frame = alac::encode_frame(&frame_pcm);
    sender.send_audio(&alac_frame, first_frame);
    first_frame = false;
}
```

`pcm_airplay_stop()` sends RTSP TEARDOWN and sets `SESSION = None`.
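
The buffering behaviour of `pcm_airplay_write` can be sketched in isolation. `FrameBuffer` and the `emit` callback are hypothetical stand-ins for the session fields and for the encode-and-send step; only the 1408-byte frame size and the first-frame flag come from the description above.

```rust
/// Bytes in one 352-sample stereo S16LE frame (352 × 2 channels × 2 bytes).
const FRAME_SIZE: usize = 352 * 2 * 2;

/// Hypothetical stand-in for the PCM accumulator held by the session.
pub struct FrameBuffer {
    buf: Vec<u8>,
    first_frame: bool,
}

impl FrameBuffer {
    pub fn new() -> Self {
        FrameBuffer { buf: Vec::new(), first_frame: true }
    }

    /// Append `data`, then pass every complete frame to `emit` together with
    /// the first-frame flag. Returns the number of frames emitted.
    pub fn write<F: FnMut(&[u8], bool)>(&mut self, data: &[u8], mut emit: F) -> usize {
        self.buf.extend_from_slice(data);
        let mut frames = 0;
        while self.buf.len() >= FRAME_SIZE {
            let frame: Vec<u8> = self.buf.drain(..FRAME_SIZE).collect();
            emit(&frame, self.first_frame);
            self.first_frame = false; // only the very first frame is flagged
            frames += 1;
        }
        frames
    }

    /// Bytes left over, waiting for the next `write` to complete a frame.
    pub fn pending(&self) -> usize {
        self.buf.len()
    }
}
```

Because the firmware's chunk size is not a multiple of 1408 bytes in general, the remainder simply stays in the buffer and is completed by the next call.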
193193+194194+---
195195+196196+## RTSP handshake (`rtsp.rs`)
197197+198198+`RtspClient` speaks synchronous RTSP over a single TCP connection. The full
199199+exchange for one session is:
200200+201201+### 1. ANNOUNCE
202202+203203+Sends an SDP body describing the ALAC codec:
204204+205205+```
206206+v=0
207207+o=iTunes <session_id> 0 IN IP4 <local_ip>
208208+s=iTunes
209209+c=IN IP4 <receiver_ip>
210210+t=0 0
211211+m=audio 0 RTP/AVP 96
212212+a=rtpmap:96 AppleLossless
213213+a=fmtp:96 352 0 16 40 10 14 2 255 0 0 44100
214214+```
215215+216216+The `fmtp` parameters encode:
217217+`<frames_per_packet> <version> <bit_depth> <rice_history_mult>
218218+<rice_initial_history> <rice_limit> <channels> <max_run> <max_frame_bytes>
219219+<avg_bit_rate> <sample_rate>`
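
To make the field order concrete, here is a minimal sketch that assembles the SDP body shown above. The function name `build_sdp` and its parameters are illustrative and not taken from `rtsp.rs`; CRLF line endings are assumed, as is usual for SDP.

```rust
/// Build the ANNOUNCE SDP body (illustrative helper, not the crate's API).
pub fn build_sdp(session_id: u32, local_ip: &str, receiver_ip: &str) -> String {
    // fmtp values in order: frames_per_packet, version, bit_depth,
    // rice_history_mult, rice_initial_history, rice_limit, channels,
    // max_run, max_frame_bytes, avg_bit_rate, sample_rate.
    let fmtp = [352u32, 0, 16, 40, 10, 14, 2, 255, 0, 0, 44_100]
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(" ");

    format!(
        "v=0\r\n\
         o=iTunes {sid} 0 IN IP4 {local}\r\n\
         s=iTunes\r\n\
         c=IN IP4 {remote}\r\n\
         t=0 0\r\n\
         m=audio 0 RTP/AVP 96\r\n\
         a=rtpmap:96 AppleLossless\r\n\
         a=fmtp:96 {fmtp}\r\n",
        sid = session_id,
        local = local_ip,
        remote = receiver_ip,
        fmtp = fmtp
    )
}
```

Keeping the `fmtp` values in one array makes it easier to see which constants would have to change together if another sample rate or frame size were ever supported.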

### 2. SETUP

Sends a `Transport` header requesting UDP:

```
Transport: RTP/AVP/UDP;unicast;interleaved=0-1;
           client_port=<audio_port>-<ctrl_port>
```

`interleaved=0-1` is required by many receivers even though the transport is
UDP (not RTSP interleaved). The response carries the server's UDP port pair,
extracted by `parse_port()`.
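
The port extraction could look roughly like the following. This is not the crate's `parse_port()`; the helper name `find_port`, its signature, and the key names used in the test header (`server_port`, `control_port`, `timing_port`) are assumptions about what a receiver's `Transport` response contains.

```rust
/// Look up `key=<port>` in a semicolon-separated Transport header value.
/// Hypothetical helper; a range such as "6000-6001" yields its first port.
pub fn find_port(transport: &str, key: &str) -> Option<u16> {
    transport
        .split(';')
        .filter_map(|field| {
            // Split each field into "name" and "value" at the first '='.
            let mut kv = field.trim().splitn(2, '=');
            match (kv.next(), kv.next()) {
                (Some(k), Some(v)) if k == key => Some(v),
                _ => None,
            }
        })
        .next()
        .and_then(|v| v.split('-').next())
        .and_then(|p| p.trim().parse().ok())
}
```

Returning `Option<u16>` lets the caller decide whether a missing port is fatal or whether a default should be used.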

### 3. RECORD

Starts the stream. Sends `RTP-Info` with sequence number and RTP timestamp.

### 4. SET_PARAMETER (volume)

Sets playback volume. Sent as a float string in a `text/parameters` body:
`volume: -20.0` (range −144 to 0; 0 is full volume).
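
A small sketch of how the body might be formatted, clamping to the range given above. The helper name `volume_body`, the one-decimal formatting, and the trailing CRLF are assumptions for illustration.

```rust
/// Format a `text/parameters` body for SET_PARAMETER (illustrative helper).
/// Values outside the documented range of -144.0 to 0.0 are clamped.
pub fn volume_body(volume: f32) -> String {
    let clamped = volume.max(-144.0).min(0.0);
    format!("volume: {:.1}\r\n", clamped)
}
```

Clamping on the sender side avoids relying on how a given receiver reacts to out-of-range values.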

### 5. TEARDOWN

Gracefully terminates the session. Called from `pcm_airplay_stop()`.

---

## ALAC encoding (`alac.rs`)

`encode_frame(samples: &[i16])` encodes exactly **352 stereo S16LE samples**
(1408 bytes of PCM) into an ALAC verbatim ("uncompressed escape") frame.

### Frame format

The Hammerton ALAC decoder expects this exact bit layout:

```
Bits     Width  Field
0–2      3      channels − 1 (= 1 for stereo)
3–6      4      discarded (0)
7–18     12     discarded (0)
19       1      hassize = 0
20–21    2      uncompressed_bytes = 0
22       1      isNotCompressed = 1   ← verbatim frame flag
23+      32     each sample as big-endian signed 16-bit, left then right
```

Output size = 23 header bits + 352 × 2 channels × 16 bits/sample
            = 11,287 bits, rounded up to a byte boundary = **1411 bytes**.

### BitWriter

`BitWriter` accumulates bits MSB-first into a `Vec<u8>`:

```rust
fn write(&mut self, value: u64, nbits: u32)
fn align(&mut self)                          // zero-pad to next byte boundary
```

The encoder calls `write` for the 23-bit header fields and then for each
sample (16 bits per channel, interleaved L/R), then `align()` to flush the
final byte.
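
The following is a minimal sketch of a verbatim-frame encoder built on an MSB-first bit writer. It assumes the 23-bit header consumed by the Hammerton decoder (3 + 4 + 12 + 1 + 2 + 1 bits), which yields the 1411-byte frame shown in the layer map. The struct internals and the `into_bytes` method are illustrative and may not match `alac.rs`.

```rust
/// MSB-first bit accumulator (illustrative; internals are assumptions).
pub struct BitWriter {
    bytes: Vec<u8>,
    acc: u8,   // bits collected for the byte in progress
    used: u32, // number of valid bits in `acc` (0..=7)
}

impl BitWriter {
    pub fn new() -> Self {
        BitWriter { bytes: Vec::new(), acc: 0, used: 0 }
    }

    /// Append the low `nbits` bits of `value`, most significant bit first.
    pub fn write(&mut self, value: u64, nbits: u32) {
        for i in (0..nbits).rev() {
            let bit = ((value >> i) & 1) as u8;
            self.acc = (self.acc << 1) | bit;
            self.used += 1;
            if self.used == 8 {
                self.bytes.push(self.acc);
                self.acc = 0;
                self.used = 0;
            }
        }
    }

    /// Zero-pad the byte in progress up to the next byte boundary.
    pub fn align(&mut self) {
        if self.used > 0 {
            self.bytes.push(self.acc << (8 - self.used));
            self.acc = 0;
            self.used = 0;
        }
    }

    pub fn into_bytes(mut self) -> Vec<u8> {
        self.align();
        self.bytes
    }
}

/// Encode 352 interleaved stereo samples as one verbatim ALAC frame.
pub fn encode_frame(samples: &[i16]) -> Vec<u8> {
    assert_eq!(samples.len(), 352 * 2, "expects 352 L/R sample pairs");
    let mut w = BitWriter::new();
    w.write(1, 3);  // channels - 1 (stereo)
    w.write(0, 4);  // discarded
    w.write(0, 12); // discarded
    w.write(0, 1);  // hassize
    w.write(0, 2);  // uncompressed_bytes
    w.write(1, 1);  // isNotCompressed: verbatim frame
    for &s in samples {
        // Writing 16 bits MSB-first produces big-endian sample order.
        w.write(s as u16 as u64, 16);
    }
    w.into_bytes()
}
```

Since the header is not byte-aligned, every sample ends up shifted by one bit relative to byte boundaries, which is why a bit writer is needed even though no compression takes place.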

---

## RTP audio stream (`rtp.rs`)

`RtpSender` opens **three UDP sockets** at construction time:

| Socket        | Direction               | Purpose             |
|---------------|-------------------------|---------------------|
| `audio_sock`  | → receiver audio port   | RTP audio frames    |
| `ctrl_sock`   | ↔ receiver control port | RTCP sync packets   |
| `timing_sock` | ↔ receiver timing port  | NTP timing exchange |

### `send_audio(frame, marker)`

Builds a 12-byte RTP header:

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│V=2│P│X│  CC   │M│    PT=96    │        Sequence Number        │
├───────────────────────────────┴───────────────────────────────┤
│                  Timestamp (RTP clock units)                  │
├───────────────────────────────────────────────────────────────┤
│                             SSRC                              │
└───────────────────────────────────────────────────────────────┘
```

- `M` (marker) = 1 on the first frame of a session, 0 thereafter.
- Timestamp increments by **352** per frame (one ALAC frame = 352 samples).
- SSRC is a random 32-bit value chosen at sender creation.

**Real-time pacing**: `send_audio` tracks the expected transmission instant
using `Instant` and `frame_count × Duration_per_frame` and calls
`thread::sleep` when the sender is running ahead.
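
The header layout and the pacing arithmetic can be sketched as below. The function names (`rtp_header`, `due_offset`, `sleep_needed`) are hypothetical; the sketch assumes V=2 with P, X and CC all zero, payload type 96, and 352 samples per frame at 44100 Hz as described above.

```rust
use std::time::Duration;

const SAMPLES_PER_FRAME: u64 = 352;
const SAMPLE_RATE: u64 = 44_100;

/// Build the 12-byte RTP header (illustrative helper).
pub fn rtp_header(marker: bool, seq: u16, timestamp: u32, ssrc: u32) -> [u8; 12] {
    let mut h = [0u8; 12];
    h[0] = 0x80; // V=2, P=0, X=0, CC=0
    h[1] = ((marker as u8) << 7) | 96; // marker bit + payload type 96
    h[2..4].copy_from_slice(&seq.to_be_bytes());
    h[4..8].copy_from_slice(&timestamp.to_be_bytes());
    h[8..12].copy_from_slice(&ssrc.to_be_bytes());
    h
}

/// Offset from stream start at which frame number `frame_count` is due.
pub fn due_offset(frame_count: u64) -> Duration {
    let nanos = frame_count * SAMPLES_PER_FRAME * 1_000_000_000 / SAMPLE_RATE;
    Duration::from_nanos(nanos)
}

/// Time to sleep before sending frame `frame_count`, given the time elapsed
/// since the stream started. Zero when the sender is on time or behind.
pub fn sleep_needed(frame_count: u64, elapsed: Duration) -> Duration {
    due_offset(frame_count)
        .checked_sub(elapsed)
        .unwrap_or(Duration::from_secs(0))
}
```

Deriving each deadline from the frame count, rather than sleeping a fixed interval per frame, keeps rounding and scheduling errors from accumulating over a long track. A sender would call `thread::sleep(sleep_needed(n, start.elapsed()))` before transmitting frame `n`.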

---

## RTCP synchronisation

`send_sync(first)` sends a 20-byte RTCP NTP Send Report to the control socket
every **44 frames** (~350 ms at 44100 Hz):

```
Byte    Field
0       V=2, P=0, RC=0
1       PT=200 (SR) or 0xD4 (first sync)
2–3     length = 4 (words after fixed header)
4–7     SSRC
8–11    NTP timestamp seconds (since 1900-01-01)
12–15   NTP timestamp fraction (2^32 units)
16–19   RTP timestamp (matching the next audio frame's timestamp)
```

`NTP_EPOCH_DELTA = 0x83AA_7E80` converts UNIX time (seconds since 1970) to NTP
time (seconds since 1900).

The first sync packet (`first=true`) uses PT=`0xD4` (not standard SR); some
receivers require this to accept the initial synchronisation.
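
As a sketch under the byte layout listed above, the packet and the UNIX-to-NTP conversion might be assembled as follows. The helper names `to_ntp` and `sync_packet` and their parameters are hypothetical; the current time is passed in as a `Duration` since the UNIX epoch so the example stays deterministic.

```rust
use std::time::Duration;

/// Seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (UNIX epoch).
const NTP_EPOCH_DELTA: u64 = 0x83AA_7E80;

/// Convert time since the UNIX epoch into NTP (seconds, fraction) words.
/// The fraction is expressed in units of 1/2^32 second.
pub fn to_ntp(since_unix: Duration) -> (u32, u32) {
    let secs = (since_unix.as_secs() + NTP_EPOCH_DELTA) as u32;
    let frac = ((since_unix.subsec_nanos() as u64) << 32) / 1_000_000_000;
    (secs, frac as u32)
}

/// Build the 20-byte sync packet following the table above (illustrative).
pub fn sync_packet(first: bool, ssrc: u32, now: Duration, next_rtp_ts: u32) -> [u8; 20] {
    let (secs, frac) = to_ntp(now);
    let mut p = [0u8; 20];
    p[0] = 0x80; // V=2, P=0, RC=0
    p[1] = if first { 0xD4 } else { 200 }; // first sync uses 0xD4
    p[2..4].copy_from_slice(&4u16.to_be_bytes()); // length in 32-bit words
    p[4..8].copy_from_slice(&ssrc.to_be_bytes());
    p[8..12].copy_from_slice(&secs.to_be_bytes());
    p[12..16].copy_from_slice(&frac.to_be_bytes());
    p[16..20].copy_from_slice(&next_rtp_ts.to_be_bytes());
    p
}
```

In a running sender, `now` would typically come from `SystemTime::now().duration_since(UNIX_EPOCH)`.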

---

## NTP timing responder

A background thread (`timing_responder`) listens on `timing_sock` and answers
NTP timing requests from the receiver:

```
Request  PT = 0xD2  (timing request)
Response PT = 0xD3  (timing response)

Response body (32 bytes):
  [0–3]    SSRC
  [4–7]    0 (reference seconds)
  [8–11]   0 (reference fraction)
  [12–15]  received seconds  (echoed from request)
  [16–19]  received fraction (echoed from request)
  [20–23]  send seconds      (current NTP time)
  [24–27]  send fraction     (current NTP time)
```

Many receivers stall playback if timing responses stop arriving. The thread
runs for the entire duration of the session.

---

## Track transitions

When Rockbox moves to the next track:

1. `sink_dma_stop()` is called → `pcm_airplay_stop()` → RTSP TEARDOWN →
   `SESSION = None`.
2. `sink_dma_start()` is called for the new track → `pcm_airplay_connect()` →
   new RTSP session with fresh RTP sequence/timestamp counters.

There is a brief gap (TEARDOWN round-trip + new ANNOUNCE/SETUP/RECORD) between
tracks. This is inherent to RAOP and is typically inaudible (<100 ms).

---

## Configuration

In `~/.config/rockbox.org/settings.toml`:

```toml
audio_output = "airplay"
airplay_host = "192.168.1.x"   # IP of the AirPlay receiver
airplay_port = 5000            # optional, default 5000
```

`crates/settings/src/lib.rs:load_settings()` reads these values and calls:

```rust
pcm::airplay_set_host(&host, port);
pcm::switch_sink(PCM_SINK_AIRPLAY);
```

`airplay_set_host` stores the host in `HOST: Mutex<Option<String>>` and the
port in `PORT: AtomicU16`. These are read by `pcm_airplay_connect()` at the
start of each track.

---

## AirPlay 2 probe

`pcm_airplay_connect()` first attempts an AirPlay 2 handshake (PTP-based). If
it fails (connection refused, or the receiver does not support AirPlay 2), the
error is logged at `tracing::debug!` level and the function falls through to the
AirPlay 1 / RAOP path. This makes the probe transparent to the user.

The AirPlay 2 path uses the cryptographic dependencies declared in
`Cargo.toml`:

```toml
x25519-dalek       # key exchange
ed25519-dalek      # signature
chacha20poly1305   # AEAD encryption
sha2, hkdf, hmac   # key derivation
num-bigint         # SRP big-integer arithmetic
```

None of these are needed for the AirPlay 1 code path.

---

## Gotchas and known limits

### 1. Only one simultaneous receiver

The `SESSION` mutex holds a single `AirPlaySession`. Sending to multiple
AirPlay devices simultaneously is not supported. For multi-room output, use
the Squeezelite sink with multiple clients, or run multiple rockboxd instances.

### 2. Receiver must be on the local network

RAOP uses UDP with no NAT traversal. The receiver must be directly reachable
at the configured IP. Multicast discovery (mDNS/Bonjour) is not implemented,
so you must supply the IP manually.

### 3. `interleaved=0-1` in Transport header

Even though the transport is plain UDP, many receivers require the
`interleaved=0-1` parameter in the SETUP `Transport` header. Omitting it causes
the receiver to ignore the `RECORD` command silently.

### 4. Verbatim ALAC only (no compression)

`alac.rs` only implements the verbatim escape frame (`isNotCompressed=1`).
The audio payload rate is fixed at `sample_rate × 4 bytes/s = 176,400 bytes/s`
at 44.1 kHz. This is fine for LAN streaming but wasteful compared to the
compressed ALAC path.

### 5. Fixed 44100 Hz sample rate

The RTSP SDP and ALAC frame size constants are hard-coded for 44100 Hz.
Playback of 48 kHz or 96 kHz tracks is not tested and may produce incorrect
pitch or receiver errors.

### 6. Logging uses `tracing`, never `println!`

All diagnostic output is routed through the `tracing` crate. To see the full
AirPlay negotiation:

```sh
RUST_LOG=rockbox_airplay=debug rockboxd
```

Never add `println!` or `eprintln!`: those bypass the log filter and pollute
stdout, breaking FIFO/pipe mode.