Add PCM loudness normalization docs · tsiry-sandratraina.com/rockbox-zig@82161ad

+283

1 changed file



docs/pcm-normalization.md

# PCM Loudness Normalization

Rockbox implements a real-time PCM loudness normalizer that equalises the perceived volume across tracks and sources. It is similar in purpose to Spotify's "Normalize Volume" or Apple Music's "Sound Check", but operates at the raw audio buffer level rather than on pre-computed track metadata, so it works for any source, including live streams, radio, and HTTP audio.

## Table of Contents

1. [Overview](#overview)
2. [How It Works](#how-it-works)
   - [Step 1 — RMS Measurement](#step-1--rms-measurement)
   - [Step 2 — Running RMS Estimate with Silence Gate](#step-2--running-rms-estimate-with-silence-gate)
   - [Step 3 — Gain Computation and Smoothing](#step-3--gain-computation-and-smoothing)
   - [Step 4 — Linear Gain Interpolation and Application](#step-4--linear-gain-interpolation-and-application)
3. [Asymmetric Attack / Release](#asymmetric-attack--release)
4. [Silence Gate](#silence-gate)
5. [Warm Start](#warm-start)
6. [Parameters Reference](#parameters-reference)
7. [Position in the Signal Chain](#position-in-the-signal-chain)
8. [Enabling the Normalizer](#enabling-the-normalizer)
9. [Comparison with ReplayGain](#comparison-with-replaygain)
10. [Known Limitations](#known-limitations)

---

## Overview

The normalizer targets a fixed RMS loudness level (−9 dBFS by default). Every PCM buffer that flows through a sink is analysed, a smoothed gain factor is computed, and the gain is applied in-place before the audio is written to the output device. The gain adjusts continuously and automatically: no track scanning, no metadata, no pre-processing required.

```
Decoded PCM ──► SW Volume scaling ──► Normalizer ──► Sink (SDL / FIFO / AirPlay / …)
                (pcm_copy_buffer)     (pcm_normalizer_apply)
```

---

## How It Works

The algorithm runs once per PCM chunk.
A "chunk" is the buffer delivered by the Rockbox audio engine to the DMA callback, typically 4 096–8 192 bytes (≈ 23–46 ms of 16-bit stereo audio at 44 100 Hz). The four steps below execute in order for every chunk.

### Step 1 — RMS Measurement

The Root Mean Square (RMS) amplitude of the current chunk is computed:

```
chunk_rms = sqrt( (1/N) × Σ (sᵢ / 32768)² )
```

where `sᵢ` are the raw S16LE sample values, `N` is the number of samples in the chunk, and dividing by 32 768 normalises the samples to the range `[−1, +1]`. RMS is used rather than peak amplitude because it correlates well with perceived loudness: a brief loud transient raises RMS only slightly, whereas sustained loud content raises it significantly.

The summation uses `double` precision to avoid accumulated rounding error over large chunk sizes.

### Step 2 — Running RMS Estimate with Silence Gate

A single chunk's RMS is noisy; averaging across many chunks gives a stable picture of the signal's loudness. A first-order Infinite Impulse Response (IIR) filter, also called an exponential moving average, is used:

```
rms_estimate = α × rms_estimate + (1 − α) × chunk_rms
```

The coefficient `α` controls how quickly the estimate tracks changes. Crucially, **two different coefficients** are used depending on the direction of change:

| Signal direction | Coefficient | Behaviour |
|---|---|---|
| `chunk_rms > rms_estimate` (getting louder) | `RMS_ATTACK = 0.3` | Tracks loud transients in 2–3 chunks (< 150 ms) |
| `chunk_rms < rms_estimate` (getting quieter) | `RMS_RELEASE = 0.99` | Takes several seconds to settle on a quieter signal (time constant ≈ 4.6 s at 46 ms per chunk) |

This asymmetry is essential. A fast attack means the estimate rises quickly when a loud section begins, which prevents the normalizer from over-boosting and causing clipping.
A slow release means the estimate falls slowly after a loud section ends, which prevents the gain from shooting up during a brief quiet passage (the "pumping" or "breathing" artefact).

Chunks whose RMS does not exceed `GATE_THRESH` (−60 dBFS) are treated as silence: the RMS estimate and the gain are both held at their current values. This prevents the normalizer from amplifying the noise floor during pauses between tracks.

### Step 3 — Gain Computation and Smoothing

The desired gain is calculated as:

```
desired_gain = TARGET_RMS / rms_estimate
```

This is the factor that, if applied to the signal, would bring its estimated loudness to `TARGET_RMS`. The value is clamped to prevent extreme correction:

```
desired_gain = clamp(desired_gain, MIN_GAIN, MAX_GAIN)
             = clamp(desired_gain, 0.1, 10.0)   // −20 dB to +20 dB
```

The gain is not applied instantaneously, because that would produce audible clicks at chunk boundaries whenever the gain changes significantly. Instead, the applied gain `gain` moves toward `desired_gain` through another asymmetric IIR smoother:

```
gain = β × gain + (1 − β) × desired_gain
```

| Direction | Coefficient | Convergence |
|---|---|---|
| Gain decreasing (signal too loud) | `GAIN_ATTACK = 0.3` | Closes ~97% of the gap in 3 chunks (< 150 ms) |
| Gain increasing (signal too quiet) | `GAIN_RELEASE = 0.98` | Closes most of the gap in ~3 seconds (time constant ≈ 2.3 s at 46 ms per chunk) |

The fast gain attack prevents overshoot and clipping when a loud track suddenly follows a quiet one. The slow gain release prevents the loudness from rising abruptly during a quiet moment.

### Step 4 — Linear Gain Interpolation and Application

Applying a discontinuous gain at the start of each chunk would still produce a click if the gain changed significantly between chunks.
The gain is therefore **linearly interpolated** from its value at the start of the chunk (`g_start`) to its new value at the end (`g_end`):

```
g(i) = g_start + (g_end − g_start) × (i / (N − 1))
```

Each sample is scaled by its per-sample gain and clamped to the S16 range to avoid integer overflow:

```c
/* g_i: interpolated gain for sample i, from the formula above */
float v = (float)s[i] * g_i;
if (v > 32767.0f)  v = 32767.0f;   /* clamp to the S16 range */
if (v < -32768.0f) v = -32768.0f;
s[i] = (int16_t)v;
```

This ramp removes the gain step at chunk boundaries, and with it the inter-chunk click, even at fast gain-attack rates.

---

## Asymmetric Attack / Release

The following diagram illustrates the asymmetric time constants on a hypothetical signal that starts quiet, becomes loud, and then returns to quiet:

```
RMS level
    │           ┌────────────────────────┐
    │           │      Loud section      │
    │           │                        │
    │ ──────────┘                        └───────── Actual signal
    │           ↑ fast attack            ↑ slow release
    │
    │           ┌──────────────────────────────┐
    │ ──────────┘                              └─── rms_estimate
    │
gain│ ──────────┐                              ┌─── Applied gain
    │           └──────────────────────────────┘
    │           ↑ fast gain reduction          ↑ slow gain rise
    └──────────────────────────────────────────────── time
```

The fast attack on both the RMS estimator and the gain smoother means the normalizer reacts within ~150 ms when the signal becomes loud, preventing clipping. The slow release means it takes a few seconds to raise the gain again after a loud section, which avoids the pumping artefact that would otherwise be audible during the quiet parts of dynamic music or between tracks.

---

## Silence Gate

If `chunk_rms ≤ GATE_THRESH` (0.001 linear = −60 dBFS), the chunk is classified as silence and neither `rms_estimate` nor `gain` is updated.
Without this gate, a pause between tracks would cause `rms_estimate` to decay toward zero, `desired_gain` to hit `MAX_GAIN`, and the next track to begin at full boost, creating a loud pop when playback resumes.

The gate threshold of −60 dBFS sits below the level of typical programme material but well above the quantisation noise floor of a 16-bit signal (about −96 dBFS).

---

## Warm Start

When the normalizer is first enabled (or re-enabled), state is reset to:

```c
gain = 1.0f;          // no gain applied yet
rms_estimate = 0.1f;  // −20 dBFS: typical quiet-to-moderate music level
```

The `rms_estimate` warm-start at 0.1 (rather than at `TARGET_RMS`) means `desired_gain` starts above 1.0 for most content. This ensures the normalizer applies a boost from the very first chunk rather than waiting for the IIR filter to converge from the default. Without the warm start, the first several seconds of playback would sound un-normalised.

---

## Parameters Reference

All parameters are compile-time constants in `firmware/pcm_normalizer.c`.

| Constant | Value | dB equivalent | Description |
|---|---|---|---|
| `TARGET_RMS` | `0.35` | −9 dBFS | Target RMS loudness. Higher = louder output. |
| `RMS_ATTACK` | `0.3` | n/a | IIR coefficient for RMS rising (loud signal). Lower = faster. |
| `RMS_RELEASE` | `0.99` | n/a | IIR coefficient for RMS falling (quiet signal). Higher = slower. |
| `GAIN_ATTACK` | `0.3` | n/a | IIR coefficient for gain decreasing. Lower = faster. |
| `GAIN_RELEASE` | `0.98` | n/a | IIR coefficient for gain increasing. Higher = slower. |
| `MAX_GAIN` | `10.0` | +20 dB | Maximum boost applied to quiet tracks. |
| `MIN_GAIN` | `0.1` | −20 dB | Maximum cut applied to loud tracks. |
| `GATE_THRESH` | `0.001` | −60 dBFS | RMS at or below this → treat chunk as silence. |

### Choosing TARGET_RMS

`TARGET_RMS` is the most impactful parameter. A few reference points:

| Value | dBFS | Character |
|---|---|---|
| `0.071` | −23 dBFS | EBU R128 broadcast standard (very conservative) |
| `0.178` | −15 dBFS | Apple Music / AES streaming recommendation |
| `0.200` | −14 dBFS | Spotify / YouTube streaming target |
| `0.350` | −9 dBFS | **Current default**: loud and punchy |
| `0.500` | −6 dBFS | Very loud; risk of clipping on loud source material |

The broadcast and streaming figures are specified in LUFS, so they are only approximately comparable to an RMS level (see [Known Limitations](#known-limitations)).

### Convergence Time Reference

IIR convergence depends on the chunk size. For an 8 192-byte chunk at 44 100 Hz stereo (2 048 frames, ≈ 46 ms per chunk):

| Parameter | Coefficient | ~Time to move 63% of the way to target |
|---|---|---|
| `RMS_ATTACK` | 0.3 | 1 chunk ≈ 46 ms |
| `RMS_RELEASE` | 0.99 | 100 chunks ≈ 4.6 s |
| `GAIN_ATTACK` | 0.3 | 1 chunk ≈ 46 ms |
| `GAIN_RELEASE` | 0.98 | 50 chunks ≈ 2.3 s |

Time constant τ = `−chunk_duration / ln(α)`. For `α = 0.98` and chunk = 46 ms: τ = −46 ms / ln(0.98) ≈ 2.3 s.

---

## Position in the Signal Chain

The normalizer runs **after** software volume scaling and **before** the audio is written to any output sink:

```
Rockbox audio engine
        │
        ▼  raw S16LE stereo PCM (read-only buffer from firmware)
┌───────────────────────┐
│ pcm_copy_buffer()     │  applies the user's SW volume setting
│ (pcm_sw_volume.c)     │  writes into a per-sink scratch buffer
└───────────────────────┘
        │
        ▼  volume-scaled PCM (writable scratch buffer)
┌───────────────────────┐
│ pcm_normalizer_apply()│  measures RMS, updates gain, applies in-place
│ (pcm_normalizer.c)    │
└───────────────────────┘
        │
        ▼  normalised PCM
┌───────────────────────┐
│ PCM sink              │  SDL / FIFO / AirPlay / Squeezelite /
│                       │  UPnP / Chromecast / Snapcast TCP
└───────────────────────┘
```

Because the normalizer runs after SW volume, it measures and targets the _post-volume-control_ signal level. If the user lowers the volume, the normalizer sees a quieter signal and raises its gain to compensate, offsetting the volume reduction until the gain reaches `MAX_GAIN`. This is intentional: at any volume setting, loudness across tracks remains consistent.

---

## Enabling the Normalizer

Add to `~/.config/rockbox.org/settings.toml`:

```toml
normalize_volume = true
```

The setting is read at startup by `crates/settings/src/lib.rs`, which calls `pcm_normalizer_enable(true)`. It is also persisted back to disk on `write_settings()` so the preference survives restarts.

The normalizer can also be toggled at runtime via the Rust FFI:

```rust
// crates/sys/src/sound/normalizer.rs
rockbox_sys::sound::normalizer::enable(true);
let on = rockbox_sys::sound::normalizer::is_enabled();
```

---

## Comparison with ReplayGain

Rockbox also supports ReplayGain, which is a pre-computed per-track gain stored in file tags. The two approaches are complementary:

| | ReplayGain | PCM Normalizer |
|---|---|---|
| **Requires track analysis** | Yes (offline scan) | No |
| **Works on streams / radio** | No | Yes |
| **Accuracy** | Very high (full-track analysis) | Moderate (real-time estimate) |
| **Artefacts** | None | Slight pumping on highly dynamic content |
| **Target** | Configurable per standard | `TARGET_RMS` compile constant |
| **Processing cost** | Zero at runtime | ~1–2% CPU (RMS + gain loop) |

For local music libraries, ReplayGain is generally preferred when tags are available. The PCM normalizer is the practical choice for streaming sources or when ReplayGain tags are missing.

Both can be active simultaneously. When ReplayGain is applied by the DSP engine (before the DMA stage), the PCM normalizer sees the already-normalised signal and applies only a small residual correction.

---

## Known Limitations

- **No lookahead.** The normalizer reacts to what has already been played. A sudden loud transient at the very start of a track will play at the previous gain for the first chunk (~46 ms) before the attack kicks in. In practice this is inaudible for most music.

- **RMS is not LUFS.** RMS amplitude correlates with perceived loudness but is not identical to the ITU-R BS.1770 Integrated Loudness (LUFS) metric used by broadcast standards. Content with heavy bass or aggressive dynamic compression may feel louder than its RMS suggests.

- **Chunk-size dependency.** The IIR coefficients are applied once per chunk, so they produce the quoted time constants only at the assumed 46 ms chunk size. Smaller chunks (e.g., during low-latency mode) update the filters more often per second, which makes the attack and release faster in wall-clock time (τ = `−chunk_duration / ln(α)` shrinks with the chunk duration); larger chunks make them slower. Chunk size is determined by the audio engine and is not directly configurable.

- **State is shared across tracks.** The `gain` and `rms_estimate` state variables are not reset between tracks. This is generally desirable (it prevents a jump in loudness at a track boundary), but it means the normalizer's gain at the start of a new track reflects the previous track's loudness. Tracks that differ wildly in level may take a few seconds to settle.
