EP39: How Spotify Knows You Are Too Loud — The LUFS Algorithm

ITU-R BS.1770: K-weighting biquads, dual gating, true peak

▶ 6:25 · Signal Processing · Psychoacoustics

Prerequisite episodes

EP24 The Information Code of Pop Music
EP30 The Sampling Theorem Is Not What You Think
EP37 Your EQ Moves Poles — Biquad Filters

Overview

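“Have you ever noticed that Spotify automatically brings every song to roughly the same volume? Tracks that are too loud get turned down, and tracks that are too quiet get turned up. The standard behind this is ITU-R BS.1770, established by the International Telecommunication Union in 2006; its full title is ‘Algorithms to measure audio programme loudness and true-peak audio level’.”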

When Spotify streams a song, it does not simply play the file at face value. Every track is measured by the ITU-R BS.1770 loudness standard and its gain is adjusted so the integrated loudness lands near a common target. The unit of measurement is LUFS — Loudness Units relative to Full Scale — a psychoacoustically weighted power measure that approximates how the human auditory system perceives volume.

The algorithm has four mathematically distinct stages:

K-weighting — two cascaded biquad IIR filters that approximate the ear’s frequency sensitivity curve.
Mean-square block analysis — 400 ms windows with 75% overlap, per-channel energy accumulation.
Dual-threshold gating — an absolute gate at −70 LUFS and a relative gate 10 LU below the ungated mean, to exclude silence and very quiet passages.
True peak measurement — 4× sinc oversampling to detect inter-sample peaks that lie between digital sample points.

This episode connects directly to three earlier results: Biquad IIR filter design (EP37) supplies the mathematical framework for the K-weighting cascade. Sinc interpolation (EP30) underpins the true peak oversampling step. Information entropy of pop music (EP24) gives the information-theoretic lens on the loudness war and dynamic range compression.

Prerequisites

Topic	Where covered
Biquad IIR filter: transfer function, bilinear transform, $H(z)$ poles/zeros	EP37
Sinc interpolation, Nyquist–Shannon sampling theorem	EP30
Shannon entropy, dynamic range as information	EP24
Decibels: $\text{dB} = 20\log_{10}|x|$ for amplitude, $10\log_{10}P$ for power	EP01

Definitions

Definition 39.1 (LUFS (Loudness Units Relative to Full Scale))

Let $P$ be the mean-square power of a K-weighted, gated audio signal, normalized so that digital full scale corresponds to 0 dBFS. The loudness in LUFS is

$L = -0.691 + 10\log_{10} P,$

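where the constant $-0.691$ cancels the gain of the K-weighting filter at 1 kHz (about $+0.69$ dB), so that a 1 kHz sine wave at 0 dBFS in a single front channel reads $\approx -3.01$ LUFS, which is the level of its mean-square power, $10\log_{10}(0.5)$. A change of 1 LUFS equals 1 LU (Loudness Unit); absolute levels are quoted in LUFS and differences in LU.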
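The definition can be sketched in a few lines of Python. The function name `lufs_from_power` is hypothetical, and the 1 kHz check assumes a K-weighting gain of about +0.691 dB at that frequency, which is the value the offset is meant to cancel.

```python
import math

K_OFFSET_DB = -0.691  # constant from Definition 39.1


def lufs_from_power(p: float) -> float:
    """Loudness in LUFS for a K-weighted, gated mean-square power p."""
    if p <= 0.0:
        return float("-inf")  # digital silence has no finite loudness
    return K_OFFSET_DB + 10.0 * math.log10(p)


# A unit-amplitude sine has mean-square power 0.5 before weighting.
# Assumption: K-weighting adds about +0.691 dB of power gain at 1 kHz.
k_gain_1khz = 10.0 ** (0.691 / 10.0)
sine_reading = lufs_from_power(0.5 * k_gain_1khz)
print(round(sine_reading, 2))  # -3.01
```

The offset and the assumed filter gain cancel, leaving only the $10\log_{10}(0.5)$ term.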

Definition 39.2 (K-Weighting Filter)

The K-weighting response $H_K(z) = H_1(z)\,H_2(z)$ is the cascade of two biquad IIR filters at sample rate $f_s$ :

Stage 1 (head-diffraction high-shelf): centre frequency $f_0 = 1681.97$ Hz, gain $G = +4$ dB, quality factor $Q = 1/\sqrt{2}$ .
Stage 2 (RLB high-pass): cutoff $f_c = 38.13$ Hz, $Q = 0.5$ .

Each stage has the canonical biquad transfer function

$H_i(z) = \frac{b_{0,i} + b_{1,i}z^{-1} + b_{2,i}z^{-2}}{1 + a_{1,i}z^{-1} + a_{2,i}z^{-2}}.$

Coefficients are derived via the bilinear transform with pre-warping at the respective centre/cutoff frequency.
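As a minimal sketch of that derivation, the following Python computes both coefficient sets at 48 kHz. The high-pass formulas are the standard bilinear-transform form; the high-shelf parametrisation (shelf gain `vh` with a mid-band term `vb = sqrt(vh)`) is an assumption, since the definition above does not spell it out, and the function names are hypothetical.

```python
import math


def high_pass_biquad(fc: float, q: float, fs: float):
    """Second-order high-pass via the bilinear transform, pre-warped at fc."""
    k = math.tan(math.pi * fc / fs)
    a0 = 1.0 + k / q + k * k
    b = (1.0 / a0, -2.0 / a0, 1.0 / a0)
    a = (1.0, 2.0 * (k * k - 1.0) / a0, (1.0 - k / q + k * k) / a0)
    return b, a


def high_shelf_biquad(f0: float, gain_db: float, q: float, fs: float):
    """Second-order high shelf, pre-warped at f0.

    Assumed parametrisation: shelf gain vh, mid-band term vb = sqrt(vh).
    """
    k = math.tan(math.pi * f0 / fs)
    vh = 10.0 ** (gain_db / 20.0)
    vb = math.sqrt(vh)
    a0 = 1.0 + k / q + k * k
    b = ((vh + vb * k / q + k * k) / a0,
         2.0 * (k * k - vh) / a0,
         (vh - vb * k / q + k * k) / a0)
    a = (1.0, 2.0 * (k * k - 1.0) / a0, (1.0 - k / q + k * k) / a0)
    return b, a


shelf_b, shelf_a = high_shelf_biquad(1681.97, 4.0, 1.0 / math.sqrt(2.0), 48000.0)
hp_b, hp_a = high_pass_biquad(38.13, 0.5, 48000.0)
print([round(c, 4) for c in shelf_b], [round(c, 4) for c in shelf_a])
print([round(c, 5) for c in hp_b], [round(c, 5) for c in hp_a])
```

With the rounded parameters from the definition, the results should land within about $10^{-4}$ of the coefficients usually quoted for 48 kHz, apart from a constant numerator scaling on the high-pass stage.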

Definition 39.3 (Analysis Block)

An analysis block is a segment of $N_B = \lfloor 0.4\,f_s \rfloor$ samples (400 ms at the signal’s native sample rate), advanced by a hop of $N_H = \lfloor 0.1\,f_s \rfloor$ samples (100 ms), giving 75% overlap between consecutive blocks. The index set of block $k$ is $\{kN_H,\; kN_H+1,\; \ldots,\; kN_H + N_B - 1\}$.
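The block layout can be sketched as follows. The helper `analysis_blocks` is a hypothetical name, and discarding a trailing partial block is an assumption of this sketch. Integer arithmetic is used for the floor so that no floating-point rounding can shift a block boundary.

```python
def analysis_blocks(num_samples: int, fs: int):
    """Return (N_B, N_H, start indices) for every complete 400 ms block."""
    n_b = (4 * fs) // 10   # floor(0.4 * fs), block length in samples
    n_h = fs // 10         # floor(0.1 * fs), hop in samples
    if num_samples < n_b:
        return n_b, n_h, []  # too short for even one block
    starts = list(range(0, num_samples - n_b + 1, n_h))
    return n_b, n_h, starts


# One second of audio at 48 kHz.
n_b, n_h, starts = analysis_blocks(48000, 48000)
print(n_b, n_h, len(starts))  # 19200 4800 7
```

One second of audio therefore yields only seven complete blocks, which is why very short clips give the gating little to work with.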

Definition 39.4 (Block Loudness)

For block $k$ , let $\tilde{x}_{i}[n]$ denote the K-weighted sample of channel $i$ at time $n$ , and let $G_i$ be the channel weighting coefficient. The block loudness is

$L_k = -0.691 + 10\log_{10}\!\left(\sum_{i} G_i \,\bar{x}_{i,k}^2\right),$

where $\bar{x}_{i,k}^2 = \frac{1}{N_B}\sum_{n=kN_H}^{kN_H+N_B-1} \tilde{x}_i[n]^2$ is the mean square energy of channel $i$ in block $k$ .

Definition 39.5 (Channel Weighting Coefficients)

The ITU-R BS.1770-4 standard specifies the following per-channel coefficients $G_i$ :

Channel	$G_i$
Left (L)	1.0
Right (R)	1.0
Centre (C)	1.0
Left surround (Ls)	1.41
Right surround (Rs)	1.41
Low-frequency effects (LFE)	0

The LFE channel is excluded because sub-bass energy below ~80 Hz is not perceived as loudness in the conventional sense.
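Definitions 39.4 and 39.5 combine into a short function. This is a sketch under the assumption that the per-channel mean squares have already been computed from K-weighted samples; the channel labels used as dictionary keys and the name `block_loudness` are illustrative.

```python
import math

# Channel weights G_i from the table above.
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41, "LFE": 0.0}


def block_loudness(mean_squares: dict) -> float:
    """Block loudness in LUFS from per-channel K-weighted mean squares."""
    total = sum(CHANNEL_WEIGHTS[ch] * ms for ch, ms in mean_squares.items())
    if total <= 0.0:
        return float("-inf")  # silent (or LFE-only) block
    return -0.691 + 10.0 * math.log10(total)


stereo = block_loudness({"L": 0.050, "R": 0.045})
print(round(stereo, 2))  # -10.91
```

Because the LFE weight is zero, a block containing only LFE energy evaluates to negative infinity and can never pass the gates.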

Definition 39.6 (Dual-Gate Integrated Loudness)

Let $\mathcal{B}$ be the full set of analysis blocks. Define two gating passes:

Absolute gate: $\mathcal{B}_1 = \{k \in \mathcal{B} : L_k \geq -70\}$ .
Preliminary mean: $\Gamma = -0.691 + 10\log_{10}\!\left(\frac{1}{|\mathcal{B}_1|}\sum_{k\in\mathcal{B}_1} 10^{(L_k+0.691)/10}\right)$ .
Relative gate: $\mathcal{B}_2 = \{k \in \mathcal{B}_1 : L_k \geq \Gamma - 10\}$ .

The integrated loudness of the programme is

$I = -0.691 + 10\log_{10}\!\left(\frac{1}{|\mathcal{B}_2|}\sum_{k\in\mathcal{B}_2} 10^{(L_k+0.691)/10}\right).$
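The two gating passes can be sketched directly from this definition. The function names are hypothetical, and the input is assumed to be a list of block loudness values $L_k$ that have already been computed.

```python
import math


def _power(loudness: float) -> float:
    """Linear power p_k recovered from a block loudness L_k."""
    return 10.0 ** ((loudness + 0.691) / 10.0)


def _mean_loudness(blocks) -> float:
    """Power-domain mean of block loudnesses, converted back to LUFS."""
    return -0.691 + 10.0 * math.log10(sum(_power(l) for l in blocks) / len(blocks))


def integrated_loudness(block_loudnesses) -> float:
    gated_abs = [l for l in block_loudnesses if l >= -70.0]   # absolute gate
    if not gated_abs:
        return float("-inf")
    gamma = _mean_loudness(gated_abs)                         # preliminary mean
    gated_rel = [l for l in gated_abs if l >= gamma - 10.0]   # relative gate
    return _mean_loudness(gated_rel)


# Ten loud blocks, five quiet ones, five near-silent ones.
blocks = [-20.0] * 10 + [-40.0] * 5 + [-80.0] * 5
print(round(integrated_loudness(blocks), 2))  # -20.0
```

In this toy input the absolute gate drops the −80 LUFS blocks, the preliminary mean comes out near −21.7 LUFS, and the relative gate then drops the −40 LUFS blocks, leaving only the loud material.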

Definition 39.7 (True Peak Level)

Let $x[n]$ be a digital audio signal at sample rate $f_s$ . Its true peak level (in dBTP — decibels relative to full scale true peak) is

$\mathrm{TP} = 20\log_{10}\!\max_m \bigl|x_{\uparrow}[m]\bigr|,$

where $x_{\uparrow}[m]$ is the signal upsampled to $4f_s$ (or higher) via sinc interpolation, so that inter-sample peaks between original sample positions are captured.

Main Theorems

Theorem 39.1 (K-Weighting Cascade Response)

The combined K-weighting response $H_K(e^{j\omega})$ satisfies:

(i) High-frequency boost: $|H_K| > 1$ throughout the shelf region. The gain rises through a transition band centred near 1.68 kHz, is roughly $+3$ dB at 2 kHz, and levels off at approximately $+4$ dB above about 4 kHz, so the boost covers the 2–4 kHz region of peak human auditory sensitivity.

(ii) Low-frequency rejection: For $f \ll 38$ Hz, $|H_K(e^{j2\pi f/f_s})| \to 0$ with a second-order (12 dB/octave) roll-off.

(iii) Passband reference: At 1 kHz, $|H_K| \approx 1.08$ (about $+0.69$ dB). The $-0.691$ constant in the loudness formula cancels this gain, which makes 1 kHz the normalization anchor.

Proof.

Write $H_K = H_1 H_2$ where $H_1$ is the high-shelf and $H_2$ is the high-pass biquad.

Part (i). The high-shelf $H_1$ has gain approaching $10^{G/20} = 10^{4/20} \approx 1.585$ (i.e., +4 dB) at high frequencies, while $H_2$ is near unity in the same band (its cutoff at 38 Hz is far below 2 kHz). A shelf has no resonant peak: its gain climbs from 0 dB through the transition band centred at 1681.97 Hz, where it is a little over +2 dB, reaches roughly +3 dB at 2 kHz, and is within about 0.1 dB of the +4 dB plateau by 4 kHz. The plateau therefore covers the band in which the Fletcher–Munson equal-loudness curves show the ear to be most sensitive.

Part (ii). The high-pass $H_2$ has a second-order numerator zero at DC ( $z = 1$ , i.e., $\omega = 0$ ) with $b_0 = b_2 = 1/a_0$ and $b_1 = -2/a_0$ , so $H_2(1) = 0$ . Near DC, $|H_2(e^{j\omega})| \propto \omega^2$ , giving 12 dB/octave attenuation. Multiplying by $H_1$ , which has finite nonzero gain at DC, preserves the roll-off rate.

Part (iii). At 1 kHz the shelf is in the lower part of its transition region and contributes approximately +0.69 dB, while the high-pass is effectively transparent. The ITU normalization constant $-0.691$ dB in the loudness formula cancels this gain, so that a 1 kHz sine at 0 dBFS reads −3.01 LUFS, the level of its unweighted mean-square power. $\square$
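The three claims can be checked numerically by evaluating the cascade on the unit circle. The coefficient values below are the 48 kHz values commonly quoted from the BS.1770 tables; treat them as an assumption to be verified against the standard. The function names are hypothetical.

```python
import cmath
import math

FS = 48000.0
# Assumption: 48 kHz biquad coefficients (b, a) as commonly quoted from BS.1770.
SHELF = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
         (1.0, -1.69065929318241, 0.73248077421585))
HIGH_PASS = ((1.0, -2.0, 1.0),
             (1.0, -1.99004745483398, 0.99007225036621))


def biquad_response(b, a, f_hz, fs=FS):
    """Complex frequency response H(e^{jw}) of one biquad at f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs)  # z^-1 on the unit circle
    z2 = z1 * z1
    return (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)


def k_weighting_gain_db(f_hz: float) -> float:
    """Magnitude of the two-stage cascade in dB."""
    h = biquad_response(*SHELF, f_hz) * biquad_response(*HIGH_PASS, f_hz)
    return 20.0 * math.log10(abs(h))


for f in (10.0, 20.0, 100.0, 997.0, 2000.0, 10000.0):
    print(f"{f:8.0f} Hz  {k_weighting_gain_db(f):+7.2f} dB")
```

The 997 Hz row should come out near +0.69 dB, the 10 kHz row near +4 dB, and halving the frequency well below 38 Hz should cost close to 12 dB.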

Theorem 39.2 (Block Loudness Formula)

For a single-channel signal $\tilde{x}[n]$ (K-weighted, $G = 1$ ) with mean-square power $P_k = \frac{1}{N_B}\sum_{n} \tilde{x}[n]^2$ in block $k$ , the block loudness $L_k$ satisfies

$L_k = 10\log_{10} P_k - 0.691.$

Moreover, if $P_k \leq 10^{-7}$ (approximately −70 dBFS power), then $L_k \leq -70$ LUFS, placing the block below the absolute gate.

Proof.

By Definition 39.4 with a single channel and $G = 1$ ,

$L_k = -0.691 + 10\log_{10}\!\left(\bar{x}_k^2\right) = 10\log_{10} P_k - 0.691.$

For the gate condition: $L_k \leq -70$ iff $10\log_{10} P_k \leq -70 + 0.691 = -69.309$, i.e., $P_k \leq 10^{-6.9309} \approx 1.17 \times 10^{-7}$. Since $10^{-7} < 1.17 \times 10^{-7}$, any block with $P_k \leq 10^{-7}$ satisfies $L_k \leq -70.691 < -70$ LUFS and is therefore excluded by the absolute gate. $\square$
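The threshold arithmetic is easy to reproduce. This small sketch inverts the loudness formula to find the mean-square power at the absolute gate; `power_at_loudness` is a hypothetical helper name.

```python
import math


def power_at_loudness(lufs: float) -> float:
    """Mean-square power P for which -0.691 + 10*log10(P) equals lufs."""
    return 10.0 ** ((lufs + 0.691) / 10.0)


gate_power = power_at_loudness(-70.0)
gate_rms = math.sqrt(gate_power)  # equivalent K-weighted RMS amplitude
print(f"{gate_power:.3e}")  # 1.172e-07
```

The equivalent RMS amplitude is on the order of $3 \times 10^{-4}$ of full scale, so only genuinely near-silent blocks fall below the absolute gate.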

Theorem 39.3 (Dual-Gate Integrated Loudness)

The dual-gate integrated loudness $I$ (Definition 39.6) is the power-domain mean of all gated block loudnesses. Specifically, if we define the per-block linear power $p_k = 10^{(L_k + 0.691)/10}$ , then

$I = -0.691 + 10\log_{10}\!\left(\frac{1}{|\mathcal{B}_2|}\sum_{k \in \mathcal{B}_2} p_k\right).$

Furthermore, the relative gate is equivalent to the condition $p_k \geq 10^{(\Gamma + 0.691)/10} / 10$ , i.e., blocks must have linear power at least one-tenth of the preliminary gated mean.

Proof.

The integrated loudness formula in Definition 39.6 is definitionally a power average. To see the equivalence with the relative gate: the condition $L_k \geq \Gamma - 10$ rearranges as

$\frac{L_k + 0.691}{10} \geq \frac{(\Gamma - 10) + 0.691}{10} = \frac{\Gamma + 0.691}{10} - 1,$

so $p_k = 10^{(L_k+0.691)/10} \geq 10^{(\Gamma+0.691)/10 - 1} = \bar{p}_{\mathcal{B}_1} / 10$ , where $\bar{p}_{\mathcal{B}_1}$ is the preliminary gated linear power. This confirms the 10 LU offset is a factor-of-10 power ratio in the linear domain. $\square$

Theorem 39.4 (True Peak via 4x Sinc Oversampling)

Let $x[n]$ be a band-limited digital signal with bandwidth $B < f_s/2$ . Define the 4× upsampled signal

$x_{\uparrow}[m] = \sum_{n=-\infty}^{\infty} x[n]\,\mathrm{sinc}\!\left(\frac{m}{4} - n\right),$

where $\mathrm{sinc}(t) = \frac{\sin(\pi t)}{\pi t}$ . Then:

(i) $x_{\uparrow}[m] = x[n]$ whenever $m = 4n$ (the upsampled signal agrees with the original at integer multiples of 4).

(ii) $\max_m |x_{\uparrow}[m]| \geq \max_n |x[n]|$, and equality holds exactly when no point of the upsampled grid exceeds the largest sample magnitude.

(iii) The true peak error $\epsilon = \max_m |x_{\uparrow}[m]| - \sup_t |x_c(t)|$ , where $x_c$ is the ideal continuous reconstruction, satisfies $|\epsilon| \leq \delta$ for some small $\delta$ determined by the oversampling ratio and by the finite-length interpolation filter used in place of the ideal sinc. At 4× oversampling, BS.1770 specifies a maximum error of +0.5/−1.0 dBTP.

Proof.

Part (i). Substituting $m = 4n_0$ into the sinc sum: $x_{\uparrow}[4n_0] = \sum_n x[n]\,\mathrm{sinc}(n_0 - n) = x[n_0]$ , since $\mathrm{sinc}(k) = \delta[k]$ for integer $k$ .

Part (ii). The upsampled grid $\{m/(4f_s)\}$ contains the original grid $\{n/f_s\}$ as a subset (by part (i), at $m = 4n$), so the maximum over the finer grid is at least as large. Equality holds iff the upsampled signal attains its maximum magnitude at one of the original sample points.

Part (iii). With an ideal sinc, the grid maximum can only under-read $\sup_t |x_c(t)|$, by an amount that shrinks as the oversampling ratio grows. A practical meter replaces the sinc with a finite-length interpolation filter, whose passband ripple and stopband leakage add a further error of either sign. BS.1770 Annex 2 specifies a minimum filter order and stopband attenuation such that the oversampling measurement stays within the stated tolerance. $\square$
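Parts (i) and (ii) can be illustrated with a direct evaluation of the sinc sum. This is a sketch, not a meter: it uses the ideal sinc over a finite sequence (samples outside the sequence are taken as zero) rather than the finite interpolation filter a practical implementation would use, and the Hann-tapered test tone at $f_s/4$ is an illustrative choice. All names are hypothetical.

```python
import math


def sinc(t: float) -> float:
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)


def upsample_sinc(x, factor=4):
    """Evaluate the sinc sum on a grid `factor` times finer (O(len(x)^2))."""
    n_out = factor * (len(x) - 1) + 1
    return [sum(xn * sinc(m / factor - n) for n, xn in enumerate(x))
            for m in range(n_out)]


def true_peak_db(x, factor=4):
    return 20.0 * math.log10(max(abs(v) for v in upsample_sinc(x, factor)))


# Tone at fs/4 with a 45 degree phase offset: every sample sits at
# +/-0.707 of the crest. A Hann taper keeps the finite sequence well behaved.
N = 64
tone = [0.5 * (1.0 - math.cos(2.0 * math.pi * n / (N - 1)))
        * math.sin(math.pi * n / 2.0 + math.pi / 4.0) for n in range(N)]

sample_peak_db = 20.0 * math.log10(max(abs(v) for v in tone))
tp_db = true_peak_db(tone)
print(round(sample_peak_db, 2), round(tp_db, 2))
```

For this tone the sample peak under-reads the interpolated peak by close to 3 dB, the largest discrepancy a steady sine at $f_s/4$ can produce.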

Prop 39.5 (Loudness War Upper Bound)

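If a signal is subjected to brick-wall limiting so that its sample values satisfy $|x[n]| \leq A_{\max}$, then, neglecting the K-weighting gain (that is, assuming the weighted samples obey the same bound $|\tilde{x}_i[n]| \leq A_{\max}$), its integrated loudness is bounded above by

$I \leq -0.691 + 10\log_{10}\!\left(\sum_i G_i A_{\max}^2\right).$

For stereo ( $G_L = G_R = 1$ ), this gives $I \leq -0.691 + 10\log_{10}(2 A_{\max}^2)$ . At digital full scale $A_{\max} = 1$ , the ceiling is approximately $-0.691 + 3.01 \approx +2.3$ LUFS, a practical upper bound on how loud a stereo programme can measure. Material dominated by content above about 2 kHz can read higher, by up to roughly 4 dB, because the shelf boosts it before the power is measured.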

Proof.

Since $\bar{x}_{i,k}^2 \leq A_{\max}^2$ for all blocks $k$ and channels $i$, the block loudness satisfies $L_k \leq -0.691 + 10\log_{10}\!\left(\sum_i G_i A_{\max}^2\right)$. Taking the gated power average preserves the inequality, so $I \leq -0.691 + 10\log_{10}\!\left(\sum_i G_i A_{\max}^2\right)$. $\square$
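The bound is simple to evaluate for other ceilings and channel layouts. The sketch below assumes, as the proposition's formula does, that the K-weighted samples respect the same amplitude limit; `loudness_ceiling` is a hypothetical name.

```python
import math


def loudness_ceiling(a_max: float, weights) -> float:
    """Upper bound on integrated loudness for samples limited to a_max."""
    return -0.691 + 10.0 * math.log10(sum(g * a_max ** 2 for g in weights))


stereo_ceiling = loudness_ceiling(1.0, [1.0, 1.0])
print(round(stereo_ceiling, 2))  # 2.32

# A limiter set 1 dB below full scale lowers the ceiling by the same 1 dB.
limited = loudness_ceiling(10.0 ** (-1.0 / 20.0), [1.0, 1.0])
print(round(limited, 2))  # 1.32
```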

Numerical Examples

Example 1: Computing Biquad Coefficients for Stage 2 (38 Hz High-Pass)

At $f_s = 48000$ Hz, cutoff $f_c = 38.13$ Hz, $Q = 0.5$ :

$K = \tan\!\left(\frac{\pi \cdot 38.13}{48000}\right) \approx \tan(0.0024956) \approx 0.0024956.$

The normalization constant:

$a_0 = 1 + \frac{K}{Q} + K^2 = 1 + \frac{0.0024956}{0.5} + (0.0024956)^2 \approx 1.004997.$

High-pass biquad numerator coefficients:

$b_0 = \frac{1}{a_0} \approx 0.99503, \quad b_1 = \frac{-2}{a_0} \approx -1.99005, \quad b_2 = \frac{1}{a_0} \approx 0.99503.$

Denominator coefficients:

$a_1 = \frac{2(K^2 - 1)}{a_0} \approx \frac{-1.99999}{1.004997} \approx -1.99004, \quad a_2 = \frac{1 - K/Q + K^2}{a_0} \approx \frac{0.99502}{1.004997} \approx 0.99007.$

The resulting filter is close to transparent above a few hundred hertz (it is still about 1 dB down at 100 Hz) and rolls off at 12 dB/octave below the 38 Hz cutoff.

Example 2: Block Loudness Calculation (Stereo)

Suppose a 400 ms stereo block has K-weighted mean-square energies:

$\bar{x}_{L}^2 = 0.050, \quad \bar{x}_{R}^2 = 0.045.$

With $G_L = G_R = 1$ :

$\sum_i G_i \bar{x}_i^2 = 0.050 + 0.045 = 0.095.$

Block loudness:

$L_k = -0.691 + 10\log_{10}(0.095) = -0.691 + 10 \times (-1.0223) = -0.691 - 10.223 \approx -10.91 \text{ LUFS}.$

This block passes the absolute gate (−10.91 > −70) and would also survive a relative gate unless the preliminary mean were above −0.91 LUFS — essentially impossible in practice.

Example 3: Dual-Gate Pass Through

Suppose a programme has 500 analysis blocks. After the absolute gate (−70 LUFS), 450 blocks survive. Their preliminary mean is:

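$\Gamma = -0.691 + 10\log_{10}\!\left(\frac{1}{450}\sum_{k \in \mathcal{B}_1} p_k\right) \approx -23.0 \text{ LUFS}.$

The relative gate threshold is $-23.0 - 10 = -33.0$ LUFS. After discarding the 80 blocks below −33 LUFS (quiet passages, fade-outs), 370 blocks remain. Each discarded block carries less than one-tenth of the preliminary mean power, so removing them raises the mean power by a factor between $442/370$ and $450/370$, that is, by about 0.8 dB. Recomputing the mean over the 370 surviving blocks therefore yields an integrated loudness of $I \approx -22.2$ LUFS.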

Example 4: Inter-Sample Peak Detection

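Two adjacent samples $x[0] = 0.90$ and $x[1] = 0.90$ straddle the crest of a sine wave sampled at $f_s = 44100$ Hz. The digital peak meter reads $20\log_{10}(0.90) \approx -0.92$ dBFS. Between the two samples, however, the underlying sinusoid rises to its true peak, which for samples placed symmetrically about the crest is $0.90/\cos(\pi f/f_s)$. For a tone near 4.8 kHz this gives a continuous peak of about $0.956$; at 1 kHz the samples are so closely spaced that the crest is only about $0.902$. The 4× upsampled interpolation adds three points between each pair of original samples, one of which lands on the crest, so the true peak level is $20\log_{10}(0.956) \approx -0.39$ dBTP, about 0.5 dB higher than the sample-level meter shows. At levels closer to full scale, an under-read of this size can push the reconstructed waveform above 0 dBFS and cause clipping on playback, particularly after MP3 or AAC encoding.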

Example 5: Platform Gain Adjustment

A mastered track has integrated loudness $I = -9$ LUFS. Spotify targets −14 LUFS, so it applies a gain of:

$\Delta G = -14 - (-9) = -5 \text{ dB}.$

The resulting playback level is −14 LUFS. The track’s internal dynamics are unchanged, since a fixed gain shifts every passage equally, but its absolute level drops by 5 dB. Compared with a track mastered at −14 LUFS (which plays through unmodified on Spotify), the over-limited −9 LUFS track gains no loudness advantage: it plays at the same level, keeps its reduced dynamics, and leaves 5 dB of peak headroom unused.
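The gain computation can be sketched as below. This is a deliberately simple model: the −14 LUFS default is the target quoted above, the function names are hypothetical, and real services may cap positive gain to protect peaks, which is not modelled here.

```python
def normalisation_gain_db(integrated_lufs: float, target_lufs: float = -14.0) -> float:
    """Static gain in dB that moves a track's integrated loudness to the target."""
    return target_lufs - integrated_lufs


def db_to_linear(gain_db: float) -> float:
    """Amplitude scale factor for a gain in dB."""
    return 10.0 ** (gain_db / 20.0)


gain = normalisation_gain_db(-9.0)
print(gain, round(db_to_linear(gain), 4))  # -5.0 0.5623
```

Every sample of the −9 LUFS track is multiplied by roughly 0.56, so its peaks end up about 5 dB below wherever the limiter left them.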

Musical Connection

The Loudness War and Its Aftermath

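“This ties directly to the information theory of pop music from EP24: when dynamic range is compressed to the limit, the music’s entropy drops, its predictability rises, and listeners tire sooner.”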

The 20-year loudness war (roughly 1990–2010) is a case study in how a measurement standard — or the absence of one — reshapes an entire art form. With no universal reference level, each record label competed to make its releases sound louder on radio and in record stores. The weapon was dynamic range compression pushed to extremes.

Death Magnetic (Metallica, 2008) reached an integrated loudness of approximately −4 LUFS with a dynamic range of only 2–3 LU. Waveforms are visually “brick-walled” — the amplitude envelope is nearly flat. From the perspective of EP24’s Shannon entropy analysis, a signal with near-constant amplitude has lower entropy in its amplitude envelope, meaning less information about dynamics is transmitted to the listener. The auditory system habituates rapidly to constant stimulation, producing the “ear fatigue” frequently reported by listeners.

When We All Fall Asleep, Where Do We Go? (Billie Eilish, 2019) sits near −14 LUFS with a dynamic range of 8–10 LU. This is not merely an aesthetic preference — it is a rational response to the streaming normalization landscape. At −14 LUFS, the album plays on Spotify without any gain reduction. At −4 LUFS, every song would be turned down by 10 dB, erasing the loudness advantage entirely while retaining all the dynamic damage.

Classical recordings (symphonic works, solo piano) typically measure between −23 and −18 LUFS with dynamic ranges exceeding 20 LU. A full orchestra’s pianissimo might be at −45 LUFS while fortissimo passages approach −12 LUFS — a 33 LU range that would be annihilated by the brick-wall limiting applied to commercial pop. LUFS normalization allows these recordings to coexist on the same platform without forcing them into the same amplitude box.

The mathematical moral: LUFS is not just a metering convenience. It is an application of psychoacoustic weighting (K-filter), robust statistics (gated mean resists outlier silence blocks), and signal interpolation theory (true peak oversampling). The standard embeds a model of human perception directly into the arithmetic of loudness.

Limits and Open Questions

1. Spectral content blindness. K-weighting approximates the average human equal-loudness contour but cannot capture content-dependent loudness effects. A 3 kHz sine and a broadband noise signal can have identical LUFS values but very different perceived loudness, especially for listeners with hearing loss in specific frequency bands.

2. Stereo/mono equivalence problem. The channel weighting $G_i = 1$ for L/R treats stereo as twice the power of mono (sum of two equal channels). A mono signal panned center would measure approximately 3 dB quieter in LUFS than the same signal split L/R at equal levels — a subtle source of inconsistency in loudness normalization across formats.

3. Transient content and short-form media. The 400 ms block length and gating are calibrated for long-form programme material (TV, radio, full albums). For very short content (short-form social media videos, notification sounds, UI audio), the integrated loudness metric is not well-defined; only a few blocks exist and the gating may reject all or most of them.

4. Binaural and spatial audio. ITU-R BS.1770 was designed for channel-based audio. Ambisonics, binaural, and object-based formats (Dolby Atmos, Apple Spatial Audio) require extensions. The EBU is actively developing recommendations for spatial audio loudness measurement, but no universally adopted standard exists as of 2026.

5. True peak model assumptions. The 4× sinc oversampling true peak measurement assumes the signal is bandlimited to $f_s/2$ . After lossy compression (MP3, AAC), spectral content can be reconstructed in ways that violate this assumption, causing true peak estimates from the pre-encoded PCM to underestimate the actual peak level of the decoded output. This is why some mastering engineers target −1 dBTP or even −2 dBTP rather than letting peaks reach 0 dBTP.

6. Perceptual models beyond K-weighting. More sophisticated loudness models (Zwicker loudness, Moore–Glasberg model, PEAQ) incorporate masking, spectral spread, and temporal integration in ways that K-weighting does not. Research into whether these models should replace or supplement LUFS for streaming normalization is ongoing.

Academic References

ITU-R BS.1770-4 (2015). Algorithms to measure audio programme loudness and true-peak audio level. International Telecommunication Union, Geneva. The defining standard for LUFS measurement.
EBU R 128 (2014, rev. 2020). Loudness normalisation and permitted maximum level of audio signals. European Broadcasting Union. Companion recommendation to BS.1770; introduced the dual-threshold gating in 2011.
Camerer, F., Hoffmann, F., et al. (2011). “EBU R 128 loudness normalisation.” EBU Technical Review, 2011-Q3. Practical implementation notes and psychoacoustic rationale.
Soulodre, G. A. (2004). “Evaluation of objective loudness meters.” AES 116th Convention, Berlin. Pre-standard study that informed the K-weighting design.
Zwicker, E. & Fastl, H. (1999). Psychoacoustics: Facts and Models. 2nd ed. Springer. Chapter 8 covers loudness models from which K-weighting is a simplified derivative.
Lerch, A. (2012). An Introduction to Audio Content Analysis. Wiley-IEEE Press. Chapter 3 covers loudness and dynamics features in the context of MIR.
Vickers, E. (2010). “The Loudness War: Background, Speculation and Recommendations.” AES 129th Convention, San Francisco. Historical analysis of dynamic range compression trends 1990–2010.
Moore, B. C. J., Glasberg, B. R., & Baer, T. (1997). “A model for the prediction of thresholds, loudness and partial loudness.” Journal of the Audio Engineering Society, 45(4), 224–240. Foundation for psychoacoustic loudness models more sophisticated than K-weighting.