EP39: How Spotify Knows You Are Too Loud — The LUFS Algorithm
Overview / 概述
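Narration (translated from Chinese): "Have you noticed that Spotify automatically brings every song to roughly the same volume? Tracks that are too loud are turned down, and tracks that are too quiet are turned up. The standard behind this is ITU-R BS.1770, first published by the International Telecommunication Union in 2006 under the title 'Algorithms to measure audio programme loudness and true-peak audio level'."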
When Spotify streams a song, it does not simply play the file at face value. Every track is measured by the ITU-R BS.1770 loudness standard and its gain is adjusted so the integrated loudness lands near a common target. The unit of measurement is LUFS — Loudness Units relative to Full Scale — a psychoacoustically weighted power measure that approximates how the human auditory system perceives volume.
The algorithm has four mathematically distinct stages:
- K-weighting — two cascaded biquad IIR filters that approximate the ear’s frequency sensitivity curve.
- Mean-square block analysis — 400 ms windows with 75% overlap, per-channel energy accumulation.
- Dual-threshold gating — an absolute gate at −70 LUFS, then a relative gate 10 LU below the mean of the blocks that passed the absolute gate, to exclude silence and very quiet passages.
- True peak measurement — 4× sinc oversampling to detect inter-sample peaks that lie between digital sample points.
This episode connects directly to three earlier results: Biquad IIR filter design (EP37) supplies the mathematical framework for the K-weighting cascade. Sinc interpolation (EP30) underpins the true peak oversampling step. Information entropy of pop music (EP24) gives the information-theoretic lens on the loudness war and dynamic range compression.
Prerequisites / 前置知识
| Topic | Where covered |
|---|---|
| Biquad IIR filter: transfer function, bilinear transform, poles/zeros | EP37 |
| Sinc interpolation, Nyquist–Shannon sampling theorem | EP30 |
| Shannon entropy, dynamic range as information | EP24 |
| Decibels: $20\log_{10}$ for amplitude, $10\log_{10}$ for power | EP01 |
Definitions
Let $P$ be the mean-square power of a K-weighted, gated audio signal, normalized so that digital full scale corresponds to 0 dBFS. The loudness in LUFS is

$$L = -0.691 + 10\log_{10} P$$

where the constant $-0.691$ aligns the scale so that a 1 kHz sine wave at 0 dBFS reads $-3.01$ LUFS (matching the legacy PPM meter convention). A step of one LUFS equals one LU (Loudness Unit); differences are quoted in LU.
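The definition above can be sketched as a small helper. This is a minimal illustration that assumes the standard form $L = -0.691 + 10\log_{10} P$; the names `loudness_lufs` and `LUFS_OFFSET` are invented for this sketch and do not come from any library.

```python
import math

LUFS_OFFSET = -0.691  # calibration constant from the loudness formula


def loudness_lufs(mean_square_power: float) -> float:
    """Convert K-weighted mean-square power (full scale = 1.0) to LUFS."""
    if mean_square_power <= 0.0:
        return float("-inf")  # digital silence has no finite loudness
    return LUFS_OFFSET + 10.0 * math.log10(mean_square_power)


# A full-scale 1 kHz sine has raw mean-square power 0.5. K-weighting adds
# roughly +0.691 dB at that frequency, which the offset then removes.
k_weighted_power = 0.5 * 10 ** (0.691 / 10)
print(round(loudness_lufs(k_weighted_power), 2))  # -3.01
```

The guard for non-positive power is a design choice of the sketch: returning negative infinity lets later gating code discard silent blocks with an ordinary comparison.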
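The K-weighting response $H_K(z) = H_1(z)\,H_2(z)$ is the cascade of two biquad IIR filters at sample rate $f_s$:
- Stage 1 (head-diffraction high-shelf): centre frequency $f_0 \approx 1681.97$ Hz, gain $G \approx +4$ dB, quality factor $Q_1 \approx 0.707$.
- Stage 2 (RLB high-pass): cutoff $f_c \approx 38.14$ Hz, $Q_2 \approx 0.500$.
Each stage has the canonical biquad transfer function

$$H(z) = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}$$

Coefficients are derived via the bilinear transform with pre-warping at the respective centre/cutoff frequency.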
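In the time domain, the canonical biquad corresponds to the difference equation $y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] - a_1 y[n-1] - a_2 y[n-2]$. The sketch below applies the two stages in cascade using the 48 kHz coefficients tabulated in ITU-R BS.1770; the function names `biquad` and `k_weight` are hypothetical, and the pure-Python loop is written for clarity rather than speed.

```python
import math

# 48 kHz K-weighting coefficients as tabulated in ITU-R BS.1770:
# (b0, b1, b2) and (a1, a2), with a0 normalised to 1.
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (-1.69065929318241, 0.73248077421585)
HIGHPASS_B = (1.0, -2.0, 1.0)
HIGHPASS_A = (-1.99004745483398, 0.99007225036621)


def biquad(samples, b, a):
    """Direct-form I biquad with zero initial state."""
    b0, b1, b2 = b
    a1, a2 = a
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b0 * x0 + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def k_weight(samples):
    """Run the high-shelf and then the high-pass stage."""
    return biquad(biquad(samples, SHELF_B, SHELF_A), HIGHPASS_B, HIGHPASS_A)


fs = 48000
sine = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
tail = k_weight(sine)[fs // 2:]  # skip the start-up transient
gain_db = 10 * math.log10(sum(v * v for v in tail) / len(tail) / 0.5)
print(round(gain_db, 2))  # steady-state K-weighting gain at 1 kHz, in dB
```

The measured gain at 1 kHz should come out a little under 1 dB, which is the amount the constant in the loudness formula is designed to offset.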
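For block $j$, let $y_i[n]$ denote the K-weighted sample of channel $i$ at time $n$, and let $G_i$ be the channel weighting coefficient. The block loudness is

$$L_j = -0.691 + 10\log_{10}\left(\sum_i G_i\, z_{i,j}\right)$$

where $z_{i,j} = \frac{1}{N}\sum_{n \in \text{block } j} y_i[n]^2$ is the mean square energy of channel $i$ in block $j$, and $N$ is the number of samples per block.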
The ITU-R BS.1770-4 standard specifies the following per-channel coefficients $G_i$:
| Channel | $G_i$ |
|---|---|
| Left (L) | 1.0 |
| Right (R) | 1.0 |
| Centre (C) | 1.0 |
| Left surround (Ls) | 1.41 |
| Right surround (Rs) | 1.41 |
| Low-frequency effects (LFE) | 0 |
The LFE channel is excluded because sub-bass energy below ~80 Hz is not perceived as loudness in the conventional sense.
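The block analysis and channel weighting can be sketched as follows. The code assumes the channel data has already been K-weighted, uses the weights from the table above, and takes the 400 ms block length with 75% overlap from the overview; the function name `block_loudness` and the channel keys are illustrative choices only.

```python
import math

CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41, "LFE": 0.0}


def block_loudness(channels, fs, block_s=0.4, overlap=0.75):
    """Return the loudness of each 400 ms block of K-weighted channel data.

    `channels` maps a channel name to a list of samples.
    """
    size = int(round(block_s * fs))
    hop = int(round(size * (1.0 - overlap)))  # 100 ms step at 75 % overlap
    length = min(len(s) for s in channels.values())
    loudness = []
    for start in range(0, length - size + 1, hop):
        power = 0.0
        for name, samples in channels.items():
            block = samples[start:start + size]
            z = sum(v * v for v in block) / size  # mean square of this channel
            power += CHANNEL_WEIGHTS[name] * z
        if power > 0:
            loudness.append(-0.691 + 10 * math.log10(power))
        else:
            loudness.append(float("-inf"))
    return loudness


fs = 1000  # deliberately low rate to keep the toy example fast
stereo = {"L": [0.1] * fs, "R": [0.1] * fs}
blocks = block_loudness(stereo, fs)
print(len(blocks), round(blocks[0], 2))  # 7 -17.68
```

One second of audio yields seven overlapping blocks, and each channel contributes a mean square of 0.01, so the weighted sum is 0.02.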
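Let $\mathcal{B}$ be the full set of analysis blocks, and write $P_j = \sum_i G_i z_{i,j}$ for the channel-weighted mean-square power of block $j$. Define two gating passes, with a preliminary mean computed between them:
- Absolute gate: $\mathcal{B}_{\text{abs}} = \{\, j \in \mathcal{B} : L_j > -70 \text{ LUFS} \,\}$.
- Preliminary mean: $L_{\text{pre}} = -0.691 + 10\log_{10}\left(\frac{1}{|\mathcal{B}_{\text{abs}}|}\sum_{j \in \mathcal{B}_{\text{abs}}} P_j\right)$.
- Relative gate: $\mathcal{B}_{\text{rel}} = \{\, j \in \mathcal{B}_{\text{abs}} : L_j > L_{\text{pre}} - 10 \text{ LU} \,\}$.
The integrated loudness of the programme is

$$L_I = -0.691 + 10\log_{10}\left(\frac{1}{|\mathcal{B}_{\text{rel}}|}\sum_{j \in \mathcal{B}_{\text{rel}}} P_j\right)$$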
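The two gating passes can be sketched in a few lines, taking a list of per-block channel-weighted powers as input. The function names and the sample programme are hypothetical, and the thresholds are the −70 LUFS absolute gate and the 10 LU relative offset described above.

```python
import math


def to_lufs(power):
    return -0.691 + 10 * math.log10(power)


def integrated_loudness(block_powers):
    """Two-pass gated loudness from per-block weighted powers."""
    # Pass 1: absolute gate at -70 LUFS.
    abs_gated = [p for p in block_powers if p > 0 and to_lufs(p) > -70.0]
    if not abs_gated:
        return float("-inf")
    # Preliminary mean over the survivors sets the relative threshold.
    relative_threshold = to_lufs(sum(abs_gated) / len(abs_gated)) - 10.0
    # Pass 2: relative gate, applied to the absolute-gated blocks.
    rel_gated = [p for p in abs_gated if to_lufs(p) > relative_threshold]
    if not rel_gated:
        return float("-inf")
    return to_lufs(sum(rel_gated) / len(rel_gated))


# Hypothetical programme: 8 loud blocks, 2 quiet blocks, 2 near-silent blocks.
powers = [1e-2] * 8 + [1e-4] * 2 + [1e-9] * 2
print(round(integrated_loudness(powers), 2))  # -20.69
```

In this toy input the near-silent blocks fail the absolute gate and the quiet blocks fail the relative gate, so the result is simply the loudness of the eight loud blocks. Averaging happens on linear power, never on dB values.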
Let $x[n]$ be a digital audio signal at sample rate $f_s$. Its true peak level (in dBTP, decibels relative to full scale true peak) is

$$\mathrm{TP} = 20\log_{10}\left(\max_m |\tilde{x}[m]|\right)$$

where $\tilde{x}[m]$ is the signal upsampled to $4f_s$ (or higher) via sinc interpolation, so that inter-sample peaks between original sample positions are captured.
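A rough sketch of true peak estimation follows. It replaces the ideal (infinite) sinc with a Hann-windowed sinc of finite width, which is an assumption of this example rather than the filter specified by the standard, and the helper names are invented. The test signal is the classic worst case: a sine at $f_s/4$ whose samples all miss the crest.

```python
import math


def sinc(t):
    return 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)


def upsample(x, factor=4, half_width=32):
    """Windowed-sinc interpolation onto a grid `factor` times finer."""
    out = []
    for m in range((len(x) - 1) * factor + 1):
        t = m / factor  # position in input-sample units
        lo = max(0, math.ceil(t - half_width))
        hi = min(len(x) - 1, math.floor(t + half_width))
        acc = 0.0
        for n in range(lo, hi + 1):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))  # Hann taper
            acc += x[n] * sinc(d) * window
        out.append(acc)
    return out


def peak_db(samples):
    return 20 * math.log10(max(abs(v) for v in samples))


# Sine at fs/4 with a 45 degree phase offset: every sample is +/-0.7071,
# yet the underlying waveform reaches 1.0 halfway between samples.
x = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(256)]
up = upsample(x)
interior = up[32 * 4:-32 * 4]  # drop the edges, where the kernel is truncated
sample_peak = peak_db(x)
true_peak = peak_db(interior)
print(round(sample_peak, 2), round(true_peak, 2))
```

The sample peak reads about −3 dBFS while the interpolated peak lands close to 0 dBTP, a gap of roughly 3 dB that a sample-level meter cannot see.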
Main Theorems / 主要定理
The combined K-weighting response $H_K(e^{j\omega})$, with $\omega = 2\pi f / f_s$, satisfies:
(i) High-frequency boost: For $f \gtrsim 2$ kHz, $|H_K| > 1$, with the gain rising through the 2–4 kHz region of peak human auditory sensitivity and levelling off at approximately $+4$ dB.
(ii) Low-frequency rejection: For $f \ll 38$ Hz, $|H_K| \propto f^2$, a second-order (12 dB/octave) roll-off.
(iii) Passband reference: At 1 kHz, $|H_K| \approx 1.08$ (about $+0.69$ dB), which the constant $-0.691$ in the loudness formula cancels, establishing the normalization anchor.
Write $H_K = H_1 H_2$ where $H_1$ is the high-shelf and $H_2$ is the high-pass biquad.
Part (i). The high-shelf $H_1$ has gain approaching $10^{4/20} \approx 1.585$ (i.e., +4 dB) at high frequencies, while $|H_2|$ is near unity in the same band (its cutoff at 38 Hz is far below 2 kHz). Therefore $|H_K| \approx |H_1| > 1$ from 2 kHz upward, approaching the +4 dB shelf gain as frequency increases. The boost sets in around 2–4 kHz because the shelf's transition band is centred at 1681.97 Hz, a placement that mirrors the sensitivity peak in the ear's Fletcher–Munson equal-loudness curves.
Part (ii). The high-pass $H_2$ has a second-order numerator zero at DC ($z = 1$, i.e., $\omega = 0$) with $b_1 = -2b_0$ and $b_2 = b_0$, so its numerator is $b_0(1 - z^{-1})^2$. Near DC, $|1 - e^{-j\omega}|^2 = 4\sin^2(\omega/2) \approx \omega^2$, so $|H_2| \propto \omega^2$, giving 12 dB/octave attenuation. Multiplying by $H_1$, which has finite nonzero gain at DC, preserves the roll-off rate.
Part (iii). At 1 kHz the shelf is at the lower edge of its transition region and contributes approximately +0.7 dB, while the high-pass is effectively transparent. The ITU normalization constant $-0.691$ dB in the loudness formula compensates, so that a 1 kHz sine at 0 dBFS (mean-square power $0.5$, i.e. $-3.01$ dB) reads $-3.01$ LUFS, consistent with legacy meter calibration.
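The three parts of the argument can be checked numerically by evaluating the cascade on the unit circle. The sketch below uses the 48 kHz coefficients tabulated in ITU-R BS.1770; the helper names `biquad_response` and `k_weighting_db` are invented for this example.

```python
import cmath
import math

FS = 48000.0
# 48 kHz coefficients as tabulated in ITU-R BS.1770: (b0, b1, b2, a1, a2)
SHELF = (1.53512485958697, -2.69169618940638, 1.19839281085285,
         -1.69065929318241, 0.73248077421585)
HIGHPASS = (1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)


def biquad_response(coeffs, freq_hz):
    """Evaluate H(z) at z = exp(j*2*pi*f/fs)."""
    b0, b1, b2, a1, a2 = coeffs
    z1 = cmath.exp(-2j * math.pi * freq_hz / FS)  # this is z^-1
    return (b0 + b1 * z1 + b2 * z1 * z1) / (1.0 + a1 * z1 + a2 * z1 * z1)


def k_weighting_db(freq_hz):
    """Magnitude of the two-stage cascade in dB."""
    h = biquad_response(SHELF, freq_hz) * biquad_response(HIGHPASS, freq_hz)
    return 20.0 * math.log10(abs(h))


for f in (10, 20, 1000, 4000, 10000):
    print(f, round(k_weighting_db(f), 2))

# Roll-off rate well below the cutoff, in dB per octave.
print(round(k_weighting_db(5) - k_weighting_db(2.5), 2))
```

Evaluating the response directly avoids filtering a test signal, which makes it a convenient way to sanity-check any set of coefficients before using them.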
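For a single-channel signal (K-weighted, $G = 1$) with mean-square power $z_j$ in block $j$, the block loudness satisfies

$$L_j = -0.691 + 10\log_{10} z_j$$

Moreover, if $z_j < 10^{-7}$ (approximately −70 dBFS power), then $L_j < -70$ LUFS, placing the block below the absolute gate.
By Definition 39.4 with a single channel and $G = 1$,

$$L_j = -0.691 + 10\log_{10}(1 \cdot z_j) = -0.691 + 10\log_{10} z_j$$

For the gate condition: $L_j < -70$ iff $10\log_{10} z_j < -69.309$, i.e., $z_j < 10^{-6.9309} \approx 1.17 \times 10^{-7}$. In practice the standard states the absolute gate as $-70$ LUFS, so any block with mean-square power below this threshold is excluded.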
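The dual-gate integrated loudness (Definition 39.6) is the power-domain mean of all gated block loudnesses. Specifically, if we define the per-block linear power $P_j = 10^{(L_j + 0.691)/10}$, then

$$L_I = -0.691 + 10\log_{10}\left(\frac{1}{|\mathcal{B}_{\text{rel}}|}\sum_{j \in \mathcal{B}_{\text{rel}}} P_j\right)$$

Furthermore, the relative gate is equivalent to the condition $P_j > P_{\text{pre}}/10$, i.e., blocks must have linear power greater than one-tenth of the preliminary gated mean.
The integrated loudness formula in Definition 39.6 is definitionally a power average. To see the equivalence with the relative gate: the condition $L_j > L_{\text{pre}} - 10$ rearranges as

$$10\log_{10} P_j > 10\log_{10} P_{\text{pre}} - 10 = 10\log_{10}\frac{P_{\text{pre}}}{10}$$

so $P_j > P_{\text{pre}}/10$, where $P_{\text{pre}} = \frac{1}{|\mathcal{B}_{\text{abs}}|}\sum_{j \in \mathcal{B}_{\text{abs}}} P_j$ is the preliminary gated linear power. This confirms the 10 LU offset is a factor-of-10 power ratio in the linear domain.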
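The equivalence between the dB form and the linear form of the gates can be spot-checked numerically. The sketch assumes the loudness formula $L = -0.691 + 10\log_{10}P$; the function names and the preliminary mean of −23 LUFS are hypothetical values chosen for illustration.

```python
import math


def power_from_lufs(lufs):
    """Invert L = -0.691 + 10*log10(P)."""
    return 10 ** ((lufs + 0.691) / 10)


# Linear power that corresponds to the -70 LUFS absolute gate.
ABSOLUTE_GATE_POWER = power_from_lufs(-70.0)


def passes_relative_gate_db(block_lufs, prelim_lufs):
    return block_lufs > prelim_lufs - 10.0


def passes_relative_gate_linear(block_power, prelim_power):
    return block_power > prelim_power / 10.0


prelim = -23.0  # hypothetical preliminary mean
levels = (-20.0, -32.9, -33.1, -50.0)
results = []
for level in levels:
    in_db = passes_relative_gate_db(level, prelim)
    in_linear = passes_relative_gate_linear(
        power_from_lufs(level), power_from_lufs(prelim))
    results.append((level, in_db, in_linear))
    print(level, in_db, in_linear)
print(ABSOLUTE_GATE_POWER)
```

Both tests agree at every level, and the absolute gate comes out slightly above $10^{-7}$ because of the −0.691 offset. An implementation can therefore gate on linear power and skip the logarithm inside the per-block loop.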
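Let $x[n]$ be a band-limited digital signal with bandwidth $B < f_s/2$. Define the 4× upsampled signal

$$\tilde{x}[m] = \sum_{n=-\infty}^{\infty} x[n]\,\mathrm{sinc}\!\left(\frac{m}{4} - n\right)$$

where $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$. Then:
(i) $\tilde{x}[m] = x[m/4]$ whenever $m \equiv 0 \pmod 4$ (the upsampled signal agrees with the original at integer multiples of 4).
(ii) $\max_m |\tilde{x}[m]| \ge \max_n |x[n]|$, and equality holds only if no inter-sample value on the finer grid exceeds the largest sample value.
(iii) The true peak error $\varepsilon = \left|\max_t |x_c(t)| - \max_m |\tilde{x}[m]|\right|$, where $x_c(t)$ is the ideal continuous reconstruction, satisfies $\varepsilon \le \delta$ for some small $\delta$ determined by the stopband attenuation of the anti-aliasing filter applied before upsampling. At 4× oversampling, BS.1770 specifies a maximum error of +0.5/−1.0 dBTP.
Part (i). Substituting $m = 4k$ into the sinc sum: $\tilde{x}[4k] = \sum_n x[n]\,\mathrm{sinc}(k - n) = x[k]$, since $\mathrm{sinc}(k - n) = 0$ for every nonzero integer $k - n$ and $\mathrm{sinc}(0) = 1$.
Part (ii). The upsampled grid $\{m/(4f_s)\}$ is a strict superset of the original grid $\{n/f_s\}$, so the maximum over the finer grid is at least as large. Equality holds iff the maximum over the finer grid is attained at an original sample point.
Part (iii). The true continuous reconstruction error is bounded by the Gibbs/aliasing artifacts of the finite-length interpolation filter. BS.1770 Annex 2 specifies a minimum filter order and stopband attenuation such that the oversampling measurement stays within the stated tolerance.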
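If a signal is subjected to brick-wall limiting so that its sample values satisfy $|x_i[n]| \le A$ in every channel $i$, then (neglecting any gain added by the K-weighting filter) its integrated loudness is bounded above by

$$L_I \le -0.691 + 10\log_{10}\left(A^2 \sum_i G_i\right)$$

For stereo ($G_L = G_R = 1$), this gives $L_I \le -0.691 + 10\log_{10}(2A^2)$. At digital full scale $A = 1$, the ceiling is approximately $+2.32$ LUFS, a practical upper bound on how loud any stereo programme can ever measure.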
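The ceiling is a one-line computation once the amplitude limit and channel weights are known. The sketch below assumes the bound $-0.691 + 10\log_{10}(A^2 \sum_i G_i)$ and treats the K-weighted samples as obeying the same amplitude limit; `loudness_ceiling` is a hypothetical helper name.

```python
import math


def loudness_ceiling(peak_amplitude, channel_weights):
    """Upper bound on integrated loudness when every sample has |x| <= A."""
    total_weight = sum(channel_weights)
    return -0.691 + 10 * math.log10(peak_amplitude ** 2 * total_weight)


print(round(loudness_ceiling(1.0, [1.0, 1.0]), 2))  # stereo at full scale: 2.32
print(round(loudness_ceiling(0.5, [1.0, 1.0]), 2))  # stereo limited to -6 dBFS: -3.7
```

Halving the amplitude limit lowers the ceiling by about 6 dB, as expected for a quantity that depends on $A^2$.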
Numerical Examples
Example 1: Computing Biquad Coefficients for Stage 2 (38 Hz High-Pass)
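At $f_s = 48000$ Hz, cutoff $f_c = 38.135$ Hz, $Q = 0.5003$:

$$K = \tan\left(\frac{\pi f_c}{f_s}\right) = \tan(0.0024960) \approx 0.0024960$$

The normalization constant:

$$a_0 = 1 + \frac{K}{Q} + K^2 \approx 1 + 0.0049887 + 0.0000062 = 1.0049949$$

High-pass biquad numerator coefficients:

$$b_0 = \frac{1}{a_0} \approx 0.99503, \quad b_1 = -\frac{2}{a_0} \approx -1.99006, \quad b_2 = \frac{1}{a_0} \approx 0.99503$$

Denominator coefficients:

$$a_1 = \frac{2(K^2 - 1)}{a_0} \approx -1.99005, \quad a_2 = \frac{1 - K/Q + K^2}{a_0} \approx 0.99007$$

With these coefficients the filter is nearly transparent well above 100 Hz and rolls off at 12 dB per octave below 38 Hz.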
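The coefficient calculation can be reproduced with a short function. This is a sketch of the textbook bilinear-transform high-pass with pre-warping, assuming $f_s = 48$ kHz, $f_c \approx 38.135$ Hz and $Q \approx 0.5003$; the name `highpass_biquad` is invented here.

```python
import math


def highpass_biquad(fc, q, fs):
    """Second-order high-pass via the bilinear transform with pre-warping.

    Returns (b0, b1, b2), (a1, a2) with the leading denominator term scaled to 1.
    """
    k = math.tan(math.pi * fc / fs)  # pre-warped analogue frequency
    a0 = 1.0 + k / q + k * k         # normalisation constant
    b = (1.0 / a0, -2.0 / a0, 1.0 / a0)
    a = (2.0 * (k * k - 1.0) / a0, (1.0 - k / q + k * k) / a0)
    return b, a


b, a = highpass_biquad(fc=38.13547, q=0.5003270, fs=48000.0)
print([round(v, 5) for v in b])
print([round(v, 5) for v in a])
```

The denominator terms should agree with the high-pass values tabulated in BS.1770 to several decimal places. The numerator here is scaled for unity gain at the Nyquist frequency, whereas the standard tabulates it as $(1, -2, 1)$, a difference of a few hundredths of a dB.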
Example 2: Block Loudness Calculation (Stereo)
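Suppose a 400 ms stereo block has K-weighted mean-square energies $z_L$ and $z_R$ that sum to approximately $0.095$.
With $G_L = G_R = 1$:

$$\sum_i G_i z_i = z_L + z_R \approx 0.095$$

Block loudness:

$$L_j = -0.691 + 10\log_{10}(0.095) \approx -0.691 - 10.22 = -10.91 \text{ LUFS}$$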
This block passes the absolute gate (−10.91 > −70) and would also survive a relative gate unless the preliminary mean were above −0.91 LUFS — essentially impossible in practice.
Example 3: Dual-Gate Pass Through
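Suppose a programme has 500 analysis blocks. After the absolute gate (−70 LUFS), 450 blocks survive. Their preliminary mean is:

$$L_{\text{pre}} = -23 \text{ LUFS}$$

The relative gate threshold is $-23 - 10 = -33$ LUFS. After discarding the 80 blocks below −33 LUFS (quiet passages, fade-outs), 370 blocks remain. Recomputing the mean over these 370 blocks yields the integrated loudness $L_I$, which is necessarily higher than the preliminary mean of −23 LUFS because only the quietest blocks were removed.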
Example 4: Inter-Sample Peak Detection
Consider two adjacent samples $x[n_0]$ and $x[n_0+1]$ of a 1 kHz sine at sample rate $f_s$. The digital peak meter reads $20\log_{10}\max(|x[n_0]|, |x[n_0+1]|)$ dBFS. However, between these two samples, the underlying sinusoid may reach its true peak. The 4× upsampled interpolation adds three points between each pair of original samples. If the sinusoid's continuous peak $A$ lies between the two samples, the true peak level is $20\log_{10} A$ dBTP, which in this example is about 0.53 dB higher than what the sample-level meter shows. After MP3 or AAC encoding, this inter-sample peak can cause clipping on playback.
Example 5: Platform Gain Adjustment
A mastered track has integrated loudness $L_I = -9$ LUFS. Spotify targets −14 LUFS, so it applies a gain of:

$$g = -14 - (-9) = -5 \text{ dB}$$

The resulting playback level is −14 LUFS. The track's dynamic range is unchanged, because a constant gain shifts every level equally, but its absolute level is reduced by 5 dB. Compared with a track mastered at −14 LUFS (which plays through unmodified on Spotify), the over-limited −9 LUFS track offers no loudness advantage and leaves 5 dB of headroom unused.
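The gain calculation is simple enough to sketch directly. The −14 LUFS default is the target quoted above. The function names are hypothetical, and the sketch models only a static gain, not any limiter or headroom policy a real platform may apply.

```python
def normalization_gain_db(measured_lufs, target_lufs=-14.0):
    """Static gain (dB) that moves a track's integrated loudness to the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Amplitude multiplier corresponding to a gain in dB."""
    return 10 ** (gain_db / 20)


gain = normalization_gain_db(-9.0)
print(gain, round(db_to_linear(gain), 4))  # -5.0 0.5623
```

Every sample of the −9 LUFS track is multiplied by roughly 0.56, which is why the extra limiting used to reach −9 LUFS buys no additional playback level.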
Musical Connection / 音乐联系
The Loudness War and Its Aftermath
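Narration (translated from Chinese): "This ties directly into the information theory of pop music covered in EP24: when dynamic range is compressed to the limit, the music's entropy drops, its predictability rises, and listeners tire more quickly."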
The 20-year loudness war (roughly 1990–2010) is a case study in how a measurement standard — or the absence of one — reshapes an entire art form. With no universal reference level, each record label competed to make its releases sound louder on radio and in record stores. The weapon was dynamic range compression pushed to extremes.
Death Magnetic (Metallica, 2008) reached an integrated loudness of approximately −4 LUFS with a dynamic range of only 2–3 LU. Waveforms are visually “brick-walled” — the amplitude envelope is nearly flat. From the perspective of EP24’s Shannon entropy analysis, a signal with near-constant amplitude has lower entropy in its amplitude envelope, meaning less information about dynamics is transmitted to the listener. The auditory system habituates rapidly to constant stimulation, producing the “ear fatigue” frequently reported by listeners.
When We All Fall Asleep, Where Do We Go? (Billie Eilish, 2019) sits near −14 LUFS with a dynamic range of 8–10 LU. This is not merely an aesthetic preference — it is a rational response to the streaming normalization landscape. At −14 LUFS, the album plays on Spotify without any gain reduction. At −4 LUFS, every song would be turned down by 10 dB, erasing the loudness advantage entirely while retaining all the dynamic damage.
Classical recordings (symphonic works, solo piano) typically measure between −23 and −18 LUFS with dynamic ranges exceeding 20 LU. A full orchestra’s pianissimo might be at −45 LUFS while fortissimo passages approach −12 LUFS — a 33 LU range that would be annihilated by the brick-wall limiting applied to commercial pop. LUFS normalization allows these recordings to coexist on the same platform without forcing them into the same amplitude box.
The mathematical moral: LUFS is not just a metering convenience. It is an application of psychoacoustic weighting (K-filter), robust statistics (gated mean resists outlier silence blocks), and signal interpolation theory (true peak oversampling). The standard embeds a model of human perception directly into the arithmetic of loudness.
Limits and Open Questions / 局限性与开放问题
1. Spectral content blindness. K-weighting approximates the average human equal-loudness contour but cannot capture content-dependent loudness effects. A 3 kHz sine and a broadband noise signal can have identical LUFS values but very different perceived loudness, especially for listeners with hearing loss in specific frequency bands.
2. Stereo/mono equivalence problem. With $G_L = G_R = 1$, the channel powers are summed, so a signal present at equal level in both L and R carries twice the weighted power of the same signal in a single channel. A one-channel mono file therefore measures approximately 3 LU lower than a stereo file carrying the identical signal in both channels, a subtle source of inconsistency in loudness normalization across formats.
3. Transient content and short-form media. The 400 ms block length and gating are calibrated for long-form programme material (TV, radio, full albums). For very short content (short-form social media videos, notification sounds, UI audio), the integrated loudness metric is not well-defined; only a few blocks exist and the gating may reject all or most of them.
4. Binaural and spatial audio. ITU-R BS.1770 was designed for channel-based audio. Ambisonics, binaural, and object-based formats (Dolby Atmos, Apple Spatial Audio) require extensions. The EBU is actively developing recommendations for spatial audio loudness measurement, but no universally adopted standard exists as of 2026.
5. True peak model assumptions. The 4× sinc oversampling true peak measurement assumes the signal is bandlimited to $f_s/2$. After lossy compression (MP3, AAC), spectral content can be reconstructed in ways that violate this assumption, causing true peak estimates from the pre-encoded PCM to underestimate the actual peak level of the decoded output. This is why some mastering engineers target −1 dBTP or even −2 dBTP rather than the full-scale limit of 0 dBTP.
6. Perceptual models beyond K-weighting. More sophisticated loudness models (Zwicker loudness, Moore–Glasberg model, PEAQ) incorporate masking, spectral spread, and temporal integration in ways that K-weighting does not. Research into whether these models should replace or supplement LUFS for streaming normalization is ongoing.
Academic References / 参考文献
- ITU-R BS.1770-4 (2015). Algorithms to measure audio programme loudness and true-peak audio level. International Telecommunication Union, Geneva. The defining standard for LUFS measurement.
- EBU R 128 (2014, rev. 2020). Loudness normalisation and permitted maximum level of audio signals. European Broadcasting Union. Companion recommendation to BS.1770; introduced the dual-threshold gating in 2011.
- Camerer, F., Hoffmann, F., et al. (2011). "EBU R 128 loudness normalisation." EBU Technical Review, 2011-Q3. Practical implementation notes and psychoacoustic rationale.
- Soulodre, G. A. (2004). "Evaluation of objective loudness meters." AES 116th Convention, Berlin. Pre-standard study that informed the K-weighting design.
- Zwicker, E. & Fastl, H. (1999). Psychoacoustics: Facts and Models. 2nd ed. Springer. Chapter 8 covers loudness models from which K-weighting is a simplified derivative.
- Lerch, A. (2012). An Introduction to Audio Content Analysis. Wiley-IEEE Press. Chapter 3 covers loudness and dynamics features in the context of MIR.
- Vickers, E. (2010). "The Loudness War: Background, Speculation and Recommendations." AES 129th Convention, San Francisco. Historical analysis of dynamic range compression trends 1990–2010.
- Moore, B. C. J., Glasberg, B. R., & Baer, T. (1997). "A model for the prediction of thresholds, loudness and partial loudness." Journal of the Audio Engineering Society, 45(4), 224–240. Foundation for psychoacoustic loudness models more sophisticated than K-weighting.