EP30: The Sampling Theorem Is Not What You Think

sinc reconstruction, quantization SNR, TPDF dithering, noise shaping NTF(z)

Signal Processing

Prerequisite episodes

EP02 String Vibration and the Wave Equation EP07 Information Entropy and All-Interval Rows

Follow-up episodes

EP35 The Phase Vocoder — Mathematics of Pitch Shifting EP39 How Spotify Knows You Are Too Loud — The LUFS Algorithm

Overview

“Digital audio is a staircase waveform: the signal between sample points is guessed, so digital is less natural than analog.” This claim is widespread, yet it fundamentally misunderstands the mathematics of signal reconstruction. The Nyquist-Shannon sampling theorem tells us: if a signal is bandlimited and sampled at a rate of at least twice its bandwidth, its original continuous waveform can be reconstructed from uniformly spaced discrete samples alone, not approximately but exactly. The key to reconstruction is the sinc function, which fills in between discrete samples not with guesses, but with values uniquely determined by those samples.

Yet sampling is only the first stage of the digital audio pipeline. The central argument of this episode is: sampling is not the most fragile link in digital audio — quantization is. Quantization maps continuous amplitude to a finite number of levels, inevitably introducing truncation error. Without mitigation, the quantization error of a low-level signal is highly correlated with the signal itself, producing harsh harmonic distortion rather than an acceptable white noise floor. The technical chain that solves this problem — dithering and noise shaping — is the real reason the CD format at 44.1 kHz / 16-bit still sounds so good.

This episode derives the sampling theorem in the frequency domain, proceeds through the quantization SNR formula derivation, then to the proof of the TPDF triangular probability distribution, and finally analyzes the first-order noise shaping transfer function $\text{NTF}(z) = 1 - z^{-1}$ , providing a complete mathematical picture of the digital audio conversion chain.

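Narration (from the Chinese script): “Have you ever heard the claim that digital audio is a staircase wave, that the signal between sample points is ‘guessed’, and that digital is therefore less ‘natural’ than analog? This is wrong. The sampling theorem tells us that as long as the sample rate is high enough, the original signal can be perfectly reconstructed: not approximately, but perfectly. The tool for reconstruction is called the sinc function. Today we work this out from the beginning.”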

Prerequisites

Wave equation and Fourier series (EP02) — frequency-domain analysis, convolution theorem, power spectral density
Shannon information theory (EP07) — information entropy, bit rate, Shannon capacity theorem (and the information-theoretic context of the Nyquist theorem)

Definitions

Definition 30.1 (Bandlimited Signal)

A real-valued signal $x(t)$ is said to be bandlimited to $B$ Hz if its Fourier transform

$\hat{x}(f) = \int_{-\infty}^{\infty} x(t)\, e^{-2\pi i f t}\, dt$

satisfies $\hat{x}(f) = 0$ for all $|f| > B$ . That is, all energy of the signal is concentrated in the frequency interval $[-B, B]$ .

Example: The upper limit of human hearing is approximately 20 kHz, so for speech/music signals $B = 20{,}000$ Hz is sufficient.

Definition 30.2 (sinc Function)

The normalized sinc function is defined as

$\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u}, \qquad \operatorname{sinc}(0) \coloneqq 1$

Key properties:

$\operatorname{sinc}(0) = 1$ ;
$\operatorname{sinc}(n) = 0$ for all nonzero integers $n \in \mathbb{Z} \setminus \{0\}$ ;
$\int_{-\infty}^{\infty} \operatorname{sinc}(u)\, du = 1$ ;
Its Fourier transform is a rectangular window: $\mathcal{F}[\operatorname{sinc}(t/T)](f) = T \cdot \mathbf{1}_{|f| \leq 1/(2T)}(f)$ .

Intuition: The sinc function is an “ideal interpolation impulse” — it equals 1 at its own time instant and equals 0 at all other integer time instants.

Definition 30.3 (Uniform Quantization and Least Significant Bit)

Given an $n$ -bit quantizer with full-scale signal range $[-A, A]$ , the range is divided uniformly into $2^n$ levels. The quantization step size (least significant bit, LSB) is

$q = \frac{2A}{2^n}$

The quantization function $Q: \mathbb{R} \to \{-A + q/2,\, -A + 3q/2,\, \ldots,\, A - q/2\}$ rounds the input to the nearest level. The quantization error is defined as $\varepsilon = Q(x) - x$ ; for inputs within the full-scale range $[-A, A]$ (no clipping) it satisfies $|\varepsilon| \leq q/2$ .

Example (16-bit, full scale ±1 V): $q = 2/(2^{16}) = 1/32768 \approx 30.5\,\mu\text{V}$ . A single level is only 30 microvolts wide — this is the precision of CD quantization.
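Definition 30.3 can be written out directly. The following is a minimal sketch using only the Python standard library; the function name `quantize` and its parameter names are illustrative choices, and inputs outside $[-A, A]$ are simply clipped to the outermost level, which is an assumption the definition itself does not spell out.

```python
import math

def quantize(x: float, n_bits: int, full_scale: float = 1.0) -> float:
    """Round x to the nearest of 2**n_bits levels spanning [-A, A] (A = full_scale)."""
    levels = 2 ** n_bits
    q = 2.0 * full_scale / levels               # step size (1 LSB)
    index = math.floor((x + full_scale) / q)    # bin that contains x
    index = min(max(index, 0), levels - 1)      # clip out-of-range inputs (assumption)
    return -full_scale + (index + 0.5) * q      # centre of that bin

q16 = 2.0 / 2 ** 16                             # 16-bit step for a +/-1 V range
print(f"16-bit step: {q16 * 1e6:.1f} microvolts")
print("Q(0.3) at 3 bits:", quantize(0.3, 3))
```

Using the bin centre as the output level reproduces the level set $\{-A + q/2, \ldots, A - q/2\}$ of the definition, so the error stays within half a step for every in-range input.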

Definition 30.4 (TPDF Dither Noise)

Triangular Probability Density Function (TPDF) noise $D$ has probability density

$p_D(d) = \frac{1}{q}\left(1 - \frac{|d|}{q}\right) \mathbf{1}_{|d| \leq q}(d)$

This is a triangular distribution on $[-q, q]$ , with zero mean and variance $\sigma_D^2 = q^2/6$ .

Why triangular? The probability density of the difference of two independent random variables uniformly distributed on $[-q/2, q/2]$ is exactly the convolution of two rectangular functions, which yields a triangle (see Theorem 30.3).
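The stated mean and variance can be checked empirically. This is a small Monte Carlo sketch, assuming $q = 1$ and a fixed seed for repeatability; `tpdf_sample` is a hypothetical helper name, and the printed values are estimates that fluctuate with the seed and sample count.

```python
import random

def tpdf_sample(q: float, rng: random.Random) -> float:
    """Difference of two independent Uniform[-q/2, q/2] draws."""
    return rng.uniform(-q / 2, q / 2) - rng.uniform(-q / 2, q / 2)

rng = random.Random(0)            # fixed seed so the run is repeatable
q = 1.0
samples = [tpdf_sample(q, rng) for _ in range(200_000)]

mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
# Fraction of samples with |d| <= q/2; the triangular density predicts 3/4.
central = sum(abs(s) <= q / 2 for s in samples) / len(samples)

print(f"mean {mean:+.4f}, variance {variance:.4f} (theory {q * q / 6:.4f})")
print(f"P(|D| <= q/2) = {central:.4f} (theory 0.75)")
```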

Definition 30.5 (Noise Transfer Function)

In the $z$ -domain, the transfer function of the discrete-time filter that acts on the quantization error is called the Noise Transfer Function (NTF). For a first-order error-feedback noise shaper, the NTF is

$\text{NTF}(z) = 1 - z^{-1}$

Its frequency response (setting $z = e^{i\omega}$ ) is

$|\text{NTF}(e^{i\omega})|^2 = |1 - e^{-i\omega}|^2 = 4\sin^2\!\left(\frac{\omega}{2}\right)$

This is zero at $\omega = 0$ (DC) and reaches its maximum value of 4 at $\omega = \pi$ (Nyquist). This means noise energy is “pushed” from low frequencies toward high frequencies.
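One way to realize this NTF is a loop that subtracts the previous quantization error from the next input. The sketch below is an illustration under stated assumptions rather than a reference design: it uses a mid-tread rounding quantizer with $q = 1$ (not the mid-rise level set of Definition 30.3), and the names `round_to_step` and `error_feedback_quantize` are hypothetical.

```python
import math

def round_to_step(v: float, q: float) -> float:
    """Mid-tread rounding to the nearest multiple of q."""
    return q * math.floor(v / q + 0.5)

def error_feedback_quantize(x, q):
    """Quantize x with first-order error feedback; return (outputs, raw errors)."""
    outputs, errors = [], []
    prev_error = 0.0
    for sample in x:
        v = sample - prev_error          # subtract the previous quantization error
        y = round_to_step(v, q)
        prev_error = y - v               # raw error e[n] of this quantization
        outputs.append(y)
        errors.append(prev_error)
    return outputs, errors

q = 1.0
x = [0.3 * math.sin(0.05 * n) + 0.2 for n in range(2000)]
y, e = error_feedback_quantize(x, q)

# Total error y[n] - x[n] should equal e[n] - e[n-1], i.e. NTF(z) = 1 - z^-1.
shaped = [y[n] - x[n] for n in range(len(x))]
print("mean of x:", sum(x) / len(x))
print("mean of y:", sum(y) / len(y))
```

Because the total error is a first difference of bounded raw errors, its running sum never exceeds $q/2$ in magnitude. This is the time-domain face of the zero at DC: the long-term average of the output tracks the long-term average of the input.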

Main Theorems

Theorem 30.1 (Nyquist-Shannon Sampling Theorem)

Let $x(t)$ be a signal bandlimited to $B$ Hz, i.e., $\hat{x}(f) = 0$ for $|f| > B$ . Choose sampling interval $T \leq \dfrac{1}{2B}$ (equivalently, sampling rate $f_s = 1/T \geq 2B$ ). Then $x(t)$ is completely determined by its sample sequence $\{x(nT)\}_{n=-\infty}^{\infty}$ and is exactly reconstructed by

$x(t) = \sum_{n=-\infty}^{\infty} x(nT)\cdot \operatorname{sinc}\!\left(\frac{t - nT}{T}\right)$

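Conclusion: For bandlimited signals, discrete sampling loses no information; the “staircase approximation” is an artifact of zero-order hold circuits, not an inherent consequence of the sampling theorem. At the boundary $f_s = 2B$ the statement additionally requires that $\hat{x}(f)$ carry no spectral line at exactly $|f| = B$ : a sinusoid at exactly $B$ Hz sampled at its zero crossings yields all-zero samples, which is why practical systems use $f_s > 2B$ .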

Proof.

Step 1: Sampling = multiplication by a Dirac comb.

Ideal sampling of $x(t)$ is equivalent to multiplying by a Dirac comb with period $T$ :

$x_s(t) = x(t) \cdot \sum_{n=-\infty}^{\infty} \delta(t - nT) = \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT)$

Step 2: In the frequency domain, sampling = periodic replication of the spectrum.

By the convolution theorem, multiplication in time corresponds to convolution in frequency. The Fourier transform of the comb is again a comb:

$\mathcal{F}\!\left[\sum_n \delta(t - nT)\right](f) = \frac{1}{T}\sum_k \delta\!\left(f - \frac{k}{T}\right)$

Therefore

$\hat{x}_s(f) = \frac{1}{T}\sum_{k=-\infty}^{\infty} \hat{x}\!\left(f - \frac{k}{T}\right)$

That is, $\hat{x}_s(f)$ is a periodic superposition of copies of $\hat{x}(f)$ with period $1/T$ .

Step 3: No-aliasing condition ⟺ copies do not overlap.

Since $x(t)$ is bandlimited to $B$ , each copy $\hat{x}(f - k/T)$ has support on $|f - k/T| \leq B$ . Adjacent copies ( $k=0$ and $k=\pm 1$ ) do not overlap if and only if

$B \leq \frac{1}{2T} \iff T \leq \frac{1}{2B} \iff f_s \geq 2B$

Step 4: Reconstruction = multiplication by a rectangular low-pass filter.

Under the no-aliasing condition, multiplying $\hat{x}_s(f)$ by the rectangular window $T \cdot \mathbf{1}_{|f| \leq 1/(2T)}$ exactly recovers $\hat{x}(f)$ . In the time domain this corresponds to convolution with the sinc function:

$x(t) = x_s(t) * \left[T \cdot \frac{1}{T}\operatorname{sinc}\!\left(\frac{t}{T}\right)\right] = \sum_n x(nT)\,\delta(t - nT) * \operatorname{sinc}\!\left(\frac{t}{T}\right) = \sum_n x(nT)\operatorname{sinc}\!\left(\frac{t-nT}{T}\right)$

This is exactly the reconstruction formula in the theorem. $\square$

Verification of the interpolation property: At $t = kT$ , $\operatorname{sinc}\!\bigl((kT - nT)/T\bigr) = \operatorname{sinc}(k-n)$ . Since $k - n \in \mathbb{Z}$ , this is 0 when $k \neq n$ and 1 when $k = n$ . So the right-hand side exactly recovers $x(kT)$ at $t = kT$ . $\square$

The following script visualizes the sinc interpolation process: each sample point excites a sinc lobe, and all lobes sum to exactly reconstruct the original bandlimited signal.

Sinc interpolation — perfect reconstruction from discrete samples
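A minimal numerical stand-in for this demonstration, using only the Python standard library and printing values instead of plotting. The test signal, the 8 Hz sample rate, and the window of 1001 samples are illustrative assumptions; because the infinite sum is truncated, the reconstruction here is close but not exact.

```python
import math

def sinc(u: float) -> float:
    """Normalized sinc, with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

def signal(t: float) -> float:
    """Test signal bandlimited to 2.5 Hz (two sinusoids at 1 Hz and 2.5 Hz)."""
    return math.sin(2 * math.pi * 1.0 * t) + 0.5 * math.cos(2 * math.pi * 2.5 * t)

fs = 8.0                      # sample rate in Hz, above 2B = 5 Hz
T = 1.0 / fs
indices = range(-500, 501)    # finite window standing in for the infinite sum
samples = {n: signal(n * T) for n in indices}

def reconstruct(t: float) -> float:
    """Truncated Whittaker-Shannon sum evaluated at time t."""
    return sum(x_n * sinc((t - n * T) / T) for n, x_n in samples.items())

for t in (0.0, 0.037, 0.3, -0.21):
    print(f"t={t:+.3f}  x(t)={signal(t):+.5f}  reconstructed={reconstruct(t):+.5f}")
```

The residual difference between the two printed columns comes from truncating the sum and shrinks slowly as the window grows, since the sinc tails decay only as $1/|t|$.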

The following script demonstrates aliasing: when the sample rate is below twice the signal frequency, the high-frequency signal is “folded” into an incorrect lower-frequency component.

Aliasing — what happens when sample rate < 2B
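The folding can also be shown with numbers alone. In this sketch the 10 Hz sample rate and 7 Hz tone are arbitrary illustrative values, and `alias_frequency` is a hypothetical helper that folds any frequency into $[0, f_s/2]$.

```python
import math

def alias_frequency(f: float, fs: float) -> float:
    """Apparent frequency in [0, fs/2] of a real sinusoid at f Hz sampled at fs Hz."""
    f_mod = f % fs
    return min(f_mod, fs - f_mod)

fs = 10.0                                   # sample rate (Hz)
f_high = 7.0                                # above the Nyquist frequency fs/2 = 5 Hz
f_alias = alias_frequency(f_high, fs)

# The 7 Hz sine and a sign-flipped sine at the alias frequency give the same samples.
high = [math.sin(2 * math.pi * f_high * n / fs) for n in range(50)]
low = [-math.sin(2 * math.pi * f_alias * n / fs) for n in range(50)]
print("alias frequency:", f_alias, "Hz")
print("largest sample difference:", max(abs(a - b) for a, b in zip(high, low)))
```

Once the samples are identical, no reconstruction filter can tell the two tones apart, which is why the bandlimiting has to happen before sampling.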

Theorem 30.2 (Quantization SNR Formula)

For an $n$ -bit uniform quantizer, the signal-to-noise ratio for a full-scale sinusoidal signal is

$\text{SNR} = 6.02n + 1.76 \text{ dB}$

Specifically, each additional bit increases the SNR by approximately 6 dB.

Proof.

Quantization error power: In the standard model of uniform quantization, the quantization error $\varepsilon$ is modeled as a random variable uniformly distributed on $[-q/2, q/2]$ . Its power (variance) is

$P_{\text{noise}} = \mathbb{E}[\varepsilon^2] = \int_{-q/2}^{q/2} \varepsilon^2 \cdot \frac{1}{q}\, d\varepsilon = \frac{1}{q} \cdot \frac{2(q/2)^3}{3} = \frac{q^2}{12}$

Full-scale sinusoidal signal power: Let the full-scale sine wave be $x(t) = A\sin(2\pi f_0 t)$ , where $A = 2^n q / 2$ (so the peak exactly reaches the quantization range limit). Its power is

$P_{\text{signal}} = \frac{A^2}{2} = \frac{(2^{n-1} q)^2}{2} = \frac{2^{2n} q^2}{8}$

Linear SNR ratio:

$\frac{P_{\text{signal}}}{P_{\text{noise}}} = \frac{2^{2n} q^2 / 8}{q^2 / 12} = \frac{12 \cdot 2^{2n}}{8} = \frac{3}{2} \cdot 2^{2n}$

Converting to decibels:

$\text{SNR} = 10\log_{10}\!\left(\frac{3}{2} \cdot 2^{2n}\right) = 10\log_{10}\!\frac{3}{2} + 10 \cdot 2n\log_{10} 2$

$= 10 \times 0.17609 + 20n \times 0.30103 \approx 1.76 + 6.02n \text{ dB}$

That is, $\text{SNR} \approx 6.02n + 1.76$ dB. $\square$
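The formula rests on modelling the error as uniform noise, so it is worth comparing against a direct simulation. The sketch below quantizes one second of a full-scale sine and measures the SNR; the 997-cycle test tone, the 44100-sample length, and the helper names are illustrative assumptions, and the measured figures should land close to, not exactly on, the theoretical line.

```python
import math

def quantize(x: float, n_bits: int, full_scale: float = 1.0) -> float:
    """Uniform quantizer with 2**n_bits levels at bin centres, clipped to range."""
    levels = 2 ** n_bits
    q = 2.0 * full_scale / levels
    index = min(max(math.floor((x + full_scale) / q), 0), levels - 1)
    return -full_scale + (index + 0.5) * q

def measured_snr_db(n_bits: int, num: int = 44_100, cycles: int = 997) -> float:
    """SNR of a quantized full-scale sine, from signal and error power sums."""
    signal_power = error_power = 0.0
    for k in range(num):
        x = math.sin(2 * math.pi * cycles * k / num)
        err = quantize(x, n_bits) - x
        signal_power += x * x
        error_power += err * err
    return 10 * math.log10(signal_power / error_power)

for n in (8, 12, 16):
    print(f"{n:2d} bits: measured {measured_snr_db(n):.2f} dB, "
          f"formula {6.02 * n + 1.76:.2f} dB")
```

Choosing a cycle count that shares no factor with the sample count spreads the sample phases evenly over the sine, which keeps the error close to the uniform model.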

Theorem 30.3 (TPDF Independence Theorem)

Let $U_1, U_2 \overset{\text{i.i.d.}}{\sim} \operatorname{Uniform}[-q/2,\, q/2]$ , and let $D = U_1 - U_2$ . Then:

Distribution: $D$ follows a triangular distribution on $[-q, q]$ with probability density

$p_D(d) = \frac{1}{q}\left(1 - \frac{|d|}{q}\right)\mathbf{1}_{|d|\leq q}(d)$

Variance: $\operatorname{Var}(D) = q^2/6$ ;
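Independence: If TPDF noise $D$ is added to the signal before quantization (and the quantizer is not overloaded), the first two conditional moments of the total error $E = Q(x + D) - x$ do not depend on the input signal $x$ : $\mathbb{E}[E \mid x] = 0$ and $\mathbb{E}[E^2 \mid x] = q^2/4$ . Full statistical independence of $E$ from $x$ is not achieved by this non-subtractive scheme (Wannamaker et al., 2000); what becomes signal-independent is the mean and the power of the error.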

Proof.

Part 1 (triangular distribution):

The probability density of $D = U_1 - U_2$ is the convolution of the densities of $U_1$ and $-U_2$ . Since $-U_2 \sim \operatorname{Uniform}[-q/2, q/2]$ (uniform distribution is symmetric about zero),

$p_D(d) = (p_{U_1} * p_{-U_2})(d) = \int_{-q/2}^{q/2} p_{U_1}(d + u)\, p_{U_2}(u)\, du$

where $p_{U_1}(s) = p_{U_2}(s) = \frac{1}{q}\mathbf{1}_{|s|\leq q/2}$ . Computing the integral:

For $0 \leq d \leq q$ , the integrand is nonzero only when $-q/2 \leq u \leq q/2$ and $-q/2 \leq d+u \leq q/2$ , i.e., $-q/2 - d \leq u \leq q/2 - d$ intersected with $-q/2 \leq u \leq q/2$ gives $[-q/2, q/2 - d]$ , of length $q - d$ . Therefore

$p_D(d) = \frac{1}{q^2}(q - d) = \frac{1}{q}\left(1 - \frac{d}{q}\right), \quad 0 \leq d \leq q$

Since the distribution of $D$ is symmetric about 0 (because $U_1, U_2$ are identically distributed), for $d < 0$ we have $p_D(d) = \frac{1}{q}(1 + d/q)$ , which combines to $\frac{1}{q}(1 - |d|/q)$ .

Part 2 (variance):

$\operatorname{Var}(D) = \operatorname{Var}(U_1) + \operatorname{Var}(U_2) = 2 \cdot \frac{q^2}{12} = \frac{q^2}{6}$

Part 3 (independence):

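Adding TPDF noise $D$ to the signal before quantization, the value being quantized is $x + D$ , and the total error is $E = Q(x+D) - x$ . Consider first a single uniform dither $U_1$ of width $q$ . Let $r \in [0, q)$ be the offset of $x$ above the nearest lower level. As $U_1$ sweeps $[-q/2, q/2]$ , the sum $x + U_1$ crosses exactly one decision threshold, and the output is the upper level with probability $r/q$ . Hence $\mathbb{E}[E \mid x] = (1 - r/q)(-r) + (r/q)(q - r) = 0$ for every $x$ , but $\mathbb{E}[E^2 \mid x] = r(q - r)$ still depends on $r$ (zero when $x$ sits on a level, $q^2/4$ when it sits on a threshold), which is heard as noise modulation. Adding the second independent uniform variable removes this dependence: the characteristic function of $D$ is $\operatorname{sinc}^2(qu)$ , whose zeros at $u = k/q$ ( $k \in \mathbb{Z} \setminus \{0\}$ ) are of second order, and this is the condition derived by Wannamaker et al. (2000) for both $\mathbb{E}[E \mid x]$ and $\mathbb{E}[E^2 \mid x]$ to be independent of $x$ . The result is $\mathbb{E}[E \mid x] = 0$ and $\mathbb{E}[E^2 \mid x] = q^2/12 + q^2/6 = q^2/4$ for every non-overloading input. The error is therefore not strictly independent of $x$ , but its mean and power are, so harmonic distortion and noise modulation are replaced by a constant, signal-independent noise floor. $\square$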
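The behaviour of the total error $E = Q(x + D) - x$ can be examined numerically. The following Monte Carlo sketch estimates its conditional mean and mean square for several input offsets, with no dither, with a single uniform (RPDF) dither, and with TPDF dither. It assumes $q = 1$ and a mid-tread rounding quantizer, and all function names are illustrative.

```python
import math
import random

def round_to_step(v: float, q: float = 1.0) -> float:
    """Mid-tread rounding to the nearest multiple of q."""
    return q * math.floor(v / q + 0.5)

def no_dither(rng, q):
    return 0.0

def rpdf(rng, q):
    return rng.uniform(-q / 2, q / 2)

def tpdf(rng, q):
    return rng.uniform(-q / 2, q / 2) - rng.uniform(-q / 2, q / 2)

def error_moments(x, dither, rng, q=1.0, trials=50_000):
    """Estimate (mean, mean square) of E = Q(x + D) - x for a fixed input x."""
    total = total_sq = 0.0
    for _ in range(trials):
        err = round_to_step(x + dither(rng, q), q) - x
        total += err
        total_sq += err * err
    return total / trials, total_sq / trials

rng = random.Random(1)
for x in (0.0, 0.1, 0.25, 0.4, 0.5):
    cells = []
    for dither in (no_dither, rpdf, tpdf):
        mean, power = error_moments(x, dither, rng)
        cells.append(f"{dither.__name__}: mean {mean:+.3f} power {power:.3f}")
    print(f"x={x:.2f}  " + "  |  ".join(cells))
```

In a run of this sketch the undithered mean tracks $-x$ , the RPDF mean sits near zero while its power varies with $x$ , and only the TPDF column keeps both the mean near zero and the power near $q^2/4$ for every offset.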

The following script compares quantization error with and without TPDF dithering — four panels showing time-domain waveforms and spectra, clearly illustrating how harmonic distortion is eliminated and converted to flat white noise.

TPDF dithering — quantization error with and without dither
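A compact numerical version of the same comparison, printing the fundamental and third-harmonic amplitudes instead of drawing four panels. The 0.8 LSB test tone, the cycle count, and the single-bin DFT helper `tone_amplitude` are illustrative assumptions, and a mid-tread quantizer with $q = 1$ is used for simplicity.

```python
import math
import random

def round_to_step(v: float, q: float = 1.0) -> float:
    """Mid-tread rounding to the nearest multiple of q."""
    return q * math.floor(v / q + 0.5)

def tone_amplitude(y, cycles: int) -> float:
    """Amplitude of the component completing `cycles` periods over len(y) samples."""
    n = len(y)
    re = sum(v * math.cos(2 * math.pi * cycles * k / n) for k, v in enumerate(y))
    im = sum(v * math.sin(2 * math.pi * cycles * k / n) for k, v in enumerate(y))
    return 2.0 * math.hypot(re, im) / n

N, CYCLES, AMP = 20_000, 37, 0.8      # 0.8 LSB peak: a very low-level sine
rng = random.Random(3)
x = [AMP * math.sin(2 * math.pi * CYCLES * k / N) for k in range(N)]

plain = [round_to_step(v) for v in x]
dithered = [round_to_step(v + rng.uniform(-0.5, 0.5) - rng.uniform(-0.5, 0.5))
            for v in x]

for name, y in (("undithered", plain), ("TPDF dithered", dithered)):
    print(f"{name:14s} fundamental {tone_amplitude(y, CYCLES):.3f} LSB, "
          f"3rd harmonic {tone_amplitude(y, 3 * CYCLES):.3f} LSB")
```

Without dither the output is a three-level pulse train with a strong third harmonic; with dither the harmonic bin falls to the level of the surrounding noise and the fundamental returns to its true amplitude.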

Theorem 30.4 (Spectral Properties of First-Order Noise Shaping)

Let the noise transfer function of a first-order error-feedback noise shaper be

$\text{NTF}(z) = 1 - z^{-1}$

Then:

Frequency response: Setting $z = e^{i\omega}$ ( $\omega \in [0, \pi]$ is the normalized angular frequency),

$|\text{NTF}(e^{i\omega})|^2 = 4\sin^2\!\left(\frac{\omega}{2}\right)$

Low-frequency noise suppression: Within bandwidth $[0, \omega_{\max}]$ ( $\omega_{\max} \ll \pi$ ), the ratio of in-band noise power with shaping to in-band noise power without shaping is approximately

$\frac{\int_0^{\omega_{\max}} 4\sin^2(\omega/2)\, d\omega}{\int_0^{\omega_{\max}} 1\, d\omega} \approx \frac{\omega_{\max}^3/3}{\omega_{\max}} = \frac{\omega_{\max}^2}{3} \quad (\omega_{\max} \to 0)$

Zero at DC: $|\text{NTF}(e^{i\cdot 0})|^2 = 0$ ; Maximum at Nyquist: $|\text{NTF}(e^{i\pi})|^2 = 4$ .

Proof.

Part 1 (frequency response derivation):

Setting $z = e^{i\omega}$ ,

$\text{NTF}(e^{i\omega}) = 1 - e^{-i\omega}$

Using the identity $1 - e^{-i\omega} = e^{-i\omega/2}(e^{i\omega/2} - e^{-i\omega/2}) = e^{-i\omega/2} \cdot 2i\sin(\omega/2)$ , taking the squared modulus:

$|\text{NTF}(e^{i\omega})|^2 = |e^{-i\omega/2}|^2 \cdot |2i\sin(\omega/2)|^2 = 1 \cdot 4\sin^2\!\left(\frac{\omega}{2}\right)$

Part 2 (low-frequency noise power):

Without shaping, quantization noise power is uniformly distributed over $[0, \pi]$ (white noise). After shaping, the in-band noise power is proportional to

$\int_0^{\omega_{\max}} 4\sin^2\!\left(\frac{\omega}{2}\right) d\omega$

Using $\sin^2(u) \approx u^2$ ( $u \to 0$ ), letting $u = \omega/2$ :

$\int_0^{\omega_{\max}} 4\left(\frac{\omega}{2}\right)^2 d\omega = \int_0^{\omega_{\max}} \omega^2\, d\omega = \frac{\omega_{\max}^3}{3}$

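Without shaping, the NTF is identically 1, so the in-band noise power is proportional to $\omega_{\max}$ (white noise, uniformly distributed). The suppression ratio from shaping is therefore

$\frac{\omega_{\max}^3/3}{\omega_{\max}} = \frac{\omega_{\max}^2}{3}$

This is a genuine suppression only when $\omega_{\max}$ is small, that is, when the converter is oversampled. For the CD format (44.1 kHz sampling, 20 kHz audio bandwidth), $\omega_{\max} = \pi \cdot (20/22.05) \approx 0.907\pi$ is not small and the small-angle estimate does not apply: the exact integral $\int_0^{\omega_{\max}} 4\sin^2(\omega/2)\, d\omega = 2(\omega_{\max} - \sin\omega_{\max})$ gives a ratio of about 1.8, so the unweighted in-band noise power actually rises. The benefit at this sample rate is perceptual: noise is moved out of the 2–5 kHz region, where hearing is most sensitive, toward the top of the band, where it is least sensitive, giving a perceptually weighted in-band SNR gain equivalent to approximately 2 bits.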

Part 3:

$\omega = 0$ : $4\sin^2(0) = 0$ . $\omega = \pi$ : $4\sin^2(\pi/2) = 4 \cdot 1 = 4$ . $\square$

The following script plots the magnitude response of first-, second-, and fifth-order NTFs, illustrating how noise is pushed from the low-frequency region toward the Nyquist edge.

Noise shaping — NTF(z) = 1 − z⁻¹ pushes noise to high frequencies
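A numerical counterpart to the plot, printing the noise gain in dB at a few frequencies for a 44.1 kHz sample rate. The sketch assumes the higher-order shapers are pure differencers, $\text{NTF}(z) = (1 - z^{-1})^N$ ; practical high-order designs usually add poles for stability, so the second- and fifth-order figures are illustrative only.

```python
import cmath
import math

def ntf_power_gain(omega: float, order: int = 1) -> float:
    """|NTF(e^{i omega})|^2 for the assumed form NTF(z) = (1 - z^-1)**order."""
    return abs((1 - cmath.exp(-1j * omega)) ** order) ** 2

fs = 44_100.0
for f_hz in (100.0, 1_000.0, 7_350.0, 20_000.0):
    omega = 2 * math.pi * f_hz / fs
    gains = [10 * math.log10(ntf_power_gain(omega, n)) for n in (1, 2, 5)]
    print(f"{f_hz:8.0f} Hz: " + "  ".join(
        f"order {n}: {g:+7.1f} dB" for n, g in zip((1, 2, 5), gains)))
```

All orders of this family cross 0 dB at $\omega = \pi/3$ (7350 Hz here), since $4\sin^2(\pi/6) = 1$ ; below that frequency a higher order suppresses more, above it a higher order amplifies more.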

Numerical Examples

SNR at different bit depths (full-scale sinusoidal signal, uniform quantization):

Bit depth $n$	Theoretical SNR (dB)	Equivalent dynamic range	Notes
8	49.9	~50 dB	Early game consoles, telephone voice
16	98.1	~98 dB	CD format, about 22 dB short of the ~120 dB range of human hearing before dither and noise shaping
24	146.2	~146 dB	Professional recording, ~26 dB beyond the ~120 dB range of human hearing
32	194.4	~194 dB	32-bit fixed-point format, practically limited by the analog noise floor

Verification (16-bit):

$\text{SNR}_{16} = 6.02 \times 16 + 1.76 = 96.32 + 1.76 = 98.08 \text{ dB}$

Verification (24-bit):

$\text{SNR}_{24} = 6.02 \times 24 + 1.76 = 144.48 + 1.76 = 146.24 \text{ dB}$

The human ear’s dynamic range spans from 0 dB (hearing threshold) to 120 dB (pain threshold), a total of 120 dB. 16-bit already provides approximately 98 dB, which exceeds the requirements of real-world audio use cases (a professional recording studio noise floor is around -70 dBFS). The extra 48 dB of 24-bit is not for the listener — it is headroom for engineers during gain adjustments, mix compression, and other processing.

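Narration (from the Chinese script): “Plugging in concrete numbers: sixteen bits gives about 98 dB, close to the roughly 120 dB dynamic range of the human ear. Twenty-four bits gives about 146 dB, some 26 dB beyond that range. So professional recording uses twenty-four bits not so that you can hear more detail, but to leave engineers enough gain headroom.”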

TPDF noise variance calculation:

TPDF noise $D = U_1 - U_2$ has variance $q^2/6$ , which is twice the uniform quantization error variance $q^2/12$ . This is the cost of dithering: the total error power rises from $q^2/12$ to $q^2/12 + q^2/6 = q^2/4$ , a factor of 3 (approximately +4.8 dB), but in exchange the mean and power of the quantization error no longer depend on the signal.

Noise shaping gain estimate:

Effective noise suppression of the first-order NTF within the audio band (0–20 kHz):

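$\text{in-band fraction} = \frac{\int_0^{\omega_{\max}} 4\sin^2(\omega/2)\, d\omega}{\int_0^{\pi} 4\sin^2(\omega/2)\, d\omega} = \frac{\omega_{\max} - \sin\omega_{\max}}{\pi} \approx \frac{2.850 - 0.288}{3.142} \approx 0.82$

This means approximately 82% of the shaped noise power remains below 20 kHz and about 18% lies between 20 kHz and the 22.05 kHz Nyquist limit; the small-angle estimate $\omega_{\max}^2/3$ is not valid here because $\omega_{\max} \approx 0.907\pi$ is not small. Without oversampling, the gain from first-order shaping is therefore a matter of where in the audio band the noise sits rather than how much of it is removed. A fifth-order NTF can concentrate the noise further toward the band edge, giving a perceptual SNR gain equivalent to about +4 bits.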
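Since $\omega_{\max} \approx 0.907\pi$ is far from the small-angle regime, the estimate can be checked against the closed form $\int_0^{\omega_{\max}} 4\sin^2(\omega/2)\, d\omega = 2(\omega_{\max} - \sin\omega_{\max})$ . The sketch below evaluates it at the CD rate and, for contrast, at a hypothetical 4× oversampled rate; the function names are illustrative.

```python
import math

def shaped_inband_fraction(w_max: float) -> float:
    """Share of first-order-shaped noise power that lies in [0, w_max]."""
    return (w_max - math.sin(w_max)) / math.pi

def inband_ratio(w_max: float) -> float:
    """In-band noise power with shaping divided by in-band power without."""
    return 2.0 * (w_max - math.sin(w_max)) / w_max

for label, fs in (("CD rate, 44.1 kHz", 44_100.0),
                  ("4x oversampled, 176.4 kHz", 176_400.0)):
    w_max = 2 * math.pi * 20_000.0 / fs        # 20 kHz band edge in rad/sample
    print(f"{label}: in-band share {shaped_inband_fraction(w_max):.3f}, "
          f"ratio vs unshaped {inband_ratio(w_max):.3f}, "
          f"small-angle estimate {w_max ** 2 / 3:.3f}")
```

At the CD rate the unweighted in-band noise power is higher with shaping than without, so any benefit there has to come from perceptual weighting. At the oversampled rate the ratio drops well below 1 and agrees closely with $\omega_{\max}^2/3$ .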

Musical Connection

Why CDs sound better than theory predicts

Theory tells us that 16-bit gives an SNR of approximately 98 dB; the human ear’s dynamic range is approximately 120 dB; seemingly the CD format has a 22 dB gap. In practice, however, TPDF dither and noise shaping are applied when the master is reduced to 16 bits, eliminating harmonic distortion components in the quantization error and pushing noise energy toward the top of the band, where human hearing is least sensitive. The final perceived SNR can be equivalent to 18–20 bit quantization depth, well beyond the nominal 16-bit figure. Behind the engineering conclusion that “the CD format is good enough” lies the collaboration of three theorems: the sampling theorem ensures exact reconstruction without aliasing, the TPDF theorem eliminates harmonic distortion, and the NTF theorem moves the cost to where the ear is least sensitive.

The analog counterpart

Noise shaping was not invented for digital audio. The pre-emphasis/de-emphasis networks in analog FM broadcasting boost high-frequency signals at the transmitter and reverse this at the receiver, achieving an effect analogous to noise shaping: the demodulated noise (predominantly at high frequencies) is suppressed by the de-emphasis filter. The idea parallels the error feedback behind the NTF: one is implemented with analog filters, the other in the digital $z$ -domain.

Connection to EP02: the sampling theorem from a Fourier perspective

The Fourier series in EP02 revealed the duality between the time domain and frequency domain. The sampling theorem is a precise exploitation of this duality: discrete sampling in the time domain (a periodic impulse sequence) is equivalent to periodic replication in the frequency domain. The bandlimited condition ensures the copies do not overlap, making reconstruction uniquely determined. “Sampling = time discretization” and “quantization = amplitude discretization” form two orthogonal dimensions of digital audio, each requiring independent mathematical tools for analysis and compensation.

Forward connections to EP35 and EP39

In EP35 (Phase Vocoder), sinc interpolation is used directly in time-stretching algorithms: re-interpolating the phase of frequency bins in the transform domain has the same mathematical structure as the Whittaker-Shannon reconstruction formula.

In EP39 (LUFS loudness standard), True Peak detection also relies on bandlimited (sinc-type) interpolation: the PCM waveform is upsampled by 4× before taking the maximum value, to detect amplitude overshoots that may exist between sample points. This is a direct engineering application of the sampling theorem from this episode: because the signal is bandlimited, inter-sample peaks are determined by the samples and can be estimated to any desired accuracy by interpolation.
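The inter-sample overshoot is easy to reproduce. The sketch below samples a unit-amplitude sine at one quarter of the sample rate with a 45° phase offset, so every sample lands at about ±0.707, and then evaluates a truncated sinc interpolation on a 4× grid. This illustrates the principle only; it is not the interpolation filter specified in ITU-R BS.1770, and the window length and names are assumptions.

```python
import math

def sinc(u: float) -> float:
    """Normalized sinc, with sinc(0) = 1."""
    return 1.0 if u == 0.0 else math.sin(math.pi * u) / (math.pi * u)

# Sine at fs/4 with a 45-degree phase offset: every sample is +/-0.7071.
indices = range(-400, 401)
samples = {n: math.sin(math.pi * n / 2 + math.pi / 4) for n in indices}

def interpolate(t: float) -> float:
    """Truncated sinc interpolation at time t, measured in sample periods."""
    return sum(x_n * sinc(t - n) for n, x_n in samples.items())

sample_peak = max(abs(v) for v in samples.values())
# Evaluate on a 4x oversampled grid near the centre of the window.
true_peak = max(abs(interpolate(k / 4)) for k in range(-16, 17))

print(f"sample peak      : {sample_peak:.4f}")
print(f"4x estimated peak: {true_peak:.4f}")
print(f"overshoot        : {20 * math.log10(true_peak / sample_peak):.2f} dB")
```

A meter that looks only at sample values would under-read this tone by roughly 3 dB, which is the kind of gap true-peak measurement is meant to close.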

Limitations and Open Problems

Physical unrealizability of the ideal sinc: The Whittaker-Shannon reconstruction formula requires an infinitely long sinc convolution kernel, which is not implementable in real-time systems. Practical DACs use finite-length FIR filter approximations, introducing truncation error (Gibbs phenomenon) and linear phase delay. The parameter selection of optimal truncation windows (Kaiser window, Blackman-Harris window, etc.) is an ongoing engineering optimization problem.
Unified framework of oversampling and Sigma-Delta modulation: Nearly all modern high-resolution audio ADCs use a Sigma-Delta (∑-Δ) architecture: first performing 1-bit quantization at a very high sample rate (several MHz), then downsampling through digital filters. This integrates TPDF dithering, noise shaping, and sample rate conversion into a unified architecture. Its precise noise analysis requires high-order NTFs (e.g., fifth- or seventh-order), beyond the scope of the first-order analysis in this episode.
Cooperation between non-uniform quantization and perceptual coding: This episode assumes uniform quantization (equal width per level). In practice, the human ear is far more sensitive to low-level signals than high-level ones. μ-law/A-law companding (telephone voice) and the bit allocation algorithms of MP3 (see EP40) are both based on perceptually weighted non-uniform quantization, achieving higher perceptual quality at the same bit count.
Optimal choice of dither spectrum: TPDF is not the only dithering strategy. Gaussian dither has shorter correlation, and blue-noise (high-pass shaped) dither can further push the noise floor energy toward high frequencies. A rigorous proof of optimality for the dither spectrum that minimizes perceived noise floor in a formal sense is currently lacking.

Conjecture (Upper Bound on Noise Shaping Order)

Intuitively, the higher the NTF order, the stronger the in-band noise suppression and the higher the perceptual SNR. However, high-order NTFs exhibit sharply increasing noise gain near Nyquist, which can cause quantizer overload and system instability. Conjecture: For a system with 44.1 kHz sampling and 20 kHz audio bandwidth, there exists an optimal NTF order $N^*$ that maximizes perceptually weighted SNR subject to stability constraints, and $N^* \leq 7$ .

Falsifiability criterion: If there exists an NTF of order $>7$ that is stable (no overload on any full-scale sinusoidal input) and exceeds the best known seventh-order design in A-weighted SNR, then this conjecture is falsified.

References

Shannon, C. E. (1949). Communication in the presence of noise. Proceedings of the IRE, 37(1), 10–21. (Original proof of the sampling theorem)
Nyquist, H. (1928). Certain topics in telegraph transmission theory. Transactions of the AIEE, 47(2), 617–644.
Whittaker, E. T. (1915). On the functions which are represented by the expansions of the interpolation theory. Proceedings of the Royal Society of Edinburgh, 35, 181–194.
Wannamaker, R. A., Lipshitz, S. P., Vanderkooy, J., & Wright, J. N. (2000). A theory of nonsubtractive dither. IEEE Transactions on Signal Processing, 48(2), 499–516. (Rigorous proof of TPDF independence)
Lipshitz, S. P., & Vanderkooy, J. (1992). Dither in digital audio. Journal of the Audio Engineering Society, 40(12), 966–979.
Adams, R. W. (1986). Design and implementation of an audio 18-bit analog-to-digital converter using oversampling techniques. Journal of the Audio Engineering Society, 34(3), 153–166. (Pioneering paper on ∑-Δ)
Norsworthy, S. R., Schreier, R., & Temes, G. C. (Eds.). (1997). Delta-Sigma Data Converters: Theory, Design, and Simulation. IEEE Press.
Zölzer, U. (2008). Digital Audio Signal Processing (2nd ed.). Wiley. Ch. 3 (Quantization).
Smith, J. O. (2011). Spectral Audio Signal Processing. CCRMA, Stanford University. (Open textbook, online: ccrma.stanford.edu/~jos/sasp)
Pohlmann, K. C. (2010). Principles of Digital Audio (6th ed.). McGraw-Hill. Ch. 2 (Sampling and Quantization).
Proakis, J. G., & Manolakis, D. G. (2006). Digital Signal Processing: Principles, Algorithms, and Applications (4th ed.). Pearson. Ch. 4.
ITU-R BS.1770-4. (2015). Algorithms to measure audio programme loudness and true-peak audio level. ITU Radiocommunication Sector. (True peak detection, related to EP39)