EP36: Quanta of Sound — Gabor and the Uncertainty Principle
Overview
In 1946, Dennis Gabor published a paper titled “Theory of Communication,” in which he proposed that any signal can be decomposed into elementary “acoustical quanta”: localized time-frequency atoms carrying the minimum possible uncertainty. This was not a metaphor borrowed from quantum mechanics; it was a precise mathematical claim: a signal cannot simultaneously have arbitrarily sharp resolution in both time and frequency. The constraint is written
$$\sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi},$$
where $\sigma_t$ is the temporal spread and $\sigma_f$ is the spectral spread of any square-integrable signal of unit energy. This inequality, the Heisenberg-Gabor uncertainty principle, is not an approximation or an engineering limitation. It is a theorem derived from the Cauchy-Schwarz inequality in $L^2(\mathbb{R})$.
The episode builds from the proof of this inequality upward through three major structures. First, Gabor atoms: windowed complex sinusoids that serve as elementary time-frequency cells. Projecting a signal onto a family of such atoms yields the Short-Time Fourier Transform (STFT). Second, Gaussian optimality: among all window shapes, only the Gaussian achieves the bound with equality, because the Fourier transform of a Gaussian is again a Gaussian. All other window shapes occupy a strictly larger area on the time-frequency plane. Third, the Balian-Low theorem: no Gabor system sampled at the critical density (one atom per unit area of the time-frequency plane) can form an orthonormal basis with a window that is simultaneously well-localized in both time and frequency. Perfect tiling of the plane and good localization are mathematically incompatible.
The musical consequences are direct. In granular synthesis, each grain is a physical realization of a Gabor atom: a windowed sinusoidal fragment placed at a specific time and pitch location. The uncertainty principle forces a fundamental tradeoff: short grains give precise rhythmic placement but smeared pitch; long grains give clear pitch but lose temporal precision. Xenakis exploited this tradeoff compositionally from the late 1950s onward, and the same mathematics governs the block-switching logic of modern MDCT-based audio codecs.
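Narration (translated from Chinese): “In 1946, Dennis Gabor wrote in a paper that sound is made of ‘acoustical quanta.’ He was not borrowing a metaphor from physics; he was solving a mathematical problem: no signal can be infinitely precise in both time and frequency at once. This is time-frequency uncertainty, and it is the protagonist of this episode. Sigma t times sigma f is greater than or equal to 1 over 4π.”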
Prerequisites / 前置知识
- Shannon information theory (EP07) — Shannon entropy quantifies information content; the capacity-bandwidth tradeoff in Shannon’s theorem is conceptually dual to the time-frequency uncertainty principle explored here.
- 1/f statistics and granular synthesis (EP26) — granular synthesis and Curtis Roads' work on grain-density power laws; the grains introduced there are the musical instances of the Gabor atoms formalized in this episode.
- Familiarity with the inner product, Fourier transform, and Parseval’s theorem (covered informally in earlier episodes on the Fourier series and sampling theorem).
Definitions
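Definition 36.1. Let $x \in L^2(\mathbb{R})$ with $\|x\|_2 = 1$ (unit energy). The temporal mean and temporal variance are
$$\bar t = \int_{-\infty}^{\infty} t\,|x(t)|^2\,dt, \qquad \sigma_t^2 = \int_{-\infty}^{\infty} (t-\bar t)^2\,|x(t)|^2\,dt.$$
The spectral mean and spectral variance are defined symmetrically using the Fourier transform $\hat x(f) = \int_{-\infty}^{\infty} x(t)\,e^{-2\pi i f t}\,dt$:
$$\bar f = \int_{-\infty}^{\infty} f\,|\hat x(f)|^2\,df, \qquad \sigma_f^2 = \int_{-\infty}^{\infty} (f-\bar f)^2\,|\hat x(f)|^2\,df.$$
Since $\|x\|_2 = 1$, Parseval’s theorem gives $\|\hat x\|_2 = 1$ as well, so $|x(t)|^2$ and $|\hat x(f)|^2$ are valid probability density functions. The quantities $\sigma_t$ and $\sigma_f$ are the standard deviations of these distributions, measuring effective duration and effective bandwidth respectively.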
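The definitions above translate almost directly into a numerical estimate. The following is a minimal sketch, assuming a uniformly sampled signal and using NumPy's FFT as a stand-in for the continuous Fourier transform; the helper name `spreads` and the test Gaussian are illustrative choices, not part of the episode.

```python
import numpy as np

def spreads(x, dt):
    """Estimate (sigma_t, sigma_f) of a sampled signal x with sample spacing dt."""
    n = len(x)
    t = (np.arange(n) - n // 2) * dt
    p_t = np.abs(x) ** 2
    p_t = p_t / p_t.sum()                 # discrete probability density in time
    t_mean = np.sum(t * p_t)
    sigma_t = np.sqrt(np.sum((t - t_mean) ** 2 * p_t))

    f = np.fft.fftfreq(n, d=dt)           # frequency axis in Hz
    p_f = np.abs(np.fft.fft(x)) ** 2
    p_f = p_f / p_f.sum()                 # discrete probability density in frequency
    f_mean = np.sum(f * p_f)
    sigma_f = np.sqrt(np.sum((f - f_mean) ** 2 * p_f))
    return sigma_t, sigma_f

# Hypothetical test signal: a Gaussian envelope sampled at 1 kHz.
dt = 1e-3
t = (np.arange(8192) - 4096) * dt
gauss = np.exp(-np.pi * (t / 0.1) ** 2)

sigma_t, sigma_f = spreads(gauss, dt)
product = sigma_t * sigma_f
print(sigma_t, sigma_f, product, 1 / (4 * np.pi))
```

Normalizing $|x|^2$ and $|\hat x|^2$ to sum to one plays the role of the unit-energy assumption, so the input does not need to be pre-scaled. The estimate is only meaningful when the signal has decayed well inside the sampled interval and its spectrum sits well below the Nyquist frequency.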
A Gabor atom (also called a logon in Gabor’s original 1946 paper) with time center $t_0$, frequency center $f_0$, and window function $g \in L^2(\mathbb{R})$ is the function
$$g_{t_0,f_0}(t) = g(t - t_0)\,e^{2\pi i f_0 t}.$$
The parameter $t_0$ translates the window in time; the parameter $f_0$ modulates the carrier frequency. When $g$ is a normalized Gaussian, the atom achieves the minimum time-frequency area permitted by the uncertainty principle.
Interpretation: A Gabor atom is a complex sinusoid of frequency $f_0$ Hz “windowed” to a neighborhood of time $t_0$. It occupies a rectangular cell of area proportional to $\sigma_t \sigma_f$ in the time-frequency plane.
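Given a signal $x \in L^2(\mathbb{R})$ and a window function $g \in L^2(\mathbb{R})$, the Short-Time Fourier Transform of $x$ is the function of two variables defined by
$$V_g x(t_0, f_0) = \int_{-\infty}^{\infty} x(t)\,\overline{g(t - t_0)}\,e^{-2\pi i f_0 t}\,dt,$$
that is, the inner product of $x$ with the Gabor atom centered at $(t_0, f_0)$.
The squared modulus $|V_g x(t_0, f_0)|^2$ is the spectrogram of $x$: it gives the energy density of $x$ near time $t_0$ and frequency $f_0$. The phase of $V_g x$ carries local frequency information used in the phase vocoder (EP35).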
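To make the projection onto atoms concrete, here is a small sketch that evaluates one STFT coefficient as a discretized inner product with a Gaussian Gabor atom. The function name `stft_point`, the 440 Hz test tone, and the 10 ms window spread are assumptions made for illustration only.

```python
import numpy as np

def stft_point(x, t, g_sigma, t0, f0):
    """Inner product of x with a unit-energy Gaussian Gabor atom centred at (t0, f0)."""
    dt = t[1] - t[0]
    # Window chosen so that |window|^2 has standard deviation g_sigma.
    window = np.exp(-((t - t0) ** 2) / (4 * g_sigma ** 2))
    atom = window * np.exp(2j * np.pi * f0 * t)
    atom = atom / np.sqrt(np.sum(np.abs(atom) ** 2) * dt)
    return np.sum(x * np.conj(atom)) * dt

fs = 8000.0
t = np.arange(0, 0.5, 1 / fs)
x = np.cos(2 * np.pi * 440.0 * t)            # hypothetical test tone

# One vertical slice of the spectrogram at t0 = 0.25 s.
freqs = np.arange(200.0, 801.0, 20.0)
spectrogram_slice = np.array(
    [abs(stft_point(x, t, 0.01, 0.25, f)) ** 2 for f in freqs]
)
peak_freq = freqs[np.argmax(spectrogram_slice)]
print(peak_freq)
```

Evaluating the coefficient one atom at a time is slow but mirrors the definition; practical implementations compute all frequencies of a frame at once with an FFT.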
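Fix a window $g \in L^2(\mathbb{R})$ and lattice parameters $a, b > 0$. The Gabor system generated by $(g, a, b)$ is the collection
$$\mathcal{G}(g,a,b) = \bigl\{\, g_{ma,nb}(t) = g(t - ma)\,e^{2\pi i n b t} \;:\; m, n \in \mathbb{Z} \,\bigr\}.$$
The sampling density of this lattice is $1/(ab)$, measuring atoms per unit area of the time-frequency plane. The system is a Gabor frame if there exist constants $0 < A \le B < \infty$ (frame bounds) such that for all $x \in L^2(\mathbb{R})$:
$$A\,\|x\|_2^2 \;\le\; \sum_{m,n \in \mathbb{Z}} \bigl|\langle x, g_{ma,nb} \rangle\bigr|^2 \;\le\; B\,\|x\|_2^2.$$
The critical density is $1/(ab) = 1$, corresponding to $ab = 1$ (one atom per unit time-frequency area). At densities $1/(ab) > 1$ (oversampling), stable reconstruction is possible and the expansion coefficients are non-unique; at $1/(ab) < 1$ (undersampling), the system cannot be complete and reconstruction fails.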
Main Theorems / 主要定理
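Theorem 36.1 (Heisenberg-Gabor uncertainty principle). Let $x \in L^2(\mathbb{R})$ satisfy $\|x\|_2 = 1$, $t\,x(t) \in L^2(\mathbb{R})$, and $f\,\hat x(f) \in L^2(\mathbb{R})$. With $\sigma_t, \sigma_f$ as in Definition 36.1, we have
$$\sigma_t\,\sigma_f \;\ge\; \frac{1}{4\pi}.$$
Equality holds if and only if $x$ is a (possibly time-shifted and frequency-modulated) Gaussian: $x(t) = C\,e^{-\alpha (t - t_0)^2}\,e^{2\pi i f_0 t}$ for some $\alpha > 0$ and $t_0, f_0 \in \mathbb{R}$, with $C$ chosen to give unit norm.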
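Proof. Without loss of generality, assume $\bar t = 0$ and $\bar f = 0$ (the general case follows by modulating and shifting $x$). We need to show
$$\left(\int t^2\,|x(t)|^2\,dt\right)\left(\int f^2\,|\hat x(f)|^2\,df\right) \;\ge\; \frac{1}{16\pi^2}.$$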
Step 1: Relate frequency spread to a derivative.
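By the Fourier differentiation rule, $\widehat{x'}(f) = 2\pi i f\,\hat x(f)$, so
$$\int f^2\,|\hat x(f)|^2\,df = \frac{1}{4\pi^2}\int \bigl|\widehat{x'}(f)\bigr|^2\,df = \frac{1}{4\pi^2}\int |x'(t)|^2\,dt,$$
where the second equality uses Parseval’s theorem. Therefore
$$\sigma_f^2 = \frac{1}{4\pi^2}\,\|x'\|_2^2.$$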
Step 2: Apply the Cauchy-Schwarz inequality.
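By Cauchy-Schwarz in $L^2(\mathbb{R})$, applied to the functions $t\,x(t)$ and $x'(t)$,
$$\left|\int t\,\overline{x(t)}\,x'(t)\,dt\right|^2 \;\le\; \left(\int t^2\,|x(t)|^2\,dt\right)\left(\int |x'(t)|^2\,dt\right) = \sigma_t^2\,\|x'\|_2^2.$$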
Step 3: Evaluate the left-hand side using integration by parts.
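Since $\frac{d}{dt}|x(t)|^2 = 2\,\mathrm{Re}\bigl(\overline{x(t)}\,x'(t)\bigr)$, integrating by parts (the boundary term vanishes since $t\,|x(t)|^2$ decays at infinity) gives
$$\mathrm{Re}\int t\,\overline{x(t)}\,x'(t)\,dt = \frac{1}{2}\int t\,\frac{d}{dt}|x(t)|^2\,dt = -\frac{1}{2}\int |x(t)|^2\,dt = -\frac{1}{2},$$
using $\|x\|_2 = 1$. Therefore the left-hand side of the Cauchy-Schwarz inequality satisfies
$$\left|\int t\,\overline{x(t)}\,x'(t)\,dt\right|^2 \;\ge\; \left(\mathrm{Re}\int t\,\overline{x(t)}\,x'(t)\,dt\right)^2 = \frac{1}{4}.$$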
Step 4: Combine.
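From Steps 2 and 3:
$$\frac{1}{4} \;\le\; \sigma_t^2\,\|x'\|_2^2.$$
Multiplying both sides by $\frac{1}{4\pi^2}$ and using Step 1:
$$\frac{1}{16\pi^2} \;\le\; \sigma_t^2\,\sigma_f^2.$$
Taking square roots: $\sigma_t\,\sigma_f \ge \frac{1}{4\pi}$.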
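Equality condition: Cauchy-Schwarz is an equality if and only if $x'(t) = \lambda\,t\,x(t)$ for some constant $\lambda \in \mathbb{C}$. In that case $\int t\,\overline{x(t)}\,x'(t)\,dt = \lambda\,\sigma_t^2$, and equality in the integration-by-parts step requires this integral to equal its real part $-\tfrac{1}{2}$. This reality condition forces $\lambda$ to be real and negative, say $\lambda = -2\alpha$ with $\alpha > 0$. The ODE $x'(t) = -2\alpha\,t\,x(t)$ has general solution $x(t) = C\,e^{-\alpha t^2}$, a Gaussian. Including the displaced/modulated general case gives the stated equality condition.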
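Narration (translated from Chinese): “Let us write this inequality out clearly. Take a signal centered at time t, with temporal width sigma t, whose frequency content is concentrated in a band of width sigma f. The Cauchy-Schwarz inequality tells us that the squared inner product of two L2 functions is at most the product of their squared norms. Applying this to the signal and its derivative gives sigma t times sigma f greater than or equal to 1 over 4π.”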
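Among all unit-norm functions $g \in L^2(\mathbb{R})$, the only functions achieving the equality
$$\sigma_t\,\sigma_f = \frac{1}{4\pi}$$
are Gaussian atoms of the form $g(t) = C\,e^{-\alpha (t - t_0)^2}\,e^{2\pi i f_0 t}$ ($\alpha > 0$, $t_0, f_0 \in \mathbb{R}$, $C$ a normalization constant). In particular, the standard Gaussian
$$g(t) = 2^{1/4}\,e^{-\pi t^2}$$
has Fourier transform
$$\hat g(f) = 2^{1/4}\,e^{-\pi f^2},$$
which is again a Gaussian. No other window shape (rectangular, Hann, Hamming, Blackman, etc.) achieves $\sigma_t\,\sigma_f = \frac{1}{4\pi}$.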
Existence (Gaussian achieves equality):
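For $g(t) = C\,e^{-\alpha t^2}$ (with $\bar t = 0$ by symmetry), compute directly:
$$\sigma_t^2 = \int t^2\,|g(t)|^2\,dt = |C|^2 \int t^2\,e^{-2\alpha t^2}\,dt.$$
Using the Gaussian integrals $\int e^{-2\alpha t^2}\,dt = \sqrt{\pi/(2\alpha)}$ (which fixes $|C|^2 = \sqrt{2\alpha/\pi}$) and $\int t^2\,e^{-2\alpha t^2}\,dt = \frac{1}{4\alpha}\sqrt{\pi/(2\alpha)}$, we get $\sigma_t^2 = \frac{1}{4\alpha}$.
The Fourier transform is $\hat g(f) = C\sqrt{\pi/\alpha}\;e^{-\pi^2 f^2/\alpha}$, another Gaussian with parameter $\pi^2/\alpha$ in the exponent. By the same calculation, $\sigma_f^2 = \frac{1}{4(\pi^2/\alpha)} = \frac{\alpha}{4\pi^2}$.
Therefore:
$$\sigma_t^2\,\sigma_f^2 = \frac{1}{4\alpha}\cdot\frac{\alpha}{4\pi^2} = \frac{1}{16\pi^2}, \qquad\text{so}\qquad \sigma_t\,\sigma_f = \frac{1}{4\pi},$$
confirming that the Gaussian achieves equality.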
Uniqueness (only Gaussian achieves equality):
This follows directly from the equality condition in Theorem 36.1: the Cauchy-Schwarz step is an equality if and only if the two functions in the inner product are proportional. As shown there, this forces $g'(t) = -2\alpha\,t\,g(t)$ for some $\alpha > 0$, whose only solutions are Gaussians.
Comparison with other windows: For a rectangular window of width $T$, one can compute $\sigma_t = T/\sqrt{12}$ and $\sigma_f = \infty$ (since the sinc-function spectrum has infinite variance). For a Hann window, the spectral variance is finite but $\sigma_t\,\sigma_f > \frac{1}{4\pi}$, strictly exceeding the Gaussian bound.
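The comparison between windows can be checked numerically. The sketch below assumes a real, symmetric window (so both means are zero) and uses the time-domain identity $\sigma_f^2 = \|w'\|_2^2 / (4\pi^2 \|w\|_2^2)$ from Step 1 of the proof instead of an FFT. The helper name `tf_product` and the grid parameters are illustrative assumptions.

```python
import numpy as np

def tf_product(w, t):
    """sigma_t * sigma_f for a real, symmetric window w sampled on a uniform grid t."""
    dt = t[1] - t[0]
    energy = np.sum(w ** 2) * dt
    sigma_t = np.sqrt(np.sum(t ** 2 * w ** 2) * dt / energy)
    dw = np.gradient(w, dt)                        # numerical derivative w'(t)
    sigma_f = np.sqrt(np.sum(dw ** 2) * dt / energy) / (2 * np.pi)
    return sigma_t * sigma_f

T = 1.0
t = np.linspace(-3 * T, 3 * T, 60001)
hann = np.where(np.abs(t) <= T / 2, np.cos(np.pi * t / T) ** 2, 0.0)
gauss = np.exp(-np.pi * t ** 2)

bound = 1 / (4 * np.pi)
hann_ratio = tf_product(hann, t) / bound
gauss_ratio = tf_product(gauss, t) / bound
print(hann_ratio, gauss_ratio)
```

The Gaussian ratio should come out at 1 up to discretization error, and the Hann ratio slightly above 1. The exact figure depends on this variance-based definition of spread, so it need not coincide with values computed under other normalization conventions. The derivative-based formula would not work for a rectangular window, whose jump discontinuities are exactly what makes $\sigma_f$ infinite.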
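Narration (translated from Chinese): “The Gaussian, g of t equals e to the minus t squared, is the only function that attains equality in the inequality: sigma t times sigma f is exactly 1 over 4π. The reason is that the Fourier transform of a Gaussian is again a Gaussian: compact in time and equally compact in frequency. No other shape can do this.”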
Let $g \in L^2(\mathbb{R})$ and $a, b > 0$ with $ab = 1$ (critical sampling density). If the Gabor system $\mathcal{G}(g,a,b)$ forms an orthonormal basis (or more generally, a Riesz basis) for $L^2(\mathbb{R})$, then
$$\left(\int t^2\,|g(t)|^2\,dt\right)\left(\int f^2\,|\hat g(f)|^2\,df\right) = \infty.$$
That is, either the window has infinite temporal variance, or it has infinite spectral variance (or both). It is impossible to have a critically sampled Gabor Riesz basis whose window is simultaneously well-localized in both time and frequency.
We sketch the proof via the Zak transform, which is the standard tool for studying critical-density Gabor systems.
Step 1: The Zak transform.
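For $a = b = 1$ (say, without loss of generality), the Zak transform of $g$ is
$$Zg(t,\nu) = \sum_{k \in \mathbb{Z}} g(t - k)\,e^{2\pi i k \nu}, \qquad (t,\nu) \in [0,1)^2.$$
The Zak transform is a unitary map from $L^2(\mathbb{R})$ to $L^2([0,1)^2)$. The Gabor system $\mathcal{G}(g,1,1)$ is an orthonormal basis for $L^2(\mathbb{R})$ if and only if $|Zg(t,\nu)| = 1$ for almost every $(t,\nu) \in [0,1)^2$.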
Step 2: Continuity forces a zero.
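A fundamental property of the Zak transform is that if $g$ is continuous and decays sufficiently rapidly, then $Zg$ is continuous on $\mathbb{R}^2$ and satisfies the quasi-periodicity condition:
$$Zg(t+1,\nu) = e^{2\pi i \nu}\,Zg(t,\nu), \qquad Zg(t,\nu+1) = Zg(t,\nu).$$
Suppose $Zg$ had no zeros. By the quasi-periodicity, tracing $Zg$ around the boundary of $[0,1]^2$ gives a closed loop in $\mathbb{C}\setminus\{0\}$ whose winding number must equal 1. But the boundary can be shrunk continuously to a point inside the square, and a zero-free continuous $Zg$ would carry this shrinking to a contraction of the loop within $\mathbb{C}\setminus\{0\}$, which forces winding number 0. This contradiction shows that a continuous $Zg$ must have at least one zero in the square. Therefore $Zg$ cannot have $|Zg| = 1$ everywhere: it must vanish somewhere.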
Step 3: Zero of Zak transform ↔ unbounded localization.
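For fixed $t$, the Zak transform $Zg(t,\cdot)$ is a Fourier series in $\nu$ whose coefficients are the samples $g(t-k)$; the decay of those coefficients is controlled by the localization of $g$. A precise calculation (using Poisson summation) shows:
$$Zg(t,\nu) = e^{2\pi i t \nu}\,Z\hat g(\nu,-t),$$
so the localization of $\hat g$ controls the regularity of $Zg$ in the variable $t$ in the same way.
If both $\int t^2\,|g(t)|^2\,dt < \infty$ and $\int f^2\,|\hat g(f)|^2\,df < \infty$, then $Zg$ belongs to the local Sobolev space $H^1_{\mathrm{loc}}(\mathbb{R}^2)$, and in this sketch it is treated as continuous (the complete proofs in Gröchenig, 2001, handle this regularity step with more care). Combined with Step 2, the Zak transform then has a zero, preventing $|Zg| = 1$ everywhere. This contradicts the orthonormality condition from Step 1.
Conclusion: For $\mathcal{G}(g,1,1)$ to be an orthonormal basis, $Zg$ must equal 1 in modulus a.e. But good localization forces $Zg$ to be continuous and hence to have a zero. The two requirements are incompatible: one of the two integrability conditions on $g$ or $\hat g$ must fail, i.e., at least one of $\sigma_t$, $\sigma_f$ is infinite.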
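The zero forced by Step 2 can be observed directly for the Gaussian. The sketch below assumes the convention $Zg(t,\nu) = \sum_k g(t-k)\,e^{2\pi i k \nu}$ with $a = b = 1$ and truncates the sum, which is harmless for a rapidly decaying window. The function name `zak` and the truncation length are illustrative.

```python
import numpy as np

def zak(g, t, nu, k_max=20):
    """Truncated Zak transform Zg(t, nu) = sum_k g(t - k) exp(2 pi i k nu)."""
    k = np.arange(-k_max, k_max + 1)
    return np.sum(g(t - k) * np.exp(2j * np.pi * k * nu))

def gauss(s):
    """Unit-norm Gaussian window 2^(1/4) exp(-pi s^2)."""
    return 2 ** 0.25 * np.exp(-np.pi * s ** 2)

centre = zak(gauss, 0.5, 0.5)       # terms k and 1 - k cancel in pairs
corner = zak(gauss, 0.0, 0.0)       # all terms positive, so no zero here

# Quasi-periodicity check: Zg(t + 1, nu) = exp(2 pi i nu) Zg(t, nu).
shifted = zak(gauss, 1.3, 0.2)
expected = np.exp(2j * np.pi * 0.2) * zak(gauss, 0.3, 0.2)

print(abs(centre), abs(corner), abs(shifted - expected))
```

Because $|Zg|$ vanishes at the center of the unit square, the integer-lattice Gabor system built from this Gaussian cannot be an orthonormal basis, in line with the theorem.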
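Narration (translated from Chinese): “The Balian-Low theorem answers: no. If a Gabor frame is sampled at the critical density, that is, with exactly one atom per time-frequency cell, then the window has unbounded spread either in time or in frequency. You cannot have both perfect tiling and good localization.”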
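Let $ab < 1$ (oversampling; density $1/(ab) > 1$). Then the Balian-Low obstruction does not apply: it is possible to construct a Gabor frame $\mathcal{G}(g,a,b)$ with a window $g$ that is well-localized in both time and frequency (finite $\sigma_t$ and $\sigma_f$), while still enabling stable reconstruction of any $x \in L^2(\mathbb{R})$ via the frame expansion
$$x = \sum_{m,n \in \mathbb{Z}} \langle x, g_{ma,nb} \rangle\,\tilde g_{ma,nb},$$
where $\{\tilde g_{ma,nb}\}$ is the dual frame. In particular, a Gaussian window with any hop size $a < 1/b$ yields a valid frame.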
Numerical Examples
Time-frequency area comparison for common windows (normalized to same temporal support):
| Window | Relative TF area | Notes |
|---|---|---|
| Gaussian | 1.00 (minimum) | Unique equality case |
| Hann | 1.04 | Good sidelobe suppression |
| Hamming | 1.07 | Slightly larger TF area |
| Rectangular | $\infty$ (undefined) | Sinc spectrum has infinite variance |
Grain length tradeoff in granular synthesis:
Consider a grain of duration $T$ seconds using a Gaussian envelope with $\sigma_t = T/4$ (one standard deviation occupies a quarter of the grain length). The minimum spectral spread is then $\sigma_f = 1/(4\pi\sigma_t)$.
| Grain length $T$ | $\sigma_t$ (ms) | Min. $\sigma_f$ (Hz) | Min. bandwidth | Perceptual result |
|---|---|---|---|---|
| 5 ms | 1.25 ms | ≈ 63.7 Hz | Very wide | Pitch indeterminate; “click” or noise burst |
| 20 ms | 5 ms | ≈ 15.9 Hz | Wide | Pitch rough, ~quarter-tone |
| 100 ms | 25 ms | ≈ 3.2 Hz | Narrow | Pitch clear; about 12 cents at A4 |
| 500 ms | 125 ms | ≈ 0.64 Hz | Very narrow | Pitch very clear; temporal envelope lost |
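Derivation for $T = 20$ ms:
$$\sigma_t = \frac{T}{4} = 5\ \text{ms}, \qquad \sigma_f \ge \frac{1}{4\pi\,\sigma_t} = \frac{1}{4\pi \times 0.005\ \text{s}} \approx 15.9\ \text{Hz}.$$
At A4 = 440 Hz, a bandwidth of ±15.9 Hz corresponds to a frequency uncertainty of ±15.9/440 ≈ ±3.6%, roughly ±62 cents, which lies between a quarter-tone (50 cents) and a semitone (100 cents). In granular synthesis this manifests as perceptibly rough pitch when grains are 20 ms long.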
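The same arithmetic can be repeated for every grain length. This is a small sketch under the assumptions stated above (Gaussian envelope, $\sigma_t = T/4$, reference pitch 440 Hz); the function name `grain_uncertainty` is a hypothetical helper.

```python
import math

def grain_uncertainty(grain_ms, pitch_hz=440.0):
    """Minimum sigma_f in Hz, and its size in cents above pitch_hz,
    for a Gaussian grain whose sigma_t is a quarter of the grain length."""
    sigma_t = (grain_ms / 1000.0) / 4.0
    sigma_f = 1.0 / (4.0 * math.pi * sigma_t)
    cents = 1200.0 * math.log2(1.0 + sigma_f / pitch_hz)
    return sigma_f, cents

for grain_ms in (5, 20, 100, 500):
    sigma_f, cents = grain_uncertainty(grain_ms)
    print(f"{grain_ms:4d} ms  sigma_f = {sigma_f:6.2f} Hz  ({cents:.1f} cents at 440 Hz)")
```

Because the cents value depends on the ratio $\sigma_f / f$, the same grain length blurs low pitches far more than high ones.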
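Narration (translated from Chinese): “The shorter the grain, the higher the time resolution: you can control exactly when the sound appears. But if the grain is too short, sigma f grows and the pitch blurs; what you hear is a ‘click’, not a note. The longer the grain, the clearer the pitch, but the sense of timing disappears and the sound becomes a sustained texture.”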
Critical density example:
For STFT with hop size $H$ and window length $N$ samples at sample rate $f_s$:
- Time resolution: $a = H/f_s$
- Frequency resolution: $b = f_s/N$
- Sampling density: $1/(ab) = N/H$
For the CD standard with $f_s = 44100$ Hz, $N$ samples and $H = N/4$ samples (75% overlap): $1/(ab) = N/H = 4$. This is four times the critical density, well within the regime where the Balian-Low obstruction is absent and a well-localized Gaussian window can be used.
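The mapping from STFT settings to lattice parameters is easy to script. In the sketch below the function name `stft_lattice` and the specific sizes (a 2048-sample window with a 512-sample hop) are assumptions picked only to illustrate 75% overlap.

```python
def stft_lattice(window_len, hop, sample_rate):
    """Map discrete STFT settings to Gabor lattice parameters and density."""
    a = hop / sample_rate              # time step between frames, in seconds
    b = sample_rate / window_len       # spacing between frequency bins, in Hz
    density = window_len / hop         # equals 1 / (a * b)
    if density > 1:
        regime = "oversampled"
    elif density == 1:
        regime = "critical"
    else:
        regime = "undersampled"
    return a, b, density, regime

a, b, density, regime = stft_lattice(window_len=2048, hop=512, sample_rate=44100)
print(f"a = {a * 1000:.2f} ms, b = {b:.2f} Hz, density = {density:g} ({regime})")
```

The density depends only on the ratio of window length to hop, not on the sample rate, which is why overlap percentage alone tells you where a configuration sits relative to the critical density.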
Musical Connection / 音乐联系
From information theory to acoustical quanta
EP07 (Shannon information theory) established that the information capacity of a channel is bounded by bandwidth and signal-to-noise ratio. Gabor’s 1946 paper drew an explicit parallel: if information in a signal is localized both in time and frequency, then the number of distinguishable information cells per unit time-frequency area is bounded. A Gabor atom occupies the minimum possible area on the time-frequency plane, so it is the most efficient carrier of one “quantum” of acoustic information. The fundamental unit is the logon, carrying approximately $\log_2(1 + \mathrm{SNR})$ bits, which is Shannon’s formula written for a single time-frequency cell. This is the same Shannon entropy from EP07, now geometrically embodied in a single atom of sound.
Granular synthesis: Xenakis and Curtis Roads
EP26 (1/f statistics and granular synthesis) introduced Curtis Roads' discovery that grain densities in natural granular textures exhibit long-range power-law correlations: the grains are not scattered randomly but cluster on multiple time scales. The uncertainty principle explains why different grain lengths are auditorily distinct: a 5 ms grain is mathematically incapable of carrying a definite pitch; a 200 ms grain cannot convey a sharp rhythmic attack. Xenakis composed with clouds of short grains in Analogique A/B (1958–59) and Granular Synthesis (1970) precisely because he wanted sound textures that occupied specific regions of the time-frequency plane without crystallizing into definite pitch or rhythm, a compositional exploitation of the uncertainty principle. Roads' fractal grain statistics from EP26 are a further layer: not only is each individual grain constrained by the inequality, but the distribution of grains across time scales is itself scale-invariant.
MDCT block switching: uncertainty as adaptive design
Block switching, treated later in EP40 (MDCT and perceptual audio coding), is a direct application. Modern audio codecs (MP3, AAC, Vorbis) use MDCT blocks of two sizes; in AAC, for example, a “long” block (2048 samples at 44100 Hz ≈ 46 ms) serves stationary signals, and “short” blocks (256 samples ≈ 6 ms) serve transients. This is an engineering implementation of the uncertainty tradeoff: a long block gives good frequency resolution (narrow $\sigma_f$) for stable tones, but poor time localization (wide $\sigma_t$). If a percussive transient arrives mid-block, the coding artifacts spread over the entire block duration, producing an audible “pre-echo.” Switching to short blocks reduces $\sigma_t$ at the cost of increased $\sigma_f$, sacrificing frequency resolution for temporal accuracy. The block-switching decision algorithm (detecting signal stationarity via sub-band energy variance) is an automated traversal of the $\sigma_t\,\sigma_f \ge \frac{1}{4\pi}$ constraint surface.
The STFT and COLA: Balian-Low in engineering form
An earlier episode on the STFT introduced the Constant Overlap-Add (COLA) condition as the requirement for perfect reconstruction. COLA requires that the sum of time-shifted window copies equals a constant, which is precisely the synthesis frame condition for an oversampled Gabor system. The Balian-Low theorem explains why engineers do not use critically sampled STFT: at $ab = 1$, no well-localized window can yield a reconstruction basis. The ubiquitous 75% overlap ($1/(ab) = 4$) in production audio tools is not merely an empirical preference: it sits comfortably inside the oversampled regime that the Balian-Low theorem makes necessary for well-localized windows.
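The COLA condition described above can be tested by summing shifted copies of a window and checking whether the result is flat. The sketch below assumes a periodic Hann window of 1024 samples; the helper name `overlap_add_profile` and the deliberately awkward hop of 300 samples are illustrative choices.

```python
import numpy as np

def overlap_add_profile(window, hop, n_frames=32):
    """Sum of hop-shifted copies of the window.
    The returned interior is constant exactly when COLA holds."""
    n = len(window)
    total = np.zeros(hop * (n_frames - 1) + n)
    for m in range(n_frames):
        total[m * hop : m * hop + n] += window
    return total[n:-n]                  # drop edges where fewer copies overlap

n = 1024
hann = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)   # periodic Hann

profile_75 = overlap_add_profile(hann, n // 4)   # 75% overlap
profile_odd = overlap_add_profile(hann, 300)     # hop that does not divide n

print(profile_75.min(), profile_75.max())
print(profile_odd.min(), profile_odd.max())
```

With a hop of a quarter window the cosine terms of the four overlapping copies cancel and the sum is flat, while the 300-sample hop leaves a visible ripple that would amplitude-modulate a naive overlap-add resynthesis.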
The rectangle-sinc duality (EP30)
The rectangular window used in naive STFT analysis is an extreme case of the uncertainty tradeoff. The Fourier transform of a rectangular window of width $T$ is $T\,\mathrm{sinc}(fT)$, which decays as $1/|f|$ and has infinite spectral variance ($\sigma_f = \infty$). Conversely, a sinc window in the time domain, which has perfect spectral concentration, is infinitely long in time ($\sigma_t = \infty$). These two extremes (sharp time boundary with infinite frequency spread, and sharp frequency boundary with infinite time spread) are the two poles of the uncertainty principle, separated by the entire space of intermediate windows with the Gaussian sitting at the minimum-area point.
Limits and Open Questions / 局限性与开放问题
- Infinite support of the Gaussian window: The Gaussian is the unique minimizer of $\sigma_t\,\sigma_f$, but it has infinite support and cannot be implemented exactly in any digital system. Real implementations use truncated Gaussians (multiplied by a compact-support window), which break the equality but can be made arbitrarily close to it. Quantifying the tradeoff between truncation length and deviation from the uncertainty bound is a practical optimization problem without a universally agreed solution.
- Balian-Low beyond the classical setting: The classical Balian-Low theorem is stated for Riesz bases. At critical density a Gabor frame is automatically a Riesz basis, so the theorem already covers frames there; the delicate questions concern its extensions (other lattices, weaker localization measures, quantitative versions). The precise characterization of which window functions are admissible remains an active research area in time-frequency analysis.
- Multiwindow and vector-valued Gabor systems: Using multiple windows simultaneously (superframes) can partially circumvent the Balian-Low obstruction. A system of $K$ windows at density $1/K$ each (total density 1) can have all windows well-localized. This is related to the theory of MIMO communication and has potential applications in multi-channel audio analysis, but the optimal construction of such multiwindow systems is not yet fully characterized.
- Heisenberg uncertainty vs. Rényi entropy uncertainty: The classical uncertainty principle measures spread via variance ($\sigma_t^2$, $\sigma_f^2$). Alternative formulations use Rényi entropy or Wehrl entropy instead of second moments, yielding tighter or different uncertainty inequalities. Whether these entropic uncertainty principles have a cleaner musical or perceptual interpretation than the Heisenberg formulation is an open question.
- Discrete Gabor analysis: The continuous theory developed here requires adaptation for finite discrete signals. For a length-$N$ DFT, the Balian-Low theorem does not apply in its classical form, and perfect critically sampled Gabor bases do exist (e.g., the Zak-domain construction). However, for large $N$, the discrete and continuous theories converge, and the same localization tradeoffs become apparent.
Conjecture: In a two-block-size MDCT codec (long block length $N_L$, short block length $N_S$), the optimal switching threshold (the signal non-stationarity measure above which switching to short blocks minimizes total perceptual distortion) should be expressible as a function of the ratio $N_L/N_S$ and the masked threshold shape from the auditory model alone, independently of bit rate and sample rate.
Falsifiability criterion: If experiments with a standardized auditory model (e.g., ISO 532B loudness) and variable bit rates show that the optimal threshold varies with bit rate at fixed $N_L/N_S$, the conjecture is falsified, implying that the coding rate itself must enter the block-switching decision.
Academic References / 参考文献
- Gabor, D. (1946). Theory of communication. Journal of the Institution of Electrical Engineers, 93(26), 429–457. (Original paper introducing acoustical quanta and the time-frequency uncertainty principle for signals)
- Heisenberg, W. (1927). Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik. Zeitschrift für Physik, 43(3–4), 172–198. (Physics precursor; the mathematical structure is identical)
- Gröchenig, K. (2001). Foundations of Time-Frequency Analysis. Birkhäuser, Boston. (Standard graduate reference; covers Gabor frames, Zak transform, Balian-Low theorem with complete proofs)
- Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM. (Ch. 3: frames for the windowed Fourier transform; Ch. 4: time-frequency density and the Balian-Low theorem)
- Balian, R. (1981). Un principe d’incertitude fort en théorie du signal ou en mécanique quantique. Comptes Rendus de l’Académie des Sciences, 292, 1357–1362. (First published proof of the Balian-Low theorem)
- Low, F. (1985). Complete sets of wave packets. In C. DeTar et al. (Eds.), A Passion for Physics: Essays in Honor of Geoffrey Chew (pp. 17–22). World Scientific. (Independent proof of the Balian-Low theorem)
- Lyubarskii, Yu. I. (1992). Frames in the Bargmann space of entire functions. Advances in Soviet Mathematics, 11, 167–180. (Proof that Gaussian Gabor systems form frames whenever $ab < 1$, that is, above the critical density)
- Seip, K., & Wallstén, R. (1992). Density theorems for sampling and interpolation in the Bargmann-Fock space II. Journal für die reine und angewandte Mathematik, 429, 107–113. (Companion paper; characterizes the density thresholds for Gaussian Gabor frames)
- Roads, C. (2001). Microsound. MIT Press. (Comprehensive treatment of granular synthesis; musical applications of time-frequency uncertainty in compositional practice)
- Xenakis, I. (1971). Formalized Music: Thought and Mathematics in Composition. Indiana University Press. (Compositional use of granular clouds; stochastic time-frequency distributions)
- Allen, J. B. (1977). Short term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Transactions on Acoustics, Speech, and Signal Processing, 25(3), 235–238. (Classical reference on STFT overlap-add reconstruction; COLA condition)
- Zibulski, M., & Zeevi, Y. Y. (1997). Analysis of multiwindow Gabor-type schemes by frame methods. Applied and Computational Harmonic Analysis, 4(2), 188–221. (Multiwindow Gabor systems and partial circumvention of Balian-Low)
- Feichtinger, H. G., & Strohmer, T. (Eds.). (1998). Gabor Analysis and Algorithms: Theory and Applications. Birkhäuser. (Engineering-oriented treatment of Gabor frames, discrete Gabor analysis, and audio applications)