EP08: Shepard Tones and the Tritone Paradox
Overview
A tone rises through twelve semitones. Then it rises again. It has been rising for thirty seconds and is at exactly the same pitch it started on.
This is not an acoustic trick; it is a topological fact. Roger Shepard’s 1964 construction exploits the two-dimensional structure of human pitch perception: every pitch has a chroma (position on a circle, $S^1$) and a height (octave number, $\mathbb{R}$). Pitch lives on a helix $\mathcal{H} = S^1 \times \mathbb{R}$. By spreading the spectral energy across many octaves under a fixed bell-shaped envelope, Shepard erases the height coordinate, leaving only the chroma circle $S^1$.
Once height is ambiguous, the tritone (six semitones, diametrically opposite on the chroma circle) has no objective “direction.” Diana Deutsch’s experiments, first reported in 1986, showed that an individual listener consistently hears a given Shepard tritone pair as ascending or as descending, while different listeners disagree with one another. Her later work linked the pattern to linguistic background, specifically the pitch range of the speech a listener grew up hearing. Perception fills the missing dimension with a culturally acquired prior.
Mathematically, this episode covers quotient topology, the universal cover of $S^1$, and a capsule account of spherical harmonics for 3D audio.
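“Mathematics reveals the structure; perception fills in the content. When the evidence runs out, what you hear is not the sound itself but yourself.”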
Prerequisites
- Fourier Series and Timbre (EP02): Fourier decomposition of audio signals into sinusoidal partials
- All-Interval Rows and $\mathbb{Z}_{12}$ (EP04): the cyclic group structure of pitch classes
Definitions
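The pitch helix is the topological space
$$\mathcal{H} = S^1 \times \mathbb{R}$$
parameterized by:
- Chroma $c \in S^1 = \mathbb{R}/\mathbb{Z}$: the pitch class modulo octave equivalence, with $c = k/12$ for semitone $k \in \{0, 1, \dots, 11\}$
- Height $h \in \mathbb{R}$: the octave number (e.g., $h = 4$ for the octave containing middle C)
A physical tone with fundamental frequency $f$ maps to the point $(c, h)$ with chroma $c = \{\log_2(f/f_0)\}$ (fractional part, giving position on the circle) and height $h = \lfloor \log_2(f/f_0) \rfloor$, where $f_0$ is a fixed reference frequency.
The helix projects onto two quotient spaces:
- Onto $S^1$ (chroma circle) by forgetting height: $\pi_c(c, h) = c$
- Onto $\mathbb{R}$ (height line) by forgetting chroma: $\pi_h(c, h) = h$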
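The chroma/height split of a frequency can be sketched in a few lines. This is a minimal illustration, not part of the original notes: the function name `pitch_coordinates` and the choice of reference frequency (C0 in equal temperament with A4 = 440 Hz) are assumptions made here for the example.

```python
import math

def pitch_coordinates(freq_hz, ref_hz):
    """Split a frequency into (chroma, height) relative to ref_hz.

    chroma is the fractional part of log2(freq / ref), in [0, 1);
    height is the integer part, i.e. the octave number.
    """
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    octaves = math.log2(freq_hz / ref_hz)
    height = math.floor(octaves)
    chroma = octaves - height
    return chroma, height

# Assumed reference: C0 in 12-tone equal temperament with A4 = 440 Hz.
C0_HZ = 440.0 / 2.0 ** (4 + 9 / 12)

chroma, height = pitch_coordinates(440.0, C0_HZ)  # A4
semitone = round(chroma * 12) % 12                # pitch class index
```

With this reference, A4 lands in octave 4 at pitch class 9, and doubling the frequency changes only the height.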
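A Shepard tone at chroma $c$ is the superposition of pure sinusoids at all octaves of the corresponding pitch, weighted by a fixed bell-shaped (Gaussian) spectral envelope $A$ centered at a fixed reference height $h_0$:
$$S_c(t) = \sum_{k \in \mathbb{Z}} A(c + k)\, \sin\!\left(2\pi f_0\, 2^{c + k}\, t\right), \qquad A(h) = \exp\!\left(-\frac{(h - h_0)^2}{2\sigma^2}\right),$$
where $\sigma$ is the envelope width in octaves and the integer $k$ nearest to $h_0 - c$ selects the octave closest to the center of the envelope.
The crucial property: the envelope $A$ is the same for every chroma $c$. Moving from $c$ to $c + 1/12$ (one semitone) shifts every component up by one semitone: the highest partials fade toward silence while new low partials fade in. The overall spectral shape is preserved.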
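The construction above can be sketched as a short synthesis routine. This is an illustrative sketch under stated assumptions, not Shepard's original parameters: the reference frequency (16.35 Hz, roughly C0), the envelope center and width (in octaves), the audible band limits, and the function names are all choices made here.

```python
import math

def shepard_partials(chroma, ref_hz=16.35, center=5.0, sigma=1.0,
                     lo_hz=20.0, hi_hz=20000.0):
    """Return (frequency_hz, amplitude) pairs for one Shepard tone.

    Partials sit at heights chroma + k (in octaves above ref_hz) and are
    weighted by a Gaussian envelope that does not depend on chroma.
    """
    partials = []
    for k in range(12):  # enough octaves to cover the audible band
        height = chroma + k
        freq = ref_hz * 2.0 ** height
        if lo_hz <= freq < hi_hz:
            amp = math.exp(-((height - center) ** 2) / (2.0 * sigma ** 2))
            partials.append((freq, amp))
    return partials

def shepard_samples(chroma, duration_s=0.01, sample_rate=44100):
    """Render a short buffer, normalized so that |sample| <= 1."""
    partials = shepard_partials(chroma)
    total = sum(amp for _, amp in partials)
    n = round(duration_s * sample_rate)
    return [
        sum(amp * math.sin(2.0 * math.pi * freq * i / sample_rate)
            for freq, amp in partials) / total
        for i in range(n)
    ]

tone = shepard_partials(0.0)
samples = shepard_samples(0.0)
```

The partials are octave-spaced and the loudest one sits at the envelope center; changing `chroma` slides the partials under the same fixed envelope.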
A tritone is the interval of six semitones, exactly half an octave. On the chroma circle $S^1$, two pitches $c$ and $c + 1/2$ are antipodal: they are diametrically opposite, equidistant in both the clockwise and counterclockwise directions.
The tritone was called diabolus in musica (the devil in music) in medieval counterpoint and was forbidden in strict voice-leading (see EP05).
The chroma circle $S^1$ is the quotient space $\mathbb{R}/\mathbb{Z}$ (the real line with integers identified): two real numbers $x$ and $y$ are identified iff $x - y \in \mathbb{Z}$.
The continuous surjective map $p: \mathbb{R} \to S^1$ defined by $p(x) = e^{2\pi i x}$ is the universal covering map. The real line $\mathbb{R}$ is the universal cover of $S^1$; it is simply connected (no holes), while $S^1$ has fundamental group $\pi_1(S^1) \cong \mathbb{Z}$.
In pitch perception: the full pitch continuum $\mathbb{R}$ (ordered by frequency) is the universal cover of the chroma circle. Hearing a Shepard tone means perceiving only the chroma $p(x) \in S^1$ without being able to lift it to a specific height in $\mathbb{R}$.
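Path lifting through the covering map can be illustrated numerically. The sketch below is an assumption-laden illustration, not a method from the source: it lifts a sampled path on the circle (values in $[0, 1)$) to the real line by always taking the shorter arc, which is only valid when consecutive samples are less than half a turn apart. The name `lift_chroma_path` is hypothetical.

```python
def lift_chroma_path(chromas):
    """Lift a sampled path on the circle R/Z to the real line.

    Each step is lifted along the shorter arc. A step of exactly half a
    turn (a tritone) has no shorter arc; this sketch arbitrarily treats
    it as a downward step.
    """
    if not chromas:
        return []
    lifted = [chromas[0]]
    for prev, cur in zip(chromas, chromas[1:]):
        step = (cur - prev) % 1.0   # upward arc length, in [0, 1)
        if step >= 0.5:             # the downward arc is shorter (or tied)
            step -= 1.0
        lifted.append(lifted[-1] + step)
    return lifted

# Two octaves of an ascending semitone scale, seen only as chroma.
scale = [(k / 12) % 1.0 for k in range(25)]
heights = lift_chroma_path(scale)
```

The chroma sequence returns to its starting value, while the lifted path keeps climbing: a closed loop on the circle lifts to an open path in the cover.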
Main Theorems
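Step 1 (Individual steps go up): When chroma increases from $c$ to $c + 1/12$, every partial in $S_c$ at frequency $f_k = f_0\, 2^{c + k}$ (with $f_0$ the reference frequency) maps to $f_k'$ in $S_{c + 1/12}$ where $f_k' = 2^{1/12} f_k > f_k$. Every partial rises.
Step 2 (Global spectrum is unchanged): The set of frequencies $\{f_0\, 2^{c + k}\}_{k \in \mathbb{Z}}$ with the same envelope values is invariant under the shift $c \mapsto c + 1$ (equivalently, twelve semitone steps), because this replaces $k$ by $k + 1$, a relabeling of the summation index that preserves the sum (under appropriate convergence of the envelope).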
Step 3 (Paradox): The listener hears individual partials rising (Step 1) but the global spectral envelope is unchanged (Step 2). The auditory system integrates the local motion cue (upward), producing the percept of continuous ascent, while lacking the information needed to track absolute height. After 12 steps, the identical signal repeats — the perception is of infinite ascent on a circle.
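Steps 1 and 2 can be checked numerically. The following is a sketch under assumptions chosen here (reference frequency, Gaussian envelope center and width, an audible band of 20 Hz to 20 kHz); the helper names `partial` and `audible_spectrum` are hypothetical.

```python
import math

REF_HZ = 16.35            # assumed reference frequency (roughly C0)
CENTER, SIGMA = 5.0, 1.0  # assumed envelope center and width, in octaves

def partial(chroma, k):
    """Frequency and envelope amplitude of the partial at height chroma + k."""
    height = chroma + k
    amp = math.exp(-((height - CENTER) ** 2) / (2.0 * SIGMA ** 2))
    return REF_HZ * 2.0 ** height, amp

def audible_spectrum(chroma, lo_hz=20.0, hi_hz=20000.0):
    """All partials inside the audible band, sorted by frequency."""
    found = [partial(chroma, k) for k in range(-20, 21)]
    return sorted(p for p in found if lo_hz <= p[0] < hi_hz)

c = 0.25

# Step 1: one semitone up multiplies each partial's frequency by 2**(1/12).
ratios = [partial(c + 1 / 12, k)[0] / partial(c, k)[0] for k in range(1, 10)]

# Step 2: a full octave up (c -> c + 1) only relabels k, so the audible
# spectrum (frequencies and amplitudes) is the same as at the start.
start = audible_spectrum(c)
octave_later = audible_spectrum(c + 1.0)
same_spectrum = len(start) == len(octave_later) and all(
    math.isclose(f1, f2) and math.isclose(a1, a2)
    for (f1, a1), (f2, a2) in zip(start, octave_later)
)
```

Every ratio equals one equal-tempered semitone, yet after twelve such steps the spectrum is indistinguishable from the starting one.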
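Let $c$ and $c + 1/2$ be antipodal chromas. A Shepard tone pair $(S_c, S_{c + 1/2})$ presented in sequence carries no objective information about direction (ascending vs. descending) because the clockwise and counterclockwise arcs on $S^1$ between $c$ and $c + 1/2$ are equal in length.
Formally: any “direction function” $d: S^1 \times S^1 \to \{-1, +1\}$ that is antisymmetric ($d(a, b) = -d(b, a)$) and depends only on the circular arc length from its first argument to its second must satisfy $d(c, c + 1/2) = d(c + 1/2, c) = -d(c, c + 1/2)$, hence $d(c, c + 1/2) = 0$, which is not in $\{-1, +1\}$; i.e., no such function can be defined at antipodal pairs.
The circular distance between antipodal points $c$ and $c + 1/2$ on $S^1$ (normalized to circumference 1) is $1/2$ in both directions. An antisymmetric function satisfying $d(a, b) = \operatorname{sgn}\!\left(\tfrac{1}{2} - \delta(a, b)\right)$, where $\delta(a, b) = (b - a) \bmod 1$ is the clockwise arc from $a$ to $b$ (i.e., $+1$ if that arc is shorter than $1/2$, $-1$ otherwise), is undefined (or must choose a convention) exactly when the arc lengths are equal, i.e., at separation $1/2$. There is no canonical choice of $d \in \{-1, +1\}$ at this degeneracy point.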
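The degeneracy at the tritone shows up directly if the shortest-arc rule is written out. The sketch below works in semitone units (circumference 12 rather than 1) so that all arithmetic is exact; the function name `direction` and the use of `None` for the undefined case are conventions chosen here.

```python
def direction(first, second):
    """Shortest-arc direction between two pitch classes (semitones mod 12).

    Returns +1 (ascending), -1 (descending), 0 (same pitch class),
    or None for a tritone, where both arcs have length 6.
    """
    up = (second - first) % 12  # length of the ascending arc
    if up == 0:
        return 0
    if up == 6:
        return None             # equal arcs: no canonical choice
    return 1 if up < 6 else -1

c_to_d = direction(0, 2)        # C -> D
c_to_b = direction(0, 11)       # C -> B
c_to_f_sharp = direction(0, 6)  # C -> F sharp, a tritone
```

Away from the tritone the rule is antisymmetric; at the tritone any value it returned would have to equal its own negative.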
Experimental consequence (Deutsch 1986 and follow-up studies): Since the sound itself is ambiguous, listeners’ responses are determined by an internal prior, a learned “height template” mapping chroma to expected height. Deutsch found that this template correlates with the pitch range of the listener’s native language or dialect: speakers of tonal languages with high fundamental frequencies (e.g., Vietnamese) have templates peaked at different chroma values than speakers of non-tonal languages (e.g., California English). The template is thought to be calibrated in childhood by the statistical distribution of fundamental frequencies in ambient speech.
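A toy version of a height template makes the idea concrete. This is a deliberately simplified, hypothetical rule and not Deutsch's analysis procedure: it assumes each listener has a single peak pitch class and hears whichever tone lies closer to that peak on the chroma circle as the higher one. The names `circular_distance` and `heard_direction` are invented for this sketch.

```python
def circular_distance(a, b):
    """Shorter arc between two pitch classes, in semitones (0 to 6)."""
    d = abs(a - b) % 12
    return min(d, 12 - d)

def heard_direction(first, second, peak):
    """Toy rule: the pitch class nearer to the listener's peak is 'higher'.

    Returns +1 if the pair is heard as ascending, -1 as descending,
    and None when both tones are equally far from the peak.
    """
    d_first = circular_distance(first, peak)
    d_second = circular_distance(second, peak)
    if d_first == d_second:
        return None
    return -1 if d_first < d_second else 1

# The same tritone pair (D then G sharp) for two hypothetical listeners.
listener_a = heard_direction(2, 8, peak=0)  # template peaked at C
listener_b = heard_direction(2, 8, peak=6)  # template peaked at F sharp
```

Under this rule the two listeners hear the identical stimulus in opposite directions, which is the qualitative pattern the tritone paradox describes.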
The chroma circle is the quotient group $\mathbb{R}/12\mathbb{Z}$ (real numbers modulo 12, corresponding to semitones modulo octave equivalence). The natural projection $\pi: \mathbb{R} \to \mathbb{R}/12\mathbb{Z}$ is a group homomorphism with kernel $12\mathbb{Z}$.
The pitch helix decomposes pitch into an element of the quotient group (chroma) and a coset representative (height). The Shepard construction projects out the coset representative, retaining only the quotient group element.
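$(\mathbb{R}, +)$ is an abelian group under addition. $12\mathbb{Z}$ is a closed subgroup. The quotient group $\mathbb{R}/12\mathbb{Z}$ has elements $[x] = x + 12\mathbb{Z}$, and inherits addition $[x] + [y] = [x + y]$. Since $12\mathbb{Z}$ is a discrete closed subgroup of $\mathbb{R}$, the quotient is a Lie group homeomorphic to $S^1$.
Any $x \in \mathbb{R}$ can be written uniquely as $x = 12n + r$ where $n \in \mathbb{Z}$ (the height component, selecting the octave) and $r \in [0, 12)$ (the chroma component). The projection $\pi$ forgets $n$ and retains $r$.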
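The unique decomposition into height and chroma is exactly what floor division computes. A minimal sketch in semitone units follows; `split_pitch` is a name chosen here, and the inputs are arbitrary semitone values rather than any particular note-numbering standard.

```python
def split_pitch(x):
    """Write x (in semitones) as 12 * n + r with integer n and 0 <= r < 12."""
    n, r = divmod(x, 12)  # floor division, so r is non-negative
    return int(n), r

def chroma(x):
    """The projection onto R / 12Z: forget n, keep r."""
    return split_pitch(x)[1]

above = split_pitch(61)    # one semitone above a multiple of 12
below = split_pitch(-1)    # one semitone below zero
between = split_pitch(7.5) # a quarter-tone position
```

Because `divmod` floors toward negative infinity, negative pitches still get a chroma in $[0, 12)$, and the projection respects addition modulo 12.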
Visual: A helix drawn in 3D space: the vertical axis is height (octave), the circular axis is chroma. A Shepard tone “flattens” the helix onto the circle by making the vertical coordinate invisible. An animation shows pitches climbing the helix while the projection onto the circle cycles.
Musical Connection
Bach’s Canon per Tonos (1747) — a topological anticipation
Bach’s Musical Offering (BWV 1079) contains a modulating canon, Canon per Tonos, in which each repetition ends a whole tone higher than it began. Six repetitions (six whole-tone steps, twelve semitones) return the music to the starting key, but one octave higher. Played in an endless loop, it creates an ascending-without-arriving sensation much like that of a Shepard scale, 217 years before Shepard’s experiment.
The mathematical structure is analogous: the canon traverses the chroma circle $S^1$ once (six modulating steps) while the fundamental frequency shifts upward, so the “height” component grows without bound. The listener perceives a circular journey only if each pass starts where the previous one began, which requires an implicit octave transposition.
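The arithmetic behind the canon's return can be written out in semitones. This is a small illustrative sketch; the starting key is labeled 0 for convenience and the variable names are choices made here.

```python
WHOLE_TONE = 2  # semitones per modulation

# Key centers after each of the six modulations, starting from key 0.
keys = [0]
for _ in range(6):
    keys.append(keys[-1] + WHOLE_TONE)

chromas = [k % 12 for k in keys]   # position on the chroma circle
octaves = [k // 12 for k in keys]  # height component
```

The chroma list ends where it began, while the final height is one octave above the start.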
Hofstadter’s Gödel, Escher, Bach (1979) opens with this canon as an emblem of strange loops — self-referential structures that seem to ascend or descend while cycling back.
Hans Zimmer’s Dunkirk (2017) — three layered Shepard scales
Christopher Nolan’s Dunkirk uses Shepard tones composed by Hans Zimmer as a structural device: three interlocking Shepard scales at different tempos correspond to the film’s three temporal scales (one hour, one day, one week). The perpetual ascending tension physically embodies the narrative’s escalating dread. The mathematical structure (three independent loops in $S^1$, perceived as a single climbing line) mirrors the film’s three narrative threads that independently cycle while the audience experiences them as linearly converging.
Spatial audio extension: spherical harmonics
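For 3D sound field representation (ambisonics), the one-dimensional chroma circle $S^1$ generalizes to the two-dimensional sphere $S^2$. The analogs of the Fourier modes on $S^1$ are the spherical harmonics:
$$Y_\ell^m(\theta, \phi) = \sqrt{\frac{2\ell + 1}{4\pi}\,\frac{(\ell - m)!}{(\ell + m)!}}\; P_\ell^m(\cos\theta)\, e^{i m \phi},$$
where $P_\ell^m$ are associated Legendre polynomials, $\ell \ge 0$ is the degree (angular frequency), and $m$ with $-\ell \le m \le \ell$ is the order. An order-$N$ ambisonics system uses all $(N + 1)^2$ spherical harmonics up to degree $N$, representing the 3D sound field as a finite spatial Fourier series on the sphere.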
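The number of spherical harmonics used at a given ambisonic order follows from counting index pairs. The sketch below only enumerates the (degree, order) pairs; it does not evaluate the harmonics and does not assume any particular channel ordering or normalization convention. The name `harmonic_indices` is chosen here.

```python
def harmonic_indices(max_degree):
    """List all (degree, order) pairs with degree <= max_degree.

    Each degree l contributes the 2l + 1 orders m = -l, ..., l.
    """
    return [(l, m) for l in range(max_degree + 1) for m in range(-l, l + 1)]

first_order = harmonic_indices(1)
third_order_count = len(harmonic_indices(3))
```

Summing $2\ell + 1$ over $\ell = 0, \dots, N$ gives $(N + 1)^2$, so a first-order system has four components and a third-order system sixteen.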
Limits and Open Questions
- Individual differences in the tritone paradox. Deutsch’s finding that native language predicts direction responses is robust across replication, but the mechanism (critical period learning of fundamental frequency statistics vs. genetic predisposition) is debated. A fully predictive mathematical model of the individual height template is an open problem in auditory neuroscience.
- Continuous Shepard tones and glissandi. The discrete Shepard scale uses 12 steps. A continuous variant (Risset glissando, named after Jean-Claude Risset, 1969) uses a continuously varying chroma and produces a continuously ascending tone. The mathematical treatment requires the topology of path spaces on $S^1$: which paths lift to bounded paths in the universal cover $\mathbb{R}$?
- Pitch perception in other species. Shepard tone experiments have been conducted on macaques (including rhesus monkeys) and some songbirds. Whether the tritone paradox (linguistically calibrated height template) generalizes to non-linguistic animals is an open question that would test the cultural vs. acoustic origin of the height prior.
- Higher-dimensional pitch spaces. Tymoczko’s voice-leading geometry and Balzano’s (1980) group-theoretic model of tonal pitch space both propose spaces richer than $S^1$. Whether there is a single “correct” mathematical model of pitch perception space, or whether different tasks (melody recognition, harmony, voice-leading) engage different geometric representations, is an active research question.
- Persistent homology of pitch spaces. Topological data analysis (TDA) can be applied to chroma features of audio to detect the topology of the pitch class distribution. Whether higher Betti numbers ($\beta_k$ with $k \ge 2$) appear in the pitch space of complex harmonies (chords, polyphony) is an open application of computational topology.
Academic References
- Shepard, R. N. (1964). Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36(12), 2346–2353. (The original Shepard tone paper.)
- Shepard, R. N. (1982). Geometrical approximations to the structure of musical pitch. Psychological Review, 89(4), 305–333. (The pitch helix formalization.)
- Deutsch, D. (1986). A musical paradox. Music Perception, 3(3), 275–280. (First report of the tritone paradox.)
- Deutsch, D., Henthorn, T., & Dolson, M. (2004). Absolute pitch, speech, and tone language: Some experiments and a proposed framework. Music Perception, 21(3), 339–356. (Language-pitch connection.)
- Risset, J.-C. (1969). Pitch control and pitch paradoxes demonstrated with computer-synthesized sounds. Journal of the Acoustical Society of America, 46(1A), 88. (The continuous Shepard / Risset glissando.)
- Hofstadter, D. R. (1979). Gödel, Escher, Bach: An Eternal Golden Braid. Basic Books. (Bach’s Canon per Tonos as a strange loop; Shepard tones as an emblem of circularity.)
- Tymoczko, D. (2011). A Geometry of Music. Oxford University Press. Ch. 2 (Pitch-class space as quotient).
- Hatcher, A. (2002). Algebraic Topology. Cambridge University Press. Ch. 1 (Covering spaces and the fundamental group of $S^1$).
- Ziemer, T. (2020). Psychoacoustics: Auditory Display in Science and Technology. Springer. Ch. 4 (Spatial audio and ambisonics).