
EP09: Combination Tones and Nonlinear Acoustics

Product-to-Sum Identities, Cochlear Nonlinearity, Tartini Tones

Video length 6:28 · Physics/Acoustics · Nonlinear Dynamics

Prerequisite episode: EP02 String Vibration and the Wave Equation

Watch on 小红书 (Xiaohongshu)

Overview

Play two pure tones — 200 Hz and 300 Hz. The spectrum shows exactly two lines. Yet some listeners perceive a faint 100 Hz drone that is not present in the air at all. It is manufactured inside the cochlea.

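Narration (translated from Chinese): “This 100 Hz does not exist in the air. It is manufactured by your cochlea. So where does it come from?”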

This episode answers that question through the mathematics of nonlinear systems. A linear, time-invariant system passes both frequencies through unchanged and adds nothing. The cochlea’s outer hair cells, however, act as nonlinear amplifiers: their input-output curve is S-shaped and saturating, commonly modeled by a tanh function, rather than a straight line. When you expand a nonlinear response $y = x + \varepsilon x^2 + \delta x^3 + \cdots$ for a two-tone input, algebra forces new frequencies into existence. These are the combination tones, at $|mf_1 \pm nf_2|$ for non-negative integers $m, n$ with $m + n \geq 1$.

The same mathematics that helps explain why just (pure) fifths feel “locked” also underlies a widely used newborn hearing-screening test: the distortion product otoacoustic emission (DPOAE) test, which records sound generated by the cochlea itself to check whether the outer hair cells are working.

Prerequisites

Fourier Analysis and the Wave Equation (EP02) — sinusoids as building blocks, frequency domain
Basic trigonometric identities: $\cos\alpha\cos\beta = \frac{1}{2}[\cos(\alpha-\beta)+\cos(\alpha+\beta)]$
Elementary differential equations (for the hair-cell oscillator model in EP31)

Definitions

Definition 9.1 (Linear System)

A system $L$ mapping input signals to output signals is linear if and only if it satisfies superposition:

L(af_1 + bf_2) = a\,L(f_1) + b\,L(f_2)

for all signals $f_1, f_2$ and scalars $a, b$ .

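Consequence for pure tones: if $L$ is also time-invariant (an LTI system, such as a fixed filter) and $x(t) = \cos(2\pi f_0 t)$ is a pure tone, then $L(x)(t) = G\cos(2\pi f_0 t + \varphi)$ for some gain $G$ and phase $\varphi$, so the output contains only the frequency $f_0$. An LTI system can rescale and delay the frequencies it receives but cannot create new ones. Linearity alone is not enough: the linear but time-varying map $x(t) \mapsto x(t)\cos(2\pi g t)$ moves energy to $|f_0 \pm g|$.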
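A minimal numerical sketch of this consequence, assuming a sample rate of 8000 Hz and a hypothetical two-tap averaging filter as the LTI system (neither comes from the text). The filter changes the amplitude at $f_0$ but leaves every other DFT bin empty. The helper names `tone_amplitude` and `two_tap_filter` are illustrative, not library functions.

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz
N = FS     # one second of samples: every integer frequency falls exactly on a DFT bin


def tone_amplitude(signal, freq):
    """Amplitude of the cosine component at `freq` Hz (freq > 0), read from one DFT bin."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS) for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


def two_tap_filter(signal):
    """Hypothetical LTI system y[n] = 0.5*x[n] + 0.5*x[n-1].

    Indexing wraps around (signal[-1] for n = 0), which is harmless here because the
    one-second test tone is exactly periodic over the window.
    """
    return [0.5 * signal[n] + 0.5 * signal[n - 1] for n in range(len(signal))]


f0 = 200
x = [math.cos(2 * math.pi * f0 * n / FS) for n in range(N)]
y = two_tap_filter(x)

# The filter rescales the 200 Hz line; every other bin stays (numerically) empty.
for f in (100, f0, 2 * f0, 3 * f0):
    print(f"{f:4d} Hz: input {tone_amplitude(x, f):.6f}  output {tone_amplitude(y, f):.6f}")
```

The output amplitude at 200 Hz equals the filter's gain there, $\cos(\pi f_0 / 8000) \approx 0.997$; no energy appears at 100, 400 or 600 Hz.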

Definition 9.2 (Nonlinear Polynomial Response)

A nonlinear response function $y = F(x)$ is one that cannot be written as $F(x) = cx$ for a constant $c$ . The simplest model is the Taylor expansion

y = x + \varepsilon x^2 + \delta x^3 + \cdots

where $\varepsilon, \delta \ll 1$ are small nonlinearity coefficients. The quadratic term $\varepsilon x^2$ is responsible for second-order combination tones; the cubic term $\delta x^3$ generates third-order combination tones.

Definition 9.3 (Combination Tone)

Given two input frequencies $f_1 < f_2$ , a combination tone is any frequency of the form

f_{m,n} = |m f_1 \pm n f_2|, \qquad m, n \in \mathbb{Z}_{\geq 0},\; m+n \geq 1

The most audible combination tones are:

Order	Formula	Name
2nd	$f_2 - f_1$	simple difference tone
2nd	$f_1 + f_2$	summation tone
3rd	$2f_1 - f_2$	cubic difference tone (CDT)
3rd	$2f_2 - f_1$	upper cubic difference tone

In human hearing, the cubic difference tone $2f_1 - f_2$ is the strongest, because the cochlear nonlinearity is predominantly cubic rather than quadratic.
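Definition 9.3 can be enumerated directly. The sketch below (the function name `combination_tones` is illustrative) lists every $|mf_1 \pm nf_2|$ up to a chosen order $m + n$ and records the lowest order that reaches each frequency. Primaries of 1000 and 1200 Hz are an arbitrary choice that keeps all low-order tones distinct.

```python
def combination_tones(f1, f2, max_order):
    """All frequencies |m*f1 ± n*f2| with m, n >= 0 and 1 <= m + n <= max_order.

    Returns a dict mapping each frequency to the smallest order m + n that produces it,
    sorted by frequency.
    """
    tones = {}
    for m in range(max_order + 1):
        for n in range(max_order + 1 - m):
            if m + n == 0:
                continue
            order = m + n
            for freq in (abs(m * f1 - n * f2), m * f1 + n * f2):
                if freq not in tones or order < tones[freq]:
                    tones[freq] = order
    return dict(sorted(tones.items()))


for freq, order in combination_tones(1000, 1200, 3).items():
    print(f"{freq:5d} Hz  (order {order})")
```

Here $f_2 - f_1 = 200$ Hz appears at order 2 and the cubic difference tone $2f_1 - f_2 = 800$ Hz at order 3. With $f_1 = 200$, $f_2 = 300$ Hz the cubic difference tone coincides with $f_2 - f_1 = 100$ Hz and is reported as order 2; that coincidence is the subject of Theorem 9.3.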

Definition 9.4 (Distortion Product Otoacoustic Emission (DPOAE))

A distortion product otoacoustic emission is a sound generated by the cochlea itself in response to two simultaneous pure-tone stimuli $f_1$ and $f_2$ ( $f_2 > f_1$ ). The emitted sound propagates back through the middle ear and can be recorded in the ear canal with a sensitive microphone.

The dominant DPOAE frequency is $2f_1 - f_2$. Its presence and amplitude serve as a diagnostic indicator of outer hair cell (OHC) function. When the middle ear is normal, absence or attenuation of the DPOAE at $2f_1 - f_2$ points to OHC damage; a middle-ear problem can also reduce or abolish the recorded emission.
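Clinical DPOAE measurements commonly hold the ratio $f_2/f_1$ fixed and step $f_2$ across frequency. A small sketch, assuming a ratio of 1.22 (a common choice, but an assumption here, not something this section specifies), shows where the $2f_1 - f_2$ emission lands; `dpoae_cdt_frequency` is a hypothetical helper.

```python
def dpoae_cdt_frequency(f2, ratio=1.22):
    """Return (f1, 2*f1 - f2) for a stimulus pair with f2 / f1 = ratio.

    ratio=1.22 is an assumed default, not a value taken from the text.
    """
    f1 = f2 / ratio
    return f1, 2 * f1 - f2


for f2 in (1000, 2000, 4000):
    f1, fdp = dpoae_cdt_frequency(f2)
    print(f"f2 = {f2} Hz -> f1 = {f1:.0f} Hz, 2f1 - f2 = {fdp:.0f} Hz")
```

Because $f_1 < f_2$, the emission always falls below both primaries, which is why a microphone can pick it out from the stimulus tones.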

Main Theorems

Theorem 9.1 (Second-Order Frequency Generation)

Let the input be $x(t) = A\cos(2\pi f_1 t) + B\cos(2\pi f_2 t)$ . Then the output of the quadratic term $\varepsilon x^2$ is

\varepsilon x^2 = \varepsilon\bigl[A\cos(2\pi f_1 t) + B\cos(2\pi f_2 t)\bigr]^2

which contains sinusoidal components at frequencies $0$, $2f_1$ , $2f_2$ , $f_1 + f_2$ , and $|f_1 - f_2|$ .

Specifically:

\varepsilon x^2 = \varepsilon\left[\frac{A^2+B^2}{2} + \frac{A^2}{2}\cos(4\pi f_1 t) + \frac{B^2}{2}\cos(4\pi f_2 t) + AB\cos(2\pi(f_1-f_2)t) + AB\cos(2\pi(f_1+f_2)t)\right]

Proof.

Expand $\varepsilon(A\cos\alpha + B\cos\beta)^2$ where $\alpha = 2\pi f_1 t$ and $\beta = 2\pi f_2 t$ :

\varepsilon x^2 = \varepsilon\left(A^2\cos^2\alpha + 2AB\cos\alpha\cos\beta + B^2\cos^2\beta\right)

Apply the identities $\cos^2\theta = \frac{1+\cos 2\theta}{2}$ and $\cos\alpha\cos\beta = \frac{1}{2}[\cos(\alpha-\beta)+\cos(\alpha+\beta)]$ :

= \varepsilon\left[\frac{A^2}{2}(1+\cos 2\alpha) + AB(\cos(\alpha-\beta)+\cos(\alpha+\beta)) + \frac{B^2}{2}(1+\cos 2\beta)\right]

= \varepsilon\left[\frac{A^2+B^2}{2} + \frac{A^2}{2}\cos(4\pi f_1 t) + \frac{B^2}{2}\cos(4\pi f_2 t) + AB\cos(2\pi(f_1-f_2)t) + AB\cos(2\pi(f_1+f_2)t)\right]

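The components at $0$, $2f_1$, $2f_2$, $f_1 + f_2$ and $|f_1 - f_2|$ arise purely from the algebraic expansion; none of them exists in the input spectrum. The cross-term frequencies $|f_1 - f_2|$ and $f_1 + f_2$ are the second-order combination tones. $\square$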
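A numerical check of the expansion, assuming a sample rate of 8000 Hz and arbitrary values $A = 1$, $B = 0.6$, $\varepsilon = 0.1$: square the two-tone signal and read the amplitude of each predicted component from a single DFT bin. The helper name `component_amplitude` is illustrative.

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz; one second of samples puts integer frequencies on exact DFT bins


def component_amplitude(signal, freq):
    """Amplitude of the cosine at `freq` Hz read from one DFT bin (DC is not doubled)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS) for n, x in enumerate(signal))
    c = abs(acc) / len(signal)
    return c if freq == 0 else 2 * c


A, B, eps = 1.0, 0.6, 0.1
f1, f2 = 200, 300
x = [A * math.cos(2 * math.pi * f1 * n / FS) + B * math.cos(2 * math.pi * f2 * n / FS)
     for n in range(FS)]
quad = [eps * v * v for v in x]  # output of the quadratic term alone

# Amplitudes predicted by Theorem 9.1, keyed by frequency in Hz
predicted = {
    0: eps * (A**2 + B**2) / 2,
    2 * f1: eps * A**2 / 2,
    2 * f2: eps * B**2 / 2,
    f2 - f1: eps * A * B,
    f1 + f2: eps * A * B,
}
for freq, amp in predicted.items():
    print(f"{freq:4d} Hz: measured {component_amplitude(quad, freq):.6f}  predicted {amp:.6f}")
```

The input frequencies 200 and 300 Hz themselves are absent from `quad`: the quadratic term only produces the new components.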

Theorem 9.2 (Third-Order Cubic Difference Tone)

For input $x(t) = A\cos(2\pi f_1 t) + B\cos(2\pi f_2 t)$ , the cubic term $\delta x^3$ generates, among other components, a sinusoid at the cubic difference tone frequency $2f_1 - f_2$ with amplitude $\frac{3}{4}\delta A^2 B$ .

In particular, for equal amplitudes $A = B$ , this term has amplitude $\frac{3}{4}\delta A^3$ .

Proof.

We need the cross-term in $\delta(A\cos\alpha + B\cos\beta)^3$ that produces frequency $2f_1 - f_2$ . The expansion yields terms of the form $\cos^2\alpha\cos\beta$ :

3\delta A^2 B \cos^2\alpha\cos\beta = 3\delta A^2 B \cdot \frac{1+\cos 2\alpha}{2} \cdot \cos\beta

= \frac{3\delta A^2 B}{2}\cos\beta + \frac{3\delta A^2 B}{2}\cos 2\alpha\cos\beta

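Apply the product formula $\cos 2\alpha\cos\beta = \frac{1}{2}[\cos(2\alpha-\beta)+\cos(2\alpha+\beta)]$ to the last term:

\frac{3\delta A^2 B}{4}\cos(2\alpha-\beta) + \frac{3\delta A^2 B}{4}\cos(2\alpha+\beta)

The term $\frac{3\delta A^2 B}{4}\cos(2\alpha-\beta) = \frac{3\delta A^2 B}{4}\cos(2\pi(2f_1-f_2)t)$ is the cubic difference tone at $2f_1 - f_2$.

No other part of the cubic contributes at this frequency when $f_1 < f_2 < 2f_1$: $A^3\cos^3\alpha$ yields only $f_1$ and $3f_1$, $B^3\cos^3\beta$ yields only $f_2$ and $3f_2$, and $3AB^2\cos\alpha\cos^2\beta$ yields $f_1$ and $2f_2 \pm f_1$. All of these are at least $f_1$, whereas $0 < 2f_1 - f_2 < f_1$. Hence the amplitude of the CDT component is exactly $\frac{3}{4}\delta A^2 B$. (Outside that range $|2f_1 - f_2|$ can coincide with another component, e.g. with $f_1$ when $f_2 = 3f_1$.) $\square$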
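The same kind of check for the cubic term, with arbitrary $A = 1$, $B = 0.5$, $\delta = 0.2$ and an assumed 8000 Hz sample rate. It reads the components at $2f_1 - f_2$ and $2f_2 - f_1$ and compares them with $\frac{3}{4}\delta A^2 B$ and its mirror image $\frac{3}{4}\delta A B^2$ (the latter follows from the $3AB^2\cos\alpha\cos^2\beta$ term by the same steps).

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz; one second of samples puts integer frequencies on exact DFT bins


def component_amplitude(signal, freq):
    """Amplitude of the cosine component at `freq` Hz (freq > 0), via one DFT bin."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS) for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


A, B, delta = 1.0, 0.5, 0.2
f1, f2 = 200, 300
x = [A * math.cos(2 * math.pi * f1 * n / FS) + B * math.cos(2 * math.pi * f2 * n / FS)
     for n in range(FS)]
cubic = [delta * v**3 for v in x]  # output of the cubic term alone

cdt = component_amplitude(cubic, 2 * f1 - f2)    # lower cubic difference tone, 100 Hz
upper = component_amplitude(cubic, 2 * f2 - f1)  # upper cubic difference tone, 400 Hz
print(f"2f1-f2: {cdt:.6f} (theory {0.75 * delta * A**2 * B:.6f})")
print(f"2f2-f1: {upper:.6f} (theory {0.75 * delta * A * B**2:.6f})")
```

With $f_1 = 200$ and $f_2 = 300$ Hz the eight cubic output frequencies (100, 200, 300, 400, 600, 700, 800, 900 Hz) are all distinct, so each bin isolates one term.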

Theorem 9.3 (Harmonic Alignment Under Just Intonation)

Let $f_1$ and $f_2 = \frac{3}{2}f_1$ form a just perfect fifth. Then every combination tone $|mf_1 \pm nf_2|$ with $m, n \in \{0,1,2\}$, $m + n \geq 1$, is an integer multiple of $\frac{f_1}{2}$. In particular, all of them belong to the harmonic series with fundamental $\frac{f_1}{2}$.

Proof.

Set $f_1 = 2F$ so that $f_2 = 3F$ , where $F = f_1/2$ . Then:

$f_2 - f_1 = F$ (1st harmonic of $F$ )
$f_1 = 2F$ (2nd harmonic)
$f_2 = 3F$ (3rd harmonic)
$2f_1 - f_2 = 4F - 3F = F$ (1st harmonic)
$2f_2 - f_1 = 6F - 2F = 4F$ (4th harmonic)
$f_1 + f_2 = 5F$ (5th harmonic)

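All values are positive integer multiples of $F$, and they fill the first five harmonics $\{F, 2F, 3F, 4F, 5F\}$ without gaps. The same holds for every pair $m, n$: $|mf_1 \pm nf_2| = |2m \pm 3n|\,F$, which is always an integer multiple of $F$.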

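Narration (translated from Chinese): “Look at this set of numbers: every one of them is an integer multiple of 100 Hz. The 1st, 2nd, 3rd, 4th and 5th harmonics, not one missing. What the neurons see is a neat fence, every post exactly the same distance apart. They fire in sync. The brain says: this is one sound.”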

$\square$
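The theorem can be checked with exact rational arithmetic. This sketch (the helper `harmonic_numbers` is hypothetical) sets $f_1 = 1$, takes $f_2/f_1 = 3/2$ as a `Fraction`, and divides every combination tone by $F = f_1/2$; an integer quotient means the tone sits on the harmonic series of $F$.

```python
from fractions import Fraction


def harmonic_numbers(ratio, max_mn=2):
    """Map each combination tone |m*f1 ± n*f2| (f1 = 1, f2 = ratio) to its multiple of F = f1/2.

    m and n run over 0..max_mn with m + n >= 1; a zero-frequency result would be skipped.
    """
    f1 = Fraction(1)
    f2 = Fraction(ratio)
    F = f1 / 2
    out = {}
    for m in range(max_mn + 1):
        for n in range(max_mn + 1):
            if m + n == 0:
                continue
            for freq in (abs(m * f1 - n * f2), m * f1 + n * f2):
                if freq != 0:
                    out[freq] = freq / F
    return dict(sorted(out.items()))


tones = harmonic_numbers(Fraction(3, 2))
# Every quotient has denominator 1, i.e. every tone is a whole-number harmonic of F.
print([int(h) for h in tones.values()])
```

Passing `Fraction(5, 4)` (a just major third) instead gives non-integer quotients, for example $1/2$ for $f_2 - f_1 = f_1/4$: that interval's common fundamental is $f_1/4$ rather than $f_1/2$.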

Prop 9.1 (12-TET Fifth Misalignment)

For the equal-tempered fifth $f_2 = 2^{7/12}f_1 \approx 1.4983\,f_1$, the combination tones $2f_1 - f_2$ and $f_2 - f_1$ are not integer multiples of any common fundamental: their ratio $(2 - r)/(r - 1)$ with $r = 2^{7/12}$ is irrational, because $r$ itself is irrational. The mistuning relative to the just fifth is $|f_2^{\text{ET}} - f_2^{\text{just}}| \approx 0.0017\,f_1$, about 2 cents per fifth: small but systematic.

Proof.

For $f_1 = 200$ Hz, $f_2^{\text{just}} = 300$ Hz, $f_2^{\text{ET}} = 200 \times 2^{7/12} \approx 299.66$ Hz.

Just CDT: $2(200) - 300 = 100$ Hz
ET CDT: $2(200) - 299.66 = 100.34$ Hz

These differ by 0.34 Hz. Likewise, $f_2^{\text{ET}} - f_1 = 99.66$ Hz instead of 100 Hz. In the just fifth the two difference tones coincide at 100 Hz; in the equal-tempered fifth they are displaced by about 0.34 Hz in opposite directions, so they no longer share a harmonic series (their ratio $100.34/99.66 \approx 1.0068$ is not a ratio of small integers). The two nearby components beat against each other at about $100.34 - 99.66 = 0.68$ Hz, a slow amplitude modulation that is perceived as a mild instability or “floating” quality compared to the locked just fifth. $\square$
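The arithmetic of the proof, reproduced as a short sketch for $f_1 = 200$ Hz (the helper names `fifth_report` and `cents` are illustrative). It computes both difference tones for the just and equal-tempered fifths, the beat rate between them, and the size of the ET mistuning in cents.

```python
import math


def fifth_report(f1, ratio):
    """Difference tones of the fifth f2 = ratio * f1 and the beat rate between them."""
    f2 = ratio * f1
    cdt = 2 * f1 - f2  # cubic difference tone 2f1 - f2
    qdt = f2 - f1      # simple (quadratic) difference tone f2 - f1
    return {"f2": f2, "cdt": cdt, "qdt": qdt, "beat": abs(cdt - qdt)}


def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200 * math.log2(ratio)


just = fifth_report(200, 3 / 2)
et = fifth_report(200, 2 ** (7 / 12))
for name, r in (("just", just), ("ET", et)):
    print(f"{name:>4}: f2 = {r['f2']:.2f} Hz, 2f1-f2 = {r['cdt']:.2f} Hz, "
          f"f2-f1 = {r['qdt']:.2f} Hz, beat = {r['beat']:.2f} Hz")
print(f"ET fifth is {cents(1.5 / 2 ** (7 / 12)):.3f} cents narrower than just")
```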

Numerical Example: Tartini’s Major Third

Tartini (1714) observed a third tone when bowing a major third. For a just major third $f_2/f_1 = 5/4$ :

Set $f_1 = 400$ Hz, $f_2 = 500$ Hz.

f_2 - f_1 = 100 \text{ Hz} \quad \Bigl(\text{two octaves below } f_1\Bigr)

2f_1 - f_2 = 800 - 500 = 300 \text{ Hz} \quad \Bigl(= \tfrac{3}{4}f_1\Bigr)

2f_2 - f_1 = 1000 - 400 = 600 \text{ Hz} \quad \Bigl(= \tfrac{3}{2}f_1\Bigr)

The difference tone at 100 Hz is $f_1/4$, two octaves below the lower note. Every tone here lies on the harmonic series of 100 Hz: the two notes are its 4th and 5th harmonics, and the combination tones $f_2 - f_1$, $2f_1 - f_2$ and $2f_2 - f_1$ are its 1st, 3rd and 6th harmonics. Tartini used the clarity of this 100 Hz ghost tone to verify that his double-stop was in tune.
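For notes given as whole numbers of hertz, the common fundamental is the greatest common divisor of the two frequencies, and every combination tone is an integer multiple of it. A sketch of Tartini's check under that assumption (`tartini_check` is a hypothetical name):

```python
from math import gcd


def tartini_check(f1, f2):
    """Common fundamental of two integer-Hz notes and the harmonic number of each tone.

    Assumes exact integer frequencies; gcd(f1, f2) divides every |m*f1 ± n*f2|.
    """
    fundamental = gcd(f1, f2)
    tones = {
        "f1": f1,
        "f2": f2,
        "f2 - f1": abs(f2 - f1),
        "2f1 - f2": abs(2 * f1 - f2),
        "2f2 - f1": abs(2 * f2 - f1),
        "f1 + f2": f1 + f2,
    }
    return fundamental, {name: freq // fundamental for name, freq in tones.items()}


fundamental, harmonics = tartini_check(400, 500)
print(f"common fundamental: {fundamental} Hz")
for name, k in harmonics.items():
    print(f"{name:>9}: harmonic {k}")
```

Mistuning the upper note to a hypothetical 503 Hz makes the greatest common divisor collapse to 1 Hz: the tones no longer share a low fundamental, a numerical analogue of the ghost tone losing its clarity.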

Musical Connection


Why pure fifths “lock” and equal-tempered fifths “float”

The perception of consonance in sustained intervals (especially by string players) is tightly coupled to combination tone alignment. When $f_2/f_1 = 3/2$ exactly, all second- and third-order combination tones fall on a shared harmonic series. The auditory nerve, which phase-locks to individual frequency components, receives a periodic pattern whose inter-spike intervals are all integer multiples of the same period. The brainstem perceives this as “one sound” — what string players call the sensation of locking or resonating in.

When $f_2/f_1 = 2^{7/12} \approx 1.4983$ , the combination tones are incommensurable. The auditory nerve receives multiple nearly-but-not-quite periodic patterns, producing slow intermodulation — perceived as a gentle beating or floating quality.

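Narration (translated from Chinese): “Combination tones are not the whole explanation of consonance, but they are one important mechanism behind the feeling of ‘locking’. It is not only the notes themselves that decide whether an interval is stable: the by-products your cochlea manufactures also get a vote.”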
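The periodicity claim can be illustrated numerically. The sketch below builds a waveform from the two primaries plus the two difference tones (the 0.2 relative level of the difference tones and the 8000 Hz sampling grid are arbitrary assumptions) and measures how far it is from repeating after 10 ms, the period of a 100 Hz fundamental. It is a statement about the signal only; it does not model the auditory nerve.

```python
import math

FS = 8000  # assumed sampling grid in Hz


def waveform(components, t):
    """Sum of cosines; components is a list of (frequency_hz, amplitude)."""
    return sum(a * math.cos(2 * math.pi * f * t) for f, a in components)


def periodicity_error(components, period, duration=1.0):
    """Largest |x(t + period) - x(t)| over the sampling grid; 0 means exactly periodic."""
    n = int(duration * FS)
    return max(abs(waveform(components, k / FS + period) - waveform(components, k / FS))
               for k in range(n))


def fifth_with_difference_tones(f1, ratio, dt_amp=0.2):
    """Primaries f1, f2 = ratio * f1 plus the tones 2f1 - f2 and f2 - f1 (hypothetical levels)."""
    f2 = ratio * f1
    return [(f1, 1.0), (f2, 1.0), (2 * f1 - f2, dt_amp), (f2 - f1, dt_amp)]


T = 1 / 100  # period of a 100 Hz fundamental when f1 = 200 Hz
print("just:", periodicity_error(fifth_with_difference_tones(200, 3 / 2), T))
print("ET:  ", periodicity_error(fifth_with_difference_tones(200, 2 ** (7 / 12)), T))
```

The just version repeats to within rounding error; the equal-tempered version drifts by roughly 1% of full scale every 10 ms.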

DPOAE and the cochlea as a sound source

The cubic difference tone $2f_1 - f_2$ is not merely perceived; it is physically emitted. Kemp (1978) showed that the cochlea itself emits sound that a sensitive microphone in the ear canal can record, and two-tone measurements soon afterwards detected a component at $2f_1 - f_2$ while the listener hears the stimulus. The cochlea is therefore not a passive receiver but an active, nonlinear amplifier that broadcasts evidence of its own computation.

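Narration (translated from Chinese): “In 1714 the Italian violinist Tartini heard this ghostly third tone while playing double stops. He taught his students to use it to tune just fifths: if the third tone was clear and steady, the intonation was right. He thought he had discovered a performance technique. In fact he had discovered a basic working principle of every mammalian cochlea, 149 years before Helmholtz’s rigorous mathematical analysis.”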

Forward connection to EP31 and EP32

The saturating, tanh-like curve of the outer hair cell is the same kind of nonlinearity found in the transistor differential pairs of the Moog ladder filter, which digital models of that filter typically represent with tanh (EP31), and the polynomial expansion $x + \varepsilon x^2 + \delta x^3$ is the generating mechanism behind Chebyshev waveshaping (EP32). Nonlinear distortion is simultaneously a biological fact, a psychoacoustic mechanism, and a synthesis tool.
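To connect the tanh model with Theorem 9.2: $\tanh x = x - x^3/3 + 2x^5/15 - \cdots$, so a pure tanh curve has $\varepsilon = 0$ and $\delta = -1/3$. For small inputs the cubic difference tone should therefore have amplitude close to $\frac{3}{4}\cdot\frac{1}{3}A^2B = \frac{1}{4}A^2B$, and the even-order tone $f_2 - f_1$ should be essentially absent. A sketch under those assumptions (primaries, levels and sample rate are arbitrary choices; `component_amplitude` is an illustrative helper):

```python
import cmath
import math

FS = 8000  # assumed sample rate in Hz; one second of samples puts integer frequencies on DFT bins


def component_amplitude(signal, freq):
    """Amplitude of the cosine component at `freq` Hz (freq > 0), via one DFT bin."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS) for n, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


f1, f2 = 1000, 1200  # illustrative primaries: 2f1 - f2 = 800 Hz, f2 - f1 = 200 Hz
A = B = 0.1          # small inputs, where tanh(x) is close to x - x**3/3
x = [A * math.cos(2 * math.pi * f1 * n / FS) + B * math.cos(2 * math.pi * f2 * n / FS)
     for n in range(FS)]
y = [math.tanh(v) for v in x]  # memoryless tanh saturation

measured = component_amplitude(y, 2 * f1 - f2)
predicted = abs(0.75 * (-1 / 3) * A**2 * B)  # Theorem 9.2 with delta = -1/3
print(f"CDT at {2 * f1 - f2} Hz: measured {measured:.3e}, cubic-term prediction {predicted:.3e}")
print(f"f2 - f1 at {f2 - f1} Hz: {component_amplitude(y, f2 - f1):.1e}")
```

The small gap between measurement and prediction (under 2% at this level) comes from the $x^5$ and higher terms. Because tanh is an odd function it produces no quadratic difference tone; a hair-cell curve that is not perfectly symmetric would add even-order tones such as $f_2 - f_1$.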

Limits and Open Questions

Cochlear mechanics vs. neural origin. Whether combination tones arise primarily in mechanical nonlinearity (Helmholtz attributed them to the middle ear; later measurements locate the dominant nonlinearity in basilar-membrane mechanics) or in the neural transduction process (Goldstein 1967) is still debated. The DPOAE evidence strongly favors a mechanical, pre-neural origin for the $2f_1-f_2$ tone, but higher-order combination tones may have neural contributions.
Threshold and masking. Combination tones are audible only above certain stimulus levels and in the absence of masking noise. A full psychoacoustic model requires integrating the nonlinear generation mechanism with critical-band masking theory (EP40). The transition between “audible CDT” and “masked CDT” is not sharp and depends on individual cochlear gain.
Musical consonance as a single-mechanism account. The harmonic alignment argument for consonance (Theorem 9.3) is not sufficient on its own, and it is not strictly necessary either, since equal-tempered fifths are still heard as consonant. Cultural conditioning, spectral envelope, and sensory fusion all contribute. The Plomp-Levelt (1965) roughness model provides a complementary account based on critical-band interference rather than combination tones.
Composition using DPOAEs. Maryanne Amacher’s work (referenced in the narration) raises the open question: can a composer write music whose “notes” are the listener’s own DPOAEs? This requires knowing the individual listener’s cochlear gain curve, a personalized, biologically dependent compositional parameter.

Academic References

Helmholtz, H. L. F. (1877). On the Sensations of Tone as a Physiological Basis for the Theory of Music (A. J. Ellis, Trans.). Longmans, Green. (Original German: 1863.)
Tartini, G. (1754). Trattato di Musica Secondo la Vera Scienza dell’Armonia. Padova.
Kemp, D. T. (1978). Stimulated acoustic emissions from within the human auditory system. Journal of the Acoustical Society of America, 64(5), 1386–1391.
Goldstein, J. L. (1967). Auditory nonlinearity. Journal of the Acoustical Society of America, 41(3), 676–689.
Plomp, R., & Levelt, W. J. M. (1965). Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38(4), 548–560.
Robles, L., Ruggero, M. A., & Rich, N. C. (1991). Two-tone distortion in the basilar membrane of the cochlea. Nature, 349, 413–414.
Zwicker, E., & Fastl, H. (1999). Psychoacoustics: Facts and Models (2nd ed.). Springer. Ch. 4 (Nonlinear distortion).