
EP19: Vibrato as FM Synthesis

Frequency modulation equation, Bessel functions Jₙ(β), Jacobi-Anger expansion, formant sweep

Video 8:10 · Signal Processing · Physics/Acoustics · Psychoacoustics

Prerequisites

EP09 Combination Tones and Nonlinear Acoustics
EP15 Opera Acoustics and the Singer's Formant

Next Steps

EP34 The Bessel Secret of the DX7 — FM Synthesis Mathematics

Overview

A pure A4 (440 Hz) and a gently trembling A4. The strange thing is — the trembling one sounds more stable, more vivid.

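Narration: “This is vibrato. Every opera singer uses it, but few people realize that the mathematics behind it is exactly the same as that of the Yamaha DX7 synthesizer.”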

Vibrato is periodic modulation of the carrier frequency — one sine wave nested inside another. On the surface, only one pitch is wobbling. But the Jacobi-Anger expansion reveals a hidden spectral structure: symmetric sidebands appear on both sides of the carrier, each with amplitude governed by the Bessel function $J_n(\beta)$ . As the modulation index $\beta$ grows from small to large, the sound evolves from vibrato into an entirely new timbre — a discovery John Chowning made at Stanford in 1967, which gave birth to the DX7 synthesizer and the sound of an entire decade.

This episode builds the complete mathematical chain from the FM equation to spectral expansion, and shows how vibrato sidebands repeatedly sweep through the Singer's Formant window (EP15) , giving opera singers their penetrating edge.

Prerequisites

Combination Tones and Nonlinear Acoustics (EP09) — nonlinear systems generating new frequencies; the cochlear $\tanh$ response curve
Opera Acoustics and the Singer's Formant (EP15) — the 3 kHz window, $F_3/F_4/F_5$ formant clustering, penetration index

Definitions

Definition 19.1 (Frequency Modulation Signal)

Let carrier frequency $f_c > 0$ (Hz), modulation frequency $f_m > 0$ (Hz), modulation index $\beta \geq 0$ (dimensionless), and amplitude $A > 0$ . The frequency modulation (FM) signal is defined as

$x(t) = A \sin\!\bigl(2\pi f_c\, t + \beta \sin(2\pi f_m\, t)\bigr)$

Three parameters and their physical meaning:

Parameter	Name	Meaning	Typical value (opera vibrato)
$f_c$	Carrier frequency	The pitch the singer is singing	440 Hz (A4)
$f_m$	Modulation rate	The speed of trembling	5–7 Hz
$\beta$	Modulation index	The amplitude of frequency deviation	2–4

The outer sine is the carrier; the inner sine is the modulating wave. The modulating wave does not appear directly in the output spectrum — it shapes the spectrum by altering the carrier’s instantaneous phase.
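As a minimal sketch (plain Python; the 8 kHz sampling rate is an arbitrary illustrative choice, not something the episode specifies), Definition 19.1 can be evaluated sample by sample. Setting `beta=0` recovers the pure tone.

```python
import math

def fm_signal(t, fc, fm, beta, A=1.0):
    """x(t) = A sin(2π fc t + β sin(2π fm t)), with t in seconds and frequencies in Hz."""
    return A * math.sin(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t))

# Opera-vibrato parameters from the table above: A4 carrier, 6 Hz rate, beta = 2.
SAMPLE_RATE = 8000  # samples per second (assumed, illustrative)
vibrato = [fm_signal(n / SAMPLE_RATE, fc=440.0, fm=6.0, beta=2.0) for n in range(SAMPLE_RATE)]
pure = [fm_signal(n / SAMPLE_RATE, fc=440.0, fm=6.0, beta=0.0) for n in range(SAMPLE_RATE)]

print(len(vibrato), max(abs(v) for v in vibrato) <= 1.0)  # one second of audio, bounded by A
```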

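Narration: “Just this one formula: a carrier plus sinusoidal modulation. It looks simple. But expand it, and you will see an entirely different spectral world.”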

Unpacking the phase function

The FM signal $x(t) = A\sin(\phi(t))$ is just a sine wave whose argument — the total phase $\phi(t)$ — varies with time. For a pure tone at frequency $f_c$ , phase grows linearly:

$\phi_{\text{pure}}(t) = 2\pi f_c\, t$

After $t = 1/f_c$ seconds (one period), the phase advances by exactly $2\pi$ radians — one full cycle. The factor $2\pi$ converts from cycles to radians, the natural unit for trigonometric functions.

FM adds a sinusoidal perturbation to this linear phase ramp:

$\phi(t) = \underbrace{2\pi f_c\, t}_{\text{linear trend}} + \underbrace{\beta \sin(2\pi f_m\, t)}_{\text{phase wobble}}$

The second term wobbles the phase back and forth around the linear trend. The modulation index $\beta$ is the peak phase deviation, measured in radians; because a radian is a ratio of two lengths (arc length over radius), $\beta$ is dimensionless.

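Narration: “The phase function is the sine wave’s ‘angle parameter’. A pure tone’s angle grows at a constant rate; FM superimposes a back-and-forth swing on top of that steady growth.”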

From phase to instantaneous frequency: the $1/(2\pi)$ factor

Definition 19.2 (Instantaneous Frequency)

Differentiating the phase gives the angular velocity:

$\omega(t) = \frac{d\phi}{dt} = 2\pi f_c + 2\pi \beta f_m \cos(2\pi f_m\, t) \quad \text{(rad/s)}$

But frequency is conventionally measured in Hz (cycles per second), not radians per second. Since one full cycle = $2\pi$ radians, we divide by $2\pi$ to convert:

$f_{\text{inst}}(t) = \frac{1}{2\pi}\frac{d\phi}{dt} = f_c + \beta f_m \cos(2\pi f_m\, t)$

The $1/(2\pi)$ is a unit conversion factor: the $2\pi$ that was built into the phase definition cancels out cleanly when we differentiate and divide.

Concept	Symbol	Units	Role
Phase	$\phi(t)$	radians	Total accumulated angle at time $t$
Angular frequency	$d\phi/dt$	rad/s	Rate of phase change
Instantaneous frequency	$\frac{1}{2\pi}\frac{d\phi}{dt}$	Hz	Cycles per second
Frequency deviation	$\Delta f = \beta f_m$	Hz	Max swing of $f_{\text{inst}}$ from $f_c$

The instantaneous frequency oscillates periodically between $f_c - \beta f_m$ and $f_c + \beta f_m$ . The product $\Delta f = \beta f_m$ is the frequency deviation (Hz).
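A quick numerical check of Definition 19.2, assuming nothing beyond the formulas above: differentiate the phase with a central finite difference (the step `h` is an arbitrary choice) and divide by $2\pi$ . The result should match $f_c + \beta f_m \cos(2\pi f_m t)$ .

```python
import math

def phase(t, fc, fm, beta):
    """Total phase φ(t) = 2π fc t + β sin(2π fm t), in radians."""
    return 2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t)

def inst_freq_numeric(t, fc, fm, beta, h=1e-6):
    """(1/2π) dφ/dt, estimated with a central finite difference of step h seconds."""
    return (phase(t + h, fc, fm, beta) - phase(t - h, fc, fm, beta)) / (2 * h) / (2 * math.pi)

def inst_freq_exact(t, fc, fm, beta):
    """Closed form from Definition 19.2: fc + β fm cos(2π fm t)."""
    return fc + beta * fm * math.cos(2 * math.pi * fm * t)

fc, fm, beta = 440.0, 6.0, 2.0
for t in (0.0, 1 / 24, 1 / 12):  # start, quarter and half of the 1/6 s modulation period
    print(f"t={t:.4f} s  numeric={inst_freq_numeric(t, fc, fm, beta):.3f} Hz  "
          f"exact={inst_freq_exact(t, fc, fm, beta):.3f} Hz")
```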

Worked example: Opera vibrato with $f_c = 440$ Hz, $f_m = 6$ Hz, $\beta = 2$ :

$\Delta f = \beta f_m = 2 \times 6 = 12 \text{ Hz}$

The instantaneous frequency swings between 428 Hz and 452 Hz, i.e. about +47 / −48 cents around 440 Hz, roughly ±half a semitone. This sits at the narrow end of the ±1/2 to 1 semitone extent typical of operatic vibrato.
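The cents figures follow from $1200\log_2(f/f_c)$ ; a minimal sketch of the arithmetic, using only the worked-example values:

```python
import math

def cents(f, f_ref):
    """Interval from f_ref to f, in cents (1200 cents per octave, 100 per equal-tempered semitone)."""
    return 1200 * math.log2(f / f_ref)

fc, fm, beta = 440.0, 6.0, 2.0
delta_f = beta * fm                  # frequency deviation Δf = β f_m, in Hz
upper, lower = fc + delta_f, fc - delta_f
print(delta_f, upper, lower)         # 12.0 452.0 428.0
print(f"{cents(upper, fc):+.1f} cents, {cents(lower, fc):+.1f} cents")
```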

Definition 19.3 (Bessel Functions of the First Kind)

The Bessel function of the first kind $J_n(\beta)$ ( $n \in \mathbb{Z}$ , $\beta \geq 0$ ) has several equivalent definitions. The most direct is the integral representation:

$J_n(\beta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(\beta \sin\theta - n\theta)}\, d\theta = \frac{1}{\pi} \int_0^{\pi} \cos(n\theta - \beta\sin\theta)\, d\theta$

Equivalently, $J_n(\beta)$ solves the Bessel differential equation:

$\beta^2 J_n''(\beta) + \beta J_n'(\beta) + (\beta^2 - n^2) J_n(\beta) = 0$

Key properties:

Symmetry: $J_{-n}(\beta) = (-1)^n J_n(\beta)$
Normalization: $\sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1$ (Parseval identity; total energy is conserved)
Behavior at zero: $J_0(0) = 1$ , $J_n(0) = 0$ for all $n \neq 0$
First zero: $J_0(\beta) = 0$ first occurs at $\beta \approx 2.4048$

The physical meaning of the normalization property: FM modulation neither creates nor destroys energy; it only redistributes energy between the carrier and the sidebands. As $\beta$ grows, energy transfers from the carrier ( $J_0$ ) to higher-order sidebands ( $J_1, J_2, \ldots$ ).
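The integral representation and the power series can be cross-checked numerically. The sketch below is a hedged illustration in plain Python (no SciPy assumed): it evaluates $J_n(\beta)$ both ways, using the trapezoid rule for the integral (which converges very quickly here because the integrand is smooth and periodic), and checks the symmetry, zero-argument, and first-zero properties listed above.

```python
import math

def bessel_j_integral(n, beta, steps=512):
    """J_n(β) = (1/π) ∫_0^π cos(nθ − β sin θ) dθ, via the composite trapezoid rule."""
    def integrand(theta):
        return math.cos(n * theta - beta * math.sin(theta))
    h = math.pi / steps
    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h / math.pi

def bessel_j_series(n, beta, terms=40):
    """J_n(β) = Σ_k (−1)^k (β/2)^(2k+|n|) / (k! (k+|n|)!), extended to n < 0 by J_{−n} = (−1)^n J_n."""
    m = abs(n)
    s = sum((-1) ** k * (beta / 2) ** (2 * k + m) / (math.factorial(k) * math.factorial(k + m))
            for k in range(terms))
    return -s if (n < 0 and m % 2 == 1) else s

for n in range(4):
    print(n, f"{bessel_j_integral(n, 2.0):.6f}", f"{bessel_j_series(n, 2.0):.6f}")

print(abs(bessel_j_integral(-3, 2.0) + bessel_j_integral(3, 2.0)) < 1e-10)  # J_{-3} = -J_3
print(bessel_j_series(0, 0.0), bessel_j_series(2, 0.0))                   # 1.0 0.0
print(f"{bessel_j_series(0, 2.4048):.1e}")                                 # close to 0
```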

Definition 19.4 (Sidebands)

The FM signal’s spectrum consists of the carrier flanked by symmetric sidebands:

$f_c, \quad f_c \pm f_m, \quad f_c \pm 2f_m, \quad f_c \pm 3f_m, \quad \ldots$

The $n$ -th pair of sidebands sits at $f_c \pm nf_m$ , with amplitude determined by $J_n(\beta)$ . Sideband spacing is uniformly $f_m$ .

When $\beta = 0$ , only the carrier remains — a pure tone. As $\beta$ increases, sidebands emerge and the carrier weakens.

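Narration: “On the surface this formula has only one sine wave, but another sine wave is nested inside it. To see clearly what it does to the spectrum, we expand it, like a prism splitting white light into a rainbow.”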

Definition 19.5 (Carson Bandwidth Rule)

The Carson bandwidth rule (Carson, 1922) gives an approximation for the bandwidth containing 98% of the FM signal’s power:

$B_{\text{Carson}} = 2(\beta + 1) f_m$

Opera vibrato ( $\beta = 2$ , $f_m = 6$ Hz) yields a Carson bandwidth of

$B = 2(2 + 1) \times 6 = 36 \text{ Hz}$

The spectrum is concentrated within $440 \pm 18$ Hz. Compare DX7 synthesis ( $\beta = 10$ , $f_m = 300$ Hz): $B = 2(10+1) \times 300 = 6600$ Hz — the bandwidth expands 180-fold.
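Carson's rule is one line of arithmetic; a minimal sketch reproducing both numbers above (the function name is ours, purely illustrative):

```python
def carson_bandwidth(beta, fm):
    """Carson's rule: approximate FM bandwidth in Hz, B = 2(β + 1) f_m."""
    return 2 * (beta + 1) * fm

opera = carson_bandwidth(2, 6)      # opera vibrato: β = 2, f_m = 6 Hz
dx7 = carson_bandwidth(10, 300)     # DX7-style patch: β = 10, f_m = 300 Hz
print(opera, dx7, round(dx7 / opera))            # 36 6600 183
print(440 - opera / 2, 440 + opera / 2)          # band edges around the carrier: 422.0 458.0
```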

Main Theorems

Theorem 19.1 (Jacobi-Anger Expansion)

For any real $\beta$ and $\theta$ ,

$e^{i\beta\sin\theta} = \sum_{n=-\infty}^{\infty} J_n(\beta)\, e^{in\theta}$

The equivalent real form is

$\sin(\alpha + \beta\sin\theta) = \sum_{n=-\infty}^{\infty} J_n(\beta)\, \sin(\alpha + n\theta)$

Substituting the FM signal parameters ( $\alpha = 2\pi f_c\, t$ , $\theta = 2\pi f_m\, t$ ) yields

$x(t) = A \sum_{n=-\infty}^{\infty} J_n(\beta)\, \sin\!\bigl(2\pi(f_c + n f_m)\, t\bigr)$

Physical interpretation: The FM signal is a superposition of infinitely many pure sinusoids. The $n$ -th component has frequency $f_c + nf_m$ and amplitude $A \cdot J_n(\beta)$ .

Proof.

Strategy: Treat $e^{i\beta\sin\theta}$ as a $2\pi$ -periodic function of $\theta$ and expand it as a Fourier series.

Step 1: Fourier coefficients. Write

$e^{i\beta\sin\theta} = \sum_{n=-\infty}^{\infty} c_n\, e^{in\theta}$

The Fourier coefficients are

$c_n = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i\beta\sin\theta}\, e^{-in\theta}\, d\theta = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(\beta\sin\theta - n\theta)}\, d\theta$

Step 2: Identification with Bessel functions. Comparing with the integral representation in Definition 19.3:

$J_n(\beta) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(\beta\sin\theta - n\theta)}\, d\theta$

we conclude $c_n = J_n(\beta)$ .

Step 3: Substitution into the FM signal. Setting $\theta = 2\pi f_m\, t$ :

$e^{i\beta\sin(2\pi f_m t)} = \sum_{n=-\infty}^{\infty} J_n(\beta)\, e^{i \cdot 2\pi n f_m\, t}$

The FM signal can be written as

$x(t) = A\, \text{Im}\!\left[e^{i2\pi f_c t} \cdot e^{i\beta\sin(2\pi f_m t)}\right] = A\, \text{Im}\!\left[\sum_{n=-\infty}^{\infty} J_n(\beta)\, e^{i2\pi(f_c + nf_m)t}\right]$

Taking the imaginary part:

$x(t) = A \sum_{n=-\infty}^{\infty} J_n(\beta)\, \sin\!\bigl(2\pi(f_c + nf_m)\, t\bigr) \qquad \square$

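Convergence: For fixed $\beta$ , $J_n(\beta) \to 0$ faster than exponentially as $|n| \to \infty$ (for $|n| \gg \beta$ , $|J_n(\beta)| \approx (\beta/2)^{|n|}/|n|!$ ), so the series converges absolutely, and uniformly in $\theta$ (hence in $t$ ) because $|e^{in\theta}| = 1$ .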
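Theorem 19.1 can also be checked numerically: truncate the sideband sum at $|n| \le N$ and compare it with the closed-form FM signal at a few hundred time points. This is a sketch under the assumption that $N = 20$ is enough for $\beta = 2$ , which the rapid decay of $J_n(\beta)$ in $|n|$ justifies.

```python
import math

def bessel_j(n, beta, terms=40):
    """J_n(β) from its power series; negative orders use J_{-n} = (-1)^n J_n."""
    m = abs(n)
    s = sum((-1) ** k * (beta / 2) ** (2 * k + m) / (math.factorial(k) * math.factorial(k + m))
            for k in range(terms))
    return -s if (n < 0 and m % 2 == 1) else s

def fm_direct(times, fc, fm, beta):
    """Closed form: sin(2π fc t + β sin(2π fm t)) at each t, unit amplitude."""
    return [math.sin(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t)) for t in times]

def fm_sideband_sum(times, fc, fm, beta, N=20):
    """Truncated Jacobi-Anger expansion: Σ_{n=-N}^{N} J_n(β) sin(2π (fc + n fm) t) at each t."""
    coeffs = [(n, bessel_j(n, beta)) for n in range(-N, N + 1)]  # computed once, reused for every t
    return [sum(c * math.sin(2 * math.pi * (fc + n * fm) * t) for n, c in coeffs) for t in times]

fc, fm, beta = 440.0, 6.0, 2.0
times = [k / 1000 for k in range(200)]           # 0 to 0.199 s in 1 ms steps
direct = fm_direct(times, fc, fm, beta)
expanded = fm_sideband_sum(times, fc, fm, beta)
worst = max(abs(a - b) for a, b in zip(direct, expanded))
print(f"max |closed form - sideband sum| = {worst:.1e}")
```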

Theorem 19.2 (Energy Conservation: Bessel-Parseval Identity)

For any real $\beta$ ,

$\sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1$

Equivalently, $J_0^2(\beta) + 2\sum_{n=1}^{\infty} J_n^2(\beta) = 1$ (using $J_{-n} = (-1)^n J_n$ , hence $J_{-n}^2 = J_n^2$ ).

Proof.

By Parseval’s theorem, the Fourier coefficients $c_n = J_n(\beta)$ of the $2\pi$ -periodic function $f(\theta) = e^{i\beta\sin\theta}$ satisfy

$\sum_{n=-\infty}^{\infty} |c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\theta)|^2\, d\theta$

Since $|e^{i\beta\sin\theta}| = 1$ for all $\theta$ , the right-hand side equals

$\frac{1}{2\pi}\int_{-\pi}^{\pi} 1\, d\theta = 1$

Therefore $\sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1$ . $\square$

Physical meaning: FM modulation only redistributes total power ( $\propto A^2$ ) among the sidebands. Regardless of $\beta$ , the mean-square power of $x(t)$ remains $A^2/2$ . Energy is neither created nor destroyed by modulation — it only flows between the carrier and its sidebands.
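Both forms of the energy statement can be checked with a short sketch: the sum of $J_n^2(\beta)$ over a truncated range of orders, and the mean of $x^2(t)$ over one second of samples. The sampling rate and truncation order are arbitrary choices; one second contains a whole number of cycles of both 440 Hz and 6 Hz, which keeps the sample mean close to the continuous-time average.

```python
import math

def bessel_j(n, beta, terms=60):
    """J_n(β) from its power series; J_{-n} = (-1)^n J_n for negative orders."""
    m = abs(n)
    s = sum((-1) ** k * (beta / 2) ** (2 * k + m) / (math.factorial(k) * math.factorial(k + m))
            for k in range(terms))
    return -s if (n < 0 and m % 2 == 1) else s

def bessel_power_sum(beta, N=40):
    """Σ_{n=-N}^{N} J_n(β)^2, which Theorem 19.2 says tends to 1."""
    return sum(bessel_j(n, beta) ** 2 for n in range(-N, N + 1))

def mean_square_fm(beta, fc=440.0, fm=6.0, A=1.0, sample_rate=8000):
    """Average of x(t)^2 over one second of samples of the FM signal."""
    total = 0.0
    for i in range(sample_rate):
        t = i / sample_rate
        total += (A * math.sin(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t))) ** 2
    return total / sample_rate

for beta in (0.5, 2.0, 4.0, 10.0):
    print(f"beta={beta:>4}: sum J_n^2 = {bessel_power_sum(beta):.12f}, "
          f"mean x^2 = {mean_square_fm(beta):.6f}")
```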

Theorem 19.3 (Carrier Vanishing Points)

The carrier component (frequency $f_c$ ) has amplitude $A \cdot J_0(\beta)$ . The first zero of $J_0$ occurs at

$\beta_1^{(0)} \approx 2.4048$

At $\beta = \beta_1^{(0)}$ , the carrier line vanishes entirely from the spectrum. All energy resides in the sidebands.

More generally, $J_0$ has infinitely many positive zeros $\beta_1^{(0)} < \beta_2^{(0)} < \beta_3^{(0)} < \cdots$ (spaced approximately $\pi$ apart); the carrier vanishes at each one.

Proof.

The power series expansion of $J_0(\beta)$ is

$J_0(\beta) = \sum_{k=0}^{\infty} \frac{(-1)^k}{(k!)^2}\left(\frac{\beta}{2}\right)^{2k} = 1 - \frac{\beta^2}{4} + \frac{\beta^4}{64} - \cdots$

This is a damped oscillatory function (asymptotically resembling $\sqrt{2/(\pi\beta)}\cos(\beta - \pi/4)$ , as the asymptotic formula below shows). Since $J_0(0) = 1 > 0$ and $J_0(3) \approx -0.260 < 0$ (verifiable by summing the power series), the intermediate value theorem guarantees at least one zero in $(0, 3)$ . Numerical computation gives $\beta_1^{(0)} \approx 2.4048$ .

Asymptotic formula ( $\beta \gg 1$ ):

$J_0(\beta) \sim \sqrt{\frac{2}{\pi\beta}}\cos\!\left(\beta - \frac{\pi}{4}\right)$

This confirms zeros are spaced approximately $\pi$ apart, with an envelope decaying as $1/\sqrt{\beta}$ . $\square$
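The intermediate-value argument turns directly into a root finder. A minimal sketch, assuming only the power series above and plain bisection (any bracketing method would do):

```python
import math

def j0(beta, terms=40):
    """J_0(β) = Σ_k (−1)^k (β/2)^(2k) / (k!)^2, the power series from the proof."""
    return sum((-1) ** k * (beta / 2) ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def bisect_root(f, lo, hi, tol=1e-12):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs and one root lies between."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(j0(0.0), f"{j0(3.0):.4f}")           # sign change on (0, 3): 1.0 -0.2601
beta1 = bisect_root(j0, 0.0, 3.0)
print(f"{beta1:.4f}")                       # 2.4048

# The next zero, and its spacing from the first (compare with π ≈ 3.1416)
beta2 = bisect_root(j0, 4.0, 7.0)
print(f"{beta2:.4f}", f"{beta2 - beta1:.4f}")  # 5.5201 3.1153
```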

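Narration: “When β is about 2.4, J₀ equals zero for the first time, and the carrier line vanishes completely. The sound continues, but the original fundamental line is gone. All the energy is in the sidebands.”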

Prop 19.1 (Spectral Signature of Opera Vibrato)

For typical opera vibrato parameters $f_c = 440$ Hz, $f_m = 6$ Hz, $\beta = 2$ , the expanded sideband amplitudes are

Component	Frequency (Hz)	Amplitude $J_n(2)$	Power fraction $J_n^2(2)$
Carrier $n=0$	440	$J_0(2) \approx 0.224$	5.0%
1st pair $n=\pm 1$	434, 446	$J_1(2) \approx 0.577$	33.3% (each side)
2nd pair $n=\pm 2$	428, 452	$J_2(2) \approx 0.353$	12.4% (each side)
3rd pair $n=\pm 3$	422, 458	$J_3(2) \approx 0.129$	1.7% (each side)

Verification: $0.224^2 + 2(0.577^2 + 0.353^2 + 0.129^2 + \cdots) \approx 0.050 + 0.666 + 0.249 + 0.033 \approx 1.0$ $\checkmark$

At $\beta = 2$ the carrier holds only 5% of total power; most energy has already shifted to the first and second sideband pairs. But since the sideband spacing is only 6 Hz, far narrower than an auditory filter at 440 Hz (ERB ≈ 72 Hz), the ear cannot resolve the individual lines and fuses the entire spectral cluster into a single pitch with a “warm trembling” quality.

Proof.

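Direct substitution into the Theorem 19.1 expansion, using numerical tables or the power series

$J_n(\beta) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(k+n)!}\left(\frac{\beta}{2}\right)^{2k+n} \qquad (n \geq 0)$

At $\beta = 2$ we have $\beta/2 = 1$ , so the series reduces to $J_n(2) = \sum_{k} (-1)^k / \bigl(k!\,(k+n)!\bigr)$ :

$J_0(2) = 1 - 1 + 1/4 - 1/36 + \cdots \approx 0.2239$

$J_1(2) = 1 - 1/2 + 1/12 - 1/144 + \cdots \approx 0.5767$

$J_2(2) = 1/2 - 1/6 + 1/48 - 1/720 + \cdots \approx 0.3528$

$J_3(2) = 1/6 - 1/24 + 1/240 - \cdots \approx 0.1289$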

Power conservation is guaranteed by Theorem 19.2. $\square$
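The reduced series at $\beta = 2$ is easy to sum by machine. A short sketch (plain Python, exact integer factorials) that reproduces the amplitude and power columns of the table and the power-conservation check:

```python
import math

def bessel_j_at_2(n, terms=30):
    """J_n(2) for n >= 0: with β/2 = 1 the power series reduces to Σ_k (-1)^k / (k! (k+n)!)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(k + n)) for k in range(terms))

power = {n: bessel_j_at_2(n) ** 2 for n in range(12)}   # J_n(2)^2 for n = 0..11
for n in range(4):
    side = "" if n == 0 else " (each side)"
    print(f"n={n}: J_n(2) = {bessel_j_at_2(n):.4f}, power = {power[n]:.1%}{side}")

# Theorem 19.2 with J_{-n}^2 = J_n^2: carrier plus twice the one-sided sidebands.
total = power[0] + 2 * sum(power[n] for n in range(1, 12))
print(f"total power fraction = {total:.9f}")
```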

From Vibrato to Synthesis: Chowning’s Discovery

Theorem 19.4 (Perceptual Partition of FM Parameter Space)

The FM equation $x(t) = A\sin(2\pi f_c\, t + \beta\sin(2\pi f_m\, t))$ produces three perceptual regions in the $(f_m, \beta)$ parameter plane:

Vibrato region ( $f_m \lesssim 7$ Hz, $\beta \lesssim 4$ ): The auditory system tracks the periodic deviation of the instantaneous frequency and perceives “pitch is moving.” The sideband spacing $f_m$ is tiny compared with the critical bandwidth (~100 Hz at 440 Hz), and the Carson bandwidth $B = 2(\beta+1)f_m \lesssim 70$ Hz spans at most about one auditory filter, so the sidebands are unresolvable.
Vibrato-tremolo transition ( $f_m \approx 7\text{--}10$ Hz): The modulation rate exceeds the auditory system’s pitch-tracking capacity. Perception gradually shifts from “pitch is moving” to “amplitude is fluctuating” (tremolo). This is not a sharp boundary but a gradual crossover depending on $\beta$ and individual differences.
FM synthesis region ( $f_m \gtrsim 20$ Hz, especially when $f_m$ is comparable to $f_c$ ): Once the sideband spacing $f_m$ exceeds the critical bandwidth (as it does when $f_m$ reaches hundreds of Hz), individual sidebands are resolved as independent frequency components. $\beta$ controls timbral richness: different $(f_m, \beta)$ combinations simulate brass, electric piano, bells, and other timbres.

Proof.

Vibrato region spectral constraint: With $\beta = 2$ , $f_m = 6$ Hz, Carson bandwidth $B = 2(2+1) \times 6 = 36$ Hz. At 440 Hz, the equivalent rectangular bandwidth (ERB) is approximately

$\text{ERB}(440) = 24.7(4.37 \times 0.44 + 1) = 24.7 \times 2.92 \approx 72 \text{ Hz}$

(Glasberg & Moore, 1990, with frequency in kHz). Since $B < \text{ERB}$ , all sidebands fall within a single critical band and are fused by the auditory system into a single “broadened” pitch.

FM synthesis region spectral constraint: With $f_m = 300$ Hz, $\beta = 5$ , $B = 2(5+1) \times 300 = 3600$ Hz. The sideband spacing of 300 Hz exceeds the ERB for all components below about 2.5 kHz (ERB ≈ 72 Hz at 440 Hz, ≈ 241 Hz at 2 kHz); individual components are independently perceived, producing an entirely new timbre.

Transition mechanism: When $f_m$ approaches ~7–10 Hz, the modulation period (~100–140 ms) nears the auditory system’s pitch-tracking time constant (~130 ms, Demany & Semal 1989). When tracking fails, the auditory system reinterprets frequency modulation as envelope modulation — i.e. tremolo. $\square$
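The two spectral constraints in the proof can be put side by side with a small helper. This is a heuristic sketch, not a perceptual model: it uses the Glasberg & Moore ERB formula and treats sidebands as resolvable when their spacing $f_m$ exceeds the ERB at the carrier. The 440 Hz carrier for the synthesis case is an assumption, since the proof does not fix one.

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore 1990): 24.7 (4.37 F + 1) Hz, F in kHz."""
    return 24.7 * (4.37 * f_hz / 1000 + 1)

def carson_bandwidth(beta, fm):
    """Carson's rule, B = 2(β + 1) f_m, in Hz."""
    return 2 * (beta + 1) * fm

# (f_c, f_m, β); the 440 Hz carrier for the synthesis case is an illustrative assumption.
cases = {"vibrato": (440.0, 6.0, 2.0), "FM synthesis": (440.0, 300.0, 5.0)}
for name, (fc, fm, beta) in cases.items():
    erb = erb_hz(fc)
    resolvable = fm > erb            # heuristic: line spacing wider than one auditory filter
    print(f"{name:>12}: ERB = {erb:.1f} Hz, spacing = {fm:.0f} Hz, "
          f"Carson B = {carson_bandwidth(beta, fm):.0f} Hz, resolvable sidebands: {resolvable}")
```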

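Narration: “He realized that vibrato and timbre synthesis are simply different parameter settings of the same equation. Low modulation frequency with low modulation depth is vibrato; high modulation frequency with high modulation depth is FM synthesis.”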

Historical note: In 1967, John Chowning was working in the Stanford AI Laboratory, adjusting these parameters. He raised $f_m$ from 6 Hz to several hundred Hz. The sound abruptly changed — no longer a trembling note, but an entirely new timbre: brass, electric piano, bells.

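Narration: “In 1967, when Chowning raised the modulation frequency from six hertz to several hundred hertz, the sound suddenly changed. It was no longer a trembling note but an entirely new timbre: brass, electric piano, bells.”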

He published in 1973 (The Synthesis of Complex Audio Spectra by Means of Frequency Modulation). In 1983 the Yamaha DX7 commercialized the equation. The DX7 sold 200,000 units and defined the sound of 1980s pop music.

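Narration: “The same formula. On the opera stage it is vibrato. In the recording studio it is a synthesizer. Let’s listen. The same equation, two parameters changed, and the sound is completely different.”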

DX7-Style FM Demo

The video demonstrates four timbres synthesized from the same FM equation, using carrier-to-modulator ratios and time-varying modulation index envelopes:

Preset	$f_c$	$f_m$	$f_c : f_m$	$\beta_{\text{peak}} \to \beta_{\text{sustain}}$	Character
Electric Piano	200 Hz	200 Hz	1 : 1	8 → 2	Bright attack, mellow sustain
Bell	200 Hz	280 Hz	1 : 1.4	12 → 4	Metallic shimmer, long ring
Brass	200 Hz	200 Hz	1 : 1	10 → 6	Slow buildup, sustained brightness
Wood Bass	100 Hz	100 Hz	1 : 1	5 → 1	Punchy attack, warm decay

The key design point is that real DX7 patches use a time-varying $\beta$ rather than a static modulation index. The modulation envelope decays faster than the carrier amplitude envelope, which is what gives the characteristic “bright attack that mellows out.” Each preset uses separate ADSR envelopes for amplitude and $\beta$ , with the $\beta$ envelope decaying from $\beta_{\text{peak}}$ to $\beta_{\text{sustain}}$ .
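A sketch of the idea in Python, under loudly stated assumptions: the envelopes here are simple exponential decays with made-up time constants (`TAU_BETA`, `TAU_AMP`), not a model of the DX7's envelope generators or of the demo's ADSR settings. Only the preset numbers (1 : 1 ratio, $\beta$ from 8 to 2) come from the table above.

```python
import math

def index_envelope(t, beta_peak, beta_sustain, tau_beta):
    """Hypothetical β(t): exponential decay from beta_peak toward beta_sustain (time constant tau_beta, s)."""
    return beta_sustain + (beta_peak - beta_sustain) * math.exp(-t / tau_beta)

def fm_voice(t, fc, fm, beta_peak, beta_sustain, tau_beta, tau_amp):
    """One FM sample with separate, assumed exponential envelopes for the index and the amplitude."""
    beta = index_envelope(t, beta_peak, beta_sustain, tau_beta)
    amp = math.exp(-t / tau_amp)    # chosen to decay more slowly than the index
    return amp * math.sin(2 * math.pi * fc * t + beta * math.sin(2 * math.pi * fm * t))

# Electric-piano-like settings from the table: fc = fm = 200 Hz, beta 8 -> 2.
TAU_BETA, TAU_AMP = 0.15, 1.0       # seconds; illustrative assumptions, not DX7 values
SAMPLE_RATE = 16000
tone = [fm_voice(n / SAMPLE_RATE, 200.0, 200.0, 8.0, 2.0, TAU_BETA, TAU_AMP) for n in range(SAMPLE_RATE)]

for t in (0.0, 0.1, 0.5, 1.0):
    print(f"t = {t:.1f} s: beta = {index_envelope(t, 8.0, 2.0, TAU_BETA):.2f}, "
          f"amplitude = {math.exp(-t / TAU_AMP):.2f}")
```

Because the index falls faster than the amplitude, the higher-order sidebands (weighted by $J_n(\beta)$ at larger $|n|$ ) fade first while the note is still sounding.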

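The ratio 1 : 1.4 (that is, 5 : 7) for the bell is deliberate. Because $f_m$ is not an integer multiple of $f_c$ , the sidebands at $|f_c \pm n f_m|$ (80, 360, 480, 640, 760 Hz and so on, for $f_c = 200$ Hz and $f_m = 280$ Hz) fall at non-integer multiples of $f_c$ ; these inharmonic components produce the metallic, bell-like quality characteristic of FM bells.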
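To see which frequencies each ratio produces, list $|f_c + n f_m|$ for small $|n|$ (negative frequencies reflect to positive ones). A minimal sketch using the two presets' frequencies; it deliberately ignores amplitudes and the sign flip of reflected components:

```python
def sideband_frequencies(fc, fm, n_max):
    """Distinct |fc + n fm| for n = -n_max..n_max; negative frequencies reflect to positive ones.
    Amplitudes J_n(β) and the sign flip of reflected lines are not modeled here."""
    return sorted({abs(fc + n * fm) for n in range(-n_max, n_max + 1)})

bell = sideband_frequencies(200, 280, 4)     # bell preset, f_c : f_m = 1 : 1.4
piano = sideband_frequencies(200, 200, 4)    # electric-piano preset, f_c : f_m = 1 : 1
print(bell)    # [80, 200, 360, 480, 640, 760, 920, 1040, 1320]
print(piano)   # [0, 200, 400, 600, 800, 1000]
print([f / 200 for f in bell])               # as multiples of f_c: non-integers appear for the bell
```

The 1 : 1 ratio puts every component on a multiple of 200 Hz (the $n = -1$ line lands at 0 Hz), so that spectrum is harmonic; the 1 : 1.4 ratio scatters components at 0.4, 1.8, 2.4, 3.2 times $f_c$ and so on.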

The Formant Sweep Mechanism

Theorem 19.5 (Vibrato Formant Frequency Sweep)

Let the voice fundamental frequency be $f_0$ , with harmonic series $f_0, 2f_0, 3f_0, \ldots$ . When vibrato (modulation rate $f_m$ , modulation index $\beta$ ) is applied, every harmonic is frequency-modulated simultaneously at the same rate $f_m$ , with a deviation proportional to its harmonic number. The instantaneous frequency of the $k$ -th harmonic is

$f_k(t) = k f_0 + k \beta f_m \cos(2\pi f_m\, t)$

The frequency deviation $\Delta f_k = k\beta f_m$ grows linearly with harmonic number $k$ .

Taking A4 as an example ( $f_0 = 440$ Hz, $f_m = 6$ Hz, $\beta \approx 4$ , taken as exactly 4 below, so the fundamental deviates by ±24 Hz, roughly ±1 semitone):

Harmonic	Center freq.	Deviation $\Delta f_k$	Sweep range
$k = 1$	440 Hz	±24 Hz	416–464 Hz
$k = 3$	1320 Hz	±72 Hz	1248–1392 Hz
$k = 7$	3080 Hz	±168 Hz	2912–3248 Hz
$k = 8$	3520 Hz	±192 Hz	3328–3712 Hz

The 7th and 8th harmonics sweep across the Singer’s Formant window (2500–3500 Hz, EP15 Definition 15.3). These harmonics act like brooms, sweeping through the formant 5–7 times per second.

Proof.

If the fundamental is FM-modulated: $f_0(t) = f_0 + \beta f_m \cos(2\pi f_m t)$ , then in the linear source model, the $k$ -th harmonic’s frequency is $k$ times the fundamental:

$f_k(t) = k \cdot f_0(t) = k f_0 + k\beta f_m \cos(2\pi f_m t)$

The deviation $\Delta f_k = k\beta f_m$ grows linearly.

For A4 ( $f_0 = 440$ Hz), the harmonics nearest 3 kHz are $k = 7$ (3080 Hz) and $k = 8$ (3520 Hz). With $\beta \approx 4$ , $f_m = 6$ Hz:

$\Delta f_7 = 7 \times 4 \times 6 = 168 \text{ Hz}, \qquad \Delta f_8 = 8 \times 4 \times 6 = 192 \text{ Hz}$

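Sweep ranges: $3080 \pm 168 = [2912, 3248]$ Hz and $3520 \pm 192 = [3328, 3712]$ Hz. The 7th harmonic stays inside the 2500–3500 Hz Singer’s Formant window throughout its sweep, while the 8th harmonic crosses the window’s upper edge at 3500 Hz, entering and leaving the window once per modulation cycle.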

Effect: Sidebands continually enter and exit the formant window, repeatedly routing existing harmonic energy through the formant peak. Total energy is conserved (Theorem 19.2) — no additional energy is created. Rather, time-averaged over the modulation cycle, the formant band is repeatedly excited, producing a sustained, flickering brightness. $\square$
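The sweep table and the edge-crossing behaviour can be reproduced with a small sketch. It assumes the linear source model of the proof and treats the Singer’s Formant window as a hard 2500–3500 Hz band, which is a simplification; the fraction of each cycle is estimated by sampling one modulation period uniformly.

```python
import math

def sweep_range(k, f0, fm, beta):
    """Centre, deviation and sweep limits of harmonic k: f_k(t) = k f0 + k β fm cos(2π fm t)."""
    centre, dev = k * f0, k * beta * fm
    return centre, dev, centre - dev, centre + dev

def fraction_inside(k, f0, fm, beta, lo=2500.0, hi=3500.0, samples=100_000):
    """Fraction of one modulation cycle during which harmonic k lies in [lo, hi] Hz (uniform sampling)."""
    inside = 0
    for i in range(samples):
        f = k * f0 + k * beta * fm * math.cos(2 * math.pi * i / samples)
        if lo <= f <= hi:
            inside += 1
    return inside / samples

f0, fm, beta = 440.0, 6.0, 4.0
for k in (1, 3, 7, 8):
    centre, dev, low, high = sweep_range(k, f0, fm, beta)
    share = fraction_inside(k, f0, fm, beta)
    print(f"k={k}: {centre:.0f} ± {dev:.0f} Hz -> [{low:.0f}, {high:.0f}] Hz, "
          f"inside 2500-3500 Hz for {share:.0%} of each cycle")
```

For $k = 8$ the sampled share comes out near 47%, matching $\arccos(20/192)/\pi$ : the harmonic spends just under half of each cycle below the 3500 Hz edge.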

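Narration: “Vibrato is not just ‘pleasant trembling’. It is a frequency-sweep strategy that keeps the singer’s voice flickering inside the three-kilohertz window, which actually makes it more penetrating.”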

Perceptual Averaging and the “More to Hear” Principle

The narration ties perception and spectrum together:

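Narration: “Vibrato makes the pitch cycle within a small range, but the brain time-averages over the whole modulation period, so it perceives the pitch at the center of the wobble rather than the instantaneous extremes. At the same time, the extra spectral components contributed by the sidebands give the auditory system more anchor points, so the sound becomes more solid and clearer. The trembling gives the brain more to hear.”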

The auditory system performs temporal integration over the full modulation period. The perceived pitch is the time-averaged center of the wobble, not the instantaneous extremes. Meanwhile, the additional spectral components introduced by the sidebands provide the auditory system with more “anchor points” — the sound becomes richer and more vivid, not less stable. Paradoxically, the trembling adds perceptual clarity.

Numerical Example: Varying the Modulation Index

Fixing $f_c = 440$ Hz, $f_m = 6$ Hz, observe how the spectrum evolves as $\beta$ increases from 0:

$\beta$	$J_0(\beta)$	$J_1(\beta)$	$J_2(\beta)$	$J_3(\beta)$	Musical type	Freq. deviation
0	1.000	0	0	0	Pure tone	0
0.5	0.938	0.242	0.031	0.003	Pop vibrato	±3 Hz
2	0.224	0.577	0.353	0.129	Opera vibrato	±12 Hz
2.405	0	0.519	0.432	0.199	Carrier vanishes	±14.4 Hz
4	−0.397	−0.066	0.364	0.430	Wide vibrato	±24 Hz
10	−0.246	0.043	0.255	0.058	DX7-range index	±60 Hz

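At $\beta = 2.405$ , $J_0 \approx 0$ : the carrier line disappears entirely. The sound continues, but the spectral line at the “fundamental” is gone; all energy is in the sidebands. The $\beta = 10$ row keeps $f_m = 6$ Hz, so it shows the Bessel weights of a DX7-range index; with a 6 Hz modulator this would still be heard as a very wide (±60 Hz) pitch wobble, and a synthesized timbre appears only when $f_m$ is raised as well (Theorem 19.4).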
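The table can be regenerated from Bessel's integral (Definition 19.3) with a few lines of plain Python. The trapezoid rule is an implementation choice that happens to be extremely accurate for this smooth periodic integrand; deviations use $\Delta f = \beta f_m$ with $f_m = 6$ Hz.

```python
import math

def bessel_j(n, beta, steps=512):
    """J_n(β) = (1/π) ∫_0^π cos(nθ − β sin θ) dθ, evaluated with the composite trapezoid rule."""
    def integrand(theta):
        return math.cos(n * theta - beta * math.sin(theta))
    h = math.pi / steps
    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h / math.pi

FM_HZ = 6.0  # modulation rate held fixed, as in the table
print(f"{'beta':>6} {'J0':>7} {'J1':>7} {'J2':>7} {'J3':>7} {'dev (Hz)':>9}")
for beta in (0.0, 0.5, 2.0, 2.405, 4.0, 10.0):
    values = " ".join(f"{bessel_j(n, beta):+7.3f}" for n in range(4))
    dev = f"±{beta * FM_HZ:.1f}"
    print(f"{beta:>6} {values} {dev:>9}")
```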

Four Perspectives on One Equation

The narration concludes by viewing the FM equation through four complementary lenses:

Narration: “One equation, four perspectives.”

Perception: Vibrato causes pitch to cycle within a small range, but the brain time-averages over the full modulation period — perceiving the center pitch, not the instantaneous extremes. The sidebands supply the auditory system with extra spectral anchors, making the sound feel more solid and vivid.
Spectrum: Bessel functions distribute energy across sidebands. Small $\beta$ is vibrato; large $\beta$ is synthesis.
Acoustic engineering: Sidebands repeatedly sweep through the formant; time-averaged, they sustain excitation of the 3 kHz band and enhance penetration.
Synthesis: The same equation, two parameters changed, leads from the opera stage straight to the DX7.

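Narration: “Frequency modulation. From vibration to synthesis, from Bessel to three kilohertz. The singer’s trembling and the synthesizer’s timbre use the same equation; the only change is pushing the modulation frequency from six hertz to several hundred hertz, and the modulation depth from a few to several dozen. Moving two parameters spans the entire history of music.”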

Musical Connection


Vibrato: A Frequency Sweep Strategy

Vibrato is not merely “pleasant trembling.” From Theorem 19.5, it is a frequency sweep strategy: it makes the singer’s upper harmonics flicker continuously inside the 3 kHz window (EP15). Sidebands repeatedly sweep through the formant; time-averaged, they sustain excitation of the Singer’s Formant band. Penetration comes not from greater loudness but from smarter frequency allocation.

From EP09 to EP19: Nonlinearity and Modulation

EP09 showed how nonlinear systems create new frequencies from two inputs (combination tones $mf_1 \pm nf_2$ ). EP19’s FM works differently: instead of a nonlinear transfer curve acting on two input tones, a periodic phase modulation spreads the carrier’s energy into a sideband structure set by $f_c$ , $f_m$ , and the Bessel weights $J_n(\beta)$ . Yet the final effect is similar: a single spectral line broadens into many. The cochlea’s nonlinearity (EP09) and the vocal folds’ frequency modulation (EP19) both enrich the spectrum, one at the receiver and one at the transmitter.

One Equation, Two Worlds

The same formula. $f_m = 6$ Hz, $\beta = 2$ — opera vibrato on stage. $f_m = 300$ Hz, $\beta = 10$ — electric piano in the DX7 studio. A shift of two parameters spans the entire history from vocal music to electronic music.

FM Signatures Across Singing Styles

Style	$f_m$	$\beta$	Extent	Perception
Opera	5–7 Hz	2–4	±1/2–1 semitone	Warm, sustained, penetrating
Pop	4–5 Hz	0.5–2	±1/4–1/2 semitone	Soft, decorative
Tremolo transition	7–10 Hz	–	–	From “pitch moving” to “amplitude fluctuating”

Forward: EP34 (DX7 FM Synthesis)

This episode establishes the mathematical foundation of the FM equation. EP34 will build on it to examine the DX7’s six-operator architecture, the bifurcation phenomena caused by operator feedback, and the practical application of the Carson bandwidth rule in synthesizer design.

The Closing Image

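Narration: “Next time you hear an opera singer’s voice trembling slightly, don’t just think it sounds nice. That is a frequency modulation equation at work. Sidebands are sweeping across the formant, Bessel functions are distributing energy, and the three-kilohertz window is flickering continuously.”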

Narration: “It is all hidden in that formula.”

Limits and Open Questions

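Non-sinusoidal modulation. The Jacobi-Anger expansion assumes pure sinusoidal modulation. Real opera vibrato is not a perfect sine; it typically includes a slow onset phase and irregular period-to-period fluctuations. A periodic but non-sinusoidal modulator no longer yields the single-index Bessel formula (a modulator built from several sinusoids gives products of Bessel functions), and irregular fluctuations smear the discrete line spectrum, so STFT or wavelet analysis (EP35/EP36) is needed.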
Limitation of the linear source model. Theorem 19.5 assumes each harmonic’s frequency deviation is strictly proportional to its harmonic number. But the glottal wave is not an ideal harmonic source — nonlinearities during the glottal closed phase cause upper harmonics to deviate from linear predictions.
Perceptual nonlinearity. The ear perceives frequency deviation logarithmically (in cents), not linearly. The same Hz deviation corresponds to different cent deviations in low and high registers. A complete perceptual model requires the ERB frequency scale (Glasberg & Moore, 1990) combined with the modulation transfer function (Viemeister, 1979).
The vibrato-tremolo boundary. The transition zone (~7–10 Hz) in Theorem 19.4 is an empirical observation lacking a precise mathematical characterization. The key question is: how do modulation frequency and modulation depth jointly determine the perceptual crossover? Does there exist a critical curve $\beta_{\text{crit}}(f_m)$ separating vibrato from tremolo percepts?

Conjecture (Vibrato-Tremolo Perceptual Boundary)

In the $(f_m, \beta)$ parameter space, there exists a perceptual critical curve $\mathcal{C}$ separating the “vibrato percept” from the “tremolo percept.” Preliminary evidence suggests $\mathcal{C}$ runs roughly along $f_m \approx 7$ Hz, but bends toward lower $f_m$ at large $\beta$ — deeper modulation causes the auditory system to abandon frequency tracking sooner.

Open question: What is the precise shape of $\mathcal{C}$ ? Does it depend on carrier frequency $f_c$ ?

References

Chowning, J. M. (1973). The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society, 21(7), 526–534. — The founding paper of FM synthesis; complete derivation from vibrato to synthesizer.
Watson, G. N. (1944). A Treatise on the Theory of Bessel Functions (2nd ed.). Cambridge University Press. — Standard reference for Bessel functions; rigorous proof of the Jacobi-Anger expansion (Ch. 2).
Sundberg, J. (1987). The Science of the Singing Voice. Northern Illinois University Press. — Measured acoustic parameters of opera vibrato ( $f_m = 5\text{--}7$ Hz, $\beta \approx 2\text{--}4$ ); interaction between the Singer’s Formant and vibrato.
Carson, J. R. (1922). Notes on the theory of modulation. Proceedings of the IRE, 10(1), 57–64. — Original derivation of the FM bandwidth approximation formula.
Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138. — Equivalent rectangular bandwidth (ERB) formula; estimating FM sideband resolvability.
Demany, L., & Semal, C. (1989). Detection thresholds for sinusoidal frequency modulation. Journal of the Acoustical Society of America, 85(3), 1295–1301. — FM detection thresholds; psychoacoustic data on the vibrato-tremolo transition zone.
Viemeister, N. F. (1979). Temporal modulation transfer functions based upon modulation thresholds. Journal of the Acoustical Society of America, 66(5), 1364–1380. — Temporal modulation transfer functions; bandwidth limitations of the auditory system’s amplitude and frequency modulation tracking.
Prame, E. (1997). Vibrato extent and intonation in professional Western lyric singing. Journal of the Acoustical Society of America, 102(1), 616–621. — Vibrato parameter statistics from 42 opera singers (mean $f_m = 5.7$ Hz, mean extent ±0.6 semitones).
