EP19: Vibrato as FM Synthesis
Overview
Compare a pure A4 (440 Hz) with a gently trembling A4. Strangely, the trembling one sounds more stable and more vivid.
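Narration: “This is vibrato. Every opera singer uses it, yet few people realize that the mathematics behind it is exactly the same as that of the Yamaha DX7 synthesizer.”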
Vibrato is periodic modulation of the carrier frequency: one sine wave nested inside another. On the surface, only one pitch is wobbling. But the Jacobi-Anger expansion reveals a hidden spectral structure: symmetric sidebands appear on both sides of the carrier, each with amplitude governed by the Bessel function $J_n(\beta)$. As the modulation index $\beta$ grows from small to large, the sound evolves from vibrato into an entirely new timbre, a discovery John Chowning made at Stanford in 1967 that gave birth to the DX7 synthesizer and the sound of an entire decade.
This episode builds the complete mathematical chain from the FM equation to its spectral expansion, and shows how vibrato sidebands repeatedly sweep through the Singer's Formant window (EP15), giving opera singers their penetrating edge.
Prerequisites
- Combination Tones and Nonlinear Acoustics (EP09) — nonlinear systems generating new frequencies; the cochlear response curve
- Opera Acoustics and the Singer's Formant (EP15) — the 3 kHz window, formant clustering, penetration index
Definitions
Let $f_c$ be the carrier frequency (Hz), $f_m$ the modulation frequency (Hz), $\beta \geq 0$ the modulation index (dimensionless), and $A > 0$ the amplitude. The frequency modulation (FM) signal is defined as
$$x(t) = A \sin\bigl(2\pi f_c t + \beta \sin(2\pi f_m t)\bigr)$$
Three parameters and their physical meaning:
| Parameter | Name | Meaning | Typical value (opera vibrato) |
|---|---|---|---|
| $f_c$ | Carrier frequency | The pitch the singer is singing | 440 Hz (A4) |
| $f_m$ | Modulation rate | The speed of trembling | 5–7 Hz |
| $\beta$ | Modulation index | The depth of frequency deviation, in units of $f_m$ | 2–4 |
The outer sine is the carrier; the inner sine is the modulating wave. The modulating wave does not appear directly in the output spectrum — it shapes the spectrum by altering the carrier’s instantaneous phase.
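Narration: “Just this one formula: a carrier plus sinusoidal modulation. It looks simple. But expand it, and you see an entirely different spectral world.”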
Unpacking the phase function
The FM signal is just a sine wave whose argument, the total phase, varies with time. For a pure tone at frequency $f_c$, phase grows linearly:
$$\phi_0(t) = 2\pi f_c t$$
After $T = 1/f_c$ seconds (one period), the phase advances by exactly $2\pi$ radians, one full cycle. The factor $2\pi$ converts from cycles to radians, the natural unit for trigonometric functions.
FM adds a sinusoidal perturbation to this linear phase ramp:
$$\phi(t) = 2\pi f_c t + \beta \sin(2\pi f_m t)$$
The second term wobbles the phase back and forth around the linear trend. The modulation index $\beta$ is the peak phase deviation in radians; equivalently, it is the ratio of the peak frequency deviation to the modulation frequency (Hz / Hz), which is why $\beta$ is dimensionless.
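Narration: “The phase function is the sine wave's ‘angle parameter’. A pure tone's angle grows at a constant rate; FM superimposes a back-and-forth wobble on that steady growth.”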
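From phase to instantaneous frequency: the factor $1/(2\pi)$
Differentiating the phase gives the angular velocity:
$$\omega(t) = \frac{d\phi}{dt} = 2\pi f_c + 2\pi \beta f_m \cos(2\pi f_m t)$$
But frequency is conventionally measured in Hz (cycles per second), not radians per second. Since one full cycle = $2\pi$ radians, we divide by $2\pi$ to convert:
$$f_{\text{inst}}(t) = \frac{1}{2\pi}\frac{d\phi}{dt} = f_c + \beta f_m \cos(2\pi f_m t)$$
The $1/(2\pi)$ is a unit conversion factor: the $2\pi$ that was built into the phase definition cancels out cleanly when we differentiate and divide.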
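| Concept | Symbol | Units | Role |
|---|---|---|---|
| Phase | $\phi(t)$ | radians | Total accumulated angle at time $t$ |
| Angular frequency | $\omega(t) = d\phi/dt$ | rad/s | Rate of phase change |
| Instantaneous frequency | $f_{\text{inst}}(t) = \omega(t)/(2\pi)$ | Hz | Cycles per second |
| Frequency deviation | $\Delta f = \beta f_m$ | Hz | Max swing of $f_{\text{inst}}$ from $f_c$ |
The instantaneous frequency oscillates periodically between $f_c - \beta f_m$ and $f_c + \beta f_m$. The product $\Delta f = \beta f_m$ is the frequency deviation (Hz).
Worked example: Opera vibrato with $f_c = 440$ Hz, $f_m = 6$ Hz, $\beta = 2$:
$$\Delta f = \beta f_m = 2 \times 6 = 12 \text{ Hz}, \qquad f_{\text{inst}}(t) = 440 + 12\cos(2\pi \cdot 6\,t) \text{ Hz}$$
The instantaneous frequency swings between 428 Hz and 452 Hz, approximately ±47 cents, i.e. about half a semitone either way. This lies within the typical extent of operatic vibrato.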
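The worked example can be checked numerically. The following is a minimal Python sketch rather than part of the original episode: the function name `vibrato_extent` is hypothetical, and it assumes purely sinusoidal modulation, so that the peak deviation is $\beta f_m$.

```python
import math

def vibrato_extent(f_c, f_m, beta):
    """Return (f_low, f_high, cents_down, cents_up) for sinusoidal FM."""
    delta_f = beta * f_m                       # peak frequency deviation in Hz
    f_low, f_high = f_c - delta_f, f_c + delta_f
    # 1200 cents per octave, so 100 cents per equal-tempered semitone.
    cents_up = 1200.0 * math.log2(f_high / f_c)
    cents_down = 1200.0 * math.log2(f_low / f_c)
    return f_low, f_high, cents_down, cents_up

f_low, f_high, cents_down, cents_up = vibrato_extent(440.0, 6.0, 2.0)
print(f_low, f_high, round(cents_down, 1), round(cents_up, 1))
```

Because cents are logarithmic, the downward swing is slightly larger in cents than the upward swing even though both are 12 Hz.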
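Definition 19.3. The Bessel function of the first kind ($J_n(\beta)$, $n \in \mathbb{Z}$) has several equivalent definitions. The most direct is the integral representation:
$$J_n(\beta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(\beta\sin\theta - n\theta)}\,d\theta$$
Equivalently, $J_n$ solves the Bessel differential equation:
$$\beta^2 \frac{d^2 J_n}{d\beta^2} + \beta \frac{dJ_n}{d\beta} + (\beta^2 - n^2)\,J_n = 0$$
Key properties:
- Symmetry: $J_{-n}(\beta) = (-1)^n J_n(\beta)$
- Normalization: $\sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1$ (Parseval identity; total energy is conserved)
- Behavior at zero: $J_0(0) = 1$, $J_n(0) = 0$ for all $n \neq 0$
- First zero: $J_0(\beta) = 0$ first occurs at $\beta \approx 2.405$
The physical meaning of property 2: FM modulation neither creates nor destroys energy; it only redistributes the carrier energy among the sidebands. As $\beta$ grows, energy transfers from the carrier ($n = 0$) to higher-order sidebands ($|n| \geq 1$).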
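The listed properties can be spot-checked from the integral representation itself. The sketch below is illustrative only: `bessel_j` is a hypothetical helper that applies the midpoint rule to the real form $J_n(\beta) = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta - \beta\sin\theta)\,d\theta$, which follows from the complex integral because the imaginary part integrates to zero and the real part is even in $\theta$. The step count and the truncation at $|n| \leq 30$ are assumptions chosen for convenience.

```python
import math

def bessel_j(n, beta, steps=256):
    """Approximate J_n(beta) with the midpoint rule on [0, pi]."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * h                  # midpoint of the k-th subinterval
        total += math.cos(n * theta - beta * math.sin(theta))
    return total * h / math.pi

beta = 2.0
# Symmetry: J_{-n} = (-1)^n J_n, checked here for n = 3.
symmetry_gap = abs(bessel_j(-3, beta) - (-1) ** 3 * bessel_j(3, beta))
# Normalization: the squared amplitudes sum to 1 (tail beyond |n| = 30 is negligible).
power_sum = sum(bessel_j(n, beta) ** 2 for n in range(-30, 31))
# Behavior at zero and near the first zero of J_0.
j0_at_zero = bessel_j(0, 0.0)
j0_near_first_zero = bessel_j(0, 2.405)

print(symmetry_gap, power_sum, j0_at_zero, j0_near_first_zero)
```

The midpoint rule is a reasonable choice here because the integrand is smooth and periodic, so the error falls off very quickly with the number of steps.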
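The FM signal's spectrum consists of the carrier flanked by symmetric sidebands:
$$x(t) = A \sum_{n=-\infty}^{\infty} J_n(\beta)\,\sin\bigl(2\pi (f_c + n f_m)\,t\bigr)$$
The $n$-th pair of sidebands sits at $f_c \pm n f_m$, with amplitude determined by $|J_n(\beta)|$. Sideband spacing is uniformly $f_m$.
When $\beta = 0$, only the carrier remains: a pure tone. As $\beta$ increases, sidebands emerge and the carrier weakens.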
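Narration: “On the surface this formula contains a single sine wave, but another sine wave is nested inside it. To see clearly what it does to the spectrum, we expand it, just as a prism splits white light into a rainbow.”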
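The Carson bandwidth rule (Carson, 1922) gives an approximation for the bandwidth containing about 98% of the FM signal's power:
$$B \approx 2(\Delta f + f_m) = 2(\beta + 1)\,f_m$$
Opera vibrato ($\beta = 2$, $f_m = 6$ Hz) yields a Carson bandwidth of
$$B \approx 2(2 + 1) \times 6 = 36 \text{ Hz}$$
The spectrum is concentrated within $f_c \pm 18$ Hz. Compare DX7 synthesis ($\beta = 10$, $f_m = 300$ Hz): $B \approx 2(10 + 1) \times 300 = 6600$ Hz, so the bandwidth expands roughly 180-fold.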
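As a rough check of the rule, the sketch below computes the Carson bandwidth for both parameter sets and the fraction of power carried by sidebands with $|n| \leq \beta + 1$. The helper names are hypothetical, and the Bessel values come from a simple midpoint-rule quadrature of $\frac{1}{\pi}\int_0^{\pi}\cos(n\theta - \beta\sin\theta)\,d\theta$ chosen for this illustration, not from the episode.

```python
import math

def bessel_j(n, beta, steps=512):
    """Approximate J_n(beta) by the midpoint rule on the real integral form."""
    h = math.pi / steps
    return sum(math.cos(n * (k + 0.5) * h - beta * math.sin((k + 0.5) * h))
               for k in range(steps)) / steps

def carson_bandwidth(beta, f_m):
    """Carson's rule: B = 2 * (beta + 1) * f_m, in Hz."""
    return 2.0 * (beta + 1.0) * f_m

def power_inside_carson(beta):
    """Fraction of total power in the sidebands with |n| <= beta + 1."""
    n_max = int(beta + 1)
    return sum(bessel_j(n, beta) ** 2 for n in range(-n_max, n_max + 1))

opera_bw = carson_bandwidth(2, 6)        # vibrato example
dx7_bw = carson_bandwidth(10, 300)       # synthesis example
print(opera_bw, dx7_bw, dx7_bw / opera_bw)
print(power_inside_carson(2), power_inside_carson(10))
```

For these two values of $\beta$ the captured power comes out above the nominal 98%, which is consistent with the rule being an approximation and not an exact bound.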
Main Theorems
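Theorem 19.1 (Jacobi-Anger expansion). For any real $\beta$ and $\theta$,
$$e^{i\beta\sin\theta} = \sum_{n=-\infty}^{\infty} J_n(\beta)\,e^{in\theta}$$
The equivalent real form is
$$\sin(\alpha + \beta\sin\theta) = \sum_{n=-\infty}^{\infty} J_n(\beta)\,\sin(\alpha + n\theta)$$
Substituting the FM signal parameters ($\alpha = 2\pi f_c t$, $\theta = 2\pi f_m t$) yields
$$x(t) = A\sum_{n=-\infty}^{\infty} J_n(\beta)\,\sin\bigl(2\pi(f_c + n f_m)\,t\bigr)$$
Physical interpretation: The FM signal is a superposition of infinitely many pure sinusoids. The $n$-th component has frequency $f_c + n f_m$ and amplitude $A\,J_n(\beta)$.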
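Strategy: Treat $e^{i\beta\sin\theta}$ as a $2\pi$-periodic function of $\theta$ and expand it as a Fourier series.
Step 1: Fourier coefficients. Write
$$e^{i\beta\sin\theta} = \sum_{n=-\infty}^{\infty} c_n\,e^{in\theta}$$
The Fourier coefficients are
$$c_n = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i\beta\sin\theta}\,e^{-in\theta}\,d\theta = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(\beta\sin\theta - n\theta)}\,d\theta$$
Step 2: Identification with Bessel functions. Comparing with the integral representation in Definition 19.3:
$$J_n(\beta) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(\beta\sin\theta - n\theta)}\,d\theta$$
we conclude $c_n = J_n(\beta)$.
Step 3: Substitution into the FM signal. Setting $\theta = 2\pi f_m t$:
$$e^{i\beta\sin(2\pi f_m t)} = \sum_{n=-\infty}^{\infty} J_n(\beta)\,e^{i 2\pi n f_m t}$$
The FM signal can be written as
$$x(t) = A\,\operatorname{Im}\!\left[e^{i 2\pi f_c t}\,e^{i\beta\sin(2\pi f_m t)}\right] = A\,\operatorname{Im}\!\left[\sum_{n=-\infty}^{\infty} J_n(\beta)\,e^{i 2\pi (f_c + n f_m) t}\right]$$
Taking the imaginary part:
$$x(t) = A\sum_{n=-\infty}^{\infty} J_n(\beta)\,\sin\bigl(2\pi(f_c + n f_m)\,t\bigr)$$
Convergence: For fixed $\beta$, $|J_n(\beta)| \leq (|\beta|/2)^{|n|}/|n|!$, so $J_n(\beta) \to 0$ faster than exponentially as $|n| \to \infty$, and the series converges absolutely and uniformly.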
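The expansion can also be confirmed numerically by comparing the closed-form FM signal with a truncated sideband sum. This is a sketch under stated assumptions: unit amplitude, the opera-vibrato parameters used in this episode, truncation at $|n| \leq 15$, and a hypothetical midpoint-rule helper `bessel_j` for the coefficients.

```python
import math

def bessel_j(n, beta, steps=256):
    """Approximate J_n(beta) via (1/pi) * integral_0^pi cos(n*theta - beta*sin(theta))."""
    h = math.pi / steps
    return sum(math.cos(n * (k + 0.5) * h - beta * math.sin((k + 0.5) * h))
               for k in range(steps)) / steps

def fm_direct(t, f_c, f_m, beta):
    """Closed-form FM signal with unit amplitude."""
    return math.sin(2 * math.pi * f_c * t + beta * math.sin(2 * math.pi * f_m * t))

def fm_from_sidebands(t, f_c, f_m, coeffs):
    """Truncated Jacobi-Anger sum: one sinusoid per sideband index n."""
    return sum(j_n * math.sin(2 * math.pi * (f_c + n * f_m) * t)
               for n, j_n in coeffs.items())

f_c, f_m, beta, n_max = 440.0, 6.0, 2.0, 15
coeffs = {n: bessel_j(n, beta) for n in range(-n_max, n_max + 1)}
times = [k / 2000.0 for k in range(400)]       # 0.2 s, more than one vibrato cycle
max_error = max(abs(fm_direct(t, f_c, f_m, beta) - fm_from_sidebands(t, f_c, f_m, coeffs))
                for t in times)
print(max_error)
```

Because the coefficients decay so quickly in $|n|$, a handful of sideband pairs already reproduces the signal to near machine precision at this modulation index.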
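Theorem 19.2 (power conservation). For any real $\beta$,
$$\sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1$$
Equivalently, $J_0^2(\beta) + 2\sum_{n=1}^{\infty} J_n^2(\beta) = 1$ (using $J_{-n}(\beta) = (-1)^n J_n(\beta)$, hence $J_{-n}^2 = J_n^2$).
By Parseval's theorem, the Fourier coefficients $c_n$ of the $2\pi$-periodic function $g(\theta) = e^{i\beta\sin\theta}$ satisfy
$$\sum_{n=-\infty}^{\infty} |c_n|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |g(\theta)|^2\,d\theta$$
Since $|e^{i\beta\sin\theta}| = 1$ for all $\theta$, the right-hand side equals
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} 1\,d\theta = 1$$
Therefore $\sum_{n=-\infty}^{\infty} J_n^2(\beta) = 1$, since $c_n = J_n(\beta)$.
Physical meaning: FM modulation only redistributes total power ($A^2/2$) among the sidebands. Regardless of $\beta$, the mean-square power of $x(t)$ remains $A^2/2$. Energy is neither created nor destroyed by modulation; it only flows between the carrier and its sidebands.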
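The carrier component (frequency $f_c$) has amplitude $A\,J_0(\beta)$. The first zero of $J_0$ occurs at
$$\beta_1 \approx 2.4048$$
At $\beta = \beta_1$, the carrier line vanishes entirely from the spectrum. All energy resides in the sidebands.
More generally, $J_0$ has infinitely many positive zeros (spaced approximately $\pi$ apart); the carrier vanishes at each one.
The power series expansion of $J_0$ is
$$J_0(\beta) = \sum_{k=0}^{\infty} \frac{(-1)^k}{(k!)^2}\left(\frac{\beta}{2}\right)^{2k} = 1 - \frac{\beta^2}{4} + \frac{\beta^4}{64} - \cdots$$
This is a damped oscillatory function (asymptotically resembling $\sqrt{2/(\pi\beta)}\cos(\beta - \pi/4)$). Substituting into the power series gives $J_0(2) \approx 0.224 > 0$ and $J_0(3) \approx -0.260 < 0$, so by the intermediate value theorem $J_0$ has at least one zero in $(2, 3)$. Numerical computation gives $\beta_1 \approx 2.4048$.
Asymptotic formula ($\beta \to \infty$):
$$J_0(\beta) \approx \sqrt{\frac{2}{\pi\beta}}\,\cos\!\left(\beta - \frac{\pi}{4}\right)$$
This confirms zeros are spaced approximately $\pi$ apart, with an envelope decaying as $\beta^{-1/2}$.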
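The intermediate-value argument translates directly into a bisection search. The following sketch is one way to do it, assuming the power series is truncated at 30 terms (ample for $\beta \leq 3$); the function names are hypothetical.

```python
def j0_series(beta, terms=30):
    """J_0(beta) = sum_k (-1)^k (beta/2)^(2k) / (k!)^2, truncated."""
    total, term = 0.0, 1.0                     # term holds the k-th summand
    for k in range(terms):
        total += term
        term *= -((beta / 2.0) ** 2) / ((k + 1) ** 2)
    return total

def first_zero_of_j0(lo=2.0, hi=3.0, tol=1e-10):
    """Bisection on (lo, hi), where J_0 changes sign from positive to negative."""
    if not (j0_series(lo) > 0.0 > j0_series(hi)):
        raise ValueError("J_0 must change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if j0_series(mid) > 0.0:
            lo = mid                           # zero lies to the right of mid
        else:
            hi = mid                           # zero lies to the left of mid
    return 0.5 * (lo + hi)

beta_1 = first_zero_of_j0()
print(round(beta_1, 4))
```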
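Narration: “When $\beta$ is about 2.4, $J_0$ reaches zero for the first time, and the carrier line disappears completely. The sound continues, but the line at the original fundamental is gone. All the energy is in the sidebands.”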
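For typical opera vibrato parameters $f_c = 440$ Hz, $f_m = 6$ Hz, $\beta = 2$, the expanded sideband amplitudes are
| Component | Frequency (Hz) | Amplitude | Power fraction |
|---|---|---|---|
| Carrier | 440 | $J_0(2) \approx 0.224$ | 5.0% |
| 1st pair | 434, 446 | $J_1(2) \approx 0.577$ | 33.3% (each side) |
| 2nd pair | 428, 452 | $J_2(2) \approx 0.353$ | 12.5% (each side) |
| 3rd pair | 422, 458 | $J_3(2) \approx 0.129$ | 1.7% (each side) |
Verification:
$$J_0^2 + 2\,(J_1^2 + J_2^2 + J_3^2) \approx 0.050 + 2\,(0.333 + 0.125 + 0.017) \approx 1.00$$
to within rounding; the small remainder (about 0.2%) lies in sidebands with $|n| \geq 4$.
At $\beta = 2$ the carrier holds only 5% of total power; most energy has already shifted to the first and second sideband pairs. But since sideband spacing is only 6 Hz (far below the auditory frequency resolution threshold), the ear fuses the entire spectral cluster into a single pitch with a “warm trembling” quality.
Direct substitution into the Theorem 19.1 expansion, using numerical tables (or power series computation) for $J_n(2)$:
$$J_0(2) \approx 0.2239,\quad J_1(2) \approx 0.5767,\quad J_2(2) \approx 0.3528,\quad J_3(2) \approx 0.1289$$
Power conservation is guaranteed by Theorem 19.2.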
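The sideband table can be regenerated from the power series. This is a minimal sketch, assuming the series $J_n(\beta) = \sum_k \frac{(-1)^k}{k!\,(k+n)!}(\beta/2)^{2k+n}$ truncated at 30 terms; the helper name and the row layout are choices made for this illustration.

```python
import math

def bessel_j(n, beta, terms=30):
    """Power series for J_n(beta), valid for integer n >= 0."""
    half = beta / 2.0
    return sum((-1) ** k * half ** (2 * k + n)
               / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

f_c, f_m, beta = 440.0, 6.0, 2.0
rows = []
for n in range(4):
    amplitude = bessel_j(n, beta)
    # Each row: index, lower sideband, upper sideband, amplitude, power per side.
    rows.append((n, f_c - n * f_m, f_c + n * f_m, amplitude, amplitude ** 2))

# The carrier counts once; every sideband pair counts twice.
captured = rows[0][4] + 2.0 * sum(row[4] for row in rows[1:])

for n, f_lower, f_upper, amplitude, power in rows:
    print(n, f_lower, f_upper, round(amplitude, 3), round(100 * power, 1))
print(round(captured, 4))
```

The captured fraction falls just short of 1 because the sum stops at the third sideband pair.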
From Vibrato to Synthesis: Chowning’s Discovery
Theorem 19.4 (perceptual regions). The FM equation produces three perceptual regions in the $(f_m, \beta)$ parameter plane:
- Vibrato region ($f_m \lesssim 7$ Hz, $\beta \lesssim 4$): The auditory system tracks the periodic deviation of the instantaneous frequency and perceives “pitch is moving.” Carson bandwidth $B \leq 70$ Hz is below the critical bandwidth (~100 Hz at 440 Hz), so sidebands are unresolvable.
- Vibrato-tremolo transition ($f_m \approx 7$–10 Hz): The modulation rate exceeds the auditory system's pitch-tracking capacity. Perception gradually shifts from “pitch is moving” to “amplitude is fluctuating” (tremolo). This is not a sharp boundary but a gradual crossover depending on $\beta$ and individual differences.
- FM synthesis region ($f_m$ above the critical bandwidth, especially when $f_m$ is comparable to $f_c$): Sideband spacing exceeds the critical bandwidth; individual sidebands are resolved as independent frequency components. $\beta$ controls timbral richness, and different $(f_c : f_m, \beta)$ combinations simulate brass, electric piano, bells, and other timbres.
Vibrato region spectral constraint: With $\beta \leq 4$, $f_m \leq 7$ Hz, Carson bandwidth $B = 2(\beta + 1) f_m \leq 70$ Hz. At 440 Hz, the equivalent rectangular bandwidth (ERB) is approximately
$$\text{ERB}(0.44 \text{ kHz}) \approx 6.23\,(0.44)^2 + 93.39\,(0.44) + 28.52 \approx 71 \text{ Hz}$$
(Moore-Glasberg formula, 1983, with frequency in kHz). Since $B \leq 70 \text{ Hz} < \text{ERB}$, all sidebands fall within a single critical band and are fused by the auditory system into a single “broadened” pitch.
FM synthesis region spectral constraint: With $f_m = 300$ Hz, $\beta = 10$, $B = 6600$ Hz. The sideband spacing of 300 Hz far exceeds the ERB; individual components are independently perceived, producing an entirely new timbre.
Transition mechanism: When $f_m$ approaches ~7–10 Hz, the modulation period (~100–140 ms) nears the auditory system's pitch-tracking time constant (~130 ms, Demany & Semal 1989). When tracking fails, the auditory system reinterprets frequency modulation as envelope modulation, i.e. tremolo.
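The two spectral constraints can be turned into a rough classifier. The sketch below is illustrative only and rests on several assumptions: it uses the Glasberg & Moore (1990) approximation $\text{ERB} = 24.7\,(4.37F + 1)$ with $F$ in kHz (which gives a value close to, but not identical with, the 1983 polynomial), the function names are hypothetical, and the decision rule ignores the 7–10 Hz vibrato-tremolo crossover entirely.

```python
def erb_hz(f_hz):
    """Glasberg & Moore (1990) ERB approximation, with f converted to kHz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def carson_bandwidth(beta, f_m):
    """Carson's rule: B = 2 * (beta + 1) * f_m, in Hz."""
    return 2.0 * (beta + 1.0) * f_m

def spectral_region(f_c, f_m, beta):
    """Crude label based only on sideband spacing and Carson bandwidth vs. ERB."""
    erb = erb_hz(f_c)
    if f_m > erb:
        return "resolved"        # sideband spacing exceeds one ERB
    if carson_bandwidth(beta, f_m) < erb:
        return "fused"           # whole cluster fits inside one ERB
    return "intermediate"        # neither condition holds

print(round(erb_hz(440.0), 1))
print(spectral_region(440.0, 6.0, 2.0))
print(spectral_region(440.0, 300.0, 10.0))
```

A perceptual model would need more than these two thresholds; the labels only restate the bandwidth comparisons made in the text.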
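Narration: “He realized that vibrato and timbre synthesis are just different parameters of the same equation. Low modulation frequency with low modulation depth is vibrato; high modulation frequency with high modulation depth is FM synthesis.”
Historical note: In 1967, John Chowning was working in the Stanford AI Laboratory, adjusting these parameters. He raised $f_m$ from 6 Hz to several hundred Hz. The sound abruptly changed: no longer a trembling note, but an entirely new timbre of brass, electric piano, and bells.
Narration: “In 1967, when Chowning raised the modulation frequency from six hertz to several hundred hertz, the sound suddenly changed. It was no longer a trembling note but an entirely new timbre: brass, electric piano, bells.”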
He published in 1973 (The Synthesis of Complex Audio Spectra by Means of Frequency Modulation). In 1983 the Yamaha DX7 commercialized the equation. The DX7 sold 200,000 units and defined the sound of 1980s pop music.
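Narration: “The same formula. On the opera stage it is vibrato. In the recording studio it is a synthesizer. Let's listen. The same equation, two parameters changed, and the sound is completely different.”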
DX7-Style FM Demo
The video demonstrates four timbres synthesized from the same FM equation, using carrier-to-modulator ratios $f_c : f_m$ and time-varying modulation index envelopes $\beta(t)$:
| Preset | $f_c$ | $f_m$ | $f_c : f_m$ | $\beta$ (peak → sustain) | Character |
|---|---|---|---|---|---|
| Electric Piano | 200 Hz | 200 Hz | 1 : 1 | 8 → 2 | Bright attack, mellow sustain |
| Bell | 200 Hz | 280 Hz | 1 : 1.4 | 12 → 4 | Metallic shimmer, long ring |
| Brass | 200 Hz | 200 Hz | 1 : 1 | 10 → 6 | Slow buildup, sustained brightness |
| Wood Bass | 100 Hz | 100 Hz | 1 : 1 | 5 → 1 | Punchy attack, warm decay |
A key point: real DX7 patches use a time-varying $\beta(t)$, not a static modulation index. The modulation envelope decays faster than the carrier amplitude envelope; this is what gives the characteristic “bright attack that mellows out.” Each preset uses separate ADSR envelopes for amplitude and $\beta$, with the $\beta$ envelope decaying from $\beta_{\text{peak}}$ to $\beta_{\text{sustain}}$.
The non-integer ratio 1 : 1.4 for the bell is deliberate: inharmonic sidebands (at non-integer multiples of $f_c$) produce the metallic, bell-like quality characteristic of FM bells.
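A time-varying modulation index is easy to sketch in code. The example below is not the demo's implementation: it replaces the ADSR envelopes with plain exponential decays, and the time constants `beta_tau` and `amp_tau`, the duration, and the function name `fm_tone` are all assumptions made for illustration. The preset values are taken from the table.

```python
import math

# (f_c in Hz, f_m in Hz, beta_peak, beta_sustain), copied from the preset table.
PRESETS = {
    "electric_piano": (200.0, 200.0, 8.0, 2.0),
    "bell": (200.0, 280.0, 12.0, 4.0),
    "brass": (200.0, 200.0, 10.0, 6.0),
    "wood_bass": (100.0, 100.0, 5.0, 1.0),
}

def fm_tone(f_c, f_m, beta_peak, beta_sustain, duration=0.5,
            sample_rate=44100, beta_tau=0.1, amp_tau=0.4):
    """Return FM samples whose index decays faster than the amplitude."""
    samples = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        # Modulation index relaxes from beta_peak toward beta_sustain.
        beta_t = beta_sustain + (beta_peak - beta_sustain) * math.exp(-t / beta_tau)
        # Amplitude decays more slowly (amp_tau > beta_tau).
        amp_t = math.exp(-t / amp_tau)
        phase = 2 * math.pi * f_c * t + beta_t * math.sin(2 * math.pi * f_m * t)
        samples.append(amp_t * math.sin(phase))
    return samples

bell = fm_tone(*PRESETS["bell"])
print(len(bell), max(abs(s) for s in bell))
```

Choosing `beta_tau` smaller than `amp_tau` is what reproduces the “bright attack that mellows out”: the spectrum narrows while the note is still audible.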
The Formant Sweep Mechanism
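Theorem 19.5 (formant sweep). Let the voice fundamental frequency be $f_0$, with harmonic series $k f_0$, $k = 1, 2, 3, \ldots$ When vibrato (modulation rate $f_m$, modulation index $\beta$) is applied, every harmonic is frequency-modulated simultaneously. The instantaneous frequency of the $k$-th harmonic is
$$f_k(t) = k f_0 + k\beta f_m\cos(2\pi f_m t)$$
The frequency deviation $\Delta f_k = k\beta f_m$ grows linearly with harmonic number $k$.
Taking A4 as an example ($f_0 = 440$ Hz, $f_m = 6$ Hz, $\beta \approx 4.2$, corresponding to ±1 semitone or ±25 Hz deviation):
| Harmonic | Center freq. | Deviation | Sweep range |
|---|---|---|---|
| $k = 1$ | 440 Hz | ±25 Hz | 415–465 Hz |
| $k = 3$ | 1320 Hz | ±75 Hz | 1245–1395 Hz |
| $k = 7$ | 3080 Hz | ±175 Hz | 2905–3255 Hz |
| $k = 8$ | 3520 Hz | ±200 Hz | 3320–3720 Hz |
The 7th and 8th harmonics sweep across the Singer's Formant window (2500–3500 Hz, EP15 Definition 15.3). These harmonics act like brooms, sweeping through the formant 5–7 times per second.
If the fundamental is FM-modulated, $f_0(t) = f_0 + \beta f_m\cos(2\pi f_m t)$, then in the linear source model, the $k$-th harmonic's frequency is $k$ times the fundamental:
$$f_k(t) = k\,f_0(t) = k f_0 + k\beta f_m\cos(2\pi f_m t)$$
The deviation $k\beta f_m$ grows linearly in $k$.
For A4 ($f_0 = 440$ Hz), two harmonics near 3 kHz are $k = 7$ (3080 Hz) and $k = 8$ (3520 Hz). With $\beta \approx 4.2$, $f_m = 6$ Hz (so $\beta f_m \approx 25$ Hz):
$$\Delta f_7 = 7 \times 25 = 175 \text{ Hz}, \qquad \Delta f_8 = 8 \times 25 = 200 \text{ Hz}$$
Sweep ranges: $[2905, 3255]$ Hz and $[3320, 3720]$ Hz. The 7th harmonic stays inside the 2500–3500 Hz Singer's Formant window throughout the cycle, while the 8th crosses its upper edge on every cycle.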
Effect: Sidebands continually enter and exit the formant window, repeatedly routing existing harmonic energy through the formant peak. Total energy is conserved (Theorem 19.2) — no additional energy is created. Rather, time-averaged over the modulation cycle, the formant band is repeatedly excited, producing a sustained, flickering brightness.
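To see how much of each vibrato cycle a harmonic spends inside the window, one can sample the instantaneous frequency over a full modulation period. This is a minimal sketch under the linear source model described above; the function names are hypothetical, the window edges are the 2500–3500 Hz values quoted in the text, and the ±25 Hz fundamental deviation is taken from the A4 example.

```python
import math

def harmonic_sweep(k, f0, deviation_hz):
    """Return (low, high) sweep limits of the k-th harmonic in Hz."""
    center, deviation = k * f0, k * deviation_hz
    return center - deviation, center + deviation

def fraction_in_window(k, f0, deviation_hz, lo=2500.0, hi=3500.0, steps=10000):
    """Fraction of one modulation cycle during which harmonic k lies in [lo, hi]."""
    inside = 0
    for i in range(steps):
        phase = 2 * math.pi * i / steps        # one full vibrato cycle
        f_inst = k * f0 + k * deviation_hz * math.cos(phase)
        if lo <= f_inst <= hi:
            inside += 1
    return inside / steps

for k in (7, 8):
    print(k, harmonic_sweep(k, 440.0, 25.0), fraction_in_window(k, 440.0, 25.0))
```

Under these assumptions the 7th harmonic never leaves the window, while the 8th is inside it for a little under half of each cycle, which is the repeated entering and exiting that the text describes.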
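Narration: “Vibrato is not just a ‘pleasant wobble’. It is a frequency-scanning strategy that keeps the singer's voice flickering inside the 3 kHz window, which makes it more penetrating.”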
Perceptual Averaging and the “More to Hear” Principle
A key insight from the narration synthesizes perception and spectrum:
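Narration: “Vibrato makes the pitch cycle within a small range, but the brain time-averages over the whole modulation period: what is perceived is the pitch at the center of the wobble, not the instantaneous extremes. At the same time, the extra spectral components brought by the sidebands give the auditory system more anchor points, so the sound becomes more solid and clearer. The trembling gives the brain more to hear.”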
The auditory system performs temporal integration over the full modulation period. The perceived pitch is the time-averaged center of the wobble, not the instantaneous extremes. Meanwhile, the additional spectral components introduced by the sidebands provide the auditory system with more “anchor points” — the sound becomes richer and more vivid, not less stable. Paradoxically, the trembling adds perceptual clarity.
Numerical Example: Varying the Modulation Index
Fixing $f_c = 440$ Hz, $f_m = 6$ Hz, observe how the spectrum evolves as $\beta$ increases from 0:
| $\beta$ | $J_0(\beta)$ | $J_1(\beta)$ | $J_2(\beta)$ | $J_3(\beta)$ | Musical type | Freq. deviation |
|---|---|---|---|---|---|---|
| 0 | 1.000 | 0 | 0 | 0 | Pure tone | 0 |
| 0.5 | 0.938 | 0.242 | 0.031 | 0.003 | Pop vibrato | ±3 Hz |
| 2 | 0.224 | 0.577 | 0.353 | 0.129 | Opera vibrato | ±12 Hz |
| 2.405 | 0 | 0.519 | 0.432 | 0.199 | Carrier vanishes | ±14.4 Hz |
| 4 | −0.397 | −0.066 | 0.364 | 0.430 | Wide vibrato | ±24 Hz |
| 10 | −0.246 | 0.043 | 0.255 | 0.058 | DX7 synthesis | ±60 Hz |
At $\beta \approx 2.405$, $J_0(\beta) = 0$: the carrier line disappears entirely. The sound continues, but the spectral line at the “fundamental” is gone, and all energy is in the sidebands.
Four Perspectives on One Equation
The narration concludes by viewing the FM equation through four complementary lenses:
Narration: “One equation, four perspectives.”
- Perception: Vibrato causes pitch to cycle within a small range, but the brain time-averages over the full modulation period, perceiving the center pitch and not the instantaneous extremes. The sidebands supply the auditory system with extra spectral anchors, making the sound feel more solid and vivid.
- Spectrum: Bessel functions distribute energy across sidebands. Small $\beta$ is vibrato; large $\beta$ is synthesis.
- Acoustic engineering: Sidebands repeatedly sweep through the formant; time-averaged, they sustain excitation of the 3 kHz band and enhance penetration.
- Synthesis: The same equation, two parameters changed, leads from the opera stage straight to the DX7.
Narration: “Frequency modulation. From vibration to synthesis, from Bessel to three thousand hertz. The singer's tremble and the synthesizer's timbre use the same equation; the only change is pushing the modulation frequency from six hertz to several hundred hertz, and the modulation depth from a few to a few tens. Moving two parameters spans the whole of music history.”
Musical Connection
Vibrato: A Frequency Sweep Strategy
Vibrato is not merely “pleasant trembling.” From Theorem 19.5, it is a frequency sweep strategy: it makes the singer's upper harmonics flicker continuously inside the Singer's Formant window. Sidebands repeatedly sweep through the formant; time-averaged, they sustain excitation of the Singer's Formant band. Penetration comes not from greater loudness but from smarter frequency allocation.
From EP09 to EP19: Nonlinearity and Modulation
EP09 showed how nonlinear systems create new frequencies from two inputs (combination tones such as $2f_1 - f_2$). EP19's FM is a different mechanism: not nonlinearity generating new frequencies, but phase modulation redistributing energy into a sideband structure. Yet the final effect is similar: a single spectral line broadens into many. The cochlea's nonlinearity (EP09) and the vocal folds' frequency modulation (EP19) both enrich the spectrum, one at the receiver and one at the transmitter.
One Equation, Two Worlds
The same formula. $f_m = 6$ Hz, $\beta = 2$: opera vibrato on stage. $f_m = 200$ Hz, $\beta$ falling from 8 to 2: electric piano in the DX7 studio. A shift of two parameters spans the entire history from vocal music to electronic music.
FM Signatures Across Singing Styles
| Style | $f_m$ | $\beta$ | Extent | Perception |
|---|---|---|---|---|
| Opera | 5–7 Hz | 2–4 | ±1/2–1 semitone | Warm, sustained, penetrating |
| Pop | 4–5 Hz | 0.5–2 | ±1/4–1/2 semitone | Soft, decorative |
| Tremolo transition | 7–10 Hz | – | – | From “pitch moving” to “amplitude fluctuating” |
**Forward:** This episode establishes the mathematical foundation of the FM equation. EP34 will build on it to examine the DX7's six-operator architecture, the bifurcation phenomena caused by operator feedback, and the practical application of the Carson bandwidth rule in synthesizer design.
The Closing Image
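Narration: “The next time you hear an opera singer's voice trembling slightly, don't just find it pleasant. That is a frequency modulation equation at work. Sidebands are sweeping across the formant, Bessel functions are distributing energy, and the three-thousand-hertz window keeps flickering.”
Narration: “Everything is hidden in that formula.”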
Limits and Open Questions
- The Jacobi-Anger expansion assumes pure sinusoidal modulation. Real opera vibrato is not a perfect sine: it typically includes a slow onset phase and irregular period-to-period fluctuations. FM signals with non-sinusoidal modulation have no simple closed-form Bessel expansion; STFT or wavelet analysis (EP35/EP36) is needed.
- Limitation of the linear source model. Theorem 19.5 assumes each harmonic's frequency deviation is strictly proportional to its harmonic number. But the glottal wave is not an ideal harmonic source: nonlinearities during the glottal closed phase cause upper harmonics to deviate from linear predictions.
- Perceptual nonlinearity. The ear perceives frequency deviation logarithmically (in cents), not linearly. The same deviation in Hz corresponds to different cent deviations in low and high registers. A complete perceptual model requires the ERB frequency scale (Glasberg & Moore, 1990) combined with the modulation transfer function (Viemeister, 1979).
- The vibrato-tremolo boundary. The transition zone (~7–10 Hz) in Theorem 19.4 is an empirical observation lacking a precise mathematical characterization. The key question is: how do modulation frequency and modulation depth jointly determine the perceptual crossover? Does there exist a critical curve separating vibrato from tremolo percepts?
A working conjecture: in the $(f_m, \beta)$ parameter space, there exists a perceptual critical curve separating the “vibrato percept” from the “tremolo percept.” Preliminary evidence suggests the curve runs roughly through the 7–10 Hz transition zone, but bends toward lower $f_m$ at large $\beta$, meaning that deeper modulation causes the auditory system to abandon frequency tracking sooner.
Open question: What is the precise shape of this curve? Does it depend on carrier frequency $f_c$?
References
- Chowning, J. M. (1973). The synthesis of complex audio spectra by means of frequency modulation. Journal of the Audio Engineering Society, 21(7), 526–534. The founding paper of FM synthesis; complete derivation from vibrato to synthesizer.
- Watson, G. N. (1944). A Treatise on the Theory of Bessel Functions (2nd ed.). Cambridge University Press. Standard reference for Bessel functions; rigorous proof of the Jacobi-Anger expansion (Ch. 2).
- Sundberg, J. (1987). The Science of the Singing Voice. Northern Illinois University Press. Measured acoustic parameters of opera vibrato (rate and extent); interaction between the Singer's Formant and vibrato.
- Carson, J. R. (1922). Notes on the theory of modulation. Proceedings of the IRE, 10(1), 57–64. Original derivation of the FM bandwidth approximation formula.
- Glasberg, B. R., & Moore, B. C. J. (1990). Derivation of auditory filter shapes from notched-noise data. Hearing Research, 47(1–2), 103–138. Equivalent rectangular bandwidth (ERB) formula; estimating FM sideband resolvability.
- Demany, L., & Semal, C. (1989). Detection thresholds for sinusoidal frequency modulation. Journal of the Acoustical Society of America, 85(3), 1295–1301. FM detection thresholds; psychoacoustic data on the vibrato-tremolo transition zone.
- Viemeister, N. F. (1979). Temporal modulation transfer functions based upon modulation thresholds. Journal of the Acoustical Society of America, 66(5), 1364–1380. Temporal modulation transfer functions; bandwidth limitations of the auditory system's amplitude and frequency modulation tracking.
- Prame, E. (1997). Vibrato extent and intonation in professional Western lyric singing. Journal of the Acoustical Society of America, 102(1), 616–621. Vibrato parameter statistics from 42 opera singers (mean rate; mean extent ±0.6 semitones).