EP51: Why Twelve Notes? — Continued Fractions and Tuning Optimization

From the Pythagorean Comma to Continued Fractions: The Mathematical Inevitability of Twelve-Tone Equal Temperament

Number Theory · Harmonic Analysis · Musicology

Prerequisites

EP04 All-Interval Rows and ℤ₁₂ · EP11 Comma Drift and the Impossibility of Perfect Tuning

Overview

Why does a piano have twelve keys per octave? The answer is not aesthetic convention, not historical accident, and not cultural consensus. It is a continued fraction — a 2300-year-old piece of number theory that Pythagoras did not yet have the language to state, but whose question he was already asking.

The central problem: a perfect fifth has frequency ratio 3/2. If the circle of fifths closed, twelve stacked perfect fifths would land exactly seven octaves above the starting pitch. But twelve fifths give $(3/2)^{12} \approx 129.75$, while seven octaves give $2^7 = 128$. The gap, the Pythagorean comma $\approx 1.0136$, means the circle of fifths never closes exactly. The underlying reason is that $\log_2(3/2)$ is irrational: $\log_2(3/2) \notin \mathbb{Q}$.

This episode answers: which integers $n$ make $(3/2)^n$ closest to a power of 2? The answer is given by the denominators of the convergents of the continued fraction expansion of $\log_2(3/2)$, and the first convergent whose fifth error is below 2 cents has denominator 12. Twelve-tone equal temperament (12-EDO) is mathematically distinguished, not merely conventional.

“Why does a piano have twelve keys? Not tradition, not aesthetics: it is a continued fraction from 2,300 years ago.”

The broader lesson applies far beyond music: the convergents of a continued fraction are the best rational approximations to any irrational number, in the precise sense of Definition 51.4. The path from Pythagoras’s comma to the keyboard is a direct application of this classical theory of Diophantine approximation, the same circle of ideas as Dirichlet’s approximation theorem.

Prerequisites

All-Interval Rows and ℤ₁₂ (EP04) — the twelve pitch classes and cyclic group structure underlying equal temperament
Comma Tuning and Intonation (EP11) — the Pythagorean comma as the root cause of tuning difficulty; this episode provides the mathematical explanation

Definitions

Definition 51.1 (Equal Division of the Octave (EDO))

An $n$ -tone equal temperament ( $n$ -EDO) divides the octave (frequency ratio 2:1) into $n$ equal steps. Each step has frequency ratio $2^{1/n}$ .

The cent is 1/1200 of an octave: one semitone in 12-EDO equals 100 cents. The frequency ratio corresponding to $c$ cents is $2^{c/1200}$ .

In $n$ -EDO, a “fifth” is the closest approximation to the just perfect fifth (ratio 3/2): $k$ steps where $k = \operatorname{round}(n \cdot \log_2(3/2))$ . The fifth error in cents is $\varepsilon_5(n) = 1200 \left| \frac{k}{n} - \log_2\!\frac{3}{2} \right|$
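The definitions above can be checked numerically. The following is a minimal sketch, not part of the episode's own material; the function names `cents_to_ratio`, `edo_fifth_steps`, and `fifth_error_cents` are hypothetical and chosen here only for readability.

```python
import math

# Size of the just perfect fifth (ratio 3/2) measured in octaves, about 0.584963.
JUST_FIFTH = math.log2(3 / 2)


def cents_to_ratio(c: float) -> float:
    """Frequency ratio corresponding to an interval of c cents."""
    return 2 ** (c / 1200)


def edo_fifth_steps(n: int) -> int:
    """Number of n-EDO steps closest to the just fifth: k = round(n * log2(3/2))."""
    return round(n * JUST_FIFTH)


def fifth_error_cents(n: int) -> float:
    """Absolute error of the n-EDO fifth against the just fifth, in cents."""
    k = edo_fifth_steps(n)
    return 1200 * abs(k / n - JUST_FIFTH)


if __name__ == "__main__":
    for n in (5, 7, 12, 19, 41, 53):
        print(n, edo_fifth_steps(n), round(fifth_error_cents(n), 2))
```

For $n = 12$ this gives $k = 7$ and an error of about 1.96 cents, matching the 700-cent tempered fifth against the 701.96-cent just fifth.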

Definition 51.2 (Continued Fraction Expansion)

Every real number $\alpha$ has a continued fraction expansion $\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}$ written $\alpha = [a_0; a_1, a_2, a_3, \ldots]$ , where $a_0 = \lfloor \alpha \rfloor$ and each partial quotient $a_k \in \mathbb{Z}_{>0}$ for $k \geq 1$ .

The $k$ -th convergent $p_k/q_k$ is obtained by truncating at depth $k$ : $\frac{p_k}{q_k} = [a_0; a_1, \ldots, a_k]$ The convergents satisfy the three-term recurrence $p_k = a_k p_{k-1} + p_{k-2}, \qquad q_k = a_k q_{k-1} + q_{k-2}$ with seeds $p_{-1} = 1, p_0 = a_0, q_{-1} = 0, q_0 = 1$ .

The expansion terminates if and only if $\alpha \in \mathbb{Q}$. For irrational $\alpha$, the convergents are the best rational approximations in the sense of Theorem 51.2.
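The three-term recurrence translates directly into code. The sketch below is illustrative only: the generator name `convergents` is hypothetical, and the example expansion $\sqrt{2} = [1; 2, 2, 2, \ldots]$ is a standard fact brought in for the demonstration rather than something taken from this episode.

```python
from fractions import Fraction


def convergents(partial_quotients):
    """Yield p_k/q_k for k = 0, 1, ... from a list [a_0, a_1, ...]."""
    p_prev, q_prev = 1, 0            # seeds p_{-1} = 1, q_{-1} = 0
    p, q = partial_quotients[0], 1   # p_0 = a_0, q_0 = 1
    yield Fraction(p, q)
    for a in partial_quotients[1:]:
        # p_k = a_k p_{k-1} + p_{k-2}, and the same for q_k
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        yield Fraction(p, q)


if __name__ == "__main__":
    # First convergents of sqrt(2) = [1; 2, 2, 2, ...]
    print([str(c) for c in convergents([1, 2, 2, 2])])
```

Exact integer arithmetic is used throughout, so the convergents carry no rounding error once the partial quotients are known.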

Definition 51.3 (Pythagorean Comma)

The Pythagorean comma is the ratio by which twelve just perfect fifths exceed seven octaves: $\kappa = \frac{(3/2)^{12}}{2^7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.013643$ In cents: $\kappa_{\text{cents}} = 1200 \log_2 \kappa \approx 23.46$ cents.

Equivalently, the comma arises because $\log_2(3/2)$ is irrational: there is no nonzero integer $n$ with $n \cdot \log_2(3/2) \in \mathbb{Z}$, so the circle of fifths never closes exactly.
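The comma can be reproduced with exact rational arithmetic. This is a small sketch under no assumptions beyond Definition 51.3; the variable names `comma` and `comma_cents` are chosen here for illustration.

```python
import math
from fractions import Fraction

fifth = Fraction(3, 2)

# Twelve just fifths divided by seven octaves: exactly 3^12 / 2^19.
comma = fifth ** 12 / Fraction(2) ** 7

# Convert to cents using the exact numerator and denominator.
comma_cents = 1200 * (math.log2(comma.numerator) - math.log2(comma.denominator))

if __name__ == "__main__":
    print(comma, float(comma), round(comma_cents, 2))
```

Because `Fraction` keeps the numerator and denominator exact, the ratio 531441/524288 appears without any floating-point rounding; only the final conversion to cents is approximate.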

Definition 51.4 (Best Rational Approximation)

A fraction $p/q$ (with $q > 0$ ) is a best rational approximation of the first kind to $\alpha$ if for all fractions $a/b$ with $0 < b \leq q$ and $a/b \neq p/q$ : $\left|\alpha - \frac{p}{q}\right| < \left|\alpha - \frac{a}{b}\right|$ That is, no fraction with denominator at most $q$ approximates $\alpha$ better than $p/q$ .

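A fraction $p/q$ is a best approximation of the second kind (a stronger condition) if for all $a/b \neq p/q$ with $0 < b \leq q$: $|q\alpha - p| < |b\alpha - a|$ Every convergent $p_k/q_k$ of an irrational $\alpha$ with $k \geq 1$ is a best approximation of both kinds. The zeroth convergent $a_0/1$ can fail: for $\alpha = \log_2(3/2)$, the fraction $1/1$ is closer than $0/1$.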

Definition 51.5 (Stern–Brocot Tree)

The Stern–Brocot tree is an infinite binary tree containing every positive rational number exactly once. It is built by the mediant operation: the mediant of $p/q$ and $r/s$ is $(p+r)/(q+s)$ .

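Starting from the boundary fractions $0/1$ and $1/0$ (representing 0 and ∞):

The root of the full tree is $1/1$, the mediant of $0/1$ and $1/0$; restricting to fractions in $(0,1)$ gives the left subtree, whose root is $1/2$ (the mediant of $0/1$ and $1/1$)
Each node $m/n$ has a left child (its mediant with the nearest ancestor or boundary to its left) and a right child (its mediant with the nearest ancestor or boundary to its right)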

Musical interpretation: Each node represents an $n$ -EDO system where $m$ steps approximate the fifth. The path from the root to any node encodes the continued fraction of the corresponding ratio.
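The mediant construction can be sketched as a short search routine. This is an illustrative sketch, not the episode's own code; the function name `stern_brocot_path` is hypothetical, and the routine assumes the target is a positive rational number.

```python
from fractions import Fraction


def stern_brocot_path(target: Fraction):
    """Return (nodes, moves) from the root 1/1 down to a positive rational target."""
    lo_n, lo_d = 0, 1   # left boundary 0/1
    hi_n, hi_d = 1, 0   # right boundary 1/0, standing for infinity
    nodes, moves = [], []
    while True:
        # The current node is the mediant of the two boundaries.
        m_n, m_d = lo_n + hi_n, lo_d + hi_d
        node = Fraction(m_n, m_d)
        nodes.append(node)
        if node == target:
            return nodes, "".join(moves)
        if target < node:
            moves.append("L")
            hi_n, hi_d = m_n, m_d   # node becomes the new right boundary
        else:
            moves.append("R")
            lo_n, lo_d = m_n, m_d   # node becomes the new left boundary


if __name__ == "__main__":
    nodes, moves = stern_brocot_path(Fraction(7, 12))
    print([str(x) for x in nodes], moves)
```

For the target 7/12 the visited nodes are 1/1, 1/2, 2/3, 3/5, 4/7, 7/12 with moves `LRLLR`; dropping the root 1/1 leaves the path inside the $(0,1)$ subtree.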

Main Theorems

Theorem 51.1 (Irrationality of log₂(3/2))

$\log_2(3/2)$ is irrational. Equivalently, no finite stack of just perfect fifths is exactly equal to any number of octaves.

Proof.

Suppose $\log_2(3/2) = p/q$ for integers $p, q > 0$ . Then $(3/2)^q = 2^p$ , giving $3^q = 2^{p+q}$ . The left side is odd (3 is odd, any power of 3 is odd); the right side is even (it is a positive power of 2). This is a contradiction. Therefore $\log_2(3/2) \notin \mathbb{Q}$ .

Geometrically: the circle of fifths is a sequence $\{n \cdot \log_2(3/2) \bmod 1 : n \in \mathbb{Z}\}$ . By Weyl’s equidistribution theorem (a consequence of irrationality), this sequence is dense in $[0,1)$ — it never returns exactly to 0, but comes arbitrarily close. $\square$

Theorem 51.2 (Convergent Optimality)

Let $\alpha$ be irrational with convergents $p_k/q_k$. For every $k \geq 1$, $\left|\alpha - \frac{p_k}{q_k}\right| \leq \frac{1}{q_k q_{k+1}} < \frac{1}{q_k^2}$ and, for any rational $a/b$ with $0 < b \leq q_k$ and $a/b \neq p_k/q_k$, $\left|\alpha - \frac{p_k}{q_k}\right| < \left|\alpha - \frac{a}{b}\right|$ That is, $p_k/q_k$ is the best rational approximation to $\alpha$ among all fractions with denominator at most $q_k$.

Proof.

The key identity is $p_k q_{k-1} - p_{k-1} q_k = (-1)^{k-1}$ (proved by induction on $k$ using the recurrence), which gives $\gcd(p_k, q_k) = 1$ and leads to the error formula $\alpha - \frac{p_k}{q_k} = \frac{(-1)^k}{q_k(q_{k+1} + \theta q_k)}$ for some $\theta \in (0,1)$. Thus $|\alpha - p_k/q_k| < 1/(q_k q_{k+1})$.

For the best-approximation claim: because consecutive convergents have determinant $\pm 1$, any integer pair $(a, b)$ with $0 < b \leq q_k < q_{k+1}$ can be written as $(a, b) = x\,(p_k, q_k) + y\,(p_{k+1}, q_{k+1})$ with integers $x, y$. If $a/b \neq p_k/q_k$, then $x$ and $y$ are both nonzero and of opposite sign, while $q_k\alpha - p_k$ and $q_{k+1}\alpha - p_{k+1}$ also have opposite signs, so the two contributions to $b\alpha - a$ add in absolute value and $|b\alpha - a| > |q_k\alpha - p_k|$. This is the second-kind best approximation property; dividing by $b \leq q_k$ gives $|\alpha - a/b| > |\alpha - p_k/q_k|$. $\square$
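The second-kind property can be observed numerically: the values of $n$ at which $n\alpha$ comes closer to an integer than for any smaller $n$ should be exactly the convergent denominators. The sketch below checks this for $\alpha = \log_2(3/2)$ in double precision; the function name `record_denominators` is hypothetical, and the search bound of 60 is an arbitrary choice for illustration.

```python
import math

ALPHA = math.log2(3 / 2)


def record_denominators(alpha: float, n_max: int):
    """Return every n <= n_max where |n*alpha - nearest integer| sets a new minimum."""
    records, best = [], math.inf
    for n in range(1, n_max + 1):
        dist = abs(n * alpha - round(n * alpha))
        if dist < best:
            best = dist
            records.append(n)
    return records


if __name__ == "__main__":
    print(record_denominators(ALPHA, 60))
```

Up to 60 the records fall at $n = 1, 2, 5, 12, 41, 53$. In musical terms, these are the numbers of stacked fifths that come closer to a whole number of octaves than any shorter stack.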

Theorem 51.3 (Continued Fraction of log₂(3/2))

The continued fraction expansion of $\log_2(3/2) \approx 0.584963$ is $\log_2\!\frac{3}{2} = [0;\, 1, 1, 2, 2, 3, 1, 5, 2, 23, \ldots]$ The convergents and their EDO interpretations are:

$k$	$p_k/q_k$	EDO	Fifth error (cents)
0	0/1	(none)	701.96
1	1/1	1-EDO	498.04
2	1/2	2-EDO	101.96
3	3/5	5-EDO	18.04
4	7/12	12-EDO	1.96
5	24/41	41-EDO	0.48
6	31/53	53-EDO	0.07

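The convergent $7/12$ is the first for which the fifth error falls below 2 cents (about 2% of a 12-EDO semitone). Because convergents are best approximations, no EDO with fewer than 12 tones has a fifth this accurate, making 12-EDO the smallest EDO with near-just perfect fifths.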

Proof.

Compute the CF numerically. Set $\alpha_0 = \log_2(3/2) \approx 0.584963$. Then $a_0 = 0$, $\alpha_1 = 1/0.584963 \approx 1.70951$, $a_1 = 1$, $\alpha_2 = 1/0.70951 \approx 1.40942$, $a_2 = 1$, $\alpha_3 = 1/0.40942 \approx 2.4425$, $a_3 = 2$, $\alpha_4 = 1/0.4425 \approx 2.2600$, $a_4 = 2$, and continuing gives $a_5 = 3, a_6 = 1, \ldots$.

Applying the recurrence with seeds $q_{-1} = 0, q_0 = 1$: $q_1 = 1 \cdot 1 + 0 = 1$, $q_2 = 1 \cdot 1 + 1 = 2$, $q_3 = 2 \cdot 2 + 1 = 5$, $q_4 = a_4 q_3 + q_2 = 2 \cdot 5 + 2 = 12$. Likewise, with seeds $p_{-1} = 1, p_0 = 0$: $p_1 = 1$, $p_2 = 1$, $p_3 = 2 \cdot 1 + 1 = 3$, $p_4 = a_4 p_3 + p_2 = 2 \cdot 3 + 1 = 7$. The fifth error at convergent $7/12$ is $1200 |7/12 - \log_2(3/2)| \approx 1200 \times 0.001629 \approx 1.96$ cents, while the error at $3/5$ is $\approx 18.04$ cents. This drop from ~18 cents to ~2 cents at $n=12$ is the mathematical content of “12 is special.” $\square$

Prop 51.1 (Stern–Brocot Path to 7/12)

In the Stern–Brocot tree restricted to fractions in $(1/2, 1)$ (representing EDO fifths greater than a tritone), the path to $7/12$ traverses nodes $\frac{1}{2} \to \frac{2}{3} \to \frac{3}{5} \to \frac{4}{7} \to \frac{7}{12}$ Each step is a mediant: $7/12$ is the mediant of $3/5$ and $4/7$. The path length (4 edges) equals the number of convergents before $7/12$ in the CF expansion.

Proof.

The Stern–Brocot tree and the CF expansion encode the same information: at each node, the decision to go left or right corresponds to one step in the Euclidean algorithm (the algorithm that computes CF partial quotients). The path $1/2 \to 2/3 \to 3/5 \to 4/7 \to 7/12$ corresponds to the expansion $7/12 = [0; 1, 1, 2, 2]$: starting from the root $1/1$ of the full tree, the moves are L, R, L, L, R, that is, runs of lengths $a_1 = 1$, $a_2 = 1$, $a_3 = 2$, and $a_4 - 1 = 1$. Of the nodes visited, $1/2$ and $3/5$ are convergents of $\log_2(3/2)$, while $2/3$ and $4/7$ are intermediate fractions lying between convergents. The path ends at $7/12$, which is the 4th convergent. $\square$

Numerical Examples

The Pythagorean comma in detail:

Starting from C4, stack twelve just perfect fifths (each ×3/2): $C_4 \xrightarrow{\times 3/2} G_4 \xrightarrow{\times 3/2} D_5 \xrightarrow{\times 3/2} \cdots \xrightarrow{\times 3/2} B^\sharp_{10}$

The final frequency is $f_0 \cdot (3/2)^{12} = f_0 \cdot 129.746$ . Seven pure octaves give $f_0 \cdot 2^7 = f_0 \cdot 128$ . The comma is: $\kappa = \frac{(3/2)^{12}}{2^7} = \frac{3^{12}}{2^{19}} = \frac{531441}{524288} \approx 1.01364$ $\kappa_{\text{cents}} = 1200 \log_2 \kappa \approx 23.46 \text{ cents}$

In 12-EDO, the fifth is tempered to exactly $700$ cents ( $7/12$ of an octave), versus the just fifth at $701.96$ cents. The tempering distributes the comma evenly: $23.46 / 12 \approx 1.96$ cents per fifth.

Convergent computation step by step:

$\log_2(3/2) = \log_2 3 - 1 \approx 1.58496 - 1 = 0.58496$

CF algorithm (extract integer parts of successive reciprocals):

Step	$\alpha_k$	$a_k$	$p_k$	$q_k$
0	0.58496	0	0	1
1	1.70951	1	1	1
2	1.40942	1	1	2
3	2.44247	2	3	5
4	2.26002	2	7	12
5	3.84591	3	24	41
6	…	1	31	53

At step 4, $p_4/q_4 = 7/12$: this is 12-EDO. The fifth error drops from 18.0 cents (step 3, 5-EDO) to 1.96 cents (step 4, 12-EDO), roughly a ninefold improvement. The next improvement to below 0.5 cents requires 41-EDO, and below 0.1 cents requires 53-EDO.
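The reciprocal-and-floor procedure in the table can be sketched in a few lines. This is an illustrative sketch in double precision, so only the first several rows are trustworthy; the function name `cf_table` is hypothetical, and the extra seeds $p_{-2} = 0$, $q_{-2} = 1$ are a standard convention used here so that the same recurrence also produces $p_0 = a_0$, $q_0 = 1$.

```python
import math


def cf_table(alpha: float, steps: int):
    """Return rows (k, alpha_k, a_k, p_k, q_k) for k = 0 .. steps-1."""
    rows = []
    p_prev, p = 0, 1   # p_{-2}, p_{-1}
    q_prev, q = 1, 0   # q_{-2}, q_{-1}
    x = alpha          # current complete quotient alpha_k
    for k in range(steps):
        a = math.floor(x)                       # partial quotient a_k
        p_prev, p = p, a * p + p_prev           # p_k = a_k p_{k-1} + p_{k-2}
        q_prev, q = q, a * q + q_prev           # q_k = a_k q_{k-1} + q_{k-2}
        rows.append((k, x, a, p, q))
        x = 1 / (x - a)                         # assumes x is not an integer
    return rows


if __name__ == "__main__":
    for k, x, a, p, q in cf_table(math.log2(3 / 2), 7):
        print(f"{k}\t{x:.5f}\t{a}\t{p}\t{q}")
```

For $\log_2(3/2)$ the first seven partial quotients come out as 0, 1, 1, 2, 2, 3, 1, with $p_4/q_4 = 7/12$. Floating-point error grows with each reciprocal, so for many more terms an exact or high-precision method would be needed.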

Pareto frontier: fifth error vs third error:

A major third in just intonation has ratio 5/4, corresponding to $\log_2(5/4) \approx 0.32193$ octaves = 386.31 cents. In $n$ -EDO, the major third uses $\operatorname{round}(n \cdot \log_2(5/4))$ steps.

EDO	Fifth error (cents)	Major third error (cents)
5	18.0	93.7
7	16.2	43.5
12	1.96	13.7
19	7.2	7.4
31	5.2	0.8
41	0.48	5.8
53	0.07	1.4

12-EDO sits on the Pareto frontier: no EDO with fewer than 12 tones achieves a smaller fifth error. 19-EDO has a better major third but a larger fifth error. The choice of 12 is optimal for keyboard instruments where physical key count is the binding constraint.
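The error figures for fifths and major thirds follow from the same rounding rule as in Definition 51.1 and can be recomputed directly. The sketch below is illustrative; the names `interval_error_cents` and `error_table` are hypothetical, and the list of EDOs simply mirrors the table above.

```python
import math


def interval_error_cents(n: int, ratio: float) -> float:
    """Error in cents of the closest n-EDO interval to a just frequency ratio."""
    target = math.log2(ratio)        # size of the just interval in octaves
    steps = round(n * target)        # closest whole number of n-EDO steps
    return 1200 * abs(steps / n - target)


def error_table(edos):
    """Return (n, fifth error, major third error) for each n in edos."""
    return [
        (n, interval_error_cents(n, 3 / 2), interval_error_cents(n, 5 / 4))
        for n in edos
    ]


if __name__ == "__main__":
    for n, fifth, third in error_table([5, 7, 12, 19, 31, 41, 53]):
        print(f"{n}\t{fifth:.2f}\t{third:.2f}")
```

For 12-EDO this yields about 1.96 cents for the fifth and 13.69 cents for the major third, and every $n < 12$ has a larger fifth error than 12-EDO.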

Musical Connection

From Pythagoras to the Modern Keyboard

The Pythagorean tuning system (pure fifths, $3/2$ ) was the standard in European music until the Renaissance. Instruments tuned in pure fifths cannot play in all twelve keys without audible “wolf intervals” — the gaps created by the undistributed Pythagorean comma. Keyboard instruments (organ, harpsichord, piano) are fixed-pitch instruments: unlike a violin, the player cannot adjust intonation in real time.

The mathematical problem is exactly the one this episode poses: find $n$ such that $n \cdot \log_2(3/2) \approx m$ (an integer). The CF answer $7/12$ means 12-EDO is not arbitrary: it is the minimum $n$ whose best fifth lies within 2 cents of the just fifth.

“Twelve keys come from the best rational approximation of an irrational number. Mathematics decided it, not historical accident.”

Cultural context and alternatives: The CF result explains why multiple independent musical traditions converged on 12. East Asian court music (Chinese 十二律 shí’èr lǜ, recorded since at least the Zhou dynasty ~1046 BCE) and Western keyboards both use 12 divisions — the continued fraction does not respect cultural boundaries.

Yet 12 is not uniquely optimal for all musical criteria:

Arabic maqam uses 24 tones per octave (quarter tones), approximating neutral thirds (ratios between 5/4 and 6/5) unavailable in 12-EDO
Indian shruti systems use 22 microtonal divisions (the 22 shrutis), targeting pure harmonic series ratios in multiple just-intonation contexts
Turkish makam uses 53-EDO theoretically — the 6th convergent, $31/53$ , with fifth error < 0.1 cents — though performers bend pitches continuously
Gamelan (Javanese/Balinese) uses 5-tone (sléndro) and 7-tone (pélog) systems, loosely comparable to 5-EDO and 7-EDO, optimizing for different interval targets

The CF framework offers a common lens: each tradition can be read as approximating a different target irrational, depending on which just interval (fifth, fourth, neutral third, harmonic seventh) is considered foundational.

12-EDO’s decisive advantage for harmony: The major third in 12-EDO is 400 cents, versus the just 5:4 ratio at 386.31 cents — an error of 13.7 cents, audibly significant in unaccompanied voices but masked by timbre in piano tone. Marin Mersenne (1636) and later equal-temperament advocates argued that the tradeoff — impure thirds in exchange for freedom to modulate — was worth it. J.S. Bach’s Wohltemperirtes Clavier (1722) demonstrated that a single keyboard could serve all 24 major and minor keys, which was impossible under meantone temperament.

Limits and Open Questions

One-dimensional optimization: The CF analysis optimizes the fifth (3/2) approximation alone. A more complete treatment would simultaneously optimize fifths (3/2), major thirds (5/4), minor sevenths (7/4), and higher harmonics. This is a multi-objective problem — the Pareto frontier in the space of (fifth error, third error, seventh error, …) — and has no single optimal solution. 12-EDO is a Pareto-optimal choice for the (fifth, third) pair, but not globally optimal for all harmonic series ratios.
Timbre dependence of consonance: The CF result assumes that just intonation ratios (3/2, 5/4, etc.) are perceptually optimal. But consonance depends on timbre: Sethares (1993) showed that inharmonic timbres (bell tones, gamelan metal) generate different “natural” scales where 12-EDO is not optimal. The mathematical question becomes: given a timbre’s partial spectrum $\{f_k\}$ , which EDO minimizes the average beating rate?
Is 12 the unique minimum?: The CF identifies 12 as the smallest $n$ with fifth error below 2 cents. But the threshold “2 cents” is a perceptual convention (roughly a just-noticeable difference for trained musicians). If a fifth error of 20 cents were tolerable, 5-EDO would suffice, while for any tolerance between about 2 and 16 cents 12 remains the smallest $n$. The mathematical result is clean; the musical conclusion depends on human auditory perception, which varies by training and context.
High-dimensional generalization: The Stern–Brocot tree gives the best approximations to a single irrational $\log_2(3/2)$ . Simultaneously approximating multiple ratios (e.g., $\log_2(3/2)$ and $\log_2(5/4)$ ) requires multi-dimensional continued fractions (Jacobi–Perron algorithm, LLL basis reduction). The question of which EDO best approximates the full 5-limit just intonation lattice simultaneously is open in the sense that no single “best” answer exists independent of the weighting function.
Dynamical systems view: The circle of fifths is the orbit of $\log_2(3/2)$ under addition modulo 1. Its return time statistics (how quickly the orbit returns near 0) are governed by the CF partial quotients: a large $a_{k+1}$ means the convergent $p_k/q_k$ is unusually accurate for its size, while small partial quotients mean slow convergence. The partial quotient $a_6 = 1$ after the $24/41$ convergent is as small as possible (so 41-EDO is only a modest improvement for its size and is soon superseded by 53-EDO), while $a_9 = 23$ (later in the CF) signals that the convergent $389/665$ is exceptionally good: the fifth of 665-EDO is off by only about 0.0001 cents, and no $n$ below $q_9 = 15601$ brings $n \log_2(3/2)$ closer to an integer. Understanding the long-range statistics of the CF of $\log_2(3/2)$ connects to the Gauss–Kuzmin distribution of CF partial quotients; whether this particular number follows that distribution is, as far as is known, an open question in metric number theory.

Conjecture (Perceptual Optimality of 12-EDO)

Among all $n$ -EDO systems with $n \leq 20$ , 12-EDO maximizes the number of harmonic series ratios $p/q$ (with $p, q \leq 8$ ) that are approximated to within the just-noticeable difference of 5 cents.

Falsification criterion: Exhibit an EDO with $n \leq 20$ , $n \neq 12$ , that approximates strictly more such ratios within 5 cents than 12-EDO does. (Preliminary computation suggests 19-EDO is a serious competitor for the 5-limit.)

Academic References

Khinchin, A. Ya. (1964). Continued Fractions. University of Chicago Press. — The standard reference for convergents, best approximations, and the Gauss–Kuzmin measure.
Barbour, J. M. (1951). Tuning and Temperament: A Historical Survey. Michigan State College Press. — Comprehensive history of tuning systems from Pythagorean through equal temperament, with detailed comma calculations.
Sethares, W. A. (1993). Local consonance and the relationship between timbre and scale. Journal of the Acoustical Society of America, 94(3), 1218–1228. — Demonstrates that optimal scales depend on timbre via the coincidence of partials.
Sethares, W. A. (2005). Tuning, Timbre, Spectrum, Scale, 2nd ed. Springer. — Full treatment of the interaction between inharmonic timbres and microtonal scales.
Milne, A., Sethares, W. A., & Plamondon, J. (2007). Isomorphic controllers and dynamic tuning. Computer Music Journal, 31(4), 15–32. — Multi-dimensional EDO optimization using LLL lattice reduction.
Hardy, G. H. & Wright, E. M. (2008). An Introduction to the Theory of Numbers, 6th ed. Oxford University Press. Ch. X–XI. — Best rational approximations and the theory of continued fractions.
Niven, I., Zuckerman, H. S., & Montgomery, H. L. (1991). An Introduction to the Theory of Numbers, 5th ed. Wiley. Ch. 7. — Dirichlet’s approximation theorem and applications.
Deza, E. & Deza, M.-M. (2012). Encyclopedia of Distances, 2nd ed. Springer. Entry on “Pitch distance” and interval approximation.
Balzano, G. J. (1980). The group-theoretic description of 12-fold and microtonal pitch systems. Computer Music Journal, 4(4), 66–84. — Algebraic analysis of why 12 emerges from ℤ_n group structure.
Erlich, P. (2006). A Middle Path Between Just Intonation and the Equal Temperaments. Xenharmonikôn 18, 159–199. — Regular temperament theory and the Pareto frontier of EDO systems.
Carey, N. & Clampitt, D. (1989). Aspects of well-formed scales. Music Theory Spectrum, 11(2), 187–206. — Mathematical characterization of scales generated by a single interval.
Stern, M. A. (1858). Über eine zahlentheoretische Funktion. Journal für die reine und angewandte Mathematik, 55, 193–220. — Original paper on the Stern–Brocot sequence.
Graham, R. L., Knuth, D. E., & Patashnik, O. (1994). Concrete Mathematics, 2nd ed. Addison-Wesley. Ch. 4 (Number Theory) and Ch. 6 (Special Numbers). — Farey sequences and Stern–Brocot trees.
Douthett, J. & Krantz, R. (2007). Maximally even sets and configurations: Common threads in mathematics, physics, and music. Journal of Combinatorial Optimization, 14(4), 385–410.
Weisstein, E. W. “Pythagorean Comma.” MathWorld — Wolfram Web Resource. https://mathworld.wolfram.com/PythagoreanComma.html