EP07: Information Entropy and All-Interval Rows
Overview
A listener hears two melodies and judges the second as “more chaotic.” But when you measure the distribution of adjacent intervals, the second melody is mathematically more uniform — its Shannon entropy is higher, not lower. The discomfort comes not from disorder but from the absence of the familiar concentrated pattern that tonal music trains the brain to expect.
This episode builds Shannon’s entropy from first principles and applies it to music. The central object is the all-interval row on $\mathbb{Z}_{12}$: a 12-tone sequence whose 11 adjacent intervals hit every value in $\{1, 2, \dots, 11\}$ exactly once. This uniform distribution achieves the theoretical entropy maximum $\log_2 11 \approx 3.459$ bits. In contrast, a Bach prelude’s interval distribution concentrates on steps and thirds, yielding an entropy well below that maximum.
The paradox: the all-interval row looks random (uniform histogram) but is extremely rare (only 1,928 of the $11! = 39{,}916{,}800$ twelve-tone rows with a fixed starting pitch satisfy the condition). It is a pseudorandom object — deterministically constructed to maximize a statistical property. This is the same mathematical principle underlying cryptographic pseudorandom generators.
“Randomness is not the opposite of order, but another manifestation of extreme order. The all-interval row uses the strictest of constraints to generate the most uniform of outputs.”
Prerequisites
- All-interval rows and $\mathbb{Z}_{12}$ (EP04) — the definition of all-interval rows, the Klein mother chord
- Basic probability: discrete probability distributions, expected value
Definitions
Let $X$ be a discrete random variable taking values in $\{x_1, \dots, x_n\}$ with probabilities $p_i = P(X = x_i)$. The Shannon entropy of $X$ is

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i,$$

using the convention $0 \log_2 0 = 0$. Entropy measures the average surprise of an outcome: $-\log_2 p_i$ is the surprise of outcome $x_i$.
Given a melody as a sequence of pitch classes $m_1, m_2, \dots, m_n \in \mathbb{Z}_{12}$, the adjacent interval sequence is $d_k = m_{k+1} - m_k \pmod{12}$ for $k = 1, \dots, n-1$.
The interval distribution is the empirical probability distribution over $\mathbb{Z}_{12}$: $p_j = \#\{k : d_k = j\} / (n-1)$.
The melodic entropy is $H_{\mathrm{mel}} = -\sum_j p_j \log_2 p_j$.
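As a minimal sketch (the helper name `melodic_entropy` is my own, not from the episode), the interval distribution and melodic entropy defined above can be computed directly:

```python
from collections import Counter
from math import log2

def melodic_entropy(pitches):
    """Entropy (bits) of the adjacent-interval distribution of a
    pitch-class sequence, per the definitions above."""
    intervals = [(b - a) % 12 for a, b in zip(pitches, pitches[1:])]
    n = len(intervals)
    counts = Counter(intervals)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A repeated note gives zero entropy; a two-interval alternation gives 1 bit.
print(melodic_entropy([0, 0, 0, 0]))     # intervals 0,0,0 -> 0 bits
print(melodic_entropy([0, 2, 0, 2, 0]))  # intervals 2,10,2,10 -> 1 bit
```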
A twelve-tone row $m_1, \dots, m_{12}$ is a permutation of $\mathbb{Z}_{12}$. It is an all-interval row if the 11 adjacent intervals $d_k = m_{k+1} - m_k \pmod{12}$, $k = 1, \dots, 11$, take each value in $\{1, 2, \dots, 11\}$ exactly once.
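The definition translates into a short membership test. A sketch (the function name is my own); the example row is the chromatic “wedge” ordering, a standard all-interval row:

```python
def is_all_interval(row):
    """True iff `row` is a permutation of 0..11 whose 11 adjacent
    intervals mod 12 cover {1, ..., 11} exactly once."""
    if sorted(row) != list(range(12)):
        return False  # not a twelve-tone row
    intervals = [(b - a) % 12 for a, b in zip(row, row[1:])]
    return sorted(intervals) == list(range(1, 12))

wedge = [0, 1, 11, 2, 10, 3, 9, 4, 8, 5, 7, 6]
print(is_all_interval(wedge))            # True
print(is_all_interval(list(range(12))))  # False: every interval is 1
```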
There are exactly 1,928 all-interval rows (up to starting pitch; if we also mod out by the dihedral group of twelve-tone operations, the count reduces further). Out of the $11! = 39{,}916{,}800$ rows with a fixed starting pitch, this is a fraction of approximately $4.8 \times 10^{-5}$.
A set $D \subseteq \mathbb{Z}_n$ is a perfect difference set if every nonzero element of $\mathbb{Z}_n$ appears exactly once among the differences $d_i - d_j$ with $d_i, d_j \in D$, $i \neq j$.
Perfect difference sets with $q + 1$ elements exist in $\mathbb{Z}_n$ for $n = q^2 + q + 1$ when $q$ is a prime power. For $q = 3$, $n = 13$: the set $\{0, 1, 3, 9\}$ is a perfect difference set in $\mathbb{Z}_{13}$.
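The $\{0, 1, 3, 9\}$ example can be checked by brute force — a sketch with an illustrative helper name:

```python
from itertools import permutations

def is_perfect_difference_set(D, n):
    """True iff every nonzero element of Z_n appears exactly once
    among the ordered differences d_i - d_j (i != j) of D."""
    diffs = [(a - b) % n for a, b in permutations(D, 2)]
    return sorted(diffs) == list(range(1, n))

print(is_perfect_difference_set({0, 1, 3, 9}, 13))  # True
print(is_perfect_difference_set({0, 1, 2, 3}, 13))  # False: 1 repeats
```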
All-interval rows are related but distinct: they concern ordered sequences of adjacent differences over , not unordered multisets of all pairwise differences.
A first-order Markov chain on alphabet $\mathcal{A}$ has transition probabilities $P_{ij} = P(X_{t+1} = j \mid X_t = i)$ and stationary distribution $\mu$ (where $\mu P = \mu$).
The entropy rate is

$$H(\mathcal{X}) = -\sum_{i} \mu_i \sum_{j} P_{ij} \log_2 P_{ij}.$$

For music modeled as a Markov chain on pitch classes, $H(\mathcal{X})$ measures the long-run unpredictability of the melody per note.
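The entropy-rate formula can be sketched with NumPy; the transition matrix below is a toy 3-state example with illustrative numbers, not a fitted musical model:

```python
import numpy as np

def entropy_rate(P):
    """Entropy rate (bits/symbol) of an ergodic Markov chain:
    H = -sum_i mu_i sum_j P_ij log2 P_ij."""
    # Stationary distribution: left eigenvector of P for eigenvalue 1.
    vals, vecs = np.linalg.eig(P.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    mu = mu / mu.sum()
    # 0 log 0 = 0 convention: zero out terms where P_ij = 0.
    logP = np.where(P > 0, np.log2(np.where(P > 0, P, 1.0)), 0.0)
    return float(-np.sum(mu[:, None] * P * logP))

P = np.array([[0.80, 0.10, 0.10],
              [0.20, 0.60, 0.20],
              [0.25, 0.25, 0.50]])
print(entropy_rate(P))
```

A sanity check: a chain whose every row is uniform over 2 symbols has entropy rate exactly 1 bit/symbol, and a deterministic 2-cycle has entropy rate 0.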
Main Theorems
Theorem (maximum entropy). For any distribution $(p_1, \dots, p_n)$, $H(X) \le \log_2 n$, with equality iff $p_i = 1/n$ for all $i$.

Proof. Since $\log_2$ is strictly concave, Jensen’s inequality gives

$$H(X) = \sum_i p_i \log_2 \frac{1}{p_i} \;\le\; \log_2 \left( \sum_i p_i \cdot \frac{1}{p_i} \right) = \log_2 n,$$

with equality iff all the values $1/p_i$ are equal, i.e., $p_i = 1/n$ for all $i$. $\square$
Corollary for all-interval rows: The adjacent interval distribution of an all-interval row is uniform over $\{1, \dots, 11\}$ (each value exactly once out of 11), giving $H = \log_2 11 \approx 3.459$ bits — the maximum possible for an 11-symbol alphabet.
Theorem (source coding). Let $X_1, X_2, \dots$ be i.i.d. with distribution $p$ over alphabet $\mathcal{A}$. Every lossless prefix-free code assigning a codeword of length $\ell_i$ to symbol $i$ has expected codeword length $L = \sum_i p_i \ell_i \ge H(X)$, and there exists such a code with $L < H(X) + 1$. No prefix-free code can do better on average than $H(X)$ bits per symbol.
Lower bound ($L \ge H(X)$): For any prefix-free code, the Kraft inequality states $\sum_i 2^{-\ell_i} \le 1$. Let $q_i = 2^{-\ell_i}/c$ where $c = \sum_j 2^{-\ell_j}$. Then

$$L - H(X) = \sum_i p_i \log_2 \frac{p_i}{q_i} - \log_2 c \;\ge\; 0,$$

since the KL divergence $D(p \,\|\, q) \ge 0$ and $\log_2 c \le 0$ (as $c \le 1$).
Upper bound ($L < H(X) + 1$): Choose $\ell_i = \lceil -\log_2 p_i \rceil$. Then $\ell_i < -\log_2 p_i + 1$, so $L = \sum_i p_i \ell_i < H(X) + 1$. The Kraft inequality is satisfied since $\sum_i 2^{-\ell_i} \le \sum_i 2^{\log_2 p_i} = \sum_i p_i = 1$.
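The upper-bound construction ($\ell_i = \lceil -\log_2 p_i \rceil$) can be verified numerically. The distribution below is illustrative, not drawn from a musical corpus:

```python
from math import ceil, log2

def shannon_code_lengths(p):
    """Codeword lengths l_i = ceil(-log2 p_i) for a distribution p."""
    return [ceil(-log2(pi)) for pi in p]

p = [0.4, 0.3, 0.2, 0.1]                  # illustrative distribution
lengths = shannon_code_lengths(p)         # -> [2, 2, 3, 4]
H = -sum(pi * log2(pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))

assert sum(2 ** -li for li in lengths) <= 1   # Kraft inequality holds
assert H <= L < H + 1                         # source coding bounds hold
print(lengths, H, L)
```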
Musical corollary: A tonal melody, whose interval entropy sits well below the maximum, can be encoded with correspondingly few bits per interval on average. An all-interval row, at the maximum $H = \log_2 11 \approx 3.459$ bits, admits no such savings. The tonal melody is more compressible — its redundancy (repeated steps and thirds) is exactly what permits shorter average codes.
For an ergodic Markov chain with stationary distribution $\mu$ and transition matrix $P$, the entropy rate equals

$$H(\mathcal{X}) = \sum_i \mu_i \left( -\sum_j P_{ij} \log_2 P_{ij} \right).$$

This is the weighted average of the per-state conditional entropies.
By the chain rule for entropy, $H(X_1, \dots, X_n) = \sum_{k=1}^{n} H(X_k \mid X_1, \dots, X_{k-1})$.
For a Markov chain, $H(X_k \mid X_1, \dots, X_{k-1}) = H(X_k \mid X_{k-1})$.
By ergodicity, the distribution of $X_{k-1}$ converges to $\mu$, so

$$\frac{1}{n} H(X_1, \dots, X_n) \;\to\; \sum_i \mu_i \left( -\sum_j P_{ij} \log_2 P_{ij} \right) = H(\mathcal{X}).$$
An all-interval row achieves the maximum adjacent-interval entropy and is incompressible in the Lempel-Ziv sense over its 11 intervals — yet it is the output of a deterministic construction satisfying strict algebraic constraints.
Formally: the 11-symbol string formed by the adjacent intervals of an all-interval row has Kolmogorov complexity of only a few dozen bits, since the row can be specified by a constant-size program plus an index into the list of 1,928 rows ($\log_2 1928 \approx 11$ bits). Its interval histogram is indistinguishable from that of a uniform draw over $S_{11}$ (all permutations of $\{1, \dots, 11\}$), but computationally it is a highly structured object.
Visual: Two histograms side by side: left shows Bach C-major Prelude interval frequencies (tall bars at 1, 2, 3, 4; short bars at 6, 7, 10, 11), right shows a flat histogram for an all-interval row. Caption: concentrated tonal distribution vs. uniform distribution at $\log_2 11 \approx 3.459$ bits.
Musical Connection
The entropy hierarchy of musical styles
Measured entropy of adjacent interval distributions, from various corpora:
| Corpus | Approximate $H$ (bits) | Character |
|---|---|---|
| Gregorian chant | | Almost entirely stepwise |
| Bach chorales | | Steps + thirds dominate |
| Bach C-major Prelude | | Wider range of intervals |
| Bebop jazz heads | | Chromatic passing tones |
| Webern Op. 27 | | Wide leaps, near-uniform |
| All-interval row | $3.459$ | Uniform, theoretical max |
The entropy gradient roughly tracks Western music history: the expansion of harmonic vocabulary from modal chant to twelve-tone serialism corresponds to a monotone increase in interval entropy.
朱载堉 (Zhu Zaiyu) and equal temperament — the same operation
In 1584, the Ming dynasty scholar Zhu Zaiyu solved the tuning problem by choosing a frequency ratio of $2^{1/12}$ for every semitone — mathematically equalizing all twelve intervals at the cost of making each ratio irrational. This is the same mathematical operation as the all-interval row: Zhu uniformized pitch ratios across twelve semitones; Schoenberg uniformized interval frequencies across twelve interval classes. Two revolutions, four centuries apart, unified by the principle of replacing a natural hierarchy with a mathematical uniform distribution.
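Zhu’s ratio can be checked in a few lines; the comparison with the just fifth $3/2$ is an added illustration, not from the source:

```python
semitone = 2 ** (1 / 12)       # Zhu Zaiyu's equal-tempered semitone ratio
print(semitone)                # ~1.059463

# Twelve equal steps close the octave exactly.
assert abs(semitone ** 12 - 2) < 1e-12

# The tempered fifth (7 semitones) is slightly flat of the just fifth 3/2.
tempered_fifth = semitone ** 7
print(tempered_fifth - 1.5)    # small negative number
```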
Entropy rate and musical style fingerprints
Modeling a composer’s style as a first-order Markov chain on $\mathbb{Z}_{12}$, the entropy rate $H(\mathcal{X})$ characterizes how “predictable” their melodic choices are:
- Bach: low entropy rate (strong tonal gravity, concentrated transitions)
- Chopin: higher (more chromatic but still tonal)
- Schoenberg (twelve-tone period): near-maximal
The entropy rate is a single real number that summarizes a composer’s interval vocabulary — a lossy compression of style into one information-theoretic quantity.
This connects forward to EP21 (AI composition uses Markov chain entropy rate as a diversity metric) and EP24 (pop music is low-entropy by design; melodic predictability is a feature, not a bug).
Limits and Open Questions
- First-order vs. higher-order models. Melodic entropy computed from adjacent intervals is a first-order statistic. Long-range dependencies (phrase structure, motivic development) require higher-order Markov models or full probabilistic context-free grammars. The entropy rate of a $k$-th order chain requires much more data to estimate reliably.
- Entropy vs. perceived complexity. High entropy does not equal perceived complexity. A melody of random pitches (maximum entropy) is less interesting than a fugue (moderate entropy, rich long-range structure). Information theory measures statistical uniformity; perceived complexity involves pattern recognition at multiple timescales.
- Combinatorial counting of all-interval rows. The exact count of 1,928 all-interval rows (up to starting pitch) was determined computationally. The algebraic structure that generates them — related to perfect difference sets and the theory of cyclic difference families — is only partially understood. An open question: is there a direct constructive bijection between all-interval rows and a known combinatorial object?
- Entropy of rhythm. Analogously to pitch-interval entropy, one can define the entropy of inter-onset intervals (IOIs) in rhythm. African polyrhythm typically achieves higher rhythmic entropy than Western common-practice meter. A comparative study across global music traditions is an active research area.
- Minimum description length and style. The Kolmogorov complexity $K(m)$ of a melody $m$ is the length of the shortest program that generates $m$. By the source coding theorem, $K(m) \approx nH$ for i.i.d. melodies of length $n$, but $K(m) \ll nH$ for structured compositions, because the structure allows extreme compression. The gap $nH - K(m)$ measures “structural redundancy” — it is large for tonal music, small for twelve-tone music.
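The redundancy gap in the last point can be illustrated with an off-the-shelf compressor standing in (crudely) for shortest description length — a sketch under that assumption, with made-up interval sequences:

```python
import random
import zlib

random.seed(0)

# A highly patterned "tonal-like" interval sequence vs. a uniform-random one.
patterned = bytes([1, 2, 2, 1, 3, 1] * 200)                    # 1200 symbols
uniform = bytes(random.randrange(1, 12) for _ in range(1200))  # 1200 symbols

c_pat = len(zlib.compress(patterned, 9))
c_uni = len(zlib.compress(uniform, 9))
print(c_pat, c_uni)        # the patterned sequence compresses far better
assert c_pat < c_uni       # redundancy permits a shorter description
```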
Academic References
- Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27(3), 379–423. (The founding paper of information theory.)
- Cover, T. M., & Thomas, J. A. (2006). Elements of Information Theory (2nd ed.). Wiley. Ch. 2 (Entropy), Ch. 5 (Source coding theorem).
- Knopoff, L., & Hutchinson, W. (1983). Entropy as a measure of style: The influence of sample length. Journal of Music Theory, 27(1), 75–97.
- Margulis, E. H., & Beatty, A. P. (2008). Musical style, psychoaesthetics, and prospects for entropy as an analytic tool. Computer Music Journal, 32(4), 64–78.
- Morris, R. (1987). Composition with Pitch Classes. Yale University Press. (All-interval rows and their algebraic properties.)
- Jedrzejewski, F. (2006). Mathematical Theory of Music. Editions Delatour France/IRCAM. (Perfect difference sets and interval content.)
- Nolan, C. (2000). On the mathematical side of Schoenberg’s twelve-tone method. Music Theory Spectrum, 22(2), 162–182.
- Temperley, D. (2007). Music and Probability. MIT Press. (Probabilistic models of musical cognition including Markov chain entropy.)