Identifying music

Music. Hmm. When I was learning to play an instrument, I found all the rules and jargon extremely confusing. Surely something like music, based on rather simple mathematics, should have an elegant and logical structure to it. Yet music theory seemed like a labyrinth of rules of thumb and historical accident. It was all rather unsatisfying, and I gave up after 6 or 7 years. Nowadays the world wide web has some good information which helps put it in perspective, and I hope I’ve come up with a reasonably jargon-free answer to the following question for my little “More or Less” slot:

I am always amazed by the number of songs one can recognise on hearing the first second or two of music. Since music has a limited number of building blocks and there are mathematical rules for how these can be combined to sound musical, is it possible to calculate the total number of potential opening bars? Surely it must be finite?

To take a stab at this, we’ve first got to establish some ground rules. It sounds like we’re talking about identifying music using just the notes in a tune, rather than identifying different versions or recordings of the same tune. In other words, how we can identify the same tune whether it’s played on a piano or a trumpet, or sung by a human voice. I suspect that cuts out a lot of the normal recognition process, but for the purposes of calculation it’s a sensible restriction – there are so many subtle variations on each instrument that the number of different possible recordings is essentially infinite[a].
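The CD arithmetic in footnote a can be checked without ever writing the number out. A minimal Python sketch, assuming a single 16-bit channel sampled 44,100 times a second, as the footnote implicitly does (real audio CDs are stereo, which would square the count):

```python
import math

SAMPLE_RATE = 44_100   # samples per second on a CD
BITS_PER_SAMPLE = 16   # each sample is one of 2**16 values

levels = 2 ** BITS_PER_SAMPLE   # 65536 possible values per sample

# One second of a single channel has levels ** SAMPLE_RATE possible sample sequences.
# Rather than printing that huge integer, count its decimal digits via logarithms:
# log10(65536 ** 44100) = 44100 * 16 * log10(2).
log10_count = SAMPLE_RATE * BITS_PER_SAMPLE * math.log10(2)
digits = math.floor(log10_count) + 1

print(levels)   # 65536
print(digits)   # 212407
```

So 65536⁴⁴¹⁰⁰ is a number with over two hundred thousand digits: finite, but not by much in any everyday sense.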

Rhythm

Even if we ignore the instruments used, we should still expect some very big numbers, because of what’s been called the “combinatorial explosion” – even a few choices, when combined together, can produce a ridiculous number of outcomes. Take, for instance, the beginning of a musical piece – I hesitate to say “bar” because musical “bars” come in many different lengths (in fact, I’ve just found out that the very concept of a bar is a relatively recent invention of Western European music). Now assume the fastest note we can play lasts about 1/16th of a second, so that we can cram 32 of them into the first 2 seconds of the introduction[b]. If we purely focus on the rhythm, that produces 32 places where we could sound a noise or just stay quiet. So without even specifying the actual notes, we can theoretically tap out 2³², or about 4.3 billion, different rhythms!
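To make the counting concrete: each 2-second rhythm is just a row of 32 on/off choices, so it can be written as a 32-bit binary number. A small Python sketch; the “x”/“.” notation and the function name are made up for illustration:

```python
SLOTS = 32  # 2 seconds split into 1/16-second slots

def rhythm_to_int(pattern: str) -> int:
    """Encode a rhythm such as 'x.......' (x = sound, . = silence) as a 32-bit integer."""
    if len(pattern) != SLOTS or set(pattern) - {"x", "."}:
        raise ValueError("expected 32 characters, each 'x' or '.'")
    return int(pattern.replace("x", "1").replace(".", "0"), 2)

total_rhythms = 2 ** SLOTS
print(total_rhythms)  # 4294967296, about 4.3 billion

# A sound on every beat: one note every 8 slots (a crotchet at 120 beats/minute).
four_beats = "x......." * 4
print(rhythm_to_int(four_beats))  # 2155905152
```

Every pattern maps to a different integer between 0 and 2³² − 1, which is exactly why there are 2³² of them.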

Melody

Unsurprisingly, if we add an actual melody into the calculation, the numbers become astronomical. Take a simple piece written in the key of “C major” – the white notes on a piano keyboard. Throw in a handful of lower notes that can only be reached by tubas, church organs, specially made pianos, and the like, and you have an orchestral range of 8 octaves, each comprising 7 notes, from what’s known as “C0” at the bottom to “C8” at the top[c]: that’s 8×7+1 = 57 notes (the extra one being the top C8) for each of the 32 possible places in our musical introduction. Add the possibility of a pause[d], and we have 58³² ≈ 2.7×10⁵⁶ possible permutations. That’s many, many orders of magnitude more than the number of stars in the observable universe.
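The 57-note range and the 58³² total can be reproduced in a few lines. This Python sketch uses scientific pitch names (C0 to C8) and, like the text, treats a pause as one extra option per slot:

```python
WHITE_KEYS = ["C", "D", "E", "F", "G", "A", "B"]

# C0 up to B7 (8 octaves of 7 white notes), then the single top note C8.
notes = [f"{name}{octave}" for octave in range(8) for name in WHITE_KEYS] + ["C8"]

choices_per_slot = len(notes) + 1   # every note, plus a pause
openings = choices_per_slot ** 32   # 32 slots of 1/16 second each

print(len(notes))         # 57
print(f"{openings:.1e}")  # 2.7e+56
```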

Chords

In fact, we’ve still only just scratched the surface, because we’ve restricted the analysis to playing one note at a time – a simple melody. But most music has multiple notes played simultaneously, usually in a pleasing combination called a “chord”. Here’s a page which lists about 25 different chords – that is, a single root note with combinations of additional notes to add harmony. Since the additional notes can be chosen either above or below the root note, the same page estimates there are about 4 different ways to play each chord[e]. If that’s so, each of our 57 notes could serve as the root of roughly 25×4 = 100 different chords, giving about 57×100 + 1 options (including a pause) in each of the 32 slots. Our total number would then be a little less[f] than (57×100 + 1)³², or about 1.6×10¹²⁰. That’s far more than the entire number of atoms in the observable universe – more even than a googol. Is there an adjective to make “astronomical” seem like peanuts?
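The chord estimate follows the same pattern, just with a bigger menu in each slot. In this Python sketch, the 25 chord types and 4 voicings per chord are the figures from the page mentioned above (which footnote e already treats with suspicion), so the result is only a rough upper bound:

```python
CHORD_TYPES = 25        # chords listed on the page cited in the text
VOICINGS_PER_CHORD = 4  # that page's rough estimate
ROOT_NOTES = 57         # white keys from C0 to C8
SLOTS = 32

per_slot = ROOT_NOTES * CHORD_TYPES * VOICINGS_PER_CHORD + 1  # +1 for a pause
upper_bound = per_slot ** SLOTS
googol = 10 ** 100

print(per_slot)               # 5701
print(f"{upper_bound:.1e}")   # 1.6e+120
print(upper_bound > googol)   # True
```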

Other keys

To round it all off, I should point out that I’ve only considered a single musical key: C major. It’s true that most people probably won’t notice if the same tune is transposed to a different key. But even with a normal piano, there are various other keys we could use (e.g. those that produce a minor scale) which include alternative notes. There are also many tunes that throw in an occasional note that’s not in the original key. So we should probably allow the full 12 notes in each octave on a piano keyboard (i.e. both the black and the white keys). Over 8 octaves that’s 8×12+1 = 97 notes, or 98 options per slot once we include a pause, giving 98³² ≈ 5.2×10⁶³ permutations of a simple melody, and of course far more if you allow chords. But why stop there? Instruments that don’t have fixed mechanical tuning, such as the human voice or violins, can use sound intervals that are more harmonious than the fixed notes on a piano keyboard, and that gives some subtly different notes we could add in to the mix. More extreme still are pieces of western music written in scales that have 19, 31 or even 43 notes in each octave. And that’s not including things like Indian, Arabic, and Chinese music, which use other intervals between notes.
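Since only the number of options per slot changes between these estimates, a small hypothetical helper makes the comparison easy, including the 19-, 31- and 43-note scales just mentioned. It assumes, as the text does, 8 octaves plus a top note, 32 slots, and one extra option for a pause:

```python
def count_openings(notes_per_octave: int, octaves: int = 8, slots: int = 32) -> int:
    """Count single-line openings: each slot holds one pitch or a pause."""
    pitches = notes_per_octave * octaves + 1  # +1 for the top note (C8)
    return (pitches + 1) ** slots             # +1 for a pause

white_keys = count_openings(7)   # 58 ** 32
all_keys = count_openings(12)    # 98 ** 32

print(f"{all_keys:.1e}")               # 5.2e+63
print(f"{all_keys / white_keys:.1e}")  # how many times more than white keys alone

for n in (19, 31, 43):  # scales with more notes per octave
    print(n, f"{count_openings(n):.1e}")
```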

I think the point is made. We’re talking almost impossibly large numbers.

Back to the question

So if we allow pretty much any note in the first bar of a tune, the number of possible openings is ludicrously large. Surely it would be surprising to find any two identical song openings? Yet there are some examples. Two that spring to mind are “Twinkle, twinkle little star”, and “Baa baa black sheep”. That’s probably indicative of the fact that I have 2 young daughters. I suspect there are more examples (I’d love to be told of some), especially where the same note is repeated many times over. That’s a pointer – as if one were needed – that music is not just notes chosen at random. As the questioner put it, there must be “rules” about which notes combine to “sound musical”.

Of course, it’s possible to discern patterns in music – for example, small shifts between notes are much more common than large jumps from one end of the keyboard to another. But I’m not aware of anyone successfully writing down any strict “rules” that apply without fail. Indeed, anyone proposing such rules will surely just encourage a composer to deliberately break them! Moreover, even rough “rules of thumb” for composing are likely to be very specific to particular cultures and musical traditions. So (certainly at the moment) I think it’s impossible to work out a sensible way to restrict all the mathematical possibilities to those that sound like “real music”.

To investigate how to move from mathematical possibilities to what people perceive as music, a more successful approach has been to generate reasonable-sounding music using an algorithm – a set of logical steps usually implemented on a computer. It’s a process that’s popular with music researchers, and you can listen to a few examples online. There’s even an art installation intended to produce non-repeating music for the next 1000 years, which illustrates just how many permutations of reasonable-sounding music are possible. Coupled with my back-of-the-envelope calculations merely based on rhythm, I think it’s quite reasonable to say that there are more possible musically and culturally acceptable openings to songs than, say, the number of seconds since the Big Bang.
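Here is a deliberately crude example of the idea, loosely following the earlier observation that small steps between notes are much more common than big leaps. It is a hypothetical toy in Python, not a reconstruction of any research system or of the installation: a weighted random walk over the white keys, using MIDI note numbers (60 = middle C) as an assumed pitch encoding:

```python
import random

C_MAJOR = [0, 2, 4, 5, 7, 9, 11]  # pitch classes of the white keys

def toy_melody(length: int = 8, seed: int = 0) -> list[int]:
    """Weighted random walk over C-major notes, favouring small steps."""
    rng = random.Random(seed)
    # White keys from MIDI 48 (the C below middle C) up to MIDI 83.
    scale = [12 * octave + pc for octave in range(4, 7) for pc in C_MAJOR]
    index = scale.index(60)              # start on middle C
    steps = [-3, -2, -1, 0, 1, 2, 3]     # moves measured in scale degrees
    weights = [1, 3, 6, 3, 6, 3, 1]      # leaps are rarer than steps
    melody = [scale[index]]
    for _ in range(length - 1):
        index += rng.choices(steps, weights=weights)[0]
        index = min(max(index, 0), len(scale) - 1)  # stay inside the range
        melody.append(scale[index])
    return melody

print(toy_melody(8, seed=1))
```

Different seeds give different tunes; changing the weights is a crude way to make the output more or less “tuneful”.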

Mining real music

Instead of analysing all the mathematical possibilities, we could try looking at real music. In fact, there’s a large amount of academic and commercial interest in identifying music from short snippets – a field that’s called “Music Information Retrieval”, and there are a number of websites that you can try out. For example, it’s common to use a “musical fingerprint” to identify particular recordings of music, to help listeners find information about a song, or to flag up illegal music sharing. More interesting for our question are the sites that try to identify a song from a small sample that you hum or whistle into your computer. In fact, there are some algorithms that try to identify a tune based merely on whether the melody rises or falls at each step, or even simply based on a tapped-out rhythm. If you’re interested in this topic, I’d encourage you to have a go at http://www.musipedia.org or http://www.midomi.com.
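The rise-or-fall description is often written as a Parsons code: “*” for the first note, then U, D or R (up, down or repeat) for each following note. A minimal Python sketch, assuming pitches are given as MIDI note numbers:

```python
def parsons_code(pitches: list[int]) -> str:
    """Melodic contour: '*' for the first note, then U(p), D(own) or R(epeat)."""
    if not pitches:
        return ""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        code.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(code)

# Opening of "Twinkle, twinkle little star": C C G G A A G (60 = middle C).
twinkle = [60, 60, 67, 67, 69, 69, 67]
print(parsons_code(twinkle))  # *RURURD
```

Transposing the tune, or singing it slightly out of tune, leaves the code unchanged, which is what makes it useful for matching hummed queries.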

The fact that a computer can identify a song based on such sparse data, in some cases just the rhythm, is yet another illustration of how little variation you need to distinguish between a large number of possibilities. It also shows that songs do indeed explore a large number of these. Can we quantify this in any way? Ideally we’d be able to analyze the opening bars from collections of real music to work out how many starting notes are needed to distinguish different tunes, and perhaps get a feel for how much of the possible mathematical music space is taken up by real music. In fact, for reasons not unrelated to our original question, it’s often conventional to document music by the opening snatch of a melody. So there are a number of online collections of these “musical incipits”, although most refer to out-of-copyright classical or folk tunes, rather than popular music[g].

Given the interest in identifying music from as short a snippet as possible, I’m surprised that I’ve only managed to find one directly relevant analysis which has studied tens of thousands of musical incipits[h]. Although it’s just a single study, and may not be representative of all music, it’s definitely worth describing what they found.

The Stanford study

I’ll start with their analysis of about 8500 European and Asian folk songs – a collection of music which you would expect to contain a good number of quite similar tunes. Simply by specifying whether a note in the tune is higher, lower, or the same pitch as the previous note (so-called “contour searching”), it took on average about 12 notes into the melody before the song could be uniquely identified from the 8500 available[i]. Specifying only a summary of the rhythm (whether a note was longer, shorter, or the same length as the previous note), it took about 17 notes for unique identification. But because pitch and rhythm were “very independent”[j], combining the two meant it took only 8 initial notes to fully identify the song. More precise methods of specifying pitch brought that count down by a few notes[k]. In fact, the majority of the narrowing down happens within the first five or six notes[l]. After that, each additional note helps less and less to identify the song.
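The same kind of measurement can be sketched on a tiny scale. The Python below uses three made-up tunes (hypothetical data, not from the study) and finds how many opening notes are needed before at most `max_matches` tunes share the target’s pitch contour, optionally paired with a longer/shorter/same rhythm contour. Setting `max_matches=10` would mirror the “narrow down to 10 tunes” measure used for the classical collection:

```python
from typing import Optional

Tune = tuple[list[int], list[float]]  # (MIDI pitches, note lengths in beats)

def contour(values: list) -> list[str]:
    """'U', 'D' or 'R' for each step: higher, lower or the same as before."""
    return ["U" if b > a else "D" if b < a else "R" for a, b in zip(values, values[1:])]

def step_features(tune: Tune, use_rhythm: bool) -> list:
    pitches, lengths = tune
    if not use_rhythm:
        return contour(pitches)
    return list(zip(contour(pitches), contour(lengths)))

def notes_needed(tunes: dict[str, Tune], target: str,
                 max_matches: int = 1, use_rhythm: bool = True) -> Optional[int]:
    """Fewest opening notes after which at most `max_matches` tunes (including
    the target) share the target's contour; None if that never happens."""
    features = {name: step_features(t, use_rhythm) for name, t in tunes.items()}
    mine = features[target]
    for steps in range(1, len(mine) + 1):
        prefix = mine[:steps]
        matches = sum(1 for f in features.values() if f[:steps] == prefix)
        if matches <= max_matches:
            return steps + 1  # k steps between notes span k + 1 notes
    return None

tunes = {
    "tune A": ([60, 60, 67, 67, 69, 69, 67], [1, 1, 1, 1, 1, 1, 2]),
    "tune B": ([60, 60, 67, 67, 69, 69, 69, 69, 67], [1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 2]),
    "tune C": ([60, 62, 64, 60, 60, 62, 64, 60], [1, 1, 1, 1, 1, 1, 1, 1]),
}

print(notes_needed(tunes, "tune A", use_rhythm=False))  # 7
print(notes_needed(tunes, "tune A"))                    # 5
```

In this toy collection, tunes A and B share their pitch contour for the first six notes, so pitch alone needs 7 notes; adding rhythm separates them at note 5, a miniature version of pitch and rhythm combining to narrow things down faster.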

Similar results are found when looking at a database of over 10,000 classical pieces. Here they looked at the number of notes needed to narrow the identification down to 10 possible tunes[m]. Specifying whether successive notes were higher, lower, or of identical pitch, this narrowing down took 11 notes, whereas when rhythm was added, it took only 6. More precise methods of specifying the pitch led to that degree of narrowing-down in only 4 notes.

It’s actually possible to compare this to what you’d expect if there were no musical rules – as if the notes in each song were jumbled at random[n]. They calculated a measure of the “unpredictability” of real music versus what you’d expect from a completely unpredictable random jumble[o]. What they found was that, when it comes to trying to identify the music using the first few notes, real music isn’t hugely more predictable than randomness. That conclusion also held across a range of different music genres. It seems as if the speed of identification of a song isn’t disastrously affected by the constraint of having to sound tuneful[p].
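For a feel for the “random jumble” baseline in footnote p: if all 12 pitch classes were equally likely and independent, each note would carry log₂12 ≈ 3.585 bits, and 8 notes would give 12⁸ possibilities. The Python sketch below computes that, plus the ordinary Shannon entropy of the pitch-class frequencies in the first line of “Twinkle, twinkle little star”. It only covers the order-free, jumbled-notes side of the comparison; the study’s per-tune estimate, derived from how quickly searches narrowed down, is not reproduced here:

```python
import math
from collections import Counter

def shannon_entropy(symbols) -> float:
    """Bits per symbol for the empirical frequency distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

max_bits = math.log2(12)   # 12 equally likely pitch classes
print(round(max_bits, 3))  # 3.585
print(12 ** 8)             # 429981696, about 430 million

# C C G G A A G F F E E D D C as pitch classes (0 = C, 2 = D, 4 = E, 5 = F, 7 = G, 9 = A).
snippet = [0, 0, 7, 7, 9, 9, 7, 5, 5, 4, 4, 2, 2, 0]
print(round(shannon_entropy(snippet), 3))  # below 3.585: only 6 pitch classes are used
```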

For our purposes, the general implication is again that the opening to a musical piece is not so tightly constrained as to be negligible. The number of musically acceptable permutations, while very considerably smaller than all the mathematical possibilities, is still likely to be enormous.

Conclusions

Phew – that was a long post. I’m not sure how I’ll fit all that into a radio broadcast. But the general points are:

  1. We can try to enumerate all the possible openings of a piece of music, but the “combinatorial explosion” means we rapidly hit ridiculous numbers, especially if we allow different rhythms, chords, etc.
  2. Clearly most of these mathematically possible openings won’t be very pleasant to listen to. But there’s no obvious way to use musically-inspired mathematical “rules” to weed out the horrible compositions, and leave us with the ones we perceive as “sounding musical”. Nevertheless, even if only the minutest fraction of the possible openings are acceptable to our ears, we’re still talking an enormous number.
  3. Instead of doing pure maths, we can look at actual written songs. Even within similar genres, one study finds that you only need to know a handful (fewer than 8) introductory notes (together with their associated rhythms), and the tune becomes uniquely identifiable.
  4. Using this to estimate the “predictability” of these real songs, compared to randomly jumbled notes, means we can roughly assess how restrictive it is to require all our tunes to “sound musical”. The indication is that there’s still a lot of complexity available in a short excerpt of real music.
  5. So it turns out not to be surprising that we can identify music from a very short burst of intro, especially when, in addition to the notes and rhythm, we also add the almost limitless information present in the tonal quality of the particular recording we’re listening to.

Perhaps what’s even more interesting is the neuroscience behind our own music retrieval system. How do we store music in our brain such that we can identify tunes from just a snippet of an incipit? That’s going to hit upon the thorny question of why certain sounds are pleasing, and others not. Unfortunately, I haven’t the faintest hope of investigating that in 5 minutes of radio!

Notes

a. Yes, ‘essentially infinite’ is one of those phrases like ‘almost unique’ that should probably be banned by mathematical edict. And although the frequency of a simple sound can theoretically take an infinity of different values, the physical constraints on transmitting it probably reduce the possible variations to a finite (albeit very large) number. For example, the number of energy states of the air will be quantized. More restrictive than that, natural variations in air pressure will drown out some of the smallest variations in sound. Even more restrictive still, the human ear won’t be able to resolve the difference between many sounds. In fact, you can get a simple handle on this by working out how many different noises can be encoded in (say) the first second of a CD. So on a CD, your music is sampled 44100 times a second, and each time, the sampled number is one of 65536 (or 2¹⁶) possible numbers. So there are 65536⁴⁴¹⁰⁰ possible noises that can be made by 1 second of recording on a CD. While not infinity, that’s really quite a big number, even if the vast majority of these simply sound like white noise.
b. for the musical reader, I’m imagining 32 semidemiquavers in a bar of 4/4 music at 120 beats/minute – I consider hemisemidemiquavers and faster notes as beyond the pale.
c. I tried to link these to recorded versions, but my soundcard won’t play C0 !
d. Length of notes, and hence rhythm, is incorporated into this calculation in assuming that longer notes are produced by two adjacent notes of the same pitch being “tied” together into a single note. I reckon that’s a reasonable approximation – two immediately sequential notes of the same pitch will appear as one if the gap between them is negligible. Repeated notes can be incorporated by imagining there is a pause of 1/16th of a second between them.
e. I’m not sure I trust this calculation!
f. because we can’t hit additional notes higher than C8 or lower than C0
g. The most extensive ones seem to be maintained by RISM
h. There are also more informative and detailed slides from their talk online, which I would definitely advise looking at if you are deeply interested in the science behind this question. Other comprehensive studies of incipits seem to be focussed on other things.
i. They call this method “pitch gross contour”, listed in Table 8 in their paper.
j. Their phrasing! – Table 7
k. Figure 8, for “match count (log2)” = 0
l. Table 5, also Figure 8, where the lines begin to flatten out
m. This measure is more robust to problems such as duplication of songs in the collection, and gives a more reliable way to measure the predictability of the song
n. This “permutation test” approach is not exactly what they did – instead they calculated the “informational entropy” of the collected notes from all the tunes, and compared it to the actual entropy of each tune, as estimated from how quickly the search was narrowed down. The difference is a measure of how much information is contained in the precise order of the notes in each tune (Figure 6) – a rather nice way to do it. That’s entropy, man.
o. Rather nicely, this can also be interpreted as a measure of the “complexity” of the music: random jumbles are (in some sense) the most complex, seemingly followed (in their data) by Italian Renaissance motets.
p. I should add that the measures were done without rhythm information, and with the notes “normalised” to the 12 notes within a normal western octave, so that it wasn’t as if randomly allocated notes were chosen from the 88 keys on a keyboard (or the 97 notes in 8 octaves). Hence the mathematical space we’re talking about here is much smaller, and we are effectively skating over stuff like large jumps between octaves, which we know to be indicative of non-musical noise. So for the first (say) 8 notes there are only 12⁸, or approximately 430 million, possibilities. It would be more interesting for our purposes to calculate entropy measures with rhythm and a wider pitch range as well.
