Analyse/classify my music collection

I reckon I don’t listen to music as much as most people: I don’t have a personal music player, and all 4 radios in our house are tuned to BBC Radio 4. But I’ve just been trying to answer a question about music for the Radio 4 programme “More or Less” (I’ll blog about that later in the week). Now I’m intrigued by an area that I know very little about, and want to do a bit of analysis, but I’m not quite sure how to do it.

I reckon I’ve got one of the most eclectic iTunes libraries of anyone I know [a]. But how do I justify that statistically? I don’t want to rely on the predefined categories for each music track: I want to analyse the actual sound of each piece. Presumably there must be algorithms out there that take (say) a .wav or .mp3 file and output the sort of distance metrics that people use to do acoustic fingerprinting or other methods of Music Information Retrieval [b]. I’d like to plug these into my favourite statistical analysis software, to be able to do different forms of cluster analysis, and be able to plot out just how weird my collection is.

3D scatterplot

A snapshot of the sort of representation I’m thinking of, although most people’s music collections will have more datapoints and more categories. Rotating the plot should be trivial, although I haven’t worked out how to get a cross-browser rotatable plot embedded into this page.

I’m imagining a 3D scatter plot of the songs in my library, perhaps grouped into clusters using different colours. The dimensions could be 3 appropriate metrics, or possibly the first 3 principal components of a PCA. If it’s easy to get hold of the data, I’d also like to illustrate the range that’s “normally” seen in each dimension: either in other people’s music collections, or in online databases.
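To make the PCA idea concrete, here is a rough sketch of how a table of per-track metrics could be reduced to 3 plotting coordinates. It is Python rather than R and uses only the standard library. The function name `top_components`, the track features and their values are all invented for illustration, and the features are assumed to be standardised already. Power iteration with deflation is just one simple way to get the leading components; it is not necessarily what a statistics package does internally.

```python
import math

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_components(rows, k=3, iters=500):
    """Return (components, scores) for the first k principal components.

    rows: one feature vector per track (assumed already standardised).
    Uses power iteration on the covariance matrix, deflating after
    each component is found.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred features.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    comps = []
    for _comp in range(k):
        # Deterministic, non-symmetric starting vector, normalised.
        v = [float(j + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
        comps.append(v)
        # Deflate: remove the variance explained by this component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    # Project each centred track onto the components.
    scores = [[sum(x * c for x, c in zip(r, comp)) for comp in comps]
              for r in centred]
    return comps, scores

# Invented, already-standardised features for six hypothetical tracks.
tracks = [
    (-1.2, 0.4, -0.9, 0.1),
    (-0.8, 0.6, -1.1, -0.3),
    (0.1, -0.2, 0.0, 1.4),
    (0.3, -1.5, 0.2, -1.0),
    (1.0, 0.9, 1.1, 0.2),
    (0.6, -0.2, 0.7, -0.4),
]
components, coords = top_components(tracks, k=3)
for xyz in coords:
    print([round(c, 3) for c in xyz])  # x, y, z for the 3D scatter plot
```

In R itself, `prcomp` does the same job in one call; the point of the sketch is only to show that each track ends up as three numbers that can go straight into a scatter plot.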

Unlike some other methods of music classification and identification, I’m not looking for algorithms based on other people’s music recommendations. In fact, it would be more fun to suggest tracks that are most different to the ones you own (for some definition of “different”).
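One possible definition of “different”, sketched in Python with only the standard library: standardise each metric, then pick the candidate track whose nearest neighbour in your own library is furthest away. The function names, track names and feature values below are all made up, and Euclidean distance on z-scores is an assumption rather than an established choice for music.

```python
import math
from statistics import mean, pstdev

def standardise(vectors):
    """Z-score each feature column so no single metric dominates."""
    cols = list(zip(*vectors))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(v, mus, sds)] for v in vectors]

def most_different(library, candidates):
    """Return the candidate name furthest from its nearest library track.

    library, candidates: dicts mapping (distinct) track names to
    feature tuples of equal length.
    """
    names = list(library) + list(candidates)
    vectors = list(library.values()) + list(candidates.values())
    scaled = dict(zip(names, standardise(vectors)))

    def isolation(name):
        # Distance to the closest track the listener already owns.
        return min(math.dist(scaled[name], scaled[own]) for own in library)

    return max(candidates, key=isolation)

# Invented features: (zero-crossing rate, RMS energy, spectral centroid in Hz)
library = {
    "hip_hop_1": (0.08, 0.30, 1800.0),
    "hip_hop_2": (0.09, 0.32, 1900.0),
}
candidates = {
    "hip_hop_3": (0.085, 0.31, 1850.0),
    "folk_choir": (0.20, 0.12, 3200.0),
}
print(most_different(library, candidates))
```

Using the distance to the nearest owned track, rather than to the library average, stops an eclectic library from making everything look equally far away.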

Surely there’s an idea for a silly smartphone app in there? Not only to graphically display your music collection, but also to recommend tracks you’ll probably hate. Suggesting Bulgarian women’s folk songs to someone who only has hip-hop in their collection has a certain charm. And the distance metrics don’t need to be terribly rigorous – it’s not a scientific study, after all!

Anyone have any pointers to algorithms I could use, preferably already implemented in R? I see there’s some fun stuff using the R package tuneR, but a brief web trawl fails to find any implementations of simple 1D metrics which can be easily stolen, er, used.
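For what it’s worth, here is a minimal sketch of the kind of simple 1D metric I have in mind. It is Python rather than R, uses only the standard library, and assumes the audio has already been decoded to a mono list of floats between -1 and 1. The function names are my own, and these are crude textbook descriptors (zero-crossing rate, RMS energy, spectral centroid), not the features a serious MIR system would use.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def rms_energy(samples):
    """Root-mean-square amplitude, a rough loudness measure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency in Hz, from a naive DFT.

    This is O(n^2), so it only suits short frames; a real
    implementation would use an FFT.
    """
    n = len(samples)
    total_mag = 0.0
    weighted = 0.0
    for k in range(1, n // 2):  # skip the DC bin
        angle = 2 * math.pi * k / n
        re = sum(s * math.cos(angle * i) for i, s in enumerate(samples))
        im = sum(s * math.sin(angle * i) for i, s in enumerate(samples))
        mag = math.hypot(re, im)
        total_mag += mag
        weighted += mag * (k * sample_rate / n)
    return weighted / total_mag if total_mag > 0 else 0.0

def sine(freq, sample_rate=8000, n=400):
    """Synthetic test tone standing in for a decoded audio frame."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]

for label, tone in (("200 Hz", sine(200)), ("1000 Hz", sine(1000))):
    print(
        label,
        round(zero_crossing_rate(tone), 3),
        round(rms_energy(tone), 3),
        round(spectral_centroid(tone, 8000), 1),
    )
```

Each function turns a frame of sound into one number, so averaging over the frames of a track would give one row per track for clustering or plotting.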

P.S. An amusing spin-off would be to reverse the process, and see if you could represent your own scientific datasets in music form. Map each dimension to a music metric, then for each datapoint, locate the nearest track in music space. Might work best for a single repeated measure over time, so you could create an album. Reminds me of “Dance your PhD”.
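The single-measure version is easy enough to sketch (Python, standard library only). Both the measurements and the track metric are rescaled to the range 0 to 1, and each measurement picks the track sitting at the closest relative position. The function names, the temperature series and the tempo values are all invented for illustration.

```python
def rescale(values):
    """Linearly map values onto [0, 1] (constant input maps to 0)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def sonify(measurements, track_metric):
    """Return one track name per measurement, in time order.

    track_metric: dict mapping track name to a single music metric
    (tempo, say). Each measurement is matched to the track whose metric
    lies at the nearest relative position within its own range.
    """
    names = list(track_metric)
    track_pos = rescale([track_metric[name] for name in names])
    album = []
    for d in rescale(measurements):
        nearest = min(range(len(names)), key=lambda i: abs(track_pos[i] - d))
        album.append(names[nearest])
    return album

# Invented example: a temperature series mapped onto track tempo (bpm).
temperatures = [2.0, 5.0, 9.0, 4.0]
tempo = {"lullaby": 60, "waltz": 90, "polka": 120, "jig": 150}
print(sonify(temperatures, tempo))
```

With several dimensions the same idea applies, using a multi-dimensional distance in place of the absolute difference.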

Notes

a. Probably because I don’t save much music to my computer, so I end up with very random stuff, such as children’s nursery rhymes in different languages, next to Flanders and Swann.
b. I presume those are the sort of algorithms that get submitted to the annual MIREX competition.

3 thoughts on “Analyse/classify my music collection”

  1. I don’t know how but five minutes ago – I was thinking of making an android app which would classify music based on the “waves” it contains. And that’s how I landed on your site. And had other ideas to improve that. Were you able to do anything regarding this?

    Thanks!
