Schrëwdinger and her descendants

Mosaic of placental mammals

670 descendants of Concestor 13, appropriately arranged.

I’ve been meaning to write about this for a good while, ever since the release of a paper which hit the science headlines last year. For the first time to my knowledge, researchers have tried to do professionally what I, together with a graphic illustrator, did in an amateur fashion for The Ancestor’s Tale. I’m talking about attempting to reconstruct what our distant ancestors looked like, an intriguing task that also leads to some striking visual possibilities.


Human impressions of animal sounds

If a friend tries to do an impression of an animal call, how easy is it to work out the animal? Alternatively, how good are different people at making animal noises? If a computer could assess the accuracy of these impressions, it would open up some fun possibilities. Imagine searching a database of animal sounds, or a set of known animals in a nature reserve, simply by doing an appropriate impression. I haven’t been able to find many people researching this topic, but I reckon I’m halfway to solving it. I just need a little extra help with some image analysis.

Spotting maps, part II

Grasses drawing

Gratifyingly, my technique correctly classifies drawings like this as pictures, not maps.

How effectively can a computer distinguish pictures or drawings of organisms from maps of their distribution? Based on my previous thoughts on recognising maps, a simple statistical technique allows a computer to correctly identify 99% of maps from within a training set of 1210 images (including 272 maps). Pleasingly, this classification has only a 0.5% false positive rate.

Pretty good, but in the case of images submitted to the Encyclopedia of Life, we can do better. If we make a guess as to the original format of the image, and include this in the model, we can correctly separate all 272 maps from the 938 pictures and drawings in my particular dataset. If you want to try it out, the dataset is here, and the R code to perform the classification is near the end of this post.
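To put the quoted percentages into image counts, here is a rough back-of-envelope sketch. It is written in Python rather than the R used in the post, and it assumes that "99% of maps" is measured against the 272 maps and that the 0.5% false positive rate is measured against the 938 non-map images. The function name is made up for illustration, and the rounded counts are estimates, not figures reported in the post.

```python
def implied_counts(n_images, n_maps, map_recall, false_positive_rate):
    """Turn the quoted rates into approximate image counts (illustrative only)."""
    n_pictures = n_images - n_maps                      # pictures and drawings
    maps_found = round(map_recall * n_maps)             # maps correctly flagged
    false_alarms = round(false_positive_rate * n_pictures)  # pictures wrongly flagged
    return n_pictures, maps_found, false_alarms


# Figures quoted in the text: 1210 images, 272 maps, 99% and 0.5%.
n_pictures, maps_found, false_alarms = implied_counts(1210, 272, 0.99, 0.005)
print(n_pictures, maps_found, false_alarms)
```

On these assumptions the simple technique would miss about three maps and wrongly flag about five pictures, which is the gap that the format-aware model closes.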

Spotting maps among images of organisms


How do you get a computer to distinguish pictures like the fallow deer at top from distribution maps (bottom)?

I’ve been writing some code to download freely usable images on the internet for large numbers of organisms – for example, for all species of mammal. The Encyclopedia of Life has done a lot of the hard work already – collecting images from Wikimedia Commons, Flickr, etc., encouraging experts to tag them as trusted, and providing an API for retrieving all the relevant data.

One problem is that a small percentage of the automatically harvested pictures are not pictures of the organism, but maps of its distribution, as seen in the lower picture on the right. Is there a way to automatically identify these as maps, or at least to flag up that they might need checking?

Reproducible plots of public data (Guardian Google spreadsheets, UNdata) using R

There’s a fair bit of publicly available data that can be used to answer questions about global trends. Previously, I’ve used data from the UN and the UK (often the Office for National Statistics), but I’ve only just discovered the commendable data service being provided by the British paper The Guardian. They encourage public access to (and visualization of) the data on which their articles are based. This is done by tidying up data and putting it in publicly accessible Google spreadsheets.

Plot reproducible from free online data using R

With all this data available, it’s possible to carry out reproducible research and analysis, such as the plot opposite. For this you require an easy (and free) way to carry out analysis using the datasets. Perhaps the most powerful way is to use something like R, which can be issued commands to carry out analysis, generate visualizations, etc. For maximum reproducibility, I’ve been trying to write R code that accesses these data sources directly, without having to download and tweak intermediate data files. This should make it easy to analyse datasets and produce attractive plots – for example, of estimated life expectancy and population size.

The world’s rarest plant?

We’ll probably never see this plant again. Image from Wood (2012)

Many of the rarest and most endangered species we know of are plants. The sad story of Hibiscadelphus woodii, which I’ve only just read about, is not atypical. Of the 4 last known specimens, on a cliff in Hawaii,

three individuals of H. woodii were apparently crushed by a large fallen boulder and died between 1995 and 1998. On 17 August 2011, the last remaining H. woodii was observed dead. (Wood 2012)

The unfortunate (but appropriate) choice of this as an endangered species at a recent event for the Society of Biology prompted me to dig out the following, which I wrote in 2009 to answer the question “What is the world’s rarest plant?”:

Periodic table Scrabble

Just how many words can you make using the element symbols? And what if you aren’t allowed to use the symbols more than once? An obvious job for a computer, especially if you have a text file of English words hanging around. So this morning I hunted for a decent list of chemical elements and symbols, and found 38163 valid words: 15% of a large English vocabulary…
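As a rough illustration of the kind of check involved, here is a minimal sketch in Python (not the author's code, and not necessarily the same approach). It assumes the current 118 IUPAC element symbols, which may differ from the list used for the count above, and the function name `spell` and its `allow_reuse` flag are invented for this example. It tries a one-letter and then a two-letter symbol at each position and backtracks when it gets stuck.

```python
# Current IUPAC element symbols (an assumption: the original list may have differed).
SYMBOLS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og
""".split()
SYMBOL_SET = {s.lower() for s in SYMBOLS}


def spell(word, allow_reuse=True, used=frozenset()):
    """Return one way to spell `word` with element symbols, or None if impossible."""
    word = word.lower()
    if not word:
        return []  # nothing left to spell: success
    for size in (1, 2):  # symbols are one or two letters long
        chunk = word[:size]
        if len(chunk) < size or chunk not in SYMBOL_SET:
            continue
        if not allow_reuse and chunk in used:
            continue  # this symbol has already been spent
        next_used = used if allow_reuse else used | {chunk}
        rest = spell(word[size:], allow_reuse, next_used)
        if rest is not None:
            return [chunk.capitalize()] + rest
    return None


print(spell("bacon"))                    # a word that can be spelled
print(spell("jazz"))                     # no element symbol contains J
print(spell("coco", allow_reuse=False))  # must avoid repeating C or O
```

Counting valid words is then a matter of running `spell` over every line of a word list and tallying the results that are not `None`.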

Analyse/classify my music collection

I reckon I don’t listen to music as much as most people: I don’t have a personal music player, and all 4 radios in our house are tuned to BBC Radio 4. But I’ve just been trying to answer a question about music for the Radio 4 programme “More or Less” (I’ll blog about that later in the week). Now I’m intrigued by an area that I know very little about, and want to do a bit of analysis, but I’m not quite sure how to do it.