Influential yet unresolved branch points on the Open Tree of Life


The visualisation of life shows every branch point as a split into two branches. Yet the Open Tree of Life, from which much of the data is gathered, has many examples of large polytomies, where many branches emerge from a single point. These are almost entirely cases where a large number of species have been classified by taxonomists into a single group (say the ragworts), but where the OpenTree has not incorporated any extra studies that can resolve relationships between species within the group.

Continue reading
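To make the idea of a polytomy concrete, here is a minimal Python sketch. It assumes a toy representation in which a tree is a nested list and each species is a string; the function names and the example tree are hypothetical, and this is not the code behind the visualisation or the OpenTree. It finds nodes with more than two children and then forces each one into a ladder of two-way splits in an arbitrary order, which is one simple way a strictly bifurcating display could be produced.

```python
def count_tips(node):
    """Number of leaves (species) beneath a node."""
    if isinstance(node, str):
        return 1
    return sum(count_tips(child) for child in node)


def find_polytomies(node, found=None):
    """Return the child count of every node with more than two children."""
    if found is None:
        found = []
    if isinstance(node, str):
        return found
    if len(node) > 2:
        found.append(len(node))
    for child in node:
        find_polytomies(child, found)
    return found


def resolve_arbitrarily(node):
    """Turn every polytomy into a ladder of two-way splits.

    The order of the splits is arbitrary: it carries no evolutionary
    information, it only makes the tree strictly bifurcating.
    """
    if isinstance(node, str):
        return node
    children = [resolve_arbitrarily(child) for child in node]
    while len(children) > 2:
        # Join the last two children into a new two-way split.
        children = children[:-2] + [[children[-2], children[-1]]]
    return children


# Hypothetical example: the root and the middle group are both polytomies.
tree = [["a", "b"], ["c", "d", "e", "f"], "g"]
resolved = resolve_arbitrarily(tree)
```

The resolved tree keeps the same seven tips, but the extra splits are invented rather than supported by any study, which is why such branch points still count as unresolved.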

A new universal tree of life

A revised universal tree of life

Here’s a new depiction of the ‘universal evolutionary tree of life’ which incorporates some recent advances in the field. In particular, some fast-evolving taxa that are known to be spuriously placed in previous ribosomal RNA trees have been omitted (e.g. microsporidia, which are known to fall within the fungi), and it also illustrates the increasingly popular idea that Eukaryotes nest within the Archaea. Unlike some other universal trees on the internet, this is shown as an unrooted tree, which is a more conservative approach than the (arguable) rooting of the tree of life between Eubacteria and Archaea. The tree has the distinct advantage that it (more or less) correctly reflects our understanding of the major groups of Eukaryotes (including plants, fungi & animals). In my opinion it is a more accurate depiction of our state of knowledge than other ‘universal trees of life’ commonly found on the internet. Continue reading

A pragmatic redating of our 700 million year ancestry

As part of the second edition of the Ancestor’s Tale, I have been reassessing the dates on the backwards journey from today’s humans to the origin of animals. What is written below is rather technical, and mostly for my own benefit, and also for Richard Dawkins. Nevertheless, perhaps others may be interested in my reasoning, which I hope is seen as a pragmatic assessment of our current beliefs. Continue reading

Ancestor’s Tale list of concestors, revised


Some surprises in store for Concestor 23

When I helped write the Ancestor’s Tale, one of the big tasks was to make a human-centred tree of life: to list all the points at which, backwards in time, the human lineage joined with lineages of other extant lifeforms. In February last year I was forwarded an email from someone using these “rendezvous points” as the basis for a song and story for children. She had seen the O’Leary paper and wondered how much revision was needed to our original list.

For those who wish to skip to the chase, I’ve come up with a new list at the end of this post. Continue reading

Visualizing data on large phylogenies at the pixel-level


A phylogenetically organised display of data for all placental mammal species. Red pixels are those without a picture on EoL.

Modern technology, coupled with molecular taxonomy, means we now have very large evolutionary trees: ones with tens or hundreds of thousands of species. In fact, the Open Tree of Life project aims to create a tree of all living things, which would have millions of species. The obvious question is how to display these enormous amounts of data.

Here’s one possibility I’ve come up with: use a single pixel for each tip of the tree (each species). Then, if we could use the whole of a one megapixel canvas, we could display information about a million or so species. Continue reading
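The one-pixel-per-tip idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tips are taken to be already listed in tree order, pixels are filled row by row, and the function names and the 1/2 colour codes are hypothetical. The layout used for the figure above is not described in this excerpt and may well differ.

```python
import math


def canvas_side(n_tips):
    """Side length (in pixels) of the smallest square canvas that fits every tip."""
    if n_tips <= 0:
        return 0
    return math.isqrt(n_tips - 1) + 1


def tip_to_pixel(index, side):
    """Row-major (row, column) position of the tip at a given index."""
    return divmod(index, side)


def render(values, side, empty=0):
    """Lay out one value per tip on a side x side grid of pixels."""
    grid = [[empty] * side for _ in range(side)]
    for index, value in enumerate(values):
        row, col = tip_to_pixel(index, side)
        grid[row][col] = value
    return grid


# Hypothetical data for five species: 1 = has a picture, 2 = no picture.
has_picture = [1, 2, 1, 1, 2]
side = canvas_side(len(has_picture))
grid = render(has_picture, side)
```

With a million tips, `canvas_side` gives a 1000 x 1000 canvas, which matches the one-megapixel figure quoted above.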

Human impressions of animal sounds

If a friend tries to do an impression of an animal call, how easy is it to work out the animal? Alternatively, how good are different people at making animal noises? If a computer could assess the accuracy of these impressions, it would open up some fun possibilities. Imagine searching a database of animal sounds, or a set of known animals in a nature reserve, simply by doing an appropriate impression. I haven’t been able to find many people researching this topic, but I reckon I’m halfway to solving it. I just need a little extra help with some image analysis. Continue reading

Spotting maps, part II

Grasses drawing

Gratifyingly, my technique correctly classifies drawings like this as pictures, not maps.

How effectively can a computer distinguish pictures or drawings of organisms from maps of their distribution? Based on my previous thoughts on recognising maps, a simple statistical technique allows a computer to correctly identify 99% of maps from within a training set of 1210 images (including 272 maps). Pleasingly, this classification has only a 0.5% false positive rate.

Pretty good, but in the case of images submitted to the Encyclopedia of Life, we can do better. If we make a guess as to the original format of the image, and include this in the model, we can correctly separate all 272 maps from the 938 pictures and drawings in my particular dataset. If you want to try it out, the dataset is here, and the R code to perform the classification is near the end of this post.
Continue reading

Spotting maps among images of organisms


How do you get a computer to distinguish pictures like the fallow deer at top from distribution maps (bottom)?

I’ve been writing some code to download freely usable images on the internet for large numbers of organisms – for example, for all species of mammal. The Encyclopedia of Life has done a lot of the hard work already – collecting images from Wikimedia Commons, Flickr, etc., encouraging experts to tag them as trusted, and providing an API for retrieving all the relevant data.

One problem is that a small percentage of the automatically harvested pictures are not pictures of the organism, but maps of its distribution, as seen in the lower picture on the right. Is there a way to automatically identify these as maps, or at least to flag up that they might need checking? Continue reading
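As a rough illustration of what "flagging up" could look like, here is a Python sketch of one possible heuristic. It rests on an assumption that is mine and not taken from the post: distribution maps often contain large flat areas of a single colour, whereas photographs rarely do. The function names and the 0.5 threshold are hypothetical, and this is not the statistical technique the post goes on to describe.

```python
from collections import Counter


def dominant_colour_fraction(pixels):
    """Fraction of pixels that share the single most common colour."""
    if not pixels:
        return 0.0
    counts = Counter(pixels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(pixels)


def might_be_map(pixels, threshold=0.5):
    """Flag an image for checking if one colour covers most of it.

    The threshold is an arbitrary, untuned value chosen for illustration.
    """
    return dominant_colour_fraction(pixels) >= threshold


# Hypothetical pixel lists (RGB tuples) standing in for real images.
map_like = [(255, 255, 255)] * 8 + [(200, 0, 0)] * 2
photo_like = [(i, i * 2, i * 3) for i in range(10)]
```

Here `might_be_map(map_like)` is true and `might_be_map(photo_like)` is false. A real image would first need decoding into a list of pixel colours, and a flag like this only suggests that an image needs checking by hand.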

Reproducible plots of public data (Guardian Google spreadsheets, UNdata) using R

There’s a fair bit of publicly available data that can be used to answer questions about global trends. Previously, I’ve used data from the UN and the UK (often the Office for National Statistics), but I’ve only just discovered the commendable data service being provided by the British paper “The Guardian”. They encourage public access to (and visualization of) the data on which their articles are based. This is done by tidying up data and putting it in publicly accessible Google spreadsheets.

Plot reproducible from free online data using R

With all this data available, it’s possible to carry out reproducible research and analysis, such as the plot opposite. For this you require an easy (and free) way to carry out analysis using the datasets. Perhaps the most powerful way is to use something like R, which can be issued commands to carry out analysis, generate visualizations, etc. For maximum reproducibility, I’ve been trying to write R code that accesses these data sources directly, without having to download and tweak intermediate data files. This should make it easy to analyse datasets and produce attractive plots – for example, of estimated life expectancy and population size. Continue reading