When I helped write The Ancestor’s Tale, one of the big tasks was to make a human-centred tree of life: to list all the points at which, going backwards in time, the human lineage joins the lineages of other extant lifeforms. In February last year I was forwarded an email from someone using these “rendezvous points” as the basis for a song and story for children. She had seen the O’Leary (2013) paper and wondered how much revision our original list needed.
Modern technology, coupled with molecular taxonomy, means we now have very large evolutionary trees: ones with tens or hundreds of thousands of species. In fact, the Open Tree of Life project aims to create a tree of all living things, which would have millions of species. The obvious question is how to display these enormous amounts of data.
Here’s one possibility I’ve come up with: use a single pixel for each tip of the tree (each species). Then, if we could use the whole of a one megapixel canvas, we could display information about a million or so species.
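As a rough sketch of the idea (my own illustration, not code from any of my posts): pack one value per species into a one-megapixel array, one pixel per tip. The made-up random values stand in for whatever per-species information you want to show; a real layout would also order the pixels so that neighbouring tips of the tree end up as neighbouring pixels, which the plain row-major filling here ignores.

```python
import numpy as np

def tips_to_canvas(values, width=1000, height=1000):
    """Pack one value per species tip into a width x height canvas,
    one pixel per tip, filled in plain row-major order."""
    canvas = np.zeros((height, width))
    n = min(len(values), width * height)
    canvas.ravel()[:n] = values[:n]  # ravel() is a view, so this fills canvas
    return canvas

# a million 'species', each with one value to display (random stand-ins)
vals = np.random.default_rng(0).random(1_000_000)
img = tips_to_canvas(vals)
print(img.shape)  # (1000, 1000)
```

From here, `img` could be handed straight to any image-display routine, which is the appeal: a million species in one picture.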
I’ve been meaning to write about this for a good while, ever since the release of a paper which hit the science headlines last year. For the first time to my knowledge, researchers have tried to do professionally what I, together with a graphic illustrator, did in an amateur fashion for The Ancestor’s Tale. I’m talking about attempting to reconstruct what our distant ancestors looked like, an intriguing task that also leads to some striking visual possibilities.
If a friend tries to do an impression of an animal call, how easy is it to work out the animal? Alternatively, how good are different people at making animal noises? If a computer could assess the accuracy of these impressions, it would open up some fun possibilities. Imagine searching a database of animal sounds, or a set of known animals in a nature reserve, simply by doing an appropriate impression. I haven’t been able to find many people researching this topic, but I reckon I’m halfway to solving it. I just need a little extra help with some image analysis.
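If impressions are judged by comparing coarse frequency content (one plausible approach, which would also explain why spectrogram-style image analysis comes into it — this is my guess, not a published method), the matching step might look like the minimal sketch below, with synthetic sine waves standing in for recordings:

```python
import numpy as np

def band_profile(x, n_fft=256):
    """Average magnitude spectrum over short windows: a crude spectrogram
    collapsed along time. Coarse bins mean nearby pitches score as similar."""
    frames = x[: len(x) // n_fft * n_fft].reshape(-1, n_fft)
    mags = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    return mags / np.linalg.norm(mags)

def similarity(a, b):
    """Cosine similarity between frequency profiles, in [0, 1]."""
    return float(band_profile(a) @ band_profile(b))

fs = 8000
t = np.arange(fs) / fs                    # one second of 'audio'
call = np.sin(2 * np.pi * 440 * t)        # the real animal call
impression = np.sin(2 * np.pi * 450 * t)  # a near-miss human impression
squeak = np.sin(2 * np.pi * 2000 * t)     # a very different sound

print(similarity(call, impression) > similarity(call, squeak))  # True
```

Searching a sound database would then just mean ranking its entries by `similarity` against the impression; real recordings would of course need temporal structure as well, not just an averaged spectrum.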
I’ve put up a page to help get hold of pretty pictures of living organisms. I often need to do this when giving talks and trying to avoid text-heavy slides. One time I might need a rather general picture, such as a bat to discuss animal sonar. Other times I might need a much more specific image, such as a vampire bat nose (when talking about the nasal heat receptors of bats, and their evolutionary links to spicy heat).
While messing around with ideas for my talk at the Cheltenham Science Festival, I hit on the idea of combining a piece of acrylic that only passes infrared light with a cheap Fresnel magnifying lens. Completely by accident, I managed to get the rather pretty effect to the right. You can see that the paper is burning despite receiving no visible light: it is being ignited by the infrared portion of the spectrum. Even better, it’s quite a cheap effect to achieve.
How effectively can a computer distinguish pictures or drawings of organisms from maps of their distribution? Based on my previous thoughts on recognising maps, a simple statistical technique allows a computer to correctly identify 99% of the maps in a training set of 1210 images (272 of which are maps). Pleasingly, this classification has only a 0.5% false positive rate.
Pretty good, but in the case of images submitted to the Encyclopedia of Life, we can do better. If we make a guess at the original format of the image and include this in the model, we can correctly separate all 272 maps from the 938 pictures and drawings in my particular dataset. If you want to try it out, the dataset is here, and the R code to perform the classification is near the end of this post.
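The actual classification code is in R, near the end of the post; the sketch below re-creates only the shape of the approach, in Python and with made-up features (the 272/938 counts are real, the feature values are not). Maps tend to have a large uniform background and few distinct colours, and a logistic regression on such features separates the classes well:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in features: maps get a big uniform-background fraction and
# few distinct colours; photos and drawings the opposite. Class sizes match
# the post's dataset (272 maps, 938 other images).
n_maps, n_photos = 272, 938
maps = np.column_stack([rng.normal(0.7, 0.1, n_maps),     # background fraction
                        rng.normal(30, 10, n_maps)])      # distinct colours
photos = np.column_stack([rng.normal(0.2, 0.1, n_photos),
                          rng.normal(200, 50, n_photos)])
X = np.vstack([maps, photos])
y = np.concatenate([np.ones(n_maps), np.zeros(n_photos)])  # 1 = map

# standardise the features, then fit a logistic regression by gradient descent
X = (X - X.mean(axis=0)) / X.std(axis=0)
Xb = np.column_stack([np.ones(len(X)), X])  # add an intercept column
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xb @ w))           # predicted P(map)
    w -= 0.1 * Xb.T @ (p - y) / len(y)      # gradient of the log-loss

pred = (1 / (1 + np.exp(-Xb @ w))) > 0.5
print(f"training accuracy: {(pred == y).mean():.3f}")
```

The "guess the original format" trick in the post would simply be one more column in `X`.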
I’ve been writing some code to download freely usable images on the internet for large numbers of organisms – for example, for all species of mammal. The Encyclopedia of Life has done a lot of the hard work already – collecting images from Wikimedia Commons, Flickr, etc., encouraging experts to tag them as trusted, and providing an API for retrieving all the relevant data.
One problem is that a small percentage of the automatically harvested pictures are not pictures of the organism, but maps of its distribution, as seen in the lower picture on the right. Is there a way to automatically identify these as maps, or at least to flag up that they might need checking?
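One cheap flag (a suggestion of mine, not anything the Encyclopedia of Life itself does): distribution maps usually have a large flat background, so the fraction of pixels sharing the single most common colour is telling. A toy version with synthetic images in place of real downloads:

```python
import numpy as np

def dominant_colour_fraction(img):
    """Fraction of pixels sharing the single most common colour.
    High values suggest a flat background, i.e. a possible map."""
    pixels = img.reshape(-1, img.shape[-1])
    colours, counts = np.unique(pixels, axis=0, return_counts=True)
    return counts.max() / len(pixels)

# toy 'map': a white background with a small coloured range polygon
map_img = np.full((100, 100, 3), 255, dtype=np.uint8)
map_img[40:60, 30:70] = (200, 80, 80)
# toy 'photo': per-pixel noise, so almost no colour repeats
photo_img = np.random.default_rng(1).integers(0, 256, (100, 100, 3),
                                              dtype=np.uint8)

print(dominant_colour_fraction(map_img))    # 0.92
print(dominant_colour_fraction(photo_img))  # close to 0
```

Anything above some threshold would be queued for a human check rather than rejected outright, since plain-background studio photographs would trip the same flag.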
Last month I recorded an update to my Radio 4 programme on why women live longer than men (which you can still listen to online). A cut-down version was broadcast on the BBC World Service. For this we wanted to include statistics for different countries. I was busy investigating official UN sources for these data when I noticed that morning’s papers were reporting a relevant set of studies published that day in the Lancet (Das & Samarasekera, 2013). A couple of thoughts followed. Firstly, these new data indicate that in 2010, men had a higher life expectancy than women in only 2 countries: Afghanistan and Jordan (Wang et al., 2013). That’s in contrast to the UN data, which find this effect in a small number of African countries too – suggesting the UN data could be inaccurate in these cases. Secondly, the data are given with uncertainty intervals for the first time, which got me thinking about the best way to illustrate them. In particular, I’ve been thinking about how to improve the plot above (taken from the Wikipedia page listing the life expectancy of different countries).
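One obvious way to show the intervals, sketched here with matplotlib and illustrative made-up numbers (not the published estimates): plot each country’s female-minus-male gap as a point with error bars for its uncertainty interval, plus a zero line so countries where men outlive women stand out.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen
import matplotlib.pyplot as plt

# Illustrative values only, NOT the published estimates: female-minus-male
# life-expectancy gap with an uncertainty interval for four fictional countries.
countries = ["A", "B", "C", "D"]
gap = [5.1, 3.4, 0.8, -0.6]
lo = [4.2, 2.5, -0.3, -1.9]
hi = [6.0, 4.3, 1.9, 0.7]

fig, ax = plt.subplots()
yerr = [[g - l for g, l in zip(gap, lo)],   # distance down to interval bottom
        [h - g for h, g in zip(hi, gap)]]   # distance up to interval top
ax.errorbar(range(len(countries)), gap, yerr=yerr, fmt="o", capsize=4)
ax.axhline(0, linewidth=0.8)  # below this line, men outlive women
ax.set_xticks(range(len(countries)))
ax.set_xticklabels(countries)
ax.set_ylabel("female - male life expectancy (years)")
fig.savefig("gap.png")
```

Countries whose whole interval sits below zero (like the fictional country D would if its upper bound were negative) are the ones where the male advantage is statistically solid, which is exactly the distinction the bare Wikipedia plot cannot make.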
There’s a fair bit of publicly available data that can be used to answer questions about global trends. Previously, I’ve used data from the UN and the UK (often the Office for National Statistics), but I’ve only just discovered the commendable data service provided by the British paper “The Guardian”. They encourage public access to (and visualization of) the data on which their articles are based, by tidying up the data and putting it in publicly accessible Google spreadsheets.

With all this data available, it’s possible to carry out reproducible research and analysis, such as the plot opposite. For this you need an easy (and free) way to carry out analysis using the datasets. Perhaps the most powerful option is something like R, which can be issued commands to carry out analysis, generate visualizations, and so on. For maximum reproducibility, I’ve been trying to write R code that accesses these data sources directly, without having to download and tweak intermediate data files. This should make it easy to analyse datasets and produce attractive plots – for example, of estimated life expectancy and population size.
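My own code for this is in R, but the pattern works in any environment whose CSV reader accepts a URL. The Python sketch below shows the shape of it; the values are illustrative, and an inline string stands in for a spreadsheet’s CSV-export URL (pandas’ `read_csv` accepts a URL in exactly the same position).

```python
import io
import pandas as pd

# In a real script this string would be replaced by the URL of a published
# Google-spreadsheet CSV export, passed straight to pd.read_csv, so the
# analysis reruns from the live source with no intermediate files.
csv_text = """country,life_expectancy,population
Japan,83.1,127000000
Nigeria,51.9,158000000
Brazil,73.1,195000000
"""
df = pd.read_csv(io.StringIO(csv_text))

# a tiny reproducible 'analysis': rank countries by life expectancy
ranked = df.sort_values("life_expectancy", ascending=False)
print(ranked["country"].tolist())  # ['Japan', 'Brazil', 'Nigeria']
```

Because nothing is downloaded and hand-edited along the way, anyone running the same script against the same source gets the same plot, which is the whole point of the exercise.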