I’ve put up a page to help get hold of pretty pictures of living organisms. I often need to do this when giving talks and trying to avoid text-heavy slides. One time I might need a rather general picture, such as a bat to discuss animal sonar. Other times I might need a much more specific image, such as a vampire bat nose (when talking about the nasal heat receptors of bats, and their evolutionary links to spicy heat).
While messing around with ideas for my talk at the Cheltenham Science Festival, I hit on the idea of combining a piece of acrylic that only passes infrared light, with a cheap Fresnel magnifying lens. Completely by accident, I managed to get the rather pretty effect to the right. You can see that the paper is burning where there is no visible light, in the infrared portion of the spectrum. Even better, it’s quite a cheap effect to achieve.
How effectively can a computer distinguish pictures or drawings of organisms from maps of their distribution? Based on my previous thoughts on recognising maps, a simple statistical technique allows a computer to correctly identify 99% of maps from within a training set of 1210 images (including 272 maps). Pleasingly, this classification has only a 0.5% false positive rate.
Pretty good, but in the case of images submitted to the Encyclopedia of Life, we can do better. If we make a guess as to the original format of the image, and include this in the model, we can correctly separate all 272 maps from the 938 pictures and drawings in my particular dataset. If you want to try it out, the dataset is here, and the R code to perform the classification is near the end of this post.
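To make the quoted percentages concrete, here is a minimal Python sketch (not the R code from the post) that converts them into approximate image counts. The function name is hypothetical, and the counts are only rounded estimates derived from the stated 99% detection rate and 0.5% false positive rate, not figures taken from the dataset itself.

```python
def approx_confusion_counts(n_maps, n_other, detection_rate, false_positive_rate):
    """Turn quoted rates into rounded, approximate confusion-matrix counts."""
    true_pos = round(n_maps * detection_rate)        # maps correctly flagged as maps
    false_neg = n_maps - true_pos                    # maps that were missed
    false_pos = round(n_other * false_positive_rate) # non-maps wrongly flagged as maps
    true_neg = n_other - false_pos                   # non-maps correctly left alone
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


# Figures quoted in the post: 1210 images, of which 272 are maps and 938 are not.
counts = approx_confusion_counts(272, 938, detection_rate=0.99, false_positive_rate=0.005)

# Of everything flagged as a map, the fraction that really is a map.
precision = counts["true_pos"] / (counts["true_pos"] + counts["false_pos"])
```

On these rounded figures, roughly three maps would be missed and about five pictures wrongly flagged, which is the gap that the format-based model then closes.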
I’ve been writing some code to download freely usable images on the internet for large numbers of organisms – for example, for all species of mammal. The Encyclopedia of Life has done a lot of the hard work already – collecting images from Wikimedia Commons, Flickr, etc., encouraging experts to tag them as trusted, and providing an API for retrieving all the relevant data.
One problem is that a small percentage of the automatically harvested pictures are not pictures of the organism, but maps of its distribution, as seen in the lower picture on the right. Is there a way to automatically identify these as maps, or at least to flag up that they might need checking? Continue reading
Last month I recorded an update to my Radio 4 programme on why women live longer than men (which you can still listen to online). A cut-down version was broadcast on the BBC World Service. For this we wanted to include statistics for different countries. I was busy investigating official UN sources for these data, when I noticed that morning’s papers were reporting a relevant set of studies published that day in the Lancet. A couple of thoughts followed. Firstly, these new data indicate that in 2010, men had a higher life expectancy than women in only 2 countries: Afghanistan and Jordan. That’s in contrast to the UN data, which finds this effect in a small number of African countries too – suggesting the UN data could be inaccurate in these cases. Secondly, the data are given with uncertainty intervals for the first time, which got me thinking about the best way to illustrate the data. In particular, I’ve been thinking about how to improve the plot above (taken from the Wikipedia page listing the life expectancy of different countries).
There’s a fair bit of publicly available data that can be used to answer questions about global trends. Previously, I’ve used data from the UN and the UK (often the Office for National Statistics), but I’ve only just discovered the commendable data service being provided by the British paper “The Guardian”. They encourage public access to (and visualization of) the data on which their articles are based. This is done by tidying up data and putting it in publicly accessible Google spreadsheets. With all this data available, it’s possible to carry out reproducible research and analysis, such as the plot opposite. For this you require an easy (and free) way to carry out analysis using the datasets. Perhaps the most powerful way is to use something like R, which can be issued commands to carry out analysis, generate visualizations, etc. For maximum reproducibility, I’ve been trying to write R code that accesses these data sources directly, without having to download and tweak intermediate data files. This should make it easy to analyse datasets and produce attractive plots – for example, of estimated life expectancy and population size. Continue reading
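As an illustration of reading a public spreadsheet directly, here is a minimal sketch in Python rather than the R used in the post. It assumes the spreadsheet is shared publicly and that Google's CSV export URL pattern applies to it. The function names and the `sheet_id` and `gid` parameters are hypothetical, and no real Guardian spreadsheet key is given here.

```python
import csv
import io
from urllib.request import urlopen


def csv_export_url(sheet_id, gid=0):
    """Build a CSV export URL for a publicly shared Google spreadsheet.

    Assumption: the sheet allows anonymous access and supports this export pattern.
    """
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"


def parse_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_sheet(sheet_id, gid=0):
    """Download one worksheet and return its rows, with no intermediate file on disk."""
    with urlopen(csv_export_url(sheet_id, gid)) as response:
        return parse_csv(response.read().decode("utf-8"))
```

Because the rows come straight from the source URL, rerunning the analysis picks up any corrections to the spreadsheet without manual re-downloading.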
In the light of the recent hospital admission of the Duchess of Cambridge for a condition loosely associated with female babies and multiple births, I’ve been asked by the Radio 4 programme “More or Less” to calculate the probability that she is pregnant with more than one embryo. I’m somewhat reluctant to contribute to what is already a topic of rampant media speculation, and the attendant intrusive journalism that often plagues issues like this (which, after I had written this post, led to a sad and particularly tragic outcome). Nevertheless, few media articles seem to give links to solid data sources, and some even give rather misleading information, so I’ve overcome my reluctance in order to put some solid statistical facts into the public domain. Simply put, compared to the average, the probability of a mother having twins given that she has this condition is not quite doubled. However, it’s still likely to be a very low number: something like an increase from about 1.5% to about 2.4%. For the gory details, read on. Continue reading
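The "not quite doubled" claim can be checked with a couple of lines of arithmetic. This Python sketch uses only the two approximate percentages quoted above; the variable names are illustrative.

```python
# Approximate probabilities quoted in the post.
baseline_twin_probability = 0.015   # about 1.5% for the average mother
twin_probability_given_hg = 0.024   # about 2.4% given the condition

# Relative risk: how many times more likely twins are, given the condition.
relative_risk = twin_probability_given_hg / baseline_twin_probability

# Absolute increase, as a probability (multiply by 100 for percentage points).
absolute_increase = twin_probability_given_hg - baseline_twin_probability
```

The ratio comes out at about 1.6, so the risk rises noticeably in relative terms while the absolute increase is under one percentage point.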
Many of the rarest and most endangered species we know of are plants. The sad story of Hibiscadelphus woodii, which I’ve only just read about, is not atypical. Of the 4 last known specimens, on a cliff in Hawaii,
three individuals of H. woodii were apparently crushed by a large fallen boulder and died between 1995 and 1998. On 17 August 2011, the last remaining H. woodii was observed dead. [zotpressInText item=”X8HWMKBN”]
The unfortunate (but appropriate) choice of this as an endangered species at a recent event for the Society of Biology prompted me to dig out the following, which I wrote in 2009 to answer the question “What is the world’s rarest plant”: Continue reading
I’m quite pleased that the Society of Biology has asked me to speak at the opening debate to promote Biology Week. Although the proliferation of commemorative days, weeks, and months risks stultifying many people, this seems an event worth supporting. The title is “Do we need pandas? Choosing which species to save”, and I thought I’d document my thoughts here.
Just how many words can you make using the element symbols? And what if you aren’t allowed to use the symbols more than once? An obvious job for a computer, especially if you have a text file of English words hanging around. So this morning I hunted for a decent list of chemical elements and symbols, and found 38163 valid words: 15% of a large English vocabulary… Continue reading
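One way to approach the search is a small backtracking routine, sketched here in Python. This is not the code from the post: the function names are hypothetical, and the symbol list assumes the 118 currently named elements, which may differ from the list used originally (so the counts need not match).

```python
# Assumption: the 118 currently named elements; the original list may have differed.
SYMBOLS = """
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og
""".split()

# Map lower-case text to the properly capitalised symbol.
LOOKUP = {symbol.lower(): symbol for symbol in SYMBOLS}


def spell(word, allow_reuse=True):
    """Return one list of element symbols that spells `word`, or None if impossible."""
    word = word.lower()

    def search(position, used):
        if position == len(word):
            return []  # reached the end: the whole word has been spelled
        for size in (1, 2):  # every symbol in the list is one or two letters long
            chunk = word[position:position + size]
            symbol = LOOKUP.get(chunk)
            if len(chunk) < size or symbol is None:
                continue
            if not allow_reuse and symbol in used:
                continue
            rest = search(position + size, used | {symbol})
            if rest is not None:
                return [symbol] + rest
        return None  # no symbol fits here, so backtrack

    return search(0, frozenset())


def count_spellable(words, allow_reuse=True):
    """Count how many of the given words can be spelled with element symbols."""
    return sum(1 for word in words if spell(word, allow_reuse) is not None)
```

For example, `spell("bacon")` gives B, Ac, O, N, while `spell("nun", allow_reuse=False)` fails because N would be needed twice. Feeding every line of a word list through `count_spellable` gives totals for both variants of the question.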