Reproducible plots of public data (Guardian Google spreadsheets, UNdata) using R

There’s a fair bit of publicly available data that can be used to answer questions about global trends. Previously, I’ve used data from the UN and the UK (often the Office for National Statistics), but I’ve only just discovered the commendable data service provided by the British paper “The Guardian”. They encourage public access to (and visualization of) the data on which their articles are based, by tidying up the data and putting it in publicly accessible Google spreadsheets.

Plot reproducible from free online data using R

With all this data available, it’s possible to carry out reproducible research and analysis, such as the plot opposite. For this you require an easy (and free) way to analyse the datasets. Perhaps the most powerful way is to use something like R, which can be scripted to carry out analysis, generate visualizations, etc. For maximum reproducibility, I’ve been trying to write R code that accesses these data sources directly, without having to download and tweak intermediate data files. This should make it easy to analyse datasets and produce attractive plots – for example, of estimated life expectancy and population size.
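As a rough illustration of reading a public spreadsheet directly into R, here is a minimal sketch. It is not the code from the full post: the helper names `sheet_csv_url` and `read_public_sheet` are hypothetical, the spreadsheet key is a placeholder, and the URL pattern assumes Google's CSV export endpoint for a spreadsheet that has been shared publicly, which may differ from the address a particular dataset uses.

```r
# Hypothetical helper: build the CSV export URL for a publicly shared
# Google spreadsheet. The URL pattern is an assumption, not taken from the post.
sheet_csv_url <- function(key, gid = 0) {
  sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s",
          key, gid)
}

# Hypothetical helper: read the sheet straight into a data frame,
# with no intermediate file saved to disk.
read_public_sheet <- function(key, gid = 0) {
  read.csv(sheet_csv_url(key, gid), stringsAsFactors = FALSE)
}

# Usage (needs network access and a real key, so it is left commented out):
# dat <- read_public_sheet("SPREADSHEET_KEY")
# str(dat)
```

Keeping the URL construction separate from the download means the address can be checked without touching the network, and anyone rerunning the script fetches the data as currently published.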

Severe morning sickness and twins

Dizygotic (non-identical) twin sisters, image from Wikimedia Commons, ©2006 Dustin M. Ramsey

In the light of the recent hospital admission of the Duchess of Cambridge for a condition loosely associated with female babies and multiple births, I’ve been asked by the Radio 4 programme “More or Less” to calculate the probability that she is pregnant with more than one embryo. I’m somewhat reluctant to contribute to what is already a topic of rampant media speculation, and to the intrusive journalism that often plagues issues like this (which, after I had written this post, led to a particularly tragic outcome). Nevertheless, few media articles seem to give links to solid data sources, and some even give rather misleading information, so I’ve overcome my reluctance in order to put some solid statistical facts into the public domain. Simply put, compared to the average, the probability of a mother having twins given that she has this condition is not quite doubled. However, it’s still likely to be a very low number: something like an increase from about a 1.5% chance to a 2.4% chance. For the gory details, read on.
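To make the “not quite doubled” statement concrete, the rounded figures above can be put into Bayes’ rule. The symbols are my own shorthand rather than notation from the full post: T stands for “pregnant with twins” and H for “has the condition”. The ratio of 1.6 is simply what the two quoted percentages imply, not an independently sourced estimate.

```latex
\[
P(T \mid H) \;=\; \frac{P(H \mid T)}{P(H)}\, P(T),
\qquad
\frac{P(T \mid H)}{P(T)} \;\approx\; \frac{0.024}{0.015} \;=\; 1.6 .
\]
```

In other words, the condition would need to be about 1.6 times as common in twin pregnancies as in pregnancies overall to move the chance of twins from about 1.5% to about 2.4%.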