Spotting maps, part II

Grasses drawing

Gratifyingly, my technique correctly classifies drawings like this as pictures, not maps.

How effectively can a computer distinguish pictures or drawings of organisms from maps of their distribution? Based on my previous thoughts on recognising maps, a simple statistical technique allows a computer to correctly identify 99% of maps from within a training set of 1210 images (including 272 maps). Pleasingly, this classification has only a 0.5% false positive rate.

Pretty good, but in the case of images submitted to the Encyclopaedia of Life, we can do better. If we make a guess as to the original format of the image, and include this into the model, we can correctly separate all 272 maps from the 938 pictures and drawings in my particular dataset. If you want to try it out, the dataset is here, and the R code to perform the classification is near the end of this post.
Continue reading