Periodic table Scrabble

Last Wednesday, at the last minute, I popped along to the Royal Institution for a one-off “Science News Quiz”. There I was, minding my own business in the cafe, when I got chatting to the people on the teams. Next thing, I’m roped in to do the scoring for the evening. It was pretty enjoyable: two teams competing against each other and against the audience. Extra points were available at the end for coming up with words that could be constructed from the individual element symbols of the periodic table. Imagine normal Scrabble tiles with “Ar”, “Na”, “O”, etc. rather than simply the single letters of the alphabet. Normal Scrabble words were valid, and you could reuse “tiles”.

Needless to say, the combined ingenuity of the audience (including, it must be said, a fair few 6th formers) managed to produce far more words than the contestants. The audience were already ahead, but that made it painfully obvious that they were the winners.

Words at least twenty letters long that are valid in periodic table Scrabble, with their lengths
floccinaucinihilipilification 29
nonrepresentationalisms 23
nonrepresentationalism 22
hypersusceptibilities 21
internationalisations 21
representationalistic 21
hypercoagulabilities 20
hyperconsciousnesses 20
internationalisation 20
phosphomonoesterases 20
professionalisations 20
representationalisms 20
supercalifragilistic 20
supposititiousnesses 20
undercapitalisations 20
underrepresentations 20

But just how many words can you make using the element symbols? And what if you aren’t allowed to use a symbol more than once? An obvious job for a computer, especially if you have a text file of English words hanging around. So this morning I hunted for a decent list of chemical elements and symbols, having found that my previous version needed updating with the recent IUPAC additions to the periodic table. Then I bashed out a quick computer script to check, and seemed to come up with about 11% of English words being allowed under the rules we were working to: 29103 out of 263533, one of the longest being “interconvertibilities”. Then I said as much on Twitter.

Oops.

I shouldn’t have been so hasty. I should also have checked my example word properly. On close inspection, “interconvertibilities” doesn’t work[a]. Someone on Twitter noticed this. They also pointed out a website that already provides a list of words made from chemical element symbols. Damn – perhaps I should have done an internet search before starting on this.
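The flaw in deleting symbols outright is easy to reproduce: removing “er” from “interconvertibilities” pushes the “t” and the “c” either side of the gap together, so a “Tc” appears that was never in the word. The few lines of Perl below are written just for illustration; they show that substituting a placeholder character instead of deleting keeps the letters apart.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Deleting a matched symbol makes its neighbours adjacent, which can
# create a spurious new symbol across the gap.
my $word = "interconvertibilities";

(my $deleted = $word) =~ s/er//;    # remove the first "er" outright
(my $masked  = $word) =~ s/er/*/;   # replace it with a placeholder instead

print "deleted: $deleted\n";        # intconvertibilities
print "masked:  $masked\n";         # int*convertibilities
print "'tc' spans the gap after deletion\n"       if $deleted =~ /tc/;
print "'tc' cannot form across the placeholder\n" unless $masked =~ /tc/;
```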

On the other hand, it’s probably worth verifying other people’s lists, and it’s an interesting problem. Indeed, a bit more thought revealed that it is more complex than I had assumed. The problem is that there are not just single-letter symbols, but two-letter ones as well. The end of, say, “Mo” (Molybdenum) is the same as the start of, say, “Os” (Osmium), so in a word like “almost”, should you match the “Mo” first, or the “Os”? In some cases, that choice decides whether the rest of the word can then be made. In other words, the order in which you try to match symbols, and the various possible locations of different symbols within each word, make a big difference.
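A minimal sketch of why the choice matters, assuming the current 118 IUPAC symbols (hardcoded below, so possibly slightly different from the list behind this post's counts). The hypothetical `greedy_spell` always grabs the longest symbol that fits and never reconsiders; the hypothetical `can_spell` backtracks, trying every symbol length at each position before giving up. Both expect lowercase words.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Assumption: the current 118 IUPAC element symbols, lowercased.
my %is_symbol = map { lc($_) => 1 } qw(
  H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
  Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I
  Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt
  Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr
  Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
);
my $max_len = max(map { length($_) } keys %is_symbol);   # 2 for this list

# Strawman: take the longest symbol that fits at the front, never backtrack.
sub greedy_spell {
  my ($w) = @_;
  WORD: while (length $w) {
    for my $len (reverse 1 .. $max_len) {
      next if $len > length $w;
      if ($is_symbol{substr($w, 0, $len)}) {
        $w = substr($w, $len);
        next WORD;
      }
    }
    return 0;    # nothing fits at the front: greedy is stuck
  }
  return 1;
}

# Backtracking: try each symbol length at each position, and only give
# up once every alternative for the rest of the word has failed.
sub can_spell {
  my ($w) = @_;
  return 1 if $w eq '';
  for my $len (1 .. $max_len) {
    last if $len > length $w;
    return 1 if $is_symbol{substr($w, 0, $len)} && can_spell(substr($w, $len));
  }
  return 0;
}

for my $w (qw(crab cobra almost)) {
  print "$w: greedy ", greedy_spell($w), ", backtracking ", can_spell($w), "\n";
}
# crab: greedy 0, backtracking 1
# cobra: greedy 0, backtracking 1
# almost: greedy 0, backtracking 0
```

Greedy takes “Cr” from “crab” and is left with the impossible “ab”, whereas backtracking finds C, Ra, B. (As it happens, “almost” fails either way: after Al and Mo, the final “t” has no symbol.)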

I suspect that means that you potentially have to test all the possible locations and permutations before ruling out a word. That sounds quite hard. And then there’s the additional check to see if you can form the word without using the same element twice – i.e. without replacing a tile once you’ve used it up. There’s probably a nifty computer-scientist way of doing it, but here’s my rather clunky effort. Run it as a Perl program, with the file “ElmtSymbols.txt” listing the symbols for the chemical elements one per line, and your dictionary of words, one per line, in “Dictionary.txt”. It should spew out valid “periodic table” words, with an asterisk beside those that can be formed without replacement. But I’m still not 100% convinced my algorithm is correct:

#!/usr/bin/perl
use strict;
use warnings;

#For a given set of abbreviations (eg symbols for chemical elements)
#find all the English words that can be spelled out using them
open(ABBR,"<:crlf","ElmtSymbols.txt")||die("Couldn't open ElmtSymbols.txt: $!");
open(DICT,"<:crlf","Dictionary.txt")||die("Couldn't open Dictionary.txt: $!");
chomp(my @symb = map(lc,<ABBR>)); #make sure symbols are lowercase
close(ABBR);
@symb = grep { length($_) } @symb; #skip any blank lines in the symbol file

sub contains_with_replacement {
  #check if a word contains the element symbols, allowing
  #replacement. This works by snipping element symbols off the
  #front of each word, until we are (hopefully) left with nothing
  my @check = ($_[0]);
  my $sym_ref = $_[1];
  my %seen; #remainders already queued, so none is checked twice
  while (@check) {
    my $curr = shift(@check); #grab the next remainder off the pile
    foreach my $s (@$sym_ref) { #try each symbol in turn
      if ($curr =~ m/^\Q$s\E(.*)$/i) { #match at start of the word
        if (length($1)) { #end of word still exists
          push(@check,$1) unless $seen{$1}++; #stick remainder into pile to check
        } else {
          return(1); #hurrah, we've reduced one to length = 0
        }
      }
    }
  }
  return(0); #every way of snipping symbols off got stuck
}

sub contains {
  #check if a word contains the element symbols, without replacement
  # (i.e. all symbols only used 0 or 1 times). Works by collecting
  # variants: chem symbols within the word at each potential match
  # point get alternately killed (all that remains in place is "-")
  my @check = ($_[0]); #start with variant to check - the full word
  my $sym_ref = $_[1];
  foreach my $s (@$sym_ref) { #match each symbol in turn, then stop
    #count down from the last variant that existed before this symbol, so
    #variants created using $s are never matched against $s again
    for (my $i = $#check; $i >= 0; $i--) {
      while ($check[$i] =~ m/\Q$s\E/gip) { #loop over possible locations
        my $remains = ${^PREMATCH}."-".${^POSTMATCH}; #kill symbol
        #print " $remains"; #for debugging: show how algorithm works
        return(1) if ($remains =~ m/^-+$/); #YES! all chars replaced
        push(@check, $remains); #add variant for future checking
      }
    }
  }
  return(0);
}

while (my $dictword = <DICT>) {
  chomp($dictword);
  next unless length($dictword); #skip blank lines
  if (contains_with_replacement($dictword, \@symb)) {
    print $dictword;
    print " *" if (contains($dictword, \@symb)); #also works without replacement
    print "\n";
  }
}
close(DICT);
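One way to cross-check `contains()` is a different technique altogether: a depth-first backtracking search that tiles the word from left to right, trying each symbol length at each position and recording which symbols are already in use, so no tile is taken twice. Any valid spelling is such a left-to-right tiling, so an exhaustive search of this kind cannot miss one. This is a sketch, not the method used in the script above; the names `spell_once` and `contains_once` are hypothetical, and the hardcoded list is assumed to be the current 118 IUPAC symbols.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Assumption: the current 118 IUPAC element symbols, lowercased.
my %is_symbol = map { lc($_) => 1 } qw(
  H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni
  Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I
  Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt
  Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr
  Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og
);
my $max_len = max(map { length($_) } keys %is_symbol);

# Can lowercase $w be tiled left to right by element symbols, each used
# at most once? %$used holds the symbols already placed on this path.
sub spell_once {
  my ($w, $used) = @_;
  return 1 if $w eq '';                  # every letter covered
  for my $len (1 .. $max_len) {
    last if $len > length $w;
    my $sym = substr($w, 0, $len);
    next unless $is_symbol{$sym} && !$used->{$sym};
    $used->{$sym} = 1;                   # take this tile...
    my $ok = spell_once(substr($w, $len), $used);
    delete $used->{$sym};                # ...and put it back before trying others
    return 1 if $ok;
  }
  return 0;                              # dead end: no unused symbol fits here
}

sub contains_once { return spell_once(lc($_[0]), {}) }

for my $w (qw(bob papa cobra hyperconsciousnesses)) {
  print "$w: ", contains_once($w), "\n";
}
# bob: 0, papa: 0, cobra: 1, hyperconsciousnesses: 1
```

“bob” and “papa” need B or Pa twice, so they only count with replacement, while “hyperconsciousnesses” can be tiled as H Y P Er Co N Sc I O U Sn Es Se S. Pasting `contains_once` into the script and printing any word where it disagrees with `contains()` would flag suspect results, provided both use the same symbol list.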

Of course, the number of words depends on the dictionary you use. As of 2012 many online word lists such as ENABLE and ABLE are rather difficult to find, and the version of SOWPODS I’ve found doesn’t have really long words. It seems like the nicest freely available word lists to use at the moment are those from SCOWL. This is a set of lists that can be combined to produce different levels and localizations of spellings. I’ve used the combined American, British, and Canadian spellings (plus variants) for all basic words up to SCOWL level “80” (described as “huge” but not “insane”), which gives 254301 words[b].

Running my program on this list produces 38163 hits. In other words, almost exactly 15% of the English words in my extensive list are valid to use for “periodic table Scrabble” (38163/254301 = 0.1500702). Of these, the longest is “floccinaucinihilipilification”[c]. As for words found without reusing symbols, I find 27752 (just under 11%), the longest of which are “hyperconsciousnesses” and “hypercoagulabilities”, as also found in “Nandor’s Exhaustive Chemical Words Pages”.
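Figures like these can be tallied straight from the script's output. The sketch below assumes the output format described earlier (one word per line, with " *" appended to words that work without reuse); the file name in the usage comment is hypothetical, and the dictionary size is simply the number quoted in this post. On ties for the longest word it keeps the first one it sees.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally output lines of the form "word" or "word *".
# Returns (hits, hits without reuse, longest hit, fraction of dictionary).
sub summarise {
  my ($lines, $dict_size) = @_;
  my ($hits, $once, $longest) = (0, 0, '');
  for my $line (@$lines) {
    chomp(my $word = $line);
    next unless length $word;
    my $starred = ($word =~ s/ \*$//);   # strip the marker, noting if it was there
    $hits++;
    $once++ if $starred;
    $longest = $word if length($word) > length($longest);
  }
  return ($hits, $once, $longest, $hits / $dict_size);
}

# Usage sketch (hypothetical file name): perl summarise.pl hits.txt
if (@ARGV) {
  my $dict_size = 254301;               # dictionary size quoted in this post
  my @lines = <>;
  my ($hits, $once, $longest, $frac) = summarise(\@lines, $dict_size);
  printf "%d of %d words (%.2f%%), %d without reuse (%.2f%%), longest: %s\n",
    $hits, $dict_size, 100 * $frac, $once, 100 * $once / $dict_size, $longest;
}
```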

As a check, I’ve compared my list against his. The 148 words in his list but not in mine are mostly ones with accents (e.g. attaché, café, vicuña, soupçon). That’s because SCOWL records accents, whilst the ENABLE list doesn’t. On the other hand, there are 11117 words in my list but not in his, mostly words I had never heard of (“acarbose”, “acarophobias”) but a few I had (“acausal”, “amygdal”, “cardboardy”, “biccies”, “hackery”). Anyway, I hope I’ve provided any reader of this post with just about enough technical detail to create such a list themselves, if anyone wants to take it further.
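A comparison like this boils down to a set difference in each direction. The sketch below uses a hash lookup; the helper names and file names are hypothetical, and it assumes both lists are in the same character encoding, since words are compared byte by byte and an accented word will never match its unaccented spelling (which is exactly what shows up in the difference described above).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read a word list, one word per line, ignoring blank lines.
sub read_words {
  my ($path) = @_;
  open(my $fh, '<:crlf', $path) or die "Couldn't open $path: $!";
  chomp(my @words = <$fh>);
  close($fh);
  return grep { length($_) > 0 } @words;
}

# Words in @$list_a but not in @$list_b (case-insensitive), sorted, each once.
sub only_in {
  my ($list_a, $list_b) = @_;
  my %in_b = map { lc($_) => 1 } @$list_b;
  my %seen;
  my @only = grep { !$in_b{lc($_)} && !$seen{lc($_)}++ } @$list_a;
  return sort @only;
}

# Usage sketch (hypothetical file names): perl compare.pl mine.txt his.txt
if (@ARGV == 2) {
  my @mine = read_words($ARGV[0]);
  my @his  = read_words($ARGV[1]);
  my @mine_only = only_in(\@mine, \@his);
  my @his_only  = only_in(\@his, \@mine);
  printf "%d only in %s, %d only in %s\n",
    scalar(@mine_only), $ARGV[0], scalar(@his_only), $ARGV[1];
}
```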

Notes

a.  I had mistakenly adopted the technique of removing element symbols piece by piece from the word. For example “interconvertibilities” minus “Er” turns to “intconvertibilities”, then minus “Tc” gives “inonvertibilities”, and so on. That isn’t a valid way to do it. I should at least have put a replacement unused character, like an asterisk, in each place.
b. These include some with apostrophes and hyphens, but supposedly no abbreviations or proper names. I tweaked the list to remove words with apostrophes and hyphens, which you can get by downloading SCOWL, finding the folder full of word lists (names like “british-words.70”), and running something like the following on the (bash) command-line, assuming you have a computer running Unix of some description, e.g. Linux, BSD, Mac OS X, etc.

export LC_ALL=en_GB.ISO8859-1 #make "sort" work appropriately
#omit variant 2 and level 95 words, plus words with hyphens and apostrophes
cat *[!2]-*words*.[!9]? | grep -v "[-']" | sort -u > Dictionary.txt

c. I’ve seen some word lists which pluralize this too, making a word that’s one letter longer.

6 thoughts on “Periodic table Scrabble”

  1. Hi Yan, Nice write up. I will like to have a periodic table scrabble. let me know as soon as a software or game becomes available. thank you

  2. You can just let posix regular expressions do all the heavy lifting for you, in the case where you don’t mind replacement. This is about 5 times as fast as your contains_with_replacement, on my machine:

    my $regex_constructor = join('|', @symb);
    my $regex = qr/^($regex_constructor)+$/;

    sub cwr_fast {
    my $word = lc(shift);
    return $word =~ m/$regex/;
    }

    • Thanks. A very good point – I suspect that’s the “nifty computer-scientist way of doing it” that I guessed might be possible. I can’t think of how to do the without_replacement example with RegExpr parsing, unfortunately.

      Good to know that the RE engines are nicely optimised for speed too!
