Versatility
Dec. 22nd, 2012 08:44 pm
What is the most versatile set of three distinct letters that one can have in Scrabble?
This is the kind of thing that randomly goes through my head from time to time. I was wondering if there was any set of 3 letters for which all six unique permutations of the letters are actual English words.
So for example, if you have the letters PTO, you can write POT, TOP, and OPT, but TPO, PTO, and OTP are not real words (fandom acronym appropriations notwithstanding).
What is the most permutable set of three English letters? Are there any that have all six possibilities covered? Using the list at this site, I did some research and some data manipulation in R to find the answer.
The Data Set:
> tlw = read.table("threeletterwords.txt")
> tlw[,1] = as.character(tlw[,1])
> tlw[1:10,]
[1] "AAL" "AAS" "ABA" "ABO" "ABS" "ABY" "ACE" "ACT" "ADD" "ADO"
> nrow(tlw)
[1] 1011
There are 1011 valid three-letter English words. Not all of them have three unique letters, however. For example, take the first entry:
> unlist(strsplit(tlw[1,1],""))
[1] "A" "A" "L"
> table(unlist(strsplit(tlw[1,1],"")))

A L
2 1
> length(table(unlist(strsplit(tlw[1,1],""))))
[1] 2
It has 2 A's, so the number of unique characters is 2. Assigning those values to each of the entries in my data frame, using the "apply" function:
> tlw$NumUniq = apply(tlw,1,function(x){length(table(unlist(strsplit(x[1],""))))})
> tlw[1:10,]
   tlw NumUniq
1  AAL       2
2  AAS       2
3  ABA       2
4  ABO       3
5  ABS       3
6  ABY       3
7  ACE       3
8  ACT       3
9  ADD       2
10 ADO       3
> sum(tlw$NumUniq == 3)
[1] 905
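As a rough sketch of the same counting step without the data frame, here is one way to count distinct letters per word. The `words` vector is a hypothetical stand-in for the real list, and `num_uniq` is a name I made up for illustration; it uses `unique` instead of `table`, which is an alternative to what the post does, not the original method.

```r
# Hypothetical mini word list (the post reads the real one from threeletterwords.txt)
words <- c("AAL", "ABO", "ADD", "ACE")

# Split each word into characters and count how many distinct letters it has
num_uniq <- vapply(strsplit(words, ""),
                   function(ch) length(unique(ch)),
                   integer(1))

# Keep only the words whose three letters are all different
distinct_words <- words[num_uniq == 3]
print(distinct_words)
```

`vapply` is used here only because it checks that every result is a single integer; `sapply` would also work for a quick look.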
Out of the 1011 words, 905 of them have 3 distinct letters. Snag them and rearrange the letters so that any multiples will be sorted the same way: e.g., OPT, TOP, and POT all get mapped to the alphabetically ordered "OPT". This is done by splitting the word into its constituent characters, sorting the characters, and pasting them back together into the new word.
> tlw.uniq=tlw[(tlw$NumUniq == 3),1]
> tlw.uniq.sort = unlist(lapply(tlw.uniq, function(x){paste(sort(unlist(strsplit(x,""))), collapse="" )}))
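To make the split-sort-paste step concrete, here is a small sketch on a made-up six-word list. The helper name `sort_letters` and the `words` vector are assumptions for illustration only; the logic mirrors the one-liner above.

```r
# Hypothetical sample: two anagram families
words <- c("OPT", "TOP", "POT", "ATE", "EAT", "TEA")

# Split a word into characters, sort them, and paste them back together
sort_letters <- function(w) {
  paste(sort(strsplit(w, "")[[1]]), collapse = "")
}

# Every anagram of the same letters ends up with the same key
keys <- vapply(words, sort_letters, character(1), USE.NAMES = FALSE)
print(keys)
```

Words that are anagrams of each other share a key, so counting keys is the same as counting valid permutations.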
Now we have a list of 905 sorted letter sequences, which will contain repeats of a sequence whenever that sequence makes more than one word. We just have to tabulate them:
> tlw.uniq.sort.count = rev(sort(table(tlw.uniq.sort)))
> length(tlw.uniq.sort.count)
[1] 640
So there are fewer unique 3-letter groups than words, owing to letter sets like OPT, PIN, BAG, etc. that each make more than one word. The 905 words fall into 640 categories. Do any have six permutations?
> table(tlw.uniq.sort.count)
tlw.uniq.sort.count
  1   2   3   4   5
422 175  40   2   1
No! About 66% of the combinations actually have only 1 valid word (422/640). The highest we have is one set with 5 permutations, and then we have 2 sets with 4 permutations, and we jump all the way to 40 sets with 3 (and 175 with 2). ETA: I should also say, there are 26 choose 3 = 2600 different ways to select three unique letters out of 26, so there should be a 0 spot there, with a count of 1960 letter combinations with no valid words.
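The arithmetic here can be checked in a couple of lines. This is just a sketch that plugs in the counts reported above (640 groups, 422 singletons); the variable names are mine.

```r
# Ways to pick 3 distinct letters out of 26, ignoring order
n_combos <- choose(26, 3)

# Counts taken from the tabulation in the post
n_with_words <- 640   # letter sets that make at least one word
n_single     <- 422   # letter sets that make exactly one word

# Letter sets that make no valid word at all
n_without <- n_combos - n_with_words

# Share of word-making sets that make only one word
share_single <- n_single / n_with_words

print(c(n_combos, n_without))
print(round(share_single, 2))
```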
Any guesses as to the three most popular?
> tlw.uniq.sort.count[1:3]
tlw.uniq.sort
AET APS AHS
  5   4   4
So that's the answer. The most versatile set of letters is:
AET: for which ATE, EAT, ETA, TAE, and TEA are all valid words
Runners up:
APS: for which ASP, SAP, PAS, and SPA are all valid words. I think, give it a few more years, and this group will join into the 5-permutation crowd, since we are well on the way to APS being a word.
AHS: for which AHS, ASH, HAS, and SHA are all words.
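The counts say how many words each group has, but not which ones. One possible way to recover the actual words is `split`, sketched here on a hypothetical subset of the list (the `words` vector and the names `keys` and `groups` are assumptions, not objects from the session above).

```r
# Hypothetical subset of the word list, for illustration only
words <- c("ATE", "EAT", "ETA", "TAE", "TEA",
           "ASP", "SAP", "PAS", "SPA",
           "POT", "TOP", "OPT")

# Sorted-letter key for each word
keys <- vapply(strsplit(words, ""),
               function(ch) paste(sort(ch), collapse = ""),
               character(1))

# Group the original words by key, then put the biggest groups first
groups <- split(words, keys)
groups <- groups[order(lengths(groups), decreasing = TRUE)]

print(groups[["AET"]])
```

With the full list, the same idea would presumably be `split(tlw.uniq, tlw.uniq.sort)`, since those two vectors line up element by element.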
So there's the answer. No set of three letters is fully permutable, and only a few groups come close. I wonder what 4-letter words would look like!
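For anyone who wants to try the 4-letter version, the same steps should carry over to any word length. Below is a sketch only: the function name `anagram_group_sizes` is made up, and the input is a tiny hypothetical sample rather than a real 4-letter word list, so it says nothing about the actual answer.

```r
# Count how many words share each set of distinct letters, for any word length
anagram_group_sizes <- function(words) {
  chars <- strsplit(words, "")
  # Keep only words whose letters are all different
  distinct <- vapply(chars,
                     function(ch) length(unique(ch)) == length(ch),
                     logical(1))
  # Sorted-letter key for each remaining word
  keys <- vapply(chars[distinct],
                 function(ch) paste(sort(ch), collapse = ""),
                 character(1))
  sort(table(keys), decreasing = TRUE)
}

# Hypothetical sample input; BOOK is dropped because of its repeated O
four <- c("STOP", "POTS", "TOPS", "SPOT", "OPTS", "POST",
          "BOOK", "MEAT", "TEAM", "MATE", "TAME")
sizes <- anagram_group_sizes(four)
print(sizes)
```

Note that four distinct letters have 24 possible orderings, so a fully permutable set would be a much taller order than with three.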
This post brought to you by procrastination and geekery.
Date: 2012-12-23 02:04 am (UTC)
I should really start writing fic now...
Date: 2012-12-24 06:24 am (UTC)
This also reminds me why I stuck to C++ and java for my languages XD
Date: 2012-12-24 03:05 pm (UTC)
C++ I learned first, but I still don't know Java. I like R because it has built-in functions for the statistically-minded, and because it is way easier to do in a command-line environment. It's like python but souped up. Except it's written by statisticians not computer scientists, so you need to avoid long for loops if you can (functional languages and maps, hooray!).
Date: 2012-12-25 08:42 pm (UTC)
I don't know Java too well either other than just enough to barely scrape by. Haha, that does make it a much more appealing language than something like C++ if it's meant for statisticians!
Date: 2012-12-25 04:47 am (UTC)
The maths involved in solving this query, however, are making my eyes and brains wobble. I'm ever so impressed by it, and also happy that there are people in the world who will write whole big equations like this to scientifically pin down little niggly questions just for the fun of it.
(and I never, in a million years, would've guessed H into one of the most useful trilogies)
Date: 2012-12-25 03:08 pm (UTC)
Most of it is just number crunching in R. A bit of math involved in permutations and combinations.
Merry Christmas!