# Matt Parker's Wordle challenge - the *pythonic vowel approach*

(c) by Thomas Reichert

## License

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, see http://www.gnu.org/licenses/.

## The code
*If you run this notebook, please run the code cell first and wait patiently for a few minutes: the values in the following markdown sections only appear properly once it has finished ;-)*

In [1]:
from tqdm import tqdm # only used to feel better because 'something happens on the screen all the time' ;-)
def cmb(x, l=10): # function that combines words recursively to tuples, triples, quadruples and quintuples
    return cmb([{'_'.join([kx, ky]): vx|vy for kx, vx in tqdm(x.pop(0).items()) for ky, vy in x[0].items()
                 if (len(vx|vy)==l)}] + x[1:], l+5) if (len(x) > 1) else x[0]
valid = {w: s for w in open('words_alpha.txt').read().split('\n') if (len(w)==5) and (len(s:=set(w))==5)} 
valid = {n: {k: v for k,v in valid.items() if (len(v & set(vwl:='aeiou'))<=n)} for n in [0, 1, 2, 3]}
vowels = {vw: {k: v for k,v in valid[1].items() if len(set(i for i in vwl if i!=vw)&v)<=0} for vw in vwl} 
result = {**(zero:=cmb([vowels[v] for v in 'uieoa'])), **(two:=cmb([valid[n] for n in [0, 0, 1, 3, 3]])),
          **(one:=cmb([valid[n] for n in [0, 1, 1, 1, 2]]))} # yes the order in each cmb matters speedwise
open('result.csv', 'w').write('\n'.join(result:=set(','.join(sorted(k.split('_'))) for k in result.keys())))

100%|██████████| 431/431 [00:00<00:00, 3824.62it/s]
100%|██████████| 47924/47924 [00:05<00:00, 8631.54it/s] 
100%|██████████| 497833/497833 [01:09<00:00, 7130.84it/s]
100%|██████████| 167055/167055 [00:33<00:00, 4932.52it/s]
100%|██████████| 26/26 [00:00<00:00, 132505.35it/s]
100%|██████████| 10/10 [00:00<00:00, 1592.07it/s]
100%|██████████| 282/282 [00:00<00:00, 351.69it/s]
100%|██████████| 7324/7324 [00:19<00:00, 384.29it/s]
100%|██████████| 26/26 [00:00<00:00, 1314.54it/s]
100%|██████████| 6546/6546 [00:03<00:00, 1755.71it/s]
100%|██████████| 128290/128290 [01:37<00:00, 1314.76it/s]
100%|██████████| 73434/73434 [02:50<00:00, 430.36it/s]


24929

## What the code actually does

The goal of this Jupyter notebook is to solve Matt Parker's Wordle challenge **with my personal restriction that everything has to happen in 10 lines of basic Python code, using a single thread and no additional packages that might speed things up**. The motivation: if one works almost exclusively with Python, as yours truly does as a data scientist and certified actuary, the fact that this problem can be solved more than 1000 times faster in e.g. C++ doesn't help a lot, since one could not spontaneously write code in it. So Python it is ;-)

The only package being imported is tqdm, which doesn't help the algorithm at all but just shows progress to make us humans feel better because 'something happens on the screen all the time' ;-). While clearly way off the results from the super fast compiled languages, I was at least able to get a runtime of a bit below 7 minutes on an M1 MacBook Pro. That beats Benjamin Paassen's graph theory approach, which is the fastest pure Python approach that I am aware of (sorry if somebody has meanwhile written a faster one and I missed it) and served as my benchmark, by approximately a factor of 3 ;-)

The basic idea of my approach: as the runtime of a search over all possible combinations of five words with all distinct letters is, in the worst case, $O(n_1 \cdot n_2 \cdot n_3 \cdot n_4 \cdot n_5)$, we must keep the numbers of candidate words $n_i$ for the first, second, etc. word as small as possible. Hence the first thing to do is to get rid of as many words as possible beforehand and to divide the problem into subgroups that are as small as possible.
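To illustrate why the sizes $n_i$ matter, here is a minimal sketch of a single combination step, written as explicit loops. It is meant as an unrolled version of one level of `cmb`, not a replacement for it; the name `combine` and the toy words are assumptions made for this example only.

```python
def combine(left, right, expected_letters):
    """Join every entry of `left` with every entry of `right` that shares no letter.

    Both arguments map a key (one or more words joined by '_') to its set of letters.
    The inner test runs len(left) * len(right) times, which is why small lists pay off.
    """
    out = {}
    for kx, vx in left.items():
        for ky, vy in right.items():
            union = vx | vy
            # The union only reaches the expected size if no letter is shared.
            if len(union) == expected_letters:
                out[kx + "_" + ky] = union
    return out


# Toy input (hypothetical, not taken from words_alpha.txt as a whole).
first = {w: set(w) for w in ["crypt", "glyph"]}
second = {w: set(w) for w in ["bands", "jumbo", "light"]}

pairs = combine(first, second, 10)
print(sorted(pairs))
```

In this toy run, `light` is rejected for both first words because it shares letters with each of them, so only four of the six candidate pairs survive to the next level.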

We can also make use of the fact that there are only {{ len(valid.get(0)) }} valid words such as crypt, glyph or nymph in the word list that contain none of the five vowels a, e, i, o, or u. Since five words with all distinct letters have only five vowels to share between them, a combination can only contain a word with two vowels if at least one of the other words contains none. As can easily be seen (almost all of these vowelless words contain either an x or a y), at most two words from that vowelless list can be combined.
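To make the vowel bookkeeping concrete, here is a minimal sketch on a hand-picked sample of six words. The sample list and the helper `vowel_count` are assumptions for illustration; the notebook itself applies the same kind of test to the full `words_alpha.txt`.

```python
VOWELS = set("aeiou")  # 'y' counts as a consonant here, as in the notebook


def vowel_count(word):
    """Number of distinct vowels in `word`."""
    return len(set(word) & VOWELS)


# Hand-picked sample: three vowelless words plus three words from the result list.
sample = ["crypt", "glyph", "nymph", "brick", "vejoz", "waltz"]

no_vowel = [w for w in sample if vowel_count(w) == 0]
by_count = {w: vowel_count(w) for w in sample}
print(no_vowel)
print(by_count)
```

A word like `vejoz` uses up two of the five vowels, so the remaining four words of a combination have at most three vowels left between them, and at least one of them has to be vowelless.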

So we solve the problem in three parts:

* Five words where each contains none or one vowel -- here we can even split the word list by vowel to speed things up: The numbers of max-one-vowel words are {{ {k: len(v) for k, v in vowels.items()} }}.
* One out of {{ len(valid.get(0)) }} words that contains no vowel, three out of {{ len(valid.get(1)) }} words that contain at most one and one out of {{ len(valid.get(2)) }} words that contains at most two vowels
* Two out of {{ len(valid.get(0)) }} words that contain no vowel, one out of {{ len(valid.get(1)) }} words that contains at most one and two out of {{ len(valid.get(3)) }} words that contain at most three vowels

and put everything together to find the same {{ len(result) }} combinations that Benjamin found.
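As a sanity check on this three-way split, the following sketch enumerates every possible distribution of vowels over five words and tests whether one of the three searches covers it. It rests on three assumptions: five words with all distinct letters use 25 of the 26 letters, so they contain four or five of the vowels in total; at most two vowelless words can be combined, as argued above; and the per-vowel split of the first search is ignored, so only the upper bounds on the number of vowels per word are compared. The names `cases` and `covered` are made up for this example.

```python
from itertools import combinations_with_replacement

# Upper bounds on the number of vowels per word in the three searches.
cases = {
    "zero": [1, 1, 1, 1, 1],
    "one": [0, 1, 1, 1, 2],
    "two": [0, 0, 1, 3, 3],
}


def covered(pattern, bounds):
    """True if the words can be ordered so that each one stays within its bound."""
    return all(p <= b for p, b in zip(sorted(pattern), sorted(bounds)))


# All sorted vowel-count patterns: 4 or 5 vowels in total, at most two vowelless words.
patterns = [
    p
    for p in combinations_with_replacement(range(6), 5)
    if sum(p) in (4, 5) and p.count(0) <= 2
]

uncovered = [p for p in patterns if not any(covered(p, b) for b in cases.values())]
print(patterns)
print(uncovered)
```

Under these assumptions there are six patterns, and each of them fits within the bounds of at least one of the three searches, so no combination should be lost by the split.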

## Final thoughts

Even though this approach is clearly no speed champ, my hope is that this *vowel-based reduction of the possible word combinations that need to be checked* can help somebody else's super fast algorithm save even a bit more time ;-)

What surprised me in the process is that only {{ len(zero) }} out of these {{ len(result) }} combinations contain none of those {{ len(valid.get(0)) }} vowelless words. Originally I thought that combinations with vowelless words would be the exception rather than the rule, but I had to learn that using one of them allows another word in the combination to contain two vowels, and words with at most two vowels are {{ round(len(valid.get(2))/len(valid.get(1)),1) }} times more frequent than words with at most one vowel.

## The result
Here are the word combinations:

In [2]:
print(len(result))
print(result)

831
{'brahm,fldxt,pungs,vejoz,wicky', 'chowk,fldxt,jambe,supvr,zingy', 'bhang,fldxt,rumpy,swick,vejoz', 'fldxt,quack,verby,whomp,zings', 'chirp,fldxt,gumby,swank,vejoz', 'bumpy,flang,hdqrs,twick,vejoz', 'fldxt,gconv,jerky,sabzi,whump', 'fldxt,gumby,snick,vejoz,wharp', 'bungy,fldxt,pashm,vejoz,wrick', 'becks,fldxt,ginzo,jarvy,whump', 'backy,fldxt,rings,vejoz,whump', 'busky,chawn,fldxt,grimp,vejoz', 'bumph,cawky,fldxt,grins,vejoz', 'chawk,fldxt,gimps,runby,vejoz', 'brave,fldxt,jocks,whump,zingy', 'bizen,fldxt,gucks,jarvy,whomp', 'bumps,fldxt,hacky,vejoz,wring', 'brick,fldxt,mungy,vejoz,whaps', 'fldxt,gawby,runch,skimp,vejoz', 'fldxt,gumby,spink,vejoz,warch', 'bingy,fldxt,rucks,vejoz,whamp', 'bunks,fldxt,gimpy,vejoz,warch', 'brock,japyx,seqwl,vingt,zhmud', 'bumpy,fldxt,gnash,vejoz,wrick', 'fldxt,grabs,nicky,vejoz,whump', 'crisp,fldxt,gumby,vejoz,whank', 'fldxt,grovy,jumps,whack,zineb', 'flong,japyx,twick,verbs,zhmud', 'chump,fldxt,kirby,swang,vejoz', 'fcomp,gunky,hdqrs,vibex,waltz', 'bump