# Summary

Wordle rules:

1. A guessed letter is marked 'yellow' if it appears anywhere in the target word, and 'green' if it is also in the correct position.
1. You get six guesses.

Each word has peers, which are similar words that include some letters but exclude others.
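
To make the rules concrete, here is a minimal sketch of how the feedback for one guess could be computed. The function name `score_guess` and the 'gray' label are assumptions for illustration and are not part of this notebook. The sketch follows the simplified rule above and does not model how the real game treats repeated letters.

```python
def score_guess(guess: str, target: str) -> list:
    """Return one color per guessed letter, using the simplified rules above."""
    colors = []
    for guessed, actual in zip(guess, target):
        if guessed == actual:
            colors.append("green")   # right letter, right position
        elif guessed in target:
            colors.append("yellow")  # letter occurs somewhere in the target
        else:
            colors.append("gray")    # letter does not occur in the target
    return colors

print(score_guess("adieu", "onset"))
# ['gray', 'gray', 'gray', 'green', 'gray']
```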

### Credits

Note that the strategy is from Tiffany Chen. I just wrote the Python code. I wasted two hours this morning because I did not even understand the basic rules of Wordle. 


### References

A great example of constraint propagation is [Peter Norvig solving all the Sudoku puzzles](https://norvig.com/sudoku.html). The [OR-Tools Constraint Optimization guide from Google](https://developers.google.com/optimization/cp) is a learning opportunity for me. We can make a more general connection to the [boolean satisfiability problem](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem). As a higher level goal, I would like to understand if these approaches can help improve AI interpretability/explainability.

### Dependencies and Constants

Keep track of these for maintainability of code.

In [1]:
from collections import defaultdict
import json
import random
import math

In [2]:
CONSONANTS = set(['b','c','d','f','g','h','j','k','l','m','n','p','q','r','s','t','v','w','x','z'])
VOWELS = set(['a','e','i','o','u','y'])

## Wordle Words

Wordle words are from the word list found by [Robert Reichel, (Reverse Engineering Wordle)](https://reichel.dev/blog/reverse-engineering-wordle.html). I retrieved the list by using Chrome Dev Tools, parsing the JavaScript for the `var La` array. I also found a `Ta` array containing a second list of words. Reichel appears to focus on the `La` array, so I do the same here. Screenshots of parsing the JavaScript for the word lists:

![la_wordle_words](var_la.png)

![ta_wordle_words](var_ta.png)

In [3]:
with open("la.json", "r") as f:
    data = json.load(f)

## Wordle Game

We replicate the Wordle game so we can run simulations.

In [4]:
def wordle_word(data, seed=42):
    """ Word of the day. """
    random.seed(seed)
    return data[random.randint(0,len(data)-1)]
    

In [5]:
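def play_wordle(guess:str, word_of_the_day:str) -> tuple:
    """ Return (board, has_letters): green letters by position, plus all guessed letters found in the word. """
    board = []
    correct_letters = set(word_of_the_day)
    # letters shared by the guess and the answer, regardless of position
    has_letters = set(guess) & correct_letters
    for correct_letter, guessed_letter in zip(word_of_the_day, guess):
        if correct_letter == guessed_letter:
            board.append(guessed_letter)  # right letter, right position
        else:
            board.append("")
    return board, has_letters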
        

In [6]:
example_wordle_word = wordle_word(data)
example_guess = "adieu"
print("example guess '{}'\nexample wordle '{}'".format(example_guess, example_wordle_word))

example guess 'adieu'
example wordle 'onset'


In [7]:
play_wordle(example_guess, example_wordle_word)

(['', '', '', 'e', ''], {'e'})

## Global Distribution of Vowels & Consonants

We want to maximize the number of correct letters guessed even if 
the order is not apparent. We do this by generating frequencies of
vowels and consonants from the Wordle words. Our general strategy
is to narrow down candidates by leveraging the global distribution
of letters within Wordle words.
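
As a small illustration of the idea, the sketch below counts how many words each letter appears in. It uses a tiny hypothetical word list (`sample_words`) as a stand-in for the full Wordle list, so the counts are not the real ones.

```python
from collections import Counter

# Hypothetical stand-in for the full Wordle word list.
sample_words = ["onset", "audio", "atone", "forth"]

# Count each letter once per word, so repeated letters are not double counted.
letter_counts = Counter(letter for word in sample_words for letter in set(word))

# Keep only the vowels, ordered from most to least frequent.
vowels = set("aeiouy")
vowel_counts = {k: v for k, v in letter_counts.most_common() if k in vowels}
print(vowel_counts)
```

In this sample, 'o' appears in all four words, so a first guess containing 'o' would be informative for any of them.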

In [8]:
def most_shared(data:list, has_vowels=True):
    """ Rank words by how many distinct vowels (or consonants) they contain, and count how many words use each letter. """
    possible = []
    letters = defaultdict(int)
    shared = VOWELS if has_vowels else CONSONANTS
    for word in data:
        overlap = set(word) & shared
        if overlap:
            possible.append((word, len(overlap)))
        # count each letter once per word
        for letter in overlap:
            letters[letter] += 1
            
    letters = {k:v for k,v in sorted(letters.items(), key=lambda x: x[1], reverse=True)}
    possible = sorted(possible, key=lambda x: x[1], reverse=True)
    return possible, letters

In [9]:
possible_vowels, usage_vowels = most_shared(data, has_vowels=True)
possible_vowels[:5]

[('bayou', 4), ('audio', 4), ('quiet', 3), ('adobe', 3), ('repay', 3)]

In [10]:
possible_conson, usage_conson = most_shared(data, has_vowels=False)
possible_conson[:5]

[('blush', 4), ('dwarf', 4), ('stink', 4), ('bench', 4), ('fresh', 4)]

In [11]:
usage_vowels

{'e': 1056, 'a': 909, 'o': 673, 'i': 647, 'u': 457, 'y': 417}

## Establish a Set of Candidates

We use all of the Wordle words as our starting set of candidates.

In [12]:
# Establish set of candidates
cands = {k: set(k) for k in data}

len(cands)

2315

## Assignment Strategy

We're assigning guesses based on most frequent vowels, followed by most frequent consonants. In the event that we guess the correct position of a letter, we will also assign based on that letter position within the game board.

In [13]:
def break_ties(cands, usage):
    # break ties by scoring with the most frequent letters in `usage`
    # (the least frequent letter scores 0, the most frequent scores highest)
    uv = {x[0]: idx for idx, x in enumerate(reversed(list(usage.items()))) }
    for bucket, words in cands.items():
        scored = []
        for word in words:
            scored.append((word, sum([uv.get(i,0) for i in word])))
        scored = [i[0] for i in sorted(scored, key=lambda x: x[1], reverse=True)]
        cands[bucket] = scored
    return cands
    
def feature_bucket(cands:dict, usage:dict, has_vowels=True) -> dict:
    # group words by how many distinct vowels (or consonants) they contain
    buckets = defaultdict(list)
    cset = CONSONANTS
    if has_vowels:
        cset = VOWELS
    for word, wordset in cands.items():
        buckets[len(wordset & cset)].append(word)
    buckets = break_ties(buckets, usage)
    return buckets 

In [14]:
feature_bucket(cands, usage_vowels, has_vowels=True)[4][:5]

['audio', 'bayou']

In [15]:
play_wordle("aeiou", example_wordle_word)

(['', '', '', '', ''], {'e', 'o'})

In [16]:
board, found = play_wordle("audio", example_wordle_word)
board, found

(['', '', '', '', ''], {'o'})

Here we can see that entering just the five vowels instead of a real word gets us
further along than choosing the word with the most vowels. We broke the tie
between the words with the most vowels by sorting within buckets by most frequent
vowels.
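
The tie-break between 'audio' and 'bayou' can be worked through by hand. The sketch below is an illustration under assumptions: `rank` and `tie_score` are hypothetical names, and the counts are copied from the `usage_vowels` output above. Each vowel is ranked from least frequent (0) to most frequent (5), and a word's score is the sum of the ranks of its letters.

```python
# Vowel counts copied from the `usage_vowels` output above.
usage = {'e': 1056, 'a': 909, 'o': 673, 'i': 647, 'u': 457, 'y': 417}

# Rank vowels so the most frequent one gets the highest score (y=0 ... e=5).
rank = {letter: idx for idx, letter in enumerate(sorted(usage, key=usage.get))}

def tie_score(word: str) -> int:
    """Sum the frequency ranks of a word's letters; consonants score 0."""
    return sum(rank.get(letter, 0) for letter in word)

print(tie_score("audio"), tie_score("bayou"))
# 10 8
```

'audio' scores higher because it contains 'i' (rank 2) where 'bayou' contains 'y' (rank 0), which is why it sorts first within the four-vowel bucket.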

## Constraint Propagation

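1. If a word does not contain all of the found letters, eliminate it from candidates.
1. If a word does not match a letter whose position is known, eliminate it from candidates.
1. If ties are found, break them using most frequent letters.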
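
The elimination rule can be sketched as a single predicate over a candidate word. This is an illustration only: `keep_candidate`, the word list, and the feedback values are hypothetical, chosen as if the guess 'unset' had been played against the target 'onset'.

```python
def keep_candidate(word: str, board: list, found: set) -> bool:
    """True if `word` is consistent with the found letters and known positions."""
    # Every found letter must appear somewhere in the word.
    if not found <= set(word):
        return False
    # Every known position on the board must match the word.
    return all(known == "" or known == letter for known, letter in zip(board, word))

# Hypothetical feedback after guessing 'unset' against the target 'onset'.
board = ["", "n", "s", "e", "t"]
found = {"n", "s", "e", "t"}
words = ["onset", "inset", "stone", "nests"]

print([w for w in words if keep_candidate(w, board, found)])
# ['onset', 'inset']
```

'stone' and 'nests' contain all of the found letters but fail the position check, so they are dropped.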

In [17]:
def eliminate(cands:dict, game:list, found:set) -> dict:
    """ Return a new dict of candidates consistent with the board and found letters (input is not modified). """
    remaining = {}
    for candidate_word, wordset in cands.items():
        
        # (1) Eliminate based on shared letters
        if not found <= wordset:
            continue
            
        # (2) Eliminate based on shared position
        if any(known and candidate_word[i] != known for i, known in enumerate(game)):
            continue
        remaining[candidate_word] = wordset
    return remaining

In [18]:
# test elimination step

test_cands = {
 'cigar': {'a', 'c', 'g', 'i', 'r'},
 'rebut': {'b', 'e', 'r', 't', 'u'},
 'sissy': {'i', 's', 'y'},
 'humph': {'h', 'm', 'p', 'u'},
}

test_game = ['','i','','','']
test_found = set(['i'])

eliminate(test_cands, test_game, test_found)

{'cigar': {'a', 'c', 'g', 'i', 'r'}, 'sissy': {'i', 's', 'y'}}

## Playing Wordle

We've established relevant distributions. Let's play Wordle.

In [19]:
def choose_guess(buckets:dict) -> str:
    choose = max(list(buckets.keys()))
    return buckets[choose][0]

def solved_game(board:list):
    return all([len(i) > 0 for i in board])

def report_solution(board:list, guess_attempt:int, report=True):
    h = {
        "board" : "".join(board),
        "attempts" : guess_attempt + 1,
    }
    if report:
        print("Solved game.")
        print(h)
    return h

def compute_wordle(cands:dict, example_wordle_word:str, usage_vowels:dict, usage_cons:dict, 
                   max_tries=6, report=True):
    # NOTE: usage_cons is accepted but not used yet; ties are always broken with usage_vowels.
    
    for wordle_guess_attempt in range(max_tries):
        
        # Alternate guesses: (1) find vowels, then (2) find consonants.
        # Every guess counts as one attempt, so at most max_tries guesses are made.
        find_vowels = wordle_guess_attempt % 2 == 0
        bucket = feature_bucket(cands, usage_vowels, has_vowels=find_vowels)
        guess_word = choose_guess(bucket)
        board, found = play_wordle(guess_word, example_wordle_word)
        if solved_game(board):
            return report_solution(board, wordle_guess_attempt, report)
        
        print("{}, {}".format(guess_word, len(cands)))
    
        cands = eliminate(cands, board, found)
        print("cands: {}".format(len(cands)))
        
    print("Unable to solve game")



In [20]:
compute_wordle(cands, example_wordle_word, usage_vowels, usage_conson)

audio, 2315
cands: 673
forth, 673
cands: 181
atone, 181
cands: 6
Solved game.
{'board': 'onset', 'attempts': 4}


{'board': 'onset', 'attempts': 4}

## Running Simulations

TODO: validate the quality of the solution by running simulations, to see whether the results are robust or whether more work is needed.
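
One possible shape for such a simulation is sketched below. It is only a sketch under stated assumptions: `simulate`, `naive_solver`, and `sample_words` are hypothetical and not part of this notebook, and `naive_solver` is a deliberately simple stand-in that uses only green letters. The harness plays every word in the list as the target and summarizes how many games were solved and the mean number of guesses.

```python
from statistics import mean

def naive_solver(target: str, words: list, max_tries: int = 6):
    """Hypothetical stand-in solver: guess the first remaining word, then keep
    only the words that match every green position from the feedback."""
    remaining = list(words)
    for attempt in range(1, max_tries + 1):
        guess = remaining[0]
        if guess == target:
            return attempt
        greens = [g if g == t else "" for g, t in zip(guess, target)]
        remaining = [w for w in remaining[1:]
                     if all(k == "" or k == c for k, c in zip(greens, w))]
    return None  # not solved within max_tries

def simulate(solver, words: list) -> dict:
    """Play every word as the target and summarize the results."""
    results = [solver(target, words) for target in words]
    solved = [r for r in results if r is not None]
    return {
        "games": len(results),
        "solved": len(solved),
        "mean_guesses": mean(solved) if solved else None,
    }

# Tiny hypothetical word list; the real run would use the full list.
sample_words = ["onset", "audio", "atone", "forth"]
print(simulate(naive_solver, sample_words))
# {'games': 4, 'solved': 4, 'mean_guesses': 2.5}
```

Passing the solver in as a function keeps the harness independent of any one strategy, so different guessing strategies could be compared on the same word list.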