With all the hype surrounding Wordle, I wanted to challenge myself to build an optimal Wordle solver.

I relied on the intuition and information from the Wordle solver videos by YouTuber 3Blue1Brown, available [here](https://www.youtube.com/watch?v=v68zYyaEmEA). No code was copied.

1. The Challenge
- Developing an algorithm that solves Wordle puzzles optimally with three modes:
  1. Automatic (computer-played games for analysis)
  2. Manual (terminal interface for a player)
  3. Guide (terminal interface for a player looking for optimal word suggestions for their Wordle game)
- Performance is measured by the lowest average number of guesses.

2. Gathering the Data

- The text file "answer_words.txt" contains the human-curated list of words used as solutions in the official game.
- The file "allowed_guesses.txt" contains a comprehensive list of five-letter words that are accepted as guesses, most of which will never be used as a puzzle solution.

Since I expect this to be computationally demanding, I'll base my possible guesses on the official list of answers. This is bound to overfit and will make the solver less accurate if the official list changes. The same logic can be applied to an updated official list of answers, but it would require an exhaustive two-step entropy search to find the optimal first guess. Regardless, "salet" should serve you well as long as the official list does not stray from well-understood words. (I assume that the expected information of this word will stay more or less the same, because this sample of 2,315 answers is adequately representative of the general list of well-known five-letter words.)

3. Word score methodology: applying information theory (entropy).

$$ \max\left(E[\textrm{All Information}]_{w, g}\right) = \max\left(E[\textrm{Information}]_{w_{1}, g_{1}} + \dots + E[\textrm{Information}]_{w_{j}, g_{n}}\right)$$

where 
- $w$ - a unique word followed by distinct previous word(s) (if it's not the first word in the chain) 
- $g$ - guess number 
- $E[\textrm{Information}] = \sum_{x \in X} p(x) \cdot \log_{2}(1/p(x))$
- $X$ - the set of all possible information patterns derived from a specific word
- $p(x)$ - the probability of a unique information pattern
- $\log_{2}(1/p(x))$ - the number of times the word pool of possible answers was cut in half

This equation determines the optimal word for a given guess row based on the expected information obtained from a word and its possible information patterns. The log term of the equation measures information in bits, where a single bit corresponds to the pool of possible answers being cut in half.
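As a rough illustration of the formula, the sketch below computes the expected information of a guess from the sizes of the groups it splits the answer pool into, one group per information pattern. The function name `entropy_of_split` and the `bucket_sizes` argument are illustrative names, not taken from the project's code.

```python
from math import log2


def entropy_of_split(bucket_sizes):
    """Expected information (in bits) of a guess.

    bucket_sizes holds, for each possible information pattern x,
    how many candidate answers would produce that pattern.
    (Hypothetical helper for illustration only.)
    """
    total = sum(bucket_sizes)
    # p(x) = n / total, and log2(1 / p(x)) = log2(total / n)
    return sum((n / total) * log2(total / n) for n in bucket_sizes if n > 0)


# A guess that splits 16 candidates into 4 equal groups halves the pool twice.
print(entropy_of_split([4, 4, 4, 4]))  # 2.0
# A guess that always yields the same pattern tells us nothing.
print(entropy_of_split([16]))          # 0.0
```

An even split gives the most information, which is why a good guess spreads the remaining answers across as many equally sized patterns as possible.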

We know that "salet" is the optimal first word from 3Blue1Brown's exhaustive search approach.

Starting with "salet" resulted in:
- 80 two guess answers,
- 1,225 three guess answers, 
- 965 four guess answers, 
- 45 five guess answers,
- and 0 six guess answers 
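Since performance is measured by the average number of guesses, the distribution above can be turned into that average with a few lines. This is only arithmetic on the counts listed above; the variable names are illustrative.

```python
# Guess counts listed above for "salet": {guesses needed: number of answers}
distribution = {2: 80, 3: 1225, 4: 965, 5: 45, 6: 0}

total_games = sum(distribution.values())
total_guesses = sum(n * count for n, count in distribution.items())
average = total_guesses / total_games

print(total_games)        # 2315
print(round(average, 3))  # 3.421
```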

Since I'd prefer not to spend the time on the same exhaustive method, I'll hard-code "salet" as my first guess every time. Every subsequent guess uses only a single-step entropy calculation, since I assume the quantity of information left to gain after two guesses is negligible (most games finish within 3 to 4 guesses).
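To make the single-step approach concrete, here is a minimal sketch of one guess cycle: score the feedback pattern, narrow the pool to the answers consistent with it, and pick the next guess with the highest expected information. All function names (`feedback`, `narrow`, `expected_information`, `best_guess`) are hypothetical and do not come from solver.py. The pattern symbols follow the g / y / _ notation used by the guide mode, and the example pool is the hint list printed in the manual game further down.

```python
from collections import Counter
from math import log2


def feedback(guess, answer):
    """Return the pattern for a guess, e.g. '__yy_' (g=green, y=yellow, _=grey)."""
    pattern = ["_"] * len(guess)
    unmatched = Counter()  # answer letters not consumed by a green match
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            pattern[i] = "g"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        # A letter is yellow only while unmatched copies of it remain.
        if pattern[i] != "g" and unmatched[g] > 0:
            pattern[i] = "y"
            unmatched[g] -= 1
    return "".join(pattern)


def narrow(pool, guess, observed):
    """Keep only the answers that would have produced the observed pattern."""
    return [word for word in pool if feedback(guess, word) == observed]


def expected_information(guess, pool):
    """Single-step entropy (bits) of a guess over the remaining pool."""
    counts = Counter(feedback(guess, answer) for answer in pool)
    total = len(pool)
    return sum((n / total) * log2(total / n) for n in counts.values())


def best_guess(guesses, pool):
    """Pick the guess with the highest expected information (ties: first wins)."""
    return max(guesses, key=lambda word: expected_information(word, pool))


pool = ["alloy", "polka", "bylaw", "molar", "allay",
        "villa", "polar", "aglow", "allow", "lilac"]
remaining = narrow(pool, "alloy", feedback("alloy", "molar"))
print(remaining)  # ['polka', 'molar', 'polar']
print(best_guess(remaining, remaining))
```

The narrowed pool matches the hints shown after the second guess in the manual game. The two-pass `feedback` function is there to handle repeated letters, so a second "l" in a guess is only marked yellow if the answer still has an unmatched "l".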

In [1]:
from resources.wordle_sim_stats import *

ALLOWED_WORDS: set = load_words_from_cwd_folder(
    folder_name="data", file_name="allowed_guesses.txt"
)
ANSWER_WORDS: set = load_words_from_cwd_folder(
    folder_name="data", file_name="answer_words.txt"
)
# wordle: Wordle = Wordle(answer_words=ANSWER_WORDS, 
#                         allowed_words=ALLOWED_WORDS, 
#                         show_output=False, 
#                         is_automated=True,
#                         use_hints=False)
# wordles_sim: SimulateGameStats = SimulateGameStats(wordle)

# wordles_sim.simulate_all_games()

## Results:

1. Automatic Wordle - implemented between solver.py, wordle.py and wordle_sim_stats.py

In [3]:
# automatic mode: set is_automated to True
wordle: Wordle = Wordle(answer_words=ANSWER_WORDS, 
                        allowed_words=ALLOWED_WORDS, 
                        show_output=True, 
                        is_automated=True,
                        use_hints=False)
wordle.play()

---Wordle---
The aim of the game is to guess an unknown five-letter word within six guesses.
Each guess provides a hint for each letter.
- Green indicates an exact letter match at that particular position.
- Yellow indicates that the letter at that position is elsewhere in the answer.
- Grey infers that the letter is not found in the answer.

1: salet (grey, grey, yellow, yellow, grey)
2: birle (grey, grey, grey, green, green)
3: cycle (grey, grey, grey, green, green)
4: whole (green, green, green, green, green)
You won!


2. Manual (terminal player interface) - implemented in wordle.py

In [4]:
# manual when is_automated is disabled
# the use_hints setting outputs usable words as potential answers when enabled
wordle: Wordle = Wordle(answer_words=ANSWER_WORDS, 
                        allowed_words=ALLOWED_WORDS, 
                        show_output=True, 
                        is_automated=False,
                        use_hints=True)
wordle.play()

---Wordle---
The aim of the game is to guess an unknown five-letter word within six guesses.
Each guess provides a hint for each letter.
- Green indicates an exact letter match at that particular position.
- Yellow indicates that the letter at that position is elsewhere in the answer.
- Grey infers that the letter is not found in the answer.

1: salet (grey, yellow, green, grey, grey)
hints: alloy, polka, bylaw, molar, allay, villa, polar, aglow, allow, lilac
2: alloy (yellow, uncolored, green, yellow, grey)
hints: polka, molar, polar
Invalid input. Use a valid lowercase five letter word.
3: polka (grey, green, green, grey, yellow)
hint: molar
4: molar (green, green, green, green, green)
You won!


3. Guide (terminal interface for a player looking for optimal word suggestions for their external Wordle game)

In [5]:
# the other constructor settings do not matter for this method
wordle: Wordle = Wordle(answer_words=ANSWER_WORDS, 
                        allowed_words=ALLOWED_WORDS)
wordle.solve_active_wordle(n_guesses_so_far=1)

---Wordle---
The aim of the game is to guess an unknown five-letter word within six guesses.
Each guess provides a hint for each letter.
- Green indicates an exact letter match at that particular position.
- Yellow indicates that the letter at that position is elsewhere in the answer.
- Grey infers that the letter is not found in the answer.

Follow the prompts to find an optimal word for your active wordle.
g = green, _ = grey, y = yellow, / = grey at index only

1: salet (grey, yellow, grey, yellow, grey)
Calculating...


100%|██████████| 3814/3814 [02:43<00:00, 23.36it/s]

Try 'beard' for your next guess

2: beard (grey, yellow, yellow, grey, green)
Calculating...


100%|██████████| 1836/1836 [00:03<00:00, 606.34it/s]

Try 'ahead' for your next guess

3: ahead (green, green, green, green, green)
Game Over!
