# COGS 118B - Final Project

# Insert title here

## Group members

- Pelé
- Diego Maradona
- Johan Cruyff
- Roberto Carlos
- Franz Beckenbauer

# Abstract 
This section should be short and clearly stated. It should be a single paragraph <200 words.  It should summarize: 
- what your goal/problem is
- what the data used represents 
- the solution/what you did
- major results you came up with (mention how results are measured) 

__NB:__ this final project form is much more report-like than the proposal and the checkpoint. Think in terms of writing a paper with bits of code in the middle to make the plots/tables

# Background

Fill in the background and discuss the kind of prior work that has gone on in this research area here. **Use inline citation** to specify which references support which statements.  You can do that through HTML footnotes (demonstrated here). I used to recommend Markdown footnotes (google is your friend) because they are simpler, but recently I have had some problems getting them to work, whereas HTML ones have always worked so far. So use the method that works for you, but do use inline citations.

Here is an example of inline citation. After government genocide in the 20th century, real birds were replaced with surveillance drones designed to look just like birds<a name="lorenz"></a>[<sup>[1]</sup>](#lorenznote). Use a minimum of 2 or 3 citations, but we prefer more <a name="admonish"></a>[<sup>[2]</sup>](#admonishnote). You need enough citations to fully explain and back up important facts. 

Remember you are trying to explain why someone would want to answer your question or why your hypothesis is in the form that you've stated. 

# Problem Statement

Clearly describe the problem that you are solving. Avoid ambiguous words. The problem described should be well defined and should have at least one ML-relevant potential solution. Additionally, describe the problem thoroughly such that it is clear that the problem is quantifiable (the problem can be expressed in mathematical or logical terms), measurable (the problem can be measured by some metric and clearly observed), and replicable (the problem can be reproduced and occurs more than once).

# Data

Detail how/where you obtained the data and cleaned it (if necessary)

If the data cleaning process is very long (e.g., elaborate text processing) consider describing it briefly here in text, and moving the actual cleaning process to another notebook in your repo (include a link here!).  The idea behind this approach: this is a report, and if you blow up the flow of the report to include a lot of code it makes it hard to read.

Please give the following information for each dataset you are using
- link/reference to obtain it
- description of the size of the dataset (# of variables, # of observations)
- what an observation consists of
- what some critical variables are, how they are represented
- any special handling, transformations, cleaning, etc you have done should be demonstrated here!


# Proposed Solution

In this section, clearly describe a solution to the problem. The solution should be applicable to the project domain and appropriate for the dataset(s) or input(s) given. Provide enough detail (e.g., algorithmic description and/or theoretical properties) to convince us that your solution is applicable. Make sure to describe how the solution will be tested.  

If you know details already, describe how (e.g., library used, function calls) you plan to implement the solution in a way that is reproducible.

If it is appropriate to the problem statement, describe a benchmark model<a name="sota"></a>[<sup>[3]</sup>](#sotanote) against which your solution will be compared. 

# Evaluation Metrics
## Summary
The evaluation metrics we used fit our project and have been used in other Wordle AI projects before. We report **Success Ratio** and **Guesses Per Success** over 100000 games. We think these are valid metrics because, when an AI plays Wordle, the first question to ask is "did it get the word?" and the obvious next question is "how many guesses did it take?". Our two metrics answer these questions directly and quantify how well our agents play. The **Success Ratio** is the fraction of wins an AI attains over 100000 games, and **Guesses Per Success** is the average number of guesses taken in the games it won. We additionally used some common metrics to analyze the models: **mean**, **standard deviation**, and **training time**, to measure the effectiveness, consistency, and efficiency of each model respectively.
## Mathematical Notation
Success Ratio $ = \frac{\sum_{i=1}^{100000} w_i}{100000}$, where $w_i$ is a binary indicator for game $i$: $w_i=0$ is a loss and $w_i=1$ is a win.

Guesses per Success $ = \frac{\sum_{i=1}^{G}g_i}{G}$, where $G$ is the number of games won and $g_i$ is the number of guesses used in the $i$-th won game.
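
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the project: the function name `wordle_metrics` and the input format (one win flag and one guess count per game) are assumptions made for the example.

```python
from typing import Sequence, Tuple


def wordle_metrics(wins: Sequence[int], guesses: Sequence[int]) -> Tuple[float, float]:
    """Return (success ratio, guesses per success) for a batch of games.

    Hypothetical input format: wins[i] is 1 if game i was won and 0 otherwise,
    and guesses[i] is the number of guesses used in game i.
    """
    if len(wins) != len(guesses):
        raise ValueError("wins and guesses must have the same length")
    if len(wins) == 0:
        raise ValueError("need at least one game")

    n_won = sum(wins)
    # Success Ratio: number of wins divided by the number of games played.
    success_ratio = n_won / len(wins)

    # Guesses per Success: average guess count over the won games only.
    won_guesses = [g for w, g in zip(wins, guesses) if w == 1]
    guesses_per_success = sum(won_guesses) / n_won if n_won else float("nan")
    return success_ratio, guesses_per_success
```

For example, `wordle_metrics([1, 0, 1, 1], [3, 6, 4, 2])` gives a success ratio of 0.75 and 3.0 guesses per success, since the lost game's six guesses are excluded from the second metric.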

# Results

### First Attempt at Wordle AI
At first we were lost when it came to the implementation, and it took a while to settle on a solid plan for implementing Wordle. Should we make it a single-player game? Should we have a VS AI mode? How would we interface with the game, and how would we implement the AI portion? We first downloaded a complicated GUI and tried to poke at it, to no avail. We decided against the complicated GUI and searched GitHub until we found the gym_wordle repository, which gave us an API that integrated seamlessly with the project.

### Algorithm Selection
We were initially uncertain about which algorithms to use and spent some time brainstorming, researching, and meeting to discuss the options. Should we go for Q-Learning or try something more complex like DDPG? The question was not only how well these models would handle our environment, but also how efficient they would be. We researched and tried Q-Learning, which we covered extensively in class, but we realized it struggled with the complexity of our action space. DDPG seemed like another option, but it was hit-or-miss: it required a lot of tuning and could become unstable during training. After a lot of trial and error we came across both A2C and PPO. A2C offered an efficient way to train across multiple agents simultaneously, which we expected to give a balanced trade-off between bias and variance. PPO impressed us with its ability to manage policy updates smoothly without excessive tweaking. In the end, we chose A2C and PPO for their practical benefits in balancing efficiency and stability, which aligned well with our project's needs.

### Model and Training Implementation
![alt](implementation.png)
We decided to implement the two models one after the other to get a good idea of the differences between them. Above is the code that runs each model and extracts the mean and standard deviation of the rewards for the graphs.
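
As a rough illustration of the mean and standard deviation extraction described above, the sketch below summarizes per-episode rewards at each evaluation checkpoint. It is not the code shown in the screenshot: the function name `summarize_rewards` and the assumption that rewards arrive as one list per checkpoint are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def summarize_rewards(rewards_by_checkpoint: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Compute the mean and standard deviation of rewards per checkpoint.

    Hypothetical input: rewards_by_checkpoint[k] holds the episode rewards
    collected at the k-th evaluation checkpoint of one model.
    """
    # One mean and one (population) standard deviation per checkpoint;
    # these two series are what a mean-reward plot with error bands needs.
    means = [mean(rewards) for rewards in rewards_by_checkpoint]
    stds = [pstdev(rewards) for rewards in rewards_by_checkpoint]
    return {"mean": means, "std": stds}
```

Running the same summary for each model yields two comparable curves, one for A2C and one for PPO, that can be plotted on shared axes.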

### Mean Reward Graph
![graph](graph.png)
Both models fluctuate considerably in performance, with A2C showing wider variability and occasionally dipping below zero. PPO reaches higher peaks in mean reward for short stretches and is more stable overall. A2C's synchronous updates encourage more exploration but produce higher-variance performance, whereas PPO's "safer" policy updates yield more stable, though still oscillating, results. These differences stem from how each algorithm updates its policy and how it balances exploration and exploitation in the Wordle environment. 

### Success Ratio and Guesses Per Success
Success ratio over 100000 games using **A2C**: 0.57136 with 2.94055 guesses on average

Success ratio over 100000 games using **PPO**: 0.57433 with 2.95989 guesses on average

# Discussion

### Same Results?!
We observed similar results using A2C and PPO, with success ratios of 0.57136 and 0.57433, respectively, and an average of around 2.95 guesses per success. A2C's parallelized actor-critic approach balances exploration and exploitation efficiently, which supports effective policy learning. PPO's clipped objective function provides stable policy updates by preventing large deviations from the previous policy. Both algorithms manage the trade-off between exploration and exploitation well. These complementary strengths resulted in nearly identical performance in our complex environment; partly because of the sheer number of games we played, both models converged to a similar success rate. 
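
To make the clipped objective concrete, here is a minimal sketch of the standard PPO clipped surrogate, which averages $\min(r_t A_t, \operatorname{clip}(r_t, 1-\epsilon, 1+\epsilon) A_t)$ over samples. It is written in plain Python for illustration and is not taken from our training code; the function name `clipped_surrogate` and the default `clip_eps=0.2` are assumptions.

```python
from typing import Sequence


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """Average PPO clipped surrogate objective over a batch of samples.

    ratios[t] is the probability ratio new_policy / old_policy for the action
    taken at step t, and advantages[t] is the estimated advantage of that
    action. clip_eps=0.2 is an assumed, commonly used value.
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if len(ratios) == 0:
        raise ValueError("need at least one sample")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped_r = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes any gain from moving the ratio
        # outside the clipping interval in the favorable direction.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

With a ratio of 1.5 and an advantage of 1.0, the objective is capped at 1.2 instead of 1.5, so the policy gets no extra credit for moving far from the old policy in a single update.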

### AI > Human ?
Our AI models, A2C and PPO, need fewer guesses than human players in Wordle, who typically solve the puzzle in an average of 4.71 guesses. Our models reach a success ratio of around 0.57 and use only about 2.95 guesses on average in the games they win. This suggests that the human trial-and-error approach is suboptimal, and that the AI models take a smarter and more agnostic approach to Wordle.

### RL: Good for Complex Environments
Reinforcement Learning (RL) excels in complex environments like Wordle because it learns and adapts from cumulative feedback, refining its strategies over time. Algorithms like A2C, PPO, SAC, and DQN can develop nuanced policies by learning from large numbers of games, capturing the complex patterns and long-run dependencies that are crucial for solving puzzles efficiently. One of our biggest discoveries was that this adaptive learning capability allows RL models to surpass past approaches by continuously updating their strategies as the game state advances.


### Limitations
Because these models were trained on a specific set of words, they may struggle with words that are not in the original set.

### Ethics & Privacy
We couldn't find any pressing ethics or privacy issues regarding our Wordle models, but we would like to state how we would address potential issues.

In the future, we will ensure informed consent, making sure all parties know of their involvement in AI. We will always value data security, and we will refuse to work in environments where security may be an issue or may be breached.

### Conclusion
A2C and PPO produced nearly identical results in our AI Wordle experiments, with success ratios around 0.57 and approximately 2.95 guesses per success on average, due to their effective balance of exploration and exploitation. We attribute this convergence to their strong policy update mechanisms and extensive game play. Additionally, our AI models consistently needed fewer guesses than human players, who average 4.71 guesses, highlighting the efficiency of RL's data-driven strategies over human trial-and-error.

# Footnotes
<a name="lorenznote"></a>1.[^](#lorenz): Lorenz, T. (9 Dec 2021) Birds Aren’t Real, or Are They? Inside a Gen Z Conspiracy Theory. *The New York Times*. https://www.nytimes.com/2021/12/09/technology/birds-arent-real-gen-z-misinformation.html<br> 
<a name="admonishnote"></a>2.[^](#admonish): Also refs should be important to the background, not some randomly chosen vaguely related stuff. Include a web link if possible in refs as above.<br>
<a name="sotanote"></a>3.[^](#sota): Perhaps the current state of the art solution such as you see on [Papers with code](https://paperswithcode.com/sota). Or maybe not SOTA, but rather a standard textbook/Kaggle solution to this kind of problem
