# COGS 188 - Project Proposal

# Project Description

You have the choice of doing either (1) an "AI solves a problem" style project or (2) running a Special Topics class on a topic of your choice. This repo assumes you want to do (1). If you want to do (2), you should fill out the Gradescope proposal for that instead of using this repo.

You will design and execute a machine learning project. There are a few constraints on the nature of the allowed project. 
- The problem addressed will not be a "toy problem" or "common training students problem" like 8-Queens or a small Traveling Salesman Problem or similar
- If it's the kind of problem (e.g., RL) that interacts with a simulator or live task, then the problem will have a reasonably complex action space. For instance, a Wumpus World kind of thing with a 9x9 grid is definitely too small. A simulated mountain car with a less complex 2-d road and simplified dynamics seems like a fairly low achievement level. A more complex 3-d mountain car simulation with large extent and realistic dynamics, sure sounds great!
- If it's the kind of problem that uses a dataset, then the dataset will have >1k observations and >5 variables. I'd prefer more like >10k observations and >10 variables. A general rule is that if you have >100x more observations than variables, your solution will likely generalize a lot better. The goal of training a supervised machine learning model is to learn the underlying pattern in a dataset in order to generalize well to unseen data, so choosing a large dataset is very important.
- The project must include some elements we talked about in the course
- The project will include a model selection and/or feature selection component where you will be looking for the best setup to maximize the performance of your ML system.
- You will evaluate the performance of your ML system using more than one appropriate metric
- You will be writing a report describing and discussing these accomplishments


Feel free to delete this description section when you hand in your proposal.

# Names
- WonJae Lee
- Luke Skerrett
- Alex Vo


# Abstract 
We would like to create a clone of Wordle, the famous New York Times brain teaser. The user will be given two options: compete against an AI opponent, or watch the AI try to complete the Wordle on its own. In a sense, the goal of the game is to show that AI is a worthy opponent for anyone playing Wordle. The game will come with two modes: the AI-only mode shows a single Wordle screen, whereas the competitive mode features a split screen with the AI's screen censored. We would like to use reinforcement learning to train an agent that solves the Wordle **better** on average than a human opponent, which means it would have to average fewer than ~4 guesses. We think that this is a fun and clever demonstration of the power of AI and its ingenuity.


# Background
Wordle is a game that roughly 2 million <a name="post"></a>[<sup>[1]</sup>](#postnote) people play every day, and it is a web service hosted by The New York Times. We all remember December 2021, when the game rose to fame seemingly out of nowhere!
We would like to develop an agent using reinforcement learning that can successfully solve Wordle. While researching such agents, we found a few articles that help show which methods are best to employ. An agent designed by Andrew Ho used the A2C (advantage actor-critic) algorithm and achieved 99% effectiveness <a name="tettamanti"></a>[<sup>[2]</sup>](#tettamantinote). The game state was represented as a one-hot encoding (whether a given letter goes in a given position or not).
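
To make the one-hot state idea concrete, here is a minimal Python sketch. It is our own illustration and not Andrew Ho's implementation: the function names (`score_guess`, `initial_state`, `update_state`), the three-way "unknown / not here / here" encoding, and the 26 x 5 x 3 layout are all assumptions chosen for clarity.

```python
from collections import Counter

WORD_LEN = 5
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
UNKNOWN, NO, YES = 0, 1, 2  # hypothetical labels for what we know about a letter/position


def score_guess(guess, target):
    """Wordle feedback per position: 2 = green, 1 = yellow, 0 = gray."""
    feedback = [0] * WORD_LEN
    # Target letters that were not matched exactly are available for yellows.
    remaining = Counter(t for g, t in zip(guess, target) if g != t)
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = 2
    for i, g in enumerate(guess):
        if feedback[i] == 0 and remaining[g] > 0:
            feedback[i] = 1
            remaining[g] -= 1
    return feedback


def one_hot(label):
    """Return a length-3 one-hot list for UNKNOWN, NO, or YES."""
    return [1 if j == label else 0 for j in range(3)]


def initial_state():
    """Nothing is known yet: every (letter, position) cell is UNKNOWN."""
    return [[one_hot(UNKNOWN) for _ in range(WORD_LEN)] for _ in ALPHABET]


def update_state(state, guess, feedback):
    """Fold one guess and its feedback into a copy of the state."""
    new_state = [[cell[:] for cell in row] for row in state]
    for i, (letter, mark) in enumerate(zip(guess, feedback)):
        k = ALPHABET.index(letter)
        if mark == 2:
            # Green: this letter is here, so every other letter is not.
            for row in new_state:
                row[i] = one_hot(NO)
            new_state[k][i] = one_hot(YES)
        else:
            # Yellow or gray: conservatively, the letter is not at this position.
            new_state[k][i] = one_hot(NO)
    return new_state


feedback = score_guess("crane", "cigar")
print(feedback)  # [2, 1, 1, 0, 0]
state = update_state(initial_state(), "crane", feedback)
```

Flattening `state` gives a vector of 26 x 5 x 3 = 390 numbers that could be fed to a policy network. A fuller encoding would also track which letters are known to be in the word somewhere, which this sketch leaves out.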
 

Remember you are trying to explain why someone would want to answer your question or why your hypothesis is in the form that you've stated.

# Problem Statement

Clearly describe the problem that you are solving. Avoid ambiguous words. The problem described should be well defined and should have at least one ML-relevant potential solution. Additionally, describe the problem thoroughly such that it is clear that the problem is quantifiable (the problem can be expressed in mathematical or logical terms), measurable (the problem can be measured by some metric and clearly observed), and replicable (the problem can be reproduced and occurs more than once).

# Data

You should have a strong idea of what dataset(s) will be used to accomplish this project. 

If you know (some of) the data you will use, please give the following information for each dataset:
- link/reference to obtain it
- description of the size of the dataset (# of variables, # of observations)
- what an observation consists of
- what some critical variables are, how they are represented
- any special handling, transformations, cleaning, etc. that will be needed

If you don't yet know what your dataset(s) will be, you should describe what you desire in terms of the above bullets.

# Proposed Solution

In this section, clearly describe a solution to the problem. The solution should be applicable to the project domain and appropriate for the dataset(s) or input(s) given. Provide enough detail (e.g., algorithmic description and/or theoretical properties) to convince us that your solution is applicable. Why might your solution work? Make sure to describe how the solution will be tested.  

If you know details already, describe how (e.g., library used, function calls) you plan to implement the solution in a way that is reproducible.

If it is appropriate to the problem statement, describe a benchmark model<a name="sota"></a>[<sup>[3]</sup>](#sotanote) against which your solution will be compared. 

# Evaluation Metrics

* Number of attempts: Track the average number of attempts the agent takes to correctly guess the word. The goal is to minimize the number of attempts, with the optimal being guessing the word in the fewest attempts possible.
* Success rate: Calculate the percentage of games where the agent successfully guesses the word within the given number of attempts (e.g., 6 attempts in the original Wordle game). A higher success rate indicates better performance.
* Comparison to baseline: Compare the performance of our reinforcement learning agent to a baseline algorithm or a human player.
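
As a rough sketch of how these three metrics could be computed from logged games, consider the Python below. The names `summarize_games` and `beats_baseline`, and the convention of recording a failed game as `None`, are our own assumptions. The average is taken over solved games only, which is one of several reasonable choices.

```python
from statistics import mean

MAX_ATTEMPTS = 6  # guesses allowed in the original Wordle game


def summarize_games(attempts_per_game):
    """Summarize a list of results.

    Each entry is the number of guesses used to solve the word,
    or None if the agent failed within MAX_ATTEMPTS.
    """
    if not attempts_per_game:
        raise ValueError("need at least one game")
    solved = [a for a in attempts_per_game if a is not None and a <= MAX_ATTEMPTS]
    return {
        "games": len(attempts_per_game),
        "success_rate": len(solved) / len(attempts_per_game),
        # Average over solved games only; None if nothing was solved.
        "avg_attempts_when_solved": mean(solved) if solved else None,
    }


def beats_baseline(summary, baseline_avg_attempts):
    """True if the agent's average guess count is strictly below the baseline's."""
    avg = summary["avg_attempts_when_solved"]
    return avg is not None and avg < baseline_avg_attempts


results = summarize_games([3, 4, None, 5])
print(results["success_rate"])       # 0.75
print(beats_baseline(results, 4.0))  # False
```

The `4.0` passed to `beats_baseline` stands in for the rough human average of about 4 guesses mentioned in the abstract. A measured baseline would replace it in the real evaluation.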

# Ethics & Privacy

* We will make sure our dataset of words is obtained and used in an ethical and legal manner.
* We will ensure that the words are diverse and inclusive, and do not contain offensive, discriminatory, or sensitive content.

# Team Expectations 

* Communicate in a timely manner
* Complete assigned work by the deadlines
* Ask for help if needed
* Attend meetings

# Project Timeline Proposal

Replace this with something meaningful that is appropriate for your needs. It doesn't have to be something that fits this format.  It doesn't have to be set in stone... "no battle plan survives contact with the enemy". But you need a battle plan nonetheless, and you need to keep it updated so you understand what you are trying to accomplish, who's responsible for what, and what the expected due dates are for each item.

| Meeting Date  | Meeting Time| Completed Before Meeting  | Discuss at Meeting |
|---|---|---|---|
| 4/29  |  8 PM |  Brainstorm topics/questions (all)  | Determine best form of communication; Discuss and decide on final project topic; begin background research; assign proposal parts to complete | 
| 5/2  |  9 PM |  Do background research on topic (Luke) | Discuss dataset(s) and solutions (Alex); Evaluation metrics, ethics & privacy, team expectations (WonJae) | 
| 5/3  | 10 AM  | Edit, finalize, and submit proposal; Search for datasets | Discuss Wrangling and possible analytical approaches; Assign group members to lead each specific part  |
| 5/10  | 6 PM  | Import & Wrangle Data, do some EDA | Review/Edit wrangling/EDA; Discuss Analysis Plan  |
| 5/17  | 12 PM  | Finalize wrangling/EDA; Begin programming for project | Discuss/edit project code; Complete project |
| 5/31 | 12 PM  | Complete analysis; Draft results/conclusion/discussion | Discuss/edit full project |
| 6/12  | Before 11:59 PM  | NA | Turn in Final Project  |

# Footnotes
<a name="postnote"></a>1.[^](#post): Wordle players are cheating every day, mathematician says. *The Washington Post*. https://nationalpost.com/news/wordle-cheating-claims#:~:text=Put%20another%20way%3A%20Of%20the,that%20might%20rise%20to%201%2C320.<br>
<a name="tettamantinote"></a>2.[^](#tettamanti): Tettamanti, T. How To Solve Wordle Using Machine Learning. *Rootstrap*. https://www.rootstrap.com/blog/how-to-solve-wordle-using-machine-learning<br>
<a name="sotanote"></a>3.[^](#sota): Perhaps the current state of the art solution such as you see on [Papers with code](https://paperswithcode.com/sota). Or maybe not SOTA, but rather a standard textbook/Kaggle solution to this kind of problem
