# Introduction (Brian)

- What is reinforcement learning
- What the goals of the project are/were
  - train a model to learn air hockey

# Creating The Game (Brian)

- using pygame to create air hockey
- puck and paddle classes are fed into the game class, which controls game logic
- this then feeds into a main file where you can switch between modes (player v. player, player v. environment, RL v. environment)
- we created two simple bots to play and train against (one only moves up and down; the other chases the puck around)
  - while the bot that only moves in the y direction is harder to play against, the puck can slow down on its half, so training is quicker and easier against the newer bot
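
The two bot behaviours described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `Body`, `patrol_bot`, `chase_bot`, and the `speed` value are hypothetical names and numbers chosen for the example, and coordinates follow the pygame convention where y grows downward.

```python
from dataclasses import dataclass


@dataclass
class Body:
    """Pixel position of a paddle or puck (hypothetical stand-in for the game classes)."""
    x: float
    y: float


def patrol_bot(paddle: Body, direction: int, top: float, bottom: float,
               speed: float = 5.0) -> int:
    """Move the paddle along y only; return the direction to use on the next frame."""
    paddle.y += direction * speed
    if paddle.y <= top:        # hit the upper limit: clamp and head down
        paddle.y = top
        direction = 1
    elif paddle.y >= bottom:   # hit the lower limit: clamp and head up
        paddle.y = bottom
        direction = -1
    return direction


def chase_bot(paddle: Body, puck: Body, speed: float = 5.0) -> None:
    """Step the paddle toward the puck by at most `speed` pixels per axis."""
    paddle.x += max(-speed, min(speed, puck.x - paddle.x))
    paddle.y += max(-speed, min(speed, puck.y - paddle.y))
```

A real chasing bot would probably also be clamped to its own half of the table; that detail is left out here because the outline does not specify it.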

# Wrapping Game in Gym Environment (Kang)

- gym is a toolkit for developing and comparing reinforcement learning algorithms
- what is the environment structure
- how is this then used for a RL algorithm
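
As a rough illustration of the environment structure and how an RL loop consumes it, here is a dependency-free sketch that mirrors the Gymnasium `reset`/`step` return shapes. Everything in it is an assumption made for the example: `TinyHockeyEnv`, its toy reward, and `run_episode` are hypothetical, and a real environment would subclass `gymnasium.Env` and declare `observation_space` and `action_space`.

```python
import random


class TinyHockeyEnv:
    """Toy environment following the Gymnasium reset/step conventions (hypothetical)."""

    ACTIONS = (-1, 0, 1)  # move paddle up, stay, move down

    def __init__(self, height: int = 10, max_steps: int = 50):
        self.height = height
        self.max_steps = max_steps
        self.rng = random.Random()

    def reset(self, seed=None):
        """Start a new episode and return (observation, info)."""
        if seed is not None:
            self.rng.seed(seed)
        self.paddle_y = self.height // 2
        self.puck_y = self.rng.randrange(self.height)
        self.steps = 0
        return (self.paddle_y, self.puck_y), {}

    def step(self, action: int):
        """Apply one action and return (obs, reward, terminated, truncated, info)."""
        move = self.ACTIONS[action]
        self.paddle_y = max(0, min(self.height - 1, self.paddle_y + move))
        self.steps += 1
        terminated = self.paddle_y == self.puck_y   # paddle lined up with the puck
        reward = 1.0 if terminated else -0.01       # small per-step penalty (assumed)
        truncated = (not terminated) and self.steps >= self.max_steps
        return (self.paddle_y, self.puck_y), reward, terminated, truncated, {}


def run_episode(env, policy, seed=None):
    """Generic agent loop: reset once, then step until the episode ends."""
    obs, info = env.reset(seed=seed)
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        done = terminated or truncated
    return total
```

The point of the wrapper is that the training code only ever calls `reset` and `step`, so the same loop can drive the proof-of-concept game and the full air hockey game.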

# Model Selection (Jorge)

- initially wanted to do a CNN model that takes downsampled grayscale frames of the game as input -> from 800x400 to 16x8
- would take in stack of (previous) three frames so that it can detect velocity
- while this seemed to work, the training was slow so we changed to a linear model
- the linear model took 8 inputs - x and y for each paddle and the puck (6) and the puck's velocity in the x and y direction (2)
- this gives the model the exact features we deem important which expedites training (besides just having less input)
- we had several versions of each model: the linear model had 2 or 3 hidden layers, and the CNN had different numbers of linear output layers
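
To make the two input formats concrete, the sketch below shows one way the 800x400 frame could be reduced to 16x8 (averaging 50x50 pixel blocks), how three frames could be stacked, and how the 8-value input of the linear model is laid out. It is a plain-Python illustration under assumptions: `downsample`, `FrameStack`, and `state_vector` are hypothetical names, block averaging is only one possible downsampling method, and padding the stack with copies of the first frame is a choice made here, not something the outline specifies.

```python
from collections import deque


def downsample(frame, out_w=16, out_h=8):
    """Average equal-sized blocks of a grayscale frame given as a list of rows."""
    in_h, in_w = len(frame), len(frame[0])
    bh, bw = in_h // out_h, in_w // out_w   # 400 // 8 = 50 and 800 // 16 = 50
    small = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block = [frame[y][x]
                     for y in range(r * bh, (r + 1) * bh)
                     for x in range(c * bw, (c + 1) * bw)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


class FrameStack:
    """Keep the most recent `n` downsampled frames so motion can be inferred."""

    def __init__(self, n=3):
        self.frames = deque(maxlen=n)

    def push(self, frame):
        small = downsample(frame)
        if not self.frames:
            # Assumption: fill the stack with the first frame at episode start.
            self.frames.extend([small] * self.frames.maxlen)
        else:
            self.frames.append(small)
        return list(self.frames)


def state_vector(paddle_a, paddle_b, puck, puck_velocity):
    """8 inputs: (x, y) for each paddle and the puck, plus the puck's (vx, vy)."""
    return [*paddle_a, *paddle_b, *puck, *puck_velocity]
```

The stacked frames give the CNN 3 x 8 x 16 = 384 input values from which it must infer velocity, while the linear model receives the velocity directly in its 8 inputs.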

# Proof of Concept (Kang)

- created a simplified air hockey game, roughly a mix of air hockey and Pong
- did this directly in a custom gym environment
- wanted to first test the environment and then train to demonstrate proof of concept
- short videos to display results

# Training Specs (Eric)

We began training the reinforcement learning models on our personal laptops. This was sufficient at the beginning, when we were working on the proof of concept and training a model to play Pong. We could train that model beyond human performance overnight, in 250,000 episodes. When we moved on to training models on Air Hockey, personal computers fell short.

The Air Hockey game is more complex than Pong in two ways. First, the physics are more involved: computing the new velocity of the puck after contact with a paddle is more expensive, so each environment step takes longer. Second, what the model has to learn is harder, since it has to score on a goal rather than an entire side, and the shape of the paddles adds complexity to how the model should hit the puck. These two facts made training take longer to reach the same results as in Pong. It became clear during the early training runs that personal laptops were insufficient to train the reinforcement learning model. It was not feasible to keep a laptop running for the time required while also using it for other class work, so we looked for other options.

### Google Colab

The first option we looked into was Google Colab Pro, a hosted Jupyter Notebook service that provides access to computing resources [-@colab]. The Pro tier provides access to faster graphics processing units (GPUs) and more system memory. We tried training on instances connected to an L4 GPU with no additional random-access memory (RAM). We used the following code, provided by Google, to confirm the instance was connected to and using the GPU:

```{{python}}
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
    print('Not connected to a GPU')
else:
    print(gpu_info)
```

Google Colab comes with some libraries preinstalled, but it lacked the gymnasium library our code depends on. After connecting the session to Google Drive, we used the following code to navigate to the correct working directory, install the gymnasium library, and run our Python script:

```{{python}}
#set and check working directory
%cd /content/drive/MyDrive/neural_nets_project/models/cnn
%pwd

#install required packages for Colab
!pip install gymnasium

#run code
!python dqn.py
```

We moved away from Colab soon after we began testing on it. While the model could take advantage of the connected GPU, stepping through the game could not. Stepping through the game was the most significant bottleneck in training, so we moved from a paid-for-use solution to hardware we had on hand.
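
One simple way to check where the time goes is to time the model call and the environment step separately. The sketch below is only an illustration: `profile_loop`, `step_env`, and `run_model` are hypothetical names, and the callables passed in stand for whatever the training script actually does at each step.

```python
import time


def profile_loop(step_env, run_model, n_steps=1000):
    """Return (env_seconds, model_seconds) accumulated over n_steps iterations."""
    env_time = model_time = 0.0
    for _ in range(n_steps):
        t0 = time.perf_counter()
        action = run_model()      # e.g. a forward pass that picks an action
        t1 = time.perf_counter()
        step_env(action)          # e.g. one step of the game physics
        t2 = time.perf_counter()
        model_time += t1 - t0
        env_time += t2 - t1
    return env_time, model_time


if __name__ == "__main__":
    # Stand-in workloads: a cheap "model" and a more expensive "environment".
    env_s, model_s = profile_loop(lambda a: sum(range(2000)), lambda: 0, n_steps=200)
    print(f"env: {env_s:.4f}s  model: {model_s:.4f}s")
```

If the environment accounts for most of the time, a faster GPU cannot help much, because the game step runs on the CPU. When the model runs on a GPU, the asynchronous GPU work would need to be synchronized before reading the clock for the model timing to be meaningful.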

### Intel NUCs

We moved training to four Intel Next Unit of Computing (NUC) mini PCs. These are small computers measuring approximately 4" x 4" x 2.5" and are used in many different use cases, from desktop replacement to edge computing nodes for major corporations like Chick-fil-A [-@Chambers_2023]. Our NUCs are equipped with an i5-1135G7 processor, 64 GB of RAM, and a 500 GB M.2 solid-state drive. They run Ubuntu Server (22.04.4 LTS). While there are many ways to keep the code running continuously, we used screen to maintain persistent sessions even after we disconnected from the devices. The network's firewall runs a Virtual Private Network (VPN) service that lets us remotely manage the servers from anywhere.

While not as fast as the Google Colab instances, the NUCs could be left running for days without interruption, something Colab doesn't support without paying for a higher tier. The mini PCs each ran a model from 

# Training Process (Billy)

- Replay memory
- policy net
- target net
- selecting the action
- optimize function
- loss function
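
The pieces listed above are the usual components of a DQN training loop, and a library-free sketch of them might look like the following. This is not the project's `dqn.py`: the names, the exponential epsilon schedule, the Huber loss, and all hyperparameter values are assumptions chosen for illustration, and the networks are represented only by the lists of Q-values they would output.

```python
import math
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayMemory:
    """Fixed-size buffer of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop off first

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon(step, start=0.9, end=0.05, decay=1000.0):
    """Exploration rate decaying exponentially from `start` toward `end` (assumed schedule)."""
    return end + (start - end) * math.exp(-step / decay)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the argmax of Q."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_target, done, gamma=0.99):
    """Bootstrap from the target net's best next-state value unless the episode ended."""
    return reward if done else reward + gamma * max(next_q_target)


def huber_loss(prediction, target, delta=1.0):
    """Quadratic for small errors and linear for large ones (one common DQN loss)."""
    err = abs(prediction - target)
    return 0.5 * err * err if err <= delta else delta * (err - 0.5 * delta)
```

In an optimize step, a batch would be sampled from the replay memory, the policy net's Q-value for each stored action would be compared against `td_target` computed from the target net, and the resulting loss would be backpropagated through the policy net only.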

# Training Results (Eric & Billy)

- how models compared against each other (eric)
- maybe a plot or two with some descriptions (eric)


```{python}
#import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

#import data
new_ai = pd.read_csv("../data/new_ai/training_log.csv")
new_ai_750 = pd.read_csv("../data/new_ai_750/training_log.csv")
old_ai_750 = pd.read_csv("../data/old_ai_750/training_log.csv")
old_ai = pd.read_csv("../data/old_ai/training_log.csv")
final_run = pd.read_csv("../data/final_run/training_log.csv")

#create running sum of rewards
new_ai['Running Total'] = new_ai['Total Reward'].cumsum()
new_ai_750['Running Total'] = new_ai_750['Total Reward'].cumsum()
old_ai['Running Total'] = old_ai['Total Reward'].cumsum()
old_ai_750['Running Total'] = old_ai_750['Total Reward'].cumsum()
final_run['Running Total'] = final_run['Total Reward'].cumsum()

#add model's nickname
new_ai['model'] = 'new_ai'
new_ai_750['model'] = 'new_ai_750'
old_ai['model'] = 'old_ai'
old_ai_750['model'] = 'old_ai_750'
final_run['model'] = 'final_run'

#combine multiple dataframes into one
df = pd.concat([new_ai, new_ai_750, old_ai, old_ai_750, final_run], ignore_index=True)

#line plot showing running sum of rewards
plt.figure(figsize=(12, 6))
sns.set(font_scale=1.5)
sns.lineplot(data=df, x='Episode', y='Running Total', hue='model')
plt.title('Sum of Rewards')
plt.xlabel('Episodes')
plt.ylabel('Sum of Rewards')
plt.show()
```

- how did the model perform (eric)
- include videos at various iterations (billy)

# Human v. RL - billy if accomplished

- this is now a feature we have. billy or eric will train a model to make it super good and then hopefully we will be able to host this
- if we can, maybe this part just includes a link

# Conclusion (Jorge)

- did we accomplish our goals? what did we learn? 
- is there anything we think could have been done better?
- is there anything we would potentially implement in the future?