## Install Dependencies
You can install via Docker, Colab, or a local setup. Colab may run into time-limit issues. For a local install on Windows, you may need WSL, since the oscillator includes some C code that does not work well on Windows.
### Docker
Install Docker and run the `run_docker.sh` script. This pulls the container and starts a Jupyter server inside it, after which you should be good to go.
### Colab
Use the block below. Compared to the lab module, I removed the rl-zoo part. You can most likely add it back with `%pip install --ignore-installed rl-zoo3==2.0.0` (this works in Docker but has not been tested on Colab). Do not forget to save a separate copy of the notebook to your Drive.
### Local
Use the Colab block, except that whatever you clone goes into the current folder (or a folder you specify) instead of `/content`. You will most likely want to use conda to manage environments.


#### For Google Colab
Do NOT restart the kernel (runtime/session) when Google prompts you to DURING the execution of the following cell.

Restart the kernel (runtime/session) AFTER the following cell has finished executing.

In [None]:
!apt-get update 
!apt-get install -y software-properties-common
!apt-get update && apt-get install -y swig cmake ffmpeg freeglut3-dev xvfb
%pip install gymnasium
%pip install --ignore-installed rl-zoo3==2.0.0
%cd /content/
!git clone https://github.com/yusenz/gym-maze.git
%cd /content/gym-maze
%pip install .
%cd /content/
%pip install opencv-python-headless
%pip install scikit-learn==0.23.2
%pip install tvb-library
%pip install tvb-framework
%pip install tensorflow[and-cuda]==2.9.1

### Import notice
Since many of the imports rely on files in this repo, you may need to clone the repo, `cd` to the repo root, and run the notebook from there.

In [None]:
!git clone https://github.com/yusenz/RL_course.git
%cd /content/RL_course
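If changing the working directory is inconvenient (for example in a local install), a possible alternative is to put the repo root on `sys.path` so that packages such as `rl_cardiac` can be imported from anywhere. This is only a sketch: the helper name `add_repo_to_path` and the path `"RL_course"` are assumptions, and code inside the repo that opens files by relative path may still need the working directory to be the repo root.

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Put repo_root at the front of sys.path so its packages can be imported."""
    root = str(Path(repo_root).expanduser().resolve())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# Hypothetical location: replace with wherever RL_course was cloned.
repo_root = add_repo_to_path("RL_course")
print(repo_root in sys.path)
```

Calling the helper a second time with the same path does not add a duplicate entry.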

Boilerplate code

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

## Agent framework
This time, I will just use a custom agent framework as an example. You may instead want to follow code from elsewhere that works with gym environments.

In [None]:
import numpy as np
import gymnasium as gym
class BaseAgent:
    def __init__(self, env, verbose=1, ran_seed=42):
        self.env = env
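        # Random seeds are set only once, when the agent is initialized.
        # Note: env.seed() is the older Gym API; Gymnasium environments are seeded via env.reset(seed=...).
        self.env.seed(ran_seed)
        self.env.action_space.seed(ran_seed+1)  # spaces keep their own RNG, so they are seeded separately
        self.env.observation_space.seed(ran_seed+2)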
        self.random_state = np.random.RandomState(ran_seed+3)
        self.observation_space = env.observation_space
        self.action_space = env.action_space
        self.verbose = verbose
        self.cumulative_reward = 0
        self.num_steps = 0
    def select_action(self, state):
        raise NotImplementedError
    def update_step(self, reward: float):
        self.cumulative_reward += reward
        self.num_steps += 1
    def update_episode(self):
        self.reset_episode()
    def update_rollout(self):
        if self.verbose > 0:
            print('update_rollout in base class is called, nothing is changed')
    def update_replay(self):
        if self.verbose > 0:
            print('update_replay in base class is called, nothing is changed')
    def reset_episode(self):
        self.cumulative_reward = 0
        self.num_steps = 0

class RandomAgent(BaseAgent):
    def __init__(self, *args, **kwargs):
        self.cumulative_reward = 0
        super().__init__(*args, **kwargs)
    def select_action(self, state):
        action = self.action_space.sample()
        if self.verbose > 1:
            print('Random agent selected action: ', action)
        return action
    def update_step(self, old_state, action, reward, new_state):
        super().update_step(reward)
    def update_episode(self, terminated, truncated):
        if self.verbose > 0:
            if terminated:
                print('Episode terminated')
            if truncated:
                print('Episode truncated')
        super().update_episode()
    def update_rollout(self):
        pass
    def update_replay(self):
        pass
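The methods above are meant to be called from an interaction loop. The following is a minimal sketch of such a loop, assuming a Gymnasium-style environment where `reset()` returns `(observation, info)` and `step()` returns five values. `run_episode`, `CountdownEnv`, and `AlwaysOneAgent` are hypothetical names used only to keep the example self-contained; in the notebook you would pass a real environment and a `RandomAgent` instead.

```python
def run_episode(env, agent, max_steps=1000):
    """Run one episode and return (total reward, number of steps)."""
    state, _info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated) and steps < max_steps:
        action = agent.select_action(state)
        new_state, reward, terminated, truncated, _info = env.step(action)
        # Same call signature as RandomAgent.update_step
        agent.update_step(state, action, reward, new_state)
        total_reward += reward
        steps += 1
        state = new_state
    # Same call signature as RandomAgent.update_episode
    agent.update_episode(terminated, truncated)
    return total_reward, steps


class CountdownEnv:
    """Hypothetical stand-in env: reward 1 for action 1, truncated after 5 steps."""
    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return 0, {}

    def step(self, action):
        self.t += 1
        truncated = self.t >= 5
        return self.t, float(action == 1), False, truncated, {}


class AlwaysOneAgent:
    """Hypothetical stand-in agent exposing the same methods as RandomAgent."""
    def select_action(self, state):
        return 1

    def update_step(self, old_state, action, reward, new_state):
        pass

    def update_episode(self, terminated, truncated):
        pass


total_reward, steps = run_episode(CountdownEnv(), AlwaysOneAgent())
print(total_reward, steps)  # 5.0 5
```

The total reward is tracked in the loop rather than read from the agent because `update_episode` resets the agent's counters.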

### Example usage of rl cardiac
See the readme under `rl_cardiac` for more details. 

In [None]:
from rl_cardiac.tcn_model import TCN_config
from rl_cardiac.cardiac_model import CardiacModel_Env
# rat_type (and noise_level, if used) must be defined before running this cell; see the README under rl_cardiac
tcn_model = TCN_config(rat_type)
env = CardiacModel_Env(tcn_model, rat_type)
# The noise level defaults to 0. Once your agent works well without noise, raise it to check that the agent can handle noise:
# env = CardiacModel_Env(tcn_model, rat_type, noise_level)

In [None]:
from stable_baselines3 import PPO
policy_kwargs = dict(net_arch=[64])
# seed=42 seeds the model's RNGs and the wrapped environment for reproducibility
model = PPO("MlpPolicy", env, verbose=1, learning_rate=0.002, n_steps=128, batch_size=4, n_epochs=4, clip_range=0.2, gamma=0.95, vf_coef=1, ent_coef=0.005, policy_kwargs=policy_kwargs, seed=42)
env.reset()
model.learn(total_timesteps=10000)
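To get a feel for what these hyperparameters imply, here is a back-of-the-envelope sketch. It assumes a single environment (`n_envs = 1`) and that PPO in Stable-Baselines3 collects whole rollouts of `n_steps` transitions until `total_timesteps` is reached, so the actual number of timesteps can slightly exceed the requested value. The variable names are for illustration only.

```python
import math

# Values copied from the PPO call above; n_envs = 1 is an assumption.
n_steps, batch_size, n_epochs = 128, 4, 4
gamma, total_timesteps, n_envs = 0.95, 10000, 1

rollout_size = n_steps * n_envs                      # transitions collected per rollout
minibatches_per_epoch = rollout_size // batch_size   # minibatches per pass over the rollout
updates_per_rollout = minibatches_per_epoch * n_epochs
num_rollouts = math.ceil(total_timesteps / rollout_size)
actual_timesteps = num_rollouts * rollout_size
effective_horizon = 1.0 / (1.0 - gamma)              # rough reward horizon implied by gamma

print(minibatches_per_epoch, updates_per_rollout)  # 32 128
print(num_rollouts, actual_timesteps)              # 79 10112
print(round(effective_horizon))                    # 20
```

With a batch size of 4, each rollout triggers many small gradient updates, and `gamma = 0.95` means rewards more than roughly 20 steps ahead contribute little to the return.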

### Example usage of rl dbs

In [None]:
import gymnasium as gym
import rl_dbs.gym_oscillator
import rl_dbs.gym_oscillator.envs
import rl_dbs.oscillator_cpp
env = rl_dbs.gym_oscillator.envs.oscillatorEnv()

In [None]:
from stable_baselines3 import PPO
policy_kwargs = dict(net_arch=[64])
# seed=42 seeds the model's RNGs and the wrapped environment for reproducibility
model = PPO("MlpPolicy", env, verbose=1, learning_rate=0.002, n_steps=128, batch_size=4, n_epochs=4, clip_range=0.2, gamma=0.95, vf_coef=1, ent_coef=0.005, policy_kwargs=policy_kwargs, seed=42)
env.reset()
model.learn(total_timesteps=10000)

### Example usage of TVB Epileptor

In [None]:
from TVB.tvb_wrapper import TVBWrapper
env = TVBWrapper(timestep=10, history_len=2000, max_len=6000, dt=0.05)

In [None]:
from stable_baselines3 import PPO
policy_kwargs = dict(net_arch=[64])
# seed=42 seeds the model's RNGs and the wrapped environment for reproducibility
model = PPO("MlpPolicy", env, verbose=1, learning_rate=0.002, n_steps=128, batch_size=4, n_epochs=4, clip_range=0.2, gamma=0.95, vf_coef=1, ent_coef=0.005, policy_kwargs=policy_kwargs, seed=42)
env.reset()
model.learn(total_timesteps=10000)