### RLlib: Industry-Grade Reinforcement Learning
<p align="center">
<img src="https://docs.ray.io/en/latest/_images/rllib-logo.png" width="50%" loading="lazy" >
</p>

For this tutorial we will rely on `ray[rllib]`, which can be installed with `pip install "ray[rllib]" tensorflow torch`.

Standard example (from https://docs.ray.io/en/latest/rllib/index.html):

In [1]:
from ray.rllib.algorithms.ppo import PPOConfig

config = (  # 1. Configure the algorithm,
    PPOConfig()
    .environment("Taxi-v3")
    .rollouts(num_rollout_workers=2)
    .framework("tf2")
    .training(model={"fcnet_hiddens": [64, 64]})
    .evaluation(evaluation_num_workers=1)
)

algo = config.build()  # 2. build the algorithm,

for _ in range(5):
    print(algo.train())  # 3. train it,

algo.evaluate()  # 4. and evaluate it.

2023-05-19 18:14:14,374	INFO worker.py:1616 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8266
(RolloutWorker pid=71185)   if not isinstance(terminated, (bool, np.bool8)):
2023-05-19 18:14:25,817	INFO trainable.py:172 -- Trainable.setup took 13.618 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.


{'custom_metrics': {}, 'episode_media': {}, 'info': {'learner': {'default_policy': {'learner_stats': {'cur_kl_coeff': 0.20000000298023224, 'cur_lr': 4.999999873689376e-05, 'total_loss': 9.936198, 'policy_loss': -0.0030872251, 'vf_loss': 9.939197, 'vf_explained_var': -1.8502275e-05, 'kl': 0.00043994037, 'entropy': 1.7913172, 'entropy_coeff': 0.0}, 'custom_metrics': {}, 'num_agent_steps_trained': 125.0, 'num_grad_updates_lifetime': 480.5, 'diff_num_grad_updates_vs_sampler_policy': 479.5}}, 'num_env_steps_sampled': 4000, 'num_env_steps_trained': 4000, 'num_agent_steps_sampled': 4000, 'num_agent_steps_trained': 4000}, 'sampler_results': {'episode_reward_max': -180.0, 'episode_reward_min': -857.0, 'episode_reward_mean': -732.2, 'episode_len_mean': 190.7, 'episode_media': {}, 'episodes_this_iter': 20, 'policy_reward_min': {}, 'policy_reward_max': {}, 'policy_reward_mean': {}, 'custom_metrics': {}, 'hist_stats': {'episode_reward': [-848.0, -749.0, -803.0, -686.0, -749.0, -550.0, -794.0, -857.

{'evaluation': {'episode_reward_max': -398.0,
  'episode_reward_min': -506.0,
  'episode_reward_mean': -453.8,
  'episode_len_mean': 200.0,
  'episode_media': {},
  'episodes_this_iter': 10,
  'policy_reward_min': {},
  'policy_reward_max': {},
  'policy_reward_mean': {},
  'custom_metrics': {},
  'hist_stats': {'episode_reward': [-461.0,
    -443.0,
    -452.0,
    -416.0,
    -461.0,
    -506.0,
    -470.0,
    -452.0,
    -479.0,
    -398.0],
   'episode_lengths': [200, 200, 200, 200, 200, 200, 200, 200, 200, 200]},
  'sampler_perf': {'mean_raw_obs_processing_ms': 0.13536682491121382,
   'mean_inference_ms': 1.550860788630343,
   'mean_action_processing_ms': 0.05605958331411687,
   'mean_env_wait_ms': 0.03695428401216872,
   'mean_env_render_ms': 0.0},
  'num_faulty_episodes': 0,
  'connector_metrics': {'ObsPreprocessorConnector_ms': 0.008883476257324219,
   'StateBufferConnector_ms': 0.0020837783813476562,
   'ViewRequirementAgentConnector_ms': 0.048661231994628906},
  'num_agent_s
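
The dictionaries returned by `algo.train()` are large, so printing them whole makes it hard to follow training progress. As a minimal sketch, the hypothetical helper `summarize` below (not part of RLlib) pulls a few headline metrics out of a result dictionary. It assumes the key layout visible in the output above (`sampler_results` and `info` → `learner` → `default_policy` → `learner_stats`), which may differ in other Ray versions.

```python
from typing import Any, Dict


def summarize(result: Dict[str, Any]) -> str:
    """Format a few headline metrics from a training-result dict.

    Assumes the nested key layout shown in the output above; missing keys
    are reported as None instead of raising.
    """
    sampler = result.get("sampler_results", {})
    learner = (
        result.get("info", {})
        .get("learner", {})
        .get("default_policy", {})
        .get("learner_stats", {})
    )
    return (
        f"reward_mean={sampler.get('episode_reward_mean')} "
        f"len_mean={sampler.get('episode_len_mean')} "
        f"total_loss={learner.get('total_loss')}"
    )


# Values copied from the first training iteration printed above.
sample_result = {
    "info": {"learner": {"default_policy": {"learner_stats": {"total_loss": 9.936198}}}},
    "sampler_results": {"episode_reward_mean": -732.2, "episode_len_mean": 190.7},
}
print(summarize(sample_result))
```

In the training loop, `print(summarize(algo.train()))` could then replace `print(algo.train())`, under the same assumption about the key layout.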

### Vectorize RL Environments

It is possible to vectorize RL environments using `RayEnvWrapper`. This is an [external library](https://github.com/ingambe/RayEnvWrapper) based on Ray, which can be installed using `pip install RayEnvWrapper`.

In [11]:
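import gym
from RayEnvWrapper import WrapperRayVecEnv

number_of_cpu_cores = 4
envs_per_cpu_core = 1

def make_and_seed(seed: int) -> gym.Env:
    # Factory for one environment copy; each copy gets its own seed.
    env = gym.make('CartPole-v1')
    # Note: env.seed() exists only in older gym releases (removed in gym 0.26).
    env.seed(seed)
    return env

# 4 cores x 1 environment per core = 4 environments in total.
vec_env = WrapperRayVecEnv(make_and_seed, number_of_cpu_cores, envs_per_cpu_core)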

In [12]:
vec_env.reset()

array([[-0.04456399,  0.04653909,  0.01326909, -0.02099827],
       [ 0.03073904,  0.00145001, -0.03088818, -0.03131252],
       [ 0.03468829,  0.01500225,  0.01230312,  0.01825218],
       [ 0.02281231, -0.02475473,  0.02306162,  0.02072129]],
      dtype=float32)
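
The reset output above has one row per environment: four environments, each returning a four-dimensional CartPole observation. To illustrate what vectorization means, here is a minimal, purely sequential sketch in plain Python. It is not how `RayEnvWrapper` is implemented (that library is based on Ray), and `ToyEnv` and `SequentialVecEnv` are hypothetical names introduced only for this illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


class ToyEnv:
    """Hypothetical stand-in for a gym environment: a 1-D random walk."""

    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.position = 0.0

    def reset(self) -> List[float]:
        self.position = self.rng.uniform(-0.05, 0.05)
        return [self.position]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        # Action 1 moves right, anything else moves left.
        self.position += 0.1 if action == 1 else -0.1
        done = abs(self.position) > 1.0
        return [self.position], 1.0, done


class SequentialVecEnv:
    """Steps several environments in lockstep and batches their outputs."""

    def __init__(self, make_env: Callable[[int], ToyEnv], num_envs: int) -> None:
        # Each copy gets its own seed, as in make_and_seed above.
        self.envs = [make_env(seed) for seed in range(num_envs)]

    def reset(self) -> List[List[float]]:
        return [env.reset() for env in self.envs]

    def step(
        self, actions: Sequence[int]
    ) -> Tuple[List[List[float]], List[float], List[bool]]:
        observations, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            obs, reward, done = env.step(action)
            if done:
                obs = env.reset()  # auto-reset so the batch shape stays fixed
            observations.append(obs)
            rewards.append(reward)
            dones.append(done)
        return observations, rewards, dones


vec = SequentialVecEnv(ToyEnv, num_envs=4)
batch = vec.reset()
print(len(batch), len(batch[0]))  # one row per environment

obs, rewards, dones = vec.step([1, 0, 1, 0])  # one action per environment
print(rewards, dones)
```

The key design point is that the policy sees one batch of observations and returns one batch of actions, so a single forward pass serves all environments. A Ray-based wrapper can additionally spread the environment copies across worker processes instead of looping over them in one process.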

In [None]:
import ray

ray.shutdown()  # stop the local Ray instance and release its workers