# Algorithm Libraries - Lab 03

In recent years, many RL libraries have been developed. They are designed to provide all the tools needed to implement and test Reinforcement Learning agents.

Even so, they differ considerably, which is why it is important to choose a library that is fast, reliable, and relevant to your RL task. From a technical standpoint, there are a few things to keep in mind when evaluating an RL library.

- **Support for existing machine learning libraries:** Since RL typically uses gradient-based algorithms to learn and fit policy functions, you will want the library to support your favorite framework (TensorFlow, Keras, PyTorch, etc.).
- **Scalability:** RL is computationally intensive, and the option to run in a distributed fashion becomes important when tackling complex environments.
- **Composability:** RL algorithms typically involve simulations and many other components. You will want a library that lets you reuse components of RL algorithms and that is compatible with multiple deep learning frameworks.

[Here](https://docs.google.com/spreadsheets/d/1ZWhViAwCpRqupA5E_xFHSaBaaBZ1wAjO6PvmmEEpXGI/edit#gid=0) you can find a list of some existing libraries.

<img src="https://i1.wp.com/neptune.ai/wp-content/uploads/RL-tools.png?resize=1024%2C372&ssl=1" width=500>


## Ray RLlib

[Ray](https://docs.ray.io/en/latest/) is a distributed execution platform that provides easy-to-use primitives for parallelism and scalability, allowing Python programs to scale anywhere from a notebook to a large cluster. Built on top of Ray, [RLlib](https://docs.ray.io/en/latest/rllib.html) provides a unified API that can be leveraged across a wide range of applications.

<br>

<img src="https://miro.medium.com/max/1838/1*_bomm09XtiZfQ52Kfz9Ciw.png" width=600>


RLlib is designed to support multiple deep learning frameworks (TensorFlow and PyTorch) and is accessible through a simple Python API. It currently ships with a [range of RL algorithms](https://docs.ray.io/en/latest/rllib-algorithms.html#available-algorithms-overview).

In particular, RLlib enables rapid development because it makes it easier to build scalable RL algorithms by reusing and assembling existing implementations. RLlib also lets developers use neural networks created with several deep learning frameworks and integrates easily with third-party simulators.


## Setup

You will need to make a copy of this notebook in your Google Drive before editing. You can do this with **File → Save a copy in Drive**.

# Competition environment
!pip install --upgrade ceia-soccer-twos > /dev/null 2>&1
# the ray version compatible with the provided agent implementation is 1.4.0
!pip install 'aioredis==1.3.1' > /dev/null 2>&1 
!pip install 'aiohttp==3.7.4' > /dev/null 2>&1 
!pip install 'ray==1.4.0' > /dev/null 2>&1 
!pip install 'ray[rllib]==1.4.0' > /dev/null 2>&1 
!pip install 'ray[tune]==1.4.0' > /dev/null 2>&1 
!pip install torch > /dev/null 2>&1 
!pip install lz4 > /dev/null 2>&1 

# Dependencies needed to record videos
!apt-get install -y xvfb x11-utils > /dev/null 2>&1 
!pip install pyvirtualdisplay==0.2.* > /dev/null 2>&1 

! wget http://www.atarimania.com/roms/Roms.rar
! mkdir /content/ROM/
! unrar e /content/Roms.rar /content/ROM/ -y
! python -m atari_py.import_roms /content/ROM/ > /dev/null 2>&1

# Initialize a virtual display instance
from pyvirtualdisplay import Display
display = Display(visible=False, size=(1400, 900))
_ = display.start()

In [None]:
# Load the TensorBoard notebook extension
%load_ext tensorboard

As a bonus task, experiment with the algorithms you have learned in the `soccer_twos` environment, which will be used in the final competition of this course*. To make things easier, use the `team_vs_policy` variation as in the previous lab.

<img src="https://raw.githubusercontent.com/bryanoliveira/soccer-twos-env/master/images/screenshot.png" height="400">

> Environment visualization

This environment is a 2v2 car soccer game; the goal is to score against the opponent as quickly as possible. In the `team_vs_policy` variation, your agent controls one player on the blue team and plays against a random team. More information about the environment can be found [in the repository](https://github.com/bryanoliveira/soccer-twos-env) and [in the Unity ml-agents documentation](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md#soccer-twos).


**Your task is to train an agent with the Ray interface presented here, experimenting with different algorithms and hyperparameters.**


<br>

*The variation used in the competition will be `multiagent_player`, but agents trained for `team_vs_policy` can be easily adapted. In the "Exporting your trained agent" section, the "MyDqnSoccerAgent" agent does exactly that.

Use the environment instantiated below to run the training algorithm. By the end of the run, your agent's reward per episode should approach +2.

In [1]:
import gym
from gym.wrappers.monitoring.video_recorder import VideoRecorder
from gym.spaces import Discrete, Box

import ray
import ray.rllib.agents.pg as pg
from ray.tune.logger import pretty_print
from ray import tune
from ray.rllib.env.env_context import EnvContext
from ray.rllib.models import ModelCatalog
from ray.rllib.models.torch.torch_modelv2 import TorchModelV2
from ray.rllib.models.torch.fcnet import FullyConnectedNetwork as TorchFC
from ray.rllib.agents.pg import PGTrainer

import numpy as np
import os
import random

import torch
import torch.nn as nn

lz4 not available, disabling sample compression. This will significantly impact RLlib performance. To install lz4, run `pip install lz4`.


In [2]:
import soccer_twos

# Close the environment if it was opened previously
try:
    env.close()
except Exception:
    pass

env = soccer_twos.make(
    variation=soccer_twos.EnvType.team_vs_policy,
    flatten_branched=True, # converts the action_space from MultiDiscrete to Discrete
    single_player=True, # controls one of the players while the others stand still
    opponent_policy=lambda *_: 0,  # makes the opponents stand still
)

# Get the state and action sizes
state_size = env.observation_space.shape[0]
action_size = env.action_space.n

print("State size: {}, action size: {}".format(state_size, action_size))
env.close()

[INFO] Connected to Unity environment with package version 2.1.0-exp.1 and communication version 1.5.0


INFO:mlagents_envs.environment:Connected to Unity environment with package version 2.1.0-exp.1 and communication version 1.5.0


[INFO] Connected new brain: SoccerTwos?team=1


INFO:mlagents_envs.environment:Connected new brain: SoccerTwos?team=1


[INFO] Connected new brain: SoccerTwos?team=0


INFO:mlagents_envs.environment:Connected new brain: SoccerTwos?team=0


State size: 336, action size: 27


obs_space, act_space = env.observation_space, env.action_space
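
The `flatten_branched=True` option converts the `MultiDiscrete` action space into a single `Discrete` one, which is why the reported action size is 27. As a rough sketch of the idea (not the library's actual implementation, whose index ordering may differ), assuming the underlying action has three branches with three options each, every flat index can be mapped to one combination of branch choices:

```python
from itertools import product

# Assumption: three action branches with three options each (3 * 3 * 3 = 27).
BRANCH_SIZES = (3, 3, 3)

# Lookup table: flat action index -> tuple with one choice per branch.
FLAT_TO_BRANCHED = list(product(*(range(n) for n in BRANCH_SIZES)))


def unflatten(action: int) -> tuple:
    """Map a flat Discrete action to its per-branch choices."""
    return FLAT_TO_BRANCHED[action]


def flatten(branched: tuple) -> int:
    """Map per-branch choices back to the flat Discrete index."""
    return FLAT_TO_BRANCHED.index(tuple(branched))


print(len(FLAT_TO_BRANCHED))  # number of flat actions
print(unflatten(0))           # every branch at its first option
```

A flat space like this is convenient for algorithms that expect a single categorical output, at the cost of an action count that grows multiplicatively with each branch.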

In [3]:
ray.shutdown()
ray.init(num_gpus=0, ignore_reinit_error=True, include_dashboard=False, log_to_driver=False)

{'node_ip_address': '192.168.15.7',
 'raylet_ip_address': '192.168.15.7',
 'redis_address': '192.168.15.7:6379',
 'object_store_address': 'tcp://127.0.0.1:65497',
 'raylet_socket_name': 'tcp://127.0.0.1:65522',
 'webui_url': None,
 'session_dir': 'C:\\Users\\User\\AppData\\Local\\Temp\\ray\\session_2021-12-10_01-00-42_704270_21040',
 'metrics_export_port': 65528,
 'node_id': 'b3f8c1496e90eaaf4aa0ba54859d8af5ffa14623714da19bd6cbb04d'}

In [4]:
from ray import tune

def create_rllib_env(env_config: dict = {}):
    # support for multiple environment instances on the same machine
    if hasattr(env_config, "worker_index"):
        env_config["worker_id"] = (
            env_config.worker_index * env_config.get("num_envs_per_worker", 1)
            + env_config.vector_index
        )
    return soccer_twos.make(**env_config)

# register the environment with Ray
tune.registry.register_env("Soccer", create_rllib_env)
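
Each Unity instance needs its own `worker_id` so that several environments can run on one machine without colliding. The sketch below reproduces only the arithmetic used in `create_rllib_env`. The helper name `unity_worker_id` is hypothetical, and the index ranges are assumptions (remote workers numbered from 1, each worker's environment copies numbered from 0):

```python
def unity_worker_id(worker_index: int, vector_index: int, num_envs_per_worker: int = 1) -> int:
    """Hypothetical helper mirroring the formula in create_rllib_env."""
    return worker_index * num_envs_per_worker + vector_index


NUM_WORKERS = 6
NUM_ENVS_PER_WORKER = 6

ids = [
    unity_worker_id(w, v, NUM_ENVS_PER_WORKER)
    for w in range(1, NUM_WORKERS + 1)   # assumed remote worker indices 1..6
    for v in range(NUM_ENVS_PER_WORKER)  # environment copies 0..5 per worker
]

# Every environment instance gets a distinct id, so none of them collide.
assert len(ids) == len(set(ids))
print(min(ids), max(ids))
```

This is also why `num_envs_per_worker` needs to be present in `env_config`: with the fallback value of 1, the second copy on worker 1 and the first copy on worker 2 would both receive id 2.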

Use the configuration below as a starting point for your tests.

The most important part is the `env_config` key, which configures the environment to be compatible with the agent provided for exporting your agent. At this point in the course, you should already be able to test the other variations of the environment and use the Ray APIs to train an agent close to (or better than) the [ceia_baseline_agent](https://drive.google.com/file/d/1WEjr48D7QG9uVy1tf4GJAZTpimHtINzE/view). Examples of how to use the other variations can be found [here](https://github.com/dlb-rl/rl-tournament-starter/). When using these variations, you must also use other agent definitions to handle the different observation and action spaces (these are also included in the examples).
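
To experiment with hyperparameters starting from the lab's configuration, one option is to generate variants of a base dictionary and launch one run per variant. The following is a minimal sketch in plain Python. The helper `expand_grid` and the candidate values are illustrative assumptions, not part of the lab; only the base values for `lr`, `gamma`, and `clip_param` come from the configuration. Ray Tune also offers `tune.grid_search` for declaring such grids directly inside `config`.

```python
from itertools import product


def expand_grid(base: dict, grid: dict) -> list:
    """Return one config per combination of the values listed in `grid`."""
    keys = list(grid)
    configs = []
    for values in product(*(grid[k] for k in keys)):
        config = dict(base)               # shallow copy keeps the base untouched
        config.update(zip(keys, values))  # override only the searched keys
        configs.append(config)
    return configs


base_config = {"lr": 0.0003, "gamma": 0.99, "clip_param": 0.2}
search_space = {"lr": [0.0001, 0.0003], "clip_param": [0.1, 0.2, 0.3]}  # example values

variants = expand_grid(base_config, search_space)
print(len(variants))  # 2 * 3 combinations
```

Each entry in `variants` could then be merged into the full training configuration, keeping `env_config` unchanged so that the exported agent remains compatible.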

In [7]:
NUM_ENVS_PER_WORKER = 6

In [8]:
#single player with opponent
analysis = tune.run(
    "PPO",
    config={
        # system settings
        "num_gpus": 0,
        "num_workers": 6,
        "num_envs_per_worker": NUM_ENVS_PER_WORKER,
        "log_level": "INFO",
        "lr": 0.0003,
        "lambda": 0.95,
        "gamma": 0.99,
        'sgd_minibatch_size': 256,
        #'train_batch_size': 4000,
        'clip_param': 0.2,
        'model': {
          'fcnet_hiddens': [256, 256],
        },
        "framework": "torch",
        # RL setup
        "env": "Soccer",
        "env_config": {
            "num_envs_per_worker": NUM_ENVS_PER_WORKER,
            "variation": soccer_twos.EnvType.team_vs_policy,
            "single_player": True,
            "flatten_branched": True,
            "opponent_policy": lambda *_: 0,
        },
    },
    stop={
        # 10000000 (10M) steps may be needed to learn a useful policy
        "timesteps_total": 30000000,
        # you can also limit by time, according to the Colab time limit
        "time_total_s": 68400, # 19h
    },
    checkpoint_freq=100,
    checkpoint_at_end=True,
    local_dir=os.path.join("results"),
    restore="results/PPO/PPO_Soccer_0b316_00000_0_2021-12-09_18-29-08/checkpoint_000995/checkpoint-995",
    
)

Trial name,status,loc
PPO_Soccer_e3a41_00000,PENDING,


Trial name,status,loc
PPO_Soccer_e3a41_00000,RUNNING,


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5704552
  custom_metrics: {}
  date: 2021-12-10_01-02-46
  done: false
  episode_len_mean: 38.8433734939759
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8765180663890149
  episode_reward_min: -2.0
  episodes_this_iter: 166
  episodes_total: 79409
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.2
          cur_lr: 0.0003
          entropy: 1.1327780038118362
          entropy_coeff: 0.0
          kl: 0.035540089826099575
          policy_loss: -0.11349753965623677
          total_loss: -0.07972015027189627
          vf_explained_var: 0.8244315981864929
          vf_loss: 0.02666937484173104
    num_agent_steps_sampled: 5704552
    num_steps_sampled: 5704552
    num_steps_trained: 5704552
  iterations_since_restore: 1
  node_ip: 192.168.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,996,43252.4,5704552,1.87652,1.9792,-2,38.8434
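
The `learner_stats` in the result above can be sanity-checked by hand. Assuming the reported `total_loss` combines the other entries as policy loss plus the KL penalty plus the value loss minus the entropy bonus, and assuming a value-loss weight of 1.0 (this weight is not printed in the log), the numbers from this first result are consistent up to rounding:

```python
import math

# Values copied from the first "Result for PPO_Soccer_e3a41_00000" block.
stats = {
    "cur_kl_coeff": 0.2,
    "entropy": 1.1327780038118362,
    "entropy_coeff": 0.0,
    "kl": 0.035540089826099575,
    "policy_loss": -0.11349753965623677,
    "total_loss": -0.07972015027189627,
    "vf_loss": 0.02666937484173104,
}

VF_LOSS_COEFF = 1.0  # assumption: weight of the value loss, not printed in the log


def recompute_total_loss(s: dict, vf_loss_coeff: float = VF_LOSS_COEFF) -> float:
    """Recombine the logged terms into a single PPO-style objective."""
    return (
        s["policy_loss"]
        + s["cur_kl_coeff"] * s["kl"]
        + vf_loss_coeff * s["vf_loss"]
        - s["entropy_coeff"] * s["entropy"]
    )


recomputed = recompute_total_loss(stats)
print(recomputed, stats["total_loss"])
assert math.isclose(recomputed, stats["total_loss"], abs_tol=1e-6)
```

Because `entropy_coeff` is 0.0 in this run, the entropy term contributes nothing; it is kept in the formula only to show where an entropy bonus would enter.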


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5712544
  custom_metrics: {}
  date: 2021-12-10_01-03-33
  done: false
  episode_len_mean: 41.43434343434343
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8978080827780444
  episode_reward_min: -2.0
  episodes_this_iter: 198
  episodes_total: 79607
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.30000000000000004
          cur_lr: 0.0003
          entropy: 1.0784552618861198
          entropy_coeff: 0.0
          kl: 0.032717189751565456
          policy_loss: -0.11729610414477065
          total_loss: -0.08005998679436743
          vf_explained_var: 0.7540672421455383
          vf_loss: 0.027420959377195686
    num_agent_steps_sampled: 5712544
    num_steps_sampled: 5712544
    num_steps_trained: 5712544
  iterations_since_restore: 2
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,997,43299.2,5712544,1.89781,1.9792,-2,41.4343
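
The progress table makes it easy to estimate training speed. As a rough sketch using the two table rows printed so far (iterations 996 and 997), the snippet below computes steps per iteration, steps per second, and how long reaching the `timesteps_total` stop value would take at that pace. The helper name `throughput` is hypothetical, and a single pair of rows gives only a noisy estimate.

```python
# (iteration, total time in seconds, total timesteps) copied from the table rows.
row_a = (996, 43252.4, 5704552)
row_b = (997, 43299.2, 5712544)

TIMESTEPS_STOP = 30_000_000  # "timesteps_total" from the stop criteria


def throughput(first, second):
    """Return (steps per iteration, steps per second) between two table rows."""
    iters = second[0] - first[0]
    seconds = second[1] - first[1]
    steps = second[2] - first[2]
    return steps / iters, steps / seconds


steps_per_iter, steps_per_sec = throughput(row_a, row_b)
remaining_hours = (TIMESTEPS_STOP - row_b[2]) / steps_per_sec / 3600

print(f"{steps_per_iter:.0f} steps/iteration, {steps_per_sec:.1f} steps/s")
print(f"about {remaining_hours:.1f} h to reach {TIMESTEPS_STOP} steps at this pace")
```

Comparing this estimate with the `time_total_s` limit shows which stop criterion is likely to trigger first on a given machine.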


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5720536
  custom_metrics: {}
  date: 2021-12-10_01-04-20
  done: false
  episode_len_mean: 40.69543147208122
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9190395991814317
  episode_reward_min: 1.485200047492981
  episodes_this_iter: 197
  episodes_total: 79804
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.45000000000000007
          cur_lr: 0.0003
          entropy: 1.0706304907798767
          entropy_coeff: 0.0
          kl: 0.027400876686442643
          policy_loss: -0.11490186821902171
          total_loss: -0.07616191413399065
          vf_explained_var: 0.7387990951538086
          vf_loss: 0.026409561280161142
    num_agent_steps_sampled: 5720536
    num_steps_sampled: 5720536
    num_steps_trained: 5720536
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,998,43346.3,5720536,1.91904,1.9792,1.4852,40.6954


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5728528
  custom_metrics: {}
  date: 2021-12-10_01-05-06
  done: false
  episode_len_mean: 37.87128712871287
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9056693032236383
  episode_reward_min: -2.0
  episodes_this_iter: 202
  episodes_total: 80006
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.675
          cur_lr: 0.0003
          entropy: 1.0657974276691675
          entropy_coeff: 0.0
          kl: 0.023062850057613105
          policy_loss: -0.111589036139776
          total_loss: -0.07229731665574946
          vf_explained_var: 0.7792162299156189
          vf_loss: 0.023724297352600843
    num_agent_steps_sampled: 5728528
    num_steps_sampled: 5728528
    num_steps_trained: 5728528
  iterations_since_restore: 4
  node_ip: 192.1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,999,43392.3,5728528,1.90567,1.9804,-2,37.8713


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5736520
  custom_metrics: {}
  date: 2021-12-10_01-05-52
  done: false
  episode_len_mean: 39.072916666666664
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9023416663209598
  episode_reward_min: -2.0
  episodes_this_iter: 192
  episodes_total: 80198
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.084486361593008
          entropy_coeff: 0.0
          kl: 0.01771753534558229
          policy_loss: -0.10461393708828837
          total_loss: -0.06618848512880504
          vf_explained_var: 0.8192139863967896
          vf_loss: 0.020486452616751194
    num_agent_steps_sampled: 5736520
    num_steps_sampled: 5736520
    num_steps_trained: 5736520
  iterations_since_restore: 5
  n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1000,43438.2,5736520,1.90234,1.9792,-2,39.0729


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5744512
  custom_metrics: {}
  date: 2021-12-10_01-06-39
  done: false
  episode_len_mean: 43.02803738317757
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8960485954150976
  episode_reward_min: -2.0
  episodes_this_iter: 214
  episodes_total: 80412
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0281727388501167
          entropy_coeff: 0.0
          kl: 0.017848606454208493
          policy_loss: -0.10240432678256184
          total_loss: -0.062172475722036324
          vf_explained_var: 0.7608596086502075
          vf_loss: 0.022160134802106768
    num_agent_steps_sampled: 5744512
    num_steps_sampled: 5744512
    num_steps_trained: 5744512
  iterations_since_restore: 6
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1001,43484.7,5744512,1.89605,1.9792,-2,43.028
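
Across the results so far, `cur_kl_coeff` grows from 0.2 to 1.0125 and then stops changing. That pattern matches an adaptive KL penalty of the kind PPO implementations commonly use: raise the coefficient when the measured KL is well above a target, lower it when it is well below. The sketch below assumes a target of 0.01 and factors of 1.5 and 0.5. These are assumptions that happen to reproduce the logged sequence, not values read from the lab's configuration.

```python
KL_TARGET = 0.01  # assumption: not set explicitly in the lab's config


def update_kl_coeff(kl_coeff: float, sampled_kl: float, kl_target: float = KL_TARGET) -> float:
    """Adaptive KL penalty: grow when KL is too high, shrink when too low."""
    if sampled_kl > 2.0 * kl_target:
        return kl_coeff * 1.5
    if sampled_kl < 0.5 * kl_target:
        return kl_coeff * 0.5
    return kl_coeff


# "kl" values (rounded) logged for the first six results after the restore.
logged_kl = [0.0355, 0.0327, 0.0274, 0.0231, 0.0177, 0.0178]

coeff = 0.2  # first logged cur_kl_coeff
history = [coeff]
for kl in logged_kl:
    coeff = update_kl_coeff(coeff, kl)
    history.append(coeff)

print(history)
```

Under these assumptions the coefficient stabilizes once the measured KL falls below twice the target (0.02), which is what the log shows from the fifth result onward.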


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5752504
  custom_metrics: {}
  date: 2021-12-10_01-07-25
  done: false
  episode_len_mean: 35.066964285714285
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8961035710360323
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 80636
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0260381996631622
          entropy_coeff: 0.0
          kl: 0.016836408874951303
          policy_loss: -0.10501752005075105
          total_loss: -0.06481220113346353
          vf_explained_var: 0.7998142838478088
          vf_loss: 0.023158453637734056
    num_agent_steps_sampled: 5752504
    num_steps_sampled: 5752504
    num_steps_trained: 5752504
  iterations_since_restore: 7


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1002,43530.8,5752504,1.8961,1.9776,-2,35.067


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5760496
  custom_metrics: {}
  date: 2021-12-10_01-08-12
  done: false
  episode_len_mean: 38.133663366336634
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9055267277330454
  episode_reward_min: -2.0
  episodes_this_iter: 202
  episodes_total: 80838
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0639256834983826
          entropy_coeff: 0.0
          kl: 0.01853859401308
          policy_loss: -0.10960680901189335
          total_loss: -0.06867173462524079
          vf_explained_var: 0.7749137878417969
          vf_loss: 0.02216474775923416
    num_agent_steps_sampled: 5760496
    num_steps_sampled: 5760496
    num_steps_trained: 5760496
  iterations_since_restore: 8
  node

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1003,43577.7,5760496,1.90553,1.9816,-2,38.1337


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5768488
  custom_metrics: {}
  date: 2021-12-10_01-08-58
  done: false
  episode_len_mean: 38.122549019607845
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9242098045115377
  episode_reward_min: 1.242799997329712
  episodes_this_iter: 204
  episodes_total: 81042
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0932743921875954
          entropy_coeff: 0.0
          kl: 0.01867864394444041
          policy_loss: -0.11600267604808323
          total_loss: -0.07249732012860477
          vf_explained_var: 0.7297709584236145
          vf_loss: 0.024593227310106158
    num_agent_steps_sampled: 5768488
    num_steps_sampled: 5768488
    num_steps_trained: 5768488
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1004,43624,5768488,1.92421,1.9784,1.2428,38.1225


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5776480
  custom_metrics: {}
  date: 2021-12-10_01-09-46
  done: false
  episode_len_mean: 36.81531531531532
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9100324312845867
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 81264
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0642906315624714
          entropy_coeff: 0.0
          kl: 0.018352513201534748
          policy_loss: -0.10983737120113801
          total_loss: -0.06820153207809199
          vf_explained_var: 0.8243004679679871
          vf_loss: 0.023053915647324175
    num_agent_steps_sampled: 5776480
    num_steps_sampled: 5776480
    num_steps_trained: 5776480
  iterations_since_restore: 10
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1005,43671.5,5776480,1.91003,1.9796,-2,36.8153


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5784472
  custom_metrics: {}
  date: 2021-12-10_01-10-32
  done: false
  episode_len_mean: 36.714285714285715
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9095180954251971
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 81474
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0485328640788794
          entropy_coeff: 0.0
          kl: 0.018052852392429486
          policy_loss: -0.1014828845509328
          total_loss: -0.05594404923613183
          vf_explained_var: 0.8191484808921814
          vf_loss: 0.027260323346126825
    num_agent_steps_sampled: 5784472
    num_steps_sampled: 5784472
    num_steps_trained: 5784472
  iterations_since_restore: 11


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1006,43717.6,5784472,1.90952,1.9784,-2,36.7143


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5792464
  custom_metrics: {}
  date: 2021-12-10_01-11-18
  done: false
  episode_len_mean: 38.910526315789475
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8839368393546656
  episode_reward_min: -2.0
  episodes_this_iter: 190
  episodes_total: 81664
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.1130745373666286
          entropy_coeff: 0.0
          kl: 0.018249536224175245
          policy_loss: -0.11249325989047065
          total_loss: -0.0671930646058172
          vf_explained_var: 0.8387680053710938
          vf_loss: 0.026822537824045867
    num_agent_steps_sampled: 5792464
    num_steps_sampled: 5792464
    num_steps_trained: 5792464
  iterations_since_restore: 12


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1007,43764.3,5792464,1.88394,1.9808,-2,38.9105


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5800456
  custom_metrics: {}
  date: 2021-12-10_01-12-04
  done: false
  episode_len_mean: 42.93434343434343
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8765878767678232
  episode_reward_min: -2.0
  episodes_this_iter: 198
  episodes_total: 81862
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.063370168209076
          entropy_coeff: 0.0
          kl: 0.01782748562982306
          policy_loss: -0.1096588721848093
          total_loss: -0.06182471744250506
          vf_explained_var: 0.8026328682899475
          vf_loss: 0.029783826204948127
    num_agent_steps_sampled: 5800456
    num_steps_sampled: 5800456
    num_steps_trained: 5800456
  iterations_since_restore: 13
  n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1008,43809.9,5800456,1.87659,1.9812,-2,42.9343


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5808448
  custom_metrics: {}
  date: 2021-12-10_01-12-51
  done: false
  episode_len_mean: 42.4
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9156133282752263
  episode_reward_min: 0.0
  episodes_this_iter: 210
  episodes_total: 82072
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.046585202217102
          entropy_coeff: 0.0
          kl: 0.018455891957273707
          policy_loss: -0.10846124280942604
          total_loss: -0.06159434717847034
          vf_explained_var: 0.7360044717788696
          vf_loss: 0.028180307243019342
    num_agent_steps_sampled: 5808448
    num_steps_sampled: 5808448
    num_steps_trained: 5808448
  iterations_since_restore: 14
  node_ip: 192.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1009,43856.4,5808448,1.91561,1.9808,0,42.4


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5816440
  custom_metrics: {}
  date: 2021-12-10_01-13-39
  done: false
  episode_len_mean: 33.9954954954955
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9148522513406772
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 82294
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.019116461277008
          entropy_coeff: 0.0
          kl: 0.017631993163377047
          policy_loss: -0.10405980514769908
          total_loss: -0.06121773994527757
          vf_explained_var: 0.7754867076873779
          vf_loss: 0.024989671539515257
    num_agent_steps_sampled: 5816440
    num_steps_sampled: 5816440
    num_steps_trained: 5816440
  iterations_since_restore: 15
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1010,43904.5,5816440,1.91485,1.9812,-2,33.9955


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5824432
  custom_metrics: {}
  date: 2021-12-10_01-14-26
  done: false
  episode_len_mean: 36.78260869565217
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9268425098363904
  episode_reward_min: 1.6167999505996704
  episodes_this_iter: 207
  episodes_total: 82501
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0150794424116611
          entropy_coeff: 0.0
          kl: 0.018784460495226085
          policy_loss: -0.11578886165807489
          total_loss: -0.0734117266983958
          vf_explained_var: 0.722902238368988
          vf_loss: 0.02335786761250347
    num_agent_steps_sampled: 5824432
    num_steps_sampled: 5824432
    num_steps_trained: 5824432
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1011,43951.9,5824432,1.92684,1.9812,1.6168,36.7826


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5832424
  custom_metrics: {}
  date: 2021-12-10_01-15-12
  done: false
  episode_len_mean: 40.24545454545454
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.919920000704852
  episode_reward_min: 0.6388000249862671
  episodes_this_iter: 220
  episodes_total: 82721
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0030655451118946
          entropy_coeff: 0.0
          kl: 0.018652525526704267
          policy_loss: -0.11367283630534075
          total_loss: -0.0712619237601757
          vf_explained_var: 0.7177000045776367
          vf_loss: 0.02352523006265983
    num_agent_steps_sampled: 5832424
    num_steps_sampled: 5832424
    num_steps_trained: 5832424
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1012,43998.1,5832424,1.91992,1.9812,0.6388,40.2455


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5840416
  custom_metrics: {}
  date: 2021-12-10_01-15-59
  done: false
  episode_len_mean: 36.166666666666664
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9094628566787357
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 82931
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0007647313177586
          entropy_coeff: 0.0
          kl: 0.017951982154045254
          policy_loss: -0.10227395594120026
          total_loss: -0.062304648592544254
          vf_explained_var: 0.7845208048820496
          vf_loss: 0.02179292420623824
    num_agent_steps_sampled: 5840416
    num_steps_sampled: 5840416
    num_steps_trained: 5840416
  iterations_since_restore: 18

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1013,44044.8,5840416,1.90946,1.9812,-2,36.1667


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5848408
  custom_metrics: {}
  date: 2021-12-10_01-16-46
  done: false
  episode_len_mean: 35.72522522522522
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.894915311723142
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 83153
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9889937788248062
          entropy_coeff: 0.0
          kl: 0.01716215192573145
          policy_loss: -0.09394551726290956
          total_loss: -0.0505863260186743
          vf_explained_var: 0.8209524154663086
          vf_loss: 0.025982512801419944
    num_agent_steps_sampled: 5848408
    num_steps_sampled: 5848408
    num_steps_trained: 5848408
  iterations_since_restore: 19

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1014,44091.4,5848408,1.89492,1.9772,-2,35.7252


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5856400
  custom_metrics: {}
  date: 2021-12-10_01-17-33
  done: false
  episode_len_mean: 40.22959183673469
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8800142857493187
  episode_reward_min: -2.0
  episodes_this_iter: 196
  episodes_total: 83349
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0320536214858294
          entropy_coeff: 0.0
          kl: 0.017590359406312928
          policy_loss: -0.09702434078644728
          total_loss: -0.04631129847257398
          vf_explained_var: 0.7846819162368774
          vf_loss: 0.03290280344663188
    num_agent_steps_sampled: 5856400
    num_steps_sampled: 5856400
    num_steps_trained: 5856400
  iterations_since_restore: 20
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1015,44138.1,5856400,1.88001,1.9816,-2,40.2296


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5864392
  custom_metrics: {}
  date: 2021-12-10_01-18-21
  done: false
  episode_len_mean: 36.180952380952384
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9099923798016138
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 83559
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0807524882256985
          entropy_coeff: 0.0
          kl: 0.018456952064298093
          policy_loss: -0.11371903502731584
          total_loss: -0.0704528548521921
          vf_explained_var: 0.8018041253089905
          vf_loss: 0.024578515090979636
    num_agent_steps_sampled: 5864392
    num_steps_sampled: 5864392
    num_steps_trained: 5864392
  iterations_since_restore: 21
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1016,44186.1,5864392,1.90999,1.9792,-2,36.181


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5872384
  custom_metrics: {}
  date: 2021-12-10_01-19-08
  done: false
  episode_len_mean: 38.23444976076555
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.923942584740488
  episode_reward_min: 1.159600019454956
  episodes_this_iter: 209
  episodes_total: 83768
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0262044947594404
          entropy_coeff: 0.0
          kl: 0.017823523114202544
          policy_loss: -0.10593803771189414
          total_loss: -0.06361183212720789
          vf_explained_var: 0.8016210198402405
          vf_loss: 0.024279885576106608
    num_agent_steps_sampled: 5872384
    num_steps_sampled: 5872384
    num_steps_trained: 5872384
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1017,44233,5872384,1.92394,1.9784,1.1596,38.2344


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5880376
  custom_metrics: {}
  date: 2021-12-10_01-19-55
  done: false
  episode_len_mean: 34.757709251101325
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9143048467089951
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 83995
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0178328435868025
          entropy_coeff: 0.0
          kl: 0.01863026956561953
          policy_loss: -0.10166655562352389
          total_loss: -0.062331797496881336
          vf_explained_var: 0.8253161907196045
          vf_loss: 0.020471611933317035
    num_agent_steps_sampled: 5880376
    num_steps_sampled: 5880376
    num_steps_trained: 5880376
  iterations_since_restore: 23


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1018,44280.5,5880376,1.9143,1.9792,-2,34.7577


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5888368
  custom_metrics: {}
  date: 2021-12-10_01-20-42
  done: false
  episode_len_mean: 38.77056277056277
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9064675301184386
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 84226
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9985475763678551
          entropy_coeff: 0.0
          kl: 0.018387519579846412
          policy_loss: -0.10893464740365744
          total_loss: -0.065170394256711
          vf_explained_var: 0.7904099225997925
          vf_loss: 0.025146887404844165
    num_agent_steps_sampled: 5888368
    num_steps_sampled: 5888368
    num_steps_trained: 5888368
  iterations_since_restore: 24
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1019,44327.3,5888368,1.90647,1.9808,-2,38.7706


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5896360
  custom_metrics: {}
  date: 2021-12-10_01-21-28
  done: false
  episode_len_mean: 37.32142857142857
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8862795927086655
  episode_reward_min: -2.0
  episodes_this_iter: 196
  episodes_total: 84422
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.071067864075303
          entropy_coeff: 0.0
          kl: 0.017406209895852953
          policy_loss: -0.10864941446925513
          total_loss: -0.06312946049729362
          vf_explained_var: 0.7914538979530334
          vf_loss: 0.02789616899099201
    num_agent_steps_sampled: 5896360
    num_steps_sampled: 5896360
    num_steps_trained: 5896360
  iterations_since_restore: 25
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1020,44373.7,5896360,1.88628,1.9808,-2,37.3214


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5904352
  custom_metrics: {}
  date: 2021-12-10_01-22-15
  done: false
  episode_len_mean: 41.8
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.916824612250695
  episode_reward_min: 1.3248000144958496
  episodes_this_iter: 195
  episodes_total: 84617
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0632311552762985
          entropy_coeff: 0.0
          kl: 0.018578651477582753
          policy_loss: -0.11563114242744632
          total_loss: -0.07281996094388887
          vf_explained_var: 0.7697544693946838
          vf_loss: 0.024000295612495393
    num_agent_steps_sampled: 5904352
    num_steps_sampled: 5904352
    num_steps_trained: 5904352
  iterations_since_restore: 26


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1021,44420.8,5904352,1.91682,1.9808,1.3248,41.8


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5912344
  custom_metrics: {}
  date: 2021-12-10_01-23-02
  done: false
  episode_len_mean: 39.23671497584541
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9032811594470112
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 84824
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0057091992348433
          entropy_coeff: 0.0
          kl: 0.01776404949487187
          policy_loss: -0.10639372089644894
          total_loss: -0.0653108146507293
          vf_explained_var: 0.732229471206665
          vf_loss: 0.02309680636972189
    num_agent_steps_sampled: 5912344
    num_steps_sampled: 5912344
    num_steps_trained: 5912344
  iterations_since_restore: 27

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1022,44467.2,5912344,1.90328,1.9792,-2,39.2367


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5920336
  custom_metrics: {}
  date: 2021-12-10_01-23-49
  done: false
  episode_len_mean: 36.61009174311926
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9271577984914867
  episode_reward_min: 1.4564000368118286
  episodes_this_iter: 218
  episodes_total: 85042
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.012873712927103
          entropy_coeff: 0.0
          kl: 0.018895983026595786
          policy_loss: -0.11182854371145368
          total_loss: -0.07166257925564423
          vf_explained_var: 0.7498302459716797
          vf_loss: 0.021033781173173338
    num_agent_steps_sampled: 5920336
    num_steps_sampled: 5920336
    num_steps_trained: 5920336
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1023,44513.9,5920336,1.92716,1.9792,1.4564,36.6101


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5928328
  custom_metrics: {}
  date: 2021-12-10_01-24-36
  done: false
  episode_len_mean: 35.96506550218341
  episode_media: {}
  episode_reward_max: 1.9747999906539917
  episode_reward_mean: 1.8951999984974424
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 85271
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9620555341243744
          entropy_coeff: 0.0
          kl: 0.017523972288472578
          policy_loss: -0.1030589928268455
          total_loss: -0.06390735920285806
          vf_explained_var: 0.7902624607086182
          vf_loss: 0.021408611675724387
    num_agent_steps_sampled: 5928328
    num_steps_sampled: 5928328
    num_steps_trained: 5928328
  iterations_since_restore: 29
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1024,44561,5928328,1.8952,1.9748,-2,35.9651


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5936320
  custom_metrics: {}
  date: 2021-12-10_01-25-22
  done: false
  episode_len_mean: 38.700980392156865
  episode_media: {}
  episode_reward_max: 1.9759999513626099
  episode_reward_mean: 1.923015683889389
  episode_reward_min: 1.5224000215530396
  episodes_this_iter: 204
  episodes_total: 85475
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.007864810526371
          entropy_coeff: 0.0
          kl: 0.0192456558579579
          policy_loss: -0.11660507810302079
          total_loss: -0.0759159837034531
          vf_explained_var: 0.7375292778015137
          vf_loss: 0.021202864998485893
    num_agent_steps_sampled: 5936320
    num_steps_sampled: 5936320
    num_steps_trained: 5936320
  iterations_since_re

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1025,44607.1,5936320,1.92302,1.976,1.5224,38.701


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5944312
  custom_metrics: {}
  date: 2021-12-10_01-26-08
  done: false
  episode_len_mean: 37.24285714285714
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.8701219030788967
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 85685
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0047666300088167
          entropy_coeff: 0.0
          kl: 0.01661148134735413
          policy_loss: -0.09740441551548429
          total_loss: -0.05700118011736777
          vf_explained_var: 0.77046138048172
          vf_loss: 0.02358411205932498
    num_agent_steps_sampled: 5944312
    num_steps_sampled: 5944312
    num_steps_trained: 5944312
  iterations_since_restore: 31

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1026,44653.1,5944312,1.87012,1.9768,-2,37.2429


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5952304
  custom_metrics: {}
  date: 2021-12-10_01-26-54
  done: false
  episode_len_mean: 36.69230769230769
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8743963786379785
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 85906
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9993873592466116
          entropy_coeff: 0.0
          kl: 0.01737115514697507
          policy_loss: -0.09905208009877242
          total_loss: -0.05858933983836323
          vf_explained_var: 0.8247642517089844
          vf_loss: 0.02287444198736921
    num_agent_steps_sampled: 5952304
    num_steps_sampled: 5952304
    num_steps_trained: 5952304
  iterations_since_restore: 32
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1027,44699.3,5952304,1.8744,1.9788,-2,36.6923


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5960296
  custom_metrics: {}
  date: 2021-12-10_01-27-41
  done: false
  episode_len_mean: 33.17004048582996
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.918191092217017
  episode_reward_min: -2.0
  episodes_this_iter: 247
  episodes_total: 86153
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9617864768952131
          entropy_coeff: 0.0
          kl: 0.017335759184788913
          policy_loss: -0.09069593338062987
          total_loss: -0.04942188668064773
          vf_explained_var: 0.7471102476119995
          vf_loss: 0.0237215890083462
    num_agent_steps_sampled: 5960296
    num_steps_sampled: 5960296
    num_steps_trained: 5960296
  iterations_since_restore: 33

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1028,44745.9,5960296,1.91819,1.9792,-2,33.17


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5968288
  custom_metrics: {}
  date: 2021-12-10_01-28-28
  done: false
  episode_len_mean: 32.795081967213115
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9193114775126097
  episode_reward_min: -2.0
  episodes_this_iter: 244
  episodes_total: 86397
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9952411074191332
          entropy_coeff: 0.0
          kl: 0.018016246322076768
          policy_loss: -0.10542930031078868
          total_loss: -0.06559044079040177
          vf_explained_var: 0.7817518711090088
          vf_loss: 0.02159741031937301
    num_agent_steps_sampled: 5968288
    num_steps_sampled: 5968288
    num_steps_trained: 5968288
  iterations_since_restore: 34


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1029,44793.2,5968288,1.91931,1.9844,-2,32.7951


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5976280
  custom_metrics: {}
  date: 2021-12-10_01-29-16
  done: false
  episode_len_mean: 32.65863453815261
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9350811262207337
  episode_reward_min: 1.6335999965667725
  episodes_this_iter: 249
  episodes_total: 86646
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9800197146832943
          entropy_coeff: 0.0
          kl: 0.01811987452674657
          policy_loss: -0.11140722804702818
          total_loss: -0.07069516205228865
          vf_explained_var: 0.734750509262085
          vf_loss: 0.02236569338128902
    num_agent_steps_sampled: 5976280
    num_steps_sampled: 5976280
    num_steps_trained: 5976280
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1030,44841.2,5976280,1.93508,1.9844,1.6336,32.6586


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5984272
  custom_metrics: {}
  date: 2021-12-10_01-30-04
  done: false
  episode_len_mean: 36.30733944954128
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8930385287748563
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 86864
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9971126858144999
          entropy_coeff: 0.0
          kl: 0.017443061457015574
          policy_loss: -0.10454069936531596
          total_loss: -0.0584635615850857
          vf_explained_var: 0.7406260967254639
          vf_loss: 0.028416037966962904
    num_agent_steps_sampled: 5984272
    num_steps_sampled: 5984272
    num_steps_trained: 5984272
  iterations_since_restore: 36
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1031,44888.7,5984272,1.89304,1.9844,-2,36.3073


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 5992264
  custom_metrics: {}
  date: 2021-12-10_01-30-50
  done: false
  episode_len_mean: 35.95475113122172
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9107782797576076
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 87085
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9697208143770695
          entropy_coeff: 0.0
          kl: 0.017685219791019335
          policy_loss: -0.1030026965891011
          total_loss: -0.06066233244200703
          vf_explained_var: 0.7263977527618408
          vf_loss: 0.02443407755345106
    num_agent_steps_sampled: 5992264
    num_steps_sampled: 5992264
    num_steps_trained: 5992264
  iterations_since_restore: 37
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1032,44935.4,5992264,1.91078,1.9844,-2,35.9548


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6000256
  custom_metrics: {}
  date: 2021-12-10_01-31-36
  done: false
  episode_len_mean: 35.78538812785388
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9287598154860544
  episode_reward_min: 1.6332000494003296
  episodes_this_iter: 219
  episodes_total: 87304
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0132122412323952
          entropy_coeff: 0.0
          kl: 0.01891719549894333
          policy_loss: -0.11172052816255018
          total_loss: -0.07014748553046957
          vf_explained_var: 0.7213050127029419
          vf_loss: 0.022419384389650077
    num_agent_steps_sampled: 6000256
    num_steps_sampled: 6000256
    num_steps_trained: 6000256
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1033,44981.2,6000256,1.92876,1.98,1.6332,35.7854


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6008248
  custom_metrics: {}
  date: 2021-12-10_01-32-23
  done: false
  episode_len_mean: 35.246808510638296
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9165719148960518
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 87539
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9458953868597746
          entropy_coeff: 0.0
          kl: 0.018546921608503908
          policy_loss: -0.10841190139763057
          total_loss: -0.07047882190090604
          vf_explained_var: 0.775010883808136
          vf_loss: 0.01915431849192828
    num_agent_steps_sampled: 6008248
    num_steps_sampled: 6008248
    num_steps_trained: 6008248
  iterations_since_restore: 39
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1034,45027.4,6008248,1.91657,1.9824,-2,35.2468


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6016240
  custom_metrics: {}
  date: 2021-12-10_01-33-09
  done: false
  episode_len_mean: 34.21800947867298
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9133402821012018
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 87750
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0324729923158884
          entropy_coeff: 0.0
          kl: 0.018131431163055822
          policy_loss: -0.10600996564608067
          total_loss: -0.065065000904724
          vf_explained_var: 0.7803253531455994
          vf_loss: 0.0225868909037672
    num_agent_steps_sampled: 6016240
    num_steps_sampled: 6016240
    num_steps_trained: 6016240
  iterations_since_restore: 40

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1035,45073.5,6016240,1.91334,1.9824,-2,34.218


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6024232
  custom_metrics: {}
  date: 2021-12-10_01-33-55
  done: false
  episode_len_mean: 41.24226804123711
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9006268037963159
  episode_reward_min: -2.0
  episodes_this_iter: 194
  episodes_total: 87944
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0227320175617933
          entropy_coeff: 0.0
          kl: 0.018387126503512263
          policy_loss: -0.1081859883852303
          total_loss: -0.06857628532452509
          vf_explained_var: 0.7965583801269531
          vf_loss: 0.02099273505154997
    num_agent_steps_sampled: 6024232
    num_steps_sampled: 6024232
    num_steps_trained: 6024232
  iterations_since_restore: 41
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1036,45120,6024232,1.90063,1.9788,-2,41.2423


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6032224
  custom_metrics: {}
  date: 2021-12-10_01-34-41
  done: false
  episode_len_mean: 38.63775510204081
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9232081637090566
  episode_reward_min: 1.6247999668121338
  episodes_this_iter: 196
  episodes_total: 88140
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0462430361658335
          entropy_coeff: 0.0
          kl: 0.01819809028529562
          policy_loss: -0.10647708301985404
          total_loss: -0.06677494125324301
          vf_explained_var: 0.7853952646255493
          vf_loss: 0.021276575163938105
    num_agent_steps_sampled: 6032224
    num_steps_sampled: 6032224
    num_steps_trained: 6032224
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1037,45166,6032224,1.92321,1.9824,1.6248,38.6378


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6040216
  custom_metrics: {}
  date: 2021-12-10_01-35-28
  done: false
  episode_len_mean: 38.399103139013455
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9236035824089306
  episode_reward_min: 0.46239998936653137
  episodes_this_iter: 223
  episodes_total: 88363
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0038555786013603
          entropy_coeff: 0.0
          kl: 0.019196305132936686
          policy_loss: -0.11445092805661261
          total_loss: -0.07254827822907828
          vf_explained_var: 0.7635290622711182
          vf_loss: 0.022466390510089695
    num_agent_steps_sampled: 6040216
    num_steps_sampled: 6040216
    num_steps_trained: 6040216
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1038,45212.7,6040216,1.9236,1.9824,0.4624,38.3991


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6048208
  custom_metrics: {}
  date: 2021-12-10_01-36-14
  done: false
  episode_len_mean: 36.101694915254235
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.9281966059894886
  episode_reward_min: 0.8863999843597412
  episodes_this_iter: 236
  episodes_total: 88599
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9641498643904924
          entropy_coeff: 0.0
          kl: 0.018694664991926402
          policy_loss: -0.1114369205897674
          total_loss: -0.07223502299166285
          vf_explained_var: 0.7363704442977905
          vf_loss: 0.02027355070458725
    num_agent_steps_sampled: 6048208
    num_steps_sampled: 6048208
    num_steps_trained: 6048208
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1039,45258.5,6048208,1.9282,1.9768,0.8864,36.1017


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6056200
  custom_metrics: {}
  date: 2021-12-10_01-37-00
  done: false
  episode_len_mean: 33.586776859504134
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9170826481393546
  episode_reward_min: -2.0
  episodes_this_iter: 242
  episodes_total: 88841
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9685062244534492
          entropy_coeff: 0.0
          kl: 0.01794059795793146
          policy_loss: -0.09815444238483906
          total_loss: -0.058516287070233375
          vf_explained_var: 0.7578401565551758
          vf_loss: 0.021473299129866064
    num_agent_steps_sampled: 6056200
    num_steps_sampled: 6056200
    num_steps_trained: 6056200
  iterations_since_restore: 45


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1040,45304.4,6056200,1.91708,1.9796,-2,33.5868


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6064192
  custom_metrics: {}
  date: 2021-12-10_01-37-46
  done: false
  episode_len_mean: 36.689320388349515
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9270077700753814
  episode_reward_min: 1.496000051498413
  episodes_this_iter: 206
  episodes_total: 89047
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0229430496692657
          entropy_coeff: 0.0
          kl: 0.018446095986291766
          policy_loss: -0.11298923083813861
          total_loss: -0.07139997860940639
          vf_explained_var: 0.7568527460098267
          vf_loss: 0.022912580403499305
    num_agent_steps_sampled: 6064192
    num_steps_sampled: 6064192
    num_steps_trained: 6064192
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1041,45350.3,6064192,1.92701,1.9788,1.496,36.6893


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6072184
  custom_metrics: {}
  date: 2021-12-10_01-38-32
  done: false
  episode_len_mean: 38.666666666666664
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8860289871980602
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 89254
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0066633764654398
          entropy_coeff: 0.0
          kl: 0.01795990887330845
          policy_loss: -0.10131422674749047
          total_loss: -0.05699721461132867
          vf_explained_var: 0.7776797413825989
          vf_loss: 0.026132602302823216
    num_agent_steps_sampled: 6072184
    num_steps_sampled: 6072184
    num_steps_trained: 6072184
  iterations_since_restore: 47
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1042,45396.4,6072184,1.88603,1.9832,-2,38.6667


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6080176
  custom_metrics: {}
  date: 2021-12-10_01-39-19
  done: false
  episode_len_mean: 38.0
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8875196292021563
  episode_reward_min: -2.0
  episodes_this_iter: 214
  episodes_total: 89468
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9810428787022829
          entropy_coeff: 0.0
          kl: 0.017259511136217043
          policy_loss: -0.09602032124530524
          total_loss: -0.048711561161326244
          vf_explained_var: 0.7183359861373901
          vf_loss: 0.02983350254362449
    num_agent_steps_sampled: 6080176
    num_steps_sampled: 6080176
    num_steps_trained: 6080176
  iterations_since_restore: 48
  node_ip: 192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1043,45443.3,6080176,1.88752,1.9832,-2,38


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6088168
  custom_metrics: {}
  date: 2021-12-10_01-40-05
  done: false
  episode_len_mean: 34.325892857142854
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.915914284331458
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 89692
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0092718843370676
          entropy_coeff: 0.0
          kl: 0.017753653693944216
          policy_loss: -0.10954835813026875
          total_loss: -0.0668368642218411
          vf_explained_var: 0.8220046758651733
          vf_loss: 0.024735920422244817
    num_agent_steps_sampled: 6088168
    num_steps_sampled: 6088168
    num_steps_trained: 6088168
  iterations_since_restore: 49
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1044,45489.6,6088168,1.91591,1.9832,-2,34.3259


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6096160
  custom_metrics: {}
  date: 2021-12-10_01-40-51
  done: false
  episode_len_mean: 37.26704545454545
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9257795424623922
  episode_reward_min: 1.7007999420166016
  episodes_this_iter: 176
  episodes_total: 89868
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.1112277992069721
          entropy_coeff: 0.0
          kl: 0.01839311682851985
          policy_loss: -0.12057649536291137
          total_loss: -0.07861516282719094
          vf_explained_var: 0.8044784069061279
          vf_loss: 0.02333829994313419
    num_agent_steps_sampled: 6096160
    num_steps_sampled: 6096160
    num_steps_trained: 6096160
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1045,45535.7,6096160,1.92578,1.9824,1.7008,37.267


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6104152
  custom_metrics: {}
  date: 2021-12-10_01-41-37
  done: false
  episode_len_mean: 41.515306122448976
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8787795898257469
  episode_reward_min: -2.0
  episodes_this_iter: 196
  episodes_total: 90064
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.107487052679062
          entropy_coeff: 0.0
          kl: 0.01777533272979781
          policy_loss: -0.10614582290872931
          total_loss: -0.0641325595206581
          vf_explained_var: 0.8645757436752319
          vf_loss: 0.024015736766159534
    num_agent_steps_sampled: 6104152
    num_steps_sampled: 6104152
    num_steps_trained: 6104152
  iterations_since_restore: 51

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1046,45581.5,6104152,1.87878,1.9832,-2,41.5153


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6112144
  custom_metrics: {}
  date: 2021-12-10_01-42-23
  done: false
  episode_len_mean: 40.95631067961165
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9184776751451122
  episode_reward_min: 0.39879998564720154
  episodes_this_iter: 206
  episodes_total: 90270
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.005672050639987
          entropy_coeff: 0.0
          kl: 0.01860074361320585
          policy_loss: -0.1224848689744249
          total_loss: -0.07436192469322123
          vf_explained_var: 0.7876872420310974
          vf_loss: 0.029289689322467893
    num_agent_steps_sampled: 6112144
    num_steps_sampled: 6112144
    num_steps_trained: 6112144
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1047,45627.9,6112144,1.91848,1.9836,0.3988,40.9563


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6120136
  custom_metrics: {}
  date: 2021-12-10_01-43-09
  done: false
  episode_len_mean: 40.0353982300885
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8866707967445913
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 90496
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9884199891239405
          entropy_coeff: 0.0
          kl: 0.017148445884231478
          policy_loss: -0.10082578915171325
          total_loss: -0.059356895537348464
          vf_explained_var: 0.8391370177268982
          vf_loss: 0.02410609007347375
    num_agent_steps_sampled: 6120136
    num_steps_sampled: 6120136
    num_steps_trained: 6120136
  iterations_since_restore: 53
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1048,45673.9,6120136,1.88667,1.9848,-2,40.0354


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6128128
  custom_metrics: {}
  date: 2021-12-10_01-43-56
  done: false
  episode_len_mean: 39.116504854368934
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8674737880530867
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 90702
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0006925910711288
          entropy_coeff: 0.0
          kl: 0.018033761502010748
          policy_loss: -0.10643826017621905
          total_loss: -0.06059187931532506
          vf_explained_var: 0.8669005036354065
          vf_loss: 0.02758719160920009
    num_agent_steps_sampled: 6128128
    num_steps_sampled: 6128128
    num_steps_trained: 6128128
  iterations_since_restore: 54


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1049,45720,6128128,1.86747,1.9848,-2,39.1165


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6136120
  custom_metrics: {}
  date: 2021-12-10_01-44-42
  done: false
  episode_len_mean: 37.03030303030303
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9263010097272468
  episode_reward_min: 1.6064000129699707
  episodes_this_iter: 198
  episodes_total: 90900
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9998316317796707
          entropy_coeff: 0.0
          kl: 0.018209014495369047
          policy_loss: -0.10833571296598166
          total_loss: -0.06542658049147576
          vf_explained_var: 0.7423131465911865
          vf_loss: 0.024472503049764782
    num_agent_steps_sampled: 6136120
    num_steps_sampled: 6136120
    num_steps_trained: 6136120
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1050,45766.5,6136120,1.9263,1.9796,1.6064,37.0303


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6144112
  custom_metrics: {}
  date: 2021-12-10_01-45-29
  done: false
  episode_len_mean: 42.243781094527364
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9159164179616899
  episode_reward_min: 1.4615999460220337
  episodes_this_iter: 201
  episodes_total: 91101
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0400596652179956
          entropy_coeff: 0.0
          kl: 0.019881998596247286
          policy_loss: -0.10830101900501177
          total_loss: -0.06235101752099581
          vf_explained_var: 0.7259507179260254
          vf_loss: 0.025819481117650867
    num_agent_steps_sampled: 6144112
    num_steps_sampled: 6144112
    num_steps_trained: 6144112
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1051,45813.1,6144112,1.91592,1.9804,1.4616,42.2438


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6152104
  custom_metrics: {}
  date: 2021-12-10_01-46-15
  done: false
  episode_len_mean: 39.70754716981132
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8853150982901734
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 91313
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.003924747928977
          entropy_coeff: 0.0
          kl: 0.0176186315366067
          policy_loss: -0.10627116332761943
          total_loss: -0.06308256753254682
          vf_explained_var: 0.8367380499839783
          vf_loss: 0.0253497296362184
    num_agent_steps_sampled: 6152104
    num_steps_sampled: 6152104
    num_steps_trained: 6152104
  iterations_since_restore: 57

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1052,45859,6152104,1.88532,1.9836,-2,39.7075


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6160096
  custom_metrics: {}
  date: 2021-12-10_01-47-01
  done: false
  episode_len_mean: 33.97826086956522
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9153286980546038
  episode_reward_min: -2.0
  episodes_this_iter: 230
  episodes_total: 91543
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9816076792776585
          entropy_coeff: 0.0
          kl: 0.018316273140953854
          policy_loss: -0.10462104616453871
          total_loss: -0.05858201312366873
          vf_explained_var: 0.7454509735107422
          vf_loss: 0.027493808360304683
    num_agent_steps_sampled: 6160096
    num_steps_sampled: 6160096
    num_steps_trained: 6160096
  iterations_since_restore: 58


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1053,45905,6160096,1.91533,1.9808,-2,33.9783


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6168088
  custom_metrics: {}
  date: 2021-12-10_01-47-47
  done: false
  episode_len_mean: 31.75609756097561
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.936837399878153
  episode_reward_min: 1.6360000371932983
  episodes_this_iter: 246
  episodes_total: 91789
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9427994675934315
          entropy_coeff: 0.0
          kl: 0.018111984187271446
          policy_loss: -0.1042520347982645
          total_loss: -0.061117363948142156
          vf_explained_var: 0.7378287315368652
          vf_loss: 0.024796285782940686
    num_agent_steps_sampled: 6168088
    num_steps_sampled: 6168088
    num_steps_trained: 6168088
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1054,45951,6168088,1.93684,1.9796,1.636,31.7561


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6176080
  custom_metrics: {}
  date: 2021-12-10_01-48-32
  done: false
  episode_len_mean: 36.65277777777778
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8910018531260666
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 92005
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0056775640696287
          entropy_coeff: 0.0
          kl: 0.01694532245164737
          policy_loss: -0.09740883612539619
          total_loss: -0.05620755974086933
          vf_explained_var: 0.8240151405334473
          vf_loss: 0.024044137622695416
    num_agent_steps_sampled: 6176080
    num_steps_sampled: 6176080
    num_steps_trained: 6176080
  iterations_since_restore: 60
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1055,45996.7,6176080,1.891,1.9836,-2,36.6528


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6184072
  custom_metrics: {}
  date: 2021-12-10_01-49-19
  done: false
  episode_len_mean: 32.625984251968504
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.919930710567264
  episode_reward_min: -2.0
  episodes_this_iter: 254
  episodes_total: 92259
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.933326717466116
          entropy_coeff: 0.0
          kl: 0.01857047266094014
          policy_loss: -0.10740595089737326
          total_loss: -0.06802064494695514
          vf_explained_var: 0.7321368455886841
          vf_loss: 0.020582700497470796
    num_agent_steps_sampled: 6184072
    num_steps_sampled: 6184072
    num_steps_trained: 6184072
  iterations_since_restore: 61

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1056,46043,6184072,1.91993,1.9796,-2,32.626


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6192064
  custom_metrics: {}
  date: 2021-12-10_01-50-05
  done: false
  episode_len_mean: 34.521929824561404
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9142350873403382
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 92487
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9673377797007561
          entropy_coeff: 0.0
          kl: 0.017521158646559343
          policy_loss: -0.10410328811849467
          total_loss: -0.06275578774511814
          vf_explained_var: 0.774193286895752
          vf_loss: 0.02360732591478154
    num_agent_steps_sampled: 6192064
    num_steps_sampled: 6192064
    num_steps_trained: 6192064
  iterations_since_restore: 62
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1057,46088.8,6192064,1.91424,1.9804,-2,34.5219


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6200056
  custom_metrics: {}
  date: 2021-12-10_01-50-51
  done: false
  episode_len_mean: 35.95939086294416
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9087066002908697
  episode_reward_min: -2.0
  episodes_this_iter: 197
  episodes_total: 92684
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.063371641561389
          entropy_coeff: 0.0
          kl: 0.017426325764972717
          policy_loss: -0.10351689154049382
          total_loss: -0.06484986315263086
          vf_explained_var: 0.8474355340003967
          vf_loss: 0.021022875502239913
    num_agent_steps_sampled: 6200056
    num_steps_sampled: 6200056
    num_steps_trained: 6200056
  iterations_since_restore: 63
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1058,46135,6200056,1.90871,1.9808,-2,35.9594


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6208048
  custom_metrics: {}
  date: 2021-12-10_01-51-37
  done: false
  episode_len_mean: 42.63592233009709
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8958757305608211
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 92890
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9817305561155081
          entropy_coeff: 0.0
          kl: 0.017856471269624308
          policy_loss: -0.10226437484379858
          total_loss: -0.0569160096347332
          vf_explained_var: 0.7106859087944031
          vf_loss: 0.02726868714671582
    num_agent_steps_sampled: 6208048
    num_steps_sampled: 6208048
    num_steps_trained: 6208048
  iterations_since_restore: 64
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1059,46180.8,6208048,1.89588,1.9804,-2,42.6359


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6216040
  custom_metrics: {}
  date: 2021-12-10_01-52-23
  done: false
  episode_len_mean: 34.607142857142854
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9135196411183901
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 93114
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9747814033180475
          entropy_coeff: 0.0
          kl: 0.01833868760149926
          policy_loss: -0.10064494400285184
          total_loss: -0.05692577447916847
          vf_explained_var: 0.7629036903381348
          vf_loss: 0.025151250360067934
    num_agent_steps_sampled: 6216040
    num_steps_sampled: 6216040
    num_steps_trained: 6216040
  iterations_since_restore: 65


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1060,46226.8,6216040,1.91352,1.9804,-2,34.6071


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6224032
  custom_metrics: {}
  date: 2021-12-10_01-53-09
  done: false
  episode_len_mean: 39.175879396984925
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9220301547841212
  episode_reward_min: 1.4651999473571777
  episodes_this_iter: 199
  episodes_total: 93313
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9994831159710884
          entropy_coeff: 0.0
          kl: 0.018367209297139198
          policy_loss: -0.11026757617946714
          total_loss: -0.06852833290759008
          vf_explained_var: 0.7645137310028076
          vf_loss: 0.02314244204899296
    num_agent_steps_sampled: 6224032
    num_steps_sampled: 6224032
    num_steps_trained: 6224032
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1061,46272.8,6224032,1.92203,1.9804,1.4652,39.1759


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6232024
  custom_metrics: {}
  date: 2021-12-10_01-53-55
  done: false
  episode_len_mean: 37.31718061674009
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9257480200166743
  episode_reward_min: 0.9648000001907349
  episodes_this_iter: 227
  episodes_total: 93540
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9802640359848738
          entropy_coeff: 0.0
          kl: 0.01879551768070087
          policy_loss: -0.11380378803005442
          total_loss: -0.072908905451186
          vf_explained_var: 0.8097636699676514
          vf_loss: 0.021864420064957812
    num_agent_steps_sampled: 6232024
    num_steps_sampled: 6232024
    num_steps_trained: 6232024
  iterations_since_re

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1062,46318.6,6232024,1.92575,1.9828,0.9648,37.3172


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6240016
  custom_metrics: {}
  date: 2021-12-10_01-54-41
  done: false
  episode_len_mean: 33.8963963963964
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9146936954678715
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 93762
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0204226467758417
          entropy_coeff: 0.0
          kl: 0.018052399769658223
          policy_loss: -0.10990785056492314
          total_loss: -0.06311206580721773
          vf_explained_var: 0.7890365719795227
          vf_loss: 0.02851773053407669
    num_agent_steps_sampled: 6240016
    num_steps_sampled: 6240016
    num_steps_trained: 6240016
  iterations_since_restore: 68
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1063,46364.7,6240016,1.91469,1.9804,-2,33.8964


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6248008
  custom_metrics: {}
  date: 2021-12-10_01-55-27
  done: false
  episode_len_mean: 40.47115384615385
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9006596138844123
  episode_reward_min: -2.0
  episodes_this_iter: 208
  episodes_total: 93970
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0177917126566172
          entropy_coeff: 0.0
          kl: 0.01801914893439971
          policy_loss: -0.10132273635827005
          total_loss: -0.05145408817770658
          vf_explained_var: 0.7384796142578125
          vf_loss: 0.03162425890332088
    num_agent_steps_sampled: 6248008
    num_steps_sampled: 6248008
    num_steps_trained: 6248008
  iterations_since_restore: 69

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1064,46410.5,6248008,1.90066,1.9832,-2,40.4712


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6256000
  custom_metrics: {}
  date: 2021-12-10_01-56-12
  done: false
  episode_len_mean: 35.79475982532751
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9288366819573282
  episode_reward_min: 1.0424000024795532
  episodes_this_iter: 229
  episodes_total: 94199
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9232778437435627
          entropy_coeff: 0.0
          kl: 0.019413003756199032
          policy_loss: -0.10717619984643534
          total_loss: -0.06203709670808166
          vf_explained_var: 0.7014096975326538
          vf_loss: 0.02548343501985073
    num_agent_steps_sampled: 6256000
    num_steps_sampled: 6256000
    num_steps_trained: 6256000
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1065,46456.1,6256000,1.92884,1.9808,1.0424,35.7948


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6263992
  custom_metrics: {}
  date: 2021-12-10_01-56-58
  done: false
  episode_len_mean: 32.70995670995671
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9350043303006654
  episode_reward_min: 1.673200011253357
  episodes_this_iter: 231
  episodes_total: 94430
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9469859525561333
          entropy_coeff: 0.0
          kl: 0.0192322016810067
          policy_loss: -0.11011616970063187
          total_loss: -0.06720296907587908
          vf_explained_var: 0.7234221696853638
          vf_loss: 0.02344059431925416
    num_agent_steps_sampled: 6263992
    num_steps_sampled: 6263992
    num_steps_trained: 6263992
  iterations_since_re

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1066,46501.9,6263992,1.935,1.9812,1.6732,32.71


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6271984
  custom_metrics: {}
  date: 2021-12-10_01-57-44
  done: false
  episode_len_mean: 38.76525821596244
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9228901406968704
  episode_reward_min: 0.9616000056266785
  episodes_this_iter: 213
  episodes_total: 94643
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9603495318442583
          entropy_coeff: 0.0
          kl: 0.019207318255212158
          policy_loss: -0.11177419481100515
          total_loss: -0.06911348985158838
          vf_explained_var: 0.7327648997306824
          vf_loss: 0.023213295498862863
    num_agent_steps_sampled: 6271984
    num_steps_sampled: 6271984
    num_steps_trained: 6271984
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1067,46548,6271984,1.92289,1.9844,0.9616,38.7653


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6279976
  custom_metrics: {}
  date: 2021-12-10_01-58-31
  done: false
  episode_len_mean: 33.122270742358076
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9003720507351072
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 94872
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9642072040587664
          entropy_coeff: 0.0
          kl: 0.017431566258892417
          policy_loss: -0.10111974406754598
          total_loss: -0.05864823466981761
          vf_explained_var: 0.8021786212921143
          vf_loss: 0.024822050123475492
    num_agent_steps_sampled: 6279976
    num_steps_sampled: 6279976
    num_steps_trained: 6279976
  iterations_since_restore: 73

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1068,46595.1,6279976,1.90037,1.9772,-2,33.1223


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6287968
  custom_metrics: {}
  date: 2021-12-10_01-59-18
  done: false
  episode_len_mean: 38.363207547169814
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9051679282818201
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 95084
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9766252413392067
          entropy_coeff: 0.0
          kl: 0.017734063876559958
          policy_loss: -0.10516495973570272
          total_loss: -0.06145074267988093
          vf_explained_var: 0.7776285409927368
          vf_loss: 0.02575847797561437
    num_agent_steps_sampled: 6287968
    num_steps_sampled: 6287968
    num_steps_trained: 6287968
  iterations_since_restore: 74


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1069,46641.6,6287968,1.90517,1.9844,-2,38.3632


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6295960
  custom_metrics: {}
  date: 2021-12-10_02-00-04
  done: false
  episode_len_mean: 38.96954314720812
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9224710658116995
  episode_reward_min: 1.128000020980835
  episodes_this_iter: 197
  episodes_total: 95281
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0398240704089403
          entropy_coeff: 0.0
          kl: 0.018559456046205014
          policy_loss: -0.11153261989238672
          total_loss: -0.07044234644854441
          vf_explained_var: 0.8251901865005493
          vf_loss: 0.022298824740573764
    num_agent_steps_sampled: 6295960
    num_steps_sampled: 6295960
    num_steps_trained: 6295960
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1070,46687.9,6295960,1.92247,1.9808,1.128,38.9695


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6303952
  custom_metrics: {}
  date: 2021-12-10_02-00-50
  done: false
  episode_len_mean: 39.20673076923077
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.903005769619575
  episode_reward_min: -2.0
  episodes_this_iter: 208
  episodes_total: 95489
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9926503300666809
          entropy_coeff: 0.0
          kl: 0.018727719492744654
          policy_loss: -0.10003049512306461
          total_loss: -0.052606178855057806
          vf_explained_var: 0.7931479215621948
          vf_loss: 0.028462498856242746
    num_agent_steps_sampled: 6303952
    num_steps_sampled: 6303952
    num_steps_trained: 6303952
  iterations_since_restore: 76


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1071,46734,6303952,1.90301,1.9844,-2,39.2067


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6311944
  custom_metrics: {}
  date: 2021-12-10_02-01-36
  done: false
  episode_len_mean: 36.56930693069307
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.89024158456538
  episode_reward_min: -2.0
  episodes_this_iter: 202
  episodes_total: 95691
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9938581306487322
          entropy_coeff: 0.0
          kl: 0.018170993629610166
          policy_loss: -0.09729055329808034
          total_loss: -0.05094369428115897
          vf_explained_var: 0.8327495455741882
          vf_loss: 0.02794872783124447
    num_agent_steps_sampled: 6311944
    num_steps_sampled: 6311944
    num_steps_trained: 6311944
  iterations_since_restore: 77
  n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1072,46779.8,6311944,1.89024,1.9808,-2,36.5693


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6319936
  custom_metrics: {}
  date: 2021-12-10_02-02-22
  done: false
  episode_len_mean: 36.15075376884422
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9280321598052979
  episode_reward_min: 1.6548000574111938
  episodes_this_iter: 199
  episodes_total: 95890
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0453311651945114
          entropy_coeff: 0.0
          kl: 0.019174463057424873
          policy_loss: -0.11409746529534459
          total_loss: -0.07154593514860608
          vf_explained_var: 0.8239549994468689
          vf_loss: 0.02313738700468093
    num_agent_steps_sampled: 6319936
    num_steps_sampled: 6319936
    num_steps_trained: 6319936
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1073,46825.7,6319936,1.92803,1.9796,1.6548,36.1508


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6327928
  custom_metrics: {}
  date: 2021-12-10_02-03-08
  done: false
  episode_len_mean: 44.11616161616162
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9120969651925444
  episode_reward_min: 0.0
  episodes_this_iter: 198
  episodes_total: 96088
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0547564905136824
          entropy_coeff: 0.0
          kl: 0.019337551668286324
          policy_loss: -0.10999998822808266
          total_loss: -0.06827097784844227
          vf_explained_var: 0.8326543569564819
          vf_loss: 0.02214973553782329
    num_agent_steps_sampled: 6327928
    num_steps_sampled: 6327928
    num_steps_trained: 6327928
  iterations_since_restore: 79
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1074,46871.3,6327928,1.9121,1.982,0,44.1162


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6335920
  custom_metrics: {}
  date: 2021-12-10_02-03-54
  done: false
  episode_len_mean: 41.42788461538461
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8990865395619319
  episode_reward_min: -2.0
  episodes_this_iter: 208
  episodes_total: 96296
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0046508330851793
          entropy_coeff: 0.0
          kl: 0.017573528399225324
          policy_loss: -0.10100630472879857
          total_loss: -0.05854479130357504
          vf_explained_var: 0.8221150040626526
          vf_loss: 0.024668316647876054
    num_agent_steps_sampled: 6335920
    num_steps_sampled: 6335920
    num_steps_trained: 6335920
  iterations_since_restore: 80


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1075,46917.6,6335920,1.89909,1.982,-2,41.4279


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6343912
  custom_metrics: {}
  date: 2021-12-10_02-04-40
  done: false
  episode_len_mean: 41.07027027027027
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8569427052059688
  episode_reward_min: -2.0
  episodes_this_iter: 185
  episodes_total: 96481
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0243272352963686
          entropy_coeff: 0.0
          kl: 0.016799456265289336
          policy_loss: -0.10384611863992177
          total_loss: -0.058622149867005646
          vf_explained_var: 0.8025479316711426
          vf_loss: 0.028214519727043808
    num_agent_steps_sampled: 6343912
    num_steps_sampled: 6343912
    num_steps_trained: 6343912
  iterations_since_restore: 81


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1076,46963.7,6343912,1.85694,1.9816,-2,41.0703


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6351904
  custom_metrics: {}
  date: 2021-12-10_02-05-27
  done: false
  episode_len_mean: 42.66842105263158
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9150505310610721
  episode_reward_min: 1.5687999725341797
  episodes_this_iter: 190
  episodes_total: 96671
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0542019568383694
          entropy_coeff: 0.0
          kl: 0.018297406495548785
          policy_loss: -0.10744378704112023
          total_loss: -0.06354142751661129
          vf_explained_var: 0.7626065015792847
          vf_loss: 0.02537623356329277
    num_agent_steps_sampled: 6351904
    num_steps_sampled: 6351904
    num_steps_trained: 6351904
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1077,47010.1,6351904,1.91505,1.982,1.5688,42.6684


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6359896
  custom_metrics: {}
  date: 2021-12-10_02-06-12
  done: false
  episode_len_mean: 45.84782608695652
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.908708696132121
  episode_reward_min: 0.0
  episodes_this_iter: 184
  episodes_total: 96855
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0285164210945368
          entropy_coeff: 0.0
          kl: 0.019302607630379498
          policy_loss: -0.11568084033206105
          total_loss: -0.06930204780655913
          vf_explained_var: 0.7371119260787964
          vf_loss: 0.026834903052076697
    num_agent_steps_sampled: 6359896
    num_steps_sampled: 6359896
    num_steps_trained: 6359896
  iterations_since_restore: 83
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1078,47055.9,6359896,1.90871,1.982,0,45.8478


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6367888
  custom_metrics: {}
  date: 2021-12-10_02-06-59
  done: false
  episode_len_mean: 39.447236180904525
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9215035198920936
  episode_reward_min: 1.2740000486373901
  episodes_this_iter: 199
  episodes_total: 97054
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0293164998292923
          entropy_coeff: 0.0
          kl: 0.019763450138270855
          policy_loss: -0.11372861615382135
          total_loss: -0.07064838262158446
          vf_explained_var: 0.80460125207901
          vf_loss: 0.023069741029758006
    num_agent_steps_sampled: 6367888
    num_steps_sampled: 6367888
    num_steps_trained: 6367888
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1079,47102,6367888,1.9215,1.982,1.274,39.4472


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6375880
  custom_metrics: {}
  date: 2021-12-10_02-07-45
  done: false
  episode_len_mean: 44.70224719101124
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8682471915577235
  episode_reward_min: -2.0
  episodes_this_iter: 178
  episodes_total: 97232
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0216359160840511
          entropy_coeff: 0.0
          kl: 0.018905856180936098
          policy_loss: -0.10844641225412488
          total_loss: -0.06201130137196742
          vf_explained_var: 0.7749755382537842
          vf_loss: 0.027292932732962072
    num_agent_steps_sampled: 6375880
    num_steps_sampled: 6375880
    num_steps_trained: 6375880
  iterations_since_restore: 85


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1080,47148.4,6375880,1.86825,1.982,-2,44.7022


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6383872
  custom_metrics: {}
  date: 2021-12-10_02-08-32
  done: false
  episode_len_mean: 42.49514563106796
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.896223301447711
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 97438
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9971543475985527
          entropy_coeff: 0.0
          kl: 0.018422459193971008
          policy_loss: -0.10835657885763794
          total_loss: -0.06292240982293151
          vf_explained_var: 0.7402636408805847
          vf_loss: 0.02678142476361245
    num_agent_steps_sampled: 6383872
    num_steps_sampled: 6383872
    num_steps_trained: 6383872
  iterations_since_restore: 86
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1081,47195.5,6383872,1.89622,1.982,-2,42.4951


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6391864
  custom_metrics: {}
  date: 2021-12-10_02-09-18
  done: false
  episode_len_mean: 37.10679611650485
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9093436893907565
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 97644
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9626598227769136
          entropy_coeff: 0.0
          kl: 0.019651872396934777
          policy_loss: -0.11394471419043839
          total_loss: -0.07189595804084092
          vf_explained_var: 0.7816704511642456
          vf_loss: 0.02215123304631561
    num_agent_steps_sampled: 6391864
    num_steps_sampled: 6391864
    num_steps_trained: 6391864
  iterations_since_restore: 87
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1082,47241.7,6391864,1.90934,1.982,-2,37.1068


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6399856
  custom_metrics: {}
  date: 2021-12-10_02-10-05
  done: false
  episode_len_mean: 36.53603603603604
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9111333344433759
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 97866
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9424702320247889
          entropy_coeff: 0.0
          kl: 0.01829329802421853
          policy_loss: -0.10511166707146913
          total_loss: -0.0649032664950937
          vf_explained_var: 0.7701067924499512
          vf_loss: 0.021686434338334948
    num_agent_steps_sampled: 6399856
    num_steps_sampled: 6399856
    num_steps_trained: 6399856
  iterations_since_restore: 88
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1083,47288.8,6399856,1.91113,1.982,-2,36.536


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6407848
  custom_metrics: {}
  date: 2021-12-10_02-10-53
  done: false
  episode_len_mean: 36.64732142857143
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9096482156642847
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 98090
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9345237351953983
          entropy_coeff: 0.0
          kl: 0.017760701302904636
          policy_loss: -0.09339207044104114
          total_loss: -0.04945335540105589
          vf_explained_var: 0.7769970893859863
          vf_loss: 0.02595600363565609
    num_agent_steps_sampled: 6407848
    num_steps_sampled: 6407848
    num_steps_trained: 6407848
  iterations_since_restore: 89
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1084,47336.5,6407848,1.90965,1.9808,-2,36.6473


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6415840
  custom_metrics: {}
  date: 2021-12-10_02-11-41
  done: false
  episode_len_mean: 35.47290640394089
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9294699530296138
  episode_reward_min: 1.5568000078201294
  episodes_this_iter: 203
  episodes_total: 98293
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9697002936154604
          entropy_coeff: 0.0
          kl: 0.01893040252616629
          policy_loss: -0.10834868653910235
          total_loss: -0.06626243249047548
          vf_explained_var: 0.7935038805007935
          vf_loss: 0.02291921799769625
    num_agent_steps_sampled: 6415840
    num_steps_sampled: 6415840
    num_steps_trained: 6415840
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1085,47383.9,6415840,1.92947,1.9804,1.5568,35.4729


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6423832
  custom_metrics: {}
  date: 2021-12-10_02-12-28
  done: false
  episode_len_mean: 38.81220657276995
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9050610311714136
  episode_reward_min: -2.0
  episodes_this_iter: 213
  episodes_total: 98506
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9539393242448568
          entropy_coeff: 0.0
          kl: 0.01871835795463994
          policy_loss: -0.10617696316330694
          total_loss: -0.06375399109674618
          vf_explained_var: 0.7992234230041504
          vf_loss: 0.023470635467674583
    num_agent_steps_sampled: 6423832
    num_steps_sampled: 6423832
    num_steps_trained: 6423832
  iterations_since_restore: 91
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1086,47430.9,6423832,1.90506,1.9828,-2,38.8122


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6431824
  custom_metrics: {}
  date: 2021-12-10_02-13-15
  done: false
  episode_len_mean: 34.2152466367713
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9151282497585622
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 98729
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9399244878441095
          entropy_coeff: 0.0
          kl: 0.018964550661621615
          policy_loss: -0.10345530114136636
          total_loss: -0.06269272441568319
          vf_explained_var: 0.8085314631462097
          vf_loss: 0.021560968569247052
    num_agent_steps_sampled: 6431824
    num_steps_sampled: 6431824
    num_steps_trained: 6431824
  iterations_since_restore: 92
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1087,47478,6431824,1.91513,1.982,-2,34.2152


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6439816
  custom_metrics: {}
  date: 2021-12-10_02-14-02
  done: false
  episode_len_mean: 35.21186440677966
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8805322040945798
  episode_reward_min: -2.0
  episodes_this_iter: 236
  episodes_total: 98965
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9205121211707592
          entropy_coeff: 0.0
          kl: 0.016050451376941055
          policy_loss: -0.09312607545871288
          total_loss: -0.05091890886251349
          vf_explained_var: 0.8223755955696106
          vf_loss: 0.02595608407864347
    num_agent_steps_sampled: 6439816
    num_steps_sampled: 6439816
    num_steps_trained: 6439816
  iterations_since_restore: 93
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1088,47525.3,6439816,1.88053,1.9824,-2,35.2119


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6447808
  custom_metrics: {}
  date: 2021-12-10_02-14-49
  done: false
  episode_len_mean: 32.44017094017094
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9189333273814275
  episode_reward_min: -2.0
  episodes_this_iter: 234
  episodes_total: 99199
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9248621203005314
          entropy_coeff: 0.0
          kl: 0.018145304755307734
          policy_loss: -0.10018338475492783
          total_loss: -0.06025834349566139
          vf_explained_var: 0.783219575881958
          vf_loss: 0.021552915626671165
    num_agent_steps_sampled: 6447808
    num_steps_sampled: 6447808
    num_steps_trained: 6447808
  iterations_since_restore: 94
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1089,47571.8,6447808,1.91893,1.9824,-2,32.4402


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6455800
  custom_metrics: {}
  date: 2021-12-10_02-15-36
  done: false
  episode_len_mean: 37.327354260089685
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9085345257558095
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 99422
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9626981653273106
          entropy_coeff: 0.0
          kl: 0.018320777569897473
          policy_loss: -0.10414063328062184
          total_loss: -0.06187354662688449
          vf_explained_var: 0.849219024181366
          vf_loss: 0.023717297473922372
    num_agent_steps_sampled: 6455800
    num_steps_sampled: 6455800
    num_steps_trained: 6455800
  iterations_since_restore: 95


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1090,47618.8,6455800,1.90853,1.9848,-2,37.3274


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6463792
  custom_metrics: {}
  date: 2021-12-10_02-16-22
  done: false
  episode_len_mean: 39.15686274509804
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9032803931657005
  episode_reward_min: -2.0
  episodes_this_iter: 204
  episodes_total: 99626
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0370619222521782
          entropy_coeff: 0.0
          kl: 0.019317100814078003
          policy_loss: -0.10801930702291429
          total_loss: -0.06260654875950422
          vf_explained_var: 0.8301160335540771
          vf_loss: 0.025854191393591464
    num_agent_steps_sampled: 6463792
    num_steps_sampled: 6463792
    num_steps_trained: 6463792
  iterations_since_restore: 96


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1091,47665,6463792,1.90328,1.9848,-2,39.1569


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6471784
  custom_metrics: {}
  date: 2021-12-10_02-17-09
  done: false
  episode_len_mean: 41.51
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9173480010032653
  episode_reward_min: 1.24399995803833
  episodes_this_iter: 200
  episodes_total: 99826
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9926967099308968
          entropy_coeff: 0.0
          kl: 0.0189436980872415
          policy_loss: -0.11399014020571485
          total_loss: -0.06915912264958024
          vf_explained_var: 0.7820996046066284
          vf_loss: 0.02565052261343226
    num_agent_steps_sampled: 6471784
    num_steps_sampled: 6471784
    num_steps_trained: 6471784
  iterations_since_restore: 97
  n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1092,47712,6471784,1.91735,1.9844,1.244,41.51


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6479776
  custom_metrics: {}
  date: 2021-12-10_02-17-55
  done: false
  episode_len_mean: 36.3469387755102
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9276734724336742
  episode_reward_min: 1.6548000574111938
  episodes_this_iter: 196
  episodes_total: 100022
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0009580478072166
          entropy_coeff: 0.0
          kl: 0.019633038144093007
          policy_loss: -0.1115752513287589
          total_loss: -0.06586794319446199
          vf_explained_var: 0.7833129167556763
          vf_loss: 0.025828854355495423
    num_agent_steps_sampled: 6479776
    num_steps_sampled: 6479776
    num_steps_trained: 6479776

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1093,47758.4,6479776,1.92767,1.9824,1.6548,36.3469


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6487768
  custom_metrics: {}
  date: 2021-12-10_02-18-42
  done: false
  episode_len_mean: 35.049107142857146
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9303642878574985
  episode_reward_min: 1.3616000413894653
  episodes_this_iter: 224
  episodes_total: 100246
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0047542564570904
          entropy_coeff: 0.0
          kl: 0.019523062917869538
          policy_loss: -0.10740361566422507
          total_loss: -0.06696639143046923
          vf_explained_var: 0.8384959697723389
          vf_loss: 0.020670122088631615
    num_agent_steps_sampled: 6487768
    num_steps_sampled: 6487768
    num_steps_trained: 6487768

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1094,47805.1,6487768,1.93036,1.9844,1.3616,35.0491


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6495760
  custom_metrics: {}
  date: 2021-12-10_02-19-29
  done: false
  episode_len_mean: 40.171875
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9200041654209297
  episode_reward_min: 1.0628000497817993
  episodes_this_iter: 192
  episodes_total: 100438
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0432093553245068
          entropy_coeff: 0.0
          kl: 0.0192903017741628
          policy_loss: -0.10794768313644454
          total_loss: -0.06669682145002298
          vf_explained_var: 0.8328326344490051
          vf_loss: 0.02171942754648626
    num_agent_steps_sampled: 6495760
    num_steps_sampled: 6495760
    num_steps_trained: 6495760
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1095,47851.9,6495760,1.92,1.9824,1.0628,40.1719


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6503752
  custom_metrics: {}
  date: 2021-12-10_02-20-16
  done: false
  episode_len_mean: 42.794392523364486
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.914857941079084
  episode_reward_min: 0.09719999879598618
  episodes_this_iter: 214
  episodes_total: 100652
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0085828825831413
          entropy_coeff: 0.0
          kl: 0.01865096390247345
          policy_loss: -0.11381270558922552
          total_loss: -0.0725712327985093
          vf_explained_var: 0.7784615755081177
          vf_loss: 0.022357370937243104
    num_agent_steps_sampled: 6503752
    num_steps_sampled: 6503752
    num_steps_trained: 6503752

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1096,47898.8,6503752,1.91486,1.9796,0.0972,42.7944


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6511744
  custom_metrics: {}
  date: 2021-12-10_02-21-03
  done: false
  episode_len_mean: 43.01546391752577
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.8940453624602445
  episode_reward_min: -2.0
  episodes_this_iter: 194
  episodes_total: 100846
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0055761579424143
          entropy_coeff: 0.0
          kl: 0.01742004629340954
          policy_loss: -0.10490798245882615
          total_loss: -0.06633048452204093
          vf_explained_var: 0.8113049268722534
          vf_loss: 0.020939701702445745
    num_agent_steps_sampled: 6511744
    num_steps_sampled: 6511744
    num_steps_trained: 6511744
  iterations_since_restore: 102

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1097,47945.7,6511744,1.89405,1.9768,-2,43.0155


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6519736
  custom_metrics: {}
  date: 2021-12-10_02-21-49
  done: false
  episode_len_mean: 37.4300518134715
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.8850984468361256
  episode_reward_min: -2.0
  episodes_this_iter: 193
  episodes_total: 101039
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0346231069415808
          entropy_coeff: 0.0
          kl: 0.01775893455487676
          policy_loss: -0.1056659130990738
          total_loss: -0.06604486337164417
          vf_explained_var: 0.8321465253829956
          vf_loss: 0.021640129562001675
    num_agent_steps_sampled: 6519736
    num_steps_sampled: 6519736
    num_steps_trained: 6519736
  iterations_since_restore: 103
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1098,47992,6519736,1.8851,1.9772,-2,37.4301


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6527728
  custom_metrics: {}
  date: 2021-12-10_02-22-36
  done: false
  episode_len_mean: 41.855670103092784
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9167298994113489
  episode_reward_min: 1.1887999773025513
  episodes_this_iter: 194
  episodes_total: 101233
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9974390733987093
          entropy_coeff: 0.0
          kl: 0.019337732170242816
          policy_loss: -0.11510079620347824
          total_loss: -0.07360533281462267
          vf_explained_var: 0.7664821147918701
          vf_loss: 0.021916012570727617
    num_agent_steps_sampled: 6527728
    num_steps_sampled: 6527728
    num_steps_trained: 6527728

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1099,48038.5,6527728,1.91673,1.9772,1.1888,41.8557


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6535720
  custom_metrics: {}
  date: 2021-12-10_02-23-23
  done: false
  episode_len_mean: 38.47598253275109
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9065135358202405
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 101462
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8991988170892
          entropy_coeff: 0.0
          kl: 0.018108837539330125
          policy_loss: -0.1018755189870717
          total_loss: -0.06034114572685212
          vf_explained_var: 0.7437942028045654
          vf_loss: 0.023199174203909934
    num_agent_steps_sampled: 6535720
    num_steps_sampled: 6535720
    num_steps_trained: 6535720
  iterations_since_restore: 105
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1100,48085.4,6535720,1.90651,1.9772,-2,38.476


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6543712
  custom_metrics: {}
  date: 2021-12-10_02-24-11
  done: false
  episode_len_mean: 34.346666666666664
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.931727998521593
  episode_reward_min: 1.6424000263214111
  episodes_this_iter: 225
  episodes_total: 101687
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9203525204211473
          entropy_coeff: 0.0
          kl: 0.01933663757517934
          policy_loss: -0.10378613852662966
          total_loss: -0.061733812239253893
          vf_explained_var: 0.7524006366729736
          vf_loss: 0.022473978577181697
    num_agent_steps_sampled: 6543712
    num_steps_sampled: 6543712
    num_steps_trained: 6543712

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1101,48133.6,6543712,1.93173,1.9808,1.6424,34.3467


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6551704
  custom_metrics: {}
  date: 2021-12-10_02-24-58
  done: false
  episode_len_mean: 34.35217391304348
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9317391286725583
  episode_reward_min: 1.6483999490737915
  episodes_this_iter: 230
  episodes_total: 101917
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9189121350646019
          entropy_coeff: 0.0
          kl: 0.019460759300272912
          policy_loss: -0.10634263412794098
          total_loss: -0.06450286926701665
          vf_explained_var: 0.7171289920806885
          vf_loss: 0.022135744336992502
    num_agent_steps_sampled: 6551704
    num_steps_sampled: 6551704
    num_steps_trained: 6551704

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1102,48180.7,6551704,1.93174,1.9808,1.6484,34.3522


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6559696
  custom_metrics: {}
  date: 2021-12-10_02-25-45
  done: false
  episode_len_mean: 34.1858407079646
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8987964586874024
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 102143
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9316812846809626
          entropy_coeff: 0.0
          kl: 0.018038972542854026
          policy_loss: -0.0929648962628562
          total_loss: -0.05461825463135028
          vf_explained_var: 0.8299916982650757
          vf_loss: 0.02008218044647947
    num_agent_steps_sampled: 6559696
    num_steps_sampled: 6559696
    num_steps_trained: 6559696
  iterations_since_restore: 108
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1103,48227.9,6559696,1.8988,1.9788,-2,34.1858


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6567688
  custom_metrics: {}
  date: 2021-12-10_02-26-32
  done: false
  episode_len_mean: 36.4061135371179
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.910471615832966
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 102372
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9759041015058756
          entropy_coeff: 0.0
          kl: 0.017446605081204325
          policy_loss: -0.10102833469863981
          total_loss: -0.06095031416043639
          vf_explained_var: 0.7936317920684814
          vf_loss: 0.022413330560084432
    num_agent_steps_sampled: 6567688
    num_steps_sampled: 6567688
    num_steps_trained: 6567688
  iterations_since_restore: 109
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1104,48274.3,6567688,1.91047,1.9832,-2,36.4061


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6575680
  custom_metrics: {}
  date: 2021-12-10_02-27-18
  done: false
  episode_len_mean: 40.143540669856456
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9017110059135838
  episode_reward_min: -2.0
  episodes_this_iter: 209
  episodes_total: 102581
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9558273274451494
          entropy_coeff: 0.0
          kl: 0.017990099382586777
          policy_loss: -0.10750327131245285
          total_loss: -0.06661988887935877
          vf_explained_var: 0.7524417638778687
          vf_loss: 0.022668406716547906
    num_agent_steps_sampled: 6575680
    num_steps_sampled: 6575680
    num_steps_trained: 6575680
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1105,48321,6575680,1.90171,1.9836,-2,40.1435


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6583672
  custom_metrics: {}
  date: 2021-12-10_02-28-05
  done: false
  episode_len_mean: 33.094339622641506
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.897694339729705
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 102793
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9884345754981041
          entropy_coeff: 0.0
          kl: 0.01770355342887342
          policy_loss: -0.094332277396461
          total_loss: -0.05322927818633616
          vf_explained_var: 0.816231369972229
          vf_loss: 0.023178148199804127
    num_agent_steps_sampled: 6583672
    num_steps_sampled: 6583672
    num_steps_trained: 6583672
  iterations_since_restore: 111
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1106,48368,6583672,1.89769,1.9836,-2,33.0943


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6591664
  custom_metrics: {}
  date: 2021-12-10_02-28-53
  done: false
  episode_len_mean: 41.40487804878049
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9175570738024827
  episode_reward_min: 1.4592000246047974
  episodes_this_iter: 205
  episodes_total: 102998
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9655986651778221
          entropy_coeff: 0.0
          kl: 0.01896367664448917
          policy_loss: -0.11165248590987176
          total_loss: -0.06657865791930817
          vf_explained_var: 0.7483510971069336
          vf_loss: 0.02587310306262225
    num_agent_steps_sampled: 6591664
    num_steps_sampled: 6591664
    num_steps_trained: 6591664

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1107,48415.7,6591664,1.91756,1.9836,1.4592,41.4049


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6599656
  custom_metrics: {}
  date: 2021-12-10_02-29-42
  done: false
  episode_len_mean: 36.87128712871287
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.926594056115292
  episode_reward_min: 1.2416000366210938
  episodes_this_iter: 202
  episodes_total: 103200
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9651399608701468
          entropy_coeff: 0.0
          kl: 0.019968307577073574
          policy_loss: -0.10973875751369633
          total_loss: -0.06418608606327325
          vf_explained_var: 0.739250123500824
          vf_loss: 0.02533476077951491
    num_agent_steps_sampled: 6599656
    num_steps_sampled: 6599656
    num_steps_trained: 6599656

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1108,48464.6,6599656,1.92659,1.9836,1.2416,36.8713


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6607648
  custom_metrics: {}
  date: 2021-12-10_02-30-30
  done: false
  episode_len_mean: 38.773148148148145
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9228351806048993
  episode_reward_min: 1.2483999729156494
  episodes_this_iter: 216
  episodes_total: 103416
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0023264177143574
          entropy_coeff: 0.0
          kl: 0.019188597390893847
          policy_loss: -0.11482549845823087
          total_loss: -0.07288491856888868
          vf_explained_var: 0.7945860028266907
          vf_loss: 0.022512124618515372
    num_agent_steps_sampled: 6607648
    num_steps_sampled: 6607648
    num_steps_trained: 6607648

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1109,48512.7,6607648,1.92284,1.9812,1.2484,38.7731


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6615640
  custom_metrics: {}
  date: 2021-12-10_02-31-18
  done: false
  episode_len_mean: 37.518324607329845
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.925382201584222
  episode_reward_min: 1.531999945640564
  episodes_this_iter: 191
  episodes_total: 103607
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0570523589849472
          entropy_coeff: 0.0
          kl: 0.01948866929160431
          policy_loss: -0.11374317423906177
          total_loss: -0.07310340530239046
          vf_explained_var: 0.835747241973877
          vf_loss: 0.02090748809860088
    num_agent_steps_sampled: 6615640
    num_steps_sampled: 6615640
    num_steps_trained: 6615640

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1110,48560.7,6615640,1.92538,1.9852,1.532,37.5183


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6623632
  custom_metrics: {}
  date: 2021-12-10_02-32-06
  done: false
  episode_len_mean: 38.53658536585366
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9232526796620066
  episode_reward_min: 0.9395999908447266
  episodes_this_iter: 205
  episodes_total: 103812
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0110595505684614
          entropy_coeff: 0.0
          kl: 0.018929252051748335
          policy_loss: -0.11629459133837372
          total_loss: -0.07528262629057281
          vf_explained_var: 0.815037727355957
          vf_loss: 0.021846099640242755
    num_agent_steps_sampled: 6623632
    num_steps_sampled: 6623632
    num_steps_trained: 6623632

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1111,48608.1,6623632,1.92325,1.9784,0.9396,38.5366


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6631624
  custom_metrics: {}
  date: 2021-12-10_02-32-53
  done: false
  episode_len_mean: 39.25433526011561
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.921909828406538
  episode_reward_min: 1.6296000480651855
  episodes_this_iter: 173
  episodes_total: 103985
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0435711350291967
          entropy_coeff: 0.0
          kl: 0.02005312079563737
          policy_loss: -0.11255945544689894
          total_loss: -0.06974097667261958
          vf_explained_var: 0.802810549736023
          vf_loss: 0.022514693380799145
    num_agent_steps_sampled: 6631624
    num_steps_sampled: 6631624
    num_steps_trained: 6631624

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1112,48655.8,6631624,1.92191,1.9792,1.6296,39.2543


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6639616
  custom_metrics: {}
  date: 2021-12-10_02-33-41
  done: false
  episode_len_mean: 44.55555555555556
  episode_media: {}
  episode_reward_max: 1.9764000177383423
  episode_reward_mean: 1.9112999999412783
  episode_reward_min: 0.0
  episodes_this_iter: 216
  episodes_total: 104201
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9740164000540972
          entropy_coeff: 0.0
          kl: 0.014994634955655783
          policy_loss: -0.10620228986954316
          total_loss: -0.05843233349150978
          vf_explained_var: 0.7553242444992065
          vf_loss: 0.024996853375341743
    num_agent_steps_sampled: 6639616
    num_steps_sampled: 6639616
    num_steps_trained: 6639616
  iterations_since_restore: 118

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1113,48703.1,6639616,1.9113,1.9764,0,44.5556


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6647608
  custom_metrics: {}
  date: 2021-12-10_02-34-28
  done: false
  episode_len_mean: 40.50485436893204
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.919370868831005
  episode_reward_min: 0.0
  episodes_this_iter: 206
  episodes_total: 104407
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.978412451222539
          entropy_coeff: 0.0
          kl: 0.01493674164521508
          policy_loss: -0.11270238691940904
          total_loss: -0.06701452590641566
          vf_explained_var: 0.7887356281280518
          vf_loss: 0.023002685222309083
    num_agent_steps_sampled: 6647608
    num_steps_sampled: 6647608
    num_steps_trained: 6647608
  iterations_since_restore: 119

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1114,48750.7,6647608,1.91937,1.9792,0,40.5049


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6655600
  custom_metrics: {}
  date: 2021-12-10_02-35-16
  done: false
  episode_len_mean: 37.3716814159292
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9092106186183153
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 104633
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9303281679749489
          entropy_coeff: 0.0
          kl: 0.014238815230783075
          policy_loss: -0.09889343373652082
          total_loss: -0.054435749640106224
          vf_explained_var: 0.761439323425293
          vf_loss: 0.022832485323306173
    num_agent_steps_sampled: 6655600
    num_steps_sampled: 6655600
    num_steps_trained: 6655600
  iterations_since_restore: 120

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1115,48798.6,6655600,1.90921,1.9788,-2,37.3717


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6663592
  custom_metrics: {}
  date: 2021-12-10_02-36-05
  done: false
  episode_len_mean: 34.24122807017544
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9154157879059774
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 104861
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.936810526996851
          entropy_coeff: 0.0
          kl: 0.014688428404042497
          policy_loss: -0.09899916265567299
          total_loss: -0.054231194662861526
          vf_explained_var: 0.7847263813018799
          vf_loss: 0.022459917032392696
    num_agent_steps_sampled: 6663592
    num_steps_sampled: 6663592
    num_steps_trained: 6663592
  iterations_since_restore: 12

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1116,48846.9,6663592,1.91542,1.9808,-2,34.2412


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6671584
  custom_metrics: {}
  date: 2021-12-10_02-36-52
  done: false
  episode_len_mean: 34.78341013824885
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9308350256511144
  episode_reward_min: 1.7131999731063843
  episodes_this_iter: 217
  episodes_total: 105078
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9704973064363003
          entropy_coeff: 0.0
          kl: 0.014936921244952828
          policy_loss: -0.10299242910696194
          total_loss: -0.05801316612632945
          vf_explained_var: 0.7852509617805481
          vf_loss: 0.022293817542959005
    num_agent_steps_sampled: 6671584
    num_steps_sampled: 6671584
    num_steps_trained: 6671584

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1117,48894.5,6671584,1.93084,1.9804,1.7132,34.7834


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6679576
  custom_metrics: {}
  date: 2021-12-10_02-37-40
  done: false
  episode_len_mean: 38.09004739336493
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9067924186516712
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 105289
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9996549654752016
          entropy_coeff: 0.0
          kl: 0.013826643029460683
          policy_loss: -0.10725110891507939
          total_loss: -0.06488445747527294
          vf_explained_var: 0.815609335899353
          vf_loss: 0.02136744011659175
    num_agent_steps_sampled: 6679576
    num_steps_sampled: 6679576
    num_steps_trained: 6679576
  iterations_since_restore: 123


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1118,48942.2,6679576,1.90679,1.9808,-2,38.09


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6687568
  custom_metrics: {}
  date: 2021-12-10_02-38-28
  done: false
  episode_len_mean: 39.88837209302326
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9206734862438468
  episode_reward_min: 1.018399953842163
  episodes_this_iter: 215
  episodes_total: 105504
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9405247587710619
          entropy_coeff: 0.0
          kl: 0.015335172327468172
          policy_loss: -0.10571781136968639
          total_loss: -0.059862405236344784
          vf_explained_var: 0.6933578252792358
          vf_loss: 0.022565116989426315
    num_agent_steps_sampled: 6687568
    num_steps_sampled: 6687568
    num_steps_trained: 6687568

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1119,48989.8,6687568,1.92067,1.98,1.0184,39.8884


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6695560
  custom_metrics: {}
  date: 2021-12-10_02-39-15
  done: false
  episode_len_mean: 36.67906976744186
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9090976726177127
  episode_reward_min: -2.0
  episodes_this_iter: 215
  episodes_total: 105719
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9466976337134838
          entropy_coeff: 0.0
          kl: 0.014379676838871092
          policy_loss: -0.09668638489529258
          total_loss: -0.04942402266897261
          vf_explained_var: 0.734306812286377
          vf_loss: 0.02542322810040787
    num_agent_steps_sampled: 6695560
    num_steps_sampled: 6695560
    num_steps_trained: 6695560
  iterations_since_restore: 125
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1120,49037.3,6695560,1.9091,1.9796,-2,36.6791


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6703552
  custom_metrics: {}
  date: 2021-12-10_02-40-03
  done: false
  episode_len_mean: 33.8421052631579
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.91539123079233
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 105947
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9508496858179569
          entropy_coeff: 0.0
          kl: 0.014279918716056272
          policy_loss: -0.10070320003433153
          total_loss: -0.055044380424078554
          vf_explained_var: 0.7794944643974304
          vf_loss: 0.023971195390913635
    num_agent_steps_sampled: 6703552
    num_steps_sampled: 6703552
    num_steps_trained: 6703552
  iterations_since_restore: 126


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1121,49085,6703552,1.91539,1.98,-2,33.8421


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6711544
  custom_metrics: {}
  date: 2021-12-10_02-40-51
  done: false
  episode_len_mean: 35.833333333333336
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.909243752559026
  episode_reward_min: -2.0
  episodes_this_iter: 192
  episodes_total: 106139
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0184087343513966
          entropy_coeff: 0.0
          kl: 0.014356107596540824
          policy_loss: -0.1008571942220442
          total_loss: -0.05474340554792434
          vf_explained_var: 0.821507453918457
          vf_loss: 0.024310452572535723
    num_agent_steps_sampled: 6711544
    num_steps_sampled: 6711544
    num_steps_trained: 6711544
  iterations_since_restore: 127


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1122,49132.8,6711544,1.90924,1.9776,-2,35.8333


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6719536
  custom_metrics: {}
  date: 2021-12-10_02-41-38
  done: false
  episode_len_mean: 37.62311557788945
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9252361825962163
  episode_reward_min: 1.554800033569336
  episodes_this_iter: 199
  episodes_total: 106338
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0426109097898006
          entropy_coeff: 0.0
          kl: 0.014441042148973793
          policy_loss: -0.11066742305411026
          total_loss: -0.0643924030882772
          vf_explained_var: 0.8283073902130127
          vf_loss: 0.02434268849901855
    num_agent_steps_sampled: 6719536
    num_steps_sampled: 6719536
    num_steps_trained: 6719536
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1123,49180.2,6719536,1.92524,1.9804,1.5548,37.6231


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6727528
  custom_metrics: {}
  date: 2021-12-10_02-42-26
  done: false
  episode_len_mean: 39.34934497816594
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.905558082946133
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 106567
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9988908916711807
          entropy_coeff: 0.0
          kl: 0.014278408489190042
          policy_loss: -0.10325068089878187
          total_loss: -0.058706719311885536
          vf_explained_var: 0.8486292362213135
          vf_loss: 0.022858629818074405
    num_agent_steps_sampled: 6727528
    num_steps_sampled: 6727528
    num_steps_trained: 6727528
  iterations_since_restore: 12

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1124,49227.9,6727528,1.90556,1.982,-2,39.3493


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6735520
  custom_metrics: {}
  date: 2021-12-10_02-43-13
  done: false
  episode_len_mean: 39.90909090909091
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9015454582192681
  episode_reward_min: -2.0
  episodes_this_iter: 198
  episodes_total: 106765
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0241973381489515
          entropy_coeff: 0.0
          kl: 0.014061192778171971
          policy_loss: -0.1016631607490126
          total_loss: -0.053882896681898274
          vf_explained_var: 0.8618503212928772
          vf_loss: 0.026424828043673187
    num_agent_steps_sampled: 6735520
    num_steps_sampled: 6735520
    num_steps_trained: 6735520
  iterations_since_restore: 13

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1125,49275.2,6735520,1.90155,1.98,-2,39.9091


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6743512
  custom_metrics: {}
  date: 2021-12-10_02-44-01
  done: false
  episode_len_mean: 38.69
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8858719956874848
  episode_reward_min: -2.0
  episodes_this_iter: 200
  episodes_total: 106965
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9903873112052679
          entropy_coeff: 0.0
          kl: 0.014214825408998877
          policy_loss: -0.10798755002906546
          total_loss: -0.05825946416007355
          vf_explained_var: 0.8207620978355408
          vf_loss: 0.028139321773778647
    num_agent_steps_sampled: 6743512
    num_steps_sampled: 6743512
    num_steps_trained: 6743512
  iterations_since_restore: 131
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1126,49322.6,6743512,1.88587,1.9808,-2,38.69


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6751504
  custom_metrics: {}
  date: 2021-12-10_02-44-48
  done: false
  episode_len_mean: 38.65800865800866
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9066701321116775
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 107196
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9555722307413816
          entropy_coeff: 0.0
          kl: 0.013482834096066654
          policy_loss: -0.09409971832064912
          total_loss: -0.04620774932845961
          vf_explained_var: 0.78165602684021
          vf_loss: 0.02741491823690012
    num_agent_steps_sampled: 6751504
    num_steps_sampled: 6751504
    num_steps_trained: 6751504
  iterations_since_restore: 132
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1127,49370.3,6751504,1.90667,1.98,-2,38.658


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6759496
  custom_metrics: {}
  date: 2021-12-10_02-45-36
  done: false
  episode_len_mean: 38.29611650485437
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.88602135713818
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 107402
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9861755184829235
          entropy_coeff: 0.0
          kl: 0.014544313977239653
          policy_loss: -0.09730715322075412
          total_loss: -0.047380498610436916
          vf_explained_var: 0.8101836442947388
          vf_loss: 0.02783747616922483
    num_agent_steps_sampled: 6759496
    num_steps_sampled: 6759496
    num_steps_trained: 6759496
  iterations_since_restore: 133


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1128,49417.9,6759496,1.88602,1.98,-2,38.2961


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6767488
  custom_metrics: {}
  date: 2021-12-10_02-46-23
  done: false
  episode_len_mean: 39.47826086956522
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8848270511857554
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 107609
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0133556723594666
          entropy_coeff: 0.0
          kl: 0.013539512030547485
          policy_loss: -0.09498461068142205
          total_loss: -0.04660119887557812
          vf_explained_var: 0.8527733683586121
          vf_loss: 0.027820280607556924
    num_agent_steps_sampled: 6767488
    num_steps_sampled: 6767488
    num_steps_trained: 6767488
  iterations_since_restore: 13

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1129,49464.9,6767488,1.88483,1.9804,-2,39.4783


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6775480
  custom_metrics: {}
  date: 2021-12-10_02-47-10
  done: false
  episode_len_mean: 35.98148148148148
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.895268519167547
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 107825
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.007179306820035
          entropy_coeff: 0.0
          kl: 0.01433832969632931
          policy_loss: -0.09917659760685638
          total_loss: -0.04246551403775811
          vf_explained_var: 0.7610931396484375
          vf_loss: 0.03493474633432925
    num_agent_steps_sampled: 6775480
    num_steps_sampled: 6775480
    num_steps_trained: 6775480
  iterations_since_restore: 135
  n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1130,49511.9,6775480,1.89527,1.9828,-2,35.9815


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6783472
  custom_metrics: {}
  date: 2021-12-10_02-47-57
  done: false
  episode_len_mean: 38.885416666666664
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.881445836275816
  episode_reward_min: -2.0
  episodes_this_iter: 192
  episodes_total: 108017
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0474011301994324
          entropy_coeff: 0.0
          kl: 0.013705543125979602
          policy_loss: -0.09713862475473434
          total_loss: -0.04483884147339268
          vf_explained_var: 0.7975994348526001
          vf_loss: 0.03148449235595763
    num_agent_steps_sampled: 6783472
    num_steps_sampled: 6783472
    num_steps_trained: 6783472
  iterations_since_restore: 136


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1131,49558.8,6783472,1.88145,1.9828,-2,38.8854


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6791464
  custom_metrics: {}
  date: 2021-12-10_02-48-44
  done: false
  episode_len_mean: 39.25
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9042549033959706
  episode_reward_min: -2.0
  episodes_this_iter: 204
  episodes_total: 108221
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.050006402656436
          entropy_coeff: 0.0
          kl: 0.014137968857539818
          policy_loss: -0.10653306695166975
          total_loss: -0.05873205701936968
          vf_explained_var: 0.846958339214325
          vf_loss: 0.026328966894652694
    num_agent_steps_sampled: 6791464
    num_steps_sampled: 6791464
    num_steps_trained: 6791464
  iterations_since_restore: 137
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1132,49605.4,6791464,1.90425,1.9824,-2,39.25


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6799456
  custom_metrics: {}
  date: 2021-12-10_02-49-30
  done: false
  episode_len_mean: 36.58878504672897
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.927213087817219
  episode_reward_min: 1.3916000127792358
  episodes_this_iter: 214
  episodes_total: 108435
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9714025538414717
          entropy_coeff: 0.0
          kl: 0.014183013496221974
          policy_loss: -0.10297955409623682
          total_loss: -0.05923669983167201
          vf_explained_var: 0.8045709133148193
          vf_loss: 0.022202402295079082
    num_agent_steps_sampled: 6799456
    num_steps_sampled: 6799456
    num_steps_trained: 6799456
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1133,49652,6799456,1.92721,1.9784,1.3916,36.5888


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6807448
  custom_metrics: {}
  date: 2021-12-10_02-50-17
  done: false
  episode_len_mean: 39.806603773584904
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9023566057659544
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 108647
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9644600488245487
          entropy_coeff: 0.0
          kl: 0.01438007049728185
          policy_loss: -0.09732994643854909
          total_loss: -0.04587415000423789
          vf_explained_var: 0.7360070943832397
          vf_loss: 0.029616063518915325
    num_agent_steps_sampled: 6807448
    num_steps_sampled: 6807448
    num_steps_trained: 6807448
  iterations_since_restore: 13

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1134,49699,6807448,1.90236,1.9772,-2,39.8066


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6815440
  custom_metrics: {}
  date: 2021-12-10_02-51-04
  done: false
  episode_len_mean: 42.0959595959596
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9162909124949665
  episode_reward_min: 0.03359999880194664
  episodes_this_iter: 198
  episodes_total: 108845
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0249631889164448
          entropy_coeff: 0.0
          kl: 0.0139370966644492
          policy_loss: -0.10752637396217324
          total_loss: -0.06069577659945935
          vf_explained_var: 0.7958226799964905
          vf_loss: 0.025663631793577224
    num_agent_steps_sampled: 6815440
    num_steps_sampled: 6815440
    num_steps_trained: 6815440
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1135,49746.1,6815440,1.91629,1.9824,0.0336,42.096


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6823432
  custom_metrics: {}
  date: 2021-12-10_02-51-51
  done: false
  episode_len_mean: 36.465437788018434
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9098930897251252
  episode_reward_min: -2.0
  episodes_this_iter: 217
  episodes_total: 109062
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9528115354478359
          entropy_coeff: 0.0
          kl: 0.014193300536135212
          policy_loss: -0.09979469608515501
          total_loss: -0.05341296437836718
          vf_explained_var: 0.7776352167129517
          vf_loss: 0.024825656262692064
    num_agent_steps_sampled: 6823432
    num_steps_sampled: 6823432
    num_steps_trained: 6823432
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1136,49792.7,6823432,1.90989,1.9804,-2,36.4654


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6831424
  custom_metrics: {}
  date: 2021-12-10_02-52-38
  done: false
  episode_len_mean: 39.17619047619048
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9087352343967983
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 109272
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.939403485506773
          entropy_coeff: 0.0
          kl: 0.014039030531421304
          policy_loss: -0.1018177357618697
          total_loss: -0.054556854302063584
          vf_explained_var: 0.7557544112205505
          vf_loss: 0.025939104903955013
    num_agent_steps_sampled: 6831424
    num_steps_sampled: 6831424
    num_steps_trained: 6831424
  iterations_since_restore: 142


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1137,49839.4,6831424,1.90874,1.9792,-2,39.1762


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6839416
  custom_metrics: {}
  date: 2021-12-10_02-53-24
  done: false
  episode_len_mean: 36.675438596491226
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8933877150217693
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 109500
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.919234013184905
          entropy_coeff: 0.0
          kl: 0.013097957504214719
          policy_loss: -0.09714381069352385
          total_loss: -0.05191719808499329
          vf_explained_var: 0.7839648723602295
          vf_loss: 0.02533409121679142
    num_agent_steps_sampled: 6839416
    num_steps_sampled: 6839416
    num_steps_trained: 6839416
  iterations_since_restore: 143

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1138,49885.8,6839416,1.89339,1.98,-2,36.6754


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6847408
  custom_metrics: {}
  date: 2021-12-10_02-54-11
  done: false
  episode_len_mean: 37.44230769230769
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8887903833618531
  episode_reward_min: -2.0
  episodes_this_iter: 208
  episodes_total: 109708
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9649260491132736
          entropy_coeff: 0.0
          kl: 0.013767570460913703
          policy_loss: -0.10142105992417783
          total_loss: -0.05270727080642246
          vf_explained_var: 0.823485255241394
          vf_loss: 0.0278042919235304
    num_agent_steps_sampled: 6847408
    num_steps_sampled: 6847408
    num_steps_trained: 6847408
  iterations_since_restore: 144
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1139,49932.4,6847408,1.88879,1.98,-2,37.4423


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6855400
  custom_metrics: {}
  date: 2021-12-10_02-54-57
  done: false
  episode_len_mean: 41.4300518134715
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.876974092864002
  episode_reward_min: -2.0
  episodes_this_iter: 193
  episodes_total: 109901
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9894408974796534
          entropy_coeff: 0.0
          kl: 0.014293065323727205
          policy_loss: -0.09526964725228027
          total_loss: -0.04309513856424019
          vf_explained_var: 0.7622339725494385
          vf_loss: 0.030466919532045722
    num_agent_steps_sampled: 6855400
    num_steps_sampled: 6855400
    num_steps_trained: 6855400
  iterations_since_restore: 145


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1140,49978.9,6855400,1.87697,1.9808,-2,41.4301


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6863392
  custom_metrics: {}
  date: 2021-12-10_02-55-44
  done: false
  episode_len_mean: 37.794117647058826
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8684549039485407
  episode_reward_min: -2.0
  episodes_this_iter: 204
  episodes_total: 110105
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9811696745455265
          entropy_coeff: 0.0
          kl: 0.013048914988758042
          policy_loss: -0.09430263584363274
          total_loss: -0.04425175255164504
          vf_explained_var: 0.8231828212738037
          vf_loss: 0.03023284045048058
    num_agent_steps_sampled: 6863392
    num_steps_sampled: 6863392
    num_steps_trained: 6863392
  iterations_since_restore: 14

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1141,50025.8,6863392,1.86845,1.9784,-2,37.7941


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6871384
  custom_metrics: {}
  date: 2021-12-10_02-56-31
  done: false
  episode_len_mean: 35.74458874458875
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9125194817910462
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 110336
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9307574722915888
          entropy_coeff: 0.0
          kl: 0.014533504145219922
          policy_loss: -0.0976827320700977
          total_loss: -0.04647364125412423
          vf_explained_var: 0.7419337630271912
          vf_loss: 0.029136335127986968
    num_agent_steps_sampled: 6871384
    num_steps_sampled: 6871384
    num_steps_trained: 6871384
  iterations_since_restore: 147

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1142,50072.7,6871384,1.91252,1.9784,-2,35.7446


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6879376
  custom_metrics: {}
  date: 2021-12-10_02-57-18
  done: false
  episode_len_mean: 40.300970873786405
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9197398088510753
  episode_reward_min: 1.61080002784729
  episodes_this_iter: 206
  episodes_total: 110542
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.952803635969758
          entropy_coeff: 0.0
          kl: 0.015116446884348989
          policy_loss: -0.11167638865299523
          total_loss: -0.06280544283799827
          vf_explained_var: 0.6747974157333374
          vf_loss: 0.025912844692356884
    num_agent_steps_sampled: 6879376
    num_steps_sampled: 6879376
    num_steps_trained: 6879376
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1143,50119.4,6879376,1.91974,1.9816,1.6108,40.301


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6887368
  custom_metrics: {}
  date: 2021-12-10_02-58-04
  done: false
  episode_len_mean: 36.58371040723982
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9098334841059343
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 110763
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9735569674521685
          entropy_coeff: 0.0
          kl: 0.014192382717737928
          policy_loss: -0.1024431670375634
          total_loss: -0.05717097397428006
          vf_explained_var: 0.7813727855682373
          vf_loss: 0.023717514355666935
    num_agent_steps_sampled: 6887368
    num_steps_sampled: 6887368
    num_steps_trained: 6887368
  iterations_since_restore: 149

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1144,50165.6,6887368,1.90983,1.9804,-2,36.5837


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6895360
  custom_metrics: {}
  date: 2021-12-10_02-58-51
  done: false
  episode_len_mean: 34.11009174311926
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9147577996647687
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 110981
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9588538035750389
          entropy_coeff: 0.0
          kl: 0.014252037624828517
          policy_loss: -0.10417545697418973
          total_loss: -0.05946114363905508
          vf_explained_var: 0.798210084438324
          vf_loss: 0.023069029499311
    num_agent_steps_sampled: 6895360
    num_steps_sampled: 6895360
    num_steps_trained: 6895360
  iterations_since_restore: 150
  n

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1145,50212.2,6895360,1.91476,1.9796,-2,34.1101


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6903352
  custom_metrics: {}
  date: 2021-12-10_02-59-38
  done: false
  episode_len_mean: 37.4218009478673
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8700492901824661
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 111192
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9628470875322819
          entropy_coeff: 0.0
          kl: 0.012883174058515579
          policy_loss: -0.08448141333064996
          total_loss: -0.03812328801723197
          vf_explained_var: 0.7656071186065674
          vf_loss: 0.026791803946252912
    num_agent_steps_sampled: 6903352
    num_steps_sampled: 6903352
    num_steps_trained: 6903352
  iterations_since_restore: 151

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1146,50259,6903352,1.87005,1.9788,-2,37.4218


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6911344
  custom_metrics: {}
  date: 2021-12-10_03-00-25
  done: false
  episode_len_mean: 37.486363636363635
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.889910915223035
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 111412
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9224472660571337
          entropy_coeff: 0.0
          kl: 0.013929805718362331
          policy_loss: -0.09539642572053708
          total_loss: -0.04971951380139217
          vf_explained_var: 0.7509273290634155
          vf_loss: 0.024521020706743002
    num_agent_steps_sampled: 6911344
    num_steps_sampled: 6911344
    num_steps_trained: 6911344
  iterations_since_restore: 15

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1147,50305.8,6911344,1.88991,1.982,-2,37.4864


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6919336
  custom_metrics: {}
  date: 2021-12-10_03-01-12
  done: false
  episode_len_mean: 35.24311926605505
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9125431185468622
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 111630
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9571501985192299
          entropy_coeff: 0.0
          kl: 0.014841675263596699
          policy_loss: -0.10676585003966466
          total_loss: -0.06090295319154393
          vf_explained_var: 0.7405414581298828
          vf_loss: 0.023322104243561625
    num_agent_steps_sampled: 6919336
    num_steps_sampled: 6919336
    num_steps_trained: 6919336
  iterations_since_restore: 15

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1148,50352.8,6919336,1.91254,1.9844,-2,35.2431


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6927328
  custom_metrics: {}
  date: 2021-12-10_03-01-58
  done: false
  episode_len_mean: 35.7551867219917
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.880856430629477
  episode_reward_min: -2.0
  episodes_this_iter: 241
  episodes_total: 111871
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9123797006905079
          entropy_coeff: 0.0
          kl: 0.01180612706230022
          policy_loss: -0.08634858037112281
          total_loss: -0.0309905245230766
          vf_explained_var: 0.7244935035705566
          vf_loss: 0.03742750018136576
    num_agent_steps_sampled: 6927328
    num_steps_sampled: 6927328
    num_steps_trained: 6927328
  iterations_since_restore: 154

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1149,50399.5,6927328,1.88086,1.9844,-2,35.7552


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6935320
  custom_metrics: {}
  date: 2021-12-10_03-02-45
  done: false
  episode_len_mean: 34.625
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9140035744224275
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 112095
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9537149649113417
          entropy_coeff: 0.0
          kl: 0.014085236296523362
          policy_loss: -0.10406447903369553
          total_loss: -0.0561618841602467
          vf_explained_var: 0.8007303476333618
          vf_loss: 0.026510643860092387
    num_agent_steps_sampled: 6935320
    num_steps_sampled: 6935320
    num_steps_trained: 6935320
  iterations_since_restore: 155
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1150,50446.5,6935320,1.914,1.9844,-2,34.625


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6943312
  custom_metrics: {}
  date: 2021-12-10_03-03-32
  done: false
  episode_len_mean: 33.082251082251084
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9176121232829568
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 112326
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9239079728722572
          entropy_coeff: 0.0
          kl: 0.013366345898248255
          policy_loss: -0.08795618693693541
          total_loss: -0.04277261014794931
          vf_explained_var: 0.8218858242034912
          vf_loss: 0.024883440404664725
    num_agent_steps_sampled: 6943312
    num_steps_sampled: 6943312
    num_steps_trained: 6943312
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1151,50493.2,6943312,1.91761,1.9844,-2,33.0823


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6951304
  custom_metrics: {}
  date: 2021-12-10_03-04-19
  done: false
  episode_len_mean: 36.23076923076923
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.927971040501314
  episode_reward_min: 1.576799988746643
  episodes_this_iter: 221
  episodes_total: 112547
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9670575242489576
          entropy_coeff: 0.0
          kl: 0.014867850200971588
          policy_loss: -0.10434640615130775
          total_loss: -0.05839549901429564
          vf_explained_var: 0.7440428137779236
          vf_loss: 0.023370363865979016
    num_agent_steps_sampled: 6951304
    num_steps_sampled: 6951304
    num_steps_trained: 6951304
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1152,50540.3,6951304,1.92797,1.984,1.5768,36.2308


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6959296
  custom_metrics: {}
  date: 2021-12-10_03-05-06
  done: false
  episode_len_mean: 39.45145631067961
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8838737925279487
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 112753
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.959862170740962
          entropy_coeff: 0.0
          kl: 0.012987773603526875
          policy_loss: -0.09339277740218677
          total_loss: -0.045633536734385416
          vf_explained_var: 0.7898780107498169
          vf_loss: 0.028034061775542796
    num_agent_steps_sampled: 6959296
    num_steps_sampled: 6959296
    num_steps_trained: 6959296
  iterations_since_restore: 15

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1153,50587.2,6959296,1.88387,1.98,-2,39.4515


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6967288
  custom_metrics: {}
  date: 2021-12-10_03-05-53
  done: false
  episode_len_mean: 36.36619718309859
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.909447884335764
  episode_reward_min: -2.0
  episodes_this_iter: 213
  episodes_total: 112966
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9943367186933756
          entropy_coeff: 0.0
          kl: 0.014167856745189056
          policy_loss: -0.09349334533908404
          total_loss: -0.04445038383710198
          vf_explained_var: 0.832119882106781
          vf_loss: 0.027525529090780765
    num_agent_steps_sampled: 6967288
    num_steps_sampled: 6967288
    num_steps_trained: 6967288
  iterations_since_restore: 159


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1154,50634,6967288,1.90945,1.98,-2,36.3662


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6975280
  custom_metrics: {}
  date: 2021-12-10_03-06-40
  done: false
  episode_len_mean: 39.96135265700483
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8832425071997343
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 113173
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9390229303389788
          entropy_coeff: 0.0
          kl: 0.014030294434633106
          policy_loss: -0.09907739062327892
          total_loss: -0.05314794337027706
          vf_explained_var: 0.7957877516746521
          vf_loss: 0.024620938173029572
    num_agent_steps_sampled: 6975280
    num_steps_sampled: 6975280
    num_steps_trained: 6975280
  iterations_since_restore: 16

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1155,50681.1,6975280,1.88324,1.9844,-2,39.9614


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6983272
  custom_metrics: {}
  date: 2021-12-10_03-07-27
  done: false
  episode_len_mean: 34.57272727272727
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9136509109627118
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 113393
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9296677503734827
          entropy_coeff: 0.0
          kl: 0.014222734258510172
          policy_loss: -0.10026410728460178
          total_loss: -0.057040436076931655
          vf_explained_var: 0.7942863702774048
          vf_loss: 0.02162289433181286
    num_agent_steps_sampled: 6983272
    num_steps_sampled: 6983272
    num_steps_trained: 6983272
  iterations_since_restore: 16

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1156,50727.9,6983272,1.91365,1.984,-2,34.5727


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6991264
  custom_metrics: {}
  date: 2021-12-10_03-08-14
  done: false
  episode_len_mean: 36.45
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9274527262557637
  episode_reward_min: 0.7200000286102295
  episodes_this_iter: 220
  episodes_total: 113613
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9675478301942348
          entropy_coeff: 0.0
          kl: 0.014171268907375634
          policy_loss: -0.10429333819774911
          total_loss: -0.061802482698112726
          vf_explained_var: 0.7711106538772583
          vf_loss: 0.020968243188690394
    num_agent_steps_sampled: 6991264
    num_steps_sampled: 6991264
    num_steps_trained: 6991264
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1157,50775,6991264,1.92745,1.984,0.72,36.45


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 6999256
  custom_metrics: {}
  date: 2021-12-10_03-09-01
  done: false
  episode_len_mean: 36.404545454545456
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.92757272828709
  episode_reward_min: 1.6856000423431396
  episodes_this_iter: 220
  episodes_total: 113833
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9243297912180424
          entropy_coeff: 0.0
          kl: 0.014408968127099797
          policy_loss: -0.10449342284118757
          total_loss: -0.06033822233439423
          vf_explained_var: 0.7226461172103882
          vf_loss: 0.02227158407913521
    num_agent_steps_sampled: 6999256
    num_steps_sampled: 6999256
    num_steps_trained: 6999256
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1158,50821.7,6999256,1.92757,1.9848,1.6856,36.4045


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7007248
  custom_metrics: {}
  date: 2021-12-10_03-09-48
  done: false
  episode_len_mean: 35.578947368421055
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9292669900866788
  episode_reward_min: 1.518399953842163
  episodes_this_iter: 209
  episodes_total: 114042
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9618979785591364
          entropy_coeff: 0.0
          kl: 0.014705380221130326
          policy_loss: -0.11316366883693263
          total_loss: -0.06938533988432027
          vf_explained_var: 0.7539653778076172
          vf_loss: 0.02144453237997368
    num_agent_steps_sampled: 7007248
    num_steps_sampled: 7007248
    num_steps_trained: 7007248
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1159,50868.4,7007248,1.92927,1.9848,1.5184,35.5789


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7015240
  custom_metrics: {}
  date: 2021-12-10_03-10-34
  done: false
  episode_len_mean: 38.57575757575758
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9232658000090421
  episode_reward_min: 0.14560000598430634
  episodes_this_iter: 231
  episodes_total: 114273
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9293502829968929
          entropy_coeff: 0.0
          kl: 0.014756016404135153
          policy_loss: -0.10230990298441611
          total_loss: -0.056652982893865556
          vf_explained_var: 0.720539927482605
          vf_loss: 0.02324622025480494
    num_agent_steps_sampled: 7015240
    num_steps_sampled: 7015240
    num_steps_trained: 7015240
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1160,50915.1,7015240,1.92327,1.9848,0.1456,38.5758


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7023232
  custom_metrics: {}
  date: 2021-12-10_03-11-21
  done: false
  episode_len_mean: 37.23873873873874
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8756396373112996
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 114495
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9587994888424873
          entropy_coeff: 0.0
          kl: 0.01319874613545835
          policy_loss: -0.09144546819788957
          total_loss: -0.04904927362804301
          vf_explained_var: 0.8305928707122803
          vf_loss: 0.022350598010234535
    num_agent_steps_sampled: 7023232
    num_steps_sampled: 7023232
    num_steps_trained: 7023232
  iterations_since_restore: 166

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1161,50962.1,7023232,1.87564,1.9848,-2,37.2387


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7031224
  custom_metrics: {}
  date: 2021-12-10_03-12-08
  done: false
  episode_len_mean: 32.66129032258065
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9350758065139093
  episode_reward_min: 1.575600028038025
  episodes_this_iter: 248
  episodes_total: 114743
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.922199172899127
          entropy_coeff: 0.0
          kl: 0.01546841106028296
          policy_loss: -0.10741910373326391
          total_loss: -0.06227649631910026
          vf_explained_var: 0.6751300096511841
          vf_loss: 0.021649962523952127
    num_agent_steps_sampled: 7031224
    num_steps_sampled: 7031224
    num_steps_trained: 7031224
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1162,51009.1,7031224,1.93508,1.9848,1.5756,32.6613


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7039216
  custom_metrics: {}
  date: 2021-12-10_03-12-55
  done: false
  episode_len_mean: 34.013392857142854
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.932335717337472
  episode_reward_min: 1.7267999649047852
  episodes_this_iter: 224
  episodes_total: 114967
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9438809510320425
          entropy_coeff: 0.0
          kl: 0.015087103733094409
          policy_loss: -0.10950366547331214
          total_loss: -0.06586138275451958
          vf_explained_var: 0.7039904594421387
          vf_loss: 0.020728745876112953
    num_agent_steps_sampled: 7039216
    num_steps_sampled: 7039216
    num_steps_trained: 7039216
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1163,51056.2,7039216,1.93234,1.9848,1.7268,34.0134


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7047208
  custom_metrics: {}
  date: 2021-12-10_03-13-43
  done: false
  episode_len_mean: 37.18385650224215
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9090062770073724
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 115190
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9372197948396206
          entropy_coeff: 0.0
          kl: 0.014448670292040333
          policy_loss: -0.10290019449894316
          total_loss: -0.06114761912613176
          vf_explained_var: 0.7908807992935181
          vf_loss: 0.01980865775840357
    num_agent_steps_sampled: 7047208
    num_steps_sampled: 7047208
    num_steps_trained: 7047208
  iterations_since_restore: 169

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1164,51103.3,7047208,1.90901,1.9848,-2,37.1839


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7055200
  custom_metrics: {}
  date: 2021-12-10_03-14-29
  done: false
  episode_len_mean: 33.88532110091743
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8790844006275913
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 115408
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.999949261546135
          entropy_coeff: 0.0
          kl: 0.01359704983769916
          policy_loss: -0.09761055358103476
          total_loss: -0.05108293337980285
          vf_explained_var: 0.7947465181350708
          vf_loss: 0.025877098960336298
    num_agent_steps_sampled: 7055200
    num_steps_sampled: 7055200
    num_steps_trained: 7055200
  iterations_since_restore: 170
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1165,51150.2,7055200,1.87908,1.9792,-2,33.8853


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7063192
  custom_metrics: {}
  date: 2021-12-10_03-15-16
  done: false
  episode_len_mean: 35.767857142857146
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.928807141525405
  episode_reward_min: 1.4859999418258667
  episodes_this_iter: 224
  episodes_total: 115632
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9779386930167675
          entropy_coeff: 0.0
          kl: 0.014614177052862942
          policy_loss: -0.1072942603204865
          total_loss: -0.06409921057638712
          vf_explained_var: 0.7520872354507446
          vf_loss: 0.02099977049510926
    num_agent_steps_sampled: 7063192
    num_steps_sampled: 7063192
    num_steps_trained: 7063192
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1166,51196.8,7063192,1.92881,1.9824,1.486,35.7679


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7071184
  custom_metrics: {}
  date: 2021-12-10_03-16-03
  done: false
  episode_len_mean: 35.756637168141594
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8779469021653707
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 115858
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9598397184163332
          entropy_coeff: 0.0
          kl: 0.012881092727184296
          policy_loss: -0.0937300497898832
          total_loss: -0.051593948679510504
          vf_explained_var: 0.8072288036346436
          vf_loss: 0.022572943125851452
    num_agent_steps_sampled: 7071184
    num_steps_sampled: 7071184
    num_steps_trained: 7071184
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1167,51243.5,7071184,1.87795,1.9804,-2,35.7566


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7079176
  custom_metrics: {}
  date: 2021-12-10_03-16-50
  done: false
  episode_len_mean: 32.565891472868216
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9202294589937194
  episode_reward_min: -2.0
  episodes_this_iter: 258
  episodes_total: 116116
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9088448286056519
          entropy_coeff: 0.0
          kl: 0.013539920008042827
          policy_loss: -0.09716338898579124
          total_loss: -0.055846987554105
          vf_explained_var: 0.7061581611633301
          vf_loss: 0.020752648590132594
    num_agent_steps_sampled: 7079176
    num_steps_sampled: 7079176
    num_steps_trained: 7079176
  iterations_since_restore: 173


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1168,51290.9,7079176,1.92023,1.9792,-2,32.5659


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7087168
  custom_metrics: {}
  date: 2021-12-10_03-17-37
  done: false
  episode_len_mean: 32.66390041493776
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.935065559331807
  episode_reward_min: 1.7075999975204468
  episodes_this_iter: 241
  episodes_total: 116357
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9286199007183313
          entropy_coeff: 0.0
          kl: 0.01523802787414752
          policy_loss: -0.10702959779882804
          total_loss: -0.06393766531255096
          vf_explained_var: 0.6595197916030884
          vf_loss: 0.019949180161347613
    num_agent_steps_sampled: 7087168
    num_steps_sampled: 7087168
    num_steps_trained: 7087168
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1169,51337.9,7087168,1.93507,1.9804,1.7076,32.6639


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7095160
  custom_metrics: {}
  date: 2021-12-10_03-18-24
  done: false
  episode_len_mean: 35.2
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8803421299508278
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 116592
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9381367657333612
          entropy_coeff: 0.0
          kl: 0.012463188642868772
          policy_loss: -0.08626135357189924
          total_loss: -0.04251936502987519
          vf_explained_var: 0.7856432199478149
          vf_loss: 0.024813521944452077
    num_agent_steps_sampled: 7095160
    num_steps_sampled: 7095160
    num_steps_trained: 7095160
  iterations_since_restore: 175
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1170,51384.8,7095160,1.88034,1.9784,-2,35.2


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7103152
  custom_metrics: {}
  date: 2021-12-10_03-19-11
  done: false
  episode_len_mean: 33.78632478632478
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9328000030965886
  episode_reward_min: 1.6303999423980713
  episodes_this_iter: 234
  episodes_total: 116826
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9106275197118521
          entropy_coeff: 0.0
          kl: 0.014688599476357922
          policy_loss: -0.10800574097083881
          total_loss: -0.06598178145941347
          vf_explained_var: 0.6925813555717468
          vf_loss: 0.019715652539161965
    num_agent_steps_sampled: 7103152
    num_steps_sampled: 7103152
    num_steps_trained: 7103152
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1171,51431.6,7103152,1.9328,1.9784,1.6304,33.7863


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7111144
  custom_metrics: {}
  date: 2021-12-10_03-19-58
  done: false
  episode_len_mean: 37.86057692307692
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8874096159751599
  episode_reward_min: -2.0
  episodes_this_iter: 208
  episodes_total: 117034
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9886333234608173
          entropy_coeff: 0.0
          kl: 0.013592109637102112
          policy_loss: -0.09354430704843253
          total_loss: -0.047860242120805196
          vf_explained_var: 0.8198039531707764
          vf_loss: 0.025041049579158425
    num_agent_steps_sampled: 7111144
    num_steps_sampled: 7111144
    num_steps_trained: 7111144
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1172,51478,7111144,1.88741,1.9804,-2,37.8606


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7119136
  custom_metrics: {}
  date: 2021-12-10_03-20-45
  done: false
  episode_len_mean: 34.4304932735426
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.897381162964175
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 117257
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9362659715116024
          entropy_coeff: 0.0
          kl: 0.01329598916345276
          policy_loss: -0.09214188413170632
          total_loss: -0.04524954943917692
          vf_explained_var: 0.8524571657180786
          vf_loss: 0.026699050096794963
    num_agent_steps_sampled: 7119136
    num_steps_sampled: 7119136
    num_steps_trained: 7119136
  iterations_since_restore: 178
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1173,51524.9,7119136,1.89738,1.9804,-2,34.4305


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7127128
  custom_metrics: {}
  date: 2021-12-10_03-21-31
  done: false
  episode_len_mean: 33.37394957983193
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9011865563753272
  episode_reward_min: -2.0
  episodes_this_iter: 238
  episodes_total: 117495
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.965180404484272
          entropy_coeff: 0.0
          kl: 0.013671631983015686
          policy_loss: -0.09509392199106514
          total_loss: -0.04970079357735813
          vf_explained_var: 0.801199197769165
          vf_loss: 0.024629337538499385
    num_agent_steps_sampled: 7127128
    num_steps_sampled: 7127128
    num_steps_trained: 7127128
  iterations_since_restore: 179
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1174,51571.7,7127128,1.90119,1.9796,-2,33.3739


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7135120
  custom_metrics: {}
  date: 2021-12-10_03-22-18
  done: false
  episode_len_mean: 36.20603015075377
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9279135657315278
  episode_reward_min: 1.6016000509262085
  episodes_this_iter: 199
  episodes_total: 117694
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9916115049272776
          entropy_coeff: 0.0
          kl: 0.01455876394174993
          policy_loss: -0.10503016601433046
          total_loss: -0.05844994282233529
          vf_explained_var: 0.7741647958755493
          vf_loss: 0.024469101335853338
    num_agent_steps_sampled: 7135120
    num_steps_sampled: 7135120
    num_steps_trained: 7135120
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1175,51618.7,7135120,1.92791,1.9784,1.6016,36.206


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7143112
  custom_metrics: {}
  date: 2021-12-10_03-23-05
  done: false
  episode_len_mean: 38.215859030837
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9065656388908756
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 117921
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9454641025513411
          entropy_coeff: 0.0
          kl: 0.01397650662693195
          policy_loss: -0.09983911173185334
          total_loss: -0.054304382792906836
          vf_explained_var: 0.7914656400680542
          vf_loss: 0.02430791052756831
    num_agent_steps_sampled: 7143112
    num_steps_sampled: 7143112
    num_steps_trained: 7143112
  iterations_since_restore: 181
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1176,51665.2,7143112,1.90657,1.9784,-2,38.2159


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7151104
  custom_metrics: {}
  date: 2021-12-10_03-23-51
  done: false
  episode_len_mean: 37.99523809523809
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9063733299573262
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 118131
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9691142458468676
          entropy_coeff: 0.0
          kl: 0.014162677573040128
          policy_loss: -0.10068003687774763
          total_loss: -0.05513193451042753
          vf_explained_var: 0.8025234937667847
          vf_loss: 0.024038535309955478
    num_agent_steps_sampled: 7151104
    num_steps_sampled: 7151104
    num_steps_trained: 7151104
  iterations_since_restore: 18

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1177,51711.7,7151104,1.90637,1.978,-2,37.9952


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7159096
  custom_metrics: {}
  date: 2021-12-10_03-24-38
  done: false
  episode_len_mean: 38.21800947867298
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9239128000363355
  episode_reward_min: 0.5971999764442444
  episodes_this_iter: 211
  episodes_total: 118342
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9670741837471724
          entropy_coeff: 0.0
          kl: 0.01485360463266261
          policy_loss: -0.10362064636137802
          total_loss: -0.055636725039221346
          vf_explained_var: 0.7599872350692749
          vf_loss: 0.025425007683224976
    num_agent_steps_sampled: 7159096
    num_steps_sampled: 7159096
    num_steps_trained: 7159096
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1178,51758.5,7159096,1.92391,1.9796,0.5972,38.218


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7167088
  custom_metrics: {}
  date: 2021-12-10_03-25-25
  done: false
  episode_len_mean: 38.9622641509434
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.90432452761902
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 118554
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.966548852622509
          entropy_coeff: 0.0
          kl: 0.014529956912156194
          policy_loss: -0.10250908727175556
          total_loss: -0.05194351257523522
          vf_explained_var: 0.7681275606155396
          vf_loss: 0.028498201863840222
    num_agent_steps_sampled: 7167088
    num_steps_sampled: 7167088
    num_steps_trained: 7167088
  iterations_since_restore: 184
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1179,51805.4,7167088,1.90432,1.978,-2,38.9623


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7175080
  custom_metrics: {}
  date: 2021-12-10_03-26-12
  done: false
  episode_len_mean: 36.39408866995074
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.909318222788167
  episode_reward_min: -2.0
  episodes_this_iter: 203
  episodes_total: 118757
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0072824209928513
          entropy_coeff: 0.0
          kl: 0.013413199660135433
          policy_loss: -0.09990536293480545
          total_loss: -0.05551619752077386
          vf_explained_var: 0.8297165632247925
          vf_loss: 0.024017868156079203
    num_agent_steps_sampled: 7175080
    num_steps_sampled: 7175080
    num_steps_trained: 7175080
  iterations_since_restore: 185


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1180,51852.2,7175080,1.90932,1.9828,-2,36.3941


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7183072
  custom_metrics: {}
  date: 2021-12-10_03-26-59
  done: false
  episode_len_mean: 36.25431034482759
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9110000012763615
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 118989
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.967468025162816
          entropy_coeff: 0.0
          kl: 0.013720814196858555
          policy_loss: -0.09819494147086516
          total_loss: -0.05448228193563409
          vf_explained_var: 0.8175156116485596
          vf_loss: 0.022874173941090703
    num_agent_steps_sampled: 7183072
    num_steps_sampled: 7183072
    num_steps_trained: 7183072
  iterations_since_restore: 186

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1181,51899.2,7183072,1.911,1.98,-2,36.2543


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7191064
  custom_metrics: {}
  date: 2021-12-10_03-27-46
  done: false
  episode_len_mean: 34.008810572687224
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.93237180279215
  episode_reward_min: 1.6360000371932983
  episodes_this_iter: 227
  episodes_total: 119216
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9819108191877604
          entropy_coeff: 0.0
          kl: 0.014160699211061
          policy_loss: -0.10258253040956333
          total_loss: -0.057646390559966676
          vf_explained_var: 0.7861039638519287
          vf_loss: 0.023429577064234763
    num_agent_steps_sampled: 7191064
    num_steps_sampled: 7191064
    num_steps_trained: 7191064
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1182,51946,7191064,1.93237,1.9772,1.636,34.0088


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7199056
  custom_metrics: {}
  date: 2021-12-10_03-28-32
  done: false
  episode_len_mean: 32.572649572649574
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9352324798575833
  episode_reward_min: 1.7344000339508057
  episodes_this_iter: 234
  episodes_total: 119450
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9640663675963879
          entropy_coeff: 0.0
          kl: 0.014322446571895853
          policy_loss: -0.10469193884637207
          total_loss: -0.062137887623975985
          vf_explained_var: 0.793681800365448
          vf_loss: 0.02080183700309135
    num_agent_steps_sampled: 7199056
    num_steps_sampled: 7199056
    num_steps_trained: 7199056
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1183,51991.8,7199056,1.93523,1.9784,1.7344,32.5726


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7207048
  custom_metrics: {}
  date: 2021-12-10_03-29-18
  done: false
  episode_len_mean: 32.80349344978166
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9179213973632545
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 119679
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.982695834711194
          entropy_coeff: 0.0
          kl: 0.013775991479633376
          policy_loss: -0.10474427785084117
          total_loss: -0.06465624982956797
          vf_explained_var: 0.851259171962738
          vf_loss: 0.019165742705808952
    num_agent_steps_sampled: 7207048
    num_steps_sampled: 7207048
    num_steps_trained: 7207048
  iterations_since_restore: 189
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1184,52037.7,7207048,1.91792,1.9796,-2,32.8035


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7215040
  custom_metrics: {}
  date: 2021-12-10_03-30-04
  done: false
  episode_len_mean: 41.5260663507109
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8993061583189037
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 119890
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.949994008988142
          entropy_coeff: 0.0
          kl: 0.013919160468503833
          policy_loss: -0.09710480735520832
          total_loss: -0.05552226968575269
          vf_explained_var: 0.8320329785346985
          vf_loss: 0.02044281375128776
    num_agent_steps_sampled: 7215040
    num_steps_sampled: 7215040
    num_steps_trained: 7215040
  iterations_since_restore: 190
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1185,52083.5,7215040,1.89931,1.9796,-2,41.5261


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7223032
  custom_metrics: {}
  date: 2021-12-10_03-30-50
  done: false
  episode_len_mean: 39.346846846846844
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9043207206167616
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 120112
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9377172384411097
          entropy_coeff: 0.0
          kl: 0.014422950451262295
          policy_loss: -0.10461599007248878
          total_loss: -0.060436147992732
          vf_explained_var: 0.7738479971885681
          vf_loss: 0.022274986549746245
    num_agent_steps_sampled: 7223032
    num_steps_sampled: 7223032
    num_steps_trained: 7223032
  iterations_since_restore: 191


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1186,52129.8,7223032,1.90432,1.9832,-2,39.3468


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7231024
  custom_metrics: {}
  date: 2021-12-10_03-31-35
  done: false
  episode_len_mean: 36.2
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.910688000255161
  episode_reward_min: -2.0
  episodes_this_iter: 225
  episodes_total: 120337
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9210457149893045
          entropy_coeff: 0.0
          kl: 0.013873176299966872
          policy_loss: -0.09363667777506635
          total_loss: -0.05077299219556153
          vf_explained_var: 0.7179233431816101
          vf_loss: 0.021793800115119666
    num_agent_steps_sampled: 7231024
    num_steps_sampled: 7231024
    num_steps_trained: 7231024
  iterations_since_restore: 192
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1187,52175.4,7231024,1.91069,1.982,-2,36.2


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7239016
  custom_metrics: {}
  date: 2021-12-10_03-32-22
  done: false
  episode_len_mean: 33.43290043290043
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9336086630305171
  episode_reward_min: 1.6684000492095947
  episodes_this_iter: 231
  episodes_total: 120568
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.939184008166194
          entropy_coeff: 0.0
          kl: 0.015181961003690958
          policy_loss: -0.09857533837202936
          total_loss: -0.054282023484120145
          vf_explained_var: 0.7473660707473755
          vf_loss: 0.021235710533801466
    num_agent_steps_sampled: 7239016
    num_steps_sampled: 7239016
    num_steps_trained: 7239016
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1188,52221.4,7239016,1.93361,1.982,1.6684,33.4329


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7247008
  custom_metrics: {}
  date: 2021-12-10_03-33-07
  done: false
  episode_len_mean: 33.29680365296804
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.933822834872764
  episode_reward_min: 1.4539999961853027
  episodes_this_iter: 219
  episodes_total: 120787
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9946218281984329
          entropy_coeff: 0.0
          kl: 0.013953521847724915
          policy_loss: -0.10304104059468955
          total_loss: -0.05832753849972505
          vf_explained_var: 0.7885405421257019
          vf_loss: 0.02352159161819145
    num_agent_steps_sampled: 7247008
    num_steps_sampled: 7247008
    num_steps_trained: 7247008
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1189,52266.9,7247008,1.93382,1.9796,1.454,33.2968


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7255000
  custom_metrics: {}
  date: 2021-12-10_03-33-53
  done: false
  episode_len_mean: 34.86607142857143
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8964946413678783
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 121011
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9669407084584236
          entropy_coeff: 0.0
          kl: 0.013473908882588148
          policy_loss: -0.09424544853391126
          total_loss: -0.045991374863660894
          vf_explained_var: 0.8068610429763794
          vf_loss: 0.027790573658421636
    num_agent_steps_sampled: 7255000
    num_steps_sampled: 7255000
    num_steps_trained: 7255000
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1190,52312.6,7255000,1.89649,1.982,-2,34.8661


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7262992
  custom_metrics: {}
  date: 2021-12-10_03-34-38
  done: false
  episode_len_mean: 36.476190476190474
  episode_media: {}
  episode_reward_max: 1.975600004196167
  episode_reward_mean: 1.90876571337382
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 121221
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9918946512043476
          entropy_coeff: 0.0
          kl: 0.013847508875187486
          policy_loss: -0.09696953315869905
          total_loss: -0.053604720524162985
          vf_explained_var: 0.8246982097625732
          vf_loss: 0.022333909757435322
    num_agent_steps_sampled: 7262992
    num_steps_sampled: 7262992
    num_steps_trained: 7262992
  iterations_since_restore: 196

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1191,52358.1,7262992,1.90877,1.9756,-2,36.4762


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7270984
  custom_metrics: {}
  date: 2021-12-10_03-35-24
  done: false
  episode_len_mean: 34.031088082901555
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.912777197175693
  episode_reward_min: -2.0
  episodes_this_iter: 193
  episodes_total: 121414
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0390349328517914
          entropy_coeff: 0.0
          kl: 0.014258415380027145
          policy_loss: -0.11579184720176272
          total_loss: -0.06913315007113852
          vf_explained_var: 0.8442397713661194
          vf_loss: 0.025003730785101652
    num_agent_steps_sampled: 7270984
    num_steps_sampled: 7270984
    num_steps_trained: 7270984
  iterations_since_restore: 19

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1192,52404,7270984,1.91278,1.9784,-2,34.0311


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7278976
  custom_metrics: {}
  date: 2021-12-10_03-36-10
  done: false
  episode_len_mean: 45.03738317757009
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.910274767987082
  episode_reward_min: 0.0
  episodes_this_iter: 214
  episodes_total: 121628
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0174270030111074
          entropy_coeff: 0.0
          kl: 0.014014344254974276
          policy_loss: -0.10363684190087952
          total_loss: -0.059190702660998795
          vf_explained_var: 0.8447538614273071
          vf_loss: 0.02316185785457492
    num_agent_steps_sampled: 7278976
    num_steps_sampled: 7278976
    num_steps_trained: 7278976
  iterations_since_restore: 198
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1193,52449.6,7278976,1.91027,1.9796,0,45.0374


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7286968
  custom_metrics: {}
  date: 2021-12-10_03-36-56
  done: false
  episode_len_mean: 39.525862068965516
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.904296553211993
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 121860
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9477970283478498
          entropy_coeff: 0.0
          kl: 0.013516933133359998
          policy_loss: -0.09800921610440128
          total_loss: -0.04877636322635226
          vf_explained_var: 0.7488244771957397
          vf_loss: 0.028704010823275894
    num_agent_steps_sampled: 7286968
    num_steps_sampled: 7286968
    num_steps_trained: 7286968
  iterations_since_restore: 19

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1194,52495.5,7286968,1.9043,1.9776,-2,39.5259


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7294960
  custom_metrics: {}
  date: 2021-12-10_03-37-42
  done: false
  episode_len_mean: 37.24311926605505
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9259339417886296
  episode_reward_min: 1.4811999797821045
  episodes_this_iter: 218
  episodes_total: 122078
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9290776886045933
          entropy_coeff: 0.0
          kl: 0.014198015967849642
          policy_loss: -0.10990924929501489
          total_loss: -0.0650791639345698
          vf_explained_var: 0.6926113963127136
          vf_loss: 0.023266849981155246
    num_agent_steps_sampled: 7294960
    num_steps_sampled: 7294960
    num_steps_trained: 7294960
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1195,52541.8,7294960,1.92593,1.982,1.4812,37.2431


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7302952
  custom_metrics: {}
  date: 2021-12-10_03-38-28
  done: false
  episode_len_mean: 32.38260869565217
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9184539152228315
  episode_reward_min: -2.0
  episodes_this_iter: 230
  episodes_total: 122308
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9819716773927212
          entropy_coeff: 0.0
          kl: 0.013785380142508075
          policy_loss: -0.1070112279849127
          total_loss: -0.06513773446204141
          vf_explained_var: 0.792711615562439
          vf_loss: 0.020936946908477694
    num_agent_steps_sampled: 7302952
    num_steps_sampled: 7302952
    num_steps_trained: 7302952
  iterations_since_restore: 201


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1196,52587.6,7302952,1.91845,1.978,-2,32.3826


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7310944
  custom_metrics: {}
  date: 2021-12-10_03-39-14
  done: false
  episode_len_mean: 39.117924528301884
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9221754726373925
  episode_reward_min: 1.093999981880188
  episodes_this_iter: 212
  episodes_total: 122520
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9698949623852968
          entropy_coeff: 0.0
          kl: 0.014422424574149773
          policy_loss: -0.10350624614511617
          total_loss: -0.060917957467609085
          vf_explained_var: 0.774834156036377
          vf_loss: 0.020684230170445517
    num_agent_steps_sampled: 7310944
    num_steps_sampled: 7310944
    num_steps_trained: 7310944
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1197,52633.2,7310944,1.92218,1.9792,1.094,39.1179


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7318936
  custom_metrics: {}
  date: 2021-12-10_03-39-59
  done: false
  episode_len_mean: 34.050228310502284
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9322739698026823
  episode_reward_min: 1.6092000007629395
  episodes_this_iter: 219
  episodes_total: 122739
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9511177744716406
          entropy_coeff: 0.0
          kl: 0.01525502116419375
          policy_loss: -0.10940677156031597
          total_loss: -0.06333206646377221
          vf_explained_var: 0.7370184659957886
          vf_loss: 0.022906143160071224
    num_agent_steps_sampled: 7318936
    num_steps_sampled: 7318936
    num_steps_trained: 7318936
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1198,52678.7,7318936,1.93227,1.9832,1.6092,34.0502


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7326928
  custom_metrics: {}
  date: 2021-12-10_03-40-45
  done: false
  episode_len_mean: 39.11707317073171
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8684039017049279
  episode_reward_min: -2.0
  episodes_this_iter: 205
  episodes_total: 122944
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0084258187562227
          entropy_coeff: 0.0
          kl: 0.012597477034432814
          policy_loss: -0.09184531315986533
          total_loss: -0.04183952839230187
          vf_explained_var: 0.7902019619941711
          vf_loss: 0.03087336791213602
    num_agent_steps_sampled: 7326928
    num_steps_sampled: 7326928
    num_steps_trained: 7326928
  iterations_since_restore: 204

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1199,52724.4,7326928,1.8684,1.984,-2,39.1171


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7334920
  custom_metrics: {}
  date: 2021-12-10_03-41-31
  done: false
  episode_len_mean: 40.18781725888325
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9006700509695837
  episode_reward_min: -2.0
  episodes_this_iter: 197
  episodes_total: 123141
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0053202100098133
          entropy_coeff: 0.0
          kl: 0.014323599549243227
          policy_loss: -0.09927029084064998
          total_loss: -0.05520491873903666
          vf_explained_var: 0.8074020743370056
          vf_loss: 0.022311404929496348
    num_agent_steps_sampled: 7334920
    num_steps_sampled: 7334920
    num_steps_trained: 7334920
  iterations_since_restore: 20

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1200,52770.2,7334920,1.90067,1.984,-2,40.1878


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7342912
  custom_metrics: {}
  date: 2021-12-10_03-42-17
  done: false
  episode_len_mean: 39.49268292682927
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.921418534837118
  episode_reward_min: 1.0515999794006348
  episodes_this_iter: 205
  episodes_total: 123346
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0072130300104618
          entropy_coeff: 0.0
          kl: 0.01526010359521024
          policy_loss: -0.10701889588381164
          total_loss: -0.0597386063891463
          vf_explained_var: 0.7643210291862488
          vf_loss: 0.02410401013912633
    num_agent_steps_sampled: 7342912
    num_steps_sampled: 7342912
    num_steps_trained: 7342912
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1201,52816.1,7342912,1.92142,1.9836,1.0516,39.4927


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7350904
  custom_metrics: {}
  date: 2021-12-10_03-43-03
  done: false
  episode_len_mean: 34.887445887445885
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8805887415295555
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 123577
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9419489204883575
          entropy_coeff: 0.0
          kl: 0.012780637305695564
          policy_loss: -0.09308177686762065
          total_loss: -0.05039218903402798
          vf_explained_var: 0.8300732374191284
          vf_loss: 0.023278997628949583
    num_agent_steps_sampled: 7350904
    num_steps_sampled: 7350904
    num_steps_trained: 7350904
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1202,52862.4,7350904,1.88059,1.9836,-2,34.8874


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7358896
  custom_metrics: {}
  date: 2021-12-10_03-43-49
  done: false
  episode_len_mean: 35.652582159624416
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9105126734630602
  episode_reward_min: -2.0
  episodes_this_iter: 213
  episodes_total: 123790
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9861763138324022
          entropy_coeff: 0.0
          kl: 0.013403971359366551
          policy_loss: -0.09881985059473664
          total_loss: -0.053329067362938076
          vf_explained_var: 0.8050599098205566
          vf_loss: 0.025133500341326
    num_agent_steps_sampled: 7358896
    num_steps_sampled: 7358896
    num_steps_trained: 7358896
  iterations_since_restore: 208

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1203,52908.2,7358896,1.91051,1.9836,-2,35.6526


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7366888
  custom_metrics: {}
  date: 2021-12-10_03-44-35
  done: false
  episode_len_mean: 34.507109004739334
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9130426520984884
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 124001
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.00475799664855
          entropy_coeff: 0.0
          kl: 0.013537939463276416
          policy_loss: -0.10266975731065031
          total_loss: -0.055615035627852194
          vf_explained_var: 0.8154561519622803
          vf_loss: 0.026493975368794054
    num_agent_steps_sampled: 7366888
    num_steps_sampled: 7366888
    num_steps_trained: 7366888
  iterations_since_restore: 20

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1204,52954.1,7366888,1.91304,1.9836,-2,34.5071


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7374880
  custom_metrics: {}
  date: 2021-12-10_03-45-21
  done: false
  episode_len_mean: 41.08810572687225
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9182431737231789
  episode_reward_min: 0.0
  episodes_this_iter: 227
  episodes_total: 124228
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9529848285019398
          entropy_coeff: 0.0
          kl: 0.014419274957617745
          policy_loss: -0.10482715771649964
          total_loss: -0.05741307488642633
          vf_explained_var: 0.740512490272522
          vf_loss: 0.025514810986351222
    num_agent_steps_sampled: 7374880
    num_steps_sampled: 7374880
    num_steps_trained: 7374880
  iterations_since_restore: 210

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1205,53000.2,7374880,1.91824,1.9836,0,41.0881


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7382872
  custom_metrics: {}
  date: 2021-12-10_03-46-07
  done: false
  episode_len_mean: 37.410628019323674
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9255072471600223
  episode_reward_min: 1.229200005531311
  episodes_this_iter: 207
  episodes_total: 124435
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9894544165581465
          entropy_coeff: 0.0
          kl: 0.015095121489139274
          policy_loss: -0.11485757870832458
          total_loss: -0.06652905960800126
          vf_explained_var: 0.740381121635437
          vf_loss: 0.025402804370969534
    num_agent_steps_sampled: 7382872
    num_steps_sampled: 7382872
    num_steps_trained: 7382872
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1206,53045.8,7382872,1.92551,1.9836,1.2292,37.4106


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7390864
  custom_metrics: {}
  date: 2021-12-10_03-46-52
  done: false
  episode_len_mean: 34.54838709677419
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8787594451332972
  episode_reward_min: -2.0
  episodes_this_iter: 217
  episodes_total: 124652
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9783992283046246
          entropy_coeff: 0.0
          kl: 0.013714512140722945
          policy_loss: -0.09330335883714724
          total_loss: -0.04753460135543719
          vf_explained_var: 0.8513166904449463
          vf_loss: 0.024939841823652387
    num_agent_steps_sampled: 7390864
    num_steps_sampled: 7390864
    num_steps_trained: 7390864
  iterations_since_restore: 21

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1207,53091.5,7390864,1.87876,1.9836,-2,34.5484


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7398856
  custom_metrics: {}
  date: 2021-12-10_03-47-38
  done: false
  episode_len_mean: 32.834782608695654
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9347547816193622
  episode_reward_min: 1.2943999767303467
  episodes_this_iter: 230
  episodes_total: 124882
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9649177137762308
          entropy_coeff: 0.0
          kl: 0.014911904407199472
          policy_loss: -0.10151218461396638
          total_loss: -0.05676262965425849
          vf_explained_var: 0.8125536441802979
          vf_loss: 0.022102098912000656
    num_agent_steps_sampled: 7398856
    num_steps_sampled: 7398856
    num_steps_trained: 7398856
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1208,53137.1,7398856,1.93475,1.9836,1.2944,32.8348


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7406848
  custom_metrics: {}
  date: 2021-12-10_03-48-23
  done: false
  episode_len_mean: 39.896039603960396
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9012237633809004
  episode_reward_min: -2.0
  episodes_this_iter: 202
  episodes_total: 125084
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9996606446802616
          entropy_coeff: 0.0
          kl: 0.013680381351150572
          policy_loss: -0.10633712133858353
          total_loss: -0.05975790478987619
          vf_explained_var: 0.8091424107551575
          vf_loss: 0.025802139076404274
    num_agent_steps_sampled: 7406848
    num_steps_sampled: 7406848
    num_steps_trained: 7406848
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1209,53182.7,7406848,1.90122,1.9788,-2,39.896


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7414840
  custom_metrics: {}
  date: 2021-12-10_03-49-09
  done: false
  episode_len_mean: 40.79342723004695
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9188037561558782
  episode_reward_min: 0.15960000455379486
  episodes_this_iter: 213
  episodes_total: 125297
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9752737153321505
          entropy_coeff: 0.0
          kl: 0.014270093233790249
          policy_loss: -0.11012676695827395
          total_loss: -0.061789293540641665
          vf_explained_var: 0.7457678318023682
          vf_loss: 0.0266647677635774
    num_agent_steps_sampled: 7414840
    num_steps_sampled: 7414840
    num_steps_trained: 7414840
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1210,53228.2,7414840,1.9188,1.9796,0.1596,40.7934


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7422832
  custom_metrics: {}
  date: 2021-12-10_03-49-55
  done: false
  episode_len_mean: 37.370892018779344
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9074272309110758
  episode_reward_min: -2.0
  episodes_this_iter: 213
  episodes_total: 125510
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0173831526190042
          entropy_coeff: 0.0
          kl: 0.014013732550665736
          policy_loss: -0.1002465518831741
          total_loss: -0.0555464497738285
          vf_explained_var: 0.82906174659729
          vf_loss: 0.023416745592840016
    num_agent_steps_sampled: 7422832
    num_steps_sampled: 7422832
    num_steps_trained: 7422832
  iterations_since_restore: 216

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1211,53273.7,7422832,1.90743,1.982,-2,37.3709


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7430824
  custom_metrics: {}
  date: 2021-12-10_03-50-40
  done: false
  episode_len_mean: 35.49583333333333
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9294666642944018
  episode_reward_min: 1.267199993133545
  episodes_this_iter: 240
  episodes_total: 125750
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9492356646806002
          entropy_coeff: 0.0
          kl: 0.014813086076173931
          policy_loss: -0.11172177636763081
          total_loss: -0.06607175100361928
          vf_explained_var: 0.736130952835083
          vf_loss: 0.023152651032432914
    num_agent_steps_sampled: 7430824
    num_steps_sampled: 7430824
    num_steps_trained: 7430824
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1212,53319.5,7430824,1.92947,1.9824,1.2672,35.4958


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7438816
  custom_metrics: {}
  date: 2021-12-10_03-51-26
  done: false
  episode_len_mean: 31.821862348178136
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9050348181473582
  episode_reward_min: -2.0
  episodes_this_iter: 247
  episodes_total: 125997
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9338809978216887
          entropy_coeff: 0.0
          kl: 0.012666694121435285
          policy_loss: -0.08961001451825723
          total_loss: -0.04002792573010083
          vf_explained_var: 0.7690125703811646
          vf_loss: 0.030344547471031547
    num_agent_steps_sampled: 7438816
    num_steps_sampled: 7438816
    num_steps_trained: 7438816
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1213,53365.3,7438816,1.90503,1.9824,-2,31.8219


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7446808
  custom_metrics: {}
  date: 2021-12-10_03-52-12
  done: false
  episode_len_mean: 34.33624454148472
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.914433184669528
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 126226
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9537202008068562
          entropy_coeff: 0.0
          kl: 0.013697718357434496
          policy_loss: -0.09664067003177479
          total_loss: -0.05111313486122526
          vf_explained_var: 0.7792298793792725
          vf_loss: 0.02472412574570626
    num_agent_steps_sampled: 7446808
    num_steps_sampled: 7446808
    num_steps_trained: 7446808
  iterations_since_restore: 219

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1214,53411.1,7446808,1.91443,1.9824,-2,34.3362


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7454800
  custom_metrics: {}
  date: 2021-12-10_03-52-58
  done: false
  episode_len_mean: 34.39449541284404
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9140183504568327
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 126444
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9762759543955326
          entropy_coeff: 0.0
          kl: 0.013279526436235756
          policy_loss: -0.09584497688047122
          total_loss: -0.04415543966752011
          vf_explained_var: 0.7526845932006836
          vf_loss: 0.03152125870110467
    num_agent_steps_sampled: 7454800
    num_steps_sampled: 7454800
    num_steps_trained: 7454800
  iterations_since_restore: 220

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1215,53456.9,7454800,1.91402,1.9824,-2,34.3945


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7462792
  custom_metrics: {}
  date: 2021-12-10_03-53-43
  done: false
  episode_len_mean: 36.48803827751196
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.927473680254375
  episode_reward_min: 1.5063999891281128
  episodes_this_iter: 209
  episodes_total: 126653
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9925598427653313
          entropy_coeff: 0.0
          kl: 0.014851732354145497
          policy_loss: -0.10450411454075947
          total_loss: -0.05834723575389944
          vf_explained_var: 0.7999580502510071
          vf_loss: 0.02360081166261807
    num_agent_steps_sampled: 7462792
    num_steps_sampled: 7462792
    num_steps_trained: 7462792
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1216,53502.4,7462792,1.92747,1.9824,1.5064,36.488


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7470784
  custom_metrics: {}
  date: 2021-12-10_03-54-29
  done: false
  episode_len_mean: 47.88421052631579
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8432357888472708
  episode_reward_min: -2.0
  episodes_this_iter: 190
  episodes_total: 126843
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0135301370173693
          entropy_coeff: 0.0
          kl: 0.012902049056719989
          policy_loss: -0.09593031654367223
          total_loss: -0.04529784744954668
          vf_explained_var: 0.7923710942268372
          vf_loss: 0.031037482898682356
    num_agent_steps_sampled: 7470784
    num_steps_sampled: 7470784
    num_steps_trained: 7470784
  iterations_since_restore: 22

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1217,53547.9,7470784,1.84324,1.9804,-2,47.8842


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7478776
  custom_metrics: {}
  date: 2021-12-10_03-55-16
  done: false
  episode_len_mean: 36.2029702970297
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8522257438980707
  episode_reward_min: -2.0
  episodes_this_iter: 202
  episodes_total: 127045
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9949788935482502
          entropy_coeff: 0.0
          kl: 0.013243642897577956
          policy_loss: -0.0952404471609043
          total_loss: -0.03826053629745729
          vf_explained_var: 0.8199940919876099
          vf_loss: 0.036866127979010344
    num_agent_steps_sampled: 7478776
    num_steps_sampled: 7478776
    num_steps_trained: 7478776
  iterations_since_restore: 223

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1218,53594.7,7478776,1.85223,1.9784,-2,36.203


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7486768
  custom_metrics: {}
  date: 2021-12-10_03-56-05
  done: false
  episode_len_mean: 37.424778761061944
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9090300861713105
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 127271
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9657274391502142
          entropy_coeff: 0.0
          kl: 0.013840031431755051
          policy_loss: -0.10088170450762846
          total_loss: -0.05423670800519176
          vf_explained_var: 0.826453685760498
          vf_loss: 0.02562545007094741
    num_agent_steps_sampled: 7486768
    num_steps_sampled: 7486768
    num_steps_trained: 7486768
  iterations_since_restore: 224

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1219,53643.6,7486768,1.90903,1.9816,-2,37.4248


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7494760
  custom_metrics: {}
  date: 2021-12-10_03-56-50
  done: false
  episode_len_mean: 41.09950248756219
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8798666729855893
  episode_reward_min: -2.0
  episodes_this_iter: 201
  episodes_total: 127472
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9782695267349482
          entropy_coeff: 0.0
          kl: 0.013048400258412585
          policy_loss: -0.09842376812594011
          total_loss: -0.04579480621032417
          vf_explained_var: 0.7862008810043335
          vf_loss: 0.03281170455738902
    num_agent_steps_sampled: 7494760
    num_steps_sampled: 7494760
    num_steps_trained: 7494760
  iterations_since_restore: 225

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1220,53689.1,7494760,1.87987,1.982,-2,41.0995


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7502752
  custom_metrics: {}
  date: 2021-12-10_03-57-36
  done: false
  episode_len_mean: 35.97129186602871
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8949033478230382
  episode_reward_min: -2.0
  episodes_this_iter: 209
  episodes_total: 127681
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0147226974368095
          entropy_coeff: 0.0
          kl: 0.014114237215835601
          policy_loss: -0.10158814341411926
          total_loss: -0.05399223818676546
          vf_explained_var: 0.8513805866241455
          vf_loss: 0.026159907458350062
    num_agent_steps_sampled: 7502752
    num_steps_sampled: 7502752
    num_steps_trained: 7502752
  iterations_since_restore: 226

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1221,53734.6,7502752,1.8949,1.9832,-2,35.9713


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7510744
  custom_metrics: {}
  date: 2021-12-10_03-58-22
  done: false
  episode_len_mean: 35.38073394495413
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8943577988432088
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 127899
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9650035593658686
          entropy_coeff: 0.0
          kl: 0.014130517287412658
          policy_loss: -0.09776417218381539
          total_loss: -0.050263452969375066
          vf_explained_var: 0.8535503149032593
          vf_loss: 0.026039996941108257
    num_agent_steps_sampled: 7510744
    num_steps_sampled: 7510744
    num_steps_trained: 7510744
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1222,53780.5,7510744,1.89436,1.98,-2,35.3807


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7518736
  custom_metrics: {}
  date: 2021-12-10_03-59-07
  done: false
  episode_len_mean: 37.315555555555555
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9086613334549798
  episode_reward_min: -2.0
  episodes_this_iter: 225
  episodes_total: 128124
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9766652956604958
          entropy_coeff: 0.0
          kl: 0.01493866229429841
          policy_loss: -0.10595660732360557
          total_loss: -0.055761048279237
          vf_explained_var: 0.8168075084686279
          vf_loss: 0.02750746440142393
    num_agent_steps_sampled: 7518736
    num_steps_sampled: 7518736
    num_steps_trained: 7518736
  iterations_since_restore: 228

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1223,53825.9,7518736,1.90866,1.982,-2,37.3156


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7526728
  custom_metrics: {}
  date: 2021-12-10_03-59-53
  done: false
  episode_len_mean: 34.61818181818182
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9311345447193493
  episode_reward_min: 1.7015999555587769
  episodes_this_iter: 220
  episodes_total: 128344
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9580125138163567
          entropy_coeff: 0.0
          kl: 0.014919937268132344
          policy_loss: -0.10680381319252774
          total_loss: -0.060578689590329304
          vf_explained_var: 0.7745278477668762
          vf_loss: 0.0235654714924749
    num_agent_steps_sampled: 7526728
    num_steps_sampled: 7526728
    num_steps_trained: 7526728
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1224,53871.5,7526728,1.93113,1.9808,1.7016,34.6182


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7534720
  custom_metrics: {}
  date: 2021-12-10_04-00-38
  done: false
  episode_len_mean: 40.05263157894737
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8839559788909255
  episode_reward_min: -2.0
  episodes_this_iter: 209
  episodes_total: 128553
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9794934224337339
          entropy_coeff: 0.0
          kl: 0.01333557369071059
          policy_loss: -0.09115412601386197
          total_loss: -0.04545675605186261
          vf_explained_var: 0.8570590019226074
          vf_loss: 0.025443964230362326
    num_agent_steps_sampled: 7534720
    num_steps_sampled: 7534720
    num_steps_trained: 7534720
  iterations_since_restore: 230

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1225,53917,7534720,1.88396,1.9808,-2,40.0526


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7542712
  custom_metrics: {}
  date: 2021-12-10_04-01-24
  done: false
  episode_len_mean: 36.43478260869565
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8896657003872637
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 128760
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9832378923892975
          entropy_coeff: 0.0
          kl: 0.01320213483995758
          policy_loss: -0.09783504591905512
          total_loss: -0.044281771464738995
          vf_explained_var: 0.7633360624313354
          vf_loss: 0.03350253443932161
    num_agent_steps_sampled: 7542712
    num_steps_sampled: 7542712
    num_steps_trained: 7542712
  iterations_since_restore: 231

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1226,53962.7,7542712,1.88967,1.9804,-2,36.4348


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7550704
  custom_metrics: {}
  date: 2021-12-10_04-02-10
  done: false
  episode_len_mean: 40.91203703703704
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.900414815655461
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 128976
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9620718415826559
          entropy_coeff: 0.0
          kl: 0.013881443330319598
          policy_loss: -0.09561490715714172
          total_loss: -0.04372880651499145
          vf_explained_var: 0.704084575176239
          vf_loss: 0.03080365783534944
    num_agent_steps_sampled: 7550704
    num_steps_sampled: 7550704
    num_steps_trained: 7550704
  iterations_since_restore: 232

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1227,54008.3,7550704,1.90041,1.9808,-2,40.912


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7558696
  custom_metrics: {}
  date: 2021-12-10_04-02-56
  done: false
  episode_len_mean: 38.34975369458128
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.886191130858924
  episode_reward_min: -2.0
  episodes_this_iter: 203
  episodes_total: 129179
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9493448678404093
          entropy_coeff: 0.0
          kl: 0.012854727217927575
          policy_loss: -0.08876654900086578
          total_loss: -0.036626625136705115
          vf_explained_var: 0.736143946647644
          vf_loss: 0.03261680531431921
    num_agent_steps_sampled: 7558696
    num_steps_sampled: 7558696
    num_steps_trained: 7558696
  iterations_since_restore: 233


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1228,54054.5,7558696,1.88619,1.9812,-2,38.3498


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7566688
  custom_metrics: {}
  date: 2021-12-10_04-03-42
  done: false
  episode_len_mean: 36.3348623853211
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.927682569267553
  episode_reward_min: 1.2267999649047852
  episodes_this_iter: 218
  episodes_total: 129397
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9689001236110926
          entropy_coeff: 0.0
          kl: 0.01467481633881107
          policy_loss: -0.107339620240964
          total_loss: -0.058936481684213504
          vf_explained_var: 0.7290701270103455
          vf_loss: 0.026115763175766915
    num_agent_steps_sampled: 7566688
    num_steps_sampled: 7566688
    num_steps_trained: 7566688
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1229,54100.3,7566688,1.92768,1.9824,1.2268,36.3349


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7574680
  custom_metrics: {}
  date: 2021-12-10_04-04-28
  done: false
  episode_len_mean: 37.069868995633186
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.909708295326566
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 129626
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9371951054781675
          entropy_coeff: 0.0
          kl: 0.01406490127556026
          policy_loss: -0.09918475622544065
          total_loss: -0.05345866779680364
          vf_explained_var: 0.768930196762085
          vf_loss: 0.02436501975171268
    num_agent_steps_sampled: 7574680
    num_steps_sampled: 7574680
    num_steps_trained: 7574680
  iterations_since_restore: 235

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1230,54146.3,7574680,1.90971,1.9824,-2,37.0699


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7582672
  custom_metrics: {}
  date: 2021-12-10_04-05-14
  done: false
  episode_len_mean: 33.61181434599156
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.933191560491731
  episode_reward_min: 1.6643999814987183
  episodes_this_iter: 237
  episodes_total: 129863
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9122703373432159
          entropy_coeff: 0.0
          kl: 0.01496078455238603
          policy_loss: -0.10578045347938314
          total_loss: -0.05999817786505446
          vf_explained_var: 0.6682450771331787
          vf_loss: 0.02306058316025883
    num_agent_steps_sampled: 7582672
    num_steps_sampled: 7582672
    num_steps_trained: 7582672
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1231,54192.1,7582672,1.93319,1.9824,1.6644,33.6118


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7590664
  custom_metrics: {}
  date: 2021-12-10_04-06-00
  done: false
  episode_len_mean: 33.51315789473684
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.882070175388403
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 130091
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.946384260430932
          entropy_coeff: 0.0
          kl: 0.012688544287811965
          policy_loss: -0.09598527582420502
          total_loss: -0.04244232882047072
          vf_explained_var: 0.6923978328704834
          vf_loss: 0.0342722178902477
    num_agent_steps_sampled: 7590664
    num_steps_sampled: 7590664
    num_steps_trained: 7590664
  iterations_since_restore: 237

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1232,54238.2,7590664,1.88207,1.9824,-2,33.5132


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7598656
  custom_metrics: {}
  date: 2021-12-10_04-06-46
  done: false
  episode_len_mean: 34.871244635193136
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.897050644195131
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 130324
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9361184053122997
          entropy_coeff: 0.0
          kl: 0.013640124176163226
          policy_loss: -0.09360700534307398
          total_loss: -0.04601858148816973
          vf_explained_var: 0.764440655708313
          vf_loss: 0.02687248616712168
    num_agent_steps_sampled: 7598656
    num_steps_sampled: 7598656
    num_steps_trained: 7598656
  iterations_since_restore: 238


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1233,54284.2,7598656,1.89705,1.9824,-2,34.8712


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7606648
  custom_metrics: {}
  date: 2021-12-10_04-07-31
  done: false
  episode_len_mean: 33.662100456621005
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9151762517075561
  episode_reward_min: -2.0
  episodes_this_iter: 219
  episodes_total: 130543
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9726717546582222
          entropy_coeff: 0.0
          kl: 0.01398316552513279
          policy_loss: -0.09870003093965352
          total_loss: -0.0558328288316261
          vf_explained_var: 0.7901820540428162
          vf_loss: 0.021630270639434457
    num_agent_steps_sampled: 7606648
    num_steps_sampled: 7606648
    num_steps_trained: 7606648
  iterations_since_restore: 239

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1234,54329.9,7606648,1.91518,1.9836,-2,33.6621


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7614640
  custom_metrics: {}
  date: 2021-12-10_04-08-17
  done: false
  episode_len_mean: 37.82378854625551
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8909303963446933
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 130770
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9339815638959408
          entropy_coeff: 0.0
          kl: 0.013737453176872805
          policy_loss: -0.0926814440463204
          total_loss: -0.043740695633459836
          vf_explained_var: 0.7661490440368652
          vf_loss: 0.028076989168766886
    num_agent_steps_sampled: 7614640
    num_steps_sampled: 7614640
    num_steps_trained: 7614640
  iterations_since_restore: 24

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1235,54375.9,7614640,1.89093,1.9836,-2,37.8238


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7622632
  custom_metrics: {}
  date: 2021-12-10_04-09-03
  done: false
  episode_len_mean: 33.74476987447699
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.932923851152843
  episode_reward_min: 1.5404000282287598
  episodes_this_iter: 239
  episodes_total: 131009
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9304346274584532
          entropy_coeff: 0.0
          kl: 0.01516515132971108
          policy_loss: -0.0994145986624062
          total_loss: -0.05186912516364828
          vf_explained_var: 0.6528298854827881
          vf_loss: 0.02451339695835486
    num_agent_steps_sampled: 7622632
    num_steps_sampled: 7622632
    num_steps_trained: 7622632
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1236,54421.7,7622632,1.93292,1.9836,1.5404,33.7448


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7630624
  custom_metrics: {}
  date: 2021-12-10_04-09-49
  done: false
  episode_len_mean: 35.03947368421053
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8968175430046885
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 131237
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9522453788667917
          entropy_coeff: 0.0
          kl: 0.014127556496532634
          policy_loss: -0.09879213714157231
          total_loss: -0.055610787705518305
          vf_explained_var: 0.8215353488922119
          vf_loss: 0.021725122787756845
    num_agent_steps_sampled: 7630624
    num_steps_sampled: 7630624
    num_steps_trained: 7630624
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1237,54467.6,7630624,1.89682,1.9824,-2,35.0395


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7638616
  custom_metrics: {}
  date: 2021-12-10_04-10-35
  done: false
  episode_len_mean: 35.633187772925766
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9292104811647572
  episode_reward_min: 1.5715999603271484
  episodes_this_iter: 229
  episodes_total: 131466
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.949477905407548
          entropy_coeff: 0.0
          kl: 0.015430669474881142
          policy_loss: -0.10916462301975116
          total_loss: -0.06464195690932684
          vf_explained_var: 0.6718092560768127
          vf_loss: 0.021087338915094733
    num_agent_steps_sampled: 7638616
    num_steps_sampled: 7638616
    num_steps_trained: 7638616
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1238,54513.6,7638616,1.92921,1.9836,1.5716,35.6332


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7646608
  custom_metrics: {}
  date: 2021-12-10_04-11-21
  done: false
  episode_len_mean: 33.57964601769911
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9332566403709681
  episode_reward_min: 1.6776000261306763
  episodes_this_iter: 226
  episodes_total: 131692
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9442086890339851
          entropy_coeff: 0.0
          kl: 0.01503002576646395
          policy_loss: -0.10809797659749165
          total_loss: -0.0649516083067283
          vf_explained_var: 0.6955334544181824
          vf_loss: 0.02031952008837834
    num_agent_steps_sampled: 7646608
    num_steps_sampled: 7646608
    num_steps_trained: 7646608
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1239,54559.6,7646608,1.93326,1.982,1.6776,33.5796


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7654600
  custom_metrics: {}
  date: 2021-12-10_04-12-07
  done: false
  episode_len_mean: 35.05172413793103
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.899036208103443
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 131924
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9338505826890469
          entropy_coeff: 0.0
          kl: 0.013212301506428048
          policy_loss: -0.08273620245745406
          total_loss: -0.03839137058821507
          vf_explained_var: 0.8112796545028687
          vf_loss: 0.024278650933410972
    num_agent_steps_sampled: 7654600
    num_steps_sampled: 7654600
    num_steps_trained: 7654600
  iterations_since_restore: 245

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1240,54605.4,7654600,1.89904,1.982,-2,35.0517


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7662592
  custom_metrics: {}
  date: 2021-12-10_04-12-53
  done: false
  episode_len_mean: 33.41991341991342
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9335567125510345
  episode_reward_min: 1.6435999870300293
  episodes_this_iter: 231
  episodes_total: 132155
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9351945295929909
          entropy_coeff: 0.0
          kl: 0.014888778154272586
          policy_loss: -0.10016106336843222
          total_loss: -0.054707454110030085
          vf_explained_var: 0.7174843549728394
          vf_loss: 0.022841279744170606
    num_agent_steps_sampled: 7662592
    num_steps_sampled: 7662592
    num_steps_trained: 7662592
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1241,54651.3,7662592,1.93356,1.9796,1.6436,33.4199


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7670584
  custom_metrics: {}
  date: 2021-12-10_04-13-39
  done: false
  episode_len_mean: 37.99047619047619
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8694876182646978
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 132365
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9778235871344805
          entropy_coeff: 0.0
          kl: 0.013575972348917276
          policy_loss: -0.09332541306503117
          total_loss: -0.051038502555456944
          vf_explained_var: 0.8212841749191284
          vf_loss: 0.021668403642252088
    num_agent_steps_sampled: 7670584
    num_steps_sampled: 7670584
    num_steps_trained: 7670584
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1242,54697.5,7670584,1.86949,1.982,-2,37.9905


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7678576
  custom_metrics: {}
  date: 2021-12-10_04-14-25
  done: false
  episode_len_mean: 32.295081967213115
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.935763932153827
  episode_reward_min: 1.4575999975204468
  episodes_this_iter: 244
  episodes_total: 132609
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9448303077369928
          entropy_coeff: 0.0
          kl: 0.013726834673434496
          policy_loss: -0.09894968720618635
          total_loss: -0.05638272827491164
          vf_explained_var: 0.8144422769546509
          vf_loss: 0.021719326759921387
    num_agent_steps_sampled: 7678576
    num_steps_sampled: 7678576
    num_steps_trained: 7678576
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1243,54743.2,7678576,1.93576,1.982,1.4576,32.2951


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7686568
  custom_metrics: {}
  date: 2021-12-10_04-15-11
  done: false
  episode_len_mean: 35.42916666666667
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8971716654797395
  episode_reward_min: -2.0
  episodes_this_iter: 240
  episodes_total: 132849
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9283650368452072
          entropy_coeff: 0.0
          kl: 0.013228208350483328
          policy_loss: -0.09409408905776218
          total_loss: -0.04910070356709184
          vf_explained_var: 0.7499786615371704
          vf_loss: 0.024903041776269674
    num_agent_steps_sampled: 7686568
    num_steps_sampled: 7686568
    num_steps_trained: 7686568
  iterations_since_restore: 24

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1244,54788.8,7686568,1.89717,1.9812,-2,35.4292


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7694560
  custom_metrics: {}
  date: 2021-12-10_04-15-56
  done: false
  episode_len_mean: 33.881632653061224
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.916542043004717
  episode_reward_min: -2.0
  episodes_this_iter: 245
  episodes_total: 133094
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9135973677039146
          entropy_coeff: 0.0
          kl: 0.013621608086396009
          policy_loss: -0.09209355944767594
          total_loss: -0.04385254161206831
          vf_explained_var: 0.6217817068099976
          vf_loss: 0.027553199615795165
    num_agent_steps_sampled: 7694560
    num_steps_sampled: 7694560
    num_steps_trained: 7694560
  iterations_since_restore: 25

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1245,54834.3,7694560,1.91654,1.982,-2,33.8816


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7702552
  custom_metrics: {}
  date: 2021-12-10_04-16-42
  done: false
  episode_len_mean: 29.265151515151516
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9120181798934937
  episode_reward_min: -2.0
  episodes_this_iter: 264
  episodes_total: 133358
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9004886839538813
          entropy_coeff: 0.0
          kl: 0.013176136621041223
          policy_loss: -0.08707725023850799
          total_loss: -0.043743911664932966
          vf_explained_var: 0.7478229999542236
          vf_loss: 0.023322083754464984
    num_agent_steps_sampled: 7702552
    num_steps_sampled: 7702552
    num_steps_trained: 7702552
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1246,54879.9,7702552,1.91202,1.982,-2,29.2652


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7710544
  custom_metrics: {}
  date: 2021-12-10_04-17-28
  done: false
  episode_len_mean: 33.43478260869565
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9335565193839694
  episode_reward_min: 1.6848000288009644
  episodes_this_iter: 230
  episodes_total: 133588
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9471257403492928
          entropy_coeff: 0.0
          kl: 0.01455323604750447
          policy_loss: -0.10903882316779345
          total_loss: -0.06772449176060036
          vf_explained_var: 0.7320311069488525
          vf_loss: 0.019211606413591653
    num_agent_steps_sampled: 7710544
    num_steps_sampled: 7710544
    num_steps_trained: 7710544
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1247,54925.5,7710544,1.93356,1.982,1.6848,33.4348


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7718536
  custom_metrics: {}
  date: 2021-12-10_04-18-13
  done: false
  episode_len_mean: 38.66504854368932
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9231145624975556
  episode_reward_min: 1.1615999937057495
  episodes_this_iter: 206
  episodes_total: 133794
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0014235954731703
          entropy_coeff: 0.0
          kl: 0.015230363642331213
          policy_loss: -0.10773054682067595
          total_loss: -0.06590794971270952
          vf_explained_var: 0.7464896440505981
          vf_loss: 0.018691480450797826
    num_agent_steps_sampled: 7718536
    num_steps_sampled: 7718536
    num_steps_trained: 7718536
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1248,54971.1,7718536,1.92311,1.98,1.1616,38.665


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7726528
  custom_metrics: {}
  date: 2021-12-10_04-19-00
  done: false
  episode_len_mean: 33.71739130434783
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.916085216273432
  episode_reward_min: -2.0
  episodes_this_iter: 230
  episodes_total: 134024
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9410132244229317
          entropy_coeff: 0.0
          kl: 0.01349931544973515
          policy_loss: -0.1032431630010251
          total_loss: -0.06314282448147424
          vf_explained_var: 0.7745081186294556
          vf_loss: 0.01959825516678393
    num_agent_steps_sampled: 7726528
    num_steps_sampled: 7726528
    num_steps_trained: 7726528
  iterations_since_restore: 254

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1249,55017.5,7726528,1.91609,1.9812,-2,33.7174


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7734520
  custom_metrics: {}
  date: 2021-12-10_04-19-45
  done: false
  episode_len_mean: 35.355
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8910560029745103
  episode_reward_min: -2.0
  episodes_this_iter: 200
  episodes_total: 134224
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0014521088451147
          entropy_coeff: 0.0
          kl: 0.01371039220248349
          policy_loss: -0.08911970388726331
          total_loss: -0.04563101666281
          vf_explained_var: 0.8164737224578857
          vf_loss: 0.02266603213502094
    num_agent_steps_sampled: 7734520
    num_steps_sampled: 7734520
    num_steps_trained: 7734520
  iterations_since_restore: 255
  node_ip: 192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1250,55063.3,7734520,1.89106,1.9808,-2,35.355


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7742512
  custom_metrics: {}
  date: 2021-12-10_04-20-31
  done: false
  episode_len_mean: 38.58510638297872
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9232468097767932
  episode_reward_min: 1.5288000106811523
  episodes_this_iter: 188
  episodes_total: 134412
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0590204931795597
          entropy_coeff: 0.0
          kl: 0.014307504374301061
          policy_loss: -0.10452638333663344
          total_loss: -0.06292634017881937
          vf_explained_var: 0.8459782600402832
          vf_loss: 0.019870522955898196
    num_agent_steps_sampled: 7742512
    num_steps_sampled: 7742512
    num_steps_trained: 7742512
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1251,55109.1,7742512,1.92325,1.98,1.5288,38.5851


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7750504
  custom_metrics: {}
  date: 2021-12-10_04-21-17
  done: false
  episode_len_mean: 40.17488789237668
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8851049315234472
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 134635
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9579046051949263
          entropy_coeff: 0.0
          kl: 0.012925522518344223
          policy_loss: -0.09404975580400787
          total_loss: -0.044304246141109616
          vf_explained_var: 0.7608551979064941
          vf_loss: 0.030114872090052813
    num_agent_steps_sampled: 7750504
    num_steps_sampled: 7750504
    num_steps_trained: 7750504
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1252,55154.9,7750504,1.8851,1.9856,-2,40.1749


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7758496
  custom_metrics: {}
  date: 2021-12-10_04-22-03
  done: false
  episode_len_mean: 43.051162790697674
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8963032538114593
  episode_reward_min: -2.0
  episodes_this_iter: 215
  episodes_total: 134850
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9235530439764261
          entropy_coeff: 0.0
          kl: 0.014598483394365758
          policy_loss: -0.1069554904679535
          total_loss: -0.0626606538426131
          vf_explained_var: 0.8057975769042969
          vf_loss: 0.022123385540908203
    num_agent_steps_sampled: 7758496
    num_steps_sampled: 7758496
    num_steps_trained: 7758496
  iterations_since_restore: 258

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1253,55200.4,7758496,1.8963,1.9856,-2,43.0512


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7766488
  custom_metrics: {}
  date: 2021-12-10_04-22-48
  done: false
  episode_len_mean: 33.802575107296136
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8995828311330771
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 135083
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9261515270918608
          entropy_coeff: 0.0
          kl: 0.013333061913726851
          policy_loss: -0.0968263722024858
          total_loss: -0.05251612377469428
          vf_explained_var: 0.8045715093612671
          vf_loss: 0.02406066219555214
    num_agent_steps_sampled: 7766488
    num_steps_sampled: 7766488
    num_steps_trained: 7766488
  iterations_since_restore: 259

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1254,55246.1,7766488,1.89958,1.9856,-2,33.8026


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7774480
  custom_metrics: {}
  date: 2021-12-10_04-23-34
  done: false
  episode_len_mean: 34.8471615720524
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.913708294843482
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 135312
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9289746936410666
          entropy_coeff: 0.0
          kl: 0.013653755420818925
          policy_loss: -0.09618981159292161
          total_loss: -0.04933306596649345
          vf_explained_var: 0.8409748077392578
          vf_loss: 0.026120106282178313
    num_agent_steps_sampled: 7774480
    num_steps_sampled: 7774480
    num_steps_trained: 7774480
  iterations_since_restore: 260


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1255,55291.8,7774480,1.91371,1.9856,-2,34.8472


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7782472
  custom_metrics: {}
  date: 2021-12-10_04-24-20
  done: false
  episode_len_mean: 32.76954732510288
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8713086421597642
  episode_reward_min: -2.0
  episodes_this_iter: 243
  episodes_total: 135555
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9236013684421778
          entropy_coeff: 0.0
          kl: 0.013216770283179358
          policy_loss: -0.09133456423296593
          total_loss: -0.037879296418395825
          vf_explained_var: 0.7997201085090637
          vf_loss: 0.03338229807559401
    num_agent_steps_sampled: 7782472
    num_steps_sampled: 7782472
    num_steps_trained: 7782472
  iterations_since_restore: 26

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1256,55337.7,7782472,1.87131,1.9856,-2,32.7695


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7790464
  custom_metrics: {}
  date: 2021-12-10_04-25-06
  done: false
  episode_len_mean: 33.53086419753087
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.9333234554455605
  episode_reward_min: 1.351199984550476
  episodes_this_iter: 243
  episodes_total: 135798
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9034824948757887
          entropy_coeff: 0.0
          kl: 0.014661764376796782
          policy_loss: -0.09978364972630516
          total_loss: -0.05315947945928201
          vf_explained_var: 0.7018402814865112
          vf_loss: 0.024356615671422333
    num_agent_steps_sampled: 7790464
    num_steps_sampled: 7790464
    num_steps_trained: 7790464

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1257,55383.4,7790464,1.93332,1.9856,1.3512,33.5309


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7798456
  custom_metrics: {}
  date: 2021-12-10_04-25-52
  done: false
  episode_len_mean: 30.48046875
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.9239234356209636
  episode_reward_min: -2.0
  episodes_this_iter: 256
  episodes_total: 136054
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9262693803757429
          entropy_coeff: 0.0
          kl: 0.013217831292422488
          policy_loss: -0.09234095460851677
          total_loss: -0.04739186179358512
          vf_explained_var: 0.7533866167068481
          vf_loss: 0.024874509021174163
    num_agent_steps_sampled: 7798456
    num_steps_sampled: 7798456
    num_steps_trained: 7798456
  iterations_since_restore: 263

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1258,55429.3,7798456,1.92392,1.9856,-2,30.4805


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7806448
  custom_metrics: {}
  date: 2021-12-10_04-26-37
  done: false
  episode_len_mean: 34.68122270742358
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8844349327045757
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 136283
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.94575010612607
          entropy_coeff: 0.0
          kl: 0.012330837256740779
          policy_loss: -0.09079933498287573
          total_loss: -0.046286899378173985
          vf_explained_var: 0.8618955612182617
          vf_loss: 0.025784976489376277
    num_agent_steps_sampled: 7806448
    num_steps_sampled: 7806448
    num_steps_trained: 7806448
  iterations_since_restore: 264

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1259,55474.9,7806448,1.88443,1.9804,-2,34.6812


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7814440
  custom_metrics: {}
  date: 2021-12-10_04-27-23
  done: false
  episode_len_mean: 34.48898678414097
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9314026438717276
  episode_reward_min: 1.2791999578475952
  episodes_this_iter: 227
  episodes_total: 136510
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9700349308550358
          entropy_coeff: 0.0
          kl: 0.014041508547961712
          policy_loss: -0.10099181180703454
          total_loss: -0.05218656890792772
          vf_explained_var: 0.753516674041748
          vf_loss: 0.027479701326228678
    num_agent_steps_sampled: 7814440
    num_steps_sampled: 7814440
    num_steps_trained: 7814440

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1260,55520.7,7814440,1.9314,1.982,1.2792,34.489


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7822432
  custom_metrics: {}
  date: 2021-12-10_04-28-09
  done: false
  episode_len_mean: 36.2
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9092000025074656
  episode_reward_min: -2.0
  episodes_this_iter: 205
  episodes_total: 136715
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0185249466449022
          entropy_coeff: 0.0
          kl: 0.013634878006996587
          policy_loss: -0.10313080278865527
          total_loss: -0.05997161942650564
          vf_explained_var: 0.8470240831375122
          vf_loss: 0.022451212455052882
    num_agent_steps_sampled: 7822432
    num_steps_sampled: 7822432
    num_steps_trained: 7822432
  iterations_since_restore: 266

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1261,55566.5,7822432,1.9092,1.9804,-2,36.2


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7830424
  custom_metrics: {}
  date: 2021-12-10_04-28-55
  done: false
  episode_len_mean: 34.10434782608696
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.9177321729452714
  episode_reward_min: -2.0
  episodes_this_iter: 230
  episodes_total: 136945
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9643475878983736
          entropy_coeff: 0.0
          kl: 0.013866362249245867
          policy_loss: -0.09435547156317625
          total_loss: -0.048561234812950715
          vf_explained_var: 0.803767204284668
          vf_loss: 0.024734698876272887
    num_agent_steps_sampled: 7830424
    num_steps_sampled: 7830424
    num_steps_trained: 7830424
  iterations_since_restore: 267

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1262,55612.5,7830424,1.91773,1.9852,-2,34.1043


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7838416
  custom_metrics: {}
  date: 2021-12-10_04-29-41
  done: false
  episode_len_mean: 42.01980198019802
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8977544495079777
  episode_reward_min: -2.0
  episodes_this_iter: 202
  episodes_total: 137147
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.012437704950571
          entropy_coeff: 0.0
          kl: 0.013816796679748222
          policy_loss: -0.10104992869310081
          total_loss: -0.053432176340720616
          vf_explained_var: 0.8187638521194458
          vf_loss: 0.026633490750100464
    num_agent_steps_sampled: 7838416
    num_steps_sampled: 7838416
    num_steps_trained: 7838416
  iterations_since_restore: 26

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1263,55658,7838416,1.89775,1.9808,-2,42.0198


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7846408
  custom_metrics: {}
  date: 2021-12-10_04-30-26
  done: false
  episode_len_mean: 34.44796380090498
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8961085998095
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 137368
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9965432099997997
          entropy_coeff: 0.0
          kl: 0.013409071048954502
          policy_loss: -0.0983884414890781
          total_loss: -0.046104491339065135
          vf_explained_var: 0.7967140674591064
          vf_loss: 0.03191892494214699
    num_agent_steps_sampled: 7846408
    num_steps_sampled: 7846408
    num_steps_trained: 7846408
  iterations_since_restore: 269

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1264,55703.8,7846408,1.89611,1.9792,-2,34.448


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7854400
  custom_metrics: {}
  date: 2021-12-10_04-31-12
  done: false
  episode_len_mean: 41.54502369668246
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9172890999870842
  episode_reward_min: 0.5676000118255615
  episodes_this_iter: 211
  episodes_total: 137579
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9480742011219263
          entropy_coeff: 0.0
          kl: 0.015092544723302126
          policy_loss: -0.10825043253134936
          total_loss: -0.06046786159276962
          vf_explained_var: 0.7814128398895264
          vf_loss: 0.024860765726771206
    num_agent_steps_sampled: 7854400
    num_steps_sampled: 7854400
    num_steps_trained: 7854400

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1265,55749.6,7854400,1.91729,1.9804,0.5676,41.545


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7862392
  custom_metrics: {}
  date: 2021-12-10_04-31-58
  done: false
  episode_len_mean: 33.0655737704918
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9342229478671902
  episode_reward_min: 1.5019999742507935
  episodes_this_iter: 244
  episodes_total: 137823
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9281271062791348
          entropy_coeff: 0.0
          kl: 0.014313348714495078
          policy_loss: -0.10241465692524798
          total_loss: -0.055671444821200566
          vf_explained_var: 0.680473268032074
          vf_loss: 0.025004814146086574
    num_agent_steps_sampled: 7862392
    num_steps_sampled: 7862392
    num_steps_trained: 7862392

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1266,55795.3,7862392,1.93422,1.9824,1.502,33.0656


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7870384
  custom_metrics: {}
  date: 2021-12-10_04-32-44
  done: false
  episode_len_mean: 34.48497854077253
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8978025749517613
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 138056
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9467075038701296
          entropy_coeff: 0.0
          kl: 0.013660106342285872
          policy_loss: -0.09138491538760718
          total_loss: -0.04478327254764736
          vf_explained_var: 0.763392448425293
          vf_loss: 0.02585535595426336
    num_agent_steps_sampled: 7870384
    num_steps_sampled: 7870384
    num_steps_trained: 7870384
  iterations_since_restore: 272


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1267,55841.2,7870384,1.8978,1.9824,-2,34.485


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7878376
  custom_metrics: {}
  date: 2021-12-10_04-33-30
  done: false
  episode_len_mean: 33.17619047619048
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8964609452656338
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 138266
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9863618407398462
          entropy_coeff: 0.0
          kl: 0.013268127018818632
          policy_loss: -0.09369739700923674
          total_loss: -0.04614194680470973
          vf_explained_var: 0.7819857001304626
          vf_loss: 0.02740448055556044
    num_agent_steps_sampled: 7878376
    num_steps_sampled: 7878376
    num_steps_trained: 7878376
  iterations_since_restore: 273


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1268,55887,7878376,1.89646,1.9832,-2,33.1762


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7886368
  custom_metrics: {}
  date: 2021-12-10_04-34-16
  done: false
  episode_len_mean: 36.14912280701754
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9109508782102351
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 138494
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9312278907746077
          entropy_coeff: 0.0
          kl: 0.013839038670994341
          policy_loss: -0.10156322954571806
          total_loss: -0.05411714600631967
          vf_explained_var: 0.7468280792236328
          vf_loss: 0.02642804241622798
    num_agent_steps_sampled: 7886368
    num_steps_sampled: 7886368
    num_steps_trained: 7886368
  iterations_since_restore: 274

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1269,55933.3,7886368,1.91095,1.9824,-2,36.1491


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7894360
  custom_metrics: {}
  date: 2021-12-10_04-35-02
  done: false
  episode_len_mean: 35.46521739130435
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8637460848559504
  episode_reward_min: -2.0
  episodes_this_iter: 230
  episodes_total: 138724
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9434353765100241
          entropy_coeff: 0.0
          kl: 0.013592218310805038
          policy_loss: -0.095175324997399
          total_loss: -0.04660573498404119
          vf_explained_var: 0.8320834040641785
          vf_loss: 0.027926409849897027
    num_agent_steps_sampled: 7894360
    num_steps_sampled: 7894360
    num_steps_trained: 7894360
  iterations_since_restore: 275


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1270,55979.1,7894360,1.86375,1.9824,-2,35.4652


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7902352
  custom_metrics: {}
  date: 2021-12-10_04-35-48
  done: false
  episode_len_mean: 33.88636363636363
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.917851515791633
  episode_reward_min: -2.0
  episodes_this_iter: 264
  episodes_total: 138988
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8877471033483744
          entropy_coeff: 0.0
          kl: 0.013543852925067768
          policy_loss: -0.09273369138827547
          total_loss: -0.046252504136646166
          vf_explained_var: 0.7505369186401367
          vf_loss: 0.02591146281338297
    num_agent_steps_sampled: 7902352
    num_steps_sampled: 7902352
    num_steps_trained: 7902352
  iterations_since_restore: 276

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1271,56025.2,7902352,1.91785,1.9824,-2,33.8864


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7910344
  custom_metrics: {}
  date: 2021-12-10_04-36-34
  done: false
  episode_len_mean: 31.85593220338983
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9035033881664276
  episode_reward_min: -2.0
  episodes_this_iter: 236
  episodes_total: 139224
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9181700311601162
          entropy_coeff: 0.0
          kl: 0.013629962923005223
          policy_loss: -0.10061336436774582
          total_loss: -0.05172030440007802
          vf_explained_var: 0.7480432987213135
          vf_loss: 0.028192551049869508
    num_agent_steps_sampled: 7910344
    num_steps_sampled: 7910344
    num_steps_trained: 7910344
  iterations_since_restore: 277

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1272,56071.2,7910344,1.9035,1.9828,-2,31.8559


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7918336
  custom_metrics: {}
  date: 2021-12-10_04-37-20
  done: false
  episode_len_mean: 35.25454545454546
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9125636333769018
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 139444
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9457017816603184
          entropy_coeff: 0.0
          kl: 0.013685702520888299
          policy_loss: -0.10010643365967553
          total_loss: -0.0516512242029421
          vf_explained_var: 0.771682858467102
          vf_loss: 0.0276700469548814
    num_agent_steps_sampled: 7918336
    num_steps_sampled: 7918336
    num_steps_trained: 7918336
  iterations_since_restore: 278

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1273,56116.7,7918336,1.91256,1.9832,-2,35.2545


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7926328
  custom_metrics: {}
  date: 2021-12-10_04-38-05
  done: false
  episode_len_mean: 37.352380952380955
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9257180934860594
  episode_reward_min: 1.6419999599456787
  episodes_this_iter: 210
  episodes_total: 139654
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9587865862995386
          entropy_coeff: 0.0
          kl: 0.015399272844661027
          policy_loss: -0.10957660927670076
          total_loss: -0.06011852709343657
          vf_explained_var: 0.6616048812866211
          vf_loss: 0.026070438732858747
    num_agent_steps_sampled: 7926328
    num_steps_sampled: 7926328
    num_steps_trained: 7926328

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1274,56162.4,7926328,1.92572,1.9832,1.642,37.3524


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7934320
  custom_metrics: {}
  date: 2021-12-10_04-38-51
  done: false
  episode_len_mean: 36.37719298245614
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9276473699954517
  episode_reward_min: 1.3167999982833862
  episodes_this_iter: 228
  episodes_total: 139882
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9396774582564831
          entropy_coeff: 0.0
          kl: 0.01528633153066039
          policy_loss: -0.1069399492116645
          total_loss: -0.06068966741440818
          vf_explained_var: 0.7475269436836243
          vf_loss: 0.023034166952129453
    num_agent_steps_sampled: 7934320
    num_steps_sampled: 7934320
    num_steps_trained: 7934320

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1275,56207.9,7934320,1.92765,1.9832,1.3168,36.3772


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7942312
  custom_metrics: {}
  date: 2021-12-10_04-39-37
  done: false
  episode_len_mean: 33.02358490566038
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9158811355536838
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 140094
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9955677259713411
          entropy_coeff: 0.0
          kl: 0.013415797700872645
          policy_loss: -0.10023534897482023
          total_loss: -0.05226262490032241
          vf_explained_var: 0.799345076084137
          vf_loss: 0.02759748144308105
    num_agent_steps_sampled: 7942312
    num_steps_sampled: 7942312
    num_steps_trained: 7942312
  iterations_since_restore: 281
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1276,56253.8,7942312,1.91588,1.9832,-2,33.0236


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7950304
  custom_metrics: {}
  date: 2021-12-10_04-40-22
  done: false
  episode_len_mean: 35.256637168141594
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8954513252308938
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 140320
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9479728937149048
          entropy_coeff: 0.0
          kl: 0.013433394255116582
          policy_loss: -0.09132555010728538
          total_loss: -0.04529095868929289
          vf_explained_var: 0.8286494612693787
          vf_loss: 0.025632629985921085
    num_agent_steps_sampled: 7950304
    num_steps_sampled: 7950304
    num_steps_trained: 7950304
  iterations_since_restore: 28

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1277,56299.3,7950304,1.89545,1.9832,-2,35.2566


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7958296
  custom_metrics: {}
  date: 2021-12-10_04-41-08
  done: false
  episode_len_mean: 41.388392857142854
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9175714257040195
  episode_reward_min: 0.0
  episodes_this_iter: 224
  episodes_total: 140544
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9437238331884146
          entropy_coeff: 0.0
          kl: 0.014632380043622106
          policy_loss: -0.10378323995973915
          total_loss: -0.0546579398906033
          vf_explained_var: 0.7152847051620483
          vf_loss: 0.026902373880147934
    num_agent_steps_sampled: 7958296
    num_steps_sampled: 7958296
    num_steps_trained: 7958296
  iterations_since_restore: 283


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1278,56345.2,7958296,1.91757,1.9832,0,41.3884


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7966288
  custom_metrics: {}
  date: 2021-12-10_04-41-54
  done: false
  episode_len_mean: 35.56585365853658
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9292585384554979
  episode_reward_min: 1.6740000247955322
  episodes_this_iter: 205
  episodes_total: 140749
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9581089951097965
          entropy_coeff: 0.0
          kl: 0.015400498319650069
          policy_loss: -0.10258450105902739
          total_loss: -0.055997039715293795
          vf_explained_var: 0.7109310626983643
          vf_loss: 0.023197955975774676
    num_agent_steps_sampled: 7966288
    num_steps_sampled: 7966288
    num_steps_trained: 7966288
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1279,56390.6,7966288,1.92926,1.9832,1.674,35.5659


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7974280
  custom_metrics: {}
  date: 2021-12-10_04-42-39
  done: false
  episode_len_mean: 38.31277533039648
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9068352419899424
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 140976
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9238745123147964
          entropy_coeff: 0.0
          kl: 0.01430917569086887
          policy_loss: -0.09963425787282176
          total_loss: -0.05591195421584416
          vf_explained_var: 0.7596158981323242
          vf_loss: 0.02199024721630849
    num_agent_steps_sampled: 7974280
    num_steps_sampled: 7974280
    num_steps_trained: 7974280
  iterations_since_restore: 285
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1280,56436.3,7974280,1.90684,1.9832,-2,38.3128


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7982272
  custom_metrics: {}
  date: 2021-12-10_04-43-25
  done: false
  episode_len_mean: 35.95260663507109
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9285004720868657
  episode_reward_min: 1.6160000562667847
  episodes_this_iter: 211
  episodes_total: 141187
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9721897542476654
          entropy_coeff: 0.0
          kl: 0.014238641655538231
          policy_loss: -0.10534159815870225
          total_loss: -0.06289659658796154
          vf_explained_var: 0.7594456672668457
          vf_loss: 0.02082006650744006
    num_agent_steps_sampled: 7982272
    num_steps_sampled: 7982272
    num_steps_trained: 7982272
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1281,56481.7,7982272,1.9285,1.9804,1.616,35.9526


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7990264
  custom_metrics: {}
  date: 2021-12-10_04-44-11
  done: false
  episode_len_mean: 38.51456310679612
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8852990275447807
  episode_reward_min: -2.0
  episodes_this_iter: 206
  episodes_total: 141393
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0055685192346573
          entropy_coeff: 0.0
          kl: 0.012915275030536577
          policy_loss: -0.10001193534117192
          total_loss: -0.052483219231362455
          vf_explained_var: 0.7755498290061951
          vf_loss: 0.027913638565223664
    num_agent_steps_sampled: 7990264
    num_steps_sampled: 7990264
    num_steps_trained: 7990264
  iterations_since_restore: 28

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1282,56527.6,7990264,1.8853,1.9832,-2,38.5146


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 7998256
  custom_metrics: {}
  date: 2021-12-10_04-44-56
  done: false
  episode_len_mean: 36.885714285714286
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9265851388658797
  episode_reward_min: 1.6740000247955322
  episodes_this_iter: 175
  episodes_total: 141568
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0617055352777243
          entropy_coeff: 0.0
          kl: 0.01472767480299808
          policy_loss: -0.11115690466249362
          total_loss: -0.06495787692256272
          vf_explained_var: 0.7923077940940857
          vf_loss: 0.023831375234294683
    num_agent_steps_sampled: 7998256
    num_steps_sampled: 7998256
    num_steps_trained: 7998256
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1283,56573.1,7998256,1.92659,1.9796,1.674,36.8857


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8006248
  custom_metrics: {}
  date: 2021-12-10_04-45-42
  done: false
  episode_len_mean: 39.83
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.885917997956276
  episode_reward_min: -2.0
  episodes_this_iter: 200
  episodes_total: 141768
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0361403618007898
          entropy_coeff: 0.0
          kl: 0.013454004714731127
          policy_loss: -0.1039315132657066
          total_loss: -0.056766619672998786
          vf_explained_var: 0.8223803043365479
          vf_loss: 0.026731624850071967
    num_agent_steps_sampled: 8006248
    num_steps_sampled: 8006248
    num_steps_trained: 8006248
  iterations_since_restore: 289
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1284,56618.6,8006248,1.88592,1.9796,-2,39.83


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8014240
  custom_metrics: {}
  date: 2021-12-10_04-46-27
  done: false
  episode_len_mean: 49.77777777777778
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8817396823060575
  episode_reward_min: -2.0
  episodes_this_iter: 189
  episodes_total: 141957
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0112347416579723
          entropy_coeff: 0.0
          kl: 0.01421868591569364
          policy_loss: -0.10772667088895105
          total_loss: -0.056941468588775024
          vf_explained_var: 0.7781215310096741
          vf_loss: 0.02919057389954105
    num_agent_steps_sampled: 8014240
    num_steps_sampled: 8014240
    num_steps_trained: 8014240
  iterations_since_restore: 290


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1285,56664.2,8014240,1.88174,1.9796,-2,49.7778


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8022232
  custom_metrics: {}
  date: 2021-12-10_04-47-13
  done: false
  episode_len_mean: 35.683982683982684
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.912533332259108
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 142188
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9271581787616014
          entropy_coeff: 0.0
          kl: 0.013885128166293725
          policy_loss: -0.10221567674307153
          total_loss: -0.05553118186071515
          vf_explained_var: 0.7972235083580017
          vf_loss: 0.025596459221560508
    num_agent_steps_sampled: 8022232
    num_steps_sampled: 8022232
    num_steps_trained: 8022232
  iterations_since_restore: 291

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1286,56709.7,8022232,1.91253,1.9796,-2,35.684


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8030224
  custom_metrics: {}
  date: 2021-12-10_04-47-59
  done: false
  episode_len_mean: 34.80373831775701
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9307401904435915
  episode_reward_min: 1.4531999826431274
  episodes_this_iter: 214
  episodes_total: 142402
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.975272998213768
          entropy_coeff: 0.0
          kl: 0.014942780166165903
          policy_loss: -0.10768111568177119
          total_loss: -0.060062485725211445
          vf_explained_var: 0.7627509832382202
          vf_loss: 0.024924284080043435
    num_agent_steps_sampled: 8030224
    num_steps_sampled: 8030224
    num_steps_trained: 8030224
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1287,56755.6,8030224,1.93074,1.9812,1.4532,34.8037


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8038216
  custom_metrics: {}
  date: 2021-12-10_04-48-44
  done: false
  episode_len_mean: 40.96296296296296
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.918455554931252
  episode_reward_min: 0.0
  episodes_this_iter: 216
  episodes_total: 142618
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9857791755348444
          entropy_coeff: 0.0
          kl: 0.014098722022026777
          policy_loss: -0.10701724945101887
          total_loss: -0.06131332213408314
          vf_explained_var: 0.802108645439148
          vf_loss: 0.024291495617944747
    num_agent_steps_sampled: 8038216
    num_steps_sampled: 8038216
    num_steps_trained: 8038216
  iterations_since_restore: 293
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1288,56801.2,8038216,1.91846,1.9804,0,40.963


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8046208
  custom_metrics: {}
  date: 2021-12-10_04-49-31
  done: false
  episode_len_mean: 31.983606557377048
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9205131139911589
  episode_reward_min: -2.0
  episodes_this_iter: 244
  episodes_total: 142862
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9530804194509983
          entropy_coeff: 0.0
          kl: 0.014299585833214223
          policy_loss: -0.10407293873868184
          total_loss: -0.0615026462910464
          vf_explained_var: 0.780316174030304
          vf_loss: 0.020852800749707967
    num_agent_steps_sampled: 8046208
    num_steps_sampled: 8046208
    num_steps_trained: 8046208
  iterations_since_restore: 294

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1289,56847.4,8046208,1.92051,1.98,-2,31.9836


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8054200
  custom_metrics: {}
  date: 2021-12-10_04-50-16
  done: false
  episode_len_mean: 34.51931330472103
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9313424902412513
  episode_reward_min: 1.0127999782562256
  episodes_this_iter: 233
  episodes_total: 143095
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9240245558321476
          entropy_coeff: 0.0
          kl: 0.015081619116244838
          policy_loss: -0.11206691738334484
          total_loss: -0.06711371178971604
          vf_explained_var: 0.6893072128295898
          vf_loss: 0.022048000479117036
    num_agent_steps_sampled: 8054200
    num_steps_sampled: 8054200
    num_steps_trained: 8054200
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1290,56893,8054200,1.93134,1.9824,1.0128,34.5193


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8062192
  custom_metrics: {}
  date: 2021-12-10_04-51-03
  done: false
  episode_len_mean: 31.92828685258964
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9212366537268892
  episode_reward_min: -2.0
  episodes_this_iter: 251
  episodes_total: 143346
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9081922490149736
          entropy_coeff: 0.0
          kl: 0.01403402563300915
          policy_loss: -0.09722885035444051
          total_loss: -0.053986189974239096
          vf_explained_var: 0.7402232885360718
          vf_loss: 0.021928481000941247
    num_agent_steps_sampled: 8062192
    num_steps_sampled: 8062192
    num_steps_trained: 8062192
  iterations_since_restore: 296

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1291,56939.5,8062192,1.92124,1.9832,-2,31.9283


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8070184
  custom_metrics: {}
  date: 2021-12-10_04-51-49
  done: false
  episode_len_mean: 32.90295358649789
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9345991566211362
  episode_reward_min: 1.5715999603271484
  episodes_this_iter: 237
  episodes_total: 143583
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9578137118369341
          entropy_coeff: 0.0
          kl: 0.015039777877973393
          policy_loss: -0.11062812386080623
          total_loss: -0.06786084669874981
          vf_explained_var: 0.7378591895103455
          vf_loss: 0.01992561630322598
    num_agent_steps_sampled: 8070184
    num_steps_sampled: 8070184
    num_steps_trained: 8070184
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1292,56985.1,8070184,1.9346,1.9832,1.5716,32.903


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8078176
  custom_metrics: {}
  date: 2021-12-10_04-52-34
  done: false
  episode_len_mean: 35.013953488372096
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9304316265638484
  episode_reward_min: 1.0795999765396118
  episodes_this_iter: 215
  episodes_total: 143798
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0062096510082483
          entropy_coeff: 0.0
          kl: 0.015087722626049072
          policy_loss: -0.11079839646117762
          total_loss: -0.0667861719703069
          vf_explained_var: 0.7567068338394165
          vf_loss: 0.021097744000144303
    num_agent_steps_sampled: 8078176
    num_steps_sampled: 8078176
    num_steps_trained: 8078176
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1293,57030.9,8078176,1.93043,1.9832,1.0796,35.014


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8086168
  custom_metrics: {}
  date: 2021-12-10_04-53-20
  done: false
  episode_len_mean: 33.50909090909091
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8805363633415915
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 144018
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0110735930502415
          entropy_coeff: 0.0
          kl: 0.012723788793664426
          policy_loss: -0.09313992032548413
          total_loss: -0.05234208528418094
          vf_explained_var: 0.8683192133903503
          vf_loss: 0.021473581669852138
    num_agent_steps_sampled: 8086168
    num_steps_sampled: 8086168
    num_steps_trained: 8086168
  iterations_since_restore: 299

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1294,57076.2,8086168,1.88054,1.9832,-2,33.5091


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8094160
  custom_metrics: {}
  date: 2021-12-10_04-54-05
  done: false
  episode_len_mean: 34.46046511627907
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9315218592798986
  episode_reward_min: 1.5140000581741333
  episodes_this_iter: 215
  episodes_total: 144233
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0165956038981676
          entropy_coeff: 0.0
          kl: 0.014429373841267079
          policy_loss: -0.10390970636217389
          total_loss: -0.05710019536491018
          vf_explained_var: 0.8108877539634705
          vf_loss: 0.024894896137993783
    num_agent_steps_sampled: 8094160
    num_steps_sampled: 8094160
    num_steps_trained: 8094160
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1295,57121.9,8094160,1.93152,1.9832,1.514,34.4605


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8102152
  custom_metrics: {}
  date: 2021-12-10_04-54-51
  done: false
  episode_len_mean: 44.10313901345292
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8610349773291515
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 144456
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9964669644832611
          entropy_coeff: 0.0
          kl: 0.013704662735108286
          policy_loss: -0.0984488720423542
          total_loss: -0.04835858001024462
          vf_explained_var: 0.8426565527915955
          vf_loss: 0.02927633677609265
    num_agent_steps_sampled: 8102152
    num_steps_sampled: 8102152
    num_steps_trained: 8102152
  iterations_since_restore: 301
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1296,57167.9,8102152,1.86103,1.9832,-2,44.1031


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8110144
  custom_metrics: {}
  date: 2021-12-10_04-55-37
  done: false
  episode_len_mean: 34.11864406779661
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9321610164844383
  episode_reward_min: 1.4515999555587769
  episodes_this_iter: 236
  episodes_total: 144692
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9700670707970858
          entropy_coeff: 0.0
          kl: 0.01418707548873499
          policy_loss: -0.1063235740584787
          total_loss: -0.05810915233450942
          vf_explained_var: 0.7899086475372314
          vf_loss: 0.026667801954317838
    num_agent_steps_sampled: 8110144
    num_steps_sampled: 8110144
    num_steps_trained: 8110144
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1297,57213.6,8110144,1.93216,1.9832,1.4516,34.1186


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8118136
  custom_metrics: {}
  date: 2021-12-10_04-56-23
  done: false
  episode_len_mean: 34.526315789473685
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8810947391024806
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 144920
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9610383044928312
          entropy_coeff: 0.0
          kl: 0.012594886415172368
          policy_loss: -0.09674386982806027
          total_loss: -0.05020535035873763
          vf_explained_var: 0.8142386674880981
          vf_loss: 0.02741003892151639
    num_agent_steps_sampled: 8118136
    num_steps_sampled: 8118136
    num_steps_trained: 8118136
  iterations_since_restore: 30

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1298,57259.5,8118136,1.88109,1.9824,-2,34.5263


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8126128
  custom_metrics: {}
  date: 2021-12-10_04-57-09
  done: false
  episode_len_mean: 33.116
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9341984038352966
  episode_reward_min: 0.9927999973297119
  episodes_this_iter: 250
  episodes_total: 145170
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.903446638956666
          entropy_coeff: 0.0
          kl: 0.012934181373566389
          policy_loss: -0.08888909814413637
          total_loss: -0.04463672284327913
          vf_explained_var: 0.7821535468101501
          vf_loss: 0.024608587962575257
    num_agent_steps_sampled: 8126128
    num_steps_sampled: 8126128
    num_steps_trained: 8126128
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1299,57305.4,8126128,1.9342,1.9824,0.9928,33.116


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8134120
  custom_metrics: {}
  date: 2021-12-10_04-57-55
  done: false
  episode_len_mean: 33.03347280334728
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9189623419709785
  episode_reward_min: -2.0
  episodes_this_iter: 239
  episodes_total: 145409
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.915524560958147
          entropy_coeff: 0.0
          kl: 0.01319588252226822
          policy_loss: -0.09396154727437533
          total_loss: -0.04719160543754697
          vf_explained_var: 0.787299394607544
          vf_loss: 0.026728696131613106
    num_agent_steps_sampled: 8134120
    num_steps_sampled: 8134120
    num_steps_trained: 8134120
  iterations_since_restore: 305
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1300,57351.3,8134120,1.91896,1.9832,-2,33.0335


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8142112
  custom_metrics: {}
  date: 2021-12-10_04-58-41
  done: false
  episode_len_mean: 32.043137254901964
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9065145127913532
  episode_reward_min: -2.0
  episodes_this_iter: 255
  episodes_total: 145664
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8776237387210131
          entropy_coeff: 0.0
          kl: 0.013104610639857128
          policy_loss: -0.08826939633581787
          total_loss: -0.037037241796497256
          vf_explained_var: 0.7801068425178528
          vf_loss: 0.03132952796295285
    num_agent_steps_sampled: 8142112
    num_steps_sampled: 8142112
    num_steps_trained: 8142112
  iterations_since_restore: 30

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1301,57397.3,8142112,1.90651,1.9816,-2,32.0431


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8150104
  custom_metrics: {}
  date: 2021-12-10_04-59-27
  done: false
  episode_len_mean: 31.57551020408163
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9212604075062032
  episode_reward_min: -2.0
  episodes_this_iter: 245
  episodes_total: 145909
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9252288322895765
          entropy_coeff: 0.0
          kl: 0.013700469047762454
          policy_loss: -0.0948072184692137
          total_loss: -0.04938230910920538
          vf_explained_var: 0.746191680431366
          vf_loss: 0.024617326620500535
    num_agent_steps_sampled: 8150104
    num_steps_sampled: 8150104
    num_steps_trained: 8150104
  iterations_since_restore: 307


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1302,57443.4,8150104,1.92126,1.9824,-2,31.5755


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8158096
  custom_metrics: {}
  date: 2021-12-10_05-00-14
  done: false
  episode_len_mean: 33.0655737704918
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9196114750182043
  episode_reward_min: -2.0
  episodes_this_iter: 244
  episodes_total: 146153
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9311752337962389
          entropy_coeff: 0.0
          kl: 0.013063532504020259
          policy_loss: -0.09551699373696465
          total_loss: -0.05135280707327183
          vf_explained_var: 0.8454964756965637
          vf_loss: 0.02432394598145038
    num_agent_steps_sampled: 8158096
    num_steps_sampled: 8158096
    num_steps_trained: 8158096
  iterations_since_restore: 308


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1303,57489.8,8158096,1.91961,1.982,-2,33.0656


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8166088
  custom_metrics: {}
  date: 2021-12-10_05-01-00
  done: false
  episode_len_mean: 33.91914893617021
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.916117447995125
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 146388
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8955388106405735
          entropy_coeff: 0.0
          kl: 0.013858064514352009
          policy_loss: -0.0952016404189635
          total_loss: -0.05199358279060107
          vf_explained_var: 0.7683655023574829
          vf_loss: 0.02216112188762054
    num_agent_steps_sampled: 8166088
    num_steps_sampled: 8166088
    num_steps_trained: 8166088
  iterations_since_restore: 309
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1304,57535.9,8166088,1.91612,1.982,-2,33.9191


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8174080
  custom_metrics: {}
  date: 2021-12-10_05-01-46
  done: false
  episode_len_mean: 36.133971291866025
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9281377986287387
  episode_reward_min: 1.5471999645233154
  episodes_this_iter: 209
  episodes_total: 146597
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.961011616513133
          entropy_coeff: 0.0
          kl: 0.014953142614103854
          policy_loss: -0.10945593856740743
          total_loss: -0.06656390262651257
          vf_explained_var: 0.7754313945770264
          vf_loss: 0.020181956002488732
    num_agent_steps_sampled: 8174080
    num_steps_sampled: 8174080
    num_steps_trained: 8174080
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1305,57581.7,8174080,1.92814,1.9824,1.5472,36.134


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8182072
  custom_metrics: {}
  date: 2021-12-10_05-02-32
  done: false
  episode_len_mean: 33.796380090497735
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.897163797827328
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 146818
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9677044153213501
          entropy_coeff: 0.0
          kl: 0.013694488879991695
          policy_loss: -0.09835217028739862
          total_loss: -0.05075774040597025
          vf_explained_var: 0.7709954977035522
          vf_loss: 0.0267959245829843
    num_agent_steps_sampled: 8182072
    num_steps_sampled: 8182072
    num_steps_trained: 8182072
  iterations_since_restore: 311


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1306,57627.8,8182072,1.89716,1.9804,-2,33.7964


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8190064
  custom_metrics: {}
  date: 2021-12-10_05-03-18
  done: false
  episode_len_mean: 34.64601769911504
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9310265473559893
  episode_reward_min: 1.3688000440597534
  episodes_this_iter: 226
  episodes_total: 147044
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.967609666287899
          entropy_coeff: 0.0
          kl: 0.014740642334800214
          policy_loss: -0.10923631832702085
          total_loss: -0.06078824118594639
          vf_explained_var: 0.7305842638015747
          vf_loss: 0.026060729986056685
    num_agent_steps_sampled: 8190064
    num_steps_sampled: 8190064
    num_steps_trained: 8190064
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1307,57673.7,8190064,1.93103,1.9832,1.3688,34.646


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8198056
  custom_metrics: {}
  date: 2021-12-10_05-04-04
  done: false
  episode_len_mean: 38.37391304347826
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9236278248869854
  episode_reward_min: 0.0
  episodes_this_iter: 230
  episodes_total: 147274
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9828774910420179
          entropy_coeff: 0.0
          kl: 0.015027313929749653
          policy_loss: -0.11000373697606847
          total_loss: -0.06245314780971967
          vf_explained_var: 0.7675098776817322
          vf_loss: 0.024727856914978474
    num_agent_steps_sampled: 8198056
    num_steps_sampled: 8198056
    num_steps_trained: 8198056
  iterations_since_restore: 313

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1308,57719.6,8198056,1.92363,1.982,0,38.3739


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8206048
  custom_metrics: {}
  date: 2021-12-10_05-04-50
  done: false
  episode_len_mean: 34.24568965517241
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9156224111030842
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 147506
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9522627238184214
          entropy_coeff: 0.0
          kl: 0.012994052696740255
          policy_loss: -0.0925492896058131
          total_loss: -0.05041496828198433
          vf_explained_var: 0.8170066475868225
          vf_loss: 0.022399606241378933
    num_agent_steps_sampled: 8206048
    num_steps_sampled: 8206048
    num_steps_trained: 8206048
  iterations_since_restore: 314

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1309,57765.7,8206048,1.91562,1.982,-2,34.2457


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8214040
  custom_metrics: {}
  date: 2021-12-10_05-05-36
  done: false
  episode_len_mean: 32.516666666666666
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.935294997692108
  episode_reward_min: 1.7491999864578247
  episodes_this_iter: 240
  episodes_total: 147746
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9245859161019325
          entropy_coeff: 0.0
          kl: 0.014489071880234405
          policy_loss: -0.10676642609178089
          total_loss: -0.06426523847039789
          vf_explained_var: 0.7365790009498596
          vf_loss: 0.020495910954196006
    num_agent_steps_sampled: 8214040
    num_steps_sampled: 8214040
    num_steps_trained: 8214040
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1310,57812.1,8214040,1.93529,1.982,1.7492,32.5167


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8222032
  custom_metrics: {}
  date: 2021-12-10_05-06-22
  done: false
  episode_len_mean: 31.92920353982301
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9197238926338938
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 147972
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9566886462271214
          entropy_coeff: 0.0
          kl: 0.015302838961360976
          policy_loss: -0.09867386636324227
          total_loss: -0.052196187316440046
          vf_explained_var: 0.773353099822998
          vf_loss: 0.023236490436829627
    num_agent_steps_sampled: 8222032
    num_steps_sampled: 8222032
    num_steps_trained: 8222032
  iterations_since_restore: 31

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1311,57857.9,8222032,1.91972,1.982,-2,31.9292


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8230024
  custom_metrics: {}
  date: 2021-12-10_05-07-08
  done: false
  episode_len_mean: 38.167381974248926
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9076171606907006
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 148205
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9501216728240252
          entropy_coeff: 0.0
          kl: 0.014257354021538049
          policy_loss: -0.10125437960959971
          total_loss: -0.05379888872266747
          vf_explained_var: 0.7104861736297607
          vf_loss: 0.025802134245168418
    num_agent_steps_sampled: 8230024
    num_steps_sampled: 8230024
    num_steps_trained: 8230024
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1312,57904,8230024,1.90762,1.982,-2,38.1674


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8238016
  custom_metrics: {}
  date: 2021-12-10_05-07-54
  done: false
  episode_len_mean: 34.79185520361991
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8793248865938834
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 148426
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9580954257398844
          entropy_coeff: 0.0
          kl: 0.012322771537583321
          policy_loss: -0.09184343114611693
          total_loss: -0.04248206765623763
          vf_explained_var: 0.7731069326400757
          vf_loss: 0.030646154249552637
    num_agent_steps_sampled: 8238016
    num_steps_sampled: 8238016
    num_steps_trained: 8238016
  iterations_since_restore: 31

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1313,57949.9,8238016,1.87932,1.982,-2,34.7919


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8246008
  custom_metrics: {}
  date: 2021-12-10_05-08-40
  done: false
  episode_len_mean: 36.137614678899084
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.874935777909165
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 148644
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.949946315959096
          entropy_coeff: 0.0
          kl: 0.013199440320022404
          policy_loss: -0.09215645380027127
          total_loss: -0.0449340661871247
          vf_explained_var: 0.8123936653137207
          vf_loss: 0.027175736235221848
    num_agent_steps_sampled: 8246008
    num_steps_sampled: 8246008
    num_steps_trained: 8246008
  iterations_since_restore: 319


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1314,57995.9,8246008,1.87494,1.982,-2,36.1376


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8254000
  custom_metrics: {}
  date: 2021-12-10_05-09-26
  done: false
  episode_len_mean: 37.26146788990825
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8910128474235535
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 148862
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9894370213150978
          entropy_coeff: 0.0
          kl: 0.012930696655530483
          policy_loss: -0.09719728844356723
          total_loss: -0.04049838037462905
          vf_explained_var: 0.8127733469009399
          vf_loss: 0.03706041403347626
    num_agent_steps_sampled: 8254000
    num_steps_sampled: 8254000
    num_steps_trained: 8254000
  iterations_since_restore: 320

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1315,58041.6,8254000,1.89101,1.9836,-2,37.2615


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8261992
  custom_metrics: {}
  date: 2021-12-10_05-10-12
  done: false
  episode_len_mean: 32.76036866359447
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9172479265845865
  episode_reward_min: -2.0
  episodes_this_iter: 217
  episodes_total: 149079
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0043901335448027
          entropy_coeff: 0.0
          kl: 0.014069679426029325
          policy_loss: -0.1058724659960717
          total_loss: -0.05861047681537457
          vf_explained_var: 0.8317502737045288
          vf_loss: 0.02589366538450122
    num_agent_steps_sampled: 8261992
    num_steps_sampled: 8261992
    num_steps_trained: 8261992
  iterations_since_restore: 321


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1316,58087.4,8261992,1.91725,1.984,-2,32.7604


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8269984
  custom_metrics: {}
  date: 2021-12-10_05-10-58
  done: false
  episode_len_mean: 39.13989637305699
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.922149222131838
  episode_reward_min: 1.19760000705719
  episodes_this_iter: 193
  episodes_total: 149272
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0207855273038149
          entropy_coeff: 0.0
          kl: 0.01460951881017536
          policy_loss: -0.10908707225462422
          total_loss: -0.06366295661428012
          vf_explained_var: 0.8378711342811584
          vf_loss: 0.023235909116920084
    num_agent_steps_sampled: 8269984
    num_steps_sampled: 8269984
    num_steps_trained: 8269984
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1317,58133.4,8269984,1.92215,1.9836,1.1976,39.1399


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8277976
  custom_metrics: {}
  date: 2021-12-10_05-11-43
  done: false
  episode_len_mean: 41.541463414634144
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9172682930783527
  episode_reward_min: 0.0
  episodes_this_iter: 205
  episodes_total: 149477
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9818626549094915
          entropy_coeff: 0.0
          kl: 0.014641093031968921
          policy_loss: -0.10804310123785399
          total_loss: -0.058805754422792234
          vf_explained_var: 0.793208122253418
          vf_loss: 0.027001187263522297
    num_agent_steps_sampled: 8277976
    num_steps_sampled: 8277976
    num_steps_trained: 8277976
  iterations_since_restore: 32

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1318,58179.2,8277976,1.91727,1.984,0,41.5415


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8285968
  custom_metrics: {}
  date: 2021-12-10_05-12-30
  done: false
  episode_len_mean: 40.93991416309013
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9023124498870752
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 149710
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9323125686496496
          entropy_coeff: 0.0
          kl: 0.014005987992277369
          policy_loss: -0.10774215729907155
          total_loss: -0.06025328200485092
          vf_explained_var: 0.7709207534790039
          vf_loss: 0.02621728030499071
    num_agent_steps_sampled: 8285968
    num_steps_sampled: 8285968
    num_steps_trained: 8285968
  iterations_since_restore: 324

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1319,58225.3,8285968,1.90231,1.984,-2,40.9399


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8293960
  custom_metrics: {}
  date: 2021-12-10_05-13-16
  done: false
  episode_len_mean: 33.58227848101266
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.917321520515635
  episode_reward_min: -2.0
  episodes_this_iter: 237
  episodes_total: 149947
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9253343418240547
          entropy_coeff: 0.0
          kl: 0.014144857530482113
          policy_loss: -0.0980254384339787
          total_loss: -0.053549751683021896
          vf_explained_var: 0.8252220153808594
          vf_loss: 0.022993184393271804
    num_agent_steps_sampled: 8293960
    num_steps_sampled: 8293960
    num_steps_trained: 8293960
  iterations_since_restore: 325

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1320,58271.2,8293960,1.91732,1.984,-2,33.5823


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8301952
  custom_metrics: {}
  date: 2021-12-10_05-14-02
  done: false
  episode_len_mean: 33.315555555555555
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9162168878979153
  episode_reward_min: -2.0
  episodes_this_iter: 225
  episodes_total: 150172
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9820381831377745
          entropy_coeff: 0.0
          kl: 0.013752714789006859
          policy_loss: -0.1031631546211429
          total_loss: -0.05839918731362559
          vf_explained_var: 0.7695586681365967
          vf_loss: 0.023877033148892224
    num_agent_steps_sampled: 8301952
    num_steps_sampled: 8301952
    num_steps_trained: 8301952
  iterations_since_restore: 32

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1321,58317.2,8301952,1.91622,1.984,-2,33.3156


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8309944
  custom_metrics: {}
  date: 2021-12-10_05-14-47
  done: false
  episode_len_mean: 34.053333333333335
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9323271073235406
  episode_reward_min: 1.4579999446868896
  episodes_this_iter: 225
  episodes_total: 150397
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9584308229386806
          entropy_coeff: 0.0
          kl: 0.014748259971383959
          policy_loss: -0.10204837331548333
          total_loss: -0.05669426781241782
          vf_explained_var: 0.7824087738990784
          vf_loss: 0.022955191088840365
    num_agent_steps_sampled: 8309944
    num_steps_sampled: 8309944
    num_steps_trained: 8309944
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1322,58362.8,8309944,1.93233,1.984,1.458,34.0533


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8317936
  custom_metrics: {}
  date: 2021-12-10_05-15-34
  done: false
  episode_len_mean: 33.72727272727273
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.876926319450853
  episode_reward_min: -2.0
  episodes_this_iter: 209
  episodes_total: 150606
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.022235257551074
          entropy_coeff: 0.0
          kl: 0.012822435761336237
          policy_loss: -0.09018751833355054
          total_loss: -0.04388259675761219
          vf_explained_var: 0.8290983438491821
          vf_loss: 0.026830847142264247
    num_agent_steps_sampled: 8317936
    num_steps_sampled: 8317936
    num_steps_trained: 8317936
  iterations_since_restore: 328


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1323,58409.1,8317936,1.87693,1.9836,-2,33.7273


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8325928
  custom_metrics: {}
  date: 2021-12-10_05-16-19
  done: false
  episode_len_mean: 42.546728971962615
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8858336416360373
  episode_reward_min: -2.0
  episodes_this_iter: 214
  episodes_total: 150820
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9447567835450172
          entropy_coeff: 0.0
          kl: 0.012947474955581129
          policy_loss: -0.09627360937884077
          total_loss: -0.04174774121202063
          vf_explained_var: 0.7590975165367126
          vf_loss: 0.03486189286923036
    num_agent_steps_sampled: 8325928
    num_steps_sampled: 8325928
    num_steps_trained: 8325928
  iterations_since_restore: 32

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1324,58454.8,8325928,1.88583,1.9836,-2,42.5467


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8333920
  custom_metrics: {}
  date: 2021-12-10_05-17-05
  done: false
  episode_len_mean: 32.51282051282051
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9353128202960022
  episode_reward_min: 1.5540000200271606
  episodes_this_iter: 234
  episodes_total: 151054
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.94381994754076
          entropy_coeff: 0.0
          kl: 0.015238800930092111
          policy_loss: -0.10682162060402334
          total_loss: -0.05929337759152986
          vf_explained_var: 0.7641230821609497
          vf_loss: 0.024384318618103862
    num_agent_steps_sampled: 8333920
    num_steps_sampled: 8333920
    num_steps_trained: 8333920
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1325,58500.7,8333920,1.93531,1.9836,1.554,32.5128


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8341912
  custom_metrics: {}
  date: 2021-12-10_05-17-51
  done: false
  episode_len_mean: 39.874439461883405
  episode_media: {}
  episode_reward_max: 1.9764000177383423
  episode_reward_mean: 1.8680466376077969
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 151277
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9515177756547928
          entropy_coeff: 0.0
          kl: 0.012924078619107604
          policy_loss: -0.09025445179577218
          total_loss: -0.03791062780874199
          vf_explained_var: 0.8051899075508118
          vf_loss: 0.03271538013359532
    num_agent_steps_sampled: 8341912
    num_steps_sampled: 8341912
    num_steps_trained: 8341912
  iterations_since_restore: 33

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1326,58546.5,8341912,1.86805,1.9764,-2,39.8744


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8349904
  custom_metrics: {}
  date: 2021-12-10_05-18-37
  done: false
  episode_len_mean: 35.27014218009479
  episode_media: {}
  episode_reward_max: 1.9764000177383423
  episode_reward_mean: 1.9299222750686356
  episode_reward_min: 1.430400013923645
  episodes_this_iter: 211
  episodes_total: 151488
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9953041598200798
          entropy_coeff: 0.0
          kl: 0.01404000900220126
          policy_loss: -0.10665426042396575
          total_loss: -0.06093343828979414
          vf_explained_var: 0.8273361921310425
          vf_loss: 0.024397559347562492
    num_agent_steps_sampled: 8349904
    num_steps_sampled: 8349904
    num_steps_trained: 8349904
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1327,58592.2,8349904,1.92992,1.9764,1.4304,35.2701


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8357896
  custom_metrics: {}
  date: 2021-12-10_05-19-22
  done: false
  episode_len_mean: 35.357798165137616
  episode_media: {}
  episode_reward_max: 1.9764000177383423
  episode_reward_mean: 1.9297266050216255
  episode_reward_min: 1.1496000289916992
  episodes_this_iter: 218
  episodes_total: 151706
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9851634446531534
          entropy_coeff: 0.0
          kl: 0.014537759619997814
          policy_loss: -0.10465917366673239
          total_loss: -0.059196079382672906
          vf_explained_var: 0.8238678574562073
          vf_loss: 0.02338387304916978
    num_agent_steps_sampled: 8357896
    num_steps_sampled: 8357896
    num_steps_trained: 8357896
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1328,58637.9,8357896,1.92973,1.9764,1.1496,35.3578


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8365888
  custom_metrics: {}
  date: 2021-12-10_05-20-08
  done: false
  episode_len_mean: 39.153153153153156
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8870198170881014
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 151928
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9550144225358963
          entropy_coeff: 0.0
          kl: 0.014132092968793586
          policy_loss: -0.10304047429235652
          total_loss: -0.05416466778842732
          vf_explained_var: 0.7880445718765259
          vf_loss: 0.027412691270001233
    num_agent_steps_sampled: 8365888
    num_steps_sampled: 8365888
    num_steps_trained: 8365888
  iterations_since_restore: 33

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1329,58683.7,8365888,1.88702,1.9816,-2,39.1532


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8373880
  custom_metrics: {}
  date: 2021-12-10_05-20-55
  done: false
  episode_len_mean: 34.44444444444444
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9314879979027642
  episode_reward_min: 1.409600019454956
  episodes_this_iter: 225
  episodes_total: 152153
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9426825176924467
          entropy_coeff: 0.0
          kl: 0.014871170482365415
          policy_loss: -0.10733574378537014
          total_loss: -0.06132228561909869
          vf_explained_var: 0.7668863534927368
          vf_loss: 0.023427872161846608
    num_agent_steps_sampled: 8373880
    num_steps_sampled: 8373880
    num_steps_trained: 8373880
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1330,58730.1,8373880,1.93149,1.9796,1.4096,34.4444


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8381872
  custom_metrics: {}
  date: 2021-12-10_05-21-40
  done: false
  episode_len_mean: 32.329166666666666
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.903176668783029
  episode_reward_min: -2.0
  episodes_this_iter: 240
  episodes_total: 152393
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9468379933387041
          entropy_coeff: 0.0
          kl: 0.012892455590190366
          policy_loss: -0.0944655243656598
          total_loss: -0.04606354795396328
          vf_explained_var: 0.7965874671936035
          vf_loss: 0.02882155915722251
    num_agent_steps_sampled: 8381872
    num_steps_sampled: 8381872
    num_steps_trained: 8381872
  iterations_since_restore: 336

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1331,58775.7,8381872,1.90318,1.9816,-2,32.3292


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8389864
  custom_metrics: {}
  date: 2021-12-10_05-22-26
  done: false
  episode_len_mean: 38.30882352941177
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.904660786483802
  episode_reward_min: -2.0
  episodes_this_iter: 204
  episodes_total: 152597
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9720952045172453
          entropy_coeff: 0.0
          kl: 0.014100730506470427
          policy_loss: -0.0992412706837058
          total_loss: -0.048429240414407104
          vf_explained_var: 0.7922953963279724
          vf_loss: 0.029396547586657107
    num_agent_steps_sampled: 8389864
    num_steps_sampled: 8389864
    num_steps_trained: 8389864
  iterations_since_restore: 337

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1332,58821.4,8389864,1.90466,1.9812,-2,38.3088


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8397856
  custom_metrics: {}
  date: 2021-12-10_05-23-12
  done: false
  episode_len_mean: 35.53303964757709
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.929318060433812
  episode_reward_min: 1.205199956893921
  episodes_this_iter: 227
  episodes_total: 152824
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9215680118650198
          entropy_coeff: 0.0
          kl: 0.01410575647605583
          policy_loss: -0.1003832578426227
          total_loss: -0.05413802250404842
          vf_explained_var: 0.7690562605857849
          vf_loss: 0.024822119914460927
    num_agent_steps_sampled: 8397856
    num_steps_sampled: 8397856
    num_steps_trained: 8397856
  iterations_since_re

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1333,58867.1,8397856,1.92932,1.9816,1.2052,35.533


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8405848
  custom_metrics: {}
  date: 2021-12-10_05-23-58
  done: false
  episode_len_mean: 35.42672413793103
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9295051709331315
  episode_reward_min: 0.0
  episodes_this_iter: 232
  episodes_total: 153056
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9664865024387836
          entropy_coeff: 0.0
          kl: 0.01424150649108924
          policy_loss: -0.10113406466552988
          total_loss: -0.05593558777036378
          vf_explained_var: 0.8179856538772583
          vf_loss: 0.023569192038848996
    num_agent_steps_sampled: 8405848
    num_steps_sampled: 8405848
    num_steps_trained: 8405848
  iterations_since_restore: 339


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1334,58913,8405848,1.92951,1.9812,0,35.4267


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8413840
  custom_metrics: {}
  date: 2021-12-10_05-24-44
  done: false
  episode_len_mean: 38.04265402843602
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.905639806629922
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 153267
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9875553455203772
          entropy_coeff: 0.0
          kl: 0.013797071733279154
          policy_loss: -0.09629331511678174
          total_loss: -0.045185916242189705
          vf_explained_var: 0.7601468563079834
          vf_loss: 0.030153094965498894
    num_agent_steps_sampled: 8413840
    num_steps_sampled: 8413840
    num_steps_trained: 8413840
  iterations_since_restore: 34

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1335,58958.8,8413840,1.90564,1.9812,-2,38.0427


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8421832
  custom_metrics: {}
  date: 2021-12-10_05-25-30
  done: false
  episode_len_mean: 35.08196721311475
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8868885206394508
  episode_reward_min: -2.0
  episodes_this_iter: 244
  episodes_total: 153511
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9054506439715624
          entropy_coeff: 0.0
          kl: 0.012500082841143012
          policy_loss: -0.09434839320601895
          total_loss: -0.05140082008438185
          vf_explained_var: 0.8329372406005859
          vf_loss: 0.02396307233721018
    num_agent_steps_sampled: 8421832
    num_steps_sampled: 8421832
    num_steps_trained: 8421832
  iterations_since_restore: 341

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1336,59004.9,8421832,1.88689,1.9812,-2,35.082


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8429824
  custom_metrics: {}
  date: 2021-12-10_05-26-16
  done: false
  episode_len_mean: 36.740196078431374
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.871431373498019
  episode_reward_min: -2.0
  episodes_this_iter: 204
  episodes_total: 153715
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9769354816526175
          entropy_coeff: 0.0
          kl: 0.012743192230118439
          policy_loss: -0.0957679917482892
          total_loss: -0.045744297764031217
          vf_explained_var: 0.8399062752723694
          vf_loss: 0.030669973173644394
    num_agent_steps_sampled: 8429824
    num_steps_sampled: 8429824
    num_steps_trained: 8429824
  iterations_since_restore: 34

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1337,59050.6,8429824,1.87143,1.9812,-2,36.7402


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8437816
  custom_metrics: {}
  date: 2021-12-10_05-27-01
  done: false
  episode_len_mean: 35.02272727272727
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.9131727256558158
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 153935
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9565053842961788
          entropy_coeff: 0.0
          kl: 0.01411970381741412
          policy_loss: -0.09619921699049883
          total_loss: -0.047399795017554425
          vf_explained_var: 0.800006628036499
          vf_loss: 0.027355126105248928
    num_agent_steps_sampled: 8437816
    num_steps_sampled: 8437816
    num_steps_trained: 8437816
  iterations_since_restore: 343

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1338,59096.5,8437816,1.91317,1.9776,-2,35.0227


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8445808
  custom_metrics: {}
  date: 2021-12-10_05-27-47
  done: false
  episode_len_mean: 35.61344537815126
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8970369776757825
  episode_reward_min: -2.0
  episodes_this_iter: 238
  episodes_total: 154173
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9437683168798685
          entropy_coeff: 0.0
          kl: 0.013368765416089445
          policy_loss: -0.10093781203613617
          total_loss: -0.04940485910628922
          vf_explained_var: 0.821074366569519
          vf_loss: 0.031229140469804406
    num_agent_steps_sampled: 8445808
    num_steps_sampled: 8445808
    num_steps_trained: 8445808
  iterations_since_restore: 344

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1339,59142.4,8445808,1.89704,1.9812,-2,35.6134


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8453800
  custom_metrics: {}
  date: 2021-12-10_05-28-33
  done: false
  episode_len_mean: 37.8
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.857899996367368
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 154393
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9566728547215462
          entropy_coeff: 0.0
          kl: 0.01257836731383577
          policy_loss: -0.09455078237806447
          total_loss: -0.04066392785171047
          vf_explained_var: 0.8545281887054443
          vf_loss: 0.034783461422193795
    num_agent_steps_sampled: 8453800
    num_steps_sampled: 8453800
    num_steps_trained: 8453800
  iterations_since_restore: 345
  node_ip: 19

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1340,59188.2,8453800,1.8579,1.9788,-2,37.8


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8461792
  custom_metrics: {}
  date: 2021-12-10_05-29-19
  done: false
  episode_len_mean: 35.513888888888886
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8752574109368854
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 154609
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9406701009720564
          entropy_coeff: 0.0
          kl: 0.012427136447513476
          policy_loss: -0.09004312992328778
          total_loss: -0.03605206457723398
          vf_explained_var: 0.7761275172233582
          vf_loss: 0.035117351391818374
    num_agent_steps_sampled: 8461792
    num_steps_sampled: 8461792
    num_steps_trained: 8461792
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1341,59233.9,8461792,1.87526,1.9788,-2,35.5139


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8469784
  custom_metrics: {}
  date: 2021-12-10_05-30-05
  done: false
  episode_len_mean: 38.73271889400922
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9229843314342234
  episode_reward_min: 0.9168000221252441
  episodes_this_iter: 217
  episodes_total: 154826
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9343527518212795
          entropy_coeff: 0.0
          kl: 0.015097234019776806
          policy_loss: -0.10363652615342289
          total_loss: -0.048753135051811114
          vf_explained_var: 0.7042412757873535
          vf_loss: 0.03195447096368298
    num_agent_steps_sampled: 8469784
    num_steps_sampled: 8469784
    num_steps_trained: 8469784
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1342,59279.5,8469784,1.92298,1.9804,0.9168,38.7327


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8477776
  custom_metrics: {}
  date: 2021-12-10_05-30-51
  done: false
  episode_len_mean: 34.136563876651984
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9162149754914943
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 155053
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9163813143968582
          entropy_coeff: 0.0
          kl: 0.014587556826882064
          policy_loss: -0.10554588377999607
          total_loss: -0.05567847844213247
          vf_explained_var: 0.723020076751709
          vf_loss: 0.027712557755876333
    num_agent_steps_sampled: 8477776
    num_steps_sampled: 8477776
    num_steps_trained: 8477776
  iterations_since_restore: 34

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1343,59325.8,8477776,1.91621,1.9812,-2,34.1366


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8485768
  custom_metrics: {}
  date: 2021-12-10_05-31-37
  done: false
  episode_len_mean: 34.3215859030837
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.931795597601567
  episode_reward_min: 1.6615999937057495
  episodes_this_iter: 227
  episodes_total: 155280
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9363208692520857
          entropy_coeff: 0.0
          kl: 0.014651283476268873
          policy_loss: -0.11025493597844616
          total_loss: -0.0632010682602413
          vf_explained_var: 0.7294405102729797
          vf_loss: 0.024802231695502996
    num_agent_steps_sampled: 8485768
    num_steps_sampled: 8485768
    num_steps_trained: 8485768
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1344,59371.4,8485768,1.9318,1.9812,1.6616,34.3216


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8493760
  custom_metrics: {}
  date: 2021-12-10_05-32-22
  done: false
  episode_len_mean: 36.674528301886795
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9088962241163794
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 155492
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9477632269263268
          entropy_coeff: 0.0
          kl: 0.014681076660053805
          policy_loss: -0.10488778352737427
          total_loss: -0.056431591176078655
          vf_explained_var: 0.7552822828292847
          vf_loss: 0.026159309432841837
    num_agent_steps_sampled: 8493760
    num_steps_sampled: 8493760
    num_steps_trained: 8493760
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1345,59417.1,8493760,1.9089,1.9804,-2,36.6745


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8501752
  custom_metrics: {}
  date: 2021-12-10_05-33-08
  done: false
  episode_len_mean: 34.75909090909091
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8776781797409057
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 155712
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9629707876592875
          entropy_coeff: 0.0
          kl: 0.012779457028955221
          policy_loss: -0.09698603092692792
          total_loss: -0.036369946581544355
          vf_explained_var: 0.7473229169845581
          vf_loss: 0.041207284200936556
    num_agent_steps_sampled: 8501752
    num_steps_sampled: 8501752
    num_steps_trained: 8501752
  iterations_since_restore: 35

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1346,59462.7,8501752,1.87768,1.9816,-2,34.7591


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8509744
  custom_metrics: {}
  date: 2021-12-10_05-33-53
  done: false
  episode_len_mean: 36.93333333333333
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9069866620577298
  episode_reward_min: -2.0
  episodes_this_iter: 195
  episodes_total: 155907
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9946555644273758
          entropy_coeff: 0.0
          kl: 0.014178917277604342
          policy_loss: -0.10021530609810725
          total_loss: -0.04504076245939359
          vf_explained_var: 0.7893306016921997
          vf_loss: 0.0336403141845949
    num_agent_steps_sampled: 8509744
    num_steps_sampled: 8509744
    num_steps_trained: 8509744
  iterations_since_restore: 352


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1347,59508.3,8509744,1.90699,1.9804,-2,36.9333


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8517736
  custom_metrics: {}
  date: 2021-12-10_05-34-39
  done: false
  episode_len_mean: 39.91752577319588
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.88051546603134
  episode_reward_min: -2.0
  episodes_this_iter: 194
  episodes_total: 156101
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9972942899912596
          entropy_coeff: 0.0
          kl: 0.01427106757182628
          policy_loss: -0.09307822075788863
          total_loss: -0.042070288211107254
          vf_explained_var: 0.813684344291687
          vf_loss: 0.029333749320358038
    num_agent_steps_sampled: 8517736
    num_steps_sampled: 8517736
    num_steps_trained: 8517736
  iterations_since_restore: 353

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1348,59553.9,8517736,1.88052,1.9804,-2,39.9175


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8525728
  custom_metrics: {}
  date: 2021-12-10_05-35-25
  done: false
  episode_len_mean: 43.3041237113402
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.913806184050963
  episode_reward_min: 0.0
  episodes_this_iter: 194
  episodes_total: 156295
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.994425006210804
          entropy_coeff: 0.0
          kl: 0.014255214860895649
          policy_loss: -0.10655079412390478
          total_loss: -0.056763223623420345
          vf_explained_var: 0.7863254547119141
          vf_loss: 0.028137464134488255
    num_agent_steps_sampled: 8525728
    num_steps_sampled: 8525728
    num_steps_trained: 8525728
  iterations_since_restore: 354

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1349,59599.6,8525728,1.91381,1.9816,0,43.3041


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8533720
  custom_metrics: {}
  date: 2021-12-10_05-36-11
  done: false
  episode_len_mean: 40.32743362831859
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9027008828336158
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 156521
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9359563030302525
          entropy_coeff: 0.0
          kl: 0.014464159787166864
          policy_loss: -0.10469587196712382
          total_loss: -0.05236510137910955
          vf_explained_var: 0.7925304174423218
          vf_loss: 0.030363329336978495
    num_agent_steps_sampled: 8533720
    num_steps_sampled: 8533720
    num_steps_trained: 8533720
  iterations_since_restore: 355

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1350,59645.7,8533720,1.9027,1.9828,-2,40.3274


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8541712
  custom_metrics: {}
  date: 2021-12-10_05-36-57
  done: false
  episode_len_mean: 32.712446351931334
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8857081535036473
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 156754
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9326400812715292
          entropy_coeff: 0.0
          kl: 0.01300908692064695
          policy_loss: -0.09268695121863857
          total_loss: -0.045975747547345236
          vf_explained_var: 0.8313494324684143
          vf_loss: 0.0269536561681889
    num_agent_steps_sampled: 8541712
    num_steps_sampled: 8541712
    num_steps_trained: 8541712
  iterations_since_restore: 356


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1351,59691.7,8541712,1.88571,1.9828,-2,32.7124


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8549704
  custom_metrics: {}
  date: 2021-12-10_05-37-43
  done: false
  episode_len_mean: 38.33185840707964
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9236902644676446
  episode_reward_min: 0.7712000012397766
  episodes_this_iter: 226
  episodes_total: 156980
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9310400877147913
          entropy_coeff: 0.0
          kl: 0.014192646398441866
          policy_loss: -0.10856340522877872
          total_loss: -0.05846132355509326
          vf_explained_var: 0.7320634126663208
          vf_loss: 0.02854699915042147
    num_agent_steps_sampled: 8549704
    num_steps_sampled: 8549704
    num_steps_trained: 8549704
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1352,59737.3,8549704,1.92369,1.9828,0.7712,38.3319


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8557696
  custom_metrics: {}
  date: 2021-12-10_05-38-28
  done: false
  episode_len_mean: 35.97727272727273
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8942618191242218
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 157200
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.960174398496747
          entropy_coeff: 0.0
          kl: 0.013251014053821564
          policy_loss: -0.09790985693689436
          total_loss: -0.051052338792942464
          vf_explained_var: 0.8191385865211487
          vf_loss: 0.02673254261026159
    num_agent_steps_sampled: 8557696
    num_steps_sampled: 8557696
    num_steps_trained: 8557696
  iterations_since_restore: 358

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1353,59783,8557696,1.89426,1.9804,-2,35.9773


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8565688
  custom_metrics: {}
  date: 2021-12-10_05-39-14
  done: false
  episode_len_mean: 36.70454545454545
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.892394550821998
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 157420
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9513777866959572
          entropy_coeff: 0.0
          kl: 0.014126897469395772
          policy_loss: -0.10736595660273451
          total_loss: -0.05805903306463733
          vf_explained_var: 0.7996245622634888
          vf_loss: 0.027851698774611577
    num_agent_steps_sampled: 8565688
    num_steps_sampled: 8565688
    num_steps_trained: 8565688
  iterations_since_restore: 359


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1354,59828.9,8565688,1.89239,1.9828,-2,36.7045


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8573680
  custom_metrics: {}
  date: 2021-12-10_05-40-00
  done: false
  episode_len_mean: 31.29184549356223
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9378489235440037
  episode_reward_min: 1.7187999486923218
  episodes_this_iter: 233
  episodes_total: 157653
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.909865465015173
          entropy_coeff: 0.0
          kl: 0.014273214474087581
          policy_loss: -0.10448524419916794
          total_loss: -0.05496160325128585
          vf_explained_var: 0.7425466775894165
          vf_loss: 0.027846196200698614
    num_agent_steps_sampled: 8573680
    num_steps_sampled: 8573680
    num_steps_trained: 8573680
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1355,59874.8,8573680,1.93785,1.9828,1.7188,31.2918


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8581672
  custom_metrics: {}
  date: 2021-12-10_05-40-46
  done: false
  episode_len_mean: 41.43877551020408
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9042795902612257
  episode_reward_min: -2.0
  episodes_this_iter: 196
  episodes_total: 157849
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9944051019847393
          entropy_coeff: 0.0
          kl: 0.015027430636109784
          policy_loss: -0.10696255366201513
          total_loss: -0.0562107247824315
          vf_explained_var: 0.764362096786499
          vf_loss: 0.027928919822443277
    num_agent_steps_sampled: 8581672
    num_steps_sampled: 8581672
    num_steps_trained: 8581672
  iterations_since_restore: 361


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1356,59920.8,8581672,1.90428,1.9784,-2,41.4388


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8589664
  custom_metrics: {}
  date: 2021-12-10_05-41-32
  done: false
  episode_len_mean: 34.70531400966183
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8948347850698204
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 158056
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0031918324530125
          entropy_coeff: 0.0
          kl: 0.013181674119550735
          policy_loss: -0.09357472864212468
          total_loss: -0.04189922474324703
          vf_explained_var: 0.8404641151428223
          vf_loss: 0.03165583487134427
    num_agent_steps_sampled: 8589664
    num_steps_sampled: 8589664
    num_steps_trained: 8589664
  iterations_since_restore: 362


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1357,59966.8,8589664,1.89483,1.9796,-2,34.7053


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8597656
  custom_metrics: {}
  date: 2021-12-10_05-42-18
  done: false
  episode_len_mean: 39.82380952380952
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8689085707778021
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 158266
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9923723433166742
          entropy_coeff: 0.0
          kl: 0.013013350719120353
          policy_loss: -0.09386450669262558
          total_loss: -0.04436324362177402
          vf_explained_var: 0.8374788761138916
          vf_loss: 0.02973723883042112
    num_agent_steps_sampled: 8597656
    num_steps_sampled: 8597656
    num_steps_trained: 8597656
  iterations_since_restore: 363


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1358,60012.6,8597656,1.86891,1.9796,-2,39.8238


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8605648
  custom_metrics: {}
  date: 2021-12-10_05-43-04
  done: false
  episode_len_mean: 36.50232558139535
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9091720952544102
  episode_reward_min: -2.0
  episodes_this_iter: 215
  episodes_total: 158481
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9507529567927122
          entropy_coeff: 0.0
          kl: 0.013844801287632436
          policy_loss: -0.10719114801031537
          total_loss: -0.05357077451481018
          vf_explained_var: 0.7636932730674744
          vf_loss: 0.03259358456125483
    num_agent_steps_sampled: 8605648
    num_steps_sampled: 8605648
    num_steps_trained: 8605648
  iterations_since_restore: 364


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1359,60058.1,8605648,1.90917,1.9796,-2,36.5023


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8613640
  custom_metrics: {}
  date: 2021-12-10_05-43-50
  done: false
  episode_len_mean: 39.89316239316239
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.920547009533287
  episode_reward_min: 0.0
  episodes_this_iter: 234
  episodes_total: 158715
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9316762126982212
          entropy_coeff: 0.0
          kl: 0.014519220974761993
          policy_loss: -0.1024606047431007
          total_loss: -0.05320034988108091
          vf_explained_var: 0.72685706615448
          vf_loss: 0.027209190477151424
    num_agent_steps_sampled: 8613640
    num_steps_sampled: 8613640
    num_steps_trained: 8613640
  iterations_since_restore: 365

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1360,60104,8613640,1.92055,1.9776,0,39.8932


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8621632
  custom_metrics: {}
  date: 2021-12-10_05-44-35
  done: false
  episode_len_mean: 31.518218623481783
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9053441266781888
  episode_reward_min: -2.0
  episodes_this_iter: 247
  episodes_total: 158962
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9082785248756409
          entropy_coeff: 0.0
          kl: 0.013975576468510553
          policy_loss: -0.08947415757575072
          total_loss: -0.03828781403717585
          vf_explained_var: 0.748231053352356
          vf_loss: 0.029960936401039362
    num_agent_steps_sampled: 8621632
    num_steps_sampled: 8621632
    num_steps_trained: 8621632
  iterations_since_restore: 366

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1361,60149.7,8621632,1.90534,1.9796,-2,31.5182


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8629624
  custom_metrics: {}
  date: 2021-12-10_05-45-21
  done: false
  episode_len_mean: 33.66094420600859
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.916128757173923
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 159195
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9458206035196781
          entropy_coeff: 0.0
          kl: 0.01429461091174744
          policy_loss: -0.1054228364082519
          total_loss: -0.057612257689470425
          vf_explained_var: 0.7428696155548096
          vf_loss: 0.026100635004695505
    num_agent_steps_sampled: 8629624
    num_steps_sampled: 8629624
    num_steps_trained: 8629624
  iterations_since_restore: 367
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1362,60195.6,8629624,1.91613,1.9796,-2,33.6609


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8637616
  custom_metrics: {}
  date: 2021-12-10_05-46-07
  done: false
  episode_len_mean: 33.171052631578945
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.934061400723039
  episode_reward_min: 1.36080002784729
  episodes_this_iter: 228
  episodes_total: 159423
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9291783776134253
          entropy_coeff: 0.0
          kl: 0.01450612541520968
          policy_loss: -0.10755289089865983
          total_loss: -0.06191357859643176
          vf_explained_var: 0.7052127718925476
          vf_loss: 0.023608134128153324
    num_agent_steps_sampled: 8637616
    num_steps_sampled: 8637616
    num_steps_trained: 8637616
  iterations_since_restore: 368

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1363,60241.5,8637616,1.93406,1.9796,1.3608,33.1711


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8645608
  custom_metrics: {}
  date: 2021-12-10_05-46-53
  done: false
  episode_len_mean: 34.93665158371041
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9130642548945156
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 159644
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9809350296854973
          entropy_coeff: 0.0
          kl: 0.013797987048747018
          policy_loss: -0.1078573918202892
          total_loss: -0.06504479044815525
          vf_explained_var: 0.8134037852287292
          vf_loss: 0.021856907580513507
    num_agent_steps_sampled: 8645608
    num_steps_sampled: 8645608
    num_steps_trained: 8645608
  iterations_since_restore: 369

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1364,60287.5,8645608,1.91306,1.9784,-2,34.9367


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8653600
  custom_metrics: {}
  date: 2021-12-10_05-47-39
  done: false
  episode_len_mean: 35.995850622406635
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.928398336117693
  episode_reward_min: 0.0
  episodes_this_iter: 241
  episodes_total: 159885
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9372964352369308
          entropy_coeff: 0.0
          kl: 0.014455307886237279
          policy_loss: -0.10189152930979617
          total_loss: -0.058191326912492514
          vf_explained_var: 0.7965989112854004
          vf_loss: 0.02174620795994997
    num_agent_steps_sampled: 8653600
    num_steps_sampled: 8653600
    num_steps_trained: 8653600
  iterations_since_restore: 370

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1365,60333.5,8653600,1.9284,1.982,0,35.9959


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8661592
  custom_metrics: {}
  date: 2021-12-10_05-48-25
  done: false
  episode_len_mean: 30.497975708502025
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8927983769521057
  episode_reward_min: -2.0
  episodes_this_iter: 247
  episodes_total: 160132
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9127226080745459
          entropy_coeff: 0.0
          kl: 0.012542214448330924
          policy_loss: -0.09222443198086694
          total_loss: -0.0515347191831097
          vf_explained_var: 0.8364595770835876
          vf_loss: 0.021641224942868575
    num_agent_steps_sampled: 8661592
    num_steps_sampled: 8661592
    num_steps_trained: 8661592
  iterations_since_restore: 371

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1366,60379.3,8661592,1.8928,1.982,-2,30.498


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8669584
  custom_metrics: {}
  date: 2021-12-10_05-49-11
  done: false
  episode_len_mean: 36.808035714285715
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8744892845196384
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 160356
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9554344303905964
          entropy_coeff: 0.0
          kl: 0.013385462312726304
          policy_loss: -0.10067637884640135
          total_loss: -0.04431797412689775
          vf_explained_var: 0.7877636551856995
          vf_loss: 0.03602923540165648
    num_agent_steps_sampled: 8669584
    num_steps_sampled: 8669584
    num_steps_trained: 8669584
  iterations_since_restore: 372

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1367,60425.1,8669584,1.87449,1.9808,-2,36.808


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8677576
  custom_metrics: {}
  date: 2021-12-10_05-49-57
  done: false
  episode_len_mean: 35.388429752066116
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9295801638571684
  episode_reward_min: 0.0
  episodes_this_iter: 242
  episodes_total: 160598
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9009571187198162
          entropy_coeff: 0.0
          kl: 0.014733453892404214
          policy_loss: -0.10310753781232052
          total_loss: -0.05467569065513089
          vf_explained_var: 0.7300451993942261
          vf_loss: 0.026055410620756447
    num_agent_steps_sampled: 8677576
    num_steps_sampled: 8677576
    num_steps_trained: 8677576
  iterations_since_restore: 373

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1368,60471.1,8677576,1.92958,1.9808,0,35.3884


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8685568
  custom_metrics: {}
  date: 2021-12-10_05-50-43
  done: false
  episode_len_mean: 33.06349206349206
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9189714278493608
  episode_reward_min: -2.0
  episodes_this_iter: 252
  episodes_total: 160850
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.890008732676506
          entropy_coeff: 0.0
          kl: 0.013663454039487988
          policy_loss: -0.09792316719540395
          total_loss: -0.053124644327908754
          vf_explained_var: 0.7288100719451904
          vf_loss: 0.02404715499142185
    num_agent_steps_sampled: 8685568
    num_steps_sampled: 8685568
    num_steps_trained: 8685568
  iterations_since_restore: 374

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1369,60516.9,8685568,1.91897,1.9824,-2,33.0635


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8693560
  custom_metrics: {}
  date: 2021-12-10_05-51-29
  done: false
  episode_len_mean: 31.933920704845814
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9196035279051322
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 161077
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.943807240575552
          entropy_coeff: 0.0
          kl: 0.013695806817850098
          policy_loss: -0.09533714491408318
          total_loss: -0.05058737294166349
          vf_explained_var: 0.7773298025131226
          vf_loss: 0.02394926588749513
    num_agent_steps_sampled: 8693560
    num_steps_sampled: 8693560
    num_steps_trained: 8693560
  iterations_since_restore: 375

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1370,60563.4,8693560,1.9196,1.98,-2,31.9339


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8701552
  custom_metrics: {}
  date: 2021-12-10_05-52-15
  done: false
  episode_len_mean: 33.854166666666664
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.916538336376349
  episode_reward_min: -2.0
  episodes_this_iter: 240
  episodes_total: 161317
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.94178182259202
          entropy_coeff: 0.0
          kl: 0.013872509269276634
          policy_loss: -0.1057540182955563
          total_loss: -0.060377781512215734
          vf_explained_var: 0.7581997513771057
          vf_loss: 0.02430736640235409
    num_agent_steps_sampled: 8701552
    num_steps_sampled: 8701552
    num_steps_trained: 8701552
  iterations_since_restore: 376
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1371,60609.2,8701552,1.91654,1.9852,-2,33.8542


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8709544
  custom_metrics: {}
  date: 2021-12-10_05-53-01
  done: false
  episode_len_mean: 33.70954356846473
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.900868049795697
  episode_reward_min: -2.0
  episodes_this_iter: 241
  episodes_total: 161558
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9314772486686707
          entropy_coeff: 0.0
          kl: 0.012577134126331657
          policy_loss: -0.08873969671549276
          total_loss: -0.03396192468062509
          vf_explained_var: 0.7094124555587769
          vf_loss: 0.03567624883726239
    num_agent_steps_sampled: 8709544
    num_steps_sampled: 8709544
    num_steps_trained: 8709544
  iterations_since_restore: 377
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1372,60655.2,8709544,1.90087,1.9852,-2,33.7095


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8717536
  custom_metrics: {}
  date: 2021-12-10_05-53-47
  done: false
  episode_len_mean: 32.943548387096776
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.9038338752523545
  episode_reward_min: -2.0
  episodes_this_iter: 248
  episodes_total: 161806
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9149895720183849
          entropy_coeff: 0.0
          kl: 0.013188594137318432
          policy_loss: -0.08940280129900202
          total_loss: -0.043271994989481755
          vf_explained_var: 0.776057243347168
          vf_loss: 0.026100627321284264
    num_agent_steps_sampled: 8717536
    num_steps_sampled: 8717536
    num_steps_trained: 8717536
  iterations_since_restore: 378

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1373,60701.1,8717536,1.90383,1.9852,-2,32.9435


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8725528
  custom_metrics: {}
  date: 2021-12-10_05-54-33
  done: false
  episode_len_mean: 30.428015564202333
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.909394551343955
  episode_reward_min: -2.0
  episodes_this_iter: 257
  episodes_total: 162063
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8944381214678288
          entropy_coeff: 0.0
          kl: 0.013473108876496553
          policy_loss: -0.0935206699068658
          total_loss: -0.05028612882597372
          vf_explained_var: 0.771995484828949
          vf_loss: 0.02277225611032918
    num_agent_steps_sampled: 8725528
    num_steps_sampled: 8725528
    num_steps_trained: 8725528
  iterations_since_restore: 379
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1374,60747.1,8725528,1.90939,1.9852,-2,30.428


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8733520
  custom_metrics: {}
  date: 2021-12-10_05-55-19
  done: false
  episode_len_mean: 32.1965811965812
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.9191623956729205
  episode_reward_min: -2.0
  episodes_this_iter: 234
  episodes_total: 162297
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9404035154730082
          entropy_coeff: 0.0
          kl: 0.013809522002702579
          policy_loss: -0.09141749220725615
          total_loss: -0.03766179816739168
          vf_explained_var: 0.7537333369255066
          vf_loss: 0.03278247901471332
    num_agent_steps_sampled: 8733520
    num_steps_sampled: 8733520
    num_steps_trained: 8733520
  iterations_since_restore: 380
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1375,60793,8733520,1.91916,1.9852,-2,32.1966


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8741512
  custom_metrics: {}
  date: 2021-12-10_05-56-05
  done: false
  episode_len_mean: 37.92270531400966
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.9058338181984023
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 162504
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9878253880888224
          entropy_coeff: 0.0
          kl: 0.015084288781508803
          policy_loss: -0.11049320240272209
          total_loss: -0.06219911064545158
          vf_explained_var: 0.7436938285827637
          vf_loss: 0.02538482815725729
    num_agent_steps_sampled: 8741512
    num_steps_sampled: 8741512
    num_steps_trained: 8741512
  iterations_since_restore: 381


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1376,60838.6,8741512,1.90583,1.9852,-2,37.9227


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8749504
  custom_metrics: {}
  date: 2021-12-10_05-56-51
  done: false
  episode_len_mean: 35.46666666666667
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.9128266716003417
  episode_reward_min: -2.0
  episodes_this_iter: 225
  episodes_total: 162729
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.939787644892931
          entropy_coeff: 0.0
          kl: 0.012980612489627674
          policy_loss: -0.0897412901831558
          total_loss: -0.04414320165233221
          vf_explained_var: 0.8312005996704102
          vf_loss: 0.025883783004246652
    num_agent_steps_sampled: 8749504
    num_steps_sampled: 8749504
    num_steps_trained: 8749504
  iterations_since_restore: 382
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1377,60884.8,8749504,1.91283,1.9852,-2,35.4667


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8757496
  custom_metrics: {}
  date: 2021-12-10_05-57-37
  done: false
  episode_len_mean: 35.9009900990099
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9285683159780975
  episode_reward_min: 1.4759999513626099
  episodes_this_iter: 202
  episodes_total: 162931
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9949161093682051
          entropy_coeff: 0.0
          kl: 0.013693732878891751
          policy_loss: -0.09811444507795386
          total_loss: -0.05330674783908762
          vf_explained_var: 0.8129650354385376
          vf_loss: 0.024010339751839638
    num_agent_steps_sampled: 8757496
    num_steps_sampled: 8757496
    num_steps_trained: 8757496
  iterations_since_restore: 383

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1378,60930.5,8757496,1.92857,1.9828,1.476,35.901


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8765488
  custom_metrics: {}
  date: 2021-12-10_05-58-23
  done: false
  episode_len_mean: 37.13478260869565
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.8932591320379921
  episode_reward_min: -2.0
  episodes_this_iter: 230
  episodes_total: 163161
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9507759399712086
          entropy_coeff: 0.0
          kl: 0.013875093776732683
          policy_loss: -0.09509570154477842
          total_loss: -0.048875836364459246
          vf_explained_var: 0.8396955132484436
          vf_loss: 0.025147068896330893
    num_agent_steps_sampled: 8765488
    num_steps_sampled: 8765488
    num_steps_trained: 8765488
  iterations_since_restore: 384

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1379,60976.5,8765488,1.89326,1.9852,-2,37.1348


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8773480
  custom_metrics: {}
  date: 2021-12-10_05-59-09
  done: false
  episode_len_mean: 31.381322957198442
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.9233322992399045
  episode_reward_min: -2.0
  episodes_this_iter: 257
  episodes_total: 163418
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8999742846935987
          entropy_coeff: 0.0
          kl: 0.014196429023286328
          policy_loss: -0.10040411216323264
          total_loss: -0.05329403834184632
          vf_explained_var: 0.7646780610084534
          vf_loss: 0.02554925059666857
    num_agent_steps_sampled: 8773480
    num_steps_sampled: 8773480
    num_steps_trained: 8773480
  iterations_since_restore: 385

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1380,61022.5,8773480,1.92333,1.9852,-2,31.3813


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8781472
  custom_metrics: {}
  date: 2021-12-10_05-59-55
  done: false
  episode_len_mean: 38.72052401746725
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.906452400195026
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 163647
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9177712742239237
          entropy_coeff: 0.0
          kl: 0.013215396262239665
          policy_loss: -0.09617911145323887
          total_loss: -0.052311710504000075
          vf_explained_var: 0.8472909331321716
          vf_loss: 0.02379651478258893
    num_agent_steps_sampled: 8781472
    num_steps_sampled: 8781472
    num_steps_trained: 8781472
  iterations_since_restore: 386

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1381,61068.5,8781472,1.90645,1.9784,-2,38.7205


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8789464
  custom_metrics: {}
  date: 2021-12-10_06-00-41
  done: false
  episode_len_mean: 33.76754385964912
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9158964899548314
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 163875
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9044716227799654
          entropy_coeff: 0.0
          kl: 0.013516292645363137
          policy_loss: -0.09009126358432695
          total_loss: -0.0409066842548782
          vf_explained_var: 0.786537766456604
          vf_loss: 0.02865671197650954
    num_agent_steps_sampled: 8789464
    num_steps_sampled: 8789464
    num_steps_trained: 8789464
  iterations_since_restore: 387
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1382,61114.3,8789464,1.9159,1.9804,-2,33.7675


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8797456
  custom_metrics: {}
  date: 2021-12-10_06-01-27
  done: false
  episode_len_mean: 33.80408163265306
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9013844908500204
  episode_reward_min: -2.0
  episodes_this_iter: 245
  episodes_total: 164120
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8902668803930283
          entropy_coeff: 0.0
          kl: 0.01332580647431314
          policy_loss: -0.09169877565000206
          total_loss: -0.04607033162028529
          vf_explained_var: 0.7736873626708984
          vf_loss: 0.025389873597305268
    num_agent_steps_sampled: 8797456
    num_steps_sampled: 8797456
    num_steps_trained: 8797456
  iterations_since_restore: 388

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1383,61160.4,8797456,1.90138,1.9784,-2,33.8041


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8805448
  custom_metrics: {}
  date: 2021-12-10_06-02-13
  done: false
  episode_len_mean: 33.669603524229075
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8829092538304266
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 164347
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9296107944101095
          entropy_coeff: 0.0
          kl: 0.014127199334325269
          policy_loss: -0.09774273211951368
          total_loss: -0.046712903495063074
          vf_explained_var: 0.8552170991897583
          vf_loss: 0.02957414445700124
    num_agent_steps_sampled: 8805448
    num_steps_sampled: 8805448
    num_steps_trained: 8805448
  iterations_since_restore: 389

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1384,61206.3,8805448,1.88291,1.9784,-2,33.6696


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8813440
  custom_metrics: {}
  date: 2021-12-10_06-02-59
  done: false
  episode_len_mean: 32.96120689655172
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9345310335529262
  episode_reward_min: 1.7308000326156616
  episodes_this_iter: 232
  episodes_total: 164579
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9150505438446999
          entropy_coeff: 0.0
          kl: 0.014809803309617564
          policy_loss: -0.10024770256131887
          total_loss: -0.05408523016376421
          vf_explained_var: 0.7380993366241455
          vf_loss: 0.023670083901379257
    num_agent_steps_sampled: 8813440
    num_steps_sampled: 8813440
    num_steps_trained: 8813440
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1385,61252.5,8813440,1.93453,1.9804,1.7308,32.9612


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8821432
  custom_metrics: {}
  date: 2021-12-10_06-03-45
  done: false
  episode_len_mean: 34.00440528634361
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.932468722045159
  episode_reward_min: 1.4204000234603882
  episodes_this_iter: 227
  episodes_total: 164806
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9422570783644915
          entropy_coeff: 0.0
          kl: 0.015579896047711372
          policy_loss: -0.10422829425078817
          total_loss: -0.05647620192030445
          vf_explained_var: 0.7761346697807312
          vf_loss: 0.02409012638963759
    num_agent_steps_sampled: 8821432
    num_steps_sampled: 8821432
    num_steps_trained: 8821432
  iterations_since_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1386,61298.6,8821432,1.93247,1.9816,1.4204,34.0044


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8829424
  custom_metrics: {}
  date: 2021-12-10_06-04-31
  done: false
  episode_len_mean: 36.348017621145374
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9277039588416724
  episode_reward_min: 0.21040000021457672
  episodes_this_iter: 227
  episodes_total: 165033
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9551655780524015
          entropy_coeff: 0.0
          kl: 0.014098393789026886
          policy_loss: -0.10509265155997127
          total_loss: -0.057426243380177766
          vf_explained_var: 0.7558075189590454
          vf_loss: 0.026254476164467633
    num_agent_steps_sampled: 8829424
    num_steps_sampled: 8829424
    num_steps_trained: 8829424
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1387,61344.7,8829424,1.9277,1.9836,0.2104,36.348


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8837416
  custom_metrics: {}
  date: 2021-12-10_06-05-17
  done: false
  episode_len_mean: 35.098765432098766
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.930186004305082
  episode_reward_min: 0.8212000131607056
  episodes_this_iter: 243
  episodes_total: 165276
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9155286829918623
          entropy_coeff: 0.0
          kl: 0.014521716453600675
          policy_loss: -0.10198321851203218
          total_loss: -0.05672969465376809
          vf_explained_var: 0.7622194290161133
          vf_loss: 0.023198668466648087
    num_agent_steps_sampled: 8837416
    num_steps_sampled: 8837416
    num_steps_trained: 8837416
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1388,61390.7,8837416,1.93019,1.9816,0.8212,35.0988


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8845408
  custom_metrics: {}
  date: 2021-12-10_06-06-03
  done: false
  episode_len_mean: 31.77734375
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9219562499783933
  episode_reward_min: -2.0
  episodes_this_iter: 256
  episodes_total: 165532
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8722986895591021
          entropy_coeff: 0.0
          kl: 0.013415662164334208
          policy_loss: -0.09571032639360055
          total_loss: -0.05174524770700373
          vf_explained_var: 0.7569106221199036
          vf_loss: 0.023590042954310775
    num_agent_steps_sampled: 8845408
    num_steps_sampled: 8845408
    num_steps_trained: 8845408
  iterations_since_restore: 394
  no

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1389,61436.7,8845408,1.92196,1.9836,-2,31.7773


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8853400
  custom_metrics: {}
  date: 2021-12-10_06-06-49
  done: false
  episode_len_mean: 35.6712962962963
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8761888891458511
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 165748
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9398021753877401
          entropy_coeff: 0.0
          kl: 0.013332758710021153
          policy_loss: -0.09367698336427566
          total_loss: -0.04805088724242523
          vf_explained_var: 0.8579388856887817
          vf_loss: 0.025376968667842448
    num_agent_steps_sampled: 8853400
    num_steps_sampled: 8853400
    num_steps_trained: 8853400
  iterations_since_restore: 395

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1390,61482.8,8853400,1.87619,1.9836,-2,35.6713


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8861392
  custom_metrics: {}
  date: 2021-12-10_06-07-36
  done: false
  episode_len_mean: 32.67063492063492
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9196523801674918
  episode_reward_min: -2.0
  episodes_this_iter: 252
  episodes_total: 166000
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8777035120874643
          entropy_coeff: 0.0
          kl: 0.012964083172846586
          policy_loss: -0.09117694367887452
          total_loss: -0.047076748800463974
          vf_explained_var: 0.7263904213905334
          vf_loss: 0.024410993850324303
    num_agent_steps_sampled: 8861392
    num_steps_sampled: 8861392
    num_steps_trained: 8861392
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1391,61529,8861392,1.91965,1.9836,-2,32.6706


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8869384
  custom_metrics: {}
  date: 2021-12-10_06-08-22
  done: false
  episode_len_mean: 35.61233480176212
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9122255510170554
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 166227
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9005422852933407
          entropy_coeff: 0.0
          kl: 0.014490685774944723
          policy_loss: -0.09901121538132429
          total_loss: -0.054085545300040394
          vf_explained_var: 0.6991643905639648
          vf_loss: 0.02291794156190008
    num_agent_steps_sampled: 8869384
    num_steps_sampled: 8869384
    num_steps_trained: 8869384
  iterations_since_restore: 397

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1392,61574.8,8869384,1.91223,1.9816,-2,35.6123


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8877376
  custom_metrics: {}
  date: 2021-12-10_06-09-07
  done: false
  episode_len_mean: 30.512
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9237599992752075
  episode_reward_min: -2.0
  episodes_this_iter: 250
  episodes_total: 166477
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8876126185059547
          entropy_coeff: 0.0
          kl: 0.013467970857163891
          policy_loss: -0.09697281913395273
          total_loss: -0.05423391782096587
          vf_explained_var: 0.7457690238952637
          vf_loss: 0.022284421604126692
    num_agent_steps_sampled: 8877376
    num_steps_sampled: 8877376
    num_steps_trained: 8877376
  iterations_since_restore: 398
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1393,61620.7,8877376,1.92376,1.9836,-2,30.512


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8885368
  custom_metrics: {}
  date: 2021-12-10_06-09-53
  done: false
  episode_len_mean: 36.4468085106383
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.894997449124113
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 166712
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9006874039769173
          entropy_coeff: 0.0
          kl: 0.013294141972437501
          policy_loss: -0.0949937961413525
          total_loss: -0.04200333834160119
          vf_explained_var: 0.6859320402145386
          vf_loss: 0.03279998269863427
    num_agent_steps_sampled: 8885368
    num_steps_sampled: 8885368
    num_steps_trained: 8885368
  iterations_since_restore: 399

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1394,61666.7,8885368,1.895,1.9836,-2,36.4468


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8893360
  custom_metrics: {}
  date: 2021-12-10_06-10-40
  done: false
  episode_len_mean: 30.46747967479675
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9237463440352338
  episode_reward_min: -2.0
  episodes_this_iter: 246
  episodes_total: 166958
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9113643057644367
          entropy_coeff: 0.0
          kl: 0.013832410506438464
          policy_loss: -0.09838099440094084
          total_loss: -0.05551439605187625
          vf_explained_var: 0.8099948167800903
          vf_loss: 0.02185862508486025
    num_agent_steps_sampled: 8893360
    num_steps_sampled: 8893360
    num_steps_trained: 8893360
  iterations_since_restore: 400

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1395,61712.7,8893360,1.92375,1.9836,-2,30.4675


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8901352
  custom_metrics: {}
  date: 2021-12-10_06-11-26
  done: false
  episode_len_mean: 35.32272727272727
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8775618184696545
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 167178
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9540319573134184
          entropy_coeff: 0.0
          kl: 0.013041273137787357
          policy_loss: -0.08916795518598519
          total_loss: -0.04063753440277651
          vf_explained_var: 0.823199987411499
          vf_loss: 0.028723986411932856
    num_agent_steps_sampled: 8901352
    num_steps_sampled: 8901352
    num_steps_trained: 8901352
  iterations_since_restore: 401


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1396,61758.7,8901352,1.87756,1.9832,-2,35.3227


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8909344
  custom_metrics: {}
  date: 2021-12-10_06-12-12
  done: false
  episode_len_mean: 36.09442060085837
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.911701286299546
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 167411
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8958581108599901
          entropy_coeff: 0.0
          kl: 0.013183708942960948
          policy_loss: -0.0947983012883924
          total_loss: -0.04614765009318944
          vf_explained_var: 0.6858615875244141
          vf_loss: 0.028627894120290875
    num_agent_steps_sampled: 8909344
    num_steps_sampled: 8909344
    num_steps_trained: 8909344
  iterations_since_restore: 402


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1397,61805.1,8909344,1.9117,1.984,-2,36.0944


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8917336
  custom_metrics: {}
  date: 2021-12-10_06-12-58
  done: false
  episode_len_mean: 31.796812749003983
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9367792877068082
  episode_reward_min: 1.6784000396728516
  episodes_this_iter: 251
  episodes_total: 167662
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8674984388053417
          entropy_coeff: 0.0
          kl: 0.014282097516115755
          policy_loss: -0.0967759127379395
          total_loss: -0.052624641451984644
          vf_explained_var: 0.6694931387901306
          vf_loss: 0.02246033848496154
    num_agent_steps_sampled: 8917336
    num_steps_sampled: 8917336
    num_steps_trained: 8917336
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1398,61851.4,8917336,1.93678,1.9836,1.6784,31.7968


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8925328
  custom_metrics: {}
  date: 2021-12-10_06-13-44
  done: false
  episode_len_mean: 32.24096385542169
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.935865061350137
  episode_reward_min: 1.7244000434875488
  episodes_this_iter: 249
  episodes_total: 167911
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.875684691593051
          entropy_coeff: 0.0
          kl: 0.014329333585919812
          policy_loss: -0.10219781800697092
          total_loss: -0.05844584119040519
          vf_explained_var: 0.6509941816329956
          vf_loss: 0.021989302709698677
    num_agent_steps_sampled: 8925328
    num_steps_sampled: 8925328
    num_steps_trained: 8925328
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1399,61897.1,8925328,1.93587,1.9836,1.7244,32.241


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8933320
  custom_metrics: {}
  date: 2021-12-10_06-14-30
  done: false
  episode_len_mean: 37.28095238095238
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9072133336748396
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 168121
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9358234666287899
          entropy_coeff: 0.0
          kl: 0.0136997977970168
          policy_loss: -0.10163265990559012
          total_loss: -0.05828566791024059
          vf_explained_var: 0.7545511722564697
          vf_loss: 0.02254042081767693
    num_agent_steps_sampled: 8933320
    num_steps_sampled: 8933320
    num_steps_trained: 8933320
  iterations_since_restore: 405

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1400,61943,8933320,1.90721,1.9808,-2,37.281


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8941312
  custom_metrics: {}
  date: 2021-12-10_06-15-16
  done: false
  episode_len_mean: 32.97424892703863
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9179793990221146
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 168354
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8989913109689951
          entropy_coeff: 0.0
          kl: 0.013227197079686448
          policy_loss: -0.09058522986015305
          total_loss: -0.04737929589464329
          vf_explained_var: 0.7847650051116943
          vf_loss: 0.023117132717743516
    num_agent_steps_sampled: 8941312
    num_steps_sampled: 8941312
    num_steps_trained: 8941312
  iterations_since_restore: 40

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1401,61988.9,8941312,1.91798,1.9836,-2,32.9742


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8949304
  custom_metrics: {}
  date: 2021-12-10_06-16-02
  done: false
  episode_len_mean: 35.98706896551724
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9115706908291783
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 168586
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9039300829172134
          entropy_coeff: 0.0
          kl: 0.014346707524964586
          policy_loss: -0.10613514063879848
          total_loss: -0.0633598197309766
          vf_explained_var: 0.736863374710083
          vf_loss: 0.020986261835787445
    num_agent_steps_sampled: 8949304
    num_steps_sampled: 8949304
    num_steps_trained: 8949304
  iterations_since_restore: 407


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1402,62034.8,8949304,1.91157,1.9836,-2,35.9871


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8957296
  custom_metrics: {}
  date: 2021-12-10_06-16-48
  done: false
  episode_len_mean: 31.579766536964982
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9372824880399593
  episode_reward_min: 1.6723999977111816
  episodes_this_iter: 257
  episodes_total: 168843
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8835800997912884
          entropy_coeff: 0.0
          kl: 0.015240576525684446
          policy_loss: -0.10460524656809866
          total_loss: -0.06017457433335949
          vf_explained_var: 0.6740472316741943
          vf_loss: 0.02128404879476875
    num_agent_steps_sampled: 8957296
    num_steps_sampled: 8957296
    num_steps_trained: 8957296
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1403,62080.7,8957296,1.93728,1.9836,1.6724,31.5798


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8965288
  custom_metrics: {}
  date: 2021-12-10_06-17-34
  done: false
  episode_len_mean: 33.45021645021645
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9165904743846878
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 169074
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9527910649776459
          entropy_coeff: 0.0
          kl: 0.013514253165340051
          policy_loss: -0.1014011378865689
          total_loss: -0.05934961358434521
          vf_explained_var: 0.773537278175354
          vf_loss: 0.02152675375691615
    num_agent_steps_sampled: 8965288
    num_steps_sampled: 8965288
    num_steps_trained: 8965288
  iterations_since_restore: 409

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1404,62127,8965288,1.91659,1.9836,-2,33.4502


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8973280
  custom_metrics: {}
  date: 2021-12-10_06-18-20
  done: false
  episode_len_mean: 33.85641025641026
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9327158921804184
  episode_reward_min: 1.298799991607666
  episodes_this_iter: 195
  episodes_total: 169269
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.034888008609414
          entropy_coeff: 0.0
          kl: 0.014700535364681855
          policy_loss: -0.10851314140018076
          total_loss: -0.06518817177857272
          vf_explained_var: 0.819736123085022
          vf_loss: 0.02099853110848926
    num_agent_steps_sampled: 8973280
    num_steps_sampled: 8973280
    num_steps_trained: 8973280
  iterations_since_r

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1405,62172.6,8973280,1.93272,1.9824,1.2988,33.8564


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8981272
  custom_metrics: {}
  date: 2021-12-10_06-19-06
  done: false
  episode_len_mean: 42.57674418604651
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8970641862514408
  episode_reward_min: -2.0
  episodes_this_iter: 215
  episodes_total: 169484
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9769509676843882
          entropy_coeff: 0.0
          kl: 0.013951037224614993
          policy_loss: -0.10080799908610061
          total_loss: -0.057683091290527955
          vf_explained_var: 0.824647843837738
          vf_loss: 0.021936770644970238
    num_agent_steps_sampled: 8981272
    num_steps_sampled: 8981272
    num_steps_trained: 8981272
  iterations_since_restore: 41

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1406,62218.4,8981272,1.89706,1.982,-2,42.5767


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8989264
  custom_metrics: {}
  date: 2021-12-10_06-19-52
  done: false
  episode_len_mean: 33.33469387755102
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9017877544675554
  episode_reward_min: -2.0
  episodes_this_iter: 245
  episodes_total: 169729
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9318642485886812
          entropy_coeff: 0.0
          kl: 0.013233646081062034
          policy_loss: -0.09583743143593892
          total_loss: -0.042775061388965696
          vf_explained_var: 0.7610146403312683
          vf_loss: 0.03296377288643271
    num_agent_steps_sampled: 8989264
    num_steps_sampled: 8989264
    num_steps_trained: 8989264
  iterations_since_restore: 41

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1407,62264.3,8989264,1.90179,1.982,-2,33.3347


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 8997256
  custom_metrics: {}
  date: 2021-12-10_06-20-38
  done: false
  episode_len_mean: 36.029661016949156
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8956440651820878
  episode_reward_min: -2.0
  episodes_this_iter: 236
  episodes_total: 169965
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9168350230902433
          entropy_coeff: 0.0
          kl: 0.013222096458775923
          policy_loss: -0.09858029236784205
          total_loss: -0.05156595743028447
          vf_explained_var: 0.7892651557922363
          vf_loss: 0.026933277025818825
    num_agent_steps_sampled: 8997256
    num_steps_sampled: 8997256
    num_steps_trained: 8997256
  iterations_since_restore: 41

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1408,62310.3,8997256,1.89564,1.9832,-2,36.0297


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9005248
  custom_metrics: {}
  date: 2021-12-10_06-21-24
  done: false
  episode_len_mean: 32.28870292887029
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9358393278082044
  episode_reward_min: 1.6615999937057495
  episodes_this_iter: 239
  episodes_total: 170204
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9069077260792255
          entropy_coeff: 0.0
          kl: 0.013734642619965598
          policy_loss: -0.09973205940332264
          total_loss: -0.0536461486844928
          vf_explained_var: 0.7466138601303101
          vf_loss: 0.025226424913853407
    num_agent_steps_sampled: 9005248
    num_steps_sampled: 9005248
    num_steps_trained: 9005248
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1409,62356.4,9005248,1.93584,1.9832,1.6616,32.2887


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9013240
  custom_metrics: {}
  date: 2021-12-10_06-22-10
  done: false
  episode_len_mean: 33.52719665271967
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9174828449552528
  episode_reward_min: -2.0
  episodes_this_iter: 239
  episodes_total: 170443
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9012218043208122
          entropy_coeff: 0.0
          kl: 0.014973081124480814
          policy_loss: -0.10194836946902797
          total_loss: -0.05240862662321888
          vf_explained_var: 0.6982263326644897
          vf_loss: 0.026799378276336938
    num_agent_steps_sampled: 9013240
    num_steps_sampled: 9013240
    num_steps_trained: 9013240
  iterations_since_restore: 41

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1410,62402.4,9013240,1.91748,1.9824,-2,33.5272


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9021232
  custom_metrics: {}
  date: 2021-12-10_06-22-55
  done: false
  episode_len_mean: 34.26126126126126
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9144828808200252
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 170665
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.934939943253994
          entropy_coeff: 0.0
          kl: 0.013885402702726424
          policy_loss: -0.09682326824986376
          total_loss: -0.050945429189596325
          vf_explained_var: 0.7993159294128418
          vf_loss: 0.02478938509011641
    num_agent_steps_sampled: 9021232
    num_steps_sampled: 9021232
    num_steps_trained: 9021232
  iterations_since_restore: 416


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1411,62448,9021232,1.91448,1.9828,-2,34.2613


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9029224
  custom_metrics: {}
  date: 2021-12-10_06-23-41
  done: false
  episode_len_mean: 32.79831932773109
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9027747887523234
  episode_reward_min: -2.0
  episodes_this_iter: 238
  episodes_total: 170903
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9216434713453054
          entropy_coeff: 0.0
          kl: 0.01308322575641796
          policy_loss: -0.0915313676232472
          total_loss: -0.04227619731682353
          vf_explained_var: 0.7684289216995239
          vf_loss: 0.029385022935457528
    num_agent_steps_sampled: 9029224
    num_steps_sampled: 9029224
    num_steps_trained: 9029224
  iterations_since_restore: 417


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1412,62493.8,9029224,1.90277,1.9784,-2,32.7983


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9037216
  custom_metrics: {}
  date: 2021-12-10_06-24-27
  done: false
  episode_len_mean: 33.72727272727273
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.932909089868719
  episode_reward_min: 1.6691999435424805
  episodes_this_iter: 220
  episodes_total: 171123
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.970323221758008
          entropy_coeff: 0.0
          kl: 0.014178647979861125
          policy_loss: -0.1041766568669118
          total_loss: -0.0580595797218848
          vf_explained_var: 0.7448006868362427
          vf_loss: 0.02458325435873121
    num_agent_steps_sampled: 9037216
    num_steps_sampled: 9037216
    num_steps_trained: 9037216
  iterations_since_restore: 418

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1413,62539.5,9037216,1.93291,1.9784,1.6692,33.7273


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9045208
  custom_metrics: {}
  date: 2021-12-10_06-25-13
  done: false
  episode_len_mean: 38.92018779342723
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9052845088528916
  episode_reward_min: -2.0
  episodes_this_iter: 213
  episodes_total: 171336
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9612026028335094
          entropy_coeff: 0.0
          kl: 0.013362990372115746
          policy_loss: -0.10201776819303632
          total_loss: -0.05790441099088639
          vf_explained_var: 0.7756292819976807
          vf_loss: 0.02381831780076027
    num_agent_steps_sampled: 9045208
    num_steps_sampled: 9045208
    num_steps_trained: 9045208
  iterations_since_restore: 419

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1414,62585.3,9045208,1.90528,1.9788,-2,38.9202


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9053200
  custom_metrics: {}
  date: 2021-12-10_06-25-58
  done: false
  episode_len_mean: 36.563876651982376
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.893171806692552
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 171563
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9520283378660679
          entropy_coeff: 0.0
          kl: 0.013519097032258287
          policy_loss: -0.10482201501145028
          total_loss: -0.06022736628074199
          vf_explained_var: 0.8375698328018188
          vf_loss: 0.024062522337771952
    num_agent_steps_sampled: 9053200
    num_steps_sampled: 9053200
    num_steps_trained: 9053200
  iterations_since_restore: 420

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1415,62630.8,9053200,1.89317,1.978,-2,36.5639


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9061192
  custom_metrics: {}
  date: 2021-12-10_06-26-44
  done: false
  episode_len_mean: 35.81818181818182
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8952363637360659
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 171783
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9675177969038486
          entropy_coeff: 0.0
          kl: 0.014873416075715795
          policy_loss: -0.09850120806368068
          total_loss: -0.051526312716305256
          vf_explained_var: 0.8438191413879395
          vf_loss: 0.024385895347222686
    num_agent_steps_sampled: 9061192
    num_steps_sampled: 9061192
    num_steps_trained: 9061192
  iterations_since_restore: 421

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1416,62676.5,9061192,1.89524,1.9796,-2,35.8182


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9069184
  custom_metrics: {}
  date: 2021-12-10_06-27-30
  done: false
  episode_len_mean: 36.49321266968326
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9109574642656075
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 172004
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9239600319415331
          entropy_coeff: 0.0
          kl: 0.014037304325029254
          policy_loss: -0.10468464126461186
          total_loss: -0.06262270751176402
          vf_explained_var: 0.8166830539703369
          vf_loss: 0.020742778258863837
    num_agent_steps_sampled: 9069184
    num_steps_sampled: 9069184
    num_steps_trained: 9069184
  iterations_since_restore: 422

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1417,62722.3,9069184,1.91096,1.9788,-2,36.4932


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9077176
  custom_metrics: {}
  date: 2021-12-10_06-28-16
  done: false
  episode_len_mean: 36.77064220183486
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9088770665160013
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 172222
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9532884005457163
          entropy_coeff: 0.0
          kl: 0.013791333825793117
          policy_loss: -0.09904712706338614
          total_loss: -0.05165100345038809
          vf_explained_var: 0.7779077887535095
          vf_loss: 0.02645053470041603
    num_agent_steps_sampled: 9077176
    num_steps_sampled: 9077176
    num_steps_trained: 9077176
  iterations_since_restore: 423


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1418,62768.3,9077176,1.90888,1.9816,-2,36.7706


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9085168
  custom_metrics: {}
  date: 2021-12-10_06-29-01
  done: false
  episode_len_mean: 37.495098039215684
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.849645101556591
  episode_reward_min: -2.0
  episodes_this_iter: 204
  episodes_total: 172426
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9941844362765551
          entropy_coeff: 0.0
          kl: 0.013235685997642577
          policy_loss: -0.09718455089023337
          total_loss: -0.047536005586152896
          vf_explained_var: 0.8564246296882629
          vf_loss: 0.029546848207246512
    num_agent_steps_sampled: 9085168
    num_steps_sampled: 9085168
    num_steps_trained: 9085168
  iterations_since_restore: 424

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1419,62813.9,9085168,1.84965,1.9792,-2,37.4951


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9093160
  custom_metrics: {}
  date: 2021-12-10_06-29-47
  done: false
  episode_len_mean: 38.73513513513514
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8814637809186368
  episode_reward_min: -2.0
  episodes_this_iter: 185
  episodes_total: 172611
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9834687951952219
          entropy_coeff: 0.0
          kl: 0.014039629779290408
          policy_loss: -0.10170854840544052
          total_loss: -0.04889655698207207
          vf_explained_var: 0.8130189180374146
          vf_loss: 0.031489304383285344
    num_agent_steps_sampled: 9093160
    num_steps_sampled: 9093160
    num_steps_trained: 9093160
  iterations_since_restore: 425

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1420,62859.6,9093160,1.88146,1.978,-2,38.7351


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9101152
  custom_metrics: {}
  date: 2021-12-10_06-30-33
  done: false
  episode_len_mean: 41.660633484162894
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8993484171537252
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 172832
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9111645091325045
          entropy_coeff: 0.0
          kl: 0.014098542393185198
          policy_loss: -0.1015608791494742
          total_loss: -0.05143904950818978
          vf_explained_var: 0.764979362487793
          vf_loss: 0.028709666861686856
    num_agent_steps_sampled: 9101152
    num_steps_sampled: 9101152
    num_steps_trained: 9101152
  iterations_since_restore: 426

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1421,62905.2,9101152,1.89935,1.9788,-2,41.6606


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9109144
  custom_metrics: {}
  date: 2021-12-10_06-31-19
  done: false
  episode_len_mean: 39.8743961352657
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.901942030819142
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 173039
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9310854710638523
          entropy_coeff: 0.0
          kl: 0.014319031819468364
          policy_loss: -0.09881761972792447
          total_loss: -0.04794009434408508
          vf_explained_var: 0.771279513835907
          vf_loss: 0.029130497598089278
    num_agent_steps_sampled: 9109144
    num_steps_sampled: 9109144
    num_steps_trained: 9109144
  iterations_since_restore: 427

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1422,62951.3,9109144,1.90194,1.978,-2,39.8744


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9117136
  custom_metrics: {}
  date: 2021-12-10_06-32-05
  done: false
  episode_len_mean: 36.89523809523809
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9083619021234057
  episode_reward_min: -2.0
  episodes_this_iter: 210
  episodes_total: 173249
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9369909316301346
          entropy_coeff: 0.0
          kl: 0.01429804356303066
          policy_loss: -0.10300084087066352
          total_loss: -0.05085950510692783
          vf_explained_var: 0.752118706703186
          vf_loss: 0.0304261845885776
    num_agent_steps_sampled: 9117136
    num_steps_sampled: 9117136
    num_steps_trained: 9117136
  iterations_since_restore: 428

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1423,62996.9,9117136,1.90836,1.9836,-2,36.8952


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9125128
  custom_metrics: {}
  date: 2021-12-10_06-32-51
  done: false
  episode_len_mean: 41.456043956043956
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.8981736243426144
  episode_reward_min: -2.0
  episodes_this_iter: 182
  episodes_total: 173431
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9856310058385134
          entropy_coeff: 0.0
          kl: 0.01408232661196962
          policy_loss: -0.10321021522395313
          total_loss: -0.0515703312921687
          vf_explained_var: 0.8023363351821899
          vf_loss: 0.03025235258974135
    num_agent_steps_sampled: 9125128
    num_steps_sampled: 9125128
    num_steps_trained: 9125128
  iterations_since_restore: 429


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1424,63043.1,9125128,1.89817,1.9768,-2,41.456


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9133120
  custom_metrics: {}
  date: 2021-12-10_06-33-37
  done: false
  episode_len_mean: 36.21491228070175
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8781789436674954
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 173659
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9221638161689043
          entropy_coeff: 0.0
          kl: 0.013132908963598311
          policy_loss: -0.09183062589727342
          total_loss: -0.04218394309282303
          vf_explained_var: 0.7965041399002075
          vf_loss: 0.029701080406084657
    num_agent_steps_sampled: 9133120
    num_steps_sampled: 9133120
    num_steps_trained: 9133120
  iterations_since_restore: 430

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1425,63088.8,9133120,1.87818,1.9784,-2,36.2149


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9141112
  custom_metrics: {}
  date: 2021-12-10_06-34-23
  done: false
  episode_len_mean: 36.14678899082569
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.9281321064047856
  episode_reward_min: 1.4259999990463257
  episodes_this_iter: 218
  episodes_total: 173877
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9358510952442884
          entropy_coeff: 0.0
          kl: 0.014114710240392014
          policy_loss: -0.1005034878035076
          total_loss: -0.049544722296559485
          vf_explained_var: 0.7346504926681519
          vf_loss: 0.029522049706429243
    num_agent_steps_sampled: 9141112
    num_steps_sampled: 9141112
    num_steps_trained: 9141112
  iterations_since_restore: 431

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1426,63134.7,9141112,1.92813,1.9776,1.426,36.1468


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9149104
  custom_metrics: {}
  date: 2021-12-10_06-35-09
  done: false
  episode_len_mean: 36.310204081632655
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9277746945011371
  episode_reward_min: 0.0
  episodes_this_iter: 245
  episodes_total: 174122
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8764912057667971
          entropy_coeff: 0.0
          kl: 0.014392750686965883
          policy_loss: -0.09815805347170681
          total_loss: -0.05120082790381275
          vf_explained_var: 0.6809305548667908
          vf_loss: 0.025098236103076488
    num_agent_steps_sampled: 9149104
    num_steps_sampled: 9149104
    num_steps_trained: 9149104
  iterations_since_restore: 432

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1427,63180.7,9149104,1.92777,1.9772,0,36.3102


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9157096
  custom_metrics: {}
  date: 2021-12-10_06-35-55
  done: false
  episode_len_mean: 31.192622950819672
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9220918001698666
  episode_reward_min: -2.0
  episodes_this_iter: 244
  episodes_total: 174366
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8899863846600056
          entropy_coeff: 0.0
          kl: 0.013856878562364727
          policy_loss: -0.09474243977456354
          total_loss: -0.0481845079921186
          vf_explained_var: 0.7342650890350342
          vf_loss: 0.025512797757983208
    num_agent_steps_sampled: 9157096
    num_steps_sampled: 9157096
    num_steps_trained: 9157096
  iterations_since_restore: 433

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1428,63227,9157096,1.92209,1.9784,-2,31.1926


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9165088
  custom_metrics: {}
  date: 2021-12-10_06-36-41
  done: false
  episode_len_mean: 36.44796380090498
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8925882345950442
  episode_reward_min: -2.0
  episodes_this_iter: 221
  episodes_total: 174587
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9236051924526691
          entropy_coeff: 0.0
          kl: 0.013345601182663813
          policy_loss: -0.09413007751572877
          total_loss: -0.04717624561453704
          vf_explained_var: 0.8309361338615417
          vf_loss: 0.026685199583880603
    num_agent_steps_sampled: 9165088
    num_steps_sampled: 9165088
    num_steps_trained: 9165088
  iterations_since_restore: 434

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1429,63273.2,9165088,1.89259,1.984,-2,36.448


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9173080
  custom_metrics: {}
  date: 2021-12-10_06-37-27
  done: false
  episode_len_mean: 34.266949152542374
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9318525437581338
  episode_reward_min: 1.3408000469207764
  episodes_this_iter: 236
  episodes_total: 174823
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9224465973675251
          entropy_coeff: 0.0
          kl: 0.013834337529260665
          policy_loss: -0.0977960524323862
          total_loss: -0.05110921896994114
          vf_explained_var: 0.7317452430725098
          vf_loss: 0.025675932061858475
    num_agent_steps_sampled: 9173080
    num_steps_sampled: 9173080
    num_steps_trained: 9173080
  iterations_since_restore: 435

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1430,63319,9173080,1.93185,1.984,1.3408,34.2669


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9181072
  custom_metrics: {}
  date: 2021-12-10_06-38-13
  done: false
  episode_len_mean: 35.17937219730942
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9300717510984617
  episode_reward_min: 1.6363999843597412
  episodes_this_iter: 223
  episodes_total: 175046
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9345050752162933
          entropy_coeff: 0.0
          kl: 0.015281860454706475
          policy_loss: -0.10551802310510539
          total_loss: -0.05803439096780494
          vf_explained_var: 0.7298946976661682
          vf_loss: 0.024274305091239512
    num_agent_steps_sampled: 9181072
    num_steps_sampled: 9181072
    num_steps_trained: 9181072
  iterations_since_restore: 436

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1431,63365,9181072,1.93007,1.9808,1.6364,35.1794


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9189064
  custom_metrics: {}
  date: 2021-12-10_06-38-59
  done: false
  episode_len_mean: 33.02620087336245
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9176244548314523
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 175275
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9554367158561945
          entropy_coeff: 0.0
          kl: 0.01398846975644119
          policy_loss: -0.10412540752440691
          total_loss: -0.05600859920377843
          vf_explained_var: 0.7842718958854675
          vf_loss: 0.0268718209117651
    num_agent_steps_sampled: 9189064
    num_steps_sampled: 9189064
    num_steps_trained: 9189064
  iterations_since_restore: 437

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1432,63410.8,9189064,1.91762,1.984,-2,33.0262


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9197056
  custom_metrics: {}
  date: 2021-12-10_06-39-45
  done: false
  episode_len_mean: 36.268085106382976
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8951387240531596
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 175510
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9069140311330557
          entropy_coeff: 0.0
          kl: 0.013554303179262206
          policy_loss: -0.09809156536357477
          total_loss: -0.053830081422347575
          vf_explained_var: 0.8019841313362122
          vf_loss: 0.023675886739511043
    num_agent_steps_sampled: 9197056
    num_steps_sampled: 9197056
    num_steps_trained: 9197056
  iterations_since_restore: 438

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1433,63456.6,9197056,1.89514,1.9816,-2,36.2681


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9205048
  custom_metrics: {}
  date: 2021-12-10_06-40-30
  done: false
  episode_len_mean: 34.10699588477366
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9322041175002425
  episode_reward_min: 1.4731999635696411
  episodes_this_iter: 243
  episodes_total: 175753
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.890777587890625
          entropy_coeff: 0.0
          kl: 0.014339238521642983
          policy_loss: -0.10335171766928397
          total_loss: -0.05953854418476112
          vf_explained_var: 0.6819925308227539
          vf_loss: 0.022035455622244626
    num_agent_steps_sampled: 9205048
    num_steps_sampled: 9205048
    num_steps_trained: 9205048
  iterations_since_restore: 439

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1434,63502.2,9205048,1.9322,1.9816,1.4732,34.107


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9213040
  custom_metrics: {}
  date: 2021-12-10_06-41-16
  done: false
  episode_len_mean: 31.883534136546185
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.9049895594876454
  episode_reward_min: -2.0
  episodes_this_iter: 249
  episodes_total: 176002
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8811868019402027
          entropy_coeff: 0.0
          kl: 0.013188681768951938
          policy_loss: -0.08935705298790708
          total_loss: -0.04618300116271712
          vf_explained_var: 0.7219679355621338
          vf_loss: 0.023143742000684142
    num_agent_steps_sampled: 9213040
    num_steps_sampled: 9213040
    num_steps_trained: 9213040
  iterations_since_restore: 440

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1435,63548.3,9213040,1.90499,1.9776,-2,31.8835


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9221032
  custom_metrics: {}
  date: 2021-12-10_06-42-02
  done: false
  episode_len_mean: 32.497975708502025
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9195255031469862
  episode_reward_min: -2.0
  episodes_this_iter: 247
  episodes_total: 176249
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8928644564002752
          entropy_coeff: 0.0
          kl: 0.013570629438618198
          policy_loss: -0.09052870149025694
          total_loss: -0.04743423085892573
          vf_explained_var: 0.7422919273376465
          vf_loss: 0.022484078275738284
    num_agent_steps_sampled: 9221032
    num_steps_sampled: 9221032
    num_steps_trained: 9221032
  iterations_since_restore: 441

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1436,63594,9221032,1.91953,1.9816,-2,32.498


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9229024
  custom_metrics: {}
  date: 2021-12-10_06-42-48
  done: false
  episode_len_mean: 33.04526748971193
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9342979408585976
  episode_reward_min: 1.704800009727478
  episodes_this_iter: 243
  episodes_total: 176492
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8833778761327267
          entropy_coeff: 0.0
          kl: 0.014450480171944946
          policy_loss: -0.1025001485249959
          total_loss: -0.05772578690084629
          vf_explained_var: 0.647097110748291
          vf_loss: 0.022827693494036794
    num_agent_steps_sampled: 9229024
    num_steps_sampled: 9229024
    num_steps_trained: 9229024
  iterations_since_restore: 442

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1437,63640.1,9229024,1.9343,1.9816,1.7048,33.0453


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9237016
  custom_metrics: {}
  date: 2021-12-10_06-43-34
  done: false
  episode_len_mean: 32.959349593495936
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9185186978278121
  episode_reward_min: -2.0
  episodes_this_iter: 246
  episodes_total: 176738
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8863638453185558
          entropy_coeff: 0.0
          kl: 0.013402037526248023
          policy_loss: -0.09040956728858873
          total_loss: -0.04998976274509914
          vf_explained_var: 0.7547469735145569
          vf_loss: 0.020065461401827633
    num_agent_steps_sampled: 9237016
    num_steps_sampled: 9237016
    num_steps_trained: 9237016
  iterations_since_restore: 44

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1438,63685.8,9237016,1.91852,1.9816,-2,32.9593


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9245008
  custom_metrics: {}
  date: 2021-12-10_06-44-20
  done: false
  episode_len_mean: 29.776470588235295
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.910266666318856
  episode_reward_min: -2.0
  episodes_this_iter: 255
  episodes_total: 176993
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8958989009261131
          entropy_coeff: 0.0
          kl: 0.013696883630473167
          policy_loss: -0.09729431191226467
          total_loss: -0.05188851914135739
          vf_explained_var: 0.750371515750885
          vf_loss: 0.024603649333585054
    num_agent_steps_sampled: 9245008
    num_steps_sampled: 9245008
    num_steps_trained: 9245008
  iterations_since_restore: 444

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1439,63731.7,9245008,1.91027,1.9824,-2,29.7765


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9253000
  custom_metrics: {}
  date: 2021-12-10_06-45-06
  done: false
  episode_len_mean: 33.63111111111111
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9158702241049872
  episode_reward_min: -2.0
  episodes_this_iter: 225
  episodes_total: 177218
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9390161018818617
          entropy_coeff: 0.0
          kl: 0.014688722760183737
          policy_loss: -0.09808855460141785
          total_loss: -0.04718764475546777
          vf_explained_var: 0.7188690900802612
          vf_loss: 0.028592414630111307
    num_agent_steps_sampled: 9253000
    num_steps_sampled: 9253000
    num_steps_trained: 9253000
  iterations_since_restore: 44

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1440,63777.4,9253000,1.91587,1.9844,-2,33.6311


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9260992
  custom_metrics: {}
  date: 2021-12-10_06-45-51
  done: false
  episode_len_mean: 33.77927927927928
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8823153140308622
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 177440
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9536414835602045
          entropy_coeff: 0.0
          kl: 0.013229360483819619
          policy_loss: -0.09399683534866199
          total_loss: -0.04526444562361576
          vf_explained_var: 0.815343976020813
          vf_loss: 0.028640300559345633
    num_agent_steps_sampled: 9260992
    num_steps_sampled: 9260992
    num_steps_trained: 9260992
  iterations_since_restore: 446

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1441,63823,9260992,1.88232,1.9844,-2,33.7793


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9268984
  custom_metrics: {}
  date: 2021-12-10_06-46-37
  done: false
  episode_len_mean: 35.28630705394191
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8665543552256223
  episode_reward_min: -2.0
  episodes_this_iter: 241
  episodes_total: 177681
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9012405350804329
          entropy_coeff: 0.0
          kl: 0.012442453124094754
          policy_loss: -0.09241872164420784
          total_loss: -0.046044174378039315
          vf_explained_var: 0.8374933004379272
          vf_loss: 0.027477574651129544
    num_agent_steps_sampled: 9268984
    num_steps_sampled: 9268984
    num_steps_trained: 9268984
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1442,63868.8,9268984,1.86655,1.9844,-2,35.2863


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9276976
  custom_metrics: {}
  date: 2021-12-10_06-47-23
  done: false
  episode_len_mean: 34.38333333333333
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9316799983382225
  episode_reward_min: 0.0
  episodes_this_iter: 240
  episodes_total: 177921
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9179275762289762
          entropy_coeff: 0.0
          kl: 0.014567831822205335
          policy_loss: -0.1038479873968754
          total_loss: -0.05594773538177833
          vf_explained_var: 0.7750616073608398
          vf_loss: 0.025775353016797453
    num_agent_steps_sampled: 9276976
    num_steps_sampled: 9276976
    num_steps_trained: 9276976
  iterations_since_restore: 448


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1443,63914.6,9276976,1.93168,1.9812,0,34.3833


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9284968
  custom_metrics: {}
  date: 2021-12-10_06-48-09
  done: false
  episode_len_mean: 34.765957446808514
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8816578717941934
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 178156
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9042705930769444
          entropy_coeff: 0.0
          kl: 0.012103530258173123
          policy_loss: -0.090566383296391
          total_loss: -0.040686295775230974
          vf_explained_var: 0.8284117579460144
          vf_loss: 0.031497852818574756
    num_agent_steps_sampled: 9284968
    num_steps_sampled: 9284968
    num_steps_trained: 9284968
  iterations_since_restore: 44

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1444,63960.6,9284968,1.88166,1.9844,-2,34.766


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9292960
  custom_metrics: {}
  date: 2021-12-10_06-48-55
  done: false
  episode_len_mean: 31.887966804979254
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9207485469050427
  episode_reward_min: -2.0
  episodes_this_iter: 241
  episodes_total: 178397
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9008032288402319
          entropy_coeff: 0.0
          kl: 0.014082156616495922
          policy_loss: -0.1041091805382166
          total_loss: -0.05912287779210601
          vf_explained_var: 0.7705192565917969
          vf_loss: 0.023599024687428027
    num_agent_steps_sampled: 9292960
    num_steps_sampled: 9292960
    num_steps_trained: 9292960
  iterations_since_restore: 45

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1445,64006.2,9292960,1.92075,1.9824,-2,31.888


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9300952
  custom_metrics: {}
  date: 2021-12-10_06-49-40
  done: false
  episode_len_mean: 34.26337448559671
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.900118517041697
  episode_reward_min: -2.0
  episodes_this_iter: 243
  episodes_total: 178640
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8885385394096375
          entropy_coeff: 0.0
          kl: 0.013187841948820278
          policy_loss: -0.09922094803187065
          total_loss: -0.051804661299684085
          vf_explained_var: 0.8068041801452637
          vf_loss: 0.02738725277595222
    num_agent_steps_sampled: 9300952
    num_steps_sampled: 9300952
    num_steps_trained: 9300952
  iterations_since_restore: 451


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1446,64051.8,9300952,1.90012,1.9828,-2,34.2634


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9308944
  custom_metrics: {}
  date: 2021-12-10_06-50-26
  done: false
  episode_len_mean: 34.15021459227468
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9321287527616444
  episode_reward_min: 1.4700000286102295
  episodes_this_iter: 233
  episodes_total: 178873
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8973047975450754
          entropy_coeff: 0.0
          kl: 0.014146668690955266
          policy_loss: -0.10301569220609963
          total_loss: -0.05002319569757674
          vf_explained_var: 0.656853437423706
          vf_loss: 0.031507239618804306
    num_agent_steps_sampled: 9308944
    num_steps_sampled: 9308944
    num_steps_trained: 9308944
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1447,64097.4,9308944,1.93213,1.9832,1.47,34.1502


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9316936
  custom_metrics: {}
  date: 2021-12-10_06-51-12
  done: false
  episode_len_mean: 35.92056074766355
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8753177584888778
  episode_reward_min: -2.0
  episodes_this_iter: 214
  episodes_total: 179087
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9333390071988106
          entropy_coeff: 0.0
          kl: 0.013103154604323208
          policy_loss: -0.0933167127368506
          total_loss: -0.04149318078998476
          vf_explained_var: 0.8287016749382019
          vf_loss: 0.031923113972879946
    num_agent_steps_sampled: 9316936
    num_steps_sampled: 9316936
    num_steps_trained: 9316936
  iterations_since_restore: 453


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1448,64143.3,9316936,1.87532,1.9828,-2,35.9206


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9324928
  custom_metrics: {}
  date: 2021-12-10_06-51-57
  done: false
  episode_len_mean: 39.75257731958763
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8239979375268995
  episode_reward_min: -2.0
  episodes_this_iter: 194
  episodes_total: 179281
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9804357308894396
          entropy_coeff: 0.0
          kl: 0.012607149023097008
          policy_loss: -0.09113185106252786
          total_loss: -0.03477998488233425
          vf_explained_var: 0.8722198009490967
          vf_loss: 0.03720475849695504
    num_agent_steps_sampled: 9324928
    num_steps_sampled: 9324928
    num_steps_trained: 9324928
  iterations_since_restore: 454


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1449,64188.5,9324928,1.824,1.9828,-2,39.7526


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9332920
  custom_metrics: {}
  date: 2021-12-10_06-52-43
  done: false
  episode_len_mean: 34.37850467289719
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.8958130829802184
  episode_reward_min: -2.0
  episodes_this_iter: 214
  episodes_total: 179495
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9692616332322359
          entropy_coeff: 0.0
          kl: 0.01294854047591798
          policy_loss: -0.08908609833451919
          total_loss: -0.030974352033808827
          vf_explained_var: 0.8062894344329834
          vf_loss: 0.03844615223351866
    num_agent_steps_sampled: 9332920
    num_steps_sampled: 9332920
    num_steps_trained: 9332920
  iterations_since_restore: 455

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1450,64234.4,9332920,1.89581,1.9768,-2,34.3785


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9340912
  custom_metrics: {}
  date: 2021-12-10_06-53-29
  done: false
  episode_len_mean: 38.17
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.885628000497818
  episode_reward_min: -2.0
  episodes_this_iter: 200
  episodes_total: 179695
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9758800510317087
          entropy_coeff: 0.0
          kl: 0.01350224541965872
          policy_loss: -0.09725390837411396
          total_loss: -0.03952655684406636
          vf_explained_var: 0.8403611183166504
          vf_loss: 0.037220816942863166
    num_agent_steps_sampled: 9340912
    num_steps_sampled: 9340912
    num_steps_trained: 9340912
  iterations_since_restore: 456
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1451,64280.2,9340912,1.88563,1.9768,-2,38.17


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9348904
  custom_metrics: {}
  date: 2021-12-10_06-54-15
  done: false
  episode_len_mean: 33.263392857142854
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9176553594214576
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 179919
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9550968427211046
          entropy_coeff: 0.0
          kl: 0.013614453026093543
          policy_loss: -0.09856211743317544
          total_loss: -0.04792771514621563
          vf_explained_var: 0.865452766418457
          vf_loss: 0.02995745267253369
    num_agent_steps_sampled: 9348904
    num_steps_sampled: 9348904
    num_steps_trained: 9348904
  iterations_since_restore: 457


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1452,64325.9,9348904,1.91766,1.9792,-2,33.2634


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9356896
  custom_metrics: {}
  date: 2021-12-10_06-55-00
  done: false
  episode_len_mean: 41.532467532467535
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9173281359853167
  episode_reward_min: 0.053599998354911804
  episodes_this_iter: 231
  episodes_total: 180150
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9044516794383526
          entropy_coeff: 0.0
          kl: 0.014262626704294235
          policy_loss: -0.1035181934130378
          total_loss: -0.05236556101590395
          vf_explained_var: 0.786102831363678
          vf_loss: 0.029491268447600305
    num_agent_steps_sampled: 9356896
    num_steps_sampled: 9356896
    num_steps_trained: 9356896
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1453,64371.5,9356896,1.91733,1.9816,0.0536,41.5325


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9364888
  custom_metrics: {}
  date: 2021-12-10_06-55-46
  done: false
  episode_len_mean: 36.890829694323145
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8759633219398266
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 180379
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8943744022399187
          entropy_coeff: 0.0
          kl: 0.012833697372116148
          policy_loss: -0.09015054938936373
          total_loss: -0.03162673159386031
          vf_explained_var: 0.7699394822120667
          vf_loss: 0.03903263801475987
    num_agent_steps_sampled: 9364888
    num_steps_sampled: 9364888
    num_steps_trained: 9364888
  iterations_since_restore: 459

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1454,64417.5,9364888,1.87596,1.9792,-2,36.8908


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9372880
  custom_metrics: {}
  date: 2021-12-10_06-56-32
  done: false
  episode_len_mean: 35.767857142857146
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9126892840223653
  episode_reward_min: -2.0
  episodes_this_iter: 224
  episodes_total: 180603
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9108240529894829
          entropy_coeff: 0.0
          kl: 0.01388823613524437
          policy_loss: -0.10048119825660251
          total_loss: -0.05266255303286016
          vf_explained_var: 0.7730344533920288
          vf_loss: 0.02672588877612725
    num_agent_steps_sampled: 9372880
    num_steps_sampled: 9372880
    num_steps_trained: 9372880
  iterations_since_restore: 460


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1455,64463.2,9372880,1.91269,1.9816,-2,35.7679


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9380872
  custom_metrics: {}
  date: 2021-12-10_06-57-18
  done: false
  episode_len_mean: 35.07109004739336
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8772492883329708
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 180814
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9584939908236265
          entropy_coeff: 0.0
          kl: 0.013549065159168094
          policy_loss: -0.09879969817120582
          total_loss: -0.048160260877921246
          vf_explained_var: 0.8472567796707153
          vf_loss: 0.030061794503126293
    num_agent_steps_sampled: 9380872
    num_steps_sampled: 9380872
    num_steps_trained: 9380872
  iterations_since_restore: 46

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1456,64508.9,9380872,1.87725,1.9816,-2,35.0711


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9388864
  custom_metrics: {}
  date: 2021-12-10_06-58-04
  done: false
  episode_len_mean: 35.082191780821915
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8948182656884738
  episode_reward_min: -2.0
  episodes_this_iter: 219
  episodes_total: 181033
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9592212755233049
          entropy_coeff: 0.0
          kl: 0.01399957129615359
          policy_loss: -0.10081737363361754
          total_loss: -0.048362127621658146
          vf_explained_var: 0.8299672603607178
          vf_loss: 0.031193397473543882
    num_agent_steps_sampled: 9388864
    num_steps_sampled: 9388864
    num_steps_trained: 9388864
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1457,64554.9,9388864,1.89482,1.984,-2,35.0822


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9396856
  custom_metrics: {}
  date: 2021-12-10_06-58-49
  done: false
  episode_len_mean: 36.63716814159292
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9270938059924978
  episode_reward_min: 0.8051999807357788
  episodes_this_iter: 226
  episodes_total: 181259
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9476004559546709
          entropy_coeff: 0.0
          kl: 0.014477462536888197
          policy_loss: -0.10452831539441831
          total_loss: -0.05586285176104866
          vf_explained_var: 0.7580201625823975
          vf_loss: 0.026677818095777184
    num_agent_steps_sampled: 9396856
    num_steps_sampled: 9396856
    num_steps_trained: 9396856
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1458,64600.4,9396856,1.92709,1.982,0.8052,36.6372


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9404848
  custom_metrics: {}
  date: 2021-12-10_06-59-35
  done: false
  episode_len_mean: 31.67948717948718
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9209948752680395
  episode_reward_min: -2.0
  episodes_this_iter: 234
  episodes_total: 181493
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9520869571715593
          entropy_coeff: 0.0
          kl: 0.014169082045555115
          policy_loss: -0.09601994213880971
          total_loss: -0.04769540159031749
          vf_explained_var: 0.8230395317077637
          vf_loss: 0.02680524770403281
    num_agent_steps_sampled: 9404848
    num_steps_sampled: 9404848
    num_steps_trained: 9404848
  iterations_since_restore: 464

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1459,64645.9,9404848,1.92099,1.982,-2,31.6795


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9412840
  custom_metrics: {}
  date: 2021-12-10_07-00-20
  done: false
  episode_len_mean: 36.64622641509434
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8548094384231657
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 181705
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9479351229965687
          entropy_coeff: 0.0
          kl: 0.01354170698323287
          policy_loss: -0.09449254139326513
          total_loss: -0.03522861047531478
          vf_explained_var: 0.844710111618042
          vf_loss: 0.03869746584678069
    num_agent_steps_sampled: 9412840
    num_steps_sampled: 9412840
    num_steps_trained: 9412840
  iterations_since_restore: 465

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1460,64691.5,9412840,1.85481,1.984,-2,36.6462


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9420832
  custom_metrics: {}
  date: 2021-12-10_07-01-06
  done: false
  episode_len_mean: 37.522633744855966
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9098288071008376
  episode_reward_min: -2.0
  episodes_this_iter: 243
  episodes_total: 181948
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9274778198450804
          entropy_coeff: 0.0
          kl: 0.014807180181378499
          policy_loss: -0.09906017588218674
          total_loss: -0.04251540785480756
          vf_explained_var: 0.765270471572876
          vf_loss: 0.03405636525712907
    num_agent_steps_sampled: 9420832
    num_steps_sampled: 9420832
    num_steps_trained: 9420832
  iterations_since_restore: 466

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1461,64737.3,9420832,1.90983,1.9848,-2,37.5226


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9428824
  custom_metrics: {}
  date: 2021-12-10_07-01-52
  done: false
  episode_len_mean: 31.923766816143498
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9195838583959057
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 182171
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9498500637710094
          entropy_coeff: 0.0
          kl: 0.014215790579328313
          policy_loss: -0.10260700737126172
          total_loss: -0.05249715375248343
          vf_explained_var: 0.8116856217384338
          vf_loss: 0.028519622108433396
    num_agent_steps_sampled: 9428824
    num_steps_sampled: 9428824
    num_steps_trained: 9428824
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1462,64782.9,9428824,1.91958,1.9848,-2,31.9238


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9436816
  custom_metrics: {}
  date: 2021-12-10_07-02-38
  done: false
  episode_len_mean: 39.44736842105263
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8783649130348574
  episode_reward_min: -2.0
  episodes_this_iter: 228
  episodes_total: 182399
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9337529148906469
          entropy_coeff: 0.0
          kl: 0.012796178867574781
          policy_loss: -0.0915540570858866
          total_loss: -0.040000287641305476
          vf_explained_var: 0.8464518189430237
          vf_loss: 0.03211957192979753
    num_agent_steps_sampled: 9436816
    num_steps_sampled: 9436816
    num_steps_trained: 9436816
  iterations_since_restore: 468

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1463,64828.8,9436816,1.87836,1.9848,-2,39.4474


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9444808
  custom_metrics: {}
  date: 2021-12-10_07-03-24
  done: false
  episode_len_mean: 36.11914893617021
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.912243402004242
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 182634
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9062905926257372
          entropy_coeff: 0.0
          kl: 0.01371554788784124
          policy_loss: -0.09974749191314913
          total_loss: -0.04994255318888463
          vf_explained_var: 0.7741837501525879
          vf_loss: 0.028974453278351575
    num_agent_steps_sampled: 9444808
    num_steps_sampled: 9444808
    num_steps_trained: 9444808
  iterations_since_restore: 469
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1464,64874.6,9444808,1.91224,1.9832,-2,36.1191


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9452800
  custom_metrics: {}
  date: 2021-12-10_07-04-10
  done: false
  episode_len_mean: 33.36595744680851
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.918296169220133
  episode_reward_min: -2.0
  episodes_this_iter: 235
  episodes_total: 182869
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8924427628517151
          entropy_coeff: 0.0
          kl: 0.01365903895930387
          policy_loss: -0.09733478963607922
          total_loss: -0.052276681701187044
          vf_explained_var: 0.7728856205940247
          vf_loss: 0.024313441012054682
    num_agent_steps_sampled: 9452800
    num_steps_sampled: 9452800
    num_steps_trained: 9452800
  iterations_since_restore: 470


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1465,64920.5,9452800,1.9183,1.9816,-2,33.366


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9460792
  custom_metrics: {}
  date: 2021-12-10_07-04-55
  done: false
  episode_len_mean: 34.01851851851852
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.879994440961767
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 183085
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9232973363250494
          entropy_coeff: 0.0
          kl: 0.012954889476532117
          policy_loss: -0.09313858300447464
          total_loss: -0.04629291972378269
          vf_explained_var: 0.8435667157173157
          vf_loss: 0.027170428598765284
    num_agent_steps_sampled: 9460792
    num_steps_sampled: 9460792
    num_steps_trained: 9460792
  iterations_since_restore: 471

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1466,64966.2,9460792,1.87999,1.98,-2,34.0185


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9468784
  custom_metrics: {}
  date: 2021-12-10_07-05-41
  done: false
  episode_len_mean: 31.93625498007968
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.921241434446844
  episode_reward_min: -2.0
  episodes_this_iter: 251
  episodes_total: 183336
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8865011092275381
          entropy_coeff: 0.0
          kl: 0.014024155301740393
          policy_loss: -0.10062518969061784
          total_loss: -0.05457225043210201
          vf_explained_var: 0.8170384764671326
          vf_loss: 0.024753753503318876
    num_agent_steps_sampled: 9468784
    num_steps_sampled: 9468784
    num_steps_trained: 9468784
  iterations_since_restore: 472

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1467,65012.1,9468784,1.92124,1.98,-2,31.9363


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9476776
  custom_metrics: {}
  date: 2021-12-10_07-06-27
  done: false
  episode_len_mean: 34.764462809917354
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8515537226003063
  episode_reward_min: -2.0
  episodes_this_iter: 242
  episodes_total: 183578
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8888597190380096
          entropy_coeff: 0.0
          kl: 0.0128647439123597
          policy_loss: -0.08904014099971391
          total_loss: -0.04012901338865049
          vf_explained_var: 0.8648797273635864
          vf_loss: 0.02937280072364956
    num_agent_steps_sampled: 9476776
    num_steps_sampled: 9476776
    num_steps_trained: 9476776
  iterations_since_restore: 473
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1468,65058,9476776,1.85155,1.9796,-2,34.7645


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9484768
  custom_metrics: {}
  date: 2021-12-10_07-07-13
  done: false
  episode_len_mean: 34.84033613445378
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8979478999346244
  episode_reward_min: -2.0
  episodes_this_iter: 238
  episodes_total: 183816
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.881188377737999
          entropy_coeff: 0.0
          kl: 0.013468299293890595
          policy_loss: -0.09500583095359616
          total_loss: -0.04332135364529677
          vf_explained_var: 0.7545369863510132
          vf_loss: 0.03122949757380411
    num_agent_steps_sampled: 9484768
    num_steps_sampled: 9484768
    num_steps_trained: 9484768
  iterations_since_restore: 474
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1469,65103.7,9484768,1.89795,1.9832,-2,34.8403


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9492760
  custom_metrics: {}
  date: 2021-12-10_07-07-59
  done: false
  episode_len_mean: 31.0859375
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9080671849660575
  episode_reward_min: -2.0
  episodes_this_iter: 256
  episodes_total: 184072
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8309895657002926
          entropy_coeff: 0.0
          kl: 0.012514443253166974
          policy_loss: -0.08774933498352766
          total_loss: -0.03567233615467558
          vf_explained_var: 0.7575958967208862
          vf_loss: 0.03307068528374657
    num_agent_steps_sampled: 9492760
    num_steps_sampled: 9492760
    num_steps_trained: 9492760
  iterations_since_restore: 475

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1470,65149.4,9492760,1.90807,1.9832,-2,31.0859


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9500752
  custom_metrics: {}
  date: 2021-12-10_07-08-45
  done: false
  episode_len_mean: 30.779026217228463
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9097797785805406
  episode_reward_min: -2.0
  episodes_this_iter: 267
  episodes_total: 184339
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8567366320639849
          entropy_coeff: 0.0
          kl: 0.013362072117161006
          policy_loss: -0.0889403238252271
          total_loss: -0.03622439276659861
          vf_explained_var: 0.7472245693206787
          vf_loss: 0.03242228820454329
    num_agent_steps_sampled: 9500752
    num_steps_sampled: 9500752
    num_steps_trained: 9500752
  iterations_since_restore: 476


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1471,65195.5,9500752,1.90978,1.9832,-2,30.779


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9508744
  custom_metrics: {}
  date: 2021-12-10_07-09-31
  done: false
  episode_len_mean: 30.058365758754864
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9251377452672223
  episode_reward_min: -2.0
  episodes_this_iter: 257
  episodes_total: 184596
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8806407433003187
          entropy_coeff: 0.0
          kl: 0.014140186714939773
          policy_loss: -0.10248860434512608
          total_loss: -0.05531735118711367
          vf_explained_var: 0.7802547216415405
          vf_loss: 0.025695846474263817
    num_agent_steps_sampled: 9508744
    num_steps_sampled: 9508744
    num_steps_trained: 9508744
  iterations_since_restore: 477

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1472,65241.1,9508744,1.92514,1.9832,-2,30.0584


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9516736
  custom_metrics: {}
  date: 2021-12-10_07-10-17
  done: false
  episode_len_mean: 30.508196721311474
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.859613114204563
  episode_reward_min: -2.0
  episodes_this_iter: 244
  episodes_total: 184840
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9394635148346424
          entropy_coeff: 0.0
          kl: 0.011975225992500782
          policy_loss: -0.08431897478294559
          total_loss: -0.035962737165391445
          vf_explained_var: 0.8268740177154541
          vf_loss: 0.03016886324621737
    num_agent_steps_sampled: 9516736
    num_steps_sampled: 9516736
    num_steps_trained: 9516736
  iterations_since_restore: 478

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1473,65287.3,9516736,1.85961,1.9828,-2,30.5082


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9524728
  custom_metrics: {}
  date: 2021-12-10_07-11-03
  done: false
  episode_len_mean: 33.29004329004329
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9167099577007871
  episode_reward_min: -2.0
  episodes_this_iter: 231
  episodes_total: 185071
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9369148071855307
          entropy_coeff: 0.0
          kl: 0.012952750315889716
          policy_loss: -0.09424463112372905
          total_loss: -0.04407502787944395
          vf_explained_var: 0.7812913060188293
          vf_loss: 0.03049761103466153
    num_agent_steps_sampled: 9524728
    num_steps_sampled: 9524728
    num_steps_trained: 9524728
  iterations_since_restore: 479


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1474,65333.6,9524728,1.91671,1.9832,-2,33.29


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9532720
  custom_metrics: {}
  date: 2021-12-10_07-11-49
  done: false
  episode_len_mean: 32.9051724137931
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9346620702537998
  episode_reward_min: 1.0435999631881714
  episodes_this_iter: 232
  episodes_total: 185303
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9516801778227091
          entropy_coeff: 0.0
          kl: 0.014249237428884953
          policy_loss: -0.11038109647051897
          total_loss: -0.06355845887446776
          vf_explained_var: 0.8020405769348145
          vf_loss: 0.025181609351420775
    num_agent_steps_sampled: 9532720
    num_steps_sampled: 9532720
    num_steps_trained: 9532720
  iterations_since_restore: 480

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1475,65379.4,9532720,1.93466,1.9832,1.0436,32.9052


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9540712
  custom_metrics: {}
  date: 2021-12-10_07-12-35
  done: false
  episode_len_mean: 39.77272727272727
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8852927300063047
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 185523
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9149680994451046
          entropy_coeff: 0.0
          kl: 0.01383146722218953
          policy_loss: -0.09387283120304346
          total_loss: -0.04182637500343844
          vf_explained_var: 0.756081759929657
          vf_loss: 0.031039918190799654
    num_agent_steps_sampled: 9540712
    num_steps_sampled: 9540712
    num_steps_trained: 9540712
  iterations_since_restore: 481
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1476,65425.3,9540712,1.88529,1.9828,-2,39.7727


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9548704
  custom_metrics: {}
  date: 2021-12-10_07-13-20
  done: false
  episode_len_mean: 34.51851851851852
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.931353083853859
  episode_reward_min: 0.8320000171661377
  episodes_this_iter: 243
  episodes_total: 185766
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8606545682996511
          entropy_coeff: 0.0
          kl: 0.014541834505507722
          policy_loss: -0.10369871708098799
          total_loss: -0.055299546496826224
          vf_explained_var: 0.6494994163513184
          vf_loss: 0.026313762355130166
    num_agent_steps_sampled: 9548704
    num_steps_sampled: 9548704
    num_steps_trained: 9548704
  iterations_since_restore: 482

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1477,65470.9,9548704,1.93135,1.9828,0.832,34.5185


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9556696
  custom_metrics: {}
  date: 2021-12-10_07-14-07
  done: false
  episode_len_mean: 31.347457627118644
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9214152527057518
  episode_reward_min: -2.0
  episodes_this_iter: 236
  episodes_total: 186002
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9232018478214741
          entropy_coeff: 0.0
          kl: 0.01395714475074783
          policy_loss: -0.10223318517091684
          total_loss: -0.06063224462559447
          vf_explained_var: 0.8205901980400085
          vf_loss: 0.020403525966685265
    num_agent_steps_sampled: 9556696
    num_steps_sampled: 9556696
    num_steps_trained: 9556696
  iterations_since_restore: 483

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1478,65517.1,9556696,1.92142,1.9828,-2,31.3475


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9564688
  custom_metrics: {}
  date: 2021-12-10_07-14-52
  done: false
  episode_len_mean: 35.572072072072075
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9292000000541274
  episode_reward_min: 1.1643999814987183
  episodes_this_iter: 222
  episodes_total: 186224
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9716715663671494
          entropy_coeff: 0.0
          kl: 0.014075977873289958
          policy_loss: -0.11170089349616319
          total_loss: -0.07004426483763382
          vf_explained_var: 0.8012298345565796
          vf_loss: 0.02027873817132786
    num_agent_steps_sampled: 9564688
    num_steps_sampled: 9564688
    num_steps_trained: 9564688
  iterations_since_restore: 484

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1479,65562.9,9564688,1.9292,1.98,1.1644,35.5721


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9572680
  custom_metrics: {}
  date: 2021-12-10_07-15-38
  done: false
  episode_len_mean: 35.788990825688074
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9107211007984406
  episode_reward_min: -2.0
  episodes_this_iter: 218
  episodes_total: 186442
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.931742899119854
          entropy_coeff: 0.0
          kl: 0.013618883182061836
          policy_loss: -0.08950408396776766
          total_loss: -0.0351033580082003
          vf_explained_var: 0.7144651412963867
          vf_loss: 0.0337170529528521
    num_agent_steps_sampled: 9572680
    num_steps_sampled: 9572680
    num_steps_trained: 9572680
  iterations_since_restore: 485
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1480,65608.8,9572680,1.91072,1.9784,-2,35.789


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9580672
  custom_metrics: {}
  date: 2021-12-10_07-16-24
  done: false
  episode_len_mean: 36.06072874493927
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.85598218634061
  episode_reward_min: -2.0
  episodes_this_iter: 247
  episodes_total: 186689
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8632004801183939
          entropy_coeff: 0.0
          kl: 0.01269314149976708
          policy_loss: -0.08977764868177474
          total_loss: -0.036389367014635354
          vf_explained_var: 0.7672432661056519
          vf_loss: 0.03411057370249182
    num_agent_steps_sampled: 9580672
    num_steps_sampled: 9580672
    num_steps_trained: 9580672
  iterations_since_restore: 486
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1481,65654.5,9580672,1.85598,1.9808,-2,36.0607


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9588664
  custom_metrics: {}
  date: 2021-12-10_07-17-10
  done: false
  episode_len_mean: 33.26068376068376
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.917376069431631
  episode_reward_min: -2.0
  episodes_this_iter: 234
  episodes_total: 186923
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9039098918437958
          entropy_coeff: 0.0
          kl: 0.014098681509494781
          policy_loss: -0.10430820676265284
          total_loss: -0.051013814663747326
          vf_explained_var: 0.7743576765060425
          vf_loss: 0.031882019713521004
    num_agent_steps_sampled: 9588664
    num_steps_sampled: 9588664
    num_steps_trained: 9588664
  iterations_since_restore: 487

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1482,65700.3,9588664,1.91738,1.9792,-2,33.2607


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9596656
  custom_metrics: {}
  date: 2021-12-10_07-17-56
  done: false
  episode_len_mean: 35.60829493087557
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9291188920148508
  episode_reward_min: 1.4603999853134155
  episodes_this_iter: 217
  episodes_total: 187140
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9138210918754339
          entropy_coeff: 0.0
          kl: 0.01431671270984225
          policy_loss: -0.10108136384224053
          total_loss: -0.04799790165270679
          vf_explained_var: 0.7818421721458435
          vf_loss: 0.031339956214651465
    num_agent_steps_sampled: 9596656
    num_steps_sampled: 9596656
    num_steps_trained: 9596656
  iterations_since_restore: 488

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1483,65745.9,9596656,1.92912,1.9808,1.4604,35.6083


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9604648
  custom_metrics: {}
  date: 2021-12-10_07-18-42
  done: false
  episode_len_mean: 35.2008547008547
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.913215386052417
  episode_reward_min: -2.0
  episodes_this_iter: 234
  episodes_total: 187374
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.883716706186533
          entropy_coeff: 0.0
          kl: 0.014858482230920345
          policy_loss: -0.09874055354157463
          total_loss: -0.04909863666398451
          vf_explained_var: 0.6762570142745972
          vf_loss: 0.027075599238742143
    num_agent_steps_sampled: 9604648
    num_steps_sampled: 9604648
    num_steps_trained: 9604648
  iterations_since_restore: 489
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1484,65791.8,9604648,1.91322,1.9784,-2,35.2009


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9612640
  custom_metrics: {}
  date: 2021-12-10_07-19-27
  done: false
  episode_len_mean: 34.689189189189186
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9133729687682144
  episode_reward_min: -2.0
  episodes_this_iter: 222
  episodes_total: 187596
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9043952394276857
          entropy_coeff: 0.0
          kl: 0.014010517159476876
          policy_loss: -0.10108851664699614
          total_loss: -0.05243007227545604
          vf_explained_var: 0.7438513040542603
          vf_loss: 0.027379971521440893
    num_agent_steps_sampled: 9612640
    num_steps_sampled: 9612640
    num_steps_trained: 9612640
  iterations_since_restore: 490

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1485,65837.5,9612640,1.91337,1.978,-2,34.6892


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9620632
  custom_metrics: {}
  date: 2021-12-10_07-20-13
  done: false
  episode_len_mean: 35.99074074074074
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9107796262811731
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 187812
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9337663743644953
          entropy_coeff: 0.0
          kl: 0.014500857068924233
          policy_loss: -0.09518854029010981
          total_loss: -0.048969083407428116
          vf_explained_var: 0.784699559211731
          vf_loss: 0.02419627789640799
    num_agent_steps_sampled: 9620632
    num_steps_sampled: 9620632
    num_steps_trained: 9620632
  iterations_since_restore: 491

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1486,65883.3,9620632,1.91078,1.9824,-2,35.9907


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9628624
  custom_metrics: {}
  date: 2021-12-10_07-20-59
  done: false
  episode_len_mean: 36.50230414746544
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8909917026071505
  episode_reward_min: -2.0
  episodes_this_iter: 217
  episodes_total: 188029
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9146504495292902
          entropy_coeff: 0.0
          kl: 0.013002695923205465
          policy_loss: -0.09504402833408676
          total_loss: -0.046421563281910494
          vf_explained_var: 0.7673949003219604
          vf_loss: 0.028874621784780174
    num_agent_steps_sampled: 9628624
    num_steps_sampled: 9628624
    num_steps_trained: 9628624
  iterations_since_restore: 492

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1487,65929.1,9628624,1.89099,1.9776,-2,36.5023


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9636616
  custom_metrics: {}
  date: 2021-12-10_07-21-45
  done: false
  episode_len_mean: 35.1822429906542
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.930007476672948
  episode_reward_min: 1.3375999927520752
  episodes_this_iter: 214
  episodes_total: 188243
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9317451436072588
          entropy_coeff: 0.0
          kl: 0.014994051860412583
          policy_loss: -0.10129020240856335
          total_loss: -0.05021849981858395
          vf_explained_var: 0.7302252054214478
          vf_loss: 0.028299487195909023
    num_agent_steps_sampled: 9636616
    num_steps_sampled: 9636616
    num_steps_trained: 9636616
  iterations_since_restore: 493

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1488,65974.8,9636616,1.93001,1.9792,1.3376,35.1822


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9644608
  custom_metrics: {}
  date: 2021-12-10_07-22-31
  done: false
  episode_len_mean: 38.403669724770644
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9235963318326057
  episode_reward_min: 0.0
  episodes_this_iter: 218
  episodes_total: 188461
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9225799106061459
          entropy_coeff: 0.0
          kl: 0.014608819998102263
          policy_loss: -0.11096593602269422
          total_loss: -0.06365308992099017
          vf_explained_var: 0.7384436726570129
          vf_loss: 0.025125702319201082
    num_agent_steps_sampled: 9644608
    num_steps_sampled: 9644608
    num_steps_trained: 9644608
  iterations_since_restore: 494

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1489,66020.7,9644608,1.9236,1.9824,0,38.4037


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9652600
  custom_metrics: {}
  date: 2021-12-10_07-23-16
  done: false
  episode_len_mean: 35.57085020242915
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9292566834191078
  episode_reward_min: 0.5748000144958496
  episodes_this_iter: 247
  episodes_total: 188708
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9059284199029207
          entropy_coeff: 0.0
          kl: 0.014618112880270928
          policy_loss: -0.10220354620105354
          total_loss: -0.05635772715322673
          vf_explained_var: 0.7173081040382385
          vf_loss: 0.023644561413675547
    num_agent_steps_sampled: 9652600
    num_steps_sampled: 9652600
    num_steps_trained: 9652600
  iterations_since_restore: 495

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1490,66066.5,9652600,1.92926,1.9816,0.5748,35.5709


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9660592
  custom_metrics: {}
  date: 2021-12-10_07-24-03
  done: false
  episode_len_mean: 35.00431034482759
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.883612071645671
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 188940
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9056081231683493
          entropy_coeff: 0.0
          kl: 0.011841407424071804
          policy_loss: -0.08738615177571774
          total_loss: -0.04144648298097309
          vf_explained_var: 0.8038837909698486
          vf_loss: 0.02795553271425888
    num_agent_steps_sampled: 9660592
    num_steps_sampled: 9660592
    num_steps_trained: 9660592
  iterations_since_restore: 496


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1491,66112.8,9660592,1.88361,1.98,-2,35.0043


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9668584
  custom_metrics: {}
  date: 2021-12-10_07-24-48
  done: false
  episode_len_mean: 32.96186440677966
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9031355956853446
  episode_reward_min: -2.0
  episodes_this_iter: 236
  episodes_total: 189176
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8660619892179966
          entropy_coeff: 0.0
          kl: 0.014200418896507472
          policy_loss: -0.09518175010452978
          total_loss: -0.04566357791190967
          vf_explained_var: 0.8024711608886719
          vf_loss: 0.02795128815341741
    num_agent_steps_sampled: 9668584
    num_steps_sampled: 9668584
    num_steps_trained: 9668584
  iterations_since_restore: 497

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1492,66158.3,9668584,1.90314,1.9788,-2,32.9619


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9676576
  custom_metrics: {}
  date: 2021-12-10_07-25-34
  done: false
  episode_len_mean: 30.702898550724637
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8826811564141426
  episode_reward_min: -2.0
  episodes_this_iter: 276
  episodes_total: 189452
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8401220254600048
          entropy_coeff: 0.0
          kl: 0.012587888515554368
          policy_loss: -0.08235994848655537
          total_loss: -0.03523189545376226
          vf_explained_var: 0.8054903745651245
          vf_loss: 0.028010200534481555
    num_agent_steps_sampled: 9676576
    num_steps_sampled: 9676576
    num_steps_trained: 9676576
  iterations_since_restore: 49

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1493,66204.4,9676576,1.88268,1.9832,-2,30.7029


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9684568
  custom_metrics: {}
  date: 2021-12-10_07-26-20
  done: false
  episode_len_mean: 29.75619834710744
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.909259500582356
  episode_reward_min: -2.0
  episodes_this_iter: 242
  episodes_total: 189694
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9016928151249886
          entropy_coeff: 0.0
          kl: 0.013800411601550877
          policy_loss: -0.09837240068009123
          total_loss: -0.04615690105129033
          vf_explained_var: 0.838449239730835
          vf_loss: 0.03125612501753494
    num_agent_steps_sampled: 9684568
    num_steps_sampled: 9684568
    num_steps_trained: 9684568
  iterations_since_restore: 499
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1494,66250.2,9684568,1.90926,1.9796,-2,29.7562


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9692560
  custom_metrics: {}
  date: 2021-12-10_07-27-06
  done: false
  episode_len_mean: 35.0
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9303532446617688
  episode_reward_min: 1.2444000244140625
  episodes_this_iter: 231
  episodes_total: 189925
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9121409188956022
          entropy_coeff: 0.0
          kl: 0.01443752262275666
          policy_loss: -0.10168735159095377
          total_loss: -0.05366649554343894
          vf_explained_var: 0.7689955830574036
          vf_loss: 0.026093867782037705
    num_agent_steps_sampled: 9692560
    num_steps_sampled: 9692560
    num_steps_trained: 9692560
  iterations_since_restore: 500

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1495,66295.9,9692560,1.93035,1.9796,1.2444,35


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9700552
  custom_metrics: {}
  date: 2021-12-10_07-27-52
  done: false
  episode_len_mean: 36.48979591836735
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8690693877181228
  episode_reward_min: -2.0
  episodes_this_iter: 196
  episodes_total: 190121
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9865461010485888
          entropy_coeff: 0.0
          kl: 0.013492291967850178
          policy_loss: -0.09437367847567657
          total_loss: -0.04441319928446319
          vf_explained_var: 0.8514099717140198
          vf_loss: 0.029469062166754156
    num_agent_steps_sampled: 9700552
    num_steps_sampled: 9700552
    num_steps_trained: 9700552
  iterations_since_restore: 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1496,66342,9700552,1.86907,1.9808,-2,36.4898


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9708544
  custom_metrics: {}
  date: 2021-12-10_07-28-38
  done: false
  episode_len_mean: 37.93396226415094
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8568320757937882
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 190333
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9672074895352125
          entropy_coeff: 0.0
          kl: 0.011808896961156279
          policy_loss: -0.0924667893559672
          total_loss: -0.0321496124524856
          vf_explained_var: 0.8308831453323364
          vf_loss: 0.04238241509301588
    num_agent_steps_sampled: 9708544
    num_steps_sampled: 9708544
    num_steps_trained: 9708544
  iterations_since_restore: 502
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1497,66387.7,9708544,1.85683,1.9792,-2,37.934


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9716536
  custom_metrics: {}
  date: 2021-12-10_07-29-24
  done: false
  episode_len_mean: 38.791666666666664
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.909283331422894
  episode_reward_min: -2.0
  episodes_this_iter: 216
  episodes_total: 190549
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9296750742942095
          entropy_coeff: 0.0
          kl: 0.013567790942033753
          policy_loss: -0.10369014996103942
          total_loss: -0.053059257654240355
          vf_explained_var: 0.8104320764541626
          vf_loss: 0.03002481209114194
    num_agent_steps_sampled: 9716536
    num_steps_sampled: 9716536
    num_steps_trained: 9716536
  iterations_since_restore: 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1498,66434,9716536,1.90928,1.9804,-2,38.7917


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9724528
  custom_metrics: {}
  date: 2021-12-10_07-30-10
  done: false
  episode_len_mean: 37.02575107296137
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8780000017947904
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 190782
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8932502754032612
          entropy_coeff: 0.0
          kl: 0.01291719128494151
          policy_loss: -0.09480853704735637
          total_loss: -0.03835659252945334
          vf_explained_var: 0.787116289138794
          vf_loss: 0.036833960097283125
    num_agent_steps_sampled: 9724528
    num_steps_sampled: 9724528
    num_steps_trained: 9724528
  iterations_since_restore: 504
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1499,66480.1,9724528,1.878,1.9792,-2,37.0258


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9732520
  custom_metrics: {}
  date: 2021-12-10_07-30-57
  done: false
  episode_len_mean: 33.51271186440678
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.917164407544217
  episode_reward_min: -2.0
  episodes_this_iter: 236
  episodes_total: 191018
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9010568670928478
          entropy_coeff: 0.0
          kl: 0.014677966042654589
          policy_loss: -0.10028030557441525
          total_loss: -0.04997973625722807
          vf_explained_var: 0.7610845565795898
          vf_loss: 0.028008413966745138
    num_agent_steps_sampled: 9732520
    num_steps_sampled: 9732520
    num_steps_trained: 9732520
  iterations_since_restore: 505

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1500,66526.7,9732520,1.91716,1.9812,-2,33.5127


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9740512
  custom_metrics: {}
  date: 2021-12-10_07-31-43
  done: false
  episode_len_mean: 30.619433198380566
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9391352242303763
  episode_reward_min: 1.5276000499725342
  episodes_this_iter: 247
  episodes_total: 191265
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8981535751372576
          entropy_coeff: 0.0
          kl: 0.014358715270645916
          policy_loss: -0.10168875285307877
          total_loss: -0.05541694164276123
          vf_explained_var: 0.766689658164978
          vf_loss: 0.024464510672260076
    num_agent_steps_sampled: 9740512
    num_steps_sampled: 9740512
    num_steps_trained: 9740512
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1501,66572.6,9740512,1.93914,1.978,1.5276,30.6194


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9748504
  custom_metrics: {}
  date: 2021-12-10_07-32-29
  done: false
  episode_len_mean: 35.50438596491228
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9293719281752904
  episode_reward_min: 0.7052000164985657
  episodes_this_iter: 228
  episodes_total: 191493
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9261191915720701
          entropy_coeff: 0.0
          kl: 0.014017320761922747
          policy_loss: -0.10521975916344672
          total_loss: -0.06085088138934225
          vf_explained_var: 0.7966065406799316
          vf_loss: 0.023080071492586285
    num_agent_steps_sampled: 9748504
    num_steps_sampled: 9748504
    num_steps_trained: 9748504
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1502,66619,9748504,1.92937,1.9808,0.7052,35.5044


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9756496
  custom_metrics: {}
  date: 2021-12-10_07-33-15
  done: false
  episode_len_mean: 35.98706896551724
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9284620649855713
  episode_reward_min: 1.0715999603271484
  episodes_this_iter: 232
  episodes_total: 191725
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.87249412573874
          entropy_coeff: 0.0
          kl: 0.014478423516266048
          policy_loss: -0.10272911863285117
          total_loss: -0.05738610523985699
          vf_explained_var: 0.7169344425201416
          vf_loss: 0.023353907628916204
    num_agent_steps_sampled: 9756496
    num_steps_sampled: 9756496
    num_steps_trained: 9756496
  iterations_since

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1503,66664.7,9756496,1.92846,1.978,1.0716,35.9871


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9764488
  custom_metrics: {}
  date: 2021-12-10_07-34-01
  done: false
  episode_len_mean: 32.288065843621396
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9196263360388486
  episode_reward_min: -2.0
  episodes_this_iter: 243
  episodes_total: 191968
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8613338712602854
          entropy_coeff: 0.0
          kl: 0.013448781537590548
          policy_loss: -0.09382452036516042
          total_loss: -0.04808418411994353
          vf_explained_var: 0.7319008111953735
          vf_loss: 0.025315003527794033
    num_agent_steps_sampled: 9764488
    num_steps_sampled: 9764488
    num_steps_trained: 9764488
  iterations_since_restore: 50

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1504,66710.7,9764488,1.91963,1.9828,-2,32.2881


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9772480
  custom_metrics: {}
  date: 2021-12-10_07-34-47
  done: false
  episode_len_mean: 36.24568965517241
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.8777586196003289
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 192200
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8822673968970776
          entropy_coeff: 0.0
          kl: 0.012642468442209065
          policy_loss: -0.08642768464051187
          total_loss: -0.03754034178564325
          vf_explained_var: 0.8007889986038208
          vf_loss: 0.029686594614759088
    num_agent_steps_sampled: 9772480
    num_steps_sampled: 9772480
    num_steps_trained: 9772480
  iterations_since_restore: 51

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1505,66756.6,9772480,1.87776,1.9772,-2,36.2457


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9780472
  custom_metrics: {}
  date: 2021-12-10_07-35-33
  done: false
  episode_len_mean: 30.033457249070633
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9125680311904965
  episode_reward_min: -2.0
  episodes_this_iter: 269
  episodes_total: 192469
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8508705627173185
          entropy_coeff: 0.0
          kl: 0.01243724298547022
          policy_loss: -0.09042563469847664
          total_loss: -0.04865859140409157
          vf_explained_var: 0.8015472888946533
          vf_loss: 0.022877978131873533
    num_agent_steps_sampled: 9780472
    num_steps_sampled: 9780472
    num_steps_trained: 9780472
  iterations_since_restore: 51

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1506,66802.7,9780472,1.91257,1.98,-2,30.0335


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9788464
  custom_metrics: {}
  date: 2021-12-10_07-36-19
  done: false
  episode_len_mean: 32.17479674796748
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.936146342657446
  episode_reward_min: 1.6407999992370605
  episodes_this_iter: 246
  episodes_total: 192715
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.836817629635334
          entropy_coeff: 0.0
          kl: 0.01408397193881683
          policy_loss: -0.1003541532845702
          total_loss: -0.05708700050308835
          vf_explained_var: 0.638558030128479
          vf_loss: 0.02187712350860238
    num_agent_steps_sampled: 9788464
    num_steps_sampled: 9788464
    num_steps_trained: 9788464
  iterations_since_res

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1507,66848.6,9788464,1.93615,1.984,1.6408,32.1748


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9796456
  custom_metrics: {}
  date: 2021-12-10_07-37-05
  done: false
  episode_len_mean: 34.23605150214592
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8820412015710266
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 192948
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.873780420050025
          entropy_coeff: 0.0
          kl: 0.013331896916497499
          policy_loss: -0.08533235797949601
          total_loss: -0.043329259497113526
          vf_explained_var: 0.8495920300483704
          vf_loss: 0.02175528122461401
    num_agent_steps_sampled: 9796456
    num_steps_sampled: 9796456
    num_steps_trained: 9796456
  iterations_since_restore: 513

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1508,66894.7,9796456,1.88204,1.984,-2,34.2361


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9804448
  custom_metrics: {}
  date: 2021-12-10_07-37-51
  done: false
  episode_len_mean: 33.166666666666664
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9011333345348
  episode_reward_min: -2.0
  episodes_this_iter: 234
  episodes_total: 193182
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9069079048931599
          entropy_coeff: 0.0
          kl: 0.01265544889611192
          policy_loss: -0.09394684559083544
          total_loss: -0.05098969195387326
          vf_explained_var: 0.8479334115982056
          vf_loss: 0.02373669226653874
    num_agent_steps_sampled: 9804448
    num_steps_sampled: 9804448
    num_steps_trained: 9804448
  iterations_since_restore: 514

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1509,66940.3,9804448,1.90113,1.9832,-2,33.1667


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9812440
  custom_metrics: {}
  date: 2021-12-10_07-38-37
  done: false
  episode_len_mean: 33.1265306122449
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9024653055229965
  episode_reward_min: -2.0
  episodes_this_iter: 245
  episodes_total: 193427
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8939268812537193
          entropy_coeff: 0.0
          kl: 0.013258122227853164
          policy_loss: -0.09151602920610458
          total_loss: -0.047551612777169794
          vf_explained_var: 0.7804535627365112
          vf_loss: 0.023828646284528077
    num_agent_steps_sampled: 9812440
    num_steps_sampled: 9812440
    num_steps_trained: 9812440
  iterations_since_restore: 515

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1510,66986.3,9812440,1.90247,1.9828,-2,33.1265


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9820432
  custom_metrics: {}
  date: 2021-12-10_07-39-23
  done: false
  episode_len_mean: 34.180616740088105
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.915087226729036
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 193654
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8917872458696365
          entropy_coeff: 0.0
          kl: 0.012511435255873948
          policy_loss: -0.08864305622410029
          total_loss: -0.0437878806842491
          vf_explained_var: 0.8223066329956055
          vf_loss: 0.025853434635791928
    num_agent_steps_sampled: 9820432
    num_steps_sampled: 9820432
    num_steps_trained: 9820432
  iterations_since_restore: 516

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1511,67032.3,9820432,1.91509,1.98,-2,34.1806


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9828424
  custom_metrics: {}
  date: 2021-12-10_07-40-09
  done: false
  episode_len_mean: 34.39090909090909
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.91411818374287
  episode_reward_min: -2.0
  episodes_this_iter: 220
  episodes_total: 193874
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.905014144256711
          entropy_coeff: 0.0
          kl: 0.014480247686151415
          policy_loss: -0.09901793678000104
          total_loss: -0.0507468193245586
          vf_explained_var: 0.768714189529419
          vf_loss: 0.026279243524186313
    num_agent_steps_sampled: 9828424
    num_steps_sampled: 9828424
    num_steps_trained: 9828424
  iterations_since_restore: 517

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1512,67078.1,9828424,1.91412,1.9792,-2,34.3909


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9836416
  custom_metrics: {}
  date: 2021-12-10_07-40-55
  done: false
  episode_len_mean: 35.549107142857146
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9293071451996053
  episode_reward_min: 0.8023999929428101
  episodes_this_iter: 224
  episodes_total: 194098
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9209816046059132
          entropy_coeff: 0.0
          kl: 0.014327672484796494
          policy_loss: -0.09869552741292864
          total_loss: -0.050615151441888884
          vf_explained_var: 0.7839011549949646
          vf_loss: 0.026320225151721388
    num_agent_steps_sampled: 9836416
    num_steps_sampled: 9836416
    num_steps_trained: 9836416
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1513,67124,9836416,1.92931,1.9792,0.8024,35.5491


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9844408
  custom_metrics: {}
  date: 2021-12-10_07-41-40
  done: false
  episode_len_mean: 35.46696035242291
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.912417618713715
  episode_reward_min: -2.0
  episodes_this_iter: 227
  episodes_total: 194325
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9116350915282965
          entropy_coeff: 0.0
          kl: 0.013319277059054002
          policy_loss: -0.09398287133080885
          total_loss: -0.03931461462343577
          vf_explained_var: 0.7512372732162476
          vf_loss: 0.0344396042637527
    num_agent_steps_sampled: 9844408
    num_steps_sampled: 9844408
    num_steps_trained: 9844408
  iterations_since_restore: 519
  

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1514,67169.6,9844408,1.91242,1.9832,-2,35.467


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9852400
  custom_metrics: {}
  date: 2021-12-10_07-42-26
  done: false
  episode_len_mean: 38.76381909547739
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.883716585049078
  episode_reward_min: -2.0
  episodes_this_iter: 199
  episodes_total: 194524
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.940461752936244
          entropy_coeff: 0.0
          kl: 0.013659882155479863
          policy_loss: -0.10028246516594663
          total_loss: -0.048466164284036495
          vf_explained_var: 0.7966091632843018
          vf_loss: 0.031070355034898967
    num_agent_steps_sampled: 9852400
    num_steps_sampled: 9852400
    num_steps_trained: 9852400
  iterations_since_restore: 520


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1515,67215.5,9852400,1.88372,1.9832,-2,38.7638


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9860392
  custom_metrics: {}
  date: 2021-12-10_07-43-12
  done: false
  episode_len_mean: 33.27093596059113
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.914843353731879
  episode_reward_min: -2.0
  episodes_this_iter: 203
  episodes_total: 194727
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9723856952041388
          entropy_coeff: 0.0
          kl: 0.013605436950456351
          policy_loss: -0.10330234374850988
          total_loss: -0.052623247232986614
          vf_explained_var: 0.8139400482177734
          vf_loss: 0.030015839263796806
    num_agent_steps_sampled: 9860392
    num_steps_sampled: 9860392
    num_steps_trained: 9860392
  iterations_since_restore: 52

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1516,67261.1,9860392,1.91484,1.9824,-2,33.2709


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9868384
  custom_metrics: {}
  date: 2021-12-10_07-43-58
  done: false
  episode_len_mean: 46.6986301369863
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8902648431525382
  episode_reward_min: -2.0
  episodes_this_iter: 219
  episodes_total: 194946
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9148247390985489
          entropy_coeff: 0.0
          kl: 0.014262251235777512
          policy_loss: -0.09512077990802936
          total_loss: -0.045131067774491385
          vf_explained_var: 0.7866755723953247
          vf_loss: 0.02832891821162775
    num_agent_steps_sampled: 9868384
    num_steps_sampled: 9868384
    num_steps_trained: 9868384
  iterations_since_restore: 522

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1517,67307,9868384,1.89026,1.9788,-2,46.6986


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9876376
  custom_metrics: {}
  date: 2021-12-10_07-44-44
  done: false
  episode_len_mean: 34.11353711790393
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9152244548089639
  episode_reward_min: -2.0
  episodes_this_iter: 229
  episodes_total: 195175
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.916800370439887
          entropy_coeff: 0.0
          kl: 0.013858583581168205
          policy_loss: -0.09521706443047151
          total_loss: -0.046812739470624365
          vf_explained_var: 0.8028701543807983
          vf_loss: 0.027356599806807935
    num_agent_steps_sampled: 9876376
    num_steps_sampled: 9876376
    num_steps_trained: 9876376
  iterations_since_restore: 52

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1518,67353.3,9876376,1.91522,1.9788,-2,34.1135


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9884368
  custom_metrics: {}
  date: 2021-12-10_07-45-30
  done: false
  episode_len_mean: 32.52208835341366
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.919802409817416
  episode_reward_min: -2.0
  episodes_this_iter: 249
  episodes_total: 195424
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8487207312136889
          entropy_coeff: 0.0
          kl: 0.01385309366742149
          policy_loss: -0.09372208616696298
          total_loss: -0.04669987512170337
          vf_explained_var: 0.741247832775116
          vf_loss: 0.025982822407968342
    num_agent_steps_sampled: 9884368
    num_steps_sampled: 9884368
    num_steps_trained: 9884368
  iterations_since_restore: 524
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1519,67399.1,9884368,1.9198,1.9788,-2,32.5221


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9892360
  custom_metrics: {}
  date: 2021-12-10_07-46-16
  done: false
  episode_len_mean: 36.07582938388626
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9100739311832953
  episode_reward_min: -2.0
  episodes_this_iter: 211
  episodes_total: 195635
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9104139339178801
          entropy_coeff: 0.0
          kl: 0.013704900746233761
          policy_loss: -0.09355546545702964
          total_loss: -0.04143545335682575
          vf_explained_var: 0.7075312733650208
          vf_loss: 0.031305695592891425
    num_agent_steps_sampled: 9892360
    num_steps_sampled: 9892360
    num_steps_trained: 9892360
  iterations_since_restore: 52

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1520,67445.1,9892360,1.91007,1.9812,-2,36.0758


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9900352
  custom_metrics: {}
  date: 2021-12-10_07-47-02
  done: false
  episode_len_mean: 32.42489270386266
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9051433474209176
  episode_reward_min: -2.0
  episodes_this_iter: 233
  episodes_total: 195868
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9181132540106773
          entropy_coeff: 0.0
          kl: 0.012376487749861553
          policy_loss: -0.09364517786889337
          total_loss: -0.04974060409585945
          vf_explained_var: 0.8229063749313354
          vf_loss: 0.025107781577389687
    num_agent_steps_sampled: 9900352
    num_steps_sampled: 9900352
    num_steps_trained: 9900352
  iterations_since_restore: 52

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1521,67490.8,9900352,1.90514,1.978,-2,32.4249


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9908344
  custom_metrics: {}
  date: 2021-12-10_07-47-48
  done: false
  episode_len_mean: 35.469565217391306
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9294608678506768
  episode_reward_min: 0.6176000237464905
  episodes_this_iter: 230
  episodes_total: 196098
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.911822410300374
          entropy_coeff: 0.0
          kl: 0.013703286036616191
          policy_loss: -0.10062441411719192
          total_loss: -0.053259658452589065
          vf_explained_var: 0.8373919725418091
          vf_loss: 0.026552895491477102
    num_agent_steps_sampled: 9908344
    num_steps_sampled: 9908344
    num_steps_trained: 9908344
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1522,67536.6,9908344,1.92946,1.9804,0.6176,35.4696


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9916336
  custom_metrics: {}
  date: 2021-12-10_07-48-33
  done: false
  episode_len_mean: 37.265700483091784
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.907070529633674
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 196305
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.903909781947732
          entropy_coeff: 0.0
          kl: 0.014472632959950715
          policy_loss: -0.10437676805304363
          total_loss: -0.055811324273236096
          vf_explained_var: 0.7885591983795166
          vf_loss: 0.026585133222397417
    num_agent_steps_sampled: 9916336
    num_steps_sampled: 9916336
    num_steps_trained: 9916336
  iterations_since_restore: 52

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1523,67582.4,9916336,1.90707,1.9804,-2,37.2657


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9924328
  custom_metrics: {}
  date: 2021-12-10_07-49-19
  done: false
  episode_len_mean: 33.18396226415094
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8972452828344308
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 196517
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9528359808027744
          entropy_coeff: 0.0
          kl: 0.013432248000754043
          policy_loss: -0.09440035474835895
          total_loss: -0.042569401091896
          vf_explained_var: 0.8090623617172241
          vf_loss: 0.03143072739476338
    num_agent_steps_sampled: 9924328
    num_steps_sampled: 9924328
    num_steps_trained: 9924328
  iterations_since_restore: 529
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1524,67628,9924328,1.89725,1.9808,-2,33.184


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9932320
  custom_metrics: {}
  date: 2021-12-10_07-50-05
  done: false
  episode_len_mean: 37.740566037735846
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9144622655409687
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 196729
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9800544828176498
          entropy_coeff: 0.0
          kl: 0.014007235615281388
          policy_loss: -0.09772047551814467
          total_loss: -0.051496743260941
          vf_explained_var: 0.8464164137840271
          vf_loss: 0.024950247432570904
    num_agent_steps_sampled: 9932320
    num_steps_sampled: 9932320
    num_steps_trained: 9932320
  iterations_since_restore: 530

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1525,67674.1,9932320,1.91446,1.9784,-2,37.7406


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9940312
  custom_metrics: {}
  date: 2021-12-10_07-50-51
  done: false
  episode_len_mean: 40.886792452830186
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.901147166794201
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 196941
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9639870803803205
          entropy_coeff: 0.0
          kl: 0.014171345508657396
          policy_loss: -0.09738774778088555
          total_loss: -0.049475015519419685
          vf_explained_var: 0.8238673210144043
          vf_loss: 0.0263899986166507
    num_agent_steps_sampled: 9940312
    num_steps_sampled: 9940312
    num_steps_trained: 9940312
  iterations_since_restore: 531


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1526,67720,9940312,1.90115,1.9792,-2,40.8868


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9948304
  custom_metrics: {}
  date: 2021-12-10_07-51-37
  done: false
  episode_len_mean: 38.185344827586206
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.907624133188149
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 197173
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9013716243207455
          entropy_coeff: 0.0
          kl: 0.012528943712823093
          policy_loss: -0.09288625553017482
          total_loss: -0.04851393114950042
          vf_explained_var: 0.8215838074684143
          vf_loss: 0.025343992630951107
    num_agent_steps_sampled: 9948304
    num_steps_sampled: 9948304
    num_steps_trained: 9948304
  iterations_since_restore: 53

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1527,67766,9948304,1.90762,1.9784,-2,38.1853


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9956296
  custom_metrics: {}
  date: 2021-12-10_07-52-23
  done: false
  episode_len_mean: 32.64840182648402
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9351470279911338
  episode_reward_min: 1.7680000066757202
  episodes_this_iter: 219
  episodes_total: 197392
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9318157900124788
          entropy_coeff: 0.0
          kl: 0.014356142841279507
          policy_loss: -0.10188911104341969
          total_loss: -0.057041862775804475
          vf_explained_var: 0.7997143268585205
          vf_loss: 0.023043855384457856
    num_agent_steps_sampled: 9956296
    num_steps_sampled: 9956296
    num_steps_trained: 9956296
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1528,67811.8,9956296,1.93515,1.9812,1.768,32.6484


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9964288
  custom_metrics: {}
  date: 2021-12-10_07-53-08
  done: false
  episode_len_mean: 33.1125
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.934136667350928
  episode_reward_min: 0.9936000108718872
  episodes_this_iter: 240
  episodes_total: 197632
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9222592636942863
          entropy_coeff: 0.0
          kl: 0.013957969640614465
          policy_loss: -0.10505807079607621
          total_loss: -0.060225504377740435
          vf_explained_var: 0.8075710535049438
          vf_loss: 0.02363390108803287
    num_agent_steps_sampled: 9964288
    num_steps_sampled: 9964288
    num_steps_trained: 9964288
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1529,67857.2,9964288,1.93414,1.9784,0.9936,33.1125


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9972280
  custom_metrics: {}
  date: 2021-12-10_07-53-54
  done: false
  episode_len_mean: 37.08620689655172
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8934568941336254
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 197864
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8936518281698227
          entropy_coeff: 0.0
          kl: 0.012560626462800428
          policy_loss: -0.08623250442906283
          total_loss: -0.04447070001333486
          vf_explained_var: 0.8238953351974487
          vf_loss: 0.02268535306211561
    num_agent_steps_sampled: 9972280
    num_steps_sampled: 9972280
    num_steps_trained: 9972280
  iterations_since_restore: 535

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1530,67902.8,9972280,1.89346,1.9812,-2,37.0862


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9980272
  custom_metrics: {}
  date: 2021-12-10_07-54-40
  done: false
  episode_len_mean: 34.663677130044846
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.899174885364926
  episode_reward_min: -2.0
  episodes_this_iter: 223
  episodes_total: 198087
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9000262506306171
          entropy_coeff: 0.0
          kl: 0.013165673153707758
          policy_loss: -0.10119609968387522
          total_loss: -0.05687239805411082
          vf_explained_var: 0.8333903551101685
          vf_loss: 0.02432833780767396
    num_agent_steps_sampled: 9980272
    num_steps_sampled: 9980272
    num_steps_trained: 9980272
  iterations_since_restore: 536

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1531,67948.4,9980272,1.89917,1.9812,-2,34.6637


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9988264
  custom_metrics: {}
  date: 2021-12-10_07-55-25
  done: false
  episode_len_mean: 35.233050847457626
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9300135611477545
  episode_reward_min: 0.0
  episodes_this_iter: 236
  episodes_total: 198323
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8620028086006641
          entropy_coeff: 0.0
          kl: 0.013857631071005017
          policy_loss: -0.09496375903836451
          total_loss: -0.05233459928422235
          vf_explained_var: 0.7785518169403076
          vf_loss: 0.0215828834916465
    num_agent_steps_sampled: 9988264
    num_steps_sampled: 9988264
    num_steps_trained: 9988264
  iterations_since_restore: 537


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1532,67994,9988264,1.93001,1.9812,0,35.2331


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 9996256
  custom_metrics: {}
  date: 2021-12-10_07-56-11
  done: false
  episode_len_mean: 37.34222222222222
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9080639998118083
  episode_reward_min: -2.0
  episodes_this_iter: 225
  episodes_total: 198548
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8961584698408842
          entropy_coeff: 0.0
          kl: 0.014194929477525875
          policy_loss: -0.10321601321629714
          total_loss: -0.05904922635818366
          vf_explained_var: 0.785705029964447
          vf_loss: 0.022608240775298327
    num_agent_steps_sampled: 9996256
    num_steps_sampled: 9996256
    num_steps_trained: 9996256
  iterations_since_restore: 538

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1533,68039.6,9996256,1.90806,1.9812,-2,37.3422


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10004248
  custom_metrics: {}
  date: 2021-12-10_07-56-57
  done: false
  episode_len_mean: 31.97609561752988
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.936478083827106
  episode_reward_min: 1.4844000339508057
  episodes_this_iter: 251
  episodes_total: 198799
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8523654937744141
          entropy_coeff: 0.0
          kl: 0.014566463098162785
          policy_loss: -0.10316994664026424
          total_loss: -0.05822905999957584
          vf_explained_var: 0.7349803447723389
          vf_loss: 0.02281807258259505
    num_agent_steps_sampled: 10004248
    num_steps_sampled: 10004248
    num_steps_trained: 10004248
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1534,68085.1,10004248,1.93648,1.9812,1.4844,31.9761


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10012240
  custom_metrics: {}
  date: 2021-12-10_07-57-42
  done: false
  episode_len_mean: 35.42439024390244
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.929559022333564
  episode_reward_min: 0.8748000264167786
  episodes_this_iter: 205
  episodes_total: 199004
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9382164143025875
          entropy_coeff: 0.0
          kl: 0.014477395307039842
          policy_loss: -0.10595106985419989
          total_loss: -0.05990915250731632
          vf_explained_var: 0.7873139381408691
          vf_loss: 0.024054374342085794
    num_agent_steps_sampled: 10012240
    num_steps_sampled: 10012240
    num_steps_trained: 10012240
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1535,68130.6,10012240,1.92956,1.98,0.8748,35.4244


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10020232
  custom_metrics: {}
  date: 2021-12-10_07-58-28
  done: false
  episode_len_mean: 36.21551724137931
  episode_media: {}
  episode_reward_max: 1.9764000177383423
  episode_reward_mean: 1.8943137922163666
  episode_reward_min: -2.0
  episodes_this_iter: 232
  episodes_total: 199236
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8914941847324371
          entropy_coeff: 0.0
          kl: 0.01283235481241718
          policy_loss: -0.09419669778435491
          total_loss: -0.04925980391999474
          vf_explained_var: 0.7691980600357056
          vf_loss: 0.025447756110224873
    num_agent_steps_sampled: 10020232
    num_steps_sampled: 10020232
    num_steps_trained: 10020232
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1536,68176.2,10020232,1.89431,1.9764,-2,36.2155


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10028224
  custom_metrics: {}
  date: 2021-12-10_07-59-13
  done: false
  episode_len_mean: 35.862385321100916
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9286752293962952
  episode_reward_min: 1.5887999534606934
  episodes_this_iter: 218
  episodes_total: 199454
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8960409238934517
          entropy_coeff: 0.0
          kl: 0.01489504289929755
          policy_loss: -0.10512809161446057
          total_loss: -0.05950219798251055
          vf_explained_var: 0.748197078704834
          vf_loss: 0.023004048562142998
    num_agent_steps_sampled: 10028224
    num_steps_sampled: 10028224
    num_steps_trained: 10028224
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1537,68221.7,10028224,1.92868,1.98,1.5888,35.8624


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10036216
  custom_metrics: {}
  date: 2021-12-10_07-59-59
  done: false
  episode_len_mean: 33.47345132743363
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9334460143494394
  episode_reward_min: 1.5884000062942505
  episodes_this_iter: 226
  episodes_total: 199680
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.888154523447156
          entropy_coeff: 0.0
          kl: 0.015282942476915196
          policy_loss: -0.1035775258205831
          total_loss: -0.05752717750146985
          vf_explained_var: 0.7535829544067383
          vf_loss: 0.022839380078949034
    num_agent_steps_sampled: 10036216
    num_steps_sampled: 10036216
    num_steps_trained: 10036216
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1538,68267.5,10036216,1.93345,1.978,1.5884,33.4735


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10044208
  custom_metrics: {}
  date: 2021-12-10_08-00-45
  done: false
  episode_len_mean: 36.769911504424776
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8751309761958839
  episode_reward_min: -2.0
  episodes_this_iter: 226
  episodes_total: 199906
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9012438971549273
          entropy_coeff: 0.0
          kl: 0.013195696898037568
          policy_loss: -0.0909116894035833
          total_loss: -0.044680996565148234
          vf_explained_var: 0.8282477855682373
          vf_loss: 0.0261897302698344
    num_agent_steps_sampled: 10044208
    num_steps_sampled: 10044208
    num_steps_trained: 10044208
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1539,68313.5,10044208,1.87513,1.9808,-2,36.7699


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10052200
  custom_metrics: {}
  date: 2021-12-10_08-01-31
  done: false
  episode_len_mean: 35.62264150943396
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9111056631466128
  episode_reward_min: -2.0
  episodes_this_iter: 212
  episodes_total: 200118
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9232953600585461
          entropy_coeff: 0.0
          kl: 0.013957657298306003
          policy_loss: -0.10291810339549556
          total_loss: -0.05378936740453355
          vf_explained_var: 0.812175989151001
          vf_loss: 0.027930541895329952
    num_agent_steps_sampled: 10052200
    num_steps_sampled: 10052200
    num_steps_trained: 10052200
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1540,68359.4,10052200,1.91111,1.978,-2,35.6226


Result for PPO_Soccer_e3a41_00000:
  agent_timesteps_total: 10060192
  custom_metrics: {}
  date: 2021-12-10_08-02-17
  done: true
  episode_len_mean: 34.45410628019324
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9129178737096741
  episode_reward_min: -2.0
  episodes_this_iter: 207
  episodes_total: 200325
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9531681295484304
          entropy_coeff: 0.0
          kl: 0.013055990944849327
          policy_loss: -0.09784194140229374
          total_loss: -0.046866806398611516
          vf_explained_var: 0.8305264711380005
          vf_loss: 0.031146350491326302
    num_agent_steps_sampled: 10060192
    num_steps_sampled: 10060192
    num_steps_trained: 10060192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,RUNNING,192.168.15.7:6600,1541,68405.2,10060192,1.91292,1.9808,-2,34.4541


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_e3a41_00000,TERMINATED,,1541,68405.2,10060192,1.91292,1.9808,-2,34.4541


2021-12-10 08:02:18,766	INFO tune.py:549 -- Total run time: 25234.06 seconds (25232.71 seconds for the tuning loop).
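
The `learner_stats` block in each record can be cross-checked by hand. The sketch below assumes the PPO objective is combined as `policy_loss + cur_kl_coeff * kl + vf_loss_coeff * vf_loss - entropy_coeff * entropy`, and that `vf_loss_coeff` is 1.0. The training config does not set it, so that value is an assumption. The numbers are copied from the last `Result for PPO_Soccer_e3a41_00000` record above, and `recompute_total_loss` is a hypothetical helper, not part of RLlib.

```python
# Values copied from the final record of trial PPO_Soccer_e3a41_00000.
stats = {
    "cur_kl_coeff": 1.5187500000000003,
    "entropy": 0.9531681295484304,
    "entropy_coeff": 0.0,
    "kl": 0.013055990944849327,
    "policy_loss": -0.09784194140229374,
    "total_loss": -0.046866806398611516,
    "vf_loss": 0.031146350491326302,
}

# Assumption: the value-function loss weight is 1.0 (not set in the config).
VF_LOSS_COEFF = 1.0


def recompute_total_loss(s, vf_loss_coeff=VF_LOSS_COEFF):
    """Rebuild total_loss from the individual terms reported in learner_stats."""
    return (
        s["policy_loss"]
        + s["cur_kl_coeff"] * s["kl"]        # KL penalty term
        + vf_loss_coeff * s["vf_loss"]       # value-function term
        - s["entropy_coeff"] * s["entropy"]  # entropy bonus (zero here)
    )


recomputed = recompute_total_loss(stats)
difference = abs(recomputed - stats["total_loss"])
print(f"recomputed={recomputed:.9f} reported={stats['total_loss']:.9f}")
```

Under these assumptions the recomputed value agrees with the logged `total_loss` to within rounding. This suggests that the negative `total_loss` comes mostly from the surrogate policy term and is partly offset by the KL penalty and the value loss.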


In [9]:
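ALGORITHM = "PPO"
# Log directory of the trial with the best episode_reward_mean
TRIAL = analysis.get_best_logdir("episode_reward_mean", "max")
# Within that trial, pick the checkpoint with the highest training_iteration (the latest one)
CHECKPOINT = analysis.get_best_checkpoint(
  TRIAL,
  "training_iteration",
  "max",
)
TRIAL, CHECKPOINT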

('D:\\CEIA\\game\\results\\PPO\\PPO_Soccer_e3a41_00000_0_2021-12-10_01-01-44',
 'D:\\CEIA\\game\\results\\PPO\\PPO_Soccer_e3a41_00000_0_2021-12-10_01-01-44\\checkpoint_001541\\checkpoint-1541')

In [10]:
NUM_ENVS_PER_WORKER = 5

In [11]:
# single player without opponent
analysis = tune.run(
    "PPO",
    config={
        # system settings
        "num_gpus": 0,
        "num_workers": 5,
        "num_envs_per_worker": NUM_ENVS_PER_WORKER,
        "log_level": "INFO",
        #"lr": ray.tune.uniform(1e-7, 1e-3),
        "lr": 0.0003,
        "lambda": 0.95,
        "gamma": 0.99,
        'sgd_minibatch_size': 256,
        #'train_batch_size': 4000,
        'clip_param': 0.2,
        'model': {
          'fcnet_hiddens': [256, 256],
        },
        "framework": "torch",
        # RL setup
        "env": "Soccer",
        "env_config": {
            "num_envs_per_worker": NUM_ENVS_PER_WORKER,
            "variation": soccer_twos.EnvType.team_vs_policy,
            "single_player": True,
            "flatten_branched": True,
            #"opponent_policy": lambda *_: 0,
        },
    },
    stop={
        # 10,000,000 (10M) steps may be needed to learn a useful policy
        "timesteps_total": 30000000,
        # you can also cap the run by wall-clock time, e.g. to fit the Colab time limit
        "time_total_s": 80000,  # about 22 h (80000 s / 3600)
    },
    checkpoint_freq=100,
    checkpoint_at_end=True,
    local_dir=os.path.join("results"),
    restore="results/PPO/PPO_Soccer_e3a41_00000_0_2021-12-10_01-01-44/checkpoint_001541/checkpoint-1541",
)

Trial name,status,loc
PPO_Soccer_a0663_00000,PENDING,


Trial name,status,loc
PPO_Soccer_a0663_00000,RUNNING,


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10064192
  custom_metrics: {}
  date: 2021-12-10_08-45-49
  done: false
  episode_len_mean: 31.87378640776699
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8979184465500916
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 200428
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.2
          cur_lr: 0.0003
          entropy: 0.9302581436932087
          entropy_coeff: 0.0
          kl: 0.03477791871409863
          policy_loss: -0.1054547168314457
          total_loss: -0.05323598859831691
          vf_explained_var: 0.767951250076294
          vf_loss: 0.04526314069516957
    num_agent_steps_sampled: 10064192
    num_steps_sampled: 10064192
    num_steps_trained: 10064192
  iterations_since_restore: 1
  node_ip: 192.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1542,68434,10064192,1.89792,1.9776,-2,31.8738
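
As a sanity check on the learner statistics, the reported `total_loss` can be reproduced from the other fields in the same result block. The sketch below assumes a PPO objective of the form `policy_loss + kl_coeff * kl + vf_loss_coeff * vf_loss - entropy_coeff * entropy`, with `vf_loss_coeff = 1.0`. That value is an assumption: it is not set in the config above and is not printed in the log. The helper name `ppo_total_loss` is hypothetical and is not part of RLlib.

```python
def ppo_total_loss(policy_loss, kl, kl_coeff, vf_loss, entropy,
                   vf_loss_coeff=1.0, entropy_coeff=0.0):
    """Combine the PPO loss terms (assumed form; vf_loss_coeff=1.0 is an assumption)."""
    return (policy_loss
            + kl_coeff * kl
            + vf_loss_coeff * vf_loss
            - entropy_coeff * entropy)


# Values copied from the first result block after the restore (iter 1542).
reconstructed = ppo_total_loss(
    policy_loss=-0.1054547168314457,
    kl=0.03477791871409863,
    kl_coeff=0.2,            # cur_kl_coeff in the log
    vf_loss=0.04526314069516957,
    entropy=0.9302581436932087,
    entropy_coeff=0.0,       # entropy_coeff in the log
)
logged_total_loss = -0.05323598859831691

print(f"reconstructed={reconstructed:.8f}  logged={logged_total_loss:.8f}")
```

The two numbers agree to about eight decimal places. The small residual is plausibly due to the statistics being averaged over minibatches in single precision. Because `entropy_coeff` is 0.0, the entropy term contributes nothing to the loss in this run.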


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10068192
  custom_metrics: {}
  date: 2021-12-10_08-46-14
  done: false
  episode_len_mean: 36.2803738317757
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8589943970475242
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 200535
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.30000000000000004
          cur_lr: 0.0003
          entropy: 0.9310116656124592
          entropy_coeff: 0.0
          kl: 0.03131414589006454
          policy_loss: -0.10720030451193452
          total_loss: -0.06270697840955108
          vf_explained_var: 0.8202623128890991
          vf_loss: 0.03509908728301525
    num_agent_steps_sampled: 10068192
    num_steps_sampled: 10068192
    num_steps_trained: 10068192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1543,68458.6,10068192,1.85899,1.978,-2,36.2804


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10072192
  custom_metrics: {}
  date: 2021-12-10_08-46-39
  done: false
  episode_len_mean: 33.4
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8596000058310374
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 200640
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.45000000000000007
          cur_lr: 0.0003
          entropy: 0.9483561590313911
          entropy_coeff: 0.0
          kl: 0.027060221997089684
          policy_loss: -0.1038367721484974
          total_loss: -0.06388012482784688
          vf_explained_var: 0.8834971785545349
          vf_loss: 0.027779547846876085
    num_agent_steps_sampled: 10072192
    num_steps_sampled: 10072192
    num_steps_trained: 10072192
  iterations_since_restore: 3
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1544,68483.4,10072192,1.8596,1.9788,-2,33.4
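
The status rows are comma-separated, so they can be parsed with the standard library to inspect how the run progresses. The following is a minimal sketch that reads three rows copied from the output above and computes the per-iteration increments. The function name `parse_status_rows` is hypothetical, and only a few columns are kept.

```python
import csv
import io

# Header and three status rows copied verbatim from the Tune output.
STATUS_TABLE = """\
Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1542,68434,10064192,1.89792,1.9776,-2,31.8738
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1543,68458.6,10068192,1.85899,1.978,-2,36.2804
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1544,68483.4,10072192,1.8596,1.9788,-2,33.4
"""


def parse_status_rows(text):
    """Parse Tune status rows into dicts with typed numeric fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "iter": int(record["iter"]),
            "time_s": float(record["total time (s)"]),
            "ts": int(record["ts"]),
            "reward": float(record["reward"]),
        })
    return rows


rows = parse_status_rows(STATUS_TABLE)
# Differences between consecutive iterations.
ts_deltas = [b["ts"] - a["ts"] for a, b in zip(rows, rows[1:])]
time_deltas = [round(b["time_s"] - a["time_s"], 1) for a, b in zip(rows, rows[1:])]
print("timesteps per iteration:", ts_deltas)
print("seconds per iteration:", time_deltas)
```

On these rows, each training iteration adds 4000 timesteps and takes roughly 25 seconds of wall-clock time.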


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10076192
  custom_metrics: {}
  date: 2021-12-10_08-47-03
  done: false
  episode_len_mean: 41.94
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8488160049915314
  episode_reward_min: -2.0
  episodes_this_iter: 82
  episodes_total: 200722
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 0.675
          cur_lr: 0.0003
          entropy: 1.0806021392345428
          entropy_coeff: 0.0
          kl: 0.02150943223387003
          policy_loss: -0.10874131845775992
          total_loss: -0.03935991827165708
          vf_explained_var: 0.8178671598434448
          vf_loss: 0.05486253183335066
    num_agent_steps_sampled: 10076192
    num_steps_sampled: 10076192
    num_steps_trained: 10076192
  iterations_since_restore: 4
  node_ip: 192.168.15.7


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1545,68508.1,10076192,1.84882,1.9788,-2,41.94


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10080192
  custom_metrics: {}
  date: 2021-12-10_08-47-28
  done: false
  episode_len_mean: 43.48
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.6831360000371933
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 200818
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0274541638791561
          entropy_coeff: 0.0
          kl: 0.01586966816103086
          policy_loss: -0.09714628057554364
          total_loss: -0.03121169182122685
          vf_explained_var: 0.8602923154830933
          vf_loss: 0.049866551999002695
    num_agent_steps_sampled: 10080192
    num_steps_sampled: 10080192
    num_steps_trained: 10080192
  iterations_since_restore: 5
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1546,68533.1,10080192,1.68314,1.9828,-2,43.48
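
The `cur_kl_coeff` values in the first result blocks (0.2, 0.3, 0.45, 0.675, 1.0125) are consistent with PPO's adaptive KL penalty. The sketch below assumes the usual rule: multiply the coefficient by 1.5 when the sampled KL exceeds twice the target, halve it when the KL falls below half the target, and leave it unchanged otherwise. The target `kl_target = 0.01` is an assumption, since it is not set in the config above, and `update_kl_coeff` is a hypothetical helper rather than an RLlib function.

```python
def update_kl_coeff(kl_coeff, sampled_kl, kl_target=0.01):
    """Adaptive KL penalty update (assumed rule; kl_target=0.01 is an assumption)."""
    if sampled_kl > 2.0 * kl_target:
        return kl_coeff * 1.5   # policy moved too far: penalize KL more
    if sampled_kl < 0.5 * kl_target:
        return kl_coeff * 0.5   # policy barely moved: relax the penalty
    return kl_coeff


# `kl` values copied from the result blocks for iterations 1542 to 1546.
sampled_kls = [
    0.03477791871409863,
    0.03131414589006454,
    0.027060221997089684,
    0.02150943223387003,
    0.01586966816103086,
]

coeff = 0.2  # cur_kl_coeff reported in the first block after the restore
history = [coeff]
for kl in sampled_kls:
    coeff = update_kl_coeff(coeff, kl)
    history.append(coeff)

print(history)
```

Under these assumptions the coefficient grows while the KL stays above 0.02 and then holds at about 1.0125 once the KL drops into the band between 0.005 and 0.02, which matches the logged sequence. Note that the first block after the restore reports 0.2 again, which suggests the coefficient was not carried over from the checkpoint.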


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10084192
  custom_metrics: {}
  date: 2021-12-10_08-47-53
  done: false
  episode_len_mean: 38.51
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.88620000064373
  episode_reward_min: -2.0
  episodes_this_iter: 88
  episodes_total: 200906
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.095729187130928
          entropy_coeff: 0.0
          kl: 0.01800375070888549
          policy_loss: -0.11414596461690962
          total_loss: -0.06367407756624743
          vf_explained_var: 0.8846790790557861
          vf_loss: 0.03224309126380831
    num_agent_steps_sampled: 10084192
    num_steps_sampled: 10084192
    num_steps_trained: 10084192
  iterations_since_restore: 6
  node_ip: 192.

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1547,68557.4,10084192,1.8862,1.9792,-2,38.51


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10088192
  custom_metrics: {}
  date: 2021-12-10_08-48-17
  done: false
  episode_len_mean: 44.64
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8730999994277955
  episode_reward_min: -2.0
  episodes_this_iter: 88
  episodes_total: 200994
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.044874221086502
          entropy_coeff: 0.0
          kl: 0.016943774826359004
          policy_loss: -0.0968744873534888
          total_loss: -0.027703050465788692
          vf_explained_var: 0.8202561140060425
          vf_loss: 0.05201586289331317
    num_agent_steps_sampled: 10088192
    num_steps_sampled: 10088192
    num_steps_trained: 10088192
  iterations_since_restore: 7
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1548,68581.8,10088192,1.8731,1.9792,-2,44.64


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10092192
  custom_metrics: {}
  date: 2021-12-10_08-48-42
  done: false
  episode_len_mean: 45.37
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8840360009670258
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 201093
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0287653878331184
          entropy_coeff: 0.0
          kl: 0.016914420353714377
          policy_loss: -0.09870093379868194
          total_loss: -0.051299328450113535
          vf_explained_var: 0.8965951204299927
          vf_loss: 0.03027575323358178
    num_agent_steps_sampled: 10092192
    num_steps_sampled: 10092192
    num_steps_trained: 10092192
  iterations_since_restore: 8
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1549,68606.4,10092192,1.88404,1.9784,-2,45.37


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10096192
  custom_metrics: {}
  date: 2021-12-10_08-49-06
  done: false
  episode_len_mean: 32.04
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8224440038204193
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 201183
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0385948568582535
          entropy_coeff: 0.0
          kl: 0.0160229405737482
          policy_loss: -0.09786593413446099
          total_loss: -0.040755619877018034
          vf_explained_var: 0.893530011177063
          vf_loss: 0.04088708874769509
    num_agent_steps_sampled: 10096192
    num_steps_sampled: 10096192
    num_steps_trained: 10096192
  iterations_since_restore: 9
  node_ip: 19

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1550,68630.7,10096192,1.82244,1.9816,-2,32.04


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10100192
  custom_metrics: {}
  date: 2021-12-10_08-49-31
  done: false
  episode_len_mean: 42.85
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8100999987125397
  episode_reward_min: -2.0
  episodes_this_iter: 84
  episodes_total: 201267
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0623217858374119
          entropy_coeff: 0.0
          kl: 0.01700869557680562
          policy_loss: -0.09669057396240532
          total_loss: -0.02478675969177857
          vf_explained_var: 0.8422561287879944
          vf_loss: 0.05468250915873796
    num_agent_steps_sampled: 10100192
    num_steps_sampled: 10100192
    num_steps_trained: 10100192
  iterations_since_restore: 10
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1551,68655.1,10100192,1.8101,1.9816,-2,42.85


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10104192
  custom_metrics: {}
  date: 2021-12-10_08-49-55
  done: false
  episode_len_mean: 43.65
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9131160008907317
  episode_reward_min: 1.5463999509811401
  episodes_this_iter: 83
  episodes_total: 201350
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0675726234912872
          entropy_coeff: 0.0
          kl: 0.01795917289564386
          policy_loss: -0.10439580184174702
          total_loss: -0.05251765431603417
          vf_explained_var: 0.8850696086883545
          vf_loss: 0.03369448287412524
    num_agent_steps_sampled: 10104192
    num_steps_sampled: 10104192
    num_steps_trained: 10104192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1552,68679.3,10104192,1.91312,1.982,1.5464,43.65


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10108192
  custom_metrics: {}
  date: 2021-12-10_08-50-19
  done: false
  episode_len_mean: 57.36538461538461
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8115846205216188
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 201454
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9723166562616825
          entropy_coeff: 0.0
          kl: 0.017804157338105142
          policy_loss: -0.10451717185787857
          total_loss: -0.036746140103787184
          vf_explained_var: 0.849573016166687
          vf_loss: 0.04974432219751179
    num_agent_steps_sampled: 10108192
    num_steps_sampled: 10108192
    num_steps_trained: 10108192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1553,68703.7,10108192,1.81158,1.9804,-2,57.3654


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10112192
  custom_metrics: {}
  date: 2021-12-10_08-50-44
  done: false
  episode_len_mean: 45.09
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8717159986495973
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 201552
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0327360890805721
          entropy_coeff: 0.0
          kl: 0.016984433634206653
          policy_loss: -0.10406715516000986
          total_loss: -0.0477877464145422
          vf_explained_var: 0.8837939500808716
          vf_loss: 0.039082672679796815
    num_agent_steps_sampled: 10112192
    num_steps_sampled: 10112192
    num_steps_trained: 10112192
  iterations_since_restore: 13
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1554,68728,10112192,1.87172,1.982,-2,45.09


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10116192
  custom_metrics: {}
  date: 2021-12-10_08-51-08
  done: false
  episode_len_mean: 39.46
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8820359939336777
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 201647
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.010607872158289
          entropy_coeff: 0.0
          kl: 0.017288624891079962
          policy_loss: -0.10814845364075154
          total_loss: -0.041341200936585665
          vf_explained_var: 0.8245221972465515
          vf_loss: 0.049302522325888276
    num_agent_steps_sampled: 10116192
    num_steps_sampled: 10116192
    num_steps_trained: 10116192
  iterations_since_restore: 14
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1555,68752.5,10116192,1.88204,1.9816,-2,39.46


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10120192
  custom_metrics: {}
  date: 2021-12-10_08-51-33
  done: false
  episode_len_mean: 40.91
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8074039971828462
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 201747
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9390483982861042
          entropy_coeff: 0.0
          kl: 0.017323484702501446
          policy_loss: -0.10408532223664224
          total_loss: -0.03746884485008195
          vf_explained_var: 0.8420029878616333
          vf_loss: 0.04907644842751324
    num_agent_steps_sampled: 10120192
    num_steps_sampled: 10120192
    num_steps_trained: 10120192
  iterations_since_restore: 15
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1556,68777.4,10120192,1.8074,1.9804,-2,40.91


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10124192
  custom_metrics: {}
  date: 2021-12-10_08-51-58
  done: false
  episode_len_mean: 39.39449541284404
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8500660559453002
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 201856
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9027231931686401
          entropy_coeff: 0.0
          kl: 0.01786484383046627
          policy_loss: -0.10513861454091966
          total_loss: -0.0453322276880499
          vf_explained_var: 0.7962073683738708
          vf_loss: 0.04171823593787849
    num_agent_steps_sampled: 10124192
    num_steps_sampled: 10124192
    num_steps_trained: 10124192
  iterations_since_restore: 16

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1557,68802.3,10124192,1.85007,1.9828,-2,39.3945


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10128192
  custom_metrics: {}
  date: 2021-12-10_08-52-23
  done: false
  episode_len_mean: 37.598130841121495
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9252112112312674
  episode_reward_min: 1.3839999437332153
  episodes_this_iter: 107
  episodes_total: 201963
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9406625889241695
          entropy_coeff: 0.0
          kl: 0.019022610620595515
          policy_loss: -0.1168868481181562
          total_loss: -0.0571920937509276
          vf_explained_var: 0.7486238479614258
          vf_loss: 0.040434358874335885
    num_agent_steps_sampled: 10128192
    num_steps_sampled: 10128192
    num_steps_trained: 10128192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1558,68827.5,10128192,1.92521,1.9796,1.384,37.5981


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10132192
  custom_metrics: {}
  date: 2021-12-10_08-52-47
  done: false
  episode_len_mean: 43.93
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.7960480010509492
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 202053
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9495524205267429
          entropy_coeff: 0.0
          kl: 0.01788280252367258
          policy_loss: -0.10398083156906068
          total_loss: -0.006800004455726594
          vf_explained_var: 0.6334177255630493
          vf_loss: 0.0790744898840785
    num_agent_steps_sampled: 10132192
    num_steps_sampled: 10132192
    num_steps_trained: 10132192
  iterations_since_restore: 18
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1559,68851.7,10132192,1.79605,1.978,-2,43.93


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10136192
  custom_metrics: {}
  date: 2021-12-10_08-53-12
  done: false
  episode_len_mean: 41.51
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.7992479991912842
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 202150
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9314739219844341
          entropy_coeff: 0.0
          kl: 0.018098722910508513
          policy_loss: -0.10368153429590166
          total_loss: -0.029450811387505382
          vf_explained_var: 0.7645323276519775
          vf_loss: 0.05590576305985451
    num_agent_steps_sampled: 10136192
    num_steps_sampled: 10136192
    num_steps_trained: 10136192
  iterations_since_restore: 19
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1560,68876,10136192,1.79925,1.9784,-2,41.51


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10140192
  custom_metrics: {}
  date: 2021-12-10_08-53-36
  done: false
  episode_len_mean: 36.96078431372549
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.7726588179083431
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 202252
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9685775078833103
          entropy_coeff: 0.0
          kl: 0.016571513144299388
          policy_loss: -0.09769646805943921
          total_loss: -0.015191635699011385
          vf_explained_var: 0.7769836187362671
          vf_loss: 0.06572617241181433
    num_agent_steps_sampled: 10140192
    num_steps_sampled: 10140192
    num_steps_trained: 10140192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1561,68900.1,10140192,1.77266,1.9824,-2,36.9608


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10144192
  custom_metrics: {}
  date: 2021-12-10_08-54-00
  done: false
  episode_len_mean: 39.62
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.921172000169754
  episode_reward_min: 1.3095999956130981
  episodes_this_iter: 97
  episodes_total: 202349
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9696142598986626
          entropy_coeff: 0.0
          kl: 0.019674383685924113
          policy_loss: -0.11421502998564392
          total_loss: -0.0544181241421029
          vf_explained_var: 0.7635049819946289
          vf_loss: 0.03987658838741481
    num_agent_steps_sampled: 10144192
    num_steps_sampled: 10144192
    num_steps_trained: 10144192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1562,68924.2,10144192,1.92117,1.9812,1.3096,39.62


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10148192
  custom_metrics: {}
  date: 2021-12-10_08-54-24
  done: false
  episode_len_mean: 39.61
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.7749799978733063
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 202448
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0123950019478798
          entropy_coeff: 0.0
          kl: 0.01646172924665734
          policy_loss: -0.10211710911244154
          total_loss: -0.035250409942818806
          vf_explained_var: 0.8432974815368652
          vf_loss: 0.050199203193187714
    num_agent_steps_sampled: 10148192
    num_steps_sampled: 10148192
    num_steps_trained: 10148192
  iterations_since_restore: 22
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1563,68948,10148192,1.77498,1.9812,-2,39.61


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10152192
  custom_metrics: {}
  date: 2021-12-10_08-54-48
  done: false
  episode_len_mean: 37.75892857142857
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9249714267041003
  episode_reward_min: 0.5175999999046326
  episodes_this_iter: 112
  episodes_total: 202560
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9440243355929852
          entropy_coeff: 0.0
          kl: 0.01860324596054852
          policy_loss: -0.10936441423837095
          total_loss: -0.04988240171223879
          vf_explained_var: 0.795567512512207
          vf_loss: 0.040646230801939964
    num_agent_steps_sampled: 10152192
    num_steps_sampled: 10152192
    num_steps_trained: 10152192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1564,68972.1,10152192,1.92497,1.9824,0.5176,37.7589


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10156192
  custom_metrics: {}
  date: 2021-12-10_08-55-12
  done: false
  episode_len_mean: 43.3
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.874480001926422
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 202660
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9265992231667042
          entropy_coeff: 0.0
          kl: 0.01883307989919558
          policy_loss: -0.10810915264301002
          total_loss: -0.03716559160966426
          vf_explained_var: 0.7389531135559082
          vf_loss: 0.051875066477805376
    num_agent_steps_sampled: 10156192
    num_steps_sampled: 10156192
    num_steps_trained: 10156192
  iterations_since_restore: 24
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1565,68996.1,10156192,1.87448,1.9816,-2,43.3


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10160192
  custom_metrics: {}
  date: 2021-12-10_08-55-36
  done: false
  episode_len_mean: 36.13
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8888680005073548
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 202757
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9369199350476265
          entropy_coeff: 0.0
          kl: 0.01833301002625376
          policy_loss: -0.10776140599045902
          total_loss: -0.041191734722815454
          vf_explained_var: 0.7499804496765137
          vf_loss: 0.04800750222057104
    num_agent_steps_sampled: 10160192
    num_steps_sampled: 10160192
    num_steps_trained: 10160192
  iterations_since_restore: 25
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1566,69020,10160192,1.88887,1.9824,-2,36.13


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10164192
  custom_metrics: {}
  date: 2021-12-10_08-56-00
  done: false
  episode_len_mean: 43.0
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8479600024223328
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 202857
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9472369402647018
          entropy_coeff: 0.0
          kl: 0.017777378496248275
          policy_loss: -0.10135521041229367
          total_loss: -0.045122099574655294
          vf_explained_var: 0.8485406637191772
          vf_loss: 0.03823351324535906
    num_agent_steps_sampled: 10164192
    num_steps_sampled: 10164192
    num_steps_trained: 10164192
  iterations_since_restore: 26
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1567,69044,10164192,1.84796,1.9792,-2,43


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10168192
  custom_metrics: {}
  date: 2021-12-10_08-56-25
  done: false
  episode_len_mean: 41.594339622641506
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.845430187459262
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 202963
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9429865591228008
          entropy_coeff: 0.0
          kl: 0.0173552724882029
          policy_loss: -0.10380658670328557
          total_loss: -0.04144911840558052
          vf_explained_var: 0.8140060901641846
          vf_loss: 0.04478525579907
    num_agent_steps_sampled: 10168192
    num_steps_sampled: 10168192
    num_steps_trained: 10168192
  iterations_since_restore: 27
 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1568,69069.3,10168192,1.84543,1.9824,-2,41.5943


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10172192
  custom_metrics: {}
  date: 2021-12-10_08-56-50
  done: false
  episode_len_mean: 30.306306306306308
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.9042162164911494
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 203074
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9638885296881199
          entropy_coeff: 0.0
          kl: 0.017778592300601304
          policy_loss: -0.10469960555201396
          total_loss: -0.040691119502298534
          vf_explained_var: 0.7958675622940063
          vf_loss: 0.04600766336079687
    num_agent_steps_sampled: 10172192
    num_steps_sampled: 10172192
    num_steps_trained: 10172192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1569,69094.2,10172192,1.90422,1.9776,-2,30.3063


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10176192
  custom_metrics: {}
  date: 2021-12-10_08-57-16
  done: false
  episode_len_mean: 36.333333333333336
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.890590474719093
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 203179
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9517177678644657
          entropy_coeff: 0.0
          kl: 0.01778844732325524
          policy_loss: -0.10367905348539352
          total_loss: -0.0521971887210384
          vf_explained_var: 0.8445709347724915
          vf_loss: 0.03347106045112014
    num_agent_steps_sampled: 10176192
    num_steps_sampled: 10176192
    num_steps_trained: 10176192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1570,69120.1,10176192,1.89059,1.9784,-2,36.3333


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10180192
  custom_metrics: {}
  date: 2021-12-10_08-57-41
  done: false
  episode_len_mean: 36.359223300970875
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8891533971990195
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 203282
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.948346845805645
          entropy_coeff: 0.0
          kl: 0.017412059125490487
          policy_loss: -0.10105863711214624
          total_loss: -0.030793402576819062
          vf_explained_var: 0.7550644874572754
          vf_loss: 0.05263552442193031
    num_agent_steps_sampled: 10180192
    num_steps_sampled: 10180192
    num_steps_trained: 10180192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1571,69145,10180192,1.88915,1.9776,-2,36.3592


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10184192
  custom_metrics: {}
  date: 2021-12-10_08-58-06
  done: false
  episode_len_mean: 41.944954128440365
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8460036696644004
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 203391
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9375598095357418
          entropy_coeff: 0.0
          kl: 0.01697529951343313
          policy_loss: -0.10136951191816479
          total_loss: -0.037377543514594436
          vf_explained_var: 0.7849808931350708
          vf_loss: 0.04680448095314205
    num_agent_steps_sampled: 10184192
    num_steps_sampled: 10184192
    num_steps_trained: 10184192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1572,69170,10184192,1.846,1.98,-2,41.945


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10188192
  custom_metrics: {}
  date: 2021-12-10_08-58-31
  done: false
  episode_len_mean: 38.68
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.7700440037250518
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 203486
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9421860948204994
          entropy_coeff: 0.0
          kl: 0.017192960716784
          policy_loss: -0.09623153880238533
          total_loss: -0.03884430031757802
          vf_explained_var: 0.8515579700469971
          vf_loss: 0.039979363908059895
    num_agent_steps_sampled: 10188192
    num_steps_sampled: 10188192
    num_steps_trained: 10188192
  iterations_since_restore: 32
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1573,69195.1,10188192,1.77004,1.9788,-2,38.68


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10192192
  custom_metrics: {}
  date: 2021-12-10_08-58-56
  done: false
  episode_len_mean: 34.92
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.854300001859665
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 203586
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9741025567054749
          entropy_coeff: 0.0
          kl: 0.016172940260730684
          policy_loss: -0.09805057005723938
          total_loss: -0.02707685000495985
          vf_explained_var: 0.8011370897293091
          vf_loss: 0.054598618065938354
    num_agent_steps_sampled: 10192192
    num_steps_sampled: 10192192
    num_steps_trained: 10192192
  iterations_since_restore: 33
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1574,69220,10192192,1.8543,1.9796,-2,34.92


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10196192
  custom_metrics: {}
  date: 2021-12-10_08-59-21
  done: false
  episode_len_mean: 49.05
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8262799978256226
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 203678
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9336670748889446
          entropy_coeff: 0.0
          kl: 0.01823102473281324
          policy_loss: -0.10428863670676947
          total_loss: -0.042851731646806
          vf_explained_var: 0.7965238094329834
          vf_loss: 0.042977989884093404
    num_agent_steps_sampled: 10196192
    num_steps_sampled: 10196192
    num_steps_trained: 10196192
  iterations_since_restore: 34
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1575,69244.5,10196192,1.82628,1.9796,-2,49.05


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10200192
  custom_metrics: {}
  date: 2021-12-10_08-59-46
  done: false
  episode_len_mean: 49.29
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.901891992688179
  episode_reward_min: 0.8956000208854675
  episodes_this_iter: 96
  episodes_total: 203774
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9156559072434902
          entropy_coeff: 0.0
          kl: 0.01979510812088847
          policy_loss: -0.1127456488320604
          total_loss: -0.059137584059499204
          vf_explained_var: 0.7827030420303345
          vf_loss: 0.03356551600154489
    num_agent_steps_sampled: 10200192
    num_steps_sampled: 10200192
    num_steps_trained: 10200192
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1576,69269.8,10200192,1.90189,1.9792,0.8956,49.29


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10204192
  custom_metrics: {}
  date: 2021-12-10_09-00-11
  done: false
  episode_len_mean: 32.0
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8736000031721396
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 203896
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8199623562395573
          entropy_coeff: 0.0
          kl: 0.017714798916131258
          policy_loss: -0.09515415481291711
          total_loss: -0.04569217487005517
          vf_explained_var: 0.8072475790977478
          vf_loss: 0.031525741796940565
    num_agent_steps_sampled: 10204192
    num_steps_sampled: 10204192
    num_steps_trained: 10204192
  iterations_since_restore: 36
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1577,69295,10204192,1.8736,1.9796,-2,32


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10208192
  custom_metrics: {}
  date: 2021-12-10_09-00-36
  done: false
  episode_len_mean: 35.11504424778761
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8955681418950578
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 204009
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8953262269496918
          entropy_coeff: 0.0
          kl: 0.017732722801156342
          policy_loss: -0.10478269122540951
          total_loss: -0.041422869748203084
          vf_explained_var: 0.7190225124359131
          vf_loss: 0.04540543758776039
    num_agent_steps_sampled: 10208192
    num_steps_sampled: 10208192
    num_steps_trained: 10208192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1578,69319.7,10208192,1.89557,1.9816,-2,35.115


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10212192
  custom_metrics: {}
  date: 2021-12-10_09-01-01
  done: false
  episode_len_mean: 39.71
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8821159970760346
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 204101
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9722293429076672
          entropy_coeff: 0.0
          kl: 0.018073172541335225
          policy_loss: -0.1094800217251759
          total_loss: -0.04841493454296142
          vf_explained_var: 0.7754713296890259
          vf_loss: 0.042765995603986084
    num_agent_steps_sampled: 10212192
    num_steps_sampled: 10212192
    num_steps_trained: 10212192
  iterations_since_restore: 38
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1579,69344.4,10212192,1.88212,1.9816,-2,39.71


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10216192
  custom_metrics: {}
  date: 2021-12-10_09-01-25
  done: false
  episode_len_mean: 35.68141592920354
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.89547257191312
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 204214
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9251236990094185
          entropy_coeff: 0.0
          kl: 0.018419897882267833
          policy_loss: -0.10448079137131572
          total_loss: -0.050340769317699596
          vf_explained_var: 0.8176620602607727
          vf_loss: 0.03548987815156579
    num_agent_steps_sampled: 10216192
    num_steps_sampled: 10216192
    num_steps_trained: 10216192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1580,69368.9,10216192,1.89547,1.9808,-2,35.6814


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10220192
  custom_metrics: {}
  date: 2021-12-10_09-01-50
  done: false
  episode_len_mean: 34.80373831775701
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8595140214278318
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 204321
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9261267781257629
          entropy_coeff: 0.0
          kl: 0.017253891797736287
          policy_loss: -0.09707230617641471
          total_loss: -0.03261225495953113
          vf_explained_var: 0.7887240052223206
          vf_loss: 0.04699048865586519
    num_agent_steps_sampled: 10220192
    num_steps_sampled: 10220192
    num_steps_trained: 10220192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1581,69393.9,10220192,1.85951,1.9808,-2,34.8037


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10224192
  custom_metrics: {}
  date: 2021-12-10_09-02-15
  done: false
  episode_len_mean: 37.22018348623853
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.9259669693238144
  episode_reward_min: 1.434000015258789
  episodes_this_iter: 109
  episodes_total: 204430
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8884568884968758
          entropy_coeff: 0.0
          kl: 0.018727522459812462
          policy_loss: -0.10918922378914431
          total_loss: -0.05700397677719593
          vf_explained_var: 0.7876476049423218
          vf_loss: 0.03322363004554063
    num_agent_steps_sampled: 10224192
    num_steps_sampled: 10224192
    num_steps_trained: 10224192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1582,69418.6,10224192,1.92597,1.9768,1.434,37.2202


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10228192
  custom_metrics: {}
  date: 2021-12-10_09-02-40
  done: false
  episode_len_mean: 37.12
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8483000016212463
  episode_reward_min: -2.0
  episodes_this_iter: 81
  episodes_total: 204511
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0607394129037857
          entropy_coeff: 0.0
          kl: 0.017263752059079707
          policy_loss: -0.09958439879119396
          total_loss: -0.03714129234140273
          vf_explained_var: 0.8386763334274292
          vf_loss: 0.044963555643334985
    num_agent_steps_sampled: 10228192
    num_steps_sampled: 10228192
    num_steps_trained: 10228192
  iterations_since_restore: 42
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1583,69443.1,10228192,1.8483,1.9784,-2,37.12


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10232192
  custom_metrics: {}
  date: 2021-12-10_09-03-05
  done: false
  episode_len_mean: 49.61
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.78256799723953
  episode_reward_min: -2.0
  episodes_this_iter: 93
  episodes_total: 204604
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0276552587747574
          entropy_coeff: 0.0
          kl: 0.01664530922425911
          policy_loss: -0.09861130159697495
          total_loss: -0.007124856230802834
          vf_explained_var: 0.7569279670715332
          vf_loss: 0.07463306980207562
    num_agent_steps_sampled: 10232192
    num_steps_sampled: 10232192
    num_steps_trained: 10232192
  iterations_since_restore: 43
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1584,69468.1,10232192,1.78257,1.9792,-2,49.61


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10236192
  custom_metrics: {}
  date: 2021-12-10_09-03-30
  done: false
  episode_len_mean: 38.70873786407767
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.884730094844855
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 204707
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.009085025638342
          entropy_coeff: 0.0
          kl: 0.016501477104611695
          policy_loss: -0.10140509251505136
          total_loss: -0.034579309285618365
          vf_explained_var: 0.8341128826141357
          vf_loss: 0.05011803493835032
    num_agent_steps_sampled: 10236192
    num_steps_sampled: 10236192
    num_steps_trained: 10236192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1585,69493.1,10236192,1.88473,1.9844,-2,38.7087


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10240192
  custom_metrics: {}
  date: 2021-12-10_09-03-55
  done: false
  episode_len_mean: 36.48
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8109319984912873
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 204797
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0313489101827145
          entropy_coeff: 0.0
          kl: 0.016819766315165907
          policy_loss: -0.10309872997459024
          total_loss: -0.04314488940872252
          vf_explained_var: 0.8656266927719116
          vf_loss: 0.042923831613734365
    num_agent_steps_sampled: 10240192
    num_steps_sampled: 10240192
    num_steps_trained: 10240192
  iterations_since_restore: 45
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1586,69518.5,10240192,1.81093,1.9792,-2,36.48


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10244192
  custom_metrics: {}
  date: 2021-12-10_09-04-20
  done: false
  episode_len_mean: 47.69
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.866028003692627
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 204893
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.000566978007555
          entropy_coeff: 0.0
          kl: 0.01819411327596754
          policy_loss: -0.10623173380736262
          total_loss: -0.040195930167101324
          vf_explained_var: 0.8151661157608032
          vf_loss: 0.047614269657060504
    num_agent_steps_sampled: 10244192
    num_steps_sampled: 10244192
    num_steps_trained: 10244192
  iterations_since_restore: 46
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1587,69543.7,10244192,1.86603,1.9796,-2,47.69


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10248192
  custom_metrics: {}
  date: 2021-12-10_09-04-45
  done: false
  episode_len_mean: 51.95
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8192599964141847
  episode_reward_min: -2.0
  episodes_this_iter: 73
  episodes_total: 204966
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.1227906793355942
          entropy_coeff: 0.0
          kl: 0.016984834684990346
          policy_loss: -0.10350102430675179
          total_loss: -0.029343914822675288
          vf_explained_var: 0.8121200799942017
          vf_loss: 0.0569599624723196
    num_agent_steps_sampled: 10248192
    num_steps_sampled: 10248192
    num_steps_trained: 10248192
  iterations_since_restore: 47
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1588,69568.9,10248192,1.81926,1.9796,-2,51.95


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10252192
  custom_metrics: {}
  date: 2021-12-10_09-05-11
  done: false
  episode_len_mean: 58.92
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8824999982118606
  episode_reward_min: 0.0
  episodes_this_iter: 92
  episodes_total: 205058
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9947384744882584
          entropy_coeff: 0.0
          kl: 0.019566911389119923
          policy_loss: -0.11557390668895096
          total_loss: -0.05934083479223773
          vf_explained_var: 0.8214572072029114
          vf_loss: 0.036421573255211115
    num_agent_steps_sampled: 10252192
    num_steps_sampled: 10252192
    num_steps_trained: 10252192
  iterations_since_restore: 48
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1589,69594.1,10252192,1.8825,1.9796,0,58.92


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10256192
  custom_metrics: {}
  date: 2021-12-10_09-05-36
  done: false
  episode_len_mean: 41.97
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8005640029907226
  episode_reward_min: -2.0
  episodes_this_iter: 89
  episodes_total: 205147
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9617121554911137
          entropy_coeff: 0.0
          kl: 0.01788207743084058
          policy_loss: -0.10174732957966626
          total_loss: -0.02706794039113447
          vf_explained_var: 0.7722594738006592
          vf_loss: 0.05657378421165049
    num_agent_steps_sampled: 10256192
    num_steps_sampled: 10256192
    num_steps_trained: 10256192
  iterations_since_restore: 49
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1590,69619.3,10256192,1.80056,1.9788,-2,41.97


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10260192
  custom_metrics: {}
  date: 2021-12-10_09-06-01
  done: false
  episode_len_mean: 38.13
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8855359983444213
  episode_reward_min: -2.0
  episodes_this_iter: 81
  episodes_total: 205228
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.1006463542580605
          entropy_coeff: 0.0
          kl: 0.017848765826784074
          policy_loss: -0.11035918665584177
          total_loss: -0.05041519762016833
          vf_explained_var: 0.8693927526473999
          vf_loss: 0.041872111964039505
    num_agent_steps_sampled: 10260192
    num_steps_sampled: 10260192
    num_steps_trained: 10260192
  iterations_since_restore: 50
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1591,69644.3,10260192,1.88554,1.9792,-2,38.13


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10264192
  custom_metrics: {}
  date: 2021-12-10_09-06-26
  done: false
  episode_len_mean: 36.90291262135922
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9266097013232777
  episode_reward_min: 1.3788000345230103
  episodes_this_iter: 103
  episodes_total: 205331
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0243911370635033
          entropy_coeff: 0.0
          kl: 0.019393458147533238
          policy_loss: -0.11409352615009993
          total_loss: -0.0690238889073953
          vf_explained_var: 0.9035352468490601
          vf_loss: 0.025433763745240867
    num_agent_steps_sampled: 10264192
    num_steps_sampled: 10264192
    num_steps_trained: 10264192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1592,69669.6,10264192,1.92661,1.9812,1.3788,36.9029


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10268192
  custom_metrics: {}
  date: 2021-12-10_09-06-52
  done: false
  episode_len_mean: 51.43
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8198839993774891
  episode_reward_min: -2.0
  episodes_this_iter: 84
  episodes_total: 205415
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0225689299404621
          entropy_coeff: 0.0
          kl: 0.017443110584281385
          policy_loss: -0.09788929863134399
          total_loss: -0.021889524068683386
          vf_explained_var: 0.8040091395378113
          vf_loss: 0.05833862372674048
    num_agent_steps_sampled: 10268192
    num_steps_sampled: 10268192
    num_steps_trained: 10268192
  iterations_since_restore: 52
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1593,69695.2,10268192,1.81988,1.9812,-2,51.43


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10272192
  custom_metrics: {}
  date: 2021-12-10_09-07-18
  done: false
  episode_len_mean: 43.08
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.7624279952049255
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 205515
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9765115417540073
          entropy_coeff: 0.0
          kl: 0.016997278144117445
          policy_loss: -0.10092562565114349
          total_loss: -0.02720424730796367
          vf_explained_var: 0.8198100328445435
          vf_loss: 0.05651163402944803
    num_agent_steps_sampled: 10272192
    num_steps_sampled: 10272192
    num_steps_trained: 10272192
  iterations_since_restore: 53
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1594,69721,10272192,1.76243,1.9804,-2,43.08


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10276192
  custom_metrics: {}
  date: 2021-12-10_09-07-44
  done: false
  episode_len_mean: 38.04
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.885056004524231
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 205614
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9597020819783211
          entropy_coeff: 0.0
          kl: 0.01799564016982913
          policy_loss: -0.10569146974012256
          total_loss: -0.037437718943692744
          vf_explained_var: 0.7990577816963196
          vf_loss: 0.050033163744956255
    num_agent_steps_sampled: 10276192
    num_steps_sampled: 10276192
    num_steps_trained: 10276192
  iterations_since_restore: 54
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1595,69746.8,10276192,1.88506,1.9844,-2,38.04


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10280192
  custom_metrics: {}
  date: 2021-12-10_09-08-09
  done: false
  episode_len_mean: 53.91
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8534080028533935
  episode_reward_min: -2.0
  episodes_this_iter: 82
  episodes_total: 205696
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.030076652765274
          entropy_coeff: 0.0
          kl: 0.018735309888143092
          policy_loss: -0.10879721597302705
          total_loss: -0.03630373248597607
          vf_explained_var: 0.7990719079971313
          vf_loss: 0.05352397938258946
    num_agent_steps_sampled: 10280192
    num_steps_sampled: 10280192
    num_steps_trained: 10280192
  iterations_since_restore: 55
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1596,69772.5,10280192,1.85341,1.9788,-2,53.91


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10284192
  custom_metrics: {}
  date: 2021-12-10_09-08-35
  done: false
  episode_len_mean: 50.9
  episode_media: {}
  episode_reward_max: 1.9759999513626099
  episode_reward_mean: 1.820860004425049
  episode_reward_min: -2.0
  episodes_this_iter: 87
  episodes_total: 205783
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0288040973246098
          entropy_coeff: 0.0
          kl: 0.018534312723204494
          policy_loss: -0.10598927014507353
          total_loss: -0.041970656835474074
          vf_explained_var: 0.8578575253486633
          vf_loss: 0.04525261907838285
    num_agent_steps_sampled: 10284192
    num_steps_sampled: 10284192
    num_steps_trained: 10284192
  iterations_since_restore: 56
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1597,69798.1,10284192,1.82086,1.976,-2,50.9


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10288192
  custom_metrics: {}
  date: 2021-12-10_09-09-00
  done: false
  episode_len_mean: 50.69
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8990360069274903
  episode_reward_min: 1.0492000579833984
  episodes_this_iter: 90
  episodes_total: 205873
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.043224386870861
          entropy_coeff: 0.0
          kl: 0.019732439541257918
          policy_loss: -0.11500482074916363
          total_loss: -0.057317987724673
          vf_explained_var: 0.8508846759796143
          vf_loss: 0.037707740208134055
    num_agent_steps_sampled: 10288192
    num_steps_sampled: 10288192
    num_steps_trained: 10288192
  iterations_since_restore: 5

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1598,69823.5,10288192,1.89904,1.9792,1.0492,50.69


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10292192
  custom_metrics: {}
  date: 2021-12-10_09-09-26
  done: false
  episode_len_mean: 44.29
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8723999989032745
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 205972
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0001872703433037
          entropy_coeff: 0.0
          kl: 0.017945661209523678
          policy_loss: -0.10523948166519403
          total_loss: -0.038468703627586365
          vf_explained_var: 0.8090977072715759
          vf_loss: 0.04860079113859683
    num_agent_steps_sampled: 10292192
    num_steps_sampled: 10292192
    num_steps_trained: 10292192
  iterations_since_restore: 58
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1599,69849.1,10292192,1.8724,1.9824,-2,44.29


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10296192
  custom_metrics: {}
  date: 2021-12-10_09-09-51
  done: false
  episode_len_mean: 46.95145631067961
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8033475702248731
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 206075
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9148946590721607
          entropy_coeff: 0.0
          kl: 0.01723121089162305
          policy_loss: -0.10203052428551018
          total_loss: -0.035028801939915866
          vf_explained_var: 0.8360210657119751
          vf_loss: 0.04955512227024883
    num_agent_steps_sampled: 10296192
    num_steps_sampled: 10296192
    num_steps_trained: 10296192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1600,69874.3,10296192,1.80335,1.9796,-2,46.9515


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10300192
  custom_metrics: {}
  date: 2021-12-10_09-10-16
  done: false
  episode_len_mean: 36.22
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.928023998737335
  episode_reward_min: 1.6083999872207642
  episodes_this_iter: 100
  episodes_total: 206175
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9198273159563541
          entropy_coeff: 0.0
          kl: 0.01819896773668006
          policy_loss: -0.10834696760866791
          total_loss: -0.052886068006046116
          vf_explained_var: 0.8043245673179626
          vf_loss: 0.03703444404527545
    num_agent_steps_sampled: 10300192
    num_steps_sampled: 10300192
    num_steps_trained: 10300192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1601,69899.2,10300192,1.92802,1.9812,1.6084,36.22


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10304192
  custom_metrics: {}
  date: 2021-12-10_09-10-41
  done: false
  episode_len_mean: 37.9537037037037
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8523148132695093
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 206283
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9063196368515491
          entropy_coeff: 0.0
          kl: 0.017737780406605452
          policy_loss: -0.10199736570939422
          total_loss: -0.0335363085323479
          vf_explained_var: 0.7799409627914429
          vf_loss: 0.05050155520439148
    num_agent_steps_sampled: 10304192
    num_steps_sampled: 10304192
    num_steps_trained: 10304192
  iterations_since_restore: 6

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1602,69923.9,10304192,1.85231,1.9812,-2,37.9537


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10308192
  custom_metrics: {}
  date: 2021-12-10_09-11-06
  done: false
  episode_len_mean: 36.02
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8496879994869233
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 206377
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9508887641131878
          entropy_coeff: 0.0
          kl: 0.017477474932093173
          policy_loss: -0.09998917899793014
          total_loss: -0.030422948068007827
          vf_explained_var: 0.7579573392868042
          vf_loss: 0.051870284136384726
    num_agent_steps_sampled: 10308192
    num_steps_sampled: 10308192
    num_steps_trained: 10308192
  iterations_since_restore: 62
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1603,69948.8,10308192,1.84969,1.9776,-2,36.02


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10312192
  custom_metrics: {}
  date: 2021-12-10_09-11-30
  done: false
  episode_len_mean: 44.67326732673267
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8731168263619489
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 206478
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8944362252950668
          entropy_coeff: 0.0
          kl: 0.01783864840399474
          policy_loss: -0.09893077588640153
          total_loss: -0.04422811963013373
          vf_explained_var: 0.8231831789016724
          vf_loss: 0.03664102510083467
    num_agent_steps_sampled: 10312192
    num_steps_sampled: 10312192
    num_steps_trained: 10312192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1604,69973.3,10312192,1.87312,1.9836,-2,44.6733


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10316192
  custom_metrics: {}
  date: 2021-12-10_09-11-55
  done: false
  episode_len_mean: 35.72222222222222
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8674444434819397
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 206586
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9566642381250858
          entropy_coeff: 0.0
          kl: 0.01704038237221539
          policy_loss: -0.09748198336455971
          total_loss: -0.04453683434985578
          vf_explained_var: 0.8576451539993286
          vf_loss: 0.035691759549081326
    num_agent_steps_sampled: 10316192
    num_steps_sampled: 10316192
    num_steps_trained: 10316192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1605,69998.3,10316192,1.86744,1.9784,-2,35.7222


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10320192
  custom_metrics: {}
  date: 2021-12-10_09-12-20
  done: false
  episode_len_mean: 39.131147540983605
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8896360622077693
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 206708
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8399594984948635
          entropy_coeff: 0.0
          kl: 0.0181226214626804
          policy_loss: -0.10113312001340091
          total_loss: -0.03461323241936043
          vf_explained_var: 0.6620652675628662
          vf_loss: 0.04817073477897793
    num_agent_steps_sampled: 10320192
    num_steps_sampled: 10320192
    num_steps_trained: 10320192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1606,70023.1,10320192,1.88964,1.98,-2,39.1311


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10324192
  custom_metrics: {}
  date: 2021-12-10_09-12-46
  done: false
  episode_len_mean: 35.24
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8920599961280822
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 206807
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8783483542501926
          entropy_coeff: 0.0
          kl: 0.018498032703064382
          policy_loss: -0.10505446679599117
          total_loss: -0.04942427412606776
          vf_explained_var: 0.8263772130012512
          vf_loss: 0.03690093324985355
    num_agent_steps_sampled: 10324192
    num_steps_sampled: 10324192
    num_steps_trained: 10324192
  iterations_since_restore: 66
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1607,70048.6,10324192,1.89206,1.978,-2,35.24


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10328192
  custom_metrics: {}
  date: 2021-12-10_09-13-11
  done: false
  episode_len_mean: 39.179245283018865
  episode_media: {}
  episode_reward_max: 1.9759999513626099
  episode_reward_mean: 1.8143169778697896
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 206913
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8735126964747906
          entropy_coeff: 0.0
          kl: 0.01683844323270023
          policy_loss: -0.09844772546784952
          total_loss: -0.032829093281179667
          vf_explained_var: 0.7919400930404663
          vf_loss: 0.04856970836408436
    num_agent_steps_sampled: 10328192
    num_steps_sampled: 10328192
    num_steps_trained: 10328192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1608,70074,10328192,1.81432,1.976,-2,39.1792


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10332192
  custom_metrics: {}
  date: 2021-12-10_09-13-36
  done: false
  episode_len_mean: 38.57425742574257
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8843881186872427
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 207014
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9144485555589199
          entropy_coeff: 0.0
          kl: 0.017690044012852013
          policy_loss: -0.10022090718848631
          total_loss: -0.03730218287091702
          vf_explained_var: 0.7692347764968872
          vf_loss: 0.045007559936493635
    num_agent_steps_sampled: 10332192
    num_steps_sampled: 10332192
    num_steps_trained: 10332192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1609,70099.2,10332192,1.88439,1.9812,-2,38.5743


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10336192
  custom_metrics: {}
  date: 2021-12-10_09-14-02
  done: false
  episode_len_mean: 32.04854368932039
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9362834911901974
  episode_reward_min: 1.7095999717712402
  episodes_this_iter: 103
  episodes_total: 207117
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9119597412645817
          entropy_coeff: 0.0
          kl: 0.019697417388670146
          policy_loss: -0.11479335615877062
          total_loss: -0.05887799640186131
          vf_explained_var: 0.8098711967468262
          vf_loss: 0.03597172081936151
    num_agent_steps_sampled: 10336192
    num_steps_sampled: 10336192
    num_steps_trained: 10336192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1610,70124.4,10336192,1.93628,1.9804,1.7096,32.0485


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10340192
  custom_metrics: {}
  date: 2021-12-10_09-14-26
  done: false
  episode_len_mean: 38.49122807017544
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.7856526291161252
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 207231
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.905622486025095
          entropy_coeff: 0.0
          kl: 0.016255410271696746
          policy_loss: -0.08943236881168559
          total_loss: -0.023038008192088455
          vf_explained_var: 0.828315794467926
          vf_loss: 0.04993575869593769
    num_agent_steps_sampled: 10340192
    num_steps_sampled: 10340192
    num_steps_trained: 10340192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1611,70149,10340192,1.78565,1.9812,-2,38.4912


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10344192
  custom_metrics: {}
  date: 2021-12-10_09-14-52
  done: false
  episode_len_mean: 32.53846153846154
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8609807651776533
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 207335
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9554404355585575
          entropy_coeff: 0.0
          kl: 0.017190580547321588
          policy_loss: -0.09839001472573727
          total_loss: -0.02796167932683602
          vf_explained_var: 0.8105233311653137
          vf_loss: 0.05302287032827735
    num_agent_steps_sampled: 10344192
    num_steps_sampled: 10344192
    num_steps_trained: 10344192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1612,70174.3,10344192,1.86098,1.9776,-2,32.5385


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10348192
  custom_metrics: {}
  date: 2021-12-10_09-15-17
  done: false
  episode_len_mean: 41.91
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8775919896364213
  episode_reward_min: -2.0
  episodes_this_iter: 85
  episodes_total: 207420
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0229069702327251
          entropy_coeff: 0.0
          kl: 0.018119058979209512
          policy_loss: -0.10478389577474445
          total_loss: -0.041504608117975295
          vf_explained_var: 0.8553804755210876
          vf_loss: 0.04493373539298773
    num_agent_steps_sampled: 10348192
    num_steps_sampled: 10348192
    num_steps_trained: 10348192
  iterations_since_restore: 72
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1613,70199.6,10348192,1.87759,1.9812,-2,41.91


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10352192
  custom_metrics: {}
  date: 2021-12-10_09-15-42
  done: false
  episode_len_mean: 40.63
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8025720024108887
  episode_reward_min: -2.0
  episodes_this_iter: 85
  episodes_total: 207505
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0495055988430977
          entropy_coeff: 0.0
          kl: 0.015872005955316126
          policy_loss: -0.09353640826884657
          total_loss: -0.014751561568118632
          vf_explained_var: 0.8039981722831726
          vf_loss: 0.06271443888545036
    num_agent_steps_sampled: 10352192
    num_steps_sampled: 10352192
    num_steps_trained: 10352192
  iterations_since_restore: 73
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1614,70224.7,10352192,1.80257,1.9812,-2,40.63


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10356192
  custom_metrics: {}
  date: 2021-12-10_09-16-07
  done: false
  episode_len_mean: 53.77
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.7463039970397949
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 207595
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9456758238375187
          entropy_coeff: 0.0
          kl: 0.01643817056901753
          policy_loss: -0.09941264591179788
          total_loss: -0.015048312314320356
          vf_explained_var: 0.8044264316558838
          vf_loss: 0.06772068375721574
    num_agent_steps_sampled: 10356192
    num_steps_sampled: 10356192
    num_steps_trained: 10356192
  iterations_since_restore: 74
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1615,70249.7,10356192,1.7463,1.9812,-2,53.77


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10360192
  custom_metrics: {}
  date: 2021-12-10_09-16-32
  done: false
  episode_len_mean: 47.84070796460177
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8118831889819256
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 207708
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8666496202349663
          entropy_coeff: 0.0
          kl: 0.01837996463291347
          policy_loss: -0.1070659170509316
          total_loss: -0.03022242954466492
          vf_explained_var: 0.8272807598114014
          vf_loss: 0.05823377170599997
    num_agent_steps_sampled: 10360192
    num_steps_sampled: 10360192
    num_steps_trained: 10360192
  iterations_since_restore: 7

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1616,70274.7,10360192,1.81188,1.9788,-2,47.8407


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10364192
  custom_metrics: {}
  date: 2021-12-10_09-16-57
  done: false
  episode_len_mean: 38.03
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.7705039954185486
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 207805
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9258463196456432
          entropy_coeff: 0.0
          kl: 0.017829769931267947
          policy_loss: -0.10289842807105742
          total_loss: -0.017895250057335943
          vf_explained_var: 0.8350073099136353
          vf_loss: 0.06695053284056485
    num_agent_steps_sampled: 10364192
    num_steps_sampled: 10364192
    num_steps_trained: 10364192
  iterations_since_restore: 76
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1617,70299.7,10364192,1.7705,1.984,-2,38.03


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10368192
  custom_metrics: {}
  date: 2021-12-10_09-17-22
  done: false
  episode_len_mean: 43.74
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8735079956054688
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 207905
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.881435714662075
          entropy_coeff: 0.0
          kl: 0.018470190349034965
          policy_loss: -0.10486919776303694
          total_loss: -0.04442989616654813
          vf_explained_var: 0.7845625281333923
          vf_loss: 0.04173823073506355
    num_agent_steps_sampled: 10368192
    num_steps_sampled: 10368192
    num_steps_trained: 10368192
  iterations_since_restore: 77
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1618,70324.9,10368192,1.87351,1.9816,-2,43.74


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10372192
  custom_metrics: {}
  date: 2021-12-10_09-17-47
  done: false
  episode_len_mean: 35.66
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8115880036354064
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 208003
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9421909563243389
          entropy_coeff: 0.0
          kl: 0.018450599221978337
          policy_loss: -0.10257404710864648
          total_loss: -0.03129550983430818
          vf_explained_var: 0.809911847114563
          vf_loss: 0.0525973082985729
    num_agent_steps_sampled: 10372192
    num_steps_sampled: 10372192
    num_steps_trained: 10372192
  iterations_since_restore: 78
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1619,70349.7,10372192,1.81159,1.9824,-2,35.66


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10376192
  custom_metrics: {}
  date: 2021-12-10_09-18-11
  done: false
  episode_len_mean: 39.29
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9218679940700532
  episode_reward_min: 1.3703999519348145
  episodes_this_iter: 82
  episodes_total: 208085
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9909387305378914
          entropy_coeff: 0.0
          kl: 0.018860741052776575
          policy_loss: -0.10718044592067599
          total_loss: -0.043011497939005494
          vf_explained_var: 0.8444575071334839
          vf_loss: 0.045072443783283234
    num_agent_steps_sampled: 10376192
    num_steps_sampled: 10376192
    num_steps_trained: 10376192
  iterations_since_resto

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1620,70373.9,10376192,1.92187,1.9788,1.3704,39.29


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10380192
  custom_metrics: {}
  date: 2021-12-10_09-18-37
  done: false
  episode_len_mean: 36.54
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8904119956493377
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 208184
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9494275562465191
          entropy_coeff: 0.0
          kl: 0.019551653414964676
          policy_loss: -0.11009720235597342
          total_loss: -0.050911313970573246
          vf_explained_var: 0.8427974581718445
          vf_loss: 0.03938983927946538
    num_agent_steps_sampled: 10380192
    num_steps_sampled: 10380192
    num_steps_trained: 10380192
  iterations_since_restore: 80
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1621,70399,10380192,1.89041,1.984,-2,36.54


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10384192
  custom_metrics: {}
  date: 2021-12-10_09-19-02
  done: false
  episode_len_mean: 36.41
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8521200108528137
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 208276
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0251688845455647
          entropy_coeff: 0.0
          kl: 0.017166134261060506
          policy_loss: -0.09885646589100361
          total_loss: -0.0251679579669144
          vf_explained_var: 0.8075134754180908
          vf_loss: 0.056307798251509666
    num_agent_steps_sampled: 10384192
    num_steps_sampled: 10384192
    num_steps_trained: 10384192
  iterations_since_restore: 81

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1622,70424.3,10384192,1.85212,1.9812,-2,36.41


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10388192
  custom_metrics: {}
  date: 2021-12-10_09-19-27
  done: false
  episode_len_mean: 50.5
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8779919955134392
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 208374
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9494244083762169
          entropy_coeff: 0.0
          kl: 0.017372731410432607
          policy_loss: -0.10552154330071062
          total_loss: -0.040831127553246915
          vf_explained_var: 0.8562028408050537
          vf_loss: 0.04710052371956408
    num_agent_steps_sampled: 10388192
    num_steps_sampled: 10388192
    num_steps_trained: 10388192
  iterations_since_restore: 82

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1623,70449.1,10388192,1.87799,1.9824,-2,50.5


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10392192
  custom_metrics: {}
  date: 2021-12-10_09-19-52
  done: false
  episode_len_mean: 41.18
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8101719951629638
  episode_reward_min: -2.0
  episodes_this_iter: 91
  episodes_total: 208465
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.026915218681097
          entropy_coeff: 0.0
          kl: 0.017014940502122045
          policy_loss: -0.10599433339666575
          total_loss: -0.03886832529678941
          vf_explained_var: 0.8493719100952148
          vf_loss: 0.04989837878383696
    num_agent_steps_sampled: 10392192
    num_steps_sampled: 10392192
    num_steps_trained: 10392192
  iterations_since_restore: 83

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1624,70474.7,10392192,1.81017,1.9832,-2,41.18


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10396192
  custom_metrics: {}
  date: 2021-12-10_09-20-17
  done: false
  episode_len_mean: 49.75
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8640360006690024
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 208557
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0050458684563637
          entropy_coeff: 0.0
          kl: 0.01753035158617422
          policy_loss: -0.10431252513080835
          total_loss: -0.044515545552712865
          vf_explained_var: 0.8568140268325806
          vf_loss: 0.04204749711789191
    num_agent_steps_sampled: 10396192
    num_steps_sampled: 10396192
    num_steps_trained: 10396192
  iterations_since_restore: 84

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1625,70499.6,10396192,1.86404,1.9832,-2,49.75


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10400192
  custom_metrics: {}
  date: 2021-12-10_09-20-42
  done: false
  episode_len_mean: 38.79
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8834480023384095
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 208657
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9362994842231274
          entropy_coeff: 0.0
          kl: 0.018692005309276283
          policy_loss: -0.10468729125568643
          total_loss: -0.03685916549875401
          vf_explained_var: 0.8046112656593323
          vf_loss: 0.04890246735885739
    num_agent_steps_sampled: 10400192
    num_steps_sampled: 10400192
    num_steps_trained: 10400192
  iterations_since_restore: 85

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1626,70524.2,10400192,1.88345,1.9824,-2,38.79


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10404192
  custom_metrics: {}
  date: 2021-12-10_09-21-07
  done: false
  episode_len_mean: 38.78
  episode_media: {}
  episode_reward_max: 1.9759999513626099
  episode_reward_mean: 1.8468359971046449
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 208757
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9311890080571175
          entropy_coeff: 0.0
          kl: 0.016639798181131482
          policy_loss: -0.10018283082172275
          total_loss: -0.032239516731351614
          vf_explained_var: 0.8481109142303467
          vf_loss: 0.05109551758505404
    num_agent_steps_sampled: 10404192
    num_steps_sampled: 10404192
    num_steps_trained: 10404192
  iterations_since_restore: 86

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1627,70549.4,10404192,1.84684,1.976,-2,38.78


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10408192
  custom_metrics: {}
  date: 2021-12-10_09-21-33
  done: false
  episode_len_mean: 34.23
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8559799993038177
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 208854
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9615863673388958
          entropy_coeff: 0.0
          kl: 0.017261244589462876
          policy_loss: -0.09880335966590792
          total_loss: -0.04435661993920803
          vf_explained_var: 0.8958827257156372
          vf_loss: 0.036969736218452454
    num_agent_steps_sampled: 10408192
    num_steps_sampled: 10408192
    num_steps_trained: 10408192
  iterations_since_restore: 87

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1628,70574.8,10408192,1.85598,1.9808,-2,34.23


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10412192
  custom_metrics: {}
  date: 2021-12-10_09-21-58
  done: false
  episode_len_mean: 40.333333333333336
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.850986665203458
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 208959
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0039855651557446
          entropy_coeff: 0.0
          kl: 0.017008475144393742
          policy_loss: -0.1036627150606364
          total_loss: -0.030483825656119734
          vf_explained_var: 0.8438048362731934
          vf_loss: 0.05595781002193689
    num_agent_steps_sampled: 10412192
    num_steps_sampled: 10412192
    num_steps_trained: 10412192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1629,70600,10412192,1.85099,1.9824,-2,40.3333


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10416192
  custom_metrics: {}
  date: 2021-12-10_09-22-23
  done: false
  episode_len_mean: 48.51
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.7913879942893982
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 209053
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0079223588109016
          entropy_coeff: 0.0
          kl: 0.016580213676206768
          policy_loss: -0.09835097345057875
          total_loss: -0.025444941129535437
          vf_explained_var: 0.8345260620117188
          vf_loss: 0.05611856281757355
    num_agent_steps_sampled: 10416192
    num_steps_sampled: 10416192
    num_steps_trained: 10416192
  iterations_since_restore: 89

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1630,70624.8,10416192,1.79139,1.9844,-2,48.51


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10420192
  custom_metrics: {}
  date: 2021-12-10_09-22-48
  done: false
  episode_len_mean: 43.1
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8358839970827103
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 209151
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9258433654904366
          entropy_coeff: 0.0
          kl: 0.017492674582172185
          policy_loss: -0.10255464259535074
          total_loss: -0.031915762927383184
          vf_explained_var: 0.830201268196106
          vf_loss: 0.05292754713445902
    num_agent_steps_sampled: 10420192
    num_steps_sampled: 10420192
    num_steps_trained: 10420192
  iterations_since_restore: 90

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1631,70649.8,10420192,1.83588,1.9832,-2,43.1


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10424192
  custom_metrics: {}
  date: 2021-12-10_09-23-12
  done: false
  episode_len_mean: 47.823008849557525
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.805511506257859
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 209264
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8532161302864552
          entropy_coeff: 0.0
          kl: 0.016706228896509856
          policy_loss: -0.09202884207479656
          total_loss: -0.020185315108392388
          vf_explained_var: 0.7920789122581482
          vf_loss: 0.054928472032770514
    num_agent_steps_sampled: 10424192
    num_steps_sampled: 10424192
    num_steps_trained: 10424192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1632,70674.3,10424192,1.80551,1.9804,-2,47.823


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10428192
  custom_metrics: {}
  date: 2021-12-10_09-23-37
  done: false
  episode_len_mean: 34.23762376237624
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.895100994865493
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 209365
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8845044262707233
          entropy_coeff: 0.0
          kl: 0.01765743369469419
          policy_loss: -0.10096211405470967
          total_loss: -0.03547451499616727
          vf_explained_var: 0.7707455158233643
          vf_loss: 0.04760944494046271
    num_agent_steps_sampled: 10428192
    num_steps_sampled: 10428192
    num_steps_trained: 10428192
  iterations_since_restore: 92

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1633,70699.5,10428192,1.8951,1.9796,-2,34.2376


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10432192
  custom_metrics: {}
  date: 2021-12-10_09-24-02
  done: false
  episode_len_mean: 35.98
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.890180002450943
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 209464
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9669443145394325
          entropy_coeff: 0.0
          kl: 0.01836165983695537
          policy_loss: -0.1090938022825867
          total_loss: -0.048506437335163355
          vf_explained_var: 0.8457437753677368
          vf_loss: 0.041996183106675744
    num_agent_steps_sampled: 10432192
    num_steps_sampled: 10432192
    num_steps_trained: 10432192
  iterations_since_restore: 93

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1634,70723.9,10432192,1.89018,1.9788,-2,35.98


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10436192
  custom_metrics: {}
  date: 2021-12-10_09-24-26
  done: false
  episode_len_mean: 46.92
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8006799972057344
  episode_reward_min: -2.0
  episodes_this_iter: 74
  episodes_total: 209538
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0200142562389374
          entropy_coeff: 0.0
          kl: 0.01743920345325023
          policy_loss: -0.09664619015529752
          total_loss: -0.04122321668546647
          vf_explained_var: 0.8911008238792419
          vf_loss: 0.03776577580720186
    num_agent_steps_sampled: 10436192
    num_steps_sampled: 10436192
    num_steps_trained: 10436192
  iterations_since_restore: 94

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1635,70748.1,10436192,1.80068,1.978,-2,46.92


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10440192
  custom_metrics: {}
  date: 2021-12-10_09-24-51
  done: false
  episode_len_mean: 40.01
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8471639990806579
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 209633
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0038241297006607
          entropy_coeff: 0.0
          kl: 0.017056215787306428
          policy_loss: -0.09870739886537194
          total_loss: -0.018702320754528046
          vf_explained_var: 0.7747130393981934
          vf_loss: 0.06273565883748233
    num_agent_steps_sampled: 10440192
    num_steps_sampled: 10440192
    num_steps_trained: 10440192
  iterations_since_restore: 95

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1636,70773.3,10440192,1.84716,1.9804,-2,40.01


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10444192
  custom_metrics: {}
  date: 2021-12-10_09-25-17
  done: false
  episode_len_mean: 45.6
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8797479939460755
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 209725
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9609345309436321
          entropy_coeff: 0.0
          kl: 0.018067290307953954
          policy_loss: -0.10887062526308
          total_loss: -0.047006659442558885
          vf_explained_var: 0.8283370733261108
          vf_loss: 0.04357084142975509
    num_agent_steps_sampled: 10444192
    num_steps_sampled: 10444192
    num_steps_trained: 10444192
  iterations_since_restore: 96

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1637,70798.5,10444192,1.87975,1.9824,-2,45.6


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10448192
  custom_metrics: {}
  date: 2021-12-10_09-25-42
  done: false
  episode_len_mean: 41.43
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.84414400100708
  episode_reward_min: -2.0
  episodes_this_iter: 86
  episodes_total: 209811
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.012354027479887
          entropy_coeff: 0.0
          kl: 0.017716985486913472
          policy_loss: -0.10450372751802206
          total_loss: -0.03796194662572816
          vf_explained_var: 0.8345219492912292
          vf_loss: 0.04860333143733442
    num_agent_steps_sampled: 10448192
    num_steps_sampled: 10448192
    num_steps_trained: 10448192
  iterations_since_restore: 97

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1638,70824.1,10448192,1.84414,1.9828,-2,41.43


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10452192
  custom_metrics: {}
  date: 2021-12-10_09-26-07
  done: false
  episode_len_mean: 46.27
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.868708000779152
  episode_reward_min: -2.0
  episodes_this_iter: 70
  episodes_total: 209881
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.039171777665615
          entropy_coeff: 0.0
          kl: 0.018119300541002303
          policy_loss: -0.10477075050584972
          total_loss: -0.03378776920726523
          vf_explained_var: 0.8387781381607056
          vf_loss: 0.052637192187830806
    num_agent_steps_sampled: 10452192
    num_steps_sampled: 10452192
    num_steps_trained: 10452192
  iterations_since_restore: 98

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1639,70849.2,10452192,1.86871,1.9828,-2,46.27


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10456192
  custom_metrics: {}
  date: 2021-12-10_09-26-33
  done: false
  episode_len_mean: 44.89
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8329479986429214
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 209971
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0682315155863762
          entropy_coeff: 0.0
          kl: 0.016584120166953653
          policy_loss: -0.10053468216210604
          total_loss: -0.039010753636830486
          vf_explained_var: 0.8811749219894409
          vf_loss: 0.04473250324372202
    num_agent_steps_sampled: 10456192
    num_steps_sampled: 10456192
    num_steps_trained: 10456192
  iterations_since_restore: 99

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1640,70874.8,10456192,1.83295,1.9812,-2,44.89


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10460192
  custom_metrics: {}
  date: 2021-12-10_09-26-58
  done: false
  episode_len_mean: 42.1
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8068000066280365
  episode_reward_min: -2.0
  episodes_this_iter: 84
  episodes_total: 210055
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 1.0371017716825008
          entropy_coeff: 0.0
          kl: 0.016387822572141886
          policy_loss: -0.09788023977307603
          total_loss: -0.03029841574607417
          vf_explained_var: 0.8563294410705566
          vf_loss: 0.05098915146663785
    num_agent_steps_sampled: 10460192
    num_steps_sampled: 10460192
    num_steps_trained: 10460192
  iterations_since_restore: 100

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1641,70899.9,10460192,1.8068,1.9796,-2,42.1


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10464192
  custom_metrics: {}
  date: 2021-12-10_09-27-23
  done: false
  episode_len_mean: 52.52
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.7787959963083266
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 210149
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9571760334074497
          entropy_coeff: 0.0
          kl: 0.017253829399123788
          policy_loss: -0.09641943374299444
          total_loss: -0.03208223916590214
          vf_explained_var: 0.8649078607559204
          vf_loss: 0.046867697266861796
    num_agent_steps_sampled: 10464192
    num_steps_sampled: 10464192
    num_steps_trained: 10464192
  iterations_since_restore: 101

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1642,70924.6,10464192,1.7788,1.9836,-2,52.52


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10468192
  custom_metrics: {}
  date: 2021-12-10_09-27-49
  done: false
  episode_len_mean: 37.15
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8122399997711183
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 210245
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9455718472599983
          entropy_coeff: 0.0
          kl: 0.017714884073939174
          policy_loss: -0.10032280249288306
          total_loss: -0.033225468127056956
          vf_explained_var: 0.8479620814323425
          vf_loss: 0.04916101321578026
    num_agent_steps_sampled: 10468192
    num_steps_sampled: 10468192
    num_steps_trained: 10468192
  iterations_since_restore: 102

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1643,70950.6,10468192,1.81224,1.9804,-2,37.15


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10472192
  custom_metrics: {}
  date: 2021-12-10_09-28-15
  done: false
  episode_len_mean: 34.07
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8568800008296966
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 210344
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9435191452503204
          entropy_coeff: 0.0
          kl: 0.01858566584996879
          policy_loss: -0.10489385877735913
          total_loss: -0.02691015269374475
          vf_explained_var: 0.7913970947265625
          vf_loss: 0.05916571640409529
    num_agent_steps_sampled: 10472192
    num_steps_sampled: 10472192
    num_steps_trained: 10472192
  iterations_since_restore: 103

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1644,70976.8,10472192,1.85688,1.9844,-2,34.07


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10476192
  custom_metrics: {}
  date: 2021-12-10_09-28-42
  done: false
  episode_len_mean: 53.11009174311926
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8587229301759955
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 210453
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9221112206578255
          entropy_coeff: 0.0
          kl: 0.017941987549420446
          policy_loss: -0.10921506304293871
          total_loss: -0.03673617460299283
          vf_explained_var: 0.8166161179542542
          vf_loss: 0.054312625201418996
    num_agent_steps_sampled: 10476192
    num_steps_sampled: 10476192
    num_steps_trained: 10476192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1645,71003.6,10476192,1.85872,1.9812,-2,53.1101


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10480192
  custom_metrics: {}
  date: 2021-12-10_09-29-08
  done: false
  episode_len_mean: 47.36
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8056280016899109
  episode_reward_min: -2.0
  episodes_this_iter: 89
  episodes_total: 210542
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.951091818511486
          entropy_coeff: 0.0
          kl: 0.017392038425896317
          policy_loss: -0.09789074017317034
          total_loss: -0.026813956210389733
          vf_explained_var: 0.7983102202415466
          vf_loss: 0.053467341465875506
    num_agent_steps_sampled: 10480192
    num_steps_sampled: 10480192
    num_steps_trained: 10480192
  iterations_since_restore: 105

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1646,71029.3,10480192,1.80563,1.978,-2,47.36


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10484192
  custom_metrics: {}
  date: 2021-12-10_09-29-33
  done: false
  episode_len_mean: 39.820754716981135
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8845169769143157
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 210648
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9223161526024342
          entropy_coeff: 0.0
          kl: 0.018566133570857346
          policy_loss: -0.10962733277119696
          total_loss: -0.04423010390019044
          vf_explained_var: 0.7748287320137024
          vf_loss: 0.046599023742601275
    num_agent_steps_sampled: 10484192
    num_steps_sampled: 10484192
    num_steps_trained: 10484192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1647,71054.6,10484192,1.88452,1.9832,-2,39.8208


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10488192
  custom_metrics: {}
  date: 2021-12-10_09-29-59
  done: false
  episode_len_mean: 36.0655737704918
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8959409829045906
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 210770
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.8540163189172745
          entropy_coeff: 0.0
          kl: 0.01880463445559144
          policy_loss: -0.10522191689233296
          total_loss: -0.04924220882821828
          vf_explained_var: 0.7841112613677979
          vf_loss: 0.036940018413588405
    num_agent_steps_sampled: 10488192
    num_steps_sampled: 10488192
    num_steps_trained: 10488192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1648,71080.2,10488192,1.89594,1.9844,-2,36.0656


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10492192
  custom_metrics: {}
  date: 2021-12-10_09-30-24
  done: false
  episode_len_mean: 37.49
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9254120016098022
  episode_reward_min: 1.5504000186920166
  episodes_this_iter: 99
  episodes_total: 210869
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.0125000000000002
          cur_lr: 0.0003
          entropy: 0.9093046821653843
          entropy_coeff: 0.0
          kl: 0.020385990967042744
          policy_loss: -0.1138048178399913
          total_loss: -0.06120488164015114
          vf_explained_var: 0.7787952423095703
          vf_loss: 0.03195911936927587
    num_agent_steps_sampled: 10492192
    num_steps_sampled: 10492192
    num_steps_trained: 10492192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1649,71105.5,10492192,1.92541,1.9828,1.5504,37.49


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10496192
  custom_metrics: {}
  date: 2021-12-10_09-30-50
  done: false
  episode_len_mean: 31.719298245614034
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9022315761499238
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 210983
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8976849429309368
          entropy_coeff: 0.0
          kl: 0.013699192903004587
          policy_loss: -0.09705056226812303
          total_loss: -0.034436454909155145
          vf_explained_var: 0.7826464772224426
          vf_loss: 0.04180845874361694
    num_agent_steps_sampled: 10496192
    num_steps_sampled: 10496192
    num_steps_trained: 10496192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1650,71131.3,10496192,1.90223,1.98,-2,31.7193


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10500192
  custom_metrics: {}
  date: 2021-12-10_09-31-15
  done: false
  episode_len_mean: 37.92
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8080159974098207
  episode_reward_min: -2.0
  episodes_this_iter: 91
  episodes_total: 211074
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9897869192063808
          entropy_coeff: 0.0
          kl: 0.013642790785524994
          policy_loss: -0.09475233382545412
          total_loss: -0.008973745862022042
          vf_explained_var: 0.7848861813545227
          vf_loss: 0.06505859876051545
    num_agent_steps_sampled: 10500192
    num_steps_sampled: 10500192
    num_steps_trained: 10500192
  iterations_since_restore: 110
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1651,71156.9,10500192,1.80802,1.9828,-2,37.92


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10504192
  custom_metrics: {}
  date: 2021-12-10_09-31-41
  done: false
  episode_len_mean: 37.93
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.806404002904892
  episode_reward_min: -2.0
  episodes_this_iter: 83
  episodes_total: 211157
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0170506164431572
          entropy_coeff: 0.0
          kl: 0.013786862778943032
          policy_loss: -0.09947266895323992
          total_loss: -0.02267722727265209
          vf_explained_var: 0.8082572221755981
          vf_loss: 0.055856643710285425
    num_agent_steps_sampled: 10504192
    num_steps_sampled: 10504192
    num_steps_trained: 10504192
  iterations_since_restore: 111
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1652,71182.7,10504192,1.8064,1.9828,-2,37.93


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10508192
  custom_metrics: {}
  date: 2021-12-10_09-32-07
  done: false
  episode_len_mean: 47.1
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8294519996643066
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 211251
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9619227647781372
          entropy_coeff: 0.0
          kl: 0.014617195585742593
          policy_loss: -0.10902484669350088
          total_loss: -0.04575526562985033
          vf_explained_var: 0.8256763219833374
          vf_loss: 0.041069717379286885
    num_agent_steps_sampled: 10508192
    num_steps_sampled: 10508192
    num_steps_trained: 10508192
  iterations_since_restore: 112
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1653,71208,10508192,1.82945,1.9848,-2,47.1


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10512192
  custom_metrics: {}
  date: 2021-12-10_09-32-33
  done: false
  episode_len_mean: 43.23
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9138919985294343
  episode_reward_min: 1.1139999628067017
  episodes_this_iter: 87
  episodes_total: 211338
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0234736688435078
          entropy_coeff: 0.0
          kl: 0.014255486137699336
          policy_loss: -0.10294889053329825
          total_loss: -0.041104378855379764
          vf_explained_var: 0.8242684602737427
          vf_loss: 0.04019399266690016
    num_agent_steps_sampled: 10512192
    num_steps_sampled: 10512192
    num_steps_trained: 10512192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1654,71233.9,10512192,1.91389,1.9808,1.114,43.23


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10516192
  custom_metrics: {}
  date: 2021-12-10_09-32-57
  done: false
  episode_len_mean: 53.17
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.817352003864944
  episode_reward_min: -2.0
  episodes_this_iter: 83
  episodes_total: 211421
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.986194159835577
          entropy_coeff: 0.0
          kl: 0.013391020183917135
          policy_loss: -0.0915029151365161
          total_loss: -0.013866652268916368
          vf_explained_var: 0.7720463275909424
          vf_loss: 0.05729865236207843
    num_agent_steps_sampled: 10516192
    num_steps_sampled: 10516192
    num_steps_trained: 10516192
  iterations_since_restore: 114
  node_ip: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1655,71258.8,10516192,1.81735,1.9816,-2,53.17


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10520192
  custom_metrics: {}
  date: 2021-12-10_09-33-23
  done: false
  episode_len_mean: 41.083969465648856
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8611664089537758
  episode_reward_min: -2.0
  episodes_this_iter: 131
  episodes_total: 211552
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8230500631034374
          entropy_coeff: 0.0
          kl: 0.013664060039445758
          policy_loss: -0.09585019550286233
          total_loss: -0.0379827773431316
          vf_explained_var: 0.8136617541313171
          vf_loss: 0.03711513034068048
    num_agent_steps_sampled: 10520192
    num_steps_sampled: 10520192
    num_steps_trained: 10520192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1656,71284.3,10520192,1.86117,1.9844,-2,41.084


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10524192
  custom_metrics: {}
  date: 2021-12-10_09-33-49
  done: false
  episode_len_mean: 32.375
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.900714285671711
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 211664
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9107679538428783
          entropy_coeff: 0.0
          kl: 0.013133195927366614
          policy_loss: -0.09411758003989235
          total_loss: -0.031575804809108377
          vf_explained_var: 0.8169515132904053
          vf_loss: 0.042595733422786
    num_agent_steps_sampled: 10524192
    num_steps_sampled: 10524192
    num_steps_trained: 10524192
  iterations_since_restore: 116
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1657,71309.9,10524192,1.90071,1.982,-2,32.375


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10528192
  custom_metrics: {}
  date: 2021-12-10_09-34-14
  done: false
  episode_len_mean: 35.79090909090909
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9288109107451006
  episode_reward_min: 1.565999984741211
  episodes_this_iter: 110
  episodes_total: 211774
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8549363799393177
          entropy_coeff: 0.0
          kl: 0.014381026790942997
          policy_loss: -0.1010630518430844
          total_loss: -0.04347628087271005
          vf_explained_var: 0.7622597217559814
          vf_loss: 0.03574558824766427
    num_agent_steps_sampled: 10528192
    num_steps_sampled: 10528192
    num_steps_trained: 10528192
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1658,71335.2,10528192,1.92881,1.9804,1.566,35.7909


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10532192
  custom_metrics: {}
  date: 2021-12-10_09-34-39
  done: false
  episode_len_mean: 34.28181818181818
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9318909103220159
  episode_reward_min: 1.7259999513626099
  episodes_this_iter: 110
  episodes_total: 211884
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8957654163241386
          entropy_coeff: 0.0
          kl: 0.014665354217868298
          policy_loss: -0.10771054844371974
          total_loss: -0.04898567224154249
          vf_explained_var: 0.7489517331123352
          vf_loss: 0.03645187139045447
    num_agent_steps_sampled: 10532192
    num_steps_sampled: 10532192
    num_steps_trained: 10532192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1659,71360.1,10532192,1.93189,1.9816,1.726,34.2818


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10536192
  custom_metrics: {}
  date: 2021-12-10_09-35-04
  done: false
  episode_len_mean: 37.57798165137615
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8941467863704087
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 211993
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9246273636817932
          entropy_coeff: 0.0
          kl: 0.013341585698071867
          policy_loss: -0.09775520511902869
          total_loss: -0.04702279111370444
          vf_explained_var: 0.8308860063552856
          vf_loss: 0.03046987857669592
    num_agent_steps_sampled: 10536192
    num_steps_sampled: 10536192
    num_steps_trained: 10536192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1660,71385.1,10536192,1.89415,1.9844,-2,37.578


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10540192
  custom_metrics: {}
  date: 2021-12-10_09-35-29
  done: false
  episode_len_mean: 35.288288288288285
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9298414464469429
  episode_reward_min: 1.6059999465942383
  episodes_this_iter: 111
  episodes_total: 212104
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8554763868451118
          entropy_coeff: 0.0
          kl: 0.015522724948823452
          policy_loss: -0.1081768583972007
          total_loss: -0.05398417860851623
          vf_explained_var: 0.7441469430923462
          vf_loss: 0.03061754594091326
    num_agent_steps_sampled: 10540192
    num_steps_sampled: 10540192
    num_steps_trained: 10540192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1661,71410.1,10540192,1.92984,1.9828,1.606,35.2883


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10544192
  custom_metrics: {}
  date: 2021-12-10_09-35-54
  done: false
  episode_len_mean: 30.394495412844037
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9395119279896447
  episode_reward_min: 1.7136000394821167
  episodes_this_iter: 109
  episodes_total: 212213
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9294019863009453
          entropy_coeff: 0.0
          kl: 0.015249110583681613
          policy_loss: -0.10831396258436143
          total_loss: -0.056279047043062747
          vf_explained_var: 0.8013842105865479
          vf_loss: 0.028875328949652612
    num_agent_steps_sampled: 10544192
    num_steps_sampled: 10544192
    num_steps_trained: 10544192
  iteratio

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1662,71435.6,10544192,1.93951,1.9784,1.7136,30.3945


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10548192
  custom_metrics: {}
  date: 2021-12-10_09-36-21
  done: false
  episode_len_mean: 44.91509433962264
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8738565990385019
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 212319
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8926950134336948
          entropy_coeff: 0.0
          kl: 0.013076679257210344
          policy_loss: -0.09283093409612775
          total_loss: -0.04146964126266539
          vf_explained_var: 0.8009138107299805
          vf_loss: 0.031501089106313884
    num_agent_steps_sampled: 10548192
    num_steps_sampled: 10548192
    num_steps_trained: 10548192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1663,71461.9,10548192,1.87386,1.984,-2,44.9151


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10552192
  custom_metrics: {}
  date: 2021-12-10_09-36-46
  done: false
  episode_len_mean: 37.38
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8513640022277833
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 212417
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9515324421226978
          entropy_coeff: 0.0
          kl: 0.012830383377149701
          policy_loss: -0.08656947687268257
          total_loss: -0.029238884046208113
          vf_explained_var: 0.8307021856307983
          vf_loss: 0.03784445172641426
    num_agent_steps_sampled: 10552192
    num_steps_sampled: 10552192
    num_steps_trained: 10552192
  iterations_since_restore: 123
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1664,71487,10552192,1.85136,1.9828,-2,37.38


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10556192
  custom_metrics: {}
  date: 2021-12-10_09-37-10
  done: false
  episode_len_mean: 35.77049180327869
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.7373344258206789
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 212539
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9076492637395859
          entropy_coeff: 0.0
          kl: 0.011893722577951849
          policy_loss: -0.08895627001766115
          total_loss: -0.00199805069132708
          vf_explained_var: 0.7747594118118286
          vf_loss: 0.0688946321606636
    num_agent_steps_sampled: 10556192
    num_steps_sampled: 10556192
    num_steps_trained: 10556192
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1665,71511.4,10556192,1.73733,1.9832,-2,35.7705


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10560192
  custom_metrics: {}
  date: 2021-12-10_09-37-36
  done: false
  episode_len_mean: 36.63551401869159
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.927054210243938
  episode_reward_min: 1.5140000581741333
  episodes_this_iter: 107
  episodes_total: 212646
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9029506668448448
          entropy_coeff: 0.0
          kl: 0.014846742793451995
          policy_loss: -0.1029633367434144
          total_loss: -0.04380212223622948
          vf_explained_var: 0.7905449271202087
          vf_loss: 0.03661272244062275
    num_agent_steps_sampled: 10560192
    num_steps_sampled: 10560192
    num_steps_trained: 10560192
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1666,71536.9,10560192,1.92705,1.9816,1.514,36.6355


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10564192
  custom_metrics: {}
  date: 2021-12-10_09-38-02
  done: false
  episode_len_mean: 39.78846153846154
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8831692280677648
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 212750
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9169008508324623
          entropy_coeff: 0.0
          kl: 0.014507415413390845
          policy_loss: -0.10175326719763689
          total_loss: -0.02670900432849521
          vf_explained_var: 0.7146795392036438
          vf_loss: 0.05301112704910338
    num_agent_steps_sampled: 10564192
    num_steps_sampled: 10564192
    num_steps_trained: 10564192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1667,71563.2,10564192,1.88317,1.9844,-2,39.7885


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10568192
  custom_metrics: {}
  date: 2021-12-10_09-38-28
  done: false
  episode_len_mean: 39.07
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8459840047359466
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 212840
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9549549445509911
          entropy_coeff: 0.0
          kl: 0.012917960237246007
          policy_loss: -0.09706258590449579
          total_loss: -0.04017524575465359
          vf_explained_var: 0.8558579087257385
          vf_loss: 0.03726818738505244
    num_agent_steps_sampled: 10568192
    num_steps_sampled: 10568192
    num_steps_trained: 10568192
  iterations_since_restore: 127
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1668,71588.4,10568192,1.84598,1.9828,-2,39.07


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10572192
  custom_metrics: {}
  date: 2021-12-10_09-38-53
  done: false
  episode_len_mean: 42.61
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.7721919977664948
  episode_reward_min: -2.0
  episodes_this_iter: 88
  episodes_total: 212928
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9422258026897907
          entropy_coeff: 0.0
          kl: 0.013357654330320656
          policy_loss: -0.09370980467065237
          total_loss: -0.021993423521053046
          vf_explained_var: 0.8475742340087891
          vf_loss: 0.05142944469116628
    num_agent_steps_sampled: 10572192
    num_steps_sampled: 10572192
    num_steps_trained: 10572192
  iterations_since_restore: 128
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1669,71613.8,10572192,1.77219,1.9828,-2,42.61


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10576192
  custom_metrics: {}
  date: 2021-12-10_09-39-18
  done: false
  episode_len_mean: 44.491379310344826
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8442379318434616
  episode_reward_min: -2.0
  episodes_this_iter: 116
  episodes_total: 213044
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8390113338828087
          entropy_coeff: 0.0
          kl: 0.01379877096042037
          policy_loss: -0.08921265404205769
          total_loss: -0.013005410553887486
          vf_explained_var: 0.6700916290283203
          vf_loss: 0.05525036295875907
    num_agent_steps_sampled: 10576192
    num_steps_sampled: 10576192
    num_steps_trained: 10576192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1670,71638.7,10576192,1.84424,1.9816,-2,44.4914


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10580192
  custom_metrics: {}
  date: 2021-12-10_09-39-43
  done: false
  episode_len_mean: 29.614754098360656
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8124295043163612
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 213166
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9114642180502415
          entropy_coeff: 0.0
          kl: 0.012545582838356495
          policy_loss: -0.09007738088257611
          total_loss: 0.0008507876773364842
          vf_explained_var: 0.7594384551048279
          vf_loss: 0.07187456591054797
    num_agent_steps_sampled: 10580192
    num_steps_sampled: 10580192
    num_steps_trained: 10580192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1671,71663.4,10580192,1.81243,1.9788,-2,29.6148


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10584192
  custom_metrics: {}
  date: 2021-12-10_09-40-07
  done: false
  episode_len_mean: 32.61682242990654
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8293233597390006
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 213273
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9112251587212086
          entropy_coeff: 0.0
          kl: 0.012956984632182866
          policy_loss: -0.09563570294994861
          total_loss: -0.03326911048498005
          vf_explained_var: 0.8828557133674622
          vf_loss: 0.042688168468885124
    num_agent_steps_sampled: 10584192
    num_steps_sampled: 10584192
    num_steps_trained: 10584192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1672,71688.1,10584192,1.82932,1.9808,-2,32.6168


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10588192
  custom_metrics: {}
  date: 2021-12-10_09-40-33
  done: false
  episode_len_mean: 30.982456140350877
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9385473717722976
  episode_reward_min: 1.7580000162124634
  episodes_this_iter: 114
  episodes_total: 213387
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9313055276870728
          entropy_coeff: 0.0
          kl: 0.013436562614515424
          policy_loss: -0.10335214520455338
          total_loss: -0.04277188132982701
          vf_explained_var: 0.8547742366790771
          vf_loss: 0.04017348610796034
    num_agent_steps_sampled: 10588192
    num_steps_sampled: 10588192
    num_steps_trained: 10588192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1673,71713.6,10588192,1.93855,1.9848,1.758,30.9825


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10592192
  custom_metrics: {}
  date: 2021-12-10_09-40-58
  done: false
  episode_len_mean: 42.48543689320388
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8387611618319761
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 213490
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8930696733295918
          entropy_coeff: 0.0
          kl: 0.013310471375007182
          policy_loss: -0.0979460395174101
          total_loss: -0.02419984678272158
          vf_explained_var: 0.7700884342193604
          vf_loss: 0.053530913311988115
    num_agent_steps_sampled: 10592192
    num_steps_sampled: 10592192
    num_steps_trained: 10592192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1674,71738.7,10592192,1.83876,1.9816,-2,42.4854


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10596192
  custom_metrics: {}
  date: 2021-12-10_09-41-23
  done: false
  episode_len_mean: 38.19
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8071559977531433
  episode_reward_min: -2.0
  episodes_this_iter: 86
  episodes_total: 213576
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0367241725325584
          entropy_coeff: 0.0
          kl: 0.014109423907939345
          policy_loss: -0.10700873867608607
          total_loss: -0.03960062871919945
          vf_explained_var: 0.8421322107315063
          vf_loss: 0.04597942833788693
    num_agent_steps_sampled: 10596192
    num_steps_sampled: 10596192
    num_steps_trained: 10596192
  iterations_since_restore: 134

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1675,71763.2,10596192,1.80716,1.9824,-2,38.19


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10600192
  custom_metrics: {}
  date: 2021-12-10_09-41-47
  done: false
  episode_len_mean: 40.59
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.7661040019989014
  episode_reward_min: -2.0
  episodes_this_iter: 74
  episodes_total: 213650
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0635213404893875
          entropy_coeff: 0.0
          kl: 0.01242659252602607
          policy_loss: -0.09637788101099432
          total_loss: -0.020505732158198953
          vf_explained_var: 0.8259552121162415
          vf_loss: 0.0569992670789361
    num_agent_steps_sampled: 10600192
    num_steps_sampled: 10600192
    num_steps_trained: 10600192
  iterations_since_restore: 135

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1676,71787.5,10600192,1.7661,1.9816,-2,40.59


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10604192
  custom_metrics: {}
  date: 2021-12-10_09-42-11
  done: false
  episode_len_mean: 58.07
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8057479977607727
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 213746
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9201919063925743
          entropy_coeff: 0.0
          kl: 0.014344003575388342
          policy_loss: -0.1039360233116895
          total_loss: -0.031700186285888776
          vf_explained_var: 0.803146481513977
          vf_loss: 0.05045088077895343
    num_agent_steps_sampled: 10604192
    num_steps_sampled: 10604192
    num_steps_trained: 10604192
  iterations_since_restore: 136

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1677,71812.1,10604192,1.80575,1.9804,-2,58.07


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10608192
  custom_metrics: {}
  date: 2021-12-10_09-42-36
  done: false
  episode_len_mean: 40.06
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8419679999351501
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 213840
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9465737417340279
          entropy_coeff: 0.0
          kl: 0.014113152225036174
          policy_loss: -0.10168189718388021
          total_loss: -0.022519782767631114
          vf_explained_var: 0.7937514185905457
          vf_loss: 0.05772776412777603
    num_agent_steps_sampled: 10608192
    num_steps_sampled: 10608192
    num_steps_trained: 10608192
  iterations_since_restore: 137

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1678,71836.6,10608192,1.84197,1.9816,-2,40.06


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10612192
  custom_metrics: {}
  date: 2021-12-10_09-43-01
  done: false
  episode_len_mean: 39.71287128712871
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8441861407591564
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 213941
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9048380814492702
          entropy_coeff: 0.0
          kl: 0.01382145454408601
          policy_loss: -0.10053115477785468
          total_loss: -0.03312466153874993
          vf_explained_var: 0.8328450918197632
          vf_loss: 0.0464151578489691
    num_agent_steps_sampled: 10612192
    num_steps_sampled: 10612192
    num_steps_trained: 10612192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1679,71861.8,10612192,1.84419,1.9788,-2,39.7129


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10616192
  custom_metrics: {}
  date: 2021-12-10_09-43-26
  done: false
  episode_len_mean: 45.42
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8712399971485139
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 214036
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0012075528502464
          entropy_coeff: 0.0
          kl: 0.012881200353149325
          policy_loss: -0.09064029745059088
          total_loss: -0.017714954155962914
          vf_explained_var: 0.7825114130973816
          vf_loss: 0.05336202238686383
    num_agent_steps_sampled: 10616192
    num_steps_sampled: 10616192
    num_steps_trained: 10616192
  iterations_since_restore: 139

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1680,71886.6,10616192,1.87124,1.9804,-2,45.42


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10620192
  custom_metrics: {}
  date: 2021-12-10_09-43-51
  done: false
  episode_len_mean: 37.47747747747748
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.889913519223531
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 214147
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9189158007502556
          entropy_coeff: 0.0
          kl: 0.01381251256680116
          policy_loss: -0.10258331359364092
          total_loss: -0.02999979563173838
          vf_explained_var: 0.7559615969657898
          vf_loss: 0.051605763379484415
    num_agent_steps_sampled: 10620192
    num_steps_sampled: 10620192
    num_steps_trained: 10620192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1681,71911.4,10620192,1.88991,1.9788,-2,37.4775


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10624192
  custom_metrics: {}
  date: 2021-12-10_09-44-16
  done: false
  episode_len_mean: 37.81372549019608
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.73426274692311
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 214249
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9647745862603188
          entropy_coeff: 0.0
          kl: 0.012571343046147376
          policy_loss: -0.09401781496126205
          total_loss: -0.013939376338385046
          vf_explained_var: 0.8019881248474121
          vf_loss: 0.0609857109375298
    num_agent_steps_sampled: 10624192
    num_steps_sampled: 10624192
    num_steps_trained: 10624192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1682,71936.5,10624192,1.73426,1.9788,-2,37.8137


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10628192
  custom_metrics: {}
  date: 2021-12-10_09-44-42
  done: false
  episode_len_mean: 36.25714285714286
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8595123847325643
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 214354
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9202052541077137
          entropy_coeff: 0.0
          kl: 0.012567112979013473
          policy_loss: -0.09352979942923412
          total_loss: -0.024256075965240598
          vf_explained_var: 0.8213082551956177
          vf_loss: 0.05018741847015917
    num_agent_steps_sampled: 10628192
    num_steps_sampled: 10628192
    num_steps_trained: 10628192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1683,71962,10628192,1.85951,1.9828,-2,36.2571


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10632192
  custom_metrics: {}
  date: 2021-12-10_09-45-07
  done: false
  episode_len_mean: 37.52
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8891599988937378
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 214452
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9835929647088051
          entropy_coeff: 0.0
          kl: 0.01408402092056349
          policy_loss: -0.10443018155638129
          total_loss: -0.047088166465982795
          vf_explained_var: 0.8506203293800354
          vf_loss: 0.035951907630078495
    num_agent_steps_sampled: 10632192
    num_steps_sampled: 10632192
    num_steps_trained: 10632192
  iterations_since_restore: 143

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1684,71987.4,10632192,1.88916,1.9848,-2,37.52


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10636192
  custom_metrics: {}
  date: 2021-12-10_09-45-32
  done: false
  episode_len_mean: 36.96521739130435
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.9015652221182118
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 214567
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9517248198390007
          entropy_coeff: 0.0
          kl: 0.013979844632558525
          policy_loss: -0.10509931395063177
          total_loss: -0.04120923756272532
          vf_explained_var: 0.7843550443649292
          vf_loss: 0.04265818803105503
    num_agent_steps_sampled: 10636192
    num_steps_sampled: 10636192
    num_steps_trained: 10636192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1685,72012.2,10636192,1.90157,1.9824,-2,36.9652


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10640192
  custom_metrics: {}
  date: 2021-12-10_09-45-57
  done: false
  episode_len_mean: 37.84466019417476
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.8092893211586962
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 214670
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9093234203755856
          entropy_coeff: 0.0
          kl: 0.012848612910602242
          policy_loss: -0.08907945232931525
          total_loss: -0.0018671727739274502
          vf_explained_var: 0.708868682384491
          vf_loss: 0.06769844866357744
    num_agent_steps_sampled: 10640192
    num_steps_sampled: 10640192
    num_steps_trained: 10640192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1686,72037.2,10640192,1.80929,1.9848,-2,37.8447


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10644192
  custom_metrics: {}
  date: 2021-12-10_09-46-23
  done: false
  episode_len_mean: 42.57
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8514000010490417
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 214768
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9769489169120789
          entropy_coeff: 0.0
          kl: 0.013563582440838218
          policy_loss: -0.10105966016453749
          total_loss: -0.04424870607908815
          vf_explained_var: 0.870366096496582
          vf_loss: 0.036211265018209815
    num_agent_steps_sampled: 10644192
    num_steps_sampled: 10644192
    num_steps_trained: 10644192
  iterations_since_restore: 146

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1687,72063.1,10644192,1.8514,1.9816,-2,42.57


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10648192
  custom_metrics: {}
  date: 2021-12-10_09-46-48
  done: false
  episode_len_mean: 43.26
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8743839967250824
  episode_reward_min: -2.0
  episodes_this_iter: 81
  episodes_total: 214849
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0142698101699352
          entropy_coeff: 0.0
          kl: 0.013693390705157071
          policy_loss: -0.1066831965581514
          total_loss: -0.0371524229994975
          vf_explained_var: 0.7923036217689514
          vf_loss: 0.04873393918387592
    num_agent_steps_sampled: 10648192
    num_steps_sampled: 10648192
    num_steps_trained: 10648192
  iterations_since_restore: 147

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1688,72088.4,10648192,1.87438,1.9832,-2,43.26


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10652192
  custom_metrics: {}
  date: 2021-12-10_09-47-13
  done: false
  episode_len_mean: 49.61
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.861699995994568
  episode_reward_min: -2.0
  episodes_this_iter: 66
  episodes_total: 214915
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0880813002586365
          entropy_coeff: 0.0
          kl: 0.014481866965070367
          policy_loss: -0.1090109896613285
          total_loss: -0.05658589178347029
          vf_explained_var: 0.8660343885421753
          vf_loss: 0.03043075860477984
    num_agent_steps_sampled: 10652192
    num_steps_sampled: 10652192
    num_steps_trained: 10652192
  iterations_since_restore: 148

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1689,72113.1,10652192,1.8617,1.9768,-2,49.61


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10656192
  custom_metrics: {}
  date: 2021-12-10_09-47-37
  done: false
  episode_len_mean: 54.44
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8253159976005555
  episode_reward_min: -2.0
  episodes_this_iter: 82
  episodes_total: 214997
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0129914097487926
          entropy_coeff: 0.0
          kl: 0.013747026270721108
          policy_loss: -0.10146417177747935
          total_loss: -0.038290760247036815
          vf_explained_var: 0.838990330696106
          vf_loss: 0.04229512088932097
    num_agent_steps_sampled: 10656192
    num_steps_sampled: 10656192
    num_steps_trained: 10656192
  iterations_since_restore: 149

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1690,72137.5,10656192,1.82532,1.9796,-2,54.44


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10660192
  custom_metrics: {}
  date: 2021-12-10_09-48-02
  done: false
  episode_len_mean: 40.42
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.880047996044159
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 215093
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0139882192015648
          entropy_coeff: 0.0
          kl: 0.013311668939422816
          policy_loss: -0.09749062394257635
          total_loss: -0.03319137077778578
          vf_explained_var: 0.8138867616653442
          vf_loss: 0.04408215649891645
    num_agent_steps_sampled: 10660192
    num_steps_sampled: 10660192
    num_steps_trained: 10660192
  iterations_since_restore: 150

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1691,72162.3,10660192,1.88005,1.9788,-2,40.42


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10664192
  custom_metrics: {}
  date: 2021-12-10_09-48-27
  done: false
  episode_len_mean: 34.31
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8922919976711272
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 215191
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0122797451913357
          entropy_coeff: 0.0
          kl: 0.014525117352604866
          policy_loss: -0.10331133124418557
          total_loss: -0.03981440817005932
          vf_explained_var: 0.8475170135498047
          vf_loss: 0.041436903178691864
    num_agent_steps_sampled: 10664192
    num_steps_sampled: 10664192
    num_steps_trained: 10664192
  iterations_since_restore: 151

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1692,72186.7,10664192,1.89229,1.9784,-2,34.31


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10668192
  custom_metrics: {}
  date: 2021-12-10_09-48-51
  done: false
  episode_len_mean: 51.28
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8583639954030513
  episode_reward_min: -2.0
  episodes_this_iter: 84
  episodes_total: 215275
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.00392359867692
          entropy_coeff: 0.0
          kl: 0.01356419624062255
          policy_loss: -0.10053093614988029
          total_loss: -0.03770386695396155
          vf_explained_var: 0.8321813344955444
          vf_loss: 0.04222644632682204
    num_agent_steps_sampled: 10668192
    num_steps_sampled: 10668192
    num_steps_trained: 10668192
  iterations_since_restore: 152

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1693,72210.8,10668192,1.85836,1.9788,-2,51.28


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10672192
  custom_metrics: {}
  date: 2021-12-10_09-49-15
  done: false
  episode_len_mean: 45.55
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.7942119944095611
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 215374
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9619132541120052
          entropy_coeff: 0.0
          kl: 0.013427466212306172
          policy_loss: -0.09846191783435643
          total_loss: -0.02306423312984407
          vf_explained_var: 0.8331310749053955
          vf_loss: 0.0550047205761075
    num_agent_steps_sampled: 10672192
    num_steps_sampled: 10672192
    num_steps_trained: 10672192
  iterations_since_restore: 153

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1694,72235.3,10672192,1.79421,1.9824,-2,45.55


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10676192
  custom_metrics: {}
  date: 2021-12-10_09-49-40
  done: false
  episode_len_mean: 42.84
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.9147360023856164
  episode_reward_min: 0.3084000051021576
  episodes_this_iter: 94
  episodes_total: 215468
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9376295171678066
          entropy_coeff: 0.0
          kl: 0.015187801094725728
          policy_loss: -0.10554407886229455
          total_loss: -0.041723822709172964
          vf_explained_var: 0.8258719444274902
          vf_loss: 0.04075378447305411
    num_agent_steps_sampled: 10676192
    num_steps_sampled: 10676192
    num_steps_trained: 10676192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1695,72259.9,10676192,1.91474,1.98,0.3084,42.84


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10680192
  custom_metrics: {}
  date: 2021-12-10_09-50-04
  done: false
  episode_len_mean: 45.88
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.7981319999694825
  episode_reward_min: -2.0
  episodes_this_iter: 82
  episodes_total: 215550
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9964409470558167
          entropy_coeff: 0.0
          kl: 0.013674149406142533
          policy_loss: -0.10123075172305107
          total_loss: -0.0357573619694449
          vf_explained_var: 0.8481698036193848
          vf_loss: 0.04470577696338296
    num_agent_steps_sampled: 10680192
    num_steps_sampled: 10680192
    num_steps_trained: 10680192
  iterations_since_restore: 155

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1696,72284.3,10680192,1.79813,1.9796,-2,45.88


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10684192
  custom_metrics: {}
  date: 2021-12-10_09-50-29
  done: false
  episode_len_mean: 43.6
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.797344000339508
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 215644
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0351072400808334
          entropy_coeff: 0.0
          kl: 0.012502382334787399
          policy_loss: -0.09652664512395859
          total_loss: -0.025190122541971505
          vf_explained_var: 0.8150627613067627
          vf_loss: 0.05234853411093354
    num_agent_steps_sampled: 10684192
    num_steps_sampled: 10684192
    num_steps_trained: 10684192
  iterations_since_restore: 156

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1697,72309,10684192,1.79734,1.9816,-2,43.6


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10688192
  custom_metrics: {}
  date: 2021-12-10_09-50-54
  done: false
  episode_len_mean: 37.94
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9245359981060028
  episode_reward_min: 1.509600043296814
  episodes_this_iter: 95
  episodes_total: 215739
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.003893181681633
          entropy_coeff: 0.0
          kl: 0.014602153969462961
          policy_loss: -0.10767064162064344
          total_loss: -0.04526108270511031
          vf_explained_var: 0.8124492764472961
          vf_loss: 0.04023254197090864
    num_agent_steps_sampled: 10688192
    num_steps_sampled: 10688192
    num_steps_trained: 10688192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1698,72334.3,10688192,1.92454,1.9816,1.5096,37.94


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10692192
  custom_metrics: {}
  date: 2021-12-10_09-51-20
  done: false
  episode_len_mean: 42.375
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8451071456074715
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 215851
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9493576213717461
          entropy_coeff: 0.0
          kl: 0.013041530502960086
          policy_loss: -0.09741516702342778
          total_loss: -0.028066731902072206
          vf_explained_var: 0.7954546809196472
          vf_loss: 0.04954161401838064
    num_agent_steps_sampled: 10692192
    num_steps_sampled: 10692192
    num_steps_trained: 10692192
  iterations_since_restore: 158

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1699,72359.7,10692192,1.84511,1.9844,-2,42.375


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10696192
  custom_metrics: {}
  date: 2021-12-10_09-51-44
  done: false
  episode_len_mean: 40.72
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.775768003463745
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 215950
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9604893662035465
          entropy_coeff: 0.0
          kl: 0.01296390569768846
          policy_loss: -0.09692406479734927
          total_loss: -0.006973437033593655
          vf_explained_var: 0.8036534786224365
          vf_loss: 0.0702616972848773
    num_agent_steps_sampled: 10696192
    num_steps_sampled: 10696192
    num_steps_trained: 10696192
  iterations_since_restore: 159

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1700,72384.2,10696192,1.77577,1.9828,-2,40.72


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10700192
  custom_metrics: {}
  date: 2021-12-10_09-52-10
  done: false
  episode_len_mean: 37.38135593220339
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8665118651874995
  episode_reward_min: -2.0
  episodes_this_iter: 118
  episodes_total: 216068
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8682905472815037
          entropy_coeff: 0.0
          kl: 0.014278910122811794
          policy_loss: -0.09956859378144145
          total_loss: -0.027458102093078196
          vf_explained_var: 0.816552996635437
          vf_loss: 0.050424390472471714
    num_agent_steps_sampled: 10700192
    num_steps_sampled: 10700192
    num_steps_trained: 10700192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1701,72409.9,10700192,1.86651,1.9824,-2,37.3814


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10704192
  custom_metrics: {}
  date: 2021-12-10_09-52-34
  done: false
  episode_len_mean: 38.5462962962963
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.850737037482085
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 216176
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8797822184860706
          entropy_coeff: 0.0
          kl: 0.014561190502718091
          policy_loss: -0.10918155359104276
          total_loss: -0.03370347979944199
          vf_explained_var: 0.738531231880188
          vf_loss: 0.0533632670994848
    num_agent_steps_sampled: 10704192
    num_steps_sampled: 10704192
    num_steps_trained: 10704192
  iterations_since_restore: 161

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1702,72434.3,10704192,1.85074,1.9812,-2,38.5463


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10708192
  custom_metrics: {}
  date: 2021-12-10_09-52-59
  done: false
  episode_len_mean: 34.626168224299064
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8575476664249029
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 216283
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9152376763522625
          entropy_coeff: 0.0
          kl: 0.013884469226468354
          policy_loss: -0.10051501262933016
          total_loss: -0.024081991155981086
          vf_explained_var: 0.7243747115135193
          vf_loss: 0.05534598440863192
    num_agent_steps_sampled: 10708192
    num_steps_sampled: 10708192
    num_steps_trained: 10708192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1703,72458.6,10708192,1.85755,1.9788,-2,34.6262


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10712192
  custom_metrics: {}
  date: 2021-12-10_09-53-24
  done: false
  episode_len_mean: 37.19642857142857
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.750642859510013
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 216395
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9088264517486095
          entropy_coeff: 0.0
          kl: 0.012796425202395767
          policy_loss: -0.08971276017837226
          total_loss: -7.626949809491634e-05
          vf_explained_var: 0.7706409692764282
          vf_loss: 0.07020191731862724
    num_agent_steps_sampled: 10712192
    num_steps_sampled: 10712192
    num_steps_trained: 10712192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1704,72484.1,10712192,1.75064,1.982,-2,37.1964


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10716192
  custom_metrics: {}
  date: 2021-12-10_09-53-49
  done: false
  episode_len_mean: 34.84
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8917400050163269
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 216489
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9842984825372696
          entropy_coeff: 0.0
          kl: 0.013714082248043269
          policy_loss: -0.10821726024732925
          total_loss: -0.04170941805932671
          vf_explained_var: 0.8377752304077148
          vf_loss: 0.04567958158440888
    num_agent_steps_sampled: 10716192
    num_steps_sampled: 10716192
    num_steps_trained: 10716192
  iterations_since_restore: 164
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1705,72509,10716192,1.89174,1.9788,-2,34.84


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10720192
  custom_metrics: {}
  date: 2021-12-10_09-54-14
  done: false
  episode_len_mean: 50.14
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.7509080016613006
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 216584
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9721208401024342
          entropy_coeff: 0.0
          kl: 0.013848637463524938
          policy_loss: -0.09934278368018568
          total_loss: 0.007438646862283349
          vf_explained_var: 0.7247153520584106
          vf_loss: 0.08574881311506033
    num_agent_steps_sampled: 10720192
    num_steps_sampled: 10720192
    num_steps_trained: 10720192
  iterations_since_restore: 165
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1706,72533.3,10720192,1.75091,1.9844,-2,50.14


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10724192
  custom_metrics: {}
  date: 2021-12-10_09-54-38
  done: false
  episode_len_mean: 36.41284403669725
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8564366966212562
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 216693
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8473094068467617
          entropy_coeff: 0.0
          kl: 0.014670764794573188
          policy_loss: -0.1021985353436321
          total_loss: -0.03329361090436578
          vf_explained_var: 0.7470247745513916
          vf_loss: 0.04662369773723185
    num_agent_steps_sampled: 10724192
    num_steps_sampled: 10724192
    num_steps_trained: 10724192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1707,72558.2,10724192,1.85644,1.98,-2,36.4128


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10728192
  custom_metrics: {}
  date: 2021-12-10_09-55-03
  done: false
  episode_len_mean: 35.71568627450981
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9290000001589458
  episode_reward_min: 1.6252000331878662
  episodes_this_iter: 102
  episodes_total: 216795
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.955931942909956
          entropy_coeff: 0.0
          kl: 0.014896596490871161
          policy_loss: -0.1093378933146596
          total_loss: -0.04691046557854861
          vf_explained_var: 0.768837571144104
          vf_loss: 0.03980322438292205
    num_agent_steps_sampled: 10728192
    num_steps_sampled: 10728192
    num_steps_trained: 10728192
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1708,72582.5,10728192,1.929,1.978,1.6252,35.7157


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10732192
  custom_metrics: {}
  date: 2021-12-10_09-55-27
  done: false
  episode_len_mean: 41.424528301886795
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8092075496349695
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 216901
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8870655111968517
          entropy_coeff: 0.0
          kl: 0.012955479149241
          policy_loss: -0.09667397476732731
          total_loss: -0.0146304985973984
          vf_explained_var: 0.708957314491272
          vf_loss: 0.06236734916456044
    num_agent_steps_sampled: 10732192
    num_steps_sampled: 10732192
    num_steps_trained: 10732192
  iterations_since_restore: 168

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1709,72606.9,10732192,1.80921,1.9784,-2,41.4245


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10736192
  custom_metrics: {}
  date: 2021-12-10_09-55-52
  done: false
  episode_len_mean: 40.83
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8404560017585754
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 217000
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8896370381116867
          entropy_coeff: 0.0
          kl: 0.014163940621074289
          policy_loss: -0.10277461388614029
          total_loss: -0.029860630282200873
          vf_explained_var: 0.7654083967208862
          vf_loss: 0.05140250362455845
    num_agent_steps_sampled: 10736192
    num_steps_sampled: 10736192
    num_steps_trained: 10736192
  iterations_since_restore: 169
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1710,72631.3,10736192,1.84046,1.982,-2,40.83


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10740192
  custom_metrics: {}
  date: 2021-12-10_09-56-16
  done: false
  episode_len_mean: 36.13
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.9280799949169158
  episode_reward_min: 1.3716000318527222
  episodes_this_iter: 97
  episodes_total: 217097
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9002421535551548
          entropy_coeff: 0.0
          kl: 0.015766583383083344
          policy_loss: -0.1102602833416313
          total_loss: -0.04831795167410746
          vf_explained_var: 0.7544558644294739
          vf_loss: 0.03799683472607285
    num_agent_steps_sampled: 10740192
    num_steps_sampled: 10740192
    num_steps_trained: 10740192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1711,72655.7,10740192,1.92808,1.9832,1.3716,36.13


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10744192
  custom_metrics: {}
  date: 2021-12-10_09-56-40
  done: false
  episode_len_mean: 37.53846153846154
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.862512819787376
  episode_reward_min: -2.0
  episodes_this_iter: 117
  episodes_total: 217214
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8629035204648972
          entropy_coeff: 0.0
          kl: 0.01379629276925698
          policy_loss: -0.09510877996945055
          total_loss: -0.03854231827426702
          vf_explained_var: 0.8647392392158508
          vf_loss: 0.035613341722637415
    num_agent_steps_sampled: 10744192
    num_steps_sampled: 10744192
    num_steps_trained: 10744192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1712,72680,10744192,1.86251,1.9784,-2,37.5385


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10748192
  custom_metrics: {}
  date: 2021-12-10_09-57-05
  done: false
  episode_len_mean: 39.40594059405941
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.882799999548657
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 217315
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8944675996899605
          entropy_coeff: 0.0
          kl: 0.014369855751283467
          policy_loss: -0.10061928420327604
          total_loss: -0.0493470250221435
          vf_explained_var: 0.7898209691047668
          vf_loss: 0.029448042158037424
    num_agent_steps_sampled: 10748192
    num_steps_sampled: 10748192
    num_steps_trained: 10748192
  iterations_since_restore: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1713,72704.3,10748192,1.8828,1.9816,-2,39.4059


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10752192
  custom_metrics: {}
  date: 2021-12-10_09-57-29
  done: false
  episode_len_mean: 32.627450980392155
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8982509783670014
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 217417
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9342758283019066
          entropy_coeff: 0.0
          kl: 0.013616883079521358
          policy_loss: -0.09584546915721148
          total_loss: -0.03531521646073088
          vf_explained_var: 0.8911056518554688
          vf_loss: 0.03984960983507335
    num_agent_steps_sampled: 10752192
    num_steps_sampled: 10752192
    num_steps_trained: 10752192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1714,72728.7,10752192,1.89825,1.98,-2,32.6275


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10756192
  custom_metrics: {}
  date: 2021-12-10_09-57-55
  done: false
  episode_len_mean: 38.754901960784316
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8139254941659815
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 217519
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9527634158730507
          entropy_coeff: 0.0
          kl: 0.0135348568437621
          policy_loss: -0.09617130929837003
          total_loss: -0.03387085068970919
          vf_explained_var: 0.8962615728378296
          vf_loss: 0.04174439248163253
    num_agent_steps_sampled: 10756192
    num_steps_sampled: 10756192
    num_steps_trained: 10756192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1715,72754.1,10756192,1.81393,1.9836,-2,38.7549


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10760192
  custom_metrics: {}
  date: 2021-12-10_09-58-19
  done: false
  episode_len_mean: 36.46
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.733600002527237
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 217618
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9881210513412952
          entropy_coeff: 0.0
          kl: 0.012930859462358057
          policy_loss: -0.09345324581954628
          total_loss: -0.027830736886244267
          vf_explained_var: 0.8600932359695435
          vf_loss: 0.045983764342963696
    num_agent_steps_sampled: 10760192
    num_steps_sampled: 10760192
    num_steps_trained: 10760192
  iterations_since_restore: 175
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1716,72778.4,10760192,1.7336,1.9852,-2,36.46


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10764192
  custom_metrics: {}
  date: 2021-12-10_09-58-43
  done: false
  episode_len_mean: 48.6078431372549
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.872207847880382
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 217720
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9112381413578987
          entropy_coeff: 0.0
          kl: 0.013850461633410305
          policy_loss: -0.10122135712299496
          total_loss: -0.0325687070726417
          vf_explained_var: 0.7865800857543945
          vf_loss: 0.0476172654889524
    num_agent_steps_sampled: 10764192
    num_steps_sampled: 10764192
    num_steps_trained: 10764192
  iterations_since_restore: 176

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1717,72802.8,10764192,1.87221,1.982,-2,48.6078


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10768192
  custom_metrics: {}
  date: 2021-12-10_09-59-08
  done: false
  episode_len_mean: 36.61904761904762
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.9272076163973126
  episode_reward_min: 1.3555999994277954
  episodes_this_iter: 105
  episodes_total: 217825
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9380429834127426
          entropy_coeff: 0.0
          kl: 0.015117374947294593
          policy_loss: -0.11235035140998662
          total_loss: -0.04965300124604255
          vf_explained_var: 0.7615100145339966
          vf_loss: 0.039737837156280875
    num_agent_steps_sampled: 10768192
    num_steps_sampled: 10768192
    num_steps_trained: 10768192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1718,72827.2,10768192,1.92721,1.9848,1.3556,36.619


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10772192
  custom_metrics: {}
  date: 2021-12-10_09-59-32
  done: false
  episode_len_mean: 38.31
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.8471600008010864
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 217915
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9571386240422726
          entropy_coeff: 0.0
          kl: 0.013682230492122471
          policy_loss: -0.09951768722385168
          total_loss: -0.03651119739515707
          vf_explained_var: 0.8360742330551147
          vf_loss: 0.04222660604864359
    num_agent_steps_sampled: 10772192
    num_steps_sampled: 10772192
    num_steps_trained: 10772192
  iterations_since_restore: 178
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1719,72851.5,10772192,1.84716,1.9852,-2,38.31


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10776192
  custom_metrics: {}
  date: 2021-12-10_09-59-56
  done: false
  episode_len_mean: 33.15
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.895224004983902
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 218015
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9848042204976082
          entropy_coeff: 0.0
          kl: 0.013938964577391744
          policy_loss: -0.10567584057571366
          total_loss: -0.04087158458423801
          vf_explained_var: 0.8115016222000122
          vf_loss: 0.043634456233121455
    num_agent_steps_sampled: 10776192
    num_steps_sampled: 10776192
    num_steps_trained: 10776192
  iterations_since_restore: 179
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1720,72875.8,10776192,1.89522,1.9852,-2,33.15


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10780192
  custom_metrics: {}
  date: 2021-12-10_10-00-21
  done: false
  episode_len_mean: 43.78
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9128800010681153
  episode_reward_min: 0.6759999990463257
  episodes_this_iter: 89
  episodes_total: 218104
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9999388754367828
          entropy_coeff: 0.0
          kl: 0.013630753557663411
          policy_loss: -0.10484412958612666
          total_loss: -0.05552747700130567
          vf_explained_var: 0.8619991540908813
          vf_loss: 0.02861494361422956
    num_agent_steps_sampled: 10780192
    num_steps_sampled: 10780192
    num_steps_trained: 10780192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1721,72900.2,10780192,1.91288,1.9796,0.676,43.78


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10784192
  custom_metrics: {}
  date: 2021-12-10_10-00-45
  done: false
  episode_len_mean: 45.414414414414416
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9095243290737942
  episode_reward_min: 0.46480000019073486
  episodes_this_iter: 111
  episodes_total: 218215
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8986320830881596
          entropy_coeff: 0.0
          kl: 0.013866646506357938
          policy_loss: -0.10340548562817276
          total_loss: -0.04499745706561953
          vf_explained_var: 0.7391903400421143
          vf_loss: 0.03734805702697486
    num_agent_steps_sampled: 10784192
    num_steps_sampled: 10784192
    num_steps_trained: 10784192
  iteration

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1722,72924.8,10784192,1.90952,1.9772,0.4648,45.4144


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10788192
  custom_metrics: {}
  date: 2021-12-10_10-01-10
  done: false
  episode_len_mean: 30.098214285714285
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8341928550175257
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 218327
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9105172008275986
          entropy_coeff: 0.0
          kl: 0.012418529717251658
          policy_loss: -0.08828702487517148
          total_loss: -0.011499196174554527
          vf_explained_var: 0.7471222877502441
          vf_loss: 0.057927189860492945
    num_agent_steps_sampled: 10788192
    num_steps_sampled: 10788192
    num_steps_trained: 10788192
  iterations_since_resto

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1723,72949.4,10788192,1.83419,1.982,-2,30.0982


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10792192
  custom_metrics: {}
  date: 2021-12-10_10-01-35
  done: false
  episode_len_mean: 51.15
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.7805959990620612
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 218426
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9526505433022976
          entropy_coeff: 0.0
          kl: 0.013336167787201703
          policy_loss: -0.0971358553506434
          total_loss: -0.013726673205383122
          vf_explained_var: 0.7646329402923584
          vf_loss: 0.06315488275140524
    num_agent_steps_sampled: 10792192
    num_steps_sampled: 10792192
    num_steps_trained: 10792192
  iterations_since_restore: 183
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1724,72974.4,10792192,1.7806,1.982,-2,51.15


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10796192
  custom_metrics: {}
  date: 2021-12-10_10-01-59
  done: false
  episode_len_mean: 37.208695652173915
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8247999989468118
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 218541
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8351386785507202
          entropy_coeff: 0.0
          kl: 0.013184265524614602
          policy_loss: -0.09478265955112875
          total_loss: -0.01878549251705408
          vf_explained_var: 0.7483395338058472
          vf_loss: 0.05597356450743973
    num_agent_steps_sampled: 10796192
    num_steps_sampled: 10796192
    num_steps_trained: 10796192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1725,72998.6,10796192,1.8248,1.98,-2,37.2087


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10800192
  custom_metrics: {}
  date: 2021-12-10_10-02-24
  done: false
  episode_len_mean: 33.321100917431195
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.933717432372067
  episode_reward_min: 1.791599988937378
  episodes_this_iter: 109
  episodes_total: 218650
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8626375757157803
          entropy_coeff: 0.0
          kl: 0.014872783969622105
          policy_loss: -0.1011193769518286
          total_loss: -0.03886022343067452
          vf_explained_var: 0.6468865871429443
          vf_loss: 0.03967110952362418
    num_agent_steps_sampled: 10800192
    num_steps_sampled: 10800192
    num_steps_trained: 10800192
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1726,73023,10800192,1.93372,1.9796,1.7916,33.3211


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10804192
  custom_metrics: {}
  date: 2021-12-10_10-02-48
  done: false
  episode_len_mean: 32.260162601626014
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9041463369276466
  episode_reward_min: -2.0
  episodes_this_iter: 123
  episodes_total: 218773
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8996288552880287
          entropy_coeff: 0.0
          kl: 0.013468406861647964
          policy_loss: -0.094564997125417
          total_loss: -0.034688071347773075
          vf_explained_var: 0.7370703220367432
          vf_loss: 0.039421780849806964
    num_agent_steps_sampled: 10804192
    num_steps_sampled: 10804192
    num_steps_trained: 10804192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1727,73047.3,10804192,1.90415,1.9836,-2,32.2602


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10808192
  custom_metrics: {}
  date: 2021-12-10_10-03-13
  done: false
  episode_len_mean: 34.61682242990654
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8589009347363052
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 218880
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9226475842297077
          entropy_coeff: 0.0
          kl: 0.014338142646010965
          policy_loss: -0.09482064447365701
          total_loss: -0.04083906684536487
          vf_explained_var: 0.8697711229324341
          vf_loss: 0.03220552718266845
    num_agent_steps_sampled: 10808192
    num_steps_sampled: 10808192
    num_steps_trained: 10808192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1728,73072.2,10808192,1.8589,1.9816,-2,34.6168


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10812192
  custom_metrics: {}
  date: 2021-12-10_10-03-38
  done: false
  episode_len_mean: 38.277227722772274
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.8845306953581253
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 218981
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9981056116521358
          entropy_coeff: 0.0
          kl: 0.012638771499041468
          policy_loss: -0.10021134559065104
          total_loss: -0.038967087981291115
          vf_explained_var: 0.8011102080345154
          vf_loss: 0.04204912588465959
    num_agent_steps_sampled: 10812192
    num_steps_sampled: 10812192
    num_steps_trained: 10812192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1729,73097,10812192,1.88453,1.9852,-2,38.2772


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10816192
  custom_metrics: {}
  date: 2021-12-10_10-04-02
  done: false
  episode_len_mean: 38.76
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.77110799908638
  episode_reward_min: -2.0
  episodes_this_iter: 79
  episodes_total: 219060
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0256059914827347
          entropy_coeff: 0.0
          kl: 0.012971364136319607
          policy_loss: -0.09244664409197867
          total_loss: -0.030990534694865346
          vf_explained_var: 0.8631092309951782
          vf_loss: 0.04175585275515914
    num_agent_steps_sampled: 10816192
    num_steps_sampled: 10816192
    num_steps_trained: 10816192
  iterations_since_restore: 189
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1730,73121.2,10816192,1.77111,1.9852,-2,38.76


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10820192
  custom_metrics: {}
  date: 2021-12-10_10-04-26
  done: false
  episode_len_mean: 53.31
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8937320041656494
  episode_reward_min: 1.1643999814987183
  episodes_this_iter: 86
  episodes_total: 219146
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9665836319327354
          entropy_coeff: 0.0
          kl: 0.014011391438543797
          policy_loss: -0.09809945023152977
          total_loss: -0.03998351184418425
          vf_explained_var: 0.763725221157074
          vf_loss: 0.03683613822795451
    num_agent_steps_sampled: 10820192
    num_steps_sampled: 10820192
    num_steps_trained: 10820192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1731,73145.3,10820192,1.89373,1.9836,1.1644,53.31


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10824192
  custom_metrics: {}
  date: 2021-12-10_10-04-51
  done: false
  episode_len_mean: 38.652542372881356
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8667796617847379
  episode_reward_min: -2.0
  episodes_this_iter: 118
  episodes_total: 219264
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9288736693561077
          entropy_coeff: 0.0
          kl: 0.012786135193891823
          policy_loss: -0.09903238504193723
          total_loss: -0.02253742271568626
          vf_explained_var: 0.7373120784759521
          vf_loss: 0.057076019467785954
    num_agent_steps_sampled: 10824192
    num_steps_sampled: 10824192
    num_steps_trained: 10824192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1732,73169.6,10824192,1.86678,1.9836,-2,38.6525


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10828192
  custom_metrics: {}
  date: 2021-12-10_10-05-15
  done: false
  episode_len_mean: 40.94
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8042720007896422
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 219362
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8520068190991879
          entropy_coeff: 0.0
          kl: 0.013451664242893457
          policy_loss: -0.09147411084268242
          total_loss: -0.013304534135386348
          vf_explained_var: 0.6337404251098633
          vf_loss: 0.05773986177518964
    num_agent_steps_sampled: 10828192
    num_steps_sampled: 10828192
    num_steps_trained: 10828192
  iterations_since_restore: 192
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1733,73194,10828192,1.80427,1.9836,-2,40.94


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10832192
  custom_metrics: {}
  date: 2021-12-10_10-05-40
  done: false
  episode_len_mean: 39.55
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9213000011444092
  episode_reward_min: 1.6003999710083008
  episodes_this_iter: 93
  episodes_total: 219455
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9272521659731865
          entropy_coeff: 0.0
          kl: 0.015416526061017066
          policy_loss: -0.10933753417339176
          total_loss: -0.053868146147578955
          vf_explained_var: 0.8092595338821411
          vf_loss: 0.03205554187297821
    num_agent_steps_sampled: 10832192
    num_steps_sampled: 10832192
    num_steps_trained: 10832192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1734,73218.8,10832192,1.9213,1.982,1.6004,39.55


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10836192
  custom_metrics: {}
  date: 2021-12-10_10-06-04
  done: false
  episode_len_mean: 45.0
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.7175520026683808
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 219545
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9636607505381107
          entropy_coeff: 0.0
          kl: 0.014069678843952715
          policy_loss: -0.10188898665364832
          total_loss: -0.02894920133985579
          vf_explained_var: 0.8110877275466919
          vf_loss: 0.0515714637003839
    num_agent_steps_sampled: 10836192
    num_steps_sampled: 10836192
    num_steps_trained: 10836192
  iterations_since_restore: 194
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1735,73243,10836192,1.71755,1.9808,-2,45


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10840192
  custom_metrics: {}
  date: 2021-12-10_10-06-29
  done: false
  episode_len_mean: 38.62
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8453400015830994
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 219637
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9562831856310368
          entropy_coeff: 0.0
          kl: 0.013792168814688921
          policy_loss: -0.09891369746765122
          total_loss: -0.030062896548770368
          vf_explained_var: 0.8166906833648682
          vf_loss: 0.047903944505378604
    num_agent_steps_sampled: 10840192
    num_steps_sampled: 10840192
    num_steps_trained: 10840192
  iterations_since_restore: 195
  node

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1736,73267.7,10840192,1.84534,1.98,-2,38.62


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10844192
  custom_metrics: {}
  date: 2021-12-10_10-06-53
  done: false
  episode_len_mean: 42.647619047619045
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8776571489515759
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 219742
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9005480483174324
          entropy_coeff: 0.0
          kl: 0.013944198377430439
          policy_loss: -0.1012989308219403
          total_loss: -0.03534422573284246
          vf_explained_var: 0.7716299295425415
          vf_loss: 0.044776951894164085
    num_agent_steps_sampled: 10844192
    num_steps_sampled: 10844192
    num_steps_trained: 10844192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1737,73292.2,10844192,1.87766,1.9836,-2,42.6476


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10848192
  custom_metrics: {}
  date: 2021-12-10_10-07-18
  done: false
  episode_len_mean: 35.7047619047619
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9290399982815698
  episode_reward_min: 1.190000057220459
  episodes_this_iter: 105
  episodes_total: 219847
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9205966144800186
          entropy_coeff: 0.0
          kl: 0.014743709820322692
          policy_loss: -0.10907549317926168
          total_loss: -0.048291677492670715
          vf_explained_var: 0.7627310752868652
          vf_loss: 0.038391807465814054
    num_agent_steps_sampled: 10848192
    num_steps_sampled: 10848192
    num_steps_trained: 10848192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1738,73316.7,10848192,1.92904,1.9804,1.19,35.7048


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10852192
  custom_metrics: {}
  date: 2021-12-10_10-07-42
  done: false
  episode_len_mean: 41.35849056603774
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8081169780695214
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 219953
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9308238551020622
          entropy_coeff: 0.0
          kl: 0.013263685395941138
          policy_loss: -0.09328561869915575
          total_loss: -0.00016905172378756106
          vf_explained_var: 0.6317073106765747
          vf_loss: 0.07297234563156962
    num_agent_steps_sampled: 10852192
    num_steps_sampled: 10852192
    num_steps_trained: 10852192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1739,73341.2,10852192,1.80812,1.9816,-2,41.3585


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10856192
  custom_metrics: {}
  date: 2021-12-10_10-08-07
  done: false
  episode_len_mean: 36.21904761904762
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8529142890657697
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 220058
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9205860272049904
          entropy_coeff: 0.0
          kl: 0.013533709512557834
          policy_loss: -0.0977891176007688
          total_loss: -0.017048246460035443
          vf_explained_var: 0.7191513776779175
          vf_loss: 0.06018655002117157
    num_agent_steps_sampled: 10856192
    num_steps_sampled: 10856192
    num_steps_trained: 10856192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1740,73365.6,10856192,1.85291,1.978,-2,36.219


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10860192
  custom_metrics: {}
  date: 2021-12-10_10-08-32
  done: false
  episode_len_mean: 37.41
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8864559996128083
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 220155
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9387133121490479
          entropy_coeff: 0.0
          kl: 0.01451457553775981
          policy_loss: -0.10499180265469477
          total_loss: -0.040169880143366754
          vf_explained_var: 0.7915282249450684
          vf_loss: 0.04277791338972747
    num_agent_steps_sampled: 10860192
    num_steps_sampled: 10860192
    num_steps_trained: 10860192
  iterations_since_restore: 200
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1741,73390.4,10860192,1.88646,1.9796,-2,37.41


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10864192
  custom_metrics: {}
  date: 2021-12-10_10-08-56
  done: false
  episode_len_mean: 43.66019417475728
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.6859262140051832
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 220258
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.993671078234911
          entropy_coeff: 0.0
          kl: 0.012211005378048867
          policy_loss: -0.09226573636988178
          total_loss: 0.023595636535901576
          vf_explained_var: 0.7072986364364624
          vf_loss: 0.09731590887531638
    num_agent_steps_sampled: 10864192
    num_steps_sampled: 10864192
    num_steps_trained: 10864192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1742,73415.1,10864192,1.68593,1.9836,-2,43.6602


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10868192
  custom_metrics: {}
  date: 2021-12-10_10-09-21
  done: false
  episode_len_mean: 37.02857142857143
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.7485295250302268
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 220363
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9290012791752815
          entropy_coeff: 0.0
          kl: 0.013597294280771166
          policy_loss: -0.0991050380980596
          total_loss: -0.0245796418748796
          vf_explained_var: 0.8721548914909363
          vf_loss: 0.05387450475245714
    num_agent_steps_sampled: 10868192
    num_steps_sampled: 10868192
    num_steps_trained: 10868192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1743,73439.4,10868192,1.74853,1.9836,-2,37.0286


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10872192
  custom_metrics: {}
  date: 2021-12-10_10-09-45
  done: false
  episode_len_mean: 40.96
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8457080018520355
  episode_reward_min: -2.0
  episodes_this_iter: 78
  episodes_total: 220441
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9716621451079845
          entropy_coeff: 0.0
          kl: 0.014635736239142716
          policy_loss: -0.10757499362807721
          total_loss: -0.03554437938146293
          vf_explained_var: 0.815301775932312
          vf_loss: 0.049802591325715184
    num_agent_steps_sampled: 10872192
    num_steps_sampled: 10872192
    num_steps_trained: 10872192
  iterations_since_restore: 203
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1744,73463.6,10872192,1.84571,1.9836,-2,40.96


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10876192
  custom_metrics: {}
  date: 2021-12-10_10-10-10
  done: false
  episode_len_mean: 47.72
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8693840026855468
  episode_reward_min: -2.0
  episodes_this_iter: 80
  episodes_total: 220521
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 1.0479935891926289
          entropy_coeff: 0.0
          kl: 0.013378019502852112
          policy_loss: -0.10184996051248163
          total_loss: -0.04104953148635104
          vf_explained_var: 0.8579758405685425
          vf_loss: 0.04048256226815283
    num_agent_steps_sampled: 10876192
    num_steps_sampled: 10876192
    num_steps_trained: 10876192
  iterations_since_restore: 204
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1745,73488.5,10876192,1.86938,1.9836,-2,47.72


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10880192
  custom_metrics: {}
  date: 2021-12-10_10-10-35
  done: false
  episode_len_mean: 46.3
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9078320008516312
  episode_reward_min: 0.4580000042915344
  episodes_this_iter: 96
  episodes_total: 220617
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9362067878246307
          entropy_coeff: 0.0
          kl: 0.014518776966724545
          policy_loss: -0.10340147308306769
          total_loss: -0.03501841810066253
          vf_explained_var: 0.7624881267547607
          vf_loss: 0.04633266222663224
    num_agent_steps_sampled: 10880192
    num_steps_sampled: 10880192
    num_steps_trained: 10880192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1746,73513.8,10880192,1.90783,1.9828,0.458,46.3


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10884192
  custom_metrics: {}
  date: 2021-12-10_10-11-00
  done: false
  episode_len_mean: 44.13
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.7623919987678527
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 220714
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9369303844869137
          entropy_coeff: 0.0
          kl: 0.012868266960140318
          policy_loss: -0.09573054709471762
          total_loss: -0.020567713072523475
          vf_explained_var: 0.8258004784584045
          vf_loss: 0.05561915412545204
    num_agent_steps_sampled: 10884192
    num_steps_sampled: 10884192
    num_steps_trained: 10884192
  iterations_since_restore: 206
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1747,73538.5,10884192,1.76239,1.9784,-2,44.13


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10888192
  custom_metrics: {}
  date: 2021-12-10_10-11-25
  done: false
  episode_len_mean: 36.45
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8883120000362397
  episode_reward_min: -2.0
  episodes_this_iter: 91
  episodes_total: 220805
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9709087200462818
          entropy_coeff: 0.0
          kl: 0.013661181088536978
          policy_loss: -0.1028095034416765
          total_loss: -0.037905339733697474
          vf_explained_var: 0.8024109601974487
          vf_loss: 0.04415624774992466
    num_agent_steps_sampled: 10888192
    num_steps_sampled: 10888192
    num_steps_trained: 10888192
  iterations_since_restore: 207
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1748,73563,10888192,1.88831,1.98,-2,36.45


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10892192
  custom_metrics: {}
  date: 2021-12-10_10-11-49
  done: false
  episode_len_mean: 41.4070796460177
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.7581345159395607
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 220918
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9432571493089199
          entropy_coeff: 0.0
          kl: 0.01204185263486579
          policy_loss: -0.0911587993032299
          total_loss: -0.02307416353141889
          vf_explained_var: 0.8203290104866028
          vf_loss: 0.04979607043787837
    num_agent_steps_sampled: 10892192
    num_steps_sampled: 10892192
    num_steps_trained: 10892192
  iterations_since_restore: 20

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1749,73587.6,10892192,1.75813,1.9812,-2,41.4071


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10896192
  custom_metrics: {}
  date: 2021-12-10_10-12-14
  done: false
  episode_len_mean: 38.93859649122807
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8595228090620877
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 221032
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8488227017223835
          entropy_coeff: 0.0
          kl: 0.013292717456351966
          policy_loss: -0.09529472803114913
          total_loss: -0.030842365231364965
          vf_explained_var: 0.7177704572677612
          vf_loss: 0.04426404589321464
    num_agent_steps_sampled: 10896192
    num_steps_sampled: 10896192
    num_steps_trained: 10896192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1750,73612.1,10896192,1.85952,1.978,-2,38.9386


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10900192
  custom_metrics: {}
  date: 2021-12-10_10-12-38
  done: false
  episode_len_mean: 39.96078431372549
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.845286276994967
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 221134
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9244337938725948
          entropy_coeff: 0.0
          kl: 0.014047965116333216
          policy_loss: -0.10578251932747662
          total_loss: -0.04417877731611952
          vf_explained_var: 0.8252804279327393
          vf_loss: 0.04026839346624911
    num_agent_steps_sampled: 10900192
    num_steps_sampled: 10900192
    num_steps_trained: 10900192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1751,73636.5,10900192,1.84529,1.9808,-2,39.9608


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10904192
  custom_metrics: {}
  date: 2021-12-10_10-13-03
  done: false
  episode_len_mean: 36.43
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9275559961795807
  episode_reward_min: 1.6871999502182007
  episodes_this_iter: 95
  episodes_total: 221229
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9013599492609501
          entropy_coeff: 0.0
          kl: 0.014648386044427752
          policy_loss: -0.10550984682049602
          total_loss: -0.0455174736562185
          vf_explained_var: 0.7464195489883423
          vf_loss: 0.03774513374082744
    num_agent_steps_sampled: 10904192
    num_steps_sampled: 10904192
    num_steps_trained: 10904192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1752,73660.9,10904192,1.92756,1.982,1.6872,36.43


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10908192
  custom_metrics: {}
  date: 2021-12-10_10-13-28
  done: false
  episode_len_mean: 35.47787610619469
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8651221220472218
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 221342
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.897327084094286
          entropy_coeff: 0.0
          kl: 0.01398503320524469
          policy_loss: -0.09764252352761105
          total_loss: -0.03137884955503978
          vf_explained_var: 0.7725443840026855
          vf_loss: 0.045023903949186206
    num_agent_steps_sampled: 10908192
    num_steps_sampled: 10908192
    num_steps_trained: 10908192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1753,73685.9,10908192,1.86512,1.982,-2,35.4779


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10912192
  custom_metrics: {}
  date: 2021-12-10_10-13-52
  done: false
  episode_len_mean: 42.34
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.839324004650116
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 221439
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9287889264523983
          entropy_coeff: 0.0
          kl: 0.014584642311092466
          policy_loss: -0.10616530163679272
          total_loss: -0.03990769456140697
          vf_explained_var: 0.8105498552322388
          vf_loss: 0.04410718008875847
    num_agent_steps_sampled: 10912192
    num_steps_sampled: 10912192
    num_steps_trained: 10912192
  iterations_since_restore: 213
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1754,73710.6,10912192,1.83932,1.98,-2,42.34


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10916192
  custom_metrics: {}
  date: 2021-12-10_10-14-17
  done: false
  episode_len_mean: 37.666666666666664
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8565837859033465
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 221550
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9350244700908661
          entropy_coeff: 0.0
          kl: 0.012917271233163774
          policy_loss: -0.10107857431285083
          total_loss: -0.026027651852928102
          vf_explained_var: 0.8045272827148438
          vf_loss: 0.05543281836435199
    num_agent_steps_sampled: 10916192
    num_steps_sampled: 10916192
    num_steps_trained: 10916192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1755,73735,10916192,1.85658,1.9832,-2,37.6667


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10920192
  custom_metrics: {}
  date: 2021-12-10_10-14-42
  done: false
  episode_len_mean: 39.66990291262136
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.6954796163781176
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 221653
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.939041119068861
          entropy_coeff: 0.0
          kl: 0.01331501902313903
          policy_loss: -0.09350601604091935
          total_loss: -0.006075331300962716
          vf_explained_var: 0.8199155926704407
          vf_loss: 0.0672085010446608
    num_agent_steps_sampled: 10920192
    num_steps_sampled: 10920192
    num_steps_trained: 10920192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1756,73760.2,10920192,1.69548,1.9784,-2,39.6699


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10924192
  custom_metrics: {}
  date: 2021-12-10_10-15-07
  done: false
  episode_len_mean: 35.30434782608695
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9297565242518548
  episode_reward_min: 1.5607999563217163
  episodes_this_iter: 115
  episodes_total: 221768
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8658236637711525
          entropy_coeff: 0.0
          kl: 0.014908607758115977
          policy_loss: -0.1058425159426406
          total_loss: -0.04370369960088283
          vf_explained_var: 0.7390280365943909
          vf_loss: 0.03949636849574745
    num_agent_steps_sampled: 10924192
    num_steps_sampled: 10924192
    num_steps_trained: 10924192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1757,73785.6,10924192,1.92976,1.9788,1.5608,35.3043


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10928192
  custom_metrics: {}
  date: 2021-12-10_10-15-33
  done: false
  episode_len_mean: 34.66
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.891368006467819
  episode_reward_min: -2.0
  episodes_this_iter: 93
  episodes_total: 221861
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9608439765870571
          entropy_coeff: 0.0
          kl: 0.014543279306963086
          policy_loss: -0.10581211384851485
          total_loss: -0.0267522477311104
          vf_explained_var: 0.7047425508499146
          vf_loss: 0.056972268037498
    num_agent_steps_sampled: 10928192
    num_steps_sampled: 10928192
    num_steps_trained: 10928192
  iterations_since_restore: 217
  node_ip: 1

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1758,73811.5,10928192,1.89137,1.9812,-2,34.66


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10932192
  custom_metrics: {}
  date: 2021-12-10_10-15-59
  done: false
  episode_len_mean: 39.08653846153846
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8472423152281687
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 221965
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9467111080884933
          entropy_coeff: 0.0
          kl: 0.012932242243550718
          policy_loss: -0.10014530399348587
          total_loss: -0.03780160489259288
          vf_explained_var: 0.8074660301208496
          vf_loss: 0.04270285787060857
    num_agent_steps_sampled: 10932192
    num_steps_sampled: 10932192
    num_steps_trained: 10932192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1759,73836.7,10932192,1.84724,1.9788,-2,39.0865


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10936192
  custom_metrics: {}
  date: 2021-12-10_10-16-24
  done: false
  episode_len_mean: 44.99
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8706880009174347
  episode_reward_min: -2.0
  episodes_this_iter: 89
  episodes_total: 222054
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9911124408245087
          entropy_coeff: 0.0
          kl: 0.013052138092461973
          policy_loss: -0.0967946678865701
          total_loss: -0.025604178837966174
          vf_explained_var: 0.761349618434906
          vf_loss: 0.05136755248531699
    num_agent_steps_sampled: 10936192
    num_steps_sampled: 10936192
    num_steps_trained: 10936192
  iterations_since_restore: 219
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1760,73861.6,10936192,1.87069,1.978,-2,44.99


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10940192
  custom_metrics: {}
  date: 2021-12-10_10-16-49
  done: false
  episode_len_mean: 40.59
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.880116000175476
  episode_reward_min: -2.0
  episodes_this_iter: 93
  episodes_total: 222147
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9414827413856983
          entropy_coeff: 0.0
          kl: 0.01464068511268124
          policy_loss: -0.10312431585043669
          total_loss: -0.04001877800328657
          vf_explained_var: 0.7727020382881165
          vf_loss: 0.040869997814297676
    num_agent_steps_sampled: 10940192
    num_steps_sampled: 10940192
    num_steps_trained: 10940192
  iterations_since_restore: 220
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1761,73886.7,10940192,1.88012,1.9824,-2,40.59


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10944192
  custom_metrics: {}
  date: 2021-12-10_10-17-14
  done: false
  episode_len_mean: 40.51
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8802360010147094
  episode_reward_min: -2.0
  episodes_this_iter: 91
  episodes_total: 222238
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9276546686887741
          entropy_coeff: 0.0
          kl: 0.014781807141844183
          policy_loss: -0.10382995661348104
          total_loss: -0.045506883412599564
          vf_explained_var: 0.7711009383201599
          vf_loss: 0.03587320540100336
    num_agent_steps_sampled: 10944192
    num_steps_sampled: 10944192
    num_steps_trained: 10944192
  iterations_since_restore: 221
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1762,73911.6,10944192,1.88024,1.984,-2,40.51


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10948192
  custom_metrics: {}
  date: 2021-12-10_10-17-39
  done: false
  episode_len_mean: 36.248062015503876
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8425333361292995
  episode_reward_min: -2.0
  episodes_this_iter: 129
  episodes_total: 222367
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8378773666918278
          entropy_coeff: 0.0
          kl: 0.012252917687874287
          policy_loss: -0.08621478325221688
          total_loss: -0.03217785432934761
          vf_explained_var: 0.8113217949867249
          vf_loss: 0.035427808412350714
    num_agent_steps_sampled: 10948192
    num_steps_sampled: 10948192
    num_steps_trained: 10948192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1763,73936.8,10948192,1.84253,1.9788,-2,36.2481


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10952192
  custom_metrics: {}
  date: 2021-12-10_10-18-04
  done: false
  episode_len_mean: 42.26923076923077
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9158192322804377
  episode_reward_min: 0.0
  episodes_this_iter: 104
  episodes_total: 222471
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9152367636561394
          entropy_coeff: 0.0
          kl: 0.014335212123114616
          policy_loss: -0.10413159371819347
          total_loss: -0.03921797266229987
          vf_explained_var: 0.7077704668045044
          vf_loss: 0.04314201674424112
    num_agent_steps_sampled: 10952192
    num_steps_sampled: 10952192
    num_steps_trained: 10952192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1764,73961.9,10952192,1.91582,1.9836,0,42.2692


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10956192
  custom_metrics: {}
  date: 2021-12-10_10-18-29
  done: false
  episode_len_mean: 37.12264150943396
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8886754692725416
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 222577
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8553980402648449
          entropy_coeff: 0.0
          kl: 0.014830200641881675
          policy_loss: -0.09991708747111261
          total_loss: -0.02215461985906586
          vf_explained_var: 0.5898507833480835
          vf_loss: 0.0552391002420336
    num_agent_steps_sampled: 10956192
    num_steps_sampled: 10956192
    num_steps_trained: 10956192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1765,73986.7,10956192,1.88868,1.9796,-2,37.1226


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10960192
  custom_metrics: {}
  date: 2021-12-10_10-18-54
  done: false
  episode_len_mean: 42.12
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8372239947319031
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 222672
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.888451624661684
          entropy_coeff: 0.0
          kl: 0.013133718341123313
          policy_loss: -0.09660680405795574
          total_loss: -0.031392157601658255
          vf_explained_var: 0.7258532643318176
          vf_loss: 0.04526781546883285
    num_agent_steps_sampled: 10960192
    num_steps_sampled: 10960192
    num_steps_trained: 10960192
  iterations_since_restore: 225
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1766,74011.6,10960192,1.83722,1.978,-2,42.12


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10964192
  custom_metrics: {}
  date: 2021-12-10_10-19-19
  done: false
  episode_len_mean: 36.89523809523809
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.889436193874904
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 222777
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9527279436588287
          entropy_coeff: 0.0
          kl: 0.014265324512962252
          policy_loss: -0.10535873693879694
          total_loss: -0.04750896629411727
          vf_explained_var: 0.8053568601608276
          vf_loss: 0.03618431091308594
    num_agent_steps_sampled: 10964192
    num_steps_sampled: 10964192
    num_steps_trained: 10964192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1767,74036.9,10964192,1.88944,1.9816,-2,36.8952


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10968192
  custom_metrics: {}
  date: 2021-12-10_10-19-44
  done: false
  episode_len_mean: 36.17796610169491
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.894447455971928
  episode_reward_min: -2.0
  episodes_this_iter: 118
  episodes_total: 222895
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8952611982822418
          entropy_coeff: 0.0
          kl: 0.014411408687010407
          policy_loss: -0.10134755587205291
          total_loss: -0.0325317878396163
          vf_explained_var: 0.7111520767211914
          vf_loss: 0.046928440453484654
    num_agent_steps_sampled: 10968192
    num_steps_sampled: 10968192
    num_steps_trained: 10968192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1768,74061.9,10968192,1.89445,1.9828,-2,36.178


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10972192
  custom_metrics: {}
  date: 2021-12-10_10-20-09
  done: false
  episode_len_mean: 40.24
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8414640057086944
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 222992
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8928779140114784
          entropy_coeff: 0.0
          kl: 0.01358069822890684
          policy_loss: -0.09941623662598431
          total_loss: -0.0171424358850345
          vf_explained_var: 0.6538985371589661
          vf_loss: 0.061648114351555705
    num_agent_steps_sampled: 10972192
    num_steps_sampled: 10972192
    num_steps_trained: 10972192
  iterations_since_restore: 228
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1769,74087.1,10972192,1.84146,1.9788,-2,40.24


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10976192
  custom_metrics: {}
  date: 2021-12-10_10-20-34
  done: false
  episode_len_mean: 35.96078431372549
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8567843063204896
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 223094
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9670719690620899
          entropy_coeff: 0.0
          kl: 0.01257454656297341
          policy_loss: -0.09501212404575199
          total_loss: -0.02861540563753806
          vf_explained_var: 0.8234561681747437
          vf_loss: 0.04729912499897182
    num_agent_steps_sampled: 10976192
    num_steps_sampled: 10976192
    num_steps_trained: 10976192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1770,74111.9,10976192,1.85678,1.9812,-2,35.9608


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10980192
  custom_metrics: {}
  date: 2021-12-10_10-20-59
  done: false
  episode_len_mean: 40.96153846153846
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9184346130261054
  episode_reward_min: 1.1992000341415405
  episodes_this_iter: 104
  episodes_total: 223198
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9345077611505985
          entropy_coeff: 0.0
          kl: 0.014526949846185744
          policy_loss: -0.10798835824243724
          total_loss: -0.047436112217837945
          vf_explained_var: 0.7972341775894165
          vf_loss: 0.03848944546189159
    num_agent_steps_sampled: 10980192
    num_steps_sampled: 10980192
    num_steps_trained: 10980192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1771,74136.9,10980192,1.91843,1.9788,1.1992,40.9615


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10984192
  custom_metrics: {}
  date: 2021-12-10_10-21-24
  done: false
  episode_len_mean: 34.72727272727273
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8240872740745544
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 223308
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9101659059524536
          entropy_coeff: 0.0
          kl: 0.013303095067385584
          policy_loss: -0.0959260726813227
          total_loss: -0.011001121369190514
          vf_explained_var: 0.6875782012939453
          vf_loss: 0.06472087069414556
    num_agent_steps_sampled: 10984192
    num_steps_sampled: 10984192
    num_steps_trained: 10984192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1772,74162.1,10984192,1.82409,1.9812,-2,34.7273


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10988192
  custom_metrics: {}
  date: 2021-12-10_10-21-49
  done: false
  episode_len_mean: 43.65346534653465
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9132158384464755
  episode_reward_min: 0.7487999796867371
  episodes_this_iter: 101
  episodes_total: 223409
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9017967991530895
          entropy_coeff: 0.0
          kl: 0.014849093451630324
          policy_loss: -0.10879918944556266
          total_loss: -0.04595089209033176
          vf_explained_var: 0.7121288776397705
          vf_loss: 0.040296231396496296
    num_agent_steps_sampled: 10988192
    num_steps_sampled: 10988192
    num_steps_trained: 10988192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1773,74187,10988192,1.91322,1.9828,0.7488,43.6535


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10992192
  custom_metrics: {}
  date: 2021-12-10_10-22-15
  done: false
  episode_len_mean: 40.31
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8022279989719392
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 223505
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9697050526738167
          entropy_coeff: 0.0
          kl: 0.01286442659329623
          policy_loss: -0.09748366964049637
          total_loss: -0.015960957622155547
          vf_explained_var: 0.7477502822875977
          vf_loss: 0.06198486080393195
    num_agent_steps_sampled: 10992192
    num_steps_sampled: 10992192
    num_steps_trained: 10992192
  iterations_since_restore: 233
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1774,74212.2,10992192,1.80223,1.9844,-2,40.31


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 10996192
  custom_metrics: {}
  date: 2021-12-10_10-22-39
  done: false
  episode_len_mean: 39.0990990990991
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9222270304018312
  episode_reward_min: 1.518399953842163
  episodes_this_iter: 111
  episodes_total: 223616
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8650484532117844
          entropy_coeff: 0.0
          kl: 0.015244065667502582
          policy_loss: -0.10841708618681878
          total_loss: -0.04738527728477493
          vf_explained_var: 0.6572586297988892
          vf_loss: 0.03787988191470504
    num_agent_steps_sampled: 10996192
    num_steps_sampled: 10996192
    num_steps_trained: 10996192
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1775,74237.1,10996192,1.92223,1.9828,1.5184,39.0991


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11000192
  custom_metrics: {}
  date: 2021-12-10_10-23-04
  done: false
  episode_len_mean: 34.51960784313726
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.89245098361782
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 223718
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9412561655044556
          entropy_coeff: 0.0
          kl: 0.014235930400900543
          policy_loss: -0.10316044260980561
          total_loss: -0.04100677580572665
          vf_explained_var: 0.7744860649108887
          vf_loss: 0.040532848332077265
    num_agent_steps_sampled: 11000192
    num_steps_sampled: 11000192
    num_steps_trained: 11000192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1776,74262.1,11000192,1.89245,1.984,-2,34.5196


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11004192
  custom_metrics: {}
  date: 2021-12-10_10-23-29
  done: false
  episode_len_mean: 38.37
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9236240041255952
  episode_reward_min: 1.5160000324249268
  episodes_this_iter: 99
  episodes_total: 223817
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9048708453774452
          entropy_coeff: 0.0
          kl: 0.015504031034652144
          policy_loss: -0.10641859786119312
          total_loss: -0.045573640905786306
          vf_explained_var: 0.7119255065917969
          vf_loss: 0.03729821159504354
    num_agent_steps_sampled: 11004192
    num_steps_sampled: 11004192
    num_steps_trained: 11004192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1777,74286.8,11004192,1.92362,1.984,1.516,38.37


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11008192
  custom_metrics: {}
  date: 2021-12-10_10-23-54
  done: false
  episode_len_mean: 45.32
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8720800030231475
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 223911
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8727248348295689
          entropy_coeff: 0.0
          kl: 0.014803714584559202
          policy_loss: -0.10467062162933871
          total_loss: -0.04623536975122988
          vf_explained_var: 0.7291300892829895
          vf_loss: 0.03595210798084736
    num_agent_steps_sampled: 11008192
    num_steps_sampled: 11008192
    num_steps_trained: 11008192
  iterations_since_restore: 237
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1778,74311.6,11008192,1.87208,1.9796,-2,45.32


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11012192
  custom_metrics: {}
  date: 2021-12-10_10-24-19
  done: false
  episode_len_mean: 38.22608695652174
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8900695676388948
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 224026
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8294806852936745
          entropy_coeff: 0.0
          kl: 0.0139832011773251
          policy_loss: -0.09624173445627093
          total_loss: -0.03547406604047865
          vf_explained_var: 0.6642796397209167
          vf_loss: 0.03953068295959383
    num_agent_steps_sampled: 11012192
    num_steps_sampled: 11012192
    num_steps_trained: 11012192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1779,74336.8,11012192,1.89007,1.978,-2,38.2261


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11016192
  custom_metrics: {}
  date: 2021-12-10_10-24-44
  done: false
  episode_len_mean: 32.05555555555556
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9363407382258662
  episode_reward_min: 1.768399953842163
  episodes_this_iter: 108
  episodes_total: 224134
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8702386729419231
          entropy_coeff: 0.0
          kl: 0.015462600335013121
          policy_loss: -0.1102527289185673
          total_loss: -0.054703064961358905
          vf_explained_var: 0.6878580451011658
          vf_loss: 0.0320658452110365
    num_agent_steps_sampled: 11016192
    num_steps_sampled: 11016192
    num_steps_trained: 11016192
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1780,74361.4,11016192,1.93634,1.9844,1.7684,32.0556


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11020192
  custom_metrics: {}
  date: 2021-12-10_10-25-09
  done: false
  episode_len_mean: 39.31481481481482
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8487888861585546
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 224242
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8683839030563831
          entropy_coeff: 0.0
          kl: 0.012498858268372715
          policy_loss: -0.08864511986030266
          total_loss: -0.011115946006611921
          vf_explained_var: 0.6679748892784119
          vf_loss: 0.05854653613641858
    num_agent_steps_sampled: 11020192
    num_steps_sampled: 11020192
    num_steps_trained: 11020192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1781,74386.3,11020192,1.84879,1.9788,-2,39.3148


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11024192
  custom_metrics: {}
  date: 2021-12-10_10-25-34
  done: false
  episode_len_mean: 36.4537037037037
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.7851222223705716
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 224350
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8592015169560909
          entropy_coeff: 0.0
          kl: 0.012969219533260912
          policy_loss: -0.09099522390170023
          total_loss: -0.013447330537019297
          vf_explained_var: 0.7509264349937439
          vf_loss: 0.057850894634611905
    num_agent_steps_sampled: 11024192
    num_steps_sampled: 11024192
    num_steps_trained: 11024192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1782,74411.3,11024192,1.78512,1.98,-2,36.4537


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11028192
  custom_metrics: {}
  date: 2021-12-10_10-26-00
  done: false
  episode_len_mean: 42.99
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8756879997253417
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 224444
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9128758609294891
          entropy_coeff: 0.0
          kl: 0.014712050033267587
          policy_loss: -0.10916625382378697
          total_loss: -0.05292276432737708
          vf_explained_var: 0.8249148726463318
          vf_loss: 0.03389956406317651
    num_agent_steps_sampled: 11028192
    num_steps_sampled: 11028192
    num_steps_trained: 11028192
  iterations_since_restore: 242
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1783,74437.2,11028192,1.87569,1.9788,-2,42.99


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11032192
  custom_metrics: {}
  date: 2021-12-10_10-26-25
  done: false
  episode_len_mean: 37.06603773584906
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8555999971785635
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 224550
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9250710047781467
          entropy_coeff: 0.0
          kl: 0.013270393654238433
          policy_loss: -0.1001557583513204
          total_loss: -0.03327952593099326
          vf_explained_var: 0.7550768256187439
          vf_loss: 0.04672182071954012
    num_agent_steps_sampled: 11032192
    num_steps_sampled: 11032192
    num_steps_trained: 11032192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1784,74461.9,11032192,1.8556,1.9828,-2,37.066


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11036192
  custom_metrics: {}
  date: 2021-12-10_10-26-50
  done: false
  episode_len_mean: 35.68852459016394
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9289245908377601
  episode_reward_min: 1.353600025177002
  episodes_this_iter: 122
  episodes_total: 224672
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8596412390470505
          entropy_coeff: 0.0
          kl: 0.014936630090232939
          policy_loss: -0.10788075439631939
          total_loss: -0.050666089402511716
          vf_explained_var: 0.6931124329566956
          vf_loss: 0.03452965815085918
    num_agent_steps_sampled: 11036192
    num_steps_sampled: 11036192
    num_steps_trained: 11036192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1785,74487.1,11036192,1.92892,1.9788,1.3536,35.6885


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11040192
  custom_metrics: {}
  date: 2021-12-10_10-27-15
  done: false
  episode_len_mean: 35.68807339449541
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8938642226227926
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 224781
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8710493110120296
          entropy_coeff: 0.0
          kl: 0.014106294780503958
          policy_loss: -0.10059571289457381
          total_loss: -0.046828820020891726
          vf_explained_var: 0.7150120735168457
          vf_loss: 0.03234295500442386
    num_agent_steps_sampled: 11040192
    num_steps_sampled: 11040192
    num_steps_trained: 11040192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1786,74511.8,11040192,1.89386,1.9816,-2,35.6881


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11044192
  custom_metrics: {}
  date: 2021-12-10_10-27-40
  done: false
  episode_len_mean: 36.2972972972973
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.892010810138943
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 224892
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8956804573535919
          entropy_coeff: 0.0
          kl: 0.013348616193979979
          policy_loss: -0.09842260903678834
          total_loss: -0.032671478955307975
          vf_explained_var: 0.644849419593811
          vf_loss: 0.04547791974619031
    num_agent_steps_sampled: 11044192
    num_steps_sampled: 11044192
    num_steps_trained: 11044192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1787,74537,11044192,1.89201,1.978,-2,36.2973


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11048192
  custom_metrics: {}
  date: 2021-12-10_10-28-05
  done: false
  episode_len_mean: 36.27
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8487999975681304
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 224982
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9486382156610489
          entropy_coeff: 0.0
          kl: 0.014071095152758062
          policy_loss: -0.09934115002397448
          total_loss: -0.03589065821142867
          vf_explained_var: 0.7304118871688843
          vf_loss: 0.04208001692313701
    num_agent_steps_sampled: 11048192
    num_steps_sampled: 11048192
    num_steps_trained: 11048192
  iterations_since_restore: 247
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1788,74561.9,11048192,1.8488,1.9796,-2,36.27


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11052192
  custom_metrics: {}
  date: 2021-12-10_10-28-30
  done: false
  episode_len_mean: 36.68695652173913
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9270469541135042
  episode_reward_min: 1.3312000036239624
  episodes_this_iter: 115
  episodes_total: 225097
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8577129952609539
          entropy_coeff: 0.0
          kl: 0.016427194874268025
          policy_loss: -0.11301060230471194
          total_loss: -0.056318193790502846
          vf_explained_var: 0.6905899047851562
          vf_loss: 0.03174360538832843
    num_agent_steps_sampled: 11052192
    num_steps_sampled: 11052192
    num_steps_trained: 11052192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1789,74587.2,11052192,1.92705,1.9796,1.3312,36.687


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11056192
  custom_metrics: {}
  date: 2021-12-10_10-28-55
  done: false
  episode_len_mean: 35.24
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8511840069293977
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 225197
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8963368609547615
          entropy_coeff: 0.0
          kl: 0.01320771814789623
          policy_loss: -0.09425814438145608
          total_loss: -0.02173860470065847
          vf_explained_var: 0.7118724584579468
          vf_loss: 0.052460315404459834
    num_agent_steps_sampled: 11056192
    num_steps_sampled: 11056192
    num_steps_trained: 11056192
  iterations_since_restore: 249
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1790,74611.8,11056192,1.85118,1.9816,-2,35.24


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11060192
  custom_metrics: {}
  date: 2021-12-10_10-29-19
  done: false
  episode_len_mean: 40.06
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.8064479982852937
  episode_reward_min: -2.0
  episodes_this_iter: 93
  episodes_total: 225290
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.932294450700283
          entropy_coeff: 0.0
          kl: 0.013153893873095512
          policy_loss: -0.09869183838600293
          total_loss: -0.017793853534385562
          vf_explained_var: 0.7281215786933899
          vf_loss: 0.060920506715774536
    num_agent_steps_sampled: 11060192
    num_steps_sampled: 11060192
    num_steps_trained: 11060192
  iterations_since_restore: 250
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1791,74636.5,11060192,1.80645,1.9844,-2,40.06


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11064192
  custom_metrics: {}
  date: 2021-12-10_10-29-45
  done: false
  episode_len_mean: 42.74
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9148960030078888
  episode_reward_min: 1.3716000318527222
  episodes_this_iter: 97
  episodes_total: 225387
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9550226628780365
          entropy_coeff: 0.0
          kl: 0.015483100316487253
          policy_loss: -0.11332543127355166
          total_loss: -0.05601439508609474
          vf_explained_var: 0.7979181408882141
          vf_loss: 0.03379607538226992
    num_agent_steps_sampled: 11064192
    num_steps_sampled: 11064192
    num_steps_trained: 11064192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1792,74662,11064192,1.9149,1.9844,1.3716,42.74


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11068192
  custom_metrics: {}
  date: 2021-12-10_10-30-10
  done: false
  episode_len_mean: 52.55
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8354239988327026
  episode_reward_min: -2.0
  episodes_this_iter: 91
  episodes_total: 225478
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9116475395858288
          entropy_coeff: 0.0
          kl: 0.013802813133224845
          policy_loss: -0.10546137054916471
          total_loss: -0.04613433970371261
          vf_explained_var: 0.7872927784919739
          vf_loss: 0.038364009000360966
    num_agent_steps_sampled: 11068192
    num_steps_sampled: 11068192
    num_steps_trained: 11068192
  iterations_since_restore: 252
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1793,74687.6,11068192,1.83542,1.9784,-2,52.55


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11072192
  custom_metrics: {}
  date: 2021-12-10_10-30-35
  done: false
  episode_len_mean: 33.9349593495935
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.900357727112809
  episode_reward_min: -2.0
  episodes_this_iter: 123
  episodes_total: 225601
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8276349604129791
          entropy_coeff: 0.0
          kl: 0.014598089386709034
          policy_loss: -0.09886191808618605
          total_loss: -0.027142087230458856
          vf_explained_var: 0.6369967460632324
          vf_loss: 0.04954897775314748
    num_agent_steps_sampled: 11072192
    num_steps_sampled: 11072192
    num_steps_trained: 11072192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1794,74712.5,11072192,1.90036,1.98,-2,33.935


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11076192
  custom_metrics: {}
  date: 2021-12-10_10-31-01
  done: false
  episode_len_mean: 32.9622641509434
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8974452794722791
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 225707
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9314807504415512
          entropy_coeff: 0.0
          kl: 0.013101923512294888
          policy_loss: -0.10179278254508972
          total_loss: -0.031036995627800934
          vf_explained_var: 0.672126293182373
          vf_loss: 0.050857240334153175
    num_agent_steps_sampled: 11076192
    num_steps_sampled: 11076192
    num_steps_trained: 11076192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1795,74737.6,11076192,1.89745,1.9792,-2,32.9623


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11080192
  custom_metrics: {}
  date: 2021-12-10_10-31-25
  done: false
  episode_len_mean: 40.351851851851855
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8833888879528753
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 225815
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8819644972681999
          entropy_coeff: 0.0
          kl: 0.014323943469207734
          policy_loss: -0.1020201459468808
          total_loss: -0.032979387789964676
          vf_explained_var: 0.6807314157485962
          vf_loss: 0.04728627041913569
    num_agent_steps_sampled: 11080192
    num_steps_sampled: 11080192
    num_steps_trained: 11080192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1796,74762.5,11080192,1.88339,1.98,-2,40.3519


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11084192
  custom_metrics: {}
  date: 2021-12-10_10-31-50
  done: false
  episode_len_mean: 36.66981132075472
  episode_media: {}
  episode_reward_max: 1.9764000177383423
  episode_reward_mean: 1.8899056585329883
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 225921
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8992736451327801
          entropy_coeff: 0.0
          kl: 0.013735661224927753
          policy_loss: -0.10229028121102601
          total_loss: -0.03781514393631369
          vf_explained_var: 0.6959068775177002
          vf_loss: 0.04361410695128143
    num_agent_steps_sampled: 11084192
    num_steps_sampled: 11084192
    num_steps_trained: 11084192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1797,74787.2,11084192,1.88991,1.9764,-2,36.6698


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11088192
  custom_metrics: {}
  date: 2021-12-10_10-32-16
  done: false
  episode_len_mean: 37.48275862068966
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8917310278991173
  episode_reward_min: -2.0
  episodes_this_iter: 116
  episodes_total: 226037
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8644202016294003
          entropy_coeff: 0.0
          kl: 0.013629792083520442
          policy_loss: -0.09349229786312208
          total_loss: -0.03826579387532547
          vf_explained_var: 0.7005399465560913
          vf_loss: 0.03452625940553844
    num_agent_steps_sampled: 11088192
    num_steps_sampled: 11088192
    num_steps_trained: 11088192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1798,74812.6,11088192,1.89173,1.9832,-2,37.4828


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11092192
  custom_metrics: {}
  date: 2021-12-10_10-32-41
  done: false
  episode_len_mean: 32.208695652173915
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8688556526018225
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 226152
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8438171334564686
          entropy_coeff: 0.0
          kl: 0.013259528612252325
          policy_loss: -0.09094490390270948
          total_loss: -0.03378153848461807
          vf_explained_var: 0.787030816078186
          vf_loss: 0.037025460856966674
    num_agent_steps_sampled: 11092192
    num_steps_sampled: 11092192
    num_steps_trained: 11092192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1799,74837.8,11092192,1.86886,1.978,-2,32.2087


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11096192
  custom_metrics: {}
  date: 2021-12-10_10-33-06
  done: false
  episode_len_mean: 36.223214285714285
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8941107147506304
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 226264
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8596126735210419
          entropy_coeff: 0.0
          kl: 0.01364504813682288
          policy_loss: -0.09562262304825708
          total_loss: -0.0343117177253589
          vf_explained_var: 0.695154070854187
          vf_loss: 0.04058748634997755
    num_agent_steps_sampled: 11096192
    num_steps_sampled: 11096192
    num_steps_trained: 11096192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1800,74863.2,11096192,1.89411,1.984,-2,36.2232


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11100192
  custom_metrics: {}
  date: 2021-12-10_10-33-32
  done: false
  episode_len_mean: 32.78378378378378
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9011783825384605
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 226375
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8761181347072124
          entropy_coeff: 0.0
          kl: 0.014587070501875132
          policy_loss: -0.1048404072644189
          total_loss: -0.05043300217948854
          vf_explained_var: 0.7593273520469666
          vf_loss: 0.03225329064298421
    num_agent_steps_sampled: 11100192
    num_steps_sampled: 11100192
    num_steps_trained: 11100192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1801,74888.9,11100192,1.90118,1.9788,-2,32.7838


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11104192
  custom_metrics: {}
  date: 2021-12-10_10-33-58
  done: false
  episode_len_mean: 41.825688073394495
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9168183497332651
  episode_reward_min: 1.3007999658584595
  episodes_this_iter: 109
  episodes_total: 226484
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8713329210877419
          entropy_coeff: 0.0
          kl: 0.0165236882166937
          policy_loss: -0.11337988497689366
          total_loss: -0.058746538939885795
          vf_explained_var: 0.6895031929016113
          vf_loss: 0.029537992319092155
    num_agent_steps_sampled: 11104192
    num_steps_sampled: 11104192
    num_steps_trained: 11104192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1802,74915.1,11104192,1.91682,1.9784,1.3008,41.8257


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11108192
  custom_metrics: {}
  date: 2021-12-10_10-34-23
  done: false
  episode_len_mean: 37.23
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8883640027046205
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 226583
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8915805965662003
          entropy_coeff: 0.0
          kl: 0.014039516565389931
          policy_loss: -0.09795051522087306
          total_loss: -0.03020506916800514
          vf_explained_var: 0.6822406649589539
          vf_loss: 0.04642293869983405
    num_agent_steps_sampled: 11108192
    num_steps_sampled: 11108192
    num_steps_trained: 11108192
  iterations_since_restore: 262
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1803,74940.2,11108192,1.88836,1.9804,-2,37.23


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11112192
  custom_metrics: {}
  date: 2021-12-10_10-34-48
  done: false
  episode_len_mean: 35.37068965517241
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.9297068879522126
  episode_reward_min: 1.6468000411987305
  episodes_this_iter: 116
  episodes_total: 226699
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8444603234529495
          entropy_coeff: 0.0
          kl: 0.01581461902242154
          policy_loss: -0.11070843937341124
          total_loss: -0.0556183346780017
          vf_explained_var: 0.6946080923080444
          vf_loss: 0.031071654171682894
    num_agent_steps_sampled: 11112192
    num_steps_sampled: 11112192
    num_steps_trained: 11112192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1804,74964.9,11112192,1.92971,1.9788,1.6468,35.3707


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11116192
  custom_metrics: {}
  date: 2021-12-10_10-35-13
  done: false
  episode_len_mean: 36.38532110091743
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9275999987890962
  episode_reward_min: 1.6984000205993652
  episodes_this_iter: 109
  episodes_total: 226808
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9057781063020229
          entropy_coeff: 0.0
          kl: 0.013975938956718892
          policy_loss: -0.09197546122595668
          total_loss: -0.03852484765229747
          vf_explained_var: 0.7639683485031128
          vf_loss: 0.03222466097213328
    num_agent_steps_sampled: 11116192
    num_steps_sampled: 11116192
    num_steps_trained: 11116192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1805,74989.8,11116192,1.9276,1.9804,1.6984,36.3853


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11120192
  custom_metrics: {}
  date: 2021-12-10_10-35-39
  done: false
  episode_len_mean: 39.48
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.844948000907898
  episode_reward_min: -2.0
  episodes_this_iter: 91
  episodes_total: 226899
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9271771237254143
          entropy_coeff: 0.0
          kl: 0.013649693108163774
          policy_loss: -0.09911087504588068
          total_loss: -0.028027955850120634
          vf_explained_var: 0.7766591310501099
          vf_loss: 0.05035244976170361
    num_agent_steps_sampled: 11120192
    num_steps_sampled: 11120192
    num_steps_trained: 11120192
  iterations_since_restore: 265
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1806,75015.3,11120192,1.84495,1.9784,-2,39.48


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11124192
  custom_metrics: {}
  date: 2021-12-10_10-36-04
  done: false
  episode_len_mean: 37.13461538461539
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8543730767873616
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 227003
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8938931152224541
          entropy_coeff: 0.0
          kl: 0.0134478232357651
          policy_loss: -0.09158913926512469
          total_loss: -0.015610722999554127
          vf_explained_var: 0.746673583984375
          vf_loss: 0.05555453372653574
    num_agent_steps_sampled: 11124192
    num_steps_sampled: 11124192
    num_steps_trained: 11124192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1807,75040.8,11124192,1.85437,1.9776,-2,37.1346


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11128192
  custom_metrics: {}
  date: 2021-12-10_10-36-29
  done: false
  episode_len_mean: 39.38392857142857
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.851889282464981
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 227115
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8458203747868538
          entropy_coeff: 0.0
          kl: 0.014464297099038959
          policy_loss: -0.10196243156678975
          total_loss: -0.042007902171462774
          vf_explained_var: 0.7398101091384888
          vf_loss: 0.03798688354436308
    num_agent_steps_sampled: 11128192
    num_steps_sampled: 11128192
    num_steps_trained: 11128192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1808,75065.7,11128192,1.85189,1.9816,-2,39.3839


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11132192
  custom_metrics: {}
  date: 2021-12-10_10-36-55
  done: false
  episode_len_mean: 32.57
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8586120009422302
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 227215
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.932180430740118
          entropy_coeff: 0.0
          kl: 0.013939745025709271
          policy_loss: -0.0974764302955009
          total_loss: -0.02362332516349852
          vf_explained_var: 0.7644666433334351
          vf_loss: 0.05268211767543107
    num_agent_steps_sampled: 11132192
    num_steps_sampled: 11132192
    num_steps_trained: 11132192
  iterations_since_restore: 268
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1809,75091.4,11132192,1.85861,1.9836,-2,32.57


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11136192
  custom_metrics: {}
  date: 2021-12-10_10-37-20
  done: false
  episode_len_mean: 39.83898305084746
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.888111861075385
  episode_reward_min: -2.0
  episodes_this_iter: 118
  episodes_total: 227333
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8573822490870953
          entropy_coeff: 0.0
          kl: 0.014077547704800963
          policy_loss: -0.100008336128667
          total_loss: -0.039489211136242375
          vf_explained_var: 0.7628613710403442
          vf_loss: 0.03913884959183633
    num_agent_steps_sampled: 11136192
    num_steps_sampled: 11136192
    num_steps_trained: 11136192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1810,75116.8,11136192,1.88811,1.9804,-2,39.839


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11140192
  custom_metrics: {}
  date: 2021-12-10_10-37-46
  done: false
  episode_len_mean: 36.30275229357798
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8919963317179898
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 227442
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.877327885478735
          entropy_coeff: 0.0
          kl: 0.01424821763066575
          policy_loss: -0.10154408914968371
          total_loss: -0.043475484708324075
          vf_explained_var: 0.7724430561065674
          vf_loss: 0.03642912511713803
    num_agent_steps_sampled: 11140192
    num_steps_sampled: 11140192
    num_steps_trained: 11140192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1811,75142.2,11140192,1.892,1.9788,-2,36.3028


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11144192
  custom_metrics: {}
  date: 2021-12-10_10-38-11
  done: false
  episode_len_mean: 38.88
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.887715995311737
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 227542
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9095548465847969
          entropy_coeff: 0.0
          kl: 0.014356789644807577
          policy_loss: -0.10470947506837547
          total_loss: -0.035831792280077934
          vf_explained_var: 0.7178137302398682
          vf_loss: 0.04707330861128867
    num_agent_steps_sampled: 11144192
    num_steps_sampled: 11144192
    num_steps_trained: 11144192
  iterations_since_restore: 271
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1812,75167.9,11144192,1.88772,1.9784,-2,38.88


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11148192
  custom_metrics: {}
  date: 2021-12-10_10-38-37
  done: false
  episode_len_mean: 37.40163934426229
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8932032809882868
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 227664
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8303112648427486
          entropy_coeff: 0.0
          kl: 0.01371071912581101
          policy_loss: -0.09790432860609144
          total_loss: -0.03534646757179871
          vf_explained_var: 0.6361901164054871
          vf_loss: 0.0417347033508122
    num_agent_steps_sampled: 11148192
    num_steps_sampled: 11148192
    num_steps_trained: 11148192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1813,75193.2,11148192,1.8932,1.9804,-2,37.4016


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11152192
  custom_metrics: {}
  date: 2021-12-10_10-39-02
  done: false
  episode_len_mean: 32.58536585365854
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.8715674838399499
  episode_reward_min: -2.0
  episodes_this_iter: 123
  episodes_total: 227787
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8205837272107601
          entropy_coeff: 0.0
          kl: 0.012487733387388289
          policy_loss: -0.09377688658423722
          total_loss: -0.029884482704801485
          vf_explained_var: 0.6294041872024536
          vf_loss: 0.04492665524594486
    num_agent_steps_sampled: 11152192
    num_steps_sampled: 11152192
    num_steps_trained: 11152192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1814,75218.7,11152192,1.87157,1.9772,-2,32.5854


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11156192
  custom_metrics: {}
  date: 2021-12-10_10-39-28
  done: false
  episode_len_mean: 33.88785046728972
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8608074812131508
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 227894
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8575840704143047
          entropy_coeff: 0.0
          kl: 0.013841911568306386
          policy_loss: -0.09870808944106102
          total_loss: -0.04402565595228225
          vf_explained_var: 0.7554066181182861
          vf_loss: 0.033660034416243434
    num_agent_steps_sampled: 11156192
    num_steps_sampled: 11156192
    num_steps_trained: 11156192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1815,75244.2,11156192,1.86081,1.9788,-2,33.8879


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11160192
  custom_metrics: {}
  date: 2021-12-10_10-39-53
  done: false
  episode_len_mean: 35.71304347826087
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8603234778279845
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 228009
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8539419807493687
          entropy_coeff: 0.0
          kl: 0.013180337846279144
          policy_loss: -0.09418654895853251
          total_loss: -0.028042943449690938
          vf_explained_var: 0.7033207416534424
          vf_loss: 0.0461259683361277
    num_agent_steps_sampled: 11160192
    num_steps_sampled: 11160192
    num_steps_trained: 11160192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1816,75269.8,11160192,1.86032,1.9804,-2,35.713


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11164192
  custom_metrics: {}
  date: 2021-12-10_10-40-19
  done: false
  episode_len_mean: 37.88
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9245960056781768
  episode_reward_min: 1.6283999681472778
  episodes_this_iter: 99
  episodes_total: 228108
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9267474263906479
          entropy_coeff: 0.0
          kl: 0.013978723203763366
          policy_loss: -0.10191715299151838
          total_loss: -0.04852616542484611
          vf_explained_var: 0.7851876020431519
          vf_loss: 0.03216080239508301
    num_agent_steps_sampled: 11164192
    num_steps_sampled: 11164192
    num_steps_trained: 11164192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1817,75295.1,11164192,1.9246,1.9804,1.6284,37.88


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11168192
  custom_metrics: {}
  date: 2021-12-10_10-40-44
  done: false
  episode_len_mean: 35.66115702479339
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.870429755242403
  episode_reward_min: -2.0
  episodes_this_iter: 121
  episodes_total: 228229
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8372097723186016
          entropy_coeff: 0.0
          kl: 0.01333597459597513
          policy_loss: -0.09977095481008291
          total_loss: -0.03569230288849212
          vf_explained_var: 0.7014224529266357
          vf_loss: 0.04382463847286999
    num_agent_steps_sampled: 11168192
    num_steps_sampled: 11168192
    num_steps_trained: 11168192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1818,75320.5,11168192,1.87043,1.9808,-2,35.6612


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11172192
  custom_metrics: {}
  date: 2021-12-10_10-41-10
  done: false
  episode_len_mean: 37.157407407407405
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8893481481958319
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 228337
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8710978552699089
          entropy_coeff: 0.0
          kl: 0.01395728625357151
          policy_loss: -0.09840884420555085
          total_loss: -0.0427928082208382
          vf_explained_var: 0.747422456741333
          vf_loss: 0.034418409573845565
    num_agent_steps_sampled: 11172192
    num_steps_sampled: 11172192
    num_steps_trained: 11172192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1819,75346.1,11172192,1.88935,1.9804,-2,37.1574


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11176192
  custom_metrics: {}
  date: 2021-12-10_10-41-35
  done: false
  episode_len_mean: 33.56363636363636
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.898010911724784
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 228447
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8690026290714741
          entropy_coeff: 0.0
          kl: 0.013480208639521152
          policy_loss: -0.09679926640819758
          total_loss: -0.03883746941573918
          vf_explained_var: 0.756820023059845
          vf_loss: 0.03748873248696327
    num_agent_steps_sampled: 11176192
    num_steps_sampled: 11176192
    num_steps_trained: 11176192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1820,75371.7,11176192,1.89801,1.9788,-2,33.5636


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11180192
  custom_metrics: {}
  date: 2021-12-10_10-42-01
  done: false
  episode_len_mean: 35.1
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9301839983463287
  episode_reward_min: 1.7028000354766846
  episodes_this_iter: 99
  episodes_total: 228546
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9720521606504917
          entropy_coeff: 0.0
          kl: 0.013853857293725014
          policy_loss: -0.10111212910851464
          total_loss: -0.04551925174018834
          vf_explained_var: 0.823212742805481
          vf_loss: 0.03455233050044626
    num_agent_steps_sampled: 11180192
    num_steps_sampled: 11180192
    num_steps_trained: 11180192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1821,75396.9,11180192,1.93018,1.9808,1.7028,35.1


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11184192
  custom_metrics: {}
  date: 2021-12-10_10-42-26
  done: false
  episode_len_mean: 39.88785046728972
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.9205794423540061
  episode_reward_min: 0.9196000099182129
  episodes_this_iter: 107
  episodes_total: 228653
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9004168771207333
          entropy_coeff: 0.0
          kl: 0.01468154159374535
          policy_loss: -0.10971680423244834
          total_loss: -0.055080597288906574
          vf_explained_var: 0.7720317840576172
          vf_loss: 0.03233861515764147
    num_agent_steps_sampled: 11184192
    num_steps_sampled: 11184192
    num_steps_trained: 11184192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1822,75422,11184192,1.92058,1.9784,0.9196,39.8879


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11188192
  custom_metrics: {}
  date: 2021-12-10_10-42-51
  done: false
  episode_len_mean: 38.87735849056604
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.8867169888514392
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 228759
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9250962771475315
          entropy_coeff: 0.0
          kl: 0.014037700835615396
          policy_loss: -0.1029257153859362
          total_loss: -0.04682754765963182
          vf_explained_var: 0.7836179733276367
          vf_loss: 0.0347784060286358
    num_agent_steps_sampled: 11188192
    num_steps_sampled: 11188192
    num_steps_trained: 11188192
  iterations_since_restore: 2

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1823,75447.1,11188192,1.88672,1.9776,-2,38.8774


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11192192
  custom_metrics: {}
  date: 2021-12-10_10-43-16
  done: false
  episode_len_mean: 36.232
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.8968831930160523
  episode_reward_min: -2.0
  episodes_this_iter: 125
  episodes_total: 228884
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8231734670698643
          entropy_coeff: 0.0
          kl: 0.014315163542050868
          policy_loss: -0.09689792385324836
          total_loss: -0.03691269189585
          vf_explained_var: 0.6939054131507874
          vf_loss: 0.038244076422415674
    num_agent_steps_sampled: 11192192
    num_steps_sampled: 11192192
    num_steps_trained: 11192192
  iterations_since_restore: 283
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1824,75472.4,11192192,1.89688,1.9772,-2,36.232


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11196192
  custom_metrics: {}
  date: 2021-12-10_10-43-41
  done: false
  episode_len_mean: 37.89
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9246080029010773
  episode_reward_min: 1.5776000022888184
  episodes_this_iter: 94
  episodes_total: 228978
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8732775412499905
          entropy_coeff: 0.0
          kl: 0.01463138940744102
          policy_loss: -0.1066008455818519
          total_loss: -0.05269299371866509
          vf_explained_var: 0.7211762070655823
          vf_loss: 0.03168642660602927
    num_agent_steps_sampled: 11196192
    num_steps_sampled: 11196192
    num_steps_trained: 11196192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1825,75497.5,11196192,1.92461,1.982,1.5776,37.89


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11200192
  custom_metrics: {}
  date: 2021-12-10_10-44-07
  done: false
  episode_len_mean: 38.838095238095235
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8881752377464658
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 229083
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8457688391208649
          entropy_coeff: 0.0
          kl: 0.014364617934916168
          policy_loss: -0.09893394471146166
          total_loss: -0.049066074410802685
          vf_explained_var: 0.8250946998596191
          vf_loss: 0.028051606845110655
    num_agent_steps_sampled: 11200192
    num_steps_sampled: 11200192
    num_steps_trained: 11200192
  iterations_since_resto

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1826,75522.7,11200192,1.88818,1.9824,-2,38.8381


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11204192
  custom_metrics: {}
  date: 2021-12-10_10-44-32
  done: false
  episode_len_mean: 37.70873786407767
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8871262131385433
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 229186
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8994129858911037
          entropy_coeff: 0.0
          kl: 0.01429156115045771
          policy_loss: -0.09949646855238825
          total_loss: -0.04484809178393334
          vf_explained_var: 0.6891802549362183
          vf_loss: 0.03294306586030871
    num_agent_steps_sampled: 11204192
    num_steps_sampled: 11204192
    num_steps_trained: 11204192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1827,75547.9,11204192,1.88713,1.9836,-2,37.7087


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11208192
  custom_metrics: {}
  date: 2021-12-10_10-44-57
  done: false
  episode_len_mean: 38.69230769230769
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8865538445802836
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 229290
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8509733229875565
          entropy_coeff: 0.0
          kl: 0.014508927532006055
          policy_loss: -0.10045043402351439
          total_loss: -0.04366001134621911
          vf_explained_var: 0.6601070761680603
          vf_loss: 0.03475499467458576
    num_agent_steps_sampled: 11208192
    num_steps_sampled: 11208192
    num_steps_trained: 11208192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1828,75573.4,11208192,1.88655,1.9788,-2,38.6923


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11212192
  custom_metrics: {}
  date: 2021-12-10_10-45-23
  done: false
  episode_len_mean: 38.92
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9226279985904693
  episode_reward_min: 1.5252000093460083
  episodes_this_iter: 100
  episodes_total: 229390
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9210794791579247
          entropy_coeff: 0.0
          kl: 0.015056309872306883
          policy_loss: -0.10703553818166256
          total_loss: -0.054773577488958836
          vf_explained_var: 0.7990322709083557
          vf_loss: 0.029395192628726363
    num_agent_steps_sampled: 11212192
    num_steps_sampled: 11212192
    num_steps_trained: 11212192
  iterations_since_rest

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1829,75598.5,11212192,1.92263,1.9812,1.5252,38.92


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11216192
  custom_metrics: {}
  date: 2021-12-10_10-45-48
  done: false
  episode_len_mean: 34.96747967479675
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8093105709649684
  episode_reward_min: -2.0
  episodes_this_iter: 123
  episodes_total: 229513
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8475570008158684
          entropy_coeff: 0.0
          kl: 0.01220988150453195
          policy_loss: -0.08836376667022705
          total_loss: -0.02843803307041526
          vf_explained_var: 0.8448941707611084
          vf_loss: 0.04138197784777731
    num_agent_steps_sampled: 11216192
    num_steps_sampled: 11216192
    num_steps_trained: 11216192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1830,75623.5,11216192,1.80931,1.9784,-2,34.9675


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11220192
  custom_metrics: {}
  date: 2021-12-10_10-46-13
  done: false
  episode_len_mean: 35.083333333333336
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.7854222224818335
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 229621
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8635144904255867
          entropy_coeff: 0.0
          kl: 0.013199990906286985
          policy_loss: -0.09534947748761624
          total_loss: -0.02229600661667064
          vf_explained_var: 0.7999029159545898
          vf_loss: 0.05300598428584635
    num_agent_steps_sampled: 11220192
    num_steps_sampled: 11220192
    num_steps_trained: 11220192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1831,75648.7,11220192,1.78542,1.9788,-2,35.0833


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11224192
  custom_metrics: {}
  date: 2021-12-10_10-46-38
  done: false
  episode_len_mean: 34.596330275229356
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8947229319756185
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 229730
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9068840146064758
          entropy_coeff: 0.0
          kl: 0.013515285158064216
          policy_loss: -0.0969313268433325
          total_loss: -0.024899528245441616
          vf_explained_var: 0.688589334487915
          vf_loss: 0.05150545900687575
    num_agent_steps_sampled: 11224192
    num_steps_sampled: 11224192
    num_steps_trained: 11224192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1832,75674.1,11224192,1.89472,1.9832,-2,34.5963


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11228192
  custom_metrics: {}
  date: 2021-12-10_10-47-04
  done: false
  episode_len_mean: 37.26315789473684
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.89558947609182
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 229844
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.856641385704279
          entropy_coeff: 0.0
          kl: 0.014221763063687831
          policy_loss: -0.1007084809243679
          total_loss: -0.044279464083956555
          vf_explained_var: 0.7642043828964233
          vf_loss: 0.03482971538323909
    num_agent_steps_sampled: 11228192
    num_steps_sampled: 11228192
    num_steps_trained: 11228192
  iterations_since_restore: 292

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1833,75699.4,11228192,1.89559,1.9832,-2,37.2632


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11232192
  custom_metrics: {}
  date: 2021-12-10_10-47-29
  done: false
  episode_len_mean: 35.92
  episode_media: {}
  episode_reward_max: 1.985200047492981
  episode_reward_mean: 1.8908000016212463
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 229944
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9141341187059879
          entropy_coeff: 0.0
          kl: 0.013508304953575134
          policy_loss: -0.10122263501398265
          total_loss: -0.041750489559490234
          vf_explained_var: 0.8140862584114075
          vf_loss: 0.038956406991928816
    num_agent_steps_sampled: 11232192
    num_steps_sampled: 11232192
    num_steps_trained: 11232192
  iterations_since_restore: 293
  node

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1834,75724.6,11232192,1.8908,1.9852,-2,35.92


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11236192
  custom_metrics: {}
  date: 2021-12-10_10-47-54
  done: false
  episode_len_mean: 49.27
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.7863880026340484
  episode_reward_min: -2.0
  episodes_this_iter: 95
  episodes_total: 230039
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8708015494048595
          entropy_coeff: 0.0
          kl: 0.014348609314765781
          policy_loss: -0.09865306114079431
          total_loss: -0.033998237864580005
          vf_explained_var: 0.7467019557952881
          vf_loss: 0.042862870497629046
    num_agent_steps_sampled: 11236192
    num_steps_sampled: 11236192
    num_steps_trained: 11236192
  iterations_since_restore: 294
  node

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1835,75749.8,11236192,1.78639,1.9788,-2,49.27


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11240192
  custom_metrics: {}
  date: 2021-12-10_10-48-19
  done: false
  episode_len_mean: 36.01960784313726
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8897921536483018
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 230141
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9268909133970737
          entropy_coeff: 0.0
          kl: 0.013911501155234873
          policy_loss: -0.10270212148316205
          total_loss: -0.02909416425973177
          vf_explained_var: 0.6272038221359253
          vf_loss: 0.05247986363247037
    num_agent_steps_sampled: 11240192
    num_steps_sampled: 11240192
    num_steps_trained: 11240192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1836,75775.2,11240192,1.88979,1.9792,-2,36.0196


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11244192
  custom_metrics: {}
  date: 2021-12-10_10-48-45
  done: false
  episode_len_mean: 37.861111111111114
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8882629606458876
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 230249
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8558527268469334
          entropy_coeff: 0.0
          kl: 0.014543703582603484
          policy_loss: -0.10069209814537317
          total_loss: -0.0416734783211723
          vf_explained_var: 0.7270405888557434
          vf_loss: 0.03693037084303796
    num_agent_steps_sampled: 11244192
    num_steps_sampled: 11244192
    num_steps_trained: 11244192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1837,75800.4,11244192,1.88826,1.9832,-2,37.8611


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11248192
  custom_metrics: {}
  date: 2021-12-10_10-49-10
  done: false
  episode_len_mean: 35.35398230088496
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.929674336340575
  episode_reward_min: 1.6008000373840332
  episodes_this_iter: 113
  episodes_total: 230362
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8500668555498123
          entropy_coeff: 0.0
          kl: 0.016117660386953503
          policy_loss: -0.10879064878099598
          total_loss: -0.05064059153664857
          vf_explained_var: 0.6359900236129761
          vf_loss: 0.03367136116139591
    num_agent_steps_sampled: 11248192
    num_steps_sampled: 11248192
    num_steps_trained: 11248192
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1838,75825.5,11248192,1.92967,1.9832,1.6008,35.354


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11252192
  custom_metrics: {}
  date: 2021-12-10_10-49-35
  done: false
  episode_len_mean: 35.32
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9296960020065308
  episode_reward_min: 1.6763999462127686
  episodes_this_iter: 97
  episodes_total: 230459
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.967622634023428
          entropy_coeff: 0.0
          kl: 0.014888451260048896
          policy_loss: -0.10842822038102895
          total_loss: -0.056242164282593876
          vf_explained_var: 0.7671511173248291
          vf_loss: 0.02957422228064388
    num_agent_steps_sampled: 11252192
    num_steps_sampled: 11252192
    num_steps_trained: 11252192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1839,75850.6,11252192,1.9297,1.9828,1.6764,35.32


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11256192
  custom_metrics: {}
  date: 2021-12-10_10-50-01
  done: false
  episode_len_mean: 39.62
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8421480011940004
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 230556
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9462666288018227
          entropy_coeff: 0.0
          kl: 0.012318716384470463
          policy_loss: -0.08928856218699366
          total_loss: -0.0009939753217622638
          vf_explained_var: 0.6423196792602539
          vf_loss: 0.0695855391677469
    num_agent_steps_sampled: 11256192
    num_steps_sampled: 11256192
    num_steps_trained: 11256192
  iterations_since_restore: 299
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1840,75876.4,11256192,1.84215,1.9788,-2,39.62


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11260192
  custom_metrics: {}
  date: 2021-12-10_10-50-26
  done: false
  episode_len_mean: 39.33009708737864
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.846124277531522
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 230659
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.921709205955267
          entropy_coeff: 0.0
          kl: 0.01317719678627327
          policy_loss: -0.0941271077026613
          total_loss: -0.03474367310991511
          vf_explained_var: 0.7825401425361633
          vf_loss: 0.039370563928969204
    num_agent_steps_sampled: 11260192
    num_steps_sampled: 11260192
    num_steps_trained: 11260192
  iterations_since_restore: 30

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1841,75901.5,11260192,1.84612,1.9784,-2,39.3301


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11264192
  custom_metrics: {}
  date: 2021-12-10_10-50-51
  done: false
  episode_len_mean: 38.90291262135922
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8842407765897733
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 230762
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9134657718241215
          entropy_coeff: 0.0
          kl: 0.013559432583861053
          policy_loss: -0.09833638311829418
          total_loss: -0.03215504781110212
          vf_explained_var: 0.6938314437866211
          vf_loss: 0.045587950153276324
    num_agent_steps_sampled: 11264192
    num_steps_sampled: 11264192
    num_steps_trained: 11264192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1842,75926.3,11264192,1.88424,1.984,-2,38.9029


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11268192
  custom_metrics: {}
  date: 2021-12-10_10-51-16
  done: false
  episode_len_mean: 43.648648648648646
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9131099082328178
  episode_reward_min: 0.4936000108718872
  episodes_this_iter: 111
  episodes_total: 230873
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8782111518085003
          entropy_coeff: 0.0
          kl: 0.015209389035589993
          policy_loss: -0.10962388885673136
          total_loss: -0.048850940307602286
          vf_explained_var: 0.6801033020019531
          vf_loss: 0.03767369547858834
    num_agent_steps_sampled: 11268192
    num_steps_sampled: 11268192
    num_steps_trained: 11268192
  iteration

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1843,75951.3,11268192,1.91311,1.978,0.4936,43.6486


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11272192
  custom_metrics: {}
  date: 2021-12-10_10-51-41
  done: false
  episode_len_mean: 37.23
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8862799990177155
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 230963
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.945255696773529
          entropy_coeff: 0.0
          kl: 0.013959547271952033
          policy_loss: -0.1023856324609369
          total_loss: -0.03321927320212126
          vf_explained_var: 0.6731134653091431
          vf_loss: 0.04796529922168702
    num_agent_steps_sampled: 11272192
    num_steps_sampled: 11272192
    num_steps_trained: 11272192
  iterations_since_restore: 303
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1844,75976.4,11272192,1.88628,1.9788,-2,37.23


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11276192
  custom_metrics: {}
  date: 2021-12-10_10-52-06
  done: false
  episode_len_mean: 45.922330097087375
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8703495159889887
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 231066
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9019672684371471
          entropy_coeff: 0.0
          kl: 0.015616223390679806
          policy_loss: -0.11021136393537745
          total_loss: -0.04912591050378978
          vf_explained_var: 0.6985252499580383
          vf_loss: 0.03736830921843648
    num_agent_steps_sampled: 11276192
    num_steps_sampled: 11276192
    num_steps_trained: 11276192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1845,76001.6,11276192,1.87035,1.978,-2,45.9223


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11280192
  custom_metrics: {}
  date: 2021-12-10_10-52-31
  done: false
  episode_len_mean: 34.728155339805824
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8189320402237976
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 231169
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9164583720266819
          entropy_coeff: 0.0
          kl: 0.013371292618103325
          policy_loss: -0.09493382461369038
          total_loss: -0.02312896039802581
          vf_explained_var: 0.7660713195800781
          vf_loss: 0.05149721330963075
    num_agent_steps_sampled: 11280192
    num_steps_sampled: 11280192
    num_steps_trained: 11280192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1846,76026.9,11280192,1.81893,1.9832,-2,34.7282


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11284192
  custom_metrics: {}
  date: 2021-12-10_10-52-56
  done: false
  episode_len_mean: 36.482142857142854
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8232964277267456
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 231281
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.884445309638977
          entropy_coeff: 0.0
          kl: 0.013288025453221053
          policy_loss: -0.09617724973941222
          total_loss: -0.020635776570998132
          vf_explained_var: 0.7497639656066895
          vf_loss: 0.05536028556525707
    num_agent_steps_sampled: 11284192
    num_steps_sampled: 11284192
    num_steps_trained: 11284192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1847,76051.9,11284192,1.8233,1.9792,-2,36.4821


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11288192
  custom_metrics: {}
  date: 2021-12-10_10-53-22
  done: false
  episode_len_mean: 38.108108108108105
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8562126095230516
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 231392
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8661736994981766
          entropy_coeff: 0.0
          kl: 0.013792805199045688
          policy_loss: -0.0990648886654526
          total_loss: -0.04103506018873304
          vf_explained_var: 0.7785528898239136
          vf_loss: 0.037082004244439304
    num_agent_steps_sampled: 11288192
    num_steps_sampled: 11288192
    num_steps_trained: 11288192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1848,76077.2,11288192,1.85621,1.984,-2,38.1081


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11292192
  custom_metrics: {}
  date: 2021-12-10_10-53-47
  done: false
  episode_len_mean: 32.109243697478995
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9362420164236502
  episode_reward_min: 1.6252000331878662
  episodes_this_iter: 119
  episodes_total: 231511
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8753269761800766
          entropy_coeff: 0.0
          kl: 0.013873219664674252
          policy_loss: -0.10474156332202256
          total_loss: -0.04294212922104634
          vf_explained_var: 0.6780648827552795
          vf_loss: 0.04072948219254613
    num_agent_steps_sampled: 11292192
    num_steps_sampled: 11292192
    num_steps_trained: 11292192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1849,76102.4,11292192,1.93624,1.9808,1.6252,32.1092


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11296192
  custom_metrics: {}
  date: 2021-12-10_10-54-12
  done: false
  episode_len_mean: 41.14
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.9180719983577728
  episode_reward_min: 1.5476000308990479
  episodes_this_iter: 100
  episodes_total: 231611
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9358654245734215
          entropy_coeff: 0.0
          kl: 0.015040171158034354
          policy_loss: -0.1108672876143828
          total_loss: -0.05507917582872324
          vf_explained_var: 0.6893855333328247
          vf_loss: 0.032945854822173715
    num_agent_steps_sampled: 11296192
    num_steps_sampled: 11296192
    num_steps_trained: 11296192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1850,76127.4,11296192,1.91807,1.978,1.5476,41.14


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11300192
  custom_metrics: {}
  date: 2021-12-10_10-54-37
  done: false
  episode_len_mean: 39.009803921568626
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.81069019612144
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 231713
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8712286092340946
          entropy_coeff: 0.0
          kl: 0.012755810399539769
          policy_loss: -0.09167402551975101
          total_loss: -0.026236849516863003
          vf_explained_var: 0.7409048080444336
          vf_loss: 0.04606429301202297
    num_agent_steps_sampled: 11300192
    num_steps_sampled: 11300192
    num_steps_trained: 11300192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1851,76152.4,11300192,1.81069,1.9836,-2,39.0098


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11304192
  custom_metrics: {}
  date: 2021-12-10_10-55-03
  done: false
  episode_len_mean: 41.24
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.84164400100708
  episode_reward_min: -2.0
  episodes_this_iter: 96
  episodes_total: 231809
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8942366130650043
          entropy_coeff: 0.0
          kl: 0.013872499635908753
          policy_loss: -0.09666804247535765
          total_loss: -0.022169069678056985
          vf_explained_var: 0.721813440322876
          vf_loss: 0.053430113242939115
    num_agent_steps_sampled: 11304192
    num_steps_sampled: 11304192
    num_steps_trained: 11304192
  iterations_since_restore: 311
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1852,76177.9,11304192,1.84164,1.9796,-2,41.24


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11308192
  custom_metrics: {}
  date: 2021-12-10_10-55-28
  done: false
  episode_len_mean: 36.77981651376147
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8195522968922186
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 231918
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8871412724256516
          entropy_coeff: 0.0
          kl: 0.012477759039029479
          policy_loss: -0.09533894900232553
          total_loss: -0.025862810609396547
          vf_explained_var: 0.8399675488471985
          vf_loss: 0.05052554444409907
    num_agent_steps_sampled: 11308192
    num_steps_sampled: 11308192
    num_steps_trained: 11308192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1853,76203,11308192,1.81955,1.9836,-2,36.7798


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11312192
  custom_metrics: {}
  date: 2021-12-10_10-55-53
  done: false
  episode_len_mean: 36.03669724770642
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8578421992993137
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 232027
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8730280883610249
          entropy_coeff: 0.0
          kl: 0.013738821551669389
          policy_loss: -0.09965687489602715
          total_loss: -0.029853516665752977
          vf_explained_var: 0.7358877062797546
          vf_loss: 0.048937522107735276
    num_agent_steps_sampled: 11312192
    num_steps_sampled: 11312192
    num_steps_trained: 11312192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1854,76228.1,11312192,1.85784,1.9784,-2,36.0367


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11316192
  custom_metrics: {}
  date: 2021-12-10_10-56-18
  done: false
  episode_len_mean: 42.48
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.7993719947338105
  episode_reward_min: -2.0
  episodes_this_iter: 92
  episodes_total: 232119
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9479362927377224
          entropy_coeff: 0.0
          kl: 0.013989608851261437
          policy_loss: -0.09621244110167027
          total_loss: -0.023145344690419734
          vf_explained_var: 0.8035615682601929
          vf_loss: 0.05182037269696593
    num_agent_steps_sampled: 11316192
    num_steps_sampled: 11316192
    num_steps_trained: 11316192
  iterations_since_restore: 314
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1855,76253.1,11316192,1.79937,1.9808,-2,42.48


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11320192
  custom_metrics: {}
  date: 2021-12-10_10-56-43
  done: false
  episode_len_mean: 36.15596330275229
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.928143119593279
  episode_reward_min: 1.5291999578475952
  episodes_this_iter: 109
  episodes_total: 232228
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9174815937876701
          entropy_coeff: 0.0
          kl: 0.014830084925051779
          policy_loss: -0.1096367142163217
          total_loss: -0.05651752604171634
          vf_explained_var: 0.7911714315414429
          vf_loss: 0.03059599397238344
    num_agent_steps_sampled: 11320192
    num_steps_sampled: 11320192
    num_steps_trained: 11320192
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1856,76278.3,11320192,1.92814,1.9804,1.5292,36.156


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11324192
  custom_metrics: {}
  date: 2021-12-10_10-57-08
  done: false
  episode_len_mean: 38.46
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8462799978256226
  episode_reward_min: -2.0
  episodes_this_iter: 98
  episodes_total: 232326
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9136706218123436
          entropy_coeff: 0.0
          kl: 0.013439316360745579
          policy_loss: -0.09815971850184724
          total_loss: -0.03604747157078236
          vf_explained_var: 0.842960000038147
          vf_loss: 0.041701290057972074
    num_agent_steps_sampled: 11324192
    num_steps_sampled: 11324192
    num_steps_trained: 11324192
  iterations_since_restore: 316
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1857,76303.5,11324192,1.84628,1.9796,-2,38.46


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11328192
  custom_metrics: {}
  date: 2021-12-10_10-57-33
  done: false
  episode_len_mean: 39.31578947368421
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8199228048324585
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 232440
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8633064292371273
          entropy_coeff: 0.0
          kl: 0.013581716804765165
          policy_loss: -0.09556730388430879
          total_loss: -0.028583728475496173
          vf_explained_var: 0.7434695959091187
          vf_loss: 0.046356342267245054
    num_agent_steps_sampled: 11328192
    num_steps_sampled: 11328192
    num_steps_trained: 11328192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1858,76328.5,11328192,1.81992,1.98,-2,39.3158


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11332192
  custom_metrics: {}
  date: 2021-12-10_10-57-58
  done: false
  episode_len_mean: 31.247863247863247
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.937911100876637
  episode_reward_min: 1.7676000595092773
  episodes_this_iter: 117
  episodes_total: 232557
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8686951249837875
          entropy_coeff: 0.0
          kl: 0.014000335475429893
          policy_loss: -0.09606042655650526
          total_loss: -0.03169090661685914
          vf_explained_var: 0.6850428581237793
          vf_loss: 0.043106510769575834
    num_agent_steps_sampled: 11332192
    num_steps_sampled: 11332192
    num_steps_trained: 11332192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1859,76353.6,11332192,1.93791,1.9844,1.7676,31.2479


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11336192
  custom_metrics: {}
  date: 2021-12-10_10-58-24
  done: false
  episode_len_mean: 41.76
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.838076000213623
  episode_reward_min: -2.0
  episodes_this_iter: 94
  episodes_total: 232651
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.928826916962862
          entropy_coeff: 0.0
          kl: 0.012842270138207823
          policy_loss: -0.09371665399521589
          total_loss: -0.01980415591970086
          vf_explained_var: 0.7417469024658203
          vf_loss: 0.054408298106864095
    num_agent_steps_sampled: 11336192
    num_steps_sampled: 11336192
    num_steps_trained: 11336192
  iterations_since_restore: 319

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1860,76378.9,11336192,1.83808,1.98,-2,41.76


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11340192
  custom_metrics: {}
  date: 2021-12-10_10-58-49
  done: false
  episode_len_mean: 38.74074074074074
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.7792259222931333
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 232759
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8797766081988811
          entropy_coeff: 0.0
          kl: 0.012396429141517729
          policy_loss: -0.09178315266035497
          total_loss: -0.014196485979482532
          vf_explained_var: 0.7700737714767456
          vf_loss: 0.058759590378031135
    num_agent_steps_sampled: 11340192
    num_steps_sampled: 11340192
    num_steps_trained: 11340192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1861,76403.8,11340192,1.77923,1.98,-2,38.7407


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11344192
  custom_metrics: {}
  date: 2021-12-10_10-59-14
  done: false
  episode_len_mean: 33.578125
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9026156282052398
  episode_reward_min: -2.0
  episodes_this_iter: 128
  episodes_total: 232887
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.834121361374855
          entropy_coeff: 0.0
          kl: 0.013565008121076971
          policy_loss: -0.09238548122812063
          total_loss: -0.020714154801680706
          vf_explained_var: 0.599632978439331
          vf_loss: 0.051069475477561355
    num_agent_steps_sampled: 11344192
    num_steps_sampled: 11344192
    num_steps_trained: 11344192
  iterations_since_restore: 321

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1862,76429.2,11344192,1.90262,1.9816,-2,33.5781


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11348192
  custom_metrics: {}
  date: 2021-12-10_10-59-39
  done: false
  episode_len_mean: 36.99
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8871999967098236
  episode_reward_min: -2.0
  episodes_this_iter: 90
  episodes_total: 232977
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9621802121400833
          entropy_coeff: 0.0
          kl: 0.015388480154797435
          policy_loss: -0.11466609442140907
          total_loss: -0.059753018023911864
          vf_explained_var: 0.7411720156669617
          vf_loss: 0.0315418248064816
    num_agent_steps_sampled: 11348192
    num_steps_sampled: 11348192
    num_steps_trained: 11348192
  iterations_since_restore: 322

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1863,76454,11348192,1.8872,1.978,-2,36.99


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11352192
  custom_metrics: {}
  date: 2021-12-10_11-00-04
  done: false
  episode_len_mean: 41.25471698113208
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.9178565988000833
  episode_reward_min: 1.4507999420166016
  episodes_this_iter: 106
  episodes_total: 233083
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8899436667561531
          entropy_coeff: 0.0
          kl: 0.015529218188021332
          policy_loss: -0.11106215766631067
          total_loss: -0.05594125599600375
          vf_explained_var: 0.7042257785797119
          vf_loss: 0.03153590334113687
    num_agent_steps_sampled: 11352192
    num_steps_sampled: 11352192
    num_steps_trained: 11352192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1864,76479.1,11352192,1.91786,1.9772,1.4508,41.2547


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11356192
  custom_metrics: {}
  date: 2021-12-10_11-00-30
  done: false
  episode_len_mean: 40.54
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9192840027809144
  episode_reward_min: 1.597599983215332
  episodes_this_iter: 90
  episodes_total: 233173
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9226625449955463
          entropy_coeff: 0.0
          kl: 0.014972884207963943
          policy_loss: -0.10636961506679654
          total_loss: -0.04544121806975454
          vf_explained_var: 0.6591188907623291
          vf_loss: 0.03818833129480481
    num_agent_steps_sampled: 11356192
    num_steps_sampled: 11356192
    num_steps_trained: 11356192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1865,76504.5,11356192,1.91928,1.9796,1.5976,40.54


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11360192
  custom_metrics: {}
  date: 2021-12-10_11-00-55
  done: false
  episode_len_mean: 36.65137614678899
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.783115595852563
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 233282
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.888213150203228
          entropy_coeff: 0.0
          kl: 0.012083016277756542
          policy_loss: -0.08800156746292487
          total_loss: -0.017635342723224312
          vf_explained_var: 0.7756068706512451
          vf_loss: 0.05201514204964042
    num_agent_steps_sampled: 11360192
    num_steps_sampled: 11360192
    num_steps_trained: 11360192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1866,76529.8,11360192,1.78312,1.9836,-2,36.6514


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11364192
  custom_metrics: {}
  date: 2021-12-10_11-01-20
  done: false
  episode_len_mean: 37.77049180327869
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8680622997831127
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 233404
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8554462417960167
          entropy_coeff: 0.0
          kl: 0.01368514692876488
          policy_loss: -0.0975493413861841
          total_loss: -0.036961804144084454
          vf_explained_var: 0.7737463712692261
          vf_loss: 0.03980322112329304
    num_agent_steps_sampled: 11364192
    num_steps_sampled: 11364192
    num_steps_trained: 11364192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1867,76554.5,11364192,1.86806,1.9792,-2,37.7705


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11368192
  custom_metrics: {}
  date: 2021-12-10_11-01-44
  done: false
  episode_len_mean: 35.450980392156865
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9295254852257522
  episode_reward_min: 1.722000002861023
  episodes_this_iter: 102
  episodes_total: 233506
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8897073566913605
          entropy_coeff: 0.0
          kl: 0.01488253171555698
          policy_loss: -0.10771544516319409
          total_loss: -0.0490110776736401
          vf_explained_var: 0.693511962890625
          vf_loss: 0.03610152320470661
    num_agent_steps_sampled: 11368192
    num_steps_sampled: 11368192
    num_steps_trained: 11368192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1868,76579.3,11368192,1.92953,1.9816,1.722,35.451


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11372192
  custom_metrics: {}
  date: 2021-12-10_11-02-10
  done: false
  episode_len_mean: 33.14782608695652
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8356765187304953
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 233621
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9053203761577606
          entropy_coeff: 0.0
          kl: 0.013106499623972923
          policy_loss: -0.0935498327598907
          total_loss: -0.03384263039333746
          vf_explained_var: 0.841166615486145
          vf_loss: 0.03980170493014157
    num_agent_steps_sampled: 11372192
    num_steps_sampled: 11372192
    num_steps_trained: 11372192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1869,76604.5,11372192,1.83568,1.98,-2,33.1478


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11376192
  custom_metrics: {}
  date: 2021-12-10_11-02-35
  done: false
  episode_len_mean: 34.111111111111114
  episode_media: {}
  episode_reward_max: 1.9844000339508057
  episode_reward_mean: 1.9011968298563882
  episode_reward_min: -2.0
  episodes_this_iter: 126
  episodes_total: 233747
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8311662524938583
          entropy_coeff: 0.0
          kl: 0.012952348624821752
          policy_loss: -0.08951498847454786
          total_loss: -0.025789035484194756
          vf_explained_var: 0.6987680196762085
          vf_loss: 0.04405457607936114
    num_agent_steps_sampled: 11376192
    num_steps_sampled: 11376192
    num_steps_trained: 11376192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1870,76629.6,11376192,1.9012,1.9844,-2,34.1111


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11380192
  custom_metrics: {}
  date: 2021-12-10_11-03-00
  done: false
  episode_len_mean: 38.21495327102804
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8868074896179627
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 233854
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9005393832921982
          entropy_coeff: 0.0
          kl: 0.01364653604105115
          policy_loss: -0.10154331102967262
          total_loss: -0.03472037773462944
          vf_explained_var: 0.6242889165878296
          vf_loss: 0.046097255777567625
    num_agent_steps_sampled: 11380192
    num_steps_sampled: 11380192
    num_steps_trained: 11380192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1871,76654.4,11380192,1.88681,1.9792,-2,38.215


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11384192
  custom_metrics: {}
  date: 2021-12-10_11-03-25
  done: false
  episode_len_mean: 34.32773109243698
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.9318016777519418
  episode_reward_min: 1.5371999740600586
  episodes_this_iter: 119
  episodes_total: 233973
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8134754821658134
          entropy_coeff: 0.0
          kl: 0.014912181242834777
          policy_loss: -0.10435815894743428
          total_loss: -0.05114314646925777
          vf_explained_var: 0.6262861490249634
          vf_loss: 0.030567142879590392
    num_agent_steps_sampled: 11384192
    num_steps_sampled: 11384192
    num_steps_trained: 11384192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1872,76679.3,11384192,1.9318,1.9804,1.5372,34.3277


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11388192
  custom_metrics: {}
  date: 2021-12-10_11-03-50
  done: false
  episode_len_mean: 35.31818181818182
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.7884945500980725
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 234083
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8802765384316444
          entropy_coeff: 0.0
          kl: 0.012433799623977393
          policy_loss: -0.0876842177240178
          total_loss: -0.021545489551499486
          vf_explained_var: 0.7925775051116943
          vf_loss: 0.047254891716875136
    num_agent_steps_sampled: 11388192
    num_steps_sampled: 11388192
    num_steps_trained: 11388192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1873,76704.8,11388192,1.78849,1.9792,-2,35.3182


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11392192
  custom_metrics: {}
  date: 2021-12-10_11-04-16
  done: false
  episode_len_mean: 38.416666666666664
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8529814779758453
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 234191
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8865244649350643
          entropy_coeff: 0.0
          kl: 0.013484466238878667
          policy_loss: -0.09261854784563184
          total_loss: -0.02841828588861972
          vf_explained_var: 0.7958049774169922
          vf_loss: 0.04372073127888143
    num_agent_steps_sampled: 11392192
    num_steps_sampled: 11392192
    num_steps_trained: 11392192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1874,76730.7,11392192,1.85298,1.9792,-2,38.4167


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11396192
  custom_metrics: {}
  date: 2021-12-10_11-04-41
  done: false
  episode_len_mean: 32.467213114754095
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8705639331067194
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 234313
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.805128887295723
          entropy_coeff: 0.0
          kl: 0.012952178425621241
          policy_loss: -0.08508058229926974
          total_loss: -0.016966954339295626
          vf_explained_var: 0.7044904232025146
          vf_loss: 0.048442506697028875
    num_agent_steps_sampled: 11396192
    num_steps_sampled: 11396192
    num_steps_trained: 11396192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1875,76755.2,11396192,1.87056,1.9784,-2,32.4672


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11400192
  custom_metrics: {}
  date: 2021-12-10_11-05-05
  done: false
  episode_len_mean: 32.08527131782946
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.905506974966951
  episode_reward_min: -2.0
  episodes_this_iter: 129
  episodes_total: 234442
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8117873594164848
          entropy_coeff: 0.0
          kl: 0.01394548948155716
          policy_loss: -0.09290013392455876
          total_loss: -0.026383226097095758
          vf_explained_var: 0.5318782925605774
          vf_loss: 0.045337200397625566
    num_agent_steps_sampled: 11400192
    num_steps_sampled: 11400192
    num_steps_trained: 11400192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1876,76780,11400192,1.90551,1.978,-2,32.0853


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11404192
  custom_metrics: {}
  date: 2021-12-10_11-05-31
  done: false
  episode_len_mean: 36.304761904761904
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8529714300518945
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 234547
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8731505386531353
          entropy_coeff: 0.0
          kl: 0.013432020554319024
          policy_loss: -0.08921890682540834
          total_loss: -0.02792169278836809
          vf_explained_var: 0.7253087759017944
          vf_loss: 0.0408973308512941
    num_agent_steps_sampled: 11404192
    num_steps_sampled: 11404192
    num_steps_trained: 11404192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1877,76805.4,11404192,1.85297,1.978,-2,36.3048


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11408192
  custom_metrics: {}
  date: 2021-12-10_11-05-55
  done: false
  episode_len_mean: 39.49
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.8829440033435823
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 234646
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8944817446172237
          entropy_coeff: 0.0
          kl: 0.013669913168996572
          policy_loss: -0.09730098012369126
          total_loss: -0.04428920865757391
          vf_explained_var: 0.7404162883758545
          vf_loss: 0.03225059085525572
    num_agent_steps_sampled: 11408192
    num_steps_sampled: 11408192
    num_steps_trained: 11408192
  iterations_since_restore: 337

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1878,76829.9,11408192,1.88294,1.9772,-2,39.49


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11412192
  custom_metrics: {}
  date: 2021-12-10_11-06-20
  done: false
  episode_len_mean: 35.0
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8944909128275784
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 234756
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8584842085838318
          entropy_coeff: 0.0
          kl: 0.014332952443510294
          policy_loss: -0.09306710367673077
          total_loss: -0.03144106036052108
          vf_explained_var: 0.6212907433509827
          vf_loss: 0.03985787206329405
    num_agent_steps_sampled: 11412192
    num_steps_sampled: 11412192
    num_steps_trained: 11412192
  iterations_since_restore: 338

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1879,76854.6,11412192,1.89449,1.9804,-2,35


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11416192
  custom_metrics: {}
  date: 2021-12-10_11-06-46
  done: false
  episode_len_mean: 33.951219512195124
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8681495945628097
  episode_reward_min: -2.0
  episodes_this_iter: 123
  episodes_total: 234879
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9075735844671726
          entropy_coeff: 0.0
          kl: 0.012647559924516827
          policy_loss: -0.09036887588445097
          total_loss: -0.017754962580511346
          vf_explained_var: 0.7100182771682739
          vf_loss: 0.05340543435886502
    num_agent_steps_sampled: 11416192
    num_steps_sampled: 11416192
    num_steps_trained: 11416192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1880,76880.5,11416192,1.86815,1.9816,-2,33.9512


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11420192
  custom_metrics: {}
  date: 2021-12-10_11-07-11
  done: false
  episode_len_mean: 35.628571428571426
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8189676193963913
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 234984
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9036610908806324
          entropy_coeff: 0.0
          kl: 0.01244558661710471
          policy_loss: -0.09463970747310668
          total_loss: -0.031425408786162734
          vf_explained_var: 0.7827839851379395
          vf_loss: 0.04431256267707795
    num_agent_steps_sampled: 11420192
    num_steps_sampled: 11420192
    num_steps_trained: 11420192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1881,76905.2,11420192,1.81897,1.9784,-2,35.6286


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11424192
  custom_metrics: {}
  date: 2021-12-10_11-07-36
  done: false
  episode_len_mean: 43.41346153846154
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8398769211310606
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 235088
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8625246956944466
          entropy_coeff: 0.0
          kl: 0.013294065895024687
          policy_loss: -0.09219506921363063
          total_loss: -0.02218526427168399
          vf_explained_var: 0.6923058032989502
          vf_loss: 0.04981944081373513
    num_agent_steps_sampled: 11424192
    num_steps_sampled: 11424192
    num_steps_trained: 11424192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1882,76930.1,11424192,1.83988,1.9796,-2,43.4135


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11428192
  custom_metrics: {}
  date: 2021-12-10_11-08-00
  done: false
  episode_len_mean: 34.96330275229358
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.7883192596085575
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 235197
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8649508841335773
          entropy_coeff: 0.0
          kl: 0.012561033829115331
          policy_loss: -0.08955248282290995
          total_loss: -0.0027470014756545424
          vf_explained_var: 0.7080428600311279
          vf_loss: 0.06772841326892376
    num_agent_steps_sampled: 11428192
    num_steps_sampled: 11428192
    num_steps_trained: 11428192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1883,76954.7,11428192,1.78832,1.9808,-2,34.9633


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11432192
  custom_metrics: {}
  date: 2021-12-10_11-08-25
  done: false
  episode_len_mean: 37.627272727272725
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8564981861547991
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 235307
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8374954909086227
          entropy_coeff: 0.0
          kl: 0.01330985315144062
          policy_loss: -0.09562697022920474
          total_loss: -0.03762053942773491
          vf_explained_var: 0.7788674831390381
          vf_loss: 0.03779209335334599
    num_agent_steps_sampled: 11432192
    num_steps_sampled: 11432192
    num_steps_trained: 11432192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1884,76979.1,11432192,1.8565,1.978,-2,37.6273


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11436192
  custom_metrics: {}
  date: 2021-12-10_11-08-49
  done: false
  episode_len_mean: 32.114285714285714
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8986285663786389
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 235412
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8953706361353397
          entropy_coeff: 0.0
          kl: 0.014062180882319808
          policy_loss: -0.1008951966650784
          total_loss: -0.03540325741050765
          vf_explained_var: 0.6922434568405151
          vf_loss: 0.04413500404916704
    num_agent_steps_sampled: 11436192
    num_steps_sampled: 11436192
    num_steps_trained: 11436192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1885,77003.6,11436192,1.89863,1.984,-2,32.1143


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11440192
  custom_metrics: {}
  date: 2021-12-10_11-09-15
  done: false
  episode_len_mean: 38.13461538461539
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.7735115335537837
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 235516
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8522675521671772
          entropy_coeff: 0.0
          kl: 0.012413669261150062
          policy_loss: -0.08731515659019351
          total_loss: -0.012074400336132385
          vf_explained_var: 0.79057776927948
          vf_loss: 0.056387493619695306
    num_agent_steps_sampled: 11440192
    num_steps_sampled: 11440192
    num_steps_trained: 11440192

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1886,77029,11440192,1.77351,1.98,-2,38.1346


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11444192
  custom_metrics: {}
  date: 2021-12-10_11-09-41
  done: false
  episode_len_mean: 37.624
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8945824012756347
  episode_reward_min: -2.0
  episodes_this_iter: 125
  episodes_total: 235641
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.829086422920227
          entropy_coeff: 0.0
          kl: 0.014090472017414868
          policy_loss: -0.10101234470494092
          total_loss: -0.037713272438850254
          vf_explained_var: 0.7130212783813477
          vf_loss: 0.0418991653714329
    num_agent_steps_sampled: 11444192
    num_steps_sampled: 11444192
    num_steps_trained: 11444192
  iterations_since_restore: 346
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1887,77054.7,11444192,1.89458,1.9816,-2,37.624


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11448192
  custom_metrics: {}
  date: 2021-12-10_11-10-06
  done: false
  episode_len_mean: 31.486486486486488
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8328504476461325
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 235752
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8796081654727459
          entropy_coeff: 0.0
          kl: 0.01257803081534803
          policy_loss: -0.09064130822662264
          total_loss: -0.025740339304320514
          vf_explained_var: 0.8208694458007812
          vf_loss: 0.04579808877315372
    num_agent_steps_sampled: 11448192
    num_steps_sampled: 11448192
    num_steps_trained: 11448192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1888,77080,11448192,1.83285,1.98,-2,31.4865


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11452192
  custom_metrics: {}
  date: 2021-12-10_11-10-32
  done: false
  episode_len_mean: 39.883495145631066
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8472271891473566
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 235855
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9255894534289837
          entropy_coeff: 0.0
          kl: 0.013945584301836789
          policy_loss: -0.10054208285873756
          total_loss: -0.0332682170701446
          vf_explained_var: 0.7854247093200684
          vf_loss: 0.046094011748209596
    num_agent_steps_sampled: 11452192
    num_steps_sampled: 11452192
    num_steps_trained: 11452192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1889,77105.7,11452192,1.84723,1.9788,-2,39.8835


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11456192
  custom_metrics: {}
  date: 2021-12-10_11-10-57
  done: false
  episode_len_mean: 35.87068965517241
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9286275834872806
  episode_reward_min: 1.6131999492645264
  episodes_this_iter: 116
  episodes_total: 235971
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8630097098648548
          entropy_coeff: 0.0
          kl: 0.014725454151630402
          policy_loss: -0.10943633923307061
          total_loss: -0.044497052615042776
          vf_explained_var: 0.6664369702339172
          vf_loss: 0.04257500567473471
    num_agent_steps_sampled: 11456192
    num_steps_sampled: 11456192
    num_steps_trained: 11456192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1890,77131.6,11456192,1.92863,1.984,1.6132,35.8707


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11460192
  custom_metrics: {}
  date: 2021-12-10_11-11-22
  done: false
  episode_len_mean: 39.78217821782178
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8052910896811154
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 236072
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8575499877333641
          entropy_coeff: 0.0
          kl: 0.012325309740845114
          policy_loss: -0.08851228188723326
          total_loss: -0.011849718517623842
          vf_explained_var: 0.7003806829452515
          vf_loss: 0.057943498250097036
    num_agent_steps_sampled: 11460192
    num_steps_sampled: 11460192
    num_steps_trained: 11460192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1891,77156.5,11460192,1.80529,1.9816,-2,39.7822


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11464192
  custom_metrics: {}
  date: 2021-12-10_11-11-47
  done: false
  episode_len_mean: 32.588235294117645
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8367932784457166
  episode_reward_min: -2.0
  episodes_this_iter: 119
  episodes_total: 236191
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8714400976896286
          entropy_coeff: 0.0
          kl: 0.013522319379262626
          policy_loss: -0.09564411011524498
          total_loss: -0.017170229402836412
          vf_explained_var: 0.7511278390884399
          vf_loss: 0.05793685920070857
    num_agent_steps_sampled: 11464192
    num_steps_sampled: 11464192
    num_steps_trained: 11464192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1892,77181.5,11464192,1.83679,1.984,-2,32.5882


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11468192
  custom_metrics: {}
  date: 2021-12-10_11-12-13
  done: false
  episode_len_mean: 38.87619047619047
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8873676141103108
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 236296
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8638788871467113
          entropy_coeff: 0.0
          kl: 0.0144558030879125
          policy_loss: -0.10340168769471347
          total_loss: -0.03922247956506908
          vf_explained_var: 0.7598626613616943
          vf_loss: 0.042224458418786526
    num_agent_steps_sampled: 11468192
    num_steps_sampled: 11468192
    num_steps_trained: 11468192
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1893,77207,11468192,1.88737,1.9796,-2,38.8762


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11472192
  custom_metrics: {}
  date: 2021-12-10_11-12-37
  done: false
  episode_len_mean: 34.86725663716814
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.864088499440556
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 236409
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8599101901054382
          entropy_coeff: 0.0
          kl: 0.011998395144473761
          policy_loss: -0.08406503961305134
          total_loss: -0.014747068344149739
          vf_explained_var: 0.7665733695030212
          vf_loss: 0.05109541048295796
    num_agent_steps_sampled: 11472192
    num_steps_sampled: 11472192
    num_steps_trained: 11472192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1894,77231.3,11472192,1.86409,1.9788,-2,34.8673


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11476192
  custom_metrics: {}
  date: 2021-12-10_11-13-02
  done: false
  episode_len_mean: 38.523809523809526
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8176190501167662
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 236514
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8813041336834431
          entropy_coeff: 0.0
          kl: 0.013084356090985239
          policy_loss: -0.09006656834390014
          total_loss: -0.015966342412866652
          vf_explained_var: 0.7827670574188232
          vf_loss: 0.05422836192883551
    num_agent_steps_sampled: 11476192
    num_steps_sampled: 11476192
    num_steps_trained: 11476192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1895,77256.4,11476192,1.81762,1.978,-2,38.5238


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11480192
  custom_metrics: {}
  date: 2021-12-10_11-13-28
  done: false
  episode_len_mean: 37.36283185840708
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9257380529842545
  episode_reward_min: 1.4747999906539917
  episodes_this_iter: 113
  episodes_total: 236627
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8500332571566105
          entropy_coeff: 0.0
          kl: 0.01474094926379621
          policy_loss: -0.10355027572950348
          total_loss: -0.037965110706863925
          vf_explained_var: 0.665104329586029
          vf_loss: 0.043197352439165115
    num_agent_steps_sampled: 11480192
    num_steps_sampled: 11480192
    num_steps_trained: 11480192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1896,77282,11480192,1.92574,1.9792,1.4748,37.3628


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11484192
  custom_metrics: {}
  date: 2021-12-10_11-13-54
  done: false
  episode_len_mean: 34.592592592592595
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8958222203784518
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 236735
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8855858445167542
          entropy_coeff: 0.0
          kl: 0.01412334170890972
          policy_loss: -0.1020410907221958
          total_loss: -0.038342924643075094
          vf_explained_var: 0.7056901454925537
          vf_loss: 0.042248345678672194
    num_agent_steps_sampled: 11484192
    num_steps_sampled: 11484192
    num_steps_trained: 11484192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1897,77307.5,11484192,1.89582,1.984,-2,34.5926


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11488192
  custom_metrics: {}
  date: 2021-12-10_11-14-19
  done: false
  episode_len_mean: 35.921052631578945
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.7926631548948455
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 236849
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.843886986374855
          entropy_coeff: 0.0
          kl: 0.012209076550789177
          policy_loss: -0.09012134041404352
          total_loss: -0.016638895409414545
          vf_explained_var: 0.7769346237182617
          vf_loss: 0.05493991309776902
    num_agent_steps_sampled: 11488192
    num_steps_sampled: 11488192
    num_steps_trained: 11488192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1898,77332.5,11488192,1.79266,1.9784,-2,35.9211


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11492192
  custom_metrics: {}
  date: 2021-12-10_11-14-44
  done: false
  episode_len_mean: 35.47787610619469
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.8610584071252199
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 236962
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8324912935495377
          entropy_coeff: 0.0
          kl: 0.013203725218772888
          policy_loss: -0.08860649331472814
          total_loss: -0.005638727860059589
          vf_explained_var: 0.6037479043006897
          vf_loss: 0.06291461107321084
    num_agent_steps_sampled: 11492192
    num_steps_sampled: 11492192
    num_steps_trained: 11492192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1899,77357.6,11492192,1.86106,1.9772,-2,35.4779


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11496192
  custom_metrics: {}
  date: 2021-12-10_11-15-09
  done: false
  episode_len_mean: 34.49586776859504
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.899666113301742
  episode_reward_min: -2.0
  episodes_this_iter: 121
  episodes_total: 237083
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.84704764559865
          entropy_coeff: 0.0
          kl: 0.012960669759195298
          policy_loss: -0.09801066829822958
          total_loss: -0.027224055083934218
          vf_explained_var: 0.6353108882904053
          vf_loss: 0.05110259517095983
    num_agent_steps_sampled: 11496192
    num_steps_sampled: 11496192
    num_steps_trained: 11496192
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1900,77382.6,11496192,1.89967,1.9808,-2,34.4959


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11500192
  custom_metrics: {}
  date: 2021-12-10_11-15-34
  done: false
  episode_len_mean: 36.25
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.889008003473282
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 237183
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8509836085140705
          entropy_coeff: 0.0
          kl: 0.014064681075979024
          policy_loss: -0.10094636830035597
          total_loss: -0.04054698662366718
          vf_explained_var: 0.6775922179222107
          vf_loss: 0.03903864941094071
    num_agent_steps_sampled: 11500192
    num_steps_sampled: 11500192
    num_steps_trained: 11500192
  iterations_since_restore: 360
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1901,77407.8,11500192,1.88901,1.9796,-2,36.25


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11504192
  custom_metrics: {}
  date: 2021-12-10_11-15-59
  done: false
  episode_len_mean: 38.75438596491228
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8894982484349034
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 237297
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8437937088310719
          entropy_coeff: 0.0
          kl: 0.014648074866272509
          policy_loss: -0.10153071535751224
          total_loss: -0.0465274965390563
          vf_explained_var: 0.7926206588745117
          vf_loss: 0.03275645663961768
    num_agent_steps_sampled: 11504192
    num_steps_sampled: 11504192
    num_steps_trained: 11504192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1902,77432.6,11504192,1.8895,1.98,-2,38.7544


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11508192
  custom_metrics: {}
  date: 2021-12-10_11-16-24
  done: false
  episode_len_mean: 35.0
  episode_media: {}
  episode_reward_max: 1.9780000448226929
  episode_reward_mean: 1.8577871563237742
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 237406
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8636906072497368
          entropy_coeff: 0.0
          kl: 0.014053966326173395
          policy_loss: -0.09799740370362997
          total_loss: -0.028168625314719975
          vf_explained_var: 0.6040031909942627
          vf_loss: 0.04848431749269366
    num_agent_steps_sampled: 11508192
    num_steps_sampled: 11508192
    num_steps_trained: 11508192
  iterations_since_restore: 362
  node_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1903,77457.6,11508192,1.85779,1.978,-2,35


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11512192
  custom_metrics: {}
  date: 2021-12-10_11-16-49
  done: false
  episode_len_mean: 39.95049504950495
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.8813702989332746
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 237507
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8211635090410709
          entropy_coeff: 0.0
          kl: 0.013887078268453479
          policy_loss: -0.0964067513414193
          total_loss: -0.034863255568780005
          vf_explained_var: 0.6201351284980774
          vf_loss: 0.04045250092167407
    num_agent_steps_sampled: 11512192
    num_steps_sampled: 11512192
    num_steps_trained: 11512192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1904,77482.5,11512192,1.88137,1.9768,-2,39.9505


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11516192
  custom_metrics: {}
  date: 2021-12-10_11-17-14
  done: false
  episode_len_mean: 37.689320388349515
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.8500893254881923
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 237610
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8753093257546425
          entropy_coeff: 0.0
          kl: 0.013957318966276944
          policy_loss: -0.09681675978936255
          total_loss: -0.041262945742346346
          vf_explained_var: 0.742391049861908
          vf_loss: 0.03435613680630922
    num_agent_steps_sampled: 11516192
    num_steps_sampled: 11516192
    num_steps_trained: 11516192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1905,77507.2,11516192,1.85009,1.9828,-2,37.6893


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11520192
  custom_metrics: {}
  date: 2021-12-10_11-17-40
  done: false
  episode_len_mean: 34.17213114754098
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.7096196715949012
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 237732
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8604328781366348
          entropy_coeff: 0.0
          kl: 0.011453344661276788
          policy_loss: -0.08279690274503082
          total_loss: -0.0036350652226246893
          vf_explained_var: 0.8140010833740234
          vf_loss: 0.061767072416841984
    num_agent_steps_sampled: 11520192
    num_steps_sampled: 11520192
    num_steps_trained: 11520192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1906,77533.2,11520192,1.70962,1.9828,-2,34.1721


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11524192
  custom_metrics: {}
  date: 2021-12-10_11-18-06
  done: false
  episode_len_mean: 34.45132743362832
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9315433607692212
  episode_reward_min: 1.6339999437332153
  episodes_this_iter: 113
  episodes_total: 237845
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8327620103955269
          entropy_coeff: 0.0
          kl: 0.01372479897690937
          policy_loss: -0.09793603152502328
          total_loss: -0.029660244355909526
          vf_explained_var: 0.7200004458427429
          vf_loss: 0.04743125336244702
    num_agent_steps_sampled: 11524192
    num_steps_sampled: 11524192
    num_steps_trained: 11524192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1907,77559.4,11524192,1.93154,1.9796,1.634,34.4513


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11528192
  custom_metrics: {}
  date: 2021-12-10_11-18-32
  done: false
  episode_len_mean: 33.225
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.9339899927377702
  episode_reward_min: 1.5135999917984009
  episodes_this_iter: 120
  episodes_total: 237965
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8381757289171219
          entropy_coeff: 0.0
          kl: 0.013901271508075297
          policy_loss: -0.09631534706568345
          total_loss: -0.04097720893332735
          vf_explained_var: 0.7081504464149475
          vf_loss: 0.03422558447346091
    num_agent_steps_sampled: 11528192
    num_steps_sampled: 11528192
    num_steps_trained: 11528192
  iterations_since_resto

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1908,77585.3,11528192,1.93399,1.984,1.5136,33.225


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11532192
  custom_metrics: {}
  date: 2021-12-10_11-18-58
  done: false
  episode_len_mean: 34.660714285714285
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8268071466258593
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 238077
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8253318034112453
          entropy_coeff: 0.0
          kl: 0.012695158016867936
          policy_loss: -0.0897128094220534
          total_loss: -0.0014266799262259156
          vf_explained_var: 0.6459944248199463
          vf_loss: 0.069005356868729
    num_agent_steps_sampled: 11532192
    num_steps_sampled: 11532192
    num_steps_trained: 11532192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1909,77611.4,11532192,1.82681,1.9856,-2,34.6607


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11536192
  custom_metrics: {}
  date: 2021-12-10_11-19-24
  done: false
  episode_len_mean: 39.388888888888886
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8136666704107214
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 238185
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8453296422958374
          entropy_coeff: 0.0
          kl: 0.013240732834674418
          policy_loss: -0.0973817203193903
          total_loss: -0.036327663518022746
          vf_explained_var: 0.8007259368896484
          vf_loss: 0.04094469593837857
    num_agent_steps_sampled: 11536192
    num_steps_sampled: 11536192
    num_steps_trained: 11536192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1910,77637.5,11536192,1.81367,1.9836,-2,39.3889


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11540192
  custom_metrics: {}
  date: 2021-12-10_11-19-49
  done: false
  episode_len_mean: 34.3
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.9317781871015376
  episode_reward_min: 1.7323999404907227
  episodes_this_iter: 110
  episodes_total: 238295
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8423013053834438
          entropy_coeff: 0.0
          kl: 0.014435176097322255
          policy_loss: -0.0978011169936508
          total_loss: -0.04267595644341782
          vf_explained_var: 0.65648353099823
          vf_loss: 0.03320173779502511
    num_agent_steps_sampled: 11540192
    num_steps_sampled: 11540192
    num_steps_trained: 11540192
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1911,77662.8,11540192,1.93178,1.9836,1.7324,34.3


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11544192
  custom_metrics: {}
  date: 2021-12-10_11-20-15
  done: false
  episode_len_mean: 34.82608695652174
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9308208703994751
  episode_reward_min: 1.6507999897003174
  episodes_this_iter: 115
  episodes_total: 238410
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8875059299170971
          entropy_coeff: 0.0
          kl: 0.015380653843749315
          policy_loss: -0.10827940818853676
          total_loss: -0.050272085587494075
          vf_explained_var: 0.6964209079742432
          vf_loss: 0.03464795439504087
    num_agent_steps_sampled: 11544192
    num_steps_sampled: 11544192
    num_steps_trained: 11544192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1912,77688.2,11544192,1.93082,1.9812,1.6508,34.8261


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11548192
  custom_metrics: {}
  date: 2021-12-10_11-20-40
  done: false
  episode_len_mean: 36.14159292035398
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9280601834828874
  episode_reward_min: 1.5839999914169312
  episodes_this_iter: 113
  episodes_total: 238523
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8466738164424896
          entropy_coeff: 0.0
          kl: 0.015017376106698066
          policy_loss: -0.10978652804624289
          total_loss: -0.060377080750185996
          vf_explained_var: 0.6926627159118652
          vf_loss: 0.02660180546808988
    num_agent_steps_sampled: 11548192
    num_steps_sampled: 11548192
    num_steps_trained: 11548192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1913,77713.7,11548192,1.92806,1.9828,1.584,36.1416


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11552192
  custom_metrics: {}
  date: 2021-12-10_11-21-06
  done: false
  episode_len_mean: 37.10476190476191
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.851340955779666
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 238628
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8690639436244965
          entropy_coeff: 0.0
          kl: 0.012848649814259261
          policy_loss: -0.08822516805958003
          total_loss: -0.012012210325337946
          vf_explained_var: 0.6255800127983093
          vf_loss: 0.05669907317496836
    num_agent_steps_sampled: 11552192
    num_steps_sampled: 11552192
    num_steps_trained: 11552192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1914,77739.1,11552192,1.85134,1.9848,-2,37.1048


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11556192
  custom_metrics: {}
  date: 2021-12-10_11-21-31
  done: false
  episode_len_mean: 37.79245283018868
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8877018847555485
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 238734
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8894986398518085
          entropy_coeff: 0.0
          kl: 0.014381179062183946
          policy_loss: -0.10281346598640084
          total_loss: -0.04643188465706771
          vf_explained_var: 0.7783356308937073
          vf_loss: 0.03454016847535968
    num_agent_steps_sampled: 11556192
    num_steps_sampled: 11556192
    num_steps_trained: 11556192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1915,77764.2,11556192,1.8877,1.9836,-2,37.7925


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11560192
  custom_metrics: {}
  date: 2021-12-10_11-21-56
  done: false
  episode_len_mean: 36.70754716981132
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.9269886792830702
  episode_reward_min: 1.3240000009536743
  episodes_this_iter: 106
  episodes_total: 238840
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9066345915198326
          entropy_coeff: 0.0
          kl: 0.015824581612832844
          policy_loss: -0.10935696528758854
          total_loss: -0.053859225066844374
          vf_explained_var: 0.7033198475837708
          vf_loss: 0.03146415716037154
    num_agent_steps_sampled: 11560192
    num_steps_sampled: 11560192
    num_steps_trained: 11560192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1916,77789.3,11560192,1.92699,1.9812,1.324,36.7075


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11564192
  custom_metrics: {}
  date: 2021-12-10_11-22-21
  done: false
  episode_len_mean: 39.38095238095238
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8512266635894776
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 238945
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9048791229724884
          entropy_coeff: 0.0
          kl: 0.013021457649301738
          policy_loss: -0.09285080875270069
          total_loss: -0.019541048677638173
          vf_explained_var: 0.6968058347702026
          vf_loss: 0.05353342241141945
    num_agent_steps_sampled: 11564192
    num_steps_sampled: 11564192
    num_steps_trained: 11564192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1917,77814.5,11564192,1.85123,1.9856,-2,39.381


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11568192
  custom_metrics: {}
  date: 2021-12-10_11-22-46
  done: false
  episode_len_mean: 35.14912280701754
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8956245633593776
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 239059
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8439806215465069
          entropy_coeff: 0.0
          kl: 0.013253187818918377
          policy_loss: -0.09438931103795767
          total_loss: -0.027598414046224207
          vf_explained_var: 0.638924241065979
          vf_loss: 0.04666261840611696
    num_agent_steps_sampled: 11568192
    num_steps_sampled: 11568192
    num_steps_trained: 11568192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1918,77839.4,11568192,1.89562,1.984,-2,35.1491


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11572192
  custom_metrics: {}
  date: 2021-12-10_11-23-11
  done: false
  episode_len_mean: 37.366336633663366
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.8491841590050424
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 239160
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8971683494746685
          entropy_coeff: 0.0
          kl: 0.013278594182338566
          policy_loss: -0.09459593566134572
          total_loss: -0.02388214919483289
          vf_explained_var: 0.736020028591156
          vf_loss: 0.05054691876284778
    num_agent_steps_sampled: 11572192
    num_steps_sampled: 11572192
    num_steps_trained: 11572192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1919,77864.4,11572192,1.84918,1.9856,-2,37.3663


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11576192
  custom_metrics: {}
  date: 2021-12-10_11-23-36
  done: false
  episode_len_mean: 36.61344537815126
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8296806682057742
  episode_reward_min: -2.0
  episodes_this_iter: 119
  episodes_total: 239279
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.900835569947958
          entropy_coeff: 0.0
          kl: 0.012208126951009035
          policy_loss: -0.08659725519828498
          total_loss: -0.013748804980423301
          vf_explained_var: 0.7632747888565063
          vf_loss: 0.054307357873767614
    num_agent_steps_sampled: 11576192
    num_steps_sampled: 11576192
    num_steps_trained: 11576192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1920,77889.4,11576192,1.82968,1.984,-2,36.6134


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11580192
  custom_metrics: {}
  date: 2021-12-10_11-24-01
  done: false
  episode_len_mean: 27.367521367521366
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.945593160441798
  episode_reward_min: 1.7344000339508057
  episodes_this_iter: 117
  episodes_total: 239396
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8891491927206516
          entropy_coeff: 0.0
          kl: 0.01393530989298597
          policy_loss: -0.10348956729285419
          total_loss: -0.047917257936205715
          vf_explained_var: 0.7703640460968018
          vf_loss: 0.03440805815625936
    num_agent_steps_sampled: 11580192
    num_steps_sampled: 11580192
    num_steps_trained: 11580192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1921,77914.6,11580192,1.94559,1.9836,1.7344,27.3675


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11584192
  custom_metrics: {}
  date: 2021-12-10_11-24-26
  done: false
  episode_len_mean: 41.213592233009706
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8418679596150962
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 239499
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8862843699753284
          entropy_coeff: 0.0
          kl: 0.013686512946151197
          policy_loss: -0.0985071377363056
          total_loss: -0.027538564638234675
          vf_explained_var: 0.7667695879936218
          vf_loss: 0.05018217908218503
    num_agent_steps_sampled: 11584192
    num_steps_sampled: 11584192
    num_steps_trained: 11584192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1922,77939.6,11584192,1.84187,1.9808,-2,41.2136


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11588192
  custom_metrics: {}
  date: 2021-12-10_11-24-52
  done: false
  episode_len_mean: 40.08
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9202200031280519
  episode_reward_min: 1.4556000232696533
  episodes_this_iter: 92
  episodes_total: 239591
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.939892552793026
          entropy_coeff: 0.0
          kl: 0.014790182991418988
          policy_loss: -0.10766386846080422
          total_loss: -0.050900320522487164
          vf_explained_var: 0.7565551400184631
          vf_loss: 0.03430095547810197
    num_agent_steps_sampled: 11588192
    num_steps_sampled: 11588192
    num_steps_trained: 11588192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1923,77965.4,11588192,1.92022,1.9828,1.4556,40.08


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11592192
  custom_metrics: {}
  date: 2021-12-10_11-25-19
  done: false
  episode_len_mean: 46.09090909090909
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8761890964074568
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 239701
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8310074433684349
          entropy_coeff: 0.0
          kl: 0.013518464576918632
          policy_loss: -0.09939835182740353
          total_loss: -0.04210744402371347
          vf_explained_var: 0.7247704267501831
          vf_loss: 0.03675973869394511
    num_agent_steps_sampled: 11592192
    num_steps_sampled: 11592192
    num_steps_trained: 11592192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1924,77991.7,11592192,1.87619,1.9788,-2,46.0909


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11596192
  custom_metrics: {}
  date: 2021-12-10_11-25-45
  done: false
  episode_len_mean: 32.78761061946903
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8664460192739436
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 239814
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8534552045166492
          entropy_coeff: 0.0
          kl: 0.013781686953734607
          policy_loss: -0.09743553772568703
          total_loss: -0.0326846455282066
          vf_explained_var: 0.7787330746650696
          vf_loss: 0.04381995287258178
    num_agent_steps_sampled: 11596192
    num_steps_sampled: 11596192
    num_steps_trained: 11596192
  iterations_since_restore: 3

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1925,78017.9,11596192,1.86645,1.9796,-2,32.7876


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11600192
  custom_metrics: {}
  date: 2021-12-10_11-26-11
  done: false
  episode_len_mean: 35.02459016393443
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.9303442650153988
  episode_reward_min: 1.6303999423980713
  episodes_this_iter: 122
  episodes_total: 239936
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.814452949911356
          entropy_coeff: 0.0
          kl: 0.014374713937286288
          policy_loss: -0.09954014985123649
          total_loss: -0.0445159602095373
          vf_explained_var: 0.6674718856811523
          vf_loss: 0.033192592789418995
    num_agent_steps_sampled: 11600192
    num_steps_sampled: 11600192
    num_steps_trained: 11600192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1926,78044.3,11600192,1.93034,1.9856,1.6304,35.0246


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11604192
  custom_metrics: {}
  date: 2021-12-10_11-26-37
  done: false
  episode_len_mean: 32.44166666666667
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8701066662867865
  episode_reward_min: -2.0
  episodes_this_iter: 120
  episodes_total: 240056
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8258445039391518
          entropy_coeff: 0.0
          kl: 0.01222332421457395
          policy_loss: -0.08532968547660857
          total_loss: -0.03105335298459977
          vf_explained_var: 0.7525466680526733
          vf_loss: 0.03571215667761862
    num_agent_steps_sampled: 11604192
    num_steps_sampled: 11604192
    num_steps_trained: 11604192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1927,78070.3,11604192,1.87011,1.982,-2,32.4417


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11608192
  custom_metrics: {}
  date: 2021-12-10_11-27-02
  done: false
  episode_len_mean: 35.716814159292035
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8610053094087449
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 240169
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8054088540375233
          entropy_coeff: 0.0
          kl: 0.013502070563845336
          policy_loss: -0.09447846072725952
          total_loss: -0.03029686742229387
          vf_explained_var: 0.7542188167572021
          vf_loss: 0.043675322784110904
    num_agent_steps_sampled: 11608192
    num_steps_sampled: 11608192
    num_steps_trained: 11608192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1928,78095.4,11608192,1.86101,1.9808,-2,35.7168


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11612192
  custom_metrics: {}
  date: 2021-12-10_11-27-27
  done: false
  episode_len_mean: 39.1
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8828159999847411
  episode_reward_min: -2.0
  episodes_this_iter: 93
  episodes_total: 240262
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8874615617096424
          entropy_coeff: 0.0
          kl: 0.014398929197341204
          policy_loss: -0.10207382158841938
          total_loss: -0.04113792988937348
          vf_explained_var: 0.7054991722106934
          vf_loss: 0.039067517151124775
    num_agent_steps_sampled: 11612192
    num_steps_sampled: 11612192
    num_steps_trained: 11612192
  iterations_since_restore: 388
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1929,78120.4,11612192,1.88282,1.9788,-2,39.1


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11616192
  custom_metrics: {}
  date: 2021-12-10_11-27-53
  done: false
  episode_len_mean: 34.653225806451616
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.7729612915746626
  episode_reward_min: -2.0
  episodes_this_iter: 124
  episodes_total: 240386
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8453703969717026
          entropy_coeff: 0.0
          kl: 0.012726507848128676
          policy_loss: -0.09060952463187277
          total_loss: -0.023952378804096952
          vf_explained_var: 0.7904117703437805
          vf_loss: 0.047328762244433165
    num_agent_steps_sampled: 11616192
    num_steps_sampled: 11616192
    num_steps_trained: 11616192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1930,78145.6,11616192,1.77296,1.9828,-2,34.6532


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11620192
  custom_metrics: {}
  date: 2021-12-10_11-28-18
  done: false
  episode_len_mean: 36.48543689320388
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8510679608409843
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 240489
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8779152818024158
          entropy_coeff: 0.0
          kl: 0.01339954906143248
          policy_loss: -0.0947404254693538
          total_loss: -0.02328360383398831
          vf_explained_var: 0.7922025322914124
          vf_loss: 0.05110624968074262
    num_agent_steps_sampled: 11620192
    num_steps_sampled: 11620192
    num_steps_trained: 11620192
  iterations_since_restore: 39

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1931,78170.7,11620192,1.85107,1.9816,-2,36.4854


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11624192
  custom_metrics: {}
  date: 2021-12-10_11-28-43
  done: false
  episode_len_mean: 32.7463768115942
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8505391297133074
  episode_reward_min: -2.0
  episodes_this_iter: 138
  episodes_total: 240627
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8031783774495125
          entropy_coeff: 0.0
          kl: 0.012150839669629931
          policy_loss: -0.08642237656749785
          total_loss: -0.011447174823842943
          vf_explained_var: 0.7030172348022461
          vf_loss: 0.05652110977098346
    num_agent_steps_sampled: 11624192
    num_steps_sampled: 11624192
    num_steps_trained: 11624192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1932,78195.8,11624192,1.85054,1.9808,-2,32.7464


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11628192
  custom_metrics: {}
  date: 2021-12-10_11-29-08
  done: false
  episode_len_mean: 32.4537037037037
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.9355222196490676
  episode_reward_min: 1.7259999513626099
  episodes_this_iter: 108
  episodes_total: 240735
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8342987969517708
          entropy_coeff: 0.0
          kl: 0.014332782418932766
          policy_loss: -0.09651210397714749
          total_loss: -0.02666284766746685
          vf_explained_var: 0.6357565522193909
          vf_loss: 0.0480813467875123
    num_agent_steps_sampled: 11628192
    num_steps_sampled: 11628192
    num_steps_trained: 11628192
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1933,78221.2,11628192,1.93552,1.9808,1.726,32.4537


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11632192
  custom_metrics: {}
  date: 2021-12-10_11-29-34
  done: false
  episode_len_mean: 35.62068965517241
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8949689649302384
  episode_reward_min: -2.0
  episodes_this_iter: 116
  episodes_total: 240851
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8461539298295975
          entropy_coeff: 0.0
          kl: 0.013538131141103804
          policy_loss: -0.09629908896749839
          total_loss: -0.029993793461471796
          vf_explained_var: 0.6231368780136108
          vf_loss: 0.045744262053631246
    num_agent_steps_sampled: 11632192
    num_steps_sampled: 11632192
    num_steps_trained: 11632192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1934,78246.5,11632192,1.89497,1.9792,-2,35.6207


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11636192
  custom_metrics: {}
  date: 2021-12-10_11-29-59
  done: false
  episode_len_mean: 35.90350877192982
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.861091228953579
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 240965
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8405557796359062
          entropy_coeff: 0.0
          kl: 0.01314715895568952
          policy_loss: -0.09385782654862851
          total_loss: -0.037379072746261954
          vf_explained_var: 0.7967177629470825
          vf_loss: 0.03651150898076594
    num_agent_steps_sampled: 11636192
    num_steps_sampled: 11636192
    num_steps_trained: 11636192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1935,78271.5,11636192,1.86109,1.9788,-2,35.9035


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11640192
  custom_metrics: {}
  date: 2021-12-10_11-30-24
  done: false
  episode_len_mean: 33.16814159292036
  episode_media: {}
  episode_reward_max: 1.9855999946594238
  episode_reward_mean: 1.864778768699781
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 241078
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8435962833464146
          entropy_coeff: 0.0
          kl: 0.012747790664434433
          policy_loss: -0.08690131080220453
          total_loss: -0.025056601967662573
          vf_explained_var: 0.7672089338302612
          vf_loss: 0.042484001722186804
    num_agent_steps_sampled: 11640192
    num_steps_sampled: 11640192
    num_steps_trained: 11640192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1936,78296.2,11640192,1.86478,1.9856,-2,33.1681


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11644192
  custom_metrics: {}
  date: 2021-12-10_11-30-49
  done: false
  episode_len_mean: 41.179245283018865
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8808226382957314
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 241184
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8346298485994339
          entropy_coeff: 0.0
          kl: 0.013664369645994157
          policy_loss: -0.09776788609451614
          total_loss: -0.03849107981659472
          vf_explained_var: 0.638236403465271
          vf_loss: 0.0385240453761071
    num_agent_steps_sampled: 11644192
    num_steps_sampled: 11644192
    num_steps_trained: 11644192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1937,78321.6,11644192,1.88082,1.98,-2,41.1792


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11648192
  custom_metrics: {}
  date: 2021-12-10_11-31-15
  done: false
  episode_len_mean: 32.43362831858407
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.798353985347579
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 241297
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8735726214945316
          entropy_coeff: 0.0
          kl: 0.01182092382805422
          policy_loss: -0.08693999017123133
          total_loss: -0.0066130464256275445
          vf_explained_var: 0.728223443031311
          vf_loss: 0.06237391568720341
    num_agent_steps_sampled: 11648192
    num_steps_sampled: 11648192
    num_steps_trained: 11648192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1938,78347.5,11648192,1.79835,1.98,-2,32.4336


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11652192
  custom_metrics: {}
  date: 2021-12-10_11-31-40
  done: false
  episode_len_mean: 36.898148148148145
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8538703719774883
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 241405
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8266141563653946
          entropy_coeff: 0.0
          kl: 0.012596562039107084
          policy_loss: -0.09192807087674737
          total_loss: -0.015003603009972721
          vf_explained_var: 0.720396876335144
          vf_loss: 0.05779343796893954
    num_agent_steps_sampled: 11652192
    num_steps_sampled: 11652192
    num_steps_trained: 11652192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1939,78372.1,11652192,1.85387,1.9804,-2,36.8981


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11656192
  custom_metrics: {}
  date: 2021-12-10_11-32-04
  done: false
  episode_len_mean: 35.608333333333334
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8988466660181682
  episode_reward_min: -2.0
  episodes_this_iter: 120
  episodes_total: 241525
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8353894278407097
          entropy_coeff: 0.0
          kl: 0.013460280606523156
          policy_loss: -0.09755943156778812
          total_loss: -0.034987122839083895
          vf_explained_var: 0.6996175050735474
          vf_loss: 0.042129511130042374
    num_agent_steps_sampled: 11656192
    num_steps_sampled: 11656192
    num_steps_trained: 11656192
  iterations_since_resto

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1940,78397.1,11656192,1.89885,1.9812,-2,35.6083


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11660192
  custom_metrics: {}
  date: 2021-12-10_11-32-30
  done: false
  episode_len_mean: 31.975409836065573
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9363672088404171
  episode_reward_min: 1.663599967956543
  episodes_this_iter: 122
  episodes_total: 241647
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8193654231727123
          entropy_coeff: 0.0
          kl: 0.014328259218018502
          policy_loss: -0.10424028569832444
          total_loss: -0.04148720216471702
          vf_explained_var: 0.6253616809844971
          vf_loss: 0.040992043912410736
    num_agent_steps_sampled: 11660192
    num_steps_sampled: 11660192
    num_steps_trained: 11660192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1941,78422.6,11660192,1.93637,1.982,1.6636,31.9754


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11664192
  custom_metrics: {}
  date: 2021-12-10_11-32-56
  done: false
  episode_len_mean: 32.741379310344826
  episode_media: {}
  episode_reward_max: 1.9767999649047852
  episode_reward_mean: 1.9016758614572986
  episode_reward_min: -2.0
  episodes_this_iter: 116
  episodes_total: 241763
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8790197968482971
          entropy_coeff: 0.0
          kl: 0.015126830257941037
          policy_loss: -0.10921749123372138
          total_loss: -0.058115175110287964
          vf_explained_var: 0.7616643905639648
          vf_loss: 0.028128441656008363
    num_agent_steps_sampled: 11664192
    num_steps_sampled: 11664192
    num_steps_trained: 11664192
  iterations_since_resto

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1942,78448.1,11664192,1.90168,1.9768,-2,32.7414


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11668192
  custom_metrics: {}
  date: 2021-12-10_11-33-22
  done: false
  episode_len_mean: 43.29
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.6482240045070649
  episode_reward_min: -2.0
  episodes_this_iter: 89
  episodes_total: 241852
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9641727358102798
          entropy_coeff: 0.0
          kl: 0.011516538623254746
          policy_loss: -0.08856201299931854
          total_loss: -0.009003151208162308
          vf_explained_var: 0.816753625869751
          vf_loss: 0.06206811754964292
    num_agent_steps_sampled: 11668192
    num_steps_sampled: 11668192
    num_steps_trained: 11668192
  iterations_since_restore: 402
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1943,78474,11668192,1.64822,1.9784,-2,43.29


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11672192
  custom_metrics: {}
  date: 2021-12-10_11-33-47
  done: false
  episode_len_mean: 42.23
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8382200014591217
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 241952
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8603732734918594
          entropy_coeff: 0.0
          kl: 0.013730405014939606
          policy_loss: -0.10113463958259672
          total_loss: -0.02912779338657856
          vf_explained_var: 0.7896655201911926
          vf_loss: 0.05115379486232996
    num_agent_steps_sampled: 11672192
    num_steps_sampled: 11672192
    num_steps_trained: 11672192
  iterations_since_restore: 403
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1944,78499.3,11672192,1.83822,1.9816,-2,42.23


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11676192
  custom_metrics: {}
  date: 2021-12-10_11-34-13
  done: false
  episode_len_mean: 31.486725663716815
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.9027327436261472
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 242065
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8744665756821632
          entropy_coeff: 0.0
          kl: 0.013473351835273206
          policy_loss: -0.10143090540077537
          total_loss: -0.02933701453730464
          vf_explained_var: 0.7687572240829468
          vf_loss: 0.051631242386065423
    num_agent_steps_sampled: 11676192
    num_steps_sampled: 11676192
    num_steps_trained: 11676192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1945,78525.5,11676192,1.90273,1.9828,-2,31.4867


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11680192
  custom_metrics: {}
  date: 2021-12-10_11-34-38
  done: false
  episode_len_mean: 39.08
  episode_media: {}
  episode_reward_max: 1.9783999919891357
  episode_reward_mean: 1.8840760028362273
  episode_reward_min: -2.0
  episodes_this_iter: 87
  episodes_total: 242152
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9436460696160793
          entropy_coeff: 0.0
          kl: 0.013721221475861967
          policy_loss: -0.1061088350834325
          total_loss: -0.04306558953248896
          vf_explained_var: 0.8222157955169678
          vf_loss: 0.042204140685498714
    num_agent_steps_sampled: 11680192
    num_steps_sampled: 11680192
    num_steps_trained: 11680192
  iterations_since_restore: 405
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1946,78550.9,11680192,1.88408,1.9784,-2,39.08


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11684192
  custom_metrics: {}
  date: 2021-12-10_11-35-04
  done: false
  episode_len_mean: 37.53846153846154
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8515730786782045
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 242256
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9090113192796707
          entropy_coeff: 0.0
          kl: 0.012496378447394818
          policy_loss: -0.0921040594112128
          total_loss: -0.020107631862629205
          vf_explained_var: 0.8107597827911377
          vf_loss: 0.05301754979882389
    num_agent_steps_sampled: 11684192
    num_steps_sampled: 11684192
    num_steps_trained: 11684192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1947,78576.5,11684192,1.85157,1.9836,-2,37.5385


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11688192
  custom_metrics: {}
  date: 2021-12-10_11-35-30
  done: false
  episode_len_mean: 37.77450980392157
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.769898035362655
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 242358
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.894443791359663
          entropy_coeff: 0.0
          kl: 0.012308963981922716
          policy_loss: -0.09202267989167012
          total_loss: -0.011977436442975886
          vf_explained_var: 0.7855377793312073
          vf_loss: 0.06135100265964866
    num_agent_steps_sampled: 11688192
    num_steps_sampled: 11688192
    num_steps_trained: 11688192
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1948,78602,11688192,1.7699,1.9832,-2,37.7745


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11692192
  custom_metrics: {}
  date: 2021-12-10_11-35-54
  done: false
  episode_len_mean: 47.21
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8269879949092864
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 242458
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9108349941670895
          entropy_coeff: 0.0
          kl: 0.013006330409552902
          policy_loss: -0.09730374999344349
          total_loss: -0.0184281381953042
          vf_explained_var: 0.7566013336181641
          vf_loss: 0.05912224855273962
    num_agent_steps_sampled: 11692192
    num_steps_sampled: 11692192
    num_steps_trained: 11692192
  iterations_since_restore: 408
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1949,78626.7,11692192,1.82699,1.9832,-2,47.21


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11696192
  custom_metrics: {}
  date: 2021-12-10_11-36-21
  done: false
  episode_len_mean: 34.57943925233645
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8940785020311302
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 242565
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.871790636330843
          entropy_coeff: 0.0
          kl: 0.013458211615215987
          policy_loss: -0.09852297394536436
          total_loss: -0.018184368032962084
          vf_explained_var: 0.7164939045906067
          vf_loss: 0.05989895109087229
    num_agent_steps_sampled: 11696192
    num_steps_sampled: 11696192
    num_steps_trained: 11696192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1950,78653.2,11696192,1.89408,1.9796,-2,34.5794


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11700192
  custom_metrics: {}
  date: 2021-12-10_11-36-47
  done: false
  episode_len_mean: 40.32673267326733
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.883211880627245
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 242666
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9193167686462402
          entropy_coeff: 0.0
          kl: 0.014593803673051298
          policy_loss: -0.10705084400251508
          total_loss: -0.03084075090009719
          vf_explained_var: 0.7465353012084961
          vf_loss: 0.054045750526711345
    num_agent_steps_sampled: 11700192
    num_steps_sampled: 11700192
    num_steps_trained: 11700192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1951,78679.1,11700192,1.88321,1.982,-2,40.3267


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11704192
  custom_metrics: {}
  date: 2021-12-10_11-37-12
  done: false
  episode_len_mean: 38.375
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8917178564838
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 242778
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8501314967870712
          entropy_coeff: 0.0
          kl: 0.013589660578873008
          policy_loss: -0.10045347874984145
          total_loss: -0.039238983270479366
          vf_explained_var: 0.7608597278594971
          vf_loss: 0.04057520069181919
    num_agent_steps_sampled: 11704192
    num_steps_sampled: 11704192
    num_steps_trained: 11704192
  iterations_since_restore: 411
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1952,78704.5,11704192,1.89172,1.982,-2,38.375


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11708192
  custom_metrics: {}
  date: 2021-12-10_11-37-38
  done: false
  episode_len_mean: 37.53508771929825
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.891217549641927
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 242892
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8733620084822178
          entropy_coeff: 0.0
          kl: 0.013559479149989784
          policy_loss: -0.09746789297787473
          total_loss: -0.03737811092287302
          vf_explained_var: 0.7520565390586853
          vf_loss: 0.039496326353400946
    num_agent_steps_sampled: 11708192
    num_steps_sampled: 11708192
    num_steps_trained: 11708192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1953,78730.2,11708192,1.89122,1.9816,-2,37.5351


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11712192
  custom_metrics: {}
  date: 2021-12-10_11-38-04
  done: false
  episode_len_mean: 40.49
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9194079983234404
  episode_reward_min: 1.4579999446868896
  episodes_this_iter: 93
  episodes_total: 242985
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8766363933682442
          entropy_coeff: 0.0
          kl: 0.01500457024667412
          policy_loss: -0.10955095069948584
          total_loss: -0.05146484047872946
          vf_explained_var: 0.6821335554122925
          vf_loss: 0.03529791999608278
    num_agent_steps_sampled: 11712192
    num_steps_sampled: 11712192
    num_steps_trained: 11712192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1954,78755.9,11712192,1.91941,1.9796,1.458,40.49


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11716192
  custom_metrics: {}
  date: 2021-12-10_11-38-29
  done: false
  episode_len_mean: 37.28695652173913
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8240834806276405
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 243100
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8734138421714306
          entropy_coeff: 0.0
          kl: 0.013307946734130383
          policy_loss: -0.09313193766865879
          total_loss: -0.021541479232837446
          vf_explained_var: 0.7686225771903992
          vf_loss: 0.05137901229318231
    num_agent_steps_sampled: 11716192
    num_steps_sampled: 11716192
    num_steps_trained: 11716192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1955,78781.6,11716192,1.82408,1.9792,-2,37.287


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11720192
  custom_metrics: {}
  date: 2021-12-10_11-38-55
  done: false
  episode_len_mean: 38.47
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.8055160009860993
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 243199
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8813470676541328
          entropy_coeff: 0.0
          kl: 0.011735437263268977
          policy_loss: -0.08920339122414589
          total_loss: -0.014598308538552374
          vf_explained_var: 0.7063661217689514
          vf_loss: 0.05678188521414995
    num_agent_steps_sampled: 11720192
    num_steps_sampled: 11720192
    num_steps_trained: 11720192
  iterations_since_restore: 415
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1956,78807.3,11720192,1.80552,1.9796,-2,38.47


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11724192
  custom_metrics: {}
  date: 2021-12-10_11-39-20
  done: false
  episode_len_mean: 38.46078431372549
  episode_media: {}
  episode_reward_max: 1.9775999784469604
  episode_reward_mean: 1.884772543813668
  episode_reward_min: -2.0
  episodes_this_iter: 102
  episodes_total: 243301
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9049177318811417
          entropy_coeff: 0.0
          kl: 0.013369705528020859
          policy_loss: -0.09623746675788425
          total_loss: -0.025370809016749263
          vf_explained_var: 0.6778831481933594
          vf_loss: 0.05056142155081034
    num_agent_steps_sampled: 11724192
    num_steps_sampled: 11724192
    num_steps_trained: 11724192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1957,78832.5,11724192,1.88477,1.9776,-2,38.4608


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11728192
  custom_metrics: {}
  date: 2021-12-10_11-39-46
  done: false
  episode_len_mean: 34.2972972972973
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8614630656199411
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 243412
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9113551191985607
          entropy_coeff: 0.0
          kl: 0.012944420392159373
          policy_loss: -0.09474846336524934
          total_loss: -0.03790495969587937
          vf_explained_var: 0.8044815063476562
          vf_loss: 0.0371841675369069
    num_agent_steps_sampled: 11728192
    num_steps_sampled: 11728192
    num_steps_trained: 11728192
  iterations_since_restore: 41

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1958,78858.4,11728192,1.86146,1.9832,-2,34.2973


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11732192
  custom_metrics: {}
  date: 2021-12-10_11-40-12
  done: false
  episode_len_mean: 39.61538461538461
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8459961460186884
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 243516
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8622802719473839
          entropy_coeff: 0.0
          kl: 0.013149321370292455
          policy_loss: -0.09260475973132998
          total_loss: -0.01715866755694151
          vf_explained_var: 0.7091213464736938
          vf_loss: 0.055475559551268816
    num_agent_steps_sampled: 11732192
    num_steps_sampled: 11732192
    num_steps_trained: 11732192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1959,78884.3,11732192,1.846,1.98,-2,39.6154


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11736192
  custom_metrics: {}
  date: 2021-12-10_11-40-38
  done: false
  episode_len_mean: 35.032786885245905
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8673016331234917
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 243638
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.812785942107439
          entropy_coeff: 0.0
          kl: 0.012638902466278523
          policy_loss: -0.09264568041544408
          total_loss: -0.025358724873512983
          vf_explained_var: 0.70755535364151
          vf_loss: 0.04809162486344576
    num_agent_steps_sampled: 11736192
    num_steps_sampled: 11736192
    num_steps_trained: 11736192
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1960,78910.3,11736192,1.8673,1.9792,-2,35.0328


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11740192
  custom_metrics: {}
  date: 2021-12-10_11-41-04
  done: false
  episode_len_mean: 37.18
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8880560040473937
  episode_reward_min: -2.0
  episodes_this_iter: 97
  episodes_total: 243735
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8769999630749226
          entropy_coeff: 0.0
          kl: 0.014074865321163088
          policy_loss: -0.10039463656721637
          total_loss: -0.03546500619268045
          vf_explained_var: 0.7655290365219116
          vf_loss: 0.04355342825874686
    num_agent_steps_sampled: 11740192
    num_steps_sampled: 11740192
    num_steps_trained: 11740192
  iterations_since_restore: 420
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1961,78936.3,11740192,1.88806,1.9816,-2,37.18


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11744192
  custom_metrics: {}
  date: 2021-12-10_11-41-30
  done: false
  episode_len_mean: 40.924528301886795
  episode_media: {}
  episode_reward_max: 1.982800006866455
  episode_reward_mean: 1.881581133266665
  episode_reward_min: -2.0
  episodes_this_iter: 106
  episodes_total: 243841
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8494097702205181
          entropy_coeff: 0.0
          kl: 0.014094353304244578
          policy_loss: -0.09682739642448723
          total_loss: -0.03356366866501048
          vf_explained_var: 0.7219505906105042
          vf_loss: 0.041857929434627295
    num_agent_steps_sampled: 11744192
    num_steps_sampled: 11744192
    num_steps_trained: 11744192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1962,78962.4,11744192,1.88158,1.9828,-2,40.9245


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11748192
  custom_metrics: {}
  date: 2021-12-10_11-41-57
  done: false
  episode_len_mean: 35.598290598290596
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.7608854678960948
  episode_reward_min: -2.0
  episodes_this_iter: 117
  episodes_total: 243958
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8461425676941872
          entropy_coeff: 0.0
          kl: 0.011856057681143284
          policy_loss: -0.08289290138054639
          total_loss: -0.003361987415701151
          vf_explained_var: 0.7379442453384399
          vf_loss: 0.06152452970854938
    num_agent_steps_sampled: 11748192
    num_steps_sampled: 11748192
    num_steps_trained: 11748192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1963,78988.4,11748192,1.76089,1.9832,-2,35.5983


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11752192
  custom_metrics: {}
  date: 2021-12-10_11-42-23
  done: false
  episode_len_mean: 38.416666666666664
  episode_media: {}
  episode_reward_max: 1.9803999662399292
  episode_reward_mean: 1.8503888834405828
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 244066
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8637830354273319
          entropy_coeff: 0.0
          kl: 0.012676570157054812
          policy_loss: -0.09120232483837754
          total_loss: -0.016827011786517687
          vf_explained_var: 0.6771830320358276
          vf_loss: 0.05512277199886739
    num_agent_steps_sampled: 11752192
    num_steps_sampled: 11752192
    num_steps_trained: 11752192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1964,79014.4,11752192,1.85039,1.9804,-2,38.4167


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11756192
  custom_metrics: {}
  date: 2021-12-10_11-42-48
  done: false
  episode_len_mean: 35.44761904761905
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8944000005722046
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 244171
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8967011570930481
          entropy_coeff: 0.0
          kl: 0.014754654141142964
          policy_loss: -0.10622503503691405
          total_loss: -0.04162953089689836
          vf_explained_var: 0.7561066746711731
          vf_loss: 0.042186872102320194
    num_agent_steps_sampled: 11756192
    num_steps_sampled: 11756192
    num_steps_trained: 11756192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1965,79040,11756192,1.8944,1.98,-2,35.4476


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11760192
  custom_metrics: {}
  date: 2021-12-10_11-43-13
  done: false
  episode_len_mean: 36.62
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.8875560057163239
  episode_reward_min: -2.0
  episodes_this_iter: 100
  episodes_total: 244271
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9236013144254684
          entropy_coeff: 0.0
          kl: 0.013741498754825443
          policy_loss: -0.10234929202124476
          total_loss: -0.04012421268271282
          vf_explained_var: 0.7945267558097839
          vf_loss: 0.0413551788078621
    num_agent_steps_sampled: 11760192
    num_steps_sampled: 11760192
    num_steps_trained: 11760192
  iterations_since_restore: 425
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1966,79064.8,11760192,1.88756,1.9792,-2,36.62


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11764192
  custom_metrics: {}
  date: 2021-12-10_11-43-38
  done: false
  episode_len_mean: 37.74311926605505
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.892913765863541
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 244380
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8959571942687035
          entropy_coeff: 0.0
          kl: 0.012904982548207045
          policy_loss: -0.09488006751053035
          total_loss: -0.04066187376156449
          vf_explained_var: 0.7831233143806458
          vf_loss: 0.034618753008544445
    num_agent_steps_sampled: 11764192
    num_steps_sampled: 11764192
    num_steps_trained: 11764192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1967,79089.9,11764192,1.89291,1.9812,-2,37.7431


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11768192
  custom_metrics: {}
  date: 2021-12-10_11-44-03
  done: false
  episode_len_mean: 36.627272727272725
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8568836353041909
  episode_reward_min: -2.0
  episodes_this_iter: 110
  episodes_total: 244490
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8388654552400112
          entropy_coeff: 0.0
          kl: 0.012999128201045096
          policy_loss: -0.0911819173488766
          total_loss: -0.027914166392292827
          vf_explained_var: 0.7133165597915649
          vf_loss: 0.043525329674594104
    num_agent_steps_sampled: 11768192
    num_steps_sampled: 11768192
    num_steps_trained: 11768192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1968,79115,11768192,1.85688,1.9824,-2,36.6273


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11772192
  custom_metrics: {}
  date: 2021-12-10_11-44-28
  done: false
  episode_len_mean: 34.91304347826087
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.896473042861275
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 244605
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8006400614976883
          entropy_coeff: 0.0
          kl: 0.0148208899772726
          policy_loss: -0.09765892941504717
          total_loss: -0.031071961973793805
          vf_explained_var: 0.6906082630157471
          vf_loss: 0.044077740400098264
    num_agent_steps_sampled: 11772192
    num_steps_sampled: 11772192
    num_steps_trained: 11772192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1969,79139.8,11772192,1.89647,1.982,-2,34.913


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11776192
  custom_metrics: {}
  date: 2021-12-10_11-44-54
  done: false
  episode_len_mean: 35.31481481481482
  episode_media: {}
  episode_reward_max: 1.9772000312805176
  episode_reward_mean: 1.820807409507257
  episode_reward_min: -2.0
  episodes_this_iter: 108
  episodes_total: 244713
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8822687603533268
          entropy_coeff: 0.0
          kl: 0.012608593853656203
          policy_loss: -0.09126084932358935
          total_loss: -0.02540637823403813
          vf_explained_var: 0.8011201620101929
          vf_loss: 0.04670516971964389
    num_agent_steps_sampled: 11776192
    num_steps_sampled: 11776192
    num_steps_trained: 11776192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1970,79165.3,11776192,1.82081,1.9772,-2,35.3148


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11780192
  custom_metrics: {}
  date: 2021-12-10_11-45-19
  done: false
  episode_len_mean: 39.76699029126213
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8837009703071372
  episode_reward_min: -2.0
  episodes_this_iter: 103
  episodes_total: 244816
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8760522119700909
          entropy_coeff: 0.0
          kl: 0.013679783092811704
          policy_loss: -0.10009238263592124
          total_loss: -0.04462536162463948
          vf_explained_var: 0.7607392072677612
          vf_loss: 0.03469085192773491
    num_agent_steps_sampled: 11780192
    num_steps_sampled: 11780192
    num_steps_trained: 11780192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1971,79191.1,11780192,1.8837,1.9816,-2,39.767


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11784192
  custom_metrics: {}
  date: 2021-12-10_11-45-44
  done: false
  episode_len_mean: 35.81147540983606
  episode_media: {}
  episode_reward_max: 1.979599952697754
  episode_reward_mean: 1.9287409889893454
  episode_reward_min: 1.2583999633789062
  episodes_this_iter: 122
  episodes_total: 244938
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.787653673440218
          entropy_coeff: 0.0
          kl: 0.015359016659203917
          policy_loss: -0.1042133424198255
          total_loss: -0.050048223325575236
          vf_explained_var: 0.567689061164856
          vf_loss: 0.03083861607592553
    num_agent_steps_sampled: 11784192
    num_steps_sampled: 11784192
    num_steps_trained: 11784192
  iterations_sin

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1972,79215.5,11784192,1.92874,1.9796,1.2584,35.8115


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11788192
  custom_metrics: {}
  date: 2021-12-10_11-46-09
  done: false
  episode_len_mean: 33.50833333333333
  episode_media: {}
  episode_reward_max: 1.979200005531311
  episode_reward_mean: 1.9334199994802475
  episode_reward_min: 1.714400053024292
  episodes_this_iter: 120
  episodes_total: 245058
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8394153416156769
          entropy_coeff: 0.0
          kl: 0.015425195335410535
          policy_loss: -0.10996485617943108
          total_loss: -0.056378655368462205
          vf_explained_var: 0.6367894411087036
          vf_loss: 0.03015918133314699
    num_agent_steps_sampled: 11788192
    num_steps_sampled: 11788192
    num_steps_trained: 11788192
  iterations_s

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1973,79240.4,11788192,1.93342,1.9792,1.7144,33.5083


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11792192
  custom_metrics: {}
  date: 2021-12-10_11-46-34
  done: false
  episode_len_mean: 35.735849056603776
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9289698162168827
  episode_reward_min: 1.5908000469207764
  episodes_this_iter: 106
  episodes_total: 245164
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8648854717612267
          entropy_coeff: 0.0
          kl: 0.015286502428352833
          policy_loss: -0.11011071142274886
          total_loss: -0.05585179501213133
          vf_explained_var: 0.6591100692749023
          vf_loss: 0.031042546848766506
    num_agent_steps_sampled: 11792192
    num_steps_sampled: 11792192
    num_steps_trained: 11792192
  iterations

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1974,79265.5,11792192,1.92897,1.9816,1.5908,35.7358


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11796192
  custom_metrics: {}
  date: 2021-12-10_11-46-58
  done: false
  episode_len_mean: 32.70866141732284
  episode_media: {}
  episode_reward_max: 1.9788000583648682
  episode_reward_mean: 1.8119905492452186
  episode_reward_min: -2.0
  episodes_this_iter: 127
  episodes_total: 245291
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8235518261790276
          entropy_coeff: 0.0
          kl: 0.01123509427998215
          policy_loss: -0.07979302678722888
          total_loss: -0.00568104068224784
          vf_explained_var: 0.7079112529754639
          vf_loss: 0.05704868887551129
    num_agent_steps_sampled: 11796192
    num_steps_sampled: 11796192
    num_steps_trained: 11796192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1975,79289.9,11796192,1.81199,1.9788,-2,32.7087


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11800192
  custom_metrics: {}
  date: 2021-12-10_11-47-23
  done: false
  episode_len_mean: 33.98360655737705
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.870019676255398
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 245413
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8118643313646317
          entropy_coeff: 0.0
          kl: 0.012337323045358062
          policy_loss: -0.08743431000038981
          total_loss: -0.03181010572006926
          vf_explained_var: 0.748587965965271
          vf_loss: 0.03688689472619444
    num_agent_steps_sampled: 11800192
    num_steps_sampled: 11800192
    num_steps_trained: 11800192
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1976,79314.4,11800192,1.87002,1.98,-2,33.9836


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11804192
  custom_metrics: {}
  date: 2021-12-10_11-47-48
  done: false
  episode_len_mean: 29.470588235294116
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9083025225070345
  episode_reward_min: -2.0
  episodes_this_iter: 119
  episodes_total: 245532
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8608514554798603
          entropy_coeff: 0.0
          kl: 0.013805451802909374
          policy_loss: -0.09868744912091643
          total_loss: -0.04220261229784228
          vf_explained_var: 0.771528959274292
          vf_loss: 0.035517805139534175
    num_agent_steps_sampled: 11804192
    num_steps_sampled: 11804192
    num_steps_trained: 11804192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1977,79339.2,11804192,1.9083,1.982,-2,29.4706


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11808192
  custom_metrics: {}
  date: 2021-12-10_11-48-13
  done: false
  episode_len_mean: 37.92920353982301
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.9245345138870509
  episode_reward_min: 1.291200041770935
  episodes_this_iter: 113
  episodes_total: 245645
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8338025808334351
          entropy_coeff: 0.0
          kl: 0.01441017776960507
          policy_loss: -0.10260192956775427
          total_loss: -0.0448021802585572
          vf_explained_var: 0.7539808750152588
          vf_loss: 0.03591429442167282
    num_agent_steps_sampled: 11808192
    num_steps_sampled: 11808192
    num_steps_trained: 11808192
  iterations_sinc

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1978,79364.5,11808192,1.92453,1.9816,1.2912,37.9292


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11812192
  custom_metrics: {}
  date: 2021-12-10_11-48-38
  done: false
  episode_len_mean: 36.95238095238095
  episode_media: {}
  episode_reward_max: 1.981600046157837
  episode_reward_mean: 1.8893028554462252
  episode_reward_min: -2.0
  episodes_this_iter: 105
  episodes_total: 245750
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8742672204971313
          entropy_coeff: 0.0
          kl: 0.015410274383611977
          policy_loss: -0.10876352025661618
          total_loss: -0.05503502604551613
          vf_explained_var: 0.8183420300483704
          vf_loss: 0.03032413637265563
    num_agent_steps_sampled: 11812192
    num_steps_sampled: 11812192
    num_steps_trained: 11812192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1979,79389.1,11812192,1.8893,1.9816,-2,36.9524


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11816192
  custom_metrics: {}
  date: 2021-12-10_11-49-02
  done: false
  episode_len_mean: 35.142857142857146
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.8637714279549462
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 245862
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8348048366606236
          entropy_coeff: 0.0
          kl: 0.013813081895932555
          policy_loss: -0.09729803289519623
          total_loss: -0.03611958434339613
          vf_explained_var: 0.777450442314148
          vf_loss: 0.04019983054604381
    num_agent_steps_sampled: 11816192
    num_steps_sampled: 11816192
    num_steps_trained: 11816192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1980,79413.8,11816192,1.86377,1.984,-2,35.1429


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11820192
  custom_metrics: {}
  date: 2021-12-10_11-49-28
  done: false
  episode_len_mean: 39.56
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8052999997138977
  episode_reward_min: -2.0
  episodes_this_iter: 93
  episodes_total: 245955
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9428557604551315
          entropy_coeff: 0.0
          kl: 0.01392976735951379
          policy_loss: -0.10259502311237156
          total_loss: -0.02552324184216559
          vf_explained_var: 0.7761681079864502
          vf_loss: 0.05591595219448209
    num_agent_steps_sampled: 11820192
    num_steps_sampled: 11820192
    num_steps_trained: 11820192
  iterations_since_restore: 440
  node_ip:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1981,79439.2,11820192,1.8053,1.9832,-2,39.56


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11824192
  custom_metrics: {}
  date: 2021-12-10_11-49-53
  done: false
  episode_len_mean: 37.28695652173913
  episode_media: {}
  episode_reward_max: 1.9800000190734863
  episode_reward_mean: 1.8251582612162052
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 246070
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8648034334182739
          entropy_coeff: 0.0
          kl: 0.012330060126259923
          policy_loss: -0.09119419270427898
          total_loss: -0.01778863527579233
          vf_explained_var: 0.767644464969635
          vf_loss: 0.05467927874997258
    num_agent_steps_sampled: 11824192
    num_steps_sampled: 11824192
    num_steps_trained: 11824192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1982,79464.8,11824192,1.82516,1.98,-2,37.287


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11828192
  custom_metrics: {}
  date: 2021-12-10_11-50-19
  done: false
  episode_len_mean: 34.24166666666667
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8664466629425684
  episode_reward_min: -2.0
  episodes_this_iter: 120
  episodes_total: 246190
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8610823638737202
          entropy_coeff: 0.0
          kl: 0.012228474603034556
          policy_loss: -0.08931253862101585
          total_loss: -0.0174202723428607
          vf_explained_var: 0.7214224338531494
          vf_loss: 0.053320275037549436
    num_agent_steps_sampled: 11828192
    num_steps_sampled: 11828192
    num_steps_trained: 11828192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1983,79490.6,11828192,1.86645,1.9824,-2,34.2417


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11832192
  custom_metrics: {}
  date: 2021-12-10_11-50-45
  done: false
  episode_len_mean: 36.05309734513274
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.893125660651553
  episode_reward_min: -2.0
  episodes_this_iter: 113
  episodes_total: 246303
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8446708284318447
          entropy_coeff: 0.0
          kl: 0.013905981846619397
          policy_loss: -0.09853417798876762
          total_loss: -0.03482661419548094
          vf_explained_var: 0.6689346432685852
          vf_loss: 0.04258784977719188
    num_agent_steps_sampled: 11832192
    num_steps_sampled: 11832192
    num_steps_trained: 11832192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1984,79515.7,11832192,1.89313,1.9824,-2,36.0531


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11836192
  custom_metrics: {}
  date: 2021-12-10_11-51-10
  done: false
  episode_len_mean: 31.0
  episode_media: {}
  episode_reward_max: 1.983199954032898
  episode_reward_mean: 1.8708521749662317
  episode_reward_min: -2.0
  episodes_this_iter: 115
  episodes_total: 246418
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9218113198876381
          entropy_coeff: 0.0
          kl: 0.013655983610078692
          policy_loss: -0.10223072429653257
          total_loss: -0.04710119526134804
          vf_explained_var: 0.7875239849090576
          vf_loss: 0.034389502950944006
    num_agent_steps_sampled: 11836192
    num_steps_sampled: 11836192
    num_steps_trained: 11836192
  iterations_since_restore: 444
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1985,79541,11836192,1.87085,1.9832,-2,31


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11840192
  custom_metrics: {}
  date: 2021-12-10_11-51-36
  done: false
  episode_len_mean: 38.38461538461539
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8903769231759584
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 246522
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8858189508318901
          entropy_coeff: 0.0
          kl: 0.013699425850063562
          policy_loss: -0.09501065750373527
          total_loss: -0.04024416898027994
          vf_explained_var: 0.7747111320495605
          vf_loss: 0.03396048687864095
    num_agent_steps_sampled: 11840192
    num_steps_sampled: 11840192
    num_steps_trained: 11840192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1986,79566.7,11840192,1.89038,1.9824,-2,38.3846


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11844192
  custom_metrics: {}
  date: 2021-12-10_11-52-01
  done: false
  episode_len_mean: 37.3421052631579
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.89817192993666
  episode_reward_min: -2.0
  episodes_this_iter: 114
  episodes_total: 246636
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.860790554434061
          entropy_coeff: 0.0
          kl: 0.013811623386573046
          policy_loss: -0.09793433640152216
          total_loss: -0.04191237664781511
          vf_explained_var: 0.7608639001846313
          vf_loss: 0.035045561264269054
    num_agent_steps_sampled: 11844192
    num_steps_sampled: 11844192
    num_steps_trained: 11844192
  iterations_since_restore: 44

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1987,79592.2,11844192,1.89817,1.9812,-2,37.3421


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11848192
  custom_metrics: {}
  date: 2021-12-10_11-52-26
  done: false
  episode_len_mean: 36.67289719626168
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.745046735923981
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 246743
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.865841880440712
          entropy_coeff: 0.0
          kl: 0.011985814315266907
          policy_loss: -0.08724083303241059
          total_loss: 0.009870187157503096
          vf_explained_var: 0.7121586799621582
          vf_loss: 0.07890756637789309
    num_agent_steps_sampled: 11848192
    num_steps_sampled: 11848192
    num_steps_trained: 11848192
  iterations_since_restore: 4

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1988,79617.4,11848192,1.74505,1.9836,-2,36.6729


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11852192
  custom_metrics: {}
  date: 2021-12-10_11-52-52
  done: false
  episode_len_mean: 36.23
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.8886880016326903
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 246842
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.9028896726667881
          entropy_coeff: 0.0
          kl: 0.015222499438095838
          policy_loss: -0.11073258696706034
          total_loss: -0.05116557018482126
          vf_explained_var: 0.7362788915634155
          vf_loss: 0.03644784679636359
    num_agent_steps_sampled: 11852192
    num_steps_sampled: 11852192
    num_steps_trained: 11852192
  iterations_since_restore: 448
  node_i

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1989,79642.8,11852192,1.88869,1.9824,-2,36.23


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11856192
  custom_metrics: {}
  date: 2021-12-10_11-53-17
  done: false
  episode_len_mean: 42.33644859813084
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8421345793198203
  episode_reward_min: -2.0
  episodes_this_iter: 107
  episodes_total: 246949
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8230338990688324
          entropy_coeff: 0.0
          kl: 0.013731549202930182
          policy_loss: -0.09451746568083763
          total_loss: -0.021729152998887002
          vf_explained_var: 0.7145707607269287
          vf_loss: 0.05193351791240275
    num_agent_steps_sampled: 11856192
    num_steps_sampled: 11856192
    num_steps_trained: 11856192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1990,79668.3,11856192,1.84213,1.9812,-2,42.3364


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11860192
  custom_metrics: {}
  date: 2021-12-10_11-53-42
  done: false
  episode_len_mean: 34.358333333333334
  episode_media: {}
  episode_reward_max: 1.9823999404907227
  episode_reward_mean: 1.868320002158483
  episode_reward_min: -2.0
  episodes_this_iter: 120
  episodes_total: 247069
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8176069632172585
          entropy_coeff: 0.0
          kl: 0.013952295703347772
          policy_loss: -0.10046382597647607
          total_loss: -0.03586104523856193
          vf_explained_var: 0.7597270011901855
          vf_loss: 0.043412732891738415
    num_agent_steps_sampled: 11860192
    num_steps_sampled: 11860192
    num_steps_trained: 11860192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1991,79693.4,11860192,1.86832,1.9824,-2,34.3583


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11864192
  custom_metrics: {}
  date: 2021-12-10_11-54-08
  done: false
  episode_len_mean: 37.018348623853214
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8914825621001217
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 247178
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.858603797852993
          entropy_coeff: 0.0
          kl: 0.013449539081193507
          policy_loss: -0.09780708607286215
          total_loss: -0.04233048653986771
          vf_explained_var: 0.789972722530365
          vf_loss: 0.03505011333618313
    num_agent_steps_sampled: 11864192
    num_steps_sampled: 11864192
    num_steps_trained: 11864192
  iterations_since_restore: 

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1992,79718.7,11864192,1.89148,1.9836,-2,37.0183


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11868192
  custom_metrics: {}
  date: 2021-12-10_11-54-33
  done: false
  episode_len_mean: 31.973214285714285
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.8315750009247236
  episode_reward_min: -2.0
  episodes_this_iter: 112
  episodes_total: 247290
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.878959909081459
          entropy_coeff: 0.0
          kl: 0.011964062287006527
          policy_loss: -0.08542770508211106
          total_loss: -0.015542863140581176
          vf_explained_var: 0.7458726167678833
          vf_loss: 0.05171442241407931
    num_agent_steps_sampled: 11868192
    num_steps_sampled: 11868192
    num_steps_trained: 11868192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1993,79744.3,11868192,1.83158,1.982,-2,31.9732


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11872192
  custom_metrics: {}
  date: 2021-12-10_11-54-59
  done: false
  episode_len_mean: 36.96610169491525
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8263728800466505
  episode_reward_min: -2.0
  episodes_this_iter: 118
  episodes_total: 247408
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8095873445272446
          entropy_coeff: 0.0
          kl: 0.011658302042633295
          policy_loss: -0.0829869331791997
          total_loss: -0.012647311057662591
          vf_explained_var: 0.7592601776123047
          vf_loss: 0.05263357609510422
    num_agent_steps_sampled: 11872192
    num_steps_sampled: 11872192
    num_steps_trained: 11872192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1994,79770.1,11872192,1.82637,1.9812,-2,36.9661


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11876192
  custom_metrics: {}
  date: 2021-12-10_11-55-25
  done: false
  episode_len_mean: 33.452991452991455
  episode_media: {}
  episode_reward_max: 1.9839999675750732
  episode_reward_mean: 1.7994153846023428
  episode_reward_min: -2.0
  episodes_this_iter: 117
  episodes_total: 247525
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8534245379269123
          entropy_coeff: 0.0
          kl: 0.011381522053852677
          policy_loss: -0.08666792698204517
          total_loss: -0.007382104231510311
          vf_explained_var: 0.8069893717765808
          vf_loss: 0.06200013472698629
    num_agent_steps_sampled: 11876192
    num_steps_sampled: 11876192
    num_steps_trained: 11876192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1995,79795.9,11876192,1.79942,1.984,-2,33.453


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11880192
  custom_metrics: {}
  date: 2021-12-10_11-55-50
  done: false
  episode_len_mean: 33.614754098360656
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.80438360523005
  episode_reward_min: -2.0
  episodes_this_iter: 122
  episodes_total: 247647
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8174278400838375
          entropy_coeff: 0.0
          kl: 0.011646651546470821
          policy_loss: -0.09214055869961157
          total_loss: -0.021241083304630592
          vf_explained_var: 0.7393742799758911
          vf_loss: 0.05321112251840532
    num_agent_steps_sampled: 11880192
    num_steps_sampled: 11880192
    num_steps_trained: 11880192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1996,79821.1,11880192,1.80438,1.982,-2,33.6148


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11884192
  custom_metrics: {}
  date: 2021-12-10_11-56-15
  done: false
  episode_len_mean: 30.696
  episode_media: {}
  episode_reward_max: 1.9847999811172485
  episode_reward_mean: 1.876300802230835
  episode_reward_min: -2.0
  episodes_this_iter: 125
  episodes_total: 247772
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8083821274340153
          entropy_coeff: 0.0
          kl: 0.012752223119605333
          policy_loss: -0.09125595982186496
          total_loss: -0.019290094496682286
          vf_explained_var: 0.8054289817810059
          vf_loss: 0.05259842798113823
    num_agent_steps_sampled: 11884192
    num_steps_sampled: 11884192
    num_steps_trained: 11884192
  iterations_since_restore: 456
  node

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1997,79845.9,11884192,1.8763,1.9848,-2,30.696


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11888192
  custom_metrics: {}
  date: 2021-12-10_11-56-40
  done: false
  episode_len_mean: 37.92660550458716
  episode_media: {}
  episode_reward_max: 1.9808000326156616
  episode_reward_mean: 1.8888403724092957
  episode_reward_min: -2.0
  episodes_this_iter: 109
  episodes_total: 247881
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8885329216718674
          entropy_coeff: 0.0
          kl: 0.013757954991888255
          policy_loss: -0.10022067127283663
          total_loss: -0.022045856283511966
          vf_explained_var: 0.7051605582237244
          vf_loss: 0.057279919274151325
    num_agent_steps_sampled: 11888192
    num_steps_sampled: 11888192
    num_steps_trained: 11888192
  iterations_since_restor

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1998,79871,11888192,1.88884,1.9808,-2,37.9266


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11892192
  custom_metrics: {}
  date: 2021-12-10_11-57-05
  done: false
  episode_len_mean: 30.764705882352942
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.9389176418801315
  episode_reward_min: 1.541200041770935
  episodes_this_iter: 119
  episodes_total: 248000
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8442648686468601
          entropy_coeff: 0.0
          kl: 0.013901715748943388
          policy_loss: -0.10213440368534066
          total_loss: -0.04059363866690546
          vf_explained_var: 0.7689844369888306
          vf_loss: 0.04042753390967846
    num_agent_steps_sampled: 11892192
    num_steps_sampled: 11892192
    num_steps_trained: 11892192
  iterations_

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,1999,79896.1,11892192,1.93892,1.982,1.5412,30.7647


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11896192
  custom_metrics: {}
  date: 2021-12-10_11-57-30
  done: false
  episode_len_mean: 38.95192307692308
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.922446147753642
  episode_reward_min: 1.6859999895095825
  episodes_this_iter: 104
  episodes_total: 248104
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8419547863304615
          entropy_coeff: 0.0
          kl: 0.01498340890975669
          policy_loss: -0.10604689316824079
          total_loss: -0.04173067741794512
          vf_explained_var: 0.6981849670410156
          vf_loss: 0.04156016279011965
    num_agent_steps_sampled: 11896192
    num_steps_sampled: 11896192
    num_steps_trained: 11896192
  iterations_si

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,2000,79921.1,11896192,1.92245,1.982,1.686,38.9519


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11900192
  custom_metrics: {}
  date: 2021-12-10_11-57-56
  done: false
  episode_len_mean: 32.153153153153156
  episode_media: {}
  episode_reward_max: 1.9819999933242798
  episode_reward_mean: 1.83220180197879
  episode_reward_min: -2.0
  episodes_this_iter: 111
  episodes_total: 248215
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.7803399674594402
          entropy_coeff: 0.0
          kl: 0.013380690128542483
          policy_loss: -0.09171905263792723
          total_loss: -0.027643421082757413
          vf_explained_var: 0.7601189613342285
          vf_loss: 0.04375370766501874
    num_agent_steps_sampled: 11900192
    num_steps_sampled: 11900192
    num_steps_trained: 11900192
  iterations_since_restore:

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,2001,79946.3,11900192,1.8322,1.982,-2,32.1532


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11904192
  custom_metrics: {}
  date: 2021-12-10_11-58-21
  done: false
  episode_len_mean: 43.625
  episode_media: {}
  episode_reward_max: 1.9836000204086304
  episode_reward_mean: 1.8375000030948565
  episode_reward_min: -2.0
  episodes_this_iter: 104
  episodes_total: 248319
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8873762898147106
          entropy_coeff: 0.0
          kl: 0.012954826350323856
          policy_loss: -0.09336647461168468
          total_loss: -0.008808918704744428
          vf_explained_var: 0.6757285594940186
          vf_loss: 0.06488241394981742
    num_agent_steps_sampled: 11904192
    num_steps_sampled: 11904192
    num_steps_trained: 11904192
  iterations_since_restore: 461
  nod

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,2002,79971.8,11904192,1.8375,1.9836,-2,43.625


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11908192
  custom_metrics: {}
  date: 2021-12-10_11-58-46
  done: false
  episode_len_mean: 37.415841584158414
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.8478297065980365
  episode_reward_min: -2.0
  episodes_this_iter: 101
  episodes_total: 248420
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8957098796963692
          entropy_coeff: 0.0
          kl: 0.013096757349558175
          policy_loss: -0.09600563207641244
          total_loss: -0.024727920826990157
          vf_explained_var: 0.761868953704834
          vf_loss: 0.05138701177202165
    num_agent_steps_sampled: 11908192
    num_steps_sampled: 11908192
    num_steps_trained: 11908192
  iterations_since_restore

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,2003,79996.7,11908192,1.84783,1.9812,-2,37.4158


Result for PPO_Soccer_a0663_00000:
  agent_timesteps_total: 11912192
  custom_metrics: {}
  date: 2021-12-10_11-59-12
  done: true
  episode_len_mean: 39.77
  episode_media: {}
  episode_reward_max: 1.9811999797821045
  episode_reward_mean: 1.806968002319336
  episode_reward_min: -2.0
  episodes_this_iter: 99
  episodes_total: 248519
  experiment_id: 680584edd9f34418abf7c9e3e3a03506
  hostname: DESKTOP-DGRCPNR
  info:
    learner:
      default_policy:
        learner_stats:
          allreduce_latency: 0.0
          cur_kl_coeff: 1.5187500000000003
          cur_lr: 0.0003
          entropy: 0.8215372450649738
          entropy_coeff: 0.0
          kl: 0.013463641342241317
          policy_loss: -0.09267504364834167
          total_loss: -0.025239087117370218
          vf_explained_var: 0.7879037857055664
          vf_loss: 0.04698805185034871
    num_agent_steps_sampled: 11912192
    num_steps_sampled: 11912192
    num_steps_trained: 11912192
  iterations_since_restore: 463
  node_ip

Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,RUNNING,192.168.15.7:2324,2004,80022.7,11912192,1.80697,1.9812,-2,39.77


Trial name,status,loc,iter,total time (s),ts,reward,episode_reward_max,episode_reward_min,episode_len_mean
PPO_Soccer_a0663_00000,TERMINATED,,2004,80022.7,11912192,1.80697,1.9812,-2,39.77


2021-12-10 11:59:14,038	INFO tune.py:549 -- Total run time: 11644.85 seconds (11643.48 seconds for the tuning loop).
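
The `learner_stats` in each result block can be cross-checked against one another. As a sanity check, the sketch below recombines the values logged at 2021-12-10_11-54-08 (iteration 1992). It assumes the PPO objective is the policy loss plus the KL penalty plus the value loss, minus an entropy bonus. It also assumes a `vf_loss_coeff` of 1.0, since that coefficient does not appear in the log. The helper name `reconstruct_total_loss` is hypothetical.

```python
# Values copied from the learner_stats block logged at 2021-12-10_11-54-08.
stats = {
    "cur_kl_coeff": 1.5187500000000003,
    "entropy": 0.858603797852993,
    "entropy_coeff": 0.0,
    "kl": 0.013449539081193507,
    "policy_loss": -0.09780708607286215,
    "total_loss": -0.04233048653986771,
    "vf_loss": 0.03505011333618313,
}


def reconstruct_total_loss(s, vf_loss_coeff=1.0):
    """Recombine the logged terms (vf_loss_coeff=1.0 is an assumption)."""
    return (
        s["policy_loss"]
        + s["cur_kl_coeff"] * s["kl"]
        + vf_loss_coeff * s["vf_loss"]
        - s["entropy_coeff"] * s["entropy"]
    )


reconstructed = reconstruct_total_loss(stats)
print(f"logged={stats['total_loss']:.6f} reconstructed={reconstructed:.6f}")
```

Under those assumptions the reconstructed value agrees closely with the logged `total_loss`. With `entropy_coeff` at 0.0, the entropy term contributes nothing to the loss in this run.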


In [12]:
ALGORITHM = "PPO"
TRIAL = analysis.get_best_logdir("episode_reward_mean", "max")
CHECKPOINT = analysis.get_best_checkpoint(
  TRIAL,
  "training_iteration",
  "max",
)
TRIAL, CHECKPOINT

('D:\\CEIA\\game\\results\\PPO\\PPO_Soccer_a0663_00000_0_2021-12-10_08-45-09',
 'D:\\CEIA\\game\\results\\PPO\\PPO_Soccer_a0663_00000_0_2021-12-10_08-45-09\\checkpoint_002004\\checkpoint-2004')



In [None]:
import gym
from ray.rllib import MultiAgentEnv
import soccer_twos


class RLLibWrapper(gym.core.Wrapper, MultiAgentEnv):
    """
    An RLlib wrapper so our env can inherit from MultiAgentEnv.
    """

    pass


def create_rllib_env(env_config: dict = {}):
    """
    Creates an RLlib environment and prepares it to be instantiated by Ray workers.
    Args:
        env_config: configuration for the environment.
            You may specify the following keys:
            - variation: one of soccer_twos.EnvType. Defaults to EnvType.multiagent_player.
            - opponent_policy: a Callable for your agent to train against. Defaults to a random policy.
    """
    if hasattr(env_config, "worker_index"):
        env_config["worker_id"] = (
            env_config.worker_index * env_config.get("num_envs_per_worker", 1)
            + env_config.vector_index
        )
    env = soccer_twos.make(**env_config)
    if "multiagent" in env_config and not env_config["multiagent"]:
        # is multiagent by default, is only disabled if explicitly set to False
        return env
    return RLLibWrapper(env)
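
The `worker_id` arithmetic in `create_rllib_env` gives every environment copy its own integer, presumably so that simulator instances running at the same time do not collide (that motivation is an assumption). A minimal sketch of the same formula in plain Python, with a hypothetical helper name and a hypothetical layout of 5 workers numbered from 1 with 3 environment copies each:

```python
def unity_worker_id(worker_index, vector_index, num_envs_per_worker=1):
    """Same arithmetic as create_rllib_env: one distinct id per environment copy."""
    return worker_index * num_envs_per_worker + vector_index


# Hypothetical layout: 5 rollout workers, 3 environment copies each.
num_workers, num_envs_per_worker = 5, 3
ids = [
    unity_worker_id(w, v, num_envs_per_worker)
    for w in range(1, num_workers + 1)
    for v in range(num_envs_per_worker)
]
print(ids)
```

The ids stay distinct only if `num_envs_per_worker` in `env_config` matches the number of copies each worker actually creates, which is why the training config passes the same `NUM_ENVS_PER_WORKER` in both places.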

In [None]:
if __name__ == "__main__":
    ray.shutdown()
    ray.init(num_gpus=0, ignore_reinit_error=True, include_dashboard=False, log_to_driver=False)

    tune.registry.register_env("Soccer", create_rllib_env)
    temp_env = create_rllib_env({"variation": soccer_twos.EnvType.multiagent_player, "flatten_branched": True})
    obs_space = temp_env.observation_space
    act_space = temp_env.action_space
    
    temp_env.close()

    analysis = tune.run(
        "PPO",
        name="PPO_selfplay_1",
        config={
            # system settings
            "num_gpus": 0,
            "num_workers": 5,
            "num_envs_per_worker": NUM_ENVS_PER_WORKER,
            "log_level": "INFO",
            #"lr": ray.tune.uniform(1e-7, 1e-3),
            "lr": 0.0003,
            "lambda": 0.95,
            "gamma": 0.99,
            'sgd_minibatch_size': 256,
            #'train_batch_size': 4000,
            'clip_param': 0.2,
            'model': {
              'fcnet_hiddens': [256, 256],
            },
            "framework": "torch",
            # RL setup
            "multiagent": {
                "policies": {
                    "default": (None, obs_space, act_space, {}),
                },
                "policy_mapping_fn": tune.function(lambda _: "default"),
                "policies_to_train": ["default"],
            },
            "env": "Soccer",
            "env_config": {
                "num_envs_per_worker": NUM_ENVS_PER_WORKER,
                "variation": soccer_twos.EnvType.multiagent_player,
                "flatten_branched": True,
            },
        },
        stop={
            "timesteps_total": 15000000,  # 15M
            "time_total_s": 14400, # 4h
        },
        checkpoint_freq=100,
        checkpoint_at_end=True,
        local_dir="./ray_results",
        # restore="./ray_results/PPO_selfplay_1/PPO_Soccer_ID/checkpoint_00X/checkpoint-X",
        restore="results/PPO/PPO_Soccer_0b316_00000_0_2021-12-09_18-29-08/checkpoint_000995/checkpoint-995",
    )

    # Gets best trial based on max mean episode reward across all training iterations.
    best_trial = analysis.get_best_trial("episode_reward_mean", mode="max")
    print(best_trial)
    # Gets best checkpoint for the trial based on mean episode reward.
    best_checkpoint = analysis.get_best_checkpoint(
        trial=best_trial, metric="episode_reward_mean", mode="max"
    )
    print(best_checkpoint)
    print("Done training")
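
The `stop` dictionary passed to `tune.run` lists two criteria, and the trial ends when whichever of them is reached first. A minimal plain-Python sketch of that rule follows. It is not Tune's implementation, and both the `should_stop` helper and the sample results are hypothetical.

```python
def should_stop(result, stop):
    """Return True as soon as any tracked metric reaches its threshold."""
    return any(
        key in result and result[key] >= threshold
        for key, threshold in stop.items()
    )


stop = {"timesteps_total": 15_000_000, "time_total_s": 14_400}

# Hypothetical results, not taken from an actual run.
early = {"timesteps_total": 4_000_000, "time_total_s": 3_600.0}
timed_out = {"timesteps_total": 9_000_000, "time_total_s": 14_400.5}

print(should_stop(early, stop), should_stop(timed_out, stop))
```

In the second case the time budget ends the trial even though the timestep budget is far from exhausted.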

In [None]:
print(obs_space, act_space)

## Exporting your trained agent

As in Lab 02, you can export your trained agent to run as a competitor in the competition environment, or simply to watch it play. To do so, we define an agent class that implements the agent interface and converts observations and actions to the competition format. Below, we choose which experiment/checkpoint to export and store the implementation in a variable so it can be saved to a file later.
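
One part of that conversion is mapping the flat discrete action chosen by the policy back to the branched action the environment expects. The agent code in this notebook delegates this to `ActionFlattener.lookup_action`. The sketch below illustrates the idea in plain Python. It is not the `gym_unity` code, the helper name `build_action_lookup` is hypothetical, and the branch sizes `[3, 3, 3]` are assumed for illustration.

```python
import itertools


def build_action_lookup(branch_sizes):
    """Map each flat action index to one combination of branch choices."""
    ranges = [range(n) for n in branch_sizes]
    return {i: list(combo) for i, combo in enumerate(itertools.product(*ranges))}


# Hypothetical branch sizes; read the real ones from env.action_space.nvec.
lookup = build_action_lookup([3, 3, 3])
print(len(lookup), lookup[0], lookup[5], lookup[26])
```

With three branches of three choices each, the flat space has 27 actions, so the policy can be trained on a single discrete output while the environment still receives one value per branch.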

In [None]:
DRIVE_PATH = "results"
DRIVE_PYTHON_PATH = DRIVE_PATH.replace("\\", "")
if not os.path.exists(DRIVE_PYTHON_PATH):
  %mkdir -p $DRIVE_PATH

In [None]:
print(os.path.dirname(os.path.abspath('__file__')))

In [None]:
agent_file = f"""
import pickle
import os

import gym
from gym_unity.envs import ActionFlattener
import ray
from ray import tune
from ray.tune.registry import get_trainable_cls

from soccer_twos import AgentInterface, DummyEnv


ALGORITHM = "{ALGORITHM}"
CHECKPOINT_PATH = os.path.join('results')


class MyRaySoccerAgent(AgentInterface):
    def __init__(self, env: gym.Env):
        super().__init__()
        ray.init(ignore_reinit_error=True)

        self.flattener = ActionFlattener(env.action_space.nvec)

        # Load configuration from checkpoint file.
        config_path = ""
        if CHECKPOINT_PATH:
            config_dir = os.path.dirname(CHECKPOINT_PATH)
            config_path = os.path.join(config_dir, "params.pkl")
            # Try parent directory.
            if not os.path.exists(config_path):
                config_path = os.path.join(config_dir, "../params.pkl")

        # Load the config from pickled.
        if os.path.exists(config_path):
            with open(config_path, "rb") as f:
                config = pickle.load(f)
        else:
            # If no config in given checkpoint -> Error.
            raise ValueError(
                "Could not find params.pkl in either the checkpoint dir or "
                "its parent directory!"
            )

        # no need for parallelism on evaluation
        config["num_workers"] = 0
        config["num_gpus"] = 0

        # create a dummy env since it's required but we only care about the policy
        obs_space = env.observation_space
        act_space = self.flattener.action_space
        tune.registry.register_env(
            "DummyEnv",
            lambda *_: DummyEnv(obs_space, act_space),
        )
        config["env"] = "DummyEnv"

        # create the Trainer from config
        cls = get_trainable_cls(ALGORITHM)
        agent = cls(env=config["env"], config=config)
        # load state from checkpoint
        agent.restore(CHECKPOINT_PATH)
        # get default policy for evaluation
        self.policy = agent.get_policy()

    def act(self, observation):
        actions = {{}}
        for player_id in observation:
            # compute_single_action returns a tuple of (action, action_info, ...)
            # as we only need the action, we discard the other elements
            actions[player_id] = self.flattener.lookup_action(
                self.policy.compute_single_action(observation[player_id])[0]
            )
        return actions
"""

In [None]:
import os
import shutil

agent_name = "my_ray_soccer_agent"
agent_path = os.path.join('results', agent_name, agent_name)
# start from a clean directory; ignore_errors avoids failing on the first run
shutil.rmtree(agent_path, ignore_errors=True)
os.makedirs(agent_path)

# save the agent class inside the package so "from .agent import ..." resolves
with open(os.path.join(agent_path, "agent.py"), "w") as f:
    f.write(agent_file)

# save an __init__ to create the Python module
with open(os.path.join(agent_path, "__init__.py"), "w") as f:
    f.write("from .agent import MyRaySoccerAgent")

# copy the whole trial, including the experiment's configuration files
# (the destination is the part of TRIAL after "results/", so a forward-slash path is assumed)
shutil.copytree(TRIAL, os.path.join(agent_path, TRIAL.split("results/")[1]))

# pack everything into a .zip file
shutil.make_archive(os.path.join(DRIVE_PATH, agent_name), "zip", os.path.join(DRIVE_PATH, agent_name))

After packing all the files needed to run your agent, a file `minicurso_rl/lab03/my_ray_soccer_agent.zip` will be created in the Colab files and in the corresponding Google Drive folder. Download the file and extract it to a folder on your computer.
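
Before downloading, it can help to confirm that the archive has the package folder at its top level. The sketch below rebuilds the same directory layout in a temporary folder with placeholder files (an assumption made for illustration, not the real agent) and lists what `shutil.make_archive` puts in the zip.

```python
import os
import shutil
import tempfile
import zipfile

agent_name = "my_ray_soccer_agent"

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the export layout: <root>/<agent_name>/<agent_name>/...
    package_dir = os.path.join(tmp, agent_name, agent_name)
    os.makedirs(package_dir)
    for filename in ("__init__.py", "agent.py"):
        with open(os.path.join(package_dir, filename), "w") as f:
            f.write("# placeholder\n")

    # make_archive(base_name, format, root_dir) zips the contents of root_dir
    archive = shutil.make_archive(
        os.path.join(tmp, agent_name), "zip", os.path.join(tmp, agent_name)
    )
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(sorted(names))
```

Because the archive root is the outer `my_ray_soccer_agent` folder, extracting it yields an importable `my_ray_soccer_agent` package directly.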

Assuming the Python environment is already set up (e.g. the packages in [requirements.txt](https://github.com/dlb-rl/rl-tournament-starter/blob/main/requirements.txt) are installed), run `python -m soccer_twos.watch -m my_ray_soccer_agent` to watch your agent play against itself.

You can also test two different agents playing against each other, using the following command: `python -m soccer_twos.watch -m1 my_ray_soccer_agent -m2 ceia_baseline_agent`. You can download the *ceia_baseline_agent* agent [here](https://drive.google.com/file/d/1WEjr48D7QG9uVy1tf4GJAZTpimHtINzE/view).