# Step 2A: Launch Local Training Job

In this notebook, we will generate the command that trains our reinforcement learning model on a single machine.

In [1]:
import os

We will define the following hyperparameters for the training job:

* **batch_update_frequency**: This is how often the weights from the actively trained network get copied to the target network. It is also how often the model gets saved to disk. For more details on how this works, check out the [Deep Q-learning paper](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf).
* **max_epoch_runtime_sec**: This is the maximum runtime for each epoch. If the car has not reached a terminal state after this many seconds, the epoch will be terminated and training will begin.
* **per_iter_epsilon_reduction**: The agent uses an epsilon-greedy strategy with linear annealing while training. This is the amount by which epsilon is reduced each iteration.
* **min_epsilon**: The minimum value for epsilon. Once reached, the epsilon value will not decrease any further.
* **batch_size**: The minibatch size to use for training.
* **replay_memory_size**: The number of examples to keep in the replay memory. The replay memory is a FIFO buffer used to reduce the effects of nearby states being correlated. Minibatches are generated by randomly selecting examples from the replay memory.
* **weights_path**: If we are doing transfer learning and using pretrained weights for the model, they will be loaded from this path.
* **train_conv_layers**: If we are using pretrained weights, we may prefer to freeze the convolutional layers to speed up training.
* **airsim_path**: The path to the folder containing the .ps1 to start AirSim. This path cannot contain spaces.
* **data_dir**: The path to the directory containing the road_points.txt and reward_points.txt used to compute the reward function. This path cannot contain spaces.
* **experiment_name**: A unique identifier for this experiment.
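
To make the replay memory described above more concrete, here is a minimal sketch of a fixed-size FIFO buffer with random minibatch sampling. The `ReplayMemory` class and its method names are hypothetical and are not taken from `distributed_agent.py`; the actual training script may store and sample experiences differently.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-size FIFO buffer of experiences (hypothetical helper for illustration)."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest item once it is full.
        self._buffer = deque(maxlen=capacity)

    def add(self, experience):
        self._buffer.append(experience)

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up runs of consecutive states.
        return random.sample(list(self._buffer), batch_size)

    def __len__(self):
        return len(self._buffer)


# Integers stand in for (state, action, reward, next_state) tuples.
memory = ReplayMemory(capacity=50)
for step in range(60):
    memory.add(step)

minibatch = memory.sample(32)
```

After 60 insertions into a buffer of capacity 50, only the 50 most recent examples remain, and each minibatch is drawn from those. Note that the replay memory must hold at least `batch_size` examples before a minibatch of that size can be sampled this way.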

In [8]:
batch_update_frequency = 300
#batch_update_frequency = 10
max_epoch_runtime_sec = 30
per_iter_epsilon_reduction = 0.003
min_epsilon = 0.1
batch_size = 32
#replay_memory_size = 2000
replay_memory_size = 50
weights_path = os.path.join(os.getcwd(), 'Share\\data\\pretrain_model_weights.h5')
train_conv_layers = 'false'
airsim_path = 'C:\\Airsim\\AD_Cookbook_AirSim'
data_dir = os.path.join(os.getcwd(), 'Share')
experiment_name = 'local_run'
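
To get a feel for how `per_iter_epsilon_reduction` and `min_epsilon` interact, the sketch below computes a linearly annealed epsilon. It assumes epsilon starts at 1.0, which is not stated in this notebook, and the function `annealed_epsilon` is a hypothetical helper rather than part of the training script.

```python
def annealed_epsilon(iteration, start_epsilon=1.0,
                     per_iter_reduction=0.003, min_epsilon=0.1):
    """Linearly reduce epsilon each iteration, never going below min_epsilon.

    start_epsilon=1.0 is an assumption made for illustration only.
    """
    return max(min_epsilon, start_epsilon - per_iter_reduction * iteration)


for i in (0, 100, 300, 1000):
    print(i, round(annealed_epsilon(i), 3))
```

Under that assumption, the values above would bring epsilon down to its floor of 0.1 after roughly (1.0 - 0.1) / 0.003 = 300 iterations, after which it stays constant.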

We will now generate a training batch file. The file will be written to *Share\scripts_downpour\app*. To kick off training, run this file from that directory in an activated Python environment.

In [9]:
train_cmd = 'python distributed_agent.py'
train_cmd += ' batch_update_frequency={0}'.format(batch_update_frequency)
train_cmd += ' max_epoch_runtime_sec={0}'.format(max_epoch_runtime_sec)
train_cmd += ' per_iter_epsilon_reduction={0}'.format(per_iter_epsilon_reduction)
train_cmd += ' min_epsilon={0}'.format(min_epsilon)
train_cmd += ' batch_size={0}'.format(batch_size)
train_cmd += ' replay_memory_size={0}'.format(replay_memory_size)
train_cmd += ' weights_path={0}'.format(weights_path)
train_cmd += ' train_conv_layers={0}'.format(train_conv_layers)
train_cmd += ' airsim_path={0}'.format(airsim_path)
train_cmd += ' data_dir={0}'.format(data_dir)
train_cmd += ' experiment_name={0}'.format(experiment_name)
train_cmd += ' local_run=true'

with open(os.path.join(os.getcwd(), 'Share', 'scripts_downpour', 'app', 'train.bat'), 'w') as f:
    f.write(train_cmd)
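
Because `airsim_path` and `data_dir` cannot contain spaces, and `data_dir` is derived from `os.getcwd()`, it may be worth checking the paths before writing the batch file. The following is a small sketch of such a check; `find_paths_with_spaces` is a hypothetical helper, and the example paths are made up for illustration.

```python
def find_paths_with_spaces(named_paths):
    """Return the names of all paths that contain a space (hypothetical helper)."""
    return [name for name, path in named_paths.items() if ' ' in path]


example_paths = {
    'airsim_path': 'C:\\Airsim\\AD_Cookbook_AirSim',
    'data_dir': 'C:\\Users\\Jane Doe\\Share',  # made-up path containing a space
}

problems = find_paths_with_spaces(example_paths)
print(problems)  # prints ['data_dir']
```

In the notebook itself, the same check could be applied to the real `airsim_path` and `data_dir` values, moving the working directory to a path without spaces if any name is reported.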

Note that training the model from scratch can take up to 5 days on a powerful GPU. Using pretrained weights, the model should begin to visibly converge after 3 hours of training. Once the model has finished training, move on to **[Step 3 - Run the Model](RunModel.ipynb)**.