# Train Agents

Use this file to train the informed agents. Load the parameters of a previous run by changing the variable `load_fileheader`. Be sure to set `fileheader` to the desired name of the output file. Specify the total number of updates to train for with `max_updates`.

In [1]:
import os
from google.colab import drive

drive.mount('/content/drive')
os.chdir('drive/My Drive/Colab Notebooks/dissertation/final')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
!pip install torch==1.11

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [3]:
%run 'import_final.ipynb'

In [4]:
%run 'functions_final.ipynb'

In [5]:
%run 'agents_final.ipynb'

In [6]:
%run 'simulations_final.ipynb'

# Parameters 

Set the parameters for the model and the simulation.

* https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw

In [7]:
# General parameters and setup
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
results_path = './numerical_results'
os.makedirs(results_path, exist_ok=True)  # create the folder if it is missing

# For loading data
load_fileheader = '4Traders_0Fees_9995LW_3a'
results_file = load_fileheader + '_state.pth.tar'
pm_file = load_fileheader + '_parameters.json'

# Different for each run
n_updates_alpha = 10
n_updates_v = 10
max_updates = 20
history_dict = dict(n_updates_alpha=n_updates_alpha,
                    n_updates_v=n_updates_v,
                    max_updates=max_updates)


# Header to output current run data
fileheader = 'Test'
print(device)

cuda:0


In [8]:
agent, parameters, bellman_loss, bellman_approx =\
      load_agent(os.path.join(results_path, results_file),
                 os.path.join(results_path, pm_file), device)

success
tensor([0.8963])


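The next cell runs the training loop. Each round performs `n_updates_v` policy evaluation steps (`update_v`) followed by `n_updates_alpha` policy improvement steps (`update_alpha`), and rounds repeat until at least `max_updates` updates have been made in total.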

In [9]:
# Training loop: alternate policy evaluation and policy improvement
pbar = tqdm.tqdm(total=max_updates)  # progress bar over individual updates
count = 0

while count < max_updates:
    # Policy evaluation steps
    for _ in range(n_updates_v):
        loss = agent.update_v(**parameters['learning_args'],
                              **parameters['s_args'])
        bellman_loss.append(loss)
    pbar.write('bellman loss: {:1.2e}'.format(loss.item()))
    # Policy improvement steps
    for _ in range(n_updates_alpha):
        v_approx = agent.update_alpha(**parameters['learning_args'],
                                      **parameters['s_args'])
        bellman_approx.append(v_approx)
    pbar.write('bellman approx: {:1.2e}'.format(v_approx.item()))
    # Both kinds of update count toward max_updates
    count += n_updates_alpha + n_updates_v
    pbar.update(n_updates_alpha + n_updates_v)
pbar.close()

# Keep a record of how long the agent has been trained for
parameters['training_history'].append(history_dict)

  0%|          | 0/20 [00:32<?, ?it/s]

bellman loss: 1.49e+08


100%|██████████| 20/20 [01:08<00:00,  3.41s/it]

bellman approx: 1.44e+04
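The loop above counts both kinds of update toward `max_updates`, so with `n_updates_v = 10`, `n_updates_alpha = 10` and `max_updates = 20` it runs exactly one evaluation/improvement round, which is why only one `bellman loss` line and one `bellman approx` line are printed. The following is a minimal sketch of that schedule in isolation. `training_schedule` is a hypothetical helper written for illustration, not part of the project notebooks, and it performs no learning.

```python
def training_schedule(max_updates, n_updates_v, n_updates_alpha):
    """Return the sequence of update kinds the training loop would perform.

    Hypothetical helper: it mirrors the counting logic of the training
    cell only, without touching an agent.
    """
    if n_updates_v + n_updates_alpha <= 0:
        raise ValueError("at least one update per round is required")
    steps = []
    count = 0
    while count < max_updates:
        steps.extend(["v"] * n_updates_v)          # policy evaluation
        steps.extend(["alpha"] * n_updates_alpha)  # policy improvement
        count += n_updates_v + n_updates_alpha
    return steps


steps = training_schedule(max_updates=20, n_updates_v=10, n_updates_alpha=10)
print(len(steps), steps.count("v"), steps.count("alpha"))

# The count is only checked between rounds, so a max_updates that is not
# a multiple of the round size is rounded up to the next full round.
overshoot = training_schedule(max_updates=25, n_updates_v=10, n_updates_alpha=10)
print(len(overshoot))
```

Choosing `max_updates` as a multiple of `n_updates_v + n_updates_alpha` keeps the recorded training history consistent with the number of updates actually performed.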


Save the state of the agent so that training can be resumed later. A matching parameters file is saved alongside it so that the parameters used for the agent are recorded.

In [10]:
save_state(fileheader, agent, parameters, bellman_loss, bellman_approx,
           device=device, results_path=results_path)

tensor([0.0146])
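As a rough illustration of the parameters side of this step, the sketch below appends a run record to `training_history` and round-trips the dictionary through a `_parameters.json` file. The internals of `save_state` are not shown in this notebook, so `save_parameters` and `load_parameters` are hypothetical stand-ins, and the example dictionary contains made-up, minimal contents.

```python
import json
import os
import tempfile


def save_parameters(fileheader, parameters, results_path):
    """Write parameters to <results_path>/<fileheader>_parameters.json.

    Hypothetical stand-in for the JSON part of save_state.
    """
    os.makedirs(results_path, exist_ok=True)
    path = os.path.join(results_path, fileheader + "_parameters.json")
    with open(path, "w") as f:
        json.dump(parameters, f, indent=2)
    return path


def load_parameters(path):
    """Read a parameters dictionary back from a JSON file."""
    with open(path) as f:
        return json.load(f)


# Illustrative values only; the real dictionary holds more entries.
parameters = {"training_history": []}
history_dict = dict(n_updates_alpha=10, n_updates_v=10, max_updates=20)
parameters["training_history"].append(history_dict)

# A temporary folder keeps the example from touching real results.
with tempfile.TemporaryDirectory() as results_path:
    path = save_parameters("Test", parameters, results_path)
    reloaded = load_parameters(path)

print(os.path.basename(path))
print(reloaded["training_history"])
```

Because each run appends its own record, the reloaded `training_history` list shows how many updates of each kind the agent has received across runs.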


In [11]:
print('Done')

Done
