## GSoC 2018 - Reverse Transition Dynamics of a Random Walk

Here we study the reverse transition dynamics of a stochastic process conditioned on its terminal observation, and compare how well different samplers perform at this task. To keep things simple, the process is a one-dimensional random walk. We demonstrate the Monte Carlo sampler (MCSampler), the importance sampling sampler (ISSampler) with a Funnel proposal, the ISSampler with a softer version of the Funnel proposal, and finally a reinforcement learning approach, reinforced variational inference (RVI).

Note: make sure the repository is set up following the directions at https://github.com/zafarali/better-sampling

In [None]:
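# Imports used throughout this notebook

import sys
sys.path.append('..')  # also search the parent directory for packages (e.g. rvi_sampling)
import torch
import matplotlib
matplotlib.use('Agg')  # non-interactive backend: figures are written to files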
import torch.nn as nn
import numpy as np
import os
import seaborn as sns
from rvi_sampling import utils
from rvi_sampling.samplers import ISSampler, ABCSampler, MCSampler
from rvi_sampling.distributions.proposal_distributions import FunnelProposal, SimonsSoftProposal

Here we set the parameters of the random walk process. To keep things simple, we use a one-dimensional random walk that is unbiased, meaning that at each step the walk moves left or right with equal probability.
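
To make "unbiased" concrete, here is a minimal sketch of a forward one-dimensional random walk with +1/-1 steps. The function `simulate_random_walk` and its `p_right` parameter are hypothetical names used only for illustration; the notebook itself builds the process with `utils.stochastic_processes.create_rw`, whose internals may differ.

```python
import numpy as np


def simulate_random_walk(x0, n_steps, rng, p_right=0.5):
    """Simulate a 1D random walk with +1/-1 steps (hypothetical helper).

    With p_right=0.5 every step goes left or right with equal probability,
    i.e. the walk is unbiased.
    """
    steps = np.where(rng.random(n_steps) < p_right, 1, -1)
    # Positions after each step, prefixed with the starting position
    return np.concatenate([[x0], x0 + np.cumsum(steps)])


rng = np.random.default_rng(0)
path = simulate_random_walk(x0=0, n_steps=50, rng=rng)
print(path[-1])
```

Setting `p_right` away from 0.5 would give a biased walk, which roughly corresponds to the `BIASED = True` case.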

In [None]:
DIMENSIONS = 1    # Set dimension of the random walk
OUTPUT_SIZE = 2   # The output dimension of sampler networks (action, action probabilities)
BIASED = False    # Unbiased walk: each step goes left or right with equal probability

Command-line parsers can be created with the `utils.parsers.create_parser` function, which adds the basic command-line arguments for RVI sampling as well as the basic experimental arguments:
```
parser = utils.parsers.create_parser('1D random walk', 'random_walk')
```


Additional arguments can be added with the `parser.add_argument` function:
```
parser.add_argument('-cycles', '--cycles', type=int, default=15,
                    help='number of train-test cycles.')
```

The `parser.parse_args` function parses the command-line arguments and returns an object that holds the parameters as attributes:
```
args = parser.parse_args()
```
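
Inside a notebook there is no real command line, so one alternative to setting the arguments by hand is to pass an explicit list of strings to `parse_args`. The sketch below uses the standard-library `argparse` module with a small hypothetical stand-in parser (it defines only `--cycles` and an assumed `--samples` flag), not the full parser returned by `utils.parsers.create_parser`.

```python
import argparse

# Hypothetical stand-in for utils.parsers.create_parser: only two arguments
parser = argparse.ArgumentParser(description='1D random walk')
parser.add_argument('-cycles', '--cycles', type=int, default=15,
                    help='number of train-test cycles.')
parser.add_argument('--samples', type=int, default=1000,
                    help='number of samples per cycle (assumed flag name).')

# An explicit list replaces sys.argv, which is what makes this work in a notebook
demo_args = parser.parse_args(['--cycles', '3'])
print(demo_args.cycles, demo_args.samples)  # 3 1000
```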

#### NOTE: For the purpose of this tutorial, the `args` variable is set manually

In [None]:
### command line arguments

class Arguments():
    def __init__(
        self,
        entropy = 0,  # RVI arguments
        baseline_decay = 0.99,
        learning_rate = 0.001,
        baseline_learning_rate = 0.001,
        only_rvi = False,
        no_train = False,
        baseline_type = 'moving_average',
        notime = True,
        gamma = 1,
        rewardclip = -10,
        gae = False,
        lam = 1.0,
        n_agents = 1,
        plot_posterior = False,
        neural_network = [16, 16],
        pretrained = None,
        samples = 1000, # experimental arguments
        sampler_seed = 0,
        method = "ISSampler",
        n_cpus = 3,
        no_tensorboard = False,
        name = 'results',
        IS_proposal = 'funnel',
        softness_coefficient = 1.0,
        override_endpoint = False,
        outfolder = './',
        profile_performance = False,
        # Attributes read later in this notebook. Values marked "assumed" are
        # placeholders; create_rw may also read further random-walk attributes,
        # so check the repository's parser for their names and defaults.
        cycles = 15,   # number of train-test cycles (the parser default shown above)
        rw_seed = 0,   # assumed value: seed of the random walk
        rw_width = 5,  # assumed value: the starting window is [-rw_width, rw_width]
        endpoint = 0   # assumed value: only used when override_endpoint is True
    ):
        # No trailing commas here: `x = value,` would store the tuple (value,)
        self.entropy = entropy
        self.baseline_decay = baseline_decay
        self.learning_rate = learning_rate
        self.baseline_learning_rate = baseline_learning_rate
        self.only_rvi = only_rvi
        self.no_train = no_train
        self.baseline_type = baseline_type
        self.notime = notime
        self.gamma = gamma
        self.rewardclip = rewardclip
        self.gae = gae
        self.lam = lam
        self.n_agents = n_agents
        self.plot_posterior = plot_posterior
        self.neural_network = neural_network
        self.pretrained = pretrained
        self.samples = samples # experimental arguments
        self.sampler_seed = sampler_seed
        self.method = method
        self.n_cpus = n_cpus
        self.no_tensorboard = no_tensorboard
        self.name = name
        self.IS_proposal = IS_proposal
        self.softness_coefficient = softness_coefficient
        self.override_endpoint = override_endpoint
        self.outfolder = outfolder
        self.profile_performance = profile_performance
        self.cycles = cycles
        self.rw_seed = rw_seed
        self.rw_width = rw_width
        self.endpoint = endpoint


# The remaining cells read their parameters from this object
args = Arguments()

Next we set up the experiment: the global seed (for reproducibility) and the folders where the results are saved.

In [None]:
# This sets the global seed for the random number generators
utils.common.set_global_seeds(args.sampler_seed)
sns.set_style('whitegrid')

# Create the name of the folder where the results are stored
folder_name = utils.io.create_folder_name(
    args.outfolder,
    args.name + '_' + str(args.sampler_seed) + '_' + str(args.rw_seed) + '_' + str(args.method))

# Training results are stored in a separate training folder
train_folder_name = os.path.join(folder_name, 'training_results')

# Create the parent folders before the per-cycle subfolder
utils.io.create_folder(folder_name)
utils.io.create_folder(train_folder_name)

train_folder_to_save_in = os.path.join(train_folder_name, '0')
utils.io.create_folder(train_folder_to_save_in)

# Files tracking the training KL divergence (cumulative and per cycle)
kl_train_cumulative_track = os.path.join(folder_name, 'kl_training_cumulative.txt')
kl_train_track = os.path.join(folder_name, 'kl_training.txt')

# Files tracking the proposal success rates (cumulative and per cycle)
prop_train_cumulative_track = os.path.join(folder_name, 'prop_training_cumulative.txt')
prop_train_track = os.path.join(folder_name, 'prop_training.txt')

Next we create the random walk process with `utils.stochastic_processes.create_rw(<args>, biased=<True/False>, n_agents=<number of agents interacting with the process>)`:

In [None]:
# Create the random walk with the given parameters; `analytic` is the analytic
# solution used later to evaluate the samplers (KL divergence, expectation)
# n_agents is the number of agents interacting with the random walk
# Other stochastic processes can be implemented in the same way as the random walk
rw, analytic = utils.stochastic_processes.create_rw(args, biased=BIASED, n_agents=args.n_agents)

The `override_endpoint` argument makes it possible to create harder variants of the same process. For example, a simple random walk becomes difficult to sample when its endpoint is moved farther from the starting window.
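
To see why a distant endpoint is hard, the sketch below computes the exact probability that a naive sampler, walking backwards from the endpoint with unbiased +1/-1 steps, finishes inside the starting window. It assumes a toy model (a window `[-width, width]`, `n_steps` steps, and the illustrative values 10 and 1) rather than the notebook's actual settings, and `mc_success_probability` is a hypothetical helper.

```python
from math import comb


def mc_success_probability(x_end, n_steps, width):
    """Exact probability that an unbiased walk of n_steps (+1/-1) started at
    x_end finishes inside the window [-width, width] (toy model)."""
    hits = 0
    for k in range(n_steps + 1):          # k = number of +1 steps
        final = x_end + k - (n_steps - k)
        if -width <= final <= width:
            hits += comb(n_steps, k)      # number of paths with k +1 steps
    return hits / 2 ** n_steps


for x_end in (0, 8):
    print(x_end, mc_success_probability(x_end, n_steps=10, width=1))
# 0 0.24609375
# 8 0.009765625
```

In this toy setting, moving the endpoint from 0 to 8 makes a successful trajectory roughly 25 times less likely.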

In [None]:
# This argument decides if we want to override the endpoint of the random walk process
if args.override_endpoint:
    rw.xT = np.array([ args.endpoint ])

In [None]:
# Record the start and end positions as empty marker files in the results folder
utils.io.touch(os.path.join(folder_name, 'start={}'.format(rw.x0)))
utils.io.touch(os.path.join(folder_name, 'end={}'.format(rw.xT)))

In [None]:
# The window [-rw_width, rw_width] that the IS proposal pushes trajectories toward
push_toward = [-args.rw_width, args.rw_width]

# The soft proposal pushes toward the window more gently than the Funnel proposal;
# the softness coefficient controls how soft the push is
if args.IS_proposal == 'soft':
    proposal = SimonsSoftProposal(push_toward, softness_coeff=args.softness_coefficient)
else:
    proposal = FunnelProposal(push_toward)

# Select the sampler requested by args.method
if args.method == 'ISSampler':
    sampler = ISSampler(proposal, seed=args.sampler_seed)
elif args.method == 'MCSampler':
    sampler = MCSampler(seed=args.sampler_seed)
elif args.method == 'ABCSampler':
    sampler = ABCSampler('slacked', seed=args.sampler_seed)
else:
    raise ValueError('Unknown method: {}'.format(args.method))

In [None]:
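def kl_function(estimated_distribution):
    # KL divergence between the estimated distribution and the analytic
    # solution for the observed endpoint rw.xT
    return analytic.kl_divergence(estimated_distribution, rw.xT)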
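
For intuition about what `analytic` and `kl_function` compare, here is a minimal sketch of an analytic posterior over the starting position and a discrete KL divergence. It assumes a uniform prior over the integer positions in `[-width, width]` and an unbiased walk of `n_steps` steps, so the likelihood of reaching `x_end` from `x0` is `comb(n_steps, (n_steps + x_end - x0) / 2) / 2**n_steps` when that count is a whole number between 0 and `n_steps`, and zero otherwise. `analytic_posterior` and `kl_divergence` are hypothetical helpers, and the direction KL(p || q) is an assumption, not a statement about `analytic.kl_divergence`.

```python
import numpy as np
from math import comb


def analytic_posterior(x_end, n_steps, width):
    """Posterior over integer starting positions in [-width, width] (toy model).

    Uniform prior on the window, unbiased +1/-1 walk; assumes at least one
    starting position can reach x_end.
    """
    starts = np.arange(-width, width + 1)
    likelihood = np.zeros(len(starts))
    for i, x0 in enumerate(starts.tolist()):
        twice_right = n_steps + (x_end - x0)  # 2 * (number of +1 steps needed)
        if twice_right % 2 == 0 and 0 <= twice_right // 2 <= n_steps:
            likelihood[i] = comb(n_steps, twice_right // 2) / 2 ** n_steps
    # With a uniform prior the posterior is the normalised likelihood
    return starts, likelihood / likelihood.sum()


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions on the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))


starts, post = analytic_posterior(x_end=0, n_steps=10, width=2)
uniform = np.full(len(starts), 1 / len(starts))
print(starts, post)
print(kl_divergence(post, uniform))  # > 0: the uniform guess is penalised
```

With `x_end=0` and `n_steps=10`, the odd starting positions get zero probability, because ten +1/-1 steps cannot change the parity of the position.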

A diagnostic can be attached to the sampler to track its behavior at different training steps:

In [None]:
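# Attach a diagnostic that tracks the sampler using kl_function
sampler.set_diagnostic(utils.diagnostics.create_diagnostic(sampler._name, args, folder_name, kl_function))

print('True Starting Position is: {}'.format(rw.x0))
print('True Ending Position is: {}'.format(rw.xT))
# Expected starting position under the analytic solution, given the endpoint
print('Analytic Starting Position: {}'.format(analytic.expectation(rw.xT[0])))

# Results are accumulated over the cycles of the experiment
train_results = None

# Create the (initially empty) tracking files
utils.io.touch(kl_train_track)
utils.io.touch(kl_train_cumulative_track)
utils.io.touch(prop_train_track)
utils.io.touch(prop_train_cumulative_track)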

This is where the experiment runs. To make diagnostics easier, the sampler is run for `args.cycles` cycles of `args.samples` Monte Carlo samples each. At the end of each cycle, the KL divergence and the proposal success rate are appended to files in the experiment folder, both for the samples of that cycle alone and cumulatively for all samples drawn so far.

In [None]:
for i in range(1, args.cycles + 1):
    # Draw args.samples new samples in this cycle
    train_results_new = sampler.solve(rw, args.samples)

    # technically doing this saving doesn't take too long so doesn't need to be run
    # in a background thread. This is good because it saves time of having to copy
    # the policy for saving etc.
    if train_results is None:
        train_results = train_results_new
    else:
        # Augment the old Results object with the samples of this cycle
        train_results._all_trajectories.extend(train_results_new.all_trajectories())
        train_results._trajectories.extend(train_results_new.trajectories())
        train_results._posterior_particles = np.hstack([train_results.posterior(),
                                                        train_results_new.posterior()])
        train_results._posterior_weights = np.hstack([train_results.posterior_weights(),
                                                      train_results_new.posterior_weights()])

    steps_so_far = str(i * args.samples)

    train_folder_to_save_in = os.path.join(train_folder_name, str(i))
    utils.io.create_folder(train_folder_to_save_in)
    print('Training Phase:')

    # Cumulative statistics over all samples so far (no folder given, so nothing is saved again)
    kld = utils.analysis.analyze_samplers_rw([train_results], args, None, rw,
                                             policy=None, analytic=analytic)
    utils.io.stash(kl_train_cumulative_track, steps_so_far + ', ' + str(kld[0]))
    utils.io.stash(prop_train_cumulative_track, steps_so_far + ', ' + str(train_results.prop_success()))

    # Statistics for this cycle's samples only, saved in this cycle's folder
    kld = utils.analysis.analyze_samplers_rw([train_results_new], args, train_folder_to_save_in, rw,
                                             policy=None, analytic=analytic)
    utils.io.stash(kl_train_track, steps_so_far + ', ' + str(kld[0]))
    utils.io.stash(prop_train_track, steps_so_far + ', ' + str(train_results_new.prop_success()))
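
The loop above keeps two views of the samples: a cumulative one, built by concatenating the posterior particles and weights with `np.hstack`, and a per-cycle one that uses only the newest samples. The sketch below reproduces that bookkeeping on made-up particles with unit weights as a stand-in for `sampler.solve`; `weighted_histogram` is a hypothetical helper, not part of `rvi_sampling`.

```python
import numpy as np


def weighted_histogram(particles, weights, support):
    """Normalised histogram of weighted particles over an integer support."""
    probs = np.array([weights[particles == s].sum() for s in support], dtype=float)
    return probs / probs.sum()


rng = np.random.default_rng(0)
support = np.arange(-2, 3)
all_particles = np.array([], dtype=int)
all_weights = np.array([])

for cycle in range(1, 4):
    # Stand-in for sampler.solve(...): random particles with unit weights
    new_particles = rng.integers(-2, 3, size=100)
    new_weights = np.ones(100)
    # Accumulate across cycles, as the loop does for the Results object
    all_particles = np.hstack([all_particles, new_particles])
    all_weights = np.hstack([all_weights, new_weights])

    per_cycle = weighted_histogram(new_particles, new_weights, support)
    cumulative = weighted_histogram(all_particles, all_weights, support)
    print(cycle * 100, per_cycle.round(2), cumulative.round(2))
```

As particles accumulate, the cumulative estimate typically fluctuates less from cycle to cycle than the per-cycle one.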

We now look at how the samplers behave in two settings of the simple random walk:

* An easier setting, where the endpoint is close to the starting window (endpoint 0).
* A more difficult setting, where the endpoint is farther from the starting window (endpoint 8). This setting is harder because the sampler has to explore low-probability trajectories.

![alt text](img/stochastic_process.jpg)

# Monte Carlo Sampler

The Monte Carlo sampler picks a random direction at each step of the random walk. It works well when the endpoint is near the starting region.

We look at some sample trajectories below

#### endpoint 0

![alt text](img/successful_trajectories_mc_end0.jpg)

#### endpoint 8

![no_img](img/successful_trajectories_mc_end8.jpg)

As the plots show, the Monte Carlo sampler performs very poorly in the more difficult setting, because it rarely generates the low-probability trajectories that this setting requires.
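
The sketch below is a minimal version of the naive Monte Carlo idea, assuming a toy model: walks start at the endpoint, take `n_steps` unbiased +1/-1 steps backwards, and are accepted if they finish inside `[-width, width]`. `mc_sample_starts` is a hypothetical helper, and `MCSampler` in `rvi_sampling` may differ in its details. Under these assumptions and a uniform prior over the window, the accepted final positions are exact posterior samples; the problem at endpoint 8 is that only about 1% of the walks are accepted.

```python
import numpy as np


def mc_sample_starts(x_end, n_steps, width, n_samples, rng):
    """Naive Monte Carlo (toy model): walk backwards from x_end with unbiased
    +1/-1 steps and keep the walks that finish inside [-width, width].

    Returns the accepted final positions and the fraction of accepted walks.
    """
    steps = rng.choice([-1, 1], size=(n_samples, n_steps))
    final = x_end + steps.sum(axis=1)
    accepted = np.abs(final) <= width
    return final[accepted], accepted.mean()


rng = np.random.default_rng(0)
rates = {}
for x_end in (0, 8):
    starts, rate = mc_sample_starts(x_end, n_steps=10, width=1,
                                    n_samples=20_000, rng=rng)
    rates[x_end] = rate
    print(f'endpoint {x_end}: {len(starts)} of 20000 walks accepted')
```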

# Importance Sampling Sampler

The importance sampling sampler uses a Funnel proposal.

## Funnel proposal

![no_img](img/funnel_proposal.jpg)

Some sample trajectories are shown below

#### endpoint 0

![no_img](img/successful_trajectories_is_end0.jpg)

#### endpoint 8

![no_img](img/successful_trajectories_is_end8.jpg)

With importance sampling we get more successful trajectories in both settings, because the Funnel proposal pushes each trajectory in the required direction.
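
The sketch below illustrates, under a toy model, why a funnel-style proposal needs importance weights. A step is allowed only if the window `[-width, width]` can still be reached with the remaining steps, and when only one step is allowed it is forced. Each forced step has proposal probability 1 but target probability 1/2, so the weight `p(path) / q(path)` halves. The helpers `dist_to_window` and `funnel_is_sample` are hypothetical and are not the `FunnelProposal` class from `rvi_sampling`.

```python
import numpy as np


def dist_to_window(x, width):
    """Number of steps from position x to the window [-width, width] (0 inside)."""
    return max(0, abs(x) - width)


def funnel_is_sample(x_end, n_steps, width, rng):
    """Draw one backward walk from a funnel-style proposal (toy model).

    Assumes width >= 1 and that the window is reachable from x_end.
    Returns the final position and the importance weight p(path) / q(path).
    """
    assert width >= 1 and dist_to_window(x_end, width) <= n_steps
    x, log_w = x_end, 0.0
    for k in range(n_steps, 0, -1):       # k = steps left, including this one
        allowed = [s for s in (-1, 1) if dist_to_window(x + s, width) <= k - 1]
        q = 1.0 / len(allowed)            # proposal probability of the chosen step
        x += allowed[rng.integers(len(allowed))]
        log_w += np.log(0.5) - np.log(q)  # target: each step has probability 1/2
    return x, np.exp(log_w)


rng = np.random.default_rng(0)
draws = [funnel_is_sample(8, n_steps=10, width=1, rng=rng) for _ in range(20_000)]
finals = np.array([d[0] for d in draws])
weights = np.array([d[1] for d in draws])
print(np.all(np.abs(finals) <= 1))        # True: every proposed walk ends in the window
print(weights.mean(), 10 / 1024)          # IS estimate vs exact success probability
```

Every proposed walk succeeds, and the mean weight is an unbiased estimate of the probability that a plain unbiased walk succeeds. In this toy case that probability is exactly 10/1024, since a successful walk needs exactly one +1 step out of ten.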

# Importance Sampling with Soft Proposal

Here we use a softer proposal. The Funnel proposal is strict: once a trajectory hits the boundary, the proposal pushes it to the endpoint. As a result, many trajectories start at the edge of the window. The soft proposal overcomes this limitation by pushing more gently toward the window region, which yields a smoother distribution that more closely resembles the target distribution.

#### endpoint 0

![no_img](img/successful_trajectories_issoft_end0.jpg)

#### endpoint 8

![no_img](img/successful_trajectories_issoft_end8.jpg)

# RVI Sampler

There has been some theoretical work on using reinforcement learning techniques for variational inference. The RVI sampler applies some of these ideas in practice to assess how effective the method actually is. The samples shown below come from a reinforcement learning method with 1 agent that uses a variance reduction technique called Generalized Advantage Estimation (GAE) with lambda = 0.95.
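
GAE replaces the raw return in the policy-gradient update with an exponentially weighted sum of temporal-difference errors, `delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)` and `A_t = sum_l (gamma * lam)^l * delta_{t+l}`. The sketch below is a standalone NumPy version of that standard formula, not the implementation in the linked policy-gradient-methods commit; the episode and value estimates in the example are made up, and `gamma=1.0` mirrors the `gamma = 1` default in `Arguments`.

```python
import numpy as np


def generalized_advantage_estimates(rewards, values, gamma=1.0, lam=0.95):
    """GAE for one finished episode.

    rewards: r_0 .. r_{T-1}
    values:  V(s_0) .. V(s_T), where V(s_T) is 0 for a terminal state
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    # One-step temporal-difference errors delta_t
    deltas = rewards + gamma * values[1:] - values[:-1]
    advantages = np.zeros_like(deltas)
    running = 0.0
    # Accumulate backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(deltas))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages


# Sparse reward at the end of a 3-step episode, value baseline of zero
print(generalized_advantage_estimates([0, 0, 1], [0, 0, 0, 0], gamma=1.0, lam=0.95))
```

Here the advantages decay by a factor of `gamma * lam = 0.95` per step away from the final reward, giving 0.9025, 0.95 and 1.0. With `lam=1.0` they reduce to the return minus the value baseline; smaller values of `lam` trade some bias for lower variance.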

#### endpoint 0

![no_img](img/traj_evol_rvi_end0.jpg)

The image shows how the RVI sampler performs at different points in the training process: more trajectories are successful in the later stages of training.

#### endpoint 8

![no_img](img/traj_evol_rvi_end8.jpg)

## Performance Comparison

Now we look at how the different samplers (Monte Carlo, importance sampling with the Funnel proposal, importance sampling with the soft proposal, and reinforced variational inference) behave under different conditions. The endpoint of the process is varied to create different levels of difficulty: the farther the endpoint is from the starting position, the lower the probability of the trajectories that succeed. The Monte Carlo sampler performs poorly under these adverse conditions, whereas the performance of the IS sampler with the soft proposal is not much affected by the changing difficulty.

![no_img](img/difficulty_comparissons.jpg)

### Related Commits

- Profiling - https://github.com/zafarali/better-sampling/commit/8148b3027d505032f2b76f5d3f4903b31d14466b
- PyTorch version upgrade - https://github.com/zafarali/better-sampling/commit/8148b3027d505032f2b76f5d3f4903b31d14466b ; https://github.com/zafarali/policy-gradient-methods/commit/8bea455982ab7f0768951f467c5a96c7038be6ee
- Enhanced tests - https://github.com/zafarali/better-sampling/commit/5a7c6e369406b1b92ed8d914d89171bd1edf4d2f
- Function approximation - https://github.com/zafarali/better-sampling/commits/fn_approximator_baseline
- GAE implementation - https://github.com/zafarali/policy-gradient-methods/commit/52fb37694dc94e74f62d83f39dfb9597394b6de3
- Varying difficulty experiments - https://github.com/zafarali/better-sampling/commits/varying_difficulty

### Acknowledgements

I thank Dr. Simon Gravel and Zafarali Ahmed for their patience and for helping out with various aspects of the project.