## Monte Carlo Option Pricing and Deep Learning model in gQuant


The Black–Scholes model can efficiently price “plain vanilla” options with the European exercise rule. Options such as barrier options and basket options have a complicated structure with no simple analytical solution, and Monte Carlo simulation is an effective way to price them. Traditionally, Monte Carlo pricing is implemented in C/C++ CUDA code. In this [developer blog](https://developer.nvidia.com/blog/accelerating-python-for-exotic-option-pricing/), I explored how to use Python GPU libraries to achieve state-of-the-art performance in exotic option pricing.

Recently, Huge and Savine introduced a novel regularization method for training fast, accurate pricing models in a [paper](https://arxiv.org/pdf/2005.02347.pdf). Inspired by this method, in this notebook we are going to:
     
    1. Implement the differential regularization for the example Asian Barrier option
    2. Show how we do HPC (Monte Carlo simulation) and deep learning the gQuant way.
    
Without loss of generality, we use the Asian Barrier Option as an example. The Asian Barrier Option is a mixture of the Asian Option and the Barrier Option. The derivative price depends on the average of the underlying asset price S, the strike price K, and the barrier price B. Specifically, we consider the discretized Down-and-Out Call Asian Barrier Option:

    The option is void if the average price of the underlying asset goes below the barrier. 
    The asset Spot Price S is usually modeled as Geometric Brownian motion, which has three free parameters: Spot Price, Percent Volatility, and Percent Drift. 
    The price of the option is the expected payoff at maturity, discounted to the present value. 
    The path-dependent nature of the option makes an analytic solution of the option price impossible. 

This makes it a good example for pricing with Monte Carlo simulation. 
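To make the payoff rule concrete, here is a minimal pure-Python sketch of the discounted payoff for a single simulated path. It is only an illustration under stated assumptions: the function name `asian_barrier_payoff` is hypothetical, the path uses a simple Euler step of geometric Brownian motion, and a knocked-out path is assumed to pay exactly zero.

```python
import math

def asian_barrier_payoff(normals, S0, K, B, sigma, mu, r, y_steps=365, T=1.0):
    """Discounted payoff of one path of a down-and-out Asian barrier call."""
    dt = 1.0 / y_steps
    s = S0
    average = 0.0
    for n, z in enumerate(normals):
        # Euler step of geometric Brownian motion
        s += mu * dt * s + sigma * math.sqrt(dt) * z * s
        # incremental update of the running average
        average += (s - average) / (n + 1.0)
        if average <= B:
            return 0.0  # knocked out: the option is void (assumption of this sketch)
    return math.exp(-r * T) * max(average - K, 0.0)

# A flat path (no drift, no noise) and a path that keeps falling
flat = asian_barrier_payoff([0.0] * 365, 120.0, 110.0, 100.0, 0.35, 0.0, 0.05)
crash = asian_barrier_payoff([-3.0] * 365, 120.0, 110.0, 100.0, 0.35, 0.0, 0.05)
print(flat, crash)
```

The flat path keeps the average at the spot price, so it pays the discounted difference between spot and strike, while the falling path crosses the barrier and pays nothing.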

As a refresher, let's first run a Monte Carlo simulation for option pricing using the method introduced in the [developer blog](https://developer.nvidia.com/blog/accelerating-python-for-exotic-option-pricing/). We choose to price the example Asian Barrier Option:

    Maturity (T): 1 year
    Spot (S) : 120
    Strike (K): 110
    Volatility (sigma): 35.0 %
    Risk Free Rate (r): 5.0 %
    Stock Drift Rate (mu): 10.0 %
    Barrier (B): 100

To handle continuous maturity time, we fix the number of steps per year and use a fractional step for the last one. Import the libraries and define the option parameters:

In [1]:
import cupy
import numpy as np
import math
import time
import numba
from numba import cuda
from numba import njit
from numba import prange
import cudf
cupy.cuda.set_allocator(None)
#110.0, 100.0, 120.0, 0.35, 0.1, 0.05
N_PATHS = 8192000
Y_STEPS = 365 # constant, number of steps per year
T = 1 # time, unit 1 year
K = 110.0 # Strike price
B = 100.0 # barrier price
S0 = 120.0 # initial stock price 
sigma = 0.35 # stock annual volatility 
mu = 0.1 # stock annual return
r = 0.05 # annual risk-free interest rate
N_STEPS = int(np.ceil(T*Y_STEPS))
print('steps', N_STEPS)

steps 365
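The idea of fixing the number of steps per year and shortening only the last step can be sketched as follows. This is an illustration, not the notebook's implementation: `step_sizes` is a hypothetical helper, and how the shortened step enters the drift and diffusion terms is left to the simulation code.

```python
import math

def step_sizes(T, y_steps=365):
    """Time steps covering [0, T]: full steps of 1/y_steps plus a shorter last one."""
    n_steps = int(math.ceil(T * y_steps))
    full = 1.0 / y_steps
    last = T - (n_steps - 1) * full  # fractional remainder, at most one full step
    return [full] * (n_steps - 1) + [last]

steps = step_sizes(1.1)
# 402 steps; the last one is about half of a full step
print(len(steps), steps[-1] * 365)
```

With `T = 1` the last step equals a full step, which is the case used in this notebook.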


Allocate GPU arrays for the random numbers and the outputs.

In [2]:
randoms_gpu = cupy.random.normal(0, 1, N_PATHS * N_STEPS, dtype=cupy.float32)
output =  np.zeros(N_PATHS, dtype=np.float32)
doutput =  np.zeros(N_PATHS*5, dtype=np.float32)

The following is the Numba kernel that we use to run the simulation for each path. Note that this kernel takes full steps only, which is exact here because T = 1 year gives a whole number of steps.

In [3]:
@cuda.jit
def numba_gpu_barrier_option(d_s, K, B, S0, sigma, mu, r, d_normals, N_STEPS, N_PATHS):
    # ii - overall thread index
    ii = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    stride = cuda.gridDim.x * cuda.blockDim.x
    tmp1 = mu/Y_STEPS
    tmp2 = math.exp(-r*T)
    tmp3 = math.sqrt(1.0/Y_STEPS)
    for i in range(ii, N_PATHS, stride):
        # reset per path: a thread may process several paths in this grid-stride loop
        running_average = 0.0
        s_curr = S0
        for n in range(N_STEPS):             
            s_curr += tmp1 * s_curr + sigma*s_curr*tmp3*d_normals[i + n * N_PATHS]
            running_average += (s_curr - running_average) / (n + 1.0)

            if running_average <= B:
                break
        payoff = running_average - K if running_average>K else 0
        d_s[i] = tmp2 * payoff

Run the simulation and benchmark the computation time:

In [4]:
number_of_threads = 256
number_of_blocks = (N_PATHS-1) // number_of_threads + 1
output = cupy.zeros(N_PATHS, dtype=cupy.float32)
numba_gpu_barrier_option[(number_of_blocks,), (number_of_threads,)](output, np.float32(K), 
                    np.float32(B), np.float32(S0), 
                    np.float32(sigma), np.float32(mu), 
                    np.float32(r), randoms_gpu, N_STEPS, N_PATHS)
s = time.time()
numba_gpu_barrier_option[(number_of_blocks,), (number_of_threads,)](output, np.float32(K), 
                    np.float32(B), np.float32(S0), 
                    np.float32(sigma), np.float32(mu), 
                    np.float32(r), randoms_gpu, N_STEPS, N_PATHS)
v = output.mean()
cuda.synchronize()
e = time.time()
print('time', e-s, 'v', v)

time 0.5035278797149658 v 18.700891


Automatic adjoint differentiation (AAD) can be applied in the Monte Carlo simulation to calculate the Greeks accurately and efficiently, as described in the [paper](https://arxiv.org/pdf/2005.02347.pdf). We first need to derive the formulas for the pathwise differentials.

The option parameters are $K$, $S_0$, $\mu$, $\sigma$, $r$. For simplicity, we define $\theta=(K, S_0, \mu, \sigma, r)$.

The option price is computed by $$ p = E(f_i(\theta)) = \frac{1}{N}\sum_i f_i$$, where $f_i$ is the option value at the exercise time for the $i^{th}$ path. The Greeks are the first-order derivatives with respect to $\theta$: 

$$\nabla_{\theta} p = \frac{1}{N}\sum_i \nabla_{\theta} f_i $$

Let's focus on the gradient of $f_i(\theta)$. Since $f_i$ is computed by Monte Carlo simulation, we break the computation down into individual time steps. Without loss of generality, we drop the path index $i$ here.

$$    \nabla_{\theta} f = 
\begin{cases}
    \nabla_{\theta} (a_n(\theta) - K)  & \text{if } a_n\geq K\\
    (0,0,0,0,0)              & \text{otherwise}
\end{cases}
$$

where the moving average $a_n$ at step $n$ is

$$a_n = g(a_{n-1}, s_{n}) = a_{n-1} + \frac{s_{n} - a_{n-1}}{n + 1.0}$$

The gradient of $a_n$:
$$ \nabla_{\theta} a_n = \frac{\partial g} {\partial a_{n-1}} \nabla_{\theta} a_{n-1} + \frac{\partial g} {\partial s_{n}} \nabla_{\theta} s_{n} = \frac{n}{n+1} \nabla_{\theta} a_{n-1} + \frac{1}{n+1} \nabla_{\theta} s_{n} $$

The stock price $s_n$ at step $n$ is:
$$s_n = s(s_{n-1}, \theta) = s_{n-1} + \frac{\mu}{Y} s_{n-1} + \sigma  \sqrt{\frac{1}{Y}} n_n s_{n-1}$$


The gradient of $s_n$:
$$\nabla_{\theta} s_n = \nabla_{\theta} s(s_{n-1}, \theta) = (1 + \frac{\mu}{Y} + \sigma \sqrt{\frac{1}{Y}} n_n) \nabla_{\theta} s_{n-1} + \nabla_{\theta} (1 + \frac{\mu}{Y} + \sigma \sqrt{\frac{1}{Y}} n_n) s_{n-1} $$
where the gradient in the second term is:
$$\nabla_{\theta} (1 + \frac{\mu}{Y} + \sigma \sqrt{\frac{1}{Y}} n_n) = (0, 0, 1/Y, \sqrt{1/Y} n_n ,0) $$



The initial condition is $$\nabla_{\theta} s_0 = (0,1,0,0,0)$$
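Before moving to the GPU, the recursions above can be sanity-checked on the CPU. The following is a minimal pure-Python sketch for a single path, assuming $T = 1$ and ignoring the barrier; the helper names `path_value_and_grad` and `finite_difference` are hypothetical. It compares the pathwise gradient with central finite differences, with the parameters ordered like the gradient components above: strike, spot, drift, volatility, rate.

```python
import math
import random

def path_value_and_grad(normals, K, S0, mu, sigma, r, y_steps=365):
    """Discounted payoff of one path and its gradient w.r.t. (K, S0, mu, sigma, r)."""
    sqrt_dt = math.sqrt(1.0 / y_steps)
    d_s = [0.0, 1.0, 0.0, 0.0, 0.0]  # gradient of s_n, starting from the gradient of s_0
    d_a = [0.0] * 5                  # gradient of the running average a_n
    s, a = S0, 0.0
    for n, z in enumerate(normals):
        factor = 1.0 + mu / y_steps + sigma * sqrt_dt * z
        d_s = [g * factor for g in d_s]  # first term of the s_n gradient
        d_s[2] += s / y_steps            # second term, mu component
        d_s[3] += sqrt_dt * z * s        # second term, sigma component
        d_a = [ga * n / (n + 1.0) + gs / (n + 1.0) for ga, gs in zip(d_a, d_s)]
        s *= factor
        a += (s - a) / (n + 1.0)
    disc = math.exp(-r)  # discount factor for T = 1
    if a <= K:
        return 0.0, [0.0] * 5
    payoff = a - K
    grad = [disc * g for g in d_a]
    grad[0] = -disc           # derivative of (a_n - K) with respect to K
    grad[4] = -disc * payoff  # derivative of the discount factor with respect to r
    return disc * payoff, grad

def finite_difference(normals, theta, k, eps=1e-5):
    """Central finite difference of the path value in the k-th parameter."""
    up, down = list(theta), list(theta)
    up[k] += eps
    down[k] -= eps
    return (path_value_and_grad(normals, *up)[0]
            - path_value_and_grad(normals, *down)[0]) / (2 * eps)

random.seed(0)
zs = [random.gauss(0.0, 1.0) for _ in range(365)]
theta = [50.0, 120.0, 0.1, 0.35, 0.05]  # low strike so the path very likely ends in the money
value, grad = path_value_and_grad(zs, *theta)
for k, name in enumerate(["K", "S0", "mu", "sigma", "r"]):
    print(name, grad[k], finite_difference(zs, theta, k))
```

The two columns should agree closely for every parameter, which gives some confidence in the recursion before it is ported to a kernel.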

Let's convert these equations into code in the Numba kernel:

In [5]:
@cuda.jit
def numba_gpu_barrier_option(d_s, doutput, K, B, S0, sigma, mu, r, d_normals, N_STEPS, N_PATHS):
    # ii - overall thread index
    
    ii = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    stride = cuda.gridDim.x * cuda.blockDim.x
    tmp1 = mu/Y_STEPS
    tmp2 = math.exp(-r)
    tmp3 = math.sqrt(1.0/Y_STEPS)
    d_theta = numba.cuda.local.array(5, numba.float64)
    d_a = numba.cuda.local.array(5, numba.float64)
    for i in range(ii, N_PATHS, stride):
        # reset per path: a thread may process several paths in this grid-stride loop
        running_average = 0.0
        d_theta[0] = 0 # K
        d_theta[1] = 1 # S_0
        d_theta[2] = 0 # mu
        d_theta[3] = 0 # sigma
        d_theta[4] = 0 # r
        for k in range(5):
            d_a[k] = 0
        s_curr = S0
        for n in range(N_STEPS):
            
            ## start to compute the gradient
            factor = (1.0+tmp1+sigma*tmp3*d_normals[i + n * N_PATHS])
            for k in range(5):
                 d_theta[k] *= factor

            d_theta[2] += 1.0/Y_STEPS * s_curr
            d_theta[3] += tmp3 * d_normals[i + n * N_PATHS] * s_curr
            for k in range(5):
                d_a[k] = d_a[k]*n/(n+1.0) + d_theta[k]/(n+1.0)
            ## start to compute current stock price and moving average
              
            s_curr += tmp1 * s_curr + sigma*s_curr*tmp3*d_normals[i + n * N_PATHS]
            running_average += (s_curr - running_average) / (n + 1.0)
            # print(running_average, n, tmp1 * s_curr, sigma,s_curr, tmp3,d_normals[i + n * N_PATHS])
            if running_average <= B:
                break
        payoff = running_average - K if running_average>K else 0
        d_s[i] = tmp2 * payoff
        # gradient for strike
        if running_average > K:
            d_a[0] = -1
            # adjust gradient for discount factor
            for k in range(5):
                d_a[k] *= tmp2
            d_a[4] += payoff * tmp2* -1.0
        else:
            for k in range(5):
                d_a[k] = 0
        for k in range(5):
            doutput[k*N_PATHS+i] = d_a[k]

Run the simulation and benchmark the computation time:

In [6]:
number_of_threads = 256
number_of_blocks = (N_PATHS-1) // number_of_threads + 1
output = cupy.zeros(N_PATHS, dtype=cupy.float32)
numba_gpu_barrier_option[(number_of_blocks,), (number_of_threads,)](output, doutput, np.float32(K), 
                    np.float32(B), np.float32(S0), 
                    np.float32(sigma), np.float32(mu), 
                    np.float32(r), randoms_gpu, N_STEPS, N_PATHS)
s = time.time()
numba_gpu_barrier_option[(number_of_blocks,), (number_of_threads,)](output, doutput, np.float32(K), 
                    np.float32(B), np.float32(S0), 
                    np.float32(sigma), np.float32(mu), 
                    np.float32(r), randoms_gpu, N_STEPS, N_PATHS)
v = output.mean()
cuda.synchronize()
e = time.time()
print('time', e-s, 'v', v)
greeks = doutput.reshape(5, N_PATHS).mean(axis=1)
print('greeks', greeks)

time 1.3155066967010498 v 18.700891
greeks [ -0.6714548    0.77134085  48.02113     20.457764   -18.700884  ]


As shown in the [developer blog](https://developer.nvidia.com/blog/accelerating-python-for-exotic-option-pricing/), the CuPy implementation is faster because it compiles native CUDA code. The following is the same GPU kernel implemented in CuPy. It can handle batches of simulations simultaneously on the GPU. 

In [7]:
import cupy
cupy_batched_barrier_option = cupy.RawKernel(r'''
extern "C" __global__ void batched_barrier_option(
    float *d_s,
    float *d_d,
    const float K,
    const float B,
    const float S0,
    const float sigma,
    const float mu,
    const float r,
    const float * d_normals,
    const long N_STEPS,
    const long Y_STEPS,
    const long N_PATHS)
{
  unsigned idx =  threadIdx.x + blockIdx.x * blockDim.x;
  unsigned stride = blockDim.x * gridDim.x;
  unsigned tid = threadIdx.x;
  double d_theta[5];
  double d_a[5];

  for (unsigned i = idx; i<N_PATHS; i+=stride)
  {
    d_theta[0] = 0; // K
    d_theta[1] = 1.0; // S_0
    d_theta[2] = 0; // mu
    d_theta[3] = 0; // sigma
    d_theta[4] = 0; // r
    for (unsigned k = 0; k < 5; k++){
      d_a[k] = 0.0;
    }
    
    int path_id = i;
    float s_curr = S0;
    float tmp1 = mu/Y_STEPS;
    float tmp2 = exp(-r);
    float tmp3 = sqrt(1.0/Y_STEPS);
    unsigned n=0;
    double running_average = 0.0;
    for(unsigned n = 0; n < N_STEPS; n++){

        float normal = d_normals[path_id + n * N_PATHS];
                
        // start to compute the gradient
        float factor = (1.0+tmp1+sigma*tmp3*normal);
        for (unsigned k=0; k < 5; k++) {
            d_theta[k] *= factor;
        }
        

        d_theta[2] += 1.0/Y_STEPS * s_curr;
        d_theta[3] += tmp3 * normal * s_curr;

        for (unsigned k = 0; k < 5; k++) {
                d_a[k] = d_a[k]*n/(n+1.0) + d_theta[k]/(n+1.0); 
        }
        
        
        // start to compute current stock price and moving average       
       
       s_curr += tmp1 * s_curr + sigma*s_curr*tmp3*normal;
       running_average += (s_curr - running_average) / (n + 1.0);
       if (running_average <= B){
           break;
       }
    }

    float payoff = (running_average>K ? running_average-K : 0.f); 
    d_s[i] = tmp2 * payoff;
    
    // gradient for strike
    if (running_average > K){
       d_a[0] = -1.0;
       // adjust gradient for discount factor
       for (unsigned k = 0; k < 5; k++) {
            d_a[k] *= tmp2;
        }
        d_a[4] += - payoff * tmp2;
        
    }
    else {
        for (unsigned k = 0; k < 5; k++) {
           d_a[k] = 0.0;
        }

    }
    
    for (unsigned k = 0; k < 5; k++) {
       d_d[k*N_PATHS+i] = d_a[k];
    }
  }
}

''', 'batched_barrier_option')

Wrap the kernel launch in a driver function that calls this CuPy GPU kernel:

In [8]:
import time
Y_STEPS = 365
N_BATCH = 2
N_PATHS = 10240
K = 110.0 # Strike price
B = 100.0 # barrier price
S0 = 120.0 # initial stock price 
sigma = 0.35 # stock annual volatility 
mu = 0.1 # stock annual return
r = 0.05 # annual risk-free interest rate
N_STEPS = Y_STEPS
print(N_STEPS)

def batch_run(seed=None):
    number_of_threads = 256
    number_of_blocks = (N_PATHS - 1) // number_of_threads + 1
    random_elements = int(N_STEPS*N_PATHS)
    if seed is not None:
        cupy.random.seed(seed)
    randoms_gpu = cupy.random.normal(0, 1, random_elements, dtype=cupy.float32)
    output = cupy.zeros(N_PATHS, dtype=cupy.float32)
    d_output = cupy.zeros(N_PATHS*5, dtype=cupy.float32)
    cupy.cuda.stream.get_current_stream().synchronize()
    s = time.time() 
    cupy_batched_barrier_option((number_of_blocks,), (number_of_threads,),
                       (output, d_output, np.float32(K), np.float32(B), np.float32(S0), np.float32(sigma), np.float32(mu), np.float32(r),
                        randoms_gpu, N_STEPS, Y_STEPS, N_PATHS))
    v = output.mean()
    b = d_output.reshape(5, N_PATHS).mean(axis=1)
    cupy.cuda.stream.get_current_stream().synchronize()
    e = time.time()
    print('time', e-s, 'v',v)
    print(b.shape)
    print('gradient', b)
    return output
o = batch_run()

365
time 0.004988431930541992 v 18.862803
(5,)
gradient [ -0.6728276    0.77394855  48.233276    20.973333   -18.862803  ]


In [11]:
def compute(K, B, S0, sigma, mu, r, N_PATHS=102400):
    Y_STEPS = 365
    N_STEPS = 365
    N_BATCH = 1


    def batch_run(seed=3):
        number_of_threads = 256
        number_of_blocks = (N_PATHS - 1) // number_of_threads + 1
        random_elements = int(N_STEPS*N_PATHS)
        cupy.random.seed(seed)
        randoms_gpu = cupy.random.normal(0, 1, random_elements, dtype=cupy.float32)
        output = cupy.zeros(N_PATHS, dtype=cupy.float32)
        d_output = cupy.zeros(N_PATHS*5, dtype=cupy.float32)
        cupy.cuda.stream.get_current_stream().synchronize()
        cupy_batched_barrier_option((number_of_blocks,), (number_of_threads,),
                       (output, d_output, np.float32(K), np.float32(B), np.float32(S0), np.float32(sigma), np.float32(mu), np.float32(r),
                        randoms_gpu, N_STEPS, Y_STEPS, N_PATHS))
        v = output.mean()
        b = d_output.reshape(5, N_PATHS).mean(axis=1)
        cupy.cuda.stream.get_current_stream().synchronize()
        return v, b
    return batch_run()
price, greeks = compute(110., 100., 120., 0.35, 0.1, 0.05, N_PATHS=10240000)
print('price', price)
print('greek', greeks)
def num_greek(T, K, B, S0, sigma, mu, r):
    delta = 1e-1
    v0, _ = compute(K-delta, B, S0, sigma, mu, r, N_PATHS=10240000)
    v1, _ = compute(K+delta, B, S0, sigma, mu, r, N_PATHS=10240000)
    print('dK', (v1- v0)/(2*delta))
    delta = 1e-1
    v0, _ = compute(K, B, S0-delta, sigma, mu, r, N_PATHS=10240000)
    v1, _ = compute(K, B, S0+delta, sigma, mu, r, N_PATHS=10240000)
    print('dS',(v1- v0)/(2*delta))  
    delta = 1e-2
    v0, _ = compute(K, B, S0, sigma-delta, mu, r, N_PATHS=10240000)
    v1, _ = compute(K, B, S0, sigma+delta, mu, r, N_PATHS=10240000)
    print('dSigma', (v1- v0)/(2*delta))
    delta = 1e-2
    v0, _ = compute(K, B, S0, sigma, mu-delta, r, N_PATHS=10240000)
    v1, _ = compute(K, B, S0, sigma, mu+delta, r, N_PATHS=10240000)
    print('dMu', (v1- v0)/(2*delta))
    delta = 1e-2
    v0, _ = compute(K, B, S0, sigma, mu, r-delta, N_PATHS=10240000)
    v1, _ = compute(K, B, S0, sigma, mu, r+delta, N_PATHS=10240000)
    print('dR', (v1- v0)/(2*delta))
    
num_greek(1.1, 110., 100., 120., 0.35, 0.1, 0.05)

110.0 100.0 120.0 0.35 0.1 0.05
price 18.69982
greek [ -0.67117995   0.77111435  48.007164    20.468712   -18.69982   ]
109.9 100.0 120.0 0.35 0.1 0.05
110.1 100.0 120.0 0.35 0.1 0.05
dK -0.6712055
110.0 100.0 119.9 0.35 0.1 0.05
110.0 100.0 120.1 0.35 0.1 0.05
dS 0.7872772
110.0 100.0 120.0 0.33999999999999997 0.1 0.05
110.0 100.0 120.0 0.36 0.1 0.05
dSigma 19.336033
110.0 100.0 120.0 0.35 0.09000000000000001 0.05
110.0 100.0 120.0 0.35 0.11 0.05
dMu 48.282337
110.0 100.0 120.0 0.35 0.1 0.04
110.0 100.0 120.0 0.35 0.1 0.060000000000000005
dR -18.699932


The simulation generates both the option price and the Greeks. gQuant organizes all the computation steps into loosely coupled computation nodes.

We create one node that is used to generate random Option parameters.

In [10]:
from gquant.dataframe_flow import TaskGraph
taskgraph = TaskGraph.load_taskgraph('./option_parameter.gq.yaml')
taskgraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'parameters'), ('type', 'ParaNode'), ('conf', {'seed': 23}…

The parameter node returns an iterable object that emits one set of parameters at a time. Let's evaluate it and get 3 parameter samples:

In [11]:
para_iter = taskgraph.run()[0]
print(next(para_iter))
print(next(para_iter))
print(next(para_iter))

[1.0447818e+01 1.0731429e+01 7.5738693e+01 1.3961923e-01 2.1572977e-01
 5.2736355e-03]
[1.0187787e+01 1.0532363e+01 1.6310895e+02 2.6019182e-02 2.1567108e-01
 1.2974015e-01]
[1.0359708e+01 1.0886375e+01 6.4217033e+01 1.7851767e-01 2.5280696e-01
 1.3791301e-02]
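The pattern of a node that hands out one parameter set at a time can be illustrated with a plain Python generator. This is not the gQuant node's implementation: the name `parameter_stream`, the sampling ranges, and the ordering of the values are all made up for the sketch.

```python
import random

def parameter_stream(seed=None):
    """Endless generator of random option parameters (K, B, S0, sigma, mu, r)."""
    rng = random.Random(seed)
    while True:
        S0 = rng.uniform(50.0, 200.0)
        K = rng.uniform(0.8, 1.2) * S0   # strike near the spot (assumed range)
        B = rng.uniform(0.5, 0.95) * K   # barrier below the strike (assumed range)
        sigma = rng.uniform(0.05, 0.5)
        mu = rng.uniform(0.0, 0.2)
        r = rng.uniform(0.0, 0.1)
        yield (K, B, S0, sigma, mu, r)

para_iter_sketch = parameter_stream(seed=23)
print(next(para_iter_sketch))
print(next(para_iter_sketch))
```

Because the generator is seeded, two streams created with the same seed emit the same sequence, which keeps the generated training data reproducible.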


The parameter node feeds the parameter iterator to the simulation node, which computes the option price and Greeks:

In [12]:
from gquant.dataframe_flow import TaskGraph
taskgraph = TaskGraph.load_taskgraph('./option_simulation.gq.yaml')
taskgraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'parameters'), ('type', 'ParaNode'), ('conf', {'seed': 23}…

The simulation node returns an iterable object that emits the parameters along with the corresponding option price and Greeks. Let's evaluate it and run 3 simulations:

In [15]:
sim_iter = taskgraph.run()[0]
print(next(sim_iter))
print(next(sim_iter))
print(next(sim_iter))

(array([1.1e+02, 1.0e+02, 1.2e+02, 1.0e-01, 3.5e-01, 5.0e-02],
      dtype=float32), array([ 26.979107 ,  -0.9511752,   1.0175067,  48.32628  ,   9.225855 ,
       -26.979107 ], dtype=float32))
(array([1.1e+02, 1.0e+02, 1.2e+02, 1.0e-01, 3.5e-01, 5.0e-02],
      dtype=float32), array([ 26.961382 ,  -0.9511752,   1.0173591,  48.306942 ,   9.172027 ,
       -26.961382 ], dtype=float32))
(array([1.1e+02, 1.0e+02, 1.2e+02, 1.0e-01, 3.5e-01, 5.0e-02],
      dtype=float32), array([ 26.976645 ,  -0.9511752,   1.0174862,  48.325832 ,   9.218872 ,
       -26.976645 ], dtype=float32))


In [13]:
from gquant.dataframe_flow import TaskGraph
taskgraph = TaskGraph.load_taskgraph('./verify.gq.yaml')
sim_iter = taskgraph.run()[0]
print(next(sim_iter))
print(next(sim_iter))
print(next(sim_iter))

110.0 100.0 120.0 0.3499999940395355 0.10000000149011612 0.05000000074505806
(array([1.0e+02, 1.1e+02, 1.2e+02, 1.0e-01, 3.5e-01, 5.0e-02],
      dtype=float32), array([ 18.705942 ,  -0.6713809,   0.7713495,  48.02244  ,  20.479053 ,
       -18.705942 ], dtype=float32))
110.0 100.0 120.0 0.3499999940395355 0.10000000149011612 0.05000000074505806
(array([1.0e+02, 1.1e+02, 1.2e+02, 1.0e-01, 3.5e-01, 5.0e-02],
      dtype=float32), array([ 18.706078  ,  -0.67118824,   0.771174  ,  48.011593  ,
        20.490744  , -18.706078  ], dtype=float32))
110.0 100.0 120.0 0.3499999940395355 0.10000000149011612 0.05000000074505806
(array([1.0e+02, 1.1e+02, 1.2e+02, 1.0e-01, 3.5e-01, 5.0e-02],
      dtype=float32), array([ 18.710524  ,  -0.67143583,   0.77143806,  48.026913  ,
        20.486582  , -18.710524  ], dtype=float32))


The simulation computes both the option price and the Greeks. The Greeks can be used to add differential regularization to the cost function. The simulation is very costly even with GPUs. The accuracy of the option price depends on the number of paths, because the standard deviation of the estimate scales with the number of paths $n$ as $\frac{1}{\sqrt{n}}$. To speed up the option pricing and Greek computation, we can use a neural network to approximate the option pricing simulation. 
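The $\frac{1}{\sqrt{n}}$ scaling can be seen on a toy problem. The sketch below is not the option simulation: it estimates the mean of the stand-in payoff $\max(Z, 0)$ with $Z \sim N(0, 1)$, and `mc_estimate` is a hypothetical helper. Quadrupling the number of paths should roughly halve the standard error.

```python
import math
import random

def mc_estimate(n_paths, seed=0):
    """Monte Carlo mean of the toy payoff max(Z, 0) and its standard error."""
    rng = random.Random(seed)
    payoffs = [max(rng.gauss(0.0, 1.0), 0.0) for _ in range(n_paths)]
    mean = sum(payoffs) / n_paths
    var = sum((p - mean) ** 2 for p in payoffs) / (n_paths - 1)
    return mean, math.sqrt(var / n_paths)

for n in (1_000, 4_000, 16_000):
    mean, stderr = mc_estimate(n)
    print(n, round(mean, 4), round(stderr, 5))
```

This is why tight price estimates need millions of paths, and why a fast approximation of the pricing function is attractive.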

We have seen that the simulation can generate any number of data points in CuPy GPU arrays. It is easy to convert a CuPy GPU array into a PyTorch tensor via DLPack.

In [21]:

next(taskgraph.run()[0])

(array([[5.0248528e+01, 1.4897156e+00, 7.5738693e+01, 1.4951923e+02,
         1.0791489e-01, 1.0447271e-02, 7.1361013e-02]], dtype=float32),
 array([[  77.81717   ,   -5.5078187 ,   -0.8991493 ,    0.975911  ,
          112.239784  ,   -0.15982881, -115.925446  ]], dtype=float32))

In [46]:

def collector_data(seed):
    taskgraph = TaskGraph.load_taskgraph('./option_simulation.gq.yaml')
    iterator = taskgraph.run(replace={"parameters": {
        "conf": {
            "seed": seed
        }
    }})[0]
    number = 102400
    block = 10
    for bid in range(block):
        paras = []
        targets = []
        for i in range(number):
            sim_result = next(iterator)
            para = sim_result[0]
            target = sim_result[1]
            paras.append(para)
            targets.append(target)
        cupy.save('para_seed{}_block{}'.format(seed, bid), cupy.concatenate(paras))
        cupy.save('taget_seed{}_block{}'.format(seed, bid), cupy.concatenate(targets))


In [64]:
taskgraph = TaskGraph.load_taskgraph('./option_simulation.gq.yaml')
iterator = taskgraph.run(replace={"parameters": {
    "conf": {
        "seed": 12
    }
},
"sim": {
    "conf":{
       "N_PATHS": 1024000,
       "Y_STEPS": 252
    }
}})[0]
print(next(iterator))
iterator = taskgraph.run(replace={"parameters": {
    "conf": {
        "seed": 12
    }
},
"sim":{
    "conf":{
       "N_PATHS": 10240000,
       "Y_STEPS": 252
    }
}})[0]
print(next(iterator))

(array([[4.1671780e+01, 1.9071181e+00, 1.0493592e+02, 1.9296080e+02,
        1.8465599e-01, 1.7604582e-01, 1.5656979e-01]], dtype=float32), array([[ 9.3870270e+01, -1.4619228e+01, -7.4185354e-01,  8.8991117e-01,
         1.7414128e+02, -1.4608005e-01, -1.7902191e+02]], dtype=float32))


CUDARuntimeError: cudaErrorMemoryAllocation: out of memory

In [32]:
import torch
from torch.utils.dlpack import from_dlpack
X, Y = next(sim_iter)
X_t, Y_t = (from_dlpack(X[0].toDlpack()), from_dlpack(Y[0].toDlpack()))
print(X_t)
print(Y_t)

tensor([2.1169e+01, 1.4763e+00, 3.5437e+01, 1.8875e+02, 7.2569e-02, 1.7584e-01,
        1.9153e-01], device='cuda:0')
tensor([ 123.4491,  -23.6133,   -0.7537,    0.7955,  113.5782,   -0.3578,
        -182.2439], device='cuda:0')


Now we are ready to move from HPC to the deep learning world. We create a gQuant node that takes an iterator of simulation results and converts it into a NeMo DataLayer. We use a basic feed-forward neural network to approximate the option prices, but with the `Elu` activation function, because we need higher-order derivatives. The popular `ReLu` activation function only has a non-zero first-order derivative. 
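The difference between the two activations can be checked numerically. The sketch below uses plain Python with central finite differences instead of PyTorch; `second_derivative` is a hypothetical helper, and ELU is written with the common default `alpha = 1`.

```python
import math

def relu(x):
    return max(x, 0.0)

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def second_derivative(f, x, h=1e-3):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

for x in (-1.0, 1.0):
    print(x, second_derivative(relu, x), second_derivative(elu, x))
```

ReLU has a vanishing second derivative away from the origin, so it gives the optimizer nothing to work with when the loss contains gradients of the network. ELU keeps a non-zero second derivative for negative inputs.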

PyTorch provides the `grad` function to compute the gradient of an output with respect to its inputs. For a batch of input data points, we would like to calculate all the input gradients in one step. The trick is to sum the batch of outputs together. Here is an example that computes the gradient of a batch of inputs for the function $f(x, y) = (xy)^2$. Its gradient is $$\nabla f(x,y) = (2 x y^2, 2 x^2 y)$$

In [13]:
import torch
from torch.autograd import grad
'''
z = (xy)^2
x = 3, y =2
first order deriv [24 36]
x = 4, y =5
first order deriv [200, 160]
'''
# create a batch of two inputs
inputs = torch.tensor([[3.0,2.0], [4.0, 5.0]], requires_grad=True)
# sum the outputs together
z = (inputs.prod(axis=1)**2).sum()
first_order_grad = grad(z, inputs, create_graph=True)
print(first_order_grad)

(tensor([[ 24.,  36.],
        [200., 160.]], grad_fn=<DivBackward0>),)


In the feed-forward network, we use this trick to compute the gradients for all the input data points. The loss function will be composed of two terms. One is the regression on the option price prediction, and the second one is the regularization term to make sure the gradients match.
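A framework-free sketch of such a two-term loss is shown below. It is an illustration only: the function name `differential_loss`, the weight `lam`, and the normalization of each term are assumptions, not the loss used by the gQuant training node, and a real implementation would operate on PyTorch tensors.

```python
def differential_loss(pred_price, pred_grad, true_price, true_grad, lam=1.0):
    """Mean squared price error plus lam times the mean squared gradient error."""
    n = len(pred_price)
    # regression term on the predicted option prices
    price_term = sum((p - t) ** 2 for p, t in zip(pred_price, true_price)) / n
    # regularization term: predicted input gradients should match the simulated Greeks
    grad_term = sum(
        (gp - gt) ** 2
        for row_p, row_t in zip(pred_grad, true_grad)
        for gp, gt in zip(row_p, row_t)
    ) / (n * len(true_grad[0]))
    return price_term + lam * grad_term

# Two samples with two Greeks each (made-up numbers)
pred_price, true_price = [18.0, 27.0], [19.0, 27.0]
pred_grad = [[0.5, 1.0], [0.0, 2.0]]
true_grad = [[0.5, 0.0], [0.0, 2.0]]
print(differential_loss(pred_price, pred_grad, true_price, true_grad, lam=1.0))
```

Setting `lam = 0` recovers plain price regression, so the weight controls how strongly the Greeks constrain the fit.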

Load the gQuant task graph to train it.

In [17]:
import numba
from numba import cuda
import nemo
from gquant.dataframe_flow import TaskGraph

nemo.core.NeuralModuleFactory()
taskgraph = TaskGraph.load_taskgraph('../taskgraphs/option_price_example/option_price_nemo.gq.yaml')
taskgraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'parameters'), ('type', 'ParaNode'), ('conf', {'seed': Non…

Click on the `run` button to see the training in action. Click on the `show log` button to see the full logs of the training loss decreasing.

Since we have all the gQuant nodes handy for the simulation and deep learning, we can reuse them to construct the evaluation network. The evaluation network reuses the neural network from training to evaluate the predictions on a separate dataset. In the train node, we can monitor the evaluation performance to determine whether the model generalizes well. The following is the full network:

In [19]:
taskgraph = TaskGraph.load_taskgraph('../taskgraphs/option_price_example/full_training_model.gq.yaml')
taskgraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'parameters'), ('type', 'ParaNode'), ('conf', {'seed': Non…

Once the model is trained, we can start to run the inference on the trained model. 

In [21]:
taskgraph = TaskGraph.load_taskgraph('../taskgraphs/option_price_example/option_price_inference.gq.yaml')
taskgraph.draw()

GQuantWidget(sub=HBox(), value=[OrderedDict([('id', 'parameters'), ('type', 'ParaNode'), ('conf', {'seed': Non…