# Using SyGNet

This notebook demonstrates the basic functionality of the **sygnet** package in Python.

To install the package, run `pip install sygnet` at the command line.

## Prerequisites

First, we will focus on a very simple case: learning a parametric relationship between numeric variables. To start, we load the required packages and define a data generating process (DGP). The DGP comprises two uniform random variables (`x1` and `x2`), which in turn determine the means of two normally distributed variables (`x3` and `y`). Finally, we take 100,000 draws from this DGP to use as our training data:

In [9]:
%%capture
%cd ..

import pandas as pd
import numpy as np
from numpy.random import default_rng

# NB: once installed via pip, can run `from sygnet import SygnetModel`
from src.sygnet.sygnet_interface import SygnetModel
rng = default_rng()

def gen_sim_data(rng, n=100000):
    
    x1 = rng.uniform(low = 0, high = 1, size = n)
    x2 = rng.uniform(low = 0, high = 1, size = n)
    x3 = rng.normal(loc = x1 + x2, scale = 0.1)
    y = rng.normal(loc=3*x1 + 2*x2 + 1, scale = 1)

    sim_data = pd.DataFrame({
        'x1' : x1,
        'x2' : x2,
        'x3' : x3,
        'y' : y
    })   

    return sim_data

sim_data = gen_sim_data(rng)
sim_data.head()
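
As a quick sanity check on the DGP, the sample means can be compared with their theoretical values. Since `x1` and `x2` are uniform on [0, 1], each has mean 0.5, so `x3` should average about 1.0 and `y` about 3(0.5) + 2(0.5) + 1 = 3.5. The sketch below re-creates the same draws with a fixed seed; the seed value and the names `rng_check`, `expected_means` and `sample_means` are illustrative assumptions, not part of the original notebook:

```python
import numpy as np

# Fixed seed so the check is reproducible (the seed value is arbitrary)
rng_check = np.random.default_rng(42)
n = 100_000

# Same DGP as gen_sim_data above
x1 = rng_check.uniform(low=0, high=1, size=n)
x2 = rng_check.uniform(low=0, high=1, size=n)
x3 = rng_check.normal(loc=x1 + x2, scale=0.1)
y = rng_check.normal(loc=3 * x1 + 2 * x2 + 1, scale=1)

# Theoretical means: E[x1] = E[x2] = 0.5, E[x3] = 1.0, E[y] = 3.5
expected_means = {"x1": 0.5, "x2": 0.5, "x3": 1.0, "y": 3.5}
sample_means = {"x1": x1.mean(), "x2": x2.mean(), "x3": x3.mean(), "y": y.mean()}

for name, expected in expected_means.items():
    print(f"{name}: sample mean = {sample_means[name]:.3f}, expected = {expected}")
```

With 100,000 draws the sample means should sit very close to the theoretical ones, which gives a baseline to judge synthetic data against later.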


## **sygnet** pipeline

We follow a very similar pipeline to scikit-learn:

1. Instantiate a model:
   * Users must specify what type of GAN architecture to use: we recommend "wgan" for non-conditional synthetic data, and "cgan" when conditional labels will be supplied 
   * Optional arguments allow the user to customise the hidden layer structure, dropout proportions, layer norming, ReLU leakage, and whether to range match the final output
2. Fit the model to the training data
   * Users must supply the training data
   * Optional arguments allow the user to alter the default hyperparameters (epochs, learning rate, batch size etc.)
3. Sample from the trained model
   * Users must specify the number of synthetic observations to draw from the model
   * Optional arguments allow the user to control the format of the returned results, as well as to save the synthetic data to disk



## Basic example

In this first example, we set `mode = "wgan"` to use the Wasserstein GAN architecture. We fit the model to our simulated data for a single epoch (real uses will require more epochs), and then generate 100 synthetic observations:

In [2]:
model = SygnetModel(mode = "wgan")
model.fit(data = sim_data, epochs = 1)
synth_data = model.sample(nobs = 100)

synth_data.head()

Epoch: 100%|██████████| 1/1 [00:07<00:00,  7.63s/it]


|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.864504 | 1.126964 | 0.0 | 0.0 |
| 1 | 1.663468 | 1.92051 | 0.072198 | 0.0 |
| 2 | 1.132081 | 1.183887 | 0.0 | 0.0 |
| 3 | 0.504572 | 1.792356 | 0.0 | 0.0 |
| 4 | 0.291027 | 1.290172 | 0.349143 | 0.0 |
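
After a single epoch, the synthetic values are still far from the training distribution. One rough way to track this is to compare column means between the real and synthetic data. The helper below is a hypothetical sketch: `compare_means` is not part of **sygnet**. It aligns columns by position, on the assumption that the synthetic columns come back in the same order as the training columns, and it is demonstrated on stand-in data rather than on real model output:

```python
import numpy as np
import pandas as pd

def compare_means(real, synth):
    """Return per-column means of real and synthetic data, aligned by position."""
    if real.shape[1] != synth.shape[1]:
        raise ValueError("real and synth must have the same number of columns")
    summary = pd.DataFrame(
        {
            "real_mean": real.mean().to_numpy(),
            "synth_mean": synth.mean().to_numpy(),
        },
        index=real.columns,  # label rows with the training column names
    )
    summary["abs_diff"] = (summary["real_mean"] - summary["synth_mean"]).abs()
    return summary

# Stand-in data (NOT sygnet output): two "real" columns and a noisy imitation
demo_rng = np.random.default_rng(0)
real_demo = pd.DataFrame({
    "x1": demo_rng.uniform(size=1000),
    "y": demo_rng.normal(loc=3.5, size=1000),
})
# Integer column labels mimic the unnamed columns in the output above
synth_demo = pd.DataFrame(demo_rng.normal(loc=[0.5, 3.5], scale=1.0, size=(1000, 2)))

print(compare_means(real_demo, synth_demo))
```

In the notebook itself, the equivalent call would be `compare_means(sim_data, synth_data)`. Matching means are only a first check; they say nothing about variances or the relationships between columns.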


### GPU support

**sygnet** allows users to train the model using GPU computation, which should improve training times considerably. To run the synthetic generator on the GPU, we simply fit the model with the parameter `device = 'cuda'`. Here, 10 epochs on the GPU take about 9 seconds, roughly as long as the single CPU epoch above (about 8 seconds):

In [5]:
model_gpu = SygnetModel(mode = "wgan")
model_gpu.fit(data = sim_data, epochs = 10, device='cuda')
synth_data = model_gpu.sample(nobs = 100)

synth_data.head()

Epoch: 100%|██████████| 10/10 [00:09<00:00,  1.09it/s]


|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 2.536711 | 1.042354 | 0.0 | 1.452901 |
| 1 | 3.452708 | 0.527383 | 0.0 | 1.522696 |
| 2 | 2.792605 | 0.277383 | 0.0 | 0.801463 |
| 3 | 3.185599 | 1.000851 | 0.0 | 1.137364 |
| 4 | 3.499744 | 0.545892 | 0.0 | 0.904989 |
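
Passing `device = 'cuda'` presumably fails on a machine without a CUDA-capable GPU, so it can help to choose the device at run time. The sketch below assumes that **sygnet** runs on PyTorch (suggested by the `'cuda'` device string) and that `fit()` also accepts `device = 'cpu'`; neither is confirmed by this notebook. `pick_device` is a hypothetical helper, not part of the package:

```python
def pick_device():
    """Return 'cuda' if PyTorch reports a usable GPU, otherwise 'cpu'."""
    try:
        import torch  # assumed backend; fall back to CPU if it is not installed
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Training would run on: {device}")

# Hypothetical usage, assuming fit() accepts device='cpu' as well as 'cuda':
# model_gpu.fit(data = sim_data, epochs = 10, device = device)
```

This keeps the same notebook runnable on machines with and without a GPU.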
