# Using SyGNet

This notebook demonstrates the basic functionality of the *sygnet* package in Python.

To install the package, run `pip install sygnet` at the command line.

## Prerequisites

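First, we focus on a very simple case: learning a parametric relationship between numeric variables.

To start, we define a data generating process in which `x1` and `x2` are uniform on [0, 1], `x3` is a noisy sum of `x1` and `x2`, and `y` is a noisy linear function of `x1` and `x2`: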

In [1]:
%%capture
%cd ..

import pandas as pd
import numpy as np
from numpy.random import default_rng

from src.sygnet.sygnet_interface import SygnetModel
rng = default_rng()

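def gen_sim_data(rng, n=100000):
    # Two independent uniform predictors on [0, 1]
    x1 = rng.uniform(low = 0, high = 1, size = n)
    x2 = rng.uniform(low = 0, high = 1, size = n)
    # x3 is the sum of x1 and x2 plus a small amount of Gaussian noise
    x3 = rng.normal(loc = x1 + x2, scale = 0.1)
    # y is linear in x1 and x2 with unit-variance Gaussian noise
    y = rng.normal(loc=3*x1 + 2*x2 + 1, scale = 1)
    # Column order is y, x1, x2, x3; the DataFrame columns are labelled 0 to 3
    sim_data = np.column_stack((y,x1,x2,x3)).astype(np.float32)
    sim_data = pd.DataFrame(sim_data)
    return sim_data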

sim_data = gen_sim_data(rng)
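
Before training, it can help to confirm that the simulated data carry the relationship we expect. The following is a minimal sketch, not part of **sygnet**: it repeats `gen_sim_data` so that it runs on its own, fixes the seed at 0 (an assumption, since the notebook leaves the generator unseeded), and uses a hypothetical helper, `ols_coefficients`, to regress `y` on `x1` and `x2` by least squares. The estimates should land close to the true values of 1, 3 and 2.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

def gen_sim_data(rng, n=100000):
    # Same data generating process as in the notebook cell above
    x1 = rng.uniform(low=0, high=1, size=n)
    x2 = rng.uniform(low=0, high=1, size=n)
    x3 = rng.normal(loc=x1 + x2, scale=0.1)
    y = rng.normal(loc=3*x1 + 2*x2 + 1, scale=1)
    return pd.DataFrame(np.column_stack((y, x1, x2, x3)).astype(np.float32))

def ols_coefficients(df):
    # Hypothetical helper: regress column 0 (y) on an intercept,
    # column 1 (x1) and column 2 (x2) using ordinary least squares.
    y = df[0].to_numpy(dtype=np.float64)
    X = np.column_stack((np.ones(len(df)), df[1], df[2])).astype(np.float64)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # order: intercept, coefficient on x1, coefficient on x2

check_rng = default_rng(0)  # fixed seed is an assumption, for repeatability only
intercept, b1, b2 = ols_coefficients(gen_sim_data(check_rng))
print(f"intercept={intercept:.2f}, x1={b1:.2f}, x2={b2:.2f}")
```

The same helper can later be applied to synthetic draws to see whether a trained model has picked up the linear relationship.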


## **sygnet** pipeline

We follow a pipeline very similar to that of scikit-learn:

1. Instantiate a model:
   * Users must specify what type of GAN architecture to use: we recommend "wgan" for non-conditional synthetic data, and "cgan" when conditional labels will be supplied 
   * Optional arguments allow the user to customise the hidden layer structure, dropout proportions, layer normalisation, ReLU leakage, and whether to range-match the final output
2. Fit the model to the training data
   * Users must supply the training data
   * Optional arguments allow the user to alter the default hyperparameters (epochs, learning rate, batch size etc.)
3. Sample from the trained model
   * Users must specify the number of synthetic observations to draw from the model
   * Optional arguments allow the user to control the format of the returned results, as well as to save the synthetic data to disk



## Basic example

In this first example, we set `mode = "wgan"` to use the Wasserstein GAN architecture. We fit the model to our simulated data for a single epoch (real applications will need more), and then generate 100 synthetic observations:

In [7]:
model = SygnetModel(mode = "wgan")
model.fit(data = sim_data, epochs = 1)
synth_data = model.sample(nobs = 100)

synth_data.head()

Epoch: 100%|██████████| 1/1 [00:07<00:00,  7.42s/it]


|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.908301 | 1.387488 | 1.448862 | 0.0 |
| 1 | 1.068194 | 1.387225 | 1.757479 | 0.0 |
| 2 | 0.85051 | 0.642854 | 1.224454 | 0.0 |
| 3 | 1.105018 | 0.795381 | 1.106114 | 0.0 |
| 4 | 1.028862 | 1.67149 | 1.553509 | 0.0 |
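
In the five rows displayed above, column 3 is 0.0 throughout. A quick way to inspect output like this is to compare per-column summaries of the real and synthetic data and to flag columns that take only one value. The sketch below is illustrative only: `compare_summaries` and `constant_columns` are hypothetical helpers rather than **sygnet** functions, and the demonstration reuses the five displayed rows instead of calling the model.

```python
import pandas as pd

def compare_summaries(real, synth):
    # Hypothetical helper: per-column mean and standard deviation,
    # real versus synthetic, for DataFrames with matching columns.
    return pd.DataFrame({
        "real_mean": real.mean(),
        "synth_mean": synth.mean(),
        "real_sd": real.std(),
        "synth_sd": synth.std(),
    })

def constant_columns(df):
    # Hypothetical helper: labels of columns holding a single distinct value.
    return [c for c in df.columns if df[c].nunique() <= 1]

# The five synthetic rows shown in the output above
synth_head = pd.DataFrame([
    [0.908301, 1.387488, 1.448862, 0.0],
    [1.068194, 1.387225, 1.757479, 0.0],
    [0.85051, 0.642854, 1.224454, 0.0],
    [1.105018, 0.795381, 1.106114, 0.0],
    [1.028862, 1.67149, 1.553509, 0.0],
])

print(constant_columns(synth_head))  # column 3 is constant in these rows
```

In the notebook itself, `compare_summaries(sim_data, synth_data)` would place the real and synthetic means and standard deviations side by side, which is usually more informative than reading the first few rows.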


### GPU support

**sygnet** can train the model on a GPU, which should reduce training time considerably. To do so, pass `device = 'cuda'` when fitting the model. Here, 50 epochs run in about 45 seconds:

In [6]:
model_gpu = SygnetModel(mode = "wgan")
model_gpu.fit(data = sim_data, epochs = 50, device='cuda')
synth_data = model_gpu.sample(nobs = 100)

synth_data.head()

Epoch: 100%|██████████| 50/50 [00:44<00:00,  1.12it/s]


|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 3.337108 | 0.397411 | 0.0 | 0.604935 |
| 1 | 3.601906 | 0.480042 | 0.0 | 0.662759 |
| 2 | 3.922121 | 0.637514 | 0.0 | 1.243936 |
| 3 | 3.219973 | 0.334941 | 0.0 | 0.624882 |
| 4 | 3.395032 | 0.688927 | 0.0 | 1.101919 |
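
Not every machine has a CUDA-capable GPU, so it can be convenient to choose the device at run time. The sketch below assumes that **sygnet** runs on PyTorch, as the `'cuda'` device string suggests, and that `fit` also accepts `device = 'cpu'`; neither is confirmed in this notebook. `pick_device` is a hypothetical helper, not part of the package.

```python
def pick_device():
    # Hypothetical helper: return "cuda" when PyTorch reports a usable GPU,
    # and fall back to "cpu" otherwise (including when PyTorch is missing).
    try:
        import torch  # assumption: sygnet is built on PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(device)
```

Under those assumptions, the fitting call above would become `model_gpu.fit(data = sim_data, epochs = 50, device = device)`, so the same notebook runs with or without a GPU.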
