# Meta-config analysis and generation for synthetic data

We generate synthetic data configurations, each of which in turn generates a synthetic dataset. This is useful for testing the pipeline. The working directory contains one directory per number of covariates, named by that number.
In each directory, we generate a set of configurations that produce causal graphs with that number of covariates. The functional form of each covariate is one of the following:

1. Linear non-Gaussian datasets
2. Non-linear Gaussian datasets with parametric assumptions
    1. The invertible function is a polynomial of degree 3
    2. The invertible function is $x + \sin(x)$
3. Non-linear Gaussian datasets with no parametric assumptions
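
To make the two parametric choices concrete, here is a minimal sketch of a degree-3 polynomial and of $x + \sin(x)$, together with a numerical inverse. The specific cubic `x ** 3 + x` and the helper names `cubic`, `x_plus_sin`, and `invert` are assumptions made for illustration; the configurations may use different coefficients.

```python
import math


def cubic(x: float) -> float:
    # Hypothetical degree-3 polynomial: the derivative 3x^2 + 1 is always
    # positive, so the function is strictly increasing and invertible.
    return x ** 3 + x


def x_plus_sin(x: float) -> float:
    # The derivative 1 + cos(x) is non-negative and vanishes only at isolated
    # points, so the function is strictly increasing and invertible.
    return x + math.sin(x)


def invert(f, y: float, lo: float = -1e3, hi: float = 1e3, iters: int = 200) -> float:
    """Invert a strictly increasing function on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is used only because it needs nothing beyond monotonicity; a closed-form inverse is not required for either function.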

We will explain in detail how we generate each of these datasets. The causal graph generators are separate entities and we will sample from three different causal graph types:

1. Chains (only one correct ordering, very sparse)
2. Stars (many correct orderings, sparse)
3. Erdős-Rényi (many correct orderings, dense)
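
The three graph types can be sketched as edge lists over nodes `0, ..., n-1`. This is an illustration only, not the `ocd.data.scm.GraphGenerator` implementation: the function names are hypothetical, and the star is assumed to point from the center to the leaves.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]  # (parent, child)


def chain_edges(n: int) -> List[Edge]:
    # 0 -> 1 -> ... -> n-1: exactly one valid topological ordering.
    return [(i, i + 1) for i in range(n - 1)]


def star_edges(n: int, center: int = 0) -> List[Edge]:
    # The center is the parent of every other node, so the leaves may appear
    # in any order after it (assumed orientation).
    return [(center, j) for j in range(n) if j != center]


def erdos_renyi_dag_edges(n: int, p: float, seed: int = 0) -> List[Edge]:
    # Keep each edge i -> j (i < j) independently with probability p.
    # Only allowing i < j guarantees the result is acyclic.
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
```

A chain on `n` nodes has `n - 1` edges and a single ordering, while the Erdős-Rényi sketch yields about `p * n * (n - 1) / 2` edges on average.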

## Functional form Generator

### Linear non-Gaussian datasets

In these datasets, each covariate is a linear function of its parents plus non-Gaussian noise.
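
A minimal sketch of sampling from a linear non-Gaussian model is given here, assuming uniform noise and a weight matrix that is strictly upper triangular (so `0, 1, ..., d-1` is a valid causal ordering). The function name `sample_linear_non_gaussian` and the choice of noise distribution are assumptions for illustration, not the generator used by the configurations.

```python
import numpy as np


def sample_linear_non_gaussian(weights: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Sample from x_j = sum_i weights[i, j] * x_i + noise_j.

    weights[i, j] is the coefficient of parent i in the equation of child j
    and is assumed to be strictly upper triangular.
    """
    rng = np.random.default_rng(seed)
    d = weights.shape[0]
    # Non-Gaussian noise: uniform on [-1, 1] (an assumed choice).
    noise = rng.uniform(-1.0, 1.0, size=(n_samples, d))
    x = np.zeros((n_samples, d))
    for j in range(d):
        # Parents of j are among the columns 0..j-1 that are already filled in.
        x[:, j] = x[:, :j] @ weights[:j, j] + noise[:, j]
    return x


# Chain 0 -> 1 -> 2 with coefficients 2 and -1.
w = np.zeros((3, 3))
w[0, 1] = 2.0
w[1, 2] = -1.0
data = sample_linear_non_gaussian(w, n_samples=1000)
```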

In [8]:
import typing as th
import numpy as np

np.random.seed(100)

def generate_linear_non_gaussian_configurations(n_cov: th.List[int],
                                                n_config: int) -> th.List[dict]:
    """Build `n_config` SCM generator argument dicts for each covariate count in `n_cov`."""
    configurations = []
    # size of each of the five graph-type buckets
    b = n_config // 5
    for n in n_cov:
        for i in range(n_config):
            # graph generator: the bucket that index i falls into picks the graph type
            graph_generator_args = {'n': n, 'seed': np.random.randint(1000)}
            if i < b:
                graph_generator_args['graph_type'] = 'chain'
            elif i < 2 * b:
                graph_generator_args['graph_type'] = 'fork' if i % 2 == 0 else 'v_structure'
            elif i < 3 * b:
                graph_generator_args['graph_type'] = 'full'
            elif i < 4 * b:
                graph_generator_args['graph_type'] = 'erdos_renyi'
                graph_generator_args['p'] = 0.4
            else:
                graph_generator_args['graph_type'] = 'star'

            # a fresh dict per configuration, so earlier entries are not overwritten
            configurations.append({
                'graph_generator': 'ocd.data.scm.GraphGenerator',
                'graph_generator_args': graph_generator_args,
            })
    return configurations

In [11]:
np.random.randint(0, 2)

1

In [None]:
graph_generator_args = [
    {
        'graph_type': 'random_dag',    
    },
    {
        'graph_type': 'chain',    
    },
    {
        'graph_type': 'star',
        'n': 5,
    }
]

Example data module configuration (YAML):

class_path: lightning_toolbox.DataModule
init_args:
  dataset: ocd.data.SyntheticOCDDataset
  dataset_args:
    seed: 100
    scm_generator: ocd.data.synthetic.InvertibleModulatedGaussianSCMGenerator
    scm_generator_args:
      graph_generator: ocd.data.scm.GraphGenerator
      graph_generator_args:
        enforce_ordering: [0, 1, 2]
        graph_type: random_dag
        n: 3
        p: 0.7
        m: 2
        seed: 333
      seed: 100
      std: 1.0
      mean: 0.0
      weight_s: [1, 2]
      weight_t: [1, 2]
      s_function: >
        lambda x: numpy.log(1 + numpy.exp(x))
      s_function_signature: softplus
      t_function: >
        lambda x: x ** 3 + 10
      t_function_signature: cube_and_dislocate
    observation_size: 10240
  val_size: 0.1
  batch_size: 128
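
The `s_function` and `t_function` entries above are Python lambda expressions stored as strings. One way such strings could be turned into callables is sketched below. How `InvertibleModulatedGaussianSCMGenerator` actually parses them is not shown in this notebook, so the helper `build_function` and the use of `eval` are assumptions.

```python
import numpy


def build_function(source: str):
    # Hypothetical helper: evaluate a lambda expression with `numpy` in scope.
    # eval is only acceptable here because the config is trusted,
    # author-written input.
    return eval(source.strip(), {"numpy": numpy})


# YAML folded scalars (`>`) end with a newline, hence the strip() above.
s_function = build_function("lambda x: numpy.log(1 + numpy.exp(x))\n")  # softplus
t_function = build_function("lambda x: x ** 3 + 10\n")  # cube_and_dislocate

x = numpy.linspace(-3.0, 3.0, 7)
s_values = s_function(x)
t_values = t_function(x)
```

Softplus is strictly positive, which makes it a natural choice for a scale-like function, while `x ** 3 + 10` is strictly increasing.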
