# Testing PCA for Causal Inference
I need a baseline for what causal inference is possible using genotype PCs. To start, I'll run a small simulation under three scenarios:

1. Null case (no association between variables)
1. Full causal mediation
1. Independent association of genotype PC with mediator and response
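
To make the three scenarios concrete, here is a minimal toy sketch of what each one could look like with a single genotype PC. It is not the `data_model` implementation: the linear-Gaussian form, the `effect` size, and the names `simulate_scenarios`, `residualize`, and `partial_corr` are all assumptions made for illustration.

```python
import numpy as np

def simulate_scenarios(n=500, effect=0.5, seed=0):
    """Toy linear-Gaussian versions of the three scenarios (illustrative only)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)  # stand-in for a single genotype PC

    # 1. Null: trait, gene expression and PC are mutually independent.
    null = (rng.standard_normal(n), rng.standard_normal(n), z)

    # 2. Full mediation: PC -> gene expression -> trait.
    gene_exp = effect * z + rng.standard_normal(n)
    trait = effect * gene_exp + rng.standard_normal(n)
    mediation = (trait, gene_exp, z)

    # 3. Independent association: PC -> gene expression and PC -> trait separately.
    gene_exp_ind = effect * z + rng.standard_normal(n)
    trait_ind = effect * z + rng.standard_normal(n)
    independent = (trait_ind, gene_exp_ind, z)

    return {"null": null, "mediation": mediation, "independent": independent}

def residualize(y, x):
    """Residuals of y after least-squares regression on x (with an intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def partial_corr(a, b, given):
    """Correlation between a and b after regressing `given` out of both."""
    return np.corrcoef(residualize(a, given), residualize(b, given))[0, 1]
```

Under these assumptions, the scenarios differ in their conditional dependence structure: with full mediation the trait is roughly uncorrelated with the PC once gene expression is regressed out, while with independent association the trait and gene expression are roughly uncorrelated once the PC is regressed out.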

First I'll import the code I've written to generate datasets and run inference. Then I'll construct the PCs and run a causal inference test on varying numbers of PCs for each dataset.

In [1]:
import cit
import data_model as dm
import numpy as np
import scipy as sp
import multiprocessing as mp
import pandas as pd

In [2]:
import warnings
warnings.filterwarnings(action='once')
def to_df(result):
    '''Merge result dictionary to a pd.DataFrame'''
    # Placeholder: NotImplementedError is the exception class (NotImplemented is not raisable).
    raise NotImplementedError

## Null Case

In [3]:
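NUM_SAMPLES = 1000
# Each simulated dataset is a (trait, gene_exp, genotype) tuple.
null_data = [dm.generate_null(n=500) for _ in range(NUM_SAMPLES)]
SVDs = [np.linalg.svd(genotype, full_matrices=False, compute_uv=True) for (_, _, genotype) in null_data]
# PC scores: scale each left singular vector by its singular value (same as U @ np.diag(D)).
PCs = [U * D for (U, D, Vh) in SVDs]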
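
As a self-contained illustration of the PC construction used above, the sketch below computes PC scores from a toy genotype matrix. The helper name `pc_scores`, the binomial toy genotypes, and the column-centering step are assumptions for illustration; the cell above applies the SVD to whatever `dm.generate_null` returns, without an explicit centering step.

```python
import numpy as np

def pc_scores(genotype, k):
    """Return the first k PC scores of a (samples x variants) matrix."""
    # Assumption: center each variant (column) before the SVD.
    centered = genotype - genotype.mean(axis=0)
    U, D, Vh = np.linalg.svd(centered, full_matrices=False)
    # Scaling the columns of U by the singular values gives the scores,
    # which equal the centered data projected onto the right singular vectors.
    return U[:, :k] * D[:k]

rng = np.random.default_rng(0)
# Toy genotypes: 50 samples x 20 variants, allele counts in {0, 1, 2}.
toy_genotype = rng.binomial(2, 0.3, size=(50, 20)).astype(float)
Z = pc_scores(toy_genotype, 3)
```

Slicing `Z[:, 0:i]` then selects the first `i` PCs, which is how the PCs are passed to the test in the next cell.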

In [None]:
import importlib
importlib.reload(cit)  # pick up local edits to cit without restarting the kernel
null_results = []
# Run the test on every null dataset using the first i PCs, for i = 1..5.
for i in range(1,6):
    args = [(trait,gene_exp, Z[:, 0:i], 1000) for ((trait, gene_exp, _), Z) in zip(null_data, PCs)]
    with mp.Pool(processes=8) as pool:
        null_result = pool.starmap(cit.cit, args)
    null_results.append(null_result)

In [None]:
import pickle
# Three objects are dumped back to back; read them with three pickle.load calls in the same order.
with open('3_5_2019_PCA_null.dat','wb') as f:
    pickle.dump(PCs, f)
    pickle.dump(null_data, f)
    pickle.dump(null_results,f)