# Debugging PCA
I'm adding new linear decompositions of genotype, possibly with different constraints on the reduced space. I've decided to use `scikit-learn` models for these so that the results of each method are directly comparable. They should also be quicker to implement, and I can be more confident that I'm running them correctly, since they are documented high-level methods rather than low-level code I wrote from scratch in numpy.

Here I'm just checking that the results of my PCA implementation match those from `scikit-learn`.

In [9]:
import os
os.chdir("../scripts/")

In [11]:
import data_model as dm
import numpy as np

from scipy.stats import zscore
from sklearn.decomposition import PCA

In [13]:
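# Only the third return value (the genotype matrix) is used here
_, _, genotype = dm.generate_caus1(500, 400)

# scikit-learn PCA on the raw genotype (it centers each column but does not scale it)
pca = PCA(n_components=1)
skl_pcs = pca.fit_transform(genotype)

# My implementation, first component only
my_pcs = dm.reduce_genotype(genotype, 'pca', 1, "NA")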

In [16]:
print(skl_pcs[0:5])
print(my_pcs[0:5])

[[-0.58094899]
 [ 0.64357105]
 [-1.02795413]
 [-1.96297645]
 [-0.85713605]]
[[ 0.50161667]
 [-1.52472867]
 [ 1.92714507]
 [ 3.47148476]
 [ 1.50977254]]
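
These two sets of scores differ by more than a sign. One plausible cause is preprocessing: `scikit-learn`'s `PCA` centers each column but does not scale it, so if `dm.reduce_genotype` also standardizes the columns (an assumption, not confirmed by the code shown here), the two would decompose different matrices. A minimal numpy-only sketch of that effect, using a made-up `toy` matrix and a hypothetical helper `first_pc_scores` rather than the real genotype data:

```python
import numpy as np

def first_pc_scores(x):
    """Scores on the first principal component, via SVD of an already-centered matrix."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, 0] * s[0]

rng = np.random.default_rng(0)
# Toy stand-in for a genotype matrix: columns with very different variances.
toy = rng.normal(size=(200, 5)) * np.array([1.0, 2.0, 5.0, 0.5, 3.0])

centered = toy - toy.mean(axis=0)          # what PCA does internally
standardized = centered / toy.std(axis=0)  # centering plus unit variance (z-score, ddof=0)

pc_centered = first_pc_scores(centered)
pc_standardized = first_pc_scores(standardized)

# The two score vectors are not related by a simple sign flip.
print(np.allclose(np.abs(pc_centered), np.abs(pc_standardized)))
```

With centering alone, the high-variance columns dominate the first component; after standardizing, every column contributes on the same scale, so the scores change in magnitude and not only in sign.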


In [18]:
# Standardize each column before passing it to scikit-learn, then compare again
skl_pcs = pca.fit_transform(zscore(genotype))

my_pcs = dm.reduce_genotype(genotype, 'pca', 1, "NA")

In [19]:
print(skl_pcs[0:5])
print(my_pcs[0:5])

[[-0.50161667]
 [ 1.52472867]
 [-1.92714507]
 [-3.47148476]
 [-1.50977254]]
[[ 0.50161667]
 [-1.52472867]
 [ 1.92714507]
 [ 3.47148476]
 [ 1.50977254]]


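Okay, this looks fine. The first comparison didn't match because `scikit-learn`'s PCA only centers the data, whereas my implementation evidently standardizes it as well; once the input is z-scored, the two agree up to sign. The sign of a principal component is arbitrary (a component and its negation define the same axis), so I'm fine that `scikit-learn`'s version returns a reflection of what I have here. No qualms.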
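
Rather than comparing printed values by eye, the check can be made sign-invariant. A minimal sketch, where `same_up_to_sign` is a hypothetical helper (not part of `data_model` or `scikit-learn`) and the arrays are the first five scores printed above:

```python
import numpy as np

def same_up_to_sign(a, b, atol=1e-6):
    """True if each column of a equals the matching column of b or its negation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    matches_plus = np.all(np.isclose(a, b, atol=atol), axis=0)
    matches_minus = np.all(np.isclose(a, -b, atol=atol), axis=0)
    return bool(np.all(matches_plus | matches_minus))

# First five scores from the z-scored comparison.
skl_head = np.array([[-0.50161667], [1.52472867], [-1.92714507],
                     [-3.47148476], [-1.50977254]])
my_head = np.array([[0.50161667], [-1.52472867], [1.92714507],
                    [3.47148476], [1.50977254]])

print(same_up_to_sign(skl_head, my_head))
```

In the notebook itself, `same_up_to_sign(skl_pcs, my_pcs)` would run the same check over all samples. The comparison is done per column because each component can flip sign independently when more than one is kept.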