# Post-analysis 
The documentation added to this notebook includes:
- A generated overview of the notebook's structure (imports, functions, I/O).
- Lightweight commentary cells before code blocks for readability.
- Replacement of the final cell with a reusable function to compare the empirical distributions of training data and model outputs.



In [164]:
from quasinet.qnet import load_qnet, qdistance, qdistance_matrix
from quasinet.utils import numparameters
from quasinet.qsampling import qsample
import pandas as pd
import numpy as np
import warnings
import seaborn as sns
warnings.filterwarnings("ignore")

**Cell overview:** Load the trained qnet model from disk and report its parameter count.

In [None]:
model=load_qnet('./LSM.gz')
numparameters(model)

**Cell overview:** Visualize the model's dependency trees and upload them to the web server.

In [None]:
# Render the model's dependency trees as PNG files in OUTDIR
OUTDIR='treesRAND2016/'
model.viz_trees(tree_path=OUTDIR,big_enough_threshold=1,format='png',addurl=True,
                base_url='https://34.66.189.202/data/',edge_color='gray',
                text_color='gray',edge_label_color='gray')
# Remove stray .dot/.png files from the working directory, then upload the trees
! rm *.dot *.png
! scp -r treesRAND2016 ishanu@34.66.189.202:/var/www/html/data

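**Cell overview:** Load the cleaned data, restricted to the model's features, and draw 1000 random rows from the last 4972 as a test sample.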

In [None]:
df=pd.read_csv('./cleaneddf.csv',index_col=0,keep_default_na=False)[model.feature_names]
df_test=df.tail(4972).sample(1000)

**Cell overview:** Convert the test sample to an array of strings for use with the qnet functions below.

In [None]:
X=df_test.values.astype(str)

**Cell overview:** Generate and visualize the LSM distance matrix between 100 random samples.

In [None]:
# generate and visualize LSM distance matrix between N random samples
N=100
H=qdistance_matrix(X[:N],X[:N],model,model)
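
Before clustering, it can help to sanity-check the matrix. The sketch below is not part of the original notebook. It assumes that, because both arguments are the same samples under the same model, `H` should be square, non-negative, symmetric, and zero on the diagonal. The helper name `check_distance_matrix` and the tolerance `tol` are hypothetical.

```python
def check_distance_matrix(H, tol=1e-9):
    """Return a list of problems found in a pairwise distance matrix H.

    Assumption (not confirmed by the notebook): H compares a set of samples
    with itself, so it should be square, non-negative, symmetric and have a
    zero diagonal. H may be a NumPy array or a list of lists.
    """
    n = len(H)
    # Check the shape first so the index arithmetic below is safe.
    if any(len(row) != n for row in H):
        return ["matrix is not square"]

    problems = []
    for i in range(n):
        if abs(H[i][i]) > tol:
            problems.append(f"diagonal entry ({i},{i}) is {H[i][i]}, expected 0")
        for j in range(i + 1, n):
            if H[i][j] < -tol or H[j][i] < -tol:
                problems.append(f"negative distance at ({i},{j})")
            if abs(H[i][j] - H[j][i]) > tol:
                problems.append(f"asymmetry at ({i},{j})")
    return problems
```

With the notebook's variables this would be called as `check_distance_matrix(H)`; an empty list means none of the assumed properties is violated.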

**Cell overview:** Plot a Ward-linkage clustered heatmap of the distance matrix (raised elementwise to the power 1.2).

In [None]:
sns.clustermap(H**1.2,method='ward',cmap='terrain',vmin=0.03)

**Cell overview:** Build a null response vector: one empty string per model feature.

In [None]:
NULL=np.array(['']*len(model.feature_names))

**Cell overview:** Draw a sample from the model with `qsample`, starting from the null vector and running 50 steps.

In [None]:
qs=qsample(NULL,model,50)

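**Cell overview:** Compare the model's predicted distribution for one variable (`lsm_hat`) with that variable's empirical distribution in the data, and plot both as a bar chart.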

In [None]:
k=300 # choose the index
varname = model.feature_names[k]
P=model.predict_distributions(qs)
B=pd.DataFrame(P[k],index=['lsm_hat']).T
B.index=B.index.values.astype(float)
B=B.sort_index()

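# Empirical distribution of the same variable in the first 5000 rows of the data
A=pd.DataFrame(df.head(5000)[varname].value_counts())
A.columns=['data_distribution']
A.index.name=None
A=A.drop('',errors='ignore') # discard missing responses; no error if there are none
A.index=A.index.values.astype(float)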

A=A/A.sum()
A=A.sort_index()
A=A.join(B).fillna(0)
ax=A.plot(kind='bar')
#B.plot(kind='bar',ax=ax,color='r',alpha=.5)
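
The comparison in the cell above can be wrapped in a reusable function. The following is a minimal, dependency-free sketch rather than the notebook's own implementation. The name `compare_distributions` and its arguments are hypothetical. It assumes responses are strings that parse as numbers, that `''` marks a missing response, and that the prediction is a dict mapping response label to probability, as `P[k]` is used above.

```python
from collections import Counter

def compare_distributions(observed, predicted, missing=""):
    """Align an empirical distribution with a predicted one.

    observed  : iterable of raw string responses for one variable
    predicted : dict mapping response label -> probability (assumed format)
    missing   : label for a missing response, dropped before normalising

    Returns a list of (value, data_share, predicted_prob) tuples sorted by
    numeric value. Like the left join in the cell above, only values that
    occur in the data are kept; a value the model never predicts gets 0.
    """
    counts = Counter(float(v) for v in observed if v != missing)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-missing observations")

    pred = {float(label): p for label, p in predicted.items()}
    return [(v, counts[v] / total, pred.get(v, 0.0)) for v in sorted(counts)]


# Toy example with made-up responses (not data from the notebook)
rows = compare_distributions(["1", "2", "2", "", "3", "2"],
                             {"1": 0.25, "2": 0.5, "4": 0.25})
```

With the notebook's variables, a call might look like `compare_distributions(df.head(5000)[varname], P[k])`. The returned rows can be passed to `pd.DataFrame(rows, columns=['value','data_distribution','lsm_hat'])` for plotting.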


**Cell overview:** Computation / analysis step.