# Hello (Preliminaries 🐈😺😹)
In this Notebook we walk through the creation of a belief network from the raw GSS dataset.

As a preliminary, make sure you actually have the raw dataset. It should be located and named as follows:

>CLEAN\datasets\raw_data\gss7222_r4.sas7bdat 
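
A quick way to confirm the file is in place before running anything else is sketched below. The helper name `raw_dataset_exists` is hypothetical (it is not part of the project), and the default assumes you run it from the directory that contains `CLEAN`.

```python
from pathlib import Path

# Expected location of the raw GSS file, relative to the repository root.
# Built with pathlib so the same code works on Windows, macOS, and Linux.
RAW_DATA_PATH = Path("CLEAN") / "datasets" / "raw_data" / "gss7222_r4.sas7bdat"


def raw_dataset_exists(root="."):
    """Hypothetical helper: return True if the raw GSS file is present under `root`."""
    return (Path(root) / RAW_DATA_PATH).is_file()


if not raw_dataset_exists():
    print(f"Raw dataset not found at {RAW_DATA_PATH}")
```

If the message prints, check the working directory first: the path is resolved relative to wherever the notebook kernel was started.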

Okay. Now, first we need to import all the functions we will need.

In [1]:
# Add parent directory to Python path
import os
import sys
project_root = os.path.dirname(os.path.dirname(os.path.abspath("..")))
if project_root not in sys.path:
    sys.path.append(project_root)

# 1. Read in the raw dataset and cache it. 
#    Note: when we import the dataset, we automatically discard all variables that we're not interested in. Edit the function if there are variables you'd like to keep.
from CLEAN.datasets.import_gss import import_dataset

# 2. Clean the raw dataset and derive special variables we are interested in. 
#    This involves: 
#                       a) normalising variables between -1 and 1.
#                       b) deriving new variables from existing ones.
from CLEAN.datasets.clean_raw_data import clean_datasets

# 3. Calculate the belief network.
#    This involves calculating the correlation matrix of the filtered dataset.
from CLEAN.source_code.generators.corr_make_network import calculate_correlation_matrix, CorrelationMethod, EdgeSuppressionMethod

# 4. Visualize the belief network.
#    This involves visualizing the belief network in a graph.
from CLEAN.source_code.visualizers.network_visualizer import generate_html_visualization


### Importing the raw dataset 😺
First we will run a script that filters the dataset down to only the variables we are interested in. 
Feel free to look at the code in `import_gss.py` to see which variables are included. Keep in mind that if you add more variables, you'll need to normalise them manually in `clean_raw_data.py`.

In [2]:
df, _ = import_dataset()

Loading from cache...
Done! ✨


### Cleaning the raw dataset 😺
Next we will run a script that cleans the dataset and derives special variables. 

This will normalise all the variables between -1 and 1, and derive some special variables like "VOTELAST_DEMREP" (this tells you which major party the respondent voted for in the previous election).


In [None]:
cleaned_df = clean_datasets()

# This warning is annoying. Will fix later.

Loading from cache...
Done! ✨


  series = series.replace(['I', 'N', 'Y'], np.nan)
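
To make "normalise between -1 and 1" concrete, here is a minimal sketch of a linear rescaling. It is an illustration only: the function name `rescale_to_pm1` is hypothetical, and whether `clean_raw_data.py` uses the observed minimum and maximum (as here) or each scale's nominal endpoints is an assumption this notebook does not settle.

```python
import numpy as np
import pandas as pd


def rescale_to_pm1(series: pd.Series) -> pd.Series:
    """Hypothetical helper: map the observed minimum to -1 and the maximum to +1.

    Missing values (NaN) are ignored when finding the range and stay NaN.
    """
    lo, hi = series.min(), series.max()
    if hi == lo:
        # A constant column has no spread; place every value at the midpoint.
        return series * 0.0
    return 2 * (series - lo) / (hi - lo) - 1


# Example: a 7-point scale with one missing response.
responses = pd.Series([1, 4, 7, np.nan])
scaled = rescale_to_pm1(responses)
print(scaled.tolist())
```

The endpoints of the scale land on -1 and 1, the midpoint lands on 0, and the missing response remains missing.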


### Calculating the belief network 😺

Now we will run a script that calculates the belief network. This will calculate the correlation matrix of the dataset, and then use that to create a belief network.

Here we can specify the years of interest, further filter the variables of interest, choose the correlation method, decide whether we want partial correlations, and choose how to suppress edges.


In [4]:
corr_matrix = calculate_correlation_matrix(
    cleaned_df, 
    years_of_interest=[2018, 2020],
    method=CorrelationMethod.PEARSON, 
    partial=True, 
    edge_suppression=EdgeSuppressionMethod.REGULARIZATION,
    suppression_params={'regularization': 0.18})
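
For intuition about what `partial=True` asks for, the sketch below computes partial correlations from the inverse of an ordinary correlation matrix. It is not the code inside `calculate_correlation_matrix`: the function `partial_correlations` and its `ridge` parameter are hypothetical, and the diagonal shrinkage used here is a simple stand-in that may differ from what `EdgeSuppressionMethod.REGULARIZATION` actually does.

```python
import numpy as np


def partial_correlations(data, ridge=0.0):
    """Hypothetical helper: partial correlation matrix via the precision matrix.

    data:  2-D array, rows are respondents, columns are variables (no NaNs).
    ridge: assumed shrinkage term added to the diagonal before inversion.
    """
    corr = np.corrcoef(data, rowvar=False)
    precision = np.linalg.inv(corr + ridge * np.eye(corr.shape[0]))
    scale = np.sqrt(np.diag(precision))
    # Partial correlation of i and j given all other variables:
    # -P_ij / sqrt(P_ii * P_jj)
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Synthetic example: x and y are related only through a shared driver z.
rng = np.random.default_rng(0)
z = rng.normal(size=5000)
x = z + rng.normal(size=5000)
y = z + rng.normal(size=5000)
data = np.column_stack([x, y, z])

print(np.corrcoef(data, rowvar=False)[0, 1])  # marginal correlation of x and y
print(partial_correlations(data)[0, 1])       # correlation after controlling for z
```

The marginal correlation between `x` and `y` is substantial, while the partial correlation is close to zero, because their association runs entirely through `z`. In a belief network, this is why partial correlations tend to produce sparser graphs than raw correlations.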

In [5]:
# Create and save the network visualization for the final time period
generate_html_visualization(
    corr_matrix,
    highlight_nodes=['POLVIEWS'],
    output_path='delete_this_file.html'
)

Network visualization has been saved to c:\Users\timbo\Github\BeliefNetworkEvo\CLEAN\notebooks\tutorials\delete_this_file.html
