# Fully Functioning Backend Demo

This notebook shows how the DSEM pipeline works. It is fully functioning and combines SEM with RL NAS (neural architecture search driven by reinforcement learning). Please fasten your seat belt and enjoy the journey.

In [1]:
import numpy as np
import pandas as pd
import autogluon as ag
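import rpy2.robjects as ro
import rpy2.robjects.packages   # Ensures ro.packages (importr) is loaded.
import rpy2.robjects.pandas2ri  # Ensures ro.pandas2ri (DataFrame converter) is loaded.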
import matplotlib.pyplot as plt

from SEM import SemModel


## 1. Data

In this demo, we use the _PoliticalDemocracy_ dataset from Bollen's 1989 book on structural equation modeling. We load the dataset in R and convert it to a pandas DataFrame for further use.

In [None]:
# Load this dataset in R.
ro.packages.importr('lavaan')
rData = ro.r('PoliticalDemocracy')

# Convert it to a pandas DataFrame.
with ro.conversion.localconverter(ro.default_converter + ro.pandas2ri.converter):
    data = ro.conversion.rpy2py(rData)
data.describe()

## 2. Conventional SEM

After loading the dataset, let's play with conventional SEM. It is a truth universally acknowledged that SEM is confirmatory rather than exploratory. So the only way to play with conventional SEM is to fit a manually proposed model.

Therefore, we use the model from the official tutorial of lavaan (an R package) as an example.

In [None]:
# Describe the model
model = '''
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
'''

# Create the model and fit the data.
sem = SemModel()
buildRes = sem.build_sem_model(model)
assert buildRes
fitRes = sem.fit_sem_model(data)
assert fitRes['is_fitted']

# Evaluate the fitted model.
measureRes = sem.evaluate_sem_model()
print('AGFI: %f \nRMSEA: %f' % (
    measureRes['agfi'], 
    measureRes['rmsea']))

## 3. Deep SEM

Now let's apply RL NAS to SEM. In this section, we first define the search space, then the search strategy, and finally combine the two.

Let's start with the search space.

### 3.1. Search Space
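
To get a feel for the size of the problem, the short sketch below counts the candidate models encoded by the search space defined in the next cell. It assumes the 11 observed variables of _PoliticalDemocracy_ (x1 to x3 and y1 to y8) and the three factors used in this notebook; the variable names in the snippet are illustrative and are not part of the pipeline.

```python
# Count the candidate models encoded by the search space (illustrative sketch).
n_vars = 11        # x1..x3 and y1..y8 in PoliticalDemocracy
n_factors = 3      # factor1, factor2, factor3
n_pairs = n_factors * (n_factors - 1) // 2  # factor pairs (i, j) with j < i
n_edge_states = 3  # 0: no regression, 1: first ~ second, 2: second ~ first

# Each observed variable is assigned to exactly one factor.
measurement_choices = n_factors ** n_vars
# Each factor pair independently takes one of the three edge states.
regression_choices = n_edge_states ** n_pairs
total = measurement_choices * regression_choices

print(measurement_choices, regression_choices, total)
# 177147 27 4782969
```

Many of these configurations are rejected before fitting (for example, when a factor ends up with fewer than two indicators), so the number of models that actually reach lavaan is smaller.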

In [None]:
facNames = ['factor1', 'factor2', 'factor3']  # Use the same number of factors as the model proposed above.
facNum = len(facNames)
varNames = data.columns
varNum = len(varNames)

# Define the search space.
searchSpace = {var: ag.space.Categorical(*facNames) 
               for var in varNames}  # Define search space for measurement model
for i in range(facNum):  # Define the search space for the regressions.
    for j in range(i):
        # 0: no regression, 1: facNames[i] ~ facNames[j], 2: facNames[j] ~ facNames[i].
        searchSpace[str((facNames[i], facNames[j]))] = ag.space.Categorical(*list(range(3)))

def evaluateSolution(model):
    try:
        sem = SemModel()

        buildRes = sem.build_sem_model(model)
        if not buildRes:
            return 0

        fitRes = sem.fit_sem_model(data)
        if not fitRes['is_fitted']:
            return 0

        measureRes = sem.evaluate_sem_model()
        if not measureRes['is_evaluated']:
            return 0

        AGFI = measureRes['agfi']
        RMSEA = measureRes['rmsea']
        # Reward a high AGFI and a low RMSEA, then squash the index into (0, 1).
        index = AGFI - RMSEA * 10
        sigmoidIndex = 1 / (1 + np.exp(-index))
        return sigmoidIndex
    except Exception:  # Treat any failure while building or fitting as the worst score.
        return 0

def dict2des(dataDict, separator):
    """Render {parent: [children]} as lavaan syntax, one line per parent."""
    dataDes = ''
    for parent, children in dataDict.items():
        if not children:
            continue  # Skip parents without any children.
        dataDes += parent + ' ' + separator + ' ' + ' + '.join(children) + '\n'
    return dataDes
    
@ag.args(**searchSpace)
def rl_simulation(args, reporter):
    measurementDict = {fac: [] for fac in facNames}
    regressionsDict = {fac: [] for fac in facNames} 
        
    for var, choice in args.items():
        if var == 'task_id': 
            continue
        elif var[0] != '(':  # measurement
            measurementDict[choice].append(var)
        else:  # regressions
            varTuple = eval(var)
            if choice == 1:
                regressionsDict[varTuple[0]].append(varTuple[1])
            elif choice == 2:
                regressionsDict[varTuple[1]].append(varTuple[0])
    
    # Prior knowledge from SEM: every factor needs at least two indicators.
    for fac, ind in measurementDict.items():
        if len(ind) < 2:
            reporter(reward=0)
            return
    
    modelDes = dict2des(measurementDict, '=~') + \
               dict2des(regressionsDict, '~')
    
    reward = evaluateSolution(modelDes)
    
    reporter(reward=reward)
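
To make the encoding concrete, here is a minimal, self-contained sketch of how one configuration is decoded into lavaan syntax. The configuration is hand-written for illustration (it is not a search result), and the helper `render` is a hypothetical stand-in for `dict2des`. The sketch uses `ast.literal_eval` instead of `eval` to parse the tuple keys, which is a swapped-in choice rather than what the cell above does.

```python
import ast

fac_names = ['factor1', 'factor2', 'factor3']

# Hand-written configuration, for illustration only.
config = {
    'x1': 'factor1', 'x2': 'factor1', 'x3': 'factor1',
    'y1': 'factor2', 'y2': 'factor2', 'y3': 'factor2', 'y4': 'factor2',
    'y5': 'factor3', 'y6': 'factor3', 'y7': 'factor3', 'y8': 'factor3',
    str(('factor2', 'factor1')): 1,  # factor2 ~ factor1
    str(('factor3', 'factor1')): 1,  # factor3 ~ factor1
    str(('factor3', 'factor2')): 1,  # factor3 ~ factor2
}

measurement = {fac: [] for fac in fac_names}
regressions = {fac: [] for fac in fac_names}

for key, choice in config.items():
    if not key.startswith('('):      # observed variable -> factor name
        measurement[choice].append(key)
    else:                            # factor pair -> edge state 0, 1 or 2
        first, second = ast.literal_eval(key)
        if choice == 1:
            regressions[first].append(second)
        elif choice == 2:
            regressions[second].append(first)

def render(relations, operator):
    """Hypothetical stand-in for dict2des: one lavaan line per non-empty parent."""
    return ''.join(
        '{} {} {}\n'.format(parent, operator, ' + '.join(children))
        for parent, children in relations.items() if children
    )

model_des = render(measurement, '=~') + render(regressions, '~')
print(model_des)
```

With this particular configuration, the decoded model has the same structure as the lavaan tutorial model from Section 2, with the factors renamed.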
    

### 3.2. Search Strategy

In [None]:
# Running the following code might crash Python.
# The crash is caused by the interaction between the RL algorithm's multiprocessing and lavaan in R.
# The numeric part has been taken care of, so the results are not corrupted.
rl_scheduler = ag.scheduler.RLScheduler(rl_simulation,
                                        resource={'num_cpus': 1, 'num_gpus': 0},
                                        num_trials=200,
                                        reward_attr='reward',
                                        controller_batch_size=4,
                                        controller_lr=5e-3,)

rl_scheduler.run()
rl_scheduler.join_jobs()
    
print('Best config: {}, best reward: {}'.format(rl_scheduler.get_best_config(), rl_scheduler.get_best_reward()))

### 3.3. Learning Curve

In [None]:
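# One reward per trial, taken from the scheduler's training history.
curveRL = [v[0]['reward'] for v in rl_scheduler.training_history.values()]
# Keep the best reward in each block of 5 consecutive trials.
curveSmooth = [np.max(curveRL[i:i+5]) for i in range(0, len(curveRL), 5)]

plt.plot(range(len(curveSmooth)), curveSmooth)
plt.xlabel('Block of 5 trials')
plt.ylabel('Best reward in block')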
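
The curve above keeps the maximum reward within each block of 5 trials. As a possible complement, the sketch below contrasts that windowed maximum with a running best (the highest reward seen so far), which never decreases and can be easier to read. The reward values are synthetic and serve only as an illustration; they are not results from this search.

```python
from itertools import accumulate

# Synthetic per-trial rewards, for illustration only.
rewards = [0.10, 0.00, 0.35, 0.20, 0.00, 0.50, 0.45, 0.00, 0.62, 0.40]

# Windowed maximum: one point per block of 5 trials.
window = 5
window_max = [max(rewards[i:i + window]) for i in range(0, len(rewards), window)]

# Running best: highest reward seen up to and including each trial.
running_best = list(accumulate(rewards, max))

print(window_max)
# [0.35, 0.62]
print(running_best)
# [0.1, 0.1, 0.35, 0.35, 0.35, 0.5, 0.5, 0.5, 0.62, 0.62]
```

Either list can be passed to `plt.plot` in place of `curveSmooth`.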


## 4. Comparison

Now let's compare conventional SEM and DSEM. Let's start with conventional SEM.

In [None]:
model = '''
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
'''

evaluateSolution(model)

Now let's move to DSEM.

In [None]:
args = rl_scheduler.get_best_config()

measurementDict = {fac: [] for fac in facNames}
regressionsDict = {fac: [] for fac in facNames} 
        
for var, choice in args.items():
    var = var.split('▁')[0]  # Keep the part of the key before the '▁' separator.
    if var[0] != '(':  # measurement
        measurementDict[facNames[choice]].append(var)  # Here choice is an index into facNames.
    else:  # regressions
        varTuple = eval(var)
        if choice == 1:
            regressionsDict[varTuple[0]].append(varTuple[1])
        elif choice == 2:
            regressionsDict[varTuple[1]].append(varTuple[0])

modelDes = dict2des(measurementDict, '=~') + \
           dict2des(regressionsDict, '~')

print('The best model looks like:\n' + modelDes)
    
evaluateSolution(modelDes)
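
Both scores above come from `evaluateSolution`, which returns sigmoid(AGFI - 10 * RMSEA), so they lie in (0, 1) and a higher score means a better fit. The sketch below reproduces that transform with the standard library and inverts it to recover the raw index from a score. The function names and the fit indices are hypothetical and chosen only for illustration; they are not outputs of this notebook.

```python
import math

def fit_score(agfi, rmsea):
    """Same transform as evaluateSolution: sigmoid(AGFI - 10 * RMSEA)."""
    index = agfi - 10.0 * rmsea
    return 1.0 / (1.0 + math.exp(-index))

def raw_index(score):
    """Invert the sigmoid to recover AGFI - 10 * RMSEA from a score in (0, 1)."""
    return math.log(score / (1.0 - score))

# Illustrative fit indices (hypothetical values).
good = fit_score(agfi=0.95, rmsea=0.03)  # index = 0.65
poor = fit_score(agfi=0.70, rmsea=0.12)  # index = -0.50

print(good > 0.5 > poor)
# True
print(round(raw_index(good), 2), round(raw_index(poor), 2))
# 0.65 -0.5
```

A score of exactly 0 marks a model that could not be built, fitted, or evaluated. It lies outside the range of the sigmoid, so `raw_index` is not defined for it.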