# 1 | Introduction

This project builds a predictive model that helps case managers choose interventions for their clients. The model should identify the interventions a client may need and estimate the probability of Return to Work both at baseline (no interventions) and with each intervention applied.

# 2 | Importing Libraries

In [47]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import json

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# 3 | Basic Exploration

### 3.1 | Read dataset

In [34]:
df = pd.read_csv('dummy_dataset_final.csv')

### 3.2 | Display data content

In [35]:
styled_df = df.head(5).style

# Set background color, text color, and border for the entire DataFrame
styled_df.set_properties(**{"background-color": "#fcfcfc", "color": "#080808", "border": "1.5px solid black"})

# Modify the color and background color of the table headers (th)
styled_df.set_table_styles([
    {"selector": "th", "props": [("color", "white"), ("background-color", "#a2a2a2")]},
    {"selector": "td", "props": [("height", "50px")]}  # Sets the height for all data cells
])

Unnamed: 0,CA113-115,CA1,CA2_1,CA2_2,CA2_3,CA3,CA4,CA4_1,CA5_1,CA5_2,CA5_3,CA5_4,CA6,CA7,CA8_1,CA9,CA10,CA10_1,CA108,CA31,CA24,CA11,CA12,CA15,CA14,CA13,CA32,CA33,CA34,CA27,CA51,CA52,CA53,CA54,CA55,CA95,CA96,CA104,CA105,CA106,CA120,CA28,CA25,CA26,CA20,CA21,CA22,CA23,CA101,CA102,CA19,CA121,CA111,CA112,SLD01,CA16,CA16_1,CA17,CA17_1,CA18,CA18_2,CA18_1,CA56,CA57,CA58,CA59,CA60,CA61,CA62,CA63,CA64,CA65,CA66_T_1,CA66_T_2,CA66_T_3,CA67,CA68,CA69_T_1,CA69_T_2,CA70,CA71,CA72,CA73,CA74,CA75,CA76,CA77,CA78,CA79,CA80,CA81,CA82,CA83,CA84,CA85,CA86,CA87,CA88,CA89_T_1,CA89_T_2,CA90,CA91,CA29,CA30,CA41,CA42,CA42_1,CA43,CA35,CA122,CA36,CA37,CA37_1,CA38,CA40,CA107,CA44,CA45,CA45_2,CA45_1,CA109,CA109_1,CA46,CA48,CA49_T_1,CA49_T_1_2,CA49_T_1_2.1,CA49_T_2,CA49_T_2_2,CA49_T_2_2.1,CA50_T_1,CA50_T_2,CA103,CA92,CA93,CA94,CA97,CA98,CA100,CA136,CA137,CA137_1,CA1-1,CA135_T_1,CA135_T_2,Life Stabilization,Employment Assistance Services,Retention Services,Specialized Services,Employer Financial Supports,Enhanced Referrals for Skills Development,Outcome
0,Yes,270-15-5868,John,,Jones,02/06/1998,"2696 Fallon Drive, Ridgetown, Ontario",N0P 2C0,519-693-7193,,5194835700,,jjones03@gmail.com,English,USA,03/04/2018,Permanent resident,,10,Divorced,2,Man,No,White,No,No,Some college,2017,Outside Canada,"No, I do not need help, my language skills are good enough",3,4,2,5,5,4,4,2,2,2,Often,Sometimes,Yes,Yes,Homeowner,2,Not worried,Yes,Often True,Never True,No,Sometimes,No,No,case-managed,Employment,,EI,,,4,2,No,Yes,No difficulty,Never,No,No difficulty,Never,Sometimes,Some difficulty,No difficulty,Rarely,,,Some difficulty,No difficulty,Sometimes,Never,No difficulty,Never,some difficulty,No,Yes,Rarely,Some difficulty,No difficulty,Never,No difficulty,No,Never,No difficulty,No,Never,No difficulty,No,Never,Yes,Yes,Yes,Sometimes,some difficulty,No,No,No,,,,Yes,1.0,Employee,permanant job,,20.0,23.0,None of the above,,,,,,,,,,30.0,2.0,,,,,10022.0,Yes,1,2,4,20010.0,40,35,,,,,,80022.0,Basic Needs – Housing,Job Search,Ongoing Job Coaching,Employer Job Carving,,,Return to Work
1,Yes,590-99-1120,Ava,Nicole,Smith,12/03/1997,"1081 Scotts Lane, Lake Cowichan, BC",V0R 2G0,2500458554,,2503458666,,asmith04@gmail.com,French,CANADA,03/05/2018,Canadian citizen,,11,married,1,Woman,No,Black,Yes,No,12th Grade,2018,Outside Canada,"Yes, I need help, my language skills need development",3,5,5,2,4,4,3,5,2,3,Often,Always,No,No,Renting-private,1,Not worried,No,Never True,Never True,No,Often,No,No,case-managed,Employment,,WSIB,,,3,2,No,Yes,No Difficulty,Never,No,No difficulty,Never,Always,Some difficulty,Some difficulty,Rarely,Rarely,Rarely,A lot of Difficulty,No difficulty,Sometimes,Never,No difficulty,Never,,,,Never,,No difficulty,Never,No difficulty,,Never,No difficulty,No,Never,No difficulty,No,Never,,,,Never,Never,No,No,No,,,,Yes,1.0,Employee,"Temporary, term or contract",,30.0,30.0,None of the above,,,,,,,,,,10.0,1.0,,,,,10020.0,Yes,2,4,3,11100.0,30,30,,,,,,80021.0,Basic Needs – Food Security,Job Search,Ongoing Job Coaching,,,,Return to Work
2,Yes,183-86-9884,Amy,,Hansen,04/20/2002,"901 Speers Road, Brampton, Ontario",L6S 3S1,9059649200,,9059694850,,ahanson34@gmail.com,English,CANADA,03/06/2018,Canadian citizen,,10,single,0,Man,No,White,No,No,Some college,2019,In Canada,"No, I do not need help, my language skills are strong",4,4,4,2,4,4,2,5,2,3,Often,Always,No,No,Institution,1,Not worried,No,Sometimes True,Never True,No,Rarely,No,No,self-directed,Employment,,EI,,,3,2,Yes,No,Some difficulty,Sometimes,Yes,No difficulty,Never,Sometimes,No difficulty,Some difficulty,Never,Rarely,Rarely,No difficulty,No difficulty,,Never,No difficulty,Never,,,,Never,No difficulty,No difficulty,Never,No difficulty,,Never,No difficulty,No,Never,No difficulty,No,Never,No,No,No,Never,Never,No,No,No,,,,Yes,1.0,Employee,"Temporary, term or contract",,30.0,30.0,None of the above,,,,,,,,,,20.0,1.0,,,,,10021.0,Yes,2,5,2,11101.0,30,40,,,,,,80020.0,Basic Needs – Food Security,Job Search,Ongoing Job Coaching,Employment Services for Newcomers,Employer Job Trials with Financial Supports,,No
3,Yes,094-98-0647,Tom,,Woods,05/02/1990,"1821 Dora Ave #APT 121 , Kitchener, BC",N2G 4L9,5193439175,,5198543940,,twoods98@hotmail.com,English,USA,03/07/2018,Permanent resident,,12,married,0,Man,No,Latino,No,Metis,Some university,2018,Outside Canada,"No, I do not need help, my language skills are good enough",4,3,3,3,5,3,4,2,3,2,Always,Sometimes,No,No,Institution,0,Not worried,No,Sometimes True,Never True,No,Rarely,Yes,No,self-directed,Self-Employment,,,,,4,2,No,No,No Difficulty,Never,No,No difficulty,Never,No,No difficulty,No difficulty,Never,,,No difficulty,No difficulty,,Never,No difficulty,Never,,,,Never,No difficulty,No difficulty,Never,No difficulty,,Never,No difficulty,Yes,Never,No difficulty,No,Never,No,No,No,Never,Never,No,No,No,,,,No,,,,,,,None of the above,Unemployed and looking for work,,,,Lack of Work,,,,,30.0,,,10.0,0.0,10029.0,,Yes,3,4,3,90010.0,30,30,,,,,80010.0,,Health Supports – Mental Health and addictions,Job Search,Ongoing Job Coaching,Employment Services for Newcomers,Employer Job Trials with Financial Supports,Referrals to Educational Institutions or Funded Programs,Return to Work
4,Yes,252-99-2817,Jack,,La,06/02/1987,"107 N New Haven Ave , Boston Bar, BC",V0K 1C0,604-822-2019,,6048843233,,jla84@hotmail.com,English,CANADA,03/08/2018,Canadian citizen,,8,Divorced,0,Man,No,White,No,No,Bachelor's degree,2019,Outside Canada,"Yes, I need help, my language skills need development",3,4,4,4,5,3,4,5,3,4,Rarely,Rarely,No,No,Renting-private,0,Not worried,No,Sometimes True,Never True,No,Rarely,No,No,self-directed,No Source of Income,,,,,3,2,No,No,No Difficulty,Never,No,No difficulty,Never,No,No difficulty,No difficulty,Never,,,No difficulty,No difficulty,,Never,No difficulty,Never,,,,Never,No difficulty,No difficulty,Never,No difficulty,,Never,No difficulty,No,Never,No difficulty,No,Never,No,No,No,Never,Never,No,No,No,,,,No,,,,,,,None of the above,Unemployed and looking for work,,,,Sick leave or injury,,,,,10.0,,,28.0,0.0,10030.0,,Yes,2,3,3,21120.0,40,40,,,,,70021.0,,Basic Needs – Housing,Job Search,Ongoing Job Coaching,Employment Services for Newcomers,,Referrals to Educational Institutions or Funded Programs,Return to Work


In [36]:
rows, cols = df.shape
print(f"Number of Rows : {rows} \nNumber of Columns : {cols}")
df.isnull().sum()

Number of Rows : 30 
Number of Columns : 152


CA113-115                                     0
CA1                                           0
CA2_1                                         0
CA2_2                                        26
CA2_3                                         0
                                             ..
Retention Services                            6
Specialized Services                          6
Employer Financial Supports                  12
Enhanced Referrals for Skills Development    22
Outcome                                       0
Length: 152, dtype: int64
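
The truncated `isnull().sum()` output above hides most of the 152 columns. As a minimal sketch, run on a small hypothetical frame rather than the project dataset, the helper below (`missing_summary` is an illustrative name, not part of the notebook) lists only the columns that contain missing values, together with their share of the rows:

```python
import numpy as np
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and share of missing values for columns that have any."""
    counts = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "share": counts / len(df),
    })
    # Keep only columns with at least one missing value, worst first
    summary = summary[summary["missing"] > 0]
    return summary.sort_values("missing", ascending=False)

# Hypothetical toy data that mimics a few assessment columns
toy = pd.DataFrame({
    "CA2_1": ["John", "Ava", "Amy", "Tom"],
    "CA2_2": [np.nan, "Nicole", np.nan, np.nan],
    "CA5_2": [np.nan, np.nan, np.nan, np.nan],
})
print(missing_summary(toy))
```

Columns whose share is 1.0 are the ones `load_and_dropna` removes with `dropna(axis=1, how='all')`.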

# 4 | Data Preprocessing

In [37]:
# Define target columns
target_columns = [
    "Life Stabilization",
    "Employment Assistance Services",
    "Retention Services",
    "Specialized Services",
    "Employer Financial Supports",
    "Enhanced Referrals for Skills Development",
    "Outcome"
]

def load_and_dropna(csv_path):
    """Load a CSV file and drop columns where all values are missing."""
    df = pd.read_csv(csv_path)
    df = df.dropna(axis=1, how='all')
    return df

def impute_missing_values(df):
    """Impute missing values in place; return the frame, the numeric and object column lists, and the fitted imputers."""
    numerical_cols = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
    object_cols = df.select_dtypes(include=['object']).columns.tolist()

    num_imputer = SimpleImputer(strategy='most_frequent')
    df[numerical_cols] = num_imputer.fit_transform(df[numerical_cols])

    obj_imputer = SimpleImputer(strategy='constant', fill_value='missing')
    df[object_cols] = obj_imputer.fit_transform(df[object_cols])
    
    return df, numerical_cols, object_cols, num_imputer, obj_imputer

def preprocess_features(df, target_columns=None, is_training_data=True):
    """Preprocess features using OneHotEncoding for categorical variables."""
    if is_training_data and target_columns is not None:
        X = df.drop(target_columns, axis=1)
    else:
        X = df.copy()
    
    categorical_features = X.select_dtypes(include=['object']).columns.tolist()

    preprocessor = ColumnTransformer(
        transformers=[
            ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
        ], remainder='passthrough')

    X_transformed = preprocessor.fit_transform(X)

    if is_training_data and target_columns is not None:
        return X_transformed, df[target_columns], preprocessor
    else:
        return X_transformed, preprocessor

def split_data(X, y, test_size=0.20, random_state=42):
    """Split data into training and test sets."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)
    return X_train, X_test, y_train, y_test

def compute_overall_accuracy(model, inputs, targets, name=''):
    """Print the mean of the per-target accuracies for one data split and return the predictions."""
    preds = model.predict(inputs)
    accuracies = []

    for i, col in enumerate(targets.columns):
        accuracy = accuracy_score(targets[col], preds[:, i])
        accuracies.append(accuracy)

    overall_accuracy = np.mean(accuracies)
    print(f"Overall {name} Accuracy: {overall_accuracy*100:.2f}%")

    return preds

def evaluate_triggered_interventions(trigger_mappings, filepath):
    """Evaluates which interventions are triggered based on the answers."""
    df = pd.read_csv(filepath, header=None)

    ca_numbers = df.iloc[0, :]
    answers = df.iloc[1, :]

    triggered_interventions = []
    triggering_ca_numbers = []

    for ca, triggers in trigger_mappings.items():
        ca_index = ca_numbers[ca_numbers == ca].index.tolist()
        if ca_index:
            ca_answer = answers.iloc[ca_index[0]]
            for answer, interventions in triggers.items():
                if ca == "CA108":
                    try:
                        if int(ca_answer) <= 5:
                            triggered_interventions.extend(interventions)
                            triggering_ca_numbers.extend([ca for _ in interventions])
                    except ValueError:
                        pass
                elif ca_answer in answer:
                    triggered_interventions.extend(interventions)
                    triggering_ca_numbers.extend([ca for _ in interventions])

    unique_triggered_interventions = list(set(triggered_interventions))
    unique_triggering_ca_numbers = list(set(triggering_ca_numbers))

    return unique_triggered_interventions, unique_triggering_ca_numbers

def load_trigger_mappings(filepath):
    """Load trigger mappings from a JSON file."""
    with open(filepath, 'r') as file:
        trigger_mappings = json.load(file)
    return trigger_mappings

def print_readable_triggered_interventions(triggered_interventions):
    """Prints the triggered interventions in a readable format."""
    interventions, ca_numbers = triggered_interventions
    
    print("Triggered Interventions:")
    for intervention in interventions:
        print(f"- {intervention}")

    print("\nTriggered by CA Questions:")
    print(", ".join(ca_numbers))

# 5 | Machine Learning models - Random Forest

In [38]:
# Load and prepare training data
df_train = load_and_dropna('dummy_dataset_final.csv')
df_train, numerical_cols, object_cols, num_imputer, obj_imputer = impute_missing_values(df_train)
X_transformed, y, preprocessor = preprocess_features(df_train, target_columns=target_columns, is_training_data=True)
X_train, X_test, y_train, y_test = split_data(X_transformed, y)

# Train the model (RandomForestClassifier handles the seven target columns as a multi-output problem)
model = RandomForestClassifier(n_jobs=-1, random_state=42)
model.fit(X_train, y_train)

In [39]:
# Compute and print overall training accuracy
train_preds = compute_overall_accuracy(model, X_train, y_train, 'Train')

# Compute and print overall validation accuracy
val_preds = compute_overall_accuracy(model, X_test, y_test, 'Validation')


Overall Train Accuracy: 100.00%
Overall Validation Accuracy: 71.43%
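
With 30 rows and `test_size=0.20`, the validation set holds only 6 rows, so each target's accuracy moves in steps of roughly 16.7 points and the averaged figure can hide a weak target. A minimal sketch of a per-target breakdown follows; the labels and predictions are hypothetical, and `per_target_accuracy` is an illustrative helper rather than part of the notebook:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score

def per_target_accuracy(targets: pd.DataFrame, preds: np.ndarray) -> pd.Series:
    """Accuracy for each target column; preds has one column per target."""
    return pd.Series(
        {col: accuracy_score(targets[col], preds[:, i])
         for i, col in enumerate(targets.columns)}
    )

# Hypothetical true labels for two of the targets
y_true = pd.DataFrame({
    "Retention Services": ["Ongoing Job Coaching", "missing",
                           "Ongoing Job Coaching", "missing"],
    "Outcome": ["Return to Work", "No", "No", "Return to Work"],
})
# Hypothetical predictions, shaped like the output of model.predict
y_pred = np.array([
    ["Ongoing Job Coaching", "Return to Work"],
    ["missing", "Return to Work"],
    ["missing", "Return to Work"],
    ["missing", "Return to Work"],
])

scores = per_target_accuracy(y_true, y_pred)
print(scores)
print(f"Mean accuracy: {scores.mean():.3f}")
```

The mean of `scores` is the quantity `compute_overall_accuracy` prints; the individual entries show which targets drive it.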


# 6 | Make Prediction

In [40]:
# Load new data
df_new = pd.read_csv('prediction_dataset_final.csv')

# Impute missing values in df_new using the imputers fitted on df_train
df_new[numerical_cols] = num_imputer.transform(df_new[numerical_cols])  # numerical_cols determined from df_train
df_new[object_cols] = obj_imputer.transform(df_new[object_cols])  # object_cols determined from df_train

# Apply preprocessing transformations
X_new_transformed = preprocessor.transform(df_new)  # 'preprocessor' fitted on df_train

# Predict target values using the trained model
predicted_targets = model.predict(X_new_transformed)
predicted_values = predicted_targets[0]

# Mapping and printing each target with its corresponding prediction
for target, prediction in zip(target_columns, predicted_values):
    # Using .strip() to clean up any leading/trailing whitespace or newline characters in the prediction
    print(f"{target}: {prediction.strip()}")

Life Stabilization: Basic Needs – Housing
Employment Assistance Services: Job Search
Retention Services: Ongoing Job Coaching
Specialized Services: Employer Job Carving
Employer Financial Supports: missing
Enhanced Referrals for Skills Development: missing
Outcome: Return to Work


### 6.1 | Extract The Probability For "Return to Work"

In [41]:
outcome_index = target_columns.index("Outcome")
outcome_classes = model.classes_[outcome_index]
index_return_to_work = list(outcome_classes).index('Return to Work')

# Predict probabilities
probabilities = model.predict_proba(X_new_transformed)
probability_return_to_work = probabilities[outcome_index][0][index_return_to_work]

print(f"Probability of 'Return to Work' for 'Outcome': {probability_return_to_work * 100:.2f}%")

Probability of 'Return to Work' for 'Outcome': 76.00%
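
For a multi-output forest, `predict_proba` returns one array per target and `classes_` is a list with one class array per target, which is why the cell above indexes twice. The lookup can be wrapped in a small helper. This is a sketch on synthetic data; `class_probability` and the toy features are assumptions made for illustration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def class_probability(model, X, output_index, label):
    """Probability of `label` for one output of a fitted multi-output classifier."""
    classes = list(model.classes_[output_index])
    # predict_proba returns one (n_samples, n_classes) array per output
    proba = model.predict_proba(X)[output_index]
    return proba[:, classes.index(label)]

# Synthetic features and two string-valued targets
rng = np.random.default_rng(0)
X = rng.random((40, 3))
y = np.column_stack([
    np.where(X[:, 0] > 0.5, "Job Search", "missing"),
    np.where(X[:, 1] > 0.5, "Return to Work", "No"),
])

toy_model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
p = class_probability(toy_model, X[:5], output_index=1, label="Return to Work")
print(p)
```

Looking the label up in `classes_` avoids relying on the alphabetical position of "Return to Work" among the classes.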


# 7 | Probabilities For "Return to Work" With Different Interventions

### 7.1 | Train a new model to predict "outcome" only

In [42]:
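# Load and prepare training data
# Only "Outcome" is dropped here, so the intervention columns become input features.
# Note: this overwrites `preprocessor`, which from now on matches model_return_to_work, not model.
target_columns_outcome = ["Outcome"]
X_transformed, y, preprocessor = preprocess_features(df_train, target_columns=target_columns_outcome, is_training_data=True)
X_train, X_test, y_train, y_test = split_data(X_transformed, y)

# Train the model (a 1-D target avoids scikit-learn's column-vector warning)
model_return_to_work = RandomForestClassifier(n_jobs=-1, random_state=42)
model_return_to_work.fit(X_train, y_train.values.ravel())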

### 7.2 | Make prediction on Outcome if all interventions are included

In [43]:
df_modified = df_new.copy()

# Insert predicted interventions into the copied DataFrame
for col, pred in zip(target_columns[:-1], predicted_targets[0]):
    df_modified[col] = pred

# Prepare the modified data for prediction
X_modified_preprocessed = preprocessor.transform(df_modified.drop('Outcome', axis=1))

print(f"Outcome: {model_return_to_work.predict(X_modified_preprocessed)[0]}")

Outcome: Return to Work


### 7.3 | The probability for "Return to Work" if all interventions are included

In [44]:
# Predict probabilities
probabilities_with_all_interventions = model_return_to_work.predict_proba(X_modified_preprocessed)
# Look up the class index by label instead of assuming position 1
index_return_to_work = list(model_return_to_work.classes_).index('Return to Work')
probability_return_to_work_with_all_interventions = probabilities_with_all_interventions[0][index_return_to_work]

print(f"Probability of 'Return to Work' for 'Outcome': {probability_return_to_work_with_all_interventions * 100:.2f}%")

Probability of 'Return to Work' for 'Outcome': 85.00%


### 7.4 | The impact of each intervention on the probability for "Return to Work"

In [45]:
# Make a copy of the original DataFrame to preserve the original data
df_base = df_new.copy()
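# Placeholder for "no intervention". If 'None' was not seen during training, the encoder
# (handle_unknown='ignore') encodes it as all zeros, which is not the same as the 'missing'
# category assigned to imputed training rows.
df_base[target_columns[:-1]] = 'None'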

# Iterate through each predicted intervention and its corresponding column name
for col, intervention in zip(target_columns[:-1], predicted_values):
    # Skip if the predicted intervention is 'missing'
    if intervention == 'missing':
        continue

    # Make a copy of the base DataFrame for each intervention
    df_modified = df_base.copy()

    # Insert the current non-missing intervention into the DataFrame
    df_modified[col] = intervention

    # Prepare the modified data for prediction
    X_modified_preprocessed = preprocessor.transform(df_modified.drop('Outcome', axis=1))

    # Predict the probability for "Return to Work"
    probabilities = model_return_to_work.predict_proba(X_modified_preprocessed)
    # Look up the "Return to Work" column by label rather than assuming index 1
    rtw_index = list(model_return_to_work.classes_).index('Return to Work')
    probability_return_to_work = probabilities[0][rtw_index]

    print(f"Applying '{intervention}' for '{col}',\nThe probability of 'Return to Work' for 'Outcome' is: {probability_return_to_work * 100:.2f}%\n")


Applying 'Basic Needs – Housing' for 'Life Stabilization',
The probability of 'Return to Work' for 'Outcome' is: 83.00%

Applying 'Job Search' for 'Employment Assistance Services',
The probability of 'Return to Work' for 'Outcome' is: 82.00%

Applying 'Ongoing Job Coaching' for 'Retention Services',
The probability of 'Return to Work' for 'Outcome' is: 82.00%

Applying 'Employer Job Carving' for 'Specialized Services',
The probability of 'Return to Work' for 'Outcome' is: 83.00%
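
The loop above reports a probability per intervention but not the no-intervention baseline it should be compared with. The sketch below shows one way to report both, on hypothetical toy data. It assumes `'missing'` as the baseline placeholder because that is the category imputation assigns to empty intervention cells; the column values, `rtw_probability`, and `effects` are illustrative names only:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data: one client feature and two intervention columns
train = pd.DataFrame({
    "CA108": [10, 11, 10, 12, 8, 9, 7, 12],
    "Life Stabilization": ["Housing", "missing", "Housing", "missing",
                           "Housing", "missing", "missing", "Housing"],
    "Retention Services": ["Coaching", "Coaching", "missing", "missing",
                           "Coaching", "missing", "Coaching", "missing"],
    "Outcome": ["Return to Work", "Return to Work", "No", "No",
                "Return to Work", "No", "Return to Work", "No"],
})
intervention_cols = ["Life Stabilization", "Retention Services"]

encoder = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), intervention_cols)],
    remainder="passthrough",
)
X = encoder.fit_transform(train.drop(columns="Outcome"))
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, train["Outcome"])
rtw = list(clf.classes_).index("Return to Work")

def rtw_probability(row: pd.DataFrame) -> float:
    """Probability of 'Return to Work' for a single-row frame."""
    return clf.predict_proba(encoder.transform(row))[0][rtw]

# Client with the interventions predicted for them
client = pd.DataFrame({"CA108": [10],
                       "Life Stabilization": ["Housing"],
                       "Retention Services": ["Coaching"]})

# Baseline: same client, every intervention set to the placeholder
baseline_row = client.copy()
baseline_row[intervention_cols] = "missing"
baseline = rtw_probability(baseline_row)
print(f"Baseline: {baseline:.2f}")

# Apply one intervention at a time and report the change from baseline
effects = {}
for col in intervention_cols:
    row = baseline_row.copy()
    row[col] = client.loc[0, col]
    effects[col] = rtw_probability(row)
    print(f"{col}: {effects[col]:.2f} (change {effects[col] - baseline:+.2f})")
```

With so few training rows the differences between interventions are small and noisy, so the changes are better read as a ranking aid than as effect sizes.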



# 8 | The Interventions Triggered By Specific Answers

In [48]:
trigger_mappings = load_trigger_mappings('trigger_mappings.json')
triggered_interventions = evaluate_triggered_interventions(trigger_mappings, 'prediction_dataset_final.csv')
print_readable_triggered_interventions(triggered_interventions)

Triggered Interventions:
- Employer Job Carving
- Accessible Workplace Consultation for Clients with a Disability
- Employer - Job Placements with Financial Supports
- Job Seeker - Diagnostic Assessment
- Basic Needs - Housing
- Employer - Job Accommodation
- Job Seeker - Accommodation Needs - Assistive Devices and Adaptive Technology
- Basic Needs - Financial Support
- Health Supports - Primary Care and Ongoing Medical Concerns
- Employer Coaching

Triggered by CA Questions:
CA90, CA23
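
The notebook does not show `trigger_mappings.json`. Judging from how `evaluate_triggered_interventions` iterates over it, the file maps each CA number to a dictionary of answers, each pointing to a list of interventions. The sketch below uses a hypothetical mapping of that shape; the answer keys and the interventions attached to them are invented for illustration. It also uses exact matching, which is a deliberate simplification of the notebook's substring test (`ca_answer in answer`):

```python
import json

# Hypothetical mapping; the real trigger_mappings.json is not shown in the notebook
example_json = """
{
  "CA23": {"Yes": ["Basic Needs - Housing"]},
  "CA108": {"<=5": ["Job Seeker - Diagnostic Assessment"]}
}
"""
example_mappings = json.loads(example_json)

def triggered(mappings, answers):
    """Return sorted (intervention, CA number) pairs triggered by a dict of answers."""
    hits = []
    for ca, triggers in mappings.items():
        if ca not in answers:
            continue
        answer = answers[ca]
        for key, interventions in triggers.items():
            if ca == "CA108":
                # Numeric rule taken from the notebook: a value of 5 or less triggers.
                # The key itself is ignored for this question.
                try:
                    matched = int(answer) <= 5
                except (TypeError, ValueError):
                    matched = False
            else:
                # Exact match (assumption; the notebook uses a substring test)
                matched = answer == key
            if matched:
                hits.extend((intervention, ca) for intervention in interventions)
    return sorted(set(hits))

print(triggered(example_mappings, {"CA23": "Yes", "CA108": "4"}))
print(triggered(example_mappings, {"CA23": "Yes", "CA108": "9"}))
```

Returning pairs keeps each intervention linked to the question that triggered it, which the two separate de-duplicated lists in the notebook do not preserve.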
