## Weightage Calculation & Pillar Creation

### Weightage Calculation  
To determine the importance of each feature within its respective pillar, we use:  
- **Confirmatory Factor Analysis (CFA)**  
- **SHAP (SHapley Additive exPlanations) values from a Random Forest model**  

Each method contributes **50%** to the final weight calculation.  
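In formula form, the final weight of metric $j$ in pillar $p$ could be written as below. This assumes (it is not specified above) that each method's scores are first rescaled to sum to 1 within the pillar:

$$
w^{\mathrm{CFA}}_j = \frac{|\lambda_j|}{\sum_{k \in p} |\lambda_k|}, \qquad
w^{\mathrm{SHAP}}_j = \frac{\bar{\phi}_j}{\sum_{k \in p} \bar{\phi}_k}, \qquad
w_j = 0.5\, w^{\mathrm{CFA}}_j + 0.5\, w^{\mathrm{SHAP}}_j
$$

Here $\lambda_k$ is the standardized CFA loading and $\bar{\phi}_k$ the mean absolute SHAP value of metric $k$. Taking absolute loadings is an assumed convention, so a metric with a negative loading still receives a positive weight. With this normalization the final weights $w_j$ also sum to 1 within each pillar.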

#### Steps:  
1. **CFA Contribution (50%)**:  
   - Estimates standardized factor loadings that quantify how strongly each observed metric relates to its underlying latent construct (pillar).  
   - Model fit indices (CFI, TLI, RMSEA) indicate whether the selected metrics adequately represent the intended pillar.  

2. **Random Forest SHAP Contribution (50%)**:  
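   - SHAP values attribute the Random Forest's predictions to individual features, which makes feature importance interpretable.  
   - Larger absolute SHAP values (typically averaged over all observations) indicate a stronger influence on the target outcome.  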

3. **Final Weight Calculation**:  
   - The CFA-based and SHAP-based weights are combined with equal (50/50) weighting to give a balanced final weight for each metric within its pillar.  
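A minimal pandas sketch of the three steps for a single vendor/brand/category group follows. Several things in it are assumptions, not taken from the project code: the function name `combine_weights` and the input columns `loading` and `shap` are hypothetical, absolute values are used so that negative loadings still yield positive weights, and each method's scores are normalized to sum to 1 per pillar before the 50/50 blend.

```python
import pandas as pd


def combine_weights(cfa_loadings: pd.DataFrame, shap_importance: pd.DataFrame) -> pd.DataFrame:
    """Blend CFA- and SHAP-based weights 50/50 within each pillar (illustrative sketch).

    Hypothetical inputs:
      cfa_loadings:    columns pillar, metric, loading (standardized CFA loading)
      shap_importance: columns pillar, metric, shap    (mean |SHAP| per metric)
    """
    cfa = cfa_loadings.copy()
    shap = shap_importance.copy()

    # Step 1: CFA weight = |loading| normalized to sum to 1 within each pillar
    cfa["abs_loading"] = cfa["loading"].abs()
    cfa["w_cfa"] = cfa["abs_loading"] / cfa.groupby("pillar")["abs_loading"].transform("sum")

    # Step 2: SHAP weight = |SHAP| normalized to sum to 1 within each pillar
    shap["abs_shap"] = shap["shap"].abs()
    shap["w_shap"] = shap["abs_shap"] / shap.groupby("pillar")["abs_shap"].transform("sum")

    # Step 3: equal 50/50 contribution from each method
    merged = cfa[["pillar", "metric", "w_cfa"]].merge(
        shap[["pillar", "metric", "w_shap"]], on=["pillar", "metric"], how="inner"
    )
    merged["final_weight"] = 0.5 * merged["w_cfa"] + 0.5 * merged["w_shap"]
    return merged


# Toy example with two hypothetical metrics in one pillar
cfa_loadings = pd.DataFrame(
    {"pillar": ["advocacy", "advocacy"], "metric": ["m1", "m2"], "loading": [0.8, -0.2]}
)
shap_importance = pd.DataFrame(
    {"pillar": ["advocacy", "advocacy"], "metric": ["m1", "m2"], "shap": [0.0001, 0.0003]}
)
weights = combine_weights(cfa_loadings, shap_importance)
print(weights)
```

The inner merge keeps only metrics scored by both methods. Whether the real pipeline drops or otherwise handles unmatched metrics is not stated in this section.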

---

### Pillar Creation  
Once the final weightages are determined, we construct the **pillar values** from the harmonized dataset.  

#### Steps:  
1. Multiply each metric by its final weight within the pillar.  
2. Sum the weighted values within each pillar to produce the pillar score.  
3. Store the processed pillar data for further modeling and analysis.  
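Steps 1 and 2 can be sketched as a weighted sum per pillar. This assumes the scaled data has one column per metric and that the weights table has the hypothetical columns `pillar`, `metric` and `final_weight`; `build_pillar_scores` is an illustrative name, not the project's function.

```python
import pandas as pd


def build_pillar_scores(data: pd.DataFrame, weights: pd.DataFrame) -> pd.DataFrame:
    """Weighted sum of metrics per pillar (illustrative sketch).

    data:    one row per period, one column per (already scaled) metric
    weights: hypothetical columns pillar, metric, final_weight
    """
    scores = {}
    for pillar, grp in weights.groupby("pillar"):
        w = grp.set_index("metric")["final_weight"]
        # Step 1: multiply each metric column by its weight (aligned on column names)
        weighted = data[w.index].mul(w, axis=1)
        # Step 2: sum the weighted metrics to get the pillar score for each row
        scores[pillar] = weighted.sum(axis=1)
    return pd.DataFrame(scores, index=data.index)


# Toy example with hypothetical metrics m1, m2, m3
data = pd.DataFrame({"m1": [1.0, 0.5], "m2": [0.0, 1.0], "m3": [2.0, 2.0]})
weights = pd.DataFrame(
    {
        "pillar": ["advocacy", "advocacy", "awareness"],
        "metric": ["m1", "m2", "m3"],
        "final_weight": [0.6, 0.4, 1.0],
    }
)
pillar_df = build_pillar_scores(data, weights)
print(pillar_df)
```

For step 3, something like `pillar_df.to_csv(paths["pillar_data_path"])` would write to the `pillar_data_path` entry of the `paths` dictionary printed by `data_prepare`; whether the project's own code stores the data exactly this way is not shown here.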

This structured approach ensures that each pillar reflects both **statistical relationships (CFA)** and **model-driven importance (RF SHAP)**. 🚀  


```python
import sys
import os

# Make the project root (the parent of the notebook folder) importable
project_path = os.path.abspath("..")

if project_path not in sys.path:
    sys.path.append(project_path)

import numpy as np
import pandas as pd

from src.brand_health_centre.data_preparation import data_prepare
```

```python
# Edit config.yml as described in the project documentation before running.
# Update config_path to the location of config.yml in your own checkout.
config_path = r"D:\BRAND_HUB_PROJECT\brandhub-capability\src\brand_health_centre\config.yml"
scaled_data, idv_list, config, paths = data_prepare(config_file_path=config_path)
```


```text
{'filtered_data_path': './output\\filtered_data.csv', 'no_null_imputed_data_path': './output\\no_null_imputed_data.csv', 'scaled_data_path': './output\\scaled_data.csv', 'cfa_fit_data_path': './output\\cfa_fit_data.csv', 'rf_fit_data_path': 'output\\rf_fit_data.csv', 'rf_act_pred_data_path': 'output\\rf_act_pred_data.csv', 'pillar_weights_path': 'output\\pillar_weights.csv', 'pillar_data_path': 'output\\pillar_data.csv', 'trend_past_data_path': 'output\\trend_data.csv', 'scaled_score_data_path': 'output\\scaled_score_data.csv', 'imp_rf_fit_data_path': 'output\\imp_rf_fit_data.csv', 'imp_rf_act_pred_data_path': 'output\\imp_rf_act_pred_data.csv', 'score_card_final_df_path': 'output\\score_card_final_df.csv', 'relative_imp_model_results_path': 'output\\relative_imp_model_results.csv'}
All required columns are present in the DataFrame.
All independent variables in idv_list are present in the data.
Minimum date: 2017-01-07 00:00:00
Maximum date: 2025-01-11 00:00:00
Dropped_columns: [('vend
```

### CFA Modeling

```python
from src.brand_health_centre.cfa_modeling import cfa_py, perform_cfa_analysis

# Fit the CFA measurement models on the scaled data
cfa_df = perform_cfa_analysis(scaled_data, idv_list, config, cfa_py, paths)
```


  return np.sqrt((chi2 / dof - 1) / (model.n_samples - 1))

```text
Index(['rhs', 'op', 'lhs', 'est.std', 'se', 'z', 'pvalue', 'factor_str', 'cfi',
       'tli', 'rmsea', 'vendor', 'brand', 'category', 'seed'],
      dtype='object')
```
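The repeated warning line above is the RMSEA calculation in the CFA fitting code, $\sqrt{(\chi^2/df - 1)/(N - 1)}$. NumPy typically raises a `RuntimeWarning` here when $\chi^2/df < 1$ makes the argument negative (the result is `nan`); a zero $df$ is another possible trigger. The sketch below is not the library's code: it shows the standard definition $\sqrt{\max(\chi^2 - df, 0) / (df\,(N-1))}$, which is algebraically the same quantity but clips at zero and treats $df \le 0$ as undefined. The function name `rmsea` is hypothetical.

```python
import numpy as np


def rmsea(chi2: float, dof: int, n_samples: int) -> float:
    """RMSEA using the max(0, .) convention, so sqrt never sees a negative argument."""
    if dof <= 0 or n_samples <= 1:
        # Undefined for saturated models (dof == 0) or fewer than two observations
        return float("nan")
    return float(np.sqrt(max(chi2 - dof, 0.0) / (dof * (n_samples - 1))))


print(rmsea(chi2=5.0, dof=10, n_samples=200))   # chi2/dof < 1 -> 0.0
print(rmsea(chi2=60.0, dof=10, n_samples=101))  # sqrt(50 / 1000)
```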


```python
cfa_df
```

| | vendor | brand | category | seed | lhs | op | rhs | est.std | se | z | pvalue | factor_str | cfi | tli | rmsea |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | vendor_1 | brand_1 | category_1 | 2 | advocacy | `=~` | directions_funnel_metrics_advocacy_t2b_buyers | 1.000000 | - | - | - | `advocacy =~ directions_funnel_metrics_advocacy...` | 0.688131 | 1.000000 | 0.000000 |
| 1 | vendor_1 | brand_1 | category_1 | 2 | advocacy | `=~` | directions_strategic_measures_brand_love_index | 0.227565 | 0.068392 | 3.327383 | 0.000877 | `advocacy =~ directions_funnel_metrics_advocacy...` | 0.688131 | 1.000000 | 0.000000 |
| 2 | vendor_1 | brand_1 | category_1 | 2 | advocacy | `=~` | social_percent_positive_neutral | -0.079943 | 0.028482 | -2.806788 | 0.005004 | `advocacy =~ directions_funnel_metrics_advocacy...` | 0.688131 | 1.000000 | 0.000000 |
| 3 | vendor_1 | brand_1 | category_1 | 2 | advocacy | `~~` | advocacy | 0.049732 | 0.015477 | 3.213364 | 0.001312 | `advocacy =~ directions_funnel_metrics_advocacy...` | 0.688131 | 1.000000 | 0.000000 |
| 4 | vendor_1 | brand_1 | category_1 | 2 | directions_funnel_metrics_advocacy_t2b_buyers | `~~` | directions_funnel_metrics_advocacy_t2b_buyers | 0.000176 | 0.013594 | 0.012962 | 0.989658 | `advocacy =~ directions_funnel_metrics_advocacy...` | 0.688131 | 1.000000 | 0.000000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4939 | vendor_3 | brand_6 | category_2 | 19 | ratings_reviews_review_sentiment_score_average | `~~` | ratings_reviews_review_sentiment_score_average | 0.000381 | 0.000057 | 6.740209 | 0.0 | `product_feedback =~ ratings_reviews_good_exper...` | 0.954372 | 0.936121 | 0.226292 |
| 4940 | vendor_3 | brand_6 | category_2 | 19 | ratings_reviews_review_sentiment_score_health_... | `~~` | ratings_reviews_review_sentiment_score_health_... | 0.033746 | 0.005021 | 6.721637 | 0.0 | `product_feedback =~ ratings_reviews_good_exper...` | 0.954372 | 0.936121 | 0.226292 |
| 4941 | vendor_3 | brand_6 | category_2 | 19 | ratings_reviews_review_sentiment_score_ingredi... | `~~` | ratings_reviews_review_sentiment_score_ingredi... | 0.062984 | 0.009338 | 6.745307 | 0.0 | `product_feedback =~ ratings_reviews_good_exper...` | 0.954372 | 0.936121 | 0.226292 |
| 4942 | vendor_3 | brand_6 | category_2 | 19 | ratings_reviews_review_sentiment_score_pet_enj... | `~~` | ratings_reviews_review_sentiment_score_pet_enj... | 0.000398 | 0.00006 | 6.584319 | 0.0 | `product_feedback =~ ratings_reviews_good_exper...` | 0.954372 | 0.936121 | 0.226292 |
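One way to turn `cfa_df` into CFA-based weights is sketched below, using only columns visible in the output: keep the loading rows (`op == "=~"`, pillar in `lhs`, metric in `rhs`), average the absolute `est.std` across seeds, and normalize within each vendor/brand/category/pillar. The function name `cfa_metric_weights`, the use of absolute values and the averaging over seeds are assumptions, not the project's confirmed method. Note also that row 0 shows `est.std` = 1.000000 with `-` for `se`, `z` and `pvalue`, which suggests that loading was fixed for identification; whether such fixed loadings should feed the weights as-is is a modeling decision this sketch leaves open.

```python
import pandas as pd


def cfa_metric_weights(cfa_df: pd.DataFrame) -> pd.DataFrame:
    """Turn CFA loadings into per-pillar metric weights (illustrative sketch)."""
    group_keys = ["vendor", "brand", "category"]
    # Keep only measurement rows (pillar =~ metric); variance rows (~~) are dropped
    loadings = cfa_df[cfa_df["op"] == "=~"].rename(columns={"lhs": "pillar", "rhs": "metric"})
    loadings = loadings.assign(
        abs_loading=pd.to_numeric(loadings["est.std"], errors="coerce").abs()
    )
    # Average the absolute loading across seeds for each group/pillar/metric
    mean_loadings = loadings.groupby(
        group_keys + ["pillar", "metric"], as_index=False
    )["abs_loading"].mean()
    # Normalize so that each pillar's weights sum to 1 within a group
    totals = mean_loadings.groupby(group_keys + ["pillar"])["abs_loading"].transform("sum")
    mean_loadings["w_cfa"] = mean_loadings["abs_loading"] / totals
    return mean_loadings


# Toy data shaped like cfa_df (hypothetical metrics m1, m2; two seeds)
demo = pd.DataFrame(
    {
        "vendor": ["vendor_1"] * 5,
        "brand": ["brand_1"] * 5,
        "category": ["category_1"] * 5,
        "seed": [1, 1, 2, 2, 1],
        "lhs": ["advocacy"] * 5,
        "op": ["=~", "=~", "=~", "=~", "~~"],
        "rhs": ["m1", "m2", "m1", "m2", "advocacy"],
        "est.std": [1.0, -0.2, 1.0, -0.4, 0.05],
    }
)
cfa_w = cfa_metric_weights(demo)
print(cfa_w)
```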


### RF Modeling


```python
from src.brand_health_centre.rf_modeling import train_and_evaluate_group_models_parallel  # (Parallel process function)

rf_df, rf_act_pred_df = train_and_evaluate_group_models_parallel(
    config, scaled_data, idv_list, paths
)
```



```python
rf_df
```

| | vendor | brand | category | pillar | metric | feature_importance | shap_values | model_type | latest_dv | r2_score_train | mape_train | r2_score_fold | mape_fold | r2_score_hold_out | mape_hold_out | r2_score_all | mape_all | best_params_gridsearchcv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | vendor_1 | brand_1 | category_1 | advocacy | social_percent_positive_neutral | 0.432657 | 0.000109 | RandomForest | 0.008493 | 0.560270 | 0.023552 | 0.208684 | 0.032461 | 0.061860 | 0.048041 | 0.495530 | 0.025848 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
| 1 | vendor_1 | brand_1 | category_1 | advocacy | directions_funnel_metrics_advocacy_t2b_buyers | 0.327169 | 0.000066 | RandomForest | 0.008493 | 0.560270 | 0.023552 | 0.208684 | 0.032461 | 0.061860 | 0.048041 | 0.495530 | 0.025848 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
| 2 | vendor_1 | brand_1 | category_1 | advocacy | directions_strategic_measures_brand_love_index | 0.240174 | 0.000061 | RandomForest | 0.008493 | 0.560270 | 0.023552 | 0.208684 | 0.032461 | 0.061860 | 0.048041 | 0.495530 | 0.025848 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
| 3 | vendor_1 | brand_1 | category_1 | awareness | directions_awareness_total_awareness_net_mentions | 0.416056 | 0.000084 | RandomForest | 0.008493 | 0.391080 | 0.031468 | -0.004981 | 0.037562 | 0.026878 | 0.029149 | 0.372070 | 0.031250 | `{'max_depth': 2, 'max_features': 2, 'n_estimat...` |
| 4 | vendor_1 | brand_1 | category_1 | awareness | directions_awareness_unaided_awareness_net_men... | 0.333854 | 0.000051 | RandomForest | 0.008493 | 0.391080 | 0.031468 | -0.004981 | 0.037562 | 0.026878 | 0.029149 | 0.372070 | 0.031250 | `{'max_depth': 2, 'max_features': 2, 'n_estimat...` |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 280 | vendor_3 | brand_6 | category_2 | product_feedback | ratings_reviews_review_sentiment_score_pet_enj... | 0.128842 | 0.000091 | RandomForest | 0.060086 | 0.593627 | 0.022917 | -0.029760 | 0.033660 | -0.090310 | 0.036418 | 0.542089 | 0.024182 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
| 281 | vendor_3 | brand_6 | category_2 | product_feedback | ratings_reviews_positive_ratings_percentage | 0.124194 | 0.000256 | RandomForest | 0.060086 | 0.593627 | 0.022917 | -0.029760 | 0.033660 | -0.090310 | 0.036418 | 0.542089 | 0.024182 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
| 282 | vendor_3 | brand_6 | category_2 | product_feedback | ratings_reviews_review_sentiment_score_health_... | 0.121246 | 0.000150 | RandomForest | 0.060086 | 0.593627 | 0.022917 | -0.029760 | 0.033660 | -0.090310 | 0.036418 | 0.542089 | 0.024182 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
| 283 | vendor_3 | brand_6 | category_2 | product_feedback | ratings_reviews_review_sentiment_score_average | 0.110100 | 0.000132 | RandomForest | 0.060086 | 0.593627 | 0.022917 | -0.029760 | 0.033660 | -0.090310 | 0.036418 | 0.542089 | 0.024182 | `{'max_depth': 4, 'max_features': 2, 'n_estimat...` |
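The SHAP half of the weighting can be derived from `rf_df` in a similar way: normalize the absolute `shap_values` so they sum to 1 within each vendor/brand/category/pillar. The function name `shap_metric_weights` and the normalization step are assumptions for illustration. The sample uses the three advocacy rows for vendor_1 / brand_1 / category_1 from the output above.

```python
import pandas as pd


def shap_metric_weights(rf_df: pd.DataFrame) -> pd.DataFrame:
    """Normalize |SHAP| per vendor/brand/category/pillar (illustrative sketch)."""
    keys = ["vendor", "brand", "category", "pillar"]
    out = rf_df[keys + ["metric", "shap_values"]].copy()
    out["abs_shap"] = out["shap_values"].abs()
    # Each pillar's SHAP-based weights sum to 1 within a group
    out["w_shap"] = out["abs_shap"] / out.groupby(keys)["abs_shap"].transform("sum")
    return out


# The three advocacy rows for vendor_1 / brand_1 / category_1 in rf_df
sample = pd.DataFrame(
    {
        "vendor": ["vendor_1"] * 3,
        "brand": ["brand_1"] * 3,
        "category": ["category_1"] * 3,
        "pillar": ["advocacy"] * 3,
        "metric": [
            "social_percent_positive_neutral",
            "directions_funnel_metrics_advocacy_t2b_buyers",
            "directions_strategic_measures_brand_love_index",
        ],
        "shap_values": [0.000109, 0.000066, 0.000061],
    }
)
shap_w = shap_metric_weights(sample)
print(shap_w[["metric", "w_shap"]])
```

With the rounded values shown in the table, this gives roughly 0.46, 0.28 and 0.26. These SHAP weights could then be merged with CFA-based weights on `vendor`, `brand`, `category`, `pillar` and `metric` before the 50/50 blend.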


```python
rf_act_pred_df
```

| | vendor | brand | category | pillar | actual | predicted | model_type |
|---|---|---|---|---|---|---|---|
| 0 | vendor_1 | brand_1 | category_1 | advocacy | 0.008234 | 0.007793 | RandomForest |
| 1 | vendor_1 | brand_1 | category_1 | advocacy | 0.008066 | 0.007761 | RandomForest |
| 2 | vendor_1 | brand_1 | category_1 | advocacy | 0.007958 | 0.007909 | RandomForest |
| 3 | vendor_1 | brand_1 | category_1 | advocacy | 0.007795 | 0.007909 | RandomForest |
| 4 | vendor_1 | brand_1 | category_1 | advocacy | 0.007716 | 0.007659 | RandomForest |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4603 | vendor_3 | brand_6 | category_2 | product_feedback | 0.062521 | 0.059144 | RandomForest |
| 4604 | vendor_3 | brand_6 | category_2 | product_feedback | 0.060978 | 0.058954 | RandomForest |
| 4605 | vendor_3 | brand_6 | category_2 | product_feedback | 0.059843 | 0.057774 | RandomForest |
| 4606 | vendor_3 | brand_6 | category_2 | product_feedback | 0.060110 | 0.058688 | RandomForest |
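The `r2_score_*` and `mape_*` columns in `rf_df` can be sanity-checked against actual-versus-predicted pairs like these. The sketch computes R² as $1 - SS_{res}/SS_{tot}$ and MAPE as a fraction (not a percentage), which is consistent with the magnitudes of roughly 0.02 to 0.05 in the `mape_*` columns. Which rows of `rf_act_pred_df` belong to the train, fold or hold-out splits is not shown, so the function (hypothetically named `fit_metrics`) simply scores whatever rows it is given, per vendor/brand/category/pillar.

```python
import numpy as np
import pandas as pd


def fit_metrics(act_pred: pd.DataFrame) -> pd.DataFrame:
    """R² and MAPE per vendor/brand/category/pillar (illustrative sketch)."""
    keys = ["vendor", "brand", "category", "pillar"]

    def _scores(g: pd.DataFrame) -> pd.Series:
        y = g["actual"].to_numpy(dtype=float)
        y_hat = g["predicted"].to_numpy(dtype=float)
        ss_res = np.sum((y - y_hat) ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        return pd.Series(
            {
                # R² is undefined when the actual values are constant
                "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else np.nan,
                # MAPE as a fraction; assumes no actual value is zero
                "mape": float(np.mean(np.abs((y - y_hat) / y))),
            }
        )

    return act_pred.groupby(keys)[["actual", "predicted"]].apply(_scores).reset_index()


# First five rows of rf_act_pred_df from the output above
demo = pd.DataFrame(
    {
        "vendor": ["vendor_1"] * 5,
        "brand": ["brand_1"] * 5,
        "category": ["category_1"] * 5,
        "pillar": ["advocacy"] * 5,
        "actual": [0.008234, 0.008066, 0.007958, 0.007795, 0.007716],
        "predicted": [0.007793, 0.007761, 0.007909, 0.007909, 0.007659],
        "model_type": ["RandomForest"] * 5,
    }
)
metrics = fit_metrics(demo)
print(metrics)
```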
