## Data Import

* Feature Importance: Analyze the contribution of each feature to the final model’s predictions using coefficients and importance scores.
* ROC Curve: Plot the ROC curve to evaluate the model’s performance in distinguishing between bankrupt and non-bankrupt companies.
* SHAP Analysis: Use SHAP (SHapley Additive exPlanations) to visualize and explain individual predictions, highlighting key factors influencing bankruptcy risk.
* Business Insights and Implications: Translate model findings into actionable insights for stakeholders, including recommendations for risk mitigation and investment strategies.

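### Required Data
* Trained Model: Your final trained model (likely a Logistic Regression or Cox Proportional Hazards model). The cells below call predict_proba, so a model that does not expose that method needs an equivalent risk score instead.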
* Test Dataset: A held-out portion of your data not used during training
* Feature Matrix: The processed features used for prediction (X_test)
* Target Variable: The actual bankruptcy outcomes (y_test)
* Prediction Scores: Probability outputs from your model on the test data
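
The cells below assume these objects already exist. As a minimal stand-in, assuming a scikit-learn binary classifier and using synthetic data in place of the real bankruptcy dataset (the class balance, feature count, and hyperparameters are invented for illustration), they could be produced roughly like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: roughly 5% positive ("bankrupt") cases.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=4,
    weights=[0.95, 0.05], random_state=42,
)

# Hold out a stratified test set so both splits keep the rare positive class.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Trained model; class_weight="balanced" is one common choice for rare events.
model = LogisticRegression(max_iter=1000, class_weight="balanced")
model.fit(X_train, y_train)

# Prediction scores: probability of the positive class for each test row.
y_pred_proba = model.predict_proba(X_test)[:, 1]
```

With the real project, `X_train`, `X_test`, `y_train`, and `y_test` would come from the existing train/test split rather than from `make_classification`.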

### Performance Metrics Visualization: ROC Curves

In [None]:
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

# Assuming you have:
# model = your trained model
# X_test = your test features
# y_test = your test labels

# Get prediction probabilities
y_pred_proba = model.predict_proba(X_test)[:, 1]  # For binary classification

# Calculate ROC curve points
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure(figsize=(10, 8))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.savefig('roc_curve.png', dpi=300, bbox_inches='tight')
plt.show()
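
To make concrete what `roc_curve` and `auc` compute in the cell above, here is a dependency-free sketch on a handful of made-up scores. The helper names `roc_points` and `trapezoid_auc` and the toy data are illustrative only; for real work the scikit-learn functions should be used.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping the threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # Visit distinct thresholds from the highest score to the lowest.
    thresholds = sorted(set(scores), reverse=True)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is flagged
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


# Toy example: 1 = bankrupt, 0 = non-bankrupt (made-up labels and scores).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

fpr, tpr = roc_points(y_true, scores)
roc_auc = trapezoid_auc(fpr, tpr)
print(f"AUC = {roc_auc:.2f}")
```

Here the area is 0.75: of the four possible (bankrupt, non-bankrupt) pairs, three are ranked correctly by the scores, which is the pairwise-ranking reading of AUC.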

### Model Interpretation



1. SHAP Summary Plot
2. Feature Importance Bar Charts
3. Coefficient Plots for Logistic Regression

In [None]:
import shap
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Assuming you have:
# 1. A trained pipeline called 'best_model' from your bankruptcy prediction project
# 2. X_train and X_test datasets with your financial indicators
# 3. Feature names stored in your dataset

# --- Extract Steps from Your Best Pipeline ---
preprocessor = best_model.named_steps['preprocessor']  # Adjust if your pipeline has different step names
classifier = best_model.named_steps['classifier']  # This might be 'regressor' or another name in your pipeline

# --- Transform the Data ---
X_train_transformed = preprocessor.transform(X_train)
X_test_transformed = preprocessor.transform(X_test)

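# Recover the feature names that match the transformed data.
# get_feature_names_out() is available on ColumnTransformer and most transformers in
# scikit-learn >= 1.0; the names it returns may carry a prefix such as 'num__'.
# If your preprocessor lacks the method, generic placeholder names are used instead;
# replace them with your actual feature names after preprocessing.
try:
    feature_names = list(preprocessor.get_feature_names_out())
except AttributeError:
    feature_names = [f"feature_{i}" for i in range(X_train_transformed.shape[1])]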

# --- Create a Background Sample for SHAP ---
background = shap.sample(X_train_transformed, 100, random_state=42)

# --- Create the SHAP Explainer ---
# For classification models, explain predict_proba rather than predict so that
# SHAP values are expressed in units of bankruptcy probability
explainer = shap.KernelExplainer(
    lambda x: classifier.predict_proba(x)[:,1],  # For binary classification, focus on positive class
    background
)

# --- Local Explanation (for a single company) ---
# Select one observation from the test set
observation = X_test_transformed[0:1, :]
shap_values = explainer.shap_values(observation)

# Create a force plot for the selected company.
# matplotlib=True renders a static figure, so shap.initjs() is not needed here.
# show=False keeps the figure open so the title below is applied to it
# rather than to a new, empty figure.
shap.force_plot(
    explainer.expected_value,
    shap_values,
    features=observation,
    feature_names=feature_names,
    matplotlib=True,
    show=False
)
plt.title("SHAP Force Plot: Factors Influencing Bankruptcy Prediction")
plt.tight_layout()
plt.show()

# --- Global Explanation (for understanding overall model behavior) ---
# Compute SHAP values for a sample of the test data
X_test_sample = shap.sample(X_test_transformed, 200, random_state=42)
shap_values_test = explainer.shap_values(X_test_sample)

# Create a summary plot to show overall feature importance.
# summary_plot resizes the current figure by default, so the size is passed via
# plot_size instead of plt.figure(figsize=...). show=False keeps the figure open
# so the title is applied to it.
shap.summary_plot(
    shap_values_test,
    features=X_test_sample,
    feature_names=feature_names,
    plot_type="bar",  # Mean absolute SHAP value per feature
    plot_size=(12, 8),
    show=False
)
plt.title("Feature Importance Based on SHAP Values")
plt.tight_layout()
plt.show()

# Create a detailed summary plot showing the distribution of SHAP values
shap.summary_plot(
    shap_values_test,
    features=X_test_sample,
    feature_names=feature_names,
    plot_size=(12, 10),
    show=False
)
plt.title("SHAP Summary Plot: Impact of Features on Bankruptcy Prediction")
plt.tight_layout()
plt.show()

# --- Additional Visualization: SHAP Dependence Plots ---
# For the top 3 most important features (per the README: debt-to-equity ratio, current ratio, operating cash flow).
# The names below must match entries in feature_names exactly; adjust them if your
# preprocessor adds prefixes or renames columns. Names that are not found are skipped.
top_features = ["debt_to_equity_ratio", "current_ratio", "operating_cash_flow"]
important_features = [
    feature_names.index(name) for name in top_features if name in feature_names
]

for idx in important_features:
    # dependence_plot creates its own figure; show=False lets the title be added first
    shap.dependence_plot(
        idx,
        shap_values_test,
        X_test_sample,
        feature_names=feature_names,
        show=False
    )
    plt.title(f"SHAP Dependence Plot: {feature_names[idx]}")
    plt.tight_layout()
    plt.show()
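
The interpretation list also mentions coefficient plots for logistic regression, which the SHAP cell does not cover. The following is a minimal sketch, assuming the final classifier is a scikit-learn `LogisticRegression` exposing `coef_` and that the features were standardized so coefficient magnitudes are comparable. The synthetic data and the `feat_i` names are placeholders, not the project's financial indicators.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data and names; swap in the real training set and feature names.
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=0)
demo_names = [f"feat_{i}" for i in range(X_demo.shape[1])]

# Standardize first so that coefficients are on a comparable scale.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_demo, y_demo)

# coef_ has shape (1, n_features) for a binary problem.
coefs = pipe.named_steps["logisticregression"].coef_.ravel()
order = np.argsort(np.abs(coefs))  # smallest magnitude first, so the largest ends up on top

fig, ax = plt.subplots(figsize=(8, 5))
colors = ["tab:red" if c > 0 else "tab:blue" for c in coefs[order]]
ax.barh([demo_names[i] for i in order], coefs[order], color=colors)
ax.axvline(0, color="black", lw=1)
ax.set_xlabel("Coefficient (change in log-odds per standard deviation)")
ax.set_title("Logistic Regression Coefficients")
fig.tight_layout()
plt.show()
```

Red bars raise the predicted log-odds of the positive class and blue bars lower it. With the project's pipeline, `coefs` would instead be read from the `classifier` step extracted earlier, paired with `feature_names`.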


### Key Findings Explanation