
# Training and Performance Metrics

In [None]:
from sklearn.metrics import (
    accuracy_score,  # Calculates the ratio of correctly predicted instances to total instances
    precision_score,  # Measures the proportion of true positive predictions out of all positive predictions
    recall_score,  # Measures the proportion of true positives identified out of all actual positives
    f1_score,  # Harmonic mean of precision and recall, balancing the two metrics
    confusion_matrix,  # Summarizes prediction results as a matrix of True Positives, False Positives, etc.
    classification_report,  # Generates a detailed report including precision, recall, f1-score, and support
    roc_auc_score,  # Computes the Area Under the Receiver Operating Characteristic Curve (ROC AUC)
    roc_curve,  # Calculates the Receiver Operating Characteristic curve data (TPR vs. FPR)
    matthews_corrcoef  # Measures the quality of binary classifications with a balanced metric
)

#### Explanation of Metrics:

1. Accuracy:

    Represents the overall correctness of the model's predictions.
    Best suited when the dataset is balanced.

2. Precision:

    The proportion of predicted positives that are actually positive; high precision means few false positives among the positive predictions.
    Useful when false positives are more costly than false negatives.

3. Recall:

    Also known as sensitivity or true positive rate.
    Important in scenarios where missing a positive case is costly (e.g., medical diagnoses).

4. F1-Score:

    Combines precision and recall into a single metric, particularly useful for imbalanced datasets.
    A high F1-score indicates a good balance between precision and recall.

5. Confusion Matrix:

    A matrix summarizing true positives, true negatives, false positives, and false negatives.
    Provides a comprehensive view of prediction errors.

6. Classification Report:

    Includes precision, recall, F1-score, and support (number of true instances for each class).
    Useful for understanding model performance across all classes.

7. ROC AUC:

    Measures the ability of the classifier to distinguish between classes.
    A value closer to 1 indicates better performance; 0.5 corresponds to random ranking.

8. ROC Curve:

    Plots the true positive rate (TPR) against the false positive rate (FPR) at various thresholds.
    Visual representation of classifier performance.

9. Matthews Correlation Coefficient (MCC):

    A balanced metric even for imbalanced datasets.
    Values range from -1 (total disagreement) through 0 (no better than random) to +1 (perfect prediction).
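
As a worked check of these definitions, the sketch below recomputes the threshold-based metrics by hand from the Logistic Regression confusion matrix reported in the results table later in this notebook (`[[1472, 13], [20, 1497]]`). The variable names are illustrative, and the layout assumes scikit-learn's convention of rows for true labels and columns for predicted labels.

```python
import math

# Logistic Regression confusion matrix, laid out as [[TN, FP], [FN, TP]]
tn, fp = 1472, 13
fn, tp = 20, 1497

accuracy = (tp + tn) / (tp + tn + fp + fn)        # correct predictions / all predictions
precision = tp / (tp + fp)                        # correct positives / predicted positives
recall = tp / (tp + fn)                           # correct positives / actual positives
specificity = tn / (tn + fp)                      # correct negatives / actual negatives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"Accuracy:    {accuracy:.6f}")
print(f"Precision:   {precision:.6f}")
print(f"Recall:      {recall:.6f}")
print(f"Specificity: {specificity:.6f}")
print(f"F1-Score:    {f1:.6f}")
print(f"MCC:         {mcc:.6f}")
```

The printed values should agree with the Logistic Regression row of the summary table to the displayed precision.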

##### When to Use:
1. Balanced datasets: Accuracy and F1-score.
2. Imbalanced datasets: Precision, recall, ROC AUC, and MCC.
3. Detailed analysis: Classification report and confusion matrix.
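
To see why accuracy alone can mislead on imbalanced data, consider a minimal sketch with made-up labels (95 negatives, 5 positives) and a trivial classifier that always predicts the majority class. The data here is hypothetical and unrelated to the stock dataset.

```python
# Hypothetical imbalanced test set: 95 negatives and 5 positives
y_true = [0] * 95 + [1] * 5
# A trivial "classifier" that always predicts the majority class
y_pred = [0] * 100

# Count the confusion-matrix cells by hand
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(f"Accuracy: {accuracy:.2f}")  # high, despite the model being useless
print(f"Recall:   {recall:.2f}")    # no positive case is ever found
```

Accuracy looks strong while recall shows that every positive case is missed, which is why recall, precision, and MCC are preferred when classes are imbalanced.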

---

1. Feature Scaling:

    -> Only applied to models like Logistic Regression, SVM, and KNN because they are sensitive to the scale of input features.

    -> Ensemble models like Random Forest and Gradient Boosting do not require feature scaling.

2. Metrics:

    A wide range of metrics is calculated to provide a comprehensive evaluation of each model’s performance.

    Models that lack predict_proba are handled separately: their ROC AUC is skipped and recorded as None.

3. Confusion Matrix:

    Provides a granular view of model predictions in terms of True Positives, False Positives, True Negatives, and False Negatives.

4. Classification Report:

    Includes precision, recall, F1-score, and support for each class.

5. Results Dictionary:

    Each model's metrics are stored in a nested dictionary for easy conversion into a DataFrame for better readability.

6. DataFrame Summary:

    The results dictionary is converted into a DataFrame to provide a tabular summary of all models’ performances.
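
The notebook records no ROC AUC for models without `predict_proba`. One possible alternative, sketched below, is to fall back to `decision_function`, which `roc_auc_score` also accepts as a ranking score. This is an assumption-laden illustration, not what the notebook does: `get_positive_scores` is a hypothetical helper, the data is synthetic, and the notebook's "Support Vector Machine" is assumed to be scikit-learn's `SVC` with default settings.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.svm import SVC


def get_positive_scores(model, X):
    """Return a continuous score for the positive class, or None if unavailable."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return None


# Synthetic data, used only to make the sketch runnable
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

svm = SVC()  # probability=False by default, so predict_proba is unavailable
svm.fit(X_demo, y_demo)

scores = get_positive_scores(svm, X_demo)
auc = roc_auc_score(y_demo, scores)
print(f"ROC AUC from decision_function: {auc:.4f}")
```

The scores are evaluated on the training data here purely to keep the sketch short; in the notebook's loop they would be computed on `X_test_`.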

In [None]:
# Dictionary to store results of all models
results = {}

# Train each model and evaluate performance
for name, model in models.items():
    # Apply feature scaling for specific models that are sensitive to scale
    if name in ("Logistic Regression", "Support Vector Machine", "K-Nearest Neighbors"):
        X_train_ = scaler.fit_transform(X_train)  # Fit and transform the training data
        X_test_ = scaler.transform(X_test)  # Transform the test data
    else:
        X_train_ = X_train  # Use raw data for other models
        X_test_ = X_test

    # Train the model on the training data
    model.fit(X_train_, y_train)

    # Predict labels on the test data
    y_pred = model.predict(X_test_)

    # Predict probabilities if the model supports it
    y_pred_proba = model.predict_proba(X_test_)[:, 1] if hasattr(model, "predict_proba") else None

    # Calculate the confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    tn, fp, fn, tp = cm.ravel()  # Extract true negatives, false positives, false negatives, and true positives

    # Calculate various performance metrics
    accuracy = accuracy_score(y_test, y_pred)  # Overall accuracy
    precision = precision_score(y_test, y_pred, zero_division=1)  # Precision; zero_division=1 reports 1.0 when no positives are predicted
    recall = recall_score(y_test, y_pred)  # Sensitivity/Recall
    f1 = f1_score(y_test, y_pred)  # F1-Score (harmonic mean of precision and recall)
    specificity = tn / (tn + fp) if (tn + fp) > 0 else 0  # Specificity: True Negative Rate
    roc_auc = roc_auc_score(y_test, y_pred_proba) if y_pred_proba is not None else None  # ROC AUC Score
    mcc = matthews_corrcoef(y_test, y_pred)  # Matthews Correlation Coefficient

    # Store all metrics in the results dictionary
    results[name] = {
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall (Sensitivity)': recall,
        'Specificity': specificity,
        'F1-Score': f1,
        'ROC AUC': roc_auc,
        'MCC': mcc,
        'Confusion Matrix': cm
    }

    # Print detailed metrics and reports for each model
    print(f"\n{name} Results:")
    print(f"Accuracy: {accuracy:.4f}")
    print(f"Precision: {precision:.4f}")
    print(f"Recall (Sensitivity): {recall:.4f}")
    print(f"Specificity: {specificity:.4f}")
    print(f"F1-Score: {f1:.4f}")
    if roc_auc is not None:
        print(f"ROC AUC: {roc_auc:.4f}")
    print(f"MCC: {mcc:.4f}")
    print("Confusion Matrix:")
    print(cm)
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, zero_division=1))

    print("-" * 80)

# Convert the results dictionary into a DataFrame (one row per model) once all models are evaluated
results_df_1 = pd.DataFrame(results).T

# Display the consolidated DataFrame of results
print("\nSummary of Results 1:")
results_df_1


Random Forest Results:
Accuracy: 1.0000
Precision: 1.0000
Recall (Sensitivity): 1.0000
Specificity: 1.0000
F1-Score: 1.0000
ROC AUC: 1.0000
MCC: 1.0000
Confusion Matrix:
[[1485    0]
 [   0 1517]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      1485
           1       1.00      1.00      1.00      1517

    accuracy                           1.00      3002
   macro avg       1.00      1.00      1.00      3002
weighted avg       1.00      1.00      1.00      3002

--------------------------------------------------------------------------------

Gradient Boosting Results:
Accuracy: 1.0000
Precision: 1.0000
Recall (Sensitivity): 1.0000
Specificity: 1.0000
F1-Score: 1.0000
ROC AUC: 1.0000
MCC: 1.0000
Confusion Matrix:
[[1485    0]
 [   0 1517]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      1485
           1       1.00      1.00  

| Model | Accuracy | Precision | Recall (Sensitivity) | Specificity | F1-Score | ROC AUC | MCC | Confusion Matrix |
|---|---|---|---|---|---|---|---|---|
| Random Forest | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | [[1485, 0], [0, 1517]] |
| Gradient Boosting | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | [[1485, 0], [0, 1517]] |
| AdaBoost | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | [[1485, 0], [0, 1517]] |
| Logistic Regression | 0.989007 | 0.991391 | 0.986816 | 0.991246 | 0.989098 | 0.999629 | 0.978024 | [[1472, 13], [20, 1497]] |
| Support Vector Machine | 0.920053 | 0.92652 | 0.914305 | 0.925926 | 0.920372 | | 0.840186 | [[1375, 110], [130, 1387]] |
| K-Nearest Neighbors | 0.796802 | 0.800132 | 0.796968 | 0.796633 | 0.798547 | 0.878995 | 0.59358 | [[1183, 302], [308, 1209]] |
| Deep Neural Network | 0.49467 | 1.0 | 0.0 | 1.0 | 0.0 | 0.5 | 0.0 | [[1485, 0], [1517, 0]] |


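##### Observations:

1. Random Forest, Gradient Boosting, and AdaBoost score a perfect 1.0 on every metric, with no misclassified test samples.
2. Logistic Regression comes close (98.9% accuracy) but misclassifies 33 test samples: 13 false positives and 20 false negatives.
3. Support Vector Machine reaches about 92% accuracy and misclassifies 240 samples (110 false positives, 130 false negatives). Its ROC AUC is missing because the model does not expose predict_proba.
4. K-Nearest Neighbors reaches only about 80% accuracy and misclassifies 610 samples, far more than SVM.
5. The Deep Neural Network has the worst metrics of all models: it predicts class 0 for every test sample, so recall and F1 are 0. Its precision of 1.0 is an artifact of zero_division=1, since no positive predictions were made.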
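
The Deep Neural Network row in the summary table reports a precision of 1.0 even though its confusion matrix contains no positive predictions. The short sketch below, using made-up labels, shows how the `zero_division` argument of `precision_score` produces that value.

```python
from sklearn.metrics import precision_score

# Hypothetical labels: the classifier never predicts the positive class
y_true = [0, 0, 1, 1]
y_pred = [0, 0, 0, 0]

# Precision is TP / (TP + FP) = 0 / 0 here, so zero_division decides the result
p_one = precision_score(y_true, y_pred, zero_division=1)
p_zero = precision_score(y_true, y_pred, zero_division=0)

print(f"zero_division=1 -> precision = {p_one}")
print(f"zero_division=0 -> precision = {p_zero}")
```

Using `zero_division=0` would arguably give a more conservative summary for a model that never predicts the positive class.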

---

### Checking whether the models are overfitting

1. Imports:

    -> StratifiedKFold for splitting the dataset into stratified folds.

    -> cross_val_score for performing cross-validation.

    -> make_scorer and the metric functions from sklearn.metrics for building scoring functions.

2. Cross-Validation Setup:

    -> StratifiedKFold ensures the proportion of each class is consistent across all folds.

    -> n_splits=5 divides the dataset into 5 folds.

    -> shuffle=True ensures random shuffling of data before splitting.
    
    -> random_state=42 makes the process reproducible.

3. Scoring Functions:

    -> The scoring functions (accuracy, precision, recall, and F1-score) are pre-defined using make_scorer for compatibility with cross_val_score.

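    -> The zero_division=1 argument makes a metric return 1 when its denominator is zero (for example, precision when no positives are predicted). This silences the warning scikit-learn would otherwise emit, but it can inflate the reported score.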

4. Cross-Validation for Each Model:

    -> Loop over models in the models dictionary.

    -> For each model, compute cross-validation scores for all metrics defined in scoring_functions.

    -> Calculate the mean and standard deviation of the scores across the 5 folds.

5. Results Storage:

    Metrics are stored in a nested dictionary cv_results, where each model's results include the mean and standard deviation for all metrics.

6. Summary Table:

    The cv_results dictionary is converted into a Pandas DataFrame (cv_results_df) for easier viewing of results.
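
To illustrate what stratification in step 2 means in practice, the sketch below splits a small set of hypothetical labels (60 of class 0, 40 of class 1) with the same `StratifiedKFold` settings and counts the classes in each test fold. The names `X_demo`, `y_demo`, and `kf_demo` are placeholders, not variables from this notebook.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 60 samples of class 0 and 40 of class 1
y_demo = np.array([0] * 60 + [1] * 40)
X_demo = np.zeros((100, 1))  # placeholder features; stratification depends only on the labels

kf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_counts = []
for fold, (train_idx, test_idx) in enumerate(kf_demo.split(X_demo, y_demo), start=1):
    # Count how many samples of each class land in this test fold
    counts = np.bincount(y_demo[test_idx], minlength=2)
    fold_counts.append(tuple(int(c) for c in counts))
    print(f"Fold {fold}: class 0 = {counts[0]}, class 1 = {counts[1]}")
```

Every test fold keeps the 60/40 class ratio of the full label set, so per-fold metrics are computed on comparable class mixes.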


In [None]:
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
import numpy as np
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

# Define Stratified K-Fold Cross-Validation (5 folds)
kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Dictionary to store the mean and standard deviation of each model's metrics
cv_results = {}

# Define scoring functions using make_scorer outside the loop
scoring_functions = {
    'Accuracy': make_scorer(accuracy_score),
    'Precision': make_scorer(precision_score, zero_division=1),
    'Recall': make_scorer(recall_score, zero_division=1),
    'F1-Score': make_scorer(f1_score, zero_division=1)
}


# Evaluate each model using cross-validation
for name, model in models.items():
    print(f"\n{name} Cross-Validation Results:")
    cv_results[name] = {}

    # Calculate and display metrics
    for metric, scorer in scoring_functions.items():  # Use pre-defined scorers

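        # Perform cross-validation and calculate scores
        # Note: X is passed as-is, so the per-model scaling from the hold-out evaluation is not applied here
        scores = cross_val_score(model, X, y, cv=kf, scoring=scorer)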
        mean_score = np.mean(scores)
        std_score = np.std(scores)

        # Store results in the dictionary
        cv_results[name][f"{metric} Mean"] = mean_score
        cv_results[name][f"{metric} Std"] = std_score

        # Print results for each metric
        print(f"{metric}: Mean = {mean_score:.4f}, Std = {std_score:.4f}")
        print("-" * 80)

# Convert the results dictionary into a DataFrame
cv_results_df = pd.DataFrame(cv_results).T

# Display the DataFrame
print("\nCross-Validation Summary:")
cv_results_df


Random Forest Cross-Validation Results:
Accuracy: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------
Precision: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------
Recall: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------
F1-Score: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------

Gradient Boosting Cross-Validation Results:
Accuracy: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------
Precision: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------
Recall: Mean = 1.0000, Std = 0.0000
--------------------------------------------------------------------------------
F1-Score: Mean = 1.0000, Std = 0.0000
---------------------------------------------

| Model | Accuracy Mean | Accuracy Std | Precision Mean | Precision Std | Recall Mean | Recall Std | F1-Score Mean | F1-Score Std |
|---|---|---|---|---|---|---|---|---|
| Random Forest | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| Gradient Boosting | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| AdaBoost | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 |
| Logistic Regression | 0.988007 | 0.002793 | 0.98698 | 0.00404 | 0.989072 | 0.002295 | 0.988022 | 0.002784 |
| Support Vector Machine | 0.559165 | 0.015078 | 0.577496 | 0.02194 | 0.444559 | 0.011803 | 0.502158 | 0.012214 |
| K-Nearest Neighbors | 0.655919 | 0.009736 | 0.64548 | 0.009154 | 0.691902 | 0.014445 | 0.667832 | 0.01019 |
| Deep Neural Network | 0.5 | 0.000298 | 0.701337 | 0.243873 | 0.074933 | 0.149867 | 0.133333 | 0.266667 |


#### Observations from the Cross-Validation Summary:

1. **Ensemble Models Perform Perfectly**:
   - **Random Forest**, **Gradient Boosting**, and **AdaBoost** achieve a perfect score (1.000) across all metrics.
   - Observations:
     - This might indicate either excellent model fitting or potential data leakage.
     - Double-check preprocessing, cross-validation setup, and data splitting to ensure the models aren't being exposed to the test data during training.

2. **Logistic Regression Performs Very Well**:
   - **Accuracy Mean**: 0.988, with a low standard deviation (0.0028), indicating consistent performance across folds.
   - **Precision Mean**: 0.987, **Recall Mean**: 0.989, and **F1-Score Mean**: 0.988.
   - Observations:
     - Logistic Regression demonstrates reliable and balanced performance.
     - It is slightly behind the ensemble models but still very effective. Its cross-validated accuracy (0.988) closely matches its hold-out accuracy (0.989), even though the cross-validation loop passes the unscaled `X`.

3. **Support Vector Machine (SVM) Struggles**:
   - **Accuracy Mean**: 0.559, **Precision Mean**: 0.577, **Recall Mean**: 0.445, and **F1-Score Mean**: 0.502.
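   - Standard deviations are modest (e.g., **Precision Std**: 0.0219), so the low scores are consistent across folds rather than caused by a single bad fold.
   - Observations:
     - The cross-validation loop passes the raw `X` to `cross_val_score`, so the scaling used in the hold-out evaluation is not applied here. This likely explains much of the drop from 0.920 hold-out accuracy to 0.559; wrapping the scaler and the model in a `Pipeline` would make the two evaluations comparable.
     - If performance stays low after scaling, SVM may need further hyperparameter tuning (e.g., kernel type, C, and gamma values) or better feature selection.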

4. **K-Nearest Neighbors (KNN) Shows Moderate Performance**:
   - **Accuracy Mean**: 0.656, **F1-Score Mean**: 0.668.
   - Consistent performance with low standard deviations across metrics (e.g., **Accuracy Std**: 0.0097).
   - Observations:
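     - KNN could benefit from tuning the number of neighbors (`n_neighbors`) and the distance metric.
     - Like SVM, it is cross-validated on the unscaled `X`, which may account for part of the drop from its 0.797 hold-out accuracy.
     - It performs better than SVM here but is not competitive with the ensemble or logistic models.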

5. **Deep Neural Network (DNN) Performs Poorly**:
   - **Accuracy Mean**: 0.500 (essentially random guessing).
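   - **Precision Mean**: 0.701, but with an extremely high standard deviation (0.2439), indicating unreliable predictions. Folds in which the model predicts no positives score a precision of 1.0 under `zero_division=1`, so this mean is likely inflated.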
   - **Recall Mean**: 0.075, **F1-Score Mean**: 0.133.
   - Observations:
     - The DNN fails to generalize, possibly due to:
       - Insufficient training epochs.
       - Suboptimal architecture (e.g., layer sizes, dropout rates, activation functions).
       - The model might be underfitting or not learning effectively with the given dataset.
     - Consider fine-tuning hyperparameters or increasing the size and quality of the dataset.
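
The cross-validation loop above passes the raw `X` to `cross_val_score`, so scale-sensitive models are evaluated without scaling. A minimal sketch of one way to address this is shown below: the scaler and model are combined in a pipeline so that scaling is refit on each training fold, and `cross_validate` collects several metrics in a single pass along with train scores for an overfitting check. The synthetic data, the choice of `StandardScaler`, and the `_demo` names are assumptions for illustration; the notebook's own `scaler` and dataset may differ.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the notebook's X and y
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# The scaler is fit on each training fold only, which avoids leaking test-fold statistics
scaled_svm = make_pipeline(StandardScaler(), SVC())
kf_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

metrics = ["accuracy", "precision", "recall", "f1"]
cv_out = cross_validate(
    scaled_svm,
    X_demo,
    y_demo,
    cv=kf_demo,
    scoring=metrics,
    return_train_score=True,  # train vs. test gap hints at overfitting
)

for metric in metrics:
    test_scores = cv_out[f"test_{metric}"]
    train_scores = cv_out[f"train_{metric}"]
    print(
        f"{metric}: test mean = {test_scores.mean():.4f}, "
        f"test std = {test_scores.std():.4f}, train mean = {train_scores.mean():.4f}"
    )
```

A large gap between the train and test means for a model would point to overfitting, while similar values suggest the model generalizes across folds.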



The metrics and cross-validation results are saved to CSV files for later visualization.

In [None]:
# Save the hold-out metrics to a CSV file (model names are kept as the index column)
results_df_1.to_csv("metrics_results.csv", index=True)
# Save the cross-validation results to a CSV file
cv_results_df.to_csv("cross_validation_results.csv", index=True)