# **<h3 align="center">Machine Learning - Project</h3>**
## **<h3 align="center">7. Integration and Final Predictions</h3>**
### **<h3 align="center">Group 30 - Project</h3>**


### Group Members
| Name              | Email                        | Student ID |
|-------------------|------------------------------|------------|
| Alexandra Pinto   | 20211599@novaims.unl.pt      | 20211599   |
| Gonçalo Peres     | 20211625@novaims.unl.pt      | 20211625   |
| Leonor Mira       | 20240658@novaims.unl.pt      | 20240658   |
| Miguel Natário    | 20240498@novaims.unl.pt      | 20240498   |
| Nuno Bernardino   | 20211546@novaims.unl.pt      | 20211546   |


---

### **7. Integration and Final Predictions Notebook**
**Description:**
In this notebook, we integrate the results from all levels of the hierarchy to produce the **final classification outputs** and evaluate the overall pipeline.

Key steps include:
- Loading predictions from **Level 1**, **Level 2 Binary**, and **Level 2 Multi-Class** notebooks.
- **Merging predictions:** Combine outputs from all levels to assign a final class to each case.
- **Post-processing:** Apply any necessary adjustments or probability thresholds to improve consistency.
- **Evaluation:** Assess the pipeline's overall performance using metrics like accuracy, F1-score, and confusion matrices.
- **Output:** Save the final predictions in a structured format for deployment or reporting.

This notebook is the final stage of the hierarchical classification framework: it combines the per-level predictions into a single prediction per claim and writes the submission file.
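
The evaluation step listed above can be sketched as follows. This is a minimal illustration on made-up labels, not the notebook's actual evaluation: `y_true` and `y_pred` are hypothetical arrays, and in practice they would have to come from a labelled validation split, since the test set has no ground truth.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Hypothetical toy labels standing in for a labelled validation split
y_true = np.array([2, 2, 3, 3, 4, 4])
y_pred = np.array([2, 3, 3, 3, 4, 2])

# Share of cases whose final class matches the true class
accuracy = accuracy_score(y_true, y_pred)

# Macro F1 averages the per-class F1 scores, so rare classes weigh as much as frequent ones
macro_f1 = f1_score(y_true, y_pred, average="macro")

# Rows are true classes, columns are predicted classes, in the order given by `labels`
cm = confusion_matrix(y_true, y_pred, labels=[2, 3, 4])

print(f"accuracy={accuracy:.3f}, macro F1={macro_f1:.3f}")
print(cm)
```

Macro averaging is used here on the assumption that the minority classes matter as much as the majority ones; a weighted average would be the alternative if overall volume is the priority.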

---

## Table of Contents
* [1. Import the Libraries](#chapter1)
* [2. Load and Prepare Datasets](#chapter2)
* [3. Merging Results](#chapter3)

# 1. Import the Libraries 📚<a class="anchor" id="chapter1"></a>

In [2]:
# --- Standard Libraries ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import zipfile


# --- Scikit-Learn Modules for Data Partitioning and Preprocessing ---
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler, OneHotEncoder, LabelEncoder, MinMaxScaler, RobustScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline


# --- Feature Selection Methods ---
# Filter Methods
import scipy.stats as stats
from scipy.stats import chi2_contingency
from sklearn.feature_selection import mutual_info_classif, chi2, SelectKBest

# Wrapper Methods
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Embedded Methods
from sklearn.linear_model import LassoCV

# --- Evaluation Metrics ---
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

#from xgboost import XGBClassifier

# --- Warnings ---
import warnings
warnings.filterwarnings('ignore')


# Select only the functions we will actually use in this notebook
from utils import plot_importance, cor_heatmap, find_optimal_features_with_rfe, compare_rf_feature_importances, compare_feature_importances, select_high_score_features_chi2_no_model, select_high_score_features_MIC, metrics

# 2. Load and Prepare Datasets 📁<a class="anchor" id="chapter2"></a>

In [21]:
# Load the first dataset
X_test_final_1 = pd.read_csv('X_test_final_1.csv')
# Load the second dataset
X_test_final_2 = pd.read_csv('X_test_final_2.csv')
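
Before merging, it may be worth confirming that the two files describe the same claims in the same order. The sketch below uses two small hypothetical frames (`part_1`, `part_2`) in place of the real CSVs, and assumes that both files carry the `Claim Identifier` column used later for the submission.

```python
import pandas as pd

# Hypothetical stand-ins for X_test_final_1 and X_test_final_2
part_1 = pd.DataFrame({
    "Claim Identifier": [101, 102, 103],
    "Final_Predictions": [2.0, None, 3.0],
})
part_2 = pd.DataFrame({
    "Claim Identifier": [101, 102, 103],
    "Final_Predictions": [None, 4.0, None],
})

# Both files should have one row per claim, in the same order
same_length = len(part_1) == len(part_2)
same_ids = part_1["Claim Identifier"].equals(part_2["Claim Identifier"])
ids_unique = part_1["Claim Identifier"].is_unique

print(same_length, same_ids, ids_unique)
```

If any of these checks fails on the real data, a merge keyed on `Claim Identifier` is safer than one that relies on row position.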

# 3. Merging Results <a class="anchor" id="chapter3"></a>

In [22]:
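# Combine the Final_Predictions columns, filling missing values (NaNs) in the first dataset with values from the second.
# Note: combine_first aligns on the row index, so both files must list the claims in the same order.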
X_test_final_combined = X_test_final_1.copy()
X_test_final_combined['Final_Predictions'] = X_test_final_1['Final_Predictions'].combine_first(X_test_final_2['Final_Predictions'])
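
Because `combine_first` aligns on the index, the merge above is only correct when both files share the same row order. The following sketch, on hypothetical toy frames (`level_1`, `level_2`), shows how the result changes when the rows are ordered differently, and how indexing by `Claim Identifier` avoids the problem. Whether the real files need this is an assumption to check, not something the notebook states.

```python
import pandas as pd

# Hypothetical toy frames; level_2 lists the same claims in a different order
level_1 = pd.DataFrame({
    "Claim Identifier": [101, 102, 103],
    "Final_Predictions": [2.0, None, None],
})
level_2 = pd.DataFrame({
    "Claim Identifier": [103, 101, 102],
    "Final_Predictions": [4.0, None, 3.0],
})

# Positional merge: gaps are filled from whatever sits at the same row number
by_position = level_1["Final_Predictions"].combine_first(level_2["Final_Predictions"])

# Keyed merge: index both series by claim ID so gaps are filled from the same claim
by_key = (
    level_1.set_index("Claim Identifier")["Final_Predictions"]
    .combine_first(level_2.set_index("Claim Identifier")["Final_Predictions"])
)

print(by_position.tolist())
print(by_key.to_dict())
```

In the positional version, claim 103 receives the value that belongs to claim 102 and claim 102 stays empty; the keyed version assigns each claim its own prediction.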

In [23]:
X_test_final_combined['Final_Predictions'].value_counts()

Final_Predictions
2.0    169697
3.0    161111
4.0     43308
1.0      8197
5.0      5111
6.0       479
8.0        72
Name: count, dtype: int64
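
`value_counts()` drops missing values by default, so the table above cannot show whether any claim was left without a prediction. A minimal sketch of the complementary check, on a hypothetical toy series named `predictions`:

```python
import pandas as pd

# Hypothetical toy series; the real one is X_test_final_combined['Final_Predictions']
predictions = pd.Series([2.0, 2.0, 3.0, None], name="Final_Predictions")

# Include NaN as its own row so unassigned claims are visible
counts = predictions.value_counts(dropna=False)

# Number of claims that still have no final class
n_missing = int(predictions.isna().sum())

# Class shares among the assigned claims
shares = predictions.value_counts(normalize=True)

print(counts)
print(n_missing)
print(shares)
```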

In [6]:
# Save the consolidated dataset
X_test_final_combined.to_csv('X_test_final_combined.csv', index=False)

In [18]:
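# Map the values of 'Final_Predictions' to injury types.
# Note: label 4 has no entry below even though it appears in the predictions above,
# so those rows map to NaN; check the mapping against the label list in the training data.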
injury_type_mapping = {
    1: "NON-COMP",
    2: "NON-COMP",
    3: "TEMPORARY",
    5: "TEMPORARY",
    6: "PERMANENT",
    7: "PERMANENT",
    8: "PERMANENT"
}

# Create the 'Claim Injury Type' column in the required format
X_test_final_combined['Claim Injury Type'] = (
    X_test_final_combined['Final_Predictions'].astype(int).astype(str) + ". " + 
    X_test_final_combined['Final_Predictions'].map(injury_type_mapping)
)

# Keep only the columns required for the submission
submission = X_test_final_combined[['Claim Identifier', 'Claim Injury Type']]

# Show the first rows of the generated file
print(submission.head())


   Claim Identifier Claim Injury Type
0           6165911      3. TEMPORARY
1           6166141      3. TEMPORARY
2           6165907      3. TEMPORARY
3           6166047      3. TEMPORARY
4           6166102      3. TEMPORARY
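
`Series.map` returns NaN for any value that has no key in the dictionary, and `head()` only shows the first five rows, so unmapped labels can go unnoticed. The sketch below copies the mapping from the cell above and applies it to a hypothetical toy series to list the labels that have no entry.

```python
import pandas as pd

# Mapping copied from the cell above
injury_type_mapping = {
    1: "NON-COMP",
    2: "NON-COMP",
    3: "TEMPORARY",
    5: "TEMPORARY",
    6: "PERMANENT",
    7: "PERMANENT",
    8: "PERMANENT",
}

# Hypothetical toy predictions, already cast to int
predictions = pd.Series([2, 3, 4, 8], name="Final_Predictions")

mapped = predictions.map(injury_type_mapping)

# Labels that occur in the predictions but have no entry in the mapping
unmapped_labels = sorted(int(v) for v in predictions[mapped.isna()].unique())

print(unmapped_labels)
```

Running the same check on the real `Final_Predictions` column before building `Claim Injury Type` would show whether any rows end up without a label.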


In [19]:
# Save the submission file
submission.to_csv('submission.csv', index=False)
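
A quick way to validate the saved file is to read it back and check its shape. This is a sketch under assumptions: `submission_demo` is a hypothetical two-row frame, and it is written to a temporary directory so the real `submission.csv` is untouched.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row stand-in for the real submission frame
submission_demo = pd.DataFrame({
    "Claim Identifier": [6165911, 6166141],
    "Claim Injury Type": ["3. TEMPORARY", "3. TEMPORARY"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "submission_demo.csv")
    submission_demo.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# The file should contain exactly the two expected columns, with no gaps or duplicate IDs
columns_ok = list(reloaded.columns) == ["Claim Identifier", "Claim Injury Type"]
no_missing = not reloaded.isna().any().any()
ids_unique = reloaded["Claim Identifier"].is_unique

print(columns_ok, no_missing, ids_unique)
```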

In [20]:
import os
print(os.getcwd())


c:\Users\migue\OneDrive\Documentos\GitHub\Machine_Learning_project\Deliverables
