# Modeling
- The goal is to build a logistic regression model and implement each piece of it using the scikit-learn library.

### Logistic Regression
- Logistic regression is a supervised machine learning algorithm used for binary classification problems (e.g., yes/no, 0/1 outcomes). It models the relationship between the input features and the probability of belonging to a particular class using the logistic (sigmoid) function.

- In scikit-learn, logistic regression is implemented by the `LogisticRegression` class. It supports several regularization techniques and optimization solvers, making it flexible for different datasets.
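
To make the role of the sigmoid concrete, here is a minimal sketch of how a fitted logistic regression turns features into a probability. The weights, bias, and input values are hypothetical numbers chosen for illustration; they are not taken from the model trained in this notebook.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in [0, 1]."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_probability(features, weights, bias):
    """Probability of the positive class for a single sample."""
    # Linear score: weighted sum of the features plus the intercept.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical coefficients and sample (illustration only).
weights = [0.8, -0.5]
bias = 0.1
sample = [1.0, 2.0]

p = predict_probability(sample, weights, bias)
label = int(p >= 0.5)  # default decision threshold of 0.5
print(f"P(class=1) = {p:.3f}, predicted label = {label}")
```

The 0.5 cut-off mirrors what `predict` does for a binary problem; a different threshold can be applied to the probabilities when one kind of error is more costly than the other.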

# Required Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

# Load Data

- To review the dataset and explore its descriptive statistics, please click on the **EDA** section.


[EDA Section](http://localhost:8888/notebooks/OneDrive/Desktop/Hospital%20Readmission/Hospital-Readmission-/00%20-%20eda.ipynb)


In [2]:
Data_Set=pd.read_csv("Data/hospital_readmissions.csv")
print(f"\033[1mDataset Shape:\033[0m {Data_Set.shape}")



Dataset Shape: (25000, 17)


In [3]:

numerical_cols = ['time_in_hospital', 'n_lab_procedures', 'n_procedures', 
                  'n_medications', 'n_outpatient', 'n_inpatient', 'n_emergency']

categorical_cols = ['age', 'medical_specialty', 'diag_1', 'diag_2', 'diag_3',
                    'glucose_test', 'A1Ctest', 'change', 'diabetes_med']





# Preprocessing Step: Scaling Numerical and Encoding Categorical Data


In [4]:
preprocessor = ColumnTransformer(
    transformers=[
        ('num', StandardScaler(), numerical_cols),          # Scaling numerical data
        ('cat', OneHotEncoder(drop='first'), categorical_cols) # One-hot encoding categorical data
    ]
)


# Define the pipeline before calling fit()
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),  # Preprocessing step
    ('classifier', LogisticRegression(max_iter=1000))  # Logistic regression model
])


# Split the Dataset

In [5]:

X = Data_Set.drop('readmitted', axis=1)
y = Data_Set['readmitted']

# Ensure your target variable is binary (0, 1) for logistic regression
# If 'readmitted' is categorical (e.g., 'yes', 'no'), map it to binary values
y = y.map({'yes': 1, 'no': 0}) if y.dtype == 'object' else y

# Train-test split (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model using the pipeline
pipeline.fit(X_train, y_train)

# Predict labels for the held-out test set
y_pred = pipeline.predict(X_test)


# Evaluate the Model

In [6]:

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))



Accuracy: 0.61
Confusion Matrix:
 [[2082  576]
 [1374  968]]
Classification Report:
               precision    recall  f1-score   support

           0       0.60      0.78      0.68      2658
           1       0.63      0.41      0.50      2342

    accuracy                           0.61      5000
   macro avg       0.61      0.60      0.59      5000
weighted avg       0.61      0.61      0.60      5000
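
The figures in the classification report can be reproduced by hand from the confusion matrix printed above. The sketch below copies the four counts from that output and assumes scikit-learn's layout (rows are actual classes, columns are predicted classes); the variable names are illustrative.

```python
# Counts copied from the confusion matrix output above.
# Row 0 = actual class 0, row 1 = actual class 1; columns = predicted class.
tn, fp = 2082, 576
fn, tp = 1374, 968

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Metrics for class 1 (readmitted).
precision_1 = tp / (tp + fp)   # of all predicted 1s, how many were correct
recall_1 = tp / (tp + fn)      # of all actual 1s, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Metrics for class 0 (not readmitted).
precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

print(f"accuracy    = {accuracy:.2f}")
print(f"class 0: precision = {precision_0:.2f}, recall = {recall_0:.2f}, f1 = {f1_0:.2f}")
print(f"class 1: precision = {precision_1:.2f}, recall = {recall_1:.2f}, f1 = {f1_1:.2f}")
```

Read this way, the recall of 0.41 for class 1 means the model misses 1374 of the 2342 readmitted patients in the test set, which the overall accuracy of 0.61 does not reveal on its own.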

