#**🧾 1. Introduction**

This notebook analyzes the Diabetes 130-US Hospitals dataset from the UCI Machine Learning Repository. The primary goal is to characterize patient readmission patterns and identify the key predictors of readmission using Logistic Regression.

Dataset Source: UCI ML Repository (ID: 296)

Domain: Healthcare

Focus: 30-day hospital readmission for diabetic patients


#**📦 2. Dataset Import**

We load the cleaned dataset produced during EDA and use it for model building.

In [1]:
import pandas as pd

df = pd.read_csv("df_filtered_first_encounter_mapped.csv")

print(df.shape)
print(df.columns[:10])
df.head()

(59094, 121)
Index(['Unnamed: 0', 'age', 'time_in_hospital', 'num_lab_procedures',
       'num_procedures', 'num_medications', 'number_outpatient',
       'number_emergency', 'number_inpatient', 'number_diagnoses'],
      dtype='object')


Unnamed: 0.1,Unnamed: 0,age,time_in_hospital,num_lab_procedures,num_procedures,num_medications,number_outpatient,number_emergency,number_inpatient,number_diagnoses,...,diag_3_category_Mental Disorders,diag_3_category_Musculoskeletal System,diag_3_category_Neoplasms,diag_3_category_Nervous System and Sense Organs,diag_3_category_Pregnancy and Childbirth,diag_3_category_Respiratory System,diag_3_category_Skin and Subcutaneous Tissue,diag_3_category_Supplementary Factors (V codes),diag_3_category_Symptoms and Ill-Defined Conditions,diag_3_category_Unknown
0,0,8,13,68,2,28,0,0,0,8,...,0,0,0,0,0,0,0,0,0,0
1,1,9,12,33,3,18,0,0,0,8,...,0,0,0,0,0,1,0,0,0,0
2,2,4,1,51,0,8,0,0,0,5,...,0,0,0,0,0,0,0,0,0,0
3,3,6,7,62,0,11,0,0,0,7,...,0,0,1,0,0,0,0,0,0,0
4,4,4,7,60,0,15,0,1,0,8,...,0,0,0,0,0,0,0,0,0,0
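
The `Unnamed: 0` column in the output above is usually a row index that an earlier `to_csv` call wrote into the file. The notebook does not confirm that this is the case here, so the following is only a minimal sketch under that assumption. It uses a made-up two-row CSV (not the real dataset) to show how `index_col=0` would absorb such a column at load time, so it does not have to be dropped later.

```python
import io

import pandas as pd

# Hypothetical CSV text that mimics a file saved together with its index.
csv_text = ",age,time_in_hospital\n0,8,13\n1,9,12\n"

# Default read: the saved index comes back as a column named "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))

# index_col=0 treats the first column as the row index instead of a feature.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(df_default.columns))
print(list(df_indexed.columns))
```

Either approach (dropping the column or reading it as the index) keeps the row counter out of the feature matrix.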


#**🔍 3. Model Building**

Prepare the data for Logistic Regression: separate the features from the target, then create a stratified train/test split.

In [2]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Define target variable
target = "readmitted_flag"

# Separate Features (X) and Target (y)
X = df.drop(columns=["Unnamed: 0", target])
y = df[target]

# --- Train-Test Split ---
# Using 80/20 split and stratify to maintain same readmission ratio in both sets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print(f"X_train shape: {X_train.shape}, X_test shape: {X_test.shape}")

X_train shape: (47275, 119), X_test shape: (11819, 119)
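
To see what `stratify=y` does, the sketch below repeats the same split on synthetic data and compares the positive-class rate in the two partitions. The data, the 15% positive rate, and the `*_demo` names are assumptions made for illustration and are not taken from the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1,000 rows, roughly 15% positives.
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({"feature": rng.normal(size=1000)})
y_demo = pd.Series((rng.random(1000) < 0.15).astype(int), name="readmitted_flag")

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, the positive rate should be nearly identical in both sets.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positive rate: {train_rate:.3f}, test positive rate: {test_rate:.3f}")
```

Running the same check on `y_train` and `y_test` from the cell above would confirm that the readmission ratio is preserved in the real split.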


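#**⚙️ 4. Feature Scaling**
Standardize the features before running Logistic Regression, because the L1 penalty is sensitive to feature scale. The scaler is fit on the training set only and then applied unchanged to the test set.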


In [3]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
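
One way to keep the "fit on training data only" rule from being broken by accident is to wrap the scaler and the model in a scikit-learn `Pipeline`. The sketch below is an alternative arrangement, not what this notebook does, and it runs on synthetic data from `make_classification`; all `*_demo` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real feature matrix.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# The pipeline fits the scaler on whatever data is passed to fit(), and only that data.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1, max_iter=1000),
)
pipe.fit(X_tr, y_tr)

# Training features have mean ~0 and std ~1 after scaling; test features generally do not.
scaler_demo = pipe.named_steps["standardscaler"]
train_means = scaler_demo.transform(X_tr).mean(axis=0)
train_stds = scaler_demo.transform(X_tr).std(axis=0)
test_prob = pipe.predict_proba(X_te)[:, 1]
print(np.round(train_means, 6), np.round(train_stds, 6))
```

The main benefit shows up under cross-validation, where the pipeline refits the scaler inside each fold instead of reusing statistics computed on the full training set.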

#**🧠 5. LASSO Logistic Regression**

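We use an L1 (LASSO) penalty, which shrinks the coefficients of uninformative features to exactly zero and so performs feature selection automatically. In scikit-learn, `C` is the inverse of the regularization strength, so smaller values of `C` give a sparser model.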

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, classification_report
import numpy as np

lasso = LogisticRegression(
    penalty="l1",
    solver="liblinear",
    C=0.1,
    max_iter=1000
)

lasso.fit(X_train_scaled, y_train)

# Prediction: hard class labels and positive-class probabilities
y_pred = lasso.predict(X_test_scaled)
y_prob = lasso.predict_proba(X_test_scaled)[:, 1]
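
The value `C=0.1` controls how many features survive the L1 penalty. The sketch below uses synthetic data (an assumption, not the hospital dataset) to illustrate the relationship: it fits the same model at three values of `C` and counts the non-zero coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 30 features, of which only 5 carry signal.
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=30, n_informative=5, n_redundant=0, random_state=0
)
X_demo = StandardScaler().fit_transform(X_demo)

# Smaller C means stronger regularization and fewer non-zero coefficients.
nonzero_counts = {}
for C in (0.01, 0.1, 10.0):
    model = LogisticRegression(penalty="l1", solver="liblinear", C=C, max_iter=1000)
    model.fit(X_demo, y_demo)
    nonzero_counts[C] = int(np.count_nonzero(model.coef_))

print(nonzero_counts)
```

In practice `C` is usually chosen by cross-validation (for example with `LogisticRegressionCV`) rather than fixed by hand.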

#**📊 6. Model Evaluation**

Evaluate the model on the held-out test set using ROC AUC and the per-class precision, recall, and F1 score.

In [5]:
# Evaluation
auc = roc_auc_score(y_test, y_prob)
print(f"ROC AUC Score: {auc:.3f}")
print(classification_report(y_test, y_pred))

ROC AUC Score: 0.706
              precision    recall  f1-score   support

           0       0.86      0.99      0.92     10094
           1       0.58      0.06      0.11      1725

    accuracy                           0.86     11819
   macro avg       0.72      0.53      0.52     11819
weighted avg       0.82      0.86      0.80     11819
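
The report above shows a recall of 0.06 for class 1 at the default 0.5 decision threshold used by `predict`. One common response is to lower the threshold applied to the predicted probabilities. The sketch below illustrates the effect on a small hand-made example; the labels, probabilities, and the 0.2 threshold are all made up for illustration and say nothing about the best threshold for this dataset.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Hypothetical true labels and predicted positive-class probabilities.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
y_prob_demo = np.array([0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.22, 0.35, 0.60, 0.40])


def scores_at(threshold):
    """Return (precision, recall) for class 1 at the given probability threshold."""
    y_hat = (y_prob_demo >= threshold).astype(int)
    return (
        precision_score(y_true_demo, y_hat, zero_division=0),
        recall_score(y_true_demo, y_hat),
    )


precision_default, recall_default = scores_at(0.5)
precision_lowered, recall_lowered = scores_at(0.2)

print(f"threshold 0.5: precision={precision_default:.2f}, recall={recall_default:.2f}")
print(f"threshold 0.2: precision={precision_lowered:.2f}, recall={recall_lowered:.2f}")
```

Lowering the threshold can only keep or raise recall, and it typically costs precision, so the choice depends on how a missed readmission compares to a false alarm. Setting `class_weight="balanced"` when fitting the model is another option to consider.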



#**🔎 7. Identify Important Predictors**

Extract the features with non-zero coefficients from the LASSO model and rank them by absolute coefficient.

In [6]:
coef = pd.DataFrame({
    "Feature": X.columns,
    "Coefficient": lasso.coef_[0]
})
coef["Abs"] = np.abs(coef["Coefficient"])
important_features = coef[coef["Coefficient"] != 0].sort_values("Abs", ascending=False)

print("Top 20 Features Selected by LASSO:")
important_features.head(20)

Top 20 Features Selected by LASSO:


| | Feature | Coefficient | Abs |
|---|---|---|---|
| 42 | discharge_disposition_name_Expired | -0.783019 | 0.783019 |
| 7 | number_inpatient | 0.359464 | 0.359464 |
| 27 | discharge_disposition_name_Discharged to home | -0.164119 | 0.164119 |
| 45 | discharge_disposition_name_Hospice / home | -0.146511 | 0.146511 |
| 26 | diabetesMed_Yes | 0.146177 | 0.146177 |
| 19 | comorbidity_score | 0.141241 | 0.141241 |
| 33 | discharge_disposition_name_Discharged/transfer... | 0.140957 | 0.140957 |
| 46 | discharge_disposition_name_Hospice / medical f... | -0.12511 | 0.12511 |
| 6 | number_emergency | 0.125054 | 0.125054 |
| 64 | admission_source_name_Transfer from a hospital | -0.094061 | 0.094061 |
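
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. Because the features were standardized, each ratio is the multiplicative change in the odds of readmission per one standard deviation increase in that feature, which for a one-hot column is not the same as switching it from 0 to 1. The sketch below applies this to three coefficients copied from the table above; the `coef_demo` frame and the `OddsRatio` column are illustrative names, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Three coefficients copied from the LASSO output above.
coef_demo = pd.DataFrame({
    "Feature": [
        "discharge_disposition_name_Expired",
        "number_inpatient",
        "diabetesMed_Yes",
    ],
    "Coefficient": [-0.783019, 0.359464, 0.146177],
})

# exp(coefficient) = odds ratio per one standard deviation of the scaled feature.
coef_demo["OddsRatio"] = np.exp(coef_demo["Coefficient"])

print(coef_demo.round(3))
```

A ratio above 1 means higher odds of readmission as the feature increases, and a ratio below 1 means lower odds. The penalty also shrinks the coefficients toward zero, so these ratios are better read as a ranking than as unbiased effect sizes.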
