## 🧾 Introduction

This notebook explores the Diabetes 130-US Hospitals dataset from the UCI Machine Learning Repository. The primary goal is to analyze patient readmission patterns and identify key predictors using data analytics and machine learning.

Dataset Source: UCI ML Repository (ID: 296)  
Domain: Healthcare  
Focus: 30-day hospital readmission for diabetic patients

## 📦 Dataset Import

We use the `ucimlrepo` Python package to fetch the dataset directly from the UCI ML Repository.

In [1]:
from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
list_available_datasets()

# import dataset
diabetes = fetch_ucirepo(id=296)



-------------------------------------
The following datasets are available:
-------------------------------------
Dataset Name                                                                            ID    
------------                                                                            --    
Abalone                                                                                 1     
Adult                                                                                   2     
Annealing                                                                               3     
Audiology (Standardized)                                                                8     
Auto MPG                                                                                9     
Automobile                                                                              10    
Balance Scale                                                                           12    
Balloons                       

In [5]:
print(type(diabetes.data))

<class 'ucimlrepo.dotdict.dotdict'>


In [9]:
import pandas as pd

---

## 🔍 Data Overview

We convert the dataset into a Pandas DataFrame for exploration.

In [10]:
df = pd.DataFrame(data=diabetes.data.original)
df.head()

The dataset includes:
- Features: demographics, admission details, lab results, medications
- Target: readmission status (e.g., <30 days, >30 days, or no readmission)
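
Since the focus is 30-day readmission, the three-way target can be inspected and collapsed into a binary label. The following is a minimal sketch on a toy frame standing in for `df`; the column name `readmitted` and the labels `<30`, `>30`, and `NO` are assumptions that should be checked against `df.columns` and the actual values before reuse.

```python
import pandas as pd

# Toy stand-in for `df`; the column name and labels are assumptions.
toy = pd.DataFrame({"readmitted": ["NO", "<30", ">30", "NO", "<30", "NO"]})

# Share of each readmission class.
class_share = toy["readmitted"].value_counts(normalize=True)

# Binary label: 1 if readmitted within 30 days, otherwise 0.
toy["readmit_30d"] = (toy["readmitted"] == "<30").astype(int)

print(class_share)
print("30-day readmission rate:", toy["readmit_30d"].mean())
```

Collapsing `>30` and `NO` into one negative class is one possible framing; keeping all three classes is equally valid if later readmissions are of interest.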

In [11]:
from ydata_profiling import ProfileReport

In [12]:
profile = ProfileReport(df, title="Diabetes Readmission Analysis")

In [13]:
profile.to_file("diabetes_readmission.html")

In [19]:
# Second report with every correlation measure switched on explicitly
profile_all = ProfileReport(
    df,
    title="Diabetes Readmission Analysis",
    correlations={
        "auto": {"calculate": True},
        "pearson": {"calculate": True},
        "spearman": {"calculate": True},
        "kendall": {"calculate": True},
        "phi_k": {"calculate": True},
        "cramers": {"calculate": True},
    },
)

In [20]:
profile_all.to_file("diabetes_readmission_all_corr.html")

---

## 🧼 Data Cleaning & Preprocessing

Steps include:
- Handling missing values
- Encoding categorical variables
- Normalizing numerical features
- Feature selection based on domain knowledge and correlation analysis
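
The steps above can be sketched with pandas alone. This is an illustrative outline on a toy frame, not the notebook's actual pipeline: the column names, the `"?"` missing-value marker, the `"Unknown"` fill value, and the binary target are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `df`; column names and the "?" marker are assumptions.
toy = pd.DataFrame({
    "race": ["Caucasian", "?", "AfricanAmerican", "Caucasian"],
    "num_medications": [10, 20, 30, 20],
    "time_in_hospital": [1, 3, 5, 3],
})
# Hypothetical binary target (1 = readmitted within 30 days).
target = pd.Series([0, 1, 1, 0], name="readmit_30d")

# 1. Missing values: map the placeholder to NaN, then fill the category.
clean = toy.copy()
clean["race"] = clean["race"].replace("?", np.nan).fillna("Unknown")

# 2. Encoding: one-hot encode the categorical column.
encoded = pd.get_dummies(clean, columns=["race"], dtype=int)

# 3. Normalization: z-score the numeric columns (population std, ddof=0).
numeric_cols = ["num_medications", "time_in_hospital"]
encoded[numeric_cols] = (
    encoded[numeric_cols] - encoded[numeric_cols].mean()
) / encoded[numeric_cols].std(ddof=0)

# 4. Feature screening: absolute Pearson correlation with the target.
corr = encoded[numeric_cols].corrwith(target).abs().sort_values(ascending=False)

print(encoded)
print(corr)
```

Correlation screening only captures linear association with the target, so it is best treated as a first filter alongside domain knowledge rather than a final selection rule.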

## 📊 Exploratory Data Analysis (EDA)

We visualize key patterns such as:
- Distribution of readmission status
- Relationship between number of medications and readmission
- Impact of admission type (e.g., emergency) on readmission rates
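
Each of these patterns reduces to a small aggregation that can then be plotted. The sketch below uses a toy frame in place of `df`; the column names `readmitted`, `num_medications`, and `admission_type_id`, and the numeric admission codes, are assumptions for illustration, and the values are made up.

```python
import pandas as pd

# Toy stand-in for `df`; column names, codes, and values are assumptions.
toy = pd.DataFrame({
    "readmitted": ["NO", "<30", ">30", "NO", "<30", "NO"],
    "num_medications": [8, 22, 15, 10, 18, 12],
    "admission_type_id": [1, 1, 3, 3, 1, 2],
})

# Distribution of readmission status.
status_counts = toy["readmitted"].value_counts()

# Mean number of medications per readmission class.
meds_by_status = toy.groupby("readmitted")["num_medications"].mean()

# 30-day readmission rate per admission type code.
toy["readmit_30d"] = toy["readmitted"].eq("<30")
rate_by_admission = toy.groupby("admission_type_id")["readmit_30d"].mean()

print(status_counts)
print(meds_by_status)
print(rate_by_admission)
```

With matplotlib installed, each resulting Series can be charted directly, for example `status_counts.plot(kind="bar")`.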