### Context

This notebook has been adapted to the **UCI Car Evaluation** dataset, read from the local file [`data/car.data`](https://archive.ics.uci.edu/dataset/19/car+evaluation).

**Columns**: `buying, maint, doors, persons, lug_boot, safety, class`  
**Target**: `class`

In [1]:
# Load dataset
from pathlib import Path
import pandas as pd

data_path = Path("data/car.data")
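columns = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
# The file has no header row; read every column as a string so that
# values such as "2" and "5more" end up with the same dtype
df = pd.read_csv(data_path, header=None, names=columns, dtype=str)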

print("Shape:", df.shape)
display(df.head())

Shape: (1728, 7)


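| | buying | maint | doors | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| 0 | vhigh | vhigh | 2 | 2 | small | low | unacc |
| 1 | vhigh | vhigh | 2 | 2 | small | med | unacc |
| 2 | vhigh | vhigh | 2 | 2 | small | high | unacc |
| 3 | vhigh | vhigh | 2 | 2 | med | low | unacc |
| 4 | vhigh | vhigh | 2 | 2 | med | med | unacc |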
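
Most of these columns look ordinal rather than purely nominal, which affects how sorting and comparisons behave during EDA. The following is a minimal sketch of converting them to ordered pandas categoricals. The level orderings in `ORDERED_LEVELS` and the helper name `to_ordered_categories` are assumptions made for illustration; this notebook does not state an ordering.

```python
import pandas as pd

# Assumed level orderings (lowest first). The orderings are an assumption
# made for this sketch and are not stated in the notebook.
ORDERED_LEVELS = {
    "buying": ["low", "med", "high", "vhigh"],
    "maint": ["low", "med", "high", "vhigh"],
    "doors": ["2", "3", "4", "5more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
    "class": ["unacc", "acc", "good", "vgood"],
}


def to_ordered_categories(frame, levels=ORDERED_LEVELS):
    """Return a copy of `frame` with known columns cast to ordered categoricals."""
    out = frame.copy()
    for col, cats in levels.items():
        if col in out.columns:
            # Values not listed in `cats` become NaN
            out[col] = pd.Categorical(out[col], categories=cats, ordered=True)
    return out


# Small hand-made sample so the sketch runs without the data file
sample = pd.DataFrame(
    {
        "buying": ["vhigh", "low"],
        "safety": ["low", "high"],
        "class": ["unacc", "vgood"],
    }
)
sample_cat = to_ordered_categories(sample)
sorted_buying = list(sample_cat.sort_values("buying")["buying"])
```

With ordered categoricals, `sort_values` follows the declared level order instead of alphabetical order, so `low` sorts before `vhigh`.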


### Create EDA reports

In [2]:
# Quick EDA
df.info()  # info() prints its summary itself and returns None, so it needs no display() wrapper
display(df.describe(include="all"))

# Categorical cardinality & missingness
summary = []
for col in df.columns:
    vc = df[col].value_counts(dropna=False)
    summary.append({
        "column": col,
        "n_unique": df[col].nunique(dropna=False),
        "top_value": vc.index[0],
        "top_count": int(vc.iloc[0]),
        "missing": int(df[col].isna().sum() + (df[col] == "").sum())
    })
eda_summary = pd.DataFrame(summary)
display(eda_summary)

# Per-column value counts for quick inspection
for col in df.columns:
    print(f"\n=== {col} ===")
    display(df[col].value_counts())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1728 entries, 0 to 1727
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   buying    1728 non-null   object
 1   maint     1728 non-null   object
 2   doors     1728 non-null   object
 3   persons   1728 non-null   object
 4   lug_boot  1728 non-null   object
 5   safety    1728 non-null   object
 6   class     1728 non-null   object
dtypes: object(7)
memory usage: 94.6+ KB


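| | buying | maint | doors | persons | lug_boot | safety | class |
|---|---|---|---|---|---|---|---|
| count | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 | 1728 |
| unique | 4 | 4 | 4 | 3 | 3 | 3 | 4 |
| top | vhigh | vhigh | 2 | 2 | small | low | unacc |
| freq | 432 | 432 | 432 | 576 | 576 | 576 | 1210 |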


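| | column | n_unique | top_value | top_count | missing |
|---|---|---|---|---|---|
| 0 | buying | 4 | vhigh | 432 | 0 |
| 1 | maint | 4 | vhigh | 432 | 0 |
| 2 | doors | 4 | 2 | 432 | 0 |
| 3 | persons | 3 | 2 | 576 | 0 |
| 4 | lug_boot | 3 | small | 576 | 0 |
| 5 | safety | 3 | low | 576 | 0 |
| 6 | class | 4 | unacc | 1210 | 0 |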
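
A `missing` count of 0 is only as reliable as the check behind it. Casting a column to `str` can, depending on the pandas version, turn a true NaN into the literal text `nan`, which `isna()` no longer catches. The sketch below shows a more defensive count. The token set `MISSING_TOKENS` and the helper `count_missing` are hypothetical names chosen for illustration, and the toy CSV is made up so the example runs without the data file.

```python
import io

import pandas as pd

# Hypothetical placeholder tokens; adjust to whatever the data source uses
MISSING_TOKENS = {"", "?", "nan", "NaN", "None"}


def count_missing(series, tokens=MISSING_TOKENS):
    """Count true NaN values plus string placeholders standing in for missing data."""
    is_nan = series.isna()
    stripped = series.astype(str).str.strip()
    # Exclude rows already counted as NaN so nothing is counted twice
    is_token = stripped.isin(tokens) & ~is_nan
    return int(is_nan.sum() + is_token.sum())


# Toy data: one empty field and one "?" placeholder in the first column
toy_csv = "vhigh,2\n,3\n?,4\nlow,5more\n"
toy = pd.read_csv(
    io.StringIO(toy_csv), header=None, names=["buying", "doors"], dtype=str
)
missing_by_column = {col: count_missing(toy[col]) for col in toy.columns}
print(missing_by_column)
```

Running the same helper over the real `df` should be a drop-in replacement for the `missing` entry in the summary loop, under the assumption that the token set covers the placeholders in use.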



=== buying ===


buying
vhigh    432
high     432
med      432
low      432
Name: count, dtype: int64


=== maint ===


maint
vhigh    432
high     432
med      432
low      432
Name: count, dtype: int64


=== doors ===


doors
2        432
3        432
4        432
5more    432
Name: count, dtype: int64


=== persons ===


persons
2       576
4       576
more    576
Name: count, dtype: int64


=== lug_boot ===


lug_boot
small    576
med      576
big      576
Name: count, dtype: int64


=== safety ===


safety
low     576
med     576
high    576
Name: count, dtype: int64


=== class ===


class
unacc    1210
acc       384
good       69
vgood      65
Name: count, dtype: int64
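
The feature columns are perfectly balanced, but the target is not. As a rough sketch of how to quantify that, the snippet below turns the `class` counts printed above into shares and a majority-to-minority ratio. The counts are copied by hand from that output; the variable names are illustrative.

```python
import pandas as pd

# Class counts copied from the value_counts() output above
class_counts = pd.Series(
    {"unacc": 1210, "acc": 384, "good": 69, "vgood": 65}, name="count"
)

# Share of each class in the full dataset
class_share = (class_counts / class_counts.sum()).round(4)

# Ratio between the most and least frequent class
imbalance_ratio = class_counts.max() / class_counts.min()

print(class_share)
print(f"majority/minority ratio: {imbalance_ratio:.1f}")
```

Inside the notebook, `df["class"].value_counts(normalize=True)` gives the same shares directly from the data.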

In [3]:
%pip install ydata-profiling typing_extensions==4.7.1 sweetviz



In [4]:
from pathlib import Path

from ydata_profiling import ProfileReport

out_dir = Path("EDA_report")
out_dir.mkdir(parents=True, exist_ok=True)  # create folder if missing
out_file = out_dir / "my_report.html"

# Generate the profiling report for df and write it to out_file
report = ProfileReport(df, title="My Data")
report.to_file(out_file)

100%|██████████| 7/7 [00:00<00:00, 477.38it/s]
Summarize dataset: 100%|██████████| 16/16 [00:00<00:00, 46.52it/s, Completed]
Generate report structure: 100%|██████████| 1/1 [00:00<00:00,  1.06it/s]
Render HTML: 100%|██████████| 1/1 [00:00<00:00, -1.05it/s]
Export report to file: 100%|██████████| 1/1 [00:00<00:00, 302.38it/s]


In [None]:
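import sweetviz as sv
report = sv.analyze(df)
# Writes an HTML report and then tries to open it in a browser;
# pass open_browser=False to skip that step on machines without one
report.show_html('EDA_report/report.html')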

Done! Use 'show' commands to display/save.   |██████████| [100%]   00:00 -> (00:00 left)



Report EDA_report/report.html was generated! NOTEBOOK/COLAB USERS: the web browser MAY not pop up, regardless, the report IS saved in your notebook/colab files.


gio: file:///home/najmi/cubix/vizsgafeladat/backend/notebooks/EDA_report/report.html: Failed to find default application for content type ‘text/html’
