<a href="https://colab.research.google.com/github/younas10/AI_LAB/blob/main/AI_TASK1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Install & Import**

In [None]:
!pip install pandas numpy scikit-learn --quiet

import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler


**Load Dataset**

In [None]:
df = pd.read_csv("/content/Banksy.csv", encoding="latin1")  # adjust path if needed
print("Original shape:", df.shape)
df.head()


Original shape: (18, 3)


Unnamed: 0,Elementary,Intermediate,Advanced
0,The auction of a Banksy painting that\ndisappe...,The controversial auction of a Banksy mural\nt...,The controversial auction of a Banksy mural th...
1,Slave Labour is a spray-painted artwork that\n...,Slave Labour is a spray-painted artwork\nshowi...,"Slave Labour, a spray-painted artwork depictin..."
2,"But Frederic Thut, the owner of the Fine\nArts...","But auctioneer Frederic Thut, the owner of\nth...","But auctioneer Frederic Thut, the owner of the..."
3,"People in Haringey, London, were very\nhappy, ...","He would not give a reason, but community\nlea...","He would not give a reason, but community\nlea..."
4,I will write to the auction house to find out...,One of our two demands was that it doesnt\ns...,One of our two demands was that it doesnt\ns...
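The preview above shows "doesnt" without an apostrophe. One possible explanation, offered here only as an assumption about the file, is that the CSV was saved as Windows-1252 and the curly apostrophe (byte `0x92`) was decoded as an invisible control character under `latin1`. The byte string below is made up for illustration and is not taken from Banksy.csv.

```python
# Hypothetical byte string: "doesn't" written with a Windows-1252 curly apostrophe (byte 0x92).
raw = b"doesn\x92t"

# latin1 maps 0x92 to the invisible C1 control character U+0092.
as_latin1 = raw.decode("latin1")

# cp1252 maps 0x92 to the right single quotation mark U+2019.
as_cp1252 = raw.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))
```

If that is what happened, passing `encoding="cp1252"` to `pd.read_csv` may be worth trying. This has not been checked against the actual file.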


**Handle Missing Values**

In [None]:
# Fill numeric columns with the mean, text/categorical columns with the mode.
# Assigning the result back to df[col] avoids the chained inplace fillna,
# which pandas warns will stop working in pandas 3.0.
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())
    else:
        df[col] = df[col].fillna(df[col].mode()[0])
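To see what the fill rule does, here is a minimal sketch on a made-up frame. The `level` and `word_count` columns are hypothetical and are not part of Banksy.csv. It uses the assignment form `df[col] = df[col].fillna(...)`, which does not rely on a chained inplace call.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value in each column.
toy = pd.DataFrame({
    "level": ["Elementary", None, "Elementary", "Advanced"],
    "word_count": [120.0, np.nan, 180.0, 240.0],
})

for col in toy.columns:
    if pd.api.types.is_numeric_dtype(toy[col]):
        # Numeric column: replace missing values with the column mean.
        toy[col] = toy[col].fillna(toy[col].mean())
    else:
        # Text column: replace missing values with the most frequent value.
        toy[col] = toy[col].fillna(toy[col].mode()[0])

print(toy)
```

Note that `mode()[0]` raises an `IndexError` when a column is entirely missing, so a guard may be needed for such columns.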


**Remove Duplicates**

In [None]:
df.drop_duplicates(inplace=True)
print("After removing duplicates:", df.shape)


After removing duplicates: (18, 3)
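The shape is unchanged, which suggests the dataset contained no fully identical rows. A minimal sketch, on a made-up frame, of how the number of duplicates could be counted before dropping them:

```python
import pandas as pd

# Hypothetical toy data: the third row repeats the first.
toy = pd.DataFrame({
    "Elementary": ["a", "b", "a"],
    "Advanced": ["x", "y", "x"],
})

# duplicated() marks every row that is identical to an earlier row.
n_dupes = int(toy.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean 0..n-1 index.
deduped = toy.drop_duplicates().reset_index(drop=True)

print("Duplicate rows:", n_dupes)
print("Shape after removing duplicates:", deduped.shape)
```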


**Remove Outliers (IQR method)**

In [None]:
num_cols = df.select_dtypes(include=[np.number]).columns
for col in num_cols:
    Q1 = df[col].quantile(0.25)
    Q3 = df[col].quantile(0.75)
    IQR = Q3 - Q1
    lower, upper = Q1 - 1.5*IQR, Q3 + 1.5*IQR
    df = df[(df[col] >= lower) & (df[col] <= upper)]

print("After removing outliers:", df.shape)


After removing outliers: (18, 3)
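Because this dataset has no numeric columns, the loop body never runs and the shape stays the same. To illustrate what the IQR rule does when numeric data is present, here is a small sketch. The helper `iqr_bounds` and the `word_count` column are hypothetical names introduced only for this example.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical toy data with one obvious outlier (100).
toy = pd.DataFrame({"word_count": [10, 12, 11, 13, 12, 100]})

lower, upper = iqr_bounds(toy["word_count"])

# between() is inclusive on both ends, matching the >= / <= filter used above.
kept = toy[toy["word_count"].between(lower, upper)]

print("Fences:", lower, upper)
print("Kept values:", kept["word_count"].tolist())
```

Here Q1 is 11.25 and Q3 is 12.75, so the fences are 9.0 and 15.0, and only the value 100 is removed.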


**Normalize Numeric Columns**

In [None]:
scaler = MinMaxScaler()
num_cols = df.select_dtypes(include=[np.number]).columns
if len(num_cols) > 0:
    df[num_cols] = scaler.fit_transform(df[num_cols])
else:
    print("No numerical columns found to scale.")

No numerical columns found to scale.
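Since nothing was scaled here, the following sketch shows the min-max formula on made-up numeric columns. With its default `feature_range=(0, 1)`, `MinMaxScaler` applies this same per-column formula, `(x - min) / (max - min)`. The column names are hypothetical, and plain pandas is used so the arithmetic is visible.

```python
import pandas as pd

# Hypothetical numeric features, not present in Banksy.csv.
toy = pd.DataFrame({
    "word_count": [100.0, 150.0, 300.0],
    "sentence_count": [4.0, 6.0, 8.0],
})

col_min = toy.min()
col_range = toy.max() - col_min

# Rescale every column to the interval [0, 1].
scaled = (toy - col_min) / col_range

print(scaled)
```

This sketch does not guard against constant columns, where the range is zero and the division yields NaN.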


**Save Cleaned Data**

In [None]:
df.to_csv("/content/Banksy_cleaned.csv", index=False)
print("Cleaned file saved successfully!")
df.head()


Cleaned file saved successfully!


Unnamed: 0,Elementary,Intermediate,Advanced
0,The auction of a Banksy painting that\ndisappe...,The controversial auction of a Banksy mural\nt...,The controversial auction of a Banksy mural th...
1,Slave Labour is a spray-painted artwork that\n...,Slave Labour is a spray-painted artwork\nshowi...,"Slave Labour, a spray-painted artwork depictin..."
2,"But Frederic Thut, the owner of the Fine\nArts...","But auctioneer Frederic Thut, the owner of\nth...","But auctioneer Frederic Thut, the owner of the..."
3,"People in Haringey, London, were very\nhappy, ...","He would not give a reason, but community\nlea...","He would not give a reason, but community\nlea..."
4,I will write to the auction house to find out...,One of our two demands was that it doesnt\ns...,One of our two demands was that it doesnt\ns...
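The text cells contain embedded line breaks (the `\n` visible in the preview), so it can be worth confirming that the saved file reads back intact. A minimal sketch of such a round-trip check, using a made-up frame and a temporary file rather than the real `/content` path:

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data with an embedded newline and a comma inside the text.
toy = pd.DataFrame({
    "Elementary": ["first line\nsecond line", "plain text"],
    "Advanced": ["text, with a comma", "more text"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_cleaned.csv")
    # index=False keeps the row index out of the file, as in the cell above.
    toy.to_csv(path, index=False)
    restored = pd.read_csv(path)

print(restored.shape)
print(restored["Elementary"].tolist() == toy["Elementary"].tolist())
```

Writing with `index=False` also means no `Unnamed: 0` column is added when the file is read back.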
