## Basic Data Set Preparation Activities in Python using a CSV File

### 1. Import Libraries and Load Data

In [1]:
import pandas as pd

# Replace 'Basic_data.csv' with the path to your own CSV file
data = pd.read_csv("Basic_data.csv")

print(data)

     AccID      Name Gender  Age    AccOpen  Balance AccStatus
0   ACC001       Raj      M   30  01-JAN-20     5000    Active
1   ACC002      Riya      F   29  01-JAN-21     8000  Inactive
2   ACC003      Amit      M   35  02-FEB-20    12000    Active
3   ACC004     Priya      F   28  02-FEB-21     4500    Active
4   ACC005    Vikram      M   40  03-MAR-20     7800    Active
5   ACC006     Sonia      F   32  03-MAR-21     9200  Inactive
6   ACC007     Rahul      M   25  04-APR-20    10500    Active
7   ACC008     Pooja      F   22  04-APR-21     3800    Active
8   ACC009     Sunil      M   50  05-MAY-20    25000    Active
9   ACC010    Anjali      F   45  05-MAY-21    18000    Active
10  ACC011     Vivek      M   38  06-JUN-20    11200    Active
11  ACC012      Neha      F   31  06-JUN-21     6700    Active
12  ACC013     Rohit      M   27  07-JUL-20     9800    Active
13  ACC014     Aisha      F   24  07-JUL-21     5200    Active
14  ACC015    Manish      M   42  08-AUG-20    14000   
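read_csv infers a type for every column, so it can be worth checking what it produced before transforming anything. The sketch below uses a small in-memory stand-in for Basic_data.csv, built from the first three rows shown above, so that it runs without the file. With the real file you would pass the path instead of io.StringIO. It also counts missing values per column, which is relevant here because the AccStatus cell in the last row of the output above appears to be empty.

```python
import io

import pandas as pd

# Stand-in for Basic_data.csv: the first three rows from the output above.
csv_text = """AccID,Name,Gender,Age,AccOpen,Balance,AccStatus
ACC001,Raj,M,30,01-JAN-20,5000,Active
ACC002,Riya,F,29,01-JAN-21,8000,Inactive
ACC003,Amit,M,35,02-FEB-20,12000,Active
"""

data = pd.read_csv(io.StringIO(csv_text))

# Inspect the inferred types: AccOpen is read as text, not as a datetime.
print(data.dtypes)

# Count missing values per column; a blank AccStatus cell would be counted here.
print(data.isna().sum())
```
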

### 2. Creating a New Column

You can create a new column based on calculations, transformations, or existing data.

In [4]:
# Example: Calculate a new 'Interest' column from 'Balance' (a flat 1% rate; replace the formula as needed)
data["Interest"] = data["Balance"] * 0.01

# Example: Create a new column with a constant value
data["Active_Acct"] = True

print(data.head())


    AccID    Name Gender  Age    AccOpen  Balance AccStatus  Interest  \
0  ACC001     Raj      M   30  01-JAN-20     5000    Active      50.0   
1  ACC002    Riya      F   29  01-JAN-21     8000  Inactive      80.0   
2  ACC003    Amit      M   35  02-FEB-20    12000    Active     120.0   
3  ACC004   Priya      F   28  02-FEB-21     4500    Active      45.0   
4  ACC005  Vikram      M   40  03-MAR-20     7800    Active      78.0   

   Active_Acct  
0         True  
1         True  
2         True  
3         True  
4         True  
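When several columns are derived at once, DataFrame.assign can create them in one call and returns a new DataFrame instead of modifying the original. In this sketch the INTEREST_RATE constant and the Balance_After column are hypothetical names chosen for illustration; the balances come from the rows shown above.

```python
import pandas as pd

# A few rows from the example data set.
data = pd.DataFrame({
    "AccID": ["ACC001", "ACC002", "ACC003"],
    "Balance": [5000, 8000, 12000],
})

# Hypothetical flat rate; replace with your own business rule.
INTEREST_RATE = 0.01

# assign() returns a new DataFrame and leaves `data` unchanged.
enriched = data.assign(
    Interest=data["Balance"] * INTEREST_RATE,  # arithmetic on an existing column
    Active_Acct=True,                          # scalar broadcast to every row
    # A callable receives the frame built so far, so it can use Interest from above.
    Balance_After=lambda df: df["Balance"] + df["Interest"],
)
print(enriched)
```

Keyword arguments to assign are evaluated in order, which is why Balance_After can refer to the Interest column created in the same call.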


### 3. Replacing a Column

Rename a column, or overwrite an entire column with new values or a constant.


In [7]:
# Example: Rename the 'Name' column to 'Customer_Name'
data = data.rename(columns={"Name": "Customer_Name"})

# Example: Overwrite every value in 'Active_Acct' with False
data["Active_Acct"] = False

print(data.head())


    AccID Customer_Name Gender  Age    AccOpen  Balance AccStatus  Interest  \
0  ACC001           Raj      M   30  01-JAN-20     5000    Active      50.0   
1  ACC002          Riya      F   29  01-JAN-21     8000  Inactive      80.0   
2  ACC003          Amit      M   35  02-FEB-20    12000    Active     120.0   
3  ACC004         Priya      F   28  02-FEB-21     4500    Active      45.0   
4  ACC005        Vikram      M   40  03-MAR-20     7800    Active      78.0   

   Active_Acct  
0        False  
1        False  
2        False  
3        False  
4        False  
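Two details of this step are easy to miss: rename() does not change the DataFrame in place unless you reassign the result, and by default it silently ignores column names that do not exist. The sketch below shows both, using two columns from the rows above. The misspelled key "Nmae" is deliberate, and deriving Active_Acct from AccStatus is an illustrative alternative to the constant used above, not part of the original workflow.

```python
import pandas as pd

# Two columns from the example data set (first three rows).
data = pd.DataFrame({
    "Name": ["Raj", "Riya", "Amit"],
    "AccStatus": ["Active", "Inactive", "Active"],
})

# rename() returns a new DataFrame; `data` keeps its old column names.
renamed = data.rename(columns={"Name": "Customer_Name"})
print(list(data.columns))
print(list(renamed.columns))

# A misspelled key is ignored by default; errors="raise" turns it into a KeyError.
rename_failed = False
try:
    data.rename(columns={"Nmae": "Customer_Name"}, errors="raise")
except KeyError as exc:
    rename_failed = True
    print("Rename failed:", exc)

# Overwrite a whole column with values derived from another column.
renamed["Active_Acct"] = renamed["AccStatus"].eq("Active")
print(renamed)
```
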


### 4. Replacing Values in a Column

Replace specific values within a column with new values.

In [9]:
# Example: Replace 'Active' with 'Live' in 'AccStatus'
data["AccStatus"] = data["AccStatus"].replace("Active", "Live")

# Example: Derive a new column from a condition (balances below 3000 are flagged 'High'; adjust the threshold as needed)
data["Risk_Level"] = data["Balance"].apply(lambda x: "High" if x < 3000 else "Low")

print(data.head())

    AccID Customer_Name Gender  Age    AccOpen  Balance AccStatus  Interest  \
0  ACC001           Raj      M   30  01-JAN-20     5000      Live      50.0   
1  ACC002          Riya      F   29  01-JAN-21     8000  Inactive      80.0   
2  ACC003          Amit      M   35  02-FEB-20    12000      Live     120.0   
3  ACC004         Priya      F   28  02-FEB-21     4500      Live      45.0   
4  ACC005        Vikram      M   40  03-MAR-20     7800      Live      78.0   

   Active_Acct Risk_Level  
0        False        Low  
1        False        Low  
2        False        Low  
3        False        Low  
4        False        Low  
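The apply/lambda call above works row by row; the same rules can also be written with vectorized operations, which are usually faster on large frames. This is a sketch under illustrative assumptions: the 2500 balance and the missing status are hypothetical rows added so that every branch is exercised, and "Dormant" and "Unknown" are invented replacement labels.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "AccStatus": ["Active", "Inactive", "Active", None],  # last row: hypothetical missing status
    "Balance": [5000, 8000, 2500, 14000],                 # 2500: hypothetical low balance
})

# Replace several values in one call with a mapping; unmapped values are left as they are.
data["AccStatus"] = data["AccStatus"].replace({"Active": "Live", "Inactive": "Dormant"})

# Vectorized equivalent of the apply/lambda rule above (same 3000 threshold).
data["Risk_Level"] = np.where(data["Balance"] < 3000, "High", "Low")

# Give missing statuses an explicit placeholder.
data["AccStatus"] = data["AccStatus"].fillna("Unknown")
print(data)
```
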


### 5. Splitting a Date Column

Convert a text date column to datetime, then extract its year, month, and day components.

In [12]:
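# Example: Parse the text dates in 'AccOpen' (e.g. "01-JAN-20") into a datetime column 'Joined'
# An explicit format (day-month abbreviation-two-digit year) avoids day/month guessing

data["Joined"] = pd.to_datetime(data["AccOpen"], format="%d-%b-%y")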

data["Joined_Year"] = data["Joined"].dt.year
data["Joined_Month"] = data["Joined"].dt.month
data["Joined_Day"] = data["Joined"].dt.day

# Optional: Drop the original text date column if it is no longer needed
# data = data.drop(columns="AccOpen")


print(data.head())

    AccID Customer_Name Gender  Age    AccOpen  Balance AccStatus  Interest  \
0  ACC001           Raj      M   30  01-JAN-20     5000      Live      50.0   
1  ACC002          Riya      F   29  01-JAN-21     8000  Inactive      80.0   
2  ACC003          Amit      M   35  02-FEB-20    12000      Live     120.0   
3  ACC004         Priya      F   28  02-FEB-21     4500      Live      45.0   
4  ACC005        Vikram      M   40  03-MAR-20     7800      Live      78.0   

   Active_Acct Risk_Level     Joined  Joined_Year  Joined_Month  Joined_Day  
0        False        Low 2020-01-01         2020             1           1  
1        False        Low 2021-01-01         2021             1           1  
2        False        Low 2020-02-02         2020             2           2  
3        False        Low 2021-02-02         2021             2           2  
4        False        Low 2020-03-03         2020             3           3  
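Real files often contain a few malformed dates. With an explicit format, passing errors="coerce" turns unparseable strings into NaT (missing) instead of stopping with an error, and the dt accessor offers more than year, month, and day. In this sketch "31-XYZ-20" is a deliberately invalid, hypothetical value, and Joined_MonthName is an illustrative extra column.

```python
import pandas as pd

data = pd.DataFrame({"AccOpen": ["01-JAN-20", "02-FEB-21", "31-XYZ-20"]})

# %d = day, %b = abbreviated month name (matched case-insensitively), %y = two-digit year.
# errors="coerce" converts the invalid third value to NaT instead of raising.
data["Joined"] = pd.to_datetime(data["AccOpen"], format="%d-%b-%y", errors="coerce")

data["Joined_Year"] = data["Joined"].dt.year
data["Joined_Month"] = data["Joined"].dt.month
data["Joined_Day"] = data["Joined"].dt.day
data["Joined_MonthName"] = data["Joined"].dt.month_name()
print(data)
```

Because NaT yields a missing year, month, and day, these columns become floating point when any date fails to parse. Check data["Joined"].isna() after coercing so that bad rows are not silently ignored.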


### Remember

These are basic examples. For more complex manipulation, explore pandas methods such as apply, map, and where, along with vectorized conditionals such as numpy.where.
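As a brief illustration of those methods, the sketch below applies map, where, and a row-wise apply to a few rows of the example data. The Gender labels, the 10000 cap, and the Summary column are hypothetical choices for demonstration only.

```python
import pandas as pd

data = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "Balance": [5000, 8000, 12000],
})

# map(): translate codes through a dictionary; codes missing from the dict become NaN.
data["Gender_Label"] = data["Gender"].map({"M": "Male", "F": "Female"})

# where(): keep values where the condition holds and substitute the rest (cap at 10000).
data["Balance_Capped"] = data["Balance"].where(data["Balance"] <= 10000, 10000)

# apply() with axis=1 runs a Python function per row: flexible, but slower than vectorized code.
data["Summary"] = data.apply(lambda row: f"{row['Gender_Label']}: {row['Balance']}", axis=1)
print(data)
```
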

Always check the column data types (for example with data.dtypes) and the desired outcome before performing these operations.

By combining these techniques, you can prepare the data from your CSV file for further analysis and modeling.