## 1. Import Libraries and Load Data

We import pandas, load the CSV file into a DataFrame with pd.read_csv(), and convert the 'AccOpen' column from strings to datetime values with pd.to_datetime() so that date-based operations work on it.


In [2]:
import pandas as pd

# Replace "Basic_data.csv" with your actual file path
data = pd.read_csv("Basic_data.csv")

# 'AccOpen' is read as strings (assumed 'DD-MON-YYYY'); convert it to datetime
data["AccOpen"] = pd.to_datetime(data["AccOpen"])


print(data)

     AccID      Name Gender  Age    AccOpen  Balance AccStatus
0   ACC001       Raj      M   30 2020-01-01     5000    Active
1   ACC002      Riya      F   29 2021-01-01     8000  Inactive
2   ACC003      Amit      M   35 2020-02-02    12000    Active
3   ACC004     Priya      F   28 2021-02-02     4500    Active
4   ACC005    Vikram      M   40 2020-03-03     7800    Active
5   ACC006     Sonia      F   32 2021-03-03     9200  Inactive
6   ACC007     Rahul      M   25 2020-04-04    10500    Active
7   ACC008     Pooja      F   22 2021-04-04     3800    Active
8   ACC009     Sunil      M   50 2020-05-05    25000    Active
9   ACC010    Anjali      F   45 2021-05-05    18000    Active
10  ACC011     Vivek      M   38 2020-06-06    11200    Active
11  ACC012      Neha      F   31 2021-06-06     6700    Active
12  ACC013     Rohit      M   27 2020-07-07     9800    Active
13  ACC014     Aisha      F   24 2021-07-07     5200    Active
14  ACC015    Manish      M   42 2020-08-08    14000    Active
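
The cell above lets pandas infer the date layout. Since the code comment assumes a 'DD-MON-YYYY' layout, it may be safer to state the format explicitly. The following is a minimal sketch on a hypothetical three-value sample (not the real file). The format string "%d-%b-%Y" and the choice of errors="coerce" are assumptions about how the column might be handled:

```python
import pandas as pd

# Hypothetical sample values; the real 'AccOpen' strings may look different
sample = pd.Series(["01-Jan-2020", "02-Feb-2021", "not a date"])

# %d = day, %b = abbreviated month name, %Y = four-digit year.
# errors="coerce" turns values that do not match the format into NaT
parsed = pd.to_datetime(sample, format="%d-%b-%Y", errors="coerce")

print(parsed)
print("Unparseable rows:", int(parsed.isna().sum()))
```

Counting the NaT values afterwards is a quick way to spot rows whose dates did not match the expected layout.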

## 2. Filtering Data with a Custom Function

We define a custom function, is_high_balance, that returns True when a balance exceeds 10,000.

apply() calls this function on each value in the "Balance" column and returns a boolean Series.

Indexing data with this boolean Series keeps only the rows where the value is True, that is, the high-balance accounts.



In [3]:
def is_high_balance(balance):
    """Return True if the balance exceeds the high-balance threshold of 10000."""
    return balance > 10000

high_balance_data = data[data["Balance"].apply(is_high_balance)]
print(high_balance_data)



     AccID    Name Gender  Age    AccOpen  Balance AccStatus
2   ACC003    Amit      M   35 2020-02-02    12000    Active
6   ACC007   Rahul      M   25 2020-04-04    10500    Active
8   ACC009   Sunil      M   50 2020-05-05    25000    Active
9   ACC010  Anjali      F   45 2021-05-05    18000    Active
10  ACC011   Vivek      M   38 2020-06-06    11200    Active
14  ACC015  Manish      M   42 2020-08-08    14000    Active
18  ACC019    Ajay      M   55 2020-10-10    32000    Active
19  ACC020   Seema      F   48 2021-10-10    21000    Active
20  ACC021    Atul      M   41 2020-11-11    13500    Active
22  ACC023   Kapil      M   29 2020-12-12    10200    Active
24  ACC025  Deepak      M   37 2022-01-01    15800    Active
26  ACC027   Arjun      M   44 2022-02-02    17000  DeActive
27  ACC028  Sunita      F    3 2032-02-23    11000    Active
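
For a simple threshold like this, the same mask can also be built by comparing the whole column at once. The sketch below uses a hypothetical four-row sample (balances taken from the outputs above) to check that both approaches select the same rows:

```python
import pandas as pd

# Hypothetical sample using a few balances from the outputs above
sample = pd.DataFrame({
    "AccID": ["ACC001", "ACC003", "ACC007", "ACC008"],
    "Balance": [5000, 12000, 10500, 3800],
})

def is_high_balance(balance):
    return balance > 10000

# Element-wise: apply() calls the function once per value
mask_apply = sample["Balance"].apply(is_high_balance)

# Vectorized: the comparison runs on the whole column at once
mask_vectorized = sample["Balance"] > 10000

print(mask_apply.equals(mask_vectorized))
print(sample[mask_vectorized])
```

The custom function remains useful when the condition is too involved for a single comparison; for plain thresholds the vectorized form is shorter and usually faster.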


## 3. Replacing Missing Values


fillna() replaces missing values (NaN or None) with a specified value ("Unknown" in this case).

unique() returns the distinct values in the "Name" column after filling. No "Unknown" entry appears in the output below, so this dataset had no missing names.

In [4]:
# Empty 'Name' fields in the CSV are read as NaN by read_csv; fill them with "Unknown"
data["Name"] = data["Name"].fillna("Unknown")
print(data["Name"].unique())




['Raj' 'Riya' 'Amit' 'Priya' 'Vikram' 'Sonia' 'Rahul' 'Pooja' 'Sunil'
 'Anjali' 'Vivek' 'Neha' 'Rohit' 'Aisha' 'Manish' 'Kiara' 'Sagar' 'Rani'
 'Ajay' 'Seema' 'Atul' 'Nikita' 'Kapil' 'Nisha' 'Deepak' 'Sita' 'Arjun'
 'Sunita' 'Yash' 'Priyanka']
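
Note that fillna() only acts on NaN/None. If a column holds empty or whitespace-only strings (for example, when the data was not loaded through read_csv's default missing-value handling), those values are left untouched. The following sketch, on a hypothetical sample, shows one way to treat blanks as missing before filling; the regular expression is an assumption about what counts as "blank":

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one NaN, one empty string, one whitespace-only string
names = pd.Series(["Raj", np.nan, "", "  ", "Riya"])

# fillna() only touches NaN/None, so "" and "  " survive
filled_only = names.fillna("Unknown")

# Treat blank strings as missing first, then fill
cleaned = names.replace(r"^\s*$", np.nan, regex=True).fillna("Unknown")

print(filled_only.tolist())
print(cleaned.tolist())
```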


## 4. Creating New Features

We define a custom function, get_age_group, that maps an age to one of three groups: "Young" (under 30), "Middle Aged" (30 to 49), and "Senior" (50 and over).

apply() calls this function on each value in the "Age" column, and the result is stored in a new "Age Group" column.


In [5]:
def get_age_group(age):
    """Categorizes age into groups"""
    if age < 30:
        return "Young"
    elif age < 50:
        return "Middle Aged"
    else:
        return "Senior"

data["Age Group"] = data["Age"].apply(get_age_group)
print(data[["Age", "Age Group"]].head())



   Age    Age Group
0   30  Middle Aged
1   29        Young
2   35  Middle Aged
3   28        Young
4   40  Middle Aged
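
The same binning can be expressed with pd.cut(), which is a swapped-in alternative and not the approach used in this notebook. A minimal sketch, assuming the same boundaries as get_age_group (30 and 50, with the lower edge included in each group):

```python
import numpy as np
import pandas as pd

# First five ages from the output above, plus 50 to exercise the "Senior" edge
ages = pd.Series([30, 29, 35, 28, 40, 50])

age_group = pd.cut(
    ages,
    bins=[-np.inf, 30, 50, np.inf],
    labels=["Young", "Middle Aged", "Senior"],
    right=False,  # intervals are [low, high), matching `age < 30` and `age < 50`
)

print(age_group.tolist())
```

pd.cut() returns a categorical column, which keeps the groups in a defined order; the custom function returns plain strings.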


## 5. Lambda Functions for Simple Operations

We use a lambda function (anonymous function) to square the values in the "Balance" column and create a new "Balance_Squared" column.

Lambda functions offer a concise way for short, one-line operations.


In [8]:
data["Balance_Squared"] = data["Balance"].apply(lambda x: x**2)
print(data[["Balance", "Balance_Squared"]].head())


   Balance  Balance_Squared
0     5000         25000000
1     8000         64000000
2    12000        144000000
3     4500         20250000
4     7800         60840000
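
To make the "anonymous function" idea concrete, the sketch below compares the lambda with an equivalent named function on the five balances shown above. The name square is hypothetical and used only for illustration:

```python
import pandas as pd

# The five balances from the output above
balances = pd.Series([5000, 8000, 12000, 4500, 7800])

def square(x):
    """Named equivalent of `lambda x: x**2`."""
    return x ** 2

via_lambda = balances.apply(lambda x: x ** 2)
via_named = balances.apply(square)

# Both calls produce the same Series
print(via_lambda.equals(via_named))
print(via_lambda.tolist())
```

A lambda saves a separate definition when the logic fits in one expression; a named function is easier to reuse and document.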


## 6. Combining Functions for Complex Tasks

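We define a function, get_active_accounts_average_balance, that takes the data as input. It keeps only the rows whose "AccStatus" is exactly "Active" (so values such as "Inactive" or "DeActive" are excluded), calculates the average balance, and returns the result.

We pass data.copy() so the function works on a copy of the DataFrame. The function only reads from its input, so the copy is a precaution rather than a requirement.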

In [9]:
def get_active_accounts_average_balance(data):
    """Calculates average balance for active accounts"""
    active_data = data[data["AccStatus"] == "Active"]
    return active_data["Balance"].mean()

average_balance = get_active_accounts_average_balance(data.copy())
print("Average Balance (Active Accounts):", average_balance)


Average Balance (Active Accounts): 10903.703703703704
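
The same pattern generalizes to other status values. The following is a sketch on a hypothetical four-row sample; the function name get_average_balance_by_status and its status parameter are illustrative and not part of the notebook. It also checks that filtering and averaging leave the input DataFrame unchanged:

```python
import pandas as pd

# Hypothetical sample built from the first four rows shown in section 1
sample = pd.DataFrame({
    "AccID": ["ACC001", "ACC002", "ACC003", "ACC004"],
    "Balance": [5000, 8000, 12000, 4500],
    "AccStatus": ["Active", "Inactive", "Active", "Active"],
})

def get_average_balance_by_status(df, status="Active"):
    """Return the mean balance of rows whose AccStatus equals `status`."""
    matching = df[df["AccStatus"] == status]
    return matching["Balance"].mean()

before = sample.copy()
active_avg = get_average_balance_by_status(sample)
inactive_avg = get_average_balance_by_status(sample, status="Inactive")

print("Active:", active_avg)
print("Inactive:", inactive_avg)
print("Input unchanged:", sample.equals(before))
```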


### Considerations

Remember to replace "Basic_data.csv" with your actual file path when using these examples in your code.

Together, apply(), fillna(), lambda expressions, and small custom functions cover the filtering, cleaning, and feature-creation steps shown above, and they can be combined to analyze your own CSV data.