# CSV - Processing


## CSV Reading - CSV method

Advantages:

Simple and built-in, no external library installation required.

Flexible: supports custom delimiters and quoting rules, and works with files opened in read ('r'), write ('w'), or append ('a') mode.

Suitable for smaller datasets or quick data exploration.


Considerations:

Fields are returned as strings by default, and there is no built-in support for type conversion, data manipulation, or recovering from malformed rows.

In [6]:
import csv

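# newline="" lets the csv module handle line endings itself (recommended in the csv docs)
with open("Basic_data.csv", "r", newline="") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Each row is a list of strings; the first row is the header
        print(row)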


['AccID', 'Name', 'Gender', 'Age', 'AccOpen', 'Balance', 'AccStatus']
['ACC001', 'Raj', 'M', '30', '01-JAN-20', '5000', 'Active']
['ACC002', 'Riya', 'F', '29', '01-JAN-21', '8000', 'Inactive']
['ACC003', 'Amit', 'M', '35', '02-FEB-20', '12000', 'Active']
['ACC004', 'Priya', 'F', '28', '02-FEB-21', '4500', 'Active']
['ACC005', 'Vikram', 'M', '40', '03-MAR-20', '7800', 'Active']
['ACC006', 'Sonia', 'F', '32', '03-MAR-21', '9200', 'Inactive']
['ACC007', 'Rahul', 'M', '25', '04-APR-20', '10500', 'Active']
['ACC008', 'Pooja', 'F', '22', '04-APR-21', '3800', 'Active']
['ACC009', 'Sunil', 'M', '50', '05-MAY-20', '25000', 'Active']
['ACC010', 'Anjali', 'F', '45', '05-MAY-21', '18000', 'Active']
['ACC011', 'Vivek', 'M', '38', '06-JUN-20', '11200', 'Active']
['ACC012', 'Neha', 'F', '31', '06-JUN-21', '6700', 'Active']
['ACC013', 'Rohit', 'M', '27', '07-JUL-20', '9800', 'Active']
['ACC014', 'Aisha', 'F', '24', '07-JUL-21', '5200', 'Active']
['ACC015', 'Manish', 'M', '42', '08-AUG-20', '14000', 'A
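
Because `csv.reader` yields every field as a string, a common next step is to look fields up by column name and convert the numeric ones. The following is a minimal sketch using `csv.DictReader`; the in-memory `SAMPLE` text and the `read_accounts` helper are assumptions made for illustration and stand in for opening `Basic_data.csv`.

```python
import csv
import io

# Two rows in the same layout as Basic_data.csv (stand-in for the real file).
SAMPLE = (
    "AccID,Name,Gender,Age,AccOpen,Balance,AccStatus\n"
    "ACC001,Raj,M,30,01-JAN-20,5000,Active\n"
    "ACC002,Riya,F,29,01-JAN-21,8000,Inactive\n"
)


def read_accounts(csvfile):
    """Return rows as dicts, converting Age and Balance from str to int."""
    rows = []
    for row in csv.DictReader(csvfile):  # keys come from the header row
        row["Age"] = int(row["Age"])
        row["Balance"] = int(row["Balance"])
        rows.append(row)
    return rows


accounts = read_accounts(io.StringIO(SAMPLE))
for account in accounts:
    print(account["Name"], account["Balance"])
```

With a real file, the same helper could be called as `read_accounts(open("Basic_data.csv", newline=""))` inside a `with` block.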

## CSV Reading - Pandas method

Advantages:

Powerful for data manipulation, cleaning, and analysis.

Offers data structures like DataFrames for efficient handling of tabular data.

Reads large datasets efficiently and represents missing values as NaN, with helpers to detect and fill them.


Considerations:

Requires installing the pandas library.

May be overkill for simple data retrieval tasks.

In [7]:
import pandas as pd

data = pd.read_csv("Basic_data.csv")
# Access data by column names
print(data["Name"])


0          Raj
1         Riya
2         Amit
3        Priya
4       Vikram
5        Sonia
6        Rahul
7        Pooja
8        Sunil
9       Anjali
10       Vivek
11        Neha
12       Rohit
13       Aisha
14      Manish
15       Kiara
16       Sagar
17        Rani
18        Ajay
19       Seema
20        Atul
21      Nikita
22       Kapil
23       Nisha
24      Deepak
25        Sita
26       Arjun
27      Sunita
28        Yash
29    Priyanka
Name: Name, dtype: object
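
To illustrate the missing-value handling mentioned above, here is a small sketch. The in-memory `SAMPLE` text is an assumption for illustration: it uses a reduced set of columns, and the empty Balance for the second row is a hypothetical gap, not something present in `Basic_data.csv`.

```python
import io

import pandas as pd

# Reduced sample; the second row's Balance is deliberately left empty (hypothetical).
SAMPLE = (
    "AccID,Name,Age,Balance\n"
    "ACC001,Raj,30,5000\n"
    "ACC002,Riya,29,\n"
)

data = pd.read_csv(io.StringIO(SAMPLE))

# Count missing values per column; the empty field was read as NaN.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Replace missing balances with 0, leaving other columns untouched.
filled = data.fillna({"Balance": 0})
print(filled)
```

Note that a column containing NaN is stored as floating point, so the balances print as `5000.0` and `0.0` here.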


## CSV Reading - NumPy method

Advantages:

Efficient for reading CSV data into NumPy arrays, ideal for numerical computations.

Handles various data types and delimiters.

Good option for integrating with other NumPy operations.


Considerations:

Requires installing the NumPy library.

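Data is returned as a NumPy array, which might require conversion for further analysis. With the default float dtype, any field that cannot be parsed as a number (IDs, names, dates, status) is read as nan, as the output below shows.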

In [9]:
import numpy as np

# skip_header=1 drops the header row; the default dtype is float, so text fields become nan
data = np.genfromtxt("Basic_data.csv", delimiter=",", skip_header=1)
# data is a 2-D float array with one row per record
print(data)


[[     nan      nan      nan 3.00e+01      nan 5.00e+03      nan]
 [     nan      nan      nan 2.90e+01      nan 8.00e+03      nan]
 [     nan      nan      nan 3.50e+01      nan 1.20e+04      nan]
 [     nan      nan      nan 2.80e+01      nan 4.50e+03      nan]
 [     nan      nan      nan 4.00e+01      nan 7.80e+03      nan]
 [     nan      nan      nan 3.20e+01      nan 9.20e+03      nan]
 [     nan      nan      nan 2.50e+01      nan 1.05e+04      nan]
 [     nan      nan      nan 2.20e+01      nan 3.80e+03      nan]
 [     nan      nan      nan 5.00e+01      nan 2.50e+04      nan]
 [     nan      nan      nan 4.50e+01      nan 1.80e+04      nan]
 [     nan      nan      nan 3.80e+01      nan 1.12e+04      nan]
 [     nan      nan      nan 3.10e+01      nan 6.70e+03      nan]
 [     nan      nan      nan 2.70e+01      nan 9.80e+03      nan]
 [     nan      nan      nan 2.40e+01      nan 5.20e+03      nan]
 [     nan      nan      nan 4.20e+01      nan 1.40e+04      nan]
 [     nan
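
The nan columns above come from text fields being parsed as floats. Two possible ways around this are sketched below, assuming the same column layout as `Basic_data.csv`: restrict the read to the numeric columns with `usecols`, or ask `genfromtxt` to infer a type per column with `dtype=None` and `names=True`. The in-memory `SAMPLE` text stands in for the file.

```python
import io

import numpy as np

# Two rows in the same layout as Basic_data.csv (stand-in for the real file).
SAMPLE = (
    "AccID,Name,Gender,Age,AccOpen,Balance,AccStatus\n"
    "ACC001,Raj,M,30,01-JAN-20,5000,Active\n"
    "ACC002,Riya,F,29,01-JAN-21,8000,Inactive\n"
)

# Option 1: read only the numeric columns (index 3 = Age, index 5 = Balance).
numeric = np.genfromtxt(
    io.StringIO(SAMPLE), delimiter=",", skip_header=1, usecols=(3, 5)
)
print(numeric)

# Option 2: infer a dtype per column and take field names from the header.
# The result is a 1-D structured array, indexed by field name.
records = np.genfromtxt(
    io.StringIO(SAMPLE), delimiter=",", names=True, dtype=None, encoding="utf-8"
)
print(records["Name"])
print(records["Balance"].mean())
```

Option 1 keeps a plain 2-D float array suited to arithmetic; option 2 preserves the text columns at the cost of working with a structured array.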

## Choosing the Right Method

Small datasets and simple retrieval: csv module.

Large datasets, complex manipulation, and analysis: pandas.

Numerical computations and integration with NumPy: NumPy genfromtxt.

# Data Retrieval Methods


## Column Retrieval
This code snippet selects only the "Name", "Age", and "Balance" columns from the DataFrame `data` by passing a list of column names, then displays the resulting DataFrame.

In [11]:
import pandas as pd

# Read the CSV data into a DataFrame
data = pd.read_csv("Basic_data.csv")

# Filter columns by name and display
filtered_columns = data[["Name", "Age", "Balance"]]
print("Filtered Columns (Name, Age, Balance):")
print(filtered_columns)


Filtered Columns (Name, Age, Balance):
        Name  Age  Balance
0        Raj   30     5000
1       Riya   29     8000
2       Amit   35    12000
3      Priya   28     4500
4     Vikram   40     7800
5      Sonia   32     9200
6      Rahul   25    10500
7      Pooja   22     3800
8      Sunil   50    25000
9     Anjali   45    18000
10     Vivek   38    11200
11      Neha   31     6700
12     Rohit   27     9800
13     Aisha   24     5200
14    Manish   42    14000
15     Kiara   36     7500
16     Sagar   33     8900
17      Rani   26     4100
18      Ajay   55    32000
19     Seema   48    21000
20      Atul   41    13500
21    Nikita   34     8100
22     Kapil   29    10200
23     Nisha   23     6000
24    Deepak   37    15800
25      Sita   30     9400
26     Arjun   44    17000
27    Sunita   39    11000
28      Yash   26     8500
29  Priyanka   21     4900
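
One detail worth noting when selecting columns: a single name returns a Series, while a list of names returns a DataFrame. The sketch below shows both, plus the equivalent `.loc` form; the in-memory `SAMPLE` text is an assumption standing in for `Basic_data.csv`.

```python
import io

import pandas as pd

# Two rows in the same layout as Basic_data.csv (stand-in for the real file).
SAMPLE = (
    "AccID,Name,Gender,Age,AccOpen,Balance,AccStatus\n"
    "ACC001,Raj,M,30,01-JAN-20,5000,Active\n"
    "ACC002,Riya,F,29,01-JAN-21,8000,Inactive\n"
)
data = pd.read_csv(io.StringIO(SAMPLE))

names = data["Name"]                            # single label -> Series
subset = data[["Name", "Balance"]]              # list of labels -> DataFrame
subset_loc = data.loc[:, ["Name", "Balance"]]   # same selection via .loc

print(type(names).__name__)
print(type(subset).__name__)
```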


## Row Retrieval
Here, we filter the DataFrame to include only rows where the value in the "AccStatus" column is "Active". This retrieves all records marked as active accounts.

In [3]:
# Filter rows based on a condition (Active accounts)
active_accounts = data[data["AccStatus"] == "Active"]
print("\nActive Accounts:")
print(active_accounts)



Active Accounts:
     AccID      Name Gender  Age    AccOpen  Balance AccStatus
0   ACC001       Raj      M   30  01-JAN-20     5000    Active
2   ACC003      Amit      M   35  02-FEB-20    12000    Active
3   ACC004     Priya      F   28  02-FEB-21     4500    Active
4   ACC005    Vikram      M   40  03-MAR-20     7800    Active
6   ACC007     Rahul      M   25  04-APR-20    10500    Active
7   ACC008     Pooja      F   22  04-APR-21     3800    Active
8   ACC009     Sunil      M   50  05-MAY-20    25000    Active
9   ACC010    Anjali      F   45  05-MAY-21    18000    Active
10  ACC011     Vivek      M   38  06-JUN-20    11200    Active
11  ACC012      Neha      F   31  06-JUN-21     6700    Active
12  ACC013     Rohit      M   27  07-JUL-20     9800    Active
13  ACC014     Aisha      F   24  07-JUL-21     5200    Active
14  ACC015    Manish      M   42  08-AUG-20    14000    Active
15  ACC016     Kiara      F   36  08-AUG-21     7500    Active
16  ACC017     Sagar      M   33  09-
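
Conditions can also be combined. The sketch below filters for active accounts with a balance above 7000; the threshold and the in-memory `SAMPLE` text (the first five rows of the data shown earlier) are assumptions for illustration. Each condition needs its own parentheses because `&` binds more tightly than `==` and `>`.

```python
import io

import pandas as pd

# First five rows in the same layout as Basic_data.csv (stand-in for the real file).
SAMPLE = (
    "AccID,Name,Gender,Age,AccOpen,Balance,AccStatus\n"
    "ACC001,Raj,M,30,01-JAN-20,5000,Active\n"
    "ACC002,Riya,F,29,01-JAN-21,8000,Inactive\n"
    "ACC003,Amit,M,35,02-FEB-20,12000,Active\n"
    "ACC004,Priya,F,28,02-FEB-21,4500,Active\n"
    "ACC005,Vikram,M,40,03-MAR-20,7800,Active\n"
)
data = pd.read_csv(io.StringIO(SAMPLE))

# Boolean masks combined with & (and); use | for or, ~ for not.
high_active = data[(data["AccStatus"] == "Active") & (data["Balance"] > 7000)]
print(high_active)

# The same filter written as a query string.
same_with_query = data.query("AccStatus == 'Active' and Balance > 7000")
```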

## Data Sampling
This code retrieves and displays the first 5 rows of the DataFrame using the head() method, which returns 5 rows unless a different count is passed. Uncomment the last three lines to get the last 3 rows using tail(3).

In [4]:
# Get the first 5 rows (Head)
first_five = data.head()
print("\nFirst 5 Rows:")
print(first_five)

# Get the last 3 rows (Tail) - Optional
# last_three = data.tail(3)
# print("\nLast 3 Rows:")
# print(last_three)



First 5 Rows:
    AccID    Name Gender  Age    AccOpen  Balance AccStatus
0  ACC001     Raj      M   30  01-JAN-20     5000    Active
1  ACC002    Riya      F   29  01-JAN-21     8000  Inactive
2  ACC003    Amit      M   35  02-FEB-20    12000    Active
3  ACC004   Priya      F   28  02-FEB-21     4500    Active
4  ACC005  Vikram      M   40  03-MAR-20     7800    Active
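
Besides `head()` and `tail()`, pandas can draw a random sample with `sample()`. The sketch below shows `tail(3)` and a reproducible two-row sample; the seed value and the in-memory `SAMPLE` text are assumptions for illustration.

```python
import io

import pandas as pd

# First five rows in the same layout as Basic_data.csv (stand-in for the real file).
SAMPLE = (
    "AccID,Name,Gender,Age,AccOpen,Balance,AccStatus\n"
    "ACC001,Raj,M,30,01-JAN-20,5000,Active\n"
    "ACC002,Riya,F,29,01-JAN-21,8000,Inactive\n"
    "ACC003,Amit,M,35,02-FEB-20,12000,Active\n"
    "ACC004,Priya,F,28,02-FEB-21,4500,Active\n"
    "ACC005,Vikram,M,40,03-MAR-20,7800,Active\n"
)
data = pd.read_csv(io.StringIO(SAMPLE))

# Last three rows, in their original order.
last_three = data.tail(3)
print(last_three)

# Two rows chosen at random; random_state makes the choice repeatable.
random_two = data.sample(n=2, random_state=42)
print(random_two)
```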


## Data Display
This code displays the entire DataFrame using the to_string() method, which renders every row and column regardless of the pandas display limits.

In [5]:
# Display the entire DataFrame
print("\nFull DataFrame:")
print(data.to_string())



Full DataFrame:
     AccID      Name Gender  Age    AccOpen  Balance AccStatus
0   ACC001       Raj      M   30  01-JAN-20     5000    Active
1   ACC002      Riya      F   29  01-JAN-21     8000  Inactive
2   ACC003      Amit      M   35  02-FEB-20    12000    Active
3   ACC004     Priya      F   28  02-FEB-21     4500    Active
4   ACC005    Vikram      M   40  03-MAR-20     7800    Active
5   ACC006     Sonia      F   32  03-MAR-21     9200  Inactive
6   ACC007     Rahul      M   25  04-APR-20    10500    Active
7   ACC008     Pooja      F   22  04-APR-21     3800    Active
8   ACC009     Sunil      M   50  05-MAY-20    25000    Active
9   ACC010    Anjali      F   45  05-MAY-21    18000    Active
10  ACC011     Vivek      M   38  06-JUN-20    11200    Active
11  ACC012      Neha      F   31  06-JUN-21     6700    Active
12  ACC013     Rohit      M   27  07-JUL-20     9800    Active
13  ACC014     Aisha      F   24  07-JUL-21     5200    Active
14  ACC015    Manish      M   42  08-A
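
`to_string()` also accepts options that control what is rendered. The sketch below hides the index and limits the output to two columns, and shows `pd.option_context` as an alternative way to lift the display limits temporarily for a plain `print`. The in-memory `SAMPLE` text is an assumption standing in for `Basic_data.csv`.

```python
import io

import pandas as pd

# Two rows in the same layout as Basic_data.csv (stand-in for the real file).
SAMPLE = (
    "AccID,Name,Gender,Age,AccOpen,Balance,AccStatus\n"
    "ACC001,Raj,M,30,01-JAN-20,5000,Active\n"
    "ACC002,Riya,F,29,01-JAN-21,8000,Inactive\n"
)
data = pd.read_csv(io.StringIO(SAMPLE))

# Render only two columns and omit the row index.
compact = data.to_string(index=False, columns=["Name", "Balance"])
print(compact)

# Temporarily remove the row and column display limits for print().
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    print(data)
```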