## Working with Data Structures in Python: Indexing, Decoding, Shape, and Reshaping (using CSV Files)

### 1. Indexing Data in a DataFrame

In [8]:
import pandas as pd

# Replace 'Basic_data.csv' with your actual file path
data = pd.read_csv("Basic_data.csv")

print(data.head())

    AccID    Name Gender  Age    AccOpen  Balance AccStatus
0  ACC001     Raj      M   30  01-JAN-20     5000    Active
1  ACC002    Riya      F   29  01-JAN-21     8000  Inactive
2  ACC003    Amit      M   35  02-FEB-20    12000    Active
3  ACC004   Priya      F   28  02-FEB-21     4500    Active
4  ACC005  Vikram      M   40  03-MAR-20     7800    Active


#### By Label (Column Name)

In [2]:
# Access a specific column
name_column = data["Name"]
print(name_column)


0          Raj
1         Riya
2         Amit
3        Priya
4       Vikram
5        Sonia
6        Rahul
7        Pooja
8        Sunil
9       Anjali
10       Vivek
11        Neha
12       Rohit
13       Aisha
14      Manish
15       Kiara
16       Sagar
17        Rani
18        Ajay
19       Seema
20        Atul
21      Nikita
22       Kapil
23       Nisha
24      Deepak
25        Sita
26       Arjun
27      Sunita
28        Yash
29    Priyanka
Name: Name, dtype: object
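
Label-based access also accepts a list of column names. The sketch below uses a small hypothetical three-row sample (not the tutorial's CSV file) to show that a single label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

# Hypothetical sample that reuses some of the tutorial's column names
sample = pd.DataFrame({
    "AccID": ["ACC001", "ACC002", "ACC003"],
    "Name": ["Raj", "Riya", "Amit"],
    "Age": [30, 29, 35],
})

# A single label returns a Series
names = sample["Name"]

# A list of labels returns a DataFrame containing only those columns
name_age = sample[["Name", "Age"]]

print(type(names).__name__)     # Series
print(type(name_age).__name__)  # DataFrame
print(name_age.shape)           # (3, 2)
```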


#### By Position (Integer Index)

In [4]:
# Access the first row
first_row = data.iloc[0]
print(first_row)

print('------------------')

# Access a specific cell by row and column index
cell_value = data.iloc[1, 2]  # Row position 1 (second row), column position 2 (third column, 'Gender')
print(cell_value)


AccID           ACC001
Name               Raj
Gender               M
Age                 30
AccOpen      01-JAN-20
Balance           5000
AccStatus       Active
Name: 0, dtype: object
------------------
F
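
Positions and labels happen to coincide in the example above because the DataFrame has the default 0, 1, 2, ... index. The following sketch, which uses a hypothetical sample indexed by account ID, shows how .iloc (position) and .loc (label) differ once the index is not the default one.

```python
import pandas as pd

# Hypothetical sample indexed by account ID instead of 0, 1, 2, ...
sample = pd.DataFrame(
    {"Name": ["Raj", "Riya", "Amit"], "Gender": ["M", "F", "M"]},
    index=["ACC001", "ACC002", "ACC003"],
)

# .iloc counts positions from zero, regardless of the index labels
second_row_by_position = sample.iloc[1]

# .loc looks rows up by their index label
second_row_by_label = sample.loc["ACC002"]

# Both expressions address the same cell: second row, 'Gender' column
cell_by_position = sample.iloc[1, 1]
cell_by_label = sample.loc["ACC002", "Gender"]

print(cell_by_position, cell_by_label)  # F F
```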


#### Boolean Indexing (Filtering)

In [5]:
# Filter rows where 'Age' is greater than 30
filtered_data = data[data["Age"] > 30]
print(filtered_data)


     AccID    Name Gender  Age    AccOpen  Balance AccStatus
2   ACC003    Amit      M   35  02-FEB-20    12000    Active
4   ACC005  Vikram      M   40  03-MAR-20     7800    Active
5   ACC006   Sonia      F   32  03-MAR-21     9200  Inactive
8   ACC009   Sunil      M   50  05-MAY-20    25000    Active
9   ACC010  Anjali      F   45  05-MAY-21    18000    Active
10  ACC011   Vivek      M   38  06-JUN-20    11200    Active
11  ACC012    Neha      F   31  06-JUN-21     6700    Active
14  ACC015  Manish      M   42  08-AUG-20    14000    Active
15  ACC016   Kiara      F   36  08-AUG-21     7500    Active
16  ACC017   Sagar      M   33  09-SEP-20     8900    Active
18  ACC019    Ajay      M   55  10-OCT-20    32000    Active
19  ACC020   Seema      F   48  10-OCT-21    21000    Active
20  ACC021    Atul      M   41  11-NOV-20    13500    Active
21  ACC022  Nikita      F   34  11-NOV-21     8100    Active
24  ACC025  Deepak      M   37  01-JAN-22    15800    Active
26  ACC027   Arjun      
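
Filters can be combined. As a minimal sketch on a hypothetical four-row sample, the conditions below are joined with & (and), | (or), and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical sample that reuses some of the tutorial's column names
sample = pd.DataFrame({
    "Name": ["Raj", "Riya", "Amit", "Sonia"],
    "Gender": ["M", "F", "M", "F"],
    "Age": [30, 29, 35, 32],
    "AccStatus": ["Active", "Inactive", "Active", "Inactive"],
})

# Both conditions must hold
over_30_active = sample[(sample["Age"] > 30) & (sample["AccStatus"] == "Active")]

# Either condition may hold
female_or_under_30 = sample[(sample["Gender"] == "F") | (sample["Age"] < 30)]

# Negate a condition
not_active = sample[~(sample["AccStatus"] == "Active")]

print(over_30_active["Name"].tolist())      # ['Amit']
print(female_or_under_30["Name"].tolist())  # ['Riya', 'Sonia']
print(not_active["Name"].tolist())          # ['Riya', 'Sonia']
```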

#### Modifying Data

In [9]:
# Update a value in a specific cell
data.loc[2, "Name"] = "Amitkumar"  # Row with index label 2 (third row), 'Name' column

# Update all values in a column
data["Balance"] = data["Balance"] * 1.05  # Increase balance by 5%

print(data.head())


    AccID       Name Gender  Age    AccOpen  Balance AccStatus
0  ACC001        Raj      M   30  01-JAN-20   5250.0    Active
1  ACC002       Riya      F   29  01-JAN-21   8400.0  Inactive
2  ACC003  Amitkumar      M   35  02-FEB-20  12600.0    Active
3  ACC004      Priya      F   28  02-FEB-21   4725.0    Active
4  ACC005     Vikram      M   40  03-MAR-20   8190.0    Active
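
An update can also be limited to the rows that match a condition by passing a boolean mask to .loc. The sketch below assumes a hypothetical rule (a 5% increase for active accounts only) on a small made-up sample; balances are stored as floats so the multiplication does not change the column's dtype.

```python
import pandas as pd

# Hypothetical sample; balances are floats on purpose
sample = pd.DataFrame({
    "Name": ["Raj", "Riya", "Amit"],
    "Balance": [5000.0, 8000.0, 12000.0],
    "AccStatus": ["Active", "Inactive", "Active"],
})

# Boolean mask marking the rows to update
active = sample["AccStatus"] == "Active"

# Only the masked rows of the 'Balance' column are modified
sample.loc[active, "Balance"] = sample.loc[active, "Balance"] * 1.05

print(sample)
```

The inactive account keeps its original balance, while the two active accounts are increased.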


### 2. Decoding Encoded Data

#### String Decoding:

pandas reads CSV files as UTF-8 by default. If your file contains non-ASCII characters saved in a different encoding, specify that encoding when reading it.

In [11]:
# Read a CSV with a specific encoding (replace 'latin-1' as needed)
data = pd.read_csv("Basic_data.csv", encoding="latin-1")

# Access and print decoded data
print(data["Name"])



0          Raj
1         Riya
2         Amit
3        Priya
4       Vikram
5        Sonia
6        Rahul
7        Pooja
8        Sunil
9       Anjali
10       Vivek
11        Neha
12       Rohit
13       Aisha
14      Manish
15       Kiara
16       Sagar
17        Rani
18        Ajay
19       Seema
20        Atul
21      Nikita
22       Kapil
23       Nisha
24      Deepak
25        Sita
26       Arjun
27      Sunita
28        Yash
29    Priyanka
Name: Name, dtype: object
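
Because the sample names are plain ASCII, the output above looks the same with or without the encoding argument. The sketch below builds a hypothetical in-memory CSV containing a non-ASCII name encoded as Latin-1 (the name "José" is an invented example) to show where the argument makes a difference.

```python
import io

import pandas as pd

# Hypothetical CSV content with a non-ASCII character, encoded as Latin-1
raw_bytes = "AccID,Name\nACC001,José\nACC002,Riya\n".encode("latin-1")

# Reading with the matching encoding decodes the name correctly
decoded = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(decoded["Name"].tolist())

# The default encoding is UTF-8, which cannot decode the Latin-1 byte for 'é'
try:
    pd.read_csv(io.BytesIO(raw_bytes))
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print("UTF-8 read failed:", utf8_failed)
```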


#### Categorical Decoding:

In [12]:
# Map the single-letter codes in the 'Gender' column to readable labels
data["Gender_Name"] = data["Gender"].map({"F": "Female", "M": "Male"})
print(data[["Gender", "Gender_Name"]].head())


  Gender Gender_Name
0      M        Male
1      F      Female
2      M        Male
3      F      Female
4      M        Male
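
Values that are missing or absent from the mapping come back as NaN. A minimal sketch on a hypothetical Series, with "Unknown" as an assumed placeholder label, shows one way to handle them:

```python
import pandas as pd

# Hypothetical codes, including one missing value
gender = pd.Series(["M", "F", None, "M"])
labels = {"F": "Female", "M": "Male"}

# Codes without an entry in the mapping become NaN
decoded = gender.map(labels)

# Replace the NaN results with a placeholder label
decoded_filled = decoded.fillna("Unknown")

print(decoded_filled.tolist())  # ['Male', 'Female', 'Unknown', 'Male']
```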


### 3. Understanding Data Shape

The shape of a DataFrame is a tuple representing the number of rows and columns: (rows, columns).


In [13]:
# Get the shape of the DataFrame
data_shape = data.shape
print("Data Shape:", data_shape)


Data Shape: (30, 8)


The shape of a Series (a single column) is also a tuple, but with one element: the number of values it holds.

In [14]:
# Get the shape of a Series (e.g., 'Age' column)
age_series_shape = data["Age"].shape
print("Age Series Shape:", age_series_shape)


Age Series Shape: (30,)
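
Since shape is a tuple, it can be unpacked directly. The sketch below, on a hypothetical three-row sample, also shows the related len(), size, and ndim values.

```python
import pandas as pd

# Hypothetical sample with three rows and two columns
sample = pd.DataFrame({"Name": ["Raj", "Riya", "Amit"], "Age": [30, 29, 35]})

# Unpack the (rows, columns) tuple
n_rows, n_cols = sample.shape
print(n_rows, n_cols)         # 3 2

print(len(sample))            # 3  (number of rows)
print(sample.size)            # 6  (rows * columns)
print(sample.ndim)            # 2  (a DataFrame is two-dimensional)

print(sample["Age"].shape)    # (3,)
print(sample["Age"].ndim)     # 1  (a Series is one-dimensional)
```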


### 4. Reshaping Data (Resizing and Pivoting)

#### Resizing

Use .iloc to select a subset of rows and columns by integer position.

In [16]:
# Select the first 10 rows and all columns
data_subset = data.iloc[:10, :]
print(data_subset.shape)

print(data_subset)


(10, 8)
    AccID    Name Gender  Age    AccOpen  Balance AccStatus Gender_Name
0  ACC001     Raj      M   30  01-JAN-20     5000    Active        Male
1  ACC002    Riya      F   29  01-JAN-21     8000  Inactive      Female
2  ACC003    Amit      M   35  02-FEB-20    12000    Active        Male
3  ACC004   Priya      F   28  02-FEB-21     4500    Active      Female
4  ACC005  Vikram      M   40  03-MAR-20     7800    Active        Male
5  ACC006   Sonia      F   32  03-MAR-21     9200  Inactive      Female
6  ACC007   Rahul      M   25  04-APR-20    10500    Active        Male
7  ACC008   Pooja      F   22  04-APR-21     3800    Active      Female
8  ACC009   Sunil      M   50  05-MAY-20    25000    Active        Male
9  ACC010  Anjali      F   45  05-MAY-21    18000    Active      Female


Use .loc to select rows and columns by label or with a boolean condition.

In [18]:
# Select rows where 'Age' is less than 25 and all columns
young_data = data.loc[data["Age"] < 25, :]
print(young_data.shape)

print(young_data)


(5, 8)
     AccID      Name Gender  Age    AccOpen  Balance AccStatus Gender_Name
7   ACC008     Pooja      F   22  04-APR-21     3800    Active      Female
13  ACC014     Aisha      F   24  07-JUL-21     5200    Active      Female
23  ACC024     Nisha      F   23  12-DEC-21     6000    Active      Female
27  ACC028    Sunita      F    3  32-FEB-23    11000    Active      Female
29  ACC030  Priyanka    NaN   21  03-MAR-23     4900    Active         NaN
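
One difference worth noting when resizing with slices: .loc includes the end label, while .iloc excludes the end position. The sketch below uses a hypothetical five-row sample to compare the two and to show .loc restricting rows and columns at once.

```python
import pandas as pd

# Hypothetical sample with the default 0..4 index
sample = pd.DataFrame({
    "Name": ["Raj", "Riya", "Amit", "Priya", "Pooja"],
    "Age": [30, 29, 35, 28, 22],
    "Balance": [5000, 8000, 12000, 4500, 3800],
})

# Label slice: rows labelled 0, 1 and 2 (end label included)
by_label = sample.loc[0:2, ["Name", "Age"]]

# Position slice: rows at positions 0 and 1 (end position excluded)
by_position = sample.iloc[0:2, :]

# Boolean condition for rows combined with a list of column labels
young = sample.loc[sample["Age"] < 25, ["Name", "Age"]]

print(by_label.shape)     # (3, 2)
print(by_position.shape)  # (2, 3)
print(young.shape)        # (1, 2)
```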


#### Pivoting

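Use pivot_table to summarize data: one column supplies the row groups, another supplies the column groups, and an aggregation function is applied to the values in each cell.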

In [21]:
# Create a pivot table with 'AccStatus' as rows and 'Gender' as columns, aggregating 'Balance' by mean
status_gender_balance = data.pivot_table(values="Balance", index="AccStatus", columns="Gender", aggfunc="mean")
print(status_gender_balance)



Gender          F             M
AccStatus                      
Active     8775.0  13157.142857
DeActive      NaN  17000.000000
Inactive   8600.0           NaN
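
The NaN cells in the output mark combinations with no matching rows. As a minimal sketch on a hypothetical five-row sample, pivot_table's fill_value parameter can substitute a value for those empty cells; whether 0 is a sensible substitute depends on how the table will be used.

```python
import pandas as pd

# Hypothetical sample with no inactive male accounts
sample = pd.DataFrame({
    "Gender": ["M", "F", "M", "F", "M"],
    "AccStatus": ["Active", "Active", "Active", "Inactive", "Active"],
    "Balance": [5000, 8000, 12000, 9200, 7800],
})

# Mean balance per (AccStatus, Gender); empty combinations are NaN
summary = sample.pivot_table(
    values="Balance", index="AccStatus", columns="Gender", aggfunc="mean"
)

# Same table, with empty combinations filled with 0
filled = sample.pivot_table(
    values="Balance", index="AccStatus", columns="Gender",
    aggfunc="mean", fill_value=0,
)

print(summary)
print(filled)
```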


### Remember:

These topics cover the basics. pandas offers many more advanced features for data manipulation that are worth exploring.

Practice with different CSV files to reinforce what you have learned.