# Session 13: Introduction to Pandas and DataFrames

## Introduction
Pandas is a Python library for manipulating and analyzing structured (tabular) data. In this tutorial, you will learn about Pandas and its core data structure, the DataFrame.

## Objectives
- Understand the basics of Pandas
- Learn how to create and manipulate DataFrames
- Perform data analysis with Pandas

## Prerequisites
- Working knowledge of Python: functions, lists, loops, conditionals, and NumPy arrays
- Basic understanding of data visualization with Matplotlib

## Estimated Time
1.5 hours

## Part 1: Introduction to Pandas (20 minutes)

**What is Pandas?**
Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python. It is particularly useful for data manipulation and analysis.

**Installing Pandas (should already be installed)**
If you haven't installed Pandas yet, you can do so using pip: `pip install pandas`

**Importing Pandas**
```python
import pandas as pd
```

### Creating DataFrames

#### Example 1: Creating a DataFrame from a Dictionary

**Instructions:**
1. Create a dictionary with some sample data.
2. Convert the dictionary to a DataFrame using `pd.DataFrame()`.


```python
import pandas as pd

# Sample data in a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [24, 27, 22, 32, 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

# Creating DataFrame
df = pd.DataFrame(data)

# Displaying DataFrame
print(df)
```

```
      Name  Age         City
0    Alice   24     New York
1      Bob   27  Los Angeles
2  Charlie   22      Chicago
3    David   32      Houston
4      Eve   29      Phoenix
```
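
As a small sketch of what `pd.DataFrame()` does with a dictionary: each key becomes a column label and each list becomes that column's values, so the lists need to have the same length. The names `people` and `people_df` are illustrative and not part of the example above.

```python
import pandas as pd

# Illustrative data in the same style as Example 1
people = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 27, 22],
}
people_df = pd.DataFrame(people)

print(people_df.shape)          # (number of rows, number of columns)
print(list(people_df.columns))  # column labels come from the dictionary keys

# Lists of unequal length cannot be lined up into columns
try:
    pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [24]})
except ValueError as err:
    print('ValueError:', err)
```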


### Practice Problem 1: Create Your Own DataFrame

Create a DataFrame for the following data:

- **'Product'**: ['Laptop', 'Tablet', 'Smartphone']
- **'Price'**: [1000, 500, 800]
- **'Quantity'**: [50, 30, 100]

**Solution:**


```python
import pandas as pd

# Sample data in a dictionary
data = {
    'Product': ['Laptop', 'Tablet', 'Smartphone'],
    'Price': [1000, 500, 800],
    'Quantity': [50, 30, 100]
}

# Creating DataFrame
df = pd.DataFrame(data)

# Displaying DataFrame
print(df)
```

```
      Product  Price  Quantity
0      Laptop   1000        50
1      Tablet    500        30
2  Smartphone    800       100
```


### Exploring DataFrames

#### Example 2: Basic DataFrame Operations

**Instructions:**

- Use `df.head()` to view the first few rows.
- Use `df.info()` to get a concise summary of the DataFrame.
- Use `df.describe()` to get a statistical summary.


```python
# Viewing the first few rows
print(df.head())

# Getting a concise summary. info() prints its report itself and returns None,
# so it is called directly rather than wrapped in print()
df.info()

# Getting a statistical summary
print(df.describe())
```

```
      Product  Price  Quantity
0      Laptop   1000        50
1      Tablet    500        30
2  Smartphone    800       100
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Product   3 non-null      object
 1   Price     3 non-null      int64
 2   Quantity  3 non-null      int64
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes
             Price    Quantity
count     3.000000    3.000000
mean    766.666667   60.000000
std     251.661148   36.055513
min     500.000000   30.000000
25%     650.000000   40.000000
50%     800.000000   50.000000
75%     900.000000   75.000000
max    1000.000000  100.000000
```
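
One detail worth knowing about `info()`: it writes its report straight to standard output and returns `None`, so wrapping it in `print()` adds a stray `None` line. The sketch below shows this and, as one possible approach, captures the report as a string through the `buf` parameter. The names `products_df`, `buffer`, and `report` are illustrative.

```python
import io
import pandas as pd

products_df = pd.DataFrame({
    'Product': ['Laptop', 'Tablet', 'Smartphone'],
    'Price': [1000, 500, 800],
    'Quantity': [50, 30, 100],
})

# info() prints its report and returns None
result = products_df.info()
print(result is None)

# Passing a text buffer via buf= captures the report as a string instead
buffer = io.StringIO()
products_df.info(buf=buffer)
report = buffer.getvalue()
print(report.splitlines()[0])
```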


### Practice Problem 2: Explore Your DataFrame

Explore the DataFrame you created in Practice Problem 1 using `head()`, `info()`, and `describe()`.


**Solution:**

```python
# Viewing the first few rows
print(df.head())

# Getting a concise summary (info() prints its own report, so no print() is needed)
df.info()

# Getting a statistical summary
print(df.describe())
```

```
      Product  Price  Quantity
0      Laptop   1000        50
1      Tablet    500        30
2  Smartphone    800       100
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Product   3 non-null      object
 1   Price     3 non-null      int64
 2   Quantity  3 non-null      int64
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes
             Price    Quantity
count     3.000000    3.000000
mean    766.666667   60.000000
std     251.661148   36.055513
min     500.000000   30.000000
25%     650.000000   40.000000
50%     800.000000   50.000000
75%     900.000000   75.000000
max    1000.000000  100.000000
```


## Part 2: Data Manipulation with Pandas (30 minutes)

### Selecting Data

#### Example 3: Selecting Columns and Rows

**Instructions:**
- Select a single column using `df['column_name']`.
- Select multiple columns using `df[['column1', 'column2']]`.
- Select rows using `df.iloc[]` for integer-location based indexing.


```python
# Selecting a single column
products = df['Product']
print(products)

# Selecting multiple columns
product_price = df[['Product', 'Price']]
print(product_price)

# Selecting the first row by integer position
first_row = df.iloc[0]
print(first_row)

# Selecting a subset: the first two rows and the first three columns
subset = df.iloc[0:2, 0:3]
print(subset)
```

```
0        Laptop
1        Tablet
2    Smartphone
Name: Product, dtype: object
      Product  Price
0      Laptop   1000
1      Tablet    500
2  Smartphone    800
Product     Laptop
Price         1000
Quantity        50
Name: 0, dtype: object
  Product  Price  Quantity
0  Laptop   1000        50
1  Tablet    500        30
```

### Practice Problem 3: Select Data from Your DataFrame

Select the 'Product' and 'Price' columns from your DataFrame. Also, select the first two rows.

**Solution:**


```python
# Selecting the 'Product' and 'Price' columns
product_price = df[['Product', 'Price']]
print(product_price)

# Selecting the first two rows
first_two_rows = df.iloc[0:2]
print(first_two_rows)
```

```
      Product  Price
0      Laptop   1000
1      Tablet    500
2  Smartphone    800
  Product  Price  Quantity
0  Laptop   1000        50
1  Tablet    500        30
```


### Filtering Data

#### Example 4: Filtering Data Based on Conditions

**Instructions:**
- Filter rows where a column meets a condition using boolean indexing.
- Combine multiple conditions using `&` (and) and `|` (or), wrapping each condition in parentheses.


```python
# Filter rows where 'Price' is greater than 600
filtered_data = df[df['Price'] > 600]
print(filtered_data)

# Combine two conditions with & (and); each condition needs its own parentheses
filtered_data = df[(df['Price'] > 600) & (df['Quantity'] < 80)]
print(filtered_data)
```

```
      Product  Price  Quantity
0      Laptop   1000        50
2  Smartphone    800       100
  Product  Price  Quantity
0  Laptop   1000        50
```
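
The `|` (or) operator works the same way as `&`. The sketch below applies it to the same product data and also uses `~` (not), which this tutorial does not otherwise cover. The parentheses matter because `&` and `|` bind more tightly than comparisons such as `>`. The names `products_df`, `either`, and `not_expensive` are illustrative.

```python
import pandas as pd

products_df = pd.DataFrame({
    'Product': ['Laptop', 'Tablet', 'Smartphone'],
    'Price': [1000, 500, 800],
    'Quantity': [50, 30, 100],
})

# | keeps rows where at least one condition holds
either = products_df[(products_df['Price'] > 900) | (products_df['Quantity'] > 80)]
print(either)

# ~ negates a condition: rows where Price is NOT greater than 600
not_expensive = products_df[~(products_df['Price'] > 600)]
print(not_expensive)
```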


### Practice Problem 4: Filter Data in Your DataFrame

Filter rows where the 'Price' is greater than 600 and 'Quantity' is less than 100.

**Solution:**


```python
# Filtering rows based on conditions
filtered_df = df[(df['Price'] > 600) & (df['Quantity'] < 100)]
print(filtered_df)
```

```
  Product  Price  Quantity
0  Laptop   1000        50
```


### Adding and Removing Data

#### Example 5: Adding and Removing Columns

**Instructions:**
- Add a new column to the DataFrame.
- Remove a column using `df.drop()`.


```python
# Adding a new column (one value per row)
df['Discount'] = [0.1, 0.2, 0.15]

# Removing a column: axis=1 targets columns, inplace=True modifies df itself
df.drop('Discount', axis=1, inplace=True)
```
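
A new column can also be computed from existing ones, and `drop()` can be used without `inplace=True`, in which case it returns a new DataFrame and leaves the original untouched. The following is a minimal sketch; the `FinalPrice` column and the variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

products_df = pd.DataFrame({
    'Product': ['Laptop', 'Tablet', 'Smartphone'],
    'Price': [1000, 500, 800],
})

# A new column built from a list, then one computed from existing columns
products_df['Discount'] = [0.1, 0.2, 0.15]
products_df['FinalPrice'] = products_df['Price'] * (1 - products_df['Discount'])
print(products_df)

# drop() without inplace=True returns a new DataFrame; products_df is unchanged
without_discount = products_df.drop(columns=['Discount'])
print(list(without_discount.columns))
print(list(products_df.columns))
```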


### Practice Problem 5: Add and Remove Columns in Your DataFrame

Add a 'Discount' column to your DataFrame with values [10, 15, 20]. Then, remove the 'Quantity' column.

**Solution:**


```python
# Adding a new column 'Discount'
df['Discount'] = [10, 15, 20]
print(df)

# Removing the 'Quantity' column
df = df.drop('Quantity', axis=1)
print(df)
```

```
      Product  Price  Quantity  Discount
0      Laptop   1000        50        10
1      Tablet    500        30        15
2  Smartphone    800       100        20
      Product  Price  Discount
0      Laptop   1000        10
1      Tablet    500        15
2  Smartphone    800        20
```


## Part 3: Advanced DataFrame Operations (40 minutes)

### Handling Missing Data

#### Example 6: Identifying and Handling Missing Data

**Instructions:**

- Identify missing data using `df.isnull()`.
- Drop rows with missing data using `df.dropna()`.
- Fill missing data using `df.fillna()`.


```python
import numpy as np

# Adding missing data. df has no 'Age' column at this point, so this assignment
# creates one, and every row of the new column is NaN, not just row 2
df.loc[2, 'Age'] = np.nan
print(df)

# Identifying missing data
print(df.isnull())

# Dropping rows with missing data (every row has a NaN in 'Age', so none remain)
df_dropped = df.dropna()
print(df_dropped)

# Filling missing data
df_filled = df.fillna(0)
print(df_filled)
```

```
      Product  Price  Discount  Age
0      Laptop   1000        10  NaN
1      Tablet    500        15  NaN
2  Smartphone    800        20  NaN
   Product  Price  Discount   Age
0    False  False     False  True
1    False  False     False  True
2    False  False     False  True
Empty DataFrame
Columns: [Product, Price, Discount, Age]
Index: []
      Product  Price  Discount  Age
0      Laptop   1000        10  0.0
1      Tablet    500        15  0.0
2  Smartphone    800        20  0.0
```
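
In the output above the whole `Age` column is `NaN`, because assigning to a column that does not exist yet creates it, so `dropna()` removes every row. The sketch below shows a more targeted case with a single missing value, using `dropna(subset=...)` and a per-column `fillna()`. The data and names are illustrative, and filling with the column mean is just one possible choice.

```python
import numpy as np
import pandas as pd

products_df = pd.DataFrame({
    'Product': ['Laptop', 'Tablet', 'Smartphone'],
    'Price': [1000.0, np.nan, 800.0],  # one missing price
    'Quantity': [50, 30, 100],
})

# Count missing values per column
missing_counts = products_df.isnull().sum()
print(missing_counts)

# Drop only the rows that are missing a Price
with_price = products_df.dropna(subset=['Price'])
print(with_price)

# Fill the missing Price with the mean of the known prices
filled = products_df.fillna({'Price': products_df['Price'].mean()})
print(filled)
```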


### Practice Problem 6: Handle Missing Data in Your DataFrame

Introduce missing data into your DataFrame and practice dropping rows and filling missing values.

**Solution:**


```python
import numpy as np

# Introducing missing data
df.loc[1, 'Price'] = np.nan
print(df)

# Identifying missing data
print(df.isnull())

# Dropping rows with missing data (the all-NaN 'Age' column from Example 6
# is still present, so every row is dropped)
df_dropped = df.dropna()
print(df_dropped)

# Filling missing data
df_filled = df.fillna(0)
print(df_filled)
```

```
      Product   Price  Discount  Age
0      Laptop  1000.0        10  NaN
1      Tablet     NaN        15  NaN
2  Smartphone   800.0        20  NaN
   Product  Price  Discount   Age
0    False  False     False  True
1    False   True     False  True
2    False  False     False  True
Empty DataFrame
Columns: [Product, Price, Discount, Age]
Index: []
      Product   Price  Discount  Age
0      Laptop  1000.0        10  0.0
1      Tablet     0.0        15  0.0
2  Smartphone   800.0        20  0.0
```


### Grouping and Aggregating Data

#### Example 7: Grouping and Aggregating Data

**Instructions:**
- Group data using `df.groupby()`.
- Perform aggregation functions such as `sum()`, `mean()`, etc.


```python
# Sample data
data = {
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50]
}
df = pd.DataFrame(data)

# Grouping data by 'Category'
grouped = df.groupby('Category')

# Aggregating data
sum_values = grouped.sum()
print(sum_values)

mean_values = grouped.mean()
print(mean_values)
```

```
          Value
Category
A            90
B            60
          Value
Category
A          30.0
B          30.0
```
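
Several aggregations can also be computed in one call with `agg()`, which returns one column per function. This is a minimal sketch on the same sample data; `values_df` and `summary` are illustrative names.

```python
import pandas as pd

values_df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50],
})

# One row per category, one column per aggregation function
summary = values_df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])
print(summary)
```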


### Practice Problem 7: Group and Aggregate Data in Your DataFrame

Group your DataFrame by 'Product' and calculate the total and average price for each product.

**Solution:**


```python
# Sample data
data = {
    'Product': ['Laptop', 'Tablet', 'Smartphone', 'Laptop', 'Tablet'],
    'Price': [1000, 500, 800, 1200, 450],
    'Quantity': [50, 30, 100, 70, 20]
}
df = pd.DataFrame(data)

# Grouping data by 'Product'
grouped = df.groupby('Product')

# Aggregating data
total_price = grouped['Price'].sum()
average_price = grouped['Price'].mean()

print(total_price)
print(average_price)
```

```
Product
Laptop        2200
Smartphone     800
Tablet         950
Name: Price, dtype: int64
Product
Laptop        1100.0
Smartphone     800.0
Tablet         475.0
Name: Price, dtype: float64
```


### Merging and Joining DataFrames

#### Example 8: Merging DataFrames

**Instructions:**
- Merge two DataFrames using `pd.merge()`.
- Perform different types of merges: inner, outer, left, right.


```python
# Sample data
data1 = {
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
}
df1 = pd.DataFrame(data1)

data2 = {
    'ID': [1, 2, 4],
    'Age': [24, 27, 30]
}
df2 = pd.DataFrame(data2)

# Merging DataFrames
merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)
```

```
   ID   Name  Age
0   1  Alice   24
1   2    Bob   27
```
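
To see which rows each merge type keeps, one option is an outer merge with `indicator=True`, which adds a `_merge` column recording whether each row came from the left frame, the right frame, or both. This is a sketch on the same sample data; `names_df`, `ages_df`, and `merged` are illustrative names.

```python
import pandas as pd

names_df = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
ages_df = pd.DataFrame({'ID': [1, 2, 4], 'Age': [24, 27, 30]})

# how='outer' keeps every ID from both sides; indicator=True adds a '_merge'
# column with the values 'both', 'left_only', or 'right_only'
merged = pd.merge(names_df, ages_df, on='ID', how='outer', indicator=True)
print(merged)
```

An inner merge corresponds to the rows marked `both`, a left merge to `both` plus `left_only`, and a right merge to `both` plus `right_only`.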


### Practice Problem 8: Merge DataFrames

Create two DataFrames with some common columns and merge them using different merge types.

**Solution:**


```python
# Sample data for first DataFrame
data1 = {
    'ID': [1, 2, 3],
    'Name': ['Alice', 'Bob', 'Charlie']
}
df1 = pd.DataFrame(data1)

# Sample data for second DataFrame
data2 = {
    'ID': [1, 2, 4],
    'Age': [24, 27, 30]
}
df2 = pd.DataFrame(data2)

# Inner merge
inner_merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(inner_merged_df)

# Outer merge
outer_merged_df = pd.merge(df1, df2, on='ID', how='outer')
print(outer_merged_df)

# Left merge
left_merged_df = pd.merge(df1, df2, on='ID', how='left')
print(left_merged_df)

# Right merge
right_merged_df = pd.merge(df1, df2, on='ID', how='right')
print(right_merged_df)
```

```
   ID   Name  Age
0   1  Alice   24
1   2    Bob   27
   ID     Name   Age
0   1    Alice  24.0
1   2      Bob  27.0
2   3  Charlie   NaN
3   4      NaN  30.0
   ID     Name   Age
0   1    Alice  24.0
1   2      Bob  27.0
2   3  Charlie   NaN
   ID   Name  Age
0   1  Alice   24
1   2    Bob   27
2   4    NaN   30
```


## Conclusion

In this tutorial, you have learned how to:
- Create and manipulate DataFrames using Pandas
- Select, filter, add, and remove data
- Handle missing data
- Group and aggregate data
- Merge and join DataFrames
