# Pandas Exercises

## Overview

This module covers essential Pandas operations, including data manipulation, analysis, and basic statistical functions. It provides hands-on experience with real-world data using the Pandas library.

## Learning Objectives

- Convert lists of dictionaries and CSV files to DataFrames
- Perform data access operations using Pandas
- Handle missing data with the `fillna` function
- Apply descriptive statistics functions to analyze data
- Utilize Pandas for data slicing and dicing

## Prerequisites

- Basic understanding of Python
- Familiarity with Jupyter notebooks
- Installed libraries: numpy, pandas

## Get Started

### Install required packages

In [None]:
# Install the required libraries: numpy and pandas
%pip install numpy pandas

### Import necessary libraries

In [None]:
# Importing the numpy library, which provides support for large, multi-dimensional arrays and matrices
# It also provides mathematical functions to operate on these arrays
import numpy as np

# Importing the pandas library, which is a powerful, flexible, and easy-to-use data manipulation and analysis tool
# It provides data structures such as DataFrame and Series to handle and analyze structured data
import pandas as pd

## Convert list of dictionaries to DataFrame

In [None]:
# List of dictionaries containing city names and associated data values
d = [
    # First city: Delhi with associated data value 1000
    {"city": "Delhi", "data": 1000},
    
    # Second city: Bangalore with associated data value 2000
    {"city": "Bangalore", "data": 2000},
    
    # Third city: Mumbai with associated data value 1000
    {"city": "Mumbai", "data": 1000},
]

# Output the list of dictionaries
d

Convert the list of dictionaries `d` into a DataFrame.

In [None]:
# Create a pandas DataFrame from the list of dictionaries 'd'

df = # Your code goes here

# Display the DataFrame
df
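
As a point of comparison, here is a minimal sketch of the same kind of conversion on unrelated toy data. The names `records` and `fruit_df` are hypothetical and not part of the exercise; the sketch assumes each dictionary shares the same keys.

```python
import pandas as pd

# Hypothetical records, unrelated to the exercise data
records = [
    {"fruit": "apple", "count": 3},
    {"fruit": "pear", "count": 5},
]

# Each dictionary becomes one row; the keys become column names
fruit_df = pd.DataFrame(records)
print(fruit_df)
```

If a key is missing from some dictionaries, pandas fills the corresponding cells with `NaN`.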

## Convert CSV files to DataFrame

Read in the CSV file and convert it to a DataFrame.

In [None]:
# Read the CSV file ("../../Data/simplemaps-worldcities-basic.csv") containing city data into a pandas DataFrame
# The path is relative to the current working directory

city_data = # Your code goes here

Show the first 10 rows of the converted DataFrame.

In [None]:
# Display the first 10 rows of the 'city_data' DataFrame

# Your code goes here
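
The sketch below illustrates reading CSV data and previewing rows without relying on the exercise file. It assumes an in-memory string (`csv_text`, a made-up stand-in for a file on disk); `scores` is likewise a hypothetical name.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "name,score\nana,90\nben,85\ncy,70\n"

# read_csv accepts a file path or any file-like object
scores = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (n defaults to 5)
print(scores.head(2))
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path string.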

## Data Access

### Head and Tail

Get the last 10 rows of `city_data`:

In [None]:
# Your code goes here

### Slicing and Dicing

In [None]:
series_es = city_data.lat
type(series_es)

Get the first 5 odd-numbered rows of `series_es`:

In [None]:
# Your code goes here

Get the first 8 rows of `series_es`:

In [None]:
# Your code goes here

Get the first 8 rows of `city_data`:

In [None]:
# Your code goes here

Get the first 4 columns of the first 5 rows of `city_data`:

In [None]:
# Your code goes here
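
For reference, position-based selection can be sketched on a small made-up frame. Everything here (`toy` and its columns `a`, `b`, `c`) is hypothetical; the sketch only shows how `iloc` slices and `tail` behave, not the intended exercise answers.

```python
import pandas as pd

# Hypothetical 10-row frame with three integer columns
toy = pd.DataFrame({"a": range(10), "b": range(10, 20), "c": range(20, 30)})

first_three = toy.iloc[:3]         # first 3 rows, all columns
every_other = toy["a"].iloc[1::2]  # rows at positions 1, 3, 5, ...
block = toy.iloc[:2, :2]           # first 2 rows of the first 2 columns
last_two = toy.tail(2)             # last 2 rows

print(block)
```

`iloc` uses zero-based positions and excludes the end of a slice, the same as Python list slicing.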

Select the cities with a population of more than 10 million, keeping only the columns whose names start with the letter `p`:

In [None]:
# Filter the city_data dataframe to include cities with a population greater than 10 million
# The condition city_data["pop"] > 10000000 selects rows where the population exceeds 10 million
# Then, select only the columns whose names start with "p"
# This is achieved using the str.startswith("p") function on the column names

# Your code goes here
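
One possible way to combine a row condition with a column-name condition is sketched below on made-up data. The frame `toy` and its columns are hypothetical; the approach assumes `loc` is given a boolean mask for rows and another for columns.

```python
import pandas as pd

# Hypothetical frame; only "price" and "parts" start with "p"
toy = pd.DataFrame({
    "name": ["x", "y", "z"],
    "price": [5, 50, 500],
    "parts": [1, 2, 3],
})

# Boolean mask over rows: True where price exceeds 10
row_mask = toy["price"] > 10

# Boolean mask over columns: True where the name starts with "p"
col_mask = toy.columns.str.startswith("p")

# loc accepts one mask per axis
result = toy.loc[row_mask, col_mask]
print(result)
```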

## Data Operations

### Missing data and the `fillna` function

In [None]:
# Create a DataFrame with 8 rows and 3 columns, filled with random numbers from a normal distribution
df = pd.DataFrame(np.random.randn(8, 3), columns=["A", "B", "C"])

# Set a specific value (at row index 4, column 'C') to NaN (missing value)
df.iloc[4, 2] = np.nan

# Display the DataFrame
df

Replace all `NaN` values in `df` with `0`:

In [None]:
# Your code goes here
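
As a rough illustration of how `fillna` behaves, the sketch below uses a made-up frame (`toy`) and an arbitrary fill value of `-1`, chosen only to distinguish it from the exercise.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column
toy = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, 2.0, 2.0]})

# fillna returns a new DataFrame; the original is left unchanged
filled = toy.fillna(-1)
print(filled)
```

Because `fillna` returns a copy by default, the result has to be assigned to a name for the replacement to persist.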

## Descriptive Statistics functions

In [None]:
columns_numeric = ["lat", "lng", "pop"]

Get the average `lat`, `lng`, and `pop` values:

In [None]:
# Your code goes here

Get the sum of the `lat`, `lng`, and `pop` values:

In [None]:
# Your code goes here

Get the count of non-missing `lat`, `lng`, and `pop` values:

In [None]:
# Your code goes here

Get the 75th percentile of the `lat`, `lng`, and `pop` values:

In [None]:
# Your code goes here

Get the sum of each row:

In [None]:
# Your code goes here

Calculate the most important statistics for numerical data in one go so that we don’t have to use individual functions:

In [None]:
# Your code goes here
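
The descriptive statistics named in this section can be sketched on a small made-up frame. The frame `toy` and the column list `cols` are hypothetical; the sketch assumes all selected columns are numeric.

```python
import pandas as pd

# Hypothetical numeric frame
toy = pd.DataFrame({"h": [1.0, 2.0, 3.0, 4.0], "w": [10.0, 20.0, 30.0, 40.0]})
cols = ["h", "w"]

means = toy[cols].mean()          # column averages
totals = toy[cols].sum()          # column sums
counts = toy[cols].count()        # non-missing values per column
q75 = toy[cols].quantile(0.75)    # 75th percentile per column
row_sums = toy[cols].sum(axis=1)  # axis=1 sums across each row
summary = toy[cols].describe()    # count, mean, std, min, quartiles, max

print(summary)
```

Most of these methods take an `axis` argument: the default works down each column, while `axis=1` works across each row.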

## Conclusion

In this module, you've learned how to:

- Convert different data formats to Pandas DataFrames
- Access and manipulate data using Pandas
- Handle missing data
- Perform basic statistical analysis on datasets
- Use various Pandas functions for data exploration and manipulation

These skills form a foundation for more advanced data analysis and machine learning tasks using Python and Pandas.

## Clean up

Remember to shut down your Jupyter notebook kernel when you're done to free up resources.