# **Introduction to Pandas**

## __Agenda__

In this lesson, we will cover the following concepts with the help of examples:

- Fundamentals of Pandas
  * Purpose of Pandas
  * Features of Pandas
- Data Structures
- Introduction to Series
  * Creating and Accessing Pandas Series Using Different Methods
  * Basic Information in Pandas Series
  * Operations and Transformations in Pandas Series
  * Querying a Series

## __1. Fundamentals of Pandas__

Pandas is an open-source Python library built on top of NumPy and used for data manipulation and analysis.

- It introduces data structures like DataFrame and Series that make working with structured data more efficient.

![image.png](attachment:25081b89-e5a2-48db-a049-49f0e7efd1d0.png)

### __1.1 Purpose of Pandas__
![image.png](attachment:00e17f94-086d-4da9-b246-ec13a12b73ca.png)

### __1.2 Features of Pandas__
![image.png](attachment:45a2c899-14a3-4f95-be31-69cd4773f430.png)





## __2. Data Structures__
The two main data structures in Pandas are:
![image.png](attachment:cd42d5f9-517a-4aa8-a964-0b351c6da090.png)
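
As a minimal sketch of how the two structures relate, assuming they are the Series and DataFrame named in Section 1, the example below builds one of each. The names and values are made up for illustration.

```python
import pandas as pd

# A Series: one-dimensional, labeled values (illustrative data)
ages = pd.Series([25, 32, 41], index=["Asha", "Ben", "Chen"], name="age")

# A DataFrame: a two-dimensional table whose columns share one index
people = pd.DataFrame(
    {"age": [25, 32, 41], "city": ["Pune", "Leeds", "Taipei"]},
    index=["Asha", "Ben", "Chen"],
)

# Selecting a single column from a DataFrame gives back a Series
age_column = people["age"]

print(type(ages).__name__)        # Series
print(type(people).__name__)      # DataFrame
print(type(age_column).__name__)  # Series
```

One way to think of a DataFrame is as a collection of Series that share the same index.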

## __3. Introduction to Series__
A Series is a one-dimensional, array-like object that holds data together with an associated array of labels, called its index.

It can be created with different data inputs:
![image.png](attachment:f47c6a25-748e-4bfd-a82f-d5d8010475a2.png)
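
The sketch below shows four inputs that `pd.Series` commonly accepts: a list, a NumPy array, a dictionary, and a scalar. This selection is an assumption about what the figure lists, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default integer index 0, 1, 2
from_list = pd.Series([10, 20, 30])

# From a NumPy array: the dtype follows the array (float64 here)
from_array = pd.Series(np.array([1.5, 2.5, 3.5]))

# From a dictionary: the keys become the index labels
from_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# From a scalar: the value is repeated once per index label
from_scalar = pd.Series(7, index=["x", "y", "z"])

print(from_dict)
print(from_scalar)
```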

### __3.1 Creating and Accessing Pandas Series Using Different Methods:__

In [None]:
import pandas as pd

In [None]:
ser = pd.Series(data=[10,24,34,46,32.5,10], index=["Jack","John", "Jacob","Ajay","Dheeraj","Suraj"],
         name='Age')

In [None]:
ser

In [None]:
ser.index

In [None]:
ser.values

In [None]:
ser.name

In [None]:
ser.dtype

In [None]:
ser.ndim # number of dimensions (always 1 for a Series)

In [None]:
ser.shape # tuple of the form (number of elements,)

In [None]:
ser.size # number of elements, as an integer

In [None]:
# Creating a Pandas Series from a dictionary
data_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
series_from_dict = pd.Series(data_dict)

#### __Indexing and Slicing__

In [None]:
# Indexing: get an element by its index label or by its position

In [None]:
ser['Jacob'] # index label method

In [None]:
ser.iloc[3] # index position; plain ser[3] on a label index is deprecated in pandas 2.x

In [None]:
ser2 = pd.Series(data=[10,24,34,46,32.5,10], index=[1,2,4,5,0,3],
         name='Age')
ser2

In [None]:
ser2[2] # with an integer index, 2 is treated as a label, not a position

In [None]:
# loc and iloc indexers
# loc - select by index label
# iloc - select by integer position

In [None]:
ser2.iloc[2] # position 2 -> 34.0

In [None]:
ser2.loc[2] # label 2 -> 24.0
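
The difference between `loc` and `iloc` is easiest to see when the index labels are integers in a shuffled order. The following is a small self-contained sketch with illustrative values, separate from `ser2`.

```python
import pandas as pd

# Integer labels deliberately out of order
s = pd.Series([10.0, 24.0, 34.0], index=[2, 0, 1])

by_label = s.loc[2]      # the row whose label is 2 -> 10.0
by_position = s.iloc[2]  # the third row, whatever its label -> 34.0

print(by_label, by_position)
```

Using `loc` or `iloc` explicitly avoids any ambiguity about whether a number is meant as a label or a position.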

In [None]:
ser3 = pd.Series(data=[10,24,34,46,32.5,10],
         name='Age')
ser3

In [None]:
ser2 = pd.Series(data=[10,24,34,46,32.5,10], index=[1,2,4,2,0,3],
         name='Age')
ser2

In [None]:
ser2.loc[2] # label 2 now appears twice, so this returns a Series with both rows

In [None]:
ser2.index

In [None]:
ser2.values
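
Pandas allows duplicate index labels, as in the redefined `ser2` above. The sketch below, using illustrative values, shows how the result of `loc` changes with the number of matching rows and how to check an index for uniqueness.

```python
import pandas as pd

s = pd.Series([10.0, 24.0, 46.0], index=[1, 2, 2])

unique_hit = s.loc[1]     # label appears once -> a scalar
duplicate_hit = s.loc[2]  # label appears twice -> a Series with both rows

print(s.index.is_unique)             # False
print(type(duplicate_hit).__name__)  # Series
```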

In [None]:
import pandas as pd

# Creating a Pandas Series from a list
data = [1, 2, 3, 4, 5] # list
series = pd.Series(data)

# Creating a Pandas Series with a specified index
index = ['a', 'b', 'c', 'd', 'e']
series_with_index = pd.Series(data, index=index)
series_with_index

In [None]:
# Accessing data in a Series
print(series[2])  # Accessing element at index 2
print(series_with_index['b'])  # Accessing element with index 'b'

### __3.2 Basic Information in Pandas Series__
Methods such as `head()`, `tail()`, `describe()`, `unique()`, and `nunique()` summarize the characteristics of a Series and support data exploration and analysis.

In [None]:
ser2.head() # return first 5 entries

In [None]:
ser2.head(2) # return first 2 entries

In [None]:
ser2.tail(2) # return last 2 entries

In [None]:
ser2.tail() # return last 5 entries

In [None]:
ser2.describe() # summary statistics: count, mean, std, min, quartiles, and max

In [None]:
ser2.unique() # return the unique values from pandas series

In [None]:
ser2

In [None]:
set(ser2)

In [None]:
ser2.nunique() # number of unique values

In [None]:
# Return the first n rows
first_n_rows = series.head(3)

# Return the last n rows
last_n_rows = series.tail(3)

# Return dimensions; for a Series this is a one-element tuple (number of rows,)
dimensions = series.shape

# Generate descriptive statistics
stats = series.describe()

# Return unique values
unique_values = series.unique()

# Return the number of unique values
num_unique_values = series.nunique()
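
To make the output of `describe()` and `shape` concrete, here is a short sketch on a small numeric Series. The values are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
summary = s.describe()

# describe() on numeric data returns a Series with eight entries
print(list(summary.index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

print(summary["mean"], summary["50%"])  # 3.0 3.0
print(s.shape)                          # (5,)
```

Because the summary is itself a Series, individual statistics can be read by label, as with `summary["mean"]`.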

### __3.3 Operations and Transformations in Pandas Series__
Operations and transformations modify, clean, and reshape the values in a Series: element-wise arithmetic, applying functions, mapping values, sorting, and handling missing data.

They let you adapt data to a specific analysis or visualization while maintaining data quality.

In [None]:
series

In [None]:
series_with_index

In [None]:
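# Element-wise addition aligns on index labels, not positions.
# series (labels 0-4) and series_with_index (labels 'a'-'e') share no labels,
# so every entry of the result is NaN.
result_series = series + series_with_index
result_series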
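
Arithmetic between two Series matches values by index label. The sketch below uses two illustrative Series with partly overlapping labels and shows one way to avoid `NaN` for unmatched labels with `Series.add` and its `fill_value` parameter.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

# Labels present in only one operand produce NaN
aligned = left + right

# fill_value=0 treats a missing operand as 0 instead
filled = left.add(right, fill_value=0)

print(aligned)
print(filled)
```

Here `aligned` has values only for `b` and `c`, while `filled` has a value for every label from `a` to `d`.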

In [None]:
ser2

In [None]:
ser2 + ser2

In [None]:
ser2.apply(lambda x:x*3)

In [None]:
def ops(n):
    return n*4

In [None]:
ser2.apply(ops)
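
For simple arithmetic like the examples above, `apply` is not strictly necessary. As a sketch with illustrative values, the same result can be obtained with a vectorized expression on the whole Series.

```python
import pandas as pd

s = pd.Series([10.0, 24.0, 34.0])

via_apply = s.apply(lambda x: x * 3)  # calls the function once per element
vectorized = s * 3                    # operates on the whole Series at once

print(via_apply.equals(vectorized))   # True
```

`apply` remains useful when the logic cannot be expressed with built-in operators or methods.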

In [None]:
ser2

In [None]:
res = ser2.map({10.0:99.99 , 24.0:35.42}) # values with no matching key become NaN (missing)
res

In [None]:
res = ser2.map({10.0:99.99 , 24.0:35.42},na_action=None) # na_action=None is the default, so the result is the same
res

In [None]:
res.map({99.99:10, 35.42:50},na_action=None)
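
The `na_action` parameter matters mostly when `map` is given a function rather than a dictionary. The following sketch, with illustrative values, first creates a missing value through a dictionary lookup and then uses `na_action="ignore"` so that the function is not called on it.

```python
import pandas as pd

s = pd.Series([10.0, 24.0, 34.0])

# 34.0 has no key in the dictionary, so it maps to NaN
mapped = s.map({10.0: "ten", 24.0: "twenty-four"})

# na_action="ignore" passes NaN through without calling the function;
# without it, the lambda would fail when calling .upper() on NaN
relabelled = mapped.map(lambda v: v.upper(), na_action="ignore")

print(mapped)
print(relabelled)
```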

In [None]:
ser2.sort_values(ascending=False)

In [None]:
res

In [None]:
res.isnull() # check the missing value and return as bool

In [None]:
res.fillna(100) # fill the missing values in a pandas series with the required value

In [None]:
# Apply a function to each element
squared_series = series.apply(lambda x: x**2)

# Map values using a dictionary
mapped_series = series.map({1: 'one', 2: 'two', 3: 'three'})

# Sort the Series by values
sorted_series = series.sort_values()

# Check for missing values
missing_values = series.isnull()

# Fill missing values with a specified value
filled_series = series.fillna(0)

#### __Boolean Masking__

In [None]:
ser2

In [None]:
m = [True, False, True, False, False,True]

In [None]:
ser2[m]

In [None]:
ser2 < 35

In [None]:
m = (ser2 < 35)

In [None]:
ser2[m] # values < 35

In [None]:
ser2[ser2<30] # Boolean masking

In [None]:
ser2

In [None]:
ser2.isin([10.0,32.5,60])
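
Boolean masks can also be combined. The sketch below uses the element-wise operators `&` (and), `|` (or), and `~` (not) on an illustrative Series. Each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

s = pd.Series([10.0, 24.0, 34.0, 46.0, 32.5, 10.0])

between = s[(s > 20) & (s < 40)]       # both conditions hold
extremes = s[(s < 20) | (s > 40)]      # either condition holds
not_listed = s[~s.isin([10.0, 32.5])]  # ~ inverts the mask

print(between.tolist())     # [24.0, 34.0, 32.5]
print(extremes.tolist())    # [10.0, 46.0, 10.0]
print(not_listed.tolist())  # [24.0, 34.0, 46.0]
```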

### __3.4 Querying a Series__
Selecting and filtering data based on specific conditions is an essential aspect of querying a Pandas Series. 

The following examples illustrate common querying operations that can be applied to a Pandas Series:

In [None]:
ser2.iloc[[0,2,4]]  # multiple values by position

In [None]:
ser2.loc[[0,2,4]] # multiple values by label

In [None]:
# Slicing
# iloc - start is included and stop is excluded
# loc - start and stop are both included

In [None]:
ser2

In [None]:
ser2.iloc[1:3]

In [None]:
ser2.loc[1:4]
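
The inclusive and exclusive slicing rules are easier to see with string labels. This is a minimal sketch on an illustrative Series, separate from `ser2`.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[1:3]  # positions 1 and 2; stop position 3 is excluded
by_label = s.loc["b":"d"]  # labels b through d; the stop label is included

print(list(by_position.index))  # ['b', 'c']
print(list(by_label.index))     # ['b', 'c', 'd']
```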

In [None]:
type(ser2)

In [None]:
import pandas as pd

# Create a Pandas Series
data = {'a': 10, 'b': 20, 'c': 30, 'd': 40, 'e': 50}
series = pd.Series(data)

# Select elements greater than 30
selected_greater_than_30 = series[series > 30]

# Select elements equal to 20
selected_equal_to_20 = series[series == 20]

# Select elements not equal to 40
selected_not_equal_to_40 = series[series != 40]

# Select elements based on multiple conditions
selected_multiple_conditions = series[(series > 20) & (series < 50)] # & is the element-wise logical AND

# Select elements based on a list of values
selected_by_list = series[series.isin([20, 40, 60])]

# Select elements using string methods (if applicable)
string_series = pd.Series(['apple', 'banana', 'cherry', 'date', 'elderberry'])
selected_by_string_method = string_series[string_series.str.startswith('b')]

# Query based on index labels
selected_by_index_labels = series.loc[['a', 'c', 'e']]

# Query based on numeric position
selected_by_numeric_position = series.iloc[1:4]

# Display the results
print("Original Series:")
print(series)
print("\nSelected greater than 30:")
print(selected_greater_than_30)
print("\nSelected equal To 20:")
print(selected_equal_to_20)
print("\nSelected not equal to 40:")
print(selected_not_equal_to_40)
print("\nSelected based on multiple conditions:")
print(selected_multiple_conditions)
print("\nSelected based on list of values:")
print(selected_by_list)
print("\nSelected based on string method (startswith):")
print(selected_by_string_method)
print("\nSelected based on index labels:")
print(selected_by_index_labels)
print("\nSelected based on numeric position:")
print(selected_by_numeric_position)

# __Assisted Practice__

## __Problem Statement:__
Use Pandas Series to analyze sales data for a retail store over a week and draw insights from the data.

### __Dataset:__
__Sample sales data__

sales_data = [120, 150, 130, 170, 160, 180, 140]

days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

## __Steps to Perform__

1. Create a Pandas Series for sales data
   - Use a list of daily sales figures to create a Pandas Series
   - Assign days of the week as the index
2. Access and manipulate sales data
   - Access sales data for specific days using index labels
   - Calculate total sales for the week
   - Identify the day with the highest and lowest sales
3. Basic analysis of sales data
   - Calculate the average sales for the week
   - Determine any days with sales figures significantly different from the average
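
One possible way to work through the steps above is sketched below. The problem statement does not define "significantly different"; this sketch assumes it means more than one sample standard deviation away from the weekly average, which is only one reasonable choice. Variable names other than `sales_data` and `days_of_week` are illustrative.

```python
import pandas as pd

sales_data = [120, 150, 130, 170, 160, 180, 140]
days_of_week = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                'Friday', 'Saturday', 'Sunday']

# Step 1: build the Series with days of the week as the index
sales = pd.Series(sales_data, index=days_of_week, name="sales")

# Step 2: access and aggregate
wednesday_sales = sales["Wednesday"]  # access by index label
total_sales = sales.sum()             # total for the week
best_day = sales.idxmax()             # label of the highest value
worst_day = sales.idxmin()            # label of the lowest value

# Step 3: basic analysis
average_sales = sales.mean()
# Assumption: "significantly different" = more than one standard
# deviation (pandas default, ddof=1) away from the average
threshold = sales.std()
unusual_days = sales[(sales - average_sales).abs() > threshold]

print(total_sales, best_day, worst_day)  # 1050 Saturday Monday
print(average_sales)                     # 150.0
print(unusual_days)
```

With this threshold, Monday and Saturday are flagged. A stricter or looser cutoff would change which days are selected.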