# __Date and TimeDelta in Pandas__

## __Agenda__

In this lesson, we will cover the following concepts with the help of examples:

- Date and TimeDelta in Pandas
- Date Handling in Pandas
  * Extracting Components from Dates
- Timedelta in Pandas
  * Creating a Timedelta
  * Performing Arithmetic Operations
  * Resampling Time Series Data
- Categorical Data Handling
  * Creating a Categorical Variable
  * Counting Occurrences of Each Category
  * Creating Dummy Variables
  * Label Encoding

## __1. Date and TimeDelta in Pandas__

Pandas builds on Python's datetime module: its Timestamp and DatetimeIndex types provide robust functionality for handling date and time data, while the Timedelta class allows convenient manipulation of time intervals. This combination is particularly useful for time-based analysis and working with temporal data in a DataFrame.

![image.png](attachment:88a45d36-de0a-4c61-a816-49f61fb1ae52.png)

## __2. Date Handling in Pandas__
#### Creating a Date Range:

- The date_range function is used to generate a sequence of dates within a specified range. 
- It is a powerful tool for creating time indices or date columns in a DataFrame. 
- The start and end parameters define the range, while freq determines the frequency, such as daily (D) or month-end (M, renamed ME in pandas 2.2 and later).

In [None]:
import pandas as pd

# Generate a date range
date_range = pd.date_range(start='2023-01-01', end='2023-01-10', freq='D')
print(date_range)
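
As a small complement to the cell above, the following sketch shows two other ways to call date_range: with periods instead of end, and with a non-daily frequency. The variable names are illustrative, and the month-start alias 'MS' is used here only because it behaves the same across recent pandas versions.

```python
import pandas as pd

# Five consecutive days, specified by a count instead of an end date
five_days = pd.date_range(start='2023-01-01', periods=5, freq='D')
print(five_days)

# The first day of each month in the first quarter of 2023
month_starts = pd.date_range(start='2023-01-01', end='2023-03-31', freq='MS')
print(month_starts)
```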

### __2.1 Extracting Components from Dates__

Pandas provides the dt accessor to extract components such as the day, month, and year from a datetime column in a DataFrame. This is valuable for time-based analysis when specific date attributes need to be considered.

In [None]:
import pandas as pd

# Build a sample DataFrame with a 'Date' column stored as strings
data = {'Date': ['2023-01-01', '2023-02-15', '2023-03-20']}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])  # convert the strings to the datetime data type

# Extracting day, month, and year information
df['Day'] = df['Date'].dt.day
df['Month'] = df['Date'].dt.month
df['Year'] = df['Date'].dt.year

# Displaying the DataFrame with extracted information
print(df[['Date', 'Day', 'Month', 'Year']])
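
The dt accessor exposes more than day, month, and year. The sketch below, which uses a separate illustrative DataFrame named dates, pulls out the day name, the quarter, and the day of the year for the same three dates.

```python
import pandas as pd

# Same three dates as above, in a separate illustrative DataFrame
dates = pd.DataFrame({'Date': pd.to_datetime(['2023-01-01', '2023-02-15', '2023-03-20'])})

dates['DayName'] = dates['Date'].dt.day_name()    # English weekday name
dates['Quarter'] = dates['Date'].dt.quarter       # calendar quarter, 1 to 4
dates['DayOfYear'] = dates['Date'].dt.dayofyear   # 1 for January 1
print(dates)
```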


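In [None]:
# Floor division by 5 maps weekday numbers 0-4 to 0 and 5-6 to 1
6//5

In [None]:
# Weekday numbers returned by dt.weekday: 0 = Monday through 6 = Sunday
0,1,2,3,4,5,6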

In [None]:
# Extracting weekday and weekend information
df = pd.DataFrame({'Date': pd.date_range(start='2023-01-01', periods=5)})
df['Weekday'] = df['Date'].dt.weekday  # 0 = Monday, ..., 6 = Sunday
df['IsWeekend'] = df['Date'].dt.weekday // 5 == 1  # True for Saturday (5) and Sunday (6)
print(df[['Date', 'Weekday', 'IsWeekend']])

In [None]:
# Ensure the 'Date' column has the datetime data type before doing date arithmetic
df['Date'] = pd.to_datetime(df['Date'])

In [None]:
df.dtypes

In [None]:
df

In [None]:
df['Date'] + pd.Timedelta(days=2)

In [None]:
# Shifting dates forward or backward
df['Date'] = pd.to_datetime(df['Date'])
df['PreviousDate'] = df['Date'] - pd.Timedelta(days=1)
df['NextDate'] = df['Date'] + pd.Timedelta(days=1)
print(df[['Date', 'PreviousDate', 'NextDate']])

## __3. Timedelta in Pandas__
### __3.1 Creating a Timedelta__

- The Timedelta class in Pandas represents a duration or the difference between two dates or times. 
- It can be created by specifying the desired duration, such as days, hours, or minutes.

In [None]:
import pandas as pd

# Creating a timedelta of 3 days
delta = pd.Timedelta(days=3)
print(delta)
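
A Timedelta also arises naturally as the difference between two timestamps. The following is a minimal sketch with made-up timestamps; start, end, and from_string are illustrative names.

```python
import pandas as pd

# Subtracting two timestamps yields a Timedelta
start = pd.Timestamp('2023-01-01 08:00')
end = pd.Timestamp('2023-01-04 20:30')
elapsed = end - start
print(elapsed)

# Inspect the duration
print(elapsed.days)             # whole days in the duration
print(elapsed.total_seconds())  # full duration expressed in seconds

# A Timedelta can also be parsed from a string
from_string = pd.Timedelta('1 days 06:00:00')
print(from_string)
```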

### __3.2 Performing Arithmetic Operations__

Timedelta objects can be used to perform arithmetic operations on dates. For example, adding a timedelta to a date results in a new date. This is useful for calculating future or past dates based on a given time interval.

In [None]:
# Performing arithmetic operations with timedeltas
df['Date'] = pd.to_datetime(df['Date'])
df['FutureDate'] = df['Date'] + pd.Timedelta(weeks=2, days=3, hours=12)
print(df[['Date', 'FutureDate']])
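
The arithmetic also works in the other direction: subtracting one datetime column from another gives a column of timedeltas. The sketch below assumes a hypothetical orders DataFrame with made-up dates.

```python
import pandas as pd

# Hypothetical order data, used only to illustrate column-wise subtraction
orders = pd.DataFrame({
    'Ordered': pd.to_datetime(['2023-01-01', '2023-01-10']),
    'Delivered': pd.to_datetime(['2023-01-04', '2023-01-25']),
})

# Datetime minus datetime gives a timedelta column
orders['Elapsed'] = orders['Delivered'] - orders['Ordered']

# The dt accessor converts the timedeltas to whole days
orders['ElapsedDays'] = orders['Elapsed'].dt.days
print(orders)
```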

### __3.3 Resampling Time Series Data__

Time series data often comes with irregular time intervals. Resampling changes the frequency of the time series data, either by upsampling (moving to a higher frequency, such as weekly to daily) or downsampling (moving to a lower frequency, such as daily to monthly).

In [None]:
df

In [None]:
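# Resampling time series data: downsample to month-end totals.
# Only numeric columns are summed, because datetime columns cannot be added together.
# On pandas 2.2 and later, use 'ME' instead of 'M'.
df_resampled = df.resample('M', on='Date')[['Weekday', 'IsWeekend']].sum()
print(df_resampled)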
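
To make both directions of resampling concrete, here is a minimal sketch on a hypothetical daily series. It downsamples to weekly totals, then upsamples the weekly result back to daily rows with forward fill. The series values and names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical daily series: values 1..14 over two weeks starting Monday 2023-01-02
daily = pd.Series(range(1, 15),
                  index=pd.date_range('2023-01-02', periods=14, freq='D'))

# Downsampling: daily values summed into weekly bins that end on Sunday
weekly = daily.resample('W').sum()
print(weekly)

# Upsampling: weekly values spread onto daily rows, carrying the last value forward
back_to_daily = weekly.resample('D').ffill()
print(back_to_daily)
```

Note that upsampling does not recover the original daily values; it only fills the new rows according to the chosen rule.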

## __4. Categorical Data Handling__

### __4.1 Creating a Categorical Variable__
Pandas provides the Categorical class to create a categorical variable. Categorical variables are useful when dealing with data that can be divided into distinct, non-numeric categories.

In [None]:
import pandas as pd

# Creating a categorical variable
categories = ['Low', 'Medium', 'High']
values = ['Low', 'Medium', 'High', 'Low', 'High']
cat_variable = pd.Categorical(values, categories=categories, ordered=True)
print(cat_variable)
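
Because the variable is created with ordered=True, it supports order-aware operations. The sketch below rebuilds the same categorical under the illustrative name ratings and shows its integer codes, its minimum and maximum, and a comparison against a category.

```python
import pandas as pd

ratings = pd.Categorical(['Low', 'Medium', 'High', 'Low', 'High'],
                         categories=['Low', 'Medium', 'High'], ordered=True)

# Integer position of each value within the declared category order
print(ratings.codes)

# min and max follow the declared order, not alphabetical order
print(ratings.min(), ratings.max())

# Element-wise comparison against a category
above_low = ratings > 'Low'
print(above_low)
```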

### __4.2 Counting Occurrences of Each Category__
The value_counts() method is used to count the occurrences of each category in a categorical column of a DataFrame.

In [None]:
# Build a sample DataFrame with a 'Category' column
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'A']})

# Counting occurrences of each category
category_counts = df['Category'].value_counts()
print(category_counts)
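
When proportions are more useful than raw counts, value_counts accepts normalize=True. A minimal sketch on the same six values, using the illustrative names letters and shares:

```python
import pandas as pd

letters = pd.Series(['A', 'B', 'A', 'C', 'B', 'A'], name='Category')

# Relative frequency of each category; the shares sum to 1
shares = letters.value_counts(normalize=True)
print(shares)
```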


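In [None]:
# Load the housing dataset (assumes HousePrices.csv is in the working directory)
df = pd.read_csv("HousePrices.csv")
df.head()

In [None]:
# One-hot encode the 'city' column of the housing data
pd.get_dummies(df['city'], prefix='city')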

### __4.3 Creating Dummy Variables__

When working with machine learning models or statistical analyses, creating dummy variables is often necessary to represent categorical data numerically. The get_dummies function accomplishes this by creating one binary column per category.

In [None]:
# Build a sample DataFrame with a 'Category' column
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'A']})

# Creating dummy variables for categorical data
dummy_variables = pd.get_dummies(df['Category'], prefix='Category')
print(dummy_variables)
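
For linear models, one dummy column is often dropped so that the remaining columns are not perfectly collinear. The sketch below shows that variant with drop_first=True and requests integer output with dtype=int; the names sample and dummies are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'A']})

# Drop the first category ('A'); a row of all zeros then means 'A'
dummies = pd.get_dummies(sample['Category'], prefix='Category',
                         drop_first=True, dtype=int)
print(dummies)
```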


### __4.4 Label Encoding__

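Another way to handle categorical data is through label encoding, where each category is assigned a unique numerical label. This is useful in scenarios where ordinal relationships exist between categories. Note that astype('category') orders categories by sorted value by default, so the codes reflect a meaningful ranking only when the category order is specified explicitly.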

In [None]:
# Build a sample DataFrame with a 'Category' column
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'C', 'B', 'A']})

# Label Encoding
df['Category_LabelEncoded'] = df['Category'].astype('category').cat.codes
print(df[['Category', 'Category_LabelEncoded']])
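
When the categories have a natural order, the codes match that order only if the order is declared. The following sketch, with made-up values and illustrative names, compares the default encoding with one based on an explicit CategoricalDtype.

```python
import pandas as pd

sizes = pd.Series(['Low', 'High', 'Medium', 'Low'])

# Default: categories are sorted alphabetically (High=0, Low=1, Medium=2)
default_codes = sizes.astype('category').cat.codes
print(list(default_codes))

# Explicit order: Low=0, Medium=1, High=2
size_dtype = pd.CategoricalDtype(categories=['Low', 'Medium', 'High'], ordered=True)
ordered_codes = sizes.astype(size_dtype).cat.codes
print(list(ordered_codes))
```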

# __Assisted Practice__

## __Problem Statement:__
Analyze the housing dataset with a focus on handling date and categorical data to gain insights into house sales over time and the influence of house characteristics on its price.

## __Steps to Perform:__
- Convert the __YearBuilt__ and __YearRenovated__ columns to datetime format (if not converted)
- Extract useful components from the date like the year, month, or day
- Calculate the time difference between the year the house was built and the year it was renovated
- Perform necessary arithmetic operations
- Count the number of occurrences of each category in categorical features
- Create dummy variables for categorical variables
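
The steps above can be sketched on a small stand-in DataFrame. The rows below are made up for illustration and are not taken from HousePrices.csv; the new column names (BuiltDate, RenovatedDate, YearsToRenovation) are also assumptions. Replace houses with the loaded dataset when working through the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the housing data; values are invented for illustration
houses = pd.DataFrame({
    'YearBuilt': [1990, 2005, 1978],
    'YearRenovated': [2010, 2005, 2001],
    'city': ['Seattle', 'Kent', 'Seattle'],
})

# Convert the year columns to datetime (January 1 of each year)
houses['BuiltDate'] = pd.to_datetime(houses['YearBuilt'].astype(str), format='%Y')
houses['RenovatedDate'] = pd.to_datetime(houses['YearRenovated'].astype(str), format='%Y')

# Extract the year component and compute the gap between building and renovation
houses['YearsToRenovation'] = houses['RenovatedDate'].dt.year - houses['BuiltDate'].dt.year

# Count the occurrences of each category and create dummy variables
city_counts = houses['city'].value_counts()
city_dummies = pd.get_dummies(houses['city'], prefix='city')

print(houses[['BuiltDate', 'RenovatedDate', 'YearsToRenovation']])
print(city_counts)
print(city_dummies)
```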
