# Introduction to Date Filtering
In this lesson, we'll explore how to filter time series financial data by date range using the Pandas library. Filtering data by specific date ranges is vital in financial analysis, allowing us to focus on periods of interest, such as a particular year or month. This skill is essential for traders and analysts who need to examine stock performance during specific periods, such as economic crises or fiscal quarters.

## Converting Date Columns to Datetime Objects
The first step in filtering data by date is to ensure that the date column is in a suitable format. Let's start by loading the Tesla ($TSLA) stock dataset and converting the "Date" column to datetime objects using pd.to_datetime().

```Python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Display initial rows to inspect the format
print(tesla_df.head())
```
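After the conversion, the 'Date' column holds datetime values rather than strings, which time series operations depend on. `head()` prints the dates the same way in either case. To see the column's actual type, use `tesla_df.dtypes`, which reports a `datetime64` dtype after conversion: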

```Plain text
        Date      Open      High       Low     Close  Adj Close     Volume
0 2010-06-29  1.266667  1.666667  1.169333  1.592667   1.592667  281494500
1 2010-06-30  1.719333  2.028000  1.553333  1.588667   1.588667  257806500
2 2010-07-01  1.666667  1.728000  1.351333  1.464000   1.464000  123282000
3 2010-07-02  1.533333  1.540000  1.247333  1.280000   1.280000   77097000
4 2010-07-06  1.333333  1.333333  1.055333  1.074000   1.074000  103003500
```
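The sketch below shows the conversion in isolation, without downloading the dataset. It uses a few hand-written rows with hypothetical values that are not taken from the TSLA dataset. It checks the column type before and after `pd.to_datetime()`. It then reads calendar fields through the `.dt` accessor, which only works on datetime columns.

```python
import pandas as pd

# Hypothetical sample rows standing in for the TSLA dataset (illustrative values only)
sample = pd.DataFrame({
    "Date": ["2020-01-02", "2020-01-03", "2020-01-06"],
    "Close": [28.68, 29.53, 30.10],
})

# Before conversion the dates are plain strings, not datetimes
print(pd.api.types.is_datetime64_any_dtype(sample["Date"]))  # False

# An explicit format documents the expected layout and raises on rows that don't match
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")
print(pd.api.types.is_datetime64_any_dtype(sample["Date"]))  # True

# Datetime columns expose calendar fields through the .dt accessor
sample["Year"] = sample["Date"].dt.year
sample["Weekday"] = sample["Date"].dt.day_name()
print(sample)
```

The `format=` argument is optional. Without it, pandas infers the layout from the data.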

## Setting the Date Column as Index
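Setting the date column as the DataFrame's index and sorting it simplifies slicing and filtering by date. With a sorted (monotonic) index, pandas can locate the start and end of a date range by binary search instead of scanning every row. Recent pandas versions may also raise a `KeyError` when you slice an unsorted `DatetimeIndex` with dates that are not present in it.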

Here’s how to set the "Date" column as the index and sort it:

```Python
# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index
tesla_df.sort_index(inplace=True)
print(tesla_df.head())
```
The output of the above code will be:

```Plain text
                Open      High       Low     Close  Adj Close     Volume
Date                                                                    
2010-06-29  1.266667  1.666667  1.169333  1.592667   1.592667  281494500
2010-06-30  1.719333  2.028000  1.553333  1.588667   1.588667  257806500
2010-07-01  1.666667  1.728000  1.351333  1.464000   1.464000  123282000
2010-07-02  1.533333  1.540000  1.247333  1.280000   1.280000   77097000
2010-07-06  1.333333  1.333333  1.055333  1.074000   1.074000  103003500
```
This output confirms that the Date column is now the index of the DataFrame and that the rows are in chronological order. That gives an accurate timeline for the rest of the analysis.
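You can check whether an index is sorted through its `is_monotonic_increasing` attribute. The sketch below builds a small, deliberately out-of-order frame with hypothetical values. It sorts the frame and then slices it by date.

```python
import pandas as pd

# Hypothetical rows in the wrong order (illustrative values only)
df = pd.DataFrame(
    {"Close": [30.10, 28.68, 29.53]},
    index=pd.to_datetime(["2020-01-06", "2020-01-02", "2020-01-03"]),
)
df.index.name = "Date"

print(df.index.is_monotonic_increasing)  # False

# sort_index() returns a sorted copy; inplace=True would modify df directly instead
df = df.sort_index()
print(df.index.is_monotonic_increasing)  # True

# With a sorted index, a date-string slice returns the rows in chronological order
first_days = df.loc["2020-01-02":"2020-01-03"]
print(first_days)
```

The sketch reassigns the result of `sort_index()` rather than passing `inplace=True`. Either way, the DataFrame ends up sorted.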

## Filtering Data by Specific Date Range
With the date column converted to datetime objects, set as the index, and sorted, we can now filter the DataFrame by a specific date range. This technique is particularly useful when you need to analyze data for a specific year, month, or any custom date range.

Let’s filter the dataset for the year 2020:

```Python
# Filtering the dataset for the year 2020
tesla_2020 = tesla_df.loc['2020']
print(tesla_2020.head())
```
In this code, `loc` is Pandas' label-based indexer: it selects rows and columns by their labels, which in this case are dates. A partial date string such as `'2020'` matches every index entry within that year, so `tesla_df.loc['2020']` extracts all rows from 2020.

The output of the above code will be:

```Plain text
                 Open       High        Low      Close  Adj Close     Volume
Date                                                                        
2020-01-02  28.299999  28.713333  28.114000  28.684000  28.684000  142981500
2020-01-03  29.366667  30.266666  29.128000  29.534000  29.534000  266677500
2020-01-06  29.364668  30.104000  29.333332  30.102667  30.102667  151995000
2020-01-07  30.760000  31.441999  30.224001  31.270666  31.270666  268231500
2020-01-08  31.580000  33.232666  31.215334  32.809334  32.809334  467164500
```
This output shows the first rows of the 2020 subset, beginning with January 2, the first trading day of the year. `head()` prints only five rows. `tesla_2020` itself holds every 2020 row in the dataset, with all of its columns.
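The sketch below makes the partial-string lookup concrete by writing the same year selection as an explicit boolean mask. It assumes a synthetic price frame on a `pd.bdate_range` index. That index includes every weekday and ignores exchange holidays, so it is not a real trading calendar.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day index spanning the turn of two years (illustrative data only)
dates = pd.bdate_range("2019-12-30", "2021-01-05")
prices = pd.DataFrame({"Close": np.arange(len(dates), dtype=float)}, index=dates)

# A year string selects every row whose timestamp falls in that year
rows_2020 = prices.loc["2020"]
print(rows_2020.index.min(), rows_2020.index.max())

# The same selection written as an explicit boolean mask on the index
mask = prices.index.year == 2020
print(rows_2020.equals(prices[mask]))  # True
```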

## Other Ways to Filter by Date
We can also filter for more specific date ranges, such as a particular month or quarter:

```Python
# Filtering data for January 2020
tesla_jan_2020 = tesla_df.loc['2020-01']
print(tesla_jan_2020.head())
```
The output of the above code will be:

```Plain text
                 Open       High        Low      Close  Adj Close     Volume
Date                                                                        
2020-01-02  28.299999  28.713333  28.114000  28.684000  28.684000  142981500
2020-01-03  29.366667  30.266666  29.128000  29.534000  29.534000  266677500
2020-01-06  29.364668  30.104000  29.333332  30.102667  30.102667  151995000
2020-01-07  30.760000  31.441999  30.224001  31.270666  31.270666  268231500
2020-01-08  31.580000  33.232666  31.215334  32.809334  32.809334  467164500
```
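Because `loc` accepts a column label after the row label, you can combine a date filter with a column selection in one step. The sketch below assumes a synthetic frame with hypothetical `Open` and `Close` values on a business-day index.

```python
import numpy as np
import pandas as pd

# Hypothetical price frame on a business-day index (illustrative values only)
dates = pd.bdate_range("2020-01-01", "2020-03-31")
prices = pd.DataFrame(
    {
        "Open": np.linspace(28.0, 34.0, len(dates)),
        "Close": np.linspace(28.5, 35.0, len(dates)),
    },
    index=dates,
)

# Rows for January 2020, restricted to the Close column (returns a Series)
jan_close = prices.loc["2020-01", "Close"]
print(jan_close.head())

# Rows for January 2020 with a list of columns (returns a DataFrame)
jan_open_close = prices.loc["2020-01", ["Open", "Close"]]
print(jan_open_close.head())
```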
To filter a quarter, the code will look like this:

```Python
# Filtering data from January 2020 to March 2020 (Q1)
# Unlike standard Python slicing, .loc label slicing includes the end label, so all of March 2020 is kept
tesla_q1_2020 = tesla_df.loc['2020-01':'2020-03']
print(tesla_q1_2020.tail())
```
The output of the above code will be:

```Plain text
                 Open       High        Low      Close  Adj Close     Volume
Date                                                                        
2020-03-25  36.349998  37.133331  34.074001  35.950001  35.950001  318340500
2020-03-26  36.492668  37.333332  34.150002  35.210667  35.210667  260710500
2020-03-27  33.666668  35.053333  32.935333  34.290668  34.290668  215661000
2020-03-30  34.017334  34.443333  32.748669  33.475334  33.475334  179971500
2020-03-31  33.416668  36.197334  33.133331  34.933334  34.933334  266572500
```
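The sketch below checks that the inclusive month slice matches two other ways of describing Q1. The first matches each date's quarter with `to_period("Q")`. The second uses a half-open comparison (`>=` start, `<` next quarter) that mirrors ordinary Python slicing. The data is a synthetic, hypothetical business-day frame.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day index covering 2020 (illustrative values only)
dates = pd.bdate_range("2020-01-01", "2020-12-31")
prices = pd.DataFrame({"Close": np.arange(len(dates), dtype=float)}, index=dates)

# Label slicing with .loc includes the end label: all of March is kept
q1_slice = prices.loc["2020-01":"2020-03"]

# Alternative 1: keep rows whose timestamp falls in the 2020Q1 period
in_q1 = prices.index.to_period("Q") == pd.Period("2020Q1", freq="Q")
q1_mask = prices[in_q1]

# Alternative 2: half-open range, inclusive start and exclusive end
half_open = prices[(prices.index >= "2020-01-01") & (prices.index < "2020-04-01")]

print(q1_slice.index.max())
print(q1_slice.equals(q1_mask))    # True
print(q1_slice.equals(half_open))  # True
```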
## Plotting the Filtered Data
After filtering the data, visualizing it can help identify patterns and trends over the specified date range. We will use Matplotlib, a popular plotting library in Python, to create a time series plot.

Let's visualize the Q1 2020 closing prices for Tesla stock:

```Python
import matplotlib.pyplot as plt

# Plotting the filtered data for Q1 2020
tesla_q1_2020 = tesla_df.loc['2020-01':'2020-03']

plt.figure(figsize=(10, 5))
plt.plot(tesla_q1_2020.index, tesla_q1_2020['Close'], marker='o', linestyle='-')
plt.title('Tesla Stock Prices in Q1 2020')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.grid(True)
plt.show()
```

By visualizing the data, you can gain insights into stock performance and identify trends over the specified periods, thereby enhancing your financial analysis capabilities.
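The plot above shows Q1 only. To compare the January subset with the full quarter, one option is two side-by-side panels. The sketch below assumes synthetic, hypothetical closing prices. It uses Matplotlib's non-interactive `Agg` backend and saves the figure to an in-memory PNG instead of calling `plt.show()`, so it runs without a display. In a notebook, `plt.show()` works as in the lesson.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical closing prices on a business-day index (illustrative values only)
dates = pd.bdate_range("2020-01-01", "2020-03-31")
prices = pd.DataFrame({"Close": np.linspace(28.7, 34.9, len(dates))}, index=dates)

jan = prices.loc["2020-01"]
q1 = prices.loc["2020-01":"2020-03"]

# One panel per date range, sharing the price axis for easy comparison
fig, (ax_jan, ax_q1) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
ax_jan.plot(jan.index, jan["Close"], marker="o", linestyle="-")
ax_jan.set_title("Tesla Closing Prices, January 2020")
ax_q1.plot(q1.index, q1["Close"], marker="o", linestyle="-")
ax_q1.set_title("Tesla Closing Prices, Q1 2020")
for ax in (ax_jan, ax_q1):
    ax.set_xlabel("Date")
    ax.grid(True)
ax_jan.set_ylabel("Closing Price")
fig.autofmt_xdate()  # rotate date labels so they don't overlap

# Write the figure to an in-memory PNG instead of calling plt.show()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```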

## Lesson Summary
In this lesson, you learned how to filter time series financial data by date ranges using the Pandas library. We covered converting date columns to datetime objects, setting the date column as the index, sorting the DataFrame chronologically, and filtering data by specific date ranges. These techniques are essential for focusing on specific periods relevant to your financial analysis or trading strategy.

Next, we will have some practice exercises to reinforce these concepts, making sure you are comfortable with filtering financial data by date. This practice will enhance your data manipulation skills, which are crucial for efficient time series analysis in trading and financial contexts. Let's get started!

## Filter and Display Tesla Stock Data for Q1 2020

```py
import pandas as pd
import datasets
import datasets.utils.logging as datasets_logging

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices', split='train')
tesla_df = pd.DataFrame(tesla_data)

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index
tesla_df.sort_index(inplace=True)

# Filtering data for January 2020
tesla_jan_2020 = tesla_df.loc['2020-01']
print("\nTesla stock data for January 2020:", tesla_jan_2020.head())
```

## Identifying and Fixing Date Filter Issues

```py
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index
tesla_df.sort_index(inplace=True)

# Filtering the dataset for the years 2020 to 2022
tesla_2020_2022 = tesla_df.loc['2020', '2022']
print(tesla_2020_2022.head())
```

## Filter Tesla Stock Data for Q4 2019

```py
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# TODO: Filter data for TSLA prices only in Q4 of 2019 (Oct 2019 - Dec 2019)
# Print the first few rows
```

## Filter Tesla Stock Data by Date Range

Below is one possible solution that completes each TODO step:

```python
import pandas as pd
import datasets

# TODO: Load the Tesla (TSLA) dataset `codesignal/tsla-historic-prices` and create a DataFrame
data = datasets.load_dataset('codesignal/tsla-historic-prices')
df = pd.DataFrame(data['train'])  # the dataset's rows are in the 'train' split

# TODO: Convert the 'Date' column to datetime type to handle date calculations
df['Date'] = pd.to_datetime(df['Date'])

# TODO: Set the converted 'Date' column as the DataFrame index to facilitate date-based querying
df.set_index('Date', inplace=True)

# TODO: Sort the DataFrame based on the date index to ensure it's ordered chronologically
df.sort_index(inplace=True)

# TODO: Filter the DataFrame to get data from January 2020 to March 2020 (Q1) and print the first few rows
q1_2020 = df.loc['2020-01-01':'2020-03-31']
print(q1_2020.head())
```

### Explanation:

1. **Loading the Dataset**: 
   - The dataset is loaded using the `datasets.load_dataset` function, and the data is converted into a DataFrame.

2. **Converting the 'Date' Column**:
   - The 'Date' column is converted to `datetime` type using `pd.to_datetime` to facilitate date-based operations.

3. **Setting the 'Date' Column as the Index**:
   - The 'Date' column is set as the DataFrame index, which allows for easy date-based filtering.

4. **Sorting the DataFrame**:
   - The DataFrame is sorted by the date index to ensure that the data is ordered chronologically.

5. **Filtering for Q1 2020**:
   - The DataFrame is filtered to include only the data from January 1, 2020, to March 31, 2020, and the first few rows are printed using `head()`.

With the `datasets` library installed and access to the dataset, running this code prints the first rows of Tesla's stock data for the first quarter of 2020.

To adapt the January 2020 exercise code so that it filters and displays Tesla stock data from January to March 2020, modify it as follows:

1. Adjust the filter to include the entire first quarter of 2020 (January to March).
2. Update the `loc` method to slice the DataFrame based on the date range.

Here's the updated code:

```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices', split='train')
tesla_df = pd.DataFrame(tesla_data)

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index
tesla_df.sort_index(inplace=True)

# Filtering data from January 2020 to March 2020
tesla_q1_2020 = tesla_df.loc['2020-01-01':'2020-03-31']
print("\nTesla stock data from January to March 2020:")
print(tesla_q1_2020.head())  # Display the first few rows of the filtered data
```

### Key Changes Made:
- **Date Filtering**: Changed the filter to `tesla_df.loc['2020-01-01':'2020-03-31']` to include the full date range from January 1, 2020, to March 31, 2020.
- **Output**: Adjusted the print statement to reflect the new date range.

### Summary of the Code:
- Load Tesla's historical stock prices.
- Convert the 'Date' column to datetime and set it as the index.
- Filter the data to include entries from January 1, 2020, to March 31, 2020.
- Print the filtered stock data.
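The lesson writes the Q1 bounds as month strings (`'2020-01':'2020-03'`), while the solutions above use day strings (`'2020-01-01':'2020-03-31'`). On a date-only index, the two select the same rows. The sketch below checks this on a hypothetical business-day index. It also shows one difference between string and `Timestamp` bounds when timestamps include times of day, using two made-up intraday rows. A day string as the end bound covers the whole day, but an exact `pd.Timestamp` bound stops at midnight.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day index (illustrative values only)
dates = pd.bdate_range("2019-12-02", "2020-04-30")
prices = pd.DataFrame({"Close": np.arange(len(dates), dtype=float)}, index=dates)

by_month = prices.loc["2020-01":"2020-03"]
by_day = prices.loc["2020-01-01":"2020-03-31"]
print(by_month.equals(by_day))  # True

# Hypothetical intraday rows on the last day of the quarter
intraday = pd.DataFrame(
    {"Close": [1.0, 2.0]},
    index=pd.to_datetime(["2020-03-31 09:30", "2020-03-31 16:00"]),
)
print(len(intraday.loc[:"2020-03-31"]))                # 2: the day string covers the whole day
print(len(intraday.loc[:pd.Timestamp("2020-03-31")]))  # 0: the Timestamp means midnight
```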

With this code, you should see Tesla stock data for the entire first quarter of 2020.




Imagine you are a data analyst who needs Tesla's stock data for the years 2020 to 2022 to analyze stock trends during that period. The given code is meant to perform this filter, but it contains a bug. Find what's causing the issue and correct it.

Good luck!

```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index
tesla_df.sort_index(inplace=True)

# Filtering the dataset for the years 2020 to 2022
# A colon slices rows; a comma (loc['2020', '2022']) would look up '2022' as a column label
tesla_2020_2022 = tesla_df.loc['2020':'2022']
print(tesla_2020_2022.head())
```
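The exercise's original code used `tesla_df.loc['2020', '2022']`. The sketch below shows why that fails, using a tiny hypothetical frame with one row each in 2019, 2020, and 2022. With a comma, `.loc` reads the pair as (row label, column label). Since there is no column named `'2022'`, pandas raises a `KeyError`. A colon turns the pair into a row slice.

```python
import pandas as pd

# Hypothetical three-row frame with one row in each of 2019, 2020 and 2022
prices = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2019-06-03", "2020-06-01", "2022-06-01"]),
)

# Comma: '2020' is treated as a row label and '2022' as a column label
try:
    prices.loc["2020", "2022"]
    comma_error = None
except KeyError as exc:
    comma_error = exc
print(type(comma_error).__name__)  # KeyError

# Colon: a row slice from the start of 2020 through the end of 2022
fixed = prices.loc["2020":"2022"]
print(fixed)
```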

Great job, Space Voyager! Now, let's make it a bit more challenging.

Your task is to fill in the missing pieces of code to filter the Tesla stock data for Q4 of 2019. Follow the TODO comments and make sure you succeed.

May the stars light your path!

```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index so that date-range slicing is reliable
tesla_df.sort_index(inplace=True)

# TODO: Filter data for TSLA prices only in Q4 of 2019 (Oct 2019 - Dec 2019)
# Print the first few rows
tesla_q4_2019 = tesla_df.loc['2019-10-01':'2019-12-31']
print("\nTesla stock data from October to December 2019:")
print(tesla_q4_2019.head())
```

Let's get hands-on, Space Explorer! Fill in the missing pieces of code to filter data for the years 2018, 2019, and 2020.

May the stars guide your way!

```python
import pandas as pd
import datasets

# Load TSLA dataset
tesla_data = datasets.load_dataset('codesignal/tsla-historic-prices')
tesla_df = pd.DataFrame(tesla_data['train'])

# Convert the Date column to datetime type
tesla_df['Date'] = pd.to_datetime(tesla_df['Date'])

# Set the Date column as the index
tesla_df.set_index('Date', inplace=True)

# Sort the DataFrame based on the index
tesla_df.sort_index(inplace=True)

# TODO: Filter the dataset for the years 2018, 2019, 2020
tesla_2018_2020 = tesla_df.loc['2018':'2020']
print(tesla_2018_2020.head())
```


