# Lesson 2: Working with Dates and Times

## Lesson Introduction

Working with dates and times is crucial in data analysis. Imagine analyzing sales data over time to understand seasonal trends. To make sense of such data, you need to handle dates and times accurately.

## Today's goals:

- Convert columns with date info to datetime format, even when the values use different formats.
- Extract specific components like the year from datetime data.
- Perform basic datetime operations such as finding time differences and obtaining today's date.

By the end, you'll be comfortable manipulating dates and times in Pandas. Let's start!

### Converting Columns to Datetime

Date info often comes as text, which isn't very useful for analysis. Converting this text to datetime format lets us use powerful features in Pandas.

The `pd.to_datetime()` function parses date strings written in many common formats. Here's an example:

```python
import pandas as pd

# Sample data
data = {
    'order_date': ['2023-10-01', '10/02/2023', 'October 3 2023', '2023.10.04']
}
sales = pd.DataFrame(data)

# Convert 'order_date' to datetime
sales['order_date'] = pd.to_datetime(sales['order_date'], format='mixed')

print(sales)
```

**Output:**

```
  order_date
0 2023-10-01
1 2023-10-02
2 2023-10-03
3 2023-10-04
```

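This example converts several date formats into datetime values, which makes date operations easier. Passing `format='mixed'` (available in pandas 2.0 and later) tells pandas to infer the format of each element individually. Ambiguous strings such as `'10/02/2023'` are read month-first by default, so that value becomes October 2.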
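When every value is supposed to share one format, it can be safer to state that format explicitly and decide what happens to entries that do not match. The following is a minimal sketch on a made-up Series (`raw`, `parsed`, and `n_bad` are illustrative names); `errors='coerce'` turns unparseable entries into `NaT` instead of raising an error:

```python
import pandas as pd

# Hypothetical column: one consistent format plus one bad entry
raw = pd.Series(['2023-10-01', '2023-10-02', 'not a date'])

# With a single known format, state it explicitly; errors='coerce'
# turns anything that does not match into NaT instead of raising
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')

# Count how many entries could not be parsed
n_bad = int(parsed.isna().sum())

print(parsed)
print("Unparseable entries:", n_bad)
```

Counting the `NaT` values afterwards is a quick way to check how much of the column failed to parse before continuing with the analysis.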

### Extracting Components from Datetime

With a column in datetime format, we can extract components like the year, month, or day using the `.dt` accessor. Here’s how to extract the year, month, and day:

```python
# Extract year, month, and day from datetime
sales['year'] = sales['order_date'].dt.year
sales['month'] = sales['order_date'].dt.month
sales['day'] = sales['order_date'].dt.day

print(sales)
```

**Output:**

```
  order_date  year  month  day
0 2023-10-01  2023     10    1
1 2023-10-02  2023     10    2
2 2023-10-03  2023     10    3
3 2023-10-04  2023     10    4
```

This code creates new columns for the year, month, and day, which can be useful for time-based analyses like finding monthly or seasonal trends.
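The `.dt` accessor exposes more than the year, month, and day. As a small sketch on made-up dates (the `parts` DataFrame and its column names are illustrative), the following pulls out the quarter, the weekday name, and the month name:

```python
import pandas as pd

# Illustrative dates, already in ISO format
dates = pd.Series(pd.to_datetime(['2023-10-01', '2023-10-02', '2023-12-25']))

parts = pd.DataFrame({
    'date': dates,
    'quarter': dates.dt.quarter,          # 1 through 4
    'weekday': dates.dt.day_name(),       # e.g. 'Sunday'
    'month_name': dates.dt.month_name(),  # e.g. 'October'
})

print(parts)
```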

### Basic Datetime Operations

Pandas also supports arithmetic on datetime values. For example, you can obtain today's date and find the time elapsed since each order:

```python
# Today's date and time as a pandas Timestamp
today = pd.to_datetime('today')

# Time elapsed since each order (a Timedelta column)
sales['time_since_order'] = today - sales['order_date']

print(sales)
print("Today's date:", today)
```

**Output** (your values will differ depending on when you run the code):

```
  order_date  year  month  day        time_since_order
0 2023-10-01  2023     10    1  4 days 10:23:30.456789
1 2023-10-02  2023     10    2  3 days 10:23:30.456789
2 2023-10-03  2023     10    3  2 days 10:23:30.456789
3 2023-10-04  2023     10    4  1 days 10:23:30.456789
Today's date: 2023-10-05 10:23:30.456789
```

This code retrieves the current date and time, then calculates the time elapsed between each order date and now. Note that `pd.to_datetime('today')` includes the time of day, not just the date.
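Because the current time changes on every run, a fixed reference timestamp makes results easier to check. The sketch below assumes a hypothetical 30-day payment window and illustrative column names; it shows whole days, fractional hours, and date arithmetic with `pd.Timedelta`:

```python
import pandas as pd

orders = pd.DataFrame({
    'order_date': pd.to_datetime(['2023-10-01', '2023-10-04'])
})

# Fixed reference point instead of the current time (assumption for reproducibility)
reference = pd.Timestamp('2023-10-05 12:00')
elapsed = reference - orders['order_date']

orders['days_elapsed'] = elapsed.dt.days                      # whole days only
orders['hours_elapsed'] = elapsed.dt.total_seconds() / 3600   # includes the partial day

# Adding a Timedelta shifts every date by the same amount
orders['due_date'] = orders['order_date'] + pd.Timedelta(days=30)

print(orders)
```

`.dt.days` drops the partial day, while `.dt.total_seconds()` keeps it, so pick whichever matches the question being asked.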

## Lesson Summary

Today, we learned how to:

- Convert date columns to datetime format using `pd.to_datetime()`, even when the values use multiple formats.
- Extract components like the year using the `.dt` accessor.
- Perform basic datetime operations such as finding time differences and obtaining today's date.

Understanding datetime manipulation is essential for efficient data analysis, enabling easy time-based computations.

Now it's time to apply your new skills. In the practice session, you’ll convert columns, extract date components, and explore more datetime features. Dive into the hands-on practice to reinforce today's knowledge!

## Analyzing Monthly Sales Trends

Space Wanderer, we must analyze sales trends over time! In the given code, we convert various date formats to datetime, extract the month component, and add it as a new column. This will help us understand which months have the most sales.

Run the code to see how it works!

```py
import pandas as pd

# Sample sales data with various date formats
data = {
    'order_date': ['2023-12-25', 'Jan 15, 2024', '2024-01-20', 'February 1, 2024']
}
sales = pd.DataFrame(data)

# Convert 'order_date' to datetime
sales['order_date'] = pd.to_datetime(sales['order_date'], format='mixed')

# Extract month from datetime and create a new column 'month'
sales['month'] = sales['order_date'].dt.month

print(sales)

```
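Once the month is available, counting sales per month is one way to see which months are busiest. This is a sketch with a made-up `amount` column; it groups by a year-month key built with `.dt.strftime('%Y-%m')` so that the same month in different years stays separate:

```python
import pandas as pd

sales = pd.DataFrame({
    'order_date': pd.to_datetime(['2023-12-25', '2024-01-15', '2024-01-20', '2024-02-01']),
    'amount': [120, 80, 200, 50],  # hypothetical values for illustration
})

# Year-month key such as '2024-01'
sales['year_month'] = sales['order_date'].dt.strftime('%Y-%m')

# Number of orders and total amount per year-month
monthly = sales.groupby('year_month')['amount'].agg(['count', 'sum'])

print(monthly)
```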

## Extract Year from Sale Date

Great job, Space Explorer!

Now, let’s enhance our date analysis. Modify the code to also extract the year from the `sale_date` column and include it in the final DataFrame.

Let's code!

```py
import pandas as pd
from datetime import datetime

# Sample data
data = {
    'sale_date': ['01-11-2023', 'November 2, 2023', '03/11/2023', '2023/11/04'],
    'product': ['A', 'B', 'C', 'D'],
    'amount': [100, 150, 200, 250]
}
sales = pd.DataFrame(data)

# Convert 'sale_date' to datetime format
sales['sale_date'] = pd.to_datetime(sales['sale_date'], format='mixed')

# Extract month and day for analysis
sales['month'] = sales['sale_date'].dt.month
sales['day'] = sales['sale_date'].dt.day

print(sales[['sale_date', 'product', 'amount', 'month', 'day']])

```

To also extract the year from `sale_date` and include it in the final DataFrame, add one line that reads the `.dt.year` attribute. Here's the modified code:

```python
import pandas as pd
from datetime import datetime

# Sample data
data = {
    'sale_date': ['01-11-2023', 'November 2, 2023', '03/11/2023', '2023/11/04'],
    'product': ['A', 'B', 'C', 'D'],
    'amount': [100, 150, 200, 250]
}
sales = pd.DataFrame(data)

# Convert 'sale_date' to datetime format
sales['sale_date'] = pd.to_datetime(sales['sale_date'], format='mixed')

# Extract year, month, and day for analysis
sales['year'] = sales['sale_date'].dt.year
sales['month'] = sales['sale_date'].dt.month
sales['day'] = sales['sale_date'].dt.day

# Print the DataFrame with the new 'year' column included
print(sales[['sale_date', 'product', 'amount', 'year', 'month', 'day']])
```

This code will output the DataFrame with columns for the sale date, product, amount, year, month, and day, giving you a comprehensive view of each sale's timing details. This is particularly useful for any time series analysis or reporting where the year, month, and day are relevant metrics.
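One caveat about this sample data: strings such as `'01-11-2023'` and `'03/11/2023'` are ambiguous, and pandas reads them month-first by default, so they parse as January 11 and March 11 rather than dates in November. If the values are meant to be day-first, `pd.to_datetime()` accepts `dayfirst=True`. A minimal sketch of the difference on a single string:

```python
import pandas as pd

ambiguous = '03/11/2023'

# Default behavior: month first, so this is March 11
month_first = pd.to_datetime(ambiguous)

# dayfirst=True: day first, so this is November 3
day_first = pd.to_datetime(ambiguous, dayfirst=True)

print(month_first.date(), day_first.date())
```

Checking the parsed `month` column against what the data is supposed to contain is a simple way to catch this kind of mix-up.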

## Analyzing Sales Data Over Time

Great job, data voyager! Now, let's enhance your skills a bit more.

Fill in the missing pieces to convert the `order_date` column to datetime format and calculate the days since each order.

May the cosmos guide you!

```py
import pandas as pd
from datetime import datetime

# Sales data with date information
data = {'order_date': ['2023/01/05', '2023/02/15', '2023/03/10']}
sales = pd.DataFrame(data)

# TODO: Convert 'order_date' to datetime format

# Extract month from 'order_date' 
sales['month'] = sales['order_date'].dt.month
# TODO: Calculate days since each order date and put it into a separate days_since_order column

print(sales)

```

To complete your sales data analysis script, you need to convert the `order_date` column to a datetime format and calculate the days since each order. Here's how you can accomplish these tasks:

1. Use the `pd.to_datetime()` function to convert the `order_date` column.
2. Calculate the days since each order by subtracting the order date from today's date.

Here's the modified Python script with the missing pieces filled in:

```python
import pandas as pd
from datetime import datetime

# Sales data with date information
data = {'order_date': ['2023/01/05', '2023/02/15', '2023/03/10']}
sales = pd.DataFrame(data)

# Convert 'order_date' to datetime format
sales['order_date'] = pd.to_datetime(sales['order_date'])

# Extract month from 'order_date' 
sales['month'] = sales['order_date'].dt.month

# Calculate days since each order date and put it into a separate 'days_since_order' column
today = pd.to_datetime('today')
sales['days_since_order'] = (today - sales['order_date']).dt.days

print(sales)
```

This script will now output a DataFrame that includes the order date, the month of each order, and the number of days since each order was placed. The `days_since_order` column is calculated by subtracting the `order_date` from today's date (captured in the `today` variable), and then extracting the number of days from the resulting timedelta using `.dt.days`. This provides a clear view of how long it has been since each order was made, which can be useful for follow-up actions or analysis.
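Since "today" moves, the numbers in `days_since_order` change from run to run. One possible way to make the calculation repeatable is to wrap it in a small helper that accepts an optional reference date. The `days_since` function and its `as_of` parameter below are hypothetical names for illustration, not part of pandas:

```python
import pandas as pd

def days_since(dates, as_of=None):
    """Whole days between each date and `as_of` (default: today at midnight)."""
    if as_of is None:
        as_of = pd.Timestamp.today().normalize()  # drop the time of day
    return (pd.Timestamp(as_of) - dates).dt.days

order_dates = pd.Series(pd.to_datetime(['2023-01-05', '2023-02-15', '2023-03-10']))

# A fixed as_of date gives the same answer on every run
fixed = days_since(order_dates, as_of='2023-04-01')
print(fixed.tolist())  # [86, 45, 22]

# Without as_of, the helper falls back to today's date
current = days_since(order_dates)
```

Normalizing today's timestamp to midnight keeps the day count from depending on the hour at which the script runs.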

## Analyze Retail Sales Data Using Datetime Operations

Space Voyager, it's time to analyze retail sales data! Follow the steps to convert the `order_date` column to datetime format, handle mixed formats, extract the day of the week, and calculate the number of days from a fixed reference date to each order date. Extracting the day of the week is easy: simply call `.dt.day_name()` on your datetime column.

Let's go!

```py
import pandas as pd
from datetime import datetime

# Sample sales data
data = {
    'order_id': [1, 2, 3],
    'order_date': ['2023-09-30', '10/01/2023', 'October 2, 2023']
}
sales = pd.DataFrame(data)

# TODO: Convert 'order_date' to datetime, handling mixed date formats with format='mixed'

# TODO: Extract the day of the week from 'order_date' column

# TODO: Calculate the number of days from a fixed reference date 2023-10-05 to the order date by subtracting 'order_date' from the reference date

# TODO: Print the DataFrame to verify the results


```

To analyze your retail sales data as described, you'll need to follow these steps in your Python script:

1. Convert the `order_date` column to datetime format, making sure mixed date formats are handled.
2. Extract the day of the week from the `order_date` column.
3. Calculate the number of days from a fixed reference date (2023-10-05) to the order date by subtracting the `order_date` from the reference date.
4. Print the DataFrame to verify the results.

Here's how you can modify your script to achieve these tasks:

```python
import pandas as pd
from datetime import datetime

# Sample sales data
data = {
    'order_id': [1, 2, 3],
    'order_date': ['2023-09-30', '10/01/2023', 'October 2, 2023']
}
sales = pd.DataFrame(data)

# Convert 'order_date' to datetime, handling mixed date formats
sales['order_date'] = pd.to_datetime(sales['order_date'], format='mixed')

# Extract the day of the week from 'order_date' column
sales['day_of_week'] = sales['order_date'].dt.day_name()

# Calculate the number of days from a fixed reference date 2023-10-05 to the order date
reference_date = pd.to_datetime('2023-10-05')
sales['days_from_reference'] = (reference_date - sales['order_date']).dt.days

# Print the DataFrame to verify the results
print(sales)
```

### Explanation:
- **Datetime Conversion**: The `pd.to_datetime()` function is used with the `format='mixed'` argument to handle different date formats in the `order_date` column.
- **Extract Day of the Week**: The `.dt.day_name()` method is used to get the name of the day from the `order_date`.
- **Days Calculation**: The number of days from the reference date to each order date is calculated by subtracting `order_date` from `reference_date`, and `.dt.days` is used to extract the number of days from the resulting timedelta objects.

This script will output a DataFrame that includes the original order data along with the day of the week for each order and the number of days from the reference date to each order date, providing comprehensive date-related insights into the sales data.
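Day-of-week information is often used to separate weekend orders from weekday ones. As a sketch (the `is_weekend` column is an illustrative addition, and the dates are written in ISO format for simplicity), `.dt.dayofweek` returns the weekday as a number, which makes that comparison straightforward:

```python
import pandas as pd

sales = pd.DataFrame({
    'order_id': [1, 2, 3],
    'order_date': pd.to_datetime(['2023-09-30', '2023-10-01', '2023-10-02']),
})

# dayofweek counts from Monday=0, so 5 and 6 are Saturday and Sunday
sales['is_weekend'] = sales['order_date'].dt.dayofweek >= 5

print(sales)
```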