## Hands-on - Matrices, DataFrames, and Time-Series Data

In [47]:
# Import the libraries used in this notebook
import pandas as pd  # tabular data (DataFrames) and CSV reading
import numpy as np  # numerical arrays and mathematical operations

# URL of the dataset hosted on GitHub
file_path = "https://raw.githubusercontent.com/Hamed-Ahmadinia/DASP-2025/main/Bike%20Sales.csv"

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(file_path)

# Preview the first five rows to confirm the data loaded correctly
print("Dataset Preview:")
print(df.head(5))

Dataset Preview:
         Date  Day     Month  Year  Customer_Age       Age_Group  \
0  2013-11-26   26  November  2013            19     Youth (<25)   
1  2015-11-26   26  November  2015            19     Youth (<25)   
2  2014-03-23   23     March  2014            49  Adults (35-64)   
3  2016-03-23   23     March  2016            49  Adults (35-64)   
4  2014-05-15   15       May  2014            47  Adults (35-64)   

  Customer_Gender    Country             State Product_Category Sub_Category  \
0               M     Canada  British Columbia      Accessories   Bike Racks   
1               M     Canada  British Columbia      Accessories   Bike Racks   
2               M  Australia   New South Wales      Accessories   Bike Racks   
3               M  Australia   New South Wales      Accessories   Bike Racks   
4               F  Australia   New South Wales      Accessories   Bike Racks   

               Product  Order_Quantity  Unit_Cost  Unit_Price  Profit  Cost  \
0  Hitch Rack 

### **Exercise 1: Convert the "Date" column to datetime format**
**Question:** Convert the "Date" column to pandas datetime format.

In [48]:
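# Parse the date strings into pandas datetime values (dtype datetime64)
df['Date'] = pd.to_datetime(df['Date'])

# The printed values look the same as before; only the column dtype has changed
print("New date format:")
print(df.head(5))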

New date format:
        Date  Day     Month  Year  Customer_Age       Age_Group  \
0 2013-11-26   26  November  2013            19     Youth (<25)   
1 2015-11-26   26  November  2015            19     Youth (<25)   
2 2014-03-23   23     March  2014            49  Adults (35-64)   
3 2016-03-23   23     March  2016            49  Adults (35-64)   
4 2014-05-15   15       May  2014            47  Adults (35-64)   

  Customer_Gender    Country             State Product_Category Sub_Category  \
0               M     Canada  British Columbia      Accessories   Bike Racks   
1               M     Canada  British Columbia      Accessories   Bike Racks   
2               M  Australia   New South Wales      Accessories   Bike Racks   
3               M  Australia   New South Wales      Accessories   Bike Racks   
4               F  Australia   New South Wales      Accessories   Bike Racks   

               Product  Order_Quantity  Unit_Cost  Unit_Price  Profit  Cost  \
0  Hitch Rack - 4-Bi
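
The preview looks almost identical before and after the conversion, so it can help to check the dtype directly. The following is a minimal sketch on a small hypothetical Series (`raw_dates` is not part of the dataset); it assumes the dates follow the `YYYY-MM-DD` pattern seen in the preview and shows how `errors="coerce"` handles a malformed entry.

```python
import pandas as pd

# Hypothetical sample values; the last entry is deliberately malformed.
raw_dates = pd.Series(["2013-11-26", "2015-11-26", "not a date"])

# An explicit format skips per-value format inference, and
# errors="coerce" turns unparseable entries into NaT instead of raising.
parsed = pd.to_datetime(raw_dates, format="%Y-%m-%d", errors="coerce")

print(parsed.dtype)         # a datetime64 dtype rather than object
print(parsed.isna().sum())  # number of values that failed to parse
```

On the real DataFrame, `df['Date'].dtype` gives the same kind of confirmation.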

### **Exercise 2: Set the "Date" column as the index**
**Question:** Set the "Date" column as the index of the DataFrame.

In [49]:
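# Use the datetime column as the row index (modifies df in place)
df.set_index('Date', inplace=True)
# Sort chronologically so that date-based slicing and cumulative sums behave as expected
df.sort_index(inplace=True)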

print("Date as index:")
print(df.head(5))

Date as index:
            Day    Month  Year  Customer_Age             Age_Group  \
Date                                                                 
2011-01-01    1  January  2011            42        Adults (35-64)   
2011-01-01    1  January  2011            33  Young Adults (25-34)   
2011-01-01    1  January  2011            17           Youth (<25)   
2011-01-01    1  January  2011            39        Adults (35-64)   
2011-01-01    1  January  2011            23           Youth (<25)   

           Customer_Gender        Country             State Product_Category  \
Date                                                                           
2011-01-01               M  United States        California            Bikes   
2011-01-01               F         France           Yveline            Bikes   
2011-01-01               M         Canada  British Columbia            Bikes   
2011-01-01               M  United States        Washington            Bikes   
2011-01-01    
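
Because `inplace=True` modifies `df`, running the cell a second time raises a `KeyError`: the "Date" column no longer exists once it has become the index. A minimal sketch of a re-runnable alternative, using a hypothetical three-row frame (`sales` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical three-row frame standing in for the bike sales data.
sales = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2014-03-23", "2011-01-01", "2013-11-26"]),
        "Revenue": [100, 250, 75],
    }
)

# Chaining returns a new frame and leaves `sales` untouched,
# so this line can be executed repeatedly without error.
indexed = sales.set_index("Date").sort_index()

print(indexed.index.is_monotonic_increasing)  # True
print(isinstance(indexed.index, pd.DatetimeIndex))
```

Whether to mutate in place or assign to a new name is a style choice; the chained form simply makes the cell safe to re-execute.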

### **Exercise 3: Slice the data from '2013-01-01' to '2013-12-31'**
**Question:** Slice the DataFrame to show data for the year 2013.

In [50]:
print("Data for year 2013")
# Label-based slicing on the sorted DatetimeIndex; both endpoints are inclusive
print(df.loc['2013-01-01':'2013-12-31'])

Data for year 2013
            Day     Month  Year  Customer_Age             Age_Group  \
Date                                                                  
2013-01-01    1   January  2013            29  Young Adults (25-34)   
2013-01-01    1   January  2013            29  Young Adults (25-34)   
2013-01-01    1   January  2013            19           Youth (<25)   
2013-01-01    1   January  2013            53        Adults (35-64)   
2013-01-01    1   January  2013            42        Adults (35-64)   
...         ...       ...   ...           ...                   ...   
2013-12-31   31  December  2013            53        Adults (35-64)   
2013-12-31   31  December  2013            46        Adults (35-64)   
2013-12-31   31  December  2013            27  Young Adults (25-34)   
2013-12-31   31  December  2013            26  Young Adults (25-34)   
2013-12-31   31  December  2013            25  Young Adults (25-34)   

           Customer_Gender        Country              St
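
The same selection can be written more compactly with partial string indexing, where a year string alone selects every row in that year. A minimal sketch on a hypothetical four-row frame (the values are made up), comparing the two forms:

```python
import pandas as pd

# Hypothetical frame with a sorted DatetimeIndex spanning three years.
sales = pd.DataFrame(
    {"Revenue": [10, 20, 30, 40]},
    index=pd.to_datetime(["2012-12-31", "2013-01-01", "2013-12-31", "2014-01-01"]),
)
sales.index.name = "Date"

# Explicit inclusive range versus partial string indexing by year.
explicit = sales.loc["2013-01-01":"2013-12-31"]
by_year = sales.loc["2013"]

print(explicit.equals(by_year))  # True
```

Both forms rely on the index being a `DatetimeIndex`; range slicing additionally works best on a sorted index.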

### **Exercise 4: Calculate cumulative revenue**
**Question:** Add a new column 'Cumulative_Revenue' that shows the cumulative sum of the revenue.

In [51]:
# Running total of revenue in row order; the index was sorted by date in Exercise 2
df['Cumulative_Revenue'] = df['Revenue'].cumsum()

print('Added cumulative revenue')
print(df.head(5))

Added cumulative revenue
            Day    Month  Year  Customer_Age             Age_Group  \
Date                                                                 
2011-01-01    1  January  2011            42        Adults (35-64)   
2011-01-01    1  January  2011            33  Young Adults (25-34)   
2011-01-01    1  January  2011            17           Youth (<25)   
2011-01-01    1  January  2011            39        Adults (35-64)   
2011-01-01    1  January  2011            23           Youth (<25)   

           Customer_Gender        Country             State Product_Category  \
Date                                                                           
2011-01-01               M  United States        California            Bikes   
2011-01-01               F         France           Yveline            Bikes   
2011-01-01               M         Canada  British Columbia            Bikes   
2011-01-01               M  United States        Washington            Bikes   
2011
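
With this many columns the wide printout wraps, and the new column is easy to miss. Selecting only the relevant columns makes the result easier to verify. The sketch below uses a hypothetical four-row frame and also shows a running total that restarts each calendar year; that variant (`Cumulative_Revenue_YTD`) is an illustrative addition, not part of the exercise.

```python
import pandas as pd

# Hypothetical frame; note the duplicate date, as in the real data.
sales = pd.DataFrame(
    {"Revenue": [100, 50, 200, 25]},
    index=pd.to_datetime(["2011-01-01", "2011-01-01", "2011-01-02", "2012-01-01"]),
)
sales.index.name = "Date"

# Running total over all rows, in index order.
sales["Cumulative_Revenue"] = sales["Revenue"].cumsum()

# Illustrative variant: running total that resets at the start of each year.
sales["Cumulative_Revenue_YTD"] = sales.groupby(sales.index.year)["Revenue"].cumsum()

# Print only the columns of interest instead of the full wide frame.
print(sales[["Revenue", "Cumulative_Revenue", "Cumulative_Revenue_YTD"]])
```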

### **Exercise 5: Downsample to show monthly total revenue (Hint: Use resampling)**
**Question:** Resample the data to calculate total monthly revenue.

In [52]:
# 'ME' (month end) is the alias used by pandas 2.2 and later; older versions use 'M'
monthly_revenue = df.resample('ME')['Revenue'].sum()

print("Monthly revenue:")
print(monthly_revenue)

Monthly revenue:
Date
2011-01-31     675193
2011-02-28     637598
2011-03-31     708517
2011-04-30     698782
2011-05-31     734537
               ...   
2016-03-31    2608663
2016-04-30    2756864
2016-05-31    3264343
2016-06-30    3586300
2016-07-31     499960
Freq: ME, Name: Revenue, Length: 67, dtype: int64
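
The final value (2016-07) is far lower than the preceding months, which may mean the data covers only part of July 2016; counting the distinct days with sales in each month is one way to check. The sketch below uses a hypothetical four-row frame and groups by monthly period instead of calling `resample`, which sidesteps the `'M'` versus `'ME'` alias difference between pandas versions. The column names `Total_Revenue` and `Days_With_Sales` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: two sales in June 2016 and two in early July 2016.
sales = pd.DataFrame(
    {"Revenue": [100, 200, 300, 50]},
    index=pd.to_datetime(["2016-06-01", "2016-06-30", "2016-07-01", "2016-07-02"]),
)
sales.index.name = "Date"

# Monthly periods for each row; "M" is the period alias for calendar months.
months = sales.index.to_period("M")

# Total revenue and number of distinct calendar days with sales per month.
summary = (
    sales.assign(Day=sales.index.normalize())
    .groupby(months)
    .agg(Total_Revenue=("Revenue", "sum"), Days_With_Sales=("Day", "nunique"))
)

print(summary)
```

One difference from `resample` is that grouping by period yields only months that actually contain rows, and the result is indexed by period (for example `2016-06`) rather than by month-end date.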
