# Step 1 - Adding Features

### Installation and Setup

In [None]:
%%capture
%pip install pandas 

In [None]:
import pandas as pd

### Load Data

In [2]:
import pandas as pd

# Load the raw hotel snapshot data
hotels_file_path = "../data/hotels_data.csv"
df = pd.read_csv(hotels_file_path)

# Preview the first five rows
df.head()

| Unnamed: 0 | Snapshot ID | Snapshot Date | Checkin Date | Days | Original Price | Discount Price | Discount Code | Available Rooms | Hotel Name | Hotel Stars |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 7/17/2015 0:00 | 8/12/2015 0:00 | 5 | 1178 | 1040 | 1 | 6 | Best Western Plus Seaport Inn Downtown | 3 |
| 1 | 1 | 7/17/2015 0:00 | 8/19/2015 0:00 | 5 | 1113 | 982 | 1 | 8 | Best Western Plus Seaport Inn Downtown | 3 |
| 2 | 1 | 7/17/2015 0:00 | 8/13/2015 0:00 | 5 | 4370 | 4240 | 1 | 3 | The Peninsula New York | 5 |
| 3 | 1 | 7/17/2015 0:00 | 7/26/2015 0:00 | 5 | 1739 | 1667 | 1 | 18 | Eventi Hotel a Kimpton Hotel | 4 |
| 4 | 1 | 7/17/2015 0:00 | 8/12/2015 0:00 | 5 | 1739 | 1672 | 1 | 3 | Eventi Hotel a Kimpton Hotel | 4 |


### Add Columns

In [3]:
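# Parse the date columns as datetimes so date arithmetic works
df['Snapshot Date'] = pd.to_datetime(df['Snapshot Date'])
df['Checkin Date'] = pd.to_datetime(df['Checkin Date'])

# DayDiff: days between the snapshot and the check-in
df['DayDiff'] = (df['Checkin Date'] - df['Snapshot Date']).dt.days
# WeekDay: name of the check-in weekday
df['WeekDay'] = df['Checkin Date'].dt.day_name()
# DiscountDiff: absolute discount in dollars
df['DiscountDiff'] = df['Original Price'] - df['Discount Price']
# DiscountPerc: discount as a percentage of the original price
df['DiscountPerc'] = (df['DiscountDiff'] / df['Original Price']) * 100

display(df.head())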

| Unnamed: 0 | Snapshot ID | Snapshot Date | Checkin Date | Days | Original Price | Discount Price | Discount Code | Available Rooms | Hotel Name | Hotel Stars | DayDiff | WeekDay | DiscountDiff | DiscountPerc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2015-07-17 | 2015-08-12 | 5 | 1178 | 1040 | 1 | 6 | Best Western Plus Seaport Inn Downtown | 3 | 26 | Wednesday | 138 | 11.714771 |
| 1 | 1 | 2015-07-17 | 2015-08-19 | 5 | 1113 | 982 | 1 | 8 | Best Western Plus Seaport Inn Downtown | 3 | 33 | Wednesday | 131 | 11.769991 |
| 2 | 1 | 2015-07-17 | 2015-08-13 | 5 | 4370 | 4240 | 1 | 3 | The Peninsula New York | 5 | 27 | Thursday | 130 | 2.974828 |
| 3 | 1 | 2015-07-17 | 2015-07-26 | 5 | 1739 | 1667 | 1 | 18 | Eventi Hotel a Kimpton Hotel | 4 | 9 | Sunday | 72 | 4.140311 |
| 4 | 1 | 2015-07-17 | 2015-08-12 | 5 | 1739 | 1672 | 1 | 3 | Eventi Hotel a Kimpton Hotel | 4 | 26 | Wednesday | 67 | 3.852789 |
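
The feature calculations above can be reproduced on a tiny in-memory frame, which is handy for checking the logic without the full CSV. The sketch below is illustrative only: the `sample` frame holds two rows copied from the preview, and the explicit `date_format` string is an assumption that every date in the file follows the `7/17/2015 0:00` pattern shown there.

```python
import pandas as pd

# Two rows copied from the df.head() preview, with dates still as raw strings.
sample = pd.DataFrame({
    "Snapshot Date": ["7/17/2015 0:00", "7/17/2015 0:00"],
    "Checkin Date": ["8/12/2015 0:00", "7/26/2015 0:00"],
    "Original Price": [1178, 1739],
    "Discount Price": [1040, 1667],
})

# Assumption: all dates use month/day/year hour:minute, as in the preview.
date_format = "%m/%d/%Y %H:%M"
for col in ["Snapshot Date", "Checkin Date"]:
    sample[col] = pd.to_datetime(sample[col], format=date_format)

# Same four derived columns as in the cell above.
sample["DayDiff"] = (sample["Checkin Date"] - sample["Snapshot Date"]).dt.days
sample["WeekDay"] = sample["Checkin Date"].dt.day_name()
sample["DiscountDiff"] = sample["Original Price"] - sample["Discount Price"]
sample["DiscountPerc"] = sample["DiscountDiff"] / sample["Original Price"] * 100

print(sample[["DayDiff", "WeekDay", "DiscountDiff", "DiscountPerc"]])
```

Passing an explicit `format` makes parsing fail loudly on a value that does not match, rather than letting pandas guess; whether that strictness is wanted depends on how clean the source file is.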



**New CSV format**

| Column Name       | Description                                                                                         | Example Value                       |
|--------------------|-----------------------------------------------------------------------------------------------------|-------------------------------------|
| **Snapshot ID**    | Unique identifier for each snapshot of data                                                        | 1                                   |
| **Snapshot Date**  | The date when the snapshot was taken                                                               | 2015-07-17                          |
| **Checkin Date**   | The date of check-in for the hotel                                                                 | 2015-08-12                          |
| **Days**           | Duration of the stay in days                                                                       | 5                                   |
| **Original Price** | Price of the stay without any discount (in dollars)                                                | 1178                                |
| **Discount Price** | Price of the stay after applying the discount (in dollars)                                         | 1040                                |
| **Discount Code**  | Code representing the type of discount applied (values 1-4, with 1 indicating no discount possible) | 1                                   |
| **Available Rooms**| Number of rooms available at the specified check-in date                                           | 6                                   |
| **Hotel Name**     | Name of the hotel                                                                                  | Best Western Plus Seaport Inn Downtown |
| **Hotel Stars**    | Star rating of the hotel                                                                           | 3                                   |
| **DayDiff**        | Number of days between the Snapshot Date and Checkin Date                                          | 26                                  |
| **WeekDay**        | Day of the week corresponding to the Checkin Date                                                  | Wednesday                           |
| **DiscountDiff**   | Difference between the Original Price and Discount Price (in dollars)                              | 138                                 |
| **DiscountPerc**   | Percentage of discount applied, calculated as `(DiscountDiff / Original Price) * 100`             | 11.714770797962649                  |
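
The column descriptions in this table suggest a few sanity checks that could be run before saving. The sketch below is one possible way to express them. The helper name `check_feature_columns` is hypothetical, and the rules it enforces (non-negative `DayDiff` and `DiscountDiff`, `DiscountPerc` between 0 and 100) are assumptions about the data, not guarantees stated by the dataset.

```python
import pandas as pd

def check_feature_columns(df: pd.DataFrame) -> list:
    """Return a list of problem descriptions; an empty list means all checks passed."""
    problems = []
    # Assumption: a check-in never precedes the snapshot that lists it.
    if (df["DayDiff"] < 0).any():
        problems.append("DayDiff is negative for some rows")
    # Assumption: the discounted price never exceeds the original price.
    if (df["DiscountDiff"] < 0).any():
        problems.append("DiscountDiff is negative for some rows")
    if not df["DiscountPerc"].between(0, 100).all():
        problems.append("DiscountPerc falls outside 0-100 for some rows")
    # DiscountPerc should match the formula given in the table.
    expected = df["DiscountDiff"] / df["Original Price"] * 100
    if not ((df["DiscountPerc"] - expected).abs() < 1e-9).all():
        problems.append("DiscountPerc does not match (DiscountDiff / Original Price) * 100")
    return problems

# First row of the preview: expected to pass every check.
good = pd.DataFrame({
    "Original Price": [1178], "DayDiff": [26],
    "DiscountDiff": [138], "DiscountPerc": [138 / 1178 * 100],
})
# A made-up row with a check-in before the snapshot, to show a failure.
bad = good.assign(DayDiff=[-3])

print(check_feature_columns(good))
print(check_feature_columns(bad))
```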


### Save To CSV

In [4]:
# Write the enriched data to a new file; index=False keeps the row index out of the CSV
changed_hotels_path = "../data/hotels_data_changed.csv"
df.to_csv(changed_hotels_path, index=False)
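
One thing to keep in mind when reading the saved file back: CSV stores dates as text, so the datetime columns are not restored automatically. The sketch below illustrates this with an in-memory buffer instead of the real file; the one-row `frame` is a stand-in built from the first preview row, not the actual dataset.

```python
import io
import pandas as pd

# Stand-in frame with the two datetime columns and one derived column.
frame = pd.DataFrame({
    "Snapshot Date": pd.to_datetime(["2015-07-17"]),
    "Checkin Date": pd.to_datetime(["2015-08-12"]),
    "DayDiff": [26],
})

# Write to an in-memory buffer the same way the notebook writes to disk.
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Reading without parse_dates leaves the date columns as text.
buffer.seek(0)
plain = pd.read_csv(buffer)

# Reading with parse_dates restores them as datetimes.
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["Snapshot Date", "Checkin Date"])

print(plain.dtypes)
print(parsed.dtypes)
```

With the real file, the same `parse_dates` argument would be passed to `pd.read_csv(changed_hotels_path, ...)`.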