## Restaurant Data Analysis
This project analyzes restaurant sales data to uncover trends, optimize operations, and predict revenue.

### Data Loading
In this step, we load the three source files (orders, menu, and guests) into pandas DataFrames and preview the first rows of each.

In [6]:
import pandas as pd

# File paths
path = '/Users/zinger/Documents/Restaurant_project'
orders_file = f"{path}/orders_05_2022.csv"
menu_file = f"{path}/menu-data.csv"
guests_file = f"{path}/guests.csv"

# Load datasets
orders_df = pd.read_csv(orders_file)
menu_df = pd.read_csv(menu_file)
guests_df = pd.read_csv(guests_file)

# Preview datasets
print("Orders preview:")
print(orders_df.head())

print("\nMenu preview:")
print(menu_df.head())

print("\nGuests preview:")
print(guests_df.head())

Orders preview:
   order_id  client_id  order_date order_time  menu_item_id  quantity  discount (%)  total_price ($)
0   1000001       1847  2022-05-01   08:00:00           134         1             0            13.50
1   1000001       1847  2022-05-01   08:00:00           111         3             0            20.97
2   1000001       1847  2022-05-01   08:00:00           123         2             0            11.98
3   1000002       1516  2022-05-01   08:01:00           108         1             0            12.99
4   1000002       1516  2022-05-01   08:01:00           140         1             5             2.84

Menu preview:
   menu_item_id              item_name     category  price ($)  cost ($)
0           101    Classic Beef Burger  Beef Burger      11.99       4.8
1           102       BBQ Bacon Burger  Beef Burger      13.50       5.4
2           103  Mushroom Swiss Burger  Beef Burger      12.99       5.2
3           104  Spicy Jalapeno Burger  Beef Burger      12.50       5.
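
Beyond `head()`, a per-column summary can make the structure of each table easier to judge. The following is a minimal sketch, not part of the original notebook: the helper name `summarize` is hypothetical, and the toy frame simply re-types the five order rows shown in the preview above so the example runs without the CSV files.

```python
import pandas as pd

def summarize(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })
    print(f"{name}: {df.shape[0]} rows x {df.shape[1]} columns")
    return summary

# Toy frame copied from the orders preview; in the notebook one would
# pass orders_df, menu_df and guests_df instead (assumption).
sample_orders = pd.DataFrame({
    "order_id": [1000001, 1000001, 1000001, 1000002, 1000002],
    "client_id": [1847, 1847, 1847, 1516, 1516],
    "order_date": ["2022-05-01"] * 5,
    "order_time": ["08:00:00", "08:00:00", "08:00:00", "08:01:00", "08:01:00"],
    "menu_item_id": [134, 111, 123, 108, 140],
    "quantity": [1, 3, 2, 1, 1],
    "discount (%)": [0, 0, 0, 0, 5],
    "total_price ($)": [13.50, 20.97, 11.98, 12.99, 2.84],
})

orders_summary = summarize(sample_orders, "orders")
print(orders_summary)
```

The `unique` count for `order_id` being lower than the row count reflects that one order spans several line items, which is worth knowing before aggregating revenue per order.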

### Processing Date and Time Data
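In this step, we first confirm that every `menu_item_id` and `client_id` in the orders table has a match in the menu and guests tables. We then convert `order_date` to datetime and `order_time` to time values, and add a new column, `time_of_day`, which categorizes orders into three periods: Morning (before 11 AM), Afternoon (11 AM to 5 PM), and Evening (5 PM onward).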

In [7]:


# Define file paths for the datasets
path = '/Users/zinger/Documents/Restaurant_project'
orders_file = f"{path}/orders_05_2022.csv"  # Path to orders data
menu_file = f"{path}/menu-data.csv"         # Path to menu data
guests_file = f"{path}/guests.csv"          # Path to guest data

# Load datasets into pandas DataFrames
orders_df = pd.read_csv(orders_file)  # Orders table: details of each order
menu_df = pd.read_csv(menu_file)      # Menu table: details of menu items
guests_df = pd.read_csv(guests_file)  # Guests table: details of restaurant customers

# STEP 1: Data integrity check

# Identify menu_item_id values in the orders table that are not in the menu table
missing_menu_ids = orders_df[~orders_df['menu_item_id'].isin(menu_df['menu_item_id'])]
print("Missing menu IDs in orders (not found in menu):")
print(missing_menu_ids)

# Identify client_id values in the orders table that are not in the guests table
missing_guest_ids = orders_df[~orders_df['client_id'].isin(guests_df['guest_id'])]
print("Missing guest IDs in orders (not found in guests):")
print(missing_guest_ids)

# STEP 2: Process date and time columns

# Convert the 'order_date' column from string to datetime format
orders_df['order_date'] = pd.to_datetime(orders_df['order_date'])

# Convert the 'order_time' column from string to time format
orders_df['order_time'] = pd.to_datetime(orders_df['order_time'], format='%H:%M:%S').dt.time

# Add a new column 'time_of_day' that assigns each order to a time period.
# The boundaries are computed once instead of on every function call.
MORNING_END = pd.to_datetime("11:00:00").time()    # 11 AM
AFTERNOON_END = pd.to_datetime("17:00:00").time()  # 5 PM

def get_time_of_day(order_time):
    """
    Categorize time of day into three periods:
    - Morning: before 11 AM
    - Afternoon: 11 AM up to (but not including) 5 PM
    - Evening: 5 PM onward
    """
    if order_time < MORNING_END:
        return "Morning"
    elif order_time < AFTERNOON_END:
        return "Afternoon"
    else:
        return "Evening"

# Apply the time categorization function to every order
orders_df['time_of_day'] = orders_df['order_time'].apply(get_time_of_day)

# Display the updated orders data with the new time_of_day column
print("Updated Orders Data with time_of_day:")
print(orders_df[['order_time', 'time_of_day']].head())



Missing menu IDs in orders (not found in menu):
Empty DataFrame
Columns: [order_id, client_id, order_date, order_time, menu_item_id, quantity, discount (%), total_price ($)]
Index: []
Missing guest IDs in orders (not found in guests):
Empty DataFrame
Columns: [order_id, client_id, order_date, order_time, menu_item_id, quantity, discount (%), total_price ($)]
Index: []
Updated Orders Data with time_of_day:
  order_time time_of_day
0   08:00:00     Morning
1   08:00:00     Morning
2   08:00:00     Morning
3   08:01:00     Morning
4   08:01:00     Morning
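
The row-by-row `apply` used for `time_of_day` can also be written as a vectorized binning step. The sketch below is an alternative, not the notebook's method: it assumes the times are still strings in `HH:MM:SS` form, and it relies on both boundaries (11 AM and 5 PM) falling on whole hours so that binning the hour alone is enough. The sample times are made up for illustration.

```python
import pandas as pd

# Toy order times; in the notebook this would be the raw string column
# orders_df['order_time'] before it is converted to time objects (assumption).
times = pd.Series(
    ["08:00:00", "10:59:59", "11:00:00", "16:59:59", "17:00:00", "21:30:00"]
)

# Parse once and keep only the hour component (0-23).
hours = pd.to_datetime(times, format="%H:%M:%S").dt.hour

# right=False makes each bin closed on the left: [0, 11), [11, 17), [17, 24)
time_of_day = pd.cut(
    hours,
    bins=[0, 11, 17, 24],
    labels=["Morning", "Afternoon", "Evening"],
    right=False,
)

print(pd.DataFrame({"order_time": times, "time_of_day": time_of_day}))
```

The result is a categorical column whose categories keep the Morning, Afternoon, Evening order, which can be convenient when sorting or plotting by period.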

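The integrity check in STEP 1 filters with `isin`. The same check can be expressed as an anti-join using `merge` with `indicator=True`, which is sketched here under stated assumptions: the helper `orphan_rows` is hypothetical, and the toy tables contain two made-up IDs (999 and 42) that deliberately have no match, unlike the real data, where both checks came back empty.

```python
import pandas as pd

# Toy tables; IDs 999 and 42 are invented to force a mismatch.
orders = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "client_id": [1847, 1847, 1516, 42],
    "menu_item_id": [134, 111, 999, 108],
})
menu = pd.DataFrame({"menu_item_id": [108, 111, 134]})
guests = pd.DataFrame({"guest_id": [1516, 1847]})

def orphan_rows(left, right, left_key, right_key):
    """Return rows of `left` whose key has no match in `right` (an anti-join)."""
    # Keep only the distinct keys of `right`, renamed to match `left`.
    keys = right[[right_key]].drop_duplicates().rename(columns={right_key: left_key})
    # indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join.
    merged = left.merge(keys, on=left_key, how="left", indicator=True)
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

missing_menu = orphan_rows(orders, menu, "menu_item_id", "menu_item_id")
missing_guests = orphan_rows(orders, guests, "client_id", "guest_id")

print("Orders with unknown menu items:")
print(missing_menu)
print("Orders with unknown guests:")
print(missing_guests)
```

In a pipeline, the two results could be followed by `assert missing_menu.empty` style checks so that a broken reference stops the run instead of only printing a table.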