# ECON 0150 | Replication Notebook

**Title:** Hotel Ratings by Room Type

**Original Authors:** Brodecki

**Original Date:** Fall 2024

---

This notebook replicates the analysis from a student final project in ECON 0150: Economic Data Analysis.

## About This Replication

**Research Question:** Is there a difference in hotel ratings between Economy and Luxury room classifications?

**Data Source:** Hotel pricing and rating data (108 hotels)

**Methods:** OLS regression with a categorical variable: Rating ~ Room_Class

**Main Finding:** Luxury hotels have significantly higher ratings than Economy hotels.

**Course Concepts Used:**
- Categorical variables in regression
- Mean comparison
- Box plots
- Price-quality relationship
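
A regression on a single 0/1 variable is a mean comparison in disguise: the intercept equals the baseline group's mean and the slope equals the difference in group means. The sketch below checks this on a handful of made-up ratings in plain Python, so it runs without the course data. The names `ols_simple`, `ratings`, and `is_luxury` are illustrative and not part of the original project.

```python
from statistics import mean

# Made-up ratings for illustration only; these are NOT the hotel data.
ratings = [70.0, 74.0, 78.0, 85.0, 88.0, 91.0]
is_luxury = [0, 0, 0, 1, 1, 1]  # 1 = Luxury, 0 = Economy (the baseline)


def ols_simple(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, luxury_premium = ols_simple(is_luxury, ratings)

# Group means computed directly, for comparison with the fitted coefficients
economy_mean = mean(r for r, d in zip(ratings, is_luxury) if d == 0)
luxury_mean = mean(r for r, d in zip(ratings, is_luxury) if d == 1)

print(f"Intercept: {intercept:.2f} | Economy mean: {economy_mean:.2f}")
print(f"Slope: {luxury_premium:.2f} | Luxury - Economy: {luxury_mean - economy_mean:.2f}")
```

The same identity is why the intercept in Step 4 can be read as the Economy mean and the Luxury coefficient as the gap between the two classes.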

---
## Step 0 | Setup

In [None]:
# Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

In [None]:
# Load data from course website
base_url = 'https://tayweid.github.io/econ-0150/projects/replications/0051/data/'

data = pd.read_csv(base_url + 'hotels.csv')

print(f"Number of hotels: {len(data)}")
data.head(10)

---
## Step 1 | Data Preparation

In [None]:
# Check columns and class distribution
print("Columns:", data.columns.tolist())
print("\nRoom class distribution:")
print(data['class'].value_counts())

In [None]:
# Rename columns for clarity
data = data.rename(columns={
    'class': 'Room_Class',
    'rating': 'Rating',
    'price': 'Price'
})

# Create binary indicator for luxury
data['Is_Luxury'] = (data['Room_Class'] == 'Luxury').astype(int)

print(f"\nData prepared: {len(data)} hotels")
data.head()

---
## Step 2 | Data Exploration

In [None]:
# Summary statistics by class
print("Summary Statistics by Room Class:")
print(data.groupby('Room_Class')[['Rating', 'Price']].describe())

In [None]:
# Mean ratings by class
mean_ratings = data.groupby('Room_Class')['Rating'].mean()
print("\nMean Rating by Class:")
print(mean_ratings)
print(f"\nDifference: {mean_ratings['Luxury'] - mean_ratings['Economy']:.2f} points")

---
## Step 3 | Visualization

In [None]:
# Box plot of ratings by class
plt.figure(figsize=(10, 6))
sns.boxplot(x='Room_Class', y='Rating', data=data)
plt.title('Hotel Ratings by Room Class')
plt.xlabel('Room Class')
plt.ylabel('Rating')
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Scatter plot: Price vs Rating by class
plt.figure(figsize=(10, 6))
colors = {'Economy': 'blue', 'Luxury': 'gold'}
for room_class in ['Economy', 'Luxury']:
    subset = data[data['Room_Class'] == room_class]
    plt.scatter(subset['Price'], subset['Rating'], 
                label=room_class, alpha=0.6, c=colors[room_class], s=80)
plt.xlabel('Price ($)')
plt.ylabel('Rating')
plt.title('Price vs. Rating by Room Class')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

In [None]:
# Bar chart of mean ratings
plt.figure(figsize=(8, 6))
mean_ratings.plot(kind='bar', color=['steelblue', 'gold'])
plt.title('Average Rating by Room Class')
plt.xlabel('Room Class')
plt.ylabel('Average Rating')
plt.xticks(rotation=0)
plt.ylim(0, 100)  # assumes ratings are on a 0-100 scale; adjust if the data use another range
plt.grid(axis='y', alpha=0.3)
plt.show()

---
## Step 4 | Statistical Analysis

In [None]:
# OLS Regression with categorical variable
model = smf.ols('Rating ~ C(Room_Class)', data=data).fit()
print("OLS Regression: Rating ~ Room_Class")
print(model.summary())

In [None]:
# Key results
luxury_term = 'C(Room_Class)[T.Luxury]'  # Economy is the baseline category
premium = model.params[luxury_term]
p_value = model.pvalues[luxury_term]

print("\n" + "="*50)
print("KEY RESULTS")
print("="*50)
print("\nNull Hypothesis: No difference in mean ratings between classes (beta = 0)")
print("\nModel Results:")
print(f"  Economy mean rating (Intercept): {model.params['Intercept']:.2f}")
print(f"  Luxury premium: {premium:.2f} points")
print(f"  P-value: {p_value:.6f}")
print(f"  R-squared: {model.rsquared:.3f}")
print("\nInterpretation:")
if p_value < 0.05:
    # Report the direction from the sign of the coefficient, not by assumption
    direction = "higher" if premium > 0 else "lower"
    print("  REJECT null hypothesis at the 5% level")
    print(f"  Luxury hotels have significantly {direction} ratings")
else:
    print("  FAIL TO REJECT null hypothesis at the 5% level")
    print("  No statistically significant difference between room classes")
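
The p-value reported above comes from a t-test on the Luxury coefficient. With one binary regressor and the default (non-robust) standard errors, that t statistic is the pooled-variance two-sample t statistic, which the sketch below computes by hand. The ratings are made up and `pooled_t` is a hypothetical helper; nothing here uses the hotel data.

```python
from math import sqrt
from statistics import mean, variance

# Made-up ratings for illustration only; these are NOT the hotel data.
economy = [68.0, 72.0, 75.0, 71.0, 79.0, 74.0, 70.0, 77.0]
luxury = [84.0, 88.0, 81.0, 90.0, 86.0, 83.0, 89.0, 85.0]


def pooled_t(a, b):
    """Return (t statistic, degrees of freedom) for mean(b) - mean(a)."""
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled variance: the residual variance of the dummy-variable regression
    s2 = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    # Standard error of the difference in means
    se = sqrt(s2 * (1 / n_a + 1 / n_b))
    return (mean(b) - mean(a)) / se, dof


t_stat, dof = pooled_t(economy, luxury)
print(f"t = {t_stat:.2f} on {dof} degrees of freedom")
```

A large absolute t statistic relative to the t distribution with those degrees of freedom is what produces a small p-value in the regression summary.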

In [None]:
# Additional: Does price explain the rating difference?
model_price = smf.ols('Rating ~ C(Room_Class) + Price', data=data).fit()
print("\nWith Price as Control:")
print(model_price.summary().tables[1])
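
One rough way to read the price-controlled model is to ask how much of the raw Luxury premium disappears once Price is held fixed. The sketch below is a minimal helper for that comparison; `share_explained` is a hypothetical name and the two coefficients are placeholder numbers, not results from the hotel data.

```python
def share_explained(raw_coef, controlled_coef):
    """Fraction of the raw gap that disappears once the control is added."""
    if raw_coef == 0:
        raise ValueError("raw coefficient is zero; the share is undefined")
    return 1 - controlled_coef / raw_coef


# Placeholder coefficients for illustration only
raw_premium = 10.0         # Luxury coefficient without Price
controlled_premium = 4.0   # Luxury coefficient with Price in the model

share = share_explained(raw_premium, controlled_premium)
print(f"Share of the raw premium accounted for by Price: {share:.0%}")
```

In this notebook the two inputs would be `model.params['C(Room_Class)[T.Luxury]']` and `model_price.params['C(Room_Class)[T.Luxury]']`. If Price and Room_Class are strongly correlated, the controlled coefficient can be imprecise, so the share is best treated as descriptive.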

---
## Step 5 | Results Interpretation

### Key Findings

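1. **Rating Difference:** Luxury hotels have significantly higher average ratings than Economy hotels; the Luxury coefficient in Step 4 measures the gap in rating points

2. **Price-Quality Relationship:** Price and rating are positively correlated: higher-priced hotels tend to receive higher ratings

3. **Within-Class Variation:** Ratings vary considerably within both classes, so room class alone does not pin down a hotel's rating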

### Economic Interpretation

- **Quality signaling:** Higher prices often signal (and deliver) higher quality
- **Selection effects:** Guests at luxury hotels may have different rating standards
- **Expectations:** Luxury hotels may exceed expectations more consistently

### Limitations

- Sample size is relatively small (108 hotels)
- Rating scale interpretation may differ by guest type
- Location, amenities, and service factors are not controlled for
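
One way to gauge how much a small sample matters is to bootstrap the difference in mean ratings and look at the width of the resulting interval. The sketch below does this in plain Python on made-up ratings; `bootstrap_diff_ci` is a hypothetical helper, and the percentile interval is one of several bootstrap variants, not the method used in the original project.

```python
import random
from statistics import mean

# Made-up ratings for illustration only; these are NOT the hotel data.
economy = [68.0, 72.0, 75.0, 71.0, 79.0, 74.0, 70.0, 77.0]
luxury = [84.0, 88.0, 81.0, 90.0, 86.0, 83.0, 89.0, 85.0]


def bootstrap_diff_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(b) - mean(a)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed
        a_star = rng.choices(a, k=len(a))
        b_star = rng.choices(b, k=len(b))
        diffs.append(mean(b_star) - mean(a_star))
    diffs.sort()
    tail = (1 - level) / 2
    lower = diffs[round(tail * n_boot)]
    upper = diffs[round((1 - tail) * n_boot) - 1]
    return lower, upper


observed = mean(luxury) - mean(economy)
ci_low, ci_high = bootstrap_diff_ci(economy, luxury)
print(f"Observed difference: {observed:.2f}")
print(f"95% bootstrap interval: [{ci_low:.2f}, {ci_high:.2f}]")
```

With the notebook's data, the two lists would be the `Rating` values for each `Room_Class`. A wide interval signals that the estimated gap is sensitive to which hotels happened to be sampled.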

### Business Implications

- Some Economy hotels may achieve high ratings, plausibly through good service, though service quality is not measured in this data
- Price positioning should match service quality
- Customer expectations matter for ratings

---
## Replication Exercises

### Exercise 1: Chain Analysis
Do ratings differ by hotel chain? Which chains have the highest ratings?

### Exercise 2: Price-Rating Relationship
Regress rating on price alone. What is the relationship?

### Exercise 3: Value Analysis
Create a "value" metric (rating/price). Which hotels offer the best value?

### Challenge Exercise
Research the hotel industry. What factors most influence guest satisfaction ratings?

In [None]:
# Your code for exercises

# Example: Ratings by chain
# print(data.groupby('chain')['Rating'].mean().sort_values(ascending=False))