# 🧠 Similarity-Based Salon Recommendation Engine

> ⚠️ This README was generated with the assistance of a Generative AI tool to document and summarize the project structure, logic, and purpose.

## 📘 Project Overview

This project implements a modular, object-oriented engine to recommend **similar salons** based on business, social, and demographic metrics. Given a query profile, the engine returns a ranked list of stores most similar to the input using **cosine similarity** on normalized feature vectors.

The architecture is designed for real-time use: a structured query passes through a filtering stage and then a scoring stage to produce recommendations on demand.

---

## ⚙️ Key Features

- Accepts structured input via Python dictionary
- Optional filters: salon type, minimum review rating, and exclusion of previously visited salons
- Feature normalization using `MinMaxScaler`
- Similarity scoring via `cosine_similarity`
- Returns top-k most similar stores with relevant metadata

---

## 🧾 Input Data

Each salon in the dataset includes:

| Feature             | Description                                          |
|---------------------|------------------------------------------------------|
| `monthly_sales`     | Average monthly revenue (can be 0 if unvisited)      |
| `social_followers`  | Number of social media followers                     |
| `social_tags`       | Number of user-generated social tags                 |
| `check_ins`         | Location-based user visits                           |
| `avg_income`        | Average income in the store’s surrounding area       |
| `urban_density`     | Urban population density index                       |
| `avg_review`        | Customer review rating (1.0 to 5.0 scale)            |
| `salon_type`        | Salon category label (`hair_salon`, `barber_shop`, `Other`) |

---

## 🧰 Engine Class: `SalonSimilarityEngine`

### `__init__(salons_df: pd.DataFrame)`
- Loads and stores the dataset
- Defines the feature set used for similarity
- Scales numeric features to the `[0, 1]` range using `MinMaxScaler`
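The scaling step above can be sketched with plain NumPy so the arithmetic is visible. This is a minimal illustration rather than the engine's code: it assumes the default `[0, 1]` target range of `MinMaxScaler`, and the toy values and the `scale` helper are made up for the example.

```python
import numpy as np

# Toy feature matrix: rows are salons, columns are (monthly_sales, avg_review).
# These values are illustrative only, not taken from the dataset.
X = np.array([
    [10000.0, 2.0],
    [20000.0, 4.0],
    [30000.0, 5.0],
])

# Per-column minimum and range, learned once from the dataset.
col_min = X.min(axis=0)
col_range = X.max(axis=0) - col_min


def scale(values: np.ndarray) -> np.ndarray:
    """Apply the min-max transform learned from X (hypothetical helper)."""
    return (values - col_min) / col_range


scaled_X = scale(X)

# A query is scaled with the dataset's statistics, so a value outside the
# observed range (36000 > 30000) lands outside [0, 1].
scaled_query = scale(np.array([[36000.0, 3.5]]))
print(scaled_X)
print(scaled_query)
```

Because the ranges come from the dataset, a query far outside them is still scored; it is not clipped unless clipping is requested explicitly.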

### `find_similar_salons(query: dict, top_k: int = 5)`
- Accepts a query dictionary containing any of the feature keys; features missing from the query default to `0`
- Filters the dataset based on optional fields:
  - `salon_type`: limit results to a single salon category
  - `min_avg_review`: drop salons rated below this threshold
  - `exclude_visited`: if `True`, drop visited salons and keep only unvisited ones (the code checks `future_sales_90p == 0`)
- Scales the filtered dataset and the query with the scaler fitted in `__init__`
- Computes cosine similarity between the query and each remaining salon
- Returns the top-k most similar salons with their similarity scores
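The scoring and ranking steps listed above can be illustrated on a few already-scaled vectors. The sketch below computes cosine similarity by hand with NumPy instead of calling `cosine_similarity`; the `cosine` helper and the toy vectors are assumptions made for the example.

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine of the angle between two non-zero vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy scaled vectors, illustrative only.
query = [0.5, 0.5, 0.5]
candidates = [
    [1.0, 1.0, 1.0],   # same direction as the query, larger magnitude
    [1.0, 0.0, 0.0],   # strong on one feature only
    [0.9, 1.0, 0.8],   # nearly the same proportions as the query
]

scores = np.array([cosine(query, c) for c in candidates])

# Rank candidates from most to least similar and keep the top 2.
top_2 = np.argsort(-scores)[:2]
print(scores, top_2)
```

Cosine similarity compares the direction of two vectors, not their length, so the first candidate scores 1.0 even though every one of its values is twice the query's.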

---

## 🔍 Example Use Case

```python
query = {
    "salon_type": "hair_salon",
    "monthly_sales": 20000,
    "social_followers": 4500,
    "social_tags": 120,
    "check_ins": 300,
    "avg_income": 70000,
    "urban_density": 0.85,
    "avg_review": 4.5,
    "min_avg_review": 4.0,
    "exclude_visited": True
}
```
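The query above mixes filter keys with feature keys. One way to separate the two is sketched below. `split_query`, `FEATURE_KEYS`, and `FILTER_KEYS` are hypothetical names introduced for illustration; the default of `0` for a missing feature mirrors the `query.get(..., 0)` calls in the engine.

```python
# Feature order matches the engine's feature list.
FEATURE_KEYS = [
    "monthly_sales", "social_followers", "social_tags", "check_ins",
    "avg_income", "urban_density", "avg_review",
]
FILTER_KEYS = ["salon_type", "min_avg_review", "exclude_visited"]


def split_query(query: dict) -> tuple[dict, list]:
    """Return (filters, feature_vector) for a query dict (hypothetical helper)."""
    filters = {key: query[key] for key in FILTER_KEYS if key in query}
    # Missing features fall back to 0, as in the engine.
    features = [query.get(key, 0) for key in FEATURE_KEYS]
    return filters, features


query = {
    "salon_type": "hair_salon",
    "monthly_sales": 20000,
    "social_followers": 4500,
    "social_tags": 120,
    "check_ins": 300,
    "avg_income": 70000,
    "urban_density": 0.85,
    "avg_review": 4.5,
    "min_avg_review": 4.0,
    "exclude_visited": True,
}

filters, features = split_query(query)
print(filters)
print(features)
```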

---

## 📦 Dependencies

- `pandas`
- `numpy`
- `scikit-learn`




In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import cosine_similarity

df_salons = pd.read_csv("../data/synthetic_salon_dataset.csv")
df_salons

| Unnamed: 0 | salon_id | salon_type | monthly_sales | social_tags | social_followers | check_ins | avg_income | urban_density | avg_review | future_sales_90p |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | salon_000 | Other | 0.000000 | 44 | 1291 | 833 | 60540.708743 | 0.967994 | 3.5 | 0.000000 |
| 1 | salon_001 | hair_salon | 24438.742298 | 53 | 6413 | 807 | 62403.511561 | 0.710952 | 4.6 | 22369.411724 |
| 2 | salon_002 | Other | 24471.661651 | 47 | 7038 | 251 | 68233.525163 | 0.199507 | 1.6 | 11840.402952 |
| 3 | salon_003 | Other | 23774.988986 | 56 | 2781 | 260 | 80524.527417 | 0.736247 | 1.4 | 16596.802689 |
| 4 | salon_004 | hair_salon | 18964.170549 | 38 | 9509 | 886 | 62511.949722 | 0.529840 | 2.0 | 23385.663738 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | salon_095 | hair_salon | 22780.608993 | 51 | 7126 | 444 | 68318.181761 | 0.959075 | 3.3 | 30207.911845 |
| 96 | salon_096 | hair_salon | 25219.303055 | 60 | 355 | 112 | 42538.384413 | 0.342524 | 4.5 | 30129.071034 |
| 97 | salon_097 | Other | 22632.240807 | 40 | 4904 | 431 | 86166.999466 | 0.227351 | 3.2 | 28171.855782 |
| 98 | salon_098 | hair_salon | 26819.432622 | 56 | 1938 | 713 | 55692.076121 | 0.423597 | 3.9 | 22869.624489 |


In [2]:
class Salon:
    """A single salon record with derived social and location scores."""

    def __init__(self, salon_id: str, monthly_sales: float, social_tags: int,
                 social_followers: int, check_ins: int, avg_income: float,
                 urban_density: float, future_sales_90p: float,
                 salon_type: str, avg_review: float):
        self.salon_id = salon_id
        self.monthly_sales = monthly_sales
        self.social_tags = social_tags
        self.social_followers = social_followers
        self.check_ins = check_ins
        self.avg_income = avg_income
        self.urban_density = urban_density
        self.future_sales_90p = future_sales_90p
        self.salon_type = salon_type
        self.avg_review = avg_review

    def total_social_score(self) -> int:
        # Weighted sum: each tag counts 5x and each check-in 2x relative to a follower
        return self.social_followers + (5 * self.social_tags) + (2 * self.check_ins)

    def location_score(self) -> float:
        # Area income weighted by urban density
        return self.avg_income * self.urban_density

    def profile(self) -> dict:
        # Compact summary of the salon with both derived scores
        return {
            "salon_id": self.salon_id,
            "monthly_sales": self.monthly_sales,
            "social_score": self.total_social_score(),
            "location_score": self.location_score()
        }

    @classmethod
    def from_row(cls, row):
        # Build a Salon from a DataFrame row (or any mapping with the same keys)
        return cls(
            salon_id=row["salon_id"],
            monthly_sales=row["monthly_sales"],
            social_tags=row["social_tags"],
            social_followers=row["social_followers"],
            check_ins=row["check_ins"],
            avg_income=row["avg_income"],
            urban_density=row["urban_density"],
            future_sales_90p=row["future_sales_90p"],
            salon_type=row["salon_type"],
            avg_review=row["avg_review"]
        )
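As a worked check of the two derived scores defined in `Salon`, the sketch below applies the same formulas to the `salon_001` row from the dataset preview. It uses a plain dictionary instead of the class so that it runs on its own; the variable names are illustrative.

```python
# Values copied from the salon_001 row of the dataset preview.
row = {
    "social_followers": 6413,
    "social_tags": 53,
    "check_ins": 807,
    "avg_income": 62403.511561,
    "urban_density": 0.710952,
}

# Same weighting as Salon.total_social_score: followers + 5 * tags + 2 * check-ins.
social_score = row["social_followers"] + 5 * row["social_tags"] + 2 * row["check_ins"]

# Same product as Salon.location_score: area income times urban density.
location_score = row["avg_income"] * row["urban_density"]

print(social_score)
print(round(location_score, 2))
```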

In [3]:
class SalonSimilarityEngine:
    """Ranks salons by cosine similarity to a query over min-max scaled features."""

    def __init__(self, salons_df: pd.DataFrame):
        self.salons_df = salons_df.copy()
        self.feature_cols = [
            "monthly_sales", "social_followers", "social_tags", "check_ins",
            "avg_income", "urban_density", "avg_review"
        ]
        # Fit the scaler once on the full dataset so every query uses the same ranges
        self.scaler = MinMaxScaler()
        self.scaled_features = self.scaler.fit_transform(self.salons_df[self.feature_cols])

    def find_similar_salons(self, query: dict, top_k: int = 5) -> pd.DataFrame:
        output_cols = ["salon_id", "salon_type", "avg_review", "monthly_sales", "similarity"]
        df = self.salons_df.copy()

        # Filter: by salon type
        if "salon_type" in query:
            df = df[df["salon_type"] == query["salon_type"]]

        # Filter: exclude visited salons by keeping only rows with future_sales_90p == 0
        if query.get("exclude_visited", False):
            df = df[df["future_sales_90p"] == 0]

        # Filter: minimum average review
        if "min_avg_review" in query:
            df = df[df["avg_review"] >= query["min_avg_review"]]

        # Nothing left after filtering: return an empty frame with the output columns
        if df.empty:
            return pd.DataFrame(columns=output_cols)

        # Scale the filtered dataset
        scaled_features = self.scaler.transform(df[self.feature_cols])

        # Scale the query; features missing from the query default to 0.
        # A DataFrame keeps the column names the scaler was fitted with.
        query_df = pd.DataFrame(
            [[query.get(col, 0) for col in self.feature_cols]],
            columns=self.feature_cols
        )
        query_vector = self.scaler.transform(query_df)

        # Compute cosine similarity between the query and every remaining salon
        similarities = cosine_similarity(query_vector, scaled_features)[0]
        # assign() returns a new frame instead of writing into a filtered slice
        df = df.assign(similarity=similarities)

        return df.sort_values(by="similarity", ascending=False).head(top_k)[output_cols]

In [4]:
# Put salons in objects
salons = [Salon.from_row(row) for _, row in df_salons.iterrows()]

# Instantiate engine
similarity_engine = SalonSimilarityEngine(df_salons)

In [5]:
# Query: feature values only, no filters
query = {
    "monthly_sales": 18000,
    "social_followers": 5000,
    "social_tags": 100,
    "check_ins": 100,
    "avg_income": 65000,
    "urban_density": 0.9,
    "avg_review": 4.0
}

# Get similar salons
top_matches = similarity_engine.find_similar_salons(query, top_k=5)
print(top_matches)

     salon_id   salon_type  avg_review  monthly_sales  similarity
5   salon_005   hair_salon         3.9   16882.613025    0.935789
64  salon_064  barber_shop         2.4   22388.770497    0.920718
52  salon_052  barber_shop         3.1   21074.694150    0.887133
87  salon_087   hair_salon         4.5   13885.632317    0.887107
71  salon_071  barber_shop         3.9   14251.037113    0.884766




In [6]:
query = {
    "monthly_sales": 18000,
    "social_followers": 5000,
    "social_tags": 100,
    "check_ins": 250,
    "avg_income": 65000,
    "urban_density": 0.9,
    "avg_review": 3,
    "min_avg_review": 3,
    "exclude_visited": True
}

# Get similar salons
top_matches = similarity_engine.find_similar_salons(query, top_k=10)
print(top_matches)

     salon_id  salon_type  avg_review  monthly_sales  similarity
44  salon_044  hair_salon         3.3            0.0    0.763471
53  salon_053  hair_salon         4.9            0.0    0.747780
83  salon_083  hair_salon         3.1            0.0    0.690108
80  salon_080       Other         4.0            0.0    0.629149
39  salon_039       Other         4.7            0.0    0.629038
0   salon_000       Other         3.5            0.0    0.622676


