# Store Recommendation System: Content-Based & Collaborative Filtering

In this notebook, we build a thrift store recommendation system using **Content-Based Filtering** and **Collaborative Filtering**. We start by encoding store features such as category and geolocation, then build models that recommend thrift stores to users.

## Table of Contents
1. [Content-Based Filtering](#content-based-filtering)
    - Convert store categories and geolocation into text representation
    - Use TF-IDF to encode store features
    - Compute cosine similarity to find the most similar stores
2. [Collaborative Filtering](#collaborative-filtering)
    - Create a user-store interaction matrix
    - Apply Singular Value Decomposition (SVD) for dimensionality reduction
    - Predict missing values in the user-store matrix
    - Recommend top unvisited stores based on predicted scores


In [1]:
import pandas as pd
import numpy as np
import hashlib

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse.linalg import svds

import warnings
warnings.filterwarnings("ignore")

In [2]:
# Load data
data = pd.read_csv('/Users/vanessafarias/RecommendME/data/processed/thrift_stores_mtl_preprocessed.csv')
print(data.shape)

reviews = pd.read_csv('/Users/vanessafarias/RecommendME/data/external/store_reviews.csv')
print(reviews.shape)

(40, 9)
(38, 4)


In [3]:
# Function to generate a unique Store ID
def generate_store_id(store, neighborhood):
    unique_string = f"{store}_{neighborhood}".lower().replace(" ", "_")  # Normalize
    store_id = hashlib.md5(unique_string.encode()).hexdigest()[:8]  # Short hash
    return store_id
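
Because the ID keeps only the first 8 hex characters (32 bits) of the MD5 digest, two different stores could in principle receive the same ID. The following is a minimal sketch of a uniqueness check. It repeats `generate_store_id` so that it runs on its own; the helper `find_id_collisions` and the sample pairs are hypothetical and not part of the original notebook.

```python
import hashlib
from collections import defaultdict


def generate_store_id(store, neighborhood):
    # Same logic as the notebook cell, repeated so this block runs standalone
    unique_string = f"{store}_{neighborhood}".lower().replace(" ", "_")
    return hashlib.md5(unique_string.encode()).hexdigest()[:8]


def find_id_collisions(pairs):
    """Hypothetical helper: return IDs shared by more than one distinct normalized key."""
    seen = defaultdict(set)
    for store, neighborhood in pairs:
        key = f"{store}_{neighborhood}".lower().replace(" ", "_")
        seen[generate_store_id(store, neighborhood)].add(key)
    return {sid: keys for sid, keys in seen.items() if len(keys) > 1}


sample_pairs = [
    ("Annex x LOCAL", "Mile-End"),
    ("annex x local", "mile-end"),  # same store after normalization, so not a collision
    ("Ex-Voto", "La Petite-Patrie"),
]
collisions = find_id_collisions(sample_pairs)
print(collisions)  # an empty dict means every distinct store got a distinct ID
```

With only 40 stores a collision is very unlikely, but the check is cheap and guards against silent ID clashes if the dataset grows.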

In [4]:
# Pre-processing: rename columns and keep only the relevant ones
data.rename(columns = {'section': 'category', 'name':'store_name'}, inplace = True)
data.drop(['url', 'address', 'corrected_address', 'coordinates'], axis = 1, inplace = True)

# Apply function to create Store ID
data["store_id"] = data.apply(lambda row: generate_store_id(row["store_name"], row["neighborhood"]), axis=1)
data.head(3)

Unnamed: 0,category,store_name,neighborhood,latitude,longitude,store_id
0,Hand-Picked Cool,Annex x LOCAL,Mile-End,45.524801,-73.597082,be11d139
1,Hand-Picked Cool,Ex-Voto,La Petite-Patrie,45.530786,-73.610191,c2c4bb8b
2,Hand-Picked Cool,LNF,Villeray–Saint-Michel–Parc-Extension,45.530093,-73.622396,1525264d


In [5]:
# Align review store names with the names used in `data`.
# 'Annex x LOCAL' already matches, so it needs no mapping.
reviews['store_name'] = reviews.store_name.replace({
    'LNF (Les Nouvelles Friperies)': 'LNF',
    'Ex-Voto Vintage': 'Ex-Voto',
})

reviews.drop('address', axis = 1, inplace = True)

In [6]:
reviewed_data = data.merge(reviews, on = 'store_name', how = 'left')
reviewed_data.head(3)

Unnamed: 0,category,store_name,neighborhood,latitude,longitude,store_id,star_rating,n_reviews
0,Hand-Picked Cool,Annex x LOCAL,Mile-End,45.524801,-73.597082,be11d139,4.5,50
1,Hand-Picked Cool,Ex-Voto,La Petite-Patrie,45.530786,-73.610191,c2c4bb8b,4.8,25
2,Hand-Picked Cool,LNF,Villeray–Saint-Michel–Parc-Extension,45.530093,-73.622396,1525264d,4.6,40
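
Since the merge is a left join on `store_name`, any store whose name is spelled differently in the reviews file silently ends up with missing `star_rating` and `n_reviews`. The sketch below shows one way to list such stores using the `indicator` parameter of `DataFrame.merge`. The toy frames and their values are illustrative stand-ins, not the notebook's data.

```python
import pandas as pd

# Toy stand-ins for `data` and `reviews` (illustrative values only)
stores = pd.DataFrame({"store_name": ["LNF", "Ex-Voto", "Eva B"]})
toy_reviews = pd.DataFrame({
    "store_name": ["LNF", "Ex-Voto Vintage"],  # second name does not match `stores`
    "star_rating": [4.6, 4.8],
})

# indicator=True adds a `_merge` column; for a left join its values are
# "both" (matched) or "left_only" (no matching review row)
merged = stores.merge(toy_reviews, on="store_name", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "store_name"].tolist()
print(unmatched)
```

Any name that shows up in `unmatched` is a candidate for the name-normalization step before the merge.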


### 1. Content-Based Filtering (Similarity by Store Features)

In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

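# Create a text feature per store (category + neighborhood + rating + review count).
# .astype(str) converts each row's value; str(series) would stringify the whole column at once.
reviewed_data["features"] = (
    reviewed_data["category"] + " " + reviewed_data["neighborhood"] + " "
    + reviewed_data["star_rating"].astype(str) + " "
    + reviewed_data["n_reviews"].astype(str)
)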

# Convert text into numerical features using TF-IDF
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(reviewed_data["features"])

# Compute cosine similarity
similarity_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Function to recommend similar stores
def recommend_similar_stores(df, store_name, top_n=3):
    if store_name not in df["store_name"].values:
        return "Store not found."

    # Positional index of the store (rows of similarity_matrix follow df's row order)
    store_pos = np.flatnonzero(df["store_name"].values == store_name)[0]

    # Pair each store position with its similarity to the query store
    similarity_scores = list(enumerate(similarity_matrix[store_pos]))

    # Drop the query store itself, then sort by similarity (highest first)
    similar_stores = sorted(
        (s for s in similarity_scores if s[0] != store_pos),
        key=lambda x: x[1], reverse=True,
    )[:top_n]

    # Get store names
    recommended_stores = [df.iloc[i]["store_name"] for i, _ in similar_stores]

    return recommended_stores
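
When numeric columns are concatenated into a per-row text feature, it matters whether the conversion happens per element or on the column as a whole. The following small sketch, on a made-up two-row frame, contrasts `str(series)` with `series.astype(str)`.

```python
import pandas as pd

# Made-up frame for illustration only
toy = pd.DataFrame({"category": ["Vintage", "Upcycled"], "star_rating": [4.5, 3.9]})

# str() on a Series returns ONE string: the printed representation of the whole column
whole_column = str(toy["star_rating"])

# astype(str) converts element by element, giving one string per row
per_row = toy["category"] + " " + toy["star_rating"].astype(str)

print(repr(whole_column))
print(per_row.tolist())
```

Concatenating `whole_column` to every row would append the same text (index labels and `dtype` line included) to each store, so it would not help distinguish one store from another.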


In [8]:
print("Similar stores to Eva B:")
print(recommend_similar_stores(reviewed_data, "Eva B"))

Similar stores to Eva B:
['Hadio', 'KILOFripe', 'Marché Floh']
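
To make the TF-IDF and cosine-similarity step more concrete, here is a minimal sketch on three made-up feature strings. It also inspects how the default `TfidfVectorizer` analyzer tokenizes a decimal rating. The strings are assumptions for illustration, not rows from the dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up feature strings: the first two are identical, the third shares no words
toy_features = [
    "vintage mile-end",
    "vintage mile-end",
    "upcycled plateau",
]

toy_vectorizer = TfidfVectorizer()
toy_tfidf = toy_vectorizer.fit_transform(toy_features)

# Pairwise cosine similarity between all rows (3 x 3 matrix)
toy_sim = cosine_similarity(toy_tfidf)
print(toy_sim.round(2))

# The default analyzer lowercases text and keeps only tokens of two or more
# word characters, so a rating such as "4.5" contributes no token at all
tokens = toy_vectorizer.build_analyzer()("Vintage 4.5 50")
print(tokens)
```

Identical feature strings get a similarity of 1 and strings with no shared tokens get 0. If ratings should influence similarity, one option is to bucket them into words (for example a hypothetical `rating_high` token) before vectorizing.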


### 2. Location-Based Recommendation (Find Nearby Stores)

In [9]:
from geopy.distance import geodesic

# Function to recommend stores near a given location
def recommend_nearby_stores(df, lat, lon, top_n=3):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df["distance_km"] = df.apply(lambda row: geodesic((lat, lon), (row["latitude"], row["longitude"])).km, axis=1)
    df_sorted = df.sort_values(by="distance_km")

    # Exclude any store at distance 0 (i.e. located exactly at the query point)
    df_filtered = df_sorted[df_sorted["distance_km"] > 0]

    return df_filtered[["store_name", "category", "neighborhood", "distance_km"]].head(top_n)
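
The function above computes one geodesic distance per row through `DataFrame.apply`. As a possible alternative, the sketch below computes great-circle distances for all stores at once with NumPy, using the haversine formula. This is a spherical approximation, not the ellipsoidal model that `geopy.distance.geodesic` uses, so results can differ slightly. The function name `haversine_km` and the radius constant are assumptions of this sketch.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays (degrees)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Coordinates of the first three stores shown earlier in the notebook
lats = np.array([45.524801, 45.530786, 45.530093])
lons = np.array([-73.597082, -73.610191, -73.622396])

# Distance from the first store to all three stores, in one vectorized call
distances = haversine_km(45.524801, -73.597082, lats, lons)
print(distances.round(3))
```

For a few dozen stores the row-wise `geodesic` call is fast enough; the vectorized form mainly pays off on larger datasets.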


In [11]:
# Example: recommend stores near the first store in the dataset (Annex x LOCAL, Mile-End)
print("Nearby Stores:")
recommend_nearby_stores(reviewed_data, reviewed_data.latitude[0], reviewed_data.longitude[0])

Nearby Stores:


Unnamed: 0,store_name,category,neighborhood,distance_km
10,La Pompadour Boutique,Restored & Upcycled,Mile-End,0.012075
9,Citizen Vintage,Restored & Upcycled,Mile-End,0.039384
11,Covet Vintage,Restored & Upcycled,Mile-End,0.072927
