# Abstract

This project implements a recommendation system that combines collaborative and content-based filtering to suggest tours and adventures. It predicts a preference score for each user and experience from historical interactions, clusters users with similar predicted scores, recommends experiences that other users in the same cluster engaged with, and then ranks tours by how closely their descriptions match those recommended experiences.

# 1: Imports

Import necessary libraries for data manipulation, machine learning, and natural language processing.

In [1]:
import pandas as pd
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import spacy
from sklearn.metrics.pairwise import cosine_similarity

# 2: Load Data

Load user interaction data (FINAL_DATA.csv) and tour information data (final_tours_and_adventures.csv).

In [2]:
data = pd.read_csv('FINAL_DATA.csv')
tours_data = pd.read_csv('final_tours_and_adventures.csv')

In [43]:
data.sample(3)

Unnamed: 0,experience_id,user,liked,shared,bucketlist,purchased,attended,score,age,avg_accomodation_cost,avg_transport_cost,name,description,adventureLevel,price,gender_Male,featured,rating,predicted_score
1234,64fc3d483d690a3e195ee6a4,81.0,1.0,1.0,1.0,0.0,0.0,3.0,60.0,0.0,0.0,Out of Africa Experience : Karen Blixen Museum,Step into the captivating world of Karen Blixe...,4.0,5555.0,1.0,1.0,0.88151,3.0
600,64fc46763d690a3e195ee6c6,38.0,1.0,0.0,0.0,0.0,0.0,1.0,38.0,0.0,0.0,Lunch with Elephants : Sheldrick Wildlife Trust,Embark on a transformative journey at the Shel...,6.5,5555.0,0.0,0.0,-1.015708,1.0
63,64fca0063d690a3e195ee937,1.0,1.0,1.0,0.0,0.0,0.0,2.0,35.0,1200.0,600.0,Art Viewing at Nairobi Gallery,Journey into the heart of Kenyan art and cultu...,4.0,5555.0,1.0,1.0,1.173121,2.0


In [44]:
tours_data.sample(3)

Unnamed: 0,name,imageCover,price,description,similarity_score
77,Mombasa Coastal Explorer,https://i.imgur.com/E4Xiiys.png,220.0,Explore the beautiful coastal treasures of Mom...,636.949768
58,Nairobi Full City Tour,https://cloudfront.safaribookings.com/lib/keny...,220.0,"During this amazing tour, you get a chance to ...",614.388489
82,Nairobi Safari Expedition,https://independenttravelcats.com/wp-content/u...,210.0,Embark on a thrilling safari adventure in Nair...,578.684082


# 3: Feature Selection

Select relevant features for analysis, focusing on user interactions, tour attributes, and ratings.

In [3]:
selected_features = ['user', 'liked', 'shared', 'bucketlist', 'purchased', 'attended', 'score', 'age', 'avg_accomodation_cost',
                     'avg_transport_cost', 'adventureLevel', 'price', 'gender_Male', 'featured', 'rating']
numerical_data = data[selected_features]

# 4: Correlation Analysis

Explore correlations between selected features to understand relationships within the data.

In [32]:
correlation_matrix = numerical_data.corr()
correlation_matrix

Unnamed: 0,user,liked,shared,bucketlist,purchased,attended,score,age,avg_accomodation_cost,avg_transport_cost,adventureLevel,price,gender_Male,featured,rating
user,1.0,-0.019419,0.03169,0.04555,0.015096,0.002711,0.026521,0.04548,0.348113,0.376507,-0.009719,0.017873,-0.035855,0.00663,-0.141861
liked,-0.019419,1.0,0.0487,0.099637,0.113508,0.049935,0.503237,-0.028578,0.013683,0.004538,-0.015209,-0.010434,-0.051211,0.016357,-0.009264
shared,0.03169,0.0487,1.0,0.277283,0.12503,0.054721,0.567466,-0.008321,0.030547,0.017869,-0.013709,-0.005771,0.02138,0.008014,-0.026087
bucketlist,0.04555,0.099637,0.277283,1.0,0.301398,0.191238,0.656101,-0.016731,0.03179,0.02315,0.012155,0.007894,0.007539,0.006886,-0.026488
purchased,0.015096,0.113508,0.12503,0.301398,1.0,0.571011,0.654072,-0.009563,0.020597,0.023278,0.000217,0.029905,-0.016376,-0.017843,0.000112
attended,0.002711,0.049935,0.054721,0.191238,0.571011,1.0,0.516113,0.005371,0.005573,0.005672,-0.014783,0.016404,-0.031254,-0.015853,0.009578
score,0.026521,0.503237,0.567466,0.656101,0.654072,0.516113,1.0,-0.020701,0.038683,0.026357,-0.006858,0.009854,-0.021826,0.000522,-0.016666
age,0.04548,-0.028578,-0.008321,-0.016731,-0.009563,0.005371,-0.020701,1.0,-0.011492,-0.030067,0.036702,-0.007735,0.411349,-0.01271,0.088797
avg_accomodation_cost,0.348113,0.013683,0.030547,0.03179,0.020597,0.005573,0.038683,-0.011492,1.0,0.884887,-0.012675,0.021101,-0.054355,-0.005922,-0.205835
avg_transport_cost,0.376507,0.004538,0.017869,0.02315,0.023278,0.005672,0.026357,-0.030067,0.884887,1.0,-0.004661,0.021708,-0.069883,-0.011281,-0.227383
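
A full correlation matrix is hard to scan by eye. The sketch below lists the strongest pairs directly. The helper name top_correlated_pairs and the toy frame are assumptions for illustration, not part of the original notebook. Applied to correlation_matrix, it would be expected to surface pairs such as avg_accomodation_cost and avg_transport_cost (0.885 in the output above).

```python
import numpy as np
import pandas as pd

def top_correlated_pairs(corr: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n feature pairs with the largest absolute correlation."""
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (self-correlation of 1.0) is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    # Sort by absolute value so strong negative correlations also rank high.
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Toy frame standing in for numerical_data (values are made up).
rng = np.random.default_rng(0)
toy = pd.DataFrame({"a": rng.normal(size=200)})
toy["b"] = toy["a"] * 2 + rng.normal(scale=0.1, size=200)  # strongly tied to a
toy["c"] = rng.normal(size=200)                            # independent noise

top = top_correlated_pairs(toy.corr(), n=2)
print(top)
```

In the notebook, the equivalent call would be top_correlated_pairs(correlation_matrix).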


# 5: Random Forest Regressor

Use a Random Forest Regressor to predict each interaction's score from the selected features. Impute missing values with the column mean (the SimpleImputer default), standardize the features, and train the model on an 80/20 train/test split.

In [5]:
imputer = SimpleImputer()
data_imputed = pd.DataFrame(imputer.fit_transform(numerical_data), columns=numerical_data.columns)

features = ['liked', 'shared', 'bucketlist', 'purchased', 'attended', 'avg_accomodation_cost', 'avg_transport_cost',
            'price', 'featured', 'rating', 'gender_Male']
target = 'score'

X_train, X_test, y_train, y_test = train_test_split(data_imputed[features], data_imputed[target],
                                                    test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = RandomForestRegressor(random_state=42)
model.fit(X_train_scaled, y_train)

y_pred = model.predict(X_test_scaled)

mse = mean_squared_error(y_test, y_pred)

data['predicted_score'] = model.predict(scaler.transform(data_imputed[features]))

user_scored_experiences = data.pivot_table(index='user', columns='experience_id', values='predicted_score', fill_value=0)
user_scored_experiences_matrix = user_scored_experiences.values

In [28]:
print(f'MSE: {mse}')

MSE: 0.020013121717830755
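
An MSE on its own is hard to judge without a reference point. In the three sampled rows shown earlier, score equals the number of interaction flags that are set, so a low error may largely reflect that the features already contain the target. The sketch below compares a forest against a mean-only baseline on synthetic data. The data-generating rule is an assumption made for the demonstration and is not taken from FINAL_DATA.csv.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the interaction data: five binary flags and a
# target equal to their sum plus a little noise (an assumption for the demo).
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(500, 5)).astype(float)
y = X.sum(axis=1) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Baseline that always predicts the training mean of the target.
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
forest = RandomForestRegressor(random_state=42).fit(X_train, y_train)

baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
forest_pred = forest.predict(X_test)
forest_mse = mean_squared_error(y_test, forest_pred)
forest_rmse = float(np.sqrt(forest_mse))  # same units as the target
forest_r2 = r2_score(y_test, forest_pred)

print(f"baseline MSE: {baseline_mse:.4f}")
print(f"forest MSE:   {forest_mse:.4f} (RMSE {forest_rmse:.4f}, R^2 {forest_r2:.4f})")
```

Reporting the baseline alongside the model error shows how much of the variance the features actually explain.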


# 6: KMeans Clustering

Apply KMeans clustering to group users based on predicted scores. Determine the optimal number of clusters using silhouette score.

In [6]:
max_clusters = 10
best_score = -1
best_cluster = 0
for n_clusters in range(2, max_clusters + 1):
    kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
    cluster_labels = kmeans.fit_predict(user_scored_experiences_matrix)
    score = silhouette_score(user_scored_experiences_matrix, cluster_labels)
    if score > best_score:
        best_score = score
        best_cluster = n_clusters

kmeans = KMeans(n_clusters=best_cluster, random_state=42, n_init=10)
user_scored_experiences['cluster'] = kmeans.fit_predict(user_scored_experiences_matrix)
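
The loop above keeps only the best silhouette score. Keeping the score for every candidate k makes it easier to see whether the chosen value is a clear winner or a near tie. The following is a minimal sketch on synthetic blobs. The function name silhouette_by_k and the blob centers are assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, k_values, random_state=42):
    """Fit KMeans for each k and return {k: silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Three well-separated synthetic clusters (made-up centers).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 0], [0, 10]],
    cluster_std=0.5,
    random_state=42,
)

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print(f"best k: {best_k}")
```

In the notebook, X would be user_scored_experiences_matrix and the range would run from 2 to max_clusters.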

# 7: Experience Cross Recommendations Function

Function to provide cross-recommendations for users within the same cluster.

In [7]:
def cross_recommendations(user_id, num_recommendations=5):
    # Restrict to users in the same cluster and drop the 'cluster' label column,
    # so that only experience columns are compared
    user_cluster = user_scored_experiences.loc[user_id, 'cluster']
    cluster_users = user_scored_experiences[user_scored_experiences['cluster'] == user_cluster].drop(columns='cluster')
    user_row = cluster_users.loc[user_id]
    user_liked_experiences = user_row[user_row > 0].index.tolist()

    recommendations = []

    for idx, row in cluster_users.iterrows():
        if idx != user_id:
            liked_experiences = row[row > 0].index.tolist()
            for exp_id in liked_experiences:
                if exp_id not in user_liked_experiences and exp_id not in recommendations:
                    recommendations.append(exp_id)
                    if len(recommendations) == num_recommendations:
                        return recommendations

    # Fewer than num_recommendations candidates were found: return what was collected
    return recommendations
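
The function above collects candidates in row order, so the result depends on how users happen to be ordered in the cluster. One possible alternative, sketched here with plain dictionaries, ranks unseen experiences by how many cluster peers have a positive score for them. The function rank_cluster_candidates and the toy IDs are hypothetical and not part of the original notebook.

```python
from collections import Counter

def rank_cluster_candidates(user_items, peer_items, num_recommendations=5):
    """Rank items the user has not engaged with by how many peers scored them."""
    seen = set(user_items)
    counts = Counter()
    for items in peer_items.values():
        # Count each unseen item once per peer.
        counts.update(set(items) - seen)
    # Sort by descending count, then by item ID so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:num_recommendations]]

# Toy cluster: peer user ID -> experiences with a positive predicted score.
peers = {
    2: ["exp_a", "exp_b", "exp_c"],
    3: ["exp_b", "exp_c"],
    4: ["exp_c", "exp_d"],
}

recs = rank_cluster_candidates(user_items=["exp_a"], peer_items=peers, num_recommendations=2)
print(recs)
```

Ranking by peer count favors experiences that are broadly popular within the cluster, which may or may not be the desired behavior.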

# 8: Get User Recommendations Function

Implement a function to recommend experiences and tours based on user preferences.
Utilize content-based filtering for tour recommendations.

In [38]:
def get_user_recommendations(user_id):
    # Cross-Recommendations for Experiences
    recommended_experiences_ids = cross_recommendations(user_id)
    recommended_experiences_df = data[data['experience_id'].isin(recommended_experiences_ids)][['experience_id', 'name', 'description']]

    # Content-Based Filtering for Tours
    nlp = spacy.load("en_core_web_md")
    recommended_descriptions = recommended_experiences_df['description'].tolist()
    new_tours_descriptions = tours_data['description'].tolist()

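    # Replace missing descriptions with an empty string so spaCy always receives text
    recommended_descriptions = [str(desc) if pd.notna(desc) else '' for desc in recommended_descriptions]
    new_tours_descriptions = [str(desc) if pd.notna(desc) else '' for desc in new_tours_descriptions]

    # Represent each description by its spaCy document vector
    recommended_vectors = np.array([nlp(desc).vector for desc in recommended_descriptions])
    new_tours_vectors = np.array([nlp(desc).vector for desc in new_tours_descriptions])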

    similarity_matrix = cosine_similarity(recommended_vectors, new_tours_vectors)
    total_similarity_scores = similarity_matrix.sum(axis=0)

    tours_data['similarity_score'] = total_similarity_scores
    sorted_tours_data = tours_data.sort_values(by='similarity_score', ascending=False)
    top_12_recommendations = sorted_tours_data.head(12)

    return recommended_experiences_df, top_12_recommendations
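
Because data holds one row per interaction, the same experience can appear several times in recommended_experiences_df, and summing similarities over rows then gives that experience extra weight. The toy example below shows the effect and one possible alternative: averaging over unique vectors. The vectors are made-up numbers rather than spaCy output, and the alternative is a suggestion, not the method used above.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy 3-dimensional "description vectors" (made-up numbers, not spaCy output).
experience_vectors = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],   # duplicate row of the first experience
    [0.0, 1.0, 0.0],
])
tour_vectors = np.array([
    [1.0, 0.0, 0.0],   # tour 0: matches the duplicated experience
    [0.0, 1.0, 0.0],   # tour 1: matches the third experience
    [0.0, 0.0, 1.0],   # tour 2: matches nothing
])

# Summing over all rows, as in the function above: duplicates count twice.
summed = cosine_similarity(experience_vectors, tour_vectors).sum(axis=0)

# Averaging over unique experience vectors: each experience counts once.
unique_vectors = np.unique(experience_vectors, axis=0)
mean_unique = cosine_similarity(unique_vectors, tour_vectors).mean(axis=0)

print("summed over rows:  ", summed)
print("mean over uniques: ", mean_unique)
```

In the notebook, a comparable effect could be obtained by calling drop_duplicates(subset='experience_id') on recommended_experiences_df before building the vectors. This would change the similarity scores shown in the example outputs.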

# Example Usage

In [45]:
# Replace with the desired user ID
user_id = 99 
experiences, tours = get_user_recommendations(user_id)

In [46]:
print(f"Recommended Experiences for User {user_id} \n ")
experiences.head()

Recommended Experiences for User 99 
 


Unnamed: 0,experience_id,name,description
3,64fc2511148fd2e0b23d5031,Game Drive at Nairobi National Park,Experience the best of both worlds at Nairobi ...
4,64fc2511148fd2e0b23d5031,Game Drive at Nairobi National Park,Experience the best of both worlds at Nairobi ...
5,64fc2511148fd2e0b23d5031,Game Drive at Nairobi National Park,Experience the best of both worlds at Nairobi ...
7,64fc2511148fd2e0b23d5031,Game Drive at Nairobi National Park,Experience the best of both worlds at Nairobi ...
10,64fc8bc73d690a3e195ee898,Souvenir Shopping at Maasai Market,Dive into the vibrant world of Kenyan artistry...


In [47]:
print(f"Recommended Tours for User {user_id} \n ")
tours

Recommended Tours for User 99 
 


Unnamed: 0,name,imageCover,price,description,similarity_score
49,Lake Naivasha and Masai Mara Safari (Mid-Range),https://cloudfront.safaribookings.com/lib/keny...,784.0,If you are looking for the perfect retreat wit...,636.920837
67,"Nairobi Park, Shedrick's Centre and Carnivore",https://cloudfront.safaribookings.com/lib/keny...,252.0,This is a short safari tour of the only park w...,636.747559
40,Amboseli National Park Mid Range Safari Tour,https://cloudfront.safaribookings.com/lib/keny...,565.0,Amboseli National Park is one of the most spec...,635.376099
22,"Amboseli NP, Lake Naivasha & Maasai Mara Mid-R...",https://cloudfront.safaribookings.com/lib/keny...,1518.0,"This Safari will take you to the ""Land of Gian...",633.438782
6,Great Migration in Masai Mara & Lake Nakuru Sa...,https://cloudfront.safaribookings.com/lib/keny...,550.0,This safari tour is everything that you have w...,633.053223
62,Maasai Mara and Diani Beach Luxury Safari,https://cloudfront.safaribookings.com/lib/keny...,2185.0,This safari gives you the lifetime opportunity...,632.944092
36,Safari (Including Masai Mara) & Zanzibar Exten...,https://cloudfront.safaribookings.com/lib/keny...,2250.0,This is a 6-day amazing safari with the best o...,631.142517
66,Gorillas & the Big Five,https://cloudfront.safaribookings.com/library/...,15160.0,This truly memorable holiday is for those want...,630.830566
77,Mombasa Coastal Explorer,https://i.imgur.com/E4Xiiys.png,220.0,Explore the beautiful coastal treasures of Mom...,630.69635
0,Great Migration at Masai Mara Budget Safari,https://cloudfront.safaribookings.com/lib/keny...,550.0,Witness an amazing annual event of the great m...,630.421875


# 9: Combined Recommendations Function

In [27]:
%%time

def run_user_recommendation_system():
    # Get user input for user ID
    while True:
        try:
            user_id = int(input("Enter a user ID to generate recommendations for: "))
            if 1 <= user_id <= 139:
                break
            else:
                print("Invalid user ID. Please enter a number between 1 and 139.")
        except ValueError:
            print("Invalid input. Please enter a valid number.")

    # Get and display recommendations
    experiences, tours = get_user_recommendations(user_id)

    print(f"\nRecommended Experiences for User {user_id}:\n")
    print(experiences.head())

    print(f"\nRecommended Tours for User {user_id}:\n")
    print(tours.head(12))

# Run the recommendation system
run_user_recommendation_system()

Enter a user ID to generate recommendations for: 111

Recommended Experiences for User 111:

              experience_id                                 name  \
3  64fc2511148fd2e0b23d5031  Game Drive at Nairobi National Park   
4  64fc2511148fd2e0b23d5031  Game Drive at Nairobi National Park   
5  64fc2511148fd2e0b23d5031  Game Drive at Nairobi National Park   
7  64fc2511148fd2e0b23d5031  Game Drive at Nairobi National Park   
8  64fca2693d690a3e195ee94d         Dive into the Fourteen Falls   

                                         description  
3  Experience the best of both worlds at Nairobi ...  
4  Experience the best of both worlds at Nairobi ...  
5  Experience the best of both worlds at Nairobi ...  
7  Experience the best of both worlds at Nairobi ...  
8  Fourteen Falls is nature's masterpiece. Witnes...  

Recommended Tours for User 111:

                                                 name  \
49    Lake Naivasha and Masai Mara Safari (Mid-Range)   
67      Nairobi Park
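
The prompt above hard-codes the valid range as 1 to 139. A more defensive variant checks the input against the user IDs that are actually present. The following is a minimal sketch. The function parse_user_id and the toy ID set are assumptions for illustration.

```python
def parse_user_id(raw, known_users):
    """Return raw as an int if it is a known user ID, otherwise None."""
    try:
        user_id = int(raw)
    except (TypeError, ValueError):
        return None  # not a whole number
    return user_id if user_id in known_users else None

# Toy set of known users; in the notebook this could be built from
# set(user_scored_experiences.index) (an assumption about how IDs are stored).
known = {1, 2, 99, 111}

print(parse_user_id("99", known))
print(parse_user_id("abc", known))
print(parse_user_id("140", known))
```

Checking membership rather than a numeric range also covers gaps in the ID sequence, which a lookup with .loc would otherwise report as a KeyError.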

# Conclusion

This implementation combines collaborative and content-based filtering. Users are clustered by their predicted scores, and each user receives recommendations for both experiences and tours. The system uses a Random Forest to predict user preferences, KMeans clustering to group users, and natural language processing (NLP) for content-based filtering.

The Random Forest reached a test MSE of about 0.02 on the held-out 20% of interactions. The notebook does not separately measure the quality of the resulting recommendations. It shows them only through the example outputs for individual users. Within the pipeline, the Random Forest supplies the predicted scores, KMeans identifies groups of users with similar predicted scores, and spaCy document vectors match tour descriptions to the recommended experiences.

To further enhance the model, future efforts could focus on optimizing its speed and accuracy. Incorporating more extensive and diverse datasets could contribute to a richer understanding of user preferences. Additionally, exploring advanced NLP techniques and models may further refine content-based recommendations.

Looking forward, this recommendation system presents a valuable tool for platforms like Tajriba to recommend five experiences per user for each week of the month and twelve tours for each month of the year. This approach not only facilitates a personalized user experience but also allows for systematic monitoring and evaluation based on user engagement and feedback. The continuous feedback loop, driven by newer browsing and booking data, provides a mechanism for iterative improvement and adaptation to evolving user preferences.

Addressing potential challenges, such as optimizing recommendation speed to minimize user wait times, is crucial to prevent user churn. Striking a balance between accuracy and efficiency will be pivotal in maintaining user satisfaction and engagement.
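
One way to reduce wait times, sketched below, is to embed the tour descriptions once and reuse the vectors for every request instead of recomputing them per call. The class TourVectorCache, the helper _unit_rows, and the word-count toy_embed function are hypothetical. In the notebook, embed could be a function that returns nlp(text).vector from a spaCy model loaded once at startup.

```python
import numpy as np

def _unit_rows(matrix):
    """Scale each row to unit length, leaving all-zero rows unchanged."""
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.where(norms == 0, 1.0, norms)

class TourVectorCache:
    """Embed tour descriptions once and reuse the vectors for every request."""

    def __init__(self, descriptions, embed):
        # embed is any callable mapping a string to a 1-D numeric vector.
        vectors = np.array([embed(text) for text in descriptions], dtype=float)
        self.vectors = _unit_rows(vectors)

    def top_indices(self, query_vectors, k=12):
        """Return positions of the k tours with the highest summed cosine similarity."""
        queries = _unit_rows(np.asarray(query_vectors, dtype=float))
        scores = (queries @ self.vectors.T).sum(axis=0)
        # Stable sort keeps the original order among tied scores.
        return np.argsort(-scores, kind="stable")[:k].tolist()

def toy_embed(text):
    """Toy embedding: counts of three keywords (stand-in for a real model)."""
    words = text.lower().split()
    return np.array([words.count(w) for w in ("safari", "beach", "city")], dtype=float)

tour_texts = ["safari safari adventure", "beach holiday", "city walking tour"]
cache = TourVectorCache(tour_texts, toy_embed)  # tours are embedded once here

query = np.array([toy_embed("a relaxing beach day")])
top = cache.top_indices(query, k=2)
print(top)
```

With this layout, only the handful of recommended experience descriptions need to be embedded per request.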