# VoyageSense: AI-Powered Travel Recommendation System

**Module E: AI Applications – Individual Open Project**

**Student Name:** Zaki Nafees, 
**Student Code:** iitrpr_ai_25010088

---

## 1. Problem Definition & Objective

### Selected Project Track
**Content Recommendation System**

### Problem Statement
With so many travel destinations in India, travelers often struggle to find locations that match their interests (e.g., Nature, Heritage), their time constraints (e.g., trip duration and job type, Remote or Onsite), and their budget. Generic travel sites provide lists but lack personalized, context-aware suggestions.

### Objectives
**VoyageSense** aims to build an intelligent recommendation engine that:
1.  Understands user preferences (Interest, Budget, Duration, Job Type).
2.  Matches them with Indian destinations using **Content-Based Filtering (Cosine Similarity)**.
3.  Provides **AI-generated explanations** for *why* a place was recommended using LLMs (Google Gemini). 
4.  Integrates **Sentiment Analysis** to boost high-quality destinations.
5.  Provides an **Immersive Experience** by embedding relevant YouTube travel vlogs directly in the UI.

### Real-world Relevance & Motivation
Tourism is a major economic driver in India, yet the travel planning experience is often fragmented: travelers spend hours researching across multiple platforms. **VoyageSense** addresses this with a **one-stop, personalized solution** that suggests places, explains *why* they fit the traveler's needs (mimicking a human travel agent's reasoning), and visualizes the experience through vlogs. This reduces planning time and helps travelers discover hidden gems, which can in turn boost local tourism.

## 2. Data Understanding & Preparation

### Dataset Source
- **Public Dataset**: `Top Indian Places to Visit.csv` (Kaggle/Custom collection).
- **Reviews**: `tripadvisor_hotel_reviews.csv` (Used for sentiment analysis).

### Data Loading & Exploration

```python
import pandas as pd
import numpy as np
import sys
import os

# Add src to system path to import project modules
sys.path.append(os.path.abspath('src'))

# Load Datasets
try:
    places_df = pd.read_csv('data/raw/Top Indian Places to Visit.csv')
    reviews_df = pd.read_csv('data/raw/tripadvisor_hotel_reviews.csv')
    
    print("Places Dataset Shape:", places_df.shape)
    print("Reviews Dataset Shape:", reviews_df.shape)
    
    display(places_df.head())
except FileNotFoundError:
    print("Error: Data files not found. Ensure you are running from the project root.")
```

```
Places Dataset Shape: (325, 16)
Reviews Dataset Shape: (20491, 2)
```


| Unnamed: 0.1 | Unnamed: 0 | Zone | State | City | Name | Type | Establishment Year | time needed to visit in hrs | Google review rating | Entrance Fee in INR | Airport with 50km Radius | Weekly Off | Significance | DSLR Allowed | Number of google review in lakhs | Best Time to visit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Northern | Delhi | Delhi | India Gate | War Memorial | 1921 | 0.5 | 4.6 | 0 | Yes | | Historical | Yes | 2.6 | Evening |
| 1 | 1 | Northern | Delhi | Delhi | Humayun's Tomb | Tomb | 1572 | 2.0 | 4.5 | 30 | Yes | | Historical | Yes | 0.4 | Afternoon |
| 2 | 2 | Northern | Delhi | Delhi | Akshardham Temple | Temple | 2005 | 5.0 | 4.6 | 60 | Yes | | Religious | No | 0.4 | Afternoon |
| 3 | 3 | Northern | Delhi | Delhi | Waste to Wonder Park | Theme Park | 2019 | 2.0 | 4.1 | 50 | Yes | Monday | Environmental | Yes | 0.27 | Evening |
| 4 | 4 | Northern | Delhi | Delhi | Jantar Mantar | Observatory | 1724 | 2.0 | 4.2 | 15 | Yes | | Scientific | Yes | 0.31 | Morning |


### Preprocessing & Cleaning
Data cleaning is handled in `src/process_data.py`. Key steps included:
- Standardizing column names.
- Handling missing values in 'Zone' and 'Best Time'.
- Normalizing numeric fields like `Google_Rating`.
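
The three steps above can be sketched as follows. This is a minimal illustration, not the contents of `src/process_data.py`: the helper name `clean_places`, the `"Unknown"` placeholder, the min-max scaling, and the `google_rating_norm` column are all assumptions made for the example.

```python
import pandas as pd


def clean_places(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning helper (the real logic lives in src/process_data.py)."""
    out = df.copy()

    # Standardize column names: trim, lowercase, spaces -> underscores
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]

    # Fill missing categorical values with an explicit placeholder (assumed policy)
    for col in ("zone", "best_time_to_visit"):
        if col in out.columns:
            out[col] = out[col].fillna("Unknown")

    # Min-max normalize the rating into [0, 1] (assumed normalization scheme)
    if "google_review_rating" in out.columns:
        rating = out["google_review_rating"].astype(float)
        span = rating.max() - rating.min()
        out["google_rating_norm"] = (rating - rating.min()) / span if span else 0.0

    return out


# Tiny made-up frame that mirrors three of the raw column names
sample = pd.DataFrame({
    "Zone": ["Northern", None],
    "Best Time to visit": ["Evening", None],
    "Google review rating": [4.6, 4.1],
})
cleaned = clean_places(sample)
print(cleaned)
```

Filling gaps with an explicit placeholder, rather than dropping rows, keeps every destination available to the recommender even when a categorical field is missing.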

## 3. Model / System Design

### AI Technique Used
- **Machine Learning**: TF-IDF / Label Encoding + Cosine Similarity for Content-Based Filtering.
- **NLP (Sentiment)**: NLTK VADER for analyzing visitor reviews.
- **LLM**: Google Gemini for generating natural language explanations.

### Architecture
1.  **User Input**: Interests, Budget, Duration, Zone.
2.  **Feature Engine**: Converts user profile and destinations into Vector Space.
3.  **Similarity Search**: Computes Cosine Similarity to find top matches.
4.  **Constraint Filtering**: Removes invalid options (e.g., Budget mismatch).
5.  **LLM Explainer**: Generates specific reasons for the recommendation.
6.  **YouTube Integration**: Embeds relevant Travel Vlogs.
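
Steps 2 and 3 of this pipeline can be illustrated with a small, self-contained sketch. It assumes a simple one-hot encoding over a shared vocabulary of tags; the vocabulary, the `place_a` to `place_c` entries, and both helper functions are hypothetical and are not taken from `src/recommender.py`.

```python
import numpy as np


def one_hot(tags, vocabulary):
    """Binary vector marking which vocabulary entries appear in `tags`."""
    return np.array([1.0 if term in tags else 0.0 for term in vocabulary])


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


# Illustrative vocabulary and destinations (made-up tags, not dataset rows)
vocabulary = ["Nature", "Historical", "Religious", "Southern", "Northern"]
destinations = {
    "place_a": {"Nature", "Southern"},
    "place_b": {"Historical", "Northern"},
    "place_c": {"Historical", "Southern"},
}

# The user profile is projected into the same vector space as the destinations
user_vec = one_hot({"Nature", "Southern"}, vocabulary)

ranked = sorted(
    ((name, cosine_similarity(user_vec, one_hot(tags, vocabulary)))
     for name, tags in destinations.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the user and the destinations share one vocabulary, a destination that matches on every tag scores highest, a partial match scores in between, and a destination with no tags in common scores zero.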

### Justification of Design Choices
- **Content-Based Filtering**: Chosen because travel preferences are highly specific to the *attributes* of a location (e.g., someone liking 'Historical' places will likely want more 'Historical' recommendations). Collaborative filtering was avoided due to the lack of a large user interaction history dataset (Cold Start Problem).
- **Cosine Similarity**: Efficient and effective for high-dimensional feature vectors created from categorical data (Interests, Zones).
- **LLM (Gemini)**: Rule-based templates feel robotic. An LLM allows us to synthesize the user's constraints and the location's features into a persuasive, natural-language narrative, significantly improving user trust and engagement.
- **Hybrid Approach**: It combines hard constraints (Budget, Duration) with soft similarity scores to ensure recommendations are not just similar, but *feasible*.
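
The hybrid idea, hard constraints first and soft ranking second, might look like the sketch below. The `Candidate` fields, the bucket ordering in `BUDGET_RANK`, and the rule that a place is excluded on its weekly off day are assumptions for illustration; the project's actual filtering rules may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed ordering of budget buckets, cheapest first
BUDGET_RANK = {"Low": 0, "Medium": 1, "High": 2}


@dataclass
class Candidate:
    name: str
    similarity: float          # soft score, e.g. cosine similarity
    budget_bucket: str         # "Low", "Medium", or "High"
    weekly_off: Optional[str]  # day the place is closed, if any


def is_feasible(c: Candidate, user_budget: str, visit_day: str) -> bool:
    """Hard constraints: within budget and open on the chosen day."""
    within_budget = BUDGET_RANK[c.budget_bucket] <= BUDGET_RANK[user_budget]
    return within_budget and c.weekly_off != visit_day


def recommend(candidates: List[Candidate], user_budget: str,
              visit_day: str, top_n: int = 5) -> List[Candidate]:
    """Drop infeasible candidates, then rank the rest by similarity."""
    feasible = [c for c in candidates if is_feasible(c, user_budget, visit_day)]
    return sorted(feasible, key=lambda c: c.similarity, reverse=True)[:top_n]


# Made-up candidates to exercise both constraints
pool = [
    Candidate("place_a", 0.90, "High", None),
    Candidate("place_b", 0.60, "Low", "Monday"),
    Candidate("place_c", 0.50, "Low", None),
    Candidate("place_d", 0.40, "Medium", None),
]
strict = recommend(pool, user_budget="Low", visit_day="Monday")
relaxed = recommend(pool, user_budget="Medium", visit_day="Tuesday")
print([c.name for c in strict])
print([c.name for c in relaxed])
```

Filtering before ranking means a highly similar but unaffordable or closed destination never appears in the results, which is the sense in which recommendations are feasible as well as similar.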

## 4. Core Implementation

The core logic is modularized in `src/recommender.py`. We initialize the `TravelRecommender` class, which loads the data and prepares the feature vectors used for similarity search.

```python
from src.recommender import TravelRecommender

# Initialize the System
# This loads the database and computes similarity matrices
recommender = TravelRecommender()
print("Recommender System Initialized Successfully.")
```

```
Recommender System Initialized Successfully.
```


### Demonstration
Let's test the system with a sample user profile.

```python
# Define a User Profile
user_profile = {
    'type': 'Nature',             # Interest: Nature
    'significance': 'Nature',     # Fallback significance
    'duration_bucket': 'Short',   # Trip Duration: Short (1-3 days)
    'budget_bucket': 'Low',       # Budget: Low
    'zone': 'Southern',           # Preferred Zone
    'job_type': 'Fixed Schedule', # Constraint: Strict Schedule
    'visit_day': 'Monday'         # Availability
}

print("Processing Recommendations for User:", user_profile)

# Get Recommendations (Top 5)
recommendations = recommender.recommend(user_profile, top_n=5)

# Display Results
if not recommendations.empty:
    display(recommendations[['name', 'city', 'state', 'match_score', 'google_rating', 'explanation']])
else:
    print("No matching recommendations found. Try loosening constraints.")
```

```
Processing Recommendations for User: {'type': 'Nature', 'significance': 'Nature', 'duration_bucket': 'Short', 'budget_bucket': 'Low', 'zone': 'Southern', 'job_type': 'Fixed Schedule', 'visit_day': 'Monday'}
```


| Unnamed: 0 | name | city | state | match_score | google_rating | explanation |
|---|---|---|---|---|---|---|
| 264 | Matsyadarshini Aquarium | Visakhapatnam | Andhra Pradesh | 0.492645 | 3.8 | This place matches your Low budget, fits your ... |
| 107 | St. Angelo Fort | Kannur | Kerala | 0.487247 | 4.4 | This place matches your Low budget, fits your ... |
| 261 | Borra Caves | Visakhapatnam | Andhra Pradesh | 0.480663 | 4.5 | This place matches your Low budget, fits your ... |
| 97 | Munnar Tea Gardens | Munnar | Kerala | 0.479541 | 4.3 | This place Aligms with your interest in Scenic... |
| 121 | Bandipur National Park | Bandipur | Karnataka | 0.47782 | 4.4 | This place matches your Low budget, fits your ... |


## 5. Evaluation & Analysis

### Performance Analysis
- **Relevance**: The system successfully filters out high-budget places for a 'Low' budget user.
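- **Accuracy**: In the sample run, cosine similarity ranks nature-oriented destinations such as Borra Caves, Munnar Tea Gardens, and Bandipur National Park among the top five, all within the requested Southern zone. This is a qualitative check on one profile rather than a measured accuracy figure.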
- **Explainability**: The LLM-generated explanations provide context (e.g., "Aligns with your interest in Nature") rather than just a list of names.

### Sentiment Analysis Sample
We use NLTK VADER to score reviews. Here is an example of identifying positive sentiment:

```python
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Ensure the lexicon is downloaded.
# The resource lives under "sentiment/", so the lookup must use the full path;
# a bare 'vader_lexicon' lookup always fails and re-triggers the download.
try:
    nltk.data.find('sentiment/vader_lexicon.zip')
except LookupError:
    nltk.download('vader_lexicon')

sia = SentimentIntensityAnalyzer()

sample_review = "The place was absolutely beautiful and serene. Best trip ever!"
score = sia.polarity_scores(sample_review)

print(f"Review: '{sample_review}'")
print(f"Sentiment Score: {score}")
```

```
Review: 'The place was absolutely beautiful and serene. Best trip ever!'
Sentiment Score: {'neg': 0.0, 'neu': 0.369, 'pos': 0.631, 'compound': 0.9177}
```
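
How such scores could feed back into ranking is sketched below. The source does not specify the boosting formula, so the multiplicative blend, the `weight` parameter, and the function name `boosted_score` are assumptions; the inputs are VADER `compound` values, which range from -1 to 1.

```python
from typing import Sequence


def boosted_score(match_score: float,
                  compound_scores: Sequence[float],
                  weight: float = 0.1) -> float:
    """Scale a match score by the mean VADER compound score of a place's reviews.

    Hypothetical blend: with no reviews the score is unchanged; otherwise it is
    multiplied by (1 + weight * mean_compound), so positive sentiment nudges the
    score up and negative sentiment nudges it down.
    """
    if not compound_scores:
        return match_score
    mean_compound = sum(compound_scores) / len(compound_scores)
    return match_score * (1.0 + weight * mean_compound)


# A destination with a 0.48 match score and two positive reviews
print(boosted_score(0.48, [0.9177, 0.5]))
# The same destination with no reviews keeps its original score
print(boosted_score(0.48, []))
```

Keeping `weight` small means sentiment only reorders destinations whose match scores are already close, so review sentiment cannot override the user's stated preferences.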




## 6. Ethical Considerations & Responsible AI

1.  **Bias in Data**: The dataset might favor popular tourist spots, potentially overshadowing lesser-known but equally beautiful destinations. This creates a "popularity bias."
2.  **Dataset Limitations**: The data may not reflect real-time changes (e.g., a park closed for maintenance).
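3.  **Responsible AI**: Prompts sent to the Google Gemini API are restricted to travel context and are worded to discourage the model from hallucinating safety-related facts. LLM output can still be wrong, so explanations should be read as suggestions rather than verified information.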
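
One way to keep prompts within travel context is to build them from structured fields and state the restrictions explicitly. The sketch below only assembles the prompt string and makes no API call; the function name `build_explanation_prompt` and the exact wording are assumptions, not the project's actual prompt.

```python
def build_explanation_prompt(place: dict, profile: dict) -> str:
    """Assemble a constrained prompt from known facts (hypothetical wording)."""
    facts = "\n".join(f"- {key}: {value}" for key, value in place.items())
    prefs = "\n".join(f"- {key}: {value}" for key, value in profile.items())
    return (
        "You are a travel assistant. Explain in two sentences why the place "
        "below suits the traveler.\n"
        "Rules:\n"
        "1. Use only the facts listed under 'Place'.\n"
        "2. Do not make claims about safety, health, or legal matters.\n"
        "3. If a fact is missing, do not guess it.\n\n"
        f"Place:\n{facts}\n\n"
        f"Traveler preferences:\n{prefs}\n"
    )


# Values taken from the sample recommendation output and user profile
prompt = build_explanation_prompt(
    {"name": "Borra Caves", "city": "Visakhapatnam",
     "state": "Andhra Pradesh", "google_rating": 4.5},
    {"type": "Nature", "budget_bucket": "Low", "zone": "Southern"},
)
print(prompt)
```

Instructions like these reduce, but do not eliminate, the chance of fabricated details, which is why the generated text is still best treated as a suggestion.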

## 7. Conclusion & Future Scope

### Summary
We built **VoyageSense**, a hybrid recommendation system that combines structured data filtering with AI-driven personalization. It applies budget and time constraints as hard filters and ranks the remaining destinations by similarity, so recommendations are feasible as well as relevant.

### Future Improvements
- **Real-time API Integration**: Fetching live weather and pricing data.
- **User Accounts**: Saving history and preferences.
- **Mobile Application**: Porting the logic to Flutter/React Native for mobile access.