## Travel Attractions Recommendation Script:
 * **Algorithm**: _cosine_similarity_. 
 * **Dataset**: A small dummy dataset of 13 travel attractions and 9 fake users who rated some of them from 1 to 5 (file attached).
 * **Problem**: Suggest travel attractions a user might like, based on their previous ratings and the ratings of other users.
 * **Solution**: (_Collaborative Filtering_) Using the ratings from the full users × attractions table, build a matrix of attraction-to-attraction similarity based on taste. Then, given the current user's ratings for some of the attractions, score every other attraction by how much the user is likely to enjoy it, and suggest the top-scoring ones.
 * **Results**: A list of suggested attractions (from the dataset) a user might like.
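
As a rough illustration of the similarity measure named above, the sketch below computes cosine similarity by hand with NumPy. The helper name `cosine` and the three rating vectors are made up for this example and are not taken from the dataset.

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two rating vectors (ranges from -1 to 1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical centered ratings from four users (illustrative values only)
museum_a = [2.0, -1.0, 1.5, -2.0]
museum_b = [1.5, -0.5, 2.0, -1.5]
park = [-2.0, 1.0, -1.0, 2.0]

print(cosine(museum_a, museum_b))  # close to +1: the same users like both
print(cosine(museum_a, park))      # close to -1: users who like one dislike the other
```

Two attractions rated alike by the same users end up with a similarity near 1, and attractions rated in opposite ways end up near -1.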

In [118]:
import pandas as pd
from sklearn import preprocessing
from sklearn.metrics.pairwise import cosine_similarity

In [119]:
df = pd.read_csv('RealWorldApplications/travel-attractions-recommendation-dataset.csv', index_col=0)

df = df.fillna(0)
df.head(3)

Unnamed: 0,Rijksmuseum,Anna Frank House,Van Gogh Museum,The Jordaan,Vondelpark,Adam Lookout,Body Worlds,Rembrandt huis,ARTIS Royal Zoo,Micropia,Albert Cuyp Market,Amsterdamse Bos,De 9 Straatjes
museum_lover,5,5.0,5,3.0,0.0,0.0,1.0,2.0,0.0,1.0,0.0,0.0,0.0
city_guy,3,0.0,3,5.0,5.0,4.0,0.0,0.0,0.0,0.0,3.0,3.0,5.0
family_mom,4,5.0,3,4.0,5.0,0.0,0.0,0.0,5.0,0.0,4.0,4.0,4.0


In [120]:
# standardize each attraction's ratings to zero mean and unit variance
X = preprocessing.scale(df)
X[:1]

array([[ 0.81110711,  0.89442719,  1.01015254, -0.42640143, -1.5430335 ,
        -0.8878117 , -0.43905704,  0.32547228, -0.55738641, -0.44450044,
        -1.79990817, -0.76338629, -1.83333333]])
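
To make the scaling step concrete, here is a minimal sketch of what column-wise standardization does, written with plain NumPy. The small `ratings` array is hypothetical. With its default settings, `preprocessing.scale` applies the same formula to each column, using the population standard deviation. Note that because missing ratings were filled with 0 earlier, those zeros take part in each column's mean and standard deviation.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are attractions (0 = not rated)
ratings = np.array([[5.0, 0.0],
                    [3.0, 5.0],
                    [4.0, 5.0]])

# Subtract each column's mean and divide by its population standard deviation
standardized = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)

print(standardized.mean(axis=0))  # each column mean is (numerically) zero
print(standardized.std(axis=0))   # each column standard deviation is one
```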

In [121]:
# calculating the cosine similarity (from -1 to 1) of each attraction to every other attraction
item_similarity = cosine_similarity(X.T)

# setting the results into a dataframe format for easy use
item_similarity_df = pd.DataFrame(item_similarity, index=df.columns, columns=df.columns)
item_similarity_df

array([[ 1.        ,  0.25391669,  0.36051044, -0.38044296, -0.55068879,
        -0.1667624 ,  0.11573974,  0.30623174, -0.05651251,  0.36053747,
        -0.22941573, -0.75255162, -0.45962736],
       [ 0.25391669,  1.        , -0.29364007, -0.76277007, -0.69006556,
        -0.99260365, -0.26998438,  0.75688926,  0.35832675, -0.24138378,
        -0.45996766, -0.26261287, -0.59628479],
       [ 0.36051044, -0.29364007,  1.        , -0.0861461 ,  0.01558699,
         0.3445697 ,  0.64309615, -0.15123726, -0.13372326,  0.73766463,
        -0.4025974 , -0.08304548, -0.42089689]])

In [123]:
def get_suggested_attractions(attraction, user_rating):
    # subtracting 2.5 centers the rating: ratings above 2.5 give a positive weight (liked),
    # ratings below 2.5 give a negative weight (disliked)
    attraction_scores = item_similarity_df[attraction] * (user_rating-2.5)
    attractions = attraction_scores.sort_values(ascending=False)
    return attractions

**Usage Example 1:**

_A user rated the attraction "The Jordaan" 5 stars.
`get_suggested_attractions()` looks up the similarity of The Jordaan to every other attraction,
then multiplies each similarity by the user's rating minus 2.5, so that the user's taste affects the results.
If the user liked the base attraction, similar attractions get a positive boost and opposite attractions are pushed down.
Conversely, if the user disliked the base attraction, similar attractions are pushed down and opposite attractions get a boost._

_The results suggest the user might like other neighborhoods and cool attractions, and dislike museums._

In [124]:
print(get_suggested_attractions(attraction="The Jordaan", user_rating=5))

The Jordaan           2.500000
De 9 Straatjes        1.954340
Adam Lookout          1.892821
Vondelpark            1.809367
Albert Cuyp Market    1.781658
Amsterdamse Bos       1.064164
Body Worlds          -0.029252
Van Gogh Museum      -0.215365
Micropia             -0.473839
ARTIS Royal Zoo      -0.482768
Rijksmuseum          -0.951107
Anna Frank House     -1.906925
Rembrandt huis       -2.151119
Name: The Jordaan, dtype: float64
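
The example above only shows the "liked" case. The sketch below illustrates how the sign flips when the same base attraction is disliked. The similarity values, attraction labels, and the `weighted_scores` helper are hypothetical stand-ins for one column of `item_similarity_df` and for `get_suggested_attractions()`.

```python
import pandas as pd

# Hypothetical similarities to one base attraction (not taken from the dataset)
similarity_to_base = pd.Series(
    {"Similar place": 0.8, "Unrelated place": 0.0, "Opposite place": -0.7}
)

def weighted_scores(similarity, user_rating, midpoint=2.5):
    """Weight similarities by the centered rating and sort best-first."""
    return (similarity * (user_rating - midpoint)).sort_values(ascending=False)

liked = weighted_scores(similarity_to_base, user_rating=5)     # weight is +2.5
disliked = weighted_scores(similarity_to_base, user_rating=1)  # weight is -1.5

print(liked)     # "Similar place" ranks first
print(disliked)  # "Opposite place" ranks first
```

A 5-star rating pushes similar attractions to the top, while a 1-star rating reverses the order, though less strongly, because 1 is only 1.5 points below the 2.5 midpoint.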


**Usage Example 2:**

_A user rated multiple attractions, liking some and disliking others.
We get the results for each rating and sum them to produce the final suggestions list (summing ranks the attractions in the same order as averaging would)._

_Given that the user liked a neighborhood and disliked a museum and a park, the results this time rank another park (Vondelpark) lower than in the previous example, and rank the disliked museum (Body Worlds) lower as well._

In [125]:
new_user = [("The Jordaan", 5), ("Body Worlds", 1), ("Amsterdamse Bos", 1)]

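# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
suggestions = pd.concat(
    [get_suggested_attractions(attraction, rating) for attraction, rating in new_user],
    axis=1,
)

# one column per rating; sum across columns to get a total score per attraction
suggestions.sum(axis=1).sort_values(ascending=False)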

The Jordaan           1.879053
De 9 Straatjes        1.257355
Albert Cuyp Market    1.149868
Adam Lookout          1.050326
Vondelpark            0.649125
Rijksmuseum           0.004110
ARTIS Royal Zoo      -0.478853
Amsterdamse Bos      -0.585696
Van Gogh Museum      -1.055441
Anna Frank House     -1.108029
Rembrandt huis       -1.458198
Micropia             -1.520139
Body Worlds          -1.679112
dtype: float64
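
In the list above, attractions the user already rated (such as The Jordaan) still appear among the suggestions. One possible refinement, sketched below, is to drop them before ranking. The `recommend` helper and the 4×4 similarity matrix are hypothetical and only stand in for `item_similarity_df`.

```python
import pandas as pd

names = ["Canal walk", "Street market", "Art museum", "History museum"]

# Hypothetical symmetric similarity matrix (illustrative values only)
sim = pd.DataFrame(
    [[1.0, 0.7, -0.5, -0.6],
     [0.7, 1.0, -0.4, -0.3],
     [-0.5, -0.4, 1.0, 0.8],
     [-0.6, -0.3, 0.8, 1.0]],
    index=names, columns=names,
)

def recommend(similarity_df, user_ratings, midpoint=2.5):
    """Sum the rating-weighted similarities and hide already-rated attractions."""
    rated = [name for name, _ in user_ratings]
    per_rating = [similarity_df[name] * (rating - midpoint)
                  for name, rating in user_ratings]
    totals = pd.concat(per_rating, axis=1).sum(axis=1)
    return totals.drop(index=rated).sort_values(ascending=False)

result = recommend(sim, [("Canal walk", 5), ("Art museum", 1)])
print(result)
```

Only unrated attractions remain in `result`, so the top entry is always something new to the user.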