# Recommender Systems

![types](https://assets-global.website-files.com/60f03643ffba6a48a3bda298/62690b50cf469a4a8f4f35b7_qj772QmOASv8t2tqRbTaQLmEsnO5dEgD0rIRqsOXh8K8qaCCplZaB2wHTnc5h5oePrXmbW4lLVyrHEI9ybjjBiz3KpmdUl4QNkkY9m3TMvu5IPQngtibC2J4WhKTAk7nXEubAOhq.jpeg)

In [1]:
import pandas as pd

# Source: https://cseweb.ucsd.edu/~jmcauley/datasets.html#clothing_fit

df = pd.read_json("../../data/recommendations/modcloth_final_data.json", lines=True)

In [2]:
df.head()

Unnamed: 0,item_id,waist,size,quality,cup size,hips,bra size,category,bust,height,user_name,length,fit,user_id,shoe size,shoe width,review_summary,review_text
0,123373,29.0,7,5.0,d,38.0,34.0,new,36.0,5ft 6in,Emily,just right,small,991571,,,,
1,123373,31.0,13,3.0,b,30.0,36.0,new,,5ft 2in,sydneybraden2001,just right,small,587883,,,,
2,123373,30.0,7,2.0,b,,32.0,new,,5ft 7in,Ugggh,slightly long,small,395665,9.0,,,
3,123373,,21,5.0,dd/e,,,new,,,alexmeyer626,just right,fit,875643,,,,
4,123373,,18,5.0,b,,36.0,new,,5ft 2in,dberrones1,slightly long,small,944840,,,,


In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 82790 entries, 0 to 82789
Data columns (total 18 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   item_id         82790 non-null  int64  
 1   waist           2882 non-null   float64
 2   size            82790 non-null  int64  
 3   quality         82722 non-null  float64
 4   cup size        76535 non-null  object 
 5   hips            56064 non-null  float64
 6   bra size        76772 non-null  float64
 7   category        82790 non-null  object 
 8   bust            11854 non-null  object 
 9   height          81683 non-null  object 
 10  user_name       82790 non-null  object 
 11  length          82755 non-null  object 
 12  fit             82790 non-null  object 
 13  user_id         82790 non-null  int64  
 14  shoe size       27915 non-null  float64
 15  shoe width      18607 non-null  object 
 16  review_summary  76065 non-null  object 
 17  review_text     76065 non-null 

In [4]:
# Keep only rows that have a user, an item, a rating and a review text
df = df.dropna(subset=["user_id", "item_id", "quality", "review_text"])
df = df.reset_index(drop=True)

In [5]:
len(df.item_id.unique())

1322

In [6]:
len(df.user_id.unique())

44811

In [7]:
len(df)

76000

## Non-personalized recommender systems

Popularity-based recommender systems suggest the items that are currently most popular, for example the most frequently purchased products. Every user sees the same list, so no information about the individual user is needed.

### Frequency of purchase
The idea of recommending the most frequently purchased products can be turned into concrete implementations, for example:
- Check which articles are bought most often across all customers. Recommend these articles to each customer.

This dataset contains reviews rather than purchases, so the code below uses the number of reviews per item as a proxy for popularity.

Source: https://towardsdatascience.com/how-to-build-popularity-based-recommenders-with-polars-cc7920ad3f68#:~:text=Popularity%2Dbased%20recommenders%20work%20by,these%20articles%20to%20each%20customer.

In [8]:
items_popularity = df.groupby("item_id")["user_id"].count().sort_values(ascending=False)
items_popularity = items_popularity.reset_index()
items_popularity

Unnamed: 0,item_id,user_id
0,539980,2007
1,668696,1555
2,397005,1506
3,175771,1438
4,407134,1437
...,...,...
1317,542404,1
1318,541405,1
1319,214259,1
1320,536646,1


In [9]:
items_popularity.iloc[:3]["item_id"].to_list()

[539980, 668696, 397005]

In [10]:
import random
popular_items = items_popularity.iloc[:3]["item_id"].to_list()

def present_recommended_products(popular_items: list):
    """Print each recommended item with its category and up to three random reviews.

    Reads from the global DataFrame `df`.
    """
    print("**Currently trending products**")
    print("")

    for index, item_id_ in enumerate(popular_items):
        # All rows (reviews) that belong to this item
        slice_df = df[df["item_id"] == item_id_]
        print(f"Recommended item {index+1}/{len(popular_items)}: product {item_id_}")
    
        category = slice_df["category"].unique()[0]
        print(f"{category=}")
    
        # Show a random sample of at most three distinct reviews
        slice_with_reviews = slice_df[~slice_df["review_text"].isna()]
        reviews_for_slice = list(slice_with_reviews["review_text"].unique())
        if len(reviews_for_slice) > 0:
            reviews = random.sample(reviews_for_slice, min(len(reviews_for_slice), 3))
            print("User reviews:")
            for review in reviews:
                print("-", review)
            print("...")
        else:
            print("There are no reviews for this product yet.")
        print("")

present_recommended_products(popular_items)

**Currently trending products**

Recommended item 1/3: product 539980
category='tops'
User reviews:
- Love the color of this cardigan. I have 2 of the Charter School Cardigans and hope to get all the colors!
- I bought this sweater in honey, and the color is gorgeous! The material is soft, and seems high quality. My only (small) quibble is that the buttons are very hard to do and undo. It's a beautiful sweater!
- Super cute and lightweight! Perfect to go with your dress or favorite shirt. I'm probably going to buy one of every color.
...

Recommended item 2/3: product 668696
category='bottoms'
User reviews:
- I really love this skirt. I can wear it casually or dress it up and the material is amazing.
- This skirt is amazing!! It is very full, which is why I love it, but if you're shy of a full, twirly skirt this may not be for you. The material is thick and good quality, I think it's worth the price.
- I really loved the skirt and it fit just right.
...

Recommended item 3/3: product 3

## Content-based personalized systems

In [11]:
df_reviews = df[["item_id", "review_text", "category"]][~df["review_text"].isna()]
df_reviews.head()

Unnamed: 0,item_id,review_text,category
0,152702,"I liked the color, the silhouette, and the fab...",new
1,152702,From the other reviews it seems like this dres...,new
2,152702,I love the design and fit of this dress! I wo...,new
3,152702,I bought this dress for work it is flattering...,new
4,152702,This is a very professional look. It is Great ...,new


In [12]:
len(df_reviews)

76000

In [13]:
df_grouped = df_reviews.groupby(["item_id", "category"]).agg({'review_text': ' '.join})
df_grouped = df_grouped.reset_index()
df_grouped.head()

Unnamed: 0,item_id,category,review_text
0,152702,new,"I liked the color, the silhouette, and the fab..."
1,153494,new,I wanted to fit in this dress so bad so I made...
2,153798,new,Unfortunately the fabric is soooo thin and wri...
3,154411,new,My only complaint is that people notice when I...
4,154882,new,Most of the other reviews said size up one but...


In [14]:
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

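Creating recommendations based on TF-IDF scores. TF-IDF weights a term by how often it occurs in a document (term frequency) and down-weights terms that occur in many documents (inverse document frequency). Here each "document" is the concatenation of all reviews of one item.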

![if-idf2](https://miro.medium.com/v2/resize:fit:860/format:webp/1*dug-uXDMOD6H5JMnYNpgfQ.png)
![tf-idf](https://miro.medium.com/v2/resize:fit:4800/format:webp/1*Uucq42G4ntPGJKzI84b3aA.png)

https://medium.com/@imamun/creating-a-tf-idf-in-python-e43f05e4d424
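To make the weighting concrete, here is a minimal pure-Python sketch of the formula that `TfidfVectorizer` applies with its default settings: a smoothed IDF, `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, followed by L2 normalization of each document vector. The toy corpus and the helper name `tfidf_vectors` are illustrative assumptions, and tokenization is simplified to lowercase whitespace splitting, which is not identical to scikit-learn's tokenizer.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, list of L2-normalized TF-IDF vectors)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm for x in raw])
    return vocabulary, vectors

# Hypothetical toy corpus: "skirt" occurs everywhere, "silk" only once
docs = ["silk skirt", "cotton skirt", "wool skirt"]
vocabulary, vectors = tfidf_vectors(docs)
first = dict(zip(vocabulary, vectors[0]))
print(first)
```

In the first document the rare term "silk" receives a larger weight than the ubiquitous term "skirt", which is the behaviour the content-based recommender relies on.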

In [15]:
# TF-IDF Vectorization
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(df_grouped['review_text'])
tfidf_matrix

<1322x22088 sparse matrix of type '<class 'numpy.float64'>'
	with 420695 stored elements in Compressed Sparse Row format>

In [16]:
len(df_grouped['review_text'])

1322

In [17]:
pd.DataFrame(tfidf_matrix.toarray())

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22078,22079,22080,22081,22082,22083,22084,22085,22086,22087
0,0.001489,0.0,0.0,0.0,0.0016,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
1,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.001636,0.0,0.0,0.0,0.0
2,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
3,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
4,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1317,0.000949,0.0,0.0,0.0,0.0000,0.0,0.001298,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
1318,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
1319,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0
1320,0.000000,0.0,0.0,0.0,0.0000,0.0,0.000000,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0


In [18]:
tfidf_vectorizer.vocabulary_

{'liked': 10972,
 'the': 19327,
 'color': 4221,
 'silhouette': 17207,
 'and': 1434,
 'fabric': 7073,
 'of': 13143,
 'this': 19449,
 'dress': 6173,
 'but': 3253,
 'ruching': 16247,
 'just': 10353,
 'looked': 11200,
 'bunchy': 3188,
 'ruined': 16261,
 'whole': 21489,
 'thing': 19418,
 'was': 21178,
 'so': 17764,
 'disappointed': 5804,
 'really': 15414,
 'waned': 21119,
 'to': 19692,
 'like': 10971,
 'runs': 16283,
 'little': 11066,
 'small': 17583,
 'would': 21798,
 'need': 12635,
 'size': 17309,
 'up': 20643,
 'make': 11502,
 'it': 10080,
 'workappropriate': 21735,
 'from': 8011,
 'other': 13428,
 'reviews': 15992,
 'seems': 16695,
 'either': 6473,
 'works': 21756,
 'for': 7818,
 'your': 22023,
 'body': 2703,
 'type': 20212,
 'or': 13370,
 'doesn': 5978,
 'have': 8970,
 'waist': 21058,
 'flabby': 7543,
 'tummy': 20099,
 'is': 10058,
 'perfect': 14002,
 'me': 11738,
 'detail': 5615,
 'around': 1702,
 'front': 8017,
 'hides': 9154,
 'everything': 6827,
 'clingyness': 4058,
 'makes': 11507

In [19]:
# Vectorize the user input with the fitted TF-IDF vocabulary
user_input = 'Knee-long skirt made of silk'
user_tfidf = tfidf_vectorizer.transform([user_input])
user_df = pd.DataFrame(user_tfidf.toarray())
user_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22078,22079,22080,22081,22082,22083,22084,22085,22086,22087
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [22]:
tfidf_vectorizer.vocabulary_["of"]

13143

In [23]:
user_df[tfidf_vectorizer.vocabulary_["of"]]

0    0.189594
Name: 13143, dtype: float64

In [24]:
tfidf_vectorizer.idf_[tfidf_vectorizer.vocabulary_["silk"]]

4.855452653939752

In [25]:
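# TfidfVectorizer L2-normalizes each row by default, so the dot products
# returned by linear_kernel are cosine similarities
cosine_similarities = linear_kernel(user_tfidf, tfidf_matrix).flatten()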
cosine_similarities

array([0.04200172, 0.04524237, 0.        , ..., 0.02718421, 0.03061408,
       0.04886547])
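The call to `linear_kernel` returns plain dot products. These coincide with cosine similarities only because the TF-IDF rows have unit length. A small NumPy sketch of that equivalence, using randomly generated vectors as a stand-in for the real TF-IDF matrix (the array names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.random((5, 8))   # hypothetical item vectors, one row per item
query = rng.random(8)        # hypothetical query vector

# Scale every vector to unit L2 length
items_unit = items / np.linalg.norm(items, axis=1, keepdims=True)
query_unit = query / np.linalg.norm(query)

# Dot product of unit vectors ...
dot_scores = items_unit @ query_unit
# ... equals the cosine similarity of the original vectors
cosine_scores = (items @ query) / (np.linalg.norm(items, axis=1) * np.linalg.norm(query))

print(np.allclose(dot_scores, cosine_scores))
```

If the vectorizer were configured without normalization (for example `norm=None`), the dot products would no longer be cosine similarities, and longer documents would tend to score higher.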

In [26]:
# Get indices of items sorted by similarity
top_n = 3
item_indices = cosine_similarities.argsort()[:-top_n-1:-1]
print("indices:", item_indices)
# Get recommended item names
recommendations = df_grouped['item_id'].iloc[item_indices].tolist()
print("item_ids:", recommendations)

indices: [ 915 1105 1133]
item_ids: [605558, 701811, 714723]


In [27]:
def content_based_recommender(df, user_input, top_n=3):
    """
    Content-based recommender system using TF-IDF.

    Parameters:
    - df: pandas DataFrame with 'item_id', 'category' and 'review_text' columns
      (one row per item).
    - user_input: textual input representing user preferences.
    - top_n: number of top items to recommend.

    Returns:
    - recommendations: a list of top_n recommended item ids.
    """
    # Combine relevant text features into a single string per item
    # (kept in a local Series so the caller's DataFrame is not modified)
    text_features = df['category'] + ' ' + df['review_text']

    # TF-IDF Vectorization
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform(text_features)

    # Compute cosine similarity between user input and items
    user_tfidf = tfidf_vectorizer.transform([user_input])
    cosine_similarities = linear_kernel(user_tfidf, tfidf_matrix).flatten()

    # Get indices of items sorted by similarity
    item_indices = cosine_similarities.argsort()[:-top_n-1:-1]

    # Get recommended item ids
    recommendations = df['item_id'].iloc[item_indices].tolist()

    return recommendations

# Example usage
user_preferences = 'Knee-long skirt made of silk'
recommended_products = content_based_recommender(df_grouped, user_preferences, top_n=3)
present_recommended_products(recommended_products)

**Currently trending products**

Recommended item 1/3: product 605558
category='tops'
User reviews:
- Very good quality. The fabric feels like silk. :)
- Just as pictured; soft material; looks and feels good quality. I usually wear an XL or L (I'm an apple shape) and got an L for this item and it fits well.
...

Recommended item 2/3: product 701811
category='bottoms'
User reviews:
- The skirt is made very well but is comically long. I am only 4'10 so I expect things to be bigger on me but it goes to my ankles. The material is sturdy and thick but not like wool so it can be used in spring and early summer too. Overall a great find just very long.
- When I received this skirt, I was pleasantly surprised on the quality of the skirt. The buckle by the waist is a cute feature that makes the skirt unique. My only complaint is that the skirt is too long for my 5'4 stature. It reached about twothree inches above my ankles. Unfortunately, this skirt will be going back.
- I liked the librarian l

## Collaborative Filtering

## Item-based filtering

Item-based collaborative filtering recommends items that are similar to a given item, where similarity is measured from past user interactions: two items are considered similar when the same users rated them in a similar way.

In [28]:
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Sample dataset (user_id, item_id, rating)
data = {'user_id': [1, 1, 2, 2, 3, 3, 4, 4],
        'item_id': ['A', 'B', 'A', 'C', 'C', 'D', 'B', 'D'],
        'rating': [5, 4, 3, 2, 4, 5, 1, 3]}

df_sample = pd.DataFrame(data)
df_sample

Unnamed: 0,user_id,item_id,rating
0,1,A,5
1,1,B,4
2,2,A,3
3,2,C,2
4,3,C,4
5,3,D,5
6,4,B,1
7,4,D,3


In [29]:
# Pivot the DataFrame to create a user-item matrix
user_item_matrix = df_sample.pivot_table(index='user_id', columns='item_id', values='rating', fill_value=0)
user_item_matrix

item_id,A,B,C,D
user_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,5.0,4.0,0.0,0.0
2,3.0,0.0,2.0,0.0
3,0.0,0.0,4.0,5.0
4,0.0,1.0,0.0,3.0


In [30]:
user_item_matrix.T

user_id,1,2,3,4
item_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
A,5.0,3.0,0.0,0.0
B,4.0,0.0,0.0,1.0
C,0.0,2.0,4.0,0.0
D,0.0,0.0,5.0,3.0


In [31]:
# Calculate cosine similarity between items
item_similarity = cosine_similarity(user_item_matrix.T)
item_similarity

array([[1.        , 0.83189033, 0.2300895 , 0.        ],
       [0.83189033, 1.        , 0.        , 0.12478355],
       [0.2300895 , 0.        , 1.        , 0.76696499],
       [0.        , 0.12478355, 0.76696499, 1.        ]])

In [32]:
# Retrieve the target item's similarity scores with all other items
target_item = 'A'
target_item_index = user_item_matrix.columns.get_loc(target_item)
print("item:", target_item, ", index:", target_item_index)
target_item_similarity = item_similarity[target_item_index]
print("item:", target_item, ", similarity scores:", target_item_similarity)

item: A , index: 0
item: A , similarity scores: [1.         0.83189033 0.2300895  0.        ]


In [33]:
# Get indices of items sorted by similarity
top_n = 2
item_indices = target_item_similarity.argsort()[:-top_n-1:-1]
print(f"{item_indices}")

# Get recommended item ids
recommendations = user_item_matrix.columns[item_indices].tolist()
print("Recommended Items:", recommendations)

[0 1]
Recommended Items: ['A', 'B']
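The result above includes the target item `A` itself, because every item has similarity 1.0 with itself. A minimal NumPy sketch of one way to exclude it, using the same toy ratings (the function name `similar_items` is an assumption for illustration):

```python
import numpy as np

item_ids = ["A", "B", "C", "D"]
# Rows are items, columns are users (the transposed user-item matrix above)
item_user = np.array([[5., 3., 0., 0.],
                      [4., 0., 0., 1.],
                      [0., 2., 4., 0.],
                      [0., 0., 5., 3.]])

def similar_items(target, top_n=2):
    """Return the top_n items most similar to `target`, excluding `target`."""
    # Unit-length rows turn dot products into cosine similarities
    unit = item_user / np.linalg.norm(item_user, axis=1, keepdims=True)
    target_idx = item_ids.index(target)
    scores = unit @ unit[target_idx]
    scores[target_idx] = -np.inf  # never recommend the target itself
    order = np.argsort(scores)[::-1][:top_n]
    return [item_ids[i] for i in order]

print(similar_items("A"))  # ['B', 'C']
```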


In [34]:
def item_based_recommender(df, target_item, top_n=2):
    """
    Item-based collaborative filtering recommender system.

    Parameters:
    - df: pandas DataFrame with 'user_id', 'item_id', and 'quality' columns
      ('quality' is used as the rating).
    - target_item: item for which recommendations are sought.
    - top_n: number of top items to return.

    Returns:
    - recommendations: a list of the top_n most similar item ids. The target
      item itself is normally included, since its similarity to itself is 1.0.
    """
    # Pivot the DataFrame to create a user-item matrix (missing ratings become 0)
    user_item_matrix = df.pivot_table(index='user_id', columns='item_id', values='quality', fill_value=0)
    # Calculate cosine similarity between items
    item_similarity = cosine_similarity(user_item_matrix.T)

    # Retrieve the target item's similarity scores with all other items
    target_item_index = user_item_matrix.columns.get_loc(target_item)
    target_item_similarity = item_similarity[target_item_index]

    # Get indices of items sorted by similarity
    item_indices = target_item_similarity.argsort()[:-top_n-1:-1]

    # Get recommended item ids
    recommendations = user_item_matrix.columns[item_indices].tolist()

    return recommendations

In [35]:
target_item = 714723
recommended_products = item_based_recommender(df, target_item, top_n=4)
present_recommended_products(recommended_products)

**Currently trending products**

Recommended item 1/4: product 714723
category='bottoms'
User reviews:
- I loved this skirt but when it arrived it was MILES to big. I measured myself exactly before ordering and the waist it about 5 inches to bog for me. A real disappointment to have to pay to get it to Australia and then pay for a refund.
- Perfect swishy skirt!  It's lined in a similar mustard color, very comfortable and very beautiful on  it's now my go to accent skirt.  The colors are wonderful for the last days of summer and will easily carry me into fall with the wine and berry colors.  I'm so glad I got this skirt.  It does run comfortably so it's not super snug, but it's also not really really loose.
- I love this skirt so much! It's on the long side but it looks great. I wish the waist were just a little smaller but I know my waist is generally too small for clothes. Anyway this skirt makes me feel like Snow White and I love it!
...

Recommended item 2/4: product 731618
categor

## User-based filtering

In [36]:
df_sample

Unnamed: 0,user_id,item_id,rating
0,1,A,5
1,1,B,4
2,2,A,3
3,2,C,2
4,3,C,4
5,3,D,5
6,4,B,1
7,4,D,3


In [37]:
# Pivot the DataFrame to create a user-item matrix
user_item_matrix = df_sample.pivot_table(index='user_id', columns='item_id', values='rating', fill_value=0)

# Calculate cosine similarity between users
user_similarity = cosine_similarity(user_item_matrix)
user_similarity

array([[1.        , 0.64972212, 0.        , 0.19754592],
       [0.64972212, 1.        , 0.34651847, 0.        ],
       [0.        , 0.34651847, 1.        , 0.7407972 ],
       [0.19754592, 0.        , 0.7407972 , 1.        ]])

In [38]:
item_similarity

array([[1.        , 0.83189033, 0.2300895 , 0.        ],
       [0.83189033, 1.        , 0.        , 0.12478355],
       [0.2300895 , 0.        , 1.        , 0.76696499],
       [0.        , 0.12478355, 0.76696499, 1.        ]])

In [39]:
print("item matrix")
print(user_item_matrix.T)
print("")
print("user matrix")
print(user_item_matrix)

item matrix
user_id    1    2    3    4
item_id                    
A        5.0  3.0  0.0  0.0
B        4.0  0.0  0.0  1.0
C        0.0  2.0  4.0  0.0
D        0.0  0.0  5.0  3.0

user matrix
item_id    A    B    C    D
user_id                    
1        5.0  4.0  0.0  0.0
2        3.0  0.0  2.0  0.0
3        0.0  0.0  4.0  5.0
4        0.0  1.0  0.0  3.0


In [40]:
# Retrieve the target user's similarity scores with all other users
target_user = 1
target_user_index = user_item_matrix.index.get_loc(target_user)
target_user_similarity = user_similarity[target_user_index]

# Get indices of users sorted by similarity
top_n = 3
user_indices = target_user_similarity.argsort()[:-top_n-1:-1]
user_indices

array([0, 1, 3])

In [44]:
# Get recommended item ids based on similar users: rank items by their
# mean rating among the selected users (the target user is one of them)
user_item_matrix.iloc[user_indices].mean().sort_values(ascending=False).index.tolist()


['A', 'B', 'D', 'C']
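The ranking above averages over the target user as well and returns items the user has already rated (`A` and `B`). A minimal sketch of a variant that excludes the target user from the neighbours, weights neighbours by similarity, and drops already-rated items. The function name `recommend_for_user`, the `n_neighbors` parameter, and the similarity-weighted sum are assumptions for illustration, not the method used in the cells above.

```python
import numpy as np

user_ids = [1, 2, 3, 4]
item_ids = ["A", "B", "C", "D"]
ratings = np.array([[5., 4., 0., 0.],
                    [3., 0., 2., 0.],
                    [0., 0., 4., 5.],
                    [0., 1., 0., 3.]])  # 0 means "not rated"

def recommend_for_user(target_user, n_neighbors=2, top_n=2):
    """Rank unseen items by similarity-weighted ratings of the nearest users."""
    n_neighbors = min(n_neighbors, len(user_ids) - 1)
    target_idx = user_ids.index(target_user)

    # Cosine similarity between the target user and every user
    unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
    similarity = unit @ unit[target_idx]
    similarity[target_idx] = -np.inf  # exclude the target user
    neighbors = np.argsort(similarity)[::-1][:n_neighbors]

    # Similarity-weighted sum of the neighbours' ratings
    scores = similarity[neighbors] @ ratings[neighbors]
    scores[ratings[target_idx] > 0] = -np.inf  # drop already-rated items

    ranked = [i for i in np.argsort(scores)[::-1] if np.isfinite(scores[i])]
    return [item_ids[i] for i in ranked[:top_n]]

print(recommend_for_user(1))  # ['C', 'D']
```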

In [45]:
def user_based_recommender(df, target_user, top_n=2):
    """
    User-based collaborative filtering recommender system.

    Parameters:
    - df: pandas DataFrame with 'user_id', 'item_id', and 'quality' columns
      ('quality' is used as the rating).
    - target_user: user for whom recommendations are sought.
    - top_n: number of most similar users whose ratings are averaged
      (the target user counts as one of them, with similarity 1.0).

    Returns:
    - recommendations: all item ids, ranked by mean rating among those users.
      Slice the list to keep only the first few items.
    """
    # Drop rows without a rating, then pivot into a user-item matrix
    df = df[~df["quality"].isna()]
    user_item_matrix = df.pivot_table(index='user_id', columns='item_id', values='quality', fill_value=0)

    # Calculate cosine similarity between users
    user_similarity = cosine_similarity(user_item_matrix)

    # Retrieve the target user's similarity scores with all other users
    target_user_index = user_item_matrix.index.get_loc(target_user)
    target_user_similarity = user_similarity[target_user_index]

    # Get indices of users sorted by similarity
    user_indices = target_user_similarity.argsort()[:-top_n-1:-1]

    # Get recommended item ids based on similar users
    recommendations = user_item_matrix.iloc[user_indices].mean().sort_values(ascending=False).index.tolist()

    return recommendations

# Example usage
target_user = 320458
recommended_items = user_based_recommender(df, target_user, top_n=3)
print("Recommended Items:", recommended_items[:3])

Recommended Items: [714723, 152702, 590933]


In [46]:
present_recommended_products(recommended_items[:3])

**Currently trending products**

Recommended item 1/3: product 714723
category='bottoms'
User reviews:
- I originally ordered this skirt in a 1x but it was HUGE on me, so I exchanged it for an XL. I probably could've gone down to the L, but I wear this with my shirt tucked it, so it's comfortable. The material feels heavy and high quality, without being too hot.
- The print is beautiful, and the waist fit perfectly, but I didn't find it flattering on my body type (short and curvy, bottom heavy). I'm 5'3 with short legs (27 inseam) and it came right to my lower calf, not a good look. I could have fixed the length by having it shortened by a few inches, but I didn't feel like investing more money in it with the price point. It is very full and pleated throughout which made my already large bottom look bigger.  I would recommend this to someone 5'75'10 with a proportional rear end :)
- Best skirt I've gotten from Modcloth! The material and fit are perfect, also love the floral pattern.
..

## Hybrid Filtering

In [48]:
# Collaborative Filtering
collaborative_recommendations = user_based_recommender(df, target_user=320458, top_n=3)

In [54]:
collaborative_recommendations = collaborative_recommendations[:3]

In [50]:
# Content-Based Filtering

content_recommendations = content_based_recommender(df_grouped, user_input='Knee-long skirt made of silk', top_n=len(df_grouped["item_id"].unique()))

In [55]:
content_recommendations = content_recommendations[:3]

In [56]:
# Hybrid Recommendations (Combining Collaborative and Content-Based)
top_n = 3

# Union of both candidate lists. A set has no defined order, so the
# slice below picks top_n items from the union rather than a ranked top_n.
hybrid_recommendations = list(set(collaborative_recommendations).union(content_recommendations))

print("Recommended Items:", hybrid_recommendations[:top_n])


Recommended Items: [714723, 701811, 590933]
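Because a Python set does not preserve ranking, the order of the combined list above is arbitrary. One possible alternative is a round-robin merge that alternates between the two ranked lists and skips duplicates. The helper name `interleave_rankings` is hypothetical, and the two example lists reuse item ids from earlier outputs purely for illustration.

```python
from itertools import zip_longest

def interleave_rankings(*rankings, top_n=3):
    """Round-robin merge of ranked lists, keeping first occurrences only."""
    merged = []
    seen = set()
    # zip_longest yields the 1st item of every list, then the 2nd, and so on
    for tier in zip_longest(*rankings):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:top_n]

collaborative = [714723, 152702, 590933]
content = [605558, 701811, 714723]
print(interleave_rankings(collaborative, content, top_n=3))
# [714723, 605558, 152702]
```

Other merging strategies, such as a weighted sum of normalized scores from both recommenders, are equally valid; the round-robin merge is simply the smallest change that keeps each list's ranking.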


In [57]:
present_recommended_products(hybrid_recommendations[:top_n])

**Currently trending products**

Recommended item 1/3: product 714723
category='bottoms'
User reviews:
- I don't think I've ever ever written a review about a piece of clothing I've bought online before, but after the 20th compliment on this skirt from friends, coworkers and strangers (!) I figured it was my duty to. I have a 2828 waist and am 5'2  size S fits well sitting high on the waist and hits 34 past the knee, which I thought would be too long but is actually really lovely. It's one of my only work pieces that I'd also happily wear out shopping or to a party. It's the perfect skirt.
- Love this skirt! It's even better than I hoped for and I'm very pleased with the quality.
- I LOVE this skirt. It feels sturdy and well made, it hangs well, and it twirls. I wear it high on my waist because it's a little tight otherwise (and still is sometimes... the waistline doesn't stretch.) I wear it with red heels or denim wedges and get so many compliments. I plan to add burgundy tights and a

## Matrix Factorization/Singular Value Decomposition (SVD)

https://uk.wikipedia.org/wiki/Сингулярний_розклад_матриці

https://machinelearningmastery.com/using-singular-value-decomposition-to-build-a-recommender-system/

The intuition behind the vectors in the Singular Value Decomposition (SVD) model for recommender systems can be conveyed with a simplified analogy.

Analogy: The Movie Preference Matrix

Let's imagine we have a giant matrix that represents how much each person likes each movie. Each row in the matrix represents a person, each column represents a movie, and the numbers in the matrix represent the ratings people give to movies.

Rows (People): Different people have different tastes. Some people might like action movies more, while others prefer romantic movies.

Columns (Movies): Movies also have their unique characteristics. Some are action-packed, some are funny, and some are emotional.

Now, SVD helps us break down this giant matrix into three smaller matrices:

- User Matrix (U): This matrix represents how much each person likes a particular type of movie. Each row in this matrix tells us about a person's general taste in movies. The values in this matrix help capture the unique preferences of each person.
- Item Matrix (V): This matrix represents the characteristics of each movie. Each column in this matrix tells us about a specific aspect of a movie, like how much action it has or how funny it is. The values in this matrix help capture the unique features of each movie.
- Diagonal Matrix (Sigma): This matrix represents how important each type of movie preference is. It helps us prioritize which aspects of a movie (e.g., action, romance) are more crucial in determining recommendations.

The magic happens when we multiply these matrices back together (U * Sigma * V^T). Using all of the factors simply reproduces the original matrix. Keeping only the few most important ones (the largest values in Sigma) gives an approximation whose entries serve as estimated ratings for each user-movie pair. It's like combining people's general preferences (U), the unique features of each movie (V), and the importance of different aspects (Sigma) to predict how much someone might like a movie they haven't seen.


Summary:

- User Matrix (U): Describes how much each person generally likes different types of movies.
- Item Matrix (V): Describes the unique features of each movie.
- Diagonal Matrix (Sigma): Assigns importance to different aspects of movie preferences.

By breaking down the original matrix into these three smaller matrices, we can understand and predict people's preferences for movies they haven't watched yet. It's like figuring out the recipe for a perfect movie recommendation based on what people generally like and what each movie has to offer.
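The decomposition described above can be sketched with NumPy on a small ratings matrix. The matrix is the same toy example used elsewhere in this notebook; the choice `k = 2` is an arbitrary assumption for illustration.

```python
import numpy as np

ratings = np.array([[5., 4., 0., 0.],
                    [3., 0., 2., 0.],
                    [0., 0., 4., 5.],
                    [0., 1., 0., 3.]])

# u: users x factors, s: singular values (largest first), vt: factors x items
u, s, vt = np.linalg.svd(ratings, full_matrices=False)

# Using all singular values reproduces the original matrix
full_reconstruction = u @ np.diag(s) @ vt

# Keeping only the k largest singular values gives a smoothed approximation,
# whose entries can be read as estimated ratings
k = 2
approx = u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]
print(np.round(approx, 2))
```

The approximation error is determined by the discarded singular values, so a larger `k` fits the observed ratings more closely but generalizes less.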

In [58]:
import numpy as np

# Sample dataset (user_id, item_id, rating)
data = {'user_id': [1, 1, 2, 2, 3, 3, 4, 4],
        'item_id': ['A', 'B', 'A', 'C', 'C', 'D', 'B', 'D'],
        'rating': [5, 4, 3, 2, 4, 5, 1, 3]}

df_sample = pd.DataFrame(data)

# Convert records into a user-item rating matrix
datamatrix = df_sample.pivot(index="user_id", columns="item_id", values="rating").fillna(0)
matrix = datamatrix.values
datamatrix, matrix

(item_id    A    B    C    D
 user_id                    
 1        5.0  4.0  0.0  0.0
 2        3.0  0.0  2.0  0.0
 3        0.0  0.0  4.0  5.0
 4        0.0  1.0  0.0  3.0,
 array([[5., 4., 0., 0.],
        [3., 0., 2., 0.],
        [0., 0., 4., 5.],
        [0., 1., 0., 3.]]))

In [59]:
# Singular value decomposition
u, s, vh = np.linalg.svd(matrix, full_matrices=False)

# Cosine similarity between two vectors. Named differently from sklearn's
# cosine_similarity (imported earlier) so that it does not shadow it.
def vector_cosine_similarity(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

In [64]:
vh

array([[-0.56385008, -0.37060169, -0.46354614, -0.57432783],
       [-0.61692835, -0.39789319,  0.34102009,  0.58718457],
       [-0.29237698,  0.55839378, -0.63830393,  0.44190516],
       [ 0.46474531, -0.62652357, -0.51127132,  0.36066834]])

In [65]:
vh[:,0]

array([-0.56385008, -0.61692835, -0.29237698,  0.46474531])

In [66]:
target_item = 0  # A

# Find the column of vh most similar to the target item's column.
# Caveat: all four factors are kept here, so vh is orthogonal and the
# similarities between distinct columns are ~0 up to rounding error.
highest_similarity = -np.inf
highest_sim_col = -1
for col in range(vh.shape[1]):
    if col == target_item:
        continue  # skip the target item itself
    similarity = vector_cosine_similarity(vh[:, target_item], vh[:, col])
    if similarity > highest_similarity:
        highest_similarity = similarity
        highest_sim_col = col

print(f"Item {highest_sim_col} (item_id {datamatrix.columns[highest_sim_col]}) is most similar to item {target_item} (item_id {datamatrix.columns[target_item]})")

Item 2 (item_id C) is most similar to item 0 (item_id A)
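When every factor is kept, `vh` is an orthogonal matrix, so the cosine similarity between any two of its full columns is zero up to rounding error and the comparison above is decided by numerical noise. A common remedy is to compare items in a truncated latent space. The sketch below keeps `k = 2` factors and scales them by their singular values; both choices, and the helper name `vector_cosine`, are assumptions for illustration.

```python
import numpy as np

item_ids = ["A", "B", "C", "D"]
ratings = np.array([[5., 4., 0., 0.],
                    [3., 0., 2., 0.],
                    [0., 0., 4., 5.],
                    [0., 1., 0., 3.]])

u, s, vh = np.linalg.svd(ratings, full_matrices=False)

# One k-dimensional latent vector per item, scaled by factor importance
k = 2
item_factors = (np.diag(s[:k]) @ vh[:k, :]).T  # shape: (n_items, k)

def vector_cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

target = 0  # item A
scores = {
    item_ids[j]: vector_cosine(item_factors[target], item_factors[j])
    for j in range(len(item_ids))
    if j != target
}
most_similar = max(scores, key=scores.get)
print(most_similar, scores)
```

With two factors, item `B` comes out as the closest match to item `A`, in line with the item-based cosine similarities computed directly from the ratings.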


https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD

In [67]:
from surprise import Dataset
from surprise import Reader
from surprise import SVD
from surprise.model_selection import train_test_split
from surprise import accuracy

# Load the dataset into Surprise
reader = Reader(rating_scale=(1, 5))  # df.quality.min() & df.quality.max()
dataset = Dataset.load_from_df(df[['user_id', 'item_id', 'quality']], reader)

# Split the dataset into training and testing sets
trainset, testset = train_test_split(dataset, test_size=0.2, random_state=42)

# Build the SVD model
model = SVD(n_factors=100, random_state=42)
# Train the model on the training set
model.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x14c036510>

In [68]:
testset

[(515984, 739127, 4.0),
 (529649, 427041, 3.0),
 (209767, 427567, 5.0),
 (881136, 407134, 3.0),
 (538513, 407134, 3.0),
 (949604, 539980, 4.0),
 (5544, 647235, 2.0),
 (858201, 340237, 4.0),
 (476561, 693560, 2.0),
 (591404, 646088, 3.0),
 (622441, 407134, 4.0),
 (779253, 165716, 2.0),
 (510371, 657242, 3.0),
 (682499, 715662, 2.0),
 (384874, 407134, 4.0),
 (685062, 757242, 4.0),
 (457121, 427567, 5.0),
 (741537, 806856, 5.0),
 (887037, 507565, 3.0),
 (947324, 165716, 1.0),
 (372025, 803464, 1.0),
 (498413, 486643, 2.0),
 (516990, 380801, 5.0),
 (874483, 391519, 3.0),
 (685752, 359893, 5.0),
 (260839, 519836, 4.0),
 (16520, 169727, 4.0),
 (496365, 338596, 5.0),
 (750423, 410263, 4.0),
 (256600, 210299, 3.0),
 (813873, 659701, 5.0),
 (816475, 412737, 4.0),
 (289306, 427567, 3.0),
 (753687, 486643, 4.0),
 (457322, 153494, 3.0),
 (308738, 298256, 4.0),
 (220838, 407574, 4.0),
 (552053, 645822, 5.0),
 (57577, 719701, 2.0),
 (850000, 416738, 4.0),
 (304621, 796383, 2.0),
 (329621, 639328, 4.

In [69]:
# Make predictions on the test set
predictions = model.test(testset)

# Recommend items for a specific user
target_user = 320458
items_to_recommend = df['item_id'].unique()

# Create a list of tuples (item_id, predicted_rating)
user_predictions = [(item, model.predict(target_user, item).est) for item in items_to_recommend]

# Sort the list by predicted ratings in descending order
sorted_predictions = sorted(user_predictions, key=lambda x: x[1], reverse=True)

# Print the top recommended items
top_n = 3
print(f"Top {top_n} Recommendations for User {target_user}:")
for i in range(top_n):
    print(f"Item ID: {sorted_predictions[i][0]}, Predicted Rating: {sorted_predictions[i][1]:.2f}")


Top 3 Recommendations for User 320458:
Item ID: 768644, Predicted Rating: 4.85
Item ID: 680903, Predicted Rating: 4.83
Item ID: 697881, Predicted Rating: 4.73
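The cell above computes `predictions` on the held-out test set but does not score them. In Surprise, `accuracy.rmse(predictions)` reports the root-mean-square error. The sketch below shows what that metric computes in plain Python. The rating pairs are hypothetical, not results from the model above; with real predictions they could be built as `[(p.r_ui, p.est) for p in predictions]`.

```python
import math

def rmse(pairs):
    """Root-mean-square error over (actual, estimated) rating pairs."""
    squared_errors = [(actual - est) ** 2 for actual, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (actual, estimated) pairs for illustration only
example_pairs = [(4.0, 3.5), (3.0, 3.5), (5.0, 4.0), (2.0, 3.0)]
print(f"RMSE: {rmse(example_pairs):.3f}")
```

A lower RMSE means the predicted ratings are closer to the held-out ratings on average. It says nothing directly about the quality of the top-ranked recommendations.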


In [70]:
top_n_recommendations = [pred[0] for pred in sorted_predictions[:3]]
top_n_recommendations

[768644, 680903, 697881]

In [71]:
present_recommended_products(top_n_recommendations)

**Currently trending products**

Recommended item 1/3: product 768644
category='bottoms'
User reviews:
- such a great rich material and front pocket is super cute
- I love this skirt! It is well made and flattering. I am glad to add it to my wardrobe as a staple item.
- This is such a great skirt! Brilliant forest / fern green colour, lovely thick fabric (not too thick  good for Autumn and spring as well as winter), and great quality. I don't think it runs very large, just sits on the hips rather than highwaisted like most modcloth skirts (which I much prefer). Great purchase!
...

Recommended item 2/3: product 680903
category='bottoms'
User reviews:
- love this skirt!!
- More mustard yellow than shown. Lined. Good pockets. I'm very hippy with a big butt. Pencil skirts look incredibly inappropriate on me, because they fit in the waist, and cup every curve. This skirt does not. Fits tight in the waste, and doesn't cup my assets. I'm an engineer, I'm a professional, and I'm not 18 anymor