<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>📜 Introduction : </font></h3>
    
The amount of information on the web has grown rapidly and continues to grow, giving users access to many sources of information about products, hotels, restaurants, and other services. Despite its benefits, this volume of data makes it hard for users to process the information and choose among the available options, which leads to information overload and complicates decision-making. Filtering the information down to a manageable set based on the current user's (customer's) preferences therefore helps users make informed decisions. This filtering is typically performed by recommendation systems (RS), which were developed to address information overload by providing personalized recommendations based on each customer's preferences.

Since the emergence of RS, the field has gained significant importance in academia, business, and industry. RS are widely used in domains such as e-commerce (Amazon), music (Pandora), movies (Netflix), travel (TripAdvisor), restaurants (Yelp), people (Facebook), and articles (TED). The growth of e-commerce websites has shown that RS play a significant role in helping users find items that match their needs and discover new items that align with their preferences. For example, according to Amazon statistics from 2015, 35% of its sales came from items recommended to users.

* An RS typically goes through three stages:

    1. **Modeling Stage:** This stage prepares the data used in the next two stages. It typically takes one of three forms. The first is a rating matrix with users as rows and items as columns, where each cell holds the rating a user gave to a specific item. The second is a user profile, often a vector per user, that describes the user's preferences for an item as a whole or for specific aspects of it. The third is an item profile that contains the features of a specific item.

    2. **Prediction Stage:** This stage aims to predict the ratings or scores for items that are unseen/unknown for a particular user based on the information extracted during the modeling stage.

    3. **Recommendation Stage:** This stage builds on the prediction stage and filters the most suitable items for a user, using various approaches, to support the user's decision. It suggests new items to the user (e.g., the top-N items with the highest predicted ratings) that are likely to appeal to them most.
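The three stages can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the ratings are made up, the rating matrix is stored as a dictionary of dictionaries, and the prediction step uses a deliberately simple, non-personalized baseline (an item's mean rating over the users who rated it) rather than any of the techniques covered later in this notebook.

```python
from collections import defaultdict

# Hypothetical explicit ratings: (user, item, stars on a 1-5 scale)
ratings = [
    ("ann", "Matrix", 5), ("ann", "Titanic", 2),
    ("bob", "Matrix", 4), ("bob", "Alien", 5), ("bob", "Titanic", 1),
    ("cem", "Alien", 4), ("cem", "Amelie", 5), ("cem", "Titanic", 3),
]

# 1) Modeling stage: a sparse user-item rating matrix (absent cell = unrated)
rating_matrix = defaultdict(dict)
for user, item, stars in ratings:
    rating_matrix[user][item] = stars
all_items = {item for _, item, _ in ratings}


# 2) Prediction stage: baseline score = mean rating of the item over users who rated it
def predict(item):
    scores = [row[item] for row in rating_matrix.values() if item in row]
    return sum(scores) / len(scores)


# 3) Recommendation stage: rank the user's unseen items by predicted score, keep the top N
def recommend(user, n=2):
    unseen = all_items - set(rating_matrix[user])
    return sorted(unseen, key=predict, reverse=True)[:n]


print(recommend("ann"))  # ['Amelie', 'Alien']
```

Replacing `predict` with a personalized model (for example, one of the collaborative filtering approaches) changes only stage 2; stages 1 and 3 stay the same.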

The primary goal of recommendation systems is to solve the problem of information overload on the web and provide users with different resources related to products, hotels, restaurants, and services. To achieve this fundamental goal, RS typically takes into account several common objectives such as relevance/precision, novelty, serendipity, and diversity. 
 
* These four objectives are briefly explained as follows:
    
    * **Relevance/Precision** - Relevance and precision, often used interchangeably, are the most important evaluation criteria for RS. Users prefer items that are relevant to their interests, so an RS focuses on recommending items that match the user's preferences. Precision reflects how well the recommended items match the user's needs and how much the user likes them. Higher values of metrics such as precision, recall, and F-score indicate a more accurate RS model, while lower values of error metrics on predicted ratings, such as MAE (mean absolute error) and MSE (mean squared error), indicate better accuracy.

    * **Novelty** - Novelty is a fundamental aspect of successful recommendations and a key measure of customer satisfaction. Novelty in RS refers to the system's ability to generate new recommendations for users. A recommendation is considered new when it has three characteristics: unknown, satisfying, and different. Unknown items are items the user is not yet aware of, satisfying items are items the user is pleased with, and different items are items that are distinct from those already in the user's profile. Novelty is inversely related to popularity: the more popular the recommended items are, the less novel the recommendations are.

    * **Serendipity** - Serendipity is closely related to novelty. It refers to recommending items that the user did not know about and would not have expected, but that turn out to be pleasant surprises. An RS that recommends serendipitous items can boost sales and build trust with users. Such chance discoveries also improve the user experience by stimulating the user's curiosity.

    * **Diversity** - Diversity is the opposite of similarity. It is measured over a set of recommended items and describes how different those items are from each other. Providing diverse recommendations helps avoid over-specialization (recommending only items very similar to those the user already knows) and improves the user's experience with the RS. Diversity also keeps users from becoming dissatisfied by repeatedly receiving the same recommendations.
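The relevance metrics named above, plus one common way to quantify novelty and diversity, can be sketched as follows. Precision@k, recall@k, F-score, MAE, and MSE use their standard definitions; measuring novelty as mean self-information (`-log2` of an item's popularity) and diversity as the average pairwise Jaccard distance between item feature sets are assumptions, chosen from several definitions in use. All function names and inputs are hypothetical.

```python
import math


def precision_recall_f1_at_k(recommended, relevant, k):
    """Precision@k, recall@k and F1@k for one user's ranked recommendation list."""
    hits = len(set(recommended[:k]) & set(relevant))
    precision = hits / k
    recall = hits / len(relevant)
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings (lower is better)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error between true and predicted ratings (lower is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def novelty(recommended, popularity):
    """Mean self-information; popularity[i] = share of users who know item i (0 < p <= 1)."""
    return sum(-math.log2(popularity[i]) for i in recommended) / len(recommended)


def intra_list_diversity(recommended, features):
    """Average pairwise Jaccard distance between the feature sets of recommended items."""
    pairs = [(a, b) for i, a in enumerate(recommended) for b in recommended[i + 1:]]
    distances = [1 - len(features[a] & features[b]) / len(features[a] | features[b])
                 for a, b in pairs]
    return sum(distances) / len(distances)


# Hypothetical example data
p, r, f = precision_recall_f1_at_k(["A", "B", "C", "D"], ["A", "C", "E"], k=4)
print(round(p, 3), round(r, 3), round(f, 3))          # 0.5 0.667 0.571
print(mae([4, 3, 5], [3.5, 3, 4]), mse([4, 3, 5], [3.5, 3, 4]))
print(novelty(["A", "B"], {"A": 0.5, "B": 0.25}))     # 1.5
genres = {"A": {"action"}, "B": {"action", "sci-fi"}, "C": {"drama"}}
print(intra_list_diversity(["A", "B", "C"], genres))
```

Less popular items yield higher self-information, which matches the text's point that novelty runs against popularity.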
    
**Data and Information Sources**

Recommendation systems (RS) are information processing systems that actively collect various types of data to create recommendations. The data typically includes information about items to be recommended and the users who will receive these recommendations. However, the available data and information sources for recommendation systems can be quite diverse, depending on the recommendation technique being used.

The data used by RS refers to three types of entities: items, users, and transactions (user-item interactions), which represent the relationships between users and items.

**Items** - Items are the objects to be recommended. They can be characterized by their complexity and by their value (utility) for users. An item's value is positive if the item is useful to the user, and negative if it is not useful, meaning the user made a wrong decision by choosing it.

Depending on the underlying recommendation technique, an RS can use item features. For example, in a movie recommendation system, features such as genre (comedy, thriller, etc.), director, and actors can be used to describe a movie and to learn why it is considered useful.

**Users** - RS users can have various goals and characteristics. To personalize recommendations and human-computer interactions, RS uses various information about users. This information can be structured in different ways, and the choice of which information to model depends on the recommendation technique.

For example, in collaborative filtering, users are modeled as simple lists of the ratings they have given to some items. In demographic-based RS, socio-demographic attributes such as age, gender, profession, and education are used. Together, this user data forms the user model, which profiles the user by encoding their preferences and needs.

**Transactions** - Transactions are user interactions with a recommendation system (RS), typically recorded as log-like data. These interactions include item selections, user context, and potentially explicit feedback like item ratings. Ratings, either explicitly provided by users on a scale or implicitly gathered, are common transaction data collected by RS.

Various forms of ratings include:

- Numerical ratings like 1-5 stars associated with book recommendations on Amazon.com.
- Ordinal ratings where users are asked to choose a term that best represents their opinion, often conducted through surveys, such as "strongly agree, agree, neutral, disagree, strongly disagree."
- Binary ratings where users are simply asked to decide if a specific item is good or bad.

These ratings are valuable data used by RS to make recommendations and improve the user experience.

<center><img src="https://i.imgur.com/M1pKmtr.png" width="800" height="800"></center>

# Content

1. [🍔 Association Rule Learning 🍟](#1)
    * [Apriori Algorithm](#2)
    * [Association Rule Based Recommender System](#3)
        * [Data Preprocessing](#4)
        * [Preparing Invoice-StockCode (Product) Matrix](#5)
        * [Association Rules](#6)
        * [Product Recommendation](#7)
1. [🧾 Content Based Filtering 🧾](#8)
    * [Count Vector](#9)
    * [TF-IDF](#10)
    * [Content Based Recommender System](#11)
        * [Data Preprocessing](#12)
        * [Creating the TF-IDF Matrix](#13)
        * [Movie Recommendation](#14)
1. [🫂 Collaborative Filtering 🫂](#15)
    * [User-Based Collaborative Filtering](#16)
    * [User-Based Recommender System](#17)
        * [Preparing Data](#18)
        * [Creating User-Movie DataFrame](#19)
        * [Determination of Similarity](#20)
        * [Score Calculation](#21)
    * [Item-Based Collaborative Filtering](#22)
    * [Item-Based Recommender System](#23)
    * [Model-Based Collaborative Filtering](#24)
    * [Gradient Descent](#25)
    * [Model-Based Recommender System](#26)
        * [Preparing Data](#27)
        * [Modelling](#28)
        * [Model Tuning](#29)
        * [Predict](#30)
1. [Sources](#31)

<a id="1"></a>
<h1 style="border-radius: 10px; border: 2px solid #6B8E23; background-color: #F5F5DC; font-family: 'Pacifico', cursive; font-size: 200%; text-align: center; border-radius: 15px 50px; padding: 15px; box-shadow: 5px 5px 5px #556B2F; color: #556B2F;">🍔 Association Rule Learning 🍟</h1>

Association Rule Learning is an unsupervised learning technique that examines how the occurrence of one data item depends on another and maps these dependencies so that they can be used profitably. It aims to discover interesting relationships or associations among the variables of a dataset, using rules to uncover them.

Association Rule Learning is one of the essential concepts in machine learning and is used in fields such as Market Basket Analysis, Web Usage Mining, and continuous production. Market Basket Analysis, for instance, is a technique many large retailers use to uncover associations between items: a supermarket, for example, places products that are often purchased together near each other.

Three algorithms are commonly used for association rule learning:

1. **Apriori Algorithm:** The Apriori algorithm generates association rules from frequent itemsets. It is designed for transactional databases and uses a breadth-first (level-wise) search together with a hash tree to count candidate itemsets efficiently. It is commonly used for market basket analysis to understand which products tend to be bought together, and it can also be applied in healthcare, for example to discover drug reactions in patients.

2. **Eclat Algorithm:** The Eclat algorithm (Equivalence Class Transformation) uses a depth-first search to find frequent itemsets in a transaction database. It is often faster than Apriori, especially on large datasets, because it computes support by intersecting the sets of transaction IDs that contain each item instead of repeatedly scanning the database.

3. **F-P Growth Algorithm:** The F-P Growth algorithm (Frequent Pattern Growth) is an improvement on Apriori. It compresses the database into a tree structure called a frequent pattern tree (FP-tree) and extracts the frequent patterns directly from this tree, without generating candidate itemsets level by level.
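The Eclat idea (a vertical layout of transaction-ID sets plus a depth-first search) can be sketched in plain Python. This is an illustrative sketch, not a library implementation; the function name `eclat` and the basket data are hypothetical.

```python
def eclat(transactions, min_support):
    """Frequent itemsets via Eclat: depth-first search over vertical tid-sets."""
    n = len(transactions)
    # Vertical layout: item -> set of ids of the transactions that contain it
    tidsets = {}
    for tid, basket in enumerate(transactions):
        for item in set(basket):
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Support of prefix + item = size of the intersection of their tid-sets
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) / n >= min_support:
                itemset = prefix | {item}
                frequent[itemset] = len(new_tids) / n
                # Depth-first: grow this itemset using only the items that come after it
                extend(itemset, new_tids, candidates[i + 1:])

    extend(frozenset(), set(), sorted(tidsets.items()))
    return frequent


# Hypothetical baskets
baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
for itemset, supp in sorted(eclat(baskets, 0.6).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), supp)
```

Support is read off as the size of an intersection, so the database is scanned only once, when the tid-sets are built.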

<center><img src="https://i.imgur.com/cGuc8Na.png" width="400" height="400"></center>

<div style="border-radius:10px; border:#632626 solid; padding: 15px; background-color: #FDF6EC; font-size:100%; text-align:left">

<h3 align="left"><font color='#11324D'>💡 An Example: </font></h3>

**Scenario - Analyzing Product Relationships of a Coffee Shop:**

**Description:**

A coffee shop wants to track which products are often ordered together by customers and use this information to improve their marketing strategies. We aim to analyze these relationships using the Apriori algorithm.

**Data Collection:**

The coffee shop records product lists for each order transaction. Below are a few sample order transactions:

* Transaction 1:
    - Espresso
    - Scone
    - Latte

* Transaction 2:
    - Cappuccino
    - Scone

* Transaction 3:
    - Latte
    - Muffin

* Transaction 4:
    - Espresso
    - Cappuccino
    - Muffin

**Data Preprocessing:**

Data needs to be converted into an appropriate format (one list of products per transaction), and duplicate products within the same transaction should be removed.

**Rule Extraction with the Apriori Algorithm:**

Using the Apriori algorithm, we can identify frequent combinations of products that are often ordered together. For example, if "Espresso" and "Cappuccino" are frequently ordered together, we can extract a rule. This process results in rules that express relationships between products in the transactions.

**Example Rules:**

As a result of the Apriori algorithm, we can create rules such as:

- When an Espresso is ordered, a Cappuccino is often ordered as well.
- When a Scone is ordered, a Latte is often ordered as well.
- When a Muffin is ordered, Espresso or Cappuccino is often ordered as well.

**Calculation of Metrics:**

Let's calculate the metrics for these rules. Assume that when an Espresso is ordered, a Cappuccino is often ordered as well. Below are the calculations:

Support(Espresso → Cappuccino) = (Number of transactions where Espresso and Cappuccino are purchased together) / (Total number of transactions)

Confidence(Espresso → Cappuccino) = (Number of transactions where Espresso and Cappuccino are purchased together) / (Number of transactions where Espresso is purchased)

Lift(Espresso → Cappuccino) = (Confidence(Espresso → Cappuccino)) / (Support(Cappuccino))
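Applying these formulas to the four sample transactions above gives concrete numbers. A minimal sketch in plain Python; the helper name `support` is illustrative.

```python
# The four sample transactions from the coffee-shop scenario
transactions = [
    {"Espresso", "Scone", "Latte"},
    {"Cappuccino", "Scone"},
    {"Latte", "Muffin"},
    {"Espresso", "Cappuccino", "Muffin"},
]


def support(itemset):
    """Share of transactions that contain every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


supp_rule = support({"Espresso", "Cappuccino"})   # 1 of 4 transactions
confidence = supp_rule / support({"Espresso"})      # 1 of the 2 Espresso orders
lift = confidence / support({"Cappuccino"})         # divided by 2/4
print(supp_rule, confidence, lift)  # 0.25 0.5 1.0
```

On this tiny sample the lift is 1.0: Espresso and Cappuccino co-occur exactly as often as independence would predict, so four transactions are not enough to support the rule. The example rules above illustrate the rule format rather than results computed from these four orders.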

<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>👀 What are association rules? </font></h3>

Market basket analysis is based on discovering associations between items by using association rules which take the form of if-then relationships. To build an association rule, we should have at least one antecedent and one consequent.

* One antecedent and one consequent: if { 🍪 } then { ☕️ }
* Multi antecedent: if { 🍪, 🍰 } then { ☕️}
* Multi consequent: if { 🍪 } then { ☕️, 🥛 }
    
"Antecedent" and "consequent" are terms used in various data mining and data analysis techniques, such as association rule learning. Here are the meanings of these terms:

**Antecedent**:

- "Antecedent" represents the starting or conditional part of an association rule.
- This is typically the "If..." part of a rule and represents the precursor of an event or condition.
- The antecedent is an expression where a specific situation or condition is met.
- Example: In the rule "if a customer purchases coffee, then they also buy bread," "coffee" is the antecedent.

**Consequent**:

- "Consequent" represents the result or outcome part of an association rule.
- This is typically the "Then..." part of a rule and represents the consequence of an event or condition.
- The consequent indicates the event or condition that will occur if the antecedent is met.
- Example: If a customer purchases "coffee," then they may also buy "bread" (bread being the consequent).

Association rule learning is often used to determine relationships between antecedents and consequents. It is commonly applied in applications like market basket analysis to identify situations where specific items are purchased together and to make recommendations based on these relationships.
    
The number of possible rules grows exponentially with the number of unique items. Not all rules are equally important, so we need metrics to help us decide which rules to eliminate and which to keep.
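To see how fast the rule space grows, the sketch below counts every possible rule X → Y (X and Y non-empty, with no item in both) by brute force and compares it with the closed form 3^d - 2^(d+1) + 1 for d unique items. The derivation: each item goes to the antecedent, the consequent, or neither (3^d ways); subtract the assignments with an empty antecedent or an empty consequent (2^d each) and add back the single assignment where both are empty. The function names are hypothetical.

```python
from itertools import product


def count_rules_brute_force(d):
    """Count rules X -> Y over d items with X, Y non-empty and disjoint."""
    # Each item goes to the antecedent ("A"), the consequent ("C"), or neither ("-")
    return sum(
        1
        for assignment in product("AC-", repeat=d)
        if "A" in assignment and "C" in assignment
    )


def count_rules_formula(d):
    """Closed form for the same count: 3^d - 2^(d+1) + 1."""
    return 3 ** d - 2 ** (d + 1) + 1


for d in range(1, 7):
    print(d, count_rules_brute_force(d), count_rules_formula(d))
```

With only 20 unique items the count already exceeds three billion, which is why rules are pruned by support and the other metrics below rather than enumerated exhaustively.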

**Support**
    
Support is the most basic metric for measuring how interesting and important a rule is. It can be applied to a single item or to the combined antecedent and consequent of a rule. It is calculated by dividing the number of transactions that contain the item(s) by the total number of transactions. Support ranges from 0 to 1.

For example, if we have 10 transactions and 6 of them include coffee and 4 of them include both coffee and cookie, then support of {coffee} is 60% and support of {coffee, cookie} is 40%.
    
<center><img src="https://i.imgur.com/it5fzCB.png" width="400" height="400"></center>
    
**Confidence**
    
Confidence is the probability of purchasing item B given that item A was purchased. It is calculated by dividing the support of {A, B} by the support of A. Confidence ranges from 0 to 1.

Support should be used together with confidence, because relying on support alone can mislead the interpretation of results when some items are very popular.
    
Note that confidence is not symmetric: Confidence(A→B) is generally different from Confidence(B→A).
    
<center><img src="https://i.imgur.com/SJPZfEd.png" width="400" height="400"></center>
        
**Lift**
    
Lift is used to detect uninteresting association rules, which makes rule pruning easier. Items A and B are independent if P(A ∩ B) = P(A)P(B), that is, if the probability that a transaction contains both equals the product of the probabilities of each; otherwise they are dependent and therefore correlated. Lift is calculated by dividing the proportion of transactions that contain both A and B by the product of the proportions of transactions that contain A and B individually. Lift ranges from 0 to infinity.


* Lift(A → B) > 1 means that the items are positively correlated: the occurrence of one increases the likelihood of the other.

* Lift(A → B) = 1 means that there is no correlation (the items are independent).

* Lift(A → B) < 1 means that the items are negatively correlated: the occurrence of one decreases the likelihood of the other.
                                 
<center><img src="https://i.imgur.com/2ULqH4g.png" width="400" height="400"></center>

**Leverage**

Leverage measures the difference between how often items A and B appear together in a transaction and how often they would appear together if they were independent. It is calculated by subtracting the product of the individual proportions of A and B from the proportion of transactions that contain both: Leverage(A → B) = Support({A, B}) - Support(A) × Support(B). Leverage is 0 when A and B are independent, and because both terms are probabilities it always lies between -0.25 and 0.25.

**Conviction**

Conviction measures how much the consequent depends on the antecedent. It is calculated by multiplying the proportion of transactions that contain A by the proportion of transactions that do not contain B, and dividing the result by the proportion of transactions that contain A but not B. Equivalently, Conviction(A → B) = (1 - Support(B)) / (1 - Confidence(A → B)). Conviction ranges from 0 to infinity; it equals 1 when A and B are independent and is infinite when the confidence is 1.

If the conviction value is high, this means that the consequent is highly dependent on the antecedent.
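All of the rule metrics above can be computed from just three supports. A minimal sketch (the function name `rule_metrics` is hypothetical) using the coffee/cookie numbers from the support example; the support of cookie on its own is not given in the text, so 0.5 is an assumed value for illustration.

```python
import math


def rule_metrics(supp_a, supp_c, supp_ac):
    """Confidence, lift, leverage and conviction of the rule A -> C.

    supp_a, supp_c: supports of the antecedent and consequent on their own;
    supp_ac: support of the antecedent and consequent appearing together.
    """
    confidence = supp_ac / supp_a
    lift = confidence / supp_c                # same as supp_ac / (supp_a * supp_c)
    leverage = supp_ac - supp_a * supp_c      # 0 when A and C are independent
    # (1 - supp_c) / (1 - confidence) equals P(A) * P(not C) / P(A and not C)
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


# 6 of 10 transactions contain coffee, 4 contain coffee and cookie (from the text);
# the cookie support of 0.5 is a hypothetical value added for this example.
coffee, cookie, both = 0.6, 0.5, 0.4
print(rule_metrics(coffee, cookie, both))   # coffee -> cookie
print(rule_metrics(cookie, coffee, both))   # cookie -> coffee
```

Swapping antecedent and consequent changes confidence (about 0.67 vs. 0.8) and conviction (1.5 vs. 2.0) but leaves lift and leverage unchanged: confidence and conviction are directional, while lift and leverage are symmetric.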

<a id = "2"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Apriori Algorithm✨</p>

The Apriori algorithm is used for mining frequent itemsets and devising association rules from a transactional database. The parameters “support” and “confidence” are used. Support refers to items’ frequency of occurrence; confidence is a conditional probability.

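Items in a transaction form an itemset. The algorithm begins by identifying the frequent individual items in the database (items whose support is greater than or equal to the given minimum support) and then extends them to larger frequent itemsets. It relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so any candidate with an infrequent subset can be pruned without counting it.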
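The level-wise search can be sketched in plain Python. This is a simplified illustration under stated assumptions: the function is named `apriori_sketch` (hypothetical, to avoid clashing with mlxtend's `apriori` used later), the baskets are made up, and support is counted by scanning the baskets per candidate instead of with a hash tree.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Level-wise frequent itemset mining; returns {frozenset(itemset): support}."""
    baskets = [frozenset(t) for t in transactions]   # duplicates within a basket collapse
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent individual items
    singles = {frozenset([i]): support(frozenset([i])) for b in baskets for i in b}
    frequent = {s: v for s, v in singles.items() if v >= min_support}
    current = set(frequent)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset of a candidate must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count: keep only the candidates that reach the minimum support
        counted = {c: support(c) for c in candidates}
        current = {c for c, v in counted.items() if v >= min_support}
        frequent.update({c: counted[c] for c in current})
        k += 1
    return frequent


# Hypothetical baskets
baskets = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]
print(apriori_sketch(baskets, min_support=0.6))
```

Here {bread, diaper, beer} is pruned without being counted because {bread, beer} is infrequent, while {bread, milk, diaper} survives pruning but fails the support threshold (0.4).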

**What Is An Itemset?**

A set of items is called an itemset, and an itemset with k items is called a k-itemset, so an itemset can contain one or more items. An itemset that occurs frequently is called a frequent itemset, and frequent itemset mining is the data mining technique used to identify items that often occur together.

For example: {bread, butter} or {laptop, antivirus software}.

**What Is A Frequent Itemset?**

An itemset is called frequent if its support meets a minimum support threshold. Support measures how often the items are purchased together in the same transaction. Confidence does not define frequent itemsets; it is a property of the rules derived from them and measures how often the consequent is purchased when the antecedent is purchased.

In frequent itemset mining, we therefore keep only the itemsets that meet the minimum support threshold, and then keep only the rules that also meet the minimum confidence threshold. Insights from these mining algorithms can bring benefits such as cost savings and a competitive advantage.

There is a trade-off between the time taken to mine the data and the volume of data. Efficient frequent pattern mining algorithms aim to find the hidden itemset patterns in a short time and with low memory consumption.

**Frequent Pattern Mining (FPM)**

The frequent pattern mining algorithm is one of the most important techniques of data mining to discover relationships between different items in a dataset. These relationships are represented in the form of association rules. It helps to find the irregularities in data.

FPM has many applications, including data analysis, software bug detection, cross-marketing, sales campaign analysis, and market basket analysis.

Frequent itemsets discovered through Apriori are used in many data mining tasks, such as finding interesting patterns in a database, discovering sequences, and mining association rules, the last of which is the most important.

Association rules are applied to supermarket transaction data to examine customer behavior in terms of the products purchased. They describe how often items are purchased together.

<a id = "3"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Association Rule Based Recommender System✨</p>

<a id = "4"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Data Preprocessing</div>

In [25]:
# !pip install mlxtend

import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules, fpmax, hmine

import warnings
warnings.filterwarnings("ignore")

# pd.set_option('display.max_columns', None)
# pd.set_option('display.width', 500)
# pd.set_option('display.expand_frame_repr', False)

In [2]:
df_ = pd.read_csv("/kaggle/input/online-retail-ii-data-set-from-ml-repository/Year 2010-2011.csv",encoding='iso-8859-9')

In [3]:
df = df_.copy()  # Work on a copy so the raw data does not have to be re-read from disk.
df.head()

|   | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |


In [4]:
df.shape

(541910, 8)

In [5]:
df.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Quantity | 541910.0 | 9.552234 | 218.080957 | -80995.0 | 1.0 | 3.0 | 10.0 | 80995.0 |
| Price | 541910.0 | 4.611138 | 96.759765 | -11062.06 | 1.25 | 2.08 | 4.13 | 38970.0 |
| Customer ID | 406830.0 | 15287.68416 | 1713.603074 | 12346.0 | 13953.0 | 15152.0 | 16791.0 | 18287.0 |


<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
* As we can see, there are negative values. They come from returned (canceled) items, which are recorded with negative quantities, so we need to identify and remove them.
* Additionally, comparing the "mean" and "max" values (for example, a maximum Quantity of 80995 against a mean of about 9.6) reveals the presence of outliers.

In [6]:
df.isnull().sum()

Invoice             0
StockCode           0
Description      1454
Quantity            0
InvoiceDate         0
Price               0
Customer ID    135080
Country             0
dtype: int64

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
* As seen, there is missing data in the Customer ID and Description features.
* Let's remove these missing rows and apply the other operations one by one to prepare the data.

In [7]:
df.dropna(inplace=True)

In [8]:
df = df[~df["Invoice"].str.contains("C", na=False)]

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
* Invoice numbers containing "C" denote canceled transactions. Their quantities are recorded as negative values, which is why negative minimums appeared in the describe() output, so we remove these rows.

In [9]:
df = df[df["Quantity"] > 0]
df = df[df["Price"] > 0]

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
* Since Price and Quantity cannot be zero or negative, we set new thresholds and keep only the rows where both are greater than zero.

----

In [10]:
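def corr_skew_outliner(df, cols):
    """Cap (winsorize) each column in `cols` at its 5th and 95th percentiles, in place."""
    for col in cols:
        lower = df[col].quantile(0.05)
        upper = df[col].quantile(0.95)
        # Values below the 5th / above the 95th percentile are replaced by that bound
        df[col] = df[col].clip(lower=lower, upper=upper)
        # df[col] = np.sqrt(df[col])  # optional skew reduction (needs `import numpy as np`)

    return df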

-----

In [11]:
cols = ["Quantity","Price"]

corr_skew_outliner(df,cols)

|   | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 12/9/2011 12:50 | 2.10 | 12680.0 | France |
| 541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
| 541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 12/9/2011 12:50 | 4.15 | 12680.0 | France |
| 541908 | 581587 | 22138 | BAKING SET 9 PIECE RETROSPOT | 3 | 12/9/2011 12:50 | 4.95 | 12680.0 | France |


In [12]:
df.describe().T

|   | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Quantity | 397885.0 | 8.868002 | 9.523421 | 1.0 | 2.0 | 6.0 | 12.0 | 36.0 |
| Price | 397885.0 | 2.6758 | 2.275069 | 0.42 | 1.25 | 1.95 | 3.75 | 8.5 |
| Customer ID | 397885.0 | 15294.416882 | 1713.144421 | 12346.0 | 13969.0 | 15159.0 | 16795.0 | 18287.0 |


<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>
    
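* The negative values have been removed, and Quantity and Price are now capped at their 5th and 95th percentiles, so extreme values no longer distort the data.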

<a id = "5"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Preparing Invoice-StockCode (Product) Matrix</div>

In [13]:
df.head()

|   | Invoice | StockCode | Description | Quantity | InvoiceDate | Price | Customer ID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 12/1/2010 8:26 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 12/1/2010 8:26 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 12/1/2010 8:26 | 3.39 | 17850.0 | United Kingdom |


<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* We will pivot the data into an invoice-by-product matrix: each row is an invoice, each column is a product (StockCode), and each cell indicates in binary form (True/False, i.e., 1/0) whether that product appears in the invoice.

In [14]:
df_fr = df[df['Country'] == "France"]  # Restrict the analysis to invoices from France

In [15]:
# One row per invoice, one column per StockCode; True if the product appears in the invoice
inv_pro_matrix = df_fr.groupby(["Invoice","StockCode"])["Quantity"].count().unstack().notnull()
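To see what the `groupby(...).count().unstack().notnull()` idiom produces, it can be run on a tiny made-up invoice log with the same column names. This is an illustrative sketch; the toy data is hypothetical, and the `pd.crosstab` line is shown only as an equivalent alternative, not as what the notebook uses.

```python
import pandas as pd

# Hypothetical long-format invoice lines with the same columns used above
toy = pd.DataFrame({
    "Invoice":   ["A1", "A1", "A2", "A3", "A3", "A3"],
    "StockCode": ["P1", "P2", "P2", "P1", "P3", "P3"],
    "Quantity":  [6, 2, 4, 1, 3, 5],
})

# Same idiom as the cell above: count lines per (invoice, product), pivot the
# products into columns (missing pairs become NaN), then map present -> True.
toy_matrix = toy.groupby(["Invoice", "StockCode"])["Quantity"].count().unstack().notnull()

# An equivalent formulation: cross-tabulate line counts and test for > 0
toy_alt = pd.crosstab(toy["Invoice"], toy["StockCode"]) > 0
print(toy_matrix)
```

Because the cells only record presence, a product that appears on two lines of the same invoice (P3 in A3) still yields a single True.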

In [16]:
inv_pro_matrix

| Invoice / StockCode | 10002 | 10120 | 10125 | 10135 | 11001 | 15036 | 15039 | 15044C | 15056BL | 15056N | ... | 90030C | 90031 | 90099 | 90184B | 90184C | 90201B | 90201C | C2 | M | POST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 536370 | True | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 536852 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 536974 | False | False | False | False | False | False | False | False | True | False | ... | False | False | False | False | False | False | False | False | False | True |
| 537065 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 537463 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 580986 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 581001 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 581171 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 581279 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |


-----

In [17]:
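def check_id(dataframe, stock_code):
    """Print the product description for a given StockCode."""
    # Take the Description of the first row that matches the StockCode
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)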

In [18]:
check_id(df_fr, "10120")

['DOGGY RUBBER']


-----

<a id = "6"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Association Rules</div>

In [26]:
frequent_itemsets = hmine(inv_pro_matrix,
                            min_support=0.01,
                            use_colnames=True)

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* Here, we generate the frequent itemsets with the H-Mine algorithm (`hmine`). Apriori would also work, but H-Mine is typically faster on large, sparse transaction data like this.
* The `min_support` parameter keeps only the itemsets whose support is at least the given threshold (here 0.01).
* `use_colnames=True` makes the itemsets contain the actual column names (StockCodes) instead of column indices.

In [27]:
frequent_itemsets.sort_values("support", ascending=False)

|   | support | itemsets |
|---|---|---|
| 40654 | 0.773779 | (POST) |
| 39604 | 0.187661 | (23084) |
| 18932 | 0.179949 | (21731) |
| 34878 | 0.172237 | (22554) |
| 36130 | 0.169666 | (22556) |
| ... | ... | ... |
| 21375 | 0.010283 | (22554, 22745, 22748, 22138) |
| 21374 | 0.010283 | (22554, 22745, 22138) |
| 21373 | 0.010283 | (22659, 22554, 22138) |
| 21372 | 0.010283 | (22554, POST, 22138, 22556) |


In [28]:
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.01)

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* At this point, we generate association rules from the frequent itemsets.
* With `metric="support"` and `min_threshold=0.01`, only rules whose support is at least 0.01 are kept; the rules are filtered, not ranked, at this step.

In [29]:
rules.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
antecedent support,1372704.0,0.034928,0.072807,0.010283,0.010283,0.012853,0.028278,0.773779
consequent support,1372704.0,0.034928,0.072807,0.010283,0.010283,0.012853,0.028278,0.773779
support,1372704.0,0.01081,0.002242,0.010283,0.010283,0.010283,0.010283,0.167095
confidence,1372704.0,0.675912,0.337215,0.013289,0.363636,0.8,1.0,1.0
lift,1372704.0,43.230717,34.530939,0.409905,10.805556,32.416667,77.8,97.25
leverage,1372704.0,0.009889,0.001936,-0.014803,0.009595,0.010051,0.010177,0.10555
conviction,1372704.0,inf,,0.35549,1.547191,4.845758,,inf
zhangs_metric,1372704.0,0.954213,0.118284,-0.832938,0.960938,0.994751,1.0,1.0


In [31]:
rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
sort_values("confidence", ascending=False).head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
504562,"(21080, 21086)",(21094),0.102828,0.128535,0.100257,0.975,7.5855,0.08704,34.858612,0.967673
504563,"(21080, 21094)",(21086),0.102828,0.138817,0.100257,0.975,7.023611,0.085983,34.447301,0.955918
529049,"(21080, 21086, POST)",(21094),0.084833,0.128535,0.082262,0.969697,7.544242,0.071358,28.758355,0.947858
529050,"(21080, 21094, POST)",(21086),0.084833,0.138817,0.082262,0.969697,6.98541,0.070486,28.419023,0.936271
560439,(21094),(21086),0.128535,0.138817,0.123393,0.96,6.915556,0.10555,21.529563,0.981563
608604,"(21094, POST)",(21086),0.107969,0.138817,0.102828,0.952381,6.86067,0.08784,18.084833,0.957637
1369995,(23256),(23254),0.069409,0.071979,0.064267,0.925926,12.863757,0.059271,12.528278,0.99105
1358526,"(22726, 22728, POST)",(22727),0.064267,0.095116,0.059126,0.92,9.672432,0.053013,11.311054,0.958194
903512,(21988),(21987),0.056555,0.064267,0.051414,0.909091,14.145455,0.047779,10.293059,0.985014
1369994,(23254),(23256),0.071979,0.069409,0.064267,0.892857,12.863757,0.059271,8.685518,0.993795
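To make the support, confidence, and lift columns above more concrete, here is a minimal sketch that computes them by hand for a single rule on a tiny, hypothetical invoice-product matrix (the product names and the `rule_metrics` helper are illustrative, not part of the notebook). It follows the usual definitions, which are also the ones used by mlxtend's `association_rules`: support is the share of invoices containing both sides of the rule, confidence is that support divided by the antecedents' support, and lift is confidence divided by the consequents' support.

```python
import pandas as pd

# Hypothetical one-hot invoice-product matrix: True = the invoice contains the product
baskets = pd.DataFrame(
    {
        "napkins": [True, True, True, False, True],
        "cups": [True, True, False, False, True],
        "plates": [True, True, False, True, False],
    }
)


def rule_metrics(df, antecedents, consequents):
    """Support, confidence and lift of the rule antecedents -> consequents."""
    has_a = df[list(antecedents)].all(axis=1)  # invoices containing every antecedent item
    has_c = df[list(consequents)].all(axis=1)  # invoices containing every consequent item
    support = (has_a & has_c).mean()           # share of invoices containing both sides
    confidence = support / has_a.mean()        # P(consequents | antecedents)
    lift = confidence / has_c.mean()           # confidence relative to the consequents' base rate
    return support, confidence, lift


support, confidence, lift = rule_metrics(baskets, ["napkins"], ["cups"])
print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.6 0.75 1.25
```

Here 3 of the 5 invoices contain both napkins and cups (support 0.6), 3 of the 4 napkin invoices also contain cups (confidence 0.75), and because cups appear in 60% of all invoices, the lift is 0.75 / 0.6 = 1.25. A lift above 1 means the items are bought together more often than they would be if they were independent.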


In [32]:
check_id(df_fr, "21086")

['SET/6 RED SPOTTY PAPER CUPS']


<a id = "7"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Product Recommendation</div>

In [33]:
sorted_rules = rules.sort_values("lift", ascending=False)

In [34]:
product_id = "21080"
check_id(df, product_id)

['SET/20 RED RETROSPOT PAPER NAPKINS ']


In [35]:
recommendation_list = sorted_rules.loc[sorted_rules["antecedents"].apply(lambda x: product_id in x), "consequents"].head(3).tolist()

In [36]:
recommendation_list

[frozenset({'21094', '22492', '22556', '22728'}),
 frozenset({'21094', '22492', '22556', '22728'}),
 frozenset({'21094', '22492', '22551', '22728'})]

In [37]:
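def arl_recommender(rules_df, product_id, rec=1):
    # Rank the rules so that the strongest associations (highest lift) come first
    sorted_rules = rules_df.sort_values("lift", ascending=False)

    recommendation_list = []

    # Walk the rules in lift order; whenever the target product is among the
    # antecedents, collect the consequent items that are not already listed
    for antecedents, consequents in zip(sorted_rules["antecedents"], sorted_rules["consequents"]):
        if product_id in antecedents:
            for item in consequents:
                if item not in recommendation_list:
                    recommendation_list.append(item)

    # Return the first `rec` distinct items
    return recommendation_list[:rec]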

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* This function sorts the association rules by lift in descending order, keeps the rules whose "antecedents" contain the target product, and collects the items in their "consequents" column, skipping duplicates.
* As a result, the first `rec` items returned come from the highest-lift rules involving the target product.

In [38]:
arl_recommender(rules, "22492", 1)

['22631']

In [39]:
arl_recommender(rules, "22492", 2)

['22631', '23238']

In [40]:
arl_recommender(rules, "22492", 3)

['22631', '23238', '22556']

In [41]:
check_id(df_fr, "22492")
check_id(df_fr, "22551")
check_id(df_fr, "23238")
check_id(df_fr, "22631")

['MINI PAINT SET VINTAGE ']
['PLASTERS IN TIN SPACEBOY']
['SET OF 4 KNICK KNACK TINS LONDON ']
['CIRCUS PARADE LUNCH BOX ']


<div style="border-radius:10px; border:#D0C2F0 solid; padding: 15px; background-color: #F8E8EE; font-size:100%; text-align:left">

<h3 align="left"><font color='#5E5273'>👻 Analysis Results: </font></h3>

* First, we cleaned and prepared the data.
* We created a transaction-product matrix to determine frequent itemsets and association rules.
* We generated association rules using a minimum support threshold.
* We filtered the DataFrame of association rules on support, confidence, and lift.
* Finally, we made recommendations ranked by the highest lift value.
* We expect that a customer who buys "**MINI PAINT SET VINTAGE**" (22492) may also be interested in the following 3 items:
   * **CIRCUS PARADE LUNCH BOX** (22631)
   * **SET OF 4 KNICK KNACK TINS LONDON** (23238)
   * Item **22556** (its description is not looked up above; **PLASTERS IN TIN SPACEBOY** is item 22551, which is not among these three recommendations)

<a id="8"></a>
<h1 style="border-radius: 10px; border: 2px solid #6B8E23; background-color: #F5F5DC; font-family: 'Pacifico', cursive; font-size: 200%; text-align: center; border-radius: 15px 50px; padding: 15px; box-shadow: 5px 5px 5px #556B2F; color: #556B2F;">🧾 Content Based Filtering 🧾</h1>

Content-based filtering is a type of recommendation system that can provide highly personalized item recommendations to each user. If you enjoy watching Marvel movies, you are more likely to watch a Batman film in the future than "The Fault in Our Stars." Content-based filtering builds on exactly this intuition: it recommends items similar to what you have liked in the past, based on the items' metadata and features (such as genre, description, or director).

What truly sets content-based filtering apart from other recommendation systems is that it doesn't actually require other people's data. All recommendations are made solely based on your data and preferences.

Now you might ask, how does a content-based recommendation algorithm work to recommend similar items?

There are many different ways to perform content-based filtering, meaning it can recommend items based on one or more features. For example, let's assume that you liked "The Dark Knight" in the past. Content-based filtering will then recommend movies that are similar to "The Dark Knight" based on one or more features such as genre, movie synopsis, movie director, and more.

Another important point is that this recommendation system uses similarity measures to recommend items that are similar to what you liked in the past. These measures compare the feature vectors of two items. Distance-based measures such as Euclidean distance treat vectors that lie close together as similar, while cosine similarity, the most commonly used measure, compares the direction of two vectors in a high-dimensional space regardless of their length. Cosine similarity is the cosine of the angle between the two vectors and is computed with the following formula.

<center><img src="https://i.imgur.com/3EEJSSP.png" width="500" height="500"></center>

Cosine similarity takes values between -1 and 1. The closer the value is to 1, the more similar the two items are; the closer it is to -1, the more dissimilar they are. This follows from the cosine of the angle between the vectors: cos(0°) = 1 for vectors pointing in the same direction, cos(90°) = 0 for orthogonal vectors, and cos(180°) = -1 for vectors pointing in opposite directions. For vectors with no negative entries, such as word counts or TF-IDF weights, the value always lies between 0 and 1.
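The formula in the image is cos(θ) = (A · B) / (‖A‖ ‖B‖). Here is a minimal NumPy sketch of it (the `cosine_sim` helper is illustrative) that checks the three reference angles mentioned above:

```python
import numpy as np


def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b: (a . b) / (||a|| * ||b||)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


same_direction = cosine_sim(np.array([1.0, 2.0]), np.array([2.0, 4.0]))
orthogonal = cosine_sim(np.array([1.0, 0.0]), np.array([0.0, 3.0]))
opposite = cosine_sim(np.array([1.0, 2.0]), np.array([-1.0, -2.0]))
print(round(same_direction, 6), round(orthogonal, 6), round(opposite, 6))  # 1.0 0.0 -1.0
```

Note that `[1, 2]` and `[2, 4]` have different lengths but the same direction, so their cosine similarity is 1. Cosine similarity ignores vector magnitude, which is one reason it suits text vectors built from documents of different lengths.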

**Content-Based Filtering: Key Terms**

1. **User Matrix:** A two-dimensional matrix that contains all the information related to users, for example data about your customers or subscribers. Each row represents a specific user, and the columns hold the numerical features that describe that user.

2. **Item Matrix:** The item matrix is analogous to the user matrix: a two-dimensional matrix that contains all the information related to items, such as movies, electronic devices, or any other product you offer to users. Each row represents a specific item, and the columns hold the numerical features that describe that item.

3. **Features:** The numerical representations of each user and item (the columns of the user matrix and item matrix) are called features. Because similarity measures operate on numbers, creating these features is a crucial step in producing high-quality recommendations.

For example, the features of a particular movie could be its summary. However, information about the summary comes in text format, and computers cannot process text data directly. Therefore, what we usually do is to convert the text into its numerical representation using different methods such as TF-IDF or count vectorization, to enable the computer to work with it and calculate similarity.

<center><img src="https://i.imgur.com/X5UQ5cs.png" width="700" height="700"></center>

<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>📄 Content-Based Filtering: Advantages and Disadvantages:</font></h3>

**Advantages**
    
* It is easily scalable to a large number of customers since the data of other users is not required for recommending something to a particular user.
* Since the recommendations are based on the day-to-day activities of the user, all the preferences and parameters of the suggestions are finely tuned to the user’s choice. Therefore, the model can recommend specific niche items that other users might not be interested in.
* New items can be recommended as soon as they are launched, without waiting for users to rate or interact with them, since their features are available from the start.
    
**Disadvantages**
    
* Building a content-based recommender engine requires a lot of domain knowledge since the feature selection of the items is mostly hard-coded into the system. Thus, the model is only as good as the knowledge of the one building it.
* The model can only recommend items that match the user's existing interests. Hence, it has limited ability to help the user discover new areas that might interest them.
* The cold start problem is a significant drawback since the engine does not have sufficient information about a new user to start making suggestions.
* It is hard to make new recommendations to not-so-active users.

<a id = "9"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Count Vector✨</p>

Count Vectorization is a text mining and natural language processing technique used to represent a text document or a collection of documents. This method creates a vector by counting the frequency of words in a text document. Count Vectorization represents a text document with numbers, where these numbers indicate how many times each word appears in the text.

The basic steps of Count Vectorization are as follows:

1. **Text Document Preparation**: First, you need to prepare the text document or collection of documents for analysis. These documents could be a series of articles, reviews, or texts, for example.

2. **Tokenization**: Tokenization is the process of splitting the text into smaller pieces (tokens). These tokens are typically single words or short sequences of words (n-grams).

3. **Frequency Calculation**: It calculates how many times each word appears in the text. This measures the frequency of each word.

4. **Vector Creation**: A vector is created based on the frequency of words. Each word is represented as a column in the vector, and the value of this column shows how many times that word appears in the text.

Count Vectorization is commonly used in text mining and natural language processing applications. This method is a fundamental approach to convert text data into numerical data, and word frequencies are used as input data for text mining algorithms.

<a id = "10"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨TF-IDF✨</p>

TF-IDF (Term Frequency-Inverse Document Frequency) is a widely used weighting scheme in text mining and information retrieval. It scores how important a word is to a document relative to a whole collection of documents, and the resulting vectors can be used to compare documents. A word receives a high score when it appears often in a document but rarely in the rest of the collection.

The key components of TF-IDF are as follows:

1. **TF (Term Frequency)**: It measures how often a specific term appears in a document. In other words, it expresses the frequency of a word in a particular document, which determines the importance of a word in that document.

2. **IDF (Inverse Document Frequency)**: It measures how common or rare a word is within a collection of all documents. In other words, it determines the overall importance of a word. Rare terms have higher IDF values because rare terms are considered more informative.

TF-IDF is the product of the term frequency of a word in a specific document and the inverse document frequency of the same word across the entire collection of documents. This combines the importance of a word in a specific document with its overall importance within the document collection.

* The benefits of TF-IDF can include:

    - Reducing the weight of unimportant or common words.
    - Emphasizing the importance of unique and informative words.
    - Making documents comparable, for example via the cosine similarity of their TF-IDF vectors.

TF-IDF is used in many applications such as text mining, document classification, recommendation systems, and search engines. It helps extract meaningful information from text data and makes document comparisons more effective.
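As a worked example, the sketch below computes TF-IDF by hand and checks it against scikit-learn's `TfidfVectorizer`, which is used later in this notebook. It assumes scikit-learn's default settings: TF is the raw count, IDF is smoothed as idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing t, and each document vector is finally scaled to unit length. Other libraries may use slightly different IDF variants. The documents are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat", "a dog and a cat"]

# TF: raw count of each term in each document (vocabulary sorted alphabetically)
counts = CountVectorizer().fit_transform(docs).toarray()
n_docs = counts.shape[0]

# IDF with scikit-learn's default smoothing: ln((1 + n) / (1 + df)) + 1,
# where df is the number of documents that contain the term
doc_freq = (counts > 0).sum(axis=0)
idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1

# TF-IDF = TF * IDF, then each document vector is scaled to unit (L2) length
tfidf_manual = counts * idf
tfidf_manual = tfidf_manual / np.linalg.norm(tfidf_manual, axis=1, keepdims=True)

tfidf_sklearn = TfidfVectorizer().fit_transform(docs).toarray()
print(np.allclose(tfidf_manual, tfidf_sklearn))  # True
```

In this toy corpus, "the" occurs in two of the three documents and therefore gets a lower IDF than "mat", which occurs in only one.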

<a id = "11"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Content Based Recommender System✨</p>

<a id = "12"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Data Preprocessing</div>

In [None]:
import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

import warnings
warnings.filterwarnings("ignore")

# pd.set_option('display.max_columns', None)
# pd.set_option('display.width', 500)
# pd.set_option('display.expand_frame_repr', False)

In [None]:
movies = pd.read_csv("/kaggle/input/the-movies-dataset/movies_metadata.csv",
                    usecols=["id","overview","title","vote_average","vote_count","release_date"],low_memory=False)

In [None]:
movies.head()

In [None]:
movies.shape

In [None]:
movies.isnull().sum()

In [None]:
movies = movies.dropna()

In [None]:
movies.duplicated().sum()

In [None]:
movies = movies.drop_duplicates()

In [None]:
movies = movies.rename(columns={"id":"movieId"})

In [None]:
movies["movieId"] = movies["movieId"].astype("int64")

In [None]:
movies = movies.reset_index(drop=True)

In [None]:
movies["overview"].head()

In [None]:
movies["overview"].loc[1]

<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>👀 Text Mining Terms : </font></h3>
    
1. **Token:** Tokens are meaningful units that represent parts of a text. They can often be words, symbols, or characters.
2. **Document:** It represents a text document or a piece of content. It can be any piece of text, such as an article, a book, a news text, or a blog post. A document contains one or more tokens.
3. **Corpus:** It typically refers to a collection of text documents gathered around a language or a specific topic. A corpus includes text data compiled for use in a text mining or NLP project.

**An Example;**

"When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures."

- This overview is a document.
- The word "Judy" is a token within the document.
- The corpus is the collection of all overviews (documents) in the dataset.

<a id = "13"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Creating the TF-IDF Matrix</div>

In [None]:
movies["overview"] = movies["overview"].str.replace(r"[^\w\s]"," ",regex=True).str.replace(r"[\d]"," ",regex=True)

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* In every row of the "overview" column, the first `str.replace` turns every character that is neither a word character (letter, digit, or underscore) nor whitespace into a space. This removes punctuation marks and other special characters.
* The second `str.replace` turns every digit into a space, so numbers are removed from the text as well.
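A quick way to see what the two replacements do is to apply them to a short, made-up string. Note that `\w` also matches the underscore, so underscores survive the first replacement.

```python
import pandas as pd

# A hypothetical overview-like string with punctuation and digits
sample = pd.Series(["Alan's been trapped inside the game for 26 years -- until now!"])

cleaned = (
    sample.str.replace(r"[^\w\s]", " ", regex=True)  # punctuation/special characters -> space
          .str.replace(r"[\d]", " ", regex=True)      # digits -> space
)
print(cleaned[0])  # only letters and spaces remain
```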

In [None]:
tfidf = TfidfVectorizer(stop_words="english", min_df = 4)
tfidf_matrix = tfidf.fit_transform(movies["overview"])

In [None]:
tfidf_matrix.shape # 44407 movies,23499 unique words

In [None]:
similarity = cosine_similarity(tfidf_matrix,tfidf_matrix)

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* `cosine_similarity(tfidf_matrix, tfidf_matrix)` computes the cosine similarity between the TF-IDF (Term Frequency-Inverse Document Frequency) vectors of every pair of overviews and returns the result as a similarity matrix with one row and one column per movie.
* Each element of this matrix is the cosine similarity between two documents. In general, cosine similarity ranges from -1 to 1, but because TF-IDF weights are never negative, the values here lie between 0 and 1. The closer the value is to 1, the more similar the documents are; a value of 0 means they share no terms.
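Because `TfidfVectorizer` scales every row to unit length by default (`norm="l2"`), the cosine similarity between two rows reduces to their dot product. This matters for memory: the full matrix computed above holds one float64 value per pair of movies, i.e. 44,407 × 44,407 values or roughly 15.8 GB. The sketch below, on three made-up overviews, checks that scoring a single row with `sklearn.metrics.pairwise.linear_kernel` gives the same numbers as the corresponding row of the full `cosine_similarity` matrix, which is an alternative when only one movie's neighbours are needed.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

docs = [
    "a boy finds a magical board game",
    "two children play a magical board game",
    "a detective solves a murder in london",
]
tfidf_matrix = TfidfVectorizer().fit_transform(docs)

# Full N x N matrix, as in the notebook cell above
full = cosine_similarity(tfidf_matrix, tfidf_matrix)

# Only the row for document 0: because each TF-IDF row has unit L2 length,
# the plain dot product (linear_kernel) already equals cosine similarity
row_0 = linear_kernel(tfidf_matrix[0:1], tfidf_matrix).ravel()

print(np.allclose(full[0], row_0))  # True
print(full.min() >= 0)              # True: TF-IDF weights are never negative
```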

In [None]:
similarity.shape

In [None]:
similarity[1]

In [None]:
index = movies[movies["movieId"] == 8844].index[0]

In [None]:
# tfidf.get_feature_names()

feature_names = tfidf.get_feature_names_out()

feature_names

In [None]:
tfidf_matrix[:5].toarray() # TF-IDF scores of the first 5 documents; densifying the whole 44407 x 23499 matrix would need roughly 8 GB of RAM

<a id = "14"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Movie Recommendation</div>

In [None]:
similarity_scores = pd.DataFrame(similarity[index],
                                 columns=["similarity"])

In [None]:
movie_indices = similarity_scores.sort_values("similarity", ascending=False)[1:11].index

In [None]:
movies['title'].iloc[movie_indices]

In [None]:
movies[movies["movieId"] == 8844]
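The steps above can be wrapped in a small helper. The sketch below is a hypothetical `content_based_recommender` function (not part of the original notebook) that looks up a movie by title and returns the most similar titles. As an assumption, it scores only the requested row with `linear_kernel` instead of reading a precomputed N × N matrix; because `TfidfVectorizer` L2-normalizes its rows by default, the dot product equals cosine similarity. It also assumes that the DataFrame rows line up with the TF-IDF rows, as they do after `reset_index(drop=True)`. The toy titles and overviews are made up.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel


def content_based_recommender(title, dataframe, tfidf_matrix, n=10):
    """Hypothetical helper: the n titles whose overviews are most similar to `title`."""
    # Positional row of the first movie with this title; assumes the rows of
    # `dataframe` and `tfidf_matrix` are aligned (e.g. after reset_index(drop=True))
    positions = np.flatnonzero(dataframe["title"].to_numpy() == title)
    if positions.size == 0:
        raise KeyError(f"title not found: {title!r}")
    idx = positions[0]

    # Cosine similarity of this one movie to all movies (TF-IDF rows are L2-normalized)
    scores = linear_kernel(tfidf_matrix[idx:idx + 1], tfidf_matrix).ravel()

    # Highest scores first, skipping the movie itself
    order = np.argsort(-scores, kind="stable")
    order = order[order != idx][:n]
    return dataframe["title"].iloc[order].tolist()


toy = pd.DataFrame(
    {
        "title": ["Jumanji", "Zathura", "Sherlock"],
        "overview": [
            "children play a magical board game in the jungle",
            "two children play a magical board game in space",
            "a detective solves a murder in london",
        ],
    }
)
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["overview"])
recommendations = content_based_recommender("Jumanji", toy, toy_matrix, n=2)
print(recommendations)  # ['Zathura', 'Sherlock']
```

On the notebook's data, the same call would look like `content_based_recommender("Jumanji", movies, tfidf_matrix)`.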

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

* We recommended 10 films similar to Jumanji. The recommendations are based only on the text of each film's overview (plot description); other features such as actors or director were not used.

<a id="15"></a>
<h1 style="border-radius: 10px; border: 2px solid #6B8E23; background-color: #F5F5DC; font-family: 'Pacifico', cursive; font-size: 200%; text-align: center; border-radius: 15px 50px; padding: 15px; box-shadow: 5px 5px 5px #556B2F; color: #556B2F;">🫂 Collaborative Filtering 🫂</h1>

**What is Collaborative Filtering?**

Collaborative filtering filters information by using the interactions and data collected by the system from other users. It’s based on the idea that people who agreed in their evaluation of certain items are likely to agree again in the future.

The concept is simple: when we want to find a new movie to watch we’ll often ask our friends for recommendations. Naturally, we have greater trust in the recommendations from friends who share tastes similar to our own.

Many collaborative filtering systems use a neighborhood-based (similarity-based) technique. In this approach, a number of users are selected based on their similarity to the active user, and a prediction for the active user is made by calculating a weighted average of the selected users' ratings.

Collaborative-filtering systems focus on the relationship between users and items. The similarity of items is determined by the similarity of the ratings of those items by the users who have rated both items.

**There are three classes of Collaborative Filtering:**

* User-based, which measures the similarity between target users and other users.
* Item-based, which measures the similarity between the items that target users rate or interact with and other items.
* Model-based, which learns a predictive model (for example, matrix factorization) from users' past ratings or interactions and uses it to predict how users would rate items they have not interacted with yet.
* Also, Item-Based and User-Based filtering types are subcategories of the Memory-based collaborative approach.

**What are some of the challenges to be faced while using Collaborative Filtering?**

Like every algorithm, collaborative filtering has its pros and cons. Collaborative filtering algorithms are dynamic and adapt as user preferences change over time. However, one of the main issues recommender systems face is scalability: as the user base grows, the computation and storage required grow rapidly, which can make the results slow and less accurate.

Collaborative filtering algorithms also tend to recommend a narrow range of products: because they rely on historical interactions, their recommendations stay close to what users have already consumed.

<center><img src="https://i.imgur.com/Z6O8yW1.png" width="900" height="900"></center>

<a id = "16"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨User-Based Collaborative Filtering✨</p>

User-based collaborative filtering makes recommendations based on user-product interactions in the past. The assumption behind the algorithm is that similar users like similar products.

User-based collaborative filtering algorithm usually has the following steps:

1. Find similar users based on interactions with common items.
1. Identify the items rated highly by similar users that the active user has not yet been exposed to.
1. Calculate the weighted average score for each item.
1. Rank items based on the score and pick the top n items to recommend.

<center><img src="https://i.imgur.com/nIbVPUK.png" width="500" height="500"></center>

This graph illustrates how user-based collaborative filtering works using a simplified example.

* Ms. Blond likes apples. Ms. Black likes watermelon and pineapple. Ms. Purple likes watermelon and grapes.
* Because Ms. Black and Ms. Purple like the same fruit, watermelon, they are similar users.
* Since Ms. Black likes pineapple and Ms. Purple has not been exposed to pineapple yet, the recommendation system recommends pineapple to Ms. Purple.
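The four steps and the fruit example can be reproduced in a few lines of pandas. This is a minimal sketch with a hypothetical `user_based_recommend` helper; as a simplifying assumption, it measures user similarity by the number of liked items two users share, whereas the notebook below uses the Pearson correlation of ratings.

```python
import pandas as pd

# Toy interaction matrix from the fruit example: 1 = the user likes the fruit
likes = pd.DataFrame(
    {
        "apple": [1, 0, 0],
        "watermelon": [0, 1, 1],
        "pineapple": [0, 1, 0],
        "grapes": [0, 0, 1],
    },
    index=["Ms. Blond", "Ms. Black", "Ms. Purple"],
)


def user_based_recommend(likes, user, n=1):
    target = likes.loc[user]
    others = likes.drop(index=user)

    # Step 1: similarity = number of liked items the two users have in common
    similarity = others.dot(target)
    neighbours = similarity[similarity > 0]
    if neighbours.empty:
        return []

    # Steps 2-3: similarity-weighted average of the neighbours' likes per item
    scores = others.loc[neighbours.index].T.dot(neighbours) / neighbours.sum()

    # Keep only items the target user has not interacted with yet
    scores = scores[(target == 0) & (scores > 0)]

    # Step 4: rank and return the top n items
    return scores.sort_values(ascending=False).head(n).index.tolist()


print(user_based_recommend(likes, "Ms. Purple"))  # ['pineapple']
```

Ms. Blond shares no fruit with anyone, so this sketch returns an empty list for her, a small illustration of the cold start problem.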

<a id = "17"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨User-Based Recommendation System✨</p>

<a id = "18"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Preparing Data</div>

In [None]:
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

In [None]:
rating = pd.read_csv('/kaggle/input/the-movies-dataset/ratings_small.csv')

In [None]:
rating["date"] = pd.to_datetime(rating["timestamp"],unit="s")

In [None]:
rating = rating.drop("timestamp",axis=1)

In [None]:
df = pd.merge(movies,rating, how="inner", on="movieId")

In [None]:
df.head()

<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>👀 Features: </font></h3>

* **movieId:** unique movie identifier
* **title:** movie title
* **userId:** unique user identifier
* **rating:** rating given to the movie by the user
* **date:** date and time of the rating (converted from the original Unix `timestamp` column, which was dropped)

<a id = "19"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Creating User-Movie DataFrame</div>

In [None]:
df.shape # 44823 ratings

In [None]:
df["title"].nunique() # 2772 unique movies

In [None]:
values_pd = df["title"].value_counts() # the number of ratings for each movie

values_pd

In [None]:
rare_movies = values_pd[values_pd < 5].index

rare_movies

In [None]:
df_ = df[~df["title"].isin(rare_movies)] # removing those with fewer than 5 ratings.

In [None]:
df_.head()

In [None]:
user_title_df = df_.groupby(["userId","title"])["rating"].mean().unstack() # users as rows, movies as columns, NaN where a user has not rated a movie

In [None]:
user_title_df.shape # 671 user_id , 1343 movies

In [None]:
user_title_df.head()

In [None]:
user_title_df.columns

In [None]:
sample_guy = user_title_df.sample(1,random_state=45).index[0] # taking a random sample

In [None]:
random_user_df = user_title_df[user_title_df.index == sample_guy] # observation units belonging to the sample.

In [None]:
movies_watched = random_user_df.dropna(axis=1).columns.tolist() # the movies that the sample has voted for

In [None]:
movies_watched_df = user_title_df[movies_watched]

In [None]:
user_movie_count = movies_watched_df.notnull().sum(axis=1) # for each user, how many of the sample user's movies they have rated

In [None]:
user_movie_count.max()

In [None]:
users_same_movies = user_movie_count[user_movie_count > (movies_watched_df.shape[1] * 60 ) / 100].index # people who watched more than 60% of the movies that the sample watched

users_same_movies

<a id = "20"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Determination of Similarity</div>

In [None]:
filted_df = movies_watched_df[movies_watched_df.index.isin(users_same_movies)]

filted_df

In [None]:
corr_df = filted_df.T.corr().unstack().drop_duplicates() # the correlations between users

In [None]:
corr_df.sort_values(ascending=False).head(20)

In [None]:
corr_df

In [None]:
corr_df.loc[(1, 3)]

In [None]:
corr_df[sample_guy].sort_values(ascending=False)

<a id = "21"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Score Calculation</div>

In [None]:
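corr_sample = filted_df.T.corr()[sample_guy].drop(sample_guy) # correlations between the sample user and every other selected user
top_users = corr_sample[corr_sample > 0.10].to_frame("corr")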

top_users

In [None]:
top_users_ratings = pd.merge(top_users, rating[["userId", "movieId", "rating"]], how='inner', on="userId")

top_users_ratings

In [None]:
top_users_ratings['weighted_rating'] = top_users_ratings['corr'] * top_users_ratings['rating']

In [None]:
recommendation_df = top_users_ratings.pivot_table(values="weighted_rating", index="movieId", aggfunc="mean")

recommendation_df

In [None]:
recommendation_df.sort_values(by= "weighted_rating" , ascending=False).head(20)

In [None]:
movies_to_be_recommend = recommendation_df[recommendation_df["weighted_rating"] > 0.7].sort_values(by="weighted_rating", ascending=False).head(10)

In [None]:
movies["title"][movies["movieId"].isin(movies_to_be_recommend.index)]
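The score above is the mean of `corr * rating` per movie. A common alternative, not what this notebook does, is the correlation-weighted average sum(corr × rating) / sum(corr), which stays on the original rating scale. The sketch below compares both on a few hypothetical rows shaped like `top_users_ratings`:

```python
import pandas as pd

# Hypothetical neighbours of the sample user: correlation with the sample user and their ratings
top_users_ratings = pd.DataFrame(
    {
        "userId": [10, 10, 20, 20, 30],
        "corr": [0.9, 0.9, 0.5, 0.5, 0.2],
        "movieId": [1, 2, 1, 3, 2],
        "rating": [4.0, 5.0, 2.0, 4.0, 1.0],
    }
)
top_users_ratings["weighted_rating"] = top_users_ratings["corr"] * top_users_ratings["rating"]
grouped = top_users_ratings.groupby("movieId")

# Mean of corr * rating, as computed with pivot_table in the cells above
mean_weighted = grouped["weighted_rating"].mean()

# Correlation-weighted average: sum(corr * rating) / sum(corr), on the original rating scale
weighted_average = grouped["weighted_rating"].sum() / grouped["corr"].sum()

print(mean_weighted.sort_values(ascending=False).index.tolist())     # [2, 1, 3]
print(weighted_average.sort_values(ascending=False).index.tolist())  # [2, 3, 1]
```

With the plain mean, movie 3 ranks last because its only rater is less strongly correlated with the sample user; the weighted average instead reflects how highly the correlated users rated each movie, which moves movie 3 above movie 1.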

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🗨️ Comment: </b></font></h3>

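* We used the smaller "ratings_small" dataset because the full ratings file requires too much RAM. The full dataset should give more reliable results; the aim here is to demonstrate how the process works.
* Note that `ratings_small.csv` identifies movies by their MovieLens `movieId`, while the `id` column of `movies_metadata.csv` (renamed to `movieId` above) is a TMDB id. The dataset's `links_small.csv` maps between the two, so merging the files directly on `movieId` can attach the wrong titles to ratings.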

<a id = "22"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Item-Based Collaborative Filtering✨</p>

In item-based filtering, new recommendations are selected based on the target user's past interactions. First, all the items that the user has already liked are considered. Then, the items most similar to them (their nearest neighbors) are computed, and new items from these neighborhoods are suggested to the user.

* How Item-Based Collaborative Filtering works:

    1. First, a data matrix is created that represents how each item is rated or preferred by users. This matrix has users as rows and items as columns, and stores the users' ratings or preferences.

    2. Next, similarity scores between each item and every other item are calculated. These scores measure the relationships between items. Common similarity metrics include cosine similarity and the Pearson correlation coefficient.

    3. When a user selects an item, Item-Based Collaborative Filtering recommends items similar to the chosen one. Candidate items are ranked by their similarity scores with the selected item, and those with the highest scores are suggested to the user.
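Steps 2 and 3 can be sketched with pandas' `DataFrame.corrwith`, which the item-based cells below also use. The ratings matrix and the `similar_items` helper are hypothetical; the Pearson correlation is computed only over users who rated both items.

```python
import numpy as np
import pandas as pd

# Hypothetical user-item ratings (rows = users, columns = items); NaN = not rated
ratings = pd.DataFrame(
    {
        "A": [5, 4, 1, np.nan],
        "B": [4, 5, 2, 1],
        "C": [1, 2, 5, 4],
        "D": [np.nan, 1, 4, 5],
    },
    index=["u1", "u2", "u3", "u4"],
)


def similar_items(ratings, item, n=2):
    # Step 2: Pearson correlation between the chosen item's ratings and every other
    # item's ratings, using only users who rated both items
    similarities = ratings.drop(columns=item).corrwith(ratings[item])
    # Step 3: the items with the highest similarity come first
    return similarities.sort_values(ascending=False).head(n)


top = similar_items(ratings, "A")
print(top)
```

Item B, rated high by the users who rated A high, comes first, while C and D, which those users rated low, are negatively correlated with A.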

One of the advantages of Item-Based Collaborative Filtering is its ability to provide personalized recommendations based on users' past preferences. However, when working with large datasets, the computational costs can increase, and it may encounter some challenges, such as the cold start problem.

This method is a popular recommendation system approach, especially in areas like online shopping websites, music streaming platforms, and content sharing websites.

<center><img src="https://i.imgur.com/aT4IveL.png" width="700" height="700"></center>

<a id = "23"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Item-Based Recommender System✨</p>

In [None]:
df["movieId"].nunique()

In [None]:
user_movie_df = df.groupby(["userId","movieId"])["rating"].mean().unstack() # users as rows, movies as columns, NaN where a user has not rated a movie

In [None]:
user_movie_df.head()

In [None]:
sample_movie = user_movie_df.sample(1, axis=1, random_state=45).columns[0] # a random movieId (column), not a userId (row)

sample_movie

In [None]:
filtered = user_movie_df[sample_movie]

In [None]:
user_movie_df_wo = user_movie_df.drop(sample_movie,axis=1)

In [None]:
movies_similarity = user_movie_df_wo.corrwith(filtered)

In [None]:
movies_similarity.sort_values(ascending=False).head(20)

In [None]:
movies_similarity = movies_similarity.sort_values(ascending=False).reset_index()
movies_similarity.columns = ["movieId","movies_similarity"]

In [None]:
movies_similarity.head()

In [None]:
# Rows for the five movies most correlated with the sample movie
top_ids = movies_similarity["movieId"].head(5)
filtered_movies = df[df["movieId"].isin(top_ids)]

filtered_movies

In [None]:
filtered_movies['title'].value_counts()

<a id = "24"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Model-Based Collaborative Filtering (Matrix Factorization)✨</p>

Model-Based Collaborative Filtering builds a predictive model from past ratings, and Matrix Factorization is its most widely used technique. Recommendation systems aim to predict how much a user will like an item and turn those predictions into personalized recommendations. Model-Based Collaborative Filtering does this by learning from a user's past preferences together with the preferences of similar users.

Matrix Factorization decomposes the user-item matrix (a matrix of the ratings users have given to items) into two smaller matrices: one describes users and the other describes items, each in terms of a small number of latent features. Multiplying the two matrices back together approximates the original matrix, so learning the two factor matrices amounts to estimating these latent features.
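The decomposition and a single predicted rating can be written as follows, using p for user latent factors and q for movie latent factors. Here R is the m x n rating matrix, P (m x k) holds one user vector p_u per row, Q (n x k) holds one movie vector q_i per row, and k is the number of latent factors. The worked line assumes k = 2 factors (say comedy and horror) with made-up weights.

$$
R \approx P\,Q^{\top}, \qquad \hat{r}_{ui} = p_u^{\top} q_i = \sum_{f=1}^{k} p_{uf}\, q_{if}
$$

$$
p_u = (0.9,\ 0.1),\quad q_i = (5,\ 0) \;\Rightarrow\; \hat{r}_{ui} = 0.9 \cdot 5 + 0.1 \cdot 0 = 4.5
$$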

For instance, consider a matrix that represents how users have rated items on an e-commerce site. The rows of this matrix represent users, and the columns represent items. However, this matrix might be incomplete because not every user has rated every item. Model-Based Collaborative Filtering employs Matrix Factorization to fill in these gaps and make recommendations.

Matrix Factorization learns the latent features of users and items by fitting them to the ratings that are already known, i.e. by minimizing the error between predicted and observed ratings. The learned features are then used to predict the missing ratings, and the items with the highest predicted ratings are recommended. It is an effective method, particularly for collaborative filtering on large, sparse datasets, and is used by many companies, such as Netflix, Amazon, and other major online platforms.
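The learning procedure can be sketched with plain NumPy stochastic gradient descent. This is a simplified illustration under stated assumptions, not how the Surprise library implements it: there are no user or item bias terms (Surprise's `SVD` includes them by default), the toy ratings matrix is invented, `np.nan` marks a missing rating, and the hyperparameters (`k`, `lr`, `reg`, `n_epochs`) are arbitrary choices. The function name `factorize` is hypothetical.

```python
import numpy as np


def factorize(R, k=2, lr=0.01, reg=0.02, n_epochs=1000, seed=42):
    """Learn P (users x k) and Q (items x k) so that P @ Q.T approximates R
    on the observed entries. np.nan marks a missing rating."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item latent factors
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(n_epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]  # error on one known rating
            pu = P[u].copy()             # keep the old user vector for Q's update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q


# Toy ratings: rows = users, columns = movies
R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, np.nan, 4],
              [np.nan, 1, 5, 4]])
P, Q = factorize(R)
R_hat = P @ Q.T  # fully filled matrix, including the former gaps
```

Each update nudges `P[u]` and `Q[i]` in the direction that reduces the error on one known rating, while the `reg` term keeps the factors small to limit overfitting. Entries that were `np.nan` in `R` get a value in `R_hat` because every user and every item now has a factor vector.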

<center><img src="https://i.imgur.com/w4HggQL.png" width="600" height="600"></center>

<center><img src="https://i.imgur.com/97A2GqR.png" width="700" height="700"></center>

<div style="border-radius:10px; border:#65647C solid; padding: 15px; background-color: #F8EDE3; font-size:100%; text-align:left">

<h3 align="left"><font color='#7D6E83'><b>🤔 Inferences: </b></font></h3>

* Users and movies are assumed to have weights for latent features. These weights are learned from the existing data and then used to predict the missing observations.
* The User-Item matrix is decomposed into two smaller matrices.
* We assume that the User-Item matrix is generated from these two matrices through latent factors.
* We find the weights of the latent factors based on the observed values.
* The learned weights are used to fill in the missing observations.
* The rating matrix is assumed to be the product of the two factor matrices, so each rating is the dot product of a user's factor vector and a movie's factor vector.
* The factor matrices hold the latent factors: user latent factors (p) and movie latent factors (q).
* Latent features may correspond to characteristics such as film genres (comedy, horror, adventure) or the presence of specific actors, directors, or screenwriters. For example, a comedy movie might have a weight of 5 on the comedy factor and 0 on the horror factor.
* The p and q values are found iteratively, using only the existing ratings.
* Initially, p and q are set to random values and used to predict the values in the rating matrix.
* In each iteration, p and q are adjusted so that the predictions move closer to the observed ratings.
* For example, if a 5 was initially predicted where the actual rating is 3, the next prediction may be a 4, and so on.
* After a certain number of iterations, the p and q matrices are learned.
* Predictions for the missing observations are then made from the learned p and q values.

<a id = "25"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Gradient Descent✨</p>

Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. In machine learning, it is used to find the values of a model's parameters (coefficients) that minimize a cost function as far as possible.

You start by choosing initial values for the parameters, and from there the gradient descent algorithm uses calculus to iteratively adjust them so that they minimize the given cost function. To understand this concept fully, it is important to know what a gradient is.

**What Is a Gradient?**

> "A gradient measures how much the output of a function changes if you change the inputs a little bit." — Lex Fridman (MIT)

A gradient measures how much the error changes in response to a change in each of the weights. You can also think of a gradient as the slope of a function: the steeper the slope, the larger the gradient and the faster a model can learn, while a slope of zero means the model stops learning. In mathematical terms, the gradient is the vector of partial derivatives of a function with respect to each of its inputs.
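The quote above can be made concrete with finite differences: nudge one input by a small `h`, see how much the output moves, and repeat for each input. The result is the vector of partial derivatives, i.e. the gradient. The cost function `J` below is an arbitrary example and `numerical_gradient` is a hypothetical helper; in practice gradients are usually derived analytically or by automatic differentiation.

```python
def J(w, b):
    """An arbitrary two-parameter cost function, chosen only for illustration."""
    return (w - 3) ** 2 + 2 * (b + 1) ** 2


def numerical_gradient(f, w, b, h=1e-6):
    """Approximate each partial derivative by nudging one input a little
    in both directions (central difference)."""
    dJ_dw = (f(w + h, b) - f(w - h, b)) / (2 * h)
    dJ_db = (f(w, b + h) - f(w, b - h)) / (2 * h)
    return dJ_dw, dJ_db


grad = numerical_gradient(J, 0.0, 0.0)
# Analytic gradient at (0, 0): (2 * (0 - 3), 4 * (0 + 1)) = (-6, 4)
print(grad)
```

At the minimum of `J`, (w, b) = (3, -1), both partial derivatives are zero, which is the "slope of zero" at which learning stops.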

Imagine a blindfolded man who wants to climb to the top of a hill in as few steps as possible. He might start by taking really big steps in the steepest direction, which he can do as long as he is not close to the top. As he comes closer to the top, however, his steps get smaller and smaller to avoid overshooting it. This process can be described mathematically using the gradient.

**How Gradient Descent Works**

Instead of climbing up a hill, think of gradient descent as hiking down to the bottom of a valley. This is a better analogy because it is a minimization algorithm that minimizes a given function.

The equation below describes what the gradient descent algorithm does: $b = a - \gamma \nabla f(a)$. Here b is the next position of our climber and a is his current position. The gradient $\nabla f(a)$ points in the direction of steepest ascent, so the minus sign, which reflects the minimization part of the algorithm, turns it into the direction of steepest descent. The gamma ($\gamma$) in the middle is a weighting factor, the learning rate, that controls the size of each step.

<center><img src="https://i.imgur.com/t3pKnKd.png" width="500" height="500"></center>

So this formula basically tells us the next position we need to go, which is the direction of the steepest descent. Let’s look at another example to really drive the concept home. 

Imagine you have a machine learning problem and want to train your algorithm with gradient descent to minimize your cost function J(w, b) and reach its local minimum by tweaking its parameters (w and b). In the image below, the horizontal axes represent the parameters (w and b), while the vertical axis represents the cost function J(w, b). The cost function shown is convex, so it has a single minimum.

<center><img src="https://i.imgur.com/rL8zdCj.png" width="500" height="500"></center>

We want to find the values of w and b that correspond to the minimum of the cost function (marked with the red arrow). To start, we initialize w and b with some random numbers. Gradient descent then starts at that point (somewhere near the top of the illustration) and takes one step after another in the steepest downhill direction (i.e., from the top to the bottom of the illustration) until it reaches the point where the cost function is as small as possible.
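A minimal sketch of this process, assuming a one-feature linear model `w * x + b` and a mean squared error cost on a tiny made-up dataset generated from y = 2x + 1. For reproducibility it starts from w = b = 0 rather than random values; the learning rate of 0.05, the 2000 steps, and the helper names are illustrative choices.

```python
import numpy as np

# Toy data generated from y = 2x + 1 (an assumption for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1


def cost(w, b):
    """Mean squared error J(w, b)."""
    return np.mean((w * x + b - y) ** 2)


def gradient_descent(lr=0.05, n_steps=2000):
    w, b = 0.0, 0.0  # starting point
    for _ in range(n_steps):
        err = w * x + b - y
        dw = 2 * np.mean(err * x)  # partial derivative dJ/dw
        db = 2 * np.mean(err)      # partial derivative dJ/db
        w, b = w - lr * dw, b - lr * db  # step against the gradient
    return w, b


w, b = gradient_descent()
print(round(w, 3), round(b, 3))  # close to 2.0 and 1.0
```

Because this cost function is convex, gradient descent converges to its single minimum, which here is w = 2, b = 1.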

**Gradient Descent Learning Rate**

The size of the steps gradient descent takes toward the local minimum is determined by the learning rate, which controls how fast or slow we move towards the optimal weights.

For gradient descent to reach the local minimum, the learning rate must be set to an appropriate value, neither too low nor too high. If the steps are too big, it may never reach the minimum because it keeps bouncing back and forth across the valley of the convex cost function (see the left image below). If the learning rate is very small, gradient descent will eventually reach the local minimum, but it may take a long time (see the right image).

<center><img src="https://i.imgur.com/4uVBmz9.png" width="500" height="500"></center>

So the learning rate should be neither too high nor too low. You can check whether your learning rate is working well by plotting the value of the cost function against the number of iterations: the curve should decrease steadily and then level off.
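To see the effect of the learning rate, the sketch below runs gradient descent on the simple convex function f(x) = x², whose gradient is 2x, with three learning rates and records the cost at every step. The specific values (1.1, 0.01, 0.3), the starting point `x0 = 5.0`, and the helper name `run_gd` are assumptions for illustration.

```python
def run_gd(lr, x0=5.0, n_steps=20):
    """Minimize f(x) = x**2 (gradient 2x) and record the cost after each step."""
    x = x0
    history = [x ** 2]
    for _ in range(n_steps):
        x = x - lr * 2 * x  # gradient descent update
        history.append(x ** 2)
    return history


for lr in (1.1, 0.01, 0.3):
    history = run_gd(lr)
    print(f"lr={lr}: cost after 20 steps = {history[-1]:.4g}")
```

With lr = 1.1 each step overshoots the minimum to the other side and the cost grows, lr = 0.01 lowers the cost only slowly, and lr = 0.3 converges quickly. Plotting each `history` list against the step number gives the kind of cost curve described above.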

<a id = "26"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Model-Based Recommender System✨</p>

<a id = "27"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Data Preparing</div>

In [None]:
# !pip install scikit-surprise
# or, with conda: conda install -c conda-forge scikit-surprise

import pandas as pd
from surprise import Reader, SVD, Dataset, accuracy
from surprise.model_selection import GridSearchCV, train_test_split, cross_validate
# pd.set_option('display.max_columns', None)

In [None]:
movie = pd.read_csv('/kaggle/input/movielens-20m-dataset/movie.csv')
rating = pd.read_csv('/kaggle/input/movielens-20m-dataset/rating.csv')
# Merge on movieId (the DataFrame loaded above is named `movie`)
df = pd.merge(movie, rating, how="inner", on="movieId")
df.head()

In [None]:
movie_ids = [130219, 356, 4422, 541]

# Titles listed in the same order as movie_ids (kept for reference)
movies = ["The Dark Knight (2011)",
          "Forrest Gump (1994)",
          "Cries and Whispers (Viskningar och rop) (1972)",
          "Blade Runner (1982)"]

In [None]:
sample_df = df[df.movieId.isin(movie_ids)]

sample_df.head()

In [None]:
sample_df.shape

In [None]:
user_movie_df = sample_df.pivot_table(index=["userId"],
                                      columns=["title"],
                                      values="rating")

# Rows = users, Columns = movies

In [None]:
user_movie_df.shape

In [None]:
# MovieLens 20M ratings range from 0.5 to 5 in half-star steps
reader = Reader(rating_scale=(0.5, 5))

In [None]:
data = Dataset.load_from_df(sample_df[['userId',
                                       'movieId',
                                       'rating']], reader)

<a id = "28"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Modelling</div>

In [None]:
trainset, testset = train_test_split(data, test_size=.25)
svd_model = SVD()
svd_model.fit(trainset)
predictions = svd_model.test(testset)

In [None]:
accuracy.rmse(predictions)

In [None]:
svd_model.predict(uid=1.0, iid=541, verbose=True)

In [None]:
svd_model.predict(uid=1.0, iid=356, verbose=True)

In [None]:
sample_df[sample_df["userId"] == 1]

<a id = "29"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Model Tuning</div>

In [None]:
param_grid = {'n_epochs': [5, 10, 20],
              'lr_all': [0.002, 0.005, 0.007]}

In [None]:
gs = GridSearchCV(SVD,
                  param_grid,
                  measures=['rmse', 'mae'],
                  cv=3,
                  n_jobs=-1,
                  joblib_verbose=True)

In [None]:
gs.fit(data)

In [None]:
gs.best_score['rmse']

In [None]:
gs.best_params['rmse']

<a id = "30"></a><br>
<div style="font-family:JetBrains Mono; font-weight:bold; letter-spacing: 2px; color:#E5788F; font-size:150%; text-align:left; padding: 0px;">Predict</div>

In [None]:
# dir(svd_model)

In [None]:
svd_model.n_epochs

In [None]:
svd_model = SVD(**gs.best_params['rmse'])

In [None]:
# Use all ratings for the final model; keep `data` pointing at the Dataset
full_trainset = data.build_full_trainset()

In [None]:
svd_model.fit(full_trainset)

In [None]:
svd_model.predict(uid=1.0, iid=541, verbose=True)

-----

In [None]:
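def suggest(df, user_id, sug):
    # Movies the user has already rated
    watched = set(df.loc[df["userId"] == user_id, "movieId"])
    # Candidates: every movie in df that the user has NOT rated
    didnt_watch = [m for m in df["movieId"].unique() if m not in watched]
    temp_dict = {}

    for i in didnt_watch:
        # .est is the predicted rating (field 3 of surprise's Prediction tuple).
        # Movies absent from the training data only get a baseline estimate.
        temp_dict[i] = svd_model.predict(uid=user_id, iid=i).est

    suggestions = (pd.DataFrame(list(temp_dict.items()), columns=["movieId", "possible_rate"])
                   .sort_values(by="possible_rate", ascending=False)
                   .head(sug))
    merged = pd.merge(suggestions, movie[["movieId", "title"]], how="inner", on="movieId")

    return merged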

In [None]:
suggest(df,21,15).sort_values(by="title", ascending=False)

----

<div style="border-radius: 10px; border: #6B8E23 solid; padding: 15px; background-color: #F5F5DC; font-size: 100%; text-align: left">

<h3 align="left"><font color='#556B2F'>⚠️ Critical Challenges of Recommendation Engines ⚠️ </font></h3>
    
1. **Significant investments required:** Recommendation engines are a costly investment both financially and in terms of time. They require expertise, resources, and continuous maintenance.

2. **Too many choices:** There are many different recommendation engine solutions available, making it difficult to choose the right one for your business. You need to consider factors such as your business model, the type of products or services you offer, and your target audience.

3. **Complex onboarding process:** Implementing a recommendation engine can be complex and time-consuming. You need to integrate the engine with your existing systems, train it on your data, and make sure it is integrated into your user experience.

4. **Lack of data analytics capability:** Recommendation engines rely on data, and if you don't have high-quality data or can't analyze it properly, the engine won't be effective. You need to make sure you have the right data infrastructure and analytics skills in place.

5. **The 'cold start' problem:** When a new user or product is added to the system, it can be difficult for the algorithm to make accurate recommendations. Deep learning models can help to address this problem by analyzing context and making correlations between customers and products.

6. **Inability to capture changes in user behavior:** Consumers' preferences and behaviors change over time, and recommendation engines need to be able to adapt to these changes. You need a model that can be retrained or updated frequently as new interaction data arrives.

7. **Privacy concerns:** Recommendation engines collect a lot of data about users, and some users are hesitant to share this data. You need to address these concerns by being transparent about how you use data and building trust with your users.

-----

<div style="border-radius:10px; border:#D0C2F0 solid; padding: 15px; background-color: #F8E8EE; font-size:100%; text-align:left">

<h3 align="left"><font color='#5E5273'>👻 Analysis Results: </font></h3>
    
This Kaggle notebook is designed as a guide covering the fundamental approaches and methods of recommendation systems. We begin with the "🍔 Association Rule Learning 🍟" section, starting with the Apriori algorithm, to construct an association rule-based recommendation system. This method provides a useful approach to offer product recommendations based on users' previous shopping histories.

Next, in the "🧾 Content-Based Filtering 🧾" section, we explore content-based recommendation systems. This method offers recommendations to users based on the characteristics of products. By representing item descriptions with text-vectorization techniques such as count vectors and TF-IDF, we implement this approach to provide tailored recommendations that align with users' interests.

In the "🫂 Collaborative Filtering 🫂" section, we delve into collaborative filtering methods. We examine both user-based and item-based collaborative filtering approaches. Finally, we build a model-based collaborative filter with matrix factorization (the SVD algorithm from the Surprise library) to predict missing ratings and offer more personalized recommendations.

In conclusion, this guide serves as a comprehensive resource to help you understand different aspects and applications of recommendation systems. Recommendation systems can be utilized in various fields such as e-commerce, content platforms, and more, serving as a powerful tool to enhance user experiences. This guide provides fundamental knowledge to help you understand suitable methods for different business scenarios. Best of luck, and keep refining your recommendation systems!

<a id = "31"></a><br>
<p style="font-family: 'Pacifico', cursive; font-weight: bold; letter-spacing: 2px; color: #556B2F; font-size: 160%; text-align: left; padding: 0px; border-bottom: 3px solid">✨Sources✨</p>

* Association Rule Learning;
    * https://towardsdatascience.com/a-guide-to-association-rule-mining-96c42968ba6
    * https://tutorialforbeginner.com/association-rule-learning-in-machine-learning
    * https://www.javatpoint.com/association-rule-learning
    * https://data-flair.training/blogs/data-science-r-movie-recommendation/
    * https://www.softwaretestinghelp.com/apriori-algorithm/
    * https://www.educative.io/answers/what-is-the-apriori-algorithm
    * https://bicorner.com/2015/07/22/what-the-heck-are-association-rules-in-analytics/
* Content-Based Filtering;
    * https://www.turing.com/kb/content-based-filtering-in-recommender-systems#content-based-filtering
    * https://medium.com/mlearning-ai/recommendation-systems-content-based-filtering-e19e3b0a309e
    * https://www.capitalone.com/tech/machine-learning/understanding-tf-idf/
    * https://miuul.com/
* Collaborative Filtering;
    * https://builtin.com/data-science/collaborative-filtering-recommender-system
    * https://www.iteratorshq.com/
    * https://datasciencedojo.com/blog/social-media-recommendation-system/#
    * https://miuul.com/
    * https://builtin.com/data-science/gradient-descent
    * https://medium.com/grabngoinfo/recommendation-system-user-based-collaborative-filtering-a2e76e3e15c4
    * https://www.appier.com/en/blog/7-critical-challenges-of-recommendation-engines