### Content-Based Filtering for Course Recommendation using TF-IDF Vectorization and Cosine Similarity

In [1]:
import pandas as pd

In [2]:
data = pd.read_csv('../dataset/final_courses_data.csv')

In [3]:
data

Unnamed: 0,title,description,level,rating,reviews_num,link,image,combined_features
0,(ISC)² Systems Security Certified Practitioner...,Pursue better IT security job opportunities an...,0,4.7,492.0,https://www.coursera.org/specializations/sscp-...,-,(ISC)² Systems Security Certified Practitioner...
1,.NET FullStack Developer,Develop the proficiency required to design and...,1,4.3,51.0,https://www.coursera.org/specializations/dot-n...,-,.NET FullStack Developer Develop the proficien...
2,21st Century Energy Transition: how do we make...,"Affordable, abundant and reliable energy is fu...",0,4.8,62.0,https://www.coursera.org/learn/21st-century-en...,-,21st Century Energy Transition: how do we make...
3,A Crash Course in Causality: Inferring Causal...,We have all heard the phrase “correlation does...,1,4.7,517.0,https://www.coursera.org/learn/crash-course-in...,-,A Crash Course in Causality: Inferring Causal...
4,AI Applications in Marketing and Finance,"In this course, you will learn about AI-powere...",3,4.7,140.0,https://www.coursera.org/learn/wharton-ai-appl...,-,AI Applications in Marketing and Finance In th...
...,...,...,...,...,...,...,...,...
5931,Game Theory Algorithms in Competitive Programm...,"Dive deep into game theory algorithms, learn &...",3,-,124 reviews,https://www.udemy.com/course/game-theory-algor...,https://img-b.udemycdn.com/course/240x135/3878...,Game Theory Algorithms in Competitive Programm...
5932,"Siemens WinCC SCADA Programming, SCADA1 ( Basic )",This course is a great push for any one who wa...,2,-,124 reviews,https://www.udemy.com/course/siemens-wincc-scada/,https://img-b.udemycdn.com/course/240x135/2858...,"Siemens WinCC SCADA Programming, SCADA1 ( Basi..."
5933,Python Object Oriented Programming (OOP): Begi...,Deep OOP Foundations From Absolute Scratch,3,-,124 reviews,https://www.udemy.com/course/object-oriented-p...,https://img-b.udemycdn.com/course/240x135/4450...,Python Object Oriented Programming (OOP): Begi...
5934,jQuery Basics Guide,Everything you need to know to Build a Retirem...,0,-,124 reviews,https://www.udemy.com/course/learn-basic-jquery/,https://img-b.udemycdn.com/course/240x135/2554...,jQuery Basics Guide Everything you need to kno...


In [4]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [5]:
vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = vectorizer.fit_transform(data['combined_features'])
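
To make the vectorization step more concrete, here is a minimal sketch on a three-document toy corpus. The names `toy_docs`, `toy_vectorizer`, `toy_matrix`, and `toy_sim` are hypothetical and only stand in for `data['combined_features']` and the objects built above. It is meant as an illustration of how TF-IDF rows and cosine similarity relate, not as part of the pipeline.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy corpus standing in for data['combined_features'].
toy_docs = [
    "machine learning with python",
    "deep learning and neural networks",
    "cooking italian pasta at home",
]

# Same settings as the notebook: drop English stop words, then fit.
toy_vectorizer = TfidfVectorizer(stop_words='english')
toy_matrix = toy_vectorizer.fit_transform(toy_docs)

# One row per document, one column per vocabulary term.
print(toy_matrix.shape)

# Pairwise cosine similarity between all documents.
toy_sim = cosine_similarity(toy_matrix)
print(toy_sim.round(2))
```

Documents 0 and 1 share the term "learning", so their similarity is positive, while document 2 shares no terms with document 0 and scores zero against it.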

In [6]:
# def recommend_items(item_title, data, tfidf_matrix):
#     if item_title not in data['title'].values:
#         return f"Item '{item_title}' was not found in the dataset."
#     # index of the requested item
#     idx = data.index[data['title'] == item_title].tolist()[0]
#     # compute cosine similarity between the requested item and all other items
#     cosine_sim = cosine_similarity(tfidf_matrix[idx], tfidf_matrix).flatten()
#     # indices of the most similar items
#     similar_indices = cosine_sim.argsort()[:-6:-1]
#     similar_indices = similar_indices[similar_indices != idx]
#     # titles of the recommended items
#     recommendations = data.iloc[similar_indices]['title'].tolist()
#     return recommendations

In [7]:
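def recommend_search_items(item, vectorizer, tfidf_matrix, data):
    """Return the titles of the 5 courses most similar to a free-text query."""
    # an exact title match is not required; the query is treated as search text
    if item not in data['title'].values:
        print("There is no course with that name.")
    print(f"Here are some recommendations for '{item}':")
    # project the query into the fitted TF-IDF vocabulary
    item_vec = vectorizer.transform([item])
    # cosine similarity between the query and every course
    cosine_sim = cosine_similarity(item_vec, tfidf_matrix).flatten()
    # indices of the 5 highest scores, best match first
    ranked_indices = cosine_sim.argsort()[:-6:-1]
    # look up the matching titles
    search_results = data.iloc[ranked_indices]['title'].tolist()
    return search_results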
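
When the query is an exact title that already exists in the dataset, the course itself will normally rank first, since its similarity to itself is 1. One possible variant, sketched below, looks the course up by title and drops it from the ranking before taking the top results. The function name `recommend_similar_titles`, the `top_n` parameter, and the toy DataFrame are all assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend_similar_titles(title, tfidf_matrix, data, top_n=5):
    """Return up to top_n titles most similar to an existing course title."""
    # Positional row numbers of exact title matches (robust to any index).
    positions = np.flatnonzero(data['title'].values == title)
    if len(positions) == 0:
        return []
    pos = int(positions[0])
    # Similarity of the chosen course against every course.
    scores = cosine_similarity(tfidf_matrix[pos], tfidf_matrix).flatten()
    # Sort best first, drop the course itself, then keep top_n.
    ranked = scores.argsort()[::-1]
    ranked = ranked[ranked != pos][:top_n]
    return data.iloc[ranked]['title'].tolist()

# Hypothetical toy data with the same two columns the notebook relies on.
toy = pd.DataFrame({
    'title': ['Intro to Python', 'Python for Data Science', 'Italian Cooking'],
    'combined_features': [
        'Intro to Python learn python programming basics',
        'Python for Data Science analyze data with python',
        'Italian Cooking make pasta at home',
    ],
})
toy_matrix = TfidfVectorizer(stop_words='english').fit_transform(
    toy['combined_features'])

result = recommend_similar_titles('Intro to Python', toy_matrix, toy, top_n=2)
print(result)
```

Filtering before slicing means the function still returns `top_n` items whenever enough other courses exist.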

In [8]:
# recommendation and search testing

# item_to_test = 'Introduction to Machine Learning'
# recommended_items = recommend_items(item_to_test, data, tfidf_matrix)
# print(f"Recommendations for '{item_to_test}': {recommended_items}")

test = input("input item name:")
print('Item Name:', test)
print()
search_results = recommend_search_items(test, vectorizer, tfidf_matrix, data)
print(search_results)

Item Name: introduction to machine learning

There is no course with that name.
Here are some recommendations for 'introduction to machine learning':
['Machine Learning In The Cloud With Azure Machine Learning', 'Machine Learning Guide: Learn Machine Learning Algorithms', "Machine Learning : A Beginner's Basic Introduction", '2022 Machine Learning A to Z : 5 Machine Learning Projects', 'Practical Introduction to Machine Learning with Python']


In [9]:
import pickle

data.to_pickle('../pickle/courses_data.pkl')

with open('../pickle/vectorizer.pkl', 'wb') as f:
    pickle.dump(vectorizer, f)

with open('../pickle/tfidf_matrix.pkl', 'wb') as f:
    pickle.dump(tfidf_matrix, f)
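
For completeness, the saved artifacts can be read back with `pickle.load` (and the DataFrame with `pd.read_pickle`). The sketch below runs a small round trip in a temporary directory rather than touching `../pickle/`; the toy corpus and the variable names `vec`, `mat`, `loaded_vec`, and `loaded_mat` are hypothetical. Note that pickles should only be loaded from trusted sources, and a pickled scikit-learn object is generally expected to be loaded with the same library version that wrote it.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy objects standing in for the notebook's vectorizer and matrix.
docs = ["python programming basics", "data science with python"]
vec = TfidfVectorizer(stop_words='english')
mat = vec.fit_transform(docs)

with tempfile.TemporaryDirectory() as tmp:
    vec_path = Path(tmp) / 'vectorizer.pkl'
    mat_path = Path(tmp) / 'tfidf_matrix.pkl'

    # Save, mirroring the cell above.
    with open(vec_path, 'wb') as f:
        pickle.dump(vec, f)
    with open(mat_path, 'wb') as f:
        pickle.dump(mat, f)

    # Load the objects back.
    with open(vec_path, 'rb') as f:
        loaded_vec = pickle.load(f)
    with open(mat_path, 'rb') as f:
        loaded_mat = pickle.load(f)

# The restored objects should match the originals.
same_vocab = loaded_vec.vocabulary_ == vec.vocabulary_
same_values = (loaded_mat != mat).nnz == 0
print(same_vocab, same_values)
```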