# Course Recommendation System Model Building

**Popularity Based Technique**
*A popularity-based recommendation system is a simple and effective way to recommend items when little or no user data is available (the cold-start problem). It suggests the most popular or most frequently interacted-with items (e.g., most-watched movies or top-selling products), which makes it well suited to new users and useful as a baseline against which to compare more complex systems. It also helps surface trending or broadly liked content quickly.*

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')  # suppress all warnings to keep notebook output clean
pd.set_option('display.max_columns', None)

In [3]:
df = pd.read_csv('Datasets/online_courses_updated.csv')

In [88]:
df.shape

(100000, 16)

In [25]:
df.drop(columns='Unnamed: 0', inplace=True)
df.head()

Unnamed: 0,user_id,course_id,course_name,instructor,course_duration_hours,certification_offered,difficulty_level,rating,enrollment_numbers,course_price,feedback_score,study_material_available,time_spent_hours,previous_courses_taken,course_images,instructor_images
0,15796,9366,Python for Beginners,Emma Harris,39.1,Yes,Beginner,5.0,21600,317.5,0.797,Yes,17.6,4,https://images.unsplash.com/photo-152637909509...,https://images.pexels.com/photos/712521/pexels...
1,861,1928,Cybersecurity for Professionals,Alexander Young,36.3,Yes,Beginner,4.3,15379,40.99,0.77,Yes,28.97,9,https://images.pexels.com/photos/577585/pexels...,https://images.unsplash.com/photo-150064876779...
2,38159,9541,DevOps and Continuous Deployment,Dr. Mia Walker,13.4,Yes,Beginner,3.9,6431,380.81,0.772,Yes,52.44,4,https://images.pexels.com/photos/270404/pexels...,https://images.pexels.com/photos/733872/pexels...
3,44733,3708,Project Management Fundamentals,Benjamin Lewis,58.3,Yes,Beginner,3.1,48245,342.8,0.969,No,22.29,6,https://images.unsplash.com/photo-157316471371...,https://images.unsplash.com/photo-151908536075...
4,11285,3361,Ethical Hacking Masterclass,Daniel White,30.8,Yes,Beginner,2.8,34556,381.01,0.555,Yes,22.01,5,https://images.unsplash.com/photo-156398676860...,https://images.pexels.com/photos/2379004/pexel...


In [26]:
# Count null values in each column
df.isnull().sum()

user_id                     0
course_id                   0
course_name                 0
instructor                  0
course_duration_hours       0
certification_offered       0
difficulty_level            0
rating                      0
enrollment_numbers          0
course_price                0
feedback_score              0
study_material_available    0
time_spent_hours            0
previous_courses_taken      0
course_images               0
instructor_images           0
dtype: int64

**We consider only those course and instructor records whose `enrollment_numbers` meet a certain threshold.** <br>
*The threshold is `80%` of the maximum enrollment count.* <br>
*That is, we keep only the courses whose enrollments are greater than or equal to the threshold.*

In [79]:
max_enrollments, min_enrollments = df['enrollment_numbers'].max(), df['enrollment_numbers'].min()
max_enrollments, min_enrollments

(49999, 50)

In [80]:
threshold_score = max_enrollments * 0.80
threshold_score

39999.200000000004

In [81]:
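# .copy() gives an independent DataFrame, so later column assignments do not act on a slice of df
threshold_df = df[df['enrollment_numbers'] >= threshold_score].copy()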

In [82]:
threshold_df.head()

Unnamed: 0,user_id,course_id,course_name,instructor,course_duration_hours,certification_offered,difficulty_level,rating,enrollment_numbers,course_price,feedback_score,study_material_available,time_spent_hours,previous_courses_taken,course_images,instructor_images
3,44733,3708,Project Management Fundamentals,Benjamin Lewis,58.3,Yes,Beginner,3.1,48245,342.8,0.969,No,22.29,6,https://images.unsplash.com/photo-157316471371...,https://images.unsplash.com/photo-151908536075...
6,16851,7887,Networking and System Administration,Dr. Robert Davis,44.9,Yes,Beginner,4.9,41050,389.32,0.893,Yes,15.66,3,https://images.unsplash.com/photo-157316471398...,https://images.unsplash.com/photo-1545167622-3...
14,770,534,Photography and Video Editing,Daniel White,74.0,Yes,Advanced,4.1,40437,388.7,0.62,Yes,14.13,3,https://images.unsplash.com/photo-151603506937...,https://images.pexels.com/photos/2379004/pexel...
16,5312,3455,Python for Beginners,Charlotte King,11.1,Yes,Beginner,4.6,43655,426.0,0.966,Yes,22.8,5,https://images.unsplash.com/photo-152637909509...,https://images.pexels.com/photos/774909/pexels...
22,6397,1759,Fitness and Nutrition Coaching,Prof. Emily Johnson,88.5,No,Beginner,3.6,44312,178.6,0.598,No,14.45,1,https://images.unsplash.com/photo-157101961345...,https://images.pexels.com/photos/38554/girl-pe...


In [84]:
threshold_df.shape

(20041, 16)

**Enrollment counts alone do not capture popularity. We also need the average rating for each `Course` and `Instructor` pair, so we apply a groupby.**
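As a minimal sketch of what the groupby step does, the toy example below shows how `transform('mean')` writes each group's mean rating back onto every row of that group, whereas a plain aggregation returns one row per group. The `toy` DataFrame and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: course "A" is taught by two instructors
toy = pd.DataFrame({
    "course_name": ["A", "A", "A", "B"],
    "instructor": ["X", "X", "Y", "Z"],
    "rating": [4.0, 5.0, 3.0, 2.0],
})

# transform('mean') keeps the original row count and repeats the
# group mean on every row of the (course_name, instructor) group
toy["average_rating"] = (
    toy.groupby(["course_name", "instructor"])["rating"].transform("mean")
)

# A plain aggregation collapses each group to a single row instead
per_pair = toy.groupby(["course_name", "instructor"], as_index=False)["rating"].mean()

print(toy)
print(per_pair)
```

Because `transform` preserves the row count, duplicates still have to be dropped afterwards to get one row per course.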

In [85]:
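# Mean rating per (course, instructor) pair, written back onto every row of the pair
threshold_df['average_rating'] = threshold_df.groupby(['course_name','instructor'])['rating'].transform('mean')
# Keep one row per course name (the first row encountered, so one instructor per course)
threshold_df.drop_duplicates(subset=['course_name'], inplace=True)
# Rank by average rating and keep at most the top 25 courses
popularity_df = threshold_df.sort_values('average_rating', ascending=False).reset_index().head(25)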

In [64]:
popularity_df.drop(columns=['index'], inplace=True)

In [73]:
popularity_columns_needed = ['course_name', 'instructor', 'course_duration_hours', 'certification_offered', 'difficulty_level', 'course_price', 'study_material_available', 'course_images', 'instructor_images', 'average_rating']

In [74]:
popularity_df = popularity_df[popularity_columns_needed]

In [75]:
popularity_df

Unnamed: 0,course_name,instructor,course_duration_hours,certification_offered,difficulty_level,course_price,study_material_available,course_images,instructor_images,average_rating
0,Advanced Machine Learning,Liam Adams,24.9,No,Intermediate,292.79,Yes,https://images.unsplash.com/photo-162071294354...,https://images.pexels.com/photos/220453/pexels...,4.118182
1,Networking and System Administration,Dr. Robert Davis,44.9,Yes,Beginner,389.32,Yes,https://images.unsplash.com/photo-157316471398...,https://images.unsplash.com/photo-1545167622-3...,4.104167
2,Graphic Design with Canva,Jessica Martinez,11.0,No,Advanced,300.29,Yes,https://images.unsplash.com/photo-1547658719-d...,https://images.pexels.com/photos/1181686/pexel...,4.013725
3,Personal Finance and Wealth Building,Olivia Taylor,81.2,Yes,Advanced,397.19,Yes,https://images.unsplash.com/photo-1554224155-6...,https://images.pexels.com/photos/1326946/pexel...,4.004878
4,Fitness and Nutrition Coaching,Prof. Emily Johnson,88.5,No,Beginner,178.6,No,https://images.unsplash.com/photo-157101961345...,https://images.pexels.com/photos/38554/girl-pe...,4.001818
5,Data Visualization with Tableau,Dr. John Smith,80.9,Yes,Intermediate,304.65,Yes,https://images.pexels.com/photos/265087/pexels...,https://images.pexels.com/photos/428333/pexels...,3.997872
6,Python for Beginners,Charlotte King,11.1,Yes,Beginner,426.0,Yes,https://images.unsplash.com/photo-152637909509...,https://images.pexels.com/photos/774909/pexels...,3.991071
7,DevOps and Continuous Deployment,Michael Brown,24.4,Yes,Beginner,39.57,Yes,https://images.pexels.com/photos/270404/pexels...,https://images.unsplash.com/photo-1560250097-0...,3.981132
8,Cloud Computing Essentials,Olivia Taylor,63.4,No,Beginner,389.44,Yes,https://images.pexels.com/photos/19867468/pexe...,https://images.pexels.com/photos/1326946/pexel...,3.97561
9,Blockchain and Decentralized Applications,Emma Harris,28.3,Yes,Intermediate,32.89,Yes,https://images.pexels.com/photos/14902678/pexe...,https://images.pexels.com/photos/712521/pexels...,3.969388


In [71]:
popularity_df.shape

(20, 10)
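The steps above could be wrapped into a single reusable function, for example to serve recommendations to new users. The sketch below is one possible way to do that, not a definitive implementation: the function name `top_popular_courses` and its parameters `threshold_ratio` and `top_n` are hypothetical, and the toy data is made up. It assumes the same column names as the dataset used in this notebook.

```python
import pandas as pd

def top_popular_courses(df, threshold_ratio=0.80, top_n=25):
    """Return the top_n courses by average rating among highly enrolled courses.

    Hypothetical helper that mirrors the notebook steps: threshold on
    enrollments, average rating per (course, instructor), one row per course.
    """
    # Keep rows whose enrollments reach the given share of the maximum
    threshold = df["enrollment_numbers"].max() * threshold_ratio
    popular = df[df["enrollment_numbers"] >= threshold].copy()

    # Mean rating per (course_name, instructor) pair
    popular["average_rating"] = (
        popular.groupby(["course_name", "instructor"])["rating"].transform("mean")
    )

    # One row per course name, ranked by average rating
    popular = popular.drop_duplicates(subset=["course_name"])
    ranked = popular.sort_values("average_rating", ascending=False)
    return ranked.head(top_n).reset_index(drop=True)

# Made-up toy data to exercise the function
toy = pd.DataFrame({
    "course_name": ["A", "A", "B", "C"],
    "instructor": ["X", "X", "Y", "Z"],
    "rating": [4.0, 5.0, 3.0, 4.9],
    "enrollment_numbers": [100, 90, 85, 10],
})
result = top_popular_courses(toy, top_n=2)
print(result[["course_name", "instructor", "average_rating"]])
```

In the toy data, course "C" has the highest rating but is excluded because its enrollments fall below the threshold, which is the intended effect of filtering on enrollments before ranking by rating.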