## This notebook:
- Recommend a course based on the course description
1.    
    - Topic modeling on course descriptions
    - Take a course title (e.g. AAS 103)
    - Take the course description of this course
    - Process text of the description
    - Find the topics of this block of text
    
2.    
    - Process text of the descriptions of all other courses
    - Find the topics of those blocks of text
    
3.    
    - Match the topic of the input course and other courses
    - Compute similarity scores
    - Rank these scores from high to low
    - Return the n number of recommendations needed (num_of_rec) by order of similarity
    
- Need to improve

    - Text processing
    - Topic modeling (the recommendations are not quite logical yet since the text processing and topic modeling are not quite well-done yet)
    - Efficiency of the algorithm (slow now)

## Update
1. Because the courses already have distinct clusters such as academic groups (LSI, engineering, dentistry ...) and subject (Afroamerican sections, etc), it makes more sense that we recommend courses in the same academic group and subject.


2. Somehow, processing the texts (stopwords removal, lemma, etc) produce poorer recommendations. The results look much better without the language processing. 


3. About topic modeling -- I'm not sure how we could utilize topic modeling, since the total number of academic group is about 20 so if we cluster the courses with topic modeling, it's not going to work very well unless we use a large number of cluster like 200 - 500. We could try topic modeling in the subject level, but I think count vec and tfidf vec works pretty well, so not sure if that would be necessary. 

### Paper that helps: https://www.frontiersin.org/articles/10.3389/frai.2020.00042/full

In [1]:
import pandas as pd
import neattext.functions as nfx

In [2]:
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel, sigmoid_kernel

In [3]:
# Load our dataset
f_21 = pd.read_csv('assets/f_21_merge.csv')
w_22 = pd.read_csv('assets/w_22_merge.csv')

## Cosine similarity

In [43]:
df, course_title, num_of_rec = f_21,'AAS 103', 10

def make_recommendation_cos(df, course_title, num_of_rec = 10):
    df = df.fillna('').drop_duplicates(subset=['course']).reset_index().drop(columns='index')


    input_ag = df.loc[df['course'] == course_title, 'Acad Group'].unique()
    input_sub = df.loc[df['course'] == course_title, 'Subject'].unique()
    input_course = df.loc[df['course'] == course_title, 'Course Title'].unique()

    df = df[df['Acad Group'].isin(input_ag)] 
    df = df[(df['Subject'].isin(input_sub)) | (df['Course Title'].isin(input_course))]

    # Vectorize our Text
    count_vect = CountVectorizer()
    cv_mat = count_vect.fit_transform(df['description'])

    df_cv_words = pd.DataFrame(cv_mat.todense(), columns=count_vect.get_feature_names())

    # Cosine Similarity Matrix
    cosine_sim_mat = cosine_similarity(cv_mat)

    # Get Course ID/Index
    course_indices = pd.Series(df.index, index=df['course'])

    # ID for title
    idx = course_indices[course_title]

    # Course Indice
    # Search inside cosine_sim_mat
    scores = list(enumerate(cosine_sim_mat[idx]))

    # Scores
    # Sort Scores
    sorted_scores = sorted(scores, key=lambda x:x[1], reverse=True)

    # Recommender
    selected_course_indices = [i[0] for i in sorted_scores[1:]]
    selected_course_scores = [i[1] for i in sorted_scores[1:]]

    result = df[df.columns].iloc[selected_course_indices]

    rec_df = pd.DataFrame(result)

    rec_df['similarity_scores'] = selected_course_scores

    return rec_df[:num_of_rec]

## Sigmoid kernel

In [46]:
def make_recommendation_sk(df, course, num_of_rec):
    df = df.fillna('').drop_duplicates(subset=['course']).reset_index().drop(columns='index')


    input_ag = df.loc[df['course'] == course_title, 'Acad Group'].unique()
    input_sub = df.loc[df['course'] == course_title, 'Subject'].unique()
    input_course = df.loc[df['course'] == course_title, 'Course Title'].unique()

    df = df[df['Acad Group'].isin(input_ag)] 
    df = df[(df['Subject'].isin(input_sub)) | (df['Course Title'].isin(input_course))]
    ### edit this section with more refined and detailed topic modeling
    
    tfidf = TfidfVectorizer(min_df=3, max_features=None, 
                strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}',
                ngram_range=(1, 3))
    
    ##############

    # Fitting the TF-IDF on the 'description' text
    tfidf_matrix = tfidf.fit_transform(df['description'])

    # Compute the sigmoid kernel
    sig = sigmoid_kernel(tfidf_matrix, tfidf_matrix)

    # Reverse mapping of indices and course titles
    indices = pd.Series(df.index, index=df['course']).drop_duplicates()

    # Get the index corresponding to course title
    idx = indices[course]

    # Get the pairwsie similarity scores 
    sig_scores = list(enumerate(sig[idx]))

    # Sort the courses
    sig_scores = sorted(sig_scores, key=lambda x: x[1], reverse=True)

    # Scores of the n most similar courses
    sig_scores = sig_scores[1:num_of_rec+1]

    # Take the indices
    course_indices = [i[0] for i in sig_scores]

    # Top 10 most similar courses
    rec_df = df[df.columns].iloc[course_indices]
    
    rec_df['sig_scores'] = sig_scores

    return rec_df

In [70]:
make_recommendation_cos(f_21, 'AAS 103', 10)

Unnamed: 0,Class Nbr,course,Term,Session,Acad Group,Subject,Course Title,description,Component,Time,...,Has WL,Units,sub_title,credits,requirements_distribution,consent,advisory_prerequisites,other_course_info,repeatability,similarity_scores
13,27259,AAS 304,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Gender&Immigr,"Refugees, migrants, immigrants, diaspora grou...",SEM,4-530PM,...,Y,3.0,"- Refugees of Unjust Worlds: Globalization, G...",3,SS,With permission of instructor.,The seminar is intended for junior and senior ...,,May not be repeated for credit.,0.766034
23,25830,AAS 365,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Gender Global Health,Feminists and anthropologists have produced vo...,SEM,1-230PM,...,Y,3.0,,3,SS,With permission of instructor.,One course in either Women's & Gender Studies ...,,May not be repeated for credit.,0.698175
27,33104,AAS 421,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Afdiaspora Religions,This survey course offers an overview of the r...,LEC,10-1130AM,...,Y,3.0,"- Religions of the African Diasp: Vodou, Sant...",3,RE,,,,May not be repeated for credit.,0.681765
28,31609,AAS 458,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Black World Issues,This seminar is designed to introduce students...,SEM,530-7PM,...,Y,3.0,- Political Violence in Africa,3,,,,,May be repeated for a maximum of 6 credit(s).,0.672489
11,27816,AAS 290,Fall 2021,First 7 Week Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Select Blk World Std,The mini-course seminars will introduce studen...,SEM,10-1130AM,...,Y,2.0,- Hoop Dreams: Race and Basketball in America,2,,,,,May not be repeated for credit.,0.672129
35,26151,AAS 558,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Black World Seminar,In this course we will study African American ...,SEM,10-1PM,...,Y,3.0,- Policing Blackness in America,3,,,Graduate standing or permission of instructor.,,May be repeated for a maximum of 6 credit(s).,0.634175
29,33106,AAS 495,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Senior Seminar,\nThis course explores cities in contemporary ...,SEM,4-530PM,...,Y,4.0,- Contemporary Africa and the World,4,ULWR,,Upperclass standing.,(Cross-Area Courses).,May be repeated for a maximum of 8 credit(s).,0.629296
12,31945,AAS 303,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Race&Ethnicity,This course examines the central tensions unde...,DIS,10-11AM,...,Y,4.0,,4,"SS, RE",,An introductory course in Sociology or AAS 201.,(African-American Studies).,May not be repeated for credit.,0.621194
8,33094,AAS 260,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,African Development,The course introduces students to the confluen...,LEC,4-530PM,...,Y,3.0,- The Political Economy,3,SS,,,,May not be repeated for credit.,0.615796
16,26182,AAS 322,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Intro Env Politics,This course introduces students to global envi...,DIS,10-11AM,...,Y,4.0,- Introduction to Environmental Politics: Rac...,4,"ULWR, SS, RE",,,(Cross-Area Courses).,May not be repeated for credit.,0.614989


In [71]:
make_recommendation_sk(f_21, 'AAS 103', 10)

Unnamed: 0,Class Nbr,course,Term,Session,Acad Group,Subject,Course Title,description,Component,Time,...,Has WL,Units,sub_title,credits,requirements_distribution,consent,advisory_prerequisites,other_course_info,repeatability,sig_scores
13,27259,AAS 304,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Gender&Immigr,"Refugees, migrants, immigrants, diaspora grou...",SEM,4-530PM,...,Y,3.0,"- Refugees of Unjust Worlds: Globalization, G...",3,SS,With permission of instructor.,The seminar is intended for junior and senior ...,,May not be repeated for credit.,"(13, 0.7623840172367933)"
28,31609,AAS 458,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Black World Issues,This seminar is designed to introduce students...,SEM,530-7PM,...,Y,3.0,- Political Violence in Africa,3,,,,,May be repeated for a maximum of 6 credit(s).,"(28, 0.7623592676058287)"
11,27816,AAS 290,Fall 2021,First 7 Week Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Select Blk World Std,The mini-course seminars will introduce studen...,SEM,10-1130AM,...,Y,2.0,- Hoop Dreams: Race and Basketball in America,2,,,,,May not be repeated for credit.,"(11, 0.7623105501427522)"
23,25830,AAS 365,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Gender Global Health,Feminists and anthropologists have produced vo...,SEM,1-230PM,...,Y,3.0,,3,SS,With permission of instructor.,One course in either Women's & Gender Studies ...,,May not be repeated for credit.,"(23, 0.7622815661289793)"
27,33104,AAS 421,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Afdiaspora Religions,This survey course offers an overview of the r...,LEC,10-1130AM,...,Y,3.0,"- Religions of the African Diasp: Vodou, Sant...",3,RE,,,,May not be repeated for credit.,"(27, 0.7622571836009558)"
18,29687,AAS 346,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Lit in African Hist,This course is about the history of African li...,SEM,1130-1PM,...,Y,3.0,,3,"HU, RE",,,,May not be repeated for credit.,"(18, 0.7622435866355507)"
35,26151,AAS 558,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Black World Seminar,In this course we will study African American ...,SEM,10-1PM,...,Y,3.0,- Policing Blackness in America,3,,,Graduate standing or permission of instructor.,,May be repeated for a maximum of 6 credit(s).,"(35, 0.7622368184800196)"
29,33106,AAS 495,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Senior Seminar,\nThis course explores cities in contemporary ...,SEM,4-530PM,...,Y,4.0,- Contemporary Africa and the World,4,ULWR,,Upperclass standing.,(Cross-Area Courses).,May be repeated for a maximum of 8 credit(s).,"(29, 0.7621922157208741)"
10,36353,AAS 275,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Blk Women Pop Cult,Popular culture is an important site for creat...,SEM,1-230PM,...,Y,3.0,,3,ID,,,,May not be repeated for credit.,"(10, 0.7621888873094687)"
21,36356,AAS 357,Fall 2021,Regular Academic Session,"Literature, Sci, and the Arts",Afroamerican & African Studies (AAS) Open Sect...,Environ Afr Dvlpmt,Environmental sustainability and economic deve...,SEM,230-4PM,...,Y,3.0,,3,SS,,,,May not be repeated for credit.,"(21, 0.7621726719156646)"
