# Breaking Down UVA Course Reviews

By: William Kaiser

## Layout

The layout for this site was designed in [Figma](https://www.figma.com/file/PCZFfmRXn0e6720BjxNAtg/Misc-Images?type=design&node-id=539-5&mode=design&t=4der6LRc6aprI2Sf-0) and designed to be responsive and intent-based.

![Layout](./imgs/layout.png)

However, after discussion with fellow class members and looking at [Python Graph Gallery](https://python-graph-gallery.com/) I decided to use a [Spider / Radar](https://python-graph-gallery.com/radar-chart/) plot as well as the more traditional bar charts.

# Data Loading

The data for this analysis comes from [theCourseForum](https://thecourseforum.com/) and a direct database connection was used. 

For more information about data provenance, collection, formatting, and cleaning, please see [Sprint 2: Data](./sprints/sprint2-data.ipynb).

In [112]:
# Generic imports
import pandas as pd
# import numpy as np
# import matplotlib.pyplot as plt
# import seaborn as sns
from tqdm import tqdm
from openai import OpenAI
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
from dash import dash_table, dcc, Dash, html, callback, Input, Output, State
from typing import List, Dict, TypedDict
import os

tqdm.pandas()

In [113]:
# Loading the data
df = pd.read_csv("./data.csv")
df['course_level'] = pd.to_numeric(df['course_number']).apply(lambda x: x // 1000)
df


Columns (14,15,26,29) have mixed types. Specify dtype option on import or set low_memory=False.



Unnamed: 0,course_id,title,description,course_number,average,a_plus,a,a_minus,b_plus,b,...,amount_group,amount_homework,review_created,year,season,instructor_email,instructor_name,year_last_taught,season_last_taught,course_level
0,12.0,Introductory Accounting I,Designed to introduce students to the language...,2010.0,3.200437,59.0,131.0,51.0,42.0,62.0,...,4,2.0,2020-11-10 22:11:29+00:00,2020,FALL,gdb5x@virginia.edu,Gary Brooks,2023.0,SPRING,2.0
1,1505.0,History and Civilization of Classical India,Studies the major elements of South Asian civi...,2001.0,3.395795,5.0,85.0,132.0,117.0,66.0,...,0,0.0,2020-12-19 04:08:23.637939+00:00,2020,FALL,sal9c@virginia.edu,Spencer Leonard,2021.0,FALL,2.0
2,1504.0,Introductory Seminar in South Asia,Introduction to the study of history intended ...,1501.0,3.601403,9.0,118.0,131.0,74.0,27.0,...,0,0.0,2016-11-02 16:38:11+00:00,2016,FALL,sal9c@virginia.edu,Spencer Leonard,2022.0,SPRING,1.0
3,1505.0,History and Civilization of Classical India,Studies the major elements of South Asian civi...,2001.0,3.395795,5.0,85.0,132.0,117.0,66.0,...,0,0.0,2021-12-20 18:54:54.439844+00:00,2021,FALL,sal9c@virginia.edu,Spencer Leonard,2021.0,FALL,2.0
4,15083.0,India From Akbar to Victoria,Studies the society and politics in the Mughal...,3002.0,2.984615,0.0,2.0,6.0,1.0,1.0,...,0,0.0,2022-11-03 03:15:03.409391+00:00,2021,FALL,sal9c@virginia.edu,Spencer Leonard,2021.0,FALL,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
41507,754.0,Money and Banking,Studies the role of money in the economic syst...,3030.0,3.144924,136.0,754.0,742.0,435.0,596.0,...,0,0.0,2011-06-27 00:00:00+00:00,2009,SPRING,hjm5p@virginia.edu,Hernan Moscoso Boedo,2024.0,SPRING,3.0
41508,754.0,Money and Banking,Studies the role of money in the economic syst...,3030.0,3.144924,136.0,754.0,742.0,435.0,596.0,...,0,0.0,2013-03-03 00:00:00+00:00,2009,SPRING,hjm5p@virginia.edu,Hernan Moscoso Boedo,2024.0,SPRING,3.0
41509,754.0,Money and Banking,Studies the role of money in the economic syst...,3030.0,3.144924,136.0,754.0,742.0,435.0,596.0,...,0,1.0,2013-08-15 00:00:00+00:00,2009,SPRING,hjm5p@virginia.edu,Hernan Moscoso Boedo,2024.0,SPRING,3.0
41510,8600.0,Special Topics in Politics,Special Topics in Politics,2500.0,3.419390,11.0,81.0,71.0,42.0,44.0,...,0,0.0,2021-01-20 19:38:10.579957+00:00,2021,JANUARY,dal7w@virginia.edu,David Leblang,2024.0,SPRING,2.0


In [114]:
# Printing the columns
print("\n".join(df.columns.to_list()))

course_id
title
description
course_number
average
a_plus
a
a_minus
b_plus
b
b_minus
c_plus
c
c_minus
total_enrolled
dfw
subdepartment_name
mnemonic
review_text
instructor_rating
difficulty
recommendability
enjoyability
hours_per_week
amount_reading
amount_writing
amount_group
amount_homework
review_created
year
season
instructor_email
instructor_name
year_last_taught
season_last_taught
course_level


## Making the Spider Plot

The goal of this section is for a specified course to be selected and then a radar plot to be generated based on the reviews for that course.

In [115]:
REVIEW_COMPONENTS = {
    "instructor_rating": "Instructor Rating",
    "difficulty": "Difficulty",
    "recommendability": "Recommendability",
    "enjoyability": "Enjoyability",
    # "hours_per_week": "Hours Per Week",
    "amount_reading": "Amount Reading",
    "amount_writing": "Amount Writing",
    "amount_group": "Amount Groupwork",
    "amount_homework": "Amount Homework",
}

# making each column into a numeric column
for column in REVIEW_COMPONENTS.keys():
    df[column] = pd.to_numeric(df[column], errors="coerce")

COURSE = "CS 3100"

def course_components(course_pneumonic_and_number: str) -> (str, int):
    """
    Turns a human readable course pneumonic and number into a database-searchable tuple
    """
    course_pneumonic_and_number = (
        course_pneumonic_and_number.strip().upper().replace("  ", " ").split(" ")
    )
    pneumoic, number = course_pneumonic_and_number
    print(pneumoic, number)
    return pneumoic, int(number)


def get_course_summary_ratings(pneumonic: str, number: int) -> pd.DataFrame:
    """
    Gets the ratings for the course
    """
    pneumonic_mask = df["mnemonic"] == pneumonic
    number_mask = df["course_number"] == number
    relevant = df[pneumonic_mask & number_mask]

    mean = relevant[REVIEW_COMPONENTS.keys()].mean()
    std = relevant[REVIEW_COMPONENTS.keys()].std()

    frame = pd.DataFrame(
        {"mean": mean.to_list(), "std": std.to_list(), "category": mean.index.to_list()}
    )
    frame['course'] = frame['mean'].apply(lambda x: f"{pneumonic} {number}")
    return frame


get_course_summary_ratings(*course_components(COURSE))

CS 3100


Unnamed: 0,mean,std,category,course
0,4.058824,1.344925,instructor_rating,CS 3100
1,4.235294,0.752447,difficulty,CS 3100
2,3.588235,1.416811,recommendability,CS 3100
3,3.411765,1.416811,enjoyability,CS 3100
4,1.235294,2.411675,amount_reading,CS 3100
5,2.058824,2.98895,amount_writing,CS 3100
6,1.705882,2.687115,amount_group,CS 3100
7,3.470588,4.07918,amount_homework,CS 3100


In [116]:
# Creating a spider plot for the course
# source: https://python-graph-gallery.com/571-radar-chart-with-plotly/
# Getting many plot with px
# source: https://stackoverflow.com/questions/56727843/how-can-i-create-subplots-with-plotly-express
# note: this sucks


def course_axis(
    frame: pd.DataFrame, course: str = None, include: str = "ratings"
) -> px.line_polar:
    """
    Makes a polar axis to compare courses
    """
    if "duration" in include:
        frame = frame[frame["category"].apply(lambda name: name.find("amount") != -1)]
    if "rating" in include:
        frame = frame[frame["category"].apply(lambda name: name.find("amount") == -1)]

    fig = go.Figure()
    frame['category'] = frame['category'].replace(REVIEW_COMPONENTS)
    for course in frame["course"].unique():
        rel_frame = frame[frame["course"] == course]
        graph_object = go.Scatterpolar(
            r=rel_frame["mean"], theta=rel_frame["category"], fill="toself",
             name=course.upper()
        )

        fig.add_trace(graph_object)

    return fig


cs_3100 = get_course_summary_ratings(*course_components(COURSE))
cs_2130 = get_course_summary_ratings(*course_components("CS 2130"))
cs_3130 = get_course_summary_ratings(*course_components("CS 3130"))
ds_4003 = get_course_summary_ratings(*course_components("DS 4003"))


rows = pd.concat([cs_3100, cs_2130, cs_3130, ds_4003])

course_axis(rows, course="CS 3100")

CS 3100
CS 2130
CS 3130
DS 4003




A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



## Excel-Style Filterable Table

The goal of this section is to allow user filtering of tables in an excel like manner. 

The data in the section will be similar to [the spider plot](#making-the-spider-plot), however the format will be a bit more formal.

![Excel-style Table](./imgs/excel-table.png)

**Guide:** https://dash.plotly.com/datatable

In [117]:
# Defining columns of interest
SAMPLE_COURSES = ["CS 3100", "CS 2130", "CS 3130", "DS 4003"]

# additional table components
TABLE_COMPONENTS = {
    'average': 'Average',
}

course_components_to_agg = {**TABLE_COMPONENTS, **REVIEW_COMPONENTS}
course_components_to_agg

{'average': 'Average',
 'instructor_rating': 'Instructor Rating',
 'difficulty': 'Difficulty',
 'recommendability': 'Recommendability',
 'enjoyability': 'Enjoyability',
 'amount_reading': 'Amount Reading',
 'amount_writing': 'Amount Writing',
 'amount_group': 'Amount Groupwork',
 'amount_homework': 'Amount Homework'}

In [118]:
def get_data_for_course_comparison_table(courses: List) -> pd.DataFrame:
    """
    Gets relevant course_components to aggregate for a list of courses
    """
    # filtering the data down to the relevant frame
    mask = df["mnemonic"] == -1
    for course in courses:
        pneumonic, number = course
        mask = mask | ((df["mnemonic"] == pneumonic) & (df["course_number"] == number))

    print(course_components_to_agg)

    grouped = df[mask].groupby(["mnemonic", "course_number"]).agg({ key: 'mean' for key in course_components_to_agg.keys() }).reset_index()

    return grouped

course_tuples = [course_components(course) for course in SAMPLE_COURSES]
specific_data = get_data_for_course_comparison_table(course_tuples)
specific_data

CS 3100
CS 2130
CS 3130
DS 4003
{'average': 'Average', 'instructor_rating': 'Instructor Rating', 'difficulty': 'Difficulty', 'recommendability': 'Recommendability', 'enjoyability': 'Enjoyability', 'amount_reading': 'Amount Reading', 'amount_writing': 'Amount Writing', 'amount_group': 'Amount Groupwork', 'amount_homework': 'Amount Homework'}


Unnamed: 0,mnemonic,course_number,average,instructor_rating,difficulty,recommendability,enjoyability,amount_reading,amount_writing,amount_group,amount_homework
0,CS,2130.0,2.948611,3.45,4.3,2.5,2.3,1.9,1.2,1.15,5.7
1,CS,3100.0,3.37977,4.058824,4.235294,3.588235,3.411765,1.235294,2.058824,1.705882,3.470588
2,CS,3130.0,3.33,3.6,5.0,2.6,2.3,1.5,7.5,0.3,2.5


In [119]:
# making the table in plotly
# source: https://plotly.com/python/table/

new_course_components_to_agg = {
    **course_components_to_agg,
    "course_number": "Course #",
    "mnemonic": "Mnemonic",
}


def get_cells(course_data: pd.DataFrame) -> List[List[any]]:
    """
    Gets the cells in the data table
    """
    return course_data.round(2).to_numpy().transpose().tolist()


pretty_column_names = list(
    map(
        lambda col: f"<b>{new_course_components_to_agg.get(col, col).replace(' ', '<br>')}</b><br>",
        course_data.columns.to_list(),
    )
)

table = go.Figure(
    data=[
        go.Table(
            header=dict(
                values=pretty_column_names,
                line_color="darkslategray",
                fill_color="royalblue",
                align=["left", "center"],
                font=dict(color="white", size=12),
                height=40,
            ),
            cells=dict(
                values=get_cells(course_data),
                line_color="darkslategray",
                fill=dict(color=["paleturquoise", "white"]),
                align=["left", "center"],
                font_size=12,
                height=30,
            ),
        )
    ]
)

table  # note: I am pretty happy with this. I would like to add more interactivity here.

In [120]:
# making a source dropdown
# source: https://dash.plotly.com/dash-core-components/dropdown
# df['course_number'] = df['course_number'].apply(int)
df['course_number'].fillna(0, inplace=True)
df['name'] = df['mnemonic'] + " " + df['course_number'].astype(str)
df['name']

0        ACCT 2010.0
1        HISA 2001.0
2        HISA 1501.0
3        HISA 2001.0
4        HISA 3002.0
            ...     
41507    ECON 3030.0
41508    ECON 3030.0
41509    ECON 3030.0
41510    PLAD 2500.0
41511    COMM 3845.0
Name: name, Length: 41512, dtype: object

In [121]:
# adding source dropdowns here
course_mneumoic_dropdown = dcc.Dropdown(
    id='course_dropdown',
    options=[{'label': str(course).replace(".0", ""), 'value': str(course).replace(".0", "")} for course in df['name'].unique()],
    value='CS 3100',
    multi=True
)

In [122]:
# making a dash table so that things work off the cuff
@callback(
    Output('course_table', 'data'),
    Input('course_dropdown', 'value')
)
def update_table_data(data) -> pd.DataFrame:
    """
    Updates the data table to show the dropdown values
    """

    # new_course_tuples = [course_components(course) for course in data]
    # new_course_data = get_data_for_course_comparison_table(new_course_tuples)

    # print(new_course_data)

    course_tuples = [course_components(course) for course in SAMPLE_COURSES]
    course_data = get_data_for_course_comparison_table(course_tuples)

    course_data['course_number'] = course_data['course_number'].apply(int).apply(str)
    course_data['average'] = course_data['average'].apply(lambda x: f"{x:.2f}")
    course_data['amount_reading'] = course_data['amount_reading'].apply(lambda x: f"{x:.2f}")
    course_data['amount_writing'] = course_data['amount_writing'].apply(lambda x: f"{x:.2f}")
    course_data['amount_group'] = course_data['amount_group'].apply(lambda x: f"{x:.2f}")
    course_data['amount_homework'] = course_data['amount_homework'].apply(lambda x: f"{x:.2f}")
    course_data['instructor_rating'] = course_data['instructor_rating'].apply(lambda x: f"{x:.2f}")
    course_data['difficulty'] = course_data['difficulty'].apply(lambda x: f"{x:.2f}")
    course_data['recommendability'] = course_data['recommendability'].apply(lambda x: f"{x:.2f}")
    course_data['enjoyability'] = course_data['enjoyability'].apply(lambda x: f"{x:.2f}")

    return course_data


update_table_data(["CS 3100"])

CS 3100
CS 2130
CS 3130
DS 4003
{'average': 'Average', 'instructor_rating': 'Instructor Rating', 'difficulty': 'Difficulty', 'recommendability': 'Recommendability', 'enjoyability': 'Enjoyability', 'amount_reading': 'Amount Reading', 'amount_writing': 'Amount Writing', 'amount_group': 'Amount Groupwork', 'amount_homework': 'Amount Homework'}


Unnamed: 0,mnemonic,course_number,average,instructor_rating,difficulty,recommendability,enjoyability,amount_reading,amount_writing,amount_group,amount_homework
0,CS,2130,2.95,3.45,4.3,2.5,2.3,1.9,1.2,1.15,5.7
1,CS,3100,3.38,4.06,4.24,3.59,3.41,1.24,2.06,1.71,3.47
2,CS,3130,3.33,3.6,5.0,2.6,2.3,1.5,7.5,0.3,2.5


# Breaking Down Instructor Rating

![Correlating Instructor Reviews](./imgs/correlating_reviews.png)

In [123]:
# the goal for this is to create the functions which can control for a  particular factor
CHECKBOX_SELECTORS = {
    'average': 'GPA',
    # 'course_level': 'Course Level (1k, 2k, ...)',
    'difficulty' : 'Difficulty',
    'hours_per_week': 'Hours Worked Per Week',
    'amount_group': 'Group Work Per Week'
}

In [124]:
# making a bunch of checkboxes in plotly
checklist = dcc.Checklist(
    options=[
        {'label': v, 'value': k} for k, v in CHECKBOX_SELECTORS.items()
    ],
    value=[],
    id='features_to_plot'
)

In [125]:
# creating a histogram of course reviews
course_reviews = px.histogram(
    df.groupby(["mnemonic", "course_number"]).agg({"instructor_rating": "mean"}),
    x="instructor_rating",
)
course_reviews

### Making a regression

In [126]:
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# making a linear regression model
model = LinearRegression()

columns = ['difficulty', 'instructor_rating', 'hours_per_week']
# getting the data
new_columns = ['average', 'difficulty', 'hours_per_week', 'amount_group']
df.dropna(subset=new_columns, inplace=True)
X = df[new_columns].values.reshape(-1, len(new_columns))
Y = df['instructor_rating'].values.reshape(-1, 1)

# fitting the model
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)

model.fit(X_train, Y_train)

print(model.coef_, model.intercept_)

[[ 0.33336877 -0.19990082 -0.00625976 -0.03642834]] [3.30798362]


In [127]:
@callback(
    Output('correlation_plot', 'figure'),
    Input('features_to_plot', 'value')
)
def make_correlation_plot(columns): 
    """
    Makes a bar chart of the correlation between columns
    """
    # uses the data from the model
    data = zip(new_columns, model.coef_[0])
    print(data)
    # filter the data by which the column is in the columns
    column_renamer = {
        'average': 'GPA',
        'difficulty': 'Difficulty',
        'hours_per_week': 'Hours Worked Per Week',
        'amount_group': 'Group Work Per Week'
    }

    x_labels = []
    y_values = []
    for col, r in data:
        if col in columns:
            x_labels.append(column_renamer.get(col, col))
            y_values.append(r)
    
    fig = px.bar(x=x_labels, y=y_values, labels= {'x': 'Feature', 'y': 'Correlation Coefficient', **column_renamer})
    return fig

make_correlation_plot(new_columns)

<zip object at 0x2a3e228c0>


In [128]:
@callback(
    Output('review_residuals', 'figure'),
    Input('features_to_plot', 'value')
)
def make_residual_plot(columns): 
    """
    Makes a bar chart of the correlation between columns
    """
    data = zip(new_columns, model.coef_[0])
    print(data)
    # filter the data by which the column is in the columns
    column_renamer = {
        'average': 'GPA',
        'difficulty': 'Difficulty',
        'hours_per_week': 'Hours Worked Per Week',
        'amount_group': 'Group Work Per Week'
    }
    print("Columns to plot", columns)

    x_labels = []
    y_values = []
    for col, r in data:
        if col in columns:
            x_labels.append(column_renamer.get(col, col))
            y_values.append(r)

    new_df = df
    for col in new_columns:
        if col not in columns:
            new_df[col] = df[col].apply(lambda x: 0)
            print("Zeroing out", col)

    residuals = pd.Series(model.predict(new_df[new_columns]).flatten()) - new_df['instructor_rating']
    residuals = residuals.dropna()
    fig = px.histogram(x=residuals, title="Review Residuals", labels={
        "count": "# of Reviews",
        "x": "Difference between predicted and actual review"
    })

    fig.update_layout(

    )

    # making a plot of reviews
    return fig


make_residual_plot(new_columns)

<zip object at 0x295d8f280>
Columns to plot ['average', 'difficulty', 'hours_per_week', 'amount_group']



X has feature names, but LinearRegression was fitted without feature names



In [129]:
model.intercept_, model.get_params(), model.coef_

(array([3.30798362]),
 {'copy_X': True, 'fit_intercept': True, 'n_jobs': None, 'positive': False},
 array([[ 0.33336877, -0.19990082, -0.00625976, -0.03642834]]))

In [130]:
# getting measures of importance

from sklearn.inspection import permutation_importance

result = permutation_importance(model, X_test, Y_test, n_repeats=10, random_state=42)

X_train

array([[3.40264617, 4.        , 5.        , 1.        ],
       [3.11693794, 3.        , 3.        , 0.        ],
       [3.17024084, 3.        , 2.        , 0.        ],
       ...,
       [3.11693794, 2.        , 1.        , 0.        ],
       [2.94840044, 5.        , 0.        , 0.        ],
       [3.85618738, 2.        , 1.        , 0.        ]])

## Building Semantic Search

Semantic search will take place with an OpenAI client and a embedding model. This will use a pinecone database for textual similarity.

![Semantic Search](./imgs/semantic-search.png)

In [131]:
# getting and saving course reviews
course_reviews = df.groupby(['title', 'description']).agg({'instructor_rating': 'mean'}).reset_index()

def make_prompt(row):
    """
    Makes a prompt for the user to ask the course
    """

    return f"""course name {row['title']} ({row['description']}) has an average rating of {row['instructor_rating']}"""

course_reviews['prompt'] = course_reviews.apply(make_prompt, axis=1)
course_reviews['prompt']

0       course name "Spiritual But Not Religious": Spi...
1       course name 1492 and the Aftermath (Examines S...
2       course name 17th Century Philosophy (Studies t...
3       course name 18th Century Philosophy (Studies t...
4       course name A Buddhist Approach to Development...
                              ...                        
2386    course name Writing and Critical Inquiry: Comm...
2387    course name Writing with Sound (This course tr...
2388    course name Writing with Style (Develops an un...
2389    course name Young Adult Literature (Using Sims...
2390    course name Zen (Studies the development and h...
Name: prompt, Length: 2391, dtype: object

In [132]:
# getting the embeddings from OpenAI
# from: https://platform.openai.com/docs/guides/embeddings/use-cases

client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
   text = text.replace("\n", " ")
   return client.embeddings.create(input = [text], model=model).data[0].embedding

# course_reviews['ada_embedding'] = course_reviews['prompt'].progress_apply(lambda x: get_embedding(x, model='text-embedding-3-small'))
# course_reviews.to_csv('data_1k_embeddings.csv', index=False)

In [133]:
from pinecone import Pinecone, ServerlessSpec
# import os

# # initialize connection to pinecone (get API key at app.pc.io)
api_key = os.environ.get('PINECONE_API_KEY') or '6b801a89-8fad-44bd-a8f5-1c2be8a9d208'

# # configure client
pc = Pinecone(api_key=api_key)
spec = ServerlessSpec(cloud='aws', region='us-west-2')

# try:
#     pc.create_index(
#         name="course-reviews-1k", 
#         dimension=1536, 
#         metric="euclidean",
#         spec=spec
#     )
# except:
#     pass # index already created

In [134]:
index = pc.Index("course-reviews-1k")

# def force_ascii(string: str) -> str:
#     """
#     Forces a string to be ascii
#     """
#     return string.encode('ascii', errors='ignore').decode('ascii')

# # making the things to upsert
# upsertion_reviews = [
#     {"id": force_ascii(item["title"]), "values": item["ada_embedding"]}
#     for item in course_reviews.to_dict(orient="records")
# ]

# # upload loop
# for i in range(0, len(upsertion_reviews), 100):
#     print(f"Uploading {i} to {i + 100}")
#     index.upsert(vectors=upsertion_reviews[i:i + 100])

In [135]:
@callback(
    # Output('search_results', 'figure'),
    Output("search_results", "data"), # , Output('search_results', 'columns')],
    Input('search-input', 'value'),
)
def search_for_course(search_term):
    """
    Performs a semantic search to get the course data
    """

    # gets the embeddings
    search_term = str(search_term)
    embedding = get_embedding(search_term)

    # searches for the embeddings
    results = index.query(vector=embedding, top_k=5)

    # gets the results
    matches = results['matches']
    ids = [match['id'] for match in matches]

    # gets the data
    data = course_reviews[course_reviews['title'].isin(ids)]

    # makes a table out of the data
    print("Review Table", data)

    # return dash_table.DataTable(
    #     columns=[{"name": i, "id": i} for i in data.columns],
    #     data=data.to_dict(orient="records"),
    # )

    return data.values

search_for_course("A foundational computer science course")

Review Table                                                   title  \
418                 Computer Systems and Organization 1   
479                    Data Structures and Algorithms 1   
1068  Introduction to Computing: Explorations in Lan...   
1157                        Introduction to Programming   
1158                        Introduction to Programming   
1159                        Introduction to Programming   
1160                        Introduction to Programming   
1981                    Software Development Essentials   

                                            description  instructor_rating  \
418   This course covers topics on the computer arch...           3.450000   
479   A second course in computing with an emphasis ...           3.480000   
1068  This course is an introduction to the most imp...           4.000000   
1157  A first course in programming, software develo...           4.106742   
1158  A first course in programming, software develo...          

array([['Computer Systems and Organization 1',
        'This course covers topics on the computer architecture abstraction hierarchy ranging from a step above silicon to a step below modern programming languages. Students in this course will learn to write low-level code in C and Assembly, how data is stored in memory, the basics of hardware design from gates and registers through general-purpose computers, and legal, ethical, and security issues related to these topics. CS 1100 - CS 1199 and either familiarity with Java, C++, or another C-like language, or concurrent enrollment in CS 2100',
        3.45,
        'course name Computer Systems and Organization 1 (This course covers topics on the computer architecture abstraction hierarchy ranging from a step above silicon to a step below modern programming languages. Students in this course will learn to write low-level code in C and Assembly, how data is stored in memory, the basics of hardware design from gates and registers through g

## Final Dashboard

Where all of the components come together to make the final dashboard. This launches the layout and the final project.

In [136]:
# defining callbacks
@callback(
    Output(component_id="course_axis", component_property="figure"),
    (Input(component_id="course_dropdown", component_property="value"))
)
def update_course_axis(course: str):
    """
    Updates the course axis
    """
    if course is str: # if the course is a string
        course = [course]
    
    summary = pd.concat([get_course_summary_ratings(*course_components(course)) for course in course])
    
    # rows = pd.concat([cs_3100, cs_2130, cs_3130, ds_4003])

    # course_data = get_course_summary_ratings(*course_components(course))
    return course_axis(summary, course=course)

@callback(
    Output(component_id="course_hours", component_property="figure"),
    (Input(component_id="course_dropdown", component_property="value"))
)
def update_course_hours(course: str):
    """
    Updates the course hours
    """
    if course is str:
        course = [course]
    
    summary = pd.concat([get_course_summary_ratings(*course_components(course)) for course in course])
    return course_axis(summary, course=course, include="duration")


### Training a linear model to predict the instructor rating based on (GPA, Course Level, Difficulty, Hours Worked Per Week, Group Work Per Week)


In [137]:
# Making an app to display everything
external_stylesheets = ["https://codepen.io/chriddyp/pen/bWLwgP.css"]

app = Dash(__name__, external_stylesheets=external_stylesheets)

### HEADER ###
header = html.Div(
    [
        html.H1("Course Review Explorer"),
    ]
)

### BREAKING DOWN RATINGS ###
course_filter_header = html.Div(
    [
        html.H1("Breaking down ratings"),
        html.P("Examine course reviews for a particular class"),
    ]
)

course_filter_row = html.Div(
    [
        dcc.Graph(
            figure=course_axis(rows, course="CS 3100"),
            id="course_axis",
            className="one-half column",
        ),
        dcc.Graph(
            figure=course_axis(rows, course="CS 2130"),
            id="course_hours",
            className="one-half column",
        ),
        # dcc.Graph(figure=table, id="course_table", className="one-half column"),
        html.Div([
            dash_table.DataTable(
                id="course_table",
                columns=[{"name": i, "id": i} for i in new_columns],
                data=course_data.to_dict(orient="records"),
            )
        ]
        className='one-half columns'
        ),
    ],
    className="row",
)

course_filter_section = html.Div(
    [
        course_filter_header,
        course_mneumoic_dropdown,
        course_filter_row,
    ]
)

### CORRELATING INSTRUCTOR REVIEWS ###
controlling_factors = html.Div(
    [
        html.H1("Correlating Instructor Reviews"),
        html.Div("Control for the following factors"),
        checklist,
    ],
    className="one-third column",
)

distribution_of_reviews = html.Div(
    [
        html.H1("Distribution of Residuals"),
        html.P("What's left over?"),
        dcc.Graph(id="review_residuals"),
    ],
    className="one-third column",
)

predictive_power_of_features = html.Div(
    [
        html.H1("Predictive Power of Features"),
        html.P("How well do these features predict the instructor rating?"),
        dcc.Graph(id="correlation_plot"),
    ],
    className="one-third column",
)

correlating_instructor_reviews = html.Div(
    [controlling_factors, distribution_of_reviews, predictive_power_of_features],
    className="row",
)

### SEMANTIC SEARCH ###
search_box = html.Div(
    [
        html.H1("Search for courses"),
        dcc.Input(
            id="search-input", type="text", # debounce=True,
            placeholder="I want a class on computer networking...",
        ),
    ],
    className="one-third column",
)

search_table = html.Div(
    [
        dash_table.DataTable(id="search_results", data=[]),
    ],
    className="two-thirds column",
)

semantic_search = html.Div([search_box, search_table], className="row")

### FINAL LAYOUT ###
app.layout = html.Div(
    [
        header,
        course_filter_section,
        correlating_instructor_reviews,
        semantic_search,
    ],
    className="container",
)

server = app.server

if __name__ == "__main__":
    app.run_server(jupyter_mode="tab")



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



TypeError: The `dash_table.DataTable` component (version 2.15.0) with the ID "course_table" received an unexpected keyword argument: `className`
Allowed arguments: active_cell, cell_selectable, column_selectable, columns, css, data, data_previous, data_timestamp, derived_filter_query_structure, derived_viewport_data, derived_viewport_indices, derived_viewport_row_ids, derived_viewport_selected_columns, derived_viewport_selected_row_ids, derived_viewport_selected_rows, derived_virtual_data, derived_virtual_indices, derived_virtual_row_ids, derived_virtual_selected_row_ids, derived_virtual_selected_rows, dropdown, dropdown_conditional, dropdown_data, editable, end_cell, export_columns, export_format, export_headers, fill_width, filter_action, filter_options, filter_query, fixed_columns, fixed_rows, hidden_columns, id, include_headers_on_copy_paste, is_focused, loading_state, locale_format, markdown_options, merge_duplicate_headers, page_action, page_count, page_current, page_size, persisted_props, persistence, persistence_type, row_deletable, row_selectable, selected_cells, selected_columns, selected_row_ids, selected_rows, sort_action, sort_as_null, sort_by, sort_mode, start_cell, style_as_list_view, style_cell, style_cell_conditional, style_data, style_data_conditional, style_filter, style_filter_conditional, style_header, style_header_conditional, style_table, tooltip, tooltip_conditional, tooltip_data, tooltip_delay, tooltip_duration, tooltip_header, virtualization