### Assignment #5: Callbacks

DS4003 | Spring 2024

Objective: Practice building basic UI components in Dash.

Task: Build an app that contains the following components using the gapminder dataset: `gdp_pcap.csv`. [Info](https://www.gapminder.org/gdp-per-capita/)

UI Components:

- A dropdown menu that allows the user to select `country`
- The dropdown should allow the user to select multiple countries
- The options should populate from the dataset (not be hard-coded)
- A slider that allows the user to select `year`
- The slider should allow the user to select a range of years
- The range should be from the minimum year in the dataset to the maximum year in the dataset
- A graph that displays the `gdpPercap` for the selected countries over the selected years
- The graph should display the gdpPercap for each country as a line
- Each country should have a unique color
- Graph DOES NOT need to interact with dropdown or slider
- The graph should have a title and axis labels in reader friendly format

Layout:

- Use a stylesheet
- There should be a title at the top of the page
- There should be a description of the data and app below the title (3-5 sentences)
- The dropdown and slider should be side by side above the graph and take up the full width of the page
- The graph should be below the dropdown and slider and take up the full width of the page

Submission:

- There should be only one app in your submitted work
- Comment your code
- Submit the HTML file of the notebook, saved as `DS4003_A4_LastName.html`

**For help you may use the web resources and pandas documentation. No co-pilot or ChatGPT.**


In [1]:
import pandas as pd
from dash import dcc, html, Input, Output, callback
import dash
import plotly.express as px
from datetime import datetime as dt

# Reading in the data

Reads the `gdp_pcap.csv` file into a pandas dataframe.

After this, a minimal amount of exploratory data analysis is performed to understand the data's structure.


In [2]:
df = pd.read_csv("gdp_pcap.csv")  # csv to dataframe
df.head()

Unnamed: 0,country,1800,1801,1802,1803,1804,1805,1806,1807,1808,...,2091,2092,2093,2094,2095,2096,2097,2098,2099,2100
0,Afghanistan,599,599,599,599,599,599,599,599,599,...,4800,4910,5030,5150,5270,5390,5520,5650,5780,5920
1,Angola,465,466,469,471,472,475,477,479,481,...,24.8k,25.3k,25.9k,26.4k,26.9k,27.4k,28k,28.5k,29.1k,29.6k
2,Albania,585,587,588,590,592,593,595,597,598,...,54k,54.6k,55.2k,55.8k,56.4k,56.9k,57.5k,58.1k,58.7k,59.2k
3,Andorra,1710,1710,1710,1720,1720,1720,1730,1730,1730,...,79.3k,79.5k,79.8k,80.1k,80.4k,80.7k,81k,81.2k,81.5k,81.8k
4,UAE,1420,1430,1430,1440,1450,1450,1460,1460,1470,...,92.5k,92.6k,92.6k,92.7k,92.8k,92.9k,92.9k,93k,93.1k,93.1k


In [3]:
# info about # of columns and # of rows
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 195 entries, 0 to 194
Columns: 302 entries, country to 2100
dtypes: int64(86), object(216)
memory usage: 460.2+ KB


In [4]:
# summary statistics (numeric year columns only): GDP per capita generally rises over time
df.describe()

Unnamed: 0,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,...,1876,1877,1878,1879,1880,1882,1884,1886,1893,1894
count,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,...,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0,195.0
mean,1285.45641,1284.789744,1288.630769,1287.758974,1290.117949,1289.687179,1291.328205,1291.584615,1279.794872,1281.328205,...,1908.215385,1921.712821,1936.994872,1938.871795,1973.758974,2010.410256,2051.323077,2078.851282,2206.035897,2247.969231
std,715.939713,712.727672,728.044739,720.093771,731.295371,722.923838,725.368563,717.463104,663.836715,670.97705,...,1459.100187,1483.208468,1520.256481,1481.343549,1550.196731,1582.676946,1639.876624,1666.155329,1742.157592,1770.715933
min,442.0,442.0,442.0,442.0,442.0,442.0,442.0,442.0,442.0,442.0,...,501.0,502.0,502.0,503.0,503.0,508.0,516.0,524.0,552.0,552.0
25%,791.5,792.5,793.0,793.5,794.0,794.5,795.5,796.0,797.5,797.5,...,977.0,979.0,981.5,985.0,987.0,996.0,1005.0,1010.0,1055.0,1065.0
50%,1150.0,1150.0,1150.0,1160.0,1160.0,1160.0,1160.0,1160.0,1160.0,1160.0,...,1440.0,1440.0,1450.0,1450.0,1470.0,1480.0,1500.0,1520.0,1590.0,1640.0
75%,1525.0,1520.0,1520.0,1515.0,1515.0,1510.0,1510.0,1515.0,1505.0,1505.0,...,2270.0,2295.0,2295.0,2330.0,2360.0,2420.0,2475.0,2540.0,2690.0,2760.0
max,5970.0,5860.0,6190.0,6050.0,6340.0,5970.0,6010.0,5510.0,4640.0,4720.0,...,9110.0,9180.0,9720.0,9560.0,9740.0,9360.0,9790.0,9840.0,9750.0,9770.0


# Data Processing

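The goal of this section is to build a long-format dataframe with three columns: `country`, `year`, and `gdp_per_cap`. This format is far easier to use with Plotly Express because a single `color="country"` argument draws one line per country, so I do not have to add each line by hand or do any awkward filtering.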

Getting the data into this format requires a few pandas tricks. I learned about the `melt` function, which un-pivots the wide table (one column per year) into one row per country and year.
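
To make the un-pivot step concrete, here is a plain-Python sketch of the reshaping that `melt` performs. It illustrates the idea only and is not how pandas implements it; the `unpivot` helper and the two sample rows are made up for this example and do not come from `gdp_pcap.csv`.

```python
# Illustrative sketch only: a plain-Python wide-to-long reshape.
# The sample values are invented, not taken from gdp_pcap.csv.
wide_rows = [
    {"country": "A", "1800": 10, "1801": 11},
    {"country": "B", "1800": 20, "1801": 21},
]


def unpivot(rows, id_var, var_name, value_name):
    """Turn one row per country into one row per (country, year) pair."""
    value_columns = [col for col in rows[0] if col != id_var]
    long_rows = []
    # Walk column by column: every country for the first year, then the next year
    for col in value_columns:
        for row in rows:
            long_rows.append(
                {id_var: row[id_var], var_name: col, value_name: row[col]}
            )
    return long_rows


long_rows = unpivot(wide_rows, "country", "year", "gdp_per_cap")
for row in long_rows:
    print(row)
```

Each wide row with two year columns becomes two long rows, which is why the melted dataframe is so much taller than the original.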

The `gdp_per_cap` values also needed a tricky replace-then-eval step to become floats: values such as `24.8k` are rewritten to `24.8*1e3` and then evaluated.


In [5]:
m_df = pd.melt(df, id_vars=["country"], var_name="year", value_name="gdp_per_cap")
m_df["year"] = m_df["year"].astype(int)  # year from str to int
m_df["gdp_per_cap"] = (
    m_df["gdp_per_cap"].replace({"k": "*1e3"}, regex=True).map(pd.eval).astype(float)
)  # EVIL EVAL HACKS
m_df

Unnamed: 0,country,year,gdp_per_cap
0,Afghanistan,1800,599.0
1,Angola,1800,465.0
2,Albania,1800,585.0
3,Andorra,1800,1710.0
4,UAE,1800,1420.0
...,...,...,...
58690,Samoa,2100,29200.0
58691,Yemen,2100,8000.0
58692,South Africa,2100,50200.0
58693,Zambia,2100,19600.0
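
The replace-then-eval conversion works, but evaluating strings as expressions is more than this task needs. A possible alternative is sketched below. It assumes the only suffix in the data is a trailing `k` meaning thousands, which is what the preview shows but has not been checked against the whole file. `parse_gdp` is a hypothetical helper name.

```python
def parse_gdp(value):
    """Convert values such as 599, "599", or "24.8k" to a float.

    Assumption: a trailing "k" means thousands and no other suffix occurs.
    """
    text = str(value).strip()
    if text.endswith("k"):
        return float(text[:-1]) * 1_000
    return float(text)


# Small sanity check on values like those shown in the preview
for raw in (599, "599", "24.8k", "28k"):
    print(raw, "->", parse_gdp(raw))
```

If this fits the data, it could replace the eval chain with something like `m_df["gdp_per_cap"].map(parse_gdp)`.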


# Making UI Components

We proceed by constructing three UI components for the dashboard:

- Header (title and description)
- Dropdown + Range Slider
- Graph (updated later through a callback)


In [6]:
# Makes an HTML header (H1) and a description paragraph
header = [
    html.H1("GDP Per Capita By Country and Year"),
    html.P(
        f"""This dataset contains the GDP per capita of countries from {m_df['year'].min()} to {m_df['year'].max()}. GDP per capita is a measure of economic output per citizen. Roughly speaking, GDP per capita could be thought of as a measure of the goods produced by a society. A total of {m_df['country'].nunique()} countries are included in this dataset. Data was sourced from Gapminder, which aggregated data and projections from the Maddison Project, the World Bank, and the International Monetary Fund."""
    ),
]

In [7]:
CURRENT_YEAR = dt.now().year  # the year of the current date

# makes a dropdown for the countries
country_dropdown = dcc.Dropdown(
    options=df["country"].unique(),  # unique countries
    multi=True,  # more than one selection
    id="country-dropdown",
    className="one-half column",  # required to be in a row
)

# makes a slider for the years
year_slider = dcc.RangeSlider(
    min=m_df.year.min(),
    max=m_df.year.max(),
    marks={
        1800: "1800",
        1900: "1900",
        2000: "2000",
        CURRENT_YEAR: str(CURRENT_YEAR),  # current year for context
        2100: "2100",
    },
    id="year-slider",
    className="one-half column",  # required to be in a row
)

# make the div a row
selectors = html.Div(
    [country_dropdown, year_slider], className="row"
)  # note: we learned how to do this in class, but I am just using css
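
The slider marks above are written out by hand. Since the assignment asks for the range to come from the dataset, one option is to derive the marks from the minimum and maximum year as well. This is only a sketch: `make_year_marks`, `step`, and `extra_years` are hypothetical names, and the years passed in at the bottom are example values.

```python
def make_year_marks(min_year, max_year, step=100, extra_years=()):
    """Build a marks dict of the form {year: label} for a year slider."""
    # Round min_year up to the next multiple of step
    first = -(-min_year // step) * step
    years = set(range(first, max_year + 1, step))
    # Always label both ends of the range
    years.update({min_year, max_year})
    # Add any extra years (for example the current year) that fall in range
    years.update(y for y in extra_years if min_year <= y <= max_year)
    # Cast keys to plain Python ints and labels to strings
    return {int(year): str(year) for year in sorted(years)}


marks = make_year_marks(1800, 2100, extra_years=(2024,))
print(marks)
```

The result could be passed as `marks=` to `dcc.RangeSlider`, with `m_df.year.min()` and `m_df.year.max()` as the first two arguments.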

# Plotting the Data

This is a test plot to see the data before creating the dashboard and the callback.


In [8]:
# creates a multi-line plot using the melted dataframe
fig = px.line(
    m_df,
    x="year",
    y="gdp_per_cap",
    color="country",  # draws one line, in its own color, per country
    title="Gross Domestic Product Per Capita By Country and Year",
)
# fig.show()

# Integrating UI Components

This section combines the three UI components into a Dash dashboard.


In [9]:
# Combining all the ui components
layout = [*header, selectors, dcc.Graph(id="gdp-per-cap-line-graph")]

In [11]:
# callback which creates and updates the graph using the two inputs: year range and countries
# Note: this was not required, but I wanted to get some practice writing callbacks
@callback(
    Output(component_id="gdp-per-cap-line-graph", component_property="figure"),
    (
        Input(component_id="year-slider", component_property="value"),
        Input(component_id="country-dropdown", component_property="value"),
    ),
)
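def update_graph(year_range: list[int], countries: list[str]):
    """Return a line figure filtered to the selected years and countries."""
    if not countries:  # covers both None and an empty selection
        countries = m_df["country"].unique()  # use all countries if none are selected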

    if year_range is None:
        year_range = [
            m_df["year"].min(),
            m_df["year"].max(),
        ]  # use all years if none are selected (should not happen)

    countries_mask = m_df["country"].isin(countries)  # mask for the countries
    year_mask = m_df["year"].between(year_range[0], year_range[1])  # mask for the year

    filtered_df = m_df[countries_mask & year_mask]  # combine the masks and filter

    fig = px.line(  # create the plot using the filtered dataframe. Use labels to rename the columns to make them more readable
        filtered_df,
        x="year",
        y="gdp_per_cap",
        color="country",
        labels={
            "year": "Year",
            "gdp_per_cap": "Gross Domestic Product (GDP) per Capita",
            "country": "Country",
        },
        title="Gross Domestic Product Per Capita By Country and Year",
    )
    return fig
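
The callback has to cope with controls that have no usable value yet. The sketch below isolates that fallback logic so it can be tested without running the app. It assumes a multi-select dropdown may report either `None` or an empty list when nothing is selected. `resolve_selection` and its parameters are hypothetical names and are not part of Dash.

```python
def resolve_selection(countries, year_range, all_countries, min_year, max_year):
    """Fill in defaults when a control has no usable value."""
    # `not countries` is true for both None and an empty list
    if not countries:
        countries = list(all_countries)
    # Fall back to the full span of years when no range is set
    if not year_range:
        year_range = [min_year, max_year]
    return list(countries), list(year_range)


# Nothing selected at all: fall back to everything
print(resolve_selection(None, None, ["A", "B"], 1800, 2100))
# Dropdown cleared (empty list) but a year range chosen
print(resolve_selection([], [1900, 1950], ["A", "B"], 1800, 2100))
```

Keeping this logic in a plain function means the defaults can be checked with ordinary assertions, while the callback itself only does the filtering and plotting.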

In [12]:
# stylesheet from the creator of dash which does nice things like adding class-based styling for containers
external_stylesheets = ["https://codepen.io/chriddyp/pen/bWLwgP.css"]

app = dash.Dash(
    __name__, external_stylesheets=external_stylesheets
)  # creates a dash app with the name __main__

# final app layout
app.layout = html.Div(
    layout, className="container"  # required to get the css row to work
)

if __name__ == "__main__":
    app.run(debug=True)  # starts the Dash development server