index.qmd

---
title: "AP Scholar Awards Per Capita"
---

AP Scholar Awards are academic distinctions given to students who have received high scores on multiple Advanced Placement (AP) exams. The number of AP Scholar Awards was sourced from the AP Data Archive and the state populations was sourced from the American Community Survey 1-Year Data for 2019. There are many types of AP Scholar Awards and the quantity of each award is presented in the AP Scholar Awards data, but only the total number of awards was utilized in the map below. Rural states with lower populations tend to have a lower number of AP Scholar Awards per capita.

Note: You can hover the cursor over the states on the plot to display extra information.

```{r}
#| echo: false
#| message: false
#| warning: false

library(readxl)
library(plotly)
library(dplyr)
library(tidycensus)
library(scales)

# Using data from 2019 not 2020 (current latest release as of February 15, 2023 because
# ACS 1-year data is not available for 2020 due to their data collection method
# being very different due to the pandemic

# Getting state population from the year 2019 and selecting needed columns
pop_by_state <- get_acs(geography = "state",
                        variables = "B01001_001",
                        year = 2019,
                        survey = "acs1") |>
  select(NAME, estimate)

# Changing column names
colnames(pop_by_state) <- c("State", "Population")

# Getting rid of Puerto Rico (sorry Puerto Rico)
pop_by_state <- pop_by_state[1:51,]

# Changing 'State' rows to their abbreviation
# Reason: So I can use a join function with the AP Scholar data
pop_by_state$State = c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "DC", "FL",
                       "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME",
                       "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH",
                       "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI",
                       "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI",
                       "WY")

ap_scholars <- read_xlsx(path = "data/ap-scholar-data-2019.xlsx")

# Renaming columns (some are broken)
colnames(ap_scholars) <- c("State", "Grade Level", "Total Scholar Awards", "Scholar", "Honor", "Distinction", "National", "StateAP", "DODEA",
                           "InterDiploma", "CanadianNational", "Bermuda", "InternationalAP", "CapstoneDip", "CapstoneCert")

# Arranged by state so all the missing values are at the bottom so it is easier to select the rows needed
ap_scholars <- ap_scholars |>
  arrange(State)


# selected rows needed, kept rows with 'State' being NA because those are the totals for each state
ap_scholars <- ap_scholars[311:361,]

# Added awards with fewer number that are not as significant (not listed on College Board website under AP Scholar Awards)
# Also so a reasonable amount of information displays when hovering over a state on the map plot
# Then selected necessary columns
ap_scholars <- ap_scholars |>
  mutate(Other = StateAP + DODEA + InterDiploma + CanadianNational + Bermuda + InternationalAP + CapstoneDip + CapstoneCert) |>
    select(State, `Total Scholar Awards`, Scholar, Honor, Distinction, National, Other)

# Renamed all 'State' rows to correct state since 'State' rows were NA
# Reason: I took total from each state for each column and state name was not provided in that row
ap_scholars$State <- c("AK", "AL", "AR", "AZ", "CA", "CO", "CT",
                       "DC", "DE", "FL", "GA", "HI", "IA", "ID",
                       "IL", "IN", "KS", "KY", "LA", "MA", "MD",
                       "ME", "MI", "MN", "MO", "MS", "MT", "NC",
                       "ND", "NE", "NH", "NJ", "NM", "NV", "NY",
                       "OH", "OK", "OR", "PA", "RI", "SC", "SD",
                       "TN", "TX", "UT", "VA", "VT", "WA", "WI",
                       "WV", "WY")

# Adding population column so I can potentially find something interesting o_O
ap_scholars <- left_join(ap_scholars, pop_by_state)


# Finding the number of 100,000s in each state so I can find the number of awards per 100,000 people in each state
# This is what what I want to plot, seems interesting
ap_scholars <- ap_scholars |>
  mutate(`Number of 100,000s` = Population / 100000) |>
    mutate(`Awards per 100,000 People` = `Total Scholar Awards` / `Number of 100,000s`)

# Rounding the numbers to integers because they had a lot of decimal places and
# the numbers themselves are high enough that the decimal won't change much
ap_scholars$`Awards per 100,000 People` <- round(ap_scholars$`Awards per 100,000 People`, 0)

# Creation of plot, changed color scale, added hovertext to display extra information while hovering over a state
# Finished plot by adding title and caption
plot_ly(type = "choropleth",
        locations = ap_scholars$State,
        locationmode = "USA-states",
        z = ap_scholars$`Awards per 100,000 People`,
        colors = 'YlOrBr',
        hoverinfo = "text",
        hovertext = paste("AP Scholar Awards Per Capita:", label_comma()(ap_scholars$`Awards per 100,000 People`),
                          "\nTotal Scholar Awards:", label_comma()(ap_scholars$`Total Scholar Awards`),
                          "\nPopulation:", label_comma()(ap_scholars$Population))) |>
  layout(geo = list(scope = "usa"),
         title = "Number of AP Scholar Awards per 100,000 People by State (2019)",
         annotations = list(yref = 'paper', xref = "paper", y = 1.02, x = 1.15,
                            text = "Awards Per Capita", showarrow = F)) |>
    add_annotations(yref = 'paper', xref = "paper", y = -0.07, x = 1.44,
                    text = "College Board: AP Scholar Counts by State 2019
                    Census Bureau: American Community Survey 1-Year Data (2019)                                           ",
                    showarrow = F)
```