-
Notifications
You must be signed in to change notification settings - Fork 0
/
DataVisualizationGallery.Rmd
579 lines (394 loc) · 27.2 KB
/
DataVisualizationGallery.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
---
title: "Static Data Visualization Gallery"
date: "March 2023"
output:
html_document:
theme: flatly
editor_options:
chunk_output_type: console
---
> This is a sample collection of non-interactive data visualizations. See the code used to develop these visualizations on [GitHub](https://github.com/toylyol/misc-data-viz/blob/main/Data%20Viz%20Gallery/DataVisualizationGallery.Rmd). R Markdown was used to create this gallery; the next iteration will, hopefully, be built using Quarto.
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval=TRUE, include=FALSE, echo=FALSE, comment=NA, cache=FALSE)
```
```{r list-development-packages, warnings = FALSE, messages = FALSE, eval = FALSE}
# The following functions are courtesy of Stats and R: https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/
# List names of required packages
packages <- c("tidyverse", "openxlsx", "ggtext", "sf", "ggrepel", "tigris",
"rmapshaper", "jsonlite", "sysfonts", "showtext", "here")
# Install required packages if not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
install.packages(packages[!installed_packages])
}
# Load required packages
lapply(packages, library, character.only = TRUE) |> invisible()
```
```{r load-packages, warnings = FALSE, messages = FALSE}
# Load only package actually needed to run this document
library(here)
path <- here()
```
```{r set-up-Google-font, eval = FALSE}
# The Google font used in the flatly theme was identified in this gist: https://gist.github.com/reywood/11069512
# Add Google fonts
sysfonts::font_add_google(name = "Lato",
family = "Lato")
showtext::showtext_auto() # load the font; must be done every session
```
# {.tabset}
## `r fontawesome::fa("chart-line", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
```{r get-slopegraph-data, eval = FALSE}
# Read in cleaned data ----
data <- read.xlsx( paste0(path, "/data/clean_aca_data.xlsx") )
# Restructure data for slopegraph ----
## identify states with the top 10 largest uninsured_rate_change ----
top_states <- data %>%
select(state, uninsured_rate_change) %>%
filter(state != "United States") %>% # remove aggregate value for entire country so not in top 10
slice_min(uninsured_rate_change, n = 10) %>% # replace the superseded top_n() function
pull(state) # isolate data in one column to save as a character vector to filter later
top_states[[11]] <- "United States" # add aggregate US to the top_states list to use for filtering
## add columns to use in formatting ggplot ----
data <- data %>%
mutate( pct_change = as.character(round(uninsured_rate_change*100, 0)) ) %>%
mutate( line_style = ifelse(state == "United States", "dotted", "solid") ) %>% # add col so USA will have special line type
mutate( abbrev = case_when(state == "Nevada" ~ "NV",
state == "Oregon" ~ "OR",
state == "California" ~ "CA",
state == "Kentucky" ~ "KY",
state == "New Mexico" ~ "NM",
state == "West Virginia" ~ "WV",
state == "Arkansas" ~ "AR",
state == "Florida" ~ "FL",
state == "Colorado" ~ "CO",
state == "Washington" ~ "WA",
state == "United States" ~ "USA") ) %>%
mutate( largest = case_when(abbrev == "CA" ~ "Yes", # add col to use for custom color scale
abbrev == "OR" ~ "Yes",
abbrev == "NV" ~ "Yes",
TRUE ~ "No") )
## subset dataframe ----
df_slopegraph <- data %>%
filter(state %in% top_states) %>%
select(state, uninsured_rate_2010, uninsured_rate_2015, pct_change, line_style, abbrev, largest) %>%
pivot_longer(!c(state, line_style, abbrev, pct_change, largest), # delineate cols to keep
names_to = "date",
values_to = "rate")
df_slopegraph$date <- df_slopegraph$date %>%
str_replace_all("uninsured_rate_2010", "2010") %>% # change name
str_replace_all("uninsured_rate_2015", "2015")
```
```{r generate-slopegraph, eval = FALSE}
# Create slopegraph
slopegraph <- ggplot(data = df_slopegraph,
aes(x = date,
y = rate,
group = state,
color = largest, # specify col to use to color points and lines
linetype = line_style, # specify col to change line style for USA
label = paste0(abbrev, " ", rate*100, "%"))) +
geom_line(alpha = 1, size = 1.25) +
geom_point(alpha = 1, size = 4) +
geom_text_repel( data = df_slopegraph %>% filter(date == "2010"), # label pts directly; format label for 2010
family = "Lato",
hjust = "left",
fontface = "bold",
size = 3.5,
nudge_x = -.3,
direction = "y") +
geom_text_repel( data = df_slopegraph %>% filter(date == "2015"),
family = "Lato",
hjust = "right",
fontface = "bold",
size = 3.5,
nudge_x = .3,
direction = "y") +
scale_color_manual(values = c("Yes" = "#2D708EFF", # specify custom scale colors using 'largest' col
"No" = "#c9c9c9")) +
scale_linetype_manual(values = c("solid" = "solid", # specify line types
"dotted" = "dotdash")) +
scale_x_discrete(expand = c(0, 0)) + # remove space between plot and axis
xlab("\nYear") + # change axes titles, using newline to add space between axes
ylab("Percentage of Uninsured Persons\n") +
labs(caption = "Source: Kaggle") +
theme_minimal() +
theme(
text = element_text(family = "Lato",
color = "#2C3E50"), # set the font family for all {ggtext} elements
plot.title.position = "plot", # help ensure title aligned with plot
axis.text.y = element_blank(), # remove y axis labels
panel.grid.major.y = element_blank(), # remove horizontal gridlines
panel.grid.minor.y = element_blank(),
legend.position = "none" # remove legend
)
ggsave( here("images", "slopegraph.png"), height = 576, units = "px" )
```
**California, Oregon, and Nevada saw the <span style ='color: #0F4B63'>largest decreases</span> in uninsured persons after the Affordable Care Act.**
On average, the uninsured rate in the United States decreased six percent from 2010 to 2015. The top 10 states with the largest decreases in uninsured persons had decreases of more than eight percent.
![]( `r here("images/slopegraph.png")` "This is a slopegraph depicting the percentage of uninsured persons in the United States in 2010 and in 2015 (before and after the Affordable Care Act). The states with the top ten deceases in the percentage of uninsured persons are on the y-axis, and the two years (2010 and 2015) are on the x-axis. The average for the United States is also included. The slopegraph shows a decrease in the percentage of uninsured persons from 2010 to 2015.")
## `r fontawesome::fa("info-circle", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
- Thanks to Bruno Kenzo for [inspiration](https://github.com/KenzoBH/Visualizations) on how to display this collection of data visualizations.
- The alt text was generated using the formula provided by Liz Hare in her October 2022 [presentation](https://lizharedogs.github.io/RLadiesNYAltText "Click here to be directed to Liz's presentation slides.") for R-Ladies NYC.
- The data used in this slopegraph was downloaded from [Kaggle](https://www.kaggle.com/datasets/hhs/health-insurance).
- Chuck Powell's [slopegraph tutorial](https://ibecav.github.io/slopegraph/) was invaluable for determining how to structure the data and to use {ggrepel}.
- This [Stack Overflow forum](https://stackoverflow.com/questions/71573377/cannot-import-fonts-into-r) provided a quick way to load and use Google fonts without downloading them locally.
- A [Thinking on Data](https://www.thinkingondata.com/something-about-viridis-library/) blog post provided the HEX codes in the viridis palette so that a cohesive color palette could be created.
- Many resources were used to remember the options that can be used for {ggplot} customization:
- [Removing gridlines](https://r-graphics.org/recipe-appearance-hide-gridlines)
- [Aligning captions](https://stackoverflow.com/questions/64701500/left-align-ggplot-caption)
- [Changing color conditionally within ggrepel::geom_text_repel()](https://stackoverflow.com/questions/49622822/control-colour-of-geom-text-repel)
- [Specifying custom color scale](https://community.rstudio.com/t/setting-colours-in-ggplot-conditional-on-value/8328/2)
- [Changing line types conditionally](http://www.sthda.com/english/wiki/ggplot2-line-types-how-to-change-line-types-of-a-graph-in-r-software)
- [Aligning to the plot](https://ggplot2.tidyverse.org/articles/faq-axes.html#how-can-i-remove-the-space-between-the-plot-and-the-axis)
# {-}
<br>
# {.tabset}
## `r fontawesome::fa("map-location-dot", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
```{r get-broadband-data, eval = FALSE}
# Get data ----
## retrieve data from Census Bureau's ACCESS Dashboard
broadband <- openxlsx::read.xlsx( paste0(path,"/data/county_data_ACCESS_BROADBAND_Dashboard.xlsx") )
## retrieve sf object using {tigris}
ut_counties <- tigris::counties(state = "UT", year = 2021) %>%
rmapshaper::ms_simplify()
## change case
names(ut_counties) <- tolower(names(ut_counties))
# Merge and clean up data ----
ut_counties_sf <- ut_counties %>%
left_join(broadband, by = c("geoid" = "GEO_ID")) %>%
select(-c("lsad", "classfp", "mtfcc", "csafp", "cbsafp", "metdivfp", "funcstat"))
# Create new sf object for counties comprising "Silicon Slopes" ----
## Utah County geoid: "49049"
## Salt Lake County geoid: "49035"
## Summit County geoid: "49043"
## subset counties
utah_co_sf <- ut_counties_sf[ut_counties_sf$geoid == "49049", ]
salt_lake_co_sf <- ut_counties_sf[ut_counties_sf$geoid == "49035", ]
summit_co_sf <- ut_counties_sf[ut_counties_sf$geoid == "49043", ]
## union counties; use bind_rows() to retain county boundaries
silicon_slopes_sf <- utah_co_sf %>%
st_union(salt_lake_co_sf) %>%
st_union(summit_co_sf) %>%
select(name.x, geometry)
## change name
silicon_slopes_sf$name.x <- "Silicon Slopes"
```
```{r make-broadband-map, eval = FALSE}
# Plot UT broadband map ----
broadband_map <- ggplot() +
geom_sf(data = ut_counties_sf,
aes(fill = pct_telework_ACS17_21),
color = "white", # change county borders
size = 0.5) + # change stroke width
scale_fill_viridis_c(name = "Percent of Teleworking Workers",
direction = -1,
label = scales::label_number(suffix = "%")) +
geom_sf(data = silicon_slopes_sf,
color = "white",
size = 1.85,
alpha = 0) + # make layer transparent
geom_sf_text(data = silicon_slopes_sf,
aes(label = name.x),
color = "white",
size = 5,
face = "bold",
nudge_x = 1.7,
nudge_y = -0.2) +
guides(fill = guide_colorbar(title.position = 'top',
title.hjust = .5,
title.theme = element_text(size = 9),
barwidth = unit(12, 'lines'),
barheight = unit(.5, 'lines'))) +
labs(caption = "Source: U.S. Census Bureau") +
theme_void() +
theme(
text = element_text(family = "Lato"),
plot.caption.position = "plot", # move caption to be right-aligned with plot
legend.position = "top", # move legend above map
legend.text = element_text(size = 8), # size legend title text
legend.margin = margin(10, 6, 6, 4), # add cushion
plot.margin = margin(10, 0, 0, 0)
)
```
**In 2021, the counties of 'Silicon Slopes' had some the highest percentages of teleworkers in Utah.**
Unsurprisingly, the three counties comprising UT's technology hub (Salt Lake, Summit, and Utah) were among the top five counties with the highest percentages of teleworkers.
![]( `r here("images/broadband_map.png")` "This is a choropleth map of the counties within the state of Utah. The counties are filled based on the percentage of teleworkers in the county in the 2017-2021 five-year American Community Survey estimates. The yellow-green-blue-purple viridis color palette is used such that the higher the percentage of teleworkers, the darker the color of the county. The three counties comprising “Silicon Slopes” are emphasized with a thick, white border to highlight the fact that they have some of the highest percentages of teleworkers in the state.")
## `r fontawesome::fa("info-circle", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
- The telework data were obtained from the U.S. Census Bureau's [ACCESS BROADBAND Dashboard](https://www.census.gov/programs-surveys/community-resilience-estimates/partnerships/ntia/broadband-act.html). This dashboard is a result of work done by the Census Bureau and the National Telecommunications and Information Administration (NTIA) following the passage of the ACCESS BROADBAND Act of 2021. The original source of the teleworker data was the five-year American Community Survey from 2017-2021.
- In the ACCESS Dashboard data dictionary, the definition for the percentage of teleworkers is as follows: "The percentage of workers ages 16 years and older that reported their residential address as the geographic location at which they carried out their occupational activities." See the complete data dictionary online or in "/data/file_layout_ACCESS_BROADBAND_Dashboard.xlsx" within the R project on [GitHub](https://github.com/toylyol/misc-data-viz/tree/main/Data%20Viz%20Gallery/data).
- [*Geomcomputation with R*](https://bookdown.org/robinlovelace/geocompr/geometric-operations.html) was helpful for learning which functions could be used to subset an existing shapefile, and union the resulting shapefiles, to create a new shapefile.
# {-}
<br>
# {.tabset}
## `r fontawesome::fa("table-cells", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
```{r retrieve-heatmap-data, eval = FALSE}
# Load data ----
mcd_data <- read_csv( paste0(path, "/data/mcd_menu.csv") ) # use readr instead of base to avoid substitution of spaces for dots
# Keep only columns with % DV ----
keep_cols <- mcd_data %>%
select( matches("% Daily Value") ) %>%
names()
keep_cols <- c("Category", "Item", keep_cols) # add two cols to char vector
mcd_data <- mcd_data %>%
select( all_of(keep_cols) ) # use select helper function to avoid ambiguous-external-vector warning
# Convert to decimals ---
convertPct <- function(column){column/100}
mcd_data <- mcd_data %>%
mutate(across(`Total Fat (% Daily Value)`:`Iron (% Daily Value)`,
convertPct))
# Subset data ----
breaky_foods <- c("Egg McMuffin", "Egg White Delight", "Sausage McMuffin", "Steak & Egg McMuffin",
"Bacon, Egg & Cheese Biscuit (Regular Biscuit)", "Sausage Biscuit (Regular Biscuit)",
"Southern Style Chicken Biscuit (Regular Biscuit)", "Steak & Egg Biscuit (Regular Biscuit)",
"Bacon, Egg & Cheese McGriddles", "Sausage McGriddles", "Bacon, Egg & Cheese Bagel",
"Steak, Egg & Cheese Bagel", "Big Breakfast (Regular Biscuit)",
"Big Breakfast with Hotcakes (Regular Biscuit)", "Hotcakes",
"Cinnamon Melts", "Sausage Burrito","Fruit & Maple Oatmeal")
breaky <- mcd_data %>%
filter(Item %in% breaky_foods)
# Shape data into long format ----
breaky <- breaky %>%
pivot_longer(cols = !c(Category, Item),
names_to = "nutrition",
values_to = "pct_dv")
breaky$Item <- str_replace_all(breaky$Item, "Regular Biscuit", "Regular")
breaky$nutrition <- str_replace_all(breaky$nutrition, " \\(% Daily Value\\)", "")
breaky <- breaky %>%
mutate(face_format = case_when(Item == "Big Breakfast (Regular)" ~ "bold",
Item == "Big Breakfast with Hotcakes (Regular)" ~ "bold",
TRUE ~ "plain")
)
```
```{r create-heatmap, eval = FALSE}
# Generate heatmap ----
heatmap <- ggplot(data = breaky,
aes(x = nutrition,
y = Item,
fill = pct_dv,
height = 1, # delineate size of tiles
width = 1)) +
geom_tile(color = "white", # add white border
size = 0.01) +
geom_text(aes(label = paste0(pct_dv*100, "%")), # format pct_dv as percent
color = "white",
size = 3) +
scale_fill_viridis_c(name = "Nutrition\n(% Daily Value)",
direction = -1,
labels = scales::percent_format()) +
scale_y_discrete(expand = c(0,0)) + # remove space between plot and axis
scale_x_discrete(expand = c(0, 0)) +
labs(x = "",
y = "", # remove titles from x and y axes
caption = "Source: Kaggle") +
theme_minimal() +
theme(
text = element_text(family = "Lato",
color = "#2C3E50",
size = 12),
axis.text.x = element_text(angle = -45, # angle the x-axis text
vjust = -0.01,
hjust = -0.01),
plot.caption.position = "plot",
legend.text = element_text(size = 9), # size legend title text
plot.margin = margin(10, 0, 0, 0)) +
coord_fixed() # keep tiles square
ggsave( here("images", "heatmap.png"), width = 889, height = 764, units = "px" )
```
**McDonald's 'Big Breakfast' is a high-cholesterol start to the day.**
Of the breakfast items listed, the Big Breakfast (with or without hotcakes) has the worst nutritional value.
![]( `r here("images/heatmap.png")` "This is a heatmap. On the y-axis, McDonald’s breakfast menu items are listed. On the x-axis, several nutritional categories are listed: calories, carbs, cholesterol, dietary fiber, iron, saturated fact, sodium, total fat, vitamin A, and vitamin C. The yellow-green-blue-purple viridis color palette is used: The higher the percentage (percent daily value) of a particular nutritional category that a breakfast item has, the darker the color of the square. For the vitamins and minerals, most of the squares are yellow or light yellow-green. The squares are darkest for nutritional categories like cholesterol, saturated fat, and sodium.")
## `r fontawesome::fa("info-circle", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
- The data visualized in this heatmap are from [Kaggle](https://www.kaggle.com/code/kathakaliseth/mcdonald-s-menu-comparative-nutrition-values).
- This Data Science Tutorials' [heatmap tutorial](https://www.r-bloggers.com/2022/10/how-to-create-a-heatmap-in-r/) distributed via R-Bloggers was quite helpful for recalling how the data must be structured for a heatmap made with ggplot2::geom_tile().
- The code for a beautifully formatted horizontal colorbar is courtesy of Cédric Scherer's [workshop materials](https://www.cedricscherer.com/slides/OutlierConf2021_ggplot-wizardry.pdf) from Data Visualization Society's Outlier Conf 2021.
- Two Stack Overflow forums ([1](https://stackoverflow.com/questions/37443499/how-to-fix-adjust-the-width-of-each-band-in-ggplot-geom-tile?noredirect=1&lq=1) and [2](https://stackoverflow.com/questions/23897175/adjust-ggplot2-geom-tile-height-and-width)) were helpful for learning how to configure the height/weight arguments to ensure the tiles were large enough when the heatmap was saved as a PNG.
# {-}
<br>
# {.tabset}
## `r fontawesome::fa("map", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
```{r get-map-data, eval = FALSE}
# Retrieve subset of Multiple Chronic Conditions (MCC) dataset using CMS API ----
#
# url <- "https://data.cms.gov/data-api/v1/dataset/15b08729-6ea2-4789-bf1a-b96b1da8338f/data?filter[Bene_Demo_Lvl][value]=Sex&filter[Bene_Geo_Lvl][value]=State"
#
# mcc_data <- jsonlite::fromJSON(url)
#
# write.xlsx( mcc_data, paste0(path, "/data/mcc_data.xlsx") ) # save a copy of the MCC data
# Load MCC data (if you don'tfeel like using the CMS API) ----
mcc_data <- read.xlsx( paste0(path, "/data/mcc_data.xlsx") )
# Ensure correct data type ----
mcc_data <- mcc_data %>%
mutate( across(c(Prvlnc, Tot_Mdcr_Stdzd_Pymt_PC, Tot_Mdcr_Pymt_PC, Hosp_Readmsn_Rate, ER_Visits_Per_1000_Benes),
as.numeric) )
# Subset MCC data ----
data <- mcc_data %>%
filter( Bene_Age_Lvl == "<65",
Bene_Demo_Desc == "Female",
Bene_MCC == "6+" )
# Load hexagonal U.S. shapefile ----
shapefile <- paste0(path,"/data/us_states_hexgrid.shp") # download from CARTO
hex_map <- read_sf(shapefile)
# Clean up shapefile columns ----
hex_map <- hex_map %>%
mutate( google_nam = gsub(" \\(United States\\)", "", google_nam) ) %>%
rename( state_name = google_nam,
abbrev = iso3166_2 ) %>%
select( -c(created_at, updated_at, label, bees) ) # remove necessary columns
# Join (non-spatial) with MCC data ----
hex_map <- hex_map %>%
left_join( data, by = c("state_name" = "Bene_Geo_Desc") )
# Retrieve centroid for labelling ----
hex_map <- cbind(hex_map, st_coordinates(st_centroid(hex_map)))
```
```{r make-hexmap, eval = FALSE}
# Create hexbin map ----
hex_map <- hex_map %>%
ggplot() +
geom_sf(aes(fill = Prvlnc, # indicate col to make chloropleth
shape = "No data available"), # create an override value for NA values
color = "white") + # change hexbin border colors
scale_fill_viridis_c(name = "Prevalence of 6+ Chronic Conditions", # give the legend a name
direction = -1, # reverse the scale, so darker equals larger number
labels = scales::percent_format(), # format the numbers in the legend
na.value = "gray68") + # specify color NA values
geom_text(aes(x = X, # specify long of centroid
y = Y, # specify lat of centroid
label = abbrev), # indicate the col to label each hexbin
color = "white", # change font color
family = "Lato") + # delineate the custom font
guides(shape = guide_legend(override.aes = list(fill = "gray68", # add NA value to legend
color = "white"), # set border w/in legend
order = 2,
title = NULL),
fill = guide_colorbar(title.position = 'top', # move legend title to top
title.hjust = .5, # use Cédric Scherer code to alter viridis colorbar
title.theme = element_text(size = 9),
barwidth = unit(20, 'lines'),
barheight = unit(.5, 'lines'),
order = 1)) + # ensure viridis scale is first
labs(caption = "Source: Centers for Medicare & Medicaid Services") +
theme_void() +
theme(
text = element_text(family = "Lato",
color = "#2C3E50"),
plot.title.position = "plot", # help ensure title aligned with plot
plot.caption.position = "plot", # move caption to be right-aligned with plot
legend.position = "top", # move legend above map
legend.text = element_text(size = 8), # size legend title text
legend.margin = margin(10, 6, 6, 4) # add cushion between subtitle, legend, and map
)
ggsave( here("images", "hex_map.png"), height = 576, units = "px" )
```
**In 2018, Oklahoma had the highest prevalence of elderly, female Medicare enrollees with 6+ chronic conditions.**
The prevalence of six or more chronic conditions among Medicare beneficiaries assigned female at birth aged 65 years or older was 20.5% in OK in 2018.
![]( `r here("images/hex_map.png")` "This is a hexbin map of the United States. Each state (and Washington D.C.) is represented as a hexagon. Each hexbin is filled based on the prevalence of Medicare beneficiaries aged 65 years or older assigned female at birth who have six or more chronic conditions. The yellow-green-blue-purple viridis color palette is used such that the higher the prevalence percentage, the darker the color of the hexagon. There are several states for which there are no data: Those hexbins are colored a medium gray. The darkest colors are in states that would be located in the southeastern United States in a traditional map (e.g., Oklahoma, Florida, Louisiana, etc.), indicating a relatively high prevalence of elderly, female Medicare beneficiaries with multiple chronic conditions.")
## `r fontawesome::fa("info-circle", fill = "#2c3e50", fill_opacity =0.75, height = "1.5em")`
- This visualization is a hexbin map of the United States; it features data from the Centers for Medicare and Medicaid Services' (CMS) Multiple Chronic Conditions (MCC) [dataset](https://data.cms.gov/medicare-chronic-conditions/multiple-chronic-conditions).
- The MCC data were retrieved through the [CMS API](https://data.cms.gov/api-docs). More information about the dataset can be viewed in the [data dictionary](https://data.cms.gov/resources/multiple-chronic-conditions-data-dictionary) and [methodology documentation](https://data.cms.gov/resources/medicare-chronic-conditions-methodology).
- Yan Holtz's excellent [hexbin map tutorial](https://r-graph-gallery.com/328-hexbin-map-of-the-usa.html) on The R Graph Gallery provided the data source for the hexagonal shapefile—as well as helpful hints for manipulating the file.
- The hexagonal shapefile of the United States was obtained from [Carto](https://team.carto.com/u/andrew/tables/andrew.us_states_hexgrid/public/map). It can be downloaded in other formats like GeoJSON.
- Tips and tricks for handling NA values in a continuous choropleth map were learned from R for the Rest of Us' [Mapping with R](https://rfortherestofus.com/courses/mapping/) course taught by Charlie Hadley.
# {-}
<br>
<p style="text-align: center;">Toyin L. Ola</p>