# Visualization and Communication

Zhentao Shi

<!-- code is tested on SCRP -->

## plot()

* `plot` is the generic command for graphs in base R.
  * It is suitable for quick, preliminary statistical graphs.

* `matplot` plots several series (the columns of a matrix) in one graph.

In [None]:
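# getFX() downloads the exchange rates (internet required) and
# assigns them to the objects USDJPY and HKDJPY in the workspace
quantmod::getFX("USD/JPY")
quantmod::getFX("HKD/JPY")
# HKD is pegged at about 7.8 per USD, so HKDJPY*7.8 is comparable to USDJPY
matplot( y = cbind(USDJPY, HKDJPY*7.8), 
         x = zoo::index(USDJPY), 
         type = "l", xlab = "time"  )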
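The exchange-rate example needs an internet connection. As an offline alternative, here is a minimal sketch of the same `matplot` idea on simulated data. The two random walks, the object name `series`, and the colors are illustrative assumptions and not part of the original example.

```r
# Simulated data only: two random walks standing in for two price series
set.seed(1)
n <- 100
series <- cbind(a = cumsum(rnorm(n)), b = cumsum(rnorm(n)))

# matplot draws one line per column of `y` against the common `x`
matplot(x = 1:n, y = series, type = "l", lty = 1, col = c("black", "red"),
        xlab = "time", ylab = "value")
legend("topleft", legend = colnames(series), lty = 1, col = c("black", "red"))
```

Each column of the matrix becomes one line, so adding a third series only requires another column in `series`.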

## ggplot2


* Many proposals to enhance `plot`
* `ggplot2` is the most successful. 

* Advanced system for high-quality statistical graphs.


## Syntax

* `ggplot()` specifies which dataset to use for the graph.
* `geom_XXX()` determines the shape to draw:
  *  scatter dots
  *  lines
  *  curves or areas...

In [None]:
library(tidyverse)
d0 = read.csv("data_example/AJR.csv", header = TRUE)

# "avexpr": average protection against expropriation risk
# "logpgp95": logarithm of GDP per capita in 1995

ggplot(data = d0) + geom_point(mapping = aes(x = avexpr, y = logpgp95))
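Several `geom_XXX()` layers can be stacked on the same `ggplot()` call with `+`. The sketch below illustrates this on the built-in `mtcars` dataset, which is used here only as a stand-in because it ships with R. The choice of `geom_smooth()` as the second layer is an assumption for illustration.

```r
library(ggplot2)

# mtcars ships with R; it stands in for the course data here
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = mpg)) +
  geom_point() +                                          # layer 1: scatter dots
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) # layer 2: fitted line
print(p)
```

Putting `aes()` inside `ggplot()` makes the mapping available to every layer, so it does not have to be repeated in each `geom`.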


### Bank Marketing Data


In [None]:
bank_0 = read.csv("data_example/bank-full.csv", header = TRUE, sep = ";" )
dim(bank_0)

In [None]:
# scatter plot
p1 <- ggplot(data = bank_0) + geom_point(mapping = aes(x = age, y = balance)) 
# balance: average yearly balance, in euros (numeric) 
print(p1)

In [None]:
# scatter plot with groups
p2 <- ggplot(data = bank_0) + geom_point(mapping = aes(x = age, y = balance, 
                                                       color = education), 
                                         alpha = 0.5) # constant transparency, set outside aes()
print(p2)

### Subgraphs

* Subgraphs convey rich information and make comparison easy.
* `ggplot2` is good at drawing multiple graphs, either of the same pattern or of different patterns.


In [None]:
p3 <- p1 + facet_wrap( marital ~ education)
print(p3)

In [None]:
# educational levels at each age
ggplot(data = bank_0) + geom_bar(mapping = aes(x = age, fill = education))

In [None]:
p4 <- ggplot(data = bank_0) + geom_bar(mapping = aes(x = age, fill = education), position = "dodge")
print(p4)

In [None]:
p5 <- p4 + coord_flip()
print(p5)

## Tidy data

* `ggplot2` adds elements to a graph one by one, and then prints the whole graph at once.

* `ggplot2` expects data frames in a tidy format: each row is an observation and each column is a variable.
* `tidyr` is a package that helps prepare such data frames.


* Example: [Penn World Table](https://www.rug.nl/ggdc/productivity/pwt/?lang=en)

In [None]:
d0 = readr::read_csv("data_example/PWT100.csv", col_names = TRUE)
head(d0)
colnames(d0)

In [None]:
# work with a smaller dataset

d1 <- select(d0, countrycode, year, rgdpe, pop) %>%
  filter(countrycode %in% c("CHN", "RUS", "JPN", "USA")) %>%
  mutate(gdpcapita = rgdpe/pop) 

# rgdpe: Expenditure-side real GDP at chained PPPs, 
#        to compare relative living standards across countries and over time

print(d1)  

In [None]:
ggplot(d1) + 
  geom_point(mapping = aes(x = year, y = rgdpe, color = countrycode))

In [None]:
ggplot(d1) + 
  geom_line(mapping = aes(x = year, y = gdpcapita, color = countrycode))

In [None]:
s1 <- d1 %>% 
  select( countrycode, year, pop) %>%
  spread( key = year, value = pop)
print(s1)


In [None]:
gather(s1, '1950':'2019', key = "year", value = "pop")
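In `tidyr` 1.0.0 and later, `spread()` and `gather()` are superseded by `pivot_wider()` and `pivot_longer()`. The sketch below shows the same round trip on a toy data frame. The country codes `AAA` and `BBB` and the population numbers are made up for illustration and do not come from the Penn World Table.

```r
library(tidyr)

# Toy long-format data (made-up numbers, not from the Penn World Table)
long <- data.frame(
  countrycode = rep(c("AAA", "BBB"), each = 3),
  year = rep(2001:2003, times = 2),
  pop = c(1.0, 1.1, 1.2, 5.0, 5.2, 5.4)
)

# long -> wide: one column per year (counterpart of spread())
wide <- pivot_wider(long, names_from = year, values_from = pop)

# wide -> long: stack the year columns again (counterpart of gather())
long2 <- pivot_longer(wide, cols = -countrycode,
                      names_to = "year", values_to = "pop")
print(wide)
print(long2)
```

After the round trip, `year` is a character column because it was recovered from column names; `as.integer(long2$year)` converts it back if a numeric axis is needed.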

### Subgraphs of the same pattern


* Example: Plot the density of two estimators under three different data generating processes.



In [None]:
load("data_example/big150.Rdata")
head(big150)

In [None]:

big150_1 <- select(big150, typb, b1, b1_c) %>%
            gather("b1", "b1_c", key = "estimator", value = "value")
print(head(big150_1))


`theme` adjusts the supplementary elements of a graph, such as the background and the size and font of the axis text.

In [None]:

p1 <- ggplot(big150_1)
p1 <- p1 + geom_area(
  stat = "density", alpha = .25,
  aes(x = value, fill = estimator), position = "identity"
)
p1 <- p1 + facet_grid(. ~ typb)
p1 <- p1 + geom_vline(xintercept = 0)
p1 <- p1 + theme_bw()
p1 <- p1 + theme(
  strip.text = element_text(size = 12),
  axis.text = element_text(size = 12)
)
print(p1)

### Example

* This example aligns two graphs of different patterns on one page.
  * Similar graphs appear in [Shi and Zheng, 2018](https://onlinelibrary.wiley.com/doi/abs/10.1002/jae.2640).
  * To unify the theme of the two subgraphs, define an object `theme1` and apply it to both graphic objects `p1` and `p2`.



In [None]:
# graph packages
library(lattice)
library(ggplot2)
library(gridExtra)

load("data_example/multigraph.Rdata") # load data

# unify the theme in the two graphs
theme1 <- theme_bw() + theme(
  axis.title.x = element_blank(),
  strip.text = element_text(size = 12),
  axis.text = element_text(size = 12),
  legend.position = "bottom", legend.title = element_blank()
)

In [None]:
# sub-graph 1
d1 <- data.frame(month = 1:480, m = m_vec)
p1 <- ggplot(d1) + geom_line(aes(x = month, y = m))
p1 <- p1 + theme1 + ylab("fraction of chartists")

# sub-graph 2
d2$month <- 1:480
p2 <- ggplot(d2) + geom_line(aes(x = month, y = value, col = variable))
p2 <- p2 + theme1 + ylab("price and fundamental")

# generate the graph
grid.arrange(p1, p2, nrow = 2)

## Interactive Graph

* Users provide customized inputs.
* The graph presents the corresponding outcome.

* `flexboard.Rmd` is an example.
* Easy to convert a ggplot2 graph with `plotly::ggplotly()`.
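As a minimal sketch of the conversion, assuming the `plotly` package is installed, the code below builds a static `ggplot2` graph on the built-in `mtcars` dataset and passes it to `plotly::ggplotly()`. The dataset and the object names `p` and `p_int` are illustrative choices.

```r
library(ggplot2)
library(plotly)

# Static ggplot2 graph on a built-in dataset
p <- ggplot(mtcars) +
  geom_point(aes(x = wt, y = mpg, color = factor(cyl)))

# Convert it to an interactive htmlwidget (hover, zoom, pan)
p_int <- ggplotly(p)

# Display the widget only in an interactive session
if (interactive()) print(p_int)
```

The returned object is an htmlwidget, so it can also be embedded in an R Markdown document that is rendered to HTML.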



## R Markdown

* Provide R-basics from previous version of rmd file.

Notebooks

* Rmd format
* Ipynb format

## Shiny App

`shiny` is an R package for building web-based interactive graphs.


* [tutorial](https://shiny.rstudio.com/tutorial/)


* `UI`: the interface that collects inputs and displays outputs
* `Server`: computes the outputs from the user's inputs
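The two components can be sketched in a minimal app, assuming the `shiny` package is installed. The slider, the histogram, and the IDs `n` and `hist` are hypothetical choices for illustration and are unrelated to the apps linked in this section.

```r
library(shiny)

# UI: a slider for the input and a placeholder for the plot
ui <- fluidPage(
  sliderInput("n", "Number of observations", min = 10, max = 500, value = 100),
  plotOutput("hist")
)

# Server: redraws the histogram whenever the slider value changes
server <- function(input, output) {
  output$hist <- renderPlot({
    hist(rnorm(input$n), main = "Simulated N(0,1) draws", xlab = "x")
  })
}

# Bundle the two parts into an app object
app <- shinyApp(ui = ui, server = server)

# Launch the app only in an interactive session
if (interactive()) runApp(app)
```

The `input$n` reference inside `renderPlot()` is what links the two parts: the server reads the value that the UI collects and sends the plot back to the `hist` placeholder.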


### Example


* [Shenzhen housing price](https://zhentao-shi.shinyapps.io/ShenzhenHousing-Shiny/)
  * [code](https://github.com/metricshilab/Shenzhen-Housing)
* [HP filter](https://zwmei-metrics.shinyapps.io/boosted_hp_app/)
  * [code](https://github.com/metricshilab/Boosted_HP_App)