# Visualisation Types


## Learning Objectives 

- Describe when to use the following kinds of visualizations to answer specific questions using a data set:
    - scatterplots
    - line plots
    - bar plots
    - histograms
- Generate and fine-tune visualizations using Stata commands
- Use `graph export` to save visualizations in various formats including `.svg`, `.png` and `.pdf`


## Introduction 

Data visualization is an effective way of communicating ideas to your audience, whether it's for an academic paper or a business setting. It can be a powerful medium to motivate your research, illustrate relationships between variables and provide some intuition behind why you applied certain econometric methods.

The real challenge is not learning how to use Stata to create graphs; it is figuring out which graph will do the best job of telling your empirical story.

Before creating any graphs, be sure to identify the message you want the graph to convey. Try to answer these questions: Who is our audience? What is the question we're trying to answer?

<div class="alert alert-block alert-info">
<b>Note:</b> You can also create graphs using Stata's drop-down menus. To include such a graph in your do-file, copy and paste the command that appears in the Command window after you create the graph.
</div>


## Scatter Plot

<!-- what is it? and, when to use? --> 
Scatter plots are frequently used to demonstrate how two quantitative variables are related to one another. This plot works great when we are interested in showing relationships and groupings among variables from relatively large datasets.

### Example
- ![Relationship of country religiosity vs wealth](https://ourworldindata.org/uploads/2013/11/GDP-vs-Religion.png) 
- [Comparing Americans' perceptions of which foods are healthy to the perception of nutritionists](https://www.nytimes.com/2017/10/09/learning/whats-going-on-in-this-graph-oct-10-2017.html)
- [Instagram followers in the fashion industry](https://qz.com/267635/explore-the-hidden-patterns-of-the-fashion-instagram-universe/)

### Creating scatter plots 

Let's begin by loading our dataset.

In [None]:
clear* 

use fake_data, clear 

Perhaps you are already familiar with this dataset. If not, use the command `describe` to get a sense of the data. 

In [None]:
describe

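Let's say we want to plot the log-earnings by year. We begin by generating a new variable for log earnings. We then use `preserve` to keep a copy of the full dataset before `collapse` reduces it to one mean log-earnings value per year; `restore` will bring the full dataset back later.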

In [None]:
gen log_earnings = log(earnings)

la var log_earnings "Log-earnings"

In [None]:
preserve

collapse (mean) log_earnings, by(year)

In [None]:
describe

To create a scatterplot we need to use Stata's `twoway` command. The most important skill with graphing in Stata is to be able to understand the documentation. 

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Try using the `help` command to pull up the documentation for `twoway`.
    
</div>


In [None]:
help twoway

In [None]:
twoway (scatter log_earnings year)

graph export ./img/myscatterplot.svg, replace

It should look something like this: ![myscatterplot](img/myscatterplot.svg)

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> You can customize and fine-tune the graph using Stata's *options* for `twoway` graphs. Try running `help twoway options` in a code cell to see the documentation.
    
</div>

You can include axis titles using the *ytitle* and *xtitle* options. 

In [None]:
twoway (scatter log_earnings year), xtitle("Year") ytitle("Log-earnings")

graph export ./img/myscatterplot2.svg, replace

You can write your code in a single line as shown above. However, graph code can get very lengthy, so, to keep things neat and simple, we will break up the code into multiple lines using `///` in the next few examples. 

```stata

twoway (scatter log_earnings year), ///
    xtitle("Year") ytitle("Log-earnings")

graph export ./img/myscatterplot2.svg, replace

```

![myscatterplot2](img/myscatterplot2.svg)
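
Beyond axis titles, the markers themselves can be adjusted. The following is a minimal sketch, assuming the collapsed data from above are still in memory; `mcolor()` and `msymbol()` are standard marker options for `scatter`, while the colour, symbol and output file name are arbitrary choices made for illustration.

```stata
* Sketch: the same scatter plot with customized markers.
* The colour, symbol and file name are illustrative choices, not part of the original module.
twoway (scatter log_earnings year, mcolor(navy) msymbol(circle_hollow)), ///
    xtitle("Year") ytitle("Log-earnings")

graph export ./img/myscatterplot3.svg, replace
```

Run `help marker_options` to see the other marker settings Stata offers.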

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Here's an example of a connected scatterplot. Can you deduce the command from the `twoway` documentation? 
    
</div>

![connected-scatter-plot](./img/myconnectedplot.svg)

## Line Plot

<!-- what is it? and, when to use? --> 
Line plots visualize trends with respect to an independent, ordered quantity (e.g., time). This plot works great when one of our variables is ordinal (time-like) or when we want to display multiple series on a common timeline.

### Creating line plots 

Line plots can be generated using Stata's `twoway` command we saw earlier. 

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Complete the code below to create a line plot of `log_earnings` by `year` and save it as `mylineplot.svg`.
    
</div>

In [None]:
twoway ( log_earnings year), ///
    xtitle("Year") ytitle("Log-earnings")

graph export , replace

It should look something like this: ![mylineplot](img/mylineplot.svg)

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Now let's try creating a line plot with multiple series on a common timeline. Let's set up the dataset to include log-earnings, year and the treatment variable. Then, use your code from the last exercise to complete the code for a multiple series line plot. Export the graph as `multilineplot.svg`.
    
</div>


In [None]:
restore

In [None]:
preserve

collapse (mean) log_earnings, by(treated year)

describe

In [None]:
twoway ( log_earnings year if treated) || ( log_earnings year if !treated), ///
    xtitle("Year") ytitle("Log-earnings")                                  ///
    legend( label(1 "Treated") label(2 "Control"))

graph export , replace

It should look something like this: ![multilineplot](img/multilineplot.svg)

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Building on your answers from before, complete the following code to insert an indicator line at the treatment year (2002). Export the graph as `multilineplot2.svg`.
    
</div>

In [None]:
twoway ( log_earnings year if ) || ( log_earnings year if ),  ///
    xtitle(" ") ytitle(" ")                                           ///
    legend( label(1 " ") label(2 " "))                                 /// 
    xline( /*treatment year*/, lcolor(cranberry) lpattern(dash_dot))

graph export ./img/multilineplot2.svg, replace

It should look something like this:
![multilineplot2](img/multilineplot2.svg)

## Histogram

<!-- what is it? and, when to use? --> 
Histograms visualize the distribution of one quantitative variable. This plot works great when we are interested in visualizing the values a variable takes and how often they occur. For a discrete variable, each value can get its own bar; for a continuous variable such as log-earnings, values are grouped into bins.

### Creating histograms

Now let's restore the original dataset so that we can plot the distribution of log-earnings. 

In [None]:
restore

describe

In [None]:
histogram log_earnings

graph export ./img/myhistogram.svg, replace

It should look something like this: ![myhistogram](img/myhistogram.svg)
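
The shape of a histogram depends on how the values are binned. As a rough sketch, assuming the restored dataset with `log_earnings` is in memory, the `bin()` option sets the number of bins and `frequency` puts counts rather than densities on the vertical axis. The choice of 20 bins and the output file name are arbitrary and only for illustration.

```stata
* Sketch: control the number of bins and show counts instead of densities.
* 20 bins and the file name are illustrative choices.
histogram log_earnings, bin(20) frequency ///
    xtitle("Log-earnings")

graph export ./img/myhistogram_bins.svg, replace
```

Trying a few different bin counts is a quick way to check whether a feature of the distribution is real or just a product of the binning.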

You can change the colour of the bars by using the `color` option.

In [None]:
histogram log_earnings, color(emidblue)

graph export ./img/myhistogram2.svg, replace

![myhistogram2](img/myhistogram2.svg)

Run the code cell below to view the colorstyle options available in Stata.

In [None]:
help colorstyle

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Generate a histogram of the age distribution in our dataset. Try customizing the bar colour while you're at it. Export the graph as `myhistogram3.svg`. 
    
</div>

## Bar plot

<!-- what is it? and, when to use? --> 
Bar plots visualize comparisons of amounts. They are useful when we are interested in comparing a few categories as parts of a whole, or across time.

> Bar plots should always start at 0. Starting bar plots at any number besides 0 is generally considered a misrepresentation of the data.

### Creating a bar plot


In [None]:
help graph bar   /* this is a "traditional" bar plot. You can also create a bar plot using the twoway command.*/

Now let's plot mean earnings by region. Note that the regions are numbered in our dataset. 

In [None]:
graph bar (mean) earnings, over(region)

graph export ./img/mybarchart.svg, replace

![mybarchart](img/mybarchart.svg)

You can also create a horizontal bar plot by using the command `graph hbar`.

In [None]:
graph hbar (mean) earnings, over(region)

graph export ./img/mybarchart2.svg, replace

![mybarchart2](./img/mybarchart2.svg)

You can also group your bars over another variable (or "category").

In [None]:
graph hbar (mean) earnings,  over(treated) over(region)

graph export ./img/mybarchart3.svg, replace

![mybarchart3](img/mybarchart3.svg)

<div class="alert alert-block alert-info">
    
<b>Note:</b> Compare this visualisation with `multilineplot2` we generated earlier. Do both these visualisations tell the same story? Does one capture the treatment effect better than the other? 
    
</div>

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> What happens when you switch the order of categories in the code above? Try this in the following code cell. 
    
</div>

In [None]:
graph hbar (mean) earnings,  over() over()

graph export ./img/mybarchart4.svg, replace

<div class="alert alert-block alert-warning">
    
<b>Your turn:</b> Run the code cell below. Then, try switching the `over` and `by` variables and store it in `mybarchart5.svg` in the next code cell. 
    
</div>

In [None]:
graph hbar (mean) earnings,  over(treated) by(region)

graph export ./img/mybarchart5.svg, replace

## Format

So far, we have exported our graphs in `.svg` format. You can also export your graphs in other formats such as `.jpg`, `.png` and `.pdf`. This may be particularly helpful if you plan to write your paper in LaTeX, since `.svg` files cannot be included directly in LaTeX PDF output without an extra conversion step.
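
As a minimal sketch, `graph export` infers the output format from the file extension, so switching formats only requires changing the suffix. The example below assumes the full dataset with `log_earnings` is in memory; the file names and the pixel width are hypothetical choices. The `width()` option applies to raster formats such as `.png`.

```stata
* Sketch: export the same graph in two formats.
* File names and the width of 1600 pixels are illustrative choices.
histogram log_earnings

* Raster image; width() sets the width in pixels
graph export ./img/myhistogram.png, replace width(1600)

* Vector PDF, convenient for inclusion in LaTeX documents
graph export ./img/myhistogram.pdf, replace
```

Run `help graph export` to see which formats are available on your operating system and Stata version.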

## Fine-tuning your graph further

In order to customise your graph further, you can use the tools in the Stata graph window or the graph option commands we have been using in this module. So far, we added axis titles and labels using graph options. You can also include and adjust the following: 

- title 
- legend 
- axis 
- scale
- labels 
- theme (i.e. colour, appearance)
- adding lines, text or objects 

While we won't cover each of these in this module, you can always go back to the Stata documentation to explore the options available to you based on your needs. 
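
To give a flavour of how several of these adjustments combine, here is a hedged sketch that adds a title, horizontal axis labels, a note and a different scheme to the yearly scatter plot. It assumes the full dataset is in memory; the title text, note text, scheme and file name are illustrative choices rather than recommendations.

```stata
* Sketch: combining several graph options in one command.
preserve
collapse (mean) log_earnings, by(year)

twoway (scatter log_earnings year), ///
    title("Mean log-earnings over time") ///
    xtitle("Year") ytitle("Log-earnings") ///
    ylabel(, angle(horizontal)) ///
    note("Data: fake_data") ///
    scheme(s1color)

graph export ./img/myfinetuned.svg, replace
restore
```

Each option is independent of the others, so you can add or remove them one at a time to see what each one changes.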


In [None]:
help twoway options

## Further reading

- [Make your data speak for itself! Less is more (and people don’t read)](https://towardsdatascience.com/data-visualization-best-practices-less-is-more-and-people-dont-read-ba41b8f29e7b)

## References 

- Timbers, T., Campbell, T., Lee, M. (2022). Data Science: A First Introduction. https://datasciencebook.ca/viz.html
- Schrimpf, Paul. "Data Visualization: Rules and Guidelines." In *QuantEcon DataScience*. Edited by Chase Coleman, Spencer Lyon, and Jesse Perla. https://datascience.quantecon.org/applications/visualization_rules.html
- Kopf, Dan. "A brief history of the scatter plot." *Quartz*. March 31, 2018. https://qz.com/1235712/the-origins-of-the-scatter-plot-data-visualizations-greatest-invention/