**SA463A &#x25aa; Data Wrangling and Visualization &#x25aa; Fall 2021 &#x25aa; Uhan**

# Project 2. Corruption and Human Development

In this project, you will recreate the graphic below, which shows the relationship between corruption and human development in countries across the globe. It is based on a graphic originally published in *The Economist* in 2011. [Click here for the original graphic.](img/economist.png)

![](img/economist_altair.svg)

The learning goals of this project are threefold:

1. You will apply what you've learned about multi-view composition; in particular, layering charts on top of each other.

2. You will get some practice customizing visualizations so that they're ready for publication: on a web page, in an academic report, or in a memo to your supervisor.

3. You will also get a lot of practice reading and applying the Altair documentation.

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

### Problem 0

Import Pandas and Altair.

### Problem 1

Relative to the folder containing this notebook, there is a CSV file at `data/economist.csv`. This file contains 4 columns:

| Column    | Description                                                                        |
| :-        | :-                                                                                 |
| `Country` | Name of country                                                                    |
| `HDI`     | [Human Development Index](http://hdr.undp.org/en/data), 2011                       |
| `CPI`     | [Corruption Perceptions Index](https://www.transparency.org/en/cpi/2011), 2011     |
| `Region`  | Region of the world, from the [CPI data](https://www.transparency.org/en/cpi/2011) |

Load `data/economist.csv` into a Pandas DataFrame. Check your work by displaying the first 5 rows.
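
As a rough illustration of the loading step, here is a minimal sketch that reads a small, made-up CSV from an in-memory string instead of from disk. The rows, the values, and the names `toy_csv` and `toy_df` are invented for this example; only the four column names come from the table above.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file: the column names match the table
# above, but every row and value here is invented for illustration.
toy_csv = io.StringIO(
    "Country,HDI,CPI,Region\n"
    "Aland,0.90,8.5,OECD\n"
    "Borduria,0.55,2.4,Americas\n"
    "Costaguana,0.70,3.9,Americas\n"
)

# pd.read_csv accepts a file path or any file-like object.
toy_df = pd.read_csv(toy_csv)

# .head() returns at most the first 5 rows; this toy table has only 3.
print(toy_df.head())
```

With a real file, you would pass the path string to `pd.read_csv()` in place of the `io.StringIO` object.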

### Problem 2

Start by creating a scatter plot of CPI vs. HDI, using color to differentiate between the regions of the world.

At this stage, don't try to match *all* aspects of the graphic above that you're trying to recreate. Instead, focus on the following:


- Include the correct titles for the x- and y-axes.


- Play with the mark properties to adjust the size, stroke width, and fill of the circles (i.e., scatter plot points), to get them as close as possible to the graphic above.


- Use the following CSS color strings to color the points in your scatter plot:

| Region                     | CSS Color String |
| :-                         | :-               |
| OECD                       | `#24576d`        |
| Americas                   | `#099dd7`        |
| Asia & Oceania             | `#28aadc`        |
| Central & Eastern Europe   | `#248e84`        |
| Middle East & North Africa | `#f2483f`        |
| Sub-Saharan Africa         | `#96503f`        |


*Hint.* [Here is the Altair documentation for mark properties.](https://altair-viz.github.io/user_guide/marks.html#mark-properties)

### Problem 3

Note that in the graphic you're trying to recreate, certain points are labeled with their corresponding country name. Your next step is to create these labels.

In the code cell below, there is a list named `points_to_label`, consisting of the country names of the points labeled in the graphic above.

Create a scatter plot of CPI vs. HDI for the countries in `points_to_label`, using *text* labels to represent each point. 

Again, at this stage, don't try to match *all* aspects of the graphic above that you're trying to recreate. Focus only on the text labels, and play around with the mark properties to:

- adjust the horizontal alignment and the vertical baseline of the text labels, and


- adjust the horizontal and vertical offset of the text labels.

For this problem, you should end up with a scatter plot with *only* text labels.

*Hint.* To keep only the rows in which a variable matches one of the values in a given list, pass a `FieldOneOfPredicate()` to the `.transform_filter()` method, instead of the Vega expression strings we used in class. [Here is the Altair documentation for field predicates.](https://altair-viz.github.io/user_guide/transform/filter.html#field-predicates)

```python
points_to_label = [
    'Russia',
    'Venezuela',
    'Iraq',
    'Myanmar',
    'Sudan',
    'Afghanistan',
    'Congo',
    'Greece',
    'Argentina',
    'Brazil',
    'India',
    'Italy',
    'China',
    'South Africa',
    'Spain',
    'Botswana',
    'Cape Verde',
    'Bhutan',
    'Rwanda',
    'France',
    'United States',
    'Germany',
    'Britain',
    'Barbados',
    'Norway',
    'Japan',
    'New Zealand',
    'Singapore'
]
```

```python
# Your code here
```


### Problem 4

Note that in the graphic you're trying to recreate, there is a red fitted line. Your next step is to recreate this line.

It turns out that this line was computed by fitting a logarithmic regression to the CPI and HDI data. This can be done in Altair with the `.transform_regression()` method. In particular, to perform a logarithmic regression with `x` as the independent variable and `y` as the dependent variable, you can apply `.transform_regression()` on an Altair chart object, like this:

```python
alt.Chart(df).transform_regression(
    'x', 'y', method='log'
)
```

For more details on regression transforms, [here is the relevant part of the Altair documentation](https://altair-viz.github.io/user_guide/transform/regression.html).
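
For intuition about what a logarithmic regression fits, here is a rough sketch in plain Python. It assumes the model has the form $y = a + b \ln(x)$ and is fit by ordinary least squares; check the documentation linked above for the exact definition Altair uses. The function `fit_log_regression` and the toy data are invented for this example and are not part of Altair.

```python
import math


def fit_log_regression(xs, ys):
    """Least-squares fit of y = a + b * ln(x); returns (a, b)."""
    # Transform x, then do ordinary simple linear regression of y on ln(x).
    us = [math.log(x) for x in xs]
    n = len(us)
    u_mean = sum(us) / n
    y_mean = sum(ys) / n
    covariance = sum((u - u_mean) * (y - y_mean) for u, y in zip(us, ys))
    variance = sum((u - u_mean) ** 2 for u in us)
    b = covariance / variance
    a = y_mean - b * u_mean
    return a, b


# Toy data generated exactly from y = 1 + 2 * ln(x), so the fit
# should recover a = 1 and b = 2 up to floating-point error.
toy_xs = [1.0, 2.0, 4.0, 8.0]
toy_ys = [1 + 2 * math.log(x) for x in toy_xs]
a, b = fit_log_regression(toy_xs, toy_ys)
```

The fitted curve rises quickly for small `x` and flattens as `x` grows, which is the shape to look for in the chart you produce.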

Create a line chart of the fitted line obtained by performing logarithmic regression of CPI on HDI.  

As in Problems 2 and 3, don't try to match *all* aspects of the graphic you're trying to recreate. Instead, focus on adjusting the mark properties so that the color of the line is red.

For this problem, you should end up with *only* a chart of the fitted line.

### Problem 5

Now it's time to put all the pieces together.

Layer the charts you created in Problems 2, 3 and 4 to create a single chart. Think carefully about the order in which you layer the charts, and how that would affect the readability of the chart.


In addition, make the following adjustments to make your chart look identical to the graphic you're trying to recreate:

- Adjust the chart properties to add a title ("Corruption and human development") to the chart, and resize the chart (600 pixels wide, 400 pixels high).


- Configure the title so it is anchored to the left, has a larger font size, and has some padding (i.e., y-coordinate offset from the plotting area).


- Configure both axes so that their titles have normal font weight and italic font style, and so that neither axis has a baseline or ticks. Increase the padding of the labels and title so that they are not so close to the axes.


- Configure the x-axis so that no vertical grid lines appear. Limit the number of ticks on the x-axis to 10.


- Configure the legend so that it is oriented on top of the chart, and has no title.


- Configure the view of the chart so that the stroke width of the box surrounding the chart is zero (i.e., remove the box surrounding the chart).


Now that you've layered the charts you made in Problems 2, 3 and 4, you may find that you need to revisit some of the mark properties you set earlier so that things line up nicely. 


*Hint.* Refer to the [Altair documentation on top-level chart configuration](https://altair-viz.github.io/user_guide/configuration.html) and use the various `.configure_...()` methods to make the necessary appearance adjustments.

<hr style="border-top: 2px solid gray; margin-top: 1px; margin-bottom: 1px"></hr>

## Grading rubric

| Problem |                                                                                                                                      | Points  |
| :-      | :-                                                                                                                                   | -:      |
| 0       | Imports Pandas and Altair                                                                                                            | 1       |
| 1       | Loads CSV file                                                                                                                       | 2       |
|         | Displays the first 5 rows of the DataFrame                                                                                           | 2       |
|         | Code runs without errors                                                                                                             | 1       |
| 2       | Creates scatter plot, using correct encodings for CPI and HDI                                                                        | 6       |
|         | Uses color to differentiate between regions                                                                                          | 3       |
|         | Uses given color scale for different regions                                                                                         | 3       |
|         | Includes titles for x- and y-axes                                                                                                    | 3       |
|         | Sets size, stroke width, and fill mark properties appropriately                                                                      | 3       |
|         | Code runs without errors                                                                                                             | 6       |
| 3       | Filters for countries in `points_to_label`                                                                                           | 6       |
|         | Creates scatter plot of text labels, using correct encodings for CPI and HDI                                                         | 6       |
|         | Sets alignment and offset mark properties appropriately                                                                              | 3       |
|         | Code runs without errors                                                                                                             | 5       |
| 4       | Performs logarithmic regression transform of CPI on HDI                                                                              | 6       |
|         | Creates line chart of fitted logarithmic regression line                                                                             | 6       |
|         | Sets color mark property appropriately                                                                                               | 3       |
|         | Code runs without errors                                                                                                             | 5       |
| 5       | Layers 3 charts correctly and in a reasonable order                                                                                  | 6       |
|         | Sets chart properties to add title and resize width and height                                                                       | 3       |
|         | Configures title: left anchor, larger font size, padding                                                                             | 3       |
|         | Configures axes: normal font weight, italic font style, no axis baseline, no ticks, increased label padding, increased title padding | 6       |
|         | Configures x-axis: no vertical grid lines, 10 ticks                                                                                  | 2       |
|         | Configures legend: oriented on top, no title                                                                                         | 2       |
|         | Configures view: no surrounding box                                                                                                  | 1       |
|         | Code runs without errors                                                                                                             | 7       |
|         | **Total**                                                                                                                            | **100** |