# Mini Workshop - Programming a Literature Review

Modified from Serena Bonaretti's repository https://github.com/sbonaretti/cart_segm_liter_map and related YouTube videos https://www.youtube.com/channel/UCk1sLroo_tgJqcn-0EVh6zQ/featured. Repurposed for a mini-workshop demonstrating programming tools for open science and how they can be useful in all disciplines. Presented to Horizon CDT (https://cdt.horizon.ac.uk/) students. Please note that some steps, such as uploading data to a repository that provides a DOI, have been skipped for this exercise, but they are important for creating truly reproducible environments.

---

In [None]:
import pandas as pd
import numpy as np
import altair as alt
import vega
from vega_datasets import data # for country contours (world map)

In [None]:
alt.renderers.enable('notebook'); # for rendering in jupyter notebook

---
<a name = "download"></a>
## Download data in the repository

Ideally, input data should be in a repository that provides a **persistent digital object identifier (DOI)** so that the data remain available in the future. Sharing data from **personal repositories** is *discouraged* because such links tend to break or be deleted, which compromises the reproducibility of the workflow.

- Load the file `literature_review.csv`
- Show the data in the file

In [3]:
# load literature table
literature = pd.read_csv("literature_review.csv")

In [None]:
# limit the display to 5 rows but show all columns
dataDimension = literature.shape # (number of rows, number of columns)
pd.set_option("display.max_rows", 5)
pd.set_option("display.max_columns", dataDimension[1])

# show
literature
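
A quick sanity check right after loading can catch a renamed or missing column before the later cells fail. The sketch below is only an illustration: the helper `missing_columns` and the constant `REQUIRED_COLUMNS` are hypothetical names, and the column list is taken from the columns the later cells of this notebook use.

```python
# Hypothetical helper, not part of the original notebook.
# Columns that the later cells of this notebook read from the table.
REQUIRED_COLUMNS = ("bibtex_id", "algorithm_type", "longitude", "latitude")


def missing_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Toy header for illustration; with the real table, pass `literature.columns`.
header = ["bibtex_id", "algorithm_type", "latitude"]
print(missing_columns(header))
```

With the loaded table, `missing_columns(literature.columns)` should return an empty list if all expected columns are present.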

---
<a name = "manipulation"></a>
## Automate data manipulation

**Automatic** data manipulation does not compromise the original data and keeps track of every change, making analyses reproducible. **Manual** manipulation is *discouraged*, as it compromises the original data, is prone to errors, and does not keep track of changes, making analyses hard to reproduce.

- Change the `bibtex_id` format from `author_year` to `author (year)` for better readability when hovering (e.g. from `Solloway_1997` to `Solloway (1997)`)

In [None]:
# replace the underscore with a space and an opening bracket (literal match, not a regex)
literature["bibtex_id"] = literature["bibtex_id"].str.replace('_', ' (', regex=False)
# add the closing bracket
literature["bibtex_id"] = literature["bibtex_id"].astype(str) + ")"

# show table
literature
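
Note that the cell above appends another `)` every time it is re-run, and it replaces every underscore in an id. One possible alternative is a single regular-expression substitution that leaves already-formatted ids unchanged. This is a minimal sketch, assuming every id ends in an underscore followed by a four-digit year; the function name `format_bibtex_id` is hypothetical.

```python
import re

# Assumption: ids look like "<author>_<4-digit year>", e.g. "Solloway_1997".
_ID_PATTERN = re.compile(r"^(.+)_(\d{4})$")


def format_bibtex_id(bibtex_id):
    """Turn 'author_year' into 'author (year)'; leave other strings unchanged."""
    return _ID_PATTERN.sub(r"\1 (\2)", bibtex_id)


# Toy ids for illustration only.
for raw in ["Solloway_1997", "Solloway (1997)"]:
    print(format_bibtex_id(raw))
```

Because an id that is already in the `author (year)` form no longer matches the pattern, applying the function twice gives the same result as applying it once. On the table itself it could be applied with `literature["bibtex_id"].map(format_bibtex_id)`.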

## Visualize map
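Show the interactive map: each paper is drawn as a point at its coordinates on a world map, colored by `algorithm_type`, and hovering over a point shows its `bibtex_id`.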

In [None]:
# import coordinates of countries to create the background map
countries = alt.topo_feature(data.world_110m.url, 'countries')

# create map
background = alt.Chart(countries).mark_geoshape(
    fill        = 'white',
    stroke      = 'lightgray',
    strokeWidth = 1.5
).project(
    "equirectangular"
).properties(
    width  = 1250,
    height = 750
)

# create points
points = alt.Chart(literature).mark_circle().encode(
    longitude = 'longitude',
    latitude  = 'latitude',
    size      = alt.value(100),
    color     = 'algorithm_type',
    tooltip   = 'bibtex_id' # name of each point when hovering
)

# show
background + points 
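
A point with a missing or out-of-range coordinate cannot be placed correctly on the map, so it may be worth checking the `latitude` and `longitude` values before plotting. The following is a small sketch of such a check; `is_valid_coordinate` is a hypothetical helper, and the example values are made up for illustration.

```python
import math


def is_valid_coordinate(latitude, longitude):
    """Return True if both values are numbers within the valid geographic range."""
    try:
        lat, lon = float(latitude), float(longitude)
    except (TypeError, ValueError):
        # None, empty strings, or other non-numeric entries
        return False
    if math.isnan(lat) or math.isnan(lon):
        return False
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0


# Made-up example values, not taken from the literature table.
print(is_valid_coordinate(52.95, -1.15))
print(is_valid_coordinate(None, 10.0))
print(is_valid_coordinate(95.0, 0.0))
```

Rows of the table could be checked by calling the helper on each pair of `latitude` and `longitude` values, and any row that fails can then be inspected in the CSV source rather than corrected by hand in the notebook output.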