## A deeper look into Covid in Portugal

We are interested in analysing Covid data in Portugal: understanding the evolution of confirmed and active cases, as well as deaths, vaccinations, and much more. To do that, we are going to use the data from this [GitHub repository](https://github.com/dssg-pt/covid19pt-data), which collects a wide range of Covid-related data for Portugal.


![Image of meme](https://static.thehoneycombers.com/wp-content/uploads/sites/4/2020/03/Best-funny-Coronavirus-memes-2020-Honeycombers-Bali-6.jpg)


First, let's take a quick glance at the data.

### Preliminaries

We are going to use Python and one of its best-known libraries, [pandas](https://pandas.pydata.org/), to explore and analyse this Covid data.

1. Python is a general-purpose programming language, i.e. you can do pretty much anything with it (data science, games, web and mobile apps).
2. This is a Jupyter notebook, one of the most common ways to explore data.
3. pandas is a Python library built around DataFrames (tables), which in turn are made up of Series (one-dimensional arrays).


### Initial Setup

Here we import libraries (code written by someone else) to speed up our exploration.

In [None]:
import pandas as pd  # DataFrames and Series
import matplotlib.pyplot as plt  # helps us create plots

### DataFrame Example
Let's just go through some basics first.
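As a minimal sketch of those basics, the toy table below (values invented purely for illustration) shows how a DataFrame is built and how a single column comes back as a Series.

```python
import pandas as pd

# Toy data, invented purely for illustration.
toy = pd.DataFrame(
    {"city": ["Lisboa", "Porto", "Faro"], "cases": [120, 95, 30]}
)

# Each column of a DataFrame is a Series (a labelled one-dimensional array).
cases = toy["cases"]

print(toy.shape)             # (number of rows, number of columns)
print(type(cases).__name__)  # name of the column's type
print(cases.sum())           # total over the column
```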

## Now that we are experts at DataFraming, let's jump straight into exploring Covid data

First things first: we need to import the data from the GitHub repository. It contains several files, but for now we are only going to get **"data.csv"**.

In [None]:
covid_data = 'https://raw.githubusercontent.com/dssg-pt/covid19pt-data/master/data.csv'
df = #...
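A hedged sketch of how a CSV can be loaded with pandas. To keep it runnable offline, it reads a small made-up stand-in (`sample_csv`, a hypothetical name) instead of the real file; `pd.read_csv` accepts a URL such as `covid_data` in the same way. The column names simply mirror the ones mentioned in this notebook.

```python
import io
import pandas as pd

# Stand-in for data.csv: a few made-up rows so the example runs offline.
sample_csv = io.StringIO(
    "data,confirmados,activos\n"
    "26-02-2020,0,0\n"
    "27-02-2020,2,2\n"
    "28-02-2020,5,4\n"
)

# pd.read_csv accepts a URL, a file path, or a file-like object.
sample_df = pd.read_csv(sample_csv)

print(sample_df.head())  # first rows of the table
print(sample_df.index)   # default index: 0, 1, 2, ...
```

Note that without further arguments the rows are simply numbered from 0, and the `data` column is read as plain text.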

Now that we have the data, let's just explore it.

What about a quick visualization of 'confirmados' and 'activos'?

Hmm, idk that looks difficult.

There must be a better way to look at the data...

### Attention check
Something is wrong...


![Image of meme](https://i.imgflip.com/5drmkc.jpg)

The x axis should be 'data' or 'data_dados', right? However, in the plot the x values are just numbers starting at 0 and going over 400. Can you figure out why?

In [None]:
date_parser = lambda date_str: pd.to_datetime(date_str,format='%d-%m-%Y')

### New syntax alert! 
*lambda* defines a small anonymous function in a single expression, as an alternative to the _def_ keyword.

In [None]:
# Alternative, more common way: a regular function definition
def date_parser(date_str):
    return pd.to_datetime(date_str,format='%d-%m-%Y')
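One possible way to put this date parsing to work, sketched on made-up rows: convert the day-month-year strings into timestamps and use them as the index, so that plots get real dates on the x axis. The names `sample_csv` and `sample_df` are hypothetical.

```python
import io
import pandas as pd

# Made-up rows in the same day-month-year format.
sample_csv = io.StringIO(
    "data,confirmados\n"
    "01-03-2020,10\n"
    "02-03-2020,15\n"
    "03-03-2020,22\n"
)

sample_df = pd.read_csv(sample_csv)

# Convert the strings to real timestamps and use them as the index.
sample_df["data"] = pd.to_datetime(sample_df["data"], format="%d-%m-%Y")
sample_df = sample_df.set_index("data")

print(sample_df.index)
```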

In [None]:
df

### New syntax alert
[.plot](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html) -'Make plots of Series or DataFrame.'
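A small sketch of `.plot` on toy numbers (not the real data). The `Agg` backend line is only there so the snippet also runs outside a notebook; inside Jupyter it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy values indexed by date, invented for illustration.
dates = pd.date_range("2020-03-01", periods=5, freq="D")
sample_df = pd.DataFrame(
    {"confirmados": [10, 15, 22, 30, 41], "activos": [10, 14, 20, 27, 36]},
    index=dates,
)

# One line per selected column; the DatetimeIndex becomes the x axis.
ax = sample_df[["confirmados", "activos"]].plot(title="Toy example")
ax.set_xlabel("data")
ax.set_ylabel("cases")
```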


The columns have very different magnitudes, so to compare their trends more easily, let's normalize each one to the range [0, 1] (just divide every value by the maximum value of that column).

### New syntax alert! 

[.apply](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) -'Apply a function along an axis of the DataFrame.'
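A minimal sketch of max-normalization with `.apply`, on invented numbers. It assumes the values are non-negative, which is what makes the result fall in [0, 1].

```python
import pandas as pd

# Toy values, invented for illustration.
sample_df = pd.DataFrame(
    {"confirmados": [100, 200, 400], "obitos": [1, 2, 5]}
)

# apply calls the function once per column (axis=0 is the default),
# so each column is divided by its own maximum.
normalized = sample_df.apply(lambda column: column / column.max())

print(normalized)
```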

<img src="https://i.imgflip.com/5drmun.jpg" alt="drawing" style="width:250px;"/>


##### What if we just want a subset of the data? How do we select the data?

### New syntax alert
[.loc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html) -'Access a group of rows and columns by label(s) or a boolean array.'
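A short sketch of the two most common uses of `.loc`, on toy data: slicing by label (here, date strings) and selecting rows with a boolean condition.

```python
import pandas as pd

# Toy daily values, invented for illustration.
dates = pd.date_range("2021-01-01", periods=6, freq="D")
sample_df = pd.DataFrame(
    {"confirmados_novos": [5, 8, 13, 21, 34, 55], "obitos": [0, 1, 1, 2, 3, 5]},
    index=dates,
)

# Label-based slice on a DatetimeIndex: both endpoints are included.
first_days = sample_df.loc["2021-01-02":"2021-01-04", ["confirmados_novos"]]

# Boolean mask: keep only the rows where the condition is True.
busy_days = sample_df.loc[sample_df["confirmados_novos"] > 20]

print(first_days)
print(busy_days)
```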


### New syntax alert
[.filter](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.filter.html) -'Subset the dataframe rows or columns according to the specified index labels.'
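A sketch of `.filter` for picking columns by name pattern. The column names below are toy stand-ins modelled on the regional naming used in this notebook.

```python
import pandas as pd

# Toy table, invented for illustration.
sample_df = pd.DataFrame(
    {
        "confirmados": [10, 20],
        "confirmados_arsnorte": [6, 12],
        "confirmados_arscentro": [4, 8],
        "obitos": [0, 1],
    }
)

# like= keeps the columns whose name contains the given substring.
regional = sample_df.filter(like="confirmados_ars")

# regex= keeps the columns whose name matches a regular expression.
totals = sample_df.filter(regex="^(confirmados|obitos)$")

print(list(regional.columns))
print(list(totals.columns))
```

Note that `.filter` looks only at labels, never at the values in the table.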


### New syntax alert
[.resample](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html) -'Resample time-series data.'

1. M - month (written `ME` from pandas 2.2 onwards)
2. W - week
3. and so on (e.g. D for day)
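A sketch of weekly resampling on two weeks of toy daily values. It assumes the DataFrame has a DatetimeIndex, which `.resample` requires.

```python
import pandas as pd

# Two weeks of toy daily values; 2021-01-04 is a Monday.
dates = pd.date_range("2021-01-04", periods=14, freq="D")
daily = pd.DataFrame({"confirmados_novos": list(range(1, 15))}, index=dates)

# "W" groups the rows into weeks ending on Sunday; each group is then aggregated.
weekly_total = daily.resample("W").sum()
weekly_mean = daily["confirmados_novos"].resample("W").mean()

print(weekly_total)
print(weekly_mean)
```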

### What was the worst month/week of _confirmados_novos_?
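One possible approach, sketched on invented numbers: sum the daily values per month and ask for the label of the largest total with `idxmax`. The alias `"MS"` (month start) is used here as an assumption to keep the snippet working across pandas versions.

```python
import pandas as pd

# Toy daily series: a constant number of new cases within each month.
dates = pd.date_range("2021-01-01", "2021-03-31", freq="D")
cases_by_month = {1: 100, 2: 300, 3: 50}
daily = pd.Series(
    [cases_by_month[d.month] for d in dates],
    index=dates,
    name="confirmados_novos",
)

# Total per month, labelled by the first day of the month.
monthly = daily.resample("MS").sum()

# idxmax returns the index label (here, a date) of the largest value.
worst_month = monthly.idxmax()

print(monthly)
print(worst_month)
```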


### How do new confirmed cases differ between regions?

### New syntax alert
[.diff](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.diff.html) -'Calculates the difference of a Dataframe element compared with another element in the Dataframe (default is element in previous row).'
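A sketch of how `.diff` turns cumulative counts into daily new cases, assuming (as an illustration) that the regional columns hold running totals. The numbers are invented.

```python
import pandas as pd

# Toy cumulative totals for two regions.
dates = pd.date_range("2021-01-01", periods=4, freq="D")
cumulative = pd.DataFrame(
    {
        "confirmados_arsnorte": [100, 130, 170, 175],
        "confirmados_arscentro": [40, 45, 60, 90],
    },
    index=dates,
)

# diff subtracts the previous row; the first row has no predecessor, so it is NaN.
new_cases = cumulative.diff()

# Region with the most new cases on each day (skipping the NaN first row).
leading_region = new_cases.dropna().idxmax(axis=1)

print(new_cases)
print(leading_region)
```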



### Exercises

1. How many people died while R(t) was higher than 1.05?
2. What's the date with the *highest* number of *new* confirmed cases in *arsnorte*?
3. Plot the daily percentage of confirmed cases by region.

## Let's study the impact of vaccines

We have already looked at Covid data, and you have already looked at vaccine data. Now we just have to _merge_ them!

In [None]:
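# Get the vaccine data
vacc_data = 'https://raw.githubusercontent.com/dssg-pt/covid19pt-data/master/vacinas.csv'

# parse_dates=True is needed for the index to actually be parsed as dates.
# pandas 2.0+ deprecates date_parser; date_format='%d-%m-%Y' is the replacement.
df_vac = pd.read_csv(vacc_data, index_col=0, parse_dates=True, date_parser=date_parser)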

### New syntax alert
[.merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) -'Merge DataFrame or named Series objects with a database-style join.'
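A sketch of merging two date-indexed tables, using toy data. The column name `doses` and the values are assumptions for illustration, not taken from the real files.

```python
import pandas as pd

# Toy Covid table: three days.
covid = pd.DataFrame(
    {"confirmados_novos": [500, 450, 400]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)

# Toy vaccine table: starts one day later.
vaccines = pd.DataFrame(
    {"doses": [1000, 1500]},
    index=pd.to_datetime(["2021-01-02", "2021-01-03"]),
)

# Inner join: keep only the dates present in both tables.
inner = covid.merge(vaccines, left_index=True, right_index=True, how="inner")

# Left join: keep every Covid date; missing vaccine values become NaN.
left = covid.merge(vaccines, left_index=True, right_index=True, how="left")

print(inner)
print(left)
```

The choice of `how` matters here: an inner join silently drops the days before vaccination data exists, while a left join keeps them with missing values.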


### Exercises


1. How does R(t) correlate with vaccinations?
2. Can we correlate the impact of vaccines with Covid cases?
3. Let's say that a vaccine takes 2 weeks after the second dose to have an impact. How can we measure that?