## A deeper look into Covid in Portugal

We want to analyse Covid data for Portugal and understand the evolution of confirmed and active cases, as well as deaths, vaccinations and much more. To do that, we are going to use the data from this [GitHub repository](https://github.com/dssg-pt/covid19pt-data), which collects _all_ of the Covid-related data for Portugal.


![Image of meme](https://static.thehoneycombers.com/wp-content/uploads/sites/4/2020/03/Best-funny-Coronavirus-memes-2020-Honeycombers-Bali-6.jpg)


Before asking whether you have any questions to ask the data, let's first take a quick glance at it.

### Preliminaries

We are going to use Python and one of its best-known libraries, [pandas](https://pandas.pydata.org/), to explore and analyse this Covid data.

1. Python is a general-purpose programming language, i.e. you can do pretty much anything with it (data science, games, web and mobile apps).
2. This is a Jupyter notebook (one of the most common ways to explore data).


### Initial Setup

Here we import libraries (code written by someone else) to speed up our exploration.

In [None]:
import pandas as pd  # pandas: tables and data analysis
import matplotlib.pyplot as plt  # helps us create plots

First things first, we need to import the data from the GitHub repository. It has several files, but for now we are only going to get **"data.csv"**.

In [None]:
data = 'https://raw.githubusercontent.com/dssg-pt/covid19pt-data/master/data.csv'
df = pd.read_csv(data)
df

Now that we have the data, let's explore it.

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.columns

In [None]:
# What if we select a single column?
# df["..."]

What about doing a quick visualization of new confirmed cases ('confirmados_novos') and active cases ('ativos')?

Hmm, idk that looks difficult.

In [None]:
df["confirmados_novos"].plot()

In [None]:
df["ativos"].plot()

The x axis should show dates, right? What's going on? Let's look at the data again.

In [None]:
df.head()

The x axis should be 'data' or 'data_dados', right? However, in the plot the x values are just numbers starting at 0 and going over 400. Can you figure out why?

In [None]:
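# Turn day-month-year strings such as "26-02-2020" into real dates.
date_parser = lambda date_str: pd.to_datetime(date_str, format='%d-%m-%Y')

# Read the file again with the first column ('data') as the index,
# then convert that index from text to dates.
df = pd.read_csv(data, index_col=0)
df.index = date_parser(df.index)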
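
The numeric x axis comes from the row labels: when no index is given, pandas numbers the rows 0, 1, 2, ... and `.plot()` uses those labels. Here is a minimal sketch of the idea on a tiny made-up table (the values are invented and only mimic the layout of data.csv):

```python
import pandas as pd

# Made-up rows that only mimic the layout of data.csv; the numbers are not real.
toy = pd.DataFrame({
    "data": ["01-03-2020", "02-03-2020", "03-03-2020"],
    "confirmados_novos": [2, 5, 9],
})

# Without an explicit index, pandas labels the rows 0, 1, 2, ...
# and .plot() puts those labels on the x axis.
index_before = type(toy.index).__name__

# Moving the 'data' column into the index and parsing it as day-month-year
# gives pandas real dates to plot against.
toy = toy.set_index("data")
toy.index = pd.to_datetime(toy.index, format="%d-%m-%Y")
index_after = type(toy.index).__name__

print(index_before, "->", index_after)
```

Calling `toy["confirmados_novos"].plot()` after the conversion should then show dates on the x axis.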


In [None]:
df["confirmados"].plot()

In [None]:
df[['confirmados_novos','ativos']].plot()

You should have noticed that the 'confirmados' column is cumulative. Let's take a look at 'confirmados_novos'.
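
To see what "cumulative" means, here is a small sketch with made-up numbers. It assumes the cumulative column is simply the running total of the daily one; the real data may include corrections that break this exact relationship.

```python
import pandas as pd

# Made-up daily counts, used only to illustrate the idea.
novos = pd.Series(
    [2, 5, 9, 4],
    index=pd.date_range("2020-03-01", periods=4, freq="D"),
    name="confirmados_novos",
)

# A cumulative column is the running total of the daily column...
confirmados = novos.cumsum().rename("confirmados")

# ...and differencing the running total recovers the daily values.
# The first difference is NaN, so it is filled with the first total.
recovered = confirmados.diff().fillna(confirmados.iloc[0]).astype(int)

print(confirmados.tolist())
print(recovered.tolist())
```

A cumulative line can only go up or stay flat, which is why the daily column is usually the more readable one to plot.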

In [None]:
df[["confirmados_novos","ativos"]].plot()

In [None]:
# Simple example with a mask: answer a simple and a medium question.
# What happened on date X? On which dates did Y happen?

# What was the worst week? -> resample
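
One possible way to fill in the mask idea, sketched on made-up numbers (the threshold of 100 and all values are assumptions chosen for illustration):

```python
import pandas as pd

# Made-up numbers on a daily date index, mimicking the shape of df.
toy = pd.DataFrame(
    {"confirmados_novos": [10, 250, 40, 300, 5]},
    index=pd.date_range("2020-11-01", periods=5, freq="D"),
)

# Simple question: what happened on a given date?
on_day = toy.loc[pd.Timestamp("2020-11-02"), "confirmados_novos"]

# Medium question: on which dates were there more than 100 new cases?
# The comparison builds a True/False Series (the mask); indexing with it
# keeps only the rows where the mask is True.
mask = toy["confirmados_novos"] > 100
busy_days = toy[mask].index

print(on_day)
print(list(busy_days.strftime("%d-%m-%Y")))
```

The same pattern should carry over to the real `df`, swapping in the column and condition you care about.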

In [None]:
# Exercise time: get the vaccine data.

In [None]:
## Advanced Python from here

In [None]:
#For future reference
df.loc['June 2020':].filter(regex="obitos_[a-z][a-z]+").plot()

What was the worst month/week?

Which region had the best month (fewest Covid cases)?
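
Both questions can be approached with `resample`, which groups a date-indexed table into calendar bins. The sketch below uses made-up values, and the regional column names `regiao_a` and `regiao_b` are hypothetical placeholders, not columns from the real file.

```python
import pandas as pd

# Made-up daily values over two months; the regional columns are hypothetical.
idx = pd.date_range("2021-01-01", "2021-02-28", freq="D")
toy = pd.DataFrame(
    {
        "confirmados_novos": range(len(idx)),
        "regiao_a": 10,
        "regiao_b": [5] * 31 + [20] * 28,
    },
    index=idx,
)

# "W" bins the rows into weeks (labelled by the Sunday that ends each week),
# "MS" bins them into months (labelled by the first day of the month).
weekly = toy["confirmados_novos"].resample("W").sum()
monthly = toy["confirmados_novos"].resample("MS").sum()
worst_week = weekly.idxmax()
worst_month = monthly.idxmax()

# For the regions: total per month and region, then find the smallest total.
regional = toy[["regiao_a", "regiao_b"]].resample("MS").sum()
best_region = regional.min().idxmin()        # region holding the smallest monthly total
best_month = regional[best_region].idxmin()  # month in which that total occurs

print(worst_week.date(), worst_month.date(), best_region, best_month.date())
```

Whether "worst" should mean the highest total, the highest average or the highest single day is a choice you have to make before trusting the answer.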

In [None]:
## Resample
## Filter
## Merge the vaccination data with the Covid data

In [None]:
data = 'https://raw.githubusercontent.com/dssg-pt/covid19pt-data/master/vacinas.csv'

# Use the first column as the index and parse it as dates.
df_vac = pd.read_csv(data, index_col=0)
df_vac.index = date_parser(df_vac.index)


In [None]:
df

In [None]:
# Normalize to view percentages
df.merge(df_vac, right_index=True, left_index=True)[["confirmados_novos", "pessoas_vacinadas_completamente_novas"]].plot()
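
The two columns live on very different scales, so one line can flatten the other in the plot. One possible reading of "normalize" is to express each column as a percentage of its own maximum; this is an assumption about the intent, sketched here on made-up numbers:

```python
import pandas as pd

# Made-up values; the vaccination table starts one day later on purpose.
idx = pd.date_range("2021-03-01", periods=4, freq="D")
cases = pd.DataFrame({"confirmados_novos": [800, 600, 400, 200]}, index=idx)
vacc = pd.DataFrame(
    {"pessoas_vacinadas_completamente_novas": [5000, 20000, 40000]},
    index=idx[1:],
)

# merge on the index is an inner join by default:
# only dates present in both tables survive.
merged = cases.merge(vacc, left_index=True, right_index=True)

# Divide each column by its own maximum so both share a 0-100 scale.
pct = merged / merged.max() * 100

print(pct.round(1))
```

Calling `pct.plot()` then draws both lines on a comparable scale. Note that this only changes how the lines look; it says nothing about whether one series influences the other.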

Do not jump to conclusions. Think through what you are doing and what you are trying to prove.

![MEME](https://i.imgur.com/uCbnAaf.jpeg)

### Exercises

1. How many people died in total while r(t) was higher than 1.05?
2. What's the date with the *highest* number of *new* confirmed cases in *arsnorte*?
3. Plot the daily percentage of confirmed cases by region (hard).