<p align="center">
  <img src="https://pandas.pydata.org/static/img/pandas_secondary.svg" width="450">
</p>

# Overview

 - A library for data manipulation and analysis;
 - Provides a set of functions for working with tabular data (2D) and time series (1D);
 - Used in finance, statistics, social sciences, and many areas of engineering;
 - An alternative to the **R** language;
 
# Advantages
 
 - Easy handling of missing data;
 - Columns can be easily added or removed;
 - Type conversion;
 - Data visualization;
 - Fast (built on top of NumPy);
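
The points above can be sketched on a tiny, made-up table. The column names and values below are hypothetical and exist only for illustration; the calls used (`isna`, `fillna`, `astype`, `drop`) are standard pandas methods.

```python
import pandas as pd

# Hypothetical data, used only for illustration
df = pd.DataFrame({
    "name": ["Ana", "Bruno", "Carla"],
    "age": ["31", "27", "45"],       # stored as strings on purpose
    "height": [1.62, None, 1.75],    # one missing value
})

# Missing data: count the gaps, then fill them with the column mean
n_missing = df["height"].isna().sum()
df["height"] = df["height"].fillna(df["height"].mean())

# Type conversion: strings -> integers
df["age"] = df["age"].astype(int)

# Adding and removing columns
df["age_in_months"] = df["age"] * 12
df = df.drop(columns=["name"])

print(n_missing)
print(df)
```

Filling with the mean is just one possible strategy; dropping the incomplete rows is another common choice.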

 
# Data Types
## Series
 
 - A list of labeled values of a single type;
 
 <img src="https://pandas.pydata.org/docs/_images/01_table_series.svg">
 <p style="text-align:center;">
    <small>
        Source: pandas documentation [1]
    </small>
 </p>

In [22]:
import pandas as pd
batimento_cardiaco = [82, 82, 84, 96, 95, 86, 84, 88, 90, 95, 102]
serie = pd.Series(batimento_cardiaco)
print(serie)

0      82
1      82
2      84
3      96
4      95
5      86
6      84
7      88
8      90
9      95
10    102
dtype: int64
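
The labels of a `Series` do not have to be the default integers shown above. As a minimal sketch, assuming hypothetical labels (`min_1`, `min_2`, `min_3`) for three readings:

```python
import pandas as pd

# Hypothetical labels: one heart-rate reading per minute
heart_rate = pd.Series([82, 84, 96], index=["min_1", "min_2", "min_3"])

first = heart_rate["min_1"]   # access a value by its label
highest = heart_rate.max()    # aggregate over all values

print(heart_rate)
print(heart_rate.dtype)       # a Series has a single dtype for all values
```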


 ## DataFrame
  - A 2D matrix of labeled values;
  - Columns may hold different types;
  - Mutable size;
  - Similar to a spreadsheet (Excel) or a SQL table;
  - Each column of a `DataFrame` is a `Series`;
  
<img src="https://pandas.pydata.org/docs/_images/01_table_dataframe1.svg" />
<p style="text-align:center;">    
    <small>Source: pandas documentation [1]</small>
</p>
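
The properties listed for `DataFrame` can be checked on a small example. This is only a sketch; the table contents are hypothetical.

```python
import pandas as pd

# Hypothetical table with columns of different types
people = pd.DataFrame({
    "name": ["Ana", "Bruno"],
    "age": [31, 27],
    "height": [1.62, 1.80],
})

# Each column is a Series
age_column = people["age"]
print(type(age_column))

# Each column keeps its own dtype
print(people.dtypes)

# Mutable size: assigning to a new row label appends a row
people.loc[len(people)] = ["Carla", 45, 1.75]
print(people.shape)
```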

# Reading a dataset
![](https://pandas.pydata.org/docs/_images/02_io_readwrite1.svg)
<p style="text-align:center;">
    <small>Source: pandas documentation [1]</small>
</p>

 #### Reading data from a CSV file and showing its first 4 rows

In [39]:
import pandas as pd
dataset = pd.read_csv("../datasets/country_vaccinations.csv")
dataset.head(4)

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
0,Albania,ALB,2021-01-10,0.0,0.0,,,,0.0,0.0,,,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
1,Albania,ALB,2021-01-11,,,,,64.0,,,,22.0,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
2,Albania,ALB,2021-01-12,128.0,128.0,,,64.0,0.0,0.0,,22.0,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
3,Albania,ALB,2021-01-13,188.0,188.0,,60.0,63.0,0.01,0.01,,22.0,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
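
To try `read_csv` without the dataset file, the same call can be pointed at an in-memory buffer. The CSV text below is hypothetical and much smaller than the real file; note how an empty field is read as `NaN`.

```python
import io
import pandas as pd

# Hypothetical CSV content; in the notebook this comes from a file on disk
csv_text = """country,date,daily_vaccinations
A,2021-01-10,
A,2021-01-11,50
B,2021-01-11,70
"""

sample = pd.read_csv(io.StringIO(csv_text))

first_rows = sample.head(2)   # first 2 rows only
print(first_rows)
print(sample["daily_vaccinations"].isna().sum())  # empty field became NaN
```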


### Showing information about the dataset

In [46]:
dataset.shape

(4380, 15)

In [40]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4380 entries, 0 to 4379
Data columns (total 15 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   country                              4380 non-null   object 
 1   iso_code                             4080 non-null   object 
 2   date                                 4380 non-null   object 
 3   total_vaccinations                   2866 non-null   float64
 4   people_vaccinated                    2438 non-null   float64
 5   people_fully_vaccinated              1626 non-null   float64
 6   daily_vaccinations_raw               2421 non-null   float64
 7   daily_vaccinations                   4226 non-null   float64
 8   total_vaccinations_per_hundred       2866 non-null   float64
 9   people_vaccinated_per_hundred        2438 non-null   float64
 10  people_fully_vaccinated_per_hundred  1626 non-null   float64
 11  daily_vaccinations_per_million

In [45]:
dataset.describe()

Unnamed: 0,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million
count,2866.0,2438.0,1626.0,2421.0,4226.0,2866.0,2438.0,1626.0,4226.0
mean,1671404.0,1449764.0,473486.6,74241.04,54879.89,6.953692,5.667063,2.216052,2393.753195
std,5640491.0,4556002.0,1828830.0,207285.0,173458.1,12.932168,9.320872,5.404122,4380.248455
min,0.0,0.0,1.0,-50012.0,1.0,0.0,0.0,0.0,0.0
25%,30892.0,27732.5,10543.5,2224.0,1121.0,0.6025,0.5825,0.2025,321.0
50%,200937.5,179772.5,49799.5,11773.0,5745.0,2.69,2.52,0.825,1056.5
75%,836103.8,714345.5,249891.8,53089.0,26678.5,6.57,5.14,1.8775,2162.5
max,70454060.0,47184200.0,22613360.0,2242472.0,1916190.0,102.7,64.09,38.61,54264.0
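
The same three inspections can be reproduced on a toy table, which makes it easier to see where each number comes from. The data is hypothetical; `count()` is used here as a compact stand-in for the non-null counts that `info()` prints.

```python
import pandas as pd

# Hypothetical table with one missing value
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "doses": [10.0, None, 30.0, 50.0],
})

rows, cols = toy.shape      # (number of rows, number of columns)
non_null = toy.count()      # non-null values per column
summary = toy.describe()    # statistics for numeric columns only

print(rows, cols)
print(non_null)
print(summary)
```

Missing values are ignored by `describe()`, which is why the `count` row in the output above differs from column to column.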


<p>
    <h1 style="text-align:center;">
        Accessing columns
    </h1>
</p>

![](https://pandas.pydata.org/docs/_images/03_subset_columns.svg)

<p style="text-align:center;">
    Source: pandas documentation [1]
</p>

In [73]:
dataset["country"] # Selecting a single column returns a Series

0        Albania
1        Albania
2        Albania
3        Albania
4        Albania
          ...   
4375    Zimbabwe
4376    Zimbabwe
4377    Zimbabwe
4378    Zimbabwe
4379    Zimbabwe
Name: country, Length: 4380, dtype: object

In [48]:
dataset["country"].shape

(4380,)

In [74]:
dataset[["country", "date", "daily_vaccinations_raw"]] # Selecting several columns returns a DataFrame

Unnamed: 0,country,date,daily_vaccinations_raw
0,Albania,2021-01-10,
1,Albania,2021-01-11,
2,Albania,2021-01-12,
3,Albania,2021-01-13,60.0
4,Albania,2021-01-14,78.0
...,...,...,...
4375,Zimbabwe,2021-02-22,
4376,Zimbabwe,2021-02-23,2727.0
4377,Zimbabwe,2021-02-24,3831.0
4378,Zimbabwe,2021-02-25,3135.0
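
The difference between single and double brackets can be seen in the shape of the result. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical table
toy = pd.DataFrame({
    "country": ["A", "B"],
    "date": ["2021-01-10", "2021-01-11"],
    "doses": [10, 20],
})

one_column = toy["country"]               # Series, 1D
as_frame = toy[["country"]]               # DataFrame with one column, 2D
two_columns = toy[["country", "doses"]]   # DataFrame with two columns

print(one_column.shape, as_frame.shape, two_columns.shape)
```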


<p>
    <h1 style="text-align:center;">Filtering rows of the dataset</h1>
</p>

 ![](https://pandas.pydata.org/docs/_images/03_subset_rows.svg)
 
 <p style="text-align:center;">
    Source: pandas documentation [1]
</p>

In [55]:
imunization_brazil = dataset[dataset["country"] == "Brazil"] # to select rows, put a condition inside the []
imunization_brazil[["country", "total_vaccinations", "date", "source_website"]]

Unnamed: 0,country,total_vaccinations,date,source_website
532,Brazil,0.0,2021-01-16,https://coronavirusbra1.github.io/
533,Brazil,112.0,2021-01-17,https://coronavirusbra1.github.io/
534,Brazil,1109.0,2021-01-18,https://coronavirusbra1.github.io/
535,Brazil,11470.0,2021-01-19,https://coronavirusbra1.github.io/
536,Brazil,28543.0,2021-01-20,https://coronavirusbra1.github.io/
537,Brazil,136519.0,2021-01-21,https://coronavirusbra1.github.io/
538,Brazil,245877.0,2021-01-22,https://coronavirusbra1.github.io/
539,Brazil,537774.0,2021-01-23,https://coronavirusbra1.github.io/
540,Brazil,604722.0,2021-01-24,https://coronavirusbra1.github.io/
541,Brazil,700608.0,2021-01-25,https://coronavirusbra1.github.io/


In [91]:
thursday = "2021-02-25"
friday = "2021-02-26"
dataset[(dataset["date"] == friday) | (dataset["date"] == thursday)]

In [132]:
imunization_br_arg = dataset[dataset["country"].isin(["Brazil", "Argentina"])]

#imunization_br_arg = dataset[
#    (dataset["country"] == "Brazil") | 
#    (dataset["country"] == "Argentina")]

imunization_br_arg

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
111,Argentina,ARG,2020-12-29,700.0,,,,,0.00,,,,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
112,Argentina,ARG,2020-12-30,,,,,15656.0,,,,346.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
113,Argentina,ARG,2020-12-31,32013.0,,,,15656.0,0.07,,,346.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
114,Argentina,ARG,2021-01-01,,,,,11070.0,,,,245.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
115,Argentina,ARG,2021-01-02,,,,,8776.0,,,,194.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
569,Brazil,BRA,2021-02-22,7028356.0,5857080.0,1171276.0,77554.0,247768.0,3.31,2.76,0.55,1166.0,"Oxford/AstraZeneca, Sinovac",Regional governments via Coronavirus Brasil,https://coronavirusbra1.github.io/
570,Brazil,BRA,2021-02-23,7297061.0,6002873.0,1294188.0,268705.0,241018.0,3.43,2.82,0.61,1134.0,"Oxford/AstraZeneca, Sinovac",Regional governments via Coronavirus Brasil,https://coronavirusbra1.github.io/
571,Brazil,BRA,2021-02-24,7551676.0,6116082.0,1435594.0,254615.0,238305.0,3.55,2.88,0.68,1121.0,"Oxford/AstraZeneca, Sinovac",Regional governments via Coronavirus Brasil,https://coronavirusbra1.github.io/
572,Brazil,BRA,2021-02-25,7799000.0,6202055.0,1596945.0,247324.0,227474.0,3.67,2.92,0.75,1070.0,"Oxford/AstraZeneca, Sinovac",Regional governments via Coronavirus Brasil,https://coronavirusbra1.github.io/
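
The filters above all rely on a boolean `Series` used as a mask. The sketch below, on hypothetical data, shows the mask itself and checks that the `|` form and the `isin` form select the same rows.

```python
import pandas as pd

# Hypothetical table
toy = pd.DataFrame({
    "country": ["Argentina", "Brazil", "Chile", "Brazil"],
    "doses": [5, 10, 7, 12],
})

mask = toy["country"] == "Brazil"   # boolean Series, one value per row
only_brazil = toy[mask]             # keeps the rows where the mask is True

# Two equivalent ways of matching several values
with_or = toy[(toy["country"] == "Brazil") | (toy["country"] == "Argentina")]
with_isin = toy[toy["country"].isin(["Brazil", "Argentina"])]

# Conditions can also be combined with & (and); the parentheses are required
combined = toy[(toy["country"] == "Brazil") & (toy["doses"] > 10)]

print(mask.tolist())
print(with_or.equals(with_isin))
```

Filtered rows keep their original index labels, which is why the outputs above start at 532 or 111 rather than 0.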


In [139]:
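imunization_arg = dataset[dataset["country"] == "Argentina"]
# notna() filtering returns a new DataFrame; the result is not assigned here,
# so imunization_arg (displayed below) still contains the rows with missing values
imunization_arg[imunization_arg["total_vaccinations"].notna()]
imunization_arg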

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
111,Argentina,ARG,2020-12-29,700.0,,,,,0.0,,,,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
112,Argentina,ARG,2020-12-30,,,,,15656.0,,,,346.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
113,Argentina,ARG,2020-12-31,32013.0,,,,15656.0,0.07,,,346.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
114,Argentina,ARG,2021-01-01,,,,,11070.0,,,,245.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
115,Argentina,ARG,2021-01-02,,,,,8776.0,,,,194.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
116,Argentina,ARG,2021-01-03,,,,,7400.0,,,,164.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
117,Argentina,ARG,2021-01-04,39599.0,,,,6483.0,0.09,,,143.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
118,Argentina,ARG,2021-01-05,,,,,7984.0,,,,177.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
119,Argentina,ARG,2021-01-06,,,,,8173.0,,,,181.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
120,Argentina,ARG,2021-01-07,,,,,8363.0,,,,185.0,Sputnik V,Ministry of Health,http://datos.salud.gob.ar/dataset/vacunas-cont...
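
To actually keep only the rows with a reported value, the filtered result has to be assigned to a name. A minimal sketch on hypothetical data (the `total` column and its values are made up):

```python
import pandas as pd

# Hypothetical table with one missing value
toy = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "total": [100.0, None, 250.0],
})

# Filtering returns a new DataFrame; the original is left unchanged
reported = toy[toy["total"].notna()]

# dropna restricted to one column gives the same rows
also_reported = toy.dropna(subset=["total"])

print(len(toy), len(reported))
```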


<p>
    <h1 style="text-align:center;">Filtering rows and columns of the dataset</h1>
</p>

 ![](https://pandas.pydata.org/docs/_images/03_subset_columns_rows1.svg)
 
 <p style="text-align:center;">
    Source: pandas documentation [1]
</p>

In [149]:
data = imunization_br_arg.loc[
    imunization_br_arg["date"] == friday,
    ["country", "date", "people_fully_vaccinated"]]
data

Unnamed: 0,country,date,people_fully_vaccinated
170,Argentina,2021-02-26,283280.0
573,Brazil,2021-02-26,1755018.0


In [164]:
data.iloc[1]

country                         Brazil
date                        2021-02-26
people_fully_vaccinated    1.75502e+06
Name: 573, dtype: object
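
`loc` selects by label and `iloc` by position, which matters once a filter has left non-consecutive index labels such as 170 and 573. A sketch with hypothetical values that reuses those two labels:

```python
import pandas as pd

# Hypothetical table whose index labels are left over from an earlier filter
toy = pd.DataFrame(
    {"country": ["Argentina", "Brazil"], "fully_vaccinated": [100.0, 200.0]},
    index=[170, 573],
)

by_label = toy.loc[573, "country"]   # row labeled 573, column "country"
by_position = toy.iloc[1, 0]         # second row, first column

# loc also accepts a boolean condition for rows and a list of column names
subset = toy.loc[toy["country"] == "Brazil", ["fully_vaccinated"]]

print(by_label, by_position)
print(subset)
```

Here `toy.iloc[573]` would raise an `IndexError`, since the table has only two rows.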

## References

[[1]](https://pandas.pydata.org/docs) pandas documentation

[[2]](https://www.kaggle.com/gpreda/covid-world-vaccination-progress) Dataset on worldwide COVID-19 vaccination progress