<p align="center">
  <img src="https://pandas.pydata.org/static/img/pandas_secondary.svg" width="450">
</p>

# Overview

 - A library for data manipulation and analysis;
 - Provides a set of functions for working with tabular data (2D) and time series (1D);
 - Used in finance, statistics, the social sciences, and many areas of engineering;
 - An alternative to the **R** language;
 
# Advantages
 
 - Makes it easy to handle missing values;
 - Columns can be easily added or removed;
 - Type conversion;
 - Data visualization;
 - Fast for in-memory data, since it builds on NumPy arrays;

 
# Data Types
## Series
 
 - A list of labeled values that all share a single type;
 - Has only one dimension;
 
 <img src="https://pandas.pydata.org/docs/_images/01_table_series.svg">
 <p style="text-align:center;">
    <small>
        Source: Pandas documentation [1]
    </small>
 </p>

In [None]:
import pandas as pd
batimento_cardiaco = pd.Series([82, 82, 84, 96, 95, 86, 84, 88, 90, 95, 102])
batimento_cardiaco
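
The labels of a `Series` do not have to be the default integers. The sketch below uses made-up labels and readings (the name `heart_rate` and the values are illustrative, not part of the dataset) to show label-based access and the single shared type.

```python
import pandas as pd

# Hypothetical example: one heart-rate reading per time of day.
heart_rate = pd.Series([82, 84, 96], index=["morning", "afternoon", "evening"])

print(heart_rate["afternoon"])  # access a value by its label
print(heart_rate.dtype)         # every value shares one dtype
print(heart_rate.ndim)          # a Series has a single dimension
```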

## DataFrame
  - A 2D matrix of labeled values;
  - Columns can have different types;
  - Mutable size;
  - Similar to a spreadsheet (Excel) or an SQL table;
  - Each column of a `DataFrame` is a `Series`;
  
<img src="https://pandas.pydata.org/docs/_images/01_table_dataframe1.svg" />
<p style="text-align:center;">    
    <small>
        <b>Source</b>:
        Pandas documentation [1]
    </small>
</p>
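
A minimal sketch of the points above, using a small hand-made table (the `patients` frame and its values are invented for illustration): each column is a `Series`, columns may have different types, and columns can be added or dropped.

```python
import pandas as pd

# Hypothetical data: a text column and a numeric column.
patients = pd.DataFrame({
    "name": ["Ana", "Bruno", "Carla"],
    "age": [34, 51, 28],
})

# Each column of a DataFrame is a Series.
print(type(patients["age"]))

# The size is mutable: add one column, then drop another.
patients["heart_rate"] = [82, 96, 88]
patients = patients.drop(columns=["age"])
print(patients.columns.tolist())
```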

# Reading a dataset
![](https://pandas.pydata.org/docs/_images/02_io_readwrite1.svg)
<p style="text-align:center;">
    <small>
        <b>Source</b>:
        Pandas documentation [1]
    </small>
</p>

 #### Reading data from a CSV file

In [2]:
import pandas as pd
dataset = pd.read_csv("../datasets/country_vaccinations.csv")
dataset.head(4)

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
0,Albania,ALB,2021-01-10,0.0,0.0,,,,0.0,0.0,,,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
1,Albania,ALB,2021-01-11,,,,,64.0,,,,22.0,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
2,Albania,ALB,2021-01-12,128.0,128.0,,,64.0,0.0,0.0,,22.0,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
3,Albania,ALB,2021-01-13,188.0,188.0,,60.0,63.0,0.01,0.01,,22.0,Pfizer/BioNTech,Ministry of Health,https://shendetesia.gov.al/covid19-ministria-e...
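
The `Unnamed: 0` column in the output above is what `read_csv` produces when the first column of the file has no header, which usually means a saved index. The sketch below reproduces this with a tiny in-memory CSV (the text in `csv_text` is an illustrative stand-in, not the real file) and shows `index_col=0` as one possible way to read that column as the index instead.

```python
import io

import pandas as pd

# Stand-in for a file whose first column has an empty header.
csv_text = """,country,date,total_vaccinations
0,Albania,2021-01-12,128.0
1,Albania,2021-01-13,188.0
"""

plain = pd.read_csv(io.StringIO(csv_text))
print(plain.columns.tolist())    # first column is named "Unnamed: 0"

indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(indexed.columns.tolist())  # the unnamed column became the index

# Writing back without the index avoids the extra column next time.
round_trip = indexed.to_csv(index=False)
print(round_trip)
```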


### Showing information about the dataset

In [None]:
dataset.shape

In [None]:
dataset.info()

In [None]:
dataset.describe()
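
As a rough guide to what these three calls report, here is a sketch on a small hand-made frame (the `sample` frame is illustrative): `shape` is a `(rows, columns)` tuple, `info()` prints types and non-null counts, and `describe()` summarizes only the numeric columns by default, ignoring missing values.

```python
import numpy as np
import pandas as pd

# Illustrative frame with one text column and one numeric column with a gap.
sample = pd.DataFrame({
    "country": ["Albania", "Albania", "Albania"],
    "total_vaccinations": [0.0, np.nan, 128.0],
})

print(sample.shape)    # (number of rows, number of columns)
sample.info()          # column dtypes and non-null counts
summary = sample.describe()
print(summary)         # count, mean, std, min, quartiles, max
```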


# Accessing columns

![](https://pandas.pydata.org/docs/_images/03_subset_columns.svg)

<p style="text-align:center;">
    Source: Pandas documentation [1]
</p>

In [None]:
dataset["country"] # Select a single column

In [None]:
dataset["country"].shape

In [None]:
dataset[["country", "date", "daily_vaccinations_raw"]] # Select several columns
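
The difference between single and double brackets is easy to miss, so here is a small sketch on an illustrative frame: a single label returns a `Series`, while a list of labels returns a `DataFrame`, even when the list has only one element.

```python
import pandas as pd

# Illustrative frame with two of the dataset's column names.
sample = pd.DataFrame({
    "country": ["Brazil", "Argentina"],
    "date": ["2021-02-26", "2021-02-26"],
})

one = sample["country"]     # single label: a Series
two = sample[["country"]]   # list of labels: a DataFrame

print(type(one), one.shape)
print(type(two), two.shape)
```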

# Filtering rows of the dataset

![](https://pandas.pydata.org/docs/_images/03_subset_rows.svg)
 
 <p style="text-align:center;">
    <b>Source</b>: Pandas documentation [1]
</p>

In [None]:
# Select the vaccination progress in Brazil.

imunization_in_brazil = dataset[dataset["country"] == "Brazil"]
imunization_in_brazil[["country", "total_vaccinations", "date", "source_website"]]

In [None]:
# Select the records for last Thursday and last Friday.

last_thursday = "2021-02-25"
last_friday = "2021-02-26"
imunization_in_brazil[
    (imunization_in_brazil["date"] == last_friday) |
    (imunization_in_brazil["date"] == last_thursday)]

In [1]:
# Same filter written with isin; run the cells above first so the variables exist.
imunization_in_brazil[imunization_in_brazil["date"].isin([last_friday, last_thursday])]
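
Row filtering works by building a boolean mask with one `True`/`False` per row. The sketch below, on an illustrative frame, combines masks with `&` and `|` (each comparison needs its own parentheses) and checks that the `isin` form selects the same rows.

```python
import pandas as pd

# Illustrative frame; the dates match the ones used above.
sample = pd.DataFrame({
    "country": ["Brazil", "Brazil", "Brazil", "Argentina"],
    "date": ["2021-02-24", "2021-02-25", "2021-02-26", "2021-02-26"],
})

is_brazil = sample["country"] == "Brazil"   # boolean Series, one entry per row
dates = ["2021-02-25", "2021-02-26"]

with_or = sample[is_brazil & ((sample["date"] == dates[0]) | (sample["date"] == dates[1]))]
with_isin = sample[is_brazil & sample["date"].isin(dates)]

print(with_or.equals(with_isin))
```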

# Filtering rows and columns of the dataset


 ![](https://pandas.pydata.org/docs/_images/03_subset_columns_rows1.svg)
 
 <p style="text-align:center;">
    <b>Source</b>: Pandas documentation [1]
</p>

### Filtering by labels

In [None]:
# Select the total number of people vaccinated
# last Friday in Brazil and Argentina.

imunization_br_arg = dataset[dataset["country"].isin(["Brazil", "Argentina"])]
imunization_br_arg.head(2)

In [None]:
br_arg_last_friday = imunization_br_arg.loc[
    imunization_br_arg["date"] == last_friday,
    "country":"people_vaccinated"
]

br_arg_last_friday

In [None]:
br_arg_last_friday.loc[:, ["country", "date", "total_vaccinations"]]
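
One detail of `loc` worth seeing in isolation: a label slice such as `"country":"date"` includes both ends. A minimal sketch on an illustrative frame (the values are made up):

```python
import pandas as pd

# Illustrative frame with made-up totals.
sample = pd.DataFrame({
    "country": ["Brazil", "Argentina", "Brazil"],
    "iso_code": ["BRA", "ARG", "BRA"],
    "date": ["2021-02-26", "2021-02-26", "2021-02-25"],
    "total_vaccinations": [8.0, 1.0, 7.0],
})

# Rows: boolean mask. Columns: label slice, end label included.
subset = sample.loc[sample["date"] == "2021-02-26", "country":"date"]
print(subset)
```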

### Filtering by index positions

In [None]:
# Select the total number of people vaccinated
# last Friday in Brazil and Argentina.

imunization_br_arg = dataset[dataset["country"].isin(["Brazil", "Argentina"])]

br_arg_last_friday = imunization_br_arg[
    imunization_br_arg["date"] == last_friday]

br_arg_last_friday.iloc[:1, 0:5]

In [None]:
br_arg_last_friday
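
`iloc` selects by integer position and, like Python slices, excludes the end of a range, whereas `loc` selects by label and includes it. The sketch below uses an illustrative frame with a non-default index (`10`, `20`) so the two are easy to tell apart.

```python
import pandas as pd

# Illustrative frame; the index labels 10 and 20 are arbitrary.
sample = pd.DataFrame(
    {
        "country": ["Brazil", "Argentina"],
        "iso_code": ["BRA", "ARG"],
        "date": ["2021-02-26", "2021-02-26"],
    },
    index=[10, 20],
)

by_position = sample.iloc[:1, 0:2]                  # first row, first two columns
by_label = sample.loc[:10, "country":"iso_code"]    # up to label 10, end included

print(by_position.equals(by_label))
```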


# Useful functions

### value_counts

Counts how many times each distinct value appears in a column.

In [None]:
amount_of_days = dataset["country"].value_counts()
# Assumes one row per country per day, so the row count approximates the number of days.
print("Immunization in Brazil started", amount_of_days["Brazil"], "days ago")
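
A small sketch of `value_counts` on an illustrative `Series`: the result is itself a `Series` indexed by the distinct values, sorted from most to least frequent, and `normalize=True` returns proportions instead of counts.

```python
import pandas as pd

# Illustrative data: four rows, two distinct countries.
countries = pd.Series(["Brazil", "Brazil", "Argentina", "Brazil"])

counts = countries.value_counts()
shares = countries.value_counts(normalize=True)

print(counts)
print(shares)
```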

### fillna (fill NaN)

Replaces null values with the specified value.

In [None]:
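# .copy() creates an independent DataFrame, so assigning to a column
# below does not trigger SettingWithCopyWarning.
people_fully_vaccinated_br = dataset.loc[
    dataset["country"] == "Brazil",
    ["country", "date", "people_fully_vaccinated"]].copy()

# Replace missing values (NaN) with 0.
filled_with_zeros = people_fully_vaccinated_br["people_fully_vaccinated"].fillna(0)

people_fully_vaccinated_br["people_fully_vaccinated"] = filled_with_zeros

people_fully_vaccinated_br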
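
`fillna` returns a new object and leaves the original untouched unless the result is assigned back. The sketch below uses made-up values. It also shows `ffill` as an alternative that carries the last known value forward, which may suit cumulative totals better than zeros; that choice is a suggestion, not something the dataset prescribes.

```python
import numpy as np
import pandas as pd

# Illustrative cumulative column with gaps.
fully = pd.DataFrame({
    "date": ["2021-02-24", "2021-02-25", "2021-02-26"],
    "people_fully_vaccinated": [np.nan, 10.0, np.nan],
})

filled = fully["people_fully_vaccinated"].fillna(0)   # NaN becomes 0
carried = fully["people_fully_vaccinated"].ffill()    # NaN takes the previous value

print(filled.tolist())
print(carried.tolist())
print(fully["people_fully_vaccinated"].isna().sum())  # original still has its gaps
```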


### nunique (number of unique)

Counts the number of distinct values in the specified column.

In [None]:
country_amount = dataset["country"].nunique()
print("Number of distinct countries in the dataset:", country_amount)
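
A short sketch on illustrative data: by default `nunique` ignores missing values, and `dropna=False` counts them as one more distinct value.

```python
import pandas as pd

# Illustrative data with a repeated value and a missing one.
countries = pd.Series(["Brazil", "Brazil", "Argentina", None])

print(countries.nunique())              # missing value not counted
print(countries.nunique(dropna=False))  # missing value counted once
```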

### Sort

Sorts the rows by the values of a column.

In [None]:
# Sort the countries by the rate of fully vaccinated people (%)

imun_last_friday = dataset[dataset["date"] == last_friday]

imun_last_friday = imun_last_friday.loc[
    imun_last_friday["people_fully_vaccinated_per_hundred"].notna(),
    ["country","date","people_fully_vaccinated_per_hundred"]]

imun_last_friday = imun_last_friday.sort_values(by="people_fully_vaccinated_per_hundred", ascending=False)

# List comprehension [3]: builds the list [1, 2, ..., n], one rank per sorted row
imun_last_friday["rank"] = [i for i in range(1, imun_last_friday.shape[0]+1)]

imun_last_friday
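
The ranking step can be tried on a tiny frame. The sketch below uses placeholder country names and made-up rates. It repeats the list-comprehension approach and compares it with `Series.rank`, shown only as one possible alternative.

```python
import pandas as pd

# Placeholder countries and made-up rates.
rates = pd.DataFrame({
    "country": ["A", "B", "C"],
    "people_fully_vaccinated_per_hundred": [1.5, 30.2, 4.0],
})

ordered = rates.sort_values(by="people_fully_vaccinated_per_hundred", ascending=False)

# One integer per row, starting at 1, in the sorted order.
ordered["rank"] = [i for i in range(1, ordered.shape[0] + 1)]

# Alternative: let pandas compute the rank from the values themselves.
ordered["rank_method"] = (
    ordered["people_fully_vaccinated_per_hundred"]
    .rank(ascending=False, method="first")
    .astype(int)
)

print(ordered)
```

Note that `sort_values` keeps the original index labels, so the index of `ordered` is no longer in ascending order.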

### Apply

Runs a function on each value of a column (`Series.apply`). On a `DataFrame`, `apply` passes each whole column to the function.

In [31]:
falkland = dataset.loc[dataset["country"] == "Falkland Islands"]

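def add_one(value):
    # Illustrative function: adds 1 to its argument (NaN + 1 is still NaN)
    return value + 1

# Uncomment to apply add_one to each of the two columns:
#falkland[["total_vaccinations", "people_vaccinated"]].apply(add_one)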
falkland

Unnamed: 0,country,iso_code,date,total_vaccinations,people_vaccinated,people_fully_vaccinated,daily_vaccinations_raw,daily_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,daily_vaccinations_per_million,vaccines,source_name,source_website
1404,Falkland Islands,FLK,2021-02-07,0.0,0.0,,,,0.0,0.0,,,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1405,Falkland Islands,FLK,2021-02-08,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1406,Falkland Islands,FLK,2021-02-09,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1407,Falkland Islands,FLK,2021-02-10,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1408,Falkland Islands,FLK,2021-02-11,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1409,Falkland Islands,FLK,2021-02-12,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1410,Falkland Islands,FLK,2021-02-13,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1411,Falkland Islands,FLK,2021-02-14,,,,,189.0,,,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
1412,Falkland Islands,FLK,2021-02-15,1515.0,1515.0,,,189.0,43.5,43.5,,54264.0,Oxford/AstraZeneca,Government of the Falkland Islands,https://www.facebook.com/FalkIandsGov/posts/42...
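
To make the two behaviors of `apply` concrete, here is a sketch on a small frame that mirrors the pattern of values above (`0.0`, missing, `1515.0`). The helper `nan_to_zero` is a hypothetical name introduced for this example.

```python
import numpy as np
import pandas as pd

# Small frame mirroring the pattern of values in the output above.
vacc = pd.DataFrame({
    "total_vaccinations": [0.0, np.nan, 1515.0],
    "people_vaccinated": [0.0, np.nan, 1515.0],
})

def nan_to_zero(value):
    # Hypothetical helper: replace a missing value with 0.0.
    return 0.0 if pd.isna(value) else value

# Series.apply: the function is called once per value.
per_value = vacc["total_vaccinations"].apply(nan_to_zero)

# DataFrame.apply: the function is called once per column (a Series).
per_column = vacc.apply(lambda column: column.max())

print(per_value.tolist())
print(per_column)
```

For simple replacements like this one, `fillna(0)` gives the same result and is usually faster than calling a Python function per value.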


# Manipulating strings

In [None]:
brazil = dataset[dataset["country"] == "Brazil"]
brazil["vaccines"].str.split(",")
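
The `.str` accessor applies string methods to every value of a column. The sketch below assumes the vaccine names are separated by a comma followed by a space; the sample strings are illustrative, so check the real column before relying on that. Splitting on `","` alone leaves a leading space in the later items.

```python
import pandas as pd

# Illustrative values; the ", " separator is an assumption about the data.
vaccines = pd.Series(["Oxford/AstraZeneca, Sinovac", "Pfizer/BioNTech"])

parts = vaccines.str.split(",")    # keeps the space: " Sinovac"
clean = vaccines.str.split(", ")   # splits on comma plus space

print(parts.iloc[0])
print(clean.iloc[0])
print(clean.str.len().tolist())    # number of vaccines per row
print(clean.explode().tolist())    # one vaccine per row
```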

# References

[[1]](https://pandas.pydata.org/docs) Pandas documentation

[[2]](https://www.kaggle.com/gpreda/covid-world-vaccination-progress) Covid-19 world vaccination progress dataset

[[3]](https://www.w3schools.com/python/python_lists_comprehension.asp) Python list comprehensions, W3Schools

[[4]](https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf) Pandas Cheat Sheet