# 2.2. DataFrames I - Pandas
---


## DataFrames

In pandas, tables are called `DataFrames`.

<img src="https://s3-sa-east-1.amazonaws.com/lcpi/bb07d871-b4c0-4fb0-b28b-9c308c7b1270.png" style="border-radius: 10px;" />

pandas integrates out of the box with many file formats and data sources (CSV, Excel, SQL, JSON, Parquet, ...). Data from each of these sources is imported by a function with the `read_*` prefix. Likewise, the `to_*` methods are used to store data.

<img src="https://s3-sa-east-1.amazonaws.com/lcpi/e31ee583-51f5-45fb-9b01-0fd20b6922ff.png" style="border-radius: 10px;" />
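
As a minimal sketch of the `read_*` / `to_*` pairing, the round trip below writes a small DataFrame to CSV text and reads it back. The frame `df_demo` and the in-memory `io.StringIO` buffer (used here instead of a file on disk) are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical toy data, only for illustration.
df_demo = pd.DataFrame({"Nome": ["Fulano", "Ciclano"], "Peso": [70, 56]})

# to_csv stores the data; index=False leaves the row labels out of the output.
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False)

# read_csv loads it back from the same text.
buffer.seek(0)
df_back = pd.read_csv(buffer)

print(df_back)
```

The same pattern applies to the other formats (`read_excel` / `to_excel`, `read_json` / `to_json`, and so on), although some of them need extra packages installed.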



### Content outline

- Creating DataFrames
- Accessing the data
- Creating new columns
- Loading and saving datasets
    - Getting to know [Kaggle](https://www.kaggle.com/)
    - CSV
    - XLSX
    - HTML
    - ...
- Main methods and attributes
    - `index`
    - `columns`
    - `values`
    - `shape`
    - `dtypes`
    - `head()`
    - `tail()`
    - `describe()`
    - `info()`
    - `min()`, `max()`, `sum()`...
    - `rename()`
    - `reset_index()` and `set_index()`
    - `duplicated()`
    - `drop_duplicates()`
- Deleting rows/columns
- Filtering data
- Grouping data
- Sorting data

### Creating DataFrames

<div style="display: block; margin: 2rem 0; background-color: #1E90FF33; border: 2px solid #1E90FF; border-radius: 7px; padding: 20px 20px 5px; width: 50%">
    <a style="text-align: center; display: block; width: 100%; font-size: 1.2rem" href="https://www.walissonsilva.com/posts/diferentes-formas-de-criar-um-dataframe">Different ways to create a DataFrame (article in Portuguese)</a>
</div>

In [1]:
import pandas as pd
import numpy as np

### 1. Two-dimensional array

In [88]:
dados = np.arange(1, 26, 1).reshape((5, 5))
dados

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [89]:
dados.shape[1]

5

In [90]:
df2 = pd.DataFrame(data=dados, index='A B C D E'.split(), columns=[f'Coluna {i + 1}' for i in range(dados.shape[1])])

In [91]:
df2

Unnamed: 0,Coluna 1,Coluna 2,Coluna 3,Coluna 4,Coluna 5
A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
D,16,17,18,19,20
E,21,22,23,24,25


### 2. Dictionary of lists

In [92]:
pessoas = {
  'Nome': ['Fulano', 'Ciclano', 'Fulana', 'Ciclana'],
  'Peso': [70, 56, 60, 68],
  'Altura': [1.67, 1.7, 1.55, 1.64]
}

In [93]:
df = pd.DataFrame(pessoas)

df

Unnamed: 0,Nome,Peso,Altura
0,Fulano,70,1.67
1,Ciclano,56,1.7
2,Fulana,60,1.55
3,Ciclana,68,1.64


### 3. List of dictionaries

In [94]:
pessoas_lista_dicionarios = [
  {
    'Nome': 'Fulano',
    'Peso': 70,
    'Altura': 1.67,
  },
  {
    'Nome': 'Ciclano',
    'Peso': 56,
    'Altura': 1.70,
    'Idade': 16
  }
]

In [95]:
pd.DataFrame(pessoas_lista_dicionarios)

Unnamed: 0,Nome,Peso,Altura,Idade
0,Fulano,70,1.67,
1,Ciclano,56,1.7,16.0
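
In the output above, the first record has no `Idade` key, so pandas fills that cell with `NaN`, which in turn makes the whole column floating point (hence `16.0`). The sketch below reproduces this on hypothetical records and shows one possible way to keep whole numbers, the nullable `Int64` dtype; whether that conversion is desirable depends on the use case.

```python
import pandas as pd

# Hypothetical records: only the second one has the 'Idade' key.
records = [
    {"Nome": "Fulano", "Peso": 70},
    {"Nome": "Ciclano", "Peso": 56, "Idade": 16},
]
df_records = pd.DataFrame(records)

# The missing key becomes NaN, which forces the column to float64.
print(df_records["Idade"].dtype)
n_missing = int(df_records["Idade"].isna().sum())

# Optional: the nullable integer dtype keeps whole numbers and marks the gap as <NA>.
idade_int = df_records["Idade"].astype("Int64")
print(idade_int.tolist())
```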


### Accessing elements

- Accessing column(s)
- Accessing a specific element
- Accessing row(s)
- `loc` and `iloc`

In [96]:
df

Unnamed: 0,Nome,Peso,Altura
0,Fulano,70,1.67
1,Ciclano,56,1.7
2,Fulana,60,1.55
3,Ciclana,68,1.64


In [97]:
df['Nome'][2]

'Fulana'

In [98]:
df['Nome']

0     Fulano
1    Ciclano
2     Fulana
3    Ciclana
Name: Nome, dtype: object

In [99]:
type(df)

pandas.core.frame.DataFrame

In [100]:
df['Altura']

0    1.67
1    1.70
2    1.55
3    1.64
Name: Altura, dtype: float64

In [101]:
df.head()

Unnamed: 0,Nome,Peso,Altura
0,Fulano,70,1.67
1,Ciclano,56,1.7
2,Fulana,60,1.55
3,Ciclana,68,1.64


In [102]:
df.loc[0]

Nome      Fulano
Peso          70
Altura      1.67
Name: 0, dtype: object

In [103]:
df2.loc['A']

Coluna 1    1
Coluna 2    2
Coluna 3    3
Coluna 4    4
Coluna 5    5
Name: A, dtype: int64

In [104]:
df.iloc[0,1]

70

In [105]:
df2.iloc[0:2,2:]

Unnamed: 0,Coluna 3,Coluna 4,Coluna 5
A,3,4,5
B,8,9,10
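
The difference between `loc` and `iloc` seen in the cells above can be summarized in a short sketch: `loc` selects by label and includes both ends of a slice, while `iloc` selects by integer position and excludes the end. The frame below is rebuilt from scratch under the name `df_demo` (an illustrative name) so the example runs on its own.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    np.arange(1, 26).reshape(5, 5),
    index=list("ABCDE"),
    columns=[f"Coluna {i + 1}" for i in range(5)],
)

# loc: label-based, and BOTH ends of a slice are included.
by_label = df_demo.loc["A":"B", "Coluna 3":"Coluna 5"]

# iloc: position-based, and the end of a slice is excluded (like Python lists).
by_position = df_demo.iloc[0:2, 2:5]

# Both selections cover rows A-B and columns 3-5.
print(by_label.equals(by_position))

# A single cell: one indexing call instead of the chained df[col][row] form.
cell = df_demo.loc["B", "Coluna 3"]
print(cell)
```

Using a single `loc` call for one cell is generally preferred over chained indexing such as `df['Nome'][2]`, because the chained form performs two separate lookups.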


### Creating new columns

In [106]:
df

Unnamed: 0,Nome,Peso,Altura
0,Fulano,70,1.67
1,Ciclano,56,1.7
2,Fulana,60,1.55
3,Ciclana,68,1.64


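### BMI (`IMC`)

The body mass index (IMC, *índice de massa corporal*) is the weight (`Peso`, in kg) divided by the square of the height (`Altura`, in m):

$$
\text{IMC} = \frac{\text{Peso}}{\text{Altura}^2}
$$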

In [107]:
df['IMC'] = df['Peso'] / df['Altura']**2

In [108]:
df

Unnamed: 0,Nome,Peso,Altura,IMC
0,Fulano,70,1.67,25.099502
1,Ciclano,56,1.7,19.377163
2,Fulana,60,1.55,24.973985
3,Ciclana,68,1.64,25.28257
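
The column arithmetic above is applied element by element, so no explicit loop is needed. As a hedged alternative sketch, `assign` computes the same column but returns a new DataFrame instead of modifying the original; `df_demo` and `df_imc` are illustrative names.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Nome": ["Fulano", "Ciclano"],
    "Peso": [70, 56],        # kg
    "Altura": [1.67, 1.70],  # m
})

# assign returns a new DataFrame with the extra column; df_demo is not modified.
df_imc = df_demo.assign(IMC=df_demo["Peso"] / df_demo["Altura"] ** 2)

print(df_imc)
print("IMC" in df_demo.columns)
```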


In [109]:
df['Idade'] = [17, 20, 21, 18]

In [110]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


In [111]:
df['Sobrenome'] = pd.Series(['Lalala', 'Banana', 'Abacaxi'], index=[1, 3, 0])

In [112]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade,Sobrenome
0,Fulano,70,1.67,25.099502,17,Abacaxi
1,Ciclano,56,1.7,19.377163,20,Lalala
2,Fulana,60,1.55,24.973985,21,
3,Ciclana,68,1.64,25.28257,18,Banana
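
The result above shows that a `Series` is assigned by matching index labels, not by position: row `2` has no label in the Series and is left missing. The sketch below repeats this on a smaller hypothetical frame and contrasts it with a plain list, which is positional and must have exactly one value per row (the `Apelido` column name is made up for the example).

```python
import pandas as pd

df_demo = pd.DataFrame({"Nome": ["Fulano", "Ciclano", "Fulana", "Ciclana"]})

# A Series is matched to rows by index label, not by position.
sobrenomes = pd.Series(["Lalala", "Banana", "Abacaxi"], index=[1, 3, 0])
df_demo["Sobrenome"] = sobrenomes

# Label 2 has no entry in the Series, so that row is left missing.
print(df_demo)

# A plain list has no labels: it is assigned by position and must match the row count.
try:
    df_demo["Apelido"] = ["a", "b", "c"]  # 3 values for 4 rows
    list_error = None
except ValueError as exc:
    list_error = exc

print(type(list_error).__name__)
```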


In [113]:
df.columns[1:-1]

Index(['Peso', 'Altura', 'IMC', 'Idade'], dtype='object')

In [114]:
df = df[['Nome', 'Sobrenome'] + list(df.columns[1:-1])]

In [115]:
df

Unnamed: 0,Nome,Sobrenome,Peso,Altura,IMC,Idade
0,Fulano,Abacaxi,70,1.67,25.099502,17
1,Ciclano,Lalala,56,1.7,19.377163,20
2,Fulana,,60,1.55,24.973985,21
3,Ciclana,Banana,68,1.64,25.28257,18


### Removing rows/columns

In [116]:
df.drop(3)  # By default, drop removes a row (pass the row's index label) and returns a new DataFrame

Unnamed: 0,Nome,Sobrenome,Peso,Altura,IMC,Idade
0,Fulano,Abacaxi,70,1.67,25.099502,17
1,Ciclano,Lalala,56,1.7,19.377163,20
2,Fulana,,60,1.55,24.973985,21


In [117]:
df

Unnamed: 0,Nome,Sobrenome,Peso,Altura,IMC,Idade
0,Fulano,Abacaxi,70,1.67,25.099502,17
1,Ciclano,Lalala,56,1.7,19.377163,20
2,Fulana,,60,1.55,24.973985,21
3,Ciclana,Banana,68,1.64,25.28257,18


In [118]:
df.drop('Sobrenome', axis=1, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(
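
The message above is the text of pandas' `SettingWithCopyWarning` (its first line is cut off in the output). A plausible cause, assuming a pandas version without copy-on-write enabled, is that `df` was itself produced by a column selection (`df = df[[...]]`) a few cells earlier, so the in-place `drop` may be acting on a derived object. The sketch below, on hypothetical data, shows two common ways to make the intent explicit.

```python
import pandas as pd

df_full = pd.DataFrame({
    "Nome": ["Fulano", "Ciclano"],
    "Sobrenome": ["Abacaxi", "Lalala"],
    "Peso": [70, 56],
})

# Option 1: take an explicit copy of the selection before modifying it in place.
df_sel = df_full[["Nome", "Sobrenome", "Peso"]].copy()
df_sel.drop("Sobrenome", axis=1, inplace=True)

# Option 2: skip inplace and keep the DataFrame that drop returns.
df_sel2 = df_full[["Nome", "Sobrenome", "Peso"]].drop(columns="Sobrenome")

print(list(df_sel.columns))
print(df_sel.equals(df_sel2))
```

In both options the original `df_full` keeps its `Sobrenome` column.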


In [119]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


In [120]:
df_copy = df.copy()

In [121]:
df_copy

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


In [122]:
df.drop(['Nome', 'Idade', 'Altura'], axis=1)

Unnamed: 0,Peso,IMC
0,70,25.099502
1,56,19.377163
2,60,24.973985
3,68,25.28257


In [123]:
df_aux = df[['Peso', 'IMC']]

In [124]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


### `reset_index` and `set_index`

In [125]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


In [126]:
df.set_index('Nome', inplace=True)

In [127]:
df

Nome,Peso,Altura,IMC,Idade
Fulano,70,1.67,25.099502,17
Ciclano,56,1.7,19.377163,20
Fulana,60,1.55,24.973985,21
Ciclana,68,1.64,25.28257,18


In [129]:
df.loc['Fulano']

Peso      70.000000
Altura     1.670000
IMC       25.099502
Idade     17.000000
Name: Fulano, dtype: float64

In [130]:
df.reset_index()

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


In [131]:
df2

Unnamed: 0,Coluna 1,Coluna 2,Coluna 3,Coluna 4,Coluna 5
A,1,2,3,4,5
B,6,7,8,9,10
C,11,12,13,14,15
D,16,17,18,19,20
E,21,22,23,24,25


In [132]:
df2.reset_index()

Unnamed: 0,index,Coluna 1,Coluna 2,Coluna 3,Coluna 4,Coluna 5
0,A,1,2,3,4,5
1,B,6,7,8,9,10
2,C,11,12,13,14,15
3,D,16,17,18,19,20
4,E,21,22,23,24,25


In [134]:
df.reset_index(inplace=True)

In [135]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Fulana,60,1.55,24.973985,21
3,Ciclana,68,1.64,25.28257,18


In [138]:
df.drop(2, axis=0, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().drop(


In [139]:
df

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
3,Ciclana,68,1.64,25.28257,18


In [142]:
df.reset_index(drop=True)

Unnamed: 0,Nome,Peso,Altura,IMC,Idade
0,Fulano,70,1.67,25.099502,17
1,Ciclano,56,1.7,19.377163,20
2,Ciclana,68,1.64,25.28257,18
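
The cells above can be condensed into one sketch: `set_index` moves a column into the index, `reset_index` moves the index back into a column, and `reset_index(drop=True)` discards the old index instead of keeping it. The frame and variable names below are illustrative.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Nome": ["Fulano", "Ciclano", "Fulana"],
    "Peso": [70, 56, 60],
})

# set_index moves a column into the index; reset_index moves it back.
indexed = df_demo.set_index("Nome")
restored = indexed.reset_index()

# After dropping a row the index has a gap (0, 2).
filtered = df_demo.drop(1)

# Default: the old index is kept as a new column named 'index'.
kept = filtered.reset_index()

# drop=True: the old index is discarded and rows are renumbered from 0.
renumbered = filtered.reset_index(drop=True)

print(kept)
print(renumbered)
```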


### Loading and saving DataFrames

In [150]:
df = pd.read_csv('../00. Datasets/aluguel.csv', sep=';')

In [151]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento


### Note

You may need to install the `openpyxl` package. Just run:

```
pip install openpyxl
```

In [154]:
df_excel = pd.read_excel('../00. Datasets/aluguel.xlsx')

In [155]:
df_excel

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU
0,Quitinete,Copacabana,1,0,0,40,1700,500.0,60.0
1,Casa,Jardim Botânico,2,0,1,100,7000,,
2,Conjunto Comercial/Sala,Barra da Tijuca,0,4,0,150,5200,4020.0,1111.0
3,Apartamento,Centro,1,0,0,15,800,390.0,20.0
4,Apartamento,Higienópolis,1,0,0,48,800,230.0,
5,Apartamento,Vista Alegre,3,1,0,70,1200,,
6,Apartamento,Cachambi,2,0,0,50,1300,301.0,17.0
7,Casa de Condomínio,Barra da Tijuca,5,4,5,750,22000,,
8,Casa de Condomínio,Ramos,2,2,0,65,1000,,
9,Conjunto Comercial/Sala,Centro,0,3,0,695,35000,19193.0,3030.0


In [156]:
pd.read_json('../00. Datasets/aluguel.json')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU
0,Quitinete,Copacabana,1,0,0,40,1700,500.0,60.0
1,Casa,Jardim Botânico,2,0,1,100,7000,,
2,Conjunto Comercial/Sala,Barra da Tijuca,0,4,0,150,5200,4020.0,1111.0
3,Apartamento,Centro,1,0,0,15,800,390.0,20.0
4,Apartamento,Higienópolis,1,0,0,48,800,230.0,
5,Apartamento,Vista Alegre,3,1,0,70,1200,,
6,Apartamento,Cachambi,2,0,0,50,1300,301.0,17.0
7,Casa de Condomínio,Barra da Tijuca,5,4,5,750,22000,,
8,Casa de Condomínio,Ramos,2,2,0,65,1000,,
9,Conjunto Comercial/Sala,Centro,0,3,0,695,35000,19193.0,3030.0


In [159]:
pd.read_table('../00. Datasets/aluguel.txt', sep='\t')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU
0,Quitinete,Copacabana,1,0,0,40,1700,500.0,60.0
1,Casa,Jardim Botânico,2,0,1,100,7000,,
2,Conjunto Comercial/Sala,Barra da Tijuca,0,4,0,150,5200,4020.0,1111.0
3,Apartamento,Centro,1,0,0,15,800,390.0,20.0
4,Apartamento,Higienópolis,1,0,0,48,800,230.0,
5,Apartamento,Vista Alegre,3,1,0,70,1200,,
6,Apartamento,Cachambi,2,0,0,50,1300,301.0,17.0
7,Casa de Condomínio,Barra da Tijuca,5,4,5,750,22000,,
8,Casa de Condomínio,Ramos,2,2,0,65,1000,,
9,Conjunto Comercial/Sala,Centro,0,3,0,695,35000,19193.0,3030.0


In [170]:
df_petrobras = pd.read_clipboard()

In [171]:
df_petrobras

Unnamed: 0,Date,Open,High,Low,Close*,Adj Close**,Volume
0,"Jan 14, 2022",30.28,31.25,30.24,31.17,31.17,55342500
1,"Jan 13, 2022",29.53,30.63,29.5,30.32,30.32,76717500
2,"Jan 12, 2022",28.95,29.91,28.95,29.72,29.72,81378200
3,"Jan 11, 2022",28.1,29.07,27.85,28.99,28.99,62315600
4,"Jan 10, 2022",27.99,28.24,27.72,28.01,28.01,37455200
5,"Jan 07, 2022",28.11,28.29,27.82,28.18,28.18,47507600
6,"Jan 06, 2022",28.29,28.65,27.84,28.05,28.05,61163100
7,"Jan 05, 2022",29.19,29.27,27.94,28.07,28.07,78459800
8,"Jan 04, 2022",29.16,29.4,28.91,29.2,29.2,51739200
9,"Jan 03, 2022",28.54,29.22,28.53,29.09,29.09,52704700


### Note

You may need to install the `lxml` package. Just run:

```
pip install lxml
```

In [173]:
df_list = pd.read_html('https://www.fdic.gov/resources/resolutions/bank-failures/failed-bank-list/')

In [174]:
len(df_list)

1

In [175]:
df_list[0]

Unnamed: 0,Bank NameBank,CityCity,StateSt,CertCert,Acquiring InstitutionAI,Closing DateClosing,FundFund
0,Almena State Bank,Almena,KS,15426,Equity Bank,"October 23, 2020",10538
1,First City Bank of Florida,Fort Walton Beach,FL,16748,"United Fidelity Bank, fsb","October 16, 2020",10537
2,The First State Bank,Barboursville,WV,14361,"MVB Bank, Inc.","April 3, 2020",10536
3,Ericson State Bank,Ericson,NE,18265,Farmers and Merchants Bank,"February 14, 2020",10535
4,City National Bank of New Jersey,Newark,NJ,21111,Industrial Bank,"November 1, 2019",10534
...,...,...,...,...,...,...,...
558,"Superior Bank, FSB",Hinsdale,IL,32646,"Superior Federal, FSB","July 27, 2001",6004
559,Malta National Bank,Malta,OH,6629,North Valley Bank,"May 3, 2001",4648
560,First Alliance Bank & Trust Co.,Manchester,NH,34264,Southern New Hampshire Bank & Trust,"February 2, 2001",4647
561,National State Bank of Metropolis,Metropolis,IL,3815,Banterra Bank of Marion,"December 14, 2000",4646


In [177]:
pd.read_csv('https://s3-sa-east-1.amazonaws.com/lcpi/164c4740-feb8-4bb2-8595-6423754567fa.csv', sep=';')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19826,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19827,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19828,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19829,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento


In [178]:
pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.50,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4
...,...,...,...,...,...,...,...
239,29.03,5.92,Male,No,Sat,Dinner,3
240,27.18,2.00,Female,Yes,Sat,Dinner,2
241,22.67,2.00,Male,Yes,Sat,Dinner,2
242,17.82,1.75,Male,No,Sat,Dinner,2


### Testing `read_html` with another link

In [3]:
df_list = pd.read_html('https://g1.globo.com/bemestar/coronavirus/noticia/2020/05/26/casos-de-coronavirus-e-numero-de-mortes-no-brasil-em-26-de-maio.ghtml')

In [4]:
len(df_list)

2

In [6]:
df_list[1]

Unnamed: 0,0,1,2
0,Estados,Nº de pacientes recuperados,Data de divulgação
1,Acre,1.574,25/5
2,Alagoas,3.653,25/5
3,Amapá,2.700,26/5
4,Amazonas,24.112,25/5
5,Bahia,4.680,26/5
6,Ceará,23.299,26/5
7,Distrito Federal,3.962,26/5
8,Espírito Santo,5.761,26/5
9,Maranhão,6.664,26/5
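
In the table above, the real header ended up in row `0`, and the counts use a dot as thousands separator (`1.574` means 1574), so they were kept as text. The sketch below shows one possible clean-up with plain pandas operations. The `raw` frame is typed in by hand to mimic the first rows of that output (an assumption, so the example runs without network access or an HTML parser). `pd.read_html` also accepts `header` and `thousands` arguments that may achieve something similar at parse time.

```python
import pandas as pd

# Hand-typed stand-in for the first rows of df_list[1].
raw = pd.DataFrame([
    ["Estados", "Nº de pacientes recuperados", "Data de divulgação"],
    ["Acre", "1.574", "25/5"],
    ["Alagoas", "3.653", "25/5"],
    ["Amapá", "2.700", "26/5"],
])

# Promote the first row to column names and drop it from the data.
clean = raw.iloc[1:].reset_index(drop=True)
clean.columns = raw.iloc[0].tolist()

# Remove the thousands separator and convert the text to integers.
col = "Nº de pacientes recuperados"
clean[col] = clean[col].str.replace(".", "", regex=False).astype(int)

print(clean)
```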


### Loading the dataset again

In [9]:
df = pd.read_csv('../00. Datasets/aluguel.csv', sep=';')

In [10]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento


### Main methods and attributes

`index`

Returns the index (the row labels) of the DataFrame.

In [11]:
df.index

RangeIndex(start=0, stop=19831, step=1)

`columns`

Returns the column labels of the DataFrame.

In [17]:
df.columns

Index(['Tipo', 'Bairro', 'Quartos', 'Vagas', 'Suites', 'Area', 'Valor',
       'Condominio', 'IPTU', 'Valor m2', 'Tipo Agregado'],
      dtype='object')

`values`

Returns a 2D array containing the values of the DataFrame.

In [18]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento


In [20]:
df.values

array([['Apartamento', 'Centro', 1, ..., 20.0, 53.33, 'Apartamento'],
       ['Apartamento', 'Higienópolis', 1, ..., 0.0, 16.67, 'Apartamento'],
       ['Apartamento', 'Cachambi', 2, ..., 17.0, 26.0, 'Apartamento'],
       ...,
       ['Quitinete', 'Centro', 1, ..., 0.0, 45.83, 'Apartamento'],
       ['Quitinete', 'Copacabana', 1, ..., 200.0, 68.18, 'Apartamento'],
       ['Quitinete', 'Centro', 0, ..., 25.0, 29.63, 'Apartamento']],
      dtype=object)

In [21]:
df.values[0]

array(['Apartamento', 'Centro', 1, 0, 0, 15, 800.0, 390.0, 20.0, 53.33,
       'Apartamento'], dtype=object)

`shape`

Returns the dimensions of the DataFrame (number of rows and number of columns).

In [23]:
n_linhas, n_colunas = df.shape

In [24]:
n_linhas

19831

In [25]:
n_colunas

11

`dtypes`

Returns the data type of each column of the DataFrame.

In [26]:
df.dtypes

Tipo              object
Bairro            object
Quartos            int64
Vagas              int64
Suites             int64
Area               int64
Valor            float64
Condominio       float64
IPTU             float64
Valor m2         float64
Tipo Agregado     object
dtype: object

`head()`

Shows the first 5 rows (or the first `n` rows).

In [27]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento


In [28]:
df.head(n=10)

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
5,Apartamento,Copacabana,1,0,1,40,2000.0,561.0,50.0,50.0,Apartamento
6,Apartamento,Freguesia (Jacarepaguá),3,0,0,54,950.0,300.0,28.0,17.59,Apartamento
7,Apartamento,Barra da Tijuca,2,1,1,67,1700.0,589.0,147.0,25.37,Apartamento
8,Apartamento,Tijuca,2,1,0,110,1900.0,700.0,138.0,17.27,Apartamento
9,Apartamento,Olaria,3,1,0,68,1000.0,670.0,0.0,14.71,Apartamento


`tail()`

Shows the last 5 rows (or the last `n` rows).

In [29]:
df.tail()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
19826,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.0,Apartamento
19827,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19828,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19829,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento
19830,Quitinete,Centro,0,0,0,27,800.0,350.0,25.0,29.63,Apartamento


In [30]:
df.tail(10)

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
19821,Quitinete,Flamengo,1,0,0,22,1000.0,378.0,0.0,45.45,Apartamento
19822,Quitinete,Glória,1,0,0,32,1250.0,320.0,0.0,39.06,Apartamento
19823,Quitinete,Copacabana,1,0,0,36,1600.0,610.0,54.0,44.44,Apartamento
19824,Quitinete,Leblon,1,0,0,32,2400.0,715.0,121.0,75.0,Apartamento
19825,Quitinete,Copacabana,1,0,0,32,1600.0,692.0,0.0,50.0,Apartamento
19826,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.0,Apartamento
19827,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19828,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19829,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento
19830,Quitinete,Centro,0,0,0,27,800.0,350.0,25.0,29.63,Apartamento


`describe()`

Returns a summary of descriptive statistics for the data in the DataFrame. When the DataFrame mixes numeric and text columns, only the numeric ones are summarized by default.

In [49]:
df.describe()

Unnamed: 0,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2
count,19831.0,19831.0,19831.0,19831.0,19831.0,19831.0,19831.0,19831.0
mean,2.409712,1.200797,0.819575,120.265393,3465.402299,1386.19,433.981241,31.114005
std,1.031674,13.957338,1.026979,126.603826,3291.356043,46692.23,3698.116785,15.071508
min,0.0,0.0,0.0,10.0,100.0,0.0,0.0,2.78
25%,2.0,0.0,0.0,62.0,1500.0,500.0,0.0,20.12
50%,2.0,1.0,1.0,85.0,2500.0,800.0,100.0,28.16
75%,3.0,2.0,1.0,130.0,4300.0,1336.0,294.0,39.0
max,14.0,1960.0,14.0,3000.0,32000.0,6552570.0,450625.0,471.8


In [50]:
df.describe(include=['object'])

Unnamed: 0,Tipo,Bairro,Tipo Agregado
count,19831,19831,19831
unique,5,151,2
top,Apartamento,Barra da Tijuca,Apartamento
freq,16923,3383,17736


In [51]:
df.describe(include='all')  # include='all' is usually hard to read: numeric and categorical statistics are mixed, leaving many NaN cells

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
count,19831,19831,19831.0,19831.0,19831.0,19831.0,19831.0,19831.0,19831.0,19831.0,19831
unique,5,151,,,,,,,,,2
top,Apartamento,Barra da Tijuca,,,,,,,,,Apartamento
freq,16923,3383,,,,,,,,,17736
mean,,,2.409712,1.200797,0.819575,120.265393,3465.402299,1386.19,433.981241,31.114005,
std,,,1.031674,13.957338,1.026979,126.603826,3291.356043,46692.23,3698.116785,15.071508,
min,,,0.0,0.0,0.0,10.0,100.0,0.0,0.0,2.78,
25%,,,2.0,0.0,0.0,62.0,1500.0,500.0,0.0,20.12,
50%,,,2.0,1.0,1.0,85.0,2500.0,800.0,100.0,28.16,
75%,,,3.0,2.0,1.0,130.0,4300.0,1336.0,294.0,39.0,
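
To make the effect of `include` easier to see, the sketch below runs `describe` on a tiny hypothetical frame with one text column and one numeric column. The text column is created with an explicit `object` dtype so that `include=['object']` selects it regardless of how the installed pandas version infers string columns.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Tipo": pd.Series(["Apartamento", "Casa", "Apartamento"], dtype=object),
    "Valor": [800.0, 1500.0, 1000.0],
})

# Default: only numeric columns (count, mean, std, min, quartiles, max).
numeric = df_demo.describe()

# include=['object']: only object columns (count, unique, top, freq).
categorical = df_demo.describe(include=["object"])

print(numeric)
print(categorical)
```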


In [52]:
# Count how many null values each column has
df.isnull().sum()

Tipo             0
Bairro           0
Quartos          0
Vagas            0
Suites           0
Area             0
Valor            0
Condominio       0
IPTU             0
Valor m2         0
Tipo Agregado    0
dtype: int64

`info()`

Prints a summary of the data stored in the DataFrame: index range, column names, non-null counts, dtypes, and memory usage.

In [53]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19831 entries, 0 to 19830
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Tipo           19831 non-null  object 
 1   Bairro         19831 non-null  object 
 2   Quartos        19831 non-null  int64  
 3   Vagas          19831 non-null  int64  
 4   Suites         19831 non-null  int64  
 5   Area           19831 non-null  int64  
 6   Valor          19831 non-null  float64
 7   Condominio     19831 non-null  float64
 8   IPTU           19831 non-null  float64
 9   Valor m2       19831 non-null  float64
 10  Tipo Agregado  19831 non-null  object 
dtypes: float64(4), int64(4), object(3)
memory usage: 1.7+ MB


`min()`, `max()`, `sum()`, ...

A few aggregation methods. By default each one is computed column by column.

In [54]:
df.max()

Tipo              Quitinete
Bairro           Água Santa
Quartos                  14
Vagas                  1960
Suites                   14
Area                   3000
Valor               32000.0
Condominio        6552570.0
IPTU               450625.0
Valor m2              471.8
Tipo Agregado          Casa
dtype: object
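
In the output above, the text columns are also included: their "maximum" is simply the last value in alphabetical order (for example `Quitinete` for `Tipo`), which is rarely meaningful. A hedged sketch on hypothetical data shows how `numeric_only=True` restricts the aggregation to numeric columns, and how the same methods work on a single column.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Tipo": ["Apartamento", "Quitinete", "Casa"],
    "Quartos": [1, 3, 2],
    "Valor": [800.0, 1500.0, 1000.0],
})

# Only numeric columns take part in the aggregation.
col_max = df_demo.max(numeric_only=True)

# Aggregations on a single column return a scalar.
total_valor = df_demo["Valor"].sum()
mean_quartos = df_demo["Quartos"].mean()

print(col_max)
print(total_valor, mean_quartos)
```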

`rename()`

In [61]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento


In [66]:
df.rename(index={0: 'Primeira Observação'})

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor m2,Tipo Agregado
Primeira Observação,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19826,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19827,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19828,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19829,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento


In [71]:
df.rename(columns={'Valor m2': 'Valor (m²)'}, inplace=True)

In [72]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
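
Besides a dictionary, `rename` also accepts a function that is applied to every label. The sketch below, on a hypothetical one-row frame, shows both forms; the snake_case convention used in the second call is only an example, not something the notebook prescribes.

```python
import pandas as pd

df_demo = pd.DataFrame({"Valor m2": [53.33], "Tipo Agregado": ["Apartamento"]})

# Dictionary: rename only the listed labels.
renamed = df_demo.rename(columns={"Valor m2": "Valor (m²)"})

# Function: applied to every column label.
snake = df_demo.rename(columns=lambda c: c.lower().replace(" ", "_"))

print(list(renamed.columns))
print(list(snake.columns))

# Without inplace=True, the original DataFrame keeps its labels.
print(list(df_demo.columns))
```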


`duplicated()`

Returns a boolean mask indicating whether each entire row repeats an earlier row (`True`) or not (`False`).

In [76]:
dados = [
  [1, 2],
  [3, 4],
  [1, 2]
]

dados = pd.DataFrame(dados, columns=['A', 'B'])

In [77]:
dados

Unnamed: 0,A,B
0,1,2
1,3,4
2,1,2


In [78]:
dados.duplicated()

0    False
1    False
2     True
dtype: bool

In [80]:
dados.duplicated().sum()

1

In [73]:
df.duplicated()

0        False
1        False
2        False
3        False
4        False
         ...  
19826    False
19827    False
19828    False
19829    False
19830    False
Length: 19831, dtype: bool

In [79]:
# Count the duplicated rows
df.duplicated().sum()

635
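
By default only the second and later occurrences of a row are flagged, which is why the count above matches the number of rows that `drop_duplicates()` would remove. The sketch below, reusing the small three-row example, illustrates the `keep` and `subset` parameters.

```python
import pandas as pd

dados = pd.DataFrame([[1, 2], [3, 4], [1, 2]], columns=["A", "B"])

# Default (keep='first'): the first occurrence is not flagged.
later_copies = dados.duplicated()

# keep=False: every row that has a duplicate is flagged, including the first.
all_copies = dados.duplicated(keep=False)

# subset: compare only the listed columns instead of the whole row.
by_a = dados.duplicated(subset=["A"])

print(later_copies.tolist())
print(all_copies.tolist())
print(by_a.tolist())
```

`dados[dados.duplicated(keep=False)]` is a convenient way to inspect all copies of the repeated rows before deciding to drop them.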

`drop_duplicates()`

In [82]:
dados = [
  [1, 2],
  [1, 2],
  [3, 4]
]

dados = pd.DataFrame(dados, columns=['A', 'B'])

In [84]:
dados.drop_duplicates(ignore_index=True)

Unnamed: 0,A,B
0,1,2
1,3,4


In [86]:
df.drop_duplicates(ignore_index=True, inplace=True)

In [87]:
df.shape

(19196, 11)

`to_*()`

Saves the DataFrame in a given format (CSV, XLSX, JSON, ...).

In [88]:
df.shape

(19196, 11)

In [90]:
df.to_csv('../00. Datasets/aluguel_ultimate.csv', index=False)

In [95]:
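# Caution: each to_excel call that receives a file path rewrites the whole file,
# so the second call below replaces the workbook written by the first one.
df.to_excel('../00. Datasets/aluguel_ultimate.xlsx', index=False, sheet_name='Outro nome de aba')
df[['Bairro', 'Valor']].to_excel('../00. Datasets/aluguel_ultimate.xlsx', index=False, sheet_name='Bairro')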

### Saving to multiple sheets of the same workbook

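```python
# if_sheet_exists is only accepted in append mode (mode='a'), which needs the
# openpyxl engine and an existing file; combined with mode='w', pandas raises a ValueError.
with pd.ExcelWriter('../00. Datasets/aluguel_ultimate.xlsx', mode='a', engine='openpyxl', if_sheet_exists='new') as writer:
  df.to_excel(writer, index=False, sheet_name='Outro nome de aba')
  df[['Bairro', 'Valor']].to_excel(writer, index=False, sheet_name='Bairro')
```

- `if_sheet_exists` (append mode only)
  - 'error': raise a `ValueError` (default)
  - 'new': create a new sheet, with a name chosen by the engine
  - 'replace': delete the contents of the sheet before writing to it
  - 'overlay': write over the existing sheet without clearing it first (pandas 1.4 or later)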

In [98]:
with pd.ExcelWriter('../00. Datasets/aluguel_ultimate.xlsx', mode = 'w') as writer:
  df.to_excel(writer, index=False, sheet_name='Outro nome de aba')
  df[['Bairro', 'Valor']].to_excel(writer, index=False, sheet_name='Bairro')

> For more details on saving a DataFrame to XLSX: https://xlsxwriter.readthedocs.io/

### Reading multiple sheets from an Excel workbook

In [101]:
pd.read_excel('../00. Datasets/aluguel_ultimate.xlsx', sheet_name='Outro nome de aba')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800,390,20,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800,230,0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300,301,17,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500,642,74,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500,455,14,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400,107,10,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900,605,0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100,323,0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500,286,200,68.18,Apartamento


In [102]:
pd.read_excel('../00. Datasets/aluguel_ultimate.xlsx', sheet_name='Bairro')

Unnamed: 0,Bairro,Valor
0,Centro,800
1,Higienópolis,800
2,Cachambi,1300
3,Grajaú,1500
4,Lins de Vasconcelos,1500
...,...,...
19191,Glória,400
19192,Flamengo,900
19193,Centro,1100
19194,Copacabana,1500


In [103]:
# Default value of sheet_name (0 = the first sheet)
pd.read_excel('../00. Datasets/aluguel_ultimate.xlsx', sheet_name=0)

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800,390,20,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800,230,0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300,301,17,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500,642,74,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500,455,14,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400,107,10,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900,605,0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100,323,0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500,286,200,68.18,Apartamento


### Filtering data

1. Using masks
2. Using the `query` method
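
The two approaches listed above can be sketched on a small hypothetical frame (the rows below are made up, with the same column names as the rental dataset). Both express the same condition and should select the same rows.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "Tipo": ["Apartamento", "Quitinete", "Casa", "Apartamento"],
    "Bairro": ["Centro", "Glória", "Ramos", "Leblon"],
    "Valor": [800.0, 400.0, 1000.0, 5000.0],
})

# 1. Mask: element-wise comparisons combined with & (and), | (or), ~ (not).
#    The parentheses are required because & binds tighter than <= and !=.
mask = (df_demo["Valor"] <= 2000) & (df_demo["Tipo"] != "Quitinete")
by_mask = df_demo[mask]

# 2. query: the same condition written as a string, using and / or / not.
by_query = df_demo.query('Valor <= 2000 and Tipo != "Quitinete"')

print(by_mask)
print(by_mask.equals(by_query))
```

Masks work with any column name, while `query` tends to read closer to the SQL statements used in this section.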

```sql
SELECT * FROM aluguel;
```

In [105]:
df

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento


```sql
SELECT Bairro, Valor FROM aluguel;
```

In [104]:
df[['Bairro', 'Valor']]

Unnamed: 0,Bairro,Valor
0,Centro,800.0
1,Higienópolis,800.0
2,Cachambi,1300.0
3,Grajaú,1500.0
4,Lins de Vasconcelos,1500.0
...,...,...
19191,Glória,400.0
19192,Flamengo,900.0
19193,Centro,1100.0
19194,Copacabana,1500.0


```sql
SELECT * FROM aluguel WHERE Valor <= 2000;
```

In [106]:
# 1. Masks
df[df['Valor'] <= 2000]

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento


In [107]:
# 2. The query method
df.query('Valor <= 2000')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento
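
The two cells above express the same filter in two ways. As a minimal sketch, assuming a tiny hypothetical DataFrame called `toy` (it is not the rental dataset, only a stand-in with two similar columns), the snippet below shows that the mask is just a boolean Series and that both approaches return the same rows:

```python
import pandas as pd

# Hypothetical toy data that mimics two columns of the rental dataset
toy = pd.DataFrame({
    'Tipo': ['Apartamento', 'Quitinete', 'Casa de Vila', 'Apartamento'],
    'Valor': [800.0, 2500.0, 1000.0, 3000.0],
})

# A mask is a boolean Series with one True/False value per row
mask = toy['Valor'] <= 2000

# Indexing with the mask keeps only the rows where it is True
by_mask = toy[mask]

# The same filter written as a string expression
by_query = toy.query('Valor <= 2000')

print(mask.tolist())
print(by_mask.equals(by_query))
```

Masks tend to be handy when the condition is reused or built step by step, while `query` usually reads closer to the SQL `WHERE` clause.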


```sql
SELECT * FROM aluguel WHERE Valor <= 2000 AND Tipo != 'Quitinete';
```

In [108]:
# 1. Boolean masks
df[(df['Valor'] <= 2000) & (df['Tipo'] != 'Quitinete')]

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
18387,Casa de Vila,Riachuelo,2,0,0,43,1000.0,0.0,120.0,23.26,Casa
18388,Casa de Vila,Quintino Bocaiúva,2,0,0,58,1000.0,0.0,0.0,17.24,Casa
18389,Casa de Vila,Todos os Santos,3,1,1,92,1500.0,80.0,11.0,16.30,Casa
18390,Casa de Vila,Riachuelo,3,0,0,73,850.0,0.0,0.0,11.64,Casa


In [109]:
# 2. query method
df.query('Valor <= 2000 and Tipo != "Quitinete"')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
18387,Casa de Vila,Riachuelo,2,0,0,43,1000.0,0.0,120.0,23.26,Casa
18388,Casa de Vila,Quintino Bocaiúva,2,0,0,58,1000.0,0.0,0.0,17.24,Casa
18389,Casa de Vila,Todos os Santos,3,1,1,92,1500.0,80.0,11.0,16.30,Casa
18390,Casa de Vila,Riachuelo,3,0,0,73,850.0,0.0,0.0,11.64,Casa


```sql
SELECT * FROM aluguel WHERE Valor <= 2000 OR Tipo = 'Casa de Vila';
```

In [110]:
df[(df['Valor'] <= 2000) | (df['Tipo'] == 'Casa de Vila')]

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento


In [111]:
df.query('Valor <= 2000 or Tipo == "Casa de Vila"')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.00,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
19191,Quitinete,Glória,1,0,0,10,400.0,107.0,10.0,40.00,Apartamento
19192,Quitinete,Flamengo,1,0,0,23,900.0,605.0,0.0,39.13,Apartamento
19193,Quitinete,Centro,1,0,0,24,1100.0,323.0,0.0,45.83,Apartamento
19194,Quitinete,Copacabana,1,0,0,22,1500.0,286.0,200.0,68.18,Apartamento
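
When conditions are combined with masks, pandas uses `&` (AND) and `|` (OR) instead of the Python keywords, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `<=` or `==`. The sketch below, again on a hypothetical `toy` DataFrame rather than the real dataset, names each condition first and then checks that the mask and `query` versions agree:

```python
import pandas as pd

# Hypothetical toy data, not the real rental dataset
toy = pd.DataFrame({
    'Tipo': ['Apartamento', 'Quitinete', 'Casa de Vila', 'Casa de Vila'],
    'Valor': [800.0, 900.0, 1000.0, 5000.0],
})

# Naming each condition avoids the parentheses problem altogether
cheap = toy['Valor'] <= 2000
not_kitnet = toy['Tipo'] != 'Quitinete'
vila = toy['Tipo'] == 'Casa de Vila'

and_rows = toy[cheap & not_kitnet]   # AND: both conditions must hold
or_rows = toy[cheap | vila]          # OR: at least one condition holds

# The same filters as query strings, using the words `and` / `or`
and_query = toy.query('Valor <= 2000 and Tipo != "Quitinete"')
or_query = toy.query('Valor <= 2000 or Tipo == "Casa de Vila"')

print(and_rows.index.tolist(), or_rows.index.tolist())
```

Writing `toy[toy['Valor'] <= 2000 & toy['Tipo'] != 'Quitinete']` without parentheses would not produce this filter, since `2000 & toy['Tipo']` is evaluated first.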


### Grouping and Sorting Data

#### Grouping

In [112]:
df.head()

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
0,Apartamento,Centro,1,0,0,15,800.0,390.0,20.0,53.33,Apartamento
1,Apartamento,Higienópolis,1,0,0,48,800.0,230.0,0.0,16.67,Apartamento
2,Apartamento,Cachambi,2,0,0,50,1300.0,301.0,17.0,26.0,Apartamento
3,Apartamento,Grajaú,2,1,0,70,1500.0,642.0,74.0,21.43,Apartamento
4,Apartamento,Lins de Vasconcelos,3,1,1,90,1500.0,455.0,14.0,16.67,Apartamento


```sql
SELECT Bairro, AVG(Valor) FROM aluguel
  GROUP BY Bairro;
```

In [117]:
# numeric_only=True skips text columns such as 'Tipo', which cannot be averaged
df.groupby('Bairro').mean(numeric_only=True)

Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²)
Abolição,1.866667,0.600000,0.000000,77.800000,1195.333333,191.400000,41.666667,19.008000
Alto da Boa Vista,3.666667,2.166667,1.333333,260.000000,3966.666667,885.000000,159.333333,13.631667
Anchieta,2.000000,1.000000,0.000000,78.500000,875.000000,19.750000,0.000000,11.235000
Andaraí,1.969072,0.608247,0.247423,70.670103,1464.711340,497.711340,46.680412,21.565052
Anil,2.549296,1.366197,0.732394,134.591549,2048.873239,455.845070,228.492958,19.348592
...,...,...,...,...,...,...,...,...
Vila Valqueire,2.458333,1.416667,0.479167,103.500000,1769.583333,276.875000,87.958333,16.346667
Vila da Penha,1.990385,0.538462,0.278846,72.528846,1260.576923,232.682692,8.730769,18.660481
Vista Alegre,2.000000,0.750000,0.062500,70.000000,1114.375000,210.062500,0.000000,16.049375
Zumbi,2.500000,0.500000,0.000000,77.000000,2150.000000,1050.000000,116.000000,27.500000


In [119]:
# Select the column before aggregating, so only 'Valor' is averaged
df.groupby('Bairro')[['Valor']].mean()

Bairro,Valor
Abolição,1195.333333
Alto da Boa Vista,3966.666667
Anchieta,875.000000
Andaraí,1464.711340
Anil,2048.873239
...,...
Vila Valqueire,1769.583333
Vila da Penha,1260.576923
Vista Alegre,1114.375000
Zumbi,2150.000000


In [121]:
# as_index=False keeps 'Bairro' as a regular column instead of the index
df.groupby('Bairro', as_index=False).mean(numeric_only=True)

Unnamed: 0,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²)
0,Abolição,1.866667,0.600000,0.000000,77.800000,1195.333333,191.400000,41.666667,19.008000
1,Alto da Boa Vista,3.666667,2.166667,1.333333,260.000000,3966.666667,885.000000,159.333333,13.631667
2,Anchieta,2.000000,1.000000,0.000000,78.500000,875.000000,19.750000,0.000000,11.235000
3,Andaraí,1.969072,0.608247,0.247423,70.670103,1464.711340,497.711340,46.680412,21.565052
4,Anil,2.549296,1.366197,0.732394,134.591549,2048.873239,455.845070,228.492958,19.348592
...,...,...,...,...,...,...,...,...,...
146,Vila Valqueire,2.458333,1.416667,0.479167,103.500000,1769.583333,276.875000,87.958333,16.346667
147,Vila da Penha,1.990385,0.538462,0.278846,72.528846,1260.576923,232.682692,8.730769,18.660481
148,Vista Alegre,2.000000,0.750000,0.062500,70.000000,1114.375000,210.062500,0.000000,16.049375
149,Zumbi,2.500000,0.500000,0.000000,77.000000,2150.000000,1050.000000,116.000000,27.500000


In [128]:
# Apply a different aggregation function to each column with agg
df.groupby('Bairro').agg({ 'Valor': 'mean', 'IPTU': 'max', 'Area': 'min' })\
  .rename(columns={'Valor': 'Média dos alugueis', 'IPTU': 'Valor máximo do IPTU', 'Area': 'Valor mínimo da área'})

Bairro,Média dos alugueis,Valor máximo do IPTU,Valor mínimo da área
Abolição,1195.333333,192.0,42
Alto da Boa Vista,3966.666667,480.0,40
Anchieta,875.000000,0.0,70
Andaraí,1464.711340,229.0,30
Anil,2048.873239,3000.0,20
...,...,...,...
Vila Valqueire,1769.583333,2315.0,45
Vila da Penha,1260.576923,363.0,15
Vista Alegre,1114.375000,0.0,50
Zumbi,2150.000000,172.0,64
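
As an alternative to chaining `agg` with `rename`, pandas also offers named aggregation, where each output column is declared as `new_name=(source_column, function)`. The sketch below is only illustrative: the `toy` DataFrame and the output names `media_aluguel`, `iptu_max` and `area_min` are hypothetical, not part of the original notebook.

```python
import pandas as pd

# Hypothetical toy data, not the real rental dataset
toy = pd.DataFrame({
    'Bairro': ['Centro', 'Centro', 'Anil', 'Anil'],
    'Valor': [800.0, 1200.0, 2000.0, 3000.0],
    'IPTU': [20.0, 0.0, 100.0, 300.0],
    'Area': [15, 40, 90, 60],
})

# Named aggregation: output column = (input column, aggregation function)
summary = toy.groupby('Bairro').agg(
    media_aluguel=('Valor', 'mean'),   # average rent per neighborhood
    iptu_max=('IPTU', 'max'),          # highest IPTU per neighborhood
    area_min=('Area', 'min'),          # smallest area per neighborhood
)

print(summary)
```

One limitation of this form is that the output names must be valid Python identifiers, so labels with spaces or accents still need the dictionary plus `rename` approach shown above.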


In [134]:
# Grouping by two columns produces a two-level index (Bairro, Tipo)
df.groupby(['Bairro', 'Tipo']).agg({ 'Valor': 'mean' }).head(20)

Bairro,Tipo,Valor
Abolição,Apartamento,994.444444
Abolição,Casa,1645.0
Abolição,Casa de Vila,1200.0
Alto da Boa Vista,Apartamento,1483.333333
Alto da Boa Vista,Casa,15000.0
Alto da Boa Vista,Casa de Condomínio,3750.0
Alto da Boa Vista,Casa de Vila,600.0
Anchieta,Apartamento,850.0
Anchieta,Casa,825.0
Anchieta,Casa de Vila,1000.0


In [135]:
df.groupby(['Bairro', 'Tipo'], as_index=False).agg({ 'Valor': 'mean' }).head(20)

Unnamed: 0,Bairro,Tipo,Valor
0,Abolição,Apartamento,994.444444
1,Abolição,Casa,1645.0
2,Abolição,Casa de Vila,1200.0
3,Alto da Boa Vista,Apartamento,1483.333333
4,Alto da Boa Vista,Casa,15000.0
5,Alto da Boa Vista,Casa de Condomínio,3750.0
6,Alto da Boa Vista,Casa de Vila,600.0
7,Anchieta,Apartamento,850.0
8,Anchieta,Casa,825.0
9,Anchieta,Casa de Vila,1000.0


In [136]:
df.groupby(['Bairro', 'Tipo'], as_index=False).agg({ 'Valor': 'mean' }).rename(columns={'Valor': 'Valor médio'})

Unnamed: 0,Bairro,Tipo,Valor médio
0,Abolição,Apartamento,994.444444
1,Abolição,Casa,1645.000000
2,Abolição,Casa de Vila,1200.000000
3,Alto da Boa Vista,Apartamento,1483.333333
4,Alto da Boa Vista,Casa,15000.000000
...,...,...,...
437,Vista Alegre,Casa de Condomínio,1150.000000
438,Zumbi,Apartamento,2150.000000
439,Água Santa,Apartamento,850.000000
440,Água Santa,Casa,1200.000000
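
The difference between the grouped outputs above comes down to where the grouping keys end up: in the index (the default) or in regular columns (`as_index=False`). A minimal sketch on a hypothetical `toy` DataFrame suggests that calling `reset_index()` on the default result gives the same flat table:

```python
import pandas as pd

# Hypothetical toy data, not the real rental dataset
toy = pd.DataFrame({
    'Bairro': ['Centro', 'Centro', 'Centro', 'Anil'],
    'Tipo': ['Apartamento', 'Apartamento', 'Quitinete', 'Casa'],
    'Valor': [800.0, 1200.0, 900.0, 2000.0],
})

# Default: the grouping keys become a two-level index
indexed = toy.groupby(['Bairro', 'Tipo']).agg({'Valor': 'mean'})

# Option A: move the index levels back into columns afterwards
flat_a = indexed.reset_index()

# Option B: ask groupby to keep the keys as columns from the start
flat_b = toy.groupby(['Bairro', 'Tipo'], as_index=False).agg({'Valor': 'mean'})

print(indexed.index.names)
print(flat_b)
```

The flat form is usually the more convenient one when the result is going to be saved with `to_csv` or filtered again with masks.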


#### Sorting

```sql
SELECT * FROM aluguel
  ORDER BY Valor;
```

In [137]:
df.sort_values(by='Valor')

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
18521,Quitinete,Copacabana,0,0,0,36,100.0,0.0,0.0,2.78,Apartamento
10838,Apartamento,Leblon,0,0,0,15,100.0,1.0,0.0,6.67,Apartamento
19001,Quitinete,Anil,1,0,0,20,300.0,18.0,8.0,15.00,Apartamento
18524,Quitinete,Jacarepaguá,1,0,0,20,300.0,15.0,7.0,15.00,Apartamento
4255,Apartamento,Engenho de Dentro,1,0,0,30,300.0,50.0,0.0,10.00,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
17636,Casa de Condomínio,Barra da Tijuca,4,4,2,660,30000.0,1050.0,1300.0,45.45,Casa
17524,Casa de Condomínio,Barra da Tijuca,6,0,4,700,30000.0,2100.0,700.0,42.86,Casa
17750,Casa de Condomínio,Barra da Tijuca,5,6,5,1200,32000.0,2134.0,17480.0,26.67,Casa
18079,Casa de Condomínio,Barra da Tijuca,5,4,5,850,32000.0,1510.0,15804.0,37.65,Casa


```sql
SELECT * FROM aluguel
  ORDER BY Valor DESC;
```

In [138]:
df.sort_values(by='Valor', ascending=False)

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
17328,Casa de Condomínio,Barra da Tijuca,5,6,4,1000,32000.0,3000.0,1700.0,32.00,Casa
17750,Casa de Condomínio,Barra da Tijuca,5,6,5,1200,32000.0,2134.0,17480.0,26.67,Casa
18079,Casa de Condomínio,Barra da Tijuca,5,4,5,850,32000.0,1510.0,15804.0,37.65,Casa
17778,Casa de Condomínio,Barra da Tijuca,7,6,6,1350,30000.0,0.0,0.0,22.22,Casa
18022,Casa de Condomínio,Barra da Tijuca,4,5,4,1000,30000.0,2000.0,2670.0,30.00,Casa
...,...,...,...,...,...,...,...,...,...,...,...
4255,Apartamento,Engenho de Dentro,1,0,0,30,300.0,50.0,0.0,10.00,Apartamento
18524,Quitinete,Jacarepaguá,1,0,0,20,300.0,15.0,7.0,15.00,Apartamento
19001,Quitinete,Anil,1,0,0,20,300.0,18.0,8.0,15.00,Apartamento
10838,Apartamento,Leblon,0,0,0,15,100.0,1.0,0.0,6.67,Apartamento


In [139]:
# Sort by 'Valor' first; rows with the same 'Valor' are then ordered by 'IPTU'
df.sort_values(by=['Valor', 'IPTU'])

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
10838,Apartamento,Leblon,0,0,0,15,100.0,1.0,0.0,6.67,Apartamento
18521,Quitinete,Copacabana,0,0,0,36,100.0,0.0,0.0,2.78,Apartamento
4255,Apartamento,Engenho de Dentro,1,0,0,30,300.0,50.0,0.0,10.00,Apartamento
18524,Quitinete,Jacarepaguá,1,0,0,20,300.0,15.0,7.0,15.00,Apartamento
19001,Quitinete,Anil,1,0,0,20,300.0,18.0,8.0,15.00,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
17313,Casa de Condomínio,Barra da Tijuca,4,4,4,800,30000.0,2700.0,18300.0,37.50,Casa
17241,Casa de Condomínio,Barra da Tijuca,4,4,4,1000,30000.0,2695.0,23715.0,30.00,Casa
17328,Casa de Condomínio,Barra da Tijuca,5,6,4,1000,32000.0,3000.0,1700.0,32.00,Casa
18079,Casa de Condomínio,Barra da Tijuca,5,4,5,850,32000.0,1510.0,15804.0,37.65,Casa


In [141]:
# One direction per column: 'Valor' ascending, 'IPTU' descending
df.sort_values(by=['Valor', 'IPTU'], ascending=[True, False])

Unnamed: 0,Tipo,Bairro,Quartos,Vagas,Suites,Area,Valor,Condominio,IPTU,Valor (m²),Tipo Agregado
10838,Apartamento,Leblon,0,0,0,15,100.0,1.0,0.0,6.67,Apartamento
18521,Quitinete,Copacabana,0,0,0,36,100.0,0.0,0.0,2.78,Apartamento
19001,Quitinete,Anil,1,0,0,20,300.0,18.0,8.0,15.00,Apartamento
18524,Quitinete,Jacarepaguá,1,0,0,20,300.0,15.0,7.0,15.00,Apartamento
4255,Apartamento,Engenho de Dentro,1,0,0,30,300.0,50.0,0.0,10.00,Apartamento
...,...,...,...,...,...,...,...,...,...,...,...
18116,Casa de Condomínio,Barra da Tijuca,5,4,5,1280,30000.0,1300.0,0.0,23.44,Casa
18154,Casa de Condomínio,Barra da Tijuca,4,4,2,600,30000.0,1050.0,0.0,50.00,Casa
17750,Casa de Condomínio,Barra da Tijuca,5,6,5,1200,32000.0,2134.0,17480.0,26.67,Casa
18079,Casa de Condomínio,Barra da Tijuca,5,4,5,850,32000.0,1510.0,15804.0,37.65,Casa
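
Sorting by several columns is easier to follow on a small example. The sketch below uses a hypothetical four-row `toy` DataFrame (not the rental dataset) with repeated `Valor` entries, so the effect of the second column and of the `ascending` list can be read directly from the resulting index order:

```python
import pandas as pd

# Hypothetical toy data with ties in 'Valor'
toy = pd.DataFrame({
    'Valor': [300.0, 100.0, 300.0, 100.0],
    'IPTU': [8.0, 0.0, 0.0, 5.0],
})

# Both columns ascending: ties in 'Valor' are broken by the smaller 'IPTU'
both_asc = toy.sort_values(by=['Valor', 'IPTU'])

# 'Valor' ascending, 'IPTU' descending: ties are broken by the larger 'IPTU'
mixed = toy.sort_values(by=['Valor', 'IPTU'], ascending=[True, False])

print(both_asc.index.tolist())
print(mixed.index.tolist())
```

Note that `sort_values` returns a new DataFrame and keeps the original index labels, which is why the outputs above show row labels out of order.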
