**Topics**:
* Pandas DataFrames:
 - What is a DataFrame?
 - Initialization of a dataframe
 - Loading data from CSV and Excel files with Pandas
 - Attributes of a DataFrame
 - Accessing data
 - Ordering and Description
 - Transforming the data
 - Filtering Data
 - Conditional Change
 - Exercises
 - Exporting the Data Frame


In [2]:
import pandas as pd
import numpy as np

# Pandas (Part II)

## What is a DataFrame?

A pandas DataFrame is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Each column can hold a different data type, so the data may be heterogeneous. A DataFrame therefore has three main components: the data, the rows, and the columns.

<img src=https://media.geeksforgeeks.org/wp-content/uploads/finallpandas.png width=700 height=500>

## Initialization of a Data Frame

To initialize a DataFrame, we use the constructor below:

pandas.DataFrame(```data```, ```index```, ```columns```, ```dtype```)


* ```data```: NumPy array, list, dictionary, or Series
* ```index```: optional row labels. If not provided, a default integer index from 0 to m-1 is used (equivalent to np.arange(m))
* ```columns```: optional column labels. If not provided, they also default to the integers 0 to n-1 (equivalent to np.arange(n))
* ```dtype```: optional data type to force on the columns
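
The parameters above can be illustrated with a minimal sketch. The array values and the labels `row_a`, `row_b`, `x`, `y`, `z` are made up for illustration and are not part of the datasets used in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data: 2 rows x 3 columns (illustrative values only)
data = np.array([[1, 2, 3], [4, 5, 6]])

# Without index/columns, pandas assigns default integer labels starting at 0
df_default = pd.DataFrame(data)

# Explicit row labels, column labels, and one dtype applied to every column
df_labeled = pd.DataFrame(
    data,
    index=["row_a", "row_b"],
    columns=["x", "y", "z"],
    dtype=float,
)

print(df_default.index.tolist())    # default row labels
print(df_default.columns.tolist())  # default column labels
print(df_labeled)
```

Passing `dtype=float` converts every column, which is why the integers in the toy array come back as floating-point values.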

In [3]:
df_empty = pd.DataFrame()

In [4]:
df_empty

We can create a data frame from several other structures, such as:

* lists
* dictionaries
* arrays
* Series

* Data Frame from a list

In [5]:
list_1 = [1,2,3]
columns = ['a']
df_list = pd.DataFrame(data = list_1 , columns = columns)
df_list

Unnamed: 0,a
0,1
1,2
2,3


* Data Frame from a dictionary

In [6]:
dic = {'a' : [1,2] , 'b': [3,5]}
df_dic = pd.DataFrame(dic , columns = ['a' , 'b'] , dtype = float)

In [7]:
df_dic

Unnamed: 0,a,b
0,1.0,3.0
1,2.0,5.0


* Data Frame from a NumPy array

In [8]:
arr = np.arange(6)
pd.DataFrame(data=arr)

Unnamed: 0,0
0,0
1,1
2,2
3,3
4,4
5,5


* Data Frame from a Series

In [9]:
series = pd.Series([1,6,5,8])
df_series = pd.DataFrame(series)
df_series

Unnamed: 0,0
0,1
1,6
2,5
3,8
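
In the output above the single column is labeled `0` because the Series has no name. As a small sketch (the name `value` is an arbitrary choice for illustration), giving the Series a name carries over to the column label, and `Series.to_frame()` produces the same result:

```python
import pandas as pd

# A named Series: the name becomes the column label of the DataFrame
series = pd.Series([1, 6, 5, 8], name="value")

df_from_series = pd.DataFrame(series)
df_to_frame = series.to_frame()  # equivalent shortcut

print(df_from_series.columns.tolist())
print(df_to_frame.equals(df_from_series))
```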


## Loading data from CSV and Excel files with Pandas


In practice, data usually comes from files in CSV or XLSX format, or directly from databases. Pandas can load these sources into a data frame.

Here we will demonstrate loading a CSV file.

CSV: **C**omma **S**eparated **V**alues<br>

<img src=https://s3.amazonaws.com/cdn.freshdesk.com/data/helpdesk/attachments/production/1080247870/original/UDHlNQUo4ju-0SSFws_XskGYvIf5KZNn4w.png?1563457877>

pd.```read_csv```<br>
<a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html">read_csv documentation</a>

pd.read_csv(```filepath_or_buffer```, ```delimiter```)

* ```filepath_or_buffer```: path to the file. It can also be a URL (for example, a file hosted on GitHub or Kaggle)
* ```delimiter```: the character that separates the values, such as ',' (the default) or ';'
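
To see why the delimiter matters, here is a minimal sketch that reads semicolon-separated text. The file contents are hypothetical, and `io.StringIO` stands in for a file on disk so the example runs without any external file.

```python
import io
import pandas as pd

# Hypothetical file contents using ';' as the separator
csv_text = "name;hp\nBulbasaur;45\nIvysaur;60\n"

# Telling pandas about the separator yields two proper columns
df_semicolon = pd.read_csv(io.StringIO(csv_text), delimiter=";")
print(df_semicolon)

# With the default ',' separator, each whole line lands in a single column
df_wrong = pd.read_csv(io.StringIO(csv_text))
print(df_wrong.shape)
```

If a loaded data frame unexpectedly has a single column, a mismatched delimiter is a likely cause.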

In [10]:
# URL
url= "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
df_countries = pd.read_csv(filepath_or_buffer=url)

In [11]:
df_countries

Unnamed: 0,Country,Region
0,Algeria,AFRICA
1,Angola,AFRICA
2,Benin,AFRICA
3,Botswana,AFRICA
4,Burkina,AFRICA
...,...,...
189,Paraguay,SOUTH AMERICA
190,Peru,SOUTH AMERICA
191,Suriname,SOUTH AMERICA
192,Uruguay,SOUTH AMERICA


In [12]:
# From a local file
df_pokemons = pd.read_csv(r'C:\Users\vitor.silva\Desktop\Estudo Python\Python  - Digital\pokemon_data.csv', index_col=0)

In [13]:
df_pokemons

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,Charmander,Fire,,39,52,43,60,50,65,1,False
...,...,...,...,...,...,...,...,...,...,...,...
719,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
719,DiancieMega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
720,HoopaHoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
720,HoopaHoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True


## Attributes of a Data Frame

In [14]:
# columns
df_countries.columns

Index(['Country', 'Region'], dtype='object')

In [15]:
# index
df_countries.index

RangeIndex(start=0, stop=194, step=1)

In [16]:
# Shape
df_countries.shape

(194, 2)

In [17]:
# values or data
df_countries.values

array([['Algeria', 'AFRICA'],
       ['Angola', 'AFRICA'],
       ['Benin', 'AFRICA'],
       ['Botswana', 'AFRICA'],
       ['Burkina', 'AFRICA'],
       ['Burundi', 'AFRICA'],
       ['Cameroon', 'AFRICA'],
       ['Cape Verde', 'AFRICA'],
       ['Central African Republic', 'AFRICA'],
       ['Chad', 'AFRICA'],
       ['Comoros', 'AFRICA'],
       ['Congo', 'AFRICA'],
       ['Congo, Democratic Republic of', 'AFRICA'],
       ['Djibouti', 'AFRICA'],
       ['Egypt', 'AFRICA'],
       ['Equatorial Guinea', 'AFRICA'],
       ['Eritrea', 'AFRICA'],
       ['Ethiopia', 'AFRICA'],
       ['Gabon', 'AFRICA'],
       ['Gambia', 'AFRICA'],
       ['Ghana', 'AFRICA'],
       ['Guinea', 'AFRICA'],
       ['Guinea-Bissau', 'AFRICA'],
       ['Ivory Coast', 'AFRICA'],
       ['Kenya', 'AFRICA'],
       ['Lesotho', 'AFRICA'],
       ['Liberia', 'AFRICA'],
       ['Libya', 'AFRICA'],
       ['Madagascar', 'AFRICA'],
       ['Malawi', 'AFRICA'],
       ['Mali', 'AFRICA'],
       ['Mauritania', 'AF

## Accessing data

In [18]:
# Accessing a column, option 1: bracket notation
df_pokemons['Name']

#
1                  Bulbasaur
2                    Ivysaur
3                   Venusaur
3      VenusaurMega Venusaur
4                 Charmander
               ...          
719                  Diancie
719      DiancieMega Diancie
720      HoopaHoopa Confined
720       HoopaHoopa Unbound
721                Volcanion
Name: Name, Length: 800, dtype: object

In [20]:
# Accessing a column, option 2: attribute notation
df_pokemons.Name

#
1                  Bulbasaur
2                    Ivysaur
3                   Venusaur
3      VenusaurMega Venusaur
4                 Charmander
               ...          
719                  Diancie
719      DiancieMega Diancie
720      HoopaHoopa Confined
720       HoopaHoopa Unbound
721                Volcanion
Name: Name, Length: 800, dtype: object

In [21]:
# Accessing rows by position
df_pokemons.iloc[0]

Name          Bulbasaur
Type 1            Grass
Type 2           Poison
HP                   45
Attack               49
Defense              49
Sp. Atk              65
Sp. Def              65
Speed                45
Generation            1
Legendary         False
Name: 1, dtype: object

In [22]:
# Accessing a single element by row and column position
df_pokemons.iloc[0,0]

'Bulbasaur'

In [23]:
# Accessing with loc: rows that match a condition, and selected columns
df_pokemons.loc[df_pokemons['Name']=='Bulbasaur' , ['Name' , 'HP']]

#,Name,HP
1,Bulbasaur,45



With ```.iloc``` (position-based) and ```.loc``` (label-based, also accepting conditions), we can select rows and columns at the same time.
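
The difference between the two accessors is easiest to see when the index labels do not match the row positions. The following is a minimal sketch on a toy data frame; the three rows and the index `[1, 2, 4]` are chosen only for illustration.

```python
import pandas as pd

# Toy frame whose index labels (1, 2, 4) differ from the positions (0, 1, 2)
df_demo = pd.DataFrame(
    {"Name": ["Bulbasaur", "Ivysaur", "Charmander"], "HP": [45, 60, 39]},
    index=[1, 2, 4],
)

first_row = df_demo.iloc[0]            # position 0 -> Bulbasaur
by_label = df_demo.loc[4]              # label 4 -> Charmander
cell_by_pos = df_demo.iloc[1, 1]       # second row, second column
cell_by_label = df_demo.loc[2, "HP"]   # row labeled 2, column "HP"

# loc also accepts a boolean condition for rows plus a list of columns
subset = df_demo.loc[df_demo["HP"] > 40, ["Name"]]

print(first_row["Name"], by_label["Name"], cell_by_pos, cell_by_label)
print(subset)
```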

## Ordering and Description 

In [24]:
# Sorting by HP in ascending order (use ascending=False for descending order)
df_pokemons.sort_values(['HP'] , ascending = True)

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
292,Shedinja,Bug,Ghost,1,90,45,30,30,40,3,False
50,Diglett,Ground,,10,55,25,35,45,95,1,False
129,Magikarp,Water,,20,10,55,15,20,80,1,False
355,Duskull,Ghost,,20,40,90,30,90,25,3,False
439,Mime Jr.,Psychic,Fairy,20,25,45,70,90,60,4,False
...,...,...,...,...,...,...,...,...,...,...,...
594,Alomomola,Water,,165,75,80,40,45,65,5,False
321,Wailord,Water,,170,90,45,90,45,60,3,False
202,Wobbuffet,Psychic,,190,33,58,33,58,33,2,False
113,Chansey,Normal,,250,5,5,35,105,50,1,False


In [25]:
# Sorting by more than one column, in descending order
df_pokemons.sort_values(['HP' , 'Attack'] , ascending = False)

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
242,Blissey,Normal,,255,10,10,75,135,55,2,False
113,Chansey,Normal,,250,5,5,35,105,50,1,False
202,Wobbuffet,Psychic,,190,33,58,33,58,33,2,False
321,Wailord,Water,,170,90,45,90,45,60,3,False
594,Alomomola,Water,,165,75,80,40,45,65,5,False
...,...,...,...,...,...,...,...,...,...,...,...
349,Feebas,Water,,20,15,20,10,55,80,3,False
129,Magikarp,Water,,20,10,55,15,20,80,1,False
213,Shuckle,Bug,Rock,20,10,230,10,230,5,2,False
50,Diglett,Ground,,10,55,25,35,45,95,1,False
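
When sorting by several columns, `ascending` can also be a list with one flag per column. The sketch below uses a toy data frame with made-up names and values to sort by HP descending and break ties by Attack ascending. It also shows that `sort_values` returns a new data frame and leaves the original order untouched.

```python
import pandas as pd

# Toy data with a tie on HP (rows A and B)
df_demo = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "HP": [20, 20, 50, 10],
        "Attack": [15, 10, 30, 55],
    }
)

# HP from highest to lowest; ties broken by Attack from lowest to highest
sorted_mixed = df_demo.sort_values(by=["HP", "Attack"], ascending=[False, True])

print(sorted_mixed["Name"].tolist())
print(df_demo["Name"].tolist())  # original order is unchanged
```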


In [26]:
# Describing the numeric columns
df_pokemons.describe()

Unnamed: 0,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation
count,800.0,800.0,800.0,800.0,800.0,800.0,800.0
mean,69.25875,79.00125,73.8425,72.82,71.9025,68.2775,3.32375
std,25.534669,32.457366,31.183501,32.722294,27.828916,29.060474,1.66129
min,1.0,5.0,5.0,10.0,20.0,5.0,1.0
25%,50.0,55.0,50.0,49.75,50.0,45.0,2.0
50%,65.0,75.0,70.0,65.0,70.0,65.0,3.0
75%,80.0,100.0,90.0,95.0,90.0,90.0,5.0
max,255.0,190.0,230.0,194.0,230.0,180.0,6.0


In [27]:
# Describing categorical data
df_pokemons.describe(include=['object'])

Unnamed: 0,Name,Type 1,Type 2
count,800,800,414
unique,800,18,18
top,Bulbasaur,Water,Flying
freq,1,112,97


In [43]:
# HEAD
df_pokemons.head()

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False
4,Charmander,Fire,,39,52,43,60,50,65,1,False


In [44]:
# Tail
df_pokemons.tail()

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
719,Diancie,Rock,Fairy,50,100,150,100,150,50,6,True
719,DiancieMega Diancie,Rock,Fairy,50,160,110,160,110,110,6,True
720,HoopaHoopa Confined,Psychic,Ghost,80,110,60,150,130,70,6,True
720,HoopaHoopa Unbound,Psychic,Dark,80,160,60,170,130,80,6,True
721,Volcanion,Fire,Water,80,110,120,130,90,70,6,True


## Transforming the data

In [28]:
# combining attacks and creating a new column
df_pokemons['Total_Attack'] = df_pokemons['Attack'] + df_pokemons['Sp. Atk']
df_pokemons['Total_Attack']

#
1      114
2      142
3      182
3      222
4      112
      ... 
719    200
719    320
720    260
720    330
721    240
Name: Total_Attack, Length: 800, dtype: int64
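
A derived column can be removed again with `drop`. The sketch below repeats the idea on a two-row toy data frame (values taken from the first two rows shown above) and assumes we prefer getting a new data frame back instead of using `inplace=True`. Note that `drop` looks at row labels by default, so a column has to be named through `columns=` (or `axis=1`).

```python
import pandas as pd

df_demo = pd.DataFrame({"Attack": [49, 62], "Sp. Atk": [65, 80]})

# Element-wise sum of two columns stored as a new column
df_demo["Total_Attack"] = df_demo["Attack"] + df_demo["Sp. Atk"]

# drop(columns=...) returns a new frame; df_demo itself keeps the column
df_without = df_demo.drop(columns=["Total_Attack"])

# Without columns= (or axis=1), drop searches the row labels and fails
raised_key_error = False
try:
    df_demo.drop("Total_Attack")
except KeyError:
    raised_key_error = True

print(df_without.columns.tolist())
print(raised_key_error)
```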

In [29]:
# Dropping a column (left commented out so that Total_Attack stays available in the cells below)
# df_pokemons.drop(columns='Total_Attack', inplace=True)  # inplace=True modifies the original DataFrame

## Filtering data

We can filter dataframe rows based on conditions.

In [51]:
arr = np.array([1,2,3,4])

In [52]:
arr[arr > 3]

array([4])
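
The NumPy example works because the comparison `arr > 3` produces a boolean mask, and indexing with that mask keeps only the positions marked `True`. Data frames follow the same mechanism, as this minimal sketch on a toy `HP` column suggests (the values are illustrative):

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4])
mask = arr > 3           # one boolean per element
print(mask)
print(arr[mask])         # keeps only the elements where mask is True

# The same idea with a DataFrame column: the comparison gives a boolean Series
df_demo = pd.DataFrame({"HP": [45, 60, 39]})
hp_mask = df_demo["HP"] > 40
filtered = df_demo[hp_mask]
print(filtered)
```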

In [30]:
# Use loc to filter only one Type
df_pokemons.loc[df_pokemons['Type 1'] =='Grass']

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Total_Attack
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False,114
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False,142
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False,182
3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False,222
43,Oddish,Grass,Poison,45,50,55,75,65,30,1,False,125
...,...,...,...,...,...,...,...,...,...,...,...,...
650,Chespin,Grass,,56,61,65,48,45,38,6,False,109
651,Quilladin,Grass,,61,78,95,56,58,57,6,False,134
652,Chesnaught,Grass,Fighting,88,107,122,74,75,64,6,False,181
672,Skiddo,Grass,,66,65,48,62,57,52,6,False,127


In [31]:
# Multiple conditions
# Pokemon Type 1 = Grass and Type 2 = Poison
df_pokemons.loc[(df_pokemons['Type 1'] == 'Grass') & (df_pokemons['Type 2'] == 'Poison')]

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Total_Attack
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False,114
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False,142
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False,182
3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False,222
43,Oddish,Grass,Poison,45,50,55,75,65,30,1,False,125
44,Gloom,Grass,Poison,60,65,70,85,75,40,1,False,150
45,Vileplume,Grass,Poison,75,80,85,110,90,50,1,False,190
69,Bellsprout,Grass,Poison,50,75,35,70,30,40,1,False,145
70,Weepinbell,Grass,Poison,65,90,50,85,45,55,1,False,175
71,Victreebel,Grass,Poison,80,105,65,100,70,70,1,False,205


In [32]:
# Multiple conditions
# Pokemon Type 1 = Grass or Type 2 = Poison
df_pokemons.loc[(df_pokemons['Type 1'] == 'Grass') | (df_pokemons['Type 2'] == 'Poison')]

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Total_Attack
1,Bulbasaur,Grass,Poison,45,49,49,65,65,45,1,False,114
2,Ivysaur,Grass,Poison,60,62,63,80,80,60,1,False,142
3,Venusaur,Grass,Poison,80,82,83,100,100,80,1,False,182
3,VenusaurMega Venusaur,Grass,Poison,80,100,123,122,120,80,1,False,222
13,Weedle,Bug,Poison,40,35,30,20,20,50,1,False,55
...,...,...,...,...,...,...,...,...,...,...,...,...
650,Chespin,Grass,,56,61,65,48,45,38,6,False,109
651,Quilladin,Grass,,61,78,95,56,58,57,6,False,134
652,Chesnaught,Grass,Fighting,88,107,122,74,75,64,6,False,181
672,Skiddo,Grass,,66,65,48,62,57,52,6,False,127
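
Conditions are combined with `&` (and) and `|` (or), and each comparison needs its own parentheses because these operators bind more tightly than `==`. As a sketch on a four-row toy data frame (a small, simplified excerpt chosen for illustration), the masks can also be stored in variables, negated with `~`, or replaced by `isin` when testing one column against several values:

```python
import pandas as pd

df_demo = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Weedle", "Charmander", "Chespin"],
        "Type 1": ["Grass", "Bug", "Fire", "Grass"],
        "Type 2": ["Poison", "Poison", None, None],
    }
)

# Named masks keep long filters readable
is_grass = df_demo["Type 1"] == "Grass"
is_poison = df_demo["Type 2"] == "Poison"

both = df_demo.loc[is_grass & is_poison]        # and
either = df_demo.loc[is_grass | is_poison]      # or
neither = df_demo.loc[~(is_grass | is_poison)]  # not

# isin tests membership in a list of values
grass_or_fire = df_demo.loc[df_demo["Type 1"].isin(["Grass", "Fire"])]

print(both["Name"].tolist())
print(either["Name"].tolist())
print(neither["Name"].tolist())
print(grass_or_fire["Name"].tolist())
```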


## Conditional Change

We can replace cell values in the rows that satisfy a condition.

Example: for every Pokemon whose HP is greater than 60, Type 1 will be set to "LENDARIO" (Portuguese for "legendary").

In [33]:
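# Careful: `==` compares instead of assigning, so this cell only returns a
# boolean Series and changes nothing. To change the value, assign with `=`:
# df_pokemons.loc[df_pokemons['HP'] > 60, 'Type 1'] = 'LENDARIO'
df_pokemons.loc[df_pokemons['HP'] > 60 , 'Type 1' ] == 'LENDARIO'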

#
3      False
3      False
6      False
6      False
6      False
       ...  
717    False
718    False
720    False
720    False
721    False
Name: Type 1, Length: 461, dtype: bool

In [65]:
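# Caution: without a comparison, the HP values themselves are used as index
# labels, so this looks rows up by label and does not filter by HP
df_pokemons.loc[df_pokemons['HP']]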

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Total_Attack
45,Vileplume,Grass,Poison,75,80,85,110,90,50,1,False,190
60,Poliwag,Water,,40,50,40,40,40,90,1,False,90
80,Slowbro,Water,Psychic,95,75,110,100,80,30,1,False,175
80,SlowbroMega Slowbro,Water,Psychic,95,75,180,130,80,30,1,False,205
80,Slowbro,Water,Psychic,95,75,110,100,80,30,1,False,175
...,...,...,...,...,...,...,...,...,...,...,...,...
80,SlowbroMega Slowbro,Water,Psychic,95,75,180,130,80,30,1,False,205
80,Slowbro,Water,Psychic,95,75,110,100,80,30,1,False,175
80,SlowbroMega Slowbro,Water,Psychic,95,75,180,130,80,30,1,False,205
80,Slowbro,Water,Psychic,95,75,110,100,80,30,1,False,175


In [34]:
# Changing more than one column at once for the rows that match
df_pokemons.loc[df_pokemons['HP'] > 60, ['Type 1', 'Legendary']]= ['LENDARIO', True]

In [35]:
df_pokemons.loc[df_pokemons['HP'] > 60]

#,Name,Type 1,Type 2,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary,Total_Attack
3,Venusaur,LENDARIO,Poison,80,82,83,100,100,80,1,True,182
3,VenusaurMega Venusaur,LENDARIO,Poison,80,100,123,122,120,80,1,True,222
6,Charizard,LENDARIO,Flying,78,84,78,109,85,100,1,True,193
6,CharizardMega Charizard X,LENDARIO,Dragon,78,130,111,130,85,100,1,True,260
6,CharizardMega Charizard Y,LENDARIO,Flying,78,104,78,159,115,100,1,True,263
...,...,...,...,...,...,...,...,...,...,...,...,...
717,Yveltal,LENDARIO,Flying,126,131,95,131,98,99,6,True,262
718,Zygarde50% Forme,LENDARIO,Ground,108,100,121,81,95,95,6,True,181
720,HoopaHoopa Confined,LENDARIO,Ghost,80,110,60,150,130,70,6,True,260
720,HoopaHoopa Unbound,LENDARIO,Dark,80,160,60,170,130,80,6,True,330
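
The same conditional assignment can be tried safely on a toy data frame before touching real data. The sketch below uses three illustrative rows. The `np.where` line is an alternative not used in this notebook, and the column name `HP_class` is hypothetical: it builds a new column with one value for rows that meet the condition and another for the rest.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Venusaur", "Charizard"],
        "HP": [45, 80, 78],
        "Type 1": ["Grass", "Grass", "Fire"],
        "Legendary": [False, False, False],
    }
)

high_hp = df_demo["HP"] > 60

# Assignment through loc uses `=`; only the rows where high_hp is True change
df_demo.loc[high_hp, "Type 1"] = "LENDARIO"
df_demo.loc[high_hp, "Legendary"] = True

# Alternative (assumption, not from the notebook): np.where picks a value per row
df_demo["HP_class"] = np.where(df_demo["HP"] > 60, "high", "low")

print(df_demo)
```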
