# DataFrames III: Data Extraction

In [412]:
import pandas as pd

## This Module's Dataset
- This module's dataset is a collection of all James Bond movies.

In [413]:
bond = pd.read_csv('jamesbond.csv')
bond

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
7,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
8,Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
9,The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,


In [414]:
bond.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27 entries, 0 to 26
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Film               27 non-null     object 
 1   Year               27 non-null     int64  
 2   Actor              27 non-null     object 
 3   Director           27 non-null     object 
 4   Box Office         27 non-null     float64
 5   Budget             27 non-null     float64
 6   Bond Actor Salary  20 non-null     float64
dtypes: float64(3), int64(1), object(3)
memory usage: 1.6+ KB


## The set_index and reset_index Methods
- The index holds the primary identifiers (labels) for the rows.
- Extracting a row by position or label is fastest when the index is sorted.
- Pandas uses index labels to align rows when merging or combining different objects.
- The `set_index` method sets an existing column as the index of the **DataFrame**.
- The `reset_index` method restores the default ascending numeric index (0, 1, 2, ...) and, by default, moves the old index back into a regular column.
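
The point about index labels driving alignment can be sketched on a small frame. This is a minimal illustration, not the module's own code: the `films` frame is a hypothetical three-row excerpt built inline (values copied from the table above), and the `profit` calculation is only there to show that arithmetic matches rows by label rather than by row order.

```python
import pandas as pd

# Hypothetical mini-dataset; values copied from the first rows of the module's table.
films = pd.DataFrame({
    "Film": ["Dr. No", "Goldfinger", "Thunderball"],
    "Box Office": [448.8, 820.4, 848.1],
    "Budget": [7.0, 18.6, 41.9],
})

by_film = films.set_index("Film")   # "Film" becomes the row labels
restored = by_film.reset_index()    # "Film" moves back to a regular column

# Arithmetic between two Series aligns on index labels, not on row order.
box_office = by_film["Box Office"]
budget = by_film["Budget"].sort_values(ascending=False)  # deliberately reordered
profit = box_office - budget
print(profit)
```

Even though `budget` is in a different order, each film's budget is subtracted from that same film's box office, because pandas pairs the values by their `Film` label.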

In [415]:
bond = pd.read_csv('jamesbond.csv')
bond.head()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
4,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [416]:
bond = bond.set_index('Film')
bond

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,


In [417]:
bond = bond.reset_index()
bond[bond['Bond Actor Salary'].notnull()].sort_values(by='Bond Actor Salary')

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
6,On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
5,You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
3,Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
18,GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
16,The Living Daylights,1987,Timothy Dalton,John Glen,313.5,68.8,5.2
7,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8


## Retrieve Rows by Index Position with iloc Accessor
- The `iloc` accessor retrieves one or more rows by index position.
- Provide a pair of square brackets after the accessor.
- `iloc` accepts single values, lists, and slices.
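
The three kinds of input that `iloc` accepts can be compared side by side. The sketch below assumes a hypothetical four-row `films` frame built inline (values taken from the module's table) so that it runs without the CSV file.

```python
import pandas as pd

# Hypothetical mini-dataset with the default numeric index.
films = pd.DataFrame({
    "Film": ["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball"],
    "Year": [1962, 1963, 1964, 1965],
})

single = films.iloc[0]        # one position -> Series holding that row
several = films.iloc[[0, 2]]  # list of positions -> DataFrame
window = films.iloc[1:3]      # slice -> DataFrame; the end position (3) is excluded
last = films.iloc[-1]         # negative positions count from the end
```

A single position returns a **Series**, while a list or a slice returns a **DataFrame**, even when only one row is selected.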

In [418]:
# Selecting a single row returns a Series
bond.iloc[5]

Film                 You Only Live Twice
Year                                1967
Actor                       Sean Connery
Director                   Lewis Gilbert
Box Office                         514.2
Budget                              59.9
Bond Actor Salary                    4.4
Name: 5, dtype: object

In [419]:
bond.iloc[[15,20]]

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
15,A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
20,The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5


In [420]:
bond['Actor'].loc[0]

'Sean Connery'

In [421]:
# Create a Series with 'Actor' as the index and 'Film' as the values
bond_actor_serie = bond.set_index('Actor')['Film']
bond_actor_serie.loc['Sean Connery']

Actor
Sean Connery                   Dr. No
Sean Connery    From Russia with Love
Sean Connery               Goldfinger
Sean Connery              Thunderball
Sean Connery      You Only Live Twice
Sean Connery     Diamonds Are Forever
Sean Connery    Never Say Never Again
Name: Film, dtype: object

## Retrieve Rows by Index Label with loc Accessor
- The `loc` accessor retrieves one or more rows by index label.
- Provide a pair of square brackets after the accessor.

In [422]:
bond = pd.read_csv('jamesbond.csv', index_col='Film').sort_values('Film')
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [423]:
bond.loc['A View to a Kill']

Year                        1985
Actor                Roger Moore
Director               John Glen
Box Office                 275.2
Budget                      54.5
Bond Actor Salary            9.1
Name: A View to a Kill, dtype: object

In [424]:
bond.loc[['Casino Royale', 'Dr. No']].sort_values(by='Year')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3


In [425]:
bond.loc['Diamonds Are Forever':'GoldenEye']

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1


In [426]:
bond.reset_index()

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
1,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
2,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
3,Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
4,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
5,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
6,For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
7,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
8,GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
9,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


## Second Arguments to loc and iloc Accessors
- The second value inside the square brackets targets the columns.
- The `iloc` accessor requires integer positions for both rows and columns.
- The `loc` accessor requires labels for both rows and columns.
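
The same cell or block can be reached with either accessor once the row and column arguments are supplied together. The following is a minimal sketch on a hypothetical three-row `films` frame (built inline from the module's values) that pairs each `loc` call with its `iloc` equivalent.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {
        "Year": [1962, 1964, 1995],
        "Actor": ["Sean Connery", "Sean Connery", "Pierce Brosnan"],
        "Box Office": [448.8, 820.4, 518.5],
    },
    index=pd.Index(["Dr. No", "Goldfinger", "GoldenEye"], name="Film"),
)

by_label = films.loc["GoldenEye", "Actor"]  # row label, column label
by_position = films.iloc[2, 1]              # row position, column position

# Label slices include the end column; positional slices exclude the end position.
block = films.loc[["Dr. No", "GoldenEye"], "Year":"Actor"]
same_block = films.iloc[[0, 2], 0:2]
```

Both `block` and `same_block` contain the `Year` and `Actor` columns for the first and third rows.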

In [427]:
bond = bond.reset_index()
by_movie = bond.set_index('Film')
by_movie.sort_index()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [428]:
by_movie.loc[['Die Another Day', 'You Only Live Twice'], 'Director':'Budget']

Film,Director,Box Office,Budget
Die Another Day,Lee Tamahori,465.4,154.2
You Only Live Twice,Lewis Gilbert,514.2,59.9


In [429]:
by_year = bond.set_index('Year')['Film']
by_year

Year
1985                   A View to a Kill
2006                      Casino Royale
1967                      Casino Royale
1971               Diamonds Are Forever
2002                    Die Another Day
1962                             Dr. No
1981                 For Your Eyes Only
1963              From Russia with Love
1995                          GoldenEye
1964                         Goldfinger
1989                    Licence to Kill
1973                   Live and Let Die
1979                          Moonraker
1983              Never Say Never Again
2021                     No Time to Die
1983                          Octopussy
1969    On Her Majesty's Secret Service
2008                  Quantum of Solace
2012                            Skyfall
2015                            Spectre
1987               The Living Daylights
1974        The Man with the Golden Gun
1977               The Spy Who Loved Me
1999            The World Is Not Enough
1965                        Thunder

In [430]:
# Row 4 (positions 0-1-2-3), column 3 (positions 0-1-2)
bond.iloc[3,2]

'Sean Connery'

In [431]:
# [rows], [columns]
bond.iloc[[0,2],[2,4]]

,Actor,Box Office
0,Roger Moore,275.2
2,David Niven,315.0


## Overwrite Value in a DataFrame
- Use the `iloc` or `loc` accessor on the **DataFrame** itself to target a value, then assign the new value with the equal sign.
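
A minimal sketch of single-value assignment, assuming a hypothetical two-row `films` frame built inline. Assigning through the accessor on the **DataFrame** in one step is the dependable form; assigning through an intermediate slice can, depending on the pandas version, raise a `SettingWithCopyWarning` and may or may not propagate to the original. The `"Placeholder"` value is invented for illustration.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {"Year": [1962, 1964], "Actor": ["Sean Connery", "Sean Connery"]},
    index=pd.Index(["Dr. No", "Goldfinger"], name="Film"),
)

# Assign through the accessor on the DataFrame itself, in a single step.
films.loc["Dr. No", "Actor"] = "Sir Sean Connery"  # by labels
films.iloc[1, 1] = "Sir Sean Connery"              # by positions (row 1, column 1)

# To experiment without touching the original, work on an explicit copy.
scratch = films.copy()
scratch.loc["Goldfinger", "Actor"] = "Placeholder"  # hypothetical value
```

After these statements `films` holds the two updated actor names, while the placeholder exists only in `scratch`.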

In [432]:
change_value = bond.iloc[:1]
change_value

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1


In [433]:
change_value.iloc[0,2]

'Roger Moore'

In [434]:
change_value.iloc[0,2] = 'New Unkown Actor'
change_value

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,A View to a Kill,1985,New Unkown Actor,John Glen,275.2,54.5,9.1


In [435]:
# Create a DataFrame indexed by film name. loc then looks up the desired index label and the desired column.
# loc looks up by label; iloc looks up by integer position.
bond_create_serie = pd.read_csv('jamesbond.csv', index_col='Film')
bond_create_serie.loc['From Russia with Love','Actor']

'Sean Connery'

In [436]:
bond_many_columns = pd.read_csv('jamesbond.csv', usecols=['Film','Actor','Year'])
bond_many_columns.head()

,Film,Year,Actor
0,Dr. No,1962,Sean Connery
1,From Russia with Love,1963,Sean Connery
2,Goldfinger,1964,Sean Connery
3,Thunderball,1965,Sean Connery
4,Casino Royale,1967,David Niven


## Overwrite Multiple Values in a DataFrame
- The `replace` method replaces all occurrences of a **Series** value with another value (think of it like "Find and Replace").
- To overwrite multiple values in a **DataFrame**, remember to use an accessor on the **DataFrame** itself.
- Accessors like `loc` and `iloc` can accept Boolean Series. Use them to target the values to overwrite.
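
The two approaches in the list above can be compared on a small example. This sketch assumes a hypothetical three-row `films` frame built inline; it shows that `replace` returns a new **Series**, whereas assignment through `loc` with a Boolean Series changes the **DataFrame** directly.

```python
import pandas as pd

# Hypothetical mini-dataset with the default numeric index.
films = pd.DataFrame({
    "Film": ["Dr. No", "GoldenEye", "Goldfinger"],
    "Actor": ["Sean Connery", "Pierce Brosnan", "Sean Connery"],
})

# Option 1: Series.replace returns a new Series; assign it back to keep the change.
replaced = films["Actor"].replace("Sean Connery", "Sir Sean Connery")

# Option 2: build a Boolean Series and pass it to loc as the row selector.
is_sean_connery = films["Actor"] == "Sean Connery"
films.loc[is_sean_connery, "Actor"] = "Sir Sean Connery"
```

The `loc` form also generalises to conditions that `replace` cannot express, such as a mask built from a different column.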

In [437]:
bond['Actor'] = bond['Actor'].replace('Sean Connery', 'Sir Sean Connery')
bond

,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,A View to a Kill,1985,New Unkown Actor,John Glen,275.2,54.5,9.1
1,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
2,Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
3,Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
4,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
5,Dr. No,1962,Sir Sean Connery,Terence Young,448.8,7.0,0.6
6,For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
7,From Russia with Love,1963,Sir Sean Connery,Terence Young,543.8,12.6,1.6
8,GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
9,Goldfinger,1964,Sir Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [438]:
bond = pd.read_csv('jamesbond.csv', index_col='Film').sort_index()

In [439]:
bond[bond['Actor'] == 'Sean Connery'].loc['From Russia with Love':'Goldfinger']

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [440]:
is_sean_connery = bond['Actor'] == 'Sean Connery'
bond.loc[is_sean_connery, "Actor"]

Film
Diamonds Are Forever     Sean Connery
Dr. No                   Sean Connery
From Russia with Love    Sean Connery
Goldfinger               Sean Connery
Never Say Never Again    Sean Connery
Thunderball              Sean Connery
You Only Live Twice      Sean Connery
Name: Actor, dtype: object

In [441]:
bond.loc[is_sean_connery, "Actor"] = 'Sir Sean Connery'
bond

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sir Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sir Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sir Sean Connery,Guy Hamilton,820.4,18.6,3.2


## Rename Index Labels or Columns in a DataFrame
- The `rename` method accepts a dictionary for either its `columns` or `index` parameters.
- The dictionary keys represent the existing names and the values represent the new names.
- We can replace all columns by overwriting the **DataFrame's** `columns` attribute.
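
A short sketch of both renaming styles, assuming a hypothetical two-row `films` frame built inline. The new names (`Revenue`, `Release Year`) and the relabelled index value are invented for illustration.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {"Year": [1962, 1995], "Box Office": [448.8, 518.5]},
    index=pd.Index(["Dr. No", "GoldenEye"], name="Film"),
)

# rename returns a new DataFrame; labels missing from the dictionaries stay as they are.
renamed = films.rename(
    columns={"Box Office": "Revenue"},  # existing name -> new name
    index={"GoldenEye": "Goldeneye"},   # hypothetical relabel, for illustration
)

# Overwriting the columns attribute changes the object in place
# and needs exactly one name per column.
relabelled = films.copy()
relabelled.columns = ["Release Year", "Revenue"]
```

Because `rename` does not modify `films`, the result has to be assigned to a variable to keep it.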

In [442]:
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [443]:
bond.rename(columns={
    "Year":"Year of Release",
    "Box Office":"Revenue"
    }).head()


Film,Year of Release,Actor,Director,Revenue,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [444]:
swaps = {
    'Casino Royale':'Cassino Royale',
    'Diamonds Are Forever':'Diamantes são eternos',
    'Die Another Day':'Um novo dia para morrer',
}
bond[bond['Bond Actor Salary'].notnull()].rename(index=swaps).head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Cassino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Diamantes são eternos,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Um novo dia para morrer,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sir Sean Connery,Terence Young,448.8,7.0,0.6


In [445]:
# Assigning to bond.columns[2] is not allowed: pandas cannot rename a single column with that syntax. Replace the entire list of columns instead, as below.
bond.columns = ['Ano','Ator', 'Diretor','Caixa','Budget','Salário Bond']
bond.head()

Film,Ano,Ator,Diretor,Caixa,Budget,Salário Bond
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


## Delete Rows or Columns from a DataFrame
- The `drop` method deletes one or more rows/columns from a **DataFrame**.
- Pass the `index` parameter a list of row labels to remove, or the `columns` parameter a list of column names to remove.
- The `pop` method removes and returns a single **Series** (it mutates the **DataFrame** in the process).
- Python's `del` keyword also removes a single **Series**.
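
The difference between the three removal tools is whether the original object changes. The sketch below assumes a hypothetical three-row `films` frame built inline from the module's values.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {
        "Year": [1962, 1995, 2012],
        "Actor": ["Sean Connery", "Pierce Brosnan", "Daniel Craig"],
        "Budget": [7.0, 76.9, 170.2],
    },
    index=pd.Index(["Dr. No", "GoldenEye", "Skyfall"], name="Film"),
)

# drop returns a new object; films itself is untouched.
trimmed = films.drop(index=["Skyfall"], columns=["Budget"])
shape_after_drop = films.shape

years = films.pop("Year")  # removes "Year" from films and returns it as a Series
del films["Budget"]        # removes "Budget" from films; nothing is returned
```

After the last two statements only the `Actor` column remains in `films`, while `trimmed` still holds the result of the earlier `drop`.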

In [446]:
bond = pd.read_csv('jamesbond.csv', index_col='Film').sort_index()
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [447]:
bond.drop(columns=['Box Office', 'Budget'])
bond.drop(index=['No Time to Die',"Casino Royale"])
bond.drop(index=['No Time to Die',"Casino Royale"], columns=['Box Office', 'Budget'])


Film,Year,Actor,Director,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,9.1
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,17.9
Dr. No,1962,Sean Connery,Terence Young,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,
From Russia with Love,1963,Sean Connery,Terence Young,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,3.2
Licence to Kill,1989,Timothy Dalton,John Glen,7.9
Live and Let Die,1973,Roger Moore,Guy Hamilton,


In [448]:
# pop modifies the original DataFrame in place. drop returns a modified copy and leaves the original unchanged.
bond.pop('Year')
bond.head()

Film,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [449]:
# Like pop, del modifies the original DataFrame (but it does not return the removed column).
del bond['Actor']

In [450]:
bond.head()

Film,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,John Glen,275.2,54.5,9.1
Casino Royale,Martin Campbell,581.5,145.3,3.3
Casino Royale,Ken Hughes,315.0,85.0,
Diamonds Are Forever,Guy Hamilton,442.5,34.7,5.8
Die Another Day,Lee Tamahori,465.4,154.2,17.9


## Create Random Sample with the sample Method
- The `sample` method returns a specified number of random rows from the **DataFrame** (one row by default).
- Customize the `axis` parameter to extract random columns.
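
Random draws differ on every run, which makes results hard to reproduce. The sketch below assumes a hypothetical four-row `films` frame built inline and uses the `random_state` and `frac` parameters of `sample`, which the module's cells do not cover; the seed value `0` is arbitrary.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {
        "Year": [1962, 1964, 1995, 2012],
        "Actor": ["Sean Connery", "Sean Connery", "Pierce Brosnan", "Daniel Craig"],
        "Box Office": [448.8, 820.4, 518.5, 943.5],
    },
    index=pd.Index(["Dr. No", "Goldfinger", "GoldenEye", "Skyfall"], name="Film"),
)

two_rows = films.sample(n=2, random_state=0)   # fixed seed -> repeatable draw
same_rows = films.sample(n=2, random_state=0)  # same seed -> same rows
half = films.sample(frac=0.5, random_state=0)  # a fraction of the rows instead of a count
one_column = films.sample(n=1, axis="columns", random_state=0)  # a random column
```

Leaving out `random_state` restores the default behaviour of a fresh draw on every call.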

In [451]:
bond = pd.read_csv('jamesbond.csv', index_col='Film').sort_index()

In [452]:
# n random rows
bond.sample(n=5)

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
The Living Daylights,1987,Timothy Dalton,John Glen,313.5,68.8,5.2
The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [453]:
bond.sample(n=5, axis='rows')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,30.0
Tomorrow Never Dies,1997,Pierce Brosnan,Roger Spottiswoode,463.2,133.9,10.0


In [454]:
bond.sample(n=2, axis='columns').head()

Film,Actor,Budget
A View to a Kill,Roger Moore,54.5
Casino Royale,Daniel Craig,145.3
Casino Royale,David Niven,85.0
Diamonds Are Forever,Sean Connery,34.7
Die Another Day,Pierce Brosnan,154.2


## The nsmallest and nlargest Methods
- The `nlargest` method returns a specified number of rows with the largest values from a given column.
- The `nsmallest` method returns rows with the smallest values from a given column.
- The `nlargest` and `nsmallest` methods are more efficient than sorting the entire **DataFrame**.
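
A minimal check that `nlargest` gives the same rows as sorting and taking the head, assuming a hypothetical four-row `films` frame built inline from the module's values.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {"Box Office": [448.8, 820.4, 518.5, 943.5]},
    index=pd.Index(["Dr. No", "Goldfinger", "GoldenEye", "Skyfall"], name="Film"),
)

top_two = films.nlargest(n=2, columns="Box Office")
top_two_sorted = films.sort_values(by="Box Office", ascending=False).head(2)

# The Series versions need no column name.
bottom_two = films["Box Office"].nsmallest(2)
```

Both `top_two` and `top_two_sorted` list Skyfall first and Goldfinger second; the `nlargest` form simply avoids ordering the rows that are not needed.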

In [455]:
# The 4 films with the highest Box Office:
bond.sort_values(by='Box Office', ascending=False).head(4)

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0


In [456]:
bond.nlargest(n=4,columns='Box Office')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
No Time to Die,2021,Daniel Craig,Cary Joji Fukunaga,774.2,301.0,25.0


In [457]:
bond['Box Office'].nlargest(4)

Film
Skyfall           943.5
Thunderball       848.1
Goldfinger        820.4
No Time to Die    774.2
Name: Box Office, dtype: float64

In [458]:
bond.nsmallest(n=3,columns='Bond Actor Salary')
bond['Bond Actor Salary'].nsmallest(3)

Film
Dr. No                             0.6
On Her Majesty's Secret Service    0.6
From Russia with Love              1.6
Name: Bond Actor Salary, dtype: float64

## Filtering with the where Method
- Similar to square brackets or `loc`, the `where` method filters the original **DataFrame** with a Boolean Series.
- Unlike those, `where` keeps every row: pandas populates rows that do **not** match the criteria with `NaN` values.
- Leaving in the `NaN` values can be advantageous for certain merge and visualization operations.
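
The contrast between the two filtering styles is easiest to see when both are applied with the same Boolean Series. The sketch below assumes a hypothetical three-row `films` frame built inline from the module's values.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {
        "Actor": ["Sean Connery", "Pierce Brosnan", "Sean Connery"],
        "Box Office": [448.8, 518.5, 820.4],
    },
    index=pd.Index(["Dr. No", "GoldenEye", "Goldfinger"], name="Film"),
)

is_sean_connery = films["Actor"] == "Sean Connery"

filtered = films[is_sean_connery]      # drops the non-matching rows
masked = films.where(is_sean_connery)  # keeps every row; non-matches become NaN
```

Here `filtered` has two rows, while `masked` keeps all three, with the GoldenEye row filled with `NaN`. Calling `dropna` on `masked` leaves the same films that `filtered` contains.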

In [459]:
bond.head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9


In [461]:
is_sean_connery = bond['Actor'] == 'Sean Connery'
bond[is_sean_connery]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


In [477]:
bond[bond.where(is_sean_connery)['Bond Actor Salary'].notnull()].sort_values(by=['Director','Box Office'], ascending=[True,False])

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6


## The apply Method with DataFrames
- The `apply` method invokes a function on every column or every row in the **DataFrame**.
- Pass the uninvoked function as the first argument to the `apply` method.
- Pass the `axis` parameter an argument of `"columns"` to invoke the function on every row.
- Pandas will pass in the row's values as a **Series** object. We can use accessors like `loc` and `iloc` to extract the column's values for that row.
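
The effect of the `axis` argument can be sketched on a small numeric frame. This is an illustration under assumptions: the `films` frame is a hypothetical three-row excerpt built inline, and the `describe` function and its labels are invented for this example.

```python
import pandas as pd

# Hypothetical mini-dataset indexed by film title.
films = pd.DataFrame(
    {"Year": [1962, 1985, 2012], "Budget": [7.0, 54.5, 170.2]},
    index=pd.Index(["Dr. No", "A View to a Kill", "Skyfall"], name="Film"),
)

def describe(row):
    # row is a Series whose index holds the column names.
    decade = int(row.loc["Year"] // 10 * 10)
    size = "big" if row.loc["Budget"] > 100 else "modest"
    return f"{decade}s, {size} budget"

per_row = films.apply(describe, axis="columns")            # one call per row
per_column = films.apply(lambda column: column.max())      # default axis: one call per column
```

With `axis="columns"` the result has one entry per film; with the default axis it has one entry per column.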

In [483]:
bond['Actor'].apply(len).head()

Film
A View to a Kill        11
Casino Royale           12
Casino Royale           11
Diamonds Are Forever    12
Die Another Day         14
Name: Actor, dtype: int64

In [490]:
def rank_movie(row):
    year = row.loc['Year']
    actor = row.loc['Actor']
    budget = row.loc['Budget']
    if year >= 1980 and year < 1990:
        return "Great 80's flick"
    if actor == 'Pierce Brosnan':
        return 'the best Bond Ever'
    if budget > 100:
        return 'Expensive Movie, fun'
    return 'No comment'

bond.apply(rank_movie, axis='columns')

Film
A View to a Kill                       Great 80's flick
Casino Royale                      Expensive Movie, fun
Casino Royale                                No comment
Diamonds Are Forever                         No comment
Die Another Day                      the best Bond Ever
Dr. No                                       No comment
For Your Eyes Only                     Great 80's flick
From Russia with Love                        No comment
GoldenEye                            the best Bond Ever
Goldfinger                                   No comment
Licence to Kill                        Great 80's flick
Live and Let Die                             No comment
Moonraker                                    No comment
Never Say Never Again                  Great 80's flick
No Time to Die                     Expensive Movie, fun
Octopussy                              Great 80's flick
On Her Majesty's Secret Service              No comment
Quantum of Solace                  Expensiv

In [None]:
def rank_movie(row):
    year = row.loc['Year']
    actor = row.loc['Actor']
    budget = row.loc['Budget']
    string = ''
    if year >= 1980 and year < 1990:
        string += "Great 80's flick"
    if actor == 'Pierce Brosnan':
        string += 'the best Bond Ever'
    if budget > 100:
        string += 'Expensive Movie, fun'
    if len(string) > 0:
        return string
    else:
        return 'No comment'

# axis: {0 or 'index', 1 or 'columns'}, default 0. Use 'columns' to call the function once per row.
bond.apply(rank_movie, axis='columns')

Film
A View to a Kill                                         Great 80's flick
Casino Royale                                        Expensive Movie, fun
Casino Royale                                                  No comment
Diamonds Are Forever                                           No comment
Die Another Day                    the best Bond EverExpensive Movie, fun
Dr. No                                                         No comment
For Your Eyes Only                                       Great 80's flick
From Russia with Love                                          No comment
GoldenEye                                              the best Bond Ever
Goldfinger                                                     No comment
Licence to Kill                                          Great 80's flick
Live and Let Die                                               No comment
Moonraker                                                      No comment
Never Say Never Again            