#### Loading the libraries

In [4]:
import pandas as pd
from pathlib import Path

DATASETS_PATH = Path("../../../../datasets/")

#### Reading the CSV data and showing the first rows

In [5]:
df_imdb = pd.read_csv(DATASETS_PATH / "IMDB_Movie_3columns_with_index.csv")
df_imdb.head(3)

Unnamed: 0.1,Unnamed: 0,director_name,movie_title,title_year
0,0,James Cameron,Avatar \t,2009.0
1,1,Gore Verbinski,Pirates of the Caribbean: At World's End,2007.0
2,2,Sam Mendes,Spectre,2015.0


# Overview

Selecting from a DataFrame means picking a subset of the full data. A selection can:
* select a single column
* select multiple columns
* select some rows
* select some rows within some columns

(Image: the area of the DataFrame covered by each kind of selection.)

# . operator

```df.column_name```

Each column name can be treated as an attribute of the DataFrame, so a column can be accessed with the . operator, the same way an object's attribute is accessed.

In [6]:
# select the director_name column with the . operator
df_imdb.director_name  # equivalent to df_imdb["director_name"]

0           James Cameron
1          Gore Verbinski
2              Sam Mendes
3       Christopher Nolan
4             Doug Walker
              ...        
5038          Scott Smith
5039                  NaN
5040     Benjamin Roberds
5041          Daniel Hsia
5042             Jon Gunn
Name: director_name, Length: 5043, dtype: object
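
Attribute access is convenient, but it does not work for every column. The sketch below uses a small made-up frame (the `title year` and `count` columns are hypothetical, chosen only to show the edge cases) to illustrate two situations where only `[]` reaches the column.

```python
import pandas as pd

# Hypothetical toy frame; the column names are chosen to show the edge cases.
df = pd.DataFrame({
    "director_name": ["James Cameron", "Sam Mendes"],
    "title year": [2009, 2015],   # contains a space
    "count": [1, 2],              # clashes with the DataFrame.count method
})

# A name that is a valid identifier and clashes with nothing works both ways.
same = df.director_name.equals(df["director_name"])

# "title year" is not a valid Python identifier, so only [] can reach it.
years = df["title year"]

# df.count is the DataFrame method, not the column; [] returns the column.
count_is_method = callable(df.count)
count_column = df["count"]
```

For this reason `[]` is usually the safer default, and the . operator is best kept for quick interactive exploration.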

# ```[]``` operator
```df[column selector]```
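The column selector can be:
    * a single column name
    * a list of column names, written with ```[]```
    * a slice written with ```:``` (note that a slice inside ```[]``` selects rows, not columns)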
    

In [7]:
# select the 'movie_title' and 'director_name' columns with the [] operator
df_imdb[["movie_title", "director_name"]]

Unnamed: 0,movie_title,director_name
0,Avatar \t,James Cameron
1,Pirates of the Caribbean: At World's End,Gore Verbinski
2,Spectre,Sam Mendes
3,The Dark Knight Rises,Christopher Nolan
4,Star Wars: Episode VII - The Force Awakens ...,Doug Walker
...,...,...
5038,Signed Sealed Delivered,Scott Smith
5039,The Following,
5040,A Plague So Pleasant,Benjamin Roberds
5041,Shanghai Calling,Daniel Hsia
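
The type of the result depends on what is passed inside `[]`. A minimal sketch on a three-row toy frame built from the rows shown earlier (not the full dataset):

```python
import pandas as pd

# Toy frame with the same column names as df_imdb.
df = pd.DataFrame({
    "director_name": ["James Cameron", "Gore Verbinski", "Sam Mendes"],
    "movie_title": ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre"],
    "title_year": [2009.0, 2007.0, 2015.0],
})

one_column = df["movie_title"]                       # a single name gives a Series
one_column_frame = df[["movie_title"]]               # a one-item list gives a DataFrame
two_columns = df[["movie_title", "director_name"]]   # columns come back in the requested order
first_two_rows = df[0:2]                             # a slice selects rows, here rows 0 and 1
```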


# loc operator
```df.loc[[select rows by row label], [select columns by column label]]```
When a DataFrame is printed:
* the topmost row contains the column labels (the column index)
* the leftmost column contains the row labels (the row index)
* the selector for rows or for columns can be
    * a single label
    * a list of labels, written with ```[]```
    * a slice written with ```:``` (both endpoints are included)

In [8]:
# select the row with label 0 and the column with label 'director_name'
df_imdb.loc[0, "director_name"]

'James Cameron'

In [9]:
# select rows labeled 0 through 2 (the slice end is included) in the 2 columns 'movie_title' and 'director_name'
df_imdb.loc[0:2, ["movie_title", "director_name"]]

Unnamed: 0,movie_title,director_name
0,Avatar \t,James Cameron
1,Pirates of the Caribbean: At World's End,Gore Verbinski
2,Spectre,Sam Mendes
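
The three kinds of `loc` selector can be combined freely for rows and columns. The sketch below uses a three-row toy frame with the same column names as `df_imdb`; note that label slices include their end label, for columns as well as for rows.

```python
import pandas as pd

# Toy frame with the default row labels 0, 1, 2.
df = pd.DataFrame({
    "director_name": ["James Cameron", "Gore Verbinski", "Sam Mendes"],
    "movie_title": ["Avatar", "Pirates of the Caribbean: At World's End", "Spectre"],
    "title_year": [2009.0, 2007.0, 2015.0],
})

# Single row label and single column label: returns one value.
cell = df.loc[0, "director_name"]

# Label slices on both axes: rows 0..1 and columns director_name..movie_title, ends included.
block = df.loc[0:1, "director_name":"movie_title"]

# Lists of labels on both axes: always returns a DataFrame.
picked = df.loc[[0, 2], ["title_year"]]
```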


# iloc operator

```df.iloc[[select rows by their position], [select columns by their position]]```

* imagine each row is assigned a number: the first row gets 0 and the last row gets (total number of rows - 1)
* imagine each column is assigned a number: the first column gets 0 and the last column gets (total number of columns - 1)
* a selector can be:
    * a single number
    * a list of numbers
    * a slice written with the : operator (the end position is excluded)
    

In [10]:
# select the value in the third row, second column
df_imdb.iloc[2, 1]

'Sam Mendes'

In [11]:
# select the first and third rows
# in the first and second columns, using lists of positions
df_imdb.iloc[[0, 2], [0, 1]]

Unnamed: 0.1,Unnamed: 0,director_name
0,0,James Cameron
2,2,Sam Mendes
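
With the default index, row labels and row positions are the same numbers, which hides the difference between `loc` and `iloc`. The sketch below uses a toy frame with the hypothetical row labels 10, 20, 30 so that the two can be told apart.

```python
import pandas as pd

# Hypothetical toy frame: row labels 10, 20, 30 differ from positions 0, 1, 2.
df = pd.DataFrame(
    {
        "director_name": ["James Cameron", "Gore Verbinski", "Sam Mendes"],
        "title_year": [2009.0, 2007.0, 2015.0],
    },
    index=[10, 20, 30],
)

by_position = df.iloc[0, 0]                # first row, first column
by_label = df.loc[10, "director_name"]     # the same cell, addressed by labels

iloc_slice = df.iloc[0:2]   # positions 0 and 1; the end position is excluded
loc_slice = df.loc[10:20]   # labels 10 and 20; the end label is included

last_row = df.iloc[-1]      # negative positions count from the end
```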


## Indexing

In [12]:
df_imdb.index

RangeIndex(start=0, stop=5043, step=1)

In [13]:
df_imdb.columns

Index(['Unnamed: 0', 'director_name', 'movie_title', 'title_year'], dtype='object')

In [14]:
# the index is not counted as a column in the shape
df_imdb.shape

(5043, 4)

What are indexes for?

    - Identification
    - Selection
    - Alignment
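
Alignment is the least obvious of the three. When two Series are combined, pandas matches values by index label, not by position. A minimal sketch with made-up numbers (the values carry no meaning and are not taken from the dataset):

```python
import pandas as pd

# Two Series whose labels overlap only partly and appear in a different order.
a = pd.Series([1, 2], index=["Avatar", "Spectre"])
b = pd.Series([10, 20, 30], index=["Spectre", "Avatar", "Tangled"])

# Values are paired by label; a label missing on one side produces NaN.
total = a + b
```

Here `total["Avatar"]` is 1 + 20 and `total["Spectre"]` is 2 + 10, while `total["Tangled"]` is NaN because `a` has no such label.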

In [15]:
df_imdb.head()

Unnamed: 0.1,Unnamed: 0,director_name,movie_title,title_year
0,0,James Cameron,Avatar \t,2009.0
1,1,Gore Verbinski,Pirates of the Caribbean: At World's End,2007.0
2,2,Sam Mendes,Spectre,2015.0
3,3,Christopher Nolan,The Dark Knight Rises,2012.0
4,4,Doug Walker,Star Wars: Episode VII - The Force Awakens ...,


In [16]:
df_imdb.set_index("movie_title", inplace=True)

df_imdb.head()

movie_title,Unnamed: 0,director_name,title_year
Avatar \t,0,James Cameron,2009.0
Pirates of the Caribbean: At World's End,1,Gore Verbinski,2007.0
Spectre,2,Sam Mendes,2015.0
The Dark Knight Rises,3,Christopher Nolan,2012.0
Star Wars: Episode VII - The Force Awakens,4,Doug Walker,


In [17]:
df_imdb.index

Index(['Avatar \t', 'Pirates of the Caribbean: At World's End ', 'Spectre ',
       'The Dark Knight Rises ',
       'Star Wars: Episode VII - The Force Awakens             ',
       'John Carter ', 'Spider-Man 3 ', 'Tangled ', 'Avengers: Age of Ultron ',
       'Harry Potter and the Half-Blood Prince ',
       ...
       'Primer ', 'Cavite ', 'El Mariachi ', 'The Mongol King ', 'Newlyweds ',
       'Signed Sealed Delivered ', 'The Following             ',
       'A Plague So Pleasant ', 'Shanghai Calling ', 'My Date with Drew '],
      dtype='object', name='movie_title', length=5043)
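
The labels printed above carry trailing whitespace, so a lookup with the clean title would fail. One possible way to handle this, sketched on a two-row toy frame rather than the real data, is to strip the labels with `Index.str.strip()` before selecting by label.

```python
import pandas as pd

# Toy frame whose index labels have trailing whitespace, as in the output above.
df = pd.DataFrame(
    {"director_name": ["James Cameron", "Sam Mendes"]},
    index=pd.Index(["Avatar \t", "Spectre "], name="movie_title"),
)

# A lookup needs the exact label, trailing whitespace included.
found_before = "Spectre" in df.index

# Remove leading and trailing whitespace from every label.
df.index = df.index.str.strip()

found_after = "Spectre" in df.index
director = df.loc["Spectre", "director_name"]
```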

In [18]:
df_imdb["director_name"]

movie_title
Avatar \t                                                      James Cameron
Pirates of the Caribbean: At World's End                      Gore Verbinski
Spectre                                                           Sam Mendes
The Dark Knight Rises                                      Christopher Nolan
Star Wars: Episode VII - The Force Awakens                       Doug Walker
                                                                 ...        
Signed Sealed Delivered                                          Scott Smith
The Following                                                            NaN
A Plague So Pleasant                                        Benjamin Roberds
Shanghai Calling                                                 Daniel Hsia
My Date with Drew                                                   Jon Gunn
Name: director_name, Length: 5043, dtype: object

In [19]:
df_imdb["director_name"].sort_index()

movie_title
#Horror                               Tara Subkoff
10 Cloverfield Lane               Dan Trachtenberg
10 Days in a Madhouse                Timothy Hines
10 Things I Hate About You              Gil Junger
10,000 B.C.                    Christopher Barnard
                                      ...         
[Rec]                              Jaume Balagueró
eXistenZ                          David Cronenberg
xXx: State of the Union               Lee Tamahori
xXx                                      Rob Cohen
Æon Flux                              Karyn Kusama
Name: director_name, Length: 5043, dtype: object
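
A sorted index also makes label slicing more flexible: on a sorted index the slice bounds do not have to be labels that actually exist. A minimal sketch on a five-row toy Series built from titles shown earlier (with whitespace already removed):

```python
import pandas as pd

directors = pd.Series(
    ["Sam Mendes", "James Cameron", "Christopher Nolan", "Daniel Hsia", "Benjamin Roberds"],
    index=pd.Index(
        ["Spectre", "Avatar", "The Dark Knight Rises", "Shanghai Calling", "A Plague So Pleasant"],
        name="movie_title",
    ),
)

# Sort the rows by index label (ascending by default).
by_title = directors.sort_index()

# "S" and "T" are not labels, but the slice still works on a sorted index:
# it keeps every label between "S" and "T" in sort order.
s_titles = by_title.loc["S":"T"]

# Reverse order is available through the ascending parameter.
descending = directors.sort_index(ascending=False)
```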