# Selecting Columns From a DataFrame
By:<a href='https://www.youtube.com/wonkyCode'>WonkyCode</a>

In [1]:
import pandas as pd

dataset = pd.read_csv("datasets/netflix.csv")

In [2]:
dataset.head(3)

| | show id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra... |
| 1 | 80117401 | Movie | Jandino: Whatever it Takes | NaN | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra... |
| 2 | 70234439 | TV Show | Transformers Prime | NaN | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob... |


**Selecting One Column:**

In [3]:
dataset.title   # dot notation: title is a column name, not a built-in DataFrame attribute

0           Norm of the North: King Sized Adventure
1                        Jandino: Whatever it Takes
2                                Transformers Prime
3                  Transformers: Robots in Disguise
4                                      #realityhigh
                           ...                     
6229                                   Red vs. Blue
6230                                          Maron
6231         Little Baby Bum: Nursery Rhyme Friends
6232    A Young Doctor's Notebook and Other Stories
6233                                        Friends
Name: title, Length: 6234, dtype: object

In [4]:
dataset.release_year

0       2019
1       2016
2       2013
3       2016
4       2017
        ... 
6229    2015
6230    2016
6231    2016
6232    2013
6233    2003
Name: release_year, Length: 6234, dtype: int64

* Dot notation doesn't always work, though: it fails when a column name contains spaces or clashes with an existing DataFrame attribute or method.

In [5]:
dataset["title"] # bracket notation works even when the column name contains spaces

0           Norm of the North: King Sized Adventure
1                        Jandino: Whatever it Takes
2                                Transformers Prime
3                  Transformers: Robots in Disguise
4                                      #realityhigh
                           ...                     
6229                                   Red vs. Blue
6230                                          Maron
6231         Little Baby Bum: Nursery Rhyme Friends
6232    A Young Doctor's Notebook and Other Stories
6233                                        Friends
Name: title, Length: 6234, dtype: object

In [6]:
dataset["director"]

0       Richard Finn, Tim Maltby
1                            NaN
2                            NaN
3                            NaN
4               Fernando Lebrija
                  ...           
6229                         NaN
6230                         NaN
6231                         NaN
6232                         NaN
6233                         NaN
Name: director, Length: 6234, dtype: object

In [7]:
dataset["show id"]

0       81145628
1       80117401
2       70234439
3       80058654
4       80125979
          ...   
6229    80000063
6230    70286564
6231    80116008
6232    70281022
6233    70153404
Name: show id, Length: 6234, dtype: int64

In [8]:
dataset.show id # dot notation doesn't work when the column name contains a space

SyntaxError: invalid syntax (<ipython-input-8-a69452e18d60>, line 1)
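
Spaces are not the only case where dot notation lets you down. The sketch below uses a tiny made-up DataFrame (not the Netflix file) with a hypothetical column called `count`, which happens to share its name with a DataFrame method. It is only meant to illustrate why bracket notation is the safer habit.

```python
import pandas as pd

# Tiny stand-in DataFrame with made-up values (not the real Netflix data).
df = pd.DataFrame({
    "show id": [1, 2, 3],
    "title": ["A", "B", "C"],
    "count": [10, 20, 30],  # hypothetical column that shares its name with DataFrame.count
})

# Bracket notation returns the column in both cases.
show_ids = df["show id"]
counts = df["count"]

# Dot notation picks up the DataFrame.count method here, not the column.
dot_count = df.count

print(type(counts).__name__)  # Series
print(callable(dot_count))    # True, because it is a method
```

With brackets you get the column whatever it is called, so you never have to remember which names are already taken by pandas.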

In [9]:
type(dataset["director"])

pandas.core.series.Series

In [10]:
dataset["director"].head()

0    Richard Finn, Tim Maltby
1                         NaN
2                         NaN
3                         NaN
4            Fernando Lebrija
Name: director, dtype: object
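
A single column name in brackets gives back a Series, as the `type(...)` call above shows. If you want a one-column DataFrame instead, you can pass a list that holds just one name. Here is a minimal sketch on a small made-up DataFrame (the values are placeholders, not the real dataset):

```python
import pandas as pd

# Small made-up DataFrame standing in for the Netflix data.
df = pd.DataFrame({
    "title": ["A", "B"],
    "director": ["X", None],
})

as_series = df["director"]    # single brackets with a string
as_frame = df[["director"]]   # brackets around a one-item list

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (2, 1)
```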

**Selecting Two or More Columns:**

In [11]:
dataset[ ["title", "director"] ]

| | title | director |
|---|---|---|
| 0 | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby |
| 1 | Jandino: Whatever it Takes | NaN |
| 2 | Transformers Prime | NaN |
| 3 | Transformers: Robots in Disguise | NaN |
| 4 | #realityhigh | Fernando Lebrija |
| ... | ... | ... |
| 6229 | Red vs. Blue | NaN |
| 6230 | Maron | NaN |
| 6231 | Little Baby Bum: Nursery Rhyme Friends | NaN |
| 6232 | A Young Doctor's Notebook and Other Stories | NaN |


In [12]:
dataset[ ["director", "title"] ] # the result follows the order of the list, so you can reorder columns

| | director | title |
|---|---|---|
| 0 | Richard Finn, Tim Maltby | Norm of the North: King Sized Adventure |
| 1 | NaN | Jandino: Whatever it Takes |
| 2 | NaN | Transformers Prime |
| 3 | NaN | Transformers: Robots in Disguise |
| 4 | Fernando Lebrija | #realityhigh |
| ... | ... | ... |
| 6229 | NaN | Red vs. Blue |
| 6230 | NaN | Maron |
| 6231 | NaN | Little Baby Bum: Nursery Rhyme Friends |
| 6232 | NaN | A Young Doctor's Notebook and Other Stories |


In [13]:
dataset[ ["title", "director"] ].head(3)

| | title | director |
|---|---|---|
| 0 | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby |
| 1 | Jandino: Whatever it Takes | NaN |
| 2 | Transformers Prime | NaN |


In [14]:
dataset[ ["title", "director"] ].tail()

| | title | director |
|---|---|---|
| 6229 | Red vs. Blue | NaN |
| 6230 | Maron | NaN |
| 6231 | Little Baby Bum: Nursery Rhyme Friends | NaN |
| 6232 | A Young Doctor's Notebook and Other Stories | NaN |
| 6233 | Friends | NaN |


In [17]:
columns = ["show id", "title", "crew"]
dataset[columns]

KeyError: "['crew'] not in index"
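
The `KeyError` appears because `crew` is not one of the column names. If you are not sure whether every name in your list exists, one possible approach is to check the list against `df.columns` first. This is just a sketch on a made-up DataFrame; the names `wanted`, `missing` and `present` are mine, not part of pandas.

```python
import pandas as pd

# Made-up DataFrame with the same kind of column names as the Netflix data.
df = pd.DataFrame({
    "show id": [1, 2],
    "title": ["A", "B"],
    "director": ["X", None],
})

wanted = ["show id", "title", "crew"]

# Split the requested names into those that exist and those that don't.
missing = [name for name in wanted if name not in df.columns]
present = [name for name in wanted if name in df.columns]

print(missing)        # ['crew']
subset = df[present]  # safe: only existing columns are requested
print(list(subset.columns))  # ['show id', 'title']
```

Whether you skip the missing names or stop and fix the typo depends on your use case. Printing `missing` at least tells you which names to look at.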

In [19]:
columns = ["show id", "title", "director"]
dataset[columns].head(3)

| | show id | title | director |
|---|---|---|---|
| 0 | 81145628 | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby |
| 1 | 80117401 | Jandino: Whatever it Takes | NaN |
| 2 | 70234439 | Transformers Prime | NaN |


**Challenge:**
###### <span style='color: green'>When selecting columns, instead of listing comma-separated column names, use slicing to select three or more columns</span> and let me know in the comment section below.
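
If you want a hint before trying it yourself, here is one possible direction, shown on a small made-up DataFrame rather than the Netflix file. It assumes you use the `.loc` and `.iloc` indexers: a label slice with `.loc` includes both end columns, while a position slice with `.iloc` excludes the end position.

```python
import pandas as pd

# Made-up DataFrame whose column order mimics the first few Netflix columns.
df = pd.DataFrame({
    "show id": [1, 2],
    "type": ["Movie", "TV Show"],
    "title": ["A", "B"],
    "director": ["X", None],
    "cast": ["P", "Q"],
    "country": ["India", "Japan"],
})

# Slice by label: both "title" and "cast" are included.
by_label = df.loc[:, "title":"cast"]

# Slice by position: columns 2, 3 and 4 (position 5 is excluded).
by_position = df.iloc[:, 2:5]

print(list(by_label.columns))     # ['title', 'director', 'cast']
print(list(by_position.columns))  # ['title', 'director', 'cast']
```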

**Useful References:**
* https://www.geeksforgeeks.org/how-to-select-multiple-columns-in-a-pandas-dataframe/