# Selecting Subsets of Data from a Series

Selecting subsets of data from a Series is accomplished similarly to how it's done with DataFrames. 

## Series indexer rules

The same three indexers, `[]`, `loc`, and `iloc`, are available for a Series. Because a Series has no columns, the rules for each indexer differ slightly from those for a DataFrame. Let's begin by reading in the movie dataset and setting the index to the title.

```python
import pandas as pd
movie = pd.read_csv('movie.csv', index_col='title')
movie.head(3)
```

| title | year | color | content_rating | duration | director_name | director_fb | actor1 | actor1_fb | actor2 | actor2_fb | ... | actor3_fb | gross | genres | num_reviews | num_voted_users | plot_keywords | language | country | budget | imdb_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Avatar | 2009.0 | Color | PG-13 | 178.0 | James Cameron | 0.0 | CCH Pounder | 1000.0 | Joel David Moore | 936.0 | ... | 855.0 | 760505847.0 | Action\|Adventure\|Fantasy\|Sci-Fi | 723.0 | 886204 | avatar\|future\|marine\|native\|paraplegic | English | USA | 237000000.0 | 7.9 |
| Pirates of the Caribbean: At World's End | 2007.0 | Color | PG-13 | 169.0 | Gore Verbinski | 563.0 | Johnny Depp | 40000.0 | Orlando Bloom | 5000.0 | ... | 1000.0 | 309404152.0 | Action\|Adventure\|Fantasy | 302.0 | 471220 | goddess\|marriage ceremony\|marriage proposal\|pi... | English | USA | 300000000.0 | 7.1 |
| Spectre | 2015.0 | Color | PG-13 | 148.0 | Sam Mendes | 0.0 | Christoph Waltz | 11000.0 | Rory Kinnear | 393.0 | ... | 161.0 | 200074175.0 | Action\|Adventure\|Thriller | 602.0 | 275868 | bomb\|espionage\|sequel\|spy\|terrorist | English | UK | 245000000.0 | 6.8 |


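As a quick preview, chaining two sets of brackets works as follows. The first set selects the `imdb_score` column as a Series. The second set selects the value labeled `Spectre` from that Series:

```python
movie['imdb_score']['Spectre']
```

```
6.8
```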

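To work with a Series, we need to select a single column of data, here the `imdb_score` column. Be careful with the syntax. Passing the column name inside a list, as in the next cell, returns a one-column DataFrame rather than a Series.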

```python
imdb = movie[['imdb_score']]
imdb.head(3)
```

| title | imdb_score |
|---|---|
| Avatar | 7.9 |
| Pirates of the Caribbean: At World's End | 7.1 |
| Spectre | 6.8 |


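At this point `imdb` is a DataFrame. We therefore have to select the column again before we can select a value by label:

```python
imdb['imdb_score']['Avatar']
```

```
7.9
```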

Passing the column name by itself, without a list, returns the Series we want:

```python
imdb = movie['imdb_score']
imdb.head(3)
```

```
title
Avatar                                      7.9
Pirates of the Caribbean: At World's End    7.1
Spectre                                     6.8
Name: imdb_score, dtype: float64
```
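You can check the difference between a single label and a one-item list on a small DataFrame. The sketch below uses a made-up three-row stand-in for the movie data (titles shortened, not loaded from `movie.csv`):

```python
import pandas as pd

# Hypothetical, hand-built stand-in for the movie dataset (titles shortened)
df = pd.DataFrame(
    {'imdb_score': [7.9, 7.1, 6.8], 'year': [2009, 2007, 2015]},
    index=pd.Index(['Avatar', 'Pirates', 'Spectre'], name='title'),
)

# A single column label inside the brackets returns a Series
one_col = df['imdb_score']
print(type(one_col).__name__)      # Series

# A list containing one label returns a one-column DataFrame
one_col_df = df[['imdb_score']]
print(type(one_col_df).__name__)   # DataFrame
```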

```python
imdb['Avatar']
```

```
7.9
```

```python
imdb[0]
```

```
7.9
```

### Series subset selection with just the brackets

For DataFrames, we learned that *just the brackets* accepted either a single label or a list of labels and used this input to select one or more DataFrame columns. For a Series, *just the brackets* has different rules that you must follow to use it correctly. It allows selection by index label. For instance, we can select the `imdb_score` for the movie Avatar like this:

```python
imdb['Avatar']
```

```
7.9
```

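Perhaps surprisingly, *just the brackets* also accepts an integer location. The movie Avatar is at integer location 0, so `imdb[0]` returns its score. Recent versions of pandas (2.1 and later) emit a `FutureWarning` for this positional fallback and announce that integer keys will eventually always be treated as labels.

```python
imdb[0]
```

```
7.9
```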

## Use `loc` and `iloc` instead of just the brackets

For a Series, *just the brackets* is flexible and accepts either a label or an integer location. This might make `loc` and `iloc` seem unnecessary, but the opposite is true. *Just the brackets* is ambiguous: reading `imdb[0]`, it isn't clear whether a label or an integer location is being used. Worse, if the index itself holds integers, `[]` treats an integer as a label. The same expression can then select a different value depending on the index.

I suggest only using `loc` and `iloc` for clarity. Whenever the `loc` indexer is used, we are certain it selects by label. Likewise, whenever the `iloc` indexer is used, we are certain it selects by integer location.
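Here is a minimal sketch of that ambiguity. It uses a hypothetical Series whose integer index labels are in reverse order, so label 0 and position 0 point to different values:

```python
import pandas as pd

# Hypothetical Series whose index labels are integers in reverse order
s = pd.Series(['a', 'b', 'c'], index=[2, 1, 0])

# With an integer index, [] treats the integer as a LABEL
print(s[0])        # c  (the value labeled 0)

# loc is always label-based; iloc is always position-based
print(s.loc[0])    # c
print(s.iloc[0])   # a  (the value at the first position)
```

With `loc` and `iloc`, the intent is visible in the code no matter what kind of index the Series has.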

### Select a single value with `loc`

Select a single value by passing an index label to the `loc` indexer. Here, we select the `imdb_score` of the movie Forrest Gump. When selecting a single value, only the scalar value is returned, not a Series.

```python
imdb.loc['Forrest Gump']
```

```
8.8
```

### Select multiple values using a list with `loc`

Provide the `loc` indexer a list of index labels to select multiple values. This will return a Series.

```python
names = ['Good Will Hunting', 'Home Alone', 'Meet the Parents']
imdb.loc[names]
```

```
title
Good Will Hunting    8.3
Home Alone           7.5
Meet the Parents     7.0
Name: imdb_score, dtype: float64
```
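The sketch below shows two properties of list selection on a small hypothetical Series built from a few of the scores above. The result follows the order of the labels in the list. Since pandas 1.0, a label that is missing from the index raises a `KeyError` instead of silently producing a missing value:

```python
import pandas as pd

# Hypothetical Series reusing a few scores from the movie data
scores = pd.Series(
    [8.3, 7.5, 7.0],
    index=['Good Will Hunting', 'Home Alone', 'Meet the Parents'],
    name='imdb_score',
)

# A list of labels returns a Series in the order the labels were given
subset = scores.loc[['Meet the Parents', 'Good Will Hunting']]
print(subset)

# Any label missing from the index raises a KeyError
missing_raised = False
try:
    scores.loc[['Home Alone', 'Not A Movie']]
except KeyError as err:
    missing_raised = True
    print('KeyError:', err)
```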

### Select multiple values using slice notation with `loc`

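Provide the `loc` indexer index labels as the start and stop components of slice notation to select all of the values between those two labels. Unlike a standard Python slice, the result is **inclusive** of the stop label. The slice follows the order of the index as stored, not alphabetical order. That is why `3 Men and a Baby` appears between `Home Alone` and `Top Gun` in the following result.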

```python
imdb.loc['Home Alone':'Top Gun']
```

```
title
Home Alone          7.5
3 Men and a Baby    5.9
Tootsie             7.4
Top Gun             6.9
Name: imdb_score, dtype: float64
```
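The following sketch rebuilds the labels and scores from the output above as a small hypothetical Series in the same non-alphabetical order. It confirms that a label slice uses each label's position in the index and includes the stop label:

```python
import pandas as pd

# Hypothetical Series with the same (non-alphabetical) order as the movie data
s = pd.Series(
    [7.5, 5.9, 7.4, 6.9, 6.0],
    index=['Home Alone', '3 Men and a Baby', 'Tootsie', 'Top Gun', 'Twins'],
)

# Label slicing runs from the start label's position to the stop label's
# position in the index, and the stop label is included
window = s.loc['Home Alone':'Top Gun']
print(window.index.tolist())
# ['Home Alone', '3 Men and a Baby', 'Tootsie', 'Top Gun']
```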

As with any slice notation, all components are optional. Here, we select every `imdb_score` from the movie Twins to the end.

```python
imdb.loc['Twins':].head()
```

```
title
Twins                      6.0
Scream: The TV Series      7.3
The Yellow Handkerchief    6.8
The Color Purple           7.8
Tidal Wave                 5.7
Name: imdb_score, dtype: float64
```

In this example, we select every 300th `imdb_score` from the movie Twins through the end of the Series.

```python
imdb.loc['Twins'::300]
```

```
title
Twins                                                    6.0
Ernest & Celestine                                       7.9
Welcome to the Rileys                                    7.0
Alpha and Omega 4: The Legend of the Saw Toothed Cave    6.0
Fast Times at Ridgemont High                             7.2
Young Frankenstein                                       8.0
Neal 'N' Nikki                                           3.3
Rise of the Entrepreneur: The Search for a Better Way    8.2
Name: imdb_score, dtype: float64
```
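Here is a small sketch of the same start-label-plus-step pattern on a hypothetical Series labeled `'a'` through `'j'`, where the positions are easy to check by eye:

```python
import pandas as pd

# Hypothetical Series: value i is stored under the i-th letter
s = pd.Series(range(10), index=list('abcdefghij'))

# Start at label 'c' (position 2), run to the end, keep every 3rd value
stepped = s.loc['c'::3]
print(stepped.index.tolist())   # ['c', 'f', 'i']
print(stepped.tolist())         # [2, 5, 8]
```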

### Select a single value with `iloc`

Let's select the `imdb_score` for the movie with integer location 499.

```python
imdb.iloc[499]
```

```
4.2
```

Selecting with a single integer always returns the value by itself and not within a Series. If we want to return a one-item Series, so that we can see the index, we can use a one-item list as our selection.

```python
imdb.iloc[[499]]
```

```
title
A Sound of Thunder    4.2
Name: imdb_score, dtype: float64
```
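The sketch below contrasts the two return types on a hypothetical three-item Series that reuses scores shown in this section:

```python
import pandas as pd

# Hypothetical Series reusing three scores from the movie data
s = pd.Series(
    [4.2, 7.6, 6.3],
    index=['A Sound of Thunder', 'The Abyss', 'Blades of Glory'],
)

value = s.iloc[0]        # a bare scalar; the index label is not kept
one_item = s.iloc[[0]]   # a one-item Series that keeps its index label
print(value)
print(one_item)
```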

### Select multiple values using a list with `iloc`

Provide `iloc` a list of integer locations to select multiple values.

```python
ints = [499, 599, 699]
imdb.iloc[ints]
```

```
title
A Sound of Thunder    4.2
The Abyss             7.6
Blades of Glory       6.3
Name: imdb_score, dtype: float64
```

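To pick one value out of that result by position, chain a second `iloc` rather than using *just the brackets*:

```python
ints = [499, 599, 699]
# Select three values by position, then take the third (position 2) of those
imdb.iloc[ints].iloc[2]
```

```
6.3
```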

### Select multiple values using slice notation with `iloc`

Provide `iloc` with slice notation using integers as the start and stop components to select all the values between those two locations. As with Python lists, the result is **exclusive** of the stop integer. Here, we select integer locations 145 up to, but not including, 148.

```python
imdb.iloc[145:148]
```

```
title
Mr. Peabody & Sherman                 6.9
Troy                                  7.2
Madagascar 3: Europe's Most Wanted    6.9
Name: imdb_score, dtype: float64
```
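To make the inclusive/exclusive difference concrete, the sketch below slices the same hypothetical five-item Series both ways:

```python
import pandas as pd

# Hypothetical Series with simple letter labels
s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

# iloc excludes the stop position, like a Python list slice
by_position = s.iloc[1:3]      # positions 1 and 2
# loc includes the stop label
by_label = s.loc['b':'d']      # labels 'b', 'c', and 'd'

print(by_position.index.tolist())  # ['b', 'c']
print(by_label.index.tolist())     # ['b', 'c', 'd']
```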

Let's select the last three values using slice notation.

```python
imdb.iloc[-3:]
```

```
title
A Plague So Pleasant    6.3
Shanghai Calling        6.3
My Date with Drew       6.6
Name: imdb_score, dtype: float64
```

Let's select every 200th value from integer location 1,000 up to, but not including, 2,000.

```python
imdb.iloc[1000:2000:200]
```

```
title
The Life Aquatic with Steve Zissou    7.3
Ride Along 2                          5.9
Trainwreck                            6.3
Down to Earth                         5.4
The Duchess                           6.9
Name: imdb_score, dtype: float64
```
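`iloc` slices behave exactly like Python list slices, so you can check which positions a slice selects by storing each position as its own value. The sketch below does that with a hypothetical 2,500-item Series. It is not the movie data, just a way to read off the selected locations:

```python
import pandas as pd

# Hypothetical Series whose values equal their integer positions
positions = list(range(2500))
s = pd.Series(positions)

last_three = s.iloc[-3:].tolist()              # [2497, 2498, 2499]
every_200th = s.iloc[1000:2000:200].tolist()   # [1000, 1200, 1400, 1600, 1800]

print(last_three)
print(every_200th)
```

The second result has five entries, the same number of movies returned by `imdb.iloc[1000:2000:200]`, because location 2,000 itself is excluded.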