# 7. Selecting Subsets of Data - Series


### Objectives

+ Learn how to select subsets from a Series using `.loc` and `.iloc`


### Overview
This notebook will teach you how to select a subset of data from a Series.

# Using Dot Notation to Select a Column as a Series
Previously, we learned how to use *just the brackets* to select a single column as a Series. Another common way to do this uses dot notation: write the name of your DataFrame, then a dot, then the column name.

Let's read in the movie dataset, set the index as the title and then select the year with dot notation.

In [None]:
import pandas as pd
movie = pd.read_csv('../data/movie.csv', index_col='title')
movie.head()

In [None]:
movie.year.head()

### I don't recommend doing this
Although this is valid Pandas syntax, I don't recommend using this notation for the following reasons:
* You cannot select columns with spaces in them
* You cannot select columns that have the same name as a Pandas method such as **`count`**

Using *just the brackets* **always** works, so I recommend doing the following instead:

In [None]:
movie['year'].head()
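Both limitations of dot notation can be reproduced with a small DataFrame. The sketch below uses a made-up DataFrame (`df` and its column names `count` and `box office` are assumptions for illustration, not part of the movie dataset). Dot notation on `count` returns the DataFrame method rather than the column, while the brackets return the column in both cases.

```python
import pandas as pd

# Hypothetical toy DataFrame; the column names are chosen only to illustrate the two cases
df = pd.DataFrame({'count': [3, 7], 'box office': [1.5, 2.5]})

# Dot notation finds the DataFrame method named count, not the column
dot_result = df.count
is_method = callable(dot_result)

# Just the brackets return the column as a Series in both cases
count_col = df['count']
box_col = df['box office']  # df.box office would be a syntax error

print(is_method)
print(count_col.tolist())
print(box_col.tolist())
```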

### Why even know about this?
Pandas code is written differently by different people, and you will definitely see this syntax around, so it's important to be aware of it.

It also has the advantage of providing tab-completion help when chaining a method to the end. Place your cursor at the end of each of the following two lines and press tab. In many environments, only the one that selects via dot notation will show the available methods. This helps me remember what methods are possible, so sometimes I will use this to find the method I need and then change the syntax back to the brackets.

In [None]:
# place your cursor after the dot and press tab
movie.year.

In [None]:
# place your cursor after the dot and press tab
movie['year'].

# Selecting Subsets of Data From a Series
Selecting subsets of data from a Series is very similar to selecting them from a DataFrame. Since there are no columns in a Series, there isn't a need to use *just the brackets*. Instead, you can do all of your subset selection with **`.loc`** and **`.iloc`**.

Let's select the column for IMDB scores as a Series and output the head.

In [None]:
imdb = movie['imdb_score']
imdb.head()

### Selection with a scalar, a list, and a slice
Just like with a DataFrame, both **`.loc`** and **`.iloc`** accept either a single scalar, a list, or a slice.

Let's select the movie IMDB score for 'Forrest Gump':

In [None]:
imdb.loc['Forrest Gump']

Select both 'Forrest Gump' and 'Avatar'. Notice that a Series is returned.

In [None]:
imdb.loc[['Forrest Gump', 'Avatar']]

Select every 100th movie from 'Avatar' to 'Forrest Gump':

In [None]:
imdb.loc['Avatar':'Forrest Gump':100]

### Repeat with `.iloc`
Select a single score

In [None]:
imdb.iloc[10]

Select multiple scores with a list

In [None]:
imdb.iloc[[10, 20, 30]]

Select multiple scores with a slice

In [None]:
imdb.iloc[3000:3050:10]
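One difference between the two indexers is worth keeping in mind when slicing: a `.loc` slice includes its stop label, while an `.iloc` slice excludes its stop position, just like a Python list. The sketch below shows this on a small hand-built Series (the name `scores` and its values are made up for illustration and do not come from the movie dataset).

```python
import pandas as pd

# Hypothetical scores; the values are invented for this example
scores = pd.Series(
    [7.9, 8.8, 7.1, 6.5],
    index=['Avatar', 'Forrest Gump', 'Spectre', 'Tangled'],
)

# Label slice: the stop label 'Spectre' IS included
by_label = scores.loc['Avatar':'Spectre']

# Integer-location slice: the stop position 2 is NOT included
by_position = scores.iloc[0:2]

print(by_label)
print(by_position)
```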

### Trouble with *just the brackets*
You can use just the brackets to make the same selections as above. See the following examples:

In [None]:
imdb['Forrest Gump']

In [None]:
imdb['Avatar':'Forrest Gump':100]

In [None]:
imdb[[10, 20, 30]]

In [None]:
imdb[3000:3050:10]

# Can you spot the problem?
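The major issue is that using *just the brackets* is **ambiguous** and not **explicit**. We don't know if we are selecting by label or by integer location. Recent pandas releases have also deprecated treating an integer inside the brackets as a position when the index holds labels, so code written this way may warn or fail depending on your version. With **`.loc`** and **`.iloc`**, it is clear exactly what our intentions are. I suggest using **`.loc`** and **`.iloc`** for clarity.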
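The ambiguity is easiest to see when the index itself holds integers. In the sketch below, `s` is a made-up Series whose integer labels deliberately do not match their positions. With an integer index, *just the brackets* select by label, which a reader cannot tell from the code alone.

```python
import pandas as pd

# Hypothetical Series: the labels 2, 0, 1 are NOT in positional order
s = pd.Series(['a', 'b', 'c'], index=[2, 0, 1])

by_brackets = s[2]   # treated as the label 2 because the index holds integers
by_loc = s.loc[2]    # explicitly the label 2
by_iloc = s.iloc[2]  # explicitly the third position

print(by_brackets, by_loc, by_iloc)
```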

# Comparison to Python Lists and Dictionaries
It may be helpful to compare pandas' ability to make selections by label and integer location to that of Python lists and dictionaries.

Python lists allow for selection of data only through **integer location**. You can use a single integer or slice notation to make the selection but NOT a list of integers.

Let’s see examples of subset selection of lists using integers:

In [None]:
a_list = [10, 5, 3, 89, 20, 44, 37]

In [None]:
a_list[4]

In [None]:
a_list[-3:]
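To confirm the restriction on lists of integers, the sketch below tries that selection on the same list and catches the error. The variable `positions` and the list comprehension workaround are additions for illustration.

```python
a_list = [10, 5, 3, 89, 20, 44, 37]
positions = [1, 3]

# A list of integers is not a valid index for a Python list
try:
    a_list[positions]
    list_error = None
except TypeError as err:
    list_error = type(err).__name__

# One way to get the same result: select each position individually
picked = [a_list[i] for i in positions]

print(list_error)
print(picked)
```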

# Selection by label with Python dictionaries
Every value in a dictionary is labeled by a key. We use this key to make single selections. Dictionaries only allow selection with a single label. Slices and lists of labels are not allowed.

In [None]:
d = {'a':1, 'b':2, 't':20, 'z':26, 'A':27}
d['a']

In [None]:
d['A']
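The sketch below checks both restrictions on the same dictionary. The variable `labels` and the workaround at the end are additions for illustration. The exact exception raised for a slice depends on the Python version, so both possibilities are caught.

```python
d = {'a': 1, 'b': 2, 't': 20, 'z': 26, 'A': 27}
labels = ['a', 't']

# A list of labels is not a valid dictionary key
try:
    d[labels]
    list_failed = False
except TypeError:
    list_failed = True

# A slice of labels fails as well (TypeError or KeyError, depending on the Python version)
try:
    d['a':'t']
    slice_failed = False
except (TypeError, KeyError):
    slice_failed = True

# One way to select several values: look up each key individually
selected = [d[key] for key in labels]

print(list_failed, slice_failed, selected)
```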

# Pandas has the power of lists and dictionaries
DataFrames and Series are able to make selections with integers like a list and with labels like a dictionary.
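As a rough illustration, the sketch below builds a Series from the dictionary used above (the name `s` is an assumption for this example) and makes list-style selections with `.iloc` and dictionary-style selections with `.loc`, including the list and slice of labels that a plain dictionary does not allow.

```python
import pandas as pd

# Build a Series from the dictionary; keys become the index labels
s = pd.Series({'a': 1, 'b': 2, 't': 20, 'z': 26, 'A': 27})

# List-like selection by integer location
first = s.iloc[0]
last_three = s.iloc[-3:].tolist()

# Dictionary-like selection by label
single = s.loc['A']

# Selections a plain dictionary cannot make: a list and a slice of labels
several = s.loc[['a', 't']].tolist()
label_slice = s.loc['b':'z'].tolist()  # the stop label 'z' is included

print(first, last_three, single, several, label_slice)
```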

# Exercises

### Problem 1
<span  style="color:green; font-size:16px">Read in the bikes dataset. We will be using it for the rest of the questions. Select the wind speed column as a Series and assign it to a variable and output the head. What kind of index does this Series have?</span>

In [None]:
# your code here

### Problem 2
<span  style="color:green; font-size:16px">From the wind speed Series, select the integer locations 4 through, but not including, 10.</span>

In [None]:
# your code here

### Problem 3
<span  style="color:green; font-size:16px">Copy and paste your answer to problem 2 below but use `.loc` instead. Do you get the same result? Why not?</span>

In [None]:
# your code here

### Problem 4
<span  style="color:green; font-size:16px">Read in the movie dataset and set the index to be the title. Select `actor1` as a Series. Who is the `actor1` for 'My Big Fat Greek Wedding'?</span>

In [None]:
# your code here

### Problem 5
<span  style="color:green; font-size:16px">Find `actor1` for your two favorite movies.</span>

In [None]:
# your code here

### Problem 6
<span  style="color:green; font-size:16px">Select the last 10 values from `actor1` in two different ways.</span>

In [None]:
# your code here