# Selecting Data

A common need is to grab a subset of records that meet certain criteria. You can do this by indexing the `DataFrame` with a boolean mask, much like you've seen done with a NumPy `ndarray`.

In [1]:
import os
import pandas as pd

users = pd.read_csv(os.path.join('data', 'users.csv'))
# Quick sanity check: how many rows did we load?
len(users)

475

In [2]:
users.head()

Unnamed: 0.1,Unnamed: 0,first_name,last_name,email,email_verified,signup_date,referral_count,balance
0,aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14
1,acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45
2,adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12
3,adrian,Adrian,Fang,adrian.fang@teamtreehouse.com,True,2018-04-28,3,30.01
4,adrian.blair,Adrian,Blair,adrian9335@gmail.com,True,2018-06-16,7,25.85


In [3]:
# This time we did not use the first column as the index, so it shows up as an ordinary column.
# We can rename all the columns at once by assigning a list to `users.columns`.
users.columns =['user_name', 'first_name', 'last_name', 'email', 'email_verified',
                'signup_date', 'referral_count', 'balance']

In [4]:
users.head()

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
0,aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14
1,acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45
2,adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12
3,adrian,Adrian,Fang,adrian.fang@teamtreehouse.com,True,2018-04-28,3,30.01
4,adrian.blair,Adrian,Blair,adrian9335@gmail.com,True,2018-06-16,7,25.85


In [5]:
# This vectorized comparison returns a new boolean `Series`.
#   We are naming it so we can use it later.
no_referrals_index = users['referral_count'] < 1
# See how the boolean `Series` returned includes all rows from the `DataFrame`.
#  The value is the result of each comparison
no_referrals_index.head()

0    False
1    False
2    False
3    False
4    False
Name: referral_count, dtype: bool

Using the boolean `Series` we just created, **`no_referrals_index`**, we can retrieve all rows where that comparison was `True`.

In [6]:
users.columns

Index(['user_name', 'first_name', 'last_name', 'email', 'email_verified',
       'signup_date', 'referral_count', 'balance'],
      dtype='object')

In [7]:
users[no_referrals_index].head()

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
5,alan9443,Alan,Pope,pope@hotmail.com,True,2018-04-17,0,56.09
13,andrew.alvarez,Andrew,Alvarez,aalvarez@hotmail.com,False,2018-08-01,0,81.66
37,boyer7005,Sara,Boyer,boyer8636@gmail.com,True,2018-07-31,0,91.41
43,brandon.gilbert,Brandon,Gilbert,brandon.gilbert@hotmail.com,True,2018-04-28,0,10.17
48,brooke2027,Brooke,,brooke6938@gmail.com,False,2018-05-23,0,7.22
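
The same pattern can be tried without the CSV file. The sketch below assumes a small made-up frame (`toy` and its values are hypothetical, not taken from `users.csv`) and shows that the mask has one boolean per row and that the selected rows keep their original index labels.

```python
import pandas as pd

# Hypothetical toy data standing in for users.csv
toy = pd.DataFrame({
    'user_name': ['ann', 'bob', 'cat', 'dan'],
    'referral_count': [0, 3, 0, 1],
})

# Comparing a column to a scalar yields one boolean per row
mask = toy['referral_count'] < 1

# Indexing with the mask keeps only the rows where it is True;
# the original index labels (0 and 2 here) are preserved
selected = toy[mask]
print(selected)
```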


## Inverted mask
A handy shortcut is to prefix the index with a `~` (tilde). This returns the inverse of the boolean `Series`. While I wish that the `~` was called the "opposite day" operator, it is in fact called the bitwise NOT operator.

In [8]:
# Careful, double negative here. We don't need no education.
~no_referrals_index.head()

0    True
1    True
2    True
3    True
4    True
Name: referral_count, dtype: bool

In [9]:
# Use the inverse of the index to find where referral values DO NOT equal zero
users[~no_referrals_index].head()

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
0,aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14
1,acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45
2,adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12
3,adrian,Adrian,Fang,adrian.fang@teamtreehouse.com,True,2018-04-28,3,30.01
4,adrian.blair,Adrian,Blair,adrian9335@gmail.com,True,2018-06-16,7,25.85
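
As a minimal sketch of the inversion on its own, assuming a short made-up `Series` of referral counts (the values are hypothetical): `~` flips every element. Note that method calls bind more tightly than `~`, so `~s.head(2)` is evaluated as `~(s.head(2))`; for `head` both orders happen to give the same result.

```python
import pandas as pd

# Hypothetical referral counts
counts = pd.Series([0, 3, 0, 1], name='referral_count')
no_referrals = counts < 1

# `~` flips every element of a boolean Series
has_referrals = ~no_referrals

# `~s.head(2)` means `~(s.head(2))`; compare with `(~s).head(2)`
same = (~no_referrals.head(2)).equals((~no_referrals).head(2))
print(has_referrals.tolist(), same)
```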


## In `loc`
A boolean `Series` may also be used as the row indexer of the `DataFrame.loc` object.

In [10]:
# Select rows where there are no referrals, and select only the following ordered columns
users.loc[no_referrals_index, ['balance', 'email']].head()

Unnamed: 0,balance,email
5,56.09,pope@hotmail.com
13,81.66,aalvarez@hotmail.com
37,91.41,boyer8636@gmail.com
43,10.17,brandon.gilbert@hotmail.com
48,7.22,brooke6938@gmail.com
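
A small self-contained sketch of the same `.loc` call, assuming a made-up frame (`toy` and its values are hypothetical). It illustrates that the row selector comes first, the column selector second, and that the columns come back in the order you list them.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'email': ['a@x.com', 'b@x.com', 'c@x.com'],
    'referral_count': [0, 2, 0],
    'balance': [5.0, 7.5, 1.25],
})
mask = toy['referral_count'] == 0

# Rows where the mask is True, restricted to two columns
# in the order requested (not the order stored in the frame)
subset = toy.loc[mask, ['balance', 'email']]
print(subset)
```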


It is also possible to do the comparison inline, without storing the index in a variable.

In [11]:
users[users['referral_count'] == 0].head()

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
5,alan9443,Alan,Pope,pope@hotmail.com,True,2018-04-17,0,56.09
13,andrew.alvarez,Andrew,Alvarez,aalvarez@hotmail.com,False,2018-08-01,0,81.66
37,boyer7005,Sara,Boyer,boyer8636@gmail.com,True,2018-07-31,0,91.41
43,brandon.gilbert,Brandon,Gilbert,brandon.gilbert@hotmail.com,True,2018-04-28,0,10.17
48,brooke2027,Brooke,,brooke6938@gmail.com,False,2018-05-23,0,7.22


Just like with a NumPy `ndarray`, boolean `Series` can be combined with one another using bitwise operators (`&` for AND, `|` for OR).

Don't forget to surround each comparison with parentheses to control the order of operations, because `&` and `|` bind more tightly than comparison operators such as `==`.

In [12]:
# Select all users where they haven't made a referral AND their email has been verified
users[(users['referral_count'] == 0) & (users['email_verified'] == True)].head()

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
5,alan9443,Alan,Pope,pope@hotmail.com,True,2018-04-17,0,56.09
37,boyer7005,Sara,Boyer,boyer8636@gmail.com,True,2018-07-31,0,91.41
43,brandon.gilbert,Brandon,Gilbert,brandon.gilbert@hotmail.com,True,2018-04-28,0,10.17
51,bryant,Darlene,Bryant,dbryant@yahoo.com,True,2018-07-19,0,36.91
56,calvin.perez,Calvin,Perez,cperez@gmail.com,True,2018-02-17,0,13.01
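
The sketch below, again on hypothetical toy data, shows AND and OR side by side. It also assumes that a column which is already boolean, such as `email_verified`, can be used directly as a mask without comparing it to `True`.

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({
    'referral_count': [0, 0, 2, 5],
    'email_verified': [True, False, True, False],
})

no_referrals = toy['referral_count'] == 0
verified = toy['email_verified']  # already boolean, usable as a mask

both = toy[no_referrals & verified]    # AND: both conditions hold
either = toy[no_referrals | verified]  # OR: at least one holds

# Inline form: the comparison needs parentheses because
# `&` binds more tightly than `==`
inline = toy[(toy['referral_count'] == 0) & toy['email_verified']]
print(both, either, sep='\n')
```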


## Slicing ranges

The most robust and consistent way of slicing ranges along arbitrary axes is the `.iloc` indexer, described in the Selection by Position section. For now, we explain the semantics of slicing using the `[]` operator.

With a `Series`, the syntax works exactly as with an `ndarray`, returning a slice of the values and the corresponding labels. With a `DataFrame`, slicing inside `[]` slices the rows:

In [13]:
# Get the first 5 records
users[:5]

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
0,aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14
1,acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45
2,adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12
3,adrian,Adrian,Fang,adrian.fang@teamtreehouse.com,True,2018-04-28,3,30.01
4,adrian.blair,Adrian,Blair,adrian9335@gmail.com,True,2018-06-16,7,25.85


In [14]:
# Get the last 5 records
users[-5:]

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
470,wilson,Robert,Wilson,robert@yahoo.com,False,2018-05-16,5,59.75
471,wking,Wanda,King,wanda.king@holt.com,True,2018-06-01,2,67.08
472,wright3590,Jacqueline,Wright,jacqueline.wright@gonzalez.com,True,2018-02-08,6,18.48
473,young,Jessica,Young,jessica4028@yahoo.com,True,2018-07-17,4,75.39
474,zachary.neal,Zachary,Neal,zneal@gmail.com,True,2018-07-26,1,39.9


In [15]:
# Get records 100 through 109 (the end of the slice is exclusive)
users[100:110]

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
100,davis,Erica,Davis,erica@gmail.com,True,2018-06-08,2,53.52
101,davis9225,Laura,Davis,laura@hotmail.com,True,2018-03-19,3,51.81
102,davis9792,David,Davis,davis3883@hotmail.com,True,2018-01-04,4,36.9
103,dawn,Dawn,,dawn6718@hotmail.com,True,2018-03-02,2,72.63
104,dawn.juarez,Dawn,Juarez,dawn@hotmail.com,True,2018-04-13,1,16.38
105,dean,Lynn,Dean,dean@gmail.com,True,2018-05-22,3,17.45
106,dean2365,Brian,Dean,dean3892@hotmail.com,True,2018-01-08,6,8.5
107,debbie4918,Debbie,,debbie3109@hotmail.com,True,2018-02-09,7,73.63
108,debra,Debra,Frazier,frazier@yahoo.com,True,2018-06-01,1,80.71
109,decker1985,Robert,Decker,robert@yahoo.com,True,2018-04-04,7,92.55


In [16]:
# Reverse the row order
users[::-1].head(10)

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
474,zachary.neal,Zachary,Neal,zneal@gmail.com,True,2018-07-26,1,39.9
473,young,Jessica,Young,jessica4028@yahoo.com,True,2018-07-17,4,75.39
472,wright3590,Jacqueline,Wright,jacqueline.wright@gonzalez.com,True,2018-02-08,6,18.48
471,wking,Wanda,King,wanda.king@holt.com,True,2018-06-01,2,67.08
470,wilson,Robert,Wilson,robert@yahoo.com,False,2018-05-16,5,59.75
469,william6714,William,,william5677@yahoo.com,True,2018-04-26,3,74.65
468,william4588,William,Pittman,william.pittman@gmail.com,True,2018-04-11,2,2.04
467,william2231,William,Douglas,douglas8813@yahoo.com,True,2018-05-19,4,85.32
466,william.lee,William,Lee,lee5646@yahoo.com,True,2018-05-30,3,67.65
465,william,William,,william@hotmail.com,True,2018-06-13,4,4.69


In [17]:
# Take every second row, starting from the first
users[0::2].head(10)

Unnamed: 0,user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
0,aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14
2,adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12
4,adrian.blair,Adrian,Blair,adrian9335@gmail.com,True,2018-06-16,7,25.85
6,alexander7808,Alexander,Moore,alexander.moore@gmail.com,False,2018-03-27,2,87.71
8,alvarez,John,Alvarez,john4346@hotmail.com,True,2018-09-18,6,49.62
10,amiller,Anne,Miller,miller@hotmail.com,False,2018-06-02,5,86.28
12,andrade,Melissa,Andrade,mandrade@yahoo.com,True,2018-01-06,3,83.22
14,andrew.wells,Andrew,Wells,andrew9976@yahoo.com,True,2018-06-13,5,76.07
16,andrew6347,Andrew,Horton,andrew.horton@hotmail.com,True,2018-02-01,2,85.73
18,anthony1788,Anthony,Valdez,anthony@gmail.com,True,2018-06-30,5,70.7
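
To see the slicing forms together, here is a minimal sketch on a hypothetical six-row frame with the default integer index. It also checks that an integer slice inside `[]` selects the same rows as the equivalent `.iloc` slice.

```python
import pandas as pd

# Hypothetical six-row frame with the default 0..5 index
toy = pd.DataFrame({'user_name': list('abcdef')})

first_two = toy[:2]        # rows at positions 0 and 1
last_two = toy[-2:]        # the final two rows
every_other = toy[::2]     # step of 2
reversed_rows = toy[::-1]  # negative step reverses the order

# Integer slices inside [] select rows by position, like .iloc
matches_iloc = first_two.equals(toy.iloc[:2])
print(matches_iloc)
```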


In [18]:
# Use the `user_name` column as the index

In [19]:
users = users.set_index('user_name')

In [20]:
users.head()

user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
aaron,Aaron,Davis,aaron6348@gmail.com,True,2018-08-31,6,18.14
acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45
adam.saunders,Adam,Saunders,adam@gmail.com,False,2018-05-29,3,72.12
adrian,Adrian,Fang,adrian.fang@teamtreehouse.com,True,2018-04-28,3,30.01
adrian.blair,Adrian,Blair,adrian9335@gmail.com,True,2018-06-16,7,25.85


In [21]:
# To select a row by its index label, use `.loc`

In [22]:
users.loc['aaron']

first_name                      Aaron
last_name                       Davis
email             aaron6348@gmail.com
email_verified                   True
signup_date                2018-08-31
referral_count                      6
balance                         18.14
Name: aaron, dtype: object

In [23]:
# Get all users whose user name starts with `w`
users.loc[users.index.str.startswith('w')]

user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
walsh,Kelli,Walsh,walsh@hotmail.com,True,2018-02-08,1,14.7
walters2042,Leslie,Walters,walters8435@gmail.com,True,2018-07-27,2,40.83
watts,Jenna,Watts,jenna@gmail.com,True,2018-03-12,0,54.5
wbrown,Wesley,Brown,wesley@hotmail.com,True,2018-06-24,7,35.64
wesley.hayes,Wesley,Hayes,wesley.hayes@gmail.com,True,2018-08-17,3,90.44
west,Brian,West,west@yahoo.com,True,2018-01-30,3,52.83
wilkins,David,Wilkins,david1254@goodman.info,True,2018-08-06,4,52.15
william,William,,william@hotmail.com,True,2018-06-13,4,4.69
william.lee,William,Lee,lee5646@yahoo.com,True,2018-05-30,3,67.65
william2231,William,Douglas,douglas8813@yahoo.com,True,2018-05-19,4,85.32
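
A compact sketch of label-based selection, assuming a tiny made-up frame (the user `zoe` and all balances are hypothetical). A single label passed to `.loc` returns that row as a `Series`, while the `.str` accessor on the index produces a boolean array that works as a mask.

```python
import pandas as pd

# Hypothetical toy data indexed by user name
toy = pd.DataFrame({
    'user_name': ['aaron', 'walsh', 'west', 'zoe'],
    'balance': [18.14, 14.7, 52.83, 3.0],
}).set_index('user_name')

# A single label returns that row as a Series
row = toy.loc['walsh']

# String methods on the index give a boolean array usable as a mask
w_users = toy.loc[toy.index.str.startswith('w')]
print(w_users)
```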


## Indexing with `isin`

In [24]:
# Select rows whose `referral_count` is one of the values in a list
list_of_referral_count = [1, 2]
users[users.referral_count.isin(list_of_referral_count)]

user_name,first_name,last_name,email,email_verified,signup_date,referral_count,balance
acook,Anthony,Cook,cook@gmail.com,True,2018-05-12,2,55.45
alexander7808,Alexander,Moore,alexander.moore@gmail.com,False,2018-03-27,2,87.71
amanda,Amanda,Lynch,alynch@gmail.com,True,2018-09-18,2,43.76
andrew6216,Andrew,Bryan,andrew@gmail.com,True,2018-04-01,1,71.42
andrew6347,Andrew,Horton,andrew.horton@hotmail.com,True,2018-02-01,2,85.73
april9082,April,Santana,april.santana@hotmail.com,True,2018-08-14,2,53.87
ariley,Alexis,Riley,ariley@gmail.com,True,2018-05-10,2,89.22
arosario,Amanda,Rosario,amanda@yahoo.com,True,2018-01-29,1,76.19
awhitney,Amanda,Whitney,whitney6923@yahoo.com,True,2018-04-06,2,30.85
barnes,Mikayla,Barnes,mikayla.barnes@hotmail.com,True,2018-06-15,1,1.71
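
As a minimal sketch on hypothetical values, `isin` returns a boolean `Series` like any other comparison, so it can be stored, used as a mask, and inverted with `~` to keep the rows that are not in the list.

```python
import pandas as pd

# Hypothetical referral counts
toy = pd.DataFrame({'referral_count': [0, 1, 2, 3, 1]})

# True where the value is one of the listed values
in_list = toy['referral_count'].isin([1, 2])
matches = toy[in_list]

# Invert the mask to keep everything NOT in the list
others = toy[~in_list]
print(matches, others, sep='\n')
```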
