# Introduction to Pandas 

[Cheat Sheet](https://www.codecademy.com/learn/paths/data-science/tracks/data-processing-pandas/modules/dspath-intro-pandas/cheatsheet)

## Create a DataFrame I
A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, an Excel spreadsheet, or a SQL query.

DataFrames can be created from dictionaries, so recall that a Python dictionary maps keys to values:
```
thisdict = {
  "brand": "Ford",
  "model": "Mustang",
  "year": 1964
}
print(thisdict)
```

DataFrames have rows and columns. Each column has a name, which is usually a string. Each row has an index, which is an integer by default. DataFrames can contain many different data types: strings, ints, floats, tuples, etc.
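As a small illustration of mixed data types, the sketch below builds a DataFrame whose columns hold strings, integers, and floats, then inspects the type of each column with `.dtypes`. The variable name `people` and the `height_m` column are made up for this example.

```python
import pandas as pd

# Hypothetical data: each column holds a different type of value
people = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe'],   # strings
    'age': [34, 28],                      # integers
    'height_m': [1.80, 1.65],             # floats
})

# One data type is reported per column
print(people.dtypes)

# The default row index is the integers 0, 1, ...
print(list(people.index))
```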

You can pass in a **dictionary** to `pd.DataFrame()`. **Each key is a column name and each value is a list of column values.** The columns must all be the same length or you will get an error. Here’s an example:

```
df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})
```

In [4]:
import pandas as pd
df1 = pd.DataFrame({
    'name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
    'address': ['123 Main St.', '456 Maple Ave.', '789 Broadway'],
    'age': [34, 28, 51]
})
(df1)

Unnamed: 0,name,address,age
0,John Smith,123 Main St.,34
1,Jane Doe,456 Maple Ave.,28
2,Joe Schmo,789 Broadway,51


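Note that in current versions of pandas (with Python 3.6 or later), the columns appear in the order of the dictionary keys, as in the output above. Older versions sorted the columns alphabetically, because dictionaries did not guarantee any order.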

Another example: fill in the rest of the table.
```
import codecademylib
import pandas as pd

df1 = pd.DataFrame({
  'Product ID': [1, 2, 3, 4],
  ....... 
})

print(df1)
```

In [5]:
import pandas as pd

df1 = pd.DataFrame({
  'Product ID': [1, 2, 3, 4],
  'Product Name': ['t-shirt', 't-shirt', 'skirt', 'skirt'],
  'Color': ['blue', 'green', 'red', 'black']
})

(df1)

Unnamed: 0,Product ID,Product Name,Color
0,1,t-shirt,blue
1,2,t-shirt,green
2,3,skirt,red
3,4,skirt,black


## Create a DataFrame II

You can also create a DataFrame from a **list of lists**, where each inner list represents a row of data. Use the keyword argument **columns** to pass a list of column names.

In [6]:
import pandas as pd

df2 = pd.DataFrame([
    ['John Smith', '123 Main St.', 34],
    ['Jane Doe', '456 Maple Ave.', 28],
    ['Joe Schmo', '789 Broadway', 51]
    ],
    columns=['name', 'address', 'age'])
(df2)

Unnamed: 0,name,address,age
0,John Smith,123 Main St.,34
1,Jane Doe,456 Maple Ave.,28
2,Joe Schmo,789 Broadway,51


Another Example: 

In [7]:
import pandas as pd

df2 = pd.DataFrame([
  [1, 'San Diego', 100],
  [2, 'Los Angeles', 120],
  [3, 'San Francisco', 90],
  [4, 'Sacramento', 115]
],
  columns=[
   'Store ID', 'Location', 'Number of Employees'
  ])

(df2)

Unnamed: 0,Store ID,Location,Number of Employees
0,1,San Diego,100
1,2,Los Angeles,120
2,3,San Francisco,90
3,4,Sacramento,115


## Comma Separated Values (CSV)

We now know how to create our own DataFrame. However, most of the time, we’ll be working with datasets that already exist. One of the most common formats for big datasets is the CSV.

CSV (comma separated values) is a text-only spreadsheet format. You can find CSVs in lots of places:

* Online datasets (for example, from data.gov)
* Export from Excel or Google Sheets
* Export from SQL

The first row of a CSV contains column headings. All subsequent rows contain values. Each column heading and each value is separated by a comma:

```
column1,column2,column3
value1,value2,value3
```

Write the following data as a CSV in `cupcakes.csv`:

```
name 	cake_flavor 	frosting_flavor 	topping
Devil’s Food 	chocolate 	chocolate 	chocolate shavings
Birthday Cake 	vanilla 	vanilla 	rainbow sprinkles
Carrot Cake 	carrot 	cream cheese 	almonds
```
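One possible way to produce that file is sketched below with Python's built-in `csv` module. The file is written to a temporary directory purely so the sketch leaves no files behind (an assumption of this example); for the exercise you would save `cupcakes.csv` in your working directory. A straight apostrophe is used in "Devil's Food" to keep the example ASCII-only.

```python
import csv
import os
import tempfile

# Header row first, then one list per cupcake
rows = [
    ['name', 'cake_flavor', 'frosting_flavor', 'topping'],
    ["Devil's Food", 'chocolate', 'chocolate', 'chocolate shavings'],
    ['Birthday Cake', 'vanilla', 'vanilla', 'rainbow sprinkles'],
    ['Carrot Cake', 'carrot', 'cream cheese', 'almonds'],
]

# Temporary location chosen only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'cupcakes.csv')

# newline='' lets the csv module control line endings
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Read the raw text back to see the comma-separated result
with open(path) as f:
    csv_text = f.read()
print(csv_text)
```

Note that "cream cheese" stays a single value: only the commas between columns act as separators.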

## Loading and Saving CSVs 

When you have data in a CSV, you can load it into a DataFrame in Pandas using `pd.read_csv()`:

```
pd.read_csv('my-csv-file.csv')
```

In the example above, the `pd.read_csv()` function is called. The CSV file called `my-csv-file.csv` is passed in as an argument.

We can also save data to a CSV, using `.to_csv()`.

```
df.to_csv('new-csv-file.csv')
```

In the example above, the `.to_csv()` method is called on `df` (which represents a DataFrame object). The name of the CSV file (`new-csv-file.csv`) is passed in as an argument. By default, this method will save the CSV file in your current directory.
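A minimal round-trip sketch of saving and loading follows. It assumes a small made-up DataFrame and writes to a temporary directory so that nothing is left in your working directory; `index=False` is a keyword argument of `.to_csv()` that leaves the row index out of the file.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'City': ['Maplewood', 'Wayne'],
    'Population': [100000, 350000],
})

# Temporary location chosen only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'new-csv-file.csv')

# Save the DataFrame without its row index
df.to_csv(path, index=False)

# Load the file back into a new DataFrame
loaded = pd.read_csv(path)
print(loaded)
```

Without `index=False`, the index is written as an unnamed first column, which shows up as a column called `Unnamed: 0` when the file is read back.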


In [3]:
import pandas as pd
# df is a variable that stores the table

df= pd.read_csv('sample.csv') 

(df.head(3))
(df)

Unnamed: 0,City,Population,Median Age
0,Maplewood,100000,40
1,Wayne,350000,33
2,Forrest Hills,300000,35
3,Paramus,400000,55
4,Hackensack,290000,39


In [8]:
series = df['City']
# a Series is one column of data; it has two parts: values and an index
series

0        Maplewood
1            Wayne
2    Forrest Hills
3          Paramus
4       Hackensack
Name: City, dtype: object

## Inspect a DataFrame 

When we load a new DataFrame from a CSV, we want to know what it looks like.

If it’s a small DataFrame, you can display it by typing `print(df)`.

If it’s a larger DataFrame, **it’s helpful to be able to inspect a few items** without having to look at the entire DataFrame.

The method `.head()` returns the first 5 rows of a DataFrame. If you want to see more rows, you can pass the number of rows as the argument `n`. For example, `df.head(10)` would show the first 10 rows.

The method `df.info()` prints a summary of each column: its name, the number of non-null values, and its data type.
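The two inspection methods can be tried on any DataFrame. The sketch below uses a made-up 20-row table (the column names `n` and `square` are assumptions for illustration) so that it runs without a CSV file.

```python
import pandas as pd

# Hypothetical 20-row DataFrame
df = pd.DataFrame({
    'n': range(20),
    'square': [i * i for i in range(20)],
})

first_five = df.head()    # default: the first 5 rows
first_ten = df.head(10)   # the first 10 rows

print(first_five)
print(len(first_ten))

# info() prints its summary directly and returns None,
# so there is no need to wrap it in print()
result = df.info()
```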


In [11]:
import pandas as pd

df = pd.read_csv('imdb.csv')

(df.head(10))



Unnamed: 0,id,name,genre,year,imdb_rating
0,1,Avatar,action,2009,7.9
1,2,Jurassic World,action,2015,7.3
2,3,The Avengers,action,2012,8.1
3,4,The Dark Knight,action,2008,9.0
4,5,Star Wars: Episode I - The Phantom Menace,action,1999,6.6
5,6,Star Wars,action,1977,8.7
6,7,Avengers: Age of Ultron,action,2015,7.9
7,8,The Dark Knight Rises,action,2012,8.5
8,9,Pirates of the Caribbean: Dead Mans Chest,action,2006,7.3
9,10,Iron Man 3,action,2013,7.3


In [12]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 220 entries, 0 to 219
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   id           220 non-null    int64  
 1   name         220 non-null    object 
 2   genre        220 non-null    object 
 3   year         220 non-null    int64  
 4   imdb_rating  220 non-null    float64
dtypes: float64(1), int64(2), object(2)
memory usage: 8.7+ KB


## Select Columns

Now we know how to create and load data. Let’s select parts of those datasets that are interesting or important to our analyses.

Suppose you have the DataFrame called customers, which contains the ages of your customers:
```
name 	age
Rebecca Erikson 	35
Thomas Roberson 	28
Diane Ochoa 	42
… 	…
```

Perhaps you want to take the average or plot a histogram of the ages. In order to do either of these tasks, you’d need to select the column.

There are two possible syntaxes for selecting all values from a column:

* Select the column as if you were selecting a value from a dictionary using a key. In our example, we would type `customers['age']` to select the ages.

* If the name of a column follows all of the rules for a variable name (doesn’t start with a number, doesn’t contain spaces or special characters, etc.), then you can select it using the following notation: `df.MySecondColumn`. In our example, we would type `customers.age`.

**Note: if you select only one column, the result is a Series.**
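Both syntaxes can be compared side by side. The sketch below rebuilds the three visible rows of the `customers` table and takes the average age; the variable names `ages` and `same_ages` are made up for this example.

```python
import pandas as pd

customers = pd.DataFrame({
    'name': ['Rebecca Erikson', 'Thomas Roberson', 'Diane Ochoa'],
    'age': [35, 28, 42],
})

ages = customers['age']     # dictionary-style selection
same_ages = customers.age   # attribute-style selection

print(type(ages))              # a pandas Series
print(ages.equals(same_ages))  # both syntaxes give the same column
print(ages.mean())             # average of 35, 28, and 42
```

The bracket form works for any column name, while the attribute form only works when the name is a valid Python identifier.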


In [53]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west'])
# selecting clinic_north column and saving it in clinic_north var
clinic_north = df["clinic_north"]
# or clinic_north = df.clinic_north

print(clinic_north)

print(type(clinic_north))

print(type(df))
(df)


0    100
1     45
2     96
3     80
4     54
5    109
Name: clinic_north, dtype: int64
<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
0,January,100,100,23,100
1,February,51,45,145,45
2,March,81,96,65,96
3,April,80,80,54,180
4,May,51,54,54,154
5,June,112,109,79,129


## Selecting Multiple Columns

When you have a larger DataFrame, you might want to select just a few columns.

For instance, let’s return to a DataFrame of orders from ShoeFly.com:

```
id 	first_name 	last_name 	email 	shoe_type 	shoe_material 	shoe_color
54791 	Rebecca 	Lindsay 	RebeccaLindsay57@hotmail.com 	clogs 	faux-leather 	black
53450 	Emily 	Joyce 	EmilyJoyce25@gmail.com 	ballet flats 	faux-leather 	navy
91987 	Joyce 	Waller 	Joyce.Waller@gmail.com 	sandals 	fabric 	black
14437 	Justin 	Erickson 	Justin.Erickson@outlook.com 	clogs 	faux-leather 	red
```

We might just be interested in the customer’s `last_name` and `email`. We want a DataFrame like this:
```
last_name 	email
Lindsay 	RebeccaLindsay57@hotmail.com
Joyce 	EmilyJoyce25@gmail.com
Waller 	Joyce.Waller@gmail.com
Erickson 	Justin.Erickson@outlook.com
```

To select two or more columns from a DataFrame, we use a list of the column names. To create the DataFrame shown above, we would use:
```
new_df = orders[['last_name', 'email']]
```

**Note:** Make sure that you have a double set of brackets (`[[]]`), or this command won’t work!
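To see why the double brackets matter, here is a short sketch on a cut-down version of the `orders` table (only two rows and three columns are reproduced; the helper variable `raised` is made up for this example).

```python
import pandas as pd

orders = pd.DataFrame({
    'last_name': ['Lindsay', 'Joyce'],
    'email': ['RebeccaLindsay57@hotmail.com', 'EmilyJoyce25@gmail.com'],
    'shoe_type': ['clogs', 'ballet flats'],
})

# Double brackets: a list of column names gives a DataFrame
new_df = orders[['last_name', 'email']]
print(type(new_df))

# Double brackets with a single name still give a DataFrame, not a Series
one_col = orders[['email']]
print(type(one_col))

# Single brackets with two names look for ONE column labelled
# ('last_name', 'email'), which does not exist here
try:
    orders['last_name', 'email']
    raised = False
except KeyError:
    raised = True
print(raised)
```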


In [31]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)

# select two columns with double brackets, e.g. df[['col_a', 'col_b']]
clinic_north_south = df[['clinic_north', 'clinic_south']]

print(type(clinic_north_south))
print(df.clinic_north)
print(df.clinic_south)

(df)

<class 'pandas.core.frame.DataFrame'>
0    100
1     45
2     96
3     80
4     54
5    109
Name: clinic_north, dtype: int64
0     23
1    145
2     65
3     54
4     54
5     79
Name: clinic_south, dtype: int64


Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
0,January,100,100,23,100
1,February,51,45,145,45
2,March,81,96,65,96
3,April,80,80,54,180
4,May,51,54,54,154
5,June,112,109,79,129


## Select Rows

Let’s revisit our orders from ShoeFly.com:
```
id 	first_name 	last_name 	email 	shoe_type 	shoe_material 	shoe_color
54791 	Rebecca 	Lindsay 	RebeccaLindsay57@hotmail.com 	clogs 	faux-leather 	black
53450 	Emily 	Joyce 	EmilyJoyce25@gmail.com 	ballet flats 	faux-leather 	navy
91987 	Joyce 	Waller 	Joyce.Waller@gmail.com 	sandals 	fabric 	black
14437 	Justin 	Erickson 	Justin.Erickson@outlook.com 	clogs 	faux-leather 	red
…
```

Maybe our Customer Service department has just received a message from Joyce Waller, so we want to know exactly what she ordered. We want to select this single row of data.

DataFrames are zero-indexed, meaning that we start with the 0th row and count up from there. Joyce Waller’s order is the 2nd row, that is, the row at index 2 (the third line of data in the table above).

We select it using the following command:

`orders.iloc[2]`



In [38]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west'])

march = df.iloc[2]
print(march)

month           March
clinic_east        81
clinic_north       96
clinic_south       65
clinic_west        96
Name: 2, dtype: object


## Selecting Multiple Rows

You can also select multiple rows from a DataFrame.

Here are a few more rows from ShoeFly.com’s orders DataFrame:
```
id 	first_name 	last_name 	email 	shoe_type 	shoe_material 	shoe_color
54791 	Rebecca 	Lindsay 	RebeccaLindsay57@hotmail.com 	clogs 	faux-leather 	black
53450 	Emily 	Joyce 	EmilyJoyce25@gmail.com 	ballet flats 	faux-leather 	navy
91987 	Joyce 	Waller 	Joyce.Waller@gmail.com 	sandals 	fabric 	black
14437 	Justin 	Erickson 	Justin.Erickson@outlook.com 	clogs 	faux-leather 	red
79357 	Andrew 	Banks 	AB4318@gmail.com 	boots 	leather 	brown
52386 	Julie 	Marsh 	JulieMarsh59@gmail.com 	sandals 	fabric 	black
20487 	Thomas 	Jensen 	TJ5470@gmail.com 	clogs 	fabric 	navy
76971 	Janice 	Hicks 	Janice.Hicks@gmail.com 	clogs 	faux-leather 	navy
21586 	Gabriel 	Porter 	GabrielPorter24@gmail.com 	clogs 	leather 	brown
```
Here are some different ways of selecting multiple rows:

`orders.iloc[3:7]` would select all rows starting at the 3rd row and up to, but not including, the 7th row (i.e., the rows at indices 3, 4, 5, and 6)

```
id 	first_name 	last_name 	email 	shoe_type 	shoe_material 	shoe_color
14437 	Justin 	Erickson 	Justin.Erickson@outlook.com 	clogs 	faux-leather 	red
79357 	Andrew 	Banks 	AB4318@gmail.com 	boots 	leather 	brown
52386 	Julie 	Marsh 	JulieMarsh59@gmail.com 	sandals 	fabric 	black
20487 	Thomas 	Jensen 	TJ5470@gmail.com 	clogs 	fabric 	navy
```

`orders.iloc[:4]` would select all rows up to, but not including the 4th row (i.e., the 0th, 1st, 2nd, and 3rd rows)

```
id 	first_name 	last_name 	email 	shoe_type 	shoe_material 	shoe_color
54791 	Rebecca 	Lindsay 	RebeccaLindsay57@hotmail.com 	clogs 	faux-leather 	black
53450 	Emily 	Joyce 	EmilyJoyce25@gmail.com 	ballet flats 	faux-leather 	navy
91987 	Joyce 	Waller 	Joyce.Waller@gmail.com 	sandals 	fabric 	black
14437 	Justin 	Erickson 	Justin.Erickson@outlook.com 	clogs 	faux-leather 	red
```

`orders.iloc[-3:]` would select the rows starting at the 3rd to last row and up to and including the final row (i.e., the last three rows)
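The three slices can be checked with a short sketch. Only the `id` and `first_name` columns of the nine orders listed above are reproduced here, and the variable names are made up for this example.

```python
import pandas as pd

orders = pd.DataFrame({
    'id': [54791, 53450, 91987, 14437, 79357, 52386, 20487, 76971, 21586],
    'first_name': ['Rebecca', 'Emily', 'Joyce', 'Justin', 'Andrew',
                   'Julie', 'Thomas', 'Janice', 'Gabriel'],
})

middle = orders.iloc[3:7]      # rows at positions 3, 4, 5, 6
first_four = orders.iloc[:4]   # rows at positions 0, 1, 2, 3
last_three = orders.iloc[-3:]  # the final three rows

print(middle)
print(first_four)
print(last_three)
```

As with Python list slices, the start position is included and the stop position is excluded.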




In [34]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)

april_may_june = df.iloc[3:6]

print(april_may_june)

   month  clinic_east  clinic_north  clinic_south  clinic_west
3  April           80            80            54          180
4    May           51            54            54          154
5   June          112           109            79          129


## Select Rows with Logic I

You can select a subset of a DataFrame by using logical statements:

`df[df.MyColumnName == desired_column_value]`

We have a large DataFrame with information about our customers. A few of the many rows look like this:
```
name 	address 	phone 	age
Martha Jones 	123 Main St. 	234-567-8910 	28
Rose Tyler 	456 Maple Ave. 	212-867-5309 	22
Donna Noble 	789 Broadway 	949-123-4567 	35
Amy Pond 	98 West End Ave. 	646-555-1234 	29
Clara Oswald 	54 Columbus Ave. 	714-225-1957 	31
… 	… 	… 	…
```

Suppose we want to select all rows where the customer’s age is 30. We would use:

`df[df.age == 30]`

In Python, `==` is how we test if a value is exactly equal to another value.

We can use other logical statements, such as:

* Greater Than, > — Here, we select all rows where the customer’s age is greater than 30:

`df[df.age > 30]`

* Less Than, < — Here, we select all rows where the customer’s age is less than 30:

`df[df.age < 30]`

* Not Equal, != — This snippet selects all rows where the customer’s name is not Clara Oswald:

`df[df.name != 'Clara Oswald']`
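It can help to look at what the comparison produces on its own. The sketch below uses the five customers listed above (name and age only); the variable names `mask` and `over_30` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Martha Jones', 'Rose Tyler', 'Donna Noble',
             'Amy Pond', 'Clara Oswald'],
    'age': [28, 22, 35, 29, 31],
})

# The comparison gives a Series of True/False values, one per row
mask = df.age > 30
print(mask.tolist())

# Indexing with that Series keeps only the rows where it is True
over_30 = df[mask]
print(over_30)

# None of these five customers is exactly 30, so this selection is empty
print(len(df[df.age == 30]))
```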



In [36]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west'])

# Create variable january using a logical statement that selects 
#the row of df where the 'month' column is 'January'.
january = df[df.month == 'January']

# inspect january
january

Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
0,January,100,100,23,100


## Select Rows with Logic II

You can also combine multiple logical statements, as long as each statement is in parentheses.

For instance, suppose we wanted to select all rows where the customer’s age was under 30 or the customer’s name was “Martha Jones”:
```
name 	address 	phone 	age
Martha Jones 	123 Main St. 	234-567-8910 	28
Rose Tyler 	456 Maple Ave. 	212-867-5309 	22
Donna Noble 	789 Broadway 	949-123-4567 	35
Amy Pond 	98 West End Ave. 	646-555-1234 	29
Clara Oswald 	54 Columbus Ave. 	714-225-1957 	31
… 			
```

We could use the following code:

```
df[(df.age < 30) |
   (df.name == 'Martha Jones')]
```


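**When combining conditions in pandas, `|` means “or” and `&` means “and”.** Python’s own `or` and `and` keywords do not work row by row on columns.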
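A brief sketch of both operators on the five customers listed above follows. The variable names are made up for this example, and the `try` block only demonstrates what happens when the plain `or` keyword is used instead of `|`.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Martha Jones', 'Rose Tyler', 'Donna Noble',
             'Amy Pond', 'Clara Oswald'],
    'age': [28, 22, 35, 29, 31],
})

# "or": under 30, or named Martha Jones
under_30_or_martha = df[(df.age < 30) | (df['name'] == 'Martha Jones')]
print(under_30_or_martha)

# "and": older than 25 and younger than 30
late_twenties = df[(df.age > 25) & (df.age < 30)]
print(late_twenties)

# The keyword `or` needs a single True/False value, so it fails on a Series
try:
    df[(df.age < 30) or (df['name'] == 'Martha Jones')]
    raised = False
except ValueError:
    raised = True
print(raised)
```

The parentheses are needed because `|` and `&` bind more tightly than comparisons such as `<` and `==`.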



In [42]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west'])

# month "March" | (or) month "April"
march_april = df[(df.month == "March") | (df.month == "April" )]

march_april

Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
2,March,81,96,65,96
3,April,80,80,54,180


## Select Rows with Logic III

Suppose we want to select the rows where the customer’s name is either “Martha Jones”, “Rose Tyler” or “Amy Pond”.
```
name 	address 	phone 	age
Martha Jones 	123 Main St. 	234-567-8910 	28
Rose Tyler 	456 Maple Ave. 	212-867-5309 	22
Donna Noble 	789 Broadway 	949-123-4567 	35
Amy Pond 	98 West End Ave. 	646-555-1234 	29
Clara Oswald 	54 Columbus Ave. 	714-225-1957 	31
… 	… 	… 	…
```

We could use the `.isin()` method to check that `df.name` is one of a list of values:

```
df[df.name.isin(['Martha Jones',
     'Rose Tyler',
     'Amy Pond'])]
```
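As a quick check, the sketch below shows that `.isin()` gives the same rows as chaining three `==` tests with `|`. It uses the five customers listed above; the variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Martha Jones', 'Rose Tyler', 'Donna Noble',
             'Amy Pond', 'Clara Oswald'],
    'age': [28, 22, 35, 29, 31],
})

wanted = ['Martha Jones', 'Rose Tyler', 'Amy Pond']

# isin() returns True for each row whose name appears in the list
mask = df['name'].isin(wanted)
print(mask.tolist())

selected = df[mask]

# The longer equivalent, written with == and |
chained = df[(df['name'] == 'Martha Jones') |
             (df['name'] == 'Rose Tyler') |
             (df['name'] == 'Amy Pond')]

print(selected.equals(chained))
```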



In [46]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west'])

# to select the rows whose column value is in a list, use
# df[df.column_name.isin(["value1", "value2"])]

january_february_march = df[df.month.isin(["January", "February", "March"])]

january_february_march

Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
0,January,100,100,23,100
1,February,51,45,145,45
2,March,81,96,65,96


## Setting indices

When we select a subset of a DataFrame using logic, we end up with non-consecutive indices. This is inelegant, and the index labels no longer match the row positions that `.iloc[]` uses.

We can fix this using the method `.reset_index()`. For example, here is a DataFrame called `df` with non-consecutive indices:
```
	First Name 	Last Name
0 	John 	Smith
4 	Jane 	Doe
7 	Joe 	Schmo
```

If we use the command `df.reset_index()`, we get a new DataFrame with a new set of indices:
```
	index 	First Name 	Last Name
0 	0 	John 	Smith
1 	4 	Jane 	Doe
2 	7 	Joe 	Schmo
```

Note that the old indices have been moved into a new column called `'index'`. Unless you need those values for something special, it’s probably better to use the keyword `drop=True` so that you don’t end up with that extra column. If we run the command `df.reset_index(drop=True)`, we get a new DataFrame that looks like this:
```
	First Name 	Last Name
0 	John 	Smith
1 	Jane 	Doe
2 	Joe 	Schmo
```

Using `.reset_index()` will return a new DataFrame, but we usually just want to modify our existing DataFrame. If we use the keyword `inplace=True`, we can just modify our existing DataFrame.
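The difference between returning a new DataFrame and modifying the existing one can be sketched as follows. The example rebuilds the three-row table above with its non-consecutive indices; the names `fresh` and `before` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Joe'],
    'Last Name': ['Smith', 'Doe', 'Schmo'],
}, index=[0, 4, 7])

# Without inplace=True a new DataFrame is returned and df is unchanged
fresh = df.reset_index(drop=True)
before = list(df.index)
print(before)             # original labels are still on df
print(list(fresh.index))  # new consecutive labels

# With inplace=True df itself is modified and nothing is returned
df.reset_index(drop=True, inplace=True)
print(list(df.index))
```

Reassigning the result, as in `df = df.reset_index(drop=True)`, is an alternative that has the same effect on `df`.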


In [48]:
import pandas as pd

df = pd.DataFrame([
  ['January', 100, 100, 23, 100],
  ['February', 51, 45, 145, 45],
  ['March', 81, 96, 65, 96],
  ['April', 80, 80, 54, 180],
  ['May', 51, 54, 54, 154],
  ['June', 112, 109, 79, 129]],
  columns=['month', 'clinic_east',
           'clinic_north', 'clinic_south',
           'clinic_west']
)


# select rows by index label: 0 = January, 1 = February, ...
df2 = df.loc[[1, 3, 5]]
df2


Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
1,February,51,45,145,45
3,April,80,80,54,180
5,June,112,109,79,129


In [49]:

# reset the indexes of the previous DataFrame so that they start from 0, 1, 2, ...
df3 = df2.reset_index()
df3

Unnamed: 0,index,month,clinic_east,clinic_north,clinic_south,clinic_west
0,1,February,51,45,145,45
1,3,April,80,80,54,180
2,5,June,112,109,79,129


In [51]:
# get rid of index column 
df2.reset_index(inplace = True, drop = True)
df2

Unnamed: 0,month,clinic_east,clinic_north,clinic_south,clinic_west
0,February,51,45,145,45
1,April,80,80,54,180
2,June,112,109,79,129


## Review

You’ve completed the lesson! You’ve just learned the basics of working with a single table in Pandas, including:

* Creating a table from scratch
* Loading data from another file
* Selecting certain rows or columns of a table

Let’s practice what you’ve learned.


In [57]:

import pandas as pd

#Part 1: reading the csv
orders = pd.read_csv('shoefly.csv')

#Part 2: inspecting the first five lines of data
orders.head(5)


Unnamed: 0,id,first_name,last_name,email,shoe_type,shoe_material,shoe_color
0,54791,Rebecca,Lindsay,RebeccaLindsay57@hotmail.com,clogs,faux-leather,black
1,53450,Emily,Joyce,EmilyJoyce25@gmail.com,ballet flats,faux-leather,navy
2,91987,Joyce,Waller,Joyce.Waller@gmail.com,sandals,fabric,black
3,14437,Justin,Erickson,Justin.Erickson@outlook.com,clogs,faux-leather,red
4,79357,Andrew,Banks,AB4318@gmail.com,boots,leather,brown


In [59]:

#Part 3: selecting the column 'email'
emails = orders.email
print(emails)


0     RebeccaLindsay57@hotmail.com
1           EmilyJoyce25@gmail.com
2           Joyce.Waller@gmail.com
3      Justin.Erickson@outlook.com
4                 AB4318@gmail.com
5           JulieMarsh59@gmail.com
6                 TJ5470@gmail.com
7           Janice.Hicks@gmail.com
8        GabrielPorter24@gmail.com
9        FrancesPalmer50@gmail.com
10         JessicaHale25@gmail.com
11      LawrenceParker44@gmail.com
12         SusanDennis58@gmail.com
13                DO2680@gmail.com
14       Rebecca.Charles@gmail.com
15              JC2072@hotmail.com
16              VS4753@outlook.com
17          RoyTillman20@gmail.com
18       Thomas.Roberson@gmail.com
19         ANewton1977@outlook.com
Name: email, dtype: object


In [61]:

#Part 4: Frances Palmer claims that her order was wrong. What did Frances Palmer order?
# Use logic to select that row of orders and save it to the variable frances_palmer.

frances_palmer = orders[(orders.first_name == 'Frances') & (orders.last_name == 'Palmer')]
frances_palmer



Unnamed: 0,id,first_name,last_name,email,shoe_type,shoe_material,shoe_color
9,62083,Frances,Palmer,FrancesPalmer50@gmail.com,wedges,leather,white


In [62]:
#Part 5: Comfy feet means more time on the street
comfy_shoes = orders[orders.shoe_type.isin(['clogs', 'boots', 'ballet flats'])]

comfy_shoes

Unnamed: 0,id,first_name,last_name,email,shoe_type,shoe_material,shoe_color
0,54791,Rebecca,Lindsay,RebeccaLindsay57@hotmail.com,clogs,faux-leather,black
1,53450,Emily,Joyce,EmilyJoyce25@gmail.com,ballet flats,faux-leather,navy
3,14437,Justin,Erickson,Justin.Erickson@outlook.com,clogs,faux-leather,red
4,79357,Andrew,Banks,AB4318@gmail.com,boots,leather,brown
6,20487,Thomas,Jensen,TJ5470@gmail.com,clogs,fabric,navy
7,76971,Janice,Hicks,Janice.Hicks@gmail.com,clogs,faux-leather,navy
8,21586,Gabriel,Porter,GabrielPorter24@gmail.com,clogs,leather,brown
10,91629,Jessica,Hale,JessicaHale25@gmail.com,clogs,leather,red
12,45832,Susan,Dennis,SusanDennis58@gmail.com,ballet flats,fabric,white
14,73431,Rebecca,Charles,Rebecca.Charles@gmail.com,boots,faux-leather,white
