# Discussion 03: Arrays and DataFrames

Welcome to Discussion 03! At the end of last week's discussion, we got a sneak peek at arrays and tables.

This week, we will go back over these data structures and work with some familiar datasets.

You can find additional help on these topics in the 'Arrays' and 'DataFrames' sections of the course [textbook](https://eldridgejm.github.io/dive_into_data_science/front.html).

[Here](https://ucsd-ets.github.io/dsc10-2020-fa/published/default/reference/babypandas-reference.pdf) is the reference sheet we saw last time.

<img src="data/panda_basketball.jpg" width="600">

In [1]:
# please don't change this cell, but do make sure to run it
import babypandas as bpd
import matplotlib.pyplot as plt
import numpy as np
import otter
grader = otter.Notebook()

## Part 1 : Arrays vs. Lists

---
Arrays and lists are helpful when we want to store and manipulate **sequences** of data.

##### Lists
- Built into Python
- Friendly with different data types
- SLOW for numerical work on large sequences

##### Arrays
- Not built into Python directly (that's why we have NumPy!)
- Elements must be the same data type
- MUCH FASTER
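
To make the contrast concrete, here is a small sketch of how the same operators behave on a list and on an array. The variable names and values are made up for illustration and are not part of the discussion datasets.

```python
import numpy as np

# Hypothetical example data, not from the discussion datasets
prices_list = [1, 2, 3]
prices_array = np.array([1, 2, 3])

# + on lists concatenates; + on arrays adds element by element
list_plus = prices_list + prices_list        # [1, 2, 3, 1, 2, 3]
array_plus = prices_array + prices_array     # array([2, 4, 6])

# * on a list repeats it; * on an array multiplies every element
list_times = prices_list * 2                 # [1, 2, 3, 1, 2, 3]
array_times = prices_array * 2               # array([2, 4, 6])

# A list keeps each element's own type; an array stores a single dtype
mixed_list = [1, 2.5]                        # an int and a float
mixed_array = np.array([1, 2.5])             # both elements stored as floats

print(list_plus, array_plus, list_times, array_times)
print(mixed_array.dtype)
```

The arithmetic behavior, more than raw speed, is usually the reason we reach for arrays in this class.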


## Array Problems (part 1)

In [2]:
some_array = np.array([6, 1, 9, 5, 2, 3, 4, 3, 2, 4])

**Question 1** : How many elements are in ```some_array```?

```
BEGIN QUESTION
name: q11
```

In [3]:
num_elems = some_array.size # SOLUTION
num_elems

10

In [4]:
## TEST ##
num_elems == 10

True

**Question 2** : How do we access the *first* element of ```some_array```?

```
BEGIN QUESTION
name: q12
```

In [5]:
first_elem = some_array[0] # SOLUTION
first_elem

6

In [6]:
## TEST ##
first_elem == 6

True

**Question 3** : How do we access the *last* element of ```some_array```?

```
BEGIN QUESTION
name: q13
```

In [7]:
last_elem = some_array[-1] # SOLUTION
last_elem

4

In [8]:
## TEST ##
last_elem == 4

True

In [9]:
last_index = some_array.size - 1 # SOLUTION NO PROMPT
last_elem = some_array[last_index] # SOLUTION NO PROMPT

**Question 4** : What happens when we do ```some_array[-2]```?

In [10]:
some_array

array([6, 1, 9, 5, 2, 3, 4, 3, 2, 4])

In [11]:
some_array[-2]

2

In [12]:
neg_2 = some_array[-2] # SOLUTION NO PROMPT
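
A short sketch of what is going on here: a negative position counts backwards from the end of the array, so `-1` is the last element and `-2` is the one before it. The array is the same `some_array` as above; the other variable names are only for illustration.

```python
import numpy as np

some_array = np.array([6, 1, 9, 5, 2, 3, 4, 3, 2, 4])

# Negative positions count backwards from the end of the array
last = some_array[-1]              # same element as some_array[some_array.size - 1]
second_to_last = some_array[-2]    # same element as some_array[some_array.size - 2]

print(last, some_array[some_array.size - 1])
print(second_to_last, some_array[some_array.size - 2])
```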

**Question 5 - BONUS** : How do we make a new array that contains only the first 5 elements from ```some_array```? 

```
BEGIN QUESTION
name: q15
```

In [13]:
first_five = some_array[:5] # SOLUTION
first_five

array([6, 1, 9, 5, 2])

In [14]:
## TEST ##
(first_five == np.array([6,1,9,5,2])).all()

True

In [15]:
array_1 = np.array([1,2,3,4,5,6,7,8])
array_2 = np.array([6,7,8,9,10,11,12,13])

**Question 6** : How do we get the element-wise sum of ```array_1``` and ```array_2```?

```
BEGIN QUESTION
name: q16
```

In [16]:
elem_wise_sum = array_1 + array_2 # SOLUTION
elem_wise_sum

array([ 7,  9, 11, 13, 15, 17, 19, 21])

In [17]:
## TEST ##
(elem_wise_sum == np.array([ 7,  9, 11, 13, 15, 17, 19, 21])).all()

True
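
Element-wise addition needs two arrays of the same length (or an array and a single number), which is why the solution adds ```array_1``` and ```array_2``` rather than involving the 10-element ```some_array```. The sketch below shows both cases; `plus_one` and `mismatch_error` are names made up for this example.

```python
import numpy as np

some_array = np.array([6, 1, 9, 5, 2, 3, 4, 3, 2, 4])   # 10 elements
array_1 = np.array([1, 2, 3, 4, 5, 6, 7, 8])            # 8 elements

# Adding a single number applies it to every element
plus_one = array_1 + 1

# Adding arrays of different lengths (10 vs. 8) raises an error
try:
    some_array + array_1
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print(plus_one)
print(type(mismatch_error).__name__)
```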

**Question 7** : How do we get the max element from ```some_array```?

```
BEGIN QUESTION
name: q17
```

In [18]:
max_elem = max(some_array) # SOLUTION
max_elem

9

In [19]:
## TEST ##
max_elem == 9

True

In [20]:
max_elem = some_array.max() # SOLUTION NO PROMPT

**Question 8 - BONUS** : How do we get the average of the first 6 elements from ```some_array```?

```
BEGIN QUESTION
name: q18
```

In [21]:
first_six_average = np.mean(some_array[:6]) # SOLUTION
first_six_average

4.333333333333333

In [22]:
## TEST ##
import math
math.isclose(first_six_average, 4.33333333333333)

True

**Question 9** : How do we make an array with every non-negative integer under 13?

```
BEGIN QUESTION
name: q19
```

In [23]:
under_13 = np.arange(13) # SOLUTION
under_13

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [24]:
## TEST ##
(under_13 == np.array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])).all()

True

**Question 10** : How do we make an array of [6,9,12,15,18,21]?

```
BEGIN QUESTION
name: q110
```

In [25]:
threes_array = np.arange(6,22,3) # SOLUTION
threes_array

array([ 6,  9, 12, 15, 18, 21])

In [26]:
## TEST ##
(threes_array == np.array([ 6,  9, 12, 15, 18, 21])).all()

True
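
The last two answers rely on how ```np.arange``` treats its arguments: the start is included and the stop is excluded. A minimal sketch of why the stop above is 22 rather than 21 (the names `with_21` and `without_21` are made up for this example):

```python
import numpy as np

# np.arange(start, stop, step): start is included, stop is excluded
with_21 = np.arange(6, 22, 3)      # a stop of 22 lets 21 in
without_21 = np.arange(6, 21, 3)   # a stop of 21 leaves 21 out

# With one argument, counting starts at 0 and the step is 1
under_13 = np.arange(13)

print(with_21)
print(without_21)
print(under_13.size)
```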

**Question 11** : How do we make an array of the powers of 2 from 2^2 through 2^6, i.e. [4,8,16,32,64]?

```
BEGIN QUESTION
name: q111
```

In [27]:
powers_of_two = pow(2,np.arange(2,7)) # SOLUTION
powers_of_two

array([ 4,  8, 16, 32, 64])

In [28]:
## TEST ##
(powers_of_two == np.array([ 4,  8, 16, 32, 64])).all()

True

**Remember!** NumPy arrays convert all of their elements to a single data type:

In [29]:
random_array = np.array([45,"hello", True, 987, 34.5, "Yes"])
random_array

array(['45', 'hello', 'True', '987', '34.5', 'Yes'], dtype='<U21')

In [30]:
# But lists will not:
random_list = ['hello', 'there', 'buddy', 43, True, 38.9, 'DSC 10']
random_list

['hello', 'there', 'buddy', 43, True, 38.9, 'DSC 10']

In [31]:
print(type(random_array))
print(type(random_list))

<class 'numpy.ndarray'>
<class 'list'>


## Part 2 : Tables

---
Tables are a slightly more complex data structure that is helpful when we want to store and manipulate larger datasets with lots of information. We often refer to tables as "data frames", and these terms are interchangeable in this class.

**Rows** are labeled arrays that correspond to different entries or samples in the table. (The labels are typically *unique* and easily identifiable names.)

**Columns** are labeled arrays that correspond to the different pieces of information we care about. (The labels are the titles of each column.)

The **Index** of a table is an array completely separate from the columns and contains all of the row labels.

A **Series** is a labeled array that corresponds to a single column from the table.

##### Important note
To get a particular element from a table, in this class, we'll always ```.get()``` the column label, then ```.loc[]``` the row label.
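
A small sketch of that pattern, written with plain pandas on a made-up table so it runs without the course data. The `snacks` table and its values are hypothetical. babypandas is a simplified subset of pandas, so the same ```.get()``` and ```.loc[]``` calls should carry over, but treat that as an assumption.

```python
import pandas as pd

# Hypothetical table; names and numbers are invented for illustration
snacks = pd.DataFrame({
    "name": ["Pretzels", "Popcorn", "Trail Mix"],
    "calories": [110, 90, 150],
    "salty": [1, 1, 0],
}).set_index("name")

# Step 1: .get() the column label -> a Series labeled by the index
calories = snacks.get("calories")

# Step 2: .loc[] the row label -> a single element
popcorn_calories = calories.loc["Popcorn"]

# The two steps can also be chained
same_value = snacks.get("calories").loc["Popcorn"]
print(popcorn_calories, same_value)
```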


## Table Problems (Part 2)

# Ultimate Halloween Candy Showdown (remember this?)
---
269,000 user-submitted winners of head-to-head candy matchups

### Read from CSV and Set the index

In [32]:
candy = bpd.read_csv("data/candy.csv")
candy = candy.set_index('competitorname')
candy

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...,...
Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519
WertherÕs Original Caramel,0,0,1,0,0,0,1,0,0,0.186,0.267,41.904308


In [33]:
candy.get("sugarpercent")

competitorname
100 Grand                     0.732
3 Musketeers                  0.604
One dime                      0.011
One quarter                   0.011
Air Heads                     0.906
                              ...  
Twizzlers                     0.220
Warheads                      0.093
WelchÕs Fruit Snacks          0.313
WertherÕs Original Caramel    0.186
Whoppers                      0.872
Name: sugarpercent, Length: 85, dtype: float64

## Q2.1

Store the names of the 5 most popular candies (highest ```winpercent```) in an array.

```
BEGIN QUESTION
name: q21
```

In [34]:
top_five = np.array(candy.sort_values(by='winpercent', ascending = False).iloc[0:5].index) # SOLUTION
top_five

array(['ReeseÕs Peanut Butter cup', 'ReeseÕs Miniatures', 'Twix',
       'Kit Kat', 'Snickers'], dtype=object)

In [35]:
## TEST ##
top_five[0] == 'ReeseÕs Peanut Butter cup' and top_five[1] == 'ReeseÕs Miniatures' and top_five[2] == 'Twix' and top_five[3] == 'Kit Kat' and top_five[4] == 'Snickers'

True

## Q2.2

How many candies have caramel but not chocolate?

```
BEGIN QUESTION
name: q22
```

In [36]:
candies_with_caramel_not_choco = candy[(candy.get("chocolate") == 0) & (candy.get("caramel") == 1)].shape[0] # SOLUTION
candies_with_caramel_not_choco

4

In [37]:
## TEST ##
candies_with_caramel_not_choco == 4

True

## Q2.3

What is the name of the one candy that has the word "candy" as part of its name? 

```
BEGIN QUESTION
name: q23
```

In [38]:
candy_name = candy[candy.index.str.contains("candy")].index[0] # SOLUTION
candy_name

'Smarties candy'

In [39]:
## TEST ## 
candy_name == 'Smarties candy'

True

## Q2.4
Do chocolate candies or non-chocolate candies have a higher average win percentage?
Your answer should be 1 for chocolate candies or 0 for non-chocolate candies.

```
BEGIN QUESTION
name: q24
```

In [40]:
higher_winpercent_choc = candy.get(['chocolate', 'winpercent']).groupby('chocolate').mean().sort_values(by='winpercent', ascending = False).index[0] # SOLUTION
higher_winpercent_choc

1

In [41]:
## TEST ##
higher_winpercent_choc == 1

True
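
The one-line solution chains several steps. Below is a minimal sketch of the same chain split into named steps, run on a made-up table with plain pandas. The `treats` table and its numbers are hypothetical, chosen so the group means are easy to check by hand.

```python
import pandas as pd

# Hypothetical table, invented for illustration
treats = pd.DataFrame({
    "chocolate": [1, 0, 1, 0],
    "winpercent": [70.0, 40.0, 60.0, 30.0],
}, index=["A", "B", "C", "D"])

# Step 1: keep only the grouping column and the column to average
two_cols = treats.get(["chocolate", "winpercent"])

# Step 2: one row per distinct value of 'chocolate', holding that group's mean
group_means = two_cols.groupby("chocolate").mean()

# Step 3: sort so the group with the highest mean comes first
ranked = group_means.sort_values(by="winpercent", ascending=False)

# Step 4: the index now holds the group labels (0 or 1); take the first one
winner = ranked.index[0]
print(group_means)
print(winner)
```

After the `groupby`, the grouping column becomes the index, which is why the final step reads the answer from `.index[0]` rather than from a column.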

## Q2.5

Among chocolate candies, are there more hard candies or non-hard candies?
Your answer should be 1 for hard candies or 0 for non-hard candies.

```
BEGIN QUESTION
name: q25
```

In [42]:
more_hard_or_soft = candy[candy.get('chocolate') == 1].get(['hard', 'chocolate']).groupby('hard').count().sort_values(by="chocolate", ascending = False).index[0] # SOLUTION
more_hard_or_soft

0

In [43]:
## TEST ##
more_hard_or_soft == 0

True

In [44]:
# recall df
candy

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...,...
Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519
WertherÕs Original Caramel,0,0,1,0,0,0,1,0,0,0.186,0.267,41.904308


## Q2.6 - BONUS

Which candy has the most 1s across all binary variables? (Hint: Use ```.apply()```)

```
BEGIN QUESTION
name: q26
```

In [45]:
competitor_name = candy.drop(columns = ['sugarpercent', 'pricepercent', 'winpercent']).apply(sum, axis = 1).sort_values(ascending = False).index[0] # SOLUTION
competitor_name

'Snickers'

In [46]:
## TEST ##
competitor_name == 'Snickers' or competitor_name == 'Snickers Crisper' or competitor_name == 'Baby Ruth'

True

In [47]:
grader.check_all()

# Supplemental Reference Sheet Examples


## More fun with Tables

In [48]:
# recall the full data frame we've been working with
candy

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...,...
Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519
WertherÕs Original Caramel,0,0,1,0,0,0,1,0,0,0.186,0.267,41.904308


# Building and Organizing DataFrames

add/replace a column : ```df.assign(Name_of_Column=column_data)```

drop a single column :  ```df.drop(columns=column_name)```

drop every column in a list of columns : ```df.drop(columns=[col_1_name, ..., col_k_name])```

move the index to a column : ```df.reset_index()```

move a column to the index : ```df.set_index(column_name)```

sort entire DataFrame by values in a column : ```df.sort_values(by=column_name)```
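
One detail worth keeping in mind: in pandas, these methods return a new DataFrame by default and leave the original untouched, which is why the cells below that want to keep a change write ```candy = candy.reset_index()```. Here is a short sketch on a made-up table with plain pandas (`df` and its values are hypothetical), assuming babypandas behaves the same way.

```python
import pandas as pd

# Hypothetical table, invented for illustration
df = pd.DataFrame({"name": ["x", "y", "z"], "score": [3, 1, 2]}).set_index("name")

with_double = df.assign(double=df.get("score") * 2)  # new table with an extra column
without_score = with_double.drop(columns="score")    # new table minus one column
by_score = df.sort_values(by="score")                # new table, rows reordered

print(list(df.columns))             # df itself still has only 'score'
print(list(with_double.columns))
print(list(without_score.columns))
print(list(by_score.index))
```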


In [49]:
# assign

new_col_data = np.arange(candy.shape[0])
candy.assign(new_col=new_col_data)

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent,new_col
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725,0
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936,1
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086,2
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505,3
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282,80
Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898,81
WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519,82
WertherÕs Original Caramel,0,0,1,0,0,0,1,0,0,0.186,0.267,41.904308,83


In [50]:
# drop (single)

candy.drop(columns='chocolate')

competitorname,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...
Twizzlers,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
Warheads,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,1,0,0,0,0,0,0,1,0.313,0.313,44.375519
WertherÕs Original Caramel,0,1,0,0,0,1,0,0,0.186,0.267,41.904308


In [51]:
# drop (multiple)

candy.drop(columns=['chocolate','fruity','caramel','peanutyalmondy','nougat'])

competitorname,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...
Twizzlers,0,0,0,0,0.220,0.116,45.466282
Warheads,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,0,0,0,1,0.313,0.313,44.375519
WertherÕs Original Caramel,0,1,0,0,0.186,0.267,41.904308


In [52]:
# reset index

candy = candy.reset_index()
candy

,competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
0,100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
1,3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
2,One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
3,One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
4,Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...,...,...
80,Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
81,Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
82,WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519
83,WertherÕs Original Caramel,0,0,1,0,0,0,1,0,0,0.186,0.267,41.904308


In [53]:
# set index

candy = candy.set_index('competitorname')
candy

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...,...
Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519
WertherÕs Original Caramel,0,0,1,0,0,0,1,0,0,0.186,0.267,41.904308


In [54]:
# sort

candy.sort_values(by="winpercent")

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
Nik L Nip,0,1,0,0,0,0,0,0,1,0.197,0.976,22.445341
Boston Baked Beans,0,0,0,1,0,0,0,0,1,0.313,0.511,23.417824
Chiclets,0,1,0,0,0,0,0,0,1,0.046,0.325,24.524988
Super Bubble,0,1,0,0,0,0,0,0,0,0.162,0.116,27.303865
Jawbusters,0,1,0,0,0,0,1,0,1,0.093,0.511,28.127439
...,...,...,...,...,...,...,...,...,...,...,...,...
Snickers,1,0,1,1,1,0,0,1,0,0.546,0.651,76.673782
Kit Kat,1,0,0,0,0,1,0,1,0,0.313,0.511,76.768600
Twix,1,0,1,0,0,1,0,1,0,0.546,0.906,81.642914
ReeseÕs Miniatures,1,0,0,1,0,0,0,0,0,0.034,0.279,81.866257


# Retrieving Information

## DataFrame methods

**Returns a Series**

retrieve column : ```df.get(column_name)```

**Returns a DataFrame**

retrieve several columns : ```df.get([col_1_name, ..., col_k_name])```

select row(s) by index position : ```df.take([pos_1, ..., pos_k])```

select row(s) using Boolean array : ```df[bool_array]```
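
In practice, the Boolean array passed to ```df[bool_array]``` usually comes from comparing a column to a value, as in Q2.2. A minimal sketch on a made-up table with plain pandas (`df` and its values are hypothetical):

```python
import pandas as pd

# Hypothetical table, invented for illustration
df = pd.DataFrame({
    "name": ["x", "y", "z"],
    "hard": [1, 0, 1],
    "price": [0.5, 0.9, 0.2],
}).set_index("name")

is_hard = df.get("hard") == 1          # Boolean Series, one True/False per row
is_cheap = df.get("price") < 0.4

hard_rows = df[is_hard]                   # keeps rows where is_hard is True
hard_and_cheap = df[is_hard & is_cheap]   # & combines conditions element by element

print(list(hard_rows.index))
print(hard_and_cheap.shape[0])
```

Each condition needs its own parentheses when written inline, because `&` binds more tightly than `==` and `<`.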

**Returns other type**

number of rows : ```df.shape[0]``` **number**

number of columns : ```df.shape[1]``` **number**

retrieve element in the index by its position : ```df.index[position]``` **index name**

## Series methods

**Returns an element**

retrieve an element by the *row label* : ```ser.loc[label]```

retrieve an element by its *index position* : ```ser.iloc[position]```

In [55]:
# rows 

candy.shape[0]

85

In [56]:
# columns

candy.shape[1]

12

In [57]:
# get 

candy.get("chocolate") # Series

competitorname
100 Grand                     1
3 Musketeers                  1
One dime                      0
One quarter                   0
Air Heads                     0
                             ..
Twizzlers                     0
Warheads                      0
WelchÕs Fruit Snacks          0
WertherÕs Original Caramel    0
Whoppers                      1
Name: chocolate, Length: 85, dtype: int64

In [58]:
# get 

candy.get(["chocolate", "caramel"]) # DataFrame

competitorname,chocolate,caramel
100 Grand,1,1
3 Musketeers,1,0
One dime,0,0
One quarter,0,0
Air Heads,0,0
...,...,...
Twizzlers,0,0
Warheads,0,0
WelchÕs Fruit Snacks,0,0
WertherÕs Original Caramel,0,1


In [59]:
# take

candy.take([0])

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.86,66.971725


In [60]:
# take

candy.take(np.arange(10))

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.86,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
Almond Joy,1,0,0,1,0,0,0,1,0,0.465,0.767,50.347546
Baby Ruth,1,0,1,1,1,0,0,1,0,0.604,0.767,56.914547
Boston Baked Beans,0,0,0,1,0,0,0,0,1,0.313,0.511,23.417824
Candy Corn,0,0,0,0,0,0,0,0,1,0.906,0.325,38.010963
Caramel Apple Pops,0,1,1,0,0,0,0,0,0,0.604,0.325,34.517681


In [61]:
# bool access

all_true = np.ones(candy.shape[0]).astype(bool) # must match number of rows in df!
print(all_true)

some_true = np.random.randint(2, size=candy.shape[0]).astype(bool) # must match number of rows in df!
print(some_true)

candy[some_true]

[ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True]
[ True  True  True  True  True  True  True  True  True  True False False
  True False False False False False False  True  True  True False False
  True False False False False False  True False  True  True False False
 False  True False  True False  True False False  True False False  True
 False  True False  True False False False False False False  True  True
  True False  True False  True  True False  True  True  True  True False
 False False False False  True  True False 

competitorname,chocolate,fruity,caramel,peanutyalmondy,nougat,crispedricewafer,hard,bar,pluribus,sugarpercent,pricepercent,winpercent
100 Grand,1,0,1,0,0,1,0,1,0,0.732,0.860,66.971725
3 Musketeers,1,0,0,0,1,0,0,1,0,0.604,0.511,67.602936
One dime,0,0,0,0,0,0,0,0,0,0.011,0.116,32.261086
One quarter,0,0,0,0,0,0,0,0,0,0.011,0.511,46.116505
Air Heads,0,1,0,0,0,0,0,0,0,0.906,0.511,52.341465
...,...,...,...,...,...,...,...,...,...,...,...,...
Tootsie Roll Snack Bars,1,0,0,0,0,0,0,1,0,0.465,0.325,49.653503
Twizzlers,0,1,0,0,0,0,0,0,0,0.220,0.116,45.466282
Warheads,0,1,0,0,0,0,1,0,0,0.093,0.116,39.011898
WelchÕs Fruit Snacks,0,1,0,0,0,0,0,0,1,0.313,0.313,44.375519


In [62]:
# series by label

win_series = candy.get("winpercent")

win_series.loc["Twizzlers"]

45.466282

In [63]:
# series by position

win_series.iloc[80]

45.466282