In [1]:
import pandas as pd
import numpy as np

**Querying a Series**
- `iloc[]` (by integer position), `loc[]` (by index label)

In [2]:
students_class = {'Alice':'Physics',
                  'Jack':'Chemistry',
                  'Molly': 'English',
                  'Sam':'History'}
s = pd.Series(students_class)
s

Alice      Physics
Jack     Chemistry
Molly      English
Sam        History
dtype: object
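When a Series is built from a dictionary, the keys become the index labels and the values become the data. The sketch below inspects them through the standard `s.index` and `s.values` attributes. It rebuilds the same `students_class` Series so it can run on its own.

```python
import pandas as pd

students_class = {'Alice': 'Physics',
                  'Jack': 'Chemistry',
                  'Molly': 'English',
                  'Sam': 'History'}
s = pd.Series(students_class)

# Dictionary keys become the index labels, in insertion order
labels = list(s.index)
# Dictionary values become the data
values = list(s.values)

print(labels)  # ['Alice', 'Jack', 'Molly', 'Sam']
print(values)  # ['Physics', 'Chemistry', 'English', 'History']
```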

**Ways to get data out of the series**

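To get the fourth entry by its integer position, we can use `iloc`. Plain `[]` indexing returns the same value here only because the index labels are strings, so pandas falls back to treating `3` as a position. Likewise, `loc` looks up an entry by its index label.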

In [3]:
s.iloc[3], s[3]  # s[3] uses a positional fallback that recent pandas versions deprecate; prefer iloc

('History', 'History')

In [4]:
s.loc['Molly'], s['Molly']

('English', 'English')
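`iloc` and `loc` also accept slices and lists. One detail that trips people up: a position slice excludes its stop position, while a label slice includes its end label. The sketch below uses the same four students to show both, plus selection with a list of labels.

```python
import pandas as pd

s = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry',
               'Molly': 'English', 'Sam': 'History'})

# Position-based slice: the stop position (2) is excluded
by_position = s.iloc[0:2]
# Label-based slice: the end label ('Molly') is included
by_label = s.loc['Alice':'Molly']
# A list of labels returns a sub-Series in the requested order
picked = s.loc[['Sam', 'Alice']]

print(list(by_position))  # ['Physics', 'Chemistry']
print(list(by_label))     # ['Physics', 'Chemistry', 'English']
print(list(picked))       # ['History', 'Physics']
```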

In [5]:
class_code = {99: 'Physics',
              100: 'Chemistry',
              101: 'English',
              102: 'History'}
s = pd.Series(class_code)

In [6]:
s.iloc[0]

'Physics'
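When the index itself is made of integers, plain `[]` becomes ambiguous: pandas cannot tell from the key alone whether `0` means a position or a label, and with an integer index it treats the key as a label. The sketch below reuses `class_code` to show that `s[0]` raises `KeyError` (there is no label `0`), which is why the explicit `iloc` and `loc` are the safer choice.

```python
import pandas as pd

class_code = {99: 'Physics', 100: 'Chemistry',
              101: 'English', 102: 'History'}
s = pd.Series(class_code)

# With an integer index, an integer key in [] is looked up as a label
try:
    s[0]
    zero_lookup = 'found'
except KeyError:
    zero_lookup = 'KeyError'

print(zero_lookup)  # KeyError: there is no label 0
print(s.loc[99])    # Physics (label-based)
print(s.iloc[0])    # Physics (position-based)
```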

A typical programmatic approach is to iterate over all the items in the series and apply the operation of interest to each one. For instance, we could create a series of integers representing student grades and compute the average grade.

In [7]:
# Approach 1: explicit Python loop
grades = pd.Series([90,80,70,60])

total = 0
for grade in grades:
    total += grade
print(total/len(grades))


75.0


In [8]:
# Approach 2: vectorized sum with NumPy
total = np.sum(grades)
print(total/len(grades))

75.0
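pandas also provides the average directly as `Series.mean()`, so neither the loop nor the manual division is strictly needed. A minimal sketch with the same grades:

```python
import pandas as pd

grades = pd.Series([90, 80, 70, 60])

# Series.mean() computes the average without an explicit loop
average = grades.mean()
# Equivalent to summing and dividing by the number of elements
manual = grades.sum() / len(grades)

print(average)  # 75.0
```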


**Example: timing a loop against a vectorized sum**

In [9]:
numbers = pd.Series(np.random.randint(0,1000,10000))
numbers.head(), len(numbers)

(0    207
 1    601
 2    766
 3    392
 4    774
 dtype: int64, 10000)

In [10]:
%%timeit -n 100
total = 0
for number in numbers:
    total += number

total/len(numbers)

1.68 ms ± 470 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [11]:
%%timeit -n 100
total = np.sum(numbers)
total/len(numbers)

70 µs ± 15.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


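The vectorized `np.sum` version is much faster: about 70 µs per loop versus about 1.68 ms for the explicit Python loop in this run, roughly 24 times faster. Exact timings vary by machine.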
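`%%timeit` is an IPython magic, so it is not available in a plain Python script. A rough equivalent using the standard-library `timeit` module is sketched below. The function names `loop_mean` and `vectorized_mean` are illustrative, and the absolute timings will differ from the notebook's.

```python
import timeit

import numpy as np
import pandas as pd

numbers = pd.Series(np.random.randint(0, 1000, 10000))

def loop_mean(series):
    # Explicit Python loop: one interpreter step per element
    total = 0
    for number in series:
        total += number
    return total / len(series)

def vectorized_mean(series):
    # np.sum hands the whole array to compiled NumPy code
    return np.sum(series) / len(series)

# Both approaches must agree before comparing speed
assert loop_mean(numbers) == vectorized_mean(numbers)

loop_time = timeit.timeit(lambda: loop_mean(numbers), number=100)
vec_time = timeit.timeit(lambda: vectorized_mean(numbers), number=100)
print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s for 100 runs")
```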

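A related idea is broadcasting: an operation between a Series and a scalar is applied to every element, so `numbers += 2` adds 2 to each value without an explicit loop.

In [12]: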
print(numbers.head())
numbers += 2
print(numbers.head())

0    207
1    601
2    766
3    392
4    774
dtype: int64
0    209
1    603
2    768
3    394
4    776
dtype: int64
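Two details about broadcasting are worth knowing. `numbers + 2` (without `=`) returns a new Series and leaves the original untouched. Arithmetic between two Series aligns values by index label rather than by position, and labels present in only one operand produce `NaN`. A small sketch with hypothetical `scores` and `bonus` series:

```python
import pandas as pd

scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Scalar broadcasting: returns a new Series, scores itself is unchanged
shifted = scores + 2

# Series-to-Series arithmetic aligns on index labels, not positions
bonus = pd.Series([1, 100], index=['c', 'a'])
combined = scores + bonus

print(shifted.tolist())   # [12, 22, 32]
print(combined.tolist())  # [110.0, nan, 31.0]
```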


**`items()`** (formerly `iteritems()`, which was removed in pandas 2.0):
iterate through all of the `(label, value)` pairs in the series

In [13]:
# iteritems() was removed in pandas 2.0; items() yields (label, value) pairs
for label, value in numbers.items():
    # Series.set_value() was removed in pandas 1.0; .at[] sets a single value by label
    numbers.at[label] = value + 2

numbers.head()

0    211
1    605
2    770
3    396
4    778
dtype: int64

In [15]:
# %%timeit -n 10
# s = pd.Series(np.random.randint(0,1000,10000))
#
# for label, value in s.items():
#     s.loc[label] = value + 2
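The commented-out cell above was meant to time element-by-element updates. A sketch of that comparison as a plain script follows, contrasting an `items()`/`.at[]` loop with the single broadcast `series + 2`. The helper names `add_two_loop` and `add_two_vectorized` are hypothetical. Expect the loop to be far slower, though exact numbers depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.Series(np.random.randint(0, 1000, 10000))

def add_two_loop(series):
    # Element-by-element update through items() and .at[]
    out = series.copy()
    for label, value in out.items():
        out.at[label] = value + 2
    return out

def add_two_vectorized(series):
    # One broadcast operation over the whole Series
    return series + 2

# Both versions produce identical values
assert add_two_loop(base).equals(add_two_vectorized(base))

loop_time = timeit.timeit(lambda: add_two_loop(base), number=10)
vec_time = timeit.timeit(lambda: add_two_vectorized(base), number=10)
print(f"loop: {loop_time:.3f}s, vectorized: {vec_time:.5f}s for 10 runs")
```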