<b>Copyright Notice</b><br>
Copyright © 2019 DigiPen (USA) Corp. and its owners.  All rights reserved.<br>
No parts of this publication may be copied or distributed, transmitted, transcribed, stored in a retrieval system, or translated into any human or computer language without the express written permission of DigiPen (USA) Corp., 9931 Willows Road NE, Redmond, WA 98052<br>
<b>Trademarks</b><br>
DigiPen® is a registered trademark of DigiPen (USA) Corp.<br>
All other product names mentioned in this booklet are trademarks or registered trademarks of their respective companies and are hereby acknowledged.

The pandas library was first developed by Wes McKinney in 2008 for data manipulation and analysis.

#### References:
    www.python.org
    www.numpy.org
    www.matplotlib.org
    https://pandas.pydata.org

#### Questions/feedback: petert@digipen.edu

# Chapter07: Pandas Series
## pandas
    - Series, Index, Values
    - Selection and Filtering

### Import pandas:
    importing pandas as 'pd' is the standard convention among Python users
    importing the frequently used DataFrame and Series into the local namespace is good practice

In [1]:
import pandas as pd                     # 'pd' is the standard alias among Python users
from pandas import DataFrame            # optional, good practice
from pandas import Series               # optional, good practice

import numpy as np

### Series
    - one dimensional, similar to an array
    - a sequence (series) of values
    - associated (series of) data labels
##### Examples:

In [2]:
# create Series from a list of values
myseries = pd.Series(['apple', 2.7, 5, 'Friday', 42])
myseries

0     apple
1       2.7
2         5
3    Friday
4        42
dtype: object

In [3]:
l = ['apple', 2.7, 5, 'Friday', 42]

In [4]:
l[2]

5

In [5]:
print(['apple', 2.7, 5, 'Friday', 42])

['apple', 2.7, 5, 'Friday', 42]


In [6]:
type(myseries)

pandas.core.series.Series

Notes:
- the data type is shown as object because the elements are a mix of ints, floats and strings
- the left-hand column is the index, which starts from 0 by default
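
The notes above can be checked directly. The following is a minimal sketch, assuming the same list of values as before; the names `mixed` and `element_types` are introduced here only for illustration.

```python
import pandas as pd

# hypothetical name 'mixed', used only for this illustration
mixed = pd.Series(['apple', 2.7, 5, 'Friday', 42])

# the Series as a whole reports dtype object ...
print(mixed.dtype)

# ... while every element keeps its own Python type
element_types = [type(v).__name__ for v in mixed]
print(element_types)

# the default index starts at 0 and increases by 1
print(list(mixed.index))
```

An object Series stores references to ordinary Python objects, which is why the individual types survive.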

Let's see what happens if we create a series using integers and floats:

In [7]:
# create Series from a numerical array of values (ints)
myseries = pd.Series(range(4))
myseries

0    0
1    1
2    2
3    3
dtype: int64

In [8]:
# create Series from a numerical array of values (floats)
myseries = pd.Series(np.arange(4.0))  # using numpy array this time
myseries

0    0.0
1    1.0
2    2.0
3    3.0
dtype: float64

Now let's mix: load a list of integers and floats:

In [9]:
# create Series from a mixed list of floats and ints (the ints are converted to floats)
myseries = pd.Series([3.14, 2.71, 42, 101])
myseries

0      3.14
1      2.71
2     42.00
3    101.00
dtype: float64
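
The result above shows that integers are converted to floats when both appear in the same list. A minimal sketch of this behavior follows; the names `ints_only` and `mixed_numbers` are hypothetical and used only for illustration.

```python
import pandas as pd

ints_only = pd.Series([42, 101])         # only integers
mixed_numbers = pd.Series([3.14, 42])    # floats and integers together

print(ints_only.dtype)       # an integer dtype
print(mixed_numbers.dtype)   # a floating-point dtype

# the integer 42 is now stored as a float
print(type(mixed_numbers[1]))
```

Using a single numeric dtype for the whole Series lets pandas store the values in one compact NumPy array instead of a collection of separate Python objects.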

Now let's load a list of strings when creating a series:

In [10]:
# create Series from a list of strings
myseries = pd.Series(["first", "second", "third", "fourth"])
myseries

0     first
1    second
2     third
3    fourth
dtype: object

In [11]:
print(type(myseries[2]))

<class 'str'>


Notes:
- The type of the series is recognized as object
- Individual elements of the series are still recognized as strings:

In [12]:
print("The type of the third element in the series:", type(myseries[2]))

The type of the third element in the series: <class 'str'>


##### Specifying and reordering the index
Specify using a list of indices:

In [13]:
# specify a series of values and associated indices
myseries = pd.Series(["first | a", "second | c", "third | d", "fourth | b"], index=['a', 'c', 'd', 'b'])
myseries

a     first | a
c    second | c
d     third | d
b    fourth | b
dtype: object

Relabel by assigning a new list of indices:

In [14]:
# reassign the associated indices; note that the order of the values does not change, only the index labels
myseries.index = ['a', 'b', 'c', 'd']
myseries

a     first | a
b    second | c
c     third | d
d    fourth | b
dtype: object

Note that the index labels were replaced position by position, while the order of the values did not change. For example, the value "second | c" now carries the label b.
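
If the goal is to put the rows into a new label order while each value stays attached to its original label, `reindex` or `sort_index` can be used instead of assigning to `.index`. This is a minimal sketch; the names `s` and `reordered` are hypothetical.

```python
import pandas as pd

s = pd.Series(["first | a", "second | c", "third | d", "fourth | b"],
              index=['a', 'c', 'd', 'b'])

# reindex returns a NEW Series whose rows follow the requested label order;
# each value stays attached to its original label
reordered = s.reindex(['a', 'b', 'c', 'd'])
print(reordered)

# sort_index gives the same result here because the labels sort alphabetically
print(s.sort_index())

# the original Series is left untouched
print(s.index.tolist())
```

Both methods return a new Series, so the result has to be assigned to a name if it is needed later.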

##### Reference values and indices of a Series:

In [15]:
# recreate the same Series
myseries = pd.Series(["first | a", "second | c", "third | d", "fourth | b"], index=['a', 'c', 'd', 'b'])
print("The series:")
print(myseries, "\n")

# retrieve index values
print('Index values:')
print(myseries.index)

# retrieve values of the series
print('\nValues in the series:')
print(myseries.values)

The series:
a     first | a
c    second | c
d     third | d
b    fourth | b
dtype: object 

Index values:
Index(['a', 'c', 'd', 'b'], dtype='object')

Values in the series:
['first | a' 'second | c' 'third | d' 'fourth | b']


Both the indices and the values are iterable and can be accessed by position:

In [16]:
print("The second index is:", myseries.index[1])
print("The second value is:", myseries.values[1])

The second index is: c
The second value is: second | c
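
Because the index and the values are iterable, they can also be used in a loop. The sketch below assumes the same Series as above; the names `s` and `pairs` are hypothetical.

```python
import pandas as pd

s = pd.Series(["first | a", "second | c", "third | d", "fourth | b"],
              index=['a', 'c', 'd', 'b'])

# iterate over the index labels only
for label in s.index:
    print(label)

# items() yields (label, value) pairs in the order of the Series
pairs = list(s.items())
for label, value in pairs:
    print(label, "->", value)
```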


Reference and retrieve by index:

In [17]:
# reference and retrieve by index
print('Reference a single value by its index:')
print(myseries['c'])
print('\nReference multiple values by their indices:')
# reference and retrieve by multiple indices
print(myseries[['c', 'a']])

Reference a single value by its index:
second | c

Reference multiple values by their indices:
c    second | c
a     first | a
dtype: object


In [18]:
l = ['a', 'c', 'a']
myseries[l]

a     first | a
c    second | c
a     first | a
dtype: object
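
The square brackets above select by label. pandas also offers the explicit accessors `.loc` (by label) and `.iloc` (by position), which make the intent unambiguous. A minimal sketch, with hypothetical variable names:

```python
import pandas as pd

s = pd.Series(["first | a", "second | c", "third | d", "fourth | b"],
              index=['a', 'c', 'd', 'b'])

by_label = s.loc['c']         # select by index label
by_position = s.iloc[1]       # select by position (0-based)
print(by_label, "|", by_position)

# lists work with both accessors and return a Series
several_labels = s.loc[['c', 'a']]
several_positions = s.iloc[[1, 0]]
print(several_labels)
print(several_positions)
```

Here label 'c' sits at position 1, so both accessors return the same rows.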

In [20]:
# retrieve values based on condition
numberSeries = pd.Series([3.14, 42, 2.71, 101])
print('List all values if they are greater than 40:')
numberSeries[numberSeries>40]

List all values if they are greater than 40:


1     42.0
3    101.0
dtype: float64
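
The condition inside the brackets is itself a Series of booleans, often called a mask. The sketch below shows the mask on its own and how two conditions can be combined; the names `numbers`, `mask` and `between` are hypothetical.

```python
import pandas as pd

numbers = pd.Series([3.14, 42, 2.71, 101])

# the comparison produces one True/False per element
mask = numbers > 40
print(mask)

# indexing with the mask keeps only the rows where it is True
print(numbers[mask])

# combine conditions with & (and) or | (or); each condition needs its own parentheses
between = numbers[(numbers > 3) & (numbers < 50)]
print(between)
```

Python's `and` and `or` keywords do not work element-wise on a Series, which is why `&` and `|` are used here.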

Let's go back to the indices and values.

Swap the values so they follow their original indices:

In [21]:
# recreate the same Series
myseries = pd.Series(["first | a", "second | c", "third | d", "fourth | b"], index=['a', 'c', 'd', 'b'])
myseries

a     first | a
c    second | c
d     third | d
b    fourth | b
dtype: object

In [22]:
# swap 2nd and 4th elements by position (.iloc), because the index holds string labels
t = myseries.iloc[1]   # store 2nd temporarily
myseries.iloc[1] = myseries.iloc[3]
myseries.iloc[3] = t
myseries

a     first | a
c    fourth | b
d     third | d
b    second | c
dtype: object

In [None]:
# now swap 3rd and 4th elements by position
t = myseries.iloc[2]
myseries.iloc[2] = myseries.iloc[3]
myseries.iloc[3] = t
myseries

Notice that only the values were swapped; the indices have not changed.

How do we handle such an operation if there is no built-in method? Write your own function!

In [None]:
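# swap takes two positions (n, m) and a series, and swaps both the values and the index labels
def swap(n, m, s):
    # swap the values by position
    s.iloc[n], s.iloc[m] = s.iloc[m], s.iloc[n]

    # an Index should not be modified in place, so swap the labels in a list copy
    labels = s.index.tolist()
    labels[n], labels[m] = labels[m], labels[n]
    s.index = labels    # assign the new labels back to the series

    print(s)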
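
An alternative that leaves the original Series unchanged is to build a permuted list of positions and select with `.iloc`, which moves values and labels together. This is only a sketch of one possible approach; the function name `swapped` and the variable names are hypothetical.

```python
import pandas as pd

def swapped(s, n, m):
    """Return a new Series with the rows at positions n and m exchanged."""
    positions = list(range(len(s)))
    positions[n], positions[m] = positions[m], positions[n]
    # selecting by a list of positions carries each label along with its value
    return s.iloc[positions]

original = pd.Series(["first | a", "second | c", "third | d", "fourth | b"],
                     index=['a', 'c', 'd', 'b'])

result = swapped(original, 1, 3)
print(result)

# the original Series keeps its order
print(original)
```

Returning a new Series avoids surprising side effects when the same data is used elsewhere in the notebook.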

In [None]:
# specify the original series again
myseries = pd.Series(["first | a", "second | c", "third | d", "fourth | b"], index=['a', 'c', 'd', 'b'])
myseries

Swap 2nd and 4th elements:

In [None]:
swap(1, 3, myseries)

Swap 3rd and 4th elements:

In [None]:
swap(2, 3, myseries)

Now both values and indices are swapped at the same time.

In [None]:
myseries = pd.Series(["first | a", "second first | a", "second | c", "third | d", "fourth | b"], index=['a', 'a', 'c', 'd', 'b'])
myseries

In [None]:
myseries.values

In [None]:
myseries.index
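
The Series above contains the label 'a' twice, which pandas allows. A minimal sketch of how selection behaves in that case follows; the name `dup` is hypothetical.

```python
import pandas as pd

dup = pd.Series(["first | a", "second first | a", "second | c", "third | d", "fourth | b"],
                index=['a', 'a', 'c', 'd', 'b'])

# is_unique reports whether every label occurs only once
print(dup.index.is_unique)

# a repeated label returns every matching row as a Series
print(dup.loc['a'])

# a label that occurs once still returns a single value
print(dup.loc['c'])
```

Code that expects a single value from a label lookup should check `index.is_unique` first when duplicates are possible.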

#### Homework 7.1:
Create a series:
- create a series using 5 random integers between 1 and 9
- display a pie chart using pd.Series.plot.pie()

In [None]:
# Homework 7.1 code comes here:



#### Homework 7.2:
- create a series using 20 random integers between 1 and 9
- display a histogram using pd.Series.plot.hist()

In [None]:
# Homework 7.2 code comes here:



#### Homework 7.3:
- create a series using 20 random integers between 1 and 9
- display a box plot using pd.Series.plot.box()

In [None]:
# Homework 7.3 code comes here:

