<a href="https://colab.research.google.com/github/verticalmeadows/freecodecamp-intro-to-pandas/blob/master/Copy_of_1_Pandas_Series_checkpoint.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![rmotr](https://user-images.githubusercontent.com/7065401/52071918-bda15380-2562-11e9-828c-7f95297e4a82.png)
<hr style="margin-bottom: 40px;">

<img src="https://user-images.githubusercontent.com/7065401/75165824-badf4680-5701-11ea-9c5b-5475b0a33abf.png"
    style="width:300px; float: right; margin: 0 40px 40px 40px;"></img>

# Pandas - Series


![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)

## Hands on! 

In [1]:
import pandas as pd
import numpy as np

## Pandas Series

We'll start by analyzing "[The Group of Seven](https://en.wikipedia.org/wiki/Group_of_Seven)", a political forum formed by Canada, France, Germany, Italy, Japan, the United Kingdom and the United States. We'll begin with population, and for that we'll use a `pandas.Series` object.

In [2]:
# In millions
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])

In [None]:
g7_pop

Someone might not know we're representing population in millions of inhabitants. Series can have a `name`, to better document the purpose of the Series:

In [3]:
g7_pop.name = 'G7 Population in millions'

In [None]:
g7_pop

Series are pretty similar to numpy arrays:

In [None]:
g7_pop.dtype

In [None]:
g7_pop.values

They're actually backed by numpy arrays:

In [None]:
type(g7_pop.values)
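
As a minimal sketch of that relationship (the variable names `pop` and `arr` are illustrative, not part of the notebook), the snippet below pulls the underlying array out of a small Series with `to_numpy()`, which recent pandas documentation suggests as an alternative to `.values`:

```python
import numpy as np
import pandas as pd

# A small, illustrative Series (first three G7 values, in millions)
pop = pd.Series([35.467, 63.951, 80.940])

# to_numpy() returns the data as a NumPy array
arr = pop.to_numpy()

print(type(arr))   # the array type backing the Series
print(arr.dtype)   # matches pop.dtype
```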

And they _look_ like simple Python lists or Numpy Arrays. But they're actually more similar to Python `dict`s.

A Series has an `index`, that's similar to the automatic index assigned to Python's lists:

In [None]:
g7_pop[0]

In [None]:
g7_pop[1]

In [None]:
g7_pop.index

But, in contrast to lists, we can explicitly define the index:

In [4]:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]

In [None]:
g7_pop

Compare it with the [following table](https://docs.google.com/spreadsheets/d/1IlorV2-Oh9Da1JAZ7weVw86PQrQydSMp-ydVMH135iI/edit?usp=sharing): 

<img width="350" src="https://user-images.githubusercontent.com/872296/38149656-b5ce9816-3431-11e8-88e4-195756e25355.png" />

Series, when indexed like this, look a lot like Python dictionaries. But a Series is always ordered, whereas dictionaries only guarantee insertion order since Python 3.7.

We can say that Series look like "ordered dictionaries". We can actually create Series out of dictionaries:

In [None]:
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')

In [None]:
pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
       'United States'],
    name='G7 Population in millions')
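
To check that the two constructions are interchangeable, here is a small sketch on a three-country subset (the names `data`, `from_dict` and `from_list` are made up for this example). It compares both results with `Series.equals`:

```python
import pandas as pd

# Illustrative subset of the G7 data, in millions
data = {'Canada': 35.467, 'France': 63.951, 'Germany': 80.94}

# Built from a dictionary: keys become the index
from_dict = pd.Series(data, name='population')

# Built from a list of values plus an explicit index
from_list = pd.Series(
    list(data.values()),
    index=list(data.keys()),
    name='population',
)

print(from_dict.equals(from_list))  # True: same values, same index
print(list(from_dict.index))        # index follows the dict's insertion order
```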

You can also create Series out of other series, specifying indexes:

In [None]:
pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])
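
A label that does not exist in the source Series (like `'Spain'` above) is filled with `NaN`. The sketch below, using a hypothetical three-country Series named `pop`, shows this and compares the result with `reindex`, which behaves the same way here:

```python
import pandas as pd

# Illustrative source Series, in millions
pop = pd.Series({'France': 63.951, 'Germany': 80.94, 'Italy': 60.665})

# Select and reorder by label; 'Spain' is not in pop, so it becomes NaN
subset = pd.Series(pop, index=['France', 'Italy', 'Spain'])

print(subset)
print(subset.isna())  # True only for the missing label

# reindex gives the same result for this case
same = pop.reindex(['France', 'Italy', 'Spain'])
```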

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Indexing

Indexing works similarly to lists and dictionaries: you use the **index** of the element you're looking for:

In [5]:
g7_pop['Canada']

35.467

In [6]:
g7_pop['Japan']

127.061

Numeric positions can also be used, with the `iloc` attribute:

In [8]:
g7_pop[0]  # positional fallback on a labeled index; deprecated in recent pandas, prefer .iloc[0]

35.467

In [7]:
g7_pop.iloc[0]

35.467

In [None]:
g7_pop.iloc[-1]
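
To keep label-based and position-based access clearly apart, pandas also offers the explicit `loc` accessor alongside `iloc`. A minimal sketch, assuming a small Series named `pop` created only for this example:

```python
import pandas as pd

pop = pd.Series(
    [35.467, 63.951, 80.94],
    index=['Canada', 'France', 'Germany'],
)

by_label = pop.loc['France']   # look up by index label
by_position = pop.iloc[1]      # look up by integer position
last = pop.iloc[-1]            # negative positions count from the end

print(by_label, by_position, last)
```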

Selecting multiple elements at once:

In [None]:
g7_pop[['Italy', 'France']]

_(The result is another Series)_

In [None]:
g7_pop.iloc[[0, 1]]

Slicing also works, but note an **important** difference: in pandas, when slicing by label, the upper limit is also included:

In [None]:
g7_pop['Canada': 'Italy']

In [9]:
g7_pop[0:2]  # with numeric positions, the upper limit is excluded (as in Python lists)

Canada    35.467
France    63.951
Name: G7 Population in millions, dtype: float64
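
The difference between the two slicing styles can be sketched side by side. The Series `pop` and the variable names below are illustrative; `loc` and `iloc` are used to make the intent explicit:

```python
import pandas as pd

pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665],
    index=['Canada', 'France', 'Germany', 'Italy'],
)

# Label slice: both endpoints are included
label_slice = pop.loc['Canada':'Germany']

# Position slice: the stop position is excluded
pos_slice = pop.iloc[0:2]

print(len(label_slice))  # 3
print(len(pos_slice))    # 2
```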

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Conditional selection (boolean arrays)

The same boolean array techniques we saw applied to numpy arrays can be used for Pandas `Series`:

In [None]:
g7_pop

In [None]:
g7_pop > 70

In [None]:
g7_pop[g7_pop > 70]

In [None]:
g7_pop.mean()

In [10]:
g7_pop[g7_pop > g7_pop.mean()]

Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
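
It can help to look at the intermediate boolean mask on its own. In this sketch (a three-country Series named `pop`, chosen for illustration), the comparison produces a boolean Series with the same index, which then selects the matching rows:

```python
import pandas as pd

pop = pd.Series({'Canada': 35.467, 'Germany': 80.94, 'Japan': 127.061})

# Comparing against a scalar yields a boolean Series aligned on the index
mask = pop > pop.mean()

# Indexing with the mask keeps only the entries where it is True
above = pop[mask]

print(mask)
print(above)
```

Here the mean is roughly 81.16, so only Japan passes the filter.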

In [None]:
g7_pop.std()

In [None]:
# Countries more than half a standard deviation away from the mean (below OR above)
g7_pop[(g7_pop < g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Operations and methods
Series also support vectorized operations and aggregation functions, just like NumPy arrays:

In [None]:
g7_pop * 1_000_000

In [None]:
g7_pop.mean()

In [None]:
np.log(g7_pop)

In [None]:
g7_pop['France': 'Italy'].mean()
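
One detail worth seeing in isolation: vectorized operations and NumPy functions return a new Series that keeps the original index. A minimal sketch with made-up values (powers of ten, so the logarithms are easy to verify):

```python
import numpy as np
import pandas as pd

pop = pd.Series([1.0, 10.0, 100.0], index=['a', 'b', 'c'])

scaled = pop * 1_000_000   # element-wise multiplication
logs = np.log10(pop)       # NumPy ufunc applied element-wise, returns a Series
total = pop.sum()          # aggregation collapses to a single number

print(scaled)
print(logs)
print(total)
```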

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Boolean arrays
(They work the same way as in NumPy)

In [None]:
g7_pop

In [None]:
g7_pop > 80

In [None]:
g7_pop[g7_pop > 80]

In [None]:
g7_pop[(g7_pop > 80) | (g7_pop < 40)]

In [None]:
g7_pop[(g7_pop > 80) & (g7_pop < 200)]
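
Conditions are combined with `&` (and), `|` (or) and `~` (not), and each comparison needs its own parentheses because these operators bind more tightly than `>` or `<`. The sketch below uses a four-country Series named `pop` purely for illustration:

```python
import pandas as pd

pop = pd.Series({
    'Canada': 35.467,
    'Germany': 80.94,
    'Japan': 127.061,
    'United States': 318.523,
})

in_range = (pop > 80) & (pop < 200)   # both conditions must hold

middle = pop[in_range]      # entries inside the range
extremes = pop[~in_range]   # ~ negates the mask: everything else

print(middle)
print(extremes)
```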

![green-divider](https://user-images.githubusercontent.com/7065401/52071924-c003ad80-2562-11e9-8297-1c6595f8a7ff.png)

## Modifying series


In [None]:
g7_pop['Canada'] = 40.5

In [None]:
g7_pop

In [None]:
g7_pop.iloc[-1] = 500

In [None]:
g7_pop

In [None]:
g7_pop[g7_pop < 70] = 99.99

In [None]:
g7_pop
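
All three assignments above change `g7_pop` in place. If the original values are still needed, one option is to work on a copy. A minimal sketch, with `original` and `updated` as hypothetical names:

```python
import pandas as pd

original = pd.Series({'Canada': 35.467, 'France': 63.951, 'Japan': 127.061})

# copy() gives an independent Series, so edits do not touch `original`
updated = original.copy()

updated['Canada'] = 40.5        # assign by label
updated.iloc[-1] = 500          # assign by position
updated[updated < 70] = 99.99   # assign through a boolean mask

print(updated)
print(original)  # unchanged
```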

![purple-divider](https://user-images.githubusercontent.com/7065401/52071927-c1cd7100-2562-11e9-908a-dde91ba14e59.png)
