# 3.3 Series
`Series` is one of the two data structures that the `pandas` library provides. The other is the `DataFrame`.

A `Series` is a one-dimensional labeled array that can hold any data type. A `Series` object is created with `pandas.Series(data)`, where `data` is an array-like object (a Python `list` or a NumPy `ndarray`) or a Python dictionary (`dict`).

In [1]:
import pandas as pd

arr = pd.Series([1,2,3])
print(arr)

0    1
1    2
2    3
dtype: int64


From the printed output, the left column is the label of each row (a.k.a. index of the row) and the right column is the data. We can also see that the data type for this `Series` is `int64`.
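The example above builds the `Series` from a Python `list`. As a minimal sketch of the `ndarray` case mentioned earlier, the snippet below creates a `Series` from a NumPy array and inspects its index and data type. The variable names `values` and `ser` are illustrative only.

```python
import numpy as np
import pandas as pd

# Build a Series from a NumPy array instead of a Python list.
values = np.array([1.5, 2.5, 3.5])
ser = pd.Series(values)

print(ser.index)       # the default index counts up from 0
print(ser.dtype)       # the dtype is taken from the array
print(ser.to_numpy())  # the underlying data as an ndarray
```

Because the array holds floating-point numbers, the resulting `Series` has a floating-point data type rather than `int64`.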

By default, a `Series` uses a numerical index that starts from 0. We can change the index by specifying the argument `index=`. The value of `index` should be an array-like object with the same length as the data.

In [2]:
import pandas as pd

arr = pd.Series([1,2,3], index=['a','b','c'])
print(arr)

a    1
b    2
c    3
dtype: int64


`arr` now has `['a','b','c']` as its index instead of `[0,1,2]`.
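The requirement that `index` matches the length of the data can be checked with a short sketch. Here a three-item list is paired with a two-item index, which pandas rejects with a `ValueError`. The variable `error_message` is only an illustrative name.

```python
import pandas as pd

# Three data items but only two labels: the lengths do not match.
try:
    pd.Series([1, 2, 3], index=['a', 'b'])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```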

We can get the index of the `Series` with `Series.index`. 

In [3]:
import pandas as pd

arr = pd.Series([1,2,3], index=['a','b','c'])
print(arr.index)

Index(['a', 'b', 'c'], dtype='object')


If we pass a Python dictionary (`dict`) to create a `Series`, the keys of the `dict` become the index of the data.

In [4]:
import pandas as pd

arr = pd.Series({'b':1, 'a':2, 'c':3})
print(arr)

b    1
a    2
c    3
dtype: int64


Note that the created `Series` keeps the insertion order of the `dict`; the keys are not sorted. If we also pass the argument `index=`, the `Series` follows the order given in the index.

In [5]:
import pandas as pd

arr = pd.Series({'b':1, 'a':2, 'c':3, 'e':4}, index=['a','b','c','d'])
print(arr)

a    2.0
b    1.0
c    3.0
d    NaN
dtype: float64


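In the created `Series`, only the items whose keys appear in the index are kept. If a label in the index does not exist as a key in the dictionary, its value is populated with `NaN`, which stands for **not-a-number**. If a key in the dictionary does not appear in the index (`'e'` in this example), it is not included in the `Series`. Note also that the data type has changed from `int64` to `float64`, because `NaN` is a floating-point value.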
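As a rough sketch of how the missing entry could be handled, the snippet below detects the `NaN` with `isna()` and drops it with `dropna()`. Converting back to `int64` afterwards is an optional step shown here as an assumption about what a reader might want, not something the example above requires.

```python
import pandas as pd

data = {'b': 1, 'a': 2, 'c': 3, 'e': 4}
arr = pd.Series(data, index=['a', 'b', 'c', 'd'])

# True where the value is missing (only label 'd' here).
print(arr.isna())

# Drop the missing entry, then restore an integer data type.
cleaned = arr.dropna().astype('int64')
print(cleaned)
```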

We can pass a scalar value, i.e. a single value, to create a `Series` with multiple index labels.

In [6]:
import pandas as pd

arr = pd.Series(1, index=['a','b','c'])
print(arr)

a    1
b    1
c    1
dtype: int64


The same value will be used for all the indices created. 



## 3.3.1 Indexing and slicing Series
**Access single item**

The data in a `Series` can be accessed with their respective label (`index`).

In [7]:
import pandas as pd

arr = pd.Series([1,2,3], index=['a','b','c'])
print(arr['b'])

2


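If the index is non-numerical, an integer inside the square brackets has traditionally been interpreted as a position, i.e. 0 for the first item, 1 for the second item, and so forth. Newer pandas releases deprecate this positional fallback, so the behaviour of the following cell depends on the installed version; `.iloc` (Section 3.3.1.2) is the explicit way to index by position.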

In [None]:
import pandas as pd

arr = pd.Series([1,2,3], index=['a','b','c'])
print(arr[0])
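The cell above was not executed, so no output is shown. A version-independent sketch of the same positional access uses `.iloc`, which is covered in Section 3.3.1.2. The variable name `first` is illustrative only.

```python
import pandas as pd

arr = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# .iloc always interprets the integer as a position, never as a label.
first = arr.iloc[0]
print(first)
```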

If the `Series` is indexed with numerical values, numbers will be treated as the index instead of the position.



In [15]:
import pandas as pd

arr = pd.Series([1,2,3], index=[3,0,1])
print(arr)
print(arr[0])

3    1
0    2
1    3
dtype: int64
2


### 3.3.1.1 Slice a Series 
A `Series` can be sliced by position using a range of integers in the `start:stop` form.

In [16]:
import pandas as pd

arr = pd.Series([1,2,3], index=['a','b','c'])
print(arr[1:])

b    2
c    3
dtype: int64


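To access multiple items that are not adjacent, we can pass a list of indices. A list of labels selects by label. A list of integers on a non-numerical index has traditionally selected by position, but newer pandas releases deprecate this fallback, so `.iloc` (Section 3.3.1.2) is the safer choice for positions.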


In [None]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr[[0,2,4]])
print(arr[['a','c','e']])
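The cell above was not executed, so no output is shown. As a minimal sketch that avoids any ambiguity between positions and labels, the two selections can be routed through `.iloc` and `.loc` (both covered in Section 3.3.1.2). The names `by_position` and `by_label` are illustrative only.

```python
import pandas as pd

arr = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# Select the 1st, 3rd and 5th items by position.
by_position = arr.iloc[[0, 2, 4]]
print(by_position)

# Select the same items by label.
by_label = arr.loc[['a', 'c', 'e']]
print(by_label)
```

Both selections return the items labeled `a`, `c` and `e`.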

Boolean indexing also works with a `Series`.

In [18]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr[arr > 3])
print(arr[arr % 2 == 1]) # odd numbers

d    4
e    5
dtype: int64
a    1
c    3
e    5
dtype: int64


### 3.3.1.2 Indexing with .loc and .iloc
pandas also provides two indexers for `Series` to perform indexing and slicing. In short, `.loc` is label-based indexing and `.iloc` is integer-position-based indexing. Note that the inputs for `.loc` and `.iloc` are specified within square brackets `[]` instead of parentheses `()`, because they are indexers rather than methods.

A label is an entry of the index that was specified when the `Series` was created. Similar to basic indexing and slicing with square brackets `[]`, `.loc` accepts the following inputs:

- a single label
- a list of labels
- a slice object with labels

**.loc with a single label**

When a single label is passed to `.loc`, it returns the value stored under that label, provided the label exists in the `Series`; otherwise a `KeyError` is raised. For a single value, this is equivalent to `.at`.

In [19]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.loc['a'])
print(arr.at['a'])

arr2 = pd.Series(['a','b','c','d','e'], index=[1,3,5,7,9])
print(arr2.loc[1])
print(arr2.at[1])

1
1
a
a


**.loc with a list of labels**

Multiple values can be accessed when we specify multiple labels in a list to `.loc`.

In [20]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.loc[['a','c','e']])

a    1
c    3
e    5
dtype: int64


**.loc with a slice object with labels**

A slice object refers to using a colon `:` to specify a range of values.

In [21]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.loc['a':'d'])

a    1
b    2
c    3
d    4
dtype: int64


Note that when slicing a Python list with the syntax `arr[start:stop:step]`, the result does not include the item at `stop`. With `.loc` on a `Series`, however, the item at the label `stop` is included. The following code snippet illustrates this difference in behaviour.

In [22]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.loc['a':'d'])

arr2 = [1,2,3,4,5]
print(arr2[0:3])

a    1
b    2
c    3
d    4
dtype: int64
[1, 2, 3]


Similarly the allowed inputs for `.iloc` include
- a single integer,
- a list of integers, and
- a slice object with integers.

Only integers are allowed for `.iloc` as they will be interpreted as the position in the `Series`.

**.iloc with a single integer**

The integer passed to `.iloc` will always be treated as the position even if the indices of the `Series` are integers. This is equivalent to `.iat`.

In [23]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.iloc[0])
print(arr.iat[0])

arr2 = pd.Series(['a','b','c','d','e'], index=[1,3,5,7,9])
print(arr2.iloc[1])
print(arr2.iat[1])

1
1
b
b


**.iloc with a list of integers**

Multiple values can be accessed with a list of integers as the input for `.iloc`. 

In [24]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.iloc[[0,1,3]])

a    1
b    2
d    4
dtype: int64


**.iloc with a slice object with integers**

We can use a slice object with integers to access a range of values in a `Series`. Note that `.iloc` behaves like the Python list syntax `start:stop:step`, where the item at `stop` is not included, unlike `.loc`.



In [25]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.iloc[1:3])

arr2 = [1,2,3,4,5]
print(arr2[1:3])

b    2
c    3
dtype: int64
[2, 3]


**How about boolean indexing?**

Boolean indexing is available for both `.loc` and `.iloc`. When the mask is a plain list of boolean values, both give the same result.

In [26]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
boo = [True, True, False, False, False]
print(arr.loc[boo])
print(arr.iloc[boo])

a    1
b    2
dtype: int64
a    1
b    2
dtype: int64


What if we want to select the values that are less than 3? Can we use `arr.loc[arr<3]` and `arr.iloc[arr<3]`? In short: `arr.loc[arr<3]` returns the expected result, whereas `arr.iloc[arr<3]` raises a `ValueError` exception.

To understand this behaviour, we need to first understand the output of `arr<3`.

In [27]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr<3)

a     True
b     True
c    False
d    False
e    False
dtype: bool


As seen from the previous code snippet, `arr<3` creates a `Series` with the same labels as the original `Series`. The boolean values are therefore mapped to specific labels. In this case, `a` and `b` are `True` and the others are `False`. Therefore `arr.loc[arr<3]` produces output identical to `arr.loc[['a','b']]`.

In [28]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(arr.loc[arr<3])

a    1
b    2
dtype: int64


If we use `arr.iloc[arr<3]`, the mask is a `Series` that carries its own labels (`a` to `e`), whereas `.iloc` works purely by position and does not align on labels, so it rejects the mask. A workaround is to convert the `Series` produced by `arr<3` to a list before passing it to `.iloc`. The list holds only the boolean values, which are then applied by position instead of by label.

In [29]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
lessthan3 = list(arr<3)
print(lessthan3)
print(arr.iloc[lessthan3])

[True, True, False, False, False]
a    1
b    2
dtype: int64


What if the labels of the `Series` are integers? To avoid confusion over whether the integers in the mask's index refer to labels or positions, `.iloc` rejects a boolean `Series` mask in this case as well. The following code snippet shows the exception that is raised.

In [30]:
import pandas as pd

arr = pd.Series([1,2,3,4,5], index=[5,4,3,2,1])
lessthan3 = arr < 3
print(lessthan3)

print("\n.loc")
print(arr.loc[lessthan3])

print("\n.iloc")
print(arr.iloc[lessthan3])

5     True
4     True
3    False
2    False
1    False
dtype: bool

.loc
5    1
4    2
dtype: int64

.iloc


NotImplementedError: iLocation based boolean indexing on an integer type is not available
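The same workaround applies when the labels are integers: strip the labels from the mask before handing it to `.iloc`. The sketch below uses `to_numpy()` for this, which is one possible alternative to `list(...)`. The variable name `mask` is illustrative only.

```python
import pandas as pd

arr = pd.Series([1, 2, 3, 4, 5], index=[5, 4, 3, 2, 1])

# Convert the boolean Series to a plain NumPy array so that it
# carries no labels and is applied purely by position.
mask = (arr < 3).to_numpy()
selected = arr.iloc[mask]
print(selected)
```

The result keeps the original integer labels (`5` and `4`) of the selected items.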