In [3]:
import pandas as pd

# Pandas Series Object

A Pandas Series is **a one-dimensional array** of indexed data, based on the NumPy ndarray.

Unlike a plain ndarray, a Pandas Series object wraps both <span class="note">a sequence of values</span> and <span class="note">a sequence of indices</span>.
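
As a small illustration of these two sequences, the sketch below builds a Series from a NumPy array and reads the values and the index back. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30])
s = pd.Series(arr, index=["a", "b", "c"])

values = s.to_numpy()   # the data, as a NumPy ndarray
labels = list(s.index)  # the index labels, one per value

print(type(values).__name__)
print(values)
print(labels)
```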

<img src="./images/series_object.png" style="height:200px" alt="series_object">


References:

1. Series @ pandas user guide: https://pandas.pydata.org/docs/user_guide/dsintro.html#basics-series
2. Series @ pandas API: https://pandas.pydata.org/docs/reference/api/pandas.Series.html


## Create Series Object

### Create Series with Implicit Indexing

In [4]:
s1 = pd.Series([1,2,3,4,5])
s1

0    1
1    2
2    3
3    4
4    5
dtype: int64

### Create Series with Explicit Indexing

An explicit index gives the Series object capabilities that NumPy arrays lack: the index does not have to consist of integers, but can hold labels of any hashable type.
The index must have the same length as the data.

In [5]:
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s1

a    1
b    2
c    3
d    4
dtype: int64
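
The two rules stated above (labels of any type, and matching lengths) can be checked with a short sketch. The float labels and the variable `length_error` are chosen only for illustration.

```python
import pandas as pd

# Labels do not have to be integers or strings; floats work as well
s = pd.Series([10, 20, 30], index=[0.5, 1.5, 2.5])
print(s.loc[1.5])

# Passing fewer labels than values raises a ValueError
length_error = None
try:
    pd.Series([1, 2, 3], index=["a", "b"])
except ValueError as exc:
    length_error = exc
    print("ValueError:", exc)
```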

### Create Series from dictionary

If we pass a dictionary to the Series constructor, the dictionary keys are used as the index.

In [9]:
data_dict = { 
    'a': 1, 
    'b': 2,
    'c': 3
}

pd.Series(data_dict, name='COL')

a    1
b    2
c    3
Name: COL, dtype: int64

If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
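
A minimal sketch of that behaviour, assuming a dictionary like the one above: only the requested labels are kept, in the requested order, and a label with no matching key gets NaN (which also upcasts the values to float).

```python
import pandas as pd

data_dict = {"a": 1, "b": 2, "c": 3}

# "c" and "a" are pulled out of the dictionary in the given order;
# "z" is not a key of data_dict, so its value is NaN
s = pd.Series(data_dict, index=["c", "a", "z"])
print(s)
```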

In [19]:
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='numbers1')
s2 = pd.Series([4, 5, 6], index=['a', 'e', 'f'], name='numbers2')

s3 = s1 + s2

# s1.size
pd.notnull(s3)


a     True
b    False
c    False
e    False
f    False
dtype: bool

## Series Attributes

A Series object exposes the following attributes:

In [23]:
# Create a sample Series
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'], name='numbers')
print(s)
# Attributes of the Series
print("Index:", s.index)  # The index (axis labels) of the Series
print("Values:", s.values)  # The values of the Series
print("Data Type:", s.dtype)  # The data type of the values in the Series
print("Name:", s.name)  # The name of the Series
print("Shape:", s.shape)  # The shape of the Series (number of elements)
print("Number of Bytes:", s.nbytes)  # The number of bytes consumed by the Series
print("Number of Dimensions:", s.ndim)  # The number of dimensions of the Series (always 1 for a Series)
print("Size:", s.size)  # The number of elements in the Series
print("Is Empty:", s.empty)  # Returns True if the Series is empty

a    1
b    2
c    3
Name: numbers, dtype: int64
Index: Index(['a', 'b', 'c'], dtype='object')
Values: [1 2 3]
Data Type: int64
Name: numbers
Shape: (3,)
Number of Bytes: 24
Number of Dimensions: 1
Size: 3
Is Empty: False


In [27]:
# index labels can be reassigned at any time (the new list must have the same length):
s.index = ['x','y','z']
s

x    1
y    2
z    3
Name: numbers, dtype: int64

## Access Elements (Series Indexing)

Consider the following Series:

In [23]:
prices = pd.Series(
    [1.5, 2, 2.5, 3],
    index=["apples", "oranges", "bananas", "strawberries"]
)
prices

apples          1.5
oranges         2.0
bananas         2.5
strawberries    3.0
dtype: float64

### Access elements by position (implicit index): series.iloc[index]

We can access an element by its position (implicit index) using the **series.iloc[*index*]** indexer.
*index* must be an integer that specifies a position, as in Python lists; negative values count from the end.

In [21]:
print( prices.iloc[0] )
print( prices.iloc[-1] )

1.5
3.0


### Access elements by index labels (explicit index): series.loc[index]

We can access an element by its label (explicit index) using the **series.loc[*index*]** indexer.


In [31]:
prices.loc[['oranges', 'apples']]

oranges    2.0
apples     1.5
dtype: float64

In [29]:
prices.iloc[[0,1]]

apples     1.5
oranges    2.0
dtype: float64
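
The difference between the two indexers is easiest to see when the labels are themselves integers. The sketch below uses a hypothetical Series with labels 10, 20, 30: `.loc` interprets the number as a label, `.iloc` as a position.

```python
import pandas as pd

s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

by_label = s.loc[10]      # the element labelled 10
by_position = s.iloc[0]   # the element at position 0 (the same one here)
last = s.iloc[-1]         # the last element by position

# 0 is a valid position but not a label, so .loc raises KeyError
missing_label = False
try:
    s.loc[0]
except KeyError:
    missing_label = True

print(by_label, by_position, last, missing_label)
```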

As a shorthand, we can use:
- square bracket notation (as in Python dictionaries)
- dot notation (as in Python objects), which works only when labels are valid identifiers and do not clash with existing Series attributes or methods

In [28]:
# using square brackets notation:
print(prices['oranges'])

# using dot notation (works only when labels are valid identifiers)
print(prices.oranges)

2.0
2.0
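
A short sketch of where dot notation falls short, using a hypothetical `stock` Series: a label containing a space cannot be written after a dot, and a label that matches a Series method name (here `min`) resolves to the method rather than to the element. Square brackets work in both cases.

```python
import pandas as pd

stock = pd.Series([7, 5], index=["min", "green apples"])

print(stock["min"])           # element labelled "min"
print(stock["green apples"])  # label with a space: brackets only
print(callable(stock.min))    # stock.min is the Series.min method, not the element
print(stock.min())            # smallest value in the Series
```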


### List indexes

Both .loc[] and .iloc[] can be used with a list of labels or positions to select multiple elements.

In [32]:
prices.iloc[ [0, 1, 0] ]

apples     1.5
oranges    2.0
apples     1.5
dtype: float64

In [6]:
prices.loc[ ["apples", "oranges", "apples"] ]

apples     1.5
oranges    2.0
apples     1.5
dtype: float64

### Boolean Indexing:

Both .loc[] and .iloc[] can be used with a boolean array (see 'filtering by value (masking)' below)

In [7]:
mask = [False, False, True, True]
prices[mask]

bananas         2.5
strawberries    3.0
dtype: float64

In [32]:
prices

apples          1.5
oranges         2.0
bananas         2.5
strawberries    3.0
dtype: float64

In [34]:
prices[[True, False, False, False]]

apples    1.5
dtype: float64

In [9]:
# get elements whose values are > 2
prices[prices>2]

bananas         2.5
strawberries    3.0
dtype: float64
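
Masks can also be combined. The sketch below, which rebuilds the `prices` Series so that it runs on its own, uses `&` (and) and `~` (not) with `.loc`, and passes a plain boolean NumPy array to `.iloc`. The parentheses around each comparison are required because `&` binds more tightly than `>` and `<`.

```python
import pandas as pd

prices = pd.Series(
    [1.5, 2, 2.5, 3],
    index=["apples", "oranges", "bananas", "strawberries"],
)

# elements strictly between 1.5 and 3
mid_range = prices.loc[(prices > 1.5) & (prices < 3)]

# elements that are NOT below 2
not_cheap = prices.loc[~(prices < 2)]

# .iloc with a boolean NumPy array built from the same kind of comparison
expensive = prices.iloc[(prices > 2).to_numpy()]

print(mid_range)
print(not_cheap)
print(expensive)
```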

### Series Slicing

We can pass slicing operators to .loc[] and .iloc[].

s.iloc[start:end:step] works exactly as in Python or NumPy slicing: the end position is excluded.

Note that when slicing with labels, as in s.loc[start:end:step], **both the start and the end are included**.

In [15]:
# show the series
prices

apples          1.5
oranges         2.0
bananas         2.5
strawberries    3.0
dtype: float64

In [10]:
# positional slicing excludes the end position
prices.iloc[0:2]

apples     1.5
oranges    2.0
dtype: float64

In [36]:
prices.iloc[:2:]

apples     1.5
oranges    2.0
dtype: float64

In [12]:
# label slicing is inclusive
prices['apples':'bananas']

apples     1.5
oranges    2.0
bananas    2.5
dtype: float64

In [13]:
# get all elements from the third one to the end
prices.iloc[2:]

bananas         2.5
strawberries    3.0
dtype: float64

In [17]:
# get all elements from 'bananas' to the end
prices.loc['bananas':]

bananas         2.5
strawberries    3.0
dtype: float64

In [18]:
# get last 3 elements
prices.iloc[-3:]

oranges         2.0
bananas         2.5
strawberries    3.0
dtype: float64

In [19]:
# slice with step 2:
prices[::2]

apples     1.5
bananas    2.5
dtype: float64

## Series Operations

### Arithmetic operations are element-wise

In [37]:
prices + 2

apples          3.5
oranges         4.0
bananas         4.5
strawberries    5.0
dtype: float64

### Comparison operations are element-wise

In [34]:
prices>2

apples          False
oranges         False
bananas          True
strawberries     True
dtype: bool

### Aligned operations

When performing element-wise operations on two Series, Pandas aligns them by index label and applies the operation only where a label is present in both. The result contains the union of the two indexes; a label present in one Series but not in the other gets NaN.

In [38]:
# Create sample Series
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s3 = pd.Series([7, 8, 9], index=['a', 'x', 'y'])

print(s1)
print(s2)
print(s3)

a    1
b    2
c    3
dtype: int64
a    4
b    5
c    6
dtype: int64
a    7
x    8
y    9
dtype: int64


In [39]:
s1+s2

a    5
b    7
c    9
dtype: int64

In [41]:
s4 = s1+s3

In [42]:
pd.isna(s4) 

a    False
b     True
c     True
x     True
y     True
dtype: bool

### Dictionary-like operations on Series

In [47]:
# `in` tests membership among the index labels, as with dictionary keys
"apples" in prices

True

In [49]:
10 in prices.values

False

In [19]:
# 3 is a value of s, not one of its index labels, so the test is False
3 in s

False
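
The dictionary analogy can be pushed a little further. The sketch below rebuilds `prices` so that it runs on its own and shows membership on labels versus values, `keys()`, and `get()` with a default; the label `"kiwi"` is a made-up example of a missing key.

```python
import pandas as pd

prices = pd.Series(
    [1.5, 2, 2.5, 3],
    index=["apples", "oranges", "bananas", "strawberries"],
)

label_found = "apples" in prices               # looks at the index labels
value_as_label = 1.5 in prices                 # 1.5 is a value, not a label
value_found = bool(1.5 in prices.values)       # search the values explicitly

labels = list(prices.keys())                   # keys() returns the index
kiwi_price = prices.get("kiwi", 0.0)           # default instead of KeyError

print(label_found, value_as_label, value_found)
print(labels)
print(kiwi_price)
```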

### Missing Data

In [20]:
s1 = pd.Series([1,3], index=["a","c"], dtype="int32")
s2 = pd.Series([2,3], index=["b","c"], dtype="int32")

In [21]:
s1+s2

a    NaN
b    NaN
c    6.0
dtype: float64
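
The result above is float64 even though both inputs are int32, because NaN cannot be stored in a NumPy integer array. As a sketch of two common ways to deal with the missing entries, the example below uses `Series.add` with `fill_value` to treat an absent operand as 0, and `dropna()` to keep only the labels present in both Series.

```python
import pandas as pd

s1 = pd.Series([1, 3], index=["a", "c"], dtype="int32")
s2 = pd.Series([2, 3], index=["b", "c"], dtype="int32")

total = s1 + s2                     # NaN where a label is missing on one side
filled = s1.add(s2, fill_value=0)   # a missing operand is treated as 0
only_common = total.dropna()        # drop the labels that produced NaN

print(filled)
print(only_common)
```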