### Pandas

#### Creating Series

Let's import the Pandas library

In [1]:
import pandas as pd

Let's create an empty Pandas Series

In [None]:
empty_series = pd.Series()
empty_series

In [None]:
type(empty_series)

In [None]:
empty_series.dtype

In [None]:
empty_series.ndim

In [None]:
empty_series.shape

In [None]:
empty_series.size

In [None]:
empty_series.index # Get index of Series as a RangeIndex object

In [None]:
empty_series.values # Get values stored in Series as NumPy array

In [13]:
# The default name of a Series is None; the '0' shown at the top of the rendered Series is only a display artifact and cannot be accessed
empty_series.name

In [None]:
type(empty_series.name)

Let's create a Series with elements

In [None]:
my_series = pd.Series(data = [1, 2, 3, 4])
my_series

In [None]:
type(my_series)

In [None]:
my_series.ndim

In [None]:
my_series.shape

In [None]:
my_series.size

In [None]:
my_series.index

In [None]:
my_series.values

In [None]:
mixed_series = pd.Series([1, True, 'yes', 3.14, [42, 0]])
mixed_series

In [None]:
mixed_series.dtype
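
As a small sketch of why the mixed Series above reports `object`: pandas picks the narrowest dtype that can hold every element, and falls back to `object` when the elements are unrelated Python types. The variable names `ints`, `floats`, and `mixed` are illustrative and not part of the original notebook.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])                          # all integers
floats = pd.Series([1, 2.5, 3])                      # one float upcasts every value to float
mixed = pd.Series([1, True, 'yes', 3.14, [42, 0]])   # unrelated types fall back to object

print(ints.dtype, floats.dtype, mixed.dtype)

# In an object Series each element keeps its original Python type
print([type(value).__name__ for value in mixed])
```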

In [None]:
named_series = pd.Series(data = ['Ashok', 'Kumar', 18, 183], index = ['Forename', 'Surname', 'Age', 'Height'], name = 'Student Info')
named_series

In [None]:
named_series['Forename']

In [None]:
named_series.index

In [None]:
named_series.values

In [None]:
named_series.name

In [32]:
# pd.Series(data = [1, 2, 3], index = ['A', 'B', 'C', 'D']) # raises ValueError: the length of the data and the index must match
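
A minimal sketch of what the commented-out line does when it is actually run. The `try`/`except` wrapper and the `length_mismatch_raised` flag are illustrative additions so the cell does not stop the notebook.

```python
import pandas as pd

try:
    # Three values but four labels: pandas cannot align them
    pd.Series(data=[1, 2, 3], index=['A', 'B', 'C', 'D'])
    length_mismatch_raised = False
except ValueError as err:
    length_mismatch_raised = True
    print(err)  # the message reports both lengths

# With matching lengths the same call succeeds
ok_series = pd.Series(data=[1, 2, 3], index=['A', 'B', 'C'])
```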

We can create Series from other data structures

In [None]:
my_list = [1, 2, 3, 4]
pd.Series(my_list)

In [None]:
my_tuple = (1, 2, 3, 4)
pd.Series(my_tuple)

In [35]:
# my_set = {1, 2, 3, 4}
# pd.Series(my_set) # raises TypeError: sets are unordered, so they cannot be used to make a Series
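
One possible workaround, sketched under the assumption that any deterministic order is acceptable: convert the set to an ordered container (here with `sorted`) before building the Series. The names `set_rejected` and `from_set` are illustrative.

```python
import pandas as pd

my_set = {3, 1, 4, 2}

try:
    pd.Series(my_set)  # a set has no defined element order
    set_rejected = False
except TypeError as err:
    set_rejected = True
    print(err)

# sorted() returns a list, which has a well-defined order
from_set = pd.Series(sorted(my_set))
print(from_set)
```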

In [None]:
my_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}
pd.Series(my_dict)
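
When a dictionary is used, its keys become the index labels. As a small sketch, passing an explicit `index` alongside a dictionary selects and reorders entries by key, and labels with no matching key are filled with `NaN`. The label `'Z'` and the name `picked` are made up for illustration.

```python
import pandas as pd

my_dict = {'A': 1, 'B': 2, 'C': 3, 'D': 4}

# 'B' and 'A' are looked up in the dict; 'Z' has no matching key
picked = pd.Series(my_dict, index=['B', 'A', 'Z'])
print(picked)

# The missing value forces the Series to a float dtype
print(picked.dtype)
```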

Pandas Series are strictly one-dimensional

In [None]:
pd.Series([[1, 2], [3, 4], [5, 6]]) # a nested list still gives a 1-D Series; each element is a list object

In [43]:
import numpy as np
# pd.Series(np.array([[1, 2], [3, 4], [5, 6]])) # raises ValueError: a 2-D NumPy array is not accepted
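
A short sketch contrasting the two cases above, plus one way to get a Series out of 2-D data by flattening it first. The names `matrix`, `flat`, and `nested` are illustrative; whether flattening is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

matrix = np.array([[1, 2], [3, 4], [5, 6]])

try:
    pd.Series(matrix)  # 2-D input is rejected
    two_dim_rejected = False
except ValueError as err:
    two_dim_rejected = True
    print(err)

flat = pd.Series(matrix.ravel())     # flatten to 1-D: six integer elements
nested = pd.Series(matrix.tolist())  # three elements, each a Python list

print(flat.shape, nested.shape, nested.dtype)
```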

#### Indexing and Slicing Series

You can index and slice a Pandas Series in two ways: by using the position with `.iloc[]`, or by using the label with `.loc[]`.

In [44]:
import pandas as pd

In [47]:
my_series = pd.Series(data = [32, 33, 30, 29, 36], index = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri'], name = 'Peak Temperature')
my_series

Mon    32
Tue    33
Wed    30
Thu    29
Fri    36
Name: Peak Temperature, dtype: int64

We can index and slice Series by position with `.iloc[]`, the integer-location-based (position-based) indexer

In [48]:
my_series.iloc[0]

np.int64(32)

In [49]:
my_series.iloc[4]

np.int64(36)

In [50]:
my_series.iloc[-1]

np.int64(36)

In [52]:
my_series.iloc[1:3] # .iloc[] excludes the stop position, just like regular Python slicing

Tue    33
Wed    30
Name: Peak Temperature, dtype: int64

In [53]:
my_series.iloc[::-1]

Fri    36
Thu    29
Wed    30
Tue    33
Mon    32
Name: Peak Temperature, dtype: int64

We can also index and slice Series using `.loc[]`, the label-based indexer, or just by using regular subscripting

In [56]:
my_series['Tue']

np.int64(33)

In [57]:
my_series.loc['Tue']

np.int64(33)

In [58]:
my_series.loc['Tue':]

Tue    33
Wed    30
Thu    29
Fri    36
Name: Peak Temperature, dtype: int64

In [60]:
my_series.loc['Tue':'Thu'] # .loc[] is inclusive of the last element, unlike .iloc[]

Tue    33
Wed    30
Thu    29
Name: Peak Temperature, dtype: int64

In [61]:
my_series.loc['Tue':'Thu':2]

Tue    33
Thu    29
Name: Peak Temperature, dtype: int64
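
The difference in end-point handling can be checked side by side. This sketch rebuilds the same Series under the illustrative name `temps` so that it runs on its own.

```python
import pandas as pd

temps = pd.Series(
    data=[32, 33, 30, 29, 36],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='Peak Temperature',
)

by_position = temps.iloc[1:3]      # positions 1 and 2; stop position 3 is excluded
by_label = temps.loc['Tue':'Thu']  # labels 'Tue' through 'Thu'; the stop label is included

print(list(by_position.index))
print(list(by_label.index))
```

The label-based slice is inclusive because there is no general way to name "the label after `'Thu'`" without knowing the index contents.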

If we sort the Series, observe how `.iloc[]` reflects the new positions, whereas `.loc[]` still returns the value attached to the same label

In [62]:
my_series

Mon    32
Tue    33
Wed    30
Thu    29
Fri    36
Name: Peak Temperature, dtype: int64

In [65]:
sorted_series = my_series.sort_values() # returns a new Series sorted by value in ascending order; my_series itself is unchanged
sorted_series

Thu    29
Wed    30
Mon    32
Tue    33
Fri    36
Name: Peak Temperature, dtype: int64

In [66]:
my_series.iloc[1]

np.int64(33)

In [67]:
sorted_series.iloc[1]

np.int64(30)

In [None]:
my_series.loc # the indexer object itself; subscript it with a label, e.g. my_series.loc['Tue']

In [68]:
my_series.loc['Tue']

np.int64(33)

In [69]:
sorted_series.loc['Tue']

np.int64(33)
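
A related point, sketched here as an illustration: label slices on the sorted Series follow the order in which the labels now appear in the index, not the calendar order. The names `temps` and `sorted_temps` are illustrative stand-ins for `my_series` and `sorted_series`.

```python
import pandas as pd

temps = pd.Series(
    data=[32, 33, 30, 29, 36],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='Peak Temperature',
)
sorted_temps = temps.sort_values()  # index order is now Thu, Wed, Mon, Tue, Fri

# Single-label lookups are unaffected by the sort
print(temps.loc['Tue'], sorted_temps.loc['Tue'])

# A label slice walks the index as it is currently ordered
wed_to_tue = sorted_temps.loc['Wed':'Tue']
print(list(wed_to_tue.index))
```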

We can index Series with conditions

In [71]:
mask = my_series >= 30 # Creating a 'boolean mask' which we can use to index the series
mask

Mon     True
Tue     True
Wed     True
Thu    False
Fri     True
Name: Peak Temperature, dtype: bool

In [72]:
my_series.loc[mask]

Mon    32
Tue    33
Wed    30
Fri    36
Name: Peak Temperature, dtype: int64

In [73]:
my_series.loc[my_series.isin([28, 29, 30])] # check if value belongs to the list

Wed    30
Thu    29
Name: Peak Temperature, dtype: int64

In [75]:
my_series.loc[~my_series.isin([28, 29, 30])] # negation of the above; keep values that do not belong to the list

Mon    32
Tue    33
Fri    36
Name: Peak Temperature, dtype: int64

In [77]:
# my_series.loc[(my_series >= 30) and (my_series < 35)] # raises ValueError: Python's and/or/not do not work element-wise; when combining conditions in Pandas, use & for and, | for or, ~ for not

In [78]:
my_series.loc[(my_series >= 30) & (my_series < 35)]

Mon    32
Tue    33
Wed    30
Name: Peak Temperature, dtype: int64
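
A brief sketch of the three element-wise operators, along with the error that the Python keyword `and` produces. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons. The names `temps`, `mild`, `extremes`, and `not_mild` are illustrative.

```python
import pandas as pd

temps = pd.Series(
    data=[32, 33, 30, 29, 36],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='Peak Temperature',
)

try:
    temps.loc[(temps >= 30) and (temps < 35)]  # 'and' asks for a single True/False
    keyword_and_raised = False
except ValueError as err:
    keyword_and_raised = True
    print(err)

mild = temps.loc[(temps >= 30) & (temps < 35)]      # both conditions hold
extremes = temps.loc[(temps < 30) | (temps > 35)]   # either condition holds
not_mild = temps.loc[~((temps >= 30) & (temps < 35))]  # negation of the first mask

print(list(mild.index), list(extremes.index), list(not_mild.index))
```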

We can use 'fancy indexing' to retrieve only certain elements of a Series by passing a list of positions or labels

In [79]:
my_series.iloc[[0, 0, 4, 2]]

Mon    32
Mon    32
Fri    36
Wed    30
Name: Peak Temperature, dtype: int64

In [83]:
my_series.loc[['Mon', 'Wed', 'Fri', 'Fri']]

Mon    32
Wed    30
Fri    36
Fri    36
Name: Peak Temperature, dtype: int64
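
One caveat, sketched under the assumption of a reasonably recent Pandas release: `.loc[]` with a list raises `KeyError` if any label is missing, while `.reindex()` returns `NaN` for the missing labels instead. The label `'Sat'` and the names `temps` and `with_gap` are made up for illustration.

```python
import pandas as pd

temps = pd.Series(
    data=[32, 33, 30, 29, 36],
    index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
    name='Peak Temperature',
)

try:
    temps.loc[['Mon', 'Sat']]  # 'Sat' is not in the index
    missing_label_raised = False
except KeyError as err:
    missing_label_raised = True
    print(err)

# reindex tolerates unknown labels and marks them as missing
with_gap = temps.reindex(['Mon', 'Sat'])
print(with_gap)
```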

#### Operations on Series