## Pandas Series

The most basic object in Pandas is a __Series__. A Series can be thought of as a __one-dimensional (1D) NumPy array__ with a name and an index of labels attached to it. Also, <font color="red">unlike a typical NumPy array, which is usually kept to a single numeric type, a Series is routinely used to hold non-numeric or mixed data (characters, dates, times, booleans, etc.)</font>. Usually, you will work with Series only as part of DataFrames.

First, let us import the necessary libraries for our use.

In [1]:
import numpy as np
import pandas as pd

Now, let us learn how to create a Pandas Series.

### Creating a Pandas Series

A Pandas Series can be created from a list, a dictionary, a scalar value, and so on. To create one, we call the `pandas.Series()` constructor. It has the following syntax:

__`identifier = pandas.Series(data,dtype)`__

The parameter `data` is the input data (an array-like object such as a list, a dictionary, or a scalar value), whereas `dtype` is the data type of the series. If `dtype` is omitted, Pandas infers it from the data. Let us see an example.

In [20]:
pylist=[1,2,'Yash',3,5.6,'Python',True]
pytup=3,'Python',5,True,'Yash',5.6
pyset=set(pylist)
pydict={'{}M'.format(i) if i!=0 else i:pylist[i] for i in range(len(pylist))}

Now, let us create our Pandas series.

In [21]:
ser1,ser2,ser3=pd.Series(pylist),pd.Series(pytup),pd.Series(pydict)

In [22]:
ser1

0         1
1         2
2      Yash
3         3
4       5.6
5    Python
6      True
dtype: object

In [23]:
ser3

0          1
1M         2
2M      Yash
3M         3
4M       5.6
5M    Python
6M      True
dtype: object

See? We have easily created our Pandas Series. One basic point you should notice is that the data type of each of them is `object`: every element keeps its original Python type, which is different from what a NumPy array does with the same data by default.

If we create a NumPy array for the same, we will see a single data type. 

In [24]:
nparr=np.array(pylist)
nparr

array(['1', '2', 'Yash', '3', '5.6', 'Python', 'True'], dtype='<U32')

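See? NumPy automatically converted all elements to strings. This is due to __type coercion__: a NumPy array normally holds a single data type, so it picks one that can represent every element. A Series with `object` dtype keeps each element as it is, which is convenient for mixed data. We can also notice that the __series `ser3` contains an unordered index__ (custom labels rather than the default `0, 1, 2, ...` positions), which is another advantage over arrays. Let us see another example.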

In [25]:
ser4=pd.Series({'font':123,'xyz':34.5,'yash':True})
ser4

font     123
xyz     34.5
yash    True
dtype: object

See? This is an unordered index. We access the elements inside it by label, just like the keys of a dictionary. That is why __it is easiest to create pandas objects from dictionaries__.
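The text above mentions scalars and the `dtype` parameter without showing them. The following is a minimal sketch of both; the variable names `scalar_ser` and `float_ser` are made up for illustration.

```python
import pandas as pd

# A scalar is repeated for every label supplied through `index`.
scalar_ser = pd.Series(7, index=['a', 'b', 'c'])

# `dtype` forces a single element type instead of letting pandas infer one.
float_ser = pd.Series([1, 2, 3], dtype='float64')

print(scalar_ser)
print(float_ser.dtype)
```

When a scalar is passed, the `index` argument decides how many elements the resulting series has.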

### Accessing Elements inside a Pandas Series

To access elements inside a Pandas series, we follow mostly the same rules as we follow in NumPy arrays. The main difference is that a Series index can hold custom labels, whereas an array is always addressed by position. So, we will look at two types of series: Ordered Index and Unordered Index series.

#### Ordered Indexes

In case of Ordered Index Series (the default `0, 1, 2, ...` index), we follow almost the same syntax as NumPy arrays. We can do both slicing and indexing, with one exception: <b>a negative integer inside plain square brackets is treated as a label, not a position, so `ser[-1]` fails on a default index</b>. Let us see an example.

In [192]:
ser=pd.Series(np.random.randint(1,5,10))
ser

0    4
1    4
2    3
3    2
4    3
5    4
6    2
7    4
8    1
9    4
dtype: int32

In [193]:
ser[5],ser[8]

(4, 1)

In [67]:
ser[-1]

KeyError: -1

Here, you can see that we get a `KeyError`, which says that the label -1 is not in the index. This is because plain square brackets look up labels here, not positions. We can, however, use the `len()` function and subtract `1` from it to get the last element. Let us see it.

In [196]:
ser[len(ser)-1]

4

Here, we have computed the position by hand, which really defeats the purpose of negative indexing, but still, it is a quick workaround; the positional indexer `ser.iloc[-1]` gives the same result. Now, let us move towards <b>slicing</b>.

In [197]:
ser[3:8]

3    2
4    3
5    4
6    2
7    4
dtype: int32

In [198]:
ser[2::3]

2    3
5    4
8    1
dtype: int32

See? We can easily use indexing and slicing in this case. Now, let us look at negative slicing. Unlike single-element indexing, __slices with negative integers do work__, because integer slices are interpreted by position. Let us see some examples.

In [199]:
ser[-2:3:-1]

8    1
7    4
6    2
5    4
4    3
dtype: int32

In [200]:
ser[3:-3]

3    2
4    3
5    4
6    2
dtype: int32

See? This is how indexing is done for ordered indexes. Now, let us see for unordered indexes. 
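The negative-index limitation shown earlier applies to plain `[]` lookups. As a hedged alternative, the positional indexer `.iloc` accepts negative positions directly. The sketch below rebuilds a series with the same values as `ser`; the name `demo` is only for illustration.

```python
import pandas as pd

# Same values as the `ser` printed above, with the default 0..9 index.
demo = pd.Series([4, 4, 3, 2, 3, 4, 2, 4, 1, 4])

last = demo.iloc[-1]          # position-based: the last element
last_three = demo.iloc[-3:]   # position-based slice: the last three elements

print(last)
print(last_three)
```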

#### Unordered Indexes

In this case, we can do indexing, but it has to be done the way we do it in dictionaries: we have to specify the exact index label inside the square brackets. Let us see an example. We will use the `index` keyword argument of the `Series()` constructor to create the series.

In [79]:
ser=pd.Series(np.random.randint(2,5,10),index=np.random.randint(2,15,10))
ser

4     4
3     3
11    3
4     3
5     3
11    3
5     3
12    2
11    2
10    4
dtype: int32

In [84]:
ser[10]

4

Here, you can see that __indexing returns the value whose label matches__ the one we passed (`10` is a label here, not a position). __This is not the case for slicing__, which works by position. Let us see some examples.

In [82]:
ser[4:8]

5     3
11    3
5     3
12    2
dtype: int32

In [83]:
ser[-1:5:-1]

10    4
11    2
12    2
5     3
dtype: int32

See? For slicing with integers, it does not matter what your index labels are, because the slice follows the normal positional rules. For single-element indexing, you have to be careful, because the integer is matched against the labels.
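Because plain square brackets mix label and position semantics, it can help to state the intent explicitly with `.loc` (labels) and `.iloc` (positions). The following is a small sketch on a made-up series named `labelled`, not the random one above.

```python
import pandas as pd

# Integer labels that deliberately differ from the positions.
labelled = pd.Series([10, 20, 30, 40], index=[3, 1, 4, 2])

by_label = labelled.loc[4]        # the element whose label is 4
by_position = labelled.iloc[0]    # the element at position 0
label_slice = labelled.loc[1:2]   # label slice: both end labels are included
pos_slice = labelled.iloc[1:3]    # positional slice: the stop position is excluded

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```

Note that a label slice includes its end label, while a positional slice excludes its stop position, as in NumPy.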

#### Multiple Values, Same Index

Now, imagine you have the __same index label for multiple values__. What would happen when you use indexing?

In [77]:
ser=pd.Series(np.random.randint(2,10,5),index=['one','two','three','two','five'])
ser

one      2
two      7
three    9
two      9
five     4
dtype: int32

In [78]:
ser['two']

two    7
two    9
dtype: int32

As you can see, it returned all the values that have the same index label. This is another advantage of Series.
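One consequence worth keeping in mind is that the type of the result depends on whether the label is repeated. The sketch below reuses the values printed above in a series named `dup` (a name chosen for illustration) and checks the index with `is_unique`.

```python
import pandas as pd

dup = pd.Series([2, 7, 9, 9, 4], index=['one', 'two', 'three', 'two', 'five'])

single = dup['one']   # label occurs once: a single value comes back
many = dup['two']     # label occurs twice: a Series comes back
has_unique_labels = dup.index.is_unique   # tells us whether any label repeats

print(single)
print(many)
print(has_unique_labels)
```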

#### Filter Indexing

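We can also use __filter indexing__ (also called boolean indexing) on Pandas Series: we pass a condition inside the square brackets and get back only the elements that satisfy it. Let us see an example.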

In [92]:
ser=pd.Series(np.random.randint(1,100,10))
ser

0    16
1     5
2    68
3     3
4    62
5    37
6    30
7    38
8     2
9    25
dtype: int32

In [96]:
ser[ser>=ser.mean()]

2    68
4    62
5    37
6    30
7    38
dtype: int32

See? This is the kind of indexing we can do with Series. Let us see some operations on Pandas Series now. 
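To make the mechanism explicit, the condition itself is a boolean series, and indexing with it keeps the rows marked `True`. This sketch uses the same values as the example above; the names `vals`, `mask` and `selected` are illustrative.

```python
import pandas as pd

# Same values as the `ser` printed above.
vals = pd.Series([16, 5, 68, 3, 62, 37, 30, 38, 2, 25])

mask = vals >= vals.mean()   # boolean Series: one True/False per element
selected = vals[mask]        # keeps only the elements where mask is True

print(mask)
print(selected)
```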

### Operations on Pandas Series

Now, we will perform some basic operations (arithmetic, logical and membership) on a Pandas Series. Let us create a series now.

In [88]:
nat=pd.Series(np.random.randint(1,10,10))
nat

0    6
1    3
2    4
3    7
4    2
5    4
6    3
7    3
8    8
9    5
dtype: int32

#### Arithmetic Operations

In the case of arithmetic operations on a mixed-type (`object`) Series, __each element follows the rules of its own Python type__, because the operator is applied element by element. Hence, only operators that every element supports, like addition and multiplication, work on a Pandas series containing all 4 data types (integer, float, boolean and string).

If the Pandas series is of a __single data type__, then we can perform __all the operations allowed__ on that data type. Let us see some examples.


In [98]:
s1=pd.Series([1,2.5,'three',True])
s1

0        1
1      2.5
2    three
3     True
dtype: object

In [104]:
s1*4

0                       4
1                    10.0
2    threethreethreethree
3                       4
dtype: object

As you can see, the multiplication operator is supported by all of these data types, so it is applied to each element according to that element's type; for strings, this means repetition.

In [103]:
s1+s1

0             2
1           5.0
2    threethree
3             2
dtype: object

Similarly, the addition operator is supported by all of these data types, so each element is simply added to itself; for strings, this means concatenation. Other operations come with their limitations, but can be done on similar data types. Let us see an example.

In [106]:
s1=pd.Series(np.random.randint(2,100,10))
s2=pd.Series(np.random.randint(2,100,10))

In [107]:
s1-s2

0    12
1   -73
2    63
3    65
4   -15
5    -8
6   -77
7   -44
8    27
9     0
dtype: int32

In [109]:
s1%s2

0    12
1    24
2    28
3    11
4     7
5    57
6    14
7    21
8    27
9     0
dtype: int32

You can see that all the operations can be performed in the same manner as NumPy arrays. Let us see now for logical operations.
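For contrast, here is a hedged sketch of an operator that not every element supports. Subtraction is undefined for Python strings, so applying it to a mixed series like the one used earlier is expected to raise a `TypeError`, while a purely numeric series works. The names `mixed`, `failed` and `numeric_only` are illustrative.

```python
import pandas as pd

mixed = pd.Series([1, 2.5, 'three', True])

try:
    mixed - 1          # 'three' - 1 is not defined for strings
    failed = False
except TypeError:
    failed = True

# The same operator works once every element is numeric.
numeric_only = pd.Series([1, 2.5, 3]) - 1

print(failed)
print(numeric_only)
```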

#### Logical Operations

Logical operations are `and` (`&`), `or` (`|`) and `not` (`~`). On a whole Series, use these element-wise operators; the Python keywords `and`, `or` and `not` work only on single values. We can use logical operations <font color="darkred">only on Pandas series whose elements can be compared with each other</font>, because the conditions are usually built with comparison operators, which need the same or compatible data types. Let us see an example.

In [115]:
s1

0    92
1    24
2    98
3    92
4     7
5    57
6    14
7    21
8    85
9    20
dtype: int32

In [120]:
s1[(s1>=5) & (s1<50)]

1    24
4     7
6    14
7    21
9    20
dtype: int32

In [122]:
s1[5]>50 or s1[7]%3==0

True

In [123]:
s1[9]!=3

True
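The examples above cover element-wise `&` and the scalar keyword `or`. As a complement, this sketch shows element-wise `|` and `~` on a whole series, using the values of `s1` printed above; the names `nums`, `either` and `negated` are illustrative.

```python
import pandas as pd

# Same values as the `s1` printed above.
nums = pd.Series([92, 24, 98, 92, 7, 57, 14, 21, 85, 20])

either = nums[(nums < 10) | (nums > 90)]   # element-wise OR of two conditions
negated = nums[~(nums >= 50)]              # element-wise NOT of one condition

print(either)
print(negated)
```

The parentheses around each comparison are needed because `&`, `|` and `~` bind more tightly than comparison operators in Python.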

#### Membership Operations

The membership operations are the <code>in</code> and the <code>not in</code> operations performed on the series. The important point is that __they check the index labels__, not the values. Let us see some examples.

In [129]:
s1

0    92
1    24
2    98
3    92
4     7
5    57
6    14
7    21
8    85
9    20
dtype: int32

In [130]:
4 in s1

True

In [131]:
92 in s1

False

See? `4` is not one of the values, yet `4 in s1` is `True` because `4` is an index label; `92` is one of the values, yet `92 in s1` is `False` because there is no label `92`. This is an important difference between Pandas series and NumPy arrays.
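If the goal is to test the values rather than the labels, two common options are to check against `.values` or to use `isin()`. The sketch below reuses the values of `s1`; the variable names are made up for illustration.

```python
import pandas as pd

# Same values as the `s1` printed above.
nums = pd.Series([92, 24, 98, 92, 7, 57, 14, 21, 85, 20])

label_hit = 4 in nums             # checks the index labels
value_hit = 92 in nums.values     # checks the stored values
value_mask = nums.isin([92, 7])   # element-wise membership test on the values

print(label_hit, value_hit)
print(value_mask)
```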

Now, let us look at some methods for Series. 

### Methods for Pandas Series

So far, we have done quite a lot with Pandas Series without using many methods. Nevertheless, it is important to know some of the methods for Series. They follow this syntax:

__`identifier.methodname(arguments)`__

The most widely used methods are -

- <b>`add()`,`sub()`,`mul()`,`div()`</b> - These methods return the element-wise sum, difference, product and float quotient of the calling series and the series passed as the argument. They take a keyword `fill_value` (for example `fill_value=0`) that is substituted where one of the series has no value for an index label.
- <b>`sum()`,`prod()`,`mean()`,`cov()`</b> - These methods return the sum, product and mean of the values inside a Pandas series, and its covariance with another series passed as the argument.
- <b>`pow()`,`abs()`</b> - These methods return the series raised to a given power and the series of absolute values respectively.

Let us see some examples.

In [138]:
s1

0    92
1    24
2    98
3    92
4     7
5    57
6    14
7    21
8    85
9    20
dtype: int32

In [139]:
s2

0    80
1    97
2    35
3    27
4    22
5    65
6    91
7    65
8    58
9    20
dtype: int32

In [137]:
s1.sum(),s1.prod(),s1.mean(),s1.cov(s2)

(510, -839813120, 51.0, -125.55555555555554)
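The product printed above is negative even though every value is positive. This looks like integer overflow in a 32-bit accumulator, which can happen on platforms where the default integer is 32 bits wide. A minimal sketch of one way around it, assuming the same values as `s1`, is to cast to `int64` first; the names `narrow`, `exact` and `wide` are illustrative.

```python
import math
import pandas as pd

# Same values as the `s1` printed above, stored as 32-bit integers.
values = [92, 24, 98, 92, 7, 57, 14, 21, 85, 20]
narrow = pd.Series(values, dtype='int32')

exact = math.prod(values)              # Python integers do not overflow
wide = narrow.astype('int64').prod()   # 64 bits are enough for this product

print(exact)
print(wide)
```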

In [142]:
s1.div(s2)

0    1.150000
1    0.247423
2    2.800000
3    3.407407
4    0.318182
5    0.876923
6    0.153846
7    0.323077
8    1.465517
9    1.000000
dtype: float64

In [145]:
s1.sub(s2)

0    12
1   -73
2    63
3    65
4   -15
5    -8
6   -77
7   -44
8    27
9     0
dtype: int32

In [146]:
s1.pow(2)

0    8464
1     576
2    9604
3    8464
4      49
5    3249
6     196
7     441
8    7225
9     400
dtype: int32

In [150]:
(s1*-3).abs()

0    276
1     72
2    294
3    276
4     21
5    171
6     42
7     63
8    255
9     60
dtype: int32
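The `fill_value` keyword mentioned for `add()`, `sub()`, `mul()` and `div()` is not exercised in the examples above, since `s1` and `s2` share the same index. The following sketch uses two small made-up series, `left` and `right`, whose labels do not fully match.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
right = pd.Series([10, 20], index=['a', 'b'])

plain = left + right                     # label 'c' has no partner, so it becomes NaN
filled = left.add(right, fill_value=0)   # the missing partner is treated as 0

print(plain)
print(filled)
```

Values are matched by index label, not by position, which is why the missing label matters more than the difference in length.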

Some other methods and attributes are used to inspect, sort and convert a Pandas Series:

- <b>`count()`,`size`</b> - The `count()` method returns the __number of non-null__ elements, and the `size` attribute (written without parentheses) gives the __total number of elements__ in the series.
- <b>`between()`</b> - This method returns the __boolean series telling if the element lies between two numbers__ inclusively.
- <b>`max()`,`idxmax()`</b> - These methods return the __maximum value and its index__ in the series respectively.
- <b>`min()`,`idxmin()`</b> - These methods return the __minimum value and its index__ in the series respectively.
- <b>`sort_values()`,`sort_index()`</b> - These methods are used to __sort the series by values and by index__ respectively.
- <b>`head()`,`tail()`</b> - These methods return the __first and last `n` entries__ in the series.
- <b>`astype()`</b> - This method is used to __change the data type__ of the series.
- <b>`unique()`,`nunique()`,`value_counts()`</b> - These methods return the __array of unique values__, the __number of unique values__, and the __series giving the number of occurrences of each unique value__ in the series respectively.

Let us see them in action.

In [152]:
s1

0    92
1    24
2    98
3    92
4     7
5    57
6    14
7    21
8    85
9    20
dtype: int32

In [156]:
s1.count(),s1.size

(10, 10)

In [159]:
s1.max(),s1.idxmax(),s1.min(),s1.idxmin()

(98, 2, 7, 4)

In [162]:
s1.between(7,57)

0    False
1     True
2    False
3    False
4     True
5     True
6     True
7     True
8    False
9     True
dtype: bool

In [163]:
s1.head(4)

0    92
1    24
2    98
3    92
dtype: int32

In [164]:
s1.tail(3)

7    21
8    85
9    20
dtype: int32

In [183]:
s1.astype(str)

0    92
1    24
2    98
3    92
4     7
5    57
6    14
7    21
8    85
9    20
dtype: object

In [187]:
s1.sort_values()

4     7
6    14
9    20
7    21
1    24
5    57
8    85
0    92
3    92
2    98
dtype: int32

In [174]:
s3=pd.Series(np.random.randint(1,5,10),dtype=float)
s3

0    2.0
1    2.0
2    3.0
3    2.0
4    3.0
5    1.0
6    1.0
7    1.0
8    3.0
9    3.0
dtype: float64

In [175]:
s3.nunique(),s3.unique()

(3, array([2., 3., 1.]))

In [176]:
s3.value_counts()

3.0    4
2.0    3
1.0    3
dtype: int64

It is important to note that __all these methods return a new object__, i.e. they do not overwrite or modify your existing pandas series. To keep the result, assign it to a variable.
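A short sketch of this point, using `sort_values()` on a made-up series named `original`:

```python
import pandas as pd

original = pd.Series([3, 1, 2])

original.sort_values()                 # result is discarded; `original` is unchanged
sorted_copy = original.sort_values()   # assigning keeps the sorted result

print(original.tolist())
print(sorted_copy.tolist())
```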

This is how we use Pandas Series. They are built on top of NumPy arrays, but offer greater flexibility in many areas, as shown above.