# Pandas Series

A Pandas Series is a data structure defined in the Pandas package. It is a one-dimensional labeled array that can hold data of any type (integer, string, float, objects, etc.). A Pandas Series can be regarded as a single column of a Pandas DataFrame.

Docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html

## Pandas Series in a DataFrame

In [1]:
import pandas as pd
my_dict = {
    'First Name':['James','Alexander', 'Ashley'],
    'Last Name':['Cook', 'Bell', 'Mason'],
    'Age':[38, 42, 23]
}
my_dict
df = pd.DataFrame(my_dict)
df #this is a Pandas dataframe

  First Name Last Name  Age
0      James      Cook   38
1  Alexander      Bell   42
2     Ashley     Mason   23


In [2]:
df.Age #this is a pandas series (column wise). It is labelled (i.e. indexed for each row)

0    38
1    42
2    23
Name: Age, dtype: int64

In [6]:
df['First Name']

0        James
1    Alexander
2       Ashley
Name: First Name, dtype: object

In [3]:
df.loc[0] #this is a pandas series (row wise). It is labeled as well, here by the column names.

First Name    James
Last Name      Cook
Age              38
Name: 0, dtype: object
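As a quick check of the cells above, the sketch below rebuilds the same small DataFrame and confirms that both a column and a row come back as `pd.Series` objects, each with its own labels. The names `col` and `row` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['James', 'Alexander', 'Ashley'],
    'Last Name': ['Cook', 'Bell', 'Mason'],
    'Age': [38, 42, 23],
})

col = df['Age']   # a column: labeled by the row index (0, 1, 2)
row = df.loc[0]   # a row: labeled by the column names

print(type(col).__name__, list(col.index))
print(type(row).__name__, list(row.index))
```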

### Creating Pandas Series

### Creating a Series from Array

In [2]:
import pandas as pd
import numpy as np
 
# define a numpy array
data = np.array(['James','Alexander', 'Ashley'])
 
ps = pd.Series(data)
ps

0        James
1    Alexander
2       Ashley
dtype: object

### Creating a Series from Dictionary

In [38]:
my_dict = {'a' : 10, 
        'b' : 20, 
        'c' : 30} 

ps = pd.Series(my_dict)
ps

a    10
b    20
c    30
dtype: int64

### Creating a Series from List

In [10]:
# a simple list
myList = ['James','Alexander', 'Ashley']
  
# create a series from a list
ps = pd.Series(myList)
ps

0        James
1    Alexander
2       Ashley
dtype: object
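The three constructors above all fall back to the default integer labels (or the dictionary keys). As a minimal sketch, the `index` and `name` arguments of `pd.Series` can be used to set the labels and the series name explicitly. The labels `'x'`, `'y'`, `'z'` and the variable `ps_named` are made up for illustration.

```python
import pandas as pd

names = ['James', 'Alexander', 'Ashley']

# index= supplies one label per element; name= labels the series itself
ps_named = pd.Series(names, index=['x', 'y', 'z'], name='First Name')

print(ps_named)
print(ps_named.index.tolist())
print(ps_named.name)
```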

### Accessing Elements
Elements of a Pandas series can be accessed either by position or by label (index).

In [16]:
df

  First Name Last Name  Age
0      James      Cook   38
1  Alexander      Bell   42
2     Ashley     Mason   23


In [42]:
ps = df.loc[0] #now I have a Pandas series
ps

First Name    James
Last Name      Cook
Age              38
Name: 0, dtype: object

In [21]:
ps.iloc[1] #access by position (plain ps[1] on a label-indexed series is deprecated in recent pandas)

'Cook'

In [22]:
ps['Last Name'] #access by label

'Cook'
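To keep positional and label-based access unambiguous, the explicit accessors `.iloc` (position) and `.loc` (label) can be used. The sketch below builds a stand-in for the row series shown above; the name `ps_row` is an assumption for illustration.

```python
import pandas as pd

ps_row = pd.Series(['James', 'Cook', 38],
                   index=['First Name', 'Last Name', 'Age'])

by_position = ps_row.iloc[1]          # second element, regardless of labels
by_label = ps_row.loc['Last Name']    # element whose label is 'Last Name'

first_two = ps_row.iloc[:2]                          # positional slice, end excluded
label_slice = ps_row.loc['First Name':'Last Name']   # label slice, end included

print(by_position, by_label)
print(first_two.index.tolist(), label_slice.index.tolist())
```

Note the difference between the two slices: positional slices exclude the end position, while label slices include the end label.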

### Adding Elements
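A new element must itself be wrapped in a Pandas series before it can be added to an existing one. Note that `Series.append` was removed in pandas 2.0; `pd.concat` is the supported way to combine series.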

In [28]:
ps

First Name    James
Last Name      Cook
Age              38
Name: 0, dtype: object

In [29]:
pd.concat([ps, 'Engineer']) #throws an error because a plain string is neither a Series nor a DataFrame

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

In [32]:
ps1 = pd.Series(['Engineer']) #first create a Pandas series from a list
pd.concat([ps, ps1]) #this will not update ps; it creates a new Pandas series (ps.append was removed in pandas 2.0)

First Name       James
Last Name         Cook
Age                 38
0             Engineer
dtype: object

In [33]:
ps

First Name    James
Last Name      Cook
Age              38
Name: 0, dtype: object

In [40]:
ps1 = pd.Series(data = ['Engineer'], index = ['Occupation']) #create a series with a single element and its label
ps1

Occupation    Engineer
dtype: object

In [43]:
pd.concat([ps, ps1]) #the new element now carries its own label instead of 0

First Name       James
Last Name         Cook
Age                 38
Occupation    Engineer
dtype: object
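Because combining two series returns a new object, the original is left untouched. When an in-place change is wanted, assigning to a new label is one alternative. The sketch below contrasts the two approaches on a stand-in series; the names `person`, `extended` and `updated` are illustrative only.

```python
import pandas as pd

person = pd.Series(['James', 'Cook', 38],
                   index=['First Name', 'Last Name', 'Age'])

# pd.concat returns a new series; `person` itself is unchanged
extended = pd.concat([person, pd.Series(['Engineer'], index=['Occupation'])])

# assigning to a label that does not exist yet enlarges the series in place
updated = person.copy()
updated['Occupation'] = 'Engineer'

print(len(person), len(extended), len(updated))
```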

### Operations on Series
Arithmetic operations such as addition and subtraction can be performed on Pandas series with methods like .add(), .sub(), .mul(), .div() and .pow(). Methods such as .sum(), .prod() and .mean() aggregate a series into a single value, and .abs() transforms it element by element.<br/>
We can also change the data type of a series, convert a series to a list, and so on.<br/>
For the complete list of methods: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html

In [3]:
my_data = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']) # create a series
 
my_data1 = pd.Series([5, 6, 7, 8], index=['a', 'b', 'c', 'd']) # create another series with exactly the same labels

my_data.add(my_data1) 

a     6
b     8
c    10
d    12
dtype: int64

In [49]:
my_data = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']) # create a series
 
my_data1 = pd.Series([5, 6, 7, 8], index=['a', 'b', 'd', 'e']) # create another series with different labels

my_data.add(my_data1, fill_value = 0) #fill_value=0 treats a label missing from one series as 0. The default (fill_value=None) gives NaN for such labels.

a     6.0
b     8.0
c     3.0
d    11.0
e     8.0
dtype: float64
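For comparison, the sketch below repeats the addition without `fill_value`, and with the `+` operator, to show what happens to labels that exist in only one of the two series. The names `s1`, `s2`, `no_fill`, `with_plus` and `filled` are illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([5, 6, 7, 8], index=['a', 'b', 'd', 'e'])

no_fill = s1.add(s2)               # labels 'c' and 'e' exist on one side only -> NaN
with_plus = s1 + s2                # the + operator aligns on labels the same way
filled = s1.add(s2, fill_value=0)  # missing side is treated as 0 instead

print(no_fill)
print(filled)
```

In all three cases the values are matched by label, not by position, which is why `'d'` gives 4 + 7 = 11.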

Let's see how we can convert the data type of a Pandas series.

In [83]:
import pandas as pd
# titanic = pd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")
titanic = pd.read_csv("https://sites.google.com/site/yasinunlu/home/research/new1/Titanic_train.csv")
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [76]:
titanic[["Age"]].head()

    Age
0  22.0
1  38.0
2  26.0
3  35.0
4  35.0


In [52]:
titanic.Age.dtype

dtype('float64')

In [64]:
titanic["Age"].astype(int) 

ValueError: Cannot convert non-finite values (NA or inf) to integer

In [84]:
titanic["Age"] = titanic["Age"].fillna(0) #first takes care of NAs. Fill with 0s.

In [85]:
titanic["Age"] = titanic["Age"].astype(int) #then convert data type

In [86]:
titanic.Age.dtype 

dtype('int32')

In [87]:
titanic[["Age"]].head()

   Age
0   22
1   38
2   26
3   35
4   35


In [94]:
dtype1 = type(titanic["Name"]) 
dtype1 #this is a pandas series

pandas.core.series.Series

In [97]:
my_list = titanic['Name'].tolist() #convert Pandas series to a Python list
dtype2 = type(my_list)
dtype2 #this is a Python list

list
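Filling missing ages with 0 makes the integer conversion work, but it also turns "unknown" into a real-looking value. As a hedged alternative, pandas offers the nullable `'Int64'` dtype (capital I), which keeps missing entries as `<NA>`. The sketch below uses a small made-up series rather than the Titanic column and assumes all non-missing values are whole numbers; values with a fractional part are expected to be rejected rather than truncated.

```python
import numpy as np
import pandas as pd

# made-up ages with one missing entry (not taken from the Titanic data)
ages = pd.Series([22.0, 38.0, np.nan, 35.0])

nullable = ages.astype('Int64')   # integers where present, <NA> where missing

print(nullable)
print(nullable.dtype)
print(nullable.isna().sum())
```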

### Pandas.Series.map()
This method substitutes each value in a series with another value, which may be derived from a function, a dictionary, or another series.<br/>
Series.map(arg, na_action=None)<br/>
**arg**: function, dict, or series<br/>
**na_action**: None by default. If 'ignore', propagate NaN values, without passing them to the mapping correspondence.

In [99]:
help(pd.Series.map)

Help on function map in module pandas.core.series:

map(self, arg, na_action=None)
    Map values of Series according to input correspondence.
    
    Used for substituting each value in a Series with another value,
    that may be derived from a function, a ``dict`` or
    a :class:`Series`.
    
    Parameters
    ----------
    arg : function, collections.abc.Mapping subclass or Series
        Mapping correspondence.
    na_action : {None, 'ignore'}, default None
        If 'ignore', propagate NaN values, without passing them to the
        mapping correspondence.
    
    Returns
    -------
    Series
        Same index as caller.
    
    See Also
    --------
    Series.apply : For applying more complex functions on a Series.
    DataFrame.apply : Apply a function row-/column-wise.
    DataFrame.applymap : Apply a function elementwise on a whole DataFrame.
    
    Notes
    -----
    When ``arg`` is a dictionary, values in Series that are not in the
    dictionary (as keys) ar

Let's create a very simple example to see how useful .map() is.

In [4]:
ps = pd.Series(['cat', 'dog', np.nan, 'rabbit', 'cat', 'dog', 'dog', 'dog','lion','panther']) #create a Pandas series
ps

0        cat
1        dog
2        NaN
3     rabbit
4        cat
5        dog
6        dog
7        dog
8       lion
9    panther
dtype: object

In [7]:
my_Dict = {'cat': 'domestic', 'dog': 'domestic', 'rabbit':'domestic', 'lion': 'wild', 'panther': 'wild'}
ps.map(my_Dict, 'ignore') #na_action='ignore': the NaN entry is not looked up and stays NaN

0    domestic
1    domestic
2         NaN
3    domestic
4    domestic
5    domestic
6    domestic
7    domestic
8        wild
9        wild
dtype: object

In [8]:
ps.map(my_Dict, None)

0    domestic
1    domestic
2         NaN
3    domestic
4    domestic
5    domestic
6    domestic
7    domestic
8        wild
9        wild
dtype: object

In [9]:
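ps.map(my_Dict, 1) #na_action is documented as None or 'ignore'; this pandas version accepted 1 silently, but recent versions are expected to raise a ValueError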

0    domestic
1    domestic
2         NaN
3    domestic
4    domestic
5    domestic
6    domestic
7    domestic
8        wild
9        wild
dtype: object

In [10]:
ps = pd.Series([1, 2, np.nan, np.nan, 3, 4, 5])
ps

0    1.0
1    2.0
2    NaN
3    NaN
4    3.0
5    4.0
6    5.0
dtype: float64

In [11]:
def odd_even(x):
    if x %2 == 1:
        return 'Odd'
    else:
        return 'Even'

# na_action=None: NaN values are passed into the function.
# NaN % 2 == 1 is False, so they fall through to the else branch and are labeled 'Even'.
ps.map(odd_even, na_action = None)

0     Odd
1    Even
2    Even
3    Even
4     Odd
5    Even
6     Odd
dtype: object

In [12]:
ps.map(odd_even, na_action = 'ignore') #NaN values are not passed into the function and stay NaN

0     Odd
1    Even
2     NaN
3     NaN
4     Odd
5    Even
6     Odd
dtype: object
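In the example above the function happens to tolerate NaN, so `na_action` only changes the label. With a function that cannot handle NaN at all, `na_action='ignore'` is what keeps the call from failing. The sketch below illustrates this with a string method; the series `animals` and the lambda are illustrative.

```python
import numpy as np
import pandas as pd

animals = pd.Series(['cat', 'dog', np.nan, 'rabbit'], dtype=object)

# With na_action='ignore' the NaN is skipped and stays NaN in the result
upper = animals.map(lambda s: s.upper(), na_action='ignore')

# Without it, the NaN (a float) reaches the function and .upper() fails
try:
    animals.map(lambda s: s.upper())
    failed = False
except AttributeError:
    failed = True

print(upper.tolist())
print(failed)
```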

Let's apply this method to a column of a larger DataFrame.

In [13]:
import pandas as pd
# titanic = pd.read_csv("https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv")
titanic = pd.read_csv("https://sites.google.com/site/yasinunlu/home/research/new1/Titanic_train.csv")
titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


Let's replace the codes in the "Embarked" column with descriptive labels.

In [14]:
from_where = {"S": 'South', "C": 'Central', "Q": 'Somewhere'}

titanic['Embarked'] = titanic['Embarked'].map(from_where) # keeps NaN values

titanic.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,South
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,Central
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,South
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,South
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,South


In [16]:
titanic[titanic['Embarked'].isna()]  # rows whose Embarked value is still NaN after the mapping

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
61,62,1,1,"Icard, Miss. Amelie",female,38.0,0,0,113572,80.0,B28,
829,830,1,1,"Stone, Mrs. George Nelson (Martha Evelyn)",female,62.0,0,0,113572,80.0,B28,
