# pandas lesson 1 (Series)

## Introduction

pandas is *the* library for data analysis in Python. It has two main data structures:
* the Series for 1D labelled data such as a single row or column,
* the DataFrame for 2D data such as a table.

These lessons show examples of typical operations on a pandas DataFrame, including how to:
* select a subset of columns
* calculate new columns
* filter rows by values or by index
* sort rows by index or by values
* group and summarise
* handle missing values

A good place to get started with pandas is at https://pandas.pydata.org/getting_started.html


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt # pandas uses matplotlib for plotting

## pandas Series (a 1D labelled array)

A Series is a 1D labelled array. By default the labels are position-based integers, starting at 0. Labels don't need to be unique. The elements of a Series usually share a single data type (dtype), which is what we would expect given that a Series often becomes a column in a DataFrame (table). Common types include numbers (ints and floats) and strings (stored with the object dtype).
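As a small illustration of the points above, the sketch below builds one Series with the default integer labels and one with a repeated label. The names `scores` and `repeated` and their values are made up for this example.

```python
import pandas as pd

# Default labels are the integers 0, 1, 2, ...
scores = pd.Series([7, 9, 8])
default_labels = list(scores.index)

# Labels may repeat; selecting a repeated label returns every matching element
repeated = pd.Series([1.5, 2.5, 3.5], index=['a', 'b', 'a'])
matches = repeated.loc['a']   # a Series holding both elements labelled 'a'

print(default_labels)
print(scores.dtype)     # an integer dtype
print(repeated.dtype)   # a float dtype
print(matches)
```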

We can create a Series in many ways, for example from a list.

### Create a Series

In [2]:
# Create a Series from a list of first names
names = ['Harry', 'Hermione', 'Ron']
first_names = pd.Series(names)
first_names

Unnamed: 0,0
0,Harry
1,Hermione
2,Ron


In [3]:
# We can pass in an index when creating a Series
initials = ['hp', 'hg', 'rw']
first_names = pd.Series(names, index=initials)
first_names

Unnamed: 0,0
hp,Harry
hg,Hermione
rw,Ron
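A list is only one of the possible inputs. As a sketch of another option, a dictionary can be passed instead, in which case the keys become the index labels. The variable name `from_dict` is chosen just for this example.

```python
import pandas as pd

# Keys become the index labels, values become the elements
from_dict = pd.Series({'hp': 'Harry', 'hg': 'Hermione', 'rw': 'Ron'})

# The same Series built from a list plus an explicit index
from_list = pd.Series(['Harry', 'Hermione', 'Ron'], index=['hp', 'hg', 'rw'])

print(from_dict)
print(from_dict.equals(from_list))
```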


Examine the first_names Series.  You may want to try these properties and methods: index, values, dtype, shape, ndim, size.

In [7]:
# Write your code here as a set of print statements. The first one is provided.
print("Index:", first_names.index)
print("Shape:", first_names.shape)
print("names:", first_names)

Index: Index(['hp', 'hg', 'rw'], dtype='object')
Shape: (3,)
names: hp       Harry
hg    Hermione
rw         Ron
dtype: object
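One possible way to inspect the remaining properties is sketched below. The Series is rebuilt here so that the snippet runs on its own.

```python
import pandas as pd

first_names = pd.Series(['Harry', 'Hermione', 'Ron'], index=['hp', 'hg', 'rw'])

print("Values:", first_names.values)   # the underlying array of elements
print("dtype:", first_names.dtype)     # the element type
print("ndim:", first_names.ndim)       # number of dimensions, always 1 for a Series
print("size:", first_names.size)       # number of elements
```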


We can create a Series with a defined size and initialize with fixed or random values.

In [8]:
# a Series of 4 random numbers drawn from a normal distribution with mean 10 and standard deviation 1
pd.Series(np.random.randn(4), name='price') + 10 # name is an optional parameter

Unnamed: 0,price
0,9.748923
1,11.199383
2,11.06342
3,11.219069
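The cell above covers random values. A minimal sketch of the fixed-value case, plus a seeded generator so the random numbers repeat between runs, might look like this. The seed 42 and the names `zeros` and `price` are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# A Series of 4 zeros: a scalar is repeated for every label in the index
zeros = pd.Series(0.0, index=range(4))

# Seeding the generator makes the random values repeatable between runs
rng = np.random.default_rng(seed=42)   # 42 is an arbitrary seed
price = pd.Series(rng.normal(loc=10, scale=1, size=4), name='price')

print(zeros)
print(price)
```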


### Access elements in a Series

We can access elements of the Series
* by position using the iloc property,  or
* by their index label using the loc property.
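The difference between the two is easiest to see when the labels are integers that do not match the positions. The Series `letters` below is a made-up example for that purpose.

```python
import pandas as pd

# Integer labels deliberately out of step with the positions
letters = pd.Series(['a', 'b', 'c'], index=[2, 0, 1])

first_by_position = letters.iloc[0]   # the element in the first position
labelled_zero = letters.loc[0]        # the element whose label is 0

print(first_by_position)
print(labelled_zero)
```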

In [9]:
# returns the item in the 2nd position
first_names.iloc[1]

'Hermione'

In [10]:
# returns the item in the 2nd and 3rd positions
first_names.iloc[1:3]

Unnamed: 0,0
hg,Hermione
rw,Ron


In [11]:
# returns the element with the index label 'rw'
first_names.loc['rw']

'Ron'

Note that when slicing with loc, the end label is included in the result, unlike the usual Python slicing syntax, which excludes the end.

In [12]:
first_names.loc['hp':'rw']

Unnamed: 0,0
hp,Harry
hg,Hermione
rw,Ron
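To make the contrast concrete, the sketch below takes the same two elements once by position and once by label. The Series is rebuilt so the snippet runs on its own.

```python
import pandas as pd

first_names = pd.Series(['Harry', 'Hermione', 'Ron'], index=['hp', 'hg', 'rw'])

by_position = first_names.iloc[0:2]     # positions 0 and 1; position 2 is excluded
by_label = first_names.loc['hp':'hg']   # labels 'hp' through 'hg'; 'hg' is included

print(by_position)
print(by_label)
```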


We can use the index to set values in the Series.

In [13]:
print("before change:", first_names.loc['rw'])
first_names['rw'] = 'Ronald' # set a value in the Series
print("after change:", first_names.loc['rw'])

before change: Ron
after change: Ronald


We can use *in* to check whether a label is in the index of the Series. Note that this tests the index labels, not the values.

In [14]:
'hg' in first_names # check if a label is in the index of the Series

True
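Because *in* looks at the index, testing for a value needs a different approach. Two possibilities are sketched below: converting to a list, or using the isin method.

```python
import pandas as pd

first_names = pd.Series(['Harry', 'Hermione', 'Ron'], index=['hp', 'hg', 'rw'])

label_found = 'hg' in first_names                    # looks in the index
value_as_label = 'Harry' in first_names              # 'Harry' is a value, not a label
value_in_list = 'Harry' in first_names.to_list()     # looks in the values
value_isin = first_names.isin(['Harry']).any()       # same check using a Series method

print(label_found, value_as_label, value_in_list, value_isin)
```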

### Element-wise operations

An element-wise operation is one that is performed on every element of the Series, for example multiplying all values by 100.

The examples in this section use a Series of 5 numbers named prices that is defined below.

In [15]:
prices = pd.Series([10, 12, 15, 18, 16])
prices

Unnamed: 0,0
0,10
1,12
2,15
3,18
4,16


This multiplies every element by a scalar value.

In [16]:
prices * 100

Unnamed: 0,0
0,1000
1,1200
2,1500
3,1800
4,1600


Add 10 to each value in the prices Series.

In [17]:
# Write your code here
prices + 10

Unnamed: 0,0
0,20
1,22
2,25
3,28
4,26
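Other operators work element by element in the same way. As a brief sketch, the example below divides every price by 2 and compares every price with a threshold. The threshold of 12 and the names `halved` and `is_expensive` are arbitrary choices for this example.

```python
import pandas as pd

prices = pd.Series([10, 12, 15, 18, 16])

halved = prices / 2          # division applied to every element
is_expensive = prices > 12   # comparison applied to every element, giving booleans

print(halved)
print(is_expensive)
```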


We can aggregate (e.g. sum, average) the values in a Series using either
* a numpy function, e.g. np.sum(prices), or
* a method on the Series, e.g. prices.sum()

In [18]:
np.sum(prices) # total value of all elements

np.int64(71)

In [19]:
prices.sum()

np.int64(71)

Find the min, max, average, median and other summary statistics of the prices Series.

In [22]:
# Write your code here as a set of print statements. The first one is provided.
print(f"minimum: numpy method {np.min(prices)}, Series method {prices.min()}")

#np.min(prices)
prices.min()

minimum: numpy method 10, Series method 10


10
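One possible way to get the other statistics is sketched below, using the Series methods mean and median, and describe, which collects several summary statistics in one call.

```python
import pandas as pd

prices = pd.Series([10, 12, 15, 18, 16])

print("mean:", prices.mean())
print("median:", prices.median())

# describe() returns a Series of summary statistics labelled
# count, mean, std, min, 25%, 50%, 75% and max
summary = prices.describe()
print(summary)
```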