## Ordinal numbering encoding

**Ordinal categorical variables**

Categorical variables whose categories can be meaningfully ordered are called ordinal. For example:

- Student's grade in an exam (A, B, C or Fail).
- Day of the week, which can be treated as ordinal with Monday = 1 and Sunday = 7.
- Educational level, with the categories Elementary school, High school, College graduate and PhD ranked from 1 to 4.

When the categorical variable is ordinal, the most straightforward approach is to replace each label with an integer that reflects its rank.
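As a minimal sketch of this replacement, the educational level example from the list above can be encoded with a plain dictionary. The names `education_order`, `education_map` and `samples` are made up for illustration, and the sample values are invented.

```python
# Categories listed from lowest to highest rank (taken from the example above).
education_order = ["Elementary school", "High school", "College graduate", "PhD"]

# Build the label -> rank mapping, starting at 1 for the lowest level.
education_map = {label: rank for rank, label in enumerate(education_order, start=1)}

# A few invented observations to encode.
samples = ["PhD", "High school", "Elementary school", "College graduate"]
encoded = [education_map[label] for label in samples]

print(encoded)  # [4, 2, 1, 3]
```

Writing the order down once as a list and deriving the mapping from it keeps the ranks consistent if a category is later added or moved.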

### Advantages

- Keeps the semantic information of the variable (the encoding stays human-readable)
- Straightforward

### Disadvantage

- Does not add any extra information that a machine learning model can use

Below, I simulate some data to demonstrate the technique.

In [1]:
import pandas as pd
import datetime

In [2]:
# create a variable with dates, and from that extract the weekday
# I create a list of 30 consecutive dates, going back one day at a time from today,
# and then transform it into a dataframe

base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(0, 30)]
df = pd.DataFrame(date_list)
df.columns = ['day']
df.head(10)

Unnamed: 0,day
0,2017-11-24 23:37:17.497960
1,2017-11-23 23:37:17.497960
2,2017-11-22 23:37:17.497960
3,2017-11-21 23:37:17.497960
4,2017-11-20 23:37:17.497960
5,2017-11-19 23:37:17.497960
6,2017-11-18 23:37:17.497960
7,2017-11-17 23:37:17.497960
8,2017-11-16 23:37:17.497960
9,2017-11-15 23:37:17.497960


In [3]:
# extract the week day name
# (.dt.weekday_name was removed in pandas 1.0; .dt.day_name() replaces it)

df['day_of_week'] = df['day'].dt.day_name()
df.head()

Unnamed: 0,day,day_of_week
0,2017-11-24 23:37:17.497960,Friday
1,2017-11-23 23:37:17.497960,Thursday
2,2017-11-22 23:37:17.497960,Wednesday
3,2017-11-21 23:37:17.497960,Tuesday
4,2017-11-20 23:37:17.497960,Monday


In [4]:
# Encode the categorical variable by replacing each label with its ordinal number

weekday_map = {'Monday':1,
               'Tuesday':2,
               'Wednesday':3,
               'Thursday':4,
               'Friday':5,
               'Saturday':6,
               'Sunday':7
}

df['day_ordinal'] = df.day_of_week.map(weekday_map)
df.head(10)

Unnamed: 0,day,day_of_week,day_ordinal
0,2017-11-24 23:37:17.497960,Friday,5
1,2017-11-23 23:37:17.497960,Thursday,4
2,2017-11-22 23:37:17.497960,Wednesday,3
3,2017-11-21 23:37:17.497960,Tuesday,2
4,2017-11-20 23:37:17.497960,Monday,1
5,2017-11-19 23:37:17.497960,Sunday,7
6,2017-11-18 23:37:17.497960,Saturday,6
7,2017-11-17 23:37:17.497960,Friday,5
8,2017-11-16 23:37:17.497960,Thursday,4
9,2017-11-15 23:37:17.497960,Wednesday,3


We can now use the variable day_ordinal in sklearn to build machine learning models.
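As an alternative to a hand-written dictionary, the same encoding can be sketched with an ordered pandas `Categorical`. This is only one possible approach, not the method used in the cells above. The names `weekday_order`, `days` and `ordered_days` are illustrative, and the sample values are invented. Category codes start at 0, so 1 is added to match the Monday = 1 convention.

```python
import pandas as pd

# Explicit category order, from first to last day of the week.
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
                 'Friday', 'Saturday', 'Sunday']

# A few invented weekday labels to encode.
days = pd.Series(['Friday', 'Thursday', 'Monday', 'Sunday'])

# An ordered Categorical stores each label as an integer code (0-based).
ordered_days = pd.Categorical(days, categories=weekday_order, ordered=True)

# Shift the codes by 1 so that Monday = 1 and Sunday = 7.
day_ordinal = pd.Series(ordered_days.codes + 1, index=days.index)

print(day_ordinal.tolist())  # [5, 4, 1, 7]
```

Note that a label missing from `weekday_order` receives the code -1 (so 0 after the shift), whereas `map` with a dictionary returns `NaN` for unmapped labels. Either way it is worth checking for unexpected values before training a model.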