# Variable conversion

In this activity you will learn to convert variables from one type to another.

## Numeric to categorical

Consider the wine dataset we used earlier:

In [1]:
import sklearn.datasets as datasets
import pandas as pd
import numpy as np

dataset = datasets.load_wine()
X = pd.DataFrame(data=dataset['data'], columns=dataset['feature_names'])

print(X.head())

   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0.30             2.81             5.68  1.03   
3        3.49                  0.24             2.18             7.80  0.86   
4        2.69                  0.39             1.82             4.32  1.04   

   od280/od315_of_diluted_wines  proline  
0                  

Let's first bin the variable 'flavanoids' into 5 bins using pandas:

In [None]:
flavanoids = pd.cut(X['flavanoids'], 5)
print(flavanoids.value_counts())

Notice that all bins have the same width, but the number of observations per bin is uneven.
We can use quantile-based binning instead to obtain bins that each hold roughly the same number of observations:

In [None]:
flavanoids = pd.qcut(X['flavanoids'], 5)
print(flavanoids.value_counts())
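To make the difference between the two binning functions concrete, the sketch below prints the bin edges next to the counts. It is only an illustration: the data is a synthetic right-skewed sample (an assumption, standing in for a column such as 'flavanoids'), and the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed sample (an assumption; it stands in for a real column)
rng = np.random.default_rng(0)
values = pd.Series(rng.exponential(scale=1.0, size=1000))

# retbins=True also returns the bin edges that were used
equal_width, width_edges = pd.cut(values, 5, retbins=True)
equal_freq, freq_edges = pd.qcut(values, 5, retbins=True)

# sort=False keeps the bins in their natural order instead of sorting by count
width_counts = equal_width.value_counts(sort=False)
freq_counts = equal_freq.value_counts(sort=False)

print("cut edges: ", np.round(width_edges, 2))
print(width_counts)
print("qcut edges:", np.round(freq_edges, 2))
print(freq_counts)
```

With `pd.cut` the edges are evenly spaced and the counts vary; with `pd.qcut` the edges sit at the quantiles, so the bins differ in width but hold about the same number of observations.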

## Categorical to numeric

Let's create a colour variable:

In [None]:
colours = ['blue', 'red', 'green', 'yellow']
# Draw 100 colours at random; p gives the probability of each colour, in order
colour_array = np.random.choice(colours, 100, p=[0.5, 0.1, 0.1, 0.3])
print(colour_array)

We can easily obtain dummy variables by using the following code:

In [None]:
dummy_colours = pd.get_dummies(colour_array, prefix='color', drop_first=True)
dummy_colours.head()

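Notice that there is no column for blue. This is due to the ```drop_first``` parameter: the first category in alphabetical order (blue) is dropped and acts as the baseline, so a row with zeros in all remaining columns represents blue.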
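The effect of ```drop_first``` is easiest to see by encoding the same values with and without it. The following is a minimal sketch on a small hand-written series (the series and variable names are assumptions made for this example):

```python
import pandas as pd

# Small fixed sample so the result is reproducible (hypothetical data)
sample = pd.Series(['blue', 'red', 'green', 'yellow', 'blue'])

# One column per category
full = pd.get_dummies(sample, prefix='color')

# Same encoding, but the first category (alphabetically) is dropped
reduced = pd.get_dummies(sample, prefix='color', drop_first=True)

print(list(full.columns))
print(list(reduced.columns))

# Rows where every remaining dummy is 0 are exactly the 'blue' rows
is_baseline = reduced.sum(axis=1) == 0
print(is_baseline.tolist())
```

Dropping one column loses no information, because the missing category can always be recovered from the others.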

We can also use scikit-learn:

In [None]:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# We can use a label encoder to transform categories into numbers
enc = LabelEncoder()
colour_label = enc.fit_transform(colour_array)
print(colour_label)

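You will notice that every colour now has its own integer value. The integers follow the alphabetical order of the labels, so they do not express any meaningful ranking of the colours.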
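To check which integer belongs to which colour, we can inspect the fitted encoder. The sketch below uses a small hand-written array (an assumption made for this example) and builds the mapping from the encoder's `classes_` attribute:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Small fixed sample so the result is reproducible (hypothetical data)
sample = np.array(['blue', 'red', 'green', 'yellow', 'blue'])

enc = LabelEncoder()
labels = enc.fit_transform(sample)

# classes_ holds the sorted labels; the position of a label is its integer code
mapping = {str(label): code for code, label in enumerate(enc.classes_)}
print(mapping)
print(labels)

# inverse_transform turns the integers back into the original labels
print(enc.inverse_transform(labels))
```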