## 10. How do I change the data type of a pandas Series?

Converting a pandas Series to an appropriate data type makes more functions applicable to it and is the semantically correct way of working with data. In this blog, we will learn how to change the data type of a pandas Series to an appropriate type.

In [1]:
import pandas as pd

We will use the ‘drinks’ dataset, which contains alcohol consumption by country. We can check the data type of each column using the ‘dtypes’ attribute: three of the columns are integer, one is float, and two are object.

In [2]:
drinks = pd.read_csv("http://bit.ly/drinksbycountry")
drinks.head()

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | Asia |
| 1 | Albania | 89 | 132 | 54 | 4.9 | Europe |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | Africa |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | Europe |
| 4 | Angola | 217 | 57 | 45 | 5.9 | Africa |


In [3]:
drinks.dtypes

country                          object
beer_servings                     int64
spirit_servings                   int64
wine_servings                     int64
total_litres_of_pure_alcohol    float64
continent                        object
dtype: object

### 10.1. Changing data type of one column at a time

We can change the data type of a series by calling ‘astype()’ as a series method and passing the target data type as the parameter. ‘astype()’ returns a new series, so we assign the result back to the column. Here we convert the ‘beer_servings’ column to float and the ‘continent’ column to category.

In [4]:
drinks["beer_servings"] = drinks.beer_servings.astype("float")

In [5]:
drinks["continent"] = drinks.continent.astype("category")

In [6]:
drinks.dtypes

country                           object
beer_servings                    float64
spirit_servings                    int64
wine_servings                      int64
total_litres_of_pure_alcohol     float64
continent                       category
dtype: object
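
The same steps can be sketched on a small in-memory DataFrame, so that the example runs without a network connection. The frame below is a made-up stand-in for the drinks dataset (only three rows, names such as ‘df’ and ‘beer_as_float’ are illustrative). It also shows that ‘astype’ leaves the original column untouched until we assign the result back.

```python
import pandas as pd

# Tiny stand-in for the drinks dataset (assumption: three illustrative rows)
df = pd.DataFrame({
    "beer_servings": [0, 89, 25],
    "continent": ["Asia", "Europe", "Africa"],
})

# astype returns a new Series; the column in df is not modified yet
beer_as_float = df["beer_servings"].astype("float")
still_integer = pd.api.types.is_integer_dtype(df["beer_servings"])

# Assign the converted Series back to change the DataFrame itself
df["beer_servings"] = beer_as_float
df["continent"] = df["continent"].astype("category")

print(still_integer)
print(df.dtypes)
```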

### 10.2. Changing data type of multiple columns at once

We can change the data type of multiple columns at once by using ‘astype’ as a DataFrame method. We build a dictionary with each column name as the ‘key’ and its target data type as the ‘value’, and pass it to ‘astype’ as the parameter. Columns that are not listed in the dictionary keep their current data type.

In [7]:
drinks = drinks.astype({"wine_servings":"float", "spirit_servings":"float"})

In [8]:
drinks.dtypes

country                           object
beer_servings                    float64
spirit_servings                  float64
wine_servings                    float64
total_litres_of_pure_alcohol     float64
continent                       category
dtype: object
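
As a minimal sketch of the dictionary form, the snippet below applies ‘astype’ to a two-row frame built from the first rows shown earlier. The variable names (‘df’, ‘converted’, ‘dtype_names’) are illustrative only.

```python
import pandas as pd

# Two rows copied from the head of the drinks dataset shown earlier
df = pd.DataFrame({
    "country": ["Afghanistan", "Albania"],
    "wine_servings": [0, 54],
    "spirit_servings": [0, 132],
})

# Keys are column names, values are the target data types
converted = df.astype({"wine_servings": "float", "spirit_servings": "float"})

# Collect the resulting dtype names for a quick check
dtype_names = {col: str(dtype) for col, dtype in converted.dtypes.items()}
print(dtype_names)
```

Because ‘astype’ returns a new DataFrame, ‘df’ itself still holds integer columns here; assign the result back to the same name if you want to overwrite it.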

### 10.3. Changing data type of series while reading the dataset

We can also set the data type of a series while reading the dataset by using the ‘dtype’ parameter of ‘read_csv’. We build a dictionary with each column name as the ‘key’ and its data type as the ‘value’, and pass it as the ‘dtype’ parameter.

In [9]:
drinks = pd.read_csv("http://bit.ly/drinksbycountry", dtype={"beer_servings":"float", "wine_servings":"float",
                                                             "spirit_servings":"float", "continent":"category"})
drinks.head()

|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0.0 | 0.0 | 0.0 | 0.0 | Asia |
| 1 | Albania | 89.0 | 132.0 | 54.0 | 4.9 | Europe |
| 2 | Algeria | 25.0 | 0.0 | 14.0 | 0.7 | Africa |
| 3 | Andorra | 245.0 | 138.0 | 312.0 | 12.4 | Europe |
| 4 | Angola | 217.0 | 57.0 | 45.0 | 5.9 | Africa |


In [10]:
drinks.dtypes

country                           object
beer_servings                    float64
spirit_servings                  float64
wine_servings                    float64
total_litres_of_pure_alcohol     float64
continent                       category
dtype: object
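
To try the ‘dtype’ parameter without downloading anything, we can read from an in-memory string instead of a URL. This is only a sketch: the CSV text below reproduces the first three rows shown above, and ‘csv_text’ and ‘sample’ are illustrative names.

```python
import io

import pandas as pd

# First three rows of the drinks data, inlined as CSV text (assumption: no index column)
csv_text = """country,beer_servings,spirit_servings,wine_servings,total_litres_of_pure_alcohol,continent
Afghanistan,0,0,0,0.0,Asia
Albania,89,132,54,4.9,Europe
Algeria,25,0,14,0.7,Africa
"""

# Columns named in the dictionary get the requested type;
# the remaining columns are inferred as usual
sample = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"beer_servings": "float", "continent": "category"},
)

print(sample.dtypes)
```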

### 10.4. Using 'astype' along with string methods

To learn how to combine ‘astype’ with string methods in pandas, we will use an online orders dataset from the Chipotle restaurant chain.

In [11]:
orders = pd.read_table("http://bit.ly/chiporders")
orders.head()

|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |


In [12]:
orders.dtypes

order_id               int64
quantity               int64
item_name             object
choice_description    object
item_price            object
dtype: object

Let’s pretend we want to perform calculations using the ‘item_price’ column. Notice that the data type assigned to the column while reading the dataset, ‘object’, is not compatible with arithmetic operations. The reason pandas assigned it the type ‘object’ is the ‘$’ sign, which makes every value a string. So we must first remove it from the series and then use ‘astype’ to change the series to a numeric type. Because ‘$’ is also a regular-expression metacharacter, it is safest to pass ‘regex=False’ so that it is treated as a literal character. The resulting series is compatible with mathematical functions: we first check it by computing the mean, and then overwrite the ‘item_price’ column with it. Note that some data types can be passed either as a string ("float") or as the Python type (float).

In [13]:
orders.item_price.str.replace("$", "", regex=False).astype(float).mean()

7.464335785374397

In [14]:
orders["item_price"] = orders.item_price.str.replace("$", "", regex=False).astype("float")
orders.head()

|   | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | 2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | 2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 |
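
The cleaning step can be tried in isolation on a handful of price strings. This is a minimal sketch: the three prices are taken from the rows shown above, and ‘prices’ and ‘cleaned’ are illustrative names.

```python
import pandas as pd

# A few price strings taken from the item_price column shown above
prices = pd.Series(["$2.39", "$3.39", "$16.98"], name="item_price")

# regex=False treats "$" as a literal character,
# not as the regular-expression end-of-string anchor
cleaned = prices.str.replace("$", "", regex=False).astype(float)

print(cleaned.dtype)
print(cleaned.sum())
```

Once the series is numeric, aggregations such as ‘sum’ and ‘mean’ work as expected.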


We have already learned to convert a Boolean series to a series of 0s and 1s using the ‘map()’ method. However, when we are working with Booleans, we can change ‘True’ to 1 and ‘False’ to 0 more easily with the ‘astype()’ method. For mapping non-Boolean values, we generally use the ‘map()’ method. Keep in mind that ‘str.contains’ is case-sensitive by default.

In [15]:
# str.contains is case-sensitive, so we match the capitalised "Chicken"
orders.item_name.str.contains("Chicken").astype(int).head()

0    0
1    0
2    0
3    0
4    1
Name: item_name, dtype: int32

In [16]:
# Assign the full series (no .head()), otherwise every row after the first five becomes NaN
orders["contains_chicken"] = orders.item_name.str.contains("Chicken").astype(int)
orders.head()

|   | order_id | quantity | item_name | choice_description | item_price | contains_chicken |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | 2.39 | 0 |
| 1 | 1 | 1 | Izze | [Clementine] | 3.39 | 0 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | 3.39 | 0 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | 2.39 | 0 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | 16.98 | 1 |
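
As a small self-contained sketch of the Boolean conversion, the snippet below compares ‘astype’ with ‘map’ on three item names taken from the rows above, and shows the effect of case sensitivity in ‘str.contains’. The variable names are illustrative only.

```python
import pandas as pd

# Three item names taken from the orders shown above
items = pd.Series(["Izze", "Chicken Bowl", "Nantucket Nectar"], name="item_name")

# str.contains returns a Boolean Series
is_chicken = items.str.contains("Chicken")

# Two equivalent ways of turning True/False into 1/0
via_astype = is_chicken.astype(int)
via_map = is_chicken.map({False: 0, True: 1})

# The match is case-sensitive by default; case=False relaxes that
lowercase_hits = int(items.str.contains("chicken").sum())
ignore_case_hits = int(items.str.contains("chicken", case=False).sum())

print(via_astype.tolist(), via_map.tolist())
print(lowercase_hits, ignore_case_hits)
```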
