# Creating a dictionary of column data types
One problem I didn't immediately know how to solve in pandas was how to group my data frame's columns by data type, mostly to aid in subsetting my data and performing type-specific operations.

I found this solution via [Stack Overflow](http://stackoverflow.com/a/22475141/5454389), but it wasn't obvious to me why it worked, so I created this notebook to break it down step by step and better understand the solution.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Create example data frame
df = pd.DataFrame({'X1':range(5),'X2':['a','b','c','d','e'],'X3':range(5)})
df

   X1 X2  X3
0   0  a   0
1   1  b   1
2   2  c   2
3   3  d   3
4   4  e   4


In [3]:
# Check the data types of the columns
df.dtypes

X1     int64
X2    object
X3     int64
dtype: object

So I have three columns: two are int64 and one is object. I want to be able to select just the int64 columns or just the object columns.

### Creating Data Type Groups
First, I want to group the columns according to data type.

In [4]:
# Group the variable names according to their data types
groups = df.dtypes.groupby(df.dtypes).groups
groups

{dtype('int64'): ['X1', 'X3'], dtype('O'): ['X2']}

This looks right: the columns have been grouped by data type. However, we can't look the groups up by name yet, because the dictionary's keys are dtype objects rather than strings, as shown below.

In [5]:
df[groups['int64']]

KeyError: 'int64'

In [6]:
df[groups['dtype('int64')']]

SyntaxError: invalid syntax (<ipython-input-6-72e8f92490d5>, line 1)

So we'll need to convert it into a dictionary whose keys are plain strings.
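
As an aside, the lookup does succeed when the key is an actual dtype object instead of a string. A minimal sketch, assuming the same example data frame as above (the names `int_cols` and `int_subset` are only illustrative):

```python
import numpy as np
import pandas as pd

# Same example data frame as above
df = pd.DataFrame({'X1': range(5), 'X2': ['a', 'b', 'c', 'd', 'e'], 'X3': range(5)})
groups = df.dtypes.groupby(df.dtypes).groups

# The keys are dtype objects, so index with a dtype object rather than a string
int_cols = list(groups[np.dtype('int64')])
int_subset = df[int_cols]
```

This works, but writing out a dtype object for every lookup is clumsy, which is why string keys are more convenient.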

### Creating Data Type Dictionary
First, let's loop through and see what our dictionary structure looks like.

In [7]:
# Loop through each key value pair
for k,v in groups.items():
    print("key.name, values = '{0}', {1}".format(k.name,v))

key.name, values = 'int64', ['X1', 'X3']
key.name, values = 'object', ['X2']
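
The `.name` attribute is what turns each key into a plain string. A small sketch of that behaviour on NumPy dtype objects directly, outside of pandas (the variable names are illustrative):

```python
import numpy as np

# The two dtypes that appear as keys in the groups dictionary
int_dtype = np.dtype('int64')
obj_dtype = np.dtype('O')

# .name returns an ordinary Python string describing the dtype
int_name = int_dtype.name
obj_name = obj_dtype.name
print(int_name, obj_name)
```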


This looks good. If we store each key's name along with its associated values, we should have a working dictionary.

In [8]:
# Create a new dictionary entry for each key.name, and store the associated values
data_dict = {k.name:v for k,v in groups.items()}
data_dict

{'int64': ['X1', 'X3'], 'object': ['X2']}

Let's try indexing the data frame with the dictionary now.

In [9]:
df[data_dict['int64']]

   X1  X3
0   0   0
1   1   1
2   2   2
3   3   3
4   4   4


In [10]:
df[data_dict['object']]

  X2
0  a
1  b
2  c
3  d
4  e


This is exactly what we want! Now we can perform any operations we would like on the columns according to their data types.
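
As one example of a type-specific operation, here is a sketch that scales only the integer columns and leaves everything else untouched. The `list(v)` call and the `scaled` name are my own additions for illustration; the rest follows the steps above.

```python
import pandas as pd

# Same example data frame as above
df = pd.DataFrame({'X1': range(5), 'X2': ['a', 'b', 'c', 'd', 'e'], 'X3': range(5)})

# Dictionary of column names keyed by dtype name; list(v) keeps the values as plain lists
data_dict = {k.name: list(v) for k, v in df.dtypes.groupby(df.dtypes).groups.items()}

# Work on a copy so the original data frame is not modified
scaled = df.copy()
scaled[data_dict['int64']] = scaled[data_dict['int64']] * 10
```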

### tl;dr
If you want to create a dictionary of your columns grouped by data type, use the following one-liner.

In [11]:
data_types = {k.name:v for k,v in df.dtypes.groupby(df.dtypes).groups.items()}
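
If the goal is only to subset by type and the dictionary itself isn't needed, pandas also provides `DataFrame.select_dtypes`. This is a different technique from the groupby approach in this notebook, sketched here for comparison on the same example data frame (the variable names are illustrative):

```python
import pandas as pd

# Same example data frame as above
df = pd.DataFrame({'X1': range(5), 'X2': ['a', 'b', 'c', 'd', 'e'], 'X3': range(5)})

# Keep only the numeric columns
numeric_part = df.select_dtypes(include='number')

# Keep everything that is not numeric
other_part = df.select_dtypes(exclude='number')
```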