# Featuretools Demo

Vijay, Kowshik, Rahul

First, install and import Featuretools.

In [1]:
!pip install featuretools
import featuretools as ft

Collecting featuretools
  Downloading https://files.pythonhosted.org/packages/78/2a/8cee22677c82c6ecdaa6ffb320ea4384bfeb859ad620319684793b313326/featuretools-0.7.0-py3-none-any.whl (199kB)
Collecting psutil>=5.4.8 (from featuretools)
  Downloading https://files.pythonhosted.org/packages/2f/b8/11ec5006d2ec2998cb68349b8d1317c24c284cf918ecd6729739388e4c56/psutil-5.6.1.tar.gz (427kB)
Collecting scikit-learn>=0.20.0 (from featuretools)
  Downloading https://files.pythonhosted.org/packages/aa/cc/a84e1748a2a70d0f3e081f56cefc634f3b57013b16faa6926d3a6f0598df/scikit_learn-0.20.3-cp37-cp37m-manylinux1_x86_64.whl (5.4MB)
Collecting dask>=1.1.0 (from featuretools)
  Downloading https://files.pythonhosted.org/packages/67/a2/0f0e777400f8288e80855661

Load sample dataset and check the contents

In [None]:
data = ft.demo.load_mock_customer()

In [5]:
customers_df = data["customers"]
customers_df

Unnamed: 0,customer_id,zip_code,join_date,date_of_birth
0,1,60091,2011-04-17 10:48:33,1994-07-18
1,2,13244,2012-04-15 23:31:04,1986-08-18
2,3,13244,2011-08-13 15:42:34,2003-11-21
3,4,60091,2011-04-08 20:08:14,2006-08-15
4,5,60091,2010-07-17 05:27:50,1984-07-28


In [6]:
sessions_df = data["sessions"]
sessions_df.sample(5)

Unnamed: 0,session_id,customer_id,device,session_start
13,14,1,tablet,2014-01-01 03:28:00
6,7,3,tablet,2014-01-01 01:39:40
1,2,5,mobile,2014-01-01 00:17:20
28,29,1,mobile,2014-01-01 07:10:05
24,25,3,desktop,2014-01-01 05:59:40


In [7]:
transactions_df = data["transactions"]
transactions_df.sample(5)

Unnamed: 0,transaction_id,session_id,transaction_time,product_id,amount
74,232,5,2014-01-01 01:20:10,1,139.2
231,27,17,2014-01-01 04:10:15,2,90.79
434,36,31,2014-01-01 07:50:10,3,62.35
420,56,30,2014-01-01 07:35:00,3,72.7
54,444,4,2014-01-01 00:58:30,4,43.59


## We now have the raw data. Two steps transform it into the format required by Featuretools.

1) Represent the data as entities by creating a dictionary that maps each entity name to a tuple of the dataframe, its index column, and, optionally, its time index column.

2) Represent each relationship between entities as a tuple of the form

(parent_entity, parent_variable, child_entity, child_variable)

In [8]:
entities = {
    "customers" : (customers_df, "customer_id"),
    "sessions" : (sessions_df, "session_id", "session_start"),
    "transactions" : (transactions_df, "transaction_id", "transaction_time")
}

relationships = [("sessions", "session_id", "transactions", "session_id"),
                 ("customers", "customer_id", "sessions", "customer_id")]
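Before running DFS, it can be worth confirming that each relationship tuple really describes a parent-child link: the parent variable should be unique in the parent table, and every value of the child variable should appear in the parent. The sketch below checks this with plain pandas on small hand-made tables. The table contents and the helper `check_relationship` are hypothetical, introduced only for illustration, and are not part of Featuretools.

```python
import pandas as pd

# Tiny hand-made stand-ins for the demo tables (illustrative values only).
customers = pd.DataFrame({"customer_id": [1, 2]})
sessions = pd.DataFrame({"session_id": [10, 11, 12],
                         "customer_id": [1, 1, 2]})
transactions = pd.DataFrame({"transaction_id": [100, 101, 102, 103],
                             "session_id": [10, 10, 11, 12]})

tables = {"customers": customers,
          "sessions": sessions,
          "transactions": transactions}

# Same tuple layout as in the notebook:
# (parent_entity, parent_variable, child_entity, child_variable)
relationships = [("sessions", "session_id", "transactions", "session_id"),
                 ("customers", "customer_id", "sessions", "customer_id")]


def check_relationship(tables, relationship):
    """Hypothetical helper: True if the parent key is unique and
    every child key value has a matching parent row."""
    parent, parent_key, child, child_key = relationship
    parent_values = tables[parent][parent_key]
    child_values = tables[child][child_key]
    return bool(parent_values.is_unique
                and child_values.isin(parent_values).all())


results = {rel: check_relationship(tables, rel) for rel in relationships}
for rel, ok in results.items():
    print(rel, "valid" if ok else "INVALID")
```

With the real demo data, the same helper could be called with `customers_df`, `sessions_df`, and `transactions_df` in place of the stand-in tables.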

# Running Deep Feature Synthesis

DFS takes a set of entities, a list of relationships, and the “target_entity” to calculate features for.

It returns a feature matrix and the corresponding list of feature definitions.

In [9]:
feature_matrix_customers, features_defs = ft.dfs(entities=entities,
                                                 relationships=relationships, 
                                                 target_entity="customers")
feature_matrix_customers

customer_id,zip_code,COUNT(sessions),NUM_UNIQUE(sessions.device),MODE(sessions.device),SUM(transactions.amount),STD(transactions.amount),MAX(transactions.amount),SKEW(transactions.amount),MIN(transactions.amount),MEAN(transactions.amount),...,NUM_UNIQUE(sessions.MODE(transactions.product_id)),NUM_UNIQUE(sessions.DAY(session_start)),NUM_UNIQUE(sessions.YEAR(session_start)),NUM_UNIQUE(sessions.MONTH(session_start)),NUM_UNIQUE(sessions.WEEKDAY(session_start)),MODE(sessions.MODE(transactions.product_id)),MODE(sessions.DAY(session_start)),MODE(sessions.YEAR(session_start)),MODE(sessions.MONTH(session_start)),MODE(sessions.WEEKDAY(session_start))
1,60091,8,3,mobile,9025.62,40.442059,139.43,0.019698,5.81,71.631905,...,4,1,1,1,1,4,1,2014,1,2
2,13244,7,3,desktop,7200.28,37.705178,146.81,0.098259,8.73,77.422366,...,4,1,1,1,1,3,1,2014,1,2
3,13244,6,3,desktop,6236.62,43.683296,149.15,0.41823,5.89,67.06043,...,4,1,1,1,1,1,1,2014,1,2
4,60091,8,3,mobile,8727.68,45.068765,149.95,-0.036348,5.73,80.070459,...,5,1,1,1,1,1,1,2014,1,2
5,60091,6,3,mobile,6349.66,44.09563,149.02,-0.025941,7.55,80.375443,...,5,1,1,1,1,3,1,2014,1,2
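To get a feel for what columns such as `COUNT(sessions)` and `SUM(transactions.amount)` mean, here is a minimal hand-written sketch that computes the same kind of aggregates with pandas alone. It uses small made-up tables rather than the demo data, and it only illustrates the idea of aggregating child rows per customer. It is not how Featuretools computes features internally.

```python
import pandas as pd

# Illustrative stand-ins for the sessions and transactions tables.
sessions = pd.DataFrame({
    "session_id": [1, 2, 3],
    "customer_id": [1, 1, 2],
})
transactions = pd.DataFrame({
    "transaction_id": [1, 2, 3, 4],
    "session_id": [1, 1, 2, 3],
    "amount": [10.0, 20.0, 5.0, 7.5],
})

# COUNT(sessions): number of sessions belonging to each customer.
count_sessions = sessions.groupby("customer_id")["session_id"].count()

# SUM(transactions.amount): attach customer_id to every transaction
# through its session, then sum the amounts per customer.
tx_with_customer = transactions.merge(sessions, on="session_id", how="left")
sum_amount = tx_with_customer.groupby("customer_id")["amount"].sum()

# One row per customer, one column per hand-built feature.
manual_features = pd.DataFrame({
    "COUNT(sessions)": count_sessions,
    "SUM(transactions.amount)": sum_amount,
})
print(manual_features)
```

Each feature needs its own join and group-by when written by hand; DFS generates this whole family of aggregates from the entity and relationship definitions.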


## All we did was specify the target entity and pass the tables as input. Featuretools then built a meaningful representation by creating new features from the raw data.
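The nested column names in the feature matrix, for example `NUM_UNIQUE(sessions.DAY(session_start))`, can be read inside out: a transform is applied to each session first, and the result is then aggregated per customer. The sketch below reproduces that two-step reading with pandas on made-up session timestamps. The values are illustrative, and the tie-breaking rule used for the mode (take the smallest value) is an assumption of this sketch rather than documented Featuretools behavior.

```python
import pandas as pd

# Illustrative sessions table with a timestamp column.
sessions = pd.DataFrame({
    "session_id": [1, 2, 3, 4],
    "customer_id": [1, 1, 1, 2],
    "session_start": pd.to_datetime([
        "2014-01-01 00:17:20",
        "2014-01-01 03:28:00",
        "2014-01-02 07:10:05",
        "2014-01-01 01:39:40",
    ]),
})

# Step 1 (transform): DAY(session_start), the day of the month per session.
day = sessions["session_start"].dt.day

# Step 2 (aggregate): summarize the transformed values per customer.
by_customer = day.groupby(sessions["customer_id"])
num_unique_day = by_customer.nunique()
# Assumption: ties in the mode are resolved by taking the smallest value.
mode_day = by_customer.agg(lambda s: s.mode().iloc[0])

stacked = pd.DataFrame({
    "NUM_UNIQUE(sessions.DAY(session_start))": num_unique_day,
    "MODE(sessions.DAY(session_start))": mode_day,
})
print(stacked)
```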