# Derive Features from Per-Account Time Series
Examine each account's sequence of interactions and orders, looking for classes and patterns that suggest features for distinguishing at-risk customers from the rest.

## Preliminaries
Set up credentials and functions for file I/O, and enable PixieDust for visualization.

In [3]:
# The code was removed by DSX for sharing.

In [4]:
# The code was removed by DSX for sharing.

In [5]:
# Allow display of multiple values without using print()
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [6]:
import pixiedust

Pixiedust database opened successfully


## Load the prepared time series data

In [7]:
df = pd.read_csv(
    get_object_storage_file_with_credentials(
            'CableCompany', 'accountTimeSeries.csv'))
df.head()

Unnamed: 0,AccountId,OrderClass,OrderStatus,AccountDelinquencyStatus,OrderReasonCode
0,10043819,SSSSSSSSSS,OOCOOOOOOC,NNNNNNNNNN,DF.DF.DF.DF.DF.DF.DF.DF.DF.DF
1,10221159,SSSSSSSSSTTSSSSSS,OOXOOXOOXOXCOOOXO,AWNAWNAWNNNNNNNNN,NP.NP.NP.NP.NP.NP.NP.NP.NP.01.01.NT.SJ.SJ.SJ.S...
2,10271483,SS,OC,NN,NT.SE
3,10271491,SS,OC,NN,NT.SE
4,10380306,SSSSSSSSTTTSSS,OOXOOXOXOOCOOX,APNAWNANTTTAPN,NP.NP.NP.NP.NP.NP.NP.NP.D2.D2.D2.NP.NP.NP


## Explore Features

### Lookup dictionaries for status and class codes
These come in handy when interpreting the one-letter codes in the time series strings.

In [8]:
status_codes = {
    '':'Normal',  # note: the time series strings use 'N' for a normal status
    'A':'Open non-pay disconnect and equipment is active',
    'C':'Voluntary disconnect',
    'E':'Non-pay disconnect',
    'F':'Open non-pay disconnect and equipment is force tuned',
    'P':'Pending non-pay disconnect and services are restored; CSG assigns this status in real time',
    'S':'Pending change of service job (applies to subscription billing)',
    'T':'PPV ordering restricted',
    'V':'Open voluntary disconnect job',
    'W':'Open non-pay disconnect and equipment is disabled',
    'Z':'Charged off'
}

class_codes = {
    'M':'Special request',
    'S':'Service order',
    'T':'Trouble call'
}
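
As a small illustration of how such lookup dictionaries can be applied, the sketch below expands a one-letter sequence into readable descriptions. The helper name `decode_sequence` and the dictionary `example_class_codes` are hypothetical and not part of the original notebook. Codes missing from the dictionary are flagged rather than guessed, which matters here because the sequences contain letters (for example 'N' in AccountDelinquencyStatus) that are not keys of `status_codes`.

```python
# Hypothetical helper, not part of the original notebook.
def decode_sequence(sequence, codes):
    """Map each one-letter code in `sequence` to its description.

    Codes that are missing from `codes` are reported as 'Unknown (<code>)'.
    """
    return [codes.get(ch, 'Unknown ({})'.format(ch)) for ch in sequence]


# Subset of the class codes, repeated so that the example is self-contained.
example_class_codes = {
    'S': 'Service order',
    'T': 'Trouble call',
}

print(decode_sequence('SST', example_class_codes))
# ['Service order', 'Service order', 'Trouble call']
```

Passing the full `status_codes` or `class_codes` dictionary instead of the subset works the same way.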

### Is everything normal?
If AccountDelinquencyStatus is 'N' for every interaction, the account was never delinquent, and presumably nothing in its history indicates that the customer is at risk. We don't need to examine such histories in further detail.

In [15]:
def isAllNormal(s):
    '''
    Helper function for feature extraction.
    
    Returns 1.0 if time series string s consists of nothing but 'N' status codes;
    otherwise, 0.0.
    
    Intended to be used on AccountDelinquencyStatus values.
    
    Note that an empty string also yields 1.0. That case should not occur, because
    there should be an account delinquency status for every interaction and there
    would not be an entry at all if there were no interactions.
    '''
    return float(s == len(s) * 'N')

In [16]:
# Make sure it works as expected
isAllNormal(df['AccountDelinquencyStatus'][0])
isAllNormal(df['AccountDelinquencyStatus'][1])

1.0

0.0
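
The comparison `s == len(s) * 'N'` is also true for an empty string, so `isAllNormal('')` returns 1.0 rather than failing. If empty or non-string input should be rejected outright, a stricter variant could look like the sketch below. The name `is_all_normal_strict` and the choice of `ValueError` are assumptions made for illustration.

```python
# Hypothetical stricter variant, not part of the original notebook.
def is_all_normal_strict(s):
    """Return 1.0 if s is a non-empty string made up only of 'N' codes, else 0.0.

    Raises ValueError for empty or non-string input (for example NaN from pandas).
    """
    if not isinstance(s, str) or not s:
        raise ValueError('expected a non-empty status string, got {!r}'.format(s))
    # A set of the characters equals {'N'} only if every character is 'N'.
    return float(set(s) == {'N'})


print(is_all_normal_strict('NNNNNNNNNN'))  # 1.0
print(is_all_normal_strict('AWPN'))        # 0.0
```

It can be applied with `df['AccountDelinquencyStatus'].apply(is_all_normal_strict)` in the same way as the original helper.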

In [17]:
# Create new column with normal/not-normal indicator for every account history
df['IsNormal'] = df['AccountDelinquencyStatus'].apply(isAllNormal)
df.head()
df['IsNormal'].value_counts()

Unnamed: 0,AccountId,OrderClass,OrderStatus,AccountDelinquencyStatus,OrderReasonCode,IsNormal
0,10043819,SSSSSSSSSS,OOCOOOOOOC,NNNNNNNNNN,DF.DF.DF.DF.DF.DF.DF.DF.DF.DF,1.0
1,10221159,SSSSSSSSSTTSSSSSS,OOXOOXOOXOXCOOOXO,AWNAWNAWNNNNNNNNN,NP.NP.NP.NP.NP.NP.NP.NP.NP.01.01.NT.SJ.SJ.SJ.S...,0.0
2,10271483,SS,OC,NN,NT.SE,1.0
3,10271491,SS,OC,NN,NT.SE,1.0
4,10380306,SSSSSSSSTTTSSS,OOXOOXOXOOCOOX,APNAWNANTTTAPN,NP.NP.NP.NP.NP.NP.NP.NP.D2.D2.D2.NP.NP.NP,0.0


1.0    373
0.0    200
Name: IsNormal, dtype: int64

### How active is this account?
Simply count the number of interactions.

In [18]:
# OrderClass, OrderStatus, and AccountDelinquencyStatus should all have the same length, so it doesn't
# matter which column we use
df['NumInteractions'] = df['OrderClass'].apply(len)
df.head()

Unnamed: 0,AccountId,OrderClass,OrderStatus,AccountDelinquencyStatus,OrderReasonCode,IsNormal,NumInteractions
0,10043819,SSSSSSSSSS,OOCOOOOOOC,NNNNNNNNNN,DF.DF.DF.DF.DF.DF.DF.DF.DF.DF,1.0,10
1,10221159,SSSSSSSSSTTSSSSSS,OOXOOXOOXOXCOOOXO,AWNAWNAWNNNNNNNNN,NP.NP.NP.NP.NP.NP.NP.NP.NP.01.01.NT.SJ.SJ.SJ.S...,0.0,17
2,10271483,SS,OC,NN,NT.SE,1.0,2
3,10271491,SS,OC,NN,NT.SE,1.0,2
4,10380306,SSSSSSSSTTTSSS,OOXOOXOXOOCOOX,APNAWNANTTTAPN,NP.NP.NP.NP.NP.NP.NP.NP.D2.D2.D2.NP.NP.NP,0.0,14
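
The assumption that OrderClass, OrderStatus, and AccountDelinquencyStatus always have the same length is easy to check before relying on it. The sketch below uses a hypothetical helper, `lengths_consistent`, on a few triples copied from the data shown in this notebook. It is a sanity check under that assumption, not part of the original analysis.

```python
# Hypothetical sanity check, not part of the original notebook.
def lengths_consistent(order_class, order_status, delinquency_status):
    """Return True if the three per-interaction strings have equal length."""
    return len(order_class) == len(order_status) == len(delinquency_status)


# (OrderClass, OrderStatus, AccountDelinquencyStatus) triples from the data.
sample_rows = [
    ('SS', 'OC', 'NN'),
    ('SSS', 'OOC', 'NNN'),
    ('SSSS', 'OOOX', 'AWPN'),
    ('SSSSS', 'CCCCC', 'NNCNN'),
]

print(all(lengths_consistent(*row) for row in sample_rows))
# True
```

On the full DataFrame, the same check could be run with `df.apply(lambda r: lengths_consistent(r['OrderClass'], r['OrderStatus'], r['AccountDelinquencyStatus']), axis=1).all()`.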


In [30]:
display(df)

AccountId,OrderClass,OrderStatus,AccountDelinquencyStatus,OrderReasonCode,IsNormal,NumInteractions,IsVoluntaryDisconnect,IsExcessiveCancelation
10043819,SSSSSSSSSS,OOCOOOOOOC,NNNNNNNNNN,DF.DF.DF.DF.DF.DF.DF.DF.DF.DF,1.0,10,0.0,0.0
10221159,SSSSSSSSSTTSSSSSS,OOXOOXOOXOXCOOOXO,AWNAWNAWNNNNNNNNN,NP.NP.NP.NP.NP.NP.NP.NP.NP.01.01.NT.SJ.SJ.SJ.SJ.DF,0.0,17,0.0,1.0
10271483,SS,OC,NN,NT.SE,1.0,2,0.0,0.0
10271491,SS,OC,NN,NT.SE,1.0,2,0.0,0.0
10380306,SSSSSSSSTTTSSS,OOXOOXOXOOCOOX,APNAWNANTTTAPN,NP.NP.NP.NP.NP.NP.NP.NP.D2.D2.D2.NP.NP.NP,0.0,14,0.0,1.0
10504548,SS,OC,VC,OT.OT,0.0,2,1.0,0.0
10678689,SSSS,OOOX,AWPN,NP.NP.NP.NP,0.0,4,0.0,0.0
11650153,TSSSTSSSTTS,OCOOOOCCOCC,NNNNNNNNNNN,H5.NT.NT.NT.H5.NT.NT.DF.H5.H5.SJ,1.0,11,0.0,0.0
11915788,SSSSS,CCCCC,NNCNN,NT.NT.NT.NT.NT,0.0,5,0.0,0.0
12258879,SSS,OOC,NNN,DF.DF.DF,1.0,3,0.0,0.0


## Did the account disconnect at the customer's request?
As a first cut at a possible "churn" label, identify customers who left at their request.

We still don't know how to tell if they were unhappy or left for other reasons, such as a move.

In [19]:
# Look for AccountDelinquencyStatus sequences ending in 'VC'.
# May need to verify in OrderStatus that the order was closed (C) and not canceled (X).
def isVoluntaryDisconnect(s):
    '''
    Helper function for feature extraction.
    
    Returns 1.0 if time series string s ends in 'VC', meaning 'Open voluntary
    disconnect job' followed by 'Voluntary disconnect'; otherwise, 0.0.
    
    Intended to be used on AccountDelinquencyStatus values.
    
    An empty string yields 0.0. That case should not occur, because there
    should be an account delinquency status for every interaction and there would
    not be an entry at all if there were no interactions.
    '''
    # str.endswith() matches literal text, not regular expressions, so test for
    # the literal suffix 'VC' that the docstring describes.
    return float(s.endswith('VC'))

In [20]:
df['IsVoluntaryDisconnect'] = df['AccountDelinquencyStatus'].apply(isVoluntaryDisconnect)
df['IsVoluntaryDisconnect'].value_counts()

0.0    523
1.0     50
Name: IsVoluntaryDisconnect, dtype: int64
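
Two refinements suggested by the comments above can be sketched with the standard `re` module. The first tolerates other status codes between the 'V' and the final 'C'. The second also requires the last OrderStatus code to be 'C' (closed) rather than 'X' (canceled). Both function names are hypothetical, and whether the looser pattern reflects real disconnect histories is an assumption that would need checking against the data.

```python
import re

# 'V' (open voluntary disconnect job), then any codes other than 'V' or 'C',
# then a final 'C' (voluntary disconnect) at the very end of the string.
_VOLUNTARY_TAIL = re.compile(r'V[^VC]*C$')


def is_voluntary_disconnect_loose(delinquency_status):
    """Return 1.0 if the status string ends with V, optional other codes, then C."""
    return float(_VOLUNTARY_TAIL.search(delinquency_status) is not None)


def is_closed_voluntary_disconnect(delinquency_status, order_status):
    """As above, but also require that the final order was closed ('C')."""
    return float(
        is_voluntary_disconnect_loose(delinquency_status) == 1.0
        and order_status.endswith('C')
    )


print(is_voluntary_disconnect_loose('VC'))         # 1.0
print(is_voluntary_disconnect_loose('NNCNN'))      # 0.0
print(is_closed_voluntary_disconnect('VC', 'OC'))  # 1.0
```

Account 10504548 in the data (AccountDelinquencyStatus 'VC', OrderStatus 'OC') satisfies both variants.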

### Verification: voluntary disconnect is not normal

In [26]:
df.query('IsVoluntaryDisconnect * IsNormal == 1.0').index.size

0

## Were there a lot of canceled orders?
This could be nonsense, but perhaps canceled orders can indicate dissatisfaction, the customer
changing their mind, orders not carried out as expected, etc.

In [27]:
def isExcessiveCancelation(s, threshold):
    '''
    Helper function for feature extraction.
    
    Returns 1.0 if the number of cancelations ('X') in the time series string s
    exceeds the threshold; otherwise, 0.0.
    
    Intended to be used on OrderStatus values.
    
    An empty string contains no cancelations, so it yields 0.0 for any
    non-negative threshold. That case should not occur, because there
    should be an order status for every interaction and there would
    not be an entry at all if there were no interactions.
    
    Also raises an exception if threshold is not numeric. That's simply a bad call.
    '''
    return float(s.count('X') > threshold)

In [28]:
threshold = 2
df['IsExcessiveCancelation'] = df['OrderStatus'].apply(lambda s: isExcessiveCancelation(s, threshold))
df['IsExcessiveCancelation'].value_counts()

0.0    566
1.0      7
Name: IsExcessiveCancelation, dtype: int64
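
Because the threshold of 2 is a guess, it may help to see how the number of flagged accounts changes as the threshold varies. The sketch below does this for a handful of OrderStatus strings copied from the data. The helper name `count_exceeding` is hypothetical, and five strings are far too few to justify any particular threshold.

```python
# Hypothetical helper, not part of the original notebook.
def count_exceeding(order_statuses, threshold):
    """Count the sequences with more than `threshold` cancelations ('X')."""
    return sum(1 for s in order_statuses if s.count('X') > threshold)


# A few OrderStatus strings copied from the data shown in this notebook.
sample_statuses = [
    'OOCOOOOOOC',
    'OOXOOXOOXOXCOOOXO',
    'OC',
    'OOXOOXOXOOCOOX',
    'OOOX',
]

# Print the number of flagged sequences for each candidate threshold.
for candidate in range(0, 5):
    print(candidate, count_exceeding(sample_statuses, candidate))
```

Running the same loop over `df['OrderStatus']` would show how sensitive the 7 flagged accounts are to the choice of threshold.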

In [29]:
# Compare explicitly: a bare float column is not a boolean mask, and query() ends up
# using its 0.0/1.0 values as row labels, returning rows 0 and 1 over and over.
df.query('IsExcessiveCancelation == 1.0').head(10)

Unnamed: 0,AccountId,OrderClass,OrderStatus,AccountDelinquencyStatus,OrderReasonCode,IsNormal,NumInteractions,IsVoluntaryDisconnect,IsExcessiveCancelation
1,10221159,SSSSSSSSSTTSSSSSS,OOXOOXOOXOXCOOOXO,AWNAWNAWNNNNNNNNN,NP.NP.NP.NP.NP.NP.NP.NP.NP.01.01.NT.SJ.SJ.SJ.S...,0.0,17,0.0,1.0
4,10380306,SSSSSSSSTTTSSS,OOXOOXOXOOCOOX,APNAWNANTTTAPN,NP.NP.NP.NP.NP.NP.NP.NP.D2.D2.D2.NP.NP.NP,0.0,14,0.0,1.0
...
