# ASAPP ML NLP Engineering Challenge - Auto Complete Server

The goal of this exercise is to build a relevant, robust and fast auto-completion server that provides suggestions to customer service agents. The auto-completion model will be trained on a provided set of customer/agent conversations.

This notebook will be used to explore the data and start prototyping the future components of our auto-complete server.

In [2]:
import json
import pandas as pd

## Load Data

The conversation JSON file is loaded into a dataframe in which each row is a single message belonging to a given issue.

In [4]:
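sample_file = '../data/sample_conversations.json'

# Read the raw conversations, nested as Issues -> Messages
with open(sample_file, encoding='utf8') as f:
    sample_json = json.load(f)

# Flatten to one row per message, carrying the parent issue's ids as metadata
# (pd.json_normalize replaces the deprecated pd.io.json.json_normalize)
df = pd.json_normalize(
    data=sample_json,
    record_path=['Issues', 'Messages'],
    meta=[
        ['Issues', 'CompanyGroupId'],
        ['Issues', 'IssueId']
    ]
)

# Keep only the columns we need and drop the 'Issues.' prefix from the meta columns
df = df[['Issues.IssueId', 'Issues.CompanyGroupId', 'IsFromCustomer', 'Text']].rename(
    columns={'Issues.IssueId': 'IssueId', 'Issues.CompanyGroupId': 'CompanyGroupId'}
)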
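To make the flattening concrete, here is a minimal sketch on a hand-made toy input. The nested layout (a top-level `Issues` list whose entries carry `IssueId`, `CompanyGroupId` and a `Messages` list) is an assumption inferred from the `record_path` and `meta` arguments above, not a copy of the real file; the ids and texts are invented.

```python
import pandas as pd

# Hypothetical miniature of the assumed schema: a top-level "Issues" list,
# each issue carrying its ids plus a nested "Messages" list.
toy = {
    "Issues": [
        {
            "IssueId": 1,
            "CompanyGroupId": 10,
            "Messages": [
                {"IsFromCustomer": True, "Text": "Hi, my order is late"},
                {"IsFromCustomer": False, "Text": "Sorry to hear that!"},
            ],
        },
        {
            "IssueId": 2,
            "CompanyGroupId": 10,
            "Messages": [
                {"IsFromCustomer": False, "Text": "How may I help you today?"},
            ],
        },
    ]
}

# One output row per message; the issue-level ids are repeated on each row
# and named after their path, e.g. "Issues.IssueId".
flat = pd.json_normalize(
    data=toy,
    record_path=["Issues", "Messages"],
    meta=[["Issues", "CompanyGroupId"], ["Issues", "IssueId"]],
)
print(flat)
```

The first issue contributes two rows and the second one row, each tagged with the ids of the issue it came from, which is exactly what lets the notebook group messages back into conversations later.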

Since our goal is to provide auto-completion suggestions to customer service agents, we focus on the agents' own messages, which are tagged with `IsFromCustomer = False`.

In [6]:
# Boolean mask selecting agent messages (eq(False) avoids the `== False` lint warning)
from_agent = df['IsFromCustomer'].eq(False)
df[from_agent].head(5)

|    | IssueId | CompanyGroupId | IsFromCustomer | Text |
|----|---------|----------------|----------------|------|
| 9  | 30001   | 40001          | False          | Hello Werner how may I help you today? |
| 11 | 30001   | 40001          | False          | Sure I can help you with that? Could you pleas... |
| 13 | 30001   | 40001          | False          | Let me update that information on our system |
| 14 | 30001   | 40001          | False          | OK Wernzio, I have updated your address to the... |
| 16 | 30001   | 40001          | False          | Ok let me go ahead and request a work order fo... |
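As a first prototype of the completion component, one simple baseline (an assumption, not the method required by the challenge) is to suggest whole past agent messages that start with what the agent has typed, ranked by how often they were sent. The sketch keeps the normalised messages in a sorted list so that all completions of a prefix form one contiguous run located with `bisect`. `PrefixCompleter` and the duplicated toy message are hypothetical; in this notebook the input would be something like `df[from_agent].Text.dropna()`.

```python
from bisect import bisect_left
from collections import Counter


class PrefixCompleter:
    """Hypothetical baseline: suggest whole past agent messages that start
    with the typed prefix, ranked by how often they were sent."""

    def __init__(self, messages):
        # Collapse whitespace and lower-case so near-duplicates share a key.
        normalised = (" ".join(m.split()).lower() for m in messages)
        self.counts = Counter(m for m in normalised if m)
        # Sorted keys put every completion of a prefix in one contiguous run.
        self.keys = sorted(self.counts)

    def suggest(self, prefix, k=3):
        # Only lower-case the prefix; a trailing space still constrains the match.
        prefix = prefix.lower()
        start = bisect_left(self.keys, prefix)
        matches = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break  # left the contiguous run of matching keys
            matches.append(key)
        # Most frequent first; ties broken alphabetically for determinism.
        matches.sort(key=lambda m: (-self.counts[m], m))
        return matches[:k]


# Toy stand-in for the agent messages shown above (one line duplicated).
agent_messages = [
    "Hello, how may I help you today?",
    "Hello, how may I help you today?",
    "Hello Werner how may I help you today?",
    "Let me update that information on our system",
]
completer = PrefixCompleter(agent_messages)
print(completer.suggest("Hello"))
```

Whole-message matching is easy to reason about and each lookup is a binary search plus a scan of the matching run, but it can only reproduce sentences that were already sent verbatim, so it mainly serves as a reference point for more flexible models.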
