## Finding Insights from Email - First Steps

<img src="img/dave-email.png" width="800" align="left">

### Preprocessing - Retrieving the Email Data

Let's start by getting the email data so we can analyse it.

At this point in the process we have received the email via the Microsoft Graph Mail API
(https://docs.microsoft.com/en-us/graph/api/resources/mail-api-overview?view=graph-rest-1.0) and have saved it to an AWS S3 bucket.

<img src="img/step-function-preprocess.png" align="left">

In our pre-processing Lambda the first thing we would do is read the email from S3.
Here we will pull it in from the local file system, but the file is the same.

In [1]:
import json

# Parse the saved Graph API message straight from the file handle
with open('data/dave-email.json', 'r') as email_file:
    email_json = json.load(email_file)
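
In the Lambda itself the same read would come from S3 rather than the local disk. A minimal sketch, assuming a boto3 S3 client is passed in; the function name, bucket and key are hypothetical and not part of the original pipeline:

```python
import json


def load_email_from_s3(s3_client, bucket, key):
    """Fetch a saved Graph API message from S3 and parse it as JSON."""
    # get_object returns a dict whose 'Body' is a streaming, file-like object
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response['Body'].read())


# Hypothetical usage inside the Lambda (bucket and key are made-up names):
#   import boto3
#   email_json = load_email_from_s3(boto3.client('s3'), 'my-email-bucket', 'inbox/dave-email.json')
```

Passing the client in, rather than creating it inside the function, keeps the helper easy to exercise locally with a stand-in object.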

The file we get from the Graph API is a JSON structure.

<img src="img/dave-email-json.png" width="800" align="left">

Now that we have the data, what should we do with it?

### <font color=#61A794>QUESTION: What information do YOU use to understand an email?</font>

Here we are first going to look at the metadata attached to the email (sender, date, etc.) and do some light formatting on it.

In [2]:
import probablepeople

def format_name(raw_name):
    # probablepeople.tag returns a (tagged_parts, name_type) tuple
    tagged_parts, _name_type = probablepeople.tag(raw_name)
    return tagged_parts['GivenName'] + ' ' + tagged_parts['Surname']

email_subject = email_json['subject']
email_sent = email_json['sentDateTime']
email_sender = format_name(email_json['sender']['emailAddress']['name'])

# Join multiple recipient names with ' and '
email_recipients = ' and '.join(
    format_name(recipient['emailAddress']['name'])
    for recipient in email_json['toRecipients'])

def print_meta_data():
    print('META-DATA')
    print('Subject: ', email_subject)
    print('Sent: ', email_sent)
    print('Sender: ', email_sender)
    print('Recipients: ', email_recipients)

print_meta_data()

META-DATA
Subject:  Hey there!
Sent:  2020-10-30T09:27:00Z
Sender:  David McFadden
Recipients:  Kevin Duffy
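
The `sentDateTime` value can get the same kind of light formatting as the names. A minimal sketch using only the standard library; the helper name and output layout are illustrative choices, and it assumes the timestamp has no fractional seconds, as in the example above:

```python
from datetime import datetime, timezone


def parse_sent_time(sent_date_time):
    """Turn a timestamp such as '2020-10-30T09:27:00Z' into an aware UTC datetime."""
    parsed = datetime.strptime(sent_date_time, '%Y-%m-%dT%H:%M:%SZ')
    return parsed.replace(tzinfo=timezone.utc)


sent = parse_sent_time('2020-10-30T09:27:00Z')
# Weekday and month names follow the current locale
print(sent.strftime('%A %d %B %Y, %H:%M UTC'))
```

Knowing the weekday an email was sent on is useful later, when relative phrases such as "next Wednesday" need to be resolved to an actual date.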


In [3]:
email_body = email_json['body']['content']
email_body

'<html><head>\r\n<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta content="text/html; charset=utf-8"><meta name="Generator" content="Microsoft Word 15 (filtered medium)"><style>\r\n<!--\r\n@font-face\r\n\t{font-family:Wingdings}\r\n@font-face\r\n\t{font-family:"Cambria Math"}\r\n@font-face\r\n\t{font-family:Calibri}\r\n@font-face\r\n\t{}\r\np.MsoNormal, li.MsoNormal, div.MsoNormal\r\n\t{margin:0cm;\r\n\tfont-size:12.0pt;\r\n\tfont-family:"Calibri",sans-serif}\r\na:link, span.MsoHyperlink\r\n\t{color:#0563C1;\r\n\ttext-decoration:underline}\r\np.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph\r\n\t{margin-top:0cm;\r\n\tmargin-right:0cm;\r\n\tmargin-bottom:0cm;\r\n\tmargin-left:36.0pt;\r\n\tfont-size:12.0pt;\r\n\tfont-family:"Calibri",sans-serif}\r\nspan.EmailStyle20\r\n\t{font-family:"Calibri",sans-serif;\r\n\tcolor:windowtext}\r\n.MsoChpDefault\r\n\t{font-size:10.0pt}\r\n@page WordSection1\r\n\t{margin:72.0pt 72.0pt 72.0pt 72.0pt}\r\ndiv.WordSection1

### Preprocessing - Cleaning up the Data

First, let's strip the HTML. There are lots of different libraries available for this, or you can write your own preprocessing method.

In [4]:
from bs4 import BeautifulSoup
import re

# Strip out the <head> contents - in this case it's just metadata on formatting
email_body_only = re.sub(r'<head>.*?</head>', '', email_body, flags=re.DOTALL)
parsed_email_body_1 = BeautifulSoup(email_body_only, features="html.parser")
parsed_email_text_1 = parsed_email_body_1.get_text()

print(f"Before Processing: {len(email_body)} characters")
print(f"After Processing: {len(parsed_email_text_1)} characters")
print(f"Now it's only {round(len(parsed_email_text_1)/len(email_body) * 100, 2)}% the size of the original!")

print('')
print('')
print(parsed_email_text_1)

Before Processing: 4177 characters
After Processing: 676 characters
Now it's only 16.18% the size of the original!


Hey Kevin! How are you? It’s been a while since we’ve chatted. I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm? I also was also hoping you could take a look over the following documents:Amazing Flow ChartsGetting Fancy with AWS Step FunctionsPrototyping with Buttery Services I’ll need any comments/questions/compliments back by Monday. You can find out more about the Amazon AI Services here. Cheers,Dave David McFadden | DeveloperExample Corp. 1007 Main Street, BelfastTel: +44 2896496000  Disclaimer: The contents of this e-mail and attached files in no way reflect any policies of Example Corp or their affiliated superhero counterparts. 


<img src="img/dave-email.png" width="600" align="left">

In [5]:
pre_processed_email = email_body_only
# As you explore more data you will find other things you want to preserve in your text
pre_processed_email = pre_processed_email.replace('</div>', '\n')
pre_processed_email = pre_processed_email.replace('</li>', '\n')
pre_processed_email = pre_processed_email.replace('</p>', '\n')

parsed_email = BeautifulSoup(pre_processed_email, features="html.parser")
parsed_email_text = parsed_email.get_text()

print(parsed_email_text)

Hey Kevin!
 
How are you? It’s been a while since we’ve chatted.
 
I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?
 
I also was also hoping you could take a look over the following documents:
Amazing Flow Charts
Getting Fancy with AWS Step Functions
Prototyping with Buttery Services
 
I’ll need any comments/questions/compliments back by Monday.
 
You can find out more about the Amazon AI Services here.
 
Cheers,
Dave
 
David McFadden | Developer
Example Corp. 
1007 Main Street, Belfast
Tel: +44 2896496000 
 
Disclaimer: The contents of this e-mail and attached files in no way reflect any policies of Example Corp or their affiliated superhero counterparts.
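
If you would rather write your own preprocessing step than chain `replace` calls, the same idea (drop the `<head>`, turn block-level closing tags into line breaks) can be sketched with the standard library's `html.parser`. The class and function names here are hypothetical, and the tag lists are a starting assumption rather than a complete set:

```python
from html.parser import HTMLParser


class EmailTextExtractor(HTMLParser):
    """Collect visible text, adding a newline after each block-level element."""

    BLOCK_TAGS = {'p', 'div', 'li'}
    SKIPPED_TAGS = {'head', 'style', 'script'}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <head>, <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == 'br':
            self._chunks.append('\n')

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self._chunks.append('\n')

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        return ''.join(self._chunks)


def html_to_text(html):
    parser = EmailTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample_html = ('<html><head><style>p {color: red}</style></head><body>'
               '<p>Hey Kevin!</p><div>How are you?</div>'
               '<ul><li>One</li><li>Two</li></ul></body></html>')
print(html_to_text(sample_html))
```

Keeping the tag lists as class attributes makes it easy to add more tags as you discover other formatting you want to preserve.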
 




|￣￣￣￣￣￣￣ |  
|&nbsp;FORMATTING &nbsp;|  
|&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;IS DATA &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|  
|＿＿＿＿＿＿＿ |  
(\\__/) ||  
(•ㅅ•) ||  
/ 　 づ  

**NOTE!** For your use case the HTML may be important. Make sure you have examined what information is currently being used to understand the email.

### _Nice! Let's move on to the Machine Learning!!_

## Information Extraction

### <font color=#61A794>QUESTION: How can we identify some key information easily?</font>

<img src="img/step-function-extract.png" align="left">

#### What is Amazon Comprehend?
Amazon Comprehend is a Natural Language Processing (NLP) service.
<img src="img/comprehend.png" width="800" align="left">

#### Entity Extraction
The first service we are going to use is an out-of-the-box option "Detect Entities". This is a pre-trained model that can recognise common references such as Dates or Locations. This is a managed service, so you don't need to set anything up.

The items it currently recognises are the following:

| Type      | Description |
| :- | :- |
| COMMERCIAL_ITEM      | A branded product       |
| DATE   | A full date (for example, 11/25/2017), day (Tuesday), month (May), or time (8:30 a.m.)        |
| EVENT   | An event, such as a festival, concert, election, etc.        |
| LOCATION   | A specific location, such as a country, city, lake, building, etc.        |
| ORGANIZATION   | Large organizations, such as a government, company, religion, sports team, etc.        |
| PERSON   | Individuals, groups of people, nicknames, fictional characters        |
| QUANTITY   | A quantified amount, such as currency, percentages, numbers, bytes, etc.        |
| TITLE   | An official name given to any creation or creative work, such as movies, books, songs, etc.        |
| OTHER   | Entities that don't fit into any of the other entity categories        |

Amazon Comprehend offers a full set of APIs and SDKs in a variety of languages, which makes interacting with the service programmatically very easy.

Here we are using the Python library, boto3: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend.html

In [6]:
import boto3
comprehend = boto3.client('comprehend')

In [7]:
### This is all the code needed to call Comprehend Detect Entities - no other setup required!
comprehend_entity_response = comprehend.detect_entities(
    Text=parsed_email_text, 
    LanguageCode='en')

print(json.dumps(comprehend_entity_response, indent=2))

{
  "Entities": [
    {
      "Score": 0.999258816242218,
      "Type": "PERSON",
      "Text": "Kevin",
      "BeginOffset": 4,
      "EndOffset": 9
    },
    {
      "Score": 0.9997270703315735,
      "Type": "PERSON",
      "Text": "Gillian",
      "BeginOffset": 112,
      "EndOffset": 119
    },
    {
      "Score": 0.9064348340034485,
      "Type": "DATE",
      "Text": "next Wednesday at 3pm",
      "BeginOffset": 127,
      "EndOffset": 148
    },
    {
      "Score": 0.993148148059845,
      "Type": "ORGANIZATION",
      "Text": "AWS",
      "BeginOffset": 266,
      "EndOffset": 269
    },
    {
      "Score": 0.99654620885849,
      "Type": "DATE",
      "Text": "Monday",
      "BeginOffset": 374,
      "EndOffset": 380
    },
    {
      "Score": 0.9585748910903931,
      "Type": "ORGANIZATION",
      "Text": "Amazon AI Services",
      "BeginOffset": 416,
      "EndOffset": 434
    },
    {
      "Score": 0.9948257803916931,
      "Type": "PERSON",
      "Text": "Dave\n\u

For each entity the service brings back the **Type**, the exact **Text** it matched, the location (**BeginOffset**, **EndOffset**) of that Text in the text sent to the service, and the **Score**, which represents how confident Amazon Comprehend is in the accuracy of the result.
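
The offsets are plain character positions, so slicing the submitted text with them recovers the matched string. A small sketch using the opening lines of the cleaned email and three entities from the response above; the helper name `entity_in_context` is hypothetical:

```python
def entity_in_context(text, entity, window=12):
    """Return the entity's matched text with a little surrounding context."""
    start, end = entity['BeginOffset'], entity['EndOffset']
    before = text[max(0, start - window):start]
    after = text[end:end + window]
    return f"{before}[{text[start:end]}]{after}"


# First lines of the cleaned email and three entities from the response above
sample_text = ('Hey Kevin!\n\xa0\nHow are you? It\u2019s been a while since we\u2019ve chatted.\n\xa0\n'
               'I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?')
sample_entities = [
    {'Type': 'PERSON', 'Text': 'Kevin', 'BeginOffset': 4, 'EndOffset': 9},
    {'Type': 'PERSON', 'Text': 'Gillian', 'BeginOffset': 112, 'EndOffset': 119},
    {'Type': 'DATE', 'Text': 'next Wednesday at 3pm', 'BeginOffset': 127, 'EndOffset': 148},
]

for entity in sample_entities:
    # Slicing by the offsets recovers exactly the text Comprehend matched
    assert sample_text[entity['BeginOffset']:entity['EndOffset']] == entity['Text']
    print(entity['Type'], '|', entity_in_context(sample_text, entity).replace('\n', ' '))
```

This only holds if you slice the same string you sent to the service, so keep the cleaned text around rather than re-deriving it.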

Let's print it out in a way that is a little easier to look at:

In [8]:
def print_entities_compressed(comprehend_entity_response):
    print("Score / Type : Text")
    print('--------------------')
    for entity in comprehend_entity_response['Entities']:
        print(f"{round(entity['Score'], 2)} / {entity['Type']} : {entity['Text']}")
              
print_entities_compressed(comprehend_entity_response)

Score / Type : Text
--------------------
1.0 / PERSON : Kevin
1.0 / PERSON : Gillian
0.91 / DATE : next Wednesday at 3pm
0.99 / ORGANIZATION : AWS
1.0 / DATE : Monday
0.96 / ORGANIZATION : Amazon AI Services
0.99 / PERSON : Dave
 
David McFadden
0.97 / ORGANIZATION : Example Corp.
0.99 / LOCATION : 1007 Main Street, Belfast
1.0 / OTHER : +44 2896496000
0.99 / ORGANIZATION : Example Corp


In [9]:
# Truncating the HTML since Comprehend can only take 5,000 bytes
# (the slice counts characters, not bytes, so 4,800 leaves headroom for multi-byte characters)
# and ignoring the rest since this is just for demonstration purposes
comprehend_entity_response_unprocessed = comprehend.detect_entities(
    Text=email_body[:4800],
    LanguageCode='en')


print('Entity Recognition on the original HTML')
print('--------------------')
print_entities_compressed(comprehend_entity_response_unprocessed)

Entity Recognition on the original HTML
--------------------
Score / Type : Text
--------------------
0.71 / OTHER : =utf-8
0.62 / OTHER : utf-8
0.88 / TITLE : Microsoft Word 15
0.42 / TITLE : font-face
0.46 / ORGANIZATION : @font-face
0.41 / ORGANIZATION : font-face
0.48 / TITLE : font-face
0.97 / QUANTITY : 0cm
0.95 / QUANTITY : 12.0pt
1.0 / QUANTITY : 0cm
1.0 / QUANTITY : 0cm
1.0 / QUANTITY : 0cm
1.0 / QUANTITY : 36.0pt
1.0 / QUANTITY : 12.0pt
0.98 / QUANTITY : 10.0pt
0.99 / QUANTITY : 72.0pt
0.98 / QUANTITY : 72.0pt
0.98 / QUANTITY : 72.0pt
0.96 / QUANTITY : 72.0pt
0.93 / QUANTITY : 0cm
0.92 / QUANTITY : 0cm
0.8 / TITLE : WordSection1
0.75 / QUANTITY : 11.0pt
0.98 / PERSON : Kevin
0.85 / QUANTITY : 11.0pt
0.81 / QUANTITY : 11.0pt
0.51 / QUANTITY : 11.0pt
1.0 / PERSON : Gillian
0.96 / DATE : next Wednesday
0.97 / DATE : 3pm
0.93 / QUANTITY : 11.0pt
0.67 / QUANTITY : 0cm
0.96 / QUANTITY : 0cm
0.91 / QUANTITY : 11.0pt
0.66 / QUANTITY : 0cm
0.95 / QUANTITY : 11.0pt
0.77 / ORGANIZATION 
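
The limit in the comment above is expressed in bytes, while a Python slice counts characters. If you need to cut on a byte budget, one option is to truncate the encoded form and discard any character split by the cut. A minimal sketch; the helper name is hypothetical, and the exact limit should be checked against the current Comprehend documentation:

```python
def truncate_utf8(text, max_bytes):
    """Trim text to at most max_bytes of UTF-8 without splitting a character."""
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        return text
    # errors='ignore' drops a trailing partial multi-byte sequence left by the cut
    return encoded[:max_bytes].decode('utf-8', errors='ignore')


# The curly apostrophe takes three bytes, so a 4-byte budget cannot include it
print(repr(truncate_utf8('It\u2019s', 4)))
```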

#### Pricing for Comprehend Entity Recognition
$0.0001 / unit (100 chars)

- Original HTML Email, 4177 characters, 42 Units = $0.0042

- Processed Email, 676 characters, 7 Units = $0.0007

(NOTE: Free Tier - 50K units/month)
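
The arithmetic behind those two figures can be sketched as below. The price and unit size are the numbers quoted above and may have changed, so treat them as assumptions; the function names are hypothetical:

```python
import math

PRICE_PER_UNIT_USD = 0.0001  # figure quoted above; check current AWS pricing
CHARS_PER_UNIT = 100


def comprehend_units(char_count):
    """Round the character count up to whole 100-character units."""
    return math.ceil(char_count / CHARS_PER_UNIT)


def comprehend_cost_usd(char_count):
    return round(comprehend_units(char_count) * PRICE_PER_UNIT_USD, 6)


for label, chars in [('Original HTML', 4177), ('Processed text', 676)]:
    print(f"{label}: {comprehend_units(chars)} units, ${comprehend_cost_usd(chars):.4f}")
```

Multiplied across a mailbox of thousands of emails, the factor of six between the two is a practical reason to clean the text before calling the service.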

#### Comprehend Sentiment Detection


The second service we are going to use is another out-of-the-box option "Detect Sentiment". This is a pre-trained model that can recognise sentiment in text. This is a managed service, so you don't need to set anything up.

The sentiments it can return are POSITIVE, NEGATIVE, NEUTRAL, or MIXED.

In [10]:
### This is all the code needed to call Comprehend Detect Sentiment - no other setup required

comprehend_sentiment_response = comprehend.detect_sentiment(
    Text=str(parsed_email_text), 
    LanguageCode='en')

def print_sentiment():
    print('SENTIMENT')
    print(comprehend_sentiment_response['Sentiment'])
comprehend_sentiment_response

{'Sentiment': 'NEUTRAL',
 'SentimentScore': {'Positive': 0.34901344776153564,
  'Negative': 0.0016374826664105058,
  'Neutral': 0.6493454575538635,
  'Mixed': 3.640495151557843e-06},
 'ResponseMetadata': {'RequestId': 'f05a4964-8af3-48f7-9575-d71a1d4303d0',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'f05a4964-8af3-48f7-9575-d71a1d4303d0',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '164',
   'date': 'Sat, 31 Oct 2020 17:56:10 GMT'},
  'RetryAttempts': 0}}

The **Sentiment** returned is the result with the highest confidence; the **SentimentScore** object also returns each of the sentiments with its corresponding confidence level.
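
Looking past the top label can be informative. A small sketch that ranks the scores from the response above; the helper name is hypothetical:

```python
def rank_sentiments(sentiment_score):
    """Return (label, confidence) pairs sorted from most to least confident."""
    return sorted(sentiment_score.items(), key=lambda pair: pair[1], reverse=True)


# SentimentScore values copied from the response above
scores = {
    'Positive': 0.34901344776153564,
    'Negative': 0.0016374826664105058,
    'Neutral': 0.6493454575538635,
    'Mixed': 3.640495151557843e-06,
}

ranked = rank_sentiments(scores)
(top_label, top_score), (runner_up, runner_up_score) = ranked[0], ranked[1]
print(f"{top_label} ({top_score:.2f}), then {runner_up} ({runner_up_score:.2f})")
```

Here the email is labelled NEUTRAL, but the runner-up shows it leans positive rather than negative, which a single label would hide.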

Let's do a little cleanup so we can look at all the data together. 

|￣￣￣￣￣￣￣￣￣￣￣|  
|&nbsp;&nbsp;WHAT DOES THE &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|  
|&nbsp;&nbsp;COMPUTER KNOW? &nbsp;&nbsp;|  
|＿＿＿＿＿＿＿＿＿＿＿|  
(\\__/) ||  
(•ㅅ•) ||  
/ 　 づ  

In [11]:
entities = comprehend_entity_response['Entities']  
high_conf_entities = list(filter(lambda item: item['Score'] > 0.8, entities))

people = list(filter(lambda item: item['Type'] == 'PERSON', high_conf_entities))
dates = list(filter(lambda item: item['Type'] == 'DATE', high_conf_entities))
locations = list(filter(lambda item: item['Type'] == 'LOCATION', high_conf_entities))
organisations = list(filter(lambda item: item['Type'] == 'ORGANIZATION', high_conf_entities))

def print_entity_list(entity_list):
    # Flatten line breaks inside an entity so each one prints on a single line
    for item in entity_list:
        text = item['Text'].replace('\n', ' ')
        text = text.strip()
        print(f"- {text}")
        

def print_entity_info():
    print('ENTITIES')
    print('People mentioned in this email:')
    print_entity_list(people)
    print('')
    print('Dates mentioned in this email:')
    print_entity_list(dates)
    print('')
    print('Locations mentioned in this email:')
    print_entity_list(locations)
    print('')
    print('Organisations mentioned in this email:')
    print_entity_list(organisations)

print_meta_data()
print('')
print_entity_info()
print('')
print_sentiment()

META-DATA
Subject:  Hey there!
Sent:  2020-10-30T09:27:00Z
Sender:  David McFadden
Recipients:  Kevin Duffy

ENTITIES
People mentioned in this email:
- Kevin
- Gillian
- Dave   David McFadden

Dates mentioned in this email:
- next Wednesday at 3pm
- Monday

Locations mentioned in this email:
- 1007 Main Street, Belfast

Organisations mentioned in this email:
- AWS
- Amazon AI Services
- Example Corp.
- Example Corp

SENTIMENT
NEUTRAL


## A little Fancier? Thinking about the Conversation...

<img src="img/dave-email-conversation.png" align="left">

### Amazon Lex
Amazon Lex is a service for building conversational interfaces. https://aws.amazon.com/lex/

<img src="img/amazon-lex.png" width="600" align="left">

Since Amazon Lex can recognise Conversational Intents, we can use it as a very quick test to show what might be possible if we were to do some classification work. Here we create a very small model with two Intents - one that can recognise phrases associated with Meeting Requests and one that can identify phrases associated with Deadlines. Since we don't have much data yet we have just added in some initial variations.

There is no cost for model training or hosting; we only pay for requests to the service.

<img src="img/lex-bot.png" align="left">

#### LEX MODEL

**Meeting Intent Phrases:**
- I was wondering if you had time to meet with me on {date}
- Can you meet with me on {date}
- Could I get some time with you on {date}
- Are you available on {date}
- are you available sometime {date}
- do you have some time to meet with me {date}
- Could you do {date}


**Deadline Intent Phrases:**
- I will need any comments back by {date}
- The deadline is {date}
- This needs to be done for {date}
- Final comments are due {date}
- This needs completed by {date}
- I will need any compliments back by {date}
- I will need any questions back by {date}


<BR>
Each of these phrases uses an Entity type of DATE (the {date} placeholder).

Let's take a look at how we could use this with our email:

In [12]:
parsed_email_text

'Hey Kevin!\n\xa0\nHow are you? It’s been a while since we’ve chatted.\n\xa0\nI was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?\n\xa0\nI also was also hoping you could take a look over the following documents:\nAmazing Flow Charts\nGetting Fancy with AWS Step Functions\nPrototyping with Buttery Services\n\xa0\nI’ll need any comments/questions/compliments back by Monday.\n\xa0\nYou can find out more about the Amazon AI Services here.\n\xa0\nCheers,\nDave\n\xa0\nDavid McFadden |\xa0Developer\nExample Corp. \n1007 Main Street, Belfast\nTel: +44 2896496000\xa0\n\xa0\nDisclaimer: The contents of this e-mail and attached files in no way reflect any policies of Example Corp or their affiliated superhero counterparts.\n\xa0\n\n'

- Roughly break down the email (for our first pass, by paragraphs, identified by newlines _\n_).
- Call Amazon Lex for each of those paragraphs. We won't use any of the conversational features - we are just looking for what Intent might be matched.

In [13]:
import uuid
import boto3

#https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/lex-runtime.html
lex = boto3.client('lex-runtime')

bot_name = 'analyse_date_phrases'
bot_alias = 'demo'
lex_array = []

#Split by new line and remove blanks
paragraphs = parsed_email_text.split('\n')
paragraphs = list(filter(lambda item: (len(item.strip()) > 0), paragraphs))

for paragraph in paragraphs:
    email_intent = lex.post_text(
        botName=bot_name,
        botAlias=bot_alias,
        userId=str(uuid.uuid4()),
        inputText=paragraph
    )
    email_intent['paragraph'] = paragraph
    lex_array.append(email_intent)

Here's an example of what a Lex Response looks like (we added in the paragraph attribute ourselves):

In [14]:
print(json.dumps(lex_array[2], indent=2))

{
  "ResponseMetadata": {
    "RequestId": "c36be114-2996-45af-adf0-b4e29f7f2c20",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "x-amzn-requestid": "c36be114-2996-45af-adf0-b4e29f7f2c20",
      "date": "Sat, 31 Oct 2020 17:57:33 GMT",
      "content-type": "application/json",
      "content-length": "511"
    },
    "RetryAttempts": 0
  },
  "intentName": "meeting_request",
  "nluIntentConfidence": {
    "score": 0.9
  },
  "alternativeIntents": [
    {
      "intentName": "AMAZON.FallbackIntent",
      "slots": {}
    },
    {
      "intentName": "deadline",
      "nluIntentConfidence": {
        "score": 0.41
      },
      "slots": {
        "date": null
      }
    }
  ],
  "slots": {
    "date": null
  },
  "message": "What date?",
  "messageFormat": "PlainText",
  "dialogState": "ElicitSlot",
  "slotToElicit": "date",
  "sessionId": "2020-10-31T17:57:33.667Z-nGEHyqIB",
  "botVersion": "11",
  "paragraph": "I was wondering if you had time to meet with Gillian and me next 

The only pieces we are using for this test are the **intentName** (which intent was matched, if any) and **nluIntentConfidence** (how confident the service is in that match).

Let's convert what we have to an easier format to review:

In [15]:
print("Score / Type : Text")
print('--------------------')

for item in lex_array:
    intent = item.get('intentName', '')
    # A response may carry no confidence score (as the fallback intent does in the example above)
    score = item.get('nluIntentConfidence', {}).get('score')

    if intent and score is not None:
        print(f"{score} / {intent} : {item['paragraph']}")

Score / Type : Text
--------------------
0.7 / meeting_request : How are you? It’s been a while since we’ve chatted.
0.9 / meeting_request : I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?
0.88 / deadline : I’ll need any comments/questions/compliments back by Monday.
0.65 / meeting_request : 1007 Main Street, Belfast


You'll see it isn't perfect, and we would definitely have some work to do - but we did find the requests we were interested in, even though the model was not trained with those exact phrases.

Let's filter out lower confidence ones and do some formatting.

In [16]:
high_conf_intents = list(filter(lambda item: item.get('nluIntentConfidence', {"score":0})['score'] >= 0.8, lex_array))
meeting_requests = list(filter(lambda item: item.get('intentName', '') == 'meeting_request', high_conf_intents))
deadlines = list(filter(lambda item: item.get('intentName', '') == 'deadline', high_conf_intents))

def print_intents_found():
    print('REQUESTS')
    print('Meeting Requests:')
    for item in meeting_requests:
        print(item['paragraph'])

    print('')

    print('Deadline Requests:')
    for item in deadlines:
        print(item['paragraph'])

print_intents_found()

REQUESTS
Meeting Requests:
I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?

Deadline Requests:
I’ll need any comments/questions/compliments back by Monday.
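
In the Lex response the `date` slot came back as null, so one possible way to attach a date to each request is to reuse the DATE entities Comprehend already found. A rough sketch with values copied from the outputs above; the helper name is hypothetical, and a plain substring match is an assumption that would need replacing with offset comparisons if the same date text appeared in several paragraphs:

```python
def dates_for_paragraph(paragraph, date_entities):
    """Return the text of each DATE entity that appears inside the paragraph."""
    return [entity['Text'] for entity in date_entities if entity['Text'] in paragraph]


# Values copied from the outputs above
sample_dates = [{'Type': 'DATE', 'Text': 'next Wednesday at 3pm'},
                {'Type': 'DATE', 'Text': 'Monday'}]
sample_requests = {
    'meeting_request': 'I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?',
    'deadline': 'I\u2019ll need any comments/questions/compliments back by Monday.',
}

for intent_name, paragraph in sample_requests.items():
    print(intent_name, '->', dates_for_paragraph(paragraph, sample_dates))
```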


|￣￣￣￣￣￣￣￣￣￣￣|  
|&nbsp;&nbsp;WHAT DOES THE &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;|  
|&nbsp;&nbsp;COMPUTER KNOW? &nbsp;&nbsp;|  
|＿＿＿＿＿＿＿＿＿＿＿|  
(\\__/) ||  
(•ㅅ•) ||  
/ 　 づ  

Let's combine it all:

In [17]:
print_meta_data()
print('')
print_entity_info()
print('')
print_sentiment()
print('')
print_intents_found()

META-DATA
Subject:  Hey there!
Sent:  2020-10-30T09:27:00Z
Sender:  David McFadden
Recipients:  Kevin Duffy

ENTITIES
People mentioned in this email:
- Kevin
- Gillian
- Dave   David McFadden

Dates mentioned in this email:
- next Wednesday at 3pm
- Monday

Locations mentioned in this email:
- 1007 Main Street, Belfast

Organisations mentioned in this email:
- AWS
- Amazon AI Services
- Example Corp.
- Example Corp

SENTIMENT
NEUTRAL

REQUESTS
Meeting Requests:
I was wondering if you had time to meet with Gillian and me next Wednesday at 3pm?

Deadline Requests:
I’ll need any comments/questions/compliments back by Monday.


<img src="img/dave-email.png" width="600" align="left">

### <font color=#61A794>QUESTION: What could we do next?</font> 

- Get a lot more Email Data!
- Work on improving how we Pre-Process our Data
<br>

- Experiment with the other out-of-the-box Comprehend Services
- Try different variations of our Lex Model and ways of breaking up the text
<br>

- Do some Custom Entity Recognition or Classification using Comprehend Custom
https://docs.aws.amazon.com/comprehend/latest/dg/auto-ml.html
(take a look at SageMaker Ground Truth for help with labelling your data https://aws.amazon.com/sagemaker/groundtruth/)
<br>

- Look at the tools in SageMaker and write some completely custom models... https://aws.amazon.com/sagemaker/