# Experiment: Sentiment Analysis on Call Recordings

In this notebook, we will experiment with triggering a serverless workflow that performs sentiment analysis on call recordings. The use case assumes Amazon Connect as the call center setup, with call recordings forwarded to an S3 bucket to trigger sentiment analysis of the call conversation.

In this lab, we won't cover the Amazon Connect setup. Instead, we will focus on how to use a serverless architecture to manage the whole process, including:
* Using an AWS Lambda function to split the audio file
* Using Amazon Transcribe for speech recognition (converting voice to text), orchestrated by Step Functions
* Using Amazon Comprehend for sentiment analysis on the text content

## Architecture Design

![architecture diagram](./images/sentiment-analysis-on-calls-recording.png)

## To Experiment

* Prepare a `*.wav` file and upload it to the target S3 bucket to trigger the whole process
* Review the transcribed text and compare it with the original
* Review the sentiment analysis result


#### Initialization

In [None]:
import boto3

ssm = boto3.client('ssm')

response = ssm.get_parameter(Name = "/aiml-lab/sentiment_analysis_s3_bucket_name")
bucket_name = response['Parameter']['Value']

In [None]:
prefix_transcript = 'connect'

### To Trigger the Process

We will be using the **Open Speech Repository** [American English - Harvard Sentences](http://www.voiptroubleshooter.com/open_speech/american.html) as test samples. When checking the transcribed text, you may refer to the [Harvard Sentences Text](http://www.cs.columbia.edu/~hgs/audio/harvard.html).

You can also use your own voice recording for testing.

#### Voice file collection

In [None]:
voice_file_name = 'OSR_us_000_0010_8k.wav'

In [None]:
!wget http://www.voiptroubleshooter.com/open_speech/american/$voice_file_name

In [None]:
from datetime import datetime

s3 = boto3.client('s3')

contact_id = datetime.now().strftime("%y%m%d-%H%M")

s3.upload_file(
    f'./{voice_file_name}', 
    bucket_name, 
    f'{prefix_transcript}/{voice_file_name}',
    ExtraArgs={ "Metadata": { "contact-id" : contact_id }}
)
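
To confirm that the upload carried the metadata, you can read it back with `head_object`. The sketch below uses a hypothetical helper, `read_contact_id`, which is not part of the lab code; it assumes the `s3`, `bucket_name`, `prefix_transcript`, and `voice_file_name` variables from the cells above.

```python
def read_contact_id(s3_client, bucket, key):
    """Return the 'contact-id' user metadata stored on an S3 object, or None."""
    # head_object fetches the object's headers (including user metadata)
    # without downloading the body.
    response = s3_client.head_object(Bucket=bucket, Key=key)
    # S3 returns user-defined metadata keys in lowercase.
    return response.get("Metadata", {}).get("contact-id")


# Usage in the notebook (assumes the variables defined in the cells above):
# read_contact_id(s3, bucket_name, f"{prefix_transcript}/{voice_file_name}")
```

The returned value should match the `contact_id` generated before the upload.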

Wait a couple of minutes, then run the cell below repeatedly until you see entries like:

```
2020-12-06 11:39:37       2881 comprehend/20120X-####-Customer-comprehend.json
2020-12-06 11:35:46     538014 connect/OSR_us_000_0010_8k.wav
2020-12-06 11:38:56     538014 recordings/Customer/2020/12/20120X-####-Customer.wav
2020-12-06 11:39:35      11625 transcripts/20120X-####-Customer.json
```

In [None]:
!aws s3 ls s3://$bucket_name --recursive
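
Instead of re-running the listing by hand, the wait can be scripted. The following is a minimal polling sketch, not part of the original lab: `wait_for_prefix` is a hypothetical helper, and the timeout and interval defaults are arbitrary assumptions.

```python
import time


def wait_for_prefix(s3_client, bucket, prefix, timeout_s=600, interval_s=15):
    """Poll S3 until at least one object exists under `prefix`; return its keys.

    Returns an empty list if nothing shows up before the timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
        # 'Contents' is absent from the response when no object matches.
        keys = [obj["Key"] for obj in response.get("Contents", [])]
        if keys:
            return keys
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval_s)


# Usage in the notebook (assumes `s3` and `bucket_name` from the cells above):
# wait_for_prefix(s3, bucket_name, "comprehend/")
```

Note that the helper returns immediately if the prefix already holds files from an earlier run, so a narrower prefix may be needed when repeating the experiment.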

Once the analysis file arrives in the 'comprehend' folder, let's download it and have a look.

In [None]:
# Please update the file key to match the actual file shown in the listing above.
s3_resource = boto3.resource('s3')
local_result_file = 'result.json'
comprehend_result_file_key = 'comprehend/201206-1135-Customer-comprehend.json'
s3_resource.Bucket(bucket_name).download_file(comprehend_result_file_key, local_result_file)
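
As an alternative to editing the key by hand, the newest result file can be looked up programmatically. This is a sketch under the assumption that the most recently modified object under `comprehend/` is the one you want; `latest_key` is a hypothetical helper, and a single `list_objects_v2` call only covers the first 1,000 keys.

```python
def latest_key(s3_client, bucket, prefix):
    """Return the key of the most recently modified object under `prefix`, or None."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    objects = response.get("Contents", [])
    if not objects:
        return None
    # Each entry carries a 'LastModified' datetime; pick the newest one.
    return max(objects, key=lambda obj: obj["LastModified"])["Key"]


# Usage in the notebook (assumes `s3`, `s3_resource`, `bucket_name`, and
# `local_result_file` from the cells above):
# key = latest_key(s3, bucket_name, "comprehend/")
# s3_resource.Bucket(bucket_name).download_file(key, local_result_file)
```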

In [None]:
import pandas as pd
# Show full column contents without truncation (None means unlimited in pandas >= 1.0).
pd.set_option('display.max_colwidth', None)
df = pd.read_json(local_result_file)
df.head()

### Under the Hood

Now we are going to reveal how the whole process works together. Before reading on, would you like to explore it on your own?

If **YES**, please refer to the [template.yaml](./template.yaml) file to understand the stack structure, and to the [code folder](./code) and the service consoles below for more detail:
* [AWS Lambda](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions)
* [S3](https://s3.console.aws.amazon.com/s3/home?region=us-east-1)
* [AWS Step Functions](https://console.aws.amazon.com/states/home?region=us-east-1#/statemachines)
* [AWS Glue](https://console.aws.amazon.com/glue/home?region=us-east-1#catalog:tab=databases)

> Please scroll down to the 'Demystifying the Process' section when you are ready!

![Discovery](./images/discovery-unsplash.jpg)

#### Demystifying the Process

* Once the voice file is uploaded to the folder '/connect' with metadata 'contact-id' = ###, the Lambda function [split_audio_lambda](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/split_audio_lambda?tab=configuration) is triggered to split the voice file (customer vs. agent) and save the resulting files to the folder '/recordings'
* Once files arrive in the folder '/recordings', the Lambda function [execute_transcription_state_machine](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/execute_transcription_state_machine?tab=configuration) is triggered to execute the Step Functions [State Machines](https://console.aws.amazon.com/states/home?region=us-east-1#/statemachines)
 * The Lambda function [submit_transcribe_job](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/submit_transcribe_job?tab=configuration) is executed in the state machine.
 * Once the Transcribe job is done, the Lambda function [save_transcription_to_s3](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/save_transcription_to_s3?tab=configuration) is executed to save the transcription to the folder '/transcripts'
* Once a file arrives in '/transcripts', the Lambda function [comprehend_transcript_lambda](https://console.aws.amazon.com/lambda/home?region=us-east-1#/functions/comprehend_transcript_lambda?tab=configuration) is executed to run the NLP analyses below and save the result to the folder '/comprehend'
 * Sentiment
 * Entities
 * KeyPhrases
 * DominantLanguage
* Last but not least, the Comprehend analysis result files can be queried with Athena (SQL-like queries), given that the related [Glue Database](https://console.aws.amazon.com/glue/home?region=us-east-1#catalog:tab=databases) and table have been created to parse the files.
 * [Try Athena Queries](https://console.aws.amazon.com/athena/home?force&region=us-east-1#query)?
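
The Comprehend step in the list above can be sketched as follows. This is an illustration only, not the lab's actual `comprehend_transcript_lambda` code: `analyze_text` and the shape of the returned dictionary are assumptions, while the four `detect_*` calls are standard boto3 Comprehend client methods.

```python
def analyze_text(comprehend_client, text):
    """Run the four Comprehend analyses on `text` and collect them in one dict."""
    # Detect the language first; the other three calls require a language code.
    languages = comprehend_client.detect_dominant_language(Text=text)["Languages"]
    language_code = max(languages, key=lambda lang: lang["Score"])["LanguageCode"]

    sentiment = comprehend_client.detect_sentiment(Text=text, LanguageCode=language_code)
    entities = comprehend_client.detect_entities(Text=text, LanguageCode=language_code)
    key_phrases = comprehend_client.detect_key_phrases(Text=text, LanguageCode=language_code)

    # The output layout below is hypothetical; the lab's result file may differ.
    return {
        "DominantLanguage": language_code,
        "Sentiment": sentiment["Sentiment"],
        "SentimentScore": sentiment["SentimentScore"],
        "Entities": entities["Entities"],
        "KeyPhrases": key_phrases["KeyPhrases"],
    }


# Usage (requires AWS credentials with Comprehend permissions):
# import boto3
# analyze_text(boto3.client("comprehend"), "The birch canoe slid on the smooth planks.")
```

Comprehend's synchronous calls limit the input size, so a long transcript may need to be split into chunks before analysis.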


#### Original Lab

To learn more, you may refer to the [original lab](https://master.d167n899s5ufv3.amplifyapp.com/en/600-pipeline.html).