# Telegram Chatbot Backed by KGQA with Lambda and API Gateway

References:
- [Building Your First Serverless Telegram Bot with AWS Lambda](https://iamondemand.com/blog/building-your-first-serverless-telegram-bot/)
- [GitHub: lesterchan/telegram-bot](https://github.com/lesterchan/telegram-bot)
- [Simple Telegram Bot with Python and AWS Lambda](https://levelup.gitconnected.com/simple-telegram-bot-with-python-and-aws-lambda-5eab1066b466)
- [Developing a LINE Bot (Messaging API) with API Gateway and Lambda (Python)](https://qiita.com/w2or3w/items/1b80bfbae59fe19e2015) (in Japanese)
- [LINE Bot Development with API Gateway + Lambda](https://xp-cloud.jp/blog/2019/06/24/5560/) (in Japanese)
- [Serverless github bot with AWS Lambda and API Gateway](https://kalinchernev.github.io/serverless-github-bot-aws-lambda-api-gateway-nodejs)
- [Writing a Serverless Slack Bot](https://messagemedia.com/au/blog/writing-a-serverless-slack-bot/)
- [Build Chatbots using Serverless Bot Framework with Salesforce Integration](https://aws.amazon.com/blogs/architecture/build-chatbots-using-serverless-bot-framework-with-salesforce-integration/)
- [How to Deploy a Telegram Bot with Flask, pyTelegramBotAPI, Gunicorn and PostgreSQL to Heroku](https://medium.com/tech-insights/how-to-deploy-a-telegram-bot-with-flask-pytelegrambotapi-gunicorn-and-postgresql-to-heroku-19d87959a65)
- [I built a serverless Telegram bot over the weekend. Here’s what I learned.](https://www.freecodecamp.org/news/how-to-build-a-server-less-telegram-bot-227f842f4706/)

## Part 1: End-to-end Invocation

In this part, we write the code that converts a natural-language question into an answer.

Configuration (names of the Neptune cluster and the SageMaker NLU endpoint to look up):

In [18]:
import boto3
neptune_db_cluster_identifier = 'kg-neptune'
nlu_endpoint_name_contains = 'qa-model-from-registry-ep'

In [80]:
neptune = boto3.client('neptune')
response = neptune.describe_db_clusters(DBClusterIdentifier=neptune_db_cluster_identifier)
neptue_endpoint_desc = response['DBClusters'][0]
neptue_endpoint_desc

{'AllocatedStorage': 1,
 'AvailabilityZones': ['us-east-1b', 'us-east-1a', 'us-east-1c'],
 'BackupRetentionPeriod': 1,
 'DBClusterIdentifier': 'kg-neptune',
 'DBClusterParameterGroup': 'default.neptune1',
 'DBSubnetGroup': 'default',
 'Status': 'available',
 'EarliestRestorableTime': datetime.datetime(2021, 9, 27, 9, 56, 26, 863000, tzinfo=tzlocal()),
 'Endpoint': 'kg-neptune.cluster-c2ycbhkszo5s.us-east-1.neptune.amazonaws.com',
 'ReaderEndpoint': 'kg-neptune.cluster-ro-c2ycbhkszo5s.us-east-1.neptune.amazonaws.com',
 'MultiAZ': False,
 'Engine': 'neptune',
 'EngineVersion': '1.0.5.0',
 'LatestRestorableTime': datetime.datetime(2021, 9, 29, 2, 43, 12, 181000, tzinfo=tzlocal()),
 'Port': 8182,
 'MasterUsername': 'admin',
 'PreferredBackupWindow': '09:47-10:17',
 'PreferredMaintenanceWindow': 'fri:05:05-fri:05:35',
 'ReadReplicaIdentifiers': [],
 'DBClusterMembers': [{'DBInstanceIdentifier': 'kg-neptune-instance-1',
   'IsClusterWriter': True,
   'DBClusterParameterGroupStatus': 'in-sync


In [20]:
sm = boto3.client('sagemaker')
response = sm.list_endpoints(
    NameContains=nlu_endpoint_name_contains
)
nlu_endpoint_desc = response['Endpoints'][0]
nlu_endpoint_desc

{'EndpointName': 'qa-model-from-registry-ep',
 'EndpointArn': 'arn:aws:sagemaker:us-east-1:093729152554:endpoint/qa-model-from-registry-ep',
 'CreationTime': datetime.datetime(2021, 9, 28, 1, 54, 31, 267000, tzinfo=tzlocal()),
 'LastModifiedTime': datetime.datetime(2021, 9, 28, 2, 2, 51, 936000, tzinfo=tzlocal()),
 'EndpointStatus': 'InService'}

In [21]:
neptune_endpoint = neptue_endpoint_desc['Endpoint']
neptune_endpoint_port = neptue_endpoint_desc['Port']
nlu_endpoint_name = nlu_endpoint_desc['EndpointName']

Function for running a Gremlin query on Neptune:

In [None]:
!pip install gremlinpython

In [200]:
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

def query_neptune(expr, neptune_endpoint, port):
    '''Evaluate the Gremlin expression `expr` (written in terms of `g`) against Neptune.'''
    graph = Graph()
    if str(port) == '80':  # use an unencrypted WebSocket if the port is an HTTP port
        neptune_web_socket = f"ws://{neptune_endpoint}:{port}/gremlin"
    else:
        neptune_web_socket = f"wss://{neptune_endpoint}:{port}/gremlin"
    remoteConn = DriverRemoteConnection(neptune_web_socket, 'g')
    g = graph.traversal().withRemote(remoteConn)
    try:
        # expr is a Python expression such as "g.V()...toList()"; only eval trusted,
        # template-generated strings
        result = eval(expr)
    finally:
        # Close the connection even if the query raises
        remoteConn.close()
    return result

Function for converting questions into intentions and slot labels with the NLU endpoint:

In [29]:
import json
import sagemaker
from sagemaker.pytorch.model import PyTorchPredictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

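def parse_questions(questions, nlu_endpoint_name):
    '''
    Args:
        questions (list(str)): A list of natural language questions
        nlu_endpoint_name (str): Name of the SageMaker endpoint hosting the NLU model
    Returns:
        tuple: (text, intentions, slot_labels) from the endpoint, with one intention
        and one list of character-level slot labels per question
    '''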
    sess = sagemaker.Session()
    predictor = PyTorchPredictor(
        endpoint_name=nlu_endpoint_name,
        sagemaker_session=sess,
        serializer=CSVSerializer(),
        deserializer=JSONDeserializer(),
    )
    predicted = predictor.predict(questions)
    return predicted['text'], predicted['intentions'], predicted['slot_labels']
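
Inside a Lambda function you may prefer not to package the `sagemaker` SDK. The sketch below calls the same endpoint through boto3's low-level `invoke_endpoint`. It is a hedged alternative, not the notebook's method. It assumes the endpoint accepts `text/csv` with one question per line and returns the same JSON keys that `parse_questions` reads. The name `parse_questions_boto3` is hypothetical, and the client is passed in (for example `boto3.client('sagemaker-runtime')`) so the function can be exercised without AWS.

```python
import json

def parse_questions_boto3(questions, nlu_endpoint_name, runtime_client):
    '''Hypothetical boto3-only variant of parse_questions.

    Assumes the endpoint takes text/csv (one question per line) and returns JSON
    with 'text', 'intentions' and 'slot_labels' keys.
    '''
    body = '\n'.join(questions)  # one question per CSV row
    response = runtime_client.invoke_endpoint(
        EndpointName=nlu_endpoint_name,
        ContentType='text/csv',
        Accept='application/json',
        Body=body.encode('utf-8'),
    )
    # response['Body'] is a streaming body; read it fully and decode the JSON payload
    predicted = json.loads(response['Body'].read())
    return predicted['text'], predicted['intentions'], predicted['slot_labels']
```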

In [49]:
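def extract_slot_values(question, seq_label):
    '''Group character-level BIO labels (B_<slot>, I_<slot>, O) into slots and values.'''
    assert len(question) == len(seq_label), (
        f"question {question} should have the same length as sequence label {seq_label} "
        f"({len(question)} != {len(seq_label)})")
    value_buf = ''
    slot_buf = ''
    values = []
    slots = []
    for i, l in enumerate(seq_label):
        if l.startswith('B'):
            # A new slot starts: flush the previous one, if any
            if value_buf != '':
                values.append(value_buf)
                slots.append(slot_buf)
            slot_buf = l[2:]  # extract the label part from B_label
            value_buf = question[i]
        elif l.startswith('I'):
            # Continue the current slot
            value_buf += question[i]
        elif l.startswith('O'):
            # Outside any slot: flush the current one, if any
            if value_buf != '':
                values.append(value_buf)
                slots.append(slot_buf)
            value_buf = ''
            slot_buf = ''
    # Flush a slot that runs to the end of the question
    if value_buf != '':
        values.append(value_buf)
        slots.append(slot_buf)
    return slots, values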
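
The same BIO decoding can also return character offsets instead of strings. That helps if you later want to highlight the entity in a reply. The sketch below is a hypothetical helper, `bio_spans`, that assumes the `B_<slot>` / `I_<slot>` / `O` convention seen in the slot labels above. Each span is `(slot, start, end)` with `end` exclusive, so `question[start:end]` is the value.

```python
def bio_spans(seq_label):
    '''Decode B_<slot>/I_<slot>/O labels into (slot, start, end) spans, end exclusive.'''
    spans = []
    slot, start = None, None
    for i, label in enumerate(seq_label):
        if label.startswith('B'):
            # A B label always opens a new span, closing any open one first
            if slot is not None:
                spans.append((slot, start, i))
            slot, start = label[2:], i
        elif label.startswith('I') and slot is not None:
            continue  # the open span grows by one character
        else:
            # 'O' (or an I label with no open span) closes any open span
            if slot is not None:
                spans.append((slot, start, i))
            slot, start = None, None
    # A span that reaches the last character is closed here
    if slot is not None:
        spans.append((slot, start, len(seq_label)))
    return spans
```

For the question `'张艺谋导演了哪些电影'` and the labels printed further below, this gives `[('name', 0, 3)]`, and `question[0:3]` is `'张艺谋'`.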

Gremlin query templates, one per intention (`{}` is filled with the extracted slot value):

In [122]:
query_templates = {
    'ask_alumni': "g.V().has('学校', 'name', '{}').inE().hasLabel('毕业院校').outV().values('name').toList()",
    'ask_school': "g.V().has('人物','name','{}').out('毕业院校').values('name').next()",
    'ask_books': "g.V().has('人物', 'name', '{}').inE().hasLabel('作者').outV().values('name').toList()", 
    'ask_author': "g.V().has('图书作品','name','{}').out('作者').values('name').next()",
    'ask_wife': "g.V().has('人物','name','{}').out('妻子').values('name').next()",
    'ask_husband': "g.V().has('人物','name','{}').out('丈夫').values('name').next()",
    'ask_films': "g.V().has('人物', 'name', '{}').inE().hasLabel('导演').outV().values('name').toList()",
    'ask_director': "g.V().has('影视作品','name','{}').out('导演').values('name').next()",
    'ask_nationality': "g.V().has('人物','name','{}').out('国籍').values('name').next()"
}

In [53]:
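def generate_graph_query(intent, slots, values, query_templates):
    '''Fill the template for `intent` with the extracted slot values.'''
    if intent not in query_templates:
        raise ValueError(f"No query template for intention '{intent}'")
    template = query_templates[intent]
    # Escape backslashes and single quotes: the query string is later passed to eval()
    safe_values = [v.replace('\\', '\\\\').replace("'", "\\'") for v in values]
    return template.format(*safe_values)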
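
Because `query_neptune` passes the filled-in string to `eval`, one alternative is to keep each template as a Python callable. The callable receives the traversal source `g` and the slot value, so user text is never evaluated as code. The sketch below is hypothetical: `QUERY_BUILDERS` and `run_graph_query` are illustrative names, and only three intentions are shown. The step names match the string templates, and with a real `g` from `graph.traversal().withRemote(remoteConn)` the lambdas would build and run the same traversals.

```python
# Hypothetical replacement for the string templates: each builder takes the traversal
# source `g` and the slot value, and builds the Gremlin traversal directly (no eval).
QUERY_BUILDERS = {
    'ask_films': lambda g, name: g.V().has('人物', 'name', name).inE().hasLabel('导演').outV().values('name').toList(),
    'ask_director': lambda g, name: g.V().has('影视作品', 'name', name).out('导演').values('name').next(),
    'ask_author': lambda g, name: g.V().has('图书作品', 'name', name).out('作者').values('name').next(),
}

def run_graph_query(g, intent, values, builders=QUERY_BUILDERS):
    '''Run the builder registered for `intent`, passing the slot values as arguments.'''
    if intent not in builders:
        raise ValueError(f"No query builder for intention '{intent}'")
    return builders[intent](g, *values)
```

A value containing a quote, such as `"a'b"`, is passed as an ordinary string argument to `has`, so it needs no escaping.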

In [89]:
question = '张艺谋导演了哪些电影'  # "Which films did Zhang Yimou direct?"

In [54]:
_, intentions, slot_labels = parse_questions([question], nlu_endpoint_name)
print('Intentions:')
print(intentions)
print('Slot labels:')
print(slot_labels)

Intentions:
['ask_films']
Slot labels:
[['B_name', 'I_name', 'I_name', 'O', 'O', 'O', 'O', 'O', 'O', 'O']]


In [55]:
slots, values = extract_slot_values(question, slot_labels[0])
slots, values

(['name'], ['张艺谋'])

In [57]:
query = generate_graph_query(intentions[0], slots, values, query_templates)
query

"g.V().has('人物', 'name', '张艺谋').inE().hasLabel('导演').outV().values('name').toList()"

The Gremlin client runs its own event loop, which conflicts with the event loop Jupyter is already running. Run the following cell so that the loops can be nested:

In [60]:
!pip install nest_asyncio
import nest_asyncio
nest_asyncio.apply()



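Use `curl` against the `/status` endpoint to check connectivity and the status of your Neptune endpoint. Neptune is reachable only from inside its VPC, so the first, direct request below times out from the notebook. The second request goes through an Application Load Balancer (ALB) in front of Neptune instead: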

In [70]:
!curl database-2.cluster-ro-c2ycbhkszo5s.us-east-1.neptune.amazonaws.com:8182/status

curl: (28) Failed to connect to database-2.cluster-ro-c2ycbhkszo5s.us-east-1.neptune.amazonaws.com port 8182: Connection timed out


In [71]:
!curl alb-neptune-test-62758122.us-east-1.elb.amazonaws.com/status

{"status":"healthy","startTime":"Fri Aug 20 04:52:57 UTC 2021","dbEngineVersion":"1.0.4.2.R5","role":"writer","gremlin":{"version":"tinkerpop-3.4.10"},"sparql":{"version":"sparql-1.1"},"labMode":{"NeptuneML":"disabled","ObjectIndex":"disabled","DFEQueryEngine":"disabled","ReadWriteConflictDetection":"enabled"},"resultCache":{"status":"Disabled"}}
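
Where `curl` is not available, for example inside a Lambda function, the same check can be done with the Python standard library. This is a minimal sketch. The helper names `fetch_neptune_status` and `summarize_status` are hypothetical, and the chosen fields are simply those visible in the status response above. `base_url` includes the scheme and port, as in the ALB call, e.g. `fetch_neptune_status('http://alb-neptune-test-62758122.us-east-1.elb.amazonaws.com')`.

```python
import json
import urllib.request

def fetch_neptune_status(base_url, timeout=5):
    '''GET <base_url>/status and return the parsed JSON; raises on network errors.'''
    with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))

def summarize_status(status):
    '''Pick out the fields worth logging from a /status response.'''
    return {
        'healthy': status.get('status') == 'healthy',
        'role': status.get('role'),
        'engine': status.get('dbEngineVersion'),
        'gremlin': status.get('gremlin', {}).get('version'),
    }
```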

In [79]:
query_neptune(query, 'alb-neptune-test-62758122.us-east-1.elb.amazonaws.com', 80)

['阳光灿烂的日子',
 '三枪',
 '一个陌生女人的来信',
 '2046',
 '三枪拍案惊奇',
 '万里长城',
 '山楂树之恋',
 '金陵十三钗',
 '印象·刘三姐',
 '影子武士',
 '秋菊打官司',
 '太阳照常升起',
 '让子弹飞',
 '王朝的女人·杨贵妃',
 '建国大业',
 '大红灯笼高高挂',
 '英雄',
 '我的父亲母亲',
 '妻妾成群',
 '菊豆',
 '左耳',
 '山乡书记',
 '北京人在纽约',
 '对话·寓言2047',
 '十面埋伏',
 '长城',
 '习近平',
 '栀子花开']

In [175]:
def question2answer(question, query_templates, nlu_endpoint_name, neptune_endpoint, neptune_endpoint_port):
    '''Answer a question: NLU -> slot values -> Gremlin query -> Neptune.'''
    try:
        _, intentions, slot_labels = parse_questions([question], nlu_endpoint_name)
        print(f"Intention: {intentions[0]}")
        slots, values = extract_slot_values(question, slot_labels[0])
        print(f"Slot labels: {slots},{values}")
        query = generate_graph_query(intentions[0], slots, values, query_templates)
        print(f"Query: {query}")
        query_result = query_neptune(query, neptune_endpoint, neptune_endpoint_port)
    except Exception as e:
        # Fall back to a fixed reply for unknown intentions, missing slots, empty results, etc.
        print(e)
        query_result = '我不知道'  # "I don't know"
    return query_result
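
`question2answer` returns a list for `.toList()` templates, a single string for `.next()` templates, or `'我不知道'` ("I don't know") on failure. A chatbot needs one reply string, so a small formatter helps. The sketch below is hypothetical (`format_answer`, the `'、'` separator and the `max_items` cap are illustrative choices). The cap keeps long result lists, such as the 28 films above, well under Telegram's 4096-character message limit.

```python
def format_answer(result, sep='、', max_items=20):
    '''Turn a query result (list from toList() or single value from next()) into reply text.'''
    if isinstance(result, (list, tuple)):
        if not result:
            # toList() returns an empty list when nothing matches
            return '我不知道'  # "I don't know"
        text = sep.join(str(item) for item in result[:max_items])
        if len(result) > max_items:
            text += ' ...'  # signal that the list was truncated
        return text
    return str(result)
```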

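A variant that returns every match with `.toList()`, including for intentions that expect a single answer:

In [201]: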
query_templates = {
    'ask_alumni': "g.V().has('学校', 'name', '{}').inE().hasLabel('毕业院校').outV().values('name').toList()",
    'ask_school': "g.V().has('人物','name','{}').out('毕业院校').values('name').toList()",
    'ask_books': "g.V().has('人物', 'name', '{}').inE().hasLabel('作者').outV().values('name').toList()", 
    'ask_author': "g.V().has('图书作品','name','{}').out('作者').values('name').toList()",
    'ask_wife': "g.V().has('人物','name','{}').out('妻子').values('name').toList()",
    'ask_husband': "g.V().has('人物','name','{}').out('丈夫').values('name').toList()",
    'ask_films': "g.V().has('人物', 'name', '{}').inE().hasLabel('导演').outV().values('name').toList()",
    'ask_director': "g.V().has('影视作品','name','{}').out('导演').values('name').toList()",
    'ask_nationality': "g.V().has('人物','name','{}').out('国籍').values('name').toList()"
}

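The notebook then switches back to the original templates, where single-answer intentions return only the first match with `.next()`:

In [202]: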
query_templates = {
    'ask_alumni': "g.V().has('学校', 'name', '{}').inE().hasLabel('毕业院校').outV().values('name').toList()",
    'ask_school': "g.V().has('人物','name','{}').out('毕业院校').values('name').next()",
    'ask_books': "g.V().has('人物', 'name', '{}').inE().hasLabel('作者').outV().values('name').toList()", 
    'ask_author': "g.V().has('图书作品','name','{}').out('作者').values('name').next()",
    'ask_wife': "g.V().has('人物','name','{}').out('妻子').values('name').next()",
    'ask_husband': "g.V().has('人物','name','{}').out('丈夫').values('name').next()",
    'ask_films': "g.V().has('人物', 'name', '{}').inE().hasLabel('导演').outV().values('name').toList()",
    'ask_director': "g.V().has('影视作品','name','{}').out('导演').values('name').next()",
    'ask_nationality': "g.V().has('人物','name','{}').out('国籍').values('name').next()"
}

In [209]:
question = '异界之再战风云是谁的作品'  # "Whose work is 异界之再战风云?"

In [210]:
question2answer(question, query_templates, nlu_endpoint_name, 'alb-neptune-test-62758122.us-east-1.elb.amazonaws.com', 80)

Intention: ask_author
Slot labels: ['book'],['异界之再战风云']
Query: g.V().has('图书作品','name','异界之再战风云').out('作者').values('name').next()


'品味人生'