# Build a Question Answering Engine
This example walks through the code used to build a question answering system. It uses a modified BERT model to extract features from questions and Milvus to search for similar questions and their answers.

## Prepare

### Start Milvus Server

In [None]:
! wget https://raw.githubusercontent.com/milvus-io/milvus/master/deployments/docker/standalone/docker-compose.yml -O docker-compose.yml
! docker-compose up -d

### Start MySQL Server
For now, Milvus doesn't support storing string data, so a relational database is needed to hold the questions and answers. This example uses MySQL.

In [None]:
! docker run -p 3306:3306 -e MYSQL_ROOT_PASSWORD=123456 -d --name qa_mysql mysql:5.7

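The split described above (vectors in Milvus, text in a relational table, joined by a shared ID) can be sketched as follows. This is a minimal sketch, not the notebook's code: it assumes an in-memory SQLite database as a stand-in for MySQL so that it runs without Docker, and the rows, `demo_conn`, and `fetch_answer` are hypothetical names introduced only for illustration.

```python
import sqlite3

# Stand-in for the MySQL server (assumption): an in-memory SQLite database.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE question_answering (id INTEGER PRIMARY KEY, question TEXT, answer TEXT)"
)

# Hypothetical placeholder rows, not taken from the real dataset.
demo_rows = [
    (0, "placeholder question 0", "placeholder answer 0"),
    (1, "placeholder question 1", "placeholder answer 1"),
]
demo_conn.executemany(
    "INSERT INTO question_answering (id, question, answer) VALUES (?, ?, ?)",
    demo_rows,
)


def fetch_answer(connection, item_id):
    """Return the answer text stored for an ID, or None if the ID is unknown."""
    row = connection.execute(
        "SELECT answer FROM question_answering WHERE id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


# A vector search returns IDs; they are hard-coded here for illustration.
hit_ids = [1, 0]
demo_answers = [fetch_answer(demo_conn, i) for i in hit_ids]
print(demo_answers)
```

The vector store only ever needs the integer ID, which is why the text columns can live in a separate database.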
### Check running servers

In [None]:
! docker-compose ps
! docker logs qa_mysql --tail 6

## Core Code

### Connect to Servers

In [1]:
# Connect to Milvus and MySQL
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility
import pymysql

connections.connect(host='localhost', port='19530')
conn = pymysql.connect(host='localhost', user='root', port=3306, password='123456', database='mysql', local_infile=True)
cursor = conn.cursor()

### Create Milvus Collection with index

In [2]:
def create_collection(collection_name, dim):
    if utility.has_collection(collection_name):
        collection = Collection(name=collection_name)
        collection.drop()

    field1 = FieldSchema(name="id", dtype=DataType.INT64, description="ids", is_primary=True, auto_id=False)
    field2 = FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, description="float vector", dim=dim, is_primary=False)
    schema = CollectionSchema(fields=[field1, field2], description="collection description")
    collection = Collection(name=collection_name, schema=schema)

    index_params = {
        "index_type": "IVF_FLAT",
        "metric_type": 'IP',
        "params": {"nlist": 200}
    }
    collection.create_index(field_name="embedding", index_params=index_params)

    return collection

In [3]:
collection = create_collection('question_answering', 768)

### Create MySQL Table

In [4]:
# def create_table(table_name):
# 	drop = ''.join(['DROP TABLE IF EXISTS ', table_name, ';'])
# 	cursor.execute(drop)

# 	try:
# 		create = ''.join(['CREATE TABLE if not exists ', table_name, ' (id TEXT, question TEXT, answer TEXT);'])
# 		cursor.execute(create)
# 		print("create MySQL table successfully!")
# 	except Exception as e:
# 		print("can't create a MySQL table: ", e)

In [5]:
# create_table('question_answering')

### Generate embedding and insert into collection

In [6]:
import towhee
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import normalize

model = SentenceTransformer('paraphrase-mpnet-base-v2')

In [7]:
dc = (
	towhee.read_csv('qa.csv')
		.runas_op['id', 'id'](func=lambda x: int(x))
		.runas_op['question', 'qvec'](func = model.encode)
		.tensor_normalize['qvec', 'qvec']()
		.to_milvus['id', 'qvec'](collection=collection)
)

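The first step of the pipeline above converts the `id` column to `int`, because values read from a CSV file arrive as strings while the collection's `id` field is `INT64`. A minimal sketch of that step using only the standard library is shown below. The CSV content is a hypothetical stand-in for `qa.csv` with the same three columns, and `load_rows` is an illustrative helper, not part of towhee.

```python
import csv
import io

# Hypothetical stand-in for qa.csv (placeholder rows, not the real data).
sample_csv = io.StringIO(
    "id,question,answer\n"
    "0,placeholder question 0,placeholder answer 0\n"
    "1,placeholder question 1,placeholder answer 1\n"
)


def load_rows(handle):
    """Read CSV records and cast the 'id' column from str to int."""
    records = []
    for record in csv.DictReader(handle):
        record["id"] = int(record["id"])  # CSV fields are always strings
        records.append(record)
    return records


demo_records = load_rows(sample_csv)
print(type(demo_records[0]["id"]).__name__, demo_records[0]["id"])
```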
In [8]:
dc.show()
collection.num_entities

id,question,answer,qvec
0,Is Disability Insurance Requi...,Not generally. There are five s...,"[-0.007821438, 0.100024074, -0.0010973853, ...] shape=(768,)"
1,Can Creditors Take Life Insu...,If the person who passed away w...,"[0.02621962, 0.10332995, 0.0071792766, ...] shape=(768,)"
2,Does Travelers Insurance Have...,One of the insurance carriers I...,"[-0.044059016, 0.04683365, -0.0072263787, ...] shape=(768,)"
3,Can I Drive A New Car Home...,Most auto dealers will not let ...,"[-0.05536839, 0.07519536, -0.016274251, ...] shape=(768,)"
4,Is The Cash Surrender Value ...,Cash surrender value comes only...,"[0.0059586815, 0.033545397, 0.018074661, ...] shape=(768,)"


99

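The pipeline normalizes each embedding before inserting it, and the index uses the inner product (`IP`) metric. Assuming `tensor_normalize` performs L2 normalization, the inner product of two normalized vectors equals the cosine similarity of the original vectors. The sketch below checks this on two small, made-up vectors using plain Python; the helper names are illustrative only, and the vectors are assumed to be non-zero.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Inner product divided by the product of the vector lengths."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


# Made-up example vectors (real embeddings have 768 dimensions).
vec_a = [3.0, 4.0]
vec_b = [1.0, 2.0]

ip_normalized = inner_product(l2_normalize(vec_a), l2_normalize(vec_b))
cos_raw = cosine_similarity(vec_a, vec_b)
print(math.isclose(ip_normalized, cos_raw))
```

Normalizing once at insert time means each search only needs the cheaper inner product.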
### Insert IDs and QA combos into MySQL

In [9]:
# def load_data_to_mysql(data):
#     sql = ''.join(['insert into ', 'question_answering',' (id,question,answer) values (%s,%s,%s);'])
#     try:
#         cursor.executemany(sql, data)
#         conn.commit()
#         print('Load data to table: \'question_answering\' successfully')
#     except Exception as e:
#         print('MYSQL ERROR: {} with sql: {}'.format(e, sql))

# load_data_to_mysql(data)
data = {}
for i in dc:
    data[i.id] = i.answer

### Search

In [10]:
from towhee import Entity
queries = ['What is AAA?']
search_params = {"metric_type": 'IP', "params": {"nprobe": 16}}

dc = (
	towhee.DataFrame([Entity(query=query) for query in queries])
		.runas_op['query', 'qvec'](func = model.encode)
		.tensor_normalize['qvec', 'qvec']()
		.milvus_search['qvec', 'results'](collection=collection, anns_field="embedding", param=search_params, limit=5)
		.runas_op['results', 'answers'](func = lambda x: [{'answer': data[i.id], 'scores': i.score} for i in x])
		.select['query', 'answers']()
)

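Conceptually, the search cell above scores the query vector against the stored vectors, keeps the best hits, and looks up each hit's answer by ID. The sketch below shows that flow as an exact brute-force search in plain Python. It is not how Milvus works internally (an `IVF_FLAT` index with `nprobe` searches only part of the data), and the vectors, answers, and function name are hypothetical placeholders.

```python
import heapq


def top_k_by_inner_product(query, vectors, k):
    """Return the k (id, score) pairs with the highest inner product."""
    scored = (
        (sum(q * v for q, v in zip(query, vec)), item_id)
        for item_id, vec in vectors.items()
    )
    return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


# Hypothetical unit-length vectors keyed by ID, plus placeholder answers.
demo_vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.6, 0.8]}
demo_answer_by_id = {
    0: "placeholder answer 0",
    1: "placeholder answer 1",
    2: "placeholder answer 2",
}

demo_query = [0.0, 1.0]
demo_hits = top_k_by_inner_product(demo_query, demo_vectors, k=2)

# Same shape as the 'answers' column built in the search pipeline.
demo_results = [
    {"answer": demo_answer_by_id[item_id], "scores": score}
    for item_id, score in demo_hits
]
print(demo_results)
```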
In [11]:
dc.show()

query,answers
What is AAA?,"[{'answer': ' AAA Home insurance, like all other major carriers, covers a wide variety of claims, including fire, theft, vandalis...,{'answer': ' Many insurers utilize credit scores when determining the auto insurance rate a customer will pay. The reason is tha...,{'answer': ' Yes, automobile insurance is typically paid in advance. Normally no less than thirty days at a time. Each carrier s...,{'answer': ' No, AARP does not carry Long Term Care insurance at this time. The American Association of Retired Persons does giv...,...] len=5"


In [12]:
dc[0].answers

[{'answer': ' AAA Home insurance, like all other major carriers, covers a wide variety of claims, including fire, theft, vandalism, and many other items. However, there are numerous types of policies offered, so it is best to determine the type of policy you have to accurately understand all of the benefits. An experienced broker can help.',
  'scores': 0.572884202003479},
 {'answer': ' Many insurers utilize credit scores when determining the auto insurance rate a customer will pay. The reason is that often, there is a direct correlation and relationship with bad credit and higher incidence of accidents. Although the discount or surcharge may not be large, it is till worthwhile to be aware of your current credit rating and take steps to monitor and improve it.',
  'scores': 0.4042107164859772},
 {'answer': ' Yes, automobile insurance is typically paid in advance. Normally no less than thirty days at a time. Each carrier sets their own requirements as to the initial payment amount for n