# Question Answering Engine

This notebook builds a question answering engine from scratch using [Milvus](https://milvus.io/) and [Towhee](https://towhee.io/).

### About Milvus

Milvus is an open-source vector database for AI applications. It stores, indexes, and searches massive collections of the high-dimensional vectors produced by machine learning and deep learning models.

#### Key Features:
- **High Performance:** Optimized for speed and scalability, enabling fast similarity searches on billions of vectors.
- **Advanced Indexing:** Supports various indexing algorithms, including HNSW, IVF, and ScaNN.
- **Hybrid Search:** Combines vector similarity search with scalar filtering to refine results and improve accuracy.
- **Cloud-Native Architecture:** Features separated storage and computation layers for enhanced flexibility and elasticity.
- **Ease of Use:** Provides intuitive SDKs for various programming languages, facilitating easy integration.
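
To make the hybrid search idea above concrete, here is a minimal sketch in plain Python. It is not Milvus code: the `hybrid_search` helper, the `category` field, and the toy records are all hypothetical, and the ranking is a brute-force scan, whereas Milvus would use one of its indexes. It only illustrates the order of operations: apply the scalar filter, then rank what remains by vector similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def hybrid_search(records, query, category, top_k=2):
    """Hypothetical helper: filter on a scalar field, then rank by similarity."""
    candidates = [r for r in records if r["category"] == category]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["vector"], query),
        reverse=True,
    )
    return [r["id"] for r in ranked[:top_k]]


# Toy data: the ids, categories, and 2-dimensional vectors are made up.
records = [
    {"id": 0, "category": "life", "vector": [1.0, 0.0]},
    {"id": 1, "category": "auto", "vector": [0.9, 0.1]},
    {"id": 2, "category": "life", "vector": [0.0, 1.0]},
    {"id": 3, "category": "life", "vector": [0.7, 0.7]},
]

result = hybrid_search(records, query=[1.0, 0.1], category="life")
print(result)  # [0, 3]
```

Record 1 is close to the query but never appears in the result, because the scalar filter removes it before any similarity is computed.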

#### Use Cases:
- **Recommendation Systems:** Product, content, or service recommendations based on user preferences.
- **Image and Video Search:** Searches for visually similar images or videos in large collections.
- **Anomaly Detection:** Identifies unusual patterns or outliers, such as fraudulent transactions or defective products.
- **Natural Language Processing (NLP):** Performs semantic search, clustering, and other NLP tasks based on text embeddings.
- **Drug Discovery:** Analyzes molecular structures to identify potential drug candidates.

#### Getting Started:
- Download and install Milvus from GitHub: [Milvus GitHub](https://github.com/milvus-io/milvus)
- For a managed solution, try Zilliz Cloud: [Zilliz Cloud](https://zilliz.com/what-is-milvus)

---

### About Towhee

Towhee is a framework for processing unstructured data through Large Language Model (LLM) based pipeline orchestration. It transforms raw data such as text, images, audio, and video into target formats such as text, images, or embeddings, which can then be stored in a vector database. Developers can prototype data processing pipelines with a Pythonic API and then optimize them for production.

#### Key Features:
- 🎨 **Multi Modalities:** Processes various data types, including images, video clips, text, audio files, and molecular structures.
- 📃 **LLM Pipeline Orchestration:** Adapts to different LLMs, hosts open-source large models locally, and features prompt management and knowledge retrieval.
- 🎓 **Rich Operators:** Provides over 140 ready-to-use state-of-the-art models for computer vision, NLP, multimodal, audio, and medical domains.
- 🔌 **Prebuilt ETL Pipelines:** Offers ready-to-use ETL pipelines for tasks like Retrieval-Augmented Generation, Text Image search, and Video copy detection.
- ⚡️ **High-Performance Backend:** Utilizes the Triton Inference Server to speed up model serving on CPU and GPU, and can convert Python pipelines into high-performance Docker containers.
- 🐍 **Pythonic API:** Includes a Pythonic method-chaining API for describing custom data processing pipelines, making unstructured data processing as easy as handling tabular data.

#### Core Concepts:
- **Operators:** Basic building blocks of neural data processing pipelines, including deep learning models, data processing methods, or Python functions.
- **Pipelines:** Composed of several operators interconnected as a directed acyclic graph (DAG) for complex functionalities.
- **DataCollection API:** Pythonic, method-chaining style API for building custom pipelines with multiple data conversion interfaces.
- **Engine:** Drives dataflow among operators, schedules tasks, and monitors compute resource usage, providing a basic engine for single-instance machines and a Triton-based engine for Docker containers.
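
The operator and pipeline concepts above can be sketched without Towhee itself. The snippet below is an illustration only, not Towhee's API: `run_pipeline` and the operator names are hypothetical, and it uses the standard-library `graphlib` module (Python 3.9+) to run plain functions in dependency order, which is the essence of executing a DAG of operators. It does not model scheduling or resource monitoring.

```python
from graphlib import TopologicalSorter


def run_pipeline(operators, inputs):
    """Hypothetical helper: run operators in dependency (topological) order.

    operators maps a node name to (function, [upstream node names]).
    inputs maps pipeline input names to their values.
    """
    graph = {name: set(deps) for name, (_, deps) in operators.items()}
    values = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # a pipeline input, already provided
        func, deps = operators[name]
        values[name] = func(*(values[d] for d in deps))
    return values


# Four toy operators: "tokens" fans out to "length" and "upper",
# which are then joined again in "summary".
operators = {
    "tokens": (str.split, ["text"]),
    "length": (len, ["tokens"]),
    "upper": (lambda tokens: [t.upper() for t in tokens], ["tokens"]),
    "summary": (lambda n, words: f"{n} tokens: {' '.join(words)}", ["length", "upper"]),
}

out = run_pipeline(operators, {"text": "is disability insurance required"})
print(out["summary"])  # 4 tokens: IS DISABILITY INSURANCE REQUIRED
```

Because the graph is acyclic, every operator runs exactly once, after all of its upstream operators have produced their values.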

---

## Preparations

In [3]:
#Install Dependencies
! python -m pip install -q towhee towhee.models gradio

We use a subset of the [InsuranceQA Corpus](https://github.com/shuzi/insuranceQA) containing 1,000 question-answer pairs.

Download link: [GitHub](https://github.com/towhee-io/examples/releases/download/data/question_answer.csv).

In [4]:
# Prepare the Data
! curl -L https://github.com/towhee-io/examples/releases/download/data/question_answer.csv -O

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  595k  100  595k    0     0   586k      0  0:00:01  0:00:01 --:--:--  866k


In [5]:
import pandas as pd
df = pd.read_csv('question_answer.csv')
df.head()


| Unnamed: 0 | id | question | answer |
|---|---|---|---|
| 0 | 0 | Is Disability Insurance Required By Law? | Not generally. There are five states that requ... |
| 1 | 1 | Can Creditors Take Life Insurance After ... | If the person who passed away was the one with... |
| 2 | 2 | Does Travelers Insurance Have Renters Ins... | One of the insurance carriers I represent is T... |
| 3 | 3 | Can I Drive A New Car Home Without Ins... | Most auto dealers will not let you drive the c... |
| 4 | 4 | Is The Cash Surrender Value Of Life Ins... | Cash surrender value comes only with Whole Lif... |


In [9]:
# id_answer: a dictionary of id and corresponding answer
id_answer = df.set_index('id')['answer'].to_dict()
list(id_answer.items())[:5]

[(0,
  'Not generally. There are five states that require most all employers carry short term disability insurance on their employees. These states are: California, Hawaii, New Jersey, New York, and Rhode Island. Besides this mandatory short term disability law, there is no other legislative imperative for someone to purchase or be covered by disability insurance.'),
 (1,
  'If the person who passed away was the one with the debt, creditors generally cannot take the life insurance proceeds left as long as the beneficiary was a person. The money then belongs to that beneficiary, and as long as creditors do not have a claim against the beneficiary, they cannot take life insurance proceeds from them.'),
 (2,
  'One of the insurance carriers I represent is Travelers and yes, you can purchase Renters insurance through Travelers. I would look for a local agent who can assist you in placing a renters policy if you are interested. I am sure the local agent would be happy to quote Travelers if 
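
A mapping like `id_answer` is useful when a search step returns only ids and distances, so the answer text has to be looked up separately. The sketch below assumes that usage. It rebuilds the same kind of dictionary with the standard-library `csv` module from two made-up rows, so it runs without pandas or the downloaded file. The `hits` list is hypothetical and stands in for whatever the search returns.

```python
import csv
import io

# Two made-up rows in a column layout assumed to match question_answer.csv
# (an unnamed index column, then id, question, answer).
sample_csv = io.StringIO(
    ",id,question,answer\n"
    "0,0,Question A?,Answer A.\n"
    "1,1,Question B?,Answer B.\n"
)

# Same shape as df.set_index('id')['answer'].to_dict(): id -> answer text.
id_answer_demo = {
    int(row["id"]): row["answer"] for row in csv.DictReader(sample_csv)
}

# Hypothetical search hits as (id, distance) pairs, nearest first.
hits = [(1, 0.12), (0, 0.57)]

# Resolve each hit id to its answer text, keeping the ranking order.
answers = [id_answer_demo[hit_id] for hit_id, _ in hits]
print(answers)  # ['Answer B.', 'Answer A.']
```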

#### Creating Milvus Collection

In [10]:
! python -m pip install -q pymilvus==2.2.11

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
grpcio-status 1.62.2 requires grpcio>=1.62.2, but you have grpcio 1.53.0 which is incompatible.
grpcio-status 1.62.2 requires protobuf>=4.21.6, but you have protobuf 3.20.3 which is incompatible.
grpcio-tools 1.62.2 requires grpcio>=1.62.2, but you have grpcio 1.53.0 which is incompatible.
grpcio-tools 1.62.2 requires protobuf<5.0dev,>=4.21.6, but you have protobuf 3.20.3 which is incompatible.