## Hippo

>[Hippo](https://www.transwarp.cn/starwarp): Please visit the official website to learn how to run a Hippo instance and
how to use the features of the Hippo vector database.

## Getting Started

You need two things before you begin: an API key from the OpenAI website and a running Hippo instance.

## Installing Dependencies

First, install the required dependencies: OpenAI, LangChain, and Hippo-API. Choose the versions that match your environment.

In [15]:
!pip install langchain tiktoken openai
!pip install hippo-api==1.1.0.rc3



Note: Python 3.8 or later is required.
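
The version requirement in the note above can be checked before installing anything. The following is a minimal sketch; the names `MIN_PYTHON` and `python_is_supported` are hypothetical helpers introduced here and are not part of Hippo or LangChain.

```python
import sys

# Minimum version taken from the note above (assumed to apply to all packages used here).
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if python_is_supported():
    print("Python version is supported.")
else:
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required.")
```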

## Best Practice
### Importing Dependency Packages

In [16]:
from langchain.chat_models import AzureChatOpenAI, ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.hippo import Hippo
import os

### Loading Knowledge Documents

In [17]:
os.environ["OPENAI_API_KEY"] = "YOUR OPENAI KEY"
loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()
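
The cell above writes the key directly into the notebook. If you would rather set `OPENAI_API_KEY` in your shell and only verify it here, one possible sketch is shown below. The helper `require_env` is a hypothetical name introduced for illustration.

```python
import os


def require_env(name):
    """Return the stripped value of environment variable `name`.

    Raises RuntimeError when the variable is unset or blank, so a missing
    key is reported before any request is sent.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before continuing.")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of the notebook would then fail early with a clear message instead of failing later during an embedding request.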

### Segmenting the Knowledge Document

Here, we use LangChain's CharacterTextSplitter for segmentation. No separator is passed, so the splitter uses its default, a blank line (`"\n\n"`). Each chunk targets at most 500 characters (`chunk_size=500`), and consecutive chunks share no overlapping characters (`chunk_overlap=0`).

In [18]:
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
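
To make the effect of `chunk_size` and the separator more concrete, here is a simplified, pure-Python sketch of separator-based chunking. It is an illustration only and not LangChain's implementation; `split_text` is a hypothetical function, and overlap is omitted because the cell above uses `chunk_overlap=0`.

```python
def split_text(text, separator="\n\n", chunk_size=500):
    """Greedily merge separator-delimited pieces into chunks.

    Each chunk is at most `chunk_size` characters, except when a single
    piece is already longer than that, in which case it is kept whole.
    """
    pieces = [p for p in text.split(separator) if p.strip()]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if not current or len(candidate) <= chunk_size:
            # Either the first piece of a new chunk, or the merged text still fits.
            current = candidate
        else:
            # Adding this piece would exceed the limit: close the chunk, start a new one.
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

With a small limit the behavior is easy to follow: `split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10)` merges the first two paragraphs into one chunk and leaves the third on its own.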

### Declaring the Embedding Model
Below, we create the embedding model with LangChain's OpenAIEmbeddings class. It works with either OpenAI or Azure; the Azure configuration is shown commented out.

In [19]:
# openai
embeddings = OpenAIEmbeddings()
# azure
# embeddings = OpenAIEmbeddings(
#     openai_api_type="azure",
#     openai_api_base="x x x",
#     openai_api_version="x x x",
#     model="x x x",
#     deployment="x x x",
#     openai_api_key="x x x"
# )

### Declaring Hippo Client

In [20]:
HIPPO_CONNECTION = {"host": "IP", "port": "PORT"}
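
Before storing documents, it can help to confirm that the configured host and port accept connections. The sketch below uses only the standard library; `can_connect` is a hypothetical helper, and a successful TCP connection only shows that something is listening at that address, not that it is a Hippo instance.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `port` may be an int or a numeric string, as in HIPPO_CONNECTION.
    Unreachable hosts and non-numeric ports both yield False.
    """
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError, OverflowError):
        return False
```

For example, `can_connect(HIPPO_CONNECTION["host"], HIPPO_CONNECTION["port"])` returns `False` while the placeholders `"IP"` and `"PORT"` are still in place.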

### Storing the Document

In [23]:
print("input...")
# insert docs
vector_store = Hippo.from_documents(
    docs,
    embedding=embeddings,
    table_name="langchain_test",
    connection_args=HIPPO_CONNECTION,
)
print("success")

input...
success


### Conducting Knowledge-based Question and Answer
#### Creating a Large Language Question-Answering Model
Below, we create the large language model used for question answering. Use LangChain's ChatOpenAI class for OpenAI, or the AzureChatOpenAI class (shown commented out) for Azure.

In [24]:
# llm = AzureChatOpenAI(
#     openai_api_base="x x x",
#     openai_api_version="xxx",
#     deployment_name="xxx",
#     openai_api_key="xxx",
#     openai_api_type="azure"
# )

llm = ChatOpenAI(openai_api_key="YOUR OPENAI KEY", model_name="gpt-3.5-turbo-16k")

### Acquiring Related Knowledge Based on the Question

In [25]:
query = "Please introduce COVID-19"
# query = "Please introduce Hippo Core Architecture"
# query = "What operations does the Hippo Vector Database support for vector data?"
# query = "Does Hippo use hardware acceleration technology? Briefly introduce hardware acceleration technology."


# Retrieve similar content from the knowledge base: fetch the two most similar texts.
res = vector_store.similarity_search(query, 2)
content_list = [item.page_content for item in res]
text = "".join(content_list)
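
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The toy sketch below shows that idea with cosine similarity over small hand-written vectors. It is an assumption-laden illustration: `cosine_similarity` and `top_k` are hypothetical helpers, and the metric and index that Hippo actually uses depend on how the table is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

A real vector database avoids this brute-force scan by using an index, but the ranking it approximates is the same.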

### Constructing a Prompt Template

In [26]:
prompt = f"""
Please use the content of the following [Article] to answer my question. If you don't know, please say you don't know, and the answer should be concise.
[Article]:{text}
Please answer this question in conjunction with the above article:{query}
"""

### Waiting for the Large Language Model to Generate an Answer

In [27]:
response_with_hippo = llm.predict(prompt)
print(f"response_with_hippo:{response_with_hippo}")
response = llm.predict(query)
print("==========================================")
print(f"response_without_hippo:{response}")

response_with_hippo:COVID-19 is a virus that has impacted every aspect of our lives for over two years. It is a highly contagious and mutates easily, requiring us to remain vigilant in combating its spread. However, due to progress made and the resilience of individuals, we are now able to move forward safely and return to more normal routines.
response_without_hippo:COVID-19 is a contagious respiratory illness caused by the novel coronavirus SARS-CoV-2. It was first identified in December 2019 in Wuhan, China and has since spread globally, leading to a pandemic. The virus primarily spreads through respiratory droplets when an infected person coughs, sneezes, talks, or breathes, and can also spread by touching contaminated surfaces and then touching the face. COVID-19 symptoms include fever, cough, shortness of breath, fatigue, muscle or body aches, sore throat, loss of taste or smell, headache, and in severe cases, pneumonia and organ failure. While most people experience mild to mode