# LlamaIndex

LlamaIndex (formerly GPT Index) is a data framework for connecting LLM applications to your own data.

* [LlamaIndex](https://github.com/jerryjliu/llama_index)
* [LlamaIndex Documentation](https://gpt-index.readthedocs.io/en/latest/)

## Installation

In [None]:
!pip install -U llama-index

## Enable debug logging

In [4]:
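import logging
import sys

# Send DEBUG-level (and higher) log records to stdout.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
# This attaches a second stdout handler, so records are typically printed twice;
# remove this line if the duplicates are unwanted.
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))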

## Build a vector index over local documents with GPT

In [3]:
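from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

# "西游记青少年版" is the youth edition of Journey to the West.
journey_to_the_west_dir = "data/texts/西游记青少年版/"

# Load every file in the directory as a document.
documents = SimpleDirectoryReader(journey_to_the_west_dir).load_data()
# Embed the documents and build a vector index over them.
index = GPTVectorStoreIndex.from_documents(documents)

# Wrap the index in a query engine for natural-language questions.
query_engine = index.as_query_engine()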


### Save

In [4]:
journey_to_the_west_persist_dir = "storage/llamaindex/西游记青少年版/"
index.storage_context.persist(persist_dir=journey_to_the_west_persist_dir)

### Load

In [7]:
from llama_index import StorageContext, load_index_from_storage

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir=journey_to_the_west_persist_dir)
# load index
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()
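
The save and load cells above are often combined so that the index is built only once and reloaded on later runs. The following is a minimal sketch of that pattern, not part of LlamaIndex: `load_or_build` is a hypothetical helper, and the `load` and `build` callables are assumptions that the caller supplies (for this notebook, `load` would wrap `StorageContext.from_defaults` plus `load_index_from_storage`, and `build` would wrap `GPTVectorStoreIndex.from_documents` plus `index.storage_context.persist`).

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def load_or_build(
    persist_dir: str,
    load: Callable[[str], T],
    build: Callable[[str], T],
) -> T:
    """Hypothetical helper: reuse a persisted index if one exists, else build it.

    `load` and `build` are supplied by the caller; `build` is expected to
    write its result into `persist_dir` so the next call can load it.
    """
    # A non-empty directory is treated as "an index was already persisted here".
    if os.path.isdir(persist_dir) and os.listdir(persist_dir):
        return load(persist_dir)
    os.makedirs(persist_dir, exist_ok=True)
    return build(persist_dir)
```

The non-empty-directory check is deliberately crude; a stricter version could look for a specific file that the persistence step is known to write.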

## Question answering

In [19]:
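# Query: "Who is Zhu Bajie's wife?"
response = query_engine.query("猪八戒的媳妇是谁？")
print(response)


猪八戒的媳妇是谁？答案是没有提及。
(English: "Who is Zhu Bajie's wife? The answer is not mentioned.")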


In [18]:
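# Query: "Who is Zhu Bajie's wife in Gao Village?"
response = query_engine.query("高老庄中猪八戒的媳妇是谁？")
print(response)


高老庄中猪八戒的媳妇是翠兰。
(English: "Zhu Bajie's wife in Gao Village is Cuilan.")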


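### As the results above show, once a lot of local content has been indexed, a question needs extra context to narrow the scope before it can be answered precisely.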
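
The effect noted here can be illustrated with a toy retriever. This is a sketch under loud assumptions: it uses word-count vectors and cosine similarity as a crude stand-in for real embeddings, the two English `chunks` are invented for illustration, and none of it is LlamaIndex's actual retrieval code. It only shows why adding a scoping phrase such as a place name can change which chunk ranks first.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, invented for this illustration only.
chunks = [
    "Zhu Bajie and Sun Wukong patrol the mountains while Zhu Bajie grumbles",
    "At Gao Village the daughter Cuilan was married to Zhu Bajie",
]

# The vague question ranks the patrol chunk first because the name dominates.
print(top_k("Who is the wife of Zhu Bajie", chunks))
# Adding the place name shifts the ranking to the chunk that holds the answer.
print(top_k("Who is the wife of Zhu Bajie at Gao Village", chunks))
```

With only the top-ranked chunk passed to the model, the vague question never sees the passage that contains the answer, which matches the "not mentioned" response in the first cell.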

In [15]:
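# Query: "Who pinned Sun Wukong under Five Elements Mountain?"
response = query_engine.query("孙悟空被谁镇压在五行山？")
print(response)


孙悟空被一个妖怪镇压在五行山下。
(English: "Sun Wukong was pinned under Five Elements Mountain by a demon.")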


In [10]:
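# Query: "Was Zhu Bajie ever pinned under Five Elements Mountain?"
response = query_engine.query("猪八戒有没有被镇压在五行山？")
print(response)


No, 猪八戒没有被镇压在五行山。根据上文，猪八戒正在崇山峻岭间巡山探路，而不是在五行山。
(English: "No, Zhu Bajie was not pinned under Five Elements Mountain. According to the context, Zhu Bajie is patrolling and scouting among the high mountains, not at Five Elements Mountain.")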


In [11]:
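# Query: "What does Sun Wukong like to call Zhu Bajie?"
response = query_engine.query("孙悟空喜欢叫猪八戒什么？")
print(response)


孙悟空喜欢叫猪八戒“猪悟能”。
(English: "Sun Wukong likes to call Zhu Bajie 'Zhu Wuneng'.")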


In [14]:
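# Query: "Whom does Sun Wukong fight in the Havoc in Heaven episode?"
response = query_engine.query("大闹天宫中孙悟空和谁战斗？")
print(response)


孙悟空在大闹天宫中与三个妖王战斗。
(English: "In Havoc in Heaven, Sun Wukong fights three demon kings.")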


In [13]:
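# Query: "What is the name of the river in the Women's Kingdom whose water makes anyone who drinks it pregnant?"
response = query_engine.query("女儿国中喝水就能怀孕的河叫什么？")
print(response)


子母河
(English: "Zimu River", the Mother-and-Child River)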


## Document Summary Index

In [None]:
from llama_index import (
    SimpleDirectoryReader,
    LLMPredictor,
    ServiceContext,
    ResponseSynthesizer,
)
from llama_index.indices.document_summary import GPTDocumentSummaryIndex
from langchain.chat_models import ChatOpenAI

# Allow nested event loops so async calls can run inside Jupyter.
import nest_asyncio
nest_asyncio.apply()

# Load the documents; only the first loaded document is given this ID.
docs = SimpleDirectoryReader('data/texts/西游记青少年版/').load_data()
docs[0].doc_id = "西游记青少年版"

# LLM Predictor (gpt-3.5-turbo)
llm_predictor_chatgpt = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt, chunk_size_limit=1024)

# Build each document summary with the "tree_summarize" response mode.
# Optionally pass use_async=True to from_args as well.
response_synthesizer = ResponseSynthesizer.from_args(response_mode="tree_summarize")
doc_summary_index = GPTDocumentSummaryIndex.from_documents(
    docs,
    service_context=service_context,
    response_synthesizer=response_synthesizer
)
# Fetch the generated summary by document ID.
doc_summary_index.get_document_summary("西游记青少年版")
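
To give a feel for what a tree-style summarization mode does, here is a simplified sketch of the general idea: summarize small groups of chunks, then summarize those summaries, until one remains. It is an illustration only and is not LlamaIndex's implementation. The function `tree_summarize`, the `fan_in` parameter, and the stub summarizer are all hypothetical; in the notebook the summarizing step is an LLM call.

```python
from typing import Callable, List


def tree_summarize(
    chunks: List[str],
    summarize: Callable[[List[str]], str],
    fan_in: int = 2,
) -> str:
    """Hypothetical sketch: merge groups of `fan_in` texts level by level."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(chunks)
    while True:
        # Summarize consecutive groups; each pass yields a shorter list.
        level = [summarize(level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
        if len(level) == 1:
            return level[0]


def stub_summarize(group: List[str]) -> str:
    """Stand-in for an LLM call; it only shows how texts were grouped."""
    return "(" + " + ".join(group) + ")"


print(tree_summarize(["a", "b", "c"], stub_summarize))
# → ((a + b) + (c))
```

The printed nesting shows the tree shape: `a` and `b` are merged first, `c` is summarized alone, and the two results are merged in a second pass.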