In [26]:
import logging
import sys
from IPython.display import Markdown, display

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

In [48]:
import os.path
from llama_index import VectorStoreIndex, GPTTreeIndex, SummaryIndex
from llama_index import SimpleDirectoryReader, StorageContext, load_index_from_storage
from llama_index import SummaryPrompt, ServiceContext
from llama_index.llms import OpenAI

In [49]:
llm = OpenAI(model="gpt-4")
service_context = ServiceContext.from_defaults(llm=llm)

In [56]:
DATA_PATH = "./data"
INDEX_PATH_VECTOR = "./storage_vector"
INDEX_PATH_SUMMARY = "./storage_summary"
INDEX_PATH_TREE = "./storage_tree"

In [51]:
query_str = "What is a summary of the 48th Tank Battalion's experiences between the time they left Lupstein in February 1945 and the announcement of VE Day in May 1945?"

## VECTOR INDEX ##

In [52]:
# check if storage already exists
if (not os.path.exists(INDEX_PATH_VECTOR)):
    # load the documents and create the index
    documents = SimpleDirectoryReader(DATA_PATH).load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    # store it for later
    index.storage_context.persist(persist_dir=INDEX_PATH_VECTOR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=INDEX_PATH_VECTOR)
    index = load_index_from_storage(storage_context)

In [53]:
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

<b>The context does not provide information on the 48th Tank Battalion's experiences between the time they left Lupstein in February 1945 and the announcement of VE Day in May 1945.</b>

Notes: This is not correct!?

## SUMMARY INDEX ##

In [57]:
# check if storage already exists
if (not os.path.exists(INDEX_PATH_SUMMARY)):
    # load the documents and create the index
    documents = SimpleDirectoryReader(DATA_PATH).load_data()
    index = SummaryIndex.from_documents(documents, service_context=service_context)
    # store it for later
    index.storage_context.persist(persist_dir=INDEX_PATH_SUMMARY)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=INDEX_PATH_SUMMARY)
    index = load_index_from_storage(storage_context)

INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.
Loading all indices.
Loading all indices.
Loading all indices.


In [55]:
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

<b>The 48th Tank Battalion had a series of significant military engagements from February to May 1945. Initially, they were attached to the 42nd Infantry Division and tasked with breaching the Siegfried Line, Germany's main line of defense, which they successfully accomplished. They then moved to Wissembourg where they were involved in clearing mines and pillboxes. The battalion also had the unique possession of the “Ripple Dipple” or Multi-barrelled Rocket projector. 

They participated in the Battle of Central Europe, advancing through several towns across the Rhine River, dealing with resistance and obstacles. They were also responsible for clearing German towns, handling roadblocks, and releasing Allied prisoners of war. The battalion seized important bridge crossings at the Isar River at Jettenbach and managed a continuous flow of prisoners. They were involved in the occupation of towns and handling displaced personnel. Their experiences during this period culminated in the announcement of VE Day in May 1945, marking the end of hostilities in Europe.</b>

Notes: This is good, concise but with good details

## TREE INDEX ##

In [59]:
# check if storage already exists
if (not os.path.exists(INDEX_PATH_TREE)):
    # load the documents and create the index
    documents = SimpleDirectoryReader(DATA_PATH).load_data()
    index = GPTTreeIndex.from_documents(documents, service_context=service_context)
    # store it for later
    index.storage_context.persist(persist_dir=INDEX_PATH_TREE)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=INDEX_PATH_TREE)
    index = load_index_from_storage(storage_context)

INFO:llama_index.indices.common_tree.base:> Building index from nodes: 1 chunks
> Building index from nodes: 1 chunks
> Building index from nodes: 1 chunks
> Building index from nodes: 1 chunks
> Building index from nodes: 1 chunks


In [60]:
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

INFO:llama_index.indices.tree.select_leaf_retriever:>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
INFO:llama_index.indices.tree.select_leaf_retriever:>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]


<b>The 48th Tank Battalion had a series of intense experiences between leaving Lupstein in February 1945 and the announcement of VE Day in May 1945. They consolidated their gains and posted security, with the men getting some much-needed rest. They coordinated an attack plan with the Third Division and made an initial push at dawn, securing towns in the vicinity. They moved towards Neustadt, taking over town after town and leaving troops behind to maintain control. After overcoming resistance, they occupied one of the largest cities they had encountered, necessitating an intensive guard patrol system. 

The men had a brief respite where they could rest, take showers, catch up on mail, and get paid in Marks. They covered over 150 miles from Otterbach to Neustadt, across various terrains and against unknown resistance. They were alerted for movement again on April 11th, with the main objective being Bad Staffelstein. They moved through numerous towns and villages, encountering small arms and sniper fire. They reached the Mainz river, but all bridges were blown. They were forced to make a crossing and seize the next town. 

The battalion was tasked with cutting and securing the Autobahn running east of Nuremberg, which was being seized by the 3rd and 45th Infantry Divisions. They faced small arms and artillery fire, and terrain obstacles. Upon reaching the Autobahn, they headed south to Neudorf, dispersing companies to outlying towns and ordered to hold their positions.</b>

Notes: This is too much detail

## TREE INDEX W/ SUMMARY PROMPT ##

In [69]:
# define custom SummaryPrompt
SUMMARY_PROMPT_TMPL = (
    "Write a summary of the following. Try to use only the "
    "information provided. "
    "Try to include as many key details as possible with an emphasis on dates and names of towns.\n"
    "\n"
    "\n"
    "{context_str}\n"
    "\n"
    "\n"
    'SUMMARY:"""\n'
)
SUMMARY_PROMPT = SummaryPrompt(SUMMARY_PROMPT_TMPL)

In [70]:
# check if storage already exists
if (not os.path.exists(INDEX_PATH_TREE)):
    # load the documents and create the index
    documents = SimpleDirectoryReader(DATA_PATH).load_data()
    index = GPTTreeIndex.from_documents(documents, service_context=service_context, summary_template=SUMMARY_PROMPT)
    # store it for later
    index.storage_context.persist(persist_dir=INDEX_PATH_TREE)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=INDEX_PATH_TREE)
    index = load_index_from_storage(storage_context)

INFO:llama_index.indices.loading:Loading all indices.
Loading all indices.
Loading all indices.
Loading all indices.
Loading all indices.


In [73]:
query_engine = index.as_query_engine(response_mode="tree_summarize")
response = query_engine.query(query_str)
display(Markdown(f"<b>{response}</b>"))

INFO:llama_index.indices.tree.select_leaf_retriever:>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
>[Level 0] Selected node: [2]/[2]
INFO:llama_index.indices.tree.select_leaf_retriever:>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]
>[Level 1] Selected node: [1]/[1]


<b>The 48th Tank Battalion embarked on a mission to secure towns and push towards Neustadt. They faced resistance but managed to reach their objective and establish control. After a few days of rest, they were alerted for movement towards Bad Staffelstein. They encountered small arms and sniper fire, and despite blown bridges, they forced a crossing and occupied the next town. The battalion's main objective was to cut and secure the Autobahn near Nuremberg. They faced opposition and overcame terrain obstacles, eventually reaching the Autobahn and heading south. Companies were dispersed to outlying towns, and their orders were to hold their positions.</b>

Notes: This is not bad but too concise