Idea: turn a PDF into a text representation and use embeddings to ask questions about the text (same idea as `ask_a_codebase.ipynb`, but with a bit more processing around it).

In [1]:
%reload_ext autoreload
%autoreload 2

In [2]:
import os
from pathlib import Path

from dotenv import load_dotenv

load_dotenv(verbose=True);

## Ask a paper questions

The LoFTR paper (https://arxiv.org/pdf/2104.00680.pdf) serves as the example.

In [3]:
"""
TODO:
    - try using a system prompt to make the model more precise and helpful
    - Document postprocessing:
        - footnotes are not distinguishable as footnotes; they are extracted as if they were part of the body text
        - if there is an image with text in it, that text is also extracted
        - check if math formatting can be better
            - surround with $ signs?
        - idea: add metadata to the chunks, e.g. page number
""";

In [4]:
import textwrap

def printw(text):
    print(textwrap.fill(text, width=100))

In [5]:
# Load the pdf

from langchain.document_loaders import PyPDFLoader

pdf_path = "data/LoFTR_paper.pdf"

loader = PyPDFLoader(pdf_path)  # the pdf is split by page
data = loader.load()

print(f"There are {len(data)} document(s) in the data")
print(f"Characters per document: {[len(d.page_content) for d in data]}")

There are 10 document(s) in the data
Characters per document: [3941, 5754, 26674, 21725, 4889, 5142, 5190, 3093, 5524, 3396]


In [6]:
# Postprocessing: strip <latexit ...>...</latexit> blocks (embedded equation residue, e.g. SHA1 hashes)

import re

pattern = re.compile(r"<latexit.*?>.*?</latexit>", re.DOTALL)
for d in data:
    d.page_content = pattern.sub("", d.page_content)
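
To make the effect of this pattern concrete, here is a minimal sketch that applies it to a made-up string. The sample text and its attribute values are assumptions for illustration only; they are not taken from the paper. The pattern is non-greedy, so each block is removed separately, and `re.DOTALL` lets it match blocks that span several lines.

```python
import re

# Same pattern as in the cell above. DOTALL lets "." match newlines,
# so blocks that span several lines are removed as well.
pattern = re.compile(r"<latexit.*?>.*?</latexit>", re.DOTALL)

# Made-up sample text (assumption), shaped like the residue being filtered.
sample = (
    'the loss <latexit sha1_base64="abc123">AAAB\n'
    'payload==</latexit>is minimized and '
    '<latexit sha1_base64="def456">CCCD</latexit>converges'
)

cleaned = pattern.sub("", sample)
print(cleaned)
```

The text between the two blocks survives, which would not be the case with a greedy `.*`.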

In [7]:
for d in data:
    print("==================")
    printw(d.page_content)
    print("==================")

LoFTR: Detector-Free Local Feature Matching with Transformers Jiaming Sun1;2Zehong Shen1Yuang
Wang1Hujun Bao1Xiaowei Zhou1y 1Zhejiang University2SenseTime Research Abstract We present a novel
method for local image feature matching. Instead of performing image feature detection, description,
and matching sequentially, we propose to ﬁrst establish pixel-wise dense matches at a coarse level
and later reﬁne the good matches at a ﬁne level. In contrast to dense methods that use a cost volume
to search corre- spondences, we use self and cross attention layers in Trans- former to obtain
feature descriptors that are conditioned on both images. The global receptive ﬁeld provided by
Trans- former enables our method to produce dense matches in low-texture areas, where feature
detectors usually strug- gle to produce repeatable interest points. The experiments on indoor and
outdoor datasets show that LoFTR outper- forms state-of-the-art methods by a large margin. LoFTR
also ranks ﬁrst on two pu
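
The extracted text above still contains typographic ligatures (such as the single-character "fi" in "first") and words split by line-break hyphens ("corre- spondences"). One possible cleanup step is sketched below. `normalize_extracted_text` is a hypothetical helper, not part of this notebook or of LangChain, and the hyphen rule is a heuristic that would also merge a genuine compound if it happened to be followed by a space.

```python
import re
import unicodedata


def normalize_extracted_text(text: str) -> str:
    """Hypothetical helper: tidy two common PDF text-extraction artifacts."""
    # NFKC maps compatibility characters such as the "fi" ligature (U+FB01)
    # to their plain-letter equivalents.
    text = unicodedata.normalize("NFKC", text)
    # Rejoin words split at a line break, e.g. "corre- spondences".
    # Only a hyphen followed by a space between lowercase letters is touched,
    # so "state-of-the-art" is left alone.
    text = re.sub(r"(?<=[a-z])- (?=[a-z])", "", text)
    return text


# "\ufb01" is the single-character "fi" ligature as it appears in the output above.
sample = "we propose to \ufb01rst establish matches and search corre- spondences"
print(normalize_extracted_text(sample))
```

Whether this helps retrieval quality has not been tested here; it mainly makes the chunks easier to read and to match with plain-text searches.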

In [10]:
# Split the data into smaller chunks

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=500)
docs = text_splitter.split_documents(data)

print(f"There are now {len(docs)} documents")

There are now 46 documents
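
With `chunk_size=1500` and `chunk_overlap=500`, neighbouring chunks can share up to 500 characters, so a sentence that straddles a chunk boundary can still appear whole in one of the two chunks. The sketch below illustrates the overlap idea with a plain fixed-size sliding window. It is a simplification under stated assumptions: `sliding_window_chunks` is a hypothetical helper, and `RecursiveCharacterTextSplitter` itself prefers to split on separators such as paragraph breaks, newlines, and spaces rather than at fixed offsets.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: fixed-size chunks whose neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

Each printed chunk starts with the last four characters of the previous one, which is the behaviour the `chunk_overlap` parameter is meant to provide.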


In [11]:
# Show the chunks
for d in docs:
    print("==================")
    print(d.metadata)
    printw(d.page_content)
    print("==================")

{'source': 'data/LoFTR_paper.pdf', 'page': 0}
LoFTR: Detector-Free Local Feature Matching with Transformers Jiaming Sun1;2Zehong Shen1Yuang
Wang1Hujun Bao1Xiaowei Zhou1y 1Zhejiang University2SenseTime Research Abstract We present a novel
method for local image feature matching. Instead of performing image feature detection, description,
and matching sequentially, we propose to ﬁrst establish pixel-wise dense matches at a coarse level
and later reﬁne the good matches at a ﬁne level. In contrast to dense methods that use a cost volume
to search corre- spondences, we use self and cross attention layers in Trans- former to obtain
feature descriptors that are conditioned on both images. The global receptive ﬁeld provided by
Trans- former enables our method to produce dense matches in low-texture areas, where feature
detectors usually strug- gle to produce repeatable interest points. The experiments on indoor and
outdoor datasets show that LoFTR outper- forms state-of-the-art methods by a

In [12]:
# Show only the chunks that mention "batch size"
for d in docs:
    if "batch size" in d.page_content:
        print("==================")
        print(d.metadata)
        printw(d.page_content)
        print("==================")

{'source': 'data/LoFTR_paper.pdf', 'page': 4}
c: Lc= 1 jMgt cjX (~i;~j)2Mgt clogPc ~i;~j : Fine-level Supervision. We use the`2loss for ﬁne-level
reﬁnement. Following [50], for each query point ^i, we also measure its uncertainty by calculating
the total variance 2(^i)of the corresponding heatmap. The target is to opti- mize the reﬁned
position that has low uncertainty, resulting in the ﬁnal weighted loss function: Lf=1 jMfjX
(^i;^j0)2M f1 2(^i)   ^j0 ^j0 gt    2; in which ^j0 gtis calculated by warping each ^ifrom ^FA
tr(^i)to ^FB tr(^j)with the ground-truth camera pose and depth. We ignore ( ^i,^j0) if the warped
location of ^ifalls out of the local window of ^FB tr(^j)when calculatingLf. The gradient is not
backpropagated through 2(^i)during training. 3.6. Implementation Details We train the indoor model
of LoFTR on the ScanNet [7] dataset and the outdoor model on the MegaDepth [21] fol- lowing [37]. On
ScanNet, the model is trained using Adam with an initial learning rate of 1

In [13]:
# Initialize the embeddings class

from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
embeddings

OpenAIEmbeddings(client=<class 'openai.api_resources.embedding.Embedding'>, model='text-embedding-ada-002', deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=None, openai_organization=None, allowed_special=set(), disallowed_special='all', chunk_size=1000, max_retries=6, request_timeout=None, headers=None)

In [14]:
# Initialize the Chroma vectorstore

from langchain.vectorstores import Chroma

docsearch = Chroma.from_documents(docs, embeddings)
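
Conceptually, the vector store keeps one embedding per chunk and answers a query by returning the chunks whose embeddings lie closest to the query embedding. The sketch below shows that idea with made-up three-dimensional vectors and cosine similarity. The vectors, the texts, and the `top_k` helper are assumptions for illustration; Chroma uses its own index and a configurable distance metric, so this is not a description of its internals.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Hypothetical helper: return the k texts most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings", made up for illustration.
store = [
    ("training details", [0.9, 0.1, 0.0]),
    ("related work", [0.0, 1.0, 0.1]),
    ("ablation study", [0.5, 0.5, 0.5]),
]
query = [1.0, 0.0, 0.0]
print(top_k(query, store, k=2))
```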

In [15]:
retriever = docsearch.as_retriever()
# Settings from https://python.langchain.com/en/latest/use_cases/code/code-analysis-deeplake.html
# Explanation of MMR: https://python.langchain.com/en/latest/modules/prompts/example_selectors/examples/mmr.html
# retriever.search_kwargs['distance_metric'] = 'cos'
# retriever.search_kwargs['maximal_marginal_relevance'] = True
# retriever.search_kwargs['fetch_k'] = 20  # Number of Documents to fetch to pass to MMR algorithm
# retriever.search_kwargs['k'] = 5  # Number of Documents to return
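
The commented-out settings refer to maximal marginal relevance (MMR), which trades relevance to the query against redundancy among the chunks already selected. A minimal greedy version is sketched below with made-up two-dimensional vectors. The `mmr` function and its `lambda_mult` parameter are assumptions for illustration, not LangChain's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mmr(query_vec, candidates, k=2, lambda_mult=0.5):
    """Hypothetical helper: greedy MMR over (text, vector) candidates."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            relevance = cosine_similarity(query_vec, item[1])
            # Penalize similarity to the most similar chunk picked so far.
            redundancy = max(
                (cosine_similarity(item[1], chosen[1]) for chosen in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [text for text, _ in selected]


# Toy vectors, made up for illustration: two near-duplicates and one different chunk.
candidates = [
    ("chunk A", [0.9, 0.1]),
    ("chunk A near-duplicate", [1.0, 0.0]),
    ("chunk B", [0.6, 0.8]),
]
query = [1.0, 0.2]
print(mmr(query, candidates, k=2))
print(mmr(query, candidates, k=2, lambda_mult=1.0))
```

With `lambda_mult=1.0` the redundancy term vanishes and the selection reduces to plain similarity ranking, which returns both near-duplicates; with `0.5` the second pick is the less similar but more diverse chunk.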

In [17]:
use_chat_model = True


if use_chat_model:
    # GPT3.5-turbo
    from langchain.chat_models import ChatOpenAI
    from langchain.chains import ConversationalRetrievalChain

    llm = ChatOpenAI(model_name='gpt-3.5-turbo', temperature=0) # 'ada' 'gpt-3.5-turbo' 'gpt-4',
    qa = ConversationalRetrievalChain.from_llm(llm, retriever=retriever, return_source_documents=True)

    print("Loaded GPT3.5-turbo model.")
else:
    # GPT3
    from langchain.llms import OpenAI
    from langchain.chains.question_answering import load_qa_chain
    from langchain.chains import RetrievalQA

    llm = OpenAI(temperature=0)
    # qa = load_qa_chain(llm)
    qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever, return_source_documents=True)

    print("Loaded GPT3 model.")

Loaded GPT3.5-turbo model.


### Ask questions

In [18]:
questions = [
    # Questions with good answers
    # "What is the LoFTR model?",
    # "Where can I find the Github source code?",
    # "Where can I find the Github source code for LoFTR?. Quote the part of the paper where they mention it.",
    # "What learning rate do they use for training LoFTR?",
    # "Why is the use of Transformer networks important for LoFTR?",
    "What learning rate is used for training?",
    # "What batch size is used for training?",

    # Questions with bad answers
    # "Who is the first author of the LoFTR paper? It's next to the paper title.", 
    # "Who are the authors of the paper?", 
    # "On what page is the conclusion?"  # does the llm get this information?
]

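def ask_question(question, use_chat_model=False, chat_history=None):
    # Avoid a mutable default argument: a shared default list would silently
    # carry the chat history over between unrelated calls.
    if chat_history is None:
        chat_history = []
    if use_chat_model:
        # ConversationalRetrievalChain takes the question plus earlier (question, answer) turns
        response = qa({"question": question, "chat_history": chat_history})
        answer = response["answer"]
        chat_history.append((question, answer))
    else:
        # RetrievalQA takes the question under the "query" key
        # response = qa.run(input_documents=matched_docs, question=question)
        response = qa({"query": question})
        answer = response["result"]
    return answer, response["source_documents"]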


for question in questions:
    print(f"-> **Question**: {question} \n")
    print("**Answer**:")

    answer, matched_docs = ask_question(question, use_chat_model=use_chat_model)

    print("\n".join(answer[i:i+100] for i in range(0, len(answer), 100)))
    print("\n**Sources**:")
    # Print short sources
    for i, doc in enumerate(matched_docs):
        print(f" {i+1}. Page {doc.metadata['page']+1}\n{doc.page_content[:150]}\n")
    # Print full sources
    for i, doc in enumerate(matched_docs):
        print(f" {i+1}. Page {doc.metadata['page']+1}\n{doc.page_content}\n")
    break

-> **Question**: What learning rate is used for training? 

**Answer**:
The indoor model of LoFTR is trained using Adam with an initial learning rate of 1e-3 and a batch si
ze of 64.

**Sources**:
 1. Page 5
c:
Lc= 1
jMgt
cjX
(~i;~j)2Mgt
clogPc ~i;~j
:
Fine-level Supervision. We use the`2loss for ﬁne-level
reﬁnement. Following [50], for each query point ^

 2. Page 5
1080Ti GPUs. The local feature CNN uses a modiﬁed ver-
sion of ResNet-18 [12] as the backbone. The entire model
is trained end-to-end with randomly in

 3. Page 2
tures have achieved good performances. SIFT [26] and
ORB [35] are arguably the most successful hand-crafted
local features and are widely adopted in m

 4. Page 7
2)1/16coarse-resolution + 1/4ﬁne-resolution 16.75 34.82 54.0
3) positional encoding per layer 18.02 35.64 52.77
4) larger model with Nc= 8;Nf= 2 20.87

 1. Page 5
c:
Lc= 1
jMgt
cjX
(~i;~j)2Mgt
clogPc ~i;~j
:
Fine-level Supervision. We use the`2loss for ﬁne-level
reﬁnement. Following [50], for each que
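
The answer printed above is broken in the middle of a word ("batch si" / "ze") because the loop slices the string every 100 characters. `textwrap.fill`, which `printw` already wraps, breaks on whitespace instead. The sketch below compares both approaches on the recorded answer; it is only a comparison and does not change the output recorded above.

```python
import textwrap

answer = (
    "The indoor model of LoFTR is trained using Adam with an initial "
    "learning rate of 1e-3 and a batch size of 64."
)

# Fixed-width slicing, as in the loop above: may cut inside a word.
sliced = "\n".join(answer[i:i + 100] for i in range(0, len(answer), 100))

# Whitespace-aware wrapping, as done by printw().
wrapped = textwrap.fill(answer, width=100)

print(sliced)
print()
print(wrapped)
```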