In [1]:
import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv())
openai_api_key = os.environ["OPENAI_API_KEY"]

## Basic app for QA on a code library or a GitHub repository

**Load the GitHub repo**
<br>
We will load the code of the GitHub repo The Fuzz, a small Python module for string matching.

In [2]:
root_dir = "data/thefuzz-master"

In [3]:
document_chunks = []

In [4]:
from langchain.document_loaders import TextLoader

In [5]:
for dirpath, dirnames, filenames in os.walk(root_dir):
    for file in filenames:
        try:
            loader = TextLoader(
                os.path.join(dirpath, file),
                encoding="utf-8"
            )
            document_chunks.extend(loader.load_and_split())
        except Exception:
            # Skip files that cannot be loaded as UTF-8 text (e.g. binary files).
            pass
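
The loop above tries every file in the repository and relies on the `except` clause to drop anything unreadable. One possible refinement, sketched below, is to select source files by extension before loading them. The helper name `iter_source_files` and the default extension tuple are assumptions made for illustration; they are not part of the original notebook.

```python
import os
from typing import Iterator, Tuple


def iter_source_files(
    root_dir: str, extensions: Tuple[str, ...] = (".py",)
) -> Iterator[str]:
    """Yield paths under root_dir whose file names end with one of the extensions."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            # str.endswith accepts a tuple, so several extensions can be allowed.
            if name.endswith(extensions):
                yield os.path.join(dirpath, name)
```

Each yielded path could then be passed to `TextLoader` in place of `os.path.join(dirpath, file)`.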

In [6]:
print(f"We have {len(document_chunks)} chunks.")

We have 170 chunks.
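
`load_and_split` hands the splitting over to a LangChain text splitter, so a single file can contribute several chunks. The sketch below is a deliberately simplified fixed-size splitter with overlap, written in plain Python to show the idea. It is not LangChain's implementation, and the function name `split_with_overlap` and the sizes used are hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters sharing `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij"
print(split_with_overlap(sample, chunk_size=4, overlap=1))
# ['abcd', 'defg', 'ghij']
```

The overlap means that text near a chunk boundary appears in both neighbouring chunks, which helps when a relevant passage straddles the boundary.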


In [7]:
print(document_chunks[0].page_content[:300])

import unittest
import re
import pycodestyle

from thefuzz import fuzz
from thefuzz import process
from thefuzz import utils

scorers = [
    fuzz.ratio,
    fuzz.partial_ratio,
    fuzz.token_sort_ratio,
    fuzz.token_set_ratio,
    fuzz.partial_token_sort_ratio,
    fuzz.partial_token_set_ratio,



**Convert the text chunks into embeddings and store them in a vector database**

In [8]:
from langchain.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

In [9]:
embeddings = OpenAIEmbeddings()


In [10]:
stored_embeddings = FAISS.from_documents(document_chunks, embeddings)
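
Conceptually, the vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings lie closest to the embedding of the question. The sketch below illustrates that lookup with a brute-force Euclidean search over toy 2-D vectors. The vectors, the chunk labels, and the helper `nearest_chunks` are invented for illustration; real embeddings have far more dimensions, and FAISS performs the search in optimized native code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def nearest_chunks(
    query_vec: Sequence[float],
    store: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k labels whose vectors are closest to query_vec (Euclidean distance)."""
    scored = [(math.dist(query_vec, vec), label) for label, vec in store.items()]
    scored.sort()  # smallest distance first
    return [(label, dist) for dist, label in scored[:k]]


# Toy "embeddings": the labels and coordinates are made up.
toy_store = {
    "chunk about scoring": (0.9, 0.1),
    "chunk about best match": (0.2, 0.8),
    "chunk about installation": (0.5, 0.5),
}

for label, dist in nearest_chunks((0.1, 0.9), toy_store, k=2):
    print(f"{label}: {dist:.3f}")
```

In the notebook, the retriever created from `stored_embeddings` plays the role of `nearest_chunks`.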

**Create the RetrievalQA chain**

In [11]:
from langchain_openai import ChatOpenAI

In [12]:
chat_model = ChatOpenAI()

In [13]:
from langchain.chains import RetrievalQA

In [14]:
qa_chain = RetrievalQA.from_chain_type(
    llm=chat_model,
    chain_type="stuff",
    retriever=stored_embeddings.as_retriever()
)
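
With `chain_type="stuff"`, the retrieved chunks are all placed ("stuffed") into a single prompt together with the question, and that prompt is sent to the chat model in one call. The sketch below mimics that assembly step in plain Python. The template wording and the function `build_stuff_prompt` are assumptions for illustration and do not reproduce LangChain's actual prompt.

```python
from typing import List


def build_stuff_prompt(question: str, chunks: List[str]) -> str:
    """Concatenate all retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "Which function finds the best match?",
    ["def first_example(): ...", "def second_example(): ..."],  # placeholder chunks
)
print(prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit within the model's context window.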

**Now we can ask questions about the GitHub library**

In [15]:
question = """
What function do I use if I want to find 
the most similar item in a list of items?
"""

In [16]:
answer = qa_chain.invoke({"query": question})["result"]


In [17]:
print(answer)

You can use the `process.extractOne()` function from the `thefuzz` library to find the most similar item in a list of items. This function takes a query string and a list of choices, and it returns a tuple containing the best match and its similarity score. Here's an example of how to use it:

```python
from thefuzz import process

choices = ["apple", "banana", "orange"]
query = "aple"

best_match = process.extractOne(query, choices)
print(best_match)
```

Output:
```
('apple', 80)
```

In this example, the best match for the query "aple" is "apple" with a similarity score of 80.
