# Intro to LangChain (CoLab version)

LangChain is a Python framework for developing applications using language models. It abstracts the connection between applications and LLMs, allowing a loose coupling between code and specific providers like GoogleLLM, Anthropic, etc.

This guide details how to get started with LangChain and walks through setting up a Q&A over Documentation example. The files for this guide can be found here.

## Creating a knowledge worker
Instead of training an LLM on your own data (i.e. fine-tuning), it is often easier and cheaper to adapt the LLM to your use-case through prompt engineering alone, supplying the relevant knowledge in the prompt at query time.

A custom knowledge worker (an LLM app which only has access to specific knowledge, such as technical documentation) is built in two stages: (1) knowledge embedding and (2) LLM Q&A.

![A typical ingestion chain](assets/typical-ingestion-chain.png)

First, documents (website pages, Word documents, databases, PowerPoints, etc.) are loaded and split into chunks. Chunking is important for three reasons:

1. There are technical restrictions on how much data (tokens) can be fed into an LLM at once, meaning all context + prompt engineering + chat history + new query must fit within the token limit (e.g. ~4096 tokens for GoogleLLM).
2. Even if it were possible to provide entire documents as context for a query, most LLM APIs operate on a per-token pricing model, so it is cost-effective to limit the amount of data provided to the LLM per query.
3. Context should be relevant to the user query, so it is optimal to provide only the relevant snippets from documents, making the answer more relevant whilst saving costs as per (1) and (2).
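
The token budget described above can be made concrete with a little arithmetic. The sketch below is illustrative only: the per-component token counts are hypothetical, and the helper splits on whitespace-separated words as a rough stand-in for tokens (a real tokenizer would give different counts).

```python
from typing import List

# Hypothetical token counts, chosen only to illustrate the arithmetic.
TOKEN_LIMIT = 4096      # context window quoted above
PROMPT_TOKENS = 300     # prompt engineering / instructions
HISTORY_TOKENS = 500    # earlier turns of the conversation
QUERY_TOKENS = 50       # the new user question


def context_budget() -> int:
    """Return the number of tokens left over for document snippets."""
    return TOKEN_LIMIT - (PROMPT_TOKENS + HISTORY_TOKENS + QUERY_TOKENS)


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Slice text into consecutive chunks of at most `chunk_size` words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


budget = context_budget()
chunks = split_into_chunks("word " * 2500, chunk_size=1000)
chunk_lengths = [len(chunk.split()) for chunk in chunks]
print(budget)         # 3246
print(chunk_lengths)  # [1000, 1000, 500]
```

With these assumed numbers, roughly three 1000-word chunks would fit alongside the rest of the prompt, which is why a whole website cannot simply be pasted into a single query.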

Next, these document chunks are embedded and stored in a vector store. Embedding a document means mapping it to a point in a multi-dimensional space, which can then be searched to find the documents relevant to a user query. Relevancy scoring can be as simple as a k-nearest-neighbours search, since documents that are similar (as perceived by the embedding model) lie close together in the search space.
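
To make the idea of proximity in the embedding space concrete, here is a minimal sketch of a nearest-neighbour lookup using cosine similarity. The three-dimensional vectors and document titles are invented for illustration; a real embedding model returns much longer vectors, and a vector store such as Chroma performs this search for you.

```python
import math
from typing import Dict, List, Tuple

# Hand-written toy "embeddings" (hypothetical values, not model output).
TOY_STORE: Dict[str, List[float]] = {
    "Installing the CLI": [0.9, 0.1, 0.0],
    "Configuring a project": [0.7, 0.3, 0.1],
    "Company holiday policy": [0.0, 0.2, 0.9],
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vector: List[float], k: int) -> List[Tuple[str, float]]:
    """Return the k stored documents most similar to the query vector."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in TOY_STORE.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# A query vector pointing mostly along the first axis.
top = nearest([1.0, 0.0, 0.0], k=2)
top_titles = [title for title, _ in top]
print(top_titles)  # ['Installing the CLI', 'Configuring a project']
```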

![A typical query chain](assets/typical-query-chain.png)

Once the vector store is created, a user can query the knowledge base using natural language questions. Documents relevant to the query are found by embedding the user query and retrieving its nearest neighbours from the vector store. These snippets of text are provided to the LLM (alongside the user query, chat history, prompt engineering, etc.), which uses the information to generate an answer.
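
The final step, handing the retrieved snippets to the LLM, amounts to assembling one large prompt string. The sketch below shows one plausible layout; the wording of the instructions and the `build_prompt` helper are assumptions made for illustration and are not the prompt LangChain uses internally.

```python
from typing import List, Tuple


def build_prompt(
    question: str,
    snippets: List[str],
    chat_history: List[Tuple[str, str]],
) -> str:
    """Combine retrieved snippets, earlier turns and the new question."""
    # Number the snippets so an answer could refer back to its sources.
    context_text = "\n\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(snippets, start=1)
    )
    history_text = "\n".join(
        f"User: {user}\nAssistant: {bot}" for user, bot in chat_history
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt(
    question="How do I set the target project?",
    snippets=["Use `gcloud config set project` to change the project."],
    chat_history=[("What is the gcloud CLI?", "A command line interface.")],
)
print(prompt)
```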

## Prerequisites


In [None]:
from __future__ import annotations

import logging
import os
from getpass import getpass
from typing import Any, Dict, List, Mapping, Optional

import gradio as gr
from langchain.chains import ConversationalRetrievalChain
from langchain.document_loaders import DirectoryLoader, UnstructuredHTMLLoader
from langchain.embeddings.base import Embeddings
from langchain.prompts import PromptTemplate
from langchain.llms.base import LLM
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from pydantic import BaseModel, root_validator
from vertexai.preview.language_models import TextGenerationModel, TextEmbeddingModel


### Running the program

Run this notebook using poetry:
```bash
poetry shell
poetry run jupyter notebook langchain_demo.ipynb
```

### Install dependencies

In [None]:
%pip install langchain chromadb tiktoken gradio unstructured tqdm google-cloud-aiplatform

### Setup LLM API Token
To complete this demo you will need a [GoogleLLM API Key](https://platform.GoogleLLM.com/account/api-keys).
Run the code block below to set a temporary environment variable containing your API token.

In [None]:
# Prompt notebook user for their API key
GoogleLLM_API_KEY = getpass("GoogleLLM API Key:")
# Save that key as an environment variable for later use.
os.environ["GoogleLLM_API_KEY"] = GoogleLLM_API_KEY



## Task 01: Q&A over Documentation with GoogleLLM

Creating a custom knowledge worker is the “Hello World!” of LLMOps. Loading documents, creating embeddings, storing in a vector database and using an LLM to answer queries with knowledge from that database can be achieved in a few lines of Python.

### Document loading
First, we need to select our documentation. LangChain supports [numerous methods](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) for loading documents. As we will be constructing our knowledge worker using website data, we first download the files locally using `wget`. Note: replace `https://datatonic.com/` with any domain to build a knowledge worker specific to your use-case.

In [None]:
!wget -r -A.html https://datatonic.com/

Note how the filepaths to the `.html` files form proper URL paths. We'll use this later to reference our answers with proper hyperlinks.
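
As a small illustration of why this matters, the sketch below turns a downloaded file path back into a URL and wraps it in a markdown link. The `source_to_url` and `as_markdown_link` helpers and the example path are hypothetical, and the `https` scheme is assumed.

```python
from pathlib import PurePosixPath


def source_to_url(source_path: str, scheme: str = "https") -> str:
    """Rebuild a URL from a path such as 'datatonic.com/blog/index.html'."""
    # PurePosixPath drops a leading './' and collapses repeated slashes.
    parts = PurePosixPath(source_path).parts
    return f"{scheme}://" + "/".join(parts)


def as_markdown_link(url: str) -> str:
    """Format a URL as a clickable markdown link that shows the URL itself."""
    return f"[{url}]({url})"


# Hypothetical path, as wget would lay it out on disk.
url = source_to_url("./datatonic.com/blog/index.html")
print(url)                    # https://datatonic.com/blog/index.html
print(as_markdown_link(url))
```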

The downloaded website can then be loaded as documents using the LangChain [`DirectoryLoader`](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/directory_loader.html) and [`UnstructuredHTMLLoader`](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/html.html) loaders.

We load the entire collection of documents at once using this method, but it is also possible to load and embed documents iteratively, as we will do in task 2.

In [None]:
def load_documents(source_dir):
    # Load the documentation using a HTML parser
    loader = DirectoryLoader(
        source_dir,
        glob="**/*.html",
        loader_cls=UnstructuredHTMLLoader,
        show_progress=True,
    )
    documents = loader.load()

    return documents


If you wanted to build a knowledge worker with another document type, for instance [Microsoft Word](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/microsoft_word.html) documents, you would update the `load_documents()` function according to the documentation for that document type loader.

E.g.:
```python
from langchain.document_loaders import Docx2txtLoader

def load_documents():
    # Load the documentation using a Microsoft Word parser
    loader = Docx2txtLoader("example_data/fake.docx")
    documents = loader.load()

    return documents
```

### Creating or loading embeddings
Creating embeddings each time we use our app is time-consuming and expensive. By persisting the vector store database after embedding, we can load the saved embeddings for use in another session.

Also note the use of the `GooglePalmEmbeddings` wrapper defined below. There are [multiple](https://python.langchain.com/en/latest/modules/models/text_embedding.html?highlight=embedding) text embedding models available, many of which can be directly substituted here. Remember to add additional environment variables for different API keys, as per each model's documentation.
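
The wrapper in the following cell sends texts to the embedding API in batches of `BATCH_SIZE` and then flattens the per-batch results back into one list. The batching logic on its own can be sketched as below; `fake_embed` is a made-up stand-in for the real API call so that the example runs offline.

```python
from typing import List

BATCH_SIZE = 5  # same batch size as the wrapper class uses


def batched(items: List[str], size: int) -> List[List[str]]:
    """Split a list into consecutive sub-lists of at most `size` items."""
    return [items[i : i + size] for i in range(0, len(items), size)]


def fake_embed(batch: List[str]) -> List[List[float]]:
    """Stand-in for an embedding API call: one 1-d vector per text."""
    return [[float(len(text))] for text in batch]


texts = [f"chunk {n}" for n in range(12)]
batches = batched(texts, BATCH_SIZE)
# Flatten the per-batch results so there is exactly one vector per text.
embeddings = [vector for batch in batches for vector in fake_embed(batch)]

batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)      # [5, 5, 2]
print(len(embeddings))  # 12
```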

In [None]:
BATCH_SIZE = 5


class GooglePalmEmbeddings(BaseModel, Embeddings):
    model_name: str = "textembedding-gecko@001"
    # Vertex AI embedding client, populated by the validator below
    client: Any = None

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        """Loads the Vertex AI embedding model named by `model_name`."""
        values["client"] = TextEmbeddingModel.from_pretrained(
            values["model_name"])
        return values

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        """Embed a list of strings.
        Args:
            texts: List[str] The list of strings to embed.
        Returns:
            List of embeddings, one for each text.
        """
        logging.info(
            "API calls restricted to 5 instances per call, batching documents to embed..."
        )
        texts_batched = [
            texts[i: i + BATCH_SIZE] for i in range(0, len(texts), BATCH_SIZE)
        ]
        embeddings = [self.client.get_embeddings(x) for x in texts_batched]
        logging.info("Embeddings received!")
        return [el.values for batch in embeddings for el in batch]

    def embed_query(self, text: str) -> List[float]:
        """Embed a text.
        Args:
            text: The text to embed.
        Returns:
            Embedding for the text.
        """
        embeddings = self.client.get_embeddings([text])
        return embeddings[0].values


In [None]:
def embed_documents(persist_dir, texts):
    # We use the Google PaLM embeddings model, however other models can be substituted here
    embeddings = GooglePalmEmbeddings()
    # We create a vector store database relating documents to embeddings
    # This embedding database is used to relate user queries to relevant documentation
    vector_store = Chroma.from_documents(
        persist_directory=persist_dir,
        documents=texts,
        embedding=embeddings,
    )

    vector_store.persist()

    return vector_store


In [None]:
def load_embeddings(persist_dir):
    # We use the Google PaLM embeddings model, however other models can be substituted here
    embeddings = GooglePalmEmbeddings()

    # Creating embeddings with each re-run is highly inefficient and costly.
    # We instead aim to embed once, then load these embeddings from storage.
    vector_store = Chroma(
        embedding_function=embeddings,
        persist_directory=persist_dir,
    )

    return vector_store


In [None]:
# If the vectorstore embedding database has been created, load it
persist_dir = "chromadb"
if os.path.isdir(persist_dir):
    print(f"Loading {persist_dir} as vector store")
    vector_store = load_embeddings(persist_dir=persist_dir)
# If it does not exist, create it
else:
    print(f"Creating new vector store in dir {persist_dir}")
    documents = load_documents(source_dir="datatonic.com")

    # Individual documents will often exceed the 4096 token limit of the LLM.
    # We split documents into chunks of at most 1000 characters
    # (the splitter measures length in characters by default, not tokens).
    # These chunks fit into the token limit alongside the user prompt
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    vector_store = embed_documents(persist_dir=persist_dir, texts=texts)

### Creating the Conversational Q&A Chain

In [None]:
# The 'k' value indicates the number of sources to use per query.
# 'k' as in 'k-nearest-neighbours' to the query in the embedding space.
# 'temperature' is the degree of randomness introduced into the LLM response.
k = 2
temperature = 0.0


#### Prompt engineering
Prompt engineering steers a large language model without changing its weights. By prompting an LLM with contextual information about its purpose, the model can simulate a variety of roles, such as a customer assistant chatbot, a document summariser, a translator, etc.

In this use-case, we prompt our model to respond as a conversational Q&A chatbot. Prompt engineering can be especially useful for introducing guard rails to an application - in this template we tell the model to not respond to queries it lacks the information to answer, as users will trust the application to provide factual replies, so rejecting a query is preferable to outputting false information.
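
Mechanically, a prompt template is a string with named placeholders that are filled in for each query. The sketch below uses plain `str.format` with a shortened, hypothetical template (`GUARDED_TEMPLATE`) to show where the guard-rail instruction, chat history and question end up. It does not call an LLM and is not the template used in this notebook.

```python
from typing import List, Tuple

# Shortened, hypothetical template with a single guard-rail instruction.
GUARDED_TEMPLATE = """\
You answer questions about company documentation.
If insufficient information exists to answer a query, respond with "I don't know".

Chat History:
{chat_history}
Question: {question}
"""


def render(question: str, chat_history: List[Tuple[str, str]]) -> str:
    """Fill the template placeholders for a single query."""
    history_text = "\n".join(
        f"Human: {user}\nAssistant: {bot}" for user, bot in chat_history
    )
    return GUARDED_TEMPLATE.format(chat_history=history_text, question=question)


rendered = render(
    question="Who won the 1998 World Cup?",
    chat_history=[("Hi", "Hello! How can I help?")],
)
print(rendered)
```

Because the guard-rail sentence is part of every rendered prompt, the model sees the instruction on each call rather than only at the start of a conversation.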

In [None]:
template = """\
You are a helpful chatbot designed to perform Q&A on a set of documents.
Always respond to users with friendly and helpful messages.
Your goal is to answer user questions using relevant sources.

You were developed by Datatonic, and are powered by Google's PaLM-2 model.

In addition to your implicit model world knowledge, you have access to the following data sources:
- Company documentation.

If a user query is too vague, ask for more information.
If insufficient information exists to answer a query, respond with "I don't know".
NEVER make up information.

Chat History:
{chat_history}
Question: {question}
"""

SYSTEM_PROMPT = PromptTemplate.from_template(template)


In [None]:
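class GoogleLLM(LLM):
    # Model name options {text-bison-alpha, text-bison@001}
    model_name: str = "text-bison@001"
    # Note: the model is loaded once, when the class is defined, so changing
    # `model_name` on an instance does not change the underlying model.
    _llm = TextGenerationModel.from_pretrained(model_name)
    max_output_tokens: int = 256
    temperature: float = 0.3
    top_p: float = 0.8
    top_k: int = 40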

    @property
    def _llm_type(self) -> str:
        """Return type of llm"""
        return "google"
    
    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        """Get the identifying parameters."""
        return {
            "temperature": self.temperature,
            "max_output_tokens": self.max_output_tokens,
            "top_p": self.top_p,
            "top_k": self.top_k
        }
    
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        text = str(self._llm.predict(
            prompt, 
            max_output_tokens=self.max_output_tokens, 
            temperature=self.temperature, 
            top_p=self.top_p, 
            top_k=self.top_k,
        ))

        return text

In [None]:
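# A vector store retriever relates queries to embedded documents.
# 'k' is passed through `search_kwargs` so that it reaches the similarity search.
retriever = vector_store.as_retriever(search_kwargs={"k": k})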

# The selected GoogleLLM model uses embedded documents related to the query
# It parses these documents in order to answer the user question.
# We use the GoogleLLM LLM, however other models can be substituted here
model = GoogleLLM(temperature=temperature)

# A conversation retrieval chain keeps a history of Q&A / conversation
# This allows for contextual questions such as "give an example of that (previous response)".
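# The chain is also set to return the source documents used in generating an output
# This allows for explainability behind model output.
# Note: `condense_question_prompt` is the prompt the chain uses to rewrite a follow-up
# question, together with the chat history, into a standalone question before retrieval.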
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=retriever,
    return_source_documents=True,
    condense_question_prompt=SYSTEM_PROMPT,
)

### Prompt engineering and Chat History

In [None]:
def q_a(question: str, history: list):
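    # Map history (list of lists) to the expected format of chat_history (list of tuples).
    # A list is built rather than a lazy `map` object, which could only be iterated once.
    chat_history = [tuple(turn) for turn in history]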
    
    # Query the LLM to get a response
    # First the Q&A chain will collect documents semantically similar to the question
    # Then it will ask the LLM to use this data to answer the user question
    # We also provide chat history as further context
    response = qa_chain(
        {
            "question": question,
            "chat_history": chat_history,
        }
    )

    # Format source documents (sources of excerpts passed to the LLM) into links the user can validate
    sources = [
        "[{0}]({0})".format(doc.metadata["source"])
        for doc in response["source_documents"]
    ]

    # Return the LLM answer, and list of sources used (formatted as a string)
    return response["answer"], "\n\n".join(sources)


### Building a simple Gradio UI
As building with Gradio is outside the scope of this workshop, a template Gradio app has been provided.

In [None]:
# Build a simple Gradio app that accepts user input and queries the LLM
# Then displays the response in a ChatBot interface, with markdown support.
with gr.Blocks(theme=gr.themes.Base()) as demo:

    def submit(msg, chatbot):
        # First create a new entry in the conversation log
        msg, chatbot = user(msg, chatbot)
        # Then get the chatbot response to the user question
        chatbot = bot(chatbot)
        return msg, chatbot

    def user(user_message, history):
        # Return "" to clear the user input, and add the user question to the conversation history
        return "", history + [[user_message, None]]

    def bot(history):
        # Get the user question from conversation history
        user_message = history[-1][0]
        # Get the response and sources used to answer the user question
        bot_message, bot_sources = q_a(user_message, history[:-1])

        # Using a template, format the response and sources together
        bot_template = (
            "{0}\n\n<details><summary><b>Sources</b></summary>\n\n{1}</details>"
        )
        # Place the response into the conversation history and return
        history[-1][1] = bot_template.format(bot_message, bot_sources)
        return history

    # Set a page title
    gr.Markdown("# Custom knowledge worker")
    # Create a chatbot conversation log
    chatbot = gr.Chatbot(label="🤖 knowledge worker")
    # Create a textbox for user questions
    msg = gr.Textbox(
        label="👩‍💻 user input", info="Query information from the custom knowledge base."
    )

    # Align both buttons on the same row
    with gr.Row():
        send = gr.Button(value="Send", variant="primary").style(size="sm")
        clear = gr.Button(value="Clear History", variant="secondary").style(size="sm")

    # Submit message on <enter> or clicking "Send" button
    msg.submit(submit, [msg, chatbot], [msg, chatbot], queue=False)
    send.click(submit, [msg, chatbot], [msg, chatbot], queue=False)

    # Clear chatbot history on clicking "Clear History" button
    clear.click(lambda: None, None, chatbot, queue=False)

# Create a queue system so multiple users can access the page at once
demo.queue()
# Launch the webserver locally
demo.launch()


Testing our demo, we can query our knowledge base for answers (code emphasis added). Since we used a `ConversationalRetrievalChain`, we can also correct the model when it gives the wrong response and prompt it to fix its mistake, or ask for further detail on a previous response.

> 👩‍💻: how to install the gcloud cli
> 
> 🦜:  You can install the gcloud CLI by running the command `$ gcloud components update`.
> 
> 👩‍💻: that command requires gcloud to be installed, how can I install gcloud initially?
> 
> 🦜:  You can install gcloud initially by running the command '`pip install google-cloud`' in your terminal.
> 
> 👩‍💻: how can I set the target project
> 
> 🦜:  You can set the target project for the gcloud CLI by using the command `$ gcloud config set project my-new-default-project`.
> 
> 👩‍💻: what is the gcloud cli?
> 
> 🦜:  The gcloud CLI is a command line interface for Google Cloud Platform services.
> 
> 👩‍💻: explain the above in more detail
> 
> 🦜:  The gcloud CLI is a tool used to authenticate and configure credentials for Google Cloud services. It can be used to change the default project ID, update components, and authenticate the CLI itself.
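
Follow-ups such as "explain the above in more detail" only work because the earlier turns are passed back to the chain with each new question. The Gradio chat log stores turns as `[user, bot]` lists, with `None` for a reply that has not arrived yet, whereas the chain expects `(user, bot)` tuples. A minimal sketch of that conversion follows; the `to_chat_history` helper is hypothetical and not part of either library.

```python
from typing import List, Optional, Tuple


def to_chat_history(
    gradio_history: List[List[Optional[str]]],
) -> List[Tuple[str, str]]:
    """Convert [user, bot] pairs to tuples, skipping unanswered turns."""
    return [(user, bot) for user, bot in gradio_history if bot is not None]


gradio_history = [
    ["what is the gcloud cli?", "The gcloud CLI is a command line interface."],
    ["explain the above in more detail", None],  # reply still pending
]

chat_history = to_chat_history(gradio_history)
pending_question = gradio_history[-1][0]

print(chat_history)
print(pending_question)
```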

## Task 02: Extending the knowledge base
As mentioned, it is possible to extend a knowledge base with additional documents. This is useful for updating a knowledge base with new information without having to re-embed established knowledge from scratch.

In this example, we will add an additional knowledge source to our worker, allowing it to answer queries on multiple domains. 
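
The benefit of extending rather than rebuilding is that only new chunks need to be embedded. The toy index below illustrates the idea by skipping any text it has already stored, keyed on a content hash. `ToyVectorIndex` is an invented, in-memory stand-in: it makes no claim about how Chroma handles duplicates, so check the vector store's documentation before relying on similar behaviour.

```python
import hashlib
from typing import Callable, Dict, List


class ToyVectorIndex:
    """In-memory index that embeds each distinct text at most once."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._vectors: Dict[str, List[float]] = {}
        self.embed_calls = 0  # counts how often the (costly) embed step runs

    def add_texts(self, texts: List[str]) -> int:
        """Embed and store unseen texts; return how many were added."""
        added = 0
        for text in texts:
            key = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if key in self._vectors:
                continue  # already embedded in an earlier call
            self._vectors[key] = self._embed(text)
            self.embed_calls += 1
            added += 1
        return added

    def __len__(self) -> int:
        return len(self._vectors)


# A trivial embedding function so the example runs without any API.
index = ToyVectorIndex(embed=lambda text: [float(len(text))])
first = index.add_texts(["alpha", "beta"])
second = index.add_texts(["beta", "gamma"])  # "beta" is skipped
print(first, second, len(index), index.embed_calls)  # 2 1 3 3
```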


In [None]:
# Load another set of documents
# (replace "datatonic.com" with the directory containing the new knowledge source)
documents = load_documents(source_dir="datatonic.com")

# Individual documents will often exceed the token limit.
# We split documents into chunks of at most 1000 characters
# (the splitter measures length in characters by default, not tokens).
# These chunks fit into the token limit alongside the user prompt
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

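# Add the new chunks to the embedding vector store.
# The splitter returns Document objects, so we use `add_documents` rather than
# `add_texts`, which expects plain strings.
vector_store.add_documents(documents=texts)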

In [None]:
# Rebuild the Q&A chain

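# A vector store retriever relates queries to embedded documents.
# 'k' is passed through `search_kwargs` so that it reaches the similarity search.
retriever = vector_store.as_retriever(search_kwargs={"k": k})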
# The selected GoogleLLM model uses embedded documents related to the query
# It parses these documents in order to answer the user question.
# We use the GoogleLLM LLM, however other models can be substituted here
model = GoogleLLM(temperature=temperature)

# A conversation retrieval chain keeps a history of Q&A / conversation
# This allows for contextual questions such as "give an example of that (previous response)".
# The chain is also set to return the source documents used in generating an output
# This allows for explainability behind model output.
qa_chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=retriever,
    return_source_documents=True,
    condense_question_prompt=SYSTEM_PROMPT
)

Note: if you're looking to deploy a knowledge worker with several knowledge bases, consider a [Router Chain](https://python.langchain.com/en/latest/modules/chains/examples/multi_retrieval_qa_router.html), which combines several knowledge workers with discrete knowledge bases into a single chain that selects the best worker for each query.
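
The routing idea can be sketched without any LLM. The example below picks a knowledge base by counting keyword overlaps with the question. This is a deliberately crude stand-in, with invented knowledge base names and keywords; it is not how LangChain's router makes its choice.

```python
from typing import Dict, Set

# Hypothetical knowledge bases, each described by a few keywords.
KNOWLEDGE_BASES: Dict[str, Set[str]] = {
    "cloud-docs": {"gcloud", "project", "cli", "install"},
    "hr-handbook": {"holiday", "leave", "expenses", "payroll"},
}


def route(question: str, default: str = "cloud-docs") -> str:
    """Pick the knowledge base sharing the most keywords with the question."""
    words = set(question.lower().replace("?", " ").split())
    scores = {
        name: len(words & keywords)
        for name, keywords in KNOWLEDGE_BASES.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default when no keyword matches at all.
    return best if scores[best] > 0 else default


print(route("How do I install the gcloud CLI?"))  # cloud-docs
print(route("How much holiday leave do I get?"))  # hr-handbook
```

Each name returned by `route` would then map to its own retriever and Q&A chain, so that only one knowledge base is searched per query.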

## Task 03: Using a different LLM (?)


## Task 04: Loading custom documents