# Intro to LangChain (CoLab version)

LangChain is a Python framework for developing applications using language models. It abstracts the connection between applications and LLMs, allowing a loose coupling between code and specific providers like Google PaLM.

This guide details how to get started with LangChain and walks through setting up a knowledge worker. The files for this guide can be found here.

## Creating a knowledge worker
Instead of training an LLM on your own data (i.e. fine-tuning), it is often far easier and cheaper to adapt the LLM to your use-case through prompt engineering alone (i.e. tuning the prompt rather than the model).

A custom knowledge worker (an LLM app which only has access to specific knowledge, such as technical documentation) is described in two stages:

1. Knowledge embedding.
2. LLM Query & Answer (with sources).

![A typical ingestion chain](assets/typical-ingestion-chain.png)

First, documents (websites, Word documents, databases, PowerPoints, PDFs, etc.) are loaded and split into chunks. Fragmenting is important for three reasons:

1. There are technical restrictions on how much data (tokens) can be fed into an LLM at once, meaning the context + system prompt + chat history + user prompt must fit within the token limit.
2. Most LLM APIs operate on a per-token pricing model, meaning it is cost-effective to limit the size / amount of data provided to the LLM per query. 
3. Contextual information should be relevant to the user query, meaning it is optimal to provide only relevant snippets from documents. This makes the answer more relevant whilst respecting the limits and costs described in (1) and (2).
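
To make reason (1) concrete, here is a minimal sketch of a token-budget check. The `estimate_tokens` helper and its four-characters-per-token ratio are assumptions for illustration only; a real application would count tokens with the tokenizer that matches its model.

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Very rough token estimate (assumption: ~4 characters per token)."""
    return max(1, len(text) // chars_per_token) if text else 0


def fits_in_context(system_prompt, chat_history, context_chunks, user_prompt, token_limit):
    """Return (fits, total_tokens) for everything sent to the LLM in one query."""
    total = (
        estimate_tokens(system_prompt)
        + sum(estimate_tokens(turn) for turn in chat_history)
        + sum(estimate_tokens(chunk) for chunk in context_chunks)
        + estimate_tokens(user_prompt)
    )
    return total <= token_limit, total


# Two 4000-character chunks dominate the budget in this toy example
fits, total = fits_in_context(
    system_prompt="a" * 400,
    chat_history=["b" * 40, "c" * 40],
    context_chunks=["d" * 4000, "d" * 4000],
    user_prompt="e" * 80,
    token_limit=4096,
)
```

Smaller chunks leave more room for chat history and the user prompt, which is why documents are fragmented before embedding.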

Next, these document chunks are embedded within a vector store. Embedding a document means mapping it to a point in a multi-dimensional space, which can then be searched according to user queries to find relevant documents. Document relevancy scoring can be as simple as a k-nearest-neighbours search, since documents that are similar (as perceived by the embedding model) will be close together within the search space.
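
The nearest-neighbour idea can be sketched with plain Python. The three-dimensional vectors and file names below are made up for illustration, and cosine similarity is an assumed metric; real embedding models produce far larger vectors, and a vector store may use a different distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query_vector, store, k):
    """Return the names of the k stored vectors most similar to the query."""
    scored = sorted(
        store.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]


# Hypothetical embeddings: two finance-related pages and one unrelated page
toy_store = {
    "pricing.html": [0.9, 0.1, 0.0],
    "careers.html": [0.0, 0.2, 0.9],
    "billing.html": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
nearest = k_nearest(query, toy_store, k=2)  # ['pricing.html', 'billing.html']
```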

![A typical query chain](assets/typical-query-chain.png)

Once the vector store is created, a user can query the knowledge base using natural language questions. Documents relevant to the query are found in the vector store by embedding the user query and finding nearby documents. These snippets of text are provided to the LLM (alongside the user query, chat history, prompt engineering, etc.), which parses the information to generate an answer.
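
As a rough illustration of the query stage, the sketch below assembles retrieved snippets, chat history and the user question into a single prompt string. The `build_prompt` function and its wording are hypothetical; LangChain builds its prompts internally and its exact format differs.

```python
def build_prompt(question, snippets, chat_history):
    """Combine retrieved snippets, prior turns and the question into one prompt."""
    history_text = "\n".join(
        f"User: {user_turn}\nAssistant: {bot_turn}" for user_turn, bot_turn in chat_history
    )
    context_text = "\n---\n".join(snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_text}\n\n"
        f"Chat history:\n{history_text}\n\n"
        f"Question: {question}\n"
    )


prompt = build_prompt(
    question="How do I set the target project?",
    snippets=["Snippet about project configuration.", "Snippet about authentication."],
    chat_history=[("What is the gcloud CLI?", "A command line interface.")],
)
```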

## Prerequisites
### Authenticate with Google and install dependencies

In [None]:
from google.colab import auth as google_auth
google_auth.authenticate_user()


In [None]:
%pip install langchain chromadb tiktoken gradio unstructured tqdm google-cloud-aiplatform google-cloud-core

In [None]:
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)


#### NOTE: Do not forget to click the "Restart Runtime" button above.
#### If you don't see a restart button, go to the "Runtime" toolbar tab then "Restart Runtime".
#### After restarting, continue executing the project from below this cell ("Run All" will restart the runtime again).

## Set up Project and Region

**If you don't know your project ID**, try the following:
* Run `gcloud config list`.
* Run `gcloud projects list`.
* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)

In [None]:
PROJECT_ID = "dt-langchain-apps-dev" # @param {type:"string"}
LOCATION = "europe-west2"  # @param {type:"string"}

In [None]:
!gcloud config set project {PROJECT_ID}

In [None]:
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)


## Task 01: Q&A over Documentation with GoogleLLM

Creating a custom knowledge worker is the “Hello World!” of LLMOps. Loading documents, creating embeddings, storing in a vector database and using an LLM to answer queries with knowledge from that database can be achieved in a few lines of Python.

### Document loading
First, we need to select our documentation. LangChain supports [numerous methods](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) for loading documents. As we will be constructing our knowledge worker using website data, we first download the files locally using `wget`. Note: replace `https://datatonic.com/` with any domain to build a knowledge worker specific to your use-case.

In [None]:
!wget -r -A.html https://datatonic.com/

Note how the filepaths to the `.html` files form proper URL paths; we'll use this later to reference our answers with proper hyperlinks.
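
A small sketch of that mapping, assuming a relative POSIX-style path whose first folder is the host name (as `wget -r` lays files out). The `path_to_url` helper and the example path are hypothetical and not part of the notebook.

```python
from pathlib import PurePosixPath


def path_to_url(file_path: str, scheme: str = "https") -> str:
    """Turn a relative path such as 'host/section/page.html' into a URL."""
    # .parts drops a leading './', so both 'a/b' and './a/b' behave the same
    parts = PurePosixPath(file_path).parts
    return f"{scheme}://" + "/".join(parts)


url = path_to_url("datatonic.com/blog/index.html")
# url == "https://datatonic.com/blog/index.html"
```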

The downloaded website can then be loaded as documents using the LangChain [`DirectoryLoader`](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/directory_loader.html) and [`UnstructuredHTMLLoader`](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/html.html) loaders.

In [7]:
from langchain.document_loaders import DirectoryLoader, UnstructuredHTMLLoader

def load_documents(source_dir):
    # Load the documentation using a HTML parser
    loader = DirectoryLoader(
        source_dir,
        glob="**/*.html",
        loader_cls=UnstructuredHTMLLoader,
        show_progress=True,
    )
    documents = loader.load()

    return documents


If you wanted to build a knowledge worker with another document type, for instance [Microsoft Word](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/microsoft_word.html) documents, you would update the `load_documents()` function according to the documentation for that document type loader.

E.g.:
```python
from langchain.document_loaders import Docx2txtLoader

def load_documents():
    # Load the documentation using a Microsoft Word parser
    loader = Docx2txtLoader("example_data/fake.docx")
    documents = loader.load()

    return documents
```

### Creating or loading embeddings
Creating embeddings each time we use our app is time-consuming and expensive. By persisting the vector store database after embedding, we can load the saved embeddings for use in another session.

Also note the use of the `VertexAIEmbeddings()` model. There are [multiple](https://python.langchain.com/en/latest/modules/models/text_embedding.html?highlight=embedding) text embedding models available, many of which can be directly substituted here. Remember to add additional environment variables for different API keys, as per each model's documentation.

In [8]:
PERSIST_DIR = "chromadb"

In [9]:
from langchain.vectorstores import Chroma
from langchain.embeddings import VertexAIEmbeddings

def load_embeddings():
    # We use GoogleLLM embeddings model, however other models can be substituted here
    embeddings = VertexAIEmbeddings()

    # Creating embeddings with each re-run is highly inefficient and costly.
    # We instead aim to embed once, then load these embeddings from storage.
    vector_store = Chroma(
        embedding_function=embeddings,
        persist_directory=PERSIST_DIR,
    )

    return vector_store


In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

def embed_documents(embedding, documents):
    # Individual documents will often exceed the token limit.
    # We split documents into chunks of at most 1000 characters (chunk_size counts characters, not tokens),
    # so that several chunks fit into the token limit alongside the user prompt.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    vector_store = Chroma.from_documents(
        documents=texts, embedding=embedding, persist_directory=PERSIST_DIR
    )

    # Persist the ChromaDB locally, so we can reload the script without expensively re-embedding the database
    vector_store.persist()

    return vector_store
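
To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width splitter. It is a stand-in written for this explanation, not the algorithm `RecursiveCharacterTextSplitter` uses (which prefers to break on separators such as paragraphs and sentences); the `split_text` name is hypothetical.

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


no_overlap = split_text("abcdefghij", chunk_size=4)
# ['abcd', 'efgh', 'ij']
with_overlap = split_text("abcdefghij", chunk_size=4, chunk_overlap=2)
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Overlap repeats text across neighbouring chunks, which can help when a relevant passage straddles a chunk boundary, at the cost of storing and embedding more text.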


In [11]:
def create_embeddings(source_dir):
    documents = load_documents(source_dir=source_dir)

    # We use GoogleLLM embeddings model, however other models can be substituted here
    embedding = VertexAIEmbeddings()

    vector_store = embed_documents(embedding, documents)

    return vector_store


In [12]:
import os

# If the vectorstore embedding database has been created, load it
if os.path.isdir(PERSIST_DIR):
    print(f"Loading {PERSIST_DIR} as vector store")
    vector_store = load_embeddings()
# If it doesn't exist, create it
else:
    print(f"Creating new vector store in dir {PERSIST_DIR}")
    vector_store = create_embeddings(source_dir="datatonic.com")


Creating new vector store in dir chromadb


100%|██████████| 147/147 [00:01<00:00, 81.54it/s] 


### Creating the Conversational Q&A Chain

In [13]:
# The 'k' value indicates the number of sources to use per query.
# 'k' as in 'k-nearest-neighbours' to the query in the embedding space.
# 'temperature' is the degree of randomness introduced into the LLM response.
k = 2
temperature = 0.0
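
For intuition about `temperature`, the sketch below shows how temperature is conventionally applied when sampling from a language model: scores are divided by the temperature before being turned into probabilities. This is a generic illustration with made-up scores, and it is an assumption that the hosted model behaves this way internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: always pick the top score
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
greedy = softmax_with_temperature(scores, 0.0)   # [1.0, 0.0, 0.0]
default = softmax_with_temperature(scores, 1.0)  # top token is most likely
flat = softmax_with_temperature(scores, 10.0)    # probabilities move closer together
```

With `temperature = 0.0`, as used in this notebook, the most likely token is always chosen, which suits factual Q&A where repeatable answers matter more than variety.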


#### Prompt engineering
Prompt engineering adapts a large language model to a task without retraining it, often with no examples at all (zero-shot). By prompting an LLM with contextual information about its purpose, the model can simulate a variety of situations, such as a customer assistant chatbot, a document summariser, a translator, etc.

In this use-case, we prompt our model to respond as a conversational Q&A chatbot. Prompt engineering can be especially useful for introducing guard rails to an application - in this template we tell the model to not respond to queries it lacks the information to answer, as users will trust the application to provide factual replies, so rejecting a query is preferable to outputting false information.

In [14]:
from langchain.prompts import PromptTemplate

template = """\
You are a helpful chatbot designed to perform Q&A on a set of documents.
Always respond to users with friendly and helpful messages.
Your goal is to answer user questions using relevant sources.

You were developed by Datatonic, and are powered by Google's PaLM-2 model.

In addition to your implicit model world knowledge, you have access to the following data sources:
- Company documentation.

If a user query is too vague, ask for more information.
If insufficient information exists to answer a query, respond with "I don't know".
NEVER make up information.

Chat History:
{chat_history}
Question: {question}
"""

# The PromptTemplate reads input variables (i.e.: 'chat_history', 'question') from the template
SYSTEM_PROMPT = PromptTemplate.from_template(template)
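
To show what "reading input variables from the template" means, here is a rough standard-library analogue. It is not how `PromptTemplate` is implemented; the `template_variables` helper is hypothetical and only handles simple `{name}` placeholders.

```python
from string import Formatter


def template_variables(template: str) -> list:
    """List the {placeholder} names in a template, in order of first appearance."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


example_template = "Chat History:\n{chat_history}\nQuestion: {question}\n"
variables = template_variables(example_template)
# ['chat_history', 'question']

# Filling the placeholders produces the text that is sent to the model
filled = example_template.format(chat_history="(empty)", question="What is LangChain?")
```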


In [15]:
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import VertexAI

def qa_chain():
    # A vector store retriever relates queries to embedded documents
    retriever = vector_store.as_retriever(search_kwargs={"k": k})

    # The selected GoogleLLM model uses embedded documents related to the query
    # It parses these documents in order to answer the user question.
    # We use the GoogleLLM LLM, however other models can be substituted here
    model = VertexAI(temperature=temperature)

    # A conversation retrieval chain keeps a history of Q&A / conversation
    # This allows for contextual questions such as "give an example of that (previous response)".
    # The chain is also set to return the source documents used in generating an output
    # This allows for explainability behind model output.
    chain = ConversationalRetrievalChain.from_llm(
        llm=model,
        retriever=retriever,
        return_source_documents=True,
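        # Note: LangChain uses this prompt to condense the chat history and a follow-up question into a standalone question
        condense_question_prompt=SYSTEM_PROMPT,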
    )

    return chain


In [16]:
def q_a(question: str, history: list):
    # map history (list of lists) to expected format of chat_history (list of tuples)
    chat_history = [tuple(turn) for turn in history]
    
    # Query the LLM to get a response
    # First the Q&A chain will collect documents semantically similar to the question
    # Then it will ask the LLM to use this data to answer the user question
    # We also provide chat history as further context
    response = qa_chain()(
        {
            "question": question,
            "chat_history": chat_history,
        }
    )

    # Format source documents (sources of excerpts passed to the LLM) into links the user can validate
    sources = [
        "[{0}]({0})".format(doc.metadata["source"])
        for doc in response["source_documents"]
    ]

    # Return the LLM answer, and list of sources used (formatted as a string)
    return response["answer"], "\n\n".join(sources)


### Building a simple Gradio UI
As building with Gradio is outside the scope of this workshop, a template Gradio app has been provided.

In [17]:
def submit(msg, chatbot):
    # First create a new entry in the conversation log
    msg, chatbot = user(msg, chatbot)
    # Then get the chatbot response to the user question
    chatbot = bot(chatbot)
    return msg, chatbot


def user(user_message, history):
    # Return "" to clear the user input, and add the user question to the conversation history
    return "", history + [[user_message, None]]


def bot(history):
    # Get the user question from conversation history
    user_message = history[-1][0]
    # Get the response and sources used to answer the user question
    bot_message, bot_sources = q_a(user_message, history[:-1])

    # Using a template, format the response and sources together
    bot_template = (
        "{0}\n\n<details><summary><b>Sources</b></summary>\n\n{1}</details>"
    )
    # Place the response into the conversation history and return
    history[-1][1] = bot_template.format(bot_message, bot_sources)
    return history
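
The shape of the conversation history can be checked without Gradio or an LLM. The sketch below repeats the `user` and `bot` logic with a stub in place of `q_a`; the `fake_q_a` function and its canned reply are placeholders for illustration.

```python
def fake_q_a(question, history):
    """Stub standing in for q_a(): returns a canned answer and source link."""
    return f"(answer to: {question})", "[example.html](example.html)"


def user(user_message, history):
    # Clear the input box and append a new [question, answer] pair with no answer yet
    return "", history + [[user_message, None]]


def bot(history, answer_fn=fake_q_a):
    # The last entry holds the unanswered question; earlier entries are prior turns
    user_message = history[-1][0]
    bot_message, bot_sources = answer_fn(user_message, history[:-1])
    history[-1][1] = f"{bot_message}\n\nSources:\n{bot_sources}"
    return history


cleared_input, history = user("what is the gcloud cli?", [])
history = bot(history)
# history is now a list containing one [question, formatted answer] pair
```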


In [18]:
import gradio as gr

# Build a simple Gradio app that accepts user input and queries the LLM
# Then displays the response in a ChatBot interface, with markdown support.
with gr.Blocks(theme=gr.themes.Base()) as demo:
    # Set a page title
    gr.Markdown("# Custom knowledge worker")
    # Create a chatbot conversation log
    chatbot = gr.Chatbot(label="🤖 knowledge worker")
    # Create a textbox for user questions
    msg = gr.Textbox(
        label="👩‍💻 user input", info="Query information from the custom knowledge base."
    )

    # Align both buttons on the same row
    with gr.Row():
        send = gr.Button(value="Send", variant="primary", size="sm")
        clear = gr.Button(value="Clear History", variant="secondary", size="sm")

    # Submit message on <enter> or clicking "Send" button
    msg.submit(submit, [msg, chatbot], [msg, chatbot], queue=False)
    send.click(submit, [msg, chatbot], [msg, chatbot], queue=False)

    # Clear chatbot history on clicking "Clear History" button
    clear.click(lambda: None, None, chatbot, queue=False)

# Create a queue system so multiple users can access the page at once
demo.queue()
# Launch the webserver locally
demo.launch()




Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.




Testing our demo, we can query our knowledge base for answers (code emphasis added). Since we used a `ConversationalRetrievalChain`, we can also correct the model when it gives the wrong response and prompt it to fix its mistake, or ask for further detail on a previous response.

> 👩‍💻: how to install the gcloud cli
> 
> 🦜:  You can install the gcloud CLI by running the command `$ gcloud components update`.
> 
> 👩‍💻: that command requires gcloud to be installed, how can I install gcloud initially?
> 
> 🦜:  You can install gcloud initially by running the command '`pip install google-cloud`' in your terminal.
> 
> 👩‍💻: how can I set the target project
> 
> 🦜:  You can set the target project for the gcloud CLI by using the command `$ gcloud config set project my-new-default-project`.
> 
> 👩‍💻: what is the gcloud cli?
> 
> 🦜:  The gcloud CLI is a command line interface for Google Cloud Platform services.
> 
> 👩‍💻: explain the above in more detail
> 
> 🦜:  The gcloud CLI is a tool used to authenticate and configure credentials for Google Cloud services. It can be used to change the default project ID, update components, and authenticate the CLI itself.

## Task 02: Extending the knowledge base
As mentioned, it is possible to extend a knowledge base with additional documents. This is useful for updating a knowledge base with new information without having to re-embed established knowledge from scratch.

Try extending the `create_embeddings()` function to load multiple knowledge sources. Update the `source_dir` to `source_dirs`, which accepts a list of folder paths, then iteratively load these folders as documents and embed them in the database.

**Solution:**
```python
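def create_embeddings(source_dirs):
    # Open (or create) the persisted vector store once, then add to it
    vector_store = load_embeddings()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

    for source_dir in source_dirs:
        documents = load_documents(source_dir=source_dir)
        # Split each folder's documents and embed them into the existing store
        vector_store.add_documents(text_splitter.split_documents(documents))

    # Save the updated store so the new embeddings survive a restart
    vector_store.persist()
    return vector_store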
```

Also experiment with loading different document types - update the `load_documents` function to load `.pdf` or `.pptx` files, or write some logic to load any of these file types depending on an argument `document_type` (try using an `if-elif-else` switch to change which document loader is used) and embed multiple document types within the same vector store.
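
One possible shape for that selection logic is sketched below. It uses a dictionary lookup instead of an `if-elif-else` chain, and returns loader class names as strings so the example runs without LangChain installed. The mapping is an assumption to check against the `langchain.document_loaders` module of your installed version, and `choose_loader_name` is a hypothetical helper.

```python
from pathlib import Path

# Assumed mapping from file extension to a LangChain loader class name
LOADER_BY_SUFFIX = {
    ".html": "UnstructuredHTMLLoader",
    ".docx": "Docx2txtLoader",
    ".pdf": "PyPDFLoader",
    ".pptx": "UnstructuredPowerPointLoader",
}


def choose_loader_name(file_path: str) -> str:
    """Pick a loader class name based on the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix not in LOADER_BY_SUFFIX:
        raise ValueError(f"No loader configured for '{suffix}' files")
    return LOADER_BY_SUFFIX[suffix]


loader_name = choose_loader_name("reports/Q1-summary.PDF")  # 'PyPDFLoader'
```

In the notebook, the chosen name would be swapped for the imported class itself and passed as `loader_cls` to `DirectoryLoader`, with `glob` adjusted to match the extension.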

Note: if you're looking to deploy a knowledge worker with several knowledge bases, consider a [Router Chain](https://python.langchain.com/en/latest/modules/chains/examples/multi_retrieval_qa_router.html), which combines several knowledge workers with discrete knowledge bases into a single chain that selects the best worker for the query.
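
The routing idea can be illustrated with a deliberately simple stand-in. LangChain's router delegates the choice to an LLM; the sketch below instead counts words shared between the query and a short description of each knowledge base. The `route_query` function and the knowledge base names are hypothetical.

```python
def route_query(query: str, knowledge_bases: dict, default: str) -> str:
    """Pick the knowledge base whose description shares the most words with the query."""
    query_words = set(query.lower().split())
    best_name, best_score = default, 0
    for name, description in knowledge_bases.items():
        score = len(query_words & set(description.lower().split()))
        if score > best_score:
            best_name, best_score = name, score
    return best_name


knowledge_bases = {
    "hr": "holiday policy payroll benefits",
    "engineering": "deployment pipeline docker api",
}
chosen = route_query("what is the deployment pipeline", knowledge_bases, default="hr")
# chosen == 'engineering'
```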

## Task 03: Text generation over a vector index
We can utilise our embedded documents for more than just Q&A. In tasks 1 and 2, we used the embedded documents as context for answering user queries, but in this task we will use it to generate original content using this knowledge base as a source of information and style.



In [19]:
from langchain.chains import LLMChain


In [20]:
prompt_template = """\
Use the context below to write a 400 word blog post about the topic below:
Context: {context}
Topic: {topic}
Blog post:
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "topic"]
)

model = VertexAI(temperature=0)

chain = LLMChain(llm=model, prompt=PROMPT)


In [21]:
def generate_blog_post(topic: str, k: int):
    # search for 'k' nearest documents related to our topic.
    docs = vector_store.similarity_search(topic, k=k)

    # associate topic with the content of each document to generate inputs
    inputs = [{"context": doc.page_content, "topic": topic} for doc in docs]
    # generate blog posts for each context-topic pair
    output = chain.apply(inputs)

    return output


In [22]:
# generate variations of blog posts on the topic "Greentonic initiative" based on the 4 most relevant documents
output = generate_blog_post("Greentonic initiative", k=4)
for blog in output:
    print(blog['text'])


Climate change is the defining crisis of our time, and it is happening even more quickly than we feared. The current warming trend is highly significant, and it is extremely likely (greater than 95% probability) to be the result of human activity since the mid-20th century. We have known about these negative effects of our behaviour on the climate for several decades now, yet there has been limited action taken to prevent it.

At Datatonic, we believe that we have a responsibility to use our skills and expertise to help address climate change. That's why we've launched Greentonic, our sustainability initiative.
Datatonic is a global technology company that specializes in data analytics, artificial intelligence, and machine learning. The company was founded in 2009 and is headquartered in Ghent, Belgium. Datatonic has offices in Europe, North America, and Asia-Pacific.

Datatonic's mission is to create positive change in the world through the use of technology. The company believes that

# Conclusions
## What have we built?
In this session, we have built a knowledge worker use-case for accessing your complex information using Generative AI. This concept can be extended into a fully-fledged tool that can unlock the value of your data for customers or internal use.

# Going further
This workshop has introduced all the LangChain knowledge required to create a knowledge worker. The next steps for moving this project from development to production are discussed below.

## Decoupling LangChain from Gradio
It is not necessary to run LangChain within a Gradio app. Decoupling LangChain into a separate API has several benefits:
1. Scalability - we can deploy scalable servers / Docker containers.
2. Simplified code - a frontend-backend loose coupling can lead to simpler code, which is easier to update and maintain.
3. Flexible frontends - if a more professional user interface is needed, such as a native React app, replacing Gradio is a straightforward process. FastAPI can be called from JavaScript, etc., allowing you to move beyond Python frontend frameworks.

An example of this separation can be found on the GitHub repository, using FastAPI to create a simple LangChain API server and Poetry to manage separate server environments.
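
As a minimal sketch of the decoupling idea (not the code from the repository), the backend can be reduced to a function that takes a JSON request and returns a JSON response, independent of any UI. The `handle_query` name, the JSON field names and the stub answer function are assumptions for illustration; in practice a web framework route would wrap a function like this and pass in the notebook's `q_a`.

```python
import json


def handle_query(request_body: str, answer_fn) -> str:
    """Framework-agnostic handler: JSON request in, JSON response out."""
    payload = json.loads(request_body)
    answer, sources = answer_fn(payload["question"], payload.get("history", []))
    return json.dumps({"answer": answer, "sources": sources})


def stub_answer(question, history):
    # Placeholder for q_a(); returns a fixed answer and source list
    return f"(answer to: {question})", "[example.html](example.html)"


response_body = handle_query('{"question": "What is LangChain?"}', stub_answer)
response = json.loads(response_body)
```

Because the handler only deals in JSON strings, the same backend can serve a Gradio app, a JavaScript frontend, or a chat bot without changes.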

## Deploying on Google Cloud
Once we have decoupled our frontend and backend code, we can deploy the project onto Google Cloud.

These reference architecture diagrams mirror the flow diagrams first introduced in the workshop introduction. Using Google Cloud, we can create production pipelines for creating / updating vector databases, and deploy a knowledge worker API (which can be connected to a web UI, Slack bot, etc.).

**Example architecture: Ingestion**
![A typical ingestion chain](assets/knowledge-worker-gcp-ingestion-pipeline.png)

By creating a pipeline for data ingestion, we can continue to extend the knowledge base of our knowledge worker as new documents and documentation are produced.

**Example architecture: Inference**
![A typical inference chain](assets/knowledge-worker-gcp-inference-pipeline.png)

By creating a pipeline for inference, we can leverage the power of Google Cloud to provide a highly reliable and scalable API that can power a variety of applications.