# Generative AI Hackathon

## Introduction

This notebook walks you through the challenge of implementing a **Knowledge Worker** for your organisation using **Generative AI**!

**Why a knowledge worker?** When data is scattered across internal and external databases, employees waste time finding the information they need and turning it into insights. A knowledge worker can consolidate this information, then answer queries in natural language, providing summaries and sources.

➡️ **Your task:** Implement a knowledge worker to enable users in your company to perform Q&A, in natural language, upon a knowledge base.
In this way, you'll centralise company data for easy access in a user-friendly manner, boosting productivity.
In short, you'll create a knowledge worker tailored to your data domain.
This app will only have access to specific knowledge, such as public data from your company's website and unstructured documents (websites, PDF, Word, text, ...).

While solving the tasks as instructed in this notebook, you'll familiarise yourself with common concepts and tools for Generative AI including:

- The Open-Source tool LangChain
- Large Language Models (LLMs)
- Text Embeddings and Vector Databases
- Prompts and Prompt Engineering

Ultimately, this notebook details how to get started with LangChain, and walks through setting up a knowledge worker on Google Cloud and Vertex AI.

## Implementing a knowledge worker

When creating a knowledge worker, recall that Large Language Models (LLMs) can be adapted to a variety of tasks such as text summarisation, answering questions, and generating new content (and many more!).
When it comes to tuning approaches, you'll have the choice between:

**A) Zero-shot learning:** Use LLMs directly without providing additional data or fine-tuning the model.

**B) Few-shot learning:** Provide a select number of input examples when using an LLM to improve the quality of outputs.

**C) Model Fine-tuning:** Fine-tune certain (or additional) layers in the LLM by training the model on provided training data.

Instead of training LLMs on your own data (i.e., fine-tuning), it is often far easier and cheaper to adapt the LLM to your use case through prompt engineering alone.
Thus, methods A) and B) are more applicable for creating your first knowledge worker.
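To make option B) concrete, a few-shot prompt simply prepends a handful of worked examples to the new question. The examples and instruction text below are hypothetical; with an empty example list, the same function produces a zero-shot prompt (option A)).

```python
def build_few_shot_prompt(examples, question):
    """Prepend worked Q&A examples to the user's question (few-shot prompting)."""
    lines = ["Answer the question in the same style as the examples.", ""]
    for example_q, example_a in examples:
        lines.append(f"Q: {example_q}")
        lines.append(f"A: {example_a}")
        lines.append("")
    # The new question goes last, with an open "A:" for the model to complete
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Hypothetical examples; an empty list would give a zero-shot prompt instead.
examples = [
    ("What does the company do?", "It is a data consultancy."),
    ("Where is the head office?", "I don't know."),
]
prompt = build_few_shot_prompt(examples, "Who founded the company?")
print(prompt)
```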

A knowledge worker can be approached in two stages:

1. Embedding knowledge from diverse sources.
2. Querying an LLM which is aware of your relevant knowledge to answer questions.

![A typical ingestion chain](https://github.com/teamdatatonic/gen-ai-hackathon/blob/7f37d477b18ace5912d34b0574512559d7a457ed/assets/typical-ingestion-chain.png?raw=true)

First, documents (websites, Word documents, databases, PowerPoint files, PDFs, etc.) are loaded and split into chunks. Chunking is important for three reasons:

1. There are technical restrictions on how much data (tokens) can be fed into an LLM at once, meaning the context + system prompt + chat history + user prompt must fit within the token limit.
2. Most LLM APIs operate on a per-token pricing model, meaning it is cost-effective to limit the size / amount of data provided to the LLM per query.
3. Contextual information should be relevant to the user query, meaning it is optimal to provide only relevant snippets from documents, making the answer more relevant whilst saving costs as per (1) and (2).
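As a rough illustration of reasons (1) and (2), the arithmetic below estimates how many retrieved chunks fit into a single request. All token counts, including the 8,192-token limit, are made-up example numbers, not the limits of any particular model.

```python
def max_context_chunks(token_limit, system_tokens, history_tokens, question_tokens, chunk_tokens):
    """How many retrieved chunks fit alongside the prompt, history and question."""
    remaining = token_limit - (system_tokens + history_tokens + question_tokens)
    # If the fixed parts already exceed the limit, no context can be added at all
    return max(remaining, 0) // chunk_tokens


# Hypothetical budget: every extra chunk is also extra billed input tokens.
chunks = max_context_chunks(
    token_limit=8192, system_tokens=300, history_tokens=1200,
    question_tokens=50, chunk_tokens=250,
)
print(chunks)  # 26
```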

Next, these document chunks are embedded and stored in a vector store. Embedding a document means mapping it to a vector in a high-dimensional space, which can then be searched to find documents relevant to a user query. Relevance scoring can be as simple as a k-nearest-neighbours search, since documents that are semantically similar (as perceived by the embedding model) end up close to each other in that space.
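Here is a minimal k-nearest-neighbours search over toy, hand-written 3-dimensional vectors, using cosine similarity. Real embeddings come from the embedding model and have hundreds of dimensions, and the distance metric Chroma uses is configurable, so treat this purely as an illustration of the idea.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def k_nearest(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy "embeddings" for three pages (hypothetical values)
toy_docs = {
    "sustainability.html": [0.9, 0.1, 0.0],
    "careers.html": [0.0, 0.8, 0.2],
    "greentonic.html": [0.8, 0.0, 0.3],
}
print(k_nearest([1.0, 0.0, 0.1], toy_docs, k=2))  # ['sustainability.html', 'greentonic.html']
```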

![A typical query chain](https://github.com/teamdatatonic/gen-ai-hackathon/blob/7f37d477b18ace5912d34b0574512559d7a457ed/assets/typical-query-chain.png?raw=true)

Once the vector store is created, a user can query the knowledge base using natural language questions. Relevant documents are found in the vector store by embedding the user query and retrieving the documents nearest to it. These snippets of text are provided to the LLM (alongside the user query, chat history, prompt instructions, etc.), which uses them to generate an answer.
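Put together, the query chain "stuffs" the retrieved snippets and the conversation into one prompt. A minimal sketch of that assembly step follows; the template wording and the `build_rag_prompt` name are hypothetical (LangChain's built-in prompts are worded differently).

```python
def build_rag_prompt(snippets, question, chat_history):
    """'Stuff' retrieved snippets and prior turns into a single prompt string."""
    context = "\n\n".join(snippets)
    history = "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)
    return (
        "Use the following context to answer the question. "
        "If the context does not contain the answer, reply \"I don't know\".\n\n"
        f"Context:\n{context}\n\n"
        f"Chat history:\n{history}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    ["Greentonic is Datatonic's sustainability initiative."],
    "What is Greentonic?",
    [("Hi", "Hello! How can I help?")],
)
print(prompt)
```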

## Prerequisites

### Install Python dependencies
We use LangChain, a framework for developing applications powered by language models, and Gradio, a user-friendly Python framework for building frontends. Install both of these (and some dependencies) to run this lab's code.

In [None]:
%pip install --quiet langchain chromadb tiktoken gradio tqdm google-cloud-aiplatform google-cloud-core unstructured google-cloud-discoveryengine

In [None]:
%pip install pydantic==1.10.8 typing-extensions==4.5.0 typing-inspect==0.8.0

**❗ Restart the Python kernel:** Ensure that your environment can access the newly installed dependencies. Continue after the restart from the `Setup cloud project` step.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

**❗ Note:** If your kernel doesn't restart automatically, click the "Restart Runtime" button above your notebook.
If you don't see a restart button, go to the "Runtime" toolbar tab then "Restart Runtime". After restarting, continue executing the project from below this cell.

## Setup cloud project

Currently, Vertex AI LLMs are accessible via Google Cloud projects. We will access the Vertex AI endpoint via a service account.

1. Upload the Google Application Credentials `.json` file sent to your email to the notebook filesystem.
2. Set the variable `GOOGLE_APPLICATION_CREDENTIALS` with the filepath (**❗ Note:** the `/content/` folder is where uploaded files are stored by default).

In [None]:
import os

GOOGLE_APPLICATION_CREDENTIALS = "/content/credentials.json"  # @param {type:"string"}
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = GOOGLE_APPLICATION_CREDENTIALS

## Task 1: Implementing a knowledge worker

Creating a custom knowledge worker is similar to your first step when learning a new programming language.
Your first challenge is therefore a “Hello World” program, adapted to LLMs, which is way more exciting!

With a few lines of code, you'll:
- Load documents with information about your company
- Create text embeddings from documents
- Store the embeddings in a local database
- Use an LLM to answer queries about your company knowledge

**❗ All of these steps can be achieved in a few lines of Python.**

### Introduction to LangChain

LangChain is a Python framework for developing applications using language models.
It abstracts the connection between applications and LLMs, allowing a loose coupling between code and specific providers like Google PaLM.

LangChain supports [numerous methods](https://python.langchain.com/en/latest/modules/indexes/document_loaders.html) for loading documents.

We will be using the `DirectoryLoader` and `UnstructuredHTMLLoader` in order to load a pre-compiled archive of your website. This method is similar to the [`RecursiveUrlLoader`](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/recursive_url_loader). This document loader searches for subpages of a website and loads each page's content as a document. Additionally, if we only wanted to download a list of URLs without searching for subpages, we could use an [`UnstructuredURLLoader`](https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/url).

**❗ Web scraping can easily exceed a website's rate limits. We're using pre-compiled archives to avoid accidentally DDoSing your website during this session.**

**➡️ Your task:** Read the linked resources in the `Introduction to LangChain` step and study the following code cells as they provide reusable LangChain code for your knowledge worker.

### Collecting documents
First, we need to collect our data. To get started fast, we've already downloaded some sample website data upfront. Let's copy the website data from a public Cloud Storage bucket to your local file system.

**❗ Note:** Although PaLM supports multiple languages, text embeddings currently work best with English documents.

**➡️ Your task:** Select a pre-compiled website from our list and download it from the public bucket.

In [None]:
# @markdown This variable can be left as default for this task.
BUCKET = "dt-wpp-genai-hack-dev"  # @param {type:"string"}

# @markdown Choose any of these web archives as the base knowledge of your worker.
LOCAL_FOLDER = "www.datatonic.com"  # @param ["www.datatonic.com", "www.choreograph.com", "www.essencemediacom.com"]

In [None]:
!gsutil cp gs://{BUCKET}/{LOCAL_FOLDER}.tar.gz . && tar -xzf {LOCAL_FOLDER}.tar.gz

In [None]:
from langchain.document_loaders import DirectoryLoader, UnstructuredHTMLLoader


def load_documents(source_dir):
    # Load the documentation using an HTML parser
    loader = DirectoryLoader(
        source_dir,
        glob="**/*.html",
        loader_cls=UnstructuredHTMLLoader,
        show_progress=True,
    )
    documents = loader.load()

    print(f"Loaded: {len(documents)} documents from {source_dir}.")

    return documents
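Conceptually, `DirectoryLoader` with `UnstructuredHTMLLoader` walks the folder, parses each HTML file and keeps its text plus a `source` path. The standard-library sketch below mirrors that idea only; `unstructured` does far more (element partitioning, cleaning), and `load_html_dir` is a hypothetical name.

```python
import pathlib
import tempfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def load_html_dir(source_dir):
    """Return (source_path, text) pairs for every .html file under source_dir."""
    results = []
    for path in sorted(pathlib.Path(source_dir).glob("**/*.html")):
        parser = _TextExtractor()
        parser.feed(path.read_text(encoding="utf-8", errors="ignore"))
        results.append((str(path), "\n".join(parser.parts)))
    return results


# Usage on a throwaway folder with one page
with tempfile.TemporaryDirectory() as tmp:
    sub = pathlib.Path(tmp, "blog")
    sub.mkdir()
    (sub / "index.html").write_text(
        "<html><head><style>h1 {color: red}</style></head>"
        "<body><h1>Hello</h1><p>World</p><script>var a = 1;</script></body></html>",
        encoding="utf-8",
    )
    pages = load_html_dir(tmp)

print(pages[0][1])  # prints "Hello" and "World" on separate lines
```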

### Creating or loading embeddings

Creating embeddings each time we use our app is time-consuming and expensive.
By persisting the vector store database after embedding, we can load the saved embeddings for use in another session.

**➡️ Your task:** Study and execute the following code cells. Note that after the documents have been loaded, they are split into chunks using LangChain's `RecursiveCharacterTextSplitter`.

In [None]:
# @markdown These variables can be left as default for this task.
PERSIST_DIR = "chromadb"  # @param {type:"string"}

🎉 Congratulations! 🎉 You've downloaded the WPP website data.

Now, let's embed these files so we can use them in our knowledge worker.

**➡️ Your task:** Run the following cells to *create* the text embeddings based on your downloaded data.

In [None]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import VertexAIEmbeddings


def create_embeddings(source_dir):
    documents = load_documents(source_dir=source_dir)

    # We use the Vertex AI embeddings model, however other models can be substituted here
    embedding = VertexAIEmbeddings()

    # Individual documents will often exceed the token limit.
    # Splitting them into chunks of up to 1000 characters (the splitter's default
    # length function counts characters, not tokens) lets several chunks fit
    # into the token limit alongside the user prompt.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)

    vector_store = Chroma.from_documents(
        documents=texts, embedding=embedding, persist_directory=PERSIST_DIR
    )

    # Persist the ChromaDB locally, so we can reload the script without expensively re-embedding the database
    vector_store.persist()
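The splitter tries coarse separators first (paragraph breaks) and falls back to finer ones (lines, words, single characters) only for pieces that are still too long. The simplified sketch below follows that idea; it is not LangChain's code (it ignores `chunk_overlap` and merges pieces more naively), `recursive_split` is a hypothetical name, and sizes are measured in characters.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters, coarse separators first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, *rest = separators
    if sep == "":
        # Last resort: hard cut into fixed-size windows
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece alone is too long: retry it with the next, finer separator
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


pieces = recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=7)
print(pieces)  # ['aaa bbb', 'ccc ddd', 'eee']
```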

In [None]:
print(f"Creating new vector store in dir {PERSIST_DIR}.")
create_embeddings(
    source_dir=LOCAL_FOLDER
)  # creates the vector DB and saves it locally.

**➡️ Your task:** Run the following cells to *load* the text embeddings.

In [None]:
def load_embeddings():
    # We use VertexAI embeddings model, however other models can be substituted here
    embeddings = VertexAIEmbeddings()

    # Creating embeddings with each re-run is highly inefficient and costly.
    # We instead aim to embed once, then load these embeddings from storage.
    vector_store = Chroma(
        embedding_function=embeddings,
        persist_directory=PERSIST_DIR,
    )

    return vector_store

In [None]:
print(f"Loading {PERSIST_DIR} as vector store.")
vector_store = load_embeddings()  # loads the vector DB from the local file system.
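The create-once, load-later pattern used by `create_embeddings()` and `load_embeddings()` can be sketched without Chroma at all. This is an illustration only: it uses a JSON file instead of ChromaDB and a hypothetical `fake_embed` function instead of `VertexAIEmbeddings`, and it counts how often the "expensive" embedding call runs.

```python
import hashlib
import json
import os
import tempfile


def fake_embed(text):
    """Hypothetical stand-in for an embedding API: a small deterministic vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:4]]


def load_or_create_embeddings(texts, path, embed_fn):
    """Return (embeddings, created): embed and save on the first run, reload afterwards."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f), False
    embeddings = {text: embed_fn(text) for text in texts}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(embeddings, f)
    return embeddings, True


calls = []


def counting_embed(text):
    calls.append(text)  # record every (notionally billable) embedding call
    return fake_embed(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embeddings.json")
    first, created = load_or_create_embeddings(["About us", "Careers"], path, counting_embed)
    second, created_again = load_or_create_embeddings(["About us", "Careers"], path, counting_embed)

print(created, created_again, len(calls))  # True False 2
```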

**🎉 Congratulations! 🎉** You've created text embeddings from your company data and stored them successfully in a local vector database.
Now, you'll shift your focus to connecting an LLM to this knowledge by creating a chain using LangChain.

### Creating the Conversational Q&A Chain

In this section, you'll create a chain which will be able to provide an answer given a question from a user.
To understand the purpose of chains, you can read about chains in the [LangChain documentation](https://docs.langchain.com/docs/).

First, we'll initialise a few hyperparameters for the LLM, which we'll reference later on:

In [None]:
# The 'k' value indicates the number of sources to use per query.
# 'k' as in 'k-nearest-neighbours' to the query in the embedding space.
# 'temperature' is the degree of randomness introduced into the LLM response.
k = 2
temperature = 0.0
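To build intuition for `temperature`, the sketch below applies the standard textbook temperature-scaled softmax to some made-up next-token scores. It is a conceptual illustration, not a description of how Vertex AI implements sampling. Lower temperatures concentrate probability on the top-scoring token, and `0.0` is treated here as pure greedy decoding, which fits the choice of `0.0` for factual Q&A here and `0.7` for creative blog ideas in Task 3.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw next-token scores into probabilities; temperature 0 means greedy."""
    if temperature == 0:
        # Greedy decoding: all probability on the highest-scoring token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.0, 0.7, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```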

### Prompt engineering

As outlined before, the creation of prompts is essential to adapt LLMs for your given use case.
**Prompt engineering** is a way of adapting large language models without changing their weights (the zero- and few-shot approaches above).
By prompting an LLM with contextual information about its purpose, the model can simulate a variety of situations, such as a customer assistant chatbot, a document summariser, a translator, etc.

In this use case, we prompt our model to respond as a conversational Q&A chatbot.
Prompt engineering is especially useful for introducing guard rails to an application. In this template, we tell the model not to respond to queries it lacks the information to answer: users will trust the application to provide factual replies, so rejecting a query is preferable to outputting false information.

You can use the prompt and code cells below for your knowledge worker.

**➡️ Your task:** Execute and study the following code cells as they provide reusable LangChain code for your knowledge worker.
Pay attention to the prompt in the `template` variable.
What elements do you notice in the prompt?
How is the prompt used in the chain?

In [None]:
from langchain.prompts import PromptTemplate


# The PromptTemplate reads input variables (i.e.: 'chat_history', 'question') from the template
SYSTEM_PROMPT = PromptTemplate.from_template(
    """\
You are a helpful chatbot designed to perform Q&A on a set of documents.
Always respond to users with friendly and helpful messages.
Your goal is to answer user questions using relevant sources.

You were developed by Datatonic, and are powered by Google's PaLM-2 model.

In addition to your implicit model world knowledge, you have access to the following data sources:
- Company documentation.

If a user query is too vague, ask for more information.
If insufficient information exists to answer a query, respond with "I don't know".
NEVER make up information.

Chat History:
{chat_history}
Question: {question}
"""
)

In [None]:
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import VertexAI


def qa_chain():
    # A vector store retriever relates queries to embedded documents.
    # The number of documents to return is passed via search_kwargs;
    # a bare `k=k` keyword argument is not used by the search.
    retriever = vector_store.as_retriever(search_kwargs={"k": k})

    # The Vertex AI LLM receives the embedded documents related to the query
    # and parses them in order to answer the user question.
    # We use a Vertex AI model, however other models can be substituted here
    model = VertexAI(temperature=temperature)

    # A conversation retrieval chain keeps a history of Q&A / conversation
    # This allows for contextual questions such as "give an example of that (previous response)".
    # The chain is also set to return the source documents used in generating an output
    # This allows for explainability behind model output.
    chain = ConversationalRetrievalChain.from_llm(
        llm=model,
        retriever=retriever,
        return_source_documents=True,
        condense_question_prompt=SYSTEM_PROMPT,
    )

    return chain

In [None]:
def q_a(question: str, history: list):
    # map history (list of lists) to the expected format of chat_history (list of tuples)
    chat_history = [tuple(turn) for turn in history]

    # Query the LLM to get a response
    # First the Q&A chain will collect documents semantically similar to the question
    # Then it will ask the LLM to use this data to answer the user question
    # We also provide chat history as further context
    response = qa_chain()(
        {
            "question": question,
            "chat_history": chat_history,
        }
    )

    # Format source documents (sources of excerpts passed to the LLM) into links the user can validate
    sources = [
        "[https://{0}](https://{0})".format(doc.metadata["source"].replace("index.html", ""))
        for doc in response["source_documents"]
    ]

    # Return the LLM answer, and list of sources used (formatted as a string)
    return response["answer"], "\n\n".join(sources)
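Two retrieved chunks can come from the same page, in which case `q_a()` lists the same link twice. A hypothetical helper that builds the links the same way (prefixing `https://` and dropping `index.html`) but skips repeats could look like this. It assumes `metadata["source"]` values look like `www.datatonic.com/blog/index.html`, i.e. paths relative to the extracted archive.

```python
def format_sources(source_paths):
    """Turn local archive paths into de-duplicated markdown links, keeping order."""
    links, seen = [], set()
    for path in source_paths:
        url = "https://" + path.replace("index.html", "")
        if url not in seen:  # skip pages already cited by an earlier chunk
            seen.add(url)
            links.append(f"[{url}]({url})")
    return "\n\n".join(links)


links = format_sources([
    "www.datatonic.com/blog/index.html",
    "www.datatonic.com/blog/index.html",
    "www.datatonic.com/about/index.html",
])
print(links)
```

Inside `q_a()`, this would be called as `format_sources(doc.metadata["source"] for doc in response["source_documents"])`.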

### Building the user interface

As building a UI is outside of the scope for this hackathon, a templated GUI using [Gradio](https://gradio.app/) is provided for your knowledge worker.

**➡️ Your task:** Execute the cells below to launch the user interface.

In [None]:
def submit(msg, chatbot):
    # First create a new entry in the conversation log
    msg, chatbot = user(msg, chatbot)
    # Then get the chatbot response to the user question
    chatbot = bot(chatbot)
    return msg, chatbot


def user(user_message, history):
    # Return "" to clear the user input, and add the user question to the conversation history
    return "", history + [[user_message, None]]


def bot(history):
    # Get the user question from conversation history
    user_message = history[-1][0]
    # Get the response and sources used to answer the user question
    bot_message, bot_sources = q_a(user_message, history[:-1])

    # Using a template, format the response and sources together
    bot_template = "{0}\n\n<details><summary><b>Sources</b></summary>\n\n{1}</details>"
    # Place the response into the conversation history and return
    history[-1][1] = bot_template.format(bot_message, bot_sources)
    return history
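To see the data that flows between these functions without calling the LLM, the sketch below copies the logic of `user()` and `bot()` under hypothetical names and swaps `q_a()` for a stub. It assumes Gradio's classic `Chatbot` value format, a list of `[user_message, bot_message]` pairs, as the code above does. Note that, as in `bot()`, the stored reply includes the Sources block, so that text is also what the chain receives as chat history on later turns.

```python
def fake_q_a(question, history):
    """Hypothetical stand-in for q_a(); no LLM is called."""
    answer = f"You asked: {question} ({len(history)} earlier turns)"
    sources = "[https://example.com](https://example.com)"
    return answer, sources


def sketch_user(user_message, history):
    # Clear the textbox and append a new [question, pending-answer] pair
    return "", history + [[user_message, None]]


def sketch_bot(history, answer_fn=fake_q_a):
    # Answer the newest question, passing only the earlier turns as history
    question = history[-1][0]
    answer, sources = answer_fn(question, history[:-1])
    history[-1][1] = f"{answer}\n\n<details><summary><b>Sources</b></summary>\n\n{sources}</details>"
    return history


history = []
msg, history = sketch_user("What is Datatonic?", history)
history = sketch_bot(history)
msg, history = sketch_user("Tell me more", history)
history = sketch_bot(history)
print(history[1][1].splitlines()[0])  # You asked: Tell me more (1 earlier turns)
```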

In [None]:
import gradio as gr


def app():
    # Build a simple Gradio app that accepts user input and queries the LLM
    # Then displays the response in a ChatBot interface, with markdown support.
    with gr.Blocks(theme=gr.themes.Base()) as demo:
        # Set a page title
        gr.Markdown("# Custom knowledge worker")
        # Create a chatbot conversation log
        chatbot = gr.Chatbot(label="🤖 knowledge worker")
        # Create a textbox for user questions
        msg = gr.Textbox(
            label="👩‍💻 user input",
            info="Query information from the custom knowledge base.",
        )

        # Align both buttons on the same row
        with gr.Row():
            send = gr.Button(value="Send", variant="primary", size="sm")
            clear = gr.Button(value="Clear History", variant="secondary", size="sm")

        # Submit message on <enter> or clicking "Send" button
        msg.submit(submit, [msg, chatbot], [msg, chatbot], queue=False)
        send.click(submit, [msg, chatbot], [msg, chatbot], queue=False)

        # Clear chatbot history on clicking "Clear History" button
        clear.click(lambda: None, None, chatbot, queue=False)

    return demo

**❗ Note:** The following cell will run until manually stopped. Remember to halt it before moving onto the next task.

In [None]:
demo = app()

# Create a queue system so multiple users can access the page at once
demo.queue()
# Launch the webserver locally
demo.launch(share=True, debug=True)

**➡️ Your task:** Use the user interface above (which you can also open in a separate tab given the shareable link above), to query your knowledge base.

Try out a few questions from this example Q&A:

> 👩‍💻: What is Datatonic?
> 
> 🦜: Datatonic is a data consultancy enabling companies to make better business decisions with the power of Modern Data Stack and MLOps.
> 
> 👩‍💻: Summarise the web article on Greentonic.
> 
> 🦜: Greentonic is Datatonic's sustainability initiative.
> 
> 👩‍💻: How is Datatonic being sustainable?
> 
> 🦜: Datatonic is committed to sustainability and has a number of initiatives in place to reduce its environmental impact. These include:
>    * Using renewable energy sources
>    * Reducing our carbon footprint
>    * Promoting sustainable practices in our supply chain
>    * Supporting environmental charities
>
> We believe that sustainability is essential for the future of our planet and we are committed to doing our part to make a difference.

Since we used a `ConversationalRetrievalChain`, we can also correct the model when it gives a wrong response and prompt it to fix its mistake, or ask for further detail on a previous response.

**🎉 Congratulations! 🎉** You've created your first chain using LangChain which you can query for general questions in a user interface.
Let's continue extending your knowledge worker in the next task.


## Task 2: Extending the knowledge base

As mentioned, it is possible to extend the knowledge base with additional documents.
This is useful for updating a knowledge base with new information without having to re-embed established knowledge from scratch.

If you wanted to build a knowledge worker with another document type, for instance [Microsoft Word](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/microsoft_word.html) documents, you would update the `load_documents()` function according to the documentation for that document type loader.

**❗ Note:** If you're looking to deploy a knowledge worker with several knowledge bases, consider an [Embedding Router Chain](https://python.langchain.com/docs/modules/chains/foundational/router#embeddingrouterchain), which combines several knowledge workers with discrete knowledge bases into a single chain that selects the best worker for each query.

**➡️ Your task:** Extend the knowledge worker with new documents.
1. Load the new documents.
2. Add new documents to the existing vector store using `.add_documents(documents=...)`.
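When you re-run ingestion on an updated folder, chunks that are already stored can be embedded and added a second time. One way to avoid that is to key each chunk by a hash of its content and embed only unseen chunks. The sketch below uses a plain dict in place of the vector store and a hypothetical `toy_embed`; with Chroma you would need an equivalent check before calling `add_documents`.

```python
import hashlib


def content_id(text):
    """Stable identifier derived from a chunk's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def add_new_chunks(store, chunks, embed_fn):
    """Embed and store only chunks whose content hash is not already present.

    `store` is a plain dict standing in for a vector store: id -> (text, vector).
    Returns how many chunks were newly embedded.
    """
    added = 0
    for text in chunks:
        key = content_id(text)
        if key not in store:
            store[key] = (text, embed_fn(text))
            added += 1
    return added


def toy_embed(text):
    # Hypothetical embedding: character count and vowel count
    return [len(text), sum(ch in "aeiou" for ch in text.lower())]


store = {}
first_batch = add_new_chunks(store, ["About us", "Our services"], toy_embed)
second_batch = add_new_chunks(store, ["Our services", "Careers"], toy_embed)
print(first_batch, second_batch, len(store))  # 2 1 3
```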

See the following example for loading Word documents:


In [None]:
from langchain.document_loaders import Docx2txtLoader


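def load_docx_documents(filepath):
    # Return an empty list when no file path is given, so callers can skip it safely
    if not filepath:
        return []

    # Load the documentation using a Microsoft Word parser
    # (Docx2txtLoader requires the docx2txt package: %pip install docx2txt)
    loader = Docx2txtLoader(filepath)
    return loader.load()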

In [None]:
# load the word document by filepath
word_documents = load_docx_documents(
    filepath=None
)  # ❗ TODO: update this function to your own file type + file path

# add this document(s) to the vector store
vector_store.add_documents(word_documents)

# "save" the new vector store back to the file system
vector_store.persist()

In [None]:
# ❗ TODO: replicate the code above to add more documents to the vector store.

**➡️ Your task:** Relaunch the Gradio GUI and try asking questions using knowledge from your newly added documents.

**❗ Note:** The following cell will run until manually stopped. Remember to halt it before moving onto the next task.

In [None]:
demo = app()

# Create a queue system so multiple users can access the page at once
demo.queue()
# Launch the webserver locally
demo.launch(share=True, debug=True)

**🎉 Congratulations! 🎉** You've extended your knowledge base with text embeddings from a variety of sources, whether it's public data from your company's website or unstructured documents!

## Task 3: Generating text over a vector index

We can utilise our embedded documents for more than just Q&A.
In tasks 1 and 2, we used the embedded documents as context for answering user queries, but in this task we will use them to generate original content, using this knowledge base as a source of information and style.

The concept of this use case is to generate ideas for new blogs, utilising knowledge and style information contained in the existing company website data.
We can use Generative AI for creative ideation, too!
This task demonstrates the possibilities for human-computer interaction (HCI) apps.

In [None]:
from langchain.chains import LLMChain

prompt_template = """\
Using the provided context, write the outline of a company blog post.
Include a bullet-point list of the main talking points, and a brief summary of the overall blog.
Context: {context}
Topic: {topic}
"""

PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "topic"])

model = VertexAI(temperature=0.7)

chain = LLMChain(llm=model, prompt=PROMPT)

In [None]:
def generate_blog_outline(topic: str, k: int):
    # search for 'k' nearest documents related to our topic.
    docs = vector_store.similarity_search(topic, k=k)

    # associate topic with the content of each document to generate inputs
    inputs = [{"context": doc.page_content, "topic": topic} for doc in docs]

    # generate blog outline
    output = chain.apply(inputs)

    return output

**➡️ Your task:** Create ideas for a new blog post.
Try adjusting the title of the post to generate new ideas!

In [None]:
BLOG_TITLE = "How we're making our business more sustainable"  # @param {type:"string"}

In [None]:
from IPython.display import display, Markdown

# generate variations of blog posts on the topic provided, based on the 4 most relevant documents
output = generate_blog_outline(BLOG_TITLE, k=4)
markdown = ""

for i, blog in enumerate(output, start=1):
    markdown += f"# #{i} {BLOG_TITLE}\n{blog['text']}\n\n"

display(Markdown(markdown))

**➡️ Your task:** Now it's time to change the prompt template to create new content based on your liking.
To do so, update the `prompt_template` and `temperature`.

In [None]:
# ❗ TODO: setup the prompt, model, and chain to create new types of content.

**🎉 Congratulations! 🎉** You've completed task 3 and generated ideas for future blog posts!
Continue with the next section to explore more possibilities and ideas using LangChain.

## Task 4: Extending the chain

So far you've created two types of chains:

### LLMChain

The `LLMChain` is a simple chain that adds some functionality around language models.
It is used widely throughout LangChain, including in other chains and agents.

An LLMChain consists of a **PromptTemplate** and a **language model** (either an LLM or chat model).
It formats the prompt template using the input key values provided (and also memory key values, if available), passes the formatted string to the LLM and returns the LLM output.

```python
chain = LLMChain(llm=model, prompt=PROMPT)
```
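The description above boils down to "fill in a template, call the model, return the text". Here is a toy class that does exactly that, with a hypothetical `echo_llm` in place of `VertexAI`. It is not LangChain's implementation and omits memory, callbacks and output parsing.

```python
class MiniLLMChain:
    """Toy illustration of what an LLMChain does: format a template, call a model."""

    def __init__(self, llm, template):
        self.llm = llm            # any callable: prompt string -> completion string
        self.template = template  # str.format-style template with {named} inputs

    def run(self, **inputs):
        prompt = self.template.format(**inputs)
        return self.llm(prompt)


def echo_llm(prompt):
    """Hypothetical stand-in for a real model: returns the prompt upper-cased."""
    return prompt.upper()


chain_sketch = MiniLLMChain(echo_llm, "Write a tagline about {topic}.")
print(chain_sketch.run(topic="sustainability"))  # WRITE A TAGLINE ABOUT SUSTAINABILITY.
```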

### ConversationalRetrievalChain

The `ConversationalRetrievalChain` builds on `RetrievalQA` to provide a chat history component.

It first combines the chat history (either explicitly passed in or retrieved from the provided memory) and the question into a standalone question, then looks up relevant documents from the retriever, and finally passes those documents and the question to a question answering chain to return a response.
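The three steps can be mimicked with plain functions. In this sketch every step is a deliberately simplified stand-in (hypothetical names, no LLM, word-overlap retrieval instead of embeddings); it only shows how data flows from chat history to standalone question to retrieved documents to answer. In the notebook's chain, step 1 is the LLM call driven by `condense_question_prompt`, which is where `SYSTEM_PROMPT` is used.

```python
import re


def condense_question(chat_history, question):
    """Step 1 (stubbed): fold chat history into a standalone question.
    A real chain asks the LLM to do this; here we just prepend the last user turn."""
    if not chat_history:
        return question
    last_question, _ = chat_history[-1]
    return f"{last_question} {question}"


def retrieve(query, documents, k):
    """Step 2 (stubbed): rank documents by shared words instead of embeddings."""
    words = set(re.findall(r"\w+", query.lower()))
    def overlap(doc):
        return len(words & set(re.findall(r"\w+", doc.lower())))
    return sorted(documents, key=overlap, reverse=True)[:k]


def answer(question, docs):
    """Step 3 (stubbed): a real chain sends the docs and question to the LLM."""
    return f"Answer to '{question}' using {len(docs)} document(s)"


def conversational_retrieval(chat_history, question, documents, k=2):
    standalone = condense_question(chat_history, question)
    docs = retrieve(standalone, documents, k)
    return answer(standalone, docs), docs


documents = [
    "Greentonic is Datatonic's sustainability initiative.",
    "Datatonic is a data consultancy.",
    "Careers at Datatonic.",
]
history = [("What is Greentonic?", "Greentonic is Datatonic's sustainability initiative.")]
reply, docs = conversational_retrieval(history, "Give an example of that", documents, k=1)
print(reply)
print(docs)
```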

To create one, you will need a retriever.
In the below example, we will create one from a vector store, which can be created from embeddings.

```python
retriever = vector_store.as_retriever(search_kwargs={"k": k})
model = VertexAI(temperature=temperature)
chain = ConversationalRetrievalChain.from_llm(
    llm=model,
    retriever=retriever,
    return_source_documents=True,
    condense_question_prompt=SYSTEM_PROMPT,
)
```

### Explore more chains

**➡️ Your task:** Firm up your knowledge about the two chains used in this notebook [here](https://python.langchain.com/docs/modules/chains/foundational/llm_chain) and [here](https://python.langchain.com/docs/modules/chains/popular/chat_vector_db).
In which scenarios should you apply either of them?
What are their limitations?

*The **LLMChain** is useful when ...*

*Its limitations are ...*

*The **ConversationalRetrievalChain** is useful when ...*

*Its limitations are ...*

**➡️ Your task:** Read about more types of chains in the [official LangChain documentation](https://python.langchain.com/docs/modules/chains/additional/).
We recommend the **Sequential chain** and **Self-critique chain with constitutional AI**.
How can you extend your conversational knowledge worker which is currently based on the `ConversationalRetrievalChain`?
Summarise your idea using either pseudo code or actual code if you have time!
Overall, we would like you to consider:

**Idea + idea description:**

- *The idea is ...*
- *What it is ...*

**Problem it solves + impact:**

- *It would solve the following challenge ...*
- *The volume or value of the impact would be ...*

**Approach + Next steps:**

- *Next steps would be ...*

**❗ Note:** Do you have any other ideas (even outside of implementing a knowledge worker)?
Feel free to ideate about another use case which is relevant to your industry or company!

In [None]:
# ❗ TODO: create pseudo code

**🎉 Congratulations! 🎉** You've completed the last task of this hackathon!
Continue with the next section to explore next steps for *your* Generative AI journey on Google Cloud.

# Bonus Track - Using Gen App Builder Within your Knowledge Worker

Google Cloud has released a tool called Enterprise Search within the Gen App Builder service. Using Enterprise Search, you can ingest websites and internal structured and unstructured data into a search engine, then use a Knowledge Worker to retrieve information in natural language. This is analogous to an internal Google search engine for your documents.

Using a custom implementation of a Knowledge Worker, you can combine the power of Gen App Builder with the customisation ability of a Knowledge Worker, and build applications more simply, quickly and robustly.

Let's have a sneak peek at how to do this.

*Note*: This is not publicly available, so you will need a whitelisted Google Cloud project. We have already created an Enterprise Search for you, so you can just go ahead and use it. It contains the following websites:
* www.wpp.com
* www.essencemediacom.com
* www.choreograph.com
* www.datatonic.com

In [None]:
# Define new environment variables
import os

# These credentials are for a different project, so you will need to ask for them again
GOOGLE_APPLICATION_CREDENTIALS = "/content/es_credentials.json"  # @param {type:"string"}
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = GOOGLE_APPLICATION_CREDENTIALS

SEARCH_ENGINE_PROJECT_ID = "dt-vertex-gen-ai-dev" # @param {type:"string"}
SEARCH_ENGINE_ID = "wpp-genai-day_1689017718091"  # @param {type:"string"}

### Create the Enterprise Search Retriever

In [None]:
"""Retriever wrapper for Google Cloud Enterprise Search."""
# pylint: disable=no-self-argument
from __future__ import annotations

from typing import Any, Dict, List

from google.cloud import discoveryengine_v1beta
from google.cloud.discoveryengine_v1beta.services.search_service import pagers
from langchain.schema import BaseRetriever, Document
from langchain.utils import get_from_dict_or_env
from pydantic import BaseModel, Extra, root_validator

class EnterpriseSearchRetriever(BaseRetriever, BaseModel):
    """Wrapper around Google Cloud Enterprise Search.
    This code has been copied from a Google-owned repository here:
    https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gen-app-builder/retrieval-augmented-generation/utils/retriever.py
    But we have added the k attribute (max num documents).
    """

    client: Any = None  #: :meta private:
    serving_config: Any = None  #: :meta private:Any
    content_search_spec: Any = None  #: :meta private:Any
    project_id: str = ""
    search_engine_id: str = ""
    serving_config_id: str = "default_config"
    location_id: str = "global"
    max_snippet_count: int = 3
    credentials: Any = None
    "The default custom credentials (google.auth.credentials.Credentials) to use "
    "when making API calls. If not provided, credentials will be ascertained from "
    "the environment."
    k: int = 100  # The maximum number of documents to return.

    class Config:
        """Configuration for this pydantic object."""

        extra = Extra.forbid
        arbitrary_types_allowed = True

    @root_validator()
    def validate_environment(cls, values: Dict) -> Dict:
        try:
            from google.cloud import discoveryengine_v1beta
        except ImportError:
            raise ImportError(
                "google.cloud.discoveryengine is not installed. "
                "Please install it with pip install google-cloud-discoveryengine"
            )

        project_id = get_from_dict_or_env(values, "project_id", "SEARCH_ENGINE_PROJECT_ID")
        values["project_id"] = project_id
        search_engine_id = get_from_dict_or_env(
            values, "search_engine_id", "SEARCH_ENGINE_ID"
        )
        values["search_engine_id"] = search_engine_id
        location_id = get_from_dict_or_env(
            values, "location_id", "LOCATION_ID")
        values["location_id"] = location_id
        max_snippet_count = get_from_dict_or_env(
            values, "max_snippet_count", "MAX_SNIPPET_COUNT"
        )
        values["max_snippet_count"] = max_snippet_count

        client = discoveryengine_v1beta.SearchServiceClient(
            credentials=values["credentials"]
        )
        values["client"] = client

        serving_config = client.serving_config_path(
            project=project_id,
            location=location_id,
            data_store=search_engine_id,
            serving_config=values["serving_config_id"],
        )
        values["serving_config"] = serving_config

        content_search_spec = {
            "snippet_spec": {
                "max_snippet_count": max_snippet_count,
            }
        }
        values["content_search_spec"] = content_search_spec

        return values

    def _convert_search_response(
        self, search_results: pagers.SearchPager
    ) -> List[Document]:
        """Converts search response to a list of LangChain documents."""
        documents = []
        for result in search_results:
            if hasattr(result.document, "derived_struct_data"):
                doc_data = result.document.derived_struct_data
                for snippet in doc_data.get("snippets", []):
                    documents.append(
                        Document(
                            page_content=snippet.get("snippet", ""),
                            metadata={
                                "source": f"{doc_data.get('link', '')}:{snippet.get('pageNumber', '')}",
                                "id": result.document.id,
                            },
                        )
                    )
        return documents

    def get_relevant_documents(self, query: str) -> List[Document]:
        """Get documents relevant for a query."""
        request = discoveryengine_v1beta.SearchRequest(
            query=query,
            serving_config=self.serving_config,
            content_search_spec=self.content_search_spec,
            page_size=self.k,
        )
        response = self.client.search(request)
        documents = self._convert_search_response(response.results)

        return documents

    async def aget_relevant_documents(self, query: str) -> List[Document]:
        raise NotImplementedError("Async interface not implemented")

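`_convert_search_response` emits one LangChain `Document` per snippet, not one per search result. As a result, `k` (passed as `page_size`) caps the number of search results, while the number of documents handed to the LLM can reach `k * max_snippet_count` (for example up to 12 with `k=4` and the default of 3 snippets). The sketch below replays that flattening on plain dictionaries so it can run without Google Cloud. `SimpleDocument`, `convert_results` and the sample results are hypothetical. The dictionary shape only mimics the fields the retriever reads (`derived_struct_data`, `link`, `snippets`, `snippet`, `pageNumber`) and is not the real protobuf response.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain.schema.Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def convert_results(results: Iterable[Dict[str, Any]]) -> List[SimpleDocument]:
    """Flatten search results into one document per snippet, as the retriever does."""
    documents = []
    for result in results:
        doc_data = result.get("derived_struct_data", {})
        for snippet in doc_data.get("snippets", []):
            documents.append(
                SimpleDocument(
                    page_content=snippet.get("snippet", ""),
                    metadata={
                        # Same "link:pageNumber" convention as the retriever above
                        "source": f"{doc_data.get('link', '')}:{snippet.get('pageNumber', '')}",
                        "id": result["id"],
                    },
                )
            )
    return documents


# Hypothetical results: two search results that contain three snippets in total.
results = [
    {"id": "doc-1", "derived_struct_data": {"link": "gs://bucket/report.pdf", "snippets": [
        {"snippet": "Revenue grew.", "pageNumber": 3},
        {"snippet": "Costs fell.", "pageNumber": 7},
    ]}},
    {"id": "doc-2", "derived_struct_data": {"link": "gs://bucket/faq.pdf", "snippets": [
        {"snippet": "Opening hours are 9-5.", "pageNumber": 1},
    ]}},
]
docs = convert_results(results)
print(len(docs))                   # 3
print(docs[0].metadata["source"])  # gs://bucket/report.pdf:3
```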

### Replace the existing Q&A Chain with the new retriever

In [None]:
from langchain.chains import ConversationalRetrievalChain
from langchain.llms import VertexAI


def qa_chain():
    # Use Enterprise Search as the retriever (k caps the number of search results)
    retriever = EnterpriseSearchRetriever(
        project_id=SEARCH_ENGINE_PROJECT_ID,
        search_engine_id=SEARCH_ENGINE_ID,
        k=4,
    )

    # The Vertex AI LLM receives the snippets retrieved for the query
    # and uses them to answer the user question.
    # Other LangChain LLMs can be substituted here.
    model = VertexAI(temperature=temperature)

    # A conversational retrieval chain uses the Q&A history passed in as chat_history.
    # This allows contextual questions such as "give an example of that (previous response)".
    # The chain is also set to return the source documents used to generate the answer,
    # which makes the model output explainable.
    chain = ConversationalRetrievalChain.from_llm(
        llm=model,
        retriever=retriever,
        return_source_documents=True,
        condense_question_prompt=SYSTEM_PROMPT,
    )

    return chain

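`ConversationalRetrievalChain` works in two stages. When chat history is present, it first asks the LLM to rewrite the follow-up into a standalone question (this is where `condense_question_prompt` is used). It then retrieves documents for that standalone question and asks the LLM to answer from them. The toy sketch below reproduces only that control flow. `condense`, `retrieve` and `answer` are hypothetical stubs standing in for the LLM and the Enterprise Search retriever, not LangChain APIs, and the keyword-overlap retrieval is purely illustrative.

```python
from typing import Callable, List, Sequence, Tuple

ChatHistory = Sequence[Tuple[str, str]]


def format_history(history: ChatHistory) -> str:
    """Flatten (human, ai) turns into a transcript string for the condense step."""
    return "\n".join(f"Human: {human}\nAssistant: {ai}" for human, ai in history)


def conversational_retrieval(
    question: str,
    history: ChatHistory,
    condense: Callable[[str, str], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> dict:
    # Step 1: only rewrite the question when there is earlier conversation.
    standalone = condense(format_history(history), question) if history else question
    # Step 2: retrieve context for the standalone question.
    docs = retrieve(standalone)
    # Step 3: answer using the retrieved context.
    return {"question": standalone, "answer": answer(standalone, docs), "source_documents": docs}


# Hypothetical stubs standing in for the LLM and the retriever.
CORPUS = ["RAG combines retrieval with generation.", "Gradio builds web UIs."]


def condense(transcript: str, follow_up: str) -> str:
    last_question = transcript.splitlines()[-2].replace("Human: ", "")
    return f"{follow_up} [context: {last_question}]"


def retrieve(query: str) -> List[str]:
    words = {w.strip("?.,[]:").lower() for w in query.split()}
    return [doc for doc in CORPUS if words & {w.strip(".").lower() for w in doc.split()}]


def answer(question: str, docs: List[str]) -> str:
    return docs[0] if docs else "I could not find that in the knowledge base."


history = [("What is RAG?", "Retrieval-augmented generation.")]
result = conversational_retrieval("Give an example of that", history, condense, retrieve, answer)
print(result["question"])  # Give an example of that [context: What is RAG?]
print(result["answer"])    # RAG combines retrieval with generation.
```

Without the condense step, the follow-up "Give an example of that" would share no keywords with the corpus, which is why the chain rewrites it before retrieval.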
### Running the app with the new retriever

In [None]:
def q_a(question: str, history: list):
    # Map history (list of lists) to the expected chat_history format (list of tuples)
    chat_history = [tuple(turn) for turn in history]

    # Query the LLM to get a response
    # First the Q&A chain retrieves snippets relevant to the question from Enterprise Search
    # Then it asks the LLM to use this data to answer the user question
    # Chat history is also provided as further context
    response = qa_chain()(
        {
            "question": question,
            "chat_history": chat_history,
        }
    )

    # Format source documents (sources of excerpts passed to the LLM) into links the user can validate
    sources = [
        "[https://{0}](https://{0})".format(doc.metadata["source"].replace("index.html", ""))
        for doc in response["source_documents"]
    ]

    # Return the LLM answer, and list of sources used (formatted as a string)
    return response["answer"], "\n\n".join(sources)

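Because the retriever creates one document per snippet, `response["source_documents"]` can contain identical source strings several times. The formatting above also prefixes every source with `https://`. The sketch below is a hypothetical `format_sources` helper, not part of the original app. It drops repeated source strings while keeping retrieval order and, as an assumption about your data, only adds `https://` when a source has no URL scheme yet.

```python
from typing import Iterable


def format_sources(sources: Iterable[str]) -> str:
    """Render unique sources as markdown links, preserving first-seen order."""
    links = []
    # dict.fromkeys keeps insertion order and silently drops repeated strings
    for source in dict.fromkeys(sources):
        url = source.replace("index.html", "")
        # Assumption: sources without a scheme (e.g. "example.com/page") need https://
        if "://" not in url:
            url = f"https://{url}"
        links.append(f"[{url}]({url})")
    return "\n\n".join(links)


print(format_sources([
    "example.com/index.html",
    "example.com/index.html",        # duplicate snippet source, rendered once
    "https://docs.example.com/faq",  # already has a scheme, left unchanged
]))
```

Inside `q_a`, the list comprehension could then be replaced with `format_sources(doc.metadata["source"] for doc in response["source_documents"])`.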
In [None]:
def submit(msg, chatbot):
    # First create a new entry in the conversation log
    msg, chatbot = user(msg, chatbot)
    # Then get the chatbot response to the user question
    chatbot = bot(chatbot)
    return msg, chatbot


def user(user_message, history):
    # Return "" to clear the user input, and add the user question to the conversation history
    return "", history + [[user_message, None]]


def bot(history):
    # Get the user question from conversation history
    user_message = history[-1][0]
    # Get the response and sources used to answer the user question
    bot_message, bot_sources = q_a(user_message, history[:-1])

    # Using a template, format the response and sources together
    bot_template = "{0}\n\n<details><summary><b>Sources</b></summary>\n\n{1}</details>"
    # Place the response into the conversation history and return
    history[-1][1] = bot_template.format(bot_message, bot_sources)
    return history

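`bot` stores the formatted reply, including the `<details>` Sources block, in `history`. That same history is passed back to `q_a` as `chat_history` on the next turn, so the LLM's condense step also sees the HTML and source links. One possible refinement is to strip that markup before the turns reach the chain. The sketch below shows this with hypothetical `strip_sources` and `clean_history` helpers. It assumes the reply always uses the exact `bot_template` format shown above.

```python
from typing import List, Optional, Sequence, Tuple

# Must match the start of the Sources block produced by bot_template
SOURCES_MARKER = "\n\n<details><summary><b>Sources</b></summary>"


def strip_sources(reply: Optional[str]) -> str:
    """Drop the Sources block that bot() appends, keeping only the answer text."""
    if reply is None:
        return ""
    return reply.split(SOURCES_MARKER, 1)[0]


def clean_history(history: Sequence[Sequence[Optional[str]]]) -> List[Tuple[str, str]]:
    """Convert Gradio's conversation log into (question, answer) tuples without markup."""
    return [(turn[0] or "", strip_sources(turn[1])) for turn in history]


template = "{0}\n\n<details><summary><b>Sources</b></summary>\n\n{1}</details>"
log = [["What is RAG?", template.format("Retrieval-augmented generation.", "[https://example.com](https://example.com)")]]
print(clean_history(log))  # [('What is RAG?', 'Retrieval-augmented generation.')]
```

`clean_history(history)` could be used to build `chat_history` in `q_a`, so the LLM only sees the previous questions and answers.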
In [None]:
import gradio as gr


def app():
    # Build a simple Gradio app that accepts user input and queries the LLM,
    # then displays the response in a chatbot interface with Markdown support.
    with gr.Blocks(theme=gr.themes.Base()) as demo:
        # Set a page title
        gr.Markdown("# Custom knowledge worker")
        # Create a chatbot conversation log
        chatbot = gr.Chatbot(label="🤖 knowledge worker")
        # Create a textbox for user questions
        msg = gr.Textbox(
            label="👩‍💻 user input",
            info="Query information from the custom knowledge base.",
        )

        # Align both buttons on the same row
        with gr.Row():
            send = gr.Button(value="Send", variant="primary", size="sm")
            clear = gr.Button(value="Clear History", variant="secondary", size="sm")

        # Submit message on <enter> or clicking "Send" button
        msg.submit(submit, [msg, chatbot], [msg, chatbot], queue=False)
        send.click(submit, [msg, chatbot], [msg, chatbot], queue=False)

        # Clear chatbot history on clicking "Clear History" button
        clear.click(lambda: None, None, chatbot, queue=False)

    return demo

In [None]:
demo = app()

# Create a queue system so multiple users can access the page at once
demo.queue()
# Launch the web server; share=True also creates a temporary public link
demo.launch(share=True, debug=True)

# Conclusion

## What have we built?

In this session, we built a knowledge worker that uses Generative AI to make complex information easily accessible.
This concept can be extended into a fully-fledged tool that can unlock the value of your data for customers or internal use.

## Going further

This workshop has introduced all the LangChain knowledge required to create a knowledge worker.
The next steps for moving this project from development to production are discussed below.

## Decoupling LangChain from Gradio

It is not necessary to run LangChain within a Gradio app.
Decoupling LangChain into a separate API has several benefits:
1. Scalability: the backend can be deployed as scalable servers or Docker containers.
2. Simpler code: loose coupling between frontend and backend leads to simpler code that is easier to update and maintain.
3. Flexibility: if a more professional user interface is needed, such as a native React app, replacing Gradio is straightforward. A FastAPI backend can be called from JavaScript or any other client, allowing you to move beyond Python frontend frameworks.

An example of this separation can be found in the GitHub repository, which uses FastAPI to create a simple LangChain API server and Poetry to manage separate server environments.
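To illustrate the frontend/backend contract without assuming the repository's exact code, the sketch below uses Python's standard library (`http.server`) instead of FastAPI. The UI sends a JSON payload `{"question": ..., "history": ...}` over HTTP and receives `{"answer": ..., "sources": ...}`. The `/ask` route, the payload fields and the `answer_question` stub are hypothetical. A real server would call the Q&A chain inside `answer_question`.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def answer_question(question: str, history: list) -> dict:
    """Hypothetical stub: a real server would invoke the LangChain Q&A chain here."""
    return {"answer": f"You asked: {question}", "sources": [], "turns_seen": len(history)}


class AskHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/ask":
            self.send_error(404)
            return
        # Read and decode the JSON request body
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(
            answer_question(payload.get("question", ""), payload.get("history", []))
        ).encode()
        # Send the JSON response back to the frontend
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:  # keep notebook output quiet
        pass


def ask(base_url: str, question: str, history: list) -> dict:
    """What any frontend (React app, Slack bot, Gradio) would do: POST JSON, read JSON."""
    data = json.dumps({"question": question, "history": history}).encode()
    request = urllib.request.Request(
        f"{base_url}/ask", data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Start the API on a free local port in a background thread, then call it as a client would.
server = ThreadingHTTPServer(("127.0.0.1", 0), AskHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"
result = ask(base_url, "What is RAG?", [])
print(result)
server.shutdown()
server.server_close()
```

Because the frontend only depends on this JSON contract, the backend can be moved to FastAPI, containerised or scaled out without touching the UI.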

## Deploying on Google Cloud

Once we have decoupled our frontend and backend code, we can deploy the project onto Google Cloud.

These reference architecture diagrams mirror the flow diagrams first introduced in the workshop introduction. Using Google Cloud, we can create production pipelines for creating and updating vector databases, and deploy a knowledge worker API that can be connected to a web UI, a Slack bot, and so on.

**Example architecture: Ingestion**
![A typical ingestion chain](https://github.com/teamdatatonic/gen-ai-hackathon/blob/7f37d477b18ace5912d34b0574512559d7a457ed/assets/knowledge-worker-gcp-ingestion-pipeline.png?raw=true)

By creating a pipeline for data ingestion, we can keep extending the knowledge worker's knowledge base as new documents and documentation are produced.
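One common way to keep extending the knowledge base without reprocessing everything is to fingerprint each document and only send new or changed documents to the embedding and indexing step. The sketch below does this with SHA-256 hashes kept in a plain dictionary. The `plan_ingestion` function and the in-memory manifest are illustrative assumptions, not part of the reference architecture. It also assumes each batch is a full snapshot, so a document missing from the batch is treated as deleted. A production pipeline would need to persist the manifest somewhere durable.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(text: str) -> str:
    """Stable content hash used to detect new or edited documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_ingestion(
    documents: Dict[str, str], manifest: Dict[str, str]
) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Return (to_index, to_delete, new_manifest) for a full snapshot of {doc_id: text}."""
    new_manifest = {doc_id: fingerprint(text) for doc_id, text in documents.items()}
    # New documents have no previous hash; edited ones have a different hash
    to_index = sorted(d for d, digest in new_manifest.items() if manifest.get(d) != digest)
    # Documents absent from this snapshot should be removed from the index
    to_delete = sorted(d for d in manifest if d not in new_manifest)
    return to_index, to_delete, new_manifest


# First run: everything is new.
batch_1 = {"faq.md": "Opening hours: 9-5.", "policy.pdf": "Remote work allowed."}
to_index, to_delete, manifest = plan_ingestion(batch_1, {})
print(to_index, to_delete)  # ['faq.md', 'policy.pdf'] []

# Second run: one edited document, one new document, one removed document.
batch_2 = {"faq.md": "Opening hours: 8-6.", "handbook.docx": "Welcome aboard."}
to_index, to_delete, manifest = plan_ingestion(batch_2, manifest)
print(to_index, to_delete)  # ['faq.md', 'handbook.docx'] ['policy.pdf']
```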

**Example architecture: Inference**
![A typical inference pipeline](https://github.com/teamdatatonic/gen-ai-hackathon/blob/7f37d477b18ace5912d34b0574512559d7a457ed/assets/knowledge-worker-gcp-inference-pipeline.png?raw=true)

By creating a pipeline for inference, we can leverage the power of Google Cloud to provide a highly reliable and scalable API that can power a variety of applications.



**🎉 Congratulations! 🎉** You've completed this notebook!
Now it's time to embark on your Generative AI journey and ideate on use cases that could benefit your company, alongside or in addition to your first knowledge worker.