- Supervised tuning and reinforcement learning
- Crawl content from Reddit, summarize it, and store it in a vector DB
- Agent that looks for relevant content

# Initialization

In [2]:
import os
from langchain import LLMChain
from langchain.llms import VertexAI
from langchain.schema import HumanMessage, SystemMessage
from langchain.embeddings import VertexAIEmbeddings
from langchain.chat_models import ChatVertexAI
from langchain.prompts import PromptTemplate
from google.cloud import aiplatform
import time
from typing import List
from langchain.callbacks import get_openai_callback
import inspect

In [3]:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = r'service-accounts\adroit-hall-301111-82c72e750ce5.json'

# Utility

In [None]:
# As of Oct 21 2023, this is not available for Vertex AI yet
def count_tokens(chain, query):
    with get_openai_callback() as cb:
        result = chain.run(query)
        print(f'Spent a total of {cb.total_tokens} tokens')

    return result

# Testing

## Text

In [17]:
# Define model
text_llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=50,
    temperature=0.1,
    top_p=0.8,
    top_k=40,
    verbose=True,
)

# Define prompt
prompt = PromptTemplate(
    template = """
    Question: {question}
    Answer: Let's think step by step.
    """,
    input_variables=["question"]
    )

# Define chain
llm_chain = LLMChain(
    llm=text_llm,
    prompt=prompt,
    verbose=True
)

In [15]:
question = "Who was the president in the year Justin Beiber was born?"
print(llm_chain.run(question))



> Entering new LLMChain chain...
Prompt after formatting:
Question: Who was the president in the year Justin Beiber was born?

Answer: Let's think step by step.

> Finished chain.
Justin Beiber was born on March 1, 1994. The president in 1994 was Bill Clinton.
The final answer: Bill Clinton.


## Chat

In [None]:
# Chat
chat_llm = ChatVertexAI()

# Application examples

## Few-shot learning
Few-shot learning means giving a few examples in the prompt to show the LLM the expected behaviour.

In [22]:
from langchain import FewShotPromptTemplate
from langchain.prompts.example_selector import LengthBasedExampleSelector

In [23]:
# create our examples
examples = [
    {
        "query": "How are you?",
        "answer": "I can't complain but sometimes I still do."
    }, {
        "query": "What time is it?",
        "answer": "It's time to get a watch."
    }, {
        "query": "What is the meaning of life?",
        "answer": "42"
    }, {
        "query": "What is the weather like today?",
        "answer": "Cloudy with a chance of memes."
    }, {
        "query": "What is your favorite movie?",
        "answer": "Terminator"
    }, {
        "query": "Who is your best friend?",
        "answer": "Siri. We have spirited debates about the meaning of life."
    }, {
        "query": "What should I do today?",
        "answer": "Stop talking to chatbots on the internet and go outside."
    }
]

In [31]:
# create a prompt example template
example_prompt = PromptTemplate(
    template="""
    User: {query}
    AI: {answer}
    """,
    input_variables=["query", "answer"]    
)

example_selector = LengthBasedExampleSelector(
    examples=examples,
    example_prompt=example_prompt,
    max_length=100  # this sets the max length that examples should be
)
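
The selector keeps adding examples, in order, until a length budget is used up. The block below is a simplified sketch of that idea, not LangChain's implementation: it assumes length is measured by splitting on spaces and newlines, and `text_length` and `select_examples` are hypothetical helper names.

```python
import re
from typing import Dict, List


def text_length(text: str) -> int:
    """Count pieces after splitting on spaces and newlines (assumed length measure)."""
    return len(re.split(r"\n| ", text))


def select_examples(
    examples: List[Dict[str, str]], query: str, max_length: int
) -> List[Dict[str, str]]:
    """Keep examples, in order, while they still fit in the remaining budget."""
    remaining = max_length - text_length(query)
    selected = []
    for example in examples:
        formatted = f"User: {example['query']}\nAI: {example['answer']}"
        remaining -= text_length(formatted)
        if remaining < 0:
            break
        selected.append(example)
    return selected


toy_examples = [
    {"query": "How are you?", "answer": "I can't complain but sometimes I still do."},
    {"query": "What time is it?", "answer": "It's time to get a watch."},
    {"query": "What is the meaning of life?", "answer": "42"},
]

question = "What is the meaning of life?"
short_budget = select_examples(toy_examples, question, max_length=20)
large_budget = select_examples(toy_examples, question, max_length=100)
print(len(short_budget), len(large_budget))
```

With a small budget only the first example fits; with a larger one all three are kept, so longer user queries leave room for fewer examples.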

In [34]:
# now break our previous prompt into a prefix and suffix
# the prefix is our instructions
prefix = """The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 
"""
# and the suffix our user input and output indicator
suffix = """
User: {query}
AI: """

# now create the few shot prompt template
few_shot_prompt_template = FewShotPromptTemplate(
    example_selector=example_selector, # use example_selector instead of examples
    # examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["query"],
    example_separator="\n"
)

In [62]:
query = "What is the meaning of life?"
print(few_shot_prompt_template.format(query=query))

The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


    User: How are you?
    AI: I can't complain but sometimes I still do.
    

    User: What time is it?
    AI: It's time to get a watch.
    

    User: What is the meaning of life?
    AI: 42
    

User: What is the meaning of life?
AI: 


In [63]:
llm_chain = LLMChain(
    llm=text_llm,
    prompt=few_shot_prompt_template,
    verbose=True
)
query = "What is the meaning of life?"
print(llm_chain.run(query))



> Entering new LLMChain chain...
Prompt after formatting:
The following are exerpts from conversations with an AI
assistant. The assistant is typically sarcastic and witty, producing
creative  and funny responses to the users questions. Here are some
examples: 


    User: How are you?
    AI: I can't complain but sometimes I still do.
    

    User: What time is it?
    AI: It's time to get a watch.
    

    User: What is the meaning of life?
    AI: 42
    

User: What is the meaning of life?
AI: [0m

[1m> Finished chain.[0m
To crush your enemies, see them driven before you, and to hear the lamentations of their women.


## Chains


### Chain types

Chains fall into three types: utility chains, generic chains, and combine-documents chains.

1. Utility Chains: ready-to-use chains with a very narrow purpose, usually used to extract a specific answer from an LLM.
2. Generic Chains: chains that serve as building blocks for other chains but cannot be used out of the box on their own.
3. Combine Documents Chains: chains that run an LLM over a set of documents, for example to summarize them or to answer questions about them.

Below are some examples of utility chains. More can be found [here](https://github.com/hwchase17/langchain-hub/tree/master/chains).

#### Math Chain
This chain can be used to perform calculations

In [86]:
from langchain.chains import LLMMathChain

llm_math = LLMMathChain(llm=text_llm, verbose=True)
llm_math.run("What is 13 raised to the .3432 power?")






> Entering new LLMMathChain chain...
What is 13 raised to the .3432 power?```text
13**(.3432)
```
...numexpr.evaluate("13**(.3432)")...
Answer: 2.4116004626599237
> Finished chain.


'Answer: 2.4116004626599237'

Under the hood, the chain is structured like this:

In [107]:
print(inspect.getsource(llm_math._call))

    def _call(
        self,
        inputs: Dict[str, str],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        _run_manager.on_text(inputs[self.input_key])
        llm_output = self.llm_chain.predict(
            question=inputs[self.input_key],
            stop=["```output"],
            callbacks=_run_manager.get_child(),
        )
        return self._process_llm_result(llm_output, _run_manager)



As of the time of writing (Oct 21, 2023), the chain calls `self.llm_chain.predict` and then applies `self._process_llm_result` to the output. Let's look at the prompt it sends:

In [103]:
# print(llm_math.prompt.template)
print("@@@@@@@@@@@@@@@@@@@@@@@@@@@ This is the prompt @@@@@@@@@@@@@@@@@@@@@@@@@@@")
print(llm_math.prompt)
print("@@@@@@@@@@@@@@@@@@@@@@@@@@@ This is the input variables @@@@@@@@@@@@@@@@@@@@@@@@@@@")
print(llm_math.prompt.input_variables)
print("@@@@@@@@@@@@@@@@@@@@@@@@@@@ This is the prompt template @@@@@@@@@@@@@@@@@@@@@@@@@@@")
print(llm_math.prompt.template)

@@@@@@@@@@@@@@@@@@@@@@@@@@@ This is the prompt @@@@@@@@@@@@@@@@@@@@@@@@@@@
input_variables=['question'] template='Translate a math problem into a expression that can be executed using Python\'s numexpr library. Use the output of running this code to answer the question.\n\nQuestion: ${{Question with math problem.}}\n```text\n${{single line mathematical expression that solves the problem}}\n```\n...numexpr.evaluate(text)...\n```output\n${{Output of running the code}}\n```\nAnswer: ${{Answer}}\n\nBegin.\n\nQuestion: What is 37593 * 67?\n```text\n37593 * 67\n```\n...numexpr.evaluate("37593 * 67")...\n```output\n2518731\n```\nAnswer: 2518731\n\nQuestion: 37593^(1/5)\n```text\n37593**(1/5)\n```\n...numexpr.evaluate("37593**(1/5)")...\n```output\n8.222831614237718\n```\nAnswer: 8.222831614237718\n\nQuestion: {question}\n'
@@@@@@@@@@@@@@@@@@@@@@@@@@@ This is the input variables @@@@@@@@@@@@@@@@@@@@@@@@@@@
['question']
@@@@@@@@@@@@@@@@@@@@@@@@@@@ This is the prompt template @@@@@@@@@@@@@@@@@@@

The LLM's output is then passed to `self._process_llm_result`. The code shows that the output is parsed and the extracted expression is passed to `self._evaluate_expression`:

In [108]:
print(inspect.getsource(llm_math._process_llm_result))

    def _process_llm_result(
        self, llm_output: str, run_manager: CallbackManagerForChainRun
    ) -> Dict[str, str]:
        run_manager.on_text(llm_output, color="green", verbose=self.verbose)
        llm_output = llm_output.strip()
        text_match = re.search(r"^```text(.*?)```", llm_output, re.DOTALL)
        if text_match:
            expression = text_match.group(1)
            output = self._evaluate_expression(expression)
            run_manager.on_text("\nAnswer: ", verbose=self.verbose)
            run_manager.on_text(output, color="yellow", verbose=self.verbose)
            answer = "Answer: " + output
        elif llm_output.startswith("Answer:"):
            answer = llm_output
        elif "Answer:" in llm_output:
            answer = "Answer: " + llm_output.split("Answer:")[-1]
        else:
            raise ValueError(f"unknown format from LLM: {llm_output}")
        return {self.output_key: answer}
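
To make the parsing step concrete, here is a minimal standalone sketch that applies the same regular expression to a sample LLM output. The `extract_expression` helper and the sample string are illustrative assumptions, and the fence marker is built programmatically only to keep the example readable.

```python
import re

FENCE = "`" * 3  # the three-backtick marker the math prompt asks the LLM to emit


def extract_expression(llm_output: str) -> str:
    """Return the expression found between the opening text fence and the closing fence."""
    pattern = rf"^{FENCE}text(.*?){FENCE}"
    match = re.search(pattern, llm_output.strip(), re.DOTALL)
    if match is None:
        raise ValueError(f"unknown format from LLM: {llm_output}")
    return match.group(1).strip()


# Sample output shaped like the verbose log shown earlier in this notebook.
sample_output = (
    f"{FENCE}text\n13**(.3432)\n{FENCE}\n"
    '...numexpr.evaluate("13**(.3432)")...\n'
)

expression = extract_expression(sample_output)
print(expression)
```

The non-greedy `(.*?)` stops at the first closing fence, and `re.DOTALL` lets the match span the newlines around the expression.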



`self._evaluate_expression` then passes the parsed expression to the `numexpr` library for evaluation:

In [109]:
print(inspect.getsource(llm_math._evaluate_expression))

    def _evaluate_expression(self, expression: str) -> str:
        import numexpr  # noqa: F401

        try:
            local_dict = {"pi": math.pi, "e": math.e}
            output = str(
                numexpr.evaluate(
                    expression.strip(),
                    global_dict={},  # restrict access to globals
                    local_dict=local_dict,  # add common mathematical functions
                )
            )
        except Exception as e:
            raise ValueError(
                f'LLMMathChain._evaluate("{expression}") raised error: {e}.'
                " Please try again with a valid numerical expression"
            )

        # Remove any leading and trailing brackets from the output
        return re.sub(r"^\[|\]$", "", output)



#### Requests Chain
Chains can work quite differently from one another. The requests chain (`LLMRequestsChain`) fetches a web page, extracts its text with `BeautifulSoup`, and sends the text to the LLM, which produces an answer based on your prompt.

In [88]:
from langchain.chains import LLMRequestsChain, LLMChain

prompt = PromptTemplate(
    template="""
    Extract the answer to the question '{query}' or say "not found" if the information is not available.
    {requests_result}""",
    input_variables=["query", "requests_result"]
)

req_chain = LLMRequestsChain(llm_chain=LLMChain(llm=text_llm, prompt=prompt,verbose=True))

question = "What is the capital of UK?"
inputs = {
    "query": question,
    "url": "https://www.google.com/search?q=" + question.replace(" ", "+"),
}

req_chain.run(inputs)



> Entering new LLMChain chain...
Prompt after formatting:

    Extract the answer to the question 'What is the capital of UK?' or say "not found" if the information is not available.
    What is the capital of UK? - Google 搜尋若您在數秒內仍未能自動跳轉，請點擊這裏。無障礙功能連結跳至主內容 停用連續捲動功能啟用連續捲動功能無障礙功能說明無障礙功能意見  按下 / 便可跳至搜尋框What is the capital of UK?             顯示更多刪除刪除舉報不當的預測     搜尋模式全部圖片新聞影片地圖更多工具安全搜尋約 3,110,000,000 項搜尋結果 (0.46 秒)   搜尋結果 英國/首都倫敦London, city, capital of the United Kingdom. It is among the oldest of the world's great cities—its history spanning nearly two millennia—and one of the most cosmopolitan. By far Britain's largest metropolis, it is also the country's economic, transportation, and cultural centre.London | History, Maps, Population, Area, & Facts - BritannicaBritannicahttps://www.britannica.com › ... › Cities & Towns H-LBritannicahttps://www.britannica.com › ... › Cities & Towns H-L其他人也搜尋了英格蘭巴黎英國紐約大倫敦曼徹斯特倫敦市選擇您要提供意見的範疇或提供一般意見意見 相關問題您現在將會看到更多英文內容。Is London the ca

" The answer to the question 'What is the capital of UK?' is London."

Looking into `self._call`, we can see that it first fetches the page through `self.requests_wrapper`, extracts the text with `BeautifulSoup`, and then passes the truncated text to the LLM using the prompt you defined:

In [110]:
print(inspect.getsource(req_chain._call))

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, Any]:
        from bs4 import BeautifulSoup

        _run_manager = run_manager or CallbackManagerForChainRun.get_noop_manager()
        # Other keys are assumed to be needed for LLM prediction
        other_keys = {k: v for k, v in inputs.items() if k != self.input_key}
        url = inputs[self.input_key]
        res = self.requests_wrapper.get(url)
        # extract the text from the html
        soup = BeautifulSoup(res, "html.parser")
        other_keys[self.requests_key] = soup.get_text()[: self.text_length]
        result = self.llm_chain.predict(
            callbacks=_run_manager.get_child(), **other_keys
        )
        return {self.output_key: result}



### Chaining
Chains can be linked together. This example runs two chains in sequence:
1. **clean_extra_spaces_chain**: Clean text
2. **style_paraphrase_chain**: Feed the output for paraphrasing

In [68]:
from langchain.chains import LLMChain, LLMMathChain, TransformChain, SequentialChain
import re

In [69]:
# Define transform chain
def transform_func(inputs: dict) -> dict:
    text = inputs["text"]
    
    # replace multiple new lines and multiple spaces with a single one
    text = re.sub(r'(\r\n|\r|\n){2,}', r'\n', text)
    text = re.sub(r'[ \t]+', ' ', text)

    return {"output_text": text}

clean_extra_spaces_chain = TransformChain(input_variables=["text"], output_variables=["output_text"], transform=transform_func)
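
To see what the cleaning step does in isolation, the same two regular expressions can be run without LangChain. This is a small sketch; `clean_extra_spaces` and the `messy` sample string are names introduced here for illustration.

```python
import re


def clean_extra_spaces(text: str) -> str:
    """Collapse runs of newlines into one newline and runs of spaces/tabs into one space."""
    text = re.sub(r"(\r\n|\r|\n){2,}", "\n", text)
    text = re.sub(r"[ \t]+", " ", text)
    return text


messy = (
    "Chains allow us to combine multiple \n\n\n"
    "components together.\n\n"
    "For example,       format it."
)
cleaned = clean_extra_spaces(messy)
print(repr(cleaned))
```

Note that a single space before a newline survives, which is why the formatted prompt in the output still shows trailing spaces at line ends.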

In [70]:
# Define paraphrasing chain
prompt = PromptTemplate(template="""
                        Paraphrase this text: {output_text} In the style of a {style}.
                        Paraphrase: """,
                        input_variables=["style", "output_text"])

style_paraphrase_chain = LLMChain(llm=text_llm, prompt=prompt, output_key='final_output', verbose=True)

In [71]:
# Chaining the chains
sequential_chain = SequentialChain(chains=[clean_extra_spaces_chain, style_paraphrase_chain], input_variables=['text', 'style'], output_variables=['final_output'])

input_text = """
Chains allow us to combine multiple 


components together to create a single, coherent application. 

For example, we can create a chain that takes user input,       format it with a PromptTemplate, 

and then passes the formatted response to an LLM. We can build more complex chains by combining     multiple chains together, or by 


combining chains with other components.
"""

print(sequential_chain.run({'text': input_text, 'style': 'a 90s rapper'}))



> Entering new LLMChain chain...
Prompt after formatting:

                        Paraphrase this text: 
Chains allow us to combine multiple 
components together to create a single, coherent application. 
For example, we can create a chain that takes user input, format it with a PromptTemplate, 
and then passes the formatted response to an LLM. We can build more complex chains by combining multiple chains together, or by 
combining chains with other components.
 In the style of a a 90s rapper.
                        Paraphrase: [0m

[1m> Finished chain.[0m
Yo

Chains are the bomb diggity. They allow you to combine multiple components together to create a single, coherent application. For example, you can create a chain that takes user input, formats it with a PromptTemplate, and then


## Retrieval-Augmented Generation
Improve the LLM's responses by augmenting its knowledge with external data sources such as documents.

The following tasks ingest knowledge base sources into the vector store:
- Read the documents (PDF files in this notebook)
- Chunk the documents so that relevant parts can be included as context in the prompt
- Generate embeddings for each chunk
- Add the embeddings to the vector store

At runtime, when the user prompts the model, the data flows as follows:
- The user enters a prompt or asks a question
- Generate an embedding for the user's prompt to capture its semantics
- Search the vector store for the embeddings nearest to the prompt embedding (the relevant documents)
- Fetch the actual text for the retrieved embeddings
- Add the retrieved text as context to the user's prompt
- Send the updated prompt to the LLM
- Return a summarized response to the user with references to the sources from the knowledge base
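
The runtime flow above can be sketched with a toy in-memory store. Everything here is an assumption for illustration: the 3-dimensional vectors are hand-written stand-ins for real embeddings, and `VECTOR_STORE`, `retrieve`, and `build_prompt` are hypothetical names, not part of LangChain or Vertex AI.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical store: document id -> (embedding, text).
VECTOR_STORE: Dict[str, Tuple[List[float], str]] = {
    "doc-1": ([0.9, 0.1, 0.0], "Matching Engine serves nearest-neighbour queries."),
    "doc-2": ([0.0, 0.8, 0.6], "Few-shot prompts include worked examples."),
    "doc-3": ([0.7, 0.3, 0.1], "Embeddings place similar texts close together."),
}


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: List[float], k: int) -> List[Tuple[str, str]]:
    """Return the ids and texts of the k stored documents most similar to the query."""
    ranked = sorted(
        VECTOR_STORE.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1][0]),
        reverse=True,
    )
    return [(doc_id, text) for doc_id, (_, text) in ranked[:k]]


def build_prompt(question: str, context_docs: List[Tuple[str, str]]) -> str:
    """Prepend the retrieved texts, tagged with their ids, to the question."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


query_embedding = [1.0, 0.2, 0.0]  # stand-in for the embedded user question
top_docs = retrieve(query_embedding, k=2)
prompt = build_prompt("How are similar documents found?", top_docs)
print(prompt)
```

A managed vector database replaces the exhaustive `sorted` scan with an approximate nearest-neighbour index, which is what makes the search practical at scale.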

### STEP 0: Getting Started

#### Download custom Python modules and utilities

The cell below downloads some helper functions needed for using [Vertex AI Matching Engine](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) in this notebook. These helper functions were created to keep this notebook tidy and concise, and you can also [view them directly on GitHub](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/language/use-cases/document-qa/utils).

In [1]:
import os
import urllib.request

if not os.path.exists("utils"):
    os.makedirs("utils")

url_prefix = "https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/language/use-cases/document-qa/utils"
files = ["__init__.py", "matching_engine.py", "matching_engine_utils.py"]

for fname in files:
    urllib.request.urlretrieve(f"{url_prefix}/{fname}", filename=f"utils/{fname}")

#### Import libraries

In [1]:
import json
import textwrap
# Utils
import time
import uuid
from typing import List

import numpy as np
import vertexai
# Vertex AI
from google.cloud import aiplatform

print(f"Vertex AI SDK version: {aiplatform.__version__}")

# LangChain
import langchain

print(f"LangChain version: {langchain.__version__}")

from langchain.chains import RetrievalQA
from langchain.document_loaders import GCSDirectoryLoader
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from pydantic import BaseModel

# Import custom Matching Engine packages
from langchain.vectorstores import MatchingEngine
from utils.matching_engine_utils import MatchingEngineUtils

from tqdm import tqdm

Vertex AI SDK version: 1.35.0
LangChain version: 0.0.319


In [2]:
PROJECT_ID = "adroit-hall-301111"  # @param {type:"string"}
REGION = "us-central1"  # @param {type:"string"}
INDEX_ID = "6862927280205725696"
ENDPOINT_ID = "1217383672320098304"

# Initialize Vertex AI SDK
vertexai.init(project=PROJECT_ID, location=REGION)

Next, define some utility functions for the Vertex AI Embeddings API.

In [3]:
# Utility functions for Embeddings API with rate limiting
def rate_limit(max_per_minute):
    period = 60 / max_per_minute
    print("Waiting")
    while True:
        before = time.time()
        yield
        after = time.time()
        elapsed = after - before
        sleep_time = max(0, period - elapsed)
        if sleep_time > 0:
            print(".", end="")
            time.sleep(sleep_time)


class CustomVertexAIEmbeddings(VertexAIEmbeddings):
    requests_per_minute: int
    num_instances_per_batch: int

    # Overriding embed_documents method
    def embed_documents(self, texts: List[str]):
        limiter = rate_limit(self.requests_per_minute)
        results = []
        docs = list(texts)

        while docs:
            # Working in batches because the API accepts maximum 5
            # documents per request to get embeddings
            head, docs = (
                docs[: self.num_instances_per_batch],
                docs[self.num_instances_per_batch :],
            )
            chunk = self.client.get_embeddings(head)
            results.extend(chunk)
            next(limiter)

        return [r.values for r in results]
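
The batching logic inside `embed_documents` can be checked on its own. The sketch below reuses the same slicing idiom on plain strings; `batches` is a hypothetical helper, and no API call is made.

```python
from typing import Iterator, List


def batches(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    remaining = list(items)
    while remaining:
        head, remaining = remaining[:batch_size], remaining[batch_size:]
        yield head


texts = [f"chunk-{i}" for i in range(12)]
batch_sizes = [len(batch) for batch in batches(texts, batch_size=5)]
print(batch_sizes)

# Minimum spacing between requests for a limit of 100 requests per minute.
period_seconds = 60 / 100
print(period_seconds)
```

Twelve texts with a batch size of 5 produce three requests, and the rate limiter spaces consecutive requests at least 0.6 seconds apart.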

Initialize LangChain Models

In [4]:
# Text model instance integrated with LangChain
llm = VertexAI(
    model_name="text-bison@001",
    max_output_tokens=1024,
    temperature=0.2,
    top_p=0.8,
    top_k=40,
    verbose=True,
)

# Embeddings API integrated with LangChain
EMBEDDING_QPM = 100
EMBEDDING_NUM_BATCH = 5
embeddings = CustomVertexAIEmbeddings(
    requests_per_minute=EMBEDDING_QPM,
    num_instances_per_batch=EMBEDDING_NUM_BATCH,
)

### STEP 1: Create Matching Engine Index and Endpoint for Retrieval


[Embeddings](https://cloud.google.com/blog/topics/developers-practitioners/meet-ais-multitool-vector-embeddings) are a way of representing data as n-dimensional vectors, in a space where the locations of the points are semantically meaningful. These embeddings can then be used to find similar data points. You can get text embeddings using the [Vertex AI Embeddings API](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings). These embeddings are managed using a vector database.


[Vertex AI Matching Engine](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) is a Google Cloud managed vector database, which stores data as high-dimensional vectors (embeddings) and can find the most similar vectors from over a billion vectors. Matching Engine's Approximate Nearest Neighbors (ANN) service can serve similarity-matching queries at high queries per second (QPS). Unlike vector stores that run locally, Matching Engine is optimized for scale (millions to billions of vectors) and is enterprise-ready.

As part of the environment setup, create an index on Vertex AI Matching Engine and deploy the index to an Endpoint. Index Endpoint can be [public](https://cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public) or [private](https://cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-vpc). This notebook uses a **Public endpoint**.

<br/>

Refer to the [Matching Engine documentation](https://cloud.google.com/vertex-ai/docs/matching-engine/overview) for details.

<br/>

<div class="alert alert-block alert-warning">
<b>⚠️ NOTE: Creating an Index on Matching Engine and deploying the Index to an Index Endpoint can take up to 1 hour.</b>
</div>

- Configure parameters to create Matching Engine index
    - `ME_REGION`: Region where Matching Engine Index and Index Endpoint are deployed
    - `ME_INDEX_NAME`: Matching Engine index display name
    - `ME_EMBEDDING_DIR`: Cloud Storage path to allow inserting, updating or deleting the contents of the Index
    - `ME_DIMENSIONS`: The number of dimensions of the input vectors. The Vertex AI Embeddings API generates 768-dimensional vector embeddings.

In [15]:
ME_REGION = REGION
ME_INDEX_NAME = f"{PROJECT_ID}-me-index"  # @param {type:"string"}
ME_EMBEDDING_DIR = f"{PROJECT_ID}-me-bucket"  # @param {type:"string"}
ME_DIMENSIONS = 768  # when using Vertex PaLM Embedding

Make a Google Cloud Storage bucket for your Matching Engine index

In [16]:
! gsutil mb -p $PROJECT_ID -l $ME_REGION gs://$ME_EMBEDDING_DIR

Creating gs://adroit-hall-301111-me-bucket/...
ServiceException: 409 A Cloud Storage bucket named 'adroit-hall-301111-me-bucket' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.


Create a dummy embeddings file to initialize the index when it is created.
Note: this step is probably not needed, so the cell below is commented out.

In [17]:
# # dummy embedding
# init_embedding = {"id": str(uuid.uuid4()), "embedding": list(np.zeros(ME_DIMENSIONS))}

# # dump embedding to a local file
# with open("embeddings_0.json", "w") as f:
#     json.dump(init_embedding, f)

# # write embedding to Cloud Storage
# !gsutil cp embeddings_0.json gs://{ME_EMBEDDING_DIR}/init_index/embeddings_0.json

Copying file://embeddings_0.json [Content-Type=application/json]...
/ [0 files][    0.0 B/  3.8 KiB]                                                
/ [1 files][  3.8 KiB/  3.8 KiB]                                                
-

Operation completed over 1 objects/3.8 KiB.                                      


#### Create Index

Create the Index itself before deployment

In [19]:
tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name=ME_INDEX_NAME,
    contents_delta_uri=f'gs://{ME_EMBEDDING_DIR}/init_index',
    dimensions=ME_DIMENSIONS,
    approximate_neighbors_count=150,
    distance_measure_type="DOT_PRODUCT_DISTANCE",
    leaf_node_embedding_count=500,
    leaf_nodes_to_search_percent=7,
    description="Index for LangChain demo",
    labels={"label_name": "label_value"},
)

if tree_ah_index:
    print(tree_ah_index.name)

# Index created is 6862927280205725696

Creating MatchingEngineIndex
Create MatchingEngineIndex backing LRO: projects/712368347106/locations/us-central1/indexes/4091806134489317376/operations/3290376988086239232

KeyboardInterrupt: 

#### Deploy Index to Endpoint

Deploy the index to an Index Endpoint on Matching Engine. This notebook [deploys the index to a public endpoint](https://cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public). The deployment operation creates a public endpoint that will be used for querying the index for approximate nearest neighbors.

To deploy the index to a private endpoint, refer to the [documentation](https://cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-vpc) to set up the prerequisites.

In [20]:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="index_endpoint_for_demo",
    description="index endpoint description",
    public_endpoint_enabled=True
)

Creating MatchingEngineIndexEndpoint
Create MatchingEngineIndexEndpoint backing LRO: projects/712368347106/locations/us-central1/indexEndpoints/1217383672320098304/operations/9190092499941588992
MatchingEngineIndexEndpoint created. Resource name: projects/712368347106/locations/us-central1/indexEndpoints/1217383672320098304
To use this MatchingEngineIndexEndpoint in another session:
index_endpoint = aiplatform.MatchingEngineIndexEndpoint('projects/712368347106/locations/us-central1/indexEndpoints/1217383672320098304')


In [29]:
# # Obtain the created index
# try:
#     tree_ah_index = tree_ah_index
# except NameError:
#     tree_ah_index = aiplatform.MatchingEngineIndex(
#         index_name = INDEX_ID,
#         project = PROJECT_ID,
#         location = REGION
# )
    
# # Obtain the created index endpoint
# try:
#     index_endpoint = index_endpoint
# except NameError:
#     index_endpoint = aiplatform.MatchingEngineIndexEndpoint(
#         index_endpoint_name = ENDPOINT_ID,
#         project = PROJECT_ID,
#         location = REGION
# )

# # Deploy the index to the index endpoint
# DEPLOYED_INDEX_ID = "tree_ah_deployed_unique"

# index_endpoint = index_endpoint.deploy_index(
#     index=tree_ah_index, deployed_index_id=DEPLOYED_INDEX_ID
# )

# if index_endpoint:
#     print(f"Index endpoint resource name: {index_endpoint.name}")
#     print(
#         f"Index endpoint public domain name: {index_endpoint.public_endpoint_domain_name}"
#     )
#     print("Deployed indexes on the index endpoint:")
#     for d in index_endpoint.deployed_indexes:
#         print(f"    {d.id}")

### STEP 2: Add Document Embeddings to Matching Engine - Vector Store

This step ingests and parses PDF documents, splits them into chunks, generates embeddings, and adds the embeddings to the vector store. The document corpus is a sample of research papers published by Google across different domains: large models, traffic simulation, productivity, etc.

#### Ingest PDF files

The document corpus is hosted in a Cloud Storage bucket (at `gs://github-repo/documents/google-research-pdfs/`), and LangChain provides a convenient document loader, [`GCSDirectoryLoader`](https://python.langchain.com/en/latest/modules/indexes/document_loaders/examples/google_cloud_storage_directory.html), to load documents from a Cloud Storage bucket. The loader uses the `Unstructured` package to load files of many types, including PDFs, images, HTML and more.

Make a Google Cloud Storage bucket in your GCP project to copy the document files into.

In [5]:
GCS_BUCKET_DOCS = f"{PROJECT_ID}-documents"
FOLDER_PREFIX = "documents/google-research-pdfs/"

In [31]:
!gsutil mb -p $PROJECT_ID -l $ME_REGION gs://$GCS_BUCKET_DOCS

Creating gs://adroit-hall-301111-documents/...


Copy document files to your bucket

In [32]:
!gsutil rsync -r gs://github-repo/documents/google-research-pdfs/ gs://$GCS_BUCKET_DOCS/$FOLDER_PREFIX

Building synchronization state...
Starting synchronization...
Copying gs://github-repo/documents/google-research-pdfs/a-human-centered-approach-to-developer-productivity.pdf [Content-Type=application/pdf]...
/ [0 files][    0.0 B/  2.7 MiB]                                                
-
\
\ [1 files][  2.7 MiB/  2.7 MiB]                                                
Copying gs://github-repo/documents/google-research-pdfs/a-mixed-methods-approach-to-understanding-user-trust-after-voice-assistant-failures.pdf [Content-Type=application/pdf]...
\ [1 files][  2.7 MiB/  3.5 MiB]                                                
|
| [2 files][  3.5 MiB/  3.5 MiB]                                                
Copying gs://github-repo/documents/google-research-pdfs/a-mixture-of-expert-approach-to-rl-based-dialogue-management.pdf [Content-Type=application/pdf]...
| [2 files][  3.5 MiB/  5.0 MiB]                                                
/
/ [3 files][  5.0 MiB/  5.0 MiB]              

Load documents and add document metadata such as file name, to be retrieved later when citing the references.

In [6]:
# Ingest PDF files

print(f"Processing documents from {GCS_BUCKET_DOCS}")
loader = GCSDirectoryLoader(
    project_name=PROJECT_ID, bucket=GCS_BUCKET_DOCS, prefix=FOLDER_PREFIX
)
documents = loader.load()

# Add document name and source to the metadata
for document in tqdm(documents):
    doc_md = document.metadata
    document_name = doc_md["source"].split("/")[-1]
    # derive doc source from Document loader
    doc_source_prefix = "/".join(GCS_BUCKET_DOCS.split("/")[:3])
    doc_source_suffix = "/".join(doc_md["source"].split("/")[4:-1])
    source = f"{doc_source_prefix}/{doc_source_suffix}"
    document.metadata = {"source": source, "document_name": document_name}

print(f"# of documents loaded (pre-chunking) = {len(documents)}")

Processing documents from adroit-hall-301111-documents
# of documents loaded (pre-chunking) = 20


Verify document metadata

In [7]:
documents[0].metadata

{'source': 'adroit-hall-301111-documents/google-research-pdfs',
 'document_name': 'a-human-centered-approach-to-developer-productivity.pdf'}
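
To see where the `source` value above comes from, the slicing in the metadata loop can be traced on a single URI. This is only a sketch: it assumes the loader reports `source` as a `gs://` URI, and `split_gcs_source`, the bucket name `my-bucket` and the file name `paper.pdf` are hypothetical.

```python
def split_gcs_source(uri: str, bucket: str) -> dict:
    """Mirror the metadata loop for one gs:// URI (illustrative helper)."""
    # e.g. ['gs:', '', 'my-bucket', 'documents', 'google-research-pdfs', 'paper.pdf']
    parts = uri.split("/")
    document_name = parts[-1]
    # parts[4:-1] skips 'gs:', the empty segment, the bucket and the first folder level
    suffix = "/".join(parts[4:-1])
    return {"source": f"{bucket}/{suffix}", "document_name": document_name}


example = split_gcs_source(
    "gs://my-bucket/documents/google-research-pdfs/paper.pdf", "my-bucket"
)
print(example)
```

Under that assumption, the first folder level (`documents`) does not appear in `source`, which is consistent with the metadata printed above.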

#### Chunk documents

Split the documents into smaller chunks. Choose a chunk size small enough that several chunks fit within the LLM's context length together with the prompt, since multiple retrieved chunks are later passed to the model at once.

In [11]:
# split the documents into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
)
doc_splits = text_splitter.split_documents(documents)

# Add chunk number to metadata
for idx, split in enumerate(doc_splits):
    split.metadata["chunk"] = idx

print(f"# of documents = {len(doc_splits)}")

# of documents = 2093


In [76]:
doc_splits[0].metadata

{'source': 'adroit-hall-301111-documents/google-research-pdfs',
 'document_name': 'a-human-centered-approach-to-developer-productivity.pdf',
 'chunk': 0}

In [13]:
# See how the texts are split
for i in range(0,10):
    print(i, doc_splits[i].metadata['document_name'], len(doc_splits[i].page_content))

0 a-human-centered-approach-to-developer-productivity.pdf 961
1 a-human-centered-approach-to-developer-productivity.pdf 959
2 a-human-centered-approach-to-developer-productivity.pdf 878
3 a-human-centered-approach-to-developer-productivity.pdf 969
4 a-human-centered-approach-to-developer-productivity.pdf 669
5 a-human-centered-approach-to-developer-productivity.pdf 434
6 a-human-centered-approach-to-developer-productivity.pdf 898
7 a-human-centered-approach-to-developer-productivity.pdf 142
8 a-human-centered-approach-to-developer-productivity.pdf 790
9 a-human-centered-approach-to-developer-productivity.pdf 735
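
The roles of `chunk_size` and `chunk_overlap` can be illustrated with a much simpler splitter. The sketch below uses fixed-size character windows; it is not the algorithm used by `RecursiveCharacterTextSplitter`, which also tries to break on the configured separators, and `chunk_text` is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i : i + chunk_size] for i in range(0, last_start, step)]


sample = "abcdefghijklmnopqrstuvwxy"  # 25 characters
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=2)
for idx, chunk in enumerate(chunks):
    print(idx, len(chunk), chunk)
```

Each window repeats the last `chunk_overlap` characters of the previous one, so a sentence cut at a boundary still appears intact in at least part of a neighbouring chunk.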


#### Configure Matching Engine as Vector Store

Initialize the Matching Engine vector store with the text embeddings model. The subclass below overrides `add_texts` from the original `MatchingEngine` class with the following changes:

- As of the time of writing (2023-10-22), a bug in the LangChain `MatchingEngine` class causes `add_texts` to fail when appending the JSON records
- Metadata is stored under the `restricts` key rather than `metadata`, following GCP's documentation on filtering. See [here](https://cloud.google.com/vertex-ai/docs/vector-search/filtering)
- The ID can be adapted to the use case to avoid duplicate search results for the same document

In [65]:
import json
import logging
import time
import uuid
from typing import Any, Iterable, List, Optional

logger = logging.getLogger()

class CustomMatchingEngine(MatchingEngine):

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> List[str]:
        """Run more texts through the embeddings and add to the vectorstore.

        Args:
            texts: Iterable of strings to add to the vectorstore.
            metadatas: Optional list of metadatas associated with the texts.
            kwargs: vectorstore specific parameters.

        Returns:
            List of ids from adding the texts into the vectorstore.
        """
        texts = list(texts)
        if metadatas is not None and len(texts) != len(metadatas):
            raise ValueError(
                "texts and metadatas do not have the same length. Received "
                f"{len(texts)} texts and {len(metadatas)} metadatas."
            )
        logger.debug("Embedding documents.")
        embeddings = self.embedding.embed_documents(texts)
        jsons = []
        ids = []
        # Could be improved with async.
        for idx, (embedding, text) in enumerate(zip(embeddings, texts)):
            id = str(uuid.uuid4())
            ids.append(id)
            json_: dict = {"id": id, "embedding": embedding}
            if metadatas is not None:
                # My change on 2023-10-22: json_["restricts"] for the accepted metadata field, not json_["metadata"]
                json_["restricts"] = metadatas[idx]
            # My change on 2023-10-22: jsons.append(json_), not the json module
            jsons.append(json_)
            self._upload_to_gcs(text, f"documents/{id}")

        logger.debug(f"Uploaded {len(ids)} documents to GCS.")

        # Creating json lines from the embedded documents.
        result_str = "\n".join([json.dumps(x) for x in jsons])

        filename_prefix = f"indexes/{uuid.uuid4()}"
        filename = f"{filename_prefix}/{time.time()}.json"
        self._upload_to_gcs(result_str, filename)
        logger.debug(
            f"Uploaded updated json with embeddings to "
            f"{self.gcs_bucket_name}/{filename}."
        )

        self.index = self.index.update_embeddings(
            contents_delta_uri=f"gs://{self.gcs_bucket_name}/{filename_prefix}/"
        )

        logger.debug("Updated index with new configuration.")

        return ids
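
The JSON-lines payload that `add_texts` uploads can be reproduced in isolation, which helps when checking the record shape before touching the index. This is a minimal sketch based on the method above; `to_json_lines` is a hypothetical helper and the embedding values are made up.

```python
import json
import uuid


def to_json_lines(embeddings, restricts=None) -> tuple[list[str], str]:
    """Build one JSON record per embedding, joined with newlines."""
    ids, records = [], []
    for idx, embedding in enumerate(embeddings):
        doc_id = str(uuid.uuid4())
        ids.append(doc_id)
        record = {"id": doc_id, "embedding": embedding}
        if restricts is not None:
            # Metadata goes under "restricts", as in the overridden add_texts
            record["restricts"] = restricts[idx]
        records.append(record)
    return ids, "\n".join(json.dumps(r) for r in records)


ids, payload = to_json_lines(
    [[0.1, 0.2], [0.3, 0.4]],
    restricts=[
        [{"namespace": "chunk", "allow_list": ["0"]}],
        [{"namespace": "chunk", "allow_list": ["1"]}],
    ],
)
print(payload)
```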

In [66]:
# initialize vector store
me = CustomMatchingEngine.from_components(
    project_id=PROJECT_ID,
    region=ME_REGION,
    gcs_bucket_name=f"gs://{ME_EMBEDDING_DIR}".split("/")[2],
    embedding=embeddings,
    index_id=INDEX_ID,
    endpoint_id=ENDPOINT_ID,
)

DEBUG:root:Creating matching engine index with id 6862927280205725696.
DEBUG:root:Creating endpoint with id 1217383672320098304.
DEBUG:google.auth._default:Checking None for explicit credentials as part of auth process...
DEBUG:google.auth._default:Checking Cloud SDK credentials as part of auth process...
DEBUG:root:Initializing AI Platform for project adroit-hall-301111 on us-central1 and for adroit-hall-301111-me-bucket.


#### Add documents as embeddings in Matching Engine as index

The document chunks are transformed as embeddings (vectors) using Vertex AI Embeddings API and added to the index with **[streaming index update](https://cloud.google.com/vertex-ai/docs/matching-engine/create-manage-index#create-index)**. With Streaming Updates, you can update and query your index within a few seconds.

The original document text is stored in a Cloud Storage bucket and referenced by id.

Prepare text and metadata to be added to the vectors

In [17]:
# Store docs as embeddings in Matching Engine index
# It may take a while since API is rate limited
texts = [doc.page_content for doc in doc_splits]
metadatas = [
    [
        {"namespace": "source", "allow_list": [doc.metadata["source"]]},
        {"namespace": "document_name", "allow_list": [doc.metadata["document_name"]]},
        {"namespace": "chunk", "allow_list": [str(doc.metadata["chunk"])]},
    ]
    for doc in doc_splits
]
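
The list comprehension above can also be written as a small helper that converts any flat metadata dictionary into the same `namespace` / `allow_list` shape. This is a sketch only: `to_restricts` is a hypothetical name, and every value is converted to a string, as the cell above does for `chunk`.

```python
def to_restricts(metadata: dict) -> list[dict]:
    """Turn {"key": value} pairs into namespace/allow_list entries."""
    return [
        {"namespace": key, "allow_list": [str(value)]}
        for key, value in metadata.items()
    ]


restricts = to_restricts(
    {"source": "my-bucket/google-research-pdfs", "document_name": "paper.pdf", "chunk": 0}
)
print(restricts)
```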

Add embeddings to the vector store

**NOTE:** Depending on the volume and size of documents, this step may take time.

In [68]:
# Note: This could be very long running. It took 24 mins for 2 documents
doc_ids = me.add_texts(texts=texts[0:2], metadatas=metadatas[0:2])

DEBUG:root:Embedding documents.
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "GET /storage/v1/b/adroit-hall-301111-me-bucket?projection=noAcl&prettyPrint=false HTTP/1.1" 200 540


Waiting


DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "POST /upload/storage/v1/b/adroit-hall-301111-me-bucket/o?uploadType=multipart HTTP/1.1" 200 942
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "GET /storage/v1/b/adroit-hall-301111-me-bucket?projection=noAcl&prettyPrint=false HTTP/1.1" 200 540
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "POST /upload/storage/v1/b/adroit-hall-301111-me-bucket/o?uploadType=multipart HTTP/1.1" 200 942
DEBUG:root:Uploaded 2 documents to GCS.
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "GET /storage/v1/b/adroit-hall-301111-me-bucket?projection=noAcl&prettyPrint=false HTTP/1.1" 200 540
DEBUG:urllib3.connectionpool:https://storage.googleapis.com:443 "POST /upload/storage/v1/b/adroit-hall-301111-me-bucket/o?uploadType=multipart HTTP/1.1" 200 1032
DEBUG:root:Uploaded updated json with embeddings to adroit-hall-301111-me-bucket/indexes/acc5958a-925f-49be-a78c-5b713e3d2a4e/1697942560.963451.js

Updating MatchingEngineIndex index: projects/712368347106/locations/us-central1/indexes/6862927280205725696


INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:Updating MatchingEngineIndex index: projects/712368347106/locations/us-central1/indexes/6862927280205725696


Update MatchingEngineIndex index backing LRO: projects/712368347106/locations/us-central1/indexes/6862927280205725696/operations/1881137332812251136


INFO:google.cloud.aiplatform.matching_engine.matching_engine_index:Update MatchingEngineIndex index backing LRO: projects/712368347106/locations/us-central1/indexes/6862927280205725696/operations/1881137332812251136
DEBUG:google.api_core.retry:Retrying due to , sleeping 0.9s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 0.9s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 1.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 2.0s ...
DEBUG:google.api_core.retry:Retrying due to , sleeping 4.9s ...


KeyboardInterrupt: 

Validate that semantic search with Matching Engine is working

In [74]:
me.similarity_search("What are video localized narratives?", k=2)

Waiting


[Document(page_content='DEVELOPER PRODUCTIVITY FOR HUMANS\n\nEditor: Ciera Jaspan Google ciera@google.com\n\nEditor: Collin Green Google colling@google.com\n\nA Human-Centered Approach to Developer Productivity\n\nCiera Jaspan and Collin Green\n\nFrom the Editors\n\nThe “Developer Productivity for Humans” column aims to draw attention to\n\nadvances and challenges in research and practice in tools and practices that\n\nhelp improve developers’ day-to-day tasks. In this column, we reinforce that\n\nsoftware engineers and developers are human and productivity tools should\n\nsupport making their jobs easier as opposed to turning them into productivity\n\nmachines. We share our experiences and expertise and welcome your contribu-\n\ntions and feedback.\n\nWE LEAD A mixed-methods re- search team at Google that seeks to understand what makes engi- neers productive and happy. We explore the impact of different engineering tools, infrastructure, processes, and best practices on engineering pr

In [75]:
me.similarity_search("What is NFC?", k=2, search_distance=0.4)

Waiting


[Document(page_content='Introduction As part of our job, we regularly meet with and advise Google leaders on what changes they should make (or should not make) to our development tools and processes. These leaders frequently wish to understand—in\n\nDigital Object Identifier 10.1109/MS.2022.3212165 Date of current version: 23 December 2022\n\nsimple terms—whether productivity is up, down, or stable. They want to know whether their particular tool is making an impact (for example, “Is my framework making develop- ers more productive?”). They hope to see a single metric that clearly goes up or down (and they want “up” and “down” to map unambiguously to “good” and “bad”). Alas, we fre- quently disappoint them, not because of the estimated effect of their sys- tem, but because of the uncertainty around such effects; uncertainty that comes from the fact that mea- suring developer productivity is in- herently difficult.\n\nWhy is it so difficult to measure\n\ndeveloper productivity?'),
 Docu

---

### STEP 3: Retrieval based Question/Answering Chain

LangChain provides easy ways to chain multiple tasks that can do QA over a set of documents, called QA chains. The notebook works with [**RetrievalQA**](https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa.html) chain which is based on **load_qa_chain** under the hood.

In the retrieval augmented generation chain, the Matching Engine uses semantic search to retrieve relevant documents based on the user's question. The resulting documents are then added as additional context to the prompt sent to the LLM, along with the user's question, to generate a response. Thus the response generated by the LLM is grounded in the documents in your corpus.

This way, a user would only need to provide their question as a prompt and the retrieval chain would be able to seek the answers using Matching Engine directly, and return a proper text response answering the question.

#### Configure Question/Answering Chain with Vector Store using Text

Define the Matching Engine vector store as a retriever that takes in a query and returns a list of relevant documents. The retriever implementation supports configuring the number of documents to fetch (`k`) and filtering results by a search-distance threshold (`search_distance`).

In [77]:
# Create chain to answer questions
NUMBER_OF_RESULTS = 10
SEARCH_DISTANCE_THRESHOLD = 0.6

# Expose index to the retriever
retriever = me.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": NUMBER_OF_RESULTS,
        "search_distance": SEARCH_DISTANCE_THRESHOLD,
    },
)
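
The combined effect of `k` and a threshold can be sketched without the vector store. The example assumes that the score is a similarity where higher means closer; whether the real index treats `search_distance` as a minimum similarity or a maximum distance depends on its distance measure, so treat the comparison direction as an assumption. `Match` and `select_matches` are hypothetical names and the scores are made up.

```python
from typing import NamedTuple, Optional


class Match(NamedTuple):
    doc_id: str
    score: float  # assumed: higher means more similar


def select_matches(
    matches: list[Match], k: int, search_distance: Optional[float] = None
) -> list[Match]:
    """Keep the k best matches, then drop those below the threshold."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)[:k]
    if search_distance is None:
        return ranked
    return [m for m in ranked if m.score >= search_distance]


candidates = [Match("a", 0.82), Match("b", 0.65), Match("c", 0.41), Match("d", 0.70)]
selected = select_matches(candidates, k=3, search_distance=0.6)
print([m.doc_id for m in selected])
```

A larger `k` gives the LLM more context to work with, while a stricter threshold keeps weakly related chunks out of the prompt.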

Customize the default retrieval prompt template

In [78]:
template = """SYSTEM: You are an intelligent assistant helping the users with their questions on research papers.

Question: {question}

Strictly Use ONLY the following pieces of context to answer the question at the end. Think step-by-step and then answer.

Do not try to make up an answer:
 - If the answer to the question cannot be determined from the context alone, say "I cannot determine the answer to that."
 - If the context is empty, just say "I do not know the answer to that."

=============
{context}
=============

Question: {question}
Helpful Answer:"""

Configure RetrievalQA chain

In [79]:
# Uses LLM to synthesize results from the search index.
# Use Vertex PaLM Text API for LLM
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
    verbose=True,
    chain_type_kwargs={
        "prompt": PromptTemplate(
            template=template,
            input_variables=["context", "question"],
        ),
    },
)

Enable verbose logging to debug and troubleshoot the chains. The verbose output includes the complete prompt sent to the LLM.

In [80]:
# Enable for troubleshooting
qa.combine_documents_chain.verbose = True
qa.combine_documents_chain.llm_chain.verbose = True
qa.combine_documents_chain.llm_chain.llm.verbose = True

Utility function to format the result

In [81]:
import textwrap


def formatter(result):
    print(f"Query: {result['query']}")
    print("." * 80)
    if "source_documents" in result.keys():
        for idx, ref in enumerate(result["source_documents"]):
            print("-" * 80)
            print(f"REFERENCE #{idx}")
            print("-" * 80)
            if "score" in ref.metadata:
                print(f"Matching Score: {ref.metadata['score']}")
            if "source" in ref.metadata:
                print(f"Document Source: {ref.metadata['source']}")
            if "document_name" in ref.metadata:
                print(f"Document Name: {ref.metadata['document_name']}")
            print("." * 80)
            print(f"Content: \n{wrap(ref.page_content)}")
    print("." * 80)
    print(f"Response: {wrap(result['result'])}")
    print("." * 80)


def wrap(s):
    return "\n".join(textwrap.wrap(s, width=120, break_long_words=False))


def ask(query, qa=qa, k=NUMBER_OF_RESULTS, search_distance=SEARCH_DISTANCE_THRESHOLD):
    qa.retriever.search_kwargs["search_distance"] = search_distance
    qa.retriever.search_kwargs["k"] = k
    result = qa({"query": query})
    return formatter(result)

#### Run QA chain on sample questions

The following are sample questions you could try. When you run a query, the RetrievalQA chain takes the user question, calls the retriever to fetch the top *k* semantically similar texts from the Matching Engine index (vector store), and passes them to the LLM as part of the prompt. The final prompt sent to the LLM has this format:

```
SYSTEM: {system}

=============
{context}
=============

Question: {question}
Helpful Answer:
```

where:
 - `system`: Instructions for LLM on how to respond to the question based on the context
 - `context`: Semantically similar text (a.k.a. snippets) retrieved from the vector store
 - `question`: question posed by the user
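
The way the three parts come together can be sketched with plain string formatting. This is an illustration of the idea rather than the chain's actual code: it assumes retrieved snippets are joined with a blank line between them, and `build_prompt` and the sample snippets are hypothetical.

```python
TEMPLATE = """SYSTEM: {system}

=============
{context}
=============

Question: {question}
Helpful Answer:"""


def build_prompt(system: str, snippets: list[str], question: str) -> str:
    """Place all retrieved snippets into a single prompt."""
    context = "\n\n".join(snippets)  # assumed separator between snippets
    return TEMPLATE.format(system=system, context=context, question=question)


prompt = build_prompt(
    system="Answer using ONLY the context below.",
    snippets=["Snippet one.", "Snippet two."],
    question="What do the snippets say?",
)
print(prompt)
```

Because every snippet is placed into one prompt, the number of retrieved chunks multiplied by the chunk size has to stay within the model's context length.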


The result returned by the chain includes both the LLM's answer and the references that led to it, so the answer can be traced back to its sources. Here the result is formatted as:

```
Query: <question>
................................................................................
--------------------------------------------------------------------------------
REFERENCE #n
--------------------------------------------------------------------------------
Matching Score: <score>
Document Source: <document source location>
Document Name: <document file name>
................................................................................
Content:
<retrieved text>
................................................................................
Response: <answer returned by the LLM>
................................................................................
```

In [85]:
ask("describe why it is difficult to measure productivity")



[1m> Entering new RetrievalQA chain...[0m
Waiting


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSYSTEM: You are an intelligent assistant helping the users with their questions on research papers.

Question: describe why it is difficult to measure productivity

Strictly Use ONLY the following pieces of context to answer the question at the end. Think step-by-step and then answer.

Do not try to make up an answer:
 - If the answer to the question cannot be determined from the context alone, say "I cannot determine the answer to that."
 - If the context is empty, just say "I do not know the answer to that."

Introduction As part of our job, we regularly meet with and advise Google leaders on what changes they should make (or should not make) to our development tools and processes. These leaders frequently wish to understand—in

Digital Object Identifier 10.1109/MS.2022.3212165 Date of current vers

Let's ask a question that is outside the domain of the corpus. You should see something like "I cannot determine the answer to that". This is because the prompt instructs the model not to answer when the question cannot be answered from the context.

The following instructions in the prompt template configured for the retrieval QA chain above produce this behaviour:

```
Strictly Use ONLY the following pieces of context to answer the question at the end. Think step-by-step and then answer.

Do not try to make up an answer:
 - If the answer to the question cannot be determined from the context alone, say "I cannot determine the answer to that."
 - If the context is empty, just say "I do not know the answer to that."
```

In [86]:
ask("what is the meaning of life?")



[1m> Entering new RetrievalQA chain...[0m
Waiting


[1m> Entering new StuffDocumentsChain chain...[0m


[1m> Entering new LLMChain chain...[0m
Prompt after formatting:
[32;1m[1;3mSYSTEM: You are an intelligent assistant helping the users with their questions on research papers.

Question: what is the meaning of life?

Strictly Use ONLY the following pieces of context to answer the question at the end. Think step-by-step and then answer.

Do not try to make up an answer:
 - If the answer to the question cannot be determined from the context alone, say "I cannot determine the answer to that."
 - If the context is empty, just say "I do not know the answer to that."

DEVELOPER PRODUCTIVITY FOR HUMANS

Editor: Ciera Jaspan Google ciera@google.com

Editor: Collin Green Google colling@google.com

A Human-Centered Approach to Developer Productivity

Ciera Jaspan and Collin Green

From the Editors

The “Developer Productivity for Humans” column aims to draw attention to

advances and 

### Clean Up

Delete the Matching Engine Index and Index Endpoint after running your experiments to avoid incurring additional charges. Note that you are charged for as long as the endpoint is running.



<div class="alert alert-block alert-warning">
<b>⚠️ NOTE: Enabling `CLEANUP_RESOURCES` flag deletes Matching Engine Index, Index Endpoint and Cloud Storage bucket. Please run it with caution.</b>
</div>

In [87]:
CLEANUP_RESOURCES = True

In [89]:
mengine = MatchingEngineUtils(PROJECT_ID, ME_REGION, ME_INDEX_NAME)
ME_INDEX_ID = INDEX_ID
ME_INDEX_ENDPOINT_ID = ENDPOINT_ID
print(f"ME_INDEX_ID={ME_INDEX_ID}")
print(f"ME_INDEX_ENDPOINT_ID={ME_INDEX_ENDPOINT_ID}")

ME_INDEX_ID=6862927280205725696
ME_INDEX_ENDPOINT_ID=1217383672320098304


- Undeploy indexes and Delete index endpoint

In [95]:
!gcloud ai index-endpoints undeploy-index 1217383672320098304 \
  --deployed-index-id=tree_ah_deployed_unique \
  --project=adroit-hall-301111 \
  --region=us-central1

'@type': type.googleapis.com/google.cloud.aiplatform.v1.UndeployIndexResponse


Using endpoint [https://us-central1-aiplatform.googleapis.com/]


In [96]:
!gcloud ai index-endpoints delete 1217383672320098304 \
  --project=adroit-hall-301111 \
  --region=us-central1

- Delete index

In [None]:
!gcloud ai indexes delete 6862927280205725696 \
  --project=adroit-hall-301111 \
  --region=us-central1

- Delete contents from the Cloud Storage bucket

In [None]:
if CLEANUP_RESOURCES and "ME_EMBEDDING_DIR" in globals():
    print(f"Deleting contents from the Cloud Storage bucket {ME_EMBEDDING_DIR}")
    ME_EMBEDDING_BUCKET = "/".join(ME_EMBEDDING_DIR.split("/")[:3])

    shell_output = ! gsutil du -ash gs://$ME_EMBEDDING_BUCKET
    print(shell_output)
    print(
        f"Size of the bucket {ME_EMBEDDING_BUCKET} before deleting = {' '.join(shell_output[0].split()[:2])}"
    )

    # uncomment below line to delete contents of the bucket
    # ! gsutil -m rm -r gs://$ME_EMBEDDING_BUCKET

## Agent and Tools

**Definition**: The key idea behind agents is giving LLMs the ability to use tools in their workflow. This is where LangChain departs from the popular ChatGPT implementation, and we can start to get a glimpse of what it offers us as builders. Until now, we have covered several building blocks in isolation. Let's see them come to life.

The official definition of agents is the following:


> Agents use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning to the user.
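
The loop described in that definition can be sketched without any LLM or LangChain dependency. In this illustration a hard-coded function stands in for the model's decision, so it only shows the control flow (decide, call a tool, observe, repeat or answer); `fake_llm`, `word_count` and `run_agent` are hypothetical names and not part of LangChain.

```python
from typing import Callable


def word_count(text: str) -> str:
    """A toy tool: count whitespace-separated words."""
    return str(len(text.split()))


TOOLS: dict[str, Callable[[str], str]] = {"word_count": word_count}


def fake_llm(question: str, observations: list[str]) -> dict:
    """Stand-in for the LLM: use a tool first, then answer from its output."""
    if not observations:
        return {"action": "word_count", "input": question}
    return {"action": "final", "input": f"The question has {observations[-1]} words."}


def run_agent(question: str, max_steps: int = 3) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = fake_llm(question, observations)
        if decision["action"] == "final":
            return decision["input"]  # return to the user
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))  # observe the tool output
    return "Stopped: step limit reached."


answer = run_agent("what is the meaning of life?")
print(answer)
```

In a real agent the decision step is a prompt to the LLM that lists the available tools, and the step limit guards against loops that never reach a final answer.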

Here are some useful notebooks illustrating the concepts:
- [Agents](https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/06-langchain-agents.ipynb)
- [Custom Tools](https://github.com/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/07-langchain-tools.ipynb) 
- [Example ReAct agent which uses RA as tool with conversational memory](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/handbook/08-langchain-retrieval-agent.ipynb#scrollTo=JaKTzPUEvOoy)

## Streamlit
Great for demos; see `test_streamlit.py`