News summarization with LangChain agents and Vertex AI PaLM text models

Overview

Integrating Large Language Models (LLMs) with external sources of knowledge or applications is common practice in real-world business scenarios. In the paper ReAct: Synergizing Reasoning and Acting in Language Models (https://arxiv.org/pdf/2210.03629.pdf), researchers from Google Research and Princeton University introduced ReAct, a paradigm that combines reasoning and acting with LLMs by allowing models to interact with external environments to gather additional information.

LangChain (https://python.langchain.com/docs/get_started/introduction) is a framework that allows you to create applications powered by language models. It includes extensive support for agents based on the ReAct concepts. LangChain agents use tools to interact with external systems. A tool is a component that performs a specific task, such as retrieving data from an external search engine. LangChain includes a number of predefined tools, such as tools for interacting with Google Search, Wikipedia, arXiv, SQL databases, and many more. You can also define your own tools.
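To make the idea of a tool concrete, here is a minimal, library-free sketch. It is not LangChain's API: `SimpleTool`, `word_count`, `TOOLS`, and `run_action` are hypothetical names used only for illustration. It shows the pattern an agent relies on: each tool pairs a name and a description (which the model reads when deciding what to do) with a callable that performs the action and returns an observation.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleTool:
    """Hypothetical stand-in for a tool: a name, a description, and a callable."""
    name: str
    description: str
    func: Callable[[str], str]


def word_count(text: str) -> str:
    """Toy action: count whitespace-separated words in the input."""
    return str(len(text.split()))


# Registry the agent loop consults when the model names an action.
TOOLS: Dict[str, SimpleTool] = {
    "word_count": SimpleTool(
        name="word_count",
        description="Counts the words in a piece of text.",
        func=word_count,
    ),
}


def run_action(tool_name: str, tool_input: str) -> str:
    """Dispatch one action chosen by the model and return the observation."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        # Returning the error as an observation lets the model try again.
        return f"Unknown tool: {tool_name}"
    return tool.func(tool_input)
```

In a ReAct-style loop, the string returned by `run_action` would be appended to the prompt as the observation before the model produces its next reasoning step.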

This notebook illustrates how to use LangChain agents with Vertex AI PaLM text models and custom tools. You will build an agent that can help you discover the most popular Google Search terms and analyze news articles related to those terms. The Google Trends dataset is hosted on Google BigQuery as part of the Google Cloud Datasets initiative.

The GDELT Project, which is supported by Google Jigsaw, monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages. The GDELT database is free to use and accessible via a variety of interfaces, including Google BigQuery and a REST API. In this notebook, you will use the REST API.
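As a rough illustration of what REST access can look like, the sketch below only builds a request URL and makes no network call. The endpoint and parameter names follow GDELT's public DOC 2.0 API as an assumption; this notebook's own tools may use a different endpoint or different parameters. `build_gdelt_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the GDELT DOC 2.0 article search endpoint.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_url(query: str, max_records: int = 10, timespan: str = "1d") -> str:
    """Build an article-list query URL for a keyword search."""
    params = {
        "query": query,             # search keywords
        "mode": "artlist",          # ask for a list of matching articles
        "maxrecords": max_records,  # cap on the number of articles returned
        "timespan": timespan,       # look-back window, e.g. the last day
        "format": "json",           # machine-readable response
    }
    # urlencode escapes spaces and special characters in the query.
    return f"{GDELT_DOC_API}?{urlencode(params)}"


print(build_gdelt_url("climate change", max_records=5))
```

The resulting URL could then be fetched with any HTTP client, and the article URLs in the response passed to an article parser such as newspaper3k.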

The notebook is structured as follows:

1. You will begin by installing the necessary packages and configuring the GCP environment.
2. Next, you will define and test the custom LangChain tools around the Google Trends dataset and the GDELT API.
3. Finally, you will experiment with using the tools with a few different types of LangChain agents.

Install prerequisites

Install the following Python packages:

In [None]:
! pip install -U google-cloud-aiplatform
! pip install -U langchain
! pip install -U python-dateutil
! pip install -U newspaper3k

DO NOT FORGET TO RESTART THE RUNTIME before continuing.
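After the restart, a quick check like the following can confirm that the packages installed above are importable. This is an optional sketch rather than part of the original notebook; `REQUIRED_MODULES` and `missing_packages` are hypothetical names, and the mapping lists the import name that corresponds to each pip package.

```python
import importlib.util

# pip package name -> module name used in import statements
REQUIRED_MODULES = {
    "google-cloud-aiplatform": "google.cloud.aiplatform",
    "langchain": "langchain",
    "python-dateutil": "dateutil",
    "newspaper3k": "newspaper",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose modules cannot be found."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent or broken.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


# An empty list means every package was found.
print("Missing packages:", missing_packages())
```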

Configure Google Cloud Environment Settings

Set the following constants to reflect your GCP environment:

1. PROJECT_ID: Your Google Cloud Project ID
2. REGION: The region to use for Vertex AI

In [None]:
PROJECT_ID = '<YOUR PROJECT ID HERE>'
REGION = 'us-central1'
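Because a forgotten placeholder tends to surface later as a confusing authentication or quota error, it can help to check the constants before using them. The helper below is a hedged sketch: `check_settings` is a hypothetical name, and the region pattern is only a heuristic for names shaped like `us-central1`, not an authoritative list of Vertex AI regions.

```python
import re


def check_settings(project_id: str, region: str):
    """Return the cleaned (project_id, region) pair or raise ValueError."""
    project_id = project_id.strip()
    region = region.strip()
    # Catch the untouched '<YOUR PROJECT ID HERE>' placeholder or an empty value.
    if not project_id or "<" in project_id or ">" in project_id:
        raise ValueError("PROJECT_ID still contains the placeholder value")
    # Heuristic only: regions look like 'us-central1' or 'europe-west4'.
    if not re.fullmatch(r"[a-z]+-[a-z]+\d+", region):
        raise ValueError(f"REGION does not look like a region name: {region!r}")
    return project_id, region


# Example usage once the constants above are set:
# PROJECT_ID, REGION = check_settings(PROJECT_ID, REGION)
```

Assuming the Vertex AI SDK installed earlier, the validated values could then be passed to `vertexai.init(project=PROJECT_ID, location=REGION)`.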