##### Copyright 2024 Google LLC.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Summarize large documents using LangChain and Gemini

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/google/generative-ai-docs/blob/main/examples/gemini/python/langchain/Gemini_LangChain_Summarization_WebLoad.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/google/generative-ai-docs/blob/main/examples/gemini/python/langchain/Gemini_LangChain_Summarization_WebLoad.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
</table>


## Overview

[Gemini](https://ai.google.dev/models/gemini) is a family of generative AI models that lets developers generate content and solve problems. These models are designed and trained to handle both text and images as input.

[LangChain](https://www.langchain.com/) is a framework that makes it easier to integrate large language models (LLMs) like Gemini into applications.

In this notebook, you'll learn how to create an application to summarize large documents using Gemini and LangChain.


## Setup

First, you must install the packages and set the necessary environment variables.

### Installation

Install LangChain's Python library, `langchain`.

In [7]:
!pip install --quiet langchain

Install LangChain's integration package for Gemini, `langchain-google-genai`.

In [6]:
!pip install --quiet langchain-google-genai

### Grab an API Key


To use Gemini you need an *API key*. You can create an API key with one click in [Google AI Studio](https://makersuite.google.com/).
After creating the API key, you can either set an environment variable named `GOOGLE_API_KEY` to your API key or pass the API key as an argument when creating the `ChatGoogleGenerativeAI` LLM using LangChain.

In this tutorial, you will set the environment variable `GOOGLE_API_KEY` to configure Gemini to use your API key.

In [20]:
# Run this cell and paste the API key in the prompt
import os
import getpass

# Store the key in the environment so later cells can read it.
os.environ['GOOGLE_API_KEY'] = getpass.getpass('Gemini API Key:')
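
Later cells fail with a validation error if the environment variable is missing, so it can help to check for it up front. The following is a minimal sketch, assuming a hypothetical helper named `require_api_key` that is not part of LangChain or the Gemini SDK:

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a clear error.

    `require_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; set it before creating the model.")
    return key
```

Calling `require_api_key()` right after the cell above either returns the key or stops with a message that names the missing variable.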

## Summarize text

In this tutorial, you are going to summarize the text from a website using the Gemini model integrated through LangChain.

You'll perform the following steps:
1. Read and parse the website data using LangChain.
2. Chain together the following:
    * A prompt for extracting the required input data from the parsed website data.
    * A prompt for summarizing the text using LangChain.
    * An LLM (Gemini) for prompting.
3. Run the created chain to prompt the model for the summary of the website data.

### Import the required libraries

In [21]:
from langchain import PromptTemplate
from langchain.document_loaders import WebBaseLoader
from langchain.schema import StrOutputParser
from langchain.schema.prompt_template import format_document

### Read and parse the website data

LangChain provides a wide variety of document loaders. To read the website data as a document, you will use the `WebBaseLoader` from LangChain.

To learn more about how to read and parse input data from different sources using LangChain's document loaders, read LangChain's [document loaders guide](https://python.langchain.com/docs/integrations/document_loaders).

In [22]:
loader = WebBaseLoader("https://blog.google/technology/ai/google-gemini-ai/#sundar-note")
docs = loader.load()

print(docs)

[Document(page_content="\n\n\n\n\n\nIntroducing Gemini: Google’s most capable AI model yet\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nSkip to main content\n\n\n\n\n\n        The Keyword\n      \n\n\n\n\n    Introducing Gemini: our largest and most capable AI model\n  \n\n\n\n\n\n\nShare\n\n\n\n\n\n\nTwitter\n\n\n\n\n\nFacebook\n\n\n\n\n\nLinkedIn\n\n\n\n\n\nMail\n\n\n\n\n\n\nCopy link\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n               Latest stories\n            \n\n\n\n              Product updates\n              \n\n\n\n\n\n\n\n\n\n\n\n\n    Product updates\n  \n\n\n\n            Android, Chrome & Play\n            \n\n\n\n\n\n\n\n\n                  Android\n                  \n                \n\n\n\n                  Chrome\n                  \n                \n\n\n\n                  Chromebooks\n                  \n                \n\n\n\n                  Google Play\n                  \n             
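
The loaded `page_content` contains long runs of newlines and indentation left over from the page's HTML layout. The following is a minimal sketch of one way to tidy such text before prompting, assuming a hypothetical helper named `collapse_whitespace`. The tutorial itself passes the raw text to the model, so this step is optional.

```python
def collapse_whitespace(text):
    """Trim every line and drop empty ones (hypothetical helper)."""
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# A short sample shaped like the loader output above.
sample = "\n\n\nIntroducing Gemini\n\n\n\n   The Keyword\n      \n\nShare\n"
print(collapse_whitespace(sample))
```

A smaller input uses fewer tokens, which matters when the whole page is inserted into a single prompt.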

### Initialize Gemini

You must import the `ChatGoogleGenerativeAI` LLM from LangChain to initialize your model.
In this example you will use **gemini-pro**, as it supports text summarization. To learn more about the text model, read Google AI's [language documentation](https://ai.google.dev/models/gemini).

You can configure model parameters such as ***temperature*** or ***top_p*** by passing the appropriate values when creating the `ChatGoogleGenerativeAI` LLM. To learn more about the parameters and their uses, read Google AI's [concepts guide](https://ai.google.dev/docs/concepts#model_parameters).
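
To build intuition for these two parameters, the following toy sketch shows the general idea behind temperature scaling and top-p (nucleus) filtering on a made-up list of token scores. It illustrates the concept only and is not how Gemini implements sampling. The function names and the example scores are assumptions.

```python
import math


def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize the kept probabilities."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
probs = apply_temperature(toy_logits, temperature=0.7)
print(top_p_filter(probs, top_p=0.85))
```

In this sketch, a lower temperature concentrates probability on the top-scoring token, and a lower `top_p` removes more of the unlikely tokens from consideration.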

In [24]:
from langchain_google_genai import ChatGoogleGenerativeAI

# If there is no environment variable set for the API key, you can pass the
# key directly to the `google_api_key` parameter of `ChatGoogleGenerativeAI`:
# `google_api_key="key"`.

llm = ChatGoogleGenerativeAI(
    model="gemini-pro",
    google_api_key=os.environ.get("GOOGLE_API_KEY"),
    temperature=0.7,
    top_p=0.85,
)

### Create prompt templates

You'll use LangChain's [PromptTemplate](https://python.langchain.com/docs/modules/model_io/prompts/prompt_templates/) to generate prompts for summarizing the text.

To summarize the text from the website, you will need the following prompts:
1. A prompt to extract the data from the output of `WebBaseLoader`, named `doc_prompt`.
2. A prompt for the LLM (Gemini) to summarize the extracted text, named `llm_prompt`.

In the `llm_prompt`, the variable `text` will be replaced later by the text from the website.

In [None]:
# To extract data from WebBaseLoader
doc_prompt = PromptTemplate.from_template("{page_content}")

# To query Gemini
llm_prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
llm_prompt = PromptTemplate.from_template(llm_prompt_template)

print(llm_prompt)

input_variables=['text'] template='Write a concise summary of the following:\n"{text}"\nCONCISE SUMMARY:'
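
To see what the model will receive once `{text}` is filled in, the following sketch uses plain `str.format` as a stand-in for `PromptTemplate`, on the assumption that the template uses the same `{name}` placeholder style. The sample sentence is made up for illustration.

```python
llm_prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""

# Fill the single input variable, as the chain does later with the website text.
filled = llm_prompt_template.format(text="Gemini is a family of generative AI models.")
print(filled)
```

The printed prompt has the sample sentence in quotes between the instruction line and the `CONCISE SUMMARY:` cue.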


### Create a Stuff documents chain

LangChain provides [Chains](https://python.langchain.com/docs/modules/chains/) for chaining together LLMs with each other or other components for complex applications. You will create a **Stuff documents chain** for this application. A **Stuff documents chain** lets you combine all the documents, insert them into the prompt and pass that prompt to the LLM.

You can create a Stuff documents chain using the [LangChain Expression Language (LCEL)](https://python.langchain.com/docs/expression_language).

To learn more about different types of document chains, read LangChain's [chains guide](https://python.langchain.com/docs/modules/chains/document/).

In [None]:
# Create Stuff documents chain using LCEL.
# This is called a chain because you are chaining
# together different elements with the LLM.
# In the following example, to create the Stuff chain,
# you will combine the content, prompt, LLM and
# output parser together like a chain using LCEL.
#
# The chain implements the following pipeline:
# 1. Extract data from documents and save to variable `text`.
# 2. This `text` is then passed to the prompt and input variable
#    in prompt is populated.
# 3. The prompt is then passed to the LLM (Gemini).
# 4. Output from the LLM is passed through an output parser
#    to structure the model response.

stuff_chain = (
    # Extract data from the documents and add to the key `text`.
    {
        "text": lambda docs: "\n\n".join(
            format_document(doc, doc_prompt) for doc in docs
        )
    }
    | llm_prompt         # Prompt for Gemini
    | llm                # Gemini model
    | StrOutputParser()  # output parser
)
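
The four pipeline steps can also be sketched in plain Python, without LangChain, to make the data flow explicit. Everything in this sketch is a stand-in: `Doc` mimics a LangChain document, `fake_llm` replaces the Gemini call with a placeholder, and `run_pipeline` applies the steps in order much as the `|` operator does in LCEL.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


def stuff(docs):
    # Step 1: join every document's text into one string under the key `text`.
    return {"text": "\n\n".join(doc.page_content for doc in docs)}


def build_prompt(inputs):
    # Step 2: populate the prompt's input variable.
    return (
        "Write a concise summary of the following:\n"
        f'"{inputs["text"]}"\n'
        "CONCISE SUMMARY:"
    )


def fake_llm(prompt):
    # Step 3: placeholder for the model call; it only reports the prompt size.
    return {"content": f"(summary of a {len(prompt)}-character prompt)"}


def parse_output(response):
    # Step 4: reduce the model response to a plain string.
    return response["content"]


def run_pipeline(docs):
    # Feed each step's output into the next step.
    result = docs
    for step in (stuff, build_prompt, fake_llm, parse_output):
        result = step(result)
    return result


print(run_pipeline([Doc("First page."), Doc("Second page.")]))
```

Because every document is stuffed into a single prompt, this approach only works while the combined text fits within the model's input limit.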

### Prompt the model

To generate the summary of the website data, pass the documents extracted using the `WebBaseLoader` (`docs`) to `invoke()`.

In [None]:
stuff_chain.invoke(docs)

"Google introduces Gemini, its most capable AI model yet. Gemini is multimodal, flexible, and optimized for different sizes. It surpasses state-of-the-art performance on various benchmarks, including text, coding, and multimodal tasks. Gemini's capabilities include sophisticated reasoning, understanding text, images, audio, and advanced coding. It is designed with responsibility and safety at its core, undergoing comprehensive safety evaluations and incorporating safety classifiers. Gemini is being rolled out across Google products, including Bard, Pixel, Search, and Ads. Developers and enterprise customers can access Gemini Pro via the Gemini API. Gemini Ultra will be available to select partners and experts for early experimentation before a broader release. Gemini represents a new era of AI innovation, with future versions expected to advance planning, memory, and context processing capabilities."

## Conclusion

That's it. You have successfully created an LLM application to summarize text using LangChain and Gemini.