## Introduction

In this tutorial, you learn how to use Google Cloud AI tools to quickly bring the power of Large Language Models to enterprise systems.  

This tutorial covers the following -

*   What are embeddings - what business challenges do they help solve ?
*   Understanding Text with Vertex AI Text Embeddings
*   Find Embeddings fast with Vertex AI Vector Search
*   Grounding LLM outputs with Vector Search

This tutorial is based on [the blog post](https://cloud.google.com/blog/products/ai-machine-learning/how-to-use-grounding-for-your-llms-with-text-embeddings), combined with sample code.


### Prerequisites

This tutorial is designed for developers who has basic knowledge and experience with Python programming and machine learning.

If you are not reading this tutorial in Qwiklab, then you need to have a Google Cloud project that is linked to a billing account to run this. Please go through [this document](https://cloud.google.com/vertex-ai/docs/start/cloud-environment) to create a project and setup a billing account for it.

### Choose the runtime environment

The notebook can be run on either Google Colab or [Vertex AI Workbench](https://cloud.google.com/vertex-ai-workbench).

- To use Colab: Click [this link](https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/embeddings/intro-textemb-vectorsearch.ipynb) to open the tutorial in Colab.

- To use Workbench: If it is the first time to use Workbench in your Google Cloud project, open [the Workbench console](https://console.cloud.google.com/vertex-ai/workbench) and click ENABLE button to enable Notebooks API. Then click [this link](https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/embeddings/intro-textemb-vectorsearch.ipynb),  and select an existing notebook or create a new notebook.


### How much will this cost?

In case you are using your own Cloud project, not a temporary project on Qwiklab, you need to spend roughly a few US dollars to finish this tutorial.

The pricing of the Cloud services we will use in this tutorial are avilable in the following pages:

- [Vertex AI Embeddings for Text](https://cloud.google.com/vertex-ai/pricing#generative_ai_models)
- [Vertex AI Vector Search](https://cloud.google.com/vertex-ai/pricing#matchingengine)
- [BigQuery](https://cloud.google.com/bigquery/pricing)
- [Cloud Storage](https://cloud.google.com/storage/pricing)
- [Vertex AI Workbench](https://cloud.google.com/vertex-ai/pricing#notebooks) if you use one

You can use the [Pricing Calculator](https://cloud.google.com/products/calculator) to generate a cost estimate based on your projected usage. The following is an example of rough cost estimation with the calculator, assuming you will go through this tutorial a couple of time.

<img src="https://storage.googleapis.com/github-repo/img/embeddings/vs-quickstart/pricing.png" width="50%"/>

### **Warning: delete your objects after the tutorial**

In case you are using your own Cloud project, please make sure to delete all the Indexes, Index Endpoints and Cloud Storage buckets (and the Workbench instance if you use one) after finishing this tutorial. Otherwise the remaining assets would incur unexpected costs.


# Bringing Gen AI and LLMs to production services

Many people are now starting to think about how to bring Gen AI and LLMs to production services, and facing with several challenges.

- "How to integrate LLMs or AI chatbots with existing IT systems, databases and business data?"
- "We have thousands of products. How can I let LLM memorize them all precisely?"
- "How to handle the hallucination issues in AI chatbots to build a reliable service?"

Here is a quick solution: **grounding** with **embeddings** and **vector search**.

What is grounding? What are embedding and vector search? In this tutorial, we will learn these crucial concepts to build reliable Gen AI services for enterprise use. But before we dive deeper, let's try the demo below.

# Vertex AI Embeddings for Text

With the [Vertex AI Embeddings for Text](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings), you can easily create a text embedding with LLM. The product is also available on [Vertex AI Model Garden](https://cloud.google.com/model-garden)

![](https://storage.googleapis.com/github-repo/img/embeddings/textemb-vs-notebook/7.png)

This API is designed to extract embeddings from texts. It can take text input up to 3,072 input tokens, and outputs 768 dimensional text embeddings.

# Text Embeddings in Action
## Setup

Before get started with the Vertex AI services, we need to setup the following.

* Install Python SDK
* Environment variables
* Authentication (Colab only)
* Enable APIs
* Set IAM permissions

### Install Python SDK

In [1]:
# Install Vertex AI LLM SDK
! pip install --user --upgrade google-cloud-aiplatform==1.47.0 langchain==0.1.14 langchain-google-vertexai==0.1.3 typing_extensions==4.9.0

# Dependencies required by Unstructured PDF loader
! sudo apt -y -qq install tesseract-ocr libtesseract-dev
! sudo apt-get -y -qq install poppler-utils
! pip install --user --upgrade unstructured==0.12.4 pdf2image==1.17.0 pytesseract==0.3.10 pdfminer.six==20221105
! pip install --user --upgrade pillow-heif==0.15.0 opencv-python==4.9.0.80 unstructured-inference==0.7.24 pikepdf==8.13.0 pypdf==4.0.1

# For Matching Engine integration dependencies (default embeddings)
! pip install --user --upgrade tensorflow_hub==0.16.1 tensorflow_text==2.15.0
! pip install sentence-transformers
! pip install -U langchain-community faiss-gpu
! pip install --upgrade --quiet  sentence_transformers > /dev/null
! pip install langchain_community
! pip install gpt4all

Collecting google-cloud-aiplatform==1.47.0
  Downloading google_cloud_aiplatform-1.47.0-py2.py3-none-any.whl (4.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m4.8/4.8 MB[0m [31m23.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting langchain==0.1.14
  Downloading langchain-0.1.14-py3-none-any.whl (812 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m812.8/812.8 kB[0m [31m31.6 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting langchain-google-vertexai==0.1.3
  Downloading langchain_google_vertexai-0.1.3-py3-none-any.whl (52 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m52.7/52.7 kB[0m [31m4.0 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting typing_extensions==4.9.0
  Downloading typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain==0.1.14)
  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)
Collecting jsonpatch<2.0,>=1.33 (from langchain==0.1.14)
  Downloading json

The following additional packages will be installed:
  libarchive-dev libleptonica-dev tesseract-ocr-eng tesseract-ocr-osd
The following NEW packages will be installed:
  libarchive-dev libleptonica-dev libtesseract-dev tesseract-ocr
  tesseract-ocr-eng tesseract-ocr-osd
0 upgraded, 6 newly installed, 0 to remove and 45 not upgraded.
Need to get 8,560 kB of archives.
After this operation, 31.6 MB of additional disk space will be used.
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 78, <> line 6.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 
Selecting previously unselected package libarchive-dev:amd64.
(Reading database ... 121752 files and directories

Collecting tensorflow_text==2.15.0
  Downloading tensorflow_text-2.15.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.2 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m5.2/5.2 MB[0m [31m42.7 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: tensorflow_text
Successfully installed tensorflow_text-2.15.0
Collecting sentence-transformers
  Downloading sentence_transformers-2.7.0-py3-none-any.whl (171 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m171.5/171.5 kB[0m [31m4.4 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-2.7.0
Collecting faiss-gpu
  Downloading faiss_gpu-1.7.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (85.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m85.5/85.5 MB[0m [31m8.8 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: faiss-gpu
Successfully installed faiss-gpu-1.7.2
C

# Download custom Python modules and utilities
The cell below will download some helper functions needed for using Vertex AI Matching Engine in this notebook. These helper functions were created to keep this notebook more tidy and concise, and you can also view them directly on Github.

In [1]:
# # Automatically restart kernel after installs so that your environment can access the new packages
# import IPython

# app = IPython.Application.instance()
# app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [1]:
#Authenticating your notebook environment
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()

In [2]:
PROJECT_ID = "iisccapstone-420805"

# Enable APIs
Run the following to enable APIs for Compute Engine, Vertex AI, Cloud Storage and BigQuery with this Google Cloud project.

In [None]:
! gcloud services enable compute.googleapis.com aiplatform.googleapis.com storage.googleapis.com bigquery.googleapis.com --project {PROJECT_ID}

Operation "operations/acat.p2-739467896105-1ee613f8-4317-42df-9e90-648894ebac1e" finished successfully.


### Set IAM permissions

Also, we need to add access permissions to the default service account for using those services.

- Go to [the IAM page](https://console.cloud.google.com/iam-admin/) in the Console
- Look for the principal for default compute service account. It should look like: `<project-number>-compute@developer.gserviceaccount.com`
- Click the edit button at right and click `ADD ANOTHER ROLE` to add `Vertex AI User`, `BigQuery User` and `Storage Admin` to the account.

This will look like this:

![](https://storage.googleapis.com/github-repo/img/embeddings/vs-quickstart/iam-setting.png)

# Environment variables

In [None]:
# get project ID
PROJECT_ID = ! gcloud config get project
PROJECT_ID = "iisccapstone-420805"
LOCATION = "us-central1"
if PROJECT_ID == "(unset)":
    print(f"Please set the project ID manually below")
    # define project information
if PROJECT_ID == "(unset)":
    PROJECT_ID = "iisccapstone-420805'"  # @param {type:"string"}

# generate an unique id for this session
from datetime import datetime

UID = datetime.now().strftime("%m%d%H%M")

In [None]:
PROJECT_ID

'iisccapstone-420805'

# Import libraries

In [3]:
import vertexai

#PROJECT_ID = PROJECT_ID # @param {type:"string"}
REGION = "us-central1"

vertexai.init(project={PROJECT_ID}, location=REGION)

In [4]:
import json
import textwrap

# Utils
import time
import uuid
from typing import List

import numpy as np
import vertexai

# Vertex AI
from google.cloud import aiplatform

print(f"Vertex AI SDK version: {aiplatform.__version__}")

# LangChain
import langchain

print(f"LangChain version: {langchain.__version__}")

from langchain.chains import RetrievalQA
from langchain.document_loaders import GCSDirectoryLoader
from langchain.embeddings import VertexAIEmbeddings
from langchain.llms import VertexAI
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Import custom Matching Engine packages
# from utils.matching_engine import MatchingEngine
# from utils.matching_engine_utils import MatchingEngineUtils
# Import custom Matching Engine packages
from langchain_google_vertexai import VertexAI , VertexAIEmbeddings , VectorSearchVectorStore
import faiss
from faiss import IndexFlatL2
import numpy as np
import spacy
from langchain_community.embeddings import HuggingFaceEmbeddings
import json
import pdfplumber
from langchain_community.vectorstores import FAISS
import os
from google.colab import files
import zipfile
from langchain.document_loaders import BigQueryLoader #Class for storing a piece of text and associated metadata.
from langchain_community.embeddings import GPT4AllEmbeddings


Vertex AI SDK version: 1.47.0
LangChain version: 0.1.14


# connecting to bigquery to extract the text data

In [None]:
# # load the BQ Table into a Pandas Dataframe
# import pandas as pd
# from google.cloud import bigquery


# bq_client = bigquery.Client(project=PROJECT_ID)
# QUERY_TEMPLATE = """
#         SELECT * from iisccapstone-420805.Pubmed.pubmed where content !='';
#         """
# # query_params=[
# #         bigquery.ArrayQueryParameter("q1","DATE", q1),
# #         bigquery.ArrayQueryParameter("q2","DATE", q2),
# #         bigquery.ArrayQueryParameter("q3","DATE", q3),
# #         bigquery.ArrayQueryParameter("q4","DATE", q4),
# #         bigquery.ArrayQueryParameter("rule_name","STRING", rule_name),
# #         bigquery.ArrayQueryParameter("insert_timestamp","DATE", insert_timestamp),
# #         bigquery.ArrayQueryParameter("Manufacturer","STRING", Manufacturer),
# #         bigquery.ArrayQueryParameter("partner_code","STRING", partner_code),
#     # ]
# try:
#   pubmed = bq_client.query(QUERY_TEMPLATE)  # Make an API request.
#   pubmed_data = pubmed.to_dataframe()
# except Exception as e:
#   print('Error',e,'Data_not_found')

# # examine the data
# pubmed_data.head()

Unnamed: 0,Title,content
0,"Impact of Alcoholism – Kerala.pdf,page:119",Problems Experienced While Tried to Cut Down /...
1,Impact of Alcohol Consumption on Young People....,Level of Grade of Details Year Country Cited e...
2,The Impact of Alcoholic Beverages on Human Hea...,"Nutrients2021,13,3938 1.4.Conclusion Insummary..."
3,REGIONAL STATUS REPORT ON ALCOHOL AND HEALTH I...,Regional status report on alcohol and health i...
4,therapy for multisystem inflammatory syndrome ...,"Articles significant comorbidities (eg, immune..."


# Load the text embeddings model
from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")

# Load the text embeddings model

In [None]:

# from langchain_google_vertexai import VertexAIEmbeddings

# model = VertexAIEmbeddings(model_name="textembedding-gecko@003")

# Load the biquery where data for llm is stored

In [None]:

# BASE_QUERY = """SELECT * FROM `iisccapstone-420805.Pubmed.pubmed` where content !=''"""
# loader = BigQueryLoader(BASE_QUERY,project="iisccapstone-420805")
# documents = loader.load()

  warn_deprecated(


In [None]:
# Add document metadata to all the document of documents and formatting page_content to only contain contents of BQ table pubmed as it contained both title and content
# for document in documents:
#   document.metadata={'source':document.page_content.split('\n')[0]}
#   document.page_content=document.page_content.split('\n')[1]


In [None]:
# documents[0].page_content

'content: Problems Experienced While Tried to Cut Down / Stop Drinking The present study had a probe into the problems faced by the respondents, while they had tried to stop/cut down drinking. The query was posed only to the Alcohol Users (Adult & Adolescents) and not to the Spouses of Drinkers. Obviously, it was very pathetic to see that 63.4% of the Adults and 58.4% of the Adolescents have faced problems, while they tried to stop drinking. Further, 37.7% of the Adults had faced multiple problems. Multiple withdrawal problems were found to be comparatively less (15.9%) among the Adolescent Drinkers and headache and fidgety/restless was the major difficulties they faced when they cut down/stopped drinking. Further, 12.4% reported that they had a problem of „Unable to sleep‟. (Refer to table 2.6.7) Category-wise, the figure 2.6.2 showed that withdrawal problems were more (83.5%) among the Harmful Drinkers (Adults) compared to the Less-Harmful Drinkers (54.6%). Table No.2.6.7 Problems Ex

# Chunk documents
Split the documents to smaller chunks. When splitting the document, ensure a few chunks can fit within the context length of LLM.

In [None]:
# # split the documents into chunks
# text_splitter = RecursiveCharacterTextSplitter(
#     chunk_size=1000,
#     chunk_overlap=50,
#     separators=["\n\n", "\n", ".", "!", "?", ",", " ", ""],
# )
# doc_splits = text_splitter.split_documents(documents)

# # Add chunk number to metadata
# for idx, split in enumerate(doc_splits):
#     split.metadata["chunk"] = idx

# print(f"# of documents = {len(doc_splits)}")

# of documents = 4405


In [None]:
# doc_splits[0]

Document(page_content='content: Problems Experienced While Tried to Cut Down / Stop Drinking The present study had a probe into the problems faced by the respondents, while they had tried to stop/cut down drinking. The query was posed only to the Alcohol Users (Adult & Adolescents) and not to the Spouses of Drinkers. Obviously, it was very pathetic to see that 63.4% of the Adults and 58.4% of the Adolescents have faced problems, while they tried to stop drinking. Further, 37.7% of the Adults had faced multiple problems. Multiple withdrawal problems were found to be comparatively less (15.9%) among the Adolescent Drinkers and headache and fidgety/restless was the major difficulties they faced when they cut down/stopped drinking. Further, 12.4% reported that they had a problem of „Unable to sleep‟. (Refer to table 2.6.7) Category-wise, the figure 2.6.2 showed that withdrawal problems were more (83.5%) among the Harmful Drinkers (Adults) compared to the Less-Harmful Drinkers (54.6%). Tabl

# creating faiss db to store embeddings in offline mode


In [None]:
# import faiss
# from google.cloud import aiplatform
# from vertexai.language_models import TextEmbeddingModel
# import vertexai

# #PROJECT_ID = PROJECT_ID # @param {type:"string"}
# REGION = "us-central1"
# PROJECT_ID = "iisccapstone-420805"
# vertexai.init(project={PROJECT_ID}, location=REGION)
# text_embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

In [None]:
# db = FAISS.from_documents(doc_splits , GPT4AllEmbeddings())

Downloading: 100%|██████████| 45.9M/45.9M [00:00<00:00, 175MiB/s]
Verifying: 100%|██████████| 45.9M/45.9M [00:00<00:00, 399MiB/s]


In [None]:
# print(db.index.ntotal)

4405


In [5]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
query = 'prolong alcohol intake impact human health?'
# docs = db.similarity_search(query)


In [None]:
# db.save_local("/content/drive/MyDrive/Capstone_project/GPT4AllEmbeddings/faiss_index")

In [6]:
new_db = FAISS.load_local("/content/drive/MyDrive/Capstone_project/GPT4AllEmbeddings/faiss_index", GPT4AllEmbeddings(),allow_dangerous_deserialization=True)
query = 'prolong alcohol intake impact human health?'

docs = new_db.similarity_search(query)

Downloading: 100%|██████████| 45.9M/45.9M [00:00<00:00, 107MiB/s]
Verifying: 100%|██████████| 45.9M/45.9M [00:00<00:00, 400MiB/s]


In [7]:
docs

[Document(page_content='content: 36 Volume 18, Issue 3 NIH MedlinePlus Magazine :STIDERC Alcohol’s health effects: What you need to know rinking alcohol is so common that people may not question how even one beer, cocktail, or glass of wine could impact their As of 2021, 29.5 million people health. Alcohol is a part of cultural traditions all ages 12 and older had an alcohol around the world…and it’s also a drug that chemically use disorder in the past year. alters the body. People of all ages need to understand these effects. SOURCE: NATIONAL SURVEY ON DRUG USE AND HEALTH The National Institute on Alcohol Abuse and Alcoholism (NIAAA) has information on how alcohol The alcohol you consume resides mostly in the body’s impacts your health. It also has resources to help water. Because women tend to have less water in their bodies than men, if a woman and a man of the same those looking to change their drinking habits', metadata={'source': 'Title: alcohol_health_overview.PDF_.final_.080123

In [9]:
new_db.similarity_search_with_score(query)

[(Document(page_content='content: 36 Volume 18, Issue 3 NIH MedlinePlus Magazine :STIDERC Alcohol’s health effects: What you need to know rinking alcohol is so common that people may not question how even one beer, cocktail, or glass of wine could impact their As of 2021, 29.5 million people health. Alcohol is a part of cultural traditions all ages 12 and older had an alcohol around the world…and it’s also a drug that chemically use disorder in the past year. alters the body. People of all ages need to understand these effects. SOURCE: NATIONAL SURVEY ON DRUG USE AND HEALTH The National Institute on Alcohol Abuse and Alcoholism (NIAAA) has information on how alcohol The alcohol you consume resides mostly in the body’s impacts your health. It also has resources to help water. Because women tend to have less water in their bodies than men, if a woman and a man of the same those looking to change their drinking habits', metadata={'source': 'Title: alcohol_health_overview.PDF_.final_.08012

# Model LLaMA2

In [10]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

In [11]:
# converting it into list of string to generate summary
docs1=format_docs(docs)

In [12]:
from transformers import pipeline
model_name="gpt2"
chat_pipeline=pipeline("text-generation",model=model_name)

from transformers import GPT2LMHeadModel, GPT2Tokenizer
from langchain.prompts import PromptTemplate

model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

In [13]:
chat_pipeline(docs1,max_length=1000)

Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.




# Text generator

In [14]:
def generate_text(prompt, max_length=1000):
  input_ids = tokenizer.encode(prompt, return_tensors="pt")
  output = model.generate(input_ids, max_length=max_length, num_return_sequences=1, temperature=0.7)
  generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
  return generated_text

In [15]:
generate_text(docs1)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'content: 36 Volume 18, Issue 3 NIH MedlinePlus Magazine :STIDERC Alcohol’s health effects: What you need to know rinking alcohol is so common that people may not question how even one beer, cocktail, or glass of wine could impact their As of 2021, 29.5 million people health. Alcohol is a part of cultural traditions all ages 12 and older had an alcohol around the world…and it’s also a drug that chemically use disorder in the past year. alters the body. People of all ages need to understand these effects. SOURCE: NATIONAL SURVEY ON DRUG USE AND HEALTH The National Institute on Alcohol Abuse and Alcoholism (NIAAA) has information on how alcohol The alcohol you consume resides mostly in the body’s impacts your health. It also has resources to help water. Because women tend to have less water in their bodies than men, if a woman and a man of the same those looking to change their drinking habits\n\n. Expert evidence provided for the paper showed that the only group with potential to have a

# Text summarizer

In [16]:

def summarize_text(text, max_length=1000):
  input_ids = tokenizer.encode(text, return_tensors="pt", max_length=1024, truncation=True)
  output = model.generate(input_ids, max_length=max_length, num_return_sequences=1, temperature=0.2, early_stopping=True)
  summarized_text = tokenizer.decode(output[0], skip_special_tokens=True)
  return summarize_text

In [17]:
from transformers import pipeline

def summarize_text(text, max_length=1000):
    """Summarize input text using a pre-trained GPT-2 model."""
    summarization_pipeline = pipeline("summarization", model="gpt2")
    summary = summarization_pipeline(text, max_length=max_length, min_length=50, do_sample=True)[0]['summary_text']
    return summary

# Example usage:
input_text = docs1
summary = summarize_text(input_text)
print("Summary:", summary)

The model 'GPT2LMHeadModel' is not supported for summarization. Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGeneration'].
Your max_length is set to 1000, but your input_length is only 786

Summary: content: 36 Volume 18, Issue 3 NIH MedlinePlus Magazine :STIDERC Alcohol’s health effects: What you need to know rinking alcohol is so common that people may not question how even one beer, cocktail, or glass of wine could impact their As of 2021, 29.5 million people health. Alcohol is a part of cultural traditions all ages 12 and older had an alcohol around the world…and it’s also a drug that chemically use disorder in the past year. alters the body. People of all ages need to understand these effects. SOURCE: NATIONAL SURVEY ON DRUG USE AND HEALTH The National Institute on Alcohol Abuse and Alcoholism (NIAAA) has information on how alcohol The alcohol you consume resides mostly in the body’s impacts your health. It also has resources to help water. Because women tend to have less water in their bodies than men, if a woman and a man of the same those looking to change their drinking habits

. Expert evidence provided for the paper showed that the only group with potential to 

# Creating db as Retreiver

In [18]:
# Create a retriever object from the 'db' using the 'as_retriever' method.
# This retriever is likely used for retrieving data or documents from the database.
retriever = new_db.as_retriever()
docs = retriever.invoke('prolong alcohol intake impact human health?')
docs

[Document(page_content='content: 36 Volume 18, Issue 3 NIH MedlinePlus Magazine :STIDERC Alcohol’s health effects: What you need to know rinking alcohol is so common that people may not question how even one beer, cocktail, or glass of wine could impact their As of 2021, 29.5 million people health. Alcohol is a part of cultural traditions all ages 12 and older had an alcohol around the world…and it’s also a drug that chemically use disorder in the past year. alters the body. People of all ages need to understand these effects. SOURCE: NATIONAL SURVEY ON DRUG USE AND HEALTH The National Institute on Alcohol Abuse and Alcoholism (NIAAA) has information on how alcohol The alcohol you consume resides mostly in the body’s impacts your health. It also has resources to help water. Because women tend to have less water in their bodies than men, if a woman and a man of the same those looking to change their drinking habits', metadata={'source': 'Title: alcohol_health_overview.PDF_.final_.080123

In [19]:
docs = retriever.get_relevant_documents("What is prolong alcohol intake impact human health?")

context_summ=' '.join([str(i.page_content) for i in docs])

In [56]:
context_summ

'content: 36 Volume 18, Issue 3 NIH MedlinePlus Magazine :STIDERC Alcohol’s health effects: What you need to know rinking alcohol is so common that people may not question how even one beer, cocktail, or glass of wine could impact their As of 2021, 29.5 million people health. Alcohol is a part of cultural traditions all ages 12 and older had an alcohol around the world…and it’s also a drug that chemically use disorder in the past year. alters the body. People of all ages need to understand these effects. SOURCE: NATIONAL SURVEY ON DRUG USE AND HEALTH The National Institute on Alcohol Abuse and Alcoholism (NIAAA) has information on how alcohol The alcohol you consume resides mostly in the body’s impacts your health. It also has resources to help water. Because women tend to have less water in their bodies than men, if a woman and a man of the same those looking to change their drinking habits content: nutrients Article The Global Impact of Alcohol Consumption on Premature Mortality and 

In [21]:
docs = retriever.get_relevant_documents("What is e-cigarrette?")

context_summ1=' '.join([str(i.page_content) for i in docs])

In [60]:
context_summ1



In [39]:
# import openai

# # Set up your OpenAI API key
# openai.api_key = 'sk-proj-dHFkSQp3XoUHkhJQ1MIcT3BlbkFJIQiX7fhFeAGtyYpMDGjV'

# Example text
input_text = context_summ1



In [43]:
!pip show faiss-gpu

Name: faiss-gpu
Version: 1.7.2
Summary: A library for efficient similarity search and clustering of dense vectors.
Home-page: https://github.com/kyamagu/faiss-wheels
Author: Kota Yamaguchi
Author-email: KotaYamaguchi1984@gmail.com
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: 
Required-by: 


In [50]:
!python --version

Python 3.10.12
