<a href="https://colab.research.google.com/github/victor-iyi/llm-examples/blob/main/Google_Generative_AI_%26_LangChain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -q -U google-generativeai langchain-google-genai python-dotenv


In [None]:
from google.colab import userdata

# Load secrets from Colab's user data store.
GOOGLE_API_KEY = userdata.get('gemini')
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

# Write the secrets to a .env file so that load_dotenv() can read them later.
!echo 'GOOGLE_API_KEY='{GOOGLE_API_KEY} > .env
!echo 'OPENAI_API_KEY='{OPENAI_API_KEY} >> .env
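The shell `echo` lines above depend on how the shell quotes the key values. A minimal pure-Python sketch of the same step is shown below; `write_env_file` and the dummy key values are hypothetical names introduced only for illustration, and the file is written to a temporary directory rather than the working directory.

```python
import tempfile
from pathlib import Path


def write_env_file(path: Path, secrets: dict[str, str]) -> None:
    """Write KEY=value lines to `path`, skipping secrets that are empty."""
    lines = [f'{key}={value}' for key, value in secrets.items() if value]
    path.write_text('\n'.join(lines) + '\n', encoding='utf-8')


# Placeholder values for illustration only. In the notebook these would be
# the GOOGLE_API_KEY and OPENAI_API_KEY values returned by userdata.get(...).
demo_secrets = {'GOOGLE_API_KEY': 'dummy-google-key', 'OPENAI_API_KEY': ''}

demo_path = Path(tempfile.mkdtemp()) / '.env'
write_env_file(demo_path, demo_secrets)
env_text = demo_path.read_text(encoding='utf-8')
print(env_text)
```

Skipping empty values avoids writing a line such as `OPENAI_API_KEY=` when a secret is not set in Colab.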

In [None]:
!ls -a

.  ..  .config	.env  res  sample_data


In [None]:
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

True

In [None]:
import textwrap

from IPython.display import display
from IPython.display import Markdown

def to_markdown(text: str) -> Markdown:
  """Convert model output into Markdown for easy display."""
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
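The `predicate=lambda _: True` argument matters because `textwrap.indent` skips blank lines by default, which would leave an unquoted gap between paragraphs and split the blockquote. A small sketch of the difference, using a made-up two-paragraph string:

```python
import textwrap

sample = 'First paragraph.\n\nSecond paragraph.'

# Default predicate: empty or whitespace-only lines are left unprefixed.
default_quote = textwrap.indent(sample, '> ')

# predicate=lambda _: True prefixes every line, including the blank one.
full_quote = textwrap.indent(sample, '> ', predicate=lambda _: True)

print(repr(default_quote))
print(repr(full_quote))
```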

In [None]:
import os
import google.generativeai as genai

# Configure Google GenerativeAI API Key
genai.configure(api_key=GOOGLE_API_KEY)

In [None]:
model = genai.GenerativeModel(model_name='gemini-pro')
model

genai.GenerativeModel(
    model_name='models/gemini-pro',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
)

In [None]:
prompt = [
    'What is Mixture of Experts?',
    'What is so special a'
]

response = model.generate_content(prompt)

In [None]:
to_markdown(response.text)

> **Mixture of Experts (MoE)**
> 
> **Definition:**
> Mixture of Experts is a machine learning model that consists of multiple expert networks or sub-models, each of which is specialized in handling a specific subset of data or problem. The overall prediction of the MoE model is a weighted average of the predictions from the individual experts.
> 
> **Special Features:**
> 
> * **Modularity:** MoE models are highly modular, allowing different experts to be added or removed as needed, making them flexible and adaptable to changing data or tasks.
> * **Specialization:** Each expert is trained on a specific subset of data, enabling it to learn more specialized knowledge for that subset.
> * **Parallel Computation:** The expert networks can be executed in parallel, which can significantly improve training and inference speed on large datasets.
> * **Robustness:** MoE models are more robust to outliers and noise in the data because the experts can compensate for each other's weaknesses.
> * **Interpretability:** The expert networks provide a way to understand how the model makes decisions, which can be useful for debugging and diagnostics.
> 
> **How It Works:**
> 
> 1. **Data Partitioning:** The training data is split into subsets, each corresponding to the expertise of one expert network.
> 2. **Expert Training:** Each expert network is trained independently on its assigned subset of data.
> 3. **Gating Network:** A gating network learns to determine which expert is most appropriate for each data point.
> 4. **Prediction:** The predictions from the individual experts are combined using weights determined by the gating network.
> 
> **Applications:**
> 
> MoE models are used in various applications, including:
> 
> * Image classification
> * Natural language processing
> * Audio and video analysis
> * Recommender systems
> * Time series forecasting
> 
> **Advantages:**
> 
> * Improved accuracy and performance
> * Reduced overfitting
> * Enhanced interpretability
> * Suitable for large-scale datasets
> * Faster training and inference times
> 
> **Disadvantages:**
> 
> * Increased model complexity
> * Potential for overspecialization
> * Requires careful design and implementation

## Using Google GenerativeAI with LangChain

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

In [None]:
llm = ChatGoogleGenerativeAI(model='gemini-pro')
llm

ChatGoogleGenerativeAI(model='gemini-pro', client=genai.GenerativeModel(
    model_name='models/gemini-pro',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
))

In [None]:
result = llm.invoke(prompt[0])

In [None]:
to_markdown(result.content)

> **Mixture of Experts (MoE)** is a machine learning technique that leverages multiple specialized sub-models, known as experts, to make predictions. It is primarily used in the context of neural networks.
> 
> **How MoE Works:**
> 
> 1. **Input Data:** The input data is passed through a gating network.
> 
> 2. **Gating Network:** The gating network assigns weights (probabilities) to each expert. These weights determine the level of influence each expert will have on the final prediction.
> 
> 3. **Expert Models:** Multiple expert models are simultaneously trained on different subsets of the data or different aspects of the problem. Each expert is specialized in handling specific tasks or features.
> 
> 4. **Expert Predictions:** Each expert model makes predictions for the input data.
> 
> 5. **Mixture:** The predictions from all the experts are combined using the weighted average, where the weights are the probabilities assigned by the gating network.
> 
> **Benefits of MoE:**
> 
> * **Improved Performance:** By combining the expertise of multiple specialized models, MoE can achieve higher accuracy and generalization capabilities.
> * **Reduced Overfitting:** The gating network helps prevent overfitting by selectively activating experts that are relevant to the input data.
> * **Scalability:** MoE can be scaled to handle large datasets and complex tasks by adding more expert models.
> * **Explainability:** The gating network provides insights into which experts contribute to the final prediction, making MoE more interpretable.
> 
> **Applications of MoE:**
> 
> * Image classification
> * Natural language processing
> * Speech recognition
> * Recommendation systems
> * Fraud detection
> 
> **Advantages of MoE:**
> 
> * Can handle complex tasks with multiple aspects
> * Improves performance through specialization
> * Scales well to large datasets
> * Provides explainability
> 
> **Limitations of MoE:**
> 
> * Can be computationally expensive to train
> * Requires careful selection and training of experts
> * May suffer from cold start issues if new data differs significantly from the training data

## Use the Gemini Pro Vision API

In [None]:
from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model='gemini-pro-vision')
llm

ChatGoogleGenerativeAI(model='gemini-pro-vision', client=genai.GenerativeModel(
    model_name='models/gemini-pro-vision',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
))

In [None]:
message = HumanMessage(
    content=[
        {
            'type': 'text',
            'text': 'What do you see in this image?',
        },
        {
            'type': 'image_url',
            'image_url': 'https://www.goconstruct.org/media/qukhn3dc/soh.jpg?anchor=center&mode=crop&width=940&height=610&rnd=132743881555030000',
        },
    ]
)
message

HumanMessage(content=[{'type': 'text', 'text': 'What do you see in this image?'}, {'type': 'image_url', 'image_url': 'https://www.goconstruct.org/media/qukhn3dc/soh.jpg?anchor=center&mode=crop&width=940&height=610&rnd=132743881555030000'}])

In [None]:
response = llm.invoke([message])
response

AIMessage(content=' The Sydney Opera House.', response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'safety_ratings': [{'category': 'HARM_CATEGORY_SEXUALLY_EXPLICIT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HATE_SPEECH', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_HARASSMENT', 'probability': 'NEGLIGIBLE', 'blocked': False}, {'category': 'HARM_CATEGORY_DANGEROUS_CONTENT', 'probability': 'NEGLIGIBLE', 'blocked': False}]}, id='run-e7e3f4de-8330-44c0-8f7a-e85313f42f5b-0')

In [None]:
response.content

' The Sydney Opera House.'
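The example above passes a public URL. For an image that is only available as raw bytes, one option is a base64 data URL. The sketch below only builds the string; `to_data_url` is a hypothetical helper, and whether the installed `langchain-google-genai` version accepts a data URL in the `image_url` field is an assumption that should be checked against its documentation.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = 'image/jpeg') -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode('ascii')
    return f'data:{mime_type};base64,{encoded}'


# Tiny placeholder payload; a real call would pass the bytes of an image file.
data_url = to_data_url(b'abc', mime_type='image/png')
print(data_url)
```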

In [None]:
message = HumanMessage(
    content=[
        {
            'type': 'text',
            'text': 'What do you see in this image? Write a short article about the content of the image.',
        },
        {
            'type': 'image_url',
            'image_url': 'https://www.goconstruct.org/media/qukhn3dc/soh.jpg?anchor=center&mode=crop&width=940&height=610&rnd=132743881555030000',
        },
    ]
)
message

HumanMessage(content=[{'type': 'text', 'text': 'What do you see in this image? Write a short article about the content of the image.'}, {'type': 'image_url', 'image_url': 'https://www.goconstruct.org/media/qukhn3dc/soh.jpg?anchor=center&mode=crop&width=940&height=610&rnd=132743881555030000'}])

In [None]:
response = llm.invoke([message])
to_markdown(response.content)

>  The Sydney Opera House is one of the most iconic buildings in the world. It is located in Sydney, Australia, and was designed by Danish architect Jørn Utzon. The building was completed in 1973 and is a UNESCO World Heritage Site.
> 
> The Sydney Opera House is known for its unique design, which features a series of large, white shells. The shells are made of precast concrete and are supported by a steel frame. The building is also known for its acoustics, which are considered to be some of the best in the world.
> 
> The Sydney Opera House is home to the Sydney Symphony Orchestra, the Australian Ballet, and the Australian Opera. It also hosts a variety of other events, such as concerts, plays, and exhibitions. The Sydney Opera House is a popular tourist destination and is visited by millions of people each year.

## In-Context Retrieval

In [None]:
model = ChatGoogleGenerativeAI(model='gemini-pro', temperature=0.3)
model

ChatGoogleGenerativeAI(model='gemini-pro', temperature=0.3, client=genai.GenerativeModel(
    model_name='models/gemini-pro',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
))

In [None]:
# Download the document.
from pathlib import Path
import urllib.request

# Create data folder
data_folder = Path.cwd() / 'res/data'
Path(data_folder).mkdir(parents=True, exist_ok=True)

# Create pdf file path.
pdf_url = 'https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf'
pdf_file = str(Path(data_folder, pdf_url.split('/')[-1]))

# Download pdf
urllib.request.urlretrieve(pdf_url, pdf_file)

('/content/res/data/practitioners_guide_to_mlops_whitepaper.pdf',
 <http.client.HTTPMessage at 0x7b9de9b1ab00>)

In [None]:
!pip install -q -U langchain langchain-community langchain-text-splitters
!pip install -q -U pypdf tiktoken chromadb


In [None]:
from langchain_community.document_loaders import PyPDFLoader

In [None]:
pdf_file

'/content/res/data/practitioners_guide_to_mlops_whitepaper.pdf'

In [None]:
# Load PDF into Document objects
pdf_loader = PyPDFLoader(pdf_file)
pages = pdf_loader.load_and_split()
print(pages[2].page_content)

Executive summary
Across industries, DevOps and DataOps have been widely adopted as methodologies to improve quality and re -
duce the time to market of software engineering and data engineering initiatives. With the rapid growth in machine 
learning (ML) systems, similar approaches need to be developed in the context of ML engineering, which handle the 
unique complexities of the practical applications of ML. This is the domain of MLOps. MLOps is a set of standard -
ized processes and technology capabilities for building, deploying, and operationalizing ML systems rapidly and 
reliably.]
We previously published Google Cloud’s AI Adoption Framework  to provide guidance for technology leaders who 
want to build an effective artificial intelligence (AI) capability in order to transform their business. That framework 
covers AI challenges around people, data, technology, and process, structured in six different themes: learn, lead, 
access, secure, scale, and automate . 
The current docum

In [None]:
context = '\n'.join(str(p.page_content) for p in pages[:30])
print(f'The total characters in the context: {len(context):,}')

The total characters in the context: 55,545
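Note that `len()` on a string counts characters, not words. A minimal sketch of the difference on a made-up two-line string (whitespace splitting is only a rough word count):

```python
sample_context = 'MLOps is a set of standardized processes.\nIt builds on DevOps.'

num_chars = len(sample_context)          # every character, including spaces and newlines
num_words = len(sample_context.split())  # whitespace-separated tokens

print(f'characters: {num_chars:,}')
print(f'words: {num_words:,}')
```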


## Prompt Design - In Context

In [None]:
from langchain_core.prompts import PromptTemplate


In [None]:
prompt_template = '''\
Answer the question as precise as possible using provided context. If the answer
is not contained in the context, say "answer not available in context" \n\n
Context: \n{context}\n
Question: \n{question}?\n
Answer:
'''

prompt = PromptTemplate(
    template=prompt_template,
    input_variables=['context', 'question'],
)

prompt

PromptTemplate(input_variables=['context', 'question'], template='Answer the question as precise as possible using provided context. If the answer\nis not contained in the context, say "answer not available in context" \n\n\nContext: \n{context}\n\nQuestion: \n{question}?\n\nAnswer:\n')
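To see what the model receives, the template can be filled by hand. The sketch below uses plain `str.format` as a stand-in for the LangChain template class, with a made-up context and question. It also shows a side effect of the `{question}?` placeholder: a question that already ends in a question mark gets a second one.

```python
# Same text as the template above, written out as a plain string.
template = (
    'Answer the question as precise as possible using provided context. If the answer\n'
    'is not contained in the context, say "answer not available in context" \n\n\n'
    'Context: \n{context}\n\n'
    'Question: \n{question}?\n\n'
    'Answer:\n'
)

# Illustrative values only.
filled = template.format(
    context='MLOps builds on DevOps.',
    question='What does MLOps build on?',
)
print(filled)
```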

In [None]:
from langchain.chains.question_answering import load_qa_chain
from pprint import pprint

In [None]:
# Limited context
stuff_chain = load_qa_chain(model, chain_type='stuff', prompt=prompt)
stuff_chain

StuffDocumentsChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['context', 'question'], template='Answer the question as precise as possible using provided context. If the answer\nis not contained in the context, say "answer not available in context" \n\n\nContext: \n{context}\n\nQuestion: \n{question}?\n\nAnswer:\n'), llm=ChatGoogleGenerativeAI(model='gemini-pro', temperature=0.3, client=genai.GenerativeModel(
    model_name='models/gemini-pro',
    generation_config={},
    safety_settings={},
    tools=None,
    system_instruction=None,
))), document_variable_name='context')
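The "stuff" strategy places the text of every input document into the single `context` slot of the prompt. A rough sketch of that idea follows; `Doc`, `stuff_documents`, the shortened template, and the blank-line separator are all assumptions made for illustration, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document object with a page_content field."""
    page_content: str


def stuff_documents(docs: list[Doc], question: str, separator: str = '\n\n') -> str:
    """Join all document texts and place them in one prompt."""
    context = separator.join(doc.page_content for doc in docs)
    return f'Context: \n{context}\n\nQuestion: \n{question}\n\nAnswer:\n'


demo_docs = [Doc('Page six text.'), Doc('Page seven text.')]
stuffed_prompt = stuff_documents(demo_docs, 'What is Experimentation')
print(stuffed_prompt)
```

Because everything goes into one prompt, its length grows with the number and size of the documents passed in, which is why the cells below pass only a couple of pages.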

In [None]:
# Answer is within pages[6:8]
question = 'What is Experimentation? Provide a detailed answer.'

stuff_answer = stuff_chain(
    {
        'input_documents': pages[6:8],  # Answer is within these pages.
        'question': question,
    },
    return_only_outputs=True,
)
pprint(stuff_answer)

{'output_text': 'Experimentation is the core activity during the ML '
                'development phase. Data scientists and ML researchers '
                'prototype model architectures and training routines, create '
                'labeled datasets, and use features and other reusable ML '
                'artifacts that are governed through the data and model '
                'management process.'}


In [None]:
# Answer is NOT within pages[6:8]
question = 'Describe data management and feature management systems.'

stuff_answer = stuff_chain(
    {
        'input_documents': pages[6:8],  # Answer is NOT within these pages.
        'question': question,
    },
    return_only_outputs=True,
)
pprint(stuff_answer)

{'output_text': 'Answer not available in context'}


## RAG Pipeline: Embedding + LLM

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings
from langchain_community.vectorstores import Chroma

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10_000, chunk_overlap=0)
context = '\n\n'.join(str(p.page_content) for p in pages)
texts = text_splitter.split_text(context)

In [None]:
texts

['Practitioners guide to MLOps:  \nA framework for continuous \ndelivery and automation of  \nmachine learning.White paper\nMay 2021\nAuthors:  \nKhalid Salama,  \nJarek Kazmierczak,  \nDonna Schut\n\nTable of Contents\nExecutive summary  3\nOverview of MLOps lifecycle and core capabilities  4\nDeep dive of MLOps processes  15\nPutting it all together  34\nAdditional resources  36Building an ML-enabled system  6\nThe MLOps lifecycle  7\nMLOps: An end-to-end workflow  8\nMLOps capabilities  9\n      Experimentation  11\n      Data processing  11\n      Model training  11\n      Model evaluation  12\n      Model serving  12\n      Online experimentation  13\n      Model monitoring  13\n      ML pipelines  13\n      Model registry  14\n      Dataset and feature repository  14\n      ML metadata and artifact tracking  15\nML development  16\nTraining operationalization  18\nContinuous training  20\nModel deployment  23\nPrediction serving  25\nContinuous monitoring  26\nData and model mana
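To illustrate what `chunk_size` and `chunk_overlap` control, here is a simplified fixed-width splitter. It is a sketch only: `split_fixed` is a hypothetical helper, and the recursive splitter used above additionally tries to break on separators such as paragraph and line breaks rather than cutting at exact character offsets.

```python
def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError('chunk_size must be positive')
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError('chunk_overlap must be in [0, chunk_size)')
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


no_overlap = split_fixed('abcdefghij', chunk_size=4)
with_overlap = split_fixed('abcdefghij', chunk_size=4, chunk_overlap=1)
print(no_overlap)
print(with_overlap)
```

With `chunk_overlap=0`, as in the cell above, neighbouring chunks share no text, so a sentence that straddles a boundary is split between two chunks.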

In [None]:
embeddings = GoogleGenerativeAIEmbeddings(model='models/embedding-001')
embeddings

GoogleGenerativeAIEmbeddings(model='models/embedding-001', task_type=None, google_api_key=None, credentials=None, client_options=None, transport=None, request_options=None)

In [None]:
vector_index = Chroma.from_texts(texts, embeddings).as_retriever()
vector_index

VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7b9de8ba00a0>)

In [None]:
# Perform semantic search.
docs = vector_index.get_relevant_documents(question)
docs

[Document(page_content='26\nThe serving engine can serve predictions to consumers in the following \nforms:\n• Online inference in near real time for high-frequency singleton \nrequests (or mini batches of requests), using interfaces like REST \nor gRPC.\n• Streaming inference in near real time, such as through an \nevent-processing pipeline.\n• Offline batch inference for bulk data scoring, usually integrated \nwith extract, transform, load (ETL) processes.\n• Embedded inference as part of embedded systems or edge devic -\nes.\nIn some scenarios of prediction serving, the serving engine might need \nto look up feature values that are related to the request. For example, you \nmight have a model that predicts the propensity of a customer to buy a \nparticular product, given a set of customer and product features. However, \nthe request includes only the customer and the product identifier. There -\nfore, the serving engine uses these identifiers to fetch the customer and \nthe product 

In [None]:
len(docs)

4
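The retriever returns the four chunks whose embeddings are closest to the embedding of the question. The toy sketch below shows that ranking step with hand-written 2-D vectors. Cosine similarity is used here as an illustrative choice and the vector store may use a different distance metric; `cosine_similarity` and `top_k` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 4) -> list[int]:
    """Return the indices of the k vectors most similar to the query, best first."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda pair: cosine_similarity(query_vec, pair[1]),
        reverse=True,
    )
    return [index for index, _ in scored[:k]]


# Made-up embeddings; real ones have hundreds of dimensions.
toy_docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
ranked = top_k([1.0, 0.0], toy_docs, k=2)
print(ranked)
```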

In [None]:
stuff_answer = stuff_chain(
    {
        'input_documents': docs,
        'question': question,
    },
    return_only_outputs=True,
)
pprint(stuff_answer)

{'output_text': 'Data management and feature management systems help mitigate '
                'such issues by providing a unified repository for ML features '
                'and datasets. As the diagram shows, the features and datasets '
                'are created, discovered, and reused in different experiments. '
                'Batch serving of the data is used for experimentation, '
                'continuous training, and batch prediction, while online '
                'serving of the data is used for real-time prediction use '
                'cases.'}
