In [1]:
from os import getenv
from dotenv import load_dotenv

load_dotenv()

PINECONE_API_KEY = getenv("PINECONE_API_KEY")
# Confirm the key loaded without echoing the secret into the notebook output.
assert PINECONE_API_KEY, "PINECONE_API_KEY is not set"
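
If the loaded key ever needs to be confirmed by eye, a masked preview keeps the secret out of the saved notebook output. A minimal sketch: `mask_secret` is a hypothetical helper (not part of `dotenv` or Pinecone), and the sample value is made up.

```python
def mask_secret(value, visible=4):
    """Return a preview of a secret that shows only the first `visible` characters."""
    if not value:
        # Covers both None (variable not set) and the empty string.
        return "<missing>"
    if len(value) <= visible:
        # Too short to show any prefix safely; hide everything.
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Made-up value for illustration only; in the notebook this would be PINECONE_API_KEY.
preview = mask_secret("abcd1234-example")
print(preview)
```

The preview keeps the original length, which is usually enough to tell a real key from an empty or truncated one.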

In [2]:
MODEL = "llama3"

In [3]:
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings

model = Ollama(model=MODEL)
embeddings = OllamaEmbeddings(model=MODEL)

In [7]:
ans = model.invoke("Why does the sky turn reddish just before evening?")
print(ans)

What a great question!

The reddish hue you're observing in the sky just before sunset is called "sunset glow" or "sundowner." It's a common phenomenon that occurs when the sun is low on the horizon, typically during the last hour or so before sundown. Here are some reasons why this happens:

1. **Atmospheric scattering**: When the sun's rays enter Earth's atmosphere, they encounter tiny molecules of gases like nitrogen and oxygen. These molecules scatter shorter (blue) wavelengths of light more than longer (red) wavelengths, a phenomenon known as Rayleigh scattering. As the sun dips lower in the sky, the light it emits has to travel longer distances through the atmosphere, which means more blue light is scattered away, leaving mainly red and orange hues.
2. **Dust and aerosols**: Tiny particles like dust, pollen, and other aerosols in the air can also scatter shorter wavelengths of light, enhancing the reddish color. These particles are often more abundant near sunset due to human act

In [4]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

In [34]:
chain = model | parser

chain.invoke("What is machine learning?")

"Machine learning (ML) is a subfield of artificial intelligence (AI) that involves training algorithms to make decisions or predictions based on data, without being explicitly programmed. In other words, ML allows computers to learn from experience and improve their performance over time.\n\nHere's how it works:\n\n1. **Training**: You provide the algorithm with a large dataset, which is used to train the model.\n2. **Model**: The algorithm learns patterns and relationships within the data, creating a model that can make predictions or take actions.\n3. **Testing**: You test the model on new, unseen data to evaluate its performance.\n\nMachine learning involves various types of algorithms, such as:\n\n1. **Supervised Learning**: The algorithm is trained on labeled data (e.g., images with labels) to learn relationships and make predictions.\n2. **Unsupervised Learning**: The algorithm discovers patterns and structures within the data without labeled examples.\n3. **Reinforcement Learnin

In [5]:
# * Prompt Template
from langchain_core.prompts import PromptTemplate

TEMPLATE = """
Answer the question based on the context below. You don't 
need to say based on the context or based on this or that.
If you can't answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template=TEMPLATE)
prompt.format(context="The context", question="The question") # * for test

'\nAnswer the question based on the context below. You don\'t \nneed to say based on the context or based on this or that.\nIf you can\'t answer the question, reply "I don\'t know".\n\nContext: The context\n\nQuestion: The question\n'

In [7]:
# * test the template
chain = prompt | model | parser
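
The `|` operator composes steps left to right: the output of one step becomes the input of the next. The toy classes below only illustrate that idea in plain Python. `Step`, `fake_prompt`, `fake_model`, and `fake_parser` are made-up stand-ins, not LangChain's actual implementation.

```python
class Step:
    """Toy stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # Composing two steps yields a new step that runs self, then other.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stages that mimic the shape of prompt | model | parser.
fake_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
fake_model = Step(lambda text: {"content": f"echo -> {text}"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_prompt | fake_model | fake_parser
result = toy_chain.invoke({"context": "The context", "question": "The question"})
print(result)
```

Each stage only needs to agree with its neighbours on input and output shape, which is why the parser can be dropped in after the model without changing either.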

In [8]:
chain.input_schema.schema()

{'title': 'PromptInput',
 'type': 'object',
 'properties': {'context': {'title': 'Context', 'type': 'string'},
  'question': {'title': 'Question', 'type': 'string'}}}

In [30]:
chain.invoke({
  "context": "The name Tonni was given to my wife after she was born. She was very cute when she was born. The year was 2000. She is still very beautiful, masha Allah. We say 'Masha Allah' after saying good words.",
  "question": "When was my wife born?"
})

'2000'

In [9]:
# * load the pdf
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("vector_stores_ mongodb.pdf")
pages = loader.load_and_split()
pages

[Document(page_content='You’re probably aware of the buzz around arti\x00cial\nintelligence (AI), language learning (LL), and machine\nlearning (ML), which impact everything from social media\nalgorithms to self-driving cars, but you may not know\nthat the real magic behind these technological\nadvancements and their query performance is data\nmanagement. Understanding this data dependence is\ncrucial for grasping how these emerging technologies\nare able to produce intelligent decisions so quickly.\nAt their core, these sophisticated digital technologies\nrely heavily on data that has been collected and\nprocessed to ultimately provide intelligent answers from\na specialized database called a vector database. The\nprocess starts by turning raw information like words,\nimages, video or music, into vectors, then feeding them\ninto a pre-trained machine language model (MLM) thatVector Stores in\nArti\x00cial Intelligence (AI)LIVE Watch now: The MongoDB.local NYC keynote. Hear the latest 

In [41]:
# * embedding & vectorization
from langchain_community.vectorstores import DocArrayInMemorySearch
vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings) 

In [42]:
# * retrieve
retriever = vectorstore.as_retriever()

In [43]:
retriever.invoke("Vector store")

[Document(page_content='Atlas Status Customer Support\nManage Cookies\nSocial\nGitHub Stack Over\x00ow\nLinkedIn YouTube\nX Twitch\nFacebook\n© 2024 MongoDB, Inc.5/11/24, 9:56 PM Vector Stores In Artiﬁcial Intelligence (AI) | MongoDB\nhttps://www.mongodb.com/resources/basics/vector-stores 12/12', metadata={'source': 'vector_stores_ mongodb.pdf', 'page': 11}),
 Document(page_content='Get Started With MongoDB Atlas\nTry Free\nAbout\nCareers Investor Relations\nLegal Notices Privacy Notices\nSecurity Information Trust Center\nSupport\nContact Us Customer PortalEnglish5/11/24, 9:56 PM Vector Stores In Artiﬁcial Intelligence (AI) | MongoDB\nhttps://www.mongodb.com/resources/basics/vector-stores 11/12', metadata={'source': 'vector_stores_ mongodb.pdf', 'page': 10}),
 Document(page_content='You’re probably aware of the buzz around arti\x00cial\nintelligence (AI), language learning (LL), and machine\nlearning (ML), which impact everything from social media\nalgorithms to self-driving cars, but
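
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors lie closest to it. The sketch below shows that ranking step with cosine similarity in plain Python. The three-dimensional vectors and texts are invented for illustration, and `top_k` is a hypothetical helper; a real store uses the embedding model's vectors and may use a different metric or an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k stored entries most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Invented (text, vector) pairs standing in for embedded PDF chunks.
toy_store = [
    ("vector stores hold embeddings", [0.9, 0.1, 0.0]),
    ("footer: careers, legal notices", [0.0, 0.2, 0.9]),
    ("embeddings are high-dimensional vectors", [0.8, 0.3, 0.1]),
]

matches = top_k([1.0, 0.2, 0.0], toy_store, k=2)
print(matches)
```

The ranking depends entirely on the vectors, so the quality of the embedding model decides which chunks come back first.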

In [10]:
# * vector store: Pinecone
from langchain_pinecone import PineconeVectorStore

index_name = "my-first-rag"

pinecone = PineconeVectorStore.from_documents(
  pages,
  embedding=embeddings,
  index_name=index_name,
  pinecone_api_key=PINECONE_API_KEY,
)

In [11]:
retriever_pinecone = pinecone.as_retriever()

In [13]:
retriever_pinecone.invoke("Vector store")

[Document(page_content='Atlas Status Customer Support\nManage Cookies\nSocial\nGitHub Stack Over\x00ow\nLinkedIn YouTube\nX Twitch\nFacebook\n© 2024 MongoDB, Inc.5/11/24, 9:56 PM Vector Stores In Artiﬁcial Intelligence (AI) | MongoDB\nhttps://www.mongodb.com/resources/basics/vector-stores 12/12', metadata={'page': 11.0, 'source': 'vector_stores_ mongodb.pdf'}),
 Document(page_content='Get Started With MongoDB Atlas\nTry Free\nAbout\nCareers Investor Relations\nLegal Notices Privacy Notices\nSecurity Information Trust Center\nSupport\nContact Us Customer PortalEnglish5/11/24, 9:56 PM Vector Stores In Artiﬁcial Intelligence (AI) | MongoDB\nhttps://www.mongodb.com/resources/basics/vector-stores 11/12', metadata={'page': 10.0, 'source': 'vector_stores_ mongodb.pdf'}),
 Document(page_content='You’re probably aware of the buzz around arti\x00cial\nintelligence (AI), language learning (LL), and machine\nlearning (ML), which impact everything from social media\nalgorithms to self-driving cars,

In [14]:
from operator import itemgetter

chain = (
  {
    # "context": itemgetter("question") | retriever,
    "context": itemgetter("question") | retriever_pinecone,
    "question": itemgetter("question")
  }
  | prompt
  | model
  | parser
)

In [15]:
chain.invoke({"question": "What is vectorization?"})

'Vectorization is not explicitly mentioned in the provided context. However, based on the information available, it can be inferred that vectorization refers to the process of transforming raw data into vectors, which are high-dimensional vectors that can be searched in vector databases. This is discussed in the section "Raw data\'s transformation" and also briefly mentioned in the section "What is a vector in AI?".'
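
In the chain above, the retriever's list of `Document` objects goes straight into the `{context}` slot, so, as far as I can tell, the prompt receives the list's Python representation, metadata included. A common alternative is to join only the page text first. The sketch below uses a small stand-in class so it runs without LangChain; `FakeDocument` and `format_docs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a retrieved document; only the attributes used here are modeled."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("Vectors are stored in a vector database.", {"page": 1}),
    FakeDocument("Embeddings are high-dimensional vectors.", {"page": 2}),
]
context_text = format_docs(docs)
print(context_text)

# Assumption: in the real chain this helper could be piped after the retriever, e.g.
#   "context": itemgetter("question") | retriever_pinecone | format_docs
```

Passing plain text keeps metadata such as page numbers and file names out of the prompt, which may leave more room for the content itself.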

In [16]:
# * rebuild the chain on the existing index (the RunnablePassthrough variant is kept commented out below)
from langchain_core.runnables import RunnablePassthrough
from operator import itemgetter
from langchain_pinecone import PineconeVectorStore

# * when the data is already in Pinecone, connect to the existing index instead of re-uploading
index_name = "my-first-rag"

pinecone = PineconeVectorStore(embedding=embeddings, index_name=index_name, pinecone_api_key=PINECONE_API_KEY)
retriever = pinecone.as_retriever()
chain = (
  {
    # "context": retriever,
    # "question": RunnablePassthrough()
    "context": itemgetter("question") | retriever,
    "question": itemgetter("question")
  }
  | prompt
  | model
  | parser
)

[Document(page_content="optimizes the original ones into embeddings, which are\nhigh dimensional vectors that can be searched in vector\ndatabases.\nTable of Contents:\nRaw data's transformation\nWhat are vector stores?\nWhat is a vector in AI?\nHow are vectors stored so they can be used by AI?\nUse case for vector stores\nWhat makes vector stores and vector databases\ndi\x00erent from traditional data storage options?\nWhat is the role of ML models, vector embeddings,\nand ANN in AI?\nConclusion\nRaw data's transformation\nThis blog post will focus on how data is transformed into\nvectors and how they are used in vector databases,\nincluding some discussion around search, vector space,\nand approximate nearest neighbor.\nAlthough important, this article will not dig deep into\nde\x00ning vector indexes, cosine similarity searches,\ncosine distance, euclidean distance, orthogonal vectors,\nor facebook ai similarity search.\nWhat are vector stores?5/11/24, 9:56 PM Vector Stores In Artiﬁ
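
The two variants in the cell above differ only in the input shape they expect. The commented-out one takes a plain string and passes it through unchanged as the question, while the active one takes a dict and picks out `"question"` with `itemgetter`. A plain-Python sketch of both follows; `fan_out`, `identity`, and `fake_retriever` are made-up stand-ins for the dict step, `RunnablePassthrough`, and the retriever.

```python
from operator import itemgetter


def fan_out(mapping, value):
    """Apply every function in `mapping` to the same input and collect the results."""
    return {key: func(value) for key, func in mapping.items()}


def identity(value):
    """Stand-in for a passthrough step: returns its input unchanged."""
    return value


def fake_retriever(query):
    """Stand-in for a retriever: returns a made-up list of chunks."""
    return [f"chunk about {query}"]


get_question = itemgetter("question")

# Variant 1: the chain is invoked with a plain string.
from_string = fan_out(
    {"context": fake_retriever, "question": identity},
    "vector stores",
)

# Variant 2: the chain is invoked with a dict such as {"question": "..."}.
from_dict = fan_out(
    {"context": lambda data: fake_retriever(get_question(data)), "question": get_question},
    {"question": "vector stores"},
)

print(from_string == from_dict)
```

The dict form is easier to extend when the chain later needs more inputs, such as a language or chat history, alongside the question.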

In [17]:
chain.invoke({"question": "What are the use cases for vector stores?"})

"According to the provided context, some use cases for vector stores include:\n\n1. Searching for recipes with certain characteristics (e.g., low-calorie sauce and gluten-free pasta) in a recipe database.\n2. Processing vast amounts of data from web pages in search engines to produce more accurate and relevant search results.\n3. Facilitating facial recognition technologies by comparing facial features within vector databases.\n4. Providing personalized recommendations in e-commerce applications by suggesting products that are close to a user's profile in the embedding space.\n\nThese use cases demonstrate the potential of vector stores for various AI-powered applications, including search engines, recommendation systems, and facial recognition technologies."

In [18]:
ans = chain.invoke({"question": "How much of the data is unstructured?"})

print(ans)

Almost 80% of the data is unstructured and needs a more complex storage system.
