## What is missing in this notebook:
1. Conversational Retrieval Chain

## What has been completed:
1. Retrieving AI tutor
2. Creating Prompt using `StrOutputParser`
3. Creating Prompt using `JsonOutputFunctionsParser`
4. Simplifying inputs


## Libraries

In [44]:
from operator import itemgetter

from langchain.prompts import ChatPromptTemplate
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.schema.output_parser import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough, RunnableLambda
from langchain.vectorstores import FAISS
from langchain.vectorstores.pgvector import PGVector
from langchain.document_loaders import TextLoader
from langchain.docstore.document import Document
from langchain.vectorstores import DocArrayInMemorySearch
import langchain
import supabase
from sentence_transformers import SentenceTransformer
from langchain.retrievers import BM25Retriever, EnsembleRetriever

from langchain.llms import OpenAI
import json

In [45]:
import os
import openai
from dotenv import load_dotenv

load_dotenv()

True

## Supabase vector store

In [46]:
import vecs

DB_CONNECTION = "postgresql://postgres:supa-jupyteach@192.168.0.77:54328/postgres"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)
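
The cell above hard-codes the database credentials in `DB_CONNECTION`. Since the notebook already calls `load_dotenv()`, one option is to assemble the connection string from environment variables instead. The sketch below is only an illustration: the variable names (`DB_USER`, `DB_PASSWORD`, `DB_HOST`, `DB_PORT`, `DB_NAME`) and the helper `build_connection_string` are hypothetical and not part of the original notebook.

```python
from urllib.parse import quote


def build_connection_string(env):
    """Build a PostgreSQL URL from a mapping such as os.environ.

    All variable names here are hypothetical; adapt them to your .env file.
    """
    user = env.get("DB_USER", "postgres")
    # Percent-encode the password so characters like '/' or '@' stay valid in a URL.
    password = quote(env["DB_PASSWORD"], safe="")
    host = env.get("DB_HOST", "localhost")
    port = env.get("DB_PORT", "5432")
    name = env.get("DB_NAME", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"


# Example with made-up values; in the notebook you would pass os.environ instead.
example = build_connection_string(
    {"DB_PASSWORD": "s3cret/pw", "DB_HOST": "db.example.internal", "DB_PORT": "54328"}
)
```

With this in place, `DB_CONNECTION = build_connection_string(os.environ)` would keep the password out of the notebook itself.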

In [47]:
# Loading sentence embedding model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2') 

user_question = input("What is your question? ")

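# Creating embedding for the user's question.
# Note: the PGVector search further down re-embeds `user_question` with
# OpenAIEmbeddings and does not use this vector.
user_embedding = embedding_model.encode(user_question)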

2023-11-03 22:56:57,673:INFO - Load pretrained SentenceTransformer: all-MiniLM-L6-v2
2023-11-03 22:56:57,857:INFO - Use pytorch device: cpu


What is your question?  what are dataframes?


Batches: 100%|██████████| 1/1 [00:00<00:00, 113.98it/s]


In [48]:
COLLECTION_NAME = "documents"

In [49]:
def get_vectorstore():
    """Return a PGVector store for COLLECTION_NAME that embeds text with OpenAI."""
    embeddings = OpenAIEmbeddings()

    db = PGVector(
        embedding_function=embeddings,
        collection_name=COLLECTION_NAME,
        connection_string=DB_CONNECTION,
    )
    return db

In [50]:
vector_store = get_vectorstore()
retriever = vector_store.as_retriever()

## Similarity Search 

In [51]:
db = get_vectorstore()
docs_with_score = db.similarity_search_with_score(user_question)
docs_with_score

[(Document(page_content="So this will find us the minimum one, and the maximum one could be done just like it. And then unemployment.detite tells us float 64. So if we look at our unemployment series, just notice that the values themselves are float 64, so unemployment.detite is just telling us what kind of values are being stored inside of our series. Great. So now we've talked about what a series is, and we'll now talk about what a data frame is. So a data frame is just going to be how pandas will store multiple columns of data. You could think about data frames as simply just multiple series stacked side by side. You'll notice that we still have an index, and now the index is zero, we'll just call this column zero, one and two. So index zero is associated with the A in columns zero, the A in column one and the A in column two. Index one is associated with the B in column zero, the B in column one and the B in column two. And the index two is associated with the C in column zero, et 
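
To make the scores in that output easier to read, here is a minimal, library-free sketch of what a similarity search does: embed the query, compute a distance to every stored vector, and return the closest documents. It assumes cosine distance, which to my understanding is PGVector's default strategy in LangChain (worth verifying for your version), so a lower score means a closer match. The three-dimensional vectors and the helper names are made up for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query_vec, docs, k=4):
    """docs is a list of (text, vector) pairs; return the k closest as (text, distance)."""
    scored = [(text, cosine_distance(query_vec, vec)) for text, vec in docs]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Hypothetical toy embeddings; real ones have hundreds or thousands of dimensions.
toy_docs = [
    ("A data frame stores multiple columns", [0.9, 0.1, 0.0]),
    ("A series is a single column", [0.6, 0.4, 0.0]),
    ("Matplotlib draws charts", [0.0, 0.1, 0.9]),
]
toy_query = [1.0, 0.0, 0.0]
ranked = rank_by_distance(toy_query, toy_docs, k=2)
```

In the notebook itself the database does this ranking, so the sketch is only meant to explain the shape of `docs_with_score`: a list of `(Document, score)` pairs sorted from closest to farthest.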

## PromptTemplate + LLM

In [22]:
prompt_temp = ChatPromptTemplate.from_template("tell me more about pandas {topic}")
model = ChatOpenAI()
chain_temp = prompt_temp | model

In [23]:
chain_temp.invoke({"topic": "AI"})

AIMessage(content='Pandas is not an AI entity. Pandas is actually a popular open-source Python library used for data manipulation and analysis. It provides data structures and functions for efficiently handling and analyzing structured data, primarily in tabular form.\n\nPandas is built on top of NumPy, another Python library for numerical computing. It offers powerful tools for data cleaning, preprocessing, merging, reshaping, and aggregating. Pandas also provides easy-to-use data structures like DataFrames, which allow users to organize and manipulate data in a tabular format, similar to spreadsheets or relational databases.\n\nOne of the key features of Pandas is its ability to handle missing data. It provides methods to detect, remove, or fill missing values in a dataset. Pandas also supports various data input and output formats, including CSV, Excel, SQL databases, and more.\n\nIn addition to data manipulation, Pandas also offers statistical analysis capabilities. It includes fun

In [24]:
chain_temp = prompt_temp | model.bind(stop=["\n"])
chain_temp.invoke({"topic": "AI"})

AIMessage(content='Pandas is an open-source data analysis and manipulation library for Python. It provides easy-to-use data structures and data analysis tools to perform various operations on structured data. While pandas itself is not an AI library, it is often used in conjunction with AI and machine learning frameworks to preprocess and analyze data.')
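
The shorter answer comes from `stop=["\n"]`: generation ends as soon as the model would emit a newline, so only the first paragraph comes back. The truncation happens on the API side, but its effect can be approximated locally. The helper below is a hypothetical illustration, not a LangChain function.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for seq in stop:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# A made-up two-paragraph reply, similar in shape to the earlier long answer.
full = "Pandas is a library.\n\nIt is built on NumPy."
short = truncate_at_stop(full, ["\n"])
```

Binding the stop sequence on the model (rather than trimming afterwards) also avoids paying for tokens that would be thrown away.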

### Function Call Information

In [25]:
functions = [
    {
        "name": "datascience",
        "description": "pandas",
        "parameters": {
            "type": "object",
            "properties": {
                "setup": {"type": "string", "description": "Describe Pandas"},
                "punchline": {
                    "type": "string",
                    "description": "Describe Pandas",
                },
            },
            "required": ["setup", "punchline"],
        },
    }
]
chain_temp = prompt_temp | model.bind(function_call={"name": "datascience"}, functions=functions)

In [26]:
chain_temp.invoke({"topic": "AI"}, config={})

AIMessage(content='', additional_kwargs={'function_call': {'name': 'datascience', 'arguments': '{\n  "setup": "Pandas is a powerful open-source data manipulation and analysis library for Python.",\n  "punchline": "It provides data structures and functions necessary to manipulate and analyze structured data."\n}'}})
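
Note that `content` is empty and the answer sits in `additional_kwargs['function_call']['arguments']` as a JSON string. A minimal sketch of pulling it out by hand with the standard library follows; the dictionary is copied from the output above, and `parse_function_arguments` is a hypothetical helper name.

```python
import json

# Copied from the AIMessage output above.
additional_kwargs = {
    "function_call": {
        "name": "datascience",
        "arguments": '{\n  "setup": "Pandas is a powerful open-source data manipulation and analysis library for Python.",\n  "punchline": "It provides data structures and functions necessary to manipulate and analyze structured data."\n}',
    }
}


def parse_function_arguments(kwargs):
    """Decode the JSON string in the function call into a Python dict."""
    call = kwargs["function_call"]
    return json.loads(call["arguments"])


parsed = parse_function_arguments(additional_kwargs)
setup_text = parsed["setup"]
```

This is roughly the step that an output parser for function calls saves you from writing yourself.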

### Prompt using `StrOutputParser`

In [9]:
prompt = ChatPromptTemplate.from_template(
    "what is a pandas {topic}"
)
model = ChatOpenAI()
output_pars = StrOutputParser()

In [10]:
chain = prompt | model | output_pars
chain.invoke({"topic": "AI"})

'Pandas AI refers to the combination of two technologies: pandas, a popular open-source data manipulation library in Python, and artificial intelligence (AI). \n\nPandas is primarily used for data analysis and manipulation tasks, offering data structures and functions to efficiently handle large datasets. It provides tools for cleaning, transforming, and analyzing data, making it a valuable tool for data scientists.\n\nOn the other hand, AI involves the development of intelligent machines that can perform tasks that typically require human intelligence. This includes technologies like machine learning, natural language processing, computer vision, and more.\n\nWhen pandas is combined with AI techniques, it allows for more advanced data analysis and decision-making capabilities. For example, pandas AI can be used to develop predictive models, automate data processing tasks, perform sentiment analysis on text data, build recommendation systems, and much more.\n\nOverall, pandas AI enable

#### Output parser using `JsonOutputFunctionsParser`

In [27]:
from langchain.output_parsers.openai_functions import JsonOutputFunctionsParser
chain_json = (
    prompt
    | model.bind(function_call={"name": "datascience"}, functions=functions)
    | JsonOutputFunctionsParser()
)

In [28]:
chain_json.invoke({"topic": "AI"})

{'setup': 'Pandas is a powerful open-source data manipulation and analysis library for Python.',
 'punchline': 'It provides data structures and functions for efficiently handling and analyzing structured data.'}

In [30]:
from langchain.output_parsers.openai_functions import JsonKeyOutputFunctionsParser
chain_json = (
    prompt
    | model.bind(function_call={"name": "datascience"}, functions=functions)
    | JsonKeyOutputFunctionsParser(key_name="setup")
)
chain_json.invoke({"topic": "AI"})

'Pandas is a powerful open-source data manipulation and analysis library for Python.'

## Simplifying input

In [40]:
from langchain.schema.runnable import RunnableMap, RunnablePassthrough
chain_simple = (
    {"topic": RunnablePassthrough()}
    | prompt
    | model.bind(function_call={"name": "datascience"}, functions=functions)
    | JsonKeyOutputFunctionsParser(key_name="setup")
)

In [41]:
chain_simple.invoke("AI")

'Pandas is a fast, powerful, and flexible open-source data manipulation and analysis library for Python.'
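
The reason `chain_simple.invoke("AI")` works with a bare string is that the leading dict maps the raw input onto the `topic` key before the prompt runs. The toy classes below mimic that idea in plain Python. They are a simplified sketch for intuition, not LangChain's actual implementation, and `Step` and `to_step` are hypothetical names.

```python
class Step:
    """A tiny stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # self | other: feed the output of self into other.
        nxt = to_step(other)
        return Step(lambda value: nxt.invoke(self.invoke(value)))

    def __ror__(self, other):
        # other | self, where other is a plain dict or function.
        return to_step(other) | self


def to_step(obj):
    """Coerce dicts and callables into Step objects."""
    if isinstance(obj, Step):
        return obj
    if isinstance(obj, dict):
        steps = {key: to_step(val) for key, val in obj.items()}
        # Every value receives the same input; results are gathered under their keys.
        return Step(lambda value: {key: s.invoke(value) for key, s in steps.items()})
    if callable(obj):
        return Step(obj)
    raise TypeError(f"cannot turn {type(obj).__name__} into a Step")


passthrough = Step(lambda value: value)
fill_prompt = Step(lambda d: "what is a pandas {topic}".format(**d))

toy_chain = {"topic": passthrough} | fill_prompt
result = toy_chain.invoke("AI")
```

Here `result` is the filled-in prompt string; in the real chain that prompt would then go on to the model and the output parser.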

## Conversational Retrieval Chain

In [42]:
from langchain.schema import format_document
from langchain.prompts.prompt import PromptTemplate

_template = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""
CONDENSE_QUESTION_PROMPT = PromptTemplate.from_template(_template)

In [43]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
ANSWER_PROMPT = ChatPromptTemplate.from_template(template)
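
The notebook stops after defining the two prompts, so the chain itself is still to be written. As a starting point, the sketch below shows in plain Python how the two templates would be filled: first the chat history and follow-up go into the condense prompt, then the retrieved text and the standalone question go into the answer prompt. The `Human:`/`Assistant:` labels and all helper names are assumptions, and the model and retriever calls are deliberately left out.

```python
CONDENSE_TEMPLATE = """Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

ANSWER_TEMPLATE = """Answer the question based only on the following context:
{context}

Question: {question}
"""


def format_chat_history(turns):
    """Turn (human, assistant) pairs into one text block for {chat_history}."""
    lines = []
    for human, assistant in turns:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {assistant}")
    return "\n".join(lines)


def build_condense_prompt(turns, question):
    """Step 1: ask the model to rewrite the follow-up as a standalone question."""
    return CONDENSE_TEMPLATE.format(
        chat_history=format_chat_history(turns), question=question
    )


def build_answer_prompt(doc_texts, question):
    """Step 2: answer the standalone question using only the retrieved text."""
    context = "\n\n".join(doc_texts)
    return ANSWER_TEMPLATE.format(context=context, question=question)


# Made-up conversation and documents, for illustration only.
turns = [("What is a series?", "A single labeled column.")]
condense_prompt = build_condense_prompt(turns, "And a data frame?")
answer_prompt = build_answer_prompt(["doc one", "doc two"], "What is a data frame?")
```

In a full chain, the model's reply to `condense_prompt` would be sent to `retriever`, and the page contents of the returned documents would take the place of the made-up `doc_texts`.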