# LangChain

Open-source development framework for LLM applications.

* API calls through LangChain:
   * Prompts: templates for building the inputs sent to the model
   * Models: the LLMs themselves
   * Output parsers: convert model output into a structured format

In [None]:
#!pip install openai

In [1]:
import os
import openai
openai.api_key = os.environ['OPENAI_API_KEY']

## Chat API: OpenAI

In [2]:
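def get_completion(prompt, model="gpt-3.5-turbo"):
    """Send a single user message and return the reply text.

    Uses openai.ChatCompletion, the interface of the pre-1.0 openai package.
    """
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # lowest temperature, to keep the output as repeatable as possible
    )
    return response.choices[0].message["content"]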


In [10]:
get_completion("What is 1+1?")

'1+1 equals 2.'

In [3]:
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

In [4]:
style = """American English \
in a calm and respectful tone
"""

In [5]:
prompt = f"""Translate the text \
that is delimited by triple backticks 
into a style that is {style}.
text: ```{customer_email}```
"""

print(prompt)

Translate the text that is delimited by triple backticks 
into a style that is American English in a calm and respectful tone
.
text: ```
Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!
```



In [6]:
response = get_completion(prompt)
response

'I am quite frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie! To add to my frustration, the warranty does not cover the cost of cleaning up my kitchen. I kindly request your assistance at this moment, my friend.'

## Chat API: LangChain

The next cells make the same call through LangChain's abstraction over the OpenAI chat API.

In [None]:
#!pip install --upgrade langchain

In [5]:
from langchain.chat_models import ChatOpenAI

In [21]:
chat = ChatOpenAI(temperature=0.0)


In [4]:
template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

In [5]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template_string)

In [6]:
prompt_template.messages[0].prompt

PromptTemplate(input_variables=['style', 'text'], output_parser=None, partial_variables={}, template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n', template_format='f-string', validate_template=True)

In [7]:
prompt_template.messages[0].prompt.input_variables

['style', 'text']

In [8]:
customer_style = """American English \
in a calm and respectful tone
"""

In [9]:
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

In [10]:
customer_messages = prompt_template.format_messages(
                    style=customer_style,
                    text=customer_email)

In [11]:
print(type(customer_messages))
print(type(customer_messages[0]))

<class 'list'>
<class 'langchain.schema.messages.HumanMessage'>


In [12]:
print(customer_messages[0])

content="Translate the text that is delimited by triple backticks into a style that is American English in a calm and respectful tone\n. text: ```\nArrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!\n```\n" additional_kwargs={} example=False


In [13]:
# Call the LLM to translate to the style of the customer message
customer_response = chat(customer_messages)

In [14]:
customer_response

AIMessage(content="I'm really frustrated that my blender lid flew off and made a mess of my kitchen walls with smoothie! And to make things even worse, the warranty doesn't cover the cost of cleaning up my kitchen. I could really use your help right now, my friend!", additional_kwargs={}, example=False)

In [15]:
service_reply = """Hey there customer, \
the warranty does not cover \
cleaning expenses for your kitchen \
because it's your fault that \
you misused your blender \
by forgetting to put the lid on before \
starting the blender. \
Tough luck! See ya!
"""

In [16]:
service_style_pirate = """\
a polite tone \
that speaks in English Pirate\
"""

In [17]:
service_messages = prompt_template.format_messages(
    style=service_style_pirate,
    text=service_reply)

print(service_messages[0].content)

Translate the text that is delimited by triple backticks into a style that is a polite tone that speaks in English Pirate. text: ```Hey there customer, the warranty does not cover cleaning expenses for your kitchen because it's your fault that you misused your blender by forgetting to put the lid on before starting the blender. Tough luck! See ya!
```



In [19]:
service_response = chat(service_messages)
print(service_response.content)

Ahoy there, matey! I regret to inform ye that the warranty be not coverin' the costs o' cleanin' yer galley, as 'tis yer own fault fer misusin' yer blender by forgettin' to secure the lid afore startin' it. Aye, tough luck, me heartie! Fare thee well!


Prompt templates are a useful abstraction: prompts can be long and detailed, and a template lets you reuse them with different inputs. LangChain also provides ready-made prompts for common operations.
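As a rough mental model (not LangChain's actual implementation), a prompt template is a format string plus the list of variable names it expects. The sketch below uses only the standard library; `SimplePromptTemplate` is a hypothetical name introduced for illustration.

```python
from string import Formatter


class SimplePromptTemplate:
    """Hypothetical stand-in for a prompt template: a format string plus its variables."""

    def __init__(self, template: str):
        self.template = template
        # Formatter().parse yields (literal, field_name, format_spec, conversion) tuples;
        # the non-empty field names are the variables the template expects.
        self.input_variables = sorted(
            {name for _, name, _, _ in Formatter().parse(template) if name}
        )

    def format(self, **kwargs: str) -> str:
        # Fail early with a clear message if a variable was not supplied.
        missing = [v for v in self.input_variables if v not in kwargs]
        if missing:
            raise KeyError(f"missing template variables: {missing}")
        return self.template.format(**kwargs)


template = SimplePromptTemplate(
    "Translate the text that is delimited by triple backticks "
    "into a style that is {style}. text: {text}"
)
print(template.input_variables)
print(template.format(style="a polite tone", text="Tough luck! See ya!"))
```

Keeping the variable list next to the string is what lets a template be validated before any model call is made.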

## Output Parsers

An output parser defines the structure we want the LLM output to have, for example:

In [1]:
{
  "gift": False,
  "delivery_days": 5,
  "price_value": "pretty affordable!"
}

{'gift': False, 'delivery_days': 5, 'price_value': 'pretty affordable!'}

In [2]:
customer_review = """\
This leaf blower is pretty amazing.  It has four settings:\
candle blower, gentle breeze, windy city, and tornado. \
It arrived in two days, just in time for my wife's \
anniversary present. \
I think my wife liked it so much she was speechless. \
So far I've been the only one using it, and I've been \
using it every other morning to clear the leaves on our lawn. \
It's slightly more expensive than the other leaf blowers \
out there, but I think it's worth it for the extra features.
"""

review_template = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

Format the output as JSON with the following keys:
gift
delivery_days
price_value

text: {text}
"""

In [3]:
from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(review_template)
print(prompt_template)

input_variables=['text'] output_parser=None partial_variables={} messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['text'], output_parser=None, partial_variables={}, template='For the following text, extract the following information:\n\ngift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.\n\ndelivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.\n\nprice_value: Extract any sentences about the value or price,and output them as a comma separated Python list.\n\nFormat the output as JSON with the following keys:\ngift\ndelivery_days\nprice_value\n\ntext: {text}\n', template_format='f-string', validate_template=True), additional_kwargs={})]


In [6]:
messages = prompt_template.format_messages(text=customer_review)
chat = ChatOpenAI(temperature=0.0)
response = chat(messages)
print(response.content)

{
  "gift": false,
  "delivery_days": 2,
  "price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}


In [7]:
type(response.content)

str

*Parse the LLM output string into a Python dictionary*

In [8]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

In [10]:
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")

response_schemas = [gift_schema, 
                    delivery_days_schema,
                    price_value_schema]

In [11]:
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
output_parser

StructuredOutputParser(response_schemas=[ResponseSchema(name='gift', description='Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.', type='string'), ResponseSchema(name='delivery_days', description='How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.', type='string'), ResponseSchema(name='price_value', description='Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.', type='string')])

In [12]:
format_instructions = output_parser.get_format_instructions()
format_instructions

'The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":\n\n```json\n{\n\t"gift": string  // Was the item purchased                             as a gift for someone else?                              Answer True if yes,                             False if not or unknown.\n\t"delivery_days": string  // How many days                                      did it take for the product                                      to arrive? If this                                       information is not found,                                      output -1.\n\t"price_value": string  // Extract any                                    sentences about the value or                                     price, and output them as a                                     comma separated Python list.\n}\n```'

In [15]:
review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product\
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,\
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

prompt = ChatPromptTemplate.from_template(template=review_template_2)

messages = prompt.format_messages(text=customer_review, 
                                format_instructions=format_instructions)

In [16]:
print(messages[0].content)

For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the productto arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,and output them as a comma separated Python list.

text: This leaf blower is pretty amazing.  It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.


The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```

In [17]:
response = chat(messages)
print(response.content)

```json
{
	"gift": false,
	"delivery_days": "2",
	"price_value": "It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."
}
```


In [18]:
output_dict = output_parser.parse(response.content)

In [19]:
type(output_dict)

dict

In [20]:
output_dict.get('delivery_days')

'2'
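Conceptually, the parsing step strips the Markdown fence from the reply and loads the remaining JSON. The sketch below is a simplified standard-library approximation, not the code of `StructuredOutputParser`; `parse_fenced_json` is a hypothetical helper. Every schema above was declared with type `string`, so `delivery_days` comes back as `'2'` and has to be converted explicitly.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built programmatically to keep this example readable


def parse_fenced_json(text: str) -> dict:
    """Extract the first json-fenced block from text and load it as a dict."""
    pattern = re.escape(FENCE) + r"json\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no json fenced block found")
    return json.loads(match.group(1))


# Reply text copied from the model output shown above.
llm_output = (
    FENCE + "json\n"
    "{\n"
    '  "gift": false,\n'
    '  "delivery_days": "2",\n'
    '  "price_value": "It\'s slightly more expensive than the other leaf blowers '
    'out there, but I think it\'s worth it for the extra features."\n'
    "}\n" + FENCE
)

parsed = parse_fenced_json(llm_output)
delivery_days = int(parsed["delivery_days"])  # the value arrives as a string
print(type(parsed).__name__, parsed["gift"], delivery_days)
```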

## LLMChain

## Sequential Chains

### SimpleSequentialChain

### SequentialChain

## Router Chain

## Document Loading

In [19]:
from langchain.document_loaders import PyPDFLoader

In [20]:
loader_one = PyPDFLoader("test.pdf")

In [21]:
pages = loader_one.load()

In [22]:
len(pages)

3

In [23]:
page = pages[0]

In [24]:
page.metadata

{'source': 'test.pdf', 'page': 0}

In [25]:
page.page_content[500:1500]

'itization, multiple ways of attacking\nthrough code have emerged. Allowing cybercriminals to access\nconfidential information, impersonate identities, steal money,\naccess databases, among others. Our project approaches the 3 of\n10 vulnerabilities identified by OWASP as the most recognized\nand usual in web applications, to identify and evaluate them at\nthe JavaScript code level. For this, a machine learning model was\nproposed, detecting the frequency of occurrence of a word with\nthe identified vulnerability; resulting in a model with an accuracy\nof 89%. Finally, an extension was implemented in Visual Studio\nCode to read in real time the code that the person is writing in\norder to identify which vulnerabilities it has.\nIndex Terms —OWASP, Front-End Vulnerabilities, Machine\nLearning, Artificial Intelligence.\nI. INTRODUCTION\nFront-End developers do not have enough tools that en-\ncompass the complex diversity of cybersecurity challenges.\nPart of that inadequacy comes from cu

In [1]:
from langchain.document_loaders.generic import GenericLoader
from langchain.document_loaders.parsers import OpenAIWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

Note that OpenAI API requests are rate-limited and billed. The next request will cost USD 0.76.

In [8]:
url = "https://youtu.be/_kr-XOeZmW4"
loader = GenericLoader(
    YoutubeAudioLoader([url],"."),
    OpenAIWhisperParser())
docs = loader.load()


[youtube] Extracting URL: https://youtu.be/_kr-XOeZmW4
[youtube] _kr-XOeZmW4: Downloading webpage
[youtube] _kr-XOeZmW4: Downloading ios player API JSON
[youtube] _kr-XOeZmW4: Downloading android player API JSON
[youtube] _kr-XOeZmW4: Downloading m3u8 information
[youtube] _kr-XOeZmW4: Downloading MPD manifest
[info] _kr-XOeZmW4: Downloading 1 format(s): 140
[download] ytcracker - enter my world (produced by amplitude problem).m4a has already been downloaded
[download] 100% of    2.98MiB
[ExtractAudio] Not converting audio ytcracker - enter my world (produced by amplitude problem).m4a; file is already in target format m4a
Transcribing part 1!


In [17]:
docs[0].page_content

"BASS BOOSTED MUSIC Look around what do you see, couple handfuls of hackers with their black hat mentality Most of us got records cause we feel compelled to flex All compulsion for these systems that we probably born to wreck My respect to my hackahgotten ex Hacking our election process like a boss bitch This type of terrorism really look terrible Ain't no bloodshed, nobody countin' for some barrels Had a few here, so check a few there Load up all your Twitter bots, I'll teach a crew there They're the marmade, foldin' up the shop origami Diamond, betting on the horses, betting on the jockeys Unh, unh, unh, unh Stick a fork in a book, a process, stick a fork in em Database pushed right to the open Google dork in em, stick a fork in a book, a process, stick a fork in em End of my world, there's no bake as it look There's dark artists doing modesty, no page in the book Can teach you anything, we learn it, no mistake and we cook And we be chopping up the blades like a fleet of Chinooks And

In [None]:
from langchain.document_loaders import NotionDirectoryLoader
loader = NotionDirectoryLoader("document")
docs = loader.load()
docs[0].page_content[:200]

## Document Splitting

In [28]:
from langchain.text_splitter import CharacterTextSplitter
text_splitter = CharacterTextSplitter(
    separator = "\n",
    chunk_size = 1000,
    chunk_overlap = 150,
    length_function=len
)
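To see what `chunk_size` and `chunk_overlap` mean, here is a simplified fixed-window sketch in plain Python. It is not the algorithm `CharacterTextSplitter` uses (the library first splits on the separator and then merges pieces), and `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.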

In [43]:
from langchain.text_splitter import TokenTextSplitter

text_splitter = TokenTextSplitter(chunk_size=1, chunk_overlap=0)

In [47]:
text_splitter.split_text(docs[0].page_content)[:20]

['B',
 'ASS',
 ' B',
 'OO',
 'ST',
 'ED',
 ' MUS',
 'IC',
 ' Look',
 ' around',
 ' what',
 ' do',
 ' you',
 ' see',
 ',',
 ' couple',
 ' handful',
 's',
 ' of',
 ' hackers']

In [44]:
docs_splits = text_splitter.split_documents(docs)

In [49]:
text_splitter.split_documents(pages)[:4]

[Document(page_content='C', metadata={'source': 'test.pdf', 'page': 0}),
 Document(page_content='ognitive', metadata={'source': 'test.pdf', 'page': 0}),
 Document(page_content=' Solution', metadata={'source': 'test.pdf', 'page': 0}),
 Document(page_content=' for', metadata={'source': 'test.pdf', 'page': 0})]

In [37]:
print(len(docs))
print(len(docs_splits))

1
1


In [8]:
from langchain.embeddings.openai import OpenAIEmbeddings
import os
import openai
from langchain.document_loaders import PyPDFLoader

In [10]:
loaders = [
    PyPDFLoader("test.pdf"),
    PyPDFLoader("semi-automated.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [13]:
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

In [14]:
splits = text_splitter.split_documents(docs)

In [15]:
len(splits)

88

### Embeddings

In [19]:
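# key_g is not defined in the cells shown; reading it from the environment is an assumption
key_g = os.environ['OPENAI_API_KEY']
openai.api_key = key_g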

In [24]:
embedding = OpenAIEmbeddings(openai_api_key=key_g)

In [25]:
sentence1 = "i like dogs"
sentence2 = "i like canines"
sentence3 = "the weather is ugly outside"

embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)

In [26]:
import numpy as np

In [27]:
np.dot(embedding1, embedding2)

0.963208056126645

In [28]:
np.dot(embedding2, embedding3)

0.7596001131741936

In [29]:
np.dot(embedding1, embedding3)

0.7709997651294672
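A dot product equals cosine similarity only when both vectors have unit length. OpenAI embeddings are, to my knowledge, returned normalized, which would explain why `np.dot` is enough above. For vectors of arbitrary length, a minimal sketch with made-up three-dimensional toy vectors (not real embeddings) looks like this:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors chosen by hand for illustration only.
dogs = [0.9, 0.1, 0.0]
canines = [0.8, 0.2, 0.0]
weather = [0.0, 0.1, 0.9]

print(cosine_similarity(dogs, canines))
print(cosine_similarity(dogs, weather))
```

As with the real embeddings, the two related vectors score much closer to 1 than the unrelated pair.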

## Vectorstores

## SQL agents

In [1]:
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain.llms.openai import OpenAI
from langchain.agents import AgentExecutor
from langchain.agents.agent_types import AgentType
from langchain.chat_models import ChatOpenAI

In [8]:
!pip install pandas




In [9]:
import sqlite3
import pandas as pd
# Connect to the SQLite database
conn = sqlite3.connect("FPA_FOD_20170508.sqlite")
# Use pandas to query a table
fires = pd.read_sql_query("SELECT * FROM Fires", conn)
fires.head()

Unnamed: 0,OBJECTID,FOD_ID,FPA_ID,SOURCE_SYSTEM_TYPE,SOURCE_SYSTEM,NWCG_REPORTING_AGENCY,NWCG_REPORTING_UNIT_ID,NWCG_REPORTING_UNIT_NAME,SOURCE_REPORTING_UNIT,SOURCE_REPORTING_UNIT_NAME,...,FIRE_SIZE_CLASS,LATITUDE,LONGITUDE,OWNER_CODE,OWNER_DESCR,STATE,COUNTY,FIPS_CODE,FIPS_NAME,Shape
0,1,1,FS-1418826,FED,FS-FIRESTAT,FS,USCAPNF,Plumas National Forest,511,Plumas National Forest,...,A,40.036944,-121.005833,5.0,USFS,CA,63,63,Plumas,b'\x00\x01\xad\x10\x00\x00\xe8d\xc2\x92_@^\xc0...
1,2,2,FS-1418827,FED,FS-FIRESTAT,FS,USCAENF,Eldorado National Forest,503,Eldorado National Forest,...,A,38.933056,-120.404444,5.0,USFS,CA,61,61,Placer,b'\x00\x01\xad\x10\x00\x00T\xb6\xeej\xe2\x19^\...
2,3,3,FS-1418835,FED,FS-FIRESTAT,FS,USCAENF,Eldorado National Forest,503,Eldorado National Forest,...,A,38.984167,-120.735556,13.0,STATE OR PRIVATE,CA,17,17,El Dorado,b'\x00\x01\xad\x10\x00\x00\xd0\xa5\xa0W\x13/^\...
3,4,4,FS-1418845,FED,FS-FIRESTAT,FS,USCAENF,Eldorado National Forest,503,Eldorado National Forest,...,A,38.559167,-119.913333,5.0,USFS,CA,3,3,Alpine,b'\x00\x01\xad\x10\x00\x00\x94\xac\xa3\rt\xfa]...
4,5,5,FS-1418847,FED,FS-FIRESTAT,FS,USCAENF,Eldorado National Forest,503,Eldorado National Forest,...,A,38.559167,-119.933056,5.0,USFS,CA,3,3,Alpine,b'\x00\x01\xad\x10\x00\x00@\xe3\xaa.\xb7\xfb]\...
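`SELECT * FROM Fires` loads the whole table into memory before `head()` trims it. When the goal is only to inspect a database, listing the tables and sampling a few rows with `LIMIT` is lighter. The sketch below runs against an in-memory database with three invented rows so it works without the wildfire file; only the table and column names are borrowed from the output above.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Fires (FOD_ID INTEGER, STATE TEXT, FIRE_SIZE_CLASS TEXT)")
conn.executemany(
    "INSERT INTO Fires VALUES (?, ?, ?)",
    [(1, "CA", "A"), (2, "CA", "A"), (3, "CA", "A")],
)

# sqlite_master is SQLite's built-in catalog of tables, indexes, and views.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
# LIMIT makes SQLite stop after two rows instead of returning the full table.
sample = conn.execute("SELECT * FROM Fires LIMIT 2").fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM Fires").fetchone()[0]
conn.close()

print(tables, sample, row_count)
```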


In [13]:
!dir

 Volume in drive D is Kuky
 Volume Serial Number is 3A98-E69F

 Directory of D:\Github\Data-Science\openai

14/07/2023  03:37 p. m.    <DIR>          .
06/07/2023  09:13 a. m.    <DIR>          ..
14/07/2023  10:55 a. m.    <DIR>          .ipynb_checkpoints
06/07/2023  03:19 p. m.           253.818 03_vectorstores_and_embeddings.ipynb
06/07/2023  04:30 p. m.           252.650 05_question_answering.ipynb
14/07/2023  09:38 a. m.             6.993 BugHunter.ipynb
14/07/2023  03:19 p. m.       795.785.216 FPA_FOD_20170508.sqlite
14/07/2023  03:37 p. m.            79.214 LangChain.ipynb
14/07/2023  11:12 a. m.           210.202 mysqlsampledatabase.sql
12/09/2022  11:53 a. m.         1.486.219 semi-automated.pdf
02/05/2023  06:36 p. m.           289.945 test.pdf
14/04/2019  10:05 p. m.         3.125.075 ytcracker - enter my world (produced by amplitude problem).m4a
               9 File(s)    801.489.332 bytes
               3 Dir(s)  210.183.426.048 bytes free


In [None]:
# Connection string for a local SQL Server Express instance (not used below):
# Server=localhost\SQLEXPRESS;Database=master;Trusted_Connection=True;
# MSI\Usuario

In [14]:
import sqlalchemy.engine.url as url

# On Windows, an absolute SQLite path takes three slashes: sqlite:///D:/...
# The database file sits in the openai folder shown in the directory listing.
url.make_url('sqlite:///D:/Github/Data-Science/openai/FPA_FOD_20170508.sqlite')

sqlite:///D:/Github/Data-Science/openai/FPA_FOD_20170508.sqlite

In [21]:
# An earlier attempt used four slashes and left out the openai folder, which raised
# sqlite3.OperationalError: unable to open database file.
db = SQLDatabase.from_uri(url.make_url('sqlite:///D:/Github/Data-Science/openai/FPA_FOD_20170508.sqlite'))
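SQLAlchemy expects `sqlite:///` followed by the file path, so an absolute Windows path ends up with three slashes in total, while an absolute POSIX path (which itself starts with `/`) ends up with four. A minimal sketch of a helper that builds the URI from a path object follows; `sqlite_uri` is a hypothetical name and the POSIX path is invented for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def sqlite_uri(path) -> str:
    """Build a SQLAlchemy SQLite URI from an absolute pure path object."""
    # as_posix() converts backslashes to forward slashes and keeps the drive letter.
    return "sqlite:///" + path.as_posix()


windows_uri = sqlite_uri(
    PureWindowsPath(r"D:\Github\Data-Science\openai\FPA_FOD_20170508.sqlite")
)
posix_uri = sqlite_uri(PurePosixPath("/data/FPA_FOD_20170508.sqlite"))  # invented path

print(windows_uri)
print(posix_uri)
```

The resulting string can be passed to `SQLDatabase.from_uri`, assuming the file really exists at that location.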

In [2]:
help(SQLDatabase)

Help on class SQLDatabase in module langchain.sql_database:

class SQLDatabase(builtins.object)
 |  SQLDatabase(engine: 'Engine', schema: 'Optional[str]' = None, metadata: 'Optional[MetaData]' = None, ignore_tables: 'Optional[List[str]]' = None, include_tables: 'Optional[List[str]]' = None, sample_rows_in_table_info: 'int' = 3, indexes_in_table_info: 'bool' = False, custom_table_info: 'Optional[dict]' = None, view_support: 'bool' = False, max_string_length: 'int' = 300)
 |  
 |  SQLAlchemy wrapper around a database.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, engine: 'Engine', schema: 'Optional[str]' = None, metadata: 'Optional[MetaData]' = None, ignore_tables: 'Optional[List[str]]' = None, include_tables: 'Optional[List[str]]' = None, sample_rows_in_table_info: 'int' = 3, indexes_in_table_info: 'bool' = False, custom_table_info: 'Optional[dict]' = None, view_support: 'bool' = False, max_string_length: 'int' = 300)
 |      Create engine from database URI.
 |  
 |  get_table_

In [None]:
# Sample MySQL dump available in the working directory: mysqlsampledatabase.sql

+ https://js.langchain.com/docs/modules/chains/other_chains/sql
+ https://blog.futuresmart.ai/langchain-sql-agents-openai-llms-query-database-using-natural-language
+ https://python.langchain.com/docs/modules/chains/popular/sqlite