Using `load_dotenv` and `find_dotenv`, we load the Pinecone and OpenAI API keys for our application from a `.env` file.

In [1]:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())


True
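
The `True` above only indicates that `load_dotenv` loaded something, not that every key the notebook needs is present. A minimal sketch of such a check follows. The helper `missing_keys` is hypothetical, `OPENAI_API_KEY` is assumed to be the variable name the OpenAI client reads, and `PINECONE_API_KEY` and `PINECONE_ENV` are the names this notebook reads later.

```python
import os

def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]

# Variable names are assumptions based on what this notebook reads later.
required = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV"]
print("Missing keys:", missing_keys(required))
```

Reporting only the names keeps the secret values out of the notebook output.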

## LLM Wrappers

1. Importing the OpenAI wrapper
2. Instantiating the `text-davinci-003` completion model
3. Asking a question

In [3]:
from langchain.llms import OpenAI
llm = OpenAI(model_name= "text-davinci-003")
llm("Explain large language models in a few words.")

'\n\nLarge language models are deep neural networks trained on large corpora of text to generate context-aware predictions about words, phrases, and sentences. They are used to build natural language processing (NLP) systems that can understand and generate human-like language.'

## Chat Model

We import three message types from the schema:
1. `AIMessage`: a reply produced by the model
2. `HumanMessage`: the message written by the user
3. `SystemMessage`: instructions that configure the model's behavior, like the system field in the OpenAI playground

In [4]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage,
)
from langchain.chat_models import ChatOpenAI
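
The three message types correspond to the speaker roles in a chat conversation. The sketch below illustrates that idea with plain Python objects. `Message` and `to_payload` are hypothetical stand-ins, not LangChain classes, and the role names `system`, `user`, and `assistant` are assumed to follow the OpenAI chat API convention.

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str      # "system", "user", or "assistant"
    content: str

def to_payload(messages):
    """Convert Message objects into role/content dictionaries."""
    return [{"role": m.role, "content": m.content} for m in messages]

conversation = [
    Message("system", "You are an expert data scientist."),
    Message("user", "What is overfitting?"),
]
payload = to_payload(conversation)
print(payload)
```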

To use the chat model, we put the system message and the human message in a list and pass that list to the chat model as input.

In [5]:
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content= "you are an expert data scientist"),
    HumanMessage(content= "Write a python script to train a neural network on simulated data")
]
response = chat(messages)

In [6]:
print(response.content, end="\n")

Sure, here's an example script that trains a simple neural network on simulated data using the Keras library:

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Generate simulated data
x = np.random.rand(1000, 10)
y = np.sum(x, axis=1)

# Define the neural network architecture
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='linear'))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam')

# Train the model
model.fit(x, y, epochs=100, batch_size=32, validation_split=0.2)
```

In this script, we first generate a simulated dataset with 1000 samples, where each sample has 10 features and the target variable is the sum of the features. We then define a simple neural network architecture with one hidden layer of 32 neurons and an output layer with one neuron. We compile the model with the mean squared error loss function and the Adam optimizer, and then train th

## Prompt Templates

Prompts are the text that we send to our language model.
Most of the time these prompts are not static but dynamic, and LangChain provides Prompt Templates for this.

In [7]:
from langchain import PromptTemplate

template = """
You are an expert data scientist with the expertise in building deep learning models.
Explain the concept of {concept} in couple of lines.
"""

prompt = PromptTemplate(
    input_variables= ["concept"],
    template= template,
)

In [8]:
prompt

PromptTemplate(input_variables=['concept'], output_parser=None, partial_variables={}, template='\nYou are an expert data scientist with the expertise in building deep learning models.\nExplain the concept of {concept} in couple of lines.\n', template_format='f-string', validate_template=True)

Now we take a piece of text as the user input and use it to format the prompt sent to the language model.

As the user input changes, the prompt changes with it, and so does the answer.

In [9]:
llm(prompt.format(concept= "batch normalization"))

'\nBatch Normalization is a technique used to reduce internal covariate shift in deep learning models. It normalizes the inputs to a layer for each mini-batch and helps the model learn faster and more accurately. It also helps keep the activations in a reasonable range, which can reduce the chances of overfitting and improve generalization.'

In [10]:
llm(prompt.format(concept= "Grid Search CV"))

'\nGrid Search CV is a technique used in hyperparameter tuning that allows a user to define a grid of hyperparameter values, and exhaustively search for the best performing model by evaluating all combinations of these hyperparameters. It works by taking a grid of parameters and evaluating a model for each combination of parameters, and then selecting the model that performs the best.'

In [11]:
llm(prompt.format(concept= "encoders"))

'\nEncoders are used to transform data into a form that is more suitable for machine learning models. They are used to compress the amount of data required to represent a given input and to reduce the dimensionality of the input space. Encoders help to convert input data of a certain type into numerical representations which can be used as input to deep learning models.'
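
To see the text that actually reaches the model, the substitution can be approximated with plain `str.format`, since the template above uses the `f-string` format. This is only a sketch: `render` is a hypothetical helper and it skips the validation that `PromptTemplate` performs.

```python
template = """
You are an expert data scientist with the expertise in building deep learning models.
Explain the concept of {concept} in couple of lines.
"""

def render(template, **variables):
    """Fill the {placeholders} in `template` using str.format."""
    return template.format(**variables)

# The same template yields a different prompt for each input.
for concept in ["batch normalization", "encoders"]:
    print(render(template, concept=concept))
```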

## Chains

A chain combines a language model and a prompt template into a single interface that takes input from the user and returns an answer from the language model. It works like a composite function in which the inner function is the prompt template and the outer function is the language model.

We can also build sequential chains where we have one chain returning an output and then a second chain taking the output from the first chain as an input.
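
The composite-function view can be sketched in plain Python. Everything here is a hypothetical stand-in: `fake_llm` only echoes its prompt instead of calling a model, and `toy_chain` and `sequential` are illustrative names, not LangChain APIs.

```python
def make_prompt(concept):
    """Inner function: plays the role of the prompt template."""
    return f"Explain the concept of {concept} in a couple of lines."

def fake_llm(prompt):
    """Outer function: a stand-in that echoes instead of calling a model."""
    return f"[model answer to: {prompt}]"

def toy_chain(concept):
    """A chain as a composite function: llm(prompt(input))."""
    return fake_llm(make_prompt(concept))

def simplify(text):
    """Second stand-in step that consumes the first chain's output."""
    return fake_llm(f"Explain to a 5 year old: {text}")

def sequential(*steps):
    """Feed each step's output into the next, like a sequential chain."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

combined = sequential(toy_chain, simplify)
print(combined("encoders"))
```

Only the first step receives the user's input. Every later step sees just the previous step's output.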

In [12]:
from langchain.chains import LLMChain
chain = LLMChain(llm= llm, prompt= prompt)

#specify the input variables
print(chain.run("encoders"))


Encoders are a type of neural network architecture used to reduce the dimensionality of input data by extracting important features from the data. They are primarily used for representation learning, where the goal is to compress the input data into a more efficient representation that can be used for downstream tasks such as classification and prediction.


In [21]:
second_prompt = PromptTemplate(
    input_variables= ["ml_concept"],
    template= "Turn the concept description of {ml_concept} and explain it to a 5 year old in 500 words.",
)
chain_two = LLMChain(llm=llm, prompt=second_prompt)

Combining both the chains to get the explanation

In [22]:
from langchain.chains import SimpleSequentialChain
combined_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

#running the chain specifying the input of the first chain only
explanation = combined_chain.run("encoders")
print(explanation)



> Entering new SimpleSequentialChain chain...

An encoder is a type of deep learning model that compresses data into a lower dimensional representation, usually for use in other downstream models. It is commonly used in natural language processing tasks, such as translating text or speech into a numerical representation that can be fed into a neural network.

An encoder is a special type of computer program that can take information from one place and make it smaller and easier to use. For example, it can take a big block of text and turn it into a series of numbers or symbols that are easier for a computer to understand.

Let's say you had a book full of words. An encoder can take those words and turn them into a series of numbers that represent the words. So instead of having to remember the words, the computer can just remember the numbers. It's like having a secret code that only the computer can understand.

Let's say you wanted to send a m

## Embeddings and VectorStores

We take the explanation and split it into chunks, since the chunks will later be embedded and stored in a vector store.

In [23]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size= 100,
    chunk_overlap= 0,
)

texts = text_splitter.create_documents([explanation])

In [24]:
texts

[Document(page_content='An encoder is a special type of computer program that can take information from one place and make', metadata={}),
 Document(page_content='it smaller and easier to use. For example, it can take a big block of text and turn it into a series', metadata={}),
 Document(page_content='of numbers or symbols that are easier for a computer to understand.', metadata={}),
 Document(page_content="Let's say you had a book full of words. An encoder can take those words and turn them into a series", metadata={}),
 Document(page_content='of numbers that represent the words. So instead of having to remember the words, the computer can', metadata={}),
 Document(page_content="just remember the numbers. It's like having a secret code that only the computer can understand.", metadata={}),
 Document(page_content="Let's say you wanted to send a message to your friend. An encoder can take the message and make it", metadata={}),
 Document(page_content='into a secret code that only your 

In [25]:
texts[0].page_content

'An encoder is a special type of computer program that can take information from one place and make'

In [26]:
texts[1].page_content

'it smaller and easier to use. For example, it can take a big block of text and turn it into a series'
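
The idea behind size-limited chunking can be sketched with a greedy word packer. This is a simplified stand-in, not the algorithm `RecursiveCharacterTextSplitter` uses, and `split_into_chunks` is a hypothetical helper.

```python
def split_into_chunks(text, chunk_size=100):
    """Greedily pack whole words into chunks of up to `chunk_size` characters.

    A single word longer than `chunk_size` is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = (
    "An encoder is a special type of computer program that can take "
    "information from one place and make it smaller and easier to use."
)
for chunk in split_into_chunks(sample, chunk_size=100):
    print(len(chunk), repr(chunk))
```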

The vector representation of a piece of text is called an embedding.

In [27]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model_name = 'ada')

In [29]:
query_result = embeddings.embed_query(texts[0].page_content)
query_result

[-0.03485928179961753,
 0.05722920349882175,
 0.00021713228966146255,
 0.022472629817635873,
 0.027792933076667206,
 0.0028065110708098222,
 0.020439001249579196,
 0.0118114829797385,
 0.006260085988574631,
 0.01108225179984785,
 -0.04026175006327804,
 -0.023787299694025734,
 0.035927449211539106,
 0.07522373811868205,
 0.015447365459645325,
 -0.030956356535407722,
 0.03609178294608784,
 -0.027176680640786863,
 0.012273671375326164,
 -0.003959414636214966,
 0.013105610673648476,
 0.0008441358990276038,
 0.07279981832805603,
 0.0013788622175790024,
 0.050902356581340714,
 0.01047113653940202,
 0.029333560441077686,
 0.030278480346055497,
 -0.005654105575256828,
 -0.03878274645233945,
 -0.01392727934639214,
 0.007405286351965141,
 0.03705724410222294,
 0.022698589168301676,
 0.023581881594517223,
 0.01032734405601058,
 0.03810487026298722,
 -0.00011474470627358384,
 -0.042069421143314106,
 -0.033832196890010546,
 0.003617908652313542,
 -0.038351372727455514,
 0.015334385784312421,
 0.004

We now have a vector representation of the document. Next, we store it in Pinecone, which is a vector store.

In [30]:
import os
import pinecone
from langchain.vectorstores import Pinecone

#initialize pinecone
pinecone.init(api_key= os.getenv("PINECONE_API_KEY"), environment= os.getenv("PINECONE_ENV"))



In [36]:
index_name = 'langchain-quickstart'
search = Pinecone.from_documents(texts, embeddings, index_name= index_name)

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'date': 'Mon, 26 Jun 2023 11:22:55 GMT', 'x-envoy-upstream-service-time': '1', 'content-length': '104', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Vector dimension 1024 does not match the dimension of the index 1500","details":[]}
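
The request is rejected because the embeddings have 1024 dimensions while the `langchain-quickstart` index was created with dimension 1500. The two must match, so either the index has to be recreated with the embedding model's dimension or an embedding model with a matching output size has to be used. A local check before uploading can surface the mismatch earlier. The sketch below uses a hypothetical helper, `check_dimension`, and a toy vector in place of a real embedding.

```python
def check_dimension(vector, index_dimension):
    """Raise ValueError if the vector length differs from the index dimension."""
    if len(vector) != index_dimension:
        raise ValueError(
            f"Vector dimension {len(vector)} does not match "
            f"the dimension of the index {index_dimension}"
        )

# Toy stand-in for an embedding; a real one would come from embed_query.
toy_vector = [0.0] * 1024

try:
    check_dimension(toy_vector, 1500)
except ValueError as err:
    print(err)
```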


In [None]:
query = "What is different about encoder?"
result = search.similarity_search(query)

In [None]:
result
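
Conceptually, a similarity search embeds the query and ranks the stored chunks by how close their vectors are to the query vector. The sketch below assumes cosine similarity as the metric (the metric actually used depends on how the index was configured) and uses made-up three-dimensional vectors. `cosine_similarity` and `top_k` are hypothetical helpers, not Pinecone or LangChain functions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy 3-dimensional vectors, made up for illustration only.
stored = [
    ("chunk about encoders", [0.9, 0.1, 0.0]),
    ("chunk about secret codes", [0.2, 0.8, 0.1]),
    ("chunk about messages", [0.0, 0.2, 0.9]),
]
query_vec = [1.0, 0.0, 0.0]
for text, score in top_k(query_vec, stored):
    print(f"{score:.3f}  {text}")
```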

## Agents

On the ChatGPT plug-ins page from OpenAI you will notice a Python code interpreter. We will build something similar in LangChain.

In [None]:
from langchain.agents.agent_toolkits import create_python_agent
from langchain.tools.python.tool import PythonREPLTool
from langchain.python import PythonREPL
from langchain.llms.openai import OpenAI

In [None]:
# initiating a python agent executor

agent_executor = create_python_agent(
    llm = OpenAI(temperature=0, max_tokens=1000),
    tool = PythonREPLTool(),
    verbose = True
)

In [None]:
# allowing language model to run python code
agent_executor.run("Find the roots (zeros) of the quadratic function 3 * x**2 + 2 * x - 1")
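
For reference, the kind of code an agent might write and run for this prompt is sketched below. It is not the agent's actual output, which the notebook does not show. `quadratic_roots` is a hypothetical helper that applies the quadratic formula.

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c, smallest first."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    sqrt_disc = math.sqrt(disc)
    r1 = (-b - sqrt_disc) / (2 * a)
    r2 = (-b + sqrt_disc) / (2 * a)
    return tuple(sorted((r1, r2)))

roots = quadratic_roots(3, 2, -1)
print(roots)
```

For 3 * x**2 + 2 * x - 1 the discriminant is 16, so the roots are -1 and 1/3.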