In [None]:
from dotenv import load_dotenv

In [None]:
load_dotenv()

## Models

### Vanilla Large Language Models (LLMs)

LLMs are primarily designed to generate contextually relevant text, with a focus on text generation, completion, and language understanding. These models are pre-trained on a diverse corpus, which lets them capture general linguistic patterns. They are widely used for downstream tasks such as translation and summarization, and as a starting point for task- or domain-specific fine-tuning.

Some prominent examples:
- GPT-3 (deprecated)
- llama, llama-2, llama-3

#### Loading open-source base models using Ollama
Using Ollama, one can set up a local server for quantized models.

References:
1. [langchain](https://python.langchain.com/v0.1/docs/modules/model_io/)
2. [ollama github](https://github.com/ollama/ollama?tab=readme-ov-file)
3. [ollama model library](https://ollama.com/library)

In [None]:
!ollama pull "llama3:text"

In [None]:
from langchain_community.llms import Ollama

llama3 = Ollama(model="llama3:text")

In [None]:
print(llama3.invoke("What is the meaning of life in 10 words?"))

### Chat or Instruction tuned Models

Chat or instruction-tuned models are specifically designed to follow user instructions or hold a conversation with the user. They are LLMs that have been further fine-tuned on instruction or dialogue datasets. Their main focus is to understand the context of a user query and respond accordingly. They are widely used for question answering, chatbots, dialogue systems, etc.

Some prominent examples:
- GPT-3.5-turbo, GPT-4
- llama-chat models
- claude-2

In LangChain, a chat model is a language model that takes a list of chat messages as input and returns a chat message as output.
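
The difference in calling convention can be sketched in plain Python. This is only an illustration of the input and output shapes: the `Message` dataclass and the two `fake_*` functions are hypothetical stand-ins, not LangChain classes, and no model is called.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message: a role plus text content."""
    role: str
    content: str


def fake_llm(prompt: str) -> str:
    """Text-completion style: a string goes in, a string comes out."""
    return prompt + " ..."


def fake_chat_model(messages: List[Message]) -> Message:
    """Chat style: a list of messages goes in, one assistant message comes out."""
    last_human = [m for m in messages if m.role == "human"][-1]
    return Message(role="ai", content=f"Echo: {last_human.content}")


completion = fake_llm("Hello")
reply = fake_chat_model([Message("human", "Hello")])

print(type(completion).__name__)  # the completion is a plain string
print(reply.role, "->", reply.content)  # the reply is a structured message
```

The real chat models used in this notebook follow the same pattern: they receive a list of messages and return a message object whose text is read from `.content`.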

##### Passing a user message to the model through `HumanMessage`

In [None]:
from langchain_core.messages import HumanMessage
message = [HumanMessage("What is the meaning of life in 10 words?")]

#### OpenAI models

In [None]:
from langchain_openai import ChatOpenAI

In [None]:
chat_llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.0,
    max_tokens=1024
)

`invoke()` calls the model on a single input and returns the complete response.

In [None]:
print(chat_llm.invoke(message))

`stream()` streams back chunks of the response as they are generated.

In [None]:
for chunk in chat_llm.stream(message):
    print(chunk.content, end="", flush=True)
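
When the full text is also needed after streaming, the chunks can be accumulated while they are printed. The sketch below uses a hypothetical `fake_stream` generator and `FakeChunk` class in place of a real model, so it runs offline; it assumes only that each chunk exposes its text as `.content`, as in the loop above.

```python
from typing import Iterable, Iterator


class FakeChunk:
    """Hypothetical stand-in for a streamed chunk with a `.content` string."""

    def __init__(self, content: str) -> None:
        self.content = content


def fake_stream(text: str, size: int = 4) -> Iterator[FakeChunk]:
    """Yield the text in pieces of `size` characters, like a streaming reply."""
    for start in range(0, len(text), size):
        yield FakeChunk(text[start:start + size])


def collect(chunks: Iterable[FakeChunk]) -> str:
    """Print each chunk as it arrives and return the joined text."""
    parts = []
    for chunk in chunks:
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # finish the line once the stream is exhausted
    return "".join(parts)


full_text = collect(fake_stream("Streaming yields partial output."))
```

With a real model, the same `collect` loop could be fed the iterator returned by `stream()`.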

In [None]:
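llm_name = "gpt-4"
# The API key is read from the OPENAI_API_KEY environment variable,
# which load_dotenv() populated from the .env file at the top of the notebook.
chat_llm_gpt4 = ChatOpenAI(model=llm_name, temperature=0)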

In [None]:
print(chat_llm_gpt4.invoke(message))

P.S.: An LLM returns a plain string, while a chat model returns a message object whose text is available in `.content`.
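
Code that has to work with both kinds of model can normalize the two return types. The helper below is a minimal sketch: `to_text` and `FakeMessage` are hypothetical names introduced here, and the helper assumes that any non-string result exposes its text as `.content`.

```python
from typing import Any


def to_text(result: Any) -> str:
    """Return plain text from either a string or a message-like object."""
    if isinstance(result, str):
        return result
    # Assumption: message-like results expose their text as `.content`.
    return str(result.content)


class FakeMessage:
    """Hypothetical stand-in for the message a chat model returns."""

    def __init__(self, content: str) -> None:
        self.content = content


print(to_text("output of an LLM"))
print(to_text(FakeMessage("output of a chat model")))
```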

#### Loading open-source chat models using Ollama.

In [None]:
!ollama pull "llama3"

In [None]:
from langchain_community.chat_models import ChatOllama

# Ollama names its output-length limit `num_predict` rather than `max_tokens`.
llama3_chat = ChatOllama(
    model="llama3",
    temperature=0.0,
    num_predict=1024,
    top_k=10,
)

In [None]:
print(llama3_chat.invoke(message).content)