# LangChain Models

## What are Models in LangChain?

**Models** are the core building blocks of LangChain applications. They provide a **standardized interface** to interact with various Large Language Model (LLM) providers, allowing you to switch between different models and providers with minimal changes to your application code.

### Why Use LangChain Models?

1. **Provider Abstraction**: Write code once, use with any LLM provider (OpenAI, HuggingFace, Groq, Ollama, etc.)
2. **Consistent Interface**: All models use the same `.invoke()` method regardless of provider
3. **Easy Switching**: Change from one model to another with minimal code changes
4. **Composability**: Models integrate seamlessly with prompts, chains, and other LangChain components
5. **Structured Messages**: Chat models work with typed message objects (System, Human, AI) instead of raw strings

### Models in the LangChain Ecosystem

```
Input → Prompt Template → Model → Output Parser → Result
```

Models sit at the heart of every LangChain application, processing prompts and generating responses.
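The flow above can be sketched in plain Python. This is only an illustration of the data flow, not LangChain's API: `prompt_template`, `fake_model`, `output_parser`, and `run_pipeline` are hypothetical stand-ins, and `fake_model` returns a canned string instead of calling a real provider.

```python
from typing import Callable


def prompt_template(topic: str) -> str:
    """Fill a fixed template with the user's input."""
    return f"Explain {topic} in one sentence."


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; a real model would generate text."""
    return f"  [model answer to: {prompt}]  "


def output_parser(raw: str) -> str:
    """Clean up the raw model output."""
    return raw.strip()


def run_pipeline(user_input: str, model: Callable[[str], str] = fake_model) -> str:
    """Input -> Prompt Template -> Model -> Output Parser -> Result."""
    return output_parser(model(prompt_template(user_input)))


result = run_pipeline("LangChain")
print(result)
```

Because the model is just one step in the chain, swapping providers only changes the `model` argument; the template and parser stay the same.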

### Key Principle

**All LangChain models implement a common interface**, making them interchangeable and composable with other LangChain components.

## LLMs vs Chat Models

### Definition

LangChain provides two main types of model interfaces:

#### 1. LLMs (Language Models)
- **Input**: Plain text string
- **Output**: Plain text string
- **Use Case**: Simple text completion, generation tasks
- **Example**: "Complete this sentence: The capital of France is" → "Paris"

#### 2. Chat Models
- **Input**: List of messages with roles (System, Human, AI)
- **Output**: AI message with content
- **Use Case**: Conversational AI, chatbots, multi-turn dialogues
- **Example**: 
  ```python
  [
    SystemMessage(content="You are a helpful assistant"),
    HumanMessage(content="What is the capital of France?")
  ]
  → AIMessage(content="The capital of France is Paris.")
  ```

### Comparison Table

| Feature | LLMs | Chat Models |
|---------|------|-------------|
| **Input Format** | String | List of Messages |
| **Output Format** | String | AIMessage |
| **Conversation History** | Concatenated into the prompt manually | Passed as a list of prior messages |
| **Role Support** | No | Yes (System, Human, AI) |
| **Best For** | Simple completion | Conversations, agents |
| **Example Class** | `HuggingFaceEndpoint` | `ChatHuggingFace`, `ChatGroq` |

### When to Use Each

**Use LLMs when:**
- Simple text completion tasks
- Single-turn generation
- No conversation context needed

**Use Chat Models when:**
- Building chatbots or assistants
- Multi-turn conversations
- Need system instructions
- Working with agents
- Most modern LLM applications (recommended)

## Model Providers Overview

LangChain supports integration with numerous LLM providers:

### API-Based Providers
- **OpenAI**: GPT-4, GPT-3.5 (requires API key, paid)
- **Anthropic**: Claude models (requires API key, paid)
- **Google**: Gemini, PaLM (requires API key)
- **Groq**: Ultra-fast inference (requires API key, free tier available)
- **HuggingFace**: Access to thousands of models (requires API token, free tier)

### Local Providers
- **Ollama**: Run models locally (free, requires local installation)
- **HuggingFace Pipeline**: Download and run models locally (free, requires storage)
- **LlamaCpp**: Optimized local inference (free)

### Comparison: API vs Local

| Aspect | API-Based | Local |
|--------|-----------|-------|
| **Setup** | Easy (just API key) | Complex (install, download) |
| **Cost** | Pay per token | Free (after hardware) |
| **Speed** | Fast (dedicated servers) | Depends on hardware |
| **Privacy** | Data sent to provider | Complete privacy |
| **Model Selection** | Provider's models | Any open-source model |
| **Hardware Needs** | None | GPU recommended |
| **Internet** | Required | Not required |

In this notebook, we'll cover:
1. **HuggingFaceEndpoint** (API-based)
2. **ChatGroq** (API-based, very fast)
3. **ChatOllama** (Local)
4. **HuggingFacePipeline** (Local)

## HuggingFaceEndpoint

### Definition

`HuggingFaceEndpoint` provides access to models hosted on the **Hugging Face Hub** via their Inference API. This allows you to use thousands of open-source models without downloading them locally.

### Key Features

- **Access to 1000s of models**: Many models on the Hugging Face Hub (those served through the Inference API)
- **No local storage**: Models run on Hugging Face servers
- **Free tier available**: Limited requests per month
- **API token required**: Get from huggingface.co/settings/tokens
- **Flexible**: Supports various tasks (text-generation, summarization, etc.)

### Use Cases

- Trying different models without downloading
- Production apps with moderate traffic
- Prototyping and experimentation
- Using specialized models (code, multilingual, etc.)

### Limitations

- **Rate limits**: Free tier has request limits
- **Internet required**: Cannot work offline
- **Latency**: Every call is a network round trip; often slower than a local GPU, though usually faster than local CPU inference
- **Privacy**: Data sent to Hugging Face servers

### ChatHuggingFace Wrapper

`ChatHuggingFace` wraps a `HuggingFaceEndpoint` (or a `HuggingFacePipeline`) to make it behave like a chat model, enabling:
- Message-based input (System, Human, AI)
- Formatting of the message list into the prompt format the underlying model expects
- Integration with LangChain chat components
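To make the idea of a chat wrapper concrete, here is a minimal sketch of how role-tagged messages could be flattened into a single prompt string for a text-in/text-out model. The `Message` class, the `to_prompt` helper, and the `<|role|>` tags are all hypothetical; the real wrapper is expected to rely on the model's own chat template rather than this made-up format.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "human", or "ai"
    content: str


def to_prompt(messages: list[Message]) -> str:
    """Flatten role-tagged messages into one prompt string.

    The <|role|> tags are a made-up format used only for illustration.
    """
    parts = [f"<|{m.role}|>\n{m.content}" for m in messages]
    parts.append("<|ai|>\n")  # cue the model to answer as the assistant
    return "\n".join(parts)


history = [
    Message("system", "You are a helpful assistant"),
    Message("human", "What is the capital of France?"),
]
prompt = to_prompt(history)
print(prompt)
```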

```python
# HuggingFaceEndpoint and ChatHuggingFace Example
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from dotenv import load_dotenv

# Load environment variables (HF_TOKEN should be in .env file)
load_dotenv()

# Create HuggingFace endpoint for Llama model
llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Llama-3.2-1B-Instruct",  # Model ID from Hugging Face Hub
    task="text-generation",                      # Task type
    temperature=0.5                              # Creativity level (0-1)
)

# Wrap it as a chat model
model1 = ChatHuggingFace(llm=llm)

# Test the model
response = model1.invoke("What is LangChain?")
print(response.content)
```

### Code Explanation:

1. **Import required classes**: `HuggingFaceEndpoint` for API access, `ChatHuggingFace` for chat wrapper
2. **Load environment variables**: `load_dotenv()` loads `HF_TOKEN` from `.env` file
   - Get your token from: https://huggingface.co/settings/tokens
   - Add to `.env`: `HF_TOKEN=your_token_here`
3. **Create endpoint**:
   - `repo_id`: Model identifier from Hugging Face Hub (format: `username/model-name`)
   - `task`: Type of task ("text-generation" for LLMs)
   - `temperature`: Controls randomness (0 = deterministic, 1 = creative)
4. **Wrap as chat model**: `ChatHuggingFace(llm=llm)` converts the LLM to a chat model
5. **Invoke**: `model1.invoke("question")` sends request and returns `AIMessage`
6. **Access content**: `response.content` contains the generated text

### Common Parameters:

```python
llm = HuggingFaceEndpoint(
    repo_id="model-name",           # Required: Model ID
    task="text-generation",         # Task type
    temperature=0.7,                # Randomness (0-1)
    max_new_tokens=512,             # Max tokens to generate
    top_p=0.95,                     # Nucleus sampling
    repetition_penalty=1.15,        # Prevent repetition
    huggingfacehub_api_token="..." # Or use env variable
)
```

### Popular Models to Try:

- `meta-llama/Llama-3.2-1B-Instruct` - Fast, small Llama model
- `mistralai/Mistral-7B-Instruct-v0.2` - High quality, efficient
- `google/flan-t5-large` - Good for summarization
- `bigcode/starcoder` - Code generation

## ChatGroq

### Definition

`ChatGroq` provides access to **Groq's ultra-fast inference API**. Groq uses custom hardware (LPU - Language Processing Unit) to achieve extremely fast response times.

### Key Features

- **Very Fast**: Advertised as many times faster than typical GPU-based inference (the actual speedup depends on model and load)
- **Low Latency**: Short responses often arrive in well under a second
- **Free Tier**: Generous free tier available
- **Popular Models**: Llama, Mixtral, Gemma
- **Chat-Native**: Built for conversational AI

### Use Cases

- Real-time chatbots
- Interactive applications
- High-throughput applications
- Streaming responses
- Production apps requiring speed

### Limitations

- **Limited model selection**: Only models optimized for Groq hardware
- **API key required**: Get from console.groq.com
- **Rate limits**: Free tier has limits
- **Internet required**: Cloud-based only

### Why Choose Groq?

**Speed is critical** → Groq is the fastest option
- User-facing chatbots (no waiting)
- Real-time applications
- High request volume
- Streaming responses

```python
# ChatGroq Example
from langchain_groq import ChatGroq
from dotenv import load_dotenv

# Load environment variables (GROQ_API_KEY should be in .env file)
load_dotenv()

# Create Groq chat model
model2 = ChatGroq(
    model="llama-3.1-8b-instant",  # Model name
    temperature=0.5                # Creativity level
)

# Test the model
response = model2.invoke("Explain LangChain in one sentence")
print(response.content)
```

### Code Explanation:

1. **Import ChatGroq**: From `langchain_groq` package
2. **Load environment**: `load_dotenv()` loads `GROQ_API_KEY` from `.env`
   - Get API key from: https://console.groq.com/keys
   - Add to `.env`: `GROQ_API_KEY=your_key_here`
3. **Create model**:
   - `model`: Model identifier (see available models below)
   - `temperature`: Controls randomness (0 = focused, 1 = creative)
4. **Invoke**: Send message and get response
5. **Access content**: `response.content` contains the answer

### Available Groq Models:

```python
# Examples only: Groq adds and retires models over time, so check console.groq.com for the current list
model="llama-3.1-8b-instant"      # Llama 3.1 8B (recommended)
model="llama-3.1-70b-versatile"   # Llama 3.1 70B (more capable)
model="mixtral-8x7b-32768"        # Mixtral (long context)
model="gemma-7b-it"               # Google Gemma
```

### Common Parameters:

```python
model = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0.7,              # Randomness (0-1)
    max_tokens=1024,              # Max response length
    top_p=0.9,                    # Nucleus sampling
    streaming=False,              # Enable streaming
    api_key="..."                 # Or use env variable
)
```

### Performance Comparison:

| Provider | Typical Latency | Speed Rating |
|----------|----------------|---------------|
| Groq | 50-200ms | ⚡⚡⚡⚡⚡ |
| OpenAI | 1-3s | ⚡⚡⚡ |
| HuggingFace API | 2-5s | ⚡⚡ |
| Local (GPU) | 500ms-2s | ⚡⚡⚡ |
| Local (CPU) | 5-30s | ⚡ |
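Latency figures like these vary with model size, prompt length, network conditions, and hardware, so it is worth measuring your own setup. The sketch below times any zero-argument callable and reports the median; `measure_latency` is a hypothetical helper, and the `time.sleep` call stands in for a real `model.invoke(...)`.

```python
import statistics
import time


def measure_latency(call, runs=5):
    """Return the median wall-clock time (in seconds) of `call()` over `runs` runs."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in workload; with a real model use: lambda: model.invoke("Hello")
median_s = measure_latency(lambda: time.sleep(0.01))
print(f"median latency: {median_s * 1000:.1f} ms")
```

The median is used instead of the mean so that a single slow first call (for example a cold start) does not dominate the result.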

## ChatOllama

### Definition

`ChatOllama` provides an interface to **Ollama**, a tool for running large language models locally on your machine. It's the easiest way to run models locally.

### Key Features

- **100% Local**: Models run entirely on your machine
- **Complete Privacy**: No data sent to external servers
- **Free**: No API costs
- **Offline**: Works without internet
- **Easy Setup**: Simple installation and model management
- **Many Models**: Llama, Mistral, Phi, Gemma, and more

### Use Cases

- Privacy-sensitive applications
- Offline environments
- Development without API costs
- Learning and experimentation
- Applications with high request volume

### Limitations

- **Hardware requirements**: Needs good CPU/GPU
- **Storage**: Models can be 1-40GB each
- **Setup required**: Must install Ollama first
- **Slower than Groq**: Speed depends on your hardware; a GPU helps considerably
- **Model size constraints**: Larger models need more RAM

### Installation

1. **Install Ollama**: Download from https://ollama.ai
2. **Pull a model**: 
   ```bash
   ollama pull phi3
   ollama pull llama3.2
   ollama pull mistral
   ```
3. **Verify**: `ollama list` to see installed models

### When to Use Ollama?

**Choose Ollama when:**
- Privacy is critical
- No internet connection
- Want to avoid API costs
- Have decent hardware (8GB+ RAM)
- Learning/experimenting

```python
# ChatOllama Example
from langchain_ollama import ChatOllama

# PREREQUISITE: Run this in terminal first:
# ollama pull phi3

# Create Ollama chat model
model3 = ChatOllama(
    model='phi3',        # Model name (must be pulled first)
    temperature=0.5      # Creativity level
)

# Test the model
response = model3.invoke("What are the benefits of local LLMs?")
print(response.content)
```

### Code Explanation:

1. **Import ChatOllama**: From `langchain_ollama` package
2. **Prerequisites**: 
   - Ollama must be installed
   - Model must be pulled: `ollama pull phi3`
   - Ollama service must be running (the desktop installer usually starts it automatically; otherwise run `ollama serve`)
3. **Create model**:
   - `model`: Name of pulled model
   - `temperature`: Controls randomness
4. **Invoke**: Works exactly like other chat models
5. **Local execution**: Model runs on your machine, no API calls

### Popular Ollama Models:

| Model | Size | Best For | RAM Needed |
|-------|------|----------|------------|
| `phi3` | 2.3GB | Fast, efficient | 4GB |
| `llama3.2` | 2GB | General purpose | 4GB |
| `mistral` | 4.1GB | High quality | 8GB |
| `llama3.1:8b` | 4.7GB | Balanced | 8GB |
| `codellama` | 3.8GB | Code generation | 8GB |
| `llama3.1:70b` | 40GB | Best quality | 64GB |

### Common Parameters:

```python
model = ChatOllama(
    model="phi3",
    temperature=0.7,
    num_predict=512,        # Max tokens to generate
    top_p=0.9,
    top_k=40,
    repeat_penalty=1.1,
    base_url="http://localhost:11434"  # Ollama server URL
)
```

### Ollama Commands:

```bash
# List installed models
ollama list

# Pull a model
ollama pull llama3.2

# Remove a model
ollama rm phi3

# Run model in terminal (for testing)
ollama run phi3
```

## HuggingFacePipeline (Fully Local)

### Definition

`HuggingFacePipeline` allows you to download and run models **entirely locally** using the `transformers` library. This gives you complete control and maximum privacy.

### Key Features

- **Fully Local**: Model weights downloaded to your machine
- **Complete Privacy**: No external API calls
- **Free**: No API costs
- **Offline**: Works without internet (after download)
- **Full Control**: Access to all model parameters
- **Any Model**: Use any model from Hugging Face Hub

### Use Cases

- Maximum privacy requirements
- Air-gapped environments
- Research and experimentation
- Fine-tuning custom models
- High-volume applications (no API limits)

### Limitations

- **Hardware intensive**: Requires good GPU for reasonable speed
- **Large storage**: Models are 1-40GB+
- **Complex setup**: More configuration than Ollama
- **Slow on CPU**: Can take 10-30 seconds per response
- **Memory hungry**: Large models need 16GB+ RAM

### Requirements

- **GPU recommended**: NVIDIA GPU with CUDA for good performance
- **Storage**: 10-50GB for model weights
- **RAM**: 8GB minimum, 16GB+ recommended
- **Libraries**: `transformers`, `torch`, `accelerate`

### When to Use HuggingFacePipeline?

**Choose HuggingFacePipeline when:**
- Need absolute privacy (medical, legal, sensitive data)
- Offline environment required
- Want to fine-tune models
- Have powerful hardware (GPU)
- Research or experimentation

```python
# HuggingFacePipeline Example (Fully Local)
import os

# Set cache directory (where models will be downloaded).
# Do this before importing Hugging Face libraries, which read HF_HOME at import time.
os.environ['HF_HOME'] = 'D:/huggingface_cache'

from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

# Create local pipeline
llm = HuggingFacePipeline.from_model_id(
    model_id='TinyLlama/TinyLlama-1.1B-Chat-v1.0',  # Small model for demo
    task='text-generation',                         # Task type
    pipeline_kwargs=dict(
        do_sample=True,                             # Enable sampling so temperature takes effect
        temperature=0.5,                            # Creativity
        max_new_tokens=100                          # Max response length
    )
)

# Wrap as chat model
model4 = ChatHuggingFace(llm=llm)

# Test the model (runs locally!)
result = model4.invoke("What is the capital of India?")
print(result.content)
```

### Code Explanation:

1. **Import classes**: `HuggingFacePipeline` for local inference, `ChatHuggingFace` for chat wrapper
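2. **Set cache directory**: `HF_HOME` determines where models are downloaded
   - Default: `~/.cache/huggingface/`
   - Change it if you have limited space on your system drive
   - Set it before importing any Hugging Face library, since the value is read at import time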
3. **Create pipeline**:
   - `model_id`: Model from Hugging Face Hub
   - `task`: Type of task
   - `pipeline_kwargs`: Parameters for generation
4. **First run**: Model will be downloaded (can take several minutes)
5. **Subsequent runs**: Uses cached model (fast startup)
6. **Wrap as chat**: `ChatHuggingFace(llm=llm)` for chat interface
7. **Invoke**: Runs entirely on your machine

### Recommended Local Models:

| Model | Size | Speed (CPU) | Speed (GPU) | Quality |
|-------|------|-------------|-------------|----------|
| `TinyLlama/TinyLlama-1.1B-Chat-v1.0` | ~2.2GB | Slow | Fast | Basic |
| `microsoft/phi-2` | ~5.5GB | Very Slow | Fast | Good |
| `mistralai/Mistral-7B-Instruct-v0.2` | 14GB | Unusable | Medium | Excellent |
| `meta-llama/Llama-2-7b-chat-hf` | 13GB | Unusable | Medium | Excellent |
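A rough way to sanity-check download sizes like these is to multiply the parameter count by the bytes stored per parameter. The sketch below is a back-of-the-envelope estimate only: `estimate_weight_size_gb` is a hypothetical helper, it assumes 1 GB = 1e9 bytes, and it ignores activations, the KV cache, and framework overhead, so real memory use is higher.

```python
def estimate_weight_size_gb(num_parameters, bytes_per_parameter=2):
    """Rough size of the weights alone, in GB (1 GB = 1e9 bytes).

    bytes_per_parameter: 4 for 32-bit, 2 for 16-bit, 0.5 for 4-bit quantized weights.
    """
    return num_parameters * bytes_per_parameter / 1e9


print(estimate_weight_size_gb(7e9))        # 7B parameters in 16-bit
print(estimate_weight_size_gb(7e9, 0.5))   # the same model quantized to 4-bit
```

For a 7B model in 16-bit precision this gives about 14 GB, which is in line with the 7B rows in the table; 4-bit quantization cuts that to roughly a quarter.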

### Advanced Configuration:

```python
from langchain_huggingface import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model with specific device
model_id = "microsoft/phi-2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # Automatically use GPU if available (requires accelerate)
    torch_dtype="auto",     # Optimize data type
    trust_remote_code=True  # For some models
)

# Create pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,         # Sampling must be on for temperature to apply
    temperature=0.7
)

# Use with LangChain
llm = HuggingFacePipeline(pipeline=pipe)
```

### GPU vs CPU Performance:

| Hardware | TinyLlama (1.1B) | Phi-2 (2.7B) | Mistral (7B) |
|----------|------------------|--------------|---------------|
| **CPU** | 5-10s | 15-30s | 60-120s |
| **GPU (RTX 3060)** | 0.5-1s | 1-2s | 3-5s |
| **GPU (RTX 4090)** | 0.2-0.5s | 0.5-1s | 1-2s |

## Understanding Model Parameters

### Temperature

**Controls randomness/creativity of responses**

- **Range**: 0.0 to 1.0 (sometimes up to 2.0)
- **Low (0.0-0.3)**: Deterministic, focused, factual
- **Medium (0.4-0.7)**: Balanced, natural
- **High (0.8-1.0+)**: Creative, varied, unpredictable

**Use Cases:**
```python
temperature=0.0   # Factual Q&A, code generation, data extraction
temperature=0.5   # General chatbot, balanced responses
temperature=0.9   # Creative writing, brainstorming, storytelling
```
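To see why lower temperatures give more focused output, it helps to look at how temperature rescales token scores before sampling. The sketch below is a minimal illustration with made-up logits for three candidate tokens; `softmax_with_temperature` is a hypothetical helper, not part of any library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature (must be > 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature drops, almost all probability concentrates on the top-scoring token. A temperature of exactly 0 cannot be plugged into this formula; in practice it is usually treated as always picking the most likely token.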

### Max Tokens / Max New Tokens

**Limits the length of generated response**

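- **max_tokens**: Maximum tokens to generate in the response (name used by `ChatGroq` and OpenAI-style APIs)
- **max_new_tokens**: The same limit in Hugging Face classes (`ChatOllama` calls it `num_predict`); do not confuse it with `max_length` in `transformers`, which counts prompt + response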
- **Typical values**: 100-2000

**Guidelines:**
```python
max_new_tokens=50     # Short answers, classifications
max_new_tokens=256    # Paragraph responses
max_new_tokens=1024   # Detailed explanations
max_new_tokens=2048   # Long-form content
```

### Top P (Nucleus Sampling)

**Alternative to temperature for controlling randomness**

- **Range**: 0.0 to 1.0
- **How it works**: Samples only from the smallest set of most likely tokens whose cumulative probability reaches P
- **Typical value**: 0.9-0.95

```python
top_p=0.9   # Consider top 90% of probability mass
top_p=0.95  # More diverse (default for many models)
top_p=1.0   # Consider all tokens
```
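Nucleus sampling can be sketched in a few lines: rank tokens by probability, keep them until the running total reaches `top_p`, then renormalize. The `top_p_filter` helper and the token probabilities below are made up for illustration.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p,
    then renormalize. `probs` maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"Paris": 0.5, "Lyon": 0.25, "Nice": 0.125, "Rome": 0.125}
print(top_p_filter(probs, 0.7))   # only "Paris" and "Lyon" survive
print(top_p_filter(probs, 1.0))   # every token is kept
```

Unlike a fixed cutoff, the number of surviving tokens adapts to the distribution: a confident model keeps few candidates, an uncertain one keeps many.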

### Top K

**Limits vocabulary to top K most likely tokens**

- **Range**: 1 to vocabulary size
- **Typical value**: 40-50
- **Lower = more focused, Higher = more diverse**

```python
top_k=10   # Very focused
top_k=40   # Balanced (common default)
top_k=100  # More diverse
```
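Top-k filtering is simpler than top-p: keep a fixed number of the most likely tokens and renormalize. The sketch below uses a hypothetical `top_k_filter` helper and made-up probabilities.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize. `probs` maps token -> probability."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


probs = {"Paris": 0.5, "Lyon": 0.25, "Nice": 0.125, "Rome": 0.125}
print(top_k_filter(probs, 2))   # "Paris" and "Lyon" share all the probability
print(top_k_filter(probs, 1))   # equivalent to always picking the top token
```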

### Repetition Penalty

**Prevents model from repeating itself**

- **Range**: 1.0 to 2.0
- **1.0**: No penalty
- **1.1-1.3**: Recommended range
- **Higher**: Stronger penalty against repetition

```python
repetition_penalty=1.0   # No penalty
repetition_penalty=1.15  # Mild penalty (recommended)
repetition_penalty=1.5   # Strong penalty
```
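One common formulation of the repetition penalty lowers the score of every token that has already appeared: positive scores are divided by the penalty and negative scores are multiplied by it. The sketch below assumes that formulation; providers may implement the penalty differently, and `apply_repetition_penalty` with its made-up logits is only illustrative.

```python
def apply_repetition_penalty(logits, generated_tokens, penalty):
    """Lower the score of every token that has already been generated.

    Positive scores are divided by `penalty`, negative scores are multiplied by it,
    so in both cases the token becomes less likely when penalty > 1.
    """
    adjusted = dict(logits)
    for token in set(generated_tokens):
        if token in adjusted:
            score = adjusted[token]
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = {"the": 3.0, "cat": 1.5, "sat": -0.5}
print(apply_repetition_penalty(logits, ["the", "sat"], penalty=1.5))
```

With `penalty=1.0` the scores are unchanged, which is why 1.0 means "no penalty".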

### Parameter Combinations:

```python
# Factual, deterministic
factual = dict(temperature=0.1, top_p=0.9, top_k=40)

# Balanced, natural
balanced = dict(temperature=0.7, top_p=0.95, top_k=50)

# Creative, diverse
creative = dict(temperature=0.9, top_p=0.98, top_k=100)
```

## Best Practices for LangChain Models

### 1. Model Selection

| Requirement | Recommended Model |
|-------------|-------------------|
| **Speed is critical** | ChatGroq |
| **Privacy required** | ChatOllama or HuggingFacePipeline |
| **Cost-effective** | HuggingFaceEndpoint (free tier) |
| **Best quality** | ChatGroq (llama-3.1-70b) or GPT-4 |
| **Offline needed** | ChatOllama or HuggingFacePipeline |
| **Experimentation** | HuggingFaceEndpoint |
| **Production** | ChatGroq or OpenAI |

### 2. API Key Management

✅ **DO:**
- Store API keys in `.env` file
- Use `python-dotenv` to load keys
- Add `.env` to `.gitignore`
- Use environment variables in production
- Rotate keys regularly

❌ **DON'T:**
- Hardcode API keys in code
- Commit keys to version control
- Share keys publicly
- Use same key across projects

**Example `.env` file:**
```
HF_TOKEN=hf_xxxxxxxxxxxxx
GROQ_API_KEY=gsk_xxxxxxxxxxxxx
OPENAI_API_KEY=sk-xxxxxxxxxxxxx
```
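A missing key usually surfaces as a confusing authentication error deep inside a request, so it can help to check for the expected variables up front. The sketch below is one possible approach: `missing_keys` is a hypothetical helper, and the key names are taken from the `.env` example above.

```python
import os

REQUIRED_KEYS = ["HF_TOKEN", "GROQ_API_KEY"]  # names from the .env example


def missing_keys(required, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Run this after load_dotenv() so values from .env are visible in os.environ
missing = missing_keys(REQUIRED_KEYS)
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```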

### 3. Error Handling

```python
from langchain_groq import ChatGroq

model = ChatGroq(model="llama-3.1-8b-instant")

try:
    response = model.invoke("Your question")
    print(response.content)
except Exception as e:
    print(f"Error: {e}")
    # Fallback logic
```
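Transient failures such as timeouts or rate-limit errors often succeed on a second attempt, so a retry with exponential backoff is a common complement to a plain `try`/`except`. The sketch below is provider-agnostic and uses only the standard library; `invoke_with_retry` and `flaky_call` are hypothetical names, and the flaky function simulates a call that fails twice before succeeding.

```python
import time


def invoke_with_retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()`; on failure wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise                          # out of retries: surface the error
            sleep(base_delay * 2 ** attempt)   # 1s, 2s, 4s, ...


# Hypothetical flaky call that fails twice before succeeding
state = {"calls": 0}


def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


delays = []
answer = invoke_with_retry(flaky_call, sleep=delays.append)  # record delays instead of sleeping
print(answer, delays)
```

With a real model the call would look like `invoke_with_retry(lambda: model.invoke("Your question"))`. In production code it is better to catch only the provider's retryable exception types rather than every `Exception`.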

### 4. Rate Limiting

```python
import time
from langchain_groq import ChatGroq

model = ChatGroq(model="llama-3.1-8b-instant")

questions = ["Q1", "Q2", "Q3"]

for q in questions:
    response = model.invoke(q)
    print(response.content)
    time.sleep(1)  # Avoid rate limits
```
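A fixed `time.sleep(1)` always waits, even when the previous call was slow enough on its own. A small limiter that only sleeps for the remaining time is one alternative. This is a sketch using the standard library; `MinIntervalLimiter` is a hypothetical class, and the `print` call stands in for `model.invoke(q)`.

```python
import time


class MinIntervalLimiter:
    """Ensure at least `min_interval` seconds pass between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


limiter = MinIntervalLimiter(min_interval=0.05)
for q in ["Q1", "Q2", "Q3"]:
    limiter.wait()           # sleeps only if the previous call was too recent
    print(f"sending {q}")    # replace with model.invoke(q)
```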

### 5. Cost Optimization

- **Use smaller models** when possible (phi3 vs llama-70b)
- **Limit max_tokens** to reduce costs
- **Cache responses** for repeated queries
- **Use local models** for development
- **Monitor usage** with provider dashboards
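Caching repeated queries can be sketched with the standard library's `functools.lru_cache`. Here `expensive_model_call` is a hypothetical stand-in for a paid API call, and the counter shows that the second identical prompt never reaches it.

```python
from functools import lru_cache

stats = {"calls": 0}


def expensive_model_call(prompt: str) -> str:
    """Hypothetical stand-in for a paid API call."""
    stats["calls"] += 1
    return f"answer to: {prompt}"


@lru_cache(maxsize=256)
def cached_call(prompt: str) -> str:
    """Return a cached answer when the exact same prompt was seen before."""
    return expensive_model_call(prompt)


cached_call("What is LangChain?")
cached_call("What is LangChain?")   # served from the cache, no second call
print(stats["calls"])
```

An exact-match cache like this only helps when prompts repeat verbatim, and it makes the most sense with a low temperature, where repeated calls would return similar answers anyway.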

### 6. Performance Optimization

```python
# For local models: use GPU
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

# For API models: use async for parallel requests
import asyncio
from langchain_groq import ChatGroq

async def get_response(question):
    model = ChatGroq(model="llama-3.1-8b-instant")
    return await model.ainvoke(question)

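# Process multiple questions in parallel
# (top-level await works in a notebook; in a script, wrap this in an async function and call it with asyncio.run())
questions = ["Q1", "Q2", "Q3"]
responses = await asyncio.gather(*[get_response(q) for q in questions])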
```
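Firing every request at once can run into provider rate limits, so it is often useful to cap how many run concurrently. The sketch below bounds concurrency with an `asyncio.Semaphore`; `fake_ainvoke` is a hypothetical stand-in for `model.ainvoke`, and `gather_bounded` is an illustrative helper name.

```python
import asyncio


async def fake_ainvoke(question: str) -> str:
    """Hypothetical stand-in for `await model.ainvoke(question)`."""
    await asyncio.sleep(0.01)
    return f"answer to {question}"


async def gather_bounded(questions, max_concurrency=2):
    """Run all questions concurrently, but at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(question):
        async with semaphore:
            return await fake_ainvoke(question)

    # gather preserves the order of the input questions
    return await asyncio.gather(*(worker(q) for q in questions))


responses = asyncio.run(gather_bounded(["Q1", "Q2", "Q3"]))
print(responses)
```

Inside a notebook, where an event loop is already running, replace the `asyncio.run(...)` line with `responses = await gather_bounded([...])`.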

### 7. Testing Models

```python
# Test with simple question first
test_question = "What is 2+2?"
response = model.invoke(test_question)
print(f"Model works: {response.content}")

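# Verify model parameters (attribute names vary by class, e.g. `model` or `model_name`)
print(f"Model: {getattr(model, 'model_name', None) or getattr(model, 'model', None)}")
print(f"Temperature: {getattr(model, 'temperature', None)}")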
```

### 8. Model Switching

```python
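# Easy to switch between models
from langchain_groq import ChatGroq
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint
from langchain_ollama import ChatOllama

def get_model(provider="groq"):
    if provider == "groq":
        return ChatGroq(model="llama-3.1-8b-instant")
    elif provider == "ollama":
        return ChatOllama(model="phi3")
    elif provider == "hf":
        llm = HuggingFaceEndpoint(repo_id="meta-llama/Llama-3.2-1B-Instruct")
        return ChatHuggingFace(llm=llm)
    raise ValueError(f"Unknown provider: {provider}")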

# Use any model with same interface
model = get_model("groq")
response = model.invoke("Hello!")
```
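Because every model exposes the same `.invoke()` method, the shared interface also makes provider fallback straightforward: try each model in order and return the first answer. The sketch below uses two hypothetical stand-in classes (`FailingModel`, `EchoModel`) instead of real providers so it runs without API keys; `invoke_with_fallback` is an illustrative helper.

```python
class FailingModel:
    """Hypothetical stand-in for a provider that is currently unavailable."""

    def invoke(self, prompt):
        raise ConnectionError("provider unavailable")


class EchoModel:
    """Hypothetical stand-in for a working model."""

    def invoke(self, prompt):
        return f"echo: {prompt}"


def invoke_with_fallback(models, prompt):
    """Try each model in order and return the first successful response."""
    errors = []
    for model in models:
        try:
            return model.invoke(prompt)
        except Exception as exc:
            errors.append(exc)   # remember the failure and try the next model
    raise RuntimeError(f"All {len(models)} models failed: {errors}")


print(invoke_with_fallback([FailingModel(), EchoModel()], "Hello!"))
```

With real models, the list could hold the objects returned by `get_model("groq")` and `get_model("ollama")`. LangChain runnables also provide a `with_fallbacks` method that serves a similar purpose.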

## Summary

### Models Covered

1. **HuggingFaceEndpoint**: API access to Hugging Face models
2. **ChatHuggingFace**: Chat wrapper for HuggingFace models
3. **ChatGroq**: Ultra-fast inference with Groq's LPU
4. **ChatOllama**: Easy local model execution
5. **HuggingFacePipeline**: Fully local model inference

### Key Takeaways

1. **Standardized Interface**: All models use `.invoke()` - easy to switch
2. **Chat Models Recommended**: Better for most modern applications
3. **Speed** (fastest first): Groq > Local GPU > HuggingFace API > Local CPU
4. **Privacy** (strongest first): Local (Ollama/Pipeline) > API-based
5. **Cost** (cheapest first): Local (free) > HuggingFace (free tier) > Groq (free tier) > OpenAI (paid)
6. **Parameters Matter**: Temperature, max_tokens, top_p affect output quality

### Quick Decision Guide

```
Need speed? → ChatGroq
Need privacy? → ChatOllama or HuggingFacePipeline
Need free? → HuggingFaceEndpoint or ChatOllama
Need offline? → ChatOllama or HuggingFacePipeline
Need best quality? → ChatGroq (70B) or GPT-4
Just learning? → HuggingFaceEndpoint or ChatOllama
```

### Next Steps in LangChain

After understanding models:
1. **Prompts**: Learn to craft effective prompts with PromptTemplate
2. **Output Parsers**: Structure model outputs (JSON, Pydantic)
3. **Chains**: Combine models with prompts and parsers
4. **Memory**: Add conversation history
5. **Agents**: Build autonomous AI systems

### Additional Resources

- [LangChain Models Documentation](https://python.langchain.com/docs/modules/model_io/models/)
- [Hugging Face Hub](https://huggingface.co/models)
- [Groq Console](https://console.groq.com)
- [Ollama Models](https://ollama.ai/library)
- [Model Comparison](https://artificialanalysis.ai/)