# LLM APIs and Ollama

_IMPORTANT: If you're not yet familiar with APIs in general, or with Environment Variables on your PC or Mac, please review the APIs section in Guide 4 (Technical Foundations) before proceeding with this guide!_

## Introduction to LLM APIs

Throughout the course, we use APIs for connecting with the strongest LLMs on the planet.

OpenAI has a Python client library that simplifies calling OpenAI's models in the cloud:

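```python
from dotenv import load_dotenv
from openai import OpenAI

# Load OPENAI_API_KEY from your .env file into the environment
load_dotenv(override=True)

# Specify which model to use, and the prompts
MODEL = "gpt-4o-mini"
messages = [{"role": "user", "content": "what is 2+2?"}]

# Create an OpenAI python client for making web calls to OpenAI
# (it reads OPENAI_API_KEY from the environment)
openai = OpenAI()

# Make the call
response = openai.chat.completions.create(model=MODEL, messages=messages)
print(response.choices[0].message.content)
```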

Several other LLM providers, including Google (Gemini) and DeepSeek, offer API endpoints that are compatible with OpenAI's. In fact, almost everyone does, aside from Anthropic!

OpenAI's Python client library can be used with these other providers too: you simply point it at a different base URL and pass that provider's API key:

`not_actually_openai = OpenAI(base_url="https://somewhere.completely.different/", api_key="another_providers_key")`

It's important to realize that the OpenAI client is just a utility for making HTTPS calls to endpoints. There's no LLM code here, just a wrapper around a network call.
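To make that concrete, here is a rough sketch, using only the Python standard library, of the kind of HTTPS request the client builds for a chat completion: a POST to `{base_url}/chat/completions`, with the API key sent as a Bearer token and the model and messages sent as JSON. The helper `build_chat_request` is hypothetical, and the real client adds extra headers, retries and response parsing that are left out here. The request is only built, not sent.

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request.

    Hypothetical helper for illustration; the openai library does this for you.
    """
    # The endpoint path is appended to whatever base_url the client was given
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",  # the API key travels as a Bearer token
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_chat_request(
    base_url="https://api.deepseek.com/v1",
    api_key="another_providers_key",
    model="deepseek-chat",
    messages=[{"role": "user", "content": "what is 2+2?"}],
)
print(request.get_method(), request.full_url)
```

Passing this to `urllib.request.urlopen(request)` would make the actual network call, which is roughly what `openai.chat.completions.create(...)` does on your behalf before turning the JSON reply into Python objects.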

Many API providers offer an OpenAI-compatible endpoint that you can use as your `base_url`. Here are some popular ones:

```python
DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
GROK_BASE_URL = "https://api.x.ai/v1"
GROQ_BASE_URL = "https://api.groq.com/openai/v1"
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OLLAMA_BASE_URL = "http://localhost:11434/v1"
```
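If you switch providers often, one option is to keep these base URLs in a small table alongside the environment variable that holds each provider's key, and build the client arguments from it. This is only a sketch: the helper `client_kwargs` is hypothetical, and the environment variable names are assumptions that you should match to whatever you put in your own .env file. Ollama needs no real key, so a placeholder is used.

```python
import os

# Base URLs from the list above, paired with the env var ASSUMED to hold each key
PROVIDERS = {
    "deepseek": ("https://api.deepseek.com/v1", "DEEPSEEK_API_KEY"),
    "gemini": ("https://generativelanguage.googleapis.com/v1beta/openai/", "GOOGLE_API_KEY"),
    "grok": ("https://api.x.ai/v1", "GROK_API_KEY"),
    "groq": ("https://api.groq.com/openai/v1", "GROQ_API_KEY"),
    "openrouter": ("https://openrouter.ai/api/v1", "OPENROUTER_API_KEY"),
    "ollama": ("http://localhost:11434/v1", None),  # runs locally, no key needed
}


def client_kwargs(provider: str) -> dict:
    """Return keyword arguments for OpenAI(...) for the given provider (hypothetical helper)."""
    base_url, key_var = PROVIDERS[provider]
    if key_var is None:
        # Ollama ignores the key, but the OpenAI client still requires one
        return {"base_url": base_url, "api_key": "ollama"}
    api_key = os.getenv(key_var)
    if not api_key:
        raise RuntimeError(f"Set {key_var} in your .env file to use {provider}")
    return {"base_url": base_url, "api_key": api_key}
```

You would then create a client with something like `openai = OpenAI(**client_kwargs("deepseek"))`, remembering to use that provider's model names.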

### Using different API providers with Agent Frameworks

The Agent Frameworks make it easy to switch between these providers, so you can pick a different LLM at any point in the course. You may need to check the framework's docs for the best way to switch to a different API, or ask me. For OpenAI Agents SDK, see the section later in this guide. For CrewAI, we cover it on the course, but it's easy: just use the full model path that LiteLLM expects, with the provider as a prefix (for example `ollama/llama3.2`).
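A "full model path" in the LiteLLM style is the provider name, a slash, then that provider's own model name. The hypothetical helper below just splits such a path at the first slash to show how the two parts relate. Note that OpenRouter paths contain a second slash, because OpenRouter's own model names already include the vendor.

```python
def split_model_path(path: str) -> tuple[str, str]:
    """Split a LiteLLM-style "provider/model" string at the first slash (hypothetical helper)."""
    provider, sep, model = path.partition("/")
    if not sep:
        raise ValueError(f"{path!r} has no provider prefix, e.g. 'openai/{path}'")
    return provider, model


# Example full model paths: provider prefix first, then the provider's model name
for path in [
    "openai/gpt-4o-mini",
    "deepseek/deepseek-chat",
    "gemini/gemini-2.5-flash",
    "ollama/llama3.2",
    "openrouter/openai/gpt-4o-mini",
]:
    print(split_model_path(path))
```

In CrewAI you would pass a path like this to its `LLM` class, for example `LLM(model="ollama/llama3.2", base_url="http://localhost:11434")`; check the CrewAI docs linked in the CrewAI section for the exact options for each provider.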

## Costs of APIs

The cost of each API call is very low indeed - most calls to models we use on this course are fractions of cents.

But it's extremely important to note:

1. A complex Agentic project could involve many LLM calls - perhaps 20-30 - and so it can add up. It's important to set limits and monitor usage.

2. With Agentic AI, there is a risk of Agents getting into a loop or carrying out more processing than intended. You should monitor your API usage, and never set a budget higher than you are comfortable with. Some APIs have an "auto-refill" setting that automatically charges your card - I strongly recommend you keep this off.

3. You should only spend what you are comfortable with. There is a free alternative in Ollama that you can use as a replacement if you wish. Among paid models, DeepSeek, Gemini 2.5 Flash and gpt-4.1-nano are significantly cheaper than many alternatives.
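One simple way to protect yourself against a runaway agent loop is to count calls and estimated spend yourself, and stop as soon as a limit is hit. Below is a minimal sketch: the `BudgetGuard` class is hypothetical, and the per-token prices are placeholders, so look up the current prices on your provider's pricing page. The token counts would come from the `usage` field that OpenAI-compatible responses include (`response.usage.prompt_tokens` and `response.usage.completion_tokens`).

```python
class BudgetExceeded(RuntimeError):
    """Raised when an agent run goes over its call or spend limit."""


class BudgetGuard:
    """Track LLM calls and estimated cost, raising once a limit is exceeded.

    Prices are dollars per million tokens and are placeholders, not real quotes.
    """

    def __init__(self, max_calls: int, max_dollars: float,
                 input_price_per_m: float, output_price_per_m: float):
        self.max_calls = max_calls
        self.max_dollars = max_dollars
        self.input_price_per_m = input_price_per_m
        self.output_price_per_m = output_price_per_m
        self.calls = 0
        self.dollars = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's token usage to the totals, then check both limits."""
        self.calls += 1
        self.dollars += (input_tokens * self.input_price_per_m
                         + output_tokens * self.output_price_per_m) / 1_000_000
        if self.calls > self.max_calls:
            raise BudgetExceeded(f"Too many LLM calls: {self.calls}")
        if self.dollars > self.max_dollars:
            raise BudgetExceeded(f"Estimated spend ${self.dollars:.4f} is over the limit")


# Placeholder prices: $0.15 per million input tokens, $0.60 per million output tokens
guard = BudgetGuard(max_calls=30, max_dollars=0.05,
                    input_price_per_m=0.15, output_price_per_m=0.60)
for _ in range(25):  # e.g. a 25-call agentic run
    guard.record(input_tokens=2_000, output_tokens=500)
print(f"{guard.calls} calls, about ${guard.dollars:.4f}")
```

At these placeholder prices each call costs $0.0006, so the whole 25-call run comes to about 1.5 cents; the point of the guard is that a looping agent hits an exception instead of your card. In a real project you would call `guard.record(response.usage.prompt_tokens, response.usage.completion_tokens)` after each LLM call.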

Keep in mind that these LLM calls typically involve trillions of floating point calculations - someone has to pay the electricity bills!

### Ollama: Free alternative to Paid APIs (but please see Warning about llama version)

Ollama is a product that runs locally on your machine. It can run open-source models, and it provides an API endpoint on your computer that is compatible with OpenAI.

First, download Ollama by visiting:
https://ollama.com

Then from your Terminal in Cursor (View menu >> Terminal), run this command to download a model:

`ollama pull llama3.2`

WARNING: Be careful not to use llama3.3 or llama4 - these are much larger models that are not suitable for home computers.
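Before running any notebook code against Ollama, it can help to confirm that the Ollama server is actually up. Here is a sketch using only the standard library, assuming Ollama's default address of `http://localhost:11434` (the same host as the base URL used below, without the `/v1`). The function name `ollama_is_running` is hypothetical. When Ollama is running, that address normally replies with the text "Ollama is running".

```python
import http.client
import urllib.request


def ollama_is_running(url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama server answers at url (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A running Ollama server normally replies 200 with "Ollama is running"
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, etc.
        return False


print("Ollama running:", ollama_is_running())
```

If this prints `False`, start the Ollama app first. You can also run `ollama list` in the Terminal to see which models you have already pulled.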

And now, any time that we have code like:  
`openai = OpenAI()`  
You can use this as a direct replacement:  
`openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')`  
And also replace model names like **gpt-4o-mini** with **llama3.2**.  

You don't need to put anything in your .env file for this; with Ollama, everything is running on your computer. You're not calling out to a third party on the cloud, nobody has your credit card details, so there's no need for a secret key! The code `api_key='ollama'` above is only required because the OpenAI client library expects an api_key to be passed in, but the value is ignored by Ollama.
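If you want to flip between OpenAI and Ollama without editing code in several places, one approach is to read a flag from the environment and choose the model name and client arguments together. This is a sketch only: the `USE_OLLAMA` variable and the `llm_settings` helper are hypothetical names, not something the course code already reads.

```python
import os


def llm_settings() -> tuple[str, dict]:
    """Return (model_name, OpenAI client kwargs), switching on a hypothetical USE_OLLAMA flag."""
    if os.getenv("USE_OLLAMA", "").lower() in ("1", "true", "yes"):
        # Local Ollama: the client requires an api_key, but Ollama ignores its value
        return "llama3.2", {"base_url": "http://localhost:11434/v1", "api_key": "ollama"}
    # Default: OpenAI; with no kwargs the client reads OPENAI_API_KEY from the environment
    return "gpt-4o-mini", {}
```

Usage would look like `MODEL, kwargs = llm_settings()` followed by `openai = OpenAI(**kwargs)`, with `USE_OLLAMA=true` added to your .env file when you want to run locally.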

Below is a full example:

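```python
# You need to do this one time on your computer
# (the ! prefix runs a shell command from a Jupyter notebook; in a Terminal, leave it off)
!ollama pull llama3.2

from openai import OpenAI

MODEL = "llama3.2"
# Point the OpenAI client at your local Ollama server; the api_key is required but ignored
openai = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = openai.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)

print(response.choices[0].message.content)
```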

You will need to make similar changes to use Ollama within any of the Agent Frameworks - you should be able to google for an exact example, or ask me.

### OpenRouter: Convenient gateway platform for OpenAI and others

OpenRouter is a third party service that allows you to connect to a wide range of LLMs, including OpenAI.

It's known for having a simpler billing process, which may be easier for people in some countries outside the US.

First, check out their website:  
https://openrouter.ai/

Then, take a peek at their quickstart:  
https://openrouter.ai/docs/quickstart

And add your key to your .env file:  
`OPENROUTER_API_KEY=sk-or....`
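A common stumbling block is a key that was never actually loaded into the environment. A small check like the one below can catch that early. It's a sketch: `check_api_key` is a hypothetical helper, and the `sk-or` prefix is taken from the example line above, so treat a passing check as a sanity check rather than proof that the key is valid. In the notebooks you would normally call `load_dotenv(override=True)` first so that your .env values are in the environment.

```python
import os


def check_api_key(var_name: str, expected_prefix: str = "") -> str:
    """Return the key stored in var_name, or raise a readable error (hypothetical helper)."""
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set - check your .env file and load_dotenv()")
    if expected_prefix and not key.startswith(expected_prefix):
        raise RuntimeError(f"{var_name} doesn't start with {expected_prefix!r}")
    # Only print the start of the key, never the whole secret
    print(f"{var_name} found, begins {key[:8]}")
    return key
```

For OpenRouter you might then write `openrouter_api_key = check_api_key("OPENROUTER_API_KEY", expected_prefix="sk-or")`.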

And now, any time you have code like this:  
```python
MODEL = "gpt-4o-mini"
openai = OpenAI()
```

You can replace it with code like this:

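```python
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load OPENROUTER_API_KEY from your .env file
load_dotenv(override=True)

# OpenRouter model names are prefixed with the vendor, e.g. "openai/"
MODEL = "openai/gpt-4o-mini"
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")
openai = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=openrouter_api_key)

response = openai.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)

print(response.choices[0].message.content)
```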

You will need to make similar changes to use OpenRouter within any of the Agent Frameworks - you should be able to google for an exact example, or ask me.

## OpenAI Agents SDK - specific instructions

With OpenAI Agents SDK (weeks 2 and 6), it's particularly easy to use any model provided by OpenAI themselves. Simply pass in the model name:

`agent = Agent(name="Jokester", instructions="You are a joke teller", model="gpt-4o-mini")`

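You can also substitute in any other provider with an OpenAI-compatible API. You do it in 3 steps like this:

```python
import os
from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel

DEEPSEEK_BASE_URL = "https://api.deepseek.com/v1"  # 1. the provider's OpenAI-compatible endpoint
deepseek_api_key = os.getenv("DEEPSEEK_API_KEY")
deepseek_client = AsyncOpenAI(base_url=DEEPSEEK_BASE_URL, api_key=deepseek_api_key)  # 2. a client for it
deepseek_model = OpenAIChatCompletionsModel(model="deepseek-chat", openai_client=deepseek_client)  # 3. the model
```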

And then you simply provide this model when you create an Agent.

`agent = Agent(name="Jokester", instructions="You are a joke teller", model=deepseek_model)`

And you can use a similar approach for any other OpenAI compatible API, with the same 3 steps:

```python
import os
from openai import AsyncOpenAI
from agents import OpenAIChatCompletionsModel

# Read API keys from the environment (adjust the names to match your .env file)
google_api_key = os.getenv("GOOGLE_API_KEY")
grok_api_key = os.getenv("GROK_API_KEY")
groq_api_key = os.getenv("GROQ_API_KEY")
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")

# Specify the base URL endpoints where the provider offers an OpenAI compatible API
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
GROK_BASE_URL = "https://api.x.ai/v1"
GROQ_BASE_URL = "https://api.groq.com/openai/v1"
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OLLAMA_BASE_URL = "http://localhost:11434/v1"

# Create an AsyncOpenAI object for that endpoint
gemini_client = AsyncOpenAI(base_url=GEMINI_BASE_URL, api_key=google_api_key)
grok_client = AsyncOpenAI(base_url=GROK_BASE_URL, api_key=grok_api_key)
groq_client = AsyncOpenAI(base_url=GROQ_BASE_URL, api_key=groq_api_key)
openrouter_client = AsyncOpenAI(base_url=OPENROUTER_BASE_URL, api_key=openrouter_api_key)
ollama_client = AsyncOpenAI(base_url=OLLAMA_BASE_URL, api_key="ollama")

# Create a model object to provide when creating an Agent
gemini_model = OpenAIChatCompletionsModel(model="gemini-2.5-flash", openai_client=gemini_client)
grok_3_model = OpenAIChatCompletionsModel(model="grok-3-mini-beta", openai_client=grok_client)
llama3_3_model = OpenAIChatCompletionsModel(model="llama-3.3-70b-versatile", openai_client=groq_client)
grok_3_via_openrouter_model = OpenAIChatCompletionsModel(model="x-ai/grok-3-mini-beta", openai_client=openrouter_client)
llama_3_2_local_model = OpenAIChatCompletionsModel(model="llama3.2", openai_client=ollama_client)
```

### To use Azure with OpenAI Agents SDK

See instructions here:  
https://techcommunity.microsoft.com/blog/azure-ai-services-blog/use-azure-openai-and-apim-with-the-openai-agents-sdk/4392537

Such as this:
```python
from openai import AsyncAzureOpenAI
from agents import set_default_openai_client
from dotenv import load_dotenv
import os
 
# Load environment variables
load_dotenv()
 
# Create OpenAI client using Azure OpenAI
openai_client = AsyncAzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT")
)
 
# Set the default OpenAI client for the Agents SDK
set_default_openai_client(openai_client)
```

## CrewAI setup

Here are CrewAI's docs for LLM connections, including the model names to use for each provider. As student Sadan S. pointed out (thank you!), it's worth knowing that for Google you need to use the environment variable `GEMINI_API_KEY` instead of `GOOGLE_API_KEY`:

https://docs.crewai.com/concepts/llms

And here's their tutorial with some more info:

https://docs.crewai.com/how-to/llm-connections
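If your .env file already holds your Google key as `GOOGLE_API_KEY`, one workaround for the Gemini naming issue is to copy it across to `GEMINI_API_KEY` before creating your crew. A minimal sketch, assuming your Google key is stored under `GOOGLE_API_KEY`; the helper name `ensure_gemini_key` is hypothetical, and it leaves an existing `GEMINI_API_KEY` untouched.

```python
import os


def ensure_gemini_key() -> bool:
    """Copy GOOGLE_API_KEY into GEMINI_API_KEY if only the former is set (hypothetical helper).

    Returns True if GEMINI_API_KEY is available afterwards.
    """
    if not os.getenv("GEMINI_API_KEY") and os.getenv("GOOGLE_API_KEY"):
        os.environ["GEMINI_API_KEY"] = os.environ["GOOGLE_API_KEY"]
    return bool(os.getenv("GEMINI_API_KEY"))
```

Alternatively, simply add a `GEMINI_API_KEY=` line to your .env file with the same value as your Google key.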

## LangGraph setup

To use LangGraph with Ollama (and follow similar for other models):  
https://python.langchain.com/docs/integrations/chat/ollama/#installation

First add the package:  
`uv add langchain-ollama`

Then in the lab, make this replacement:   
```python
from langchain_ollama import ChatOllama
# llm = ChatOpenAI(model="gpt-4o-mini")
llm = ChatOllama(model="gemma3:4b")
```

And obviously run `!ollama pull gemma3:4b` (or whichever model) beforehand.

Many thanks to Miroslav P. for adding this, and to Arvin F. for the question!

## LangGraph with other models

Just follow the same recipe as above, but use any of the models from here:  
https://python.langchain.com/docs/integrations/chat/



## AutoGen with other models

Here's another contribution from Miroslav P. (thank you!) for using Ollama + local models with AutoGen, and Miroslav has a great example showing gemma3 performing well.

```python
# Requires AutoGen's Ollama extra, e.g. `uv add "autogen-ext[ollama]"`
# model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")

from autogen_ext.models.ollama import OllamaChatCompletionClient

# model_info describes the local model's capabilities to AutoGen
model_client = OllamaChatCompletionClient(
    model="gemma3:4b",
    model_info={
        "vision": True,
        "function_calling": False,
        "json_output": True,
        "family": "unknown",
    },
)
```

## Worth keeping in mind

1. If you wish to use Ollama to run models locally, you may find that smaller models struggle with the more advanced projects. You'll need to experiment with different model sizes and capabilities, and plenty of patience may be needed to find something that works well. I expect several of our projects are too challenging for llama3.2. As an alternative, consider the free models on openrouter.ai, or the very cheap models that are almost free - like DeepSeek.

2. Chat models often do better than Reasoning models because Reasoning models can "over-think" some assignments. It's important to experiment. Bigger isn't always better...

3. It's confusing, but there are 2 different providers with similar-sounding names!  
   - Grok is the LLM from Elon Musk's xAI
   - Groq is a platform for fast inference of open-source models

A student pointed out to me that "Groq" came first!
