# How to use LiteLLM and Smolagents with Amazon Nova models

In this tutorial we are going to:

1. Configure a LiteLLM load-balancer with Amazon Nova models. 
2. Use LiteLLM with the Hugging Face smolagents agentic framework, calling Amazon Nova through Amazon Bedrock Cross-Region Inference.

Before running this notebook:
- Make sure you have enabled access to the Amazon Nova Micro and Amazon Nova Pro models in your AWS Console in us-east-1 and us-west-2. For more information about the process, see:
https://docs.aws.amazon.com/bedrock/latest/userguide/model-access-modify.html
- Configure your AWS credentials: https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/setup-credentials.html
- AWS IAM permissions required: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-prereq.html

## Install requirements

In [1]:
%pip install litellm==1.59.8 smolagents==1.6.0 boto3==1.36.8 --quiet


Note: you may need to restart the kernel to use updated packages.


## 1. LiteLLM and Amazon Nova
LiteLLM is an open-source framework that handles load balancing, fallbacks, and spend tracking across 100+ LLMs. We are going to use Amazon Nova Micro and Amazon Nova Pro via the LiteLLM Router.

In [2]:
import os
from litellm import Router

# aws region
os.environ["AWS_REGION_NAME"] = "us-east-1"



Each deployment is given its own tokens-per-minute (TPM) and requests-per-minute (RPM) limits. Feel free to adjust them to match your usage.

In [3]:
model_list = [
    {
        "model_name": "nova_models",
        "litellm_params": { 
            "model": "bedrock/amazon.nova-micro-v1:0",          #nova micro model id
            "aws_region_name": os.getenv("AWS_REGION_NAME"),    #aws region
            #TPM and RPM configuration - this is going to be used by the router
            "tpm": 400000,
            "rpm": 200,
        },
    }, 
    {
        "model_name": "nova_models", 
        "litellm_params": {
            "model": "bedrock/amazon.nova-pro-v1:0",
            "aws_region_name": os.getenv("AWS_REGION_NAME"),    #aws region
            #TPM and RPM configuration - this is going to be used by the router
            "tpm": 400000,
            "rpm": 100,
        }
    }
]

Here is our router:

In [4]:
router = Router(
    model_list=model_list,
    routing_strategy="usage-based-routing-v2",
    enable_pre_call_checks=True,
    num_retries=1,
    set_verbose=False,
    allowed_fails=1,
    cooldown_time=100,
    disable_cooldowns=False
)
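To give some intuition for how a usage-based strategy can use the `tpm` and `rpm` values, here is a simplified sketch in plain Python. It is not LiteLLM's implementation: the `Deployment` class and the `pick_deployment` function are hypothetical names, and the selection rule (skip deployments that would exceed a limit, then take the one with the lowest token usage in the current minute) is an assumption about the general idea.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical stand-in for one model_list entry plus its usage this minute."""
    model: str
    tpm_limit: int
    rpm_limit: int
    tokens_used: int = 0
    requests_used: int = 0


def pick_deployment(deployments: List[Deployment], request_tokens: int) -> Optional[Deployment]:
    """Return the eligible deployment with the lowest token usage, or None."""
    eligible = [
        d for d in deployments
        if d.requests_used + 1 <= d.rpm_limit
        and d.tokens_used + request_tokens <= d.tpm_limit
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda d: d.tokens_used)


micro = Deployment("bedrock/amazon.nova-micro-v1:0", tpm_limit=400000, rpm_limit=200)
pro = Deployment("bedrock/amazon.nova-pro-v1:0", tpm_limit=400000, rpm_limit=100)

# Pretend Nova Micro has already served some traffic this minute.
micro.tokens_used = 5000
micro.requests_used = 10

choice = pick_deployment([micro, pro], request_tokens=500)
print(choice.model)
```

In this sketch the request goes to Nova Pro, because it has used fewer tokens in the current minute. Once a deployment reaches its RPM or TPM limit it stops being eligible, and traffic shifts to the other one.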

## Testing router
It is time to test our router.

In [5]:
response = await router.acompletion(
    model="nova_models",
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
    temperature=0.1,
)

print(response.choices[0].message.content)

Hey there! I'm doing great, thanks for asking. How can I assist you today? Whether you have questions, need some information, or just want to chat, I'm here for you. What's on your mind?


## 2. Hugging Face Smolagents
Hugging Face smolagents is an open-source agentic framework. We are going to use its LiteLLM integration to invoke the Amazon Nova Pro model. Instead of LiteLLM load balancing, this time we will use Amazon Bedrock Cross-Region Inference, which automatically routes our inference requests between *us-east-1 and us-west-2*.
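Compared with the model IDs used in the router, the only change in the next cell is the `us.` prefix, which selects a cross-Region inference profile rather than a single-Region model. The helper below is a hypothetical convenience, not part of LiteLLM or smolagents, that shows how that `model_id` string is composed. The default `geo` value of `"us"` is taken from the next cell.

```python
def to_inference_profile_id(base_model_id: str, geo: str = "us") -> str:
    """Compose a LiteLLM model string for a Bedrock cross-Region inference profile.

    Hypothetical helper, for illustration only.
    """
    return f"bedrock/{geo}.{base_model_id}"


model_id = to_inference_profile_id("amazon.nova-pro-v1:0")
print(model_id)  # bedrock/us.amazon.nova-pro-v1:0
```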

In [6]:
from smolagents import CodeAgent, DuckDuckGoSearchTool, LiteLLMModel

model = LiteLLMModel(model_id="bedrock/us.amazon.nova-pro-v1:0")

# The DuckDuckGoSearchTool enables our agent to search and retrieve results from the DuckDuckGo search engine. 
# The tool calls will be formulated by the LLM in code format, then parsed and executed.
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

agent.run("How many seconds would it take for a leopard at full speed to run through Pont des Arts?")



9.62
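The notebook does not record which figures the agent retrieved, so the sketch below only illustrates the kind of calculation a `CodeAgent` writes and executes after searching. The bridge length (155 m) and the leopard's top speed (58 km/h) are assumptions chosen for illustration, not values taken from the agent's run.

```python
# Assumed inputs (not taken from the agent's search results).
bridge_length_m = 155.0    # assumed length of the Pont des Arts, in metres
leopard_speed_kmh = 58.0   # assumed leopard top speed, in km/h

# Convert km/h to m/s, then divide distance by speed.
speed_m_per_s = leopard_speed_kmh * 1000 / 3600
seconds = bridge_length_m / speed_m_per_s

print(round(seconds, 2))
```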

## Adding custom tools
Let's add a second tool to our agent: a custom tool that returns weather details using a static response.

In [7]:
from smolagents import tool

@tool
def get_weather(location: str) -> str:
    """
    Get weather in the next days at given location.

    Args:
        location: the location
    """
    return "The weather is freezing cold with torrential rains and temperatures below -10°C"


agent = CodeAgent(tools=[DuckDuckGoSearchTool(), get_weather], model=model)

print("CodeAgent:", agent.run("What's the weather like in Paris?"))

CodeAgent: The weather in Paris is freezing cold with torrential rains and temperatures below -10°C.
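The custom tool above carries type hints and a docstring with an `Args:` section, and this is the information an agent framework can hand to the LLM so that it knows when and how to call the tool. The sketch below uses only the standard library to show what can be read from such a function. `describe_tool` is a hypothetical helper, and the way smolagents parses tools may differ.

```python
import inspect
from typing import get_type_hints


def get_weather(location: str) -> str:
    """
    Get weather in the next days at given location.

    Args:
        location: the location
    """
    return "The weather is freezing cold with torrential rains and temperatures below -10°C"


def describe_tool(func) -> dict:
    """Hypothetical helper: collect metadata a tool wrapper could expose to the LLM."""
    hints = get_type_hints(func)
    return_type = hints.pop("return", None)
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        # The first docstring paragraph serves as the tool description.
        "description": doc.split("\n\n")[0],
        "inputs": {name: tp.__name__ for name, tp in hints.items()},
        "output": return_type.__name__ if return_type else None,
    }


spec = describe_tool(get_weather)
print(spec)
```

A function without type hints or a docstring would leave most of these fields empty, which is one reason to document every tool argument.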
