
# Accessing Watsonx.ai via REST API

In this lab, we will take a close look at making HTTP requests to Watsonx.ai's REST API and learn how to use its functionality. This lab explores only a few of the many REST endpoints available, so consult the REST API documentation for the full list of capabilities.

Before you start, make sure you have the items needed to access watsonx.ai programmatically:

- your IBM Cloud API key
- the IBM Cloud service URL tied to your instance
- the Project ID associated with your watsonx instance
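
If you would rather not paste these values directly into the notebook, one option is to read them from environment variables. The sketch below assumes three hypothetical variable names (`WATSONX_APIKEY`, `WATSONX_URL`, `WATSONX_PROJECT_ID`); they are not defined by watsonx.ai, so use whatever names suit your setup.

```python
import os

# Hypothetical environment variable names, chosen for this sketch only.
REQUIRED_VARS = ("WATSONX_APIKEY", "WATSONX_URL", "WATSONX_PROJECT_ID")

def load_watsonx_settings(env=None):
    """Return the three required values as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}

# Example with an explicit mapping instead of the real environment
example = load_watsonx_settings({
    "WATSONX_APIKEY": "my-key",
    "WATSONX_URL": "https://us-south.ml.cloud.ibm.com",
    "WATSONX_PROJECT_ID": "my-project",
})
print(sorted(example))
```

Calling `load_watsonx_settings()` with no argument reads from the real environment and fails early with the names of any values that are not set.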

In [None]:
# first we'll start by installing some dependencies
import sys
!{sys.executable} -m pip install -q requests
!{sys.executable} -m pip install -q ibm-cloud-sdk-core
!{sys.executable} -m pip install -q ibm-watson-machine-learning==1.0.311

In [None]:
# and verifying they've installed correctly
import json
import requests
from ibm_cloud_sdk_core import IAMTokenManager
from ibm_watson_machine_learning.foundation_models import Model
from ibm_watson_machine_learning.metanames import GenTextParamsMetaNames as GenParams

## HTTP request headers

Headers carry metadata about an API request and its response. In the following example, the `Authorization` header provides the server with credentials to validate your access. Watsonx.ai expects a "Bearer" access token, which the cell below obtains by exchanging your IBM Cloud API key with the IAM token service. The `Content-Type` header tells the server the media type of the request body, and the `Accept` header tells it which media type we expect in the response. In both cases that is `application/json`.

In [None]:
# THESE ARE THE VALUES YOU'LL NEED TO FILL IN YOURSELF
# API key you created
api_key = "INSERT YOUR KEY HERE"

# Project ID of your watsonx instance
project_id = "INSERT YOUR PROJECT ID HERE"

# URL service endpoint
ibm_cloud_url = "INSERT YOUR SERVICE URL HERE"

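access_token = ''
try:
  access_token = IAMTokenManager(
    apikey = api_key,
    url = "https://iam.cloud.ibm.com/identity/token"
  ).get_token()
except Exception as err:
  # Catch Exception rather than using a bare except, and show the underlying cause
  print(f'Issue obtaining access token. Check variables? ({err})')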

# The headers we'll send for both our POST and GET requests
headers = {
  "Authorization": "Bearer " + access_token,
  "Content-Type": "application/json",
  "Accept": "application/json"
}
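
For context, `IAMTokenManager` hides a plain HTTP exchange: the API key is POSTed to the IAM token URL as a form-encoded body, and the JSON reply contains the bearer token. The sketch below only builds the arguments for that request. `build_iam_token_request` and `bearer_headers` are hypothetical helper names, and the grant type string and the `access_token` response field reflect the IBM Cloud IAM API as we understand it, so verify them against the IAM documentation before relying on them.

```python
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"

def build_iam_token_request(api_key):
    """Build keyword arguments for requests.post() that exchange an API key for a token."""
    return {
        "url": IAM_TOKEN_URL,
        "headers": {
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        # requests form-encodes a dict passed as `data`
        "data": {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",  # assumed grant type
            "apikey": api_key,
        },
    }

def bearer_headers(access_token):
    """Build the same headers used in this lab from an access token."""
    return {
        "Authorization": "Bearer " + access_token,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }

token_request = build_iam_token_request("INSERT YOUR KEY HERE")
print(token_request["data"]["grant_type"])

# To actually send it (requires network access and a valid key):
# import requests
# token = requests.post(**token_request).json()["access_token"]
# headers = bearer_headers(token)
```

In practice the token manager is the more convenient choice; the sketch is only meant to show that nothing beyond an ordinary POST request is involved.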

## POST vs GET

HTTP defines several request methods; this lab uses two of them, GET and POST. With GET, data parameters are passed in the URL's query string, so they can show up in logs and browser history. With POST, the data is not placed in the URL but is instead passed in the HTTP message body.

GET requests are intended to retrieve data from a server and do not modify the server’s state. On the other hand, POST requests are used to send data to the server for processing and may modify the server’s state.
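
To make the difference concrete, the sketch below builds both kinds of request without sending them, using only the standard library. The parameter values are placeholders; the point is where each piece of data ends up.

```python
import json
from urllib.parse import urlencode

base_url = "https://us-south.ml.cloud.ibm.com/ml/v1-beta"

# GET: parameters are encoded into the URL's query string
get_url = base_url + "/foundation_model_specs?" + urlencode({"version": "2023-07-07"})
print("GET", get_url)

# POST: the payload is serialized into the message body, not the URL
post_url = base_url + "/generation/text?" + urlencode({"version": "2023-07-07"})
post_body = json.dumps({"model_id": "google/flan-ul2", "input": "Hello"})
print("POST", post_url)
print("body:", post_body)
```

Note that a POST URL can still carry query parameters, as the `version` parameter does here; only the payload itself moves into the body.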

## POST requests with 'Generate' endpoint
The generate endpoint "https://us-south.ml.cloud.ibm.com/ml/v1-beta/generation/text" provides an interface for sending prompts to any model supported by Watsonx.ai. Given a text prompt as input, along with the required parameters, the selected model will attempt to complete the provided input and return the completion as "generated_text".

The request body needs to include:
- `model_id` (string): the ID of the model
- `input` (string): the prompt to complete
- `parameters` (key-value pairs): the generation parameters for the model
- `project_id` (string): your watsonx project ID

> Note that the generate endpoint URL depends on the region where your watsonx instance is provisioned.
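
A small check before sending can catch a missing field early. The sketch below tests only for the four fields listed above; `missing_body_fields` is a hypothetical helper, not part of any watsonx.ai library, and the service itself may enforce additional rules.

```python
REQUIRED_FIELDS = ("model_id", "input", "parameters", "project_id")

def missing_body_fields(body):
    """Return the required fields that are absent, None, or an empty string."""
    return [
        field for field in REQUIRED_FIELDS
        if field not in body or body[field] in (None, "")
    ]

# This example body deliberately leaves out the project ID
example_body = {
    "model_id": "google/flan-ul2",
    "input": "This service is",
    "parameters": {"max_new_tokens": 50},
}
print(missing_body_fields(example_body))  # prints ['project_id']
```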

In [None]:
# The body we'll send as part of our POST request
body = {
  "model_id": "google/flan-ul2",
  "input": "Write a short blog post for an advanced cloud service for large language models: This service is",
  "parameters": {
    "temperature": 0,
    "max_new_tokens": 50,
    "min_new_tokens": 25
  },
  "project_id": project_id  
}

In [None]:
version = "2023-07-07"
generation_endpoint = f"{ibm_cloud_url}/ml/v1-beta/generation/text?version={version}"

response = requests.post(url=generation_endpoint, headers=headers, json=body)
print("Raw JSON response:\n", json.dumps(response.json(), indent=2))
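
The raw JSON is verbose, so it is often convenient to pull out just the completion. The sketch below assumes the response contains a `results` list whose entries carry the `generated_text` field mentioned earlier; that layout is an assumption, so compare it against the raw JSON printed above. `extract_generated_text` is a hypothetical helper.

```python
def extract_generated_text(payload):
    """Return the generated_text of every entry in payload['results']."""
    results = payload.get("results")
    if not results:
        # A payload without results (for example an error reply) is surfaced, not hidden
        raise ValueError(f"No results in response: {payload}")
    return [entry["generated_text"] for entry in results]

# Hand-written sample payload for illustration, not real model output
sample_payload = {"results": [{"generated_text": "a sample completion"}]}
print(extract_generated_text(sample_payload))

# With the live response from the cell above:
# response.raise_for_status()
# print(extract_generated_text(response.json()))
```

Calling `response.raise_for_status()` first turns HTTP error codes into exceptions, so authentication or quota problems are reported before any parsing is attempted.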

## Using GET requests to retrieve data

The GET method is used to retrieve information from Watsonx.ai using a given URL.

### GET /models

This endpoint returns the list of all models currently supported by Watsonx.ai.

In [None]:
model_endpoint = f"{ibm_cloud_url}/ml/v1-beta/foundation_model_specs?version={version}"
response = requests.get(url=model_endpoint, headers=headers)
models = response.json()['resources']

print(f"{len(models)} models supported by Watsonx.ai")

# Uncomment below to see the raw JSON
# print(json.dumps(models, indent=2))

# prettify the output just to see the model names
model_labels = [m['label'] for m in models]
print(json.dumps(model_labels, indent=2))
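
To reuse a model from this list in the generate request, you need its ID rather than its label. The sketch below assumes each entry also has a `model_id` field next to `label` (uncomment the raw JSON print above to confirm); `model_ids_by_label` is a hypothetical helper and the sample entries are made up for illustration.

```python
def model_ids_by_label(models):
    """Map each model's label to its model_id, skipping entries that lack either field."""
    return {
        m["label"]: m["model_id"]
        for m in models
        if "label" in m and "model_id" in m
    }

# Made-up entries for illustration; real entries come from response.json()['resources']
sample_models = [
    {"label": "example-model-a", "model_id": "vendor/example-model-a"},
    {"label": "entry-without-id"},
]
print(model_ids_by_label(sample_models))
```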

## Using a Python library

Now that we've seen how to interact with watsonx.ai through its REST API, let's look at how to do the same through a Python library.

In [None]:
# the credentials to authenticate with ibm cloud
# similar to the headers created before
creds = {
  "url": ibm_cloud_url,
  "apikey": api_key 
}

In [None]:
# a helper function that allows multiple prompts to be sent in at once and parameters for tweaking
def send_to_watsonxai(prompts,
                    model_name="google/flan-ul2",
                    decoding_method="greedy",
                    max_new_tokens=100,
                    min_new_tokens=30,
                    temperature=1.0,
                    repetition_penalty=2.0
                    ):
    '''
    Helper function for sending prompts and params to Watsonx.ai

    Args:
        prompts:list list of text prompts
        model_name:str id of the Watsonx.ai model to use
        decoding_method:str Watsonx.ai parameter "sample" or "greedy"
        max_new_tokens:int Watsonx.ai parameter for max new tokens/response returned
        min_new_tokens:int Watsonx.ai parameter for min new tokens/response returned
        temperature:float Watsonx.ai parameter for temperature (range 0 to 2)
        repetition_penalty:float Watsonx.ai parameter for repetition penalty (range 1.0 to 2.0)

    Returns: None
        prints each response
    '''

    assert all(len(prompt) > 0 for prompt in prompts), "make sure none of the input prompts are empty"

    # Instantiate parameters for text generation
    model_params = {
        GenParams.DECODING_METHOD: decoding_method,
        GenParams.MIN_NEW_TOKENS: min_new_tokens,
        GenParams.MAX_NEW_TOKENS: max_new_tokens,
        GenParams.RANDOM_SEED: 42,
        GenParams.TEMPERATURE: temperature,
        GenParams.REPETITION_PENALTY: repetition_penalty,
    }


    # Instantiate a model proxy object to send your requests
    model = Model(
        model_id=model_name,
        params=model_params,
        credentials=creds,
        project_id=project_id)


    for prompt in prompts:
        print(model.generate_text(prompt))

In [None]:
prompt = "Write a short blog post for an advanced cloud service for large language models: This service is"

# feel free to change the model to one of the values listed before
# and try changing other values as well
# (the helper prints its results and returns None, so there is nothing to assign)
send_to_watsonxai(
  prompts=[prompt],
  model_name="google/flan-ul2",
  decoding_method="greedy",
  max_new_tokens=100,
  min_new_tokens=30,
  temperature=1.0,
  repetition_penalty=2.0
)
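
The helper above prints each completion, which is convenient in a notebook but awkward if you want to post-process the text. One possible variation collects the results instead. In the sketch below, `collect_generations` is a hypothetical function that accepts any callable mapping a prompt to a string, such as the `generate_text` method of the `Model` object used above; a stub stands in for the model so the example runs without credentials.

```python
def collect_generations(prompts, generate_text):
    """Call generate_text on each prompt and return {prompt: completion}; reject empty prompts."""
    if any(len(prompt) < 1 for prompt in prompts):
        raise ValueError("make sure none of the input prompts are empty")
    return {prompt: generate_text(prompt) for prompt in prompts}

# Stub standing in for model.generate_text; it only echoes the prompt in upper case
def fake_generate_text(prompt):
    return prompt.upper()

outputs = collect_generations(["first prompt", "second prompt"], fake_generate_text)
for prompt, completion in outputs.items():
    print(f"{prompt!r} -> {completion!r}")

# With a real model (requires valid credentials), build `model` as in
# send_to_watsonxai and pass its method instead of the stub:
# outputs = collect_generations([prompt], model.generate_text)
```

Passing the generator in as a callable keeps the collection logic separate from the model setup, which also makes it easy to test without network access.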

## Conclusion

Although we've used Python as our language of choice in this lab, it's important to note that, because watsonx.ai exposes a REST API, essentially any language that can make HTTP requests can use it programmatically. This gives clients wide freedom in how they integrate watsonx.ai into their existing tech stack.