# Demonstration of the Granite citations intrinsic

This notebook shows how to use the IO processor for the Granite citations intrinsic,
also known as the [LoRA Adapter for Citation Generation](
    https://huggingface.co/ibm-granite/granite-3.2-8b-lora-rag-citation-generation
).

This notebook can run its own vLLM server to perform inference, or you can host the 
models on your own server. To use your own server, set the `run_server` variable below
to `False` and set appropriate values for the constants 
`openai_base_url`, `openai_base_model_name` and `openai_lora_model_name`.

In [None]:
from granite_io.io.granite_3_2.input_processors.granite_3_2_input_processor import (
    Granite3Point2Inputs,
)
from granite_io import make_io_processor, make_backend
from IPython.display import display, Markdown
from granite_io.backend.vllm_server import LocalVLLMServer
from granite_io.io.citations import CitationsCompositeIOProcessor, CitationsIOProcessor
from granite_io.visualization import CitationsWidget

In [None]:
# Constants go here
base_model_name = "ibm-granite/granite-3.2-8b-instruct"
lora_model_name = "ibm-granite/granite-3.2-8b-lora-rag-citation-generation"
run_server = True
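
The constants above are hard-coded. As a minimal sketch, they could instead be read from environment variables so the notebook runs unchanged on different machines. The variable names `GRANITE_BASE_MODEL`, `GRANITE_LORA_MODEL` and `GRANITE_RUN_SERVER`, and the helper `load_constants`, are hypothetical; nothing in `granite_io` reads them.

```python
import os

# Hypothetical environment variable names; nothing in granite_io reads these.
ENV_DEFAULTS = {
    "GRANITE_BASE_MODEL": "ibm-granite/granite-3.2-8b-instruct",
    "GRANITE_LORA_MODEL": "ibm-granite/granite-3.2-8b-lora-rag-citation-generation",
    "GRANITE_RUN_SERVER": "true",
}


def load_constants(env=None):
    """Return the notebook constants, letting environment variables override them."""
    env = os.environ if env is None else env
    values = {name: env.get(name, default) for name, default in ENV_DEFAULTS.items()}
    # Treat "1", "true" and "yes" (any case) as True; everything else is False.
    run_flag = values["GRANITE_RUN_SERVER"].strip().lower()
    return {
        "base_model_name": values["GRANITE_BASE_MODEL"],
        "lora_model_name": values["GRANITE_LORA_MODEL"],
        "run_server": run_flag in {"1", "true", "yes"},
    }


# Passing a plain dict instead of os.environ makes the helper easy to try out.
constants = load_constants({"GRANITE_RUN_SERVER": "no"})
print(constants["run_server"])  # False
```

Calling `load_constants()` with no argument falls back to `os.environ`, and any variable that is not set keeps the default shown in the cell above.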

In [None]:
if run_server:
    # Start by firing up a local vLLM server and connecting a backend instance to it.
    server = LocalVLLMServer(
        base_model_name, lora_adapters=[(lora_model_name, lora_model_name)]
    )
    server.wait_for_startup(200)
    lora_backend = server.make_lora_backend(lora_model_name)
    backend = server.make_backend()
else:  # if not run_server
    # Use an existing server.
    # Modify the constants here as needed.
    openai_base_url = "http://localhost:55555/v1"
    openai_api_key = "granite_intrinsics_1234"
    openai_base_model_name = base_model_name
    openai_lora_model_name = lora_model_name
    backend = make_backend(
        "openai",
        {
            "model_name": openai_base_model_name,
            "openai_base_url": openai_base_url,
            "openai_api_key": openai_api_key,
        },
    )
    lora_backend = make_backend(
        "openai",
        {
            "model_name": openai_lora_model_name,
            "openai_base_url": openai_base_url,
            "openai_api_key": openai_api_key,
        },
    )
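
When `run_server` is `False`, nothing in the cell above checks that the existing server is reachable before the first request is sent. The following is a minimal sketch of such a pre-flight check, using only the Python standard library. `wait_for_port` is a hypothetical helper, not part of `granite_io`, and a successful TCP connection only shows that something is listening on the port, not that the models are loaded.

```python
import socket
import time
from urllib.parse import urlparse


def wait_for_port(base_url, timeout_sec=30.0, poll_interval_sec=0.5):
    """Return True once a TCP connection to the URL's host and port succeeds.

    Returns False if no connection could be made before the timeout expires.
    """
    parsed = urlparse(base_url)
    host = parsed.hostname
    # Fall back to the scheme's default port if the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            with socket.create_connection((host, port), timeout=poll_interval_sec):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval_sec)
```

In the `else` branch, a call such as `wait_for_port(openai_base_url)` before `make_backend` would surface a wrong URL or a stopped server early, rather than on the first chat completion.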

In [None]:
# Create an example chat completion with a user question and two documents.
chat_input = Granite3Point2Inputs.model_validate(
    {
        "messages": [
            {
                "role": "user",
                "content": "What is the visibility level of Git Repos and Issue \
Tracking projects?",
            }
        ],
        "documents": [
            {
                "text": "Git Repos and Issue Tracking is an IBM-hosted component of \
the Continuous Delivery service. All of the data that you provide to Git Repos and \
Issue Tracking, including but not limited to source files, issues, pull requests, and \
project configuration properties, is managed securely within Continuous Delivery. \
However, Git Repos and Issue Tracking supports various mechanisms for exporting, \
sending, or otherwise sharing data to users and third parties. The ability of Git \
Repos and Issue Tracking to share information is typical of many social coding \
platforms. However, such sharing might conflict with regulatory controls that \
apply to your business. After you create a project in Git Repos and Issue Tracking, \
but before you entrust any files, issues, records, or other data with the project, \
review the project settings and change any settings that you deem necessary to \
protect your data. Settings to review include visibility levels, email notifications, \
integrations, web hooks, access tokens, deploy tokens, and deploy keys. Project \
visibility levels \n\nGit Repos and Issue Tracking projects can have one of the \
following visibility levels: private, internal, or public. * Private projects are \
visible only to project members. This setting is the default visibility level for new \
projects, and is the most secure visibility level for your data. * Internal projects \
are visible to all users that are logged in to IBM Cloud. * Public projects are \
visible to anyone. To limit project access to only project members, complete the \
following steps:\n\n\n\n1. From the project sidebar, click Settings > General. \
2. On the General Settings page, click Visibility > project features > permissions. \
3. Locate the Project visibility setting. 4. Select Private, if it is not already \
selected. 5. Click Save changes. Project membership \n\nGit Repos and Issue Tracking \
is a cloud hosted social coding environment that is available to all Continuous \
Delivery users. If you are a Git Repos and Issue Tracking project Maintainer or Owner, \
you can invite any user and group members to the project. IBM Cloud places no \
restrictions on who you can invite to a project."
            },
            {
                "text": "After you create a project in Git Repos and Issue Tracking, \
but before you entrust any files, issues, records, or other data with the project, \
review the project settings and change any settings that are necessary to protect your \
data. \
Settings to review include visibility levels, email notifications, integrations, web \
hooks, access tokens, deploy tokens, and deploy keys. Project visibility levels \
\n\nGit Repos and Issue Tracking projects can have one of the following visibility \
levels: private, internal, or public. * Private projects are visible only to \
project members. This setting is the default visibility level for new projects, and \
is the most secure visibility level for your data. * Internal projects are visible to \
all users that are logged in to IBM Cloud. * Public projects are visible to anyone. \
To limit project access to only project members, complete the following \
steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the \
General Settings page, click Visibility > project features > permissions. 3. Locate \
the Project visibility setting. 4. Select Private, if it is not already selected. \
5. Click Save changes. Project email settings \n\nBy default, Git Repos and Issue \
Tracking notifies project members by way of email about project activities. These \
emails typically include customer-owned data that was provided to Git Repos and Issue \
Tracking by users. For example, if a user posts a comment to an issue, Git Repos and \
Issue Tracking sends an email to all subscribers. The email includes information such \
as a copy of the comment, the user who posted it, and when the comment was posted. \
To turn off all email notifications for your project, complete the following \
steps:\n\n\n\n1. From the project sidebar, click Settings > General. 2. On the \
**General Settings **page, click Visibility > project features > permissions. \
3. Select the Disable email notifications checkbox. 4. Click Save changes. Project \
integrations and webhooks"
            },
        ],
        "generate_inputs": {"temperature": 0.0, "max_tokens": 1024},
    }
)
chat_input
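
The long strings in the cell above are split across lines with a trailing backslash inside the string literal. The short example below, which reuses the user question from that cell, shows what this does and how it compares with the implicit concatenation of adjacent literals. Both spellings are standard Python; the variable names are only for illustration.

```python
# A backslash at the end of a line inside a string literal removes the newline,
# so the literal continues on the next line. Any leading whitespace on the
# continuation line would become part of the string, which is why the
# continuation lines start at column zero.
with_backslash = "What is the visibility level of Git Repos and Issue \
Tracking projects?"

# Adjacent string literals inside parentheses are joined at compile time,
# which allows the continuation lines to be indented.
with_concatenation = (
    "What is the visibility level of Git Repos and Issue "
    "Tracking projects?"
)

print(with_backslash == with_concatenation)  # True
```

Either form yields a single-line string, so the document text sent to the model contains only the newlines written explicitly as `\n`.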

In [None]:
# Pass the example input through Granite 3.2 to get an answer
granite_io_proc = make_io_processor("Granite 3.2", backend=backend)
result = await granite_io_proc.acreate_chat_completion(chat_input)

display(Markdown(result.results[0].next_message.content))

In [None]:
# Append the model's output to the chat
next_chat_input = chat_input.with_next_message(result.results[0].next_message)
next_chat_input.messages

In [None]:
# Instantiate the I/O processor for the citations LoRA adapter
io_proc = CitationsIOProcessor(lora_backend)

# Pass our example input through the I/O processor and retrieve the result
chat_result = await io_proc.acreate_chat_completion(next_chat_input)

next_message = chat_result.results[0].next_message
print(next_message.model_dump_json(indent=2))

In [None]:
# Visualize the citations using the citations widget
CitationsWidget().show(next_chat_input, chat_result)

In [None]:
# The raw output of the LoRA adapter is also available for debugging
next_message._raw

In [None]:
# Try with an artificial poor-quality assistant response.
from granite_io.types import AssistantMessage

chat_result_2 = await io_proc.acreate_chat_completion(
    chat_input.with_next_message(
        AssistantMessage(
            content="Git repos are generally only visible in the infrared "
            "spectrum, due to their natural camouflage. Issue Tracking projects "
            "are much easier to see; their bright colors warn predators of the "
            "poisonous technical debt that they secrete."
        )
    ).with_addl_generate_params({"temperature": 0.0})
)
chat_result_2.results[0].next_message

In [None]:
# Create a composite citations processor that generates a response and adds citations
composite_proc = CitationsCompositeIOProcessor(granite_io_proc, lora_backend)

# Note that this code passes in the original chat input, without an assistant response
chat_result_4 = await composite_proc.acreate_chat_completion(chat_input)
print(chat_result_4.model_dump_json(indent=2))
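
Conceptually, a composite processor of this kind chains two stages: generate a response, then annotate it with citations. The sketch below illustrates that control flow with stand-in coroutines. It is not the implementation of `CitationsCompositeIOProcessor`; `generate_response`, `add_citations` and `composite` are hypothetical names, and the keyword matching only makes the example deterministic.

```python
import asyncio


async def generate_response(question):
    """Stand-in for the base model call; returns a canned answer."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"Answer to: {question}"


async def add_citations(response, documents):
    """Stand-in for the LoRA adapter call.

    Cites every document that shares at least one word with the response.
    """
    await asyncio.sleep(0)
    words = set(response.lower().split())
    cited = [i for i, doc in enumerate(documents) if words & set(doc.lower().split())]
    return {"content": response, "citations": cited}


async def composite(question, documents, n=1):
    # Stage 1: generate n candidate responses.
    responses = [await generate_response(f"{question} (sample {i})") for i in range(n)]
    # Stage 2: annotate all candidates concurrently; gather keeps input order.
    return await asyncio.gather(*(add_citations(r, documents) for r in responses))


docs = ["Visibility levels are private", "Email settings"]
# In a notebook, use `await composite(...)` instead of asyncio.run().
results = asyncio.run(composite("visibility levels", docs, n=2))
print([r["citations"] for r in results])  # [[0], [0]]
```

Because the second stage takes the same documents as the first, the caller passes the original chat input once and never has to append the assistant message by hand.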

In [None]:
# We can also ask the composite IO processor to generate multiple completions, in
# which case it will generate citations for all completions in parallel.
chat_result_5 = await composite_proc.acreate_chat_completion(
    chat_input.with_addl_generate_params({"n": 5, "temperature": 0.7})
)
# print(chat_result_5.model_dump_json(indent=2))

for result in chat_result_5.results:
    print(f"Assistant: {result.next_message.content}")
    print(f"           ({len(result.next_message.citations)} citations)")
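
With several completions available, one possible follow-up is to order them by how many citations each received. The sketch below does this with a hypothetical helper, `rank_by_citation_count`, and uses `SimpleNamespace` stand-ins that mimic only the `next_message.content` and `next_message.citations` attributes read in the loop above. Citation count is a rough heuristic here, not a quality measure.

```python
from types import SimpleNamespace


def rank_by_citation_count(results):
    """Sort completions so that those with the most citations come first."""
    return sorted(results, key=lambda r: len(r.next_message.citations), reverse=True)


def make_result(content, num_citations):
    # Hypothetical stand-in that only mimics the two attributes read above.
    message = SimpleNamespace(content=content, citations=[None] * num_citations)
    return SimpleNamespace(next_message=message)


fake_results = [make_result("a", 1), make_result("b", 3), make_result("c", 0)]
ranked = rank_by_citation_count(fake_results)
print([r.next_message.content for r in ranked])  # ['b', 'a', 'c']
```

With the real output, `rank_by_citation_count(chat_result_5.results)` should work the same way, assuming every `next_message` carries a `citations` list as in the loop above.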

In [None]:
# Free up GPU resources
if "server" in locals():
    server.shutdown()