# **Chat completions with Writer**

**_Chat completions_** are LLM responses to the user in a conversation or chat. They tend to be shorter than text completions, but are often part of a sequence of exchanges between the user and the model. This cookbook shows how to use the `chat()` method of Writer’s `chat` object to build apps for single-turn and multi-turn chats, with streaming and non-streaming responses.

## **Contents**

- [Introduction](#introduction)
- [Setup](#setup)
- [The `chat` object](#the-chat-object)
- [Single-turn chat completion (non-streaming version)](#single-turn-chat-completion-non-streaming-version)
- [Multi-turn chat completion (non-streaming version)](#multi-turn-chat-completion-non-streaming-version)
- [Single-turn chat completion (streaming version)](#single-turn-chat-completion-streaming-version)
- [Multi-turn chat completion (streaming version)](#multi-turn-chat-completion-streaming-version)
- [For more information](#for-more-information)

<a id="introduction"></a>
## **Introduction**

### What are chat completions?

**In chat completion, the model interacts with the user in a conversational format, _completing_ a chat by responding to them.** While a chat system can be used to get a single response to a single input, chat completion systems are designed to participate in a back-and-forth conversation with the user. This requires the system to retain the history of the conversation and derive context from that history, so that its responses stay useful and meaningful as the conversation continues.

Chat completion, as its name implies, is meant as a way to complete a conversation by providing the user with a human-like conversation partner. The term _completion_ is also used to mean the answer that a model provides in response to a user prompt.

There are two general categories of chat completion:

1. **“Single-turn” completions,** where the model is given a single prompt and generates a response or solution in a single step, without any follow-up. You may want to think of “single-turn” chat completions as shorter-form versions of text completions.
2. **“Multi-turn” completions,** where the model and user have a back-and-forth conversation with multiple exchanges. In this kind of completion, the model appears to maintain a “sense of context” because it “remembers” what took place earlier in the conversation.

For each of these categories, there are two options for the way the completion is returned:

1. **Non-streaming:** The model does not return a completion until it has been completely generated. There is a waiting period (typically a few seconds) while the completion is being generated.
2. **Streaming:** The model returns the completion in chunks as it’s being generated. You need to write additional code to collect and assemble these chunks, but there’s almost no waiting period.
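
The difference between the two delivery modes can be sketched without calling the API at all. In the illustration below, `fake_non_streaming()` and `fake_streaming()` are hypothetical stand-ins for a model (they are not part of the Writer SDK): the first returns the whole completion at once, while the second yields it piece by piece and leaves the assembly to the caller.

```python
from typing import Iterator

# Hypothetical canned completion used by both stand-in functions.
COMPLETION = "The fastest land animal is the cheetah."

def fake_non_streaming() -> str:
    """Stand-in for a non-streaming call: returns the full text at once."""
    return COMPLETION

def fake_streaming(chunk_size: int = 8) -> Iterator[str]:
    """Stand-in for a streaming call: yields the text in small chunks."""
    for start in range(0, len(COMPLETION), chunk_size):
        yield COMPLETION[start:start + chunk_size]

# Non-streaming: one value, available only after generation has finished.
full_text = fake_non_streaming()

# Streaming: collect the chunks as they arrive and assemble them.
assembled_text = ""
for chunk in fake_streaming():
    assembled_text += chunk

print(assembled_text == full_text)
```

Both approaches end with the same text; what differs is when the caller gets to see it.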

<a id="setup"></a>
## **Setup**

### Dependencies

This notebook uses the following packages:

* `ipywidgets`: To draw UI widgets for the streaming versions of the apps.
* `python-dotenv`: To load environment variables.
* `writer-sdk`: To access the Writer API.

Run the cell below to ensure you have these packages.

In [None]:
%pip install -r requirements.txt -q

### Initialization

The cell below performs the initialization required for this notebook, including creating an instance of the `Writer` object to interact with the LLM.

To create a Writer client object, you need an API key. [You can sign up for one for free](https://app.writer.com/aistudio/signup). 

Once you have an API key, we recommend that you store it as an environment variable in a `.env` file like so:

```
WRITER_API_KEY="{Your Writer API key goes here}"
```

When you instantiate the client with `client = Writer()`, the newly created object will automatically look for an environment variable named `WRITER_API_KEY` and will complete the instantiation if and only if `WRITER_API_KEY` has been defined. This notebook uses the [python-dotenv](https://pypi.org/project/python-dotenv/) library to automatically define environment variables based on the contents of a `.env` file in the same directory.

The `Writer()` initializer method also has an `api_key` parameter that you can use like this...

```
client = Writer(api_key="{Your Writer API key goes here}")
```

...but we strongly encourage you not to leave API keys in your source code.
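
If you would like a clearer error message when the key is missing, one option is to check the environment yourself before creating the client. The sketch below assumes only the standard library; `require_api_key()` is a hypothetical helper, not part of the Writer SDK.

```python
import os

def require_api_key(name: str = "WRITER_API_KEY") -> str:
    """Return the value of the environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of the Writer SDK.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Define it in your .env file or your shell."
        )
    return value
```

You could call `require_api_key()` after loading the `.env` file and just before `client = Writer()`. Avoid printing the returned value, since it is a secret.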

In [None]:
# Run this cell before running any other cells in this cookbook!

from writerai import Writer

# Load environment variables from .env file
%reload_ext dotenv
%dotenv

client = Writer()

<a id="the-chat-object"></a>
## **The `chat` object**

Now that you have a Writer client instance, it’s time to start building chat completion apps! 

The `chat` property of a Writer client instance contains methods and properties related to chat completion. In all the examples in this cookbook, you’ll build chat completion apps by using the `chat` property’s `chat()` method, which makes requests for chat completions from Palmyra, Writer's custom model.

<a id="single-turn-chat-completion-non-streaming-version"></a>
## **Single-turn chat completion (non-streaming version)**

The cell below contains a simple single-turn chat completion app. When you run it, you will be asked to enter a prompt. After you enter the prompt, you can expect to wait a moment or two while Palmyra generates the complete text of its response. Once Palmyra’s done generating, the app will display the response and finish running.

Try entering a simple question or command that can be answered or satisfied with just one reply (e.g. “What’s the fastest land animal?” or “I need some synonyms for ‘awesome’”).

In [None]:
initial_system_message = {
    "role": "system",
    "content": "You are a helpful assistant. Respond concisely and politely to user queries. Use clear, simple language. When asked for technical explanations, provide detailed and accurate information, but avoid jargon. If the user asks for assistance with a task, offer step-by-step guidance."
}
messages = [initial_system_message]

print("""
Sample single-turn chat completion app
======================================
""")

user_prompt = input("Enter a prompt: ").strip()
user_message = {
    "role": "user",
    "content": user_prompt
}
temperature = float(input("Enter a temperature (0.0 - 2.0), or just press 'Enter' for 1.0: ").strip() or 1.0)
messages.append(user_message)
chat_response = client.chat.chat(
    messages=messages,
    temperature=temperature,
    model="palmyra-x-004",
    stream=False
)
print(f"\n{chat_response.choices[0].message.content}\n\n")

### Notes

#### Messages

LLMs with conversational interfaces like chat completions keep track of conversations using a list of messages categorized by roles. With Palmyra, there are three roles:

1. **`user`**: This role is for messages containing user input or prompts, which Palmyra responds to. From the user’s point of view, `user` messages drive the conversation. An example of a `user` message is “What’s the fastest land animal?”
2. **`assistant`**: This role is for messages containing Palmyra’s responses to the user’s input. Palmyra is playing the role of AI assistant. An example of an `assistant` message is “The fastest land animal is the cheetah. It can reach speeds up to 70-75 mph (112-120 km/h) in short bursts.”
3. **`system`**: This role is for messages that function as additional instructions or that define how the assistant should behave during the conversation. A `system` message guides the behavior or personality of the assistant, specifying how Palmyra should respond, and is typically set at the start of a conversation. An example of a `system` message is “You are a knowledgeable assistant that provides concise answers to technical questions.”
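
As a rough sketch of how the three roles fit together, the snippet below builds a short conversation as a list of message dictionaries. `make_message()` and `VALID_ROLES` are hypothetical names introduced for this illustration; they only enforce the roles described above and are not part of the Writer SDK.

```python
# Roles described in the list above (hypothetical constant for this sketch).
VALID_ROLES = {"system", "user", "assistant"}

def make_message(role: str, content: str) -> dict:
    """Build a message dictionary, rejecting roles not described above."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return {"role": role, "content": content}

messages = [
    make_message(
        "system",
        "You are a knowledgeable assistant that provides concise answers "
        "to technical questions.",
    ),
    make_message("user", "What's the fastest land animal?"),
    make_message("assistant", "The fastest land animal is the cheetah."),
]

# Show the conversation in order, one message per line.
for message in messages:
    print(f"{message['role']:>9}: {message['content']}")
```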

#### Calling the `chat()` method

The heart of the single-turn chat completion app is this line:

```python
chat_response = client.chat.chat(
    messages=messages,
    temperature=temperature,
    model="palmyra-x-004",
    stream=False
)
```

It makes a call to the client instance’s `chat` object’s `chat()` method, which requests a chat completion from Palmyra. In order to get that completion, the call provides arguments for the following parameters:

<table width="66%">
    <tr>
        <th width="25%" style="background-color: #5551ff; color: #ffffff;">Parameter</th>
        <th style="background-color: #5551ff; color: #ffffff;">Description</th>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>messages</code></td>
        <td style="border: 1px solid #bfcbff;">
            <p>A list containing message dictionaries that Palmyra will use as a basis for the completion it will return as its response. Message dictionaries have the following keys:</p>
            <ul>
                <li><code>role</code>: Determines the message type. Valid values are <code>user</code>, <code>assistant</code>, and <code>system</code>. See <em>Messages</em> above for details.</li>
                <li><code>content</code>: The actual text of the message.</li>
            </ul>
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>temperature</code><br />(optional)</td>
        <td style="border: 1px solid #bfcbff;">
            <p>A float that controls the level of randomness in the text that Palmyra generates:</p>
            <ul>
                <li>The default value is 1.</li>
                <li>At temperatures <em>below</em> 1, the responses are more deterministic and predictable, with Palmyra tending to choose the highest probability tokens based on previously-generated ones. The generated output is predictable and repetitive, and produces more "safe" or "obvious" answers.</li>
                <li>At temperatures <em>above</em> 1, the responses are more random and “imaginative,”  with Palmyra giving less probable tokens a better chance of being chosen. The generated output is less predictable, and produces more “creative” answers. The results become increasingly nonsensical at temperatures of about 1.4 and higher, especially for longer completions.</li>
            </ul>
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>model</code></td>
        <td style="border: 1px solid #bfcbff;">
            A string specifying which model to use. In this case, we’re using the latest model
            at the time of writing, <code>palmyra-x-004</code>.
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>stream</code></td>
        <td style="border: 1px solid #bfcbff;">
            A boolean specifying if the method should stream the completion in chunks
            in real time as Palmyra generates it (<code>True</code>) or wait until Palmyra
            finishes generating the completion before returning a value (<code>False</code>).
            Since we want the completion all at once, we set this parameter to <code>False</code>.
        </td>
    </tr>
</table>
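
To build some intuition for the `temperature` parameter, here is a generic sketch of how temperature is commonly applied when turning token scores into probabilities (a temperature-scaled softmax). This is a textbook illustration with made-up scores, not a description of how Palmyra is implemented. The sketch rejects a temperature of 0 because it would divide by zero.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0 in this sketch")
    scaled = [score / temperature for score in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]

for temperature in (0.5, 1.0, 1.5):
    probabilities = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures spread it more evenly across the candidates.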

#### The response to the `chat()` method

When `chat()` is called with its `stream` parameter set to `False`, it returns a `Chat` object with the following properties:

<table  width="66%">
    <tr>
        <th width="25%" style="background-color: #5551ff; color: #ffffff;">Property</th>
        <th style="background-color: #5551ff; color: #ffffff;">Description</th>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>id</code></td>
        <td style="border: 1px solid #bfcbff;">A string containing the <code>Chat</code> object’s unique identifier.</td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>choices</code></td>
        <td style="border: 1px solid #bfcbff;">
            <p>A list of <code>Choice</code> objects representing possible completions (usually containing just one). Each <code>Choice</code> object has the following properties:</p>
            <ul>
                <li><code>finish_reason</code>: The reason Palmyra stopped generating the response. Possible values include <code>"stop"</code> for a complete response, and <code>"length"</code> if the response was truncated.</li>
                <li><code>message</code>: A <code>ChoiceMessage</code> object with two properties, <code>content</code> and <code>role</code>, which serve the same purpose as the <code>content</code> and <code>role</code> keys in a message dictionary.</li>
            </ul>
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>created</code></td>
        <td style="border: 1px solid #bfcbff;">An integer representing the time when the response was created as a Unix timestamp. You can use this to compare the timing of the response with timings of other events.
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>model</code></td>
        <td style="border: 1px solid #bfcbff;">A string specifying the model that generated the response.
        </td>
    </tr>
</table>

Of all the properties listed above, the one we’re most interested in is `choices`. Its first `Choice` object has a `message` property, and within the `ChoiceMessage` object it contains, we only need the `content` property, which holds the completion that Palmyra generated.
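
The sketch below walks through a response of this shape without calling the API. It uses `SimpleNamespace` objects as stand-ins for the real `Chat`, `Choice`, and `ChoiceMessage` objects, and every value in it is made up for illustration.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Stand-in with the same shape as the properties in the table above.
# All values are made up for illustration.
chat_response = SimpleNamespace(
    id="example-id",
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(
                role="assistant",
                content="The fastest land animal is the cheetah.",
            ),
        )
    ],
    created=1700000000,
    model="palmyra-x-004",
)

first_choice = chat_response.choices[0]
completion_text = first_choice.message.content

# A "length" finish reason means the response was cut off.
was_truncated = first_choice.finish_reason == "length"

# `created` is a Unix timestamp; convert it to a timezone-aware datetime.
created_at = datetime.fromtimestamp(chat_response.created, tz=timezone.utc)

print(completion_text)
print(f"Truncated: {was_truncated}")
print(f"Created (UTC): {created_at.isoformat()}")
```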

<a id="multi-turn-chat-completion-non-streaming-version"></a>
## **Multi-turn chat completion (non-streaming version)**

The cell below contains a basic multi-turn chat completion app. When you run it, you will be able to have an ongoing conversation with Palmyra until you enter a blank line, which stops the app. The app numbers each prompt as you enter it.

Like the single-turn app above, when this app requests a completion from Palmyra, there’s a pause while Palmyra generates the complete text of its response. Once Palmyra’s done generating, the app displays the response and prompts you for your next input.
 
Unlike the single-turn app, this app not only lets you enter more than one prompt, but also maintains a record of the conversation in the `messages` list — both the user’s messages (the ones where the value of the `"role"` key is `"user"`), and Palmyra’s replies (messages where the value of the `"role"` key is `"assistant"`). You can see the contents of `messages` while the app is running by entering `!messages` as a prompt (it will not count as part of the conversation).

Try entering a question or command first, and then enter follow-up requests to confirm that the app “remembers” previous parts of the conversation. For example, try “What’s the fastest bird?” for your first prompt, then “Tell me more” for your second prompt.

In [None]:
user_prompt_count = 1
initial_system_message = {
    "role": "system",
    "content": "You are a helpful assistant. Respond concisely and politely to user queries. Use clear, simple language. When asked for technical explanations, provide detailed and accurate information, but avoid jargon. If the user asks for assistance with a task, offer step-by-step guidance."
}
messages = [initial_system_message]

print("""
Sample multi-turn chat completion app
=====================================
""")
temperature = float(input("Enter a temperature (0.0 - 2.0) for the chat, or just press 'Enter' for 1.0: ").strip() or 1.0)

while True:
    user_prompt = input(f"[{user_prompt_count}]\nEnter a prompt: ").strip()
    if not user_prompt:
        break

    if user_prompt == "!messages":
        print("\nContents of `messages` (this will not be included as part of the conversation):")
        print("-------------------------------------------------------------------------------")
        print(f"{messages}\n\n")
        continue

    user_prompt_count += 1
    user_message = {
        "role": "user",
        "content": user_prompt
    }
    messages.append(user_message)
    
    chat_response = client.chat.chat(
        messages=messages,
        temperature=temperature,
        model="palmyra-x-004",
        stream=False
    )
    chat_response_role = chat_response.choices[0].message.role
    chat_response_content = chat_response.choices[0].message.content
    print(f"---\n{chat_response_content}\n\n")
    
    response_message = {
        "role": chat_response_role,
        "content": chat_response_content
    }
    messages.append(response_message)

### Notes

Aside from running within a loop, the key difference between this version and the single-turn version is that this version keeps track of the conversation by constantly growing the `messages` list. 

When the user enters a prompt, that prompt gets added to `messages` as a `user` message, and when Palmyra generates a response, it gets added to `messages` as an `assistant` message. Each time the app calls the `chat()` method, it provides the latest version of the `messages` list as the argument for the `messages` parameter, giving Palmyra a complete record of the conversation so far, and with it, context.
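
The pattern can be seen in isolation in the sketch below, which replaces the API call with `fake_chat()`, a hypothetical stand-in that simply reports how many `user` messages it received. Because the whole list is passed on every call, the stand-in "sees" the earlier turns.

```python
def fake_chat(messages: list) -> dict:
    """Hypothetical stand-in for the model: reports how much history it saw."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return {
        "role": "assistant",
        "content": f"I have seen {user_turns} user message(s) so far.",
    }

messages = [{"role": "system", "content": "You are a helpful assistant."}]

for prompt in ["What's the fastest bird?", "Tell me more"]:
    messages.append({"role": "user", "content": prompt})
    reply = fake_chat(messages)  # the whole history is passed on every call
    messages.append(reply)       # the reply becomes part of the history too

print(len(messages))
print([m["role"] for m in messages])
```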

<a id="single-turn-chat-completion-streaming-version"></a>
## **Single-turn chat completion (streaming version)**

The cell below is a _streaming_ version of the single-turn chat completion app. With this app, after you enter the prompt, the completion starts to appear almost immediately and grows as Palmyra generates it, much as it does in many AI chat apps.

Note that this version of the app uses the [Jupyter Widgets](https://ipywidgets.readthedocs.io/en/stable/) library to provide a graphical user interface. The reason is explained in the notes after the code.

In [None]:
from ipywidgets import Button, ButtonStyle, FloatSlider, Layout, Text, Textarea, VBox

def display_ui():
    prompt_text_box = Text(
        value="",
        placeholder="Enter your prompt here",
        description="Prompt:",
        style={"description_width": "100px"},
        layout=Layout(width="500px"),
        continuous_update=False,
        disabled=False   
    )
    temperature_slider = FloatSlider(
        min=0.0,
        max=2.0,
        value=1.0,
        step=0.1,
        orientation="horizontal",
        description="Temperature:",
        style={"description_width": "100px"},
        layout=Layout(width="500px"),
        continuous_update=False,
        readout=True,
        readout_format=".1f",
        disabled=False
    )
    submit_button = Button(
        description="Submit",
        tooltip="Click me",
        style=ButtonStyle(button_color="thistle", font_weight="bold"),
        layout = Layout(margin="0px 0px 0px 110px"),
        icon="upload",
        disabled=False
    )
    completion_text_area = Textarea(
        value="",
        placeholder="",
        description="Response:",
        layout=Layout(width="800px", height="200px", margin="10px 0px 0px 20px"),
        disabled=False
    )
    display(
        VBox(
            [
                prompt_text_box, 
                temperature_slider,
                submit_button,
                completion_text_area,
            ]
        )
    )
    return (prompt_text_box, temperature_slider, submit_button, completion_text_area)

def generate_completion(prompt_text_box, temperature_slider, submit_button, completion_text_area):   
    # Define messages
    initial_system_message = {
        "role": "system",
        "content": "You are a helpful assistant. Respond concisely and politely to user queries. Use clear, simple language. When asked for technical explanations, provide detailed and accurate information, but avoid jargon. If the user asks for assistance with a task, offer step-by-step guidance."
    }
    user_message = {
        "role": "user",
        "content": prompt_text_box.value.strip()
    }

    # Put UI in "generating" mode
    prompt_text_box.disabled = True
    temperature_slider.disabled = True
    submit_button.disabled = True
    submit_button.description = "Generating..."
    submit_button.icon = "hourglass"

    # Generate completion and display it
    chat_response = client.chat.chat(
        messages=[initial_system_message, user_message],
        temperature=temperature_slider.value,
        model="palmyra-x-004",
        stream=True
    )
    output_text = ""
    for chunk in chat_response:
        if chunk.choices[0]["delta"]["content"]:
            output_text += chunk.choices[0]["delta"]["content"]
        else:
            continue
        completion_text_area.value = output_text

    # Reset UI to "Awaiting user input" mode
    prompt_text_box.disabled = False
    temperature_slider.disabled = False
    submit_button.disabled = False
    submit_button.description = "Submit"
    submit_button.icon = "upload"


def main():
    prompt_text_box, temperature_slider, submit_button, completion_text_area = display_ui()
    submit_button.on_click(lambda button: generate_completion(prompt_text_box, temperature_slider, submit_button, completion_text_area))


main()

### Notes

#### A different argument for the `chat()` method

This version of the app calls the `chat()` method in pretty much the same way with one notable exception: the argument it provides for the `stream` parameter is `True`, which specifies that Palmyra should stream its responses as it generates them rather than waiting until the response has been completely generated before returning it:

```
chat_response = client.chat.chat(
    messages=[initial_system_message, user_message],
    temperature=temperature_slider.value,
    model="palmyra-x-004",
    stream=True
)
```

#### The response to the `chat()` method

When `chat()` is called with its `stream` parameter set to `True`, it returns a stream of `ChatStreamingData` objects whose properties differ slightly from those returned when `stream` is set to `False`:

<table  width="66%">
    <tr>
        <th width="25%" style="background-color: #5551ff; color: #ffffff;">Property</th>
        <th style="background-color: #5551ff; color: #ffffff;">Description</th>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>id</code></td>
        <td style="border: 1px solid #bfcbff;">A string containing the <code>Chat</code> object’s unique identifier.</td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>choices</code></td>
        <td style="border: 1px solid #bfcbff;">
            A list of dictionaries representing possible portions of the completion (usually containing just one). With streaming completions, the key that matters is <code>"delta"</code>, whose corresponding value is a small piece of the completion. It’s a dictionary with the keys <code>"content"</code> and <code>"role"</code>, which serve the same purpose as the <code>"content"</code> and <code>"role"</code> keys in a message dictionary.
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>created</code></td>
        <td style="border: 1px solid #bfcbff;">An integer representing the time when the response was created as a Unix timestamp. You can use this to compare the timing of the response with timings of other events.
        </td>
    </tr>
    <tr>
        <td style="border: 1px solid #bfcbff;"><code>model</code></td>
        <td style="border: 1px solid #bfcbff;">A string specifying the model that generated the response.
        </td>
    </tr>
</table>
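
Here is a minimal sketch of assembling a completion from chunks of this shape. `fake_chunk()` is a hypothetical stand-in that mimics the structure described in the table, and the stream is a plain list of made-up values. The empty and `None` entries are an assumption that mirrors the `if` guard in the app above, which skips chunks with no content.

```python
from types import SimpleNamespace

def fake_chunk(content):
    """Build a stand-in chunk shaped like the table above (values made up)."""
    return SimpleNamespace(
        choices=[{"delta": {"content": content, "role": "assistant"}}]
    )

# Assumption: some chunks carry no text, so the list includes None and "".
fake_stream = [fake_chunk(c) for c in [None, "The", " che", "et", "ah", "", "."]]

output_text = ""
for chunk in fake_stream:
    piece = chunk.choices[0]["delta"]["content"]
    if piece:  # skip chunks with empty or missing content
        output_text += piece

print(output_text)
```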

#### Why does this version use Jupyter Widgets?

It _is_ possible to simply use a `print()` function to display the response stream — it’s as simple as this:

```
for chunk in chat_response:
    print(chunk.choices[0]["delta"]["content"])
```

The `for` loop continues as long as the stream hasn’t finished, with each iteration happening when Palmyra generates the next part of its response. The problem is that if you use the `print()` function to display the chunks as they arrive, you get output that looks like this:

```
 The
 fastest
 land
 animal
 is
 the
 che
et
ah
.
 It
 can
 reach
 speeds
 up
 to
 
7
0
-
7
5
 miles
 per
 hour
...
```

The solution to this problem is to feed the stream into a UI component whose contents can be updated in real time. Fortunately, there’s Jupyter Widgets, a library that brings UI widgets to Jupyter Notebooks so that they can be as interactive as web and desktop applications.
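
The sketch below contrasts the two approaches using only the standard library. `FakeTextarea` is a hypothetical stand-in for a widget with an assignable `value` property, and the chunks are made up. The first loop captures what `print()` writes so that the newline it appends after each chunk is visible; the second loop overwrites a single value with the growing text.

```python
import io
from contextlib import redirect_stdout

chunks = [" The", " fastest", " land", " animal"]

# print() appends a newline after each chunk by default.
buffer = io.StringIO()
with redirect_stdout(buffer):
    for chunk in chunks:
        print(chunk)
printed = buffer.getvalue()

class FakeTextarea:
    """Hypothetical stand-in for a widget with an assignable `value`."""
    def __init__(self):
        self.value = ""

# Overwriting `value` with the growing text keeps everything in one place.
text_area = FakeTextarea()
output_text = ""
for chunk in chunks:
    output_text += chunk
    text_area.value = output_text

print(repr(printed))
print(repr(text_area.value))
```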

#### Drawing the UI

The `display_ui()` function creates four UI objects, or widgets:

- `prompt_text_box`: A text box where the user enters their prompt.
- `temperature_slider`: A slider that lets the user set the temperature within a range of 0.0 to 2.0 in increments of 0.1.
- `submit_button`: The user clicks this button to submit their prompt.
- `completion_text_area`: Palmyra’s response appears here.

The widgets are laid out inside a `VBox` layout container.

`display_ui()` also returns the `prompt_text_box`, `temperature_slider`, `submit_button`, and `completion_text_area` widgets as a tuple so that other code can reference them.

<a id="multi-turn-chat-completion-streaming-version"></a>
## **Multi-turn chat completion (streaming version)**

Here’s the streaming version of the multi-turn chat app. Like the streaming single-turn app, this application uses Jupyter Widgets. Note that this app has two text areas: one for the most recent response from Palmyra, and another one below it that contains the text of the entire conversation:

In [None]:
from ipywidgets import Button, ButtonStyle, FloatSlider, Layout, Text, Textarea, VBox

def display_ui():
    prompt_text_box = Text(
        value="",
        placeholder="Enter your prompt here",
        description="Prompt:",
        style={"description_width": "150px"},
        layout=Layout(width="500px"),
        continuous_update=False,
        disabled=False   
    )
    temperature_slider = FloatSlider(
        min=0.0,
        max=2.0,
        value=1.0,
        step=0.1,
        orientation="horizontal",
        description="Temperature:",
        style={"description_width": "150px"},
        layout=Layout(width="500px"),
        continuous_update=False,
        readout=True,
        readout_format=".1f",
        disabled=False
    )
    submit_button = Button(
        description="Submit",
        tooltip="Click me",
        style=ButtonStyle(button_color="thistle", font_weight="bold"),
        layout = Layout(margin='0px 0px 0px 160px'),
        icon="upload",
        disabled=False
    )
    completion_text_area = Textarea(
        value="",
        placeholder="",
        description="Current\nresponse:",
        style={"description_width": "150px"},
        layout=Layout(width="800px", height="200px", margin="10px 0px 10px 0px"),
        disabled=False
    )
    conversation_text_area = Textarea(
        value="",
        placeholder="",
        description="Full\nconversation:",
        style={"description_width": "150px"},
        layout=Layout(width="800px", height="200px", margin="10px 0px 10px 0px"),
        disabled=False
    )
    display(
        VBox(
            [
                prompt_text_box,
                temperature_slider,
                submit_button,
                completion_text_area,
                conversation_text_area
            ]
        )
    )
    return (prompt_text_box, temperature_slider, submit_button, completion_text_area, conversation_text_area)
    
def generate_completion(prompt_text_box, temperature_slider, submit_button, completion_text_area, conversation_text_area, messages): 
    # Add user prompt to messages
    user_prompt = prompt_text_box.value.strip()
    conversation_text_area.value += f"{user_prompt}\n\n"
    user_message = {
        "role": "user",
        "content": user_prompt
    }
    messages.append(user_message)

    # Put UI in "generating" mode
    prompt_text_box.disabled = True
    temperature_slider.disabled = True
    submit_button.disabled = True
    submit_button.description = "Generating..."
    submit_button.icon = "hourglass"

    # Generate completion and display it
    temperature = temperature_slider.value
    chat_response = client.chat.chat(
        messages=messages,
        temperature=temperature,
        model="palmyra-x-004",
        stream=True
    )
    output_text = ""
    for chunk in chat_response:
        if chunk.choices[0]["delta"]["content"]:
            output_text += chunk.choices[0]["delta"]["content"]
            completion_text_area.value = output_text
        else:
            continue

    # Add Palmyra’s response to the `messages` list
    model_message = {
        "role": "assistant",
        "content": output_text
    }
    messages.append(model_message)

    # Reset UI to "Awaiting user input" mode
    prompt_text_box.value = ""
    prompt_text_box.disabled = False
    temperature_slider.disabled = False
    submit_button.disabled = False
    submit_button.description = "Submit"
    submit_button.icon = "upload"

    # Update conversation text area
    conversation_text_area.value += f"[Temperature: {temperature:.1f}]\n{output_text}\n\n"

initial_system_message = {
    "role": "system",
    "content": "You are a helpful assistant. Respond concisely and politely to user queries. Use clear, simple language. When asked for technical explanations, provide detailed and accurate information, but avoid jargon. If the user asks for assistance with a task, offer step-by-step guidance."
}
messages = [initial_system_message]

prompt_text_box, temperature_slider, submit_button, completion_text_area, conversation_text_area = display_ui()
submit_button.on_click(lambda button: generate_completion(prompt_text_box, temperature_slider, submit_button, completion_text_area, conversation_text_area, messages))

<a id="for-more-information"></a>
## **For more information**

For more information about chat completions, the `chat` object, and its `chat()` method, see:

- [The _Chat completion_ guide](https://dev.writer.com/api-guides/chat-completion)
- [The completion API’s _Chat completion_ page](https://dev.writer.com/api-guides/api-reference/completion-api/chat-completion)
