# API endpoint prompt format testing
Experimenting with different prompt formats, mostly for Mistral models. Calling Together via the `/inference` endpoint bypasses their built-in prompt formatting for instruct/chat models, which gives us more control over the exact prompt string.

We can still instruct these models as usual, but we can also part-fill (prefill) the start of their response to guide the output.

Alternatively, just use the base models and ignore the instruction tuning completely. Instruction-tuned models can still do a lot of non-instruct-style things, but their logit distributions are altered relative to the base model. It is not yet clear in what ways this is better or worse for extracting the best reasoning and knowledge from them when you don't need structured conversations; this is still a research question.
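The part-filling idea above can be sketched as follows. This is a minimal illustration, not the notebook's own code: `build_prefilled_prompt` and `MISTRAL_INSTRUCT_TEMPLATE` are hypothetical names, and the single space placed between `[/INST]` and the prefix is an assumption about whitespace handling.

```python
# Hypothetical helper (not part of the notebook): builds a raw Mistral-instruct
# prompt and optionally part-fills the start of the model's response.
MISTRAL_INSTRUCT_TEMPLATE = "[INST] {prompt} [/INST]"


def build_prefilled_prompt(prompt: str, response_prefix: str = "") -> str:
    """Return the raw prompt string, with an optional part-filled response."""
    formatted = MISTRAL_INSTRUCT_TEMPLATE.format(prompt=prompt)
    if response_prefix:
        # The model continues from this prefix instead of starting its answer from scratch.
        formatted += " " + response_prefix
    return formatted


if __name__ == "__main__":
    print(build_prefilled_prompt("What's the Roland 808?", "The Roland TR-808 is"))
```

The resulting string would be sent as the `prompt` field of an `/inference` request, so the completion picks up right after the prefix.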

In [181]:
import json
import os
import sys
from typing import List, Dict, Any
import requests

In [13]:
# add the parent directory to the path so we can import the module
# sys.path.append(os.path.abspath('.'))
sys.path.append(os.getcwd())

In [14]:
from dotenv import load_dotenv
load_dotenv()

True

In [15]:
from llm_utils.endpoint_utils import rest_api_request

In [42]:
def create_json_data_together_base(model: str, max_tokens: int = 1024, temperature: float = 0.7, repetition_penalty: float = 1.0, top_p: float = 0.7, **kwargs) -> Dict[str, Any]:
    """Create the JSON payload shared by all request types; extra kwargs (e.g. stream) are passed through."""
    json_data: Dict[str, Any] = {
        "model": model,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "repetition_penalty": repetition_penalty,
        "top_p": top_p,
        **kwargs,
    }
    return json_data


def create_json_data_together_chat(model: str, messages: List[Dict[str, str]], max_tokens: int = 1024, temperature: float = 0.7, repetition_penalty: float = 1.0, top_p: float = 0.7, **kwargs) -> Dict[str, Any]:
    """Create the JSON payload for a chat request (a list of role/content message dicts)."""
    return create_json_data_together_base(model, max_tokens, temperature, repetition_penalty, top_p, messages=messages, **kwargs)


def create_json_data_together_inference(model: str, prompt: str, max_tokens: int = 1024, temperature: float = 0.7, repetition_penalty: float = 1.0, top_p: float = 0.7, **kwargs) -> Dict[str, Any]:
    """Create the JSON payload for a raw-prompt inference request."""
    return create_json_data_together_base(model, max_tokens, temperature, repetition_penalty, top_p, prompt=prompt, **kwargs)


In [43]:
fmt_prompt_mistral_instruct = """\
[INST] {prompt} [/INST] \
"""

In [46]:
prompt = "What's the Roland 808?"
prompt_str = fmt_prompt_mistral_instruct.format(prompt=prompt)

In [47]:
prompt_str

"[INST] What's the Roland 808? [/INST] "

In [64]:
BASE_URL = "https://api.together.xyz/inference"
API_KEY = os.getenv("TOGETHER_API_KEY")
MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"
# MODEL = "mistralai/Mixtral-8x7B-v0.1"

In [65]:
create_json_data_together_inference(MODEL, prompt_str)

{'model': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
 'max_tokens': 1024,
 'temperature': 0.7,
 'repetition_penalty': 1.0,
 'top_p': 0.7,
 'prompt': "[INST] What's the Roland 808? [/INST] "}

In [71]:
stream=False
response = rest_api_request(BASE_URL, create_json_data_together_inference(MODEL, prompt_str, stream=stream), API_KEY, stream=stream)

In [72]:
# response.content.decode("utf-8")
json_data = response.json()
print(json.dumps(json_data, indent=4, sort_keys=True))
print('\n\n' + json_data['output']['choices'][0]['text'])

{
    "args": {
        "max_tokens": 1024,
        "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
        "prompt": "[INST] What's the Roland 808? [/INST] ",
        "repetition_penalty": 1,
        "stream": false,
        "temperature": 0.7,
        "top_p": 0.7
    },
    "id": "84055db01d5179ae-LHR",
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "model_owner": "",
    "num_returns": 1,
    "output": {
        "choices": [
            {
                "text": "\tThe Roland TR-808 Rhythm Composer, commonly known as the Roland 808, is a programmable drum machine manufactured by the Roland Corporation between 1980 and 1984. It was designed to provide musicians with a simple and affordable way to create electronic drum patterns, and it quickly became a staple in many genres of music, including hip hop, house, and techno.\n\nThe 808 features a simple step-sequencer interface, which allows users to program drum patterns by entering notes on a 16-step grid. It includes a var

## Streaming client

In [182]:
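# Handling of Server-Sent Events (SSE) -- alternatively use sseclient-py (not sseclient)
def _process_sse_event(buffer: str) -> Dict[str, str]:
    """Parse one SSE event block ("field: value" lines) into a dict."""
    event_data: Dict[str, str] = {}
    for line in buffer.strip().split('\n'):
        # Skip blank lines, SSE comments (leading ':') and lines without a field separator
        if not line or line.startswith(':') or ':' not in line:
            continue
        key, value = line.split(':', 1)
        event_data[key.strip()] = value.strip()

    return event_data


def stream_sse(response: requests.Response):
    """Yield parsed SSE events from a streaming response."""
    # Make sure the connection is valid
    if response.status_code != 200:
        print(f"Connection failed with status code: {response.status_code}")
        return
    buffer = ''
    for line in response.iter_lines():
        if line:
            buffer += line.decode('utf-8') + '\n'
        elif buffer:
            # A blank line terminates the current event
            yield _process_sse_event(buffer)
            buffer = ''
    # Flush a final event that was not followed by a blank line
    if buffer:
        yield _process_sse_event(buffer)


def print_stream_sse(response: requests.Response):
    """Print streamed completion text as it arrives."""
    for chunk in stream_sse(response):
        # Assumes either together.ai or openai (vLLM might work too...)
        data_str = chunk.get('data')
        if not data_str:
            continue  # e.g. a keep-alive event that carries no data
        if data_str == '[DONE]':
            print("")
            break
        data = json.loads(data_str)
        print(data['choices'][0]['text'], end='', flush=True)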
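To make the event framing concrete, here is a small offline sketch that parses a hand-written SSE stream without touching the network. The sample payload is made up (it only mimics the `choices[0].text` shape read above), and `parse_sse_lines` and `collect_text` are hypothetical names, not part of the notebook or of any library. The byte lines stand in for what `response.iter_lines()` yields.

```python
import json
from typing import Dict, Iterable, Iterator


def parse_sse_lines(lines: Iterable[bytes]) -> Iterator[Dict[str, str]]:
    """Group raw SSE lines into events; a blank line ends an event."""
    event: Dict[str, str] = {}
    for raw in lines:
        line = raw.decode("utf-8")
        if not line:
            if event:
                yield event
                event = {}
            continue
        if line.startswith(":") or ":" not in line:
            continue  # comment / keep-alive line, or no field separator
        field, value = line.split(":", 1)
        event[field.strip()] = value.strip()
    if event:
        yield event  # final event without a trailing blank line


# Made-up sample stream, shaped like the chunks the notebook reads
SAMPLE_LINES = [
    b'data: {"choices": [{"text": "The Roland"}]}',
    b"",
    b": keep-alive",
    b'data: {"choices": [{"text": " TR-808"}]}',
    b"",
    b"data: [DONE]",
    b"",
]


def collect_text(lines: Iterable[bytes]) -> str:
    """Concatenate the text of every chunk until the [DONE] sentinel."""
    parts = []
    for event in parse_sse_lines(lines):
        data = event.get("data")
        if data is None:
            continue
        if data == "[DONE]":
            break
        parts.append(json.loads(data)["choices"][0]["text"])
    return "".join(parts)


print(collect_text(SAMPLE_LINES))
```

Splitting on the first colon only matters here because the JSON payload itself contains colons; splitting on every colon would break the `data` value apart.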

In [192]:
stream=True
response = rest_api_request(BASE_URL, create_json_data_together_inference(MODEL, prompt_str, stream=stream), API_KEY, stream=stream)

In [184]:
print(prompt_str)
print_stream_sse(response)

[INST] What's the Roland 808? [/INST] 
	The Roland TR-808 Rhythm Composer, commonly known as the Roland 808, is a classic analog drum machine that was first released by the Roland Corporation in 1980. It quickly became popular in the music industry due to its unique sound and versatility.

The 808 features a simple step-sequencer interface and a variety of drum sounds, including kick, snare, tom, rim shot, hand clap, and cymbal. The sounds can be tweaked and customized using various controls, such as tuning, decay, and attack.

One of the reasons the 808 became so iconic is its distinctive bass drum sound, which has a deep, boomy quality that is often associated with hip-hop and electronic music. The 808's cowbell and hand clap sounds are also widely used in many genres of music.

Although the 808 was eventually discontinued in 1983, its influence can still be heard in countless recordings and live performances today. The 808 has been emulated in many software plugins and sample pac

In [193]:
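# Accumulate the streamed text so it can be reused after the stream ends
full_text = ""
for chunk in stream_sse(response):
    data_str = chunk.get('data')
    if not data_str:
        continue  # skip events that carry no data
    if data_str == '[DONE]':
        print("")
        break
    text = json.loads(data_str)['choices'][0]['text']
    full_text += text
    print(text, end='', flush=True)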

	The Roland TR-808 Rhythm Composer, commonly known as the Roland 808, is a classic analog drum machine that was first released by the Roland Corporation in 1980. It quickly became popular in the music industry for its distinctive sound and has been used by countless musicians and producers to create some of the most iconic beats in music history.

The Roland 808 features a simple and intuitive interface with buttons and knobs that allow users to program and customize drum patterns. It includes a variety of drum sounds, such as bass drum, snare drum, handclap, cymbal, and more, which can be adjusted for pitch, decay, and tone. The 808 also has a built-in sequencer that can store up to 32 patterns, which can be chained together to create longer songs.

One of the reasons the Roland 808 has become so legendary is its unique sound. The bass drum in particular has a deep, punchy sound that has become a staple in many genres of music, including hip-hop, electronic dance musi

In [195]:
print(prompt_str + full_text)

[INST] What's the Roland 808? [/INST] 	The Roland TR-808 Rhythm Composer, commonly known as the Roland 808, is a classic analog drum machine that was first released by the Roland Corporation in 1980. It quickly became popular in the music industry for its distinctive sound and has been used by countless musicians and producers to create some of the most iconic beats in music history.

The Roland 808 features a simple and intuitive interface with buttons and knobs that allow users to program and customize drum patterns. It includes a variety of drum sounds, such as bass drum, snare drum, handclap, cymbal, and more, which can be adjusted for pitch, decay, and tone. The 808 also has a built-in sequencer that can store up to 32 patterns, which can be chained together to create longer songs.

One of the reasons the Roland 808 has become so legendary is its unique sound. The bass drum in particular has a deep, punchy sound that has become a staple in many genres of music, including hip-hop, 