## Key topics:

**Tokens**: the basic units of text or code that an LLM processes and generates.

**Tokenization**: splitting input and output text into tokens.

**Vocabulary size**: the number of distinct tokens in a model's tokenizer, which varies among GPT models.

**Tokenization cost**: the number of tokens in a request affects the memory and computational resources a model needs, which in turn influences the cost and performance of running an Azure OpenAI model.

In [None]:
# pip install tiktoken  # The open source version of tiktoken can be installed from PyPI

In [4]:
import tiktoken

# Load the base encoding
cl100k_base = tiktoken.get_encoding("cl100k_base")

# Build a new encoding that extends cl100k_base with the chat special tokens
enc = tiktoken.Encoding(
    name="gpt-35-turbo",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
    },
)

# Encode a sample sentence into token IDs
tokens = enc.encode(
    "The road to creating new medicines and vaccines has traditionally been long and winding!"
)

print('Total number of tokens:', len(tokens))
# Decode each ID on its own to see the text piece it stands for
print('Tokens : ', [enc.decode([t]) for t in tokens])
print("Tokens' numerical values:", tokens)

# See also: https://platform.openai.com/tokenizer

Total number of tokens: 15
Tokens :  ['The', ' road', ' to', ' creating', ' new', ' medicines', ' and', ' vaccines', ' has', ' traditionally', ' been', ' long', ' and', ' winding', '!']
Tokens' numerical values: [791, 5754, 311, 6968, 502, 39653, 323, 40300, 706, 36342, 1027, 1317, 323, 54826, 0]


In [6]:
from openai import AzureOpenAI
from azure.identity import ManagedIdentityCredential

# Authenticate with a user-assigned managed identity and request an access token
default_credential = ManagedIdentityCredential(client_id="d30cba06-04c1-4065-a91d-8b7ce3b07b78")
token = default_credential.get_token("https://cognitiveservices.azure.com/.default")
resource_endpoint = "https://openaiykus.openai.azure.com/"

# Pass the access token as azure_ad_token (not api_key) so it is sent as a bearer token
client = AzureOpenAI(
    azure_endpoint=resource_endpoint,
    azure_ad_token=token.token,
    api_version="2023-05-15",
)

In [7]:
# The deployment name is the custom name you chose when you deployed the model.
deployment_name = 'gpt-35-turbo-instruct'

# Send a completion call to generate an answer
print('Sending a test completion job')
start_phrase = 'Help with the cost of living. '
response = client.completions.create(
    model=deployment_name,
    prompt=start_phrase,
    max_tokens=10,  # cap the completion at 10 tokens
)
print(response.choices[0].text)

Sending a test completion job

Here are a few tips for managing the cost


# Usage

In [8]:
response

Completion(id='cmpl-8MjFMIEUxLhergfxuoP7OgFIsEZeN', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=None, text='\nHere are a few tips for managing the cost')], created=1700427836, model='gpt-35-turbo-instruct', object='text_completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=8, total_tokens=18))

In [10]:
response.usage

CompletionUsage(completion_tokens=10, prompt_tokens=8, total_tokens=18)
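The token counts in the usage output above are what a bill would be based on. As a rough sketch of that arithmetic, the snippet below multiplies the prompt and completion counts by a price per 1,000 tokens. The function name `estimate_cost` and both prices are hypothetical placeholders, not actual Azure OpenAI rates; substitute the figures for your own deployment.

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  price_per_1k_prompt, price_per_1k_completion):
    """Estimate the cost of one call from its token counts.

    Prices are per 1,000 tokens and must be supplied by the caller.
    """
    prompt_cost = prompt_tokens / 1000 * price_per_1k_prompt
    completion_cost = completion_tokens / 1000 * price_per_1k_completion
    return prompt_cost + completion_cost


# Token counts copied from the usage output above.
# The two prices are made-up placeholders for illustration only.
cost = estimate_cost(
    prompt_tokens=8,
    completion_tokens=10,
    price_per_1k_prompt=0.001,
    price_per_1k_completion=0.002,
)
print(f"Estimated cost: {cost:.6f}")
```

With a live response object, `response.usage.prompt_tokens` and `response.usage.completion_tokens` could be passed in place of the literal counts.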

Azure OpenAI uses a subword tokenization method called Byte-Pair Encoding (BPE) for its GPT-based models. **BPE repeatedly merges the most frequently occurring pair of adjacent characters or bytes into a single token**, until a target vocabulary size is reached. Because any word can be built from smaller subword tokens, BPE helps the model handle rare or unseen words and produces more compact and consistent representations of text. It also allows the model to generate new words by combining existing tokens.

https://learn.microsoft.com/en-us/semantic-kernel/prompt-engineering/tokens
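To make the merge step concrete, here is a toy sketch of BPE training in plain Python. It is a simplified illustration, not the tokenizer Azure OpenAI or tiktoken actually uses: it starts from characters rather than bytes, skips the regex pre-splitting step, and the function name `train_bpe` is made up for this example.

```python
from collections import Counter


def train_bpe(text, num_merges):
    """Learn up to num_merges merge rules, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        # Count every pair of adjacent tokens
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        # Pick the most frequent pair (first one seen wins ties)
        (left, right), count = max(pairs.items(), key=lambda kv: kv[1])
        if count < 2:
            break  # nothing repeats, so no merge is worthwhile
        merges.append((left, right))
        # Rewrite the token list, replacing each occurrence of the pair
        merged = []
        i = 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == left and tokens[i + 1] == right:
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return merges, tokens


merges, tokens = train_bpe("low lower lowest", num_merges=2)
print(merges)  # [('l', 'o'), ('lo', 'w')]
print(tokens)  # ['low', ' ', 'low', 'e', 'r', ' ', 'low', 'e', 's', 't']
```

After two merges the frequent fragment "low" has become a single token, while the rarer endings are still spelled out character by character. This is how a subword vocabulary can cover words it has never seen as a whole.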