## Key topics:

**Tokens**: Tokens are the units in which Azure OpenAI models process text. Each token is a whole word or a chunk of characters, and the model represents it internally as an integer ID. For English text, 1 token is approximately 4 characters or 0.75 words.

**Tokenization**: splitting input and output text into tokens so that an LLM can process it.

**Vocabulary size**: the number of distinct tokens in a model's tokenizer vocabulary, which varies among GPT models.
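The rule of thumb for English text given under **Tokens** can be turned into a quick estimator. The following is a minimal sketch, not a tokenizer: the function name `estimate_tokens` is hypothetical, and the two ratios are only the approximations quoted above, so the result will not match an exact count from a real tokenizer.

```python
import math


def estimate_tokens(text: str) -> dict:
    """Rough token estimates for English text from two rules of thumb."""
    n_chars = len(text)
    n_words = len(text.split())
    return {
        # Assumption: about 4 characters per token
        "by_chars": math.ceil(n_chars / 4),
        # Assumption: about 0.75 words per token
        "by_words": math.ceil(n_words / 0.75),
    }


sentence = (
    "The Very Group announces long term strategic partnership "
    "with Carlyle and IMI and a robust Q2 performance."
)
print(estimate_tokens(sentence))
```

An estimate like this is useful for a quick sanity check on prompt length; for exact counts, use the model's actual tokenizer.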

In [1]:
# pip install tiktoken  # The open-source version of tiktoken can be installed from PyPI

In [5]:
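import tiktoken

# Load the base encoding (cl100k_base)
cl100k_base = tiktoken.get_encoding("cl100k_base")

# Extend the base encoding with the chat special tokens.
# Note: _pat_str, _mergeable_ranks and _special_tokens are private attributes of the encoding object.
enc = tiktoken.Encoding(
    name="gpt-35-turbo",
    pat_str=cl100k_base._pat_str,
    mergeable_ranks=cl100k_base._mergeable_ranks,
    special_tokens={
        **cl100k_base._special_tokens,
        "<|im_start|>": 100264,
        "<|im_end|>": 100265,
    },
)

# Encode a sample sentence into a list of integer token IDs
tokens = enc.encode(
    "The Very Group announces long term strategic partnership with Carlyle and IMI and a robust Q2 performance."
)

print('Total number of tokens:', len(tokens))
# Decode each ID on its own to see the text chunk it stands for
print('Tokens : ', [enc.decode([t]) for t in tokens])
print("Tokens' numerical values:", tokens)

# Interactive tokenizer for comparison: https://platform.openai.com/tokenizer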

Total number of tokens: 21
Tokens :  ['The', ' Very', ' Group', ' announces', ' long', ' term', ' strategic', ' partnership', ' with', ' Carly', 'le', ' and', ' IM', 'I', ' and', ' a', ' robust', ' Q', '2', ' performance', '.']
Tokens' numerical values: [791, 15668, 5856, 48782, 1317, 4751, 19092, 15664, 449, 79191, 273, 323, 6654, 40, 323, 264, 22514, 1229, 17, 5178, 13]


In [1]:
import os

from dotenv import load_dotenv
from openai import AzureOpenAI

# Load AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_KEY from a local env file
load_dotenv("credentials.env")

# Set up the Azure OpenAI client.
# The endpoint can be found in the Azure Portal where the Azure OpenAI resource is created.
# It looks like https://xxxxxx.openai.azure.com/
client = AzureOpenAI(
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2024-02-15-preview",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
)

# This corresponds to the custom name you chose for your deployment when you deployed a model.
deployment_name = "gpt-35-turbo"
 

In [2]:
# Send a completion call to generate an answer
print('Sending a test completion job')
start_phrase = 'Help with the cost of living. '
response = client.completions.create(
    model=deployment_name,
    prompt=start_phrase,
    max_tokens=100,
)
print(response.choices[0].text)

Sending a test completion job
 Depending on where you live or want to live, you may be eligible for aid for childcare, groceries, housing, or other expenses.

Final Thoughts

In the end, there is no easy way to achieve financial stability when faced with unemployment. However, using your resources and taking advantage of these tips can be your ticket to surviving and thriving when life doesn’t go in the direction you wish it had. Further, by taking a proactive and open-minded approach, you may find financial independence and the possibility of


## Usage

In [3]:
response

Completion(id='cmpl-9QtJwQaMtbJRAjD5IhU1imqqvkDWt', choices=[CompletionChoice(finish_reason='length', index=0, logprobs=None, text=' Depending on where you live or want to live, you may be eligible for aid for childcare, groceries, housing, or other expenses.\n\nFinal Thoughts\n\nIn the end, there is no easy way to achieve financial stability when faced with unemployment. However, using your resources and taking advantage of these tips can be your ticket to surviving and thriving when life doesn’t go in the direction you wish it had. Further, by taking a proactive and open-minded approach, you may find financial independence and the possibility of', content_filter_results={'hate': {'filtered': False, 'severity': 'safe'}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}})], created=1716196208, model='gpt-35-turbo', object='text_completion', system_fingerprint=None, usage=Completio

In [4]:
response.usage

CompletionUsage(completion_tokens=100, prompt_tokens=8, total_tokens=108)
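The three usage fields are related in a simple way: the prompt and completion counts add up to the total, and a completion count equal to `max_tokens` lines up with the `finish_reason='length'` seen in the response above. The sketch below checks both using a stand-in `Usage` dataclass and a hypothetical helper `hit_token_limit`; neither is part of the `openai` library, they only mirror the fields printed above.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Stand-in mirroring the fields printed by response.usage."""
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int


def hit_token_limit(usage: Usage, max_tokens: int) -> bool:
    """True when the completion used the whole max_tokens budget."""
    return usage.completion_tokens >= max_tokens


# Values copied from the output above
usage = Usage(completion_tokens=100, prompt_tokens=8, total_tokens=108)

# Prompt and completion tokens together make up the total
print(usage.prompt_tokens + usage.completion_tokens == usage.total_tokens)

# The request used max_tokens=100, so the answer was cut off at the limit
print(hit_token_limit(usage, max_tokens=100))
```

Both checks print `True` for these values, which explains why the generated text above stops mid-sentence.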

Azure OpenAI uses a subword tokenization method called Byte-Pair Encoding (BPE) for its GPT-based models. **BPE repeatedly merges the most frequently occurring pair of adjacent characters or bytes into a single token** until a target vocabulary size is reached. This lets the model handle rare or unseen words by splitting them into known subword tokens, and it produces more compact and consistent representations of text. It also means the model can produce words it has not seen before by combining existing tokens.

https://learn.microsoft.com/en-us/semantic-kernel/prompt-engineering/tokens
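The merge procedure described above can be illustrated with a toy trainer. This is a minimal character-level sketch, not the tokenizer Azure OpenAI uses: production BPE tokenizers work on bytes and pre-split text with a regular expression, and the function names here (`most_frequent_pair`, `merge_pair`, `train_bpe`) are hypothetical.

```python
from collections import Counter


def most_frequent_pair(words):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merges from a whitespace-separated corpus."""
    # Start with every word split into single characters
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words


merges, words = train_bpe("low low low lower lowest", num_merges=2)
print(merges)
print(words)
```

In this sketch the vocabulary grows by one token per merge, so the number of merges plays the role of the target vocabulary size: after two merges on this tiny corpus, `low` is a single token and `lower` is split into `low`, `e`, `r`.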