<a href="https://colab.research.google.com/github/ychoi-kr/LLM-API/blob/main/appendix/token_usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install tiktoken==0.7.0 google-generativeai==0.5.2

Collecting tiktoken==0.7.0
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB)
Installing collected packages: tiktoken
Successfully installed tiktoken-0.7.0

In [2]:
from google.colab import userdata

In [3]:
korean_text = "안녕하세요. 오늘도 좋은 하루 되시기 바랍니다."
english_text = "Hello. I hope you have a great day today."
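
As a baseline for the token counts that follow, it can help to know how long each sentence is in characters and in UTF-8 bytes. The following is a minimal sketch using only the standard library; `text_size` is a hypothetical helper that is not part of the original notebook.

```python
korean_text = "안녕하세요. 오늘도 좋은 하루 되시기 바랍니다."
english_text = "Hello. I hope you have a great day today."

def text_size(text):
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))

for label, text in [("korean", korean_text), ("english", english_text)]:
    chars, utf8_bytes = text_size(text)
    print(f"{label}: {chars} characters, {utf8_bytes} UTF-8 bytes")
```

The English sentence is pure ASCII, so its byte count equals its character count, while the Korean sentence needs more bytes than characters.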

In [4]:
import tiktoken

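def print_openai_token_count(text, model):
    """Print the text and its token count under the tiktoken encoding for `model`."""
    # encoding_for_model maps a model name to the encoding that model uses.
    encoding = tiktoken.encoding_for_model(model)
    print("text:", text)
    tokens = encoding.encode(text)
    print("Number of tokens:", len(tokens))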


In [5]:
print_openai_token_count(korean_text, "gpt-3.5-turbo")
print_openai_token_count(english_text, "gpt-3.5-turbo")

text: 안녕하세요. 오늘도 좋은 하루 되시기 바랍니다.
Number of tokens: 24
text: Hello. I hope you have a great day today.
Number of tokens: 11


In [6]:
print_openai_token_count(korean_text, "gpt-4o")
print_openai_token_count(english_text, "gpt-4o")

text: 안녕하세요. 오늘도 좋은 하루 되시기 바랍니다.
Number of tokens: 12
text: Hello. I hope you have a great day today.
Number of tokens: 11


In [7]:
import google.generativeai as genai
from google.generativeai.types import content_types

# Read the API key from Colab secrets and select the model.
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))
model = genai.GenerativeModel('gemini-pro')

def print_google_token_count(text):
    """Print the text and the token count reported by the Gemini API."""
    print("text:", text)
    # Convert the string into the content format before counting tokens.
    print(model.count_tokens(content_types.to_contents(text)))

print_google_token_count(korean_text)
print_google_token_count(english_text)


text: 안녕하세요. 오늘도 좋은 하루 되시기 바랍니다.
total_tokens: 17

text: Hello. I hope you have a great day today.
total_tokens: 11



In [8]:
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("upstage/solar-1-mini-tokenizer")

def print_solar_token_count(text):
    """Print the text and its token count under the Solar tokenizer."""
    print("text:", text)
    enc = tokenizer.encode(text)
    # enc.ids holds one id per token, so its length is the token count.
    print("Number of tokens:", len(enc.ids))

print_solar_token_count(korean_text)
print_solar_token_count(english_text)


text: 안녕하세요. 오늘도 좋은 하루 되시기 바랍니다.
Number of tokens: 10
text: Hello. I hope you have a great day today.
Number of tokens: 12
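
To compare the tokenizers side by side, the counts printed above can be collected into one table. This is a minimal sketch: the numbers are copied from the cell outputs in this notebook, and `recorded_counts` and `korean_to_english_ratio` are hypothetical names introduced only for illustration.

```python
# Token counts copied from the cell outputs above: (Korean, English).
recorded_counts = {
    "gpt-3.5-turbo (tiktoken)": (24, 11),
    "gpt-4o (tiktoken)": (12, 11),
    "gemini-pro (count_tokens)": (17, 11),
    "solar-1-mini-tokenizer": (10, 12),
}

def korean_to_english_ratio(korean_tokens, english_tokens):
    """Return how many Korean tokens were used per English token."""
    return korean_tokens / english_tokens

for name, (ko, en) in recorded_counts.items():
    ratio = korean_to_english_ratio(ko, en)
    print(f"{name:28s} ko={ko:2d} en={en:2d} ratio={ratio:.2f}")
```

A ratio above 1 means the Korean sentence used more tokens than the English one under that tokenizer. These figures describe only the two sample sentences in this notebook.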
