Change method to calculate number of tokens for OpenAIChat (langchain-ai#1457)

Solves langchain-ai#1412

Currently `OpenAIChat` inherits its token-counting method,
`get_num_tokens`, from `BaseLLM`.
`OpenAI`, on the other hand, inherits it from `BaseOpenAI`.

`BaseOpenAI` and `BaseLLM` count tokens in different ways: the former
relies on `tiktoken`, the latter on `GPT2TokenizerFast`.
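To make the inheritance problem concrete, here is a minimal, stdlib-only sketch. The class names mirror the ones above, but the bodies are simplified stand-ins with none of the real model fields, and the two counting functions (`gpt2_style_count`, `tiktoken_style_count`) are hypothetical toys, not the actual `GPT2TokenizerFast` or `tiktoken` tokenizers. It shows how overriding `get_num_tokens` on the chat class routes it to the same path as `OpenAI`, with a fallback to the inherited method on older Python.

```python
import sys


def gpt2_style_count(text: str) -> int:
    # HYPOTHETICAL stand-in for the GPT2TokenizerFast path used by BaseLLM:
    # it just counts whitespace-separated words.
    return len(text.split())


def tiktoken_style_count(text: str) -> int:
    # HYPOTHETICAL stand-in for the tiktoken path used by BaseOpenAI:
    # it counts one "token" per four characters, rounding up.
    return (len(text) + 3) // 4


class BaseLLM:
    def get_num_tokens(self, text: str) -> int:
        return gpt2_style_count(text)


class BaseOpenAI(BaseLLM):
    # Overrides the base method, so every subclass gets the tiktoken-style count.
    def get_num_tokens(self, text: str) -> int:
        return tiktoken_style_count(text)


class OpenAI(BaseOpenAI):
    pass  # inherits BaseOpenAI.get_num_tokens


class OpenAIChat(BaseLLM):
    # Without this override, OpenAIChat would inherit BaseLLM.get_num_tokens.
    # The override mirrors the commit: prefer the tiktoken-style count, but
    # fall back to the inherited method on Python 3.8 or older.
    def get_num_tokens(self, text: str) -> int:
        if sys.version_info < (3, 9):
            return super().get_num_tokens(text)
        return tiktoken_style_count(text)


text = "count these tokens please"
print(BaseLLM().get_num_tokens(text))     # 4
print(OpenAI().get_num_tokens(text))      # 7
print(OpenAIChat().get_num_tokens(text))  # 7 on Python 3.9+, 4 on 3.8
```

With the override in place, `OpenAI` and `OpenAIChat` agree on supported Python versions, which is the consistency the PR is after.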

The motivation for this PR is:

1. Make token counting (`get_num_tokens`) consistent across the `OpenAI`
family, for both chat and non-chat models.
2. Prefer the `tiktoken` method because it is serverless friendly: it
doesn't require downloading models, a step that can make the
`GPT2TokenizerFast` approach incompatible with read-only filesystems.
juankysoriano authored and zachschillaci27 committed Mar 8, 2023
1 parent f9f20cc commit 427e61c
Showing 1 changed file with 22 additions and 0 deletions.
`langchain/llms/openai.py`

```diff
@@ -692,3 +692,25 @@ def _identifying_params(self) -> Mapping[str, Any]:
     def _llm_type(self) -> str:
         """Return type of llm."""
         return "openai-chat"
+
+    def get_num_tokens(self, text: str) -> int:
+        """Calculate num tokens with tiktoken package."""
+        # tiktoken NOT supported for Python 3.8 or below
+        if sys.version_info[1] <= 8:
+            return super().get_num_tokens(text)
+        try:
+            import tiktoken
+        except ImportError:
+            raise ValueError(
+                "Could not import tiktoken python package. "
+                "This is needed in order to calculate get_num_tokens. "
+                "Please install it with `pip install tiktoken`."
+            )
+        # create a GPT-3.5-Turbo encoder instance
+        enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
+
+        # encode the text using the GPT-3.5-Turbo encoder
+        tokenized_text = enc.encode(text)
+
+        # calculate the number of tokens in the encoded text
+        return len(tokenized_text)
```
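The guard in the diff, `sys.version_info[1] <= 8`, looks only at the minor version number. That is correct for every Python 3 release, but comparing the full `(major, minor)` pair is a common, more robust idiom. The sketch below contrasts the two; the function names `tiktoken_unsupported` and `minor_only_check` are hypothetical helpers for illustration, not part of langchain.

```python
import sys
from typing import Sequence


def tiktoken_unsupported(version: Sequence[int]) -> bool:
    """Return True for Python versions the commit treats as too old for tiktoken."""
    # Compare the (major, minor) pair as a tuple instead of inspecting only
    # the minor number, so a hypothetical 4.0 is not mistaken for "3.8 or below".
    return tuple(version[:2]) < (3, 9)


def minor_only_check(version: Sequence[int]) -> bool:
    # The check as written in the diff: sys.version_info[1] <= 8
    return version[1] <= 8


# sys.version_info supports slicing, so it can be passed in directly.
print(tiktoken_unsupported(sys.version_info))
```

Both functions agree on all 3.x versions; they only diverge for a major version other than 3, for example `(4, 0)`, where the minor-only check would wrongly fall back to `GPT2TokenizerFast`.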
