# AI Memory Overflow
A project by @tercmd (on Twitter)
Understanding the memory context of AI models by testing prompts that exceed their context limits. Contributions welcome!
## Table of Contents
- Testing ChatGPT
- Try it yourself!
- Final Test Results
## Testing ChatGPT
The gpt-3.5-turbo model has a context length of 4096 tokens. The prompt, larger than 4096 tokens, consisted of blocks (5 characters each, separated by `-`), and the model was asked to provide the first and last block in the list. At this context length, each prompt averaged 973.33 blocks.
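A minimal sketch of how such a prompt payload could be built (the repo's actual generation code lives in `main.py`; the function name and alphabet here are assumptions for illustration):

```python
import random
import string

def make_blocks(n, block_len=5, seed=0):
    """Generate n random alphanumeric blocks of block_len characters each."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits
    return ["".join(rng.choices(alphabet, k=block_len)) for _ in range(n)]

blocks = make_blocks(973)            # roughly the average block count per prompt
payload = "-".join(blocks)           # blocks joined with the "-" separator
first, last = blocks[0], blocks[-1]  # the answer the model is asked to recall
```

Since the alphabet contains no `-`, splitting the payload on the separator recovers the original block list exactly.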
So, how does ChatGPT perform over a hundred tests?
At most, ChatGPT retains context for 79.45% of blocks.
Note that these results pertain only to the gpt-3.5-turbo model. Additionally, the test results do not account for instances when ChatGPT responded with code instead of a direct answer.
In instances where the model responded with blocks not present in the list, I either retested the prompt in a new conversation or, when ChatGPT attempted to fill in the remaining characters and produced an invalid block, tested with a new prompt.
In cases where it responded with a truncated block or one with incorrect capitalization, the correct, complete version of the block was counted.
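The lenient-matching rule described above can be sketched as follows (the function name and exact matching logic are my assumptions, not the repo's code):

```python
def resolve_response(response, blocks):
    """Map a possibly truncated or mis-capitalized model response to the
    actual block in the list, or return None if it matches nothing."""
    r = response.strip().lower()
    if not r:
        return None
    for block in blocks:
        # Exact case-insensitive match, or a truncated prefix of a real block:
        # either way, count the correct, complete version of the block.
        if block.lower() == r or block.lower().startswith(r):
            return block
    return None
```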
Also, at a context length of 3000 tokens (below the 4096-token limit), ChatGPT retained context for all blocks, responding with the correct first and last block every time.
## Try it yourself!
To run a test on a model (by OpenAI, as this uses their tiktoken library), follow these steps:
1. Install requirements using `pip install tiktoken`.
2. Clone this repo into a folder (`git clone https://github.com/terminalcommandnewsletter/ai-memory-overflow.git`), as the script requires files in the repo.
3. In the terminal, run `python main.py -m [MODEL]`. Requires Python 3.8 or later (the minimum supported by tiktoken).

To see all options, run `python main.py -h`. Use the `-u` flag to apply the same concatenation used in the testing (not perfect, since it adds the separator at the beginning). Also, to check the index of a block, use the `-x` flag, which asks for the last block the AI sees and prints the response as `<block index>,<number of blocks>,<percentage blocks remembered>`.
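The `-x` output is a plain comma-separated triple, so consuming it in another script is straightforward; a sketch (the parser below is my own, not part of the repo):

```python
def parse_x_output(line):
    """Split '<block index>,<number of blocks>,<percentage blocks remembered>'
    into typed fields: (int, int, float)."""
    index_s, total_s, pct_s = line.strip().split(",")
    return int(index_s), int(total_s), float(pct_s.rstrip("%"))

idx, total, pct = parse_x_output("773,973,79.45")
```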
If you have extensively tested a model that is not yet in the repo, you can contribute your results.
## Final Test Results
|Test Number|Tested in|Model Tested|Context Length|Number of Characters per Block|Average (Mean) % Remembered|Lowest % Remembered|Highest % Remembered|Standard Deviation|Number of Tests|Tested by|Test data (without headers)|
|---|---|---|---|---|---|---|---|---|---|---|---|
|1|May 2023|gpt-3.5-turbo|4096 tokens|5|75.99%|13.04%|79.45%|11.24%|100|@terminalcommandnewsletter|Test data|
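The table's summary statistics can be recomputed from raw per-test percentages with the standard library; a sketch (the sample numbers below are invented for illustration, and whether the repo uses sample or population standard deviation is an assumption):

```python
import statistics

# Hypothetical per-test "% remembered" values, NOT the real test data.
remembered = [79.45, 78.1, 76.3, 74.8, 13.04]

mean = statistics.mean(remembered)
lowest, highest = min(remembered), max(remembered)
stdev = statistics.stdev(remembered)  # sample standard deviation (assumption)
```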
The license for this repo can be found in the COPYING file.