AI Memory Overflow

This project explores how AI models handle context by testing prompts that are longer than their context limits. Contributions welcome!

Testing ChatGPT

Test Circumstances

The gpt-3.5-turbo model has a context length of 4096 tokens. Each test prompt was longer than 4096 tokens. It consisted of a list of blocks (5 characters each, separated by -) and asked the AI model to provide the first and last block in the list. To exceed the 4096-token limit, each prompt contained 973.33 blocks on average.
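A prompt like the one described above could be assembled roughly as follows. This is a minimal sketch, not the code in main.py: the function names, the alphanumeric alphabet, and the fixed seed are assumptions made for illustration, and the token counter is passed in so the sketch runs without tiktoken installed.

```python
import random
import string
from typing import Callable, List


def make_block(rng: random.Random, size: int = 5) -> str:
    """Return one random block of `size` characters (alphabet is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(size))


def build_block_list(
    count_tokens: Callable[[str], int],
    token_limit: int = 4096,
    size: int = 5,
    seed: int = 0,
) -> List[str]:
    """Append blocks until the "-"-joined list is longer than `token_limit` tokens."""
    rng = random.Random(seed)
    blocks: List[str] = []
    while True:
        blocks.append(make_block(rng, size))
        if count_tokens("-".join(blocks)) > token_limit:
            return blocks


# Stand-in counter (roughly 4 characters per token). For a real count, one
# could instead pass a tiktoken-based counter, for example:
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
#   count_tokens = lambda text: len(enc.encode(text))
def approx_tokens(text: str) -> int:
    return len(text) // 4


if __name__ == "__main__":
    blocks = build_block_list(approx_tokens)
    print(len(blocks), "blocks; first:", blocks[0], "last:", blocks[-1])
```

The number of blocks needed depends entirely on the counter, so the stand-in will not reproduce the 973.33 average reported here.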

So, how does ChatGPT perform over a hundred tests?


At most, ChatGPT retained context for 79.45% of the blocks in a prompt. Across the 100 tests, it retained 75.99% on average and 13.04% in the worst case.

Note that these results pertain only to the gpt-3.5-turbo model. The test results also exclude instances where ChatGPT responded with code instead of a direct answer.

When the model responded with a block that was not present in the list, I retested the prompt in a new conversation. When ChatGPT tried to fill in the remaining characters and produced an invalid block, I tested with a new prompt instead.

In cases where it responded with a truncated block or a block with incorrect capitalization, the correct, complete version of the block was counted.

Also, with a prompt of 3000 tokens (below the 4096-token limit), ChatGPT remembers context for all blocks, responding with the correct first and last block every time.
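The leniency rule for truncated or miscapitalized blocks was applied by hand in these tests. If one wanted to automate it, a sketch might look like the following. The function name is hypothetical, and treating a truncated block as a unique prefix of the real block is an assumption about what "truncated" means here.

```python
from typing import List, Optional


def resolve_block(answer: str, blocks: List[str]) -> Optional[int]:
    """Return the 0-based index of the block that `answer` refers to, or None.

    Matching ignores capitalization, and a truncated answer is accepted
    when it is a prefix of exactly one block in the list.
    """
    needle = answer.strip().lower()
    if not needle:
        return None

    lowered = [block.lower() for block in blocks]

    # Exact match, ignoring capitalization (first occurrence wins).
    if needle in lowered:
        return lowered.index(needle)

    # Truncated answer: accept it only if it identifies a single block.
    candidates = [i for i, block in enumerate(lowered) if block.startswith(needle)]
    if len(candidates) == 1:
        return candidates[0]

    # Not in the list, or ambiguous: the test would need to be rerun.
    return None
```

An answer that resolves to None corresponds to the cases above where the prompt was retested or replaced.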

Try it yourself!

To run a test on an OpenAI model (the script uses OpenAI's tiktoken library), follow these steps:

1. Install the requirements with `pip install tiktoken`.
2. Clone this repo into a folder (`git clone https://github.com/terminalcommandnewsletter/ai-memory-overflow.git`), as the script requires the files in utils/ to run.
3. In the terminal, run `python main.py -m [MODEL]`. Requires Python 3, since tiktoken does not support Python 2.

To see all options, run `python main.py -h`.

Use the --old/-u flag to reproduce the concatenation used in the original tests (it is not perfect, since it adds the separator at the beginning).

To check the index of a block, use the --check/-x flag. It asks for the last block the AI sees and prints the result as "<block index>,<number of blocks>,<percentage blocks remembered>".
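The output format of the check can be illustrated with a short sketch. This is not the implementation behind --check: the function name is hypothetical, and both the 1-based index and the formula (index divided by number of blocks) are assumptions.

```python
from typing import List


def check_line(blocks: List[str], last_seen: str) -> str:
    """Format "<block index>,<number of blocks>,<percentage blocks remembered>".

    Assumes a 1-based index and percentage = index / total * 100.
    Raises ValueError if `last_seen` is not in `blocks`.
    """
    index = blocks.index(last_seen) + 1
    total = len(blocks)
    percentage = index / total * 100
    return f"{index},{total},{percentage:.2f}"


if __name__ == "__main__":
    # Illustrative blocks only, not real test data.
    print(check_line(["aaaaa", "bbbbb", "ccccc", "ddddd"], "ccccc"))
```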

If you have extensively tested a model that is not yet in the repo, you can contribute your results to the repo.

Final Test Results

| Test Number | Tested in | Model Tested | Context Length | Number of Characters per Block | Average (Mean) % Remembered | Lowest % Remembered | Highest % Remembered | Standard Deviation | Number of Tests | Tested by | Test data (without headers, format: `"last" block index,number of blocks,percentage remembered`) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | May 2023 | gpt-3.5-turbo | 4096 tokens | 5 | 75.99% | 13.04% | 79.45% | 11.24% | 100 | @terminalcommandnewsletter | Test data |
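The summary columns in this table can be recomputed from rows in the stated test data format. The sketch below assumes one row per test and uses the population standard deviation; the source does not say whether the population or sample formula was used. The function name and the sample rows are made up for illustration and are not taken from the real test data.

```python
import statistics
from typing import Dict, Iterable


def summarize(rows: Iterable[str]) -> Dict[str, float]:
    """Summarize rows of the form "index,number of blocks,percentage"."""
    percentages = [
        float(row.split(",")[2].strip().rstrip("%"))
        for row in rows
        if row.strip()  # skip blank lines
    ]
    return {
        "mean": statistics.mean(percentages),
        "lowest": min(percentages),
        "highest": max(percentages),
        # Population standard deviation (an assumption, see above).
        "stdev": statistics.pstdev(percentages),
        "tests": len(percentages),
    }


if __name__ == "__main__":
    # Illustrative rows only.
    sample_rows = ["1,4,25.00", "2,4,50.00", "3,4,75.00"]
    for name, value in summarize(sample_rows).items():
        print(f"{name}: {value:.2f}")
```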

License

The license for this repo can be found in the COPYING file.

This program comes with ABSOLUTELY NO WARRANTY; for details, check COPYING. This is free software, and you are welcome to redistribute it under certain conditions; for details, check COPYING.
