
Conversation

tc-wolf (Owner) commented Oct 15, 2024

Allow for saving / reloading the KV cache without saving the logits, since they're not needed for my use case + sampling strategy.

- Create `reload_from_cache_state` method (see the sketch below)
- Still using `LlamaState` as container
- Use low-level `ctx.get_logits_ith` to get the last calculated logits.
- Add `StateReloadError` so that reloading can be fallible.
- Change Llama class to use this instead of `load_state` directly.
- Default implementation still uses `load_state`.
- Use `ptr.contents`, not `ptr`, in `np.array`
- Get dtype from the annotated return type of the signature
- Explicitly set copy=True and dtype on np.array
  - Should not strictly be necessary since pointer is typed
- Catch `StateReloadError` and add logging if it's run into at runtime.
- Fix loading state (`from_buffer` -> `from_buffer_copy`, since `bytes` aren't mutable)
- Add tests (E2E, errors when should, reloads successfully, logits correct, etc.)

Have to set `LLAMA_TEST_MODEL` to point to the model path in order for these tests to run.
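
For context, here's a rough sketch of what this reload path can look like (not the literal diff). `StateReloadError`, `reload_from_cache_state`, and reading the last logits via the low-level `llama_get_logits_ith` binding come straight from the list above; the helper name `last_logits_from_ctx`, the `llama._ctx.ctx` attribute path, and the exact error handling are illustrative assumptions.

```python
import ctypes

import numpy as np
import llama_cpp
from llama_cpp.llama import Llama, LlamaState


class StateReloadError(Exception):
    """Raised when a cached KV state cannot be restored into the current context."""


def last_logits_from_ctx(llama: Llama) -> np.ndarray:
    """Re-read the logits of the last evaluated token from the llama.cpp context.

    Mirrors the "ptr.contents + copy=True + explicit dtype" handling described
    above: the raw float* is viewed as a fixed-size ctypes array and copied
    into a fresh numpy array.
    """
    n_vocab = llama.n_vocab()
    # llama._ctx.ctx is assumed to be the raw llama_context handle here; -1 asks
    # llama.cpp for the logits of the last token that produced any.
    ptr = llama_cpp.llama_get_logits_ith(llama._ctx.ctx, -1)
    if not ptr:
        raise StateReloadError("No logits available in the context to re-read")
    arr_type = ctypes.c_float * n_vocab
    contents = ctypes.cast(ptr, ctypes.POINTER(arr_type)).contents
    return np.array(contents, dtype=np.single, copy=True)


def reload_from_cache_state(llama: Llama, state: LlamaState) -> np.ndarray:
    """Restore a cached state that may have been saved without logits.

    The default path is still load_state(); the last-token logits are then
    re-derived from the context so sampling can continue even though the full
    scores matrix was not persisted.
    """
    try:
        llama.load_state(state)
    except Exception as exc:
        raise StateReloadError(f"Could not reload cached KV state: {exc}") from exc
    return last_logits_from_ctx(llama)
```

At the call site the `Llama` class catches `StateReloadError` and logs it instead of hard-failing, per the checklist item above.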
- Check when saving that the model doesn't need logits (see the sketch after this list)
- Add note in `reload_from_cache_state` to revisit
- Make default to *not* save logits
  - Error if logits are needed and `save_logits` is False in `build_cache`
- Handle reloading with/without scores as needed and available
- Add more tests
- Make `llama_state` / `small_model` fixtures module-scoped (so they don't need to be reloaded for each test); see the fixture sketch after this list
  - Setting the env var in a `.env` file
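
The save side could then look roughly like the following, assuming a `build_cache_state`-style helper (the actual change here lands in `build_cache`). The `save_logits` default and the "error if logits are needed" behaviour come from the list above; the `context_params.logits_all` check and the empty-`scores` convention are assumptions for illustration.

```python
from llama_cpp.llama import Llama, LlamaState


def build_cache_state(llama: Llama, save_logits: bool = False) -> LlamaState:
    """Snapshot the KV cache; by default the per-token logits are dropped.

    The scores matrix (n_tokens x n_vocab float32) dominates the size of a
    saved state; for greedy / temperature sampling only the last row is ever
    needed, and that can be re-read from the context on reload.
    """
    if not save_logits and llama.context_params.logits_all:
        # logits_all means the caller wants logits for every position (e.g.
        # for logprobs); those can't be reconstructed after a reload.
        raise ValueError("Model needs all logits saved; pass save_logits=True")
    state = llama.save_state()
    if not save_logits:
        # Keep the dtype / column count, but zero rows, so the reload path can
        # tell that scores were not persisted and must be re-derived.
        state.scores = state.scores[:0, :].copy()
    return state
```

And a sketch of the module-scoped test fixtures: `small_model` / `llama_state` are the fixture names from the list above, while the skip marker and constructor arguments are illustrative (reading `LLAMA_TEST_MODEL` from a `.env` file typically relies on something like pytest-dotenv).

```python
import os

import pytest
from llama_cpp import Llama

MODEL_PATH = os.environ.get("LLAMA_TEST_MODEL")

# Skip the whole module rather than failing when no test model is configured.
pytestmark = pytest.mark.skipif(MODEL_PATH is None, reason="LLAMA_TEST_MODEL not set")


@pytest.fixture(scope="module")
def small_model() -> Llama:
    # Module scope: the GGUF file is loaded once and shared by every test in
    # the module instead of being reloaded per test.
    return Llama(model_path=MODEL_PATH, n_ctx=512, verbose=False)


@pytest.fixture(scope="module")
def llama_state(small_model: Llama):
    # Evaluate a short prompt once and snapshot it for the reload tests.
    small_model(prompt="Hello", max_tokens=1)
    return small_model.save_state()
```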
tc-wolf merged commit 6634017 into bumped_llama_cpp_with_disk_cache Oct 21, 2024
tc-wolf deleted the optimize_kv_cache_size branch October 21, 2024 16:35