
rpc : send hash when tensor data is above some fixed threshold #12496

Merged: 5 commits merged into ggml-org:master on Mar 28, 2025

Conversation

@rgerganov (Collaborator)

ref #10095

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 21, 2025
@rgerganov rgerganov marked this pull request as ready for review March 21, 2025 13:21
@rgerganov force-pushed the rpc-hash branch 3 times, most recently from b7bda76 to 3b5f524, on March 26, 2025 08:33
@rgerganov (Collaborator, Author)

I added support for caching tensors from GGUF files (via multiple -f args) and for specifying a local cache dir (the -d arg).

The problem with caching GGUF files is that they need to stay in RAM the whole time. I tried to keep a map of hash -> (gguf_path, tensor_name) and load tensors on demand but it turned out to be very slow. You can find this version of the patch here.

The good news is that caching large tensors in a local dir and loading them from there works fine.

I am inclined to drop support for caching GGUFs and leave only the cache dir support. Keeping GGUFs in memory also makes the reported available memory inaccurate with the CPU backend.
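
To make the idea more concrete, here is a minimal sketch of the server-side lookup: tensors whose payload exceeds a size threshold are referred to by hash, and the server tries to satisfy the request from a file in the local cache dir named after that hash. The 10 MiB threshold, the FNV-1a hash, and all helper names below are illustrative assumptions, not the exact code in this PR.

```cpp
// Sketch only: threshold value, FNV-1a hash, and the file-naming scheme are assumptions.
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

namespace fs = std::filesystem;

static const size_t HASH_THRESHOLD = 10 * 1024 * 1024; // assumed cutoff: 10 MiB

// cheap non-cryptographic hash of the raw tensor bytes
static uint64_t fnv_hash(const uint8_t * data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// cache entries are plain files named after the 16-digit hex form of the hash
static fs::path cache_file(const fs::path & cache_dir, uint64_t hash) {
    char name[17];
    snprintf(name, sizeof(name), "%016" PRIx64, hash);
    return cache_dir / name;
}

// server side: try to satisfy a "set tensor by hash" request from the cache;
// on a miss the server would fall back to requesting the full tensor data
static bool load_cached(const fs::path & cache_dir, uint64_t hash, std::vector<uint8_t> & out) {
    std::ifstream f(cache_file(cache_dir, hash), std::ios::binary);
    if (!f) {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>());
    return true;
}

int main() {
    fs::path cache_dir = fs::temp_directory_path() / "rpc-cache-demo";
    fs::create_directories(cache_dir);

    std::vector<uint8_t> data(HASH_THRESHOLD + 1, 0xAB);    // pretend tensor payload
    uint64_t h = fnv_hash(data.data(), data.size());

    // client: the payload is above the threshold, so only the hash would go over
    // the wire; populate the cache here so the lookup below hits
    std::ofstream(cache_file(cache_dir, h), std::ios::binary)
        .write((const char *) data.data(), (std::streamsize) data.size());

    std::vector<uint8_t> loaded;
    bool hit = load_cached(cache_dir, h, loaded);
    printf("cache %s, %zu bytes\n", hit ? "hit" : "miss", loaded.size());
    return 0;
}
```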

@ggerganov (Member)

> I am inclined to drop support for caching GGUFs and leave only the cache dir support. Keeping GGUFs in memory also makes the reported available memory inaccurate with the CPU backend.

I'm OK with that - it will make the change even simpler.

@rgerganov (Collaborator, Author)

OK, I have left only the -d arg which specifies the cache dir.

Should we try to reuse fs_get_cache_directory() from common.h and put stuff under $HOME/.cache/llama.cpp/?

@ggerganov (Member)

> Should we try to reuse fs_get_cache_directory() from common.h and put stuff under $HOME/.cache/llama.cpp/?

Sounds good. Maybe instead of linking libcommon to rpc-server you can copy-paste the function for now.

@rgerganov (Collaborator, Author)

I changed the server to accept a -c, --cache flag which enables the local cache. Currently we use $HOME/.cache/llama.cpp/rpc/. Is using a subdir OK? We can also save to $HOME/.cache/llama.cpp/rpc-XXXXXXXXXXXXXXXX and keep the cache flat; I don't have strong preferences.

@ggerganov (Member)

> I changed the server to accept a -c, --cache flag which enables the local cache.

Yes, that's better - the cache path can be controlled with the env variable.

> Is using a subdir OK?

Yes.
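
For context, here is a minimal sketch of how the cache directory could be resolved, modeled on fs_get_cache_directory() from common.cpp: an environment-variable override first (assumed here to be LLAMA_CACHE, matching the "env variable" mentioned above), then the usual per-user cache location, with an rpc subdir appended. The exact fallback order is an assumption, not a quote of the final code.

```cpp
// Sketch only: the fallback order and the LLAMA_CACHE override are assumptions
// modeled on fs_get_cache_directory() in common.cpp.
#include <cstdio>
#include <cstdlib>
#include <filesystem>

namespace fs = std::filesystem;

static fs::path rpc_cache_directory() {
    fs::path base;
    if (const char * env = std::getenv("LLAMA_CACHE")) {
        base = env;                                    // explicit override wins
    } else if (const char * xdg = std::getenv("XDG_CACHE_HOME")) {
        base = fs::path(xdg) / "llama.cpp";
    } else if (const char * home = std::getenv("HOME")) {
        base = fs::path(home) / ".cache" / "llama.cpp";
    } else {
        base = fs::current_path();                     // last-resort fallback
    }
    return base / "rpc";                               // keep RPC blobs in their own subdir
}

int main() {
    fs::path dir = rpc_cache_directory();
    fs::create_directories(dir);
    printf("using cache dir: %s\n", dir.string().c_str());
    return 0;
}
```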


Review thread on the copied cache-directory helper in rpc-server.cpp:

namespace fs = std::filesystem;

// NOTE: this is copied from common.cpp to avoid linking with libllama
@ggerganov (Member)

Suggested change:
- // NOTE: this is copied from common.cpp to avoid linking with libllama
+ // NOTE: this is copied from common.cpp to avoid linking with libcommon

@rgerganov (Collaborator, Author)

OK, but I don't understand why putting common in target_link_libraries pulls in libllama as a dependency. I have a static libcommon.a built; why is it not possible to link against it?

@ggerganov (Member)

libcommon already links to libllama:

target_link_libraries (${TARGET} PRIVATE ${LLAMA_COMMON_EXTRA_LIBS} PUBLIC llama Threads::Threads)

So anything that links to libcommon will indirectly link to libllama. In theory, we can separate all "common" functionality that does not depend on libllama into a separate standalone common library that would be suitable to link in this case.

Btw, on master, the rpc-server example links to libllama, which is not necessary. You can simply remove this dependency:

diff --git a/examples/rpc/CMakeLists.txt b/examples/rpc/CMakeLists.txt
index ae48fb98d..892db89ea 100644
--- a/examples/rpc/CMakeLists.txt
+++ b/examples/rpc/CMakeLists.txt
@@ -1,2 +1,2 @@
 add_executable(rpc-server rpc-server.cpp)
-target_link_libraries(rpc-server PRIVATE ggml llama)
+target_link_libraries(rpc-server PRIVATE ggml)

@rgerganov (Collaborator, Author)

OK, fixed.

@rgerganov rgerganov merged commit ab6ab8f into ggml-org:master Mar 28, 2025
48 checks passed
Labels: examples, ggml (changes relating to the ggml tensor library for machine learning)