
rpc : send hash when tensor data is above some fixed threshold #12496

Merged: 5 commits merged into ggml-org:master on Mar 28, 2025

Conversation

@rgerganov (Collaborator)

ref #10095

@github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Mar 21, 2025
@rgerganov rgerganov marked this pull request as ready for review March 21, 2025 13:21
@rgerganov force-pushed the rpc-hash branch 3 times, most recently from b7bda76 to 3b5f524, on March 26, 2025 08:33
@rgerganov (Collaborator, Author)

I added support for caching tensors from GGUF files (via multiple -f args) and for specifying a local cache dir (the -d arg).

The problem with caching GGUF files is that they need to stay in RAM the whole time. I tried to keep a map of hash -> (gguf_path, tensor_name) and load tensors on demand but it turned out to be very slow. You can find this version of the patch here.

The good news is that caching large tensors in a local dir and loading them from there works fine.

I am inclined to drop support for caching GGUFs and leave only the cache dir support. Keeping GGUFs in memory also makes the reported available memory inaccurate with the CPU backend.
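
To make the idea more concrete, here is a minimal sketch of the server-side lookup: tensors whose payload exceeds a size threshold are referred to by hash, and the server tries to satisfy the request from a file in the local cache dir named after that hash. The 10 MiB threshold, the FNV-1a hash, and all helper names below are illustrative assumptions, not the exact code in this PR.

```cpp
// Sketch only: threshold value, FNV-1a hash, and the file-naming scheme are assumptions.
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>

namespace fs = std::filesystem;

static const size_t HASH_THRESHOLD = 10 * 1024 * 1024; // assumed cutoff: 10 MiB

// cheap non-cryptographic hash of the raw tensor bytes
static uint64_t fnv_hash(const uint8_t * data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// cache entries are plain files named after the 16-digit hex form of the hash
static fs::path cache_file(const fs::path & cache_dir, uint64_t hash) {
    char name[17];
    snprintf(name, sizeof(name), "%016" PRIx64, hash);
    return cache_dir / name;
}

// server side: try to satisfy a "set tensor by hash" request from the cache;
// on a miss the server would fall back to requesting the full tensor data
static bool load_cached(const fs::path & cache_dir, uint64_t hash, std::vector<uint8_t> & out) {
    std::ifstream f(cache_file(cache_dir, hash), std::ios::binary);
    if (!f) {
        return false;
    }
    out.assign(std::istreambuf_iterator<char>(f), std::istreambuf_iterator<char>());
    return true;
}

int main() {
    fs::path cache_dir = fs::temp_directory_path() / "rpc-cache-demo";
    fs::create_directories(cache_dir);

    std::vector<uint8_t> data(HASH_THRESHOLD + 1, 0xAB);    // pretend tensor payload
    uint64_t h = fnv_hash(data.data(), data.size());

    // client: the payload is above the threshold, so only the hash would go over
    // the wire; populate the cache here so the lookup below hits
    std::ofstream(cache_file(cache_dir, h), std::ios::binary)
        .write((const char *) data.data(), (std::streamsize) data.size());

    std::vector<uint8_t> loaded;
    bool hit = load_cached(cache_dir, h, loaded);
    printf("cache %s, %zu bytes\n", hit ? "hit" : "miss", loaded.size());
    return 0;
}
```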

@ggerganov (Member)

> I am inclined to drop support for caching GGUFs and leave only the cache dir support. Keeping GGUFs in memory also makes the reported available memory inaccurate with the CPU backend.

I'm OK with that - it will make the change even simpler.

@rgerganov (Collaborator, Author)

OK, I have left only the -d arg which specifies the cache dir.

Should we try to reuse fs_get_cache_directory() from common.h and put stuff under $HOME/.cache/llama.cpp/?

@ggerganov (Member)

> Should we try to reuse fs_get_cache_directory() from common.h and put stuff under $HOME/.cache/llama.cpp/?

Sounds good. Maybe instead of linking libcommon to rpc-server you can copy-paste the function for now.

@rgerganov (Collaborator, Author)

I changed the server to accept a -c, --cache flag which enables the local cache. Currently we use $HOME/.cache/llama.cpp/rpc/. Is using a subdir OK? We can also save to $HOME/.cache/llama.cpp/rpc-XXXXXXXXXXXXXXXX and keep the cache flat; I don't have strong preferences.

@ggerganov (Member)

> I changed the server to accept a -c, --cache flag which enables the local cache.

Yes, that's better - the cache path can be controlled with the env variable.

> Is using a subdir OK?

Yes.
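
For context, here is a minimal sketch of how the cache directory could be resolved, modeled on fs_get_cache_directory() from common.cpp: an environment-variable override first (assumed here to be LLAMA_CACHE, matching the "env variable" mentioned above), then the usual per-user cache location, with an rpc subdir appended. The exact fallback order is an assumption, not a quote of the final code.

```cpp
// Sketch only: the fallback order and the LLAMA_CACHE override are assumptions
// modeled on fs_get_cache_directory() in common.cpp.
#include <cstdio>
#include <cstdlib>
#include <filesystem>

namespace fs = std::filesystem;

static fs::path rpc_cache_directory() {
    fs::path base;
    if (const char * env = std::getenv("LLAMA_CACHE")) {
        base = env;                                    // explicit override wins
    } else if (const char * xdg = std::getenv("XDG_CACHE_HOME")) {
        base = fs::path(xdg) / "llama.cpp";
    } else if (const char * home = std::getenv("HOME")) {
        base = fs::path(home) / ".cache" / "llama.cpp";
    } else {
        base = fs::current_path();                     // last-resort fallback
    }
    return base / "rpc";                               // keep RPC blobs in their own subdir
}

int main() {
    fs::path dir = rpc_cache_directory();
    fs::create_directories(dir);
    printf("using cache dir: %s\n", dir.string().c_str());
    return 0;
}
```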


Review thread on the copied cache-directory helper in rpc-server.cpp:

namespace fs = std::filesystem;

// NOTE: this is copied from common.cpp to avoid linking with libllama
@ggerganov (Member)

Suggested change:
- // NOTE: this is copied from common.cpp to avoid linking with libllama
+ // NOTE: this is copied from common.cpp to avoid linking with libcommon

@rgerganov (Collaborator, Author)

OK, but I don't understand why putting common in target_link_libraries pulls in libllama as a dependency. I have a static libcommon.a built; why is it not possible to link against it?

@ggerganov (Member)

libcommon already links to libllama:

target_link_libraries (${TARGET} PRIVATE ${LLAMA_COMMON_EXTRA_LIBS} PUBLIC llama Threads::Threads)

So anything that links to libcommon will indirectly link to libllama. In theory, we can separate all "common" functionality that does not depend on libllama into a separate standalone common library that would be suitable to link in this case.

Btw, on master, the rpc-server example links to libllama, which is not necessary. You can simply remove this dependency:

diff --git a/examples/rpc/CMakeLists.txt b/examples/rpc/CMakeLists.txt
index ae48fb98d..892db89ea 100644
--- a/examples/rpc/CMakeLists.txt
+++ b/examples/rpc/CMakeLists.txt
@@ -1,2 +1,2 @@
 add_executable(rpc-server rpc-server.cpp)
-target_link_libraries(rpc-server PRIVATE ggml llama)
+target_link_libraries(rpc-server PRIVATE ggml)

@rgerganov (Collaborator, Author)

OK, fixed.

@rgerganov rgerganov merged commit ab6ab8f into ggml-org:master Mar 28, 2025
48 checks passed
Labels: examples, ggml (changes relating to the ggml tensor library for machine learning)