Closed
Labels
feature request (New feature or request), stale (Over 90 days of inactivity)
Description
🚀 The feature, motivation and pitch
Currently, there are two kinds of KV connector:
- One that offloads the KV cache to a KV connector for reuse
- One that transfers the KV cache from P (prefill) to D (decode)
However, only one KV connector can be specified, so these two kinds cannot coexist. I propose adding an abstraction, or a type such as {p2p, offload}, so that the needed KV connector can be looked up.
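To make the type-based lookup concrete, here is a minimal sketch of a registry that holds one connector per type. All names (`ConnectorRole`, `KVConnector`, `DummyConnector`, `ConnectorRegistry`) are hypothetical and used only for illustration; none of this is vLLM's actual connector API.

```python
from enum import Enum
from typing import Dict, Optional, Protocol


class ConnectorRole(Enum):
    """Hypothetical roles matching the {p2p, offload} types proposed above."""
    P2P = "p2p"
    OFFLOAD = "offload"


class KVConnector(Protocol):
    """Hypothetical minimal interface, not vLLM's real connector API."""

    def name(self) -> str: ...


class DummyConnector:
    """Stand-in connector used only to exercise the registry."""

    def __init__(self, name: str) -> None:
        self._name = name

    def name(self) -> str:
        return self._name


class ConnectorRegistry:
    """Holds at most one connector per role so both kinds can coexist."""

    def __init__(self) -> None:
        self._by_role: Dict[ConnectorRole, KVConnector] = {}

    def register(self, role: ConnectorRole, connector: KVConnector) -> None:
        if role in self._by_role:
            raise ValueError(f"a connector is already registered for {role.value}")
        self._by_role[role] = connector

    def get(self, role: ConnectorRole) -> Optional[KVConnector]:
        # Returns None when no connector of that role was configured.
        return self._by_role.get(role)


registry = ConnectorRegistry()
registry.register(ConnectorRole.P2P, DummyConnector("p2p-transfer"))
registry.register(ConnectorRole.OFFLOAD, DummyConnector("offload-store"))
```

With something like this, the P and D instances could ask for `ConnectorRole.P2P` or `ConnectorRole.OFFLOAD` independently instead of sharing a single connector slot.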
In my view, the ideal flow is:
1. P checks for a partial cache hit and runs prefill only for the part that missed the cache.
2. P transfers the minimal necessary KV cache (that is, the part the decode instance cannot obtain from the offload KV connector).
3. D receives KV cache from the offload connector and from P, and updates its KV cache.
4. D executes decode step by step until finished.
5. D saves the KV cache into the offload KV connector for reuse.
6. Go to 1.
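The flow above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the offload connector is modeled as a plain `dict`, KV data is a placeholder string, cache keys are token prefixes at block boundaries, and the helper names (`prefix_keys`, `compute_kv`, `prefill`, `decode`) are hypothetical. The actual decode loop (step 4) is elided.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, ...]   # token prefix ending at a block boundary
KV = str                # placeholder for real KV tensors


def prefix_keys(tokens: List[int], block_size: int) -> List[Key]:
    """One key per full block: the token prefix up to that block's end."""
    return [tuple(tokens[:end])
            for end in range(block_size, len(tokens) + 1, block_size)]


def compute_kv(key: Key) -> KV:
    """Pretend prefill computation for the block ending at this prefix."""
    return f"kv:{len(key)}"


def prefill(tokens: List[int], offload: Dict[Key, KV],
            block_size: int = 2) -> Dict[Key, KV]:
    """Steps 1-2: skip the cached prefix, compute and return only the misses."""
    keys = prefix_keys(tokens, block_size)
    hit = 0
    while hit < len(keys) and keys[hit] in offload:
        hit += 1
    # Only blocks that D cannot fetch from the offload store are transferred.
    return {key: compute_kv(key) for key in keys[hit:]}


def decode(tokens: List[int], offload: Dict[Key, KV],
           from_p: Dict[Key, KV], block_size: int = 2) -> Dict[Key, KV]:
    """Steps 3-5: merge KV from both sources, then save it back for reuse."""
    local: Dict[Key, KV] = {}
    for key in prefix_keys(tokens, block_size):
        local[key] = from_p[key] if key in from_p else offload[key]
    # Step 4 (token-by-token decoding) is elided in this sketch.
    offload.update(local)   # Step 5: persist for later requests.
    return local


offload_store: Dict[Key, KV] = {}
first = [1, 2, 3, 4]
sent_first = prefill(first, offload_store)      # cold cache: all blocks sent
decode(first, offload_store, sent_first)

second = [1, 2, 3, 4, 5, 6]
sent_second = prefill(second, offload_store)    # only the new block is sent
decode(second, offload_store, sent_second)
```

On the second request, P transfers only the block that the offload store does not already hold, which is the saving the proposal is after.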
Any thoughts on this?
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.