-
Notifications
You must be signed in to change notification settings - Fork 276
Description
In UpdateWithResponse, we only use query to find the cache entry and then fill the response. However, it's possible for two queries with the same value to occur at the same time. For example, if they use different models, the cached result may be incorrect in this case.
semantic-router/src/semantic-router/pkg/extproc/response_handler.go
Lines 157 to 165 in 31069a9
| if exists && ctx.RequestQuery != "" && responseBody != nil { | |
| err := r.Cache.UpdateWithResponse(string(cacheID), responseBody) | |
| if err != nil { | |
| observability.Errorf("Error updating cache: %v", err) | |
| // Continue even if cache update fails | |
| } else { | |
| observability.Infof("Cache updated for request ID: %s", ctx.RequestID) | |
| } | |
| } |
(here cacheID is query)
I propose using ctx.RequestID to locate the correct cache because it's unique. That is, I will add RequestID to the CacheEntry struct. Also, we don't need to return cacheID anymore.
By the way, why do we use this two-step design? It seems like if the second step (filling the LLM response) fails, the first step is useless. Why don't we add the cache entry only when we get the full response?