Enable LRU eviction policy in ObjectCache#367
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical memory management issue in the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-designed feature: an LRU eviction policy for the ObjectCache, including new configuration options, a flexible eviction policy trait, an LRU implementation, and updates to the cache logic. However, the refactoring of ObjectCache methods has exposed or propagated several path traversal vulnerabilities. User-supplied identifiers like session_id, object_id, and key are used to construct file system paths without proper validation or sanitization, which could allow an attacker to read, write, or delete arbitrary files. Beyond this critical security concern, further improvements are needed to ensure consistent configuration parsing between YAML files and environment variables, clarify the impact of default behavior changes in the documentation, and improve code clarity and test robustness.
| ) -> Result<ObjectMetadata, FlameError> { | ||
| let object_id = object_id.unwrap_or_else(|| uuid::Uuid::new_v4().to_string()); | ||
| let key = format!("{}/{}", session_id, object_id); | ||
| let size = object.data.len() as u64; |
There was a problem hiding this comment.
The put_with_id function is vulnerable to path traversal. The session_id and object_id parameters, which are derived from user-supplied Arrow Flight descriptors, are used to construct file paths without validation. An attacker can use path traversal sequences (e.g., ..) to write files outside the intended storage directory.
| let object_path = storage_path.join(format!("{}.arrow", key)); | ||
|
|
||
| let file = fs::File::open(&object_path) | ||
| .map_err(|e| FlameError::NotFound(format!("Object file not found: {}", e)))?; | ||
| let reader = FileReader::try_new(file, None) | ||
| .map_err(|e| FlameError::Internal(format!("Failed to create reader: {}", e)))?; | ||
|
|
||
| let batch = reader | ||
| .into_iter() | ||
| .next() | ||
| .ok_or_else(|| FlameError::Internal("No batches in file".to_string()))? | ||
| .map_err(|e| FlameError::Internal(format!("Failed to read batch: {}", e)))?; | ||
|
|
||
| let object = batch_to_object(&batch) | ||
| .map_err(|e| FlameError::Internal(format!("Failed to parse batch: {}", e)))?; | ||
|
|
||
| Ok(object) | ||
| self.load_object_from_disk_internal(&object_path) |
There was a problem hiding this comment.
The load_object_from_disk function constructs a file path by joining storage_path with a user-supplied key. This is a classic path traversal vulnerability. You should validate that the resulting path is still within the storage_path.
| let object_path = storage_path.join(format!("{}.arrow", key)); | |
| let file = fs::File::open(&object_path) | |
| .map_err(|e| FlameError::NotFound(format!("Object file not found: {}", e)))?; | |
| let reader = FileReader::try_new(file, None) | |
| .map_err(|e| FlameError::Internal(format!("Failed to create reader: {}", e)))?; | |
| let batch = reader | |
| .into_iter() | |
| .next() | |
| .ok_or_else(|| FlameError::Internal("No batches in file".to_string()))? | |
| .map_err(|e| FlameError::Internal(format!("Failed to read batch: {}", e)))?; | |
| let object = batch_to_object(&batch) | |
| .map_err(|e| FlameError::Internal(format!("Failed to parse batch: {}", e)))?; | |
| Ok(object) | |
| self.load_object_from_disk_internal(&object_path) | |
| let object_path = storage_path.join(format!("{}.arrow", key)); | |
| if !object_path.starts_with(storage_path) { | |
| return Err(FlameError::InvalidConfig(format!("Invalid key: {}", key))); | |
| } | |
| self.load_object_from_disk_internal(&object_path) |
| Ok(Some(object)) | ||
| } | ||
|
|
||
| async fn get(&self, key: String) -> Result<Object, FlameError> { |
There was a problem hiding this comment.
The get function is vulnerable to path traversal. The key parameter, which is derived from user-supplied Arrow Flight tickets, is used to construct file paths in load_object_from_disk without validation. An attacker can provide a key like ../../etc/passwd to read arbitrary files on the system (with a .arrow extension).
| Err(FlameError::NotFound(format!("object <{}> not found", key))) | ||
| } | ||
|
|
||
| async fn update(&self, key: String, new_object: Object) -> Result<ObjectMetadata, FlameError> { |
| Ok(meta) | ||
| } | ||
|
|
||
| async fn delete(&self, session_id: SessionID) -> Result<(), FlameError> { |
| pub fn max_memory_bytes(&self) -> u64 { | ||
| let max_memory_str = self.max_memory.as_deref().unwrap_or(DEFAULT_MAX_MEMORY); | ||
| max_memory_str | ||
| .parse::<ByteSize>() | ||
| .map(|bs| bs.as_u64()) | ||
| .unwrap_or(1024 * 1024 * 1024) // 1G fallback | ||
| } |
There was a problem hiding this comment.
There is an inconsistency in how memory size strings are parsed. The parse_memory_size function in common/src/ctx.rs is designed to interpret units like 'G' as binary (GiB), which is desirable for memory sizes. However, this max_memory_bytes function uses str::parse::<ByteSize>() directly, which treats 'G' as decimal (GB). This leads to inconsistent behavior between configuration from a file (which uses the custom parser) and configuration from environment variables (which would be parsed here).
To ensure consistent behavior, this function should use the common::ctx::parse_memory_size function. You will need to add use common::ctx::parse_memory_size; to the top of the file.
pub fn max_memory_bytes(&self) -> u64 {
let max_memory_str = self.max_memory.as_deref().unwrap_or(DEFAULT_MAX_MEMORY);
common::ctx::parse_memory_size(max_memory_str).unwrap_or(1024 * 1024 * 1024) // 1G fallback
}| let ctx = FlameClusterContext::from_file(Some(tmp_file.to_string_lossy().to_string())) | ||
| .map_err(|e| FlameError::Internal(e.to_string()))?; |
There was a problem hiding this comment.
Wrapping the result of FlameClusterContext::from_file with map_err obscures the original error type. If from_file returns a specific FlameError variant (like InvalidConfig), it gets converted into a generic Internal error. This makes debugging test failures harder. It would be better to use ? directly to propagate the original error.
| let ctx = FlameClusterContext::from_file(Some(tmp_file.to_string_lossy().to_string())) | |
| .map_err(|e| FlameError::Internal(e.to_string()))?; | |
| let ctx = FlameClusterContext::from_file(Some(tmp_file.to_string_lossy().to_string()))?; |
| - Backward compatible: If eviction config is not set, defaults to LRU with 1G limit | ||
| - No changes to Arrow Flight protocol or ObjectRef structure | ||
| - Existing cached data on disk remains accessible | ||
|
|
||
| *Breaking Changes:* | ||
| - None |
There was a problem hiding this comment.
The document states there are no breaking changes, but also that the default behavior changes from unbounded memory to an LRU policy with a 1G limit if no eviction configuration is provided. This change in default behavior could be considered a breaking change for users who rely on the previous unbounded behavior. It would be clearer to explicitly mention this as a behavioral change in the 'Breaking Changes' section or rephrase the 'Backward compatible' point to clarify this.
| let mut nodes = match lock_ptr!(self.nodes) { | ||
| Ok(n) => n, | ||
| Err(e) => { | ||
| tracing::error!("Failed to lock nodes: {}", e); | ||
| return; | ||
| } | ||
| }; | ||
| let mut head = match lock_ptr!(self.head) { | ||
| Ok(h) => h, | ||
| Err(e) => { | ||
| tracing::error!("Failed to lock head: {}", e); | ||
| return; | ||
| } | ||
| }; | ||
| let mut tail = match lock_ptr!(self.tail) { | ||
| Ok(t) => t, | ||
| Err(e) => { | ||
| tracing::error!("Failed to lock tail: {}", e); | ||
| return; | ||
| } | ||
| }; |
There was a problem hiding this comment.
The pattern of locking nodes, head, and tail individually with error handling is repeated in on_access, on_evict, and on_add. This introduces boilerplate and could be simplified by creating a helper function to acquire all three locks and return them as a tuple, for example Result<(MutexGuard<...>, ...), ...>. This would make the code more concise and less error-prone.
- Add path traversal validation to prevent directory traversal attacks in put_with_id, get, update, delete, and load_object_from_disk - Use common::ctx::parse_memory_size in EvictionConfig::max_memory_bytes for consistent binary unit interpretation (G=GiB, M=MiB) - Simplify test error handling by using ? operator directly - Clarify behavioral change in FS.md: default LRU with 1GiB limit - Fix clippy warnings (map_or -> is_some_and, wildcard pattern)
ee7106c to
e6aad4c
Compare
Implement memory management for ObjectCache with configurable eviction:
- Add EvictionPolicy trait with LRU and NoEviction implementations
- LRU uses doubly-linked list + HashMap for O(1) operations
- Support max_memory (e.g. "1G", "512M") and max_objects limits
- Evicted objects remain on disk, reloaded on demand
- Add path traversal validation for security
- Enable LRU eviction in CI configuration
Configuration example:
cache:
eviction:
policy: "lru"
max_memory: "1G"
max_objects: 10000
Closes xflops#366
fix #366