File Based Storage Provider #9537
@dotnet-policy-service agree
👍🏻
Directory.CreateDirectory(Path.GetDirectoryName(path)!);
}
var fileInfo = new FileInfo(path);
if (fileInfo.Exists && fileInfo.LastWriteTimeUtc.ToString(CultureInfo.InvariantCulture) != grainState.ETag)
File date information can be a PITA on file shares, ZFS, etc.
An option for SHA-256 or custom providers would allow different kinds of consistency checks.
Some filesystems have hashing built in, and some have metadata/tags that would make this better suited for production.
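A minimal sketch of the hash-based check suggested above, assuming the ETag is derived from the stored bytes instead of from LastWriteTimeUtc. The ContentETag class and its method names are hypothetical and not part of this PR; only SHA256.HashData, Convert.ToHexString, and the File APIs are real BCL calls.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper, not part of the PR: ETag = SHA-256 of the file contents.
public static class ContentETag
{
    // Derives the ETag from the serialized state rather than from file metadata.
    public static string Compute(ReadOnlySpan<byte> storedData)
        => Convert.ToHexString(SHA256.HashData(storedData));

    // Mirrors the check in the diff: a missing file never counts as a mismatch;
    // an existing file must hash to the ETag the caller expects.
    public static bool Matches(string path, string? expectedETag)
    {
        if (!File.Exists(path))
        {
            return true;
        }

        return Compute(File.ReadAllBytes(path)) == expectedETag;
    }
}
```

This trades an extra read and hash per write for independence from filesystem timestamp behavior. It does not by itself close the read-then-write race; that still needs locking or an atomic replace.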
public sealed class FileGrainStorageOptions : IStorageProviderSerializerOptions
{
#region properties
For the love of devs, do not use regions 👏
var storedData = options.GrainStorageSerializer.Serialize(grainState.State);
var fName = GetKeyString(stateName, grainId);
var path = Path.Combine(options.RootDirectory, fName!);
if (!Directory.Exists(path))
This synchronous call will hurt some file-sharing services.
I suggest attempting the write first and creating the folder only if that fails.
As written, this is more operations per write than is normally needed.
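The "try first, create on failure" suggestion could look roughly like the following. This is a sketch under the assumption that a missing parent directory surfaces as DirectoryNotFoundException, which is what File.WriteAllBytesAsync throws in that case; the OptimisticWrite class name is hypothetical.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the PR.
public static class OptimisticWrite
{
    public static async Task WriteAsync(string path, byte[] storedData)
    {
        try
        {
            // Common case: the directory already exists, so no extra existence check is made.
            await File.WriteAllBytesAsync(path, storedData);
        }
        catch (DirectoryNotFoundException)
        {
            // Rare case: first write under this directory. Create it and retry once.
            Directory.CreateDirectory(Path.GetDirectoryName(path)!);
            await File.WriteAllBytesAsync(path, storedData);
        }
    }
}
```

The directory lookup is then paid only on the first write under a given directory rather than on every write.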
I mentioned this in Discord, and wanted to note it here for anyone else looking at this PR. We did something similar, and getting the semantics right for a file system storage provider is tricky because:
- File systems are implemented differently, and you have to be sure the subset of functionality you are using is robust across all of them, including when you add NFS or SMB into the mix. What does each of them guarantee in terms of data integrity?
- Writes are not atomic: if your program or OS crashes midway through overwriting grain state, you are left in a non-deterministic, possibly corrupt state.
- LastWriteTimeUtc may not be accurate - caching and lazy metadata writes can affect it - see https://learn.microsoft.com/en-us/dotnet/api/system.io.filesysteminfo.lastwritetimeutc?view=net-9.0#remarks
- It's possible for two identical grains to be active during a split-brain scenario, so you cannot rely on reading the last write time and then writing, because that allows a race condition.
- On Linux (as of .NET 9.0) there are no truly asynchronous file operations - all the async ones are implemented as synchronous calls queued on the threadpool, so you need to be careful not to flood the threadpool with thread-stalling sync work during grain activation storage reads.
All these problems can be worked around, and I think it's important to do so because you are dealing with storage and people will trust it to reliably persist their precious data.
We went through several iterations of a log-based storage provider, and what we settled on was:
- Use exclusive file locking (and test on initialization that it works on the specified base path, because if not, all bets are off), and handle the specific IOException HResult (which is different on Windows and *nix) raised when trying to access a locked file.
- Append an xxHash to the contents of each file so that partial writes and integrity problems can be detected - this could maybe also serve as your ETag in this scenario.
- Always write replacement contents (with an xxHash) to a new, deterministically named file and then overwrite the original file, ensuring exclusive access to both for the duration.
- Always look for the above deterministically named new file when opening each original file, and resume the replacement operation if it exists.
- Add a concurrency gate around the async-over-sync file operations and increase the threadpool size by the concurrency gate limit, to ensure the threadpool has adequate capacity for the file operations.
- Keep file handle lifetimes short if the number of concurrent handles might become a problem (some storage stacks have limits).
- Make sure you are not going to run into inode exhaustion based on how you store your files, especially on ext volumes, it seems.
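The checksum, replace, and resume steps in the list above can be sketched as follows. This is an illustration under stated assumptions, not the commenter's implementation: SHA-256 from the BCL is swapped in for xxHash so the sample has no package dependency (the System.IO.Hashing package provides xxHash), the ".new" suffix and the ChecksummedFile class are hypothetical, and the exclusive locking of both files described above is left out.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: payload followed by a SHA-256 trailer (stand-in for xxHash).
public static class ChecksummedFile
{
    private const int HashSize = 32; // SHA-256 digest length in bytes

    // Writes payload + checksum to "<path>.new", then moves it over the original.
    public static void Write(string path, byte[] payload)
    {
        var pending = path + ".new"; // deterministic name so a later read can find it
        using (var stream = new FileStream(pending, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            stream.Write(payload);
            stream.Write(SHA256.HashData(payload));
            stream.Flush(flushToDisk: true);
        }

        File.Move(pending, path, overwrite: true);
    }

    // Returns the payload, or null when the file is missing or fails verification.
    public static byte[]? Read(string path)
    {
        var pending = path + ".new";
        if (File.Exists(pending))
        {
            // Resume an interrupted replacement only if the pending file is complete.
            if (TryVerify(pending) is not null)
            {
                File.Move(pending, path, overwrite: true);
            }
            else
            {
                File.Delete(pending);
            }
        }

        return File.Exists(path) ? TryVerify(path) : null;
    }

    private static byte[]? TryVerify(string file)
    {
        var bytes = File.ReadAllBytes(file);
        if (bytes.Length < HashSize)
        {
            return null;
        }

        ReadOnlySpan<byte> payload = bytes.AsSpan(0, bytes.Length - HashSize);
        ReadOnlySpan<byte> stored = bytes.AsSpan(bytes.Length - HashSize);
        ReadOnlySpan<byte> actual = SHA256.HashData(payload);
        return stored.SequenceEqual(actual) ? payload.ToArray() : null;
    }
}
```

Whether File.Move with overwrite is atomic depends on the platform and file system, so that guarantee would need to be verified on each storage stack the provider targets.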
With the above mitigated we have processed a few billion storage operations on the file system now - but we are still only using this for data that can be replaced.
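For the concurrency gate mentioned in the list, one possible shape is a SemaphoreSlim wrapped around the blocking call. The FileIoGate class is hypothetical, and reading "increase the threadpool size" as raising the minimum worker thread count via ThreadPool.SetMinThreads is an assumption; the commenter's actual mechanism is not shown in the thread.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: bounds how many blocking file reads run on the thread pool at once.
public sealed class FileIoGate
{
    private readonly SemaphoreSlim _gate;

    public FileIoGate(int maxConcurrency)
    {
        _gate = new SemaphoreSlim(maxConcurrency, maxConcurrency);

        // Assumption: reserve one extra worker thread per permitted blocking operation.
        // This setting is process-wide, so a real provider would do it once at startup.
        ThreadPool.GetMinThreads(out var workers, out var completionPorts);
        ThreadPool.SetMinThreads(workers + maxConcurrency, completionPorts);
    }

    public async Task<byte[]> ReadAsync(string path, CancellationToken ct = default)
    {
        await _gate.WaitAsync(ct);
        try
        {
            // The synchronous read runs on a pool thread; the gate caps how many do so at once.
            return await Task.Run(() => File.ReadAllBytes(path), ct);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

Callers beyond the limit wait asynchronously on the semaphore instead of occupying pool threads, which is the point of gating async-over-sync work.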
I need to find the time to learn more about your suggested changes. I am not that deep into the details you mention, and I don't want to submit changes I don't understand.
This pull request introduces a new file-based grain storage provider for Microsoft Orleans. The changes include adding a new project for the provider, implementing its core functionality, and providing documentation and configuration examples.
New File-Based Grain Storage Provider
Project Setup:
- Adds the Orleans.Persistence.FileStorage project to the solution with the necessary project references and metadata (Orleans.sln, src/File/Orleans.Persistence.FileStorage/Orleans.Persistence.FileStorage.csproj).
Core Implementation:
- Adds the FileGrainStorage class, which provides methods for reading, writing, and clearing grain state using a file-based approach (src/File/Orleans.Persistence.FileStorage/FileGrainStorage.cs).
- Adds FileGrainStorageFactory to create instances of FileGrainStorage (src/File/Orleans.Persistence.FileStorage/FileGrainStorageFactory.cs).
- Adds FileGrainStorageOptions to configure the root directory and serializer for the storage provider (src/File/Orleans.Persistence.FileStorage/FileGrainStorageOptions.cs).
- Adds FileSiloBuilderExtensions to simplify the configuration of the file storage provider in Orleans silo builders (src/File/Orleans.Persistence.FileStorage/FileSiloBuilderExtensions.cs).
Documentation and Examples:
- Adds a README.md file with an introduction, setup instructions, and examples for configuring and using the file storage provider (src/File/Orleans.Persistence.FileStorage/README.md).