Spinning up several clusters at the same time creates multiple SystemsManager entries #14
I found that if you spin up a multi-node cluster that uses the AWS Data Protection Provider and all nodes start at exactly the same time, you might encounter an issue where more than one "/MyApplication/DataProtection" entry is created in the AWS Systems Manager Parameter Store. As a result, these clusters end up not sharing the AWS Systems Manager entry, which causes problems.

The only solution is to spin up a single node first and, only once it is fully loaded, to spin up the other nodes, which will then use the same Systems Manager entry as the first node.

Comments
Thanks for reporting this.

Is there a timeline or a recommended workaround for this? We've run into the same issue.

Once the key is generated, can you remove the older ones and restart the cluster?

This particular problem makes the data protection provider difficult to use in AWS Lambda.
What would be a good way to solve this issue? I previously used the package

If multiple instances are simultaneously reading and writing to the Parameter Store, one way to solve this would be to write a temp value to a temp parameter to indicate that the work is in progress. If other instances read that temp value and detect that work is in progress, they should wait 50-100 ms and retry reading the data protection value.

Case 1:

Case 2:

The idea of using a GUID is to make sure that multiple instances haven't trampled each other's writes to the temp parameter: if multiple instances read the temp parameter at the same time and then write to it, only one instance, whichever was last to write its GUID, ends up in charge of writing the data protection value. I would personally use DynamoDB for this, but I wouldn't want to make it a dependency of this package. I'm pretty sure there's an easier way to solve this, but this is what comes to mind.
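To make the scheme above concrete, here is a minimal sketch using the AWS SDK's IAmazonSimpleSystemsManagement client. The lock parameter name, the 100 ms delay, and the helper itself are illustrative assumptions, not part of this package:

```csharp
using System;
using System.Threading.Tasks;
using Amazon.SimpleSystemsManagement;
using Amazon.SimpleSystemsManagement.Model;

public static class TempParameterLock
{
    // Hypothetical temp parameter used only to signal "key generation in progress".
    const string LockName = "/MyApplication/DataProtection/init-lock";

    // Each instance writes its own GUID, waits briefly, then re-reads the
    // parameter. Whichever instance's GUID survives the read-back (the last
    // writer) is in charge of writing the data protection value; the others
    // wait and retry reading the protection key instead of generating one.
    public static async Task<bool> IsInChargeAsync(IAmazonSimpleSystemsManagement ssm)
    {
        var myGuid = Guid.NewGuid().ToString();

        await ssm.PutParameterAsync(new PutParameterRequest
        {
            Name = LockName,
            Value = myGuid,
            Type = ParameterType.String,
            Overwrite = true
        });

        await Task.Delay(100); // give concurrent writers time to overwrite

        var response = await ssm.GetParameterAsync(new GetParameterRequest { Name = LockName });
        return response.Parameter.Value == myGuid; // last writer wins
    }
}
```

As the next comment notes, this read-back trick only narrows the race window; it does not close it.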
Is there an ETA for this issue or a workaround?
There is no easier way to solve this... This is actually a problem of distributed locking, and the scenario you have described is still not bulletproof. In Azure you can get a distributed locking mechanism using a Blob lease; however, that would introduce a dependency on another service/system. In my personal opinion, the best option would be for a developer to implement a custom DataProtectionProvider themselves. After all, one would only have to implement two methods.
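The comment leaves the two methods unnamed (IDataProtector's Protect/Unprotect are one possible pair). If the goal is custom key persistence rather than custom protection, the framework's extension point is IXmlRepository, which also has exactly two methods. A skeletal sketch, with the backing store deliberately left as a stub:

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using Microsoft.AspNetCore.DataProtection.Repositories;

// Skeleton of a custom key repository. The backing store (e.g. DynamoDB,
// as suggested above) is an assumption and is left unimplemented here.
public class LockingXmlRepository : IXmlRepository
{
    public IReadOnlyCollection<XElement> GetAllElements()
    {
        // Read every persisted key XML document from the backing store.
        throw new NotImplementedException();
    }

    public void StoreElement(XElement element, string friendlyName)
    {
        // Persist the key XML, ideally with a conditional write so that
        // concurrent instances cannot both create the initial key.
        throw new NotImplementedException();
    }
}
```

A repository like this can be wired in through KeyManagementOptions.XmlRepository via services.Configure&lt;KeyManagementOptions&gt;.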
Also using Lambda and multiple nodes (Fargate), and we've taken the following approach... ASP.NET Core Data Protection is designed to initialize a key on first use if none exists and to rotate keys as they approach their expiration date. I guess it mostly "just works" for most of the scenarios envisaged by the ASP.NET Core team. We've decided not to allow our services that handle application HTTP requests to generate/rotate keys automatically:

```csharp
services
    .AddDataProtection()
    .SetApplicationName(appName)
    .PersistKeysToAWSSystemsManager("/MyApplication/DataProtection")
    .DisableAutomaticKeyGeneration(); // Key generation is handled separately.
```

...and we set the IAM policy to NOT allow these services to write to the SSM Parameter Store. Instead we have this separate code:

```csharp
var services = new ServiceCollection();
services.AddAWSService<IAmazonSimpleSystemsManagement>(awsOptions);
services.AddLogging(/* ... */);
services
    .AddDataProtection()
    .SetApplicationName(appName)
    .PersistKeysToAWSSystemsManager("/MyApplication/DataProtection");

var serviceProvider = services.BuildServiceProvider();
var dataProtectionProvider = serviceProvider.GetRequiredService<IDataProtectionProvider>();

// Initializes a key if none exists; rotates one that is approaching its expiration date.
dataProtectionProvider.CreateProtector("doesntmatter");
```

There are a couple of ways of running this code to do the key-management activity (one sketch appears below).
I will argue this is a better/simpler approach than dealing with race conditions and distributed locking.
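For reference, one minimal sketch of running that key-management snippet as a stand-alone console job (invoked from a deploy pipeline or a scheduled task, say); the class name and application name are illustrative:

```csharp
using Amazon.SimpleSystemsManagement;
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;

public static class KeyManagementJob
{
    public static void Main()
    {
        var services = new ServiceCollection();
        services.AddAWSService<IAmazonSimpleSystemsManagement>();
        services.AddLogging();
        services
            .AddDataProtection()
            .SetApplicationName("MyApplication")
            .PersistKeysToAWSSystemsManager("/MyApplication/DataProtection");

        using var serviceProvider = services.BuildServiceProvider();
        var dataProtectionProvider = serviceProvider.GetRequiredService<IDataProtectionProvider>();

        // Initializes a key if none exists and rotates one nearing expiration,
        // exactly as in the comment above. The web-facing services, configured
        // with DisableAutomaticKeyGeneration() and a read-only IAM policy,
        // only ever consume the keys this job manages.
        dataProtectionProvider.CreateProtector("doesntmatter");
    }
}
```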
I am using this approach but it's not creating a key if it does not exist. Any ideas why?