Addressable data store (aka CID store) #5715
base: master
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
5a93547 to 27345a6
@jorgee apologies, could the latest changes be made as a PR against this branch? That would make it much simpler for me to understand what's new.
Signed-off-by: jorgee <jorge.ejarque@seqera.io>
Since the checksum can be computed in different ways, the algorithm used should be tracked along with the checksum value. I've added a "task" checkbox about this in the comment above.
Minor, it may be better to rename
@jorgee you can use the onWorkflowPublish event to capture the output metadata (i.e. annotations) from the workflow outputs. This event comes from PublishOp, if you want to see how it works. Alternatively, we can modify PublishOp to send the entire metadata for an output when it's done.
I might sketch a PR for this later if I have some time.
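A self-contained sketch of the event-driven approach described above: a publisher notifies registered listeners each time a workflow output is published, and a collector keeps the annotations per output name so they could later be written to the CID store. The interface `OutputPublishListener` and the classes `OutputPublisher` and `OutputMetadataCollector` are hypothetical stand-ins for illustration only; they are not Nextflow's `onWorkflowPublish` / `PublishOp` API, whose exact signatures are not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PublishEventDemo {
    public static void main(String[] args) {
        OutputPublisher publisher = new OutputPublisher();
        OutputMetadataCollector collector = new OutputMetadataCollector();
        publisher.addListener(collector);

        // Each published output carries its annotations (metadata).
        publisher.publish("samples", Map.of("sample_id", "S1", "format", "bam"));
        publisher.publish("samples", Map.of("sample_id", "S2", "format", "bam"));

        System.out.println(collector.recordsFor("samples").size());
    }
}

// Hypothetical listener interface standing in for the workflow-publish event.
interface OutputPublishListener {
    void onOutputPublished(String outputName, Map<String, Object> annotations);
}

// Hypothetical stand-in for the operator that publishes workflow outputs.
class OutputPublisher {
    private final List<OutputPublishListener> listeners = new ArrayList<>();

    void addListener(OutputPublishListener listener) {
        listeners.add(listener);
    }

    // Notifies every listener, in registration order, about one published output.
    void publish(String outputName, Map<String, Object> annotations) {
        for (OutputPublishListener listener : listeners) {
            listener.onOutputPublished(outputName, annotations);
        }
    }
}

// Collects an immutable copy of the annotations per output name, preserving publish order.
class OutputMetadataCollector implements OutputPublishListener {
    private final Map<String, List<Map<String, Object>>> records = new LinkedHashMap<>();

    @Override
    public void onOutputPublished(String outputName, Map<String, Object> annotations) {
        records.computeIfAbsent(outputName, k -> new ArrayList<>()).add(Map.copyOf(annotations));
    }

    List<Map<String, Object>> recordsFor(String outputName) {
        return records.getOrDefault(outputName, List.of());
    }
}
```

Pushing the metadata through an event keeps the store decoupled from the publishing logic; the alternative mentioned above (having the publisher send the entire metadata once an output is complete) would replace the per-item callback with a single one.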
@bentsherman do you know a pipeline, example or test where it is used?
Right now there is only this e2e test: https://github.com/nextflow-io/nextflow/blob/master/tests/output-dsl.nf
I am considering adding the algorithm inside the checksum field, like the container image digest does.
However, the hashing algorithm used by Nextflow depends on the mode and the type of data. I would need to replicate the same code to extract the algorithm and, if I'm not mistaken, it could also be a combination of algorithms. For instance, for files and directories it passes a SHA-256 hash to the default murmur3_128 hasher. So, for now, I will put something like …
I don't understand the rationale for recording the mode rather than the actual algorithm.
If he's using the HashBuilder to hash files, then the checksum will differ depending on whether the process …
Yes, I am currently using the HashBuilder.
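To illustrate why the mode matters alongside the algorithm, here is a simplified sketch. It assumes two modes loosely modeled on Nextflow's task cache modes: a metadata-based one (path, size and last-modified time, as in the standard cache mode) and a content-based one (as in the deep mode). It uses plain SHA-256 from the JDK rather than Nextflow's HashBuilder, and the `FileChecksum` class and its `Mode` enum are hypothetical. Two files with the same content get identical content-based checksums but different metadata-based ones, so the main method prints `true` and then `false`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class ModeDependentChecksumDemo {
    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("cid-a", ".txt");
        Path b = Files.createTempFile("cid-b", ".txt");
        Files.writeString(a, "same content");
        Files.writeString(b, "same content");

        // Content-based: identical for both files.
        System.out.println(FileChecksum.compute(a, FileChecksum.Mode.DEEP)
                .equals(FileChecksum.compute(b, FileChecksum.Mode.DEEP)));
        // Metadata-based: differs because the paths differ.
        System.out.println(FileChecksum.compute(a, FileChecksum.Mode.STANDARD)
                .equals(FileChecksum.compute(b, FileChecksum.Mode.STANDARD)));

        Files.delete(a);
        Files.delete(b);
    }
}

// Hypothetical helper: the resulting checksum depends on the mode, not only on the algorithm.
final class FileChecksum {
    enum Mode { STANDARD, DEEP }

    private FileChecksum() {}

    static String compute(Path file, Mode mode) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            if (mode == Mode.DEEP) {
                // Hash the file content.
                md.update(Files.readAllBytes(file));
            } else {
                // Hash path, size and last-modified time only; the content is never read.
                String meta = file.toAbsolutePath() + "|" + Files.size(file) + "|"
                        + Files.getLastModifiedTime(file).toMillis();
                md.update(meta.getBytes(StandardCharsets.UTF_8));
            }
            return toHex(md.digest());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte x : bytes) sb.append(String.format("%02x", x));
        return sb.toString();
    }
}
```

Both checksums are SHA-256 values of the same length, so the algorithm alone cannot tell a reader which of the two was stored.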
I'd consider using something like … to avoid collapsing everything into the prefix (and to make it more extensible if needed).
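The two encodings discussed in this thread can be contrasted in a short sketch. The first packs the algorithm into a prefix, as container image digests do (`sha256:<hex>`); the second keeps value, algorithm and mode in separate fields, which leaves room for extra attributes and for combined schemes (such as a SHA-256 hash fed into murmur3_128) that a single prefix describes poorly. The `PrefixedDigest` and `Checksum` types and their field names are hypothetical, not the CID store's actual schema.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Objects;

class ChecksumEncodingDemo {
    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);

        // Option 1: algorithm packed into a prefix, container-digest style.
        String prefixed = PrefixedDigest.sha256(data);
        System.out.println(prefixed);

        // Option 2: separate fields, extensible with more attributes.
        Checksum c = new Checksum(PrefixedDigest.parse(prefixed).value(), "sha256", "deep");
        System.out.println(c.toJson());
    }
}

final class PrefixedDigest {
    record Parsed(String algorithm, String value) {}

    private PrefixedDigest() {}

    // Formats a SHA-256 digest as "sha256:<hex>".
    static String sha256(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder("sha256:");
            for (byte b : hash) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    // Splits "<algorithm>:<value>" at the first colon.
    static Parsed parse(String digest) {
        int sep = digest.indexOf(':');
        if (sep <= 0 || sep == digest.length() - 1)
            throw new IllegalArgumentException("Expected '<algorithm>:<value>', got: " + digest);
        return new Parsed(digest.substring(0, sep), digest.substring(sep + 1));
    }
}

// Hypothetical structured checksum: value, algorithm and mode kept apart.
record Checksum(String value, String algorithm, String mode) {
    Checksum {
        Objects.requireNonNull(value, "value");
        Objects.requireNonNull(algorithm, "algorithm");
        Objects.requireNonNull(mode, "mode");
    }

    // Values are only comparable when produced by the same algorithm and mode.
    boolean sameAs(Checksum other) {
        if (!algorithm.equals(other.algorithm) || !mode.equals(other.mode))
            throw new IllegalArgumentException("Incomparable checksums: "
                    + algorithm + "/" + mode + " vs " + other.algorithm + "/" + other.mode);
        return value.equals(other.value);
    }

    // Minimal JSON rendering; assumes the fields contain no characters that need escaping.
    String toJson() {
        return String.format("{\"value\":\"%s\",\"algorithm\":\"%s\",\"mode\":\"%s\"}",
                value, algorithm, mode);
    }
}
```

With separate fields, a consumer can refuse to compare checksums produced in different ways instead of silently reporting a mismatch.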
modules/nextflow/src/main/groovy/nextflow/data/cid/CidStore.groovy
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com> Signed-off-by: jorgee <jorge.ejarque@seqera.io> Co-authored-by: jorgee <jorge.ejarque@seqera.io>
Minor: this warning message should not contain the object address.
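For context: in Java, an object whose class does not override `toString()` renders as `ClassName@<hex identity hash>` when concatenated into a string, which is how an "object address" ends up in a log message. A minimal sketch, using a hypothetical `CidRecord` class and hypothetical message helpers, of reporting a meaningful identifier instead:

```java
class WarningMessageDemo {
    public static void main(String[] args) {
        CidRecord rec = new CidRecord("abc123");
        System.out.println(WarningMessages.withObject(rec));
        System.out.println(WarningMessages.withKey(rec));
    }
}

// Hypothetical record type that does not override toString().
class CidRecord {
    final String key;

    CidRecord(String key) {
        this.key = key;
    }
}

final class WarningMessages {
    private WarningMessages() {}

    // Concatenating the object falls back to Object.toString(), i.e. "CidRecord@<hex hash>".
    static String withObject(CidRecord r) {
        return "Unable to save " + r;
    }

    // Reports a stable, meaningful identifier instead.
    static String withKey(CidRecord r) {
        return "Unable to save record with key " + r.key;
    }
}
```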
Tentative implementation of an addressable data store (very basic POC so far).
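The core idea of an addressable (content-addressed) store can be sketched in a few lines: each object's key is derived from a hash of its content, so identical content always maps to the same key and a stored object can be verified against its key on read. The `InMemoryCasStore` class below is hypothetical and uses SHA-256 for illustration; it is not the CidStore interface from this PR.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

class ContentAddressableStoreDemo {
    public static void main(String[] args) {
        InMemoryCasStore store = new InMemoryCasStore();
        byte[] record = "{\"type\":\"TaskRun\"}".getBytes(StandardCharsets.UTF_8);

        String key1 = store.put(record);
        String key2 = store.put(record.clone()); // same content, same key
        System.out.println(key1.equals(key2));   // true
        System.out.println(store.size());        // 1
        System.out.println(new String(store.get(key1).orElseThrow(), StandardCharsets.UTF_8));
    }
}

// Hypothetical in-memory content-addressable store keyed by SHA-256.
final class InMemoryCasStore {
    private final Map<String, byte[]> objects = new ConcurrentHashMap<>();

    // Stores the content and returns its content-derived key.
    String put(byte[] content) {
        String key = sha256Hex(content);
        objects.putIfAbsent(key, content.clone()); // re-storing identical content is a no-op
        return key;
    }

    // Returns a copy of the content, after checking it still matches its key.
    Optional<byte[]> get(String key) {
        byte[] content = objects.get(key);
        if (content == null) return Optional.empty();
        if (!sha256Hex(content).equals(key))
            throw new IllegalStateException("Corrupted object: " + key);
        return Optional.of(content.clone());
    }

    int size() {
        return objects.size();
    }

    private static String sha256Hex(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }
}
```

Defensive copies on `put` and `get` keep callers from mutating stored bytes, which would otherwise break the link between an object and its key.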
Update on 1 Mar 2025 from #5787 by @jorgee
M1 Implementation of CID store for provenance
Changes:
Known Limitations: