Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ZEP 8 (URL syntax) draft #48

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

jbms
Copy link

@jbms jbms commented Sep 10, 2023

No description provided.

@jbms
Copy link
Author

jbms commented Sep 10, 2023

@normanrz Please take a look.

@jbms
Copy link
Author

jbms commented Sep 10, 2023

@MSanKeys963 Looks like there is an issue with the docs build that is unrelated to this PR.

@jbms
Copy link
Author

jbms commented Sep 10, 2023

@martindurant Would appreciate your perspective on this --- I imagine you might say that we should just use fsspec syntax instead, though.

@martindurant
Copy link
Member

Well indeed, I could say "why invent another"; although translating between | and :: syntax ought to be straight forward. fsspec also cares about fs parameters that might be embedded in URLs and wildcards for globbing.

@normanrz
Copy link
Contributor

While standardizing a URL scheme has benefits on its own, I think the main benefit/motivation for this ZEP is the formalization of Zip stores. Essentially, to comply with this ZEP, implementations need to implement zip stores. Maybe that should be written out more explicitly?

@jbms
Copy link
Author

jbms commented Sep 22, 2023

While standardizing a URL scheme has benefits on its own, I think the main benefit/motivation for this ZEP is the formalization of Zip stores. Essentially, to comply with this ZEP, implementations need to implement zip stores. Maybe that should be written out more explicitly?

While this ZEP was prompted by our discussion about zip stores, my intention was that we standardize on the syntax for various protocols, but that implementations would choose which ones to support.

I think we could also push implementations to support zip format, but I'm not sure I want to tie that to this URL syntax proposal.

@normanrz
Copy link
Contributor

@ap-- I think this might also be interesting for upath to implement.

@normanrz
Copy link
Contributor

@bogovicj this might also be relevant for your OME transformations proposal.

@MSanKeys963
Copy link
Member

MSanKeys963 commented Oct 25, 2023

@MSanKeys963 Looks like there is an issue with the docs build that is unrelated to this PR.

@jbms: I have added #51 to fix the RTD build. Can you please update your PR?
(Seems like I'm unable to update your PR)

title: ZEP0008
description: URL syntax
parent: draft ZEPs
nav_order: 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nav_order: 1
nav_order: 8

@bogovicj
Copy link

bogovicj commented Nov 14, 2023

Thanks @jbms for putting this together! There are a few situations I came up with for which I'm not sure what the
relative URL should be

What does it look like to use ..: to "go up" multiple levels?
Is this correct / valid?

Base URL: gs://bucket/0.zip|zip:a|zarr3:i
Relative URL: ..:..:1.zip|zip:b|zarr3:ii
Resolved URL: gs://bucket/1.zip|zip:b|zarr3:ii

Is it correct / valid to use .. in the "path part" of relative URL, after a ..:?

Base URL: gs://bucket/0/a/i.zarr|zarr3:foo
Relative URL: ..:../b/i.zarr|zarr3:foo
Resolved URL: gs://bucket/0/b/i.zarr|zarr3:foo

If one needs to add an adapter in a relative way, how does one go about it?
For example:

Base URL: gs://bucket/0/a/i.zarr
Desired Resolved URL: gs://bucket/0/a/i.zarr|zarr3:foo

Which, if any, of these do you think should be used? Are any of these invalid?

  • .|zarr3:foo (clearest to me)
  • |zarr3:foo
  • zarr3:foo

@bogovicj
Copy link

One more thing:

We've found it useful to be able to reference a particular part of the attributes stored in json
with a URL. For example, for

this zarr3 zarr.json
{
    "zarr_format": 3,
    "node_type": "array",
    "shape": [10000, 1000],
    "dimension_names": ["rows", "columns"],
    "data_type": "float64",
    "chunk_grid": {
        "name": "regular",
        "configuration": {
            "chunk_shape": [1000, 100]
        }
    },
    "chunk_key_encoding": {
        "name": "default",
        "configuration": {
            "separator": "/"
        }
    },
    "codecs": [{
        "name": "gzip",
        "configuration": {
            "level": 1
        }
    }],
    "fill_value": "NaN",
    "attributes": {
        "foo": 42,
        "bar": "apples",
        "baz": [1, 2, 3, 4]
    }
}
  • /attributes/baz[0] points to 1
  • /shape points to [10000, 1000]
  • /chunk_grid/configuration points to { "chunk_shape": [1000, 100] }

Could you envision adding an attributes: or zarr.json:, or similar adapter, that enaables this?

For example: gs://bucket/0.zip|zip:a|zarr3:i|zarr.json:attributes/foo

A specific use case: I often re-use and reference transformations. Since these are described by metadata (not arrays),
and so referencing the specific metadata is helpful.

For example, if this were adopted, something like this would not uncommon in my workflows:

{
    "type" : "sequence",
    "transformations" : [
        { "url" : "..:/localTransformations|zarr.json:/transform[1]" },
        { "url" : "gs://bucket/path/to/templateTransformation.zarr|zarr3:sharedTransforms|zarr.json:/transform[0]" },
    ]
}

@jbms
Copy link
Author

jbms commented Nov 14, 2023 via email

@jbms
Copy link
Author

jbms commented Nov 15, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants