Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert format of configuration URLs #54

Closed
abudaan opened this issue Jan 20, 2024 · 8 comments
Closed

Revert format of configuration URLs #54

abudaan opened this issue Jan 20, 2024 · 8 comments
Assignees

Comments

@abudaan
Copy link
Member

abudaan commented Jan 20, 2024

          Both GCP and AWS seems to use a similar URI format for bucket identification:

GCP: gs://bucketname/path/to/object
https://cloud.google.com/storage/docs/gsutil

AWS: s3://buckname/path/to/object
https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html#using-s3-commands-managing-objects-param

I understand extra params are required such as region, credentials etc... but the v1 formats handled that nicely.

Originally posted by @headlessme in #46 (comment)

@abudaan
Copy link
Member Author

abudaan commented Jan 20, 2024

Okay, I will revert to the old URL format and parser. I might change the format a bit. Do you think we should use @ for the bucket?

proto://credential1:credential2@bucket?region=us-west-1&param2=value2

@abudaan abudaan self-assigned this Jan 20, 2024
@abudaan
Copy link
Member Author

abudaan commented Jan 20, 2024

Proposal for config URLs, read here

@headlessme
Copy link

headlessme commented Jan 20, 2024

Mapping these config strings to URI components:

scheme = The storage type (s3, gs, local, azure etc...). I've used gs here to match the native Google format but gcs could be aliased to provide backwards compatibility with v1 of this library.

The authority breaks down to:

  • userinfo - any credentials required, spread over username and password as makes sense for each provider (e.g. AccessTokenId:SecretAccessKey for AWS)
  • host - the bucket being targeted (bucket name in this usage seems akin to domains in HTTP URLs)
  • port - unused

The path maps directly to the object path within the bucket

query stores any provider specific configuration (e.g. ?region=us-east-1)

fragment - unused

I think this matches pretty closely your existing implementation from v1 (apart from AWS region in the host, which I think is a bit inconsistent with the others)? I think this is also a superset of at least the basic GCP and AWS bucket URI formats that I know about, which is a nice property. I'm not that familiar with the other providers so I don't know if it would accommodate their existing formats too.

I'll look through the proposal doc in more detail to see if it matches this generic structure.

@abudaan
Copy link
Member Author

abudaan commented Jan 21, 2024

Thanks for your input! I do agree that @region/bucketName is the odd one out here so I will implement what you suggested:

s3://access_key_id:secret_access_key@bucket_name?region=us-west-1&extra_option2=value2...

This way the configuration URLs can be consistent across all currently supported cloud storage services. And besides it simplifies the parser function :-)

@headlessme
Copy link

headlessme commented Jan 22, 2024

I'd expect these strings to be parseable with https://nodejs.org/api/url.html#new-urlinput-base as they'll be a standard format.

A quick test in node:

import { URL } from 'url'

const parsed = new URL('s3://id:key@mybucket/some/path/to/an/object?region=us-east-1')

console.log(parsed)

This outputs

URL {
  href: 's3://id:key@mybucket/some/path/to/an/object?region=us-east-1',
  origin: 'null',
  protocol: 's3:',
  username: 'id',
  password: 'key',
  host: 'mybucket',
  hostname: 'mybucket',
  port: '',
  pathname: '/some/path/to/an/object',
  search: '?region=us-east-1',
  searchParams: URLSearchParams { 'region' => 'us-east-1' },
  hash: ''
}

@abudaan
Copy link
Member Author

abudaan commented Jan 26, 2024

I have implemented the standard URL parser but unfortunately this can't be used with the config URL of the local adapter and the GS adapter:

const u = "local://path/to/bucket@bucket_name?mode=511&extra_option2=value2"
{
  value: {
    type: 'local:',
    part1: null, // should be: path/to/bucket
    part2: null,
    bucketName: 'path',
    extraOptions: { mode: '511', extra_option2: 'value2' }
  },
  error: null
}


const u ="gcs://path/to/key_file.json@bucket_name?extra_option1=value1&extra_option2=value2";
{
  value: {
    type: 'gcs:',
    part1: null, // should be: path/to/key_file.json
    part2: null,
    bucketName: 'path',
    extraOptions: { extra_option1: 'value1', extra_option2: 'value2' }
  },
  error: null
}

Also the config URL doesn't (yet) use the path to an object in the storage; you need to call a getFile method after you've created a Storage instance.

It might be worth considering whether or not we should add support for direct URL's to objects in the storage. This would imply that if a path to an object is detected in the config URL, the Storage automatically calls a getFile method after instantiation.

  • getFile can be either getFileAsURL or getFileAsStream

@headlessme
Copy link

The path could be treated as a default object prefix for any later calls to any of the getFile methods?

@abudaan
Copy link
Member Author

abudaan commented Jan 31, 2024

fixed in 2.1.0

@abudaan abudaan closed this as completed Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants