Content and formatting in the output of retriever ls #210

Closed
1 of 2 tasks
cboettig opened this issue Sep 24, 2014 · 4 comments
@cboettig

cboettig commented Sep 24, 2014

As discussed in ropensci/rdataretriever#36 (comment)

  • It's not trivial to map the available datasets listed to the more
    meaningful descriptions on the website,
    http://ecodataretriever.org/available-data.html.
  • The naming convention for datasets is unclear, and not obviously scalable as an
    identifier.
@dmcglinn
Member

I agree this is a bit of a problem. The names listed by retriever ls could be mapped to the descriptions at http://www.ecodataretriever.org/available-data.html by adding a table to that page with two columns: 1) the long, hyperlinked descriptive name of the dataset, and 2) the shortname that retriever users will refer to the dataset by (e.g., BBS).
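
A rough illustration of the two-column table idea follows. This is only a sketch: the `DatasetEntry` type, the sample entries, the placeholder URLs, and both helper functions are hypothetical and are not part of the retriever.

```python
from typing import List, NamedTuple, Optional


class DatasetEntry(NamedTuple):
    long_name: str  # descriptive name shown on the website
    url: str        # link target for the descriptive name (placeholder here)
    shortname: str  # name a user would type on the command line


# Illustrative entries only; the descriptions and URLs are made up for the sketch.
ENTRIES = [
    DatasetEntry("Breeding Bird Survey", "https://example.org/bbs", "BBS"),
    DatasetEntry("Palmer 2007 dataset", "https://example.org/palmer2007", "Palmer2007"),
]


def render_table(entries: List[DatasetEntry]) -> str:
    """Render a plain-text, two-column table: descriptive name and shortname."""
    width = max([len("Dataset")] + [len(e.long_name) for e in entries])
    header = f"{'Dataset'.ljust(width)}  Shortname"
    rows = [f"{e.long_name.ljust(width)}  {e.shortname}" for e in entries]
    return "\n".join([header] + rows)


def lookup(entries: List[DatasetEntry], shortname: str) -> Optional[DatasetEntry]:
    """Find the entry for a shortname, ignoring case; None if it is unknown."""
    by_short = {e.shortname.lower(): e for e in entries}
    return by_short.get(shortname.lower())


print(render_table(ENTRIES))
```

Keeping the shortname as the lookup key means the same table could serve both the website and a more verbose listing on the command line.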

With respect to the scalability of names, many of the dataset names appear to follow the format LastnameYEAR, where Lastname is the last name of the dataset's first author. This system could be codified as a rule for published datasets. For unpublished datasets, I don't think a simple system will be that easy to define.
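
If the LastnameYEAR format were codified, a check along these lines could validate names. This is a minimal sketch under assumptions: the pattern (one capitalized word followed by a four-digit year) and the function name are hypothetical, and nothing like this is confirmed to exist in the retriever.

```python
import re
from typing import Optional, Tuple

# Assumed convention: a capitalized last name followed by a four-digit year.
NAME_PATTERN = re.compile(r"^([A-Z][a-z]+)(\d{4})$")


def parse_dataset_name(name: str) -> Optional[Tuple[str, int]]:
    """Split a LastnameYEAR name into (lastname, year); None if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_dataset_name("Palmer2007"))
print(parse_dataset_name("BBS"))
```

Names such as BBS, and last names with internal capitals or hyphens, do not fit this simple pattern, which is part of why a single rule is hard to define.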

Additionally, it appears that the output of retriever ls does not automatically detect all of the available datasets. For example, the newly created script EA_palmer2007.script, which generates the dataset named Palmer2007, is not listed by retriever ls.

@ethanwhite
Member

I also agree that this is something that needs improving, just haven't had time to work on it.

> Additionally, it appears that the output of retriever ls does not automatically detect all of the available datasets. For example, the newly created script EA_palmer2007.script, which generates the dataset named Palmer2007, is not listed by retriever ls.

This is probably because you're running the most recent release rather than the current version of master. The current release will only download scripts that existed as of that release. The truth is that the entire relationship between scripts and releases needs to be thought about from both a technical and a user perspective, and I'm hoping that once I get a software engineer hired we'll be able to tackle that side of things.

@shreyneil
Contributor

shreyneil commented Feb 20, 2018

@ethanwhite I think this issue was solved by #488. Please review and close this issue.

@ethanwhite
Member

I agree that this has generally been addressed. Thanks for pointing this out @shreyneil!

We've fully addressed the first point: the dataset names and their descriptions are now explicitly linked at https://retriever.readthedocs.io/en/latest/datasets_list.html and through the verbose presentation that @shreyneil points to.

We've also improved the ability to digest and work with this metadata in Python and R.

We haven't yet grappled with the naming conventions issue, but to be honest that feels like a broader community discussion involving Frictionless Data and other folks. Specifically I think any conventions related to naming should end up in https://frictionlessdata.io/specs/data-package/#required-properties, where there aren't any specifics about naming at this time.

Thanks for the issue @cboettig. We may not get to things fast, but we do seem to get to them eventually 😄.
