[WIP] Add ability to use existing data packages #980
The basics of this are working. Running:
Will grab the JSON data package file, modify it to run with the retriever, and add it to the scripts directory. If other folks' data package files looked exactly like ours, we'd just need to hook this code up, but they don't, and I think this is going to mean a fair bit of additional work. E.g.,
Downloads Frictionless Data data packages from the web and adds them to the retriever. Accomplished by:
* Retrieving the JSON metadata file
* Modifying it to work with the retriever
* Writing it to the scripts directory

New packages are added by adding a name-url pair to scripts/datapackages.yml.
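The three steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `download_external_dps` borrows its name from the diff, but its signature here is hypothetical, `adapt_for_retriever` is a hypothetical placeholder for the retriever-specific changes, and `registry` is assumed to be the name-url mapping already loaded from scripts/datapackages.yml (for example with PyYAML's `yaml.safe_load`, which is not included here).

```python
import json
import os
import urllib.request


def fetch_json(url):
    """Download a JSON document from ``url`` and parse it."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def adapt_for_retriever(datapackage):
    """Hypothetical placeholder for the retriever-specific modifications."""
    return dict(datapackage)


def download_external_dps(registry, scripts_dir, fetch=fetch_json):
    """Fetch each registered data package and write it to ``scripts_dir``.

    ``registry`` maps a script name to the URL of its datapackage.json
    (assumed to be the name-url pairs from scripts/datapackages.yml).
    """
    os.makedirs(scripts_dir, exist_ok=True)
    written = []
    for name, url in registry.items():
        # Steps 1 and 2: retrieve the JSON metadata and modify it.
        package = adapt_for_retriever(fetch(url))
        # Step 3: write it to the scripts directory as <name>.json.
        path = os.path.join(scripts_dir, name + ".json")
        with open(path, "w") as f:
            json.dump(package, f, indent=2)
        written.append(path)
    return written
```

Passing `fetch` as a parameter keeps the network call swappable, so the loop can be exercised without downloading anything.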
External data packages will have a number of keys that we don't use in the retriever. This only loads the keys that we work with, handles their types correctly, and ignores the rest. This fixes a problem where non-retriever keys with string values were not handled properly.
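One way to picture the "load only the keys we use" approach is an allowlist of keys with their expected types. The key set in `RETRIEVER_KEYS` below is an assumption for illustration, not the retriever's actual list, and `filter_keys` is a hypothetical helper.

```python
# Hypothetical allowlist: keys the retriever is assumed to use, with their types.
RETRIEVER_KEYS = {
    "name": str,
    "title": str,
    "description": str,
    "citation": str,
    "keywords": list,
    "resources": list,
}


def filter_keys(datapackage):
    """Keep only allowlisted keys, coercing values to the expected type.

    Any key not in RETRIEVER_KEYS is dropped, so stray values on
    non-retriever keys never reach the script loader.
    """
    cleaned = {}
    for key, expected in RETRIEVER_KEYS.items():
        if key not in datapackage:
            continue
        value = datapackage[key]
        if expected is list and isinstance(value, str):
            value = [value]  # e.g. a single keyword given as a bare string
        elif expected is str and not isinstance(value, str):
            value = str(value)
        cleaned[key] = value
    return cleaned
```

For example, `filter_keys({"name": "demo", "keywords": "birds", "profile": "tabular-data-package"})` returns `{"name": "demo", "keywords": ["birds"]}`: the unused `profile` key is ignored and the string keyword is wrapped in a list.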
The retriever still describes types using its old system instead of the official Frictionless Data spec. This converts Frictionless Data types to retriever types so that they are handled properly.
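A minimal sketch of the type conversion, applied to a Table Schema `fields` list. The `number`, `integer` and `date` entries follow the mapping shown in the reviewed diff; `string` mapping to `char` is an assumption, and unmapped types (such as `datetime`) fall back to `char`. Per the Table Schema spec, a field with no `type` is treated as `string`. `convert_field_types` is a hypothetical name.

```python
# Frictionless Data field types -> retriever types.
# 'number', 'integer' and 'date' follow the reviewed diff; 'string' is assumed.
FD_TO_RETRIEVER = {
    "string": "char",
    "number": "double",
    "integer": "int",
    "date": "char",
}


def convert_field_types(schema):
    """Return a copy of a Table Schema with retriever type names.

    Types without a mapping fall back to 'char'; the input is not mutated.
    """
    fields = []
    for field in schema.get("fields", []):
        converted = dict(field)
        fd_type = field.get("type", "string")  # spec default is 'string'
        converted["type"] = FD_TO_RETRIEVER.get(fd_type, "char")
        fields.append(converted)
    return {**schema, "fields": fields}
```

Falling back to `char` keeps unusual types loadable as text rather than failing on import.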
There is no system for ensuring that data package names are unique and descriptive. This replaces the name in the original data package with the name specified in datapackages.yml.
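The renaming step might look like the sketch below. Using the registry keys as names gives uniqueness within the registry, since mapping keys are unique. The extra check against `existing_names` (already installed scripts) is a hypothetical addition, as is the helper name `apply_registered_names`.

```python
def apply_registered_names(packages, existing_names=()):
    """Rename each downloaded package to its key in datapackages.yml.

    ``packages`` maps the registered name to the downloaded datapackage
    dict. As a hypothetical safeguard, a clash with ``existing_names``
    raises ValueError.
    """
    taken = set(existing_names)
    renamed = {}
    for registered_name, package in packages.items():
        if registered_name in taken:
            raise ValueError("duplicate script name: " + registered_name)
        taken.add(registered_name)
        updated = dict(package)
        updated["name"] = registered_name  # overwrite the upstream name
        renamed[registered_name] = updated
    return renamed
```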
b07d216 to a4ef09b
To properly test this feature, it was necessary to add datapackages.yml to the master branch separately.
4e6b494 to 373a702
@@ -26,6 +27,10 @@ def download_from_repository(filepath, newpath, repo=REPOSITORY):
raise
pass

def download_external_dps():
Some cleanup:
- More blank lines before the function.
'number': 'double',
'integer': 'int',
'date': 'char'
}
- Closing bracket is not indented properly
- Use double spacing between functions
- Do we need the extra libraries `urlparse`, `urlunparse`, `ParseResult`?
2. Add new tables for spatial data, so that the retriever can handle geospatial data.
3. Define a table for each script before pre-processing.
4. Add the ability to handle compressed/archived datasets, using the keyword `archived` and listing the files in the zip for each table.
5. Handle data packages registered in the `datapackage.yml` file. I have removed the scripts from this file that do not appear to follow the standards. To make this possible, we match the major `frictionless` types (integer, string, etc.) to the types in the retriever and fall back to char for other types, for example datetime (YYYY-MM-DDThh:mm:ssZ).

Ref: This combines the work from PRs weecology#814, weecology#980, and weecology#1005.
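Item 4 (archived datasets) can be illustrated with the standard-library `zipfile` module. The resource shape assumed here, `{"name": ..., "archived": "zip", "path": "<file inside the zip>"}`, is a hypothetical layout for illustration, and `read_archived_table` is a hypothetical helper rather than the retriever's API.

```python
import io
import zipfile


def read_archived_table(archive_bytes, resource):
    """Return the text of one table stored inside a zip archive.

    Assumes a hypothetical resource layout:
    {"name": ..., "archived": "zip", "path": "<file inside the zip>"}.
    """
    if resource.get("archived") != "zip":
        raise ValueError("resource is not marked as a zip archive")
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Open only the member file that belongs to this table.
        with archive.open(resource["path"]) as member:
            return member.read().decode("utf-8")
```

Reading each table's member by name means a single downloaded archive can back several tables.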
Closing this PR now that the changes have been incorporated into PR #1010. Feel free to reopen this PR if anything new comes to light!