Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions for committing dataset #1338

Open
wants to merge 7 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@apoorvaeternity
Copy link
Member

commented May 30, 2019

No description provided.

@@ -199,7 +199,7 @@ def xml2csv(input_file, outputfile=None, header_values=None, row_tag="row"):
return outputfile


def getmd5(data, data_type='lines', encoding='utf-8'):
def getmd5(data, data_type='lines', encoding=ENCODING):

This comment has been minimized.

Copy link
@henrykironde

henrykironde May 30, 2019

Contributor

I do not recommend changing this line.

This comment has been minimized.

Copy link
@apoorvaeternity

apoorvaeternity May 30, 2019

Author Member

It was causing encoding issues when trying to find the md5 of raw data of aquatic-animal-excretion.

This comment has been minimized.

Copy link
@henrykironde

henrykironde May 30, 2019

Contributor

If you are dealing with this script, you may want to start debugging from
retriever install --debug csv aquatic-animal-excretion

This comment has been minimized.

Copy link
@apoorvaeternity

apoorvaeternity May 30, 2019

Author Member

errr
It is giving the same encoding error. Working fine if I change the encoding to ISO-8859-1 for the script.

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch from ce32961 to 1072546 May 30, 2019



def package_details():
details = {}

This comment has been minimized.

Copy link
@henrykironde

henrykironde May 30, 2019

Contributor

details = {'retriever': VERSION[1:]}

"""
if type(dataset) == str:
# if dataset is not a dataset script object find the right script
dataset = [script for script in datasets() if script.name == dataset][0]

This comment has been minimized.

Copy link
@henrykironde

henrykironde May 30, 2019

Contributor
paths_to_zip = {'script': dataset._file,
                'raw_data': []}

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch from 813e3d4 to 20aa09a May 31, 2019

@apoorvaeternity

This comment has been minimized.

Copy link
Member Author

commented Jun 6, 2019

CLI usage:
retriever commit abalone-age -m "commit message" -p path_to_store_archive_file

@henrykironde

This comment has been minimized.

Copy link
Contributor

commented Jun 6, 2019

Thanks @apoorvaeternity, hope it covers all added functionality. For example, if I give `retriever commit abalone-aged, I should be able to get some good help messages. I have not tested yet.

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch from 1ef6b79 to 0069dd1 Jun 6, 2019

@henrykironde

This comment has been minimized.

Copy link
Contributor

commented Jun 9, 2019

Just tested this and it looks like the commit are good. We should discuss more when you are free.

details = {}
packages = pkg_resources.working_set
for package in packages:
if package:

This comment has been minimized.

Copy link
@henrykironde

henrykironde Jun 11, 2019

Contributor

I think if package: will always be True

"""
Commit dataset to a zipped file.
"""
if type(dataset) == str:

This comment has been minimized.

Copy link
@henrykironde

henrykironde Jun 11, 2019

Contributor

Would did be the best way to handle this line if type(dataset) == str isinstance(s, str)

This comment has been minimized.

Copy link
@henrykironde

henrykironde Jun 11, 2019

Contributor

I still don't know why we are checking for this string.

This comment has been minimized.

Copy link
@apoorvaeternity

apoorvaeternity Jun 11, 2019

Author Member

If we don't pass a dataset script object but only the name of dataset it should be able to handle it.

print("Successfully committed.")
except Exception as e:
print("Dataset could not be committed.")
raise (e)

This comment has been minimized.

Copy link
@henrykironde

henrykironde Jun 11, 2019

Contributor

Avoid raising, in case we have to use other programs, the raise will interfere with the python interpreter. Let's print the error e and use return.

This comment has been minimized.

Copy link
@apoorvaeternity

apoorvaeternity Jun 11, 2019

Author Member

Okay 👍

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch 3 times, most recently from a8d1f95 to 228a70d Jun 11, 2019

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch from 228a70d to a34f383 Jun 11, 2019

@apoorvaeternity

This comment has been minimized.

Copy link
Member Author

commented Jun 11, 2019

@henrykironde How does it look now?

@apoorvaeternity apoorvaeternity changed the title [WIP] Add functions for provenance feature Add functions for committing dataset Jun 11, 2019

@henrykironde

This comment has been minimized.

Copy link
Contributor

commented Jun 11, 2019

Looks good. I think we can make the tests more comprehensive . Lets discuss more when you get time, feel free to call me.

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch 2 times, most recently from ee082dd to 75573a0 Jun 13, 2019

@apoorvaeternity apoorvaeternity force-pushed the apoorvaeternity:provenance_script branch from 75573a0 to fcd5a39 Jun 13, 2019

@apoorvaeternity

This comment has been minimized.

Copy link
Member Author

commented Jun 13, 2019

Once #1343 gets merged I will update the URLs for the raw data of sample-dataset that is being used for testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.