[BUG] CSV Datasource error #13

Closed
bonacciog opened this issue Jan 15, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@bonacciog

bonacciog commented Jan 15, 2021

Describe the bug
I'm not able to get started with the QuickStart example pipeline.
Trying to run:
ds = CSVDatasource(name='Pima Indians Diabetes Dataset', path='gs://zenml_quickstart/diabetes.csv')

To Reproduce
I have followed QuickStart steps:

  1. pip install zenml
  2. zenml init
  3. Run the QuickStart example

Screenshots
Schermata 2021-01-15 alle 14 56 56

Stack Trace

KeyError Traceback (most recent call last)
in
1 # Add a datasource. This will automatically track and version it.
----> 2 ds = CSVDatasource(name='Pima Indians Diabetes Dataset', path='gs://zenml_quickstart/diabetes.csv')
3 training_pipeline.add_datasource(ds)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/datasources/csv_datasource.py in __init__(self, name, path, schema, **unused_kwargs)
45 schema (str): optional schema for data to conform to.
46 """
---> 47 super().__init__(name, schema, **unused_kwargs)
48 self.path = path
49

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/datasources/base_datasource.py in __init__(self, name, schema, _id, _source, *args, **kwargs)
61 else:
62 # If none, then this is assumed to be 'new'. Check dupes.
---> 63 all_names = Repository.get_instance().get_datasource_names()
64 if any(d == name for d in all_names):
65 raise Exception(

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/repo/repo.py in get_datasource_names(self)
236 c = yaml_utils.read_yaml(file_path)
237 n.append(c[keys.GlobalKeys.DATASOURCE][keys.DatasourceKeys.NAME])
--> 238 return list(set(n))
239
240 @track(event=GET_DATASOURCES)

KeyError: 'datasource'

Context (please complete the following information):

  • OS: MacOS Big Sur 11.1
  • Python Version: 3.8.2
  • ZenML Version: 0.1.3
@hamzamaiot
Contributor

Thanks for reporting, @bonacciog! Could you please verify that your pipelines_dir does not contain any YAML files from older runs of ZenML? The error might be due to a non-backwards-compatible change to the YAML standard between 0.1.2 and 0.1.3.

@hamzamaiot added the bug label Jan 15, 2021
@bonacciog
Author

@hamzamaiot Thanks for your reply!

Schermata 2021-01-15 alle 15 08 40

It doesn't seem that there is a YAML file. Is this screenshot enough?

@hamzamaiot
Contributor

Could you check in the pipelines directory?

ls pipelines/

@bonacciog
Author

Schermata 2021-01-15 alle 15 32 31

@hamzamaiot
Contributor

hamzamaiot commented Jan 15, 2021

Ah yes, I see the bug. The Untitled.ipynb file should not exist in the pipelines directory. In fact, nothing should exist in the pipelines directory except the pipeline YAML configurations. That's a condition we should be catching in the code, so thanks for bringing it up! I'll keep this issue open and add a PR to it that catches these errors more gracefully.

As an immediate fix, move Untitled.ipynb out into the root zenml_practise dir and try again. It should hopefully work.

P.S. If you wanted a recommended directory structure, we have one in our docs for reference. Hope it helps!
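
A minimal sketch of the kind of check described above, assuming the pipelines directory should hold only `.yaml` or `.yml` files. The helper name `find_stray_files` and the suffix set are hypothetical and are not part of ZenML; the function only lists files that a YAML-only reader could trip over, such as `Untitled.ipynb`.

```python
from pathlib import Path
from typing import List

# Assumption: pipeline configurations use one of these suffixes.
YAML_SUFFIXES = {".yaml", ".yml"}


def find_stray_files(pipelines_dir: str) -> List[str]:
    """Return the names of files in pipelines_dir that are not YAML files.

    Hypothetical helper for illustration, not a ZenML API.
    Subdirectories are ignored; only regular files are reported.
    """
    directory = Path(pipelines_dir)
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.suffix.lower() not in YAML_SUFFIXES
    )
```

Calling `find_stray_files("pipelines")` before creating a datasource would surface a misplaced notebook by name rather than through a `KeyError` raised while reading it.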

@bonacciog
Author

Thank you for your help @hamzamaiot!

I have created that file, so my mistake.
Now the dir is organized in this way:
Schermata 2021-01-15 alle 16 35 38

I created a "notebooks" dir and a QuickStart.ipynb file in it to start with the QuickStart example.

Now I get another error:
Schermata 2021-01-15 alle 16 38 05

@bonacciog
Author

Maybe I have to work outside of the subdirectories (notebooks, pipelines, ...)? @hamzamaiot

@htahir1
Contributor

htahir1 commented Jan 15, 2021

Sorry, switching accounts. Now the error is clearer: the datasource already exists, so you can either fetch it in your script using repository.get_datasource_by_name(). The repository instance can be fetched with Repository.get_instance().

Or to start from scratch just delete the pipelines directory and it will work 👍
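
A minimal sketch of the first option, written against a stand-in object so that it runs without ZenML installed. The method names `get_datasource_names()` and `get_datasource_by_name()` come from this thread, and the latter is assumed to take the datasource name; `get_or_create_datasource`, `FakeRepository`, and the `factory` argument are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List


def get_or_create_datasource(repo: Any, name: str, factory: Callable[[], Any]) -> Any:
    """Reuse the datasource called `name` if the repo knows it, else build one.

    Assumption: `repo` exposes get_datasource_names() and
    get_datasource_by_name(name), as mentioned in this thread.
    """
    if name in repo.get_datasource_names():
        return repo.get_datasource_by_name(name)
    # Not registered yet, so let the caller construct a fresh datasource.
    return factory()


class FakeRepository:
    """Stand-in for the real repository, for illustration only."""

    def __init__(self, datasources: Dict[str, Any]) -> None:
        self._datasources = datasources

    def get_datasource_names(self) -> List[str]:
        return list(self._datasources)

    def get_datasource_by_name(self, name: str) -> Any:
        return self._datasources[name]
```

With ZenML itself, `repo` would presumably be `Repository.get_instance()` and `factory` a lambda that builds the `CSVDatasource` from the QuickStart; that pairing is untested against 0.1.3.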

@bonacciog
Author

Thank you! @htahir1
Now it is working fine! My mistake, sorry

@htahir1
Contributor

htahir1 commented Jan 16, 2021

No problem! We also added a fix in #14 for the YAML pipelines dir problem you were facing. Thanks for the heads-up!

@htahir1 closed this as completed Jan 16, 2021