[BUG] CSV Datasource error #13

bonacciog · 2021-01-15T13:59:50Z

Describe the bug
I'm unable to get started with the quickstart example pipeline.
The error occurs when running:
ds = CSVDatasource(name='Pima Indians Diabetes Dataset', path='gs://zenml_quickstart/diabetes.csv')

To Reproduce
I have followed QuickStart steps:

pip install zenml
zenml init
Run the QuickStart example

Screenshots

Stack Trace

KeyError Traceback (most recent call last)
in
1 # Add a datasource. This will automatically track and version it.
----> 2 ds = CSVDatasource(name='Pima Indians Diabetes Dataset', path='gs://zenml_quickstart/diabetes.csv')
3 training_pipeline.add_datasource(ds)

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/datasources/csv_datasource.py in __init__(self, name, path, schema, **unused_kwargs)
45 schema (str): optional schema for data to conform to.
46 """
---> 47 super().__init__(name, schema, **unused_kwargs)
48 self.path = path
49

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/datasources/base_datasource.py in __init__(self, name, schema, _id, _source, *args, **kwargs)
61 else:
62 # If none, then this is assumed to be 'new'. Check dupes.
---> 63 all_names = Repository.get_instance().get_datasource_names()
64 if any(d == name for d in all_names):
65 raise Exception(

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/zenml/core/repo/repo.py in get_datasource_names(self)
236 c = yaml_utils.read_yaml(file_path)
237 n.append(c[keys.GlobalKeys.DATASOURCE][keys.DatasourceKeys.NAME])
--> 238 return list(set(n))
239
240 @track(event=GET_DATASOURCES)

KeyError: 'datasource'

Context (please complete the following information):

OS: MacOS Big Sur 11.1
Python Version: 3.8.2
ZenML Version: 0.1.3

hamzamaiot · 2021-01-15T14:03:57Z

Thanks for reporting, @bonacciog! Could you please verify that your pipelines_dir does not contain any YAML files from older runs of ZenML? This might be due to a non-backwards-compatible upgrade of the YAML standard from 0.1.2 to 0.1.3.
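To illustrate the kind of mismatch suspected here, the following is a minimal sketch, not ZenML's actual code. It assumes the files in the pipelines directory have already been parsed into Python objects, and it separates those that carry the top-level 'datasource' key (the key named in the KeyError above) from those that do not. The function name and the shape of the input mapping are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Top-level key taken from the KeyError in the stack trace above.
DATASOURCE_KEY = "datasource"


def split_by_datasource_key(
    configs: Dict[str, Any],
) -> Tuple[List[str], List[str]]:
    """Partition file names into (has the key, lacks the key).

    `configs` maps a file name to its already-parsed content.
    This helper is hypothetical and only illustrates the check.
    """
    valid: List[str] = []
    invalid: List[str] = []
    for file_name, content in configs.items():
        # Anything that is not a mapping (e.g. None for an unparsable
        # or non-YAML file) cannot contain the key.
        if isinstance(content, dict) and DATASOURCE_KEY in content:
            valid.append(file_name)
        else:
            invalid.append(file_name)
    return sorted(valid), sorted(invalid)
```

Any file reported in the second list would trigger the same KeyError if it were indexed directly with the 'datasource' key.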

bonacciog · 2021-01-15T14:10:27Z

@hamzamaiot Thanks for your reply!

There doesn't seem to be a YAML file. Is this screenshot enough?

hamzamaiot · 2021-01-15T14:16:53Z

Could you check in the pipelines directory?

ls pipelines/

bonacciog · 2021-01-15T14:33:00Z

hamzamaiot · 2021-01-15T15:20:15Z

Ah yes, I see the bug. The Untitled.ipynb file should not exist in the pipelines directory. In fact, nothing should exist in the pipelines directory except the pipeline YAML configurations. That's a condition we should be catching in the code, so thanks for bringing it up! I'll keep this issue open and attach a PR that catches these errors more gracefully.

Your immediate solution would be to move the Untitled.ipynb out into the root zenml_practise dir and try again. It should hopefully work.
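As a quick way to spot this situation before running a pipeline, here is a minimal sketch that lists everything in a directory that is not a regular YAML file. It is not the check that ZenML ships. The function name is hypothetical, and treating both .yaml and .yml as valid suffixes is an assumption.

```python
from pathlib import Path
from typing import List

# Assumption: pipeline configs use one of these suffixes.
YAML_SUFFIXES = {".yaml", ".yml"}


def find_stray_files(pipelines_dir: str) -> List[str]:
    """Return names of entries that are not regular YAML files.

    Subdirectories (for example notebook checkpoint folders) are
    reported as stray too, since only YAML configs are expected.
    """
    stray = []
    for entry in Path(pipelines_dir).iterdir():
        is_yaml_file = (
            entry.is_file() and entry.suffix.lower() in YAML_SUFFIXES
        )
        if not is_yaml_file:
            stray.append(entry.name)
    return sorted(stray)
```

Called on a pipelines directory that contains a notebook, it would return that notebook's file name, which can then be moved elsewhere.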

P.S. If you wanted a recommended directory structure, we have one in our docs for reference. Hope it helps!

bonacciog · 2021-01-15T15:39:16Z

Thank you for your help @hamzamaiot!

I have created that file, so my mistake.
Now the dir is organized in this way:

I have created the dir "notebooks" and file QuickStart.ipynb to start with QuickStart example.

Now I got another error:

bonacciog · 2021-01-15T15:41:15Z

Maybe I have to work outside of the subdirectories (notebooks, pipelines, ...)? @hamzamaiot

htahir1 · 2021-01-15T15:55:20Z

Sorry, switching accounts. Now the error is clearer: the datasource already exists, so you can either fetch it in your script using repository.get_datasource_by_name(). The repository instance can be fetched using Repository.get_instance().

Or to start from scratch just delete the pipelines directory and it will work 👍
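The first option, reusing the datasource when it already exists, can be sketched as a small helper. This is an illustration only, not ZenML's API. The helper name is hypothetical, and it assumes the repository object exposes get_datasource_names() and get_datasource_by_name() as they appear in this thread; their exact behaviour in ZenML 0.1.3 has not been verified here.

```python
from typing import Any, Callable


def get_or_create_datasource(
    repo: Any, name: str, create: Callable[[], Any]
) -> Any:
    """Return the existing datasource called `name`, else create one.

    `repo` is any object with get_datasource_names() and
    get_datasource_by_name(name); `create` builds a new datasource
    and is only called when no datasource with that name exists.
    """
    if name in repo.get_datasource_names():
        return repo.get_datasource_by_name(name)
    return create()
```

In the quickstart, repo would be the object returned by Repository.get_instance(), and create would be a callable wrapping the CSVDatasource(name=..., path=...) call from the original report.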

bonacciog · 2021-01-15T16:02:22Z

Thank you! @htahir1
Now it is working fine! My mistake, sorry

htahir1 · 2021-01-16T20:01:35Z

No problem! We also added a fix for the YAML pipelines dir problem you were facing in #14. Thanks for the heads-up!

hamzamaiot added the bug label Jan 15, 2021

htahir1 closed this as completed Jan 16, 2021
