Revamp Quickstart #604

fa9r · 2022-05-17T18:46:28Z

Describe changes

I finished @htahir1 's work on creating a new quickstart.

Overall, I completely overhauled the content, structure, and code of all parts: README, notebooks/quickstart.ipynb, run.py, pipelines/, steps/, etc. The new content is basically a compressed & polished version of ZenBytes 1.1 to 3.1.

The PR also includes a small bug fix in src/zenml/integrations/facets/visualizers/facet_statistics_visualizer.py that allows us to use the Facets visualizer in Colab.

@reviewers: Since literally everything changed, it does not make much sense to review the quickstart part by file changes; I'd suggest to manually go through the quickstart notebook / colab / script instead.

Pre-requisites

Please ensure you have done the following:

I have read the CONTRIBUTING.md document.
If my change requires a change to docs, I have updated the documentation accordingly.
If I have added an integration, I have updated the integrations table.
[] I have added tests to cover my changes.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Other (add details above)

review-notebook-app · 2022-05-17T18:46:32Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

ayush714 · 2022-05-18T07:46:38Z

From my side it's good to go

strickvl · 2022-05-18T08:52:08Z

I just went through this. Some thoughts / feedback in no particular order:

I think you need to merge develop branch into here. There are some CLI fixes that I don't see in this branch (esp stuff like zenml stack list which are fixed on develop)
There's somehow a mismatch between the README and the notebook version. i.e. the README has you create the profile and update the stack and install the integrations, but then the notebook has you do all that stuff all over again. Maybe just tell people to use the notebook at the top before you give all the instructions?
I had a little bit the feeling of 'too fast, too furious' where we got into the full multi-pipeline diagram. Made me think that there are somehow two kinds of people who will come to the quickstart:
- 1. the people who are true beginners (maybe even beginners to ML)
- 1. the people who have been around the block, have deployed models somehow for work and who just want to see the 'full story' showing a somewhat complex use case
- I'm not sure how we bring those two audiences together. Seems like a bit of a conflict in my mind.
- I know we want to showcase ZenML in all its glory, but if you have no idea about anything, then I think it would be a bit overwhelming.
- (Appreciate that this isn't the most useful / productive feedback at this stage :) )
You start with the pipeline and then define all the steps. I wonder whether it could help with the information overload to start with telling the story of all the individual pieces and then show how it all comes together as pipelines?
The header "Run ML Pipelines" should probably be "Run ZenML Pipelines"
I actually got the following error when I tried to run the pipelines, so I was unable to get any further in the notebook:

RuntimeError: Failed to start service 
MLFlowDeploymentService[841cb997-d79d-4c6e-8712-9c977316e3b8] (type: model-serving, flavor: 
mlflow)
  Administrative state: `active`
  Operational state: `inactive`
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: 
/Users/strickvl/Library/Application Support/zenml/local_stores/fa711f02-18b6-4a07-95b2-a79fbf
7a2a49/841cb997-d79d-4c6e-8712-9c977316e3b8/service.log

All that said, however, there is a certain kind of more technical / competent user who will find this quickstart amazing and exactly what they wanted, so nice work on that! I think this is a great step forward for them.

…ze-quickstart

htahir1

I ran the quickstart and noticed a few things:

The restart button works but then it takes a bit long to actually do the restart. You can even run the next few cells AFTER you execute the restart and it somehow still runs them making them appear as if the restart
The FacetsVisualizer needs to run twice to get it working. This is a known bug but i would add it in the notebook somewhere
The DAG Visualizer within the notebook is a bit glitchy. Its zoomed out and doesnt allow for panning properly. Would be cool to make it a bit more palatable within the notebook and also give an option to make it work outside the notebook for non-colab users
I have not run the actually code yet but shouldnt we use the scritps to run it on kubeflow? That would be a missing link the the story. What I would do is: Hey now you ran it on notebook, run it on a kubeflow stack with these commands and then either point them to the README where everything is written or have like a run_on_kubeflow.sh script that does the stack provisoning and running the pipeline

Overall fantastic work! Really love it!

…nd move run.py-specific set up instructions out of prerequisites.

fa9r · 2022-05-18T09:32:16Z

I just went through this. Some thoughts / feedback in no particular order:

I think you need to merge develop branch into here. There are some CLI fixes that I don't see in this branch (esp stuff like zenml stack list which are fixed on develop)

Just merged Hamza's branch (which just merged develop), now the CLI issues should be fixed here as well.

There's somehow a mismatch between the README and the notebook version. i.e. the README has you create the profile and update the stack and install the integrations, but then the notebook has you do all that stuff all over again. Maybe just tell people to use the notebook at the top before you give all the instructions?

True, the detailed setup instructions are only needed when running with run.py actually. I moved all those instructions down to the run.py section and also marked the notebook option as 'Recommended'.

I had a little bit the feeling of 'too fast, too furious' where we got into the full multi-pipeline diagram. Made me think that there are somehow two kinds of people who will come to the quickstart:

the people who are true beginners (maybe even beginners to ML)

the people who have been around the block, have deployed models somehow for work and who just want to see the 'full story' showing a somewhat complex use case

I'm not sure how we bring those two audiences together. Seems like a bit of a conflict in my mind.

I know we want to showcase ZenML in all its glory, but if you have no idea about anything, then I think it would be a bit overwhelming.

(Appreciate that this isn't the most useful / productive feedback at this stage :) )

Very true, I was thinking about this too. I found it hard to show off the value that ZenML creates without making it somewhat complex though (IMO a lot of the value comes from our various integrations and how we abstract them via stacks). I think for type 1 (true beginners) ZenBytes is the place to start. I included a link to that at the end of the quickstart, but now that I think about it a lot of folks might drop out before ever seeing that if they feel overwhelmed during the quickstart already. How about we add some kind of disclaimer at the beginning that mentions that the target audience of this quickstart is type 2 (having basic ML/MLOps experimence) and link to ZenBytes there as well. Thanks for pointing this out @strickvl , this is a big issue IMO. Any other ideas how to address it?

You start with the pipeline and then define all the steps. I wonder whether it could help with the information overload to start with telling the story of all the individual pieces and then show how it all comes together as pipelines?

Yes, great point. I think the natural flow is how we do it in ZenBytes:

start with data/train/eval steps
put it into training pipeline
add deployment and skew detection to train pipeline
run the train pipeline and show the model running
write the prediction service loader step to load the deployed model
write the basic inference pipeline with inference data loader and predictor
add skew and drift detection to inference pipeline
run inference pipeline

The reason I changed the ordering here is that I tried to make the quickstart more quick. This way, we just directly define the pipelines, then code up the steps (which I'd expect people to just skim over), then run them both and get done with it. IMO I feel going into detail on each step/part is a bit too much content here. However, this approach also has drawbacks as you said, e.g. one thing I don't like is that the prediction service loader step is defined before the training pipeline is run which makes that step feel really obscure. Personally, I still prefer the current ordering for sake of brevity as mentioned above, but I'm open to change it if more people prefer that.

The header "Run ML Pipelines" should probably be "Run ZenML Pipelines"

Good suggestion, I changed it.

I actually got the following error when I tried to run the pipelines, so I was unable to get any further in the notebook:

RuntimeError: Failed to start service 
MLFlowDeploymentService[841cb997-d79d-4c6e-8712-9c977316e3b8] (type: model-serving, flavor: 
mlflow)
  Administrative state: `active`
  Operational state: `inactive`
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: 
/Users/strickvl/Library/Application Support/zenml/local_stores/fa711f02-18b6-4a07-95b2-a79fbf
7a2a49/841cb997-d79d-4c6e-8712-9c977316e3b8/service.log

This is caused by the new MLflow 1.26.0 not being compatible with ZenML. I think @stefannica already fixed this in this PR: #595, so this also be fixed now that I merged develop.

All that said, however, there is a certain kind of more technical / competent user who will find this quickstart amazing and exactly what they wanted, so nice work on that! I think this is a great step forward for them.

Thanks so much for your amazing review @strickvl!

strickvl · 2022-05-18T09:37:50Z

Very true, I was thinking about this too. I found it hard to show off the value that ZenML creates without making it somewhat complex though (IMO a lot of the value comes from our various integrations and how we abstract them via stacks). I think for type 1 (true beginners) ZenBytes is the place to start. I included a link to that at the end of the quickstart, but now that I think about it a lot of folks might drop out before ever seeing that if they feel overwhelmed during the quickstart already. How about we add some kind of disclaimer at the beginning that mentions that the target audience of this quickstart is type 2 (having basic ML/MLOps experimence) and link to ZenBytes there as well. Thanks for pointing this out @strickvl , this is a big issue IMO. Any other ideas how to address it?

I think the core thing is just going to be how we guide people through our various entry pages / entry points. The core ones are:

whatever happens when someone types zenml go
our main webpage (zenml.io)
our Github README
the docs

All of these should be united and uniform I think in offering two different paths: (1) for ultra beginners and (2) for people with experience who want to get a quick glimpse of the power of ZenML. Those two different paths seem to require pretty different experiences or guides, so we should just be explicit about that and cater to them separately.

We should maybe use the word 'beginner' somehow as a signpost for real beginners.

In any case, we should just be really explicit about which examples/notebooks are for you if you're new to things vs if you've been around the block a few times. (This intersects with my work on the README, btw, so will sync with you later in the day on that I guess so we have this standardised everywhere).

fa9r · 2022-05-18T09:43:06Z

I ran the quickstart and noticed a few things:

The restart button works but then it takes a bit long to actually do the restart. You can even run the next few cells AFTER you execute the restart and it somehow still runs them making them appear as if the restart

Yeah this can be confusing when you just want to shift-enter through it. Already added a comment below the installs telling people to wait for the cell to finish and kernel to restart before proceeding. Not sure how to do this better, any ideas?

The FacetsVisualizer needs to run twice to get it working. This is a known bug but i would add it in the notebook somewhere

I haven't experienced that myself yet but I added a note to rerun the cell if nothing happens.

The DAG Visualizer within the notebook is a bit glitchy. Its zoomed out and doesnt allow for panning properly. Would be cool to make it a bit more palatable within the notebook and also give an option to make it work outside the notebook for non-colab users

Haven't experienced that myself either, I will rerun the quickstart myself again now that your branch and develop are merged. Let's see if this can be improved.

I have not run the actually code yet but shouldnt we use the scritps to run it on kubeflow? That would be a missing link the the story. What I would do is: Hey now you ran it on notebook, run it on a kubeflow stack with these commands and then either point them to the README where everything is written or have like a run_on_kubeflow.sh script that does the stack provisoning and running the pipeline

Great suggestion, would love to do that, but that will still be a lot of work. IMO this is also the missing link in ZenBytes before we deploy to the cloud. IMO this is the next piece of content we should write and then add it into the quickstart too!

Overall fantastic work! Really love it!

Glad to hear, thanks for the review 😃

htahir1 · 2022-05-18T10:18:25Z

Yeah this can be confusing when you just want to shift-enter through it. Already added a comment below the installs telling people to wait for the cell to finish and kernel to restart before proceeding. Not sure how to do this better, any ideas?

Maybe a os.sleep(5) to even force people to wait? :D

I haven't experienced that myself yet but I added a note to rerun the cell if nothing happens.

Thanks!

Haven't experienced that myself either, I will rerun the quickstart myself again now that your branch and develop are merged. Let's see if this can be improved.

Yes lets do it. Happy to hear it works for you. @strickvl Did you also find the inline dag visualizer ok to work with?

Great suggestion, would love to do that, but that will still be a lot of work. IMO this is also the missing link in ZenBytes before we deploy to the cloud. IMO this is the next piece of content we should write and then add it into the quickstart too!

Is it a lot of work really? Would it not just be a few lines in the README and at the end of the notebook? We already have the code itself in zenbytes

strickvl · 2022-05-18T10:46:55Z

@htahir1 I never got that far in the notebook since there was some bug which prevented me from running the pipelines at all.

fa9r · 2022-05-18T12:22:12Z

Maybe a os.sleep(5) to even force people to wait? :D

Doesn't work unfortunately, still kills the subsequent cells

…zenml-io/zenml into feature/CONTENT-166-finalize-quickstart

* Getting there * midway best practices * Added more explain stuff * Edited best practices * checkpoint * Checkpoint for quickstart * Added notebook * Added dash visualizer jupyter * Refactored imports * Added runner script * Quickstart now follows best practices * Quickstart formatted * Updated zenml go * Latest quickstart * Adapted zenml go to new quickstart * Added event for extending ZenML * Linting fixes * zenml go is now optout too * Made an exception for pipeline yamls * Added config * Apply suggestions from code review Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> * Deleting * preserving user metadata * Fromatted latest * Formatting and linting * Formatting and linting * Typo * One function down * Update stakc comp done * Deregister stack * Create user * Delete user * Get user * Added all events * Added project * Added facade for all base methods * Deleted flavor event delted * Wrapped track event to make sure deduplication doesnt happen * Wrapped track event to make sure deduplication doesnt happen * Registered stack refactored * Flavor event is now properly done * Update stack done * Added import and export stack * Added import and export stack * Analytics events done * Copy paste error * Fixed mypy errors * Toc updated * Restructurung * Restructure * GitBook: [#2] No subject * Fix/update `zenml go` (#606) * Fix broken feature store example (#602) * allow yaml files * include yaml config * Added success message (#603) * Added success message * Reformatted * remove extra colon * add extra spacing * update zenml go behaviour Co-authored-by: Alexej Penner <thealexejpenner@gmail.com> * Analytics turned off in copy_active_configuration * Docstring nits * Formatting * Added todo * Revamp Quickstart (#604) * WIP: rework/reformat quickstart. * Finished quickstart.ipynb (untested). * Improved docs/comments of quickstart.ipynb. * Fix bugs in quickstart.ipynb. Everything except dash integration works. * Update quickstart README and setup.py. * Bug fix: allow Facets visualizer magic in Colab. * Added screenshots for all visualizations to quickstart.ipynb. * Added pipeline visualization to quickstart.ipynb. * Added stack visualization to quickstart.ipynb. * Wrote new quickstart run.py script / code modules. * Fix grammar / spelling of quickstart. * Update quickstart README: declare notebook execution as recommended and move run.py-specific set up instructions out of prerequisites. * Adjust quickstart to reviewer feedback. * Added magic flag Co-authored-by: Hamza Tahir <htahir111@gmail.com> * Changed best practices to misc and added resources section * Finished landing page of resoruces * Half finished best practices * Added eventbrite * Getting there * FAQs added basically * Fixed link; * GitBook: [#3] No subject * Apply suggestions from code review Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com> Co-authored-by: Alexej Penner <thealexejpenner@gmail.com> Co-authored-by: Felix Altenberger <felix@zenml.io>

fa9r added 10 commits May 16, 2022 17:57

WIP: rework/reformat quickstart.

30dd244

Finished quickstart.ipynb (untested).

1a720a7

Improved docs/comments of quickstart.ipynb.

e2ef487

Fix bugs in quickstart.ipynb. Everything except dash integration works.

46c088b

Update quickstart README and setup.py.

ad9d6d4

Bug fix: allow Facets visualizer magic in Colab.

d938082

Added screenshots for all visualizations to quickstart.ipynb.

6dba722

Added pipeline visualization to quickstart.ipynb.

d3e2b79

Added stack visualization to quickstart.ipynb.

5b03ef2

Wrote new quickstart run.py script / code modules.

e161dc5

fa9r requested review from htahir1 and strickvl May 17, 2022 18:46

github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels May 17, 2022

fa9r changed the title ~~Feature/content 166 finalize quickstart~~ Revamped Quickstart May 17, 2022

fa9r changed the title ~~Revamped Quickstart~~ Revamp Quickstart May 17, 2022

Fix grammar / spelling of quickstart.

94cb9f6

Merge branch 'doc/hamza-misc-updates' into feature/CONTENT-166-finali…

e7d8beb

…ze-quickstart

htahir1 reviewed May 18, 2022

View reviewed changes

Update quickstart README: declare notebook execution as recommended a…

7506b8e

…nd move run.py-specific set up instructions out of prerequisites.

Adjust quickstart to reviewer feedback.

8e3215c

htahir1 and others added 2 commits May 18, 2022 15:06

Added magic flag

2b04621

Merge branch 'feature/CONTENT-166-finalize-quickstart' of github.com:…

da70928

…zenml-io/zenml into feature/CONTENT-166-finalize-quickstart

htahir1 merged commit d405139 into doc/hamza-misc-updates May 18, 2022

htahir1 deleted the feature/CONTENT-166-finalize-quickstart branch May 18, 2022 13:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revamp Quickstart #604

Revamp Quickstart #604

fa9r commented May 17, 2022 •

edited

Loading

review-notebook-app bot commented May 17, 2022

ayush714 commented May 18, 2022

strickvl commented May 18, 2022 •

edited

Loading

htahir1 left a comment

fa9r commented May 18, 2022

strickvl commented May 18, 2022

fa9r commented May 18, 2022

htahir1 commented May 18, 2022

strickvl commented May 18, 2022

fa9r commented May 18, 2022

Revamp Quickstart #604

Revamp Quickstart #604

Conversation

fa9r commented May 17, 2022 • edited Loading

Describe changes

Pre-requisites

Types of changes

review-notebook-app bot commented May 17, 2022

ayush714 commented May 18, 2022

strickvl commented May 18, 2022 • edited Loading

htahir1 left a comment

Choose a reason for hiding this comment

fa9r commented May 18, 2022

strickvl commented May 18, 2022

fa9r commented May 18, 2022

htahir1 commented May 18, 2022

strickvl commented May 18, 2022

fa9r commented May 18, 2022

fa9r commented May 17, 2022 •

edited

Loading

strickvl commented May 18, 2022 •

edited

Loading