Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revamp Quickstart #604

Merged

Conversation

fa9r
Copy link
Contributor

@fa9r fa9r commented May 17, 2022

Describe changes

I finished @htahir1 's work on creating a new quickstart.

Overall, I completely overhauled the content, structure, and code of all parts: README, notebooks/quickstart.ipynb, run.py, pipelines/, steps/, etc. The new content is basically a compressed & polished version of ZenBytes 1.1 to 3.1.

The PR also includes a small bug fix in src/zenml/integrations/facets/visualizers/facet_statistics_visualizer.py that allows us to use the Facets visualizer in Colab.

@reviewers: Since literally everything changed, it does not make much sense to review the quickstart part by file changes; I'd suggest to manually go through the quickstart notebook / colab / script instead.

Pre-requisites

Please ensure you have done the following:

  • I have read the CONTRIBUTING.md document.
  • If my change requires a change to docs, I have updated the documentation accordingly.
  • If I have added an integration, I have updated the integrations table.
  • [] I have added tests to cover my changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Other (add details above)

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@fa9r fa9r requested review from htahir1 and strickvl May 17, 2022 18:46
@github-actions github-actions bot added internal To filter out internal PRs and issues enhancement New feature or request labels May 17, 2022
@fa9r fa9r changed the title Feature/content 166 finalize quickstart Revamped Quickstart May 17, 2022
@fa9r fa9r changed the title Revamped Quickstart Revamp Quickstart May 17, 2022
@ayush714
Copy link
Contributor

From my side it's good to go

@strickvl
Copy link
Contributor

strickvl commented May 18, 2022

I just went through this. Some thoughts / feedback in no particular order:

  • I think you need to merge develop branch into here. There are some CLI fixes that I don't see in this branch (esp stuff like zenml stack list which are fixed on develop)
  • There's somehow a mismatch between the README and the notebook version. i.e. the README has you create the profile and update the stack and install the integrations, but then the notebook has you do all that stuff all over again. Maybe just tell people to use the notebook at the top before you give all the instructions?
  • I had a little bit the feeling of 'too fast, too furious' where we got into the full multi-pipeline diagram. Made me think that there are somehow two kinds of people who will come to the quickstart:
      1. the people who are true beginners (maybe even beginners to ML)
      1. the people who have been around the block, have deployed models somehow for work and who just want to see the 'full story' showing a somewhat complex use case
    • I'm not sure how we bring those two audiences together. Seems like a bit of a conflict in my mind.
    • I know we want to showcase ZenML in all its glory, but if you have no idea about anything, then I think it would be a bit overwhelming.
    • (Appreciate that this isn't the most useful / productive feedback at this stage :) )
  • You start with the pipeline and then define all the steps. I wonder whether it could help with the information overload to start with telling the story of all the individual pieces and then show how it all comes together as pipelines?
  • The header "Run ML Pipelines" should probably be "Run ZenML Pipelines"
  • I actually got the following error when I tried to run the pipelines, so I was unable to get any further in the notebook:
RuntimeError: Failed to start service 
MLFlowDeploymentService[841cb997-d79d-4c6e-8712-9c977316e3b8] (type: model-serving, flavor: 
mlflow)
  Administrative state: `active`
  Operational state: `inactive`
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: 
/Users/strickvl/Library/Application Support/zenml/local_stores/fa711f02-18b6-4a07-95b2-a79fbf
7a2a49/841cb997-d79d-4c6e-8712-9c977316e3b8/service.log

All that said, however, there is a certain kind of more technical / competent user who will find this quickstart amazing and exactly what they wanted, so nice work on that! I think this is a great step forward for them.

Copy link
Contributor

@htahir1 htahir1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the quickstart and noticed a few things:

  • The restart button works but then it takes a bit long to actually do the restart. You can even run the next few cells AFTER you execute the restart and it somehow still runs them making them appear as if the restart
  • The FacetsVisualizer needs to run twice to get it working. This is a known bug but i would add it in the notebook somewhere
  • The DAG Visualizer within the notebook is a bit glitchy. Its zoomed out and doesnt allow for panning properly. Would be cool to make it a bit more palatable within the notebook and also give an option to make it work outside the notebook for non-colab users
  • I have not run the actually code yet but shouldnt we use the scritps to run it on kubeflow? That would be a missing link the the story. What I would do is: Hey now you ran it on notebook, run it on a kubeflow stack with these commands and then either point them to the README where everything is written or have like a run_on_kubeflow.sh script that does the stack provisoning and running the pipeline

Overall fantastic work! Really love it!

…nd move run.py-specific set up instructions out of prerequisites.
@fa9r
Copy link
Contributor Author

fa9r commented May 18, 2022

I just went through this. Some thoughts / feedback in no particular order:

  • I think you need to merge develop branch into here. There are some CLI fixes that I don't see in this branch (esp stuff like zenml stack list which are fixed on develop)

Just merged Hamza's branch (which just merged develop), now the CLI issues should be fixed here as well.

  • There's somehow a mismatch between the README and the notebook version. i.e. the README has you create the profile and update the stack and install the integrations, but then the notebook has you do all that stuff all over again. Maybe just tell people to use the notebook at the top before you give all the instructions?

True, the detailed setup instructions are only needed when running with run.py actually. I moved all those instructions down to the run.py section and also marked the notebook option as 'Recommended'.

  • I had a little bit the feeling of 'too fast, too furious' where we got into the full multi-pipeline diagram. Made me think that there are somehow two kinds of people who will come to the quickstart:

      1. the people who are true beginners (maybe even beginners to ML)
      1. the people who have been around the block, have deployed models somehow for work and who just want to see the 'full story' showing a somewhat complex use case
    • I'm not sure how we bring those two audiences together. Seems like a bit of a conflict in my mind.
    • I know we want to showcase ZenML in all its glory, but if you have no idea about anything, then I think it would be a bit overwhelming.
    • (Appreciate that this isn't the most useful / productive feedback at this stage :) )

Very true, I was thinking about this too. I found it hard to show off the value that ZenML creates without making it somewhat complex though (IMO a lot of the value comes from our various integrations and how we abstract them via stacks). I think for type 1 (true beginners) ZenBytes is the place to start. I included a link to that at the end of the quickstart, but now that I think about it a lot of folks might drop out before ever seeing that if they feel overwhelmed during the quickstart already. How about we add some kind of disclaimer at the beginning that mentions that the target audience of this quickstart is type 2 (having basic ML/MLOps experimence) and link to ZenBytes there as well. Thanks for pointing this out @strickvl , this is a big issue IMO. Any other ideas how to address it?

  • You start with the pipeline and then define all the steps. I wonder whether it could help with the information overload to start with telling the story of all the individual pieces and then show how it all comes together as pipelines?

Yes, great point. I think the natural flow is how we do it in ZenBytes:

  • start with data/train/eval steps
  • put it into training pipeline
  • add deployment and skew detection to train pipeline
  • run the train pipeline and show the model running
  • write the prediction service loader step to load the deployed model
  • write the basic inference pipeline with inference data loader and predictor
  • add skew and drift detection to inference pipeline
  • run inference pipeline

The reason I changed the ordering here is that I tried to make the quickstart more quick. This way, we just directly define the pipelines, then code up the steps (which I'd expect people to just skim over), then run them both and get done with it. IMO I feel going into detail on each step/part is a bit too much content here. However, this approach also has drawbacks as you said, e.g. one thing I don't like is that the prediction service loader step is defined before the training pipeline is run which makes that step feel really obscure. Personally, I still prefer the current ordering for sake of brevity as mentioned above, but I'm open to change it if more people prefer that.

  • The header "Run ML Pipelines" should probably be "Run ZenML Pipelines"

Good suggestion, I changed it.

  • I actually got the following error when I tried to run the pipelines, so I was unable to get any further in the notebook:
RuntimeError: Failed to start service 
MLFlowDeploymentService[841cb997-d79d-4c6e-8712-9c977316e3b8] (type: model-serving, flavor: 
mlflow)
  Administrative state: `active`
  Operational state: `inactive`
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: 
/Users/strickvl/Library/Application Support/zenml/local_stores/fa711f02-18b6-4a07-95b2-a79fbf
7a2a49/841cb997-d79d-4c6e-8712-9c977316e3b8/service.log

This is caused by the new MLflow 1.26.0 not being compatible with ZenML. I think @stefannica already fixed this in this PR: #595, so this also be fixed now that I merged develop.

All that said, however, there is a certain kind of more technical / competent user who will find this quickstart amazing and exactly what they wanted, so nice work on that! I think this is a great step forward for them.

Thanks so much for your amazing review @strickvl!

@strickvl
Copy link
Contributor

Very true, I was thinking about this too. I found it hard to show off the value that ZenML creates without making it somewhat complex though (IMO a lot of the value comes from our various integrations and how we abstract them via stacks). I think for type 1 (true beginners) ZenBytes is the place to start. I included a link to that at the end of the quickstart, but now that I think about it a lot of folks might drop out before ever seeing that if they feel overwhelmed during the quickstart already. How about we add some kind of disclaimer at the beginning that mentions that the target audience of this quickstart is type 2 (having basic ML/MLOps experimence) and link to ZenBytes there as well. Thanks for pointing this out @strickvl , this is a big issue IMO. Any other ideas how to address it?

I think the core thing is just going to be how we guide people through our various entry pages / entry points. The core ones are:

  • whatever happens when someone types zenml go
  • our main webpage (zenml.io)
  • our Github README
  • the docs

All of these should be united and uniform I think in offering two different paths: (1) for ultra beginners and (2) for people with experience who want to get a quick glimpse of the power of ZenML. Those two different paths seem to require pretty different experiences or guides, so we should just be explicit about that and cater to them separately.

We should maybe use the word 'beginner' somehow as a signpost for real beginners.

In any case, we should just be really explicit about which examples/notebooks are for you if you're new to things vs if you've been around the block a few times. (This intersects with my work on the README, btw, so will sync with you later in the day on that I guess so we have this standardised everywhere).

@fa9r
Copy link
Contributor Author

fa9r commented May 18, 2022

I ran the quickstart and noticed a few things:

  • The restart button works but then it takes a bit long to actually do the restart. You can even run the next few cells AFTER you execute the restart and it somehow still runs them making them appear as if the restart

Yeah this can be confusing when you just want to shift-enter through it. Already added a comment below the installs telling people to wait for the cell to finish and kernel to restart before proceeding. Not sure how to do this better, any ideas?

  • The FacetsVisualizer needs to run twice to get it working. This is a known bug but i would add it in the notebook somewhere

I haven't experienced that myself yet but I added a note to rerun the cell if nothing happens.

  • The DAG Visualizer within the notebook is a bit glitchy. Its zoomed out and doesnt allow for panning properly. Would be cool to make it a bit more palatable within the notebook and also give an option to make it work outside the notebook for non-colab users

Haven't experienced that myself either, I will rerun the quickstart myself again now that your branch and develop are merged. Let's see if this can be improved.

  • I have not run the actually code yet but shouldnt we use the scritps to run it on kubeflow? That would be a missing link the the story. What I would do is: Hey now you ran it on notebook, run it on a kubeflow stack with these commands and then either point them to the README where everything is written or have like a run_on_kubeflow.sh script that does the stack provisoning and running the pipeline

Great suggestion, would love to do that, but that will still be a lot of work. IMO this is also the missing link in ZenBytes before we deploy to the cloud. IMO this is the next piece of content we should write and then add it into the quickstart too!

Overall fantastic work! Really love it!

Glad to hear, thanks for the review 😃

@htahir1
Copy link
Contributor

htahir1 commented May 18, 2022

Yeah this can be confusing when you just want to shift-enter through it. Already added a comment below the installs telling people to wait for the cell to finish and kernel to restart before proceeding. Not sure how to do this better, any ideas?

Maybe a os.sleep(5) to even force people to wait? :D

I haven't experienced that myself yet but I added a note to rerun the cell if nothing happens.

Thanks!

Haven't experienced that myself either, I will rerun the quickstart myself again now that your branch and develop are merged. Let's see if this can be improved.

Yes lets do it. Happy to hear it works for you. @strickvl Did you also find the inline dag visualizer ok to work with?

Great suggestion, would love to do that, but that will still be a lot of work. IMO this is also the missing link in ZenBytes before we deploy to the cloud. IMO this is the next piece of content we should write and then add it into the quickstart too!

Is it a lot of work really? Would it not just be a few lines in the README and at the end of the notebook? We already have the code itself in zenbytes

@strickvl
Copy link
Contributor

@htahir1 I never got that far in the notebook since there was some bug which prevented me from running the pipelines at all.

@fa9r
Copy link
Contributor Author

fa9r commented May 18, 2022

Maybe a os.sleep(5) to even force people to wait? :D

Doesn't work unfortunately, still kills the subsequent cells

htahir1 and others added 2 commits May 18, 2022 15:06
…zenml-io/zenml into feature/CONTENT-166-finalize-quickstart
@htahir1 htahir1 merged commit d405139 into doc/hamza-misc-updates May 18, 2022
@htahir1 htahir1 deleted the feature/CONTENT-166-finalize-quickstart branch May 18, 2022 13:32
htahir1 added a commit that referenced this pull request May 18, 2022
* Getting there

* midway best practices

* Added more explain stuff

* Edited best practices

* checkpoint

* Checkpoint for quickstart

* Added notebook

* Added dash visualizer jupyter

* Refactored imports

* Added runner script

* Quickstart now follows best practices

* Quickstart formatted

* Updated zenml go

* Latest quickstart

* Adapted zenml go to new quickstart

* Added event for extending ZenML

* Linting fixes

* zenml go is now optout too

* Made an exception for pipeline yamls

* Added config

* Apply suggestions from code review

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>

* Deleting

* preserving user metadata

* Fromatted latest

* Formatting and linting

* Formatting and linting

* Typo

* One function down

* Update stakc comp done

* Deregister stack

* Create user

* Delete user

* Get user

* Added all events

* Added project

* Added facade for all base methods

* Deleted flavor event delted

* Wrapped track event to make sure deduplication doesnt happen

* Wrapped track event to make sure deduplication doesnt happen

* Registered stack refactored

* Flavor event is now properly done

* Update stack done

* Added import and export stack

* Added import and export stack

* Analytics events done

* Copy paste error

* Fixed mypy errors

* Toc updated

* Restructurung

* Restructure

* GitBook: [#2] No subject

* Fix/update `zenml go` (#606)

* Fix broken feature store example (#602)

* allow yaml files

* include yaml config

* Added success message (#603)

* Added success message

* Reformatted

* remove extra colon

* add extra spacing

* update zenml go behaviour

Co-authored-by: Alexej Penner <thealexejpenner@gmail.com>

* Analytics turned off in copy_active_configuration

* Docstring nits

* Formatting

* Added todo

* Revamp Quickstart (#604)

* WIP: rework/reformat quickstart.

* Finished quickstart.ipynb (untested).

* Improved docs/comments of quickstart.ipynb.

* Fix bugs in quickstart.ipynb. Everything except dash integration works.

* Update quickstart README and setup.py.

* Bug fix: allow Facets visualizer magic in Colab.

* Added screenshots for all visualizations to quickstart.ipynb.

* Added pipeline visualization to quickstart.ipynb.

* Added stack visualization to quickstart.ipynb.

* Wrote new quickstart run.py script / code modules.

* Fix grammar / spelling of quickstart.

* Update quickstart README: declare notebook execution as recommended and move run.py-specific set up instructions out of prerequisites.

* Adjust quickstart to reviewer feedback.

* Added magic flag

Co-authored-by: Hamza Tahir <htahir111@gmail.com>

* Changed best practices to misc and added resources section

* Finished landing page of resoruces

* Half finished best practices

* Added eventbrite

* Getting there

* FAQs added basically

* Fixed link;

* GitBook: [#3] No subject

* Apply suggestions from code review

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>

Co-authored-by: Alex Strick van Linschoten <strickvl@users.noreply.github.com>
Co-authored-by: Alexej Penner <thealexejpenner@gmail.com>
Co-authored-by: Felix Altenberger <felix@zenml.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request internal To filter out internal PRs and issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants