Add visualization in notebooks / Colab #267

Open
lucaventurini opened this issue Apr 3, 2019 · 11 comments

6 participants
@lucaventurini

commented Apr 3, 2019

Hi,

I managed to explore all the basic features of this great tool within the GPU-enabled notebooks provided by Google Colab and Kaggle. I was also able to run and view TensorBoard.

The only thing I missed is the visualization, as I couldn't find a way to plot from the notebook.
Is there a way to call the CLI functions from Python and have them return a matplotlib plot? That way the plots could be displayed with %matplotlib inline, or with %matplotlib notebook when we need to zoom or interact.

@w4nderlust

Collaborator

commented Apr 3, 2019

@lucaventurini we are planning to add the visualization function to the programmatic api, so you will be able to do that soon.

@skim2257

commented Apr 5, 2019

@lucaventurini you can use !command to execute bash commands in Google Colab / Jupyter notebooks

e.g.: !ludwig visualize -v learning_curves -ts training_statistics.json

@w4nderlust

Collaborator

commented Apr 5, 2019

Unfortunately that doesn’t solve the issue: the images generated by pyplot will not be visible in a notebook.

@dsblank

Contributor

commented Apr 8, 2019

There may be a way to run it easily in the notebook and still see the visualizations. Here is a hack that could probably be wrapped up nicely to make it more convenient. I am also imagining a --jupyter flag that could be added in ludwig.contribs:

In a cell:

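import sys
import ludwig.cli  # import the submodule explicitly; a bare `import ludwig` may not load it

# Emulate the command line: argv[0] is the program name, the rest are the usual CLI arguments.
sys.argv = "ludwig experiment --data_csv reuters-allcats.csv --model_definition_file model_definition.yaml".split(" ")
ludwig.cli.main()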

Now you are running in the notebook, with all of the normal Ludwig CLI flags. Maybe this could be a Jupyter magic.
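One way the sys.argv hack above might be wrapped up is a small helper that swaps sys.argv only for the duration of the call and restores it afterwards, so the rest of the notebook is unaffected. This is a minimal sketch, not part of Ludwig: patched_argv and run_cli are hypothetical names, and the only assumption is that the CLI entry point is a zero-argument callable that reads sys.argv.

```python
import shlex
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(command):
    """Temporarily replace sys.argv with the tokens of `command`."""
    original = sys.argv
    sys.argv = shlex.split(command)  # shlex handles quoted arguments correctly
    try:
        yield
    finally:
        sys.argv = original  # always restore, even if the CLI raises


def run_cli(main, command):
    """Call a zero-argument CLI entry point as if `command` had been typed in a shell."""
    with patched_argv(command):
        return main()
```

In a notebook this could be used as run_cli(ludwig.cli.main, "ludwig experiment --data_csv reuters-allcats.csv --model_definition_file model_definition.yaml"). Note that argparse-based entry points call sys.exit on errors or --help, which surfaces as a SystemExit exception in the cell.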

@floscha

Contributor

commented Apr 24, 2019

Ludwig's visualization functionality can already be used through its programmatic API, although it is neither intuitive nor convenient. Assuming you have finished training a model, stored the results under results/experiment_run_1, and enabled inline plotting with %matplotlib inline, the following few lines of code will do the trick:

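import ludwig.visualize  # load the visualize submodule explicitly

training_statistics = ['results/experiment_run_1/training_statistics.json']
field = []  # passed as an empty list, since the call expects a field argument
model_names = ['results/experiment_run_1/model/']
ludwig.visualize.learning_curves(training_statistics, field, model_names)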

In particular, having to wrap the paths in lists and to pass the seemingly unnecessary field parameter as an empty list leaves some room for API improvement 😉
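Until the API changes, the list wrapping can be hidden behind a tiny helper. The sketch below is hypothetical (learning_curve_args is not a Ludwig function) and assumes each run directory follows the layout used above, with training_statistics.json and a model subdirectory inside it.

```python
from pathlib import Path


def learning_curve_args(run_dirs, field=None):
    """Build the list-style arguments used in the call above.

    `run_dirs` may be a single run directory or a list of them.
    Assumes each run directory contains training_statistics.json and model/.
    """
    if isinstance(run_dirs, (str, Path)):
        run_dirs = [run_dirs]
    runs = [Path(d) for d in run_dirs]
    training_statistics = [(r / "training_statistics.json").as_posix() for r in runs]
    model_names = [(r / "model").as_posix() for r in runs]
    # Mirror the call above: an empty list when no field is given.
    return training_statistics, (field if field is not None else []), model_names
```

The call then shrinks to ludwig.visualize.learning_curves(*learning_curve_args('results/experiment_run_1')).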

@w4nderlust

Collaborator

commented Apr 24, 2019

@floscha there's a reason why those functions are not provided in the api module: they are not ready for prime time. In any case, there is a reason for those parameters to be lists, and there is a reason for having a field parameter (yes, it is used in the code and it is useful). Since they can be improved, concrete help and contributions would be much more appreciated ;)

@floscha

Contributor

commented Apr 25, 2019

Hey @w4nderlust, I understand that the visualize module is made for internal usage and is definitely not useless for that matter 😉 However, there appears to be a considerable number of people for whom the current way of plotting programmatically seems rather obscure, especially since its usage differs quite a bit from the CLI commands, which do not require lists or an explicit field value.

I would be very glad to help with Ludwig's visualization in notebooks. Since you mentioned that you are planning to add visualization to the programmatic API, are there any ideas yet what that API should look like?

@w4nderlust

Collaborator

commented Apr 25, 2019

@floscha the usage doesn't differ at all from the CLI: the functions are exactly the same and the arguments are exactly the same.

The work that needs to be done is the following: each function of the visualize module performs file loading, data manipulation and visualization of the manipulated data. At the moment the plotting functions are already separate (utils/visualization_utils.py), but the functions in the visualize module still do both loading and data manipulation. Those two operations need to be separated, so that one could programmatically call only the function that performs the logic on data that has already been loaded.

As an example, the learning_curves function would need to get as input, instead of a list of files, just the data columns needed, and an outer function should load files and call this inner function as many times as needed. This will make for a clean api.
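A rough sketch of that split, for illustration only: none of the function names below are Ludwig's, and the statistics layout {split: {field: {metric: [value per epoch]}}} is an assumption made for the example, not necessarily the real format of training_statistics.json.

```python
import json


def load_training_statistics(path):
    """Outer step: read one statistics file from disk."""
    with open(path) as f:
        return json.load(f)


def extract_curves(statistics, field, metric="loss"):
    """Outer step: pick the data columns for one field and metric.

    Assumes the hypothetical layout {split: {field: {metric: [values]}}}.
    """
    return {
        split: per_field[field][metric]
        for split, per_field in statistics.items()
        if field in per_field and metric in per_field[field]
    }


def plot_learning_curves(curves, title="", ax=None):
    """Inner function: draw already-loaded curves; no file access happens here."""
    import matplotlib.pyplot as plt  # lazy import: only plotting needs matplotlib

    if ax is None:
        _, ax = plt.subplots()
    for label, values in curves.items():
        ax.plot(range(1, len(values) + 1), values, label=label)
    ax.set_xlabel("epoch")
    ax.set_title(title)
    ax.legend()
    return ax


def learning_curves_from_files(paths, field, metric="loss"):
    """Outer function: load each file, then call the inner function once per file."""
    return [
        plot_learning_curves(
            extract_curves(load_training_statistics(p), field, metric), title=str(p)
        )
        for p in paths
    ]
```

With this shape, a notebook user who already holds the curves in memory can call plot_learning_curves directly, while the CLI keeps going through learning_curves_from_files.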

@prakass1

commented May 30, 2019

One solution I found recently is the following code:

from ludwig import visualize
visualize.learning_curves(['gDrive/Ludwig/results/api_experiment_run_1/training_statistics.json'],None)

This renders the plot in the output cell of a Google Colab notebook. One drawback is that you have to read the functions defined in visualize to work out how to call them; the actual plotting lives in utils/visualization_utils.py, as @w4nderlust already mentioned above.

The screenshot to confirm:

image

@w4nderlust

Collaborator

commented May 30, 2019

Thanks for your note @prakass1. We are currently working on adding an option to save images (there is a branch for that, to be merged soon) and on redefining the signatures of the functions in visualize, so that they can be used not only by passing files like you did, but also directly with dataframes/numpy arrays obtained through prediction. Stay tuned.

@prakass1

commented May 30, 2019

@w4nderlust I am new to the tool. I am looking forward to the upcoming changes and will probably suggest some enhancements :)

Thank you.
