Run data transformer on other types of inputs #843

Open
saulshanabrook opened this issue May 16, 2018 · 10 comments

Comments

@saulshanabrook
Contributor

Right now, AFAIK, the data transformers are only run when a pandas dataframe is passed in: https://github.com/altair-viz/altair/blob/ee69e62c55847432178c2e52657b9cdcd98d8f43/altair/vegalite/v2/api.py#L26

However, I would like to be able to run a transformer when I pass in an ibis expression to a Chart. The transformer will take that ibis object and return a valid vega lite data dictionary.

The goal is for a user to easily compose visualizations from mapd expressions built with ibis, that are rendered with mapd's vega renderer.

If Altair called the data transformer when it gets an unrecognized class, then I could get this to work. Another solution would be to expose a registry where users could register special transformations associated with different classes. Then _prepare_data could use this registry.
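
A minimal sketch of the registry idea, assuming a plain dictionary keyed by class. The names `register_preparer`, `prepare_data`, and `FakeExpr` are hypothetical and are not part of Altair or ibis; `FakeExpr` only stands in for an ibis expression.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: maps a class to a function that turns an instance
# into a Vega-Lite data dictionary. None of these names exist in Altair.
_DATA_PREPARERS: Dict[type, Callable[[Any], dict]] = {}


def register_preparer(cls: type, func: Callable[[Any], dict]) -> None:
    """Associate a preparer function with a class."""
    _DATA_PREPARERS[cls] = func


def prepare_data(data: Any) -> Any:
    """Apply the preparer registered for the most specific class in the MRO."""
    for base in type(data).__mro__:
        if base in _DATA_PREPARERS:
            return _DATA_PREPARERS[base](data)
    return data  # unrecognized inputs pass through unchanged


class FakeExpr:
    """Stand-in for an ibis expression, used only for this illustration."""

    def __init__(self, sql: str) -> None:
        self.sql = sql


# Register a preparer that hands the SQL text on in place of a URL.
register_preparer(FakeExpr, lambda expr: {"url": expr.sql})

prepared = prepare_data(FakeExpr("SELECT * FROM flights"))
```

Walking the MRO means a preparer registered for a base class would also apply to its subclasses, which may or may not be the desired behavior.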

@saulshanabrook
Contributor Author

saulshanabrook commented May 16, 2018

I am actually gonna close this for now, since we might not need this tight of an integration. We can just feed in our SQL query as a string, which Altair will think of as a URL, and then we can grab that out in a renderer, to send to the mapd backend.
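
A rough sketch of that workaround, assuming the string passed as data ends up under `data.url` in the generated spec. The `extract_query` helper and the example spec are hypothetical; sending the query to the mapd backend is left out.

```python
def extract_query(spec: dict) -> str:
    """Hypothetical renderer helper: read the SQL text back out of data.url."""
    return spec["data"]["url"]


# Example spec in which the "URL" is really a SQL query string.
spec = {
    "data": {"url": "SELECT origin, delay FROM flights"},
    "mark": "point",
    "encoding": {
        "x": {"field": "origin", "type": "nominal"},
        "y": {"field": "delay", "type": "quantitative"},
    },
}

query = extract_query(spec)
```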

@jakevdp
Collaborator

jakevdp commented May 17, 2018

OK – let me know if a use-case for this comes up and we can think about how to make the plugin more configurable.

@saulshanabrook
Contributor Author

saulshanabrook commented May 22, 2018

I am reopening this because we would like to be able to pass in Ibis expressions to Chart and do a couple of different things. Either get some subset of the data into a pandas dataframe and plot that (data.limit(max_rows).execute()), which would work for most ibis backends, or keep the query as SQL and output it for the mapd backend to visualize.

I see a couple of options, starting with the simplest:

  1. Run data transformations on unknown classes (here).
  2. Change _prepare_data to be a single-dispatch function, so that users can import it and register their own data preparers.

I am happy to help implement either of these, or another idea you have.
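
Option 2 could look roughly like the sketch below, which uses `functools.singledispatch` from the standard library. The `prepare_data` function, `MAX_ROWS`, and the `FakeExpr` class are assumptions made for illustration; `FakeExpr` only mimics the `limit(...).execute()` shape mentioned above and is not the ibis API.

```python
from functools import singledispatch

MAX_ROWS = 2  # hypothetical row cap, chosen small for the example


@singledispatch
def prepare_data(data):
    """Fallback for types with no registered preparer: pass through."""
    return data


@prepare_data.register(str)
def _prepare_url(data):
    # Strings are assumed to be URLs.
    return {"url": data}


class FakeExpr:
    """Stand-in for an ibis expression; not the real ibis API."""

    def __init__(self, rows):
        self.rows = rows

    def limit(self, n):
        return FakeExpr(self.rows[:n])

    def execute(self):
        return list(self.rows)


# A user of the library registers a preparer for their own type.
@prepare_data.register(FakeExpr)
def _prepare_expr(expr):
    return {"values": expr.limit(MAX_ROWS).execute()}


result = prepare_data(FakeExpr([{"a": 1}, {"a": 2}, {"a": 3}]))
```

With this design, supporting a new input type needs only a new `register` call and no change to the library function itself.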

@jakevdp
Collaborator

jakevdp commented May 22, 2018

I think the best option is to run data transformers on all data inputs. We'd have to modify default data transformers in Altair to raise a warning about unknown data types before passing the value through unchanged.

This would not change the API at all, and then you could simply register & enable a new transformer that would work for whichever data source you wish.
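
As an illustration only, a default transformer with that behavior might look like the sketch below. `RECOGNIZED_TYPES` and `default_transformer` are hypothetical names, and the tuple of recognized types is an assumption; a real default would at least include pandas DataFrames.

```python
import warnings

# Assumption: types the default transformer knows about. A real default
# would also include pandas.DataFrame, omitted to keep this self-contained.
RECOGNIZED_TYPES = (str, dict)


def default_transformer(data):
    """Warn on unknown data types, then return the value unchanged."""
    if not isinstance(data, RECOGNIZED_TYPES):
        warnings.warn(
            f"Unrecognized data type {type(data).__name__}; "
            "passing it through unchanged.",
            UserWarning,
            stacklevel=2,
        )
    return data


# A string is treated as recognized, so no warning is emitted here.
passed = default_transformer("https://example.com/data.json")
```

A custom transformer for another data source would replace this default and handle its own type before falling back to the pass-through.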

@saulshanabrook
Contributor Author

That sounds good. I can start working on a PR.

@saulshanabrook
Contributor Author

> We'd have to modify default data transformers in Altair to raise a warning about unknown data types before passing the value through unchanged.

But then if you passed in a URL string, for example, it would give you a bunch of warnings by default.

@jakevdp
Collaborator

jakevdp commented May 23, 2018

I'd imagine that strings (assumed to be URLs) would be one of the "recognized" types.

@ellisonbg
Collaborator

I can imagine that it would be helpful to run the data transformation on all data types. One way we might want to design that is using multiple dispatch:

https://github.com/mrocklin/multipledispatch

Then it becomes much easier to write different combinations of transformers that have different implementations for different data types. That is much better than different data transformers each having a bunch of if/case logic switching on the types. Thoughts?

@saulshanabrook
Contributor Author

That makes sense. Could we use Python's built-in single dispatch? https://docs.python.org/3.6/library/functools.html#functools.singledispatch

@ellisonbg
Collaborator

ellisonbg commented Aug 18, 2018 via email

4 participants