Support extracting transformed chart data using VegaFusion #3081

Merged

jonmmease merged 23 commits into master from jonmmease/transformed_data

Jun 14, 2023

Contributor

jonmmease commented Jun 9, 2023 •

edited by binste

Overview

As discussed in #3054, this PR ports the transformed_data logic from VegaFusion into Altair. See https://vegafusion.io/transformed_data.html for some background on VegaFusion's transformed_data function.

A standalone transformed_data utility function is added to altair/utils/transformed_data.py, and a chart._transformed_data method is added to Chart, FacetChart, LayerChart, ConcatChart, HConcatChart, and VConcatChart (but not RepeatChart, as I'm not sure how to do this yet). I added a leading underscore to the chart method so that we can use it for internal testing (and documentation) before making it public, but I'm happy to reconsider that decision.

Example

import altair as alt
from vega_datasets import data

source = data.movies.url

chart = alt.Chart(source).mark_bar().encode(
    alt.X("aggregate_gross:Q").aggregate("mean").title(None),
    alt.Y("ranked_director:N")
        .sort(op="mean", field="aggregate_gross", order="descending")
        .title(None)
).transform_aggregate(
    aggregate_gross='mean(Worldwide_Gross)',
    groupby=["Director"],
).transform_window(
    rank='row_number()',
    sort=[alt.SortField("aggregate_gross", order="descending")],
).transform_calculate(
    ranked_director="datum.rank < 10 ? datum.Director : 'All Others'"
).properties(
    title="Top Directors by Average Worldwide Gross",
)
chart

chart._transformed_data()

|    | ranked_director   |   mean_aggregate_gross |
|---:|:------------------|-----------------------:|
|  0 | All Others        |            8.87602e+07 |
|  1 | James Cameron     |            8.29781e+08 |
|  2 | George Lucas      |            6.73577e+08 |
|  3 | Peter Jackson     |            5.95566e+08 |
|  4 | Andrew Stanton    |            7.00319e+08 |
|  5 | David Yates       |            9.37984e+08 |
|  6 | Carlos Saldanha   |            7.69293e+08 |
|  7 | Andrew Adamson    |            6.43134e+08 |
|  8 | David Slade       |            6.88155e+08 |
|  9 | Pete Docter       |            7.31305e+08 |

Testing

I added tests for the expected transformed data length and (a subset of) the columns for all of the example charts that are currently supported.

Limitations

VegaFusion doesn't support every Vega transform, and it doesn't support every option for all of the transforms it does support. When an unsupported transform is required, VegaFusion raises an, admittedly not very helpful, error like this:

ValueError: Pre-transform error: Requested variable (Variable { name: "data_2", namespace: Data }, [])
 requires transforms or signal expressions that are not yet supported

Tracking improvements to this error message in vega/vegafusion#336. Tracking improvements to the documentation of supported transforms in vega/vegafusion#337.
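Until the upstream message improves, calling code can catch this error and re-raise it with more context. The following is a minimal sketch, assuming the error remains a ValueError whose message starts with "Pre-transform error" as in the output above. Both safe_transformed_data and fake_chart_call are hypothetical helpers for illustration, not part of Altair or VegaFusion.

```python
from typing import Any, Callable


def safe_transformed_data(get_data: Callable[[], Any]) -> Any:
    """Call get_data() and add context to VegaFusion's pre-transform error."""
    try:
        return get_data()
    except ValueError as err:
        # Assumption: unsupported transforms surface as a ValueError whose
        # message starts with "Pre-transform error", as shown above.
        if str(err).startswith("Pre-transform error"):
            raise ValueError(
                "The chart uses a transform (or transform option) that "
                "VegaFusion does not support yet. Original error: " + str(err)
            ) from err
        # Any other ValueError is passed through unchanged.
        raise


def fake_chart_call() -> Any:
    # Stand-in for chart._transformed_data so the sketch runs without VegaFusion.
    raise ValueError(
        'Pre-transform error: Requested variable (Variable { name: "data_2", '
        "namespace: Data }, []) requires transforms or signal expressions "
        "that are not yet supported"
    )
```

With a real chart, the call would look like safe_transformed_data(chart._transformed_data).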

jonmmease added 8 commits

June 7, 2023 09:55


          Port transformed_data functionality from VegaFusion

67b44da


          Add initial transformed_data tests


          skip black formatting for pytest.mark.parametrize

3ae6c7d


          Test exclude flag to transformed_data

eed32e9


          chart.transformed_data -> chart._transformed_data

6f61bad

Make method internal while still experimental


          Add VegaFusion as dev dependency

2be1f64


          Add better error message when VegaFusion is not installed

07c5a00


          Merge remote-tracking branch 'origin/master' into jonmmease/transform…

4360cf8

…ed_data

jonmmease requested review from mattijn and binste

June 9, 2023 13:07

jonmmease commented

altair/utils/transformed_data.py Outdated

+                  # Compile to Vega and extract inline DataFrames
+                  with data_transformers.enable("vegafusion-inline"):
+                      vega_spec = chart.to_dict(format="vega")
+                      inline_datasets = get_inline_datasets_for_spec(vega_spec)

Contributor Author

jonmmease Jun 9, 2023

I'd like to bring the vegafusion-inline transformer and the get_inline_datasets_for_spec function over to Altair eventually, but will save that for a future PR.

Contributor

binste Jun 10, 2023

Just for my own understanding, does the vegafusion-inline transformer use the entrypoint system to register itself or why is it available here?

Contributor Author

jonmmease Jun 10, 2023

Yes, it's defined and registered in vegafusion during import: https://github.com/hex-inc/vegafusion/blob/ae2040ebf93b1a69328be9df2ec831bd8fbb6ff4/python/vegafusion/vegafusion/transformer.py#L256
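The register-on-import pattern described here can be sketched with a toy registry. This is only an illustration of the idea and assumes nothing about Altair's or VegaFusion's internals. TransformerRegistry and the "inline-demo" name are hypothetical.

```python
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class TransformerRegistry:
    """Toy plugin registry: plugins register by name, callers enable by name."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[list], list]] = {}
        self.active = "default"
        self.register("default", lambda rows: rows)

    def register(self, name: str, func: Callable[[list], list]) -> None:
        self._plugins[name] = func

    @contextmanager
    def enable(self, name: str) -> Iterator[None]:
        if name not in self._plugins:
            raise KeyError(f"No transformer registered under {name!r}")
        previous, self.active = self.active, name
        try:
            yield
        finally:
            # Restore the previously active transformer on exit.
            self.active = previous

    def get(self) -> Callable[[list], list]:
        return self._plugins[self.active]


data_transformers = TransformerRegistry()

# A third-party package could run a line like this at import time, which
# makes the name available as soon as the package has been imported.
data_transformers.register("inline-demo", lambda rows: [dict(r) for r in rows])
```

Because enable() is a context manager, the previously active transformer is restored even if the body raises.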

jonmmease commented

altair/utils/transformed_data.py Outdated

+                  ...     ]
+                  ... }
+                  >>> get_group_mark_for_scope(spec, (1,))
+                  {'type': 'group', 'marks': [{'type': 'rect'}]}

Contributor Author

jonmmease Jun 9, 2023

Does hatch run test automatically run doctests?

Contributor

mattijn Jun 9, 2023

hatch run test runs the following:

black --diff --color --check .
ruff check .
mypy altair tests
python -m pytest --pyargs --doctest-modules tests

See https://github.com/altair-viz/altair/blob/master/pyproject.toml#L104-L107

binste requested changes

Contributor

binste left a comment

Thanks @jonmmease, this is going to be a great feature! I don't have a preference if we keep it as ._transformed_data for now until we have some documentation as well.

altair/utils/transformed_data.py Outdated

Comment on lines 58 to 60

+                  DataFrame or list of DataFrame
+                      If input chart is a Chart or Facet Chart, returns a DataFrame of the transformed data
+                      Otherwise, returns a list of DataFrames of the transformed data

Contributor

binste Jun 10, 2023

Suggested change

      
-                  DataFrame or list of DataFrame
-                      If input chart is a Chart or Facet Chart, returns a DataFrame of the transformed data
-                      Otherwise, returns a list of DataFrames of the transformed data
+                  Pandas DataFrame, list of Pandas DataFrames or None
+                      If input chart is a Chart or Facet Chart, returns a Pandas DataFrame of the transformed data if it exists and None if the chart does not have a dataset
+                      Otherwise, returns a list of Pandas DataFrames of the transformed data

Contributor Author

jonmmease Jun 10, 2023

Done in 75cf958.

As a side note, pandas is supposed to always be lower case, even at the start of a sentence. See https://pandas.pydata.org/about/citing.html.

Contributor

binste Jun 10, 2023

Good to know, thanks!

altair/utils/transformed_data.py Outdated

Contributor

binste Jun 10, 2023

What do you think about making this module "internal/private" by renaming it to _transformed_data.py? Almost everything in Altair so far is public which makes further development difficult in some areas (and bloats up the top-level API and the "API reference" section in the documentation, see #2918, although that would not be the case here). I'd be in favour that we are more selective moving forward with new features and mark modules or functions as private if we don't expect a user to access them directly. Users should use the Chart.transformed_data method anyway and it would make it easier to iterate on this once you start integrating more of the VegaFusion functionality.

altair/utils/transformed_data.py Outdated

+              from altair.utils.schemapi import Undefined
+              Scope = Tuple[int, ...]
+              FacetMapping = Dict[Tuple[str, Scope], Tuple[str, Scope]]

Contributor

binste Jun 10, 2023

Appreciate that you're adding type hints! :)

altair/vegalite/v5/api.py Outdated

+                      DataFrame
+                          Transformed data as a DataFrame
+                      """
+                      from ...utils.transformed_data import transformed_data

Contributor

binste Jun 10, 2023

The existing codebase uses relative imports in many places so this would be consistent but I'm in favour of switching to absolute ones. They are easier to read, especially as Altair has many modules with the same names on different levels.

tests/test_transformed_data.py

+                  source = pkgutil.get_data(examples_methods_syntax.__name__, filename)
+                  chart = eval_block(source)
+                  df = chart._transformed_data()
+                  assert len(df) == rows

Contributor

binste Jun 10, 2023

Do you think it makes sense to also check that this dataframe has no nulls? assert df.notnull().all().all()

Contributor Author

jonmmease Jun 10, 2023

I don't think so. When the input DataFrame has nulls, it's possible for these to be passed through to the transformed data. Vega-Lite usually filters null values for the columns that are used in the chart, but transformed_data returns all of the columns, so the unused columns can still have nulls.

jonmmease and others added 6 commits

June 10, 2023 12:09


          Move import

f0b26ea

Co-authored-by: Stefan Binder <binder_stefan@outlook.com>


          move import

b48f8d3

Co-authored-by: Stefan Binder <binder_stefan@outlook.com>


          Docstring update

75cf958


          Make utils.transformed_data internal, use absolute imports

a46ce1b


          Reword docstring

48f802c

Co-authored-by: Stefan Binder <binder_stefan@outlook.com>


          Merge branch 'jonmmease/transformed_data' of github.com:altair-viz/al…

dfa18bc

…tair into jonmmease/transformed_data

Contributor Author

jonmmease commented Jun 10, 2023

Thanks for the review @binste! I think I've addressed all of your feedback.

binste approved these changes

Contributor

binste left a comment

Indeed, thanks! :)

mattijn reviewed

Contributor

mattijn left a comment

Thanks @jonmmease! I've placed a few more questions inline.

altair/utils/_transformed_data.py Outdated

+              Scope = Tuple[int, ...]
+              FacetMapping = Dict[Tuple[str, Scope], Tuple[str, Scope]]
+              MAGIC_CHART_NAME = "_vf_mark{}"

Contributor

mattijn Jun 10, 2023

I'm not so fond of the word magic. Can we describe what it actually is? And replace all places where the word is used?

Is this the name of each view? I think we name it elsewhere view+_+<incrementing value>. Maybe align? Or, if different, add a trailing underscore before the incrementing number.

Or is this the reference name of the dataset within a view?

Contributor Author

jonmmease Jun 10, 2023

Yeah, happy to rename this. This is the name that is applied to each unnamed Chart/Subchart/View.

Changed to:

VIEW_NAME = "altair_view_{}"
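How such a template could be applied to unnamed views can be sketched on a plain nested dict. This is an illustrative sketch only: name_views is a hypothetical helper, the traversal keys and the choice to name container views are assumptions, and name collisions with user-supplied names are not handled.

```python
import itertools
from typing import Any, Dict, Iterator, List, Optional

VIEW_NAME = "altair_view_{}"


def name_views(
    spec: Dict[str, Any], counter: Optional[Iterator[int]] = None
) -> List[str]:
    """Give every unnamed view in a nested spec a unique name (in place)."""
    if counter is None:
        counter = itertools.count(1)
    if "name" not in spec:
        spec["name"] = VIEW_NAME.format(next(counter))
    names = [spec["name"]]
    # Recurse into the composition keys used by layered and concatenated charts.
    for key in ("layer", "concat", "hconcat", "vconcat"):
        for child in spec.get(key, []):
            names.extend(name_views(child, counter))
    return names
```

Views that already carry a name keep it, so only the unnamed ones consume a counter value.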

Contributor

mattijn Jun 10, 2023

One question, if this also defines a view name, could we also use _get_name()? Or will you run into conflicts then?
_get_name() is defined on the Chart class, see here: https://github.com/altair-viz/altair/blob/master/altair/vegalite/v5/api.py#L2569-L2571.

Contributor Author

jonmmease Jun 10, 2023

I wasn't familiar with _get_name(). Yeah, I think this would work. I'll give it a try.

Contributor Author

jonmmease Jun 10, 2023

Done in 88fceb5

altair/utils/_transformed_data.py Outdated

+                      ) from err
+                  if isinstance(chart, Chart):
+                      # Add dummy mark if None specified to satisfy Vega-Lite

Contributor

mattijn Jun 10, 2023

Rename this to "template mark" if this section is really necessary, but I was wondering whether we need it at all. Shouldn't this raise a warning or error instead?

Contributor Author

jonmmease Jun 10, 2023

Reworded comment in aabf5d6.

The reason to do this is so that it's possible to call chart.transformed_data() on a chart with transforms even if no mark has been defined. For example:

import pandas as pd
chart = alt.Chart(
    pd.DataFrame({"a": [1, 2, 3], "b": ["A", "BB", "CCC"]})
)
chart.transform_filter("datum.a > 1")._transformed_data()

	a	b
0	2	BB
1	3	CCC

Not sure if this will be a common use case, but I think it's neat to be able to use an Altair chart kind of like a DataFrame this way.
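The idea of filling in a mark when none is defined can be sketched on a plain Vega-Lite dict. This is a minimal sketch under stated assumptions: with_placeholder_mark is a hypothetical helper, and "point" is an arbitrary choice of mark, not necessarily the one used in this PR.

```python
import copy
from typing import Any, Dict


def with_placeholder_mark(vl_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a unit spec that is guaranteed to have a mark."""
    spec = copy.deepcopy(vl_spec)
    # Assumption: any valid mark type works here, because only the
    # transformed data is read back and the rendering is discarded.
    spec.setdefault("mark", "point")
    return spec
```

Working on a copy keeps the user's chart specification untouched, so the placeholder never leaks into the chart they later render.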

Contributor

mattijn Jun 10, 2023

This is not a chart anymore😉! Thinking aloud, without judging: I always thought it was good that dataframe libraries make a clear distinction between dataframe operations and visualizations. With the powerful options introduced in this PR, altair is slowly turning into a dataframe library as well.

Contributor Author

jonmmease Jun 10, 2023

One idea I've had is that Altair could provide an alt.Dataset object that supports the .transform_* methods and the .mark_* methods. The .transform_* methods would return a new Dataset and the .mark_* methods would return a Chart.

This is somewhat similar to how HoloViews works (https://holoviews.org/getting_started/Tabular_Datasets.html)

Contributor

mattijn Jun 10, 2023 •

edited

I like this concept. I don't think it is possible to run transforms in Vega-Lite without defining a mark.

Naming is also an issue. alt.Data and alt.Dataset already exist. Maybe we can extend these? Worth raising a follow-up issue on this!

Contributor Author

jonmmease Jun 10, 2023

I don't think it is possible to run transforms in Vega-Lite without defining a mark

Yes, this is why I add a mark when none is defined before calling vega-lite through vl-convert.

Naming is also an issue. alt.Data and alt.Dataset already exist

Or maybe just alt.DataFrame? I'll open a follow-up issue.

altair/utils/_transformed_data.py

+                          subcharts = chart.vconcat
+                      elif isinstance(chart, ConcatChart):
+                          subcharts = chart.concat
+                      else:

Contributor

mattijn Jun 10, 2023

Should we make a special raise for RepeatChart? Or is this not applicable here? It may feel like an omission.

Contributor Author

jonmmease Jun 10, 2023

Right now, RepeatChart doesn't have a _transformed_data() method, so it won't get this far. We could add a _transformed_data() method to RepeatChart that always raises an exception saying that it's not supported. But this might also be confusing (why have the method at all if it never works). What do you think?

Contributor

mattijn Jun 10, 2023

Just wondering, why can it never work? A NotImplementedError seems OK to me for now. It is also confusing if RepeatChart is the only Chart without a transformed_data() method.

Contributor Author

jonmmease Jun 10, 2023

Yeah, a NotImplementedError sounds like a good approach. I'll add that. I think it's probably possible to support RepeatChart in the future; I just haven't worked through quite what it means, and how to process the structure of the Vega that Vega-Lite produces in this case.

Contributor Author

jonmmease Jun 10, 2023

Done in a738408
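The approach agreed on in this thread can be sketched as follows. RepeatChartSketch is a hypothetical stand-in class, and the error message is illustrative rather than the wording used in the commit.

```python
class RepeatChartSketch:
    """Stand-in for RepeatChart; not Altair's actual class."""

    def _transformed_data(self, row_limit=None, exclude=None):
        # Keep the method so every chart type exposes the same interface,
        # but fail loudly until repeat charts are supported.
        raise NotImplementedError(
            "Transformed data for RepeatChart is not yet implemented"
        )
```

Raising NotImplementedError instead of omitting the method gives users an explicit message rather than an AttributeError.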

Contributor

binste Jun 11, 2023

Now that all top-level charts have the ._transformed_data method, you could add it to TopLevelMixin so there is no need anymore to repeat it.

Contributor Author

jonmmease Jun 11, 2023

I'm not sure. The return type in the type signature is different between Chart/FacetChart (where it's Optional[DataFrameLike]) and the others (where it's List[DataFrameLike]).

For the standalone utility transformed_data function I used typing.overload to express the two signatures. But I wasn't sure how to accomplish this in a superclass method when the type signature depends on the type of self. Do you have any ideas on that?

Contributor

binste Jun 11, 2023

Good point. Just tried and got this far with defining _transformed_data on TopLevelMixin:

    @overload
    def _transformed_data(
        self: Union["Chart", "FacetChart", "RepeatChart"],
        row_limit: Optional[int] = ...,
        exclude: Optional[Iterable[str]] = ...,
    ) -> Optional[DataFrameLike]:
        ...

    @overload
    def _transformed_data(
        self: Union["LayerChart", "ConcatChart", "HConcatChart", "VConcatChart"],
        row_limit: Optional[int] = ...,
        exclude: Optional[Iterable[str]] = ...,
    ) -> List[DataFrameLike]:
        ...

    def _transformed_data(
        self,
        row_limit=None,
        exclude=None,
    ):
        ...

But then mypy complains, rightly so, that these types are not superclasses of TopLevelMixin:

altair/vegalite/v5/api.py:2407: error: The erased type of self "Union[altair.vegalite.v5.api.Chart, altair.vegalite.v5.api.FacetChart, altair.vegalite.v5.api.RepeatChart]" is not a supertype of its class "altair.vegalite.v5.api.TopLevelMixin"  [misc]
altair/vegalite/v5/api.py:2415: error: The erased type of self "Union[altair.vegalite.v5.api.LayerChart, altair.vegalite.v5.api.ConcatChart, altair.vegalite.v5.api.HConcatChart, altair.vegalite.v5.api.VConcatChart]" is not a supertype of its class "altair.vegalite.v5.api.TopLevelMixin"  [misc]

According to this GH issue and some others, the solution might be to create protocols which identify the individual chart types, but that seems more complicated than just keeping it as it is now, so I'm OK with leaving it.
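For contrast, the overload approach on a standalone function, as mentioned earlier in this thread, avoids the self-type problem because the overloads vary on an ordinary parameter. The sketch below uses empty stand-in classes and a placeholder DataFrameLike alias; it shows the shape of the signatures only and is not the code in altair/utils/_transformed_data.py.

```python
from typing import Any, Iterable, List, Optional, Union, overload


class Chart: ...
class FacetChart: ...
class LayerChart: ...
class ConcatChart: ...


DataFrameLike = Any  # placeholder for the protocol discussed in this PR


@overload
def transformed_data(
    chart: Union[Chart, FacetChart],
    row_limit: Optional[int] = ...,
    exclude: Optional[Iterable[str]] = ...,
) -> Optional[DataFrameLike]: ...


@overload
def transformed_data(
    chart: Union[LayerChart, ConcatChart],
    row_limit: Optional[int] = ...,
    exclude: Optional[Iterable[str]] = ...,
) -> List[DataFrameLike]: ...


def transformed_data(chart, row_limit=None, exclude=None):
    # Dummy body: only the shape of the return value matters for this sketch.
    if isinstance(chart, (Chart, FacetChart)):
        return None
    return []
```

A type checker picks the return type from the static type of the chart argument, so callers of the single-view form get Optional[DataFrameLike] and callers of the multi-view form get a list.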

altair/vegalite/v5/api.py Outdated

+                      self,
+                      row_limit: Optional[int] = None,
+                      exclude: Optional[Iterable[str]] = None,
+                  ) -> List[pd.DataFrame]:

Contributor

mattijn Jun 10, 2023

Output is always a list of pandas dataframes? Do we want this to eventually support other dataframe-like objects as well?

Contributor Author

jonmmease Jun 10, 2023

Right now in VegaFusion, these results can actually be Arrow tables (if the input Chart(s) wrap Arrow tables), Polars DataFrames (if the input Chart(s) wrap Polars DataFrames), or pandas DataFrames (all other cases).

It seemed to me that this would make for a pretty awkward type signature, so I left it as just pandas. It could be a Union, but arrow and polars are optional dependencies so I'm not sure how to have the Union include different options based on what's installed. Do you have any ideas here @mattijn or @binste?

Contributor

mattijn Jun 10, 2023

Can we define a DataFrameLike type (elsewhere in the codebase) that can refer to any object with a .__dataframe__ attribute, regardless of whether it's a pandas DataFrame, polars DataFrame, or any other object? And use that type here? Would that help?

Contributor Author

jonmmease Jun 10, 2023

That's a good idea, I'll look into that.

Contributor Author

jonmmease Jun 10, 2023

Done in 6f43d6b
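A protocol of the kind suggested above could look roughly like this. It is a sketch, not the definition added in the commit: the runtime_checkable decorator is included only so the example can be checked with isinstance, and FakeFrame is a hypothetical stand-in for a real dataframe object.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DataFrameLike(Protocol):
    """Any object that exposes a __dataframe__ method."""

    def __dataframe__(self, *args: Any, **kwargs: Any) -> Any: ...


class FakeFrame:
    # Hypothetical stand-in for a pandas or Polars DataFrame.
    def __dataframe__(self, nan_as_null: bool = False, allow_copy: bool = True):
        return self
```

Because protocols are structural, FakeFrame satisfies DataFrameLike without inheriting from it, which is what lets optional dependencies such as Polars or PyArrow stay out of the type signature.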

Contributor

mattijn Jun 10, 2023

Can you double check this @binste? Should a type definition come back in the root of altair (see changes in __init__)?

Contributor

binste Jun 11, 2023 •

edited

I'm in favor of renaming it to _DataFrameLike so no one starts using it themselves and it won't show up in __init__. I'm doing the same in #2976. Gives us the flexibility to gather these types at a later stage into a types.py module if that makes sense or move them around/redefine otherwise.

Apart from that I like the protocol usage. We can narrow down the type further once we figure out how to best do that, but at least it's better than Any while being correct. At a later stage (not in this PR), we can try to use generics to define the return type of transformed_data based on the type of the .data attribute.

Contributor

binste Jun 11, 2023

Could you also move the Protocol import into an if sys.version_info >= (3, 8): statement? It was introduced in Python 3.8 and we try to import from the official library as soon as it's available.
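The version-gated import requested here typically has the following shape. This is a sketch that assumes the typing_extensions backport is the fallback on older interpreters; SupportsDataFrame is a hypothetical name used only to show the import in use.

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Protocol
else:
    # Assumption: the typing_extensions backport is installed on older Pythons.
    from typing_extensions import Protocol


class SupportsDataFrame(Protocol):
    def __dataframe__(self, *args, **kwargs): ...
```

Type checkers understand sys.version_info comparisons, so each branch is checked against the matching Python version.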

Contributor Author

jonmmease Jun 11, 2023

Yes, I'll move the protocol.

I might not have the right understanding here, but since DataFrameLike is in the return type of the public method, doesn't the type need to be public as well? Otherwise, Altair users wouldn't be able to work with it in their own typed code.

jonmmease added 7 commits

June 10, 2023 13:39


          Remove magic, use "view" instead of chart or mark

280eb0f


          Reword

aabf5d6


          Remove incorrect comment

16250fd


          black

8ab1dce


          Use DataFrameLike protocol for the transformed_data signature

6f43d6b

The returned DataFrames can be arrow or Polars as well if those are used as the input to the Chart.


          Add NotImplementedError for RepeatChart

a738408


          Use Chart._get_name to name subcharts

88fceb5

mattijn approved these changes

Contributor

mattijn commented Jun 10, 2023 •

edited

One comment. Naming is hard, but I’m not sure if transformed_data covers all. I think this function also will return data if there are no transforms defined in the specification. Maybe just name it dataframes() or something similar?

Contributor Author

jonmmease commented Jun 10, 2023

One comment. Naming is hard, but I’m not sure if transformed_data covers all. I think this function also will return data if there are no transforms defined in the specification. Maybe just name it dataframes() or something similar?

I chose .transformed_data() to mirror the .data property of a chart. It is true that this method will return data even if there are no transforms, but I think it's appropriate for a chart with no transforms to return the original dataset. @binste do you have any thoughts on the naming here?

Contributor

mattijn commented Jun 11, 2023

What about to_df() or to_data(). Limitations (that I'm willing to accept):

to_df() is maybe too narrow once we support array data too?
to_data() is maybe too wide, how to know upfront what will you get?

jonmmease mentioned this pull request

Altair DataFrame class #3083

Open

1 task

Contributor Author

jonmmease commented Jun 11, 2023

What about to_df() or to_data()

I'm not a fan of .to_data(), since we already have a .data property on the Chart (I think it would be confusing why they are both there, and why they mean different things). .to_df() is interesting, though it would feel odd to me if we eventually added an alt.DataFrame class as suggested in #3083.

If we think of the transforms on a chart as being analogous to a lazy data frame, we could consider names like .collect() (used by Polars, but maybe not a great idea since Vega has a collect transform), .compute() (used by Dask), or .evaluate() or .eval().

I like .eval() since it's natural to say we're "evaluating the transforms" in the chart, and it's nice and short.

Contributor

binste commented Jun 11, 2023

Agree, naming is hard... If GitHub Copilot could solve this I would be very happy ;) Two points from my side:

For me, something like to_data does not convey that it's not just the original data but the one after all transformations have been applied. transformed_data works for me even if no transforms are defined as it still technically is the "data with all transformations applied" but maybe I'm just biased because I now used it in vegafusion for a while and got used to it.
-> I think it's great if it has something like transformed/evaluated/eval in the name.
On the other hand, it should also have data/df/... in the name as else it's unclear what is being evaluated. eval() could mean that the Chart specification is evaluated to Vega-Lite or even to Vega with VegaFusion.

Hence, I like transformed_data. Synonyms for transformed could be used, such as evaluated, but transformed maps closest to the transformation concept: alt.Chart().transform_pivot().transformed_data() vs. alt.Chart().transform_pivot().evaluated_data().

Contributor

mattijn commented Jun 11, 2023 •

edited

I liked the eval(), but agree with @binste that the meaning is ambiguous. Regarding the following:

Right now in VegaFusion, these results can actually be Arrow tables (if the input Chart(s) wrap Arrow tables), Polars DataFrames (if the input Chart(s) wrap Polars DataFrames), or pandas DataFrames (all other cases).

Should the output type be identical to the input type (e.g. what is the input type of a URL)? Otherwise we could offer .pl() for Polars, .arrow() for PyArrow, and .df() for pandas, like DuckDB does.

Anyhow, no strong feelings for or against the suggestions.

Contributor

binste commented Jun 11, 2023 •

edited

That's a good point. Makes it more stable if a user can declare what they want in return independent of what they put in. As another idea: transformed_data(return_type="pandas"/"polars"/"arrow") or df_type.

Contributor

mattijn commented Jun 12, 2023 •

edited

To summarize, up until now we have discussed:

.transformed_data()
.to_data()
.to_df()
.collect()
.compute()
.evaluate()
.eval()
.evaluated_data()
.df()
.pl()
.arrow()

Currently in Altair we do it as such:

.save(format=***)
.to_dict(format=***) # defaults to vega-lite, with an option for vega
.to_json(format=***) # defaults to vega-lite, with an option for vega

So then it could be consistent to say

transformed_data(format=*** OR dtype=***) # e.g. defaults to auto (same as the input dtype), with an option to set the dtype manually

Personally, I like it when there is a grouping using to_***().
Using autocomplete I can discover what this object can be evaluated into (similarly, one can autocomplete .mark_***() and .transform_***() to see what is possible).

For example:

to_svg()
to_png()
to_pdf()
to_df()
to_pl()
to_arrow()
to_vgjson()
to_vljson()
to_json() # is vega-lite
to_vgdict()
to_vldict()
to_dict() # is vega-lite

I'm aware that the latter (using to_***()) is probably better to discuss in a different issue. So also tempted to say, leave it as is (.transformed_data()).


          Protocol is available in Python 3.8

1416f4d

Contributor Author

jonmmease commented Jun 12, 2023

Thanks for the thorough summary @mattijn. It sounds like we've all landed on leaving the name as transformed_data. I'll still leave it as an internal method for this PR, so it's not completely final, but let's move forward as-is.

I think the last remaining question for this PR is whether the DataFrameLike protocol should be public or private. As I mentioned above, my assumption was that it should be public since it will be used in the return type of public methods. But I'm honestly not that familiar with mypy conventions.

This was referenced Jun 12, 2023

Type hints: Summary issue #2951

Closed

Type hints: Parts of folders "vegalite", "v5", and "utils" #2976

Merged

Contributor

binste commented Jun 12, 2023 •

edited

It's a great point that the types should be public as well. I've never type hinted a publicly accessible library so if someone has more experience please chime in but it sounds reasonable to me. I'm still unsure how to best organise these types. In #2976 I actually wanted to introduce the same protocol. I'll probably switch to yours in case this PR is merged first.

How about this: We mark these types as private for now to give us the freedom to refactor. Altair is not yet marked as a typed package (i.e. it does not have a py.typed file in its repo, see the mypy docs). Once we have typed most of the public API and add that file, we convert the private types to public ones and make it official by adding it to the release notes that this is ready. As this might still be a few weeks out, for any release until then, users won't have any downside of these private types being in the codebase but it's clear that they should not yet be used. I added a note to do this in #2951 which I'm working through.

Thanks @mattijn for the overview, this is helpful! I also find the to_* methods convenient but I'm not sure yet how to best do it. I'm also ok with leaving as-is and moving this forward.


          Make DataFrameLike private for now

c665e8f

Contributor Author

jonmmease commented Jun 12, 2023

Sounds good @binste, I made the protocol private in c665e8f.

I think that's everything. I'll let this sit until tomorrow in case anything else comes to mind, but if I don't hear anything I'll merge tomorrow. Thanks again!

jonmmease merged commit f3938bf into master

24 checks passed

jonmmease mentioned this pull request

Make transformed_data public and add initial docs #3084

Merged

jonmmease mentioned this pull request

Add VegaFusion data transformer with mime renderer, save, and to_dict/to_json integration #3094

Merged

mattijn mentioned this pull request

Introduce export methods like Chart.to_***() #3189

Open
