
Bug with density transform when providing csv data as a url #9165

Open
dallascard opened this issue Nov 7, 2023 · 2 comments

@dallascard

dallascard commented Nov 7, 2023

I'm having an issue with density transforms that seems to occur when the data is provided as a link to a .csv file. So far, this does not happen when the data is passed as a link to a .json file.
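One reason the file format could matter (an assumption, not confirmed in this thread): JSON numbers arrive as numbers, while CSV text carries no type information, so a loader has to infer or be told which columns are numeric. The Python standard-library sketch below shows the difference using a hypothetical two-row sample; it does not reproduce Vega's own loader.

```python
import csv
import io
import json

# Hypothetical two-row sample, written once as JSON and once as CSV.
json_text = '[{"temp_max": 12.8}, {"temp_max": 10.6}]'
csv_text = "temp_max\n12.8\n10.6\n"

json_rows = json.loads(json_text)
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# JSON keeps numeric types; csv.DictReader returns every cell as a str.
print(type(json_rows[0]["temp_max"]).__name__)
print(type(csv_rows[0]["temp_max"]).__name__)
```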

As background, I'm coming to this from Altair. Here is the original Altair example for a density plot:

import altair as alt
from vega_datasets import data

alt.Chart(data.movies.url).transform_density(
    'IMDB_Rating',
    as_=['IMDB_Rating', 'density'],
).mark_area().encode(
    x="IMDB_Rating:Q",
    y='density:Q',
)

This produces the following Vega-Lite specification, which works fine. Note that the URL points to a .json file:

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/movies.json"
  },
  "mark": {"type": "area"},
  "encoding": {
    "x": {"field": "IMDB_Rating", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  },
  "transform": [{"density": "IMDB_Rating", "as": ["IMDB_Rating", "density"]}],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json"
}

(or open it in the Vega-Lite editor)
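For context, the density transform computes a kernel density estimate (KDE) of the chosen field. Below is a minimal standard-library sketch of a Gaussian KDE, assuming a rule-of-thumb bandwidth of 1.06 · stdev · n^(-1/5); it is not Vega's implementation, and Vega's default bandwidth and sampling extent may differ. Note that the arithmetic only makes sense if the field values are numbers.

```python
import math
import statistics


def gaussian_kde(samples, xs, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` at each x in `xs`."""
    n = len(samples)
    if bandwidth is None:
        # Rule-of-thumb bandwidth (an assumption; Vega's default may differ).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in xs
    ]


# Hypothetical sample values and an evenly spaced evaluation grid.
samples = [1.0, 2.0, 2.5, 4.0]
step = 0.1
xs = [-5 + i * step for i in range(150)]
density = gaussian_kde(samples, xs)

# A density should integrate to roughly 1 over a wide enough grid.
print(round(sum(density) * step, 3))
```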

However, if I change the data source to the seattle_weather dataset and plot its temp_max variable, the resulting chart is empty.

Here is the Altair code:

import altair as alt
from vega_datasets import data

alt.Chart(data.seattle_weather.url).transform_density(
    'temp_max',
    as_=['temp_max', 'density'],
).mark_area().encode(
    x="temp_max:Q",
    y='density:Q',
)

This produces the following Vega-Lite specification, whose URL points to a .csv file:

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/seattle-weather.csv"
  },
  "mark": {"type": "area"},
  "encoding": {
    "x": {"field": "temp_max", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  },
  "transform": [{"density": "temp_max", "as": ["temp_max", "density"]}],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json"
}

Here is the example in the editor, which shows a blank plot.
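A possible workaround, untested here: if the blank chart is caused by temp_max reaching the density transform as strings, declaring the CSV format and asking Vega-Lite to parse the field as a number may help. The sketch below builds such a spec as a plain Python dict; `density_spec` is a hypothetical helper, while the `format` object with `type` and `parse` is standard Vega-Lite data-format syntax.

```python
import json


def density_spec(url: str, field: str) -> dict:
    """Hypothetical helper: the density spec above, plus an explicit numeric parse."""
    return {
        "data": {
            "url": url,
            # Tell Vega-Lite the file is CSV and that `field` should be a number.
            "format": {"type": "csv", "parse": {field: "number"}},
        },
        "mark": {"type": "area"},
        "encoding": {
            "x": {"field": field, "type": "quantitative"},
            "y": {"field": "density", "type": "quantitative"},
        },
        "transform": [{"density": field, "as": [field, "density"]}],
        "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json",
    }


spec = density_spec(
    "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/seattle-weather.csv",
    "temp_max",
)
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega-Lite editor to check whether the explicit parse changes the result.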

Interestingly, this works fine if I use vega_datasets to load the data into a DataFrame, which embeds the entire dataset in the Vega-Lite specification. Here is the working Altair code, which is the same except for how the data source is specified:

import altair as alt
from vega_datasets import data

alt.Chart(data.seattle_weather()).transform_density(
    'temp_max',
    as_=['temp_max', 'density'],
).mark_area().encode(
    x="temp_max:Q",
    y='density:Q',
)

The corresponding example in the editor also works.
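This is consistent with a type hypothesis (not confirmed in the thread): embedded DataFrame rows are serialized inline as JSON, so numeric columns stay numeric. The sketch below does the same by hand with the standard library, parsing CSV text, coercing the named columns to float, and returning a Vega-Lite inline data object (`{"values": [...]}`). `inline_values` and the sample rows are hypothetical.

```python
import csv
import io


def inline_values(csv_text, numeric_fields):
    """Hypothetical helper: parse CSV text and coerce the named columns to float."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        for field in numeric_fields:
            row[field] = float(row[field])
    # Vega-Lite inline data uses a "values" array of records.
    return {"values": rows}


# Illustrative two-row sample using column names from seattle-weather.csv.
sample = "date,temp_max\n2012-01-01,12.8\n2012-01-02,10.6\n"
data = inline_values(sample, ["temp_max"])
print(data["values"][0])
```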

Other types of plots do work with the seattle_weather data when it is loaded from the URL. For example, here is a scatter plot of the same variable in Altair:

import altair as alt
from vega_datasets import data

alt.Chart(data.seattle_weather.url).mark_point().encode(
    x="temp_max:Q",
    y='temp_min:Q',
)

This works fine and produces the following Vega-Lite specification in the editor.

I'm not 100% sure, but I think the problem arises when the data is passed as a URL pointing to a .csv file. Here is another example using the Disasters dataset. In this case the plot is not blank, but it is incorrect (as can be verified by changing the Altair code to embed the data in the Vega-Lite specification, as above). Here is the Altair code:

import altair as alt
from vega_datasets import data

alt.Chart(data.disasters.url).transform_density(
    'Deaths',
    as_=['Deaths', 'density'],
).mark_line(width=2).encode(
    x="Deaths:Q",
    y='density:Q',
)

and the Vega-Lite specification:

{
  "config": {"view": {"continuousWidth": 300, "continuousHeight": 300}},
  "data": {
    "url": "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/disasters.csv"
  },
  "mark": {"type": "line", "width": 2},
  "encoding": {
    "x": {"field": "Deaths", "type": "quantitative"},
    "y": {"field": "density", "type": "quantitative"}
  },
  "transform": [{"density": "Deaths", "as": ["Deaths", "density"]}],
  "$schema": "https://vega.github.io/schema/vega-lite/v5.8.0.json"
}

and the live example in the editor.
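One hedged explanation for a wrong (rather than blank) curve: if the Deaths values reach the transform as strings, any ordering or extent computation compares them character by character instead of numerically. The Python sketch below uses hypothetical values; JavaScript's `<` on two strings is likewise lexicographic, but this is an illustration, not a trace of Vega's code.

```python
# Hypothetical Deaths values, once as numbers and once as CSV-style strings.
numbers = [3, 20, 100]
strings = [str(v) for v in numbers]

# Strings compare character by character, so their order, min, and max
# differ from the numeric ones.
print(sorted(numbers), min(numbers), max(numbers))
print(sorted(strings), min(strings), max(strings))
```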

dallascard changed the title from "Bug with density transform when providing data as a url" to "Bug with density transform when providing data as a url to a csv file" on Nov 7, 2023
dallascard changed the title from "Bug with density transform when providing data as a url to a csv file" to "Bug with density transform when providing csv data as a url" on Nov 7, 2023
@dallascard
Author

Also, issue #7603 seems like it could be related.

@ChiaLingWeng
Contributor

I'm still looking into this, but I think the issue arises while processing files with the .csv extension.
In the example below, using a JSON file works fine:
Open the Chart in the Vega Editor
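Since the problem appears tied to the .csv extension, note that Vega-Lite's documented default for `format.type` picks the type from the URL's file extension and falls back to JSON when none is detected. The sketch below expresses that rule in Python; `infer_format_type` and the set of recognized extensions are assumptions, not Vega-Lite's source code.

```python
import posixpath
from urllib.parse import urlparse

# Extensions assumed to be recognized; the exact list in Vega-Lite may differ.
KNOWN_TYPES = {"json", "csv", "tsv", "dsv", "topojson"}


def infer_format_type(url: str) -> str:
    """Guess a data format type from a URL's file extension, defaulting to JSON."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext if ext in KNOWN_TYPES else "json"


base = "https://cdn.jsdelivr.net/npm/vega-datasets@v1.29.0/data/"
for name in ["movies.json", "seattle-weather.csv", "disasters.csv"]:
    print(name, infer_format_type(base + name))
```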
