Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: parse date values #150

Closed
wants to merge 6 commits into from
Closed

Conversation

johnmdonich
Copy link

Add support for datetime.date values in pandas DataFrame as described in #70

Pandas types the SQL DATETYPE as an Object column with standard library datetime.date values. This also comes up when converting an Apache Spark Dataframe with DateType() columns.

A simple check for date values in the object apply() function is all that is needed.

image

@domoritz
Copy link
Member

domoritz commented Jan 7, 2020

@jakevdp
Copy link
Contributor

jakevdp commented Jan 8, 2020

Yeah, this could be added to Altair, but storing Python date types in object arrays is really an anti-pattern. Far better and more performant is to use pandas datetimes directly; e.g.

df.Year = pd.to_datetime(df.Year, format='%Y')

vega/utils.py Outdated
if isinstance(val, np.ndarray):
return val.tolist()
elif isinstance(val, datetime.date):
return "{dt:%Y-%m-%d}".format(dt=val)
Copy link
Contributor

@jakevdp jakevdp Jan 8, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Date parsing has to be done very carefully at the interface between Pandas and Vega-Lite: these dates should be encoded in ISO format, or else you will run into bugs like this one: vega/altair#1027

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree that having numpy types is strongly preferred over python date types. Although I don't think it is unreasonable to assume that a column of python date be serialized into valid json for plotting.

Adding this support will benefit anyone converting Apache Spark DataFrame to Pandas DataFrame because DateType columns are cast to python dates. You can see in SPARK-23290 the reasoning for this design choice.

I did add an explicit cast to iso format

vega/utils.py Outdated
@@ -48,7 +48,7 @@ def parse_object_column_type(val):
if isinstance(val, np.ndarray):
return val.tolist()
elif isinstance(val, datetime.date):
return "{dt:%Y-%m-%d}".format(dt=val)
return val.isoformat()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, a partial iso format like "2019-01-01" is not sufficient to ensure that Javascript date parsing is consistent with the expectations of pandas interfacing to Vega-Lite. You need the full iso format, like '2019-01-01T00:00:00'.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for clearing this up for me i have done more than a couple hacks dealing with this.

@domoritz
Copy link
Member

domoritz commented Jan 9, 2020

Just to check, do we need to add time zone information to the generated string?

@johnmdonich
Copy link
Author

johnmdonich commented Jan 9, 2020

The method used in Altair and highlighted in the vega/altair#1027 does not add a timezone to the iso format string when converting numpy type M8[D] or the full datetime. In fact timezone aware datetimes are deprecated.

I did some exploration of this and my dates are plotting as expected so I do not think it is necessary. Although @jakevdp has done more work on this than I have so I will defer to him on this one.

@domoritz
Copy link
Member

Can you help finish this pull request @johnmdonich?

@johnmdonich
Copy link
Author

johnmdonich commented Jan 27, 2020

For sure.. It is finished except for a squash. I was just waiting for approval/review from @jakevdp.

@domoritz
Copy link
Member

I can squash when I merge.

@domoritz
Copy link
Member

domoritz commented May 15, 2020

@jakevdp @johnmdonich What's left in this pull request? How can I help push it over the finish line?

@domoritz
Copy link
Member

domoritz commented Jun 3, 2021

@johnmdonich could you redo this pull request?

@domoritz
Copy link
Member

Please send a new pull request.

@domoritz domoritz closed this Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants