Transforming date values #41

Closed
Malonl opened this issue Apr 16, 2019 · 10 comments

Comments

@Malonl

Malonl commented Apr 16, 2019

Hi.

I have a use case where I want to use date features as input values for a predictive model. I need to transform the date features to be useful.
For example, I need the difference between two dates (say, the number of days between 01-04-2019 and 16-04-2019, though the dates can also be months or years apart).
Or I just need the day of the month, the month, or the year (i.e. for 16-04-2019, getting 16, 4 and 2019 as separate values).

My question is whether this is possible within TFX and, if not, whether it is a planned feature.
It matters for my use case because the transform needs to be part of the graph, so that I can serve the model with the transformations inside the pipeline.
Otherwise I would need to add something outside of TFX to do this for me.

Thanks in advance!

Martijn

@Efaq

Efaq commented Apr 16, 2019

Same issue here!

@ruoyu90
Contributor

ruoyu90 commented Apr 16, 2019

IIUC this is about converting 1 column into several features. You can use the Transform component to do this by putting your logic into preprocessing_fn, like our Chicago taxi example.

@Harshini-Gadige

Harshini-Gadige commented Apr 16, 2019

For more information on using preprocessing_fn within TensorFlow Transform, please see the preprocessing function example here.

@Efaq

Efaq commented Apr 17, 2019

@ruoyu90 I think the issue is about doing graph operations on dates, more than just converting one column into three. So:

  • task: given two date columns, generate a new column with the difference between them in days.
  • if we could use some Python library (datetime for example), it would be trivial. Without the library, we would need to implement the calendar rules ourselves (number of days in each month, leap years, etc.)
  • I believe we cannot use a conventional Python library because, if we did, the transformation would not be written to the graph, and thus we would not have it at serving time.

If there is no way to add such operations to the graph, then as mentioned above we would need to implement a piece of the pipeline that transforms the data both before training and before serving, outside of the graph.

Does it make sense, or am I missing something here?
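To make the "implement the knowledge about the calendar" point concrete, here is a minimal sketch in plain Python of a date-to-day-number conversion that uses only integer `+`, `-`, `*`, `//` and `%`, with no branches and no `datetime`. It is not TFX or Transform code. The function names are hypothetical, and the idea that each step can be ported one-to-one to elementwise integer tensor ops inside `preprocessing_fn` (with the string parsing replaced by graph string ops) is an assumption to verify rather than a tested recipe.

```python
def days_from_civil(year, month, day):
    """Days since 1970-01-01 for a proleptic Gregorian date.

    Branch-free integer arithmetic only, so every step is a candidate
    for an elementwise op on integer tensors.
    """
    # Shift the year to start in March so the leap day falls at the end.
    mp = (month + 9) % 12        # Mar -> 0, ..., Jan -> 10, Feb -> 11
    y = year - mp // 10          # Jan and Feb count toward the previous shifted year
    era = y // 400               # index of the 400-year Gregorian cycle
    yoe = y % 400                # year within the cycle, 0..399
    doy = (153 * mp + 2) // 5 + day - 1            # day within the shifted year
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy  # day within the cycle
    return era * 146097 + doe - 719468             # 146097 days per cycle


def parse_dd_mm_yyyy(text):
    """Split a 'DD-MM-YYYY' string into integer (year, month, day)."""
    day, month, year = (int(part) for part in text.split("-"))
    return year, month, day


def days_between(start, end):
    """Number of days from `start` to `end`, both 'DD-MM-YYYY' strings."""
    return days_from_civil(*parse_dd_mm_yyyy(end)) - days_from_civil(*parse_dd_mm_yyyy(start))


print(days_between("01-04-2019", "16-04-2019"))  # 15
```

Because the arithmetic relies on floor division and a non-negative modulo, any port should check that the target ops have the same semantics for negative inputs.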

@chanshah chanshah assigned chanshah and unassigned chanshah Apr 18, 2019
@chanshah
Member

@tensorflow/transform-team please take a look.

@chanshah chanshah removed their assignment Apr 21, 2019
@KesterTong

Everything @Efaq says above is correct: this should be done with Transform, and it must be done with TensorFlow ops (not py_func) so that it can also run at serving time.

Please file an issue in the tensorflow/transform repo, and we can further discuss the exact functionality required.

@Malonl
Author

Malonl commented Apr 23, 2019

Thanks for all of your replies. I will post an issue in the transform repo.

@ian-hensel-apex

Any update on this?

@dmitra79

Any update on these features? They really would be great to have!

@pritamdodeja

Since TFX doesn't support this, here is a workaround. You can use https://www.tensorflow.org/addons/api_docs/python/tfa/text/parse_time to parse the string to Unix time, and then derive temporal features from the number of seconds since 1970. I'm tempted to write a utility function that takes a list of datetime columns and converts those features to Unix time. I'll likely need to write this, and when I do, I'll post it here.
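As a sketch of the second half of that workaround, deriving year, month and day from seconds since 1970, the plain-Python function below again uses only branch-free integer arithmetic. It assumes the timestamps are in UTC and already available as integers (for example from a parsing step like the one linked above). The function name is hypothetical, and porting it to tensor ops is left as an assumption to verify, not something confirmed in this thread.

```python
SECONDS_PER_DAY = 86400


def civil_from_unix_seconds(seconds):
    """Return (year, month, day) in UTC for integer seconds since 1970-01-01."""
    days = seconds // SECONDS_PER_DAY   # whole days since the epoch (floor)
    z = days + 719468                   # re-base to 0000-03-01
    era = z // 146097                   # index of the 400-year cycle
    doe = z % 146097                    # day within the cycle, 0..146096
    # Year within the cycle, correcting for the leap days seen so far.
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)  # day within the March-based year
    mp = (5 * doy + 2) // 153           # March-based month, Mar -> 0, ..., Feb -> 11
    day = doy - (153 * mp + 2) // 5 + 1
    month = (mp + 2) % 12 + 1           # back to Jan -> 1, ..., Dec -> 12
    year = yoe + era * 400 + mp // 10   # Jan and Feb belong to the next calendar year
    return year, month, day


print(civil_from_unix_seconds(1555372800))  # (2019, 4, 16)
```

Other temporal features follow the same pattern: for instance, the second of the day is `seconds % SECONDS_PER_DAY`.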
