min/max values of transformed metadata #11

Closed
pnezis opened this issue Apr 10, 2017 · 6 comments

@pnezis

pnezis commented Apr 10, 2017

Assume I apply the string_to_int function to a categorical column. In the transformed schema, this column is mapped to INT, and the min and max entries of its domain are the min and max values of INT:

"domain": {
    "ints": {
        "isCategorical": false,
        "max": "9223372036854775807",
        "min": "-9223372036854775808"
    }
},

I would expect instead the actual min (-1) and max (size(vocab)-1) values of this transformed column. Is there any workaround for this?

@KesterTong
Contributor

Currently not, but I am working on setting this information automatically. It should be done in 2-3 weeks.

@KesterTong
Contributor

Update: I've made some progress here; the ETA is still 2-3 weeks.

@buffxz

buffxz commented Aug 24, 2017

Are there any updates on this issue? When can we expect more informative metadata about the dataset?

Can we also have the quantile values?

@KesterTong
Contributor

This is fixed at head, but it will be about 1 month before the next release of TFT on PyPI.

Regarding other metadata, string_to_int is a special case because we already know the range from the string_to_int calculation.

For other cases, we don't have code to calculate these ranges. In principle this is straightforward, but the internal code isn't set up to calculate arbitrary statistics for the outputs. We plan to add this, but it's not clear at this point how we will do it.

@katsiapis
Member

katsiapis commented Oct 21, 2017

@pnezis TensorFlow Transform 0.3.0 and 0.3.1 have been released and contain several enhancements, including (I believe) support for your initial request.

@buffxz regarding Quantiles, please see our response in #29

@KesterTong or @elmer-garduno, could we apply the appropriate milestone (0.3.0?) to this Issue and perhaps Close it (if we've fully answered all the questions)?

elmer-garduno added this to the 0.3.1 milestone Oct 22, 2017

@KesterTong
Contributor

This was fixed in 0.3.0 but we've added it to the 0.3.1 milestone since we didn't create a milestone for 0.3.0. To answer the original question, as of version 0.3.0, we will infer min and max values (based on the vocab size) and set them in the transformed schema, but only for the output of string_to_int. If you do any transformations after string_to_int we will not provide min and max values.
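
For readers who want to see where the expected range comes from, here is a minimal plain-Python sketch. It is not the tf.Transform implementation and does not call the library. The helper names (build_vocab, map_to_int, int_domain) are hypothetical, the first-seen vocabulary ordering is a simplifying assumption, and -1 for out-of-vocabulary strings follows the range given in the original question. The domain entry only mirrors the min/max fields of the JSON fragment quoted above.

```python
# Plain-Python illustration only; none of these helpers belong to
# tensorflow_transform. Names and ordering are assumptions for the sketch.

def build_vocab(values):
    """Assign each distinct string an index in first-seen order."""
    vocab = {}
    for v in values:
        if v not in vocab:
            vocab[v] = len(vocab)
    return vocab


def map_to_int(values, vocab, default_value=-1):
    """Map strings to vocabulary indices; unseen strings get default_value."""
    return [vocab.get(v, default_value) for v in values]


def int_domain(vocab_size, default_value=-1):
    """Build a min/max entry shaped like the schema fragment in the issue.

    Other fields of the real schema are intentionally left out.
    """
    return {
        "ints": {
            "max": str(vocab_size - 1),
            "min": str(default_value),
        }
    }


train = ["red", "green", "blue", "green"]
vocab = build_vocab(train)

# "purple" is not in the vocabulary, so it maps to the default value.
ids = map_to_int(["blue", "purple", "red"], vocab)
domain = int_domain(len(vocab))

print(ids)
print(domain)
```

With a three-entry vocabulary, every mapped value falls between -1 and 2, which is the kind of range the original question asked for. Any further transformation applied to these integers could move values outside that range, which is consistent with the bounds only being provided for the direct output of string_to_int.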
