
split metadata into multiple files #16

Open

brandondutra opened this issue Mar 12, 2017 · 2 comments

@brandondutra

Having one file with all the vocabs can be a problem for large examples. I think this was a performance problem with a Criteo sample.

It would be nice to have a vocab file for each column. Then, if a "string to int" transform is needed for only a few categorical columns, the vocab for every column does not need to be loaded.
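Roughly what I have in mind is a per-column layout like the sketch below (file names and paths are just illustrative, not what the package does today) — a transform only reads the one vocab it needs:

```python
# Minimal sketch of per-column vocab files (file names are hypothetical).
# Instead of one metadata file holding every vocab, each categorical column
# gets its own "vocab_<column>.csv", so a transform only reads what it needs.
import csv
import os


def load_vocab(vocab_dir, column):
    """Read the vocab for a single column into a string -> int mapping."""
    path = os.path.join(vocab_dir, 'vocab_%s.csv' % column)
    with open(path) as f:
        return {row[0]: i for i, row in enumerate(csv.reader(f))}


def string_to_int(values, vocab, oov_index=-1):
    """Map raw string values to integer ids, using oov_index for unknowns."""
    return [vocab.get(v, oov_index) for v in values]


# Only the columns that actually need a string-to-int transform get loaded:
# vocab = load_vocab('analysis_output', 'device_type')   # hypothetical paths
# ids = string_to_int(['mobile', 'desktop', 'tv'], vocab)
```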

@nikhilk
Contributor

nikhilk commented Mar 12, 2017

Yes, agree.

I haven't fully grokked how vocab files work end-to-end ... with respect to setting up a hashtable from a file so that it works at training and prediction time, and how vocabs should be saved within a saved model. Perhaps this can be researched a bit, unless you already know...

@brandondutra
Author

The structured data package reads the vocab file and embeds it in the graph with index_table_from_tensor (but I think index_to_string_table_from_file would work fine). The vocab file then does not need to be saved with the exported graph.
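For reference, this is roughly what that step looks like with the TF 1.x contrib lookup API — a sketch only, with hypothetical paths and column names, not the exact code in the package:

```python
# Sketch (TF 1.x contrib API): bake a column's vocab into the graph as a
# constant, so the exported SavedModel does not depend on an external file.
import tensorflow as tf


def make_string_to_int_lookup(vocab_file):
    # Read the vocab list in Python, then embed it in the graph so the
    # lookup table travels with the exported model.
    with tf.gfile.GFile(vocab_file) as f:
        vocab = [line.strip() for line in f if line.strip()]
    return tf.contrib.lookup.index_table_from_tensor(
        mapping=tf.constant(vocab), default_value=-1)


# Usage sketch: map a batch of raw strings to ids at training or serving time.
# table = make_string_to_int_lookup('vocab_device_type.csv')  # hypothetical file
# ids = table.lookup(tf.constant(['mobile', 'desktop', 'tv']))
# with tf.Session() as sess:
#     sess.run(tf.tables_initializer())
#     print(sess.run(ids))
```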
