
split metadata into multiple files #16

Open

brandondutra opened this issue Mar 12, 2017 · 2 comments

@brandondutra

Having one file with all the vocabs can be a problem for large examples. I think this was a performance problem with a Criteo sample.

It would be nice to have a vocab file for each column. Then, if a "string to int" transform is needed for only a few categorical columns, the vocab for every column does not need to be loaded.
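Roughly what I have in mind is a per-column layout like the sketch below (file names and paths are just illustrative, not what the package does today) — a transform only reads the one vocab it needs:

```python
# Minimal sketch of per-column vocab files (file names are hypothetical).
# Instead of one metadata file holding every vocab, each categorical column
# gets its own "vocab_<column>.csv", so a transform only reads what it needs.
import csv
import os


def load_vocab(vocab_dir, column):
    """Read the vocab for a single column into a string -> int mapping."""
    path = os.path.join(vocab_dir, 'vocab_%s.csv' % column)
    with open(path) as f:
        return {row[0]: i for i, row in enumerate(csv.reader(f))}


def string_to_int(values, vocab, oov_index=-1):
    """Map raw string values to integer ids, using oov_index for unknowns."""
    return [vocab.get(v, oov_index) for v in values]


# Only the columns that actually need a string-to-int transform get loaded:
# vocab = load_vocab('analysis_output', 'device_type')   # hypothetical paths
# ids = string_to_int(['mobile', 'desktop', 'tv'], vocab)
```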

@nikhilk
Contributor

nikhilk commented Mar 12, 2017

Yes, agree.

I haven't fully grokked how vocab files work end-to-end ... with respect to setting up a hashtable from a file so that it works at training and prediction time, and how vocabs should be saved within a saved model. Perhaps this can be researched a bit, unless you already know...

@brandondutra
Author

The structured data package reads the vocab file and embeds it in the graph with index_table_from_tensor (but I think index_to_string_table_from_file would work fine). The vocab file then does not need to be saved with the exported graph.
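For reference, this is roughly what that step looks like with the TF 1.x contrib lookup API — a sketch only, with hypothetical paths and column names, not the exact code in the package:

```python
# Sketch (TF 1.x contrib API): bake a column's vocab into the graph as a
# constant, so the exported SavedModel does not depend on an external file.
import tensorflow as tf


def make_string_to_int_lookup(vocab_file):
    # Read the vocab list in Python, then embed it in the graph so the
    # lookup table travels with the exported model.
    with tf.gfile.GFile(vocab_file) as f:
        vocab = [line.strip() for line in f if line.strip()]
    return tf.contrib.lookup.index_table_from_tensor(
        mapping=tf.constant(vocab), default_value=-1)


# Usage sketch: map a batch of raw strings to ids at training or serving time.
# table = make_string_to_int_lookup('vocab_device_type.csv')  # hypothetical file
# ids = table.lookup(tf.constant(['mobile', 'desktop', 'tv']))
# with tf.Session() as sess:
#     sess.run(tf.tables_initializer())
#     print(sess.run(ids))
```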
