
issue about data preprocess #14

Closed
shangqing-liu opened this issue Jul 17, 2019 · 2 comments
@shangqing-liu

shangqing-liu commented Jul 17, 2019

Hi,
While reading reader.py, I have a question about the function process_dataset, specifically the part that samples up to the maximum number of contexts:

```python
safe_limit = tf.cast(tf.maximum(num_contexts_per_example, self.config.MAX_CONTEXTS), tf.int32)
rand_indices = tf.random_shuffle(tf.range(safe_limit))[:self.config.MAX_CONTEXTS]
contexts = tf.gather(all_contexts, rand_indices)  # (max_contexts,)
```

It seems this could index out of bounds when an example has fewer than MAX_CONTEXTS contexts.

@urialon
Contributor

urialon commented Jul 17, 2019

Hi,
There is no out-of-bounds access, because the contexts are padded: even when an example has fewer than max_contexts contexts, it is padded up to max_contexts, so every index produced by the shuffle is valid.

Best,
Uri
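To illustrate the argument, here is a minimal NumPy sketch of the sampling step (names and the pad value are hypothetical; the real reader uses TensorFlow ops and its own padding scheme). Because `all_contexts` is padded up to `MAX_CONTEXTS` before sampling, `safe_limit = max(num_contexts, MAX_CONTEXTS)` never exceeds the length of the array being gathered from:

```python
import numpy as np

MAX_CONTEXTS = 5  # hypothetical value; config.MAX_CONTEXTS in the real reader


def sample_contexts(all_contexts, num_real_contexts, pad_value=0):
    # Pad up to MAX_CONTEXTS, mirroring what the dataset pipeline does
    # before the sampling step.
    if len(all_contexts) < MAX_CONTEXTS:
        all_contexts = all_contexts + [pad_value] * (MAX_CONTEXTS - len(all_contexts))
    # safe_limit = max(num_real_contexts, MAX_CONTEXTS) is never larger
    # than the padded length, so every sampled index is in bounds.
    safe_limit = max(num_real_contexts, MAX_CONTEXTS)
    rand_indices = np.random.permutation(safe_limit)[:MAX_CONTEXTS]
    return [all_contexts[i] for i in rand_indices]
```

In both cases the gather is safe: with fewer than MAX_CONTEXTS real contexts, indices stay below MAX_CONTEXTS and hit either real or padded entries; with more, `safe_limit` equals the (unpadded) number of contexts and the slice keeps only MAX_CONTEXTS of them.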

@shangqing-liu
Author

Got it, thanks very much @urialon

@urialon urialon closed this as completed Jul 30, 2019