Model arguments confusion!! #19

Priyansh2 · 2019-03-27T14:15:31Z

No description provided.

yuvalpinter · 2019-03-27T18:22:49Z

Hi Priyansh, does closing the issue mean you figured out these points? I would be glad to help if not.

Priyansh2 · 2019-03-28T07:50:57Z

@yuvalpinter I accidentally cleared the issue content while editing and saved it, so I am rewriting it here. When training with 'model.py', what should I pass to --vocab? Does this option take my file of rare words that are not present in the vocabulary of my training data? If so, what is the use of --all_from_mimick, whose description says that when it is "ON", the vectors from the original training set are overridden by Mimick-generated vectors? Can I pass the training-data words together with my rare-word file, or only the rare words? What happens in each case?

Also, the --normalized-targets option says "if toggled, train on normalized vectors from set". Does this mean the vectors are normalized before training? Can you elaborate on how this option is used?

Finally, regarding dimensionality: I want the word vectors produced by Mimick to have dimension x (let's say 100). What needs to be changed for this?

Priyansh2 · 2019-03-28T08:27:40Z

@yuvalpinter I found some bugs in the code, listed below. Could you please fix them?

In make-dataset.py (inside the mimick directory), on line 19, utils should be changed to util, and util.py needs "import codecs" and "import numpy as np", plus the code for the functions read_text_embs and read_pickle_embs.
In the same file, on line 90, the pickle output file should be opened in "wb" mode rather than "w". In my case, "w" throws an error.
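Until the repository is fixed, the missing text-embedding reader and the binary-mode pickling can be sketched as below. This is a minimal sketch, not the repository's code: read_text_embs is the name mentioned above, but its signature and return type here are assumptions (a single file path, returning a list of (word, vector) pairs), and save_pickle and load_pickle are hypothetical helpers. read_pickle_embs is left out because the pickle layout it expects is not stated in this thread.

```python
import codecs
import pickle

import numpy as np


def read_text_embs(path):
    """Hypothetical sketch: read a text embedding file into (word, vector) pairs.

    Assumes one entry per line, "word v1 v2 ... vd", whitespace-separated.
    Lines with fewer than three fields (e.g. a word2vec-style "count dim"
    header, or blank lines) are skipped.
    """
    word_embs = []
    with codecs.open(path, "r", encoding="utf-8") as inf:
        for line in inf:
            fields = line.split()
            if len(fields) < 3:
                continue  # header or blank line
            vec = np.array([float(x) for x in fields[1:]], dtype=np.float32)
            word_embs.append((fields[0], vec))
    return word_embs


def save_pickle(obj, path):
    # pickle.dump writes bytes, so the file must be opened in binary mode ("wb");
    # with text mode ("w"), Python 3 raises TypeError.
    with open(path, "wb") as outf:
        pickle.dump(obj, outf)


def load_pickle(path):
    # Reading a pickle back likewise requires binary mode ("rb").
    with open(path, "rb") as inf:
        return pickle.load(inf)
```

The "wb" requirement is a Python 3 behavior: pickle always produces bytes, and a text-mode file only accepts str.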

yuvalpinter · 2019-04-16T15:37:06Z

Hi,
Apologies for the late response.

The bugs you mention are the result of the utils file being moved up to the main directory and a new util file then being created. I will think about how best to fix this (for now, you can simply copy the utils file into the directory). You're right about the wb line.

--vocab is indeed a text file containing all the words whose predicted embeddings you want, for when you want to plug Mimick's output into your model as a preprocessing step. Alternatively, you can load a trained Mimick model into your downstream application and call it on the fly.
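A vocab file for the preprocessing route could be built from the downstream corpus roughly as follows. This is a minimal sketch under an assumption the thread does not confirm: that the file lists one word per line (check the repository before relying on it). write_vocab is a hypothetical helper; its optional known argument keeps only words missing from the original embeddings.

```python
def write_vocab(tokens, path, known=None):
    """Hypothetical helper: write unique tokens to path, one word per line.

    If `known` (a set or dict of words) is given, words already in it are
    skipped, so only out-of-vocabulary words are written.
    Returns the list of words written, in first-seen order.
    """
    written, seen = [], set()
    for tok in tokens:
        if tok in seen or (known is not None and tok in known):
            continue
        seen.add(tok)
        written.append(tok)
    with open(path, "w", encoding="utf-8") as f:
        for word in written:
            f.write(word + "\n")
    return written
```

Filtering is optional: under the default behavior described in the next paragraph, in-vocabulary words in the file are simply copied from the original embeddings.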
The all_from_mimick flag controls whether all embeddings for --vocab words (including in-vocabulary ones) are inferred by Mimick; by default, any in-vocabulary word is copied over from the original embedding dictionary you are training the Mimick model from.
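The two cases asked about above can be illustrated with a short sketch. This only restates the described behavior and is not the code in model.py: assemble_embeddings is a hypothetical name, original stands for the embedding dictionary Mimick was trained from, and predict stands in for a trained Mimick model.

```python
def assemble_embeddings(vocab, original, predict, all_from_mimick=False):
    """Illustrative sketch of the described --vocab behavior (not model.py itself).

    vocab: iterable of words requested via --vocab
    original: dict word -> vector from the embeddings Mimick was trained on
    predict: callable word -> vector, standing in for a trained Mimick model
    """
    out = {}
    for word in vocab:
        if not all_from_mimick and word in original:
            out[word] = original[word]  # default: copy the in-vocabulary vector
        else:
            out[word] = predict(word)  # OOV word, or all_from_mimick is on
    return out
```

So passing training-data words together with rare words is harmless by default (the former are copied, the latter predicted); with all_from_mimick on, every word is predicted.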
As you hypothesized, --normalized-targets normalizes the input embeddings before training. In my experiments, this flag had no major effect.
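The thread does not say which normalization the flag applies; assuming it is the common choice of scaling each vector to unit L2 norm, the preprocessing would look like this NumPy sketch (l2_normalize is a hypothetical helper, not part of Mimick).

```python
import numpy as np


def l2_normalize(embs, eps=1e-12):
    """Scale each row of an (n_words, dim) matrix to unit L2 norm.

    Rows with zero norm stay zero instead of producing NaN.
    """
    norms = np.linalg.norm(embs, axis=1, keepdims=True)
    return embs / np.maximum(norms, eps)
```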
Mimick can only predict embeddings with the same dimensionality as the input embeddings. If you want a different dimensionality, the best way is to first change the dimensionality of the input embeddings accordingly (e.g. with a projection or PCA).
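For instance, to get 100-dimensional Mimick vectors, one could reduce the input embedding matrix to 100 dimensions with PCA before training Mimick on it. A minimal NumPy sketch of that PCA step (pca_reduce is a hypothetical helper; writing the reduced vectors back out in the format Mimick expects is not shown):

```python
import numpy as np


def pca_reduce(embs, k):
    """Project an (n_words, dim) embedding matrix onto its top-k principal components.

    Returns an (n_words, k) matrix. Requires k <= min(n_words, dim).
    """
    centered = embs - embs.mean(axis=0)
    # Rows of vt are the principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T
```

Because Mimick learns to reproduce whatever vectors it is trained on, its predictions then come out in the reduced space without any change to Mimick itself.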

Priyansh2 closed this as completed Mar 27, 2019

Priyansh2 reopened this Mar 28, 2019

yuvalpinter self-assigned this Apr 8, 2019

yuvalpinter added a commit that referenced this issue Apr 16, 2019

bugs from #19

deb60d3

yuvalpinter closed this as completed Apr 16, 2019
