Model arguments confusion!! #19

Closed
Priyansh2 opened this issue Mar 27, 2019 · 4 comments

@Priyansh2

Priyansh2 commented Mar 27, 2019

No description provided.

@yuvalpinter
Owner

yuvalpinter commented Mar 27, 2019

Hi Priyansh, does closing the issue mean you figured out these points? I would be glad to help if not.

@Priyansh2
Author

Priyansh2 commented Mar 28, 2019

@yuvalpinter Actually, I accidentally cleared the issue's content while editing and then saved it, so I am rewriting it here. When training with 'model.py', what should I pass to --vocab? Does this option take my file of rare words that were not present in the vocabulary of my training data? If so, what is the use of --all_from_mimick? Its description says that when it is set, the vectors from the original training set are overridden by Mimick-generated vectors. Can't we pass the training-data words together with my rare-word file, or only the rare words? What happens in each case? Also, the description of --normalized-targets says "if toggled, train on normalized vectors from set". Does this mean the vectors are normalized before training happens? Can you elaborate on how this option is used? Finally, regarding dimensionality: I have word vectors, and I want their dimension after the Mimick algorithm to be x (let's say 100). What needs to be changed for this?

@Priyansh2 Priyansh2 reopened this Mar 28, 2019
@Priyansh2
Author

@yuvalpinter There are some bugs in the code, which I describe below. Kindly fix them.

  1. In make-dataset.py (inside the mimick directory), on line 19, utils should be changed to util, and inside util.py the following should be added: import codecs, numpy as np, along with the code for the functions read_text_embs and read_pickle_embs.

  2. In the same file, on line 90, the file should be opened for pickling in "wb" mode rather than "w". In my case, "w" throws an error.
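
For anyone hitting the same error as in point 2: under Python 3, pickle writes bytes, so the target file has to be opened in binary mode. The sketch below is a minimal illustration of that point only; the helper names `save_pickle` and `load_pickle` and the sample data are hypothetical and are not taken from make-dataset.py.

```python
import os
import pickle
import tempfile


def save_pickle(obj, path):
    """Serialize obj to path. pickle emits bytes, so the file is opened with "wb"."""
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_pickle(path):
    """Read an object back; the matching read mode is "rb"."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Hypothetical stand-in for a word -> vector mapping.
    data = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
    path = os.path.join(tempfile.mkdtemp(), "embs.pkl")
    save_pickle(data, path)
    print(load_pickle(path) == data)

    # Opening in text mode ("w") makes pickle.dump fail under Python 3.
    try:
        with open(path, "w") as f:
            pickle.dump(data, f)
    except TypeError as err:
        print("text mode fails:", err)
```

Under Python 2 the "w" mode often worked on Unix-like systems, which may be why the original line went unnoticed.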

@yuvalpinter yuvalpinter self-assigned this Apr 8, 2019
@yuvalpinter
Owner

Hi,
Apologies for the late response.

  1. The bugs you mention arose because the utils file was moved up to the main directory and a new util file was then created. I will think about how best to fix this (for now, you might as well just copy the utils file into the directory). You're right about the wb line.
  • --vocab is indeed a text file containing all the words whose predicted embeddings you want, in case you want to plug them into your model as a preprocessing step (you can also just load a Mimick model into your downstream application and call it on the fly).
  • The --all_from_mimick flag controls whether the embeddings for all --vocab words (including in-vocabulary ones) are inferred by Mimick; the default is to copy the embeddings of any in-vocabulary words over from the original dictionary you're training the Mimick model from.
  • --normalized-targets normalizes the input embeddings before training happens, as you hypothesized. In my experiments, I did not encounter any major effect of this flag.
  • Mimick is constrained to predicting embeddings of the same dimensionality as the input embeddings. If you want a different dimensionality, the best way would be to change that of the input embeddings accordingly (e.g. by some projection, or PCA).
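
To make the last three points concrete, here is a rough sketch, not the repository's code. It assumes NumPy is available, that "normalizing" means scaling each vector to unit L2 norm (the thread does not say which normalization is used), and that the embeddings are held as a words-by-dimensions array. The names `normalize_rows`, `pca_project`, `resolve_embeddings` and the `predict` callable are all hypothetical.

```python
import numpy as np


def normalize_rows(vectors):
    """Scale each row to unit L2 norm (assumed meaning of 'normalized')."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero rows unchanged
    return vectors / norms


def pca_project(vectors, target_dim):
    """Project row vectors onto their top target_dim principal components.

    Returns the projected vectors plus the mean and components, so the same
    projection can be applied to other vectors later.
    """
    if target_dim > min(vectors.shape):
        raise ValueError("target_dim exceeds the rank available from the data")
    mean = vectors.mean(axis=0, keepdims=True)
    centered = vectors - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:target_dim]
    return centered @ components.T, mean, components


def resolve_embeddings(vocab_words, original, predict, all_from_mimick=False):
    """Mirror the described behaviour: by default copy in-vocabulary vectors,
    and predict only the rest; with all_from_mimick, predict every word."""
    resolved = {}
    for word in vocab_words:
        if word in original and not all_from_mimick:
            resolved[word] = original[word]
        else:
            resolved[word] = predict(word)
    return resolved


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(200, 300))  # stand-in for real input vectors
    reduced, mean, components = pca_project(embeddings, 100)
    print(reduced.shape)
    print(np.allclose(np.linalg.norm(normalize_rows(reduced), axis=1), 1.0))
```

With this approach the dimensionality is reduced before training, so the Mimick model would then be trained on, and predict, 100-dimensional vectors.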

yuvalpinter added a commit that referenced this issue Apr 16, 2019