move sparsity related discussion to another section. #26

Merged
merged 2 commits into from
Jul 18, 2023

Conversation

Ankush-Chander
Contributor

Many thanks for this comprehensive guide on embeddings.

My reasoning behind the proposed change:
The sparsity discussion at the beginning of section 3.2.2 TF-IDF gives the impression that moving from one-hot encoding to TF-IDF solves the sparsity issue. However, TF-IDF encoding doesn't change the number of zero cells we have in one-hot encoding; it only affects the weight given to the non-zero cells, leaving the sparsity of the one-hot matrix unchanged.
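
A minimal sketch of this point, not taken from the guide: the toy corpus, the helper `zero_cells`, and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` are all assumptions made for illustration. With a strictly positive IDF like this one, reweighting the term-document matrix leaves every zero cell exactly where it was. (An unsmoothed `log(N / df)` would zero out terms that occur in every document, which could only add zeros, never remove them.)

```python
import math

# Toy corpus, invented for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]
vocab = sorted({t for doc in tokenized for t in doc})

# Count-based term-document matrix (rows: documents, columns: vocabulary terms).
counts = [[doc.count(term) for term in vocab] for doc in tokenized]

n_docs = len(docs)
# Document frequency: number of documents that contain each term.
df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
# Smoothed IDF (an assumed variant); it is strictly positive for every term.
idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]

# TF-IDF only rescales each column, so a zero count stays zero.
tfidf = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]


def zero_cells(matrix):
    """Count the cells equal to zero (hypothetical helper)."""
    return sum(1 for row in matrix for v in row if v == 0)


print("zero cells in count matrix: ", zero_cells(counts))
print("zero cells in TF-IDF matrix:", zero_cells(tfidf))
```

Both counts come out equal, since the only change is the magnitude of the non-zero entries.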

Proposed change:

  1. Spell out the advantage of TF-IDF over one-hot encoding in section 3.2.2 TF-IDF.
  2. Move sparsity discussion to section 3.2.3 where matrix factorisation is discussed.

@micaleel

@Ankush-Chander, your definition of sparsity based on the size of embeddings is debatable. When comparing two embeddings of equal size, the one with fewer empty or zero values is considered the denser one.

@Ankush-Chander
Contributor Author

Hi @micaleel
Sorry for not being more explicit earlier. My interpretation of sparsity is exactly what you just described: the number of cells in the term-document matrix that have value 0.

embeddings.tex (outdated review thread, resolved)
Owner

@veekaybee veekaybee left a comment

Left one small change for clarification; once that's done, this is good to go. Thanks for opening this!

@veekaybee
Owner

LGTM!

@veekaybee veekaybee merged commit d057519 into veekaybee:main Jul 18, 2023