This repository has been archived by the owner on Jun 27, 2022. It is now read-only.
In section 4 of the workbook, I noticed that the formula requires checking whether the frequency is zero, which happens when a word does not appear in a document.
This condition never actually evaluates to true: get_tf takes its frequencies from the bag_of_words function, and every word in the bag has a frequency of at least 1.
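To illustrate, here is a minimal sketch. It assumes that bag_of_words simply counts the tokens of a document and that get_tf applies a log-scaled term frequency (1 + log(freq), with 0 when freq == 0). Both function bodies are hypothetical reconstructions, not the workbook's code. Because the bag only holds words that were actually seen, the zero branch is never taken.

```python
import math
from collections import defaultdict


def bag_of_words(tokens):
    # Hypothetical: only words that actually occur get a key,
    # so every stored count is at least 1.
    bag = {}
    for token in tokens:
        bag[token] = bag.get(token, 0) + 1
    return bag


def get_tf(tokens):
    # Hypothetical log-scaled TF that includes the special case from section 4.
    tf = defaultdict(int)
    for word, freq in bag_of_words(tokens).items():
        if freq == 0:
            tf[word] = 0  # unreachable: freq comes from the bag, so freq >= 1
        else:
            tf[word] = 1 + math.log(freq)
    return tf


tokens = "the cat sat on the mat".split()
print(min(bag_of_words(tokens).values()))  # 1
print(dict(get_tf(tokens)))
```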
What actually produces the desired result, including 0 for words that do not appear in a document, is the get_vector function. When we iterate over the IDF words of the entire corpus, we eventually reach words that are missing from the TF dictionary. So we get 0 simply because we look up a word that does not exist, not because of the condition in get_tf. I should also mention that constructing the defaultdict with the int factory (in get_tf) is important. Otherwise the dictionary will not return 0 for missing keys; it will raise a KeyError instead.
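The following sketch shows where the zero really comes from. It assumes, hypothetically, that get_vector takes a TF mapping for one document and an IDF mapping over the whole corpus and multiplies them word by word; the IDF values are made up. Looking up a corpus word that the document lacks returns 0 from a defaultdict(int), while the same lookup on a plain dict raises KeyError.

```python
from collections import defaultdict


def get_vector(tf, idf):
    # Hypothetical: one component per corpus word, in IDF order.
    # For words absent from the document, tf[word] hits a missing key.
    return [tf[word] * idf[word] for word in idf]


idf = {"cat": 0.5, "dog": 1.2, "mat": 0.8}  # hypothetical corpus IDF values

# defaultdict(int): the missing "dog" yields 0 (and is inserted as a side effect).
tf_default = defaultdict(int, {"cat": 1.0, "mat": 1.0})
print(get_vector(tf_default, idf))  # [0.5, 0.0, 0.8]

# Plain dict: the same lookup fails.
tf_plain = {"cat": 1.0, "mat": 1.0}
try:
    get_vector(tf_plain, idf)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'dog'
```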
I propose fixing the solution code and slightly revising the description in section 4 of the workbook, specifically the part about handling the special case freq == 0.
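One possible shape for such a fix, sketched under the same hypothetical assumptions (log-scaled TF, made-up IDF values, illustrative function bodies): get_tf only computes values for words it has seen, and get_vector handles absent words explicitly with dict.get. That way the freq == 0 case lives where it actually occurs and no longer depends on the defaultdict factory.

```python
import math


def get_tf(bag):
    # Every count in the bag is >= 1, so no zero check is needed here.
    return {word: 1 + math.log(freq) for word, freq in bag.items()}


def get_vector(tf, idf):
    # The freq == 0 case: a corpus word missing from the document gets TF 0.
    return [tf.get(word, 0.0) * idf[word] for word in idf]


bag = {"cat": 1, "mat": 2}  # hypothetical bag of words for one document
idf = {"cat": 0.5, "dog": 1.2, "mat": 0.8}  # hypothetical corpus IDF values
print(get_vector(get_tf(bag), idf))
```

Using dict.get also avoids the side effect of a defaultdict silently inserting every looked-up corpus word into the TF dictionary.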
Thanks!