This repository has been archived by the owner on Jun 27, 2022. It is now read-only.

Term 2: Lesson 6 part 6: computing _tf function does not need to check if freq == 0 #91

Open
jevgenitolstouhhov opened this issue Jan 25, 2019 · 0 comments

In section 4 of the workbook, I noticed that the formula requires checking whether the frequency is zero, to cover the case where a word does not appear in a document.

This condition never actually evaluates to true: in get_tf we take the frequency from the bag_of_words function, and every word in the bag has a frequency of at least 1.
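A minimal sketch of why the count can never be zero. The `bag_of_words` below is a hypothetical stand-in built on `collections.Counter`, not the workbook's actual code; the point only relies on the bag being built by counting tokens that were seen.

```python
from collections import Counter


def bag_of_words(tokens):
    # Hypothetical stand-in for the workbook's bag_of_words:
    # map each token that occurs in the document to its count.
    return Counter(tokens)


tokens = "the cat sat on the mat".split()
bag = bag_of_words(tokens)

# A word only becomes a key after it has been seen once,
# so the smallest possible stored count is 1.
min_count = min(bag.values())
zero_count_words = [word for word, count in bag.items() if count == 0]

print(min_count)         # smallest frequency stored in the bag
print(zero_count_words)  # words stored with a count of 0
```

Since get_tf only iterates over the bag's entries, a `freq == 0` branch in _tf has no input that could reach it.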

What actually produces the desired 0 for words that do not appear in a document is the get_vector function. When we iterate over the IDF words from the entire corpus, we eventually look up words that are missing from the TF dictionary. So we get 0 simply because we look up a word that does not exist, not because of the condition in _tf. I should also mention that constructing the defaultdict with the "int" factory (in get_tf) is important. Otherwise the dictionary would not return 0 for missing keys, but would raise an exception.
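The behavior can be sketched as follows. This is a reconstruction under assumptions, not the workbook's solution code: the function names come from this issue, but their bodies, the log-scaled form of `_tf`, and the sample `idf` values are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def _tf(freq):
    # Assumed log-scaled TF; the workbook's formula may differ.
    # The "else 0" branch is the one that is never reached.
    return 1 + math.log(freq) if freq > 0 else 0


def get_tf(tokens):
    # defaultdict(int) returns 0 for any key that was never stored.
    tf = defaultdict(int)
    for word, freq in Counter(tokens).items():
        tf[word] = _tf(freq)  # freq is always >= 1 here
    return tf


def get_vector(tf, idf):
    # Iterate over the corpus-wide vocabulary, not the document's words.
    return [tf[word] * idf[word] for word in sorted(idf)]


idf = {"cat": 1.5, "dog": 2.0, "the": 0.5}  # made-up IDF values
tf = get_tf("the cat the".split())
vector = get_vector(tf, idf)  # order: cat, dog, the

# "dog" is absent from the document, so its entry comes from the
# defaultdict's missing-key default rather than from _tf.
dog_weight = vector[1]

# With a plain dict, the same lookup raises KeyError instead.
plain_tf = dict(get_tf("the cat the".split()))
try:
    get_vector(plain_tf, idf)
    plain_dict_failed = False
except KeyError:
    plain_dict_failed = True
```

Note that indexing a defaultdict with a missing key also stores that key with the default value, so `tf` grows as get_vector runs; using `tf.get(word, 0)` on a plain dict would give the same zeros without that side effect.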

I propose fixing the solution code and also slightly adjusting the description in section 4 of the workbook where it covers handling the special case freq == 0.

Thanks!
