Hi, I recently discovered this excellent repository for learning basic concepts in ML, and I noticed a potential problem in the implementation of the dropout wrapper. In particular, this is the line of code I am confused about:

`dLdy *= 1.0 / (1.0 - self.p)`

Shouldn't the gradient from the later layer also have the mask from `forward()` applied? Otherwise, `dLdy` will overestimate the true gradient after dividing by the probability `1.0 - self.p`. I'm not sure whether this is actually an issue, though, as I am a beginner in ML.
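To illustrate what I mean, here is a minimal sketch of inverted dropout (not the repository's actual code; the class and attribute names `Dropout`, `self.p`, and `self.mask` are just assumptions for illustration). The point is that `backward()` reuses the same binary mask saved in `forward()`, so dropped units get zero gradient rather than only the `1 / (1 - p)` scaling:

```python
import numpy as np

class Dropout:
    """Minimal inverted-dropout sketch (illustrative only).

    `p` is the probability of dropping a unit. During training, surviving
    activations are scaled by 1 / (1 - p) so their expected value matches
    inference, where dropout is a no-op.
    """

    def __init__(self, p=0.5):
        self.p = p
        self.mask = None  # saved in forward() for reuse in backward()

    def forward(self, X):
        scale = 1.0 / (1.0 - self.p)
        # Binary mask: 1 where the unit survives, 0 where it is dropped.
        self.mask = (np.random.rand(*X.shape) >= self.p).astype(X.dtype)
        return X * self.mask * scale

    def backward(self, dLdy):
        scale = 1.0 / (1.0 - self.p)
        # Pass the gradient through the *same* mask used in forward():
        # dropped units contributed nothing to the loss, so their gradient
        # should be zero. Scaling alone (dLdy * scale) leaves those
        # entries non-zero and overestimates the gradient.
        return dLdy * self.mask * scale
```

If my understanding is right, the quoted line applies only the scale factor, which is the first term of `dLdy * self.mask * scale` without the mask.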