Conversation
chewxy
left a comment
left you some things to think about
model/glove/embedding.go
Outdated
P []tensor.Tensor
Q []tensor.Tensor

gradP []tensor.Tensor
To think about: is it necessary to keep a copy of the tensors of the gradients?
Yes. GloVe uses AdaGrad as the optimizer instead of SGD. AdaGrad also uses all past iterations' gradients to decrease the learning rate automatically.
Here: https://en.wikipedia.org/wiki/Stochastic_gradient_descent#AdaGrad
Yeah, I'm familiar with AdaGrad. I'm thinking maybe this shouldn't be put into the Embedding struct. Instead, have something like a Solver struct.
Ah I see, that's right. I'll try it.
(It would also be good to be able to select other optimizers: AdaDelta, Adam, and so on.)
model/glove/glove.go
Outdated
func (g *GloVe) train(pind, qind int, f float64) (err error) {
	// SGD
	inner, _ := tensor.Inner(g.emb.P[pind], g.emb.Q[qind])
Bad habit! You should always check for errors.
Perhaps create a utility struct somewhere that looks like this:
type maybe struct {
	err error
}

func (m *maybe) Do(fn func(a, b tensor.Tensor) (tensor.Tensor, error), a, b tensor.Tensor) (retVal tensor.Tensor) {
	if m.err != nil {
		return nil
	}
	retVal, m.err = fn(a, b)
	return
}

func (m *maybe) Error() error { return m.err }

then you can do this:

m := new(maybe)
inner := m.Do(tensor.Inner, g.emb.P[pind], g.emb.Q[qind])
bias := m.Do(tensor.Add, g.emb.biasP[pind], g.emb.biasQ[qind])
// and so on and so forth
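The same error-threading pattern can be shown self-contained, using a plain float64 operation instead of tensor ops so it runs on its own (div here is a stand-in for any fallible binary op):

```go
package main

import (
	"errors"
	"fmt"
)

// maybe records the first error from a chain of fallible binary ops,
// so call sites stay free of repeated `if err != nil` checks.
type maybe struct{ err error }

func (m *maybe) do(fn func(a, b float64) (float64, error), a, b float64) float64 {
	if m.err != nil {
		return 0 // short-circuit: once an error is recorded, do nothing
	}
	var ret float64
	ret, m.err = fn(a, b)
	return ret
}

func (m *maybe) Error() error { return m.err }

// div is a toy fallible binary op.
func div(a, b float64) (float64, error) {
	if b == 0 {
		return 0, errors.New("division by zero")
	}
	return a / b, nil
}

func main() {
	m := new(maybe)
	x := m.do(div, 6, 3) // 2
	y := m.do(div, x, 0) // error recorded here
	z := m.do(div, y, 2) // skipped: error already set
	fmt.Println(z, m.Error())
}
```

Only one error check is needed at the end of the chain, via m.Error().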
model/type.go
Outdated
}

// OnesTensor creates a tensor with 1 in all elements.
func (t *Type) OnesTensor(shape ...int) tensor.Tensor {
should be called Ones... it already returns a Tensor, so no need to repeat
Force-pushed: be9fc29 to ad90f44, abae4b5 to fd4d134, 38e3416 to fb6a8a8, 66d14b4 to f6192c6
cost = 0.5 * fdiff * diff
fdiff *= a.initLearningRate

for i := 0; i < a.dimension; i++ {
Has any work been done to compare this with an FMA function?
I did a small experiment against about 30 million records:
with FMA in tensor: 1 min per iteration
without FMA (this): 30 sec per iteration
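For reference, the two variants being compared look roughly like this. math.FMA is the standard-library fused multiply-add (Go 1.14+); the timings above are the author's measurements against the tensor package's FMA, not reproduced by this sketch, and the function names are illustrative:

```go
package main

import (
	"fmt"
	"math"
)

// dotPlain is the hand-rolled multiply-add loop, as in the PR.
func dotPlain(p, q []float64) float64 {
	var sum float64
	for i := range p {
		sum += p[i] * q[i]
	}
	return sum
}

// dotFMA uses math.FMA, which computes p[i]*q[i]+sum with a single
// rounding; whether it compiles to a hardware FMA is platform-dependent.
func dotFMA(p, q []float64) float64 {
	var sum float64
	for i := range p {
		sum = math.FMA(p[i], q[i], sum)
	}
	return sum
}

func main() {
	p := []float64{1, 2, 3}
	q := []float64{4, 5, 6}
	fmt.Println(dotPlain(p, q), dotFMA(p, q)) // 32 32
}
```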
model/glove/cofreq.go
Outdated
// CofreqMap stores the co-frequency between word-word.
type CofreqMap map[Pair]float64

// Pair stores the co-frequency pair words.
To make this even faster, check this out: https://blog.chewxy.com/2017/07/12/21-bits-english/. I'll help you convert when I find the time.
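One way to apply the idea from that post: pack both word IDs into a single integer map key instead of hashing a struct Pair. This is only a sketch; the 32/32 bit split below is arbitrary (per the post, 21 bits per ID is enough for English vocabularies), and the helper names are hypothetical:

```go
package main

import "fmt"

// packPair packs two word IDs into one uint64 map key,
// replacing a map[Pair]float64 with a map[uint64]float64.
func packPair(l1, l2 uint32) uint64 {
	return uint64(l1)<<32 | uint64(l2)
}

// unpackPair recovers the two word IDs from a packed key.
func unpackPair(k uint64) (uint32, uint32) {
	return uint32(k >> 32), uint32(k)
}

func main() {
	cofreq := map[uint64]float64{}
	cofreq[packPair(7, 42)] += 1.5
	l1, l2 := unpackPair(packPair(7, 42))
	fmt.Println(l1, l2, cofreq[packPair(7, 42)]) // 7 42 1.5
}
```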
model/glove/glove.go
Outdated
)

// GloVe stores the configs for GloVe models.
type GloVe struct {
While GloVe is the proper name (Global Vectors), I think it's quite terrible to have random uppercase letters in the middle of a name that is not a word. Perhaps stick with Glove? Just an opinion.
No problem, I'll rename it. By the way, why is it terrible to have uppercase letters in the middle?
Overview
Solver is prepared: SGD and AdaGrad.
Future Works