DIGITS Tutorial based on this project #12

Closed

gheinrich opened this issue Apr 15, 2016 · 8 comments

@gheinrich
Hi, just to let watchers know that I have added a Tutorial on DIGITS for text classification using the model from this project.

See the write-up

I am using cuDNN and an optimized data loader, and training is an order of magnitude faster than the reference implementation. On my system it takes ~15 min to train one epoch of 498,400 training samples and 56,000 validation samples, with 4 validation sweeps per epoch.

@gheinrich
Author

This isn't an issue, so I am closing it now.

@aichemzee

Do you have a public AWS AMI with DIGITS installed?

@gheinrich
Author

Sorry, I don't have one. We have a Docker image for DIGITS, which makes installation very straightforward.

However, you need a more recent version of DIGITS than the one available in the Docker image to run the Text Classification tutorial. @flx42 is it conceivable to publish a Dockerfile that lets users build images from the latest DIGITS code on GitHub?
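Something along these lines is what I have in mind; this is an untested sketch, and the base image tag and install steps are my assumptions, not an official recipe:

    # Hypothetical sketch: build a DIGITS image from the latest GitHub source.
    # The base image tag and dependency steps are assumptions, not an official recipe.
    FROM nvidia/cuda:7.5-cudnn5-devel
    RUN apt-get update && apt-get install -y git python-pip python-dev
    RUN git clone https://github.com/NVIDIA/DIGITS.git /opt/digits
    RUN pip install -r /opt/digits/requirements.txt
    EXPOSE 5000
    # digits-devserver is the development server script shipped at the repo root
    CMD ["/opt/digits/digits-devserver"]

A real image would also need Caffe (and Torch for this tutorial) built inside the container, which is the bulk of the work.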

@darwinzer0

Thank you for implementing this, it works much faster. Because the input is 1024 characters instead of the 1014 in the original version, should the sizes of the layers be slightly different? For example, 341 x 256 after the first TemporalMaxPooling, and so forth.

@gheinrich
Author

gheinrich commented May 11, 2016

Oh yes, I should have updated this comment. Or perhaps it should be removed, since the idea is to make the number of input characters a parameter.

With feature_len=1024, don't we end up with (1024-6-3)/3+1=339 features after the first max-pooling operation? So the successive shapes would be:

    -- those shapes are assuming feature_len==1024
    -- 1024 x alphabet_len
    net:add(backend.TemporalConvolution(alphabet_len, 256, 7))
    -- [1024-6=1018] x 256
    net:add(nn.Threshold())
    net:add(nn.TemporalMaxPooling(3, 3))
    -- [(1018-3)/3+1=339] x 256
    net:add(backend.TemporalConvolution(256, 256, 7))
    -- [339-6=333] x 256
    net:add(nn.Threshold())
    net:add(nn.TemporalMaxPooling(3, 3))
    -- [(333-3)/3+1=111] x 256
    net:add(backend.TemporalConvolution(256, 256, 3))
    net:add(nn.Threshold())
    -- [111-2=109] x 256
    net:add(backend.TemporalConvolution(256, 256, 3))
    net:add(nn.Threshold())
    -- [109-2=107] x 256
    net:add(backend.TemporalConvolution(256, 256, 3))
    net:add(nn.Threshold())
    -- [107-2=105] x 256
    net:add(backend.TemporalConvolution(256, 256, 3))
    -- [105-2=103] x 256
    net:add(nn.Threshold())
    net:add(nn.TemporalMaxPooling(3, 3))
    -- [(103-3)/3+1=34] x 256
    net:add(nn.Reshape(8704))

We still end up with 8704 (34 x 256) features at the input of the fully-connected layers.
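For anyone who wants to double-check, the same arithmetic can be reproduced standalone with a couple of throwaway helpers (conv_out/pool_out are just illustrative names, not part of the model code):

    -- Sketch of the shape arithmetic above, assuming feature_len==1024.
    local function conv_out(len, kw)       -- TemporalConvolution, stride 1
        return len - kw + 1
    end
    local function pool_out(len, kw, dw)   -- TemporalMaxPooling
        return math.floor((len - kw) / dw) + 1
    end

    local len = 1024
    len = pool_out(conv_out(len, 7), 3, 3)    -- (1018-3)/3+1 = 339
    len = pool_out(conv_out(len, 7), 3, 3)    -- (333-3)/3+1  = 111
    for _ = 1, 4 do len = conv_out(len, 3) end -- 109, 107, 105, 103
    len = pool_out(len, 3, 3)                 -- (103-3)/3+1  = 34
    print(len * 256)                          -- 8704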

@darwinzer0

Ah yes, I had forgotten the -6 from the first convolution. Thanks, this helps a lot. I was making some changes to the network and wanted to make sure I was calculating everything correctly.

@zhangxiangxiao
Owner

This is wonderful! I see the pull request is in DIGITS already. If you do not mind, I will probably brag about it on Facebook a bit :P

Thanks for the great contribution!

@gheinrich
Author

> If you do not mind, I will probably brag about it on Facebook a bit :P

You are most welcome to do so :-)
