
Refactor layer to handle minibatch at once #181

Merged

Conversation

reunanen (Contributor)

For Batch Normalization purposes, refactor layer to handle a minibatch at once.

I'm not saying I completely knew what I was doing, so please be critical!

? std::vector<tensor_t>(&t_cost[start_index], &t_cost[end_index])
: std::vector<tensor_t>();

bprop<E>(fprop(in_batch, i), t_batch, i, t_cost_batch);
Contributor Author:

This bprops only those samples from the minibatch that have been assigned to this thread. I guess this will need to be changed when implementing Batch Normalization, so that the whole minibatch goes to the layer as a single object. The parallelism can then be handled inside each layer. This may be a good thing, because the optimal parallelism strategy may be different for each layer type (at least in theory).
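The per-thread split described above can be sketched roughly as follows. This is illustrative only (the function name and the contiguous-slice scheme are assumptions, not tiny-dnn's actual API): a minibatch is divided into contiguous `[begin, end)` slices, one per worker thread, and each worker then runs fprop/bprop on its own slice.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split `batch_size` samples into contiguous
// [begin, end) slices, one per worker thread (last slice may be shorter).
std::vector<std::pair<std::size_t, std::size_t>>
split_minibatch(std::size_t batch_size, std::size_t n_threads) {
    std::vector<std::pair<std::size_t, std::size_t>> slices;
    // Ceiling division so every sample is covered.
    std::size_t per_thread = (batch_size + n_threads - 1) / n_threads;
    for (std::size_t begin = 0; begin < batch_size; begin += per_thread) {
        slices.emplace_back(begin, std::min(begin + per_thread, batch_size));
    }
    return slices;
}
```

Each worker would then call something like `bprop(fprop(slice), ...)` on its own slice, which is the pattern the snippet above shows.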

@bhack (Contributor)

bhack commented Jun 14, 2016

/cc @edgarriba

…n anonymous namespace in order to hopefully make AppVeyor happy
@edgarriba (Member)

@reunanen it looks nice! Be aware that soon there will be some changes in the layers API.
Check this PR for device abstraction: #166

{(in[0]&&in[1])*1.0, (in[0]||in[1])*1.0}, // 1st output: train and/or function
{(in[0]^in[1])*1.0, (in[0]==in[1])*1.0},  // 2nd output: train xor/eq function
{-1, 0.5} // 3rd output: train constants (purpose: make output channel count different from output dimension)
Member:

@reunanen please cast the results to float_t, as done here, in order to keep the code consistent

Contributor Author:

Ok. But I actually reverted this change already. I didn't really mean to push it in the first place – I just added it to more easily track how the different dimensions are supposed to work.

@edgarriba (Member)

@reunanen @nyanp this PR targets master. We should define a quick protocol for development branches.

@reunanen (Contributor, Author)

Damn, I only tested with OpenMP enabled. Apparently the tests fail without it. Will try to fix.

@bhack (Contributor)

bhack commented Jun 14, 2016

@nyanp Yes, we need to find a policy for branches and pull requests in this period of heavy refactoring.

@edgarriba (Member)

My PR can wait until we merge this minibatch approach. Otherwise, it will be a pain to adapt it.

@nyanp (Member)

nyanp commented Jun 15, 2016

@reunanen
Great step! Did you run any performance profiling? I'm interested in how this PR affects the speed of the forward/backward pass.

@edgarriba
Oops, I accidentally merged your first PR. As far as I can see, it is not too hard to merge this PR into the current feat/generic-computational-graph branch, so I'll do it.

@edgarriba (Member)

no worries

@nyanp @reunanen @bhack
Do you think that with the new API we could implement these minibatch routines inside layer::forward/backward_pass()?

@nyanp (Member)

nyanp commented Jun 15, 2016

@edgarriba
Yes, and we'll have more opportunities to improve training time with minibatch parallelism inside each layer.

@reunanen (Contributor, Author)

reunanen commented Jun 16, 2016

Unfortunately I'm still struggling with correctness. @nyanp, good question (re: performance); I compared it for the first time now. Performance seems to be poorer at the moment (by about 10% or so); I'll try to look into that as well.

So even if the tests pass, I don't think everything is correct. Probably better not to merge just yet.

virtual void set_worker_count(cnn_size_t worker_count) override {
    Base::set_worker_count(worker_count);
    dropout_layer_worker_storage_.resize(worker_count);

    for (dropout_layer_worker_specific_storage& dws : dropout_layer_worker_storage_) {
        if (dws.mask_.empty()) {
            dws.mask_.resize(1, std::vector<uint8_t>(in_size_));
Member:

Size of mask_ allocated here is [1, in_size_], but what we need is [sample_size, in_size_]. A possible solution is one of these (I think the former is better):

  • Add a new virtual function set_minibatch_count to the layer classes, and call it from network at the beginning of training
  • Check the size of the minibatch in fprop and resize mask_ on demand – this requires an exclusive lock while resizing
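A minimal sketch of the second option (resize on demand in fprop). The struct and helper names here are made up for illustration, not tiny-dnn's actual code; it assumes each worker owns its own storage, in which case no lock is needed for the resize.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-worker dropout storage: mask_ is lazily grown to
// [sample_count][in_size] on the first fprop that needs that many samples.
struct dropout_worker_storage {
    std::vector<std::vector<std::uint8_t>> mask_;

    void ensure_mask(std::size_t sample_count, std::size_t in_size) {
        // Each worker owns its own storage, so no locking is required here;
        // grow only, never shrink, to avoid reallocating every batch.
        if (mask_.size() < sample_count) {
            mask_.resize(sample_count, std::vector<std::uint8_t>(in_size));
        }
    }
};
```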

Contributor Author:

@nyanp: Correct. For other layer types I took the latter approach, though (see e.g. here – although I think this may be one of the places that are not correct yet, but let's not focus on that right now).

Not sure why it requires a lock, though, as each vector (to be resized) is still worker-specific.

Member:

@reunanen
Ah, I misunderstood – you are correct. It doesn't require a lock while resizing.

Contributor Author:

Fixed by resizing in fprop.

@nyanp (Member)

nyanp commented Jun 16, 2016

@reunanen

Seems to be poorer at the moment (about 10% or so); will try to look into that as well.

Ok. I think a performance loss of around 10% is acceptable for the master branch :) but it would be great if you can improve it further!

So even if the tests pass, I don't think everything is correct. Probably better not merge just yet.

Current tests do almost nothing with minibatches, so it would be great if you could add or extend a minibatch version of the gradient_check method.
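A minibatch gradient check could be sketched like this (illustrative only – the function name and signature are assumptions, not tiny-dnn's gradient_check API): compare the analytic gradient of the loss summed over a batch against a central-difference estimate, perturbing one weight at a time.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical check: `batch_loss` returns the loss summed over the whole
// minibatch for weights w; `analytic_grad` returns d(batch_loss)/dw.
// Returns true if the central-difference estimate agrees within `tol`.
bool gradient_check_minibatch(
    const std::function<double(const std::vector<double>&)>& batch_loss,
    const std::function<std::vector<double>(const std::vector<double>&)>& analytic_grad,
    std::vector<double> w, double eps = 1e-5, double tol = 1e-6) {
    std::vector<double> g = analytic_grad(w);
    for (std::size_t i = 0; i < w.size(); ++i) {
        double w0 = w[i];
        w[i] = w0 + eps; double lp = batch_loss(w);  // loss at w_i + eps
        w[i] = w0 - eps; double lm = batch_loss(w);  // loss at w_i - eps
        w[i] = w0;                                   // restore
        double numeric = (lp - lm) / (2.0 * eps);    // central difference
        if (std::fabs(numeric - g[i]) > tol) return false;
    }
    return true;
}
```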

@edgarriba (Member)

Besides, if we parallelize the for loops over mini-batches, I think it will be a huge boost.

@bhack (Contributor)

bhack commented Jun 16, 2016

For convolution there are different batched GEMM approaches. You can do a Google search on this topic. /cc @naibaf7
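As a toy illustration of what "batched" means here (plain loops, nothing like an optimized implementation; all names are made up): each sample's input matrix is multiplied by a shared weight matrix, and real batched-GEMM implementations fuse or parallelize this outer loop over the batch dimension.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using matrix = std::vector<std::vector<double>>;

// Naive batched matrix multiply: one MxK matrix per sample, multiplied by
// a shared KxN matrix. Optimized variants batch this into a single GEMM.
std::vector<matrix> batched_gemm(const std::vector<matrix>& a_batch,
                                 const matrix& b) {
    std::vector<matrix> c_batch;
    for (const matrix& a : a_batch) {
        matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
        for (std::size_t i = 0; i < a.size(); ++i)
            for (std::size_t k = 0; k < b.size(); ++k)
                for (std::size_t j = 0; j < b[0].size(); ++j)
                    c[i][j] += a[i][k] * b[k][j];
        c_batch.push_back(std::move(c));
    }
    return c_batch;
}
```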

@reunanen (Contributor, Author)

Regarding the discussion on performance – after all, the point is to be able to do Batch Normalization, which should let us learn more in a single epoch. But maybe this shouldn't be merged to master before BN actually exists?

@nyanp (Member)

nyanp commented Jun 18, 2016

@reunanen
As far as I know, all other major DNN libraries (TensorFlow, Caffe, MXNet, Keras) handle minibatches the way this PR does – a good sign for us. Since we seem to be heading in the right direction regardless of BN, I can merge this PR without a BN implementation.

Could you add a gradient-check test for mini-batch?

@reunanen (Contributor, Author)

@nyanp : Ok, I'll try to add a gradient check for minibatch.

Also, I think the code here should be changed to run n_threads minibatches in parallel – one on each thread. Any comments or objections?

@nyanp (Member)

nyanp commented Jun 19, 2016

@reunanen
Thanks!

I think the code here should be changed to run n_threads minibatches in parallel – one on each thread. Any comments or objections?

I agree. We don't need this parallelism any more. Would you change it?

@reunanen (Contributor, Author)

@nyanp : Sure, I'll have a look at it.

nyanp mentioned this pull request Jun 20, 2016
@reunanen (Contributor, Author)

@nyanp
In fact, running the minibatches in parallel would change the learning results. Why? In the current version, the weights are updated after each minibatch, so the results from minibatch 1 are already in use when processing minibatch 2. Now that I think of it, changing this behavior is a major modification that may well break existing applications. So after all, I think it should be introduced in another PR, or perhaps not at all.

@nyanp (Member)

nyanp commented Jun 21, 2016

@reunanen
Thank you very much for pointing out the error!

In the current version, the weights are updated after each minibatch, so results from minibatch 1 are already in use when processing minibatch 2.

Hmm, all weights are updated outside the parallel-for loop, so weights should be unchanged within a minibatch. Could you tell me where you found the data dependency?

Anyway, I agree that this issue should be separated from this PR, so I'll merge it soon.

@reunanen (Contributor, Author)

@nyanp
Yes, I meant that the current implementation is correct, while running multiple minibatches in parallel (as I suggested) might not be such a great idea after all.

@nyanp (Member)

nyanp commented Jun 21, 2016

@reunanen
Ah, I misunderstood. OK, I'll merge this PR as is :)

nyanp merged commit 588fc22 into tiny-dnn:master Jun 21, 2016
reunanen deleted the refactor-layer-to-handle-minibatch-at-once branch June 22, 2016 04:27
reunanen mentioned this pull request Jun 22, 2016
nyanp mentioned this pull request Jul 10, 2016
nyanp added a commit that referenced this pull request Jul 21, 2016
* v0.0.1 -> v0.1.0

- Now we can handle non-sequential model as ```network<graph>``` #108
#153
- Catch up the latest format of caffe's proto #162
- Improve the default behaviour of re-init weight #136
- Add more tests and documents #73
- Remove dependency of OpenCV in MNIST example

* Update layer.h

* Added test case in separate cpp file to force link errors if there are duplicate symbols when including tiny_cnn.h in multiple .cpp files.

* Fix linker error due to duplicate symbols.

* Update README.md

- Update layer catalogue
- Upgrade example codes to v0.1.0

* Removed a repeated include.

* Bug Fix: network::test was broken in multi-thread. closes #185

* fix test_with_stb_image.cpp and typos (tiny_cnn_hrds => tiny_cnn_hdrs) in CMakeLists.txt (#189)

* fix a compile error and warnings when the type float_t is a typedef of float (#191)

* fix README.md (#194)

* Refactor layer to handle minibatch at once (#181)

* get rid of unnecessary compiler warnings (C4018: '>=': signed/unsigned mismatch)

* refactor layer to handle a minibatch at once (previously individual input samples only)

* move new functions apply_cost_if_defined and add_sample_gradient to an anonymous namespace in order to hopefully make AppVeyor happy

* revert the changes to test_network.h

* minor type fixes to get rid of compiler warnings

* make quick changes to deconvolutional_layer so that the examples at least compile

* fix backward_activation to use all samples

* remove unused variables

* update picotest to fix operator precedence

* fix input data ordering

* fix overriding prev_delta

* change gradients to be per-sample as well

* remove unused variables

* fix indexing in convolutional_layer

* also in dropout_layer, have a different mask vector for each sample of a minibatch

* deconvolution_layer: fix allocating space for prev_out_ and cur_out_padded_

* add gradient check for minibatch

* minor: change types to fix some compiler warnings

* Add application links to doc, #158

* Fixed typo (#205)

* Add a contribution document

* fixed typo (#216)

* Add batch normalization #147

* Add batch normalization prototype & remove batch-level parallelism #147

* add backward pass & formatting

* Add unit test for forward pass

* Add numerical check for batchnorm

* Fix convolutional::bprop for pointing to correct storage

* Fix bprop in batchnorm

* Change an order of arguments in ctor

to keep consistency to other layers

* add batch normalization layer

* fix compiler error on deconvolutional-layer

* Implement caffe importer

* Revert changes around calc_delta

* Fix backprop of bias factor in conv layer

* Fix compilation error in MSVC2013, close #218 #231

* Add slice layer (#233)

* Bug Fix #234

* Add BSD-3 license file #228

* Fix handling non-square input data in caffemodel #227

* Add power layer #227

* Generalization of loss functions + Correcting MSE (#232)

* generalization of loss function to vectors (solves wrong MSE)

* removed unnecessary function due to loss function generalization

* loss function df operating on vec_t

* correct df of mse

* missing brackets

* fix compile errors in conv-layer

* remove sample_count
edgarriba pushed a commit to edgarriba/tiny-cnn that referenced this pull request Aug 8, 2016