
Move to an organization & renaming tiny-cnn #235

Closed · nyanp opened this issue Jul 17, 2016 · 36 comments
@nyanp (Member) commented Jul 17, 2016

We've decided to move tiny-cnn to an organization account to accelerate its development (discussed in #226).

Since it is clear that we are expanding the scope of tiny-cnn from convolutional networks to general networks, the project name tiny-cnn seems a bit inaccurate now. I want to change the project name to a more appropriate one (if we agree) when we transfer the repository.

In #226 we came up with these three proposals:

  • tiny-dnn (convolutional net -> deep net)
  • hornet (a loose acronym of Header-Only Network)
  • tiny-cnn (of course we can keep its name)

Whichever we choose, the project name doesn't affect the library API except for its namespace, and hyperlinks, forks, and pull requests will be correctly redirected to the new repository.

Please feel free to give me your feedback if you have suggestions for the naming! We want to decide on the name and move to the new account by around 7/25.

@H4kor (Contributor) commented Jul 17, 2016

Sounds good.

I like the name hornet, but a tiny-* name might be better in terms of visibility/saying what it is.

@edgarriba (Member)

For consistency I think tiny-dnn is okay. However, hornet could reflect a rebirth of the library.

@bhack (Contributor) commented Jul 17, 2016

/cc @mtamburrano

@bhack (Contributor) commented Jul 17, 2016

I'll start by pushing some names: .hdnn, lightdnn, lightnet, tinygraph, tinyflow, hornet

@patrikhuber (Contributor)

I would vote for tiny-dnn. "tiny-cnn" is already quite known, and with "tiny-dnn" the familiarity with the name is not lost. It's a nice name.

@edgarriba (Member)

So people, what's the decision?

@nyanp (Member, Author) commented Jul 25, 2016

tiny-dnn seems good for everyone. It's a nice name to me, and it has no namespace problem. Let's std::move to tiny-dnn :)

@edgarriba (Member)

nice!

@bhack green light for the logo

@garybradski

How about an actually tiny name: dnn?

@nyanp (Member, Author) commented Jul 27, 2016

@garybradski
Thanks, but I think it's too short for the library name. I'd vote for tiny-dnn.

@garybradski

OK, tiny-dnn. I had some questions about this module. We have some other deep net modules:
https://github.com/opencv/opencv_contrib/tree/master/modules

  • dnn (not a very helpful README.md -- a module to run Caffe deep nets)
  • dnns_easily_fooled (separate; good README ...)
  • cnn_3dobj (good README ... but wouldn't it be better as an example of tiny-dnn?)

Anyhow, I'm concerned that a new user won't know where to start. Maybe the READMEs have to be cross-linked and, for my intent, most of the real work should go into tiny-dnn, which is the most complete now. I'm also concerned about exactly how tiny-dnn will work. Will it depend on tiny-cnn, or be a completely standalone module? It will be much cleaner if it is completely standalone. Is it OK that tiny-cnn effectively becomes tiny-dnn? Of course, full credit can be given both in the module and on the contributors page:
http://code.opencv.org/projects/opencv/wiki/Contributors?version=3

@nyanp (Member, Author) commented Jul 28, 2016

@garybradski

Anyhow, I'm concerned that a new user won't know where to start.

I'm worried about it, too. I think the current dnn module (which lacks too many building blocks to be usable) should be replaced with a new dnn based on tiny-dnn. Reorganizing cnn_3dobj as an example of tiny-dnn seems good. /cc @wangyida: any idea?

I'm also concerned about exactly how tiny-dnn will work

We'll just rename tiny-cnn to tiny-dnn; they're one and the same. Basically tiny-dnn is standalone, and we only need extra libraries to enable optional functionality (LibDNN for GPU, NNPACK for x86 speedup, and Protocol Buffers for importing Caffe models).

Of course, full credit can be given both in the module and on the contributors page:

I'm happy to hear that :) Of course you might want to add two great GSoC students bridging OpenCV and tiny-dnn to this list.

@bhack (Contributor) commented Jul 28, 2016

/cc @ludv1x

@wangyida (Contributor)

@garybradski @nyanp cnn_3dobj is an interesting example of pose estimation together with category prediction; it could be included as an example rather than a standalone module. But the rendering process depends on the VTK toolkit.

@bhack (Contributor) commented Jul 28, 2016

This is the commit log for the other GSoC project https://github.com/ludv1x/opencv_contrib/commits/dnn-dev.
There are also opencv/opencv_contrib#738 and opencv/opencv_contrib#724

@garybradski

As stated in email, and beyond this scope, but:

  • Running models produced by Caffe, as done in dnn, would now be better focused on running them on tiny-dnn
  • cnn_3dobj would be best ported to tiny-dnn
  • dnns_easily_fooled is probably OK as is

@edgarriba (Member) commented Jul 30, 2016

@ludv1x I've seen you are contributing OpenCL kernels to the dnn module.
Could we somehow collaborate to have the same in tiny-dnn?
/cc @garybradski @bhack

@garybradski The current dnn module seems to have an extended core, so we'll have some parallelism with the new "tiny" module. What's the plan?

@bhack (Contributor) commented Jul 30, 2016

@edgarriba @ludv1x is mentored by @Nerei. /cc @vpisarev

@wangyida (Contributor)

@garybradski The cnn_3dobj module could be included in tiny-dnn because its core function is a triplet loss, which Caffe already provides. I enclosed a trained model of less than 5 MB in the cnn_3dobj module; it could be used directly with the caffe_converter in tiny-cnn.
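
Roughly, that import path would look like the sketch below. This is a minimal sketch only: the entry-point names are recalled from the caffe_converter example and may differ by version, and the file names are placeholders, so treat all of them as assumptions.

```cpp
// Minimal sketch: importing a Caffe-trained model (e.g. the <5 MB
// cnn_3dobj model) into tiny-cnn. Requires building with Protocol Buffers.
// The two entry points below are recalled from the caffe_converter
// example and may differ between versions; file names are placeholders.
#include "tiny_cnn/tiny_cnn.h"
#include "tiny_cnn/io/caffe/layer_factory.h"

int main() {
    // Build the network topology from the deploy .prototxt...
    auto net = tiny_cnn::create_net_from_caffe_prototxt("cnn_3dobj.prototxt");

    // ...then load the trained weights from the .caffemodel binary.
    tiny_cnn::reload_weight_from_caffe_protobinary("cnn_3dobj.caffemodel",
                                                   net.get());
    return 0;
}
```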

@ludv1x commented Jul 30, 2016

@edgarriba The contributed kernels are based on the ones from the OpenCL-Caffe repo (which is distributed under a BSD licence), so you can use them freely.
You can also try the kernels from the opencl branch of the main Caffe repo, which is under active development now; those kernels seem better optimized (at least for convolution).

So, I don't see serious problems with the OCL kernels.
Do I understand correctly that development should focus on adding core OpenCL functionality to tiny-cnn (i.e. implementing its own opencv2/core/ocl.hpp-style API)?

@bhack (Contributor) commented Jul 31, 2016

@ludv1x See #208

@edgarriba (Member)

@ludv1x the plan is to port kernels from the opencl branch.

Notice that for convolution we are porting LibDNN, which is the standalone version of Greentea. @naibaf7 is giving support on this task.

For general OpenCL functionality we decided to use CLCudaAPI, which is a header-only project with a common interface for OpenCL and CUDA.
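
For reference, basic CLCudaAPI usage looks roughly like this. It's a minimal sketch following CLCudaAPI's README sample; exact constructor and method signatures may differ between versions.

```cpp
// Minimal CLCudaAPI sketch: pick a device, create a context and a queue,
// and copy a host vector into a device buffer. Based on the CLCudaAPI
// README sample; signatures may differ between versions.
#include "clpp11.h"  // OpenCL back-end; use "cupp11.h" for the CUDA back-end
#include <vector>

int main() {
    // Select the first platform and device, then create a context and queue.
    auto platform = CLCudaAPI::Platform(size_t{0});
    auto device   = CLCudaAPI::Device(platform, size_t{0});
    auto context  = CLCudaAPI::Context(device);
    auto queue    = CLCudaAPI::Queue(context, device);

    // Allocate a device buffer and copy host data into it.
    auto host   = std::vector<float>(1024, 1.0f);
    auto buffer = CLCudaAPI::Buffer<float>(context, host.size());
    buffer.Write(queue, host.size(), host);  // blocking host->device copy
    queue.Finish();                          // wait until the queue is drained
    return 0;
}
```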

@zhangqianhui (Contributor)

tinydnn
I think the name needs to underline the "tiny", compared to Caffe or MXNet.

@pansk (Contributor) commented Aug 11, 2016

I got distracted for a few months, and this project exploded and even changed its name...
Well, nice job!

Meanwhile, I studied cuDNN, and now I'm catching up with this huge batch of updates (and updating my code as well)... I hope I can provide some help again soon :)

@edgarriba (Member)

@pansk welcome back to the party!
Notice that we're putting some effort into supporting GPU via OpenCL, with a refactored ops-style backend.

Check the new ops proposal for custom convolution:
https://github.com/edgarriba/tiny-cnn/blob/origin/core-framework-patch/tiny_cnn/core/kernels/conv2d_tiny_op.h

And for OpenCL (not working yet):
https://github.com/edgarriba/tiny-cnn/blob/opencl-ops-patch/tiny_cnn/core/kernels/conv2d_opencl_op.h

@pansk (Contributor) commented Aug 11, 2016

Is tensor_t still, as it seems, a std::vector<...>?

That approach requires you to move back and forth to CPU memory between two layers, and it's quite inefficient (it will probably destroy any performance gain from the GPU).
In an email thread with @nyanp we discussed having one structure that represents i/o tensors and another that represents parameters (weights, biases, convolution coefficients, ...).

The first structure was supposed to move "automatically" between the CPU and the GPU when needed (e.g., when interleaving CPU and GPU layers, which is probably still inefficient but should be allowed for prototyping new, complex layers), while the second was conceived to be fully resident on the GPU (for GPU backends) and was supposed to be downloaded/uploaded only for serialization/deserialization purposes (probably manually).

I also see an interoperability issue between CLCudaAPI and cuDNN (and most other cu* libraries): the former wants to wrap the kernel, while the others don't provide kernels but are built on top of a CUDA stream. Please note that I didn't explore CLCudaAPI thoroughly, so I might be missing some details that make this last comment useless or wrong.
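
A minimal sketch of that two-structure idea, for concreteness. All names here are hypothetical, and device_buffer stands in for whatever OpenCL/CUDA allocation wrapper ends up being used:

```cpp
// Sketch of the two-structure design discussed above. All names are
// hypothetical; "device_buffer" stands in for an OpenCL/CUDA allocation.
#include <cstddef>
#include <vector>

struct device_buffer { /* handle to a GPU allocation, omitted here */ };

// I/O tensor: moves between host and device automatically when accessed,
// so interleaving CPU and GPU layers stays possible (if inefficient).
class synced_tensor {
public:
    const std::vector<float>& host()   { sync_to_host();   return host_data_; }
    device_buffer&            device() { sync_to_device(); return dev_data_; }

private:
    void sync_to_host()   { if (!host_valid_) { /* download from GPU */ host_valid_ = true; } }
    void sync_to_device() { if (!dev_valid_)  { /* upload to GPU */     dev_valid_  = true; } }

    std::vector<float> host_data_;
    device_buffer      dev_data_;
    bool host_valid_ = true;
    bool dev_valid_  = false;
};

// Parameters: resident on the device for the whole run; host copies are
// made only for (de)serialization, and always through an explicit call.
class device_parameters {
public:
    explicit device_parameters(std::size_t size) : size_(size) { /* allocate on GPU */ }
    void upload(const std::vector<float>& weights) { /* host -> device, explicit */ }
    std::vector<float> download_for_serialization() const {
        return std::vector<float>(size_);  // device -> host, explicit
    }
private:
    device_buffer dev_data_;
    std::size_t   size_;
};
```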

@naibaf7 commented Aug 11, 2016

@pansk
Once CLCudaAPI has initialized a device, you can get the underlying memory pointer for both CUDA and OpenCL. You can then use normal CUDA or OpenCL code separately from CLCudaAPI.
The only caveat is to keep the CLCudaAPI memory wrapper in scope (so the underlying memory isn't freed behind your back) while handling the underlying memory objects...
In short, cuDNN, cuBLAS, LibDNN, etc. can be used in conjunction with CLCudaAPI without issues.

I would also recommend keeping one data structure for both weights and data. If coded correctly, the transfers should not be triggered often anyway, and it allows more flexibility. You could add a profiling method to these memory objects to find unnecessary transfers.
If the additional memory consumption bugs you, there could be a method to explicitly release a memory object from the device or the CPU, and reallocate the memory when the next transfer to the CPU or device is requested.
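
A minimal sketch of that unified alternative, with transfer counters for profiling and explicit release. All names are hypothetical:

```cpp
// Sketch of the unified-structure alternative: one buffer for both weights
// and data, with transfer counters for profiling and explicit release.
// All names are hypothetical; GPU calls are elided as comments.
#include <cstddef>
#include <vector>

class unified_buffer {
public:
    // Counters make unnecessary host<->device transfers easy to spot.
    std::size_t uploads() const   { return uploads_; }
    std::size_t downloads() const { return downloads_; }

    const std::vector<float>& host() {
        if (!host_valid_) { /* download from GPU */ ++downloads_; host_valid_ = true; }
        return host_data_;
    }
    void* device() {
        if (!dev_valid_)  { /* upload to GPU */     ++uploads_;   dev_valid_  = true; }
        return dev_ptr_;
    }

    // Explicitly free one side; the next access re-allocates and transfers.
    void release_host()   { host_data_.clear(); host_data_.shrink_to_fit(); host_valid_ = false; }
    void release_device() { /* free GPU allocation */ dev_ptr_ = nullptr;   dev_valid_  = false; }

private:
    std::vector<float> host_data_;
    void* dev_ptr_ = nullptr;
    bool host_valid_ = true;
    bool dev_valid_  = false;
    std::size_t uploads_ = 0;
    std::size_t downloads_ = 0;
};
```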

@edgarriba (Member)

@naibaf7
appreciated details, thx!

@pansk

Is tensor_t still, as it seems, a std::vector<...>?

Yes, tensor_t is a vector of vectors that was introduced to handle batch normalization.
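
(From memory, the current definitions are roughly the following; the real code uses an aligned allocator and lives in tiny_cnn/util/util.h, so treat this as an approximation:)

```cpp
// Approximate shape of tiny-cnn's core typedefs (from memory; the actual
// vec_t uses an aligned allocator, and float_t can be double via a macro).
#include <vector>

typedef float float_t;                // configurable precision
typedef std::vector<float_t> vec_t;   // one sample, or one parameter blob
typedef std::vector<vec_t> tensor_t;  // a batch: a vector of samples
```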

The approach I propose is just a proof of concept to get a minimal GPU pipeline working, and, as pointed out, proper memory-transfer handling is needed in order to take advantage of the GPU. Things on the TODO list: model tensors as classes and implement a proper memory-transfer module.

Since GSoC is about to finish, it's probably time to rewrite the roadmap and assign tasks. /cc @nyanp @bhack

@pansk (Contributor) commented Aug 11, 2016

@naibaf7
Thank you for the clarification about CLCudaAPI. Another thing that could be interesting in the future is the ability to use more than one CUDA stream. Is that possible, as far as you know?

About the data structure, you're correct: another solution is to use a single data structure for both.

At the moment data and weights are already held in two different structures (tensor_t and vec_t respectively).

I quite liked the solution with two different structures, partly because of memory occupation but mostly because it makes "coding correctly" obvious. It's always possible to check with a profiler later, but I generally prefer making such bad conditions hard to code in the first place rather than tracking them down after they've been committed and used for a while... In the end, for most platforms, I expect one of the structures to be a decorated version of the other, so there's no real code duplication.

Anyway, I'm not strongly against a unified structure at all; I just have a mild preference for the other solution.

@naibaf7 commented Aug 11, 2016

@pansk
Not sure if it handles multiple queues (OpenCL) or streams (CUDA) yet, but I'm sure it would be easy to add.

@CNugteren
Do you want to clarify?

@edgarriba (Member)

@pansk
I think that having a unified structure is simpler in terms of design. I agree that we have different types of data representing different concepts, but from the point of view of computation they are all matrix/tensor-based objects, and I think that with a proper definition we can represent them all. Could you explain your concern about memory occupation in more detail?

@bhack (Contributor) commented Aug 11, 2016

There is a queue/stream object constructor; see https://github.com/CNugteren/CLCudaAPI/blob/master/doc/api.md

@CNugteren commented Aug 11, 2016

Indeed, you can have multiple streams/queues, and you can enqueue kernels or memory copy operations (asynchronous or synchronous) to different streams. You can also wait for all tasks in a certain stream/queue to be completed.

If any other features are needed, CLCudaAPI can be extended. However, note that it can only support features which are available in both CUDA and OpenCL.
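
A sketch of what that looks like with two queues. This is based on CLCudaAPI's documented API (doc/api.md); the async method names are recalled from memory and may differ by version:

```cpp
// Sketch: two CLCudaAPI queues with asynchronous writes, drained
// independently. Based on doc/api.md; signatures recalled from memory.
#include "clpp11.h"
#include <vector>

int main() {
    auto platform = CLCudaAPI::Platform(size_t{0});
    auto device   = CLCudaAPI::Device(platform, size_t{0});
    auto context  = CLCudaAPI::Context(device);

    // Two independent queues on the same device.
    auto queue_a = CLCudaAPI::Queue(context, device);
    auto queue_b = CLCudaAPI::Queue(context, device);

    auto host_a = std::vector<float>(1024, 1.0f);
    auto host_b = std::vector<float>(1024, 2.0f);
    auto buf_a  = CLCudaAPI::Buffer<float>(context, host_a.size());
    auto buf_b  = CLCudaAPI::Buffer<float>(context, host_b.size());

    // Asynchronous copies may overlap across the two queues...
    buf_a.WriteAsync(queue_a, host_a.size(), host_a);
    buf_b.WriteAsync(queue_b, host_b.size(), host_b);

    // ...and each queue can be waited on independently.
    queue_a.Finish();
    queue_b.Finish();
    return 0;
}
```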

@pansk (Contributor) commented Aug 12, 2016

@edgarriba Memory concerns are two-fold:

  • memory occupation: keeping memory allocated on the CPU when it is only needed for serialization/deserialization could be a problem on some embedded platforms (we should provide an explicit method to free CPU or GPU memory when it is no longer needed)
  • memory bandwidth: allowing parameters to be "easily" moved between GPU and CPU makes it easy to write bad code without noticing, and requires profiling to spot. If, on the other hand, moving parameters requires additional code, contributors will naturally tend to produce "faster" code (reducing the number of parameter moves back and forth, or even moving some logic to the GPU when needed).

@naibaf7, @bhack, @CNugteren Thank you, I'll have a deeper look at CLCudaAPI in the next days.

@bhack (Contributor) commented Aug 20, 2016

I think we can close this.

@nyanp (Member, Author) commented Aug 21, 2016

Exactly.

nyanp closed this as completed Aug 21, 2016