finalize structure

zhreshold · Oct 6, 2016 · 8c4062b
1 parent 02514a7

Showing 25 changed files with 22 additions and 2,457 deletions.
diff --git a/README.md b/README.md
@@ -4,16 +4,21 @@ SSD is an unified framework for object detection with a single network.
 
 You can use the code to train/evaluate/test for object detection task.
 
-*This repo is still under construction.*
-
 ### Disclaimer
 This is a re-implementation of original SSD which is based on caffe. The official
 repository is available [here](https://github.com/weiliu89/caffe/tree/ssd).
 The arXiv paper is available [here](http://arxiv.org/abs/1512.02325).
 
 This example is intended for reproducing the nice detector while fully utilize the
-remarkable traits of MXNet. However:
-* The model is not compatible with caffe version due to the implementation details.
+remarkable traits of MXNet.
+* The model is fully compatible with caffe version due to the implementation details.
+* Model converter from caffe is available, I'll release it once I can convert any symbol other than VGG16.
+
+### Demo results
+![demo1](https://cloud.githubusercontent.com/assets/3307514/19171057/8e1a0cc4-8be0-11e6-9d8f-088c25353b40.png)
+![demo2](https://cloud.githubusercontent.com/assets/3307514/19171063/91ec2792-8be0-11e6-983c-773bd6868fa8.png)
+![demo3](https://cloud.githubusercontent.com/assets/3307514/19171086/a9346842-8be0-11e6-8011-c17716b22ad3.png)
+
 
 ### Getting started
 * You will need python modules: `easydict`, `cv2`, `matplotlib` and `numpy`.
@@ -34,28 +39,27 @@ git clone --recursive https://github.com/zhreshold/mxnet-ssd.git
 # git submodule update --recursive --init
 cd mxnet-ssd/mxnet
 ```
-* Build MXNet with extra layers: Follow the official instructions
-[here](http://mxnet.readthedocs.io/en/latest/how_to/build.html), and add extra
-layers in `config.mk` by pointing `EXTRA_OPERATORS = ../operator/`.
+* Build MXNet: Follow the official instructions
+[here](http://mxnet.readthedocs.io/en/latest/how_to/build.html).
 Remember to enable CUDA if you want to be able to train, since CPU training is
-insanely slow. Using CUDNN is not fully tested but should be fine.
+insanely slow. Using CUDNN is optional, it's not fully tested but should be fine.
 
 ### Try the demo
-* Download the pretrained model: `to_be_added`, and extract to `model/` directory.
+* Download the pretrained model: [`ssd_300_voc_0712.zip`](https://dl.dropboxusercontent.com/u/39265872/ssd_300_voc0712.zip), and extract to `model/` directory. (This model is converted from VGG_VOC0712_SSD_300x300_iter_60000.caffemodel provided by paper author).
 * Run
 ```
 # cd /path/to/mxnet-ssd
 python demo.py
 # play with examples:
-python demo.py --images ./data/demo/dog.jpg --thresh 0.3
+python demo.py --epoch 0 --images ./data/demo/dog.jpg --thresh 0.3
 ```
 * Check `python demo.py --help` for more options.
 
 ### Train the model
 This example only covers training on Pascal VOC dataset. Other datasets should
 be easily supported by adding subclass derived from class `Imdb` in `dataset/imdb.py`.
 See example of `dataset/pascal_voc.py` for details.
-* Download the converted pretrained `vgg16_reduced` model: , put `.param` and `.json` files
+* Download the converted pretrained `vgg16_reduced` model [here](https://dl.dropboxusercontent.com/u/39265872/vgg16_reduced.zip), unzip `.param` and `.json` files
 into `model/` directory by default.
 * Download the PASCAL VOC dataset, skip this step if you already have one.
 ```
@@ -75,10 +79,11 @@ in the same `VOCdevkit` folder.
 `ln -s /path/to/VOCdevkit /path/to/this_example/data/VOCdevkit`.
 Use hard link instead of copy could save us a bit disk space.
 * Start training: `python train.py`
-* By default, this example will use `batch-size=32` and `learning_rate=0.004`.
+* By default, this example will use `batch-size=32` and `learning_rate=0.002`.
 You might need to change the parameters a bit if you have different configurations.
 Check `python train.py --help` for more training options. For example, if you have 4 GPUs, use:
 ```
+# note that a perfect training parameter set is yet to be found for multi-gpu
 python train.py --gpus 0,1,2,3 --batch-size 128 --lr 0.005
 ```
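
Review note: the README gives a single-GPU default (`batch-size=32`, `learning_rate=0.002`) and a 4-GPU example (`--batch-size 128 --lr 0.005`), and says a good multi-GPU parameter set is yet to be found. As a starting point for experiments, the sketch below computes two common heuristics for rescaling the learning rate with batch size (linear and square-root scaling). These heuristics are an assumption of this note, not something the repository implements, and `scaled_lr` is a hypothetical helper name.

```python
import math

# Single-GPU defaults stated in the README after this commit.
BASE_BATCH_SIZE = 32
BASE_LR = 0.002


def scaled_lr(batch_size, rule="linear"):
    """Return a candidate learning rate for a new batch size.

    Hypothetical helper: 'linear' multiplies the base rate by the batch-size
    ratio, 'sqrt' multiplies it by the square root of that ratio.
    """
    ratio = batch_size / BASE_BATCH_SIZE
    if rule == "linear":
        return BASE_LR * ratio
    if rule == "sqrt":
        return BASE_LR * math.sqrt(ratio)
    raise ValueError("unknown rule: %s" % rule)


# Candidates for the 4-GPU example in the README (batch size 128).
linear_candidate = scaled_lr(128, "linear")
sqrt_candidate = scaled_lr(128, "sqrt")
print(linear_candidate, sqrt_candidate)
```

The README's own `--lr 0.005` falls between the two candidates, so either can serve as one end of a small search range rather than as a recommended setting.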
 

diff --git a/data/demo/000005.jpg b/data/demo/000005.jpg
diff --git a/data/demo/000012.jpg b/data/demo/000012.jpg
diff --git a/data/demo/2008_000145.jpg b/data/demo/2008_000145.jpg
diff --git a/data/demo/bangkok2.jpg b/data/demo/bangkok2.jpg
diff --git a/data/demo/dogcat.jpg b/data/demo/dogcat.jpg
diff --git a/data/demo/monitor.jpg b/data/demo/monitor.jpg
diff --git a/data/demo/stoplight.jpg b/data/demo/stoplight.jpg
diff --git a/data/demo/umbrella.jpg b/data/demo/umbrella.jpg
diff --git a/demo.py b/demo.py
@@ -48,7 +48,7 @@ def parse_args():
     parser.add_argument('--ext', dest='extension', help='image extension, optional',
                         type=str, nargs='?')
     parser.add_argument('--epoch', dest='epoch', help='epoch of trained model',
-                        default=200, type=int)
+                        default=0, type=int)
     parser.add_argument('--prefix', dest='prefix', help='trained model prefix',
                         default=os.path.join(os.getcwd(), 'model', 'ssd'), type=str)
     parser.add_argument('--cpu', dest='cpu', help='(override GPU) use CPU to detect',
@@ -63,7 +63,7 @@ def parse_args():
                         help='green mean value')
     parser.add_argument('--mean-b', dest='mean_b', type=float, default=104,
                         help='blue mean value')
-    parser.add_argument('--thresh', dest='thresh', type=float, default=0.6,
+    parser.add_argument('--thresh', dest='thresh', type=float, default=0.5,
                         help='object visualize score threshold, default 0.6')
     parser.add_argument('--nms', dest='nms_thresh', type=float, default=0.5,
                         help='non-maximum suppression threshold, default 0.5')
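
Review note: this hunk changes the `--thresh` default from 0.6 to 0.5, but the help string on the following line still reads "default 0.6". One way to keep the two from drifting is argparse's `%(default)s` placeholder, which is filled in from the actual default when help is rendered. The sketch below is a standalone illustration that reuses three of the argument names from `demo.py`; `build_parser` is a hypothetical function, not the repository's `parse_args`.

```python
import argparse


def build_parser():
    """Standalone sketch: help strings take their default from argparse."""
    parser = argparse.ArgumentParser(prog='demo.py',
                                     description='argument sketch')
    parser.add_argument('--epoch', dest='epoch', type=int, default=0,
                        help='epoch of trained model, default %(default)s')
    parser.add_argument('--thresh', dest='thresh', type=float, default=0.5,
                        help='object visualize score threshold, '
                             'default %(default)s')
    parser.add_argument('--nms', dest='nms_thresh', type=float, default=0.5,
                        help='non-maximum suppression threshold, '
                             'default %(default)s')
    return parser


# Parse an empty command line so every option takes its default.
args = build_parser().parse_args([])
help_text = build_parser().format_help()
print(help_text)
```

With this pattern, changing `default=` in one place updates the help output as well, so the text shown by `python demo.py --help` cannot go stale.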

diff --git a/mxnet b/mxnet
diff --git a/operator/multibox_detection-inl.h b/operator/multibox_detection-inl.h