wangzheallen/vsad: code release for "Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition"

This is the code release for the paper:

Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, and Yu Qiao

Recognition accuracy (%) of different encoding methods on MIT Indoor and SUN397:

Encoding	MIT Indoor acc (%)	SUN397 acc (%)
Mean	78.5	63.5
VLAD	83.9	70.1
FV	83.6	69.0
VSAD	84.9	71.7

Note: The encoding methods built on our scene_patchnet features surpass the reported human performance on SUN397 (68.5%).

Feature

We release a compact and effective feature for MIT Indoor. It is denoted hybrid_PatchNet+VSAD in the paper and obtains 86.1% accuracy. You can use it as a baseline or as a complementary feature for further study.

MIT Indoor acc (%)	Dimension	Storage
86.1	100*256*2*2 (= 102,400)	1.9 GB
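One common way to use a released feature as a complementary feature (an assumption here, not something this README prescribes) is to L2-normalize each feature separately and then concatenate them. The sketch below works on plain Python lists; l2_normalize and combine_features are hypothetical helpers, and loading the .mat file itself (for example with scipy.io.loadmat; MATLAB v7.3 files are HDF5 and need a reader such as h5py) is left out.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def combine_features(baseline, extra):
    """Hypothetical helper: L2-normalize two features separately, then concatenate them."""
    return l2_normalize(baseline) + l2_normalize(extra)


# Dimensionality stated in this README for mit_hybrid_vsad.mat.
MIT_HYBRID_VSAD_DIM = 100 * 256 * 2 * 2  # 102,400

if __name__ == "__main__":
    # Toy vectors standing in for one row of mit_hybrid_vsad.mat and one of your own features.
    print(combine_features([3.0, 4.0], [0.0, 2.0]))  # [0.6, 0.8, 0.0, 1.0]
```

Normalizing each part before concatenation keeps one feature from dominating the other simply because its values have a larger scale.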

Model

We release our trained scene_patchnet and object_patchnet models. They were trained with cuDNN v4; if your system uses cuDNN v5, you can convert them from cuDNN v4 to cuDNN v5 with this script: https://github.com/yjxiong/caffe/blob/action_recog/python/bn_convert_style.py

Model	Dataset	Top-5 acc (%)
object_patchnet	ImageNet	85.3
scene_patchnet	Places205	82.7

Both models take 128×128 patches as input.
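Since both PatchNets consume 128×128 patches, each image is densely cropped on a 10 × 10 grid (multi_crop.m). The sketch below computes such crop boxes, assuming the 100 patches are spread evenly from edge to edge of the image; the exact spacing, any image resizing, and any flipping done by multi_crop.m are not described in this README. grid_crop_boxes is a hypothetical name, and the 256 × 256 image size is chosen only for illustration.

```python
def grid_crop_boxes(height, width, patch=128, grid=10):
    """Hypothetical helper: (top, left, bottom, right) boxes for a grid x grid set of
    patch x patch crops whose corners are spaced evenly from edge to edge (an assumption)."""
    if height < patch or width < patch:
        raise ValueError("image must be at least patch x patch")

    def offsets(extent):
        if grid == 1:
            return [(extent - patch) // 2]  # single centered crop
        step = (extent - patch) / (grid - 1)
        return [round(i * step) for i in range(grid)]

    return [(top, left, top + patch, left + patch)
            for top in offsets(height)
            for left in offsets(width)]


# Example: a 256 x 256 image yields 100 crops, from (0, 0, 128, 128) to (128, 128, 256, 256).
boxes = grid_crop_boxes(256, 256)
```

Each box is then fed through a PatchNet, producing one descriptor (or probability vector) per patch for the encoding step.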

Code

mit_hybrid_vsad.mat -- the hybrid_PatchNet+VSAD feature for MIT Indoor. Use it as a baseline or concatenate it with other features for further study; it has only 100*256*2*2 (= 102,400) dimensions yet reaches 86.1% accuracy on MIT Indoor. You can download it from mit_hybrid_vsad.mat
extracting_feature_exmaple.m -- a template for extracting scene_patchnet_feature or object_patchnet_probability. scene_patchnet_feature is the global average pooling feature; object_patchnet_probability is the fully connected layer output passed through a softmax function
for_encoder_scene67.mat -- auxiliary data to help you process the MIT Indoor dataset (from VLFeat)
for_encoder_sun397.mat -- auxiliary data to help you process the SUN397 dataset
mit_pca.mat -- our PCA matrix for scene_patchnet_feature on MIT Indoor, used in vsad_encoding_example.m
mit_vsad_codebook.mat -- our semantic codebook for MIT Indoor, used in vsad_encoding_example.m
multi_crop.m -- dense cropping on a 10 × 10 grid, used in extracting_feature_example.m
object_selection_256.mat -- the 256 objects selected from the 1,000 ImageNet classes, used for both MIT Indoor and SUN397
sun_pca.mat -- our PCA matrix for scene_patchnet_feature on SUN397, used in vsad_encoding_example.m
sun_vsad_codebook.mat -- our semantic codebook for SUN397, used in vsad_encoding_example.m
vsad_encoding_example.m -- an example of the VSAD encoding algorithm
vsad_encoding.m -- our VSAD encoding function
plot_mit_sun.m -- plots the figure at the bottom of this page
xticklabel_rotate.m -- helper for plot_mit_sun.m that rotates the x-axis tick labels in the figure
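The per-patch outputs described above can be mimicked as follows: global average pooling of a convolutional feature map (scene_patchnet_feature), a softmax over fully connected outputs (object_patchnet_probability), keeping a subset of object classes (as object_selection_256.mat does), and a PCA projection (as mit_pca.mat and sun_pca.mat are used for). This is a minimal sketch on nested Python lists. The layout of the released .mat files is an assumption: whether the PCA file stores a mean vector, whether the selection file stores 1-based MATLAB indices, and whether the selected probabilities are renormalized are not stated here, so this sketch leaves them as-is.

```python
import math


def global_average_pool(feature_map):
    """Average a C x H x W feature map (nested lists) over its H x W positions -> C values."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def softmax(logits):
    """Numerically stable softmax over fully connected layer outputs."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_objects(probabilities, selected):
    """Keep the chosen classes; indices are 0-based here (MATLAB indices would be 1-based)."""
    return [probabilities[i] for i in selected]


def pca_project(x, mean, components):
    """Center x, then project it onto PCA components (one row per output dimension)."""
    centered = [a - m for a, m in zip(x, mean)]
    return [sum(c * v for c, v in zip(row, centered)) for row in components]
```

All four helpers are hypothetical names for illustration; the released MATLAB scripts remain the reference.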

Usage

1. Download the code and models.

2. Extract scene_patchnet_feature and object_patchnet_probability (extracting_feature_example.m, multi_crop.m).

3. Run VSAD encoding (vsad_encoding.m, vsad_encoding_example.m, mit_pca.mat, mit_vsad_codebook.mat, object_selection_256.mat; use sun_pca.mat and sun_vsad_codebook.mat for SUN397).
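The encoding step aggregates an image's patch descriptors x_t (t = 1..T, dimension D) using the object probabilities p_t(k) (k = 1..K selected objects) as soft assignments to a semantic codebook. The sketch below is a Fisher-vector-style reading of that idea and is our assumption, not a transcription of vsad_encoding.m: the codebook stores, for each object k, the probability-weighted mean μ_k and standard deviation σ_k of training descriptors, and an image is encoded as F_k = (1/T) Σ_t p_t(k) (x_t − μ_k)/σ_k and S_k = (1/T) Σ_t p_t(k) [((x_t − μ_k)/σ_k)² − 1], concatenated over all k. The output has 2·K·D entries; for example, D = 100 and K = 256 would give 51,200. The exact normalization in the released code may differ, and the function names are hypothetical.

```python
import math


def build_semantic_codebook(descriptors, probabilities, eps=1e-6):
    """Per-object mean and std of descriptors, each patch weighted by its probability p_t(k).

    descriptors: T x D (e.g. PCA-reduced scene features); probabilities: T x K.
    """
    dim, num_objects = len(descriptors[0]), len(probabilities[0])
    means, stds = [], []
    for k in range(num_objects):
        weight = sum(p[k] for p in probabilities) + eps
        mu = [sum(p[k] * x[d] for x, p in zip(descriptors, probabilities)) / weight
              for d in range(dim)]
        var = [sum(p[k] * (x[d] - mu[d]) ** 2 for x, p in zip(descriptors, probabilities)) / weight
               for d in range(dim)]
        means.append(mu)
        stds.append([math.sqrt(v) + eps for v in var])
    return means, stds


def vsad_encode(descriptors, probabilities, means, stds):
    """Aggregate an image's patches into first- and second-order statistics per object.

    Returns a vector of length 2 * K * D (all first-order blocks, then all second-order ones).
    """
    num_patches = len(descriptors)
    first, second = [], []
    for k, (mu, sigma) in enumerate(zip(means, stds)):
        f = [0.0] * len(mu)
        s = [0.0] * len(mu)
        for x, p in zip(descriptors, probabilities):
            for d in range(len(mu)):
                z = (x[d] - mu[d]) / sigma[d]  # standardized residual to object k's center
                f[d] += p[k] * z / num_patches
                s[d] += p[k] * (z * z - 1.0) / num_patches
        first.extend(f)
        second.extend(s)
    return first + second
```

Fisher-vector pipelines commonly apply power (signed square root) and L2 normalization to such a vector before training a linear classifier; whether vsad_encoding.m does the same is not stated in this README.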

Contact

Figure for Reference
