This is the code release for "Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition".


Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Zhe Wang, Limin Wang, Yali Wang, Bowen Zhang, and Yu Qiao

Recognition accuracy (%) with different encoding methods:

Encoding   MIT_indoor (%)   SUN397 (%)
mean       78.5             63.5
VLAD       83.9             70.1
FV         83.6             69.0
VSAD       84.9             71.7

Note: With our scene_patchnet feature, the VLAD, FV, and VSAD encodings all surpass human performance on SUN397 (68.5%).
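To make the VLAD row above concrete, here is a minimal pure-Python sketch of standard hard-assignment VLAD. Each patch descriptor is assigned to its nearest codeword, residuals are summed per codeword, and the concatenated vector is L2-normalized. It is a generic textbook version under our own assumptions (hypothetical function names, no power or intra-normalization), not the code that produced the numbers in the table.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vlad_encode(descriptors: Sequence[Sequence[float]],
                codebook: Sequence[Sequence[float]]) -> Vector:
    """Hard-assignment VLAD: sum residuals to the nearest codeword, then L2-normalize."""
    dim = len(codebook[0])
    # One residual accumulator per codeword (K x D).
    residuals = [[0.0] * dim for _ in codebook]
    for x in descriptors:
        # Assign the descriptor to its nearest codeword.
        k = min(range(len(codebook)), key=lambda j: squared_distance(x, codebook[j]))
        for d in range(dim):
            residuals[k][d] += x[d] - codebook[k][d]
    # Flatten K x D into a single K*D vector.
    encoding = [v for row in residuals for v in row]
    norm = sum(v * v for v in encoding) ** 0.5
    return [v / norm for v in encoding] if norm > 0 else encoding


if __name__ == "__main__":
    patches = [[0.2, 0.0], [0.9, 1.2]]
    centers = [[0.0, 0.0], [1.0, 1.0]]
    print(vlad_encode(patches, centers))
```

In practice the codebook would be learned (for example with k-means) on PCA-reduced patch features, and the output has K*D dimensions for K codewords of dimension D.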


We release a compact and effective feature for MIT Indoor, denoted hybrid_PatchNet+VSAD in the paper, which achieves 86.1% accuracy. You can use it as a baseline or as a complementary feature for further study.
Accuracy on MIT_indoor (%)   Dimension                  Storage
86.1                         100*256*2*2 (= 102,400)    1.9G
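One common way to combine a precomputed feature such as mit_hybrid_vsad.mat with your own (our assumption, not something this release prescribes) is to L2-normalize each feature block separately and then concatenate, so neither block dominates by scale. The sketch below does this in plain Python and spells out the feature dimension, 100*256*2*2 = 102,400. The helper names are hypothetical, and how the .mat file stores its features is not described here.

```python
import math
from typing import List, Sequence

# Dimension of the released hybrid_PatchNet+VSAD feature, as stated above.
VSAD_DIM = 100 * 256 * 2 * 2


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def concat_features(*features: Sequence[float]) -> List[float]:
    """L2-normalize each feature block separately, then concatenate them."""
    out: List[float] = []
    for f in features:
        out.extend(l2_normalize(f))
    return out


if __name__ == "__main__":
    print(VSAD_DIM)
    print(concat_features([3.0, 4.0], [0.0, 2.0]))
```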


We also release our trained scene_patchnet and object_patchnet models. They are based on cuDNN v4; if your system uses cuDNN v5, you can convert them with the cudnn_v4 to cudnn_v5 code. Their top-5 accuracy is:

Model                          Top-5 accuracy (%)
Object_patchnet on ImageNet    85.3
Scene_patchnet on Places205    82.7
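Top-5 accuracy counts a sample as correct when its ground-truth class is among the five highest-scoring classes. A minimal sketch of the metric (the function name is hypothetical):

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 5) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


if __name__ == "__main__":
    s = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
    print(top_k_accuracy([s, s], [4, 5], k=5))
```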

Both models take 128*128 patches as input.
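Since the models consume 128*128 patches and multi_crop.m crops densely on a 10*10 grid, the sketch below computes such a grid. It assumes the crops are evenly spaced, with the first and last patches touching the image borders; the actual multi_crop.m may space or resize differently, and the function name is hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def grid_crop_boxes(height: int, width: int, patch: int = 128, grid: int = 10) -> List[Box]:
    """Return grid x grid boxes of size patch x patch, evenly spaced over the image.

    Assumption: the first crop touches the top/left edge and the last crop
    touches the bottom/right edge; multi_crop.m may differ.
    """
    if height < patch or width < patch:
        raise ValueError("image must be at least as large as the patch")
    if grid < 2:
        raise ValueError("grid must be >= 2")

    def offsets(size: int) -> List[int]:
        # Evenly spaced start positions from 0 to size - patch.
        step = (size - patch) / (grid - 1)
        return [round(i * step) for i in range(grid)]

    return [(top, left, top + patch, left + patch)
            for top in offsets(height)
            for left in offsets(width)]


if __name__ == "__main__":
    boxes = grid_crop_boxes(256, 256)
    print(len(boxes), boxes[0], boxes[-1])
```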


  • mit_hybrid_vsad.mat -- the hybrid_PatchNet+VSAD feature for MIT_indoor. It has only 100*256*2*2 dimensions yet reaches 86.1% accuracy; use it as a baseline or concatenate it with other features for further study. You can download it from mit_hybrid_vsad.mat
  • extracting_feature_example.m -- a template for extracting scene_patchnet_feature or object_patchnet_probability. scene_patchnet_feature is the global average pooling feature, and object_patchnet_probability is the fully connected layer output followed by a softmax function
  • for_encoder_scene67.mat -- helper data for handling the MIT_indoor dataset (from vl_feat)
  • for_encoder_sun397.mat -- helper data for handling the SUN397 dataset
  • mit_pca.mat -- our generated scene_patchnet_feature PCA matrix for MIT_indoor, used in vsad_encoding_example.m
  • mit_vsad_codebook.mat -- our generated semantic codebook for MIT_indoor, used in vsad_encoding_example.m
  • multi_crop.m -- dense cropping on a 10*10 grid, used in extracting_feature_example.m
  • object_selection_256.mat -- 256 objects selected from the 1000 ImageNet classes, applied to both MIT_indoor and SUN397
  • sun_pca.mat -- our generated scene_patchnet_feature PCA matrix for SUN397, used in vsad_encoding_example.m
  • sun_vsad_codebook.mat -- our generated semantic codebook for SUN397, used in vsad_encoding_example.m
  • vsad_encoding_example.m -- an example of the VSAD encoding algorithm
  • vsad_encoding.m -- our VSAD encoding function
  • plot_mit_sun.m -- plots the figure at the bottom of this page
  • xticklabel_rotate.m -- helper for plot_mit_sun.m that rotates the x-axis tick labels in the figure
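The list above describes scene_patchnet_feature as a global-average-pooled feature and object_patchnet_probability as a fully connected output followed by softmax. Those two operations look like this in plain Python for a single patch; the C x H x W layout and the function names are assumptions for illustration, not the models' actual layer definitions.

```python
import math
from typing import List, Sequence

FeatureMap = Sequence[Sequence[Sequence[float]]]  # C x H x W


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Average each channel over its H x W spatial positions -> C-dim vector."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in feature_map]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    fmap = [[[1, 2], [3, 4]], [[0, 0], [0, 8]]]
    print(global_average_pool(fmap))
    print(softmax([1.0, 2.0, 3.0]))
```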


1. Download the code and models

2. Extract scene_patchnet_feature and object_patchnet_probability (extracting_feature_example.m, multi_crop.m)

3. VSAD encoding (vsad_encoding.m, vsad_encoding_example.m, mit_pca.mat, mit_vsad_codebook.mat, object_selection_256.mat)
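Step 3 combines the two outputs of step 2. Below is a simplified sketch of that idea under our own assumptions: a PCA matrix projects scene_patchnet_feature, the probabilities of the selected objects act as soft-assignment weights, and each codeword accumulates probability-weighted residuals. This first-order aggregation is our approximation in the spirit of VSAD, not a port of vsad_encoding.m; the function names are hypothetical, and the real encoding may use further statistics and normalization.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project(x: Sequence[float], pca: Matrix, mean: Sequence[float]) -> List[float]:
    """PCA projection: subtract the mean, then multiply by a (d_out x d_in) basis."""
    centered = [a - b for a, b in zip(x, mean)]
    return [sum(w * c for w, c in zip(basis_row, centered)) for basis_row in pca]


def semantic_aggregate(features: Matrix, probs: Matrix, codebook: Matrix) -> List[float]:
    """Aggregate patch features using object probabilities as soft-assignment weights.

    features: N x D (already PCA-reduced), probs: N x K, codebook: K x D.
    Returns a K*D vector of probability-weighted residuals, L2-normalized.
    Simplified first-order sketch; not the authors' vsad_encoding.m.
    """
    K, D = len(codebook), len(codebook[0])
    acc = [[0.0] * D for _ in range(K)]
    for x, p in zip(features, probs):
        for k in range(K):
            for d in range(D):
                acc[k][d] += p[k] * (x[d] - codebook[k][d])
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    probs = [[1.0, 0.0], [0.5, 0.5]]
    book = [[0.0, 0.0], [1.0, 1.0]]
    print(semantic_aggregate(feats, probs, book))
```

With 256 selected objects as codewords and 100-dimensional PCA features, this sketch would produce a 25,600-dimensional vector per image.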


Reference figure (plotted by plot_mit_sun.m)
