caffe-video_triplet

This code is built on Caffe: project site.

This code is the implementation for training the siamese-triplet network in the paper:

Xiaolong Wang and Abhinav Gupta. Unsupervised Learning of Visual Representations using Videos. Proc. of IEEE International Conference on Computer Vision (ICCV), 2015. pdf

Codes

Training scripts are in rank_scripts/rank_alexnet.

In the implementation, the siamese networks share weights, so the prototxt defines only one network.

The input to the network is pairs of image patches. The two patches in each pair are treated as similar because they come from the same video track. A label specifies which video a pair comes from: patches from different videos get different labels (the actual value does not matter, as long as it is an integer). This lets the loss take the third, negative patch of a triplet from other pairs with different labels.
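The labelling scheme above can be sketched in a few lines of Python. This is only an illustration: the function names, the dictionary input, and the file names are hypothetical and are not part of this repository or of the training list format.

```python
def assign_pair_labels(pairs_by_video):
    """Attach an integer label to every patch pair, one label per source video.

    pairs_by_video: dict mapping a video id to a list of (patch_a, patch_b).
    Returns a list of (patch_a, patch_b, label) tuples.
    """
    labeled = []
    # The label value is arbitrary; it only has to differ between videos.
    for label, video_id in enumerate(sorted(pairs_by_video)):
        for patch_a, patch_b in pairs_by_video[video_id]:
            labeled.append((patch_a, patch_b, label))
    return labeled


def negative_candidates(labeled, index):
    """Patches that may serve as negatives for the pair at `index`:
    every patch that belongs to a pair with a different label."""
    own_label = labeled[index][2]
    return [patch
            for patch_a, patch_b, label in labeled
            if label != own_label
            for patch in (patch_a, patch_b)]
```

For example, with one pair from a video `vid_a` and two pairs from `vid_b`, the negative candidates for the `vid_a` pair are the four patches from `vid_b`.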

For each pair of patches, the loss looks for the third, negative patch within the same batch. There are two ways to select it: random selection and hard negative mining.
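The two selection strategies can be illustrated with a small pure-Python sketch. It assumes that a "hard" negative is a patch from another video that lies closest to the anchor in feature space, and it uses cosine distance; both are assumptions made for illustration, and the helper names are hypothetical rather than taken from the RankHardLoss layer.

```python
import random


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return 1.0 - dot / (norm_u * norm_v)


def random_negatives(anchor_label, labels, k, rng):
    """Pick up to k batch indices at random among samples with a different label."""
    candidates = [i for i, label in enumerate(labels) if label != anchor_label]
    return rng.sample(candidates, min(k, len(candidates)))


def hard_negatives(anchor, anchor_label, features, labels, k):
    """Pick the k different-label samples closest to the anchor (assumed 'hard')."""
    candidates = [i for i, label in enumerate(labels) if label != anchor_label]
    candidates.sort(key=lambda i: cosine_distance(anchor, features[i]))
    return candidates[:k]


# Toy batch: five 2-D features, with the first two coming from the same video.
features = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
labels = [0, 0, 1, 1, 2]
hard = hard_negatives(features[0], labels[0], features, labels, k=2)
rand = random_negatives(labels[0], labels, k=2, rng=random.Random(0))
```

Here `hard` contains the indices of the two negatives nearest to the first patch, while `rand` is an arbitrary pair of indices drawn from the samples with labels 1 and 2.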

In the prototxt:

layer {		
	name: "loss"	
	type: "RankHardLoss" 	
	rank_param{		
		neg_num: 4	
		pair_size: 2 	
		hard_ratio: 0.5 	
		rand_ratio: 0.5 	
		margin: 1 	
	} 	
	bottom: "norml2" 	
	bottom: "label" 	
}

neg_num is the number of negative patches selected for each pair of patches; with neg_num = 4, each pair yields 4 triplets. pair_size = 2 simply means that the inputs are pairs of patches. hard_ratio = 0.5 means that half of the negative patches are hard examples, and rand_ratio = 0.5 means that half are randomly selected. To start, you can set rand_ratio = 1 and hard_ratio = 0. The margin of the loss needs to be tuned for each task; setting margin = 0.5 or 0.1 might make a difference for other tasks.
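To make the parameters concrete, here is a rough Python sketch of how they might interact. It assumes that each ratio is applied as a fraction of neg_num and that the loss for one triplet is a hinge of the form max(0, d(anchor, positive) - d(anchor, negative) + margin); the layer's exact arithmetic may differ, and both function names are hypothetical.

```python
def split_negative_counts(neg_num, hard_ratio, rand_ratio):
    """Assumed split: each ratio is a fraction of neg_num, truncated to an integer."""
    hard_count = int(neg_num * hard_ratio)
    rand_count = int(neg_num * rand_ratio)
    return hard_count, rand_count


def ranking_loss(dist_positive, dist_negative, margin):
    """Hinge ranking loss for one triplet (assumed form).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin`.
    """
    return max(0.0, dist_positive - dist_negative + margin)


# Settings from the prototxt above: 4 negatives, split evenly.
hard_count, rand_count = split_negative_counts(4, 0.5, 0.5)

# Same triplet distances evaluated under two different margins.
loss_large_margin = ranking_loss(0.2, 0.9, margin=1.0)
loss_small_margin = ranking_loss(0.2, 0.9, margin=0.5)
```

With these numbers the split is 2 hard and 2 random negatives. The same triplet gives a loss of about 0.3 with margin = 1 and a loss of 0 with margin = 0.5, which is why the margin is worth tuning: a smaller margin lets more triplets count as already satisfied.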

Models

We offer two models trained with our method:

color model is trained with RGB images.
gray model is trained with gray images (3-channel inputs).
prototxt is the prototxt for both models.
mean is the mean file.

In case our server is down, the models can be downloaded from dropbox:

color model is trained with RGB images.
gray model is trained with gray images (3-channel inputs).

Training Patches

The patches mined without supervision can be downloaded here: https://www.dropbox.com/sh/vgp2k3mdi61sdgr/AAB9vwX140jppHjp33n4UoO7a?dl=0

Each tar file contains different patches. Note that YouTube.tar.gz can be extracted with "tar xf", even though it is named as a "tar.gz" file.
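If you prefer to inspect the archives from Python, the standard tarfile module can open an archive without relying on its file extension. The sketch below is only a convenience example; the function name is hypothetical and the path is whatever location you downloaded the file to.

```python
import tarfile


def list_archive_members(path):
    """Return the member names of a tar archive.

    Mode "r:*" lets tarfile detect the compression (or the lack of it)
    from the file contents, so a misleading extension does not matter.
    """
    with tarfile.open(path, mode="r:*") as archive:
        return archive.getnames()
```

Calling `list_archive_members("YouTube.tar.gz")` on the downloaded file lists its contents without extracting anything.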

An example training list can be downloaded here: https://www.dropbox.com/s/tnbu2myy7g0i6l6/trainlist.txt?dl=0
