words-matter-scene-text-for-image-classification

This repository contains the code for the following two papers:

[1] Sezer Karaoglu, Ran Tao, Theo Gevers, Arnold W. M. Smeulders, Words Matter: Scene Text for Image Classification and Retrieval, in IEEE Transactions on Multimedia, 2017
[2] Sezer Karaoglu, Ran Tao, Jan van Gemert, Theo Gevers, Con-Text: Text Detection for Fine-grained Object Classification, in IEEE Transactions on Image Processing, 2017

[1] introduces a fully unsupervised word proposal method for detecting words in images and shows that the detected words are useful for image classification and retrieval. [2] proposes a novel text (character) detection method based on text saliency. If you find the word proposal method or the textual representation of the detected words useful in your research, please consider citing [1]. If you find the saliency-based text detection method useful, please consider citing [2].

Contact: sezerkaraoglu@gmail.com, rantao.mail@gmail.com

Usage

[Dataset]: The Con-Text dataset is available at https://staff.fnwi.uva.nl/s.karaoglu/datasetWeb/Dataset.html

'Finegrained_ImageNames.mat' is the list of images in the Con-Text dataset.

[Text detection]: The code in the folder 'text_detection/' generates word bounding box proposals. See 'text_detection/demo.m' for an example.

[Generate textual representation]: See 'EncodeTextualConTextScript.m' for how to generate representations of the word-level textual content in images. Both a CPU version ('EncodeTextual.m') and a GPU version ('EncodeTextualGPU.m') are provided. Generating these representations requires the word recognition model provided by Jaderberg et al. (http://www.robots.ox.ac.uk/~vgg/research/text/). To download it, go to the folder 'NIPS2014DLW-Jaderberg/' and run 'download.sh'.

[Generate visual representation]: See 'deep_visual_features/extract_googlenet_feat.py' for how to extract GoogLeNet features. Caffe (https://github.com/BVLC/caffe) is required.

[Fine-tune GoogLeNet on the Con-Text dataset]: See the folder 'finetune_googlenet/'.

[Classification]: See 'run_classification.m'. libsvm (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) is required.
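
The repository ships 'ComputeAP.m' for scoring classification results, but its exact definition is not described here. As a rough illustration only, the Python sketch below computes the standard non-interpolated average precision from per-image classifier scores and binary labels. The function names 'average_precision' and 'mean_average_precision' are hypothetical and are not part of this repository; whether 'ComputeAP.m' uses this exact variant of AP is an assumption.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class (assumed variant, not taken from ComputeAP.m).

    scores: classifier scores, one per image (higher means more confident).
    labels: 1/True for positive images, 0/False for negatives.
    """
    # Rank images by decreasing score; ties keep their original order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            # Precision measured at the rank of each positive image.
            precision_sum += hits / rank
    # With no positives, AP is undefined; return 0.0 by convention here.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_rows, label_rows):
    """Mean of per-class APs; each row holds the scores/labels for one class."""
    aps = [average_precision(s, l) for s, l in zip(score_rows, label_rows)]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, 'average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])' ranks the two positives at positions 1 and 3, so the result is the mean of 1/1 and 2/3.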
