Fully Convolutional Color Constancy with Confidence-weighted Pooling (CVPR 2017)


Yuanming Hu¹,², Baoyuan Wang¹, Stephen Lin¹

¹Microsoft Research ²Tsinghua University (now MIT CSAIL)

Change log:

  • July 19, 2018:
    • Improved instructions for (painless) reproducibility.
    • See updated FAQ How to reproduce the numbers reported in the paper?
  • May 22, 2018:
    • Added some FAQs.
  • April 25, 2018: Released network definition scripts and training instructions. TODO:
    • Update for more datasets and benchmarks.
    • Upgrade python version. Please use python 2.7 for now.
  • April 15, 2018: Started preparing for code release.

The Problem, the Challenge, and Our Solution

Visual Results (More)


Color Constancy and Datasets

a) Links to datasets

(The following two sub-questions were FAQs before the code was released. The scripts now take care of these details, so you don't need to worry about them unless you are curious.)

b) The input images look purely black. What's happening?

The input photos from the ColorChecker dataset are 16-bit png files, and some image viewers may not support them, as pngs are typically 8-bit. Also, since these photos are linear (RAW sensor activations) and modern displays assume a gamma of 2.2 (instead of linear gamma), they will appear even darker when displayed. An exposure correction is also necessary.
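As a rough illustration of the point above, here is a minimal sketch of mapping one linear 16-bit sample to an 8-bit display value. The function name, the `exposure_gain` parameter, and the assumption that the data spans the full 16-bit range are all hypothetical; this is not the pre-processing script from this repository.

```python
def linear16_to_display8(value, exposure_gain=1.0, gamma=2.2):
    """Map one linear 16-bit sample to an 8-bit display value (illustrative only)."""
    # Scale to [0, 1], apply a hypothetical exposure gain, and clip.
    linear = min(max(value / 65535.0 * exposure_gain, 0.0), 1.0)
    # Gamma-encode so that a display with gamma 2.2 shows the intended brightness.
    encoded = linear ** (1.0 / gamma)
    return int(round(encoded * 255.0))


# A sample at about 10% of the range: naive scaling would give 25 out of 255.
print(linear16_to_display8(6553))
print(linear16_to_display8(6553, exposure_gain=4.0))
```

Without the gamma step, mid-tones land near the bottom of the 8-bit range, which is why the images look almost black in an ordinary viewer.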

c) I corrected the gamma. Now most images appear green. Is there anything wrong?

It's common that RAW images appear green. One possible cause is that the color filters of digital cameras may have a stronger activation on the green channel.

d) What can be done to improve the datasets?

  • (More data for deep learning) The amount of data is relatively small for deep learning.
  • (More data for accurate comparison) With only 568 images, the test noise is huge. When three-fold cross-validation is used, if the error on a single test image rises from 1 degree to 18 degrees (which is common), the average angular error increases by ~0.1 degree. Another issue with the small amount of data is that the gap between the validation set and the test set can still be very large. As a result, you may find that early stopping on the validation set does not lead to a satisfactory error on the test set.
  • (Validation set) The (historical) three-fold cross-validation splits do not actually have validation sets. This means people have to tweak hyper-parameters based on the test set. This may not be a serious issue for traditional statistics-based approaches since not many parameters need to be tuned, but the risk of overfitting is becoming higher and higher when the model goes deeper and deeper!
  • (Quality) In some images, there is actually more than one light source in the scene. The illumination difference may be as large as 10 degrees. Since we already achieve < 2 degrees of estimation error, further reducing this number may not provide significant evidence for algorithm comparison.
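The "~0.1 degree" figure in the second bullet can be checked with a few lines of arithmetic. The sketch below assumes the figure refers to the average over one fold's test split (roughly a third of the 568 images); that reading is an assumption, not something stated in the text.

```python
total_images = 568
folds = 3

# Roughly 189 test images per fold (assumed equal-sized folds).
images_per_fold = total_images / float(folds)

# One test image goes from 1 degree to 18 degrees of angular error.
error_jump = 18.0 - 1.0

per_fold_increase = error_jump / images_per_fold   # about 0.09 degrees
overall_increase = error_jump / total_images       # about 0.03 degrees

print(round(per_fold_increase, 3), round(overall_increase, 3))
```

Either way, a single outlier moves the mean by an amount comparable to the gaps between competing methods.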

Finally, the Cube dataset can be useful for future research!

FC4 Training and Testing

a) Installation

Please use python2 for now. All dependencies can be installed via pip:

sudo python2 -m pip install opencv-python tensorflow-gpu scipy

b) Data Pre-processing

Shi's Re-processing of Gehler's Raw Dataset:

  • Download the 4 zip files from the website
  • Extract the png images into fc4/data/gehler/images/, without creating subfolders.
  • python, and wait for it to finish
  • python to view data-augmented patches. Press any key to see the next patch. You can use this data provider to train your own network.

c) Model Training

  • First, make sure you have preprocessed the data.
  • python train example, and wait for it to finish. The model will be located at models/fc4/example/. example here is the model name and you can change it to any other suitable identifier.
  • Note that there are three folds and you need to modify Ln 99 of to FOLD = 1 or FOLD = 2 for other two folds.

d) Visualize Confidence Maps

You can see how the confidence map evolves in the folders models/fc4/example/testXXXXsummaries_0.500000.

e) Pretrained models?

To get the pretrained models on the ColorChecker dataset, please download Pretrained models on the ColorChecker Dataset, and put the nine files in the pretrained folder.

f) How to reproduce the results reported in the paper?

  • Taking the ColorChecker dataset as an example.
  • Please train the three-fold models (make sure you modify FOLD to be 0, 1, 2 in or download the pretrained models.
  • (Assuming you are using the pretrained models. Modify the path if not.) Test on the ColorChecker dataset (make sure you have preprocessed it):
python2 test pretrained/colorchecker_fold1and2.ckpt -1 g0 fold0
python2 test pretrained/colorchecker_fold2and0.ckpt -1 g1 fold1
python2 test pretrained/colorchecker_fold0and1.ckpt -1 g2 fold2
  • Combine the three folds:
   python2 outputs/fold0_err.pkl outputs/fold1_err.pkl outputs/fold2_err.pkl
  • You will see the results
25: 0.384, med: 1.160 tri: 1.237 avg: 1.634 75: 3.760 95: 4.850
  • In comparison to what we reported in the paper:
Method                              Mean   Median   Tri. Mean   Best 25%   Worst 25%   95% Quant.
SqueezeNet-FC4 (CVPR 2017 paper)    1.65   1.18     1.27        0.38       3.78        4.73
SqueezeNet-FC4 (Open source code)   1.63   1.16     1.24        0.38       3.76        4.85

You can see we get slightly better results except for the 95% quantile. The difference is likely due to randomness (or a different TensorFlow version, etc.).
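For readers unfamiliar with these summary statistics, the sketch below shows one common way to compute them from a list of per-image angular errors. The function name `summarize`, the interpolation rule for quantiles, and the rounding of the 25% subset size are assumptions; the script in this repository may differ in such details.

```python
def summarize(errors):
    """Summary statistics of angular errors (illustrative definitions)."""
    errs = sorted(errors)
    n = len(errs)

    def quantile(q):
        # Linear interpolation between the two closest ranks.
        pos = q * (n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        return errs[lo] + (errs[hi] - errs[lo]) * (pos - lo)

    q1, med, q3 = quantile(0.25), quantile(0.5), quantile(0.75)
    k = max(1, int(round(n * 0.25)))  # size of the best/worst 25% subsets
    return {
        "mean": sum(errs) / float(n),
        "median": med,
        "trimean": (q1 + 2 * med + q3) / 4.0,
        "best25": sum(errs[:k]) / float(k),    # mean of the lowest 25% of errors
        "worst25": sum(errs[-k:]) / float(k),  # mean of the highest 25% of errors
        "q95": quantile(0.95),
    }


stats = summarize([1, 2, 3, 4, 5, 6, 7, 8])
print(stats)
```

Note that "Best 25%" and "Worst 25%" are means over the lowest and highest quarter of the errors, not quantiles.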

g) How to make inference on images based on a trained model?

  • Test on other images: (e.g. sample_inputs/a.png)
python2 test pretrained/colorchecker_fold1and2.ckpt -1 sample_inputs/a.png

The corrected image will be in the cc_outputs folder.
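For intuition, the following is a minimal sketch of how an estimated illuminant is typically used to correct a pixel: a generic diagonal (von Kries style) white balance with the green gain fixed at 1. It is an illustration under those assumptions, with a hypothetical function name, and not necessarily what the code in this repository does.

```python
def white_balance_pixel(pixel, illuminant):
    """Apply per-channel gains so that the illuminant color maps to gray."""
    r, g, b = illuminant
    # Gains relative to green, so the green channel is left unchanged.
    gains = (g / r, 1.0, g / b)
    # Clip to the valid [0, 1] range after scaling.
    return tuple(min(1.0, c * k) for c, k in zip(pixel, gains))


# A pixel with the same color as the illuminant becomes neutral gray.
print(white_balance_pixel((0.2, 0.4, 0.1), (0.4, 0.8, 0.2)))
```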

You will see the results in seconds. Legend (TODO: this legend doesn't match the latest code!):

h) What does the SEPARATE_CONFIDENCE option mean? When its value is False, does it mean confidence-weighted pooling is disabled?

First, let's clarify a common misunderstanding of the color constancy problem: that the output of a color constancy algorithm consists of three components. Actually, there are only two components (degrees of freedom). In some papers, the two components are denoted as u/v or temperature/tint. When estimating R/G/B, there should be a constraint on the values, either L1 (R+G+B=1) or L2 (R^2+G^2+B^2=1).
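The two-degrees-of-freedom point can be illustrated with the angular error metric, which depends only on the direction of the R/G/B vector and not on its length. The helper names below are hypothetical and the sketch is independent of the repository code.

```python
import math


def l2_normalize(rgb):
    """Scale an R/G/B vector so that R^2 + G^2 + B^2 = 1."""
    norm = math.sqrt(sum(c * c for c in rgb))
    return [c / norm for c in rgb]


def angular_error_degrees(estimate, ground_truth):
    """Angle in degrees between two illuminant vectors."""
    a, b = l2_normalize(estimate), l2_normalize(ground_truth)
    cosine = sum(x * y for x, y in zip(a, b))
    cosine = min(1.0, max(-1.0, cosine))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cosine))


# Scaling an estimate changes its length but not its angular error.
print(angular_error_degrees([0.4, 0.5, 0.3], [0.3, 0.5, 0.4]))
print(angular_error_degrees([4.0, 5.0, 3.0], [0.3, 0.5, 0.4]))
```

Because overall brightness is factored out, only the chromaticity (two numbers) is actually being estimated.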

In our paper, we estimate R/G/B. Therefore, for each patch, we should either normalize the R/G/B output and estimate a separate confidence value (which is mathematically more explicit), or directly treat the unnormalized estimate as normalized R/G/B times confidence, as mentioned in Section 4.1 of the paper. Either way is fine, and confidence weighting is used in both cases because one extra degree of freedom (i.e. confidence) is available. If you use SEPARATE_CONFIDENCE=True, the former is used; otherwise the latter is used.
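A minimal sketch of the two variants described above, written in plain Python rather than TensorFlow. The function names are hypothetical and the per-patch values are made up for illustration; the sketch only shows that both formulations pool to the same global estimate when the unnormalized vector equals normalized R/G/B times confidence.

```python
import math


def _normalize(rgb):
    norm = math.sqrt(sum(c * c for c in rgb))
    return [c / norm for c in rgb]


def pool_separate(directions, confidences):
    """Former variant: normalized per-patch R/G/B plus an explicit confidence."""
    total = [0.0, 0.0, 0.0]
    for rgb, conf in zip(directions, confidences):
        unit = _normalize(rgb)
        total = [t + conf * u for t, u in zip(total, unit)]
    return _normalize(total)


def pool_implicit(unnormalized):
    """Latter variant: the length of each vector acts as its confidence."""
    total = [sum(channel) for channel in zip(*unnormalized)]
    return _normalize(total)


# Two patches: the first is three times as confident as the second.
explicit = pool_separate([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [3.0, 1.0])
implicit = pool_implicit([[3.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(explicit, implicit)
```

In both cases the final result is renormalized, so only the relative confidences between patches matter.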

If you want to disable confidence-weighted pooling, the correct way is to set WEIGHTED_POOLING=False.

i) How to merge test results on three folds?

python2 [fold0_model_name] [fold1_model_name] [fold2_model_name]


  title={FC 4: Fully Convolutional Color Constancy with Confidence-weighted Pooling},
  author={Hu, Yuanming and Wang, Baoyuan and Lin, Stephen},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},

Related Research Projects and Implementations

  • Exposure (General-purpose photo postprocessing with GANs and reinforcement learning)
  • FFCC (Fast Fourier Color Constancy: an auto white balance solution with machine learning in Fourier space)
  • ...

Color Constancy Resources


  • The SqueezeNet model is taken from here. Thanks to Yu Gu for his great efforts in converting the Caffe models into a TensorFlow-readable version!

