Ralf Brown edited this page Apr 27, 2023 · 23 revisions

Understanding color and color management

Some useful resources to jump into digital color management, editing pipeline, calibrations, view transform, etc. :

Designing vs. hacking

Too many programmers jump into their IDE before being sure they actually understand the problem they are trying to solve. darktable is full of Saturday-afternoon projects that lack polish, disregard ergonomics and got their inner color science wrong. They still sort of help and let users get some work done, but going the extra mile could have made them more (or simply) efficient.

Design is a process by which we match the needs of a category of users with a technical solution by building a tool. For the tool to fit its users' needs, one first has to know what kind of users are targeted and what their real need is (not what they think their need is), then sketch several possible solutions, before finally bending some code to do it.

While designing a tool, draw, sketch, write, and research what academia has to say about your problem and the state of the art, then finally prototype something. Don't open your IDE until you have everything figured out on paper first.

Hacking is nice and all, but often ends up with half-baked code that produces toys, not tools.

Writing efficient code

Pixels are essentially 4D RGBA vectors. Since 2004, processors have been able to process vectors by applying a Single Instruction to Multiple Data (SIMD). This lets us speed up computations by processing anywhere from one entire pixel (SSE2) up to 4 pixels (AVX-512) at the same time, saving a lot of CPU cycles.
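
The arithmetic behind these figures can be sketched as follows, assuming pixels are stored as four 32-bit floats (128 bits per pixel). The helper names are hypothetical and are not part of darktable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of 32-bit floats that fit in one SIMD register. */
static inline size_t floats_per_register(const size_t register_bits)
{
  assert(register_bits % 32 == 0); // a register holds a whole number of floats
  return register_bits / 32;
}

/* Hypothetical helper: number of 4-channel (RGBA) float pixels per register. */
static inline size_t pixels_per_register(const size_t register_bits)
{
  return floats_per_register(register_bits) / 4;
}
```

A 128-bit SSE2 register therefore holds 4 floats (1 pixel), a 256-bit AVX2 register holds 8 floats (2 pixels), and a 512-bit AVX-512 register holds 16 floats (4 pixels).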

darktable has two versions of its IOPs: pure C (scalar, but written so that the compiler can use CPU vector instructions) and OpenCL (vectorized on the GPU). Modern compilers and OpenMP have auto-vectorization options that can optimize pure C, provided the code is written in a vectorizable way and uses some pragmas to give hints to the compiler.
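
As a minimal sketch of what "scalar C written for the vectorizer" can look like, the loop below clamps values to [0, 1] using only conditional expressions and an OpenMP SIMD hint. The function clamp01 is a hypothetical illustration, not a darktable function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: clamp every value of in to [0, 1] and store it in out. */
static void clamp01(const float *const restrict in, float *const restrict out, const size_t n)
{
  assert(in != out); // restrict promises that the two buffers do not overlap

#ifdef _OPENMP
#pragma omp simd
#endif
  for(size_t k = 0; k < n; ++k)
  {
    const float x = in[k];
    // conditional expressions instead of if/else blocks:
    // the compiler can turn them into SIMD min/max or masked selects
    const float lo = (x < 0.0f) ? 0.0f : x;
    out[k] = (lo > 1.0f) ? 1.0f : lo;
  }
}
```

With GCC, building with something like -O3 -fopenmp-simd honors the pragma without pulling in the OpenMP runtime, and -fopt-info-vec reports which loops were vectorized. Other compilers have their own equivalents.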

Write vectorizable code : https://info.ornl.gov/sites/publications/files/Pub69214.pdf

Best practices for auto-vectorization:

  • avoid branches in loops that change the control flow. Use conditional expressions like absolute = (x > 0) ? x : -x; so they can be converted to bit masks in SIMD,
  • pixels should only be referenced from the base pointer of their array and the indices of the loops, not from implicit pointer increments, for example:
float *image = (float *)in; 
for(size_t i= 0; i < height; ++i)
{
  float *pixel = (float *)image + i * width;
  for(size_t j = 0; j < width; ++j)
  {
    *pixel = whatever;
    pixel++;
  }
}

should be written :

float *const restrict image = (float *)in; 
for(size_t i = 0; i < height; ++i)
{
  for(size_t j = 0; j < width; ++j)
  {
    image[i * width + j] = whatever;
  }
}

In the former, the address pointed to by pixel depends on the previous loop iteration, which prevents parallelization and vectorization and makes the code harder to read. The latter uses indexing that depends only on the i and j loop counters, avoids false aliasing, and is easier to read (we immediately spot the array indexing).

  • avoid carrying struct arguments in functions called in loops, and unpack the struct members before the loop. Vectorization can't be performed on structures, but only on float and int scalars and arrays. For example:
typedef struct iop_data_t
{
  float pixel[4];
  float factor;
} iop_data_t;

float foo(float x, const struct iop_data_t *bar)
{
  return bar->factor * (x + bar->pixel[0] + bar->pixel[1] + bar->pixel[2] + bar->pixel[3]);
}

void loop(const float *in, float *out, const size_t width, const size_t height, const struct iop_data_t *bar)
{
  for(size_t k = 0; k < height * width; ++k)
  {
    out[k] = foo(in[k], bar); // the non-vectorized function will be called at each iteration (expensive)
  }
}

should be written:

typedef struct iop_data_t
{
  float pixel[4] DT_ALIGNED_PIXEL; // align on 16-byte addresses
  float factor;
} iop_data_t;

#ifdef _OPENMP
#pragma omp declare simd
#endif
/* declare the function vectorizable and inline it to avoid calls from within the loop */
static inline float foo(const float x, const float pixel[4], const float factor)
{
  float sum = x;

  /* use a SIMD reduction to vectorize the sum */
  #ifdef _OPENMP
  #pragma omp simd aligned(pixel:16) reduction(+:sum)
  #endif
  for(size_t k = 0; k < 4; ++k)
    sum += pixel[k];

  return factor * sum;
}

void loop(const float *const restrict in, // use constant pointers and restrict keyword to avoid false-aliasing
          float *const restrict out,
          const size_t width, const size_t height, const struct iop_data_t *const bar)
{
  /* unpack the struct members */
  const float *const restrict pixel = bar->pixel;
  const float factor = bar->factor;

  #ifdef _OPENMP
  #pragma omp parallel for simd default(none) \
  dt_omp_firstprivate(in, out, pixel, factor, width, height) \
  schedule(simd:static) aligned(in, out:64)
  #endif
  for(size_t k = 0; k < height * width; ++k)
  {
    /*
    * now the code of the function foo is copied inside the loop
    * so we avoid function calls
    * and the compiler can vectorize the content of foo at the loop level
    * for example, on AVX2 platforms (256-bit registers), the compiler could
    * optimize the function to process 8 elements of out and in at every loop
    * step to save cycles.
    */
    out[k] = foo(in[k], pixel, factor);
  }
}
  • if you use nested loops (e.g. loop on the width and height of the array), declare the pixel pointers in the innermost loop and use collapse(2) in the OpenMP pragma so the compiler will be able to optimize the cache/memory use and split the loop more evenly between the different threads,
  • use flat indexing of arrays whenever possible (for(size_t k = 0 ; k < ch * width * height ; k += ch)) instead of nested width/height/channels loops,
  • use the restrict keyword on image/pixels pointers to avoid aliasing and avoid in-place operations on pixels (*out must always be different from *in) so you don't trigger variable dependencies between threads,
  • align arrays on 64-byte boundaries and pixels on 16-byte boundaries, so the CPU can load full cache lines and aligned vector loads do not segfault,
  • write small functions and optimize locally (one loop/function), using OpenMP and/or compiler pragmas,
  • keep your code stupid simple and systematic, and avoid smart-ass pointer arithmetic because it will only lead the compiler to detect variable dependencies and pointer aliasing where there are none,
  • avoid type casts,
  • declare input/output pointers as *const and variables as const to avoid false-sharing in parallel loops (using the shared(variable) OpenMP clause),
  • look at the RawTherapee source code because these guys got it right.
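
Several of the practices above (64-byte aligned buffers, restrict-qualified const pointers, flat indexing, out-of-place processing) can be combined as in the following sketch. It assumes RGBA float pixels and uses the C11 aligned_alloc function rather than darktable's own allocator. alloc_pixels and apply_gain are hypothetical names, and the OpenMP clauses are kept simpler than in darktable's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n_pixels RGBA float pixels on a 64-byte boundary.
 * Each pixel takes 16 bytes, so every pixel also starts on a 16-byte boundary.
 * The caller releases the buffer with free(). */
static float *alloc_pixels(const size_t n_pixels)
{
  // aligned_alloc expects a size that is a multiple of the alignment: round up
  const size_t bytes = (n_pixels * 4 * sizeof(float) + 63) / 64 * 64;
  return (float *)aligned_alloc(64, bytes);
}

/* Hypothetical filter: multiply every channel of every pixel by gain, out of place. */
static void apply_gain(const float *const restrict in, float *const restrict out,
                       const size_t width, const size_t height, const float gain)
{
  const size_t ch = 4; // channels per pixel
  assert(in != out);   // no in-place processing
  assert((uintptr_t)in % 64 == 0 && (uintptr_t)out % 64 == 0);

#ifdef _OPENMP
#pragma omp parallel for simd schedule(simd:static) aligned(in, out:64)
#endif
  for(size_t k = 0; k < ch * width * height; k += ch)
  {
    // flat indexing: the address depends only on k and c
    for(size_t c = 0; c < ch; ++c)
      out[k + c] = gain * in[k + c];
  }
}
```

The assertions only document the assumptions (distinct, 64-byte aligned buffers) and disappear in builds that define NDEBUG.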

Coding Style

To facilitate collaboration, a coding style guide is in order.

Defined by use so far:

  • use American English spelling, especially for user-visible strings
  • spaces instead of tabs
  • shiftwidth=2
  • remove trailing white space
  • { and } on their own lines
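
A short sketch of what these conventions look like in practice (spaces only, two-space indentation, braces on their own lines). The function is hypothetical and only serves to show the layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: sum the strictly positive values of an array. */
static float sum_positive(const float *const values, const size_t count)
{
  assert(values != NULL);
  float sum = 0.0f;
  for(size_t k = 0; k < count; ++k)
  {
    if(values[k] > 0.0f)
    {
      sum += values[k];
    }
  }
  return sum;
}
```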

Here are two modelines that you can add to your source files that will help with sticking to these defaults:

// vim: shiftwidth=2:expandtab:tabstop=2:cindent
// kate: tab-indents: off; indent-width 2; replace-tabs on; indent-mode cstyle; remove-trailing-space on;

There is a tool in the repository that will beautify any code in the tree. It is here:

tools/beautify_style.sh

For emacs, all that's needed is the following content in the file $TOP/.dir-locals.el:

((c-mode . ((c-file-style . "bsd")
        (c-basic-offset . 2)
        (indent-tabs-mode . nil))))

In newer versions of darktable, this file should already be present.

Views

Preferences

Modules

Modules are the interfaces for IOPs, i.e. image-processing filters stacked in the pixelpipe. IOPs can be found in src/iop and the IOP API can be found in the header src/iop/iop_api.h.

Most IOPs have two variants of their pixel-filtering part:

  1. a pure C implementation, in process()
  2. an OpenCL version, offloading the computation to the GPU, in process_cl().

An example of a dummy IOP can be found in src/iop/useless.c and used as a boilerplate.

If you add a new IOP, be sure to add the C file to src/iop/CMakeLists.txt#L69 and to set its priority in the pixelpipe by adding a new node in tools/iop_dependencies.py

Libs