Commits on Dec 7, 2016
  1. Compression engine: add support for lz4_hc

    As noted in the code, lz4_hc support comes "free" with the lz4 library.
    lz4_hc is a high compression variant of lz4, using an identical block
    and frame format (so there's no unique decompression function), but with
    a much more aggressive search strategy during compression.  This yields
    better compression ratios at the cost of much slower compression.
    Note that decompression speed is identical to regular lz4, so lz4_hc is
    a good candidate when there is plenty of time available for compression,
    but decompression performance is at a premium.  I'm considering using it
    for PD's new resource file format, for example.
    Anyway, lz4_hc compression is now supported both throughout PD and
    within pdPackages specifically.  Note that the pdPackages spec isn't
    affected by this - the file flags still just use LZ4 as a descriptor,
    since only the LZ4 decompressor is required.
    committed Dec 7, 2016
  2. Continued performance improvements

    - All temporary icon and image files are now written using internal PD
    file formats, as these formats are both 1) much faster to read/write,
    and 2) generally smaller than comparable PNG files.  This yields large
    performance improvements in both the Undo/Redo engine, and thumbnail
    reads/writes for the "Recent Files" menu.
    - The Undo engine has been reworked to report profiling information.
    This turned up some surprising locations where stalls may occur, and
    I've now optimized a few of those areas by hand.  Additional work is
    - The central system for determining thumbnail quality algorithms has
    been recalibrated.  Thumbnails are now resized using a faster algorithm
    by default, which provides reasonably good results in about 1/2 the time
    of the previous method.  The previous method remains available by
    setting thumbnail quality to MAXIMUM in the Options dialog.
    (Similarly, you can turn down thumbnail quality for a performance boost.)
    - The thumbnail performance setting is now cached in the preferences
    manager instead of a bare global variable.  I intend to move other
    global performance variables to this system as well.
    committed Dec 7, 2016
  3. Write hand-optimized versions of all remaining blend modes

    I finally bit the bullet and completed this horrible project.  All blend
    modes are now well-optimized, which is particularly important while
    painting, obviously.  The HSL-based functions in particular have
    received special attention, and are now much more aggressively tuned
    committed Dec 7, 2016
Commits on Dec 6, 2016
  1. Paintbrush engine: tons of performance improvements

    In particular, a bunch of work has been focused on improving the "end of
    stroke" stutter as the paint layer is permanently merged down onto the
    layer beneath it.  A large series of optimizations are now attempted,
    including checks for things like "can we fit the paint stroke into the
    existing DIB" or "what is the smallest overlapping region of the stroke
    and original layer we have to handle specially?"  New optimized pathways
    have also been built into the central compositor, pixel blender, and DIB
    assistant module, with extra attention given to improving painting in
    non-standard alpha modes.
    Painting on large layers in particular should now be much more fluid
    between strokes, particularly when non-standard blend and/or alpha modes
    are active.
    Additional optimizations have also been applied to the paint engine
    itself, including improved formulas for calculating update rects.  This
    ekes out a few more FPS while under heavy load.
    committed Dec 6, 2016
  2. pdStream: a new solution for generic I/O streams

    For several years now, the clsBasicBuffer class (by vbforums user
    "dilettante")
    has served as the backbone for pdPackages.  It's a very nice little byte
    buffer class, but I've slowly hacked it to pieces to support my own
    needs, and if I want to improve pdPackages further, it's time to move to
    a new solution designed from the ground-up for performance.
    Enter pdStream.  pdStream is also a class built around byte buffer
    management, but with some significant differences from the old solution.
    1) pdStream makes it easy to do crazy things like write/read large
    chunks of data to/from bare pointers.  PD does this a lot to avoid
    unnecessary data copying between 3rd-party compression libraries, and if
    we can compress things like large byte arrays directly into the target
    buffer, it saves us a ton of unnecessary allocations.
    2) pdStream no longer exposes a "chunk size" that the user controls.
    Instead, allocations are made automatically, using a strategy that
    automatically adjusts based on the size of the buffer and the size of
    incoming write requests.  This lets us create pdPackage objects with
    significantly fewer repeat allocations, which again meaningfully
    improves performance.
    3) This feature is still under construction, but pdStreams can be
    created in two modes: the default memory-backed mode, or a new
    file-backed mode.  In file-backed mode, pdStream wraps file-only APIs
    and uses no additional memory to hold the stream's data.  This lets you
    use the same pdStream interface to write data directly to file (instead
    of building up the stream in memory, then dumping the whole thing to
    file at once).  Similarly, when reading a stream from an existing file,
    we no longer need to copy the whole file into memory before using the
    stream's interface.  (Besides just performance, this will also allow us
    to support file sizes > 2GB!)
    As I said, that last feature is still under construction, but it will be
    an important part of continued pdPackage performance improvements.
    Memory-mapped files in particular will be used where possible, making
    pdStream a much better end-to-end solution for our unique buffering
    In the meantime, pdPackages have been fully migrated to pdStream, and
    PD's PDI file load/save functions have also been updated to make use of
    new functionality.  I haven't done actual benchmarking between the old
    and new systems to compare performance, but memory usage is way down
    compared to the old system, and things like file saving (particularly
    Undo/Redo) certainly "feel" snappier.
    Thanks also to Roy K for an updated German language file, which I've
    crammed into this commit.
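
    To illustrate point 2 above (automatic allocations instead of a
    user-controlled chunk size), here is a minimal sketch in Python.  This
    is not pdStream's actual code: the class name, the 50% growth factor,
    and the reallocation counter are all assumptions made for illustration.
    The idea is simply that the buffer grows by whichever is larger - a
    fraction of its current capacity, or the size of the incoming write.

```python
class GrowableStream:
    """Memory-backed byte stream whose capacity grows automatically."""

    def __init__(self, initial_capacity=4096):
        self._buf = bytearray(initial_capacity)
        self._size = 0          # number of bytes actually written
        self.reallocations = 0  # how many times the buffer had to grow

    def _ensure_capacity(self, extra):
        needed = self._size + extra
        if needed <= len(self._buf):
            return
        # Hypothetical policy: grow by at least 50% of the current capacity,
        # and never by less than the incoming write request requires.
        new_capacity = max(needed, len(self._buf) + len(self._buf) // 2)
        self._buf.extend(bytes(new_capacity - len(self._buf)))
        self.reallocations += 1

    def write(self, data):
        self._ensure_capacity(len(data))
        self._buf[self._size:self._size + len(data)] = data
        self._size += len(data)

    def getvalue(self):
        return bytes(self._buf[:self._size])


# 1,000 small writes into a deliberately tiny initial buffer
stream = GrowableStream(initial_capacity=16)
for _ in range(1000):
    stream.write(b"x" * 10)
```

    Because capacity grows geometrically, the number of reallocations
    scales with the logarithm of the final size rather than with the
    number of writes.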
    committed Dec 6, 2016
Commits on Dec 4, 2016
  1. LZ4 now exists as an available compression engine

    Lots of changes here; let's try to go through them quickly...
    1) In a nutshell: the high-performance LZ4 engine
    is now available as a compression library,
    and all of PD's internal temp caches now use it preferentially.
    2) Why another compression library?  Because LZ4 is one of the few
    compression libraries where: [compression time] + [time to write
    compressed data to HDD] < [time to write uncompressed data to HDD].
    Most compression algorithms are slower than typical HDD I/O speeds, so
    while they provide great benefits over a network, for local use, they
    exist purely to save HDD space.  LZ4 is unusual here: its sweet spot in
    the compression speed / compression ratio tradeoff provides tangible
    performance benefits.  (Note that HDD times
    here refer to a traditional 7200 RPM drive; SSDs are still faster for
    raw uncompressed writes, but they tend to have bigger problems with
    available space, so the tradeoff is still a good one for us).
    3) As a quick comparison, note that LZ4 outperforms Google's Snappy
    algorithm *and* Apple's LZFSE by large margins.  It has become the
    default algorithm for a number of database engines and storage platforms
    (e.g. OpenZFS), so it
    is very stable and still under active development - all good things.
    See the squash benchmark for more detailed results:
    4) When doing something like "Save Image to PDI", PD still prefers to
    use zstd, because its compression ratios are so good (and
    compression/decompression performance is still way better than zlib).
    However, when saving something like Undo/Redo data, our main concern is
    snappy performance, particularly while the user is painting.  This is
    where LZ4 comes in.  Using it for Undo/Redo compression actually results
    in *better* performance than writing uncompressed data, while
    simultaneously saving the user's HDD space.  Very exciting.
    5) As of this build, PD now defaults to compressing Undo/Redo data using
    LZ4.  From the Options dialog, the user can still disable compression
    entirely, if desired, or request even stronger compression (which
    results in PD silently switching to zstd).
    6) The LZ4 DLL includes a high-compression alternative called "lz4hc",
    which compresses even better than stock LZ4 while having identical
    decompression speeds.  PD does not currently expose this mode as zstd
    fits that particular niche better for us.  I may look at changing this
    for Undo/Redo data specifically, as lz4's extreme decompression speeds
    are helpful there.
    7) The central Compression module handles all lz4 interop transparently,
    so you really don't need to know details about its internals to use it.
    Just remember that...
    - zLib is for legacy use only
    - zstd should be used when compression ratio is at a premium
    - lz4 should be used when performance is at a premium (for *any* data
    that hits the HDD, including temp data)
    8) I will be looking for more compression applications throughout the
    project in the coming weeks, but for now, compression engine work is
    being set aside for other tasks.
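
    The inequality in point 2 can be made concrete with a little
    arithmetic.  The sketch below is illustrative only: the function names
    are made up, and the throughput and ratio figures in the example calls
    are placeholder values, not measurements of LZ4, zlib, or any real
    drive.

```python
def raw_write_seconds(size_mb, disk_mb_per_s):
    """Time to write the data uncompressed."""
    return size_mb / disk_mb_per_s


def compressed_write_seconds(size_mb, ratio, compress_mb_per_s, disk_mb_per_s):
    """[compression time] + [time to write the compressed data]."""
    compress_time = size_mb / compress_mb_per_s
    write_time = (size_mb / ratio) / disk_mb_per_s
    return compress_time + write_time


def compression_pays_off(size_mb, ratio, compress_mb_per_s, disk_mb_per_s):
    """True when compressing first is faster than writing the raw data."""
    return (compressed_write_seconds(size_mb, ratio, compress_mb_per_s, disk_mb_per_s)
            < raw_write_seconds(size_mb, disk_mb_per_s))


# Placeholder numbers: 100 MB of data, a disk that sustains 100 MB/s.
# A fast compressor (400 MB/s, 2:1 ratio) costs 0.25 s + 0.5 s = 0.75 s.
fast_wins = compression_pays_off(100, 2.0, 400, 100)
# A slow compressor (50 MB/s, 3:1 ratio) costs 2.0 s + ~0.33 s.
slow_wins = compression_pays_off(100, 3.0, 50, 100)
```

    With these placeholder values, the fast compressor beats the 1.0 s raw
    write while the slow one does not, even though the slow one produces a
    smaller file.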
    committed Dec 4, 2016
  2. Clipboard: fix problem with upside-down PNG format

    FreeImage is killing me these days.  If I had an alternative, I'd use
    it, but I still don't know of a comparable VB-friendly image I/O
    engine... sigh...
    committed Dec 4, 2016
  3. New unified compression interface

    The new Compression module makes it trivial to perform compression and
    decompression tasks against any backend.  Simply call the standard
    compression/decompression functions, specifying the compression engine
    you want, and the module silently takes care of the rest.
    This greatly simplifies the design of pdPackager2, and should allow me
    to remove a lot more redundant compression code from throughout the
    project.  It also makes it much easier to add new compression libraries
    to the project, as it's just the addition of a new compression engine
    enum to the central Compression module; everywhere else in the project
    inherits that enum "automagically".
    As part of this change, the default compression engine for internal PD
    bits is now zstd.  This accelerates a number of previously slow
    processes, but don't get too comfortable, because they're soon going to
    get even faster...
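
    As a rough sketch of the "engine enum plus central dispatch" idea
    described above (in Python, not PD's actual VB6 module): the enum
    members, function names, and dispatch table here are hypothetical, and
    Python's built-in zlib, lzma, and bz2 codecs stand in for the engines
    PD actually uses.

```python
import bz2
import lzma
import zlib
from enum import Enum


class CompressionEngine(Enum):
    # Hypothetical engine list; stdlib codecs are stand-ins only.
    NONE = 0
    ZLIB = 1
    LZMA = 2
    BZ2 = 3


# One table maps each engine to its (compress, decompress) pair.
# Adding a new library means adding one enum member and one table row.
_ENGINES = {
    CompressionEngine.NONE: (lambda data: data, lambda data: data),
    CompressionEngine.ZLIB: (zlib.compress, zlib.decompress),
    CompressionEngine.LZMA: (lzma.compress, lzma.decompress),
    CompressionEngine.BZ2: (bz2.compress, bz2.decompress),
}


def compress(data, engine):
    """Compress bytes with the requested engine."""
    return _ENGINES[engine][0](data)


def decompress(data, engine):
    """Decompress bytes that were produced by the same engine."""
    return _ENGINES[engine][1](data)


payload = b"layer pixels " * 500
round_trips = {
    engine: decompress(compress(payload, engine), engine) == payload
    for engine in CompressionEngine
}
```

    Callers never touch a backend directly; they only pass an engine
    value, which is what lets the rest of a project pick up new engines
    without further changes.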
    committed Dec 4, 2016
Commits on Dec 2, 2016
  1. Finalize zstd integration into pdPackage v2

    pdPackage now supports zstd as a compression library.  It should be used
    in place of zlib wherever possible (although zlib is still fully
    supported throughout the library).
    A few notes, while I'm here:
    1) pdPackage v2 no longer writes or verifies specialized checksums.
    There are a few reasons for this; most significantly, the compressed
    streams themselves have internal checksums, and if they pass all checks,
    we're perfectly happy to try and load the data.  CRC32 values are easily
    forged and as PDI is not a transmission format, the checksums embedded
    in the compression streams are as good a failsafe as any against rare
    events like corruption.  (Also, zstd uses a better checksum method than
    crc32 anyway, so it's stupid for us to apply crc32 on top of their
    method.)  Note that this also provides a meaningful performance boost
    during both reading and writing, which is never unwelcome.
    2) Many function signatures have changed in pdPackage v2.  They may
    change again in the coming weeks.  I'm trying to simplify the process of
    using pdPackages, which necessitates some interface modifications from
    3) When creating packages, compression settings are no longer
    "true/false".  You must now explicitly set the compression engine for
    each added node (e.g. none, zlib, zstd).  zstd is the new default
    throughout the library.  (And obviously, as before, no input is required
    when reading a package - pdPackage v2 handles that silently.)
    4) The zip-like quick interfaces for packaging files and folders have
    not been well-tested throughout this overhaul.  Use at your own risk.  I
    can't roll PD's auto-update system over to these (as old systems don't
    have zstd available), so pdPackage v1 will continue to be used for
    auto-updates for the foreseeable future.
    5) I've recompiled libzstd using VS 2015, which will hopefully eke out a
    bit more performance on newer systems.  Let me know if you run into
    anything strange, but my few quick tests on an old XP box show no
    obvious problems.  (As always, I can't guarantee that PD will work on
    older systems *unless* they are up-to-date on Windows Updates - don't
    forget this!)
    6) Next up is work on a preliminary streaming interface for pdPackage.
    I've got a few different ideas for this, so there may be a delay as I
    test them out and see which produces the largest benefit(s).
    committed Dec 2, 2016
  2. zstd library: integrate custom-built DLL as plugin

    zstd info available here:
    This custom build is basically just a stock build of the latest GitHub
    repo, with options set to enable XP compatibility.  XP testing has not
    actually occurred yet, because compression integration is still ongoing.
    Note that zLib will always be available for legacy reasons, but the plan
    is to gradually roll all internal PD compression functions over to zstd,
    which is significantly faster (or alternatively, which compresses better
    than zLib if you slow it down to zLib levels).
    This commit also includes a "hack-and-slash" implementation of zstd
    compression inside pdPackager2.  I can confirm that compression and
    decompression both work beautifully, and they were in fact quite simple
    to implement.  Preliminary results show slightly larger PDI files using
    default settings, but PDI saving time is improved ~3.5x over zLib.  On my
    standard 5-layer, 5-megapixel test image, save time drops from ~1.0
    second to ~0.28 seconds over repeat tests, while saved size increases to
    ~5,800 KB from ~5,200 KB; if settings are tweaked to produce comparable
    save file sizes, save time is about 0.5 seconds, or still twice
    as fast.  A similar test on a 2-layer, 20-megapixel image shows save
    time reduced from 4.5 seconds to 1.4 seconds.
    Next up is additional rounds of performance testing to make sure this
    switch is worth it, and if that pans out, I'll properly integrate
    controllable compression engine settings into pdPackager2.
    committed Dec 2, 2016
Commits on Dec 1, 2016
  1. PDI file format: roll over various engines to v2

    v2 of the PDI format is going to be built on v2 of the pdPackager class.
    This format will break backwards compatibility with v1 in a number of
    significant ways.  It will also all be handled transparently to the end
    user, so there's nothing here for users to be concerned with.
    On my end, v2 is going to be a significant improvement for both memory
    usage and performance.  pdPackager2 is going to completely do away with
    the need to maintain a running copy of the archive in RAM.  Instead, it
    will write data immediately out to file as it receives it, allowing us
    to work with much larger files and with improved asynchronous behavior
    (thanks to everything being built around memory-mapped files from the
    Also new to v2 is support for the new zstd compression library, which
    will give us much better compression performance at identical or better
    compression ratios than v1.  Integrating zstd is still "to-do", however.
    committed Dec 1, 2016
  2. pdLayer: lay the framework for migrating ICC profile management here

    With this, it's now time to do something scary: start work on an upgrade
    to the PDI file format.  I'm going to do what I should have done in the
    first place (sigh) and use the .zip file approach of sticking the
    directory at the *end* of the file.  This lets us write compressed data
    out to file immediately, without the need for an in-memory cache, and
    when we're done we just stick our finished directory after all the
    compressed data.
    Also, I'm looking at moving from zLib to Facebook's zstd library,
    which provides compression ratios nearly identical to zLib, but with
    significantly faster compression and
    decompression time.  This would be a large win for us, particularly when
    writing Undo/Redo data.
    Anyway, this all relates to ICC profiles because a new chunk is needed
    inside PDI files for storing all ICC data for all layers (and
    potentially the image's working space, too).  But I don't want to
    implement this until the new PDI implementation is finished.
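
    The "directory at the end of the file" approach can be sketched as
    follows.  This is a toy layout invented for illustration, not the PDI
    or pdPackage format: the JSON directory, the 16-byte trailer, and the
    function names are all assumptions, and zlib is used only because it
    ships with Python.

```python
import io
import json
import struct
import zlib


def write_package(out, nodes):
    """Write each node's compressed data immediately, then the directory."""
    directory = []
    for name, payload in nodes:
        compressed = zlib.compress(payload)
        # Record where this node lives, then write it straight to the file.
        directory.append({"name": name, "offset": out.tell(),
                          "size": len(compressed)})
        out.write(compressed)
    # Only the small directory is ever held in memory.
    dir_offset = out.tell()
    dir_bytes = json.dumps(directory).encode("utf-8")
    out.write(dir_bytes)
    # Fixed-size trailer: where the directory starts and how long it is.
    out.write(struct.pack("<QQ", dir_offset, len(dir_bytes)))


def read_node(src, wanted):
    """Locate the directory via the trailer, then read a single node."""
    src.seek(-16, io.SEEK_END)
    dir_offset, dir_size = struct.unpack("<QQ", src.read(16))
    src.seek(dir_offset)
    directory = json.loads(src.read(dir_size).decode("utf-8"))
    for entry in directory:
        if entry["name"] == wanted:
            src.seek(entry["offset"])
            return zlib.decompress(src.read(entry["size"]))
    raise KeyError(wanted)


package = io.BytesIO()
write_package(package, [("layer0", b"a" * 1000), ("layer1", b"hello")])
second_layer = read_node(package, "layer1")
```

    The writer never needs to know the final node count or sizes up
    front, which is what removes the need for an in-memory copy of the
    whole archive.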
    committed Dec 1, 2016
Commits on Nov 30, 2016
  1. Use ExifTool to extract ICC profiles from formats FreeImage doesn't s…

    Also, a generic function now exists to use ExifTool to extract *any*
    embedded ICC profile out to a standalone file.
    Because metadata is extracted asynchronously, the base image may appear
    on-screen before ExifTool pulls the ICC data for us.  This may cause the
    image to "flicker" as its correct profile is applied.  I don't consider
    this a problem, as the cost of fixing it is waiting for ExifTool
    synchronously, which is simply not feasible from a performance standpoint.
    committed Nov 30, 2016
  2. LittleCMS: add support for retrieving basic ICC profile info

    We need access to things like "profile name" if we want to allow the
    user to assign a new profile to an image or layer.
    committed Nov 30, 2016
Commits on Nov 29, 2016
  1. Activate color management for all remaining UI elements

    ...including effect previews.  I've done my best to make sure this
    doesn't interfere with performance, but please let me know if you run
    into anything unexpected.
    I've also switched to a custom internal window message to broadcast
    color profile changes (including events like dragging the PD window to a
    different display), which greatly reduces the amount of custom code
    required to redraw every on-screen element that may be affected.  Yay!
    I also discovered that (as usual) alpha premultiplication introduces
    some unique quirks into color management.  I've now added the ability to
    override alpha handling if it's known that a DIB does not contain
    meaningful alpha values.  This is crucial in places like the viewport,
    where we only color-manage the final "ready for the screen" composited
    image (which contains no meaningful alpha data, meaning we can skip its
    processing entirely).
    Also, I've reworked the transform cache system to correctly manage
    transforms for both 24-bpp and 32-bpp source DIBs.  24-bpp DIBs are
    still used in some places throughout the program, although I'm slowly
    replacing them with 32-bpp versions, which are much simpler as we don't
    have to deal with stride issues.
    committed Nov 29, 2016
  2. Reactivate color management for all color selection controls and dialogs

    Also, the image tabstrip is now color-managed when visible.
    committed Nov 29, 2016
  3. Reinstate color management for the main viewport

    I wanted to do this before proceeding with paint tools, because I need
    to know what kind of delay is induced by problematic ICC profiles (like
    those generated by the stupid Windows "calibrate your monitor without
    using professional tools!" Control Panel applet).
    Maximum delay across my set of test profiles is ~15-25 ms, with
    professionally constructed profiles coming in closer to ~10 ms.  This is
    an excellent result, and significantly better than the old Windows ICM
    engine (which frequently hit delays over 100 ms, especially on profiles
    constructed by Windows).
    Note that users who don't want a color-managed display can now
    deactivate the engine entirely from the Tools > Options dialog.  This
    will provide a performance boost, and if your display is close to sRGB
    anyway, it might be worth it.
    Next up is restoring color management to elements like the image
    tabstrip and the color selection tools.
    Thank you also to Roy K and ChenLin for contributing updated translation
    files to this patch.
    committed Nov 29, 2016
Commits on Nov 28, 2016
  1. Tools > Options: overhaul in preparation for 7.0 changes

    Some rearranging was in order, as a lot of program behavior has changed
    over the course of this development cycle.  My apologies about the
    translation changes this will require.  :/
    committed Nov 28, 2016
  2. Color Management: new foundation for a Photoshop-like experience

    Migrating from the Windows ICM engine to LittleCMS opened the door to
    huge improvements to PD's color management approach.  Instead of being
    tied to sRGB for everything, we now have the ability to use any working
    space we want - or rather, any working space the *user* wants, which is
    what ultimately matters.
    Of course, enabling this requires rethinking a number of core
    architectural decisions relating to color management.  This will be a
    multi-commit project with many changes throughout core PD classes.
    First up, this commit reworks the way PD manages the "working space to
    display" transform.  I've added new user preferences to increase user
    control over display handling, and the entire under-the-hood approach to
    display color management has been improved.  (For example, displays are
    now tracked by EDID serial number, which guarantees that color
    management settings correctly persist for detachable displays.)
    One of the most important LittleCMS improvements is the ability to
    optimize color transforms and cache them; this provides large
    performance boosts whenever a transform is reused - and the current
    "working space to display" transform is used every time the screen is
    redrawn!  I've reworked the "CheckParentMonitor" function to be much
    more aggressive about skipping unneeded work, which should further
    improve viewport performance once the color management engine is turned
    back on.
    Speaking of that - display color management is *not* currently active in
    the viewport pipeline, by design.  I'm in the process of rebuilding the
    transform code against LittleCMS, but at present, display ICC settings
    will not be visible in the main viewport.
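
    The benefit of caching a transform that is reused on every redraw can
    be sketched like this.  It is a generic cache in Python, not PD's
    transform cache: the class, the key layout, and the placeholder
    build_fake_transform function (which stands in for an expensive call
    into a color engine) are all assumptions.

```python
class TransformCache:
    """Builds each (working space, display profile) transform only once."""

    def __init__(self, build_transform):
        self._build = build_transform
        self._cache = {}
        self.builds = 0  # number of expensive builds actually performed

    def get(self, working_space, display_profile):
        key = (working_space, display_profile)
        if key not in self._cache:
            self._cache[key] = self._build(working_space, display_profile)
            self.builds += 1
        return self._cache[key]


def build_fake_transform(working_space, display_profile):
    # Placeholder for an expensive color-engine call.
    return f"{working_space} -> {display_profile}"


cache = TransformCache(build_fake_transform)

# Simulate 100 screen redraws on one display...
for _ in range(100):
    cache.get("sRGB", "display-A")

# ...then the window is dragged to a second display.
for _ in range(100):
    cache.get("sRGB", "display-B")
```

    Two hundred redraws trigger only two builds, one per display profile.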
    committed Nov 28, 2016
Commits on Nov 18, 2016
  1. Paintbrush: massive performance improvements

    The brush engine and compositor now work together to only redraw the
    portion of the paint stroke cache that's changed since the last stroke
    render.  This makes paintbrush performance pretty much independent of
    image size or layer count, which is the ideal performance scenario.
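
    One common way to redraw only what has changed is to accumulate a
    "dirty rectangle" between renders.  The sketch below shows that general
    technique in Python; it is an assumption about the approach, not PD's
    brush engine, and every name in it is hypothetical.  Rectangles are
    (left, top, right, bottom) tuples.

```python
def union_rect(a, b):
    """Smallest rectangle containing both a and b (either may be None)."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def dab_rect(x, y, radius):
    """Bounding box of a single circular brush dab."""
    return (x - radius, y - radius, x + radius, y + radius)


class StrokeTracker:
    """Accumulates the region touched since the last render."""

    def __init__(self):
        self._dirty = None

    def add_dab(self, x, y, radius):
        self._dirty = union_rect(self._dirty, dab_rect(x, y, radius))

    def take_dirty_rect(self):
        """Return the region to redraw, then reset for the next frame."""
        rect, self._dirty = self._dirty, None
        return rect


tracker = StrokeTracker()
tracker.add_dab(10, 10, 5)
tracker.add_dab(20, 12, 5)
first_frame = tracker.take_dirty_rect()   # covers both dabs
second_frame = tracker.take_dirty_rect()  # nothing new was painted
```

    The cost of each redraw then depends on the size of the dirty region,
    not on the dimensions of the image.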
    committed Nov 18, 2016
Commits on Nov 16, 2016
  1. Directly expose paintbrush preview quality setting

    This is very helpful during testing
    committed Nov 16, 2016
Commits on Nov 15, 2016
  1. Implement alpha locking and erase blend mode

    These two things don't seem related, but because they both live inside
    pdCompositor, it was relatively easy to implement them simultaneously.
    The "Erase" blend mode is (in my mind, anyway) an elegant solution to an
    "eraser" tool.  Basically, it allows us to use *any* object as an
    eraser!  This means that any brush in the program - with all associated
    brush features - can be used to erase, or any shape, vector layer, text
    layer, or image layer.
    As an example, load an image, add a text layer, then set the text
    layer's blend mode to "erase".  Pretty cool!
    (I'll still implement a dedicated "erase" tool at some point, but it
    will just silently wrap this behavior around a standard brush
    Also, alpha locking is now available as an alpha mode.  This allows you
    to paint on any layer or image without modifying its alpha channel.
    Please note that locked alpha *only applies to paint tools* at present,
    *not* filters or adjustments.  Other software handles this
    inconsistently (e.g. some programs ignore the setting for filters and
    adjustments, which is my preferred behavior, but GIMP honors it even
    though it may lead to undesirable output) and I haven't entirely made up
    my mind about how to handle this in PD.  There are pros and cons to both
    methods.  Input welcome!
    (Also, a quick note - these features were ported over very quickly from
    my separate paint tool test program, so I haven't tested them
    thoroughly.  Please let me know if you run into any bugs.)
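
    For readers unfamiliar with these two modes, here is a per-pixel
    sketch in Python.  These are common textbook formulations using
    straight (non-premultiplied) RGBA floats in the 0-1 range; they are an
    assumption about the general idea, not pdCompositor's actual math.

```python
def blend_erase(dst, src):
    """'Erase': source coverage removes destination alpha; color is kept."""
    r, g, b, a = dst
    return (r, g, b, a * (1.0 - src[3]))


def blend_alpha_locked(dst, src):
    """Alpha lock: paint changes the color, but destination alpha is kept."""
    dr, dg, db, da = dst
    sr, sg, sb, sa = src

    def mix(s, d):
        # Plain linear interpolation toward the source color
        return s * sa + d * (1.0 - sa)

    return (mix(sr, dr), mix(sg, dg), mix(sb, db), da)


opaque_red = (1.0, 0.0, 0.0, 1.0)
half_blue = (0.0, 0.0, 1.0, 0.5)

# Any opaque source pixel fully erases whatever is beneath it.
erased = blend_erase(opaque_red, (0.0, 0.0, 0.0, 1.0))

# A 50%-opaque source pixel removes half of the destination alpha.
half_erased = blend_erase(opaque_red, (0.0, 0.0, 0.0, 0.5))

# Painting opaque red onto a half-transparent blue pixel with alpha locked
# recolors the pixel but leaves it half-transparent.
locked = blend_alpha_locked(half_blue, opaque_red)
```

    Note that the erase function ignores the source color entirely, which
    is why any layer type can act as an eraser.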
    committed Nov 15, 2016
Commits on Nov 14, 2016
  1. pdSlider: new exponential, logarithmic modes

    Some sliders cover an enormous range of values.  If users are likely to
    spend most of their time in a certain subset of values (e.g. a slider
    that goes from 1 to 10,000, but values 1-100 are most likely), a
    non-linear scale makes the slider much more usable.
    There are many sliders in the project that could make use of this
    option, but for now, I've only updated the Paintbrush Size slider.  I'll
    get to others in due time.
    (Also, note that this new setting *only works for sliders with minimum
    values > 0*.  This is by design.)
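
    A logarithmic slider mapping can be sketched as below.  This shows one
    standard formula in Python, not pdSlider's implementation; the
    function names and the normalized 0-1 slider position are assumptions.
    It also shows why a minimum greater than zero is required: the mapping
    depends on the ratio maximum / minimum.

```python
import math


def slider_to_value(position, minimum, maximum):
    """Map a slider position in [0, 1] onto a logarithmic value scale."""
    if minimum <= 0 or maximum <= minimum:
        raise ValueError("requires 0 < minimum < maximum")
    return minimum * (maximum / minimum) ** position


def value_to_slider(value, minimum, maximum):
    """Inverse mapping: where the knob sits for a given value."""
    if minimum <= 0 or maximum <= minimum:
        raise ValueError("requires 0 < minimum < maximum")
    return math.log(value / minimum) / math.log(maximum / minimum)


# On a 1 to 10,000 slider, the midpoint of the track lands near 100,
# so half of the track is devoted to the 1-100 range.
midpoint_value = slider_to_value(0.5, 1, 10000)
knob_for_100 = value_to_slider(100, 1, 10000)
```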
    committed Nov 14, 2016
Commits on Nov 10, 2016
  1. Compositor: improve performance of semi-transparent brushes

    GDI+ is horrifically slow when resizing and changing alpha
    simultaneously, so we now always perform these steps separately.
    committed Nov 10, 2016
Commits on Nov 8, 2016
  1. Minor UI, translation fixes to paint tools

    For now, I'm limiting brush sizes to a minimum of 1.0 pixels.  GDI+
    subpixel behavior is wonky under 1.0, so this is the easiest solution.
    committed Nov 8, 2016
  2. New merge pipeline for paint operations

    Building off the previous commit, the final "merge" of a paint brush
    onto a target layer is now handled by a specialized merge function
    (instead of the regular MergeLayers() function, which uses different
    order rules).
    This means that previewed brush strokes and final brush strokes are now
    identical - yay!
    With this, I believe the basic paintbrush is pretty much free of major
    bugs.  Performance is still not ideal, but I've got a gameplan for
    fixing it.  Now if only I can find the time...
    committed Nov 8, 2016
  3. New compositor pipeline for paintbrushes

    I hadn't originally wanted to do this, given the complexity of PD's
    central compositor, but there's no escaping the performance benefits.
    The painting pipeline is just different enough to require some specific
    tweaks, particularly when it comes to compositing order (as we have to
    composite the paint stroke onto the target layer out-of-order), so we're
    gonna get better long-term performance with a dedicated paint
    At present, the brush stroke compositor uses a very similar strategy to
    the regular compositor.  Specifically, it re-renders the entire brush
    area on every screen refresh.  An obvious future optimization would be
    rendering only the regions of the brush layer that have changed, and
    when I have time, you can bet I'll tackle that!  (The calculations for
    this are going to be a bit complicated, as we don't currently track any
    of this information... but the responsiveness benefits will likely be
    worth the trouble.)
    That said, performance of the current implementation is still
    surprisingly competitive with GIMP and Paint.NET on large multilayer
    Besides performance benefits, the new compositor pipeline correctly
    implements the behavior of painting on a layer with a non-standard
    opacity, blend mode, or alpha inheritance.  The paint stroke and target
    layer are precomposited *first*, and the result is then blended using
    the current layer's settings.  This gets complicated quickly as the
    paint brush can have its own opacity, blend mode, and alpha settings,
    but I believe everything is sorted.
    Unfortunately, PD's "merge layers" tool - which performs the final paint
    stroke merge when the mouse is released - does not behave identically.
    I'll be updating it shortly to support a "paint" mode, which has some
    different considerations from "normal" mode.
    committed Nov 8, 2016
Commits on Nov 7, 2016
  1. Paint tool: implement Undo/Redo integration

    With this, a basic paintbrush tool is effectively implemented.  You can
    modify brush settings, apply strokes, and undo those strokes as
    There are still some weird oddities with things like non-standard blend
    modes (e.g. painting onto a "multiply" layer with a "normal" brush is
    not previewed correctly), but bugs like this are actively being worked on.
    committed Nov 7, 2016
Commits on Nov 6, 2016
  1. Fix issue with pdSlider Got/Lost focus tracking

    Argh, just a typo issue
    committed Nov 6, 2016
  2. Fix a number of Undo/Redo issues with non-destructive properties

    Non-destructive properties are a real PITA when it comes to Undo/Redo.
    When toggling something like layer visibility, we don't want to create
    Undo entries for every change - for example, if the user toggles a
    layer's visibility "on" then "off", we don't want to create an Undo for
    that, as the image is back to its original state.
    This commit should fix bothersome issues where Undo/Redo data would
    seemingly be created out of nowhere, because the system was overly
    aggressive about trying to track changes to non-destructive properties.
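
    The "on then off should not create an Undo entry" rule can be sketched
    by comparing state at commit time rather than recording every property
    change.  This is a simplified illustration in Python with hypothetical
    names; PD's Undo engine is considerably more involved.

```python
class UndoStack:
    """Creates an Undo entry only when state differs from the last one saved."""

    def __init__(self, initial_state):
        self.entries = []
        self._last_saved = dict(initial_state)

    def commit(self, current_state):
        """Call at a natural checkpoint; returns True if an entry was made."""
        if current_state == self._last_saved:
            return False  # net change is zero, so there is nothing to undo
        self.entries.append(dict(self._last_saved))
        self._last_saved = dict(current_state)
        return True


layer = {"visible": True, "opacity": 100}
undo = UndoStack(layer)

# The user toggles visibility off, then back on, before the next checkpoint.
layer["visible"] = False
layer["visible"] = True
created_after_round_trip = undo.commit(layer)

# A change that sticks does produce an entry.
layer["visible"] = False
created_after_real_change = undo.commit(layer)
```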
    committed Nov 6, 2016