pdPackager: further improve node read/write performance

Two major changes here. 1) As much as possible, I've tried to rework pdPackager to not require intermediary arrays when reading/writing, and/or compressing/decompressing node data. This is made possible by new interaction functions with the underlying buffer class, which allow us to use naked pointers and pre-allocated buffers whenever possible, avoiding the need for interop via VB arrays. 2) Raster layers in PDI files were another opportunity for optimizations, as we must pre-allocate a DIB prior to extracting the actual pixel data from file. Historically, pixel data had to be extracted into a temporary array at its compression size, decompressed into a new array at its decompression size, then finally copied into the pre-allocated DIB. Besides wasting a huge amount of memory, it's time-consuming to repeatedly allocate large chunks of memory (50+ MB for a standard photo), allocations that are further compounded when loading images with multiple layers. But no more! Now, pdPackager exposes some (explicitly marked as dangerous) functions for circumstances when an external buffer can be safely pre-allocated, independent of pdPackager. When this happens, it can now perform a blind extraction+decompression directly from the master pdPackage buffer (typically compressed) to a caller-allocated buffer (like a DIB, in PD's case). Temporary buffers are no longer required or allocated, but obviously, these functions are only valid for circumstances where the caller can use other information (in our case, DIB stride, height, color depth) to pre-allocate a correctly sized buffer independent of pdPackage's internal measurements. Anyway, the take-home message is that PDI reading+writing is now much faster, which is particularly great for improving responsiveness of the Undo/Redo stack.
tannerhelland · Aug 12, 2015 · 1090def · 1090def
1 parent 4d0244e
commit 1090def
Show file tree

Hide file tree

Showing 6 changed files with 288 additions and 78 deletions.
diff --git a/Classes/cBasicBuffer.cls b/Classes/cBasicBuffer.cls
@@ -166,9 +166,8 @@ Public Property Get Size() As Long
     Size = mSize
 End Property
 
+'Erase current Buffer contents and reset all internals.
 Public Sub CloseBuf()
-    'Get rid of Buffer contents, prepare for new open and reuse.
-
     Erase Buffer
     mOpen = False
     mPosition = 0
@@ -201,8 +200,8 @@ Public Function CurrentBytesPointer() As Long
 End Function
 
 'Open a new buffer.
-' Added by Tanner: allow the caller to specify a starting buffer size.  This can be helpful if we have some idea
-'  of the buffer's size in advance, as we avoid the need for costly ReDim Preserve statements.
+' Added by Tanner: allow the caller to specify a starting buffer size.  In many cases in PD, I have some notion
+'  of the buffer's size in advance, which spares the need for costly ReDim Preserve statements during buffer construction.
 Public Sub OpenBuf(Optional ByVal startingBufferSize As Long = 0)
 
     If mOpen Then
@@ -218,9 +217,11 @@ Public Sub OpenBuf(Optional ByVal startingBufferSize As Long = 0)
 
 End Sub
 
+'Return the requested number of bytes, or all bytes if -1 is specified as the length.  This function makes no changes
+' to the buffer itself, but it *does* move the current buffer position by [length] bytes.
+' NOTE: you obviously should not pass zero as the length.
 Public Function ReadBytes(Optional ByVal Length As Long = -1) As Byte()
-    'Return requested number of bytes (or less if less in buffer) or
-    'all bytes.
+
     '
     'Requesting Length = 0 bytes (or any bytes when the buffer is empty)
     'raises an exception since we can't return an "empty" Byte array.
@@ -256,16 +257,55 @@ Public Function ReadBytes(Optional ByVal Length As Long = -1) As Byte()
     ReadBytes = Bytes
 End Function
 
-Public Function ScanForBytes(ByRef Target() As Byte) As Long
+'Added by Tanner: variant approach to ReadBytes(), above.  This function behaves identically to ReadBytes in that it
+' moves the buffer pointer according to the requested length.  HOWEVER, it only returns a pointer to the current position,
+' not a full copy of the buffer bytes.  This is ideal for subsequent API interactions like zLib, because we can simply
+' pass it that pointer, sparing us the need for costly (and pointless) MoveMemory calls.
+'
+'IMPORTANT NOTE: adding data to the buffer may force it to allocate new memory, invalidating all previously returned
+' pointers.  You must use the return of this function immediately, as its correctness is not guaranteed if any other
+' class functions are called.
+Public Function ReadBytes_PointerOnly(Optional ByVal Length As Long = -1) As Long
+
+    'This If() block is roughly identical to ReadBytes, above
+    If mOpen Then
+        If Length < 0 Then
+            'A Length = -1 (or < 0) means read all.
+            If EOS Then
+                Err.Raise 5, TypeName(Me), "Nothing to read"
+            Else
+                Length = mSize - mPosition
+            End If
+        ElseIf Length > 0 Then
+            If mPosition + Length > mSize Then
+                'Handle requests extending past EOS.
+                Length = mSize - mPosition
+            End If
+        Else
+            Err.Raise 5, TypeName(Me), "Can't request 0 (no) bytes"
+        End If
+    Else
+        Err.Raise &H8004C300, TypeName(Me), "Not open"
+    End If
+
+    'Return a pointer to the current buffer position
+    ReadBytes_PointerOnly = VarPtr(Buffer(mPosition))
+
+    'Advance the buffer position identically to ReadBytes(), above
+    mPosition = mPosition + Length
+
+End Function
+
+Public Function ScanForBytes(ByRef target() As Byte) As Long
     'Scan forward from Position, looking for Target value which can be
     'one or more bytes.
     '
     'If found returns position index of first byte of Target, else
     'returns -1.
 
     If mOpen Then
-        ScanForBytes = InStrB(mPosition + 1, Buffer, Target) - 1
-        If ScanForBytes > mSize - (UBound(Target) - LBound(Target) + 1) Then ScanForBytes = -1
+        ScanForBytes = InStrB(mPosition + 1, Buffer, target) - 1
+        If ScanForBytes > mSize - (UBound(target) - LBound(target) + 1) Then ScanForBytes = -1
     Else
         Err.Raise &H8004C300, TypeName(Me), "Not open"
     End If

diff --git a/Classes/pdImage.cls b/Classes/pdImage.cls
@@ -1290,7 +1290,7 @@ Public Function getNumOfHiddenLayers() As Long
 End Function
 
 'Estimate the RAM usage for this image, based on the number of layers and the size of each layer
-Public Function estimateRAMUsage() As Long
+Public Function estimateRAMUsage() As Double
 
     estimateRAMUsage = 0
 

diff --git a/Classes/pdLayer.cls b/Classes/pdLayer.cls
@@ -2292,11 +2292,11 @@ Public Function syncInternalDIB() As Boolean
 End Function
 
 'Estimate the RAM usage for this layer, based on known heavy memory users (layer DIB, in particular)
-Public Function estimateRAMUsage() As Long
+Public Function estimateRAMUsage() As Double
 
     'Calculate the raw memory size of both the layer and the cached thumbnail DIB, if any
     If Not (layerDIB Is Nothing) Then estimateRAMUsage = layerDIB.getDIBArrayWidth * layerDIB.getDIBHeight
-    If Not (tmpDIB Is Nothing) Then estimateRAMUsage = tmpDIB.getDIBArrayWidth * tmpDIB.getDIBHeight
+    If Not (tmpDIB Is Nothing) Then estimateRAMUsage = estimateRAMUsage + tmpDIB.getDIBArrayWidth * tmpDIB.getDIBHeight
     If Not (layerThumbnail Is Nothing) Then estimateRAMUsage = estimateRAMUsage + layerThumbnail.getDIBArrayWidth * layerThumbnail.getDIBHeight
 
     'Assume a 10% overhead for this class instance and other miscellaneous layer data

diff --git a/Classes/pdPackager.cls b/Classes/pdPackager.cls
@@ -1603,14 +1603,41 @@ Public Function getNodeDataByID(ByVal targetNodeID As Long, ByVal useHeaderBuffe
     Next i
 
     'If a matching node was found, use getNodeDataByIndex() to retrieve its data
-    If nodeIndex > 0 Then
+    If nodeIndex >= 0 Then
         getNodeDataByID = getNodeDataByIndex(nodeIndex, useHeaderBuffer, dstArray(), disableChecksumValidation)
     Else
         getNodeDataByID = False
     End If
 
 End Function
 
+Public Function getNodeDataByID_UnsafeDstPointer(ByVal targetNodeID As Long, ByVal useHeaderBuffer As Boolean, ByVal dstPointer As Long, Optional ByVal disableChecksumValidation As Boolean = False) As Boolean
+
+    'Search the node array for a matching ID
+    Dim i As Long, nodeIndex As Long
+    nodeIndex = -1
+
+    For i = 0 To UBound(m_NodeDirectory)
+
+        If targetNodeID = m_NodeDirectory(i).NodeID Then
+
+            'This node is the one!
+            nodeIndex = i
+            Exit For
+
+        End If
+
+    Next i
+
+    'If a matching node was found, use getNodeDataByIndex() to retrieve its data
+    If nodeIndex >= 0 Then
+        getNodeDataByID_UnsafeDstPointer = getNodeDataByIndex_UnsafeDstPointer(nodeIndex, useHeaderBuffer, dstPointer, disableChecksumValidation)
+    Else
+        getNodeDataByID_UnsafeDstPointer = False
+    End If
+
+End Function
+
 'getNodeHeaderByName and getNodeHeaderByID both wrap this function, getNodeHeaderByIndex.  (Those functions are simply used
 ' to retrieve the relevant index for a node.)
 '
@@ -1631,23 +1658,21 @@ Public Function getNodeDataByIndex(ByVal nodeIndex As Long, ByVal useHeaderBuffe
         dataOriginalSize = m_NodeDirectory(nodeIndex).NodeDataOriginalSize
     End If
 
-    'Move the data buffer position to the relevant offset, then retrieve the data arary
-    Dim tmpDataArray() As Byte
+    'Move the data buffer position to the relevant offset for the requested chunk
     m_DataBuffer.Position = dataOffset
-    tmpDataArray = m_DataBuffer.ReadBytes(dataPackedSize)
 
     'If the original size and packed size are different, assume the data is compressed.
     If (dataPackedSize <> dataOriginalSize) Then
 
         'Make sure zLib is available.  If it isn't, the user is screwed.
         If m_ZLibAvailable Then
 
-            'Resize the temporary buffer to the original size of the data
+            'Resize the destination buffer to the original size of the data
             ReDim dstArray(0 To dataOriginalSize - 1) As Byte
 
             'Use zLib to decompress the data
             Dim zLibResult As Long
-            zLibResult = uncompress(dstArray(0), dataOriginalSize, tmpDataArray(0), UBound(tmpDataArray) + 1)
+            zLibResult = uncompress(dstArray(0), dataOriginalSize, ByVal m_DataBuffer.ReadBytes_PointerOnly(dataPackedSize), dataPackedSize)
 
             If zLibResult <> 0 Then
 
@@ -1665,9 +1690,9 @@ Public Function getNodeDataByIndex(ByVal nodeIndex As Long, ByVal useHeaderBuffe
 
         End If
 
-    'The data is uncompressed.  No further processing is necessary.
+    'If the data is uncompressed, make a direct copy of the corresponding chunk of the master buffer
     Else
-        dstArray = tmpDataArray
+        dstArray = m_DataBuffer.ReadBytes(dataPackedSize)
     End If
 
     'Retrieve the stored checksum value for this node (if any).
@@ -1710,6 +1735,110 @@ Public Function getNodeDataByIndex(ByVal nodeIndex As Long, ByVal useHeaderBuffe
 
 End Function
 
+'WARNING!  This function is for expert users only.  Do not use it unless you fully understand the repercussions described below.
+'
+'Given a node index and a target buffer region (header or data), blindly copy the contents of that data buffer to a pointer supplied by the user.
+' As with the normal getNodeDataByIndex() function, decompression and checksum validation is handled automatically, depending on the contents of the node.
+'
+'What makes this function different is that it completely relies on the caller to create a correctly sized buffer for the extracted data.
+' For common cases, this is simply not possible, as the caller does not have enough knowledge necessary to prep such a buffer.  Instead, it relies
+' on pdPackager to construct the destination buffer for it, using the size values stored inside the pdPackage instance.
+'
+'PhotoDemon is unique in this regard, as it can correctly calculate the size of raster layer pixel buffers (because it knows the associated
+' width, height, stride and color depth of the buffer).  Because PD has to create a matching DIB in advance, a destination buffer of perfect size
+' is already implicitly allocated, so there is no need for a temporary VB array to shuttle the data around.
+'
+'That said, note that PD only uses this function sparingly, when it is *absolutely certain of its safety*.  You would be wise to do the same.
+' An incorrectly allocated destination buffer will cause mass destruction (and a guaranteed hard crash), so do not use this function ignorantly.
+Public Function getNodeDataByIndex_UnsafeDstPointer(ByVal nodeIndex As Long, ByVal useHeaderBuffer As Boolean, ByVal dstPointer As Long, Optional ByVal disableChecksumValidation As Boolean = False) As Boolean
+
+    'Retrieve the offset, packed size, and original (unpacked) size of the target data
+    Dim dataOffset As Long, dataPackedSize As Long, dataOriginalSize As Long
+
+    If useHeaderBuffer Then
+        dataOffset = m_NodeDirectory(nodeIndex).NodeHeaderOffset
+        dataPackedSize = m_NodeDirectory(nodeIndex).NodeHeaderPackedSize
+        dataOriginalSize = m_NodeDirectory(nodeIndex).NodeHeaderOriginalSize
+    Else
+        dataOffset = m_NodeDirectory(nodeIndex).NodeDataOffset
+        dataPackedSize = m_NodeDirectory(nodeIndex).NodeDataPackedSize
+        dataOriginalSize = m_NodeDirectory(nodeIndex).NodeDataOriginalSize
+    End If
+
+    'Move the data buffer position to the relevant offset for the requested chunk
+    m_DataBuffer.Position = dataOffset
+
+    'If the original size and packed size are different, assume the data is compressed.
+    If (dataPackedSize <> dataOriginalSize) Then
+
+        'Make sure zLib is available.  If it isn't, the user is screwed.
+        If m_ZLibAvailable Then
+
+            'Use zLib to decompress the data directly into the pointer we've been given
+            Dim zLibResult As Long
+            zLibResult = uncompress(ByVal dstPointer, dataOriginalSize, ByVal m_DataBuffer.ReadBytes_PointerOnly(dataPackedSize), dataPackedSize)
+
+            If zLibResult <> 0 Then
+
+                raiseInternalMessage "File node is compressed, but zLib failed to decompress it.  Error code returned: %1", zLibResult
+                getNodeDataByIndex_UnsafeDstPointer = False
+                Exit Function
+
+            End If
+
+        Else
+
+            raiseInternalMessage "File node is compressed, but zLib is missing.  Decompression impossible; abandoning load."
+            getNodeDataByIndex_UnsafeDstPointer = False
+            Exit Function
+
+        End If
+
+    'If the data is uncompressed, make a direct copy of the corresponding chunk of the master buffer
+    Else
+        CopyMemory dstPointer, m_DataBuffer.ReadBytes_PointerOnly(dataPackedSize), dataPackedSize
+    End If
+
+    'Retrieve the stored checksum value for this node (if any).
+    Dim checkSumOriginal As Long
+
+    If useHeaderBuffer Then
+        checkSumOriginal = m_NodeDirectory(nodeIndex).NodeHeaderAdler32
+    Else
+        checkSumOriginal = m_NodeDirectory(nodeIndex).NodeDataAdler32
+    End If
+
+    'Apply checksum validation now, unless one of three criteria is met:
+    ' 1) The node does not contain a checksum to validate against
+    ' 2) The user has forcibly disabled checksumming this node
+    ' 3) zLib is missing
+    If (checkSumOriginal <> 0) And (Not disableChecksumValidation) And m_ZLibAvailable Then
+
+        'Like CRC32 functions, Adler checksums accept a previous value as their initial input.  If you don't want to supply
+        ' this, you can supply a null buffer to get the library's recommended initial value.  (That's what we do here.)
+        Dim zLibAdlerSum As Long
+        zLibAdlerSum = checkSumArbitraryData(dstPointer, dataOriginalSize)
+
+        'If the checksums do not match, fail the function and exit
+        If checkSumOriginal <> zLibAdlerSum Then
+            raiseInternalMessage "Checksum failed for node #" & nodeIndex & ".  Expected value: " & checkSumOriginal & ", calculated value: " & zLibAdlerSum & ". Load abandoned."
+            getNodeDataByIndex_UnsafeDstPointer = False
+            Exit Function
+        Else
+            If useHeaderBuffer Then
+                Debug.Print "Checksum successfully verified for node #" & nodeIndex & " (header)."
+            Else
+                Debug.Print "Checksum successfully verified for node #" & nodeIndex & " (data)."
+            End If
+        End If
+
+    End If
+
+    'If we made it all the way here, the node has been successfully loaded!
+    getNodeDataByIndex_UnsafeDstPointer = True
+
+End Function
+
 'If you want to use compression functions, you must provide the class with a path to a STDCALL (WAPI) copy of zLib, including
 ' "/zlibwapi.dll" at the end of the path.  This class assumes a file named "zlibwapi.dll"; if you are using one called zLib.dll,
 ' you must rewrite the zLib API function declarations to match.