Skip to content

Commit

Permalink
pdPackager: further improve node read/write performance
Browse files Browse the repository at this point in the history
Two major changes here.

1) As much as possible, I've tried to rework pdPackager to not require
intermediary arrays when reading/writing, and/or
compressing/decompressing node data.  This is made possible by new
interaction functions with the underlying buffer class, which allow us
to use naked pointers and pre-allocated buffers whenever possible,
avoiding the need for interop via VB arrays.

2) Raster layers in PDI files were another opportunity for
optimizations, as we must pre-allocate a DIB prior to extracting the
actual pixel data from file.  Historically, pixel data had to be
extracted into a temporary array at its compression size, decompressed
into a new array at its decompression size, then finally copied into the
pre-allocated DIB.  Besides wasting a huge amount of memory, it's
time-consuming to repeatedly allocate large chunks of memory (50+ MB for
a standard photo), allocations that are further compounded when loading
images with multiple layers.

But no more!  Now, pdPackager exposes some (explicitly marked as
dangerous) functions for circumstances when an external buffer can be
safely pre-allocated, independent of pdPackager.  When this happens, it
can now perform a blind extraction+decompression directly from the
master pdPackage buffer (typically compressed) to a caller-allocated
buffer (like a DIB, in PD's case).  Temporary buffers are no longer
required or allocated, but obviously, these functions are only valid for
circumstances where the caller can use other information (in our case,
DIB stride, height, color depth) to pre-allocate a correctly sized
buffer independent of pdPackage's internal measurements.

Anyway, the take-home message is that PDI reading+writing is now much
faster, which is particularly great for improving responsiveness of the
Undo/Redo stack.
  • Loading branch information
tannerhelland committed Aug 12, 2015
1 parent 4d0244e commit 1090def
Show file tree
Hide file tree
Showing 6 changed files with 288 additions and 78 deletions.
58 changes: 49 additions & 9 deletions Classes/cBasicBuffer.cls
Original file line number Diff line number Diff line change
Expand Up @@ -166,9 +166,8 @@ Public Property Get Size() As Long
Size = mSize
End Property

'Erase current Buffer contents and reset all internals.
Public Sub CloseBuf()
'Get rid of Buffer contents, prepare for new open and reuse.

Erase Buffer
mOpen = False
mPosition = 0
Expand Down Expand Up @@ -201,8 +200,8 @@ Public Function CurrentBytesPointer() As Long
End Function

'Open a new buffer.
' Added by Tanner: allow the caller to specify a starting buffer size. This can be helpful if we have some idea
' of the buffer's size in advance, as we avoid the need for costly ReDim Preserve statements.
' Added by Tanner: allow the caller to specify a starting buffer size. In many cases in PD, I have some notion
' of the buffer's size in advance, which spares the need for costly ReDim Preserve statements during buffer construction.
Public Sub OpenBuf(Optional ByVal startingBufferSize As Long = 0)

If mOpen Then
Expand All @@ -218,9 +217,11 @@ Public Sub OpenBuf(Optional ByVal startingBufferSize As Long = 0)

End Sub

'Return the requested number of bytes, or all bytes if -1 is specified as the length. This function makes no changes
' to the buffer itself, but it *does* move the current buffer position by [length] bytes.
' NOTE: you obviously should not pass zero as the length.
Public Function ReadBytes(Optional ByVal Length As Long = -1) As Byte()
'Return requested number of bytes (or less if less in buffer) or
'all bytes.

'
'Requesting Length = 0 bytes (or any bytes when the buffer is empty)
'raises an exception since we can't return an "empty" Byte array.
Expand Down Expand Up @@ -256,16 +257,55 @@ Public Function ReadBytes(Optional ByVal Length As Long = -1) As Byte()
ReadBytes = Bytes
End Function

Public Function ScanForBytes(ByRef Target() As Byte) As Long
'Added by Tanner: variant approach to ReadBytes(), above. This function behaves identically to ReadBytes in that it
' moves the buffer pointer according to the requested length. HOWEVER, it only returns a pointer to the current position,
' not a full copy of the buffer bytes. This is ideal for subsequent API interactions like zLib, because we can simply
' pass it that pointer, sparing us the need for costly (and pointless) MoveMemory calls.
'
'IMPORTANT NOTE: adding data to the buffer may force it to allocate new memory, invalidating all previously returned
' pointers. You must use the return of this function immediately, as its correctness is not guaranteed if any other
' class functions are called.
Public Function ReadBytes_PointerOnly(Optional ByVal Length As Long = -1) As Long

'This If() block is roughly identical to ReadBytes, above
If mOpen Then
If Length < 0 Then
'A Length = -1 (or < 0) means read all.
If EOS Then
Err.Raise 5, TypeName(Me), "Nothing to read"
Else
Length = mSize - mPosition
End If
ElseIf Length > 0 Then
If mPosition + Length > mSize Then
'Handle requests extending past EOS.
Length = mSize - mPosition
End If
Else
Err.Raise 5, TypeName(Me), "Can't request 0 (no) bytes"
End If
Else
Err.Raise &H8004C300, TypeName(Me), "Not open"
End If

'Return a pointer to the current buffer position
ReadBytes_PointerOnly = VarPtr(Buffer(mPosition))

'Advance the buffer position identically to ReadBytes(), above
mPosition = mPosition + Length

End Function

Public Function ScanForBytes(ByRef target() As Byte) As Long
'Scan forward from Position, looking for Target value which can be
'one or more bytes.
'
'If found returns position index of first byte of Target, else
'returns -1.

If mOpen Then
ScanForBytes = InStrB(mPosition + 1, Buffer, Target) - 1
If ScanForBytes > mSize - (UBound(Target) - LBound(Target) + 1) Then ScanForBytes = -1
ScanForBytes = InStrB(mPosition + 1, Buffer, target) - 1
If ScanForBytes > mSize - (UBound(target) - LBound(target) + 1) Then ScanForBytes = -1
Else
Err.Raise &H8004C300, TypeName(Me), "Not open"
End If
Expand Down
2 changes: 1 addition & 1 deletion Classes/pdImage.cls
Original file line number Diff line number Diff line change
Expand Up @@ -1290,7 +1290,7 @@ Public Function getNumOfHiddenLayers() As Long
End Function

'Estimate the RAM usage for this image, based on the number of layers and the size of each layer
Public Function estimateRAMUsage() As Long
Public Function estimateRAMUsage() As Double

estimateRAMUsage = 0

Expand Down
4 changes: 2 additions & 2 deletions Classes/pdLayer.cls
Original file line number Diff line number Diff line change
Expand Up @@ -2292,11 +2292,11 @@ Public Function syncInternalDIB() As Boolean
End Function

'Estimate the RAM usage for this layer, based on known heavy memory users (layer DIB, in particular)
Public Function estimateRAMUsage() As Long
Public Function estimateRAMUsage() As Double

'Calculate the raw memory size of both the layer and the cached thumbnail DIB, if any
If Not (layerDIB Is Nothing) Then estimateRAMUsage = layerDIB.getDIBArrayWidth * layerDIB.getDIBHeight
If Not (tmpDIB Is Nothing) Then estimateRAMUsage = tmpDIB.getDIBArrayWidth * tmpDIB.getDIBHeight
If Not (tmpDIB Is Nothing) Then estimateRAMUsage = estimateRAMUsage + tmpDIB.getDIBArrayWidth * tmpDIB.getDIBHeight
If Not (layerThumbnail Is Nothing) Then estimateRAMUsage = estimateRAMUsage + layerThumbnail.getDIBArrayWidth * layerThumbnail.getDIBHeight

'Assume a 10% overhead for this class instance and other miscellaneous layer data
Expand Down
145 changes: 137 additions & 8 deletions Classes/pdPackager.cls
Original file line number Diff line number Diff line change
Expand Up @@ -1603,14 +1603,41 @@ Public Function getNodeDataByID(ByVal targetNodeID As Long, ByVal useHeaderBuffe
Next i

'If a matching node was found, use getNodeDataByIndex() to retrieve its data
If nodeIndex > 0 Then
If nodeIndex >= 0 Then
getNodeDataByID = getNodeDataByIndex(nodeIndex, useHeaderBuffer, dstArray(), disableChecksumValidation)
Else
getNodeDataByID = False
End If

End Function

Public Function getNodeDataByID_UnsafeDstPointer(ByVal targetNodeID As Long, ByVal useHeaderBuffer As Boolean, ByVal dstPointer As Long, Optional ByVal disableChecksumValidation As Boolean = False) As Boolean

'Search the node array for a matching ID
Dim i As Long, nodeIndex As Long
nodeIndex = -1

For i = 0 To UBound(m_NodeDirectory)

If targetNodeID = m_NodeDirectory(i).NodeID Then

'This node is the one!
nodeIndex = i
Exit For

End If

Next i

'If a matching node was found, use getNodeDataByIndex() to retrieve its data
If nodeIndex >= 0 Then
getNodeDataByID_UnsafeDstPointer = getNodeDataByIndex_UnsafeDstPointer(nodeIndex, useHeaderBuffer, dstPointer, disableChecksumValidation)
Else
getNodeDataByID_UnsafeDstPointer = False
End If

End Function

'getNodeHeaderByName and getNodeHeaderByID both wrap this function, getNodeHeaderByIndex. (Those functions are simply used
' to retrieve the relevant index for a node.)
'
Expand All @@ -1631,23 +1658,21 @@ Public Function getNodeDataByIndex(ByVal nodeIndex As Long, ByVal useHeaderBuffe
dataOriginalSize = m_NodeDirectory(nodeIndex).NodeDataOriginalSize
End If

'Move the data buffer position to the relevant offset, then retrieve the data arary
Dim tmpDataArray() As Byte
'Move the data buffer position to the relevant offset for the requested chunk
m_DataBuffer.Position = dataOffset
tmpDataArray = m_DataBuffer.ReadBytes(dataPackedSize)

'If the original size and packed size are different, assume the data is compressed.
If (dataPackedSize <> dataOriginalSize) Then

'Make sure zLib is available. If it isn't, the user is screwed.
If m_ZLibAvailable Then

'Resize the temporary buffer to the original size of the data
'Resize the destination buffer to the original size of the data
ReDim dstArray(0 To dataOriginalSize - 1) As Byte

'Use zLib to decompress the data
Dim zLibResult As Long
zLibResult = uncompress(dstArray(0), dataOriginalSize, tmpDataArray(0), UBound(tmpDataArray) + 1)
zLibResult = uncompress(dstArray(0), dataOriginalSize, ByVal m_DataBuffer.ReadBytes_PointerOnly(dataPackedSize), dataPackedSize)

If zLibResult <> 0 Then

Expand All @@ -1665,9 +1690,9 @@ Public Function getNodeDataByIndex(ByVal nodeIndex As Long, ByVal useHeaderBuffe

End If

'The data is uncompressed. No further processing is necessary.
'If the data is uncompressed, make a direct copy of the corresponding chunk of the master buffer
Else
dstArray = tmpDataArray
dstArray = m_DataBuffer.ReadBytes(dataPackedSize)
End If

'Retrieve the stored checksum value for this node (if any).
Expand Down Expand Up @@ -1710,6 +1735,110 @@ Public Function getNodeDataByIndex(ByVal nodeIndex As Long, ByVal useHeaderBuffe

End Function

'WARNING! This function is for expert users only. Do not use it unless you fully understand the repercussions described below.
'
'Given a node index and a target buffer region (header or data), blindly copy the contents of that data buffer to a pointer supplied by the user.
' As with the normal getNodeDataByIndex() function, decompression and checksum validation is handled automatically, depending on the contents of the node.
'
'What makes this function different is that it completely relies on the caller to create a correctly sized buffer for the extracted data.
' For common cases, this is simply not possible, as the caller does not have enough knowledge necessary to prep such a buffer. Instead, it relies
' on pdPackager to construct the destination buffer for it, using the size values stored inside the pdPackage instance.
'
'PhotoDemon is unique in this regard, as it can correctly calculate the size of raster layer pixel buffers (because it knows the associated
' width, height, stride and color depth of the buffer). Because PD has to create a matching DIB in advance, a destination buffer of perfect size
' is already implicitly allocated, so there is no need for a temporary VB array to shuttle the data around.
'
'That said, note that PD only uses this function sparingly, when it is *absolutely certain of its safety*. You would be wise to do the same.
' An incorrectly allocated destination buffer will cause mass destruction (and a guaranteed hard crash), so do not use this function ignorantly.
Public Function getNodeDataByIndex_UnsafeDstPointer(ByVal nodeIndex As Long, ByVal useHeaderBuffer As Boolean, ByVal dstPointer As Long, Optional ByVal disableChecksumValidation As Boolean = False) As Boolean

'Retrieve the offset, packed size, and original (unpacked) size of the target data
Dim dataOffset As Long, dataPackedSize As Long, dataOriginalSize As Long

If useHeaderBuffer Then
dataOffset = m_NodeDirectory(nodeIndex).NodeHeaderOffset
dataPackedSize = m_NodeDirectory(nodeIndex).NodeHeaderPackedSize
dataOriginalSize = m_NodeDirectory(nodeIndex).NodeHeaderOriginalSize
Else
dataOffset = m_NodeDirectory(nodeIndex).NodeDataOffset
dataPackedSize = m_NodeDirectory(nodeIndex).NodeDataPackedSize
dataOriginalSize = m_NodeDirectory(nodeIndex).NodeDataOriginalSize
End If

'Move the data buffer position to the relevant offset for the requested chunk
m_DataBuffer.Position = dataOffset

'If the original size and packed size are different, assume the data is compressed.
If (dataPackedSize <> dataOriginalSize) Then

'Make sure zLib is available. If it isn't, the user is screwed.
If m_ZLibAvailable Then

'Use zLib to decompress the data directly into the pointer we've been given
Dim zLibResult As Long
zLibResult = uncompress(ByVal dstPointer, dataOriginalSize, ByVal m_DataBuffer.ReadBytes_PointerOnly(dataPackedSize), dataPackedSize)

If zLibResult <> 0 Then

raiseInternalMessage "File node is compressed, but zLib failed to decompress it. Error code returned: %1", zLibResult
getNodeDataByIndex_UnsafeDstPointer = False
Exit Function

End If

Else

raiseInternalMessage "File node is compressed, but zLib is missing. Decompression impossible; abandoning load."
getNodeDataByIndex_UnsafeDstPointer = False
Exit Function

End If

'If the data is uncompressed, make a direct copy of the corresponding chunk of the master buffer
Else
CopyMemory dstPointer, m_DataBuffer.ReadBytes_PointerOnly(dataPackedSize), dataPackedSize
End If

'Retrieve the stored checksum value for this node (if any).
Dim checkSumOriginal As Long

If useHeaderBuffer Then
checkSumOriginal = m_NodeDirectory(nodeIndex).NodeHeaderAdler32
Else
checkSumOriginal = m_NodeDirectory(nodeIndex).NodeDataAdler32
End If

'Apply checksum validation now, unless one of three criteria is met:
' 1) The node does not contain a checksum to validate against
' 2) The user has forcibly disabled checksumming this node
' 3) zLib is missing
If (checkSumOriginal <> 0) And (Not disableChecksumValidation) And m_ZLibAvailable Then

'Like CRC32 functions, Adler checksums accept a previous value as their initial input. If you don't want to supply
' this, you can supply a null buffer to get the library's recommended initial value. (That's what we do here.)
Dim zLibAdlerSum As Long
zLibAdlerSum = checkSumArbitraryData(dstPointer, dataOriginalSize)

'If the checksums do not match, fail the function and exit
If checkSumOriginal <> zLibAdlerSum Then
raiseInternalMessage "Checksum failed for node #" & nodeIndex & ". Expected value: " & checkSumOriginal & ", calculated value: " & zLibAdlerSum & ". Load abandoned."
getNodeDataByIndex_UnsafeDstPointer = False
Exit Function
Else
If useHeaderBuffer Then
Debug.Print "Checksum successfully verified for node #" & nodeIndex & " (header)."
Else
Debug.Print "Checksum successfully verified for node #" & nodeIndex & " (data)."
End If
End If

End If

'If we made it all the way here, the node has been successfully loaded!
getNodeDataByIndex_UnsafeDstPointer = True

End Function

'If you want to use compression functions, you must provide the class with a path to a STDCALL (WAPI) copy of zLib, including
' "/zlibwapi.dll" at the end of the path. This class assumes a file named "zlibwapi.dll"; if you are using one called zLib.dll,
' you must rewrite the zLib API function declarations to match.
Expand Down
Loading

0 comments on commit 1090def

Please sign in to comment.