This release introduces a major new feature, tangent frame generation; significantly improves vertex decoding performance on Intel/AMD CPUs; and adds several smaller features and improvements. Highlights:
A new tangent frame generator can generate tangents for either indexed or unindexed triangle meshes with positions, normals and texture coordinates. The generator implements the MikkTSpace algorithm but is significantly faster (~6-10x faster than mikktspace.c, depending on the input structure). Note that the result is a per-corner tangent, so applying the tangents to an indexed mesh may require splitting source vertices; consult the documentation for details. By default, the algorithm uses a modified weighting scheme that significantly improves tangent quality around beveled regions of the mesh; the `meshopt_TangentCompatible` option is provided for cases where exact compatibility with mikktspace.c is important (e.g. normal map baking workflows).
The vertex decoding implementation (`meshopt_decodeVertexBuffer`) for Intel/AMD CPUs has been significantly revised. The new, mostly branchless SSSE3 implementation is usually ~20-45% faster than in previous releases, with gains depending on the CPU and data composition; gains in the upper half of this range are typical for engine-packed data. This implementation is automatically selected on compatible CPUs (SSSE3+POPCNT) and the data encoding is unchanged, so updating the library is sufficient to get this performance boost. If the code is compiled with AVX-512/AVX10 support (currently only selected at compile time when explicitly opted into), decoding is an additional ~10% faster.
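For background, meshoptimizer's vertex codec is built around per-byte deltas between consecutive vertices stored in zigzag form, which a decoder turns back into bytes with a branchless `-(v & 1) ^ (v >> 1)` step. The sketch below is a scalar illustration of only that idea; the real format also bit-packs groups of encoded bytes, among other details, and `encode_byte_deltas`/`decode_byte_deltas` are hypothetical names, not library functions.

```c
#include <assert.h>
#include <stddef.h>

/* zigzag maps small signed 8-bit deltas to small unsigned values: 0->0, -1->1, 1->2, -2->3 */
static unsigned char zigzag8(unsigned char d)
{
    return (unsigned char)((d & 0x80) ? ~(d << 1) : (d << 1));
}

/* Hypothetical: encodes byte `byte_offset` of every vertex as a zigzag delta from the
 * previous vertex's byte (the first vertex is relative to `base`). */
static void encode_byte_deltas(unsigned char* encoded, const unsigned char* vertices,
                               size_t vertex_count, size_t vertex_size, size_t byte_offset,
                               unsigned char base)
{
    assert(byte_offset < vertex_size);
    unsigned char prev = base;
    for (size_t i = 0; i < vertex_count; ++i)
    {
        unsigned char cur = vertices[i * vertex_size + byte_offset];
        encoded[i] = zigzag8((unsigned char)(cur - prev));
        prev = cur;
    }
}

/* Hypothetical inverse: out[i] = out[i-1] + unzigzag(enc[i]); the unzigzag step is branchless. */
static void decode_byte_deltas(unsigned char* vertices, size_t vertex_count, size_t vertex_size,
                               size_t byte_offset, const unsigned char* encoded, unsigned char base)
{
    assert(byte_offset < vertex_size);
    unsigned char prev = base;
    for (size_t i = 0; i < vertex_count; ++i)
    {
        unsigned char v = encoded[i];
        prev = (unsigned char)(prev + (unsigned char)(-(v & 1) ^ (v >> 1)));
        vertices[i * vertex_size + byte_offset] = prev;
    }
}
```

Smoothly varying vertex data yields small zigzag values, which is what makes the subsequent bit-packing effective; the speedups in this release come from how the decoder processes that packed data with SIMD, which the sketch does not attempt to show.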
Additionally, new index filtering functions remove degenerate and duplicate triangles based on positional identity, which can be especially helpful for raytracing performance. A new function computes the optimal shared exponent for cluster positions, which can be used to prepare geometry for the upcoming DXR2 Compressed1 format. Finally, `clusterlod.h` now implements DAG BVH construction via `clodBuildHierarchy`.
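Positional-identity filtering can be sketched as follows. This is a hypothetical `filter_triangles` helper, not the `meshopt_filterIndexBuffer` signature. It assumes positions are 3 floats per vertex compared for exact equality, treats a triangle as degenerate when two corners share a position, and treats two triangles as duplicates only when they cover the same positions with the same winding (whether the library also merges opposite windings is not stated in these notes). Kept triangles retain their original indices, so other vertex attributes are untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: smallest vertex index whose position exactly equals v's position.
 * Linear scan, for clarity only. */
static unsigned int position_id(const float* positions, unsigned int v)
{
    for (unsigned int u = 0; u < v; ++u)
        if (positions[u * 3 + 0] == positions[v * 3 + 0] &&
            positions[u * 3 + 1] == positions[v * 3 + 1] &&
            positions[u * 3 + 2] == positions[v * 3 + 2])
            return u;
    return v;
}

/* Rotates a triangle of distinct ids so the smallest comes first; rotation keeps the winding. */
static void canonical_key(unsigned int key[3], unsigned int a, unsigned int b, unsigned int c)
{
    if (b < a && b < c) { key[0] = b; key[1] = c; key[2] = a; }
    else if (c < a && c < b) { key[0] = c; key[1] = a; key[2] = b; }
    else { key[0] = a; key[1] = b; key[2] = c; }
}

/* Hypothetical sketch, NOT meshopt_filterIndexBuffer: copies triangles that are neither
 * positionally degenerate nor positional duplicates of an earlier kept triangle into
 * `destination` (which may alias `indices`) and returns the new index count. */
size_t filter_triangles(unsigned int* destination, const unsigned int* indices,
                        size_t index_count, const float* positions)
{
    assert(index_count % 3 == 0);
    size_t result = 0;

    for (size_t i = 0; i < index_count; i += 3)
    {
        unsigned int a = indices[i + 0], b = indices[i + 1], c = indices[i + 2];
        unsigned int pa = position_id(positions, a);
        unsigned int pb = position_id(positions, b);
        unsigned int pc = position_id(positions, c);

        /* degenerate: two corners share a position, so the triangle has zero area */
        if (pa == pb || pb == pc || pc == pa)
            continue;

        unsigned int key[3];
        canonical_key(key, pa, pb, pc);

        /* duplicate: a kept triangle covers the same positions with the same winding */
        int duplicate = 0;
        for (size_t j = 0; j < result && !duplicate; j += 3)
        {
            unsigned int other[3];
            canonical_key(other, position_id(positions, destination[j + 0]),
                          position_id(positions, destination[j + 1]),
                          position_id(positions, destination[j + 2]));
            duplicate = other[0] == key[0] && other[1] == key[1] && other[2] == key[2];
        }
        if (duplicate)
            continue;

        /* keep the original indices so other vertex attributes are unaffected */
        destination[result + 0] = a;
        destination[result + 1] = b;
        destination[result + 2] = c;
        result += 3;
    }
    return result;
}
```

The quadratic searches keep the sketch short; a practical version would build a position hash table once and hash the canonical triangle keys.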
The majority of the work on the core library in this release has been sponsored by Valve; thank you!
Library improvements
- New experimental function for generating tangents based on the MikkTSpace algorithm, `meshopt_generateTangents`
- New experimental functions for filtering out degenerate and duplicate triangles based on positional identity, `meshopt_filterIndexBuffer`/`meshopt_filterIndexBufferMulti`
- New experimental function for computing the optimal shared exponent for cluster positions, which can be used with the upcoming DXR2 Compressed1 format
- `meshopt_encode/decodeMeshlet*` functions, `meshopt_extractMeshletIndices` and `meshopt_optimizeMeshletLevel` functions, as well as `meshopt_SimplifyVertex_Priority` and `meshopt_SimplifyRegularizeLight` flags, are now stable
- Significantly improve `meshopt_decodeVertexBuffer` performance on existing data for Intel/AMD CPUs (20-45% faster depending on CPU and data characteristics)
- Significantly improve performance of `meshopt_partitionClusters` on larger partition sizes (~2x for `target_size` 64, ~200x for 1024)
- Improve post-compression ratio for meshlets encoded using the meshlet codec after `meshopt_optimizeMeshletLevel` with level 1+ (~0.5% gains)
- Fix reduced encoding precision of small numbers when using `meshopt_encodeFilterExp` with a shared exponent mode if the input contains exact zeroes
- Support special meshlet hardware configurations that require a limited triangle index span via the `MESHOPTIMIZER_CLUSTERIZER_INDEXLIMIT` define
- Support direct decoding of vertex data into the destination buffer via the `MESHOPTIMIZER_VERTEXCODEC_ZEROCOPY` define for slightly faster decoding (disabled by default as it does not work well with write-combined memory)
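As an illustration of what an "optimal shared exponent" for a cluster can mean, the sketch below picks the smallest power-of-two exponent `e` such that every coordinate, divided by 2^e and rounded, fits a signed mantissa of `bits` bits. The actual DXR2 Compressed1 encoding and the exact criterion used by the library are not described in these notes, so the bit width, the symmetric range and the rounding rule are assumptions, and `cluster_shared_exponent`, `quantize_coord` and `dequantize_coord` are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* rounds half away from zero */
static long round_half_away(float v)
{
    return v >= 0.f ? (long)(v + 0.5f) : -(long)(-v + 0.5f);
}

/* Hypothetical: smallest e such that |round(x / 2^e)| <= 2^(bits-1) - 1 for every
 * coordinate x (3 floats per vertex). Returns 0 for an all-zero cluster (any e works). */
int cluster_shared_exponent(const float* positions, size_t vertex_count, int bits)
{
    assert(bits >= 2 && bits <= 24);

    float max_abs = 0.f;
    for (size_t i = 0; i < vertex_count * 3; ++i)
    {
        float a = fabsf(positions[i]);
        if (a > max_abs)
            max_abs = a;
    }
    if (max_abs == 0.f)
        return 0;

    long limit = (1L << (bits - 1)) - 1;

    /* max_abs = f * 2^k with f in [0.5, 1): dividing by 2^(k - (bits-1)) gives
     * f * 2^(bits-1) < 2^(bits-1), while one exponent lower would exceed the limit */
    int k;
    frexpf(max_abs, &k);
    int e = k - (bits - 1);

    /* rounding can still push the largest value up to 2^(bits-1); bump e if so */
    while (round_half_away(ldexpf(max_abs, -e)) > limit)
        ++e;
    return e;
}

long quantize_coord(float x, int e) { return round_half_away(ldexpf(x, -e)); }
float dequantize_coord(long q, int e) { return ldexpf((float)q, e); }
```

Only the largest magnitude needs checking, since rounding is monotonic and the assumed range is symmetric; a smaller `e` means finer precision for every vertex in the cluster.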
Additional improvements
- `clusterlod.h` now implements DAG BVH construction via `clodBuildHierarchy` for efficient hierarchical cut selection
- Add tangent generation to `gltfpack` when requested via the `-gt` argument
- Add experimental `MeshoptTangents` JavaScript module with the new tangent space generator
- Fix a `meshopt_decoder.js` Wasm SIMD implementation corner case when using v1 format and highest compression (part of 1.1.1 patch release)
- Fix a rare race condition in `meshopt_decoder.js` when using WebWorkers via `useWorkers` (part of 1.1.1 patch release)
- Improve vertex decoding performance in `meshopt_decoder.js` by 5-10%
- Fix several bugs in `gltfpack` (mesh merging with negative scales no longer flips tangent frames, more careful handling of animation tracks with zero scale)