Releases: pinterf/mvtools
Mvtools2 2.7.46 with DePans
Change log (MvTools)
- 2.7.46 (20240503) *
- Recheck and fix build processes for various compilers
(Visual Studio MSVC v143, v141_xp; Intel C++ Compiler 2024.1 ICX, 19.2 ICL; ClangCL; gcc mingw64)
- Fix (#56): bug in Cross search
- (unreleased 2.7.46 (20230208) test build in #58)
- Fix (#58): MFlowFPS memory leak
- Fix (#49): lsad 0 caused division-by-zero crash
- Optimization: MDegrain1-6 8 bit: add avx2 code path (it was already in the code but disabled; now enabled)
- Optimization: MDegrainN 10-16 bits: add SSE4.1 code path (was: C only)
- Optimization: VerticalBicubic interpolation SSE4.1 version besides SSE2
- Fix: MRestoreVect was trying to create unaligned frame (crash)
https://forum.doom9.org/showthread.php?p=1955944#post1955944
- (#48) Project files/solution: Intel C++ Compiler 2021 (icx) and 19.2 (icl classic) support on Windows
- source: pull avstp 1.04 helper files and reapply earlier patches
- Source internals: MDegrain: for block size of 4, read exactly 4 bytes instead of 8. (in extreme cases it would read past a valid memory area)
- Source internals: Stop using _mm256_zeroupper in avx2, compilers do that automatically.
The explicit call affected LLVM builds, which then saved and restored all ymm registers without optimizing it away (slow!)
https://stackoverflow.com/questions/68736527/do-i-need-to-use-mm256-zeroupper-in-2021
- Recheck and fix build processes for various compilers
Change log (DePan and DePanEstimate)
- Moved to Visual Studio 2022, v141_xp and v143 toolset, Intel Compiler ICX 2024.1 and ICL build support
Depan 2.14
- Fix: "DepanScenes" plane parameter for YUY2 clips did not work
DepanEstimate 2.11
- Throw an error if memory allocation fails
MvTools2 2.7.45 with depans.
Change log
(mvtools2 only, depans are unchanged)
- 2.7.45 (20210608)
- Fix: change parameter 'ml' from int to float in MBlockFPS. (Other filters with 'ml' are O.K.: MMask, MFlowInter, MFlowFPS are using float.)
- Fix MBlockFPS html doc as well, which mentions 'thres' instead of 'ml'. Add mode 5-8 to MBlockFPS docs
- Move change log from readme to CHANGELOG.md
- Code change/speedup: MSuper: rfilter=0 and 1
8 bit: drop old SSE code, port to SIMD intrinsics. Add SIMD to 16 bit case. Quicker, much quicker.
(rfilter: Hierarchical levels smoothing and reducing (halving) filter)
- Code change/speedup: MSuper: sharp=1 for pel=2 or 4
Bicubic resizer: drop old SSE code, port to SIMD intrinsics, implement SIMD intrinsics for the 16 bit case.
No need for Bilinear.asm and Bilinear-x64.asm any more.
- SATD: add 8 bit C versions (geee, there wasn't one) (as an alternative to the external asm)
- SAD: add internal SIMD for 8 bit SAD (SSE4.1) (as an alternative to the external asm)
- Overlaps: Add internal SIMD for 8 bit. (as an alternative to the external asm)
- In def.h any existing external assembler file can be disabled.
The primary reason for this was to quickly test the Linux port; for me this was easier than bothering with asm compilation and linking.
For non-Windows cases all of these are disabled now.
- USE_COPYCODE_ASM (CopyCode-a.asm). Has internal alternative. Same speed.
- USE_OVERLAPS_ASM (Overlap-a.asm). asm implements 8 bit only. Has internal SIMD alternative. About the same speed.
- USE_SAD_ASM (sad-a.asm) asm implements 8 bit only. Note: Internal 8 bit SIMD SAD is a bit slower than these handcrafted ones.
- USE_SATD_ASM (Pixel-a.asm) asm implements 8 bit only. Note: SATD 8 bit has no SIMD replacement yet.
- USE_LUMA_ASM (Variance-a.asm) asm implements 8 bit only. Has internal alternative.
- USE_FDCT88INT_ASM (fdct_mmx.asm, fdct_mmx_x64.asm)
Only used for 8x8 block sizes. Quick integer version instead of fftw3.
No internal alternative, fftw3 is used instead.
- USE_AVSTP (search for avstp.dll on Windows)
- Minor and not so minor cosmetics, mainly for GCC.
- Add Cmake build system.
- MvTools2: Linux/GCC port (needs sse4.1), Dewindowsification.
fftw3: MAnalyze dct modes that require fftw3 library will search for libfftw3f_threads.so.3
Install either libfftw3-single3 (deb) or fftw-devel (rpm) package
e.g. sudo apt-get update
sudo apt-get install libfftw3-dev
- Not done (will be done in a second phase):
Add back some external asms for linux build. For 8 bit SAD and SATD mainly.
Linux port is still Intel-only, though every part has C alternative by now.
- Separate the 3 projects (mvtools2, depan, depan_estimate).
- DePan and DePanEstimate: Linux port
- DePanEstimate: add fft_threads variable (default 1) for fftw3 mode (experimental)
- DepanEstimate: add MT guard around sensitive fftw3 functions
- mingw build fixes
- Add build instructions to README.md
- experimental avx2 for MDegrain1..6 (was not worth it speedwise on my i7-7700 - memory transfer is the bottleneck)
- experimental 32-bit float internal Overlaps result buffer for MDegrain1..6,
to check whether it is any quicker/more exact than the integer-scaled arithmetic version
(when out32=true; do not use, it is only for development/test and may be removed in the future)
MvTools2 2.7.44 with depans.
MvTools2 + Depan + DepanEstimate
Change log
2.7.44 (20201214)
- MAnalyze: fix motion vector generation inconsistencies across multiple runs.
Note: when internal multithreading is used (avstp + mt=true), inconsistencies will still occur by design.
2.7.43 (20200602)
- MCompensate: fix crash for GreyScale formats when overlap is used
2.7.42 (20200522)
- MDegrain family: limit and limitc to float, allowing more granularity for 10+ bit depth
- Update Avisynth headers, use V8 interface frame property copy if available
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.5 (20200522)
DepanEstimate: 2.10.0.4 (20200522)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported, special DLLs with v141_xp toolset
MvTools2 2.7.43 with depans
MvTools2 + Depan + DepanEstimate
Change log
2.7.43 (20200602)
- MCompensate: fix crash for GreyScale formats when overlap is used
2.7.42 (20200522)
- MDegrain family: limit and limitc to float, allowing more granularity for 10+ bit depth
- Update Avisynth headers, use V8 interface frame property copy if available
2.7.41 (20200430)
- (MVTools2 and DepanEstimate unchanged)
- Fixes in Depan.DLL:
Depan 2.13.1.5:
- Fix regression: DepanInterleave colorspace check allowed no planar, only YUY2
- Depan: Fix an issue when Depan would search for inputlog file even when there was no inputlog filename provided
- Depan: Fix crash on YUY2 input on internal YV16 conversion
2.7.41 (20190502)
- Fix: regression since 2.7.35: MSuper chroma for non-planar YUY2 (Thanks to mkauf)
- Project moved to Visual Studio 2019, v141_xp and v142 toolset
2.7.40 (20190212)
- Fix: MFlowInter possible crash with specific parameter and colorspace settings.
- Fix: MFlowInter possible overflow at 16 bit clips (artifacts)
- MFlow, MFlowInter, MFlowFPS, MFlowBlur, MBlockFPS: support all 8-16 and 32 bit float Y, YUV (4:2:0, 4:2:2, 4:4:4) and planar RGB color spaces.
Input clip format is independent from the ones used for getting motion vectors.
2.7.39 (20190102)
- MSuper: fix 16 bits, pel=2, sharp=2, which caused bottom-section artifacts for MDegrain using 8 bit vector origin and 16 bit real clip
- MDegrain1-6,N: Enhanced:
Input clip (and super) format now is fully independent from vector clip's base format (subsampling had to be the same before)
E.g. make motion vector clips from a YV12 source and apply them on a 8-32 bit 4:4:4 input
2.7.38 (20181209)
- MCompensate: Fix regression in latest v37: overlap=0 crash - ouch, sorry
- MAnalyze DCT: FFTW float: quicker postprocess of DCT'd blocks, now is correct for non power-of-2 block sizes e.g. 12 or 24 (no effect on 8x8 for which fast integer DCT is used)
- MAnalyze DCT: more consistent handling of post DCT internal normalization for non-square block sizes.
2.7.37 (20181128)
- MCompensate: limit thSAD, thSAD2, thSCD1 to valid range 0-(8x8x255) (e.g. given thSAD = 100000 will go back to 16320)
- Fix: MCompensate: use int64 to avoid overflow of the effective thSAD and thSAD2, which typically happened at bigger block sizes or large thSAD parameter values.
- MCompensate: SSE2 (8bit) and SSE4 (10-16 bit) overlap result calculation
- Changed: SAD 8x8, 8x4, 4x4, 4x8 to use SSE2 instead of MMX registers
- Fix: MDegrain if overlap<>0: missing rounder in rightmost 8 pixels for non-mod8 width 8 bit clips
- Fix: MSuper artifacts at 10-32 bits and nPel==4
- New: MCompensate 32 bit float and planar RGB support (by using motion vectors made from 8-16 bit YUV clip). Input and super clip can be of a different format than the one used for motion vector creation. (Similar to MDegrain1-6)
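The thSAD limit in the list above is plain arithmetic: 8x8x255 = 16320 is the largest possible SAD of an 8x8 block of 8 bit pixels. A minimal Python sketch of such a clamp follows; `clamp_thsad` and `MAX_THSAD` are hypothetical names used for illustration, not the plugin's actual code.

```python
# Largest possible SAD of an 8x8 block of 8 bit pixels.
MAX_THSAD = 8 * 8 * 255  # 16320

def clamp_thsad(value):
    """Clamp a user-supplied threshold to the valid range 0..16320 (hypothetical helper)."""
    return max(0, min(value, MAX_THSAD))

print(clamp_thsad(100000))  # prints 16320
```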
2.7.36 (20181120)
- Fix: Allow overlap operation when there are only two blocks in either horizontal or vertical direction (was: division by 0 crash)
- Fix: Fallback to overlap=0 mode when block count is only 1 in either h or v direction (was: undefined behaviour)
The cases above occurred for small frame sizes, when frame size and overlap values resulted in fewer than 3 blocks in a direction.
- Misc: update html docs with overlap drawing and others.
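The two overlap fixes above depend on how many blocks fit in one direction. A minimal Python sketch, assuming the block count is (size - overlap) / (blksize - overlap) with integer division; both the formula and the helper names are assumptions made for illustration and are not taken from this changelog.

```python
def block_count(size, blksize, overlap):
    # Assumed formula: every block after the first advances by (blksize - overlap).
    return (size - overlap) // (blksize - overlap)

def effective_overlap(size, blksize, overlap):
    # Mirrors the changelog entry: with a single block there is nothing to blend,
    # so fall back to overlap=0.
    return 0 if block_count(size, blksize, overlap) <= 1 else overlap

# A 24 pixel wide frame with blksize=16, overlap=8 gives exactly two blocks.
print(block_count(24, 16, 8))
# A 16 pixel wide frame gives one block, so overlap is dropped.
print(effective_overlap(16, 16, 8))
```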
2.7.35 (20181113)
- MFlowXXX: Slight speed gain by putting the out-of-frame vector check to resizer
- Fix: MAnalyze: Fix a possible internal overflow on larger blocksizes with small overlap e.g. BlkSize=32, OverlapX=0, OverlapY=4
- MSuper: Planar RGB support.
- MFlowFPS: Planar RGB support: generate vectors in YUV, use RGB input and super clip
- MFlowFPS: less memory for 4:4:4 and greyscale
- MFlowFPS: a bit faster 4:4:4, much faster greyscale
- MFlowFPS: support different bit depth for clip and the vectors (use vectors from 8 bit analysis for a 16 bit clip)
2.7.34 (20181108)
- MFlowInter: Use less memory, eliminates ten full-frame internal buffers
2.7.33 (20181021)
- Fix: MFlowXX: random access violation caused by enlarged vectors pointing on out-of-frame positions
2.7.32 (20181018)
- New: MAnalyze: Enhance mt mode report for Avisynth+: MT_SERIALIZED instead of MT_MULTI_INSTANCE when temporal=true or using output file.
- Fix: MAnalyze: fix a possible internal overflow on larger blocksizes and lambda combinations. e.g. truemotion=true with blksize=32
2.7.31 (20180409)
- Fix: MFlow: SC detection after having the mv clip. Fixed in 2.5.11.22 but was missed during 2.6.0.5 merge.
- Fix: MFlow: crash in 16bit 4:2:0, mode=1
- Fix: MDegrain, out16=true: Green bottom lines when overlap blocks are not covering the full vertical area
2.7.30 (20180405)
- Fix: crash in MFlowInter (and possibly other MFlow...). v2.7.29 revealed this additional bug (which was not even 100% reproducible), this fix is basically the 2nd part of the solution.
2.7.29 (20180403)
- Fix: MFlowInter (and possibly other MFlow...) crash with specific combination of analyze parameters (e.g. blkSize=16,overlapv=4,divide=1). Bug existed since at least 2.5.11.22
2.7.28 (20180323)
- Fix: in MDegrain1-6/N allow Y8 input for out16 parameter
2.7.27 (20180318)
- Fix: MDepan: use zerow parameter. The parameter had no effect probably since it had been introduced. (veins1)
- MDepan: report MT mode for Avisynth+. MT_MULTI_INSTANCE, except for logfile writing output mode when it reports MT_SERIALIZED. (other filters already have proper registration, MDepan was missed)
2.7.26 (20180314)
- New: MDegrain1-6 and N: new parameter bool "out16" = false. If set, 8 bit input results in native 16bit output (like lsb=true hack but this is native). Faster than lsb=true by up to 12% (i7-7700)
- Faster: special 10 bit SAD functions instead of the generic 10-16bit ones. Depending on the block size, 4-17% gain for a typical MDegrain1-6 session (for exactly 10 bit clips)
2.7.25 (20180227)
- Fix: x64: not-cleared mmx state in MSuper assembly code would cause crash later, e.g. in x264 encoding, depending on following filters.
- Fix: MSCDetection SC value parameter name to Ysc from Yth (must be an ancient typo), docs are OK, but the fix is mentioned in docs
- MSuper: import 8 bit sse2 interpolators from mvtools-vs. Extend them for 10-16bits (faster super clip). Some filters are still todo.
- MSuper: support 32bit float clips, which can be used later by MDegrains (but not for MAnalyse)
- MDegrains: allow degraining clip with different bit depth from vectors. Clip and Super must be the same bit depth
- MDegrains: consistently use limit and limitC, 255 do nothing, otherwise scale 0-254 value to the current bit-depth range
- Overlaps: more correct internal rounding for 8 bits:
old: pixel = Sum( (tmp + 256) >> 6) >> 5
new: pixel = (Sum( (tmp + 32) >> 6) + 16) >> 5
- Overlaps: round for 16bits
old: pixel = Sum(tmp) >> 11
new: pixel = (Sum(tmp) + 1024) >> 11
- Overlaps: 32bit float (but still use the original 11 bit window constants)
- Project: change from yasm to nasm.
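The Overlaps rounding formulas listed for 2.7.25 can be tried out directly. The Python sketch below transcribes the old and new expressions as written in the changelog; the function names and the sample `tmp` values are made up for illustration, and `tmp` is assumed to be a window-weighted contribution scaled by 11 bits (6 + 5).

```python
def overlaps_8bit_old(tmps):
    return sum((t + 256) >> 6 for t in tmps) >> 5

def overlaps_8bit_new(tmps):
    # Round each term to nearest for the >> 6 step, then round the final >> 5 as well.
    return (sum((t + 32) >> 6 for t in tmps) + 16) >> 5

def overlaps_16bit_old(tmps):
    return sum(tmps) >> 11           # plain truncation

def overlaps_16bit_new(tmps):
    return (sum(tmps) + 1024) >> 11  # add half of 2**11 before shifting

# Two hypothetical contributions whose exact result is 1280 / 2048 = 0.625.
sample = [640, 640]
print(overlaps_8bit_old(sample), overlaps_8bit_new(sample))

# Two hypothetical contributions whose exact result is 3072 / 2048 = 1.5.
sample16 = [1536, 1536]
print(overlaps_16bit_old(sample16), overlaps_16bit_new(sample16))
```

With these sample values the new 8 bit formula yields 1 where the old one yields 0, and the new 16 bit formula yields 2 where the old one truncates to 1.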
2.7.24 (20171205)
- Fix: MFlowBlur: possible access violation crash when nPel>1
- New: MScaleVect parameter 'bits'. e.g. Analyze 8 bit clips, use their vectors for 16 bits
- Move project to VS2017
2.7.23 (20171012)
- Fix: MScaleVect wrong rounding of scaled motion vectors with negative components. e.g. proper scaling (-1;-2) to (-2;-4) instead of (-1;-3)
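The wrong values in the entry above are what one gets when 0.5 is added before truncating toward zero, which only rounds correctly for non-negative numbers. The Python sketch below reproduces that behaviour and a symmetric alternative; treating this as the cause of the MScaleVect bug is an assumption, and both function names are hypothetical.

```python
import math

def scale_truncating(component, scale):
    # int() truncates toward zero, so "+ 0.5" rounds negatives the wrong way.
    return int(component * scale + 0.5)

def scale_symmetric(component, scale):
    # Round half away from zero, treating negative values symmetrically.
    x = component * scale
    return int(math.floor(x + 0.5)) if x >= 0 else -int(math.floor(-x + 0.5))

vector = (-1, -2)
print(tuple(scale_truncating(c, 2) for c in vector))  # prints (-1, -3)
print(tuple(scale_symmetric(c, 2) for c in vector))   # prints (-2, -4)
```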
2.7.22 (20170830)
- Misc: Stop using version suffix .22
- Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
- Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
- Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)
- Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
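The fftw conversion constant entry above (sqrt(2)/2 instead of 0.707) can be put in numbers. The Python sketch below only shows the size of the difference if the constant multiplies a full-scale value; how the plugin actually applies the constant is not described here, so this scaling is an assumption and `constant_error` is a hypothetical helper.

```python
import math

EXACT = math.sqrt(2) / 2  # 0.70710678...
OLD = 0.707               # previously used approximation

def constant_error(max_value):
    """Difference between the two constants when applied to a full-scale value."""
    return (EXACT - OLD) * max_value

print(constant_error(255))    # 8 bit full scale: a small fraction of one code value
print(constant_error(65535))  # 16 bit full scale: several code values
```

Under that assumption the difference stays far below one code value at 8 bits but reaches about 7 code values at 16 bits, which is consistent with the remark that 16 bit formats may benefit.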
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.5 (20200522)
DepanEstimate: 2.10.0.4 (20200522)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported, special DLLs with v141_xp toolset
MvTools2 2.7.42 with depans
MvTools2 + Depan + DepanEstimate
Change log
2.7.42 (20200522)
- MDegrain family: limit and limitc to float, allowing more granularity for 10+ bit depth
- Update Avisynth headers, use V8 interface frame property copy if available
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.5 (20200522)
DepanEstimate: 2.10.0.4 (20200522)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported, special DLLs with v141_xp toolset
MvTools2 2.7.41 with depans, Depan fix edition
MvTools2 + Depan + DepanEstimate
Change log
2.7.41 (20200430)
- (MVTools2 and DepanEstimate unchanged)
- Fixes in Depan.DLL:
Depan 2.13.1.5:
- Fix regression: DepanInterleave colorspace check allowed no planar, only YUY2
- Depan: Fix an issue when Depan would search for inputlog file even when there was no inputlog filename provided
- Depan: Fix crash on YUY2 input on internal YV16 conversion
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.4 (20190502)
DepanEstimate: 2.10.0.3 (20190502)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported, special DLLs with v141_xp toolset
MvTools2 2.7.41 with depans
MvTools2 + Depan + DepanEstimate
Change log
2.7.41 (20190502)
- Fix: regression since 2.7.35: MSuper chroma for non-planar YUY2 (Thanks to mkauf)
- Project moved to Visual Studio 2019, v141_xp and v142 toolset
2.7.40 (20190212)
- Fix: MFlowInter possible crash with specific parameter and colorspace settings.
- Fix: MFlowInter possible overflow at 16 bit clips (artifacts)
- MFlow, MFlowInter, MFlowFPS, MFlowBlur, MBlockFPS: support all 8-16 and 32 bit float Y, YUV (4:2:0, 4:2:2, 4:4:4) and planar RGB color spaces.
Input clip format is independent from the ones used for getting motion vectors.
2.7.39 (20190102)
- MSuper: fix 16 bits, pel=2, sharp=2, which caused bottom-section artifacts for MDegrain using 8 bit vector origin and 16 bit real clip
- MDegrain1-6,N: Enhanced:
Input clip (and super) format now is fully independent from vector clip's base format (subsampling had to be the same before)
E.g. make motion vector clips from a YV12 source and apply them on a 8-32 bit 4:4:4 input
2.7.38 (20181209)
- MCompensate: Fix regression in latest v37: overlap=0 crash - ouch, sorry
- MAnalyze DCT: FFTW float: quicker postprocess of DCT'd blocks, now is correct for non power-of-2 block sizes e.g. 12 or 24 (no effect on 8x8 for which fast integer DCT is used)
- MAnalyze DCT: more consistent handling of post DCT internal normalization for non-square block sizes.
2.7.37 (20181128)
- MCompensate: limit thSAD, thSAD2, thSCD1 to valid range 0-(8x8x255) (e.g. given thSAD = 100000 will go back to 16320)
- Fix: MCompensate: use int64 to avoid effective thSAD and thSAD2 overflow typically happen at bigger block sizes or large thSAD parameter value.
- MCompensate: SSE2 (8bit) and SSE4 (10-16 bit) overlap result calculation
- Changed: SAD 8x8, 8x4, 4x4, 4x8 to use SSE2 instead of MMX registers
- Fix: MDegrain if overlap<>0: missing rounder in rightmost 8 pixels for non-mod8 width 8 bit clips
- Fix: MSuper artifacts at 10-32 bits and nPel==4
- New: MCompensate 32 bit float and planar RGB support (by using motion vectors made from 8-16 bit YUV clip). Input and super clip can be of a different format than the one used for motion vector creation. (Similar to MDegrain1-6)
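The threshold limiting described for MCompensate above can be sketched as follows. This is a minimal Python illustration, not the plugin's C++ code; the names `MAX_THSAD` and `clamp_thsad` are hypothetical. It assumes the upper limit is the largest SAD an 8x8 block of 8 bit pixels can produce, 8 x 8 x 255 = 16320, as the entry states.

```python
# Largest possible SAD for an 8x8 block of 8-bit pixels: 64 pixels * 255.
MAX_THSAD = 8 * 8 * 255  # 16320 (hypothetical constant name)


def clamp_thsad(value: int) -> int:
    """Clamp a user-supplied threshold into the range 0..MAX_THSAD."""
    return max(0, min(value, MAX_THSAD))


print(clamp_thsad(100000))  # 16320
print(clamp_thsad(400))     # 400
```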
2.7.36 (20181120)
- Fix: Allow overlap operation when there are only two blocks in either horizontal or vertical direction (was: division by 0 crash)
- Fix: Fallback to overlap=0 mode when block count is only 1 in either h or v direction (was: undefined behaviour). The cases above occurred for small frame sizes, when frame size and overlap values resulted in fewer than 3 blocks in a direction.
- Misc: update html docs with overlap drawing and others.
2.7.35 (20181113)
- MFlowXXX: Slight speed gain by putting the out-of-frame vector check to resizer
- Fix: MAnalyze: Fix a possible internal overflow on larger blocksizes with small overlap e.g. BlkSize=32, OverlapX=0, OverlapY=4
- MSuper: Planar RGB support.
- MFlowFPS: Planar RGB support: generate vectors in YUV, use RGB input and super clip
- MFlowFPS: less memory for 4:4:4 and greyscale
- MFlowFPS: a bit faster 4:4:4, much faster greyscale
- MFlowFPS: support different bit depth for clip and the vectors (use vectors from 8 bit analysis for a 16 bit clip)
2.7.34 (20181108)
- MFlowInter: Use less memory, eliminates ten full-frame internal buffers
2.7.33 (20181021)
- Fix: MFlowXX: random access violation caused by enlarged vectors pointing on out-of-frame positions
2.7.32 (20181018)
- New: MAnalyze: Enhance mt mode report for Avisynth+: MT_SERIALIZED instead of MT_MULTI_INSTANCE when temporal=true or using output file.
- Fix: MAnalyze: fix a possible internal overflow on larger blocksizes and lambda combinations. e.g. truemotion=true with blksize=32
2.7.31 (20180409)
- Fix: MFlow: SC detection after having the mv clip. Fixed in 2.5.11.22 but was missed during 2.6.0.5 merge.
- Fix: MFlow: crash in 16bit 4:2:0, mode=1
- Fix: MDegrain, out16=true: Green bottom lines when overlap blocks are not covering the full vertical area
2.7.30 (20180405)
- Fix: crash in MFlowInter (and possibly other MFlow...). v2.7.29 revealed this additional bug (which was not even 100% reproducible); this fix is basically the 2nd part of the solution.
2.7.29 (20180403)
- Fix: MFlowInter (and possibly other MFlow...) crash with specific combination of analyze parameters (e.g. blkSize=16,overlapv=4,divide=1). Bug existed since at least 2.5.11.22
2.7.28 (20180323)
- Fix: in MDegrain1-6/N allow Y8 input for out16 parameter
2.7.27 (20180318)
- Fix: MDepan: use zerow parameter. The parameter had no effect probably since it had been introduced. (veins1)
- MDepan: report MT mode for Avisynth+. MT_MULTI_INSTANCE, except for logfile writing output mode when it reports MT_SERIALIZED. (other filters already have proper registration, MDepan was missed)
2.7.26 (20180314)
- New: MDegrain1-6 and N: new parameter bool "out16" = false. If set, 8 bit input results in native 16bit output (like lsb=true hack but this is native). Faster than lsb=true by up to 12% (i7-7700)
- Faster: special 10 bit SAD functions instead of the generic 10-16bit ones. Depending on the block size, 4-17% gain for a typical MDegrain1-6 session (for exactly 10 bit clips)
2.7.25 (20180227)
- Fix: x64: not-cleared mmx state in MSuper assembly code would cause crash later, e.g. in x264 encoding, depending on following filters.
- Fix: MSCDetection SC value parameter name to Ysc from Yth (must be an ancient typo), docs are OK, but the fix is mentioned in docs
- MSuper: import 8 bit sse2 interpolators from mvtools-vs. Extend them for 10-16bits (faster super clip). Some filters are still todo.
- MSuper: support 32bit float clips, which can be used later by MDegrains (but not for MAnalyse)
- MDegrains: allow degraining clip with different bit depth from vectors. Clip and Super must be the same bit depth
- MDegrains: consistently use limit and limitC, 255 do nothing, otherwise scale 0-254 value to the current bit-depth range
- Overlaps: more correct internal rounding for 8 bits:
old: pixel = Sum( (tmp + 256) >> 6) >> 5
new: pixel = (Sum( (tmp + 32) >> 6) + 16) >> 5
- Overlaps: round for 16bits
old: pixel = Sum(tmp) >> 11
new: pixel = (Sum(tmp) + 1024) >> 11
- Overlaps: 32bit float (but still use the original 11 bit window constants)
- Project: change from yasm to nasm.
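The old and new overlap rounding formulas from the 2.7.25 entries above can be compared with a small Python sketch. It is illustrative only: the function names are hypothetical, and `tmp` is assumed to be a pixel value already multiplied by its window weight, with the weights of all overlapping blocks summing to 2048 (11 bits).

```python
def overlap_8bit_old(tmps):
    # old: pixel = Sum((tmp + 256) >> 6) >> 5
    return sum((t + 256) >> 6 for t in tmps) >> 5


def overlap_8bit_new(tmps):
    # new: pixel = (Sum((tmp + 32) >> 6) + 16) >> 5
    # Each term is rounded at the >> 6 step, then the sum at the >> 5 step.
    return (sum((t + 32) >> 6 for t in tmps) + 16) >> 5


def overlap_16bit_old(tmps):
    # old: pixel = Sum(tmp) >> 11 (truncates)
    return sum(tmps) >> 11


def overlap_16bit_new(tmps):
    # new: pixel = (Sum(tmp) + 1024) >> 11 (rounds to nearest)
    return (sum(tmps) + 1024) >> 11


# Assumed window weights of four overlapping blocks; they sum to 2048.
weights = [1000, 600, 300, 148]
tmps8 = [77 * w for w in weights]
print(overlap_8bit_old(tmps8), overlap_8bit_new(tmps8))  # 77 77

# Weighted sum 206300 / 2048 is about 100.73: truncation gives 100, rounding 101.
tmps16 = [102400, 103900]
print(overlap_16bit_old(tmps16), overlap_16bit_new(tmps16))  # 100 101
```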
2.7.24 (20171205)
- Fix: MFlowBlur: possible access violation crash when nPel>1
- New: MScaleVect parameter 'bits'. e.g. Analyze 8 bit clips, use their vectors for 16 bits
- Move project to VS2017
2.7.23 (20171012)
- Fix: MScaleVect wrong rounding of scaled motion vectors with negative components. e.g. (-1;-2) is now properly scaled to (-2;-4) instead of (-1;-3)
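The changelog does not show the faulty code. One common pattern that produces exactly the wrong result quoted in this entry is adding 0.5 and then truncating toward zero, which only rounds correctly for non-negative values. The Python sketch below illustrates that assumed pattern; the function names are hypothetical and this is not the plugin's implementation.

```python
import math


def scale_trunc(component, scale):
    # int() truncates toward zero, so "+ 0.5" misrounds negative values.
    return int(component * scale + 0.5)


def scale_floor(component, scale):
    # floor(x + 0.5) rounds to nearest for both signs.
    return math.floor(component * scale + 0.5)


vec = (-1, -2)
print(tuple(scale_trunc(c, 2) for c in vec))  # (-1, -3)
print(tuple(scale_floor(c, 2) for c in vec))  # (-2, -4)
```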
2.7.22 (20170830)
- Misc: Stop using version suffix .22
- Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
- Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
- Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)
- Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
- Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
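The note in the 2.7.22 list that 16 bit formats may benefit from the more accurate sqrt(2)/2 constant can be made concrete with a little arithmetic. The Python sketch below only scales the difference between the two constants by the peak value of each bit depth; how the constant actually enters the DCT normalization is not described here, so treat the numbers as an order-of-magnitude illustration. The function name is hypothetical.

```python
import math

EXACT = math.sqrt(2) / 2  # about 0.70710678
APPROX = 0.707            # the previously used constant


def constant_error(bits: int) -> float:
    """Difference between the two constants, scaled to the peak value."""
    peak = (1 << bits) - 1
    return (EXACT - APPROX) * peak


for bits in (8, 16):
    # Roughly 0.03 at 8 bits and roughly 7 at 16 bits.
    print(bits, round(constant_error(bits), 3))
```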
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.4 (20190502)
DepanEstimate: 2.10.0.3 (20190502)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported, special DLLs with v141_xp toolset
MvTools2 2.7.40 with depans
MvTools2 + Depan + DepanEstimate
Change log
2.7.40 (20190212)
- Fix: MFlowInter possible crash with specific parameter and colorspace settings.
- Fix: MFlowInter possible overflow at 16 bit clips (artifacts)
- MFlow, MFlowInter, MFlowFPS, MFlowBlur, MBlockFPS: support all 8-16 and 32 bit float Y, YUV (4:2:0, 4:2:2, 4:4:4) and planar RGB color spaces.
Input clip format is independent from the ones used for getting motion vectors.
2.7.39 (20190102)
- MSuper: fix 16 bits, pel=2, sharp=2, which caused bottom-section artifacts for MDegrain using 8 bit vector origin and 16 bit real clip
- MDegrain1-6,N: Enhanced:
Input clip (and super) format now is fully independent from vector clip's base format (subsampling had to be the same before)
E.g. make motion vector clips from a YV12 source and apply them on a 8-32 bit 4:4:4 input
2.7.38 (20181209)
- MCompensate: Fix regression in latest v37: overlap=0 crash - ouch, sorry
- MAnalyze DCT: FFTW float: quicker postprocess of DCT'd blocks, now is correct for non power-of-2 block sizes e.g. 12 or 24 (no effect on 8x8 for which fast integer DCT is used)
- MAnalyze DCT: more consistent handling of post DCT internal normalization for non-square block sizes.
2.7.37 (20181128)
- MCompensate: limit thSAD, thSAD2, thSCD1 to valid range 0-(8x8x255) (e.g. a given thSAD = 100000 is reduced to 16320)
- Fix: MCompensate: use int64 to avoid overflow of the effective thSAD and thSAD2, which typically happened at bigger block sizes or large thSAD parameter values.
- MCompensate: SSE2 (8bit) and SSE4 (10-16 bit) overlap result calculation
- Changed: SAD 8x8, 8x4, 4x4, 4x8 to use SSE2 instead of MMX registers
- Fix: MDegrain if overlap<>0: missing rounder in rightmost 8 pixels for non-mod8 width 8 bit clips
- Fix: MSuper artifacts at 10-32 bits and nPel==4
- New: MCompensate 32 bit float and planar RGB support (by using motion vectors made from 8-16 bit YUV clip). Input and super clip can be of a different format than the one used for motion vector creation. (Similar to MDegrain1-6)
2.7.36 (20181120)
- Fix: Allow overlap operation when there are only two blocks in either horizontal or vertical direction (was: division by 0 crash)
- Fix: Fallback to overlap=0 mode when block count is only 1 in either h or v direction (was: undefined behaviour). The cases above occurred for small frame sizes, when frame size and overlap values resulted in fewer than 3 blocks in a direction.
- Misc: update html docs with overlap drawing and others.
2.7.35 (20181113)
- MFlowXXX: Slight speed gain by putting the out-of-frame vector check to resizer
- Fix: MAnalyze: Fix a possible internal overflow on larger blocksizes with small overlap e.g. BlkSize=32, OverlapX=0, OverlapY=4
- MSuper: Planar RGB support.
- MFlowFPS: Planar RGB support: generate vectors in YUV, use RGB input and super clip
- MFlowFPS: less memory for 4:4:4 and greyscale
- MFlowFPS: a bit faster 4:4:4, much faster greyscale
- MFlowFPS: support different bit depth for clip and the vectors (use vectors from 8 bit analysis for a 16 bit clip)
2.7.34 (20181108)
- MFlowInter: Use less memory, eliminates ten full-frame internal buffers
2.7.33 (20181021)
- Fix: MFlowXX: random access violation caused by enlarged vectors pointing on out-of-frame positions
2.7.32 (20181018)
- New: MAnalyze: Enhance mt mode report for Avisynth+: MT_SERIALIZED instead of MT_MULTI_INSTANCE when temporal=true or using output file.
- Fix: MAnalyze: fix a possible internal overflow on larger blocksizes and lambda combinations. e.g. truemotion=true with blksize=32
2.7.31 (20180409)
- Fix: MFlow: SC detection after having the mv clip. Fixed in 2.5.11.22 but was missed during 2.6.0.5 merge.
- Fix: MFlow: crash in 16bit 4:2:0, mode=1
- Fix: MDegrain, out16=true: Green bottom lines when overlap blocks are not covering the full vertical area
2.7.30 (20180405)
- Fix: crash in MFlowInter (and possibly other MFlow...). v2.7.29 revealed this additional bug (which was not even 100% reproducible); this fix is basically the 2nd part of the solution.
2.7.29 (20180403)
- Fix: MFlowInter (and possibly other MFlow...) crash with specific combination of analyze parameters (e.g. blkSize=16,overlapv=4,divide=1). Bug existed since at least 2.5.11.22
2.7.28 (20180323)
- Fix: in MDegrain1-6/N allow Y8 input for out16 parameter
2.7.27 (20180318)
- Fix: MDepan: use zerow parameter. The parameter had no effect probably since it had been introduced. (veins1)
- MDepan: report MT mode for Avisynth+. MT_MULTI_INSTANCE, except for logfile writing output mode when it reports MT_SERIALIZED. (other filters already have proper registration, MDepan was missed)
2.7.26 (20180314)
- New: MDegrain1-6 and N: new parameter bool "out16" = false. If set, 8 bit input results in native 16bit output (like lsb=true hack but this is native). Faster than lsb=true by up to 12% (i7-7700)
- Faster: special 10 bit SAD functions instead of the generic 10-16bit ones. Depending on the block size, 4-17% gain for a typical MDegrain1-6 session (for exactly 10 bit clips)
2.7.25 (20180227)
- Fix: x64: not-cleared mmx state in MSuper assembly code would cause crash later, e.g. in x264 encoding, depending on following filters.
- Fix: MSCDetection SC value parameter name to Ysc from Yth (must be an ancient typo), docs are OK, but the fix is mentioned in docs
- MSuper: import 8 bit sse2 interpolators from mvtools-vs. Extend them for 10-16bits (faster super clip). Some filters are still todo.
- MSuper: support 32bit float clips, which can be used later by MDegrains (but not for MAnalyse)
- MDegrains: allow degraining clip with different bit depth from vectors. Clip and Super must be the same bit depth
- MDegrains: consistently use limit and limitC, 255 do nothing, otherwise scale 0-254 value to the current bit-depth range
- Overlaps: more correct internal rounding for 8 bits:
old: pixel = Sum( (tmp + 256) >> 6) >> 5
new: pixel = (Sum( (tmp + 32) >> 6) + 16) >> 5
- Overlaps: round for 16bits
old: pixel = Sum(tmp) >> 11
new: pixel = (Sum(tmp) + 1024) >> 11
- Overlaps: 32bit float (but still use the original 11 bit window constants)
- Project: change from yasm to nasm.
2.7.24 (20171205)
- Fix: MFlowBlur: possible access violation crash when nPel>1
- New: MScaleVect parameter 'bits'. e.g. Analyze 8 bit clips, use their vectors for 16 bits
- Move project to VS2017
2.7.23 (20171012)
- Fix: MScaleVect wrong rounding of scaled motion vectors with negative components. e.g. (-1;-2) is now properly scaled to (-2;-4) instead of (-1;-3)
2.7.22 (20170830)
- Misc: Stop using version suffix .22
- Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
- Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
- Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)
- Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
- Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.3 (20170525)
DepanEstimate: 2.10.0.2 (20170525)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported
MvTools2 2.7.39 with depans
MvTools2 + Depan + DepanEstimate
Change log
2.7.39 (20190102)
- MSuper: fix 16 bits, pel=2, sharp=2, which caused bottom-section artifacts for MDegrain using 8 bit vector origin and 16 bit real clip
- MDegrain1-6,N: Enhanced:
Input clip (and super) format now is fully independent from vector clip's base format (subsampling had to be the same before)
E.g. make motion vector clips from a YV12 source and apply them on a 8-32 bit 4:4:4 input
2.7.38 (20181209)
- MCompensate: Fix regression in latest v37: overlap=0 crash - ouch, sorry
- MAnalyze DCT: FFTW float: quicker postprocess of DCT'd blocks, now is correct for non power-of-2 block sizes e.g. 12 or 24 (no effect on 8x8 for which fast integer DCT is used)
- MAnalyze DCT: more consistent handling of post DCT internal normalization for non-square block sizes.
2.7.37 (20181128)
- MCompensate: limit thSAD, thSAD2, thSCD1 to valid range 0-(8x8x255) (e.g. a given thSAD = 100000 is reduced to 16320)
- Fix: MCompensate: use int64 to avoid overflow of the effective thSAD and thSAD2, which typically happened at bigger block sizes or large thSAD parameter values.
- MCompensate: SSE2 (8bit) and SSE4 (10-16 bit) overlap result calculation
- Changed: SAD 8x8, 8x4, 4x4, 4x8 to use SSE2 instead of MMX registers
- Fix: MDegrain if overlap<>0: missing rounder in rightmost 8 pixels for non-mod8 width 8 bit clips
- Fix: MSuper artifacts at 10-32 bits and nPel==4
- New: MCompensate 32 bit float and planar RGB support (by using motion vectors made from 8-16 bit YUV clip). Input and super clip can be of a different format than the one used for motion vector creation. (Similar to MDegrain1-6)
2.7.36 (20181120)
- Fix: Allow overlap operation when there are only two blocks in either horizontal or vertical direction (was: division by 0 crash)
- Fix: Fallback to overlap=0 mode when block count is only 1 in either h or v direction (was: undefined behaviour). The cases above occurred for small frame sizes, when frame size and overlap values resulted in fewer than 3 blocks in a direction.
- Misc: update html docs with overlap drawing and others.
2.7.35 (20181113)
- MFlowXXX: Slight speed gain by putting the out-of-frame vector check to resizer
- Fix: MAnalyze: Fix a possible internal overflow on larger blocksizes with small overlap e.g. BlkSize=32, OverlapX=0, OverlapY=4
- MSuper: Planar RGB support.
- MFlowFPS: Planar RGB support: generate vectors in YUV, use RGB input and super clip
- MFlowFPS: less memory for 4:4:4 and greyscale
- MFlowFPS: a bit faster 4:4:4, much faster greyscale
- MFlowFPS: support different bit depth for clip and the vectors (use vectors from 8 bit analysis for a 16 bit clip)
2.7.34 (20181108)
- MFlowInter: Use less memory, eliminates ten full-frame internal buffers
2.7.33 (20181021)
- Fix: MFlowXX: random access violation caused by enlarged vectors pointing on out-of-frame positions
2.7.32 (20181018)
- New: MAnalyze: Enhance mt mode report for Avisynth+: MT_SERIALIZED instead of MT_MULTI_INSTANCE when temporal=true or using output file.
- Fix: MAnalyze: fix a possible internal overflow on larger blocksizes and lambda combinations. e.g. truemotion=true with blksize=32
2.7.31 (20180409)
- Fix: MFlow: SC detection after having the mv clip. Fixed in 2.5.11.22 but was missed during 2.6.0.5 merge.
- Fix: MFlow: crash in 16bit 4:2:0, mode=1
- Fix: MDegrain, out16=true: Green bottom lines when overlap blocks are not covering the full vertical area
2.7.30 (20180405)
- Fix: crash in MFlowInter (and possibly other MFlow...). v2.7.29 revealed this additional bug (which was not even 100% reproducible); this fix is basically the 2nd part of the solution.
2.7.29 (20180403)
- Fix: MFlowInter (and possibly other MFlow...) crash with specific combination of analyze parameters (e.g. blkSize=16,overlapv=4,divide=1). Bug existed since at least 2.5.11.22
2.7.28 (20180323)
- Fix: in MDegrain1-6/N allow Y8 input for out16 parameter
2.7.27 (20180318)
- Fix: MDepan: use zerow parameter. The parameter had no effect probably since it had been introduced. (veins1)
- MDepan: report MT mode for Avisynth+. MT_MULTI_INSTANCE, except for logfile writing output mode when it reports MT_SERIALIZED. (other filters already have proper registration, MDepan was missed)
2.7.26 (20180314)
- New: MDegrain1-6 and N: new parameter bool "out16" = false. If set, 8 bit input results in native 16bit output (like lsb=true hack but this is native). Faster than lsb=true by up to 12% (i7-7700)
- Faster: special 10 bit SAD functions instead of the generic 10-16bit ones. Depending on the block size, 4-17% gain for a typical MDegrain1-6 session (for exactly 10 bit clips)
2.7.25 (20180227)
- Fix: x64: not-cleared mmx state in MSuper assembly code would cause crash later, e.g. in x264 encoding, depending on following filters.
- Fix: MSCDetection SC value parameter name to Ysc from Yth (must be an ancient typo), docs are OK, but the fix is mentioned in docs
- MSuper: import 8 bit sse2 interpolators from mvtools-vs. Extend them for 10-16bits (faster super clip). Some filters are still todo.
- MSuper: support 32bit float clips, which can be used later by MDegrains (but not for MAnalyse)
- MDegrains: allow degraining clip with different bit depth from vectors. Clip and Super must be the same bit depth
- MDegrains: consistently use limit and limitC, 255 do nothing, otherwise scale 0-254 value to the current bit-depth range
- Overlaps: more correct internal rounding for 8 bits:
old: pixel = Sum( (tmp + 256) >> 6) >> 5
new: pixel = (Sum( (tmp + 32) >> 6) + 16) >> 5
- Overlaps: round for 16bits
old: pixel = Sum(tmp) >> 11
new: pixel = (Sum(tmp) + 1024) >> 11
- Overlaps: 32bit float (but still use the original 11 bit window constants)
- Project: change from yasm to nasm.
2.7.24 (20171205)
- Fix: MFlowBlur: possible access violation crash when nPel>1
- New: MScaleVect parameter 'bits'. e.g. Analyze 8 bit clips, use their vectors for 16 bits
- Move project to VS2017
2.7.23 (20171012)
- Fix: MScaleVect wrong rounding of scaled motion vectors with negative components. e.g. (-1;-2) is now properly scaled to (-2;-4) instead of (-1;-3)
2.7.22 (20170830)
- Misc: Stop using version suffix .22
- Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
- Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
- Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)
- Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
- Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.3 (20170525)
DepanEstimate: 2.10.0.2 (20170525)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported
MvTools2 2.7.38 with depans
MvTools2 + Depan + DepanEstimate
Change log
2.7.38 (20181209)
- MCompensate: Fix regression in latest v37: overlap=0 crash - ouch, sorry
- MAnalyze DCT: FFTW float: quicker postprocess of DCT'd blocks, now is correct for non power-of-2 block sizes e.g. 12 or 24 (no effect on 8x8 for which fast integer DCT is used)
- MAnalyze DCT: more consistent handling of post DCT internal normalization for non-square block sizes.
2.7.37 (20181128)
- MCompensate: limit thSAD, thSAD2, thSCD1 to valid range 0-(8x8x255) (e.g. a given thSAD = 100000 is reduced to 16320)
- Fix: MCompensate: use int64 to avoid overflow of the effective thSAD and thSAD2, which typically happened at bigger block sizes or large thSAD parameter values.
- MCompensate: SSE2 (8bit) and SSE4 (10-16 bit) overlap result calculation
- Changed: SAD 8x8, 8x4, 4x4, 4x8 to use SSE2 instead of MMX registers
- Fix: MDegrain if overlap<>0: missing rounder in rightmost 8 pixels for non-mod8 width 8 bit clips
- Fix: MSuper artifacts at 10-32 bits and nPel==4
- New: MCompensate 32 bit float and planar RGB support (by using motion vectors made from 8-16 bit YUV clip). Input and super clip can be of a different format than the one used for motion vector creation. (Similar to MDegrain1-6)
2.7.36 (20181120)
- Fix: Allow overlap operation when there are only two blocks in either horizontal or vertical direction (was: division by 0 crash)
- Fix: Fallback to overlap=0 mode when block count is only 1 in either h or v direction (was: undefined behaviour). The cases above occurred for small frame sizes, when frame size and overlap values resulted in fewer than 3 blocks in a direction.
- Misc: update html docs with overlap drawing and others.
2.7.35 (20181113)
- MFlowXXX: Slight speed gain by putting the out-of-frame vector check to resizer
- Fix: MAnalyze: Fix a possible internal overflow on larger blocksizes with small overlap e.g. BlkSize=32, OverlapX=0, OverlapY=4
- MSuper: Planar RGB support.
- MFlowFPS: Planar RGB support: generate vectors in YUV, use RGB input and super clip
- MFlowFPS: less memory for 4:4:4 and greyscale
- MFlowFPS: a bit faster 4:4:4, much faster greyscale
- MFlowFPS: support different bit depth for clip and the vectors (use vectors from 8 bit analysis for a 16 bit clip)
2.7.34 (20181108)
- MFlowInter: Use less memory, eliminates ten full-frame internal buffers
2.7.33 (20181021)
- Fix: MFlowXX: random access violation caused by enlarged vectors pointing on out-of-frame positions
2.7.32 (20181018)
- New: MAnalyze: Enhance mt mode report for Avisynth+: MT_SERIALIZED instead of MT_MULTI_INSTANCE when temporal=true or using output file.
- Fix: MAnalyze: fix a possible internal overflow on larger blocksizes and lambda combinations. e.g. truemotion=true with blksize=32
2.7.31 (20180409)
- Fix: MFlow: SC detection after having the mv clip. Fixed in 2.5.11.22 but was missed during 2.6.0.5 merge.
- Fix: MFlow: crash in 16bit 4:2:0, mode=1
- Fix: MDegrain, out16=true: Green bottom lines when overlap blocks are not covering the full vertical area
2.7.30 (20180405)
- Fix: crash in MFlowInter (and possibly other MFlow...). v2.7.29 revealed this additional bug (which was not even 100% reproducible); this fix is basically the 2nd part of the solution.
2.7.29 (20180403)
- Fix: MFlowInter (and possibly other MFlow...) crash with specific combination of analyze parameters (e.g. blkSize=16,overlapv=4,divide=1). Bug existed since at least 2.5.11.22
2.7.28 (20180323)
- Fix: in MDegrain1-6/N allow Y8 input for out16 parameter
2.7.27 (20180318)
- Fix: MDepan: use zerow parameter. The parameter had no effect probably since it had been introduced. (veins1)
- MDepan: report MT mode for Avisynth+. MT_MULTI_INSTANCE, except for logfile writing output mode when it reports MT_SERIALIZED. (other filters already have proper registration, MDepan was missed)
2.7.26 (20180314)
- New: MDegrain1-6 and N: new parameter bool "out16" = false. If set, 8 bit input results in native 16bit output (like lsb=true hack but this is native). Faster than lsb=true by up to 12% (i7-7700)
- Faster: special 10 bit SAD functions instead of the generic 10-16bit ones. Depending on the block size, 4-17% gain for a typical MDegrain1-6 session (for exactly 10 bit clips)
2.7.25 (20180227)
- Fix: x64: not-cleared mmx state in MSuper assembly code would cause crash later, e.g. in x264 encoding, depending on following filters.
- Fix: MSCDetection SC value parameter name to Ysc from Yth (must be an ancient typo), docs are OK, but the fix is mentioned in docs
- MSuper: import 8 bit sse2 interpolators from mvtools-vs. Extend them for 10-16bits (faster super clip). Some filters are still todo.
- MSuper: support 32bit float clips, which can be used later by MDegrains (but not for MAnalyse)
- MDegrains: allow degraining clip with different bit depth from vectors. Clip and Super must be the same bit depth
- MDegrains: consistently use limit and limitC, 255 do nothing, otherwise scale 0-254 value to the current bit-depth range
- Overlaps: more correct internal rounding for 8 bits:
old: pixel = Sum( (tmp + 256) >> 6) >> 5
new: pixel = (Sum( (tmp + 32) >> 6) + 16) >> 5
- Overlaps: round for 16bits
old: pixel = Sum(tmp) >> 11
new: pixel = (Sum(tmp) + 1024) >> 11
- Overlaps: 32bit float (but still use the original 11 bit window constants)
- Project: change from yasm to nasm.
2.7.24 (20171205)
- Fix: MFlowBlur: possible access violation crash when nPel>1
- New: MScaleVect parameter 'bits'. e.g. Analyze 8 bit clips, use their vectors for 16 bits
- Move project to VS2017
2.7.23 (20171012)
- Fix: MScaleVect wrong rounding of scaled motion vectors with negative components. e.g. (-1;-2) is now properly scaled to (-2;-4) instead of (-1;-3)
2.7.22 (20170830)
- Misc: Stop using version suffix .22
- Fix: [DCT 8x8@8bit] garbage on x64: internal assembly code did not save xmm6/xmm7
- Fix: [DCT 8x8@8bit] safe multithreading for integer DCT (8x8 block size, 8 bit video): assembly had a single working buffer.
- Fix: [MDegrain] did not release input motion vector clips in destructor, possible hang at script closing. Bug since 2.7.1.22 (introducing MDegrain4/5)
- Mod: fftw conversion constant of sqrt(2)/2 is more accurate (was:0.707), 16 bit formats may benefit (by feisty2)
- Fix: SSE4 assembly instructions in x64, broke on non-SSE4 processors
For full changelog see documentation.
Bundled filters:
Depan: 2.13.1.3 (20170525)
DepanEstimate: 2.10.0.2 (20170525)
General support of 10-16 bit formats with Avisynth Plus (r2294 or newer recommended)
x86 and x64 builds
Windows XP is still supported