I've noticed that when pulling the same video from YouTube more than once, the resulting files are exactly the same size (9,201,043 bytes in this case), but 52 bytes near the beginning differ slightly (diff below).
This causes the md5 hash of the file to be completely different, and makes it difficult to track duplicate videos in a system we are building, among other issues.
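For anyone who wants to reproduce the comparison, here is a minimal Python sketch of how the differing byte positions can be listed. The function name `differing_offsets` is my own and purely illustrative, and it assumes both downloads fit in memory and have the same length, as they do in this case.

```python
from pathlib import Path


def differing_offsets(path_a, path_b):
    """Return the byte offsets at which two equal-sized files differ."""
    a = Path(path_a).read_bytes()
    b = Path(path_b).read_bytes()
    if len(a) != len(b):
        # This sketch only handles the same-size case described above.
        raise ValueError("files differ in length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

Calling it on two downloads of the same video returns a list of offsets; its length is the number of differing bytes, and the largest value shows how far into the file the differences reach.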
Currently, our fix is to drop a fixed number of bytes from the start of each file before hashing, since the changes always seem to fall in the header, within the first 0.2% of the file. The problem is that the stored hash then no longer matches the hash of the actual file, which is pretty unintuitive and could cause problems down the line.
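Roughly, the workaround looks like the sketch below. This is a simplified illustration rather than our production code: the name `md5_skipping_prefix` and the `skip_bytes` parameter are hypothetical, and the right number of bytes to skip is an assumption that has to be chosen per use case.

```python
import hashlib


def md5_skipping_prefix(path, skip_bytes, chunk_size=1 << 20):
    """MD5 of a file's contents, ignoring the first skip_bytes bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Jump past the region that varies between downloads.
        f.seek(skip_bytes)
        # Read in chunks so large videos are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two downloads that differ only inside the skipped prefix get the same digest, but that digest will not match what `md5sum` reports for the file on disk, which is exactly the mismatch described above.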
It would be really nice if youtube-dl had an option to either overwrite these bytes with known-good data (if they come from YouTube) or always write the same bytes (if youtube-dl itself generates them), so that the resulting file is identical every time.
Also, if anyone knows what these header bytes represent and what it would be safe to overwrite them with, I could do this myself. An option would still be best, though, as I'm sure others have run into this issue.
Thanks!