Single video pulled twice results in different video files of same size #5700

Closed
eriktjacobsen opened this issue May 14, 2015 · 1 comment


@eriktjacobsen eriktjacobsen commented May 14, 2015

I've noticed that when pulling the same video from YouTube twice, the resulting files are exactly the same size (9,201,043 bytes in this case), but 52 bytes near the beginning differ slightly (diff below).

As a result, the MD5 hash of the file is completely different, which makes it difficult to track duplicate videos in a system we are building, among other issues.

Currently, our workaround is to drop a fixed number of bytes from the start of the file before hashing, since the changes always seem to be in the header, within the first 0.2% of the file. The problem is that the hash we store then doesn't match the hash computed over the actual file, which is pretty non-intuitive and could cause problems down the line.
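The workaround described above could be sketched roughly as follows. This is a minimal illustration, not youtube-dl functionality: the function name `content_hash` and the `HEADER_SKIP_BYTES` value are assumptions made here. The 20 KiB figure was chosen only because it covers the offsets shown in the diff below (up to 0x4abf); nothing guarantees the differing bytes stay within that range for other videos.

```python
import hashlib

# Hypothetical value: number of leading bytes to ignore when hashing.
# 20 KiB covers the differing offsets seen in this report (up to 0x4abf),
# but that is an observation about one file, not a documented limit.
HEADER_SKIP_BYTES = 20 * 1024


def content_hash(path, skip=HEADER_SKIP_BYTES, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at `path`, ignoring its first `skip` bytes."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Jump past the region that varies between downloads.
        f.seek(skip)
        # Read in chunks so large videos are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Two downloads that differ only inside the skipped region produce the same digest. As noted above, that digest will not match what `md5sum` reports for the whole file, so it should be stored and labeled as a separate value.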

It would be really nice to have an option for youtube-dl to overwrite these bytes with known working data if they come from YouTube, or to always write the same bytes if youtube-dl itself is writing them, so that the resulting file is identical every time.

Also, if anyone knows what these header bytes represent and what it would be safe to overwrite them with, I could do this myself. An option would still be best, though, as I'm sure others have run into this issue.

Thanks!

1173,1174c1173,1174
< 0004940: 4146 4346 3134 324d 4831 3433 3135 3634  AFCF142MH1431564
< 0004950: 3734 3333 3637 3332 3800 0000 0000 0000  743367328.......
---
> 0004940: 4146 4346 3545 3248 4831 3433 3135 3635  AFCF5E2HH1431565
> 0004950: 3032 3232 3933 3439 3400 0000 0000 0000  022293494.......
1195,1196c1195,1196
< 0004aa0: 0001 0000 0000 7238 2d2d 2d73 6e2d 6f30  ......r8---sn-o0
< 0004ab0: 3937 7a6e 6536 2e67 6f6f 676c 6576 6964  97zne6.googlevid
---
> 0004aa0: 0001 0000 0000 7238 2d2d 2d73 6e2d 6e77  ......r8---sn-nw
> 0004ab0: 6a37 6b6e 6536 2e67 6f6f 676c 6576 6964  j7kne6.googlevid
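
A hex diff like the one above can also be turned into a list of byte offsets, which is useful for checking whether the differences really stay inside the header across many downloads. The following is a minimal sketch; the function name `differing_ranges` is made up here, and it assumes the two files are the same size, as in this report.

```python
def differing_ranges(path_a, path_b, chunk_size=1 << 20):
    """Return (start, end) byte offsets, end exclusive, where two same-size files differ."""
    ranges = []
    start = None  # start offset of the run of differing bytes currently open, if any
    offset = 0    # absolute offset of the current chunk
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if len(a) != len(b):
                raise ValueError("files differ in size")
            if a == b:
                # Fast path: an identical chunk closes any open run at its start.
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            else:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i))
                        start = None
            offset += len(a)
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

Running this over several pairs of downloads of the same video would show whether the largest `end` offset stays below whatever number of bytes is being skipped before hashing.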

@jaimeMF jaimeMF (Collaborator) commented May 14, 2015

Duplicate of #2885

@jaimeMF jaimeMF closed this May 14, 2015