
JSON contains full file name (and possibly other identifying metadata) and this cannot be disabled #25576

Closed
hunter0002 opened this issue Jun 7, 2020 · 3 comments

Comments

@hunter0002

hunter0002 commented Jun 7, 2020

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2020.06.06
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Description

The JSON file generated with --write-info-json always includes the full directory path of the video file. This can be a privacy problem if the JSON is, or has to be, made public, because it leaks the user's OS and often the name of his/her home directory. For example, this affects millions of videos on the Internet Archive, whose uploaders may not have been aware of the problem. It should be possible, either by default or via another flag, to generate a JSON file directly that contains a minimal amount of identifying information. See bibanon/tubeup#119.
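Until youtube-dl offers such an option, the written file can be post-processed. The sketch below is an illustration, not an existing youtube-dl feature. It assumes the local path is stored under the _filename key (the key identified later in this thread) and reduces that value to a bare file name. The helper names strip_local_paths and sanitize_info_json are hypothetical.

```python
import json
from pathlib import PureWindowsPath


def strip_local_paths(info: dict) -> dict:
    """Return a copy of an info dict whose "_filename" holds only the bare file name.

    Assumption: the local download path is stored under the "_filename" key.
    """
    cleaned = dict(info)  # shallow copy; the caller's dict is left untouched
    value = cleaned.get("_filename")
    if isinstance(value, str):
        # PureWindowsPath treats both "/" and "\\" as separators, so this drops
        # the directory part of POSIX and Windows paths alike.
        cleaned["_filename"] = PureWindowsPath(value).name
    return cleaned


def sanitize_info_json(path: str) -> None:
    """Hypothetical helper: rewrite a .info.json file in place with the path stripped."""
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_local_paths(info), f, ensure_ascii=False)
```

Reducing the value to a basename rather than deleting the key keeps the file name available to tools that pair the JSON with its media file.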

@vxbinaca
Contributor

vxbinaca commented Jun 7, 2020

this affects millions of videos on the Internet Archive

It's probably more like 400,000+.

He wants directory path removed from the JSON metadata.

@dstftw
Collaborator

dstftw commented Jun 7, 2020

No idea what you are even talking about; it does not include any directory names.

@brandongalbraith

brandongalbraith commented Jun 10, 2020

youtube-dl includes a _filename key whose value is the full local path of the downloaded file when the --write-info-json flag is used to write the video's metadata JSON. If you intend to distribute that JSON file, this can leak your local workstation username, along with any other sensitive information contained in the path you are downloading to. (In the case of https://github.com/bibanon/tubeup, we upload this file as part of a bundle for each video to create items in the Internet Archive.) The client's IP address is also included as a parameter in the URLs built to download the various formats from YouTube:

https://r4---sn-vgqsrnek.googlevideo.com/videoplayback?expire=1591769617&ei=sCXgXs6KOp6Dir4Pk4udyAk&ip=REDACTED&id=o-AOEwMs7i56Fm6oQh65ftjSdwajjLN1p5vL_Dr2cE1Jip&itag=140&source=youtube&requiressl=yes&mh=OV&mm=31%2C26&mn=sn-vgqsrnek%2Csn-5ualdnel&ms=au%2Conr&mv=m&mvi=3&pl=17&pcm2=no&initcwndbps=1916250&vprv=1&mime=audio%2Fmp4&gir=yes&clen=65725446&dur=4061.123&lmt=1575811774555276&mt=1591747896&fvip=4&keepalive=yes&c=WEB&txp=3531432&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cpcm2%2Cvprv%2Cmime%2Cgir%2Cclen%2Cdur%2Clmt&sig=AOq0QJ8wRQIgGZMMsj971PKlKCjqx34aYvhULs-gW-8qqXAqq1t0t2wCIQCwjEMDVHNRzB0Z30OWTl6aIsFYIlRh04RcOs1xurtE3w%3D%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AG3C_xAwRgIhAJX6TBtQ44tSsX9mhEK8zrT-hz4VZw4G46UUCOLIBunqAiEA9qsvhCMUJMxEmuEqzhnkeBYoWWSoF8AsyiVCWCteTks%3D&ratebypass=yes

@dstftw Would you accept a PR sanitizing this data? Or should I handle sanitizing on our end (bibanon/tubeup#119)?
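Sanitizing on the tubeup side could look roughly like the following sketch. It assumes the address only appears as an ip query parameter, as in the URL above; an address embedded anywhere else (for example in a URL path) would not be caught. The helper names redact_ip_param and redact_ips are hypothetical. Rewriting ip also invalidates the URL's signature, which matters little for an archived copy.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redact_ip_param(url: str, placeholder: str = "REDACTED") -> str:
    """Replace the value of any "ip" query parameter in url with placeholder."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if not any(key == "ip" for key, _ in pairs):
        return url  # nothing to redact; return the URL unchanged
    pairs = [(key, placeholder if key == "ip" else value) for key, value in pairs]
    # urlencode re-quotes every value, so existing escapes may be normalized.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def redact_ips(obj):
    """Recursively walk a parsed info.json value, redacting "ip" in http(s) URL strings."""
    if isinstance(obj, dict):
        return {key: redact_ips(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [redact_ips(item) for item in obj]
    if isinstance(obj, str) and obj.startswith(("http://", "https://")):
        return redact_ip_param(obj)
    return obj
```

The walk would be applied to the parsed JSON before re-serializing it, e.g. json.dump(redact_ips(info), f). A simpler alternative is to drop the per-format URL fields entirely, since these signed URLs expire anyway (see the expire parameter).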


4 participants