[Feature Request] Remove sensitive personal information from info.json files #42

Closed
DarkMahesvara opened this issue Feb 1, 2021 · 14 comments

Comments

@DarkMahesvara

Checklist

  • [ x ] I'm reporting a feature request
  • [ x ] I've verified that I'm running yt-dlp version 2021.01.29
  • [ x ] I've searched the bugtracker for similar feature requests including closed ones

Description

Would it be possible to remove sensitive personal information such as the IP address and OS username from the info.json file? The official YTDL doesn't seem to care.

more info on the problem:
ytdl-org/youtube-dl#25576
ytdl-org/youtube-dl#25681

@pukkandan
Member

The _filename entry can be easily stripped out, but the IP in the URL cannot, since the URL is useless without the IP. Removing it would break the functionality of --load-info-json.

I could provide a new switch to strip out identifying info, but again, this only really applies to YouTube videos, since I do not know what other identifying info could be present in other extractors' URLs. In my honest opinion, it is far better to create a small Python script, called using --exec, that removes the information you deem troublesome.

Let me know your thoughts, and whether or not you think an additional script will satisfy your needs.
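
A minimal sketch of what such a script might look like, for illustration only. It assumes the info.json path is passed as the first command-line argument and that the IP appears as an `ip=` query parameter in the stored URLs; both are assumptions, as are the names `DROP_KEYS`, `IP_PARAM`, `scrub`, and `main`. As noted above, redacting the IP leaves the stored URLs unusable, so the result can no longer be relied on with --load-info-json.

```python
import json
import re
import sys

# Keys to drop entirely (hypothetical list; extend as needed).
DROP_KEYS = {"_filename"}

# Assumption: the IP is carried as an "ip" query parameter in URLs.
IP_PARAM = re.compile(r"([?&]ip=)[^&#]*")


def scrub(value):
    """Recursively drop unwanted keys and redact ip= parameters in strings."""
    if isinstance(value, dict):
        return {k: scrub(v) for k, v in value.items() if k not in DROP_KEYS}
    if isinstance(value, list):
        return [scrub(v) for v in value]
    if isinstance(value, str):
        return IP_PARAM.sub(r"\1REDACTED", value)
    return value


def main(path):
    """Rewrite the info.json file at `path` in place with scrubbed contents."""
    with open(path, encoding="utf-8") as f:
        info = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(scrub(info), f)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

The regular expression only touches the value of the `ip` parameter, so the rest of each URL is left byte-for-byte as it was.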

@DarkMahesvara
Author

A Python script that can be used with --exec should be sufficient. It would be nice if such a script could be added to the Readme, with a disclaimer about it in the --write-info-json section.

@pukkandan
Member

Upon further investigation, it seems that your request is related to tubeup. From a quick read of the tubeup source code, it seems that tubeup already post-processes the json, so it should be easy to strip the data on their end. Am I wrong?

@DarkMahesvara
Author

Oh, I didn't know about that. They say that there isn't a fix yet, but here is a pull request for it.

@pukkandan
Member

I have pushed a patch to remove _filename, but like I said above, the IP cannot be removed from my end without sacrificing existing functionality. So I am closing this issue.

I will try to help the tubeup devs implement this on their side.

@Tzahi12345

I'd personally like to keep this information. Without it, my project simply won't work with this fork. Would it be possible to add an arg like --keep-sensitive-info or the inverse?

@pukkandan
Member

@Tzahi12345 I only removed the _filename field which (by python naming convention) is for internal use only anyway. Could you explain how/why you are using this field?

Note that this filename may not be the correct final filename, depending on the options used.

@Tzahi12345

@pukkandan I use it in my project YoutubeDL-Material to locate the file after it's been downloaded. It all happens in the backend code, so it would be difficult to find the file otherwise (especially if users have a custom output like %(id)s).

Note that this filename may not be the correct final filename, depending on the options used.

Yeah, this is a pain to deal with, but I basically just restrict the file type to mp3/mp4, so I simply strip out the bad extension from _filename and replace it with the right one.
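
The extension swap described above could be sketched roughly as follows. This is an illustration, not code from YoutubeDL-Material; the function name `guess_final_path` and its parameters are hypothetical, and it assumes the info dict still contains `_filename`.

```python
from pathlib import Path


def guess_final_path(info, final_ext):
    """Replace the extension recorded in `_filename` with the expected one."""
    # Accept both "mp3" and ".mp3" for convenience.
    suffix = "." + final_ext.lstrip(".")
    return Path(info["_filename"]).with_suffix(suffix)


# Example: a download recorded as .webm that was converted to mp3.
path = guess_final_path({"_filename": "downloads/abc123.webm"}, "mp3")
```

Only the last extension is replaced, so this works when the recorded name and the final name differ in nothing but the extension.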

@pukkandan
Member

@Tzahi12345 The correct way to get the filename after download is to either look at the infodict at the end (if using API) or to use --exec. Anyway, I'll see whether I can give an option to keep it in the infojson

@pukkandan
Member

@Tzahi12345 I have added an option, --no-clean-infojson, to prevent any fields from being filtered out of the infojson. Note that this doesn't just add back _filename, but also a few other fields, like requested_formats, requested_subtitles, etc., that are normally removed when creating the json.

@Tzahi12345

Awesome thank you so much @pukkandan! I look forward to the next release so I can finally support yt-dlp :)

@vxbinaca
Contributor

vxbinaca commented Jun 2, 2021

The other issue with stripping out the IP is that we can't test it or WHOIS it to see whether there are georestrictions the host hasn't reported. So I'll have a bunch of kids dumping channels from other supported sites, a supported site silently implements something to georestrict or throttle a region, country, or IP, and I have no way of seeing what's going on because they'll refuse to post their IP. Alternatively, you could implement a flag that removes the redaction, so I could force its use when submitting issues for my project.

Or people who use Tubeup could just get a VPS and not rip from their desktops.

@pukkandan
Member

@vxbinaca See my earlier comments

I have pushed a patch to remove _filename, but like I said above, the IP cannot be removed from my end without sacrificing existing functionality. So I am closing this issue.

I have added an option --no-clean-infojson to prevent any fields from being filtered out of the infojson. Note that this doesn't just add back _filename, but also a few other fields like requested_formats, requested_subtitles etc that are normally removed when creating the json

@toscompliantname

toscompliantname commented Sep 10, 2021

Or people who use Tubeup could just get a VPS and not rip from their desktops.

and by "get" he meant "rent" a virtual ""private"" server, just so that the IP the program spits out is your VPS provider's rather than your ISP's
note to self: don't post when drunk or angry

nixxo pushed a commit to nixxo/yt-dlp that referenced this issue Nov 22, 2021
Options: --clean-infojson, --no-clean-infojson

Related: yt-dlp#42 (comment)