Get YouTube data from API, not page scrape? #376
Comments
By using the YouTube API you are agreeing to its terms and conditions. See the "Terms" link in the top bar of this page: http://gdata.youtube.com/demo/index.html which, as of the time I'm writing this, takes you to: https://developers.google.com/youtube/terms I'm worried about point number 5, "Caching". The YouTube API must be avoided if possible, IMHO. We should follow the steps a web browser takes, to avoid legal problems as much as possible. Edit: see also points 9, 11, etc. in the prohibitions section.
@rg3: Fair enough and certainly acceptable. But, given that we're faking the user-agent for direct access, and that the data URL above doesn't require an API key or a specific client identification label, it'd be relatively hard for them to "tell" that youtube-dl is abusing the API (any more than they can reliably tell that youtube-dl is downloading videos). Still, I can understand the reticence.
The thing about the YouTube API terms and conditions is that they do not apply to the user of the application, but to the creator of the application. BTW, I know there are a couple of InfoExtractors that use it, but I only recently found out about the terms and conditions for the API. :(
I'll close this. I ended up creating a cheap metadata script which is innocent enough to include here:

for FILEPATH in *
do
    FILENAME=$(basename "$FILEPATH")
    EXTENSION="${FILENAME##*.}"
    YOUTUBE_ID="${FILENAME%.*}"
    # Skip the literal "*" the glob leaves behind when the directory is empty
    if [ "$YOUTUBE_ID" != "*" ]
    then
        DATA=$(curl -s "https://gdata.youtube.com/feeds/api/videos/$YOUTUBE_ID?v=2")
        # Trim fractional seconds from the timestamp (2012-03-28T09:20:15.000Z -> ...15Z)
        PUBLISHED=$(echo "$DATA" | php -r 'print simplexml_load_file("php://stdin")->published;' | sed 's/\..*Z/Z/')
        AUTHOR=$(echo "$DATA" | php -r 'print simplexml_load_file("php://stdin")->author->name;')
        TITLE=$(echo "$DATA" | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNamespaces(true); $m = $x->children($ns["media"]); print $m->group->title;')
        DESCRIPTION=$(echo "$DATA" | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNamespaces(true); $m = $x->children($ns["media"]); print $m->group->description;')
        COMMENT="http://www.youtube.com/watch?v=$YOUTUBE_ID"
        ALBUM=$AUTHOR
        # An optional first argument overrides author/album and prefixes the title
        if [ "$1" ]
        then
            TITLE="$AUTHOR: $TITLE"
            AUTHOR=$1
            ALBUM=$1
        fi
        AtomicParsley "$FILEPATH" --overWrite --podcastFlag true --artist "$AUTHOR" --title "$TITLE" --album "$ALBUM" --year "$PUBLISHED" --comment "$COMMENT" --description "$DESCRIPTION"
    fi
done
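The script above derives the video ID and extension from the filename using two POSIX parameter expansions. A minimal sketch of how they behave (the sample filename is illustrative, not from the original thread):

```shell
# ${name##*.} strips the longest prefix ending in "." (leaving the extension);
# ${name%.*}  strips the shortest suffix starting at the last "." (leaving the ID).
FILENAME="WN5IzJ99Qys.mp4"
EXTENSION="${FILENAME##*.}"
YOUTUBE_ID="${FILENAME%.*}"
echo "$YOUTUBE_ID $EXTENSION"   # prints: WN5IzJ99Qys mp4
```

Because `%.*` uses the shortest-suffix match, a dotted ID such as `a.b.mp4` still splits at the last dot, which is what the script relies on.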
I know this issue is a whopping 8 years old, but I noticed the Data API is still not being used, so I set out to find out why. Checking the current terms, I cannot find anything that would indicate that using it the way youtube-dl does is a violation. I might have missed something, but I think a reevaluation would be appropriate, as it would allow fixing a very long-standing issue (or at least provide a decent workaround).
youtube-dl will not use any kind of API that involves sharing personal keys, for obvious reasons.
G'day. I've been working on some local scripts to archive YouTube videos, and part of that is adding metadata to the mp4 files downloaded with youtube-dl. After noticing that the description was coming out truncated (caused by me missing lxml), I'm now noticing that youtube-dl only knows the day (YYYY-MM-DD) and not the time the video was uploaded. This is through no fault of the script, however, as there is actually no complete timestamp on the video page itself.
The complete timestamp DOES seem to appear in the API though:
https://gdata.youtube.com/feeds/api/videos/WN5IzJ99Qys?v=2
I'm not a Python coder, but I could probably figure out how to get this support in. The real question is: would you be OK with youtube-dl making an additional request per video URL? (If not, that's fine too, as I'd just write a custom API script to fetch the data and ignore the --write-info-json option.)
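For reference, the full timestamp lives in the Atom <published> element of that v2 feed. A hedged sketch of pulling it out with sed, using an inline sample entry since the gdata API has since been retired (the sample value below is an assumption modeled on the old Atom format, not real feed output):

```shell
# Extract the full upload timestamp from a gdata v2 Atom entry.
# DATA is a stand-in for what `curl -s https://gdata.youtube.com/feeds/api/videos/ID?v=2`
# used to return; the timestamp value here is a made-up example.
DATA='<entry xmlns="http://www.w3.org/2005/Atom"><published>2012-03-28T09:20:15.000Z</published></entry>'
PUBLISHED=$(echo "$DATA" | sed -n 's/.*<published>\([^<]*\)<\/published>.*/\1/p' | sed 's/\..*Z/Z/')
echo "$PUBLISHED"   # prints: 2012-03-28T09:20:15Z
```

Note this is the same trimming the metadata script above applies: the second sed drops the fractional seconds so AtomicParsley gets a clean ISO-8601 time. A real XML parser would be more robust than regex matching, but for a single well-known element a sed one-liner was evidently good enough here.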