Get YouTube data from API, not page scrape? #376

Closed
morbus opened this issue Jul 12, 2012 · 6 comments

Comments

@morbus morbus commented Jul 12, 2012

G'day. I've been working on some local scripts to archive YouTube videos, and part of that is adding in metadata to the downloaded mp4 files from youtube-dl. After noticing that the description was coming out truncated (caused by me missing lxml), I'm now noticing that youtube-dl only knows the day (YYYY-MM-DD) and not the time the video was uploaded. This is through no fault of the script, however, as there is actually no complete timestamp on the video page itself.

The complete timestamp DOES seem to appear in the API though:
https://gdata.youtube.com/feeds/api/videos/WN5IzJ99Qys?v=2

I'm not a Python coder, but I could probably figure out how to add this support. The real question is: would you be OK with youtube-dl making an additional request per video URL? (If not, that's fine too; I'd just write a custom API script to fetch the data and ignore the --write-info-json option.)
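
For illustration only, here is a minimal Python sketch of reading the full upload timestamp from an Atom entry like the one the API URL above returns. It makes no network request: `SAMPLE_ENTRY` is a hypothetical, heavily abbreviated entry with made-up values, and `parse_published` is a name invented for this sketch, not part of youtube-dl. It assumes the timestamp sits in an Atom `published` element in the form `YYYY-MM-DDTHH:MM:SS.fffZ`.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Standard Atom namespace; the feed entry is assumed to be an Atom <entry>.
ATOM_NS = "http://www.w3.org/2005/Atom"

# Hypothetical, abbreviated entry. The ID and all values are made up.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:youtube.com,2008:video:EXAMPLE_ID</id>
  <published>2012-07-10T15:30:45.000Z</published>
  <title>Example title</title>
</entry>
"""


def parse_published(xml_text):
    """Return the entry's <published> value as a timezone-aware UTC datetime."""
    root = ET.fromstring(xml_text)
    node = root.find(f"{{{ATOM_NS}}}published")
    if node is None or not node.text:
        raise ValueError("entry has no <published> element")
    # The trailing "Z" means UTC; strptime yields a naive datetime, so attach UTC.
    naive = datetime.strptime(node.text.strip(), "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(parse_published(SAMPLE_ENTRY).isoformat())
```

In real use the XML text would come from the extra per-video request that the question above is about.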

Collaborator

@rg3 rg3 commented Jul 12, 2012

By using the YouTube API you are agreeing to its terms and conditions. See the "Terms" link from the top bar at this page:

http://gdata.youtube.com/demo/index.html

Which, as of the time I'm writing this, takes you to:

https://developers.google.com/youtube/terms

I'm worried about point number 5, "Caching". The YouTube API must be avoided if possible, IMHO. We should follow the steps a web browser does to avoid legal problems as much as possible.

Edit: also, in the prohibitions section, points 9, 11, etc.

Author

@morbus morbus commented Jul 12, 2012

@rg3: Fair enough and certainly acceptable. But, given that we're faking the user-agent for direct access, and that the data URL above doesn't require an API key or a specific client identification label, it'd be relatively hard for them to "tell" that youtube-dl is abusing the API (any more than they can reliably tell that youtube-dl is downloading videos).

Still, I can understand the reticence.

Collaborator

@rg3 rg3 commented Jul 15, 2012

The thing about the YouTube API terms and conditions is that they do not apply to the user of the application, but to the creator of the application.

BTW, I know there are a couple of InfoExtractors that use it, but I only recently found out about the terms and conditions for the API. :(

Author

@morbus morbus commented Jul 15, 2012

I'll close this. I ended up creating a cheap metadata-script which is innocent enough to include herein:

# Tag every file in the current directory; the file name (minus extension) is used as the YouTube ID.
for FILEPATH in *
do
  FILENAME=$(basename "$FILEPATH")
  EXTENSION="${FILENAME##*.}"
  YOUTUBE_ID="${FILENAME%.*}"

  # Skip the literal "*" that the glob leaves behind in an empty directory.
  if [ "$YOUTUBE_ID" != "*" ]
  then
    # Quote the URL so the shell does not treat "?" as a glob character.
    DATA=$(curl -s "https://gdata.youtube.com/feeds/api/videos/$YOUTUBE_ID?v=2")
    # Quote "$DATA" so the XML reaches PHP with its whitespace intact; sed drops fractional seconds.
    PUBLISHED=$(echo "$DATA" | php -r 'print simplexml_load_file("php://stdin")->published;' | sed 's/\..*Z/Z/')
    AUTHOR=$(echo "$DATA" | php -r 'print simplexml_load_file("php://stdin")->author->name;')
    TITLE=$(echo "$DATA" | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNameSpaces(true); $m = $x->children($ns["media"]); print $m->group->title;')
    DESCRIPTION=$(echo "$DATA" | php -r '$x = simplexml_load_file("php://stdin"); $ns = $x->getNameSpaces(true); $m = $x->children($ns["media"]); print $m->group->description;')
    COMMENT="http://www.youtube.com/watch?v=$YOUTUBE_ID"
    ALBUM="$AUTHOR"

    # Optional first argument: use it as artist and album, and move the uploader into the title.
    if [ "$1" ]
    then
      TITLE="$AUTHOR: $TITLE"
      AUTHOR="$1"
      ALBUM="$1"
    fi

    AtomicParsley "$FILEPATH" --overWrite --podcastFlag true --artist "$AUTHOR" --title "$TITLE" --album "$ALBUM" --year "$PUBLISHED" --comment "$COMMENT" --description "$DESCRIPTION"
  fi
done
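
For readers more at home in Python than in shell plus PHP, the field extraction in the script above could be sketched roughly as follows. This is an illustration under stated assumptions, not a drop-in replacement: `build_tags` and `SAMPLE_ENTRY` are names invented here, the sample values are made up, and the element layout (Atom `published` and `author/name`, Media RSS `media:group` with `media:title` and `media:description`) is simply what the script's PHP one-liners appear to read. It does not fetch anything or call AtomicParsley.

```python
import re
import xml.etree.ElementTree as ET

# Namespace prefixes assumed from the script: Atom for the entry, Media RSS for media:group.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
}

# Hypothetical, abbreviated entry. All values are made up.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/">
  <published>2012-07-10T15:30:45.000Z</published>
  <author><name>ExampleUploader</name></author>
  <media:group>
    <media:title>Example title</media:title>
    <media:description>Example description.</media:description>
  </media:group>
</entry>
"""


def build_tags(xml_text, video_id, override=None):
    """Collect the same tag values the shell script passes to AtomicParsley."""
    entry = ET.fromstring(xml_text)
    published = entry.findtext("atom:published", default="", namespaces=NS)
    author = entry.findtext("atom:author/atom:name", default="", namespaces=NS)
    title = entry.findtext("media:group/media:title", default="", namespaces=NS)
    description = entry.findtext(
        "media:group/media:description", default="", namespaces=NS
    )

    # Same effect as the script's sed 's/\..*Z/Z/': drop the fractional seconds.
    year = re.sub(r"\..*Z", "Z", published, count=1)

    artist = album = author
    if override:
        # Mirrors the script's optional "$1": the uploader moves into the title.
        title = f"{author}: {title}"
        artist = album = override

    return {
        "artist": artist,
        "title": title,
        "album": album,
        "year": year,
        "comment": f"http://www.youtube.com/watch?v={video_id}",
        "description": description,
    }


if __name__ == "__main__":
    for key, value in build_tags(SAMPLE_ENTRY, "EXAMPLE_ID").items():
        print(f"{key}: {value}")
```

Parsing the entry once and reading every field from the same tree avoids re-parsing the XML per field, which the shell version does with four separate PHP invocations.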
@morbus morbus closed this Jul 15, 2012

@BrainStone BrainStone commented Feb 7, 2020

I know this issue is a whopping 8 years old, but I noticed the Data API is still not being used so I set out to find out why.
And it seems this issue gives the answer.

Checking the current terms, I cannot find anything indicating that using the API the way youtube-dl works would be a violation. I might have missed something, but I think a reevaluation would be appropriate, as it would allow fixing a very long-standing issue (or at least provide a decent workaround).

Collaborator

@dstftw dstftw commented Feb 7, 2020

youtube-dl will not use any kind of API that involves sharing personal keys, for obvious reasons.

@ytdl-org ytdl-org locked and limited conversation to collaborators Feb 7, 2020