Facebook: unable to access the actual title of the video #14156

Open
jollino opened this issue Sep 9, 2017 · 1 comment


@jollino jollino commented Sep 9, 2017

Please follow the guide below

  • You will be asked some questions and requested to provide some information, please read them carefully and answer honestly
  • Put an x into all the boxes [ ] relevant to your issue (like this: [x])
  • Use the Preview tab to see what your issue will actually look like

Make sure you are using the latest version: run youtube-dl --version and ensure your version is 2017.09.02. If it's not, read this FAQ entry and update. Issues with outdated version will be rejected.

  • I've verified and I assure that I'm running youtube-dl 2017.09.02

Before submitting an issue make sure you have:

  • At least skimmed through the README, most notably the FAQ and BUGS sections
  • Searched the bugtracker for similar issues including closed ones

What is the purpose of your issue?

  • Bug report (encountered problems with youtube-dl)
  • Site support request (request for adding support for a new site)
  • Feature request (request for a new functionality)
  • Question
  • Other

The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your issue


If the purpose of this issue is a bug report, site support request or you are not completely sure provide the full verbose output as follows:

Add the -v flag to your command line you run youtube-dl with (youtube-dl -v <your command line>), copy the whole output and insert it here. It should look similar to one below (replace it with your log inserted between triple ```):

[debug] System config: []
[debug] User config: ['--retries', '99']
[debug] Custom config: []
[debug] Command-line args: ['--verbose', 'https://www.facebook.com/cclarinascita/videos/vl.1382797855382682/415442948635545/']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2017.09.02
[debug] Python version 3.6.2 - Darwin-16.7.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 3.3.3, ffprobe 3.3.3, rtmpdump 2.4
[debug] Proxy map: {}
[facebook] 415442948635545: Downloading webpage
[debug] Default format spec: bestvideo+bestaudio/best
[debug] Invoking downloader on 'https://video-mxp1-1.xx.fbcdn.net/v/t42.1790-2/11163850_415443371968836_1361671976_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InNkIn0%3D&oh=030b09815fc1303f203f9541d321ff85&oe=59B40912'
[download] Promo della prima stagione di Camera Café  #CameraCafé #LaRinascita-415442948635545.mp4 has already been downloaded
[download] 100% of 808.89KiB
<end of log>

Description of your issue, suggested solution and other information

It looks like youtube-dl is unable to get the title of a video, and always uses the description as a title. It's actually somewhat rare for videos to have an actual title, but it would be useful for youtube-dl to use it when it is available.

For reference I'm using https://www.facebook.com/cclarinascita/videos/vl.1382797855382682/415442948635545/ but it seems to happen consistently.

The problem, I believe, is that the actual title is only shown if the video is opened from a video list page, not if the URL is accessed directly, so it's hard to spot. Moreover, most videos don't even have one; in that case the first part of the description is effectively used as the title.

Compare, for instance, opening the aforementioned video URL directly versus opening it from https://www.facebook.com/cclarinascita/videos/ (1st video of the 3rd playlist from the top, named "Camera Café - I stagione - Ep. 0-49").

At least on my end, the direct url just opens a simple video page showing no title, whereas opening it from the list shows a different interface with a bigger video on the left, and a right sidebar with the title in black, the description, and more videos from the same playlist ("Up next").

I ran the --write-info-json option, and it really looks like the title is not found at all:

{
   "id":"415442948635545",
   "title":"Promo della prima stagione di Camera Caf\u00e9  #CameraCaf\u00e9 #LaRinascita",
   "formats":[
      {
         "format_id":"progressive_sd_src",
         "url":"https://video-mxp1-1.xx.fbcdn.net/v/t42.1790-2/11163850_415443371968836_1361671976_n.mp4?efg=eyJybHIiOjM5OSwicmxhIjo1MTIsInZlbmNvZGVfdGFnIjoic2QifQ%3D%3D&rl=399&vabr=222&oh=030b09815fc1303f203f9541d321ff85&oe=59B40912",
         "preference":-10,
         "ext":"mp4",
         "format":"progressive_sd_src - unknown",
         "protocol":"https",
         "http_headers":{
            "User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)",
            "Accept-Charset":"ISO-8859-1,utf-8;q=0.7,*;q=0.7",
            "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Encoding":"gzip, deflate",
            "Accept-Language":"en-us,en;q=0.5"
         }
      },
      {
         "format_id":"progressive_sd_src_no_ratelimit",
         "url":"https://video-mxp1-1.xx.fbcdn.net/v/t42.1790-2/11163850_415443371968836_1361671976_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InNkIn0%3D&oh=030b09815fc1303f203f9541d321ff85&oe=59B40912",
         "preference":-10,
         "ext":"mp4",
         "format":"progressive_sd_src_no_ratelimit - unknown",
         "protocol":"https",
         "http_headers":{
            "User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)",
            "Accept-Charset":"ISO-8859-1,utf-8;q=0.7,*;q=0.7",
            "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
            "Accept-Encoding":"gzip, deflate",
            "Accept-Language":"en-us,en;q=0.5"
         }
      }
   ],
   "uploader":"Camera Caf\u00e9 - La Rinascita",
   "timestamp":1429881130,
   "extractor":"facebook",
   "webpage_url":"https://www.facebook.com/cclarinascita/videos/vl.1382797855382682/415442948635545/?type=1",
   "webpage_url_basename":"415442948635545",
   "extractor_key":"Facebook",
   "playlist":null,
   "playlist_index":null,
   "display_id":"415442948635545",
   "upload_date":"20150424",
   "format_id":"progressive_sd_src_no_ratelimit",
   "url":"https://video-mxp1-1.xx.fbcdn.net/v/t42.1790-2/11163850_415443371968836_1361671976_n.mp4?efg=eyJ2ZW5jb2RlX3RhZyI6InNkIn0%3D&oh=030b09815fc1303f203f9541d321ff85&oe=59B40912",
   "preference":-10,
   "ext":"mp4",
   "format":"progressive_sd_src_no_ratelimit - unknown",
   "protocol":"https",
   "http_headers":{
      "User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/47.0 (Chrome)",
      "Accept-Charset":"ISO-8859-1,utf-8;q=0.7,*;q=0.7",
      "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
      "Accept-Encoding":"gzip, deflate",
      "Accept-Language":"en-us,en;q=0.5"
   },
   "fulltitle":"Promo della prima stagione di Camera Caf\u00e9  #CameraCaf\u00e9 #LaRinascita",
   "_filename":"Promo della prima stagione di Camera Caf\u00e9  #CameraCaf\u00e9 #LaRinascita-415442948635545.mp4"
}

I dug a bit into the html of https://www.facebook.com/cclarinascita/videos/vl.1382797855382682/415442948635545 (which, again, does not show the title if it's accessed directly) and discovered that the title is actually injected into the page, but it's only found inside two commented links to the video itself, in two different parts of the page:

<a class="_2za_" href="https://www.facebook.com/cclarinascita/videos"><span class="_50f7">Camera Café - I stagione - Ep. 0 - Sketch Promo</span></a>

and

<a data-onclick="[[&quot;TahoeController&quot;,&quot;openFromVideoLinkHelper&quot;,&#123;&quot;__elem&quot;:1&#125;,&quot;unknown&quot;]]" class="async_saving _400z _2-40 _5pcq" href="/cclarinascita/videos/415442948635545/" aria-label="Camera Caf&#xe9; - I stagione - Ep. 0 - Sketch Promo" data-video-channel-id="387437888106301:415442948635545" data-channel-caller="channel_view_from_unknown" ajaxify="#" rel="async" target="">

I'm not sure how reliable this would be given that it's commented out (would a parser even be able to access it, by default?), but I suppose it's the only silver lining here.
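On the question of whether commented-out markup is reachable at all: a regex-based search works on the raw page text, so HTML comments are no obstacle. Below is a minimal sketch, assuming the title sits in an `aria-label` attribute on a link to `/videos/<id>/` as in the second snippet above. The function name `title_from_aria_label` and the sample fragment are hypothetical, and this is not youtube-dl's actual code.

```python
import html
import re


def title_from_aria_label(webpage, video_id):
    """Return the aria-label of the first <a> tag linking to the video, or None.

    Works on raw page text, so it also matches markup inside <!-- ... -->.
    """
    # Find every <a ...> opening tag, commented out or not.
    for tag in re.findall(r'<a\s[^>]*>', webpage):
        # Keep only tags whose href points at this video.
        if not re.search(r'href="[^"]*/videos/%s/?"' % re.escape(video_id), tag):
            continue
        label = re.search(r'aria-label="([^"]*)"', tag)
        if label:
            # Decode entities such as &#xe9; into real characters.
            return html.unescape(label.group(1))
    return None


# Hypothetical page fragment modelled on the markup quoted above.
sample = (
    '<code><!-- <a class="_5pcq" href="/cclarinascita/videos/415442948635545/" '
    'aria-label="Camera Caf&#xe9; - I stagione - Ep. 0 - Sketch Promo" rel="async"> --></code>'
)
print(title_from_aria_label(sample, '415442948635545'))
```

Matching on the `href` rather than on the obfuscated class names is a deliberate choice here, since class names like `_5pcq` are likely to change more often than the URL structure.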

Also, it appears that changing the ?type= parameter in the query string has absolutely no effect; the same goes for removing the playlist part of the URL (the intermediate /vl.1382797855382682/ in this case).


@barsnick barsnick commented Jan 3, 2019

This has been annoying me for quite some time as well. The Facebook users (i.e. content providers) give their videos actual titles as captions, and youtube-dl extracts the text "below".

Another random example:
https://www.facebook.com/markberubemusic/videos/vb.350276931674911/315046592680474/?type=2&theater

The caption says:

This week in Switzerland...

13.12.18 - Basel - Kaserne / 15.12.18 - Bern - Dachstock * opening for Sophie Hunger

youtube-dl extracts:

13.12.18 - Basel - Kaserne / 15.12.18 - Bern - Dachstock * opening for Sophie...

whereas I would consider the title to be:

This week in Switzerland...

I can give tons of other examples.

I found this "proper" title in Facebook's current HTML code within this construct:
<title id="pageTitle">... | Facebook</title>

I'll post a pull request adding this regex to youtube-dl's title extraction. The resulting titles match my expectations much better. It does change almost every title (including those in all the tests). It also changes group videos' titles from "[group name] has 481 members" to "[group name] Public Group". I don't see this side effect as all that bad, though.

If you don't want to wait for the pull request, here's the regex:

r'(?s)<title id="pageTitle"[^>]*>([^<]*)(?: \| Facebook)</title>'

or the code:

        if not video_title:
            video_title = self._html_search_regex(
                r'(?s)<title id="pageTitle"[^>]*>([^<]*)(?: \| Facebook)</title>',
                webpage, 'title', default=None)
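
To try the pattern outside youtube-dl, here is a minimal standalone sketch using only `re`. The sample markup and the helper name `page_title` are made up for illustration, and plain `re.search` stands in for `_html_search_regex`, which as far as I know also cleans up the matched HTML.

```python
import re

# Pattern from the comment above: capture everything before " | Facebook".
PAGE_TITLE_RE = r'(?s)<title id="pageTitle"[^>]*>([^<]*)(?: \| Facebook)</title>'


def page_title(webpage):
    """Return the page title without the ' | Facebook' suffix, or None."""
    match = re.search(PAGE_TITLE_RE, webpage)
    return match.group(1).strip() if match else None


# Hypothetical markup; real Facebook pages may differ.
sample = '<head><title id="pageTitle">Example video title | Facebook</title></head>'
print(page_title(sample))
```

Note that the " | Facebook" suffix is mandatory in this pattern, so a page whose title lacks it yields no match and the extractor would fall through to its other title sources.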