Facebook info extractor (v2) #81

imrehg · 2011-02-20T12:18:12Z

Sorry for taking so long to answer. I've been a bit busy this week. The code looks good and no doubt I'll merge it as long as you commit to support the InfoExtractor. It seems to integrate very well with the existing code. I'll make a few comments so you can address possible issues.

Sure, I take responsibility for it. :) As for the issues you mentioned:

It prints "Facebook" with a capital F. Normally we print the name all in lowercase, so you could change that if you are going to make more changes. It's a minor detail.

Fixed.

Second, there is a block of code in the InfoExtractor indented with 4 spaces instead of tabs. You could probably use tabs to be consistent with the rest of the code.

Took a little time to find it but now fixed. Not sure what went wrong, will keep an eye out next time.

In that same loop, line 2394, you use the word "format" as a variable name. "format" is a reserved word in some Python versions and, while you can assign another value to it, it's dangerous because we could add code that uses it later and it wouldn't work.

Sure, fixed, I don't think there's any good reason to cause conflicts, I didn't know it was reserved previously.

Finally, I don't mind importing more modules if they allow for doing something easily, as long as the modules are available in Python 2.5. My worry here is that the function you're using, unicode_escape_decode, is not mentioned in the documentation. Could you investigate this a little bit? If you explain me what it's supposed to do with an example, maybe we can code a safer version of that routine.

I think it should be safe, tested with Python 2.5.5, and it works just the same. I used something like:

import codecs
assert codecs.unicode_escape_decode('\x61\x62\x63') == (u'abc', 3)

The webpage originally downloaded into an 'str' type, with the URL we are interested in encoded in unicode as "https\u00253A\u00252F\u00252F", which should be turned into "https://". Let's say:

>>> s = "https\u00253A\u00252F\u00252F"
>>> s.decode('utf-8')
u'https\\u00253A\\u00252F\\u00252F'
>>> import codecs
>>> codecs.unicode_escape_decode(s)
(u'https%3A%2F%2F', 29)

Here the last line contains quoted format that we can now convert back to the right string to use. This whole unicode thing is pretty confusing, but I'm trying to learn more. Any thoughts?

Let me know if there's anything else I missed. :)
Cheers!

This IE should be full-featured. Public videos can be downloaded without login, e.g: https://www.facebook.com/video/video.php?v=696729990595 Private videos need login, and subject to login rate limit of a couple of tries / minute.

rg3 · 2011-02-20T14:06:34Z

In your example, I think

s.decode('unicode_escape')

Should produce the same results. I'll review it again later and, if no problems are found, I'll merge it.

imrehg · 2011-02-20T16:00:05Z

That seems to be right. Awesome, I remove the module and send another pull request. Sorry to give you so much work. Keep them suggestions coming if there's any other unreasonable code.

[pull] master from rg3:master

Facebook info extractor

3698428

This IE should be full-featured. Public videos can be downloaded without login, e.g: https://www.facebook.com/video/video.php?v=696729990595 Private videos need login, and subject to login rate limit of a couple of tries / minute.

sirinath mentioned this pull request Nov 30, 2013

Overheads - Protocol, Request, JSON, Page Download, total, percentages, etc. #1862

Closed

sirinath mentioned this pull request Dec 27, 2013

2013.12.23.4 Redownloads Some Part and Completed Downloads #2047

Closed

FelixChery mentioned this pull request Apr 24, 2019

HTTP Error 503 #20785

Closed

joedborg referenced this pull request in joedborg/youtube-dl Nov 17, 2020

Merge pull request #81 from rg3/master

2d29fbf

[pull] master from rg3:master

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Facebook info extractor (v2) #81

Facebook info extractor (v2) #81

imrehg commented Feb 20, 2011

rg3 commented Feb 20, 2011

imrehg commented Feb 20, 2011

Facebook info extractor (v2) #81

Facebook info extractor (v2) #81

Conversation

imrehg commented Feb 20, 2011

rg3 commented Feb 20, 2011

imrehg commented Feb 20, 2011