Create a function for pattern-matching and error checking? #847
Comments
|
Go ahead and start changing, this looks fine to me. Although I'd really prefer HTML parsing + xpaths for a lot of these checks. |
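For comparison, here is a rough sketch of what the "HTML parsing + xpaths" style of check could look like using only the standard library. The sample markup and variable names are made up for illustration, and it assumes the page is well-formed XML, which real pages often are not:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (an assumption of this sketch).
page = """<html><head>
<title>Some video</title>
<meta name="description" content="A short clip" />
</head><body></body></html>"""

root = ET.fromstring(page)

# ElementTree supports a limited XPath subset, enough for simple lookups.
title = root.findtext('.//title')
meta = root.find(".//meta[@name='description']")
description = meta.get('content') if meta is not None else None

print(title)
print(description)
```

A missing element shows up as `None` here, so the fatal-or-warning decision would still have to be made by the caller.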
|
Eh, me too, I'm a huge BeautifulSoup fan, but I fear that without
Filippo Valsorda
2013/5/20 Philipp Hagemeister notifications@github.com
|
|
I would personally return the |
|
Umh... there would not be much gain, then, as you have to check if the mobj
My idea is that of supporting only a single return value, and to take any
Filippo Valsorda
2013/5/20 Jaime Marquínez Ferrándiz notifications@github.com
|
|
You're right, I was only thinking of fatal errors. |
|
Speaking of errors, @jaimeMF and I thought we should discuss which extraction failures should be fatal and which should only issue warnings. My take is:
|
|
So this would be the function:

    import re

    # Written as an extractor method, since it reports through self._downloader.
    def search_regex(self, pattern, string, name, fatal=True, flags=0):
        mobj = re.search(pattern, string, flags)
        if mobj is None and fatal:
            raise ExtractorError(u'Unable to extract %s; '
                                 u'please report this issue on GitHub.' % name)
        elif mobj is None:
            self._downloader.report_warning(u'unable to extract %s; '
                                            u'please report this issue on GitHub.' % name)
            return None
        else:
            # return the first matched group
            return next(g for g in mobj.groups() if g is not None)
|
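To make the fatal/warning split concrete, here is a minimal standalone sketch of how call sites might use such a function. The `ExtractorError` class and the `warn` callback are stand-ins invented for this example (the real code would use the project's own exception and `self._downloader.report_warning`). Treating the title as fatal and the description as optional is only an illustrative choice, not something decided in this thread:

```python
import re


class ExtractorError(Exception):
    """Stand-in for the project's ExtractorError (assumption of this sketch)."""


def search_regex(pattern, string, name, fatal=True, flags=0, warn=print):
    """Return the first non-None group, or raise / warn when nothing matches."""
    mobj = re.search(pattern, string, flags)
    if mobj is None:
        message = 'unable to extract %s; please report this issue on GitHub.' % name
        if fatal:
            raise ExtractorError(message)
        warn(message)  # hypothetical stand-in for report_warning
        return None
    # return the first matched group
    return next(g for g in mobj.groups() if g is not None)


# Hypothetical sample page: it has a title but no description.
webpage = '<title>Some video</title><meta name="uploader" content="someone">'

# Illustrative choice: without a title the result is useless, so fail hard.
title = search_regex(r'<title>([^<]+)</title>', webpage, 'title')

# Illustrative choice: a missing description only deserves a warning.
warnings = []
description = search_regex(r'<meta name="description" content="([^"]+)"',
                           webpage, 'description',
                           fatal=False, warn=warnings.append)
```

With this shape the caller never touches the match object: it gets a string, gets `None` plus a warning, or gets an exception.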
|
Just one note, some IEs assign |
|
|
Ok, great, Anna is in the process of deploying the function, I'll have her
However, I feel that in output filenames we should represent unset or
While we are at it, @phihag have you expressed your preference on which
On Wednesday, May 29, 2013, Philipp Hagemeister wrote:
Filippo Valsorda |
|
My latest iteration of the function, including a default value and support for multiple fallback patterns:

    import os
    import re
    import sys

    # The re module has no RegexObject attribute; derive the compiled-pattern type instead.
    compiled_regex_type = type(re.compile(''))

    # Written as an extractor method, since it reports through self._downloader.
    def search_regex(self, pattern, string, name, default=None, fatal=True, flags=0):
        """
        Perform a regex search on the given string, using a single pattern or a
        list of patterns, and return the first matching group.
        On failure, return the default value if one is given; otherwise raise an
        ExtractorError or issue a warning, depending on fatal, naming the field.
        """
        if isinstance(pattern, (type(''), type(u''), compiled_regex_type)):
            mobj = re.search(pattern, string, flags)
        else:
            mobj = None  # stays None when the pattern list is empty
            for p in pattern:
                mobj = re.search(p, string, flags)
                if mobj:
                    break
        # colour the field name on terminals that understand ANSI escapes
        if sys.stderr.isatty() and os.name != 'nt':
            _name = u'\033[0;34m%s\033[0m' % name
        else:
            _name = name
        if mobj:
            # return the first matching group
            return next(g for g in mobj.groups() if g is not None)
        elif default is not None:
            return default
        elif fatal:
            raise ExtractorError(u'Unable to extract %s; '
                                 u'please report this issue on GitHub.' % _name)
        else:
            self._downloader.report_warning(u'unable to extract %s; '
                                            u'please report this issue on GitHub.' % _name)
            return None
|
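The fallback-pattern and default-value parts can be tried in isolation. The following is a reduced sketch, not the proposed method itself: `first_group`, the sample page and the patterns are hypothetical names made up for this example, and error reporting is left out entirely:

```python
import re


def first_group(patterns, string, default=None, flags=0):
    """Try each pattern in order; return the first non-None group of the first match."""
    if isinstance(patterns, str):
        patterns = [patterns]  # accept a single pattern as well as a list
    for p in patterns:
        mobj = re.search(p, string, flags)
        if mobj:
            return next(g for g in mobj.groups() if g is not None)
    return default


# Hypothetical page that only carries an og:title tag.
page = '<meta property="og:title" content="Clip 42">'

title_patterns = [
    r'<title>([^<]+)</title>',                 # preferred source, absent here
    r'property="og:title" content="([^"]+)"',  # fallback
]

title = first_group(title_patterns, page)                          # 'Clip 42'
heading = first_group(r'<h1>([^<]+)</h1>', page, default='untitled')  # 'untitled'
```

Note that in the function above a non-`None` default takes precedence over `fatal`, so passing a default effectively makes the field optional.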
We have two common base patterns:
So I was thinking about a helper function like
If you like it I can put up a PR.