Merge branch 'master' of https://github.com/yt-dlp/yt-dlp into ytdlp
* 'master' of https://github.com/yt-dlp/yt-dlp:
  [extractor/jwplatform] Look for `data-video-jw-id`
  [cleanup] Misc fixes (see desc)
Lesmiscore committed Jun 12, 2022
2 parents 13af23f + 55baa67 commit 17ae2e7
Showing 10 changed files with 73 additions and 59 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -41,7 +41,7 @@ PYTHON ?= /usr/bin/env python3
SYSCONFDIR = $(shell if [ $(PREFIX) = /usr -o $(PREFIX) = /usr/local ]; then echo /etc; else echo $(PREFIX)/etc; fi)

# set markdown input format to "markdown-smart" for pandoc version 2 and to "markdown" for pandoc prior to version 2
MARKDOWN = $(shell if [ "$(pandoc -v | head -n1 | cut -d" " -f2 | head -c1)" = "2" ]; then echo markdown-smart; else echo markdown; fi)
MARKDOWN = $(shell if [ `pandoc -v | head -n1 | cut -d" " -f2 | head -c1` = "2" ]; then echo markdown-smart; else echo markdown; fi)

# it won't run in BSD install!
# you should install GNU coreutils and replace these install command with ginstall, if needed
30 changes: 10 additions & 20 deletions README.md
@@ -409,8 +409,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
--list-extractors List all supported extractors and exit
--extractor-descriptions Output descriptions of all supported
extractors and exit
--force-generic-extractor Force extraction to use the generic
extractor
--force-generic-extractor Force extraction to use the generic extractor
--default-search PREFIX Use this prefix for unqualified URLs. Eg:
"gvsearch2:python" downloads two videos from
google videos for the search term "python".
@@ -469,8 +468,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
aliases; so be carefull to avoid defining
recursive options. As a safety measure, each
alias may be triggered a maximum of 100
times. This option can be used multiple
times
times. This option can be used multiple times

## Network Options:
--proxy URL Use the specified HTTP/HTTPS/SOCKS proxy. To
@@ -497,8 +495,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
explicitly provided two-letter ISO 3166-2
country code
--geo-bypass-ip-block IP_BLOCK Force bypass geographic restriction with
explicitly provided IP block in CIDR
notation
explicitly provided IP block in CIDR notation

## Video Selection:
--playlist-start NUMBER Playlist video to start at (default is 1)
@@ -708,8 +705,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
modification time (default)
--no-mtime Do not use the Last-modified header to set
the file modification time
--write-description Write video description to a .description
file
--write-description Write video description to a .description file
--no-write-description Do not write video description (default)
--write-info-json Write video metadata to a .info.json file
(this may contain personal information)
@@ -731,8 +727,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
extraction is known to be quick (Alias:
--no-get-comments)
--load-info-json FILE JSON file containing the video information
(created with the "--write-info-json"
option)
(created with the "--write-info-json" option)
--cookies FILE Netscape formatted file to read cookies from
and dump cookie jar in
--no-cookies Do not read/dump cookies from/to file
@@ -748,8 +743,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
for decrypting Chromium cookies on Linux can
be (optionally) specified after the browser
name separated by a "+". Currently supported
keyrings are: basictext, gnomekeyring,
kwallet
keyrings are: basictext, gnomekeyring, kwallet
--no-cookies-from-browser Do not load cookies from browser (default)
--cache-dir DIR Location in the filesystem where youtube-dl
can store some downloaded information (such
@@ -761,8 +755,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi

## Thumbnail Options:
--write-thumbnail Write thumbnail image to disk
--no-write-thumbnail Do not write thumbnail image to disk
(default)
--no-write-thumbnail Do not write thumbnail image to disk (default)
--write-all-thumbnails Write all thumbnail image formats to disk
--list-thumbnails List available thumbnails of each video.
Simulate unless --no-simulate is used
@@ -1048,8 +1041,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
otherwise), force (try fixing even if file
already exists)
--ffmpeg-location PATH Location of the ffmpeg binary; either the
path to the binary or its containing
directory
path to the binary or its containing directory
--exec [WHEN:]CMD Execute a command, optionally prefixed with
when to execute it (after_move if
unspecified), separated by a ":". Supported
@@ -1076,8 +1068,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
be used with "--paths" and "--output" to set
the output filename for the split files. See
"OUTPUT TEMPLATE" for details
--no-split-chapters Do not split video based on chapters
(default)
--no-split-chapters Do not split video based on chapters (default)
--remove-chapters REGEX Remove chapters whose title matches the
given regular expression. The syntax is the
same as --download-sections. This option can
@@ -1108,8 +1099,7 @@ You can also fork the project on github and run your fork's [build workflow](.gi
(after downloading and processing all
formats of a video), or "playlist" (at end
of playlist). This option can be used
multiple times to add different
postprocessors
multiple times to add different postprocessors

## SponsorBlock Options:
Make chapter entries for, or remove various segments (sponsor,
8 changes: 8 additions & 0 deletions devscripts/make_readme.py
@@ -11,6 +11,7 @@
OPTIONS_START = 'General Options:'
OPTIONS_END = 'CONFIGURATION'
EPILOG_START = 'See full documentation'
ALLOWED_OVERSHOOT = 2

DISABLE_PATCH = object()

@@ -28,6 +29,7 @@ def apply_patch(text, patch):

options = take_section(sys.stdin.read(), f'\n {OPTIONS_START}', f'\n{EPILOG_START}', shift=1)

max_width = max(map(len, options.split('\n')))
switch_col_width = len(re.search(r'(?m)^\s{5,}', options).group())
delim = f'\n{" " * switch_col_width}'

@@ -44,6 +46,12 @@ def apply_patch(text, patch):
rf'(?m)({delim}\S+)+$',
lambda mobj: ''.join((delim, mobj.group(0).replace(delim, '')))
),
( # Allow overshooting last line
rf'(?m)^(?P<prev>.+)${delim}(?P<current>.+)$(?!{delim})',
lambda mobj: (mobj.group().replace(delim, ' ')
if len(mobj.group()) - len(delim) + 1 <= max_width + ALLOWED_OVERSHOOT
else mobj.group())
),
( # Avoid newline when a space is available b/w switch and description
DISABLE_PATCH, # This creates issues with prepare_manpage
r'(?m)^(\s{4}-.{%d})(%s)' % (switch_col_width - 6, delim),
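The "Allow overshooting last line" patch in this hunk can be tried in isolation with the sketch below. It is a minimal approximation, assuming the help text arrives as a plain string; the function name `join_short_tails` is hypothetical and does not exist in `devscripts/make_readme.py`. It folds the final continuation line of a description into the line before it when the joined line exceeds the widest existing line by at most `ALLOWED_OVERSHOOT` characters.

```python
import re

ALLOWED_OVERSHOOT = 2


def join_short_tails(options: str) -> str:
    """Fold a short last continuation line into the previous line.

    Hypothetical helper mirroring the patch tuple added in the diff.
    """
    max_width = max(map(len, options.split('\n')))
    # Width of the switch column, taken from the first continuation line
    switch_col_width = len(re.search(r'(?m)^\s{5,}', options).group())
    delim = f'\n{" " * switch_col_width}'

    # A line, then one continuation line that is not itself followed by another
    pattern = rf'(?m)^(?P<prev>.+)${delim}(?P<current>.+)$(?!{delim})'

    def repl(mobj):
        # Length of the joined line: drop the delimiter, add a single space
        joined_len = len(mobj.group()) - len(delim) + 1
        if joined_len <= max_width + ALLOWED_OVERSHOOT:
            return mobj.group().replace(delim, ' ')
        return mobj.group()

    return re.sub(pattern, repl, options)
```

The README changes in this commit (for example `--write-description` and `--no-split-chapters`) show the same kind of one-line result.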
4 changes: 2 additions & 2 deletions yt_dlp/YoutubeDL.py
@@ -631,7 +631,7 @@ def __init__(self, params=None, auto_init=True):
)
self._allow_colors = Namespace(**{
type_: not self.params.get('no_color') and supports_terminal_sequences(stream)
for type_, stream in self._out_files if type_ != 'console'
for type_, stream in self._out_files.items_ if type_ != 'console'
})

if sys.version_info < (3, 6):
@@ -3961,7 +3961,7 @@ def get_encoding(stream):
sys.getfilesystemencoding(),
self.get_encoding(),
', '.join(
f'{key} {get_encoding(stream)}' for key, stream in self._out_files
f'{key} {get_encoding(stream)}' for key, stream in self._out_files.items_
if stream is not None and key != 'console')
)

22 changes: 21 additions & 1 deletion yt_dlp/extractor/generic.py
@@ -2577,7 +2577,27 @@ class GenericIE(InfoExtractor):
'timestamp': 1652833414,
'age_limit': 0,
}
}
}, {
'url': 'https://www.skimag.com/video/ski-people-1980/',
'info_dict': {
'id': 'ski-people-1980',
'title': 'Ski People (1980)',
},
'playlist_count': 1,
'playlist': [{
'md5': '022a7e31c70620ebec18deeab376ee03',
'info_dict': {
'id': 'YTmgRiNU',
'ext': 'mp4',
'title': '1980 Ski People',
'timestamp': 1610407738,
'description': 'md5:cf9c3d101452c91e141f292b19fe4843',
'thumbnail': 'https://cdn.jwplayer.com/v2/media/YTmgRiNU/poster.jpg?width=720',
'duration': 5688.0,
'upload_date': '20210111',
}
}]
},
]

_CORRUPTED_SCHEME_CONVERSION_TABLE = {
3 changes: 3 additions & 0 deletions yt_dlp/extractor/jwplatform.py
@@ -37,6 +37,9 @@ def _extract_urls(webpage):
webpage)
if ret:
return ret
mobj = re.search(r'<div\b[^>]* data-video-jw-id="([a-zA-Z0-9]{8})"', webpage)
if mobj:
return [f'jwplatform:{mobj.group(1)}']

def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
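The new fallback in `_extract_urls` looks for a `<div>` carrying a `data-video-jw-id` attribute with an 8-character alphanumeric ID. A standalone sketch of that lookup follows. The function name `find_jw_id` is hypothetical, and returning an empty list on a miss is a choice made here for illustration; the diff does not show what the real method returns in that case.

```python
import re


def find_jw_id(webpage: str) -> list:
    """Return a one-element list with a jwplatform: URL, or an empty list.

    Hypothetical helper; the regex is copied from the diff.
    """
    mobj = re.search(
        r'<div\b[^>]* data-video-jw-id="([a-zA-Z0-9]{8})"', webpage)
    if mobj:
        return [f'jwplatform:{mobj.group(1)}']
    return []
```

With the ID from the new test case in `generic.py`, `find_jw_id('<div class="player" data-video-jw-id="YTmgRiNU"></div>')` yields `['jwplatform:YTmgRiNU']`.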
7 changes: 7 additions & 0 deletions yt_dlp/extractor/rumble.py
@@ -24,6 +24,11 @@ class RumbleEmbedIE(InfoExtractor):
'title': 'WMAR 2 News Latest Headlines | October 20, 6pm',
'timestamp': 1571611968,
'upload_date': '20191020',
'channel_url': 'https://rumble.com/c/WMAR',
'channel': 'WMAR',
'thumbnail': 'https://sp.rmbl.ws/s8/1/5/M/z/1/5Mz1a.OvCc-small-WMAR-2-News-Latest-Headline.jpg',
'duration': 234,
'uploader': 'WMAR',
}
}, {
'url': 'https://rumble.com/embed/vslb7v',
@@ -38,6 +43,7 @@ class RumbleEmbedIE(InfoExtractor):
'channel': 'CTNews',
'thumbnail': 'https://sp.rmbl.ws/s8/6/7/i/9/h/7i9hd.OvCc.jpg',
'duration': 901,
'uploader': 'CTNews',
}
}, {
'url': 'https://rumble.com/embed/ufe9n.v5pv5f',
@@ -96,6 +102,7 @@ def _real_extract(self, url):
'channel': author.get('name'),
'channel_url': author.get('url'),
'duration': int_or_none(video.get('duration')),
'uploader': author.get('name'),
}


21 changes: 8 additions & 13 deletions yt_dlp/jsinterp.py
@@ -24,6 +24,7 @@
_NAME_RE = r'[a-zA-Z_$][a-zA-Z_$0-9]*'

_MATCHING_PARENS = dict(zip('({[', ')}]'))
_QUOTES = '\'"'


class JS_Break(ExtractorError):
@@ -68,24 +69,18 @@ def _separate(expr, delim=',', max_split=None):
if not expr:
return
counters = {k: 0 for k in _MATCHING_PARENS.values()}
start, splits, pos, delim_len, in_quote, quote_escape = 0, 0, 0, len(delim) - 1, False, False
start, splits, pos, delim_len = 0, 0, 0, len(delim) - 1
in_quote, escaping = None, False
for idx, char in enumerate(expr):
if quote_escape:
quote_escape = False
continue
elif char == '\\' and expr[start] in '"\'':
quote_escape = True
continue
elif char in '"\'':
in_quote = not in_quote
continue
elif in_quote:
continue
if char in _MATCHING_PARENS:
counters[_MATCHING_PARENS[char]] += 1
elif char in counters:
counters[char] -= 1
if char != delim[pos] or any(counters.values()):
elif not escaping and char in _QUOTES and in_quote in (char, None):
in_quote = None if in_quote else char
escaping = not escaping and in_quote and char == '\\'

if char != delim[pos] or any(counters.values()) or in_quote:
pos = 0
continue
elif pos != delim_len:
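The quote handling introduced in this hunk records which quote character opened a string (`in_quote`) and whether the previous character was an escaping backslash (`escaping`). The sketch below is loosely modelled on that logic and is not the project's `_separate`. It assumes a single-character delimiter and drops `max_split`. Unlike the hunk as shown, it also skips bracket counting while inside a quoted string; that is an assumption added here.

```python
_MATCHING_PARENS = dict(zip('({[', ')}]'))
_QUOTES = '\'"'


def separate(expr, delim=','):
    """Split expr on delim, ignoring delimiters inside brackets or quotes.

    Simplified, hypothetical variant: single-character delim, no max_split.
    """
    depth = {closer: 0 for closer in _MATCHING_PARENS.values()}
    in_quote, escaping = None, False
    start = 0
    for idx, char in enumerate(expr):
        if in_quote is None and char in _MATCHING_PARENS:
            depth[_MATCHING_PARENS[char]] += 1
        elif in_quote is None and char in depth:
            depth[char] -= 1
        elif not escaping and char in _QUOTES and in_quote in (char, None):
            # Open a string, or close it with the same quote character
            in_quote = None if in_quote else char
        # A backslash inside a string escapes exactly the next character
        escaping = not escaping and in_quote is not None and char == '\\'

        if char == delim and not any(depth.values()) and in_quote is None:
            yield expr[start:idx]
            start = idx + 1
    yield expr[start:]
```

For instance, `list(separate('a, "b,c", d'))` keeps the comma inside the quoted string and produces three parts.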
2 changes: 1 addition & 1 deletion yt_dlp/postprocessor/_attachments.py
@@ -378,7 +378,7 @@ def _finish_multiline_status(self):
)

def _report_progress_status(self, s, default_template):
for name, style in self.ProgressStyles:
for name, style in self.ProgressStyles.items_:
name = f'_{name}_str'
if name not in s:
continue
33 changes: 12 additions & 21 deletions yt_dlp/utils.py
@@ -33,6 +33,7 @@
import tempfile
import time
import traceback
import types
import urllib.parse
import xml.etree.ElementTree
import zlib
@@ -370,14 +371,14 @@ def get_element_html_by_attribute(attribute, value, html, **kargs):
def get_elements_by_class(class_name, html, **kargs):
"""Return the content of all tags with the specified class in the passed HTML document as a list"""
return get_elements_by_attribute(
'class', r'[^\'"]*\b%s\b[^\'"]*' % re.escape(class_name),
'class', r'[^\'"]*(?<=[\'"\s])%s(?=[\'"\s])[^\'"]*' % re.escape(class_name),
html, escape_value=False)


def get_elements_html_by_class(class_name, html):
"""Return the html of all tags with the specified class in the passed HTML document as a list"""
return get_elements_html_by_attribute(
'class', r'[^\'"]*\b%s\b[^\'"]*' % re.escape(class_name),
'class', r'[^\'"]*(?<=[\'"\s])%s(?=[\'"\s])[^\'"]*' % re.escape(class_name),
html, escape_value=False)
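The two class-matching patterns in this hunk differ in how they delimit a class name. `\b` treats a hyphen as a word boundary, so `foo` also matches inside `foo-bar`; the lookaround version only accepts a quote or whitespace on either side. The sketch below compares them on a bare `class="..."` attribute. The helpers `class_value_pattern` and `has_class` are hypothetical and much simpler than the real `get_elements_by_attribute` machinery, which embeds the value pattern in a larger tag-matching regex.

```python
import re


def class_value_pattern(class_name, strict=True):
    """Build the attribute-value regex; both patterns are copied from the diff."""
    name = re.escape(class_name)
    if strict:
        # Class name must be bounded by a quote or whitespace
        return r'[^\'"]*(?<=[\'"\s])%s(?=[\'"\s])[^\'"]*' % name
    # \b-based variant: any non-word character counts as a boundary
    return r'[^\'"]*\b%s\b[^\'"]*' % name


def has_class(attr_text, class_name, strict=True):
    """Hypothetical check against a raw class="..." attribute string."""
    pattern = r'class=(?P<q>[\'"])%s(?P=q)' % class_value_pattern(class_name, strict)
    return re.search(pattern, attr_text) is not None
```

With `class="foo-bar baz"`, the `\b` variant reports a match for `foo`, while the lookaround variant does not; both still find `baz`.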


@@ -3433,16 +3434,15 @@ def _match_one(filter_part, dct, incomplete):
else:
is_incomplete = lambda k: k in incomplete

operator_rex = re.compile(r'''(?x)\s*
operator_rex = re.compile(r'''(?x)
(?P<key>[a-z_]+)
\s*(?P<negation>!\s*)?(?P<op>%s)(?P<none_inclusive>\s*\?)?\s*
(?:
(?P<quote>["\'])(?P<quotedstrval>.+?)(?P=quote)|
(?P<strval>.+?)
)
\s*$
''' % '|'.join(map(re.escape, COMPARISON_OPERATORS.keys())))
m = operator_rex.search(filter_part)
m = operator_rex.fullmatch(filter_part.strip())
if m:
m = m.groupdict()
unnegated_op = COMPARISON_OPERATORS[m['op']]
@@ -3478,11 +3478,10 @@ def _match_one(filter_part, dct, incomplete):
'': lambda v: (v is True) if isinstance(v, bool) else (v is not None),
'!': lambda v: (v is False) if isinstance(v, bool) else (v is None),
}
operator_rex = re.compile(r'''(?x)\s*
operator_rex = re.compile(r'''(?x)
(?P<op>%s)\s*(?P<key>[a-z_]+)
\s*$
''' % '|'.join(map(re.escape, UNARY_OPERATORS.keys())))
m = operator_rex.search(filter_part)
m = operator_rex.fullmatch(filter_part.strip())
if m:
op = UNARY_OPERATORS[m.group('op')]
actual_value = dct.get(m.group('key'))
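Both `_match_one` hunks switch from `search` on a pattern padded with `\s*` and `\s*$` to `fullmatch` on the stripped filter string. The practical difference is that `search` may match only a suffix of the input, whereas `fullmatch` requires the whole string to be a valid filter. The sketch below illustrates this with a reduced pattern; `OPS`, `CORE`, and the two parse functions are hypothetical, and the real pattern also handles negation, quoting, and the `?` suffix.

```python
import re

OPS = ('<=', '>=', '<', '>', '=')  # illustrative subset of the operators
CORE = r'(?P<key>[a-z_]+)\s*(?P<op>%s)\s*(?P<value>.+?)' % '|'.join(map(re.escape, OPS))


def parse_with_search(part):
    """Old style: unanchored at the start, so a suffix can match."""
    m = re.search(r'\s*' + CORE + r'\s*$', part)
    return (m['key'], m['op'], m['value']) if m else None


def parse_with_fullmatch(part):
    """New style: the whole stripped string must match."""
    m = re.fullmatch(CORE, part.strip())
    return (m['key'], m['op'], m['value']) if m else None
```

Given the malformed input `'like-count > 100'`, the `search` variant silently parses it as a filter on `count`, while the `fullmatch` variant rejects it.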
@@ -5458,23 +5457,15 @@ def get_argcount(func):
return try_get(func, lambda x: x.__code__.co_argcount, int)


class Namespace:
class Namespace(types.SimpleNamespace):
"""Immutable namespace"""

def __init__(self, **kwargs):
self._dict = kwargs

def __getattr__(self, attr):
return self._dict[attr]

def __contains__(self, item):
return item in self._dict.values()

def __iter__(self):
return iter(self._dict.items())
return iter(self.__dict__.values())

def __repr__(self):
return f'{type(self).__name__}({", ".join(f"{k}={v}" for k, v in self)})'
@property
def items_(self):
return self.__dict__.items()


# Deprecated
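The reworked `Namespace` explains the `.items_` changes in `YoutubeDL.py` and `_attachments.py`: plain iteration now yields values only, so call sites that need `(name, value)` pairs use the `items_` property. The class below is reproduced from the diff so its behaviour can be checked in isolation. Note that `types.SimpleNamespace` itself permits attribute assignment, so the "Immutable" docstring in the diff describes intent rather than something the class enforces.

```python
import types


class Namespace(types.SimpleNamespace):
    """Namespace whose iteration yields values (as in the diff)."""

    def __iter__(self):
        return iter(self.__dict__.values())

    @property
    def items_(self):
        return self.__dict__.items()


# Example attribute names are made up for illustration
streams = Namespace(screen='stdout', error='stderr')

values = list(streams)            # values only
pairs = dict(streams.items_)      # name/value pairs
has_stdout = 'stdout' in streams  # membership falls back to __iter__, so it tests values
```

Because no `__contains__` is defined, the `in` operator uses `__iter__`, so membership is checked against values rather than attribute names.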