--match-filter parsing with spaces or dashes #8050

Closed
gohlhausen opened this issue Dec 29, 2015 · 2 comments

@gohlhausen gohlhausen commented Dec 29, 2015

How do I use --match-filter to match this YouTube channel?
Does it support spaces or hyphens in the string value?

These are the uploader and uploader_id fields from the JSON dump for one of the videos.

   "uploader":"John Sucks at Video Games",
   "uploader_id":"UC1hlBVlxLDY--Ih2fEyH5nQ",

For uploader, the filter either does not handle the spaces properly (breaking the value into separate tokens) or simply does not match.

G:\Youtube-DL>youtube-dl https://www.youtube.com/channel/UC1hlBVlxLDY--Ih2fEyH5nQ  --verbose --match-filter "uploader = John Sucks at Video Games"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'https://www.youtube.com/channel/UC1hlBVlxLDY--Ih2fEyH5nQ', u'--verbose', u'--match-filter', u'uploader = John Sucks at Video Games']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2015.12.29
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-69422-gf5722ba, ffprobe N-69422-gf5722ba
[debug] Proxy map: {}
[youtube:channel] UC1hlBVlxLDY--Ih2fEyH5nQ: Downloading channel page
[youtube:playlist] UU1hlBVlxLDY--Ih2fEyH5nQ: Downloading webpage
[download] Downloading playlist: Uploads from John Sucks at Video Games
[youtube:playlist] UU1hlBVlxLDY--Ih2fEyH5nQ: Downloading page #1
[youtube:playlist] playlist Uploads from John Sucks at Video Games: Downloading 13 videos
[download] Downloading video 1 of 13
[youtube] BX88GB_bXmc: Downloading webpage
[youtube] BX88GB_bXmc: Downloading video info webpage
[youtube] BX88GB_bXmc: Extracting video information
[youtube] BX88GB_bXmc: Downloading DASH manifest
[youtube] BX88GB_bXmc: Downloading DASH manifest
[download] Thunder Wolves PC Game Review - HELICOPTER COLLECTOR does not pass filter uploader = John Sucks at Video Games, skipping ..
[download] Downloading video 2 of 13
[youtube] 9MpxxJos1gQ: Downloading webpage
[youtube] 9MpxxJos1gQ: Downloading video info webpage
[youtube] 9MpxxJos1gQ: Extracting video information
[youtube] 9MpxxJos1gQ: Downloading DASH manifest
[youtube] 9MpxxJos1gQ: Downloading DASH manifest
[download] Windows 10 21:9 Gaming PC Build - Part 6 - Final Build Video - Let's put it all together! does not pass filter uploader = John Sucks at Video Games, skipping ..
[download] Downloading video 3 of 13
[youtube] _cFaXX2txAY: Downloading webpage

ERROR: Interrupted by user

For uploader_id, I get this error:

G:\Youtube-DL>youtube-dl https://www.youtube.com/channel/UC1hlBVlxLDY--Ih2fEyH5nQ  --verbose --match-filter "uploader_id = UC1hlBVlxLDY--Ih2fEyH5nQ"
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'https://www.youtube.com/channel/UC1hlBVlxLDY--Ih2fEyH5nQ', u'--verbose', u'--match-filter', u'uploader_id = UC1hlBVlxLDY--Ih2fEyH5nQ']
[debug] Encodings: locale cp1252, fs mbcs, out cp437, pref cp1252
[debug] youtube-dl version 2015.12.29
[debug] Python version 2.7.10 - Windows-8-6.2.9200
[debug] exe versions: ffmpeg N-69422-gf5722ba, ffprobe N-69422-gf5722ba
[debug] Proxy map: {}
[youtube:channel] UC1hlBVlxLDY--Ih2fEyH5nQ: Downloading channel page
[youtube:playlist] UU1hlBVlxLDY--Ih2fEyH5nQ: Downloading webpage
[download] Downloading playlist: Uploads from John Sucks at Video Games
[youtube:playlist] UU1hlBVlxLDY--Ih2fEyH5nQ: Downloading page #1
[youtube:playlist] playlist Uploads from John Sucks at Video Games: Downloading 13 videos
[download] Downloading video 1 of 13
[youtube] BX88GB_bXmc: Downloading webpage
[youtube] BX88GB_bXmc: Downloading video info webpage
[youtube] BX88GB_bXmc: Extracting video information
[youtube] BX88GB_bXmc: Downloading DASH manifest
[youtube] BX88GB_bXmc: Downloading DASH manifest
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "youtube_dl\__init__.pyo", line 410, in main
  File "youtube_dl\__init__.pyo", line 400, in _real_main
  File "youtube_dl\YoutubeDL.pyo", line 1677, in download
  File "youtube_dl\YoutubeDL.pyo", line 676, in extract_info
  File "youtube_dl\YoutubeDL.pyo", line 729, in process_ie_result
  File "youtube_dl\YoutubeDL.pyo", line 676, in extract_info
  File "youtube_dl\YoutubeDL.pyo", line 837, in process_ie_result
  File "youtube_dl\YoutubeDL.pyo", line 729, in process_ie_result
  File "youtube_dl\YoutubeDL.pyo", line 676, in extract_info
  File "youtube_dl\YoutubeDL.pyo", line 722, in process_ie_result
  File "youtube_dl\YoutubeDL.pyo", line 1347, in process_video_result
  File "youtube_dl\YoutubeDL.pyo", line 1418, in process_info
  File "youtube_dl\YoutubeDL.pyo", line 628, in _match_entry
  File "youtube_dl\utils.pyo", line 1984, in _match_func
  File "youtube_dl\utils.pyo", line 1979, in match_str
  File "youtube_dl\utils.pyo", line 1979, in <genexpr>
  File "youtube_dl\utils.pyo", line 1972, in _match_one
ValueError: Invalid filter part u'uploader_id = UC1hlBVlxLDY--Ih2fEyH5nQ'

I know I can specify the channel in the video URL, but I want to use my subscription list and pick the video format depending on the channel, without unsubscribing from these channels. To do that, the last catch-all download line in the script needs to exclude those channels, using --match-filter "uploader != xxxx & uploader != xxxx & uploader != xxxx".

Here is that part of my script:

youtube-dl -f 299+141/299+140 https://www.youtube.com/user/blkdog7/videos -o "%%(uploader)s/%%(title)s.%%(ext)s" --ignore-errors --download-archive archive.YT --verbose --restrict-filenames 
youtube-dl -f 299+141/299+140 https://www.youtube.com/channel/UC1hlBVlxLDY--Ih2fEyH5nQ/videos  -o "%%(uploader)s/%%(title)s.%%(ext)s" --ignore-errors --download-archive archive.YT --verbose --restrict-filenames 
youtube-dl -f 299+141/299+140/137+141/137+140/bestvideo+bestaudio/best -o "%%(uploader)s/%%(title)s.%%(ext)s" --ignore-errors --download-archive archive.YT -u username -p password :ytsubs --verbose --restrict-filenames  --match-filter "uploader_id != blkdog7 & uploader_id != UC1hlBVlxLDY--Ih2fEyH5nQ"
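
A possible workaround for the catch-all line, sketched in Python instead of on the command line: youtube-dl's embedding API documents a `match_filter` option that takes a callable, which receives each video's info dict and returns a skip message or None. The helper name `exclude_uploaders` and the constant `EXCLUDED_UPLOADER_IDS` are illustrative and not part of youtube-dl; the usage shown in the trailing comment is an assumption and is not executed here.

```python
# Channels already handled by the earlier, channel-specific download lines.
EXCLUDED_UPLOADER_IDS = {"blkdog7", "UC1hlBVlxLDY--Ih2fEyH5nQ"}


def exclude_uploaders(info_dict):
    """Return a skip message for excluded channels, or None to download."""
    uploader_id = info_dict.get("uploader_id")
    if uploader_id in EXCLUDED_UPLOADER_IDS:
        return "%s is handled by a channel-specific line, skipping" % uploader_id
    return None


# Assumed usage with the embedding API (login options omitted):
#   import youtube_dl
#   opts = {"match_filter": exclude_uploaders, "ignoreerrors": True}
#   with youtube_dl.YoutubeDL(opts) as ydl:
#       ydl.download([":ytsubs"])
```

Because the comparison happens in Python, spaces and dashes in the value never pass through the filter-string parser at all.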

@jaimeMF jaimeMF added the request label Dec 30, 2015

@gohlhausen gohlhausen commented Jan 1, 2016

It looks like a simple change to the regex will add this functionality.
I don't know what the impact is to the rest of the project, though.

Change line 1930 in utils.py from

            (?P<strval>(?![0-9.])[a-z0-9A-Z]*)

to

            (?P<strval>(?![0-9.])[a-z0-9A-Z\s\-]*)

This will allow whitespace and dashes to be considered part of a strval.
Since the input is already split on & before it gets here, the function should still be able to tell which part is which.
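
To illustrate the effect of widening the character class, here is a minimal sketch. It assumes the intended class is `[a-z0-9A-Z\s\-]` (whitespace and dash written with backslashes). Only the strval group comes from this thread; the surrounding key/operator pattern and the name `build_filter_pattern` are simplified stand-ins, not the actual code in utils.py, so the exact failure mode of the real parser may differ.

```python
import re


def build_filter_pattern(strval_class):
    """Build a simplified 'key op value' pattern around a strval character class."""
    return re.compile(
        r"\s*(?P<key>[a-z_]+)\s*(?P<op>!=|=)\s*"
        r"(?P<strval>(?![0-9.])" + strval_class + r"*)\s*$"
    )


ORIGINAL = build_filter_pattern(r"[a-z0-9A-Z]")
PROPOSED = build_filter_pattern(r"[a-z0-9A-Z\s\-]")

FILTERS = [
    "uploader = John Sucks at Video Games",
    "uploader_id = UC1hlBVlxLDY--Ih2fEyH5nQ",
]

for text in FILTERS:
    for name, pattern in (("original", ORIGINAL), ("proposed", PROPOSED)):
        m = pattern.match(text)
        # Show the captured value, or note that the filter part is rejected.
        print(name, repr(text), "->", m.group("strval") if m else "no match")
```

In this sketch the original class rejects both filter parts, while the widened class captures the full uploader name and the full channel ID.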


@mariussteffen mariussteffen commented Mar 28, 2016

I've had a similar problem, but with '_' instead of '-' causing the exception.
So the regex should be:
(?P<strval>(?![0-9.])[a-z0-9A-Z_\s\-]*)

EDIT: Sometimes IDs start with a number. Currently, when excluding such IDs with id != <id>, NOTHING will pass the filter.
So that the parser can decide whether a value is an integer or a string, strings should always start and end with '. So maybe something like:
(?P<strval>\'[a-z0-9A-Z_\s\-]*\')
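
The quoting idea can be sketched as follows. Both patterns are illustrative and assume the character class includes `_`, whitespace and `-`; the helper `parse_strval` is hypothetical and does not reflect how youtube-dl actually resolved this issue. The sample IDs are taken from the log output earlier in the thread.

```python
import re

# Unquoted values may not start with a digit or dot (they would look numeric).
UNQUOTED = re.compile(r"(?![0-9.])[a-z0-9A-Z_\s\-]*")
# Quoted values are always strings, whatever their first character is.
QUOTED = re.compile(r"'(?P<inner>[a-z0-9A-Z_\s\-]*)'")


def parse_strval(raw):
    """Return the string value, or None if raw is not a valid string token."""
    quoted = QUOTED.fullmatch(raw)
    if quoted:
        return quoted.group("inner")
    if UNQUOTED.fullmatch(raw):
        return raw
    return None


print(parse_strval("9MpxxJos1gQ"))    # unquoted ID starting with a digit
print(parse_strval("'9MpxxJos1gQ'"))  # same ID, quoted
```

With explicit quotes, an ID that starts with a digit is unambiguously a string, so the negative lookahead is only needed for the unquoted form.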

@dstftw dstftw closed this in db13c16 Feb 15, 2017