[CCMA] "ValueError: date not matching format" when downloading video #30961

Open
5 of 6 tasks
adaliaramon opened this issue May 19, 2022 · 2 comments · May be fixed by #31320
Labels
broken-IE problem with existing site extraction

Comments

@adaliaramon

Checklist

  • I'm reporting a broken site support issue
  • I've verified that I'm running youtube-dl version 2021.12.17
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar bug reports including closed ones
  • I've read bugs section in FAQ

Verbose log

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.ccma.cat/tv3/alacarta/crims/manoli/video/6158990/']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Python version 3.10.4 (CPython) - Linux-5.17.8-arch1-1-x86_64-with-glibc2.35
[debug] exe versions: ffmpeg 5.0, ffprobe 5.0
[debug] Proxy map: {}
[CCMA] 6158990: Downloading JSON metadata
Traceback (most recent call last):
  File "/usr/bin/youtube-dl", line 33, in <module>
    sys.exit(load_entry_point('youtube-dl==2021.12.17', 'console_scripts', 'youtube-dl')())
  File "/usr/lib/python3.10/site-packages/youtube_dl/__init__.py", line 475, in main
    _real_main(argv)
  File "/usr/lib/python3.10/site-packages/youtube_dl/__init__.py", line 465, in _real_main
    retcode = ydl.download(all_urls)
  File "/usr/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 2068, in download
    res = self.extract_info(
  File "/usr/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 808, in extract_info
    return self.__extract_info(url, ie, download, extra_info, process)
  File "/usr/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 815, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/youtube_dl/YoutubeDL.py", line 836, in __extract_info
    ie_result = ie.extract(url)
  File "/usr/lib/python3.10/site-packages/youtube_dl/extractor/common.py", line 534, in extract
    ie_result = self._real_extract(url)
  File "/usr/lib/python3.10/site-packages/youtube_dl/extractor/ccma.py", line 103, in _real_extract
    timestamp = calendar.timegm((datetime.datetime.strptime(
  File "/usr/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/usr/lib/python3.10/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2022-05-16T22:09:06' does not match format '%Y-%d-%mT%H:%M:%S'

Description

I got a ValueError when attempting to download this video. The date looks to me like it matches the format, so I have no idea what is going on.

@dirkf
Contributor

dirkf commented May 19, 2022

Some home-made code can be replaced by a standard utility:

--- old/youtube_dl/extractor/ccma.py
+++ new/youtube_dl/extractor/ccma.py
@@ -11,6 +11,7 @@ from ..utils import (
     extract_timezone,
     int_or_none,
     parse_duration,
+    parse_iso8601,
     parse_resolution,
     try_get,
     url_or_none,
@@ -25,7 +26,7 @@ class CCMAIE(InfoExtractor):
         'info_dict': {
             'id': '5630208',
             'ext': 'mp4',
-            'title': 'L\'espot de La Marató de TV3',
+            'title': r'''re:^L'espot de La Marató''',
             'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
             'timestamp': 1478608140,
             'upload_date': '20161108',
@@ -39,8 +40,8 @@ class CCMAIE(InfoExtractor):
             'ext': 'mp3',
             'title': 'El Consell de Savis analitza el derbi',
             'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
-            'upload_date': '20170512',
-            'timestamp': 1494622500,
             'vcodec': 'none',
             'categories': ['Esports'],
         }
@@ -50,9 +51,9 @@ class CCMAIE(InfoExtractor):
         'info_dict': {
             'id': '6031387',
             'ext': 'mp4',
-            'title': 'Crims - Josep Talleda, l\'"Espereu-me" (capítol 1)',
+            'title': r'''re:\bJosep Talleda, l'"Espereu-me"''',
             'description': 'md5:7cbdafb640da9d0d2c0f62bad1e74e60',
-            'timestamp': 1582577700,
+            'timestamp': 1582577919,
             'upload_date': '20200224',
             'subtitles': 'mincount:4',
             'age_limit': 16,
@@ -96,14 +97,7 @@ class CCMAIE(InfoExtractor):
         duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
         tematica = try_get(informacio, lambda x: x['tematica']['text'])
 
-        timestamp = None
-        data_utc = try_get(informacio, lambda x: x['data_emissio']['utc'])
-        try:
-            timezone, data_utc = extract_timezone(data_utc)
-            timestamp = calendar.timegm((datetime.datetime.strptime(
-                data_utc, '%Y-%d-%mT%H:%M:%S') - timezone).timetuple())
-        except TypeError:
-            pass
+        timestamp = parse_iso8601(try_get(informacio, lambda x: x['data_emissio']['utc']))
 
         subtitles = {}
         subtitols = media.get('subtitols') or []

yt-dlp chose unified_timestamp() instead.
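
For readers who don't know the helper, the behaviour the patch relies on can be sketched with the standard library alone: parse a year-month-day ISO 8601 string and return None, rather than raise, when the field is missing or malformed. The name `parse_iso8601_sketch` is hypothetical and this is not the `youtube_dl.utils` implementation; it assumes the string has no timezone suffix and treats it as UTC.

```python
import calendar
import datetime


def parse_iso8601_sketch(date_str):
    """Return a UNIX timestamp for 'YYYY-MM-DDTHH:MM:SS' (assumed UTC), or None."""
    if date_str is None:
        # Missing metadata field: nothing to parse.
        return None
    try:
        dt = datetime.datetime.strptime(date_str, '%Y-%m-%dT%H:%M:%S')
    except ValueError:
        # Malformed input: report "no timestamp" instead of aborting extraction.
        return None
    return calendar.timegm(dt.timetuple())
```

With a helper shaped like this, a bad or absent date only costs the `timestamp` field instead of the whole download.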

The hand-rolled code catches TypeError rather than ValueError, which is what the utility function catches. That leaves the question of why the error is being thrown: I suspected an encoding issue, but no. The exception is only thrown in the hand-rolled code and not in the utility function. Surely Python doesn't have a secret down on Catalan variable names?
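
To illustrate the point about exception types, here is a small sketch that calls `strptime` with the format string from the traceback. The helper `exception_name` is hypothetical. It shows that a missing value (None) produces a TypeError while a string that does not fit the format produces a ValueError, so an `except TypeError` clause alone lets the second case escape.

```python
import datetime

FMT = '%Y-%d-%mT%H:%M:%S'  # format string copied from the traceback


def exception_name(value):
    """Return the name of the exception strptime raises for value, or None."""
    try:
        datetime.datetime.strptime(value, FMT)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


print(exception_name(None))                   # missing field -> TypeError
print(exception_name('2022-05-16T22:09:06'))  # string from the report -> ValueError
```

In the extractor itself the missing-field case presumably fails earlier, inside `extract_timezone`, but that is an assumption; the exception type would be the same.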

@dirkf added the broken-IE (problem with existing site extraction) label May 19, 2022
@dirkf changed the title from "ValueError: date not matching format" when downloading video to [CCMA] "ValueError: date not matching format" when downloading video May 19, 2022
@idrilirdi

I can confirm that downloading from CCMA works after modifying the code as shown above. Without the modification it fails because the timestamp is returned as %Y-%m-%d (the order that is normal outside of the US), while the extractor parses it as %Y-%d-%m.
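
The effect of the field order can be checked in isolation. The sketch below parses the timestamp from the verbose log with both orders; the second timestamp (day 06) is a made-up value added to show that the swapped format does not always fail. The helper `try_parse` is hypothetical.

```python
import datetime

WRONG = '%Y-%d-%mT%H:%M:%S'  # day before month, as in the old extractor code
RIGHT = '%Y-%m-%dT%H:%M:%S'  # month before day, as the site returns it


def try_parse(value, fmt):
    """Return the parsed date as 'YYYY-MM-DD', or None on ValueError."""
    try:
        return datetime.datetime.strptime(value, fmt).date().isoformat()
    except ValueError:
        return None


for stamp in ('2022-05-16T22:09:06', '2022-05-06T22:09:06'):
    print(stamp, try_parse(stamp, RIGHT), try_parse(stamp, WRONG))
# 2022-05-16T22:09:06 2022-05-16 None
# 2022-05-06T22:09:06 2022-05-06 2022-06-05
```

With the swapped format, a day above 12 cannot be read as a month and raises ValueError, while a day of 12 or below parses without error but yields the wrong date.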

celebdor added a commit to celebdor/youtube-dl that referenced this issue Oct 31, 2022
This PR fixes the date to the way ccma provides it as well as fixing
some broken tests

closes ytdl-org#30961

Signed-off-by: Antoni Segura Puimedon <celebdor@gmail.com>
@celebdor celebdor linked a pull request Oct 31, 2022 that will close this issue