[youtube] Upload date being wrong by one day #9829
Comments
The upload_date is returned in UTC |
OK, that makes sense, but is there an option to change the time zone to CEST? Or is there some way to also get the time, so I could work around it on my own? |
use |
I tried yt-dlp -no-download --compat-option no-youtube-prefer-utc-upload-date https://www.youtube.com/watch?v=OqjTtnmGv8s --print upload_date but I still get 20211025, not 20211026, which is weird. On the YouTube webpage I see the same date, 26.10.2021, regardless of region, so I don't know why there is a difference. |
cc @coletdjnz |
Upstream gets the same result:
That value is the The page displayed by Chromium (Qt WebEngine) and by Firefox ESR (shows
From OP's account, in CEST the page shows 26 instead of 25 (2021's CEST ended on 31 October, the last Sunday of the month). Why do we get 20211025? The yt-dl extractor is looking at the webpage:
(Pdb) pp re.findall(r'[^,>]{,45}2021-?10-?2\d[^\s>,]*', webpage)
['<meta itemprop="datePublished" content="2021-10-25T15:29:23-07:00"',
'<meta itemprop="uploadDate" content="2021-10-25T15:29:23-07:00"',
'"publishDate":"2021-10-25T15:29:23-07:00"',
'"uploadDate":"2021-10-25T15:29:23-07:00"}}']
(Pdb)
There is plenty of full-resolution timestamp data there. The yt-dlp extractor looks at all the player responses that were both requested and available, but similar data also seemed to be present in the cases that I checked. The root of the problem seems to be this:
...
# Remove AM/PM + timezone
date_str = re.sub(r'(?i)\s*(?:AM|PM)(?:\s+[A-Z]+)?', '', date_str)
_, date_str = extract_timezone(date_str)
...
Is it just me, or is this complete nonsense? The explanation might be, from the first commit of this function:
So this was a first pass that hasn't been fixed since 2014. The Solutions:
Presumably, before JS got fancy date/time processing, YT used to send the date at the YYYYMMDD resolution shown on the page, rather than sending the ISO 8601 format and degrading it. |
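To illustrate why discarding the UTC offset matters, here is a minimal sketch using only the Python standard library. It is not the extractor's code: `dates_for` is a hypothetical helper, and the second input is a made-up value chosen to show a case where the wall-clock date and the UTC date disagree. The first input is the `uploadDate` string quoted from the webpage above.

```python
from datetime import datetime, timezone


def dates_for(raw):
    """Hypothetical helper: parse an ISO 8601 string that carries a UTC offset."""
    # fromisoformat keeps the offset, so the result is timezone-aware.
    aware = datetime.fromisoformat(raw)
    return {
        # Seconds since the Unix epoch, independent of any time zone.
        "timestamp": int(aware.timestamp()),
        # Calendar date after converting to UTC.
        "utc_date": aware.astimezone(timezone.utc).strftime("%Y%m%d"),
        # Calendar date as written in the string, i.e. what remains
        # if the offset is simply thrown away.
        "wall_clock_date": aware.strftime("%Y%m%d"),
    }


# Value taken from the page metadata quoted in this thread.
print(dates_for("2021-10-25T15:29:23-07:00"))

# Made-up value: 18:00 at UTC-7 is already 01:00 on the next day in UTC.
print(dates_for("2021-10-25T18:00:00-07:00"))
```

For the video in this issue both dates come out as 20211025, while the made-up evening upload gives 20211026 in UTC and 20211025 on the wall clock, which is the kind of one-day discrepancy under discussion.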
Also, apparently the In fact, |
I'm confident the actual timestamp was not available the last time I checked. If we can now extract full timestamp instead of upload_date, that's awesome! With timestamp, we don't need non-UTC date since user can just add their timezone |
I've read dirkf's comments and yours, pukkandan. As I understand it, this is heading towards making it possible to apply a timezone to a timestamp? I ask because when I used timestamp instead of upload_date in my command, it returned NA. |
Since historically (I'm guessing this did not change until after the generic/offset values for the |
I’m really confused. Do you want 20211025 or 20211026? |
The upload date should always be in UTC unless it is a stream or premiere. We tell YouTube to return everything in UTC. The compat opt reverts this behaviour to what yt-dl does - use the date in PT(?) tz from microformats (if available).
yt-dlp/yt_dlp/extractor/youtube.py Line 4565 in ac817bc
We cannot get the timestamp. If you need precise datetime then yt-dlp is not the right tool, you need to use the data API for that. |
But as described above, it seems that If a timestamp is available, OP will be able to format it into the desired TZ (CEST) as also described above. Example:
# The upload date for scheduled, live and past live streams / premieres in microformats
# may be different from the stream date. Although not in UTC, we will prefer it in this case.
# See: https://github.com/yt-dlp/yt-dlp/pull/2223#issuecomment-1008485139
- upload_date = (
- unified_strdate(get_first(microformats, 'uploadDate'))
- or unified_strdate(search_meta('uploadDate')))
- if not upload_date or (
- live_status in ('not_live', None)
- and 'no-youtube-prefer-utc-upload-date' not in self.get_param('compat_opts', [])
- ):
- upload_date = strftime_or_none(
- self._parse_time_text(self._get_text(vpir, 'dateText'))) or upload_date
- info['upload_date'] = upload_date
+ timestamp = (
+ unified_timestamp(get_first(microformats, 'uploadDate'))
+ or unified_timestamp(search_meta('uploadDate')))
+ own_upload_date = (
+ live_status not in ('not_live', None)
+ or 'no-youtube-prefer-utc-upload-date' in self.get_param('compat_opts', []))
+ if not timestamp or own_upload_date:
+ upload_date = (
+ unified_strdate(get_first(microformats, 'uploadDate'))
+ or unified_strdate(search_meta('uploadDate')))
+ else:
+ upload_date = None
+ if not (upload_date and own_upload_date):
+ if not upload_date and timestamp:
+ # TODO: complicated TZ processing to render timestamp as YYYYMMDD in Pacific time
+ pass
+ if not upload_date:
+ upload_date = strftime_or_none(
+ self._parse_time_text(self._get_text(vpir, 'dateText'))) or upload_date
+ if timestamp:
+ info['timestamp'] = timestamp
+ if upload_date:
+ info['upload_date'] = upload_date
+
+ if (timestamp or upload_date) and live_status not in ('is_live', 'post_live', 'is_upcoming'):
+ # Newly uploaded videos' HLS formats are potentially problematic and need to be checked
+ if timestamp:
+ upload_datetime = datetime.datetime.utcfromtimestamp(timestamp)
+ else:
+ upload_datetime = datetime_from_str(upload_date) # .replace(tzinfo=datetime.timezone.utc)
+ if upload_datetime >= datetime_from_str('today-2days'):
+ for fmt in info['formats']:
+ if fmt.get('protocol') == 'm3u8_native':
+ fmt['__needs_testing'] = True
for s_k, d_k in [('artist', 'creator'), ('track', 'alt_title')]:
Output (CEST = UTC+2 = +7200s):
$ yt-dlp --print upload_date --print timestamp --print '%(timestamp+7200>%Y%m%d)s' 'https://www.youtube.com/watch?v=OqjTtnmGv8s'
20211025
1635200963
20211026
$ |
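The arithmetic behind that output can be checked outside yt-dlp. The following is a rough sketch with the standard library only; `date_in_offset` is a hypothetical helper, and the fixed +7200 s offset is the same simplification as in the command above (it does not handle DST changes).

```python
from datetime import datetime, timedelta, timezone


def date_in_offset(timestamp, offset_seconds=0):
    """Hypothetical helper: YYYYMMDD date of a Unix timestamp at a fixed UTC offset."""
    tz = timezone(timedelta(seconds=offset_seconds))
    return datetime.fromtimestamp(timestamp, tz).strftime("%Y%m%d")


ts = 1635200963  # the timestamp printed above

print(date_in_offset(ts))        # UTC date: 20211025
print(date_in_offset(ts, 7200))  # CEST (UTC+2) date: 20211026
```

The timestamp corresponds to 22:29:23 UTC on 25 October 2021, so adding two hours pushes it just past midnight, which is why the page shows the 26th to a viewer in CEST.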
If you can get the full timestamp, then that must be a recent change by YouTube. In that case the extractor should be updated to extract it as a (UTC) Unix timestamp. |
As I understand it, I uploaded the video on 25-10-2021, but YouTube processed it on 26-10-2021, and therefore that date shows up instead of 25-10-2021? Reference: #7802
Command:
yt-dlp --print upload_date "https://www.youtube.com/watch?v=OqjTtnmGv8s"
As far as I can see, there is no way to print the date shown near the description, that is, the date after YouTube processed the video?
Or do I misunderstand?