[cbc.ca] Video ID issues when using Archive tag #21761

Open
5 tasks done
darthhaggis opened this issue Jul 12, 2019 · 1 comment

darthhaggis commented Jul 12, 2019

Checklist

  • I'm reporting a malfunctioning site support
  • I've verified that I'm running youtube-dl version 2019.07.02
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Description

I have been having issues with the https://watch.cbc.ca site (or the recently updated https://gem.cbc.ca) whenever the --download-archive option is used.
When youtube-dl is given either a season playlist URL or a batch file containing the URLs of multiple episodes in a season, and --download-archive is also used, only half of the episodes get downloaded, with every other one being skipped. With the output template -o "%(series)s - %(season_number)sx%(episode_number)02d - %(title)s [%(id)s].%(ext)s", the filenames show that the video IDs are getting shifted and mixed up at some point. As a result, when an archive file is in use (and checked against), every other episode is skipped because its ID is already in the archive.
This is what I believe is currently going on:

<checks if VideoID1 is in archive>
<VideoID1 not found>
<starts download of VideoID1>
<VideoID1 saved as VideoID2>
<VideoID2 added to archive>
<proceeds to next file>
<checks if VideoID2 is in archive>
<VideoID2 found>
<skip VideoID2>
<proceeds to next file>
<checks if VideoID3 is in archive>
<VideoID3 not found>
<starts download of VideoID3>
<VideoID3 saved as VideoID4>
<VideoID4 added to archive>
<proceeds to next file>
<checks if VideoID4 is in archive>
<VideoID4 found>
<skip VideoID4>
<proceeds to next file>
<etc>
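
The sequence above can be reproduced with a small simulation. This is only a sketch of the behaviour I suspect, not youtube-dl's actual code: the function name `simulate`, the placeholder IDs, and the "unknown-" prefix used for the last item are all made up for illustration.

```python
def simulate(url_ids):
    """Simulate the suspected behaviour for a list of episode IDs in playlist order.

    Assumption (hypothetical, not taken from youtube-dl's source): each
    downloaded episode is recorded in the archive under the NEXT episode's ID.
    """
    archive = set()
    downloaded = []  # (ID from the URL, ID written to the archive)
    skipped = []     # IDs skipped because they were already in the archive

    for index, url_id in enumerate(url_ids):
        if url_id in archive:
            skipped.append(url_id)
            continue
        if index + 1 < len(url_ids):
            recorded_id = url_ids[index + 1]
        else:
            # What the last item is recorded as is unknown; placeholder only.
            recorded_id = "unknown-" + url_id
        archive.add(recorded_id)
        downloaded.append((url_id, recorded_id))

    return downloaded, skipped


downloaded, skipped = simulate(["VideoID1", "VideoID2", "VideoID3", "VideoID4"])
print(downloaded)
print(skipped)
```

Under that assumption, VideoID1 and VideoID3 are downloaded and VideoID2 and VideoID4 are skipped, which matches what I am seeing.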

As a result, if you start with the first episode, only the odd-numbered episodes get downloaded. (Likewise, only the even-numbered ones get downloaded if your first episode is even-numbered.) The problem only occurs when an archive is used to avoid downloading the same file multiple times.
Regarding video IDs, I have noticed that the ID saved in the finished file's name corresponds to the ID in the next episode's URL. Is there something unique about this site that causes the IDs to shift in this way?

Here is an example using the following URLs for reference:
Season 1 Playlist: https://gem.cbc.ca/season/back-in-time-for-dinner-uk/season-1/1a9bb35b-f429-4b0a-9647-53ef460a3f1c
Episode 1: https://gem.cbc.ca/media/back-in-time-for-dinner-uk/season-1/episode-1/38e815a-010e6f4b76c
Episode 2: https://gem.cbc.ca/media/back-in-time-for-dinner-uk/season-1/episode-2/38e815a-010e6ecc82d
Episode 3: https://gem.cbc.ca/media/back-in-time-for-dinner-uk/season-1/episode-3/38e815a-010e70354a2
Episode 4: https://gem.cbc.ca/media/back-in-time-for-dinner-uk/season-1/episode-4/38e815a-010e749392c

Episode 1's ID should be 38e815a-010e6f4b76c
Episode 2's ID should be 38e815a-010e6ecc82d
However, episode 1 gets saved as "Back in Time for Dinner (UK) - 1x01 - 1950s [38e815a-010e6ecc82d].mp4" (using the previously mentioned output format), and 38e815a-010e6ecc82d gets added to the archive after episode 1 is finished downloading rather than 38e815a-010e6f4b76c. As a result, when episode 2 starts, its ID is already in the archive and gets skipped, with the program moving on to episode 3.
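
For anyone who wants to check their own archive file, here is a rough sketch that lists which episode URLs carry an ID that is already in the archive. It assumes each archive line has the form "<extractor> <id>" and that the ID is the last path segment of the URL; the "cbcwatch" extractor name in the sample line and all function names are my own assumptions, not confirmed output.

```python
from urllib.parse import urlparse


def id_from_url(url):
    """Return the last path segment of an episode URL (the ID as shown in the URL)."""
    return urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]


def ids_in_archive(archive_text):
    """Collect the ID field from each '<extractor> <id>' line (assumed format)."""
    ids = set()
    for line in archive_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            ids.add(parts[1].strip())
    return ids


def urls_already_archived(urls, archive_text):
    """Return the URLs whose ID already appears in the archive."""
    archived = ids_in_archive(archive_text)
    return [url for url in urls if id_from_url(url) in archived]


base = "https://gem.cbc.ca/media/back-in-time-for-dinner-uk/season-1/"
episode_urls = [
    base + "episode-1/38e815a-010e6f4b76c",
    base + "episode-2/38e815a-010e6ecc82d",
]
# Assumed archive contents after only episode 1 has been downloaded.
sample_archive = "cbcwatch 38e815a-010e6ecc82d\n"
print(urls_already_archived(episode_urls, sample_archive))
```

With the sample data, only the episode 2 URL is reported, even though episode 1 is the one that was actually downloaded.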

Any assistance in figuring out what is going on so we can confidently continue to use an archive file with a batch file or playlist URL (without having to go back and edit out half of the IDs) would be appreciated. Thank you.

So... any thoughts?

@darthhaggis

Any updates on this?
I've also noticed that the last file in a list gets saved with a much longer ID tag which doesn't seem to be recorded anywhere else.
