[extractor/nbc] Fix NBCOlympicsStreamIE #29688

Open
wants to merge 5 commits into
base: master

nchilada commented Jul 29, 2021

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This PR fixes NBCOlympicsStreamIE, which applies to stream.nbcolympics.com and is therefore the main extractor for the NBC Olympics in 2020/2021. See #29665, yt-dlp/yt-dlp#617.

This does not address NBCOlympicsIE, which is responsible for any news articles on www.nbcolympics.com that happen to have embedded videos. That extractor is still broken and fixing it doesn’t feel like a high priority.

asedeno commented Aug 3, 2021

-http_seekable requires ffmpeg n4.3 or newer; with older versions this errors out. Worked great once I built a newer ffmpeg than the one in Debian 10. Thanks!
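
As a rough illustration of the version constraint above, the following sketch parses the first line of `ffmpeg -version` output and checks it against 4.3. The function names and the sample banner strings are hypothetical, and treating an unparseable banner (for example a git snapshot build) as unsupported is an assumption, not something the PR does.

```python
import re


def parse_ffmpeg_version(banner):
    """Return (major, minor) parsed from an `ffmpeg -version` banner, or None."""
    # Accepts both "ffmpeg version 4.3.1-..." and "ffmpeg version n4.3".
    match = re.search(r'ffmpeg version n?(\d+)\.(\d+)', banner)
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_http_seekable(banner, minimum=(4, 3)):
    """True if the banner reports a release at or above `minimum`."""
    version = parse_ffmpeg_version(banner)
    # Assumption: when the version cannot be parsed, be conservative.
    return version is not None and version >= minimum
```

A caller could obtain the banner with `subprocess.check_output(['ffmpeg', '-version'])` and skip the `-http_seekable` argument, or warn the user, when the check fails.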

asedeno commented Aug 3, 2021

It appears not all videos require getting a tokenized url, and for those that don't we get a 400 BAD REQUEST while generating it. Using the non-tokenized source_url in place of the tokenized url seems to work for those though.

event_config['cdnToken'] (boolean) is the indicator you want.
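
A minimal sketch of the branching described here, assuming `event_config` is the parsed event JSON and `source_url` is the non-tokenized URL. `fetch_tokenized_url` is a hypothetical callable standing in for whatever request the extractor makes to obtain the token; it is not the extractor's real API.

```python
def pick_stream_url(event_config, source_url, fetch_tokenized_url):
    """Choose the URL to hand to the downloader.

    `fetch_tokenized_url` is a hypothetical callable that performs the
    tokenization request; it is only invoked when the event asks for it.
    """
    if event_config.get('cdnToken'):
        return fetch_tokenized_url(source_url)
    # No CDN token required: requesting one is what returned 400 BAD REQUEST,
    # so use the plain source URL instead.
    return source_url
```

Using `.get()` means a missing `cdnToken` key is treated the same as `False`, which is an assumption about how absent values should be handled.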

nchilada (Author) commented Aug 3, 2021

> -http_seekable requires ffmpeg n4.3 or newer; with older versions this errors out. Worked great once I built a newer ffmpeg than the one in Debian 10. Thanks!

@asedeno good to know, thanks!

For people who are stuck with older versions of ffmpeg, one option would be to use a (local) proxy to strip request headers from the outbound requests. That's what I did initially:
[Image: Charles Proxy: fix ffmpeg requests to akamaized.net]

> It appears not all videos require getting a tokenized url, and for those that don't we get a 400 BAD REQUEST while generating it. Using the non-tokenized source_url in place of the tokenized url seems to work for those though.
>
> event_config['cdnToken'] (boolean) is the indicator you want.

Hmm, I do see event_config['cdnToken']: True in the videos that I've been testing with. I think you're saying, if that property is False, then we must avoid the code that computes tokenized_url and simply use source_url?

It might be helpful if you could link to such a stream, but failing that, I'm happy to make the change and keep it if you say it works for you. I just don't want to submit a branch of code that hasn't been run by someone.

In any case, thank you very much for taking a look at this and providing important feedback! It's very late here but I will check back tomorrow.

nchilada marked this pull request as ready for review August 3, 2021 03:51


asedeno commented Aug 3, 2021

> It appears not all videos require getting a tokenized url, and for those that don't we get a 400 BAD REQUEST while generating it. Using the non-tokenized source_url in place of the tokenized url seems to work for those though.
>
> event_config['cdnToken'] (boolean) is the indicator you want.
>
> Hmm, I do see event_config['cdnToken']: True in the videos that I've been testing with. I think you're saying, if that property is False, then we must avoid the code that computes tokenized_url and simply use source_url?
>
> It might be helpful if you could link to such a stream, but failing that, I'm happy to make the change and keep it if you say it works for you. I just don't want to submit a branch of code that hasn't been run by someone.
>
> In any case, thank you very much for taking a look at this and providing important feedback! It's very late here but I will check back tomorrow.

Yeah, that's what I'm saying. Here's one that appears that way for me.

https://stream.nbcolympics.com/gymnastics-event-finals-mens-floor-pommel-horse-womens-vault-bars

nchilada (Author) commented Aug 3, 2021

@asedeno thanks for the super quick link! You were exactly right. Update incoming, with a new test case

wesnm commented Aug 3, 2021

This got me to the point of ffmpeg starting, but it always died with a 403 error. After adding these extra headers, things started working:

--add-header Origin:https://stream.nbcolympics.com
--add-header Referer:https://stream.nbcolympics.com/
--add-header Sec-Fetch-Dest:empty
--add-header Sec-Fetch-Mode:cors
--add-header Sec-Fetch-Site:cross-site

I didn't test different combinations to see if they were all required. Just added what was missing from the browser. After this, ffmpeg downloads the stream just fine.

Edit: Now it works fine this morning without the extra headers, strange.

Suggest using the event_config["eventStatus"] to determine if the event is live or not. The value is "live" for a live broadcast, otherwise set to "replay".
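
The suggestion above could be sketched as follows. `is_live` is the info-dict field youtube-dl uses to flag live broadcasts; the helper name and the idea of returning a partial info dict are illustrative assumptions, not code from this PR.

```python
def live_fields(event_config):
    """Map the event's status onto the `is_live` info-dict field.

    Per the comment above, `eventStatus` is "live" for a live broadcast
    and "replay" otherwise; any other or missing value is treated as
    not live here (an assumption).
    """
    return {'is_live': event_config.get('eventStatus') == 'live'}
```

An extractor could merge this into the dictionary it returns, for example `info.update(live_fields(event_config))`.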

nchilada (Author) commented Aug 3, 2021

> This got me to the point of ffmpeg starting, but it always died with a 403 error. After adding these extra headers, things started working:
>
> --add-header Origin:https://stream.nbcolympics.com
> --add-header Referer:https://stream.nbcolympics.com/
> --add-header Sec-Fetch-Dest:empty
> --add-header Sec-Fetch-Mode:cors
> --add-header Sec-Fetch-Site:cross-site
>
> I didn't test different combinations to see if they were all required. Just added what was missing from the browser. After this, ffmpeg downloads the stream just fine.
>
> Edit: Now it works fine this morning without the extra headers, strange.

@wesnm interesting, thanks for noting this. I will take a look later today. It’s conceivable that a server would restrict its activity based on Origin or partial Referer, though I haven't seen such behavior in the last couple days.

> Suggest using the event_config["eventStatus"] to determine if the event is live or not. The value is "live" for a live broadcast, otherwise set to "replay".

It makes sense that a live stream might behave differently than a replay! I’ll check whether the code works on a live stream, though I won’t be able to add a long-lasting test case for that.

nchilada (Author) commented Aug 3, 2021

> This got me to the point of ffmpeg starting, but it always died with a 403 error. After adding these extra headers, things started working:
>
> --add-header Origin:https://stream.nbcolympics.com
> --add-header Referer:https://stream.nbcolympics.com/
> --add-header Sec-Fetch-Dest:empty
> --add-header Sec-Fetch-Mode:cors
> --add-header Sec-Fetch-Site:cross-site
>
> I didn't test different combinations to see if they were all required. Just added what was missing from the browser. After this, ffmpeg downloads the stream just fine.
>
> Edit: Now it works fine this morning without the extra headers, strange.
>
> @wesnm interesting, thanks for noting this. I will take a look later today. It’s conceivable that a server would restrict its activity based on Origin or partial Referer, though I haven't seen such behavior in the last couple days.

I thought about this some more and I don't think it's worth adding prophylactic Referer and Origin headers, nor the preflight OPTIONS requests that are also part of CORS. When servers require Referer or Origin, I think it's usually a kind of CSRF protection, and I don't know if NBC and their technology partners would bother to implement any kind of CSRF in the context of video streaming. I'm willing to wait and see.

> Suggest using the event_config["eventStatus"] to determine if the event is live or not. The value is "live" for a live broadcast, otherwise set to "replay".
>
> It makes sense that a live stream might behave differently than a replay! I’ll check whether the code works on a live stream, though I won’t be able to add a long-lasting test case for that.

Darn, I'm not able to load any live streams with my credentials, either via a web browser or via youtube-dl. I can't figure out if such pages even allow clients to go back to the very beginning of the stream. Can we skip this for now? I assume replays are the priority.

nchilada (Author) commented Aug 3, 2021

@remitamine @dstftw or others, could someone please approve the workflow when you get a chance? I think this PR is ready to go!

nchilada changed the title from "[extractor/nbc] Fix NBC Olympics extractor" to "[extractor/nbc] Fix NBCOlympicsStreamIE" Aug 3, 2021


asedeno commented Aug 3, 2021

> Darn, I'm not able to load any live streams with my credentials, either via a web browser or via youtube-dl. I can't figure out if such pages even allow clients to go back to the very beginning of the stream. Can we skip this for now? I assume replays are the priority.

My credentials work for live streams, so I can test things if needed, though I agree replays are the priority and shouldn't block this.

wesnm commented Aug 4, 2021

> I thought about this some more and I don't think it's worth adding prophylactic Referer and Origin headers, nor the preflight OPTIONS requests that are also part of CORS. When servers require Referer or Origin, I think it's usually a kind of CSRF protection, and I don't know if NBC and their technology partners would bother to implement any kind of CSRF in the context of video streaming. I'm willing to wait and see.

If it's not needed to work, I wouldn't include it. I don't know what was going on last night, maybe just needed sleep.

> Suggest using the event_config["eventStatus"] to determine if the event is live or not. The value is "live" for a live broadcast, otherwise set to "replay".
>
> It makes sense that a live stream might behave differently than a replay! I’ll check whether the code works on a live stream, though I won’t be able to add a long-lasting test case for that.
>
> Darn, I'm not able to load any live streams with my credentials, either via a web browser or via youtube-dl. I can't figure out if such pages even allow clients to go back to the very beginning of the stream. Can we skip this for now? I assume replays are the priority.

One thing the "live" status does is determine if youtube-dl will delegate to alternate downloaders. If an event is flagged as live, it will only download using ffmpeg with a single worker, which is much much slower.

These live events in the browser do allow rewinding to any point in the stream.

pukkandan pushed a commit to yt-dlp/yt-dlp that referenced this pull request Aug 4, 2021
PR: ytdl-org/youtube-dl#29688
Closes: #617, ytdl-org/youtube-dl#29665

* Livestreams are untested
* If using ffmpeg as downloader, v4.3+ is needed since `-http_seekable` option is necessary
* Instead of making a separate key for each arg that needs to be passed to ffmpeg, I made `_ffmpeg_args`
* This deprecates `_seekable`, but the option is kept for compatibility

Authored by: nchilada, pukkandan
nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
3 participants