Microsoft videos (eg on mybuild.microsoft.com) failing #25363

Open
snarfed opened this issue May 22, 2020 · 2 comments

@snarfed snarfed commented May 22, 2020

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2020.05.08
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

youtube-dl -v https://mybuild.microsoft.com/sessions/be9d4903-dc15-4704-8436-01a4b3273443
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://mybuild.microsoft.com/sessions/be9d4903-dc15-4704-8436-01a4b3273443']
[debug] Encodings: locale UTF-8, fs utf-8, out UTF-8, pref UTF-8
[debug] youtube-dl version 2020.05.08
[debug] Python version 3.6.5 (CPython) - Darwin-19.4.0-x86_64-i386-64bit
[debug] exe versions: ffmpeg 4.2.2, ffprobe 4.2.2, rtmpdump 2.4
[debug] Proxy map: {}
[generic] be9d4903-dc15-4704-8436-01a4b3273443: Requesting header
WARNING: Could not send HEAD request to https://mybuild.microsoft.com/sessions/be9d4903-dc15-4704-8436-01a4b3273443: HTTP Error 404: Not Found
[generic] be9d4903-dc15-4704-8436-01a4b3273443: Downloading webpage
ERROR: Unable to download webpage: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type  youtube-dl -U  to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "/usr/local/bin/youtube-dl/youtube_dl/extractor/common.py", line 627, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "/usr/local/bin/youtube-dl/youtube_dl/YoutubeDL.py", line 2238, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Description

Hi all! Just FYI, videos on https://mybuild.microsoft.com/ are unhappy right now. Example URL: https://mybuild.microsoft.com/sessions/be9d4903-dc15-4704-8436-01a4b3273443

Odd problem. These URLs evidently return HTTP 404, not 200, but they serve valid HTML and assets, so browsers render them ok, and the contained videos play fine.

example screenshot of browser dev tools showing the 404 response and page loaded and rendered ok:

image

but curl 404s on both HEAD and GET. such an odd setup by MS.

$ curl -I https://mybuild.microsoft.com/sessions/23912de2-1531-4684-b85a-d57ac30af09e
HTTP/2 404
content-length: 63225
content-type: text/html
...

$ curl -v -o /dev/null https://mybuild.microsoft.com/sessions/23912de2-1531-4684-b85a-d57ac30af09e
* Connected to mybuild.microsoft.com (40.112.243.5) port 443 (#0)
...
> GET /sessions/23912de2-1531-4684-b85a-d57ac30af09e HTTP/2
> Host: mybuild.microsoft.com
...
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 404 
< content-length: 63225
< content-type: text/html
...
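
For anyone reproducing this from Python: urllib raises HTTPError on the 404, but that error object still carries the response body, so the HTML can be read anyway. A minimal sketch, assuming a local stand-in server rather than the real site; the handler, the page content, and the `fetch_ignoring_404` helper are all made up for illustration and are not youtube-dl code:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder page body (assumption, not the real session page).
PAGE = b"<html><body>session page</body></html>"


class NotFoundWithBody(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the site: status 404, but a real HTML body."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, *args):
        # Keep the demo quiet.
        pass


def fetch_ignoring_404(url, timeout=10):
    """Return (status, body), reading the body even when the status is 404."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise
        # HTTPError doubles as a response object, so the body is still readable.
        return err.code, err.read()


def demo():
    """Start the stand-in server on a free port and fetch one page from it."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundWithBody)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/sessions/example" % server.server_port
        return fetch_ignoring_404(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = demo()
    print(status, len(body))
```

This only shows that the body is recoverable; it says nothing about how an extractor should treat the 404.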

@hheimbuerger hheimbuerger commented May 28, 2020

I considered building this, but ultimately gave up and downloaded the videos I was interested in semi-manually. Here are some notes, in case anyone wants to pick this up:

First of all, I discovered that 404 as well when I started implementing this... super weird! 😆

Some videos are available on https://channel9.msdn.com/Events/Build/2020 and there's an official download link there. Mostly the ones with BDL session IDs.
However, many are missing, too. And they don't seem to add more (at least to this Event playlist), so I'm not sure if the other ones will eventually get there.

On Channel9, this seems to be a new "subsystem", as the existing Channel9 extractor of youtube-dl doesn't work on those videos. I looked into that a little bit, and preliminarily called this new system 'Medius', as that seems to be a term that pops up again and again.
Correspondingly, there's a page at https://medius.studios.ms/, which seems to be some internal management portal. It's publicly accessible, though. There is a "Microsoft Build 2020" channel, but it's also very incomplete.

Both there and on mybuild.microsoft.com, the videos are showing as iframe embeds, e.g. in your case: https://medius.studios.ms/Embed/video-nc/B20-KEY01A
Unfortunately, the video URLs seem to be built and loaded in JS, so I couldn't find a super easy way to architect an extractor. That's pretty much why I gave it up; it seemed to be too much work for a one-time event.

I downloaded those videos by opening the dev tools, then loading the session page (i.e. the 404ing page you linked above), and on the Network tab, setting a filter for manifest. When pressing the play button, that should leave you with exactly one XHR. In your case: https://amsmediusw-ak.studios.ms/71e639f5-56b9-44f2-afeb-d702ee702ec6/Build2020Satya.ism/manifest(format=mpd-time-csf)
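
If you'd rather not eyeball the Network tab, the same filter can be run over an exported HAR file. A rough sketch, assuming the standard HAR layout (`log.entries[].request.url`); the `manifest_urls` and `load_har` helper names are hypothetical:

```python
import json


def load_har(path):
    """Load a HAR file (exported from the browser dev tools) as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def manifest_urls(har):
    """Return the unique request URLs in a HAR dict that mention 'manifest'."""
    entries = har.get("log", {}).get("entries", [])
    urls = []
    for entry in entries:
        url = entry.get("request", {}).get("url", "")
        # Same idea as typing "manifest" into the Network tab filter box.
        if "manifest" in url.lower() and url not in urls:
            urls.append(url)
    return urls
```

Usage would be something like `manifest_urls(load_har("session.har"))`, where `session.har` is whatever file name you exported to.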

Dropping that URL into youtube-dl gives you a perfectly working video file. Just without metadata of course, as it is using the generic extractor.

One last problem is that the download is super slow! It shows a high download bandwidth, but also calculates hours of total downloading time. I came to the conclusion that this is probably some kind of rate limiting I'm hitting there (even though Chrome doesn't appear to be affected by it). Further analysis showed that I'm often getting around 42.1-42.2s (yes, that precisely!) of response time when requesting e.g. https://amsmediusw-ak.studios.ms/. That leads to many hours of download time with youtube-dl, as those 42s are affecting every single one of the hundreds of DASH fragments it needs to request.
I reported this issue to urllib3 (in retrospect, probably not the right target group, but that's how far down I could trace it at the time): urllib3/urllib3#1879

The workaround is weird, but reliable: set a timeout! Any timeout, it doesn't matter, the timeout isn't actually hit. Just setting a timeout makes the response time go down to way under 1s.
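
To compare response times with and without a timeout, something like the sketch below could be used. It is only a measuring helper built on plain urllib (`timed_get` is a made-up name), and I have not verified that it reproduces the slowdown outside youtube-dl:

```python
import time
import urllib.error
import urllib.request


def timed_get(url, timeout=None):
    """Fetch url once and return (status, seconds elapsed).

    With timeout=None no explicit timeout is passed, so the library default
    applies; otherwise the given number of seconds is used.
    """
    kwargs = {} if timeout is None else {"timeout": timeout}
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, **kwargs) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        # Error responses (e.g. 404) still count as a completed round trip.
        err.read()
        status = err.code
    return status, time.monotonic() - start
```

Calling it once with `timeout=None` and once with `timeout=10` against the same URL gives two numbers to compare.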

In summary, you can download your video with the following command:

youtube-dl --socket-timeout=10 --output="Microsoft Build 2020 - Empowering every developer, with Satya Nadella" "https://amsmediusw-ak.studios.ms/71e639f5-56b9-44f2-afeb-d702ee702ec6/Build2020Satya.ism/manifest(format=mpd-time-csf)"
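
For more than a handful of videos, the same command can be scripted. A small sketch, assuming `youtube-dl` is on your PATH; `build_command` and `download` are hypothetical helper names, and only the `--socket-timeout` and `--output` flags from the command above are used:

```python
import subprocess


def build_command(manifest_url, title, socket_timeout=10):
    """Build the youtube-dl argument list for one manifest URL."""
    return [
        "youtube-dl",
        "--socket-timeout", str(socket_timeout),
        "--output", title,
        manifest_url,
    ]


def download(manifest_url, title, socket_timeout=10):
    """Run youtube-dl for one video; raises CalledProcessError on failure."""
    subprocess.run(build_command(manifest_url, title, socket_timeout), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the parentheses in the manifest URL.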

@snarfed snarfed commented May 28, 2020

wow! that's some intense sleuthing. and yup, that process worked for the other video i wanted to download. thank you!
