Twilio in Broadcast Upload Extension #7

Closed · andyboyd opened this issue Apr 11, 2018 · 22 comments

@andyboyd

Hi,

I'm trying to implement a Broadcast Upload Extension using Twilio and I seem to be having issues with memory usage causing the extension to crash due to memory pressure.

I've created a custom TVIVideoCapturer class based off your example, with the main difference being that since this is an upload extension, I'm being provided with sample buffers, rather than having to grab frames off a view myself.

So, my (simplified) general approach, sketched in code below, is:

  1. Receive a CMSampleBuffer from the extension's processSampleBuffer() method
  2. Convert the CMSampleBuffer to a CVImageBuffer
  3. Create a TVIVideoFrame from the CVImageBuffer
  4. Call consumeCapturedFrame() on my capture consumer
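
In code, that looks roughly like this (simplified, and with the TwilioVideo 2.x signatures written from memory, so treat them as approximate):

```swift
import CoreMedia
import ReplayKit
import TwilioVideo

// Rough sketch only - the TwilioVideo 2.x protocol signatures here are
// approximate and may not match a given release exactly.
class ScreenVideoCapturer: NSObject, TVIVideoCapturer {
    var isScreencast: Bool { return true }
    var supportedFormats: [TVIVideoFormat] {
        let format = TVIVideoFormat()
        format.dimensions = CMVideoDimensions(width: 1280, height: 720)
        format.frameRate = 30
        return [format]
    }
    private weak var consumer: TVIVideoCaptureConsumer?

    func startCapture(_ format: TVIVideoFormat, consumer: TVIVideoCaptureConsumer) {
        self.consumer = consumer
        consumer.captureDidStart(true)
    }

    func stopCapture() {
        consumer = nil
    }

    // 1. Called with each CMSampleBuffer from processSampleBuffer(_:with:)
    func capture(_ sampleBuffer: CMSampleBuffer) {
        // 2. CMSampleBuffer -> CVImageBuffer
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
        // 3. CVImageBuffer -> TVIVideoFrame
        if let frame = TVIVideoFrame(timestamp: timestamp,
                                     buffer: imageBuffer,
                                     orientation: TVIVideoOrientation.up) {
            // 4. Hand the frame to the capture consumer
            consumer?.consumeCapturedFrame(frame)
        }
    }
}
```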

The conversion from CMSampleBuffer to CVImageBuffer seems to be working as expected, but after consuming a few frames my extension receives memory warnings and crashes. I've tried tweaking the resolution and frame rate of the supported format in my capturer, but it doesn't seem to make a difference.

Do you have any ideas or advice?

Thanks

@ceaglest (Contributor)

Hi @andyboyd,

Thanks for writing in.

I've used ReplayKit 2 to capture video from within an application, but not in a Broadcast Upload Extension. I tried earlier versions of ReplayKit (on iOS 10.x) in an extension and didn't see the almost immediate memory warning and crash that you experienced. However, without published sample code or automated tests of this use case, I can't claim with certainty that we support it.

> Do you have any ideas or advice?

This may be a silly question, but is there any chance you are leaking either the entire CMSampleBuffer or CVImageBuffer? TVIVideoFrame will CFRetain the input CVImageBuffer, and CFRelease it in its destructor. If you're able to share some example code, I'd be happy to take a look.

One more question, what iOS device, OS version, and SDK version are you using?

We are interested in supporting this use case. At the moment our team is wrapping up the 2.0.0 release, but once it's complete we will investigate this further. Ideally, we would have published ReplayKit sample code which demonstrates a working integration with our Video SDK, both in an extension and in an application.

I'll keep this ticket updated as we investigate further.

Best,
Chris

@ceaglest self-assigned this Apr 11, 2018
@andyboyd (Author)

I've done a bit more investigation on this now. It looks very much like it's the memory usage of the capture consumer (or something beneath it) that's causing the extension to crash. The scenario that crashed immediately for me was passing the raw CVImageBuffer contained in the CMSampleBuffer straight to the capture consumer; since that buffer is quite high resolution, I tried ignoring it and passing a preallocated static image on every frame instead. That allowed me to play around with the resolution more easily.

I found that I was able to get a stable feed up and running by doing the following:

  1. Setting the input image resolution to 300x200
  2. Telling the room to prefer H.264 over VP8 or VP9

Both of which make sense; however, a 300x200 image is not exactly going to deliver a great experience for my users.

At the moment I'm working on using VideoToolbox to scale my CVImageBuffers down to a lower resolution, to see if I can get it stable that way before optimising things to get the quality back up.
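
For reference, the downscale itself looks something like this (sketched here with Core Image rather than VideoToolbox for brevity; the class and names are illustrative):

```swift
import CoreImage
import CoreVideo

// Illustrative downscaler (a Core Image stand-in for the VideoToolbox approach).
// Renders each incoming buffer into a half-width, half-height pooled buffer.
final class FrameDownscaler {
    private let context = CIContext(options: [.useSoftwareRenderer: false])
    private var pool: CVPixelBufferPool?

    func downscale(_ source: CVPixelBuffer, by scale: CGFloat = 0.5) -> CVPixelBuffer? {
        let width = Int(CGFloat(CVPixelBufferGetWidth(source)) * scale)
        let height = Int(CGFloat(CVPixelBufferGetHeight(source)) * scale)
        if pool == nil {
            let attributes: [String: Any] = [
                kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA,
                kCVPixelBufferWidthKey as String: width,
                kCVPixelBufferHeightKey as String: height,
                kCVPixelBufferIOSurfacePropertiesKey as String: [:]
            ]
            CVPixelBufferPoolCreate(nil, nil, attributes as CFDictionary, &pool)
        }
        guard let pool = pool else { return nil }

        var output: CVPixelBuffer?
        CVPixelBufferPoolCreatePixelBuffer(nil, pool, &output)
        guard let destination = output else { return nil }

        // Scale the source image and render it into the pooled buffer.
        let scaled = CIImage(cvPixelBuffer: source)
            .transformed(by: CGAffineTransform(scaleX: scale, y: scale))
        context.render(scaled, to: destination)
        return destination
    }
}
```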

I'm not sure if it's helpful to you, but another thing I tried was dropping the frame rate I was feeding to the capture consumer by only sending a frame every few seconds, and I found that resolution was far more important than frame rate. Even when I sent only one frame every 10 seconds at a resolution of 640x480, it would crash on the second frame, repeatably. Given that it could cope with 30 fps at 300x200, it seems unlikely that 2 frames of 640x480 use more memory than it can handle. Is it possible that on receiving the second frame the capture consumer preallocates enough memory for a larger number of frames, to optimise the speed of its memory allocations or something?

Oh, and I'm using an iPhone X running iOS 11.3, with the iOS 11.3 SDK and Twilio 2.0.0-preview9. I can't give you my actual app code, but if it's helpful to you, I can put a small sample together to demonstrate the issue.

@ceaglest (Contributor) commented Apr 12, 2018

Hi @andyboyd,

Thank you for the information, this is really helpful.

> At the moment I'm working on using VideoToolbox to scale my CVImageBuffers down to a lower resolution, to see if I can get it stable that way before optimising things to get the quality back up.

> I'm not sure if it's helpful to you, but another thing I tried was dropping the frame rate I was feeding to the capture consumer by only sending a frame every few seconds, and I found that resolution was far more important than frame rate.

There is an opportunity to save memory in our video pipeline. If you choose H.264 then we will use VideoToolbox internally to encode the video, in some cases scaling it down in our VTCompressionSession as available bandwidth demands. The problem is that the video pipeline needs to support software codecs like VP8 as well, and we maintain an I420 buffer pool for this purpose. When a frame is captured we don't know if hardware or software codecs will be needed, so we pull a buffer from the pool and package it along with the captured frame.

If you're never using VP8 (and you don't have renderers which require an I420 conversion), then this buffer is adding a memory overhead of:

Size = Width * Height * 1.5

At least when no downscaling is requested by the encoder(s). On an iPhone X, where the screen is 2436x1125 this adds up to:

Size = 4,110,750 bytes/frame

Typically there are only 1-2 of these buffers in flight at a time, but this cost is pretty important if you are in an extension. I should also point out that our H.264 codec is currently limited to no more than 1280x720 anyway, so passing such large frames will actually cause encoding to fail.

> Oh, and I'm using an iPhone X running iOS 11.3, with the iOS 11.3 SDK and Twilio 2.0.0-preview9. I can't give you my actual app code, but if it's helpful to you, I can put a small sample together to demonstrate the issue.

I would recommend moving to the latest 2.0.0-beta4 if possible, but I don't expect that you will see any memory usage reductions by making the change. As I mentioned earlier, we won't be able to look into this issue further until 2.0.0 is out, but I may ask for more information (like sample code) at that time.

Best,
Chris

@andyboyd (Author)

I've managed to get something up and running. I ended up resizing the CVImageBuffers from the CMSampleBuffers down to 1/4 (i.e. half width and half height) of their original size, and it seems to be working pretty well. I do think it's just a case of tuning the input resolution of the frames being sent to Twilio to control the memory usage.

I guess my issue is pretty much resolved now, so thanks for your help.

If you're using VideoToolbox internally, I do wonder if it would be possible to simply provide CMSampleBuffers to the capture consumer, along with a target resolution, and let it subsample them in the compression session, rather than having to convert them to a smaller input size only for the compression session to compress them again. It seems like that would be more efficient, though I suspect it depends on the internals of VTCompressionSession and how it has to allocate memory while it's working. It seems like that interface would play nicer with broadcast extensions than what's currently available.

@ceaglest (Contributor) commented Apr 13, 2018

Hi,

> If you're using VideoToolbox internally, I do wonder if it would be possible to simply provide CMSampleBuffers to the capture consumer, along with a target resolution, and let it subsample them in the compression session, rather than having to convert them to a smaller input size only for the compression session to compress them again. It seems like that would be more efficient, though I suspect it depends on the internals of VTCompressionSession and how it has to allocate memory while it's working. It seems like that interface would play nicer with broadcast extensions than what's currently available.

Yes, we are considering letting the developer provide cropping and/or scaling information to TVIVideoCaptureConsumer. This would allow us to allocate smaller buffers when using the software (VP8, VP9) pipeline, and skip the extra step of your capturer having to perform scaling up front.

When using VTCompressionSession our goal is to feed frames directly into it if possible. However, there are some cases (like where pixel format conversions or rotations are required) where we have the session allocate an input buffer pool and copy your frames into it.

I'm very glad that you've got something up and running. I'll keep this ticket open until we have time to revisit this use case post-2.0.

Best,
Chris

@andyboyd (Author)

Hi again Chris,

A bit of a follow up question for you.

I'm trying to add audio to my stream. I've been able to publish a local audio track successfully, but it's not receiving any content. I don't think the extension has a default audio session of the kind the audio track is expecting, because the extension gives me the raw audio samples in the processSampleBuffer callback instead.

I get the feeling the way to do this in the extension is to implement a TVIAudioSink, and then give that the CMSampleBuffers from my processSampleBuffer callbacks, but I'm not quite clear on what I should be doing inside the renderSample function of the audio sink.

Am I on the right track with that at all, or am I misunderstanding the way it's supposed to work?

@andyboyd (Author) commented May 1, 2018

I've made some progress, but still in need of a bit of help.

So far, I've created my own TVIAudioDevice; I'm calling TVIAudioDeviceFormatChanged() in startCapturing() and keeping a reference to the context. Then, when I receive audio samples from ReplayKit, I'm calling TVIAudioDeviceWriteCaptureData(). I am successfully publishing an audio track, and audio is coming through on it, but it's all garbled and corrupted. It just sounds like horrible crackling, though I can hear that it's definitely responding to the noises I make into the microphone, because the crackling changes; it sounds like R2D2 is in the background.
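
For context, the device side looks roughly like this (the method list follows the 2.x TVIAudioDevice protocol, but the Swift signatures are approximated):

```swift
import CoreMedia
import TwilioVideo

// Approximate sketch of my custom device - treat the exact protocol
// signatures as assumptions rather than a verified implementation.
class ReplayKitAudioDevice: NSObject, TVIAudioDevice {
    private var capturingContext: TVIAudioDeviceContext?
    private let format = TVIAudioFormat(channels: 1,
                                        sampleRate: 44100,
                                        framesPerBuffer: 1024)

    // Rendering is unused in the extension.
    func renderFormat() -> TVIAudioFormat? { return nil }
    func initializeRenderer() -> Bool { return false }
    func startRendering(context: TVIAudioDeviceContext) -> Bool { return false }
    func stopRendering() -> Bool { return true }

    // Capturing: keep the context so the ReplayKit callbacks can write to it.
    func captureFormat() -> TVIAudioFormat? { return format }
    func initializeCapturer() -> Bool { return true }
    func startCapturing(context: TVIAudioDeviceContext) -> Bool {
        capturingContext = context
        return true
    }
    func stopCapturing() -> Bool {
        capturingContext = nil
        return true
    }
}
```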

My theory is that in converting from CMSampleBuffers to the UnsafeMutablePointer that TVIAudioDeviceWriteCaptureData() requires, something is getting warped. Possibly the timings are off, but I can't really find any documentation about what I need to do to resolve this. It's just trial and error.

@ceaglest (Contributor) commented May 1, 2018

Hey @andyboyd,

Sorry for the late response, I was out of office yesterday.

You are on the right track by creating your own TVIAudioDevice. Unfortunately, this is a case where we don't have sample code specific to ReplayKit yet. Have you had a look at AudioDeviceExample?

> My theory is that in converting from CMSampleBuffers to the UnsafeMutablePointer that TVIAudioDeviceWriteCaptureData() requires, something is getting warped. Possibly the timings are off, but I can't really find any documentation about what I need to do to resolve this. It's just trial and error.

It's a matter of making sure the TVIAudioFormat that you are using matches that of the incoming CMSampleBuffer's AudioStreamBasicDescription in terms of the number of channels and sample rate. You could do a sanity check by comparing the two ASBDs, but in general this is where you want to derive the TVIAudioFormat from.

Access AudioStreamBasicDescription from CMSampleBuffer: https://github.com/twilio/video-quickstart-swift/blob/master/AudioSinkExample/AudioSinks/ExampleSpeechRecognizer.m#L86

Create AudioStreamBasicDescription from TVIAudioFormat: https://twilio.github.io/twilio-video-ios/docs/latest/Classes/TVIAudioFormat.html#//api/name/streamDescription
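
Untested, but the derivation would look something like this (the TVIAudioFormat initializer follows the docs linked above; framesPerBuffer is a placeholder):

```swift
import CoreMedia
import TwilioVideo

// Untested sketch: derive the TVIAudioFormat from the incoming sample buffer's
// AudioStreamBasicDescription so channels and sample rate always match.
func makeAudioFormat(from sampleBuffer: CMSampleBuffer) -> TVIAudioFormat? {
    guard let description = CMSampleBufferGetFormatDescription(sampleBuffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(description)?.pointee else {
        return nil
    }
    return TVIAudioFormat(channels: Int(asbd.mChannelsPerFrame),
                          sampleRate: UInt32(asbd.mSampleRate),
                          framesPerBuffer: 1024) // framesPerBuffer is a guess
}
```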

Best,
Chris

@andyboyd (Author) commented May 2, 2018

Thanks Chris,

Coincidentally, I did get the mic audio streaming successfully about half an hour before you replied! Isn't that always the way!

The key in the end was related to the buffer size being sent to TVIAudioDeviceWriteCaptureData(). I had been sending through the buffer size from the audio device's capture format every time, but I needed to send through the smaller of that value and the mDataByteSize of the CMBlockBuffer contained within the sample buffer.
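
In code, the fix looks roughly like this (the exact Swift signature of TVIAudioDeviceWriteCaptureData() is approximated here):

```swift
import CoreMedia
import TwilioVideo

// Roughly what I'm doing now: clamp the write size to the block buffer's
// actual length before handing the bytes to the SDK.
func write(_ sampleBuffer: CMSampleBuffer,
           to context: TVIAudioDeviceContext,
           captureFormatBufferSize: Int) {
    guard let blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return }

    var totalLength = 0
    var dataPointer: UnsafeMutablePointer<Int8>?
    CMBlockBufferGetDataPointer(blockBuffer,
                                atOffset: 0,
                                lengthAtOffsetOut: nil,
                                totalLengthOut: &totalLength,
                                dataPointerOut: &dataPointer)
    guard let data = dataPointer else { return }

    // Send the smaller of the device format's buffer size and mDataByteSize.
    let sizeInBytes = min(captureFormatBufferSize, totalLength)
    TVIAudioDeviceWriteCaptureData(context, data, sizeInBytes)
}
```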

@ceaglest (Contributor) commented May 2, 2018

Awesome, it's great to hear that you've got an AudioDevice up and running with ReplayKit.

@andyboyd (Author)

Hi again @ceaglest

I have another question on this issue, not sure if it's something I'm misunderstanding, or if it's a limitation of the way the Twilio SDK currently works.

The mechanism for getting the captureContext/renderContext on the TVIAudioDevice is to call TVIAudioDeviceCaptureFormatChanged()/TVIAudioDeviceRenderFormatChanged(), which triggers a query to the audio device's captureFormat or renderFormat property.

This works just fine, but in a broadcast extension the extension is continually getting samples from both the microphone and the running app, and these are usually in different formats. Calling TVIAudioDeviceCaptureFormatChanged() every time a different type of sample comes in is quite problematic, since doing so usually results in dropped samples while the audio device reinitialises with the new format. And because the two kinds of samples arrive interleaved in the broadcast extension, this usually results in one or both of the sources being skipped entirely.

I imagine this is probably a limitation of the TwilioVideo SDK having a single pipeline each for rendering and capturing, but is there anything you can think of that would help with this situation? If not, I guess it's another feature request!

Thanks.

P.S. I'm using SDK version 2.2.1 on iOS 11.

@ceaglest (Contributor) commented Aug 3, 2018

Hi @andyboyd,

Sorry for the delayed response.

> I imagine this is probably a limitation of the TwilioVideo SDK having a single pipeline each for rendering and capturing, but is there anything you can think of that would help with this situation? If not, I guess it's another feature request!

Unfortunately a TVIAudioDevice can only work with a single capture and recording format at a time. Your capturer should deliver a continuous stream of raw audio samples, with format changes only when needed. A format change causes other elements of the audio pipeline to be reconfigured, so it shouldn't be done for every slice of audio.

In this case, I think what you want is to pick a canonical recording format (either the mic or app audio format) and convert the other input to match that. Then mix the result together in either mono or stereo, and deliver it to us from there.

You can use an AudioConverter to perform a channel and/or sample rate conversion, like this example.
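
As a rough sketch (using AVAudioConverter as a higher-level stand-in for the Audio Toolbox AudioConverter, with example formats), converting app audio to a canonical mic format might look like:

```swift
import AVFoundation

// Hedged sketch: treat the mic format as canonical and convert app audio to
// match it. The two formats below are examples only.
let micFormat = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 44_100,
                              channels: 1, interleaved: true)!
let appFormat = AVAudioFormat(commonFormat: .pcmFormatInt16, sampleRate: 48_000,
                              channels: 2, interleaved: true)!
let converter = AVAudioConverter(from: appFormat, to: micFormat)!

func convertAppAudio(_ input: AVAudioPCMBuffer) -> AVAudioPCMBuffer? {
    // Size the output for the sample-rate ratio between the two formats.
    let ratio = micFormat.sampleRate / appFormat.sampleRate
    let capacity = AVAudioFrameCount(Double(input.frameLength) * ratio) + 1
    guard let output = AVAudioPCMBuffer(pcmFormat: micFormat,
                                        frameCapacity: capacity) else { return nil }

    var delivered = false
    var conversionError: NSError?
    converter.convert(to: output, error: &conversionError) { _, inputStatus in
        // Feed the single input buffer once, then report no more data.
        if delivered {
            inputStatus.pointee = .noDataNow
            return nil
        }
        delivered = true
        inputStatus.pointee = .haveData
        return input
    }
    return conversionError == nil ? output : nil
}
```

From there, mixing is essentially summing (and clamping) the two converted sample streams before delivering them to the audio device.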

I think this is a great question for @piyushtank because he is working on a ReplayKit example:

twilio/video-quickstart-ios#287

Let's keep the discussion going,
Chris

@julien-l

Hi,
Do you have updates on code samples for using ReplayKit with the TwilioVideo SDK?
Thanks

@piyushtank (Contributor) commented Sep 13, 2018

@julien-l We are discussing prioritizing our ReplayKit sample code ticket and getting it into the next sprint. We have a work-in-progress PR available which demonstrates how to use ReplayKit with TwilioVideo: twilio/video-quickstart-ios#287
We were sidetracked by other high-priority issues with our Voice and Video SDKs. Internally we are discussing working through the TODO items listed on the PR and getting it merged in the coming sprint. I will keep you posted.

@etown commented Nov 20, 2018

@andyboyd
> The key in the end was related to the buffer size being sent to TVIAudioDeviceWriteCaptureData(). I had been sending through the buffer size from the audio device's capture format every time, but I needed to send through the smaller of that value and the mDataByteSize of the CMBlockBuffer contained within the sample buffer.

Could you possibly share some example code? Thank you!

@ceaglest (Contributor) commented Dec 4, 2018

Hello folks,

We have released 2.6.0-preview1, with new Video Source APIs that significantly improve the performance of streaming ReplayKit content. You can try our updated example now.

> Could you possibly share some example code? Thank you!

If the original poster would be willing to share, that is great! We do plan to demonstrate mixing in the example app, but haven't had a chance to update it yet. I'll circle back with the team on where we could fit this in.

Best,
Chris

@etown commented Dec 17, 2018

@ceaglest Great news on the 2.6.0 preview!

Regarding the audio: in our use case we want just the app audio and do not need to mix in the microphone audio. We attempted capturing app audio without success using the ExampleReplayKitAudioCapturer. Looking through the issues and comments, I'm not sure that anybody has successfully captured the app audio. Have you attempted to capture app audio? Thank you!

@ceaglest (Contributor)

Hi @etown,

> We attempted capturing app audio without success using the ExampleReplayKitAudioCapturer. Looking through the issues and comments, I'm not sure that anybody has successfully captured the app audio. Have you attempted to capture app audio? Thank you!

I have not, but judging by the ticket that was filed this morning (twilio/video-quickstart-ios#339) it should be possible to capture just app audio, without mixing, by making some small changes.
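
Conceptually, the change is just to forward .audioApp samples and drop .audioMic ones in the broadcast handler. A sketch, with placeholder closures standing in for the example's video capturer and audio device:

```swift
import CoreMedia
import ReplayKit

// Sketch: publish app audio only, ignoring the microphone. The two handler
// closures are placeholders for the example project's capturer and device.
class SampleHandler: RPBroadcastSampleHandler {
    let handleVideo: (CMSampleBuffer) -> Void = { _ in /* feed the video capturer */ }
    let handleAppAudio: (CMSampleBuffer) -> Void = { _ in /* feed the audio device */ }

    override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer,
                                      with sampleBufferType: RPSampleBufferType) {
        switch sampleBufferType {
        case .video:
            handleVideo(sampleBuffer)
        case .audioApp:
            handleAppAudio(sampleBuffer)   // publish app audio
        case .audioMic:
            break                          // drop mic samples entirely
        @unknown default:
            break
        }
    }
}
```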

Best,
Chris

@etown commented Dec 17, 2018 via email

@ceaglest (Contributor)

Hi @etown,

> The supplied sizeInBytes is invalid. The sizeInBytes must match with the size returned by TVIAudioFormat:bufferSize utility method.

I looked into this briefly and posted an update in:
twilio/video-quickstart-ios#339

Best,
Chris

@Moriquendi

@ceaglest I wonder if there's any way to initiate the Twilio call from the main app and then start sharing the screen during the call.
In the code example, when broadcasting the screen, the user is connected to the Twilio room from within the upload extension. What if someone connected to the room from the main app and then wanted to initiate a screen broadcast?

I was playing around with passing PixelBuffers to the main app via shared memory and then pushing those to Twilio, but I couldn't make it work :( (something breaks when copying the buffers)

@ceaglest (Contributor) commented Apr 25, 2019

Hi Developers,

I believe all the original questions in this issue have been answered, so I'm closing it out. To summarize, our example has a broadcast extension which:

  1. Publishes video frames from ReplayKit
  2. Allows for efficient downscaling of video frames (using H.264)
  3. Is not required to subscribe to Tracks (in a Group Room)
  4. Publishes either application audio or microphone audio from ReplayKit

We will continue to improve on our example code in future iterations.

Best,
Chris
