New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add user-specified callback to video+image encoders (H.264 and PTS) #97

Closed
d2rk opened this Issue May 15, 2014 · 15 comments

Comments

Projects
None yet
2 participants
@d2rk

d2rk commented May 15, 2014

I would like to sync video and audio frames. Is it any way to fix/edit video frame [timestamp (PTS)] https://github.com/waveform80/picamera/blob/master/picamera/encoders.py#L78)? Because I don't think here is any timing information in an h.264 stream.

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 15, 2014

There seems to be timing when I query it...

>>> import picamera
>>> camera = picamera.PiCamera()
>>> camera.resolution = (1280, 720)
>>> camera.framerate = 30
>>> camera.start_recording('/dev/null', format='h264')
>>> camera.frame
PiVideoFrame(index=57, keyframe=False, frame_size=17913L, video_size=1010411L, split_size=1010411L, timestamp=1838431L, header=False)
>>> camera.frame
PiVideoFrame(index=139, keyframe=False, frame_size=21132L, video_size=2663364L, split_size=2663364L, timestamp=4464760L, header=False)
>>> camera.frame
PiVideoFrame(index=183, keyframe=False, frame_size=27L, video_size=3468427L, split_size=3468427L, timestamp=None, header=True)
>>> camera.frame
PiVideoFrame(index=229, keyframe=False, frame_size=19927L, video_size=4394489L, split_size=4394489L, timestamp=7386552L, header=False)
>>> camera.frame
PiVideoFrame(index=265, keyframe=False, frame_size=18767L, video_size=5125989L, split_size=5125989L, timestamp=8535571L, header=False)
>>> camera.frame
PiVideoFrame(index=304, keyframe=False, frame_size=18172L, video_size=5846855L, split_size=5846855L, timestamp=9815908L, header=False)
>>> camera.frame
PiVideoFrame(index=335, keyframe=False, frame_size=18183L, video_size=6496454L, split_size=6496454L, timestamp=10800780L, header=False)
>>> camera.frame
PiVideoFrame(index=364, keyframe=False, frame_size=16082L, video_size=7029499L, split_size=7029499L, timestamp=11752823L, header=False)
>>> camera.frame
PiVideoFrame(index=393, keyframe=False, frame_size=17101L, video_size=7626398L, split_size=7626398L, timestamp=12672039L, header=False)
>>> camera.frame
PiVideoFrame(index=431, keyframe=False, frame_size=19795L, video_size=8385680L, split_size=8385680L, timestamp=13886716L, header=False)
>>> camera.frame
PiVideoFrame(index=467, keyframe=False, frame_size=17775L, video_size=9053536L, split_size=9053536L, timestamp=15068565L, header=False)
>>> camera.frame
PiVideoFrame(index=505, keyframe=False, frame_size=18411L, video_size=9824806L, split_size=9824806L, timestamp=16283241L, header=False)
>>> camera.frame
PiVideoFrame(index=526, keyframe=False, frame_size=18467L, video_size=10206506L, split_size=10206506L, timestamp=16972652L, header=False)
>>> camera.frame
PiVideoFrame(index=554, keyframe=False, frame_size=16888L, video_size=10781451L, split_size=10781451L, timestamp=17859040L, header=False)
>>> camera.frame
PiVideoFrame(index=587, keyframe=False, frame_size=16899L, video_size=11395820L, split_size=11395820L, timestamp=18942399L, header=False)
>>> camera.frame
PiVideoFrame(index=616, keyframe=False, frame_size=19440L, video_size=12011487L, split_size=12011487L, timestamp=19861615L, header=False)
>>> camera.frame
PiVideoFrame(index=643, keyframe=False, frame_size=19372L, video_size=12539546L, split_size=12539546L, timestamp=20748001L, header=False)
>>> camera.frame
PiVideoFrame(index=670, keyframe=False, frame_size=17088L, video_size=13052561L, split_size=13052561L, timestamp=21634391L, header=False)
>>> camera.stop_recording()

There's one frame up there where timestamp is None but it's a header frame so that's understandable. Are you seeing no timing information at all?

@d2rk

This comment has been minimized.

d2rk commented May 15, 2014

No, I can see timing information, but it is not correct when I have two streams. I need to sync PTS for video stream and audio stream at some point. E.g. when audio goes faster then video has to "speed up".

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 15, 2014

How are you attempting to sync the two streams? I think you're correct that there's no timing information embedded in the H.264 stream itself; the meta-data provided by the frame property is separate and if you're sync'ing two streams then presumably you need to write some frame indexes and their corresponding timestamps to some external source to use as a reference.

Unfortunately I'm not particularly familiar with AV sync (beyond playing with it in cinelerra and other systems) so I'm not sure software that performs this function usually requires for such purposes.

@d2rk

This comment has been minimized.

d2rk commented May 15, 2014

I would like to do something like this by manually manage two streams and correct their PTS.

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 16, 2014

I've trawled through that link briefly, and as far as I can tell they're using audio as the base PTS. I can't see anywhere that actually reads the PTS from the video encoder; the video PTS is simply counted as frames come in and then tweaked to bring it in sync with the audio PTS (this is the only place video_current_pts is set and as you can see in that function, it's either incremented (VIDEO_PTS_STEP) with adjustments (+/- 150), or simply reset to audio_current_pts).

In other words, this approach (if implemented in Python) wouldn't use the PiCamera.frame property at all; the video timestamp provided by the encoder is unused. That said, I don't think this approach could (currently) be implemented with picamera as it requires some code to run for every frame and picamera doesn't (yet) provide a means to hook into the encoder's callback.

Incidentally, I also had a look at the Pi's H.264 output with the handy h264bitstream utility and you are correct that no timing information is present in the output stream (the timing_info_present_flag in the VUI portion of the SPS is 0).

@d2rk

This comment has been minimized.

d2rk commented May 25, 2014

OK, then callback for encoder would be nice for the first time.

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 25, 2014

Yup, I'll change this ticket to an enhancement one to remind me to add a user-specified callback parameter to the various recording and capture functions. Shouldn't be difficult so I'm confident it'll go in 1.5

@waveform80 waveform80 changed the title from h.264 and PTS to Add user-specified callback to video+image encoders (H.264 and PTS) May 25, 2014

@waveform80 waveform80 added this to the 1.5 milestone May 25, 2014

@waveform80 waveform80 self-assigned this May 25, 2014

@d2rk

This comment has been minimized.

d2rk commented May 25, 2014

OK, thanks. Maybe it would be even better to pass own encoder derived from PiVideoEncoder.

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 25, 2014

Tempting, but the callback idea keeps things simpler to test & debug for the time being; about the only use-case I can see for custom encoder classes vs callbacks is that a custom encoder class could remove existing bits of the encoder classes for the purpose of reducing overhead (and, assuming one stays within Python, I'd be impressed if someone could significantly speed up the encoder classes within the callback phase - they're already pretty minimal)

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 28, 2014

Hmm ... I've had a play with callbacks now, and I've come to the conclusion that you're right: it'd be better to permit a custom encoder class. The issue with callbacks comes with where to place them: before the write to output? After? There's decent arguments for either. If before, do we allow the callback to mutate the buffer to be written? Again, there's a use-case for it. So instead of lots of different callback hooks with different calling conventions, let's go with your idea of specifying a replacement encoder class.

The only objection I can come up with at the moment is that while it's easy for start_recording (before it only ever uses PiVideoEncoder) it's a tad more tricky when it comes to the capture methods (as they use one of PiRawOneImageEncoder, PiCookedOneImageEncoder, PiRawMultiImageEncoder or PiCookedMultiImageEncoder). For the time being, I'm going to completely avoid that question and only permit specification of the encoder class with start_recording, not capture et al - but I'm sure I'll have to tackle it at some point in the future!

@d2rk

This comment has been minimized.

d2rk commented May 28, 2014

Good point. I've already changed the code for my needs. But I did it with builder picamera.Picamera(encoder_builder=MyEncoderBuilder()) so you may modify builder instance and do not touch start_recording.

@waveform80

This comment has been minimized.

Owner

waveform80 commented May 29, 2014

Interesting idea - so you've got a factory instance which has some method(s) on it which construct the encoder given a set of parameters. That's certainly more flexible than merely adding a class parameter to start_recording. How does your builder handle construction of encoders for image capture (or is it purely for video at the moment?).

I was looking at PiCamera._get_ports and thinking that something like PiCamera._get_encoder would be a nice parallel (which could be overridden by anyone wanting to output a different encoder class), but it'd need a few parameters to decide which of the current classes to return. Something along the lines of for_video, for_one_shot, for_raw.

However, if we go with an encoder factory class it could have separate methods for video or image encoder construction (I'm not sure that adds any particular flexibility but it might be easier for people to override as they could elect to only override the video related method, for example).

@d2rk

This comment has been minimized.

d2rk commented May 29, 2014

Yes, on this moment I'm working with video. But you can do something like this: PiCamera(video_encoder_builder=MyVideoEncoderBuilder(), image_encoder_builder=MyImageEncoderBuilder()). The key moment here is that you're passing options to build() method such as format, so the builder can generate right encoder instance, e.g. PiRawOneImageEncoder, PiCookedOneImageEncoder.

@waveform80

This comment has been minimized.

Owner

waveform80 commented Jun 1, 2014

Good point on the format parameter. I've been playing around with this today, and I think I've settled on a design. In the end I didn't go with a separate encoder builder class, but with a bunch of extra methods on PiCamera: _get_video_encoder, etc. which take more or less the same parameters as the encoder classes' initializers - it just seemed simpler than introducing yet another class (and I don't think it limits what can be achieved by overriding the classes).

The bulk of the work turned out to be all the extra documentation that needed writing; all the encoder classes now have some (admittedly rudimentary) docs, along with the new _get_*_encoder methods, and a recipe demonstrating a simple custom video encoder class.

Once I've run through the test suite (which'll take an hour or so) I'll push it to GitHub.

@waveform80 waveform80 closed this in 323b06b Jun 2, 2014

@d2rk

This comment has been minimized.

d2rk commented Jun 2, 2014

OK, sounds good to me. Thank you.

waveform80 added a commit that referenced this issue Jun 2, 2014

Tidy up the docs for #97 a bit more
Also corrected a mistake in the inheritance diagram for the recipe
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment