Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Add VAAPI VPP support for deinterlacing #2460

Closed
wants to merge 2 commits into from

10 participants

@BtbN

This still needs some testing, but it already works quite nice.
The new VPP API is currently only available in the staging branch of libva, but the code contains checks so it doesn't fail to compile with older vaapi versions.

I'm still very new to XBMC code, so i'm not sure if i did everything the right way.

Some people i was told who might be interested in this:
@elupus
@fernetmenta
@gbeauchesne

@fritsch
Collaborator

Thanks again for doing this PR. It is a great start in order to support more vaapi features in the future, when this VA/PP work is getting into the "normal" libva. As I already read the code before, there is one minor thing, that could be considered. Perhaps vaapi can manage the pp stuff alone, so that other code parts don't have to know about it at all. But thinking of long term and that PP is able to do scaling, rotating and so on - it would be okay as it is. let' see what the others think of it.

@alanwww1
Collaborator

This would be a great new feature for Intel owners! I am more than happy to test it!

One question: is BOB the only method supported with VPP or there are already other more advanced ways supported ?

@BtbN

The VPP api supports a lot of diffrent deinterlacing methods, up to full motion compensation deinterlacing, but the libva drivers currently only implement bob.

@alanwww1
Collaborator

Just a precise info on the various methods I just got from Gwenole:

"Bob deinterlacing is the only deinterlacing method supported in the VA intel-driver for now.
This does not provide any visual quality improvement since this is only bob deinterlacing. It might look better than using VA_TOP_FIELD|VA_BOTTOM_FIELD flags to vaPutSurface()/vaCopySurfaceGLX() but this is collateral.

There is work-in-progress to support Motion-Adaptive method, but no ETA yet as this is not a technical problem. Motion-Compensated method is currently out-of-scope.

Regards,
Gwenole."

So we can expect Motion Adaptive sometime in the future. That is good news.

@FernetMenta
Collaborator

I doubt that this design which hijacks the render thread for deinterlacing will cope with more advanced methods than bob.
1) performance
2) this was not designed to keep surfaces used for post proc.

My new implementations for vdpau and xvba run deinterlacing on a separate thread and deliver ready-to-render textures to renderer. No need to call into the API from the main thread which makes the design more robust.

Another approach for Bob could be pulling the decoded surfaces into system memory and let renderer do the deinterlacing. Then no unreleased libva is required. Provided that vaapi does sopport this, I have not checked.

Just my thoughts on this, I am not against this in general.

@BtbN

Running the deinterlacer in a separate thread would definitly be better. More complex algorithms need more hanling, like buffering some frames as forward and backward references.
I think for Bob this should work fine as drop-in replacement.

@BtbN

I updated this. Now it runs in a seperate thread in the VAAPI decoder, same as suggestes.
It propably still needs some testing and polishing, but in my tests its working perfectly.
I also added VAAPI deinterlacing as new deinterlacing-method, selectable via the UI or perfered by Auto-Select if available.

@elupus
Collaborator
@BtbN

It propably could run within the Decode thread, but the current setup separates VPP, which is except for the display a new vaapi instance, into its own thread, which allows it to work in the background, while the decoder can fill more surfaces.
The Performance gain is not very big, but on some systems with not so powerfull GPUs this performance gain can make the difference for 1080i50 playback.

VPP is also capable of much more than just deinterlacing, and the seperate thread allows for easy integration of such features(Scaling, some filters and other stuff) in the future.

@gbeauchesne
@BtbN

Hi,

Yes, they are still in there. I'll check if the flickering is still there and report a bug if it is.
The current implementation only does bob deinterlacing. But changing it to do other stuff like scaling or some filters would only require the VPP class to be changed, which shouldn't be too complicated. Bigger problem might be to make XBMC use these filters properly.

At the moment i can't think of anything that might be missing in the API.

The only real problem we currently got with deinterlacing is the realy poor performance of the bob deinterlacer. (Decoding+)Deinterlacing(+Rendering) a 1080i50 video puts higher load on the gpu(checked with intel_gpu_top) than transcoding(Decode + Encode, both with libva) it. Specialy on weaker machines like the smallest intel NUC this causes some troubles.

Another thing with the GLX render-api, not directly related to the intel-driver, is that it internaly uses GLX_ext_texture_from_pixmap to get a texture from the libva video frame, and then renders this into the target texture. This adds an additional (unneccessary?) render step. Wouldn't it be possible to directly use the internal texture? Something like vaBindSurfaceGLX which binds a vaapi surface as current OpenGL Texture, without the need to render it into an intermediate texture.

@fritsch
Collaborator

Those patches have been tested now for some weeks in OpenELEC(1) and xbmc (2) via a special ppa
It did not introduce any regressions also. Despite the Clarkdale GPUs seem to render blackscreen if deinterlacing is enabled, which seems to be a libva-intel-driver problem.

This improves vaapi situation on xbmc by a huge amount.

So if there are no other technical problems (e.g. source code, architecture specific), I would see it as a good candidate for inclusion.

Edit:
(1) http://openelec.tv/forum/116-vaapi-intel/64006-vaapi-deinterlacing-testing?limitstart=0
(2) http://forum.xbmc.org/showthread.php?tid=165707

@elupus
Collaborator
@elupus
Collaborator
@BtbN

The thread is used so the decoder can continue decoding stuff while VPP is deinterlacing.
It's independend and can work in parallel.

@elupus
Collaborator
@BtbN

It also doesn't hurt and adds an easy way to introduce future processing steps which might need more CPU time.
On lower end machines the thread also enables them to decode 1080i. The Celeron fritsch tested it on wasn't able to keep up with 1080i with vpp running in the same thread. Putting it in a separate thread made it fast enough. So there is, at least a small, performance gain.

@alanwww1
Collaborator

Intel added Motion adaptive deinterlacing and other Post processing abilities in the new master version (1.2.0) of the vaapi driver:

http://cgit.freedesktop.org/vaapi/intel-driver/commit/?id=e4581aa7e6488173a1f1ba10b6d1c0689f64999e

  • Video process on ILK/SNB/IVB/HSW
    • CSC/scaling on ILK
    • CSC/scaling/NoiseReduction/Deinterlacing{Bob} on SNB/IVB
    • CSC/scaling/NoiseReduction/Deinterlacing{Bob,MotionAdaptive}/Sharpening/ColorBalance on HSW

Would be great to add these features.
@BtbN Can you pick this up ?
@elupus Do you still have any principal issues with the implementation ?

Thanks to both of you !

@fritsch
Collaborator

@BtbN is travelling this week, so not sure if he can answer.

Problem with Temp / Spat currently is, that it is only implemented for Intel Haswell and no one of us already has this hardware.

@alanwww1
Collaborator

@fritsch
That is for sure, but at least the framework would be good to be pulled in and soon we'll have the hardware available to test.

@fritsch
Collaborator

The VPP works (this PR here) already uses this framework. It was started as all those were in the staging branch to have it supported when ever it becomes stable - which now has happened.

The Temporal / Spatial Implementation for haswell (basing on the same framework, just needing "more input fields") has been implemented in this tree approx a week ago, which included an update in libdrm to 2.4.45, so everything pretty new.

But yes, making the Intel deinterlacing class more generic and enumerating all the (deinterlacing) features, that libva-driver-intel provides is the plan and was already discussed.

@alanwww1
Collaborator

@elupus
Can this be pulled in for now, or you still have some concerns ?
Thx

@fritsch
Collaborator

@alanwww1:
I would wait two more weeks. I will be looking at the code again, currently i am ill and not i am not meant to do coding. We now have the chance to get it going in a more abstract way. Currently it is fully BOB centric. I started to use VAAPI_BOB and VAAPI_Best as it is done on windows some days ago. Those need to have a fallback to the lowest method (Bob) and the Auto setting should also decide which one to choose (in that case: Bob).

The code changes are not very intrusive.

  • Add one more string
  • Make Bob the Auto default
  • Rename the BobReady methods to get the capabilities in a more general way
  • enumerate the temp / spat (that is a one liner)
  • Adapt the processing chain to have an array of smart pointers to hold the "last three" fields - the rest is already pretty nice and integration should be flawless
@alanwww1
Collaborator

@fritsch
Thanks. Ok. Until that, I might be getting a Haswell rig, so we can test things out later.

@fritsch
Collaborator

@alanwww1: I hope your Haswell rig is ready. It seems there is a whole lot to test.
@sraue: As you also have a Haswell, can you test it on your machine? (OpenELEC xbmc=master + this PR) should be enough.

@alanwww1
Collaborator

@fritsch
Great job, I'll try it asap. Hopefully in the weekend I got some time. Thx

@fritsch
Collaborator

@alanwww1: That code was written fully by @btbn. I only helped in discussing. So no wrong flowers for me :-)

@wsnipex
Collaborator
@alanwww1
Collaborator

So guys, I made asome tests:

At both test cases, normal vaapi decoding works with no problem.

Let me know what to change, or if you need more debug info.

@fritsch
Collaborator

Can you test with this sample and report back? https://dl.dropboxusercontent.com/u/55728161/walker_interlaced.ts

xbmc.log with debugging turned on, please.

@alanwww1
Collaborator

Ok. I did some more tests.
The mpeg2 file you sent does the same thing. EVERY file I play no mater what resolution or encoding (even non-interlaced files if I turn on deinterlacing) does the green screen.
Log: http://paste.ubuntu.com/5850833/

I also did a test with an Ivy Bridge system (Core i7-3770K) and it worked well on that system.

So what we know:

  • Patch works on Ivy Bridge
  • Neighter the old nor the new patch work on Haswell regardless of fileformat, resolution

I even tried to force VPP to use BOB method by adding the following line

method = DeinterlacingBob;

After line 203: https://github.com/BtbN/xbmc/blob/3ff62a0cd3ffcd3ab10409a61973a5b8e6186c7f/xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI_VPP.cpp#L203
Log: http://paste.ubuntu.com/5850880/

It produces the same green screen. So this can also well be a driver problem, since none of the VPP deinterlacers work on Haswell.

@fritsch
Collaborator

Thanks for this tests. I am not sure the error is in the Code - as it does not bailout in any way. Can you by any chance upgrade all your graphic stack to xorg-edgers? Make a full backup or something before you do it, but Haswell is still work in progress. Be it in intel drivers, libdrm or in mesa.

Latest and greatest could be required.

Your log indicates, that you are still on

GL_VERSION = 3.0 Mesa 9.1.3
@alanwww1
Collaborator

Thanks. I tried to upgrade everything to the freshest by adding the xorg edgers. Now it is mesa 9.2-devel, but everything works the same:

On IVB, VPP Deinterlacing works,
On HSW, VPP Deinterlacing does NOT work.

Log: http://paste.ubuntu.com/5850948/

I think there is a Big chance, that this is a libva intel driver problem. Might be good to notify the Intel guys. Maye @gbeauchesne can help identify the problem.

@BtbN

Strange, it looks like something inside libva is misbehaving.
You can try to enable libva trace, and upload a log.

iirc it's enabled by adding LIBVA_TRACE=/wheverever/tracelog to /etc/libva.conf

@alanwww1
Collaborator

I have collceted the libva trace log. It does not tell too much for me. Hopefully someone can tell what the problem is with Haswell deinterlacing:

http://paste.ubuntu.com/5852777/

@alanwww1
Collaborator

Hi All !

I got an info from Gwenole about Haswell deinterlacing. Here is his message:


Sorry, I am very busy right now with integrating support for VA/VPP (+deinterlacing) in gstreamer-vaapi. If there is an issue on Haswell, you can be sure it would be fixed along the way. 
Hopefully, that should be settled down by the middle of next week. Thanks for your patience.

Regards,
Gwenole. 

Since normal (non-VPP) BOB deinterlacing still works on Haswell and VPP works on ALL other platforms, I think this PR is already good to pull in, if there are no objections from other Devs @elupus , @fritsch , @sraue

xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI.cpp
((60 lines not shown))
}
- if(!m_holder.surface)
+
+ m_vppth->WaitForOutput(2000);
@elupus Collaborator
elupus added a note

You are not allowed to block that long here. This blocks dvdplayervideo.

@BtbN
BtbN added a note

I think that line can be removed entirely, not sure why even added it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI.cpp
((55 lines not shown))
+void CVPPThread::OnStartup()
+{
+ CLog::Log(LOGDEBUG, "VAAPI - VPP thread on startup");
+}
+
+void CVPPThread::OnExit()
+{
+ CLog::Log(LOGDEBUG, "VAAPI - VPP thread on exit");
+}
+
+void CVPPThread::InsertNewFrame(CVPPPicture &new_frame)
+{
+ if(!IsRunning())
+ return;
+
+ m_input_queue_lock.lock();
@elupus Collaborator
elupus added a note

CSingleLock

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI.cpp
((76 lines not shown))
+}
+
+void CVPPThread::WaitForOutput(unsigned long msec)
+{
+ if(!IsRunning())
+ return;
+
+ m_output_queue_lock.lock();
+ if(m_output_queue.empty())
+ {
+ if(msec > 0)
+ m_output_cond.wait(m_output_queue_lock, msec);
+ else
+ m_output_cond.wait(m_output_queue_lock);
+ }
+ m_output_queue_lock.unlock();
@elupus Collaborator
elupus added a note

conditionals have sporadic wakeup. You need to re-check.

@BtbN
BtbN added a note

So i'd have to check for the time myself? What exactly does that sporadic wakeup mean? Do they just randomly wakeup sometimes?

@BtbN
BtbN added a note

Just using a while loop is not realy an option here, as it'd not work with the timeout.
Anyway, I'm going to remove this function entirely, as I stopped using it in CDecoder::GetPicture, which was the only place where it was used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI.cpp
((109 lines not shown))
+ m_output_queue_lock.unlock();
+
+ return res;
+}
+
+CVPPPicture CVPPThread::GetCurrentFrame()
+{
+ CVPPPicture res = CVPPPicture();
+
+ if(m_stop)
+ return res;
+
+ m_input_queue_lock.lock();
+
+ if(m_input_queue.empty())
+ m_input_cond.wait(m_input_queue_lock);
@elupus Collaborator
elupus added a note

sporadic wakeup here too.

@BtbN
BtbN added a note

Not realy an issue here, as the function will just return an invalid picture in that case, which will just cause the thread to do a dry loop run, ending up in this function again.
This behaviour is important, as stopping the thread relies on the behaviour of this functions, that it returns whenver the condvar is signaled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@elupus
Collaborator

I'm still very very much against this being threaded.

@BtbN

I can't give numbers on that, but the thread does perform better than doing it all in the same thread.
No idea how much better, but when @fritsch tested it on his low end celeron machine, it wasn't able to keep up with 1080i50 first. After moving vpp to its own thread, it was fast enough.

Apart from that, the issues you mentions should be fixed, i just haven't pushed them yet, because github might remove the comments then.

@fritsch
Collaborator

I wrote Haihao of Intel about the issue and here is his response. It seems kernel > 3.11-rc1 or drm-intel-next is needed and then everything should be fine:

To support deinterlacing on HSW, A special Linux kernel is required.
See the note of know issue at
https://01.org/linuxgraphics/downloads/2013/2013q2-intel-graphics-stack-release

(Note: The new feature has been merged into Linux 3.11-rc1)

+        vaQueryVideoProcFilters(m_display->get(), m_deintContext, filters, &numFilters) != VA_STATUS_SUCCESS
+     || vaQueryVideoProcFilterCaps(m_display->get(), m_deintContext, VAProcFilterDeinterlacing, deinterlacingCaps, &numDeinterlacingCaps) != VA_STATUS_SUCCESS)

Returning VA_STATUS_SUCCESS from vaQueryVideoProcFilters() can't
guarantee VAProcFilterDeinterlacing is supported, you should go through
the returned array and find out the right filter then query the
capability of this filter.

Yes, some enhancement can be done in vaQueryVideoProcFilterCaps(), I
will add more check in vaQueryVideoProcFilterCaps().

Thanks
Haihao
@alanwww1
Collaborator

I'll make a test soon and report back.

@alanwww1
Collaborator

I made a test with kernel 3.11_RC2
We have an improvement, but now it is blurred image (see screenshot) flashing with green screen (some frames were completely green from the screenshots I made). Maybe it is me doing something wrong. Any idea ? @gbeauchesne , Haihao ?

libva tracelog: http://paste.ubuntu.com/5919464/
xbmc log: http://paste.ubuntu.com/5919486/
screenshot:
screenshot007

@wsnipex
Collaborator

@elupus can this be merged, or are you still opposed due to the thread?

@sgoryachkin

Hi,

I made AUR packege https://aur.archlinux.org/packages/xbmc-frodo-vpp-git/ (based on xbmc[comunity]) whith vaapi-vpp-deinterlacing-frodo branch, and try this on my HTPC.

HTPC Config:
Intel® Core i3 3220T (Ivy Bridge, HD Graphics 2500)
Kernel - 3.10.5
xf86-video-intel - 2.21.14
Libdrm - 2.4.45
Libva - 1.2.1
vaapi-intel-driver - 1.2
Cairo - 1.12.14
Xserver - 1.14

Result:
VAAPI Auto:
Deinterlacing works (IPTV chanels). Picture perfect, but Discovery HD Logo is jumping from a high frequency.

@fritsch
Collaborator

@sgoryachkin:
Logfile with debugging please - and output of vainfo

@ghost

Hi,
I've just signed up to support this idea. I'm using BtbN vpp-deinterlacing fork now but I would like to see this function included in the original version. Is the reason why this is not yet included the elupus averse to run this in separate thread? Do you elupus have another solution how to use hw accelerated deinterlacing on intel GPU?

@MartijnKaijser

@Alesho
this is NOT a support section. got something to ask, use the forum

@ghost

@MartijnKaijser
I'm sorry, just thought that section with the name of Discusion is for discussion. You don't want this to be included?

@elupus
Collaborator

The point is really that we shouldn't need to run this in a separate thread to get full performance. The CPU is not doing any work, the GPU is. If we need the separate thread we are doing something wrong and inefficient in the single thread which we should not be doing in the first place.

@ghost

@elupus
Thank you elupus, I appreciate you answer very much, especialy how kind you wrote it.
@BtbN
Could you BtbN please try to rewrite it back to single thread and try find out inefficiency the elupus is talking about?
@all
I am realy sorry for "spaming" here but I would realy like to see things moving in this request.

@fritsch
Collaborator

@Alesho:
This is a dev space. Such a rewrite is not done by "a just do it". It seems you have no clue on the actual source code and how btbn uses the second thread as a design approach. So please stop misusing that place to lobby features you want.

Every @ triggers a separate mail to someone and @ all gets him: https://github.com/all an email

@MartijnKaijser

@elupus / @FernetMenta could you review the changes?
sanity build: jenkins build this please

language/English/strings.po
@@ -6882,7 +6882,27 @@ msgctxt "#16325"
msgid "VDPAU - Bob"
msgstr ""
-#empty strings from id 16326 to 16399
+msgctxt "#16326"
+msgid "VAAPI Auto"
@FernetMenta Collaborator

we already have AUTO. why should this not fit for vaapi?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
xbmc/cores/VideoRenderers/LinuxRendererGL.cpp
@@ -56,6 +56,7 @@
#include <va/va_x11.h>
#include <va/va_glx.h>
#include "cores/dvdplayer/DVDCodecs/Video/VAAPI.h"
+#include "cores/dvdplayer/DVDCodecs/Video/VAAPI_VPP.h"
@FernetMenta Collaborator

renderer should not be bothered with a second vaapi include.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
language/English/strings.po
@@ -6882,7 +6882,27 @@ msgctxt "#16325"
msgid "VDPAU - Bob"
msgstr ""
-#empty strings from id 16326 to 16399
+msgctxt "#16326"
+msgid "VAAPI Auto"
+msgstr ""
+
+msgctxt "#16327"
@MartijnKaijser Owner

please specify in which files the strings are used. see examples in strings.po

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI_VPP.cpp
((4 lines not shown))
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ *
+ */
@MartijnKaijser Owner

outdated copyright header. please c/p one from another file and use only 2013 as year. Same for the other files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@mbroemme

With latest version it is still not working for me. Switching to a TV channel which should be deinterlaced xbmc immediately crashs:

The system configuration is the following:

Kernel - 3.11.1-2-ARCH
CPU - Intel(R) Core(TM) i5-4570T CPU @ 2.90GHz
GPU - HD4600
xf86-video-intel - 2.21.15
Libdrm - 2.4.46
Libva - 1.2.1
vaapi-intel-driver - 1.2.1
Deinterlace method - VAAPI Auto

Does anybody have an idea?

@MartijnKaijser

@mbroemme please use pastebin

@mbroemme

@MartijnKaijser sorry here is pastebin version: http://pastebin.com/kdJ0KCYr

@BtbN

MADI and MCDI should work now. Turns out the main problem was that the forward references actualy don't mean the next frame. They mean the previous one. Also, vaapi seems to expect beeing called twice for each frame.
With this sorted now, the only problem left are the occasional assertion crashes from the intel driver, which only seem to happen on mpeg2, and also not all the time. Most likely this is some bug in libva or ffmpeg, as it's complaining about a missing internal buffer(obj_surface->bo), which cannot be accessed or validated from outside the intel driver.

@BtbN BtbN referenced this pull request from a commit
Commit has since been removed from the repository and is no longer available.
@FernetMenta FernetMenta was assigned
@BtbN

I still don't think that this should be merged currently. The code generaly works fine, but the libva drivers is causing way too many crashes for this to be considdered stable.
For bob it seems to work flawless, so i might disable everything else per default and only make it available through some AdvancedSetting.

@fritsch
Collaborator

Yeah, we can also get it more tested via the known infrastructures provided by @wsnipex and therefore update the vaapi vpp testing thread: http://forum.xbmc.org/showthread.php?tid=165707

Thanks very much for your work and also the constant interaction with the intel guys to improve upstream vaapi driver.

@MartijnKaijser

any update on this?

@fritsch
Collaborator
@FernetMenta
Collaborator

Many thanks for your poineer work on this. Finally we have vpp implemented.

@FernetMenta FernetMenta closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.