
Loading…
Thanks again for doing this PR. It is a great start in order to support more vaapi features in the future, when this VA/PP work is getting into the "normal" libva. As I already read the code before, there is one minor thing, that could be considered. Perhaps vaapi can manage the pp stuff alone, so that other code parts don't have to know about it at all. But thinking of long term and that PP is able to do scaling, rotating and so on - it would be okay as it is. let' see what the others think of it.
This would be a great new feature for Intel owners! I am more than happy to test it!
One question: is BOB the only method supported with VPP or there are already other more advanced ways supported ?
The VPP api supports a lot of diffrent deinterlacing methods, up to full motion compensation deinterlacing, but the libva drivers currently only implement bob.
Just a precise info on the various methods I just got from Gwenole:
"Bob deinterlacing is the only deinterlacing method supported in the VA intel-driver for now.
This does not provide any visual quality improvement since this is only bob deinterlacing. It might look better than using VA_TOP_FIELD|VA_BOTTOM_FIELD flags to vaPutSurface()/vaCopySurfaceGLX() but this is collateral.
There is work-in-progress to support Motion-Adaptive method, but no ETA yet as this is not a technical problem. Motion-Compensated method is currently out-of-scope.
Regards,
Gwenole."
So we can expect Motion Adaptive sometime in the future. That is good news.
I doubt that this design which hijacks the render thread for deinterlacing will cope with more advanced methods than bob.
1) performance
2) this was not designed to keep surfaces used for post proc.
My new implementations for vdpau and xvba run deinterlacing on a separate thread and deliver ready-to-render textures to renderer. No need to call into the API from the main thread which makes the design more robust.
Another approach for Bob could be pulling the decoded surfaces into system memory and let renderer do the deinterlacing. Then no unreleased libva is required. Provided that vaapi does sopport this, I have not checked.
Just my thoughts on this, I am not against this in general.
Running the deinterlacer in a separate thread would definitly be better. More complex algorithms need more hanling, like buffering some frames as forward and backward references.
I think for Bob this should work fine as drop-in replacement.
I updated this. Now it runs in a seperate thread in the VAAPI decoder, same as suggestes.
It propably still needs some testing and polishing, but in my tests its working perfectly.
I also added VAAPI deinterlacing as new deinterlacing-method, selectable via the UI or perfered by Auto-Select if available.
It propably could run within the Decode thread, but the current setup separates VPP, which is except for the display a new vaapi instance, into its own thread, which allows it to work in the background, while the decoder can fill more surfaces.
The Performance gain is not very big, but on some systems with not so powerfull GPUs this performance gain can make the difference for 1080i50 playback.
VPP is also capable of much more than just deinterlacing, and the seperate thread allows for easy integration of such features(Scaling, some filters and other stuff) in the future.
Hi,
Yes, they are still in there. I'll check if the flickering is still there and report a bug if it is.
The current implementation only does bob deinterlacing. But changing it to do other stuff like scaling or some filters would only require the VPP class to be changed, which shouldn't be too complicated. Bigger problem might be to make XBMC use these filters properly.
At the moment i can't think of anything that might be missing in the API.
The only real problem we currently got with deinterlacing is the realy poor performance of the bob deinterlacer. (Decoding+)Deinterlacing(+Rendering) a 1080i50 video puts higher load on the gpu(checked with intel_gpu_top) than transcoding(Decode + Encode, both with libva) it. Specialy on weaker machines like the smallest intel NUC this causes some troubles.
Another thing with the GLX render-api, not directly related to the intel-driver, is that it internaly uses GLX_ext_texture_from_pixmap to get a texture from the libva video frame, and then renders this into the target texture. This adds an additional (unneccessary?) render step. Wouldn't it be possible to directly use the internal texture? Something like vaBindSurfaceGLX which binds a vaapi surface as current OpenGL Texture, without the need to render it into an intermediate texture.
Those patches have been tested now for some weeks in OpenELEC(1) and xbmc (2) via a special ppa
It did not introduce any regressions also. Despite the Clarkdale GPUs seem to render blackscreen if deinterlacing is enabled, which seems to be a libva-intel-driver problem.
This improves vaapi situation on xbmc by a huge amount.
So if there are no other technical problems (e.g. source code, architecture specific), I would see it as a good candidate for inclusion.
Edit:
(1) http://openelec.tv/forum/116-vaapi-intel/64006-vaapi-deinterlacing-testing?limitstart=0
(2) http://forum.xbmc.org/showthread.php?tid=165707
The thread is used so the decoder can continue decoding stuff while VPP is deinterlacing.
It's independend and can work in parallel.
It also doesn't hurt and adds an easy way to introduce future processing steps which might need more CPU time.
On lower end machines the thread also enables them to decode 1080i. The Celeron fritsch tested it on wasn't able to keep up with 1080i with vpp running in the same thread. Putting it in a separate thread made it fast enough. So there is, at least a small, performance gain.
Intel added Motion adaptive deinterlacing and other Post processing abilities in the new master version (1.2.0) of the vaapi driver:
http://cgit.freedesktop.org/vaapi/intel-driver/commit/?id=e4581aa7e6488173a1f1ba10b6d1c0689f64999e
Would be great to add these features.
@BtbN Can you pick this up ?
@elupus Do you still have any principal issues with the implementation ?
Thanks to both of you !
The VPP works (this PR here) already uses this framework. It was started as all those were in the staging branch to have it supported when ever it becomes stable - which now has happened.
The Temporal / Spatial Implementation for haswell (basing on the same framework, just needing "more input fields") has been implemented in this tree approx a week ago, which included an update in libdrm to 2.4.45, so everything pretty new.
But yes, making the Intel deinterlacing class more generic and enumerating all the (deinterlacing) features, that libva-driver-intel provides is the plan and was already discussed.
@alanwww1:
I would wait two more weeks. I will be looking at the code again, currently i am ill and not i am not meant to do coding. We now have the chance to get it going in a more abstract way. Currently it is fully BOB centric. I started to use VAAPI_BOB and VAAPI_Best as it is done on windows some days ago. Those need to have a fallback to the lowest method (Bob) and the Auto setting should also decide which one to choose (in that case: Bob).
The code changes are not very intrusive.
ubuntu testing packages: https://launchpad.net/~wsnipex/+archive/xbmc-pr-tests
So guys, I made asome tests:
At both test cases, normal vaapi decoding works with no problem.
Let me know what to change, or if you need more debug info.
Can you test with this sample and report back? https://dl.dropboxusercontent.com/u/55728161/walker_interlaced.ts
xbmc.log with debugging turned on, please.
Ok. I did some more tests.
The mpeg2 file you sent does the same thing. EVERY file I play no mater what resolution or encoding (even non-interlaced files if I turn on deinterlacing) does the green screen.
Log: http://paste.ubuntu.com/5850833/
I also did a test with an Ivy Bridge system (Core i7-3770K) and it worked well on that system.
So what we know:
I even tried to force VPP to use BOB method by adding the following line
method = DeinterlacingBob;
After line 203: https://github.com/BtbN/xbmc/blob/3ff62a0cd3ffcd3ab10409a61973a5b8e6186c7f/xbmc/cores/dvdplayer/DVDCodecs/Video/VAAPI_VPP.cpp#L203
Log: http://paste.ubuntu.com/5850880/
It produces the same green screen. So this can also well be a driver problem, since none of the VPP deinterlacers work on Haswell.
Thanks for this tests. I am not sure the error is in the Code - as it does not bailout in any way. Can you by any chance upgrade all your graphic stack to xorg-edgers? Make a full backup or something before you do it, but Haswell is still work in progress. Be it in intel drivers, libdrm or in mesa.
Latest and greatest could be required.
Your log indicates, that you are still on
GL_VERSION = 3.0 Mesa 9.1.3
Thanks. I tried to upgrade everything to the freshest by adding the xorg edgers. Now it is mesa 9.2-devel, but everything works the same:
On IVB, VPP Deinterlacing works,
On HSW, VPP Deinterlacing does NOT work.
Log: http://paste.ubuntu.com/5850948/
I think there is a Big chance, that this is a libva intel driver problem. Might be good to notify the Intel guys. Maye @gbeauchesne can help identify the problem.
Strange, it looks like something inside libva is misbehaving.
You can try to enable libva trace, and upload a log.
iirc it's enabled by adding LIBVA_TRACE=/wheverever/tracelog to /etc/libva.conf
I have collceted the libva trace log. It does not tell too much for me. Hopefully someone can tell what the problem is with Haswell deinterlacing:
Hi All !
I got an info from Gwenole about Haswell deinterlacing. Here is his message:
Sorry, I am very busy right now with integrating support for VA/VPP (+deinterlacing) in gstreamer-vaapi. If there is an issue on Haswell, you can be sure it would be fixed along the way.
Hopefully, that should be settled down by the middle of next week. Thanks for your patience.
Regards,
Gwenole.
Since normal (non-VPP) BOB deinterlacing still works on Haswell and VPP works on ALL other platforms, I think this PR is already good to pull in, if there are no objections from other Devs @elupus , @fritsch , @sraue
|
You are not allowed to block that long here. This blocks dvdplayervideo.
BtbN
added a note
I think that line can be removed entirely, not sure why even added it.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
|
CSingleLock
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
|
conditionals have sporadic wakeup. You need to re-check.
BtbN
added a note
So i'd have to check for the time myself? What exactly does that sporadic wakeup mean? Do they just randomly wakeup sometimes?
ghost
added a note
BtbN
added a note
Just using a while loop is not realy an option here, as it'd not work with the timeout.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
|
sporadic wakeup here too.
BtbN
added a note
Not realy an issue here, as the function will just return an invalid picture in that case, which will just cause the thread to do a dry loop run, ending up in this function again.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
I'm still very very much against this being threaded.
I can't give numbers on that, but the thread does perform better than doing it all in the same thread.
No idea how much better, but when @fritsch tested it on his low end celeron machine, it wasn't able to keep up with 1080i50 first. After moving vpp to its own thread, it was fast enough.
Apart from that, the issues you mentions should be fixed, i just haven't pushed them yet, because github might remove the comments then.
I wrote Haihao of Intel about the issue and here is his response. It seems kernel > 3.11-rc1 or drm-intel-next is needed and then everything should be fine:
To support deinterlacing on HSW, A special Linux kernel is required.
See the note of know issue at
https://01.org/linuxgraphics/downloads/2013/2013q2-intel-graphics-stack-release
(Note: The new feature has been merged into Linux 3.11-rc1)
+ vaQueryVideoProcFilters(m_display->get(), m_deintContext, filters, &numFilters) != VA_STATUS_SUCCESS
+ || vaQueryVideoProcFilterCaps(m_display->get(), m_deintContext, VAProcFilterDeinterlacing, deinterlacingCaps, &numDeinterlacingCaps) != VA_STATUS_SUCCESS)
Returning VA_STATUS_SUCCESS from vaQueryVideoProcFilters() can't
guarantee VAProcFilterDeinterlacing is supported, you should go through
the returned array and find out the right filter then query the
capability of this filter.
Yes, some enhancement can be done in vaQueryVideoProcFilterCaps(), I
will add more check in vaQueryVideoProcFilterCaps().
Thanks
Haihao
I'll make a test soon and report back.
I made a test with kernel 3.11_RC2
We have an improvement, but now it is blurred image (see screenshot) flashing with green screen (some frames were completely green from the screenshots I made). Maybe it is me doing something wrong. Any idea ? @gbeauchesne , Haihao ?
libva tracelog: http://paste.ubuntu.com/5919464/
xbmc log: http://paste.ubuntu.com/5919486/
screenshot:
Hi,
I made AUR packege https://aur.archlinux.org/packages/xbmc-frodo-vpp-git/ (based on xbmc[comunity]) whith vaapi-vpp-deinterlacing-frodo branch, and try this on my HTPC.
HTPC Config:
Intel® Core i3 3220T (Ivy Bridge, HD Graphics 2500)
Kernel - 3.10.5
xf86-video-intel - 2.21.14
Libdrm - 2.4.45
Libva - 1.2.1
vaapi-intel-driver - 1.2
Cairo - 1.12.14
Xserver - 1.14
Result:
VAAPI Auto:
Deinterlacing works (IPTV chanels). Picture perfect, but Discovery HD Logo is jumping from a high frequency.
@sgoryachkin:
Logfile with debugging please - and output of vainfo
Hi,
I've just signed up to support this idea. I'm using BtbN vpp-deinterlacing fork now but I would like to see this function included in the original version. Is the reason why this is not yet included the elupus averse to run this in separate thread? Do you elupus have another solution how to use hw accelerated deinterlacing on intel GPU?
@Alesho
this is NOT a support section. got something to ask, use the forum
@MartijnKaijser
I'm sorry, just thought that section with the name of Discusion is for discussion. You don't want this to be included?
The point is really that we shouldn't need to run this in a separate thread to get full performance. The CPU is not doing any work, the GPU is. If we need the separate thread we are doing something wrong and inefficient in the single thread which we should not be doing in the first place.
@elupus
Thank you elupus, I appreciate you answer very much, especialy how kind you wrote it.
@BtbN
Could you BtbN please try to rewrite it back to single thread and try find out inefficiency the elupus is talking about?
@all
I am realy sorry for "spaming" here but I would realy like to see things moving in this request.
@Alesho:
This is a dev space. Such a rewrite is not done by "a just do it". It seems you have no clue on the actual source code and how btbn uses the second thread as a design approach. So please stop misusing that place to lobby features you want.
Every @ triggers a separate mail to someone and @ all gets him: https://github.com/all an email
@elupus / @FernetMenta could you review the changes?
sanity build: jenkins build this please
| @@ -6882,7 +6882,27 @@ msgctxt "#16325" | ||
| msgid "VDPAU - Bob" | ||
| msgstr "" | ||
| -#empty strings from id 16326 to 16399 | ||
| +msgctxt "#16326" | ||
| +msgid "VAAPI Auto" |
|
we already have AUTO. why should this not fit for vaapi?
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
| @@ -56,6 +56,7 @@ | ||
| #include <va/va_x11.h> | ||
| #include <va/va_glx.h> | ||
| #include "cores/dvdplayer/DVDCodecs/Video/VAAPI.h" | ||
| +#include "cores/dvdplayer/DVDCodecs/Video/VAAPI_VPP.h" |
|
renderer should not be bothered with a second vaapi include.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
| @@ -6882,7 +6882,27 @@ msgctxt "#16325" | ||
| msgid "VDPAU - Bob" | ||
| msgstr "" | ||
| -#empty strings from id 16326 to 16399 | ||
| +msgctxt "#16326" | ||
| +msgid "VAAPI Auto" | ||
| +msgstr "" | ||
| + | ||
| +msgctxt "#16327" |
|
please specify in which files the strings are used. see examples in strings.po
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
|
outdated copyright header. please c/p one from another file and use only 2013 as year. Same for the other files.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
|
With latest version it is still not working for me. Switching to a TV channel which should be deinterlaced xbmc immediately crashs:
The system configuration is the following:
Kernel - 3.11.1-2-ARCH
CPU - Intel(R) Core(TM) i5-4570T CPU @ 2.90GHz
GPU - HD4600
xf86-video-intel - 2.21.15
Libdrm - 2.4.46
Libva - 1.2.1
vaapi-intel-driver - 1.2.1
Deinterlace method - VAAPI Auto
Does anybody have an idea?
@mbroemme please use pastebin
@MartijnKaijser sorry here is pastebin version: http://pastebin.com/kdJ0KCYr
MADI and MCDI should work now. Turns out the main problem was that the forward references actualy don't mean the next frame. They mean the previous one. Also, vaapi seems to expect beeing called twice for each frame.
With this sorted now, the only problem left are the occasional assertion crashes from the intel driver, which only seem to happen on mpeg2, and also not all the time. Most likely this is some bug in libva or ffmpeg, as it's complaining about a missing internal buffer(obj_surface->bo), which cannot be accessed or validated from outside the intel driver.
| Commit has since been removed from the repository and is no longer available. |
I still don't think that this should be merged currently. The code generaly works fine, but the libva drivers is causing way too many crashes for this to be considdered stable.
For bob it seems to work flawless, so i might disable everything else per default and only make it available through some AdvancedSetting.
Yeah, we can also get it more tested via the known infrastructures provided by @wsnipex and therefore update the vaapi vpp testing thread: http://forum.xbmc.org/showthread.php?tid=165707
Thanks very much for your work and also the constant interaction with the intel guys to improve upstream vaapi driver.
any update on this?
Many thanks for your poineer work on this. Finally we have vpp implemented.
This still needs some testing, but it already works quite nice.
The new VPP API is currently only available in the staging branch of libva, but the code contains checks so it doesn't fail to compile with older vaapi versions.
I'm still very new to XBMC code, so i'm not sure if i did everything the right way.
Some people i was told who might be interested in this:
@elupus
@fernetmenta
@gbeauchesne