GUI Renderer abstraction and move to deferred rendering #2681

theuni · 2013-05-02T01:01:17Z

This removes all render operations from guilib in favor of an abstracted scene-graph that receives batched render data to be drawn later. When guilib has a texture/font ready to render, it packs up the data and adds it to the graph. At the end of each frame, the renderer evaluates that graph and draws it all in one go.

GL/GLES have been implemented, but DirectX is missing. Verified working on OS X/Linux/Android. The scene-graph is very basic and intentionally non-robust. I think it wise to wait for a DirectX implementation before trying to settle on anything final.

Benefits:

- New renderers (not only DX/GL/GLES) become possible, as our GUI is now completely independent of rendering code. Our current GL/GLES renderers are implemented here, but DirectX is missing.
- We can be MUCH smarter and more efficient with our rendering. We can look at an entire scene at once and decide to cull invisible objects or batch similar ones. Some of that has been implemented here, but for the most part I tried to keep behavior similar to the past.
- It gives a big speedup when moving through the render loop, as we now skip lots of small blocking graphics operations.
- g_graphicsContext could be eliminated for the most part, since graphics operations are now asynchronous and threadsafe.
- Rendering could (and should) be moved to its own thread, since all render operations are now free of GUI processing routines.
- It could be exposed as an external API for plugins and binary add-ons, so that they can use the renderer of their choice and simply give us the results to display.
- It makes it much easier to decouple windowing from rendering, opening up the possibility of changing the renderer on the fly (gl/gles/glesv3).
- Lots more :)

This does not re-implement our video renderers. I would like to have included those as well, but it's simply too much to knock out at once. Instead, the scene-graph was designed so that it can be rendered in parts, multiple times per frame. So for operations where we temporarily lose control of our renderer (video playback or visualizations, for example), the flow looks like this:

1. Clear the buffer.
2. Run through the render loop, adding controls to the scene-graph.
3. Hit a render/video/fullscreen control.
4. Render the current scene-graph.
5. Let the video renderer do its thing.
6. Finish the GUI loop, adding controls to a fresh scene-graph.
7. Render the current scene-graph.
8. Repeat.
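A minimal C++ sketch of that flow, assuming a scene-graph that simply queues draws and can be flushed more than once per frame. `SceneGraph`, `DrawCommand`, `Flush()` and `RenderFrame()` are hypothetical names for illustration, and draws are recorded as strings instead of issuing real GPU calls.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one batched GUI draw (a texture plus its quads).
struct DrawCommand {
  std::string label;
};

// Minimal scene-graph: controls queue draws during the GUI pass, and nothing
// is drawn until Flush() is called. Flush() may run several times per frame.
class SceneGraph {
 public:
  void Add(DrawCommand cmd) { m_pending.push_back(std::move(cmd)); }

  // "Renders" everything queued so far (here: records it in `drawn`),
  // then leaves an empty graph for the rest of the frame.
  void Flush(std::vector<std::string>& drawn) {
    for (const DrawCommand& cmd : m_pending) drawn.push_back(cmd.label);
    m_pending.clear();
  }

  std::size_t Pending() const { return m_pending.size(); }

 private:
  std::vector<DrawCommand> m_pending;
};

// One frame in which a video control interrupts the GUI pass.
// The returned list is the order in which things reach the screen.
std::vector<std::string> RenderFrame(SceneGraph& graph) {
  std::vector<std::string> drawn{"clear"};  // clear the buffer
  graph.Add({"background"});                // GUI pass queues controls
  graph.Add({"menu"});
  graph.Flush(drawn);                       // video control reached: render what we have
  drawn.push_back("video");                 // video renderer draws directly
  graph.Add({"osd"});                       // rest of the GUI goes into a fresh graph
  graph.Flush(drawn);                       // render the remainder
  assert(graph.Pending() == 0);
  return drawn;
}
```

The video output lands between the two halves of the GUI, which is why the final image matches the old immediate-mode ordering.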

The result of the above is the same as our current implementation; it's just split into different chunks. Playback/vis are confirmed working fine.

There are a few other changes in here as well, including some shader optimizations and general rendering improvements. One of the big ones was making diffuse color a per-texture rather than a per-vertex attribute, so that it can be sent once per texture.
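To illustrate why this matters, here is a rough sketch of the data layout, assuming a vertex format of position plus texture coordinates and a batch that carries a single RGBA diffuse color. `PackedVertex` is named in this PR, but the fields shown and the `Batch` struct are assumptions, not the actual definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed vertex layout once per-vertex color is dropped:
// position plus texture coordinates only.
struct PackedVertex {
  float x, y, z;
  float u, v;
};

// Hypothetical batch: one texture and one RGBA diffuse color shared by all
// of its vertices, uploaded once (e.g. as a shader uniform) per batch.
struct Batch {
  std::uint32_t texture = 0;
  float diffuse[4] = {1.0f, 1.0f, 1.0f, 1.0f};
  std::vector<PackedVertex> vertices;
};

// Bytes of color data sent per batch under each scheme.
std::size_t PerVertexColorBytes(const Batch& b) {
  return b.vertices.size() * 4 * sizeof(float);  // an RGBA float per vertex
}
std::size_t PerTextureColorBytes(const Batch& b) {
  return sizeof(b.diffuse);  // a single RGBA float uniform
}
```

With 400 vertices in a batch, the first function counts 400 RGBA values and the second counts one.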

Known issues:

- Missing DirectX support (the big one; hint ;)
- Dirty-region modes 1/2 are somewhat broken. Deferred rendering conflicts somewhat with the idea behind these modes.
- I'm sure I've missed a few render details.

Implementing:
There is some basic doxygen documentation (see RenderSystem.h), but it remains to be seen whether many changes are needed for DirectX. I've never touched it, so I'm not sure exactly how much it differs. In theory, it only requires adding upload/delete/render functions. For GL/GLES, it was basically just a copy/paste of the old texture rendering code, followed by a refactor to take advantage of the new functionality.
The renderer now takes on more responsibility for things that guilib used to handle, like clearing buffers, dirty-region handling, etc. I'll be very happy to help out with any porting questions.
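As a rough illustration of "only upload/delete/render", a backend interface might look like the sketch below. `IRenderBackend` and `CountingBackend` are hypothetical names and do not reflect the contents of RenderSystem.h; the counting backend stands in for a real GL/GLES/DX implementation so the example runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Opaque GPU texture handle; each backend decides what it really holds.
using RenderObject = void*;

// Hypothetical minimal backend interface: upload, delete, render.
class IRenderBackend {
 public:
  virtual ~IRenderBackend() = default;
  virtual RenderObject Upload(const std::vector<std::uint8_t>& pixels,
                              int width, int height) = 0;
  virtual void Delete(RenderObject obj) = 0;
  // A real backend would also take the batched vertices to draw.
  virtual void Render(RenderObject obj) = 0;
};

// Fake backend that only counts calls. A GL/GLES/DX backend would implement
// the same three entry points with real API calls.
class CountingBackend : public IRenderBackend {
 public:
  RenderObject Upload(const std::vector<std::uint8_t>& pixels, int width,
                      int height) override {
    assert(pixels.size() >= static_cast<std::size_t>(width) * height);
    const std::uintptr_t id = ++m_nextId;  // ids start at 1
    m_renders[id] = 0;
    return reinterpret_cast<RenderObject>(id);
  }
  void Delete(RenderObject obj) override { m_renders.erase(Id(obj)); }
  void Render(RenderObject obj) override { ++m_renders.at(Id(obj)); }

  std::size_t LiveTextures() const { return m_renders.size(); }
  int RenderCount(RenderObject obj) const { return m_renders.at(Id(obj)); }

 private:
  static std::uintptr_t Id(RenderObject obj) {
    return reinterpret_cast<std::uintptr_t>(obj);
  }
  std::uintptr_t m_nextId = 0;
  std::map<std::uintptr_t, int> m_renders;
};
```

Keeping the GUI-facing surface this small is what would let a DirectX port be written without touching guilib.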

Add a "RenderObject" member that represents an abstract GPU texture handle. It is defined as a void*, and each implementation should cast it to whatever it represents; for example, gl/gles both define it as a GLuint (unsigned int). It should NOT have a hard-coded build-time definition. Also, add some helper functions to make the texture easier to get at.
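A small sketch of how a GL backend might stash a texture name in the opaque handle and get it back. The helper names are hypothetical, and `GLuint` is aliased locally so the example builds without GL headers; going through `std::uintptr_t` is the usual way to carry an integer inside a `void*`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Local stand-in for the GL typedef so this builds without GL headers.
using GLuint = unsigned int;

// Opaque handle stored by the GUI; only the backend knows what it holds.
using RenderObject = void*;

// Hypothetical GL-side helpers: pack a texture name into the handle and
// unpack it again, going through std::uintptr_t in both directions.
inline RenderObject ToRenderObject(GLuint texture) {
  return reinterpret_cast<RenderObject>(static_cast<std::uintptr_t>(texture));
}

inline GLuint ToGLTexture(RenderObject obj) {
  const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(obj);
  assert(raw <= std::numeric_limits<GLuint>::max());
  return static_cast<GLuint>(raw);
}
```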

1. Take advantage of CGUITextureBase to handle our texture upload. GUITextureBase has its own backing store, which is deleted after upload since it may be undefined. Because of this, we keep a cached copy of the texture in GUIFontTTF and submit updates to the texture when needed.
2. Batch up the draws and send them to the scene graph.
3. Switch to a 2-byte texture. The first byte is solid, the second is the alpha value. When rendered, GL treats this as RGBA (255,255,255,a). We use a uniform color throughout rather than per-vertex color as before.
Combining these things, we can use our texture shaders with zero state changes. Note: the texture is copied with the correct stride for any desired format, so if other renderers would prefer rgb/rgba/bgra/etc, they can easily be substituted per renderer.
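The third step can be sketched as a per-row copy into a two-byte-per-texel buffer. This is an illustration rather than the GUIFontTTF code: `PackA8L8` is a hypothetical helper, and the source is assumed to be an 8-bit coverage bitmap. Taking the destination stride as a parameter is what lets the same copy target padded rows or a different per-renderer layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Copies an 8-bit glyph coverage bitmap into a 2-byte-per-texel buffer:
// byte 0 = 0xFF (solid luminance), byte 1 = the glyph's alpha. Sampled as
// GL_LUMINANCE_ALPHA, each texel reads as RGBA (255, 255, 255, a), so the
// final text color can come entirely from the diffuse uniform.
// Both strides are in bytes, so padded rows are handled.
void PackA8L8(const std::uint8_t* alpha, int width, int height,
              std::size_t srcStride, std::uint8_t* dst, std::size_t dstStride) {
  assert(dstStride >= static_cast<std::size_t>(width) * 2);
  for (int y = 0; y < height; ++y) {
    const std::uint8_t* srcRow = alpha + y * srcStride;
    std::uint8_t* dstRow = dst + y * dstStride;
    for (int x = 0; x < width; ++x) {
      dstRow[2 * x] = 0xFF;           // luminance: always solid
      dstRow[2 * x + 1] = srcRow[x];  // alpha: glyph coverage
    }
  }
}
```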

1. Don't clear. GL needs to clear as the last thing before it starts rendering, or else we can block needlessly while waiting for a buffer. Let the renderer handle it.
2. Don't worry about dirty regions or scissoring. Do a RenderPass, then give the results to the renderer. It will decide how best to deal with them.
3. Draw the dirty-region visualizers as part of the scene graph, just like any other GUI element.
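A sketch of the renderer side of point 2, assuming the GUI hands over a plain list of dirty rectangles. `Rect`, `Covers` and `PlanFrame` are hypothetical; the policy shown (clear and draw each region under its own scissor, or collapse to one full-screen pass when a region already covers the screen) is one reasonable choice, not necessarily what this PR does.

```cpp
#include <cassert>
#include <vector>

struct Rect {
  int x1, y1, x2, y2;
};

static bool Covers(const Rect& r, const Rect& screen) {
  return r.x1 <= screen.x1 && r.y1 <= screen.y1 &&
         r.x2 >= screen.x2 && r.y2 >= screen.y2;
}

// Hypothetical renderer-side planning: the GUI only reports dirty regions,
// and the renderer decides how to draw them. Each returned rect is meant to
// be cleared and drawn with a matching scissor right before its GPU work;
// if any region already covers the screen, one full-screen pass is enough.
std::vector<Rect> PlanFrame(const std::vector<Rect>& dirty, const Rect& screen) {
  assert(screen.x2 > screen.x1 && screen.y2 > screen.y1);
  for (const Rect& r : dirty)
    if (Covers(r, screen)) return {screen};
  return dirty;
}
```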

theuni · 2013-05-02T01:07:16Z

ping @topfs2 This is what we discussed a few nights ago. Please let me know if there's anything in the implementation that would impede a move to proper gpu projection/model operations. I'm hoping that it would only require extending BatchDraw to include matrix data.

ping @elupus. Same as the above. I know you worked on that at some point, I'd be curious to hear your thoughts.

ping @smspillaz. You asked for a ping when this was PR'd :)

ping @jmarshallnz Any input would be great.

xbmc/cores/VideoRenderers/OverlayRendererGL.cpp

@@ -561,6 +562,7 @@ void COverlayTextureGL::Render(SRenderState& state)
    col[i][0] = col[i][1] = col[i][2] = col[i][3] = 1.0f;
  }

+  glUniform4f(uniColLoc,(col[0][0]), (col[0][1]), (col[0][2]), (col[0][3]));


Don't keep a copy of vertex data, since it's undefined after it's been added anyway. Instead, calculate how many vertices are required, allocate exactly that many, then work directly on the destination's vector. TODO: Same thing for fonts; they're a killer.
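A minimal sketch of the count-then-fill approach, assuming each quad expands to exactly four vertices. `Quad` and `AppendQuads` are hypothetical names, and `PackedVertex` is given an assumed position-plus-UV layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PackedVertex {
  float x, y, z, u, v;
};

struct Quad {
  float x1, y1, x2, y2;  // hypothetical on-screen rectangle
};

// Writes every quad straight into `dest`. The exact vertex count is known
// up front (4 per quad), so the destination is resized once (at most one
// reallocation) and filled in place, with no temporary copy of the data.
void AppendQuads(const std::vector<Quad>& quads, std::vector<PackedVertex>& dest) {
  const std::size_t first = dest.size();
  dest.resize(first + quads.size() * 4);
  PackedVertex* out = dest.data() + first;
  for (const Quad& q : quads) {
    *out++ = {q.x1, q.y1, 0.0f, 0.0f, 0.0f};
    *out++ = {q.x2, q.y1, 0.0f, 1.0f, 0.0f};
    *out++ = {q.x2, q.y2, 0.0f, 1.0f, 1.0f};
    *out++ = {q.x1, q.y2, 0.0f, 0.0f, 1.0f};
  }
  assert(out == dest.data() + dest.size());
}
```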

theuni · 2013-05-07T08:29:08Z

I believe I've addressed most of the concerns here (the one that remains is moving the pixel backing store to the Texture, but I'd prefer to hold off on that until we can decide how to get font-rendering CPU usage under control).

The commits here are a bit messy because the interfaces are still a bit hazy, but I think it's approaching something sensible. I've done my best to eliminate any ping-pong'ing done during development, but some things may still look a bit out of order where I've squashed them or moved them around here.

I also reworked quite a bit, and I'm happier with the result now. Profiling the new batching methods made it easy to see that the bulk of the effort was spent on creating, allocating, copying, and destroying buffers. After playing with a few different models, I think I've found an approach that works reasonably well. The vectors of batch data have become vectors of shared_ptrs to batch data. This largely eliminates the alloc/copy burden, and most of it is handled behind the scenes.
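A sketch of the shared_ptr approach, with hypothetical `Batch`/`SceneGraph` types. The point is that the graph's vector holds pointers, so growing or passing it around copies pointers rather than vertex data, and `SetVertices()` (modeled here as taking an rvalue reference, which is an assumption) moves a vertex vector in without copying its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct PackedVertex {
  float x, y, z, u, v;
};

// Hypothetical batch. SetVertices() takes an rvalue and moves it in, which
// hands over the vector's buffer instead of copying each element.
class Batch {
 public:
  void SetVertices(std::vector<PackedVertex>&& vertices) {
    m_vertices = std::move(vertices);
  }
  const std::vector<PackedVertex>& Vertices() const { return m_vertices; }

 private:
  std::vector<PackedVertex> m_vertices;
};

using BatchPtr = std::shared_ptr<Batch>;

// The graph stores pointers: growing or passing the list around copies
// pointers and bumps reference counts, never vertex data.
class SceneGraph {
 public:
  void Add(BatchPtr batch) { m_batches.push_back(std::move(batch)); }
  std::size_t Size() const { return m_batches.size(); }
  const BatchPtr& At(std::size_t i) const {
    assert(i < m_batches.size());
    return m_batches[i];
  }

 private:
  std::vector<BatchPtr> m_batches;
};
```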

While I was at it, I also moved BatchDraw into its own class, mainly just for future-proofing and cleanliness.

Here (ignoring the wonky coords order, I left that as-is) is a pretty good representation of the zero-copy method, which is (imo) what we should be shooting for.

The last big thing on my radar is trying to get font calculating/batching/copying under control, but I'm beginning to wonder if we would be better off solving it at a higher level (flagging when text needs to render but has not changed since the last frame, so that cached batch data could be used). If that is a reasonable approach, then I think refactoring font batching would largely be a waste of time.
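One way the higher-level caching idea could look, as a hypothetical sketch: a label keeps the batch it produced last frame and only redoes layout when its inputs change. `CachedLabel` and the fixed-width `Layout()` stand-in are assumptions; a real version would also need to invalidate on font, color, or scroll changes.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct PackedVertex {
  float x, y, z, u, v;
};

using VertexBuffer = std::shared_ptr<const std::vector<PackedVertex>>;

// Hypothetical per-label cache: layout is redone only when the text or its
// position changes; otherwise last frame's batch is handed back as-is.
class CachedLabel {
 public:
  VertexBuffer GetBatch(const std::string& text, float x, float y) {
    if (!m_cached || text != m_text || x != m_x || y != m_y) {
      m_text = text;
      m_x = x;
      m_y = y;
      m_cached = Layout(text, x, y);
      ++m_layouts;
    }
    return m_cached;
  }

  int Layouts() const { return m_layouts; }

 private:
  // Stand-in for real glyph layout: one 10x10 quad per character.
  static VertexBuffer Layout(const std::string& text, float x, float y) {
    auto verts = std::make_shared<std::vector<PackedVertex>>();
    verts->reserve(text.size() * 4);
    for (std::size_t i = 0; i < text.size(); ++i) {
      const float left = x + 10.0f * static_cast<float>(i);
      const float right = left + 10.0f;
      verts->push_back({left, y, 0.0f, 0.0f, 0.0f});
      verts->push_back({right, y, 0.0f, 1.0f, 0.0f});
      verts->push_back({right, y + 10.0f, 0.0f, 1.0f, 1.0f});
      verts->push_back({left, y + 10.0f, 0.0f, 0.0f, 1.0f});
    }
    assert(verts->size() == text.size() * 4);
    return verts;
  }

  std::string m_text;
  float m_x = 0.0f, m_y = 0.0f;
  VertexBuffer m_cached;
  int m_layouts = 0;
};
```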

stupid-boy · 2013-05-07T21:18:52Z

I assume your branch is working?
I ask because here (on my setup) things are getting worse. With mainline code from a few days ago I get around 35-36 fps in System Info, and the RSS text on the main screen scrolls smoothly. With code from your fork I get ~22-23 fps, the text stutters, and the transition animations between screens in the System menu stutter too. In some places where I previously got ~30% CPU load, I now get 60+%. Vsync doesn't change anything, because in both cases the demand is for a higher refresh rate than my system can produce.
My RPi is 'ondemand'-clocked at 1000 (and is >97% loaded during this test); the selected and native resolution on my TV is 1920x1080@60, via HDMI. Release build.

theuni · 2013-05-07T21:30:06Z

You're probably being hit with the extra cost of text processing mentioned above. Should be easy enough to test that theory. Find 2 screens that are similar except that one has more text than the other. If you're interested, gdb would probably give a quick peek into what's going on under the hood if you just randomly break a few times and see what function the main thread is in.

I'm still thinking on how to improve this.

stupid-boy · 2013-05-08T05:32:32Z

Maybe. Text rendering definitely has something to do with it. The lowest load I get is in the main System Settings window, where the only text is the menu on the left side. The last commit definitely makes a difference. Before: more stutter in the RSS text, less visible in window transition animations. Now: a little less in the text, a little more in transitions. Roughly the same fps and CPU load for both commits.

Just as a reminder, @popcornmix reported above ~70 fps on the System Info screen in a debug build, the same screen where I get ~22-23 in a release build. This leads me to the personal conclusion that there are two hardware versions that behave differently. I have previously seen reports that exactly the same image behaves differently regardless of vsync: for some it just works, for others, including me, it doesn't. Maybe I had the 'bad' luck to own the slower version, but that luck is useful for testing on this platform too. That is why I report my findings as well.
These are just personal feelings about hardware versions, and I could be wrong.

jmarshallnz · 2014-07-12T05:43:09Z

@smspillaz in case you might have time to look into this again at some point (or you, of course @theuni :) ) I've rebased it up here:

https://github.com/jmarshallnz/xbmc/tree/deferred_renderer

I likely won't have time to actually work on it short-term, but at least it builds (it doesn't actually work, of course...)

theuni · 2014-07-12T17:21:34Z

@jmarshallnz Heh, I would actually love to get this in at some point because I spent so many days/nights cursing at the main-thread rendering (I have another branch somewhere that builds on top of this one, moving the rendering out of the main thread, but it was just an experimental hack). But this isn't a high priority for me right now :)

That said, it'd probably be more useful to use this PR as a guideline rather than trying to get it back into working order. IIRC, profiling showed an absurd amount of unnecessary vertex data copying going on. The premise was good, though.

It could probably be broken into ~3 stages though, so it could be added piecemeal:

1. Define a data structure for a list of vertices. Here it was a vector of PackedVertex; vector is probably far too heavy for the purpose. For the controls that render themselves (fonts, textures, etc.), load their vertices into that data structure and pull them back out to render. This seems simple enough, but it's a bit of a challenge to define an efficient structure (without nasty ifdefs) that all engines can use.
2. Create a render manager that does immediate renders of the vertices+textures passed in, for each rendering engine. This could likely be copy/pasted from the texture code for the most part. This should allow the gl/gles/dx renderers to be written pretty quickly without too many regressions.
3. Switch to deferred rendering. This is the tricky part where I stalled: GL/GLES worked, but I couldn't do DX. I wonder if it might be possible to do this on a per-engine basis, so that one engine wouldn't hold up the rest.
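The immediate and deferred stages can share one interface, which is also what would allow switching to deferred rendering per engine. A hypothetical sketch: the GUI only calls `Submit()` and `EndFrame()`, and each engine picks immediate or deferred behavior independently. All names here are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct DrawCall {
  std::string what;
};

// Hypothetical engine interface: the GUI only ever calls Submit() and
// EndFrame(), so it cannot tell immediate and deferred engines apart.
class RenderEngine {
 public:
  virtual ~RenderEngine() = default;
  virtual void Submit(DrawCall call) = 0;
  virtual void EndFrame() = 0;
  const std::vector<std::string>& Log() const { return m_log; }

 protected:
  void Draw(const DrawCall& c) { m_log.push_back(c.what); }  // "GPU" draw

 private:
  std::vector<std::string> m_log;
};

// Immediate stage: draw as soon as a control submits.
class ImmediateEngine : public RenderEngine {
 public:
  void Submit(DrawCall call) override { Draw(call); }
  void EndFrame() override {}
};

// Deferred stage: queue during the GUI pass, draw everything at frame end.
class DeferredEngine : public RenderEngine {
 public:
  void Submit(DrawCall call) override { m_queue.push_back(std::move(call)); }
  void EndFrame() override {
    for (const DrawCall& c : m_queue) Draw(c);
    m_queue.clear();
    assert(m_queue.empty());
  }
  std::size_t Queued() const { return m_queue.size(); }

 private:
  std::vector<DrawCall> m_queue;
};
```

Both engines produce the same draws in the same order by the end of a frame; only the timing differs, so a DX engine could stay immediate while GL/GLES go deferred.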

jmarshallnz · 2014-07-12T22:27:57Z

Cheers @theuni - agreed that the first steps would be moving the (small amount of) engine-specific texture/font code into the renderer, so that it's all in one spot to begin with and guilib etc. is free of specifics.

Cory Fields added 21 commits May 1, 2013 20:29

rendermanager: add SceneGraph and abstracted render functions for gl/gles

b8f0241

rendermanager: remove GUITexture inheritance and send batched regions to the scene graph

64e2f9a

rendermanager: nuke old gl/gles texture renderers

bee2e88

rendermanager: fix renamed classes

eb1c87a

rendermanager: remove old texture uploader

537613e

rendermanager: add A8L8 texture format

e2e1193

This represents GL_LUMINANCE_ALPHA in GL-speak. Suitable for our fonts.

rendermanager: remove old font upload/render classes

4740447

rendermanager: remove old font-texture cleanup functions from TextureManager

9849819

rendermanager: add gles uniform for diffuse color

a22a761

We no longer have any need for per-vertex colors in the gui, so use a uniform as a significant performance gain

rendermanager: remove deleted files from build

d346a26

rendermanager: CBaseTexture fixups

a0a7841

rendermanager: use scene-graph DrawQuad

ef00d85

rendermanager: update shaders to use uniform color

ab27baf

Also change multiply order on multi+blend to save an op

rendermanager: abstract picture slideshow

f6a89ac

rendermanager: hack. Draw the current scene before rendering video

dc63218

This is a hack to avoid having to rewrite all of the current video renderers.

rendermanager: use the same video-render hack for render controls

a068908

rendermanager: fix dirty region solvers to report the whole screen when necessary

9864acb

This was not working correctly. Always report the whole screen as dirty for modes 0/3, so that the renderer can always simply render the dirty region list, and it will always show the full story.

rendermanager: add some basic doxy

7a5b9e3

smspillaz reviewed May 2, 2013

Cory Fields added 13 commits May 6, 2013 20:09

rendermanager: switch to vectors of shared_ptr

2d9df1b

This almost eliminates the copying around of vector elements, which was proving to be very expensive. Also added SetVertices() for a cheap means of moving all PackedVertices into a Batch

rendermanager: change fonts to use shared_ptr

42325bb

rendermanager: update textures to use shared_ptr

349e5db

rendermanager: update picture slideshow to use shared_ptr

3ee3756

rendermanager: update gles renderer to use shared_ptr

14f017d

rendermanager: no need for casts here

54285e0

rendermanager: forward-declare to avoid including SceneGraph.h

4036754

rendermanager: use a lighter include

9227817

rendermanager: remove unnecessary include

055e852

rendermanager: fixups for gl renderer after review and structural changes

62bbc3e

rendermanager: forward-declare to avoid an include in GUITexture.h

1356c1a

rendermanager: plug a small mem leak

222b24b

theuni mentioned this pull request May 7, 2013

Add buffering to video renderers (without dropping) #2309

Closed

Memphiz and others added 2 commits May 8, 2013 23:44

[osx/ios/atv2] - sync xcode projects

82c9e03

Merge pull request #5 from Memphiz/rendermanager-master

86c6604

[osx/ios/atv2] - sync xcode projects

FernetMenta mentioned this pull request May 15, 2013

ASIC hang back again OpenELEC/OpenELEC.tv#2296

Closed

theuni referenced this pull request in opdenkamp/xbmc-pvr-addons Jun 9, 2013

vnsi: add gles rendering for vdr ui

298fcd3

MartijnKaijser closed this May 23, 2015

MartijnKaijser modified the milestones: Temporary freezer until devs have time, Abandoned, obsolete or superseeded May 23, 2015
