GUI Renderer abstraction and move to deferred rendering #2681

Closed
wants to merge 58 commits into from

Conversation

theuni
Contributor

@theuni theuni commented May 2, 2013

This removes all render operations from guilib in favor of an abstracted scene-graph that receives batched render data for later render. When guilib has a texture/font ready for render, it packs up the data and adds it to the graph. At the end of each frame, that graph is evaluated by the renderer and drawn all in one go.
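As a rough illustration of the idea, here is a minimal sketch of a deferred scene graph. All type and method names (`DrawBatch`, `SceneGraph`, `Renderer`, `CountingRenderer`) are hypothetical and are not the classes in this PR; the sketch only shows the GUI queueing data and a backend drawing it in one pass at the end of the frame.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the PR's actual classes.
struct Vertex { float x, y, u, v; };

struct DrawBatch {
  unsigned int texture;          // backend texture handle
  std::vector<Vertex> vertices;  // geometry packed by the GUI
};

// Backend interface: the only place that would touch GL/GLES/DirectX.
class Renderer {
public:
  virtual ~Renderer() {}
  virtual void Draw(const DrawBatch& batch) = 0;
};

// The GUI only appends to the graph; nothing is drawn until Flush().
class SceneGraph {
public:
  void Add(DrawBatch batch) { m_batches.push_back(std::move(batch)); }
  std::size_t Size() const { return m_batches.size(); }

  // Called once at the end of the frame: draw everything in submission order.
  void Flush(Renderer& renderer) {
    for (const DrawBatch& batch : m_batches)
      renderer.Draw(batch);
    m_batches.clear();
  }

private:
  std::vector<DrawBatch> m_batches;
};

// Stand-in backend that just counts draw calls.
class CountingRenderer : public Renderer {
public:
  int draws = 0;
  void Draw(const DrawBatch&) override { ++draws; }
};
```

With this shape, adding two batches leaves `draws` at zero until `Flush()` is called, after which the graph is empty again.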

GL/GLES have been implemented, but directx is missing. Verified working on osx/linux/android. The scene-graph is very basic and intentionally non-robust. I think it wise to wait for a directx implementation before trying to settle on anything final.

Benefits:

  • New renderers become possible (not only dx/gl/gles), as our GUI is now completely independent of rendering code. Our current gl/gles renderers are implemented here, but directx is missing.
  • We can be MUCH smarter and more efficient with our rendering. We can look at an entire scene at once and decide to cull invisible objects or batch similar ones. Some of those things have been implemented here, but for the most part I tried to keep it similar to past behavior.
  • Gives a big speedup when moving through the render loop, as we're now skipping lots of small blocking graphics operations.
  • g_graphicsContext could be eliminated for the most part, since graphics operations are now asynchronous and threadsafe.
  • Rendering could (and should) be moved to its own thread since all operations are now free of gui processing routines.
  • Possible to include as an external api for plugins and binary add-ons so that they can use the renderer of their choice, and simply give us the results to display.
  • Makes it much easier to decouple windowing from rendering, leading to the possibility of changing the renderer on the fly (gl/gles/glesv3).
  • lots more :)
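To make the "batch similar ones" point concrete, here is a small sketch of one possible optimisation once the whole scene is visible at once. It assumes batches are independent lists of quads or triangles (not strips), and that `DrawBatch` and `MergeAdjacent` are hypothetical names rather than anything in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, u, v; };

struct DrawBatch {
  unsigned int texture;
  std::vector<Vertex> vertices;
};

// Merge runs of consecutive batches that share a texture into one batch.
// Only adjacent batches are merged, so the original draw order (and therefore
// blending) is preserved.
std::vector<DrawBatch> MergeAdjacent(const std::vector<DrawBatch>& in) {
  std::vector<DrawBatch> out;
  for (const DrawBatch& batch : in) {
    if (!out.empty() && out.back().texture == batch.texture) {
      std::vector<Vertex>& dst = out.back().vertices;
      dst.insert(dst.end(), batch.vertices.begin(), batch.vertices.end());
    } else {
      out.push_back(batch);
    }
  }
  return out;
}
```

Four batches using textures 7, 7, 9, 7 would come out as three draws: the first two are merged, while the last one stays separate because merging it would reorder it ahead of texture 9.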

This does not re-implement our video renderers. I would like to have included those as well, but it's simply too much to knock out at once. Instead, the scene-graph was created in such a way that it can be rendered partially, multiple times. So for operations where we temporarily lose control of our renderer (video playback or visualizations for example), the flow looks like this:

  • clear buffer
  • run through the render loop, adding controls to the scene graph
  • hit a render/video/fullscreen control
  • render the current scene-graph
  • let the video renderer do its thing
  • finish the gui loop, adding controls to a fresh scene graph
  • render the current scene-graph
  • repeat

The result of the above is the same as our current implementation; it's just moved into different chunks. Playback/vis are confirmed working fine.
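The partial-render flow can be sketched roughly as follows. This is a simplified model, not the PR's code: `Control`, `Kind` and `RenderFrame` are made-up names, and the "drawing" is just a log of the order in which things would reach the screen.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical control kinds: ordinary GUI controls, and a control that hands
// the screen to an external renderer (video, visualisation).
enum class Kind { Gui, Video };

struct Control { Kind kind; std::string name; };

// Walk the controls once, recording the order in which things reach the screen.
std::vector<std::string> RenderFrame(const std::vector<Control>& controls) {
  std::vector<std::string> log;
  std::vector<std::string> graph;  // pending (deferred) GUI draws

  auto flush = [&]() {
    for (const std::string& item : graph)
      log.push_back("draw " + item);
    graph.clear();                 // continue with a fresh scene graph
  };

  log.push_back("clear");
  for (const Control& c : controls) {
    if (c.kind == Kind::Video) {
      flush();                           // render what is queued so far
      log.push_back("video " + c.name);  // external renderer draws immediately
    } else {
      graph.push_back(c.name);           // defer ordinary controls
    }
  }
  flush();                               // render the rest of the GUI
  return log;
}
```

For a background, a video control and an OSD, the log comes out as clear, background, video, OSD, which is the same on-screen order as immediate rendering would give.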

There are a few other changes in here as well including some shader optimizations and general rendering improvements. One of the big ones was making diffuse-color a per-texture rather than per-vertex feature, so that it can be sent once per texture.
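A rough sketch of why the per-texture colour helps: the vertex gets smaller and the colour is sent once per batch. The field layouts below are assumptions for illustration only (the real vertex structures may differ), and `PerVertexBytes`/`PerBatchBytes` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Old-style layout (assumed): every vertex carries its own RGBA colour.
struct ColoredVertex {
  float x, y, z;
  float u, v;
  std::uint8_t r, g, b, a;
};

// New-style layout (assumed): position and texture coordinates only.
struct PackedVertex {
  float x, y, z;
  float u, v;
};

struct Color { std::uint8_t r, g, b, a; };

// One colour per batch; a backend would upload it once, e.g. as a shader uniform.
struct Batch {
  Color diffuse;
  std::vector<PackedVertex> vertices;
};

// Bytes of vertex data needed for `quads` quads (4 vertices each).
std::size_t PerVertexBytes(std::size_t quads) {
  return quads * 4 * sizeof(ColoredVertex);
}
std::size_t PerBatchBytes(std::size_t quads) {
  return quads * 4 * sizeof(PackedVertex) + sizeof(Color);
}
```

The saving per vertex is small, but it applies to every vertex of every quad, and the shader no longer needs a per-vertex colour attribute.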

Known issues:

  • Missing directx support (the big one. hint ;)
  • Somewhat broken dirty-region mode 1/2. Deferred rendering conflicts somewhat with the idea behind these modes.
  • I'm sure I've missed a few render details

Implementing:
There is some basic doxy (see RenderSystem.h), but it remains to be seen if many changes are needed for directx. I've never touched it, so I'm not sure exactly how much it differs. In theory, it only requires adding upload/delete/render functions. For GL/GLES, it was basically just a c/p of the old texture rendering code, then a refactor to take advantage of the new functionality.
The renderer now takes on more responsibility for things that guilib used to handle, like clearing buffers, dirty-region handling, etc. I'll be very happy to help out with any porting questions.
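For anyone porting, the upload/delete/render split might look something like the sketch below. This is not the interface in RenderSystem.h; `RenderBackend`, `TextureHandle` and `NullBackend` are hypothetical names, and the stand-in backend only tracks state so the shape can be exercised without a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

struct Vertex { float x, y, u, v; };

// Opaque texture handle; each backend decides what it really is.
typedef void* TextureHandle;

// Hypothetical backend interface with the three operations mentioned above.
class RenderBackend {
public:
  virtual ~RenderBackend() {}
  virtual TextureHandle Upload(const std::uint8_t* pixels, int width, int height) = 0;
  virtual void Delete(TextureHandle texture) = 0;
  virtual void Render(TextureHandle texture, const std::vector<Vertex>& vertices) = 0;
};

// Stand-in backend with no GPU behind it: it hands out fake handles and
// counts the vertices it was asked to draw for textures that are still alive.
class NullBackend : public RenderBackend {
public:
  TextureHandle Upload(const std::uint8_t*, int, int) override {
    TextureHandle handle = reinterpret_cast<TextureHandle>(m_next++);
    m_live.insert(handle);
    return handle;
  }
  void Delete(TextureHandle texture) override { m_live.erase(texture); }
  void Render(TextureHandle texture, const std::vector<Vertex>& vertices) override {
    if (m_live.count(texture))
      m_verticesDrawn += vertices.size();
  }
  std::size_t LiveTextures() const { return m_live.size(); }
  std::size_t VerticesDrawn() const { return m_verticesDrawn; }

private:
  std::uintptr_t m_next = 1;
  std::set<TextureHandle> m_live;
  std::size_t m_verticesDrawn = 0;
};
```

A DirectX port would, under this assumption, be another subclass that maps the same three calls onto its own texture and draw primitives.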

Cory Fields added 21 commits May 1, 2013 20:29
Add a "RenderObject" member that represents an abstract gpu texture handle.
It is defined as a void*, and should be cast by each implementation to whatever
it represents. For example, gl/gles both define it as a GLuint (unsigned int).

It should NOT have a hard-coded build-time definition.

Also, add some helper functions for getting at the texture more easily.
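A minimal sketch of the cast described in this commit message, assuming the handle is a plain `void*`. `GLuintLike` is a stand-in for `GLuint` so the example builds without GL headers, and the two helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Generic handle as described above: opaque to guilib.
typedef void* RenderObject;

// Stand-in for GLuint (an unsigned int) so the sketch needs no GL headers.
typedef unsigned int GLuintLike;

// Store an integer texture name inside the opaque handle.
inline RenderObject ToRenderObject(GLuintLike id) {
  return reinterpret_cast<RenderObject>(static_cast<std::uintptr_t>(id));
}

// Recover the integer texture name inside the GL/GLES implementation.
inline GLuintLike FromRenderObject(RenderObject object) {
  return static_cast<GLuintLike>(reinterpret_cast<std::uintptr_t>(object));
}
```

Going through `std::uintptr_t` avoids truncation warnings on 64-bit builds; a backend whose handle is a real pointer could store it in the `void*` directly.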
This represents GL_LUMINANCE_ALPHA in GL-speak. Suitable for our fonts.
1. Take advantage of CGUITextureBase to handle our texture upload.
   GUITextureBase has its own backing store, which is deleted after upload
   since it may be undefined. Because of this, we keep a cached copy of the
   texture in GUIFontTTF and submit the updates to the texture when needed.

2. Batch up the draws and send them to the scene graph.

3. Switch to a 2-byte texture. First byte is solid, second is the alpha value.
   When rendered, this will be treated by GL as RGBA (255,255,255,a). We use
   a uniform color throughout rather than per-vertex as before. Combining these
   things, we can use our texture shaders with zero state changes.

   Note: The texture is copied with the correct stride for any desired format,
   so if other renderers would prefer rgb/rgba/bgra/etc, they can easily be
   substituted per-renderer.
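The 2-byte layout from point 3 can be sketched as below. It assumes the glyph cache holds an 8-bit coverage bitmap with its own row stride; `ToLuminanceAlpha` is a hypothetical helper, not the function in GUIFontTTF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand an 8-bit coverage bitmap (one byte per pixel) to two bytes per pixel:
// the first byte is solid (255), the second is the alpha value.
// `srcStride` is the distance in bytes between source rows, which may be
// larger than `width` if rows are padded.
std::vector<std::uint8_t> ToLuminanceAlpha(const std::uint8_t* coverage,
                                           int width, int height,
                                           std::size_t srcStride) {
  std::vector<std::uint8_t> out(static_cast<std::size_t>(width) * height * 2);
  for (int y = 0; y < height; ++y) {
    const std::uint8_t* src = coverage + y * srcStride;
    std::uint8_t* dst = out.data() + static_cast<std::size_t>(y) * width * 2;
    for (int x = 0; x < width; ++x) {
      dst[2 * x] = 255;         // luminance: always solid
      dst[2 * x + 1] = src[x];  // alpha: glyph coverage
    }
  }
  return out;
}
```

A renderer that prefers another format could swap this for an equivalent routine that writes 3 or 4 bytes per pixel, as the note above suggests.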
We no longer have any need for per-vertex colors in the gui, so use a uniform
for a significant performance gain
1.  Don't clear. GL needs to do this last thing before starting to render, else
    we can block needlessly while waiting for a buffer. Let the renderer handle
    it.

2.  Don't worry about dirty regions or scissoring. Do a RenderPass, then give
    the results to the renderer. It will know best how to deal with the results.

3.  Draw the dirty region visualizers as part of the scene graph, just like any
    other GUI element.
Also change multiply order on multi+blend to save an op
This is a hack to avoid having to rewrite all of the current video renderers.
…en necessary

This was not working correctly. Always report the whole screen as dirty for
modes 0/3, so that the renderer can always simply render the dirty region list,
and it will always show the full story.
@theuni
Contributor Author

theuni commented May 2, 2013

ping @topfs2 This is what we discussed a few nights ago. Please let me know if there's anything in the implementation that would impede a move to proper gpu projection/model operations. I'm hoping that it would only require extending BatchDraw to include matrix data.

ping @elupus. Same as the above. I know you worked on that at some point, I'd be curious to hear your thoughts.

ping @smspillaz. You asked for a ping when this was PR'd :)

ping @jmarshallnz Any input would be great.

@@ -561,6 +562,7 @@ void COverlayTextureGL::Render(SRenderState& state)
col[i][0] = col[i][1] = col[i][2] = col[i][3] = 1.0f;
}

glUniform4f(uniColLoc,(col[0][0]), (col[0][1]), (col[0][2]), (col[0][3]));

This comment was marked as spam.


Cory Fields added 13 commits May 6, 2013 20:09
This almost eliminates the copying around of vector elements, which was proving
to be very expensive.

Also added SetVertices() for a cheap means of moving all PackedVertices into a
Batch
Don't keep a copy of vertex data, since it's undefined after it's been added
anyway. Instead, calculate the number of vertices required, allocate
exactly that many, then work directly on the destination's vector.

TODO: Same thing for fonts, they're a killer.
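The approach in this commit message could look roughly like the sketch below: size the destination once, then write vertices in place rather than building a temporary and copying it. `Rect`, `AppendQuads` and the `PackedVertex` layout are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PackedVertex { float x, y, z, u, v; };  // assumed layout
struct Rect { float x1, y1, x2, y2; };

// Append one quad per rectangle straight into `dst`, with no temporary copy.
void AppendQuads(const std::vector<Rect>& rects, std::vector<PackedVertex>& dst) {
  const std::size_t first = dst.size();
  dst.resize(first + rects.size() * 4);  // grow once to the required size
  PackedVertex* out = dst.data() + first;
  for (const Rect& r : rects) {
    *out++ = PackedVertex{r.x1, r.y1, 0.0f, 0.0f, 0.0f};  // top-left
    *out++ = PackedVertex{r.x2, r.y1, 0.0f, 1.0f, 0.0f};  // top-right
    *out++ = PackedVertex{r.x2, r.y2, 0.0f, 1.0f, 1.0f};  // bottom-right
    *out++ = PackedVertex{r.x1, r.y2, 0.0f, 0.0f, 1.0f};  // bottom-left
  }
}
```

Compared with building a local vector and appending it, this touches each vertex once and performs at most one reallocation per call.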
@theuni
Contributor Author

theuni commented May 7, 2013

I believe I've addressed most of the concerns here (the one that remains is moving the pixel backing store to the Texture, but I'd prefer to hold on that until we can decide how to get font render cpu usage under control first).

The commits here are a bit messy because the interfaces are still a bit hazy, but I think it's approaching something sensible. I've done my best to eliminate any ping-pong'ing done during development, but some things may still look a bit out of order where I've squashed them or moved them around here.

I also reworked quite a bit, and I'm happier with the result now. In profiling the new batching methods, it was easy to see that the bulk of the effort was spent on creating, allocating, copying, and destroying buffers. After playing with a few different models, I think I've found an approach that works reasonably well. The vectors of batch data have moved to vectors of shared_ptr's of batch data. This greatly reduces the alloc/copy burden, and most of it is handled behind the scenes.
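A minimal sketch of the shared_ptr model, using `std::shared_ptr` for illustration (the branch itself may use a different smart-pointer type). `Batch`, `BatchPtr` and this `SceneGraph` are hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct PackedVertex { float x, y, z, u, v; };  // assumed layout

struct Batch {
  unsigned int texture;
  std::vector<PackedVertex> vertices;
};

typedef std::shared_ptr<Batch> BatchPtr;

class SceneGraph {
public:
  // Stores a pointer only; the vertex data itself is never copied.
  void Add(const BatchPtr& batch) { m_batches.push_back(batch); }
  const std::vector<BatchPtr>& Batches() const { return m_batches; }
  void Clear() { m_batches.clear(); }

private:
  std::vector<BatchPtr> m_batches;
};
```

Adding a batch bumps a reference count instead of copying its vertex vector, and the buffer is freed automatically once both the control and the graph have dropped their references.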

While I was at it, I also moved BatchDraw into its own class, mainly just for future-proofing and cleanliness.

Here (ignoring the wonky coords order, I left that as-is) is a pretty good representation of the zero-copy method, which is (imo) what we should be shooting for.

The last big thing on my radar is trying to get font calculating/batching/copying under control, but I'm beginning to wonder if we would be better off solving it at a higher level (flagging when text needs to render but has not changed since last frame, so cached batch data could be used). If that is a reasonable approach, then I think refactoring font batching would largely be a waste of time.
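The higher-level caching idea might be sketched like this. `CachedLabel` is a hypothetical class, and the "layout" step is a placeholder (one quad per character) standing in for the real glyph layout.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct PackedVertex { float x, y, z, u, v; };  // assumed layout
typedef std::vector<PackedVertex> VertexList;

// Hypothetical label that only rebuilds its vertex data when the text changes.
class CachedLabel {
public:
  void SetText(const std::string& text) {
    if (text != m_text) {
      m_text = text;
      m_cache.reset();  // invalidate: next GetBatch() rebuilds
    }
  }

  // Returns the cached vertices, rebuilding them only after a text change.
  std::shared_ptr<const VertexList> GetBatch() {
    if (!m_cache) {
      std::shared_ptr<VertexList> built = std::make_shared<VertexList>();
      built->resize(m_text.size() * 4);  // placeholder: one quad per character
      m_cache = built;
      ++m_rebuilds;
    }
    return m_cache;
  }

  int Rebuilds() const { return m_rebuilds; }

private:
  std::string m_text;
  std::shared_ptr<const VertexList> m_cache;
  int m_rebuilds = 0;
};
```

Text that only moves (scrolling, for example) would still need some way to apply an offset without rebuilding, which this sketch does not attempt.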

@stupid-boy
Contributor

Is your branch supposed to be working?
I ask because on my setup things got worse. With mainline code from a few days ago I get around 35-36 fps in system info, and the RSS text on the main screen scrolls smoothly. With the code from your fork I get ~22-23 fps, the text stutters, and the transition animations between screens in the system menu stutter too. In some places where I previously got ~30% CPU load, I now get 60+%. Vsync doesn't change anything, because in both cases the demanded refresh rate is higher than my system can produce.
My RPi uses the 'ondemand' governor clocked at 1000 (and is >97% loaded during this test); the selected and native resolution of my TV is 1920x1080@60, via HDMI. Release build.

@theuni
Contributor Author

theuni commented May 7, 2013

You're probably being hit with the extra cost of text processing mentioned above. Should be easy enough to test that theory. Find 2 screens that are similar except that one has more text than the other. If you're interested, gdb would probably give a quick peek into what's going on under the hood if you just randomly break a few times and see what function the main thread is in.

I'm still thinking on how to improve this.

@stupid-boy
Contributor

Maybe. Text rendering definitely plays a part here. The lowest load I get is in the main system settings window, where the only text is the menu on the left side. The last commit definitely makes a difference. Before: more stutter in the RSS text, less visible stutter in the window transition animations. Now: a little less in the text, a little more in the transitions. Roughly the same fps and CPU load for both commits.

Just a reminder: @popcornmix reported above ~70 fps on the system info screen in a debug build, the same screen where I get ~22-23 in release. This leads me, once again, to the personal conclusion that there are two hardware versions that behave differently. I have previously seen reports that exactly the same image behaves differently regardless of vsync: for some it just works; for others, including me, it doesn't. Maybe I had the bad luck to get the slower version, but that at least makes it possible to test on this variant too, which is why I report my findings.
These are just my personal impressions about hardware versions, and I could be wrong.

theuni referenced this pull request in opdenkamp/xbmc-pvr-addons Jun 9, 2013
@jmarshallnz
Contributor

@smspillaz in case you might have time to look into this again at some point (or you, of course, @theuni :) ), I've rebased it up here:

https://github.com/jmarshallnz/xbmc/tree/deferred_renderer

I likely won't have time to actually work on it short-term, but at least it builds (it doesn't actually work, of course...)

@theuni
Contributor Author

theuni commented Jul 12, 2014

@jmarshallnz Heh, I would actually love to get this in at some point because I spent so many days/nights cursing at the main-thread rendering (I have another branch somewhere that builds on top of this one, moving the rendering out of the main thread, but it was just an experimental hack). But this isn't a high priority for me right now :)

That said, it'd probably be more useful to use this PR as a guideline rather than getting it back in working order. IIRC profiling showed an absurd amount of unnecessary vertex data copying going on. The premise was good though.

It could probably be broken into ~3 stages though, so it could be added piecemeal:

  • Define a data structure for a list of vertices. Here it was a vector of PackedVertex's. vector is probably far too heavy for the purpose. For the controls that render themselves (fonts, textures, etc), load up their vertices into that data structure and pull them back out to render. This seems simple enough, but it's a bit of a challenge to define an efficient structure (without nasty ifdefs) that all engines can use.
  • Create a render manager that does immediate renders of vertices+textures passed in for each rendering engine. This could likely be c/p from textures for the most part. This should allow the gl/gles/dx renderers to be written pretty quickly without too many regressions.
  • Switch to deferred rendering. This is the tricky part where I stalled. GL/GLES worked but I couldn't do DX. I wonder if it may be possible to do this on a per-engine basis, so that one wouldn't hold up the rest.
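One way the per-engine idea in the last stage could be shaped is sketched below: a render manager that draws immediately for engines that have not been ported yet, and queues draws for those that have. Everything here (`Engine`, `SupportsDeferred`, `RenderManager`, `FakeEngine`) is a hypothetical illustration, not a proposal for the actual interface.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vertex { float x, y, u, v; };

struct DrawCall {
  unsigned int texture;
  std::vector<Vertex> vertices;
};

// Hypothetical engine interface: each engine says whether it can be deferred.
class Engine {
public:
  virtual ~Engine() {}
  virtual bool SupportsDeferred() const = 0;
  virtual void Draw(const DrawCall& call) = 0;
};

class RenderManager {
public:
  explicit RenderManager(Engine& engine) : m_engine(engine) {}

  // Queue the draw if the engine supports it, otherwise draw right away.
  void Submit(const DrawCall& call) {
    if (m_engine.SupportsDeferred())
      m_pending.push_back(call);
    else
      m_engine.Draw(call);
  }

  // End of frame: drain whatever was queued (a no-op for immediate engines).
  void EndFrame() {
    for (const DrawCall& call : m_pending)
      m_engine.Draw(call);
    m_pending.clear();
  }

  std::size_t Pending() const { return m_pending.size(); }

private:
  Engine& m_engine;
  std::vector<DrawCall> m_pending;
};

// Stand-in engine that counts draws.
class FakeEngine : public Engine {
public:
  explicit FakeEngine(bool deferred) : m_deferred(deferred) {}
  bool SupportsDeferred() const override { return m_deferred; }
  void Draw(const DrawCall&) override { ++draws; }
  int draws = 0;

private:
  bool m_deferred;
};
```

Under this assumption guilib would call `Submit()` the same way for every engine, so one engine lagging behind would not block the others from moving to deferred rendering.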

@jmarshallnz
Contributor

Cheers @theuni - agreed that the first step would be moving the (small amount of) engine-specific texture/font stuff into the renderer, so that it's all in one spot to begin with and guilib etc. is free of specifics.

@MartijnKaijser MartijnKaijser modified the milestones: Temporary freezer until devs have time, Abandoned, obsolete or superseeded May 23, 2015