GUI Renderer abstraction and move to deferred rendering #2681
Conversation
… to the scene graph
Add a "RenderObject" member that represents an abstract GPU texture handle. It is defined as a void* and should be cast by each implementation to whatever it represents; for example, GL/GLES both define it as a GLuint (unsigned int). It should NOT have a hard-coded build-time definition. Also, add some helper functions for getting at the texture more easily.
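A minimal sketch of the idea, assuming hypothetical names (CBaseTexture, the getter/setter helpers) rather than the actual Kodi interfaces: the handle is stored as an opaque void*, and the GL backend round-trips it through a GLuint without any build-time type baked in.

```cpp
#include <cassert>
#include <cstdint>

typedef unsigned int GLuint; // as in GL headers; aliased here to stay self-contained

// Hypothetical stand-in for the texture base class described above.
struct CBaseTexture
{
  void* m_renderObject = nullptr; // abstract GPU handle; meaning is per-renderer

  // GL-side helpers: each backend casts the opaque handle to its own type.
  GLuint GetGLTexture() const
  {
    return static_cast<GLuint>(reinterpret_cast<uintptr_t>(m_renderObject));
  }
  void SetGLTexture(GLuint id)
  {
    m_renderObject = reinterpret_cast<void*>(static_cast<uintptr_t>(id));
  }
};
```

A DirectX backend would cast the same void* to its own texture type instead; guilib only ever sees the opaque pointer.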
This represents GL_LUMINANCE_ALPHA in GL-speak. Suitable for our fonts.
1. Take advantage of CGUITextureBase to handle our texture upload. CGUITextureBase has its own backing store, which is deleted after upload since it may be undefined. Because of this, we keep a cached copy of the texture in GUIFontTTF and submit updates to the texture when needed.
2. Batch up the draws and send them to the scene graph.
3. Switch to a 2-byte texture. The first byte is solid, the second is the alpha value. When rendered, this is treated by GL as RGBA (255,255,255,a). We use a uniform color throughout rather than per-vertex as before.

Combining these things, we can use our texture shaders with zero state changes. Note: the texture is copied with the correct stride for any desired format, so if other renderers would prefer rgb/rgba/bgra/etc., they can easily be substituted per-renderer.
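A sketch of the 2-byte packing described in point 3, under the assumption of a hypothetical helper name (PackGlyphLA): byte 0 is solid luminance, byte 1 is the glyph's alpha, and rows are copied honoring the destination stride so a renderer preferring a different per-pixel layout could swap the packing without touching the loop structure.

```cpp
#include <cstdint>

// Hypothetical helper, not the actual GUIFontTTF code: pack a rasterized
// glyph's alpha channel into a 2-byte (luminance, alpha) destination,
// GL_LUMINANCE_ALPHA-style, respecting the destination row stride.
static void PackGlyphLA(const uint8_t* alpha, int w, int h, int srcPitch,
                        uint8_t* dst, int dstStride)
{
  for (int y = 0; y < h; ++y)
  {
    uint8_t* row = dst + y * dstStride;
    for (int x = 0; x < w; ++x)
    {
      row[2 * x + 0] = 0xFF;                    // luminance: solid white
      row[2 * x + 1] = alpha[y * srcPitch + x]; // alpha from the glyph raster
    }
  }
}
```

GL then samples this as RGBA (255,255,255,a), so the ordinary texture shader works unchanged.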
We no longer have any need for per-vertex colors in the GUI, so use a uniform instead for a significant performance gain.
1. Don't clear. GL needs to do this as the last thing before starting to render, else we can block needlessly while waiting for a buffer. Let the renderer handle it.
2. Don't worry about dirty regions or scissoring. Do a RenderPass, then give the results to the renderer. It knows best how to deal with the results.
3. Draw the dirty-region visualizers as part of the scene graph, just like any other GUI element.
Also change multiply order on multi+blend to save an op
This is a hack to avoid having to rewrite all of the current video renderers.
…en necessary This was not working correctly. Always report the whole screen as dirty for modes 0/3, so that the renderer can always simply render the dirty-region list and it will always show the full story.
ping @topfs2 This is what we discussed a few nights ago. Please let me know if there's anything in the implementation that would impede a move to proper gpu projection/model operations. I'm hoping that it would only require extending BatchDraw to include matrix data.

ping @elupus Same as the above. I know you worked on that at some point; I'd be curious to hear your thoughts.

ping @smspillaz You asked for a ping when this was PR'd :)

ping @jmarshallnz Any input would be great.
@@ -561,6 +562,7 @@ void COverlayTextureGL::Render(SRenderState& state)
     col[i][0] = col[i][1] = col[i][2] = col[i][3] = 1.0f;
   }

+  glUniform4f(uniColLoc,(col[0][0]), (col[0][1]), (col[0][2]), (col[0][3]));
This almost eliminates the copying around of vector elements, which was proving to be very expensive. Also added SetVertices() for a cheap means of moving all PackedVertices into a Batch
Don't keep a copy of vertex data, since it's undefined after it's been added anyway. Instead, calculate the number of vertices required, allocate exactly that many, then work directly on the destination vector. TODO: Same thing for fonts, they're a killer.
I believe I've addressed most of the concerns here (the one that remains is moving the pixel backing store to the Texture, but I'd prefer to hold off on that until we can decide how to get font render cpu usage under control first). The commits here are a bit messy because the interfaces are still a bit hazy, but I think it's approaching something sensible. I've done my best to eliminate any ping-ponging done during development, but some things may still look a bit out of order where I've squashed them or moved them around.

I also reworked quite a bit, and I'm happier with the result now. In profiling the new batching methods, it was easy to see that the bulk of the effort was spent on creating, allocating, copying, and destroying buffers. After playing with a few different models, I think I've found an approach that works reasonably well. The vectors of batch data have moved to vectors of shared_ptr's of batch data. This greatly reduces the alloc/copy burden, and most of it is handled behind the scenes. While I was at it, I also moved BatchDraw into its own class, mainly just for future-proofing and cleanliness.

Here (ignoring the wonky coords order, which I left as-is) is a pretty good representation of the zero-copy method, which is (imo) what we should be shooting for.

The last big thing on my radar is trying to get font calculating/batching/copying under control, but I'm beginning to wonder if we would be better off solving it at a higher level (flagging when text needs to render but has not changed since last frame, so cached batch data could be used). If that is a reasonable approach, then refactoring font batching would largely be a waste of time.
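The shared_ptr move described above can be sketched like this, with hypothetical names (BatchDraw's fields, CSceneGraph, AddBatch) standing in for the real classes: handing a batch to the graph now shares a pointer rather than copying the whole vertex payload.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Illustrative batch payload; the real BatchDraw carries textures, color, etc.
struct BatchDraw
{
  std::vector<float> vertices;
};

// Sketch of the storage change: vector<BatchDraw> (deep copies on submit)
// becomes vector<shared_ptr<BatchDraw>> (pointer copies on submit).
class CSceneGraph
{
public:
  void AddBatch(std::shared_ptr<BatchDraw> batch)
  {
    m_batches.push_back(std::move(batch)); // no vertex data copied
  }
  size_t Size() const { return m_batches.size(); }

private:
  std::vector<std::shared_ptr<BatchDraw>> m_batches;
};
```

The producer can keep its reference alive for cached reuse on the next frame while the graph holds its own, which is the "handled behind the scenes" part.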
I believe your branch is working?
You're probably being hit with the extra cost of text processing mentioned above. Should be easy enough to test that theory: find 2 screens that are similar except that one has more text than the other. If you're interested, gdb would probably give a quick peek into what's going on under the hood if you just randomly break a few times and see what function the main thread is in. I'm still thinking about how to improve this.
Maybe. Text rendering definitely plays a part here. The lowest load I get is in the main system settings window, where text appears only in the menu on the left side. The last commit definitely makes a difference. Before: more stutters in RSS text, less visible in window transition animations. Now: a little less in text, a little more in transitions. Roughly the same fps and cpu load for both commits. Just to recall, @popcornmix reported ~70fps above in the system info screen in a debug build, the same screen at which I get ~22-23 in release. This leads me to the personal conclusion that there are two hardware versions behaving differently. I've previously seen reports that exactly the same image behaves differently regardless of vsync: for some it just works, for others, including me, it doesn't. Maybe I had the 'bad' luck to own the slower version, but that's a good chance to provide test coverage on this platform too, which is why I report my findings.
[osx/ios/atv2] - sync xcode projects
@smspillaz in case you might have time to look into this again at some point (or you, of course, @theuni :) ) I've rebased it up here: https://github.com/jmarshallnz/xbmc/tree/deferred_renderer I likely won't have time to actually work on it short-term, but at least it builds (it doesn't actually work, of course...)
@jmarshallnz Heh, I would actually love to get this in at some point because I spent so many days/nights cursing at the main-thread rendering (I have another branch somewhere that builds on top of this one, moving the rendering out of the main thread, but it was just an experimental hack). But this isn't a high priority for me right now :) That said, it'd probably be more useful to use this PR as a guideline rather than getting it back into working order. IIRC profiling showed an absurd amount of unnecessary vertex data copying going on. The premise was good, though. It could probably be broken into ~3 stages, so it could be added piecemeal:
Cheers @theuni - agreed that the first step would be moving the (small amount of) texture/font, engine-specific stuff into the renderer so that it's all in one spot to begin with, and guilib etc. is free of specifics.
This removes all render operations from guilib in favor of an abstracted scene-graph that receives batched render data for later render. When guilib has a texture/font ready for render, it packs up the data and adds it to the graph. At the end of each frame, that graph is evaluated by the renderer and drawn all in one go.
GL/GLES have been implemented, but directx is missing. Verified working on osx/linux/android. The scene-graph is very basic and intentionally non-robust. I think it wise to wait for a directx implementation before trying to settle on anything final.
Benefits:
This does not re-implement our video renderers. I would like to have included those as well, but it's simply too much to knock out at once. Instead, the scene-graph was created in such a way that it can be rendered partially, multiple times. So for operations where we temporarily lose control of our renderer (video playback or visualizations for example), the flow looks like this:
The result of the above is the same as our current implementation, it's just moved into different chunks. Playback/vis are confirmed working fine.
There are a few other changes in here as well including some shader optimizations and general rendering improvements. One of the big ones was making diffuse-color a per-texture rather than per-vertex feature, so that it can be sent once per texture.
Known issues:
Implementing:
There is some basic doxy (see RenderSystem.h), but it remains to be seen if many changes are needed for directx. I've never touched it, so I'm not sure exactly how much it differs. In theory, it only requires adding upload/delete/render functions. For GL/GLES, it was basically just a c/p of the old texture rendering code, then a refactor to take advantage of the new functionality.
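A hedged sketch of the per-backend surface described above, assuming hypothetical names (CRenderSystemBase and its three methods), not the actual RenderSystem.h signatures: a new port such as DirectX would supply upload, delete, and render, and everything else stays backend-agnostic behind the opaque handle.

```cpp
#include <cstddef>

// Hypothetical shape of the render-system interface; the real RenderSystem.h
// differs, but the description above says a port only needs these three pieces.
class CRenderSystemBase
{
public:
  virtual ~CRenderSystemBase() = default;

  // Upload pixel data and return the backend's opaque texture handle.
  virtual void* UploadTexture(const unsigned char* pixels, size_t size) = 0;

  // Release a previously uploaded texture.
  virtual void DeleteTexture(void* renderObject) = 0;

  // Evaluate and draw the batched scene graph in one go.
  virtual void RenderBatches() = 0;
};
```

For GL/GLES these map closely onto the old texture-rendering code, as the description notes.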
The renderer now takes on more responsibility for things that guilib used to handle, like clearing buffers, dirty-region handling, etc. I'll be very happy to help out with any porting questions.