GUI Renderer abstraction and move to deferred rendering #2681

Closed
wants to merge 58 commits into
from

Projects

None yet
Member
theuni commented May 2, 2013

This removes all render operations from guilib in favor of an abstracted scene-graph that receives batched render data for later render. When guilib has a texture/font ready for render, it packs up the data and adds it to the graph. At the end of each frame, that graph is evaluated by the renderer and drawn all in one go.

GL/GLES have been implemented, but directx is missing. Verified working on osx/linux/android. The scene-graph is very basic and intentionally non-robust. I think it wise to wait for a directx implementation before trying to settle on anything final.

Benefits:

  • New renderers (not only dx/gl/gles) as our GUI is now completely independent of rendering code. Our current gl/gles renderers are implemented here, but directx is missing.
  • We can be MUCH smarter and more efficient with our rendering. We can look at an entire scene at once and decide to cull invisible objects or batch similar ones. Some of those things have been implemented here, but for the most part I tried to keep it similar to past behavior.
  • Gives a big speedup when moving through the render loop, as we're now skipping lots of small graphics blocking operations.
  • g_graphicsContext could be eliminated for the most part, since graphics operations are now asynchronous and threadsafe.
  • Rendering could (and should) be moved to its own thread since all operations are now free of gui processing routines.
  • Possible to include as an external api for plugins and binary add-ons so that they can use the renderer of their choice, and simply give us the results to display.
  • Makes it much easier to decouple windowing from rendering, leading to the possibility of changing the renderer on the fly (gl/gles/glesv3).
  • lots more :)

This does not re-implement our video renderers. I would like to have included those as well, but it's simply too much to knock out at once. Instead, the scene-graph was created in such a way that it can be rendered partially, multiple times. So for operations where we temporarily lose control of our renderer (video playback or visualizations for example), the flow looks like this:

  • clear buffer
  • run through the render loop, adding controls to the scene graph
  • hit a render/video/fullscreen control
  • render the current scene-graph
  • let the video renderer do its thing
  • finish the gui loop, adding controls to a fresh scene graph
  • render the current scene-graph
  • repeat

The result of the above is the same as our current implementation, it's just moved into different chunks. Playback/vis are confirmed working fine.

There are a few other changes in here as well including some shader optimizations and general rendering improvements. One of the big ones was making diffuse-color a per-texture rather than per-vertex feature, so that it can be sent once per texture.

Known issues:

  • Missing directx support (the big one. hint ;)
  • Somewhat broken dirty-region mode 1/2. Deferred rendering conflicts somewhat with the idea behind these modes.
  • I'm sure I've missed a few render details

Implementing:
There is some basic doxy (see RenderSystem.h), but it remains to be seen if many changes are needed for directx. I've never touched it, so I'm not sure exactly how much it differs. In theory, it only requires adding upload/delete/render functions. For GL/GLES, it was basically just a c/p of the old texture rendering code, then a refactor to take advantage of the new functionality.
The renderer now takes on more responsibility for things that guilib used to handle, like clearing buffers, dirty-region handling, etc. I'll be very happy to help out with any porting questions.

Cory Fields added some commits May 1, 2013
Cory Fields rendermanager: add SceneGraph and abstracted render functions for gl/…
…gles
b8f0241
Cory Fields rendermanager: remove GUITexture inheritance and send batched regions…
… to the scene graph
64e2f9a
Cory Fields rendermanager: nuke old gl/gles texture renderers bee2e88
Cory Fields rendermanager: remove inheritance from CBaseTexture.
Add a "RenderObject" member that represents an abstract gpu texture handle.
It is defined as a void*, and should be cast by each implementation to whatever
it represents. For example, gl/gles both define it as a GLuint (unsigned int).

It should NOT have a hard-coded build-time definition.

Also, add some helper functions for getting at the texture easier.
9b62152
Cory Fields rendermanager: fix renamed classes eb1c87a
Cory Fields rendermanager: remove old texture uploader 537613e
Cory Fields rendermanager: add A8L8 texture format
This represents GL_LUMINANCE_ALPHA in GL-speak. Suitable for our fonts.
e2e1193
Cory Fields rendermanager: abstract font rendering
1. Take advantage of CGUITextureBase to handle our texture upload.
   GUITextureBase has its own backing store, which is deleted after upload
   since it may be undefined. Because of this, we keep a cached copy of the
   texture in GUIFontTTF and submit the updates to the texture when needed.

2. Batch up the draws and send them to the scene graph.

3. Switch to a 2-byte texture. First byte is solid, second is the alpha value.
   When rendered, this will be treated by GL as RGBA (255,255,255,a). We use
   a uniform color throughout rather than per-vertex as before. Combining these
   things, we can use our texture shaders with zero state changes.

   Note: The texture is copied with the correct stride for any desired format,
   so if other renderers would prefer rgb/rgba/bgra/etc, they can easily be
   substituted per-renderer.
87191fe
Cory Fields rendermanager: remove old font upload/render classes 4740447
Cory Fields rendermanager: remove old font-texture cleanup functions from Texture…
…Manager
9849819
Cory Fields rendermanager: add gles uniform for diffuse color
We no longer have any need for per-vertex colors in the gui, so use a uniform
as a significant performance gain
a22a761
Cory Fields rendermanager: move most of the render logic out of GUIWindowManager
1.  Don't clear. GL needs to do this last thing before starting to render, else
    we can block needlessly while waiting for a buffer. Let the renderer handle
    it.

2.  Don't worry about dirty regions or scissoring. Do a RenderPass, then give
    the results to the renderer. It will best how to deal with the results.

3.  Draw the dirty region visualizers as part of the scene graph, just like any
    other GUI element.
01a213a
Cory Fields rendermanager: remove deleted files from build d346a26
Cory Fields rendermanager: CBaseTexture fixups a0a7841
Cory Fields rendermanager: use scene-graph DrawQuad ef00d85
Cory Fields rendermanager: update shaders to use uniform color
Also change multiply order on multi+blend to save an op
ab27baf
Cory Fields rendermanager: abstract picture slideshow f6a89ac
Cory Fields rendermanager: hack. Draw the current scene before rendering video
This is a hack to avoid having to rewrite all of the current video renderers.
dc63218
Cory Fields rendermanager: use the same video-render hack for render controls a068908
Cory Fields rendermanager: fix dirty region solvers to report the whole screen wh…
…en necessary

This was not working correctly. Always report the whole screen as dirty for
modes 0/3, so that the renderer can always simply render the dirty region list,
and it will always show the full story.
9864acb
Cory Fields rendermanager: add some basic doxy 7a5b9e3
Member
theuni commented May 2, 2013

ping @topfs2 This is what we discussed a few nights ago. Please let me know if there's anything in the implementation that would impede a move to proper gpu projection/model operations. I'm hoping that it would only require extending BatchDraw to include matrix data.

ping @elupus. Same as the above. I know you worked on that at some point, I'd be curious to hear your thoughts.

ping @smspillaz. You asked for a ping when this was PR'd :)

ping @jmarshallnz Any input would be great.

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/cores/VideoRenderers/OverlayRendererGL.cpp
@@ -561,6 +562,7 @@ void COverlayTextureGL::Render(SRenderState& state)
col[i][0] = col[i][1] = col[i][2] = col[i][3] = 1.0f;
}
+ glUniform4f(uniColLoc,(col[0][0]), (col[0][1]), (col[0][2]), (col[0][3]));
smspillaz
smspillaz May 2, 2013 Contributor

This feels strange to me. In the fragment shaders we are now using a uniform color, but the vertex shaders still take a per-vertex color attribute and emit that into an m_color varying. Was that an oversight or intentional?

theuni
theuni May 2, 2013 Member

half/half. I was nervous to remove per-vertex coloring completely, for fear that it would be needed somewhere. It can still be sent and used by the renderer if necessary. The current gles renderer does not enable the color attrib, so it should be a no-op.

My reasoning for not removing it from the frag shader is that i believe some of the video renderers may be using it. But you're right, for gles, we should probably be able to nuke it completely.

smspillaz
smspillaz May 2, 2013 Contributor

Ah okay - I guess there are some other fragment shaders that use the per-vertex attribute.

jmarshallnz
jmarshallnz May 2, 2013 Member

Technically, button controls can set per-corner alpha. It's not actually used, though, and may have been removed already from the texture layers. This is what it was from originally. I wouldn't remove support for it at the lower level (shaders) unless it is a useful speed improvement.

theuni
theuni May 2, 2013 Member

Yea, iirc it gives roughly a ~30% overall speedup on mali. Something about the uniform makes the multiply free vs a per-pixel lookup when it's varying.

Colors can still be sent via attribute, but they're ignored by the gl/gles renderers atm because I didn't see it in use anywhere. Trivial to re-enable if necessary though.

theuni
theuni May 2, 2013 Member

Verified via the mali shader benchmark tools btw, I'm not pulling it out of my ass :)

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/cores/VideoRenderers/RenderManager.cpp
@@ -675,6 +677,7 @@ void CXBMCRenderManager::Render(bool clear, DWORD flags, DWORD alpha)
void CXBMCRenderManager::Present()
{
+ g_Windowing.DrawSceneGraph();
smspillaz
smspillaz May 2, 2013 Contributor

This feels curious. Is there a difference between CXBMCRenderManager::Present () and CXBMCRenderManager::RenderUpdate ?

theuni
theuni May 2, 2013 Member

Yes, the naming is all pretty terrible. I believe it's because we've gone through several iterations of what "render" actually means :)

One is used for fullscreen rendering, the other is used for video controls.

This is very much a hack, and hopefully a temporary one until video renderers can be brought under the same roof.

elupus
elupus May 2, 2013 Member

@@ -675,6 +677,7 @@ void CXBMCRenderManager::Render(bool clear, DWORD
flags, DWORD alpha)

void CXBMCRenderManager::Present()
{

  • g_Windowing.DrawSceneGraph();

This feels curious. Is there a difference between
CXBMCRenderManager::Present () and CXBMCRenderManager::RenderUpdate ?

Some. You may want to grab my commit from my 3d branch that cleans this up
a bit.

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/guilib/DirtyRegionSolvers.cpp
@@ -34,14 +34,15 @@ void CUnionDirtyRegionSolver::Solve(const CDirtyRegionList &input, CDirtyRegionL
void CFillViewportAlwaysRegionSolver::Solve(const CDirtyRegionList &input, CDirtyRegionList &output)
{
- CDirtyRegion unifiedRegion(g_graphicsContext.GetViewWindow());
- output.push_back(unifiedRegion);
+ CRect screen(0,0,(float)g_graphicsContext.GetWidth(), (float)g_graphicsContext.GetHeight());
+ output.push_back(CDirtyRegion(screen));
smspillaz
smspillaz May 2, 2013 Contributor

Does CDirtyRegion have an int,int,int,int constructor? Could be useful to save constructions of multiple temporaries.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
@@ -281,6 +279,7 @@ bool CGUIFontTTFBase::Load(const CStdString& strFilename, float height, float as
delete(m_texture);
m_texture = NULL;
+ delete [] m_cachePixels;
delete[] m_char;
smspillaz
smspillaz May 2, 2013 Contributor

Is the preferred style delete[] or delete [] ?

(Also insert note here about how boost::scoped_array <>, boost::shared_array <> or std::vector <> might be a better fit).

theuni
theuni May 2, 2013 Member

Should be delete[], will change.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
@@ -718,90 +719,45 @@ void CGUIFontTTFBase::RenderCharacter(float posX, float posY, const Character *c
float tt = texture.y1 * m_textureScaleY;
float tb = texture.y2 * m_textureScaleY;
- // grow the vertex buffer if required
- if(m_vertex_count >= m_vertex_size)
+ m_color = color;
+ BatchDraw *draw = NULL;
+ BatchDraw temp;
+ for (std::vector<BatchDraw>::iterator i = m_batchDraws.begin(); i != m_batchDraws.end(); i++)
smspillaz
smspillaz May 2, 2013 Contributor

This can probably be ++i

theuni
theuni May 2, 2013 Member

Yep, will do.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
+ newHeight = CBaseTexture::PadPow2(newHeight);
+ if(!m_texture)
+ m_texture = new CBaseTexture;
+
+ m_texture->Allocate(m_textureWidth, newHeight, XB_FMT_A8L8);
+ if (!m_texture)
+ {
+ CLog::Log(LOGERROR, "GUIFontTTFGL::CacheCharacter: Error creating new cache texture for size %f", m_height);
+ delete m_texture;
+ return NULL;
+ }
+ m_textureHeight = m_texture->GetHeight();
+ m_textureWidth = m_texture->GetWidth();
+ m_texturePitch = m_texture->GetPitch();
+ m_textureBlockSize = m_texture->GetBlockSize();
+ unsigned char* newCachedPixels = new unsigned char[m_texturePitch * m_textureHeight];
smspillaz
smspillaz May 2, 2013 Contributor

I'm not so sure of its still an issue, but I've run into trouble in the past when doing arithmetic inside of new []. I'd suggest storing m_texturePitch * m_textureHeight in a size_t temporary or consider using a vector.

theuni
theuni May 2, 2013 Member

Never seen an issue with such basic operations, but it can't hurt to use a quick temporary. will do.

@smspillaz smspillaz and 2 others commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.h
@@ -71,18 +61,18 @@ class CGUIFontTTFBase
public:
CGUIFontTTFBase(const CStdString& strFileName);
- virtual ~CGUIFontTTFBase(void);
+ ~CGUIFontTTFBase(void);
smspillaz
smspillaz May 2, 2013 Contributor

I'm assuming the intention here is to make it so that nobody should inherit CGUIFontTTFBase. Is that made clear in the rest of the code?

theuni
theuni May 2, 2013 Member

Yes, the only reason to inherit was to add rendering operations, which are no longer permitted.

davilla
davilla May 2, 2013 Contributor

if you have virtual functions, you must have a virtual destructor.. in fact you will get a compiler warning about it.

theuni
theuni May 2, 2013 Member

there are no virtual functions left.

@smspillaz smspillaz commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.h
};
-
-#if defined(HAS_GL) || defined(HAS_GLES)
-#include "GUIFontTTFGL.h"
-#define CGUIFontTTF CGUIFontTTFGL
-#elif defined(HAS_DX)
-#include "GUIFontTTFDX.h"
-#define CGUIFontTTF CGUIFontTTFDX
-#endif
-
+#define CGUIFontTTF CGUIFontTTFBase
smspillaz
smspillaz May 2, 2013 Contributor

Can this be avoided with a find-and-replace or a typedef?

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUITexture.cpp
@@ -229,9 +230,18 @@ void CGUITextureBase::Render()
Render(m_vertex.x2 - m_info.border.x2, m_vertex.y2 - m_info.border.y2, m_vertex.x2, m_vertex.y2, u2, v2, u3, v3, u3, v3);
}
- // close off our renderer
- End();
-
+ m_batchDraw.texture=m_texture.m_textures[m_currentFrame];
smspillaz
smspillaz May 2, 2013 Contributor

Is the preferred style foo=bar or foo = bar ?

theuni
theuni May 2, 2013 Member

yep, foo = bar. will do.

@smspillaz smspillaz and 2 others commented on an outdated diff May 2, 2013
xbmc/guilib/GUITexture.cpp
@@ -229,9 +230,18 @@ void CGUITextureBase::Render()
Render(m_vertex.x2 - m_info.border.x2, m_vertex.y2 - m_info.border.y2, m_vertex.x2, m_vertex.y2, u2, v2, u3, v3, u3, v3);
}
- // close off our renderer
- End();
-
+ m_batchDraw.texture=m_texture.m_textures[m_currentFrame];
+ m_batchDraw.color = color;
+ m_batchDraw.dirty = true;
+ if (m_diffuse.size())
+ m_batchDraw.diffuseTexture=m_diffuse.m_textures[0];
+ else
+ m_batchDraw.diffuseTexture=NULL;
+ if (m_batchDraw.vertices.size())
smspillaz
smspillaz May 2, 2013 Contributor

Possible law-of-demeter violation - does GUITexture really need access to vertices or just whether or not its nonempty. In that case, a wrapper method around vertices.empty () (as opposed to vertices.size ()) might make more sense here.

theuni
theuni May 2, 2013 Member

Yep, agreed. Though ultimately, I think we'll probably drop the vector and use a simple array. The vector is very clunky to use for this purpose, but also I have a feeling that once we start doing heavy operations in the scene-graph, the cost of the vector might start to really outweigh the benefits. It was only used to avoid common pitfalls during the restructure.

smspillaz
smspillaz May 2, 2013 Contributor

Hmm. I guess as long a space is reserved in the vector you won't get a hit from push_back allocation. In any case, if the renderer doesn't need to know about the vertices then then this "there are vertices in this array" abstraction should be provided by a separate method.

jmarshallnz
jmarshallnz May 2, 2013 Member

Don't you just want !m_batchDraw.vertices.empty() anyway? The texture definitely needs to know about the vertices.

theuni
theuni May 2, 2013 Member

Also, the renderers check to be sure we have vertices. There's no sense in doing the sanity tests in both places. And doing it here would make it more likely for the renderers to behave the same way, so here probably makes the most sense.

I'll make it !m_batchDraw.vertices.empty() (a wrapper seems like a micro-optim at best), and drop the check in the renderers.

theuni
theuni May 2, 2013 Member

On second thought, we could do the basic sanity checks in MergeSimilar (or some step as the scene-graph evolves) and get the best of both. I'll move forward that way for now.

Contributor

Note: I'm just able to do style / simple-gl type reviews. I'm not overly familiar with xbmc's rendering architecture but I get how the render batching works etc.

Member
theuni commented May 2, 2013

Sure, it's very helpful.

I probably should've mentioned (if it's not clear enough) that this is definitely an RFC, so I'm mainly looking to see if I've done anything stupid that discounts any other possible implementations. I've tried to be as un-invasive and simple as possible, with the assumption that it will need some architectural changes before tidying things up.

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
}
}
+ if (!draw)
jmarshallnz
jmarshallnz May 2, 2013 Member

I dunno if it makes sense, but having a map<color, vector > would make this more efficient. You'd then have K batchdraw vectors to hand off rather than one in End().

EDIT: Actually, by the looks you really want map<color, BatchDraw>

theuni
theuni May 2, 2013 Member

roger.

@jmarshallnz jmarshallnz and 2 others commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
+}
+
+void CGUIFontTTF::End()
+{
+ if (m_nestedBeginCount == 0)
+ return;
+
+ if (--m_nestedBeginCount > 0)
+ return;
+
+ for (std::vector<BatchDraw>::iterator i = m_batchDraws.begin(); i != m_batchDraws.end(); i++)
+ {
+ i->texture = m_texture;
+ i->diffuseTexture = NULL;
+ i->dirty = true;
+ CSceneGraph *sceneGraph = g_Windowing.GetSceneGraph();
jmarshallnz
jmarshallnz May 2, 2013 Member

Doing the Get outside the loop might make sense?

theuni
theuni May 2, 2013 Member

heh, ouch. yep.

davilla
davilla May 2, 2013 Contributor

++i on that iterator

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
+ unsigned char *dst2 = target;
+ unsigned char *src2 = source;
+ for (int x = 0; x < bitmap.width; x++, src2++)
+ {
+ memset(dst2, 0xff, m_textureBlockSize - 1);
+ dst2+= m_textureBlockSize -1;
+ *dst2++ = src2[0];
+ }
+ source += bitmap.width;
+ target += m_texturePitch;
+ }
+ // THE SOURCE VALUES ARE THE SAME IN BOTH SITUATIONS.
+
+ // Since we have a new texture, we need to delete the old one
+ // the Begin(); End(); stuff is handled by whoever called us
+ m_texture->Update(m_textureWidth, m_textureHeight, m_texturePitch, XB_FMT_A8L8, m_cachePixels, true);
jmarshallnz
jmarshallnz May 2, 2013 Member

Hmm, this is kinda crappy as you completely copy the texture for every change. This is how it used to be done on GL, but there might be faster methods. It would be better if you instead could tell it to just update a portion of the texture. That way renderers that support it can. To do this I think the texture object needs to know that it should be updateable so it can keep m_pixels around. i.e. move the pixel cache to the texture.

theuni
theuni May 2, 2013 Member

Yea, I really really don't like this either. I landed on doing it this way because I figured it would be easiest to implement across the board.

I don't think that keeping m_pixels around is safe universally. My fear was that some implementations (especially those that can provide mapped pointers to physical locations) could cause m_texture to be invalidated after upload.

Though I suppose we could add an option in CTexture for whether it should be kept around if not, then the implementations could decide?

jmarshallnz
jmarshallnz May 2, 2013 Member

Yeah - adding a bool to CTexture was the idea I was suggesting, yup.

theuni
theuni May 2, 2013 Member

Heh, re-reading i see now that's exactly what you said. Will do.

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
+ unsigned char *src2 = source;
+ for (int x = 0; x < bitmap.width; x++, src2++)
+ {
+ memset(dst2, 0xff, m_textureBlockSize - 1);
+ dst2+= m_textureBlockSize -1;
+ *dst2++ = src2[0];
+ }
+ source += bitmap.width;
+ target += m_texturePitch;
+ }
+ // THE SOURCE VALUES ARE THE SAME IN BOTH SITUATIONS.
+
+ // Since we have a new texture, we need to delete the old one
+ // the Begin(); End(); stuff is handled by whoever called us
+ m_texture->Update(m_textureWidth, m_textureHeight, m_texturePitch, XB_FMT_A8L8, m_cachePixels, true);
+ return TRUE;
jmarshallnz
jmarshallnz May 2, 2013 Member

no need to shout (yeah, I see it's COPY AND PASTED) :p

theuni
theuni May 2, 2013 Member

haha, I don't have a clue what that means, but it doesn't apply anymore :)

Will remove

@jmarshallnz jmarshallnz commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.h
@@ -27,6 +27,8 @@
*
*/
+#include "rendering/SceneGraph.h"
+
jmarshallnz
jmarshallnz May 2, 2013 Member

We'd want this forwarded (or a simpler include) ideally.

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUITexture.cpp
m_isAllocated = NO;
+ m_batchDraw.texture = NULL;
jmarshallnz
jmarshallnz May 2, 2013 Member

Does it make sense to clear everything at this point (i.e. having some BatchDraw::Clear() function?)

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIWindowManager.cpp
if (g_advancedSettings.m_guiVisualizeDirtyRegions)
{
+ hasRendered = true;
+ BatchDraw quad, quad2;
+ PackedVertices dirtyvertices, dirtyvertices2;
jmarshallnz
jmarshallnz May 2, 2013 Member

not required

theuni
theuni May 2, 2013 Member

testing code, will nuke the redundancy

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/Texture.h
/*!
\ingroup textures
\brief Base texture class, subclasses of which depend on the render spec (DX, GL etc.)
*/
class CBaseTexture
{
-
+friend class CRenderSystemGL;
+friend class CRenderSystemGLES;
theuni
theuni May 2, 2013 Member

Whoops, more testing code. I think I added all the helpers required. I'll make these go away.

@jmarshallnz jmarshallnz and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIWindowManager.cpp
@@ -530,7 +530,6 @@ void CGUIWindowManager::RenderPass()
CGUIWindow* pWindow = GetWindow(GetActiveWindow());
if (pWindow)
{
- pWindow->ClearBackground();
jmarshallnz
jmarshallnz May 2, 2013 Member

You can't just throw this one away :)

theuni
theuni May 2, 2013 Member

Best I can tell, that just works its way down to a full-screen clear. Did I miss something along the way?

jmarshallnz
jmarshallnz May 2, 2013 Member

There's a colour involved IIRC, along with an optional clear by the skinner (they can set the background color to 0 if they never require clearing due a texture always being present). The latter could be eliminated automatically at some point perhaps, but until then we should keep it in.

theuni
theuni May 2, 2013 Member

Ok, well the order here is crucial. It's important (for some GL drivers) that we do a glClear at the beginning of each frame, as it tends to be a hint to drivers that they can throw away any current buffers. It's also very important to delay this, since if you draw then clear, you're often waiting an entire vsync while that glClear blocks while waiting for a free buffer. Timing tests shows that this makes a drastic difference in our render loop.

So, I suppose we might need a new ClearScene() function in the SceneGraph that mimics glClear(color), which will add it in the pipeline for later. Would that suffice?

jmarshallnz
jmarshallnz May 2, 2013 Member

Yup, exactly.

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ {
+ object = CreateTextureObject();
+ baseTexture->SetTextureObject(object);
+ }
+ baseTexture->m_loadedToGPU = LoadToGPU(object, width, height, pitch, rows, format, (const unsigned char*)baseTexture->GetPixels());
+ VerifyGLState();
+ return true;
+}
+
+void CRenderSystemGLES::BindToUnit(CBaseTexture *baseTexture, unsigned int unit)
+{
+ TextureObject object = baseTexture->GetTextureObject();
+ if (!object)
+ return;
+ glActiveTexture(GL_TEXTURE0 + unit);
+ glBindTexture(GL_TEXTURE_2D, *((GLuint*)object));
smspillaz
smspillaz May 2, 2013 Contributor

Small optimization - you can keep track of the currently active texture unit and textures bound to each unit inside of CRenderSystemGL/ES and then skip calls into OpenGL in those cases.

theuni
theuni May 2, 2013 Member

Agreed, see the note at https://github.com/xbmc/xbmc/pull/2681/files#L42R787 . There are probably a few others to do this for as well.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ {
+ if (dirtyRegions)
+ {
+ for (CDirtyRegionList::const_iterator region = dirtyRegions->begin(); region != dirtyRegions->end(); region++)
+ {
+ SetScissors(*region);
+ ClearBuffers(0);
+ ResetScissors();
+ }
+ }
+ else
+ ClearBuffers(0);
+ m_needsClear = false;
+ }
+
+ for(CSceneGraph::const_iterator i = sceneGraph->begin(); i != sceneGraph->end(); i++)
smspillaz
smspillaz May 2, 2013 Contributor

s/i++/++i/

garbear
garbear May 2, 2013 Member

Do we support any compilers where this change causes a difference in the generated assembly (and when optimization isn't purposefully being forced to off)?

And sorry, my choices are either to do the tests myself, link to a couple of stack overflow articles, or leave this question rhetorical :) The point being, i++ vs ++i is usually cosmetic as far as primitives, pointers and STL containers are concerned.

smspillaz
smspillaz May 2, 2013 Contributor

I believe that anywhere you have a custom operator++ defined on a class and you make a temporary copy of the class in operator++ then the compiler cannot optimize that away because nontrivial code may run in the class constructor.

Its a very minor detail, but more or less just about avoiding premature optimizations whilst you're at it.

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
@@ -631,4 +632,318 @@ GLint CRenderSystemGLES::GUIShaderGetCoord1()
return -1;
}
+GLint CRenderSystemGLES::GUIShaderGetUniCol()
+{
+ if (m_pGUIshader[m_method])
+ return m_pGUIshader[m_method]->GetUniColLoc();
+
+ return -1;
+}
+
+void CRenderSystemGLES::DrawSceneGraphImpl(const CSceneGraph *sceneGraph, const CDirtyRegionList *dirtyRegions)
+{
+
+ if (!m_bRenderCreated)
smspillaz
smspillaz May 2, 2013 Contributor

This feels like a design issue to me. It shouldn't even be possible to call DrawSceneGraphImpl without a constructed render system.

theuni
theuni May 2, 2013 Member

As it stands, no, this could be removed.

I went back and forth for a while with the design, trying to decide where the scenegraph should come from (should there be only one? a singleton class? arbitrary number?). Ultimately I gave it to the renderer to hold, but I'm not entirely convinced that's the best approach. It's a bit clunky to grab the graph from the renderer, operate on it, let it go, then have the renderer draw the scene. But I wanted an implementation-independent graph, so that they can all share the culling/rearranging functionality.

I think it may take some experimentation to get that part just right.

smspillaz
smspillaz May 2, 2013 Contributor

Noted, lets deal with it later :)

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ else
+ range = 255 - 0;
+
+ GLint posLoc = -1;
+ GLint tex0Loc = -1;
+ GLint tex1Loc = -1;
+ GLint uniColLoc = -1;
+
+ for(CSceneGraph::const_iterator i = sceneGraph->begin(); i != sceneGraph->end(); i++)
+ {
+ if (i->texture)
+ LoadToGPU(i->texture);
+ if (i->diffuseTexture)
+ LoadToGPU(i->diffuseTexture);
+
+ if (i->vertices.size() < 4)
smspillaz
smspillaz May 2, 2013 Contributor

This feels like it wouldn't have any effect?

theuni
theuni May 2, 2013 Member

quick sanity check in case someone pushed in a batch with no vertices. Another example of where the vector proves clunky.

smspillaz
smspillaz May 2, 2013 Contributor

Hmm, although the continue; statement is at the end of the loop. Is it a future proofing thing?

theuni
theuni May 2, 2013 Member

erm, hah! I see now. Looks like more c/p gone bad (luckily you guys are here to spot them)

It should of course be at the top of the loop. The intention was to detect any noops before introducing pipeline stalls by uploading textures needlessly.

But as discussed in another comment, I'll move these checks into MergeSimilar in the scene graph instead, so that the renderer will get pre-validated batches.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ LoadToGPU(i->texture);
+ if (i->diffuseTexture)
+ LoadToGPU(i->diffuseTexture);
+
+ if (i->vertices.size() < 4)
+ continue;
+ }
+
+ // Clear directly before the first draw in each frame
+ if (m_needsClear)
+ {
+ if (dirtyRegions)
+ {
+ for (CDirtyRegionList::const_iterator region = dirtyRegions->begin(); region != dirtyRegions->end(); region++)
+ {
+ SetScissors(*region);
smspillaz
smspillaz May 2, 2013 Contributor

Is this necessary? Maybe it is for the TBDR case, but usually you'd just end up drawing on top of whatever was there, right?

theuni
theuni May 2, 2013 Member
  1. see the comment above to @jmarshallnz about clearing.
  2. trying to keep current mainline behavior for the most part, changes/optims as a second step
@smspillaz smspillaz commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ {
+ if (i->texture)
+ LoadToGPU(i->texture);
+ if (i->diffuseTexture)
+ LoadToGPU(i->diffuseTexture);
+
+ if (i->vertices.size() < 4)
+ continue;
+ }
+
+ // Clear directly before the first draw in each frame
+ if (m_needsClear)
+ {
+ if (dirtyRegions)
+ {
+ for (CDirtyRegionList::const_iterator region = dirtyRegions->begin(); region != dirtyRegions->end(); region++)
smspillaz
smspillaz May 2, 2013 Contributor

++region.

@smspillaz smspillaz commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+
+ if (!m_bRenderCreated)
+ return;
+ int range, unit = 0;
+
+ if(g_Windowing.UseLimitedColor())
+ range = 235 - 16;
+ else
+ range = 255 - 0;
+
+ GLint posLoc = -1;
+ GLint tex0Loc = -1;
+ GLint tex1Loc = -1;
+ GLint uniColLoc = -1;
+
+ for(CSceneGraph::const_iterator i = sceneGraph->begin(); i != sceneGraph->end(); i++)
smspillaz
smspillaz May 2, 2013 Contributor

++i. Maybe I'll stop doing these ones :P

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ *itr++ = j + 3;
+ }
+
+ if (i->texture)
+ BindToUnit(i->texture, unit++);
+
+ r = GET_R(i->color) * range / 255;
+ g = GET_G(i->color) * range / 255;
+ b = GET_B(i->color) * range / 255;
+ a = GET_A(i->color);
+
+ bool hasAlpha = a < 255 || (i->texture && i->texture->HasAlpha());
+
+ if (i->diffuseTexture)
+ {
+ BindToUnit(i->diffuseTexture, unit++);
smspillaz
smspillaz May 2, 2013 Contributor

What happens when we run out of units?

theuni
theuni May 2, 2013 Member

We only use 2 :)

The unit++ syntax was just c/p from the old texturing code, it really doesn't fit anymore. Will hard-code instead.

@smspillaz smspillaz and 2 others commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ }
+ else if (i->texture)
+ {
+ EnableGUIShader(SM_TEXTURE);
+ }
+ else
+ {
+ EnableGUIShader(SM_DEFAULT);
+ }
+
+ posLoc = GUIShaderGetPos();
+ tex0Loc = GUIShaderGetCoord0();
+ tex1Loc = GUIShaderGetCoord1();
+ uniColLoc = GUIShaderGetUniCol();
+
+ glUniform4f(uniColLoc, ((float)r / 255.0), ((float)g / 255.0), ((float)b / 255.0), ((float)a / 255.0));
smspillaz
smspillaz May 2, 2013 Contributor

Don't think there's any need to cast numerators here.

theuni
theuni May 2, 2013 Member

correct, will remove.

jmarshallnz
jmarshallnz May 2, 2013 Member

At the same time, make the denominators 255.0f

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ EnableGUIShader(SM_MULTI_BLENDCOLOR);
+ }
+ }
+ else if (i->texture)
+ {
+ EnableGUIShader(SM_TEXTURE);
+ }
+ else
+ {
+ EnableGUIShader(SM_DEFAULT);
+ }
+
+ posLoc = GUIShaderGetPos();
+ tex0Loc = GUIShaderGetCoord0();
+ tex1Loc = GUIShaderGetCoord1();
+ uniColLoc = GUIShaderGetUniCol();
smspillaz
smspillaz May 2, 2013 Contributor

I don't know about the implementation of GUIShaderGet... - hopefully these are cached - glGetUniformLocation is kinda expensive.

theuni
theuni May 2, 2013 Member

they should be, yes.

@smspillaz smspillaz and 2 others commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ glVertexAttribPointer(tex1Loc, 2, GL_FLOAT, GL_FALSE, sizeof(PackedVertex), (char*)&i->vertices[0] + offsetof(PackedVertex, u2));
+ glEnableVertexAttribArray(tex1Loc);
+ }
+
+ if ( hasAlpha )
+ {
+ glEnable( GL_BLEND );
+ glBlendFuncSeparate(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA, GL_ONE_MINUS_DST_ALPHA, GL_ONE);
+ }
+ else
+ {
+ glDisable(GL_BLEND);
+ }
+
+ glEnableVertexAttribArray(posLoc);
+ glVertexAttribPointer(posLoc, 3, GL_FLOAT, GL_FALSE, sizeof(PackedVertex), (char*)&i->vertices[0] + offsetof(PackedVertex, x));
smspillaz
smspillaz May 2, 2013 Contributor

I think glVertexAttribPointer takes GLvoid * instead of char *. Not that this really matters here.

A broader design consideration would be to use vertex buffer objects instead of client side vertex arrays. But that's probably an issue for after this PR.

theuni
theuni May 2, 2013 Member

I have a branch around somewhere that switches to vbos, but it's really not that helpful until we do a better job with caching and our matrices. We'd be updating once per frame, so it's not much of a win.

But it's trivial to do the once-per-frame, so we can give it a shot after the initial changes are merged.

theuni
theuni May 2, 2013 Member

@topfs2 cough ^^ :)

jmarshallnz
jmarshallnz May 2, 2013 Member

It's a char* so that the pointer arithmetic works.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ ResetScissors();
+ }
+ }
+ else
+ {
+ glDrawElements(GL_TRIANGLES, triangleVerts, GL_UNSIGNED_SHORT, idx);
+ }
+ // Don't reset state until we're done with all batches. Only reset what
+ // could affect the next set of textures.
+ // TODO: Create wrappers for things like blending on/off and shader
+ // switching to prevent unnecessary changes.
+
+ if (i->diffuseTexture)
+ {
+ glDisableVertexAttribArray(tex1Loc);
+ glBindTexture(GL_TEXTURE_2D, 0);
theuni
theuni May 2, 2013 Member

what am i looking at? glvoid?

smspillaz
smspillaz May 2, 2013 Contributor

Whoops, sorry, that comment must have ended up in the wrong place. Yeah, I was referring to the GLvoid * thing aboove but its a moot point now so don't worry about it :)

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ {
+ glDrawElements(GL_TRIANGLES, triangleVerts, GL_UNSIGNED_SHORT, idx);
+ }
+ // Don't reset state until we're done with all batches. Only reset what
+ // could affect the next set of textures.
+ // TODO: Create wrappers for things like blending on/off and shader
+ // switching to prevent unnecessary changes.
+
+ if (i->diffuseTexture)
+ {
+ glDisableVertexAttribArray(tex1Loc);
+ glBindTexture(GL_TEXTURE_2D, 0);
+ }
+ if (i->texture)
+ {
+ glActiveTexture(GL_TEXTURE0);
smspillaz
smspillaz May 2, 2013 Contributor

I don't think the texture is guaranteed to be on GL_TEXTURE0 - if it is, make a note about that assumption.

theuni
theuni May 2, 2013 Member

always has been. assumed insider info i suppose. of course you're correct about documenting.

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/rendering/gles/RenderSystemGLES.cpp
+ glBindTexture(GL_TEXTURE_2D, 0);
+ }
+ if (i->texture)
+ {
+ glActiveTexture(GL_TEXTURE0);
+ glBindTexture(GL_TEXTURE_2D, 0);
+ glDisableVertexAttribArray(tex0Loc);
+ }
+ glDisableVertexAttribArray(posLoc);
+ }
+
+ glDisableVertexAttribArray(tex0Loc);
+ glDisableVertexAttribArray(tex1Loc);
+ DisableGUIShader();
+
+ glActiveTexture(GL_TEXTURE0);
smspillaz
smspillaz May 2, 2013 Contributor

This is probably redundant in light of line 799/

theuni
theuni May 2, 2013 Member

this is in case we add any texture units in the future, so that we always leave the scene set to 0 (notice it's outside the batch loop).

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/SceneGraph.h
+{
+ BatchDraw() : texture(NULL), diffuseTexture(NULL), dirty(true), color(0) { vertices.reserve(4); };
+ CBaseTexture *texture;
+ CBaseTexture *diffuseTexture;
+ bool dirty;
+ uint32_t color;
+
+ PackedVertices vertices;
+};
+
+class CSceneGraph
+{
+public:
+ typedef const BatchDraw *const_iterator;
+ CSceneGraph();
+ CSceneGraph(const CSceneGraph& other);
smspillaz
smspillaz May 2, 2013 Contributor

Should CSceneGraph really be copyable here? If it should, then a definition of the copy assignment operator is also necessary as it is not trivially copyable.

theuni
theuni May 2, 2013 Member

See earlier comment, this is left over from when I was trying various approaches to passing the scene around. For now, we'll just remove it.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/SceneGraph.h
+ void DrawQuad(const CRect &rect, color_t color, CBaseTexture *texture=NULL, const CRect *texCoords=NULL);
+
+// Only use these on a COPY of the Scene Graph. Forced double-buffering helps
+// to avoid complicated Locking.
+
+ /*! \brief helper function for iterating through batches*/
+ const_iterator begin() const;
+
+ /*! \brief helper function for iterating through batches*/
+ const_iterator end() const;
+
+ /*! \brief Perform quick optimizations on the scene based on similar
+ attributes. Contiguous batches can be merged under some conditions which
+ results in less draw calls
+ */
+ void MergeSimilar();
smspillaz
smspillaz May 2, 2013 Contributor

Feels a bit weird having this be part of the public interface.

If CRenderSystem just depends on m_batches then maybe a helper function which returns those should have the side effect of calling MergeSimilar before doing that?

theuni
theuni May 2, 2013 Member

I wanted to wait for directx before doing that. Not sure if it's overly-optimistic to believe that these operations will be render-agnostic.

Pretty much this entire interface could be described the same way currently.

smspillaz
smspillaz May 2, 2013 Contributor

Noted, thanks.

@smspillaz smspillaz and 1 other commented on an outdated diff May 2, 2013
xbmc/rendering/SceneGraph.cpp
+ packedvertices[1].v1 = coords.y1;
+ packedvertices[2].u1 = coords.x2;
+ packedvertices[2].v1 = coords.y2;
+ packedvertices[3].u1 = coords.x1;
+ packedvertices[3].v1 = coords.y2;
+ }
+
+ quad.color = color;
+ quad.texture = texture;
+ quad.dirty = true;
+ quad.diffuseTexture = NULL;
+ quad.vertices = packedvertices;
+ Add(quad);
+}
+
+void CSceneGraph::RectToVertices(const CRect &rect, PackedVertices &packedvertices)
smspillaz
smspillaz May 2, 2013 Contributor

Both this and DrawQuad can be nonmember nonfriend functions in the same namespace.

theuni
theuni May 2, 2013 Member

drawquad is tied to an instance (whether it should be is debatable). RectToVertices, sure.

smspillaz
smspillaz May 2, 2013 Contributor

Ah. It is tied to the instance, but only in a very weak way. Add () is part of the public interface so this is trivially refactorable if you think it would make sense.

@smspillaz smspillaz commented on the diff May 2, 2013
xbmc/rendering/gl/RenderSystemGL.cpp
+
+ if(g_Windowing.UseLimitedColor())
+ range = 235 - 16;
+ else
+ range = 255 - 0;
+
+ glBlendFunc(GL_SRC_ALPHA,GL_ONE_MINUS_SRC_ALPHA);
+ glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);
+
+ glTexEnvf(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_COMBINE);
+ glTexEnvf(GL_TEXTURE_ENV, GL_OPERAND0_RGB, GL_SRC_COLOR);
+ glTexEnvi(GL_TEXTURE_ENV, GL_COMBINE_ALPHA, GL_MODULATE);
+ glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE0_ALPHA, GL_TEXTURE0);
+ glTexEnvi(GL_TEXTURE_ENV, GL_OPERAND0_ALPHA, GL_SRC_ALPHA);
+ glTexEnvi(GL_TEXTURE_ENV, GL_SOURCE1_ALPHA, GL_PRIMARY_COLOR);
+ glTexEnvf(GL_TEXTURE_ENV, GL_OPERAND1_ALPHA, GL_SRC_ALPHA);
smspillaz
smspillaz May 2, 2013 Contributor

Hmm. Maybe its about time that the fixed function mode in CRenderSystemGL was killed in favor of a shader based renderer like CRenderSystemGLES ?

theuni
theuni May 2, 2013 Member

Yep, see earlier comment about keeping things the same for this PR.

My hope is to add shader-based GL as a new renderer. This way we could have gl2/gl3/gl4 selected at runtime based on hw capabilities. Iirc @jmarshallnz has a GL shader implementation around somewhere already that could probably be c/p for the most part.

theuni
theuni May 2, 2013 Member

Feel free to point out that i changed from quads to triangles, and single vertex calls to glDrawElements ;)
But that was done to keep gl/gles in-line as much as possible while not having to jump to a different rendering approach/standard.

jmarshallnz
jmarshallnz May 2, 2013 Member

My "implementation" was literally a c&p from the GLES, with just the (very) minor changes the shaders needed.

theuni
theuni May 2, 2013 Member

Well, now is a good time to decide how to handle these subtle differences. We can start down the inheritance path, which we always seem to end up undoing at some point. Or we can do a proper factory with calls into the current renderer. Or we can just flat-out copy/paste.

It's largely irrelevant until we can change renderers on the fly, but it's worth putting some thought into it now.

smspillaz
smspillaz May 2, 2013 Contributor

I can do some investigation on this. In compiz a large part of the GLES /
GL code is combined.

See also "OpenGL Insights" - there's a chapter in there about that. But I agree on topic-relevancy, this is something that can be considered elsewhere.

@leechguy leechguy and 1 other commented on an outdated diff May 2, 2013
xbmc/guilib/GUIFontTTF.cpp
+CBaseTexture* CGUIFontTTF::ReallocTexture(unsigned int& newHeight)
+{
+ unsigned int oldHeight = m_textureHeight;
+ unsigned int oldPitch = m_texturePitch;
+ newHeight = CBaseTexture::PadPow2(newHeight);
+ if(!m_texture)
+ m_texture = new CBaseTexture;
+
+ m_texture->Allocate(m_textureWidth, newHeight, XB_FMT_A8L8);
+ if (!m_texture)
+ {
+ CLog::Log(LOGERROR, "GUIFontTTFGL::CacheCharacter: Error creating new cache texture for size %f", m_height);
+ delete m_texture;
+ return NULL;
+ }
+ m_textureHeight = m_texture->GetHeight();
leechguy
leechguy May 2, 2013 Contributor

Please ignore if I'm making a stupid remark, but I've been staring at this part of the code for a while now and can't figure it out (lines 852, 853 and 856).

Line 852 should be moved to below the if block? If not, then I don't understand the if (!m_texture) on line 853. I mean, this will never evaluate to true because either m_texture is not NULL, or when m_texture is NULL you will have already caused a null pointer exception at line 852 when invoking m_texture->Allocate().

Line 853/856. If m_texture is NULL, then why delete it at line 856 if it does not exist anyway?

theuni
theuni May 2, 2013 Member

Yep, there was some reshuffling and copy/paste here, so you're correct. Will move the allocate down.

Member

Great work @theuni. If I get time I'll try and do up the DX implementation. I don't hold much hope over the next few days though, so if someone else can beat me to it, please do :)

Member
theuni commented May 2, 2013

Thanks for the great reviews @jmarshallnz and @smspillaz. I have a long list of things to address, all helpful suggestions.

As for directx, I don't think there's much hope of making it for this window unfortunately, so there's not any rush. Of more importance (and personal interest) than the actual implementation would be a quick overview from someone who knows dx, and who could pick apart any areas where the scene graph is inadequate. I'm very interested in a complete abstraction here, so that other renderers (software via pixman/cairo comes to mind) could be added in the future.

I'm also curious to hear from the matrix side if anything there would be needed other than a spot for them in our batch structure.

Contributor

No problem @theuni . I mirror the comments of @jmarshallnz - this work looks excellent (and in fact is something I've wanted to do in another project for quite some time).

Probably something that wasn't mentioned was the big benefit you get in terms of cache efficiency when moving to a batched render model like this one - if you push all of the batches down at once organized in a linear array the CPU can just pre-fetch all of the vertex data.

@theuni, excellent work! I was actually thinking about this scenario some day ago when I was profiling an idling xbmc setup. Hopes this gets into main line in the near future. Keep up the good work! /Regards, Lars.

Member
theuni commented May 3, 2013

@popcornmix: mind giving this a go for rpi? It would be nice to have verification that there are no major problems there.

Member

@theuni
Sure - I'll test it tomorrow. Looks very interesting.

Member
theuni commented May 3, 2013

Thanks! Assuming it builds and works ok, we need to verify that video playback still looks correct. Both full-screen and in GUI background.

Also curious to know if there's any speedup as a result. Rpi's gpu is quick enough that I doubt there were many stalls during the loop, but those stalls are very driver-dependent. The extra processing overhead could actually slow things down a little, hopefully the other optims offset that.

Member

On Pi I get compile error:
In file included from SceneGraph.cpp:21:0: SceneGraph.h:25:9: error: 'uint32_t' does not name a type SceneGraph.h:48:3: error: 'uint32_t' does not name a type

#include <stdint.h>
fixes that.

I disabled vsync, and normally see about 66fps in system info/video screen.
With your patch I see about 70fps, so it looks a fraction faster,
This is a debug build.

Video played fine (on Pi that's on a non-GL layer, so that's not surprising).
Pressing TAB to combine GUI and video playback seemed fine.
Music with visualisation was fine.

Looks promising.

Member
theuni commented May 3, 2013

Awesome, thanks for testing. The optims aren't in yet, so the fact that it isn't slower is a win in my book. Will fix the include.

rbej commented May 4, 2013

Confirm Rpi compile error and #include stdint.h fixed error. Everything works ok. No error and glitches on Rpi.

Cory Fields rendermanager: fixups from review
- m_batchDraws to map
- delete cosmetics
- no arithmetic in new[]
- replace irrelevant comment
- move GetSceneGraph outside of draw loop
01b861b
Cory Fields added some commits May 4, 2013
Cory Fields rendermanager: drop CGUIFontTTFBase define e365094
Cory Fields rendermanager: use DrawQuad to paint a window background if needed 0da8886
Cory Fields rendermanager: turn BatchDraw into a class with setters a5a0d9d
Cory Fields rendermanager: add helper Reset() function 0aec348
Cory Fields rendermanager: rename Clear to Reset so it's not mistaken for a draw …
…operation
0cb9663
Cory Fields rendermanager: Split BatchDraw class into a new file c48d90e
Cory Fields rendermanager: rename BatchDraw to CBatchDraw 705de78
Cory Fields rendermanager: guilib cleanups
- rename to CBatchDraw
- use setter functions
- use Reset() on the batch rather than just NULL'ing the texture
ac23177
Cory Fields rendermanager: remove unused vars 094eba3
Cory Fields rendermanager: fix bad copy/paste 8f5b85f
Cory Fields rendermanager: add getters for BatchDraw dc8956e
Cory Fields rendermanager: use BatchDraw getters
Also drop the checks for vertices, this will be done already
29dc209
Cory Fields rendermanager: add get/set functions for testing if a texture is curr…
…ent on the GPU

This lets us drop the nasty friendships with the renderers
a82c462
Cory Fields rendermanager: use get/set gpu functions for textures 68ba3e4
Cory Fields rendermanager: fixup slideshow after scenegraph/batchtexture changes c483967
Cory Fields rendermanager: add BatchDraw helpers for merging similar batches 47c2790
Cory Fields rendermanager: simplify MergeSimilar with new CanMerge/Merge functions eeda0dd
Cory Fields rendermanager: rename MergeSimilar to PreProcess in preparation for a…
…dding a few other checks
e3213fb
Cory Fields rendermanager: check for valid vertices 0db89d9
Cory Fields rendermanager: change iter++ to ++iter in a few places a43bd32
Cory Fields rendermanager: make texture unit usage more obvious c59716a
Cory Fields rendermanager: switch to vectors of shared_ptr
This almost eliminates the copying around of vector elements, which was proving
to be very expensive.

Also added SetVertices() for a cheap means of moving all PackedVertices into a
Batch
2d9df1b
Cory Fields rendermanager: change fonts to use shared_ptr 42325bb
Cory Fields rendermanager: update textures to use shared_ptr 349e5db
Cory Fields rendermanager: update picture slideshow to use shared_ptr 3ee3756
Cory Fields rendermanager: update gles renderer to use shared_ptr 14f017d
Cory Fields rendermanager: forward-declare to avoid including SceneGraph.h 4036754
Cory Fields rendermanager: use a lighter include 9227817
Cory Fields rendermanager: no need for casts here 54285e0
Cory Fields rendermanager: remove unnecessary include 055e852
Cory Fields rendermanager: fixups for gl renderer after review and structural cha…
…nges
62bbc3e
Cory Fields rendermanager: zero-copy vertices for textures
Don't keep a copy of vertex data, since it's undefined after it's been added
anyway. Instead, calculate the number of vertices are required, allocate
exactly that many, then work directly on the destination's vector.

TODO: Same thing for fonts, they're a killer.
c8bf42e
Cory Fields rendermanager: forward-declare to avoid an include in GUITexture.h 1356c1a
Cory Fields rendermanager: plug a small mem leak 222b24b
Member
theuni commented May 7, 2013

I believe I've addressed most of the concerns here (the one that remains is moving the pixel backing store to the Texture, but I'd prefer to hold on that until we can decide how to get font render cpu usage under control first).

The commits here are a bit messy because the interfaces are still a bit hazy, but I think it's approaching something sensible. I've done my best to eliminate any ping-pong'ing done during development, but some things may still look a bit out of order where I've squashed them or moved them around here.

I also reworked quite a bit, and I'm happier with the result now. In profiling the new batching methods, it was easy to see that a bulk of the effort was spent on creating, allocating, copying, and destroying buffers. After playing with a few different models, I think I've found an approach that works reasonably well. The vectors of batch data have moved to vectors of shared_ptr's of batch data. This greatly eliminates the alloc/copy burden, and most of it is handled behind the scenes.

While I was at it, I also moved BatchDraw into its own class, mainly just for future-proofing and cleanliness.

Here (ignoring the wonky coords order, I left that as-is) is a pretty good representation of the zero-copy method, which is (imo) what we should be shooting for.

The last big thing on my radar is trying to get font calculating/batching/copying under control, but I'm beginning to wonder if we would be better off solving it a higher level (flagging when text needs to render but has not changed since last frame, so cached batch data could be used). If that is a reasonable approach, then I think refactoring font batching would largely be a waste of time.

Contributor

i believe your branch is working?
i ask because here ( at my setup ) things are going worse. with mainstream code from before few days i get around 35-36 frames in system info and rss text onto main screen is moving fluent. with code from your fork i get ~22-23 fps, text is with snatches and animations in transitions from one screen to another in system menu to are with snatches. on some places where previously i get ~30% cpu load, now i get 60+% cpu. vsync doesn't change things because in both situations demand is for higher refresh than my system can produce.
my rpi is 'ondemand' clocked at 1000 ( and is >97% loaded during this test ), selected and native resolution on my tv is 1920x1080@60, via HDMI. release build.

Member
theuni commented May 7, 2013

You're probably being hit with the extra cost of text processing mentioned above. Should be easy enough to test that theory. Find 2 screens that are similar except that one has more text than the other. If you're interested, gdb would probably give a quick peek into what's going on under the hood if you just randomly break a few times and see what function the main thread is in.

I'm still thinking on how to improve this.

Contributor

may be. definitely text rendering has something here. lowest load i get is in main system settings window, where text is only on menu on left side. definitely last commit makes difference. before: more snatches in rss text, less visible in windows transition animations. now: little less in text, little more in transitions. roughly same fps and cpu load for both commits.

just to remember, @popcornmix reported above ~70fps in system info screen in debug, same screen at which i get ~22-23 in release. this lead me for next time to personal conclusion that there are two hardware versions, behaving differently. previously i see reports that exactly the same image behaves differently regardless vsync. for some it just work, for others, including me, not. may be i have 'bad' chance to own slower version, but that chance is good in providing capabilities for tests on this platform too. that is why i report my findings too.
that are just personal feelings about hardware versions and i can be wrong.

@FernetMenta FernetMenta referenced this pull request in OpenELEC/OpenELEC.tv May 15, 2013
Closed

ASIC hang back again #2296

@theuni theuni referenced this pull request in opdenkamp/xbmc-pvr-addons Jun 9, 2013
@FernetMenta FernetMenta vnsi: add gles rendering for vdr ui 298fcd3
Member

@smspillaz in case you might have time to look into this again at some point (or you, ofcourse @theuni :) ) I've rebased it up here:

https://github.com/jmarshallnz/xbmc/tree/deferred_renderer

I likely won't have time to actually work on it short-term, but at least it builds (it doesn't actually work ofcourse...)

Member
theuni commented Jul 12, 2014

@jmarshallnz Heh, I would actually love to get this in at some point because I spent so many days/nights cursing at the main-thread rendering (I have another branch somewhere that builds on top of this one, moving the rendering out of the main thread, but it was just an experimental hack). But this isn't a high priority for me right now :)

That said, it'd probably be more useful to use this PR as a guideline rather than getting it back in working order. IIRC profiling showed an unnecessary absurd amount of unnecessary vertex data copying going on. The premise was good though.

It could probably be broken into ~3 stages though, so it could be added piecemeal:

  • Define a data structure for a list of vertices. Here it was a vector of PackedVertex's. vector is probably far too heavy for the purpose. For the controls that render themselves (fonts, textures, etc), load up their vertices into that data structure and pull them back out to render. This seems simple enough, but it's a bit of a challenge to define an efficient structure (without nasty ifdefs) that all engines can use.
  • Create a render manager that does immediate renders of vertices+textures passed in for each rendering engine. This could likely be c/p from textures for the most part. This should allow the gl/gles/dx renderers to be written pretty quickly without too many regressions.
  • Switch to deferred rendering. This is the tricky part where I stalled. GL/GLES worked but I couldn't do DX. I wonder if it may be possible to do this on a per-engine basis, so that one wouldn't hold up the rest.
Member

Cheers @theuni - agreed that the first steps would be moving the (small amount) of texture/font, engine-specific stuff into renderer so that it's all in one spot to begin with, so that guilib etc. is free of specifics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment