This simplifies the vertex data uploading in the journal, and could improve
performance. Modifying a VBO mid-scene could reqire synchronizing with the
GPU or some form of shadowing/copying to avoid modifying data that the GPU
is currently processing; the buffer was also being marked as GL_STATIC_DRAW
which could have made things worse.
Now we simply create a GL_STATIC_DRAW VBO for each flush and and delete it
when we are finished.
This simplifies the vertex data uploading in the journal, and could improve
performance. Modifying a VBO mid-scene could reqire synchronizing with the
GPU or some form of shadowing/copying to avoid modifying data that the GPU
is currently processing; the buffer was also being marked as GL_STATIC_DRAW
which could have made things worse.
Now we simply create a GL_STATIC_DRAW VBO for each flush and and delete it
when we are finished.
Using cogl_rectangle (and thus the journal) in
_cogl_add_path_to_stencil_buffer means we have to consider all the state
that the journal may change in case it may interfer with the direct GL calls
used. This has proven to be error prone and in this case the journal is an
unnecissary overhead. We now simply call glRectf instead of using
cogl_rectangle.
Using cogl_rectangle (and thus the journal) in
_cogl_add_path_to_stencil_buffer means we have to consider all the state
that the journal may change in case it may interfer with the direct GL calls
used. This has proven to be error prone and in this case the journal is an
unnecissary overhead. We now simply call glRectf instead of using
cogl_rectangle.
We were missing the simplest test of all: are the two CoglHandles equal and
are the flush option flags for each material equal? This should improve
batching for some common cases.
For small runs of text like icon labels, we can get better performance
going through the Cogl journal since text may then be batched together
with other geometry.
For larger runs of text though we still use VBOs since the cost of logging
the quads becomes too expensive, including the software transform which
isn't at all optimized at this point. VBOs also have the further advantage
of avoiding repeated validation of vertices by the driver and repeated
mapping of data into the GPU so long as the text doesn't change.
Currently the threshold is 100 vertices/25 quads. This number was plucked
out of thin air and should be tuned later.
With this change I see ~180% fps improvment for test-text. (x61s + i965 +
Mesa 7.6-devel)
Whenever we modify a material we call _cogl_material_pre_change_notify which
checks to see if the material is referenced by the journal and if so flushes
if before we modify the material.
Since the journal logs material colors directly into a vertex array (to
avoid us repeatedly calling glColor) then we know we never need to flush
the journal when material colors change.
We were missing the simplest test of all: are the two CoglHandles equal and
are the flush option flags for each material equal? This should improve
batching for some common cases.
Since most Clutter actors aren't much more than textured quads; flushing the
journal typically involves lots of 'change modelview; draw quad' sequences.
The amount of overhead involved in uploading a new modelview and queuing
that primitive is huge in comparison to simply transforming 4 vertices by
the current modelview when logging quads. (Note if your GPU supports HW
vertex transform, then it still does the projective and viewport transforms)
At the same time a --cogl-debug=disable-software-transform option has been
added for comparison and debugging.
This change allows typical pick scenes to be batched into a single draw call
and I'm seeing test-pick run over 200% faster with this. (i965 + Mesa
7.6-devel)
Whenever we modify a material we call _cogl_material_pre_change_notify which
checks to see if the material is referenced by the journal and if so flushes
if before we modify the material.
Since the journal logs material colors directly into a vertex array (to
avoid us repeatedly calling glColor) then we know we never need to flush
the journal when material colors change.
Since most Clutter actors aren't much more than textured quads; flushing the
journal typically involves lots of 'change modelview; draw quad' sequences.
The amount of overhead involved in uploading a new modelview and queuing
that primitive is huge in comparison to simply transforming 4 vertices by
the current modelview when logging quads. (Note if your GPU supports HW
vertex transform, then it still does the projective and viewport transforms)
At the same time a --cogl-debug=disable-software-transform option has been
added for comparison and debugging.
This change allows typical pick scenes to be batched into a single draw call
and I'm seeing test-pick run over 200% faster with this. (i965 + Mesa
7.6-devel)
Enabling this option makes Cogl trace how the journal is managing to batch
your rectangles. The journal staggers how it emmits state to the GL driver
and the batches will normally get smaller for each stage, but ideally you
don't want to be in a situation where Cogl is only able to draw one quad per
modelview change and draw call.
E.g. this is a fairly ideal example:
BATCHING: journal len = 101
BATCHING: vbo offset batch len = 101
BATCHING: material batch len = 101
BATCHING: modelview batch len = 101
This isn't:
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
<repeat>
Enabling this option makes Cogl trace how the journal is managing to batch
your rectangles. The journal staggers how it emmits state to the GL driver
and the batches will normally get smaller for each stage, but ideally you
don't want to be in a situation where Cogl is only able to draw one quad per
modelview change and draw call.
E.g. this is a fairly ideal example:
BATCHING: journal len = 101
BATCHING: vbo offset batch len = 101
BATCHING: material batch len = 101
BATCHING: modelview batch len = 101
This isn't:
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
<repeat>
When this option is used Cogl will print a trace of all quads that get
logged into the journal, and a trace of quads as they get flushed.
If you are seeing a bug with the geometry being drawn by Cogl this may give
some clues by letting you sanity check the numbers being logged vs the
numbers being emitted.
When this option is used Cogl will print a trace of all quads that get
logged into the journal, and a trace of quads as they get flushed.
If you are seeing a bug with the geometry being drawn by Cogl this may give
some clues by letting you sanity check the numbers being logged vs the
numbers being emitted.
For testing the VBO fallback paths it helps to be able to disable the
COGL_FEATURE_VBOS feature flag. When VBOs aren't available Cogl should use
client side malloc()'d buffers instead.
For testing the VBO fallback paths it helps to be able to disable the
COGL_FEATURE_VBOS feature flag. When VBOs aren't available Cogl should use
client side malloc()'d buffers instead.
Previously we only used the Cogl matrix stack API for indirect contexts, but
it's too costly to keep on requesting modelview matrices from GL (for
logging in the journal) even for direct rendering.
I also experimented with a patch for mesa to improve performance and
discussed this with upstream, but we agreed to consider the GL matrix API
essentially deprecated. (For reference the GLES 2 and GL 3 specs have
removed the matrix APIs)
Previously we only used the Cogl matrix stack API for indirect contexts, but
it's too costly to keep on requesting modelview matrices from GL (for
logging in the journal) even for direct rendering.
I also experimented with a patch for mesa to improve performance and
discussed this with upstream, but we agreed to consider the GL matrix API
essentially deprecated. (For reference the GLES 2 and GL 3 specs have
removed the matrix APIs)
CoglColors shouldn't be compared using memcmp since they may contain
uninitialized padding bytes.
The prototype is also suitable for passing to g_hash_table_new as the
key_equal_func.
_cogl_pango_display_list_add_texture now uses this instead of memcmp.
CoglColors shouldn't be compared using memcmp since they may contain
uninitialized padding bytes.
The prototype is also suitable for passing to g_hash_table_new as the
key_equal_func.
_cogl_pango_display_list_add_texture now uses this instead of memcmp.
We now put the color of materials into the vertex array used by the journal
instead of calling glColor() but the number of requests for the material
color were quite expensive so we have changed the material color to
internally be byte components instead of floats to avoid repeat conversions
and added _cogl_material_get_colorubv as a fast-path for the journal to
copy data into the vertex array.
We now put the color of materials into the vertex array used by the journal
instead of calling glColor() but the number of requests for the material
color were quite expensive so we have changed the material color to
internally be byte components instead of floats to avoid repeat conversions
and added _cogl_material_get_colorubv as a fast-path for the journal to
copy data into the vertex array.
To improve batching of geometry in the Cogl journal we need to avoid modifying
materials midscene.
Currently cogl_set_source_color and cogl_set_source_texture simply modify a
single shared material. In the future we can improve this so they use a pool
of materials that gets recycled as the journal is flushed, but for now we
give all ClutterRectangles their own private materials for painting with.
The number of material layers enabled when logging a quad in the journal
determines the stride of the corresponding vertex data (since we need a set
of texture coordinates for each layer.) By padding data in the case where we
have only one layer we can avoid a change in stride if we are mixing single
and double layer primitives in a scene (e.g. relevent for a composite
manager that may use 2 layers for all shaped windows) Avoiding stride
changes means we can minimize calls to gl{Vertex,Color}Pointer when flushing
the journal.
Since we need to update the texcoord pointers when the actual number of
layers changes, this adds another batch_and_call() stage to deal with
glTexCoordPointer and enabling/disabling the client arrays.
Currently cogl_set_source_color uses a single shared material which means
each actor that uses it causes the journal to flush if the color changes.
Until we improve cogl_set_source_color to use a pool of materials that can
be recycled as the journal is flushed we avoid mid-scene material changes by
giving all actors a private material instead.
Previously the journal was always flushed at the end of
_cogl_rectangles_with_multitexture_coords, (i.e. the end of any
cogl_rectangle* calls) but now we have broadened the potential for batching
geometry. In ideal circumstances we will only flush once per scene.
In summary the journal works like this:
When you use any of the cogl_rectangle* APIs then nothing is emitted to the
GPU at this point, we just log one or more quads into the journal. A
journal entry consists of the quad coordinates, an associated material
reference, and a modelview matrix. Ideally the journal only gets flushed
once at the end of a scene, but in fact there are things to consider that
may cause unwanted flushing, including:
- modifying materials mid-scene
This is because each quad in the journal has an associated material
reference (i.e. not copy), so if you try and modify a material that is
already referenced in the journal we force a flush first)
NOTE: For now this means you should avoid using cogl_set_source_color()
since that currently uses a single shared material. Later we
should change it to use a pool of materials that is recycled
when the journal is flushed.
- modifying any state that isn't currently logged, such as depth, fog and
backface culling enables.
The first thing that happens when flushing, is to upload all the vertex data
associated with the journal into a single VBO.
We then go through a process of splitting up the journal into batches that
have compatible state so they can be emitted to the GPU together. This is
currently broken up into 3 levels so we can stagger the state changes:
1) we break the journal up according to changes in the number of material layers
associated with logged quads. The number of layers in a material determines
the stride of the associated vertices, so we have to update our vertex
array offsets at this level. (i.e. calling gl{Vertex,Color},Pointer etc)
2) we further split batches up according to material compatability. (e.g.
materials with different textures) We flush material state at this level.
3) Finally we split batches up according to modelview changes. At this level
we update the modelview matrix and actually emit the actual draw command.
This commit is largely about putting the initial design in-place; this will be
followed by other changes that take advantage of the extended batching.
The number of material layers enabled when logging a quad in the journal
determines the stride of the corresponding vertex data (since we need a set
of texture coordinates for each layer.) By padding data in the case where we
have only one layer we can avoid a change in stride if we are mixing single
and double layer primitives in a scene (e.g. relevent for a composite
manager that may use 2 layers for all shaped windows) Avoiding stride
changes means we can minimize calls to gl{Vertex,Color}Pointer when flushing
the journal.
Since we need to update the texcoord pointers when the actual number of
layers changes, this adds another batch_and_call() stage to deal with
glTexCoordPointer and enabling/disabling the client arrays.
Previously the journal was always flushed at the end of
_cogl_rectangles_with_multitexture_coords, (i.e. the end of any
cogl_rectangle* calls) but now we have broadened the potential for batching
geometry. In ideal circumstances we will only flush once per scene.
In summary the journal works like this:
When you use any of the cogl_rectangle* APIs then nothing is emitted to the
GPU at this point, we just log one or more quads into the journal. A
journal entry consists of the quad coordinates, an associated material
reference, and a modelview matrix. Ideally the journal only gets flushed
once at the end of a scene, but in fact there are things to consider that
may cause unwanted flushing, including:
- modifying materials mid-scene
This is because each quad in the journal has an associated material
reference (i.e. not copy), so if you try and modify a material that is
already referenced in the journal we force a flush first)
NOTE: For now this means you should avoid using cogl_set_source_color()
since that currently uses a single shared material. Later we
should change it to use a pool of materials that is recycled
when the journal is flushed.
- modifying any state that isn't currently logged, such as depth, fog and
backface culling enables.
The first thing that happens when flushing, is to upload all the vertex data
associated with the journal into a single VBO.
We then go through a process of splitting up the journal into batches that
have compatible state so they can be emitted to the GPU together. This is
currently broken up into 3 levels so we can stagger the state changes:
1) we break the journal up according to changes in the number of material layers
associated with logged quads. The number of layers in a material determines
the stride of the associated vertices, so we have to update our vertex
array offsets at this level. (i.e. calling gl{Vertex,Color},Pointer etc)
2) we further split batches up according to material compatability. (e.g.
materials with different textures) We flush material state at this level.
3) Finally we split batches up according to modelview changes. At this level
we update the modelview matrix and actually emit the actual draw command.
This commit is largely about putting the initial design in-place; this will be
followed by other changes that take advantage of the extended batching.
Removed the G_PARAM_CONSTRUCT from the property registration of
"load-async" and "load-data-async". It made it impossible to use only
load-data-async, as the async loading state would be unset when
load-async got set it's default FALSE value.
Remove a number of functions that were either entirely unimplemented
or had empty implementations for the Clutter-compositor.
meta_compositor_begin_move()
meta_compositor_update_move()
meta_compositor_end_move()
meta_compositor_set_active_window()
meta_compositor_free_window()
http://bugzilla.gnome.org/show_bug.cgi?id=581813
Now that we only have one compositor, there's no reason to access the
compositor functions through a vtable. Remove the MetaCompositor virtualization
and make the clutter code implement the meta_compositor_* functions
directly.
Move the checks for the compositor being NULL from the vtable wrappers
to the calling code (most of them were already there, so just a few
needed to be added)
Note: the compositor is actually hard-coded on at the moment and the plan
is to remove the non-composited code entirely, so the checks are
added only to keep things neat: they have no practical effect.
http://bugzilla.gnome.org/show_bug.cgi?id=581813
Mutter is a Clutter-based compositing manager. So, remove the code for
the XRender-based compositor, and make it mandatory to have XComposite,
XRender and Clutter.
Run-time support for non-composited operation is left for now.
* src/compositor/mutter/: Move files from this subdirectory into
the main compositor/ directory.
* compositor/compositor-xrender.ccompositor/compositor-xrender.h:
Remove
* include/compositor-clutter.h: Remove this stray file, it had been
replaced with compositor-mutter.h some time back.
http://bugzilla.gnome.org/show_bug.cgi?id=581813
Since the stack passed to the compositor now accurately reflects
the X stacking order, we need to treat hidden windows (which are
at the bottom of the X stacking order) specially - when the
compositor stacking order is synced, try to keep animating hidden
actors in their old positions in the stack.
http://bugzilla.gnome.org/show_bug.cgi?id=585984
Use signed integers while combining window space clip rectangles, so we avoid
arithmatic errors later resulting in glScissor getting negative width and
height arguments.
Use signed integers while combining window space clip rectangles, so we avoid
arithmatic errors later resulting in glScissor getting negative width and
height arguments.
With MetaStackTracker, it's no longer necessary to XQueryTree to
get a reasonably-up-to-date view of the server stacking order.
Add some comments explaining unclear aspects of
raise_window_relative_to_managed_windows() and with future possible
improvements.
http://bugzilla.gnome.org/show_bug.cgi?id=585984
Don't add override-redirect windows to MetaStack; we shouldn't
be restacking them.
Since we *aren't* stacking the override-redirect windows, we need to
be careful that to ignore them when looking for the top managed
window.
http://bugzilla.gnome.org/show_bug.cgi?id=585984
In order to properly track the stacking order for override-redirect
windows, move meta_compositor_sync_stack() call into MetaStackTracker.
In the new location, we sync the stack as a before-redraw idle function,
rather then using the freeze-thaw facilities of MetaStack. This is
simpler, and also properly compresses multiple stack changes on
notifications received from the X server.
http://bugzilla.gnome.org/show_bug.cgi?id=585984
Wedging override-redirect windows into the constraint code in stack.c
results in Mutter getting confused about the stacking order of
these windows with respect to other windows, and may also in some
cases cause Mutter to restack override-redirect windows.
core/stack-tracker.c core/stack-tracker.h: MetaStackTracker - combine
events received from the X server with local changes we have made
to come up with the best possible idea of what the stacking order
is at any one point in time.
core/screen.c core/screen-private.h: Create a MetaStackTracker for
the screen.
core/display.c: Feed relevant events to MetaStackTracker
core/frame.c core/screen.c core/stack.c: When we make changes to the
stacking order or add windows, record those changes immediatley
in MetaStackTracker so we have the information without waiting
for a round-trip.
include/ui.h ui/ui.c: meta_ui_create_frame_window add a return value
for the X request serial used to create the window.
http://bugzilla.gnome.org/show_bug.cgi?id=585984