[cogl] Improve ability to break out into raw OpenGL via begin/end mechanism
Adds a cogl_flush() to give developers breaking into raw GL a fighting chance
[cogl-material] Be more carefull about flushing in cogl_material_remove_layer
Revert "[rectangle] Avoid modifying materials mid scene"
Revert "[actor] Avoid modifying materials mid-scene to improve journal batching"
[cogl-vertex-buffer] Disable unused client tex coord arrays
[cogl] disable all client tex coord arrays in _cogl_add_path_to_stencil_buffer
[cogl] flush matrices in _cogl_add_path_to_stencil_buffer
[journal] Don't resize a singlton VBO; create and destroy a VBO each flush
[cogl] avoid using the journal in _cogl_add_path_to_stencil_buffer
[pango-display-list] Use the Cogl journal for short runs of text
[material] _cogl_material_equal: catch the simplest case of matching handles
[material] avoid flushing the journal when just changing the color
[cogl journal] Perform software modelview transform on logged quads.
[Cogl journal] use G_UNLIKLEY around runtime debugging conditions
[cogl journal] Adds a --cogl-debug=batching option to trace batching
[Cogl journal] Adds a --cogl-debug=journal option for tracing the journal
[cogl] Adds a debug option for disabling use of VBOs --cogl-debug=disable-vbos
[cogl] Force Cogl to always use the client side matrix stack
[cogl-debug] Adds a "client-side-matrices" Cogl debug option
[cogl-color] Adds a cogl_color_equal() function
[cogl material] optimize logging of material colors in the journal
[rectangle] Avoid modifying materials mid scene
[actor] Avoid modifying materials mid-scene to improve journal batching
[journal] Always pad our vertex data as if at least 2 layers are enabled
[cogl] Improving Cogl journal to minimize driver overheads + GPU state changes
The Cogl journal is a mechanism Cogl uses to batch geometry resulting from
any of the cogl_rectangle* functions before sending it to OpenGL. This aims
to improve the Cogl journal so that it can reduce the number of state
changes and draw calls we issue to the OpenGL driver and hopfully improve
performance.
Previously each call to any of the cogl_rectangle* functions would imply an
immediate GL draw call, as well as a corresponding modelview change;
material state changes and gl{Vertex,Color,TexCoord}Pointer calls. Now
though we have tried to open the scope for batching up as much as possible
so we only have to flush the geometry either before calling glXSwapBuffers,
or when we change state that isn't tracked by the journal.
As a basic example, it's now possible for us to batch typical picking
renders into a single draw call for the whole scene.
Some key points about this change:
- We now perform transformations of quads in software (except for long runs of
text which continue to use VBOs)
* It might seem surprising at first, but when you consider that so many
Clutter actors are little more than textured quads and each actor
typically implies a modelview matrix change; the costs involved in
setting up the GPU with the new modelview can easily out weigh the cost
of simply transforming 4 vertices.
- We always use Cogl's own client side matrix API now.
* We found the performance of querying the OpenGL driver for matrix state
was often worse than using the client matrix code, and also - discussing
with Mesa developers - agreed that since khronos has essentially
deprecated the GL matrix API (by removing it from OpenGL 3 and
OpenGL-ES 2) it was appropriate to take full responsibility for all our
matrix manipulation.
- Developers should avoid modifying materials mid-scene.
* With the exception of material color changes, if you try and modify a
material that is referenced in the journal we will currently force a
journal flush. Note: you can assume that re-setting the same value for
a material property won't require a flush though.
- Several new --cogl-debug options have been added
* "disable-batching" can be used to identify bugs in the way that the
journal does its batching; of could this shouldn't ever be needed :-)
* "disable-vbos" can be used to test the VBO fallback paths where we
simply use malloc()'d buffers instead.
* "batching" lets you get an overview of how the journal is batching
your geometry and may help you identify ways to improve your
application performance.
* "journal" lets you trace all the geometry as it gets logged in the
journal, and all the geometry as its flushed from the journal.
Obviously an inconsistency can identify a bug, but the numbers may
help you verify application logic too.
* "disable-software-transform" as implied will instead use the driver
/GPU to transform quads by the modelview matrix.
Although committed separately a --clutter-debug=nop-picking option was
also added that lets you remove picking from the equation, which can
sometimes help make problem analysis more deterministic.
Although we wouldn't recommend developers try and interleve OpenGL drawing
with Cogl drawing - we would prefer patches that improve Cogl to avoid this
if possible - we are providing a simple mechanism that will at least give
developers a fighting chance if they find it necissary.
Note: we aren't helping developers change OpenGL state to modify the
behaviour of Cogl drawing functions - it's unlikley that can ever be
reliably supported - but if they are trying to do something like:
- setup some OpenGL state.
- draw using OpenGL (e.g. glDrawArrays() )
- reset modified OpenGL state.
- continue using Cogl to draw
They should surround their blocks of raw OpenGL with cogl_begin_gl() and
cogl_end_gl():
cogl_begin_gl ();
- setup some OpenGL state.
- draw using OpenGL (e.g. glDrawArrays() )
- reset modified OpenGL state.
cogl_end_gl ();
- continue using Cogl to draw
Again; we aren't supporting code like this:
- setup some OpenGL state.
- use Cogl to draw
- reset modified OpenGL state.
When the internals of Cogl evolves, this is very liable to break.
cogl_begin_gl() will flush all internally batched Cogl primitives, and emit
all internal Cogl state to OpenGL as if it were going to draw something
itself.
The result is that the OpenGL modelview matrix will be setup; the state
corresponding to the current source material will be setup and other world
state such as backface culling, depth and fogging enabledness will be also
be sent to OpenGL.
Note: no special material state is flushed, so if developers want Cogl to setup
a simplified material state it is the their responsibility to set a simple
source material before calling cogl_begin_gl. E.g. by calling
cogl_set_source_color4ub().
Note: It is the developers responsibility to restore any OpenGL state that they
modify to how it was after calling cogl_begin_gl() if they don't do this then
the result of further Cogl calls is undefined.
This function should only need to be called in exceptional circumstances
since Cogl can normally determine internally when a flush is necessary.
As an optimization Cogl drawing functions may batch up primitives
internally, so if you are trying to use raw GL outside of Cogl you stand a
better chance of being successful if you ask Cogl to flush any batched
geometry before making your state changes.
cogl_flush() ensures that the underlying driver is issued all the commands
necessary to draw the batched primitives. It provides no guarantees about
when the driver will complete the rendering.
This provides no guarantees about the GL state upon returning and to avoid
confusing Cogl you should aim to restore any changes you make before
resuming use of Cogl.
If you are making state changes with the intention of affecting Cogl drawing
primitives you are 100% on your own since you stand a good chance of
conflicting with Cogl internals. For example clutter-gst which currently
uses direct GL calls to bind ARBfp programs will very likely break when Cogl
starts to use ARBfb programs internally for the material API, but for now it
can use cogl_flush() to at least ensure that the ARBfp program isn't applied
to additional primitives.
This does not provide a robust generalized solution supporting safe use of
raw GL, its use is very much discouraged.
Previously we would call _cogl_material_pre_change_notify unconditionally, but
now we wait until we really know we are removing a layer before notifying the
change, which will require a journal flush.
Since the convenience functions cogl_set_source_color4ub and
cogl_set_source_texture share a single material, cogl_set_source_color4ub
always calls cogl_material_remove_layer. Often this is a NOP though and
shouldn't require a journal flush.
This gets performance back to where it was before reverting the per-actor
material commits.
Before any cogl vertex buffer drawing we call
enable_state_for_drawing_buffer which sets up the GL state, but we weren't
disabling unsed client texture coord arrays.
This simplifies the vertex data uploading in the journal, and could improve
performance. Modifying a VBO mid-scene could reqire synchronizing with the
GPU or some form of shadowing/copying to avoid modifying data that the GPU
is currently processing; the buffer was also being marked as GL_STATIC_DRAW
which could have made things worse.
Now we simply create a GL_STATIC_DRAW VBO for each flush and and delete it
when we are finished.
Using cogl_rectangle (and thus the journal) in
_cogl_add_path_to_stencil_buffer means we have to consider all the state
that the journal may change in case it may interfer with the direct GL calls
used. This has proven to be error prone and in this case the journal is an
unnecissary overhead. We now simply call glRectf instead of using
cogl_rectangle.
We were missing the simplest test of all: are the two CoglHandles equal and
are the flush option flags for each material equal? This should improve
batching for some common cases.
Whenever we modify a material we call _cogl_material_pre_change_notify which
checks to see if the material is referenced by the journal and if so flushes
if before we modify the material.
Since the journal logs material colors directly into a vertex array (to
avoid us repeatedly calling glColor) then we know we never need to flush
the journal when material colors change.
Since most Clutter actors aren't much more than textured quads; flushing the
journal typically involves lots of 'change modelview; draw quad' sequences.
The amount of overhead involved in uploading a new modelview and queuing
that primitive is huge in comparison to simply transforming 4 vertices by
the current modelview when logging quads. (Note if your GPU supports HW
vertex transform, then it still does the projective and viewport transforms)
At the same time a --cogl-debug=disable-software-transform option has been
added for comparison and debugging.
This change allows typical pick scenes to be batched into a single draw call
and I'm seeing test-pick run over 200% faster with this. (i965 + Mesa
7.6-devel)
Enabling this option makes Cogl trace how the journal is managing to batch
your rectangles. The journal staggers how it emmits state to the GL driver
and the batches will normally get smaller for each stage, but ideally you
don't want to be in a situation where Cogl is only able to draw one quad per
modelview change and draw call.
E.g. this is a fairly ideal example:
BATCHING: journal len = 101
BATCHING: vbo offset batch len = 101
BATCHING: material batch len = 101
BATCHING: modelview batch len = 101
This isn't:
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
<repeat>
When this option is used Cogl will print a trace of all quads that get
logged into the journal, and a trace of quads as they get flushed.
If you are seeing a bug with the geometry being drawn by Cogl this may give
some clues by letting you sanity check the numbers being logged vs the
numbers being emitted.
For testing the VBO fallback paths it helps to be able to disable the
COGL_FEATURE_VBOS feature flag. When VBOs aren't available Cogl should use
client side malloc()'d buffers instead.
Previously we only used the Cogl matrix stack API for indirect contexts, but
it's too costly to keep on requesting modelview matrices from GL (for
logging in the journal) even for direct rendering.
I also experimented with a patch for mesa to improve performance and
discussed this with upstream, but we agreed to consider the GL matrix API
essentially deprecated. (For reference the GLES 2 and GL 3 specs have
removed the matrix APIs)
CoglColors shouldn't be compared using memcmp since they may contain
uninitialized padding bytes.
The prototype is also suitable for passing to g_hash_table_new as the
key_equal_func.
_cogl_pango_display_list_add_texture now uses this instead of memcmp.
We now put the color of materials into the vertex array used by the journal
instead of calling glColor() but the number of requests for the material
color were quite expensive so we have changed the material color to
internally be byte components instead of floats to avoid repeat conversions
and added _cogl_material_get_colorubv as a fast-path for the journal to
copy data into the vertex array.
The number of material layers enabled when logging a quad in the journal
determines the stride of the corresponding vertex data (since we need a set
of texture coordinates for each layer.) By padding data in the case where we
have only one layer we can avoid a change in stride if we are mixing single
and double layer primitives in a scene (e.g. relevent for a composite
manager that may use 2 layers for all shaped windows) Avoiding stride
changes means we can minimize calls to gl{Vertex,Color}Pointer when flushing
the journal.
Since we need to update the texcoord pointers when the actual number of
layers changes, this adds another batch_and_call() stage to deal with
glTexCoordPointer and enabling/disabling the client arrays.
Previously the journal was always flushed at the end of
_cogl_rectangles_with_multitexture_coords, (i.e. the end of any
cogl_rectangle* calls) but now we have broadened the potential for batching
geometry. In ideal circumstances we will only flush once per scene.
In summary the journal works like this:
When you use any of the cogl_rectangle* APIs then nothing is emitted to the
GPU at this point, we just log one or more quads into the journal. A
journal entry consists of the quad coordinates, an associated material
reference, and a modelview matrix. Ideally the journal only gets flushed
once at the end of a scene, but in fact there are things to consider that
may cause unwanted flushing, including:
- modifying materials mid-scene
This is because each quad in the journal has an associated material
reference (i.e. not copy), so if you try and modify a material that is
already referenced in the journal we force a flush first)
NOTE: For now this means you should avoid using cogl_set_source_color()
since that currently uses a single shared material. Later we
should change it to use a pool of materials that is recycled
when the journal is flushed.
- modifying any state that isn't currently logged, such as depth, fog and
backface culling enables.
The first thing that happens when flushing, is to upload all the vertex data
associated with the journal into a single VBO.
We then go through a process of splitting up the journal into batches that
have compatible state so they can be emitted to the GPU together. This is
currently broken up into 3 levels so we can stagger the state changes:
1) we break the journal up according to changes in the number of material layers
associated with logged quads. The number of layers in a material determines
the stride of the associated vertices, so we have to update our vertex
array offsets at this level. (i.e. calling gl{Vertex,Color},Pointer etc)
2) we further split batches up according to material compatability. (e.g.
materials with different textures) We flush material state at this level.
3) Finally we split batches up according to modelview changes. At this level
we update the modelview matrix and actually emit the actual draw command.
This commit is largely about putting the initial design in-place; this will be
followed by other changes that take advantage of the extended batching.
Use signed integers while combining window space clip rectangles, so we avoid
arithmatic errors later resulting in glScissor getting negative width and
height arguments.
Previously this was RGBA_8888. It souldn't really make a difference but for
consistency we expect almost all textures in use to have an internaly
premultiplied pixel format.
_cogl_texture_download_from_gl needs to create transient CoglBitmaps when
downloading sliced textures from GL, and then copies these as subregions
into the final target_bitmap. _cogl_texture_download_from_gl also supports
target_bitmaps with a different format to the source CoglTexture being
downloaded.
The problem was that in the case of slice textures we were always looking
at the format of the CoglTexture, not of the target_bitmap when setting
up the transient slice bitmap.
To allow for flushing of batched geometry within Cogl we can't support users
directly calling glReadPixels. glReadPixels is also awkward, not least
because it returns upside down image data.
All the unit tests have been swithed over and clutter_stage_read_pixels now
sits on top of this too.
We were calculating our vertex stride and allocating our vertex array
differently depending on whether the user passed TRUE for use_color or not.
The problem was that we were always writting color data to the array
regardless of use_color.
There was also a bug with _cogl_texture_sliced_polygon in that it was
writing byte color components but we were expecting float components. We
now use byte components in _cogl_multitexture_unsliced_polygon too and pass
GL_UNSIGNED_BYTE to glColorPointer.
Cogl already add similar defines but with the CLUTTER namespace
(CLUTTER_COGL_HAS_GL and CLUTTER_COGL_HAS_GLES). Let's just add two
similar defines with the COGL namespace. Removing the CLUTTER_COGL ones
could break applications silently for no real good reason.
HAVE_COGL_GLES2 is defined in config.h through the configure script and
should not be used in public headers.
The patch makes configure generate the right define that can be used
later in the header.
In order to be ready for the next major version of GLib we need to
disable single header inclusion by using the G_DISABLE_SINGLE_INCLUDES
define in the build process.
My patch to choose a premultiplied format when the user gives
COGL_PIXEL_FORMAT_ANY for the internal_format broke the case where the data
in question doesn't have and alpha channel.
This was accidentally missed when merging the premultiplication branch
since I merged a local version of the branch that missed this commit.
Although the underlying materials should allow layers with INVALID_HANDLES
it shouldn't be necissary to expose that via cogl_set_source_texture() and
it's easier to resolve a warning/crash here than odd artefacts/crashes later
in the pipeline.
Merge branch 'premultiplication'
[cogl-texture docs] Improves the documentation of the internal_format args
[test-premult] Adds a unit test for texture upload premultiplication semantics
[fog] Document that fogging only works with opaque or unmultipled colors
[test-blend-strings] Explicitly request RGBA_888 tex format for test textures
[premultiplication] Be more conservative with what data gets premultiplied
[bitmap] Fixes _cogl_bitmap_fallback_unpremult
[cogl-bitmap] Fix minor copy and paste error in _cogl_bitmap_fallback_premult
Avoid unnecesary unpremultiplication when saving to local data
Don't unpremultiply Cairo data
Default to a blend function that expects premultiplied colors
Implement premultiplication for CoglBitmap
Use correct texture format for pixmap textures and FBO's
Add cogl_color_premultiply()
Clarifies that if you give COGL_PIXEL_FORMAT_ANY as the internal format for
cogl_texture_new_from_file or cogl_texture_new_from_data then Cogl will
choose a premultiplied internal format.
The fixed function fogging provided by OpenGL only works with unmultiplied
colors (or if the color has an alpha of 1.0) so since we now premultiply
textures and colors by default a note to this affect has been added to
clutter_stage_set_fog and cogl_set_fog.
test-depth.c no longer uses clutter_stage_set_fog for this reason.
In the future when we can depend on fragment shaders we should also be
able to support fogging of premultiplied primitives.
We don't want to force texture data to be premultipled if the user
explicitly specifies a non premultiplied internal_format such as
COGL_PIXEL_FORMAT_RGBA_8888. So now Cogl will only automatically
premultiply data when COGL_PIXEL_FORMAT_ANY is given for the
internal_format, or a premultiplied internal format such as
COGL_PIXEL_FORMAT_RGBA_8888_PRE is requested but non-premultiplied source
data is given.
This approach is consistent with OpenVG image formats which have already
influenced Cogl's pixel format semantics.
The _cogl_unpremult_alpha_{first,last} functions which
_cogl_bitmap_fallback_unpremult depends on were incorrectly casting each
of the byte components of a texel to a gulong and performing shifts as
if it were dealing with the whole texel.
It now just uses array indexing to access the byte components without
needing to cast or manually shift any bits around.
Even though we used to depend on unpremult whenever we used a
ClutterCairoTexture, clutter_cairo_texture_context_destroy had it's own
unpremult code which worked which is why this bug wouldn't have been noticed
before.
Many operations, like mixing two textures together or alpha-blending
onto a destination with alpha, are done most logically if texture data
is in premultiplied form. We also have many sources of premultiplied
texture data, like X pixmaps, FBOs, cairo surfaces. Rather than trying
to work with two different types of texture data, simplify things by
always premultiplying texture data before uploading to GL.
Because the default blend function is changed to accommodate this,
uses of pure-color CoglMaterial need to be adapted to add
premultiplication.
gl/cogl-texture.c gles/cogl-texture.c: Always premultiply
non-premultiplied texture data before uploading to GL.
cogl-material.c cogl-material.h: Switch the default blend functions
to ONE, ONE_MINUS_SRC_ALPHA so they work correctly with premultiplied
data.
cogl.c: Make cogl_set_source_color() premultiply the color.
cogl.h.in color-material.h: Add some documentation about
premultiplication and its interaction with color values.
cogl-pango-render.c clutter-texture.c tests/interactive/test-cogl-offscreen.c:
Use premultiplied colors.
http://bugzilla.openedhand.com/show_bug.cgi?id=1406
Signed-off-by: Robert Bragg <robert@linux.intel.com>
cogl-bitmap.c cogl-bitmap-pixbuf.c cogl-bitmap-fallback.c cogl-bitmap-private.h:
Add _cogl_bitmap_can_premult(), _cogl_bitmap_premult() and implement
a reasonably fast implementation in the "fallback" code.
http://bugzilla.openedhand.com/show_bug.cgi?id=1406
Signed-off-by: Robert Bragg <robert@linux.intel.com>
Merge branch 'master-clock-updates'
* master-clock-updates: (22 commits)
Change the paint forcing on the Text cache text
[timelines] Improve marker hit check and don't fudge the delta
Revert "[timeline] Don't clamp the elapsed time when a looping tl reaches the end"
[tests] Don't add a newline to the end of g_test_message calls
[test-timeline] Add a marker at the beginning of the timeline
[timeline] Don't clamp the elapsed time when a looping tl reaches the end
[master-clock] Throttle if no redraw was performed
[docs] Update Clutter's API reference
Force a paint instead of calling clutter_redraw()
Fix clutter_redraw() to match the redraw cycle
Run the repaint functions inside the redraw cycle
Remove useless manual timeline ticking
Move elapsed-time calculations into ClutterTimeline
Limit the frame rate when not syncing to VBLANK
Decrease the main-loop priority of the frame cycle
Avoid motion-compression in test-picking test
Compress events as part of the frame cycle
Remove stage update idle and do updates from the master clock
Call g_main_context_wakeup() when we start running timelines
Remove unused msecs_delta member
...
Otherwise if there is an error before the slices are created it will
try to free the first_pixels array and crash.
It now also checks whether first_pixels has been created before using
it to update the mipmaps. This should only happen for
cogl_texture_new_from_foreign and doesn't matter if the FBO extension
is available. It would be better in this case to fetch the first pixel
using glGetTexImage as Owen mentioned in the last commit.