Since most Clutter actors aren't much more than textured quads; flushing the
journal typically involves lots of 'change modelview; draw quad' sequences.
The amount of overhead involved in uploading a new modelview and queuing
that primitive is huge in comparison to simply transforming 4 vertices by
the current modelview when logging quads. (Note if your GPU supports HW
vertex transform, then it still does the projective and viewport transforms)
At the same time a --cogl-debug=disable-software-transform option has been
added for comparison and debugging.
This change allows typical pick scenes to be batched into a single draw call
and I'm seeing test-pick run over 200% faster with this. (i965 + Mesa
7.6-devel)
Enabling this option makes Cogl trace how the journal is managing to batch
your rectangles. The journal staggers how it emmits state to the GL driver
and the batches will normally get smaller for each stage, but ideally you
don't want to be in a situation where Cogl is only able to draw one quad per
modelview change and draw call.
E.g. this is a fairly ideal example:
BATCHING: journal len = 101
BATCHING: vbo offset batch len = 101
BATCHING: material batch len = 101
BATCHING: modelview batch len = 101
This isn't:
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
BATCHING: journal len = 1
BATCHING: vbo offset batch len = 1
BATCHING: material batch len = 1
BATCHING: modelview batch len = 1
<repeat>
When this option is used Cogl will print a trace of all quads that get
logged into the journal, and a trace of quads as they get flushed.
If you are seeing a bug with the geometry being drawn by Cogl this may give
some clues by letting you sanity check the numbers being logged vs the
numbers being emitted.
For testing the VBO fallback paths it helps to be able to disable the
COGL_FEATURE_VBOS feature flag. When VBOs aren't available Cogl should use
client side malloc()'d buffers instead.
Previously we only used the Cogl matrix stack API for indirect contexts, but
it's too costly to keep on requesting modelview matrices from GL (for
logging in the journal) even for direct rendering.
I also experimented with a patch for mesa to improve performance and
discussed this with upstream, but we agreed to consider the GL matrix API
essentially deprecated. (For reference the GLES 2 and GL 3 specs have
removed the matrix APIs)
CoglColors shouldn't be compared using memcmp since they may contain
uninitialized padding bytes.
The prototype is also suitable for passing to g_hash_table_new as the
key_equal_func.
_cogl_pango_display_list_add_texture now uses this instead of memcmp.
We now put the color of materials into the vertex array used by the journal
instead of calling glColor() but the number of requests for the material
color were quite expensive so we have changed the material color to
internally be byte components instead of floats to avoid repeat conversions
and added _cogl_material_get_colorubv as a fast-path for the journal to
copy data into the vertex array.
The number of material layers enabled when logging a quad in the journal
determines the stride of the corresponding vertex data (since we need a set
of texture coordinates for each layer.) By padding data in the case where we
have only one layer we can avoid a change in stride if we are mixing single
and double layer primitives in a scene (e.g. relevent for a composite
manager that may use 2 layers for all shaped windows) Avoiding stride
changes means we can minimize calls to gl{Vertex,Color}Pointer when flushing
the journal.
Since we need to update the texcoord pointers when the actual number of
layers changes, this adds another batch_and_call() stage to deal with
glTexCoordPointer and enabling/disabling the client arrays.
Previously the journal was always flushed at the end of
_cogl_rectangles_with_multitexture_coords, (i.e. the end of any
cogl_rectangle* calls) but now we have broadened the potential for batching
geometry. In ideal circumstances we will only flush once per scene.
In summary the journal works like this:
When you use any of the cogl_rectangle* APIs then nothing is emitted to the
GPU at this point, we just log one or more quads into the journal. A
journal entry consists of the quad coordinates, an associated material
reference, and a modelview matrix. Ideally the journal only gets flushed
once at the end of a scene, but in fact there are things to consider that
may cause unwanted flushing, including:
- modifying materials mid-scene
This is because each quad in the journal has an associated material
reference (i.e. not copy), so if you try and modify a material that is
already referenced in the journal we force a flush first)
NOTE: For now this means you should avoid using cogl_set_source_color()
since that currently uses a single shared material. Later we
should change it to use a pool of materials that is recycled
when the journal is flushed.
- modifying any state that isn't currently logged, such as depth, fog and
backface culling enables.
The first thing that happens when flushing, is to upload all the vertex data
associated with the journal into a single VBO.
We then go through a process of splitting up the journal into batches that
have compatible state so they can be emitted to the GPU together. This is
currently broken up into 3 levels so we can stagger the state changes:
1) we break the journal up according to changes in the number of material layers
associated with logged quads. The number of layers in a material determines
the stride of the associated vertices, so we have to update our vertex
array offsets at this level. (i.e. calling gl{Vertex,Color},Pointer etc)
2) we further split batches up according to material compatability. (e.g.
materials with different textures) We flush material state at this level.
3) Finally we split batches up according to modelview changes. At this level
we update the modelview matrix and actually emit the actual draw command.
This commit is largely about putting the initial design in-place; this will be
followed by other changes that take advantage of the extended batching.
Use signed integers while combining window space clip rectangles, so we avoid
arithmatic errors later resulting in glScissor getting negative width and
height arguments.
To allow for flushing of batched geometry within Cogl we can't support users
directly calling glReadPixels. glReadPixels is also awkward, not least
because it returns upside down image data.
All the unit tests have been swithed over and clutter_stage_read_pixels now
sits on top of this too.
We were calculating our vertex stride and allocating our vertex array
differently depending on whether the user passed TRUE for use_color or not.
The problem was that we were always writting color data to the array
regardless of use_color.
There was also a bug with _cogl_texture_sliced_polygon in that it was
writing byte color components but we were expecting float components. We
now use byte components in _cogl_multitexture_unsliced_polygon too and pass
GL_UNSIGNED_BYTE to glColorPointer.
In order to be ready for the next major version of GLib we need to
disable single header inclusion by using the G_DISABLE_SINGLE_INCLUDES
define in the build process.
Although the underlying materials should allow layers with INVALID_HANDLES
it shouldn't be necissary to expose that via cogl_set_source_texture() and
it's easier to resolve a warning/crash here than odd artefacts/crashes later
in the pipeline.
The _cogl_unpremult_alpha_{first,last} functions which
_cogl_bitmap_fallback_unpremult depends on were incorrectly casting each
of the byte components of a texel to a gulong and performing shifts as
if it were dealing with the whole texel.
It now just uses array indexing to access the byte components without
needing to cast or manually shift any bits around.
Even though we used to depend on unpremult whenever we used a
ClutterCairoTexture, clutter_cairo_texture_context_destroy had it's own
unpremult code which worked which is why this bug wouldn't have been noticed
before.
Many operations, like mixing two textures together or alpha-blending
onto a destination with alpha, are done most logically if texture data
is in premultiplied form. We also have many sources of premultiplied
texture data, like X pixmaps, FBOs, cairo surfaces. Rather than trying
to work with two different types of texture data, simplify things by
always premultiplying texture data before uploading to GL.
Because the default blend function is changed to accommodate this,
uses of pure-color CoglMaterial need to be adapted to add
premultiplication.
gl/cogl-texture.c gles/cogl-texture.c: Always premultiply
non-premultiplied texture data before uploading to GL.
cogl-material.c cogl-material.h: Switch the default blend functions
to ONE, ONE_MINUS_SRC_ALPHA so they work correctly with premultiplied
data.
cogl.c: Make cogl_set_source_color() premultiply the color.
cogl.h.in color-material.h: Add some documentation about
premultiplication and its interaction with color values.
cogl-pango-render.c clutter-texture.c tests/interactive/test-cogl-offscreen.c:
Use premultiplied colors.
http://bugzilla.openedhand.com/show_bug.cgi?id=1406
Signed-off-by: Robert Bragg <robert@linux.intel.com>
cogl-bitmap.c cogl-bitmap-pixbuf.c cogl-bitmap-fallback.c cogl-bitmap-private.h:
Add _cogl_bitmap_can_premult(), _cogl_bitmap_premult() and implement
a reasonably fast implementation in the "fallback" code.
http://bugzilla.openedhand.com/show_bug.cgi?id=1406
Signed-off-by: Robert Bragg <robert@linux.intel.com>
It's very common that there's no reasonable fallback to do if the
blend or combine string you set isn't supported. So, rather than
requiring everybody to pass in a GError purely to catch syntax erorrs,
automatically g_warning() if a parse error is encountered and @error
is NULL.
http://bugzilla.openedhand.com/show_bug.cgi?id=1642
Signed-off-by: Robert Bragg <robert@linux.intel.com>
The texture filters are now a property of the material layer rather
than the texture object. Whenever a texture is painted with a material
it sets the filters on all of the GL textures in the Cogl texture. The
filter is cached so that it won't be changed unnecessarily.
The automatic mipmap generation has changed so that the mipmaps are
only generated when the texture is painted instead of every time the
data changes. Changing the texture sets a flag to mark that the
mipmaps are dirty. This works better if the FBO extension is available
because we can use glGenerateMipmap. If the extension is not available
it will temporarily enable automatic mipmap generation and reupload
the first pixel of each slice. This requires tracking the data for the
first pixel.
The COGL_TEXTURE_AUTO_MIPMAP flag has been replaced with
COGL_TEXTURE_NO_AUTO_MIPMAP so that it will default to
auto-mipmapping. The mipmap generation is now effectively free if you
are not using a mipmap filter mode so you would only want to disable
it if you had some special reason to generate your own mipmaps.
ClutterTexture no longer has to store its own copy of the filter
mode. Instead it stores it in the material and the property is
directly set and read from that. This fixes problems with the filters
getting out of sync when a cogl handle is set on the texture
directly. It also avoids the mess of having to rerealize the texture
if the filter quality changes to HIGH because Cogl will take of
generating the mipmaps if needed.
It was previously possible to create a material layer with no texture
by setting some property on it such as the matrix. However it was not
possible to get back to that state without removing the layer and
recreating it. It is useful to be able to remove the texture to free
resources without forgetting the state of the layer so we can put a
different texture in later.
Instead of using GL_TRIANGLES and uploading the indices every time, it
now uses GL_QUADS instead on OpenGL. Under GLES it still uses indices
but it uses the new cogl_vertex_buffer_indices_get_for_quads function
to avoid uploading the vertices every time.
This requires the _cogl_vertex_buffer_indices_pointer_from_handle
function to be exposed privately to the rest of Cogl.
The static_indices array has been removed from the Cogl context.
This function can be used as an efficient way of drawing groups of
quads without using GL_QUADS. It generates a VBO containing the
indices needed to render using pairs of GL_TRIANGLES. The VBO is
globally cached so that it only needs to be uploaded whenever more
indices are requested than ever before.
We avoid rebuilding cogl-enum-types.h and cogl-enum-types.c by
using a "guard" -- a stamp file that will block Makefile. Since
we need cogl-enum-types.h into /clutter/cogl as well for the
cogl.h include to work, if we copy the cogl-enum-types.h
unconditionally it will cause a rebuild of the whole COGL; which
will cause a full rebuild.
To solve this, we can copy the header file when generating it
under the stamp file.
Just like we do with GObject types and G_DEFINE_TYPE, we should
use the g_once_init_enter/g_once_init_leave mechanism to make the
GType registration of enumeration types thread safe.
The setup_viewport() function should only be used by Clutter and
not by application code.
It can be emulated by changing the Stage size and perspective and
requeueing a redraw after calling clutter_stage_ensure_viewport().
Previously indices were tightly bound to a particular Cogl vertex buffer
but we would like to be able to share indices so now we have
cogl_vertex_buffer_indices_new () which returns a CoglHandle.
In particular we could like to have a shared set of indices for drawing
lists of quads that can be shared between the pango renderer and the
Cogl journal.
At the moment Cogl doesn't do much batching of quads so most of the time we
are flushing a single quad at a time. This patch simplifies how we submit
those quads to OpenGL by using glDrawArrays with GL_TRIANGLE_FAN mode
instead of sending indexed vertices using GL_TRIANGLES mode.
Note: I hope to follow up soon with changes that improve our batching and
also move the indices into a VBO so they don't need to be re-validated every
time we call glDrawElements.
cogl_enable_depth_test and cogl_enable_backface_culling have been renamed
and now have corresponding getters, the new functions are:
cogl_set_depth_test_enabled
cogl_get_depth_test_enabled
cogl_set_backface_culling_enabled
cogl_get_backface_culling_enabled
This adds cogl_matrix api for multiplying matrices either by a perspective
or ortho projective transform. The internal matrix stack and current-matrix
APIs also have corresponding support added.
New public API:
cogl_matrix_perspective
cogl_matrix_ortho
cogl_ortho
cogl_set_modelview_matrix
cogl_set_projection_matrix
Originally cogl_vertex_buffer_add_indices let the user pass in their own unique
ID for the indices; now the Id is generated internally and returned to the
caller.
It's now possible to add arrays of indices to a Cogl vertex buffer and
they will be put into an OpenGL vertex buffer object. Since it's quite
common for index arrays to be static it saves the OpenGL driver from
having to validate them repeatedly.
This changes the cogl_vertex_buffer_draw_elements API: It's no longer
possible to provide a pointer to an index array at draw time. So
cogl_vertex_buffer_draw_elements now takes an indices identifier that
should correspond to an idendifier returned when calling
cogl_vertex_buffer_add_indices ()
This is being removed before we release Clutter 1.0 since the implementation
wasn't complete, and so we assume no one is using this yet. Util we have
someone with a good usecase, we can't pretend to support breaking out into
raw OpenGL.
There were a number of functions intended to support creating of new
primitives using materials, but at this point they aren't used outside of
Cogl so until someone has a usecase and we can get feedback on this
API, it's being removed before we release Clutter 1.0.