glib already has a data type to manage a list of callbacks called a
GHookList so we might as well use it instead of maintaining Cogl's own
type. The glib version may be slightly more efficient because it
avoids using a GList and instead encodes the prev and next pointers
directly in the GHook structure. It also has more features than
CoglCallbackList.
Previously we were applying the culling optimization to any actor
painted without considering that we may be painting to an offscreen
framebuffer where the stage clip isn't applicable.
For now we simply expose a getter for the current draw framebuffer
and we can assume that a return value of NULL corresponds to the
stage.
Note: This will need to be updated as stages start to be backed by real
CoglFramebuffer objects and so we won't get NULL in those cases.
Drawing and clipping to paths is generally quite expensive because the
geometry has to be tessellated into triangles in a single VBO which
breaks up the journal batching. If we can detect when the path
contains just a single rectangle then we can instead divert to calling
cogl_rectangle which will take advantage of the journal, or by pushing
a rectangle clip which usually ends up just using the scissor.
This patch adds a boolean to each path to mark when it is a
rectangle. It gets cleared whenever a node is added or gets set to
TRUE whenever cogl2_path_rectangle is called. This doesn't try to
catch cases where a rectangle is composed by cogl_path_line_to and
cogl_path_move_to commands.
In 9ff04e8a99 the builtin uniforms were moved to the common shader
boilerplate. However the common boilerplate is positioned before the
default precision specifier on GLES2 so it would fail to compile
because the uniforms end up with no precision in the fragment
shader. This patch just moves the precision specifier to above the
common boilerplate.
Instead of unconditionally combining the modelview and projection
matrices and then iterating each of the vertices to call
cogl_matrix_transform_point for each one in turn we now only combine the
matrices if there are more than 4 vertices (with less than 4 vertices
its less work to transform them separately) and we use the new
cogl_vertex_{transform,project}_points APIs which can hopefully
vectorize the transformations.
Finally the perspective divide and viewport scale is done in a separate
loop at the end and we don't do the spurious perspective divide and
viewport scale for the z component.
This adds two new experimental functions to cogl-matrix.c:
cogl_matrix_view_2d_in_perspective and cogl_matrix_view_2d_in_frustum
which can be used to setup a view transform that maps a 2D coordinate
system (0,0) top left and (width,height) bottom right to the current
viewport.
Toolkits such as Clutter that want to mix 2D and 3D drawing can use
these APIs to position a 2D coordinate system at an arbitrary depth
inside a 3D perspective projected viewing frustum.
OpenGL < 4.0 only supports integer based viewports and internally we
have a mixture of code using floats and integers for viewports. This
patch switches all viewports throughout clutter and cogl to be
represented using floats considering that in the future we may want to
take advantage of floating point viewports with modern hardware/drivers.
This makes a change to the original point_in_poly algorithm from:
http://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/pnpoly.html
The aim was to tune the test so that tests against screen aligned
rectangles are more resilient to some in-precision in how we transformed
that rectangle into screen coordinates. In particular gnome-shell was
finding that for some stage sizes then row 0 of the stage would become a
dead zone when going through the software picking fast-path and this was
because the y position of screen aligned rectangles could end up as
something like 0.00024 and the way the algorithm works it doesn't have
any epsilon/fuz factor to consider that in-precision.
We've avoided introducing an epsilon factor to the comparisons since we
feel there's a risk of changing some semantics in ways that might not be
desirable. One of those is that if you transform two polygons which
share an edge and test a point close to that edge then this algorithm
will currently give a positive result for only one polygon.
Another concern is the way this algorithm resolves the corner case where
the horizontal ray being cast to count edge crossings may cross directly
through a vertex. The solution is based on the "idea of Simulation of
Simplicity" and "pretends to shift the ray infinitesimally down so that
it either clearly intersects, or clearly doesn't touch". I'm not
familiar with the idea myself so I expect a misplaced epsilon is likely
to break that aspect of the algorithm.
The simple solution this patch applies is to pixel align the polygon
vertices which should eradicate most noise due to in-precision.
https://bugzilla.gnome.org/show_bug.cgi?id=641197
When using a pipeline and the journal to blit images between
framebuffers, it should disable blending. Otherwise it will end up
blending the source texture with uninitialised garbage in the
destination texture.
If an atlas texture's last reference is held by the journal or by the
last flushed pipeline then if an atlas migration is started it can
cause a crash. This is because the atlas migration will cause a
journal flush and can sometimes change the current pipeline which
means that the texture would be destroyed during migration.
This patch adds an extra 'post_reorganize' callback to the existing
'reorganize' callback (which is now renamed to 'pre_reorganize'). The
pre_reorganize callback is now called before the atlas grabs a list of
the current textures instead of after so that it doesn't matter if the
journal flush destroys some of those textures. The pre_reorganize
callback for CoglAtlasTexture grabs a reference to all of the textures
so that they can not be destroyed when the migration changes the
pipeline. In the post_reorganize callback the reference is removed
again.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2538
When Cogl debugging is disabled then the 'waste' variable is not used
so it throws a compiler warning. This patch removes the variable and
the value is calculated directly as the parameter to COGL_NOTE.
Some code was doing pointer arithmetic on the return value from
cogl_buffer_map which is void* pointer. This is a GCC extension so we
should try to avoid it. This patch adds casts to guint8* where
appropriate.
Based on a patch by Fan, Chun-wei.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2561
Instead of directly banging GL to migrate textures the atlas now uses
the CoglFramebuffer API. It will use one of four approaches; it can
set up two FBOs and use _cogl_blit_framebuffer to copy between them;
it can use a single target fbo and then render the source texture to
the FBO using a Cogl draw call; it can use a single FBO and call
glCopyTexSubImage2D; or it can fallback to reading all of the texture
data back to system memory and uploading it again with a sub texture
update.
Previously GL calls were used directly because Cogl wasn't able to
create a framebuffer without a stencil and depth buffer. However there
is now an internal version of cogl_offscreen_new_to_texture which
takes a set of flags to disable the two buffers.
The code for blitting has now been moved into a separate file called
cogl-blit.c because it has become quite long and it may be useful
outside of the atlas at some point.
The 4 different methods have a fixed order of preference which is:
* Texture render between two FBOs
* glBlitFramebuffer
* glCopyTexSubImage2D
* glGetTexImage + glTexSubImage2D
Once a method is succesfully used it is tried first for all subsequent
blits. The default default can be overridden by setting the
environment variable COGL_ATLAS_DEFAULT_BLIT_MODE to one of the
following values:
* texture-render
* framebuffer
* copy-tex-sub-image
* get-tex-data
This adds a declaration for _cogl_is_texture_2d to the private header
so that it can be used in cogl-blit.c to determine if the target
texture is a simple 2D texture.
This adds a function called _cogl_texture_2d_copy_from_framebuffer
which is a simple wrapper around glCopyTexSubImage2D. It is currently
specific to the texture 2D backend.
This adds the _cogl_blit_framebuffer internal function which is a
wrapper around glBlitFramebuffer. The API is changed from the GL
version of the function to reflect the limitations provided by the
GL_ANGLE_framebuffer_blit extension (eg, no scaling or mirroring).
This extension is the GLES equivalent of the GL_EXT_framebuffer_blit
extension except that it has some extra restrictions. We need to check
for some extension that provides glBlitFramebuffer so that we can
unconditionally use ctx->drv.pf_glBlitFramebuffer in both GL and GLES
code. Even with the restrictions, the extension provides enough
features for what Cogl needs.
Previously when _cogl_atlas_texture_migrate_out_of_atlas is called it
would unreference the atlas texture's sub-texture before calling
_cogl_atlas_copy_rectangle. This would leave the atlas texture in an
inconsistent state during the copy. This doesn't normally matter but
if the copy ends up doing a render then the atlas texture may end up
being referenced. In particular it would cause problems if the texture
is left in a texture unit because then Cogl may try to call
get_gl_texture even though the texture isn't actually being used for
rendering. To fix this the sub texture is now unrefed after the copy
call instead.
The current framebuffer is now internally separated so that there can
be a different draw and read buffer. This is required to use the
GL_EXT_framebuffer_blit extension. The current draw and read buffers
are stored as a pair in a single stack so that pushing the draw and
read buffer is done simultaneously with the new
_cogl_push_framebuffers internal function. Calling
cogl_pop_framebuffer will restore both the draw and read buffer to the
previous state. The public cogl_push_framebuffer function is layered
on top of the new function so that it just pushes the same buffer for
both drawing and reading.
When flushing the framebuffer state, the cogl_framebuffer_flush_state
function now tackes a pointer to both the draw and the read
buffer. Anywhere that was just flushing the state for the current
framebuffer with _cogl_get_framebuffer now needs to call both
_cogl_get_draw_buffer and _cogl_get_read_buffer.
When pushing a framebuffer it would previously push
COGL_INVALID_HANDLE to the top of the framebuffer stack so that when
it later calls cogl_set_framebuffer it will recognise that the
framebuffer is different and replace the top with the new
pointer. This isn't ideal because it breaks the code to flush the
journal because _cogl_framebuffer_flush_journal is called with the
value of the old pointer which is NULL. That function was checking for
a NULL pointer so it wouldn't actually flush. It also would mean that
if you pushed the same framebuffer twice we would end up dirtying
state unnecessarily. To fix this cogl_push_framebuffer now pushes a
reference to the current framebuffer instead.
After a dependent framebuffer is added to a framebuffer it was never
getting removed. Once the journal for a framebuffer is flushed we no
longer depend on any framebuffers so the list should be cleared. This
was causing leaks of offscreens and textures.
This adds a note to clarify that cogl_matrix_multiply allows you to
multiply the @a matrix in-place, so @a can equal @result but @b can't
equal @result.
When uploading the layer matrix to GL it wasn't first calling
glActiveTextureMatrix to set the right texture unit for the
layer. This would end up setting the texture matrix on whatever layer
happened to be previously active. This happened to work for
test-cogl-multitexture presumably because it was coincidentally
setting the layer matrix on the last used layer.
The pipeline private data is accessed both from the private data set
on a CoglPipeline and the destroy notify function of a weak material
that the vertex buffer creates when it needs to override the wrap
mode. However when a CoglPipeline is destroyed, the CoglObject code
first removes all of the private data set on the object and then the
CoglPipeline code gets invoked to destroy all of the weak children. At
this point the vertex buffer's weak override destroy notify function
will get invoked and try to use the private data which has already
been freed causing a crash.
This patch instead adds a reference count to the pipeline private data
stuct so that we can avoid freeing it until both the private data on
the pipeline has been destroyed and all of the weak materials are
destroyed.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2544
In cogl_pipeline_set_layer_combine_constant it was comparing whether
the new color is the same as the old color using a memcmp on the
constant_color parameter. However the combine constant is stored in
the layer data as an array of four floats but the passed in color is a
CoglColor (which is currently an array of four guint8s). This was
causing valgrind errors and presumably also the check for setting the
same color twice would always fail.
This patch makes it do the conversion to a float array upfront before
the comparison.
cogl_matrix_project_points and cogl_matrix_transform_points had an
optimization for the common case where the stride parameters exactly
match the size of the corresponding structures. The code for both when
generated by gcc with -O2 on x86-64 use two registers to hold the
addresses of the input and output arrays. In the strided version these
pointers are incremented by adding the value of a register and in the
packed version they are incremented by adding an immediate value. I
think the difference in cost here would be negligible and it may even
be faster to add a register.
Also GCC appears to retain the loop counter in a register for the
strided version but in the packed version it can optimize it out and
directly use the input pointer as the counter. I think it would be
possible to reorder the code a bit to explicitly use the input pointer
as the counter if this were a problem.
Getting rid of the packed versions tidies up the code a bit and it
could potentially be faster if the code differences are small and we
get to avoid an extra conditional in cogl_matrix_transform_points.
When copying COMBINE state in
_cogl_pipeline_layer_init_multi_property_sparse_state we would read some
state from the destination layer (invalid data potentially), then
redundantly set the value back on the destination. This was picked up by
valgrind, and the code is now more careful about how it references the
src layer vs the destination layer.
There is currently a problem with per-framebuffer journals in that it's
possible to create a framebuffer from a texture which then gets rendered
too but the framebuffer (and corresponding journal) can be freed before
the texture gets used to draw with.
Conceptually we want to make sure when freeing a framebuffer that - if
it is associated with a texture - we flush the journal as the last thing
before really freeing the framebuffer's meta data. Technically though
this is awkward to implement since the obvious mechanism for us to be
notified about the framebuffer's destruction (by setting some user data
internally with a callback) notifies when the framebuffer has a
ref-count of 0. This means we'd have to be careful what we do with the
framebuffer to consider e.g. recursive destruction; anything that would
set more user data on the framebuffer while it is being destroyed and
ensuring nothing else gets notified of the framebuffer's destruction
before the journal has been flushed.
For simplicity, for now, this patch provides another solution which is
to flush framebuffer journals whenever we switch away from a given
framebuffer via cogl_set_framebuffer or cogl_push/pop_framebuffer. The
disadvantage of this approach is that we can't batch all the geometry of
a scene that involves intermediate renders to offscreen framebufers.
Clutter is doing this more and more with applications that use the
ClutterEffect APIs so this is a shame. Hopefully this will only be a
stop-gap solution while we consider how to reliably support journal
logging across framebuffer changes.
When flushing a clip stack that contains more than one rectangle which
needs to use the stencil buffer the code takes a different path so
that it can combine the new rectangle with the existing contents of
the stencil buffer. However it was not correctly flushing the
modelview and projection matrices so that rectangle would be in the
wrong place.
This adds a COGL_DEBUG=clipping option that reports how the clip is
being flushed. This is needed to determine whether the scissor,
stencil clip planes or software clipping is being used.
The CoglDebugFlags are now stored in an array of unsigned ints rather
than a single variable. The flags are accessed using macros instead of
directly peeking at the cogl_debug_flags variable. The index values
are stored in the enum rather than the actual mask values so that the
enum doesn't need to be more than 32 bits wide. The hope is that the
code to determine the index into the array can be optimized out by the
compiler so it should have exactly the same performance as the old
code.
The lighting parameters such as the diffuse and ambient colors were
previously only flushed in the fixed vertend. This meant that if a
vertex shader was used then they would not be set. The lighting
parameters are uniforms which are just as useful in a fragment shader
so it doesn't really make sense to set them in the vertend. They are
now flushed in the common cogl-pipeline-opengl code but the code is
#ifdef'd for GLES2 because they need to be part of the progend in that
case.
The uniforms for the alpha test reference value and point size on
GLES2 are updating using similar code. This generalizes the code so
that there is a static array of predefined builtin uniforms which
contains the uniform name, a pointer to a function to get the value
from the pipeline, a pointer to a function to update the uniform and a
flag representing which CoglPipelineState change affects the
uniform. The uniforms are then updated in a loop. This should simplify
adding more builtin uniforms.
The builtin uniforms are accessible from either the vertex shader or
the fragment shader so we should define them in the common
section. This doesn't really matter for the current list of uniforms
because it's pretty unlikely that you'd want to access the matrices
from the fragment shader, but for other builtins such as the lighting
material properties it makes sense.
When we added the texture->framebuffers member a _cogl_texture_init
funciton was added to initialize the list of framebuffers associated
with a texture to NULL. All the backends were updated except the
x11 tfp backend. This was causing crashes in test-pixmap.
This is part of a broader cleanup of some of the experimental Cogl API.
One of the reasons for this particular rename is to reduce the verbosity
of using the API. Another reason is that CoglVertexArray is going to be
renamed CoglAttributeBuffer and we want to help emphasize the
relationship between CoglAttributes and CoglAttributeBuffers.
We have a bunch of experimental convenience functions like
cogl_primitive_p2/p2t2 that have corresponding vertex structures but it
seemed a bit odd to have the vertex annotation e.g. "P2T2" be an infix
of the type like CoglP2T2Vertex instead of be a postfix like
CoglVertexP2T2. This switches them all to follow the postfix naming
style.
COGL_DEBUG=disable-fast-read-pixel can be used to disable the
optimization for reading a single pixel colour back by looking at the
geometry in the journal and not involving the GPU. With this disabled we
will always flush the journal, rendering to the framebuffer and then use
glReadPixels to get the result.
This adds a transparent optimization to cogl_read_pixels for when a
single pixel is being read back and it happens that all the geometry of
the current frame is still available in the framebuffer's associated
journal.
The intention is to indirectly optimize Clutter's render based picking
mechanism in such a way that the 99% of cases where scenes are comprised
of trivial quad primitives that can easily be intersected we can avoid
the latency of kicking a GPU render and blocking for the result when we
know we can calculate the result manually on the CPU probably faster
than we could even kick a render.
A nice property of this solution is that it maintains all the
flexibility of the render based picking provided by Clutter and it can
gracefully fall back to GPU rendering if actors are drawn using anything
more complex than a quad for their geometry.
It seems worth noting that there is a limitation to the extensibility of
this approach in that it can only optimize picking a against geometry
that passes through Cogl's journal which isn't something Clutter
directly controls. For now though this really doesn't matter since
basically all apps should end up hitting this fast-path. The current
idea to address this longer term would be a pick2 vfunc for ClutterActor
that can support geometry and render based input regions of actors and
move this optimization up into Clutter instead.
Note: currently we don't have a primitive count threshold to consider
that there could be scenes with enough geometry for us to compensate for
the cost of kicking a render and determine a result more efficiently by
utilizing the GPU. We don't currently expect this to be common though.
Note: in the future it could still be interesting to revive something
like the wip/async-pbo-picking branch to provide an asynchronous
read-pixels based optimization for Clutter picking in cases where more
complex input regions that necessitate rendering are in use or if we do
add a threshold for rendering as mentioned above.
Both cogl_matrix_transform_points and _project_points take points_in and
points_out arguments and explicitly allow pointing to the same array
(i.e. to transform in-place) The implementation of the various internal
transform functions though were not handling this possability and so it
was possible the reference partially transformed vertex values as if
they were original input values leading to incorrect results. This patch
ensures we take a temporary copy of the current input point when
transforming.
This adds a utility function that can determine if a given point
intersects an arbitrary polygon, by counting how many edges a
"semi-infinite" horizontal ray crosses from that point. The plan is to
use this for a software based read-pixel fast path that avoids using the
GPU to rasterize journaled primitives and can instead intersect a point
being read with quads in the journal to determine the correct color.
This adds a stop-gap mechanism for Cogl to know when the window system
is requested to present the current backbuffer to the frontbuffer by
adding a _cogl_swap_buffers_notify function that backends are now
expected to call right after issuing the equivalent request to OpenGL
vie the platforms OpenGL binding layer. This (blindly) updates all the
backends to call this new function.
For now Cogl doesn't do anything with the notification but the intention
is to use it as part of a planned read-pixel optimization which will
need to reset some state at the start of each new frame.
Instead of having _cogl_get/set_clip stack which reference the global
CoglContext this instead makes those into CoglClipState method functions
named _cogl_clip_state_get/set_stack that take an explicit pointer to a
CoglClipState.
This also adds _cogl_framebuffer_get/set_clip_stack convenience
functions that avoid having to first get the ClipState from a
framebuffer then the stack from that - so we can maintain the
convenience of _cogl_get_clip_stack.