This adds a transparent optimization to cogl_read_pixels for when a
single pixel is being read back and it happens that all the geometry of
the current frame is still available in the framebuffer's associated
journal.
The intention is to indirectly optimize Clutter's render based picking
mechanism in such a way that the 99% of cases where scenes are comprised
of trivial quad primitives that can easily be intersected we can avoid
the latency of kicking a GPU render and blocking for the result when we
know we can calculate the result manually on the CPU probably faster
than we could even kick a render.
A nice property of this solution is that it maintains all the
flexibility of the render based picking provided by Clutter and it can
gracefully fall back to GPU rendering if actors are drawn using anything
more complex than a quad for their geometry.
It seems worth noting that there is a limitation to the extensibility of
this approach in that it can only optimize picking a against geometry
that passes through Cogl's journal which isn't something Clutter
directly controls. For now though this really doesn't matter since
basically all apps should end up hitting this fast-path. The current
idea to address this longer term would be a pick2 vfunc for ClutterActor
that can support geometry and render based input regions of actors and
move this optimization up into Clutter instead.
Note: currently we don't have a primitive count threshold to consider
that there could be scenes with enough geometry for us to compensate for
the cost of kicking a render and determine a result more efficiently by
utilizing the GPU. We don't currently expect this to be common though.
Note: in the future it could still be interesting to revive something
like the wip/async-pbo-picking branch to provide an asynchronous
read-pixels based optimization for Clutter picking in cases where more
complex input regions that necessitate rendering are in use or if we do
add a threshold for rendering as mentioned above.
Move the private Backend API to a separate header.
This also allows us to finally move the class vtable and instance
structure to a separate file and plug the visibility hole that left
the Backend class bare for everyone to poke into.
When we don't use a window system drawable, we can't query the color
masks at context initialization time. Do it lazily so we're sure to have
a current context with a valid framebuffer.
This uses actor paint volumes to perform culling during
clutter_actor_paint.
When performing a clipped redraw (because only a few localized actors
changed) then as we traverse the scenegraph painting the actors we can
now ignore actors that don't intersect the clip region. Early testing
shows this can have a big performance benefit; e.g. 100% fps improvement
for test-state with culling enabled and we hope that there are even much
more compelling examples than that in the real world,
Most Clutter applications are 2Dish interfaces and have quite a lot of
actors that get continuously painted when anything is animated. The
dynamic actors are often localized to an area of user focus though so
with culling we can completely avoid painting any of the static actors
outside the current clip region.
Obviously the cost of culling has to be offset against the cost of
painting to determine if it's a win, but our (limited) testing suggests
it should be a win for most applications.
Note: we hope we will be able to also bring another performance bump
from culling with another iteration - hopefully in the 1.6 cycle - to
avoid doing the culling in screen space and instead do it in the stage's
model space. This will hopefully let us minimize the cost of
transforming the actor volumes for culling.
This adds a private ->relayout_pending boolean similar in spirit to
redraw_pending. This will allow us to queue a relayout without
implicitly queueing a redraw; instead we can depend on the actions
of a relayout to queue any necessary redraw.
This ensures that clipped redraws are disabled when using
CLUTTER_PAINT=redraws. This may seem unintuitive given that this option
is for debugging clipped redraws, but we can't draw an outline outside
the clip region and anything we draw inside the clip region is liable to
leave a trailing mess on the screen since it won't be cleared up by
later clipped redraws.
This adds a debug option to visualize the paint volumes of all actors.
When CLUTTER_PAINT=paint-volumes is exported in the environment before
running a Clutter application then all actors will have their bounding
volume drawn in green with a label corresponding to the actors type.
This is a fairly extensive second pass at exposing paint volumes for
actors.
The API has changed to allow clutter_actor_get_paint_volume to fail
since there are times - such as when an actor isn't a descendent of the
stage - when the volume can't be determined. Another example is when
something has connected to the "paint" signal of the actor and we simply
have no way of knowing what might be drawn in that handler.
The API has also be changed to return a const ClutterPaintVolume pointer
(transfer none) so we can avoid having to dynamically allocate the
volumes in the most common/performance critical code paths. Profiling was
showing the slice allocation of volumes taking about 1% of an apps time,
for some fairly basic tests. Most volumes can now simply be allocated on
the stack; for clutter_actor_get_paint_volume we return a pointer to
&priv->paint_volume and if we need a more dynamic allocation there is
now a _clutter_stage_paint_volume_stack_allocate() mechanism which lets
us allocate data which expires at the start of the next frame.
The API has been extended to make it easier to implement
get_paint_volume for containers by using
clutter_actor_get_transformed_paint_volume and
clutter_paint_volume_union. The first allows you to query the paint
volume of a child but transformed into parent actor coordinates. The
second lets you combine volumes together so you can union all the
volumes for a container's children and report that as the container's
own volume.
The representation of paint volumes has been updated to consider that
2D actors are the most common.
The effect apis, clutter-texture and clutter-group have been update
accordingly.
We have an optimization to track when there are multiple picks per
frame so we can do a full render of the pick buffer to reduce the
number of pick renders for a static scene.
There was a problem though in that we were tracking this information in
the ClutterMainContext, but conceptually this doesn't really make sense
because the pick buffer is associated with a stage framebuffer and there
can be multiple stages for one context.
This patch moves the state tracking to ClutterStage.
This reverts commit d7e86e2696.
This was a half baked patch that was pushed a bit early since it broke
test-texture-pick-with-alpha + the commit message refers to a change on
the wip/paint-box branch that hasn't happened yet.
We have an optimization to track when there are multiple picks per
frames so we can do a full render of the pick buffer to reduce the
number of pick renders for a static scene.
There were two problems with how we were tracking this state though.
Firstly we were tracking this information in the ClutterMainContext, but
conceptually this doesn't really make sense because the pick buffer is
associated with a stage framebuffer and there can be multiple stages for
one context. Secondly - since the change to how redraws are queued - we
weren't marking the pick buffer as invalid when a queuing a redraw, we
were only marking the buffer invalid when signaling/finishing the
queue-redraw process, which is now deferred until just before a paint.
This meant using clutter_stage_get_actor_at_pos after a scenegraph
change could give a wrong result if it just read from an existing (but
technically invalid) pick buffer.
This patch moves the state tracking to ClutterStage, and ensures the
buffer is invalidated in _clutter_stage_queue_actor_redraw.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2283
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
When building with --enable-profile we now depend on the uprof-0.3
developer release which brings a few improvements:
» It lets us "fix" how we initialize uprof so that instead of using a shared
object constructor/destructor (which was a hack used when first adding
uprof support to Clutter) we can now initialize as part of clutter's
normal initialization code. As a side note though, I found that the way
Clutter initializes has some quite serious problems whenever it
involves GOptionGroups. It is not able to guarantee the initialization
of dependencies like uprof and Cogl. For this reason we still use the
contructor/destructor approach to initialize uprof in Cogl.
» uprof-0.3 provides a better API for adding custom columns when reporting
timer and counter statistics which lets us remove quite a lot of manual
report generation code in clutter-profile.c.
» uprof-0.3 provides a shared context for tracking mainloop timer
statistics. This means any mainloop based library following the same
"Mainloop" timer naming convention can use the shared context and no
matter who ends up owning the final mainloop the statistics will always
be in the same place. This allows profiling of Clutter with an
external mainloop such as with the Mutter compositor.
» uprof-0.3 can export statistics over dbus and comes with an ncurses
based ui to vizualize timer and counter stats live.
The latest version of uprof can be cloned from:
git://github.com/rib/UProf.git
When building actor relative transforms, instead of using the matrix
stack to combine transformations and making assumptions about what is
currently on the stack we now just explicitly initialize an identity
matrix and apply transforms to that.
This removes the full_vertex_t typedef for internal transformation code
and we just use ClutterVertex.
ClutterStage now implements apply_transform like any other actor now
and the code we had in _cogl_setup_viewport has been moved to the
stage's apply_transform instead.
ClutterStage now tracks an explicit projection matrix and viewport
geometry. The projection matrix is derived from the perspective whenever
that changes, and the viewport is updated when the stage gets a new
allocation. The SYNC_MATRICES mechanism has been removed in favour of
_clutter_stage_dirty_viewport/projection() APIs that get used when
switching between multiple stages to ensure cogl has the latest
information about the onscreen framebuffer.
In 965907deb3 the picking was changed to render the full stage
instead of a single pixel whenever picking is performed more than once
between paints. However the condition in the if-statement was
backwards so it would end up always doing a full stage render.
The idea is that if we see multiple picks per frame then that implies
the visible scene has become static. In this case we can promote the
next pick render to be unclipped so we have valid pick values for the
entire stage. Now we can continue to read from this cached buffer until
the stage contents do visibly change.
Thanks to Luca Bruno on #clutter for this idea!
The P_() macro adds a context for the property nick and blurb. In order
to make xgettext recognize it, we need to drop glib-gettexize inside the
autogen.sh script and ship a modified Makefile.in.in with Clutter.
Initialize the accessibility support calling cally_accessibility_init
Take into account that this is required to at least be sure that
CallyUtil class is available.
It also modifies cally_accessibility_module_init in order to return
if the initialization was fine (and the name, removing the module word).
It also removes the gnome accessibility hooks, as it is not anymore
module code.
Solves CB#2098
Every time we request a CoglPangoFontMap, either internally or
externally, we should have one available.
Reviewed-by: Neil Roberts <neil@linux.intel.com>
In commit c0a553163b I changed the format used to read the picking
pixel to COGL_PIXEL_FORMAT_RGB_888 because it was convenient to avoid
the premult conversion. However this broke picking on GLES on some
platforms because for that glReadPixels is only guaranteed to support
GL_RGBA with GL_UNSIGNED_BYTE. Since the last commit cogl_read_pixels
will always use that format but it will end up with a conversion back
to RGB_888. This patch avoids that conversion and avoids the premult
conversion by reading in RGBA_8888_PRE.
http://bugzilla.openedhand.com/show_bug.cgi?id=2057
We kind of assume that stuff will break well before during the
ClutterBackend::create_context() implementation if we fail to create a
GL context. We do, however, have error reporting in place inside the
Backend API to catch those cases. Unfortunately, since we switched to
lazy initialization of the Stage, there can be a case of GL context
creation failure that still leads to a successful initialization - and a
segmentation fault later on. This is clearly Not Good™.
Let's try to catch a failure in all the places calling create_context()
and report back to the user the error in a meaningful way, before
crashing and burning.
A new (internal only currently) API, _clutter_actor_queue_clipped_redraw
can be used to queue a redraw along with a clip rectangle in actor
coordinates. This clip rectangle propagates up to the stage and clutter
backend which may optionally use the information to optimize stage
redraws. The GLX backend in particular may scissor the next redraw to
the clip rectangle and use GLX_MESA_copy_sub_buffer to present the stage
subregion.
The intention is that any actors that can naturally determine the bounds
of updates should queue clipped redraws to reduce the cost of updating
small regions of the screen.
Notes:
» If GLX_MESA_copy_sub_buffer isn't available then the GLX backend
ignores any clip rectangles.
» queuing multiple clipped redraws will result in the bounding box of
each clip rectangle being used.
» If a clipped redraw has a height > 300 pixels then it's promoted into
a full stage redraw, so that the GPU doesn't end up blocking too long
waiting for the vsync to reach the optimal position to avoid tearing.
» Note: no empirical data was used to come up with this threshold so
we may need to tune this.
» Currently only ClutterX11TexturePixmap makes use of this new API. This
is done via a new "queue-damage-redraw" signal that is emitted when
the pixmap is updated. The default handler queues a clipped redraw
with the assumption that the pixmap is being painted as a rectangle
covering the actors transformed allocation. If you subclass
ClutterX11TexturePixmap and change how it's painted you now also
need to override the signal handler and queue your own redraw.
Technically this is a semantic break, but it's assumed that no one
is currently doing this.
This still leaves a few unsolved issues with regards to optimizing sub
stage redraws that need to be addressed in further work so this can only
be considered a stepping stone a this point:
» Because we have no reliable way to determine if the painting of any
given actor is being modified any optimizations implemented using
_clutter_actor_queue_redraw_with_clip must be overridable by a
subclass, and technically must be opt-in for existing classes to avoid
a change in semantics. E.g. consider that a user connects to the paint
signal for ClutterTexture and paints a circle instead of a rectangle.
In this case any original logic to queue clipped redraws would be
incorrect.
» Currently only the implementation of an actor has enough information
with which to queue clipped redraws. E.g. It is not possible for
generic code in clutter-actor.c to queue a clipped redraw when hiding
an actor because actors have no way to report a "paint box". (remember
actors can draw outside their allocation and actors with depth may
also be projected outside of their allocation)
» The current plan is to add a actor_class->get_paint_cuboid()
virtual so actors can report a bounding cube for everything they
would draw in their current state and use that to queue clipped
redraws against the stage by projecting the paint cube into stage
coordinates.
» Our heuristics for promoting clipped redraws into full redraws to
avoid blocking the GPU while we wait for the vsync need improving:
» vsync issues aren't relevant for redirected/composited applications
so they should use different heuristics. In this case we instead
need to trade off the cost of blitting when using glXCopySubBuffer
vs promoting to a full redraw and flipping instead.
cogl_read_pixels() no longer asserts that the format passed in is
RGBA_8888 but instead accepts any format. The appropriate GL enums for
the format are passed to glReadPixels so OpenGL should be perform a
conversion if neccessary.
It currently assumes glReadPixels will always give us premultiplied
data. This will usually be correct because the result of the default
blending operations for Cogl ends up with premultiplied data in the
framebuffer. However it is possible for the framebuffer to be in
whatever format depending on what CoglMaterial is used to render to
it. Eventually we may want to add a way for an application to inform
Cogl that the framebuffer is not premultiplied in case it is being
used for some special purpose.
If the requested format is not premultiplied then Cogl will convert
it. The tests have been changed to read the data as premultiplied so
that they won't be affected by the conversion. Picking in Clutter has
been changed to use COGL_PIXEL_FORMAT_RGB_888 because it doesn't need
the alpha component. clutter_stage_read_pixels is left unchanged
because the application can't specify a format for that so it seems to
make most sense to store unpremultiplied values.
http://bugzilla.openedhand.com/show_bug.cgi?id=1959
Since using addresses that might change is something that finally
the FSF acknowledge as a plausible scenario (after changing address
twice), the license blurb in the source files should use the URI
for getting the license in case the library did not come with it.
Not that URIs cannot possibly change, but at least it's easier to
set up a redirection at the same place.
As a side note: this commit closes the oldes bug in Clutter's bug
report tool.
http://bugzilla.openedhand.com/show_bug.cgi?id=521
Some of the ClutterDebugFlags are not meant as a logging facility: they
actually change Clutter's behaviour at run-time.
It would be useful to have this distinction ratified, and thus split
ClutterDebugFlags into two: one DebugFlags for logging facilities and
another set of flags for behavioural changes.
This split is warranted because:
• it should be possible to do "CLUTTER_DEBUG=all" and only have
log messages on the output
• it should be possible to use behavioural modifiers even on a
Clutter that has been compiled without debugging messages
support
The commit adds two new debugging flags:
ClutterPickDebugFlags - controlled by the CLUTTER_PICK environment
variable
ClutterPaintDebugFlags - controlled by the CLUTTER_PAINT environment
variable
The PickDebugFlags are:
nop-picking
dump-pick-buffers
While the PaintDebugFlags is:
disable-swap-events
The mechanism is equivalent to the CLUTTER_DEBUG environment variable,
but it does not depend on the debug level selected when configuring and
compiling Clutter. The picking and painting debugging flags are
initialized at clutter_init() time.
http://bugzilla.openedhand.com/show_bug.cgi?id=1991
• Remove unused variables.
• Do not pre-initialize ClutterActor's GType; pre-emptive optimizations
like these are more black magic than real optimization.
Instead of creating the default stage during initialization we can
now safely create it whenever clutter_stage_get_default() is called.
To maintain the invariant, the default stage is immediately realized
by Clutter itself.
Since we must guarantee that Cogl has a GL context to query, it is too
late to use the "dummy Window" trick from within the get_features()
virtual function implementation.
Instead, we can create a dummy Window from create_context() itself and
leave it around - basically trading a default stage with a dummy X
window.
We need to have the dummy X window around all the time so that the
GLX context can be selected and made current.
When we disable the per-actor events delivery Clutter replicates the X11
implicit soft grab for motion events with off-stage. The implicit grab
is done whenever the pointer of a device leaves a window with a button
still pressed; with the implicit grab in place the window still receives
motion events even after the LeaveNotify - until the button is released.
The implicit grab is not honoured in the per-actor event deliver case,
though, so we have a mismatch between two in theory equivalent cases.
Luckily, the fix is pretty trivial: when we check for a motion event
with a stage set but without an actor set, and that has off-stage
coordinates, we arbitrarily set the source to be the stage of the event
and emit the pointer event.