When we come to presenting the result of a clipped redraw to the front
buffer with a blit we need to ensure that all the rendering is done,
otherwise redraw operations that are slower than the framerate can queue
up in the pipeline during a heavy animation, causing a larger and larger
backlog of rendering visible as lag to the user.
Note: Since calling glFinish() and sycnrhonizing the CPU with the GPU is
far from ideal, we hope that this is only a short term solution.
One idea is to using sync objects to track render completion so we can
throttle the backlog (ideally with an additional extension that lets us
get notifications in our mainloop instead of having to busy wait for the
completion.)
Another option is to support clipped redraws by reusing the contents of
old back buffers such that we can flip instead of using a blit and then
we can use GLX_INTEL_swap_events to throttle. For this though we would
still probably want an additional extension so we can report the limited
region of the window damage to X/compositors.
Thanks to Owen Taylor and Alexander Larsson for reporting the problem.
This implements a variation of frustum culling whereby we convert screen
space clip rectangles into eye space mini-frustums so that we don't have
to repeatedly transform actor paint-volumes all the way into screen
coordinates to perform culling, we just have to apply the modelview
transform and then determine each points distance from the planes that
make up the clip frustum.
By avoiding the projective transform, perspective divide and viewport
scale for each point culled this makes culling much cheaper.
As noted in commit ce3f55292a an explict glFlush is needed for
both glBlitFramebuffer and glXCopySubBuffer.
_clutter_backend_glx_blit_sub_buffer was already doing an explicit
flush when using glBlitFramebuffer, so just do it unconditonally
and remove the call from clutter_stage_glx_redraw.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2558
Unlike glXSwapBuffers, glXCopySubBuffer and glBlitFramebuffer don't
issue an implicit glFlush() so we have to flush ourselves if we want the
request to complete in finite amount of time since otherwise the driver
can batch the command indefinitely.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2551
Instead of asking all backends to do that for us, we can call
ClutterStageWindow::redraw ourselves by default.
This changeset fixes all backends to actually do the right thing, and
move the stage implementation redraw inside the ClutterStageWindow
implementation itself.
* xi2: (41 commits)
test-devices: Actually print the axis data
device-manager/xi2: Sync the stage of source devices
event: Clean up clutter_event_copy()
device: unset the axes array pointer when resetting
device-manager/xi2: Fix device hotplugging
glx: Clean up GLX implementation
device/x11: Store min/max keycode in the XI device class
x11: Hide all private symbols
docs: More documentation fixes for InputDevice
*/event: Never manipulate the event queue directly
win32: Update DeviceManager device creation
device: Allow enabling/disabling non-master devices
backend/eglx: Add newly created stages to the translators
device: Add more doc annotations
device: Use a double for translate_axis() argument
test-devices: Clean up and show axes data
event: Fix up clutter_event_copy()
device/xi2: Translate the axis data after setting devices
device: Add more accessors for properties
docs: Update API reference
...
This adds a stop-gap mechanism for Cogl to know when the window system
is requested to present the current backbuffer to the frontbuffer by
adding a _cogl_swap_buffers_notify function that backends are now
expected to call right after issuing the equivalent request to OpenGL
vie the platforms OpenGL binding layer. This (blindly) updates all the
backends to call this new function.
For now Cogl doesn't do anything with the notification but the intention
is to use it as part of a planned read-pixel optimization which will
need to reset some state at the start of each new frame.
This is a lump commit that is fairly difficult to break down without
either breaking bisecting or breaking the test cases.
The new design for handling X11 event translation works this way:
- ClutterBackend::translate_event() has been added as the central
point used by a ClutterBackend implementation to translate a
native event into a ClutterEvent;
- ClutterEventTranslator is a private interface that should be
implemented by backend-specific objects, like stage
implementations and ClutterDeviceManager sub-classes, and
allows dealing with class-specific event translation;
- ClutterStageX11 implements EventTranslator, and deals with the
stage-relative X11 events coming from the X11 event source;
- ClutterStageGLX overrides EventTranslator, in order to
deal with the INTEL_GLX_swap_event extension, and it chains up
to the X11 default implementation;
- ClutterDeviceManagerX11 has been split into two separate classes,
one that deals with core and (optionally) XI1 events, and the
other that deals with XI2 events; the selection is done at run-time,
since the core+XI1 and XI2 mechanisms are mutually exclusive.
All the other backends we officially support still use their own
custom event source and translation function, but the end goal is to
migrate them to the translate_event() virtual function, and have the
event source be a shared part of Clutter core.
This tweaks the semantics of the has_redraw_clips vfunc so we can assume
that at the start of a new frame there is an implied, initial,
redraw_clip that clips everything (i.e. nothing would be redrawn) so in
that case we would expect the has_redraw_clips vfunc to return True at
the start of a new frame for backends that support clipping.
Previously there was an ambiguity when this function returned False
since it could either mean a full screen redraw had been queued or it
could mean that the clip state wasn't yet initialized for that frame.
This would result in _clutter_stage_has_full_redraw_queued() returning
True at the start of a new frame even before any actors have been
updated, which in turn meant we would incorrectly ignore queue_redraw
requests for actors, believing them to be redundant.
To consider that we've see a number of drivers that can struggle to get
going and may produce a bad first frame we now force the first 2 frames
to be full redraws. This became a serious issue after we started using
clipped redraws more aggressively because we assumed that after the
first frame the full framebuffer was valid and we only redraw the
content that changes. With buggy drivers though, applications would be
left with junk covering a lot of the stage until some event triggered a
full redraw.
This is a workaround for a race condition when resizing windows while
there are in-flight glXCopySubBuffer blits happening.
The problem stems from the fact that rectangles for the blits are
described relative to the bottom left of the window and because we can't
guarantee control over the X window gravity used when resizing so the
gravity is typically NorthWest not SouthWest.
This means if you grow a window vertically the server will make sure to
place the old contents of the window at the top-left/north-west of your
new larger window, but that may happen asynchronous to GLX preparing to
do a blit specified relative to the bottom-left/south-west of the window
(based on the old smaller window geometry).
When the GLX issued blit finally happens relative to the new bottom of
your window, the destination will have shifted relative to the top-left
where all the pixels you care about are so it will result in a nasty
artefact making resizing look very ugly!
We can't currently fix this completely, in-part because the window
manager tends to trample any gravity we might set. This workaround
instead simply disables blits for a while if we are notified of any
resizes happening so if the user is resizing a window via the window
manager then they may see an artefact for one frame but then we will
fallback to redrawing the full stage until the cooling off period is
over.
This uses actor paint volumes to perform culling during
clutter_actor_paint.
When performing a clipped redraw (because only a few localized actors
changed) then as we traverse the scenegraph painting the actors we can
now ignore actors that don't intersect the clip region. Early testing
shows this can have a big performance benefit; e.g. 100% fps improvement
for test-state with culling enabled and we hope that there are even much
more compelling examples than that in the real world,
Most Clutter applications are 2Dish interfaces and have quite a lot of
actors that get continuously painted when anything is animated. The
dynamic actors are often localized to an area of user focus though so
with culling we can completely avoid painting any of the static actors
outside the current clip region.
Obviously the cost of culling has to be offset against the cost of
painting to determine if it's a win, but our (limited) testing suggests
it should be a win for most applications.
Note: we hope we will be able to also bring another performance bump
from culling with another iteration - hopefully in the 1.6 cycle - to
avoid doing the culling in screen space and instead do it in the stage's
model space. This will hopefully let us minimize the cost of
transforming the actor volumes for culling.
This ensures that clipped redraws are disabled when using
CLUTTER_PAINT=redraws. This may seem unintuitive given that this option
is for debugging clipped redraws, but we can't draw an outline outside
the clip region and anything we draw inside the clip region is liable to
leave a trailing mess on the screen since it won't be cleared up by
later clipped redraws.
This is a fairly extensive second pass at exposing paint volumes for
actors.
The API has changed to allow clutter_actor_get_paint_volume to fail
since there are times - such as when an actor isn't a descendent of the
stage - when the volume can't be determined. Another example is when
something has connected to the "paint" signal of the actor and we simply
have no way of knowing what might be drawn in that handler.
The API has also be changed to return a const ClutterPaintVolume pointer
(transfer none) so we can avoid having to dynamically allocate the
volumes in the most common/performance critical code paths. Profiling was
showing the slice allocation of volumes taking about 1% of an apps time,
for some fairly basic tests. Most volumes can now simply be allocated on
the stack; for clutter_actor_get_paint_volume we return a pointer to
&priv->paint_volume and if we need a more dynamic allocation there is
now a _clutter_stage_paint_volume_stack_allocate() mechanism which lets
us allocate data which expires at the start of the next frame.
The API has been extended to make it easier to implement
get_paint_volume for containers by using
clutter_actor_get_transformed_paint_volume and
clutter_paint_volume_union. The first allows you to query the paint
volume of a child but transformed into parent actor coordinates. The
second lets you combine volumes together so you can union all the
volumes for a container's children and report that as the container's
own volume.
The representation of paint volumes has been updated to consider that
2D actors are the most common.
The effect apis, clutter-texture and clutter-group have been update
accordingly.
If a NULL clip is passed to clutter_stage_glx_add_redraw_clip then we
update the redraw clip to have width of 0, but we weren't setting
stage_glx->initialized_redraw_clip = TRUE. This could result in a full,
unclipped stage redraw being reduced to a clipped redraw.
The glx and egl(x) backends export some internal symbols. Hide these
symbols (using '_' prefix) to reduce ABI differentiation between the
glx and eglx flavours.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2267
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
DRM is available on more platforms than Linux (e.g. kFreeBSD), but
Clutter currently FTBFS there because of not being an alternative to
the __linux__ code (where it should be HAVE_DRM).
Instead of copying the DRM data structures, we should use libdrm when
falling back to directly requesting to wait for the vblank.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2225
Based on a patch by: Emilio Pozuelo Monfort <pochu27@gmail.com>
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
Currently, we select input events and GLX events conditionally,
depending on whether the user has disabled event retrieval.
We should, instead, unconditionally select input events even with event
retrieval disabled because we need to guarantee that the Clutter
internal state is maintained when calling clutter_x11_handle_event()
without requiring applications or embedding toolkits to select events
themselves. If we did that, we'd have to document the events to be
selected, and also update applications and embedding toolkits each time
we added a new mask, or a new class of events - something that's clearly
not possible.
See:
http://bugzilla.clutter-project.org/show_bug.cgi?id=998
for the rationale of why we did conditional selection. It is now clear
that a compositor should clear out the input region, since it cannot
assume a perfectly clean slate coming from us.
See:
http://bugzilla.clutter-project.org/show_bug.cgi?id=2228
for an example of things that break if we do conditional event
selection on GLX events. In that specific case, the X11 server ≤ 1.8
always pushed GLX events on the queue, even without selecting them; this
has been fixed in the X11 server ≥ 1.9, which means that applications
like Mutter or toolkit integration libraries like Clutter-GTK would stop
working on recent Intel drivers providing the GLX_INTEL_swap_event
extension.
This change has been tested with Mutter and Clutter-GTK.
Moves preprocessor #ifdef __linux_ above else statement, avoiding the
lack of an else block if __linux__ is not defined.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2212
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
When clipped redraws were first supported in Clutter a heuristic was
added to promote tall clipped redraws into full redraws due to a concern
that using glXCopySubBuffer for tall rectangles would block the GPU for
too long waiting for the vtrace to be in a suitable position so that
tearing isn't seen. We've so far been unable to measure any impact from
this blocking even with full height windows so we are removing the
arbitrary threshold of 300px that was originally "plucked out of thin
air".
http://bugzilla.o-hand.com/show_bug.cgi?id=2136
If we have the GLX_SGI_video_sync extension then it's possible to always
keep track for the video sync counter each time we call glXSwapBuffers
or do a sub stage blit. This then allows us to avoid waiting before
issuing a blit if we can see that the counter has already progressed.
Also since we expect that glXCopySubBuffer is synchronized to the vblank
we don't need to use glFinish () in conjunction with the vblank wait
since the vblank wait's only purpose is to add a delay.
Neither glXCopySubBuffer or glBlitFramebuffer are integrated with the
swap interval of a framebuffer so that means when we do partial stage
updates (as Mutter does in response to window damage) then the blits
aren't throttled which means applications that throw lots of damage
events at the compositor can effectively cause Clutter to run flat out
taking up all the system resources issuing more blits than can even be
seen.
This patch now makes sure we use the GLX_SGI_video_sync or a
DRM_VBLANK_RELATIVE ioctl to throttle blits to the vblank frequency as
we do when using glXSwapBuffers.
Currently glXCopySubBufferMESA is used for sub stage redraws, but in case
a driver does not support GLX_MESA_copy_sub_buffer we fall back to redrawing
the complete stage which isn't really optimal.
So instead to directly fallback to complete redraws try using GL_EXT_framebuffer_blit
to do the BACK to FRONT buffer copies.
http://bugzilla.openedhand.com/show_bug.cgi?id=2128
Move the size check after the NULL check, add the clip height into the
check logic and fix up the comment.
http://bugzilla.openedhand.com/show_bug.cgi?id=2040
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
* Add new clutter_geometry_union(), because writing union intersection
is harder than it looks. Fixes two problems with the inline code in
clutter_stage_glx_add_redraw_clip().
1) The ->x and ->y of were reassigned to before using them to
compute the new width and height.
2) since ClutterGeometry has unsigned width, x + width is unsigned,
and comparison goes wrong if either rectangle has a negative
x + width. (We fixed width for GdkRectangle to be signed for GTK+-2.0,
this is a potent source of bugs.)
* Use in clutter_stage_glx_add_redraw_clip()
* Account for the case where the incoming rectangle is empty, and don't
end up with the stage being entirely redrawn.
* Account for the case where the stage already has a degenerate
width and don't end up with redrawing only the new rectangle and not
the rest of the stage.
The better fix here for the second two problems is to stop using a 0
width to mean the entire stage, but this should work for now.
http://bugzilla.openedhand.com/show_bug.cgi?id=2040
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
A new (internal only currently) API, _clutter_actor_queue_clipped_redraw
can be used to queue a redraw along with a clip rectangle in actor
coordinates. This clip rectangle propagates up to the stage and clutter
backend which may optionally use the information to optimize stage
redraws. The GLX backend in particular may scissor the next redraw to
the clip rectangle and use GLX_MESA_copy_sub_buffer to present the stage
subregion.
The intention is that any actors that can naturally determine the bounds
of updates should queue clipped redraws to reduce the cost of updating
small regions of the screen.
Notes:
» If GLX_MESA_copy_sub_buffer isn't available then the GLX backend
ignores any clip rectangles.
» queuing multiple clipped redraws will result in the bounding box of
each clip rectangle being used.
» If a clipped redraw has a height > 300 pixels then it's promoted into
a full stage redraw, so that the GPU doesn't end up blocking too long
waiting for the vsync to reach the optimal position to avoid tearing.
» Note: no empirical data was used to come up with this threshold so
we may need to tune this.
» Currently only ClutterX11TexturePixmap makes use of this new API. This
is done via a new "queue-damage-redraw" signal that is emitted when
the pixmap is updated. The default handler queues a clipped redraw
with the assumption that the pixmap is being painted as a rectangle
covering the actors transformed allocation. If you subclass
ClutterX11TexturePixmap and change how it's painted you now also
need to override the signal handler and queue your own redraw.
Technically this is a semantic break, but it's assumed that no one
is currently doing this.
This still leaves a few unsolved issues with regards to optimizing sub
stage redraws that need to be addressed in further work so this can only
be considered a stepping stone a this point:
» Because we have no reliable way to determine if the painting of any
given actor is being modified any optimizations implemented using
_clutter_actor_queue_redraw_with_clip must be overridable by a
subclass, and technically must be opt-in for existing classes to avoid
a change in semantics. E.g. consider that a user connects to the paint
signal for ClutterTexture and paints a circle instead of a rectangle.
In this case any original logic to queue clipped redraws would be
incorrect.
» Currently only the implementation of an actor has enough information
with which to queue clipped redraws. E.g. It is not possible for
generic code in clutter-actor.c to queue a clipped redraw when hiding
an actor because actors have no way to report a "paint box". (remember
actors can draw outside their allocation and actors with depth may
also be projected outside of their allocation)
» The current plan is to add a actor_class->get_paint_cuboid()
virtual so actors can report a bounding cube for everything they
would draw in their current state and use that to queue clipped
redraws against the stage by projecting the paint cube into stage
coordinates.
» Our heuristics for promoting clipped redraws into full redraws to
avoid blocking the GPU while we wait for the vsync need improving:
» vsync issues aren't relevant for redirected/composited applications
so they should use different heuristics. In this case we instead
need to trade off the cost of blitting when using glXCopySubBuffer
vs promoting to a full redraw and flipping instead.
Since using addresses that might change is something that finally
the FSF acknowledge as a plausible scenario (after changing address
twice), the license blurb in the source files should use the URI
for getting the license in case the library did not come with it.
Not that URIs cannot possibly change, but at least it's easier to
set up a redirection at the same place.
As a side note: this commit closes the oldes bug in Clutter's bug
report tool.
http://bugzilla.openedhand.com/show_bug.cgi?id=521
When we resize, we relied on the stage's allocate to re-initialise the
GL viewport. Unfortunately, if we resized within Clutter, the new size
was cached before the window is actually resized, so glViewport wasn't
being called after resizing (some of the time, it's a race condition).
Change the way resizing works slightly so that we only resize when the
geometry size doesn't match our preferred size, and queue a relayout on
ConfigureNotify so the glViewport gets called.
Also change window creation slightly so that setting the size of a
window before it's realized works correctly.
If your OpenGL driver supports GLX_INTEL_swap_event that means when
glXSwapBuffers is called it returns immediatly and an XEvent is sent when
the actual swap has finished.
Clutter can use the events that notify swap completion as a means to
throttle rendering in the master clock without blocking the CPU and so it
should help improve the performance of CPU bound applications.
Some extensions only support GLX versions > 1.3 and may not support
old style X Windows as GLXDrawables, so we now create GLXWindows for
stages when possible.
We want to set the default size without triggering the layout machinary,
so change the window creation process slightly so we start with a
640x480 window.
When requesting the GLXFBConfig for creating the GLX context, we should
always request one that links to an ARGB visual instead of a plain RGB
one.
By using an ARGB visual we allow the ClutterStage:use-alpha property to
work as intended when running Clutter under a compositing manager.
The default behaviour of requesting an ARGB visual can be disabled by
using the:
CLUTTER_DISABLE_ARGB_VISUAL
Environment variable.
The only backend that tried to implement offscreen stages was the GLX backend
and even this has apparently be broken for some time without anyone noticing.
The property still remains and since the property already clearly states that
it may not work I don't expect anyone to notice.
This simplifies quite a bit of the GLX code which is very desireable from the
POV that we want to start migrating window system code down to Cogl and the
simpler the code is the more straight forward this work will be.
In the future when Cogl has a nicely designed API for framebuffer objects then
re-implementing offscreen stages cleanly for *all* backends should be quite
straightforward.
Instead of using ClutterActor for the base class of the Stage
implementation we should extend the StageWindow interface with
the required bits (geometry, realization) and use a simple object
class.
This require a wee bit of changes across Backend, Stage and
StageWindow, even though it's mostly re-shuffling.
First of all, StageWindow should get new virtual functions:
* geometry:
- resize()
- get_geometry()
* realization
- realize()
- unrealize()
This covers all the bits that we use from ClutterActor currently
inside the stage implementations.
The ClutterBackend::create_stage() virtual function should create
a StageWindow, and not an Actor (it should always have been; the
fact that it returned an Actor was a leak of the black magic going
on underneath). Since we never guaranteed ABI compatibility for
the Backend class, this is not a problem.
Internally to ClutterStage we can finally drop the shenanigans of
setting/unsetting actor flags on the implementation: if the realization
succeeds, for instance, we set the REALIZED flag on the Stage and
we're done.
As an initial proof of concept, the X11 and GLX stage implementations
have been ported to the New World Order(tm) and show no regressions.
An implementaton of realize() never needs to set the
CLUTTER_ACTOR_REALIZED flag, though it can unset the flag if
things fail unexpectedly. (Previously, stage backend implementations
had to do this since clutter_actor_realize() wasn't used; this
is no longer the case.)
http://bugzilla.openedhand.com/show_bug.cgi?id=1634
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
The mapping and unmapping of the X11 stage implementation is
a bit bong. It's asynchronous, for starters, when it really
can avoid it by tracking the state internally.
The ordering of the map/unmap sequence is also broken with
respect to the resizing.
By tracking the state internally into StageX11 we can safely
remove the MapNotify and UnmapNotify X event handling.
In theory, we should use _NET_WM_STATE a lot more, and reuse
the X11 state flags for fullscreening as well.
This is the another step into abstracting the backend operations
that are currently spread all across the board back into the
backend implementations where they belong.
The GL context creation, for instance, is demanded to the stage
realization which makes it a critical path for every operation
that is GL-context bound. This usually does not make any difference
since we realize the default stage, but at some point we might
start looking into avoiding the default stage realization in order
to make the Clutter startup faster.
It also makes the code maintainable because every part is self
contained and can be reworked with the minimum amount of pain.
The XVisualInfo for GL is created when a stage is being realized.
When embedding Clutter inside another toolkit we might not want to
realize a stage to extract the XVisualInfo, then set the stage
window using a foreign X Window -- which will cause a re-realization.
Instead, we should abstract as much as possible into the X11 backend.
Unfortunately, the XVisualInfo for GL is requested using GLX API; for
this reason we have to create a ClutterBackendX11 method that we
override inside the ClutterBackendGLX implementation.
This also allows us to move a little bit of complexity from out of
the stage realization, which is currently a very delicate and hard
to debug section.
Since we are destroying any previously set VisualInfo we keep we know
for sure that stage->xvisinfo is going to be None; hence, no reason to
check this condition.