Fix a regression that got introduced with
c483b52d240136cf49b358cde2c0d2c0ed237e51 where we started passing the
redraw_clip to paint_stage() instead of creating a temporary view_region
for unclipped redraws: In case we detect an invalid buffer age, we fall
back to doing an unclipped redraw after we passed the first check
setting up may_use_clipped_redraw. That means we didn't reset the
redraw_clip to the view_rect, and we're now going to redraw the stage
using the original redraw clip even though we're swapping the full
framebuffer without damage.
To fix that, check for the buffer age before setting up the
fb_clip_region and the redraw_clip and set may_use_clipped_redraw to
FALSE if the buffer age is invalid, too. This ensures the redraw_clip is
always going to be correctly set to the view rect when we want to force
a full redraw.
Fixes https://gitlab.gnome.org/GNOME/mutter/issues/1128
This is so that cogl-trace.h can start using things from cogl-macros.h,
and so that it doesn't leak cogl-config.h into the world, while exposing
it to e.g. gnome-shell so that it can make use of it as well. There is
no practical reason why we shouldn't just include cogl-trace.h via
cogl.h as we do with everything else.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1059
offset_scale_and_clamp_region() creates a new region resulting in
view_damage which at this point is the only thing left pointing to what
originally was fb_damage getting overwritten and being leaked.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1089
The stage window handled the redraw clip in a global manner; this would
interfere if we want to paint views individually as it'd mean
intersecting views (i.e. mirrored monitors) would loose the redraw clip
once the first view was painted. It also is awkward to have a global
state for something that is built up before redrawing, and only really
valid during paint, due to buffer damage history.
This commits removes all redraw clip management from the stage window,
moving it all into the stage views. When a redraw clip is added to the
stage, every affected view will get the same redraw clip added to it,
and eventually when painted, the stage window (ClutterStageCogl) will
retrieve the redraw clip for each view as it repaints them.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1042
Add a helper that scales and clamps a region, aimed to be used when
transforming between framebuffer coordinate space and view coordinate
spaces.
This helps readability by moving out the verbose for loops that deals
with the individual rects of a region to the helper, making the logic
where it's used much simpler.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1042
The 'have_clip' variable has repeatedly confused me to meaning that
there is a clip. What it actually means is that the effective clip
covers the whole view; the 'redraw_clip == NULL' meaning full redraw is
an important implementation detail for the context, and makes the
intention of the variable unclear; especially since we will after a
couple of blocks will *always* have a clip, just that it covers the
whole view.
Rename the variable to 'is_full_redraw' and negate the meaning, aiming
to make things a lot more clear.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1042
When calculating the fallback framebuffer clip region, which should be
the region in framebuffer coordinates, we didn't scale the view layout
with the view framebuffer scale, meaning for any other scale than 1,
we'd draw a too small region of the view. Fix this by just using the
size of the framebuffer directly, avoiding any scale dependent
calculation all together.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1042
We'll expect a swap event if any of the view paints resulted in a swap;
make the logic dealing with this clearer by making changing the less
vilible '|| swap_event' postfix with a up front '|=' operator.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1042
When calculating regions, a lot of temporary allocations are created. For
the array of rects (which is often a short number of them) we can use
stack allocations up to 1 page (256 cairo_rectangle_int_t). For building
a region of rectangles, cairo and pixman are much faster if you have all
of the rectangles up front or else it mallocs quite a bit of temporary
memory.
If we re-use the cairo_rectangle_int_t array we've already allocated (and
preferably on the stack), we can delay the creation of regions until after
the tight loop.
Additionally, it requires fewer allocations to union two cairo_region_t
than to incrementally union the rectangles into the region.
Before (percentages are of total number of allocations)
TOTAL FUNCTION
[ 100.00%] [Everything]
[ 100.00%] [gnome-shell --wayland --display-server]
[ 99.67%] _start
[ 99.67%] __libc_start_main
[ 99.67%] main
[ 98.60%] meta_run
[ 96.90%] g_main_loop_run
[ 96.90%] g_main_context_iterate.isra.0
[ 96.90%] g_main_context_dispatch
[ 90.27%] clutter_clock_dispatch
[ 86.54%] _clutter_stage_do_update
[ 85.00%] clutter_stage_cogl_redraw
[ 84.98%] clutter_stage_cogl_redraw_view
[ 81.09%] cairo_region_union_rectangle
After (overhead has much dropped)
TOTAL FUNCTION
[ 100.00%] [Everything]
[ 99.80%] [gnome-shell --wayland --display-server]
[ 99.48%] _start
[ 99.48%] __libc_start_main
[ 99.48%] main
[ 92.37%] meta_run
[ 81.49%] g_main_loop_run
[ 81.49%] g_main_context_iterate.isra.0
[ 81.43%] g_main_context_dispatch
[ 39.40%] clutter_clock_dispatch
[ 26.93%] _clutter_stage_do_update
[ 25.80%] clutter_stage_cogl_redraw
[ 25.60%] clutter_stage_cogl_redraw_view
https://gitlab.gnome.org/GNOME/mutter/merge_requests/1071
Previously, we would use a single offscreen framebuffer for both
transformations and when a shadow framebuffer should be used, but that
can be dreadfully slow when using software rendering with a discrete GPU
due to bandwidth limitations.
Keep the offscreen framebuffer for transformations only and add another
intermediate shadow framebuffer used as a copy of the onscreen
framebuffer.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/877
This reverts commit 4918893326fe39dba14575a37359b040d074a5a0.
This commit prevented cogl_stage_cogl_redraw_view() from skipping
swap buffers entirely if the invalidation region ended up empty.
This meant we were actually swapping buffers when we didn't need to.
The source of the glitches was fixed more properly, so this just adds
extra work.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/898
As we do not prevent the SwapBuffers call from happening, those also
do count. Results in clip area calculations to be right for monitors
that previously did not get invalidated.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/888
Currently, Clutter does picking by drawing with Cogl and reading
the pixel that's beneath the given point. Since Cogl has a journal
that records drawing operations, and has optimizations to read a
single pixel from a list of rectangle, it would be expected that
we would hit this fast path and not flush the journal while picking.
However, that's not the case: dithering, clipping with scissors, etc,
can all flush the journal, issuing commands to the GPU and making
picking slow. On NVidia-based systems, this glReadPixels() call is
extremely costly.
Introduce geometric picking, and avoid using the Cogl journal entirely.
Do this by introducing a stack of actors in ClutterStage. This stack
is cached, but for now, don't use the cache as much as possible.
The picking routines are still tied to painting.
When projecting the actor vertexes, do it manually and take the modelview
matrix of the framebuffer into account as well.
CPU usage on an Intel i7-7700, tested with two different GPUs/drivers:
| | Intel | Nvidia |
| ------: | --------: | -----: |
| Moving the mouse: |
| Before | 10% | 10% |
| After | 6% | 6% |
| Moving a window: |
| Before | 23% | 81% |
| After | 19% | 40% |
Closes: https://gitlab.gnome.org/GNOME/mutter/issues/154,
https://gitlab.gnome.org/GNOME/mutter/issues/691
Helps significantly with: https://gitlab.gnome.org/GNOME/mutter/issues/283,
https://gitlab.gnome.org/GNOME/mutter/issues/590,
https://gitlab.gnome.org/GNOME/mutter/issues/700
v2: Fix code style issues
Simplify quadrilateral checks
Remove the 0.5f hack
Differentiate axis-aligned rectangles
https://gitlab.gnome.org/GNOME/mutter/merge_requests/189
This reverts commit f57ce7254d4abd462044318bb088fccd8557ccc8.
It causes crashes, https://gitlab.gnome.org/GNOME/mutter/issues/735, and
changes various expectations relied upon by the renderer code, and being
close to release, it's safer to revert now and reconsider how to remove
the pending swap counter at a later point.
ClutterStage:after-paint is supposed to be emitted after all
painting is done, but before the frame is finished. However,
as it is right now, it is being emitted after each view is
painted -- on multi-monitor setups, after-frame is being
emitted multiple times.
Send after-paint only once, after all views are painted and
before finishing the frame.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/623
After 4faeb12731b8, the maximum time allowed for an update to happen
is calculated as:
max_render_time_allowed = refresh_interval - 1000 * sync_delay;
However, extremely small refresh intervals -- that come as consequence
to extremely high refresh rates -- may fall into an odd numerical range
when refresh_interval < 1000 * sync_delay. That would give us a negative
time.
To be extra cautious about it, add another sanity check for this case.
Change suggested by Jasper St. Pierre.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
If `last_presentation_time` is zero (unsupported) or just very old
(system was idle) then we would like to avoid that triggering a large
number of loop interations. This will get us closer to the right answer
without iterating.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
That could happen if the backend did not provide presentation timestamps,
or if the screen was not changing other than the hardware cursor:
if (stage_cogl->last_presentation_time == 0||
stage_cogl->last_presentation_time < now - 150000)
{
stage_cogl->update_time = now;
return;
}
By setting `update_time` to `now`, master_clock_get_swap_wait_time()
returns 0:
gint64 now = g_source_get_time (master_clock->source);
if (min_update_time < now)
{
return 0;
}
else
{
gint64 delay_us = min_update_time - now;
return (delay_us + 999) / 1000;
}
However, zero is a value unsupported by the default master clock
due to:
if (swap_delay != 0)
return swap_delay;
All cases are now handled by extrapolating when the next presentation
time would be and calculating an appropriate update time to meet that.
We also need to add a check for `update_time == last_update_time`, which
is a situation that just became possible since we support old (or zero)
values of `last_presentation_time`. This avoids getting more than one
stage update per frame interval when input events arrive without
triggering a stage redraw (e.g. moving the hardware cursor).
https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
Instead of crazy refresh rates >1MHz falling back to 60Hz, just honour
them by rendering unthrottled (same as `sync_delay < 0`). Although I
wouldn't actually expect that path to ever be needed in reality, it just
ensures an infinite `while` loop never happens.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/363
Spotted while adding tracing to swap buffers, we only enter
the first part of the if condition when use_clipped_redraw
is TRUE, so it's pretty safe to assume it's TRUE.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/197
The idea here is to be able to visualize and immediately
understand what is happening. Something like:
```
[ view1 ] [ view2 ]
[---- Layout ---][------ Paint ------][ Pick ]
[================== Update =====================]
```
But with colors. A few of the previous profiling data
sections were removed, since they didn't really add to
reading the graph.
https://gitlab.gnome.org/GNOME/mutter/merge_requests/197
If an update (new frame) had been scheduled already before
`_clutter_stage_cogl_presented` was called then that means it was
scheduled for the wrong time. Because the `last_presentation_time` has
changed since then. And using an `update_time` based on an outdated
presentation time results in scheduling frames too early, filling the
buffer queue (triple buffering or worse) and high visual latency.
So if we do receive a presentation event when an update is already
scheduled, remember to reschedule the update based on the newer
`last_presentation_time`. This way we avoid overfilling the buffer queue
and limit ourselves to double buffering for less visible lag.
Closes: https://gitlab.gnome.org/GNOME/mutter/issues/334
Prerequisite: https://gitlab.gnome.org/GNOME/mutter/merge_requests/520https://gitlab.gnome.org/GNOME/mutter/merge_requests/281
The `last_presentation_time` is usually a little in the past (although
sometimes in the future depending on the driver). When it's over 2ms
(`sync_delay`) in the past that would trigger the while loop to count up so
that the next `update_time` is in the future.
The problem with that is for common values of `last_presentation_time`
which are only a few milliseconds ago, incrementing `update_time` by
`refresh_interval` also means counting past the next physical frame that
we haven't rendered yet. And so mutter would skip that frame.
**Example**
Given:
```
last_presentation_time = now - 3ms
sync_delay = 2ms
refresh_interval = 16ms
next_presentation_time = last_presentation_time + refresh_interval
= now + 13ms
-3ms now +13ms +29ms +45ms
----|--+------------|---------------|---------------|----
: :
last_presentation_time next_presentation_time
```
Old algorithm:
```
update_time = last_presentation_time + sync_delay
= now - 1ms
while (update_time < now)
(now - 1ms < now)
update_time = now - 1ms + 16ms
update_time = now + 15ms
next_presentation_time = now + 13ms
available_render_time = next_presentation_time - max(now, update_time)
= (now + 13ms) - (now + 15ms)
= -2ms so the next frame will be skipped.
-3ms now +13ms +29ms +45ms
----|--+------------|-+-------------|---------------|----
: : :
: : update_time (too late)
: :
last_presentation_time next_presentation_time (a missed frame)
```
New algorithm:
```
min_render_time_allowed = refresh_interval / 2
= 8ms
max_render_time_allowed = refresh_interval - sync_delay
= 14ms
target_presentation_time = last_presentation_time + refresh_interval
= now - 3ms + 16ms
= now + 13ms
while (target_presentation_time - min_render_time_allowed < now)
(now + 13ms - 8ms < now)
(5ms < 0ms)
# loop is never entered
update_time = target_presentation_time - max_render_time_allowed
= now + 13ms - 14ms
= now - 1ms
next_presentation_time = now + 13ms
available_render_time = next_presentation_time - max(now, update_time)
= (now + 13ms) - now
= 13ms which is plenty of render time.
-3ms now +13ms +29ms +45ms
----|-++------------|---------------|---------------|----
: : :
: update_time :
: :
last_presentation_time next_presentation_time
```
The reason nobody noticed these missed frames very often was because
mutter has some accidental workarounds built-in:
* Prior to 3.32, the offending code was only reachable in Xorg sessions.
It was never reached in Wayland sessions because it hadn't been
implemented yet (till e9e4b2b72).
* Even though Wayland support is now implemented the native backend
provides a `last_presentation_time` much faster than Xorg sessions
(being in the same process) and so is less likely to spuriously enter
the while loop to miss a frame.
* For Xorg sessions we are accidentally triple buffering (#334). This
is a good way to avoid the missed frames, but is also an accident.
* `sync_delay` is presently just high enough (2ms by coincidence is very
close to common values of `now - last_presentation_time`) to push the
`update_time` into the future in some cases, which avoids entering the
while loop. This is why the same missed frames problem was also noticed
when experimenting with `sync_delay = 0`.
v2: adjust variable names and code style.
Fixes: https://bugzilla.gnome.org/show_bug.cgi?id=789186
and most of https://gitlab.gnome.org/GNOME/mutter/issues/571https://gitlab.gnome.org/GNOME/mutter/merge_requests/520