Previously whenever the journal is flushed a new vertex array would be
created to contain the vertices. To avoid the overhead of reallocating
a buffer every time, this patch makes it use a pool of 8 buffers which
are cycled in turn. The buffers are never destroyed but instead the
data is replaced. The journal should only ever be using one buffer at
a time but we cache more than one buffer anyway in case the GL driver
is internally using the buffer in which case mapping the buffer may
cause it to create a new buffer anyway.
For the first iteration of the CoglAttribute API several of the new
functions accepted a pointer to a NULL terminated list of CoglAttribute
pointers - probably as a way to reduce the number of arguments required.
This style isn't consistent with existing Cogl APIs though and so we now
explicitly pass n_attributes arguments and don't require the NULL
termination.
This is part of a broader cleanup of some of the experimental Cogl API.
One of the reasons for this particular rename is to switch away from
using the term "Array" which implies a regular, indexable layout which
isn't the case. We also want to have a strongly implied relationship
between CoglAttributes and CoglAttributeBuffers.
This renames the two internal functions _cogl_get_draw/read_buffer
as cogl_get_draw_framebuffer and _cogl_get_read_framebuffer. The
former is now also exposed as experimental API.
OpenGL < 4.0 only supports integer based viewports and internally we
have a mixture of code using floats and integers for viewports. This
patch switches all viewports throughout clutter and cogl to be
represented using floats considering that in the future we may want to
take advantage of floating point viewports with modern hardware/drivers.
This makes a change to the original point_in_poly algorithm from:
http://www.ecse.rpi.edu/Homepages/wrf/Research/Short_Notes/pnpoly.html
The aim was to tune the test so that tests against screen aligned
rectangles are more resilient to some in-precision in how we transformed
that rectangle into screen coordinates. In particular gnome-shell was
finding that for some stage sizes then row 0 of the stage would become a
dead zone when going through the software picking fast-path and this was
because the y position of screen aligned rectangles could end up as
something like 0.00024 and the way the algorithm works it doesn't have
any epsilon/fuz factor to consider that in-precision.
We've avoided introducing an epsilon factor to the comparisons since we
feel there's a risk of changing some semantics in ways that might not be
desirable. One of those is that if you transform two polygons which
share an edge and test a point close to that edge then this algorithm
will currently give a positive result for only one polygon.
Another concern is the way this algorithm resolves the corner case where
the horizontal ray being cast to count edge crossings may cross directly
through a vertex. The solution is based on the "idea of Simulation of
Simplicity" and "pretends to shift the ray infinitesimally down so that
it either clearly intersects, or clearly doesn't touch". I'm not
familiar with the idea myself so I expect a misplaced epsilon is likely
to break that aspect of the algorithm.
The simple solution this patch applies is to pixel align the polygon
vertices which should eradicate most noise due to in-precision.
https://bugzilla.gnome.org/show_bug.cgi?id=641197
Some code was doing pointer arithmetic on the return value from
cogl_buffer_map which is void* pointer. This is a GCC extension so we
should try to avoid it. This patch adds casts to guint8* where
appropriate.
Based on a patch by Fan, Chun-wei.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2561
The current framebuffer is now internally separated so that there can
be a different draw and read buffer. This is required to use the
GL_EXT_framebuffer_blit extension. The current draw and read buffers
are stored as a pair in a single stack so that pushing the draw and
read buffer is done simultaneously with the new
_cogl_push_framebuffers internal function. Calling
cogl_pop_framebuffer will restore both the draw and read buffer to the
previous state. The public cogl_push_framebuffer function is layered
on top of the new function so that it just pushes the same buffer for
both drawing and reading.
When flushing the framebuffer state, the cogl_framebuffer_flush_state
function now tackes a pointer to both the draw and the read
buffer. Anywhere that was just flushing the state for the current
framebuffer with _cogl_get_framebuffer now needs to call both
_cogl_get_draw_buffer and _cogl_get_read_buffer.
This adds a COGL_DEBUG=clipping option that reports how the clip is
being flushed. This is needed to determine whether the scissor,
stencil clip planes or software clipping is being used.
The CoglDebugFlags are now stored in an array of unsigned ints rather
than a single variable. The flags are accessed using macros instead of
directly peeking at the cogl_debug_flags variable. The index values
are stored in the enum rather than the actual mask values so that the
enum doesn't need to be more than 32 bits wide. The hope is that the
code to determine the index into the array can be optimized out by the
compiler so it should have exactly the same performance as the old
code.
This is part of a broader cleanup of some of the experimental Cogl API.
One of the reasons for this particular rename is to reduce the verbosity
of using the API. Another reason is that CoglVertexArray is going to be
renamed CoglAttributeBuffer and we want to help emphasize the
relationship between CoglAttributes and CoglAttributeBuffers.
COGL_DEBUG=disable-fast-read-pixel can be used to disable the
optimization for reading a single pixel colour back by looking at the
geometry in the journal and not involving the GPU. With this disabled we
will always flush the journal, rendering to the framebuffer and then use
glReadPixels to get the result.
This adds a transparent optimization to cogl_read_pixels for when a
single pixel is being read back and it happens that all the geometry of
the current frame is still available in the framebuffer's associated
journal.
The intention is to indirectly optimize Clutter's render based picking
mechanism in such a way that the 99% of cases where scenes are comprised
of trivial quad primitives that can easily be intersected we can avoid
the latency of kicking a GPU render and blocking for the result when we
know we can calculate the result manually on the CPU probably faster
than we could even kick a render.
A nice property of this solution is that it maintains all the
flexibility of the render based picking provided by Clutter and it can
gracefully fall back to GPU rendering if actors are drawn using anything
more complex than a quad for their geometry.
It seems worth noting that there is a limitation to the extensibility of
this approach in that it can only optimize picking a against geometry
that passes through Cogl's journal which isn't something Clutter
directly controls. For now though this really doesn't matter since
basically all apps should end up hitting this fast-path. The current
idea to address this longer term would be a pick2 vfunc for ClutterActor
that can support geometry and render based input regions of actors and
move this optimization up into Clutter instead.
Note: currently we don't have a primitive count threshold to consider
that there could be scenes with enough geometry for us to compensate for
the cost of kicking a render and determine a result more efficiently by
utilizing the GPU. We don't currently expect this to be common though.
Note: in the future it could still be interesting to revive something
like the wip/async-pbo-picking branch to provide an asynchronous
read-pixels based optimization for Clutter picking in cases where more
complex input regions that necessitate rendering are in use or if we do
add a threshold for rendering as mentioned above.
Instead of having _cogl_get/set_clip stack which reference the global
CoglContext this instead makes those into CoglClipState method functions
named _cogl_clip_state_get/set_stack that take an explicit pointer to a
CoglClipState.
This also adds _cogl_framebuffer_get/set_clip_stack convenience
functions that avoid having to first get the ClipState from a
framebuffer then the stack from that - so we can maintain the
convenience of _cogl_get_clip_stack.
Instead of having a single journal per context, we now have a
CoglJournal object for each CoglFramebuffer. This means we now don't
have to flush the journal when switching/pushing/popping between
different framebuffers so for example a Clutter scene that involves some
ClutterEffect actors that transiently redirect to an FBO can still be
batched.
This also allows us to track state in the journal that relates to the
current frame of its associated framebuffer which we'll need for our
optimization for using the CPU to handle reading a single pixel back
from a framebuffer when we know the whole scene is currently comprised
of simple rectangles in a journal.
In the journal code and when generating the stroke path the vertices
are generated on the fly and stored in a CoglBuffer using
cogl_buffer_map. However cogl_buffer_map is allowed to fail but it
wasn't checking for a NULL return value. In particular on GLES it will
always fail because glMapBuffer is only provided by an extension. This
adds a new pair of internal functions called
_cogl_buffer_{un,}map_for_fill_or_fallback which wrap
cogl_buffer_map. If the map fails then it will instead return a
pointer into a GByteArray attached to the context. When the buffer is
unmapped the array is copied into the buffer using
cogl_buffer_set_data.
When an item is added to the journal the current pipeline immediately
gets the legacy state applied to it and the modified pipeline is
logged instead of the original. However the actual drawing from the
journal is done using the vertex attribute API which was also applying
the legacy state. This meant that the legacy state used would be a
combination of the state set when the journal entry was added as well
as the state set when the journal is flushed. To fix this there is now
an extra CoglDrawFlag to avoid applying the legacy state when setting
up the GL state for the vertex attributes. The journal uses this flag
when flushing.
The vertex attribute API assumes that if there is a color array
enabled then we can't determine if the colors are opaque so we have to
enable blending. The journal always uses a color array to avoid
switching color state between rectangles. Since the journal switched
to using vertex attributes this means we effectively always enable
blending from the journal. To fix this there is now a new flag for
_cogl_draw_vertex_attributes to specify that the color array is known
to only contain opaque colors which causes the draw function not to
copy the pipeline. If the pipeline has blending disabled then the
journal passes this flag.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2481
There is an internal version of cogl_draw_vertex_attributes_array
which previously just bypassed the framebuffer flushing, journal
flushing and pipeline validation so that it could be used to draw the
journal. This patch generalises the function so that it takes a set of
flags to specify which parts to flush. The public version of the
function now just calls the internal version with the flags set to
0. The '_real' version of the function has now been merged into the
internal version of the function because it was only called in one
place. This simplifies the code somewhat. The common code which
flushed the various state has been moved to a separate function. The
indexed versions of the functions have had a similar treatment.
http://bugzilla.clutter-project.org/show_bug.cgi?id=2481
_cogl_pipeline_equal now accepts a mask of pipeline differences and layer
differences to constrain what state will be compared. In addition a set
of flags are passed that can tweak the comparison semantics for some
state groups. For example when comparing layer textures we sometimes
only need to compare the texture target and can ignore the data itself.
In updating the code this patch also changes it so all required pipeline
authorities are resolved in one step up-front instead of resolving the
authority for each state group in turn and repeatedly having to traverse
the pipeline's ancestry. This adds two new functions
_cogl_pipeline_resolve_authorities and
_cogl_pipeline_layer_resolve_authorities to handle resolving a set of
authorities.
This adds a debug option called disable-software-clipping which causes
the journal to always log the clip stack state rather than trying to
manually clip rectangles.
Before flushing the journal there is now a separate iteration that
will try to determine if the matrix of the clip stack and the matrix
of the rectangle in each entry are on the same plane. If they are it
can completely avoid the clip stack and instead manually modify the
vertex and texture coordinates to implement the clip. The has the
advantage that it won't break up batching if a single clipped
rectangle is used in a scene.
The software clip is only used if there is no user program and no
texture matrices. There is a threshold to the size of the batch where
it is assumed that it is worth the cost to break up a batch and
program the GPU to do the clipping. Currently this is set to 8
although this figure is plucked out of thin air.
To check whether the two matrices are on the same plane it tries to
determine if one of the matrices is just a simple translation of the
other. In the process of this it also works out what the translation
would be. These values can be used to translate the clip rectangle
into the coordinate space of the rectangle to be logged. Then we can
do the clip directly in the rectangle's coordinate space.
When logging a quad we now only store the 2 vertices representing the
top left and bottom right of the quad. The color is only stored once
per entry. Once we come to upload the data we expand the 2 vertices
into four and copy the color to each vertex. We do this by mapping the
buffer and directly expanding into it. We have to copy the data before
we can render it anyway so it doesn't make much sense to expand the
vertices before uploading and this way should save some space in the
size of the journal. It also makes it slightly easier if we later want
to do pre-processing on the journal entries before uploading such as
doing software clipping.
The modelview matrix is now always copied to the journal entry whereas
before it would only be copied if we aren't doing software
transform. The journal entry struct always has the space for the
modelview matrix so hopefully it's only a small cost to copy the
matrix.
The transform for the four entries is now done using
cogl_matrix_transform_points which may be slightly faster than
transforming them each individually with a call to
cogl_matrix_transfom.
When logging quads in the journal it used to be possible to specify a
mask of fallback layers (layers where a default white texture should be
used in-place of the corresponding texture in the current source
pipeline). Since we now handle fallbacks for cogl_rectangle* primitives
when validating the pipeline up-front before logging in the journal we
no longer need the ability for the journal to apply fallbacks too.
This removes the possibility to specify wrap mode overrides within a
CoglPipelineFlushOptions struct since the right way to handle these
overrides is by copying the user's material and making the changes to
that copy before flushing. All primitives code has already switched away
from using these wrap mode overrides so this patch just removes unused
code and types. It also remove the wrap_mode_overrides argument for
_cogl_journal_log_quad.
This adds an optional data argument for cogl_vertex_array_new() since it
seems that mostly every case where we use this API we follow up with a
cogl_buffer_set_data() matching the size of the new array. This
simplifies all those cases and whenever we want to delay uploading of
data then NULL can simply be passed.
Since d5634e37 the sliced texture backend now works in terms of
CoglTexture2Ds so there's no need to have special casing for
overriding the texture of a pipeline layer with a GL handle. Instead
we can just use cogl_pipeline_set_layer_texture with the
CoglHandle. The special _cogl_pipeline_set_layer_gl_texture_slice
function has now been removed and parts of the code for comparing
materials have been simplified.
When adding a new entry to the journal a reference is now taken on the
current clip stack. Modifying the current clip state no longer causes
a journal flush. The journal flushing code now has an extra stage to
compare the clip state of each entry. The comparison can simply be
done by comparing the pointers. Although different clip states will
still end up with multiple draw calls this at leasts allows a scene
comprising of multiple different clips to be upload with one vbo. It
also lays the groundwork to do certain tricks when drawing clipped
rectangles such as modifying the geometry instead of setting a clip
state.
Unless the CoglBuffer is being used for texture data then it's
relatively unlikely that the data will contain an array of bytes. For
example if it's used as a vertex array then it's more likely to be
floats or some vertex struct. In that case it's much more convenient
if set_data and map use void* pointers so that we can avoid a cast.
This applies an API naming change that's been deliberated over for a
while now which is to rename CoglMaterial to CoglPipeline.
For now the new pipeline API is marked as experimental and public
headers continue to talk about materials not pipelines. The CoglMaterial
API is now maintained in terms of the cogl_pipeline API internally.
Currently this API is targeting Cogl 2.0 so we will have time to
integrate it properly with other upcoming Cogl 2.0 work.
The basic reasons for the rename are:
- That the term "material" implies to many people that they are
constrained to fragment processing; perhaps as some kind of high-level
texture abstraction.
- In Clutter they get exposed by ClutterTexture actors which may be
re-inforcing this misconception.
- When comparing how other frameworks use the term material, a material
sometimes describes a multi-pass fragment processing technique which
isn't the case in Cogl.
- In code, "CoglPipeline" will hopefully be a much more self documenting
summary of what these objects represent; a full GPU pipeline
configuration including, for example, vertex processing, fragment
processing and blending.
- When considering the API documentation story, at some point we need a
document introducing developers to how the "GPU pipeline" works so it
should become intuitive that CoglPipeline maps back to that
description of the GPU pipeline.
- This is consistent in terminology and concept to OpenGL 4's new
pipeline object which is a container for program objects.
Note: The cogl-material.[ch] files have been renamed to
cogl-material-compat.[ch] because otherwise git doesn't seem to treat
the change as a moving the old cogl-material.c->cogl-pipeline.c and so
we loose all our git-blame history.
Instead of using raw OpenGL in the journal we now use the vertex
attributes API instead. This is part of an ongoing effort to reduce the
number of drawing paths we maintain in Cogl.
This moves the code supporting _cogl_material_flush_gl_state into
cogl-material-opengl.c as part of an effort to reduce the size of
cogl-material.c to keep it manageable.
As a follow on to using cogl_material_copy instead of flush options this
patch now removes the ability to pass flush options to
_cogl_material_equal which is the final reference to the
CoglMaterialFlushOptions mechanism.
Since cogl_material_copy should now be cheap to use we can simplify
how we handle fallbacks and wrap mode overrides etc by simply copying
the original material and making our override changes on the new
material. This avoids the need for a sideband state structure that has
been growing in size and makes flushing material state more complex.
Note the plan is to eventually use weak materials for these override
materials and attach these as private data to the original materials so
we aren't making so many one-shot materials.
This is a complete overhaul of the data structures used to manage
CoglMaterial state.
We have these requirements that were aiming to meet:
(Note: the references to "renderlists" correspond to the effort to
support scenegraph level shuffling of Clutter actor primitives so we can
minimize GPU state changes)
Sparse State:
We wanted a design that allows sparse descriptions of state so it scales
well as we make CoglMaterial responsible for more and more state. It
needs to scale well in terms of memory usage and the cost of operations
we need to apply to materials such as comparing, copying and flushing
their state. I.e. we would rather have these things scale by the number
of real changes a material represents not by how much overall state
CoglMaterial becomes responsible for.
Cheap Copies:
As we add support for renderlists in Clutter we will need to be able to
get an immutable handle for a given material's current state so that we
can retain a record of a primitive with its associated material without
worrying that changes to the original material will invalidate that
record.
No more flush override options:
We want to get rid of the flush overrides mechanism we currently use to
deal with texture fallbacks, wrap mode changes and to handle the use of
highlevel CoglTextures that need to be resolved into lowlevel textures
before flushing the material state.
The flush options structure has been expanding in size and the structure
is logged with every journal entry so it is not an approach that scales
well at all. It also makes flushing material state that much more
complex.
Weak Materials:
Again for renderlists we need a way to create materials derived from
other materials but without the strict requirement that modifications to
the original material wont affect the derived ("weak") material. The
only requirement is that its possible to later check if the original
material has been changed.
A summary of the new design:
A CoglMaterial now basically represents a diff against its parent.
Each material has a single parent and a mask of state that it changes.
Each group of state (such as the blending state) has an "authority"
which is found by walking up from a given material through its ancestors
checking the difference mask until a match for that group is found.
There is only one root node to the graph of all materials, which is the
default material first created when Cogl is being initialized.
All the groups of state are divided into two types, such that
infrequently changed state belongs in a separate "BigState" structure
that is only allocated and attached to a material when necessary.
CoglMaterialLayers are another sparse structure. Like CoglMaterials they
represent a diff against their parent and all the layers are part of
another graph with the "default_layer_0" layer being the root node that
Cogl creates during initialization.
Copying a material is now basically just a case of slice allocating a
CoglMaterial, setting the parent to be the source being copied and
zeroing the mask of changes.
Flush overrides should now be handled by simply relying on the cheapness
of copying a material and making changes to it. (This will be done in a
follow on commit)
Weak material support will be added in a follow on commit.
As part of an effort to improve the architecture of CoglMaterial
internally this overhauls how we flush layer state to OpenGL by adding a
formal backend abstraction for fragment processing and further
formalizing the CoglTextureUnit abstraction.
There are three backends: "glsl", "arbfp" and "fixed". The fixed backend
uses the OpenGL fixed function APIs to setup the fragment processing,
the arbfp backend uses code generation to handle fragment processing
using an ARBfp program, and the GLSL backend is currently only there as
a formality to handle user programs associated with a material. (i.e.
the glsl backend doesn't yet support code generation)
The GLSL backend has highest precedence, then arbfp and finally the
fixed. If a backend can't support some particular CoglMaterial feature
then it will fallback to the next backend.
This adds three new COGL_DEBUG options:
* "disable-texturing" as expected should disable all texturing
* "disable-arbfp" always make the arbfp backend fallback
* "disable-glsl" always make the glsl backend fallback
* "show-source" show code generated by the arbfp/glsl backends
While this is totally fine (0 in the pointer context will be converted
in the right internal NULL representation, which could be a value with
some bits to 1), I believe it's clearer to use NULL in the pointer
context.
It seems that, in most case, it's more an overlook than a deliberate
choice to use FALSE/0 as NULL, eg. copying a _COGL_GET_CONTEXT (ctx, 0)
or a g_return_val_if_fail (cond, 0) from a function returning a
gboolean.
Instead of directly using a guint32 to store a bitmask for each used
texcoord array, it now stores them in a CoglBitmask. This removes the
limitation of 32 layers (although there are still other places in Cogl
that imply this restriction). To disable texcoord arrays code should
call _cogl_disable_other_texcoord_arrays which takes a bitmask of
texcoord arrays that should not be disabled. There are two extra
bitmasks stored in the CoglContext which are used temporarily for this
function to avoid allocating a new bitmask each time.
http://bugzilla.openedhand.com/show_bug.cgi?id=2132
It should be quite acceptable to use a texture without defining any
texture coords. For example a shader may be in use that is doing
texture lookups without referencing the texture coordinates. Also it
should be possible to replace the vertex colors using a texture layer
without a texture but with a constant layer color.
enable_state_for_drawing_buffer no longer sets any disabled layers in
the overrides. Instead of counting the number of units with texture
coordinates it now keeps them in a mask. This means there can now be
gaps in the list of enabled texture coordinate arrays. To cope with
this, the Cogl context now also stores a mask to track the enabled
arrays. Instead of code manually iterating each enabled array to
disable them, there is now an internal function called
_cogl_disable_texcoord_arrays which disables a given mask.
I think this could also fix potential bugs when a vertex buffer has
gaps in the texture coordinate attributes that it provides. For
example if the vertex buffer only had texture coordinates for layer 2
then the disabling code would not disable the coordinates for layers 0
and 1 even though they are not used. This could cause a crash if the
previous data for those arrays is no longer valid.
http://bugzilla.openedhand.com/show_bug.cgi?id=2132
Previously, Cogl's texture coordinate system was effectively always
GL_REPEAT so that if an application specifies coordinates outside the
range 0→1 it would get repeated copies of the texture. It would
however change the mode to GL_CLAMP_TO_EDGE if all of the coordinates
are in the range 0→1 so that in the common case that the whole texture
is being drawn with linear filtering it will not blend in edge pixels
from the opposite sides.
This patch adds the option for applications to change the wrap mode
per layer. There are now three wrap modes: 'repeat', 'clamp-to-edge'
and 'automatic'. The automatic map mode is the default and it
implements the previous behaviour. The wrap mode can be changed for
the s and t coordinates independently. I've tried to make the
internals support setting the r coordinate but as we don't support 3D
textures yet I haven't exposed any public API for it.
The texture backends still have a set_wrap_mode virtual but this value
is intended to be transitory and it will be changed whenever the
material is flushed (although the backends are expected to cache it so
that it won't use too many GL calls). In my understanding this value
was always meant to be transitory and all primitives were meant to set
the value before drawing. However there were comments suggesting that
this is not the expected behaviour. In particular the vertex buffer
drawing code never set a wrap mode so it would end up with whatever
the texture was previously used for. These issues are now fixed
because the material will always set the wrap modes.
There is code to manually implement clamp-to-edge for textures that
can't be hardware repeated. However this doesn't fully work because it
relies on being able to draw the stretched parts using quads with the
same values for tx1 and tx2. The texture iteration code doesn't
support this so it breaks. This is a separate bug and it isn't
trivially solved.
When flushing a material there are now extra options to set wrap mode
overrides. The overrides are an array of values for each layer that
specifies an override for the s, t or r coordinates. The primitives
use this to implement the automatic wrap mode. cogl_polygon also uses
it to set GL_CLAMP_TO_BORDER mode for its trick to render sliced
textures. Although this code has been added it looks like the sliced
trick has been broken for a while and I haven't attempted to fix it
here.
I've added a constant to represent the maximum number of layers that a
material supports so that I can size the overrides array. I've set it
to 32 because as far as I can tell we have that limit imposed anyway
because the other flush options use a guint32 to store a flag about
each layer. The overrides array ends up adding 32 bytes to each flush
options struct which may be a concern.
http://bugzilla.openedhand.com/show_bug.cgi?id=2063
Every now and then someone sees the cogl_enable API and gets confused,
thinking its public API so this renames the symbol to be clear that it's
is an internal only API.
Since using addresses that might change is something that finally
the FSF acknowledge as a plausible scenario (after changing address
twice), the license blurb in the source files should use the URI
for getting the license in case the library did not come with it.
Not that URIs cannot possibly change, but at least it's easier to
set up a redirection at the same place.
As a side note: this commit closes the oldes bug in Clutter's bug
report tool.
http://bugzilla.openedhand.com/show_bug.cgi?id=521
Most Cogl debugging code conditions are marked as G_UNLIKELY with the
intention of having the CPU branch prediction always assume the
path is disabled so having debugging support in release binaries has
negligible overhead.
This patch simply fixes a few cases where we weren't using G_UNLIKELY.
All the cogl_rectangle* APIs normalize their input into into an array of
_CoglMutiTexturedRect rectangles and pass these on to our work horse;
_cogl_rectangles_with_multitexture_coords. The definition of
_CoglMutiTexturedRect had 4 separate float members, x_1, y_1, x_2 and
y_2 which meant for some common cases we were having to copy out from an
array into these members. We are now able to simply point into the users
array avoiding a copy which seems desirable when submiting lots of
rectangles.