This adds a fast path for premultiplying an RGBA image using SSE2
instructions. SSE registers are 128-bit and we need at least 16-bits
per component for the intermediate result of the multiplication so we
can do two pixels in parallel with one register. The function
interleaves 2 SSE registers to multiply 4 pixels in one function call
with the hope that this will pipeline better.
http://bugzilla.openedhand.com/show_bug.cgi?id=1939
Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
OpenGL ES has no PBO extension, so we fallback to using a malloc'ed
buffer. Make sure the OpenGL-only defines don't leak into the OpenGL ES
compilation.
First, let's add a new public feature called, surprisingly,
COGL_FEATURE_PBOS to check the availability of PBOs and provide a
fallback path when running on older GL implementations or on OpenGL ES
In case the underlying OpenGL implementation does not provide PBOs, we
need a fallback path (a malloc'ed buffer). The CoglPixelBufer
constructors will instanciate a subclass of CoglBuffer that handles
map/unmap and set_data() with a malloc'ed buffer.
The public feature is useful to check before using set_data() on a
buffer as it will mean doing a memcpy() when not supporting PBOs (in
that case, it's better to create the texture directly instead of using a
CoglBuffer).
The only goal of using COGL buffers is to use them to create
textures. cogl_texture_new_from_buffer() is the new symbol to create
textures out of buffers.
This subclass of CoglBuffer aims at wrapping PBOs or other system
surfaces like DRM buffer objects. Two constructors are available:
cogl_pixel_buffer_new() with a size when you only care about the size of
the buffer (such a buffer can be used to store several texture data such
as the three planes of a I420 frame).
cogl_pixel_buffer_new_full() is more a 1:1 mapping between the data and
an underlying surface, with the possibility of having access to a low
level memory buffer that may have a stride.
Buffer objects are cool! This abstracts the buffer API first introduced
by GL_ARB_vertex_buffer_object and then extended to other objects.
The coglBuffer abstract class is intended to be the base class of all
the buffer objects, letting the user map() buffers. If the underlying
implementation does not support buffer objects (or only support VBO but
not FBO for instance), fallback paths should be provided.
The only way the user has to set the mipmap filters is through the
material/layer API. This API defaults to GL_LINEAR/GL_LINEAR for the max
and min filters. With the main use case of cogl being 2D interfaces, it
makes sense do default to GL_LINEAR for the min filter.
When creating new textures, we did not set any filter on them, using
OpenGL defaults': GL_NEAREST_MIPMAP_LINEAR for the min filter and
GL_LINEAR for the max filter. This will make the driver allocate memory
for the mipmap tree, memory that will not be used in the nominal case
(as the material API defaults to GL_LINEAR).
This patch tries to ensure that the min filter is set to GL_LINEAR
before any glTexImage*() call is done on the texture by setting the
filter when generating new OpenGL handles.
Some GL functions have a return value that the GE() macro is not able to
handle. Let's define a new Ge_RET() macro which will be able to handle
functions such as glMapBuffer().
While at it, removed the unused variadic dots to the GE() macro.
When we trashed the contents of the stencil buffer during
_cogl_path_fill_nodes we marked the clip stack state as dirty and expected
the clip stack code would clean up our glStencilFunc state.
The problem is that we only try and update the clip state during
_cogl_journal_init (when we flush the framebuffer state) which is only
called when the journal first gets something logged in it.
To make sure the stencil state is cleaned up we now also flush the journal
so _cogl_journal_init will be called for the next logged rectangle.
This adds three new texture backends.
- CoglTexture2D: This is a trimmed down version of CoglTexture2DSliced
which only supports a single texture and only works with the
GL_TEXTURE_2D target. The code is a lot simpler so it has a less
overheads than dealing with slices. Cogl will use this wherever
possible.
- CoglSubTexture: This is used to get a CoglHandle to represent a
subregion of another texture. The texture can be used as if it was a
standalone texture but it does not need to copy the resources.
- CoglAtlasTexture: This collects RGB and RGBA textures into a single
GL texture with the aim of reducing texture state changes and
increasing batching. The backend will try to manage the atlas and
may move the textures around to close gaps in the texture. By
default all textures will be placed in the atlas.
There was a typo in getting the height of the full texture to check
whether the sub region fits so that it was using the width
instead. This was causing crashes when debugging is enabled for some
apps.
In cogl_texture_new_from_file we create and own a temporary
bitmap. There's no need to copy this data if we need to do a premult
conversion so instead it just does conversion before passing it on to
cogl_texture_new_from_bitmap.
The Cogl atlas code was using _cogl_texture_prepare_for_upload with a
NULL pointer for the dst_bmp to determine the internal format of the
texture without converting the bitmap. It needs to do this to decide
whether the texture will go in the atlas before wasting time on the
conversion. This use of the function is a little confusing so that
part of it has been split out into a new function called
_cogl_texture_determine_internal_format. The code to decide whether a
premult conversion is needed has also been split out.
Commit 92a375ab4 changed the initial value of max_texcoord_attrib_unit
to -1 so that it could disable the texture coord array for the first
texture unit when there are no texture coords used in the vbo. However
max_texcoord_attrib_unit was an unsigned value so this actually became
G_MAXUINT. The disabling loop at the bottom still worked because
G_MAXUINT+1==0 but the check for whether any texture unit is greater
than max_texcoord_attrib_unit was failing so it would always end up
disabling all texture units. This is now fixed by changing
max_texcoord_attrib_unit to be signed.
When deciding if a material layer is equal it now compares the GL
target and texture number if the textures are not sliced. This is
needed to get batching across atlased textures.
Cogl accepts a pixel format for both the data in memory and the
internal format to be used for the texture. If they do not match then
it would convert them using the CoglBitmap functions before uploading
the data. However, GL also lets you specify both formats so it makes
more sense to let GL do the conversion. The driver may need the
texture in a specific format so it may end up being converted anyway.
The cogl_texture_upload_data functions have been removed and replaced
with a single function to prepare the bitmap. This will only do the
premultiplication conversion because that is the only part that GL
can't do directly.
The premult part of _cogl_convert_premult has now been split out as
_cogl_convert_premult_status. _cogl_convert_premult has been renamed
to _cogl_convert_format to make it less confusing. The premult
conversion is now done in-place instead of copying the
buffer. Previously it was copying the buffer once for the format
conversion and then copying it again for the premult conversion. The
premult conversion never changes the size of the buffer so it's quite
easy to do in place. We can also use the separated out function
independently.
The internal format of the atlas texture is still set to the
appropriate format so Cogl will disable blending for textures that are
intended to be RGB. This should end up ignoring the alpha channel from
the texture in the atlas. This makes the code slightly easier to
maintain and should also improve the chances of batching.
Instead of assigning a new colour to each quad of a batch, the
rectangle debugging code now assigns a new colour to each batch so
that it can be used to visually see what is being batched. The colour
is stored in a global variable that is reset during cogl_clear. This
improves the chances that the same colour will be used for a batch in
the next frames to avoid flickering.
When setting up the state for the vertex buffer,
enable_state_for_drawing_buffer tries to keep track of the highest
numbered texture unit in use. It then disables any texture arrays for
units that were previously enabled if they are greater than that
number. However if there is no texturing in the VBO then the max used
unit would be left at 0 which it would later think meant unit 0 is
still in use so it wouldn't disable it. To fix this it now initialises
the max used unit to -1 which it should interpret as ‘no units are in
use’ so it will later disable the arrays for all units.
Thanks to Jon Mayo for reporting the bug.
http://bugzilla.openedhand.com/show_bug.cgi?id=1957
We were checking the number of texture units against the GL enum that is
used in glGetInteger() to query that number. Let's abstract this in a
little function.
Took the opportunity to dig a bit on the usage of GL limits for the
number of texture (image) units and document our use of them. We'll need
something finer grained if we want to fully exploit texture image units
with a programmable pipeline.
The index field of CoglTextureUnit was never set, leading to the
creation of units with index set to 0. When trying to retrieve a texture
unit by its index (!= 0) with _cogl_get_texture_unit(), a new one was
created as it could not find it back in the list of textures units:
ctx->texture_units.
http://bugzilla.openedhand.com/show_bug.cgi?id=1958
Previously the atlas textures were being created with whatever format
the first sub texture is in. Only three formats are supported so this
only matters if the first texture is a premultiplied alpha
texture. Instead it now masks out the premultiplied bit so that the
textures are always either RGB_888 or RGBA_8888.
When uploading texture data it was just calling cogl_texture_set_data
on the large texture. This would attempt to convert the data to the
format of the large texture. All of the textures with alpha channels
are stored together regardless of whether they are premultiplied so
this was causing premultiplied textures to be unpremultiplied
again. It now just uploads the data ignoring the premult bit of the
format so that it only gets converted once.
With the atlas texture backend ensuring the mipmaps can make it become
a completely different texture which will have different texture
coordinates or may even be sliced. Therefore we need to ensure the
mipmaps before deciding which quads to log in the journal. This adds a
new private function to cogl-material which ensures the mipmaps if
needed.
The sub texture backend doesn't work well as a completely general
texture backend because for example when rendering with cogl_polygon
it needs to be able to tranform arbitrary texture coordinates without
reference to the other coordintes. This can't be done when the texture
coordinates are a multiple of one because sometimes the coordinate
should represent the left or top edge and sometimes it should
represent the bottom or top edge. For example if the s coordinates are
0 and 1 then 1 represents the right edge but if they are 1 and 2 then
1 represents the left edge.
Instead the sub-textures are now documented not to support coordinates
outside the range [0,1]. The coordinates for the sub-region are now
represented as integers as this helps avoid rounding issues. The
region can no longer be a super-region of the texture as this
simplifies the code quite a lot.
There are two new texture virtual functions:
transform_quad_coords_to_gl - This transforms two pairs of coordinates
representing a quad. It will return FALSE if the coordinates can
not be transformed. The sub texture backend uses this to detect
coordinates that require repeating which causes cogl-primitives
to use manual repeating.
ensure_non_quad_rendering - This is used in cogl_polygon and
cogl_vertex_buffer to inform the texture backend that
transform_quad_to_gl is going to be used. The atlas backend
migrates the texture out of the atlas when it hits this.
When calculating the next integer position for negative coordinates it
would not increment if the position is already a multiple of one so we
need to manually add one.
When try_creating_fbo fails it returns 0 to report the error and if it
succeeds it returns ‘flags’. However cogl_offscreen_new_to_texture
also passes in 0 for the flags as the last fallback to create the fbo
with nothing but the color buffer. In that case it will return 0
regardless of whether it succeeded so the last fallback will always be
considered a failure.
To fix this it now just returns a gboolean to indicate whether it
succeeded and the flags used for each attempt is assigned when passing
the argument rather than from the return value of the function.
Also if the only configuration that succeeded was with flags==0 then
it would always try all combinations because last_working_flags would
also be zero. To avoid this it now uses a separate gboolean to mark
whether we found a successful set of flags.
http://bugzilla.openedhand.com/show_bug.cgi?id=1873
Since 755cce33a7 the framebuffer code is using the GL enums
GL_DEPTH_ATTACHMENT and GL_DEPTH_COMPONENT16. These aren't available
directly under GLES except with the OES suffix so we need to define
them manually as we do with the other framebuffer constants.
These macros used to define Cogl wrappers for the GLenum values. There are
now Cogl enums everywhere in the API where these were required so we
shouldn't need them anymore. They were in the public headers but as
they are not neccessary and were not in the API docs for Clutter 1.0
it should be safe to remove them.
If a user supplied multiple groups of texture coordinates with
cogl_rectangle_with_multitexture_coords() then we would repeatedly log only
the first group in the journal. This fixes that bug and adds a conformance
test to verify the fix.
Thanks to Gord Allott for reporting this bug.
The Intel drivers in Mesa 7.6 (and possibly earlier versions) don't
support creating FBOs with a stencil buffer but without a depth
buffer. This reworks framebuffer allocation so that we try a number
of fallback options before failing.
The options we try in order are:
- the same options that were sucessful last time if available
- combined depth and stencil
- separate depth and stencil
- just stencil, no depth
- just depth, no stencil
- neither depth or stencil
We weren't taking a reference on the texture to be used as the color buffer
for offscreen rendering, so it was possible to free the texture leaving the
framebuffer in an inconsistent state.
This adds gives Cogl a dedicated UProf context which will be linked together
with Clutter's context during clutter_init_real().
Initial timers cover _cogl_journal_flush and _cogl_journal_log_quad
You can explicitly ask for a report of Cogl statistics by exporting
COGL_PROFILE_OUTPUT_REPORT=1 but since the context is linked with Clutter's
the statisitcs will also be shown in the automatic Clutter reports.
* stage-use-alpha:
tests: Use accessor methods for :use-alpha
stage: Add accessors for :use-alpha
tests: Allow setting the stage opacity in test-paint-wrapper
stage: Premultiply the stage color
stage: Composite the opacity with the alpha channel
glx: Always request an ARGB visual
stage: Add :use-alpha property
materials: Get the right blend function for alpha
When the texture is in the atlas, ensuring the mipmaps can effectively
make it become a completely different texture so we should do this
before getting the GL handle.
Mipmaps don't work very well in the current atlas because there is not
enough padding between the textures. If ensure_mipmaps is called it
will now create a new texture and migrate the atlased texture to
it. It will use the same blit mechanism as when migrating so it will
try to use an FBO for a fast blit. However if this is not possible it
will end up downloading the data for the entire atlas which is not
ideal.
When reorganizing the textures, we can avoid downloading the entire
texture data if we bind the source texture in a framebuffer object and
copy the destination using glCopyTexSubImage2D. This is also
implemented using a much faster path in Mesa.
Currently it is calling the GL framebuffer API directly but ideally it
would use the Cogl offscreen API. However there is no way to tell Cogl
not to create a stencil renderbuffer which seems like a waste in this
situation.
If FBOs are not available it will fallback to reading back the entire
texture data as before.
This adds a 'dump-atlas-image' debug category. When enabled, CoglAtlas
will use Cairo to create a png which visualizes the leaf rectangles of
the atlas.
This adds an 'atlas' category to the COGL_DEBUG environment
variable. When enabled Cogl will display messages when textures are
added to the atlas and when the atlas is reorganized.
When space can't be found in the atlas for a new texture it will now
try to reorganize the atlas to make space. A new CoglAtlas is created
and all of the textures are readded in decreasing size order. If the
textures still don't fit then the size of the atlas is doubled until
either we find a space or we reach the texture size limits. If we
successfully find an organization that fits then all of the textures
will be migrated to a new texture. This involves copying the texture
data into CPU memory and then uploading it again. Potentially it could
eventually use a PBO or an FBO to transfer the image without going
through the CPU.
The algorithm for laying out the textures works a lot better if the
rectangles are added in order so we might eventually want some API for
creating multiple textures in one go to avoid reorganizing the atlas
as far as possible.
This adds a CoglAtlas type which is a data structure that keeps track
of unused sub rectangles of a larger rectangle. There is a new atlased
texture backend which uses this to put multiple textures into a single
larger texture.
Currently the atlas is always sized 256x256 and the textures are never
moved once they are put in. Eventually it needs to be able to
reorganise the atlas and grow it if necessary. It also needs to
migrate the textures out of the atlas if mipmaps are required.
This is an optimised version of CoglTexture2DSliced that always deals
with a single texture and always uses the GL_TEXTURE_2D
target. cogl_texture_new_from_bitmap now tries to use this backend
first. If it can't create a texture with that size then it falls back
the sliced backend.
cogl_texture_upload_data_prepare has been split into two functions
because the sliced backend needs to know the real internal format
before the conversion is performed. Otherwise the converted bitmap
will be wasted if the backend can't support the size.
This provides a way to upload the entire data for a texture without
having to first call glTexImage and then glTexSubImage. This should be
faster especially with indirect rendering where it would needlessy
send the data for the texture twice.
new_from_data and new_from_file can be implemented in terms of
new_from_bitmap so it makes sense to move these to cogl-texture rather
than having to implement them in every texture backend.
This adds a new texture backend which represents a sub texture of a
larger texture. The texture is created with a reference to the full
texture and a set of coordinates describing the region. The backend
simply defers to the full texture for all operations and maps the
coordinates to the other range. You can also use coordinates outside
the range [0,1] to create a repeated version of the full texture.
A new public API function called cogl_texture_new_from_sub_texture is
available to create the sub texture.
The CoglTextureSliceCallback function pointer now takes const pointers
for the texture coordinates. This makes it clearer that the callback
should not modify the array and therefore the backend can use the same
array for both sets of coords.
Given a region of texture coordinates this utility invokes a callback
enough times to cover the region with a subregion that spans the
texture at most once. Eg, if called with tx1 and tx2 as 0.5 and 3.0 it
it would invoke the callback with:
0.5,1.0 1.0,2.0 2.0,3.0
Manual repeating is needed by all texture backends regardless of
whether they can support hardware repeating because when Cogl calls
the foreach_sub_texture_in_region method then it sets the wrap mode to
GL_CLAMP_TO_EDGE and no hardware repeating is possible.
In _cogl_multitexture_quad_single_primitive we use a wrap mode of
GL_CLAMP_TO_EDGE if the texture coordinates are all in the range [0,1]
or GL_REPEAT otherwise. This is to avoid pulling in pixels from either
side when using GL_LINEAR filter mode and rendering the entire
texture. Previously it was checking using the unconverted texture
coordinates. This is ok unless the texture backend is radically
transforming the texture coordinates, such as in the sub texture
backend where the coordinates may map to something completely
different. We now check whether the coordinates are in range after
converting them.
Most of the fields that were previously in CoglTexture are specific to
the implementation of CoglTexture2DSliced so they should be placed
there instead. For example, the 'mipmaps_dirty' flag is an
implementation detail of the ensure_mipmaps function so it doesn't
make sense to force all texture backends to have this function.
Other fields such as width, height, gl_format and format may make
sense for all textures but I've added them as virtual functions
instead. This may make more sense for a sub-texture backend for
example where it can calculate these based on the full texture.
The CoglTexture struct previously contained some fields which are only
used to upload data such as the CoglBitmap and the source GL
format. These are now moved to a separate CoglTextureUploadData struct
which only exists for the duration of one of the cogl_texture_*_new
functions. In cogl-texture there are utility functions which operate
on this new struct rather than on CoglTexture directly.
Some of the fields that were previously stored in the CoglBitmap
struct are now copied to the CoglTexture such as the width, height,
format and internal GL format.
The rowstride was previously stored in CoglTexture and this was
publicly accessible with the cogl_texture_get_rowstride
function. However this doesn't seem to be a useful function because
there is no need to use the same rowstride again when uploading or
downloading new data. Instead cogl_texture_get_rowstride now just
calculates a suitable rowstride from the format and width of the
texture.
Commit 558b17ee1e added support for rectangle textures to the
framebuffer code. Under GLES there is no GL_TEXTURE_RECTANGLE_ARB
definition so this was breaking the build. The rest of Cogl uses
ifdef's around that constant so we should do the same here.
The correct blend function for the alpha channel is:
GL_ONE, GL_ONE_MINUS_SRC_ALPHA
As per bug 1406. This fix was dropped when the switch to premultiplied
alpha was merged.
We currently enable blending if the material colour has
transparency. This patch makes it also enable blending if any of the
lighting colours have transparency. Arguably this isn't neccessary
because we don't expose any API to enable lighting so there is no
bug. However it is currently possible to enable lighting with a direct
call to glEnable and this otherwise works so it is a shame not to have
it.
http://bugzilla.openedhand.com/show_bug.cgi?id=1907
cogl_push_draw_buffer, cogl_set_draw_buffer and cogl_pop_draw_buffer are now
deprecated and new code should use the new cogl_framebuffer_* API instead.
Code that previously did:
cogl_push_draw_buffer ();
cogl_set_draw_buffer (COGL_OFFSCREEN_BUFFER, buffer);
/* draw */
cogl_pop_draw_buffer ();
should now be re-written as:
cogl_push_framebuffer (buffer);
/* draw */
cogl_pop_framebuffer ();
As can be seen from the example above the rename has been used as an
opportunity to remove the redundant target argument from
cogl_set_draw_buffer; it now only takes one call to redirect to an offscreen
buffer, and finally the term framebuffer may be a bit more familiar to
anyone coming from an OpenGL background.
Instead of storing an enum with the backend type for each texture and
then using a switch statement to decide which function to call, we
should store pointers to all of the functions in a struct and have
each texture point to that struct. This is potentially slightly faster
when there are more backends and it makes implementing new backends
easier because it's more obvious which functions have to be
implemented.
cogl_offscreen_new_to_texture previously bailed out if the given texture's
GL target was anything but GL_TEXTURE_2D, but it now also allows
foreign GL_TEXTURE_RECTANGLE_ARB textures.
Thanks to Owen for reporting this issue, ref:
https://bugzilla.gnome.org/show_bug.cgi?id=601032
cogl_material_copy can be used to create a new CoglHandle referencing a copy
of some given material.
From now on we will advise that developers always aim to use this function
instead of cogl_material_new() when creating a material that is in any way
derived from another.
By using cogl_material_copy, Cogl can maintain an ancestry for each material
and keep track of "similar" materials. The plan is that Cogl will use this
information to minimize the cost of GPU state transitions.
This function was #if 0'd before we released Clutter 1.0 so there's no
implementation of it. At some point we thought it might assist with
developers breaking out into raw OpenGL. Breaking out to raw GL is a
difficult problem though so we decided instead we will wait for a specific
use case to arrise before trying to support it.
_cogl_material_get_layer expects a CoglMaterial* pointer but it was
being called with a CoglHandle. This doesn't matter because the
CoglHandle is actually just the CoglMaterial* pointer anyway but it
breaks the ability to change the _cogl_material_pointer_from_handle
macro.
• Use the same style for the Cogl API reference as the one used for
the Clutter API reference.
• Fix the introspection annotations for cogl_bitmap_get_size_from_file()
The imported Mesa matrix code has some documentation annotations
that make gtk-doc very angry. Since it's all private anyway we
can safely make gtk-doc ignore the offending stuff.
$(COGL_DRIVER)/cogl-defines.h is generated in the configure script so
it ends up in the build directory. Therefore the build rule for
cogl/cogl-defines.h should depend on the file in $(builddir) not
$(srcdir).
The deprecation notices in gtk-doc should also refer to the
release that added the deprecation, and if the deprecated
symbol has been replaced by something else then the new symbol
should be correctly referenced.
The main COGL header cogl.h is currently created at configure time
because it conditionally includes the driver-dependent defines. This
sometimes leads to a stale cogl.h with old definitions which can
break the build until you clean out the whole tree and start from
scratch.
We can generate a stable cogl-defines.h at build time from the
equivalent driver-dependent header and let cogl.h include that
file instead.