2ded18933e
cogl_matrix_project_points and cogl_matrix_transform_points had an optimization for the common case where the stride parameters exactly match the size of the corresponding structures. When compiled by GCC with -O2 on x86-64, the generated code for both the packed and the strided versions uses two registers to hold the addresses of the input and output arrays. In the strided version these pointers are incremented by adding the value of a register, and in the packed version they are incremented by adding an immediate value. I think the difference in cost here would be negligible, and it may even be faster to add a register.

GCC also appears to keep the loop counter in a register for the strided version, whereas in the packed version it can optimize the counter out and use the input pointer directly as the counter. If this were a problem, it should be possible to reorder the code a bit to explicitly use the input pointer as the counter.

Getting rid of the packed versions tidies up the code a bit, and if the differences in the generated code really are small it could even be faster, because we avoid an extra conditional in cogl_matrix_transform_points.
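A minimal sketch of why the packed special case is redundant, assuming simplified hypothetical types (`Matrix4`, `Point3`) and a hypothetical function `transform_points3` rather than Cogl's real CoglMatrix API. A tightly packed array is just the strided case with the stride equal to the size of the point structure, so one strided loop covers both.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified types; not Cogl's actual CoglMatrix layout. */
typedef struct { float m[4][4]; } Matrix4; /* m[row][col] */
typedef struct { float x, y, z; } Point3;

/* Hypothetical helper: affine-transform n_points 3-component points.
 * Consecutive input points are stride_in bytes apart and consecutive
 * output points stride_out bytes apart. A packed Point3 array is simply
 * stride == sizeof (Point3), so no separate packed loop is needed. */
static void
transform_points3 (const Matrix4 *mat,
                   size_t stride_in, const void *points_in,
                   size_t stride_out, void *points_out,
                   int n_points)
{
  const char *in = points_in;
  char *out = points_out;

  /* A stride smaller than one point would make points overlap. */
  assert (stride_in >= sizeof (Point3) && stride_out >= sizeof (Point3));

  for (int i = 0; i < n_points; i++)
    {
      Point3 p, r;

      /* memcpy avoids alignment assumptions about arbitrary strided
       * data; since p is a copy, points_in may equal points_out. */
      memcpy (&p, in, sizeof p);

      r.x = mat->m[0][0] * p.x + mat->m[0][1] * p.y
          + mat->m[0][2] * p.z + mat->m[0][3];
      r.y = mat->m[1][0] * p.x + mat->m[1][1] * p.y
          + mat->m[1][2] * p.z + mat->m[1][3];
      r.z = mat->m[2][0] * p.x + mat->m[2][1] * p.y
          + mat->m[2][2] * p.z + mat->m[2][3];

      memcpy (out, &r, sizeof r);

      /* With a runtime stride the compiler typically adds a register
       * here; a dedicated packed loop would add an immediate instead. */
      in += stride_in;
      out += stride_out;
    }
}
```

For example, passing `sizeof (Point3)` for both strides transforms a plain `Point3` array in place, while passing the size of an interleaved vertex structure as `stride_in` reads positions directly out of a vertex buffer with the same loop.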