
This adds a fast path for premultiplying an RGBA image using SSE2 instructions. SSE registers are 128 bits wide, and the intermediate result of each multiplication needs at least 16 bits per component, so one register can hold two pixels at a time. The function interleaves two SSE registers to multiply four pixels per call, in the hope that this will pipeline better.

http://bugzilla.openedhand.com/show_bug.cgi?id=1939

Signed-off-by: Emmanuele Bassi <ebassi@linux.intel.com>
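
For illustration only, the approach described above might look roughly like the sketch below. It is not the code from this commit. The function names (premultiply_rgba, premult_four_pixels, premult_two_wide, mul_div_255) are hypothetical, and so are the R, G, B, A byte order, the rounding formula and the scalar fallback for leftover pixels, which are all assumptions. Each 8-bit component is widened to a 16-bit lane so that a 128-bit register carries two pixels, and two such registers are processed per call to cover four pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Rounded c * a / 255. The largest intermediate is 65407, so 16 bits suffice. */
static uint8_t mul_div_255(uint8_t c, uint8_t a)
{
    unsigned t = (unsigned)c * a + 128;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Scalar fallback, also used for the tail that does not fill a group of four. */
static void premult_scalar(uint8_t *p, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels; i++, p += 4) {
        p[0] = mul_div_255(p[0], p[3]);
        p[1] = mul_div_255(p[1], p[3]);
        p[2] = mul_div_255(p[2], p[3]);
    }
}

#if defined(__SSE2__)
/* Premultiply two pixels already widened to eight 16-bit lanes: R G B A R G B A. */
static __m128i premult_two_wide(__m128i wide)
{
    /* Copy each pixel's alpha (lanes 3 and 7) into all four of its lanes. */
    __m128i alpha = _mm_shufflelo_epi16(wide, _MM_SHUFFLE(3, 3, 3, 3));
    alpha = _mm_shufflehi_epi16(alpha, _MM_SHUFFLE(3, 3, 3, 3));
    /* Force the multiplier in the alpha lanes to 255 so alpha maps to itself. */
    alpha = _mm_or_si128(alpha, _mm_set_epi16(0xff, 0, 0, 0, 0xff, 0, 0, 0));

    /* t = c * a + 128, then (t + (t >> 8)) >> 8, all within 16-bit lanes. */
    __m128i t = _mm_add_epi16(_mm_mullo_epi16(wide, alpha), _mm_set1_epi16(128));
    return _mm_srli_epi16(_mm_add_epi16(t, _mm_srli_epi16(t, 8)), 8);
}

/* Load four pixels, split them across two registers, and pack the results. */
static void premult_four_pixels(uint8_t *p)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i px = _mm_loadu_si128((const __m128i *)p);
    __m128i lo = premult_two_wide(_mm_unpacklo_epi8(px, zero)); /* pixels 0, 1 */
    __m128i hi = premult_two_wide(_mm_unpackhi_epi8(px, zero)); /* pixels 2, 3 */
    _mm_storeu_si128((__m128i *)p, _mm_packus_epi16(lo, hi));
}
#endif

/* Premultiply n_pixels RGBA pixels in place (assumed byte order: R, G, B, A). */
void premultiply_rgba(uint8_t *pixels, size_t n_pixels)
{
    assert(pixels != NULL || n_pixels == 0);
#if defined(__SSE2__)
    for (; n_pixels >= 4; n_pixels -= 4, pixels += 16)
        premult_four_pixels(pixels);
#endif
    premult_scalar(pixels, n_pixels);
}
```

Calling premultiply_rgba(buffer, width * height) on a tightly packed buffer would scale every colour component by its pixel's alpha and leave the alpha bytes unchanged. The unaligned load and store are chosen here for simplicity; whether aligned accesses or a different grouping would be faster is not something this sketch tries to settle.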