* shader: split CM rgba/rgbx into discard variants
this keeps the shaders branchless when no discard is needed.
* shader: ensure we don't stall on the VBO UV buffer
if we render a new texture before the GPU has finished with the previous
one, updating the buffer stalls until the GPU is done. call glBufferData
to orphan the old data store: this lets the driver hand back a new memory
block immediately while the GPU is still reading from the previous one.
* protocols: ensure we reset GL_PACK_ALIGNMENT
reset GL_PACK_ALIGNMENT to its default initial value of 4.
* shader: use unsigned short in VAO
lose a tiny bit of precision but halve the size of each attribute
component stored this way (2 bytes instead of 4 for a 32-bit float).
use GL_UNSIGNED_SHORT with normalized set to GL_TRUE, and clamp and round
the UVs to uint16_t in customUv.
* shader: interleave vertex buffers
use std::array for fullverts and a single interleaved buffer for position
and UV. this should in theory improve cache locality, and it also removes
the need to keep two buffers around.
* shader: revert precision drop
we need float precision because values such as 1.01 (slightly outside
[0, 1]) can enter the CM shader maths, and clamping/rounding them makes
the results wrong. so revert to float, sadly at the cost of higher
bandwidth usage.
* update doColorManagement API
* convert primaries to XYZ on the CPU
* remove unused primaries uniform
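GL_PACK_ALIGNMENT sets the byte alignment of each pixel row that glReadPixels writes into client memory. If one code path changes it and does not restore it, later reads that assume the default of 4 compute the wrong row stride. A minimal sketch of that stride arithmetic, assuming one-byte components (e.g. GL_RGBA with GL_UNSIGNED_BYTE); `packedRowStride` is a hypothetical helper, not part of the codebase:

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by one pixel row in client memory when each component is
// one byte: the raw row length is rounded up to a multiple of the pack
// alignment (GL allows 1, 2, 4 or 8).
constexpr std::size_t packedRowStride(std::size_t width,
                                      std::size_t bytesPerPixel,
                                      std::size_t alignment) {
    const std::size_t rowBytes = width * bytesPerPixel;
    return (rowBytes + alignment - 1) / alignment * alignment;
}
```

For a 3-pixel-wide RGB row, the stride is 9 bytes with an alignment of 1 but 12 bytes with the default of 4, so a reader and writer that disagree on the alignment misread every row after the first.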
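A minimal sketch of the normalized unsigned-short UV encoding from the "use unsigned short in VAO" entry, which also shows why it was reverted: a UV of 1.01 clamps to 65535 and reads back as exactly 1.0. The helper names are hypothetical, and the real customUv code may round differently:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Encode a UV component for a GL_UNSIGNED_SHORT attribute with
// normalized = GL_TRUE. Values outside [0, 1] are clamped away here.
inline std::uint16_t quantizeUv(float v) {
    const float clamped = std::clamp(v, 0.0f, 1.0f);
    return static_cast<std::uint16_t>(std::lround(clamped * 65535.0f));
}

// What the vertex shader sees: the stored integer divided by 65535.
inline float dequantizeUv(std::uint16_t q) {
    return static_cast<float>(q) / 65535.0f;
}
```

In-range values survive with an error of at most about 7.6e-6, but anything outside [0, 1] is lost, which is the failure the precision revert describes.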
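A minimal sketch of the layout the "interleave vertex buffers" entry describes: position and UV live side by side in one struct, and a fixed-size std::array holds the quad. The `SVertex` struct, the vertex order and the float types are assumptions for illustration, not the project's actual definitions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One interleaved vertex: position followed by UV in the same buffer.
struct SVertex {
    float x, y; // position
    float u, v; // texture coordinate
};

// Hypothetical full quad as a triangle strip, stored inline in a std::array.
constexpr std::array<SVertex, 4> fullVerts{{
    {0.f, 0.f, 0.f, 0.f},
    {1.f, 0.f, 1.f, 0.f},
    {0.f, 1.f, 0.f, 1.f},
    {1.f, 1.f, 1.f, 1.f},
}};

// Both attributes share one stride; only their byte offsets differ.
constexpr std::size_t kStride    = sizeof(SVertex);
constexpr std::size_t kPosOffset = offsetof(SVertex, x);
constexpr std::size_t kUvOffset  = offsetof(SVertex, u);
```

With one buffer bound, each attribute would then be described by a call such as `glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, kStride, reinterpret_cast<void*>(kPosOffset))`, where `posLoc` is a hypothetical attribute location.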
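Converting primaries to XYZ on the CPU presumably lets the shader take a precomputed matrix instead of the raw primaries, which fits the removal of the unused primaries uniform. A minimal sketch of the standard RGB-to-XYZ derivation from xy chromaticities and a white point; `CIExy`, `Mat3` and `rgbToXyz` are hypothetical names, not the project's API:

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct CIExy { double x, y; };

// xyY with Y = 1 converted to XYZ.
std::array<double, 3> xyToXYZ(CIExy c) {
    return {c.x / c.y, 1.0, (1.0 - c.x - c.y) / c.y};
}

// Inverse of a 3x3 matrix via the adjugate.
Mat3 inverse3(const Mat3& m) {
    const double d = m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                   - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                   + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    Mat3 r{};
    r[0][0] = (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / d;
    r[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / d;
    r[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / d;
    r[1][0] = (m[1][2] * m[2][0] - m[1][0] * m[2][2]) / d;
    r[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / d;
    r[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / d;
    r[2][0] = (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / d;
    r[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / d;
    r[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / d;
    return r;
}

// Columns start as the primaries' XYZ; each is scaled so that RGB (1,1,1)
// maps exactly onto the white point's XYZ.
Mat3 rgbToXyz(CIExy r, CIExy g, CIExy b, CIExy w) {
    const std::array<std::array<double, 3>, 3> cols{xyToXYZ(r), xyToXYZ(g), xyToXYZ(b)};
    Mat3 p{};
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            p[row][col] = cols[col][row];

    const auto white = xyToXYZ(w);
    const Mat3 pInv = inverse3(p);
    std::array<double, 3> s{};
    for (int i = 0; i < 3; ++i)
        s[i] = pInv[i][0] * white[0] + pInv[i][1] * white[1] + pInv[i][2] * white[2];

    Mat3 m{};
    for (int row = 0; row < 3; ++row)
        for (int col = 0; col < 3; ++col)
            m[row][col] = p[row][col] * s[col];
    return m;
}
```

With the sRGB primaries and a D65 white point this reproduces the familiar matrix (0.4124, 0.3576, 0.1805 in the first row), and it only needs recomputing when the primaries change, not per fragment.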
---------
Co-authored-by: UjinT34 <ujint34@mail.ru>