On 28 Jan 2016, at 01:04, Joaquim Duran Comas <jdurancomas@gmail.com> wrote:

If explicit SIMD should not be used for now, then you should help the compiler generate better-optimized code by aligning the buffers properly.

There is the Boost.Align library, which provides an aligned allocator (http://www.boost.org/doc/libs/1_60_0/doc/html/align.html).

That's good to know.  The functions aligned_alloc and aligned_free can replace these functions:

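#include <algorithm>  // std::max
#include <cstddef>    // std::size_t
#include <cstdint>    // std::uintptr_t
#include <cstdlib>    // std::malloc, std::free

// Returns memory aligned to `alignment` (must be a power of two), or nullptr.
void *
malloc_(std::size_t alignment, std::size_t size)
{
    alignment = std::max(alignment, alignof(void *));
    size     += alignment;  // room for the padding and the stashed pointer

    void *ptr  = std::malloc(size);
    if (!ptr) {
        return nullptr;
    }
    // Round up to the next multiple of alignment strictly above ptr.
    void *ptr2 = (void *)(((std::uintptr_t)ptr + alignment) & ~((std::uintptr_t)alignment-1));
    void **vp  = (void**) ptr2 - 1;
    *vp        = ptr;  // stash the original pointer just below the aligned block
    return ptr2;
}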

// Releases memory obtained from malloc_; a null pointer is ignored.
void
free_(void *ptr)
{
    if (ptr) {
        std::free(*((void**)ptr-1));
    }
}
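
To connect the two helpers above with the "aligned allocator" idea, here is a rough sketch of how the same over-allocate-and-stash trick could be wrapped in a C++11 allocator so that std::vector hands out aligned buffers. The names AlignedAllocator, AlignedFloats and is_aligned are made up for this illustration; this is not the Boost.Align API or its implementation, and it assumes the alignment is a power of two and that std::malloc returns memory aligned for at least a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical allocator for illustration only (not Boost.Align's API).
template <typename T, std::size_t Alignment>
struct AlignedAllocator {
    static_assert(Alignment != 0 && (Alignment & (Alignment - 1)) == 0,
                  "Alignment must be a power of two");
    static_assert(Alignment >= alignof(T),
                  "Alignment must not be weaker than alignof(T)");

    using value_type = T;

    // Explicit rebind is required because Alignment is a non-type parameter.
    template <typename U>
    struct rebind { using other = AlignedAllocator<U, Alignment>; };

    AlignedAllocator() noexcept = default;
    template <typename U>
    AlignedAllocator(const AlignedAllocator<U, Alignment> &) noexcept {}

    T *allocate(std::size_t n) {
        // Never align to less than a pointer, so the stash slot is aligned.
        const std::size_t a =
            Alignment < alignof(void *) ? alignof(void *) : Alignment;
        if (n > (static_cast<std::size_t>(-1) - a) / sizeof(T)) {
            throw std::bad_alloc();  // n * sizeof(T) + a would overflow
        }
        void *raw = std::malloc(n * sizeof(T) + a);
        if (!raw) {
            throw std::bad_alloc();
        }
        // Next multiple of a strictly above raw.
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(raw);
        const std::uintptr_t addr =
            (base + a) & ~(static_cast<std::uintptr_t>(a) - 1);
        // Holds as long as malloc returns pointer-aligned memory (assumed).
        assert(addr - base >= sizeof(void *));
        // Stash the original pointer just below the aligned block.
        *(reinterpret_cast<void **>(addr) - 1) = raw;
        return reinterpret_cast<T *>(addr);
    }

    void deallocate(T *p, std::size_t) noexcept {
        // Recover the pointer that std::malloc originally returned.
        std::free(*(reinterpret_cast<void **>(p) - 1));
    }
};

// All instances are interchangeable, since the allocator has no state.
template <typename T, typename U, std::size_t A>
bool operator==(const AlignedAllocator<T, A> &,
                const AlignedAllocator<U, A> &) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const AlignedAllocator<T, A> &,
                const AlignedAllocator<U, A> &) noexcept { return false; }

// Example alias: float buffers aligned to 32 bytes (an assumed SIMD width).
using AlignedFloats = std::vector<float, AlignedAllocator<float, 32>>;

// True if p is a multiple of alignment.
inline bool is_aligned(const void *p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

With something like this, declaring AlignedFloats buf(1024); should give a buffer whose data() pointer is 32-byte aligned, including after the vector reallocates, without any explicit SIMD in the calling code.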

This really should have gone into C++11.