Hi Vihan,
I built a GPU implementation of two parallel ODE solvers, an Adams-Bashforth-Moulton with 4-step memory and a 4th-order Runge-Kutta, for a linear systems simulator a couple of years ago. The parallelization technique is not new; it is an application of prefix sums (Guy E. Blelloch, Prefix Sums and Their Applications). The implementation used some Boost libraries internally; outward interfaces for plain C, Python, and Matlab were provided. If you are interested, I will send the documentation and a link to my git sources to your personal email. There is no point in spamming the Boost list with "non Boost" items.
Some thoughts:
1. It will be hard to have an interface compatible with odeint for a parallel ODE solver. If you use a stepper approach, you will only be able to parallelize the operations within one step (matrix-vector multiplications and SAXPY operations; see the first sketch after these points). This will not give you enough GPU load for small systems, so for 12-15 variables I do not think you will get any advantage from the GPU over the CPU.
2. One way to achieve good GPU load (and performance) is to use a prefix sum to parallelize the operations for several simulation times simultaneously (see the second sketch below). This is doable and the theory is relatively straightforward, but it usually leads to an increased number of operations. However, adaptive time stepping will not be easy to implement on top of it.
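To make point 1 concrete, here is a minimal serial C++ sketch of one classical RK4 step for a linear system x' = A x. The helper names (matvec, axpy, rk4_step) are mine for this example, not odeint's API; the point is just which operations a stepper interface exposes for offloading:

    #include <cstddef>
    #include <vector>

    using vec = std::vector<double>;
    using mat = std::vector<vec>;   // dense row-major matrix

    // y = A * x
    static vec matvec(const mat& A, const vec& x) {
        vec y(A.size(), 0.0);
        for (std::size_t i = 0; i < A.size(); ++i)
            for (std::size_t j = 0; j < x.size(); ++j)
                y[i] += A[i][j] * x[j];
        return y;
    }

    // returns a*x + y  (a SAXPY-style update)
    static vec axpy(double a, const vec& x, const vec& y) {
        vec r = y;
        for (std::size_t i = 0; i < x.size(); ++i)
            r[i] += a * x[i];
        return r;
    }

    // One classical RK4 step for x' = A x. Each matvec/axpy call below is
    // the only unit of work a stepper interface lets you offload; with
    // N = 12-15 state variables that is O(N) to O(N^2) parallel work per
    // kernel launch, far too little to keep a GPU busy.
    vec rk4_step(const mat& A, const vec& x, double dt) {
        vec k1 = matvec(A, x);
        vec k2 = matvec(A, axpy(dt / 2, k1, x));
        vec k3 = matvec(A, axpy(dt / 2, k2, x));
        vec k4 = matvec(A, axpy(dt, k3, x));
        vec out = x;
        for (std::size_t i = 0; i < x.size(); ++i)
            out[i] += dt / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]);
        return out;
    }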
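And here is a sketch of the prefix-sum idea from point 2, assuming a fixed-step scheme for a linear system, so that every time step is an affine map x -> M x + c. Composing such maps is associative, which is exactly what a Blelloch-style scan needs; the serial std::inclusive_scan below stands in for the GPU scan. All names (Affine, compose, scan_steps) are illustrative, not from my actual sources:

    #include <cstddef>
    #include <numeric>
    #include <vector>

    using vec = std::vector<double>;
    using mat = std::vector<vec>;   // dense row-major matrix

    // One fixed time step of a linear scheme is an affine map x -> M x + c
    // (e.g. M = I + dt*A for explicit Euler; a forcing term makes c vary per step).
    struct Affine { mat M; vec c; };

    static vec matvec(const mat& A, const vec& x) {
        vec y(A.size(), 0.0);
        for (std::size_t i = 0; i < A.size(); ++i)
            for (std::size_t j = 0; j < x.size(); ++j)
                y[i] += A[i][j] * x[j];
        return y;
    }

    static mat matmul(const mat& A, const mat& B) {
        std::size_t n = A.size();
        mat C(n, vec(n, 0.0));
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < n; ++k)
                for (std::size_t j = 0; j < n; ++j)
                    C[i][j] += A[i][k] * B[k][j];
        return C;
    }

    // Apply f first, then g:  (g o f)(x) = (g.M f.M) x + (g.M f.c + g.c).
    // The composition is associative, so a parallel scan applies. Note each
    // compose costs O(N^3) vs O(N^2) for a plain sequential step -- the
    // "increased number of operations" mentioned in point 2.
    static Affine compose(const Affine& f, const Affine& g) {
        vec c = matvec(g.M, f.c);
        for (std::size_t i = 0; i < c.size(); ++i)
            c[i] += g.c[i];
        return { matmul(g.M, f.M), c };
    }

    // steps[t] holds the map of time step t; after the scan, steps[t] maps
    // the initial state x0 directly to x_{t+1}, so all simulation times can
    // be evaluated at once. A GPU scan would replace this serial call.
    void scan_steps(std::vector<Affine>& steps) {
        std::inclusive_scan(steps.begin(), steps.end(), steps.begin(),
                            [](const Affine& earlier, const Affine& later) {
                                return compose(earlier, later);
                            });
    }

Because the scan needs every step map up front, this also shows why adaptive time stepping does not fit naturally into the scheme.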