Subject: Re: [boost] Genetics library: Volunteers needed
From: Kenneth Adam Miller (kennethadammiller_at_[hidden])
Date: 2015-07-23 15:07:38


Whoa, you want to use JNI manually? Have you heard of JNA? It might
be good to take a look at it, because it really takes the pain out of
working with Java <-> native.

On Thu, Jul 23, 2015 at 2:38 PM, Antony Polukhin <antoshkka_at_[hidden]>
wrote:

> 2015-07-21 15:11 GMT+03:00 Andy Thomason <a.thomason_at_[hidden]>:
>
> > Hi All,
> >
> > I am recruiting users for the putative genetics library.
> >
>
> Hi,
>
> I like the idea of a genetics library in Boost!
>
> However, the code misses essential optimizations and suffers from
> premature optimizations.
>
> * dna_string misses reserve() in assignment. This makes some of the
> push_back()s slow.
> * Trying to understand the exact search gave me a headache (cool
> hack, I enjoyed it!). Too many magic constants and variables make
> the algorithm hard to maintain. I also doubt that the algorithm is
> optimal:
> You are comparing 4 nucleotides at a time. There are 256 nucleotide
> combinations of length 4. Let's assume for simplicity that nucleotides are
> uniformly distributed. The algorithm will often give false positives: it
> will be triggered roughly once every 256 nucleotide comparisons. You're
> doing some kind of vectorization, so the algorithm will give a false
> positive every ~8 loop bodies.
>
> Comparing longer nucleotide chains will trigger compare_inexact less
> often. For example, comparing 8 nucleotides at a time will trigger a false
> positive once per ~65500 comparisons.
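
A rough sketch of the arithmetic above, under the same simplifying assumption of uniformly distributed, independent nucleotides. The names kmer_count and expected_chance_matches are made up for illustration and are not part of the proposed library.

```cpp
#include <cstdint>

// Number of distinct chains of k nucleotides over the 4-letter
// alphabet {A, C, G, T}: 4^k, computed as 2^(2k). Valid for k < 32.
constexpr std::uint64_t kmer_count(unsigned k) {
    return std::uint64_t{1} << (2 * k);
}

// Expected number of chance matches when `positions` random k-mers are
// compared against one fixed k-mer (assumes uniform, independent symbols).
inline double expected_chance_matches(double positions, unsigned k) {
    return positions / static_cast<double>(kmer_count(k));
}

// The two figures quoted above: 4^4 = 256 and 4^8 = 65536 (~65500).
static_assert(kmer_count(4) == 256, "4 nucleotides -> 256 combinations");
static_assert(kmer_count(8) == 65536, "8 nucleotides -> 65536 combinations");
```

Each doubling of the chain length squares the number of combinations, which is why moving from 4 to 8 nucleotides cuts the chance-match rate by a factor of 256.
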
>
> * The comparison operators need improvement. Compare sizes first (it's
> cheap!). Use memcmp in cases like `values < rhs.values || values ==
> rhs.values`. memcmp gives you an integer that already shows whether the
> value is bigger/smaller/equal, without the need to iterate over the data a
> second time.
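
A minimal sketch of the single-pass comparison suggested above. The byte_seq type is a hypothetical stand-in that stores one byte per element; it is not the library's dna_string, whose storage layout may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a sequence type: one byte per element.
struct byte_seq {
    std::vector<unsigned char> values;

    // Three-way comparison in one pass over the data:
    // negative, zero, or positive, in the style of memcmp.
    int compare(const byte_seq& rhs) const {
        const std::size_t n = std::min(values.size(), rhs.values.size());
        if (n != 0) {
            const int r = std::memcmp(values.data(), rhs.values.data(), n);
            if (r != 0) return r;
        }
        // Common prefix is equal: the shorter sequence orders first.
        if (values.size() < rhs.values.size()) return -1;
        return values.size() > rhs.values.size() ? 1 : 0;
    }
};

inline bool operator==(const byte_seq& a, const byte_seq& b) {
    // Size check first: it is cheap and settles most unequal cases.
    return a.values.size() == b.values.size() && a.compare(b) == 0;
}
inline bool operator<(const byte_seq& a, const byte_seq& b) {
    return a.compare(b) < 0;
}
inline bool operator<=(const byte_seq& a, const byte_seq& b) {
    // One pass instead of `values < rhs.values || values == rhs.values`.
    return a.compare(b) <= 0;
}
```

One caveat: memcmp compares bytes as unsigned char, so if the values are stored as multi-byte words, the ordering it produces on a little-endian machine is not the numeric ordering of those words. Equality tests are unaffected.
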
>
> * `const auto str_values = str.get_values();` - must be `const auto&
> str_values = str.get_values();`
> * Provide an enum for nucleotides { nA = 0, nT = ...}. This would make the
> library more user-friendly.
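
One way such an enum might look, as a sketch only. The enumerator names, their numeric values, and the to_char/from_char helpers are all assumptions made for illustration; they do not reflect the encoding the library actually uses.

```cpp
#include <stdexcept>

// Hypothetical encoding: the numeric values are an assumption for this
// sketch, chosen so that each nucleotide fits in 2 bits.
enum class nucleotide : unsigned char { A = 0, C = 1, G = 2, T = 3 };

// Convert an enumerator to its one-letter code.
inline char to_char(nucleotide n) {
    static const char letters[] = {'A', 'C', 'G', 'T'};
    return letters[static_cast<unsigned char>(n)];
}

// Parse a one-letter code (either case); throws on anything else.
inline nucleotide from_char(char c) {
    switch (c) {
        case 'A': case 'a': return nucleotide::A;
        case 'C': case 'c': return nucleotide::C;
        case 'G': case 'g': return nucleotide::G;
        case 'T': case 't': return nucleotide::T;
        default: throw std::invalid_argument("not a nucleotide letter");
    }
}
```

A scoped enum keeps user code free of bare integers, so a call site reads `nucleotide::G` rather than a magic constant.
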
>
> There's more. If you're interested, I can investigate further.
>
> --
> Best regards,
> Antony Polukhin
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost
>


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk