Boost logo

Boost :

From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-02-15 12:24:45


Thorsten Ottosen wrote:
> Phil Endecott skrev:
>> Joe Gottman wrote:
>>> I just discovered that the string algorithm predicate is_any_of has an
>>> inefficient implementation. Internally it uses a std::set.
>>
>> It would be nice if things like this could be at least as efficient as
>> the equivalent C functionality. I've just done some quick benchmarks
>> of possible is_any_of implementations. My test splits my
>> /usr/share/dict/words at every lower-case vowel; it's about a megabyte
>> and splits into 300,000 pieces.
>>
>> First using std::set:
>>
>> struct stdset_is_any_of {
>> std::set<char> s;
>> stdset_is_any_of(string pat) {
>> for(string::const_iterator i=pat.begin(); i!=pat.end(); ++i) {
>> s.insert(*i);
>> }
>> }
>> bool operator()(char c) {
>> return s.find(c) != s.end();
>> }
>> };
>>
>> Relative performance is 0.6.
>
> Fon something as light as char, I suspect std::set<char> or
> tr1::unordered_set<char> is not an optimal choice.
>
> Could you try this test with boost::interprocess::flat_set<char> ?
> (It's in <boost/interprocess/containers/flat_set.hpp>)

Hi Thorsten,

I can't easily test that code, but I have tried this which I think is
essentially the same algorithm:

struct strbin_is_any_of {
   string pat;
   strbin_is_any_of(string pat_): pat(pat_) {
     sort(pat.begin(),pat.end());
   }
   bool operator()(char c) {
     return binary_search(pat.begin(),pat.end(),c);
   }
};

With the same test as before, the performance of this is 1.5; that's
between the std::set (0.6) and the bit-set (4.0) implementations, as
you would expect.

Comparing it with the code that does a linear search through an
unsorted pattern string, the linear search remained faster in all of my
(small set of) tests. This binary search may do better for perhaps 100
elements in the search set, but since the bit-set needs only 32 bytes
of storage and is faster, I would switch over to that implementation
well before then.

Phil.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk