Boost logo

Boost :

Subject: Re: [boost] [Review:Algorithms] Comments about search algorithms
From: Marshall Clow (mclow.lists_at_[hidden])
Date: 2011-09-24 19:56:24


On Sep 24, 2011, at 11:22 AM, Phil Endecott wrote:

> Dear All,
>
> Here are some observations about the proposed search algorithms:
>
> - I was confused as to whether the caller must keep the pattern pointed to by the iterators passed to the constructor after the constructor returns. My guess was that once the internal tables were built the pattern would not be needed any more - but that turns out to be wrong. At the least, the docs need to explain this.

OK.

> - The three algorithms have very similar interfaces. The docs could perhaps define a "searcher" concept and explicitly declare these algorithms' classes to be models of that concept.

I'll see what i can do.

> - std::search is similar to the three algorithms' free function versions (congratulations on getting the arguments in the same order!). For completeness, how about adding a searcher class that uses std::search?

They weren't at first ;-)

> - A user faced with these three (or four) algorithms will need to choose one. The docs should offer some guidance, e.g. benchmarks or rules-of-thumb, about which to use.

OK.

> - It would sometimes be useful to have a variant of the constructor that takes a const char* pattern.

You can do that today with ( str, str + strlen (str), but I'll think about how to do it better.
Maybe just a specialization for char * (and wchar?)
The trick, I think, will be to avoid having a general pointer-based version that insists on null termination.

> - The tables store ints. For patterns shorter than 256 characters I think that uint8_t would be sufficient, and would reduce the size of a table from 1024 or 2048 bytes to only 256 bytes. Similarly uint16_t or uint32_t could be used for larger patterns. It is not obvious how to implement this, though; perhaps an optional max_pattern_length template parameter, making it the user's responsibility to ensure the supplied pattern is not too long.
>
> - The docs describe some methods that are private to the implementation; why?

Are you referring to the traits class? If so, that section is there because the implementation is a customization point, rather than a private implementation detail.
The current implementation has two options (array and sparse table), and it chooses which one based on the input types - the array is faster, but is really only suitable for character-sized types (which is probably the most common case). The unordered_map is slower, but much more memory efficient for large alphabets.

I guess I didn't make this clear in the documentation. I will see if I can make it better.

All this is now under:
        https://github.com/mclow/Boost.Algorithm/issues/8 (for code issues)
and https://github.com/mclow/Boost.Algorithm/issues/9 (for documentation issues)

-- Marshall

Marshall Clow Idio Software <mailto:mclow.lists_at_[hidden]>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk