From: Miroslav Silovic (miro_at_[hidden])
Date: 2002-10-24 07:02:13
John Maddock wrote:
>Ok here's the low-down:
>
>When compiled as a shared library there is a lot of stuff in there: three
>different traits classes (you'll only use one of these, but which one?),
>support for two different character types (char and wchar_t), support for
>POSIX API layers, and a high-level simplified wrapper class that some
>newbies find useful (which also includes things like file searching).
>
Hmm, ouch. This means that the entire code gets re-instantiated for both
string and stream iteration (right?)
>
>If you compile and link as a static library then you obviously only link to
>what you need, and code size should be in the 100K range (it depends a bit
>on which traits class you use - if you start using std::locale then code
>size can go up quite a bit).
>
Static linking... Ick! :)
>I suppose I should refactor into multiple shared libs, but IMO that's a
>maintenance nightmare: I'm not saying it won't happen, just that it's not as
>trivial as it sounds :-(
>
>
Actually, I think it'd make a lot of sense to refactor the lib into a
buffered regexp matcher. Have a char buffer (variable-size, depending on
the needs of the concrete regexp), and then use an interface that walls
off the template parameter instances behind a vtable barrier. The
buffer would, of course, cut down on the virtual method call overhead,
as you'd match chunks of the string, with one virtual call per chunk,
rather than the entire iterator contents (obviously, some of the old
chunks would have to be kept for backtracking, and you could just keep
them in something like a deque). Wide characters could be handled
through virtual calls for comparison that'd operate on a character
buffer (instead of on individual chars).
I had something like this in mind (very VAGUE code follows):
#include <cstddef>

struct RegexpToken;  // token type for compiled patterns, left undefined here

template<class Iterator, class CharTraits>
class RegexpAdapter {
public:
    virtual ~RegexpAdapter() {}  // the interface is used through base pointers

    // matching support
    virtual std::size_t fill_buffer(Iterator i, char* buffer,
                                    std::size_t max_size) = 0;
    virtual bool string_equal(Iterator i, const char* str,
                              std::size_t string_size) = 0;
    virtual bool match_character_class(Iterator i, const char* compiled_class,
                                       std::size_t class_size) = 0;

    // compilation support
    virtual std::size_t tokenize_buffer(Iterator i, RegexpToken* token_buffer,
                                        std::size_t max_size) = 0;
    // etc - these would be a good starting point
};
Then write a templated implementation for this interface - the
implementation of the primitives would isolate the
char/iterator-dependent parts of regexp matching.
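A minimal sketch of that split, assuming C++11 and an interface cut down to a single primitive. ChunkSource, IteratorSource and find_literal are hypothetical names for illustration, not part of Boost.Regex, and a plain literal search stands in for the real matcher. Only IteratorSource is instantiated per iterator type; the matching loop is compiled once and reaches the input through one virtual call per chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical non-template interface: the matching core sees only this vtable.
class ChunkSource {
public:
    virtual ~ChunkSource() {}
    // Copy up to max_size chars into buffer; return the count (0 at end of input).
    virtual std::size_t fill_buffer(char* buffer, std::size_t max_size) = 0;
};

// Templated implementation: the only code instantiated per iterator type.
template <class Iterator>
class IteratorSource : public ChunkSource {
public:
    IteratorSource(Iterator first, Iterator last) : cur_(first), end_(last) {}

    std::size_t fill_buffer(char* buffer, std::size_t max_size) override {
        std::size_t n = 0;
        while (n < max_size && cur_ != end_) buffer[n++] = *cur_++;
        return n;
    }

private:
    Iterator cur_, end_;
};

// Non-template core (a literal search standing in for a regexp matcher).
// Returns the offset of the first occurrence of needle, or -1 if absent.
long find_literal(ChunkSource& src, const std::string& needle,
                  std::size_t chunk_size = 4) {
    assert(chunk_size > 0);
    std::deque<char> window;            // input kept around for backtracking
    std::vector<char> chunk(chunk_size);
    std::size_t start = 0;              // absolute offset of window.front()
    bool eof = false;
    for (;;) {
        // Top up the window, one virtual call per chunk, until it can hold
        // a full candidate match.
        while (!eof && window.size() < needle.size()) {
            std::size_t n = src.fill_buffer(chunk.data(), chunk.size());
            if (n == 0) eof = true;
            window.insert(window.end(), chunk.begin(), chunk.begin() + n);
        }
        if (window.size() < needle.size()) return -1;  // input exhausted
        if (std::equal(needle.begin(), needle.end(), window.begin()))
            return static_cast<long>(start);
        window.pop_front();  // attempt failed: retry one position later
        ++start;
    }
}
```

The same find_literal serves a std::string iterator and a std::deque<char> iterator without being instantiated twice; the deque window only retains what a restarted attempt could still need.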
... and yes, I know that this mixes very low-level and very high-level
concepts. IMHO, that's how it should be. :)
Miro