From: Vladimir Pozdyayev (ardatur_at_[hidden])
Date: 2004-10-28 03:30:00
JM> The regex internals are in the process of being completely rewritten (code
JM> is in cvs in the regex5 branch), I hope to merge this to the main trunk in
JM> the next few weeks: mainly it's the docs that I need to bring up to date.
JM> Regex parsing and state machine construction should now be quite
JM> straightforward to understand (within limits for a regex engine obviously!),
JM> so I would urge you to take a look (I can send you a zip if you don't have
JM> cvs access).
Which CVS do you mean? I suppose it's not the main Boost CVS, as I
can't see anything named "regex5" there. I think the easiest way would
be to mail a zip, yes. The CVS address would also do, if anonymous
access is available.
JM> I think the main problem is providing the same feature set as the existing
JM> engine - my understanding is that no machine can have the complexity you
JM> claim and still match backrefs, or even I believe wide characters (because
JM> the character set is too large to realistically build a table based NFA).
JM> Is that correct?
There are no backrefs, of course. As for wide characters, they do
not cause that much trouble: character sets do not have to be bit
vectors or anything like that. Any function object that takes a
character and returns a boolean would do. This way charsets can even
use locale-related system calls and the like; let's call them
`algorithmic' charsets, as opposed to potentially huge table-based ones.
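
To make the idea concrete, here is a minimal sketch of such a charset, not code from any existing engine. The names charset, make_range, make_alpha, either and complement are made up for illustration, and std::function with lambdas is just one convenient way to spell "any function object"; a hand-written functor class would do equally well.

```cpp
#include <cassert>
#include <cwctype>
#include <functional>

// Hypothetical type: a charset is any callable mapping a character to bool.
using charset = std::function<bool(wchar_t)>;

// Range charset: two comparisons instead of a table entry per character.
charset make_range(wchar_t lo, wchar_t hi) {
    return [lo, hi](wchar_t c) { return lo <= c && c <= hi; };
}

// Charset that defers to the C library's locale-aware classification.
charset make_alpha() {
    return [](wchar_t c) {
        return std::iswalpha(static_cast<std::wint_t>(c)) != 0;
    };
}

// Union of two charsets.
charset either(charset a, charset b) {
    return [a, b](wchar_t c) { return a(c) || b(c); };
}

// Negated charset, as in [^...].
charset complement(charset a) {
    return [a](wchar_t c) { return !a(c); };
}

// Small self-check: ranges over wide code points need no storage at all.
inline void check_examples() {
    charset cyrillic = make_range(0x0400, 0x04FF);
    assert(cyrillic(0x0416));
    assert(!cyrillic(L'a'));
}
```

The matcher only ever calls the charset on the current input character, so the cost per step is one predicate call regardless of how large the character type is.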
-- Best regards, Vladimir Pozdyayev.