From: Nathan Myers (ncm_at_[hidden])
Date: 2005-05-04 09:57:51

On Wed, May 04, 2005 at 12:11:41AM -0500, Aaron W. LaFramboise wrote:
> Nathan Myers wrote:
> > After there's text in the
> > buffer, you can decide if it's enough to merit calling whatever is
> > supposed to extract and operate on it.
> It seems that if you already have code that does this by directly
> examining the buffer, there may be little point in dropping back a level
> of abstraction and then using operator>>. In particular, in a common
> case, verifying whether input is complete does most of the work of
> actually extracting it.

Imagine that you want to use somebody's library that can parse a
"{}"-delimited language like Javascript. It wants its input from an
istream& or (more likely, I hope) streambuf*. You can scan the
incoming text for matching brackets, and the bracket that matches
the first opening bracket delimits a unit. The only real work you're
doing is skipping comments and string literals. (Similarly, perhaps,
for XML.) The size of the unit to parse is not restricted to the
size of your buffer, but there might be a maximum size you care to
handle, for security/DOS if nothing else.
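
A minimal C++17 sketch of that bracket scan, run over text already sitting in a buffer. The names scan_unit, scan_result and scan_status are hypothetical. The lexing is deliberately simplified: it knows only double- and single-quoted strings with backslash escapes and // and /* */ comments (no regex or template literals), and it rescans from the start whenever more text arrives.

```cpp
#include <cstddef>
#include <string_view>

enum class scan_status { complete, need_more, too_large, malformed };

struct scan_result {
    scan_status status;
    std::size_t end;  // one past the matching '}' when status == complete
};

// Hypothetical helper: reports whether `text` holds a complete unit, i.e.
// everything up to the '}' that matches the first '{'. Braces inside string
// literals and comments are ignored. `max_len` is the DOS guard.
scan_result scan_unit(std::string_view text, std::size_t max_len) {
    enum { code, dq, sq, line_comment, block_comment } state = code;
    int depth = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i >= max_len) return {scan_status::too_large, 0};
        const char c = text[i];
        const char next = i + 1 < text.size() ? text[i + 1] : '\0';
        switch (state) {
        case code:
            if (c == '"') state = dq;
            else if (c == '\'') state = sq;
            else if (c == '/' && next == '/') { state = line_comment; ++i; }
            else if (c == '/' && next == '*') { state = block_comment; ++i; }
            else if (c == '{') ++depth;
            else if (c == '}') {
                if (depth == 0) return {scan_status::malformed, 0};
                if (--depth == 0) return {scan_status::complete, i + 1};
            }
            break;
        case dq:
        case sq:
            if (c == '\\') ++i;  // skip the escaped character
            else if ((state == dq && c == '"') || (state == sq && c == '\'')) state = code;
            break;
        case line_comment:
            if (c == '\n') state = code;
            break;
        case block_comment:
            if (c == '*' && next == '/') { state = code; ++i; }
            break;
        }
    }
    return {scan_status::need_more, 0};  // no complete unit buffered yet
}
```

Returning need_more instead of blocking is what lets the caller hand the unit to somebody else's parser only once it is all there; too_large is where a maximum unit size would be enforced.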

Then there's Giovanni's example of line delimiters, which may be a
more common use. I can recognize newlines, but don't care to
reproduce even the code to parse numbers, and don't want to copy each
line all over creation just to get the numbers out; I'd rather parse
them right from the buffer the text first landed in.
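
A sketch of parsing the numbers in place, assuming C++17 std::from_chars and that [first, last) is text already in the buffer (inside a streambuf subclass that could be gptr() to egptr()). parse_line and line_result are hypothetical names; nothing is copied out of the buffer.

```cpp
#include <charconv>
#include <cstddef>
#include <cstring>
#include <system_error>
#include <vector>

struct line_result {
    std::size_t consumed;  // bytes through and including '\n'; 0 if no full line yet
    bool ok;               // false if some field was not an integer
};

// Hypothetical helper: parses whitespace-separated integers from the first
// complete line in [first, last), reading directly from the buffer.
line_result parse_line(const char* first, const char* last, std::vector<long>& out) {
    const void* hit = std::memchr(first, '\n', static_cast<std::size_t>(last - first));
    if (hit == nullptr) return {0, true};  // incomplete line: wait for more input
    const char* nl = static_cast<const char*>(hit);
    const std::size_t consumed = static_cast<std::size_t>(nl - first) + 1;
    for (const char* p = first; p < nl;) {
        if (*p == ' ' || *p == '\t' || *p == '\r') { ++p; continue; }
        long value = 0;
        auto [next, ec] = std::from_chars(p, nl, value);  // bounded by the newline
        if (ec != std::errc()) return {consumed, false};  // skip the bad line
        out.push_back(value);
        p = next;
    }
    return {consumed, true};
}
```

Because from_chars takes an explicit end pointer, the parse cannot run past the newline into the next line, which the whitespace-skipping strtol family would do.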

> One thing I've never understood is how extractors are supposed to be
> written when they require reading two or more sub-objects from the input
> stream. If reading the first part succeeds, but the second part fails,
> what happens to the chunk of data that was read? And how do we prevent
> the stream from being in an indeterminate state due to not knowing how
> much was read? Perhaps the solution to this problem might present new
> ideas for solutions to the nonblocking extractor problem.

Or vice versa. The old libstdc++ istream used to support indefinitely
large pushback, but that's not really what's wanted. What you need is
a way to get a token from the streambuf that lets you seek the stream back
to that point, e.g. when you find failbit set. When the token's dtor
is called, the streambuf can discard any accumulated state up to the
next place it issued a token. Of course that just takes a clever
streambuf, and doesn't need any help from the standard library.
(Unfortunately streampos can't be that token; no dtor.) A less elegant
scheme is possible: you might have a streambuf that saves _everything_,
until told to discard whatever came before some point, e.g. the current
position when you know you have a good parse. It can pubseekpos() back to
that point and to any point after.
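
A minimal sketch of the "save everything" scheme, assuming the upstream is any std::streambuf. retaining_buf and discard_before() are hypothetical names, and this is not the token-with-a-dtor design: the caller releases text explicitly. It supports input only, keeps retained text in a std::string, and fails seeks to discarded text or to the (unknown) end of the stream.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical streambuf: keeps every character pulled from `upstream` so a
// reader can seek back to any retained position after a failed parse.
class retaining_buf : public std::streambuf {
public:
    explicit retaining_buf(std::streambuf* upstream) : up_(upstream) {}

    // Forget retained text before absolute position p (never past the
    // current read position), e.g. once a parse is known to be good.
    void discard_before(std::streampos p) {
        const off_type cur = gptr() - eback();
        off_type n = off_type(p) - base_;
        if (n <= 0) return;
        if (n > cur) n = cur;
        buf_.erase(0, static_cast<std::size_t>(n));
        base_ += n;
        reset_get(cur - n);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        char chunk[256];
        const std::streamsize n = up_->sgetn(chunk, sizeof chunk);
        if (n <= 0) return traits_type::eof();
        const off_type cur = static_cast<off_type>(buf_.size());  // read position was at the end
        buf_.append(chunk, static_cast<std::size_t>(n));  // retain, never overwrite
        reset_get(cur);
        return traits_type::to_int_type(*gptr());
    }

    pos_type seekpos(pos_type pos, std::ios_base::openmode which) override {
        const off_type rel = off_type(pos) - base_;
        if (!(which & std::ios_base::in) || rel < 0 ||
            rel > static_cast<off_type>(buf_.size()))
            return pos_type(off_type(-1));  // discarded, or not read yet
        reset_get(rel);
        return pos;
    }

    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        const off_type cur = base_ + (gptr() - eback());
        if (dir == std::ios_base::cur) return seekpos(pos_type(cur + off), which);
        if (dir == std::ios_base::beg) return seekpos(pos_type(off), which);
        return pos_type(off_type(-1));  // the end of a stream is unknown
    }

private:
    // Point the get area at the retained text, with the read position at `offset`.
    void reset_get(off_type offset) {
        char* b = &buf_[0];
        setg(b, b + offset, b + buf_.size());
    }

    std::streambuf* up_;
    std::string buf_;    // retained text
    off_type base_ = 0;  // absolute stream position of buf_[0]
};
```

With input "7 x", a two-field extraction reads 7 and then sets failbit on "x"; after clear() and seekg() to a position saved with tellg() before the attempt, the "7" can be read again. After discard_before(), seeking to that older position fails instead.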

Of course, once it has seeked (sought?) back there, you need to know
what to do next. I am finding it hard to think of what one might do.
Skip to the next start delimiter? You could do that without seeking
first. Try a different LR(n) parse-table production? Maybe, but
that's a bit obscure.

Nathan Myers
