From: scleary_at_[hidden]
Date: 2001-10-01 08:59:58


--- In boost_at_y..., dietmar_kuehl_at_y... wrote:
> My approach to an XML parser is to use a tokenizer to chop the
> XML sequence into digestible parts. On top of this tokenizer,
> another iterator verifying well-formedness is sitting. Optionally,
> yet another iterator checking validity is used where both of these
> iterators implement the same concept (something like "XML object
> iterator"; an XML object can be an entity, its attributes, contents,
> etc. and the iterator will tell what it is currently sitting on
> using an appropriate accessor).

I like this idea very much, and have used it with Python's
iterators. The only time I've used it in C++ was in implementing a
very simple subset of XML -- I have used Xerces before as well, and
dropped it because of problems already mentioned by others (docs,
etc.). Also, I was annoyed that only SAX and DOM were provided --
I've always wanted an interface like the one you suggest.

As for the name, I started out calling them "iterators", but as I
worked with them, they became less iterator-like and more
input-stream-like. So I called them "data streams" to avoid confusion
with istreams.

The design I ended up with was to have a "data stream" type and
a "buffered data stream" type. A DataStream supports:
  typedef ... element_type;
  bool eoi(); // End Of Input
  element_type next(); // may only be called if !eoi()

Whereas a BufferedDataStream supports:
  typedef ... element_type;
  bool eoi();
  const element_type & get(); // "current" data item; may only be
                              // called if !eoi()
  void next(); // discards the "current" data item; may only be called
               // after get(), putback(), or an eoi() that returned false
  element_type consume(); // equivalent to get(), then next()
  void putback(const element_type & x); // places x back into the input
                                        // stream
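
As an illustration only, the two interfaces above might be modelled in
C++ roughly as follows. This is a sketch under assumptions, not the code
described in this post: the names StringDataStream and Buffered are
hypothetical, and the choice of a vector as the putback buffer is just
one plausible way to do it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical DataStream over the characters of a string.
class StringDataStream {
public:
    typedef char element_type;

    explicit StringDataStream(const std::string & s) : data_(s), pos_(0) {}

    bool eoi() const { return pos_ == data_.size(); }  // End Of Input

    element_type next() {  // may only be called if !eoi()
        assert(!eoi());
        return data_[pos_++];
    }

private:
    std::string data_;
    std::string::size_type pos_;
};

// Hypothetical wrapper that layers the buffered interface on a DataStream.
template <class Stream>
class Buffered {
public:
    typedef typename Stream::element_type element_type;

    explicit Buffered(const Stream & s) : source_(s) {}

    // True only when nothing is buffered and the source is exhausted.
    bool eoi() { return pending_.empty() && source_.eoi(); }

    // "Current" data item; may only be called if !eoi().
    const element_type & get() {
        fill();
        return pending_.back();
    }

    // Discards the "current" data item.
    void next() {
        fill();
        pending_.pop_back();
    }

    // Equivalent to get(), then next().
    element_type consume() {
        element_type x = get();
        next();
        return x;
    }

    // Places x back into the input; x becomes the "current" item.
    void putback(const element_type & x) { pending_.push_back(x); }

private:
    // Pulls one item from the source if nothing is buffered.
    void fill() {
        if (pending_.empty()) {
            assert(!source_.eoi());
            pending_.push_back(source_.next());
        }
    }

    Stream source_;
    std::vector<element_type> pending_;  // back() is the "current" item
};
```

With this sketch, get() can be called repeatedly without advancing, and
an item passed to putback() is the next one returned by get() or
consume().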

A type is provided which converts any DataStream into a
BufferedDataStream. I've also done a few variations on this: a lot
of parsing DataStreams only need a single-element buffer, so I've
done BufferedDataStreams with no putback(). Another variation is to
allow the check for eoi() but not require it: have get() (for
BufferedDataStreams) and next() (for DataStreams) throw a special
exception type end_of_input. This might slow down the code (I
haven't done any testing), but it usually makes writing upper-layer
iterators easier, in my experience anyway.
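
To show what the exception-based variation could look like, here is a
small sketch. Everything in it is assumed for illustration: the
end_of_input type is only named in this post, not defined, and
ThrowingCharStream and read_words are hypothetical. The point is that
the upper layer never calls eoi(); running out of input just ends the
loop.

```cpp
#include <exception>
#include <string>
#include <vector>

// Hypothetical exception type signalling that the input is exhausted.
struct end_of_input : std::exception {};

// Hypothetical DataStream whose next() throws at end of input instead
// of requiring the caller to check eoi() first.
class ThrowingCharStream {
public:
    typedef char element_type;

    explicit ThrowingCharStream(const std::string & s) : data_(s), pos_(0) {}

    bool eoi() const { return pos_ == data_.size(); }  // still available

    element_type next() {
        if (eoi()) throw end_of_input();
        return data_[pos_++];
    }

private:
    std::string data_;
    std::string::size_type pos_;
};

// Upper-layer routine: splits the input into space-separated words.
// It relies on end_of_input rather than testing eoi() on every step.
template <class Stream>
std::vector<std::string> read_words(Stream & in) {
    std::vector<std::string> words;
    std::string current;
    try {
        for (;;) {
            char c = in.next();
            if (c == ' ') {
                if (!current.empty()) {
                    words.push_back(current);
                    current.clear();
                }
            } else {
                current += c;
            }
        }
    } catch (const end_of_input &) {
        // Input ran out; flush the word in progress, if any.
        if (!current.empty()) words.push_back(current);
    }
    return words;
}
```

The trade-off is the one mentioned above: the loop body is simpler, at
the possible cost of an exception on every end of input.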

Hope these ideas help!

        -Steve

