Boost logo

Boost :

From: Robert Ramey (ramey_at_[hidden])
Date: 2005-09-29 23:11:08


Daryle Walker wrote:

> But you got other problems, here's the entire class in
> <boost/archive/impl/basic_xml_oarchive.ipp>
>
> //========================================================================
> //...
> namespace boost {
> namespace archive {
>
> namespace detail {
> template<class CharType>
> struct XML_name {
> typedef bool result_type;
> void operator()(CharType t) const{
> unsigned char lookup_table[] = {
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0, // -.
> 1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0, // 0-9
> 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // A-
> 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1, // -Z _
> 0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, // a-
> 1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0, // -z
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
> 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
> };
> if((unsigned)t > 127)
> return;
> if(0 == lookup_table[(unsigned)t])
> boost::throw_exception(
> xml_archive_exception(
> xml_archive_exception::xml_archive_tag_name_error
> )
> );
> }
> };
>
> //...
> //========================================================================
>
> A static-const table would have to be created for each version of
> XML_name, but (in the code later in the file) you only use a "const
> char" version of the template. So why bother making this a template
> at all? Believe it or not, Boost code doesn't _have_ to be made of
> templates.

When I wrote this I didn't know that it woudl be used for only one kind of
character.

> * Why do you use "const char" instead of just "char" in your
> instantiations?

off hand I don't remember. But it might ot might not have had something to
do with
what happens when one passes a const argument to an argument taking a
non-const. This will generate a compile error for class types. I don't
remember if compilers complain for primitive types or not.

>
> * result_type doesn't match what operator() returns.

It seems that that is vestige of some earlier incarnation.

> * If the table is just 0's and 1's, why not just use "bool"? Is it
> because "bool" can be bigger than a byte?

yes

>
> * The code assumes that CharType is compatible with "unsigned". What
> if it isn't? This suggests to be that you should make a non-template
> function that optimizes the code for "char" and make the class
> template call that function (with appropriate conversions).
>
> * The code assumes that CharType's code points are ASCII compatible.

an oversight on my part.

> * XML is supposed to be Unicode-friendly. What about non-ASCII
> characters that could be used in XML names?

The code assumes that CharType can be converted to an unsigned integer.
This is true for the instantiation used.

> * As another poster said, what's wrong with using functions like
> "isalnum"? If you're afraid it may include non-ASCII letters and
> numbers, then just use a giant switch statement. Hopefully compiler
> makers haven't forgotten how to optimize those (usually into a table).

isalnum didn't include certain punctuation that is legal in xml tags so it
whould been
isalmum || ... . In fact, what is really needed is is_xxml_tag_char - so I
made this.

> * Why do bad values outside the range simply return while ones inside
> the range throw?

I don't remember why I did this now.

There is a whole layer going on here that isn't really apparent. When an
XML archive is read xml tags are converted to string variables for internal
usage withing the archive implementation. This presents a problem. There
are currently two cases

xml_?archive for normal 8 bit characters
and _w?archive for wide characters.

xml_archive just reads in the characters into the string. Whether the xml
is in ascii, the local multi-byte character set or utf8 is not taken into
account. That is the codecvt facet doesn't do anything.

xml_warchive presumes that the xml is coded in utf16. variables to be
placed in strings - such as xml tags are tranlated into multi-byte
characters in the current locale. Then they are checked here for invalid
xml_characters.

So what's really missing here is a good ix_xml_tag function. Given the
concerns raised in this email, it would seem that it should consider the
posibility of EBCDIC characters as well. Should anyone have an
implementation of such a function I would be pleased to consider it.

Robert Ramey


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk