Boost logo

Boost :

From: Vyacheslav E. Andrejev (mortituris_at_[hidden])
Date: 2007-03-08 12:52:47


Hello Joel,

JG> No, that does not happen. The signed(ness) is ignored by the
JG> char-set class. The full 8-bits is mapped to a 256-bit bitset. Do
JG> you see a need for negative char values?

Unfortunately, it does happen. Look at the definition of two functions. The
first is boost::spirit::utility::impl::construct_chset (in the file $(BOOST)/boost/spirit/utility/impl/chset.ipp):

    template <typename CharT, typename CharT2>
    void construct_chset(boost::shared_ptr<basic_chset<CharT> >& ptr,
            CharT2 const* definition)
    {
        CharT2 ch = *definition++;
        while (ch)
        {
            CharT2 next = *definition++;
            if (next == '-')
            {
                next = *definition++;
                if (next == 0)
                {
                    ptr->set(ch);
                    ptr->set('-');
                    break;
                }
                ptr->set(ch, next);
            }
            else
            {
                ptr->set(ch);
            }
            ch = next;
        }
    }

It is obvious that for our input "\x9\xA\xD\x20-\xFF" the function "ptr->set(ch,
next);" will be invoked with the first argument equals to '\x20', i.e. 32,
and the second equals to '\xff', i.e. -1. Yes, you right, the implementation
then will try to map these arguments into a bitset of unsigned values. But
look at the way it is implemented in boost::spirit::basic_chset_8bit<CharT>::set.
You can find the implementation of in $(BOOST)/boost/spirit/utility/impl/chset/basic_chset.ipp:

basic_chset_8bit<CharT>::set(CharT from, CharT to)
{
    for (int i = from; i <= to; ++i)
        bset.set((unsigned char)i);
}

There will be not a single iteration when from == 32 and to == -1.

To correct this issue I suggest change

Char = chset_t("\x9\xA\xD\x20-\xFF");

to

Char = chset_t("\x9\xA\xD\x20-\x7f\x80-\xFF");

The latter one will work independently of sign'ness of the char type.

JG> Do
JG> you see a need for negative char values?
No. But char is signed by default on my compiler. That fact along with improper
initialization of Char primitive caused me problems when I tried to implement
XML comments skipping in boost::serialization.

Regards.

--
-- Vyacheslav E. Andrejev
-- System Architect, Optech International, Inc.
-- E-mail: mortituris_at_[hidden]
JG> Vyacheslav E. Andrejev wrote:
JG> 
>> Hello All,
>> 
>> XML grammar parser in boost::serialization has a definition of spirit
>> primitive to deal with XML character set. Initialization of this
>> pimitive looks like following (file
>> $(BOOST)/libs/serialization/src/xml_grammar.cpp):
>> 
>> Char = chset_t("\x9\xA\xD\x20-\xFF");
>> 
>> Obviously, if char type is signed, then \xFF means -1 and the above
>> initialization will be equivalent to
>> 
>> Char = chset_t("\x9\xA\xD");
>> 
JG> No, that does not happen. The signed(ness) is ignored by the
JG> char-set class. The full 8-bits is mapped to a 256-bit bitset. Do
JG> you see a need for negative char values?
JG> 
JG> Regards,
JG> 
JG> _______________________________________________
JG> Unsubscribe & other changes:
JG> http://lists.boost.org/mailman/listinfo.cgi/boost

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk