From: Vyacheslav E. Andrejev (mortituris_at_[hidden])
Date: 2007-03-08 12:52:47
Hello Joel,
JG> No, that does not happen. The signed(ness) is ignored by the
JG> char-set class. The full 8-bits is mapped to a 256-bit bitset. Do
JG> you see a need for negative char values?
Unfortunately, it does happen. Look at the definitions of two functions. The
first is boost::spirit::utility::impl::construct_chset (in the file $(BOOST)/boost/spirit/utility/impl/chset.ipp):
template <typename CharT, typename CharT2>
void construct_chset(boost::shared_ptr<basic_chset<CharT> >& ptr,
        CharT2 const* definition)
{
    CharT2 ch = *definition++;
    while (ch)
    {
        CharT2 next = *definition++;
        if (next == '-')
        {
            next = *definition++;
            if (next == 0)
            {
                ptr->set(ch);
                ptr->set('-');
                break;
            }
            ptr->set(ch, next);
        }
        else
        {
            ptr->set(ch);
        }
        ch = next;
    }
}
For our input "\x9\xA\xD\x20-\xFF", the call "ptr->set(ch, next);" will be
invoked with the first argument equal to '\x20', i.e. 32, and, when char is
signed, the second equal to '\xff', i.e. -1. Yes, you are right that the
implementation will then try to map these arguments into a bitset of unsigned
values. But look at the way it is implemented in boost::spirit::basic_chset_8bit<CharT>::set.
You can find the implementation in $(BOOST)/boost/spirit/utility/impl/chset/basic_chset.ipp:
basic_chset_8bit<CharT>::set(CharT from, CharT to)
{
    for (int i = from; i <= to; ++i)
        bset.set((unsigned char)i);
}
The loop will not execute a single iteration when from == 32 and to == -1,
because the condition i <= to is already false on entry.
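The empty loop can be reproduced outside Spirit. The following is a minimal sketch, not the library code: bits_set_by_range and check_ranges are hypothetical names, and the character type is fixed to signed char so that the result does not depend on the compiler's default for plain char.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the quoted basic_chset_8bit<CharT>::set, with
// CharT fixed to signed char. Returns how many bits the loop sets.
inline std::size_t bits_set_by_range(signed char from, signed char to)
{
    std::bitset<256> bset;
    for (int i = from; i <= to; ++i)              // same loop as quoted above
        bset.set(static_cast<unsigned char>(i));  // cast happens too late
    return bset.count();
}

// Small self-check of the cases discussed in this message.
inline void check_ranges()
{
    // '\x20'-'\xFF' with a signed char: from == 32, to == -1, loop is skipped.
    assert(bits_set_by_range(0x20, -1) == 0);
    // '\x20'-'\x7f': both bounds non-negative, 96 characters are inserted.
    assert(bits_set_by_range(0x20, 0x7f) == 96);
    // '\x80'-'\xFF': both bounds negative (-128..-1), 128 characters.
    assert(bits_set_by_range(-128, -1) == 128);
}
```

Calling check_ranges() passes all three assertions: a range works as long as both bounds have the same sign, and is silently dropped when it crosses from non-negative to negative values.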
To correct this issue, I suggest changing
Char = chset_t("\x9\xA\xD\x20-\xFF");
to
Char = chset_t("\x9\xA\xD\x20-\x7f\x80-\xFF");
The latter will work regardless of the signedness of the char type.
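To compare the two definitions without building Spirit, here is a rough sketch that follows the control flow of the two quoted functions over a plain std::bitset. The names bits_t, set_one, set_range, build_chset and compare_definitions are assumptions made for illustration, not Spirit API, and every character is converted to signed char to emulate a compiler where char is signed.

```cpp
#include <bitset>
#include <cassert>

typedef std::bitset<256> bits_t;

inline void set_one(bits_t& b, signed char ch)
{
    b.set(static_cast<unsigned char>(ch));
}

// Same loop as the quoted basic_chset_8bit<CharT>::set, for signed char.
inline void set_range(bits_t& b, signed char from, signed char to)
{
    for (int i = from; i <= to; ++i)
        b.set(static_cast<unsigned char>(i));
}

// Same control flow as the quoted construct_chset.
inline bits_t build_chset(const char* definition)
{
    bits_t b;
    signed char ch = static_cast<signed char>(*definition++);
    while (ch)
    {
        signed char next = static_cast<signed char>(*definition++);
        if (next == '-')
        {
            next = static_cast<signed char>(*definition++);
            if (next == 0)
            {
                set_one(b, ch);
                set_one(b, '-');
                break;
            }
            set_range(b, ch, next);
        }
        else
        {
            set_one(b, ch);
        }
        ch = next;
    }
    return b;
}

inline void compare_definitions()
{
    const bits_t before = build_chset("\x9\xA\xD\x20-\xFF");
    const bits_t after  = build_chset("\x9\xA\xD\x20-\x7f\x80-\xFF");
    assert(!before.test(0x20) && !before.test('A'));  // range was dropped
    assert(after.test(0x20) && after.test('A') && after.test(0x80));
    assert(after.count() == 227);                     // 9, 10, 13 and 32..255
}
```

In this sketch the original definition loses the space character and everything up to 0xFE, while the split definition covers 0x9, 0xA, 0xD and the full 0x20 to 0xFF range. The end character of a range is also inserted on its own by the next pass through the loop, which is why 0xFF itself is still present in the original case.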
JG> Do
JG> you see a need for negative char values?
No. But char is signed by default on my compiler. That fact, along with the
improper initialization of the Char primitive, caused me problems when I tried
to implement skipping of XML comments in boost::serialization.
Regards.
--
Vyacheslav E. Andrejev
System Architect, Optech International, Inc.
E-mail: mortituris_at_[hidden]

JG> Vyacheslav E. Andrejev wrote:
JG>
>> Hello All,
>>
>> XML grammar parser in boost::serialization has a definition of spirit
>> primitive to deal with XML character set. Initialization of this
>> pimitive looks like following (file
>> $(BOOST)/libs/serialization/src/xml_grammar.cpp):
>>
>> Char = chset_t("\x9\xA\xD\x20-\xFF");
>>
>> Obviously, if char type is signed, then \xFF means -1 and the above
>> initialization will be equivalent to
>>
>> Char = chset_t("\x9\xA\xD");
>>
JG> No, that does not happen. The signed(ness) is ignored by the
JG> char-set class. The full 8-bits is mapped to a 256-bit bitset. Do
JG> you see a need for negative char values?
JG>
JG> Regards,
JG>
JG> _______________________________________________
JG> Unsubscribe & other changes:
JG> http://lists.boost.org/mailman/listinfo.cgi/boost
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk