Boost logo

Boost Users :

Subject: [Boost-users] [boost::property_tree] rapidxml get_index bug for UTF8 ?
From: ÇÇ־ǿ (qiaozhiqiang_at_[hidden])
Date: 2011-03-04 01:04:02


When use UTF8, non ASCII char is > 127, but char is signed,
So get_index() return a big value.
char c = -120;
get_index(c)

VC2010 say:
boost::property_tree::detail::rapidxml::internal::get_index<char> returned 4294967168 unsigned int

then
internal::lookup_tables<0>::lookup_whitespace[internal::get_index(ch)]
is error.

My patch:

        inline size_t get_index(const Ch c)
        {
            // *** char c (ASCII / UTF8) and wchar_t c: 0 ~ 127 is ASCII char
            size_t r = c; //******** convert to unsigned
            // If not ASCII char, its sematic is same as plain 'z'
            // if (c > 255) //********* ASSCII is 0 to 127
            if (r > 127) //******** check r, or check if(c < 0 || c > 127)
            {
                return 'z';
            }

            return r; //******** return r
        }
  
This is boost code:
boost_1_46_0\boost\property_tree\detail\rapidxml.hpp
         template<class Ch>
        inline size_t get_index(const Ch c)
        {
            // If not ASCII char, its sematic is same as plain 'z'
            if (c > 255)
            {
                return 'z';
            }
            return c;
        }

        // Detect whitespace character
        struct whitespace_pred
        {
            static unsigned char test(Ch ch)
            {
                return internal::lookup_tables<0>::lookup_whitespace[internal::get_index(ch)];
            }
        };

        // Detect node name character
        struct node_name_pred
        {
            static unsigned char test(Ch ch)
            {
                return internal::lookup_tables<0>::lookup_node_name[internal::get_index(ch)];
            }
        };


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net