Boost Users :

Subject: Re: [Boost-users] Boost's direction regarding UTF8 -> UTF32 and UTF32 -> UTF8
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2010-06-24 15:16:03


John Maddock wrote:

> Really? That would be a bug, the intention is that they should always
> throw an exception when given invalid input.

The iterator adapter has no way of knowing it has reached the end.

Consider this in u16_to_u32_iterator:

void increment()
{
    // skip high surrogate first if there is one:
    if(detail::is_high_surrogate(*m_position)) ++m_position;
    ++m_position;
    m_value = pending_read;
}

If the last code unit of the input is a high surrogate, increment()
advances m_position twice when only one increment is valid, so the
underlying iterator ends up past the end of the range.

Fixing the bug means making the iterator adapter have knowledge of the
beginning, the end, and the current position.
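
To make the point concrete, here is a minimal sketch of a decoding step that is handed the end of the range and checks it before touching a second code unit. The names decode_next, decode_all, is_high_surrogate and is_low_surrogate are hypothetical and written for this example only; this is neither the Boost.Regex code nor the interface of my library, and throwing std::out_of_range is just one possible error policy.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helpers for this sketch (not the Boost detail:: functions).
inline bool is_high_surrogate(std::uint16_t u) { return (u & 0xFC00u) == 0xD800u; }
inline bool is_low_surrogate(std::uint16_t u)  { return (u & 0xFC00u) == 0xDC00u; }

// Decode one code point from [pos, end) and advance pos past it.
// Because the end is known, a trailing high surrogate is reported
// as an error instead of stepping past the end of the range.
template <typename It>
std::uint32_t decode_next(It& pos, It end)
{
    assert(pos != end);  // precondition: the caller checks for end of input
    std::uint16_t first = *pos;
    ++pos;
    if (is_low_surrogate(first))
        throw std::out_of_range("unexpected low surrogate");
    if (!is_high_surrogate(first))
        return first;  // BMP code point, one code unit
    // A high surrogate needs a partner: check the bound before dereferencing.
    if (pos == end || !is_low_surrogate(*pos))
        throw std::out_of_range("unpaired high surrogate");
    std::uint16_t second = *pos;
    ++pos;
    return 0x10000u + ((std::uint32_t(first) - 0xD800u) << 10)
                    + (std::uint32_t(second) - 0xDC00u);
}

// Convenience wrapper: decode a whole UTF-16 buffer.
std::vector<std::uint32_t> decode_all(const std::vector<std::uint16_t>& in)
{
    std::vector<std::uint32_t> out;
    std::vector<std::uint16_t>::const_iterator pos = in.begin();
    while (pos != in.end())
        out.push_back(decode_next(pos, in.end()));
    return out;
}
```

With this shape, an input ending in a lone high surrogate throws rather than reading one element too far. An iterator adapter would need the same information, which is why it has to carry the end (and, for decrement, the beginning) alongside the current position.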

> Of course a more complete solution would always be welcome....

My library deals with this.
