Subject: Re: [Boost-users] Boost's direction regarding UTF8 -> UTF32 and UTF32 -> UTF8
From: John Maddock (boost.regex_at_[hidden])
Date: 2010-06-25 04:11:07


> The iterator adapter has no way of knowing it has reached the end.
>
> Consider this in u16_to_u32_iterator:
>
> void increment()
> {
> // skip high surrogate first if there is one:
> if(detail::is_high_surrogate(*m_position)) ++m_position;
> ++m_position;
> m_value = pending_read;
> }
>
> If the last character is a high surrogate, you increment the iterator
> twice, while it is only allowed to do it once.
>
> Fixing the bug means making the iterator adapter have knowledge of the
> beginning, the end, and the current position.

Ah, guilty as charged :-(

The fix is horrible though :-(
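
To make the shape of the problem concrete, here is a minimal sketch of a
forward-only decoder that carries the end of the range with it, so a
trailing high surrogate is never stepped over twice. This is not the
Boost.Regex code: the names checked_u16_to_u32_cursor, at_end, has_trail
and decode_utf16 are hypothetical, and reporting an unpaired surrogate as
U+FFFD is an assumption of the sketch rather than anything decided in this
thread. A bidirectional version would also need the beginning of the range,
which is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bounds-aware UTF-16 decoder; NOT Boost's u16_to_u32_iterator.
class checked_u16_to_u32_cursor {
public:
    checked_u16_to_u32_cursor(const char16_t* pos, const char16_t* end)
        : m_position(pos), m_end(end) {}

    bool at_end() const { return m_position == m_end; }

    // Decode the code point at the current position.
    // Assumption of this sketch: unpaired surrogates decode to U+FFFD.
    char32_t operator*() const {
        assert(!at_end());
        const char16_t lead = *m_position;
        if (is_high_surrogate(lead)) {
            if (!has_trail()) return 0xFFFD;
            const char16_t trail = m_position[1];
            return 0x10000 + ((char32_t(lead) - 0xD800) << 10)
                           + (char32_t(trail) - 0xDC00);
        }
        if (is_low_surrogate(lead)) return 0xFFFD;
        return lead;
    }

    void increment() {
        assert(!at_end());
        // Skip the trail unit only when one is actually present before m_end.
        if (is_high_surrogate(*m_position) && has_trail()) ++m_position;
        ++m_position;
    }

private:
    static bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
    static bool is_low_surrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

    // True if a low surrogate follows the current unit inside the range.
    bool has_trail() const {
        return m_end - m_position >= 2 && is_low_surrogate(m_position[1]);
    }

    const char16_t* m_position;
    const char16_t* m_end;
};

// Convenience wrapper: decode a whole buffer of UTF-16 code units.
std::vector<char32_t> decode_utf16(const std::vector<char16_t>& in) {
    std::vector<char32_t> out;
    const char16_t* first = in.data();
    checked_u16_to_u32_cursor cur(first, first + in.size());
    for (; !cur.at_end(); cur.increment()) out.push_back(*cur);
    return out;
}
```

The cost is visible even in this small version: every cursor holds two
pointers instead of one, and every increment and dereference has to consult
the end of the range.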

>> Of course a more complete solution would always be welcome....
>
> My library deals with this.

Good!

As I keep saying, this was only supposed to be an interim solution until
someone did it properly... I'll look forward to seeing yours being reviewed!

Cheers, John.

