Boost :


From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2004-10-21 10:06:27

Next message: Robert Ramey: "[boost] Re: [serialize] xml archives and base 64 encoded binary data"
Previous message: Rogier van Dalen: "Re: [boost] Re: Re: Re: Any interest in adding unicode support to boost?"
In reply to: Eric Niebler: "[boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Eric Niebler: "[boost] Re: Any interest in adding unicode support to boost?"
Reply: Eric Niebler: "[boost] Re: Any interest in adding unicode support to boost?"

On Wed, 20 Oct 2004 12:48:31 -0700, Eric Niebler
<eric_at_[hidden]> wrote:
> I think the default should be UTF-16 encoding, and that the iterator
> should use a scheme like this to be random access. Rationale: there are
> string algorithms that benefit from random access (Boyer-Moore comes to
> mind).

Correct me if I'm wrong. From what I gather from a Google search,
Boyer-Moore is a fast string search algorithm. Why not run the
algorithm on the code units rather than on codepoints? Neither UTF-8
nor UTF-16 is stateful, specifically to allow optimisations such as
this (as well as error recovery): a match of well-formed text found on
code units always starts and ends on codepoint boundaries.

As was pointed out earlier in this thread, searching for Unicode
characters requires looking at combining characters as well. I think
this holds for many, if not all, algorithms you can think of: either
they can be made to work on code units, or they must work on abstract
characters, which are variable-width sequences of codepoints anyway,
so random access by codepoint does not help.
(See the Unicode Standard 4, Section 2.5 for a similar argument for
UTF-16 over UTF-32, even though the latter is fixed-width.)
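To illustrate the combining-character problem: searching the code units of "cafe" followed by U+0301 COMBINING ACUTE ACCENT for "cafe" finds a match, even though the last abstract character of the text is "é", not "e". The sketch below, again assuming C++17, keeps the fast code-unit search but rejects any match that is immediately followed by a combining mark. Both function names are hypothetical, and is_combining_mark deliberately covers only the Combining Diacritical Marks block (U+0300..U+036F); a real implementation would consult the Unicode character database or the grapheme cluster rules of UAX #29.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical, deliberately incomplete test: only the Combining
// Diacritical Marks block (U+0300..U+036F) is recognised.
bool is_combining_mark(char16_t cu) {
    return cu >= 0x0300 && cu <= 0x036F;
}

// Searches code units with Boyer-Moore, but skips any match that is
// immediately followed by a combining mark, since the match would then
// end in the middle of an abstract character.
std::size_t find_whole_characters(const std::u16string& haystack,
                                  const std::u16string& needle) {
    if (needle.empty()) return 0;
    std::boyer_moore_searcher searcher(needle.begin(), needle.end());
    auto first = haystack.begin();
    while (true) {
        auto it = std::search(first, haystack.end(), searcher);
        if (it == haystack.end()) return std::u16string::npos;
        auto after = it + static_cast<std::ptrdiff_t>(needle.size());
        if (after == haystack.end() || !is_combining_mark(*after))
            return static_cast<std::size_t>(it - haystack.begin());
        first = it + 1;  // rejected match: resume searching just past its start
    }
}
```

With u"cafe\u0301 cafe" as the haystack, the code-unit match at offset 0 is rejected and the function returns 6, the start of the second, unaccented "cafe".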

I'm ready to be proven wrong, but at the moment, at least, I believe
that any effort to make UTF-16 randomly accessible would not be
useful.

Regards,
Rogier

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk