From: Eric Niebler (eric_at_[hidden])
Date: 2004-10-21 14:10:28

Rogier van Dalen wrote:
> On Wed, 20 Oct 2004 12:48:31 -0700, Eric Niebler
> <eric_at_[hidden]> wrote:
>>I think the default should be UTF-16 encoding, and that the iterator
>>should use a scheme like this to be random access. Rationale: there are
>>string algorithms that benefit from random access (Boyer-Moore comes to
> Correct me if I'm wrong. From what I gather from a Google search,
> Boyer-Moore is a fast string search algorithm. Why not use the
> algorithm on the code units rather than codepoints? UTF-8 and UTF-16
> are both not stateful, specifically to allow optimisations such as
> this (as well as error recovery).

Searching a Unicode string for a particular bit pattern is not
particularly meaningful because the same string can be represented with
different bit patterns. Have I misinterpreted what you are suggesting?
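
To illustrate with a minimal sketch (the helper name contains_code_units
and the sample strings are made up for this example, not taken from any
proposed library): the word "café" can be stored with a precomposed
U+00E9, or as a plain 'e' followed by U+0301 COMBINING ACUTE ACCENT. Both
are canonically equivalent, yet a search over raw UTF-16 code units does
not find one form inside the other.

```cpp
#include <string>

// "café" using the precomposed character U+00E9 (4 code units).
const std::u16string composed = u"caf\u00E9";

// The same text as 'e' + U+0301 COMBINING ACUTE ACCENT (5 code units).
const std::u16string decomposed = u"cafe\u0301";

// Hypothetical helper: plain code-unit substring search, with no
// normalization applied to either argument.
bool contains_code_units(const std::u16string& haystack,
                         const std::u16string& needle)
{
    return haystack.find(needle) != std::u16string::npos;
}
```

Here contains_code_units(decomposed, composed) returns false even though
both strings denote the same text. A code-unit search would presumably
only be reliable if both strings were first brought to the same
normalization form.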

Eric Niebler
Boost Consulting
