From: Eric Niebler (eric_at_[hidden])
Date: 2004-10-21 14:10:28


Rogier van Dalen wrote:
> On Wed, 20 Oct 2004 12:48:31 -0700, Eric Niebler
> <eric_at_[hidden]> wrote:
>
>>I think the default should be UTF-16 encoding, and that the iterator
>>should use a scheme like this to be random access. Rationale: there are
>>string algorithms that benefit from random access (Boyer-Moore comes to
>>mind).
>
>
> Correct me if I'm wrong. From what I gather from a Google search,
> Boyer-Moore is a fast string search algorithm. Why not use the
> algorithm on the code units rather than code points? Neither UTF-8
> nor UTF-16 is stateful, specifically to allow optimisations such as
> this (as well as error recovery).
>

Searching a Unicode string for a particular bit pattern is not
particularly meaningful because the same string can be represented with
different bit patterns. Have I misinterpreted what you are suggesting?
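
A minimal sketch of the concern above, assuming the "different bit patterns" are canonically equivalent forms (a precomposed character versus a base letter plus a combining mark); the message does not say which case is meant. The names `precomposed`, `decomposed`, and `contains_code_units` are hypothetical and chosen only for illustration. Both strings render as "café", yet a search over raw UTF-8 code units finds neither inside the other.

```cpp
#include <string>

// "café" with "é" as the single precomposed code point U+00E9 (UTF-8: C3 A9).
const std::string precomposed = "caf\xC3\xA9";

// "café" with "é" as "e" followed by U+0301 COMBINING ACUTE ACCENT (UTF-8: CC 81).
const std::string decomposed = "cafe\xCC\x81";

// Code-unit search: true only if the needle's bytes occur contiguously
// in the haystack. No normalization is performed (hypothetical helper).
bool contains_code_units(const std::string& haystack, const std::string& needle) {
    return haystack.find(needle) != std::string::npos;
}
```

Calling `contains_code_units(precomposed, decomposed)` returns false even though the two strings are canonically equivalent. A code-unit search such as Boyer-Moore over bytes would only match equivalent text if both inputs were first brought to the same normalization form.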

-- 
Eric Niebler
Boost Consulting
www.boost-consulting.com
