From: Rob Stewart (stewart_at_[hidden])
Date: 2004-10-20 15:20:47

From: "Erik Wien" <wien_at_[hidden]>
> "Rogier van Dalen" <rogiervd_at_[hidden]> wrote in message
> > On Wed, 20 Oct 2004 15:51:21 +0300, Peter Dimov <pdimov_at_[hidden]> wrote:
> >>
> >> Or maybe you are arguing that the string should always be kept in a
> >> particular normalized form?
> >
> > That seems to be the only way of keeping comparison, search, etcetera,
> > implementable in terms of char_traits<> functions --- and so, the only
> > way of getting performance similar to std::basic_string<>'s.
> >
> > Note that normalisation of any kind requires access to the Unicode
> > Character Database, which may take some time, especially if the
> > relevant parts happen not to be in the processor cache.
> >
> > Comparing any Unicode data in different or unknown normalisation forms
> > will therefore by definition be slow.
> True. So what we basically need to determine is which is more critical:
> fast comparison of strings (strings always represented in a given NF), or
> fast general string handling (NF determined when needed)?

What if the class had the option, at least, to hold multiple
forms, creating each on demand? Then, the operations you invoke
would simply request the particular form they require. If that
form is not currently available, it is generated.

That approach means you need a dirty flag, set by mutating
operations, to know when to invalidate the secondary forms. I can
envision thrashing: an operation requiring a secondary form
triggers a mutation, which invalidates that form, only for it to
be needed again immediately thereafter.
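
A minimal sketch of the on-demand idea with invalidation, under loud
assumptions: the class name multi_form_string, the Form tags and the
convert() helper are all hypothetical, and ASCII case mapping stands in
for real Unicode normalisation so that the example runs without the
Unicode Character Database. An empty cache slot plays the role of the
dirty flag. It needs C++17 for std::optional.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical form tags; real Unicode forms would be NFC, NFD, and so on.
enum class Form : std::size_t { Lower = 0, Upper = 1, Count = 2 };

// Stand-in for a real normaliser (assumption: ASCII case mapping only).
inline std::string convert(const std::string& s, Form f) {
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(), [f](unsigned char c) {
        return static_cast<char>(f == Form::Lower ? std::tolower(c)
                                                  : std::toupper(c));
    });
    return out;
}

class multi_form_string {
public:
    explicit multi_form_string(std::string s) : primary_(std::move(s)) {}

    // Mutating operation: change the primary text and drop every cached form.
    void append(const std::string& tail) {
        primary_ += tail;
        for (auto& slot : cache_) slot.reset();
    }

    // Return the requested form, generating it only if it is not cached.
    const std::string& as(Form f) const {
        const auto i = static_cast<std::size_t>(f);
        assert(i < cache_.size());
        if (!cache_[i]) {
            cache_[i] = convert(primary_, f);
            ++conversions_;  // counts how often a form had to be rebuilt
        }
        return *cache_[i];
    }

    std::size_t conversions() const { return conversions_; }

private:
    std::string primary_;
    mutable std::array<std::optional<std::string>,
                       static_cast<std::size_t>(Form::Count)> cache_;
    mutable std::size_t conversions_ = 0;
};
```

With this layout, a loop that alternates append() and as(Form::Upper)
rebuilds the upper-case form on every iteration, which is the thrashing
described above; conversions() makes that cost visible.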

It might also be possible to mutate all currently available
generated forms, but then every mutation pays for each of those
forms as well, so the complexity guarantees are affected.
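
A rough sketch of that alternative, again with hypothetical names
(eager_form_string, to_upper_ascii) and ASCII upper-casing standing in
for normalisation. Here append() updates the secondary form in place if
it already exists. This only works piecewise because the stand-in
conversion distributes over concatenation; the Unicode normalisation
forms are not closed under concatenation, so a real implementation
would probably have to renormalise around the join, or the whole
string.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Stand-in for a real normaliser (assumption: ASCII upper-casing only).
inline std::string to_upper_ascii(const std::string& s) {
    std::string out;
    out.reserve(s.size());
    for (unsigned char c : s) {
        out.push_back(static_cast<char>(std::toupper(c)));
    }
    return out;
}

class eager_form_string {
public:
    explicit eager_form_string(std::string s) : primary_(std::move(s)) {}

    const std::string& primary() const { return primary_; }

    // Generated on first request, then kept up to date by append().
    const std::string& upper() const {
        if (!upper_) upper_ = to_upper_ascii(primary_);
        return *upper_;
    }

    // Mutation updates an existing secondary form instead of dropping it,
    // so the cost of append() now depends on which forms exist.
    void append(const std::string& tail) {
        primary_ += tail;
        if (upper_) {
            *upper_ += to_upper_ascii(tail);
            ++secondary_updates_;
        }
    }

    std::size_t secondary_updates() const { return secondary_updates_; }

private:
    std::string primary_;
    mutable std::optional<std::string> upper_;
    std::size_t secondary_updates_ = 0;
};
```

The secondary form is never regenerated from scratch after the first
request, but append() does extra work for every form that exists, so
its cost is no longer that of a plain string append.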

Rob Stewart                           stewart_at_[hidden]
Software Engineer           
Susquehanna International Group, LLP  using std::disclaimer;
