From: Johan Råde (rade_at_[hidden])
Date: 2007-06-23 05:22:56


Peter Bindels wrote:
> On 23/06/07, Mathias Gaunard <mathias.gaunard_at_[hidden]> wrote:
>> Peter Bindels wrote:
>>> When searching ASCII text, it's
>>> equal;
>> Not if you handle grapheme clusters.
>>
>> If your text is "abcfoôdef", with ô coded as o + combining accent, then
>> searching for "foo" shouldn't work, since you would only find part of
>> the grapheme cluster and possibly do weird things if for example the
>> substring is removed.
>
> Neither combining accents, nor in fact any accented character, were in
> ASCII last time I checked.

Exactly what question is being discussed here?
I thought the question was: how fast is text search on UTF-8 strings that happen
to contain only ASCII, compared with text search on plain ASCII strings?
Even if a UTF-8 string happens to contain only ASCII,
the search algorithm may still have to check for combining characters.
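
To make that extra check concrete, here is a rough sketch of a byte-level UTF-8 search that rejects a match when it is immediately followed by a combining mark. The names find_cluster_safe and starts_with_combining_mark are made up for illustration and are not part of any Boost library. The sketch only recognises the Combining Diacritical Marks block (U+0300..U+036F); real grapheme cluster segmentation covers far more than that.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the bytes at `pos` encode a code point in
// U+0300..U+036F (UTF-8: 0xCC 0x80 .. 0xCD 0xAF). Simplification: other
// combining marks and grapheme extenders are not recognised.
bool starts_with_combining_mark(const std::string& s, std::size_t pos) {
    assert(pos <= s.size());
    if (pos + 1 >= s.size()) return false;  // fewer than two bytes left
    const unsigned char b0 = static_cast<unsigned char>(s[pos]);
    const unsigned char b1 = static_cast<unsigned char>(s[pos + 1]);
    if (b0 == 0xCC) return b1 >= 0x80 && b1 <= 0xBF;  // U+0300..U+033F
    if (b0 == 0xCD) return b1 >= 0x80 && b1 <= 0xAF;  // U+0340..U+036F
    return false;
}

// Hypothetical search: like std::string::find, but skips any match whose
// last character is extended by a following combining mark.
std::size_t find_cluster_safe(const std::string& haystack,
                              const std::string& needle) {
    std::size_t pos = haystack.find(needle);
    while (pos != std::string::npos) {
        if (!starts_with_combining_mark(haystack, pos + needle.size()))
            return pos;
        pos = haystack.find(needle, pos + 1);  // try the next candidate
    }
    return std::string::npos;
}
```

With Mathias's example written as "abcfoo\xCC\x82" "def" (o followed by U+0302, with the literal split so that the hex escape does not swallow the "d"), searching for "foo" returns std::string::npos, while searching plain "abcfoodef" returns 3. For an all-ASCII haystack the extra cost is one two-byte comparison per candidate match.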

The wider question is: should people who currently use ASCII, care a lot about performance,
and do not care about i18n switch to UTF-8?

It would certainly simplify life if all strings were UTF-n.
I have had to deal with BSTR, CString, QString and others.
Just having to deal with a single string type would be a good thing.

--Johan Råde

