From: Sean Parent (sparent_at_[hidden])
Date: 2006-07-05 20:39:29
I don't have enough time to delve deeply into this thread but I
thought I'd make a few passing comments.
Adobe has a fairly major string class problem (we joke that every
project must have its own string class, which is nearly true).
There is no such thing as a single type of string. There are _many_
purposes, and you need to be able to handle things like language and
style runs, very large blocks of text with efficient edits, UI
substitutions (which are aware of things like split negation and
masculine/feminine forms), language-based ordering, different
encodings...
We need another string class like a hole in the head.
What we do need are good standard algorithms that can be applied
to any string class.
I believe this is doable with the current iterator interface.
I believe it's possible (meaning I've done some quick experiments) to
define an input iterator (actually as strong as a non-mutating
forward iterator) and an output iterator that do encoding
conversions. This means that you can define operations in terms of a
single Unicode encoding (UTF-32, say), whatever the underlying storage
(though some operations such as ordering may still require a locale).
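
The output half of that idea might be sketched as follows. This is
only an illustration, not the ASL code: the names utf8_encode_iterator,
utf8_encoder and to_utf8 are made up for the example, and it assumes
the incoming values are valid Unicode code points (no validation).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adapter (name is an assumption): accepts UTF-32 code points
// and writes the equivalent UTF-8 code units to the wrapped output iterator.
template <typename O>
class utf8_encode_iterator {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    explicit utf8_encode_iterator(O out) : out_(out) {}

    // Assigning a code point emits one to four UTF-8 code units.
    utf8_encode_iterator& operator=(char32_t cp) {
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
        return *this;
    }

    // Usual output-iterator idiom: dereference and increment are no-ops.
    utf8_encode_iterator& operator*() { return *this; }
    utf8_encode_iterator& operator++() { return *this; }
    utf8_encode_iterator& operator++(int) { return *this; }

    O base() const { return out_; }

private:
    void put(char32_t unit) { *out_++ = static_cast<char>(unit); }
    O out_;
};

// Hypothetical convenience function so the adapter type is deduced.
template <typename O>
utf8_encode_iterator<O> utf8_encoder(O out) {
    return utf8_encode_iterator<O>(out);
}

// Example use: any algorithm that writes code points can target UTF-8.
inline std::string to_utf8(const std::u32string& text) {
    std::string result;
    std::copy(text.begin(), text.end(),
              utf8_encoder(std::back_inserter(result)));
    return result;
}
```

Because the adapter only needs "something you can write code units
to", the same std::copy call would work for a vector, deque or list of
char by swapping the back_inserter target.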
Consider -
to_lower(first, last, output)
to_upper(first, last, output)
Such transformations can work with any encoding (you can uppercase
UTF-8 into UTF-32). They can't work in situ, but I don't think
to_upper or to_lower really can work in situ anyway: certainly not in
UTF-8, probably not in UTF-16, and I believe there are some
multi-character forms that break even in UTF-32, because case mapping
can change the length of the text. It is possible, though, to wrap
them with a replace function for in-place operations.
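
To make the shape of such an algorithm concrete, here is a rough
sketch. The case mapping is a toy that I am assuming for illustration
(ASCII, part of Latin-1, and the German sharp s); a real version would
need the Unicode case-mapping tables. The wrapper name
to_upper_in_place is also made up.

```cpp
#include <iterator>
#include <string>

// Reads code points from [first, last) and writes upper-cased code points
// to output. Toy mapping only: a-z, U+00E0..U+00FE (except U+00F7, the
// division sign), and U+00DF (sharp s), which expands to "SS".
template <typename I, typename O>
O to_upper(I first, I last, O output) {
    for (; first != last; ++first) {
        const char32_t c = *first;
        if (c == 0x00DF) {
            // One code point in, two out: this is why the transformation
            // cannot overwrite its input in place, even in UTF-32.
            *output++ = U'S';
            *output++ = U'S';
        } else if ((c >= U'a' && c <= U'z') ||
                   (c >= 0x00E0 && c <= 0x00FE && c != 0x00F7)) {
            *output++ = static_cast<char32_t>(c - 0x20);
        } else {
            *output++ = c;
        }
    }
    return output;
}

// Hypothetical in-place wrapper: transform into a temporary sequence,
// then replace the original contents with the result.
template <typename String>
void to_upper_in_place(String& s) {
    String result;
    to_upper(s.begin(), s.end(), std::back_inserter(result));
    s.swap(result);
}
```

With a std::u32string holding "straße" (six code points), the wrapper
leaves "STRASSE" (seven code points) behind, so the length change is
absorbed by the replace step rather than by the algorithm itself.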
The current std::find() will work with such iterator adapters to find
a single UTF-32 character (in a sequence of any encoding).
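
As a sketch of what that could look like for UTF-8 storage: the
adapter below is not the ASL one, the names utf8_decode_iterator and
find_code_point are assumptions, and it expects well-formed UTF-8
(no validation).

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adapter: walks UTF-8 code units, yields UTF-32 code points.
template <typename I>
class utf8_decode_iterator {
public:
    // Tagged as an input iterator because operator* returns by value;
    // copies can still be re-traversed, since nothing is consumed.
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;

    explicit utf8_decode_iterator(I pos) : pos_(pos) {}

    char32_t operator*() const {
        I p = pos_;
        const unsigned char lead = static_cast<unsigned char>(*p);
        const std::size_t n = length(lead);
        // Keep only the payload bits of the lead byte.
        char32_t cp = (n == 1) ? lead : (lead & (0x7Fu >> n));
        for (std::size_t i = 1; i < n; ++i) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3Fu);
        }
        return cp;
    }

    // Step over every code unit of the current code point.
    utf8_decode_iterator& operator++() {
        const unsigned char lead = static_cast<unsigned char>(*pos_);
        std::advance(pos_, static_cast<std::ptrdiff_t>(length(lead)));
        return *this;
    }
    utf8_decode_iterator operator++(int) {
        utf8_decode_iterator old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const utf8_decode_iterator& a,
                           const utf8_decode_iterator& b) {
        return !(a == b);
    }

    I base() const { return pos_; }  // position in the underlying sequence

private:
    // Number of code units in the sequence introduced by this lead byte.
    static std::size_t length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x6) return 2;
        if ((lead >> 4) == 0xE) return 3;
        return 4;
    }
    I pos_;
};

// Plain std::find over the adapted range; returns the code-unit offset of
// the first match, or std::string::npos if the code point is absent.
inline std::size_t find_code_point(const std::string& utf8, char32_t cp) {
    using iter = utf8_decode_iterator<std::string::const_iterator>;
    const iter hit = std::find(iter(utf8.begin()), iter(utf8.end()), cp);
    return hit.base() == utf8.end()
               ? std::string::npos
               : static_cast<std::size_t>(hit.base() - utf8.begin());
}
```

Since the comparison happens on decoded code points, searching for
U+00AC does not match the trailing 0xAC code unit of an encoded euro
sign, which a byte-wise find on the raw string would.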
Currently with ASL we're taking such an approach for localization
strings: replacing an existing string class for localized strings at
Adobe with a small set of functions and _any_ string class (any
sequence of code units), including std::string and std::vector (or
deque or list).
You might take a look here for some ideas:
<http://opensource.adobe.com/group__asl__xstring.html>.
Sean