From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2008-02-19 07:40:28
Phil Endecott wrote:
> Felipe Magno de Almeida wrote:
>> On Fri, Feb 15, 2008 at 3:54 PM, Phil Endecott wrote:
>>> This week I
>>> have been writing some UTF-8 encoding and decoding and
>>> Unicode<->iso8859 conversion algorithms. They seem to be faster than
>>> the libc implementations which is satisfying especially as I haven't
>>> even started on the serious optimisations yet. This will be part of
>>> the strings-tagged-with-character-sets stuff that I have described
>>> before. Anyone interested?
>>
>> Sure. Though I'm most interested in all charset conversions. But the
>> most usual is enough to speed up my application *a lot*.
>
> Thanks to everyone who expressed an interest.
>
> I will attempt to have some sort of documentation and code available in
> the next few days. Pester me if I don't produce anything.
OK, the code is here:
http://svn.chezphil.org/libpbe/trunk/include/charset/
and there are some very basic docs here:
http://svn.chezphil.org/libpbe/trunk/doc/charsets/
(Have a look at intro.txt for the feature list.)
This code is not yet Boostified (namespaces, directory layout, etc.).
Most of it compiles, but it has hardly been exercised at all.
The functionality includes conversion between UTF-8, UCS-2, UCS-4,
ASCII and ISO-8859-*.
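To give a flavour of the kind of conversion involved, here is a minimal standalone sketch of ISO-8859-1 to UTF-8. It is an illustration only and not the code in the repository above: the function name latin1_to_utf8 is hypothetical, and it relies on the fact that each ISO-8859-1 byte value equals its Unicode code point, so only the one- and two-byte UTF-8 forms are ever needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of libpbe: convert ISO-8859-1 bytes to UTF-8.
// ISO-8859-1 byte values 0x00-0xFF map directly to code points U+0000-U+00FF.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every input byte becomes two
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            // ASCII range: one UTF-8 unit, unchanged.
            out.push_back(static_cast<char>(c));
        } else {
            // U+0080..U+00FF: two units, 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

The other ISO-8859-* sets would need a lookup table for the upper half rather than this direct mapping.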
Things I'd appreciate feedback on:
- What should the cs_string look like? Basically, everywhere that
std::string takes an integer position, I have the choice of a character
position, a unit position, or an iterator - or of not providing that function.
- What character sets are people interested in using (a) at the "edges"
of their programs, and (b) in the "core"?
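To make the position question in the first point concrete, here is a small standalone sketch of how character positions and unit positions diverge in UTF-8. It is not the cs_string interface; the names utf8_char_count and utf8_unit_offset are hypothetical, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if this byte starts a character, i.e. is not a 10xxxxxx continuation.
inline bool utf8_is_lead(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) != 0x80;
}

// Number of characters (code points) in well-formed UTF-8.
std::size_t utf8_char_count(const std::string& s)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        if (utf8_is_lead(s[i])) ++n;
    return n;
}

// Unit (byte) offset of the character at char_pos.
// Returns s.size() if char_pos is at or past the end.
std::size_t utf8_unit_offset(const std::string& s, std::size_t char_pos)
{
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (utf8_is_lead(s[i])) {
            if (char_pos == 0) return i;
            --char_pos;
        }
    }
    return s.size();
}
```

With a string such as "h\xC3\xA9llo", size() reports 6 units but there are 5 characters, and character position 2 lives at unit offset 3. Indexing by character position therefore costs a linear scan, which is the trade-off against unit positions and iterators.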
Regards, Phil.