
Subject: Re: [boost] GSoC Unicode library: second preview
From: Artyom (artyomtnk_at_[hidden])
Date: 2009-06-20 12:11:08

Next message: Ilya Bobir: "Re: [boost] DateTime conversion to tm problem?"
Previous message: troy d. straszheim: "[boost] GSOC unicode and py3k"
Maybe in reply to: Mathias Gaunard: "[boost] GSoC Unicode library: second preview"
Next in thread: Scott McMurray: "Re: [boost] GSoC Unicode library: second preview"
Reply: Scott McMurray: "Re: [boost] GSoC Unicode library: second preview"
Reply: Mathias Gaunard: "Re: [boost] GSoC Unicode library: second preview"

Hello,

> Here is the documentation of the
> current state of the Unicode library that I am doing as a
> google summer of code project:
> http://blogloufoque.free.fr/unicode/doc/html/
[snip]

Where is the source code?

....

Some notes:

> UTF-16 ... This is the recommended encoding for dealing with
> Unicode internally for general purposes

To be honest, it is the most error-prone encoding for working with Unicode:

1. It is a variable-length encoding.
2. Surrogate characters are quite rare, so bugs related to them are
very hard to find.
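
To make the two points concrete, here is a minimal sketch of how a code
point is turned into UTF-16 code units. The function name `encode_utf16`
is made up for illustration and is not part of the proposed library; it
assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encode one code point as UTF-16 code units.
// Assumes cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // BMP code point: exactly one 16-bit code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Outside the BMP: split the 20 remaining bits into a surrogate pair.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low
    }
    return out;
}
```

Code that only ever sees BMP text exercises the first branch, which is
why the second one tends to stay untested.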

It was mostly born as a "mistake" at the beginning of Unicode,
when it was believed that 16 bits were enough for a single code point.
So many software platforms adopted a 16-bit encoding that supported
only the BMP. As a result, you can **easily** find a **huge** number of
bugs in code that uses UTF-16. In most cases such bugs
are hard to track down because these code points are rare.

For example, try to edit a file name in Windows that contains a character
outside the BMP: you will see that you need to press "delete" twice. Try
to type such a character in a Qt3 application... it just does not work.
There are many such examples.
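
The "delete twice" behaviour is what you would expect from code that
removes one code unit at a time. A rough sketch of the difference,
assuming the text is held in a `std::u16string` (both function names are
hypothetical, and I have not seen the actual Windows or Qt3 code):

```cpp
#include <cassert>
#include <string>

inline bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Naive version: removes a single code unit, so a non-BMP character
// needs two calls and the string is ill-formed in between.
void erase_last_unit(std::u16string& s) {
    if (!s.empty()) s.pop_back();
}

// Surrogate-aware version: removes a whole code point.
void erase_last_code_point(std::u16string& s) {
    if (s.empty()) return;
    char16_t last = s.back();
    s.pop_back();
    // If we just removed a low surrogate, remove its high surrogate too.
    if (is_low_surrogate(last) && !s.empty() && is_high_surrogate(s.back()))
        s.pop_back();
}
```

Note that even the second version only deals with code points; combining
sequences would need proper boundary analysis on top of it.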

So I would be wary of recommending this encoding as the internal encoding
just because many platforms use it.

> UTF-32 ... This encoding isn't really recommended

As I mentioned above, that is not quite true: it is a much safer encoding
to work with.

So I would recommend not writing such "suggestions".
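
What I mean by "safer" is that in UTF-32 one element of the string is
always one code point, so indexing and counting cannot split anything.
A small sketch of a UTF-16 to UTF-32 conversion, again with a made-up
function name and with unpaired surrogates replaced by U+FFFD as an
assumed error policy:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 into UTF-32.
// Unpaired surrogates become U+FFFD (an assumption, other policies exist).
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        bool high = (u >= 0xD800 && u <= 0xDBFF);
        if (high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            char32_t lo = in[i + 1];
            u = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
            ++i;
        } else if (u >= 0xD800 && u <= 0xDFFF) {
            u = 0xFFFD; // lone surrogate
        }
        out.push_back(u);
    }
    return out;
}
```

After the conversion, `size()` is the number of code points and `s[i]`
is the i-th code point, which is not true of the UTF-16 input.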

More notes:
-----------

- For boundary checks I'd suggest using an ICU- or Qt4-like API: iterate
over the string and return the next boundary each time, rather than
checking whether there is a boundary at a specific character.

- Examples and more description are required.
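
To illustrate the iteration style I have in mind for boundaries, here is
a rough sketch. The class `code_point_boundaries` is hypothetical, and to
keep it short it only finds code point boundaries in UTF-16; a real
implementation would apply the grapheme, word or line break rules instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical iterator: each call to next() returns the offset of the
// next boundary, or std::u16string::npos when the end has been reached.
// The string must outlive the iterator.
class code_point_boundaries {
public:
    explicit code_point_boundaries(const std::u16string& s) : s_(s), pos_(0) {}

    std::size_t next() {
        if (pos_ >= s_.size()) return std::u16string::npos;
        char16_t u = s_[pos_++];
        // Keep a valid surrogate pair together.
        if (u >= 0xD800 && u <= 0xDBFF && pos_ < s_.size() &&
            s_[pos_] >= 0xDC00 && s_[pos_] <= 0xDFFF)
            ++pos_;
        return pos_;
    }

private:
    const std::u16string& s_;
    std::size_t pos_;
};

// Usage: collect every boundary by walking the string once.
std::vector<std::size_t> all_boundaries(const std::u16string& s) {
    std::vector<std::size_t> result;
    code_point_boundaries it(s);
    for (std::size_t b = it.next(); b != std::u16string::npos; b = it.next())
        result.push_back(b);
    return result;
}
```

The point of this shape is that the iterator keeps its state between
calls, so the caller never has to ask "is there a boundary at offset N?"
for every N.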

Artyom


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk