Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Artyom Beilis (artyomtnk_at_[hidden])
Date: 2011-08-11 07:03:45

Next message: Stewart, Robert: "Re: [boost] [Containers] Review"
Previous message: Vadim Stadnik: "Re: [boost] [Autosave] Re: [math][accumulators] Empirical distribution function"
In reply to: Soares Chen Ruo Fei: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"
Next in thread: Daniel James: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"
Reply: Daniel James: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"
Reply: Soares Chen Ruo Fei: "Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter"

>From: Soares Chen Ruo Fei <crf_at_[hidden]>
>To: boost_at_[hidden]
>Sent: Tuesday, August 9, 2011 10:53 AM
>Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
>
>My post has probably slipped through the radar so I'm just going to
>bump this post again. Please feel free to criticize if you think that
>my library has any fundamental design flaw.
>
>As a student and GSoC
>participant, I think the most important thing is for me is to learn
>what I did wrong in the project so that I will not repeat the same
>mistake, and also to allow me to gain enough experience so that I can
>really give useful contribution to the open source community in
>future.
>
>Any feedback is really much appreciated. Thanks.
>

Hello,

First of all, I want to say that, as the author of the Boost.Locale library, I have a very strong opinion on how strings and Unicode should be handled.

My strong opinion is:

a. Strings should be just a container object with a default encoding and some useful API to handle it.
b. The default encoding MUST be UTF-8.
c. There are several ways to implement strings: COW, mutable, immutable, with small string optimization and so on. One way or another, std::string is the de facto string, and I think we should live with it and use alternative containers where it matters.
d. Code point and code unit are meaningless unless you develop some Unicode algorithm, and you don't: you use one written by experts.

So my biggest problem is motivation:
-----------------------------------

> The main reason that Boost.Ustr is developed is because current
> raw string types such as std::string requires developers to make
> assumption on the encoding of the string content, such as UTF-8
> for std::string. This creates inconsistency when a string passed
> to library APIs has different encoding from the library expects.
Ustr does not solve this problem, as it does not really provide some kind of

    adapter<generic encoding> { string content }

That is the kind of thing that may be useful, but not in this case.

Basically, your library provides a wrapper around a string and outputs Unicode code points, but it does so for UTF encodings only! That does not add much benefit.

You provide encoding traits, but they are basically meaningless for the purpose you have given, because they do not cover non-Unicode encodings like, let's say, Shift-JIS or ISO-8859-8.

BTW, you can't create traits for many encodings. For example, you can't implement the traits requirements:

http://crf.scriptmatrix.net/ustr/ustr/advanced.html#ustr.advanced.custom_encoding_traits

for popular encodings like Shift-JIS or GBK... Homework: tell me why ;-)

Also, it is likely that the encoding is something that can change at runtime, not at compile time, and it seems that this adapter does not support such an option.

> The problem mainly arise because there are a small minority of
> developers who use different encoding for the same string type.

If someone uses strings with different encodings, he usually knows their encoding... The problem is that the API is inconsistent: on Windows a narrow string is in some ANSI code page, and everywhere else it is UTF-8.

This is an entirely different problem, and such adapters don't really solve it but actually make it worse...

Other problem is
================

I don't believe that a string adapter would solve any real problems because:

a) If you iterate over code points, you are very likely doing something wrong, as code point != character, and this is a very common mistake.

b) If you want to iterate over code points, it is better to have some kind of utf_iterator that receives a range and iterates over it. It would be more generic and would not require an additional class.

   For example, Boost.Locale has utf_traits, which allows iteration over code points to be implemented quite easily.
   See:

   http://svn.boost.org/svn/boost/trunk/libs/locale/doc/html/namespaceboost_1_1locale_1_1utf.html
   http://svn.boost.org/svn/boost/trunk/libs/locale/doc/html/structboost_1_1locale_1_1utf_1_1utf__traits.html

   And you don't need any kind of specific adapters.

c) The problem in Boost is not a missing Unicode string, and yet-another-unicode-string is not even required for us to have good Unicode support.

   The problem is policy: Boost just can't decide once and for all that std::string is UTF-8...

But don't get me wrong. This is My Opinion, many would disagree with me.

=================================

Bottom line: Unicode strings, cool string adapters, UTF iterators and even Boost.Unicode and Boost.Locale would not solve the problem that Boost libraries use inconsistent encodings on different platforms.

IMHO: the only way to solve it is POLICY.

Artyom Beilis
--------------
CppCMS - C++ Web Framework: http://cppcms.sf.net/
CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/
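
P.S. To illustrate points (a) and (b) above, here is a minimal sketch of iterating code points over a plain iterator range, with no string adapter class involved. It is hand-rolled standard C++ and is not the Boost.Locale utf_traits interface; the names next_code_point and count_code_points are hypothetical, the input is assumed to be UTF-8, and validation is simplified (overlong forms and surrogates are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Boost API): decode one UTF-8 code point from
// the range [it, end) and advance `it` past the bytes that were consumed.
// Returns U+FFFD for a malformed or truncated sequence.
// Simplified: overlong encodings and surrogates are not rejected.
template <typename Iterator>
char32_t next_code_point(Iterator& it, Iterator end) {
    assert(it != end);  // precondition: the range is not empty
    const char32_t replacement = 0xFFFD;

    const unsigned char lead = static_cast<unsigned char>(*it);
    ++it;
    if (lead < 0x80) {
        return lead;  // single-byte (ASCII) code point
    }

    int trail = 0;     // number of continuation bytes expected
    char32_t cp = 0;   // code point being assembled
    if ((lead & 0xE0) == 0xC0) {
        trail = 1;
        cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        trail = 2;
        cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
        trail = 3;
        cp = lead & 0x07;
    } else {
        return replacement;  // stray continuation byte or invalid lead byte
    }

    for (int i = 0; i < trail; ++i) {
        if (it == end) {
            return replacement;  // sequence cut off by the end of the range
        }
        const unsigned char c = static_cast<unsigned char>(*it);
        if ((c & 0xC0) != 0x80) {
            return replacement;  // not a continuation byte; leave it unread
        }
        cp = (cp << 6) | (c & 0x3F);
        ++it;
    }
    return cp;
}

// Count the code points in any UTF-8 range: works on std::string,
// std::vector<char>, or a pair of const char* pointers alike.
template <typename Iterator>
std::size_t count_code_points(Iterator first, Iterator last) {
    std::size_t n = 0;
    while (first != last) {
        next_code_point(first, last);
        ++n;
    }
    return n;
}
```

Called on the three bytes "e\xCC\x81" (the letter e followed by U+0301 COMBINING ACUTE ACCENT), count_code_points reports 2 code points over 3 code units, although a reader sees a single character. That gap is exactly why iterating over code points is rarely what you actually want.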

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk