Boost :

From: Nemanja Trifunovic (nemanja_trifunovic_at_[hidden])
Date: 2006-12-06 08:35:39


Hello Rogier,

Thanks for your comments.

1) Iterators, or rather iterator adapters. I believe the iterators should be built on top of these functions. In fact, I am already developing them in version 2 of the library (see here for the latest snapshot: http://utfcpp.svn.sourceforge.net/viewvc/utfcpp/v2_0/source/ ). However, I see some other iterator implementations, and would rather start with these free functions until we decide on the best design for the iterators.

2) IO. It is currently out of the scope of this library. If enough people agree with you, this may change, but for now I have no plans for IO. Honestly, I dislike C++ standard IO and would love to avoid it if possible :)

3) Tables for data. As I replied to Hervé, my test cases showed the table-based version running slower (with two different compilers). I will investigate it further, though, since I agree the result is not very logical.

4) A string type. There are way too many C++ string types out there already, and I wanted to provide a tool for making them work with UTF-8 encoding, rather than introducing yet another string class. This is probably the same philosophy as Boost String Algorithms: http://www.boost.org/doc/html/string_algo.html

Best,
Nemanja Trifunovic

----- Original Message ----
From: Rogier van Dalen <rogiervd_at_[hidden]>
To: boost_at_[hidden]
Sent: Wednesday, December 6, 2006 5:11:17 AM
Subject: Re: [boost] UTF8 library - second call for informal review

Dear Nemanja,

On 12/5/06, Nemanja Trifunovic <nemanja_trifunovic_at_[hidden]> wrote:
> This is the second call for the informal review of the UTF8 library. It is based on version 1.02 of UTF8-CPP: http://utfcpp.sourceforge.net/ and you can find it at

I like the functions you provide, and the "unchecked" namespace. Unlike Hervé, I do think exceptions are the way to go. I seem to miss a couple of things though.

In a recent discussion on this list there seemed to be a preference for using iterators, which can be composed, for example to perform UTF-8->UTF-16 conversion, or conversions to other codepages. Iterators can be much more flexible than these free functions.

Is there any particular reason why you do not include similar functions for UTF-16?

One of the most important uses for UTF must be IO. Shouldn't a utf_codecvt be part of the library?

Hervé is right: reading UTF-8 can be optimised a lot using tables with data. I've got an implementation lying around that I'd be happy to share. It took 30% less time than the straightforward implementation and it did all the necessary checks.

The final thing is, your functions try to maintain strings of valid UTF-8. Why not provide a string type that maintains this invariant?

Conclusion: in my opinion a lot of things are missing from the library at the moment.

Regards,
Rogier
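
To make point 1 concrete, an iterator adapter layered on top of a checked free function might look roughly like the sketch below. The names `decode_next` and `code_point_iterator` are made up for illustration; they are not the API of UTF8-CPP or of the version 2 snapshot, and the sketch assumes a throwing, checked decoder along the lines discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical checked free function (not the library's actual API):
// decodes the code point starting at `it`, advances `it` past it on
// success, and throws on malformed input.
template <typename OctetIt>
std::uint32_t decode_next(OctetIt& it, OctetIt end) {
    if (it == end) throw std::out_of_range("decode_next: empty range");
    const std::uint32_t lead = static_cast<unsigned char>(*it);
    ++it;
    if (lead < 0x80) return lead;  // single-octet (ASCII) sequence

    int trail = 0;          // number of continuation octets expected
    std::uint32_t cp = 0;   // code point being assembled
    std::uint32_t min = 0;  // smallest value allowed for this length
    if ((lead & 0xE0) == 0xC0)      { trail = 1; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { trail = 2; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { trail = 3; cp = lead & 0x07; min = 0x10000; }
    else throw std::invalid_argument("decode_next: invalid lead octet");

    for (; trail > 0; --trail) {
        if (it == end) throw std::out_of_range("decode_next: truncated sequence");
        const std::uint32_t c = static_cast<unsigned char>(*it);
        if ((c & 0xC0) != 0x80)
            throw std::invalid_argument("decode_next: invalid trail octet");
        ++it;
        cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, surrogates, and values beyond U+10FFFF.
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("decode_next: invalid code point");
    return cp;
}

// Hypothetical adapter: presents a range of UTF-8 octets as a sequence of
// code points by delegating all decoding to the free function above.
template <typename OctetIt>
class code_point_iterator {
public:
    typedef std::input_iterator_tag iterator_category;
    typedef std::uint32_t value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const std::uint32_t* pointer;
    typedef std::uint32_t reference;

    code_point_iterator(OctetIt pos, OctetIt end) : pos_(pos), end_(end) {}

    // Decode from a copy so that dereferencing does not move the iterator.
    std::uint32_t operator*() const {
        assert(pos_ != end_);  // dereferencing the end position is a bug
        OctetIt tmp = pos_;
        return decode_next(tmp, end_);
    }
    // Advancing re-decodes the sequence to find where the next one starts.
    code_point_iterator& operator++() {
        decode_next(pos_, end_);
        return *this;
    }
    code_point_iterator operator++(int) {
        code_point_iterator old = *this;
        ++*this;
        return old;
    }
    friend bool operator==(const code_point_iterator& a, const code_point_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const code_point_iterator& a, const code_point_iterator& b) {
        return !(a == b);
    }

private:
    OctetIt pos_;  // start of the current encoded sequence
    OctetIt end_;  // end of the underlying octet range
};
```

Because the adapter only calls the free function, a differently designed iterator (or an unchecked variant) could be written against the same function without touching the decoding logic. Such an iterator can be handed directly to standard algorithms and container constructors, for example to copy the code points of a `std::string` into a `std::vector<std::uint32_t>`.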


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk