On Tue, Jun 2, 2020 at 7:44 PM Glen Fernandes via Boost-users <boost-users@lists.boost.org> wrote:
>
> The Boost formal review of Zach Laine's Unicode library, Text,
> commences on June 11th and concludes June 20th. We welcome the
> participation of Boost developers and users, and the C++ community.

Here is my review of Text in the form of a list of notes/questions, in no particular order.

Reading through the documentation, everything makes sense. The motivation for each type and operation is easy to understand.

I welcome the use of signed int as index/size type, but I wonder if this would interact in some negative way when mixed with similar types from the standard library. I don't think so, but who knows.

I don't like the multi-page format of the documentation. I'd prefer a single HTML page for easy searching.

What is the purpose of the throw_on_encoding_error type here? https://github.com/tzlaine/text/blob/b8bb1a1101e7cf53b460671a3ead19516e7636fe/include/boost/text/transcode_iterator.hpp#L35

I searched and couldn't find any place in the documentation that explains how ErrorHandler is used. Looking at the source code, it's just a function, but its interface should be documented.

Generally I dislike interfaces with configurable error handling but it appears that in the case of the conversion iterators this is warranted: both replacement character and throwing are useful. I wonder though if it would be better to delete this template parameter and always output 0xfffd, and in addition provide an iterator adaptor type which can be used to wrap e.g. an output iterator, and which throws if it sees 0xfffd being output.

Both .c_str() and .data() should be added to string for compatibility. Also, it is not true that s.begin() is equivalent to s.c_str(), because in debug iterators should not be pointers. Perhaps &*s.begin() is equivalent, but this should not be valid on an empty string (assert), null termination notwithstanding.

The note about passing ropes by value, I think, is redundant. Logically, copying a rope gets you an independent copy, the fact that internally the copies share state is an implementation detail. Therefore, the documentation can just specify that a rope is as thread-safe as an int.

Speaking of thread-safety, where are the thread-safety guarantees of rope documented? Where are the relevant tests?

Speaking of tests, I suggest separating the generated test sources in a subdirectory so things like test/string.cpp are easy to find. Not to mention github.com truncates the test directory to 1000 files.

The grammar seems wrong here; 'I don't know if you've ever written an undo system that can do, undo, or redo any command you can think of, in less than 40 lines of code, but there one is."

"Every text editor you've ever used that handles large files gracefully has done something to break the file into chunks" What about the buffer gap algorithm? I think Borland editors used this way back in the day. :)

In conclusion:

I spent a couple of hours studying the documentation and parts of the source code.

I vote to ACCEPT Text into Boost.