Boost logo

Boost :

Subject: Re: [boost] [locale] Review
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-19 11:13:24


On 18/04/2011 14:24, Artyom wrote:

> Can you please state explicitly what things are
> conditional for the library to get to mini-review.

I don't know, I just read some more of the other reviews, and no one
seems to have the same concerns as me, therefore I do not know if my
demands are justified.

> Boost.Locale is build from very practical
> point of view - primary usability. I don't
> try to solve all possible corner cases, especially
> when they are not central part of localization
> task.
>
> I'd rather prefer spending more time on adding
> specific missing features useful for localization
> over providing stuff like iterator based interface
> for some operations that usually executed
> on small chunks of text.

I think that for a Boost library, the API can be more important than the
features themselves.

>> The main reason for my decision is the
>> interface that the library provides:
>> albeit it aims at abstracting the
>> backend used to implement the
>> facilities, it exposes too much of
>> how those facilities are implemented,
>> which prevents benefiting from future
>> backends that may not have the same
>> limitations as the current backends do.
>>
>> Indeed, the Boost.Locale rationale clearly
>> says that it only takes std::basic_string
>> and const char* as input because that's what
>> the backend libraries expect anyway.
>>
>
> Not only, there are more reasons for this:
>
>
> http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_linear_chunks

I can't find any more reasons than what I said.
Am I missing something?

> When Boost.Unicode would be ready I'm sure its users
> would be glad to use its own native interface where
> it is needed.
>
> So do we need more generic API that would never be used?

Maybe they could want another backend for various reasons.
- ICU is more well-tested
- ICU is more feature-rich
- Boost.Unicode's behaviour is not tailored for particular locales
- Boost.Unicode is written by a crazy frenchman that they do not trust
- ICU may already be installed while Boost.Unicode not
- etc.

>
> To be honest I don't think that iterators
> especially over non-UTF charsets are good
> idea.
>
> I'd rather accept a concept of stream
> that you read from and write to:
>
> 1. It is buffered
> 2. It is fast and can work with
> big chunks.
> 3. It can be passed transparently
> and ABI stable to any backends.
>
> So I would rather see implementation of
> generic algorithms over I/O streams rather
> then iterators.

Ok. I had the feeling that iterators were more popular than streams, but
streams are fine too; they're very close concepts.

> Without it I wouldn't develop the library
> in first place.

I never said to get rid of the stable API, I was just asking if you were
willing to consider giving up the idea of the binaries of the library
backends being swappable at (a potentially runtime) link-time.

i.e. backends would be statically instead of dynamically selected.

But if not, then it's fine, but that does give quite a few limitations
to what the library can do in terms of template stuff.

> What is important that std::codecvt is part
> of C++ standard and thus any library that uses
> this standard would enjor the full power
> of Boost.Locale without even depending on it.

While I agree that support for std::codecvt is important, why not make a
new interface similar to it that doesn't have the same state problems,
and tell users to use that instead when possible?

> I may agree that it is not perfect, but giving
> limitations of current state-of-the-art implementations
> in ICU it is yet reasonable one.

ICU does not have such limitations.
  - BreakIterator allows to find boundaries as you iterate the text
  - Even if ICU couldn't do that, there is no reason to return in the
operator* of your token_iterator basic_string<ValueType>(begin, end) and
copy the data, you could just return iterator_range<IteratorType>(begin,
end).

Also, while I'm at it, break_iterator really looks like a special case
of boost::filter_iterator.

>> some compile-time values tested at runtime,
>> some things that could have reused existing
>> utilities from Boost,
>
> More specific points?

I saw stuff like

if(is_linear_iterator<SomeType>::is_linear)
    ....

Additionally, this particular trait is not canonical (should be ::value,
not ::is_linear), is not righly named (it's contiguous, not linear), and
could have other uses outside of Boost.Locale, so it should be in its
own file.

>> While most of the backends are separated,
>> it is not the case for encoding/codepage.cpp,
>> which is shared by all backends and contains ifdefs.
>> This is not a very nice extension point.
>>
>
> Actually not codepage only.
>
> - Formatting (locale::format)
> - Locale information
> - Message Formatting (gettext)
> - codecvt facets (not details)
>
> And more...
>
> So there are lots of shared parts.

I meant shared parts which are not really shared, i.e. a file that is
shared but that contains completely different code chosen using #ifdefs.

>> I'm also a bit concerned about the generalized
>> usage of virtual functions anywhere;
>> it seems unnecessary in quite a few places,
>> please prefer using templates when dynamic binding is not required.
>
> More specific points?

Couldn't boost::locale::util::base_converter be a concept rather than an
abstract base class?

>> A lot of new and vectors too, I'd prefer if ideally
>> the library never allocated anything.
>
> I'm sorry but it is just something
> that can't and would never happen.
>
> This request has no reasonable base especially
> for such complex topic.

Usage of templates instead of inclusion polymorphism would allow to
avoid newing the object and using a smart pointer, for example.

Plus the instances of allocation in the boundary stuff (when you build
an index vector and when you copy into a new basic_string) appears to be
unnecessary.

Functions that allocate behind your back are frowned upon, because you
have no mean of controlling how that memory is allocated or managed.
It limits the application range of the library. But then that could be
considered reasonable if the benefits outweigh the drawbacks; I do not
think this is necessarily the case here.

>
>>
>> I hope my review will be well received
>> and that you will consider all of my "feature requests".
>> Your library is practical and can get things done for sure,
>> but maybe it's not quite as good as it could be in terms of
>> interface yet.
>>
>
> The interface is a part of the design and have
> designed to be simple and covinent rather then
> fully generic.

Passing an arbitrary range to the functions is convenient, forcing the
user to copy it to a vector and pass the begin and end pointers before
it can call the functions isn't.

See, genericity is convenient.

> It can't be Boost.Locale's backend as at least
> in current state it is locale agnostic
> and misses most important feature localization is actually
> required - CLDR - Common Locale Database Repositry

It would have to be a partial backend (used just for conversion between
UTF encodings, normalization and boundary analysis), and you could fill
the blanks using another backend.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk