Subject: Re: [boost] [locale] Review
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-19 14:30:53


> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
>
> > Can you please state explicitly what things are
> > conditional for the library to get to mini-review.
>
> I don't know, I just read some more of the other reviews,
> and no one seems to have the same concerns as me, therefore
> I do not know if my demands are justified.
>
>
> > Boost.Locale is built from a very practical
> > point of view - primarily usability. I don't
> > try to solve all possible corner cases, especially
> > when they are not a central part of the localization
> > task.
> >
> > I would rather spend more time on adding
> > specific missing features useful for localization
> > than on providing things like an iterator-based interface
> > for operations that are usually executed
> > on small chunks of text.
>
> I think that for a Boost library, the API can be more
> important than the features themselves.
>

As you know, APIs are subject to change
in Boost, and I would say that they are
changed far too freely and far too often.

However, in the case of Boost.Locale I think
more changes will come, but they
would extend the functionality.

In the end, everything really depends on
actual needs.

> >> Indeed, the Boost.Locale rationale clearly
> >> says that it only takes std::basic_string
> >> and const char* as input because that's what
> >> the backend libraries expect anyway.
> >>
> >
> > Not only, there are more reasons for this:
> >
> >
> >
> http://cppcms.sourceforge.net/boost_locale/html/appendix.html#why_linear_chunks
>
> I can't find any more reasons than what I said.
> Am I missing something?
>

Yes:

   Most of the supported operations on text, like collation
   and case handling, usually work on small chunks of text.

I can accept that some operations may be
better off working on arbitrary streams, but
most of them just don't need it.

Take collation, for example: when exactly
do you need to compare arbitrary
text streams?
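
To make it concrete, here is roughly how collation is normally used:
whole, usually short, strings are compared through the installed
collate facet (the locale name is just an example):

    #include <boost/locale.hpp>
    #include <algorithm>
    #include <cstddef>
    #include <iostream>
    #include <string>
    #include <vector>

    int main()
    {
        // Build a locale with the Boost.Locale facets installed.
        boost::locale::generator gen;
        std::locale loc = gen("en_US.UTF-8");

        std::vector<std::string> words;
        words.push_back("zebra");
        words.push_back("apple");
        words.push_back("orange");

        // std::locale::operator() compares strings using the collate
        // facet, so complete small strings are compared at once.
        std::sort(words.begin(), words.end(), loc);

        for(std::size_t i = 0; i < words.size(); ++i)
            std::cout << words[i] << "\n";
    }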

> > To be honest I don't think that iterators
> > especially over non-UTF charsets are good
> > idea.
> >
> > I'd rather accept a concept of stream
> > that you read from and write to:
> >
> > 1. It is buffered
> > 2. It is fast and can work with
> > big chunks.
> > 3. It can be passed transparently
> > and ABI stable to any backends.
> >
> > So I would rather see an implementation of
> > generic algorithms over I/O streams than
> > over iterators.
>
> Ok. I had the feeling that iterators were more popular
> than streams, but streams are fine too; they're very
> close concepts.
>

I'll reveal the plan: I intend to provide
a stream-like interface for the search algorithm.

Unfortunately it is really non-trivial, not
because it is impossible, but rather
because there is not enough time to implement
it.

Basically, what I need is to implement an
icu::CharacterIterator over a seekable
std::istream.

The biggest problem is that CharacterIterator
requires UTF-16 indexing, and that is
not a simple problem.
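
Just to illustrate the indexing problem (a sketch of the bookkeeping
involved, not Boost.Locale or ICU code): the UTF-16 index of a position
cannot be read off the byte offset in the stream, it has to be computed
by decoding, because each code point contributes one or two UTF-16 units.

    #include <cstddef>
    #include <stdexcept>
    #include <string>

    // Number of UTF-16 code units needed for a UTF-8 chunk.
    // Code points above U+FFFF need a surrogate pair (2 units),
    // everything else needs 1. Continuation bytes are not validated.
    std::size_t utf16_length(std::string const &utf8)
    {
        std::size_t units = 0;
        for(std::size_t i = 0; i < utf8.size();) {
            unsigned char lead = static_cast<unsigned char>(utf8[i]);
            std::size_t len;
            if(lead < 0x80)      len = 1;  // ASCII
            else if(lead < 0xC0) throw std::runtime_error("invalid UTF-8");
            else if(lead < 0xE0) len = 2;  // U+0080..U+07FF
            else if(lead < 0xF0) len = 3;  // U+0800..U+FFFF
            else                 len = 4;  // U+10000.. -> surrogate pair
            units += (len == 4) ? 2 : 1;
            i += len;
        }
        return units;
    }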

So I left this out of the scope of the
current version of Boost.Locale; I'll
add it in the future.

I think it is doable, but it requires
me to look deep into the ICU sources,
because how to do it is not really
documented.

Once it is done, it would be
possible to apply the same strategy
to boundary analysis and probably
some other operations as well (normalization, I think).

So basically the user would provide I/O streams
for this purpose (of course with some
nice wrappers for the common cases).

But it would not happen now, as it requires
quite a lot of work and testing.

This is basically what I meant by these lines
in the documentation:

   When a new API that is likely to work on large
   chunks of text is introduced into Boost.Locale
   in the future, it will provide an interface for
   non-linear text handling.

However, the only API that really
needs this is the search API; for some
others, like segmentation, it is nice to
have, and for others, like collation,
it is not needed at all.

So this is a quick glance into the
future; it is currently only at the
planning stage and is not a part
of the review.

I thought about providing a stream API
for charset conversion but put it
on hold, as it is not really a
central part, especially when
codecvt is there.
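
By "codecvt is there" I mean that the codecvt facets Boost.Locale
installs already give stream-based charset conversion; roughly like
this (the file and locale names are just examples):

    #include <boost/locale.hpp>
    #include <fstream>
    #include <locale>
    #include <string>

    int main()
    {
        boost::locale::generator gen;

        // The generated locale contains a codecvt facet that converts
        // between the narrow encoding on disk and wide characters.
        std::locale loc = gen("en_US.UTF-8");

        std::wifstream in;
        in.imbue(loc);             // must be done before any reading
        in.open("utf8_text.txt");

        std::wstring line;
        while(std::getline(in, line)) {
            // line now holds wide characters decoded from UTF-8
        }
    }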

> >
> > More specific points?
>
> I saw stuff like
>
> if(is_linear_iterator<SomeType>::is_linear)
> ....
>
> Additionally, this particular trait is not canonical
> (should be ::value, not ::is_linear), is not rightly
> named (it's contiguous, not linear), and could have
> other uses outside of Boost.Locale, so it should be in its own file.

Such things should appear in the code review.

I am always happy to learn how to do things better.
The question is whether such changes are critical or not.
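
For the record, the canonical shape being asked for would be something
along these lines (a sketch only, not what is currently in the library):

    // A trait named after what it actually checks (contiguity),
    // exposing the conventional ::value member, in its own header.
    template<typename Iterator>
    struct is_contiguous_iterator {
        static const bool value = false;
    };

    // Raw pointers are contiguous; other iterator types would need
    // their own specializations.
    template<typename T>
    struct is_contiguous_iterator<T *> {
        static const bool value = true;
    };

    template<typename T>
    struct is_contiguous_iterator<T const *> {
        static const bool value = true;
    };

so the check would read if(is_contiguous_iterator<SomeType>::value).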

> >> While most of the backends are separated,
> >> it is not the case for encoding/codepage.cpp,
> >> which is shared by all backends and contains ifdefs.
> >> This is not a very nice extension point.
> >>
> >
> > Actually not codepage only.
> >
> > - Formatting (locale::format)
> > - Locale information
> > - Message Formatting (gettext)
> > - codecvt facets (not details)
> >
> > And more...
> >
> > So there are lots of shared parts.
>
> I meant shared parts which are not really shared,
> i.e. a file that is shared but that contains completely
> different code chosen using #ifdefs.
>

Not exactly. It consists of fallbacks:
if, for example, iconv does not support windows-xyz,
uconv would be used instead.

It allows several libraries to be merged in a way
that they back each other up.

Take a deeper look at that section.

It is different from backend selection.
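
In rough outline the idea is more like the following; the helper names
here are hypothetical, and in the real file each branch is guarded by a
build-configuration #ifdef for the corresponding backend:

    #include <stdexcept>
    #include <string>

    // Hypothetical helpers standing in for the real backend wrappers.
    bool iconv_supports(std::string const &charset);
    std::string iconv_to_utf8(std::string const &text, std::string const &charset);
    bool icu_supports(std::string const &charset);
    std::string icu_to_utf8(std::string const &text, std::string const &charset);

    // The fallback idea: try one conversion library and, if it cannot
    // handle the requested charset, fall back to another one.
    std::string convert_to_utf8(std::string const &text, std::string const &charset)
    {
        if(iconv_supports(charset))
            return iconv_to_utf8(text, charset);
        if(icu_supports(charset))
            return icu_to_utf8(text, charset);
        throw std::runtime_error("unsupported charset: " + charset);
    }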
 
> >> I'm also a bit concerned about the generalized
> >> usage of virtual functions anywhere;
> >> it seems unnecessary in quite a few places,
> >> please prefer using templates when dynamic binding is not required.
> >
> > More specific points?
>
> Couldn't boost::locale::util::base_converter be a concept rather than an
> abstract base class?
>

Because there is no need to duplicate
complex code via template metaprogramming
when a simple virtual function call can be made.

Don't overrate template metaprogramming
and don't underestimate virtual functions.
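
To make the trade-off concrete, here is a simplified sketch of the
pattern (this is not the real base_converter interface): the complex
code is written once against an abstract interface, and the backend
is chosen at run time.

    #include <string>

    class converter {
    public:
        virtual ~converter() {}
        // Decode one code point from [begin, end) and advance begin.
        virtual unsigned decode(char const *&begin, char const *end) = 0;
    };

    // One concrete backend; others can be added without the calling
    // code ever being recompiled or re-instantiated.
    class ascii_converter : public converter {
    public:
        virtual unsigned decode(char const *&begin, char const *end)
        {
            return begin != end ? static_cast<unsigned char>(*begin++) : 0;
        }
    };

    // The processing code is written once, against the abstract
    // interface, and can live in a single translation unit.
    unsigned first_code_point(converter &cvt, std::string const &text)
    {
        char const *p = text.c_str();
        return cvt.decode(p, p + text.size());
    }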

>
> >> A lot of new and vectors too, I'd prefer if ideally
> >> the library never allocated anything.
> >
> > I'm sorry but it is just something
> > that can't and would never happen.
> >
> > This request has no reasonable basis, especially
> > for such a complex topic.
>
> Usage of templates instead of inclusion polymorphism
> would make it possible to avoid newing the object and using a smart
> pointer, for example.
>

I'm not sure which exact location bothers you, but
everywhere (unless I am missing something)
there is minimal data copying, and I rely heavily
on RVO.

If you see some unnecessary copying, tell me.
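
As a trivial illustration of what I mean by relying on RVO (this is
just an example, not a specific function from the library):

    #include <string>

    // The result is constructed directly in the caller's storage
    // (return value optimization), so returning by value costs no copy.
    std::string make_greeting(std::string const &name)
    {
        std::string result = "Hello, ";
        result += name;
        return result;   // NRVO: no copy of result in practice
    }

    int main()
    {
        std::string g = make_greeting("world");
        (void)g;
    }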
 
> Plus the instances of allocation in the
> boundary stuff (when you build an index
> vector and when you copy into a new basic_string)
> appear to be unnecessary.
>

A more specific location? I don't remember
any such thing; I just need better pointers
to be able to answer.

> >> I hope my review will be well received
> >> and that you will consider all of my "feature requests".
> >> Your library is practical and can get things done for sure,
> >> but maybe it's not quite as good as it could be in terms of
> >> interface yet.
> >>
> >
> > The interface is a part of the design and has been
> > designed to be simple and convenient rather than
> > fully generic.
>
> Passing an arbitrary range to the functions is convenient, forcing
> the user to copy it to a vector and pass the begin and end pointers
> before it can call the functions isn't.
>

Where exactly? As above, I need better context.

Artyom

