Boost logo

Boost :

Subject: Re: [boost] [locale] Review
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2011-04-20 07:35:55


On 19/04/2011 22:40, Artyom wrote:

> Text processing, not localization,
> apart there is a stream charset
> conversion...

What is localization if not text processing?

> I mean binary code.
>
> When you have
>
> template<typename Type>
> class foo {
> void bar() { something type independent }
> }
>
>
> And then use:
>
> foo<char>
>
> and
>
> foo<wchar_t>
>
> bar would be eventually duplicated in binary
> code as
>
> void foo<char>::bar();
> void foo<wchar_t>::bar();

Except that foo<T>::bar() would mostly just call T::baz() and
T::whatamacallit(), so it will be completely different for T or U.

Generating very small functions that keep jumping and doing indirections
between each other is even worse than duplicating a bit of binary code.
Inlining gives very good performance benefits.

It appears that you are reluctant to template usage, because you're not
comfortable with templates techniques and you still believe age-old myths.
Some people here have deployed template code in very constrained
environments, such as microcontrollers with only a few KBs of memory,
and it works just fine.

I don't think that kind of template fear is positive for a Boost
library. But then why not, maybe Boost libraries really overuse
templates as some claim.

> This would not happen. It is not fancy
> header only library that does some small
> functions character by character.
>
> This library uses a dozen of various APIs...
> Do you really think it is possible to
> do it without a single new?

Yes, no problem at all.
As I said, just take an output iterator or a container that you append
to. This way the allocation is handled by whatever the user gives for
the output.

> And BTW most of them
> are called for locale's facets generation,
> basically once locale initialized....

Yes, those are required due to your design decisions of using the C++
locale facet system; but that may not be the case for all usages.

Note those are just general remarks, this is not a major problem. But it
would definitely be best if you could reduce places where allocations
happen to a minimum.

> Yes? So how would you return a string? I don't see there
> any unexpected allocations.

I shall repeat the fix then. Return an iterator_range (similar to a pair
of pointers) instead of a basic_string.

> Yes, it is simple to write
>
> template<typename Input,typename Output>
> Output bad_to_upper(Input begin,Input end,Output out,std::locale const&l)
> {
> typedef std::ctype<typename Input::value_type> facet_type;
> while(begin!=end)
> *out++ = std::use_facet<facet_type>(l).to_upper(*begin)++;
> }
>
> But it does not work this way because
> to_upper needs entire chunk and not arbitrary
> character at every point.
>
> You need to call some virtual function on some
> range it does not even know what Iterator is...

That's a backend limitation, that you may be able to overcome or not.

If the backend really needs a contiguous memory buffer, then just copy
the range to a contiguous memory buffer. I don't see where the problem is.

I'm just trying to make your library easier to use. I may not have a
pointer to it handy, and generating one could cost a lot of lines of
code, which would make your library a bit annoying to use.

> So you are tring to apply techniques that
> does not belog here.
>
> Why because you need either to:
>
> template<typename Input,typename Output>
> Output a_to_upper(Input begin,Input end,Output out,std::locale const&l)
> {
> typedef typename Input::value_type char_type;
> typedef boost::locale::convert<char_type> facet_type;
> std::vector<char_type> input_buf;
> std::copy(begin,end,back_insterer(temporary_buf));
> std::basic_string<char_type> output_buf
> = std::use_facet<facet_type>(l).to_upper(&input_buf[0],input_buf.size());
> std::copy(output_buf.begin(),output_buf.end(),out);
> }
>
>
> But it does two allocations!$@R$%#!
> Not good.

That could be the general case, yes (though it would be best if to_upper
could directly take a buffer to write to rather than return a basic_string).
Then you would specialize the cases where either input or output are
contiguous iterators to avoid those copies.

I suppose you could also do something like.

char_type output_buffer[buffer_size];
while(input_begin != input_end)
{
    char_type* output = output_buffer;
    status = use_facet<whatever>(l).to_upper(input_begin, input_end,
input_begin, output_buffer, output_buffer + buffer_size, output);

    if(status == ...) // do something smart in case of error or incomplete

    std::copy(output_buffer, output, out);
}

Zero memory allocation needs to happen from the library itself with that
kind of design.

> So lets create a some virtual iterator:

If you want that kind of thing, use type erasure. This is the same
technique used by any or function.
It allows a single type to contain any other type and dispatch each of
it member functions dynamically as long as it models a particular concept.

For iterators, google for any_iterator.

> But, hey!#%#$%#4
>
> For each character I call virtual function WOW
> the cost is too big!

Indeed, this is probably why any_iterator never became popular.

But note codecvt facets can have exactly that same problem. At least
Dikumware STL calls codecvt facets functions character per character.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk