Subject: Re: [boost] [locale] Review
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-19 16:40:51


> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
> > I can accept that some operations may be
> > better to work on arbitrary streams but
> > most of them just don't need it.
> >
> > For example collation... When exactly
> > do you need to compare arbitrary
> > text streams?
>
> Because the data does not exist in memory, may be computed on the fly,
> or whatever really.
> A possible application is simply to chain operations in a pipeline, i.e.
> without having to apply one operation completely, then the other on that
> result, etc (and do the intermediate temporary buffer allocations).
>

Pipeline and collation?

Either I don't get you or we have very different
points of view.

Not every programming concept is about stream
processing, especially collation, where you compare
two units of data and each unit is a single
whole.

But let's leave it behind, because I don't
see that we would get anywhere.

> > I thought to provide stream API
> > for charset conversion but put it
> > on hold as it is not really a
> > central part, especially when
> > codecvt is there.
>
> I believe it *is* the very central part of any text processing system.
>

Of text processing, not of localization.
Apart from that, stream charset
conversion already exists...

> > Take a deeper look to the section.
> >
> > It is different from backend selection.
>
> If I want to add a backend, I only want to add a new repository with the
> implementation for that backend. I do not want to have to hack all
> shared files by adding some additional ifdefs.
>

A localization backend is a different thing from a
utility that converts one encoding to another.

But I see your point.

> > Because there is no need to duplicate
> > complex code via template metaprogramming
> > if a simple function call can be made.
>
> This sentence doesn't make any sense to me.
>
> Template meta-programming is not a means to duplicate code.
> Nor is normal template usage, which is what I suggested instead of
> virtual functions, template meta-programming.
>

I mean binary code.

When you have

  template<typename Type>
  class foo {
  public:
    void bar() { /* something type independent */ }
  };

and then use

  foo<char>

and

  foo<wchar_t>

bar would eventually be duplicated in the binary
code as

  void foo<char>::bar();
  void foo<wchar_t>::bar();

regardless of the fact that it does the same job.
And finally you get huge executables that
basically copy the same things around.
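
A minimal sketch of one common way around this duplication, assuming
the type-independent part can be moved into a non-template base class.
All names here (foo_base, repeat, calls) are hypothetical and only
illustrate the point; this is not code from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Type-independent work lives in a non-template class,
// so the compiler emits exactly one copy of bar().
class foo_base {
public:
    foo_base() : calls_(0) {}
    // Stands in for "something type independent".
    void bar() { ++calls_; }
    std::size_t calls() const { return calls_; }
private:
    std::size_t calls_;
};

// Only the code that really depends on Type stays in the template.
template<typename Type>
class foo : public foo_base {
public:
    std::basic_string<Type> repeat(Type c, std::size_t n) {
        bar();  // calls the single foo_base::bar
        return std::basic_string<Type>(n, c);
    }
};

// Small usage example: both instantiations share foo_base::bar.
inline void example_usage() {
    foo<char> narrow;
    foo<wchar_t> wide;
    assert(narrow.repeat('a', 3) == "aaa");
    assert(wide.repeat(L'b', 2) == L"bb");
    assert(narrow.calls() == 1 && wide.calls() == 1);
}
```

With this layout foo<char> and foo<wchar_t> still exist as separate
classes, but only repeat() is instantiated twice.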

> >>>> A lot of new and vectors too, I'd prefer if ideally
> >>>> the library never allocated anything.
> >>>
> >>> I'm sorry but it is just something
> >>> that can't and would never happen.
> >>>
> >>> This request has no reasonable base especially
> >>> for such complex topic.
> >>
> >> Usage of templates instead of inclusion polymorphism
> >> would allow to avoid newing the object and using a smart
> >> pointer, for example.
> >>
> >
> > I'm not sure what exact location bothers you but
> > everywhere (unless I miss something)
> > there is minimal data copying, and I rely heavily
> > on RVO.
>
> I didn't say copying, I said allocation and usage of new.
> grep -r -F "new " * should give you the exact locations.
>

This would not happen. This is not a fancy
header-only library that does some small
operations character by character.

This library uses a dozen different APIs...
Do you really think it is possible to
do it without a single new?

And BTW most of them
are called when the locale's facets are generated,
basically once, when the locale is initialized....

If you had really run this grep and looked at
each use case, you wouldn't even have
written this "grep" sentence.

> >
> > If you see some not-required copying tell me.
> >
> >> Plus the instances of allocation in the
> >> boundary stuff (when you build an index
> >> vector and when you copy into a new basic_string)
> >> appears to be unnecessary.
> >>
> >
> > More specific location? I don't remember
> > such thing, I just need better pointers
> > to answer.
>
> I've been very precise. You unnecessarily allocate a new string and copy
> the contents in the operator* of token_iterator.

Yes? So how would you return a string? I don't see
any unexpected allocations there.

------------------------------------------------------

I want to say a few words to summarize,
because I don't see this going anywhere.

Boost.Locale is not Boost.Unicode. It behaves
differently, it thinks differently, and it does
many things the way normal localization
APIs all over the world do them.

Yes, ranges are a nice and important
concept for template metaprogramming,
but this is not a template library and never
will be.

You can't expect the library to provide
techniques suited to a template system.

Yes, it is simple to write

  template<typename Input,typename Output>
  Output bad_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef std::ctype<typename Input::value_type> facet_type;
    while(begin!=end)
      *out++ = std::use_facet<facet_type>(l).toupper(*begin++);
    return out;
  }

But it does not work this way, because
to_upper needs the entire chunk of text and not
one arbitrary character at a time.
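
A small sketch of why the per-character form falls short, assuming
the standard std::ctype facet and the classic "C" locale. The helper
names (per_char_to_upper, upper_classic) are made up for this example.
In full Unicode case mapping the German sharp s (U+00DF) upper-cases
to "SS", which a one-character-in, one-character-out interface cannot
express.

```cpp
#include <cassert>
#include <iterator>
#include <locale>
#include <string>

// One-to-one upper-casing through std::ctype: each input character
// produces exactly one output character.
template<typename Input, typename Output>
Output per_char_to_upper(Input begin, Input end, Output out, std::locale const &l)
{
    typedef typename std::iterator_traits<Input>::value_type char_type;
    std::ctype<char_type> const &ct = std::use_facet<std::ctype<char_type> >(l);
    while (begin != end)
        *out++ = ct.toupper(*begin++);
    return out;
}

// Hypothetical helper for the demonstration; uses the classic "C" locale.
inline std::wstring upper_classic(std::wstring const &in)
{
    std::wstring result;
    per_char_to_upper(in.begin(), in.end(), std::back_inserter(result),
                      std::locale::classic());
    return result;
}

inline void example_usage()
{
    // "strasse" spelled with U+00DF: 6 characters in.
    std::wstring const out = upper_classic(L"stra\u00DFe");
    // A full case mapping would give "STRASSE" (7 characters), but a
    // per-character mapping can never change the length.
    assert(out.size() == 6);
    assert(out.substr(0, 4) == L"STRA");
}
```

Whatever the locale does with the sharp s itself, the output always
has as many characters as the input, so the expansion is lost.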

You need to call some virtual function on some
range, and that function does not even know what
the Iterator type is...

So you are trying to apply techniques that
do not belong here.

Why? Because you would need to do either this:

  template<typename Input,typename Output>
  Output a_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    std::vector<char_type> input_buf;
    std::copy(begin,end,std::back_inserter(input_buf));
    std::basic_string<char_type> output_buf
      = std::use_facet<facet_type>(l).to_upper(&input_buf[0],input_buf.size());
    return std::copy(output_buf.begin(),output_buf.end(),out);
  }

But it does two allocations!$@R$%#!
Not good.

So let's create some kind of virtual iterator:

  template<typename CharType>
  class base_iterator {
  public:
    virtual ~base_iterator() {}
    virtual CharType value() { return value_; }
    virtual bool next() = 0;
  protected:
    CharType value_;
  };

  template<typename IteratorType>
  class input_wrapper : public base_iterator<typename IteratorType::value_type> {
  public:
    input_wrapper(IteratorType begin,IteratorType end): begin_(begin),end_(end) {}
    virtual bool next() {
      if(begin_==end_)
        return false;
      this->value_ = *begin_++;
      return true;
    }
  private:
    IteratorType begin_,end_;
  };

Same for

  template<typename CharType>
  class base_output_iterator { ... };

  template<typename IteratorType>
  class output_wrapper : ...

And now we rewrite our function as:

  template<typename Input,typename Output>
  Output b_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    input_wrapper<Input> input(begin,end);
    output_wrapper<Output> output(out);
    std::use_facet<facet_type>(l).to_upper(input,output);
    return output.value();
  }

But, hey!#%#$%#4

For each character I call a virtual function. WOW,
the cost is too big!

$^$%^%@#$^@#%@#$%@#$%

Attempt number three: make the virtual functions
more efficient.

  template<typename IteratorType>
  class input_wrapper
    : public std::basic_istream<typename IteratorType::value_type> {
    ...
  };
  template<typename IteratorType>
  class output_wrapper
    : public std::basic_ostream<typename IteratorType::value_type> {
    ...
  };

Now they are buffered, there is no virtual function call per character,
and under the hood it may even work on a single memory chunk...

  template<typename Input,typename Output>
  Output c_to_upper(Input begin,Input end,Output out,std::locale const &l)
  {
    typedef typename Input::value_type char_type;
    typedef boost::locale::convert<char_type> facet_type;
    input_wrapper<Input> input(begin,end);
    output_wrapper<Output> output(out);
    std::use_facet<facet_type>(l).to_upper(input,output);
    return output.value();
  }

But hey... We created two iostream objects because the
user wanted to convert a string to upper case...

Something is really-really-really wrong here.

------------------------------------------

Template metaprogramming techniques
just don't fit there.

You may want to enforce them as much
as you can, but they are and will be ugly.

Don't try to make things fancier than
they should be, especially when
it comes to text, and every string
I've ever seen has something
like c_str()....
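
A minimal sketch of the plain alternative, assuming the text is
already in contiguous memory: one virtual call takes the whole
[begin,end) chunk and returns a string. The class and function names
(upper_converter, ascii_upper_converter, to_upper_string) are
hypothetical, and the ASCII-only backend is a toy stand-in, not the
real Boost.Locale facet.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical interface: the backend sees the entire chunk at once.
class upper_converter {
public:
    virtual ~upper_converter() {}
    // One virtual call per string, not one per character.
    virtual std::string to_upper(char const *begin, char const *end) const = 0;
};

// Toy ASCII-only backend standing in for a real localization backend.
class ascii_upper_converter : public upper_converter {
public:
    virtual std::string to_upper(char const *begin, char const *end) const {
        std::string result;
        result.reserve(static_cast<std::size_t>(end - begin));
        for (char const *p = begin; p != end; ++p)
            result += static_cast<char>(
                std::toupper(static_cast<unsigned char>(*p)));
        return result;
    }
};

// Thin non-virtual helper: anything that offers c_str() and size()
// style access can be passed through as a pointer pair.
inline std::string to_upper_string(std::string const &s,
                                   upper_converter const &conv) {
    char const *begin = s.c_str();
    return conv.to_upper(begin, begin + s.size());
}

inline void example_usage() {
    ascii_upper_converter conv;
    assert(to_upper_string("hello, world", conv) == "HELLO, WORLD");
}
```

The backend is free to expand, shrink or reorder the text because it
owns the whole result, and the caller pays for a single virtual call.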

Artyom

