Boost logo

Boost Users :

From: Andy (atompkins_at_[hidden])
Date: 2006-03-07 13:29:36


Pavol Droba <droba_at_[hidden]> wrote in
news:20060307091753.GU793_at_[hidden]:

> On Fri, Mar 03, 2006 at 11:29:58PM +0100, Olaf van der Spek wrote:
> > >
>> >
>> > Well, this is a more complicated point. Small optimization you are
>> > proposing does not really make any difference. I have read a paper
>> > (I don't remember where),
>>
>> Why not? Isn't a simple compare much faster than two table lookup?
>>
>
> Ok, maybe you are right. One compare will not cost much and it can
> speedup few cases.
>
>> > where author proposes to cache the results of tolower during the
>> > comparison.
>>
>> I'm not sure how that would work, could you explain?
>>
>
> I don't remember the details. The idea was to eliminate some of the
> virual calls, that take place when dealing with locales. The algorihtm
> was described on several pages. Unfortunaltely, I forgot where I have
> seen that paper.
>
> Regards, Pavol

I believe what you are looking for is found in Effective STL by Scott
Meyers (pages 229-237). On page 236-237 he writes a case insensitive
string comparison functor that has this optimization, here it is:

struct lt_str_2 :
  public std::binary_function<std::string, std::string, bool>
{
  struct lt_char {
    const char* tab; lt_char(const char* t) : tab(t) {} bool
    operator()(char x, char y) const {
      return tab[x-CHAR_MIN] < tab[y-CHAR_MIN];
    }
  };

  char tab[CHAR_MAX - CHAR_MIN + 1]; lt_str_2(const std::locale& L =
  std::locale::classic()) {
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(L);
    for (int i=CHAR_MIN; i<=CHAR_MAX; ++i)
      tab[i-CHAR_MIN] = (char)i;

    ct.toupper(tab, tab + (CHAR_MAX - CHAR_MIN + 1));
  }
  
  bool operator()(const std::string& x, const std::string& y) const {
    return std::lexicographical_compare(x.begin(), x.end(),
                                        y.begin(), y.end(),
                                        lt_char(tab));
  }
};

He then talks about the pros and cons of this. "This isn't a fully
general solution ..., if your were working with 32-bit UCS-4
characters." "This might be slower if you're creating an lt_str_2
function object, using it to compare a few short string, and then
throwing it away." "For any substantial use, though, lt_str_2 will be
noticeable faster"

I hope that I am allowed to reproduce this. If I am not, I humbly ask
Scott Meyers for forgiveness.

Andy


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net