From: Craig Henderson (cdm.henderson_at_[hidden])
Date: 2002-10-03 10:06:09


I'd be interested in this kind of class. I have implemented a similar type
of class using the boost CRC library to calculate the CRC of the string and
using that for comparisons. It is far less feature-rich than your
implementation and does not use any pooling yet - it is still in the early
development stage for a specific use where I needed to optimise lots of
string compares.
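
A rough sketch of the checksum-first comparison idea described above. This is not the actual class: crc_string and crc32_of are hypothetical names, and the CRC-32 is hand-rolled here only to keep the example self-contained (the real code used the Boost CRC library instead). Since two different strings can share a CRC, the sketch falls back to a full compare when the checksums match.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320). Hand-rolled for
// illustration; a library CRC would normally be used instead.
inline std::uint32_t crc32_of(const std::string& s) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char c : s) {
        crc ^= c;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; xor in the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Hypothetical wrapper: computes the CRC once at construction.
class crc_string {
public:
    explicit crc_string(std::string s)
        : str_(std::move(s)), crc_(crc32_of(str_)) {}

    const std::string& str() const { return str_; }
    std::uint32_t crc() const { return crc_; }

    // Cheap rejection on differing CRCs; equal CRCs still need a
    // full compare because collisions are possible.
    friend bool operator==(const crc_string& a, const crc_string& b) {
        return a.crc_ == b.crc_ && a.str_ == b.str_;
    }
    friend bool operator!=(const crc_string& a, const crc_string& b) {
        return !(a == b);
    }

private:
    std::string str_;
    std::uint32_t crc_;
};
```

The gain comes from unequal strings, which are usually rejected on the integer compare alone; equal strings still pay for the full comparison.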

Given the other items on my "todo" list, it isn't going to get done anytime
soon, so I'd be happy to replace it with your submission ;-)

Some observations on the first look:
1) the begin() and end() members should be const if the return is
const_iterator, otherwise not. So
        iterator begin() const { return str().begin(); }
        iterator end() const { return str().end(); }
        const_iterator begin() { return str().begin(); }
        const_iterator end() { return str().end(); }
should be
        iterator begin() { return str().begin(); }
        iterator end() { return str().end(); }
        const_iterator begin() const { return str().begin(); }
        const_iterator end() const { return str().end(); }

2) you have defined operator<() but not operator>()

3) the operator<() uses reinterpret_cast<int> to cast a pointer to an int.
a) Shouldn't this be an unsigned type?
b) Shouldn't this be static_cast<>?
c) Does it need casting at all?

4) the typedef unsigned size_type; is used to return sizes from the
encapsulated basic_string, so should be
typedef typename std::basic_string<CharT>::size_type size_type;

5) there are no reverse_iterator or rbegin()/rend() methods; const/non-const,
of course :-).

6) It might be nice to have the symbol table parameters definable by the
user somehow. The INITIAL_SIZE and GROW_TIMES constants are hard coded and
not overridable, but the library user may be able to make a more informed
decision about these values than the library implementor.

7) A moot point, but the THRESHOLD member of symbol_table should be static
as it is a constant. I appreciate that the basic_immutable_string class
instantiates a static member of this class, but the static-ness should be
within the class itself, for clarity and reuse. In fact, there is no real
benefit in having it a member at all, is there?
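
To make observations 1 and 3-7 concrete, here is a minimal sketch of the interface shape I have in mind. It is not taken from the submission: immutable_string_sketch, default_table_policy and initial_table_size are hypothetical names, the class just holds a pointer to a string that is assumed to live in some pool, and std::less is my suggested substitute for the integer cast in operator<().

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical policy: table tunables as static constants that a
// user can replace by supplying a different policy type (6, 7).
struct default_table_policy {
    static const std::size_t initial_size = 1024;
    static const std::size_t grow_times = 2;
};

template <typename CharT, typename TablePolicy = default_table_policy>
class immutable_string_sketch {
    typedef std::basic_string<CharT> string_type;

public:
    // Take size_type from the encapsulated string (4).
    typedef typename string_type::size_type size_type;
    typedef typename string_type::iterator iterator;
    typedef typename string_type::const_iterator const_iterator;
    typedef typename string_type::reverse_iterator reverse_iterator;
    typedef typename string_type::const_reverse_iterator
        const_reverse_iterator;

    // Assumes 'pooled' points at a string owned by a pool elsewhere.
    explicit immutable_string_sketch(string_type* pooled) : p_(pooled) {}

    string_type& str() { return *p_; }
    const string_type& str() const { return *p_; }

    // const members return const_iterator, non-const return iterator (1).
    iterator begin() { return str().begin(); }
    iterator end() { return str().end(); }
    const_iterator begin() const { return str().begin(); }
    const_iterator end() const { return str().end(); }

    // Reverse iteration, const and non-const (5).
    reverse_iterator rbegin() { return str().rbegin(); }
    reverse_iterator rend() { return str().rend(); }
    const_reverse_iterator rbegin() const { return str().rbegin(); }
    const_reverse_iterator rend() const { return str().rend(); }

    size_type size() const { return str().size(); }

    // Order by pool address without casting to an integer (3);
    // std::less gives a total order over pointers.
    friend bool operator<(const immutable_string_sketch& a,
                          const immutable_string_sketch& b) {
        return std::less<const string_type*>()(a.p_, b.p_);
    }
    // operator>() defined in terms of operator<() (2).
    friend bool operator>(const immutable_string_sketch& a,
                          const immutable_string_sketch& b) {
        return b < a;
    }

    static std::size_t initial_table_size() {
        return TablePolicy::initial_size;
    }

private:
    string_type* p_;
};
```

A user wanting a larger table would pass their own policy struct as the second template argument; whether a template parameter or a run-time setting is the better fit for the real symbol_table is an open question.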

Regards
--Craig

"Bohdan" <warever_at_[hidden]> wrote in message
news:anhcp3$qsa$1_at_main.gmane.org...
> I'd like to propose an immutable string class, i.e. a string that has
> immutable contents which can be pooled. IMHO immutable_string
> is a better candidate for various "name" properties than std::string.
>
> pros:
> 1) less space consuming when string type can contain multiple duplicate
> values.
> 2) faster copy, assign (string size is not significant).
> 3) faster equality comparison (string size is not significant).
> 4) hash code is calculated and stored, so
> std_ext::hash< immutable_string > should be very fast.
> cons:
> 1) creation of immutable string is ~2 times slower than creation
> of ::std::string, but there are still a lot of optimizations to be
> done.
>
> Rationale:
> Most often string data belongs to one of following categories:
>
> 1) value strings.
> 2) name/symbol strings.
>
> First category requires different string manipulations
> ( concatenation, substring, ...), whereas second requires
> mainly creation, copy/assignment and equality-comparison.
> First category requirements are fully covered by std::string.
> It is also possible to use the same class for the second category,
> but the second category has fewer requirements and hence can have
> a more efficient implementation.
> Some languages (C#) implement such strings on language level.
> Actually c/c++ compilers often have "string pooling" option,
> but it works only for compile time "..." strings.
>
> I've uploaded a draft version to the files section :
> http://groups.yahoo.com/group/boost/files/immutable_string/immutstr0.zip
>
> some benchmarks for P4 (1.6 GHz):
> -----------------------------------------------------------------------------
> type std::string :
>
>                                                  bcc32(5.6)  gcc(3.1.1)  vc7
>                                                  ----------------------------
> fill array with different strings:               0.351       0.44        0.42
> find tail element of array:                      0.16        0.401       0.471
> comparing different size strings 60000000 times: 0.24        5.137       7.991
> comparing equal size strings 60000000 times:     2.884       6.8         8.482
>
> -----------------------------------------------------------------------------
> type immutable_string :
>
>                                                  bcc32(5.6)  gcc(3.1.1)  vc7
>                                                  ----------------------------
> fill array with different strings:               0.661       0.721       1.051
> find tail element of array:                      0.01        0.01        0.05
> comparing different size strings 60000000 times: 0.221       0.12        0.972
> comparing equal size strings 60000000 times:     0.2         0.11        0.961
>