Boost logo

Boost :

From: Baptiste Lepilleur (blepctc_at_[hidden])
Date: 2002-09-09 17:31:27


----- Original Message -----
From: "Craig Henderson" <cdm.henderson_at_[hidden]>
To: <boost_at_[hidden]>
Sent: Monday, September 09, 2002 9:59 PM
Subject: [boost] Re: Container Difference proposal

> "Rozental, Gennadiy" <Gennadiy_at_[hidden]> wrote in message
> news:1373D6342FA1D4119A5100E029437F6401169E1A_at_clifford.devo.ilx.com...
> >
> > In most part I disagree with what you proposing here.
> >
[...]
> > 2. What is change_set is it some special type?
>
> No, it's just a container that the algorithm populates with the changes.
The
> actual definition is:
>
> template<typename T>
> class lcs_change : boost::noncopyable
> {
> private:
> std::size_t rec_num_; // record number
> record_type type_; // record type (see enum above)
> const T &data_; // record data

Hmm, are all this data really needed ? Couldn't lcs_change be reduced to
record_type ? It should be possible to recompute other using the original
sequence. Unless I'm mistaken, the current scheme is not very space
effecient (rougly 12 bytes per change). If you are comparing large text on a
character basis that easily size around 1Mo, then the *12 factor became
really meaningful.

The 'computed' data could be provided by for convenience by sugar coat
iterator (see by the end of the mail).

[...]

> template<typename Seq, typename Res=std::list< lcs_change<Seq::value_type>
*> >
> class longest_common_subsequence : boost::noncopyable

It seems that it should really be:

template<typename ValueType, typename ResultSequence = std::deque<
lcs_change<ValueType> > >
class longest_common_subsequence : boost::noncopyable

I believe deque offer a better memory growth scheme (page based). Making a
double liked list of pointers on lcs_change seems overkill. Or, does the
algorithm need to insert/remove change in the middle of the result set
(which would explain the choice of slist) ?

What is the rational for making lcs_change and longest_common_subsequence
non copyable ? There does not seem to be anything wrong by allowing them to
be copyable.

[...]

> > 4. Why are you using pointers and not a references?
>
> I like to use a const reference for input parameters and a pointer for
> output parameters, and place in the input parameters before the output
> parameters. This makes for a clean interface which provides more
information
> to the user about the use of parameters.

Boost follows Scott Meyers "quality programming practices" given in his
"Effectice C++" series (see guideline on boost page). Among others, there is
a guideline concerning the use of pointer and reference, which, if I
remember well, goes down to this: "use pointer only when you need to
represent a NULL pointer, otherwise use reference."

Indicating input and output can usually be better communicated through
parameter name, type or convention (STL). Below, OutIt makes it clear that
it is an output iterator.

> > Here how I invision it
> >
> > template<typename InIt1, typename InIt2, typename OutIt >
> > void
> > longest_common_subsequence( InIt1 first_begin, InIt1 first_end,
> > InIt2 second_begin,
> > OutIt& res_begin );
> >
> > template<typename InIt, typename DiffIt, typename OutIt >
> > void
> > apply_diff( InIt input_begin, InIt input_end, InIt diff_begin, OutIt
> > res_begin );
>
> Yep, pretty much my though if I get rid of the class.

I actually prefer this interface: it keeps the algorithm interface as simple
as possible and mesh very well with existing STL algorithm.

It makes it less clumsy to add functions to extend the interface, and allows
for automatic type deduction. It also keep the algorithm apart from sugar
coat functions like apply_diff (by the way, what is it use ? Since
lcs_change stores a reference on the original data, The second sequence
obviously still need to exist for apply_diff to work...)

Also, it is a lot more flexible: I can do some really twisted thing in the
assigment operator of res_begin if I want to. Suppose that lcs_change have
been simplified down to record_type, I could store the change in a bitset
using only 2 bits, or recompose a convenient sequence change (as in the
current lcs_change) on the fly, or do some other processing right away,
without even needing to store the 'changes'.

In addition, depending on iterators and not sequences provides more
flexibility. LCS may be used by another algorithm which may only have
iterators as input!

Keeping the algorithm interface as simple and as generic as possible is I
think a very important element (Guru of the week #84,
http://www.gotw.ca/gotw/084.htm, helped me a lot understanding that aspect).
Helper functions can then easily be added to make the interface friendlier
if need be.

> It looks like we are fairly close on our thoughts here. Thanks for your
> interest and all your comments. Like I said, I'll put the code and docs in
> the sandbox asap. Any more thoughts, ideas or comments, please post them
:-)

I look forward to it,
Baptiste.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk