|
Boost : |
From: JOAQUIN LOPEZ MU?Z (joaquin_at_[hidden])
Date: 2008-02-02 06:57:52
Hi Alberto, thank you for your review!
----- Mensaje original -----
De: Alberto Ganesh Barbati <AlbertoBarbati_at_[hidden]>
Fecha: Sábado, Febrero 2, 2008 0:38 am
Asunto: Re: [boost] [flyweight] Review period extended to February 3
Para: boost_at_[hidden]
> Ion Gaztañaga ha scritto:
> >
> > * What is your evaluation of the design?
>
> The overall design looks very good. However, IMHO it doesn't fully
> grasp the essence of the flyweight pattern because of a major
> flaw. The flaw is the fact that an object needs to be created in
> order to check whether an equivalent flyweight exists or not.
> Consider this code:
>
> typedef boost::flyweight<std::string> fw;
>
> fw s1("foo");
> fw s2("foo");
>
> the creation of s1 requires one call to the string constructor,
> two calls to the string copy constructor and two calls to the
> string destructor. If we could use rvalue references and
> perfect forwarding, we could avoid one or maybe even two of
> the three temporaries. However insertion of a new flyweight
> should occur rarely in typical use cases, so it doesn't matter
> much. The problem is when constructing s2, a case which is
> going to occur much more often. To construct s2 the current
> implementation calls the string constructor once, the string
> copy constructor once and the string destructor twice. Even
> in this case, we may in the future avoid one temporary, but
> not the other. We are actually creating a full-blown object
> only to throw it away! That is a waste which can be get worse
> if instead of one std::string you had an UDT which is
> expensive to construct. Notice that the latter is precisely
> the scenario where the flyweight pattern is most useful!
>
> The GoF book describes a flyweight pattern where each flyweight
> can be identified through some key object that is supposed to
> be cheaper to create and manage than the final object. In other
> words, the GoF pattern acts like an std::map, whereas this
> proposal assumes that the final object acts as the key, much
> like an std::set. I agree that the proposed approach may be
> simpler to explain and to work with and also provide an
> almost drop-in replacement of the original type, but IMHO the map
> approach is far more useful, especially for "heavy" types and for
> polymorphic types (whose construction usually requires a
> potentially expensive dynamic allocation).
I'm glad you've raised an important issue here. As it happens,
during the design of the library I devoted some thinking to the
map-vs-set problem (if you allow me to call it that way) and I opted
for the current design for the following reasons:
1. You're right that instantiating a flyweight<T> will always
create a temporary T (and hopefully just one when we get
rvalue refs and variadic templates, but also see point 3 below).
But the purpose of the library is *not* to avoid constructing
expensive objects (although that would be a nice secondary
goal), it is to reduce memory consumption through object
sharing. Collateral benefits include increased performance
in object passing around. Consider the following "heavyweight"
equivalent of your snippet above:
typedef std::string str;
str s1("foo");
str s2("foo");
Introducing the flyweight library here does not avoid constructing
s2, but it results in a final situation of lower memory usage.
Construction of a flyweight<T> object is marginally more expensive
than constructing a T because of the extra factory lookup, although
this is expected to be compensated by the fact that flywewight<T>s
are subsequently cheaper to pass around than Ts. This effect can be
observed running the performance example at
http://svn.boost.org/svn/boost/sandbox/flyweight/
libs/flyweight/doc/examples.html#example5
(or see http://lists.boost.org/boost-users/2008/01/33630.php)
So, my conclusion is that flyweightizing a type results in
actual benefits, although these are not related to the avoidance
of heavy object constructions (copy construction aside).
2. Indeed GoF introduces a key type K into the pattern that
is used to retrieve the actual values of T. So, we have
a one-to-one relation K-->T, i.e. there exists a stateless
function f of the form
T f(const K&);
that can be used to construct a T from a given K. And,
additionally, K is cheaper to construct than T. This is
the map approach, right? My question now is: is this a realistic
scenario? If K is actually cheaper to construct than T and
we can univocally get the associated T from any K, why work
with Ts and not just use Ks in the first place? The only
plaussible justifications I can think of is that f() is
computationally expensive or that T is more convenient to
work with than K, but these seem (to me) not so likely
concerns --more on the second concern on point 3 below, though.
Note that I explicitly observed that f must be *stateless*, i.e.
a K object contains exactly the same information as its associated
T value. This is not the case in most usages of std::map, which
is a reason why std::maps are useful :)
I am not plainly denying the existence of sensible K-->T
scenarios, but I thought long and hard and couldn't find
any. If you can come up with one I'll be happy to know.
So, my analysis led me to conclude that the right approach
is to assume that K==T, that is, the set approach, or at
most than K and T are just different representations of the
same information.
3. That said, there is a natural evolution for the library that
would eliminate T construction under some circumstances:
C++0x containers will provide the emplace() member function allowing
for in-place construction of an inserted elements given the
ctor arguments, so eliminating the need to construct a
temporary to pass to insert() and related memfuns. When this is
available, the flyweight lib will take advantage of it, so that
fw s2("foo");
will create no temporary std::strings. In the terminology of
point 2, we have K==const char*, T==std::string. You can argue
that this seems to contradict the thesis of point 2 (there
are no sensible usa cases where K!=T), but note that this
situation (constructing a std::string from a const char*) is
only marginally useful: in a real program most std::strings do
not get constructed from compile time literals, but rather they
are computed dynamically, so the optimizations we can introduce
here will not dominate the overall performance of the program.
> > * What is your evaluation of the implementation?
>
> The implementation seems well done. The use of the "named
> parameters" idiom is very interesting.
>
> However, I strongly disagree with the choice of the term "factory"
> for a component that actually only acts as a storage. In GoF the
> term factory is properly used for a component which is devoted to
> construct the final object given its identifying key. But in this
> proposal it's the user code that actually constructs the object,
> so the term is used incorrectly and is misleading.
I decided to keep "factory" because this component is obviously
related to the element of the same name in GoF and oher descriptions
of the pattern. Incidentally, when point 3 above gets implemented
the factory will act more like a factory in the sense you describe.
That said, I'll be happy to use whatever other terminology reviewers
agree upon.
>
> > * What is your evaluation of the documentation?
>
> The documentation looks very well written, with lot of examples.
> The only sections that I found a bit lacking are the ones about the
> policies. The policies are indeed properly described, but the bare
> description is not very helpful in showing how to write a new
> policy. In particular I couldn't understand how to write a Tracking
> policy until I actually saw the code of flyweights::refcounted.
Note taken, I'll try to refine that bit. It's no surprise that
you get the most difficulties in understanding this aspect, because
it is arguably the most convoluted.
>
> > * What is your evaluation of the potential usefulness of the
> library?
> The library as it is, is not very useful, although it might be
> potentially very useful if the flaw I described before could be
> addressed.
I hope you might concede that the lib is useful in the scenarios
described at point 1 above, even if these are not the modes of
usage you envisioned in the frist place. Looking forward to your
discussion of the arguments I presented here.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk