Boost Users :

From: Jeff Garland (jeff_at_[hidden])
Date: 2006-09-12 13:57:31


Scott Meyers wrote:
> I have a question about library interface design in general with a
> strong interest in its application to C++, so I hope the moderators will
> view this as on-topic for a list devoted to the users' view of a set of
> widely used C++ libraries. In its most naked form, the question is
> this: how important is it that class constructors ensure that their
> objects are in a "truly" initialized state?

Very. Back in the mists of programming history, which you certainly remember,
on large scale projects 'uninitialized variables' were a huge source of
errors in programs. This is much less true today -- even though it is
certainly still possible with the languages we are using. Of course compilers
warn you so there's no excuse now. Having objects that construct completely
is analogous, but maybe even one step up the food chain.

> I've always been a big fan of the "don't let users create objects they
> can't really use" philosophy. So, in general, I don't like the idea of
> "empty" objects or two-phase construction. It's too easy for people to
> make mistakes.

Exactly. And I believe if you give them this option so it is 'easier to
learn' then you will be 'training them' to make mistakes.

The real issue, for me at least, is in being able to eliminate incorrect
behavior of a 'partially constructed object' in a 10-million-line program that
I can't possibly understand fully. Let's do a small thought
experiment. Suppose a program that has been running in production for years
starts malfunctioning. I have a stack trace that shows me where something
goes wrong, but it only happens infrequently -- that is, all tests pass and
geez it's been working for years in the field without problems. So I have to
go into detective mode to figure out what's happening.

To get from 10 million LOC to say 100K LOC I simply look at the stack trace
and see where the failure is happening. Then I can start looking at the
objects involved and see what possible failure modes are consistent with the
program behavior. Now, if any class in the trace supports a default
constructor that can lead to exceptions on later access I have to consider the
possibility that the object isn't constructed correctly. This may send me
chasing across thousands of lines of code -- even to different programs if say
one program generates data and another uses it. If I know that this is
impossible because of the class design it eliminates many failure modes and
hence lines of code from consideration.

In the end I believe this is much more important than convenience, because one
latent bug like this can cost many thousands of dollars to track down.

> So, for example, if I were to design an interface for an
> event log that required a user-specified ostream on which to log events,
> I'd declare the constructor taking an ostream parameter (possibly with a
> default, but that's a separate question, so don't worry about it):
>
> EventLog::EventLog(std::ostream& logstream); // an ostream must be
> // specified
>
> I've slept soundly with this philosophy for many years, but lately I've
> noticed that this idea runs headlong into one of the ideas of
> testability: that classes should be easy to test in isolation. The
> above constructor requires that an ostream be set up before an EventLog
> can be tested, and this might (1) be a pain and (2) be irrelevant for
> whatever test I want to perform. In such cases, offering a default
> constructor in addition to the above would make the class potentially
> easier to test. (In the general case, there might be many parameters,
> and they might themselves be hard to instantiate for a test due to
> required parameters that their constructors require....)

Testing in total isolation is a myth. To be a little softer -- it's really
going to depend on the type of class you are testing whether or not it can be
rationally tested in isolation. If you haven't lately, you should re-read
Lakos's treatment of this subject in Large Scale C++ Software Design. This
book is 10 years old, but he breaks down testability in a way I've not seen
anyone else do since doing testing became all the rage. Most of the 'test
first' stuff I've seen ignores the inherent untestability of some software.

In the EventLog case, I would be totally unconcerned about requiring the
ostream -- it's an incredibly stable and well tested library. A 'level 0'
component in Lakos's lexicon. The issue for testing is more serious when the
dependency is on a JoesCustomAndEverEvolving class. Here it's a 'problem' since
not only do I need to use it, but if it's changing frequently it might break my
tests. But depending on the component I'm building it might be unreasonable to
build a stand-in -- in fact most of the time I think stubs are a waste.
Anyway, I can't think of a case where losing the correctness benefit of
complete construction will truly help simplify the overall testing effort.
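
To make the EventLog point concrete, here is a minimal sketch, assuming a
hypothetical EventLog whose only constructor takes a std::ostream&. The
class body, write_entry() and log_two_events() are invented for illustration
and are not from the original post. It shows that a test can satisfy the
required parameter with an in-memory std::ostringstream:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical EventLog: the only way to get one is to hand it a stream.
class EventLog {
public:
    explicit EventLog(std::ostream& logstream) : out_(logstream) {}

    // Writes one event per line; there is no "no stream set" state to check.
    void write_entry(const std::string& message) { out_ << message << '\n'; }

private:
    std::ostream& out_;
};

// A test needs nothing more exotic than an in-memory stream.
std::string log_two_events() {
    std::ostringstream captured;
    EventLog log(captured);
    log.write_entry("start");
    log.write_entry("stop");
    return captured.str();
}
```

Because the stream is captured in memory, the test can inspect exactly what
was logged, and no test for a "stream not set" case is needed.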

One last point. Don't forget that you may have made EventLog testing harder
since you now have to add all the tests for the 'incomplete construction'
cases to your test suite. And you still have to write tests against the 'full
up' scenarios. At least if you are going to achieve good coverage...

> Another thing I've noticed is that some users prefer a more exploratory
> style of development: they want to get something to compile ASAP and
> then play around with it. In particular, they don't want to be bothered
> with having to look up a bunch of constructor parameters in some
> documentation and then find ways to instantiate the parameters, they
> just want to create the objects they want and then play around with
> them. My gut instinct is not to have much sympathy for this argument,

I have zero sympathy. If they want to build stable and large software systems
then they need to get serious about correctness.

> but then I read in "Framework Design Guidelines" that it's typically
> better to let people create "uninitialized" objects and throw exceptions
> if the objects are then used. In fact, I took the EventLog example from
> page 27 of that book, where they make clear that this code will compile
> and then throw at runtime (I've translated from C# to C++, because the
> fact that the example is in C# is not relevant):
>
> EventLog log;
> log.WriteEntry("Hello World"); // throws: no log stream was set
>
> This book is by the designers of the .NET library, so regardless of your
> feelings about .NET, you have to admit that they have thought about this
> kind of stuff a lot and also have a lot of experience with library users.

No opinion on the framework, but when I'm 'exploring' I most often want to see
how I would use a library to write real production code, because more than
likely that's what I'm doing. My 'exploration' is more than likely a few
hundred line program to sort out how the interfaces work. If I encountered
the code example above, I'd abandon the library as unusable (assuming I
had an option). If I couldn't abandon I'd write a wrapper with a default
stream to initialize to ensure I didn't make that mistake.

And by the way, using defaults or writing initialization methods or classes
that provide common defaults is a nice way of making exploring easier. The
library can supply these in the example code.
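
One way this could look, as a sketch only: the EventLog below is a
hypothetical stand-in, and make_console_log() and stream() are names invented
here. A library might ship a small helper that chooses a common default
(std::clog in this sketch) so that exploring needs no setup, while the class
itself still cannot be constructed without a stream:

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <string>

// Hypothetical EventLog: construction always requires a stream.
class EventLog {
public:
    explicit EventLog(std::ostream& logstream) : out_(&logstream) {}

    void write_entry(const std::string& message) { *out_ << message << '\n'; }

    // Exposes the bound stream so callers (and tests) can inspect it.
    std::ostream& stream() const { return *out_; }

private:
    std::ostream* out_;  // never null: set once from a reference
};

// Hypothetical convenience helper a library could ship with its examples.
// It picks a common default (std::clog) so newcomers need no setup.
inline EventLog make_console_log() { return EventLog(std::clog); }
```

With a helper like this, the exploratory two-liner becomes
`EventLog log = make_console_log(); log.write_entry("Hello World");` and it
works as written instead of throwing at runtime.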

> But then on the third hand I get mail like this:
>
> > The .NET libraries have many objects with many constructors that
> > leave the constructed object in a not ready-to-use state.
> >
> > An example:
> > System.Data.SqlClient.SqlParameter is a class that describes a bound
> > parameter used in a database statement. Bound parameters are essential
> > to prevent SQL injection attacks. They should be exceedingly easy to use
> > since the "competition" (string concatenation of parameters into the SQL
> > statement) is easy, well understood, and dangerous.
> >
> > However, the SqlParameter class has six constructors. Only two
> > constructors create a SqlParameter object that can be immediately used.
> > The others all require that you set additional properties (of course,
> > which additional properties is unclear). Failure to prepare the
> > SqlParameter object correctly typically generates an unhelpful database
> > error when the SQL statement is executed. To add to the confusion, the
> > first ctor shown by IntelliSense has 10 parameters (which, if set
> > correctly, will instantiate a usable object). The last ctor shown by
> > IntelliSense has only 2 parameters and is the most intuitive choice. The
> > four in between are all half-baked. It's confusing, and even though I
> > use it all the time, I still have to look at code snippets to remember how.

This seems like a failure of design focus to me. If the 'big constructor' can
actually detect a failure at the point of construction it should throw an
exception then.
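
A minimal sketch of failing at the point of construction. BoundParameter is
a hypothetical stand-in, not the .NET SqlParameter class, and the rule that a
name must start with '@' is an assumption made up for the example:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a bound SQL parameter; not the .NET class.
class BoundParameter {
public:
    BoundParameter(const std::string& name, const std::string& value)
        : name_(name), value_(value) {
        // Reject bad input here, not when the statement finally executes.
        if (name_.empty() || name_[0] != '@') {
            throw std::invalid_argument("parameter name must start with '@'");
        }
    }

    const std::string& name() const { return name_; }
    const std::string& value() const { return value_; }

private:
    std::string name_;
    std::string value_;
};

// Returns true if construction is refused for the given name.
bool construction_fails(const std::string& name) {
    try {
        BoundParameter p(name, "42");
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

Any BoundParameter that exists has already passed the check, so the error
surfaces next to the offending call rather than as a database error later.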

Having said all this, the SqlParameter class might be an example of a 'GOF
builder' (I didn't look it up so I'm not sure) where the main purpose is to
gradually build a more complex object. In which case, I would tend to
eliminate all the constructors making the initial state always be 'null' or
empty. Then the user would have to call a series of methods to build up the
sql command and there might be an explicit call to 'validate' once that
process is complete. This would be a case, as opposed to EventLog, where
'full initialization' on construction might just confuse the purpose of the class.
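
A rough sketch of that builder style, under stated assumptions:
CommandBuilder, text(), bind() and the one-value-per-'?' rule are all
hypothetical and do not describe how SqlParameter or any real database API
works. The object starts empty by design, and completeness is checked by an
explicit validate() call:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical builder: starts empty by design, built up call by call.
class CommandBuilder {
public:
    CommandBuilder& text(const std::string& sql) {
        sql_ = sql;
        return *this;
    }

    CommandBuilder& bind(const std::string& value) {
        values_.push_back(value);
        return *this;
    }

    // Explicit check once building is done: the text must be non-empty and
    // there must be exactly one bound value per '?' placeholder.
    bool validate() const {
        std::size_t placeholders = 0;
        for (std::size_t i = 0; i < sql_.size(); ++i) {
            if (sql_[i] == '?') ++placeholders;
        }
        return !sql_.empty() && placeholders == values_.size();
    }

private:
    std::string sql_;
    std::vector<std::string> values_;
};
```

Here the empty initial state is the documented purpose of the class, and the
validate() step makes the "am I finished?" question explicit instead of
leaving it to a later failure.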

> So I'm confused. Constructors that "really" initialize objects detect
> some kinds of errors during compilation, but they make testing harder,
> are arguably contrary to exploratory programming, and seem to contradict
> the advice of the designers of the .NET API. Constructors that "sort
> of" initialize objects are more test-friendly (also more loosely
> coupled, BTW)

I don't agree that they are more loosely coupled as ultimately you will still
need to supply either a stub or the actual class to do something of use --
certainly with EventLog you won't be able to write many tests without
setting the i/o stream.

> and facilitate an exploratory programming style, but defer
> some kinds of error detection to runtime (as well as incur the runtime
> time/space costs of such detection and ensuing actions).
>
> So, library users, what do you prefer, and why?

Your original wisdom was correct -- in most cases I want construction to
guarantee a complete object.

Jeff


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net