Subject: Re: [boost] [Booster] Or boost is useless for library developers
From: Hervé Brönnimann (hervebronnimann_at_[hidden])
Date: 2010-05-20 23:02:39


[I still lurk on the list, although I almost never post these days... sorry]

Artyom, all:
I feel compelled to chime in, because while maintaining the Bloomberg STL and core application infrastructure between 2006 and 2009, I've also seen my share of crazy stuff. With hundreds of thousands of objects linked into more than a thousand libraries, compilation times were a real concern. In addition (although this is peculiar to Bloomberg's infrastructure, for historical reasons), some executables had to link in huge amounts of code.

Sometimes the game was to try to keep the size of the debug symbols below the limit (as large as 2G on some compilers), other times it was the size of the code segment (which could reach up to 1.5G). At times it did seem futile, but when the company's business depends on it, and a seemingly innocuous change adds tens of MB just because the debug symbols grow, and you cannot roll out the next release, these are real problems.

I must say, boost could be used in the leaves of the libraries hierarchy, but was forbidden in any "core" library. I myself was not part of that decision from management, but the reasons as I understood them were that we had components with similar functionality and slightly better performance (because tuned and optimized for our use cases), but mostly I think because of the header-only approach of many boost libraries.

Although we did use templates, we were careful to limit the amount of inlining and did template hoisting every chance we got. We had reimplemented a large part of boost (granted, with our own support for allocators; Ion knows something of it, since he and Pablo Halpern had long discussions about the scoped allocator model). We attempted to factor as much code as possible into the static libs, and keep the headers for truly small inline functions. For instance, we had a partial specialization of vector<T*> that was a thin wrapper around vector<void*> (template hoisting), and we implemented basic_string<char> in terms of a non-template class.
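To make the hoisting idea concrete, here is a minimal sketch. It is not the Bloomberg code: the names VoidPtrVector and PtrVector are hypothetical, and a standalone wrapper is used instead of a partial specialization of vector<T*>. The point is only that the real work lives in one non-template class, and each instantiation adds nothing but trivial casts.

```cpp
#include <cstddef>
#include <vector>

// Non-template core. In a real library these member functions would be
// defined out-of-line in a .cpp file and compiled once into the static lib.
class VoidPtrVector {
public:
    void push_back(void* p) { d_data.push_back(p); }
    void* at(std::size_t i) const { return d_data.at(i); }
    std::size_t size() const { return d_data.size(); }
private:
    std::vector<void*> d_data;
};

// Thin typed wrapper (hypothetical name). Each PtrVector<T> instantiates
// only these one-line forwarding functions; all share VoidPtrVector's code.
// Assumes T is not const-qualified, since T* must convert to void*.
template <class T>
class PtrVector {
public:
    void push_back(T* p) { d_impl.push_back(p); }
    T* at(std::size_t i) const { return static_cast<T*>(d_impl.at(i)); }
    std::size_t size() const { return d_impl.size(); }
private:
    VoidPtrVector d_impl;
};
```

With this layout, PtrVector<Foo> and PtrVector<Bar> both resolve to the same VoidPtrVector object code, so adding a new pointer type costs almost nothing in code size or symbols.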

I think that templates can be a problem, but if that is a concern, there are techniques to mitigate that. Namely, do not allow templates to propagate. Templates are for foundational libraries (boost::function, boost::variant, std::vector, std::map, boost::shared_ptr). In application domains, use concrete types.

If I am writing a map from a set of symbols to a set of values (let's say an attribute class), that I am going to use all over my code base, I would certainly consider writing an interface (or set of interfaces) whose implementation would use, but not export, the types std::map<symbol, boost::tuple<attribute_1, attribute_2, ... > >. Thus clients (that is, my code base) would simply use MySymbolMap. All the templates would reside in MySymbolMap.o and that's it. There are also advantages to that in terms of readability.
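As a rough sketch of what such an interface might look like (my illustration, with several assumptions): the attributes are a hypothetical price and quantity, std::pair stands in for boost::tuple to keep the example self-contained, std::unique_ptr assumes C++11, and the header and implementation are shown in one block although they would be separate files.

```cpp
#include <cstddef>
#include <map>      // in real code, needed only by mysymbolmap.cpp
#include <memory>
#include <string>
#include <utility>  // likewise, implementation only

// ---- mysymbolmap.h: all that clients see ----
class MySymbolMap {
public:
    MySymbolMap();
    ~MySymbolMap();
    void set(const std::string& symbol, double price, int quantity);
    // Returns false, leaving the outputs untouched, if symbol is absent.
    bool lookup(const std::string& symbol, double* price, int* quantity) const;
    std::size_t size() const;
private:
    struct Impl;                   // defined only in the .cpp
    std::unique_ptr<Impl> d_impl;  // makes the class move-only in this sketch
};

// ---- mysymbolmap.cpp: the only place the templates are instantiated ----
struct MySymbolMap::Impl {
    std::map<std::string, std::pair<double, int> > d_map;
};

MySymbolMap::MySymbolMap() : d_impl(new Impl) {}
MySymbolMap::~MySymbolMap() {}

void MySymbolMap::set(const std::string& symbol, double price, int quantity) {
    d_impl->d_map[symbol] = std::make_pair(price, quantity);
}

bool MySymbolMap::lookup(const std::string& symbol,
                         double* price, int* quantity) const {
    std::map<std::string, std::pair<double, int> >::const_iterator it =
        d_impl->d_map.find(symbol);
    if (it == d_impl->d_map.end()) return false;
    *price = it->second.first;
    *quantity = it->second.second;
    return true;
}

std::size_t MySymbolMap::size() const { return d_impl->d_map.size(); }
```

Client translation units then mention only MySymbolMap, so neither the map nor the pair appears in their symbols.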

For instance, vector<MyClass> becomes MyClassVector. In the first phase, it can be a typedef, but having the type defined separately and used as MyClassVector throughout the code base gives the option to later define it as a thin wrapper, and move functions out-of-line so that translation units that include it no longer have to reference vector<MyClass>. Likewise, since tuples and variants can be template hogs, it was frowned upon to use typedefs for instantiations of the variant template; we were encouraged instead to define a concrete type for each tuple or variant instantiation.
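The two phases could look roughly like the following sketch. The member d_id and the particular set of forwarded functions are assumptions chosen for illustration; a real wrapper would expose whatever subset of the vector interface the code base actually uses.

```cpp
#include <cstddef>
#include <vector>

struct MyClass { int d_id; };  // hypothetical payload

// Phase 1: a plain alias. Clients already write "MyClassVector", but every
// translation unit still instantiates std::vector<MyClass> members itself.
//
//   typedef std::vector<MyClass> MyClassVector;

// Phase 2: same name, now a concrete class. Client code that used only
// these operations keeps compiling unchanged.
class MyClassVector {
public:
    void push_back(const MyClass& value);
    const MyClass& operator[](std::size_t index) const;
    std::size_t size() const;
private:
    std::vector<MyClass> d_items;
};

// ---- would live in myclassvector.cpp, compiled once ----
void MyClassVector::push_back(const MyClass& value) {
    d_items.push_back(value);
}

const MyClass& MyClassVector::operator[](std::size_t index) const {
    return d_items[index];
}

std::size_t MyClassVector::size() const { return d_items.size(); }
```

Because the member functions are out-of-line, the vector<MyClass> code and its symbols end up in a single object file rather than in every client.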

So unless I have a very good reason to keep templates, when I am writing application-specific code, I try to factor my types into a single translation unit, within which I use template tools for the implementation. This is a well-known technique, but unfortunately, if you don't apply it systematically, template instantiation creeps in and it's a pain to refactor afterwards. Of course, this presupposes that there is some kind of hierarchical approach to code design (and thus some kind of high-level architect - we had John Lakos).

Speaking of that, and just as a case in point, I think at some point the longest symbol in the code base was a whopping 10K long (mangled, iirc) on the IBM compiler (which has a poor mangling scheme, unlike the SunPro compiler, which does some sort of subexpression factoring, so that the same symbol was only a few KB there). If memory serves, it was some kind of

> "std::map< std::map<std::string, std::pair<float, MyClass> >::iterator, std::pair<std::string, std::pair<double, MyClass> > >::iterator
> std::map< std::map<std::string, std::pair<float, MyClass> >::iterator, std::pair<std::string, std::pair<double, MyClass> > >::insert(std::map< std::map<std::string, std::pair<float, MyClass> >::iterator, std::pair<std::string, std::pair<double, MyClass> > >::iterator, std::pair<std::string, std::pair<double, MyClass> >::iterator, std::pair<std::string, std::pair<double, MyClass> > > const&);"

or something like this, but with a few more levels of nesting in the pairs and maps. You might think it's crazy, but it's actually not that hard to do with code like:

#include <map>
#include <string>
#include <utility>

struct MyClass {};

struct FloatMap {
   typedef std::pair<float, MyClass> ValueClass;
   typedef std::map<std::string, ValueClass> SymbolMap;
   typedef SymbolMap::value_type SymbolValue;
   typedef SymbolMap::iterator iterator;
};
struct DoubleMap {
   typedef std::pair<double, MyClass> ValueClass;
   typedef std::map<std::string, ValueClass> SymbolMap;
   typedef SymbolMap::value_type SymbolValue;
   typedef SymbolMap::iterator iterator;
};

// Note: map iterators have no operator<, so real code would also need
// a custom comparator as the third template argument.
std::map<FloatMap::iterator, DoubleMap::SymbolValue> myMap;
myMap.insert(...);

Modeling relationships between financial instruments can involve several levels of maps, and that's absolutely legit. The point is that, instead of defining his own types, the developer kept defining them using typedefs and std:: facilities like map, pair, and iterator, and there were several levels of that in different files. Each file by itself looked nice and concise. But in the end, he ended up using it within a function template (or inline function, I can't remember which), which was instantiated within dozens of translation units. Hence the occurrence of that symbol in several (too many) object files.
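The underlying reason is that a typedef is only an alias: the compiler still sees, and mangles, the fully spelled-out template, whereas a struct introduces a new name of its own. The sketch below illustrates this with hypothetical names (FloatSymbolMapAlias, FloatSymbolMap, typeNameLength). Note that the string returned by typeid(T).name() is implementation-defined, so the length comparison is only indicative of what mainstream compilers do.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>
#include <typeinfo>
#include <utility>

struct MyClass {};

// An alias: exactly the same type as the template it names, so symbols
// that mention it carry the whole nested template spelling.
typedef std::map<std::string, std::pair<float, MyClass> > FloatSymbolMapAlias;

// A concrete type: symbols that mention it carry only "FloatSymbolMap".
struct FloatSymbolMap {
    std::map<std::string, std::pair<float, MyClass> > d_map;
};

// Length of the implementation-defined name of T (on GCC and Clang this
// is the mangled name; other compilers may differ).
template <class T>
std::size_t typeNameLength() {
    return std::strlen(typeid(T).name());
}
```

Comparing typeNameLength<FloatSymbolMap>() with typeNameLength<FloatSymbolMapAlias>() on your own compiler gives a feel for how quickly the names grow once such aliases are nested a few levels deep.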

--
Hervé Brönnimann
hervebronnimann_at_[hidden]
