Subject: Re: [boost] cmake target and binary name mangling
From: Roger Leigh (rleigh_at_[hidden])
Date: 2017-07-23 21:44:37


On 23/07/17 11:42, Niall Douglas via Boost wrote:
>>> As I already mentioned in another post, specifying the entire
>>> configuration in the target name is exactly the right way to implement
>>> modern cmake 3.
>>
>> Wouldn't this, if taken to its logical conclusion, amount to a (poor?)
>> reimplementation of Boost.Build in CMake? I mean, once you have
>> target-<variant>debug-<link>static that's pretty much what it is.
>
> Why fix something that's not broke? Closely replicating the Boost.Build
> design will make replacing Boost.Build with cmake so much easier. All
> the dependent tooling etc will "just work".

I have to strongly disagree here.

It's not about being "broken", it's about being non-standard and
overly-complicated for marginal benefit. This pushes an additional
burden on end users of Boost to somehow know in advance which bits are
encoded in the library names; it's a terrible requirement to impose on
others. This forces every end user to provide some mapping from their
own build logic to the names used by Boost for encoding each component
of the configuration suffix, and then hope that the system provides
that. Consider just how costly and painful this truly is.

I've suffered from this in the past trying to use Boost with the
autotools on Linux. Debian used to use suffixes for single- and
multi-threaded. So what do I do, search for the -st and -mt suffixes
and also unsuffixed? Or hardcode the one I expect, complicated by the
provided names changing over time? I went with the latter, only to have
it fail to build on other systems with different names in use. I then
tried to autodetect all the variants, only to have people complain that
it still failed because there was some other variant I hadn't catered
for. And this was just for a single suffix variant. Thankfully, most
Linux distributions later dropped all the suffixes.

This policy forces every consumer to need to reimplement a good chunk of
logic to determine what name might be in use. And I say might, because
it's essentially an unrewarding guessing game. And every failure is an
example of compromised portability of other people's codebases caused by
this policy.
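
To make the size of that guessing game concrete, here is a minimal
Python sketch that enumerates the names a consumer might have to probe
for a single component. The tag values (gcc62, mt, st, d, 1_62) and the
function name are purely illustrative assumptions, not a list of what
any particular system ships; an empty string stands for "no suffix".

```python
from itertools import product


def candidate_library_names(component, toolsets, threadings, abi_tags, versions):
    """Return every library stem a consumer might need to probe.

    Each argument after `component` is a list of possible tag values;
    an empty string means that tag is absent from the name.
    """
    names = []
    for tags in product(toolsets, threadings, abi_tags, versions):
        parts = ["boost_" + component]
        # Skip empty tags so the fully unsuffixed name is also generated.
        parts.extend(tag for tag in tags if tag)
        names.append("-".join(parts))
    # Drop any repeats while preserving the probing order.
    return list(dict.fromkeys(names))


# Hypothetical tag values, chosen only to illustrate the growth.
candidates = candidate_library_names(
    "filesystem",
    toolsets=["", "gcc62"],
    threadings=["", "mt", "st"],
    abi_tags=["", "d"],
    versions=["", "1_62"],
)
print(len(candidates))  # 2 * 3 * 2 * 2 = 24 names for one component
```

Even with only two or three values per tag, one component expands to 24
candidate names, and every additional tag multiplies that again.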

One of the reasons I switched to using CMake was that FindBoost.cmake
encodes most of the esoteric rules used in the naming conventions and
plays the guessing game really well. It usually finds the right answer.
But that's ~2000 lines of nastiness which I have to maintain, and the
rules keep changing with each Boost release. If you're not using CMake,
and you can't hardcode some specific variant, you have to pay some of
this cost as well; multiply that by each independent downstream use of
Boost. Is that cost acceptable?

The suggestion in another post that some of this stuff could be encoded
in the pkg-config names is also subject to this problem. It again
pushes a burden onto every single consumer of the boost libraries, which
in my opinion is quite unacceptable. It also kind of defeats the point
of using pkg-config in the first place. Its whole purpose is to answer
the simple question: what do I need to compile and link with X? If
that's not handled, it has failed.

> Besides, John Maddock is 100% right that ABI requirements ought to be
> encoded into the shared library name. Saves so much time and hassle and
> confusion, plus consumers can easily REGEX out the link requirements if
> needed.

In my experience as an end user of Boost, it only *adds* time, hassle
and confusion. I want to link with a specific library, and I don't want
this obfuscated unnecessarily.
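
For reference, "REGEXing out" the link requirements would look roughly
like the Python sketch below. The pattern assumes the tag order
toolset, threading, ABI, version, and the ABI letters s, g, y, d, p, n
from the naming convention Boost documented at the time; the function
name is hypothetical, and real names vary by platform and release, so
treat the pattern as an approximation rather than a complete parser.

```python
import re

# Assumed layout: [lib]boost_<component>[-<toolset>][-mt][-<abi>][-<version>]
BOOST_NAME = re.compile(
    r"^(?:lib)?boost_(?P<component>[a-z0-9_]+?)"
    r"(?:-(?P<toolset>[a-z]+\d+))?"
    r"(?:-(?P<threading>mt))?"
    r"(?:-(?P<abi>[sgydpn]+))?"
    r"(?:-(?P<version>\d+_\d+(?:_\d+)?))?$"
)


def parse_boost_name(stem):
    """Split a mangled library stem into its tags, or return None."""
    match = BOOST_NAME.match(stem)
    return match.groupdict() if match else None


mangled = parse_boost_name("libboost_filesystem-vc141-mt-gd-1_64")
plain = parse_boost_name("boost_filesystem")
print(mangled)
print(plain)
```

Every consumer that is not using FindBoost.cmake would need to carry
and maintain something like this, and keep it in step with each change
to the naming rules.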

> Here are the cmake targets which my quickcpplib tooling autogenerates
> for AFIO v2 based on inspection of the environment (apologies for the
> long list in advance):
[...]
> ... and <special> is anything affecting the link requirements of the
> binary, so here QuickCppLib auto detected the code sanitisers
> asan,msan,tsan,ubsan all of which require to be linked like-with-like,
> and hence needed a separate <special> category.
>
> Group-level targets also exist, so "afio_sl-tsan" means "all the static
> library AFIO targets with Thread Sanitiser", "_tsan" means "all the
> targets with Thread Sanitiser" and so on.

> The above is just the mangling for cmake targets which must always be
> unique within a given cmake build else cmake will refuse to work, hence
> mangling systematically according to a strict REGEX pattern makes a ton
> of sense (and using TARGET ALIAS to a more convenient to type target
> name like "boost::afio").
>
> The actual binaries generated by the build are further mangled again
> according to this REGEXable pattern:

Apologies for saying this, but this is truly horrific. None of that
belongs in a CMake build.

One of the benefits of using a standard build system, be it the GNU
Autotools, CMake, Maven or something else, is that it provides standard
ways of doing things. These range from common command-line options and
configurable properties to build targets and behaviour. This makes the
build of one project accessible to others for both development and end
use because it uses the same conventions that all the other projects
using the build system adhere to. That is to say, there are certain
basic expectations of what the system will provide and how it will
behave. While it's certainly possible with CMake to deviate from the
default behaviour, this breaks a whole lot of those expectations, and
significantly complicates and devalues what CMake provides for us.

The key disagreement here is that there are two ways of doing things:

- Running a build and having it produce multiple builds of the same
library, with the configuration details encoded in the name, which may
be installed alongside each other. You need to know the specific
variant name to link with it.
- Running a build with a specific configuration and having it produce a
single library with a plain name; you need to run multiple separate
builds with separate install prefixes to build multiple combinations.
You don't need to know the variant name to link with it, but you do need
to know the path to the variant you want.

The former is what Boost.Build provides. The latter is what the GNU
Autotools, CMake and pretty much everyone else provides.
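
The difference for a consumer can be sketched in a few lines of Python.
The prefixes (/opt/boost, /opt/boost-debug), the tag values and the
function names below are hypothetical, chosen only to contrast the two
layouts; this is not how any particular build system computes its
flags.

```python
from pathlib import PurePosixPath


def flags_for_mangled_names(prefix, component, variant_tags):
    """First approach: one prefix, variant encoded in the library name."""
    lib_dir = PurePosixPath(prefix) / "lib"
    # The consumer must know and reproduce the exact tag sequence.
    lib_name = "-".join(["boost_" + component, *variant_tags])
    return ["-L" + str(lib_dir), "-l" + lib_name]


def flags_for_separate_prefixes(prefix, component):
    """Second approach: one prefix per variant, plain library name."""
    lib_dir = PurePosixPath(prefix) / "lib"
    # Only the prefix changes between variants; the name never does.
    return ["-L" + str(lib_dir), "-lboost_" + component]


mangled_flags = flags_for_mangled_names("/opt/boost", "filesystem", ["mt", "d"])
plain_flags = flags_for_separate_prefixes("/opt/boost-debug", "filesystem")
print(mangled_flags)  # ['-L/opt/boost/lib', '-lboost_filesystem-mt-d']
print(plain_flags)    # ['-L/opt/boost-debug/lib', '-lboost_filesystem']
```

In the second case the only thing that varies is a path supplied by the
person running the build, which is what the standard tools already
expect to be configurable.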

Boost is the exception here. Pretty much every project I've ever come
across does the latter. While I can see the reasons why the Boost
conventions were established, I don't agree that they are the ideal
solution or that they have value for the general case; there is also
value in behaving the same as the rest of the world for ease of
interoperability and use. For example, I manage just fine for all my
work projects by having the automated CI system doing builds for n
configuration variants and packaging up artefacts for each of the n
variants for Windows, and also for all the other platforms I support.

Also, in the Linux world, look at the biarch and multiarch layouts.
Here's multiarch on Ubuntu 17.04:
   /usr/lib/x86_64-linux-gnu/libtiff.so.5.2.5
   /usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.62.0
The processor type and system are encoded in the lib path. None of it
required any special mangling of the library names; I can link with
-ltiff or -lboost_filesystem and it will just work. You can also use
paths on Windows in exactly the same manner for build variants, again
with no name mangling.

I would suggest that the CMake build for Boost should do things
following the standard practices for CMake, and be as simple as possible.

Regards,
Roger

