From: Ben Artin (macdev_at_[hidden])
Date: 2006-04-22 14:29:13


The question was raised of possible overlap between the proposed property_tree
library and the existing serialization and program_options libraries. My view
on this matter:

The primary purpose of the program_options library is to provide an input
mechanism for simple (flat) data, specifically for the purpose of providing
configuration options to programs that are configured via the command line or
config files. The need for this library arises because those mechanisms for
entering configuration options are common in command-line tools.

The primary purpose of the serialization library is to provide a way to convert
live C++ data structures into a form that can be transported to another program
(or to the same program at a later time) and there reconstituted into a clone of
the original data structure. The need for this library arises because the need
for communication with other processes and storage of application data outside
of the application process is common across many problem domains.

The fact that the serialization library in its current form provides only
conversions to byte streams is not important. This is just an implementation
detail that arises from the fact that most common forms of interprocess
communication and external storage use byte streams, and so conversion of C++
data to byte streams easily addresses both of those problems.

Let us assume that there exists a data representation other than a byte stream
that is suitable for external data representation or interprocess communication.
Such a representation would be a reasonable archive format for the serialization
library. A user of the serialization library could (at least in principle) take
advantage of the strengths of this data representation.

Regardless of the data representation used by the serialization library, one
must treat the serialized format as opaque; there is a substantial amount of
metadata embedded in it, and while the format can be made to be human-readable,
it's predictably difficult to make it human-editable and even harder to make it
human-writable. It is not possible to eliminate this metadata and retain other
features provided by the serialization library (such as pointer tracking and
class versioning).

Now consider those design goals and constraints: the program_options library is
an input method only; the serialization library is an input and output method,
with features that prevent the data format from being human-editable.

It is clear that in any case in which a human-editable input and output file
format is needed, neither of these libraries will satisfy the requirement.

As it turns out, the domain of persistent configuration data edited both by
humans and by programs (such as the configuration data of most GUI programs) is
such a case: the need for persistent user-editable configuration data is not
addressed by either program_options or serialization.

Enter property_trees. This library attempts to fill this need (persistent
user-editable data) and as a result overlaps with some functionality already
provided by serialization and program_options. If all you need is data input,
you could use either property_trees or program_options, as long as your data
structures are simple enough to be representable by program_options. If you need
input and output, you could use either serialization or property_trees, with
different data format and code complexity tradeoffs.
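
For concreteness, the kind of in-memory structure at issue can be sketched as a
tree of keyed nodes addressed by dotted paths. This is only an illustration in
standard C++; Node, put_value, and get_value are hypothetical names and are not
the interface of the proposed library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical minimal property tree node: a key, a value, and an
// ordered list of children. Not the proposed library's interface.
struct Node {
    std::string key;
    std::string value;
    std::vector<Node> children;
};

// Find the direct child with the given key, or nullptr if absent.
const Node* find_child(const Node& n, const std::string& key) {
    for (const Node& c : n.children)
        if (c.key == key) return &c;
    return nullptr;
}

// Return the direct child with the given key, creating it if needed.
Node& child(Node& n, const std::string& key) {
    for (Node& c : n.children)
        if (c.key == key) return c;
    n.children.push_back(Node{key, "", {}});
    return n.children.back();
}

// Store a value at a dot-separated path such as "window.width",
// creating intermediate nodes along the way.
void put_value(Node& root, const std::string& path, const std::string& value) {
    Node* cur = &root;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = path.find('.', start);
        cur = &child(*cur, path.substr(start, dot - start));
        if (dot == std::string::npos) break;
        start = dot + 1;
    }
    cur->value = value;
}

// Look up a dotted path; return fallback if any component is missing.
std::string get_value(const Node& root, const std::string& path,
                      const std::string& fallback) {
    const Node* cur = &root;
    std::size_t start = 0;
    while (true) {
        std::size_t dot = path.find('.', start);
        cur = find_child(*cur, path.substr(start, dot - start));
        if (!cur) return fallback;
        if (dot == std::string::npos) return cur->value;
        start = dot + 1;
    }
}
```

With this sketch, put_value(root, "window.width", "800") followed by
get_value(root, "window.width", "") yields "800", and a lookup of a missing
path yields the supplied fallback.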

Here are some questions that I think need to be answered in order to decide
whether this overlap is a good idea:

1) Is a property tree in-memory data structure a valuable abstraction,
regardless of how it is serialized?

I think the answer to this is yes. Representation of a hierarchy of key-value
pairs is useful across the board, whether you are writing a compiler (symbol
tables) or a GUI FTP client (cached per-directory data).

2) Is the serialization of such a data structure to formats less rich than those
needed by the serialization library useful?

I think the answer to this is yes. The obvious motivation for this is in
serialization of configuration data to a user-editable form.

3) Could such serialization be done by the serialization library?

I believe so. Provided that the serialization library has the notion of an
archive that doesn't allow for class versioning or pointer tracking and may have
limited ability to represent hierarchical data, many common configuration file
formats (including Windows INI files, Mac OS X property lists, and UNIX
key<delimiter>value formats) can be produced by the serialization library.

4) Should such serialization be done by the serialization library?

I believe so. The serialization library already has an interface for defining a
way to convert C++ objects to an external representation, and I think it would
be a good idea to maintain one interface for that purpose in boost.

5) Could a serialized form of property trees be used to input program options?

Yes.
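
To illustrate questions 2 and 5 together, here is a rough sketch of writing a
tree to a flat, hand-editable key=value text form and reading that text back as
a flat set of named options. Everything here is an assumption made for the
example: the Node type, write_flat, read_flat, the dotted-path naming, and the
'#' comment convention are hypothetical, and no whitespace trimming or escaping
is attempted.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node, as in any simple key/value hierarchy.
struct Node {
    std::string key;
    std::string value;
    std::vector<Node> children;
};

// Write every non-empty value below n as one "dotted.path=value" line.
void write_flat(const Node& n, const std::string& prefix, std::ostream& out) {
    for (const Node& c : n.children) {
        std::string path = prefix.empty() ? c.key : prefix + "." + c.key;
        if (!c.value.empty()) out << path << '=' << c.value << '\n';
        write_flat(c, path, out);
    }
}

// Read "key=value" lines into a flat option map, skipping empty lines,
// lines starting with '#', and lines without '='.
std::map<std::string, std::string> read_flat(std::istream& in) {
    std::map<std::string, std::string> options;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        std::size_t eq = line.find('=');
        if (eq == std::string::npos) continue;
        options[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return options;
}
```

The text form carries no metadata beyond the keys themselves, which is what
keeps it editable by hand; the cost is that it cannot express anything the
serialization library would need for pointer tracking or class versioning.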

So, here is one way that I see to resolve the property_trees conundrum:

1) A library providing an in-memory property tree abstraction should be
submitted.

2) A library providing serialization of property trees to some common
configuration file formats should be submitted. This could be part of the same
library as in part 1. As part of this work, the serialization library may need
to be modified to allow for archive formats that cannot represent the full range
of object and class tracking metadata.

3) Ideally, this should result in the ability to serialize arbitrary C++ objects
directly to a configuration file format -- in-memory data shouldn't have to be
in the explicit form of a property tree in order to be serializable to a
configuration file.

4) Property tree file formats should be allowed as inputs to the program options
library (and the existing program options configuration file format should be
subsumed by the library from part 2 above).
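
As a sketch of what points 2 and 3 might look like, the following shows a
reduced archive that writes ordinary C++ objects straight to key=value lines.
It only imitates the general shape of the serialization library's interface (a
serialize member template and operator& on named values). ConfigOArchive,
NamedValue, named, and to_config are hypothetical names, the version parameter
is omitted because this archive has no class versioning, and only int, double,
and std::string leaves are handled.

```cpp
#include <sstream>
#include <string>

// Hypothetical name/value pair, loosely modelled on named values in the
// serialization library.
template <class T>
struct NamedValue {
    const char* name;
    T& value;
};

template <class T>
NamedValue<T> named(const char* name, T& value) {
    return NamedValue<T>{name, value};
}

// Hypothetical output archive: emits "dotted.path=value" lines and keeps
// no pointer-tracking or class-versioning metadata.
class ConfigOArchive {
public:
    explicit ConfigOArchive(std::ostream& out) : out_(out) {}

    template <class T>
    ConfigOArchive& operator&(NamedValue<T> nv) {
        std::string saved = prefix_;
        prefix_ = prefix_.empty() ? nv.name : prefix_ + "." + nv.name;
        save(nv.value);
        prefix_ = saved;  // restore the path for the next sibling
        return *this;
    }

private:
    // Leaf values are written directly.
    void save(std::string& s) { out_ << prefix_ << '=' << s << '\n'; }
    void save(int& i) { out_ << prefix_ << '=' << i << '\n'; }
    void save(double& d) { out_ << prefix_ << '=' << d << '\n'; }
    // Any other type is expected to describe itself via serialize().
    template <class T>
    void save(T& obj) { obj.serialize(*this); }

    std::ostream& out_;
    std::string prefix_;
};

// Example user types; they never build an explicit property tree.
struct Window {
    int width;
    int height;
    template <class Archive>
    void serialize(Archive& ar) {
        ar & named("width", width) & named("height", height);
    }
};

struct Settings {
    std::string user;
    Window window;
    template <class Archive>
    void serialize(Archive& ar) {
        ar & named("user", user) & named("window", window);
    }
};

// Convenience wrapper: serialize an object to a string of config lines.
template <class T>
std::string to_config(T& obj) {
    std::ostringstream out;
    ConfigOArchive ar(out);
    obj.serialize(ar);
    return out.str();
}
```

Nested members become dotted keys, so a Settings object containing a Window
produces lines such as window.width=800. A matching input archive would be
needed for the reverse direction and is not shown.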

This would result in orthogonal components for each of the following
responsibilities:

1) In-memory property tree representation
2) External property tree representation
3) Conversion of C++ objects, including in-memory property tree representation,
to external property tree representation
4) Interpretation of property trees as command line program options
5) Parsing of command line program options from argv

For what it's worth, Mac OS X has a native property tree API, and I have
implemented a serialization archive format that produces such a property tree
from C++ data structures using the serialization library. My understanding of
the various abstractions and their interactions is based on that experience.

Ben

-- 
I changed my name: <http://periodic-kingdom.org/People/NameChange.php>
