Subject: Re: [boost] Requests for comments on a (partly) hypothetical non-relational serialization library
From: David Abrahams (dave_at_[hidden])
Date: 2010-06-20 07:04:54
At Sat, 19 Jun 2010 15:38:14 -0500,
> Hi all: I've been working for a while on a variety of tools to facilitate
> application development in normal (cross-platform) C++, and avoid the
> byzantine dependency chains (including needing multiple boost versions)
> which so often creep in
Cool; would love to hear more about how you do that.
> because real applications always seem to piece
> together disparate parts with different build systems, requirements, even
> how to download the source code...pretty soon you're not programming C++
> anymore, you're tinkering with Python make scripts, or Perl code
> generators, or learning Git or Subversion ... know what I'm saying?
Maybe in your view it will just amount to more tinkering, but it
sounds like http://ryppl.org is designed to address many of these
> Anyhow, I'm a fan of the Mongo database, but it's notoriously hard to build
> even the drivers, and not really suited for simple SQLite-like object
> serialization for persistence between runs of an application (even though
> this is theoretically possible, it is poorly documented and still requires
> linking against the entire Mongo system).
> So I've decided to develop a serialization framework (not a database) with
> some "NoSQL" features
From what I can find, the term "NoSQL" is so nebulous (we don't know
much except that it's not a traditional SQL database) that it's hard
to imagine what a "NoSQL" feature might be. More specifics could help.
> based on Mongo, but a lot easier to use. I believe this framework
> could provide a foundation upon which useful, moderately complex C++
> applications could be designed, by providing extensions to the
> library which are optional to use but which incorporate my work (I
> hope that doesn't sound pedantic) on general application
> development, without extra external dependencies.
This makes me a bit nervous, because it begins to sound like a
framework of its own, which tends to imply its own dependencies. I
don't think you'll find much interest here in components whose use
requires dragging in a dependency on some kind of database store.
> Specifically, these extensions would include:
> 1) A tool for generating GUI code -- for wxWidgets, in particular -- from
> archives that could be edited with a simple textual front-end, vaguely like
> 2) A custom language based on Clojure -- a Lisp dialect originally
> implemented by Rich Hickey on the JVM -- for expressing queries and
> importing/exporting data from/to an archive;
> 3) Perl6-like regular expressions for matching against textual fields in an
> 4) AI-inspired algorithms for sorting, filtering, and in other ways
> operating on archives.
These all sound quite interesting, but also they all sound like they
should be independent projects.
> My academic background is in AI -- actually, to be precise, I wrote a
> doctoral dissertation in the philosophy of science, but I researched AI in
> this context -- but I'm especially interested in nonrelational database
> theory because it better captures the process of modeling complex systems,
> and, in general, nonrelational databases are more interesting from an AI
> perspective because the lack of a fixed schema means that operations like
> sorting and filtering can require some "reasoning". I'm particularly
> interested in application development because I think one concrete
> application of AI research is to make tools like IDEs smarter. A
> non-relational serialization library could potentially serve the
> application development process not only by providing an easy way to
> persist data, but through IDE extensions or project generators -- store
> lists of debug breakpoints in an archive, or parse source code for
> namespaces, types, etc., and store the results in an archive, or an archive
> to represent all the controls in a GUI...
> The library I have in mind would differ from boost.serialization by
> providing explicit support for non-relational functionality,
Please be specific.
> and also by
> using a restricted type system
Boost.Serialization already uses a restricted type system AFAICT.
> along the lines of MongoDB and JSON: any persistable data field
> would have to be marshalled into one of a few predefined types,
> although users could explicitly extend the type system if desired.
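To make the "few predefined types" idea concrete, here is a minimal C++17 sketch of what such a restricted type system might look like. Everything in it is an assumption for illustration: the names FieldValue, Record, Breakpoint, to_record and from_record are hypothetical, and the set of field types is a guess, since the proposal does not list them. It is not the proposed library's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <variant>

// Hypothetical closed set of persistable field types (an assumption;
// the proposal does not enumerate them). std::monostate stands for "null".
using FieldValue =
    std::variant<std::monostate, bool, std::int64_t, double, std::string>;

// A schema-free record: field names mapped to values of the predefined types.
using Record = std::map<std::string, FieldValue>;

// Hypothetical application type, e.g. a debug breakpoint kept between runs.
struct Breakpoint {
    std::string file;
    int line;
    bool enabled;
};

// Marshal an application object into the restricted type system.
Record to_record(const Breakpoint& bp) {
    Record r;
    r["file"] = bp.file;
    r["line"] = static_cast<std::int64_t>(bp.line);
    r["enabled"] = bp.enabled;
    return r;
}

// Unmarshal a record back into the application type.
// std::get throws std::bad_variant_access if a field holds another type.
Breakpoint from_record(const Record& r) {
    // Precondition (checked in debug builds only): all fields are present.
    assert(r.count("file") && r.count("line") && r.count("enabled"));
    return Breakpoint{
        std::get<std::string>(r.at("file")),
        static_cast<int>(std::get<std::int64_t>(r.at("line"))),
        std::get<bool>(r.at("enabled"))};
}
```

A caller would round-trip an object with from_record(to_record(bp)). Extending the type system, as the proposal allows, would in this sketch mean adding an alternative to FieldValue.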
> Aside from writing persistence code directly in the C++ source
> (along the lines of, e.g., instantiating a serialize() template in
> namespace boost::serialization), the test or demo applications I've
> been writing use external files, written in the (currently very
> minimal) Clojure-like language I mentioned above, and an interpreter
> does the actual serialization -- so the persistence strategy could
> be altered without recompiling the application, even while it is
> running. I think this offers new potential for using AI-style
> algorithms for things like tracking usage patterns, because all of
> that could be implemented fully orthogonal to the application
That sounds pretty research-speculative at this point. Am I right?
> So, that's the project I've sort of assigned myself, and I would
> appreciate any comments and ideas and what I could do to make this
> the kind of library C++ programmers would consider trying out.
My advice: you have lots of really interesting ideas, but any one of
them by itself could make for an all-consuming project. Start by
decoupling them. Then, pick a small piece to implement first.
Nothing kills off great ambitions faster than biting off too large a
hunk at once. Also, try to be more specific and fill in more details
when you describe what you're doing. Don't assume that everyone who
would want to use your work knows anything about non-RDBMSes, AI,
Functional Programming, etc.
-- Dave Abrahams BoostPro Computing http://www.boostpro.com