Subject: Re: [boost] Library for configuration file parsing
From: Marsh Ray (marsh_at_[hidden])
Date: 2010-11-28 19:54:30


On 11/28/2010 04:54 PM, Dean Michael Berris wrote:
> On Mon, Nov 29, 2010 at 6:25 AM, Marsh Ray<marsh_at_[hidden]> wrote:
>>
>> What I would really like is a clean and simple JSON library.
>
> At the risk of sounding PR'ish...
>
>> Last time I looked around (a year or two ago) it seemed like there were a
>> lot of 50-80% side projects, none of which gave me the warm fuzzies about
>> being tested and maintained. Many would parse but not generate, or vice
>> versa. The DOM vs. SAX architectural decisions seem relevant too.
>
> It's actually on the list of things for me to do on cpp-netlib for 0.9

Cool!

> -- I'm working on cleaning up the internals of the library, and then
> preparing to do higher level utilities that will make web application
> or web service (REST+JSON) development with C++ easier.
>
> One of the things that I will be working on is a simple, robust, and
> type-safe way for doing JSON parsing/generation using Boost.Spirit.

Honestly, the couple of times I've tried to use Spirit I have not been
successful. I've done a few templates in my time but that library blows
my mind. The concept is brilliant - and its implementation, heroic.
But trying to actually use it tends to just make me feel dumb.

I look at the diagrams at http://www.json.org/
and I see a simple byte-by-byte (or character-by-character) state
machine. The kind of thing that's been done since the early C compilers,
only much simpler. Something I could understand in a debugger or, more
importantly, review for security in a network-facing application.
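To make the "simple state machine" point concrete, here is a minimal sketch of a hand-written, character-at-a-time validator for just the JSON string literal from the json.org diagram. The class and function names are made up for illustration. It checks escapes and rejects raw control characters, but (an assumption to keep it short) it passes bytes >= 0x80 through without validating the UTF-8 encoding. Each feed() call does constant work and the object holds two small fields, so its resource use is easy to bound and easy to step through in a debugger.

```cpp
#include <cctype>
#include <string>

// Hypothetical validator for one JSON string literal, fed one byte at a time.
// Sketch only: bytes >= 0x80 are accepted as-is (no UTF-8 validation).
class JsonStringValidator {
public:
    enum State { expect_quote, in_string, after_backslash, in_hex, done, error };

    // Feed one byte and return the new state.
    State feed(char ch) {
        unsigned char c = static_cast<unsigned char>(ch);
        switch (state_) {
        case expect_quote:
            state_ = (c == '"') ? in_string : error;
            break;
        case in_string:
            if (c == '"')       state_ = done;
            else if (c == '\\') state_ = after_backslash;
            else if (c < 0x20)  state_ = error;  // raw control chars are not allowed
            break;
        case after_backslash:
            if (c == 'u') {
                hex_left_ = 4;                   // expect exactly four hex digits
                state_ = in_hex;
            } else if (c == '"' || c == '\\' || c == '/' || c == 'b' ||
                       c == 'f' || c == 'n' || c == 'r' || c == 't') {
                state_ = in_string;
            } else {
                state_ = error;
            }
            break;
        case in_hex:
            if (!std::isxdigit(c))     state_ = error;
            else if (--hex_left_ == 0) state_ = in_string;
            break;
        case done:   // anything after the closing quote is rejected here
        case error:
            state_ = error;
            break;
        }
        return state_;
    }

    State state() const { return state_; }

private:
    State state_ = expect_quote;
    int hex_left_ = 0;
};

// Hypothetical convenience wrapper: validate a complete literal in one call.
bool is_valid_json_string_literal(const std::string& s) {
    JsonStringValidator v;
    for (char ch : s) {
        if (v.feed(ch) == JsonStringValidator::error) return false;
    }
    return v.state() == JsonStringValidator::done;
}
```

Because feed() takes one byte at a time, the same object could just as well be handed input in chunks as it arrives from the network.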

> I'm positive there's already an example of how to do it with
> Boost.Spirit's Qi/Karma and I'm almost sure that I'll start with
> those.

I hate to say it, but what I want is not that.

I can't put Spirit code out on a network-facing environment for the same
reason that I can't put a Haskell program out in such an environment - I
don't understand it under the hood well enough to reason about the upper
limits on its runtime resource consumption. (Actually, in the Haskell
case it's not clear that anyone does. :-)

> The idea with the utility library is that it will be usable in
> many different contexts -- and I'm actually prioritizing the parsing
> of HTTP requests that have JSON payload in PUT/POST requests.
>
> Of course that's just work waiting to be done -- if you have specific
> use cases in mind aside from just (simple) configuration file parsing,
> I'd definitely appreciate guidance/thoughts on what you would look for
> in a JSON parsing/generation library.

Haha, cool, I get to play the customer for once.

My wishlist/thoughts:

* An interface based on UTF-8 encoded std::strings. Locales and other
string encodings are not helpful to me.

* Require minimal header dependencies. For example, I take std::vector,
map, string, shared_ptr, and BOOST_FOREACH as a given. But other big
header trees should have a justification.

* You mentioned type-safe. But the documents are completely dynamic;
there's no schema. I'd rather just have everything presented as strings,
but maybe the library would do reasonable automatic conversions on
output. I would not want incoming untrusted JSON to create objects of
attacker-chosen types unless the interface makes the code state its
expectations and throws an exception when they aren't met, the way
dynamic_cast to a reference type throws std::bad_cast (not the way
dynamic_cast to a pointer type returns a null pointer that crashes later).

* Some types I see as valuable to work with are "string of arbitrary
text" (e.g. an unqualified std::string), "string claimed to be JSON" (we
received it), and "string of known-valid JSON" (we generated or
validated it). These tend to get confused in applications, which can
result in security holes (e.g., double-escaping bugs); stricter typing
could help.

* What would make it really industrial-strength (i.e., good enough for
web apps) is a first-class mechanism for declaring limits on total
memory usage and object allocation count before beginning a parsing or
generation operation.

* The DOM could have an interface sort of like:

void f(shared_ptr<json::dom_node> jdn)
{
   shared_ptr<json::object> jo = jdn->as<json::object>();
   // throws if somehow not a json::object ^^^^^^^^^^^^

   std::string username;
   BOOST_FOREACH(json::object_pair & jopr, jo->pairs())
   {
     // Iteration actively randomizes the order.
     // It's not significant according to the spec, right? :-)
     if (jopr.name() == "username")
         username = jopr.value_as<std::string>();
            //           ^^^^^^^^ throws if not a json::string node
   }
   // ...
}

* shared_ptr is great, but an intrusive_ptr could be good too. Hopefully
cyclic references won't be a problem, but a whole-document pool
deallocator could be helpful. I like a convention where node types
expose a typedef like 'sptr_type' naming their preferred smart pointer type.

* It doesn't have to be a header-only library. It'd be better to have
the interface small and simple.

* Interfacing to boost::serialization could be cool, but it's probably
not the primary use case right now.

* I don't much care about what type of exception gets thrown. Anything
under std::exception is fine. It would be good to have line and char
position information for parsing errors.

* It would be cool if the parser could be incrementally spoon-fed input
data and code could pull data out of the generator incrementally as
well. This would facilitate usage with ASIO-like callbacks.

* A simple pair of functions for escaping and unescaping according to
the actual JSON rules for the between-doublequotes context.

* And a pony.
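For the escape/unescape wish, a minimal sketch following the JSON string grammar (RFC 8259, section 7) might look like this. The names json_escape and json_unescape are hypothetical. Both work on UTF-8 std::strings, escape only what JSON requires (double quote, backslash, control characters), decode \uXXXX escapes including surrogate pairs into UTF-8, and throw std::invalid_argument on malformed input.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Escape UTF-8 text for the between-double-quotes context of a JSON string.
// Only '"', '\\' and control characters (< 0x20) are escaped; bytes >= 0x80
// pass through unchanged, so valid UTF-8 in gives valid UTF-8 out.
std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char c : in) {
        switch (c) {
        case '"':  out += "\\\""; break;
        case '\\': out += "\\\\"; break;
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {  // remaining control characters become \u00XX
                char buf[7];
                std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                out += buf;
            } else {
                out += static_cast<char>(c);
            }
        }
    }
    return out;
}

namespace detail {
// Parse exactly four hex digits starting at in[pos].
inline std::uint32_t read_hex4(const std::string& in, std::size_t pos) {
    if (pos + 4 > in.size()) throw std::invalid_argument("truncated \\u escape");
    std::uint32_t v = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (!std::isxdigit(c)) throw std::invalid_argument("bad hex digit in \\u escape");
        v = v * 16 + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
    }
    return v;
}

// Append code point cp to out, encoded as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) { out += static_cast<char>(cp); return; }
    int extra = cp < 0x800 ? 1 : cp < 0x10000 ? 2 : 3;  // continuation bytes
    static const unsigned lead[] = {0x00, 0xC0, 0xE0, 0xF0};
    out += static_cast<char>(lead[extra] | (cp >> (6 * extra)));
    for (int k = extra - 1; k >= 0; --k)
        out += static_cast<char>(0x80 | ((cp >> (6 * k)) & 0x3F));
}
}  // namespace detail

// Decode the text between the quotes of a JSON string into UTF-8.
// Throws std::invalid_argument on malformed input.
std::string json_unescape(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x20) throw std::invalid_argument("raw control character");
        if (c == '"') throw std::invalid_argument("unescaped double quote");
        if (c != '\\') { out += static_cast<char>(c); continue; }
        if (++i == in.size()) throw std::invalid_argument("dangling backslash");
        switch (in[i]) {
        case '"':  out += '"';  break;
        case '\\': out += '\\'; break;
        case '/':  out += '/';  break;
        case 'b':  out += '\b'; break;
        case 'f':  out += '\f'; break;
        case 'n':  out += '\n'; break;
        case 'r':  out += '\r'; break;
        case 't':  out += '\t'; break;
        case 'u': {
            std::uint32_t cp = detail::read_hex4(in, i + 1);
            i += 4;  // i now indexes the last hex digit
            if (cp >= 0xD800 && cp <= 0xDBFF) {  // high surrogate: a low one must follow
                if (i + 2 >= in.size() || in[i + 1] != '\\' || in[i + 2] != 'u')
                    throw std::invalid_argument("unpaired high surrogate");
                std::uint32_t lo = detail::read_hex4(in, i + 3);
                if (lo < 0xDC00 || lo > 0xDFFF)
                    throw std::invalid_argument("invalid low surrogate");
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 6;
            } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
                throw std::invalid_argument("unpaired low surrogate");
            }
            detail::append_utf8(out, cp);
            break;
        }
        default:
            throw std::invalid_argument("unknown escape sequence");
        }
    }
    return out;
}
```

Keeping escaping in one small function pair is also what makes the "known-valid JSON" string type practical: only these functions would be allowed to produce it.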

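The `as<>()` idea from the DOM item above, failing loudly the way dynamic_cast to a reference does, can be sketched with std::dynamic_pointer_cast. Everything here (the json_sketch namespace, dom_node, string_node, object_node, string_at) is hypothetical, and std::shared_ptr stands in for boost::shared_ptr. A wrong node type throws std::bad_cast and a missing member throws std::out_of_range, so the calling code has to state its expectations.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <utility>

namespace json_sketch {  // hypothetical names, not an existing library

struct dom_node : std::enable_shared_from_this<dom_node> {
    virtual ~dom_node() = default;

    // Checked downcast: throws std::bad_cast instead of returning null.
    // Requires the node to be owned by a shared_ptr (shared_from_this).
    template <class T>
    std::shared_ptr<T> as() {
        std::shared_ptr<T> p = std::dynamic_pointer_cast<T>(shared_from_this());
        if (!p) throw std::bad_cast();
        return p;
    }
};

struct string_node : dom_node {
    explicit string_node(std::string v) : value(std::move(v)) {}
    std::string value;
};

struct object_node : dom_node {
    std::map<std::string, std::shared_ptr<dom_node>> members;

    // Look up a member that must be a string: std::out_of_range if it is
    // absent, std::bad_cast if it is some other node type.
    std::string string_at(const std::string& name) const {
        return members.at(name)->as<string_node>()->value;
    }
};

}  // namespace json_sketch
```

A std::map keeps members in sorted order rather than randomizing them, which is one simple way to make sure nothing depends on the document's member order.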
Thanks,

- Marsh


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk