Boost logo

Boost :

From: Ruben Perez (rubenperez038_at_[hidden])
Date: 2023-02-09 12:23:41


On Sun, 5 Feb 2023 at 04:56, Klemens Morgenstern via Boost
<boost_at_[hidden]> wrote:
>
> The formal review of the mustache starts today, Sunday the 5th, and ends
> next week on Tuesday the 14th.

Hi all,

This is my review of Boost.Mustache.

First of all, thanks Peter for submitting the library, and Klemens for
volunteering as review manager.

> - Does this library bring real benefit to C++ developers for real world
> use-case?

This is a utility library, and as such I do believe it has a place in Boost.
Although Mustache is not super-popular, it comes up in certain contexts.
I think having a robust Mustache engine, compatible with Describe and JSON,
brings benefits to users.

> - Do you have an application for this library?

Not right now. I'm thinking of writing a PoC pentesting software which may
require templating HTTP requests with a list of payloads. If that happens
to be the case, I'd use this library for that.

> - Does the API match with current best practices?

It has a very small API surface, which I am really glad to see.
It uses JSON as input format for data, which has both advantages (it's simple)
and disadvantages (it doesn't allow to implement lambdas, it's not that
extensible, and in its current form, it requires a lot of copying).

I value simplicity a lot, so I'm alright with using JSON. However,
I think the last point should be solved somehow. Template data is purely
read-only, and efficiency should be a goal in a library like this. For these
reasons, an API that allows the user to pass input data with minimal copies
should exist.

The partials argument doesn't seem to have the right type, as it must be a
mapping from string to string, and the API declares it as a JSON object.

If I'm not mistaken, by using JSON you're forcing the usage of UTF-8 for
the input document. I think this is perfectly fine, but it should be documented.

It may be worth exploring the option of providing a renderer::reset method
that allows to recycle the same renderer, improving memory re-use.

Is there any use case for passing a different output_ref to
renderer::render_some and renderer::finish? If there isn't, output_ref could
be passed to renderer's constructor.

> - Did you try to use it? What problems or surprises did you encounter?

I've built the library, the tests and the examples with b2, Linux, gcc11 and
with cmake, Linux, clang15 and libc++ with no problems. I had to add the example
manually to the CMakeLists.txt, as cmake doesn't currently build the examples
(the author is already aware of that).

I've tried to break the library in every possible way I've imagined (especially
with buffer overruns and XSS payloads) and I have failed, which is good.

As described in previous emails, I found a couple of possible escaping-related
problems that can constitute vulnerabilities. These are described in
https://github.com/pdimov/mustache/issues/6 and should be easy to fix.

> - What is your evaluation of the implementation?

I've navigated through most of it using a debugger, and I think it's fine.
It seemed to me that it performed a lot of copies, so I performed some
measurements. I measured the number and size of allocations performed through
the storage_ptr's and I found it higher than expected. Input data being
converted into json::value is one source for such allocations, and should
be solved by the view-based API. The renderer class also performs a lot of
copying while rendering. These allocations, however, seem to be avoidable
without changing the interface. I've raised
https://github.com/pdimov/mustache/issues/7 to address it.

> - Is the documentation helpful and clear?

I've found reading the source code more helpful.

This is what I would add:

* A discussion section, similar to what URL does. I would include these
  sections:
  * Introductory usage example. Without Boost.Describe (as plain JSON is easier
    to understand for a newcomer). I would include some explanations on the
    Mustache features being used. I don't think a full section on Mustache is
    necessary, but saying "this is a tag", "this is a partial" would be useful.
  * Using the library with user-defined types - with Describe now.
  * Using renderer to render with render_some.
  * Using string types as output.
  * A section on escaping (the magic word "security" may help here). All other
    libraries have a bunch of open issues by users not understanding that
    escaping happens by default. Also, I would describe what "HTML escape"
    means in this context (the spec doesn't say it and different libs do
    different things).
* A proper reference section, with links and human-readable explanations.
  Taking output_ref as an example, I would like to see:
  * A URL for output_ref, so you can reference it from other places in docs.
  * A human-readable explanation (e.g. "a polymorphic reference to either
    a string or a I/O stream that can be used to output a rendered Mustache
    template").
  * Explanations of what St and Os mean (ideally, pointing to its own section).
  * When you reference other functions/classes (e.g. renderer in the render()
    description), please do provide links to them. Otherwise, the user ends up
    having to CTRL+F it, which is extremely frustrating (real story from mp11).
* Doxygen comments (or similar).
  Pressing F12 and seeing comments on the function you're using helps.
* Comparison to other libraries. Specially to the one listed by the official
  Mustache page, even if it seems to be unmaintained.
* The documentation should state the exact version of the Mustache API that
  it's implementing. It would be good to briefly describe what
  "mandatory features" are in the spec, what optional features aren't
  implemented but are in the roadmap, and which ones won't happen. The spec
  is very difficult to read as a user, so don't expect people will read it.

> - How much effort did you put into your evaluation? A glance? A quick reading?
> In-depth study?

I've dedicated around 10h to this review. I've read the docs, the Mustache
discussion and spec, reviewed the Node library, built and run the examples,
tried to break them, navigated through the impl via debugger and done the
allocation study.

> - Are you knowledgeable about the problem domain?

I briefly met Mustache when working with a swagger-codegen project
(https://github.com/swagger-api/swagger-codegen). Given an OpenAPI spec,
swagger-codegen generates stub code for you in different languages.
I haven't used it from C++ or in production.

> Please explicitly state that you either *accept* or *reject* the inclusion
> of this library into boost.

My recommendation is to **CONDITIONALLY ACCEPT** this library into Boost,
with the following conditions:
1. The partials argument must be typed according to the data structure it
   requires.
2. A view-based API should be added or, at least, should be addable without
   breaking changes.
3. The found vulnerabilities must be fixed.
4. Proper documentation must be written.

Some other minor points (these are not acceptance conditions):

* The link to mustache.github.io in the docs is HTTP instead of HTTPS.
* Calling a type in the examples "reference" may not be the best idea. I
  stared at it thinking "why is this not dangling" until I figured out what
  "reference" meant in that context.
* Is the config header supposed to be included by users? If not, move to
  detail/
* Is output_ref::write supposed to be called by users? If not, make it private.
* Not being able to load the CML without the superproject contributes to the
  feeling of living in a monolith.
* Your copyright is out of date ;)

Regards,
Ruben.


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk