From: John Salmon (salmonj_at_[hidden])
Date: 2006-09-24 10:28:11


When restoring a string from an xml_iarchive, the process stack can
grow surprisingly large. A demonstration is appended.

Restoring a string with 10000 '<' characters blows out the stack to
about 3.7MB (VmStk: 3760 kB). Notice that the demo program shows that
neither the letter 'x' nor the character '.' is problematic.

It seems that stack growth only comes from entity references in the
xml archive, i.e., sequences that match the 'Reference' pattern in
basic_xml_grammar.ipp: &gt;, &amp;, &lt;, &apos;, &quot;. My guess is
that the parser used to unescape the references has the property that
it pushes the stack every time it sees one of the Reference patterns.

This isn't just a theoretical problem. It can arise in practice if
one tries to restore a string containing XML or C++ source (all
those '<', '>' and '&'). In fact, I found it by investigating why my
stack grew to more than 1MB when I switched from a text_archive to an
xml archive in a real application.

I haven't taken a close look at the grammar, and I have no experience
at all with spirit. Is this likely to be something easily fixable, or
is it something one just has to live with?

Cheers,
John Salmon

---------------cut here serstack.cpp --------
// Demonstration that xml_iarchive blows out the
// stack when handed a string with lots of xml
// entity references: &, <, >, ', ". This demo
// does not explore what happens with unicode char refs,
// e.g., &#NNNN; and &#xXXXX;

#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>
#include <boost/serialization/string.hpp>
#include <sstream>
#include <cassert>
#include <iostream>
#include <cstdio>    // printf, snprintf, fflush
#include <cstdlib>   // system
#include <unistd.h>  // getpid, pid_t

using namespace std;
using namespace boost;

// Figuring out how much the stack has grown is *very*
// system dependent. This works on at least one
// version of Linux.
void checkStk(const char *txt){
    pid_t pid = getpid();
    char command[512];

    printf("%s", txt);
    // Flush our own output before the child process writes its line.
    fflush(stdout);
    snprintf(command, sizeof command, "grep VmStk /proc/%d/status", (int)pid);
    system(command);
}

void archive_string(const char c){
    string bigstring(10000, c);
    stringstream ss;
    archive::xml_oarchive oa(ss);
    oa << BOOST_SERIALIZATION_NVP(bigstring);

    cout << "Archiving a string of '" << c << "'\n";
    string copy_of_bigstring;
    archive::xml_iarchive ia(ss);
    checkStk("Before ia >> copy\n");
    ia >> BOOST_SERIALIZATION_NVP(copy_of_bigstring);
    checkStk("After ia >> copy\n");
    assert( bigstring == copy_of_bigstring );
}

int main(int argc, char **argv){
    const char *letters;
    if(argc == 2)
        letters = argv[1];
    else
        letters = "x>";
    // Note that once the stack is 'blown', you don't
    // learn much by testing other letters. I.e.,
    // serstack "xyz&>"
    // tells you about & but not much about >
    for(const char *p = letters; *p; ++p)
        archive_string(*p);
    return 0;
}

-----------------
# The stack grows to about 3.7MB (3760 kB) when we archive strings
# with entity references:
salmonj_at_drda0026.nyc$ serstack '<'
Archiving a string of '<'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
salmonj_at_drda0026.nyc$ serstack '&'
Archiving a string of '&'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
salmonj_at_drda0026.nyc$ serstack '"'
Archiving a string of '"'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
salmonj_at_drda0026.nyc$ serstack "'"
Archiving a string of '''
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 3760 kB
# But if we archive strings containing 'plain'
# characters, even punctuation, the stack remains
# a svelte 12kB.
salmonj_at_drda0026.nyc$ serstack 'abcdef.!@123*()789'
Archiving a string of 'a'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'b'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'c'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'd'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'e'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of 'f'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '.'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '!'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '@'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '1'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '2'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '3'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '*'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '('
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of ')'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '7'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '8'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
Archiving a string of '9'
Before ia >> copy
VmStk: 12 kB
After ia >> copy
VmStk: 12 kB
salmonj_at_drda0026.nyc$

