Subject: [boost] [spirit][qi] Fastest way to parse file
From: Antony Polukhin (antoshkka_at_[hidden])
Date: 2012-02-22 15:15:48
Hi,
The Boost.Spirit documentation advises using multi_pass iterators for
parsing files (or reading the data into an STL container and then passing
the container's begin and end to Spirit.Qi).
A much better solution would be to use a memory-mapped file:
boost::interprocess::file_mapping fm(filename.c_str(),
    boost::interprocess::read_only);
// offset 0 and size 0 map the whole file
boost::interprocess::mapped_region region(fm,
    boost::interprocess::read_only, 0, 0);
const char* begin = static_cast<const char*>(region.get_address());
const char* const end = begin + region.get_size();
Compared to the multi_pass iterator, the mmap approach increased parsing
speed more than fivefold and reduced memory usage and CPU load.
The mmap approach is also slightly faster than reading the data into an STL
container, and it sometimes requires less memory (depending on the OS).
I think this solution should at least be mentioned in the Spirit
documentation.
Maybe mmap could be wrapped in a class for tighter integration
with Spirit and simpler usage:
int main()
{
    namespace spirit = boost::spirit;
    using spirit::ascii::space;
    using spirit::ascii::char_;
    using spirit::qi::double_;
    using spirit::qi::eol;

    spirit::file_parser first("multi_pass.txt"); // class that does the mmap'ing
    std::vector<double> v;
    bool result = spirit::qi::phrase_parse(first
        , spirit::make_default_multi_pass(base_iterator_type())
        , double_ >> *(',' >> double_)            // recognize list of doubles
        , space | '#' >> *(char_ - eol) >> eol    // comment skipper
        , v);                                     // data read from file

    if (!result) {
        std::cout << "Failed parsing input file!" << std::endl;
        return -2;
    }
    std::cout << "Successfully parsed input file!" << std::endl;
    return 0;
}
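To make the wrapper idea a bit more concrete, here is a rough sketch of an RAII class that maps a file read-only and exposes it as a [begin, end) pair of const char*. It is only an illustration under some assumptions: it calls POSIX mmap directly instead of Boost.Interprocess (so it is POSIX-only and has no dependencies), and the names mapped_file_range and count_lines are made up for this example and are not part of Spirit or Boost.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical wrapper (not a Boost/Spirit class): maps a whole file
// read-only and exposes its bytes as a [begin, end) range of const char*.
class mapped_file_range {
public:
    explicit mapped_file_range(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd == -1)
            throw std::runtime_error("cannot open " + path);

        struct stat st;
        if (::fstat(fd, &st) == -1) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);

        if (size_ != 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("cannot mmap " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }

    ~mapped_file_range() {
        if (data_ != nullptr)
            ::munmap(const_cast<char*>(data_), size_);
    }

    // Non-copyable: the object owns the mapping.
    mapped_file_range(const mapped_file_range&) = delete;
    mapped_file_range& operator=(const mapped_file_range&) = delete;

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }
    std::size_t size() const { return size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// The pointers behave as random-access iterators, so standard algorithms
// (and iterator-based parsers) can consume the range directly.
inline std::ptrdiff_t count_lines(const mapped_file_range& r) {
    return std::count(r.begin(), r.end(), '\n');
}
```

With something like this, the caller would copy begin() into a local const char* and pass that together with end() to qi::phrase_parse, exactly as with the raw pointers obtained from mapped_region above. The mapping must outlive the parse, and an empty file yields an empty range rather than an error.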
Best regards,
Antony Polukhin