Subject: [Boost-bugs] [Boost C++ Libraries] #12456: mapped_file issues with huge file support
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2016-09-14 16:27:34
#12456: mapped_file issues with huge file support
--------------------------------------+--------------------------
 Reporter:  Igor Minin <igorm6387@…>  |      Owner:  turkanis
     Type:  Bugs                      |     Status:  new
Milestone:  To Be Determined          |  Component:  iostreams
  Version:  Boost 1.61.0              |   Severity:  Optimization
 Keywords:                            |
--------------------------------------+--------------------------
I work on an application that handles huge files (tens of GB). Memory-mapped
files are a common way to deal with such amounts of data, so I'd like to
use boost::iostreams::mapped_file to read and write those files.
Unfortunately, I ran into a number of issues that make this nearly
impossible. I describe all of the problems in one ticket rather than
several because the issues are tightly connected, and fixing one of them
requires fixing the others.
First, let us take a look at the mapped_file::open declaration:
{{{
template<typename Path>
void open( const Path& path,
           BOOST_IOS::openmode mode =
               BOOST_IOS::in | BOOST_IOS::out,
           size_type length = max_length,
           stream_offset offset = 0 );
}}}
It has a length parameter that specifies how many bytes of the file to map
into memory. By default it tries to map the whole file, but for huge files
this parameter has to be much smaller than the file size. Consider working
with a 100 GB file: mapping the whole file is too expensive and in many
cases simply impossible (on a 32-bit x86 OS, for example, the address
space is far smaller than the file).
This length parameter, clamped to the file size, is stored in the size_
member, which size() then returns:
{{{
size_ =
    static_cast<std::size_t>(
        p.length != max_length ?
            std::min<boost::intmax_t>(p.length, size) :
            size
    );
}}}
{{{
std::size_t size() const { return size_; }
}}}
That leads to the following problems:
1. mapped_file::size() does NOT return the file size. In the general case
it returns the size of the memory view. If I need to know the file size, I
have to query it separately, outside of the mapped_file code. That is
painful because I need to reimplement much of the mapped_file::open code.
mapped_file::open already knows this size but does not expose it.
I think mapped_file should have two methods, either size() and file_size()
or view_size() and size(), to report the size of the whole file and the
size of the mapped region separately.
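To illustrate the extra query that is currently needed, here is a minimal sketch using only the standard library. The helper name file_size_on_disk is hypothetical and not part of Boost; whether tellg() copes with very large files depends on the platform's large-file support, so treat this as a workaround sketch rather than a recommended implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical helper (not part of Boost.Iostreams): returns the size of the
// file on disk in bytes, or -1 if the file cannot be opened.
std::int64_t file_size_on_disk(const std::string& path)
{
    assert(!path.empty());
    // Open positioned at the end so that tellg() reports the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return -1;
    const std::streamoff end = in.tellg();
    return static_cast<std::int64_t>(end);
}
```

The result can then be compared with mapped_file::size() to tell whether the current view covers the whole file.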
2. mapped_file::resize ignores the length parameter. Consider:
{{{
void mapped_file_impl::resize(stream_offset new_size)
{
    ...
    size_ = new_size;
    param_type p(params_);
    map_file(p); // May modify p.hint
    ...
}
}}}
Compare this to the code from open: there is no min(length, size); resize
just uses new_size as the view size. Again, for a huge file this is
inappropriate and sometimes impossible.
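The clamping I would expect from resize can be sketched as follows. This is only a model of the size bookkeeping, not the actual mapped_file_impl code: the struct mapping_sizes and the function resize_clamped are hypothetical names, and the sketch assumes that an unbounded length (max_length) is represented by the largest unsigned value, so std::min handles it without a special case.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical model of the two sizes this ticket distinguishes; this is not
// the real mapped_file_impl layout.
struct mapping_sizes {
    std::uint64_t file_size;  // size of the file on disk after the resize
    std::uint64_t view_size;  // size of the region mapped into memory
};

// Resize the file to new_size, but keep the view no larger than the length
// the caller originally requested, starting at the original offset.
mapping_sizes resize_clamped(std::uint64_t new_size,
                             std::uint64_t requested_length,
                             std::uint64_t offset)
{
    assert(offset <= new_size);
    mapping_sizes s;
    s.file_size = new_size;
    // Same idea as min(length, size) in open, adjusted for the view offset.
    s.view_size = std::min(requested_length, new_size - offset);
    return s;
}
```

With this rule, growing a file to 100 GB while the caller asked for a 64 MB view would still map only 64 MB.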
3. mapped_file doesn't allow remapping a file without closing it and
reopening it with a new offset and length. That approach kills performance
for an application that needs to read a huge file intensively, piece by
piece. I think mapped_file should have a remap method that accepts a new
offset and does a job similar to resize, but only remaps the view without
resizing the file.
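For reference, the arithmetic a caller has to do for each piece can be sketched like this. As far as I understand the documentation, the offset passed to open must be a multiple of mapped_file::alignment(), so the view has to start at an aligned offset at or before the wanted position. The names window and window_for are hypothetical; the alignment is passed in as a plain number so the sketch does not depend on Boost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of one view over a large file.
struct window {
    std::uint64_t offset;  // aligned file offset where the view starts
    std::uint64_t length;  // number of bytes to map from that offset
    std::uint64_t skip;    // bytes from the view start to the wanted position
};

// Compute the view needed to read `want` bytes starting at file position
// `pos`, given the file size and the required offset alignment.
window window_for(std::uint64_t pos, std::uint64_t want,
                  std::uint64_t file_size, std::uint64_t alignment)
{
    assert(alignment > 0 && pos <= file_size);
    window w;
    w.offset = pos - pos % alignment;  // round down to an aligned offset
    w.skip = pos - w.offset;
    // Do not map past the end of the file.
    w.length = std::min(want + w.skip, file_size - w.offset);
    return w;
}
```

Today every such window costs a close() followed by an open() with the new offset and length; a remap method would take the same two numbers and reuse the already opened file.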
-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/12456>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.