[Boost-bugs] [Boost C++ Libraries] #12456: mapped_file issues with huge file support

Subject: [Boost-bugs] [Boost C++ Libraries] #12456: mapped_file issues with huge file support
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2016-09-14 16:27:34


#12456: mapped_file issues with huge file support
--------------------------------------+--------------------------
 Reporter: Igor Minin <igorm6387@…> | Owner: turkanis
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: iostreams
  Version: Boost 1.61.0 | Severity: Optimization
 Keywords: |
--------------------------------------+--------------------------
 I worked on an application that handles huge files (tens of GB). Memory-
 mapped files are a common way to deal with such amounts of data, and I'd
 like to use boost::iostreams::mapped_file to read and write those files.
 Unfortunately, I ran into a number of issues that make this nearly
 impossible.

 I describe all the problems in one ticket instead of several because the
 issues are tightly connected to each other, and fixing one of them
 requires fixing the others.

 First, let us take a look at the mapped_file::open declaration:

 {{{
     template<typename Path>
     void open( const Path& path,
                BOOST_IOS::openmode mode =
                    BOOST_IOS::in | BOOST_IOS::out,
                size_type length = max_length,
                stream_offset offset = 0 );
 }}}

 The length parameter says how many bytes of the file we wish to map into
 memory. By default open tries to map the whole file, but in general this
 parameter should be much smaller than the file size. Consider working
 with a 100 GB file: mapping the whole file is too expensive and in many
 cases simply impossible (consider a 32-bit x86 OS, for example).
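
 To make the scale concrete, here is a minimal sketch that counts how many
 fixed-size views are needed to cover a file and checks whether the file
 size even fits in a size type of a given width. The helper names and the
 256 MiB view size are illustrative assumptions, not part of
 Boost.Iostreams.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of views of `window` bytes needed to cover
// a file of `file_size` bytes (the last view may be shorter).
inline std::uint64_t windows_needed(std::uint64_t file_size,
                                    std::uint64_t window)
{
    assert(window > 0);
    return file_size / window + (file_size % window != 0 ? 1 : 0);
}

// Hypothetical helper: true if `file_size` is representable in an unsigned
// size type that is `size_bits` wide (32 for a typical 32-bit build).
inline bool fits_in_size_type(std::uint64_t file_size, unsigned size_bits)
{
    assert(size_bits >= 1 && size_bits <= 64);
    if (size_bits == 64) return true;
    return file_size <= ((std::uint64_t(1) << size_bits) - 1);
}
```

 For a 100 GiB file and 256 MiB views, windows_needed gives 400, and
 fits_in_size_type reports that the file size does not fit in 32 bits.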

 This length parameter is stored in the size_ member.

 {{{
     size_ =
         static_cast<std::size_t>(
             p.length != max_length ?
                 std::min<boost::intmax_t>(p.length, size) :
                 size
         );
 }}}

 {{{
 std::size_t size() const { return size_; }
 }}}

 That leads us to the following problems:

 1. mapped_file::size() does NOT return the file size. In the general case
 it returns the size of the mapped view. If I need to know the file size,
 I must query it separately, outside of the mapped_file code. That is
 painful because I need to reimplement a lot of the mapped_file::open code.
 mapped_file::open already knows this size but doesn't expose it.
 I think mapped_file should have two methods, either size() and file_size()
 or view_size() and size(), to report the whole file size and the size of
 the mapped region separately.
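
 The separation proposed in item 1 could look roughly like the sketch
 below. It is only a bookkeeping model under stated assumptions: the names
 mapping_sizes, whole_file and compute_sizes are hypothetical, no file is
 actually mapped, and, unlike the snippet quoted from open, the sketch
 also subtracts the offset before clamping.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical bookkeeping type; this is not the Boost.Iostreams implementation.
struct mapping_sizes {
    std::uint64_t file_size;  // size of the file on disk
    std::size_t   view_size;  // number of bytes actually mapped
};

// Sentinel meaning "map up to the end of the file" (plays the role of max_length).
const std::uint64_t whole_file = std::numeric_limits<std::uint64_t>::max();

// Computes both sizes for a view that starts at `offset`.
inline mapping_sizes compute_sizes(std::uint64_t file_size,
                                   std::uint64_t offset,
                                   std::uint64_t requested_length)
{
    assert(offset <= file_size);
    const std::uint64_t available = file_size - offset;
    const std::uint64_t view =
        requested_length == whole_file ? available
                                       : std::min(requested_length, available);
    // The view must be addressable; the file itself need not be.
    assert(view <= std::numeric_limits<std::size_t>::max());
    mapping_sizes s;
    s.file_size = file_size;
    s.view_size = static_cast<std::size_t>(view);
    return s;
}
```

 With both values kept, a caller can ask for the file size and the view
 size independently instead of re-querying the file system.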


 2. mapped_file::resize ignores the length parameter. Consider:

 {{{
 void mapped_file_impl::resize(stream_offset new_size)
 {
     ...
     size_ = new_size;
     param_type p(params_);
     map_file(p); // May modify p.hint
     ...
 }
 }}}

 Compare this to the code from open: there is no min(length, size), and
 new_size is used directly as the view size. Again, for a huge file this is
 inappropriate and sometimes impossible.
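
 One possible shape of a fix is sketched below as a standalone model: the
 length requested at open time is remembered, and resize clamps the view to
 it the same way open does. The type view_state and the constant
 max_length_sentinel are hypothetical names for illustration; this is not
 a patch against mapped_file_impl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-in for max_length: "no explicit length was requested".
const std::uint64_t max_length_sentinel =
    std::numeric_limits<std::uint64_t>::max();

// Hypothetical model of the state discussed above; no file is touched.
struct view_state {
    std::uint64_t requested_length;  // length passed when the file was opened
    std::uint64_t view_size;         // bytes currently mapped

    // Applies the same clamping as open: the view never grows beyond
    // the requested length, even if the file becomes larger.
    void resize(std::uint64_t new_file_size)
    {
        view_size = requested_length != max_length_sentinel
                        ? std::min(requested_length, new_file_size)
                        : new_file_size;
        assert(view_size <= new_file_size);
    }
};
```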

 3. mapped_file doesn't allow remapping a file without closing it and
 reopening it with a new offset and length. That approach kills performance
 in an application that needs to read a huge file intensively, piece by
 piece. I think mapped_file should have a remap method that accepts a new
 offset and does a job similar to resize, but without resizing the file,
 just remapping it.
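
 As a rough sketch of what a remap call would need to compute, the helper
 below picks the view that contains a given file position. The names
 view_range and view_for_position are hypothetical; the alignment argument
 stands in for the platform mapping granularity (in Boost.Iostreams,
 presumably the value reported by mapped_file::alignment()).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical description of a view: where it starts and how long it is.
struct view_range {
    std::uint64_t offset;  // aligned start of the view within the file
    std::uint64_t length;  // number of bytes to map
};

// Picks the view that should contain byte `pos` of the file.
// `window` must be a multiple of `alignment`, so every view offset is
// automatically a valid (aligned) mapping offset.
inline view_range view_for_position(std::uint64_t pos,
                                    std::uint64_t file_size,
                                    std::uint64_t window,
                                    std::uint64_t alignment)
{
    assert(alignment > 0 && window >= alignment && window % alignment == 0);
    assert(pos < file_size);
    view_range r;
    r.offset = pos - pos % window;                      // round down to a window boundary
    r.length = std::min(window, file_size - r.offset);  // last view may be shorter
    return r;
}
```

 An application reading a huge file piece by piece would call such a
 helper whenever the position leaves the current view and then remap to
 the returned range, without closing the file.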

-- 
Ticket URL: <https://svn.boost.org/trac/boost/ticket/12456>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.
