
Subject: Re: [boost] [MMap/VM] RFC
From: Domagoj Saric (dsaritz_at_[hidden])
Date: 2016-03-06 16:19:44


On 4.3.2016. 1:18, Andy Thomason wrote:
> Hi Domagoj and Mathias.
>
> > for quite some time I've been working on a portable mmap/virtual memory
>
> I have been using Boost.Interprocess for handling large genetic data files
> in the proposed Boost.Genetics, it is intended for sharing memory images
> between processes, but works just as well as a portable mmap wrapper
> including as a block allocator.

And the "portable mmap wrapper" and the "block allocator" are IMO two
separate things that belong in separate libraries, as mmap-ing is not
inherently an IPC facility. This is one of the challenges in making a
VM/MMAP library: finding the clear API boundary where the library
should stop and on top of which Interprocess should then build.
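A minimal POSIX-only sketch of such a layering, assuming hypothetical names (raw_mapping, block_allocator) that are not Boost.Interprocess or MMap/VM API: the mapping layer only owns address space, and the allocator is a separate layer that merely uses a mapping it is handed.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <stdexcept>

// Hypothetical layer 1: a thin RAII owner of an anonymous private mapping.
// It knows nothing about allocation policy or inter-process sharing.
class raw_mapping {
public:
    explicit raw_mapping(std::size_t size)
        : size_(size),
          addr_(::mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)) {
        if (addr_ == MAP_FAILED) throw std::bad_alloc();
    }
    ~raw_mapping() { ::munmap(addr_, size_); }
    raw_mapping(const raw_mapping&) = delete;
    raw_mapping& operator=(const raw_mapping&) = delete;

    unsigned char* data() const { return static_cast<unsigned char*>(addr_); }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    void* addr_;
};

// Hypothetical layer 2: a fixed-size block allocator that only *uses* a
// mapping. Freed blocks form an intrusive free list stored inside the blocks.
class block_allocator {
public:
    block_allocator(raw_mapping& m, std::size_t block_size)
        : m_(m), block_size_(block_size) {
        // Each block must be large enough to hold a free-list link.
        if (block_size_ < sizeof(void*))
            throw std::invalid_argument("block size too small");
    }

    void* allocate() {
        if (free_list_) {
            void* p = free_list_;
            std::memcpy(&free_list_, p, sizeof free_list_); // pop the list head
            return p;
        }
        if (next_ + block_size_ > m_.size()) return nullptr; // mapping exhausted
        void* p = m_.data() + next_;
        next_ += block_size_;
        return p;
    }

    void deallocate(void* p) {
        std::memcpy(p, &free_list_, sizeof free_list_); // push onto the list
        free_list_ = p;
    }

private:
    raw_mapping& m_;
    std::size_t block_size_;
    std::size_t next_ = 0;
    void* free_list_ = nullptr;
};
```

Swapping raw_mapping for a file-backed or shared-memory mapping would not require touching block_allocator, which is the kind of boundary argued for above.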

> The only shame is that it is not 100% header-based and so needs binaries
> for different platforms.

But Interprocess is a header-only library?

> mmap is the only sensible way of working with large in-memory datasets
> as it can exceed the swap file size.

Not sure what you mean by 'it can exceed swap file size'. A mapping can
be larger than the available RAM+swap only if you use overcommit and/or
'uncommitted'/'unreserved' mappings (and thus risk AV/SIGSEGV crashes).
From your description I gather that you need/use something like a
scratch disk/file(s) (which can be useful on systems with a dedicated
but 'not big enough' swap partition).
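The reserve/commit distinction behind 'uncommitted' mappings can be sketched with POSIX calls. This assumes Linux (MAP_NORESERVE is not universal; on Windows the analogue is VirtualAlloc with MEM_RESERVE followed by MEM_COMMIT), and the helper names are hypothetical. Touching a page that was reserved but never committed is exactly the SIGSEGV case mentioned above.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper: system page size.
std::size_t page_size() { return static_cast<std::size_t>(::sysconf(_SC_PAGESIZE)); }

// Reserve address space only. PROT_NONE pages cannot be accessed (doing so
// raises SIGSEGV) and MAP_NORESERVE asks Linux not to reserve swap for them.
void* reserve(std::size_t size) {
    void* p = ::mmap(nullptr, size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// "Commit" a page-aligned sub-range by making it readable and writable.
// Physical pages are still only allocated when first touched.
bool commit(void* addr, std::size_t size) {
    return ::mprotect(addr, size, PROT_READ | PROT_WRITE) == 0;
}

// Give the whole reservation back.
void release(void* addr, std::size_t size) { ::munmap(addr, size); }
```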
IMO this is not something client code should worry about, i.e. it should
be abstracted by the VM library: you would specify ('reserve') how much
scratch space you need and the library would check whether it can use the
paging file as a scratch file (which is essentially what it is) or look for a
disk/partition with more space (and it would choose the appropriate mapping
and file creation flags/system hints optimal for scratch storage)...
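A rough sketch of the scratch-file part, assuming POSIX and a hypothetical scratch_mapping class: it maps an unlinked temporary file in a caller-chosen directory, so dirty pages are written back to that file rather than to the paging file. Choosing the directory (e.g. by comparing free space with statvfs()) and picking OS hints are left out.

```cpp
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical scratch mapping backed by an anonymous (unlinked) temp file.
class scratch_mapping {
public:
    scratch_mapping(const std::string& dir, std::size_t size) : size_(size) {
        std::string tmpl = dir + "/scratchXXXXXX";
        std::vector<char> path(tmpl.begin(), tmpl.end());
        path.push_back('\0');
        int fd = ::mkstemp(path.data());
        if (fd < 0) throw std::runtime_error("mkstemp failed");
        ::unlink(path.data()); // no name on disk; storage freed once unmapped
        if (::ftruncate(fd, static_cast<off_t>(size)) != 0) { // sparse file
            ::close(fd);
            throw std::runtime_error("ftruncate failed");
        }
        addr_ = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        ::close(fd); // the mapping itself keeps the file alive
        if (addr_ == MAP_FAILED) throw std::runtime_error("mmap failed");
    }
    ~scratch_mapping() { ::munmap(addr_, size_); }
    scratch_mapping(const scratch_mapping&) = delete;
    scratch_mapping& operator=(const scratch_mapping&) = delete;

    char* data() const { return static_cast<char*>(addr_); }
    std::size_t size() const { return size_; }

private:
    std::size_t size_;
    void* addr_ = nullptr;
};
```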
And this brings me to another point not covered in the opening post:

9. Resizable views (currently a todo item)
Besides the already mentioned resizable ('ftruncatable') mapping
objects, we can also have resizable views of those mappings. For
example, one might wish to 'walk' a file (possibly in a random fashion)
in chunks (e.g. files that contain 'ready to use data', such as
uncompressed audio or video files). The problem is which API would be
'overall best' for such use cases.
mapped_view objects are currently thin RAII wrappers around an
iterator_range and iterator_ranges have the handy advance_begin() and
advance_end() member functions (which are hidden by the non-resizable
mapped_view class). These however move in 'element'-sized chunks (bytes
for char views) which does not seem handy or efficient for mapped_views:
you'd usually want to either resize the view or advance in page-size or
mapped_view.size() chunks.
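One reason page- or view-sized steps are the natural unit: mmap offsets must be multiples of the page size (of the allocation granularity on Windows), so a view over an arbitrary byte range has to map a slightly larger, aligned window. A small sketch of that rounding, with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of what actually has to be mapped for a view.
struct view_window {
    std::uint64_t map_offset; // aligned file offset passed to mmap
    std::size_t   map_length; // length passed to mmap
    std::size_t   data_skip;  // bytes from the mapped base to the requested offset
};

constexpr std::uint64_t align_down(std::uint64_t x, std::uint64_t a) { return x - x % a; }
constexpr std::uint64_t align_up(std::uint64_t x, std::uint64_t a) { return align_down(x + a - 1, a); }

// Window covering the byte range [offset, offset + length).
view_window make_window(std::uint64_t offset, std::size_t length, std::size_t page) {
    std::uint64_t begin = align_down(offset, page);
    std::uint64_t end = align_up(offset + length, page);
    return { begin,
             static_cast<std::size_t>(end - begin),
             static_cast<std::size_t>(offset - begin) };
}
```

With byte-granular advance_begin()/advance_end() every step may need this rounding (and a remap), whereas stepping by whole pages or by the view size keeps the offsets aligned.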
For the future resizable_mapped_view class, perhaps I could retain the
advance_begin/end() interface and add:
- advance() (which does a simultaneous begin/end advance)
- increment/decrement operators which hop in chunks equal to the current
size of the view
- a seek() function that can do an absolute jump and a resize in one call.
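The interface listed above could look roughly like the following model. The class name resizable_view_model and all signatures are assumptions for illustration; the model only tracks byte offsets inside a parent of known size instead of actually remapping anything.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical model of the proposed resizable_mapped_view interface. It only
// tracks the [begin, end) byte window inside a parent object of parent_size
// bytes; a real implementation would (re)map that window on every change.
class resizable_view_model {
public:
    resizable_view_model(long long parent_size, long long begin, long long size)
        : parent_size_(parent_size) { set(begin, begin + size); }

    long long begin() const { return begin_; }
    long long end() const { return end_; }
    long long size() const { return end_ - begin_; }

    // iterator_range-style edge moves (these can shrink or grow the view).
    void advance_begin(long long n) { set(begin_ + n, end_); }
    void advance_end(long long n) { set(begin_, end_ + n); }

    // Simultaneous begin/end advance: slides the view, keeping its size.
    void advance(long long n) { set(begin_ + n, end_ + n); }

    // Hop forward/backward by exactly the current size of the view.
    resizable_view_model& operator++() { advance(size()); return *this; }
    resizable_view_model& operator--() { advance(-size()); return *this; }

    // Absolute jump and resize in one call.
    void seek(long long offset, long long new_size) { set(offset, offset + new_size); }

private:
    // Validate first, then commit, so a failed move leaves the view unchanged.
    void set(long long b, long long e) {
        if (b < 0 || b > e || e > parent_size_)
            throw std::out_of_range("view would extend outside the parent mapping");
        begin_ = b;
        end_ = e;
    }

    long long parent_size_;
    long long begin_ = 0;
    long long end_ = 0;
};
```

Validating before committing means a failed seek() or advance() leaves the view where it was, which seems a reasonable failure mode for a view that 'walks' a file.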

An additional question is whether resizable_mapped_views should hold
references to their parent mapping objects and resize them as needed (or
fail if one tries to grow them beyond the size of the mapped file/shared
memory object).
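The two options in that question can be expressed as a policy. A minimal model, with hypothetical parent_mapping, growth_policy and view_with_parent types, where parent_mapping::resize() stands in for ftruncate() (or the Windows equivalent) on the backing file or shared memory object:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a file or shared memory mapping object.
struct parent_mapping {
    std::size_t size = 0;
    void resize(std::size_t n) { size = n; } // a real one would ftruncate()
};

enum class growth_policy { fail, grow_parent };

// The view keeps a pointer to its parent, so the parent must outlive it.
class view_with_parent {
public:
    view_with_parent(parent_mapping& p, std::size_t offset, growth_policy policy)
        : parent_(&p), offset_(offset), policy_(policy) {}

    std::size_t size() const { return size_; }

    void resize(std::size_t new_size) {
        if (offset_ + new_size > parent_->size) {
            if (policy_ == growth_policy::fail)
                throw std::length_error("view would extend past the end of its parent");
            parent_->resize(offset_ + new_size); // grow the backing object first
        }
        size_ = new_size;
    }

private:
    parent_mapping* parent_;
    std::size_t offset_;
    growth_policy policy_;
    std::size_t size_ = 0;
};
```

Holding a reference ties the view's lifetime to its parent, which is the main cost of the 'resize as needed' option.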

-- 
"What Huxley teaches is that in the age of advanced technology, 
spiritual devastation is more likely to come from an enemy with a 
smiling face than from one whose countenance exudes suspicion and hate."
Neil Postman
