Subject: Re: [boost] [AFIO] Formal review
From: Jeremy Maitin-Shepard (jeremy_at_[hidden])
Date: 2015-09-01 20:40:32
On Tue, Sep 1, 2015 at 3:42 PM, Niall Douglas <s_sourceforge_at_[hidden]>
> On 31 Aug 2015 at 20:39, Andrey Semashev wrote:
> > > Before people go "aha!", remember that afio::handle has proper error
> > > handling and has to configure a number of other items, including
> > > asking the OS for what its "real" path is and other housekeeping.
> > Why does the library do that when not asked? Why not collect this data
> > only when the user requests?
> The problem is the race free filesystem support which isn't as
> simple, unfortunately, as simply using the POSIX *at() functions.
> You need a minimum of the true canonical path of the fd, and its
> stat_t struct (specifically the inode, created and modified
> timestamps). That's at least two additional syscalls just for those,
> and AFIO may do as many as seven syscalls per handle open depending.
Can you describe in more detail exactly why this information is needed/how
it is used?
> You can disable the race free semantics, which are slow, using
> afio::file_flags::no_race_protection. However AFIO fundamentally
> assumes that any file system path consuming function is slow and will
> not be frequently called by applications expecting high performance,
> so work is deliberately loaded into path consuming functions to keep
> it away from all other functions.
> If you have a need to do a lot of file opens and closes and don't
> care about races on the file system, you are better off not using
> AFIO. STL iostreams is perfectly good for this as the only OSs which
> allow async file opens and closes is QNX and the Hurd.
There is certainly a large gap between what is possible with STL iostreams
or Boost filesystem and what is possible using platform-specific APIs for
file system manipulations. While it is obviously your choice what scope to
pick, I think it may be possible for it to be usable for everything you
intend to do with it, e.g. for writing a database backend, but also to be
much more widely useful.
> > >> - I'd much prefer native_handle() to return the actual appropriate
> native handle
> > >> type for the platform, e.g. int on POSIX, rather than a void *.
> I'm guessing
> > >> this is done in order to support alternative dispatchers, that
> > >> necessarily based on OS files at all, in the future, but perhaps a
> better way
> > >> of doing this that doesn't require casting the void* to an int,
> can be found.
> > >
> > > It's more ABI stability.
> > Stability with respect to what? Do you expect native handles to be
> > somehow different on the same platform?
> AFIO's design allows many backends e.g. ZIP archives, HTTP etc. So
> yes, native handles can vary on the same platform and I needed to
> choose one for the ABI.
Perhaps you could provide a less generic API for the native platform
filesystem, and then once you also have support for ZIP archives, etc.
create a more generic interface on top. While archive files and different
network protocols like WebDAV and FTP can certainly be presented as file
systems, particularly for read-only access, and indeed it is often possible
to access them as file systems through the native platform APIs, they tend
to differ sufficiently from file native systems that a different API may be
> > std::vector does not copy memory on C++ 11 when returned by value.
> > Correction: when returned with a move constructor or RVO.
> Which AFIO makes sure of, and I have verified as working.
What I meant is that returning std::vector<directory_entry> means that you
have to allocate memory for each path. In contrast, with a callback-based
API, you could reuse the same directory_entry for all of the entries.
> > > It's more that pwrite off the end of the file is inherently racy to
> > > concurrent read/writers. If you want safe automatic file length
> > > extension on writes, open a handle with the append flag and use that.
> > >
> > > AFIO fatal exits the process if you try to read or write past the
> > > current file size, it's a logic error and your program is inherently
> > > broken.
> > This is unacceptable behavior IMO. I would expect an exception in such
> AFIO assumes the only reason you are reading and writing past the
> maximum file size is either a logic error or memory corruption. It
> cannot tell which is which quickly, so we assume it's memory
> corruption as your program should never have this logic error. And as
> I explained before, if AFIO thinks memory corruption has occurred it
> can no longer know if what it has been told to do is correct or
> wrong, and therefore not terminating the process could cause damage
> to other people's data.
> I do have a very big concern that it is a denial of service attack on
> anything using AFIO by simply shrinking a file it is using from
> another process to cause the AFIO using process to fatal exit, so I
> will be trying to improve on the current situation in the engine
> rewrite. It's not as easy as it looks if you want to keep reads and
> writes fast, believe me, but again the ASIO reactor is in my way
> because that is code I can't control. Once I've written a replacement
> reactor I am hoping it can track scatter-gather writes a lot better
> than ASIO can and therefore handle this situation better without
> adding much overhead for tracking the individual operations.
Given that you go to great lengths to provide reasonable behavior in the
presence of other types of concurrent modification, it is very surprising
that the library makes it impossible to handle concurrent file truncation.
Writes could also fail due to running out of disk space or quota. Without
this AFIO is essentially only usable in highly controlled circumstances,
which would seem to make all of the other race guarantees unnecessary. For
instance it would not be usable for writing a general file synchronization,
backup, or file server program.
Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk