Subject: Re: [boost] [filesystems] file for rename not found
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2019-03-14 12:46:53
On 3/14/19 2:29 PM, Florian Lindner via Boost wrote:
> On 14.03.19 at 10:11, Andrey Semashev via Boost wrote:
>> I haven't had experience with Lustre, but I'm guessing it may be related. Did you try calling fsync between close and rename?
> No, I was assuming that close() does this. I have modified the code to:
> namespace fs = boost::filesystem;
> auto path = getFilename();
> auto tmp = fs::path(path + "~");
> boost::iostreams::stream<boost::iostreams::file_descriptor_sink> ofs(tmp);
> ofs << info;
> fs::rename(tmp, path);
> Reproducing the bug is hard; so far it has only appeared on really huge runs with more than 4000 processors.
close doesn't guarantee that written data or metadata has reached the
media. In other words, other processes may not observe the file creation
immediately after close. fdatasync only guarantees that for the data (and
the metadata needed to read it back), not for the rest of the metadata.
fsync guarantees it for both, which is why I explicitly mentioned it and
not fdatasync. For distributed filesystems, "media" typically means
something other than the physical storage on the nodes. Exactly what it
means depends on the filesystem.
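
For illustration, here is a minimal sketch of the write, fsync, close, rename
ordering discussed above. It uses plain POSIX calls rather than the
Boost.Iostreams stream from the quoted code. The helper name
write_then_rename and the 0644 file mode are made up for the example, and
whether fsync is enough on Lustre is something this sketch cannot confirm.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>      // std::rename, std::remove
#include <string>
#include <fcntl.h>     // open
#include <unistd.h>    // write, fsync, close

// Hypothetical helper (not part of Boost): writes `contents` to path + "~",
// calls fsync on the descriptor, closes it, and only then renames the
// temporary file over `path`. Returns true on success.
bool write_then_rename(const std::string& path, const std::string& contents)
{
    assert(!path.empty());
    const std::string tmp = path + "~";

    int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;

    // write() may transfer fewer bytes than requested, so loop until done.
    const char* p = contents.data();
    std::size_t left = contents.size();
    bool ok = true;
    while (left > 0) {
        ssize_t n = ::write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;          // interrupted by a signal, retry
            ok = false;
            break;
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }

    // Ask the OS to push data and metadata to the "media" before renaming.
    if (ok && ::fsync(fd) != 0)
        ok = false;

    // close() can report errors too, so its result is checked as well.
    if (::close(fd) != 0)
        ok = false;

    if (!ok || std::rename(tmp.c_str(), path.c_str()) != 0) {
        std::remove(tmp.c_str());  // best-effort cleanup of the temporary
        return false;
    }
    return true;
}
```

A caller would use it as write_then_rename(filename, text) and check the
returned bool. Calling fsync on the file descriptor does not by itself
promise anything about the directory entry on every filesystem, so this is
a starting point for experiments rather than a guaranteed fix.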
Normally, one would expect that the OS (and the filesystem driver in the
OS, in particular) would guarantee that file creation is visible at least
to the same process (thread) that created the file, even if that
operation did not reach the media. It is possible that Lustre doesn't
maintain this guarantee, and if so, I would think this is a filesystem
problem, not a problem in the user's application or Boost.Filesystem. This
may be a design choice (which would be wrong, IMHO) or even a configurable
option with some tradeoff, not necessarily a programming bug.