Subject: Re: [boost] [filesystems] file for rename not found
From: Andrey Semashev (andrey.semashev_at_[hidden])
Date: 2019-03-14 12:53:47
On 3/14/19 3:46 PM, Andrey Semashev wrote:
> On 3/14/19 2:29 PM, Florian Lindner via Boost wrote:
>>
>> Am 14.03.19 um 10:11 schrieb Andrey Semashev via Boost:
>>>
>>> I haven't had experience with Lustre, but I'm guessing it may be
>>> related. Did you try calling fsync between close and rename?
>>
>> No, I was assuming that close() does this. I have modified the code to
>>
>> {
>>   namespace fs = boost::filesystem;
>>   auto path = getFilename();
>>   auto tmp = fs::path(path + "~");
>>   fs::create_directories(tmp.parent_path());
>>   boost::iostreams::stream<boost::iostreams::file_descriptor_sink>
>> ofs(tmp);
>>   ofs << info;
>>   ::fdatasync(ofs->handle());
>>   ofs.close();
>>   fs::rename(tmp, path);
>> }
>>
>> Reproducing the bug is hard, as so far, it only has appeared on really
>> huge runs with more than 4000 processors.
>
> close doesn't guarantee that written data or metadata has reached the
> media. IOW, other processes may not observe the file creation
> immediately after close. fdatasync only guarantees that for data but not
> metadata. fsync guarantees that for both, which is why I explicitly
> mentioned it and not fdatasync. For distributed filesystems, "media"
> typically means something other than the physical storage on the nodes.
> Exactly what it means depends on the filesystem.
>
> Normally, one would expect that the OS (and the filesystem driver in
> the OS, in particular) would guarantee that file creation would be
> visible at least to the same process (thread) that created the file,
> even if that operation did not reach the media. I allow that Lustre
> doesn't maintain this guarantee, and if so, I would think this is a
> filesystem problem, not that of the user's application or
> Boost.Filesystem. This may be a design choice (which would be wrong,
> IMHO) or even a configurable option with some tradeoff, not
> necessarily a programming bug.
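
To make the fsync suggestion concrete, here is a minimal sketch of the write, fsync, close, rename sequence. It assumes a POSIX system and uses raw file descriptors instead of Boost.Iostreams so that it is self-contained. The helper names throw_errno and write_sync_rename are hypothetical and not part of Boost. Whether this actually cures the problem on Lustre is not something I can confirm.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <string>
#include <system_error>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of Boost): report an errno value as an exception.
[[noreturn]] inline void throw_errno(int err, const char* what)
{
    throw std::system_error(err, std::generic_category(), what);
}

// Hypothetical helper: write `data` to `tmp`, fsync it, close it,
// and only then rename `tmp` to `dst`.
inline void write_sync_rename(const std::string& tmp,
                              const std::string& dst,
                              const std::string& data)
{
    assert(tmp != dst); // precondition: temporary and final names differ

    const int fd = ::open(tmp.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        throw_errno(errno, "open");

    // write() may be interrupted or write less than requested, so loop.
    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0)
    {
        const ssize_t n = ::write(fd, p, left);
        if (n < 0)
        {
            if (errno == EINTR)
                continue;
            const int err = errno;
            ::close(fd);
            throw_errno(err, "write");
        }
        p += n;
        left -= static_cast<std::size_t>(n);
    }

    // fsync rather than fdatasync, and before close.
    if (::fsync(fd) != 0)
    {
        const int err = errno;
        ::close(fd);
        throw_errno(err, "fsync");
    }
    if (::close(fd) != 0)
        throw_errno(errno, "close");

    if (std::rename(tmp.c_str(), dst.c_str()) != 0)
        throw_errno(errno, "rename");
}
```

A call would look like write_sync_rename(path + "~", path, contents). If you keep the buffered Boost.Iostreams stream, note that it would also have to be flushed before the sync call, otherwise the data may still sit in the stream buffer when the descriptor is synced. On some systems, making the directory entry itself durable additionally requires an fsync on the parent directory.
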
As another possibility, creating and writing to the file is not atomic
with subsequent renaming. It is always possible that another process
removes or renames the written file before you attempt to rename it. It
may not be intended in your setup, but you should verify this
possibility and, even if that isn't supposed to happen, be prepared for
it to happen anyway (e.g. due to human actions or some sort of system
failure).
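
One way to be prepared is to use the non-throwing rename overload and treat a vanished source as an expected outcome. The sketch below uses C++17 std::filesystem so that it compiles without Boost. Boost.Filesystem offers a rename overload taking a boost::system::error_code in the same way. The names RenameStatus and try_rename are hypothetical, and what to do in the missing case (retry, rewrite the file, log and continue) depends on your application.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

// Hypothetical result type for illustration.
enum class RenameStatus
{
    ok,      // rename succeeded
    missing, // source, or a directory component of either path, does not exist
    failed   // any other error; details are in `ec`
};

// Hypothetical helper: rename without throwing and classify the outcome.
inline RenameStatus try_rename(const std::filesystem::path& from,
                               const std::filesystem::path& to,
                               std::error_code& ec)
{
    assert(!from.empty() && !to.empty()); // precondition: both paths are given

    ec.clear();
    std::filesystem::rename(from, to, ec);
    if (!ec)
        return RenameStatus::ok;

    // Another process may have removed or renamed `from` in the meantime.
    if (ec == std::errc::no_such_file_or_directory)
        return RenameStatus::missing;

    return RenameStatus::failed;
}
```

The caller can then decide per status instead of letting an exception terminate a run that has been going for hours on thousands of processors.
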