Subject: Re: [boost] [filesystem]: infinite-recursion with symlink
From: Zachary Turner (divisortheory_at_[hidden])
Date: 2009-11-24 17:44:48


On Tue, Nov 24, 2009 at 4:03 PM, Beman Dawes <bdawes_at_[hidden]> wrote:

> On Tue, Nov 24, 2009 at 10:16 AM, Zachary Turner <divisortheory_at_[hidden]> wrote:
>
> >
> > a) timestamp operations
> >
>
> Which specific timestamp operations do you need? Last write time is already
> supported.
>

All timestamps which are gettable / settable via system calls. On windows
this means create/access/modify, and on posix this means access/modify/change.
One of the posix ones (the change time, I believe) is not actually settable.
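
To make the posix side concrete, here is a minimal sketch that reads the three
timestamps through stat(). The names file_times and get_file_times are made up
for this illustration and are not part of Boost.Filesystem; the sketch is
posix-only and ignores sub-second precision.

```cpp
#include <sys/stat.h>   // POSIX stat(); this sketch does not cover Windows
#include <cassert>
#include <ctime>

// Hypothetical aggregate for illustration; not a Boost.Filesystem type.
struct file_times {
    std::time_t access;  // st_atime: last access
    std::time_t modify;  // st_mtime: last data modification
    std::time_t change;  // st_ctime: last status change (cannot be set directly)
};

// Returns true and fills `out` on success, false if stat() fails.
inline bool get_file_times(const char* path, file_times& out) {
    assert(path != nullptr);
    struct stat st;
    if (::stat(path, &st) != 0) return false;
    out.access = st.st_atime;
    out.modify = st.st_mtime;
    out.change = st.st_ctime;
    return true;
}
```

A windows counterpart would presumably sit behind the same struct with a
creation-time field instead of the change time.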

>
>
> > b) cross-platform create/open of files
> >
>
> Nothing planned, beyond the current <fstream> support.
>
Unfortunate, as there is really quite a bit in common between the two
operating systems' create/open methods that could be exploited. For example
 (here, <-> means either equivalent or more or less equivalent):

O_DIRECT <-> FILE_FLAG_NO_BUFFERING|FILE_FLAG_WRITE_THROUGH
O_NONBLOCK <-> FILE_FLAG_OVERLAPPED
O_CREAT | O_EXCL <-> CREATE_NEW
O_CREAT | O_TRUNC <-> CREATE_ALWAYS

You can abstract these into an enumeration with values such as create_direct,
create_async, etc., while also defining all the platform-specific flags and
allowing them to be combined with the generic ones.
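
As a rough illustration of such an enumeration, here is a minimal sketch of the
posix half of the mapping. The names open_flags and to_posix_flags are
hypothetical (only create_direct and create_async come from the text above),
and the second parameter is one assumed way of letting platform-specific flags
be combined with the generic ones.

```cpp
#include <fcntl.h>   // POSIX open() flags; only the POSIX side is sketched here
#include <cassert>

// Hypothetical generic flags for illustration.
enum open_flags : unsigned {
    create_new    = 1u << 0,  // fail if the file already exists
    create_always = 1u << 1,  // create, or truncate an existing file
    create_direct = 1u << 2,  // unbuffered i/o
    create_async  = 1u << 3   // non-blocking i/o
};

// Translate generic flags to open() flags, OR-ing in any native extras.
inline int to_posix_flags(unsigned generic, int native_extra = 0) {
    // The two creation dispositions are mutually exclusive.
    assert(!((generic & create_new) && (generic & create_always)));
    int out = native_extra;
    if (generic & create_new)    out |= O_CREAT | O_EXCL;
    if (generic & create_always) out |= O_CREAT | O_TRUNC;
    if (generic & create_async)  out |= O_NONBLOCK;
#ifdef O_DIRECT                  // not defined on every POSIX system
    if (generic & create_direct) out |= O_DIRECT;
#endif
    return out;
}
```

For example, to_posix_flags(create_new | create_async, O_APPEND) yields
O_CREAT | O_EXCL | O_NONBLOCK | O_APPEND. A windows translation would return
a creation disposition plus a flags-and-attributes value instead of one int.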

Other areas of boost require handles to operate on. For example,
boost::asio supports asynchronous operations but requires a handle that's
been opened with the appropriate flags (FILE_FLAG_OVERLAPPED, for example).
It doesn't actually support async i/o for posix filesystem handles, but I
have my own extensions to boost::asio that do allow this. They map to the
posix aio_* family of apis and require a handle that's been opened with
O_NONBLOCK.

boost::iostreams already supports a cross platform file descriptor / handle,
but there is currently no cross-platform way to actually create such a
handle. So I have the feeling that almost anyone using
boost::iostreams::file_descriptor is using ifdefs all over their code to
create the handles. Correct me if this is wrong.
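
For reference, this is roughly the kind of ifdef code meant here: a minimal
sketch that opens an existing file read-only and returns the native handle.
The names native_handle, open_readonly, is_valid and close_handle are made up
for the illustration; none of them is a Boost api.

```cpp
#include <cassert>
#ifdef _WIN32
#  include <windows.h>
typedef HANDLE native_handle;
#else
#  include <fcntl.h>
#  include <unistd.h>
typedef int native_handle;
#endif

// Hypothetical helper: open an existing file read-only.
inline native_handle open_readonly(const char* path) {
    assert(path != nullptr);
#ifdef _WIN32
    return ::CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                         OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
#else
    return ::open(path, O_RDONLY);
#endif
}

// The "invalid" value differs per platform, so it needs wrapping too.
inline bool is_valid(native_handle h) {
#ifdef _WIN32
    return h != INVALID_HANDLE_VALUE;
#else
    return h >= 0;
#endif
}

inline void close_handle(native_handle h) {
#ifdef _WIN32
    ::CloseHandle(h);
#else
    ::close(h);
#endif
}
```

Every additional flag (async, unbuffered, and so on) adds another branch like
these at each call site unless a library hides it.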

>
>
> > c) windows junctions and symlinks
> >
>
> Supported in V3.
>

What types of operations are supported for junctions and symlinks? Can I
query the target to see what it points to, and is there an api that lets
delete selectively remove either the target or the link itself?

Also I forgot to mention hard links. If hard links are supported, can I
query the link count or get the inode number? (Contrary to popular belief,
all versions of windows since windows 2000 support hard links and the
ability to get an inode # for a file).
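
On posix both values come straight out of the stat structure. A minimal sketch,
with link_info and get_link_info as hypothetical names for this illustration
(on windows the corresponding data is available from GetFileInformationByHandle,
via nNumberOfLinks and the nFileIndexHigh/nFileIndexLow pair):

```cpp
#include <sys/stat.h>   // POSIX lstat(); this sketch does not cover Windows
#include <cassert>

// Hypothetical result type for illustration.
struct link_info {
    unsigned long long inode;       // st_ino
    unsigned long      link_count;  // st_nlink: number of hard links
};

// Returns true and fills `out` on success. Uses lstat() so that a symlink
// is reported as itself rather than as its target.
inline bool get_link_info(const char* path, link_info& out) {
    assert(path != nullptr);
    struct stat st;
    if (::lstat(path, &st) != 0) return false;
    out.inode      = static_cast<unsigned long long>(st.st_ino);
    out.link_count = static_cast<unsigned long>(st.st_nlink);
    return true;
}
```

Two paths naming the same file on the same device report the same inode, which
is what a backup tool would use to avoid storing hard-linked data twice.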

>
>
> > d) unix block/char devices, sockets, and pipes
> >
>
> Nothing planned in Boost.Filesystem. Several other Boost libraries already
> have at least some support for these.
>

If Boost.Filesystem supported them, I could create them using an interface
consistent with how I create other types of filesystem objects, and also be
able to use any timestamp functionality provided by Boost.Filesystem to
query them.

>
>
> >
> > create/open is the biggest gaping hole in the current FS library in my
> > opinion. you often just need a handle and want to customize the way in
> > which it's opened. with my code you can do something like:
> >
> > filesystem::handle handle;    /* opaque structure, only understood by
> >                                  the filesystem api */
> > filesystem::object_info info; /* boost variant, internal type depends
> >                                  on type of filesystem object */
> >
> > filesystem::create_file(
> >     path,
> >     link_open_target, /* follow symlinks */
> >     only_dir,         /* fail unless this is a directory handle */
> >     flags::async | flags::direct | flags::bypass_security,
> >                       /* open for async direct i/o and disable any
> >                          kernel security checking */
> >     &handle,          /* have the function return the handle (this
> >                          can be null if not interested) */
> >     &info             /* have the function return object info (this
> >                          can be null if not interested) */
> > );
> >
> > handle then is an opaque object that can be used by other filesystem apis
> > like timestamp operations etc, and info is a boost::variant whose type
> > depends on whether it's a directory, junction, symlink, pipe, etc.
> >
>
> Nothing like that planned at the moment.
>
> > I've solved all of the above problems in a cross-platform way in my own
> > in-house api but it might be difficult to integrate any of what i've done
> > into an interface consistent with the current filesystem library. if you
> > want any of this code though let me know.
> >
>
> What I'd be most interested in is motivation and use cases. I need to better
> understand the need before I start thinking about code.

Well, I work on high performance backup software for linux/windows. For
backup I need raw access to the disk, which means I need fine control over
the handle that I'm performing I/O on. In particular, it needs to be
asynchronous and support unbuffered i/o, but there are various other cases
where I use strange combinations of flags (for example
FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_DELETE_ON_CLOSE on windows). Some
files, however, should not be backed up (depending on custom rules
specified by the user), and for these I need to be able to recursively
delete them. Sometimes I want to follow links and sometimes I don't, which
again depends on user parameters.

If performance were not such a high consideration this would not be a problem.
But since it is, I need to do everything possible to minimize the number of
API calls and opens/closes on individual files. Consider for example a
system with millions of small files (say 0 bytes just for the sake of
argument). In this case just opening the file is 100% of the work that
needs to be done on this file, so I should try to open it as few times as
possible. However, first I have to know that it's even a normal file and
not some directory that I need to recurse into. So I check whether it's a
file, and if it is, I open it and start reading from it. But issuing two
calls via a path is going to be much slower than first getting a handle to
the object and then querying the handle for the required information. Then,
without even opening it again, I can use the same handle to actually read
data from the file, saving costly operations.

There are many examples of optimizations such as this, but ultimately it
boils down to the fact that operations on handles are much faster than
operations on paths, and handles can also be used to actually perform i/o.
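
The open-once pattern described above can be sketched on posix as follows. The
names visit_result and visit are hypothetical, and the sketch reads the whole
file into a string purely to keep the example short; it is not how a backup
tool would actually stream data.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <string>

enum class visit_result { error, directory, regular_file, other };

// Open `path` once, classify it with fstat() on the descriptor, and for a
// regular file read its contents through that same descriptor.
inline visit_result visit(const char* path, std::string& contents) {
    assert(path != nullptr);
    // O_NONBLOCK keeps open() from blocking if the path is a FIFO.
    int fd = ::open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0) return visit_result::error;

    struct stat st;
    if (::fstat(fd, &st) != 0) {      // query the handle, not the path
        ::close(fd);
        return visit_result::error;
    }

    visit_result r = visit_result::other;
    if (S_ISDIR(st.st_mode)) {
        r = visit_result::directory;  // caller would recurse from here
    } else if (S_ISREG(st.st_mode)) {
        r = visit_result::regular_file;
        char buf[4096];
        ssize_t n;
        while ((n = ::read(fd, buf, sizeof buf)) > 0)
            contents.append(buf, static_cast<std::size_t>(n));
        if (n < 0) r = visit_result::error;
    }
    ::close(fd);
    return r;
}
```

The path is resolved exactly once, by open(); the type check and the reads all
go through the descriptor. The path-based alternative would call stat() and
then open(), resolving the path twice.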

As for restore, I need to be able to set every possible aspect of a file
that exists, including all timestamps, permissions, and I need to be able to
restore any type of file whether it be a socket, pipe, or windows junction.
  Again, I need fine grained control over the handle. For example, on
windows I need FILE_FLAG_BACKUP_SEMANTICS to disable ACL security checking.

90% of this can be abstracted into things that are common between each
platform. For the parts that can't, I really like the model that Boost.Asio
has employed, where it provides a windows and posix namespace and provides
all the extra details there.

Zach

