Boost :

From: Dietmar Kuehl (dietmar_kuehl_at_[hidden])
Date: 2002-03-03 22:14:45


Hi,
Jan Langer wrote:

> i would use three types. one for saving the directory walking status
> (DIR and dirent), one as an identification object and one as the
> attribute cache. and i think the first type which saves the walking
> status should be the iterator itself.

I think we are at least partially talking about different things, so let's
give things names, together with some form of definition:

- "filename": This is a representation identifying a file, at least in a
    given context (i.e., it may be a relative name).
- "dir_entry": This is a handle for a file, constructed from a "filename",
    which provides access to the file's attributes.
- "dir_it": This is an iterator walking a directory, constructed from a
    filename. It provides access to the current file's attributes.
- "file_rep": This is an opaque type used to hold a dir_entry's or a
    dir_it's data. If people feel uncomfortable with storing the directory
    iterator's state here, too, it can even be split into two parts to
    separate the directory iterator's state from the file's attribute
    cache.

(all four types are actually used in my implementation...).

I think the first three of these are pretty obvious :-) The filename can
be an arbitrary sequence, probably a 'basic_string' with an appropriate
character type. Probably all of these are templatized on a character
type, but this is an orthogonal issue.
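As a minimal sketch of what "templatized on a character type" could look like, the character type becomes a template parameter and the common cases become typedefs, much like 'std::string' and 'std::wstring'. The names 'basic_dir_entry', 'dir_entry', 'wdir_entry' and 'name()' are hypothetical here:

```cpp
#include <cassert>
#include <string>

// Hypothetical: a directory entry parameterized on the character type of
// its filename, in the same way std::basic_string is.
template <class charT>
class basic_dir_entry {
public:
    typedef std::basic_string<charT> string_type;

    explicit basic_dir_entry(const string_type& name) : name_(name) {
        assert(!name_.empty());  // a filename is assumed to be non-empty
    }

    const string_type& name() const { return name_; }

private:
    string_type name_;  // the "filename" this entry was constructed from
};

// Convenience typedefs for the usual character types.
typedef basic_dir_entry<char>    dir_entry;
typedef basic_dir_entry<wchar_t> wdir_entry;
```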

The reason for the "file_rep" is two-fold: First, I consider it crucial
that system headers don't need to be included just to use the directory
facilities. That is, neither the user nor the directory iterator
implementation shall include system headers in the corresponding header
files. This has two implications:
- We don't know all of the types, e.g. uid_t on POSIX. I don't think that
    this is a big problem because these are either integers or should
    probably not be directly visible anyway (see e.g. the handling of the
    file access properties in my implementation).
- The file_rep is only declared in user-visible headers. It is defined
    in a header or source file private to the implementation. Also, it is
    handled internally via a pointer because we don't know the layout
    of this type and thus cannot embed it into a dir_entry or a dir_it.
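The arrangement in the second point is essentially the "pimpl" (compilation firewall) idiom. Below is a minimal sketch, with the user-visible header part and the implementation-private part shown in one block for compactness. The accessor 'size()' and the fields of 'file_rep' are hypothetical; a real implementation would fill 'file_rep' from a system call such as POSIX 'stat()' in the private source file, which is the only place that would need the system headers:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// ---- user-visible header: no system headers needed ----
class file_rep;  // declared only; its layout stays private

class dir_entry {
public:
    explicit dir_entry(const std::string& name);
    ~dir_entry();                    // defined where file_rep is complete
    std::size_t size() const;        // hypothetical attribute accessor
private:
    std::unique_ptr<file_rep> rep_;  // held via pointer: layout unknown here
};

// ---- implementation-private source: may include system headers ----
class file_rep {
public:
    std::string name;
    std::size_t size;  // stand-in; a real version would store e.g. stat() data
};

dir_entry::dir_entry(const std::string& name)
    : rep_(new file_rep{name, 0}) {}  // placeholder: no real file system query

dir_entry::~dir_entry() = default;

std::size_t dir_entry::size() const {
    assert(rep_);
    return rep_->size;
}
```

Because 'file_rep' is incomplete in the header, the destructor of 'dir_entry' has to be defined out of line, after 'file_rep' is complete.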

Second, the file_rep can be shared between dir_entry and dir_it so that
there is only one implementation of the attribute accessors. Using the
property maps with a "key" rather than directly with the object just
makes things easier: 'operator*()' would return a reference to a
'file_rep'. To get data out of this beast, you would use the
corresponding property maps.
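A minimal sketch of this idea, assuming property-map-style free functions 'get()' and 'set()' that take a tag object and the 'file_rep' as the key. The tags 'is_directory' and 'user_read' and the fields of 'file_rep' are hypothetical, 'file_rep' is defined in the open here for brevity even though the text above wants it opaque, and the iteration part of 'dir_it' is omitted:

```cpp
#include <cassert>

// Hypothetical attribute cache shared by dir_entry and dir_it.
struct file_rep {
    bool directory;
    bool user_read;
};

// Tag types acting as property maps; the object passed to get()/set()
// is the "key" (a file_rep), not the dir_entry or dir_it itself.
struct is_directory_t {};
struct user_read_t {};
constexpr is_directory_t is_directory{};
constexpr user_read_t user_read{};

// The accessors are written once, against file_rep only.
inline bool get(is_directory_t, const file_rep& f) { return f.directory; }
inline bool get(user_read_t, const file_rep& f) { return f.user_read; }
inline void set(user_read_t, file_rep& f, bool v) { f.user_read = v; }

// Both handle types expose their file_rep through operator*().
class dir_entry {
public:
    explicit dir_entry(const file_rep& r) : rep_(r) {}
    file_rep& operator*() { return rep_; }
private:
    file_rep rep_;
};

class dir_it {
public:
    explicit dir_it(const file_rep& current) : rep_(current) {}
    file_rep& operator*() { return rep_; }  // increment omitted in this sketch
private:
    file_rep rep_;
};
```

With this, 'get(is_directory, *e)' and 'get(is_directory, *it)' go through the same function regardless of whether 'e' is a dir_entry or 'it' is a dir_it.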

The file_rep (it may, of course, be named differently, just as the other
types may be named differently, e.g. "basic_string" rather than
"filename") is basically an attribute cache. It is used at least to
retrieve all attributes at once for reading. Whether it is also used to
prevent write-through is a different issue: there are arguments both
ways. Writing data into a cache rather than to the actual entity can
easily result in surprising behavior. Basically, a cache should be
transparent to users, and this is often non-trivial to achieve. For
example, assume that "/tmp" is the current working directory:

    dir_entry f1("./foo");
    dir_entry f2("/tmp/foo");
    set(user_read, *f1, true);
    set(user_read, *f2, false);
    std::cout << get(user_read, *f1) << "\n";

Does this print "0" or "1"? I would expect "0", but I'm about 100% sure
that a cached implementation would print "1". Moreover, the file's state
after this code would be either the original state, if we require an
explicit "commit", or "1", if the destructor commits the change
implicitly. However, I would expect the access attribute to be "0"
because this was the latest change.
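To make the concern concrete, here is a toy model of a write-back attribute cache. Everything in it is hypothetical: an in-memory map stands in for the file system, path resolution is reduced to rewriting a leading "./" to "/tmp/", and 'commit()' is an invented explicit write-back. Each dir_entry copies the attribute on construction and 'set()' touches only that copy, so the two handles for the same file diverge:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy "file system" (hypothetical): canonical path -> user_read bit.
std::map<std::string, bool> fs = {{"/tmp/foo", false}};

// Toy path resolution, assuming "/tmp" is the current working directory.
std::string resolve(const std::string& p) {
    return p.compare(0, 2, "./") == 0 ? "/tmp/" + p.substr(2) : p;
}

// Attribute cache: holds a private copy of the file's attributes.
struct file_rep {
    bool user_read;
};

struct user_read_t {};
constexpr user_read_t user_read{};

inline bool get(user_read_t, const file_rep& f) { return f.user_read; }
inline void set(user_read_t, file_rep& f, bool v) { f.user_read = v; }

// dir_entry with a write-back cache: attributes are read once on
// construction, set() changes only this entry's copy, and the file
// system is updated only by commit().
class dir_entry {
public:
    explicit dir_entry(const std::string& p)
        : path_(resolve(p)), rep_{fs.at(path_)} {}
    file_rep& operator*() { return rep_; }
    void commit() { fs[path_] = rep_.user_read; }
private:
    std::string path_;
    file_rep rep_;
};
```

Running the five lines from the example against this model, 'get(user_read, *f1)' yields the cached "1" although the latest change, made through 'f2', was "0". If the entries then commit in destructor order ('f2' first, then 'f1'), the file ends up as "1". A write-through variant would instead have 'set()' update the file system directly, at the cost of a system call per write.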

Actually, I just wanted to point out that the attribute cache has to be
invalidated (or updated) when the directory iterator is moved forward.
I would definitely bundle a file_rep with a directory iterator, BTW,
because otherwise it would be necessary to create a dir_entry just to
access a file attribute (e.g. "is_directory()").
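A minimal sketch of bundling the attribute cache with the iterator, assuming a directory listing already read into a vector (standing in for the DIR/dirent state) and a hypothetical 'load_attributes()' standing in for the real, e.g. 'stat()'-based, query. The cache is filled lazily on first access and discarded whenever the iterator advances, so 'is_directory()' never reports the previous entry's attributes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical attribute cache for the current entry.
struct file_rep {
    bool valid = false;
    bool directory = false;
};

// Stand-in for a real attribute query: names ending in '/' are directories.
inline bool load_attributes(const std::string& name) {
    return !name.empty() && name.back() == '/';
}

class dir_it {
public:
    explicit dir_it(const std::vector<std::string>& entries)
        : entries_(&entries), pos_(0) {}

    dir_it& operator++() {
        ++pos_;
        rep_.valid = false;  // invalidate: the cache belongs to the old entry
        return *this;
    }

    bool at_end() const { return pos_ >= entries_->size(); }

    bool is_directory() {
        assert(!at_end());
        if (!rep_.valid) {  // query attributes only on first access
            rep_.directory = load_attributes((*entries_)[pos_]);
            rep_.valid = true;
        }
        return rep_.directory;
    }

private:
    const std::vector<std::string>* entries_;  // stand-in for DIR* state
    std::size_t pos_;
    file_rep rep_;  // attribute cache bundled with the iteration state
};
```

The extra cost is one small member per iterator, which is the trade-off argued for below.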

> your approach seems to mean that a directory iterator points to an
> attribute cache.

Yes, that's correct ... and, even more, I think this is the correct
approach :-) You have to put the attribute cache somewhere. The options
are the dir_it or a newly created dir_entry. I think going through a
dir_entry makes things considerably more inconvenient, while the extra
cost of having a dir_it store both the iteration status and the
attribute cache is rather low.

-- 
<mailto:dietmar_kuehl_at_[hidden]> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk