From: Dietmar Kuehl (dietmar_kuehl_at_[hidden])
Date: 2002-03-03 11:19:21


Hi,
Jeff Garland wrote:

> Maybe I'm missing something fundamental, but an iterator needs to point at some
> sort of type. All Beman is suggesting is that the type be something like the
> following (forgetting the templated string for a minute):
>
> //fundamental capabilities of all directory entries
> class directory_entry
> {
> public:
> std::string pathname() const;
> std::string filename() const;
> bool is_directory() const;
> };

The bit you are missing, IMO, is that there is no need to expose what
the iterator is pointing to directly. There is some confusion in the
way the standard iterators are specified which basically mandates that
'*it' and 'it->some_member()' have a reasonable meaning. This whole
approach is ill-advised, and it works especially poorly when the
elements of the sequence have multiple attributes, as is the case even
with what you suggested above.

Don't mind the fact that I have chosen the filename as the entity
accessed using 'operator*()': This is an entirely arbitrary choice and
any other attribute, be it 'is_directory', the last modification time,
or a constant would have done as well! This is only present to turn the
iterator into an iterator matching the standard requirements for
iterators and the filename seems to be an obvious choice for this: A
basic goal was to make simple things simple and filling a container with
the names in a directory using

   std::vector<std::string> dir((dir_it("dir2")), dir_it());

seems to be easy enough (note, BTW, that your approach makes even things
like this much harder!). If people would prefer to have 'operator*()'
return a full path name it would be fine for me as well: I don't care
much about 'operator*()' - except that it should do something easy to
use if we have it in the first place.
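
A minimal sketch of the idea, not the submitted 'dir_it': the names
'fake_entry', 'fake_dir_it' and 'list_names' are hypothetical, and the
"directory" is an in-memory list rather than anything read from disk. It
only shows that an input iterator whose 'operator*()' yields the name is
enough for the range constructor used above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a directory entry: a name plus one attribute.
struct fake_entry {
    std::string name;
    bool        is_directory;
};

// Hypothetical input iterator over an in-memory "directory".
// operator*() exposes only the name; a default-constructed iterator is
// the past-the-end iterator.
class fake_dir_it {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = std::string;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const std::string*;
    using reference         = const std::string&;

    fake_dir_it() = default;
    explicit fake_dir_it(const std::vector<fake_entry>& entries)
        : entries_(entries.empty() ? nullptr : &entries) {}

    reference operator*() const {
        assert(entries_ != nullptr);  // the end iterator must not be dereferenced
        return (*entries_)[pos_].name;
    }

    fake_dir_it& operator++() {
        // Turn into the end iterator once the last entry has been passed.
        if (++pos_ == entries_->size()) { entries_ = nullptr; pos_ = 0; }
        return *this;
    }
    fake_dir_it operator++(int) { fake_dir_it old(*this); ++*this; return old; }

    friend bool operator==(const fake_dir_it& a, const fake_dir_it& b) {
        return a.entries_ == b.entries_ && a.pos_ == b.pos_;
    }
    friend bool operator!=(const fake_dir_it& a, const fake_dir_it& b) {
        return !(a == b);
    }

private:
    const std::vector<fake_entry>* entries_ = nullptr;
    std::size_t pos_ = 0;
};

// Collect the names with the same range-constructor idiom as above.
inline std::vector<std::string> list_names(const std::vector<fake_entry>& entries) {
    return std::vector<std::string>((fake_dir_it(entries)), fake_dir_it());
}
```

The 'is_directory' field is deliberately not reachable through
'operator*()' here; it stands for the attributes that would be read
some other way.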

The directory iterator's actual data is a handle on a directory entry
which caches the entry's system attributes. Since the set of attributes
is open (it can be considered open for basically all sequences,
because it is reasonable to view any sequence as a sequence of derived
data), it does not make sense to access attributes via member functions
[directly]: This merely causes some attributes to be considered more
important than others. ... and the list of
attributes *you* consider to be important does not match mine: To me the
name of the file and the access rights are important! Why is there no
function telling me whether I can read the file in the set? You may
reasonably ask why there is no such function in my latest submission if
this is so important for me. The answer is simple: it takes more than
just the directory facilities to figure this out and I didn't want to
write a whole POSIX binding. Still it is much more important to me than
eg. the pathname because I have this already anyway (this is what the
directory was created from).

The whole property map issue is to avoid special handling of attributes.
I play favorites for the file name but I had to select one attribute to
be used for 'operator*()'. ... and even this attribute can be accessed
in a consistent way using property maps. Ah, talking of property maps:
Of course, I'm all in favour of using the property map interfaces as
spelled out in the property map library! The only reason I used the
explicitly specified template arguments was that the property map object
is not really necessary in this case. I have, however, no problem at all
if this special case is dropped.
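
To illustrate what "accessing all attributes the same way" could look
like, here is a small sketch in the 'get(map, key)' style of the
property map library. It is an assumption-laden illustration only: the
'entry' record, the map types and 'has_value' are hypothetical names,
and nothing here is taken from the submitted code.

```cpp
#include <cassert>
#include <string>

// Hypothetical entry record; the fields are illustrative only.
struct entry {
    std::string name;
    bool        is_directory;
    long        size;
};

// One empty map type per attribute. They carry no state, which is why a
// map object is not strictly necessary in this special case.
struct name_map {};
struct is_directory_map {};
struct size_map {};

// Every attribute is read through the same free function get(map, key).
inline const std::string& get(name_map, const entry& e) { return e.name; }
inline bool get(is_directory_map, const entry& e) { return e.is_directory; }
inline long get(size_map, const entry& e) { return e.size; }

// Generic code can be written against any map and favours no attribute.
template <class Map, class Value>
bool has_value(Map map, const entry& e, const Value& v) {
    return get(map, e) == v;
}
```

Adding a further attribute then means adding one map type and one 'get'
overload, without touching the entry's interface.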

> My point from day 1 in this thread is that something like the above are the bare
> essential required for portable programming with directories and files and
> therefore I believe clear justification for keeping them apart from the
> remaining optional properties.

These may be your bare essentials for dealing with directories, but
obviously this list already contains redundant information (the path
name) and is not sufficient for eg. my needs. Thus, I can't see why your
choice of attributes is better than mine: Let's just avoid the whole
issue and access all attributes the same way.

> I suppose either would be fine, but the difference makes the non-portable code
> ( get<is_graphic> ) stand out.

You are implying that I need non-portable code to figure out whether a
file is gif, jpeg, ...? I don't think so.

>>The reason that the directory iterator has a value type at all is just
>>that iterators [falsely] need value types in the first place! The file's
>>name is more or less an arbitrary choice which comes in handy for simple
>
> The name is still a string value, no?

As I spelled out already in the previous mail: The value type does not
matter at all to the majority of the operations! It is just a special
access method for a special attribute. Whether it is a string, a bool,
an int, or an arbitrarily complex data structure does not really matter
at all: It is just there to make the directory iterator an iterator
according to the standard requirements. And a string representation of
the file
name seems to be a pretty convenient approach - much more convenient
than the structure you suggested.

The only reason 'operator*()' might actually access something different,
eg. the structure used to represent a handle for the directory entry
(which is, BTW, opaque to the user), is that 'operator*()' may be the
best choice to obtain the "key" used by property maps from an iterator.

> There is another factor here that I think we are forgetting.
> The pure functional approach is going to need to access the disk again for each
> property lookup since the only thing the iterator can hold is the name. As a
> result, performance will certainly suffer.

As strange as it may seem: No, I'm not at all forgetting this issue!
Just because the value type is a string it does not mean that the
string is the only entity stored in the iterator's representation - and
it is not in my implementation. Instead, there is an access method the
corresponding property maps use to get at the internal representation.
What I had forgotten about was the indirection taken by property maps,
ie. that they use a "key" rather than the iterator itself (I have always
used iterators directly instead of a "key"). This would, however, settle
what 'operator*()' is returning: A reference to an opaque structure. The
corresponding property maps know what to do about this structure to
retrieve the attributes. This would also fix the problem of having two
overloads for directory iterators and directory entries as the directory
entry would also provide access to a reference to the opaque structure
for use as a key by the property maps.
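
A rough sketch of this arrangement, under stated assumptions: 'entry_rep',
'key_dir_it' and 'count_directories' are hypothetical names, the entries
live in a vector instead of coming from a system call, and the
attributes are simply stored members standing in for the cached system
data. The point is only that 'operator*()' hands out an opaque key and
the property maps are the sole code that looks inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class entry_rep;
struct name_map {};
struct is_directory_map {};
inline const std::string& get(name_map, const entry_rep&);
inline bool get(is_directory_map, const entry_rep&);

// Hypothetical opaque record: it has no public accessors, so users can
// only pass references to it on to the property maps.
class entry_rep {
public:
    entry_rep(std::string name, bool is_directory)
        : name_(std::move(name)), is_directory_(is_directory) {}
private:
    friend const std::string& get(name_map, const entry_rep&);
    friend bool get(is_directory_map, const entry_rep&);
    std::string name_;          // cached when the entry was read
    bool        is_directory_;  // cached as well, no second lookup needed
};

// Only the property maps know the layout of the opaque record.
inline const std::string& get(name_map, const entry_rep& r) { return r.name_; }
inline bool get(is_directory_map, const entry_rep& r) { return r.is_directory_; }

// Hypothetical iterator: operator*() returns the opaque key, not a string.
class key_dir_it {
public:
    key_dir_it(const std::vector<entry_rep>& v, std::size_t pos)
        : v_(&v), pos_(pos) {}
    const entry_rep& operator*() const { return (*v_)[pos_]; }
    key_dir_it& operator++() { ++pos_; return *this; }
    friend bool operator!=(const key_dir_it& a, const key_dir_it& b) {
        return a.v_ != b.v_ || a.pos_ != b.pos_;
    }
private:
    const std::vector<entry_rep>* v_;
    std::size_t pos_;
};

// Count directories using only get(map, *it).
inline std::size_t count_directories(const std::vector<entry_rep>& v) {
    std::size_t n = 0;
    for (key_dir_it it(v, 0), end(v, v.size()); it != end; ++it)
        if (get(is_directory_map(), *it)) ++n;
    return n;
}
```

A directory entry type could expose the same 'const entry_rep&', so one
set of 'get' overloads would serve both iterators and entries.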

> It would be really nifty if a
> programmer could customize at compile time the properties retrieved with each
> direntry....

For the systems I know (WIN32 and POSIX), the system specifies what
attributes are obtained by a system call and this happens to be what
is stored internally. If you are referring to additional, derived
attributes, I think it would cause more problems than it would solve:
it would be necessary to maintain which of these attributes are up to
date and I guess that this maintenance is more expensive than the
computation of most derived attributes. That is, the machinery would
just be costly without benefit.

-- 
<mailto:dietmar_kuehl_at_[hidden]> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>

Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk