Boost logo

Boost :

From: Angus Leeming (angus.leeming_at_[hidden])
Date: 2004-09-27 15:33:55


Gennadiy Rozental wrote:
>> Would you be happy if I changed the interface to:
>>
>> std::list<fs::path> matches;
>> boost::glob(matches, "../te?t/foo*bar", starting_directory);
>
> It slightly better, but I still prefer iterator interface. What I am
> going to do with above list? Most probably I a mgoing to iterate through
> it to do somethins, or find simething. What if I only need first matching
> element? We have several years expirience now with container/iterator
> alternatives. I believe iterator interface is more natual here (and
> ultimately more efficient).

Don't get me wrong, I'm not disagreeing. It's just that I'm on a steep
learning curve here.

Maybe it helps us both if I present some code. The first thing that
boost::glob does is parse the "../foo*bar/*.cpp" pattern into a list of
predicates, one per component in the pattern tree (3 in this case):

        std::list<detail::glob_predicate<> > predicates;

        bool contains_wildcards = false;
        for (fs::path::iterator it = glob_path.begin(),
             end = glob_path.end();
             it != end; ++it) {
            detail::glob_predicate<> const predicate(*it, flags);
            contains_wildcards += predicate.contains_wildcards();
            predicates.push_back(predicate);
        }

The glob_predicate is a functor that is used by glob_iterator to filter
files:

template <typename TraitsT = glob_traits>
struct glob_predicate {
    /** @returns true if @c p.leaf() matches the glob expression. */
    bool operator()(filesystem::path const & p) const
};

However, if the pattern I'm trying to match doesn't contain any wild cards
at all, then there's no need for any iteration at all:

        std::list<fs::path> matches;
        if (contains_wildcards) {
            glob_impl(matches, predicates.begin(), prior(predicates.end()),
                      working_dir, flags);

        } else {
            fs::path abs_path = working_dir / glob_path;
            if (exists(abs_path))
                matches.push_back(abs_path);
        }

That doesn't seem to fit nicely with an iterator interface to me, but I
emphasise I'm coming from a position of zero knowledge.

Similarly, glob_impl is a recursive function that calls itself for each
component in the "predicates" tree. At each level, it checks whether that
component contains a wild card. If not, it just adds the component to the
matching path:

void glob_impl(std::list<fs::path> & matches,
               predicate_vec::const_iterator first,
               predicate_vec::const_iterator last,
               fs::path const & working_dir,
               glob_flags flags)
{
    detail::glob_predicate<> const & predicate = *first;

    if (first == last) {
        // We've reached the destination directory.
        // Add any matches to @c matches.
        if (!predicate.contains_wildcards()) {
            fs::path const candidate =
                working_dir / fs::path(predicate.file(), fs::no_check);

            if (exists(candidate) &&
                (!(flags & glob_onlydir) || is_directory(candidate)))
                    matches.push_back(candidate);

        } else {
            detail::glob_iterator git(predicate, working_dir);
            detail::glob_iterator const gend;
            for (; git != gend; ++git) {
                if (!(flags & glob_onlydir) || is_directory(*git))
                    matches.push_back(*git);
            }
        }

        return;
    }

    if (!predicate.contains_wildcards()) {
        // No testing for the existence of the directory.
        // That would place additional, unnecessary requirements on glob().
        glob_impl(matches, next(first), last,
                  working_dir / fs::path(predicate.file(), fs::no_check),
                  flags);

    } else {
        detail::glob_iterator git(predicate, working_dir);
        detail::glob_iterator const gend;
        for (; git != gend; ++git) {
            if (is_directory(*git))
                glob_impl(matches, next(first), last, *git, flags);
        }
    }
}
        

Now, presumably you're going to come back at me and tell me that this is
all just implementation detail. That's true, it is. But my problem is that
I don't yet see how I could attack the problem more efficiently using some
other approach.

Does any of this make sense?

Regards,
Angus


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk