From: Angus Leeming (angus.leeming_at_[hidden])
Date: 2004-10-01 05:27:55


Martin wrote:

>> An interesting idea and certainly much less work.
>>
>> However, as I understand it, you're suggesting limiting the wildcards
>> simply to ensure that the filtered_directory_iterator behaves the same
>> on posix and windows systems?
>
> No. The main reason was to have a simple iterator for simple (and what I
> think is the most common) cases which also avoid the need to go via a
> list.

That's two separate requirements.
1. A simple iterator. By 'simple', you mean one using the underlying API.
   Right?

2. Avoid the list in

list<path> glob(string const & pattern, path const & working_dir);

Actually, this second requirement contradicts the first, because glob()'s
results must be stored internally for the iterator to then iterate over.
No?
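
To make the point concrete, here is a rough sketch of what "iterator on top
of glob()" ends up looking like: the list does not go away, it just moves
inside the iterator. Everything here is hypothetical (stored_glob_iterator
is an invented name, std::string stands in for filesystem::path, and it is
written against the modern standard library rather than Boost).

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: an iterator that owns the list returned by a
// glob()-like function. std::string stands in for filesystem::path.
class stored_glob_iterator
{
public:
    // A default-constructed iterator is the end iterator.
    stored_glob_iterator() {}

    // Takes ownership of the results; an empty list yields the end iterator.
    explicit stored_glob_iterator(std::list<std::string> results)
        : results_(std::make_shared<std::list<std::string>>(std::move(results)))
        , pos_(results_->cbegin())
    {
        if (pos_ == results_->cend())
            results_.reset();
    }

    std::string const & operator*() const
    {
        assert(results_ && "dereferencing the end iterator");
        return *pos_;
    }

    // Advancing past the last element turns this into the end iterator.
    stored_glob_iterator & operator++()
    {
        if (++pos_ == results_->cend())
            results_.reset();
        return *this;
    }

    bool operator==(stored_glob_iterator const & other) const
    {
        if (!results_ || !other.results_)
            return results_ == other.results_;
        return results_ == other.results_ && pos_ == other.pos_;
    }
    bool operator!=(stored_glob_iterator const & other) const
    {
        return !(*this == other);
    }

private:
    std::shared_ptr<std::list<std::string>> results_; // the hidden list
    std::list<std::string>::const_iterator pos_;
};
```

The shared_ptr keeps copies of the iterator cheap, but the full result set
is still built before the first dereference.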

> So did I but I put it into a separate iterator where you can define the
> rules completely independent of the filesystem.

This is, in essence, what I am proposing. I have now reworked the interface
following Gennadiy's suggestion. Here's a glob_iterator that can recurse
down directories:

class BOOST_GLOB_DECL glob_iterator
    : public iterator_facade<
                 glob_iterator // Derived type
               , filesystem::path const // value_type
               , single_pass_traversal_tag
             >
{
public:
    glob_iterator() {}
    glob_iterator(std::string const & pattern,
                  filesystem::path const & wd,
                  glob_flags flags);
private:
    ...
};
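
As for rules that are "completely independent of the filesystem": that part
boils down to a matcher applied to each directory entry. Below is a minimal
sketch, assuming only '*' and '?' are supported (no "[a-d]" or "{cxx,hpp}").
wildcard_match is a hypothetical name and is not the matcher used in my
code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: matches one path element against a pattern in which
// '*' matches any run of characters and '?' matches exactly one character.
inline bool wildcard_match(std::string const & pattern,
                           std::string const & name)
{
    std::size_t p = 0, n = 0;
    std::size_t star = std::string::npos; // position of the last '*' seen
    std::size_t mark = 0;                 // where in 'name' that '*' began

    while (n < name.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            star = p++;   // remember the '*', first try matching nothing
            mark = n;
        } else if (p < pattern.size()
                   && (pattern[p] == '?' || pattern[p] == name[n])) {
            ++p;
            ++n;
        } else if (star != std::string::npos) {
            p = star + 1; // backtrack: let the '*' swallow one more char
            n = ++mark;
        } else {
            return false;
        }
    }
    // Any trailing '*' in the pattern can match the empty string.
    while (p < pattern.size() && pattern[p] == '*')
        ++p;
    return p == pattern.size();
}

// Usage example.
inline void wildcard_match_examples()
{
    assert(wildcard_match("*.hpp", "glob.hpp"));
    assert(!wildcard_match("*.hpp", "glob.cpp"));
    assert(wildcard_match("a?c", "abc"));
}
```

Because the matcher only sees strings, it behaves identically on posix and
windows; case sensitivity would have to be a separate, explicit choice.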

The glob_iterator works, but is considerably slower than the function
returning a list. No doubt profiling will help track down what I'm doing
inefficiently.

# A simple wrapper for the real glob()
$ time ./real_glob_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934
real 0m0.042s
user 0m0.010s
sys 0m0.010s

# The glob() function I posted earlier in the week.
$ time ./glob_fun_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934
real 0m0.099s
user 0m0.070s
sys 0m0.010s

# The new glob_iterator.
$ time ./glib_it_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934
real 0m0.236s
user 0m0.200s
sys 0m0.010s

I'm never sure whether to pay attention to the 'real' or to the 'user'
times... Anyway, there's a clear hierarchy ATM.

>> Don't you ever search for things like "[a-d]*.{cxx,hpp}"?
>
> I do it in the shell but I have never had the need to do it inside an
> application. I'm sure there are such applications.

Here's one. Qt (QProcess), gtk (gspawn*) and ACE (ACE_Process) all enable
the user to spawn a child process in a portable way. However, what they
all lack is a *powerful* way to initialise their data from a string
containing a "command-line like" syntax.

(And, no, passing an arbitrary "ls `rm -f *` foo.cpp" to the system()
command isn't a viable alternative.)

I've been playing around writing something that can parse a subset of the
Bourne shell. Enough to make it easy and safe to launch a single process
from a string. "parse_pseudo_command_line" fills a "spawn_data" variable.
It's then simple to ascertain whether the request is safe or not.

Now *this* is a function that would benefit from a portable glob.

http://www.devel.lyx.org/~leeming/libs/child/doc/html/parse_pseudo_command_line.html
Equivalent URL: http://tinyurl.com/4c4v9

>> Also, how do you limit the wildcards? I take it you don't, but that the
>> underlying matcher (findfirstfile, glob) will behave differently on
>> receipt of the same pattern.
>
> The filesystems already behave differently since one is case-sensitive
> and the other is not. Anyway, I think it is reasonable to limit the
> wildcards to some portable syntax e.g. max 2 '*' are allowed and they
> must either be the last character or followed by a '.'.

Again, how do you *limit* them? That implies that you must prescan the
pattern, presumably throwing once you've determined that the thing is
breaking your "reasonable" limits.
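
In other words, I'd guess something like the sketch below has to run before
the pattern ever reaches the underlying API. It encodes only the rule quoted
above (at most two '*', each either the last character or followed by a
'.'); the function names and the choice of exception are my assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical prescan: throws if the pattern falls outside the "portable"
// subset described in the quoted text.
inline void check_portable_pattern(std::string const & pattern)
{
    std::size_t stars = 0;
    for (std::size_t i = 0; i != pattern.size(); ++i) {
        if (pattern[i] != '*')
            continue;
        if (++stars > 2)
            throw std::invalid_argument("more than two '*' in pattern");
        bool const is_last = (i + 1 == pattern.size());
        if (!is_last && pattern[i + 1] != '.')
            throw std::invalid_argument("'*' must be last or precede '.'");
    }
}

// Non-throwing convenience wrapper around the check above.
inline bool is_portable_pattern(std::string const & pattern)
{
    try {
        check_portable_pattern(pattern);
        return true;
    } catch (std::invalid_argument const &) {
        return false;
    }
}

// Usage example.
inline void portable_pattern_examples()
{
    assert(is_portable_pattern("*.hpp"));
    assert(is_portable_pattern("*.*"));
    assert(!is_portable_pattern("a*b"));
    assert(!is_portable_pattern("*.*.*"));
}
```

Note that this is a full pass over the pattern, so the "simple" iterator
ends up containing a small parser of its own.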

Regards,
Angus

