From: Caleb Epstein (caleb.epstein_at_[hidden])
Date: 2006-03-06 10:14:54

On 3/4/06, Caleb Epstein <caleb.epstein_at_[hidden]> wrote:
> On 3/4/06, Ion Gaztañaga <igaztanaga_at_[hidden]> wrote:
> > Now that filesystem is proposed for the standard I would like to ask
> > boosters (and Beman, of course) if they find these performance concerns
> > serious enough.
> Perhaps if they were accompanied by some comparative performance
> benchmarks or profile analysis?

In the interests of science, I wrote a small "file finder" using
Boost.Filesystem and a comparable version using POSIX functions (stat,
readdir, etc.). In the timings below, the POSIX version runs about five
times faster than the Boost.Filesystem version (code attached). Note that
the Boost version uses the "status" member of the directory iterator, which
is in CVS and is aimed at reducing the number of operating system calls that
the library needs to make.
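
For readers without the attachment, here is a rough sketch of what the
filesystem-library side of such a finder can look like. It is not the
attached code: it uses std::filesystem (the standardized descendant of
Boost.Filesystem) rather than the 2006 CVS interface, and the names Tally,
Report, tally_entry and find_files are hypothetical. The point it
illustrates is asking the directory entry itself for type and size, which
lets an implementation answer from data gathered during iteration where it
has any.

```cpp
#include <cstdint>
#include <filesystem>
#include <map>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Per-extension tally: number of matching files and their combined size.
struct Tally {
    std::uint64_t files = 0;
    std::uint64_t bytes = 0;
};

// Keys are the extensions to look for, including the leading dot.
using Report = std::map<std::string, Tally>;

// Hypothetical helper: counts one entry if its extension is a key in `report`.
void tally_entry(const fs::directory_entry& entry, Report& report) {
    std::error_code ec;
    // Query the entry, not the path: the entry may hold cached attributes.
    if (!entry.is_regular_file(ec)) return;
    const auto it = report.find(entry.path().extension().string());
    if (it == report.end()) return;
    const std::uintmax_t size = entry.file_size(ec);
    if (ec) return;  // size unavailable: skip the file
    ++it->second.files;
    it->second.bytes += size;
}

// Walks `dir` (and its subdirectories if `recursive`), updating `report`.
// Errors stop the walk quietly instead of throwing.
void find_files(const fs::path& dir, bool recursive, Report& report) {
    std::error_code ec;
    if (recursive) {
        const auto opts = fs::directory_options::skip_permission_denied;
        for (fs::recursive_directory_iterator it(dir, opts, ec), end;
             !ec && it != end; it.increment(ec)) {
            tally_entry(*it, report);
        }
    } else {
        for (fs::directory_iterator it(dir, ec), end;
             !ec && it != end; it.increment(ec)) {
            tally_entry(*it, report);
        }
    }
}
```

To use it, seed a Report with the extensions of interest (for example
report[".mp3"];) and call find_files once per directory.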

The test programs take an optional -R (recursive) flag and a list of
directories and filename extensions on the command line. They walk each of
the directories (recursively, if -R is given), searching for files with
matching extensions and tallying their count and total size. At the end,
they generate a summary report by extension.
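
The walk-and-tally logic just described can be sketched with plain POSIX
calls roughly as follows. This is a minimal illustration, not the attached
program: Tally, Report, extension_of, find_files and print_report are
hypothetical names, command-line parsing is left out, and the choice of
lstat (so that symbolic links are not followed) is my assumption.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Per-extension tally: number of matching files and their combined size.
struct Tally {
    std::uint64_t files = 0;
    std::uint64_t bytes = 0;
};

// Keys are the extensions to look for, including the leading dot.
using Report = std::map<std::string, Tally>;

// Returns the extension with its leading dot, or "" if the name has no dot.
std::string extension_of(const std::string& name) {
    const std::string::size_type dot = name.rfind('.');
    return dot == std::string::npos ? std::string() : name.substr(dot);
}

// Walks `dir`, counting only files whose extension is a key in `report`.
void find_files(const std::string& dir, bool recursive, Report& report) {
    DIR* d = opendir(dir.c_str());
    if (d == nullptr) return;  // unreadable or missing directory: skip it

    while (dirent* entry = readdir(d)) {
        const std::string name = entry->d_name;
        if (name == "." || name == "..") continue;

        const std::string path = dir + "/" + name;
        struct stat st;
        if (lstat(path.c_str(), &st) != 0) continue;  // cannot stat: skip

        if (S_ISDIR(st.st_mode)) {
            if (recursive) find_files(path, true, report);
        } else if (S_ISREG(st.st_mode)) {
            const Report::iterator it = report.find(extension_of(name));
            if (it != report.end()) {
                ++it->second.files;
                it->second.bytes += static_cast<std::uint64_t>(st.st_size);
            }
        }
    }
    closedir(d);
}

// Prints one summary line per extension, with sizes in GiB.
void print_report(const Report& report) {
    for (const auto& item : report) {
        const double gib =
            static_cast<double>(item.second.bytes) / (1024.0 * 1024.0 * 1024.0);
        std::printf("%s: %llu files, %g GiB\n", item.first.c_str(),
                    static_cast<unsigned long long>(item.second.files), gib);
    }
}
```

A caller would seed the Report with each extension from the command line
(for example report[".flac"];), call find_files for each directory, and
finish with print_report.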

Here is some sample output after priming the buffer cache by running each of
the programs several times (yes, I have a lot of music):

[9:48] cae @ tela 740% time ~/finder-fs -R /raid/shn .mp3 .flac .shn
.flac: 5550 files, 243.918 GiB
.mp3: 30364 files, 232.695 GiB
.shn: 152 files, 6.90744 GiB
~/finder-fs -R /raid/shn .mp3 .flac .shn 0.31s user 3.97s system 97% cpu
4.369 total

Once the buffer cache has been primed, the results do not vary much. All
runs are on the order of 4.3 seconds. This is with an optimized build of
the filesystem library and my code compiled with -g -O2 -pg. Removing the
profiling options reduces the runtime to approximately 4.1 seconds, so the
profiling overhead is relatively small.

Here's the output from the POSIX version:

[9:49] cae @ tela 741% time ~/finder-posix -R /raid/shn .mp3 .flac .shn
.flac: 5550 files, 243.918 GiB
.mp3: 30364 files, 232.695 GiB
.shn: 152 files, 6.90744 GiB
~/finder-posix -R /raid/shn .mp3 .flac .shn 0.18s user 0.64s system 99% cpu
0.832 total

Looking at the profiling output from the "finder-fs" program, it appears
that the bulk of the time is spent in fs::basic_path::operator/=(), which
may bear out Ion's fears.

The profiling output for "finder-posix" shows that the bulk of the time
(66.6%) is in std::map::find and the finder function. In the
Boost.Filesystem version, the same code accounts for only 12.5% of the
runtime.

Caleb Epstein
caleb dot epstein at gmail dot com
