From: Jason D Schmidt (jd.schmidt_at_[hidden])
Date: 2002-11-09 16:05:41
Hi everyone,
A while ago, I posted a few messages about writing some fast Fourier
transform (FFT) code for Boost. Now there's a lot of discussion about
numerical integration. If we're going to start writing numerical code
for Boost (and I hope we do a lot of it), we should probably come up with
some overall guidelines about its interfaces & performance. Here are
some of my thoughts on those issues; maybe this could start a useful
discussion.
Much like STL algorithms, numerical functions can take an array of data,
process it, and return either a single number or another array
(integration, for example). Numerical code can also take a function object
and return an array (derivatives). It gets more complicated from there: a
Fourier transform (FT), for example, takes an array of either real or
complex data and, in general, returns a complex result.
With the above examples in mind, there are a lot of issues to think
about. Should algorithms that take an array of data simply accept some
container as an argument and return a container (if appropriate), or
should they take input iterators and an output iterator, and return an
output iterator (a la STL)? An STL-style algorithm would work just as
well for C arrays as for any standard container. If the algorithm takes
a std::valarray<T> (not a standard container), for example, then the
programmer who writes the algorithm can take full advantage of the
natural math operations defined for that class. After all, why should we
even allow a user to integrate a std::list of data? That's not the
intended use of the list class. However, that is THE one and only
intended use of the valarray class, and an iterator interface doesn't let
numerical algorithms operate on C++'s only numeric array class, because
valarrays don't have iterators. Then again, someone may want to use a
boost::array, a plain C array, or their own array class just as often; a
valarray-only interface would force that user into valarray when it
doesn't necessarily fit their needs.
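Just to make that trade-off concrete, here's a rough sketch (the
integrate() name and the trivial rectangle rule are made up purely for
illustration): an iterator interface works unchanged for a C array and a
std::vector, but a std::valarray has to be passed through &v[0] because it
has no iterators.

#include <valarray>
#include <vector>

template<class InIt, class T>
T integrate(InIt first, InIt last, T dx)
{
    // Rectangle rule: dx times the sum of the samples. Only the calling
    // conventions matter here, not the numerics.
    T sum = T();
    for (; first != last; ++first)
        sum += *first;
    return dx * sum;
}

int main()
{
    double a[] = { 0.0, 1.0, 2.0, 3.0 };
    std::vector<double> v(a, a + 4);
    std::valarray<double> w(a, 4);

    integrate(a, a + 4, 0.5);                  // C array: fine
    integrate(v.begin(), v.end(), 0.5);        // standard container: fine
    integrate(&w[0], &w[0] + w.size(), 0.5);   // valarray: pointers, not iterators
}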
Say that we want to stick to the STL convention and go with an iterator
interface. Now let's try applying it to an FT function. We might want
the following types of FT functions:
one that takes real data & returns complex
one that takes complex & returns complex
one that's entirely in-place
Here are the difficulties that arise:
Life is easy when the FFT takes complex & returns complex:
template<class InIt, class OutIt>
OutIt fft(InIt first, InIt last, OutIt x);
If it takes real data & returns complex, how would that interface even
work? How do we guarantee that InIt points to real data & OutIt points
to complex? Maybe some overloading could be done, but there's no partial
specialization for function templates, only for class templates.
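One possible workaround, sketched very roughly: dispatch on the input
iterator's value_type through a small traits class and a tag argument,
since the function template itself can't be partially specialized. The
is_complex, fft_tag, and fft_dispatch names below are made up for
illustration, and the transform bodies are omitted.

#include <complex>
#include <iterator>

template<class T> struct is_complex { enum { value = 0 }; };
template<class T> struct is_complex<std::complex<T> > { enum { value = 1 }; };

template<int IsComplex> struct fft_tag {};

template<class InIt, class OutIt>
OutIt fft_dispatch(InIt first, InIt last, OutIt result, fft_tag<1>)
{
    // complex input: ordinary complex-to-complex transform
    /* ... */
    return result;
}

template<class InIt, class OutIt>
OutIt fft_dispatch(InIt first, InIt last, OutIt result, fft_tag<0>)
{
    // real input: could use the size-n/2 trick discussed below
    /* ... */
    return result;
}

template<class InIt, class OutIt>
OutIt fft(InIt first, InIt last, OutIt result)
{
    typedef typename std::iterator_traits<InIt>::value_type value_type;
    return fft_dispatch(first, last, result, fft_tag<is_complex<value_type>::value>());
}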
Now let's think about an in-place FFT:
template<class InIt>
void fft(InIt first, InIt last);
The in-place transform must take complex data & return complex data,
because the input and the output occupy the same sequence and therefore
share one value type. There's no possibility of transforming real data
in-place, because in general the result is complex. To transform real
data this way, a user would have to loop through the data & convert it to
complex first. It gets worse: because the spectrum of real data is
conjugate-symmetric, an FFT of real input only requires a complex FFT of
size n/2 (n is the size of the input data). That's a big performance gain
that a user would sorely miss.
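For the record, here's roughly what that n/2 trick looks like, assuming
the in-place complex fft declared above is available; the real_fft name
and the choice of std::vector for the result are only illustrative. The
real samples are packed pairwise into n/2 complex values, one half-size
complex FFT is done, and an O(n) pass unpacks and recombines the spectrum.

#include <complex>
#include <cstddef>
#include <vector>

template<class InIt>
std::vector<std::complex<double> > real_fft(InIt first, InIt last)
{
    typedef std::complex<double> C;
    std::vector<double> x(first, last);
    const std::size_t n = x.size();          // assumed even
    const std::size_t h = n / 2;

    // Pack adjacent real samples into h complex values: z[m] = x[2m] + i*x[2m+1].
    std::vector<C> z(h);
    for (std::size_t m = 0; m < h; ++m)
        z[m] = C(x[2*m], x[2*m + 1]);

    fft(z.begin(), z.end());                 // the size-n/2 complex, in-place FFT

    // Split z into the transforms of the even- and odd-indexed samples, then
    // combine them with the usual twiddle factors to get the full spectrum.
    std::vector<C> X(n);
    const double pi = 3.14159265358979323846;
    for (std::size_t k = 0; k < h; ++k) {
        C zk = z[k];
        C zr = std::conj(z[(h - k) % h]);
        C E = 0.5 * (zk + zr);               // FFT of the even-indexed samples
        C O = C(0, -0.5) * (zk - zr);        // FFT of the odd-indexed samples
        C w = std::polar(1.0, -2.0 * pi * double(k) / double(n));
        X[k]     = E + w * O;
        X[k + h] = E - w * O;
    }
    return X;
}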
Clearly, performance considerations do have a big impact on the interface.
To add another FFT example, the first part of the FFT algorithm requires
re-ordering the data. The new order of the data only depends on the
number of data points to be transformed. If we could make FFTs of
different sizes separate functions, then inside each FFT function, one
could store the new order of the indices in a static array object. If
the size is a template parameter, that happens with little effort, and we
can make a big performance gain by eliminating redundant calculations of
the indices. That way, if a user is doing 1,000 FFTs of size 1024, the
order of the indices only has to be computed once, and each FFT evaluation
skips that portion of the work. Requiring the user to supply the size at
compile time is a bit of a restriction, though, and it creates difficulties
when we try to build other algorithms on top of our complex FFT (real FFT,
power spectrum, correlation, convolution). To alleviate the restriction &
the other difficulties, we could make the FFT a class & require the user to
pass the size of the data into the FFT constructor; then the index ordering
is computed only once, in the constructor.
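Here's a bare-bones sketch of that class idea. The fft_plan name is made
up, and the butterfly passes are omitted; the only point is that the
constructor computes the size-dependent reordering (a bit-reversal
permutation for a power-of-two size) once, and operator() reuses it for
every transform of that size.

#include <algorithm>
#include <cstddef>
#include <vector>

class fft_plan {
public:
    explicit fft_plan(std::size_t n) : index_(n)
    {
        // Precompute the bit-reversed order of 0..n-1 (n assumed a power of two).
        std::size_t j = 0;
        for (std::size_t i = 0; i < n; ++i) {
            index_[i] = j;
            std::size_t bit = n >> 1;
            for (; j & bit; bit >>= 1)
                j ^= bit;
            j |= bit;
        }
    }

    template<class RanIt>
    void operator()(RanIt first, RanIt last) const
    {
        // Reorder in place using the stored permutation; the butterfly
        // passes that complete the transform are omitted from this sketch.
        for (std::size_t i = 0; i < index_.size(); ++i)
            if (i < index_[i])
                std::swap(first[i], first[index_[i]]);
        // ... butterflies over [first, last) ...
    }

private:
    std::vector<std::size_t> index_;
};

Doing 1,000 transforms of size 1024 then costs one permutation setup plus
1,000 applications of it.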
Just to add one final thought, comparing numerical algorithms to STL
algorithms again, do we want to make performance guarantees (algorithmic
complexity) just like the STL? That would be very nice. Keep in mind
that a function's overall performance depends greatly on the containers
used, not just the complexity of the algorithm. Thus, the "valarray vs.
iterator vs. anything else" interface issue will have a big impact on
performance.
I think these are issues that need to be weighed, and a policy should be
set forth before Boost starts to accept numerical algorithms. That would
make it much easier for developers of numerical algorithms for Boost.
Jason Schmidt