Thanks as always, Stephen (whom I still refuse to accept is either a single individual or human). Sorry to all that this is getting so long and specific; the project is justified at the end.

Your comments generated a lot of thought about what my main problems are. I think I distracted myself with the generation of derivatives before I had worked out some other issues, but your implementation looks very elegant. For now, let's assume that people supply the gradient/jacobian directly, or from their own finite-difference/automatic-differentiation code. So here are some more basic questions.

Question 1: Based on your template, how do I forward a call to the boost::function's operator() in this wrapper? Why might we want to do this? Because we are trying to implement a concept with operator() and .gradient(...) using a function wrapper here. Library writers may want to write generic code that operates on anything modeling the concept, not just objects wrapped in math_function. Then we may end up with true inlining of operator() and .gradient(...) when a full differentiable function object is supplied, instead of always going through an indirection pointer in boost::function.

template<class Signature>
class math_function {
public:
    typedef typename boost::function<Signature>::result_type result_type;

    template<class F, class G>
    math_function(const F& f, const G& g) : f_(f), g_(g) {}

    result_type operator()(/* how to extract the input parameters from Signature? myvalues... */)
    {
        return f_(/* myvalues... */);
    }
    result_type gradient(/* how to extract the input parameters from Signature? */)
    {
        return g_(/* myvalues... */);
    }

private:
    boost::function<Signature> f_;
    boost::function<Signature> g_;
};
void test_mathfunctions()
{
    using lambda::_1;
    math_function<double (double)> myfunc(_1 * _1, 2 * _1); // simple quadratic
    cout << myfunc(.1);
    cout << myfunc.gradient(.1);
}

Question 2: How can this wrapping interact with binding?
void test_mathfunctions_binding()
{
    using lambda::_1;
    using lambda::_2;

    // But you might also want a constant thrown in there. Why a separate parameter? Because you don't take the derivative over the other stuff, and you want to keep your functions pure so you can bind, rather than forcing them to be stateful.
    math_function<double (double, double)> myfunc(_1*_1 + _2, 2*_1); // simple quadratic translated by a constant

    // But how would this work with bind? Would the following work immediately?
    boost::function<double (double)> myfunc_bound = bind(myfunc, _1, 2.0);
    boost::function<double (double)> myfuncderiv_bound =
        bind(&math_function<double (double, double)>::gradient, myfunc, _1, 2.0); // or something like this... I always forget member-function notation.

    // Is there any way to bind that entire structure back to a math_function? Very useful, since we would bind parameters before calling optimizers, etc. With dynamic programming we usually have a grid of values and call the optimizer binding over the entire grid.
    math_function<double (double)> myfunc_bound2 = math_bind(myfunc, _1, 2.0);
    cout << myfunc_bound2.gradient(1.0); // note that the binding can now use the gradient
}

Question 3: How would we adapt a structure implementing the operator() and gradient() to be stored here?  Just like boost::function accepting any function object?
struct square
{
    double operator()(double x) const { return x * x; }
    double gradient(double x) const { return 2 * x; }
};

square sq;
math_function<double (double)> myfunc(sq); // note: pass an instance, not the type

Given the previous code, would we write a constructor like this?
template<class F>
math_function(const F& f) : f_(f), g_(bind(&F::gradient, f, _1 /* ...? */)) {} // Here I don't see how to bind to the arbitrary signature, extracting the inputs from Signature, which may take more than one parameter.


Question 4: What are the thoughts on whether/how this is extensible to higher dimensions? Let's start on the gradient with functions mapping f : R^N -> R.
It is perfectly reasonable for these types of problems for the first parameter of the function to be the only one the derivative is taken over. The dimensionality of the derivative/gradient/jacobian would come out of a combination of the result_type from the signature and the dimensionality of the domain. For example, something like:
template<int DomainDim, class Signature>
class math_function {
public:
    typedef typename boost::function<Signature>::result_type result_type;
    typedef boost::array<result_type, DomainDim> gradient_result_type; // note that this adds the dimensionality; the data structure could be a template parameter

    template<class F, class G>
    math_function(const F& f, const G& g) : f_(f), g_(g) {}

    result_type operator()(/* how to extract the input parameters from Signature? */)
    {
        return f_(/* myvalues... */);
    }
    gradient_result_type gradient(/* how to extract the input parameters from Signature? */)
    {
        return g_(/* myvalues... */);
    }

private:
    boost::function<Signature> f_;
    boost::function</* gradient_result_type with the old input parameters from Signature */> g_;
};

// Defined at namespace scope since I don't know how to construct arrays in lambda.
boost::array<double, 2> mygrad(boost::array<double, 2> x, double a)
{
    boost::array<double, 2> ret;
    ret[0] = 2 * x[0];
    ret[1] = 2 * x[1];
    return ret;
}

void test_mathfunctions_multidim()
{
    using lambda::_1;
    using lambda::_2;

    math_function<2, double (boost::array<double, 2>, double)>
        myfunc(pow(_1[0], 2) + pow(_1[1], 2) + _2, mygrad);

    boost::array<double, 2> x = {{0.0, 0.0}};
    cout << myfunc.gradient(x, 1.0); // etc.
}

Question 5: Now let's say that what we really want is to manage f : R^N -> R^M functions. Here, we could potentially use ublas::matrix as a default data structure.
template<class Signature, class SecondDimStructure = ublas::matrix<double>,
         template<class T, int N> class NDimStructure = boost::multi_array> // we really don't need the dims here since we are using heap-based data structures
class math_function {
public:
    typedef typename boost::function<Signature>::result_type result_type;
    typedef SecondDimStructure gradient_result_type; // this really is a jacobian, size M x N
    typedef NDimStructure<typename result_type::value_type, 3> hessian_result_type; // this really is a rank-3 tensor, size M x N x N, if I remember my calculus...

    template<class F, class G>
    math_function(const F& f, const G& g) : f_(f), g_(g) {}

    result_type operator()(/* how to extract the input parameters from Signature? */)
    {
        return f_(/* myvalues... */);
    }
    gradient_result_type gradient(/* how to extract the input parameters from Signature? */)
    {
        return g_(/* myvalues... */);
    }

private:
    boost::function<Signature> f_;
    boost::function</* gradient_result_type with the old input parameters from Signature */> g_;
    // should also store the higher-dimensional hessian of the function
};

Question 6: I want to also associate information about the nature of the function: whether it is convex/concave, increasing, strictly increasing, etc. These tend to have an inheritance structure, but that isn't absolutely necessary. I was thinking about using traits classes, but I assume that we would lose all those traits when we use type erasure? Are we better off storing a bunch of enums in this data structure? Increasing the size of the class shouldn't be an issue.

Question 7: Let me justify why this is all very useful and a general problem. Right now, everyone writing non-linear algorithms in C++ has their own structures for managing mathematical functions that contain derivatives, jacobians, and hessians. I am running into this right now because I am trying to wrap a bunch of other people's algorithms for my own work. A few comments:
1) Everyone is using dynamic polymorphism for these, overloading a virtual operator() and some type of gradient/jacobian. I assume this is because there is no agreed-upon concept for function objects with gradients. In reality, none of these seem to require dynamic polymorphism. Why shouldn't a Newton's-method optimizer have inlining of the operator() and derivative through static polymorphism where possible?
2) Most of these libraries have conflated functionality/settings specific to their optimizers/solvers with the evaluation of the function/gradient. This makes generic code difficult without wrappers/adapters.
3) Many use their own data structures for arrays, matrices, and multidimensional results. This may be inevitable, but it means that you need to have these types parameterized in any mathematical callback routine.
4) While a lot could be gained from having people agree on a static concept for mathematical applications, akin to standardizing on operator() for functors, it is also very useful to have a generic callback wrapper akin to boost::function. The main reason is that you will tend to write parameterized, pure functions where you want to bind parameters. Worst case, people could write their own functions against this mathematical callback structure, and adapters could be written for the different libraries.
5) Not to disparage these superb libraries, but a few examples to show what I mean: http://quantlib.org/reference/class_quant_lib_1_1_cost_function.html and http://trilinos.sandia.gov/packages/docs/r6.0/packages/didasko/doc/html/nox_ex1.html and http://www.coin-or.org/CppAD/Doc/ipopt_cppad_simple.cpp.xml

If anyone is interested in working with me on this very interesting and useful problem, please send me an email. I think it has a lot of potential for general use.

-Jesse