Let's say I wanted to create my own thread pool class.  I first attempted this by making a thread manager class that creates boost::thread objects in its constructor and lets them expire naturally when they go out of scope.  The threads and their functions seemed to be running fine, but when I started encountering problems I wondered whether the thread objects needed to remain in scope.  So I switched to creating the boost::thread objects on the heap with new, but this didn't solve my problems, and doing it this way left me with memory leaks.  So my first question is: what is the best way to make a thread pool, and are there any pitfalls to watch out for?
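
To make the lifetime question concrete, here is a minimal sketch of a manager that keeps its thread objects as members and joins them in its destructor, so there are no locals going out of scope and no new/delete to leak.  It uses C++11 std::thread as a stand-in for boost::thread; the two may behave differently when a thread object is destroyed while its thread is still running, so treat this as an illustration of ownership rather than a drop-in.  The class name ThreadManager and the started() counter are hypothetical and exist only for the example.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical manager: the thread objects are members, so they live
// exactly as long as the manager itself.
class ThreadManager {
public:
    explicit ThreadManager(std::size_t count) : started_(0) {
        workers_.reserve(count);
        for (std::size_t i = 0; i < count; ++i) {
            // Placeholder thread function: it only records that it ran.
            workers_.emplace_back([this] { ++started_; });
        }
    }

    // Join every worker before any member is destroyed, so no thread
    // can touch the manager after it is gone.
    ~ThreadManager() { join_all(); }

    ThreadManager(const ThreadManager&) = delete;
    ThreadManager& operator=(const ThreadManager&) = delete;

    // Safe to call more than once: already-joined threads are skipped.
    void join_all() {
        for (std::thread& t : workers_) {
            if (t.joinable()) t.join();
        }
    }

    int started() const { return started_.load(); }

private:
    std::atomic<int> started_;          // declared before workers_ on purpose
    std::vector<std::thread> workers_;  // owned by value, no heap bookkeeping
};
```

The point of the design is that anything the thread functions touch (here, started_) is guaranteed to outlive the threads, because the destructor does not return until every thread has been joined.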
 
I made what I thought was a simple input / output scheme for the thread functions.  Please see the code attached.  The thread functions sleep while waiting for the main thread to add data to an input queue.  A thread then pops an item and locks its own mutex while executing.  When done, it puts the data on the output queue (order doesn't matter) and unlocks the mutex.  The main thread checks for when the input queue is empty, and when it is, it tries to lock all of the thread mutexes.  The per-thread mutex is there because, although the input queue may be empty, a thread may still be executing and the output would not yet be complete.  The problem I was getting was that the thread manager seemed to be ignoring the mutexes.  You can see the problem in action with the attached code.  What you are looking for is an output size that does not match the number of items you put in.  You will need to run the code many times (~30) because the error is infrequent.
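
For comparison, here is a rough sketch of the same input / output scheme that replaces the per-thread mutexes with a single count of unfinished items.  One possible explanation for the missing outputs, which cannot be confirmed without the attached code, is that a worker pops an item first and locks its mutex afterwards; in that window the main thread can see an empty input queue and lock every mutex while an item is still in flight.  The sketch uses C++11 std::thread, std::mutex and std::condition_variable as stand-ins for the Boost equivalents, and the names SquarePool, push and wait_and_collect, as well as the squaring "work", are hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical pool: workers square ints taken from an input queue.
class SquarePool {
public:
    explicit SquarePool(std::size_t threads) : pending_(0), stopping_(false) {
        for (std::size_t i = 0; i < threads; ++i)
            workers_.emplace_back([this] { run(); });
    }

    ~SquarePool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_ready_.notify_all();
        for (std::thread& t : workers_) t.join();
    }

    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            input_.push(value);
            ++pending_;  // stays counted until its result is in output_
        }
        work_ready_.notify_one();
    }

    // Blocks until every pushed item has produced a result, then
    // hands the results over (in no particular order).
    std::vector<int> wait_and_collect() {
        std::unique_lock<std::mutex> lock(mutex_);
        all_done_.wait(lock, [this] { return pending_ == 0; });
        std::vector<int> results;
        results.swap(output_);
        return results;
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until there is work or the pool is shutting down.
            work_ready_.wait(lock, [this] { return stopping_ || !input_.empty(); });
            if (input_.empty()) return;  // shutting down, nothing left to do
            int value = input_.front();
            input_.pop();
            lock.unlock();               // do the work without holding the lock
            int result = value * value;  // stand-in for the real work
            lock.lock();
            output_.push_back(result);
            if (--pending_ == 0) all_done_.notify_all();
        }
    }

    std::mutex mutex_;                    // guards every member below
    std::condition_variable work_ready_;  // signalled when input arrives
    std::condition_variable all_done_;    // signalled when pending_ hits 0
    std::queue<int> input_;
    std::vector<int> output_;
    std::size_t pending_;
    bool stopping_;
    std::vector<std::thread> workers_;    // declared last, started last
};
```

Because pending_ is incremented under the lock when an item is pushed and decremented only after its result is stored, the main thread waits on "all work finished" rather than "input queue empty", which closes the window between popping an item and marking the worker busy.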
 
A much more common error (about 1 run in 3) is an access violation in boost::mutex::scoped_lock.  I don't know whether the two problems are related.  I found that if a memory-intensive app is running in the background there aren't as many access violations, so run one while looking for the other bug or you will go crazy.
 
So basically I am new to threading and I don't know if these are bugs in my code, the boost::threads library, or something else.  Any suggestions on the abstraction of the thread pool or anything else are welcome.
 
Boost 1.33.1
Windows XP x64 (win32 debug app)
Intel Core 2 Duo