Boost Users :

From: Andrew McDonald (andrew.mcdonald_at_[hidden])
Date: 2006-05-11 04:40:06


I am using Boost 1.33.1 with Microsoft Visual C++ 2005 Express
Edition (version 8.0.50727.42).
Boost was built with the vc-8_0 toolset.

 
When using a filtering_stream with a regex_filter, I get a crash when the
filtering_stream is destroyed (i.e. closed) after processing text that contains no match for the regex.

===========================================================

#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filter/regex.hpp>

#include <iostream>

using namespace boost;
using namespace boost::iostreams;

// replaces y with Q
const regex_filter Y_FILTER
(
   regex("y") ,"Q"
);

// replaces x with Q
const regex_filter X_FILTER
(
   regex("x") ,"Q"
);

// test data has a y but no x
const char* Y_BUT_NO_X = "abc y cba";

void test()
{

   // outputs "abc Q cba" as expected
   {
      filtering_ostream out( Y_FILTER );
      out.push(std::cout);
      out << Y_BUT_NO_X;
   }
   
   // crashes with null pointer dereference !!!
   {
      filtering_ostream out( X_FILTER );
      out.push(std::cout);
      out << Y_BUT_NO_X;
   }
}

// entry point so the example can be built and run as-is
int main()
{
   test();
}

===========================================================
Please note that the same crash occurs with "real world" regexes and data, so
the issue is not a boundary condition of the "artificial" test data.

It occurs whenever the input contains no match for the regex.

The crash occurs at line 53 of boost/iostreams/filter/regex.hpp

in the function

    void do_filter(const vector_type& src, vector_type& dest)
    {
        typedef regex_iterator<const Ch*, Ch, Tr> iterator;
        if (src.empty())
            return;
        iterator first(&src[0], &src[0] + src.size(), re_, flags_);
        iterator last;
        const Ch* suffix = 0; // Prevent GCC 2.95 warning.
        for (; first != last; ++first) {
            dest.insert( dest.end(),
                         first->prefix().first,
                         first->prefix().second );
            string_type replacement = replace_(*first);
            dest.insert( dest.end(),
                         replacement.begin(),
                         replacement.end() );
            suffix = first->suffix().first;
        }
        dest.insert(dest.end(), suffix, &src[0] + src.size()); // <<< Crash !!
    }

When the regex matches nothing, first == last from the outset, so the body of the for loop never runs and
suffix keeps its initial value of 0.
The insert call after the loop then reads from a null pointer, which causes the crash.
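
The empty iteration range is easy to confirm in isolation. The following is only a sketch: it uses std::regex (C++11) rather than Boost.Regex, on the assumption that the iterator behaves the same way for this purpose, and count_matches is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstring>
#include <iterator>
#include <regex>

// Hypothetical helper: returns how many times the body of a
// do_filter-style loop would run for the given text and pattern.
std::size_t count_matches(const char* text, const char* pattern)
{
    const std::regex re(pattern);
    std::cregex_iterator first(text, text + std::strlen(text), re);
    std::cregex_iterator last; // default-constructed: end of sequence
    // With no match, first == last immediately and the distance is zero.
    return static_cast<std::size_t>(std::distance(first, last));
}
```

For the test data above, count_matches("abc y cba", "x") is zero, so nothing inside the loop ever assigns to suffix.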

The intent of the final dest.insert call in do_filter is to copy any text remaining after the last match from src to dest.

This means that an appropriate initialiser for suffix is the start of the src, so that all text is copied to the destination if no match occurs.

This can be achieved by changing the line
    const Ch* suffix = 0; // Prevent GCC 2.95 warning.
to
    const Ch* suffix = &src[0]; // Prevent GCC 2.95 warning.

However, a small rewrite avoids several calls to operator[] and size() and makes the code (ever so slightly) more readable:

    void do_filter(const vector_type& src, vector_type& dest)
    {
        typedef regex_iterator<const Ch*, Ch, Tr> iterator;
        if (src.empty())
            return;
        const Ch* start = &src[0];
        const Ch* end = start + src.size();
        const Ch* suffix = start; // Copy all of src if nothing matches.
        iterator first(start, end, re_, flags_);
        iterator last;
        for (; first != last; ++first) {
            dest.insert( dest.end(),
                         first->prefix().first,
                         first->prefix().second );
            string_type replacement = replace_(*first);
            dest.insert( dest.end(),
                         replacement.begin(),
                         replacement.end() );
            suffix = first->suffix().first;
        }
        dest.insert(dest.end(), suffix, end);
    }

I have tested this with both the test() function above and my original data and correct output is produced.
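
For anyone who wants to check the logic without building Boost, here is a rough standalone sketch of the same loop. It rests on several assumptions: it uses std::regex (C++11) in place of Boost.Regex, a fixed replacement string in place of the filter's replace_ formatter, and replace_all is a hypothetical name of my own, not a Boost.Iostreams function.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for regex_filter::do_filter (assumption: std::regex
// and a fixed replacement string instead of the filter's members).
std::string replace_all(const std::string& input,
                        const std::regex& re,
                        const std::string& replacement)
{
    if (input.empty())
        return std::string();

    std::vector<char> dest;
    const char* start = input.data();
    const char* end = start + input.size();
    const char* suffix = start; // copy all of the input if nothing matches

    std::cregex_iterator first(start, end, re);
    std::cregex_iterator last;
    for (; first != last; ++first) {
        // text between the previous match (or the start) and this match
        dest.insert(dest.end(), first->prefix().first, first->prefix().second);
        dest.insert(dest.end(), replacement.begin(), replacement.end());
        suffix = first->suffix().first;
    }
    // text after the last match, or the whole input if there was no match
    dest.insert(dest.end(), suffix, end);
    return std::string(dest.begin(), dest.end());
}
```

Called with the data from test(), replace_all("abc y cba", std::regex("y"), "Q") should give "abc Q cba", and the same call with std::regex("x") should return the input unchanged rather than crash.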

regards,

Andrew
Andrew McDonald
System Architect


