
Boost Users :

Subject: [Boost-users] Boost tokenizer does not work, and shows up invalid read of size on valgrind
From: Avi Bahra (avibahra_at_[hidden])
Date: 2009-07-14 11:46:54


I am using the latest Boost release (1.39) with gcc 4.2.1 on SUSE Linux.
I am trying to load a file and then split its lines into a vector of
strings.
However, when the program runs, the last string comes out corrupt.
When I run it under valgrind, the very first error is an invalid read of
size 1 in the guts of boost::char_separator:

==19963== Invalid read of size 1
==19963== at 0x8056870: bool boost::char_separator<char,
std::char_traits<char> >::operator()<__gnu_cxx::__normal_iterator<char
const*, std::string>, std::string>(__gnu_cxx::__normal_iterator<char const*,
std::string>&, __gnu_cxx::__normal_iterator<char const*, std::string>,
std::string&) (token_functions.hpp:430)
==19963== by 0x8056E3E: boost::token_iterator<boost::char_separator<char,
std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*,
std::string>, std::string>::initialize() (token_iterator.hpp:70)
==19963== by 0x8056EA6: boost::token_iterator<boost::char_separator<char,
std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*,
std::string>, std::string>::token_iterator(boost::char_separator<char,
std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*,
std::string>, __gnu_cxx::__normal_iterator<char const*, std::string>)
(token_iterator.hpp:77)
==19963== by 0x8056FAA: boost::tokenizer<boost::char_separator<char,
std::char_traits<char> >, __gnu_cxx::__normal_iterator<char const*,
std::string>, std::string>::begin() const (tokenizer.hpp:86)

Here is the program:

BOOST_AUTO_TEST_CASE( test_log_append ) {

    string logFile = "test/logfile.txt";

    // Load the log file into a vector of strings, and test the content
    ifstream ifs(logFile.c_str());
    BOOST_REQUIRE_MESSAGE(ifs, "Could not open log file\n");

    stringstream ss; ss << ifs.rdbuf();   // Read the whole file into a string
    char_separator<char> sep("\n");       // Split the file content (Unix = \n, PC = \r\n)
    typedef boost::tokenizer<boost::char_separator<char> > tokenizer;
    tokenizer tokens(ss.str(), sep);      // <<<<<<<< valgrind barfs here

    std::vector<std::string> lines; lines.reserve(9);
    std::copy(tokens.begin(), tokens.end(), back_inserter(lines)); // <<<<<<<< valgrind barfs here
    for(int i = 0; i < lines.size(); i++) { cerr << "'" << lines[i] << "'\n"; }
}

The input in logfile.txt is of the form:

MSG:[16:36:09 14.7.2009] First Message
LOG:[16:36:09 14.7.2009] LOG
WAR:[16:36:09 14.7.2009] ERROR
ERR:[16:36:09 14.7.2009] WARNING
DBG:[16:36:09 14.7.2009] DEBUG
OTH:[16:36:09 14.7.2009] OTHER
OTH:[16:36:09 14.7.2009] OTHER2
MSG:[16:36:09 14.7.2009] Last Message

The output is of the form:

'MSG:[16:36:09 14.7.2009] First Message'
'LOG:[16:36:09 14.7.2009] LOG'
'WAR:[16:36:09 14.7.2009] ERROR'
'ERR:[16:36:09 14.7.2009] WARNING'
'DBG:[16:36:09 14.7.2009] DEBUG'
'OTH:[16:36:09 14.7.2009] OTHER'
'OTH:[16:36:09 14.7.2009] OTHER2'
'�:[16:36:09 14.7.2009] Last Message'

Notice that the last string is corrupt.
Is the tokenizer known to be buggy in Boost 1.39, or am I doing it all
wrong?
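
One possible explanation, offered as an assumption rather than a confirmed
diagnosis: boost::tokenizer keeps iterators into the string it is given
rather than a copy of it, and ss.str() returns a temporary that is destroyed
as soon as the statement constructing tokens finishes. If so, tokens.begin()
and tokens.end() would read freed memory, which would match the valgrind
report. Copying ss.str() into a named std::string that outlives tokens would
be the smallest change to try. For comparison, here is a minimal sketch that
splits lines using only the standard library; split_lines is a hypothetical
helper name, not part of Boost.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): split text into lines on '\n'.
// Each line is copied into the result, so the returned vector holds no
// iterators or references into the input string.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Skip empty lines, mirroring char_separator's default policy
        // of dropping empty tokens.
        if (!line.empty()) {
            lines.push_back(line);
        }
    }
    return lines;
}
```

In the test case above this would be called as split_lines(ss.str()); the
temporary string only has to live for the duration of that call, because the
lines are copied out before it is destroyed.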

  Best regards,
Ta,
   Avi


