Subject: [Boost-bugs] [Boost C++ Libraries] #1273: CR+LF newlines in position_iterator
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2007-09-18 09:00:32
#1273: CR+LF newlines in position_iterator
-------------------------------------+--------------------------------------
Reporter: slehuitouze_at_[hidden] | Owner: djowel
Type: Bugs | Status: new
Milestone: To Be Determined | Component: spirit
Version: | Severity: Problem
Keywords: |
-------------------------------------+--------------------------------------
On September 13th, I sent a message to the "spirit-general" mailing list,
entitled "Various newline styles and position_iterator", describing a bug
I ran into when using position_iterator.
I don't think it is useful to repeat everything here, so I'll go straight
to the conclusion: "position_iterator< file_iterator<char> >" has iterator
category "random_access_iterator_tag", even though direct pointer
arithmetic is not valid on it (because it swallows the LF when it
encounters a CR+LF newline).
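To make the mismatch concrete without depending on Boost, here is a
minimal sketch. "CrLfEatingIter", "claimed_length" and "walked_length" are
hypothetical names invented for this illustration; the iterator only
mimics the behaviour described above (random access tag, LF swallowed
after CR) and is not the actual Spirit implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical stand-in for position_iterator<const char*>; NOT Boost code.
// It claims to be random access, yet operator++ may advance two characters.
struct CrLfEatingIter {
    using iterator_category = std::random_access_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = const char&;

    const char* cur = nullptr;
    const char* end = nullptr;

    CrLfEatingIter() = default;
    CrLfEatingIter(const char* c, const char* e) : cur(c), end(e) {}

    reference operator*() const { return *cur; }

    CrLfEatingIter& operator++() {
        assert(cur != end);              // must not step past the end
        const char c = *cur++;
        if (c == '\r' && cur != end && *cur == '\n')
            ++cur;                       // swallow the LF of a CR+LF pair
        return *this;
    }

    friend bool operator==(const CrLfEatingIter& a, const CrLfEatingIter& b) {
        return a.cur == b.cur;
    }
    friend bool operator!=(const CrLfEatingIter& a, const CrLfEatingIter& b) {
        return a.cur != b.cur;
    }
    // Plain pointer subtraction on the underlying buffer.
    friend difference_type operator-(const CrLfEatingIter& a,
                                     const CrLfEatingIter& b) {
        return a.cur - b.cur;
    }
};

// Length obtained by subtraction, i.e. what a container would pre-allocate.
inline std::ptrdiff_t claimed_length(const char* s, std::size_t n) {
    CrLfEatingIter first(s, s + n), last(s + n, s + n);
    return std::distance(first, last);
}

// Number of characters actually produced when walking the range.
inline std::ptrdiff_t walked_length(const char* s, std::size_t n) {
    CrLfEatingIter first(s, s + n), last(s + n, s + n);
    std::ptrdiff_t count = 0;
    for (; first != last; ++first) ++count;
    return count;
}
```

For the 5-byte buffer "AB\r\nW", claimed_length reports 5 while
walked_length reports 4. A container that sizes itself from the first
number and fills itself with the second is left with one slot that is
never written.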
As a consequence, one may end up with an uninitialized character when
copying a range delimited by two position_iterators into a
"std::vector<char>".
This is demonstrated by the attached C++ source code, part of whose
output on my machine is as follows:
**************BEGINNING OF OUTPUT***********************
We have read following characters in a 'vector<char>' container from a
file:
#0: 65 (A)
#1: 66 (B)
#2: 13 (\r)
#3: 87 (W)
#4: 205 (unexpected character)
**************END OF OUTPUT***********************
You will see when reading the code that I have provided two versions:
one dealing with a file (i.e. type
"position_iterator< file_iterator<char> >"), one dealing with a plain
character buffer (i.e. type "position_iterator<const char*>"). Both of
them exhibit the bug.
I also tried a variant (that can be activated by commenting out line #2)
that uses a "std::string" instead of a "std::vector<char>", and which does
not exhibit the problem. I have not looked in detail, but my guess is that
"std::string" range construction is implemented as a reservation followed
by a loop of "insert"/"push_back" calls, rather than a pre-allocation
followed by a loop of assignments and increments (as in "std::vector").
This approach (i.e. using a "std::string" rather than a "std::vector") is
not a practical workaround for my problem, since the offending code is
inside spirit itself (more precisely, at lines 246-248 of file
"spirit/tree/common.hpp" in 1.8.3), where variable "text" has type
"std::vector<char>":
********************************************
node_val_data(IteratorT const& _first, IteratorT const& _last)
: text(_first, _last), is_root_(false), parser_id_(), value_()
{}
********************************************
As I said in my original mail, a quick fix is to simply change the
iterator category of "position_iterator" to "forward_iterator_tag".
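The effect of that category change can be sketched on a toy iterator.
"ForwardCrLfIter" and "copy_range" are hypothetical names used only for
this illustration (this is not the Spirit source): the iterator still
swallows the LF after a CR, but because it only claims to be a forward
iterator, the standard library has to measure the range by walking it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for position_iterator<const char*>; NOT Boost code.
// Same CR+LF behaviour, but it only claims to be a forward iterator.
struct ForwardCrLfIter {
    using iterator_category = std::forward_iterator_tag;
    using value_type        = char;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const char*;
    using reference         = const char&;

    const char* cur = nullptr;
    const char* end = nullptr;

    ForwardCrLfIter() = default;
    ForwardCrLfIter(const char* c, const char* e) : cur(c), end(e) {}

    reference operator*() const { return *cur; }

    ForwardCrLfIter& operator++() {
        assert(cur != end);              // must not step past the end
        const char c = *cur++;
        if (c == '\r' && cur != end && *cur == '\n')
            ++cur;                       // swallow the LF of a CR+LF pair
        return *this;
    }
    ForwardCrLfIter operator++(int) {
        ForwardCrLfIter old = *this;
        ++*this;
        return old;
    }

    friend bool operator==(const ForwardCrLfIter& a, const ForwardCrLfIter& b) {
        return a.cur == b.cur;
    }
    friend bool operator!=(const ForwardCrLfIter& a, const ForwardCrLfIter& b) {
        return a.cur != b.cur;
    }
};

// Copies the range into a vector, as node_val_data does with "text".
inline std::vector<char> copy_range(const char* s, std::size_t n) {
    ForwardCrLfIter first(s, s + n), last(s + n, s + n);
    return std::vector<char>(first, last);
}
```

With the 5-byte buffer "AB\r\nW", copy_range yields a vector of exactly
four characters (A, B, CR, W) and no trailing uninitialized element. The
LF is still missing from the stream, which is a separate question.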
But I think a more fundamental question should also be considered: is it
acceptable that the stream of chars coming out of a
"position_iterator< file_iterator<char> >" may differ from the one coming
out of a "file_iterator<char>"?
I'm not sure of the answer...
In the above-mentioned mail, I suggested a correction to the "increment"
method (which needs an extra member variable "_crJustSeen") that would
not change the stream; this might serve as the basis for a new
implementation.
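The exact correction is in the mail, not in this ticket, so what follows
is only a rough sketch of the idea under my own assumptions. "LineTracker",
"step_over" and "final_line" are hypothetical names; only the flag is
modelled on "_crJustSeen". Every character is stepped over one at a time,
and the flag merely prevents the LF of a CR+LF pair from being counted as
a second newline. Only the CR, LF and CR+LF styles are handled here.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch; NOT the Spirit implementation.
// Tracks the line number without ever skipping a character.
struct LineTracker {
    int  line       = 1;      // 1-based current line
    bool crJustSeen = false;  // true if the previous character was a CR

    // Called once for every character the iterator steps over.
    void step_over(char c) {
        if (c == '\r') {
            ++line;                      // a CR always ends a line
            crJustSeen = true;
        } else {
            if (c == '\n' && !crJustSeen)
                ++line;                  // a lone LF ends a line
            crJustSeen = false;          // LF after CR was already counted
        }
    }
};

// Line number reached after stepping over the whole text.
inline int final_line(const std::string& text) {
    LineTracker t;
    for (char c : text) t.step_over(c);
    assert(t.line >= 1);                 // line numbers stay 1-based
    return t.line;
}
```

Since no character is ever skipped, each increment advances the
underlying iterator by exactly one position, so the character stream is
identical to that of the wrapped iterator.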
Regards.
--Serge Le Huitouze
--
Ticket URL: <http://svn.boost.org/trac/boost/ticket/1273>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.