[Boost-bugs] [Boost C++ Libraries] #1273: CR+LF newlines in position_iterator

Subject: [Boost-bugs] [Boost C++ Libraries] #1273: CR+LF newlines in position_iterator
From: Boost C++ Libraries (noreply_at_[hidden])
Date: 2007-09-18 09:00:32


#1273: CR+LF newlines in position_iterator
-------------------------------------+--------------------------------------
 Reporter: slehuitouze_at_[hidden] | Owner: djowel
     Type: Bugs | Status: new
Milestone: To Be Determined | Component: spirit
  Version: | Severity: Problem
 Keywords: |
-------------------------------------+--------------------------------------
 On September 13th, I sent a mail to the "spirit-general" mailing list,
 entitled "Various newline styles and position_iterator", describing a bug
 I ran into while using position_iterator.
 I don't think it is useful to repeat everything here, so I'll go straight
 to the conclusion: "position_iterator< file_iterator<char> >" has iterator
 category "random_access_iterator_tag", whereas direct pointer arithmetic
 is not possible on it (because it eats the LF when it encounters a CR+LF
 newline).
 As a consequence, one may end up with an uninitialized character when one
 tries to copy a range delimited by two position_iterators into a
 "std::vector<char>".
 This is demonstrated by the attached C++ source code, whose (part of the)
 output on my machine is as follows:
 **************BEGINNING OF OUTPUT***********************
 We have read following characters in a 'vector<char>' container from a
 file:
 #0: 65 (A)
 #1: 66 (B)
 #2: 13 (\r)
 #3: 87 (W)
 #4: 205 (unexpected character)
 **************END OF OUTPUT***********************
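
 The mechanism behind this output can be sketched without Spirit at all. The
 following is only an illustration under stated assumptions:
 "LfEatingCursor" and "copy_preallocated" are hypothetical names, not Spirit
 code, and the '?' filler stands in for the uninitialized byte (205 above).
 The cursor swallows the LF of a CR+LF pair on increment, while its distance
 is still computed by pointer subtraction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the behaviour described in this ticket:
// incrementing past a CR also swallows an immediately following LF.
struct LfEatingCursor {
    const char* p;
    const char* end;

    char operator*() const { return *p; }

    LfEatingCursor& operator++() {
        const char c = *p;
        ++p;
        if (c == '\r' && p != end && *p == '\n')
            ++p;  // one increment moves over two characters
        return *this;
    }

    bool operator!=(const LfEatingCursor& o) const { return p != o.p; }

    // What a random-access category promises: distance by subtraction.
    std::ptrdiff_t operator-(const LfEatingCursor& o) const { return p - o.p; }
};

// Mimics "pre-allocate (last - first) elements, then assign and increment".
std::vector<char> copy_preallocated(LfEatingCursor first, LfEatingCursor last,
                                    char filler) {
    std::vector<char> out(static_cast<std::size_t>(last - first), filler);
    std::size_t i = 0;
    for (; first != last; ++first)
        out[i++] = *first;  // runs fewer times than out.size() on CR+LF input
    return out;
}
```

 For the five bytes "AB\r\nW", the subtraction yields 5 but the loop only
 runs 4 times, so the last slot keeps the filler value.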

 As you will see in the code, I have provided two versions: one dealing
 with a file (i.e. type "position_iterator< file_iterator<char> >"), one
 dealing with a plain character buffer (i.e. type
 "position_iterator<const char*>"). Both of them exhibit the bug.
 I also tried a variant (that can be activated by commenting out line #2)
 that uses a "std::string" instead of a "std::vector<char>", and which does
 not exhibit the problem. I have not looked in detail, but this is probably
 because the "std::string" copy is implemented as a pre-reservation
 followed by a loop of "insert" or "push_back", rather than a
 pre-allocation followed by a loop of assignments and increments (as in
 "std::vector").
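
 To make the guess above concrete, here is a minimal sketch of the
 "reserve, then append" strategy. It is an assumption for illustration only:
 "LfEatingCursor" and "copy_by_appending" are hypothetical names, and I have
 not checked that any actual "std::string" implementation works this way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical cursor that swallows the LF of a CR+LF pair on increment.
struct LfEatingCursor {
    const char* p;
    const char* end;

    char operator*() const { return *p; }

    LfEatingCursor& operator++() {
        const char c = *p;
        ++p;
        if (c == '\r' && p != end && *p == '\n')
            ++p;
        return *this;
    }

    bool operator!=(const LfEatingCursor& o) const { return p != o.p; }
    std::ptrdiff_t operator-(const LfEatingCursor& o) const { return p - o.p; }
};

// Reserve capacity from the (too large) distance, then append one by one.
std::string copy_by_appending(LfEatingCursor first, LfEatingCursor last) {
    std::string out;
    out.reserve(static_cast<std::size_t>(last - first));  // size stays 0
    for (; first != last; ++first)
        out.push_back(*first);  // size grows only with characters really read
    return out;
}
```

 With this strategy the overestimated distance only affects capacity, so for
 "AB\r\nW" the result holds exactly the 4 characters that were read and no
 stray element.
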
 This approach (i.e. using a "std::string" rather than a "std::vector") is
 not a practical workaround for my problem, since the problem is inside
 spirit itself (more precisely, at lines 246-248 in 1.8.3 file
 "spirit/tree/common.hpp"), where variable "text" has type
 "std::vector<char>":
 ********************************************
     node_val_data(IteratorT const& _first, IteratorT const& _last)
         : text(_first, _last), is_root_(false), parser_id_(), value_()
         {}
 ********************************************


 As I said in my original mail, a quick solution is to simply change the
 iterator category of "position_iterator" to "forward_iterator_tag".
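
 Why the category change helps can be sketched with a toy iterator. This is
 an assumption-laden illustration: "LfSkippingForwardIterator" and
 "copy_range" are hypothetical names, not Spirit's position_iterator. For a
 forward iterator, "std::distance" has to count increments, so the size the
 container computes matches the number of characters actually produced.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical toy iterator: skips the LF of a CR+LF pair, but only
// advertises forward_iterator_tag.
class LfSkippingForwardIterator {
public:
    typedef std::forward_iterator_tag iterator_category;
    typedef char value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const char* pointer;
    typedef const char& reference;

    LfSkippingForwardIterator() : p_(nullptr), end_(nullptr) {}
    LfSkippingForwardIterator(const char* p, const char* end)
        : p_(p), end_(end) {}

    reference operator*() const { return *p_; }
    pointer operator->() const { return p_; }

    LfSkippingForwardIterator& operator++() {
        const char c = *p_;
        ++p_;
        if (c == '\r' && p_ != end_ && *p_ == '\n')
            ++p_;  // swallow the LF
        return *this;
    }
    LfSkippingForwardIterator operator++(int) {
        LfSkippingForwardIterator old(*this);
        ++*this;
        return old;
    }

    friend bool operator==(const LfSkippingForwardIterator& a,
                           const LfSkippingForwardIterator& b) {
        return a.p_ == b.p_;
    }
    friend bool operator!=(const LfSkippingForwardIterator& a,
                           const LfSkippingForwardIterator& b) {
        return !(a == b);
    }

private:
    const char* p_;
    const char* end_;
};

// Range-constructs a vector; the size comes from counting increments.
std::vector<char> copy_range(const char* begin, const char* end) {
    LfSkippingForwardIterator first(begin, end), last(end, end);
    return std::vector<char>(first, last);
}
```

 For "AB\r\nW" this gives a vector of 4 elements ('A', 'B', '\r', 'W'), with
 no trailing uninitialized character. The LF is still missing from the
 stream, which is the separate question raised below.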

 But I think a more fundamental question should also be considered: is it
 normal that the stream of chars coming out of a "position_iterator<
 file_iterator<char> >" may differ from the one coming out of a
 "file_iterator<char>"?
 I'm not sure of the answer...
 In the above-mentioned mail, I suggested a correction for method
 "increment" (which needs an extra member variable "_crJustSeen") that would
 not change the stream. This might be the basis for a new implementation
 that you could write.
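
 The idea of a "CR just seen" flag can be sketched as follows. This is not
 the code from my mail, only a minimal illustration under my own
 assumptions: "LineTracker", "advance" and "scan" are hypothetical names.
 Every character is stepped over individually (so the stream is unchanged),
 and the flag prevents a CR+LF pair from being counted as two newlines.

```cpp
#include <cassert>
#include <string>

// Hypothetical line/column tracker; it never skips a character.
struct LineTracker {
    int line = 1;
    int column = 1;
    bool cr_just_seen = false;

    // Called once for each character the iterator steps over.
    void advance(char c) {
        if (c == '\n') {
            if (!cr_just_seen) {  // lone LF: a newline on its own
                ++line;
                column = 1;
            }
            // LF right after CR: the newline was already counted at the CR.
            cr_just_seen = false;
        } else if (c == '\r') {
            ++line;
            column = 1;
            cr_just_seen = true;
        } else {
            ++column;
            cr_just_seen = false;
        }
    }
};

// Feeds every character of the text to the tracker, one step per character.
LineTracker scan(const std::string& text) {
    LineTracker t;
    for (char c : text)
        t.advance(c);
    return t;
}
```

 With this scheme "AB\r\nW" ends on line 2, column 2 after exactly five
 steps, and mixed input such as "a\rb\nc\r\nd" ends on line 4.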

 Regards.

 --Serge Le Huitouze

--
Ticket URL: <http://svn.boost.org/trac/boost/ticket/1273>
Boost C++ Libraries <http://www.boost.org/>
Boost provides free peer-reviewed portable C++ source libraries.

