From: Richard Hodges (hodges.r_at_[hidden])
Date: 2020-09-06 07:55:18


Perhaps file an issue here?

https://github.com/boostorg/iostreams/issues

On Sat, 5 Sep 2020 at 20:58, Justin McManus via Boost <boost_at_[hidden]>
wrote:

> To follow up on my original post, I have two additional observations:
> 1.) I'm currently using boost version 1_65_1. In version 1_58_0, the code
> always read to the EOF without an issue, even with the default buffering.
> 2.) I tried making the buffer size arbitrarily large (1e8), but this had
> almost no effect at all on the behavior of the code. Since that buffer size
> is guaranteed to be large enough to hold any line in the input files I'm
> processing, it would seem that a limitation in the buffer size is not the
> underlying problem.
>
> On Sat, Sep 5, 2020 at 2:23 PM Justin McManus <justin_at_[hidden]> wrote:
>
> > I have some code that works as intended, but it requires setting a
> > buffer_size parameter to zero on a std::ifstream pushed onto a filtering
> > chain, and I'd like to understand why, to ensure I'm not introducing a bug
> > or a hack.
> >
> > I have essentially the following code:
> >
> > ----------------------------------------------------------------------
> > std::ifstream m_jf("json_filename", std::ios_base::in | std::ios_base::binary);
> > std::locale utf8_locale("en_US.UTF-8");
> > m_jf.imbue(utf8_locale);
> >
> > boost::iostreams::filtering_istream m_inbuf;
> > m_inbuf.push(boost::iostreams::bzip2_decompressor());
> > m_inbuf.push(m_jf);
> >
> > std::string m_line;
> > while (std::getline(m_inbuf, m_line)) {
> >     // Process the current line from the JSON file
> > }
> >
> > ----------------------------------------------------------------------
> >
> > What I find is that the std::getline call will fail before the code has
> > reached the EOF. It will always fail at the same line in a given JSON file,
> > but it will fail on different lines in different JSON files. It's perfectly
> > reproducible.
> >
> > However, if I change the two push calls to
> > m_inbuf.push(boost::iostreams::bzip2_decompressor(), 0);
> > m_inbuf.push(m_jf, 0);
> > then the problem goes away.
> >
> > My question is: why does setting the buffer_size parameter to zero solve
> > the issue? What does this do, exactly? I saw the suggestion to set the
> > buffer size this way in an old post from 2009, and it appears to work, but
> > I'd like a deeper understanding of what's happening under the hood. If the
> > buffer size is set to zero, what does the underlying implementation do, and
> > how might this influence whether std::getline fails before the EOF?
> >
> > Thanks very much,
> > Justin
> >
> > --
> > Justin McManus, Ph.D.
> > Principal Scientist
> > Lead Computational Biologist and Statistical Geneticist
> > Kallyope, Inc.
> > 430 East 29th Street, Suite 1050
> > New York, NY 10016
> >
>
>
> --
> Justin McManus, Ph.D.
> Principal Scientist
> Lead Computational Biologist and Statistical Geneticist
> Kallyope, Inc.
> 430 East 29th Street, Suite 1050
> New York, NY 10016
> (646) 596-3471
>

-- 
Richard Hodges
hodges.r_at_[hidden]
office: +442032898513
home: +376841522
mobile: +376380212
