
Boost Users :

Subject: Re: [Boost-users] [Spirit] Looking for a little Qi guidance for Unicode parsing
From: Travis Gockel (travis_at_[hidden])
Date: 2019-01-28 21:50:02


On Sun, Jan 27, 2019 at 11:05 Michael Powell via Boost-users <
boost-users_at_[hidden]> wrote:

> Hello,
>
> I am turning a corner in my JSON parser. I support ASCII through and
> through, but now I want to support Unicode, apparently UTF-8, which is
> part of the JSON standard. From what I can tell, this affects not the
> entire grammar, just strings.
>
> I am looking for a little guidance on how to approach the issue, the
> elements involved, etc. For example, are we talking about C++
> std::wstring? I have also seen std::u32string referenced in some
> forums.
>
> To begin with, and this is a somewhat naive impression: would the
> characters translate not to unsigned char or char, but rather to
> std::wstring::value_type or std::u32string::value_type? Things like
> that come to mind when approaching the issue.
>
> Additionally, how should I handle symbol tables such as the escape
> characters, e.g. from:
>
> struct escapes_t : qi::symbols<char, char> {
>     escapes_t() {
>         this->add("\\b", '\b')
>                  ("\\f", '\f')
>                  ("\\n", '\n')
>                  ("\\r", '\r')
>                  ("\\t", '\t')
>                  ("\\v", '\v')
>                  ("\\\\", '\\')
>                  ("\\/", '/')
>                  ("\\'", '\'')
>                  ("\\\"", '"')
>         ;
>     }
> } char_esc;
>
> And on from there.
>
> Thanks!
>
> Best regards,
>
> Michael W Powell
> _______________________________________________
> Boost-users mailing list
> Boost-users_at_[hidden]
> https://lists.boost.org/mailman/listinfo.cgi/boost-users

The answer to your question is a bit more complicated than you might expect.
In short, std::string is capable of representing Unicode text, because there
is a difference between binary representation (bits and bytes) and meaning
(codepoints). It would probably be illuminating for you to watch a talk
called “Unicode in C++” by James McNellis
(https://m.youtube.com/watch?v=tOHnXt3Ycfo).


-- 
Travis Göckel
+1.720.234.9330


Boost-users list run by williamkempf at hotmail.com, kalb at libertysoft.com, bjorn.karlsson at readsoft.com, gregod at cs.rpi.edu, wekempf at cox.net