
From: Erik Wien (wien_at_[hidden])
Date: 2004-10-20 13:36:46


"Vladimir Prus" <ghost_at_[hidden]> wrote in message
news:cl55cd$9ei$1_at_sea.gmane.org...
> Why do you need the traits, at compile-time?

Perhaps I didn't state this clearly enough. The traits class is one of the
template parameters of the encoded_string class (it defaults to
encoding_traits<encoding>). The traits class contains all the information
about the specified encoding, such as the code unit size and the functions
for iterating through a code unit sequence. All encoding-specific
implementation is done in the traits class.
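A minimal sketch of that arrangement follows. The tag types `utf8` and `utf16`, the member `sequence_length`, and the use of `std::vector` for storage are hypothetical choices for illustration, not the actual encoded_string interface; malformed input is not validated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical encoding tags (illustrative names only).
struct utf8 {};
struct utf16 {};

// Primary template: one specialization per supported encoding.
template <class Encoding>
struct encoding_traits;

template <>
struct encoding_traits<utf8> {
    using code_unit = std::uint8_t;
    // Code units in the sequence that starts with `lead`
    // (assumes well-formed input).
    static std::size_t sequence_length(code_unit lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }
};

template <>
struct encoding_traits<utf16> {
    using code_unit = char16_t;
    // A high surrogate starts a two-unit sequence.
    static std::size_t sequence_length(code_unit lead) {
        return (lead >= 0xD800 && lead <= 0xDBFF) ? 2 : 1;
    }
};

// The string is templated on the encoding; Traits defaults to
// encoding_traits<Encoding> but a user could supply another class.
template <class Encoding, class Traits = encoding_traits<Encoding>>
class encoded_string {
public:
    using code_unit = typename Traits::code_unit;

    explicit encoded_string(std::vector<code_unit> units)
        : units_(std::move(units)) {}

    // Steps over whole code unit sequences; all encoding-specific
    // knowledge comes from Traits.
    std::size_t code_point_count() const {
        std::size_t count = 0;
        for (std::size_t i = 0; i < units_.size();
             i += Traits::sequence_length(units_[i]))
            ++count;
        return count;
    }

    std::size_t code_unit_count() const { return units_.size(); }

private:
    std::vector<code_unit> units_;
};
```

For example, the UTF-8 bytes for "Aé€" hold six code units but three code points, while a UTF-16 string holding "A" followed by a surrogate pair holds three code units and two code points.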

> - Why would the user want to change the encoding? Especially between
> UTF-16 and UTF-32?

Well... different people have different needs. If you mostly use ASCII
characters and need a small footprint, UTF-8 fits the bill, since each ASCII
character takes a single byte. If you need the best general performance on
most operations, use UTF-16. If you need fast iteration over code points and
size doesn't matter, use UTF-32.
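To make the size side of that trade-off concrete, here is a small sketch that counts the bytes a sequence of code points needs in each encoding form. The function and struct names are hypothetical; the per-code-point widths follow the standard UTF-8, UTF-16 and UTF-32 rules. It does not measure the speed claims above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Code units needed to store one code point in each encoding form.
std::size_t utf8_units(char32_t cp) {
    if (cp < 0x80) return 1;     // ASCII
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;  // rest of the Basic Multilingual Plane
    return 4;                    // supplementary planes
}
std::size_t utf16_units(char32_t cp) { return cp < 0x10000 ? 1 : 2; }
std::size_t utf32_units(char32_t) { return 1; }  // fixed width

struct storage_bytes { std::size_t utf8, utf16, utf32; };

// Total bytes = code units * code unit size (1, 2 or 4 bytes).
storage_bytes bytes_needed(const std::vector<char32_t>& text) {
    storage_bytes b{0, 0, 0};
    for (char32_t cp : text) {
        b.utf8 += utf8_units(cp) * 1;
        b.utf16 += utf16_units(cp) * 2;
        b.utf32 += utf32_units(cp) * 4;
    }
    return b;
}
```

Pure ASCII text such as "Hello" needs 5, 10 and 20 bytes respectively, so UTF-8 is clearly smallest there; UTF-32's fixed width is what makes the i-th code unit the i-th code point.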

> - Why would the user want to specify encoding at compile time? Are there
> performance benefits to that? Basically, if we agree that UTF-32 is not
> needed, then UTF-16 is the only encoding which does not require complex
> handling. Maybe, for other encodings using virtual functions in character
> iterator is OK? And if iterators have abstract characters" as value_type,
> maybe the overhead if that is much large that virtual function call even
> for UTF-16.

Though I haven't confirmed this by testing, I would assume that templating on
the encoding, and thus specifying it at compile time, gives better
performance, since you avoid the overhead of virtual function calls.
(Polymorphism would probably be needed if templates were scrapped.) Avoiding
virtual calls also lets the compiler optimize (inline) more thoroughly, which
is very beneficial here because string manipulation needs a large number of
small, specialized functions.
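The two alternatives can be sketched side by side. This is an illustration of the design difference, not a benchmark; `utf8_traits`, `encoding_base` and `count_code_points` are hypothetical names. In the template version the call to `Traits::sequence_length` is resolved at compile time and is a natural inlining candidate; in the virtual version each step goes through a virtual call that the compiler can inline only if it manages to devirtualize it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compile-time dispatch: the encoding is a template parameter.
struct utf8_traits {
    static std::size_t sequence_length(std::uint8_t lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }
};

template <class Traits>
std::size_t count_code_points(const std::vector<std::uint8_t>& units) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < units.size(); i += Traits::sequence_length(units[i]))
        ++n;
    return n;
}

// Run-time dispatch: the encoding is chosen through a virtual interface.
struct encoding_base {
    virtual ~encoding_base() = default;
    virtual std::size_t sequence_length(std::uint8_t lead) const = 0;
};

struct utf8_encoding : encoding_base {
    std::size_t sequence_length(std::uint8_t lead) const override {
        return utf8_traits::sequence_length(lead);
    }
};

// Same loop, but one virtual call per code point.
std::size_t count_code_points(const encoding_base& enc,
                              const std::vector<std::uint8_t>& units) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < units.size(); i += enc.sequence_length(units[i]))
        ++n;
    return n;
}
```

Both versions return the same result; whether the static one is measurably faster would still have to be confirmed by testing.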

> (As a side note, discussion about templated vs. non-templated interface
> seems a reasonable addition to a thethis. It's sure thing that if anybody
> wrote such a thethis in our lab, he would be asked to justify such a
> global decisions).

Thanks for the tip! If templates end up in a final implementation, I will
probably include a discussion of why they are used.

> - What if the user wants to specify encoding at run time? For example, XML
> files specify encoding explicitly. I'd want to use ascii/UTF-8 encoding if
> XML document is 8-bit, and UTF-16 when it's Unicode.

That is one problem with templating on the encoding. You would either have to
template all the file-scanning functions in the XML parser on the encoding as
well, or do a run-time check and use the correct template instantiation for
the encoding used in the file. This is of course not ideal, but it only
matters where the encoding is specified at run time. Which scenario is most
common needs to be determined before a final design is decided on.
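The second option, a run-time check that selects a compile-time instantiation, can be sketched like this. `scan_document` and `parse_xml` are hypothetical stand-ins for the parser's scanning functions, and the detection is deliberately simplified to a UTF-16 byte order mark; real XML encoding detection also inspects the encoding declaration.

```cpp
#include <cassert>
#include <string>

struct utf8 {};
struct utf16 {};

// Hypothetical scanner, instantiated once per encoding.
template <class Encoding>
std::string scan_document(const std::string& raw);

template <>
std::string scan_document<utf8>(const std::string&) { return "parsed as UTF-8"; }

template <>
std::string scan_document<utf16>(const std::string&) { return "parsed as UTF-16"; }

// The single run-time branch: inspect the input once, then call the
// matching instantiation. Everything below this point is compile-time.
std::string parse_xml(const std::string& raw) {
    if (raw.size() >= 2) {
        unsigned char b0 = raw[0], b1 = raw[1];
        // FE FF (big-endian) or FF FE (little-endian) byte order mark.
        if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE))
            return scan_document<utf16>(raw);
    }
    // Otherwise treat the input as UTF-8/ASCII in this sketch.
    return scan_document<utf8>(raw);
}
```

The cost of this approach is that every encoding the parser supports must be instantiated up front, which grows code size with the number of encodings.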


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk