Boost :


From: Vladimir Prus (ghost_at_[hidden])
Date: 2004-10-21 01:05:23

Next message: Vladimir Prus: "[boost] Re: Re: Re: Re: Any interest in adding unicode support to boost?"
Previous message: Cromwell Enage: "Re: [boost] [graph] adjacency_list::operator=()"
In reply to: Erik Wien: "[boost] Re: Any interest in adding unicode support to boost?"
Next in thread: Mathew Robertson: "Re: [boost] Re: Any interest in adding unicode support to boost?"
Reply: Mathew Robertson: "Re: [boost] Re: Any interest in adding unicode support to boost?"

Erik Wien wrote:

>> - Why would the user want to change the encoding? Especially between
>> UTF-16 and UTF-32?
>
> Well... Different people have different needs. If you are mostly using
> ASCII characters, and require small size, UTF-8 would fit your bill. If
> you need the best general performance on most operations, use UTF-16. If
> you need fast iteration over code points and size doesn't matter, use
> UTF-32.

OK, since everybody agreed characters outside 16 bits are very rare, UTF-32
seems to never be needed. As for UTF-8 vs. UTF-16: yes, the need for choice
seems present. However, a UTF-16 string class would be better than no string
class at all, and extra genericity will cost you development time.
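
To make the size trade-off concrete, here is a minimal sketch that adds up
the storage each encoding form needs for a given text. It is not part of any
proposed library: the names utf8_size, utf16_size, utf32_size, encoded_sizes
and measure are hypothetical, and it assumes a C++11 compiler for char32_t.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store one code point in UTF-8 (1 to 4 bytes).
std::size_t utf8_size(char32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

// UTF-16 uses one 16-bit unit inside the BMP, a surrogate pair outside it.
std::size_t utf16_size(char32_t cp) { return cp < 0x10000 ? 2 : 4; }

// UTF-32 always uses one 32-bit unit.
std::size_t utf32_size(char32_t) { return 4; }

// Hypothetical result type: total bytes per encoding form.
struct encoded_sizes {
    std::size_t utf8;
    std::size_t utf16;
    std::size_t utf32;
};

// Sum the per-code-point sizes over a whole text.
encoded_sizes measure(const std::u32string& text) {
    encoded_sizes total = {0, 0, 0};
    for (char32_t cp : text) {
        total.utf8 += utf8_size(cp);
        total.utf16 += utf16_size(cp);
        total.utf32 += utf32_size(cp);
    }
    return total;
}
```

For pure ASCII text UTF-8 takes half the space of UTF-16; for text made of
BMP characters at or above U+0800 (CJK, for example) UTF-16 is the smaller
one, at 2 bytes per character versus 3.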

>> - Why would the user want to specify encoding at compile time? Are there
>> performance benefits to that? Basically, if we agree that UTF-32 is not
>> needed, then UTF-16 is the only encoding which does not require complex
>> handling. Maybe, for other encodings using virtual functions in
>> character iterator is OK? And if iterators have "abstract characters" as
>> value_type, maybe the overhead of that is much larger than a virtual
>> function call, even for UTF-16.
>
> Though I haven't confirmed this by testing, I would assume templating the
> encoding and thus specifying it at compile time would result in better
> performance since you don't have the overhead of virtual function calls.
> (Polymorphism would probably be needed if templates were scrapped.)

It would. The question is by how much.

> Avoiding
> virtual calls also enables the compiler to optimize (inline) more
> thoroughly, something that is very beneficial in this case because of the
> number of different small, specialized functions that are needed in string
> manipulation.

This is a bit abstract. A virtual function is an inlining barrier, but it
would be placed only at character access. On both sides of the barrier, the
compiler can freely optimize everything.
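
A rough sketch of what I mean, under stated assumptions: the interface
code_point_source, the class utf16_source and both counting functions are
hypothetical names, not an existing Boost API, and char16_t/char32_t assume
C++11. The only virtual call is next(); the decoding inside it and the loop
around it remain ordinary code the compiler can optimize.

```cpp
#include <cstddef>
#include <string>

// Run-time polymorphic character access: one virtual call per code point.
struct code_point_source {
    virtual ~code_point_source() = default;
    virtual bool next(char32_t& out) = 0;  // returns false at end of input
};

class utf16_source final : public code_point_source {
public:
    explicit utf16_source(const std::u16string& s) : s_(s), pos_(0) {}

    bool next(char32_t& out) override {
        if (pos_ >= s_.size()) return false;
        char16_t hi = s_[pos_++];
        // Combine a high/low surrogate pair into a single code point.
        if (hi >= 0xD800 && hi <= 0xDBFF && pos_ < s_.size()) {
            char16_t lo = s_[pos_];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                ++pos_;
                out = 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                              + (char32_t(lo) - 0xDC00);
                return true;
            }
        }
        out = hi;  // BMP unit; an unpaired surrogate is passed through as-is
        return true;
    }

private:
    std::u16string s_;  // owns a copy so the source cannot dangle
    std::size_t pos_;
};

// Run-time variant: the algorithm sees only the abstract interface.
std::size_t count_code_points(code_point_source& src) {
    std::size_t n = 0;
    char32_t cp;
    while (src.next(cp)) ++n;
    return n;
}

// Compile-time variant: same loop, but the concrete type is known,
// so the compiler is free to inline next().
template <class Source>
std::size_t count_code_points_static(Source& src) {
    std::size_t n = 0;
    char32_t cp;
    while (src.next(cp)) ++n;
    return n;
}
```

Timing the two functions on the same input would be one way to put a number
on "by how much".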

>> - What if the user wants to specify encoding at run time? For example,
>> XML files specify encoding explicitly. I'd want to use ascii/UTF-8
>> encoding if XML document is 8-bit, and UTF-16 when it's Unicode.
>
> That is one problem with the templating of encoding. You would have to
> either template all file scanning functions in the XML parser on encoding
> as well, or you would need to do some run-time checks and use the correct
> template depending on the encoding used in the file. This is of course not
> ideal, but it only matters where the encoding is specified at run time.
> Which scenario is the most common is something that needs to be
> determined before a final design is decided on.
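
The "run-time check, then correct template" option could look roughly like
the sketch below. Everything in it is an assumption made for illustration:
the tags utf8_tag and utf16_tag, the stand-in scanner count_code_units, and
the dispatcher count_code_units_by_name are hypothetical, and a real parser
would do far more than count code units.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical encoding tags; each records the size of one code unit in bytes.
struct utf8_tag  { static constexpr std::size_t unit_size = 1; };
struct utf16_tag { static constexpr std::size_t unit_size = 2; };

// Stand-in for a scanning function templated on encoding:
// it only counts whole code units in a raw byte buffer.
template <class Encoding>
std::size_t count_code_units(const std::string& bytes) {
    return bytes.size() / Encoding::unit_size;
}

// One run-time check on the declared encoding name selects the instantiation.
std::size_t count_code_units_by_name(const std::string& name,
                                     const std::string& bytes) {
    if (name == "UTF-8") return count_code_units<utf8_tag>(bytes);
    if (name == "UTF-16") return count_code_units<utf16_tag>(bytes);
    throw std::invalid_argument("unsupported encoding: " + name);
}
```

The check happens once per document, so its cost is negligible; the price is
that every templated scanning function is instantiated once per supported
encoding.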

Another possibility is to decide dynamically whether UTF-8 or UTF-16 should
be used, just by counting the number of non-ASCII characters. That would
mean that only really advanced users need to make the decision themselves.
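
A minimal sketch of such a heuristic follows. The names storage_encoding,
count_non_ascii and choose_encoding are hypothetical, and the "more than
half" threshold is purely an assumption for illustration; the right cut-off
would have to be measured.

```cpp
#include <cstddef>
#include <string>

enum class storage_encoding { utf8, utf16 };

// Count code points that do not fit in 7-bit ASCII.
std::size_t count_non_ascii(const std::u32string& text) {
    std::size_t n = 0;
    for (char32_t cp : text) {
        if (cp >= 0x80) ++n;
    }
    return n;
}

// Assumed policy: use UTF-16 only when more than half of the
// characters are non-ASCII; otherwise stay with UTF-8.
storage_encoding choose_encoding(const std::u32string& text) {
    return count_non_ascii(text) * 2 > text.size()
               ? storage_encoding::utf16
               : storage_encoding::utf8;
}
```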

I think I'm starting to like Peter's idea that advanced users need
vector<char_xxx> together with a set of algorithms.

- Volodya

