Boost :


Subject: Re: [boost] How to find: best integer type for processor
From: Simonson, Lucanus J (lucanus.j.simonson_at_[hidden])
Date: 2008-10-30 14:56:27

Next message: Robert Ramey: "Re: [boost] [1.37.0][date-time] Breaking the log jam"
Previous message: Beman Dawes: "Re: [boost] [1.37.0][date-time] Breaking the log jam"
In reply to: Christian Henning: "Re: [boost] How to find: best integer type for processor"
Next in thread: Simonson, Lucanus J: "Re: [boost] How to find: best integer type for processor"

Christian Henning wrote:
>A quick question regarding 64-bit. Is the alignment here 8 bytes or
>still 4 bytes? I mean when reading from memory does the compiler read
>in 4 bytes or 8 byte chunks?

On a true 64-bit architecture the word size would be 8 bytes. Your current 64-bit processors are still 32-bit addressable and have 32-bit alignment.
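
Rather than reasoning from the architecture alone, you can ask the compiler what size and alignment it actually uses for a 64-bit integer on your target. The following is only a sketch: it assumes a C++11 compiler (for alignof and constexpr), and TypeLayout and layout_of are illustrative names, not part of Boost or the standard library. The alignment it reports depends on the compiler and ABI, so no particular value is promised here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: holds the size and alignment of a type, in bytes.
struct TypeLayout {
    std::size_t size;       // result of sizeof(T)
    std::size_t alignment;  // result of alignof(T)
};

// Reports what the compiler uses for T on the current target.
template <typename T>
constexpr TypeLayout layout_of() {
    return TypeLayout{sizeof(T), alignof(T)};
}

// Example query: layout_of<std::int64_t>().alignment may be 4 or 8,
// depending on the target ABI.
```

Printing layout_of<std::int64_t>().alignment in 32-bit and 64-bit builds of the same program shows directly what the compiler chose for each target.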

A cache line much larger than 8 bytes is read from physical memory into cache. When data is then fetched from the data cache into a register, whether it moves 4 or 8 bytes at a time depends on the width of the bus between the cache and the processor. On a 32-bit architecture it would read two 4-byte chunks into two 32-bit registers. On an EM64T architecture it would read one 8-byte chunk into a 64-bit extended register, unless the value were split between two cache lines in memory, in which case it takes two 32-bit chunks from the two different cache lines. It takes more than one clock to access even L1 cache, so whether the read happens in one step or two isn't really the performance-critical aspect of the problem anyway.
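
The split case can be checked with simple address arithmetic. This is a minimal sketch, assuming a 64-byte cache line (common, but not universal); kCacheLineSize and straddles_cache_line are illustrative names only.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes; check your processor's documentation.
constexpr std::size_t kCacheLineSize = 64;

// Returns true if an object of `size` bytes starting at `address`
// crosses a cache line boundary. Requires size > 0.
constexpr bool straddles_cache_line(std::uintptr_t address, std::size_t size) {
    // Compare the line index of the first byte with that of the last byte.
    return (address / kCacheLineSize) != ((address + size - 1) / kCacheLineSize);
}
```

For example, an 8-byte value at offset 60 occupies bytes 60 through 67 and so spans two lines, while the same value at offset 56 fits entirely in one.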

In general, when programming in C++ you should not need to consider whether the processor reads a 64-bit value from data cache into a register in one step or two. The value is read as one chunk from memory into the cache, even on a 32-bit architecture, which is good enough, and your latency from memory to cache is usually a bigger deal than your latency from cache to register. In other words, the typical workload loses more performance to cache misses than to cache access latency.

The compiler would be hard pressed to align 64-bit values to addresses that are multiples of eight bytes in memory in a 32-bit addressable system, and as you can see, it is more profitable to align things to cache lines, which can be done at the application level with special allocators.
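
As one possible shape for such an allocator, here is a minimal sketch. It assumes a C++17 compiler (for the aligned forms of operator new and operator delete) and a 64-byte cache line; CacheAlignedAllocator and kAssumedCacheLine are illustrative names, not an existing Boost or standard component.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Assumed cache line size in bytes; not guaranteed for every processor.
constexpr std::size_t kAssumedCacheLine = 64;

// Minimal allocator whose blocks start on a cache line boundary.
template <typename T>
struct CacheAlignedAllocator {
    using value_type = T;

    static_assert(alignof(T) <= kAssumedCacheLine,
                  "T requires stricter alignment than the assumed cache line");

    CacheAlignedAllocator() = default;

    // Converting constructor so containers can rebind to other element types.
    template <typename U>
    CacheAlignedAllocator(const CacheAlignedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        // Debug-only guard against overflow in n * sizeof(T).
        assert(n <= std::numeric_limits<std::size_t>::max() / sizeof(T));
        void* p = ::operator new(n * sizeof(T),
                                 std::align_val_t(kAssumedCacheLine));
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Must pair with the aligned form of operator new used above.
        ::operator delete(p, std::align_val_t(kAssumedCacheLine));
    }
};

// The allocator is stateless, so all instances compare equal.
template <typename T, typename U>
bool operator==(const CacheAlignedAllocator<T>&,
                const CacheAlignedAllocator<U>&) noexcept { return true; }

template <typename T, typename U>
bool operator!=(const CacheAlignedAllocator<T>&,
                const CacheAlignedAllocator<U>&) noexcept { return false; }
```

It can be dropped into a standard container, for example std::vector<std::int64_t, CacheAlignedAllocator<std::int64_t>>, so that the first element of the buffer starts on a cache line boundary. Only the start of the block is aligned; elements inside it are laid out as usual.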

Hope that helps,
Luke


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk