Boost logo

Boost-Build :

Subject: Re: [Boost-build] [win32] Inefficiencies in builtin_check_if_file
From: François Jacques (francois.jacques_at_[hidden])
Date: 2012-04-25 07:39:36


I've been giving some thoughts to this. I think the FindFirstFileA/GetFileAttributes could be moved up the chain, within file_info function, so that path_as_key receives the expanded path. This way, the filesys calls would be done in file_info rather than done in the path caching.

Thoughts?

Le 2012-04-24 à 16:22, Steven Watanabe a écrit :

> AMDG
>
> On 04/24/2012 11:01 AM, François Jacques wrote:
>> Hello,
>>
>> The windows implementation of file_query requires excessive time, and
>> is mostly I/O bound. I've been able to reduce the amount of time by
>> replacing the stat call in filent.c::file_query by
>> GetFileAttributesExA. The later is significantly faster than stat
>> since it doesn't attempt to expand the path it is being given. On a
>> large project, this made a big difference.
>>
>> But that's not all, and the next issue is more interesting to fix.
>> Follow the white rabbit :
>>
>> First, stat/GetFileAttributesExA is called to update bjam's
>> file_info_t record that gets created whenever file_info is called with
>> a path that's not already in the filesys.c file_info_t hashtable :
>>
>> file_info_t * ff = file_info( filename );
>> if ( ! ff->time )
>> { /* call stat or GetFileAttriubutes */ }
>>
>> Then, in file_info function, bjam makes sure that the key is sane by
>> calling path_as_key :
>>
>> file_info_t * file_info( OBJECT * filename ) {
>> /* snip */
>> filename = path_as_key( filename );
>> /* snip */
>> }
>>
>> On Win32, path_as_key calls FindFirstFileA through ShortPathToLongPath
>> and its use is questionable.
>>
>> Again, but that's not the point here, GetFileAttributes could have
>> been used instead as is likely to give a slightly better performance.
>> Anyhow. What bugs me here is the WIN32_FIND_DATA structure in there
>> that will end up containing all the information to populate the
>> file_info structure... and is thrown away! Instead, the file_info gets
>> updated through stat and GetFileAttributes in upper levels. Why? Why
>> hitting the file system twice for each file?
>>
>> Taking a step back, is ShortPathToLongPath required at all? All other
>> platforms are not using that mechanism. Is it to ensure that short
>> paths ("progra~1") gets expanded to "Program Files" ? The name of the
>> function hints in that direction. I can see how it could be used to
>> avoid having two cache entries for a given path that's expressed with
>> short path notation to save memory, but the performance hit from I/O
>> is terrible on large projects.
>>
>> Perhaps path_as_key and ShortPathToLongPath should receive an output
>> file_info argument that would be populated with the information
>> retrieved by the FindFirstFile call? In such scenario, the stat call
>> in file_query would become redundant on Windows but since the
>> file_info structure is already populated, the stat call would not be
>> performed.
>>
>> Please have a look at the codebase and correct me if I'm wrong. The
>> profiler says I'm right though :)
>>
>
> This all makes perfect sense. Patches welcome,
> otherwise I'll implement it some time after the
> next release.
>
> In Christ,
> Steven Watanabe
> _______________________________________________
> Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost-build


Boost-Build list run by bdawes at acm.org, david.abrahams at rcn.com, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk