
Boost Users :

Subject: Re: [Boost-users] [filesystem] problem: is_regular_file and deduplified files (reparse+sparse)
From: Paul Harris (harris.pc_at_[hidden])
Date: 2015-07-24 04:04:08


Note: I've reposted this in a different form to the boost-dev list,
which I assume is the proper forum for the next step (fixing the bug).

On 23 July 2015 at 15:11, Paul Harris <harris.pc_at_[hidden]> wrote:

> FYI, I followed the blog article,
> then once the machine was "running" I clicked Connect at the bottom.
> That gave me an .rdp file which in theory I could use with rdesktop, but
> it uses a DNS name that was only just created, so that didn't work.
>
> When you click the name of the server in the list, it shows the public IP
> and the port on the right.
> Then you can do this:
> $ rdesktop that.ip.addr:port
>
> But only if you have the latest rdesktop AND you have set up Kerberos
> somehow.
>
> Instead I found a Windows computer and used Remote Desktop from there.
>
> ---
>
> Once inside, in the "Server Manager --> Dashboard" window, click "Add
> Roles", then click Next until you reach "Server Roles".
> Expand "File and Storage services" and "File and iSCSI", and tick "Data
> Deduplication".
> Then click Next through the rest of the wizard and Install.
> Wait a bit... and it's done.
>
> http://www.techrepublic.com/blog/data-center/configuring-windows-server-8-deduplication/
>
> ---
>
> Continuing on that webpage...
> Time to enable dedup. There is a temporary disk, D:, so let's enable it there.
>
> Method 1 (I did this, then went on to method 2 as well): start PowerShell
> and type: "Enable-DedupVolume D:"
>
> Method 2: in that same Dashboard, click the 4th button (File and Storage
> Services), then Volumes --> Disks.
> Click Volume 1 at the top, then right-click D: at the bottom -->
> Configure Dedup.
>
> To try and accelerate this puppy, I set the "age to dedup" to 0 days.
>
>
> http://www.techrepublic.com/blog/data-center/windows-server-2012-deduplication-how-and-where-to-tweak/
>
> ---
>
> Time to make something to dedup. We'll just duplicate the
> DATALOSS_WARNING_README.txt file that already exists on D:.
>
> In powershell:
> PS> D:
> PS> $file = Get-Content DATALOSS_WARNING_README.txt
>
> Then run these 2 commands repeatedly until "big.txt" reaches, say, 6 MB
> (each round roughly doubles its size):
> PS> Add-Content big.txt $file
> PS> $file = Get-Content big.txt
>
> Then use Windows Explorer (or similar) to make a dozen copies of big.txt.
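For anyone reproducing this without PowerShell, the same test data can be generated from C++17. This is a minimal sketch, assuming std::filesystem is available; grow_by_doubling, make_copies and the `_copyN` file naming are hypothetical, and the code only builds duplicate files, it does not trigger deduplication.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Copies `seed` to `target`, then appends target's current contents to itself
// until it is at least `min_bytes` long, like the Get-Content / Add-Content loop.
// Each round doubles the size. Returns the number of doubling rounds.
int grow_by_doubling(const fs::path& seed, const fs::path& target, std::uintmax_t min_bytes) {
    fs::copy_file(seed, target, fs::copy_options::overwrite_existing);
    if (fs::file_size(target) == 0) return 0;  // an empty seed would never grow
    int rounds = 0;
    while (fs::file_size(target) < min_bytes) {
        std::string contents;
        {
            std::ifstream in(target, std::ios::binary);
            contents.assign(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
        }
        std::ofstream out(target, std::ios::binary | std::ios::app);
        out << contents;  // closed (and flushed) when `out` leaves scope
        ++rounds;
    }
    return rounds;
}

// Makes `count` byte-identical copies of `file` in the same directory,
// named <stem>_copyN<ext> (a hypothetical naming scheme).
void make_copies(const fs::path& file, int count) {
    for (int i = 1; i <= count; ++i) {
        fs::path copy = file.parent_path() /
            (file.stem().string() + "_copy" + std::to_string(i) + file.extension().string());
        fs::copy_file(file, copy, fs::copy_options::overwrite_existing);
    }
}
```

For the steps above that would be something like `grow_by_doubling("D:/DATALOSS_WARNING_README.txt", "D:/big.txt", 6 * 1024 * 1024)` followed by `make_copies("D:/big.txt", 12)`.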
>
>
> Copy c:\windows\explorer.exe to D: to give it something to dedup,
> then go to D: and copy-paste explorer.exe a dozen times.
>
> In PowerShell, type:
> PS> Update-DedupStatus -Volume D:
> PS> Start-DedupJob -Type Optimization -Volume D:
>
> and then wait for it to finish.
> You can track its progress with:
> PS> Get-DedupJob
> PS> Get-DedupStatus -Volume D:
>
> ---
>
> So, once it's deduped, check it:
> PS> FSUTIL REPARSEPOINT QUERY big.txt
> You should see that it's a reparse point with the 0x80000013 tag.
>
> Copy-paste big.txt to big2.txt and check it with the query, and it should
> tell you big2 is NOT a reparse point.
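The same check can be made from code. This is a minimal sketch, assuming a Windows build with <windows.h>: it reads the reparse tag via FindFirstFileW, whose WIN32_FIND_DATAW::dwReserved0 field holds the tag when FILE_ATTRIBUTE_REPARSE_POINT is set. The function name reparse_tag is hypothetical; on non-Windows builds it simply reports no tag.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Returns the reparse tag of `path` (e.g. 0x80000013 for a deduplicated file),
// or std::nullopt if the file is not a reparse point or cannot be queried.
// On non-Windows builds there are no NTFS reparse points, so it returns nullopt.
std::optional<std::uint32_t> reparse_tag(const std::wstring& path) {
#ifdef _WIN32
    WIN32_FIND_DATAW data;
    HANDLE h = FindFirstFileW(path.c_str(), &data);
    if (h == INVALID_HANDLE_VALUE) return std::nullopt;
    FindClose(h);
    // dwReserved0 is only meaningful when the reparse-point attribute is set.
    if (!(data.dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)) return std::nullopt;
    return static_cast<std::uint32_t>(data.dwReserved0);
#else
    (void)path;
    return std::nullopt;
#endif
}
```

After the dedup job, `reparse_tag(L"D:\\big.txt")` should then hold 0x80000013, and `reparse_tag(L"D:\\big2.txt")` should be empty.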
>
>
> NOW TO TEST!
>
> ----
>
>
> On 23 July 2015 at 13:57, Paul Harris <harris.pc_at_[hidden]> wrote:
>
>> Hi Niall, you can use Azure to test this sort of thing... I think.
>> I'm trying it out now.
>>
>> http://blogs.technet.com/b/tommypatterson/p/azureservertrial.aspx
>>
>>
>> On 23 July 2015 at 09:54, Niall Douglas <s_sourceforge_at_[hidden]>
>> wrote:
>>
>>> On 23 Jul 2015 at 8:56, Paul Harris wrote:
>>>
>>> > With this server comes the new "dedup" feature, which can automatically
>>> > deduplicate files. This happens on a schedule, e.g. 2am Saturday. So
>>> > suddenly we are getting reports of software failures from all over the
>>> > place, due to fs::is_regular_file() returning false for these files.
>>> >
>>> > Deduped files have the REPARSE and SPARSE flags set.
>>> > On the command line, you can run
>>> > FSUTIL REPARSEPOINT QUERY
>>> >
>>> > and the "Reparse Tag Value" is 0x80000013
>>> >
>>> > This is a relatively new tag known as IO_REPARSE_TAG_DEDUP:
>>> > https://msdn.microsoft.com/en-us/library/windows/desktop/aa365740%28v=vs.85%29.aspx
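The tag value itself carries information. Below is a minimal sketch that decodes it using the constants and bit tests from the Windows SDK header winnt.h, re-declared here so it builds anywhere; TagInfo and describe_tag are hypothetical names. It shows that IO_REPARSE_TAG_DEDUP, unlike IO_REPARSE_TAG_SYMLINK and the junction tag IO_REPARSE_TAG_MOUNT_POINT, does not have the name-surrogate bit set, i.e. Windows does not describe it as standing in for another named file.

```cpp
#include <cstdint>
#include <string>

// Tag values and bit tests as defined in the Windows SDK header winnt.h.
constexpr std::uint32_t kMicrosoftBit     = 0x80000000u; // IsReparseTagMicrosoft()
constexpr std::uint32_t kNameSurrogateBit = 0x20000000u; // IsReparseTagNameSurrogate()
constexpr std::uint32_t kTagMountPoint    = 0xA0000003u; // IO_REPARSE_TAG_MOUNT_POINT (junction)
constexpr std::uint32_t kTagSymlink       = 0xA000000Cu; // IO_REPARSE_TAG_SYMLINK
constexpr std::uint32_t kTagDedup         = 0x80000013u; // IO_REPARSE_TAG_DEDUP

struct TagInfo {
    std::string name;     // symbolic name, or "unrecognised"
    bool microsoft;       // tag is owned by Microsoft
    bool name_surrogate;  // entry stands in for another named entity (link-like)
};

TagInfo describe_tag(std::uint32_t tag) {
    std::string name = "unrecognised";
    switch (tag) {
        case kTagMountPoint: name = "IO_REPARSE_TAG_MOUNT_POINT"; break;
        case kTagSymlink:    name = "IO_REPARSE_TAG_SYMLINK"; break;
        case kTagDedup:      name = "IO_REPARSE_TAG_DEDUP"; break;
    }
    return {name, (tag & kMicrosoftBit) != 0, (tag & kNameSurrogateBit) != 0};
}
```

A library could use the name-surrogate bit as one signal for "link-like" versus "data stored elsewhere"; which behaviour Boost.Filesystem should adopt is the open question in this thread.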
>>> >
>>> > These files act as normal files: you can fopen and fread them, so I
>>> > assume they should be treated almost like symlinks by Boost... perhaps
>>> > not quite a symlink, because I assume the "lstat" link properties are
>>> > identical to the file's stat properties.
>>> >
>>> >
>>> > Typically, I iterate over directories and only process files if
>>> > fs::is_regular_file(filename) is true.
>>> >
>>> > I wrote some code to check what the properties were on these files, and
>>> > it's not any of the possible enum values reported by
>>> > file_status::type().
>>> >
>>> > Any ideas?
>>>
>>> Proposed Boost.AFIO doesn't support IO_REPARSE_TAG_DEDUP because I
>>> have no access to any system on which to test it.
>>>
>>> However, if AFIO were to support IO_REPARSE_TAG_DEDUP, it would treat
>>> it identically to a symlink/junction point.
>>>
>>> I'd suggest Boost.Filesystem do the same, and treat pseudo-symlinks
>>> as symlinks. That probably means adding full symlink support for
>>> Filesystem on Windows. Here are some links to example implementation
>>> code:
>>>
>>> Reading a symlink target:
>>> https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L511
>>>
>>> Writing a symlink:
>>> https://github.com/BoostGSoC13/boost.afio/blob/master/include/boost/afio/v2/detail/impl/afio_iocp.ipp#L848
>>>
>>> Obviously it's best not to allow rewriting a pseudo-symlink like
>>> IO_REPARSE_TAG_DEDUP; make it read-only.
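Niall's suggested policy can be written down as a small decision function. This is a sketch under stated assumptions: ReparseKind, Classification and classify are hypothetical names, not Boost.Filesystem or AFIO API, and the tag constants are the Windows SDK values. Symlinks and junctions are reported as links that may be rewritten, IO_REPARSE_TAG_DEDUP as a read-only pseudo-symlink, and any other tag is left unclassified rather than guessed.

```cpp
#include <cstdint>

// Hypothetical classification following the "treat pseudo-symlinks as
// symlinks, but read-only" suggestion.
enum class ReparseKind { not_reparse, symlink, pseudo_symlink, other };

struct Classification {
    ReparseKind kind;
    bool writable_link;  // may the link itself be rewritten?
};

Classification classify(bool is_reparse_point, std::uint32_t tag) {
    constexpr std::uint32_t kTagMountPoint = 0xA0000003u;  // IO_REPARSE_TAG_MOUNT_POINT
    constexpr std::uint32_t kTagSymlink    = 0xA000000Cu;  // IO_REPARSE_TAG_SYMLINK
    constexpr std::uint32_t kTagDedup      = 0x80000013u;  // IO_REPARSE_TAG_DEDUP
    if (!is_reparse_point) return {ReparseKind::not_reparse, false};
    switch (tag) {
        case kTagSymlink:
        case kTagMountPoint: return {ReparseKind::symlink, true};
        case kTagDedup:      return {ReparseKind::pseudo_symlink, false};  // never rewrite
        default:             return {ReparseKind::other, false};
    }
}
```

Under this policy, status() would follow a pseudo-symlink through to the file's own data and could report it as a regular file, while any operation that rewrites links would refuse when writable_link is false.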
>>>
>>> Niall
>>>
>>> --
>>> ned Productions Limited Consulting
>>> http://www.nedproductions.biz/
>>> http://ie.linkedin.com/in/nialldouglas/
>>>
>>>
>>>
>>>
>>
>>
>


