Boost logo

Boost :

Subject: Re: [boost] [algorithm] Review Request: edit_distance
From: Erik Erlandson (eje_at_[hidden])
Date: 2014-01-31 19:53:59


----- Original Message -----

>
> - It would be great to see a benchmark against GNU diff!

So I cooked up an implementation of unix diff:
https://github.com/erikerlandson/algorithm/blob/edit_distance/sequence/example/edit_distance_diff_example.cpp

It produces standard default unix diff output:
http://en.wikipedia.org/wiki/Diff#Usage

That is to say, edit_distance_diff_example will reproduce the output of diff, at least if you just give both programs two files for arguments:

$ edit_distance_diff_example foo.txt bar.txt > ed.out
$ diff foo.txt bar.txt > diff.out
$ diff ed.out diff.out
$

I generated a few benchmarking data-sets, by editing /usr/share/dict/words. On my machine (F18), the 'words' file has 479,828 lines (one word per line) and is 4,953,680 bytes. So, a decent size of file in both line length and bytes. I did a couple different experiments, where I compared against a variation that differed by around 150 lines, and then another variation that differed by around 1200 lines. I also varied input size, by just duplicating the contents, so the result was twice the length and size.

Across these variations of input size and file difference that I tested, edit_distance_diff_example ran consistently 40% to 60% faster than unix diff. So, roughly speaking it's performing about twice as fast.

And less than 150 lines of code!


Boost list run by bdawes at acm.org, gregod at cs.rpi.edu, cpdaniel at pacbell.net, john at johnmaddock.co.uk