Shared snapshots

Yesterday I bought an external hard-disk for backups. To make the backups I’ve rolled my own script (snapshot.sh) that uses rsync to make a snapshot of my home-partition and transfer it to the external hard-disk. The interesting part is, that the script takes advantage of the rsync option --link-dest to share a large part of the snapshots using hardlinks.

So after making three snapshots du -sh tells us that the snapshots takes up 25.3 Gb of disk space:

8.5G    /mnt/external/vips-backup/snapshot.0
8.4G    /mnt/external/vips-backup/snapshot.1
8.4G    /mnt/external/vips-backup/snapshot.2

However, df -h tells us that only 8.6 Gb of the external hard-disk is used:

/dev/sda1             187G  8.6G  179G   5% /mnt/external

Nifty, and with just 40 lines of bash scripting.

Now with Java

Following up on the benchmark from yesterday, I’ve produced a Java version of the queens benchmark (queens.java). Below is the updated graph. I have a suspision that the reason Mono and Java is performing so bad is not only due to exceptions, but rather it has something to do with garbage collection.

Graph showing running times for the different executables

I also tried to improve the C++ version based on some hints I got from Asger and Søren (colleagues at Laerdal Sophus). The C++ improvements worked really well when we tested them at work with Visual Studio 6.0. But the improvements had absolutely no effect when compiling with gcc 3.4.1. Either gcc is very smart or very stupid. Something suggest the later as a quick test reveals that gcc version 3.4.1 is producing worse code than version 3.3.4, which in turn is producing worse code than version 3.2.3 . I guess that I’ll have to install Windows and the free Microsoft compilers, so that I can compare them against gcc and Mono.

A hand full of Queens

Or: SML/Moscow ML is faster than C++/GCC

Inspired by a remark by Peter about the abysmal performance of exceptions on the .NET platform. I decided to check it for myself with a micro-benchmark. I also decided to check the performance of C++ exceptions, just for the fun of it.

The micro-benchmark is to find a solution to the N-queen problem using backtracking, with exceptions used as backtracking. I tested four different languages (SML, O’Caml, C#, and C++) and five different language implementations (MLton, Moscow ML, O’Caml, Mono, and GCC). The results are a bit surprising, and definately deserves further investigation. The graph below shows the running times for the five different executables:

Graph showing running times for the different executables

From the graph we see that O’Caml and MLton are way faster than the rest (and O’Caml has a small edge compared to MLton), Moscow ML is faster than Mono, and my C++ program is slow as a slug. I’ve tried various small tweaks to make the C++ version faster, but nothing really helps. For instance, at first we might notice that there are a tiny algorithmic difference between the C++ version and the other versions, namely the order in which a new position is compared with the already determined positions. But changing this order have no real effect (I’ve tried to change C++ program in various ways, and I tried to change the other programs). Still, the C++ performance is so bad that it looks suspicious. I’m open to suggestions for how to improve the C++ program.

Now I just have to test the SML version with the MLKit and SML/NJ, test the C# version with Microsoft’s runtime, test the C++ version with Visual C++ and Intel’s C++ compiler, and make a Java version of the benchmark and test it with … Any volunteers?

The programs used in the benchmark are here: queens.sml, queens.ml, queens.cs (ran mono with option --optimize=all) , and queens.cpp (compiled with option -O3).

mGTK progress

Tonight I decided to try and port an example from Mono: A Developer’s Notebook to the upcoming release of mGTK.

By joining forces with Henning we were able to compile the example for the following screenshot:

mGTK screenshot

The email entry-box is updated dynamically when you type something in the two other entry boxes. Mnemonics works, that is, if you press ALT-f the focus jumps to the entry-box for the first name and the text is selected. And all that in less than 100 lines of SML code.

Visitor pattern visited

Hmm, my first blog entry. To get started, I’ll just write about a book I’m reading these days.

I have a confession to make: I have been guilty of discarding some of the OOP patterns as “just papering over the weaknesses of OOP languages”. For example, the Visitor pattern has been discarded as “it is just fold”. However, lately I’ve been reading a few C++ books (because my new job involves a lot C++ coding). One of the books I’m reading (for the second time) is Andrei Alexandrescu’s book Modern C++ Design, which is about how to implement patterns using C++ templates. In the introduction of the chapter about the Visitor pattern Alexandrescu first write:

To do this [enhance the functionality of a given class hierarchy], you can either add new classes or add new virtual member functions. Adding new classes is easy … [snip] In contrast, adding new virtual functions is difficult. To be able to manipulate objects polymorphically (via pointers to the root class), you must add virtual member functions to the root class, and possibly to many other classes in the hierarchy. This is a major operation. … [snip] In a nutshell, from a dependency standpoint, new classes are easy to add, and new virtual member functions are difficult to add.

At this point I has nodding to myself and thinking something along the lines: “What a clear description of what is wrong with OO programming languages”. These problems with OO languages are of couse well known, it is, for example, similar to what Xavier Leroy said in his invited talk at ICFP’99. I was just using Alexandrescu to confirm my prejudices. But Alexandrescu wents on:

Visitor trades the advantage you don’t care about [to add new classes with ease] for an advantage you need [to add new virtual functions to a hierachy easily].

This is the important insight. And it an insight that I’ve completely overlooked until now. I think that this way of looking at programming patterns is useful: look for advantages you don’t need and see if you can trade them for advantages you need. I also think that this is useful viewpoint for type systems and programming languages in general. For example, in C/C++ you have the advantage that you have full control over memory management, and this advantage can in some situations be used for writing efficient programs. But sometimes you are willing to give up that advantage for the advantage of automatically handling of memory management (i.e., a garbage collector).

Back to Alexandrescu. His book also made me realise that the Visitor pattern probably corresponds more to a case-expression than to a fold-function, at least Alexandrescu doesn’t give a concrete example that handles recursion. Perhaps, I should have paid more attention when I read the GoF. All in all, I can recommend Alexandrescu’s book as a fine supplement to the GoF (if you are into C++ and don’t mind templates).