SML

You are currently browsing the archive for the SML category.

In 1967, during excavation for the construction of a new shopping center in Monroeville, Pennsylvania, workers uncovered a vault containing a cache of ancient scrolls. Most were severely damaged, but those that could be recovered confirmed the existence of a secret society long suspected to have been active in the region around the year 200 BC.

Based on a translation of these documents, we now know that the society, the Cult of the Bound Variable, was devoted to the careful study of computation, over two millennia before the invention of the digital computer.

Like last year the prospects for my participation in the ICFP Contest was not looking good. None of my team mates from last year’s team seemed to be able to participate, and neither did I myself. The weekend of the contest was packed with family business. And on top of that, when the weekend arrived I was sick Friday night and Saturday.

However, Sunday evening I had some free time and I decided that I would take a crack a the contest just to see what it was about. Judging from the discussion mailing-list it quite fun and interesting. The first phase of the contest task was to implement a 14-instruction virtual machine called UM and when that was running you should use it for running the provided codex for the operating system UMIX.

So I registered my team KFL and started to implement my UM in SML. The first thing I did was to implement an instruction decoder that could translate a 32-bit word into an SML datatype. Then I wrote a function that read in a file of 32-bit words encoded in big-endian as four 8-bit words each. And then maped my decode function over the Vector of words. For this task the SML Basis Library really shined:

fun readFile filename =
    let val dev = BinIO.openIn filename
        val all = BinIO.inputAll dev before BinIO.closeIn dev
        val words = Vector.tabulate(Word8Vector.length all div 4,
                                    fn i => Word32.fromLarge(PackWord32Big.subVec(all,i)))
    in  Vector.map decode words
    end

Time spend: 1 hour.

Unfortunately, this did not work. My decoding function failed after 1675 instructions or so, complaining about illegal instructions. And indeed the 32-bit word it complained about did not seem to encode a legal instruction. I tried to reimplement the conversion from 8-bit words to 32-bit words, in case PackWord32Big worked different than I thought. But I still got the same error. Thus, I gave up and went to bed.

Time spend: 2 hours.

Monday morning I had to see to some other things first, but then I had some time to spend on the contest. Even after I had slept on the problem I still couldn’t figure out what was wrong. So I asked my colleague Arne if he had 10 minutes to help me debug my program. I explained him the problem, show him my code (actually my debug output, and then we looked at the codex in a hex-editor. He confirmed that from my explanation, my program appeared to be working correctly, and it looked as if there was an illegal instruction in the codex, if all instructions really was encoded a single 32-word. Hence, one or more of my assumptions had to wrong (is was easy to rule out that the codex was wrong, because more than a hundred teams were able to run the codex). Then it occurred to me, the codex was not required to only contain valid instructions, maybe the code would jump over damaged parts of the codex and part of the contest would be to repair the codex. Thus, I changed my code to only decode instructions on demand, and kept the whole program as an array of 32-bit words. Lo and behold the machine was able to start running the codex! However it failed in the self-check the codex performed. After some debugging I found one place where as I used the name of an register (registers in the UM is named by integers) as a value rather than using the value contained in the register. And now my UM was able to run the codex and the SANDmark (a debug and benchmark suite provided by the contest managers).

Time spend: 2 hours.

My first version ran the SANDmark in a bit more than 18 minutes (14 min user and 4 min sys) , 768 seconds user time according to MLton’s profiler. Which was not to bad but I’d seen on the discussion list, that other participants had UMs that ran the SANDmark in a couple of minutes. Thus, I decided to profile my UM to see where the time was spend. To my surprise the top function in the profile was my decode function, a function that took a 32-bit word and translates it to an SML datatype. Here are the first few lines of decode together with the helper function standardRegs that fetches out the register names:

fun standardRegs w =
    let open Word32
        val A = (w << 0w23) >> 0w29
        val B = (w << 0w26) >> 0w29
        val C = andb(w, 0w7)
    in (toInt A, toInt B, toInt C)
    end

fun decode w =
    let open Word32
        val opr = w >> 0w28
    in case opr of
           0w0  => CMove(standardRegs w)
         | 0w1  => ARead(standardRegs w)
         | 0w2  => AWrite(standardRegs w)
         ...

And the top of my interpreter loop looked like this:

      while true do
             case spin() of
                 CMove(A,B,C) => if $C = 0w0 then ()
                                 else A < - $B
               | ARead(A,B,C) => A < - $$B sub (W32.toInt($C))
               | AWrite(A,B,C) => Array.update($$A, W32.toInt($B), $C)
               ...

Where spin is the function that reads the current word at the program counter, updates the program counter, decodes the word, and return the instruction. But how could 19% of the time be spend in the decode. I moved the call to decode from spin to my interpreter loop to aid the MLton optimisers:

      while true do
             case decode(spin()) of

This made the SANDmark 5 minutes faster wall clock time, that is 13 minutes. Or in MLton profiler time 529 seconds. 30% improvement just for moving a function around. Not bad.

Time spend: 30 minutes.

After this optimisation my UM was fast enough that I thought I’d try to solve some of the puzzles. So I logged into the UMIX OS using the guest account and started to poke around and collect points. The first real puzzle was to fix a password cracker written in a weird Basic dialect that used roman numerals instead of decimal notation for integer literals (including for the line numbers).

Time spend: 1½ hour. Collected points 230.

Then I had to go home, and while I cooked dinner (I was baking pita bread, and while the dough was rising I had time to hack) I was able to write an improved password cracker—in this weired roman numerals Basic: hack2.bas. This gained me an other 100 points, just before the contest ended (the contest ended at 18:00 in CEST)

Time spend 45 min. Collected points in total 330.

All in all not bad to make 330 points after spending only seven hours and 45 minutes of rather fragmented time.

After dinner I was able to gain an other 35 points by writing a list reversal program in a graphical 2D language: rev.2d. It took half an hour or so.

The setup for the contest was absolutely amazing and most entertaining. My account of it here does not do it justice. An incredible ammount of work must have gone into the preparation of it. I’m looking forward for the final debriefing from the Contest Organizers.

Yesterday, I tried for fun to optimise my UM program a bit more. Programs running on the UM are able to allocate and free arrays. In my original implementation I used a ref to a functional Red-Black tree to keep track of the mapping from UM-pointers to arrays. I know, not the best choice of data structure, but I was just trying to get a “got enough” UM up and running. From the profile it was obvious that lots of time and memory was spend on keeping the Red-Black trees balanced. Thus, I replaced this code with an array, and a free-list for reusing UM-pointers (32-bit words). Thus, my code for managing the “heap” when from 12 lines of code (not counting the code in the Red-Black tree library) to 28 lines of code. This small changed made the SANDmark run in 4 minutes wall clock(175 seconds of MLton profiler time) an improvement of almost 67%. Looking at the profile, I could see that decode was again on top of the list (using 42% of the time). Thus, I decided to inline decode and deforest the instruction datatype by hand. This made my code 68 lines smaller, and the SANDmark ran in 2.50 minutes (134 seconds of MLton profiler time), 23% improvement. Almost four times faster than the UM I participated in the contest with. Time spend 1½ hour.

  • Code for the UM I participated with: um2.sml
  • Code with improve heap handling: um3.sml
  • Code with inlined decode function: um4.sml

The answer to yesterdays quiz is: Yes, types are necessary for lambda-lifting refactoring. Namely, if the lifted function contains an overloaded operator such as, e.g., +.

For example, given the program:

fun foo x =
    let fun add y = x + y
    in  add 5.0 end

where we want to lift the function add. Our first attempt might be to transform the program into something like (refactoring is a source code to source code transformation):

fun add x y = x + y
fun foo x = add x 5.0

However, this might not compile with every SML compiler. It depends on how much local context the compiler uses to resolve overloading.

Moscow ML compiles the transformed program above just fine. But Moscow ML complains about the following version:

fun add x y = x + y;
fun foo x = add x 5.0

Notice the extra semi-colon after the declaration of add.

Update 2005-12-22:
Just for good messure. Here is one possible result of refactoring with necessary type information:

fun add (x : real) y = x + y;
fun foo x = add x 5.0

I believe that only captured variables needs to be typed.

Also note, that overloading is not the only problem. It is possible to construct a similar example using records, but I’ll leave that to the reader (unless there is a popular and desperate demand, then I can provide an example).

Yesterday, I discussed with some students who are implementing an SML plug-in for eclipse, whether types are necessary for a lambda-lifting refactoring for SML.

So today’s quiz is simply: Are types necessary for lambda-lifting refactoring in SML? Why/Why not?

Remember, the refactoring works on valid SML programs, and after the transformation the program should still be valid.

Example:

fun foo x =
    let fun bar y = (x,y)
    in bar x end

is transformed/refactored to

fun bar x y = (x,y)
fun foo x = bar x x

I’ll give the answer tomorrow.

SQLite for Moscow ML

Stop The Press! Henning has started to make a binding of SQLite for Moscow ML. This absolutelly great news. I’m looking so much forward to play with this binding (hint, hint, Henning).

At work we are using SQLite with great success. I think that SQLite fills an important, but somewhat overlooked, niche: a small, efficient, and easy to embed relational database with fairly complete support for SQL.

I hope that Henning plans to relase it under a license, so that it might be possible to use the binding in a closed source program…

Together with mGTK this binding will open up lots of interesting possibilities. For instance, Henning talked about using the binding for poking into the Beagle database. Nifty stuff.

Update 2005-01-28:Henning has given me access to his darcs repository for the binding.

mGTK pre-release 0.93

Tonight Henning and managed to make a release of mGTK. We call it a prerelease becase we have not written any documentation, the release has only been tested on my laptop (running Debian Sarge/Sid), and we have not made a Windows port yet. Still, now you can download it from Sourceforge and play with it.

Michael claims that he does not have an editor that can handle Unicode.

Thus, Henning and I whipped up an editor using mGTK that can handle Unicode. Oh, and did I mention that you can it compile with either Moscow ML or MLton without changing the source?

The real story is of course that I we were trying to build a “real” application using mGTK, and the editor example shows that the gtk+ widgets handle Unicode fine. Wereas I’m not sure that SML handles Unicode “fine”, String.size does not return the number of characters but the number of bytes. But at least TextIO does not mess up the bytes (in Moscow ML at least, didn’t test with MLton).

Oh, and I used file and gedit to check that the file I saved really was Unicode.

More mGTK Progress

Henning and I have been working hard (leisure time hard, that is) on mGTK since my last mGTK progress report. We have ported all the interesting Gtk# examples from Mono: A Developer’s Notebook Chapter 4.

screenshot of four mGTK examples

Currently the only missing bits are better support for the lower levels of the library stack most notably GDK, but also Glib, ATK, and Pango. Which means that there are no pretty monkeys and the drag and drop example does not compile, yet. The source code for the examples can be found in the mGTK CVS.

Next up, is polish of the bindings generator for MLton which is currently not as well as the Moscow ML generator. The only thing missing is support for out-arguments. However, Henning is on the case, so the problems should be solved within days(?!?). A release is sooo close. That fells really good because it has taken three years to come this far.

callcc for Moscow ML

The last few days I have tried to recreate my implementation of callcc for Moscow ML. My original (incomplete) implementation was lost last fall when my old laptop was stolen (or it is still lurking somewhere in mess which is my old backup system).

The code almost works. For example, the following code, which just ignores the continuation value k:

3 + callcc (fn k => 2 + 1);

Gives 6, as it should.

However, the following more interesting example, which actually uses the continuation value:

3 + callcc (fn k => 2 + throw (k, 1));

Gives 67299815 and not 4. Ouch!

Likewise, given the following declaration of multiply:

fun multiply ints =
    callcc (fn ret =>
    let
        fun mult nil = 1
          | mult (0::_) = throw (ret, 0)
          | mult (n::ns) = n * mult ns
    in
	mult ints
    end)

The expression:

multiply [1,2,3,4,0,5];

evaluates to 67299810 and not 0. Double ouch!!

My guess is that I somehow mess things up when I restore the stack, and some pointer ends up on top of the stack. But I’m too tired to debug this futher tonight.

One nifty feature of my code (if I can get it to work) is that it is implemented using the Dynlib library. Which means that no changes are needed to the original Moscow ML code, it is all in a seperate library.

Or: SML/Moscow ML is faster than C++/GCC

Inspired by a remark by Peter about the abysmal performance of exceptions on the .NET platform. I decided to check it for myself with a micro-benchmark. I also decided to check the performance of C++ exceptions, just for the fun of it.

The micro-benchmark is to find a solution to the N-queen problem using backtracking, with exceptions used as backtracking. I tested four different languages (SML, O’Caml, C#, and C++) and five different language implementations (MLton, Moscow ML, O’Caml, Mono, and GCC). The results are a bit surprising, and definately deserves further investigation. The graph below shows the running times for the five different executables:

Graph showing running times for the different executables

From the graph we see that O’Caml and MLton are way faster than the rest (and O’Caml has a small edge compared to MLton), Moscow ML is faster than Mono, and my C++ program is slow as a slug. I’ve tried various small tweaks to make the C++ version faster, but nothing really helps. For instance, at first we might notice that there are a tiny algorithmic difference between the C++ version and the other versions, namely the order in which a new position is compared with the already determined positions. But changing this order have no real effect (I’ve tried to change C++ program in various ways, and I tried to change the other programs). Still, the C++ performance is so bad that it looks suspicious. I’m open to suggestions for how to improve the C++ program.

Now I just have to test the SML version with the MLKit and SML/NJ, test the C# version with Microsoft’s runtime, test the C++ version with Visual C++ and Intel’s C++ compiler, and make a Java version of the benchmark and test it with … Any volunteers?

The programs used in the benchmark are here: queens.sml, queens.ml, queens.cs (ran mono with option --optimize=all) , and queens.cpp (compiled with option -O3).

mGTK progress

Tonight I decided to try and port an example from Mono: A Developer’s Notebook to the upcoming release of mGTK.

By joining forces with Henning we were able to compile the example for the following screenshot:

mGTK screenshot

The email entry-box is updated dynamically when you type something in the two other entry boxes. Mnemonics works, that is, if you press ALT-f the focus jumps to the entry-box for the first name and the text is selected. And all that in less than 100 lines of SML code.