ICFP Contest 2006, Team KFL

In 1967, during excavation for the construction of a new shopping center in Monroeville, Pennsylvania, workers uncovered a vault containing a cache of ancient scrolls. Most were severely damaged, but those that could be recovered confirmed the existence of a secret society long suspected to have been active in the region around the year 200 BC.

Based on a translation of these documents, we now know that the society, the Cult of the Bound Variable, was devoted to the careful study of computation, over two millennia before the invention of the digital computer.

Like last year the prospects for my participation in the ICFP Contest was not looking good. None of my team mates from last year’s team seemed to be able to participate, and neither did I myself. The weekend of the contest was packed with family business. And on top of that, when the weekend arrived I was sick Friday night and Saturday.

However, Sunday evening I had some free time and I decided that I would take a crack a the contest just to see what it was about. Judging from the discussion mailing-list it sounded quite fun and interesting. The first phase of the contest task was to implement a 14-instruction virtual machine called UM and when that was running you should use it for running the provided codex for the operating system UMIX.

So I registered my team KFL and started to implement my UM in SML. The first thing I did was to implement an instruction decoder that could translate a 32-bit word into an SML datatype. Then I wrote a function that read in a file of 32-bit words encoded in big-endian as four 8-bit words each. And then maped my decode function over the Vector of words. For this task the SML Basis Library really shined:

fun readFile filename =
    let val dev = BinIO.openIn filename
        val all = BinIO.inputAll dev before BinIO.closeIn dev
        val words = Vector.tabulate(Word8Vector.length all div 4,
                                    fn i => Word32.fromLarge(PackWord32Big.subVec(all,i)))
    in  Vector.map decode words

Time spend: 1 hour.

Unfortunately, this did not work. My decoding function failed after 1675 instructions or so, complaining about illegal instructions. And indeed the 32-bit word it complained about did not seem to encode a legal instruction. I tried to reimplement the conversion from 8-bit words to 32-bit words, in case PackWord32Big worked different than I thought. But I still got the same error. Thus, I gave up and went to bed.

Time spend: 2 hours.

Monday morning I had to see to some other things first, but then I had some time to spend on the contest. Even after I had slept on the problem I still couldn’t figure out what was wrong. So I asked my colleague Arne if he had 10 minutes to help me debug my program. I explained him the problem, showed him my code (actually my debug output, and then we looked at the codex in a hex-editor. He confirmed that based on my explanation, my program appeared to be working correctly, and it looked as if there was an illegal instruction in the codex, if all instructions really was encoded a single 32-word. Hence, one or more of my assumptions had to wrong (it was easy to rule out that the codex was wrong, because more than a hundred teams were able to run the codex). Then it occurred to me, the codex was not required to only contain valid instructions, maybe the code would jump over damaged parts of the codex and part of the contest would be to repair the codex. Thus, I changed my code to only decode instructions on demand, and kept the whole program as an array of 32-bit words. Lo and behold the machine was able to start running the codex! However it failed in the self-check the codex performed. After some debugging I found one place where I used the name of an register (registers in the UM are named by integers) as a value rather than using the value contained in the register. Now my UM was able to run the codex and the SANDmark (a debug and benchmark suite provided by the contest managers).

Time spend: 2 hours.

My first version ran the SANDmark in a bit more than 18 minutes (14 min user and 4 min sys) , 768 seconds user time according to MLton’s profiler. Which was not to bad but I’d seen on the discussion list, that other participants had UMs that ran the SANDmark in a couple of minutes. Thus, I decided to profile my UM to see where the time was spend. To my surprise the top function in the profile was my decode function, a function that took a 32-bit word and translates it to an SML datatype. Here are the first few lines of decode together with the helper function standardRegs that fetches out the register names:

fun standardRegs w =
    let open Word32
        val A = (w << 0w23) >> 0w29
        val B = (w << 0w26) >> 0w29
        val C = andb(w, 0w7)
    in (toInt A, toInt B, toInt C)

fun decode w =
    let open Word32
        val opr = w >> 0w28
    in case opr of
           0w0  => CMove(standardRegs w)
         | 0w1  => ARead(standardRegs w)
         | 0w2  => AWrite(standardRegs w)

And the top of my interpreter loop looked like this:

      while true do
             case spin() of
                 CMove(A,B,C) => if $C = 0w0 then ()
                                 else A < - $B
               | ARead(A,B,C) => A < - $$B sub (W32.toInt($C))
               | AWrite(A,B,C) => Array.update($$A, W32.toInt($B), $C)

Where spin is the function that reads the current word at the program counter, updates the program counter, decodes the word, and return the instruction. But how could 19% of the time be spend in the decode. I moved the call to decode from spin to my interpreter loop to aid the MLton optimizers:

      while true do
             case decode(spin()) of

This made the SANDmark 5 minutes faster wall clock time, that is 13 minutes. Or in MLton profiler time 529 seconds. 30% improvement just for moving a function around. Not bad.

Time spend: 30 minutes.

After this optimization my UM was fast enough that I thought I’d try to solve some of the puzzles. So I logged into the UMIX OS using the guest account and started to poke around and collect points. The first real puzzle was to fix a password cracker written in a weird Basic dialect that used roman numerals instead of decimal notation for integer literals (including for the line numbers).

Time spend: 1½ hour. Collected points 230.

Then I had to go home, and while I cooked dinner (I was baking pita bread, and while the dough was rising I had time to hack) I was able to write an improved password cracker—in this weird roman numerals Basic: hack2.bas. This gained me an other 100 points, just before the contest ended (the contest ended at 18:00 in CEST)

Time spend 45 min. Collected points in total 330.

All in all not bad to make 330 points after spending only seven hours and 45 minutes of rather fragmented time.

After dinner I was able to gain an other 35 points by writing a list reversal program in a graphical 2D language: rev.2d. It took half an hour or so.

The setup for the contest was absolutely amazing and most entertaining. My account of it here does not do it justice. An incredible amount of work must have gone into the preparation of it. I’m looking forward for the final debriefing from the Contest Organizers.

Yesterday, I tried for fun to optimize my UM program a bit more. Programs running on the UM are able to allocate and free arrays. In my original implementation I used a ref to a functional Red-Black tree to keep track of the mapping from UM-pointers to arrays. I know, not the best choice of data structure, but I was just trying to get a “good enough” UM up and running. From the profile it was obvious that lots of time and memory was spend on keeping the Red-Black trees balanced. Thus, I replaced this code with an array, and a free-list for reusing UM-pointers (32-bit words). Thus, my code for managing the “heap” when from 12 lines of code (not counting the code in the Red-Black tree library) to 28 lines of code. This small changed made the SANDmark run in 4 minutes wall clock(175 seconds of MLton profiler time) an improvement of almost 67%. Looking at the profile, I could see that decode was again on top of the list (using 42% of the time). Thus, I decided to inline decode and deforest the instruction datatype by hand. This made my code 68 lines smaller, and the SANDmark ran in 2.50 minutes (134 seconds of MLton profiler time), 23% improvement. Almost four times faster than the UM I participated in the contest with. Time spend 1½ hour.

  • Code for the UM I participated with: um2.sml
  • Code with improve heap handling: um3.sml
  • Code with inlined decode function: um4.sml

3 thoughts on “ICFP Contest 2006, Team KFL

  1. Hey, as you I’m trying to build/solve this contest… and as this is my first ‘virtual machine’ I’m enjoying every little step… As .Net is my primairy language I used it to build the UM in… never played so low level in .Net before I encountered some performance issues but at the momenten the SANDmark is running in about 30 minutes (and creates the correct output).

    Nou some little steps further I’m running the UMIX-platform… but here my problems start… I can’t login… I’ve tried many many guest-logins but they all result in “ACCESS DENIED for user”… Am I missing something? Read all documents multilple times but can’t discover what to use…

  2. This is a seriously excellent go through for me, Must confess that you are 1 of the best bloggers I ever saw.Thanks for posting this informative write-up.

Leave a Reply to Ken Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.