In 1967, during excavation for the construction of a new shopping center in Monroeville, Pennsylvania, workers uncovered a vault containing a cache of ancient scrolls. Most were severely damaged, but those that could be recovered confirmed the existence of a secret society long suspected to have been active in the region around the year 200 BC.
Based on a translation of these documents, we now know that the society, the Cult of the Bound Variable, was devoted to the careful study of computation, over two millennia before the invention of the digital computer.
As last year, the prospects for my participation in the ICFP Contest were not looking good. None of my teammates from last year’s team seemed to be able to participate, and neither did I. The weekend of the contest was packed with family business, and on top of that, when the weekend arrived I was sick Friday night and Saturday.
However, on Sunday evening I had some free time, and I decided to take a crack at the contest just to see what it was about. Judging from the discussion mailing list, it seemed quite fun and interesting. The first phase of the contest task was to implement a 14-instruction virtual machine called UM. Once that was running, you were to use it to run the provided codex for the UMIX operating system.
So I registered my team, KFL, and started to implement my UM in SML. The first thing I did was to implement an instruction decoder that could translate a 32-bit word into an SML datatype. Then I wrote a function that read in a file of 32-bit words, each encoded big-endian as four 8-bit bytes, and then mapped my decode function over the resulting Vector of words. For this task the SML Basis Library really shone:
fun readFile filename =
    let val dev = BinIO.openIn filename
        val all = BinIO.inputAll dev before BinIO.closeIn dev
        val words = Vector.tabulate(Word8Vector.length all div 4,
                                    fn i => Word32.fromLarge(PackWord32Big.subVec(all,i)))
    in  Vector.map decode words
    end
Time spent: 1 hour.
Unfortunately, this did not work. My decoding function failed after 1675 instructions or so, complaining about illegal instructions. And indeed, the 32-bit word it complained about did not seem to encode a legal instruction. I tried to reimplement the conversion from 8-bit bytes to 32-bit words, in case PackWord32Big worked differently than I thought, but I still got the same error. So I gave up and went to bed.
Time spent: 2 hours.
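For comparison with PackWord32Big, here is a minimal sketch of doing the big-endian conversion by hand. The byte at offset 4i becomes the most significant byte of word i. The names wordAt and allWords are hypothetical and do not come from the post.

```sml
(* Hypothetical helper, not from the post: assemble word number i from a
   byte vector in which every 32-bit word is stored big-endian, i.e. with
   its most significant byte first. *)
fun wordAt (bytes : Word8Vector.vector, i : int) : Word32.word =
    let
      (* Widen the k-th byte of word i to a 32-bit word. *)
      fun byte k =
          Word32.fromLarge (Word8.toLarge (Word8Vector.sub (bytes, 4 * i + k)))
    in
      Word32.orb (Word32.<< (byte 0, 0w24),
        Word32.orb (Word32.<< (byte 1, 0w16),
          Word32.orb (Word32.<< (byte 2, 0w8), byte 3)))
    end

(* Every complete word in the vector; a trailing partial word is dropped,
   like the "length div 4" in readFile. *)
fun allWords (bytes : Word8Vector.vector) : Word32.word vector =
    Vector.tabulate (Word8Vector.length bytes div 4, fn i => wordAt (bytes, i))
```

If wordAt agrees with PackWord32Big.subVec on the codex, the byte order is not the culprit. That was the author's experience: the error stayed the same.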
On Monday morning I had to see to some other things first, but then I had some time to spend on the contest. Even after sleeping on the problem, I still couldn’t figure out what was wrong. So I asked my colleague Arne if he had 10 minutes to help me debug my program. I explained the problem to him and showed him my code (actually my debug output), and then we looked at the codex in a hex editor. He confirmed that, from my explanation, my program appeared to be working correctly. He also agreed that there seemed to be an illegal instruction in the codex, if every instruction really was encoded as a single 32-bit word. Hence, one or more of my assumptions had to be wrong (it was easy to rule out that the codex itself was wrong, because more than a hundred teams were able to run it). Then it occurred to me: the codex was not required to contain only valid instructions. Maybe the code would jump over damaged parts of the codex, and part of the contest would be to repair it. So I changed my code to decode instructions only on demand, and kept the whole program as an array of 32-bit words. Lo and behold, the machine was able to start running the codex! However, it failed the self-check the codex performed. After some debugging I found one place where I used the name of a register (registers in the UM are named by integers) as a value, rather than the value contained in the register. And now my UM was able to run the codex and the SANDmark (a debug and benchmark suite provided by the contest organizers).
Time spent: 2 hours.
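Here is a minimal sketch of the decode-on-demand arrangement. It assumes a global program array, program counter and register file; those names are hypothetical, apart from spin, which the post introduces later. A word is only fetched, and then decoded, when control actually reaches it. As a result, a data word or damaged word that is never executed cannot make loading fail. The sketch also keeps a register's name (an int from 0 to 7) separate from its value, which is the distinction behind the self-check bug described above.

```sml
(* Sketch only, not the author's code: the loaded program stays as raw
   32-bit words; nothing is decoded when the file is read. *)
val program : Word32.word array ref = ref (Array.fromList [])
val pc = ref 0
val regs : Word32.word array = Array.array (8, 0w0)

(* Fetch the word at the program counter and advance it; the caller
   decodes the returned word, as in "decode(spin())". *)
fun spin () : Word32.word =
    let val w = Array.sub (!program, !pc)
    in pc := !pc + 1; w
    end

(* A register NAME is the 3-bit field starting at bit `shift`... *)
fun regName (w : Word32.word, shift : Word.word) : int =
    Word32.toInt (Word32.andb (Word32.>> (w, shift), 0w7))

(* ...while its VALUE is whatever that register currently holds. *)
fun regValue (name : int) : Word32.word = Array.sub (regs, name)
```

In this arrangement the second word of a program such as [halt, 0wxFFFFFFFF] (opcode 15, which no UM operator uses) is never touched, because execution stops first.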
My first version ran the SANDmark in a bit more than 18 minutes (14 min user and 4 min sys), 768 seconds of user time according to MLton’s profiler. That was not too bad, but I had seen on the discussion list that other participants had UMs that ran the SANDmark in a couple of minutes. So I decided to profile my UM to see where the time was spent. To my surprise, the top function in the profile was my decode function, which takes a 32-bit word and translates it to an SML datatype. Here are the first few lines of decode, together with the helper function standardRegs that extracts the register names:
fun standardRegs w =
    let open Word32
        val A = (w << 0w23) >> 0w29
        val B = (w << 0w26) >> 0w29
        val C = andb(w, 0w7)
    in  (toInt A, toInt B, toInt C)
    end

fun decode w =
    let open Word32
        val opr = w >> 0w28
    in  case opr of
            0w0 => CMove(standardRegs w)
          | 0w1 => ARead(standardRegs w)
          | 0w2 => AWrite(standardRegs w)
          ...
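Each shift pair in standardRegs first pushes the higher bits off the top of the word, then shifts right by 29 so that only the 3-bit field remains. Here is a small sketch that checks this against the more common shift-and-mask formulation. The names fieldShift, fieldMask, regsByMask and opcode are hypothetical.

```sml
(* Style used in standardRegs: shift left to drop the higher bits,
   then shift right by 29 so that only 3 bits remain. *)
fun fieldShift (w : Word32.word, left : Word.word) : int =
    Word32.toInt (Word32.>> (Word32.<< (w, left), 0w29))

(* Equivalent shift-and-mask: move the field down to bit 0 and keep the
   low 3 bits. A field whose lowest bit is p corresponds to left = 29 - p. *)
fun fieldMask (w : Word32.word, p : Word.word) : int =
    Word32.toInt (Word32.andb (Word32.>> (w, p), 0w7))

(* Registers A (bits 6-8), B (bits 3-5) and C (bits 0-2) by masking. *)
fun regsByMask w = (fieldMask (w, 0w6), fieldMask (w, 0w3), fieldMask (w, 0w0))

(* The operator number lives in the top four bits. *)
fun opcode (w : Word32.word) : int = Word32.toInt (Word32.>> (w, 0w28))
```

For example, 0wx300001D1 encodes operator 3 with A = 7, B = 2, C = 1, and both formulations agree on it. Operator 13 (orthography in the UM specification) is the exception. Its register sits in bits 25-27, just below the opcode, and the low 25 bits hold an immediate value, so it cannot use standardRegs.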
And the top of my interpreter loop looked like this:
while true do
    case spin() of
        CMove(A,B,C) => if $C = 0w0 then () else A <- $B
      | ARead(A,B,C) => A <- $$B sub (W32.toInt($C))
      | AWrite(A,B,C) => Array.update($$A, W32.toInt($B), $C)
      ...
spin is the function that reads the current word at the program counter, updates the program counter, decodes the word, and returns the instruction. But how could 19% of the time be spent in decode? I moved the call to decode from spin into my interpreter loop to help the MLton optimiser:
while true do case decode(spin()) of
This made the SANDmark run 5 minutes faster in wall-clock time, that is, in 13 minutes, or 529 seconds of MLton profiler time. That is a 30% improvement just for moving a function call around. Not bad.
Time spent: 30 minutes.
After this optimisation my UM was fast enough that I thought I’d try to solve some of the puzzles. So I logged into the UMIX OS using the guest account and started to poke around and collect points. The first real puzzle was to fix a password cracker written in a weird Basic dialect that used roman numerals instead of decimal notation for integer literals (including for the line numbers).
Time spent: 1½ hours. Points collected: 230.
Then I had to go home, and while I cooked dinner (I was baking pita bread, and while the dough was rising I had time to hack) I was able to write an improved password cracker in this weird roman-numeral Basic: hack2.bas. This gained me another 100 points, just before the contest ended (at 18:00 CEST).
Time spent: 45 minutes. Total points collected: 330.
All in all, 330 points is not bad for only seven hours and 45 minutes of rather fragmented time.
After dinner I was able to gain another 35 points by writing a list reversal program in a graphical 2D language: rev.2d. It took half an hour or so.
The setup for the contest was absolutely amazing and most entertaining; my account of it here does not do it justice. An incredible amount of work must have gone into preparing it. I’m looking forward to the final debriefing from the Contest Organizers.
Yesterday, just for fun, I tried to optimise my UM program a bit more. Programs running on the UM are able to allocate and free arrays. In my original implementation I used a ref to a functional Red-Black tree to keep track of the mapping from UM pointers to arrays. I know, not the best choice of data structure, but I was just trying to get a “good enough” UM up and running. From the profile it was obvious that lots of time and memory were spent on keeping the Red-Black trees balanced. So I replaced this code with an array and a free list for reusing UM pointers (32-bit words). This took my code for managing the “heap” from 12 lines (not counting the code in the Red-Black tree library) to 28 lines. This small change made the SANDmark run in 4 minutes of wall-clock time (175 seconds of MLton profiler time), an improvement of almost 67%. Looking at the profile, I could see that decode was again at the top of the list (using 42% of the time). So I decided to inline decode and deforest the instruction datatype by hand. This made my code 68 lines smaller, and the SANDmark ran in 2.50 minutes (134 seconds of MLton profiler time), a 23% improvement. That is almost four times faster than the UM I entered the contest with (134 versus 529 seconds of profiler time).
Time spent: 1½ hours.
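Here is a minimal sketch of an array-plus-free-list heap of the kind described above. It is not the author's 28 lines. The structure name Heap and the functions alloc, free and fetch are hypothetical, and identifiers are plain ints rather than the 32-bit words a real UM would convert at the instruction boundary. Identifier 0 is never handed out, because the UM specification reserves it for the program array.

```sml
(* Sketch: a growable table of arrays indexed by identifier, plus a list
   of freed identifiers that alloc reuses before minting new ones. *)
structure Heap =
struct
  val table : Word32.word array option array ref =
      ref (Array.array (16, NONE))
  val next = ref 1                    (* next never-used identifier *)
  val freeList : int list ref = ref []

  (* Double the table when fresh identifiers run out. *)
  fun grow () =
      let val old = !table
          val bigger = Array.array (2 * Array.length old, NONE)
      in Array.copy {src = old, dst = bigger, di = 0}; table := bigger
      end

  (* Reuse a freed identifier if there is one, else take the next fresh
     one; the new array is zero-filled. *)
  fun alloc (size : int) : int =
      let
        val id =
            case !freeList of
                reused :: rest => (freeList := rest; reused)
              | [] => (if !next >= Array.length (!table) then grow () else ();
                       !next before next := !next + 1)
      in
        Array.update (!table, id, SOME (Array.array (size, 0w0)));
        id
      end

  (* Abandon an array and make its identifier available again. *)
  fun free (id : int) : unit =
      (Array.update (!table, id, NONE); freeList := id :: !freeList)

  fun fetch (id : int) : Word32.word array =
      case Array.sub (!table, id) of
          SOME a => a
        | NONE => raise Fail "access to an inactive array"
end
```

Both alloc and free are amortised constant time, with no rebalancing. That is the cost the Red-Black tree was paying on every allocation.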
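Deforesting the instruction datatype means the interpreter no longer builds a CMove/ARead/AWrite value for every word only to take it apart again. Here is a minimal sketch of that shape. It assumes the standard UM field layout: opcode in the top four bits; A, B and C in bits 6-8, 3-5 and 0-2; and operator 13 loading a 25-bit immediate into the register named by bits 25-27. Only a handful of operators are implemented, and run is a hypothetical name, not the author's code.

```sml
(* Deforested loop: fields are extracted and the opcode is dispatched on
   directly, so no intermediate instruction value is built per step. *)
fun run (prog : Word32.word vector) : Word32.word array =
    let
      val regs = Array.array (8, 0w0 : Word32.word)
      fun reg i = Array.sub (regs, i)
      fun set (i, v) = Array.update (regs, i, v)
      fun loop pc =
          let
            val w = Vector.sub (prog, pc)
            val a = Word32.toInt (Word32.andb (Word32.>> (w, 0w6), 0w7))
            val b = Word32.toInt (Word32.andb (Word32.>> (w, 0w3), 0w7))
            val c = Word32.toInt (Word32.andb (w, 0w7))
          in
            case Word32.toInt (Word32.>> (w, 0w28)) of
                0 => (if reg c <> 0w0 then set (a, reg b) else (); loop (pc + 1))
              | 3 => (set (a, reg b + reg c); loop (pc + 1))   (* wraps mod 2^32 *)
              | 4 => (set (a, reg b * reg c); loop (pc + 1))
              | 6 => (set (a, Word32.notb (Word32.andb (reg b, reg c))); loop (pc + 1))
              | 7 => regs                  (* halt: return the register file *)
              | 13 => (set (Word32.toInt (Word32.andb (Word32.>> (w, 0w25), 0w7)),
                            Word32.andb (w, 0wx1FFFFFF));
                       loop (pc + 1))
              | n => raise Fail ("operator not in this sketch: " ^ Int.toString n)
          end
    in
      loop 0
    end
```

Because every branch ends in a tail call to loop, MLton can compile the dispatch into a tight loop with no allocation per instruction. That matches the kind of saving the author reports from inlining decode.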