Sunday, June 7, 2009

A simple midi wrapper library in Clojure

Java integration is good in Clojure, but to get a more native feel a small wapper library may be used. This is a such a wrapper I used to read midi messages from a midi device. This is not a complete wrapper, there is no functionality for sending midi messages, reading midi files, sysex messages etc. (The source can be found at github) (Note to OS X users, you will need to install a SPI to be able to access external midi devices from java. Mandolane and mmj are two alternatives.) First I'll create a namespace for the wrapper, and import some classes from javax.sound.midi:
(ns com.jalat.cljmidi
  (:import (javax.sound.midi MidiSystem MidiUnavailableException
        MidiDevice MidiDevice$Info
        Receiver Transmitter Synthesizer Sequencer
        MidiMessage ShortMessage SysexMessage MetaMessage)))
First a function that collects info about available mididevices into a list of hashmaps. The device object itself stored with the :device key since that would be used later to get hold of the actual midi device object.
(defn get-mididevices
  "Returns a list of hashes with info about the midi devices available."
  []
  (map (fn [device]
  {:name (.getName device)
   :vendor (.getVendor device)
   :version (.getVersion device)
   :description (.getDescription device)
   :device (. MidiSystem getMidiDevice device)})
       (. MidiSystem getMidiDeviceInfo)))
We would usually only be interested in in devices of one type so a filter function that filters on device type is probably useful.
(defn filter-mididevices [class device-infos]
  "returns a list of midi devices of the Class class from a map of midi devices"
  (filter (fn [device-info]
     (instance? class (:device device-info)))
   device-infos))
In my case I'm using a software midi device with two outputs (Transmitters):
com.jalat.cljmidi> (map :name (filter-mididevices Transmitter (get-mididevices)))
("from v.m.k. 1.6 osx 1 " "from v.m.k. 1.6 osx 2 ")
It's fairly easy to use Java ENUMS from clojure, but I find keywods more clojury, so I set up a couple of hashmaps to translate from ENUMS to keywords:
(def midi-shortmessage-status {ShortMessage/ACTIVE_SENSING :active-sensing
                               ShortMessage/CONTINUE :continue
                               ShortMessage/END_OF_EXCLUSIVE :end-of-exclusive
                               ShortMessage/MIDI_TIME_CODE :midi-time-code
                               ShortMessage/SONG_POSITION_POINTER :song-position-pointer
                               ShortMessage/SONG_SELECT :song_select
                               ShortMessage/START :start
                               ShortMessage/STOP :stop
                               ShortMessage/SYSTEM_RESET :system-reset
                               ShortMessage/TIMING_CLOCK :timing-clock
                               ShortMessage/TUNE_REQUEST :tune-request})

(def midi-sysexmessage-status {SysexMessage/SYSTEM_EXCLUSIVE :system-exclusive
                               SysexMessage/SPECIAL_SYSTEM_EXCLUSIVE :special-system-exclusive})

(def midi-shortmessage-command {ShortMessage/CHANNEL_PRESSURE :channel-pressure
                                ShortMessage/CONTROL_CHANGE :control-change
                                ShortMessage/NOTE_OFF :note-off
                                ShortMessage/NOTE_ON :note-on
                                ShortMessage/PITCH_BEND :pitch-bend
                                ShortMessage/POLY_PRESSURE :poly-pressure
                                ShortMessage/PROGRAM_CHANGE :program-change})
Some helper definitions/functions.
(def key-names [:C :C# :D :D#  :E :F :F# :G :G# :A :A# :B])

(defn keyname 
  "Given a midi note, returns the name of the note/key"
  [index]
  (nth (cycle key-names) index))

(defn- calculate-14-bit-value
  "Calculates the the 14 bit value given two integers 
representing the high and low parts of a 14 bit value."
  [lower higher]
  (bit-or (bit-and lower 0x7f)
   (bit-shift-left (bit-and higher 0x7f) 
     7)))
Midi commands are midi events like note-on/note-off/pitch-bend etc. This function takes the data from a command message and returns a hashmap of the values. I've split up the cond with whitespace between the different branches to make it easier to read. There is one clojure idiom here that may look strange if you're new to clojure: the (#{:foo :bar} :gaz) test. #{} is the reader macro for creating a hashset and using it as a function it will return true if the argument is member of the set. so (#{:foo :bar} :foo) will return true, while (#{:foo :bar} :gaz) will return false.
(defn- decode-midi-command 
  "Takes the data of a midi-command and returns a hashmap of the message"
  [command channel data1 data2]
  (cond (#{:note-on :note-off} command)
 {:command command :channel channel :key (keyname data1)
  :octave (int (/ data1 12)) :velocity data2}

 (#{:channel-pressure :poly-pressure} command)
 {:command command :channel channel :key (keyname data1) 
  :octave (int (/ data1 12)) :pressure data2}

 (= :control-change command)
 {:command command :channel channel :change data1 :value data2}

 (= :program-change command)
 {:command command :chanel channel :change data1}

 (= :pitch-bend command)
 {:command command :channel channel
  :change (calculate-14-bit-value data1 data2)}))
With the helper functions out of the way it's time to get the java midi events translated to clojure maps. There are three types of midi messages in javax.sound.midi: ShortMessage, SysexMessage and MetaMessage. MetaMessages is used when reading midi files, so I'm just going to ignore it for now and just add functions for decoding ShortMessages and SysexMessages and worry about MetaMessages later. Clojure has a neat way of allowing me to do this cleanly. I'm going to make a multimethod that dispatches on the class of the message. If I send it a MetaMessage it will raise an exeption, but since I'm just going to use this to read events from midi devices that is not an issue for me.
(defmulti decode-midi-message class)

(defmethod decode-midi-message javax.sound.midi.ShortMessage [message]
  (let [status (midi-shortmessage-status (. message getStatus))
        command (midi-shortmessage-command (. message getCommand))
        channel (inc (. message getChannel))
        data1 (. message getData1)
        data2 (. message getData2)]
    (cond command (decode-midi-command command channel data1 data2)
   status  {:status status}
   :else   {:unknown-status (. message getStatus)
     :unknown-command (. message getCommand)
     :byte1 data1 :byte2 data2})))

(defmethod decode-midi-message SysexMessage [message]
  (let [bytes (. message getData)]
    {:status (midi-sysexmessage-status (. message getStatus))
     :data bytes}))
The most foreign concept of the midi api for me is that to be able to accept midi messages I need to create a object that implements the Receiver interface. Much more natural for me would be to set up a callback for each event that arrives. The callback function would accept a hashmap representing the event. In addition to the data from decode-midi-message we add a timestamp.
(defn midi-input-callback
  "Sets up a callback to f with a map representing a midi message"
  [transmitter f]
  (let [receiver (proxy [Receiver] []
     (close [] nil)
     (send [message timestamp]
    (f (assoc (decode-midi-message message)
                                   :timestamp timestamp))))]
    (. transmitter setReceiver receiver)
    (. transmitter open)
    transmitter))
We might also not worry about a callback, maybe we just want to collect all incoming messages in a sequence? This function set up a closure with a ref to a sequence, and then creates a callback that simply add all incoming events to that sequence.
(defn midi-input-collection
  "Takes a transmitter as the argument and sets up a receiver that puts
all incoming messages into a ref to a sequence. Returns the ref"
  [transmitter]
  (let [midi-data (ref ())
        receiver (proxy [Receiver] []
     (close [] nil)
     (send [message timestamp]
    (dosync
     (alter midi-data
     conj (assoc (decode-midi-message message)
     :timestamp timestamp)))))]
    (. transmitter setReceiver receiver)
    (. transmitter open)
    midi-data))
Finally, agents could combine the two features. You could set up a callback that get sent to the agent for each message that arrives, and store the sequence of events in the agent itself. Other functions could be sent to the agent by other parts of the program, for example to empty or truncate the sequence.
(defn setup-midi-agent
  "Sets up and agent and sends 'handler-function' to it whenever a message
arrives from the transmitter"
  [transmitter handler-function]
  (let [midi-agent (agent {:transmitter transmitter
                           :beat-stamp 0
                           :beat-gap 0
                           :timing-clock-queue []
                           :last-message-timestamp 0})]
    (midi-input-callback transmitter
    (fn [message-map]
      (send midi-agent handler-function message-map)))
    midi-agent))
Now we've got a simple way of collecting midi events that doesn't really show too much of the java underpinnings:
com.jalat.cljmidi> (def *foo* 
   (midi-input-collection 
    (:device (first (filter-mididevices Transmitter (get-mididevices))))))
#'com.jalat.cljmidi/*foo*
com.jalat.cljmidi> @*foo*
({:timestamp 14031000, :command :note-on, :channel 1, :key :C, :octave 0, :velocity 0}
 {:timestamp 13883000, :command :note-on, :channel 1, :key :C, :octave 0, :velocity 80}
 {:timestamp 13882000, :command :note-on, :channel 1, :key :E, :octave 4, :velocity 0}
 {:timestamp 12680000, :command :note-on, :channel 1, :key :E, :octave 4, :velocity 0}
 {:timestamp 12523000, :command :note-on, :channel 1, :key :E, :octave 4, :velocity 80}
 {:timestamp 12523000, :command :note-on, :channel 1, :key :D#, :octave 3, :velocity 0}
 {:timestamp 11280000, :command :note-on, :channel 1, :key :D#, :octave 3, :velocity 0}
 {:timestamp 11148000, :command :note-on, :channel 1, :key :D#, :octave 3, :velocity 80}
 {:timestamp 11146000, :command :note-on, :channel 1, :key :D, :octave 2, :velocity 0}
 {:timestamp 10431000, :command :note-on, :channel 1, :key :D, :octave 2, :velocity 0}
 {:timestamp 10235000, :command :note-on, :channel 1, :key :D, :octave 2, :velocity 80}
 {:timestamp 10235000, :command :note-on, :channel 1, :key :A, :octave 9, :velocity 0})


Sunday, October 19, 2008

The game, clojure version.

I've been playing around with Clojure a bit, so here is a Clojure version of the little puzzle I wrote in my first blog entry If you don't have clojure set up, go to http://clojure.org/ and get going.

First, we'll make our own little namespace. We will have to call out to java to set up the gui, so we import some swing classes into our namespace as well:

user> (ns com.jalat.flipper)
nil
com.jalat.flipper> (import '(java.awt Color GridLayout Dimension)
                         '(java.awt.event ActionEvent ActionListener)
                         '(javax.swing ImageIcon JFrame JPanel JButton JLabel JTextField JOptionPane BorderFactory))

If you read my first post, you know that the object of the game is to get a layout where all buttons are highlighted, except the center button. I'm going to use true/false values for this, and I'm going to put the values into a vector. The [] is Clojure syntax for a vector.:

com.jalat.flipper> (def solution [true true true
                                true false true
                                true true true])
#'com.jalat.flipper/solution

I'm going to put the moves we can do into a hash, {} is the syntax for a hash. I'm using numbers for the key, and each vaule is a vector where true is a field that will be flipped by the move, and false is a field that will be unchanged by the move.

com.jalat.flipper> (def moves {1 [true  true  false
                                true  true  false
                                false false false]
                             2 [true  true  true
                                false false false
                                false false false]
                             3 [false true  true
                                false true  true
                                false false false]
                             4 [true  false false
                                true  false false
                                true  false false]
                             5 [true  false true
                                false true  false
                                true  false true]
                             6 [false false true
                                false false true
                                false false true]
                             7 [false false false
                                true  true  false
                                true  true  false]
                             8 [false false false
                                false false false
                                true  true  true]
                             9 [false false false
                                false true  true
                                false true  true]})
#'com.jalat.flipper/moves

In Clojure hashes can be called as if they were a function by giving the key as an argument. so to look up the move with the key "1", just call it:

com.jalat.flipper> (moves 1)
[true true false true true false false false false]

So we got a solution and moves, all we need is a starting position. We have the function rand which gives a float between 0 and 1 if we don't give any arguments. Clojure has a shortcut for lambda which is #(). I'm not using any arguments in this case but they would be specified with % and an optional number if there are more than one argument. #(+ 5 %) would be an anonymous function adding five to the argument it receives. repeatedly is a function that just runs it's argument forever and returns it as a lazy sequence. The lazyness is important as this would otherwise be known as a endless loop. take to the rescue. Take returns the first n elements of a sequence, this prevents repeadedly to run forever. Finally, into takes the sequence and stuffs it into a vector.

com.jalat.flipper> (defn scramble []
                    "Generate a new starting position"
                    (into [] (take 9 (repeatedly #(< 0.5 (rand))))))
com.jalat.flipper> (scramble)
[true true false false true true false true false]

Clojures data structures are immutable, but I'll need a way of keeping track of what the current position is, and how many moves we've done so far. For that I'll use references. A reference is a pointer to a data structure. While the data itself can not be modified, the reference can be modified inside a transaction. There is a shortcut to deref, just put @ in front of the reference. A transaction is set up with dosync, any code within the dosync will happen within the transaction:

com.jalat.flipper> (def state (ref (scramble)))
#'com.jalat.flipper/state
com.jalat.flipper> state 
clojure.lang.Ref@2b71a3
com.jalat.flipper> @state
[false true true false true false true true true]
com.jalat.flipper> (def num-moves (ref 0))
#'com.jalat.flipper/num-moves
com.jalat.flipper> (defn new-puzzle []
                    "Set up a starting position, and zero out the moves"
                    (dosync
                      (ref-set num-moves 0)
                      (ref-set state (scramble))))
#'com.jalat.flipper/new-puzzle

Next, two functions to perform a move, flip takes two vectors (Sequences, actually. I could have used lists) and does a xor on the two vectors. apply-move starts a transaction, and in that transaction updates the number of moves we've done, finds the correct move and calls flip on that move and the current state. The alter function used takes the data that the ref givens as it's first argument, and gives that as the first argument to the third argument together with the rest of the arguments. Finally the ref will be pointed to the result of the function. So (alter num-moves inc) will find the current number of moves, increase it by one, and point the num-moves reference to the increased number.

com.jalat.flipper> (defn flip [pieces state]
                    "Takes two true/false sequences, one specifies which bits
of the other to flip true/false. (Or the other way around,
it's a symmetric operation.)"
                    (map (fn [x y] (and (not (and x y))
                                        (or x y)))
                         pieces state))
#'com.jalat.flipper/flip
com.jalat.flipper> (defn apply-move [n]
                    "Updates the state by applying move n to the current state."
                    (dosync
                      (alter num-moves inc)
                      (alter state flip (moves n))))
#'com.jalat.flipper/apply-move

Time for some java integration. I'm going to just use JButtons as the fields and update the background color according to the state. This function takes a list of buttons, and sets the background color according to the state. Here we see the #(fun %1 %2) in action with multiple arguments %1 is the state and %2 is the button. A symbol starting with . is considered a method call on a object. What is here (.setBackground button (.red Color)) would in Java be written as button.setBackground(Color.red) Finally. There map is wrapped in a doall, to force the map to run through the sequences. Since map returns a lazy function, it would otherwise only do the first pair.

I'm not entirely happy about how I'm doing this. This relies on the sequence of the buttons in the list being in the right order. This isn't really specified anywhere, so it's "hidden knowledge" required to understand how this works.

com.jalat.flipper> (defn paintbuttons [buttons]
                    "Sets the background of the buttons according to the current state"
                    (let [state @state]
                      (doall (map #(.setBackground %2 (if %1
                                                        (.red Color)
                                                        (.black Color)))
                                  state buttons))))
#'com.jalat.flipper/paintbuttons

Finally a function to set up the gui. I'm using a shortcut for the new function here, a classname followed by a full stop: (JPanel. "Hello") Is equivalent to new JPanel("Hello"); in Java. Another convenience macro is doto. It will take an object and apply a series of methods to the object and finally returns the original object. Finally there is proxy. Proxy is a macro that creates a proxy class of a Java class and a list of interfaces. (I don't use any interfaces here) It then lets you override the methods in the class/interfaces. I use it here to add listeners to the buttons in the interface.

com.jalat.flipper> (defn make-gui []
                    "Sets up the playing field and adds listeners"
                    (new-puzzle)
                    (let [panel (JPanel. (GridLayout. 0 3))
                          moveField (doto (JTextField. "0")
                                      (setHorizontalAlignment (.RIGHT JTextField))
                                      (setEditable false))
                          buttons (map #(doto (JButton. (ImageIcon. (str "/Users/asbjxrn/Clojure/projects/com/jalat/flipper/" % ".gif")))
                                          (setBackground (if (@state (dec %))
                                                            (.red Color)
                                                            (.black Color)))
                                          (setBorder (.createEmptyBorder BorderFactory))
                                          (setSize (Dimension. 40 40))
                                          (setContentAreaFilled false)
                                          (setOpaque true))
                                       (range 1 10))]
                      (doall (map #(.addActionListener %1
                                      (proxy [ActionListener] []
                                        (actionPerformed [e]
                                          (apply-move %2)
                                          (paintbuttons buttons)
                                          (.setText moveField (str @num-moves))
                                          (when (= solution @state)
                                            (.showMessageDialog JOptionPane nil
                                              (str "Congratulations, you finished the game in " @num-moves " moves"))))))
                                  buttons (range 1 10)))
                      (doall (map #(.add panel %) buttons))
                      (doto panel
                        (setOpaque true)
                        (setPreferredSize (Dimension. 300 400))
                        (add (doto (JButton. "Start")
                               (addActionListener
                                 (proxy [ActionListener] []
                                   (actionPerformed [e]
                                     (new-puzzle)
                                     (paintbuttons buttons)
                                     (.setText moveField (str @num-moves)))))))
                        (add (JLabel. "Moves:"))
                        (add moveField))
                      (doto (JFrame. "Flipper")
                        (setContentPane panel)
                        (pack)
                        (setVisible true))))
#'com.jalat.flipper/make-gui

Finally, the only thing remaining is to launch the gui:

com.jalat.flipper> (make-gui)
javax.swing.JFrame[frame0,0,22,300x422,layout=java.awt.BorderLayout,title=Flipper,resizable,normal,defaultCloseOperation=HIDE_ON_CLOSE,rootPane=javax.swing.JRootPane[,0,22,300x400,layout=javax.swing.JRootPane$RootLayout,alignmentX=0.0,alignmentY=0.0,border=,flags=449,maximumSize=,minimumSize=,preferredSize=],rootPaneCheckingEnabled=true]

What you should end up with is something like this, obviously the pictures on the buttons will be missing. I added them when I created the buttons in the make-gui function.

Tuesday, October 7, 2008

Parsing logs

So I made a blog and some requests have started coming in. And of course I want to get some data about my visitors.

I could just download a program from somewhere and get all the statistics I could think of in less than ten minutes, but where is the fun in that? Better to spend hours to try to write some lisp code that barely manages to get the most basic stats from the logfiles.

You'll notice that I redefine the same function again and again and again. This is how I usually work. Gradually refining functions until I think they work as I want.(This is where you're appaled by the lack of unit testing/formal verification. I basically fiddle until it works for the limited amount of test cases I have.)

First some libraries. I'm gonna chicken out of the hard work of the parsing because Edi Weitz already did that and made the cl-ppcre package which implements perls regular expressions in Common Lisp.


(defpackage :com.jalat.logparse
  (:use :cl :cl-ppcre))

A short run-down of the regular expressions I'm going to use for those who don't know perl regular expressions:

most characters matches themselves.
\Smatches non-empty characters (anything but spaces, newlines and tabs).
\dmatches digits.
\w matches alphanumeric characters (Uses alphanumericp, might differ from implementation to implementation).
[abc]matches a list of characters.
[^abc]matches everything except the list of characters.
(pattern)stores pattern in a "register" for later use
+repeat previous character/group one or more times
*repeat previous character/group zero or more times
{n}repeat previous character/group n times
|means or, so (a|b) matches a or b.
^at the start of line, means the beginning of a line so "^a" does not match "ba"

Here's two log entries, this is actually one request. The first request use a old url, and I redirect the visitor to the new url. (Which by now also is an old url... If only planning things were fun I might actually start doing some one day.):


222.153.17.71 - - [08/Jan/2006:00:14:47 +0000] "GET /back-of-the-envelope.html HTTP/1.1" 302 302 "http://planet.lisp.org/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2" "-"
222.153.17.71 - - [08/Jan/2006:00:14:48 +0000] "GET /blog.ucw?entry-id=2 HTTP/1.1" 200 15016 "http://planet.lisp.org/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2" "-"

The first part is the ip address of the user who looked at our blog. As a simple demo of how to use cl-ppcre, I'll scan and collect the first string up until the first whitespace. (hopefully gathering the ip address in the process.) The register-groups-bind will do the scanning and storing (binding) for me.


(in-package :com.jalat.logparse)

(defparameter *teststrings* '("222.153.17.71 - - [08/Jan/2006:00:14:48 +0000] \"GET /blog.ucw?entry-id=2 HTTP/1.1\" 200 15016 \"http://planet.lisp.org/\" \"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2\" \"-\"" 
                              "222.153.17.71 - - [08/Jan/2006:00:14:47 +0000] \"GET /back-of-the-envelope.html HTTP/1.1\" 302 302 \"http://planet.lisp.org/\" \"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2\" \"-\""))

(defun split-string (logstring)
    (register-groups-bind 
      (ipaddr) 
      ("(\\S+) " logstring)
    (list ipaddr)))

COM.JALAT.LOGPARSE 39 > (first *teststrings*)
"222.153.17.71 - - [08/Jan/2006:00:14:48 +0000] \"GET /blog.ucw?entry-id=2 HTTP/1.1\" 200 15016 \"http://planet.lisp.org/\" \"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2\" \"-\""

COM.JALAT.LOGPARSE 40 > (split-string (first *teststrings*))
("222.153.17.71")

As you can see, the (\\S+) has matched non-empty characters and is stored in the variable ipaddr. cl-ppcre returns the first match it can find, that is why I get the ip address and not just any substring consisting of non-whitespace. Why do I use \\S if \S is what matches a character? \ is the quote character for strings in lisp so a simple \ will just pass "S" on to cl-ppcre:



COM.JALAT.LOGPARSE 41 > (princ "\a")
a
"a"

COM.JALAT.LOGPARSE 42 > (princ "\\a")
\a
"\\a"

COM.JALAT.LOGPARSE 44 > (length "\a")
1

COM.JALAT.LOGPARSE 45 > (length "\\a")
2

I'm not going to bother with the ip adresses. I guess it might be useful if you want to try to count individual users, but then you have to be clever about different user behind a proxy/NAT. So I won't be doing that. Too complicated for me, and who really wants to reduce the percieved number of visitors anyway. I'm doing this to boost my ego, after all.

What I will do is to use the time data. I can't use \\S+ here since there is a space in the string. What I'll do is read the first [ and then collect all non-] characters.


(defun split-string (logstring)
    (register-groups-bind 
      (when) 
      ("\\[([^\\]]*)\\]" logstring)
    (list when)))

COM.JALAT.LOGPARSE 46 > (split-string (first *teststrings*))
("08/Jan/2006:00:14:48 +0000")

As you can see, I have to quote (twice) the ['s and ]'s that are literal, to stop them being interpreted as a [] directive. If you have problems sorting out the expression above, it might help to eliminate from outside and in starting with \\[ and \\]. Inside those are the ( ) meaning that I want to collect the string matching the pattern inside the ( ). Finally the inner expression [^\\]]* means 0 or more characters that are not the character ']'

I'm a bit lucky because of the fact that there are no [] in the log strings before the time description. This means that I abuse the fact that cl-ppcre returns the first match it finds. But this is not the best way of writing a regular expression if we want performance. With this expression the scanner have to search the string for the start of the expression. If we tell the scanner where the expression starts we can reduce the workload a bit:


(defun split-string (logstring)
    (register-groups-bind 
      (when) 
      ("^\\S+ \\S+ \\S+ \\[([^\\]]*)\\]" logstring)
    (list when)))

The ^ tells the scanner that our expression starts at the start of the string, then there are three substrings separated by spaces that I'm not interested in. Finally there is the date string which I am interested in.

Next is to extract the url. The url is surrounded by "GET and HTTPsomething". I'm using a similar trick to what I used above to match up to the ".


(defun split-string (logstring)
    (register-groups-bind 
      (when request) 
      ("^\\S+ \\S+ \\S+ \\[([^\\]]*)\\] \"GET (\\S*) HTTP[^\\\"]*\"" logstring)
    (list when request)))


COM.JALAT.LOGPARSE 5 > (split-string (first *teststrings*))
("08/Jan/2006:00:14:48 +0000" "/blog.ucw?entry-id=2")

COM.JALAT.LOGPARSE 7 > (split-string (second *teststrings*))
("08/Jan/2006:00:14:47 +0000" "/back-of-the-envelope.html")

Now, remember that this is only one request that is being redirected. I don't really want to count it twice. The next element of the string helps us sort out requests that has been answered successfully from requests that fail, are redirected or don't exist. This element is the result code, 200 means a successfull request so I'll hardcode 200 in my regexp string. The field after the result code I'll skip by matching but not collecting it. Then I'll collect the referring url. I want to see who have been nice and linked to me.


(defun split-string (logstring)
    (register-groups-bind 
      (when request referrer) 
      ("^\\S+ \\S+ \\S+ \\[([^\\]]*)\\] \"GET (\\S*) HTTP[^\\\"]*\" 200 \\S+ \"([^\\\"]+)\"" logstring)
    (list when request referrer)))

COM.JALAT.LOGPARSE 9 > (split-string (car *teststrings*))
("08/Jan/2006:00:14:48 +0000" "/blog.ucw?entry-id=2" "http://planet.lisp.org/")

COM.JALAT.LOGPARSE 10 > (split-string (cadr *teststrings*))
NIL

One more detail, I don't like the date/time format, for now all I want is the date. It would also be nice to have a more sortable date. I will do this by splitting up the date string in days months year and combining it to a number. \d{2} will match two digits (the day part), \w{3} three alphanumeric character (month) and \d{4} the year.


(defparameter *months* '(("Jan" . "01") ("Feb" . "02") ("Mar" . "03")
                         ("Apr" . "04") ("May" . "05") ("Jun" . "06")
                         ("Jul" . "07") ("Aug" . "08") ("Sep" . "09")
                         ("Oct" . "10") ("Nov" . "11") ("Dec" . "12")))

(defun numerical-date (date month year)
  (parse-integer 
   (concatenate 'string
                year
                (cdr (assoc month *months* :test #'string=))
                date)))

(defun split-string (logstring)
    (register-groups-bind 
      (date month year request referrer) 
      ("^\\S+ \\S+ \\S+ \\[(\\d{2})/(\\w{3})/(\\d{4})[^\\]]*\\] \"GET (\\S*) HTTP[^\\\"]*\" 200 \\S+ \"([^\\\"]+)\"" logstring)
    (list (numerical-date date month year) request referrer)))

COM.JALAT.LOGPARSE 34 > (split-string (car *teststrings*))
(20060108 "/blog.ucw?entry-id=2" "http://planet.lisp.org/")

Ok, time to start counting. I'm just going to start with a hash-table of dates and count how many requests I get for each date. I will also move the regexp out of the record-string function, so the regexp scanner only have to be built once. This should increase the performance a bit.


(defparameter *logscanner* (create-scanner "^\\S+ \\S+ \\S+ \\[(\\d{2})/(\\w{3})/(\\d{4})[^\\]]*\\] \"GET (\\S*) HTTP[^\\\"]*\" 200 \\S+ \"([^\\\"]+)\""))
(defparameter *datehash* (make-hash-table))

(defun record-entry (date request referrer)
  (declare (ignore request referrer))
  (let* ((currentcount (gethash date *datehash* 0)))
    (setf (gethash date *datehash*) (1+ currentcount))))

(defun record-string (logstring)
    (register-groups-bind 
      (day month year request referrer) 
      (*logscanner* logstring)
      (record-entry (numerical-date day month year) request referrer)))

(defun parse-file (filename)
  (with-open-file (s filename)
    (do ((string (read-line s) (read-line s nil 'eof)))
        ((eql string 'eof) nil)
      (record-string string))))

(defun printstats (filename)
  (parse-file filename)
  (dolist (x (sort 
               (loop for k being the hash-key of *datehash*
                     collect (list k (gethash k *datehash*)))
               #'>
               :key #'car))
    (format t "~a ~a~%" (car x) (cadr x))))


COM.JALAT.LOGPARSE 28 > (printstats "~/poundlog")
20060128 1986
20060127 325
20060126 33
20060125 29
20060124 39
20060123 25
20060122 73
20060121 30
20060120 32
20060119 67
20060118 103
20060117 116
20060116 96
20060115 98
20060114 89
20060113 130
20060112 123

The code above is quite simple I hope, create-scanner creates a closure that I use in the now renamed record-string. The result is passed on to record-entry that updates the hash table of hits. parse-file loops through a file calling record-string for each line. printstats just prints out the keys/values of the hash table.

Ok, this raises the obvious question, what happened on the 27/28 of January? 30 visitors a day is unlikely enough (I'm not going to bother showing this, but there is a surprisingly large amount of bots traversing the net :) so what on earth triggered almost 2000 requests on the 28th? Let's have a look at where they come from. Instead of keeping a single number for each date, I'll have a hash table for each date, and in that hash table record the number of times each url appear as a referrer:



(defun record-entry (date request referrer)
  (declare (ignore request))
  (let* ((referrerhash (gethash date *datehash* (make-hash-table :test 'equal))))
    (incf (gethash referrer referrerhash 0))
    (setf (gethash date *datehash*) referrerhash)))

Hash tables by default uses eql to compare values. This works well with the numerical key I use for the dates, but won't work too well for the url strings. That is why I the :test argument to choose a different equality test. The possible choices are eq, eql, equal and equalp. For string keys equal will do nicely.


(defun printstats (filename)
  (setf *datehash* (make-hash-table))
  (parse-file filename)
  (dolist (date-entry (sort 
                       (loop for k being the hash-key of *datehash*
                             collect (list k (gethash k *datehash*)))
                       #'>
                       :key #'first))
    (format t "~%~a:~%" (first date-entry))
    (dolist (referrer (sort 
                       (loop for k being the hash-key of (second date-entry)
                             collect (list k (gethash k (second date-entry))))
                       #'>
                       :key #'second))
      (format t "  ~a ~a~%" (second referrer) (first referrer)))))

printstats is updated with an inner loop that loops over the hashes in the date hashtable, sorting by the number of request from the url. And the result:


COM.JALAT.LOGPARSE 19 > (printstats "~/poundlog")

20060128:
  621 http://www.jalat.com/blogs/lisp
  433 http://www.jalat.com/blogs/lisp?id=4
  242 http://planet.lisp.org/
  153 http://www.jalat.com/
  113 http://www.jalat.com/blogs/lisp?id=3
  67 http://www.jalat.com/blogs/lisp?id=1
  53 http://www.jalat.com/blogs/lisp?id=2
  42 http://weitz.de/hunchentoot/
  40 http://lemonodor.com/archives/001339.html
  31 http://lemonodor.com/
  28 http://www.lemonodor.com/
  17 http://www.google.com/reader/lens/
  7 http://keithdevens.com/weblog
  5 http://anarchaia.org/

I cut off most of this list, it's 108 different referrers for the 28th january and while there are some interesting entries there, they are not responsible for the spike on the 27/28th. It turns out quite a few people read http://planet.lisp.org and that Mr. John Wiseman aka. lemonodor mentioned my ramblings in his blog which happens to be aggregated on planet.lisp.org.

Anyway, the biggest referrer seems to be myself. Mostly links to images, is my guess. Shouldn't be too hard to check. In fact, since is is possible to use a list as a key to the hash table, I only have to make the key to my hash a list of (referrer request). A simple two line change:


(defun record-entry (date request referrer)
  (let* ((referrerhash (gethash date *datehash* (make-hash-table :test 'equal))))
    (incf (gethash (list referrer request) referrerhash 0)) 
    (setf (gethash date *datehash*) referrerhash)))

(defun printstats (filename)
  (setf *datehash* (make-hash-table))
  (parse-file filename)
  (dolist (date-entry (sort 
                       (loop for k being the hash-key of *datehash*
                             collect (list k (gethash k *datehash*)))
                       #'>
                       :key #'first))
    (format t "~%~a:~%" (first date-entry))
    (dolist (referrer (sort 
                       (loop for k being the hash-key of (second date-entry)
                             collect (list k (gethash k (second date-entry))))
                       #'>
                       :key #'second))
      (format t "  ~a ~a => ~a~%" (second referrer) (first (first referrer)) (second (first referrer)))))
  *datehash*)


COM.JALAT.LOGPARSE 22 > (printstats "~/poundlog")

20060128:
  266 http://www.jalat.com/blogs/lisp => /static-files/pieces.gif
  227 http://www.jalat.com/blogs/lisp => /static-files/stylesheet.css
  197 http://www.jalat.com/blogs/lisp?id=4 => /static-files/pieces.gif
  186 http://www.jalat.com/blogs/lisp?id=4 => /static-files/stylesheet.css
  96 http://planet.lisp.org/ => /blogs/lisp?id=4
  94 http://planet.lisp.org/ => /blogs/lisp
  85 http://www.jalat.com/ => /blogs/lisp
  54 http://www.jalat.com/blogs/lisp => /blogs/lisp?id=3
  53 http://www.jalat.com/blogs/lisp?id=3 => /static-files/stylesheet.css
  52 http://planet.lisp.org/ => /blogs/lisp?id=3
  42 http://weitz.de/hunchentoot/ => /
  40 http://www.jalat.com/ => /static-files/stylesheet.css
  33 http://www.jalat.com/blogs/lisp?id=2 => /blogs/lisp?id=1
  31 http://www.jalat.com/blogs/lisp?id=3 => /blogs/lisp?id=2
  29 http://www.jalat.com/blogs/lisp?id=1 => /static-files/flipper-linux.jpg
  24 http://www.jalat.com/blogs/lisp => /blogs/lisp?id=1
  21 http://lemonodor.com/archives/001339.html => /blogs/lisp

Not just pictures it seems. The stylesheet is a big contributor as well. I'm not all that interested in this as they aren't really hits, so I'm going to filter it out. At the same time i'll add in the missing hostname for the request strings.


(defun record-entry (date request referrer)
  (unless (scan "static-files|favicon.ico|flipper-linux.jpg" request)
    (let* ((referrerhash (gethash date *datehash* (make-hash-table :test 'equal)))
           (full-request (concatenate 'string "http://www.jalat.com" request)))
      (incf (gethash (list referrer full-request) referrerhash 0))
      (setf (gethash date *datehash*) referrerhash))))

scan is another function from cl-ppcre. It returns indexes to where the pattern is found in the string, or nil if the pattern is not found. I'm not interested in the index, but simply use it as a true/false test for if the pattern is in the string. The resulting output is:


20060128:
  96 http://planet.lisp.org/ => http://www.jalat.com/blogs/lisp?id=4
  94 http://planet.lisp.org/ => http://www.jalat.com/blogs/lisp
  85 http://www.jalat.com/ => http://www.jalat.com/blogs/lisp
  54 http://www.jalat.com/blogs/lisp => http://www.jalat.com/blogs/lisp?id=3
  52 http://planet.lisp.org/ => http://www.jalat.com/blogs/lisp?id=3
  42 http://weitz.de/hunchentoot/ => http://www.jalat.com/
  33 http://www.jalat.com/blogs/lisp?id=2 => http://www.jalat.com/blogs/lisp?id=1
  31 http://www.jalat.com/blogs/lisp?id=3 => http://www.jalat.com/blogs/lisp?id=2
  24 http://www.jalat.com/blogs/lisp => http://www.jalat.com/blogs/lisp?id=1
  21 http://lemonodor.com/archives/001339.html => http://www.jalat.com/blogs/lisp
  16 http://www.jalat.com/blogs/lisp => http://www.jalat.com/blogs/lisp?id=2
  16 http://www.jalat.com/blogs/lisp?id=1 => http://www.jalat.com/blogs/lisp?id=2
  15 http://www.jalat.com/blogs/lisp?id=3 => http://www.jalat.com/blogs/lisp?id=1
  15 http://www.jalat.com/blogs/lisp?id=4 => http://www.jalat.com/blogs/lisp?id=3
  15 http://lemonodor.com/archives/001339.html => http://www.jalat.com/blogs/lisp?id=4
  12 http://www.lemonodor.com/ => http://www.jalat.com/blogs/lisp
  12 http://www.lemonodor.com/ => http://www.jalat.com/blogs/lisp?id=4
  12 http://lemonodor.com/ => http://www.jalat.com/blogs/lisp
  11 http://lemonodor.com/ => http://www.jalat.com/blogs/lisp?id=4

It works even though it's not that pretty. I'm going to leave it at that anyway, because I'm trying to shorten these blog entries and I've hopefully demonstrated some regexps and hash table usage by now. Throw it into a table for more presentable output. And you probably want to combine www.domain.name and domain.name urls. I'm going to finish off with someting a bit different:

CL-DOT is a package by Juho Snellman that makes it easy to make data files for the dot program in the Graphviz suite. It works by defining methods that describe how different objects should be represented, and then passing a data structure to generate-graph.

generate-graph will generate a graph that can be saved in different ways, I just save it to a dot file, and later use dot to generate the picture. It is also possible to have cl-dot generate ps/jpeg/etc. directly.

(defparameter *dothash* (make-hash-table))

(defun record-entry (date request referrer)
  (unless (scan "static-files" request)
    (let* ((referrerhash (gethash date *datehash* (make-hash-table)))
           (interned-request (intern (concatenate 'string "http://www.jalat.com" request)))
           (interned-referrer (intern referrer))
           (ref-dothash (gethash interned-referrer *dothash* (make-hash-table))))
      (incf (gethash (list interned-referrer interned-request) referrerhash 0))
      (setf (gethash date *datehash*) referrerhash)
      (incf (gethash interned-request ref-dothash 0))
      (incf (gethash :num-referrals ref-dothash 0))
      (setf (gethash interned-referrer *dothash*) ref-dothash))))

I have to make one workaround. cl-dot keeps track of object by keeping them in a hash. That means that I have to turn the strings into symbols. intern does just that. *dothash* Each entry in the *dothash* hash has a url as a key and the value is another hash which lists the urls the first url refers to and the number of times it refers to them. I'm also putting in a special symbol :num-referrals that contains the sum of the urls referrals.

BTW: The use of hash tables here is a bit dubious. Each url usually only refers to a couple of other urls. So the hash tables are very small, and the overhead is high. A list would probably have been a more efficient choice.


(defpackage :com.jalat.logparse
  (:use :cl :cl-ppcre :cl-dot))

(defmethod object-node ((object list))
  nil)

(defmethod object-knows-of ((object list))
  object)

I'm going to pass a bunch of urls in a list to the generate-graph function. I don't want to print the list in the graph, it's just a container for the data, so object-node for lists just return nil. I do need cl-dot to model the contents of the list, so I have to make the contents known. I do this with the object-knows-of method.


(defmethod object-node ((object symbol))
  (make-instance 'node
                 :attributes (list :label object
                                   :shape :box)))

(defmethod object-points-to ((object symbol))
  (let ((reqhash (gethash object *dothash*)))
    (when reqhash
      (remove-if 'null
                 (loop for i being the hash-key of reqhash
                       collect (unless (eql :num-referrals i)
                                 (make-instance 'attributed
                                                :object i
                                                :attributes (list :label (gethash i reqhash)))))))))

I'm going to represent symbols (urls) as boxes, and the symbol itself as the label of the box. Urls point to other urls, finding the urls this url refers to is as simple as looking it up in the *dothash* table and removing the :num-referrals symbol.


(defun make-graph (filename)
  (with-open-file (out filename :direction :output :if-exists :supersede)
    (cl-dot:print-graph
     (cl-dot:generate-graph 
      (mapcar #'first
              (subseq (sort (loop for k being the hash-key of *dothash*
                                  collect (let ((refhash (gethash k *dothash*)))
                                            (list k (gethash :num-referrals refhash))))
                            #'>
                            :key #'second)
                      0 20)))
     out)))



I could just generate a huge graph with all the urls, but it gets quite messy. Here the loop generates a list of (url num-referrals), sort sorts it based on the num-referrals, subseq makes alist of the first 20, mapcar makes a list of only the urls, generate-graph makes the graph, and print-graph writes it to a file.

Using dot I can now generate a graph of the top referring urls to my site. As you can see it's a bit messy even with 20 urls. (It's also blurry due to resizing...) I still think it's kind of neat in a way.

As usual: Feedback welcome at asbjxrn@bjxrnstad.net

The blog

Obviously I'm no longer using the software I wrote here, and this code is outdated now. Hunchentoot has changed quite a bit since I wrote this. You've been warned.

I want to make a blog framework. Because, if there is one thing the world needs these days, it's another blog framework. I also couldn't find much example code around for TBNL, a web framework by Edi Weitz which I've been looking at looking at. (I didn't search that hard for examples, and quite a bit of the code below is from examples I found after starting to write this.) Anyway, I figured I'll just make some example code myself and keep it around in case I need it later.

Bill Clementson has written about getting TBNL up and running with apache and mod_lisp. In this example I'm going to use hunchentoot, a pure lisp web server by (again) Edi Weitz.

After reading the the third chapter of Practical common lisp by Peter Seibel we choose to use simple lists to store data about the blog, because ripping off code is way easier than writing our own. We'll keep individual entries in separate files so writing and editing them can be done with whatever tools we want to use.

Let's get started:


(defparameter *blog-db-item-dir* "/home/asbjxrn/blog/items/")
(defparameter *blog-db-file* "/home/asbjxrn/blog/blog-db")
(defparameter *blog-db* ())

(defun create-blog (&key id title description author)
  "Create a new blog by pushing a list containg a blog description onto the blog database.
Return the updated blog-database."
  (let ((new-blog (list
                   :id id
                   :title title
                   :description description
                   :author author
                   :items ())))
    (push new-blog *blog-db*)
    new-blog))

As you can see, I've made room for growth. *blog-db* is a list of blogs. This means we need some functions to find/list/delete the blogs. The print functions I only used for debugging/testing of the blog framework.


(defparameter *blog-hosts-prefix* "http://www.jalat.com")
(defparameter *blog-script-name-prefix* "/blogs/")

(defun print-blog-headers (blog)
  "Prints out everything but the items of a blog."
  (format t "Name: ~a~%Title: ~a~%Description: ~a~%Homepage: ~a~%Author: ~a~%~%"
          (getf blog :id)
          (getf blog :title)
          (getf blog :description)
          (getf blog :author)))

(defun print-blogs ()
  "Prints out the name, title, homepage, descripton and author of all the registered blogs"
  (loop for blog in *blog-db*
        do (print-blog-headers blog)))

(defun homepage (blog)
  "Generates the homepage uri for the blog"
  (format nil "~a~a~a" *blog-hosts-prefix* *blog-script-name-prefix*
          (getf blog :id)))
                 
(defun get-blog (name)
  "Returns the blog identified by \"name\""
  (when (stringp name)
    (find-if #'(lambda (x) (string= name (getf x :id)))
             *blog-db*)))

(defun delete-blog (name)
  "Deletes the blog identified by \"name\". Destructive on *blog-db*"
  (when (stringp name)
    (setf *blog-db*
          (remove-if #'(lambda (x) (string= name (getf x :id)))
                     *blog-db*))))

So far so good, time to add some functions for handling the items in each blog. since values passed to us through a get or post are strings, I parse the id arguments if they are strings:


(defun max-blog-id (blog)
  "Find the highest id of any item in a blog."
  (if (getf blog :items)
    (apply #'max (loop for entry in (getf blog :items)
                       collect (getf entry :id)))
    0))

(defun add-blog-item (blog &key title file description)
  "Add a new item in a blog."
  (push
   (list :title title
         :file file
         :description description
         :pub-time (get-universal-time)
         :id (+ 1 (max-blog-id blog)))
   (getf blog :items)))

(defun delete-blog-item (blog id)
  "Deletes a item in a blog. Destructive on the blog"
  (if (stringp id)
    (setf id (parse-integer id :junk-allowed t)))
  (setf (getf blog :items)
        (remove-if #'(lambda (x) (eql id (getf x :id)))
                   (getf blog :items))))

(defun get-blog-item (blog id)
  "Find blog item with id id."
  (if (stringp id)
    (setf id (parse-integer id :junk-allowed t)))
  (find-if #'(lambda (x) (eql id (getf x :id)))
           (getf blog :items)))

(defun get-last-blog-item (blog)
  "Return the last item for a blog"
  (get-blog-item blog (max-blog-id blog)))

(defun list-blog-items (blog)
  "Print a list of items in a blog."
  (format t "~{~{~a:~t~a~%~}~%~}" (getf blog :items)))

A couple of functions to store and load our data from disk:


(defun save-blog-db ()
  "Simply write out the blog structure to the db file."
  (with-open-file (out *blog-db-file*
                       :direction :output
                       :if-exists :supersede)
    (with-standard-io-syntax
      (print *blog-db* out))))

(defun load-blog-db ()
  "Simple read in the whole db in one operation."
  (with-open-file (in *blog-db-file*)
    (with-standard-io-syntax
      (setf *blog-db* (read in)))))

And some functions to read the blog entries from disk. To improve performance one might want to keep everything in memory, but I don't think that is going to be an issue. Besides, it's convenient to be able to just edit the files themselvesm, and have the blog update automatically.


(defun slurp-file (filename)
  "Dump the contents of a file into a string."
  (with-open-file (stream filename)
    (let ((seq (make-string (file-length stream))))
      (read-sequence seq stream)
      seq)))

(defun get-blog-file (blog id)
  "Reads the file of a blog item and returns it as a string."
  (let* ((item (get-blog-item blog id))
         (filename (concatenate 'string *blog-db-item-dir* (getf item :file))))
    (slurp-file filename)))

I think I'm done with the backend. Time to start with TBNL. First some convenience functions to start and stop the server. In the start function I also set up my dispatch-table:

  • /blogs/<foo> is handled by the function blog-page
  • /admin is handled by the function manage-blogs
  • A folder dispatcher serves all files in my static-files directory
  • And I have a separate static file dispatcher that handles "/favicon.ico", generously stolen from planet.lisp.org.

Btw. These functions are Hunchentoot specific, use start-tbnl if you use mod_lisp, araneida or something else.


(defparameter *blog-script-name-prefix* "/blogs/")
(defparameter *blog-admin-uri* "/admin")
(defparameter *blog-static-uri* "/static-files/")
(defparameter *blog-static-dir* "/home/asbjxrn/blog/static-files/")
(defparameter *blog-server* nil)
(defparameter *log-file* "/home/asbjxrn/blog/log")

(defun start-blog (&key port)
  "starts up the server, and initializes our dispatch-table"
  (setf (log-file) (make-pathname :defaults *log-file*))
  (load-blog-db)
  (setq *blog-server*
        (tbnl:start-server :port port)
        *dispatch-table*
        (list (create-prefix-dispatcher *blog-script-name-prefix* 'blog-page)
              (create-prefix-dispatcher *blog-admin-uri* 'manage-blogs)              
              (create-folder-dispatcher-and-handler *blog-static-uri*
                                                    (make-pathname :defaults *blog-static-dir*))
              (create-static-file-dispatcher-and-handler "/favicon.ico"
                                                         (merge-pathnames
                                                          (make-pathname :defaults *blog-static-dir*)
                                                          "favicon.ico"))
              (create-prefix-dispatcher "/index.html" 'front-page)                            
              (create-prefix-dispatcher "/the-blog.html" 'manage-blogs)              
              'default-dispatcher)))

(defun stop-blog ()
  (tbnl:stop-server *blog-server*))

All blogs are handled by the same function which will use request-uri to get hold of the name of the blog the user wants to read. So I need a function that gets hold of the name by removing the /blogs/ part of the uri.
At the same time, I'll make a function that prints out a timestamp in a readable way. (Totally ignoring things like timezones etc.)


(defun remove-prefix (uri prefix)
  "Returns the prefix from an uri in search of a blog name. 
Big assumtion: Blog names consists only of latin characters."
  (let ((scanstring (concatenate 'string prefix "([A-Za-z]+)")))
    (multiple-value-bind (dummy matchvector) (cl-ppcre:scan-to-strings scanstring uri)
      (declare (ignore dummy))
      (if matchvector
        (svref matchvector 0)
        ""))))

(defun timestring (timestamp)
  "Takes a timestamp and returns a string formatted with year-month-day hour:minutes"
  (multiple-value-bind (sec min hour day mnt year weekday dst tz)
      (decode-universal-time timestamp)
    (declare (ignore sec weekday dst tz))
    (format nil "~d-~d-~d ~2,'0d:~2,'0d" year mnt day hour min)))

HTML coming up. At long last some HTML generation. I'm going to use the CL-WHO library written by Edi Weitz to generate the HTML, but you can use whatever you want as long as your handler functions returns strings containing HTML. One nice thing about this approac is that you can just call the functions from the REPL and see if you get the HTML you expect. Use :indent t to make it easier to read.

The first function is a function I'll use to make a navigation bar. It takes a list of items, and generates a list of anchor tags, where the title of the item is used as the text of the link and the id of the item is used to create a uri like "?id=5". So these uris will only point back to the same page, but I'll use the parameter to decide what to display.


(defun navigate-items (items)
  "Returns html code four our navigation sidebar."
  (with-html-output-to-string (*standard-output* nil :indent t)
    (:ul
     (loop for item in items
           do (htm
               (:li
                (:a :href (format nil "?id=~a" (getf item :id))
                 (fmt "~a" (getf item :title)))
                (:br)
                (when (getf item :pub-time)
                  (htm (fmt "~a" (timestring (getf item :pub-time)))
                       (:br)))))))))

Finally the function that displays the blog item.
First I use script-name and remove-prefix to get the name of the blog, returning a 404 if the blog doesn't exist.
Then I use (get-parameter "id") to get the ?id=.. part of the uri if it exists. If it doesn't exist or no item with the specified id exist, I just show the latest item for this blog.

The page is split up in three parts, a header, the content, and a navigation bar.


(defparameter *blog-css-file* "/static-files/stylesheet.css")

(defun blog-page ()
  "Finally, the function that writes the html for our blog output."
  (let ((blog (get-blog (remove-prefix (script-name) *blog-script-name-prefix*))))
    (unless blog
      (setf (return-code *reply*)
            +http-not-found+)
      (return-from blog-page))
    (let* ((item-id (or (and (get-parameter "id")
                             (parse-integer (get-parameter "id")))
                        (max-blog-id blog)))
           (item    (get-blog-item blog item-id)))
      (unless item
        (redirect (format nil "~a?id=~a" (script-name) (max-blog-id blog)))
        (return-from blog-page))
      (with-html-output-to-string (*standard-output* nil :prologue t :indent t)
        (:html
         (:head
          (:title (fmt (concatenate 'string (getf blog :title) " - " (getf item :title))))
          (:link :href *blog-css-file* :rel "stylesheet" :type "text/css"))
         (:body
          (:div :id "Header"
           (:h1 (:a :href (getf blog :homepage)
                 (fmt "~a" (getf blog :title)))))
          (:div :id "Content"
           (:h3 (fmt "~a" (getf item :title)))
           (:p (fmt "~a" (get-blog-file blog (getf item :id)))))
          (:div :id "Menu"
           (:h4 "Blogged:")
           (fmt "~a" (navigate-items (getf blog :items))))))))))

And that's it. Get hold of some css, stuff it in /static-files/ and you should have a working website where people can go to read your latest musings. Of course, updating is a bit inconvenient as you have to execute commands create-blog/add-blog-item/etc. at the REPL. A simple form where one could maintain the data would me a lot more convenient...

It could look something like this:


(defun blog-form (blog)
  "A form with blog data."
  (with-html-output-to-string (*standard-output* nil :indent t)
    (:h3 (:a :href (homepage blog)
          (fmt "~a" (or (getf blog :title) ""))))
    (:h5 (fmt "~a" (or (getf blog :description) "")))
    (:form :method :post 
     (:table 
      (:tr
       (:td "Title: ")
       (:td
        (:input :type :text
         :name "new-blog-title"
         :value (or (getf blog :title) ""))))
      (:tr
       (:td "Description: ")
       (:td
        (:input :type :text
         :name "new-blog-description"
         :value (or (getf blog :description) ""))))
      (:tr
       (:td "Homepage: ")
       (:td (fmt "~a~a" *blog-hosts-prefix* *blog-script-name-prefix*)
        (:input :type :text
         :name "new-blog-id"
         :value (or (getf blog :id) ""))))
      (:tr
       (:td "Author: ")
       (:td
        (:input :type :text
         :name "new-blog-author"
         :value (or (getf blog :author) ""))))
      (:tr
       (:td (:input :type :submit :name "delete-blog" :value "Delete blog"))
       (:td (:input :type :submit :name "update-blog" :value "Update blog")))))))

Not particulary pretty, but this is not for public consumption, so I don't care. To apply the changes we make in the form, I'll call this function:


(defun update-blog (blog)
  "parses the post parameters and updates the blog accordingly"
  (let ((post-parameter-p)
        (new-blog (or blog ())))
    (when (post-parameter "new-blog-title")
      (setf (getf new-blog :title) (post-parameter "new-blog-title"))
      (setq post-parameter-p t))
    (when (post-parameter "new-blog-description")
      (setf (getf new-blog :description) (post-parameter "new-blog-description"))
      (setq post-parameter-p t))
    (when (post-parameter "new-blog-id")
      (setf (getf new-blog :id) (post-parameter "new-blog-id"))
      (setq post-parameter-p t))
    (when (post-parameter "new-blog-author")
      (setf (getf new-blog :author) (post-parameter "new-blog-author"))
      (setq post-parameter-p t))
    (when post-parameter-p
      (unless blog
        (setf blog (apply 'create-blog new-blog)))
      (save-blog-db)))
  blog)

And similar functions for items within the blog, and a function that loops through the items in a blog and makes a form for each one:


(defun item-form (item)
  "A simple one-line form for a blog item"
  (with-html-output-to-string (*standard-output* nil)
    (:form :method :post 
    (:input :type :hidden :name "item-id" :value (getf item :id))
     (:tr
      (:td
       (:input :type :text
        :name "new-item-title"
        :value (or (getf item :title) "")))
      (:td
       (:input :type :text
        :name "new-item-description"
        :value (or (getf item :description) "")))
      (:td
       (:input :type :text
        :name "new-item-file"
        :value (or (getf item :file) "")))
      (:td (:input :type :submit :name "update-item" :value (if item "Update item" "Save item")))
      (when item
        (htm (:td (:input :type :submit :name "delete-item" :value "Delete item"))))))))

(defun update-item (blog)
  "Updates an item in a blog."
  (let ((post-parameter-p)
        (new-item (or (get-blog-item blog (post-parameter "item-id"))
                      ())))
    (when (post-parameter "new-item-title")
      (setf (getf new-item :title) (post-parameter "new-item-title"))
      (setq post-parameter-p t))
    (when (post-parameter "new-item-description")
      (setf (getf new-item :description) (post-parameter "new-item-description"))
      (setq post-parameter-p t))
    (when (post-parameter "new-item-file")
      (setf (getf new-item :file) (post-parameter "new-item-file"))
      (setq post-parameter-p t))
    (when post-parameter-p
      (unless (get-blog-item blog (getf new-item :id))
        (apply 'add-blog-item blog new-item)))
    (when post-parameter-p
      (save-blog-db))))


(defun blog-items-forms (blog)
  "makes a list of forms, one for each item in the blog"
  (with-html-output-to-string (*standard-output* nil :indent t)
    (:table 
     (:tr
      (:th "Title") (:th "Description") (:th "File"))
     (fmt (item-form nil))
     (loop for item in (getf blog :items)
           do (fmt (item-form item))))))

The observant reader will have noticed that there are delete-buttons in the forms that are not handled by the above functions. I'm going to do that in this function that will be the central calling point from our admin-page:


(defun handle-update (blog)
  "apply changes to blog based on post parameters in the request"
  (cond
   ((post-parameter "delete-blog") (progn (delete-blog (getf blog :id))
                                     (redirect (format nil "~a?id=~a" (script-name)))))
   ((post-parameter "delete-item") (delete-blog-item blog (post-parameter "item-id")))
   ((post-parameter "update-blog") (update-blog blog))
   ((post-parameter "update-item") (update-item blog)))
  (redirect (format nil "~a" (request-uri))))

Finally the blog admin page. The only new thing here is that we put in some authorization control to prevent just anyone to connect and mess up our data.


(defun manage-blogs ()
  "Let the user manage his blogs"
  (multiple-value-bind (user password)
      (authorization)
    (cond ((or (string/= user "admin")
               (string/= password "admin"))    ; Noone will guess this.
           (require-authorization))
          (t
           (let ((blog (get-blog (get-parameter "id"))))
             (when (post-parameters)
               (handle-update blog))
             (with-html-output-to-string (*standard-output* nil :prologue t :indent t)
               (:html
                (:head
                 (:title (fmt (concatenate 'string
                                           (getf blog :title)
                                           " - admin - "
                                           (or (getf blog :title) ""))))
                 (:link :href *blog-css-file* :rel "stylesheet" :type "text/css"))
                (:body
                 (:div :id "Header"
                  (:h1 (fmt "Admin - ~a" (or (getf blog :title) ""))))
                 (:div :id "Content"
                  (fmt "~a" (blog-form blog))
                  (fmt "~a" (blog-items-forms blog)))
                 (:div :id "Menu"
                  (:a :href  (format nil "~a?id=" (script-name))
                   "New blog")
                  (:h4 "Blogs:")
                  (fmt "~a" (navigate-items *blog-db*)))))))))))

That is about it. There is also a dispatcher set up for index.html but you should be able to put something together yourself now and this write-up has become far longer than I was planning to make it. Other exercises for the reader are: generating rss feeds, file upload though the form, comments, etc.

As usual: Feedback welcome at asbjxrn@bjxrnstad.net

Monday, October 6, 2008

Let's dance

There is this sudoku craze going around the world. I got to admit, for a while I were writing numbers in squares as well. But I (quickly, I like to think.) realized that it was basically just different ways of counting which numbers are already used and filling in the rest, and repeat the process over and over and over again.

Computers are good at counting, I thought, so I wrote a program to do the work for me. Using the same logic I did myself to eliminate choices and filling in squares when only one choice were left for a square. It worked well for a while, but then I found a puzzle it didn't solve, and I had to add some more rules. And it worked for a while, but then I found another puzzle. And so on.

I'm not going to show you that program. I finally gave up on the logic approach, you never know if there is some pattern you have missed until you stumble upon it. Instead I going to show the brute force approach. It's not obvious how well that would work, sudoku puzzles with as few as 17 starting hints are known. Without doing any kind of narrowing of the search, that means there are (expt 9 (- 81 17)) combinations to search through. That is a big number.

Luckily, smarter people than me have played around with sudoku as well. And one of them, Donald E. Knuth, has come up with an algorithm that's pretty good. He calls it "Dancing links". Actually, he didn't really make it to solve sudokus, Knuth wanted to pack shapes into a container.

A small example. Suppose you have the following pieces and want to pack them onto a 3x3 square, without rotating any of them. To solve this problem you could set up a matrix so that each row represents a placement of a piece. The first 9 columns represent the squares in the 3x3 square. The last three columns represent the piece used for that row (marked L, B and S for L-shape, Big square and Small square.) We get something like this (I'm using _ to make it easier to read):

                           L B S
                      
1)  1 1 1  1 _ _  _ _ _    1 _ _    ; L in top/middle row.
2)  _ _ _  1 1 1  1 _ _    1 _ _    ; L in middle/bottom row

3)  1 1 _  1 1 _  _ _ _    _ 1 _    ; Big square in top left corner
4)  _ 1 1  _ 1 1  _ _ _    _ 1 _    ; Big square in top right corner
5)  _ _ _  1 1 _  1 1 _    _ 1 _    ; Big square in bottom left corner
6)  _ _ _  _ 1 1  _ 1 1    _ 1 _    ; Big square in bottom right corner

7)  1 _ _  _ _ _  _ _ _    _ _ 1    ; Small square top left
8)  _ 1 _  _ _ _  _ _ _    _ _ 1    ; Small square top middle 
9)  _ _ 1  _ _ _  _ _ _    _ _ 1    ; etc.
10) _ _ _  1 _ _  _ _ _    _ _ 1
11) _ _ _  _ 1 _  _ _ _    _ _ 1
12) _ _ _  _ _ 1  _ _ _    _ _ 1
13) _ _ _  _ _ _  1 _ _    _ _ 1
14) _ _ _  _ _ _  _ 1 _    _ _ 1
15) _ _ _  _ _ _  _ _ 1    _ _ 1

Now we can restate the problem. We will select rows in such a way that if we add up the selected rows we get one and only one one (1) in each column. In this example row 1, 6 and 13 solves the problem. Note also that the last three columns while not representing a square that needs to be filled on the board, is treated no differently.

We can look at each column as a constraint. The last three columns forces us to use each piece only once. And the first nine columns forces us to use each square of the board only once.

Now if you're anything like me you would solve this by first picking the first row, then mentally remove all rows that have a 1 in any of the columns where the first row has a 1. Then mentally remove the columns where the row has a 1 and then looking for the first row that has a 1 in the remaining columns, then repeat that process until there are no more columns to fill. If you had started out with the third row instead of the first row, you would at one point have realized that you couldn't solve the puzzle, and you would put the columns rows back in and chosen another row. You could just start over, but that's inefficient for big puzzles.

Let's make another small puzzle, fill a 3x3 square, and use the numbers 1, 2 and 3 in it so that each row and column have all three numbers in them, but only once in each row (Obviously). I'll make a matrix based on the row, column and value of the number in the square like this:


r,c=v         r,c                r,v                c,v
             
1,1=1   1 _ _ _ _ _ _ _ _  1 _ _ _ _ _ _ _ _  1 _ _ _ _ _ _ _ _
1,1=2   1 _ _ _ _ _ _ _ _  _ 1 _ _ _ _ _ _ _  _ 1 _ _ _ _ _ _ _
1,1=3   1 _ _ _ _ _ _ _ _  _ _ 1 _ _ _ _ _ _  _ _ 1 _ _ _ _ _ _
1,2=1   _ 1 _ _ _ _ _ _ _  1 _ _ _ _ _ _ _ _  _ _ _ 1 _ _ _ _ _
1,2=2   _ 1 _ _ _ _ _ _ _  _ 1 _ _ _ _ _ _ _  _ _ _ _ 1 _ _ _ _
1,2=3   _ 1 _ _ _ _ _ _ _  _ _ 1 _ _ _ _ _ _  _ _ _ _ _ 1 _ _ _
1,3=1   _ _ 1 _ _ _ _ _ _  1 _ _ _ _ _ _ _ _  _ _ _ _ _ _ 1 _ _
1,3=2   _ _ 1 _ _ _ _ _ _  _ 1 _ _ _ _ _ _ _  _ _ _ _ _ _ _ 1 _
1,3=3   _ _ 1 _ _ _ _ _ _  _ _ 1 _ _ _ _ _ _  _ _ _ _ _ _ _ _ 1

             
2,1=1   _ _ _ 1 _ _ _ _ _  _ _ _ 1 _ _ _ _ _  1 _ _ _ _ _ _ _ _
2,1=2   _ _ _ 1 _ _ _ _ _  _ _ _ _ 1 _ _ _ _  _ 1 _ _ _ _ _ _ _
2,1=3   _ _ _ 1 _ _ _ _ _  _ _ _ _ _ 1 _ _ _  _ _ 1 _ _ _ _ _ _
2,2=1   _ _ _ _ 1 _ _ _ _  _ _ _ 1 _ _ _ _ _  _ _ _ 1 _ _ _ _ _
2,2=2   _ _ _ _ 1 _ _ _ _  _ _ _ _ 1 _ _ _ _  _ _ _ _ 1 _ _ _ _
2,2=3   _ _ _ _ 1 _ _ _ _  _ _ _ _ _ 1 _ _ _  _ _ _ _ _ 1 _ _ _
2,3=1   _ _ _ _ _ 1 _ _ _  _ _ _ 1 _ _ _ _ _  _ _ _ _ _ _ 1 _ _
2,3=2   _ _ _ _ _ 1 _ _ _  _ _ _ _ 1 _ _ _ _  _ _ _ _ _ _ _ 1 _
2,3=3   _ _ _ _ _ 1 _ _ _  _ _ _ _ _ 1 _ _ _  _ _ _ _ _ _ _ _ 1

             
3,1=1   _ _ _ _ _ _ 1 _ _  _ _ _ _ _ _ 1 _ _  1 _ _ _ _ _ _ _ _
3,1=2   _ _ _ _ _ _ 1 _ _  _ _ _ _ _ _ _ 1 _  _ 1 _ _ _ _ _ _ _
3,1=3   _ _ _ _ _ _ 1 _ _  _ _ _ _ _ _ _ _ 1  _ _ 1 _ _ _ _ _ _
3,2=1   _ _ _ _ _ _ _ 1 _  _ _ _ _ _ _ 1 _ _  _ _ _ 1 _ _ _ _ _
3,2=2   _ _ _ _ _ _ _ 1 _  _ _ _ _ _ _ _ 1 _  _ _ _ _ 1 _ _ _ _
3,2=3   _ _ _ _ _ _ _ 1 _  _ _ _ _ _ _ _ _ 1  _ _ _ _ _ 1 _ _ _
3,3=1   _ _ _ _ _ _ _ _ 1  _ _ _ _ _ _ 1 _ _  _ _ _ _ _ _ 1 _ _
3,3=2   _ _ _ _ _ _ _ _ 1  _ _ _ _ _ _ _ 1 _  _ _ _ _ _ _ _ 1 _
3,3=3   _ _ _ _ _ _ _ _ 1  _ _ _ _ _ _ _ _ 1  _ _ _ _ _ _ _ _ 1

If we now do as above and choose rows out of this matrix such that we get one and only one one (1) for each column, you can see that the first 9 columns of the matrix forces us to choose only one value for each pair r,c. The next 9 columns forces us to use each value only once for each row (If I chose the first line, (put "1" in location "1,1", I can't choose line 4 (put "1" in "1,2")). And the last columns forces us to only use each value once for each column.

This is almost how I'm going to represent the sudoku puzzle, soduku have one more constraint, the 9x9 square is divided into 9 3x3 blocs, and each block also have to contain the numbers 1-9. This is easy enough to accomplish. We just add another block to our matrix, so in addtion to (r,c) (r,v) (c,v) we have (b,v). This means our matrix is big enough that I'm not going to bother writing it out. Each row has 81 columns for the (r,c) constraint, 81 columns for (r,v), 81 columns for (c,v) and 81 columns for (b,v) => 364 columns. And the matrix has one row for each combination of r,c,v => 9*9*9 => 729 rows.

The point I wanted to illustrate with this detour before I start describing the algorithm is that you don't have to come up with a completely new algorithm for a problem if you know a good algorithm that can be used if you look at the problem in a different way.

When representing a matrix, it could be tempting to represent it as an array. It is probaly the most common way of representing matrixes. However, Knuth used double-linked circular lists for this algorithm, and for a good reason. When you remove an element from a double-linked list, the element remembers it's position in the list if you don't change the elements next/previous pointers. This makes it very easy to reinsert the element again. So when we guess wrong in our algorithm it is easy to backtrack a few steps.

The only thing to watch out for is to reinsert the elements in the opposite order they were taken out of the list, which is what you would do in a depth first search anyway.

So what is the data structure going to look like?

  • Each "1" will be represented by a matrix element, we will not represent the gaps in the matrix in any way. They are not interesting.
  • Each element get a left and right pointer which points to the previous/next element in the row. It's a circular list, so the first and the last element in each row points to each other.
  • Each element also get a up/down pointer with points to the previous/next element in the column. The top and bottom element points to a special element that is the header for that column.
  • Header elements have right/left pointers to each other in a doubly linked list as well. The first and the last header element points to a node that is our entry point to the whole matrix.
  • The header elements also has a name slot, which is not really neccessary but useful during debugging, and a size slot, which is the number of elements left in that column. We'll choose columns to search based on how many elements are left in the columns. If we search the column with the smallest number of elements first, we will reduce the search tree and the whole thing will go a lot faster.
  • Each node in the matrix also has extra slots, one triplet representing (row column value). This is used to print out the solution. And they also have a pointer to the header element of the column they are in.
  • Finally, I'm going to make a array with one element for each row in the matrix, this will point to the first element on each row, and it will be used when setting up the starting position.
The whole thing should look someting like, oh, buggr'it, go read Knuths paper. It's got pretty pictures and stuff. Maybe I'll be bored enough one day to add a diagram here.

With that out of the way, on to the implementation:


(defpackage :sudoku
    (:use :common-lisp))
  
(in-package :sudoku)

(defclass matrix-element ()
  ((up :accessor up :initarg :up)
   (down :accessor down :initarg :down)
   (left :accessor left :initarg :left)
   (right :accessor right :initarg :right)))

Now, if I were very concerned about performance I might use something else than classes for this, but Knuth says premature optimization is evil, so I wont. Besides, the best optimization is to use a good algorithm, which I think I'm doing. It's not like I made it myself.

(defmethod initialize-instance :after ((me matrix-element) &rest initargs &key &allow-other-keys)
  (setf (up me) me)
  (setf (down me) me)
  (setf (left me) me)
  (setf (right me) me))

Once nice thing about classes is that you can make them do stuff when you initialize them by adding an :after method on the initialize-instance method. What I do above is to initialize all the pointers of the matrix element to point back at the newly created element instance. This means that a newly created instance is a tiny circular list with itself as the only object and I don't have to worry about wheter a single element is a list or not, I just treat everything as lists.

(defclass column-header (matrix-element)
  ((size :accessor size :initform 0)
   (name :accessor name :initarg :name)))

(defclass column-element (matrix-element)
  ((column-header :accessor column-header :initarg :column-header)
   (triplet :accessor triplet :initarg :triplet)))

I'm creating two subclasses, to add slots for the column headers and the matrix elements as I mentioned above.

(defun insert-above (element new)
  "Insert a new element above the existing element. And increase size of column by 1"
  (setf (up new) (up element)
        (down new) element
        (down (up element)) new
        (up element) new)
  (incf (size (column-header new)))
  element)

(defun insert-rightof (element new)
  (setf (right new) (right element)
        (left new) element
        (left (right element)) new
        (right element) new)
  element)

Our insertion functions. I'm building up the matrix right to left and top to bottom. Since the columns are circular lists the easiest way to insert an element at the bottom of the list is to insert it above the column header.

(defmethod initialize-instance :after ((me column-element) &rest initargs &key &allow-other-keys)
  (if (column-header me)
    (insert-above (column-header me) me)))

When I initialize the matrix elements I'm going to pass the column header as an argument to the make-instance call. This method puts the element into the circular list for that column.

(defun remove-horizontal (element)
  (setf (left (right element)) (left element)
        (right (left element)) (right element))
  element)

(defun replace-horizontal (element)
  (setf (left (right element)) element
        (right (left element)) element)
  element)

The core of the algorithm, removing and replacing an element from a list. These elements assumes that when you reinsert the element, the list is in the same state as it was when the element was removed, in other words, that (left element) and (right element) points to each other.
(defun remove-vertical (element)
  (setf (up (down element)) (up element)
        (down (up element)) (down element))
  (decf (size (column-header element)))
  element)

(defun replace-vertical (element)
  (setf (up (down element)) element
        (down (up element)) element)
  (incf (size (column-header element)))  
  element)

Same thing vertically, this time I update the size of the column as well.

(defmacro rcv-loop (&rest body)
  `(loop for r from 0 to 8
         do (loop for c from 0 to 8
                  do (loop for v from 0 to 8
                           do ,@body))))


A macro that I'm going to use to build up the matrix. This is probably bad style since I rely on body using the variables r c and v in the body.

(defun add-constraint-headers (matrix elem1 elem2)
  (loop for i from 0 to 8
        do (loop for j from 0 to 8
            do (insert-rightof (left matrix) 
                               (make-instance 'column-header
                                              :name (format nil "~a: ~a/~a: ~a" elem1 (+ 1 i) elem2 (+ 1 j)))))))

(defun sudoinit ()
  "returns an initial matrix representing an empty board."
  (let ((matrix (make-instance 'matrix-element))
        (col-array (make-array (* 4 9 9)))
        (row-array (make-array (* 9 9 9))))
    (add-constraint-headers matrix "Row" "Col")
    (add-constraint-headers matrix "Row" "Number")      
    (add-constraint-headers matrix "Col" "Number")      
    (add-constraint-headers matrix "Block" "Number")
    (loop with foo = (right matrix)
          for i from 0
          until (eql foo matrix)
          do (setf (aref col-array i) foo
                   foo (right foo)))
    (rcv-loop
     (let* ((triplet (list r c v))
            (r-c (make-instance 'column-element
                                :column-header (aref col-array (+ (* 9 r)
                                                                  c))
                                :triplet triplet))
            (r-v (make-instance 'column-element
                                :column-header (aref col-array (+ (* 9 9)
                                                                  (* 9 r)
                                                                  v))
                                :triplet triplet))
            (c-v (make-instance 'column-element
                                :column-header (aref col-array (+ (* 2 9 9)
                                                                  (* 9 c)
                                                                  v))
                                :triplet triplet))
            (b-v (make-instance 'column-element
                                :column-header (aref col-array (+ (* 3 9 9)
                                                                  (* 9 (+ (* 3 (floor r 3))
                                                                          (floor c 3)))
                                                                  v))
                                :triplet triplet)))
       (insert-rightof r-c r-v)
       (insert-rightof r-v c-v)
       (insert-rightof c-v b-v)
       (setf (aref row-array (+ (* 9 (+ (* 9 r) c)) v)) r-c)))
    (values matrix row-array)))

Ok, this is the big one. I first create the row of header elements, adding a short descriptive name in each element. I collect them in a array as well, so it is easy to reference the headers when adding the elements later.

Each row in the matrix has 4 elements. One for the row/col constraint, one for row/value, one for col/value and one for block/value. Remember that the make-instance method for column-elements automatically inserts the element in the right column if I pass it a :column-header argument.

Finally I arrange the elements in a row and put a pointer to the row in my row-array.


(defun cover-col (col-header)
  (do ((elem (down col-header) (down elem)))
      ((eql elem col-header) col-header)
    (do ((row-elem (right elem) (right row-elem)))
        ((eql row-elem elem) row-elem)
      (remove-vertical row-elem)))
  (remove-horizontal col-header))

Knuth calls the hiding of columns/rows for covering, so I'm going to use the same name. This function takes a column header, and for each row that thas an element in this column, it will hide the row vertically so that this row can not be chosen later. So if col-header is the header of the first column, this will remove all rows with a element in the first column.

(defun uncover-col (col-header)
  (do ((elem (up col-header) (up elem)))
      ((eql elem col-header) col-header)
    (do ((row-elem (right elem) (right row-elem)))
        ((eql row-elem elem) row-elem)
      (replace-vertical row-elem)))
  (replace-horizontal col-header))

Uncovering the column again. Both this and the previous function test for the equality of the current and initial element to detect that the circular list has been traversed.


(defun cover-row (elem)
  (do ((column-iter (right elem) (right column-iter)))
      ((eql elem column-iter))
    (cover-col (column-header column-iter))))

(defun uncover-row (elem)
  (do ((column-iter (left elem) (left column-iter)))
      ((eql elem column-iter))
    (uncover-col (column-header column-iter))))

When we choose a move we need to remove the colliding moves for all the columns where the row have elements. This functions takes care of this by iterating over the row, and calling cover-col for each of the columns.


(defun try-elem (elem array solution)
  (push elem solution)
  (cover-row elem)
  (let ((foo (find-solution array solution)))
    (uncover-row elem)
    (pop solution)
    foo))

Why the let? I want to return the value returned by find-solution to the caller because the return value signals whether a solution has been found.


(defun find-smallest-col (array)
  (do ((head-iter (right array) (right head-iter))
       (cur-min (right array) (if (< (size head-iter) (size cur-min))
                                head-iter
                                cur-min)))
      ((or (= 1 (size cur-min))
           (eql head-iter array)) cur-min)))

Loop through the headers and find the column with the smallest number of elements. If we find a column with only one element, we can stop searching.


(defun find-solution (array solution)
  (if (eql (left array) array)
    (progn
      (print-solution solution)
      t)
    (let ((col-header (find-smallest-col array)))
      (prog2
          (cover-col col-header)
          (do ((row-elem (down col-header) (down row-elem)))      
              ((or (eql row-elem col-header)
                   (try-elem row-elem array solution))
               (not (eql row-elem col-header))))
        (uncover-col col-header)))))

Try to find a solution. If all the headers are hidden, we have found a solution, and can print it out. Return t to signal that the solution is found. If a solution is not found yet, find the column with the smallest number of elements and try out each element to see if we find a solution. If a solution is found try-elem will return true, and we can stop searching. Make sure to pass on the true value to the calling function.


(let ((solc 0))
  (defun print-solution (solution)
    (format t "  ~a  " (incf solc))
    (loop with arr = (make-array '(9 9))
          for s in solution
          do (destructuring-bind (i j d) (triplet s)
               (setf (aref arr i j) (+ 1 d)))
          finally (loop for i from 0 to 8
                        do (format t "~{~A~}"
                                   (loop for j from 0 to 8
                                         collect (aref arr i j))))))
  
  (defun reset-count ()
    (setf solc 0)))

Very simple way of printing out solutions. I just build up the resulting matrix from the triplets in the solution list and then print out the resulting matrix on a single line. I also maintain a counter that I can reset counting how many puzzles has been solved.


(defun read-starting-position (pos-string)
  (with-input-from-string (s pos-string)
    (remove-if 'null
               (apply 'append
                      (loop for r from 0 to 8
                            collect (loop for c from 0 to 8
                                          collect (let ((v (- (char-code (read-char s))
                                                              (char-code #\0))))
                                                    (when (> v 0)
                                                      (list r c (1- v))))))))))

Our starting positions is just a string of numbers where 0 means an open square. This function collects a list of triplets (r c v) for all non-null positions. Note the complete lack of any validation of input. I have to subtract 1 from the value because I use the value as a index into the matrix and indexes start at 0.


(defun solve-file (filename)
  (reset-count)
  (multiple-value-bind (matrix rowarr) (sudoinit)
    (with-open-file (s filename)
      (do ((string (read-line s nil 'eof) (read-line s nil 'eof)))
          ((eql string 'eof) nil)
 (format t "~%~a" string)
        (solve string matrix rowarr)))))

(defun solve (pos-string &optional matrix rowarr)
  (unless (and matrix rowarr)
    (multiple-value-bind (foo bar) (sudoinit)
      (setf matrix foo)
      (setf rowarr bar)))
  (let ((solution ()))
    (dolist (triplet (read-starting-position pos-string))
      (destructuring-bind (i j d) triplet
        (push (aref rowarr (+ (* 9 (+ (* 9 i) j)) d)) solution)
        (cover-col (column-header (aref rowarr (+ (* 9 (+ (* 9 i) j)) d))))
        (cover-row (aref rowarr (+ (* 9 (+ (* 9 i) j)) d)))))
    (find-solution matrix solution)
    (do ((elem (pop solution) (pop solution)))
        ((null elem) matrix)
      (uncover-row elem)
      (uncover-col (column-header elem)))))

Finally, getting to run the whole thing. solve-file just reads in a file a line at a time and passes it on to the solve funtion for processing. I create the matrix and the row-array in solve-file as well, otherwise I would have to recreate the matrix for every puzzle.

The solve function is quite simple, read in the starting position. apply the moves in the starting position as if it is moves we have searched so far. Pass everything on to find-solution, and clean up the matrix for reuse after find-solution returns.

So how good is it? Remember a long time ago, I said something about the number of possibilities being a very large number? On my machine, which by no means is a top of the line machine, this program manages to solve about 200 puzzles/second. And that is without doing much in the way of optimizing.

Back of the envelope

Playing with the REPL

In my previous write-up I claimed that with a set of moves an end position could be reached from any start position. In fact, any position can be reached from any other position. How can I be so sure?

This time I'm not going to write a program. Well, in a sense I do, but I'm just going to type in expressions at the lisp REPL, and see where I end up. This time I'm not really bothered by effiency or elegance of the code, I'm more concerned about giving a feel for this kind of back-of-the-envelope coding you can do withing the REPL. You will also see that I kind of repeat the same code again and again with small modifications until I get it right. Because of this I feel it's important to use lisp with good history editing functions or a interface like the excellent slime mode for emacs.

And BTW: From the history numbers, you can tell that I actually messed around a lot more than what I'm showing you. This is normal and not an indication of my cluelessness level. I think.

With that out of the way, let's get started. First, let's recreate the apply-move function, this time we want to combine any kind of moves, not just the predefined moves in the *moves* list:


FLIPPER 50 > (defun xor (a b)
               (loop for ap in a
                     for bp in b
                     collect (mod (+ ap bp) 2)))
XOR

FLIPPER 51 > (xor '(1 0 1) '(1 1 0))
(0 1 1)

FLIPPER 52 > (xor '(1 0 1) '(0 1 1))
(1 1 0)

If you played around with the game, you already knew this, but if you look at what we just did you can see that by applying a move on the result of a move, we get back to the starting position. What this means for us, is that we can replace a move in our list with the result of applying another move on it and we will still be able to generate the same results.

Let's look at our moves again (and add a move identifier to each move):


FLIPPER 55 > (defparameter *c-moves* (loop for m in *moves*
                                           for i upfrom 1
                                           collect (cons m (list i))))
*C-MOVES*

FLIPPER 56 >  (format t "~{~a~%~}" *c-moves*)
((1 1 0 1 1 0 0 0 0) 1)
((1 1 1 0 0 0 0 0 0) 2)
((0 1 1 0 1 1 0 0 0) 3)
((1 0 0 1 0 0 1 0 0) 4)
((1 0 1 0 1 0 1 0 1) 5)
((0 0 1 0 0 1 0 0 1) 6)
((0 0 0 1 1 0 1 1 0) 7)
((0 0 0 0 0 0 1 1 1) 8)
((0 0 0 0 1 1 0 1 1) 9)
NIL

Not quite what I had in mind, we want the move identifier to be a list so we can push other identifiers onto it when we combine moves.

FLIPPER 57 > (defparameter *c-moves* (loop for m in *moves*
                                           for i upfrom 1
                                           collect (list m (list i))))
*C-MOVES*

FLIPPER 58 >  (format t "~{~a~%~}" *c-moves*)
((1 1 0 1 1 0 0 0 0) (1))
((1 1 1 0 0 0 0 0 0) (2))
((0 1 1 0 1 1 0 0 0) (3))
((1 0 0 1 0 0 1 0 0) (4))
((1 0 1 0 1 0 1 0 1) (5))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))
NIL

Better. But our xor function won't work with "moves" from this list, we need to add code to handle the move identifiers. We will append the identifiers to keep track of which moves are combined to generate the new move.


FLIPPER 59 > (defun my-xor (a b)
               (list (xor (first a) (first b)) 
                     (append (second a) (second b))))
MY-XOR

FLIPPER 60 > (my-xor (first *c-moves*) (second *c-moves*))
((0 0 1 1 1 0 0 0 0) (1 2))

Now, lets see if we can find a move with a specific "bit" set. (It's not really a bit but an integer, but that's what I'll call it.) Let's try to find the first move with the last bit set. (Should be move 5 by looking at the table above.)

We'll use the find-if function, find-if requires two arguments, a test function and a list. The test function should accept one argument and return true or false. We want to test if the nth element of the first list of is equal to 1 and will use a lambda function for this. The whole thing looks like this if we look for the 8th bit set:


FLIPPER 61 > (find-if #'(lambda (x) (= 1 (nth 8 (first x)))) 
                      *c-moves*)
((1 0 1 0 1 0 1 0 1) (5))

Ok, we're almost set, now we'll go through the list of moves, first looking for a move with the first bit set, we will xor this move with all other moves with the first bit set. Then we'll do the same for the second bit, and so on. First, let's try it out by just testing for the first bit and print it out:

FLIPPER 81 > (let* ((bar (find-if #'(lambda (x) (= 1 (nth 0 (first x)))) *c-moves*))
                    (gaz (remove bar *c-moves*)))
               (format t "~{~a~%~}"
                       (loop for move in gaz
                             collect (if (= 0 (nth 0 (first move)))
                                       move
                                       (my-xor bar move)))))
((0 0 1 1 1 0 0 0 0) (1 2))
((0 1 1 0 1 1 0 0 0) (3))
((0 1 0 0 1 0 1 0 0) (1 4))
((0 1 1 1 0 0 1 0 1) (1 5))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))
NIL

That almost worked, we forgot to put the first element back on. Let's add it back in and at the same time wrap the whole thing in a loop that goes through the index for the "bits". We also have to make some changes inside the let* to update the bit index and the list of moves (foo). Finally we'll print the result.

FLIPPER 88 > (loop with foo = (copy-list *c-moves*)
                   for i from 0 to 8
                   do (let* ((bar (find-if #'(lambda (x) (= 1 (nth i (first x)))) foo))
                             (gaz (remove bar foo)))
                        (setf foo 
                              (cons bar
                                    (loop for move in gaz
                                          collect (if (= 0 (nth i (first move)))
                                                    move
                                                    (my-xor bar move))))))
                   finally (format t "~{~a~%~}" foo))
((0 0 0 0 0 0 0 0 1) (1 2 1 2 1 1 4 1 2 1 2 1 2 1 1 5))
((1 0 1 1 0 0 0 1 0) (1 2 1 2 1 1 4 1 2 7))
((1 0 0 1 0 0 1 0 0) (1 2 1 2 1 1 4))
((1 0 1 1 0 1 0 0 0) (1 2 1 2 1 3))
((0 0 1 1 1 0 0 0 0) (1 2))
((1 1 1 0 0 0 0 0 0) (1 2 1))
((1 0 0 1 0 0 0 0 0) (1 2 1 2 1 1 4 1 2 1 2 1 2 1 1 5 1 2 1 2 1 3 1 2 1 2 6))
((0 0 1 0 0 0 0 0 0) (1 2 1 2 1 1 4 1 2 1 2 1 2 1 1 5 1 2 1 2 1 1 4 1 2 7 1 2 1 2 1 1 4 8))
((0 0 1 1 0 0 0 0 0) (1 2 1 2 1 1 4 1 2 1 2 1 2 1 1 5 1 2 1 2 1 1 4 1 2 7 1 2 1 2 1 3 1 2 9))
NIL

Huh? That doesn't look very nice at all. Let's try to print out the array for each loop and see what is going on.

FLIPPER 89 > (loop with foo = (copy-list *c-moves*)
                   for i from 0 to 8
                   do (let* ((bar (find-if #'(lambda (x) (= 1 (nth i (first x)))) foo))
                             (gaz (remove bar foo)))
                        (setf foo 
                              (cons bar
                                    (loop for move in gaz
                                          collect (if (= 0 (nth i (first move)))
                                                    move
                                                    (my-xor bar move)))))
                        (format t "~{~a~%~}~%" foo)))
((1 1 0 1 1 0 0 0 0) (1))
((0 0 1 1 1 0 0 0 0) (1 2))
((0 1 1 0 1 1 0 0 0) (3))
((0 1 0 0 1 0 1 0 0) (1 4))
((0 1 1 1 0 0 1 0 1) (1 5))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))

((1 1 0 1 1 0 0 0 0) (1))
((0 0 1 1 1 0 0 0 0) (1 2))
((1 0 1 1 0 1 0 0 0) (1 3))
((1 0 0 1 0 0 1 0 0) (1 1 4))
((1 0 1 0 1 0 1 0 1) (1 1 5))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))

[rest of output snipped.]

Duh, since the first move has the second bit as well as the first bit set, it's chosen for the second round in the loop as well as the first, and from then on everything is a mess. We want to choose new moves a much as possible. Since find-if search from the front, we can do that by putting "bar" at the end of the list after applying the xor instead of the other way around.

We can't just reorder the bar and the loop in the cons call since that will break up "bar" the same way it broke the move identifier the first time we tried to create the *c-moves* parameter. Read up on conses and proper lists to figure out why.


FLIPPER 94 > (loop with foo = (copy-list *c-moves*)
                   for i from 0 to 8
                   do (let* ((bar (find-if #'(lambda (x) (= 1 (nth i (first x)))) foo))
                             (gaz (remove bar foo)))
                        (setf foo 
                              (append (loop for move in gaz
                                            collect (if (= 0 (nth i (first move)))
                                                      move
                                                      (my-xor bar move)))
                                      (list bar)))
                        (format t "~{~a~%~}~%" foo))
)
((0 0 1 1 1 0 0 0 0) (1 2))
((0 1 1 0 1 1 0 0 0) (3))
((0 1 0 0 1 0 1 0 0) (1 4))
((0 1 1 1 0 0 1 0 1) (1 5))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))
((1 1 0 1 1 0 0 0 0) (1))

((0 0 1 1 1 0 0 0 0) (1 2))
((0 0 1 0 0 1 1 0 0) (3 1 4))
((0 0 0 1 1 1 1 0 1) (3 1 5))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))
((1 0 1 1 0 1 0 0 0) (3 1))
((0 1 1 0 1 1 0 0 0) (3))

[more output snipped]

((1 0 0 0 0 0 0 0 0) (1 2 3 1 4 3 1 5 1 2 3 1 4 1 2 6 8 9 1 2 3 1))
((0 1 0 0 0 0 0 0 0) (1 2 3 1 4 1 2 6 1 2 3 1 4 7 9 1 2 3 1 4 1 2 3))
((0 0 1 0 0 0 0 0 0) (1 2 3 1 4 3 1 5 1 2 3 1 4 1 2 6 8 1 2 3 1 4 1 2 6 1 2 3 1 4 7 1 2 3 1 4 1 2))
((0 0 0 1 0 0 0 0 0) (1 2 3 1 4 1 2 6 8 1 2 3 1 4 1 2 6 9 1 2 3 1 4))
((0 0 0 0 1 0 0 0 0) (1 2 3 1 4 3 1 5 1 2 3 1 4 7 9))
((0 0 0 0 0 1 0 0 0) (1 2 3 1 4 1 2 6 8 1 2 3 1 4 7))
((0 0 0 0 0 0 1 0 0) (1 2 3 1 4 3 1 5 1 2 3 1 4 1 2 6))
((0 0 0 0 0 0 0 1 0) (1 2 3 1 4 1 2 6 8))
((0 0 0 0 0 0 0 0 1) (1 2 3 1 4 3 1 5))

NIL
That's more like it! So to toggle the bottom right square, we just need to do the moves "1 2 3 1 4 3 1 5"? But the 3's and two of the 1'a cancel each other out so this is more complex than it needs to. As a final fix we can modify my-xor so that instead of appending the move identifiers, we collect the identifiers that are unique to each of the moves we combine. The function we want to use for this is set-exclusive-or:

FLIPPER 97 > (set-exclusive-or (list 1 2 3) (list 2 3 4))
(4 1)

FLIPPER 98 > (defun my-xor (a b)
               (list (xor (first a) (first b)) 
                     (set-exclusive-or (second a) (second b))))
MY-XOR

FLIPPER 103 > (loop with foo = (copy-list *c-moves*)
                   for i from 0 to 8
                   do (let* ((bar (find-if #'(lambda (x) (= 1 (nth i (first x)))) foo))
                             (gaz (remove bar foo)))
                        (setf foo 
                              (append (loop for move in gaz
                                            collect (if (= 0 (nth i (first move)))
                                                      move
                                                      (my-xor bar move)))
                                      (list bar)))
                        (format t "~{~a~%~}~%" foo)))
((0 0 1 1 1 0 0 0 0) (2 1))
((0 1 1 0 1 1 0 0 0) (3))
((0 1 0 0 1 0 1 0 0) (4 1))
((0 1 1 1 0 0 1 0 1) (5 1))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))
((1 1 0 1 1 0 0 0 0) (1))

((0 0 1 1 1 0 0 0 0) (2 1))
((0 0 1 0 0 1 1 0 0) (1 4 3))
((0 0 0 1 1 1 1 0 1) (1 5 3))
((0 0 1 0 0 1 0 0 1) (6))
((0 0 0 1 1 0 1 1 0) (7))
((0 0 0 0 0 0 1 1 1) (8))
((0 0 0 0 1 1 0 1 1) (9))
((1 0 1 1 0 1 0 0 0) (1 3))
((0 1 1 0 1 1 0 0 0) (3))

[snippety]

((1 0 0 0 0 0 0 0 0) (8 6 9 5))
((0 1 0 0 0 0 0 0 0) (7 2 9 4 6))
((0 0 1 0 0 0 0 0 0) (8 7 4 5))
((0 0 0 1 0 0 0 0 0) (2 9 4 3 8))
((0 0 0 0 1 0 0 0 0) (7 3 9 1 5))
((0 0 0 0 0 1 0 0 0) (2 7 1 6 8))
((0 0 0 0 0 0 1 0 0) (3 6 2 5))
((0 0 0 0 0 0 0 1 0) (8 3 4 6 1))
((0 0 0 0 0 0 0 0 1) (5 1 2 4))

NIL
Ok, we're done. We now know which moves we need to do to turn a single square on or off. So we can now just toggle the ones we want. Let's just test it by toggling the top left corner (moves 8 6 9 5):

FLIPPER 105 > (play-tty)
 0 0 1
 0 0 1
 1 0 0

8
 0 0 1
 0 0 1
 0 1 1

6
 0 0 0
 0 0 0
 0 1 0

9
 0 0 0
 0 1 1
 0 0 1

5
 1 0 1
 0 0 1
 1 0 0

Excellent.
If you want to experiment further, you can try to combine what we now know and make a function that takes a starting position and returns which moves a player needs to do to solve a puzzle. Good luck!