Simple Automated File Type Detection Under Windows
June 27, 2008
This post is a report on my experience running Windows ports of the legacy *NIX file(1) command.
The first stage of processing my roll-your-own terabyte text corpus (after acquiring it from the internet, of course!) is to classify each document in the corpus according to its type.
I require classification of files from the text corpus into their mime type: e.g. ‘application/pdf’, ‘text/html’, ‘image/jpeg’, etc. The document type classification drives the application of text extraction algorithms in further stages of processing. Accurate knowledge of document type is important to avoid database corruption caused by applying the wrong text extraction algorithms to a document. (I do not require 100% accuracy here, since a little corruption in the corpus is inevitable, but the higher accuracy, the better.)
Document type is asserted during collection and analysis of the corpus in four relevant ways: 1, http mime type assertion; 2, document filename extension; 3, output of a document type classification program; 4, context (e.g. the type of link or feed, or surrounding html markup). Since the document types reported by http headers, filename extensions, and search engine results are not always accurate, in order to be confident in the classification I require agreement between all applicable type assertions before I allow the document into the corpus.
This post relates to method #3 for detecting document type — using a pre-existing classification program.
I tried two Windows ports of the *NIX ‘file’ command. One port ran, the other did not. I have had success with the Optima SC port and have incorporated it into the document processing pipeline.
Attempt #1 (Failure)
Download file utility installer (“Complete package, except sources”) from http://gnuwin32.sourceforge.net/packages/file.htm.
Run the installer. By default, the executables and necessary .dll files are installed to C:/program files/GnuWin32/bin/. The ‘magic.mgc’ database file is installed to C:/program files/GnuWin32/share/file/magic.mgc.
Either add the install bin/ directory to your path, or copy the installed files to a directory your path already points to. Add an environment variable MAGIC with the value c:/program files/GnuWin32/share/file. (mutatis mutandis)
Run the file utility on a collection of files. Result: program spits thousands of lines of warnings. Conclusion: Does not work!
Attempt #2 (Success)
Download file utility from http://www.optimasc.com/products/fileid/file0.6.1-win32.zip. Download type identification database from http://magicdb.org/magic.db.
Either add the install bin/ directory to your path, or copy the executable ‘file.exe’ and the identification database ‘magic.db’ to a directory in your search path.
Run the file utility. Result: Default arguments work!
Further tests:
C:\> file –check-brief *
C:\> [GOOD OUTPUT!]
C:\> file –check-standard *
C:\> [GOOD OUTPUT!]
C:\> file –check-harder *
C:\> [CRASH!]
The program seems to work ok in the ‘brief’ and ’standard’ modes. The program crashes in ‘harder’ mode. The program also crashes when asked to run on a directory containing 50,000 files. The output is useful (though it must be parsed), and the program is reliable when run on a single file at a time. It is possible to produce output in UTF-8.
In a test I ran using approximately 100,000 documents downloaded from the internet, the initial type prediction based on context and filename extension was verified by the http content-type assertion and the output of the optimasc ‘file’ document type classification program 92% of the time.
In a test of 100 known pdf files, the optimasc file program correctly classified all 100 files as application/pdf. In a test of 100 known well-formed html files, the optimasc file program correctly classified all 100 files as text/html.
The ‘file’ utility under Windows appears to be due to Carl Eric Codere of Optima SC Inc. (http://www.optimasc.com/) I appreciate Carl Eric’s efforts, and the decision by Optima SC to make the utility available.
This post applies to running Steel Bank Common Lisp version 1.0.16 – 1.0.17.14 on Windows Vista. It may apply to later or earlier versions of SBCL, but I cannot directly attest to that. It may apply to certain functionality under Windows XP, but I cannot attest to that either.
[Update 07 June 2008] I have made a few modifications to the attached code in order to:
- place (D. Lichteblau’s) support for SHORTCUT files in a separate function PARSE-WINDOWS-SHORTCUT (to ensure stream-read-invariance wrt TRUENAME);
- add a redefinition for FILE-WRITE-DATE to mimic Unix behavior under Windows;
- add behavior so that TRUENAME of a recursive or dangling link returns the pathname of the link, and does not signal an error;
- add support for symbolic links containing relative pathnames;
- add support for symbolic links to different disks/volumes.
With reference to:
http://robert.zubek.net/blog/2008/04/09/sbcl-emacs-windows-vista/
http://sbcl-internals.cliki.net/Build%20on%20Windows
http://www.sbcl.org/
http://brainrack.wordpress.com/2008/05/29/running-sbcl-on-windows/
http://brainrack.wordpress.com/2008/05/28/broken-and-ill-documented-api-for-windows-mount-points/
Broken #’truename and annoying ASDF under Windows using SBCL
The CLtL2 specification requires #’truename to return the “canonical” pathname of a file. Most lisps interpret this to mean that when a pathname refers to a link, the link should be followed in order to return the pathname under which the file actually resides. Indeed, this is the way the UNIX implementation of SBCL functions with regard to #’truename. However, the Windows implementation of SBCL does not function the same as the UNIX implementation: (truename pathspec) will not follow links under Windows.
Complaining about this would be a petty bit of specs-manship were it not for the fact that the standard system definition facility ASDF relies on following links in order to function as designed. I find the intended behavior of ASDF to be elegant and convenient and so I set out to restore the ANSI functionality of #’truename under SBCL Windows.
[This issue has a history: see Stephan Lang's http://marc.info/?l=sbcl-devel&m=117357216708358&w=2; David Lichteblau's http://www.lichteblau.com/blubba/shortcut/asdf.lisp; and a discussion initiated by Nikodemus Siivola at http://article.gmane.org/gmane.lisp.cclan.general/740.]
Fixing the problem required me to delve deeply into some very dark recesses of the Microsoft Windows API, as well as SBCL’s FFI. Along the way I encountered all the usual abominations one must deal with when trying to program in Microsoftland: incorrect documentation, broken functions, vague and underspecified behaviors, inconsistent functionality between related functions, bizarre designs, Hungarian variable notation, and generally rococo APIs seemingly still implemented in a pre-1980 version of ‘C’ and modified from a cribbed copy of CPM. Some details are here. I feel most highly-privileged to have encountered the rare and wonderful ‘dammit-the-documentation-changed-while-I-was-reading-it!’ experience at http://msdn.microsoft.com/en-us/library/ms791514.aspx.
My plaints should not be taken to slight the abominations in SBCL’s FFI, however, for they are truly ghastly and profound. (Try to figure out exactly how c-strings and (* char)’s are handled sometime!)
Nonetheless, I have a working version of
(LINKED-TRUENAME filespec) — if the last part of the path of /filespec/ points to a link, return a pathname in which the link is resolved, else return the usual truename.
I have tested this function to verify that it works on file symlinks, directory symlinks, and mount points (aka junctions) as produced by mklink, mklink /D, mklink /J, mountvol, and the Disk Manager.
I don’t know if the fix will ever make it into the SBCL distribution. I’m not sure I want to jump through all the hoops necessary. In the meantime, it works for me, and hopefully for you.
Here’s the source:
;;;
;;;
;;; Code for following symlinks under Windows Vista
;;; Redefined truename so truename returns pathname of linked file
;;; This functionality is necessary for running asdf as it is designed
;;; - Neil Haven, June 2008
;;;
;;; For Microsoft reparse-points handling see:
;;; http://blog.kalmbach-software.de/2008/02/28/howto-correctly-read-reparse-data-in-vista/
;;;
;;; For Windows .lnk file handling see:
;;; "Jesse Hager: The Windows Shortcut File Format."
;;; http://www.wotsit.org/list.asp?fc=13
;;; Some code excerpted from David Lichteblau's modifications to asdf.lisp
;;; http://www.lichteblau.com/blubba/shortcut/asdf.lisp
;;;
(defpackage #:winlinks
(:nicknames #:wl)
(:documentation "Package for redefining truename so it follows links.")
(:use #:cl #:sb-alien #:sb-ext)
(:shadow #:truename #:file-write-date)
(:export #:override-truename #:override-file-write-date
#:parse-windows-shortcut))
(in-package #:wl)
#|
To create a mount point under Vista use mklink /J ...
To create a symbolic file link use mklink ...
To create a directory link use mklink ...
|#
#|
typedef struct _REPARSE_DATA_BUFFER {
ULONG ReparseTag;
USHORT ReparseDataLength;
USHORT Reserved;
union {
struct {
USHORT SubstituteNameOffset;
USHORT SubstituteNameLength;
USHORT PrintNameOffset;
USHORT PrintNameLength;
ULONG Flags; // it seems that the docu is missing this entry (at least 2008-03-07)
WCHAR PathBuffer[1];
} SymbolicLinkReparseBuffer;
struct {
USHORT SubstituteNameOffset;
USHORT SubstituteNameLength;
USHORT PrintNameOffset;
USHORT PrintNameLength;
WCHAR PathBuffer[1];
} MountPointReparseBuffer;
struct {
UCHAR DataBuffer[1];
} GenericReparseBuffer;
};
} REPARSE_DATA_BUFFER, *PREPARSE_DATA_BUFFER;
#define MAXIMUM_REPARSE_DATA_BUFFER_SIZE ( 16 * 1024 )
//
// The reparse tags are a DWORD. The 32 bits are laid out as follows:
//
// 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1
// 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
// +-+-+-+-+-----------------------+-------------------------------+
// |M|R|N|R| Reserved bits | Reparse Tag Value |
// +-+-+-+-+-----------------------+-------------------------------+
//
// M is the Microsoft bit. When set to 1, it denotes a tag owned by Microsoft.
// All ISVs must use a tag with a 0 in this position.
// Note: If a Microsoft tag is used by non-Microsoft software, the
// behavior is not defined.
//
// R is reserved. Must be zero for non-Microsoft tags.
//
// N is name surrogate. When set to 1, the file represents another named
// entity in the system.
//
|#
(defun override-truename (&optional (package *package*))
"replace definition of truename in /package/ "
(shadowing-import 'wl::truename package))
(defun override-file-write-date (&optional (package *package*))
"replace definition of file-write-date in /package/ "
(shadowing-import 'wl::file-write-date package))
(defun wchar-string->lisp-string (wstring &optional (length -1))
"
Arguments: /wstring/ is a pointer to a zero-terminated
array of wchars (unsigned shorts)
/length/, if supplied, truncates the translation after
/length/ number of characters if a zero-termination is not
encountered before that
Semantics: translates /wstring/ to a lisp string. If
/wstring/ is a null pointer, returns nil
Returns: a lisp string or nil.
"
(declare (type (alien (* unsigned-short)) wstring)
(type integer length))
(when (null-alien wstring)
(return-from wchar-string->lisp-string nil))
;; calculate the length
(when (minusp length)
(do ((index 0 (incf index)))
((= (deref wstring index) 0) (setq length index))))
(do* ((index 0 (incf index))
(wchar (deref wstring index) (deref wstring index))
(lisp-string (make-array length :element-type 'character)))
((= index length) lisp-string)
(declare (type integer index)
(type (unsigned-byte 16) wchar)
(type string lisp-string))
(setf (aref lisp-string index) (code-char wchar))))
(defun lisp-string->wchar-string (string)
"
Semantics: converts the lisp /string/ into an alien pointer to
a zero-terminated array of wide (16-bit) chars. Any
characters in /string/ with a unicode code point
> #xFFFF are given a representation as #xFFFD -- the
standard unicode Replacement Character
Returns: Two values: an alien pointer to a zero-terminated array of
16-bit chars suitable for passing to a Windows API;
and the length of the string
"
(let* ((size (length string))
(wchar-string (make-alien (unsigned 16) (1+ size)))
(unicode 0)
(wchar 0))
(declare (type integer size unicode)
(type (unsigned-byte 16) wchar))
(dotimes (index size)
(setq unicode (char-code (elt string index)))
(when (> unicode #xFFFF) (setq unicode #xFFFD)) ; coerce to 16-bits
(setq wchar (coerce unicode '(unsigned-byte 16)))
(setf (deref wchar-string index) wchar))
(setf (deref wchar-string size) 0)
(return-from lisp-string->wchar-string
(values wchar-string size))))
(defun lisp-string->char-string (string)
"
Semantics: converts the lisp /string/ into an alien pointer to
a zero-terminated array of 8-bit chars. Any
characters in /string/ with a unicode code point
> #xFF are given a representation as #x1A -- the
standard unicode Substitute control
Returns: Two values: an alien pointer to a zero-terminated array of
8-bit chars suitable for passing to a Windows API;
and the length of the string
"
(let* ((size (length string))
(char-string (make-alien (unsigned 8) (1+ size)))
(unicode 0)
(char 0))
(declare (type integer size unicode)
(type (unsigned-byte 8) char))
(dotimes (index size)
(setq unicode (char-code (elt string index)))
(when (> unicode #xFF) (setq unicode #x1A)) ; coerce to 8-bits
(setq char (coerce unicode '(unsigned-byte 8)))
(setf (deref char-string index) char))
(setf (deref char-string size) 0)
(return-from lisp-string->char-string
(values char-string size))))
(define-alien-type nil
(struct SymbolicLinkStructure
(substitute-name-offset unsigned-short) ; 2 bytes
(substitute-name-length unsigned-short) ; 2 bytes
(print-name-offset unsigned-short) ; 2 bytes
(print-name-length unsigned-short) ; 2 bytes
(flags unsigned-long) ; 4 bytes
(path-buffer (array unsigned-short 246)))) ; 492 bytes
(define-alien-type nil
(struct MountPointStructure
(substitute-name-offset unsigned-short) ; 2 bytes
(substitute-name-length unsigned-short) ; 2 bytes
(print-name-offset unsigned-short) ; 2 bytes
(print-name-length unsigned-short) ; 2 bytes
(path-buffer (array unsigned-short 248)))) ; 496 bytes
(define-alien-type nil
(struct GenericReparseStructure
(data-buffer (array char 504)))) ; 504 bytes
(define-alien-type nil
(union ReparseUnion
(symbolic-link-structure (struct SymbolicLinkStructure))
(mount-point-structure (struct MountPointStructure))
(generic-reparse-structure (struct GenericReparseStructure))))
(define-alien-type nil
(struct ReparseStructure
(reparse-tag unsigned-long) ; 4 bytes
(length unsigned-short) ; 2 bytes
(reserved unsigned-short) ; 2 bytes
(reparse-union (union ReparseUnion)))) ; 504 bytes
(defmacro reparse-tag-p (reparse-tag)
"Is /reparse-tag/ a legal Microsoft reparse tag?"
;(declare (type (unsigned-byte 32) reparse-tag))
`(= #x80000000 (logand ,reparse-tag #x80000000)))
(defmacro name-surrogate-p (reparse-tag)
"Is the /reparse-tag/ from a surrogate for another file?"
;(declare (type (unsigned-byte 32) reparse-tag))
`(= #x20000000 (logand ,reparse-tag #x20000000)))
(defmacro mount-point-p (reparse-tag)
"Is the /reparse-tag/ a Microsoft mount point?"
;(declare (type (unsigned-byte 32)) reparse-tag)
`(= #xA0000003 (logand ,reparse-tag #xA0000003)))
(defmacro symlink-p (reparse-tag)
"Is the /reparse-tag/ a Microsoft symlink?"
;(declare (type (unsigned-byte 32)) reparse-tag)
`(= #xA000000C (logand ,reparse-tag #xA000000C)))
#|
HANDLE WINAPI CreateFile(
__in LPCTSTR lpFileName,
__in DWORD dwDesiredAccess,
__in DWORD dwShareMode,
__in_opt LPSECURITY_ATTRIBUTES lpSecurityAttributes,
__in DWORD dwCreationDisposition,
__in DWORD dwFlagsAndAttributes,
__in_opt HANDLE hTemplateFile
);
BOOL WINAPI DeviceIoControl(
__in HANDLE hDevice,
__in DWORD dwIoControlCode,
__in_opt LPVOID lpInBuffer,
__in DWORD nInBufferSize,
__out_opt LPVOID lpOutBuffer,
__in DWORD nOutBufferSize,
__out_opt LPDWORD lpBytesReturned,
__inout_opt LPOVERLAPPED lpOverlapped
);
|#
(load-shared-object "Kernel32")
(declaim (inline CreateFileA))
(declaim (inline CloseHandle))
(declaim (inline DeviceIoControl))
(declaim (inline GetVolumeNameForVolumeMountPointA))
(define-alien-routine ("GetVolumeNameForVolumeMountPointA" GetVolumeNameForVolumeMountPointA) int
(mount-point (* unsigned-char))
(volume-name (* unsigned-char))
(buffer-length unsigned-int))
(define-alien-routine ("CreateFileA" CreateFileA) int
(filename (* unsigned-char))
(access unsigned-int)
(sharemode unsigned-int)
(security unsigned-int)
(create unsigned-int)
(flags unsigned-int)
(template unsigned-int))
(define-alien-routine ("CloseHandle" CloseHandle) int
(device int))
(define-alien-routine ("DeviceIoControl" DeviceIoControl) int
(device unsigned-int)
(control unsigned-int)
(inBuffer (* t))
(inSize unsigned-int)
(outBuffer (* (struct ReparseStructure)))
(outSize unsigned-int)
(bytesReturned (* unsigned-int))
(overlapped (* t)))
(defun get-link-handle (filespec)
"returns -1 on failure, otherwise returns handle for /filespec/"
(let ((filename (lisp-string->char-string (namestring filespec))))
;; for the interpretation of the magic numbers, see the kalmbach reference
(CreateFileA filename #x8 #x7 #x0 #x3 #x02200000 #x0)))
(defun get-volume-name (mount-point)
"
Semantics: returns lisp string for underlying volume name of mount point
If there is no underlying volume name, return /mount-point/
Arguments: /mount-point/ is a string pathname designator
Returns: lisp string in form \\\\?\\Volume{GUID}\\ if /mount-point/ exists
else returns original /mount-point/
"
;; GetVolumePathNamesForVolumeName, does not seem to work for me.
;; So this is a tedious bit of kludgy sequential search.
(let ((volume-name (make-alien unsigned-char 64)) ; Microsoft doco says the max length is 50
(result 0))
(declare (type integer result))
;; read the low-level volume name given a normal path
(setq result
(GetVolumeNameForVolumeMountPointA (lisp-string->char-string mount-point) volume-name 64))
(when (zerop result)
;; if the call fails, the path isn't a mount point
(return-from get-volume-name mount-point))
;; now create a lisp string from the character array
(let ((volume (make-string 64)))
(declare (type string volume))
(dotimes (i 64)
(let ((c (deref volume-name i)))
(when (zerop c)
;; hit end of array, so truncate the string
(setq volume (subseq volume 0 i))
(return))
(setf (char volume i) (code-char c))))
volume)))
(defun volume-name->disk-name (volume-name)
"
Semantics: returns disk-name for low-level volume name
Arguments: /volume-name/ is a low-level volume name
of the form \\??\\Volume{GUID}\\
Returns: lisp string in form X:\\ where X is the disk name.
or /volume-name/ if there is no associated disk
"
(dotimes (disk 26 volume-name)
(let* ((disk-name (concatenate 'string (string (code-char (+ disk (char-code #\A)))) ":\\"))
(disk-volume-name (get-volume-name disk-name)))
(when (and (>= (length volume-name) (length disk-volume-name) 4)
(equalp (subseq disk-volume-name 4) (subseq volume-name 4)))
(return-from volume-name->disk-name disk-name)))))
(defun mount-point->disk-name (pathspec)
"
Semantics: Converts the mount point /pathspec/ into the name of the disk
it points to.
Arguments: /pathspec/ is the pathspec of a mount point
Returns: the namestring of the disk pointed to by /pathspec/
Error Conditions: signals
"
(let ((volume-name (get-volume-name (namestring (cl::truename pathspec)))))
(volume-name->disk-name volume-name)))
(defun extract-path-from-mount-point (reparseStructure)
"internal function for getting a path from a mount point"
(declare (type (alien (struct ReparseStructure)) reparseStructure))
;; get the union fron the reparse structure
(with-alien
((reparseUnion (union ReparseUnion) (slot reparseStructure 'reparse-union)))
;; get the mount-point from the union
(with-alien
((mount-point (struct MountPointStructure) (slot reparseUnion 'mount-point-structure)))
;; get the string buffer from the mount-point
(with-alien ((buffer (array unsigned-short 248) (slot mount-point 'path-buffer))
(sname-offset unsigned-short (slot mount-point 'substitute-name-offset))
(sname-length unsigned-short (slot mount-point 'substitute-name-length)))
;; adjust for size of wchar
(setq sname-offset (/ sname-offset 2)) ; offset is returned in bytes
(setq sname-length (/ sname-length 2)) ; length is returned in bytes
;; get the lisp equivalent
(let ((sname-string
(wchar-string->lisp-string (addr (deref buffer sname-offset)) sname-length)))
;; sname-string has form \??\Volume{GUID}\
;; OR sname-string has form \??\X:\ -- it all depends...
(if (> (length sname-string) (length "\\??\\X:\\"))
;; we have to do this the hard way
(volume-name->disk-name sname-string)
;; otherwise don't forget to lop off the \??\ part...
(subseq sname-string 4)))))))
(defun extract-path-from-symlink (reparseStructure)
"internal function for getting a path from a symlink"
;; get the union from the structure
(with-alien
((reparseUnion (union ReparseUnion) (slot reparseStructure 'reparse-union)))
;; get the symlink from the union
(with-alien
((symlink (struct SymbolicLinkStructure) (slot reparseUnion 'symbolic-link-structure)))
;; get the string buffer from the symlink
(with-alien ((buffer (array unsigned-short 246) (slot symlink 'path-buffer)))
(let ((sname-offset (slot symlink 'substitute-name-offset))
(pname-offset (slot symlink 'print-name-offset))
(sname-length (slot symlink 'substitute-name-length))
(pname-length (slot symlink 'print-name-length)))
(declare (type integer sname-offset pname-offset sname-length pname-length))
;; adjust for size of wchar
(setq sname-offset (/ sname-offset 2)) ; offset is returned in bytes
(setq pname-offset (/ pname-offset 2))
(setq sname-length (/ sname-length 2)) ; length is returned in bytes
(setq pname-length (/ pname-length 2))
;; convert the alien pointer to a lisp string
(with-alien
((sname (* unsigned-short) (addr (deref buffer sname-offset))))
;; (pname (* short) (addr (deref buffer pname-offset))))
(let ((slink (wchar-string->lisp-string sname sname-length))
(indicator-length (length "\\??\\")))
;; if linking another device, we need to strip a \\??\\ from the front...
(if (and (>= (length slink) indicator-length)
(equal "\\??\\" (subseq slink 0 indicator-length)))
(subseq slink indicator-length)
slink))))))))
;;;; Windows shortcut support. Based on:
;;; Jesse Hager: The Windows Shortcut File Format.
;;; http://www.wotsit.org/list.asp?fc=13
;;; Taken from David Lichteblau's modifications to asdf.lisp
;;; http://www.lichteblau.com/blubba/shortcut/asdf.lisp
;;; For the meaning of the magic numbers, see the Hager reference
(defparameter *link-initial-dword* 76)
(defparameter *link-guid* #(1 20 2 0 0 0 0 0 192 0 0 0 0 0 0 70))
(defun read-null-terminated-string (s)
(with-output-to-string (out)
(loop
for code = (read-byte s)
until (zerop code)
do (write-char (code-char code) out))))
(defun read-little-endian (s &optional (bytes 4))
(let ((result 0))
(loop
for i from 0 below bytes
do
(setf result (logior result (ash (read-byte s) (* 8 i)))))
result))
(defun parse-file-location-info (s)
(let ((start (file-position s))
(total-length (read-little-endian s))
(end-of-header (read-little-endian s))
(fli-flags (read-little-endian s))
(local-volume-offset (read-little-endian s))
(local-offset (read-little-endian s))
(network-volume-offset (read-little-endian s))
(remaining-offset (read-little-endian s)))
(declare (ignore total-length end-of-header local-volume-offset))
(unless (zerop fli-flags)
(cond
((logbitp 0 fli-flags)
(file-position s (+ start local-offset)))
((logbitp 1 fli-flags)
(file-position s (+ start
network-volume-offset
#x14))))
(concatenate 'string
(read-null-terminated-string s)
(progn
(file-position s (+ start remaining-offset))
(read-null-terminated-string s))))))
(defun windows-link-p (filespec)
"Is /filespec/ a Windows .lnk shortcut file?"
;; must end in .lnk
(when (not (equalp (pathname-type filespec) "LNK"))
(return-from windows-link-p nil))
;; needs to have the correct header
(with-open-file (s filespec :element-type '(unsigned-byte 8))
(handler-case
(and (= (read-little-endian s) *link-initial-dword*)
(let ((header (make-array (length *link-guid*))))
(read-sequence header s)
(equalp header *link-guid*)))
(end-of-file () nil))))
(defun parse-windows-shortcut (pathname)
(with-open-file (s pathname :element-type '(unsigned-byte 8))
(handler-case
(when (and (= (read-little-endian s) *link-initial-dword*)
(let ((header (make-array (length *link-guid*))))
(read-sequence header s)
(equalp header *link-guid*)))
(let ((flags (read-little-endian s)))
(file-position s 76) ;skip rest of header
(when (logbitp 0 flags)
;; skip shell item id list
(let ((length (read-little-endian s 2)))
(file-position s (+ length (file-position s)))))
(cond
((logbitp 1 flags)
(parse-file-location-info s))
(t
(when (logbitp 2 flags)
;; skip description string
(let ((length (read-little-endian s 2)))
(file-position s (+ length (file-position s)))))
(when (logbitp 3 flags)
;; finally, our pathname
(let* ((length (read-little-endian s 2))
(buffer (make-array length)))
(read-sequence buffer s)
(map 'string #'code-char buffer)))))))
(end-of-file ()
nil))))
(define-condition symlink-error (error) ())
(defun truename-aux (filespec &optional (depth 0))
;; RECURSIVE INVARIANT: /filespec/ is an absolute path, not relative
;; always returns an absolute pathname
(when (> (incf depth) 16)
;; poor man's recursion detection...
(error 'symlink-error))
;(when (windows-link-p filespec)
; (return-from truename-aux (truename-aux (parse-windows-shortcut (namestring filespec)) depth)))
(let ((handle (get-link-handle filespec)))
(declare (type integer handle))
;; check for failure
(when (eq handle -1)
;; generally, this means we can't find the file
(error 'symlink-error))
(with-alien ((reparseStructure (struct ReparseStructure))
(bytes-returned unsigned-int))
;; see kalmbach to interpret the constants here...
(let ((result (DeviceIoControl handle #x900A8 nil #x0
(addr reparseStructure)
#x200 (addr bytes-returned) nil)))
(declare (type integer result))
;; done with handle
(CloseHandle handle)
;; check for failure, meaning device is not a link
(when (zerop result)
;; we are done
(return-from truename-aux filespec))
(let ((tag (slot reparseStructure 'reparse-tag)))
(cond
((not (reparse-tag-p tag))
filespec)
((symlink-p tag)
(let* ((sympath (extract-path-from-symlink reparseStructure))
(dirlist (pathname-directory sympath)))
(case (car dirlist)
(:ABSOLUTE
;; an :absolute directory can be used as is
(truename-aux sympath depth))
(otherwise
;; a :relative directory must be merged with the base directory before continuing
(let ((merged-path (merge-pathnames sympath (directory-namestring filespec))))
(setq merged-path (sb-impl::simplify-win32-namestring (namestring merged-path)))
(truename-aux merged-path depth))))))
((mount-point-p tag)
(pathname (extract-path-from-mount-point reparseStructure)))
(t
filespec)))))))
(defun truename (filespec)
#.(concatenate 'string
(documentation 'cl::truename 'function)
"
Under Windows, the TRUENAME of a symlink that does not
exist or of a recursive symlink, or a symlink more than
16 levels deep, is the name of the symlink.
")
(let ((original-path (cl::truename filespec)))
(handler-case
(cl::truename (truename-aux original-path))
(symlink-error () original-path)
(file-error () original-path))))
(defun file-write-date (pathspec)
#.(documentation 'cl::file-write-date 'function)
(cl::file-write-date (truename pathspec)))
Running SBCL on Windows
May 29, 2008
This post applies to running Steel Bank Common Lisp version 1.0.16 and 1.0.17 on Windows Vista. It may apply to later or earlier versions of SBCL, but I cannot directly attest to that. It may apply to certain functionality under Windows XP, but I cannot attest to that either.
With reference to:
http://robert.zubek.net/blog/2008/04/09/sbcl-emacs-windows-vista/
http://sbcl-internals.cliki.net/Build%20on%20Windows
http://www.sbcl.org/
The following is a set of useful utilities for running SBCL in a windows environment. The utilities deal with setting the default-pathname, the current working directory, and executing shell commands. Your milage may vary.
Changing Directories
Lisp has a global variable called *default-pathname-defaults*. This variable works like a “current directory” when executing lisp commands like #’load or #’directory. However, when you are doing any work via the foreign function interface (that is: if you are calling out to use operating system commands) then there is also the notion of the “current working directory” which is the location the operating system thinks you are executing from. This current working directory (aka “execution path” aka “execution directory”) is not necessarily the same as the *default-pathname-defaults* directory.
Here is source code for the following lisp commands under SBCL
(CWD pathname) — update current working directory to /pathname/
(CD pathname) — update both *default-pathname-default* and the current working directory to /pathname/
The source code uses the alien (foreign) function interface for SBCL.
(load-shared-object "Kernel32")
(declaim (inline SetCurrentDirectoryA))
(declaim (inline GetCurrentDirectoryA))
(define-alien-routine ("SetCurrentDirectoryA" SetCurrentDirectoryA) int
(directory (* unsigned-char)))
(define-alien-routine ("GetCurrentDirectoryA" GetCurrentDirectoryA) unsigned-int
(buffer-length unsigned-int)
(buffer (* unsigned-char)))
(defun char-string->lisp-string (cstring &optional (length -1))
"
Arguments: /cstring/ is a pointer to a zero-terminated
array of chars (unsigned bytes)
/length/, if supplied, truncates the translation after
/length/ number of characters if a zero-termination is not
encountered before that
Semantics: translates /cstring/ to a lisp string. If
/cstring/ is a null pointer, returns nil
Returns: a lisp string or nil.
"
(declare (type integer length))
(when (null-alien cstring)
(return-from char-string->lisp-string nil))
;; calculate the length
(when (minusp length)
(do ((index 0 (incf index)))
((= (deref cstring index) 0) (setq length index))))
(do* ((index 0 (incf index))
(char (deref cstring index) (deref cstring index))
(lisp-string (make-array length :element-type 'character)))
((or (= index length) (zerop char)) (subseq lisp-string 0 index))
(declare (type integer index)
(type (unsigned-byte 8) char)
(type string lisp-string))
(setf (aref lisp-string index) (code-char char))))
(defun lisp-string->char-string (string)
"
Semantics: converts the lisp /string/ into an alien pointer to
a zero-terminated array of 8-bit chars. Any
characters in /string/ with a unicode code point
> #xFF are given a representation as #x1A -- the
standard unicode Substitute control
Returns: Two values: an alien pointer to a zero-terminated array of
8-bit chars suitable for passing to a Windows API;
and the length of the string
"
(let* ((size (length string))
(char-string (make-alien (unsigned 8) (1+ size)))
(unicode 0)
(char 0))
(declare (type integer size unicode)
(type (unsigned-byte 8) char))
(dotimes (index size)
(setq unicode (char-code (elt string index)))
(when (> unicode #xFF) (setq unicode #x1A)) ; coerce to 8-bits
(setq char (coerce unicode '(unsigned-byte 8)))
(setf (deref char-string index) char))
(setf (deref char-string size) 0)
(return-from lisp-string->char-string
(values char-string size))))
(defun get-cwd ()
"
Semantics: returns pathname for the current working (execution) directory.
The current working directory is the directory the operating system
believes the program is running in. Shell commands execute here.
Arguments:
Returns: pathname of the current working directory
"
(let* ((size (GetCurrentDirectoryA 0 nil))
(cwd (make-alien unsigned-char size)))
(declare (type integer size))
(GetCurrentDirectoryA size cwd)
(char-string->lisp-string cwd size)
(truename (char-string->lisp-string cwd size))))
(defun cwd (&optional dir)
"
Semantics: Set the current working directory to /dir/.
If /dir/ is nil or omitted, get the current working directory.
Arguments: /dir/ is a slash-terminated lisp string or directory
pathspec
Returns: the current working directory after execution
Error Conditions: signals 'file-error if not possible
"
(when (null dir)
(return-from cwd (get-cwd)))
(let ((result (SetCurrentDirectoryA (lisp-string->char-string (namestring dir)))))
(declare (type integer result))
(if (zerop result)
(error (make-condition 'file-error :pathname dir))
(get-cwd))))
(defun set-default-directory (&optional (pathname ""))
"Update *default-pathname-defaults* to /pathname/"
(setf *default-pathname-defaults* (truename pathname)))
(defun cd (&optional (pathname ""))
"Update both *default-pathname-defaults* and working directory to /pathname/"
(set-default-directory pathname)
(cwd *default-pathname-defaults*))
Executing Shell Commands
There is a facility in SBCL for sending a command to the enclosing operating system: #’sb-ext:run-program. I’m sure run-program has been extensively debugged under *NIX, and it may even run well under (X)Emacs under *NIX, but it is not much fun to use under XEmacs on Windows.
The following is a Steel Bank Common Lisp implementation of a function to execute a dos command shell command
(SHELL command) — Run /command/ in a dos shell. Return the result.
(defun shell (command &key silent)
"
Syntax (dos command [:silent silent])
Semantics: Run the /command/ in a dos shell. Return when /command/ does.
Side Effects: /command/ is run in a dos shell.
Arguments: /command/ is a string such as could be run from the dos cmd shell.
If /silent/ is t return an empty string for the output. Default is nil.
Returns: (values output status) where /output/ is the output string. If
/silent/ is t, then /output/ is supressed to an empty string.
/status/ is t if the /command/ succeeds, nil otherwise.
"
;; Note: many of the options to #'run-program seem broken in Windows
;; under XEmacs. E.g. the simple :output t option doesn't work under
;; XEmacs -- though it does in an SBCL shell. I'm sure there's a
;; good reason for that involving XEmacs redirections and Slime,
;; but I don't really want to track it down.
;; This function kludges over all the problems by redirecting output
;; to a file, then reading the file and returning its contents.
;; The temporary file is discarded after use.
;; It works for me!
(let ((args (list "/C" command))
(filename "")
(result ""))
;; get a safe temporary filename
(do ((candidate-name (string (gensym "TEMPFILE")) (string (gensym "TEMPFILE"))))
((not (probe-file candidate-name)) (setq filename candidate-name)))
(let ((proc (sb-ext:run-program "CMD" args :output filename :search t :wait t)))
(setq result
(with-open-file (stream filename :direction :input)
(with-output-to-string
(output)
(do ((line (read-line stream nil stream) (read-line stream nil stream)))
((eql line stream))
(format output "~A~%" line)))))
(when (probe-file filename) (delete-file filename))
(values
(if silent
""
(remove #\return result)) ;; handle dos carriage-return line-feed for nice display
(zerop (sb-ext:process-exit-code proc))))))
This note is written for programmers who must deal with mount points under Windows. It applies to Windows Vista Home Premium version 6.0.6000 with kernel32.dll version 6.0.6000.16386. It may apply to other versions of Microsoft’s operating systems, but I have no first-hand knowledge of such. If you are having trouble with GetVolumePathNamesForVolumeName, or other functions to manage links and junctions, the following may apply.
With reference to:
http://blog.kalmbach-software.de/2008/02/28/howto-correctly-read-reparse-data-in-vista/
http://blogs.msdn.com/adioltean/archive/2005/04/16/408947.aspx
http://msdn.microsoft.com/en-us/library/aa365730(VS.85).aspx
http://msdn.microsoft.com/en-us/library/aa365503(VS.85).aspx
Programmers who must manage links under Windows Vista should beware! Of the several low-level functions which purport to manage links and junctions under Vista, it appears that at least two are completely broken and others will exhibit unexpected behavior depending on factors outside of your control.
The broken functions are GetVolumePathNamesForVolumeNameA and GetVolumePathNamesForVolumeNameW.
After extensive thrashing, I have come to the conclusion that these functions do not operate as
documented (see http://msdn.microsoft.com/en-us/library/aa364998(VS.85).aspx); indeed, I can not get these functions to work at all. Both return 0 (BOOL FALSE) for all combinations of arguments applied as per the Microsoft documentation.
In addition to the broken functionality, unexpected behavior arises from the different facilities for creating mount points (variously called Junctions, Directory Junctions, Mount Points, and/or Volume Mount Points in Microsoft’s documentation) under Vista 6.0.6000. These facilities are: mountvol, mklink, and /Control Panel/System and Maintenance/Administrative Tools/Computer Management/Storage/Disk Management/. Mountvol and Disk Management function similarly; mklink is different.
When you create a mount point using mountvol or Disk Management, the substitute name in the REPARSE_DATA_BUFFER (see http://msdn.microsoft.com/en-us/library/ms791514.aspx) acquired via DeviceIoControl associated with the mount point is a low-level volume designator of the form \??\Volume{GUID}\. When you create a mount point using mklink (e.g. via mklink /J MOUNTPATH X:), the substitute name in the REPARSE_DATA_BUFFER is a disk-name (X:), not a volume designator. In neither case is the print-name field of the REPARSE_DATA_BUFFER populated.
[Note that, until very recently the Microsoft documentation for REPARSE_DATA_BUFFER incorrectly omitted the flags field from the SymbolicLinkReparseBuffer part of the union.]
[For a description of how to acquire the REPARSE_DATA_BUFFER in C++ for Windows links, see http://blog.kalmbach-software.de/2008/02/28/howto-correctly-read-reparse-data-in-vista/.]
Note also that mountvol displays volume designators with the syntax \\?\Volume{GUID}\ whereas the REPARSE_DATA_BUFFER returns volume designators with the syntax \??\Volume{GUID}\. The functions GetVolumeNameForVolumeMountPointA and GetVolumeNameForVolumeMountPointW return volume designators with the syntax \\?\Volume{GUID}\.
What this means is that in order to acquire the actual disk name from a mount point you will need to examine the substitute name acquired in the REPARSE_DATA_BUFFER. If the name is of the form \??\X:\ you are done. If it is of the form \??\Volume{GUID}\, you will need to examine the 26 possible results from GetVolumeNameForVolumeMountPoint (one for each disk name of the form X:\). If there is a match in the GUID’s, then you have found the actual disk to which the mount point refers.
I am sure this all makes sense to Microsoft somehow. It doesn’t to me; but, hey.
I’ve implemented all this in Steel Bank Common Lisp’s FFI in order to fix the truename function under Windows. You are welcome to look at the code for inspiration, but if you are a C/C++/C#/Java/etc. hacker it will probably be illegible.
Background
The cl-ansi-tests application (I am using the July 2005 version) is a set of over 20,000 individual tests of various aspects of a lisp’s conformance with the Common Lisp standard. It was written by Paul Dietz over the span of many years. Running cl-ansi-tests is a good way to discover things about a particular lisp implementation, but, as Paul Dietz says, the “tests have not been selected to reflect the importance or relative frequency of different CL features”, so the numerical results of the tests must be interpreted carefully. It is possible for a lisp implementation to fail many trivial conformance tests, but succeed at all the important ones. Likewise, it is possible for a lisp implementation to pass nearly all the tests, but to fail a few critically important ones.
Since the cl-ansi-tests suite is a moderately-large, stressful application (~180,000 lines of code contained in ~850 files), running the cl-ansi-tests suite serves as a good stability test as well as a standards conformance test. I will describe my conformance testing results separate from the stability testing results. Then I will present my (subjective) interpretation of the tests.
The Procedure
Everything necessary to run the cl-ansi-tests suite under Corman Lisp is distributed here: ansi-tests. The distribution includes source code for a modified version of the cl-ansi-tests suite (see ansi-tests/cl-ansi-tests-20050704.ccl/), some ANSI-compatibility patches for Corman Lisp (see ansi-tests/corman-patches/ and ansi-tests/compatibility.lisp), directions for running the tests (see the header for ansi-tests/run-cl-ansi-tests.lisp), and the results from running the tests (see ansi-tests/cl-ansi-tests.24052008.results).
The cl-ansi-tests suite from CLOCC will not load under Corman Lisp out of the box; modification to six files was required to get it running with Corman Lisp. You can see a list of modifications by grepping “cormanlisp” on the source code located at ansi-tests/cl-ansi-tests-20050704.ccl/. I wrote a replacement test driver for Corman Lisp called ‘run-cl-ansi-tests.lisp’; you should use this driver instead of the ‘doit.lsp’ file which is distributed with the cl-ansi-tests suite.
I ran the tests under Windows VISTA on a dual-core Pentium, using Corman Lisp version 3.02 (beta, pre-release). I made a number of changes to the Corman Lisp release. The changes are detailed in the file ‘compatibility.lisp’ from the distribution.
Usage instructions are available in the comments at the head of the ‘run-cl-ansi-tests.lisp’ file in the distribution. If you attempt to run the tests via the usual ‘doit.lsp’, you will crash.
I ran as many of the tests as I could, given my time constraints. In the end, I was able to run 17,271 tests out of a total of more than 21,000 tests.
The distribution includes a copy of the test output with a listing of all failed tests in the file ‘cl-ansi-tests.24052008.results’.
Results and Interpretation of the ANSI Conformance Tests
The result of my testing, as of 25 May 2008, using Corman Lisp version 3.02 (beta, pre-release) and a modified version of the cl-ansi-tests-20050704.orig test suite downloaded from the UBUNTU archives, was that ccl failed 4,643 tests and passed 12,628 tests. I was able to run 17,271 tests; I was unable to run approximately 4,000 tests.
The runnable tests were in the following categories:
symbols
iteration
cons
arrays
hash-tables
types-and-classes
strings
characters
pathnames
files
streams
conditions
packages
sequences
system-construction
objects
numbers
eval-and-compile
data-and-control-flow
structures
I was unable to run the following categories of tests:
printer
reader
environment
miscellaneous
I did not run those categories of tests because either the tests would not load, or, once loaded, the tests would hang or crash my system repeatedly. I ran out of time to debug them.
The largest category of test failures resulted from non-conforming error-signaling behaviors: specifically, Corman Lisp often signals an error at the correct spot, but signals the incorrect error according to the spec. For example, Corman Lisp will typically signal a ‘SIMPLE-ERROR if the wrong number of arguments is supplied to a function; the specification is for a ‘PROGRAM-ERROR.
The second largest category of tests failures was caused by deficiencies in ccl’s type system with
regard to the specification.
The next largest category of test failures was caused by improper handling of the :allow-other-keys
keyword: specifically, the spec states that any function or macro accepting any keyword must also
accept the :allow-other-keys keyword. Corman Lisp incorrectly signals an error if :allow-other-keys
is used without being explicitly declared in the function or macro lambda list.
These three categories of failures accounted for approximately half of all test failures.
Other significant categories of test failures arose from ccl’s type system, incomplete implementation
of the crufty and byzantine ‘loop’ facility, and the failure of the :default-initargs option to
defclass. You can peruse the complete listing of program output in the file
‘cl-ansi-tests.24052008.results’ from the distribution.
Users should be aware that a small number of test cases failed due to non-ANSI scoping rules for
special variables in certain unusual cases.
Based on the results of this round of testing, it appears that significant improvement could be made
very quickly to Corman Lisp’s ANSI compatibility by attending to the following (in order of sheer
numbers of tests):
1. Error signaling in accordance with CLtL2 (> 1150 cases);
2. Harmonizing ccl’s type system with the spec (> 700 cases);
3. Fixing :allow-other-keys (> 300 cases);
4. Implementing the rest of the LOOP spec (> 150 cases);
5. Fixing the :default-initargs option to defclass.
I estimate that fixing these issues would reduce the number of failed tests by an order of magnitude.
I should emphasize that the large majority of failures exhibited by ccl were rather trivial: improper error signaling, a nit-picking problem with :allow-other-keys, missing LOOP features that nobody uses anyway, etc. A few of the failures were more serious in terms of incorrect code generation or missing, but useful, language features. (For example, due to a code-generation error, #’with-package-iterator is broken.) Generally, I would say the results of the testing were quite positive: ccl is a nearly ANSI compliant implementation of lisp, none of the non-compliant behaviors are difficult to fix, none of the non-compliant behaviors are terribly important, and a little effort can yield a big improvement.
Results of Stability Testing
Unfortunately, considered as a stability test, the results of running the test suite were disappointing. I encountered context-dependent and intermittent instabilities in ccl during the course of testing. I found several cases where, separately, I could run Program A, or I could run Program B, but if I tried to run Program A followed directly by Program B, ccl would hang or crash.
I also encountered several non-reproducible problems where ccl would run a program perfectly well part of the time, but would hang or crash other times. Specifically, I cannot run the entire test suite at once: I must load and execute parts of the suite file-by-file. If I try to load the entire test suite, ccl will hang.
Summary & Caveats
Corman Lisp conforms substantially to the CLtL2 spec, although significant areas of non-conformance exist. I would estimate that the most significant areas of non-compliance in terms of numbers of pages of the CLtL2 specification violated are (in some rough order):
1. Error signaling with incorrect condition types
2. The type system mishandles many objects
3. The LOOP specification is not completely implemented
I would estimate that the most significant areas of non-conformance in practical terms of creating bugs and gotchas in code you are writing or trying to port are:
1. The :default-initargs option to defclass does not work
2. The :test-not option to several functions (member, adjoin, etc.) does not work
3. No support for synonym streams or broadcast streams
4. Some FORMAT directives (notably ~<~>) are broken
Subjectively, the most important current issue standing in the way of Corman Lisp’s suitability for product-deployment is not *any* of the ANSI conformance issues, which are relatively minor; the most important current issue is the stability of Corman Lisp measured by the frequency of opaque crashes and system hangs (hang = processor runs away and never comes back). The stability issues must be fixed before Corman Lisp can be considered production-ready.
The strengths of Corman Lisp are its source-code (which is both readily-available and cleanly-written), its compatibility with Microsoft’s C/C++ compiler (which is the compiler of choice for a vast number of commercial applications), and its intuitive and versatile FFI (compare, for instance, how difficult it is to call Lisp from C in SBCL vs. how easy it is in Corman Lisp and you will see what I mean).
Based on my experience over the past few months with Corman Lisp, including writing approximately 5000 lines of cormanlisp lisp, I would say the support I’ve received from Roger has been outstanding and the lisp implementation is generally reliable and fast. If the stability issues with the Corman Lisp implementation can be fixed, I would not hesitate to recommend Corman Lisp for anyone who needs a tightly-integrated lisp/C/C++ solution at a quite reasonable cost.
May 2008
Lisp-Unit-Tester — a portable unit-tester for Lisp
May 25, 2008
Lisp-Unit-Tester (aka LUT)
[Note: this document is a revised version of the original written by Chris Riesbeck. It has been modified to reflect changes made to the lisp-unit package (including a name change to lisp-unit-tester) at the pleasure of Neil Haven. When I posted this description yesterday, 25 May 2008, I neglected to include a link to download the package, my apologies -- it is now included below where it says download here.]
lisp-unit-tester (with nickname :lut) is a Common Lisp library that supports unit testing (download here). Unit testing is the coding practice wherein a library of functionality tests is developed along with, or even ahead of, production code. Unit testing helps detect regressions (code changes that break old functionality), helps focus design efforts on meeting practical and well-defined goals, and helps detect bugs.
There is a long history of testing packages in Lisp, traditionally called “regression” testers, see, for example, RT (ca 1990), CLUnit (ca 2000), Heute (ca 2007). Unit testing packages in Lisp and other languages benefit from inspiration by eXtreme Programming practices and JUnit for Java.
This page has two parts:
Overview
Chris Riesbeck’s main goal for the original unit-tester package was to make it simple to use. It is well-suited to both beginning and advanced Lisp programmers. The advantages of unit-tester, and the revised version lisp-unit-tester, are:
- Written in portable Common Lisp. The package has been run under sbcl, clisp, corman common lisp, lispworks lisp, and so on…
- Just one file to load.
- Dead-simple to define and run tests. See example.
- Supports redefining functions and even macros without reloading tests.
- Supports test-first programming.
- Supports testing return values, printed output, macro expansions, and error conditions.
- Produces short readable output with a reasonable level of detail.
- Groups tests by package for modularity.
The changes from Chris Reisbeck’s original package now present in lisp-unit-tester are the following:
- Unexpected errors in test code no longer abort tests;
- All tests are automatically numbered sequentially for ease of reference;
- Groups of tests are executed in the order defined rather than alphabetically;
- The package now works under CormanLisp;
- All exported functions are now documented;
- The user has the option to set a verbose success mode so that successful tests are not silent;
- Unexpected errors in tests are tallied and reported.
How to Use lisp-unit-tester
- Load (or compile and load) lisp-unit-tester.lisp.
- Evaluate (use-package :lut) or, if you prefer a lot of typing, (use-package :lisp-unit-tester)
- Load a file of tests. See below for how to define tests. An example file (test-lisp-unit.lisp) is provided with the distribution.
- Run the tests with
test.
Uniquely numbered test results will be printed, along with a summary of how many tests were run, how many passed, how many failed, and how many error conditions were generated. (It is possible to customize the behavior of :lut so that only test failures will be printed; it is also possible to customize the behavior of :lut so that unexpected error conditions halt testing.)
You define a test with define-test:
(define-test name exp1 exp2 …)
This defines a test called name. The expressions can be anything, but typically most will be assertion forms.
Tests can be defined before the code they test, even if they’re testing macros. This is to support test-first
programming.
After defining your tests and the code they test, run the tests with
(test)
This runs every test defined in the current package. To run just a specific test, use
(test name1)
To run multiple specific tests, use
(test (name1 name2 …))
To run tests in a particular package named mypackage, use
(test (name1 name2 …) :in-package mypackage)
e.g., (test (greater summit) :in-package mypackage).
The following example
- defines some tests to see if
pick-greaterreturns the larger of two arguments - defines a deliberately broken version of
pick-greater - runs the tests
First, we define some tests.
> (define-test pick-greater
(assert-equal 5 (pick-greater 2 5))
(assert-equal 5 (pick-greater 5 2))
(assert-equal 10 (pick-greater 10 10))
(assert-equal 0 (pick-greater -5 0)) )
PICK-GREATER
Following good test-first programming practice, we run these tests before writing any code.
> (test pick-greater)
PICK-GREATER:1 (PICK-GREATER 2 5) failed:
Expected 5 but saw #<Undefined-Function #x1719AD0>; …
PICK-GREATER:2 (PICK-GREATER 5 2) failed:
Expected 5 but saw #<Undefined-Function #x175ACC0>; …
PICK-GREATER:3 (PICK-GREATER 10 10) failed:
Expected 10 but saw #<Undefined-Function #x177A640>; …
PICK-GREATER:4 (PICK-GREATER -5 0) failed:
Expected 0 but saw #<Undefined-Function #x179A170>; …
PICK-GREATER: 0 assertions passed, 4 failed; including 4 execution errors.
This shows that we need to do some work. So we define our broken version of pick-greater.
> (defun pick-greater (x y) x) ;; deliberately wrong
PICK-GREATER
Now we run the tests again:
> (test pick-greater)
PICK-GREATER:1 (PICK-GREATER 2 5) failed: Expected 5 but saw 2
PICK-GREATER:2 (PICK-GREATER 5 2) succeeded: Got 5
PICK-GREATER:3 (PICK-GREATER 10 10) succeeded: Got 10
PICK-GREATER:4 (PICK-GREATER -5 0) failed: Expected 0 but saw -5
PICK-GREATER: 2 assertions passed, 2 failed; including 0 execution errors.
This shows two failures. The first test failed because (pick-greater 2 5) returned 2 when 5 was expected, and the fourth test case failed because (pick-greater -5 0) returned -5 when 0 was expected.
Assertion Forms
The most commonly used assertion form is
(assert-equal value form)
This tallies a failure if form returns a value not #’equal to value. Both value and test are evaluated in the local lexical environment. This means that you can use local variables in tests. In particular, you can write loops that run many tests at once:
> (define-test my-sqrt
(dotimes (i 5)
(assert-equal i (my-sqrt (* i i)))))
MY-SQRT
> (defun my-sqrt (n) (/ n 2)) ;; wrong definition!!
> (set-verbose-success nil) ;; only show failures
> (test my-sqrt)
MY-SQRT:2 (MY-SQRT (* I I)) failed: Expected 1 but saw 1/2
MY-SQRT:4 (MY-SQRT (* I I)) failed: Expected 3 but saw 9/2
MY-SQRT:5 (MY-SQRT (* I I)) failed: Expected 4 but saw 8
MY-SQRT: 2 assertions passed, 3 failed; including 0 execution errors.
Notice that the above output doesn’t tell us for which values of i the code failed. Fortunately, you can fix this by adding expressions at the end of the assert-equal. These expression and their values will be printed when the results of an assertion are announced.
> (define-test my-sqrt
(dotimes (i 5)
(assert-equal i (my-sqrt (* i i)) i))) ;; added i at the end
MY-SQRT
> (run-tests my-sqrt)
MY-SQRT:2 (MY-SQRT (* I I)) failed: Expected 1 but saw 1/2
I => 1
MY-SQRT:4 (MY-SQRT (* I I)) failed: Expected 3 but saw 9/2
I => 3
MY-SQRT:5 (MY-SQRT (* I I)) failed: Expected 4 but saw 8
I => 4
MY-SQRT: 2 assertions passed, 3 failed; including 0 execution errors.
The next most useful assertion form is
(assert-true test)
This tallies a failure if test returns false. Again, if you need to print out extra information, just add expressions after test.
There are also assertion forms to test what code prints, what errors code returns, or what a macro expands into. A complete list of assertion forms is in the reference section.
How to Organize Tests with Packages
Tests are grouped internally by the current package, so that a set of tests can be defined for one package of code without interfering with tests for other packages.
If your code is being defined in cl-user, which is common when learning Common Lisp, but not for production-level code, then you should define your tests in cl-user as well.
If your code is being defined in its own package, you should define your tests either in that same package, or in another package for test code. The latter approach has the advantage of making sure that your tests have access to only the exported symbols of your code package.
For example, if you were defining a date package, your date.lisp file would look like this:
(defpackage :date
(:use :common-lisp)
(:export #:date->string #:string->date))
(in-package :date)
(defun date->string (date) …)
(defun string->date (string) …)
Your date-tests.lisp file would look like this:
(defpackage :date-tests
(:use :common-lisp :lisp-unit-tester :date))
(in-package :date-tests)
(define-test date->string
(assert-true (string= ... (date->string ...)))
...)
...
You could then run all your date tests in the test package:
(in-package :date-tests) (test)
Alternately, you could run all your date tests from any package with:
(lisp-unit-tester:test :in-package :date-tests)
Reference Section
Here is a list of the functions and macros exported by lisp-unit-tester.
Functions for managing tests
- (define-test name exp1 exp2 …)
- This macro defines a test called name with the expressions specified, in the package specified by the value of
*package*in effect whendefine-testis executed. The expresssions are assembled into runnable code whenever needed byrun-tests. Hence you can define or redefine macros without reloading tests using those macros. - (remove-tests names [package])
- This function removes the tests named for the given package. If no package is given, the value of
*package*is used. - (remove-all-tests [package])
- This function removes the tests for the given package. If no package is given, it removes all tests for the current package. If
nilis given, it removes all tests for all packages. - (test [test | (test*)] [:in-package package])
- Arguments: test – (not evaluated) a symbol designating a test defined with define-test
package – the package to run the tests in, defaults to *package*
Semantics: run the named test or tests in the package. If the tests are not provided or are
nil, then all tests in <package> are run sequentially.
Examples:
(test test1)
(test (test1 test2) :in-package :cl-user)
See Also: define-test
Returns: no values - (use-debugger flag)
- If <flag> is nil, errors will be caught by lisp-unit-tester and the debugger will not be invoked. If <flag> is t, lisp-unit-tester will not attempt to intercept errors. By default, lisp-unit-tester is configured to intercept errors so that testing can proceed without the debugger.
- (set-verbose-success flag)
- Set <flag> to t when you want lisp-unit-tester to announce successful tests as well as failures. Set <flag> to nil when you want lisp-unit-tester only to announce failures. By default, lisp-unit-tester is configured to announce successes as well as failures.
- (set-abort-on-conditions flag)
- Set <flag> to t if you want an unexpected error signal to abort an entire test. Set <flag> to nil if you want a test to continue after encountering an unexpected error. By default, lisp-unit-tester is configured to continue after encountering unexpected errors.
- (set-tag number)
- By default, the assertions within a particular test suite are numbered sequentially starting at 1. You can change the origin of the running sequence by calling 'set-tag. This can be useful if you need to have a unique identifying tag for each assertion within a test.
Forms for assertions
All of the assertion forms are macros. They tally a failure if the associated predication returns false. Assertions can be made about return values, printed output, macro expansions, and even expected errors. Assertion form arguments are evaluated in the local lexical environment.
All assertion forms allow you to include additional expressions at the end of the form. These expressions and their values will be printed only when the test fails.
Return values are unspecified for all assertion forms.
-
Assertions Involving Equality
(assert-eq value form [form1 form2 ...])
(assert-eql value form [form1 form2 ...])
(assert-equal value form [form1 form2 ...])
(assert-equalp value form [form1 form2 ...])
(assert-equality predicate value form [form1 form2 ...]) - These macros tally a failure if value is not equal to the result returned by form, using the specified equality predicate. In general,
assert-equalis used for most tests. - Example use of
assert-equality:
(assert-equality #’set-equal ‘(a b c) (unique-atoms ‘((b c) a ((b a) c)))) -
Assertions Involving Boolean Value
(assert-true test [form1form2 ...])
(assert-false test [form1form2 ...])
assert-truetallies a failure if test returns false.
assert-falsetallies a failure if test returns true.-
Assertions of What Gets Printed
(assert-prints “output” form [form1 form2 ...])
- This macro tallies a failure if form does not print to standard output stream a string equal to the given string, ignoring differences in beginning and ending newlines.
-
Assertions Involving Macroexpansion
(assert-expands expansion form [form1 form2 ...])
-
This macro tallies a failure if
(macroexpand-1 form)does not produce a value equal to expansion.
-
Assertions of Error Conditions
(assert-no-error form [form1 form2 ...])
-
This macro tallies a failure if form signals an unhandled condition.
- (assert-error condition-type form [form1 form2 ...])
- This macro tallies a failure if form does not signal an error that is equal to or a subtype of condition-type. Use
errorto refer to any kind of error. See condition types in
the Common Lisp Hyperspec for other possible names. - For example, (assert-error ‘arithmetic-error (foo 0)) would assert that
foois supposed to signal an arithmetic error when passed zero.
Utility predicates
Several predicate functions are exported that are often useful in writing
tests with assert-equality.
- (logically-equal value1value2)
- This predicate returns true of the two values are either both true, i.e.,
non-NIL, or both false. - (set-equal list1list2 [:test])
- This predicate returns true the first list is a subset of the second and
vice versa.:testcan be used to specify an equality predicate.
The default iseql.
Package Documentation for Corman Lisp
May 5, 2008
Note: Applies to Corman Lisp versions > 3.01 running under Windows XP or Vista. The current release of Corman Lisp (version 3.01) needs some patching to work with :document-package. When the next release of Corman Lisp happens, I will update things as needed; until then, if you have an urgent need for a package documentation library under Corman Lisp you may contact me at g n e i l .at. L i b e r t y R e a c h . c o m and I will send you the necessary patches.
Document-Package – a package for documenting packages in Corman Lisp
I am releasing a utility for documenting a lisp package — unsurprisingly called :document-package. It is a brain-dead cousin to the familiar doxygen utility. The major difference is that, in lisp, it is possible and considered good practice to include documentation strings as part of the definition of your lisp objects as you create them. Thus, a properly-written lisp library should be self-documenting.
The :document-package package steps through all exported and local objects in a package, extracts their documentation strings and other information such as associated lambda-lists, types, etc., and displays it all in a legible format. It is possible to produce package documentation as plain text or as formatted html.
You can download the source for the package from here. There is a document-package.system file included in the distribution if you prefer to load via mk:defsystem. There is also a load-me.lisp file if you wish to simply load a file to get the package. In any event, (use-package :dp) will make the package available. You can test the package by running it on itself via (document :dp).
The following documentation was produced by running (document :dp :html t).
Documentation for Package DOCUMENT-PACKAGE
Nicknames:
- DP
Package Uses:
- COMMON-LISP
The :document-package package defines several useful documentation functions for use with Corman
Common Lisp v3.02+. It is meant to help in collating and writing documentation for lisp packages.
The way to use this package is as follows (in order of importance):
1. Document your package by placing a nice informative documentation string in the defpackage
declaration. Like this one!
2. Document each of your exported functions. Ideally including preconditions, semantics, side
effects, error conditions, and return values for each function.
3. Document each of your exported variables. Describe their semantics and legal values.
4. Document each of your local functions at least by describing the semantics.
Run the function (document ‘package-name). All the documentation will be
collected and displayed.
To test this package:
1. Un-block-comment the series of definitions at the end of the document-package.lisp file.
2. Reload the document-package package.
3. Run document-package on itself via (dp:document :dp) or (dp:document ‘DP) or
(dp:document “DOCUMENT-PACKAGE”) — they should all work by printing documentation to stdout.
Author: Neil Haven
Contact: [DELETED]
Release Version: 1.0.0 [May 2008]
Legal Restrictions on Use: GNU LGPL
Symbols Exported from DOCUMENT-PACKAGE
DOCUMENT [FUNCTION]
Usage: (DOCUMENT PACKAGE &KEY (STREAM *STANDARD-OUTPUT*) (HTML NIL))
Syntax: (document package :stream stream :html flag)
Arguments: /package/ is a package designator. The output /stream/ defaults to *standard-output*.
/html/ defaults to nil.
Preconditions: /package/ is loaded.
Semantics: collect and print all documentation for /package/ onto the output /stream/. If
/html/ is t, then the documentation is formatted as html.
Side Effects:
Error Conditions: /package/ must be loaded
Returns: t on success, nil on failure.
Symbols Internal to DOCUMENT-PACKAGE
FORMAT-H2 [MACRO]
Usage: (FORMAT-H2 STREAM FMT &REST ARGS)
Creates an header2 element
DOCUMENT-TYPE-BINDING [FUNCTION]
Usage: (DOCUMENT-TYPE-BINDING SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a type from deftype
FORMAT-H1 [MACRO]
Usage: (FORMAT-H1 STREAM FMT &REST ARGS)
Centers a header1 element
FORMAT-TITLED-LIST [MACRO]
Usage: (FORMAT-TITLED-LIST STREAM TITLE LIST)
==> title: a, b, c
NOPKG [FUNCTION]
Usage: (NOPKG STREAM ARGUMENT COLON ATSIGN &REST X)
a callable print function to ignore package designators in output
DOCUMENT-STRUCT-BINDING [FUNCTION]
Usage: (DOCUMENT-STRUCT-BINDING SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a structure from defstruct
FIND-DOC-STRING [FUNCTION]
Usage: (FIND-DOC-STRING BODY)
searches for a documentation string in a method body
DOCUMENT-SYMBOL-MACRO [FUNCTION]
Usage: (DOCUMENT-SYMBOL-MACRO SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a symbol macro
DOCUMENT-FUNCTION-BINDING [FUNCTION]
Usage: (DOCUMENT-FUNCTION-BINDING SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
dispatch depending on function type; defmacro, defun, defgeneric, define-modify-macro
FORMAT-HTML-HEADER [MACRO]
Usage: (FORMAT-HTML-HEADER STREAM TITLE)
Create a html header with a title
DOCUMENT-SETF [FUNCTION]
Usage: (DOCUMENT-SETF SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for defsetf
DOCUMENT-SETF-EXPANDER [FUNCTION]
Usage: (DOCUMENT-SETF-EXPANDER SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for define-setf-expander
DOCUMENT-METHOD-FUNCTION [FUNCTION]
Usage: (DOCUMENT-METHOD-FUNCTION METHOD &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for standard method functions from defmethod
REFORMAT-TEXT-FOR-HTML [MACRO]
Usage: (REFORMAT-TEXT-FOR-HTML TEXT)
==> text
DOCUMENT-STANDARD-FUNCTION [FUNCTION]
Usage: (DOCUMENT-STANDARD-FUNCTION SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a function from defun
DOCUMENT-CLASS-BINDING [FUNCTION]
Usage: (DOCUMENT-CLASS-BINDING SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a class from defclass
*HTML-VERSION* [SYMBOL]: T
set to t for output formatted as html, nil otherwise
FORMAT-HTML-FOOTER [MACRO]
Usage: (FORMAT-HTML-FOOTER STREAM)
Create a html footer
DOCUMENT-MACRO-FUNCTION [FUNCTION]
Usage: (DOCUMENT-MACRO-FUNCTION SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a macro from defmacro and define-modify-macro
DOCUMENT-GENERIC-FUNCTION [FUNCTION]
Usage: (DOCUMENT-GENERIC-FUNCTION SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a generic function from defgeneric
DOCUMENT-VARIABLE-BINDING [FUNCTION]
Usage: (DOCUMENT-VARIABLE-BINDING SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a special variable via defvar defparameter defconstant
PRINT-DOCUMENTATION [FUNCTION]
Usage: (PRINT-DOCUMENTATION VAR &OPTIONAL (STREAM *STANDARD-OUTPUT*))
print documentation for var on stream
DOCUMENT-COMPILER-MACRO-FUNCTION [FUNCTION]
Usage: (DOCUMENT-COMPILER-MACRO-FUNCTION SYM &OPTIONAL (STREAM *STANDARD-OUTPUT*))
documentation for a compiler macro from define-compiler-macro
BREAK-TEXT [FUNCTION]
Usage: (BREAK-TEXT TEXT)
substitutes [br/]
~% for all occurrences of
~% and #newline in text
How To Run mk:defsystem.lisp under Corman Common Lisp
April 19, 2008
I recently spent some time getting the clocc distribution of defsystem.lisp working with Corman Lisp. If you are developing a large system you need some sort of system definition facility; In Lisp you basically have two choices if you don’t want to roll your own: asdf:defsystem or mk:defsystem. Corman Lisp has problems with some features of asdf, so that leaves mk:defsystem. The following is a description of what needed to be done to get mk:defsystem working.
I have banged on it some, but not a lot yet. After the next release of Corman Lisp I will petition to get the changes added to mk:defsystem so that it works out of the box.
Applies to:
Corman Common Lisp v3.01+, mk:defsystem v3.6i
Source:
downloaded defsystem.lisp from
http://clocc.cvs.sourceforge.net/*checkout*/clocc/clocc/src/defsystem-3.x/defsystem.lisp
into ~/lisp/systems directory.
Required changes to source defsystem.lisp:
remove the defun near line 1082:
#+cormanlisp
(defun compile-file-pathname…)
remove the defun near line 1091:
#+cormanlisp
(defun file-namestring…)
remove the defun near line 1231:
#+cormanlisp
(defun home-subdirectory…)
remove the macro “#-cormanlisp” near line 1243 so that the region reads
(defun home-subdirectory…
instead of
#-cormanlisp
(defun home-subdirectory…
Changes to Corman Lisp:
Added a few definitions to [CORMAN]init.lisp for common-lisp compatibility
(defun file-namestring (path)
(let ((name (pathname-name path))
(type (pathname-type path)))
(format nil “~@[~A~]~@[.~A~]” name type)))
(defun software-version () nil)
(defun software-type () nil)
(defun directory-namestring (path)
(let ((dir (pathname-directory path)))
(namestring (make-pathname :directory dir))))
Added location for defsystem module in [CORMAN]init.lisp:
(setq *lisp-systems-directory* (concatentate ’string (namestring (user-homedir-pathname)) “lisp\\systems\\”))
(push-module-directory *lisp-systems-directory*)
Changes for Convenience
You will probably want to change some global defaults in defsystem.lisp so defsystem doesn’t prompt you needlessly. Here are my settings (beginning around line 1319 in defsystem.lisp).
Variable: *oos-verbose* Setting: t
Variable: *load-source-if-no-binary* Setting: t
Variable: *bother-user-if-no-binary* Setting: nil
Variable: *load-source-instead-of-binary* Setting: t
Variable: *compile-during-load* Setting: nil
Testing:
1. Create ~/lisp/systems/test.system to contain:
(mk:defsystem “test”
:source-pathname *lisp-systems-directory*
:source-extension “lisp”
:binary-pathname nil
:binary-extension nil
:components ((:file “test”))
:depends-on nil)
2. Create ~/lisp/systems/test.lisp to contain:
(defpackage “TEST”
(:documentation “A test package for defsystem”))
(in-package “TEST”)
(export ‘(test-function))
(defun test-function ()
“Test package found and functioning.”)
3. Start Corman Lisp.
4. (require ‘defsystem)
==> ignorable compiler warnings…
==> “DEFSYSTEM”
[4a. optional: (use-package "MAKE")]
5. (mk:oos “TEST” :load :verbose t)
==> …
==> (#<FILE: test>)
6. (use-package :test)
==> T
7. (test-function)
==> “Test package found and functioning.”
Notes:
Another system definition facility, ASDF, is part of the Corman Lisp distribution, but it tended to
hang my system under XEmacs and Corman Lisp, so I have been using mk:defsystem.
A queer feature of the defpackage and export functions is that, under Corman Lisp anyway, defpackage
requires you to designate exported symbols via upcase strings (in most cases) whereas export
requires you to designate the symbols by using the symbols. This is in accord with the CLHS, but
counter-intuitive.
The :compile option to oos seems to cause problems. Stick with :load.
Documentation for defsystem:
Excerpted verbatim from http://rpgoldman.real-time.com/lisp/defsystem.html
The general format of a component’s definition is:
<definition> ::= (<type> <name> [:host <host>] [:device <device>]
[:source-pathname <pathname>]
[:source-extension <extension>]
[:binary-pathname <pathname>]
[:binary-extension <extension>]
[:package <package>]
[:initially-do <form>]
[:finally-do <form>]
[:components (:serial <definition>*)|(<definition>*)]
[:depends-on (<name>*)]
[:compile-form <form>]
[:load-form <form>])
<type> ::= :system | :module | :file
<def-or-fn> ::= <definition> | <pathname>
Using Systems with Operate-on-System
The function operate-on-system (aka: oos) is used to compile or load a
system, or do any other operation on a system. At present only compile and load operations are
defined, but other operations such as edit, hardcopy, or applying arbitrary functions (e.g.,
enscript, lpr) to every file in the system could be added easily.
The syntax of operate-on-system is as follows:
operate-on-system system-name operation
&key force test verbose dribble load-source-instead-of-binary
load-source-if-no-binary bother-user-if-no-binary
SYSTEM-NAME
is the name of the system and may be a symbol or string.
OPERATION
is ‘compile (or :compile) or ‘load (or :load) or any new operation defined by the user.
FORCE
determines what files are operated on: :all (or T) specifies that all files in the system should be
used :new-source compiles only those files whose sources are more recent than the binaries and loads
the source if it is newer than the binaries. This allows you to load the most up to date version of
the system. :new-sources-and-dependents uses all files used by :new-source, plus any files that
depend on the those files or their dependents (recursively). Force may also be a list of the
specific modules or files to be used (plus their dependents). The default for ‘load is :all and for
‘compile is :new-source-and-dependents.
VERSION
indicates which version of the system should be used. If nil, then the usual root directory is
used. If a symbol, such as ‘alpha, ‘beta, ‘omega, :alpha, or ‘mark, it substitutes the appropriate
(lowercase) subdirectory of the root directory for the root directory. If a string, it replaces the
entire root directory with the given directory.
VERBOSE
is T to print out what it is doing (compiling, loading of modules and files) as it does it. (default
nil)
TEST
is T to print out what it would do without actually doing it. If test is T it automatically sets
verbose to T. (default nil)
DRIBBLE
should be the pathname of a dribble file if you want to keep a record of the compilation. (default
nil)
LOAD-SOURCE-INSTEAD-OF-BINARY
is T to force the system to load source files instead of binary files. (default nil)
LOAD-SOURCE-IF-NO-BINARY
is T to have the system load source files if the binary file is missing. (default nil)
BOTHER-USER-IF-NO-BINARY
is T to have the system bother the user about missing binaries before it goes ahead and loads them
if load-source-if-no-binary is T. (default t) Times out in 60 seconds unless *use-timeouts* is set
to nil.
Changes to Require
This defsystem interacts smoothly with the require and provide facilities of Common
Lisp. Operate-on-system automatically provides the name of any system it loads, and uses the new
definition of require to load any dependencies of the toplevel system.
To facilitate this, three new optional arguments have been added to require. Thus the new syntax of
require is as follows:
require system-name &optional pathname definition-pname default-action version
If pathname is provided, the new require behaves just like the old definition. Otherwise it first
tries to find the definition of the system-name (if it is not already defined it will load the
definition file if it is in the current-directory, the central-registry directory, or the directory
specified by definition-pname) and runs operate-on-system on the system definition. If no definition
is to be found, it will evaluate the default-action if there is one. Otherwise it will try running
the old definition of require on just the system name. If all else fails, it will print out a
warning.
A Sample System Definition and Its Use
Here’s a system definition for the files in the following directory structure:
% du -a test
1 test/fancy/macros.lisp
1 test/fancy/primitives.lisp
3 test/fancy
1 test/macros.lisp
1 test/primitives.lisp
1 test/graphics/macros.lisp
1 test/graphics/primitives.lisp
3 test/graphics
1 test/os/macros.lisp
1 test/os/primitives.lisp
3 test/os
12 test
(defsystem test
:source-pathname “/afs/cs.cmu.edu/user/mkant/Defsystem/test/”
:source-extension “lisp”
:binary-pathname nil
:binary-extension nil
:components ((:module basic
:source-pathname “”
:components ((:file “primitives”)
(:file “macros”
:depends-on (“primitives”))))
(:module graphics
:source-pathname “graphics”
:components ((:file “macros”
:depends-on (“primitives”))
(:file “primitives”))
:depends-on (basic))
(:module fancy-stuff
:source-pathname “fancy”
:components ((:file “macros”
:depends-on (“primitives”))
(:file “primitives”))
:depends-on (graphics operating-system))
(:module operating-system
:source-pathname “os”
:components ((:file “primitives”)
(:file “macros”
:depends-on (“primitives”)))
:depends-on (basic)))
:depends-on nil)
<cl> (operate-on-system ‘test ‘compile :verbose t)
Slime, Swank, and Corman Lisp v3.01+
April 4, 2008
How to Run Corman Lisp with XEmacs/SLIME
I have more-or-less successfully got Corman Common Lisp running with XEmacs/SLIME. I will describe how…
In order to get things running I’ve made a number of changes to Swank and Corman Lisp. If the cookbook below doesn’t work for you I have probably forgotten to document one of the changes, so if you’ll let me know I’ll try to track it down.
Versions
This report applies to:
-
xemacs (version 21.4.21)
-
slime (version 2.0 changelog dated 2006-04-20)
-
corman common lisp (version 3.01)
-
Windows XP and Vista.
Procedure
Download and install XEmacs and Corman Common Lisp.
Modify XEmacs
Run XEmacs and make the following modifications to your init file by either editing it directly (if you know where XEmacs looks for it) or by using /Options/Edit Init File from the XEmacs menu bar:
(add-to-list ‘load-path “/Program Files/XEmacs/xemacs-packages/lisp/slime-2.0″)
(require ’slime)
(slime-setup :autodoc t)
Of course, if you have installed XEmacs in a non-default location, you must modify the load path accordingly.
Modify Corman Common Lisp
You need patched versions of the following Corman Lisp distribution files: clos.lisp, xp.lisp, and trace.lisp. These are available from Corman Lisp. When the version of Corman Lisp subsequent to v3.01 is available you will be able simply to use the normal files from the distribution and will not need to acquire patched files from Corman Lisp. In the meantime, you should add the patched files into your Corman Lisp image.
You must modify the Sys/sockets.lisp file and the Sys/winsock.lisp files from the Corman Lisp distribution as suggested here by ungil on 19 Sep 2006. With apologies, I am copying ungil’s changes verbatim here:
| $ diff -b sockets.lisp 469c469 < (winsock::listen (socket-descriptor s) SOMAXCONN)))) — > (listen (socket-descriptor s) SOMAXCONN))))$ diff -b winsock.lisp 14,15d13 < (shadow ‘(cl:listen)) < |
I have integrated these changes into my Corman Lisp image. They do not seem to have broken anything, but I cannot be sure. If you want to be on the safe side, do not add the sockets.lisp and winsock.lisp changes into your lisp image, just modify the relevant functions after initializing your lisp.
You must modify the Modules/gray-streams.lisp file so that it defines stream-terpri. Add the following code to the file
| (defmethod stream-terpri ((stream fundamental-character-output-stream)) (if (stream-start-line-p stream) nil (progn (stream-write-char stream #\newline) t))) |
If you prefer, you can download the modified file from here, and simply copy it over the old one in the Modules directory. The modified file contains a number of additional stub function definitions that should be transparent to any code currently using gray-streams.lisp.
You need to make a few additional, special changes to Corman Lisp for Slime compatibility. I have collected these in a separate file called slime-compatibility.lisp here. The file slime-compatibility makes the following modifications:
-
Fix the definition of #’user-homedir-pathname;
-
Redefine #’pprint so that it sets *print-pretty* nil;
-
Redefine #’cl:machine-instance, #’cl:machine-type, #’cl:machine-version;
-
Add a feature to *features*;
When you start Corman Lisp, you will need to reload sockets.lisp, winsock.lisp, gray-streams.lisp, and slime-compatibility.lisp so that the new definitions are visible to the system. Alternatively, you can create a new lisp image using the function #’compile-cormanlisp-image. (Consult the documentation.)
You must load all these changes into Corman Lisp before loading swank.
Modify Swank
You must make a number of modifications to the code distributed with XEmacs.
In the slime distribution file [XEmacs]/xemacs-packages/lisp/slime-2.0/swank.lisp you need to comment out a series of assertions. Open the file and search for ‘test-print-arglist’. Then comment out the progn as follows:
#-cormanlisp(progn
(assert (test-print-arglist ‘(function cons) “(function cons)“))
(assert (test-print-arglist ‘(quote cons) “(quote cons)“))
(assert (test-print-arglist ‘(&key (function #’+)) “(&key (function #’+))“))
(assert (test-print-arglist ‘(&whole x y z) “(y z)“))
(assert (test-print-arglist ‘(x &aux y z) “(x)“))
(assert (test-print-arglist ‘(x &environment env y) “(x y)“)))
;; Expected failure:
;; (assert (test-print-arglist ‘(&key ((function f))) “(&key ((function f)))”))
I have also redefined #’swank::swank-debugger-hook so that you can disable the debugger by controlling the value of *maybe-disable-debugger*. In addition to being convenient, it avoids some instabilities involved with the Corman Lisp Debugger. Here is the relevant section of swank.lisp:
#+cormanlisp
(defvar *maybe-avoid-debugger* t
“do not enter debugger if it is possible simply to abort the request”)
#+cormanlisp
(defun swank-debugger-hook (condition hook)
(declare (ignore hook))
(cond ((and *maybe-avoid-debugger* (find-restart ‘abort-request))
(send-to-emacs `(:write-string ,(format nil “~A” condition)))
(invoke-restart ‘abort-request))
(*emacs-connection*
(debug-in-emacs condition))
((default-connection)
(with-connection ((default-connection))
(debug-in-emacs condition)))))
#-cormanlisp
(defun swank-debugger-hook (condition hook)
“Debugger function for binding *DEBUGGER-HOOK*.
Sends a message to Emacs declaring that the debugger has been entered,
then waits to handle further requests from Emacs. Eventually returns
after Emacs causes a restart to be invoked.”
(declare (ignore hook))
(cond (*emacs-connection*
(debug-in-emacs condition))
((default-connection)
(with-connection ((default-connection))
(debug-in-emacs condition)))))
In the slime distribution file [XEmacs]/xemacs-packages/lisp/slime-2.0/swank-corman.lisp you need to modify the definition of #’arglist as follows:
| (defimplementation arglist (name) (handler-case (cond ((and (symbolp name) (macro-function name)) (let* ((function (symbol-function name)) (macro-list (ccl::macro-lambda-list function))) (if macro-list macro-list :not-available))) (t (when (symbolp name) (setq name (symbol-function name))) (if (eq (class-of name) cl::the-class-standard-gf) (generic-function-lambda-list name) (ccl:function-lambda-list name)))) (error () :not-available))) |
Appropriate swank-corman.lisp and swank.lisp files are available as a .zip file for download here. After the next release of Corman Common Lisp (whatever version comes after 3.01), I will see about getting the changes added to the official XEmacs distribution so the standard distribution will simply work out of the box.
Run Slime and Corman Lisp
-
Start Swank on the corman lisp side via the following steps:
-
Start [Corman Lisp]/CormanLisp.exe;
-
(load “Sys/winsock.lisp”)
-
(load “Sys/sockets.lisp”)
-
(load “Modules/gray-streams.lisp”)
-
(load “swank-compatibility.lisp”) ; from wherever you saved swank-compatibility.lisp
- Change directory to the slime distribution. For example (setf (cormanlisp:current-directory) “C:\\Program Files\\XEmacs\\xemacs-packages\\lisp\\slime-2.0\\“)
-
(load “swank-loader.lisp”) ; this will take some time
- (swank::create-server)
-
-
Assuming everything works, you should see a notice that Swank started at port: 4005.
-
Start xemacs.
-
Initiate slime via
-
Alt-x slime-connect<return>
-
Host: 127.0.0.1<return>
-
Port: 4005<return>
-
-
Assuming everything works, you should see a “Connected” notice.
If you prefer to keep the full debugging capabilities, setq swank::*maybe-avoid-debugger* t.
What Works; What Doesn’t
;;; As of
;;; 04 Apr 2008 -
;;; Functionality Description Yes/No/Kinda
;;; ===============================================
;;; C-M-x, etc. basic editing and eval Y
;;; arglist display K
;;; C-c C-c, etc compilation Y
;;; loading files Y
;;; C-c C-d d, &c describe/hyperspec Y
;;; debugger K
;;; C-c I inspector Y
;;; profiling N
;;; stepper N
;;; comm styles (:spawn etc.) N
;;; C-c C-w c, &c cross referencing N
;;; M-. edit definition Y
;;; M-TAB complete symbol Y
;;; C-c C-s complete form K
Notes:
The debugger is mostly ok. It is possible to crash it if you are ungentle. I personally find the lisp-style of debugger overkill in most instances, so it is nice to be able to disable it unless needed. If you want to disable it, at the slime repl setq swank::*maybe-avoid-debugger* t. If you want to re-enable it, setq swank::*maybe-avoid-debugger* nil.
Corman Lisp doesn’t currently feature a stepper.
The arglist display works for all user-defined functions and some built-in functions. Inexplicably, a large fraction of the built-in functions haven’t saved their arglists, so, for example, you can’t get arglist information on #’defun.
Qt: Quality Control, Documentation, Pre-Release Testing
March 31, 2008
In a response to a previous post, Henrik Hartz of Trolltech asked for some specific comments on the documentation and testing issues I find troubling in Trolltech’s Qt. After some consideration, I decided to publish my response to Henrik instead of taking it off-line. Although what I have to say is critical, I believe Qt is a good product and would not want anyone to construe what I have to say as advice not to use Qt in their development work. Do use Qt.
Since we’ve talked a bit about plugins, perhaps I can illustrate my concerns with a few recent examples regarding plugins. The first is quite simple: The Qt 4.3.4 distribution contains a ‘trivial’ example meant to illustrate the use of Qt’s plugin system — the echowindow program. The program will compile, but does not work (reported as Qt Issue N202995). To me, that fact is simply astounding. Trolltech should have a long series of tests that need to be run before any new version of Qt is released; the simplest tests should make sure that all of the demonstration code works. How could such a thing be overlooked? If Trolltech hasn’t completed the simplest tests properly, how can I be confident they have uncovered any problem that is complicated? When I add that fact together with the fact that QPluginLoader actually *crashes* your application when you try to use it in a static build of Qt (reported as Issue N202989), it makes me think QPluginLoader was tested very superficially indeed.
The second problem that really made me worry about Qt’s commitment to quality was the problem with Qt internal plugins (style, imageformat, sqldriver, …) being loaded in static builds of Qt. Because of this problem I had been unable to use the debugger or to run any Qt application from within Visual Studio for several months; my applications would crash unpredictably if I ran them within the Visual Studio environment. I brought the problem to Trolltech’s attention in August of 2007 (Issue N173669). It was finally resolved with the 4.3.4 release of Qt this year. Since the problem was painful for me, I can only assume it was painful for anyone else who tried to use a static build of Qt with Visual Studio. The big question is: why wasn’t it painful for Trolltech? Was nobody at Trolltech trying to use a static build of Qt under Visual Studio? If they were, their applications must have been crashing just like mine. But, obviously, they weren’t. If Trolltech claims to support Visual Studio, and if Trolltech claims to support static builds of Qt, Trolltech must use and test static builds under Visual Studio. But do they?.
I could tell similar stories about several different areas of Qt, but perhaps you get the idea. The amount of testing that is required for a product like Qt is *enormous*. And the bigger the application library gets, the more testing that is required. The pre-release testing requirements scale with the number of possible interactions in the code – the code-base squared, not linearly!
As much as anything, this is a question of attitude and management philosophy. I have attached a letter I sent to Trolltech late last year that tries to explain some of the attitude problems I see. It is meant to be constructive. The thing to remember is that it takes time for a user to report problems to Trolltech – I limit myself to an average of a couple bug reports a month, and for every one I report there are at least three or four I don’t. It is a thankless process for the users that results in a better product for Trolltech. Trolltech should realize this, use *every* report as an opportunity to improve either the documentation or the product, and be grateful.
I will close with the worst example I have seen from Trolltech’s support. Late last year, in regard to a problem where the documentation of one of Trolltech’s classes did not match the behavior of the class, and the example application from Trolltech confused the issue even further (reported as Issue N179283), I received an email from one of Trolltech’s support people that nicely stated the attitude problem,
“This is just an example application,” he said, “It is not supposed to be perfect.”
Nonsense, I say! If Trolltech isn’t supposed to get their sample applications perfect, how can we expect to get our real applications perfect? Trolltech’s goal *must* be to try to get their samples perfect, otherwise, how does Trolltech find problems? Sometimes I wonder whether Trolltech tries very hard. At the very least, I wonder whether Trolltech’s support people have the correct dog-on-a-bone attitude when it comes to finding and fixing problems.
Please understand that I find Qt to be a very good product; it is what Microsoft should have written for MFC, but didn’t. I have taken the time to write this post because I want Qt to succeed even more.
Letter to Trolltech
Before I give Trolltech a renewal order, I’d like to express my displeasure with your support group. The attitude I’d like to see from support is a dogged persistence in hunting down and removing bugs. Indeed, a bug report from a user like me should be welcomed as an opportunity to improve either the documentation (when the user misunderstands how to apply a Qt functionality) or the code (when some part of the Qt functionality is broken). I’m not sure that Trolltech understands this.
You should understand that it takes hours from my time to prepare a bug report for Trolltech. Trolltech won’t even look at a bug unless it is distilled down to a simple case and wrapped up in a nice, neat package — even when this is possible IT TAKES TIME! Since each bug report represents an opportunity to improve the Qt product, Trolltech’s first response upon receiving a bug-report should be gratitude: Gratitude that someone would take the time. Second, Trolltech should understand that since it is a lot of trouble to report a bug, most people don’t bother. We find work-arounds.
For every bug I report, I probably ignore or work around another five. So don’t blow them off by pretending something is an isolated problem. If someone brings it to your attention, it isn’t an isolated problem. Third, I often get the feeling that Trolltech’s attitude is that we, the users, are uniformly foolish, wrongheaded, and we never read the f’g manual. We, the users, are smart enough to use your product, so drop the attitude.
Trolltech support tends to disbelieve or excuse bugs until it is illustrated in glaringly obvious ways that a bug exists. Trolltech’s attitude should be quite the contrary: assume everything reported to you is serious until proved otherwise. At the very least, try to understand the source of your user’s confusion and improve your documentation accordingly.
Trolltech is playing a dangerous game here. Sure, you want to spend as little as possible on support and bug-fixing; I understand that. But the more problems get pushed under the rug, and the less extensive testing Trolltech does of their product, the greater the likelihood that your product eventually reaches a byzantine point of no return with complexity piled on complexity and scary no-go zones in the code where everybody is afraid to touch anything for fear of breaking something.
…
RSS