Last update: 2024-3-18 21:00 PST
wile is a small scheme-to-c compiler which I'm writing; it's by no
means complete, but it's capable enough that writing programs with it
is starting to be pretty pleasant. wile is self-hosting.
The name wile is, of course, the name of that extremely stable
super-genius schemer, Wile E. Coyote. 'nuff said.
My design philosophy for it is the unix way of small simple tools; I have tried to keep it simple and self-contained, with minimal dependencies: in a minimal version, I want it to be usable without any requirements beyond a reasonably-modern C compiler (although I do rely on the Boehm garbage collector; without that, you'll be limited to small programs... that said, the pre-autotools bootstrap stage0 compiler is configured to build without GC, and it can recompile itself; that's not tiny.)
This release of it is starting to tend toward batteries-included.
The autotools-based build+install is looking pretty good; the files
and directories are still structured as a separate sub-tree, but
the required steps look a whole lot like the standard autotools
build: cd autotools && configure && make && make install produces
a working compiler, and that compiler can compile itself and
almost all of the tests: two tests are known failures, and one lists
the configuration, so that will differ from system to system.
Since Coverity offers free scans of open-source projects, I have
scanned wile; I intend to make it as clean as I can. There are two
remaining sets of flagged issues. First, Coverity claims (quite
correctly) that my use of drand48() is not cryptographically
secure. That's ok, I think, for the purposes for which I'm using
it. Second, there are a number of places where Coverity flags
"structurally dead" code. That too is correct: these are spots where
wile put in tail calls, and the dead-code eliminator, which is still
pretty stupid, failed to clean up all of the dead code that it should
have. But these are harmless, so overall it is quite clean already.
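To make the "structurally dead" finding concrete, here is a hand-written C sketch of what such a spot might look like. It is not actual wile output; the function name and its shape are made up for illustration. The recursive call sits in tail position and is emitted as a direct return, so anything left over after it can never run.

```c
#include <assert.h>

/* Hypothetical stand-in for compiler-generated code; not actual wile output.
   Sums the integers 1..n using an accumulator, so that the recursive call
   is in tail position. */
long sum_to(long n, long acc)
{
    assert(n >= 0);     /* caller contract for this sketch */
    if (n == 0) {
        return acc;
    }
    /* The tail call is emitted as a direct "return callee(...)". */
    return sum_to(n - 1, acc + n);

    /* Nothing below the return above can ever execute. A static analyzer
       reports this as structurally dead code; it is harmless, and a smarter
       dead-code eliminator would simply not emit it. */
    acc = 0;
    return acc;
}
```

Calling sum_to(100, 0) adds up 1 through 100. Whether the recursive call actually becomes a jump rather than a new stack frame is up to the C compiler and its optimization settings.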
On the usability front, I have just added another dependency; however,
it is internal, so there is no need to fetch more software. I added readline
capability to the repl in the form of the very nice isocline library; with Daan's
kind permission, I added an amalgamated form of it into the wile repo,
and as a result the repl is becoming very pleasant to use. It doesn't
get built by default (yet), but it's pretty trivial: once wile is
built and installed, the command wile -x repl.scm will do the trick.
- License
- Short installation
- Limitation and bugs
- Roadmap for future work
- What wile already has
- Installation gory details
- How to run wile
- Guide to compiler files
- Contact me
wile is released under GPLv3 or later. Its runtime library is released
under LGPLv3 or later; linking your code with the runtime library will
not cause your code to become GPL'd. You retain copyright of your
scheme source code and may keep your code as open or closed as you
desire.
- There is no high-level macro system yet. wile does have some low-level macros built in, def-macro style.
- There is no unicode support; wile speaks only ASCII so far.
- I'm aiming at RNRS compliance, for a suitably recent value of N; not there yet, but working on it.
- Tail recursion seems to be largely working, but since wile is a scheme-to-c compiler, it can be a bit tricky at times. We depend on the c compiler, and not every c compiler will do this correctly in all cases. In particular, there is a path through (apply) that should be a tail call, but I've been unable to persuade either gcc or clang to make that happen.
- There are no built-in bignums yet. I have a modest bignum library, see library/bigint.scm, and it seems to work pretty well, but it is not yet integrated into the number tower. But calculating, say, the factorial of 1000 is very straightforward.
- A few special forms are still missing: delay, force.
- There are huge amounts of documentation still to be written.
- Error messages need significant improvement.
- set-car! and set-cdr! are broken. The issue is that they need to mutate info that's stored in the heap, but instead they change local copies on the stack. Making everything live in the heap would be one solution to this, but that's a rather drastic change.
- There are some issues with how included files interact with macro expansion; this might not be an actual bug, but it's at best a significant infelicity.
- I think I've got the majority of numerical functions able to deal with any style of number (int, rational, real, complex where relevant) as input, but there may be some cases I've missed.
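The set-car! and set-cdr! limitation listed above is the classic value-copy pitfall. The following C sketch shows the difference between mutating a local copy and mutating the one shared heap object. The pair type and all function names here are made up for illustration; this is not wile's actual value representation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair representation, for illustration only. */
struct pair {
    int car;
    int cdr;
};

/* Broken: the pair is passed by value, so the assignment changes a copy
   that lives on the callee's stack and vanishes on return. */
void set_car_on_copy(struct pair p, int value)
{
    p.car = value;
    (void)p;            /* the modified copy is simply discarded */
}

/* Working: the pair is reached through a pointer, so the assignment
   changes the one shared object. */
void set_car_in_place(struct pair *p, int value)
{
    assert(p != NULL);
    p->car = value;
}

/* Allocates a pair on the heap; the caller owns it and must free() it.
   Returns NULL if the allocation fails. */
struct pair *make_pair(int car, int cdr)
{
    struct pair *p = malloc(sizeof *p);
    if (p != NULL) {
        p->car = car;
        p->cdr = cdr;
    }
    return p;
}
```

After set_car_on_copy(*p, 99) the original pair is unchanged, whereas set_car_in_place(p, 99) is visible to every holder of the pointer. That is the behavior a correct set-car! needs.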
- Fix all the bugs.
- Implement all the stuff that's missing.
- Work on standards compliance.
- Enable a real "link against library" mode: part of it is already there; it needs an equivalent of wrtl.sch for other libraries, but that seems pretty straightforward.
- Make wile optimize better. Right now, I'm writing c output directly. I plan to generate output in some intermediate representation that's more amenable to optimization, then start doing those optimizations. (Allow me to insert a plug here for the excellent advanced compilers course CS6120 taught at Cornell by Adrian Sampson; a bunch of this is what I intend to pursue.)
- Add a decent foreign-function interface: I'd like to be able to call large parts of FFTW or LAPACK etc from wile programs, without having to hand-craft every single interface.
- The usual special forms:
and
begin
case
cond
define
define-primitive
define-alias
do
guard
if
lambda
let
let*
letrec
letrec*
or
quasiquote
unquote
unquote-splicing
quote
set!
- A reasonably good number of functions in the standard library:
*
+
-
/
/=
<
<=
=
>
>=
UTCtime
abs
accept
acos
acosh
agm
all-true?
angle
any-true?
append
apply
apply-interp
apply-lambda
arithmetic-geometric-mean
asin
asinh
assp
assv
atan
atanh
begin-breakable
begin-form?
bessel-j
bessel-y
bits-and
bits-clear
bits-flip
bits-get
bits-not
bits-or
bits-set
bits-set?
bits-shift
bits-xor
boolean?
bytevector
bytevector->list
bytevector->string
bytevector-create
bytevector-for-each
bytevector-length
bytevector-ref
bytevector-set!
bytevector-swap!
bytevector?
c*
c+
c-
c/
caaaar
caaadr
caaar
caadar
caaddr
caadr
caar
cadaar
cadadr
cadar
caddar
cadddddddr
caddddddr
cadddddr
caddddr
cadddr
caddr
cadr
call-with-current-continuation
call/cc
car
case-lambic
cbrt
cconj
cdaaar
cdaadr
cdaar
cdadar
cdaddr
cdadr
cdar
cddaar
cddadr
cddar
cdddar
cddddr
cdddr
cddr
cdr
ceiling
ceiling-quotient
ceiling-remainder
ceiling/
cfft-good-n?
change-file-owner
change-root-directory
change-symbolic-link-owner
char->integer
char->string
char-alphabetic?
char-alphanumeric?
char-ci/=?
char-ci<=?
char-ci<?
char-ci=?
char-ci>=?
char-ci>?
char-control?
char-downcase
char-hex-digit?
char-lower-case?
char-lowercase?
char-numeric?
char-oct-digit?
char-printable?
char-upcase
char-upper-case?
char-uppercase?
char-whitespace?
char/=?
char<=?
char<?
char=?
char>=?
char>?
char?
cholesky-decompose
cholesky-solve
cimag
clear-file-error
close-port
cmplx
compiled-procedure?
complex-conjugate
complex?
compose
conj
connect-to
cons
continuation?
cos
cosh
cosine-integral
cputime
creal
create-directory
create-link
create-symbolic-link
curry
cxr
day-of-week
day-of-year
def-struct
define-form?
delta-dates
denominator
describe-system-error
digamma
directory-exists?
display
display-object-hook
display-stack-trace
do-until
do-while
elliptic-E
elliptic-K
emergency-exit
emit-code
epochtime
eqv?
erfc
eval
eval-begin
eval-define
even?
exit
exp
expmod
expt
factorial
file-executable?
file-exists?
file-port?
file-readable?
file-writable?
filter
finite?
flatten
float
floor
floor-quotient
floor-remainder
floor/
fluid-let
flush-port
fmod
foldl
foldl1
foldr
for-each
fork-process
fprintf
frexp
fromto
gc-version
gcd
gensym
get-current-directory
get-domain-name
get-effective-group-id
get-effective-user-id
get-environment-variable
get-errno
get-file-eof
get-file-error
get-file-position
get-file-status
get-group-id
get-group-information
get-host-name
get-interpreted-procedure-argu
get-interpreted-procedure-arit
get-interpreted-procedure-body
get-interpreted-procedure-envi
get-interpreted-procedure-macr
get-iproc-args
get-iproc-arity
get-iproc-body
get-iproc-env
get-iproc-macro
get-parent-process-id
get-process-id
get-session-id
get-symbolic-link-status
get-user-id
get-user-information
gregorian-date
hypot
i*
i+
i++
i-
i--
i/
ilog
imag-part
infinite?
integer
integer->char
integer?
interpreted-procedure?
is-block-device?
is-char-device?
is-directory?
is-leap-year?
is-named-pipe?
is-prime?
is-regular-file?
is-socket?
is-symbolic-link?
julian-day
julian-day-of-easter
lambert-W
lambert-W+
lambert-W-
last
lcm
ldexp
length
list
list->bytevector
list->string
list->vector
list-append
list-drop-while
list-filter
list-flatten
list-group-by
list-head
list-last
list-length
list-length<=?
list-length<?
list-length=?
list-length>=?
list-length>?
list-partition
list-ref
list-remove-dups
list-reverse
list-sort
list-tail
list-take-while
list-unhead
list-untail
list?
listen-on
load-file-path
load-form?
load-library
localtime
log
log-gamma
magnitude
make-bytevector
make-interpreted-procedure
make-iproc
make-polar
make-rational
make-rectangular
make-string
make-vector
map
max
max/i
max/q
max/r
memp
memv
min
min/i
min/q
min/r
modulo
namespace
nan?
negative
negative?
newline
next-prime
not
null?
number->string
number/type
number?
numerator
odd?
offset-date
open-file
open-temporary-file
pair?
parse-file
parse-string
partition
phase
pipe-port?
poly-chebyshev1
poly-chebyshev2
poly-hermite1
poly-hermite2
poly-laguerre
poly-legendre
port?
positive?
printf
procedure?
promise?
promote/cmplx
promote/rat
promote/real
q*
q+
q-
q/
quot-rem
quotient
r*
r+
r-
r/
raise
random-cauchy
random-exponential
random-normal-pair
random-permutation
random-poisson
random-seed!
random-uniform
rational?
read-all
read-bytes
read-char
read-directory
read-line
real-part
real?
reciprocal
regex-match
remainder
remove-directory
remove-file
rename-directory
rename-file
replicate
reverse
root-bisect
root-bracket
root-ridders
round
run-command
run-read-command
run-write-command
send-signal
set-car!
set-cdr!
set-current-directory
set-effective-group-id
set-effective-user-id
set-environment-variable
set-errno!
set-file-position
set-group-id
set-interpreted-procedure-envi
set-interpreted-procedure-macr
set-iproc-env!
set-iproc-macro!
set-line-buffering!
set-no-buffering!
set-session-id
set-user-id
sha-256
sha-256-init
sha-256-update
sha-256-finish
sign
sin
sine-integral
sinh
sleep
socket-port?
sprintf
sqlite-close
sqlite-dump-table
sqlite-meta-schema
sqlite-meta-tables
sqlite-open
sqlite-port?
sqlite-run
sqlite-statement-bind
sqlite-statement-cleanup
sqlite-statement-info
sqlite-statement-prepare
sqlite-statement-run
sqlite-statement?
sqlite-version
sqrt
stack-trace
stack-trace-minimal
string->char
string->list
string->number
string->symbol
string-append
string-ci-hash-32
string-ci-hash-64
string-ci/=?
string-ci<=?
string-ci<?
string-ci=?
string-ci>=?
string-ci>?
string-copy
string-create
string-downcase
string-find-first-char
string-find-last-char
string-hash-32
string-hash-64
string-join-by
string-length
string-pad-center
string-pad-left
string-pad-right
string-port?
string-ref
string-reverse
string-set!
string-split-by
string-split-by-whitespace
string-trim
string-trim-left
string-trim-right
string-upcase
string/=?
string<=?
string<?
string=?
string>=?
string>?
string?
substring
symbol->string
symbol=?
symbol?
tan
tanh
token-source-line
truncate
truncate-file
truncate-quotient
truncate-remainder
truncate/
type-of
unless
unset-environment-variable
until
upfrom
vector
vector->list
vector-capacity
vector-cfft!
vector-create
vector-fill!
vector-for-each
vector-length
vector-map
vector-map!
vector-number/type
vector-promote/cmplx!
vector-promote/rat!
vector-promote/real!
vector-ref
vector-set!
vector-sort!
vector-swap!
vector?
wait-process
when
while
wile-architecture-name
wile-basic-build-info
wile-build-info
wile-environment-with-macros
wile-os-name
wile-standard-environment
write-1str
write-bytes
write-char
write-string
zero?
- A number of other libraries, in the library/ subdirectory:
arg-parse.scm
bigint.scm
deque.scm
gthread.scm
hash.scm
monad.scm and monad-list.scm
quaternions.scm
queue.scm
stack.scm
stats.scm
struct.scm
test.scm
- A modest set of test programs in wtest/ which can serve as a starting point for exploration, plus a couple of programs in the main directory which are part of the build itself.
- A couple of example programs in examples/:
  - amaze.scm is a silly toy, a small maze generator. It prints line segments to stdout; use your preferred graphics package to draw them.
  - cal.scm is a re-implementation of part of the standard unix command cal to display the current calendar. I needed that at one point and had not installed it on my laptop, so rather than get it from the distro I took it as a challenge to re-implement the bits I wanted.
  - gc-test.scm is a small test of the garbage collector: with different command-line arguments it is supposed to run in constant or linearly growing (slow) or quadratically growing (very fast) memory. If the GC is not working, it will always consume memory very quickly.
  - net-server.scm is a teeny-tiny very cheesy way to serve up the system's notion of the current time across the network, a cargo-cult implementation of ntp (but don't use this for anything requiring real precision!)
  - net-client.scm is the client corresponding to the above server.
  - n-queens.scm solves the N-queens problem, using the monad code in the library.
  - In the wtest/ directory, not in examples/, test_53.scm is a Runge-Kutta-based solver for the Volterra equations; in addition to testing first-class closures, this might be a useful example for an ODE solver.
  - gthread-test.scm is a test of the very simple cooperative threads library, but already throws around a fair number of continuations.
Increasingly, the stuff I wrote here is obsolete. Instead, I'll briefly
describe what to do if you want to recompile the compiler itself. The
simplest way is to build the build-stages program in the top-level
directory, then in that directory, run it: it will rebuild the compiler
and libraries either two or three times, each time using the previous
build result. If the SHA-256 hashes of the corresponding stage1 and stage2
files all match, it will stop early, and if they don't match, it will
build stage3. Either after stage2 or after stage3, the SHA-256 hashes
should match; if they do not, I think there is something seriously wrong.
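The stopping rule described above is a fixed-point check: keep rebuilding until two successive stages produce identical files. Here is a minimal C sketch of that idea. It is not the build-stages source: the function names are made up, and for brevity it compares files byte for byte instead of computing SHA-256 hashes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if the two files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. Byte-for-byte comparison is an
   assumption of this sketch; the real tool compares SHA-256 hashes. */
int files_identical(const char *path_a, const char *path_b)
{
    assert(path_a != NULL && path_b != NULL);

    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = 1;

    if (fa == NULL || fb == NULL) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = getc(fa);
            cb = getc(fb);
            if (ca != cb) {     /* differing byte, or one file is shorter */
                result = 0;
                break;
            }
        } while (ca != EOF);
    }
    if (fa != NULL) fclose(fa);
    if (fb != NULL) fclose(fb);
    return result;
}

/* Returns 1 if every corresponding pair of stage files matches, meaning
   the compiler reproduces itself and no further stage is needed. */
int stages_converged(const char *const stage_a[],
                     const char *const stage_b[], size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (files_identical(stage_a[i], stage_b[i]) != 1) {
            return 0;
        }
    }
    return 1;
}
```

A driver would call stages_converged on the stage1 and stage2 file lists, build stage3 only if that returns 0, and then run the same check on stage2 against stage3.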
If you want to build cross-compiled or cross-configured programs, you
will need to do some manual tweaking of the installed files. Look for
the wile-config.dat file in the locations where the wile files got
installed, make a copy, and tweak it + build a copy of libwrtl.a with
the tweaked settings. If you try this, you will need to tell wile to
use the tweaked versions rather than the originals; you can set the
environment variable WILE_CONFIG_FILE or use the -CF command-line
option.
After you've built wile with the autotools dance, you need to either
put the directory where it's installed into your PATH if it's not
already, or you need to invoke it with the explicit full path; that
should be all.
So here's a tiny scheme program, call it hello.scm:
(write-string (if (even? (list-length command-line-arguments)) "saluton mondo" "coi munje") #\newline)
Run wile -x hello.scm hello, then hello; it should say saluton mondo.
If you run it as hello 1, there is one command-line
argument, and it should say coi munje instead. And if that works,
you're off and running. (The greetings are "hello world" in Esperanto
and Lojban respectively... I think.)
The compiler is composed of three files:
- wile-main.scm which does all the nitty-gritty of dealing with command-line options, inferring output file from input if required, etc.
- wile-comp.scm which is the heart of the compiler: this deals with expressions, special forms, sequencing of outputs, etc.
- wile-prims.scm which is the compiler's view of the runtime library: a bunch of little tiny leaf functions and codelets that implement most of the (small) primitives.
The rest of the runtime library lives in a number of C and scheme files:
- alloc.c which is the lowest-level interface to memory allocation, either plain-vanilla malloc or the Boehm garbage collector version.
- print.c which encodes all the knowledge of how to print the different types of scheme values.
- location.c is a fairly small file which helps provide source file location info for tokens, and thus hopefully helps make better error messages.
- wile-sql.c contains most of the interface to sqlite.
- swll-cfft.c contains wile's 1D complex Fourier transform routine. This should really be in some other library, not the standard runtime, but this sort of thing is kinda what I do a lot of, so there it is.
- continuations.c implements call/cc.
- wile-parse.c and wile-lex.c are the parser and scanner, respectively. If you examine these closely, you'll see that they appear machine-generated; that is in fact correct. There are currently-unused wile.yucc and wile.ulex files which are the true sources. A long time ago, I got curious as to how yacc and lex work, and I wrote my own, yucc and ulex. I never got around to releasing those, which is why I use the generated files. Maybe someday I'll release those, or possibly just rewrite this back into standard yacc and lex.
- fsi_set.c, nfa.c, regex.c, and ulexlib.c are support files for the scanner. regex.c also implements the very simple regular-expression engine in wile.
- wile-rtl1.c contains a number of low-level runtime functions.
- wile-rtl2.scm contains a fairly large number of higher-level scheme routines that implement large parts of the runtime library. Many of these are written in a fairly low-level style; that's because they came into existence before some of the fancier stuff in the compiler got written (and sometimes they are faster than the fancier versions).
- math-funcs.c contains a number of special mathematical functions.
The last three of these files get chopped up when compiling the runtime library, so that there are lots of small object files. This keeps executables smaller and also more secure. Suppose a buggy function gets dragged in during linking even though it never gets used, just because it lives in the same object file as some other function that does get used. If the executable then gets used in cracking a system somehow, buffer overruns or other hacks could be used to reach the buggy function even though the normal flow of control never will. But if it isn't linked in... it can't get reached.
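For the alloc.c layer in the list above, the choice between plain malloc and the Boehm collector can be pictured roughly as follows. This is a sketch, not wile's actual code: the macro SKETCH_USE_GC and the function sketch_alloc are made-up names. GC_MALLOC is the Boehm collector's allocation macro; the header may be installed as gc.h or gc/gc.h depending on the system.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef SKETCH_USE_GC
#include <gc.h>     /* Boehm collector header; link with -lgc */
#endif

/* Allocates size bytes or terminates the program; never returns NULL.
   Which allocator is used is fixed at compile time. */
void *sketch_alloc(size_t size)
{
    void *mem;

    assert(size > 0);
#ifdef SKETCH_USE_GC
    mem = GC_MALLOC(size);  /* collected memory: never freed explicitly */
#else
    mem = malloc(size);     /* plain malloc: this sketch has no matching
                               free, so memory use only grows */
#endif
    if (mem == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return mem;
}
```

Built without SKETCH_USE_GC, this is an ordinary checked malloc. Built with it, the program should also call GC_INIT() once at startup, as the Boehm documentation recommends, before the first allocation.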
If you have questions, comments, bug reports, patches, please send email to me at uhollerbach@gmail.com. Please put "WILE" in the subject line. You can also reach me on Mastodon, where I am @UweHalfHand@norcal.social.
Enjoy! - Uwe Hollerbach