Implementation of modular translation for call-by-value PCF
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
Tests
rt
src
.gitignore
Makefile
README.md
configure
llvm_compile.sh
modular.opam
run_tests.sh

README.md

Modular

This program implements the modular translation of call-by-value PCF that is described in detail in this draft paper. The program takes as input a PCF program and performs the following steps:

  1. Infer simple types for the input program.
  2. Infer type annotations to obtain a legal derivation in the annotated PCF type system, which is described in the paper.
  3. Translate the annotated typing derivation to a low-level program.
  4. Generate LLVM-bitcode from the low-level program.

The low-level language used in the implementation is a variant of that described in the paper. The main difference is that it uses a more low-level representation of recursive types. The low-level language does not support recursive types as such, but instead supports a type box<A> of typed pointers to the heap. Recursive types can be implemented using pointers to the heap. For example, lists are available as

  type list<'a> =
        Nil of unit
        | Cons of 'a * box<list<'a>>

The translation in steps 2 and 3 is implemented as it is described in the paper. As described there, in step 2 there are many different ways of solving the constraints on type annotations. The implementation currently uses the simple solution using recursive types described in the paper. For example, the two constraints C <= 'a and 'b *'a <= 'a would be solved by letting 'a be the following type:

  type cons<'b> = Cons1 of C
                | Cons2 of 'b * box<cons<'b>>

This solution amounts to a type of lists. The appearance of box means that these lists are stored as linked lists, where the tail is always a pointer to the heap. This use of the heap is somewhat inefficient and one may want to replace it by stack-allocated data, as described in the paper. This was done experimentally in an earlier version of the translation, which shows that such a change is not hard to do and significantly improves performance. Here the simple solution is implemented first, as it works without having to assume a stack and may have applications, e.g. for hardware synthesis.

Installation

The translation is written in OCaml and uses LLVM for code generation. It depends on Jane Street's Core_kernel library and the OCaml LLVM bindings.

The installation of OCaml packages should work most easily using the OCaml Package Manager (OPAM), which can be obtained from (http://opam.ocamlpro.com). On Ubuntu, OPAM can be installed with apt-get install opam.

Second, LLVM needs to be installed. On Ubuntu, LLVM can be installed using apt-get install llvm. It may be necessary to pin the LLVM bindings for OCaml to the version of LLVM that is installed on the system (see llc -version). For example, pinning the LLVM binding to version 3.8 may be done as follows:

  opam pin add llvm 3.8

Once OPAM and LLVM are installed, the translation can be installed and tested as follows:

  git clone git://github.com/uelis/modular.git
  opam pin add modular
  cd modular
  modular Test/fib.cbv

This should compile the source fib.cbv to a file main.bc.

Source Language

The input language is currently a simple version of PCF. Input files are expected to have the form

  let x1 = t1
  let x2 = t2
  ...
  let xn = tn

Terms can be formed using the following concrete syntax:

  s,t  ::=  x  |  let x = s in t  |  \x -> t  |  s t
         | let rec f x = t | let tailrec f x = t
         |  if s = t then t1 else t2
         | (s, t) | #1 t | #2 t
         |  0  |  ~1  |  1  |  ~2  |  2  | ...
         |  s + t  |  s - t  |  s * t  |  s / t
         |  print(t) | print "text"

The concrete syntax for types is

  X, Y  ::=  Nat | X * Y | X -> Y

The term print(t) prints the value of the term t, which must have type Nat. The value of print(t) is that of t.

For example, the following program prints the 20-th Fibonacci number

  let rec fib x =
    let x1 = x - 1 in
    let x2 = x - 2 in
    if x1 = 0 then 1 else
    if x2 = 0 then 1 else
    (fib x1) +  (fib x2)

  let main =
    print (fib 20)

Usage

The implementation produces an LLVM-bitcode file for each toplevel definition in the input file. If the file fib.cbv has the content of the Fibonacci example above, then the call

  > ./modular.native Test/fib.cbv

prints the types of fib and main (namely Nat -> Nat and Nat respectively) and produces LLVM bitcode files fib.bc and main.bc for them. While both have a well-specified interface, as explained in the paper, without further linking one can currently run only programs of base type. It is possible to link fib.bc to an assembly module implementing the function argument, but one needs to do the linking by hand, currently.

To generate an executable from main.bc using the LLVM compiler tools (and also clang for linking), one may invoke the script llvm_compile.sh as follows:

  > ./llvm_compile main

This should generate an executable main that prints the 20-th Fibonacci number.

  > ./main
  6765

Printing Type Annotations

By default, the implementation just shows simple types. Type annotations can be shown using the command-line argument --type-details. The concrete syntax for annotated types is as follows:

  X, Y  ::=  Nat[A] | [A](X -{B, C}-> Y)

For example,

  > ./modular.native --type-details fib.cbv

gives the following types

  fib : ['g](Nat[unit * 'f + 'e] -{box<(conty0<unit>)>, unit}-> Nat['d])
  main : Nat['k]

where

  type conty0<'a> = conty0_0 of box<(conty0<'a>)> * (unit * int * 'a)
                  | conty0_1 of box<(conty0<'a>)> * int

Examples

A few example programs can be found in the Tests directory. The programs mandelbrot.cbv, raytrace.cbv and raytrace_inline.cbv produce output that can be interpreted as images in PPM format. The produced images should look as follows.

Raytrace

Mandelbrot