
\chapter{Batch compilation (ocamlc)} \label{c:camlc}
%HEVEA\cutname{comp.html}

This chapter describes the OCaml batch compiler "ocamlc",
which compiles OCaml source files to bytecode object files and links
these object files to produce standalone bytecode executable files.
These executable files are then run by the bytecode interpreter
"ocamlrun".

\section{Overview of the compiler}

The "ocamlc" command has a command-line interface similar to that of
most C compilers. It accepts several types of arguments and processes them
sequentially, after all options have been processed:

\begin{itemize}
\item
Arguments ending in ".mli" are taken to be source files for
compilation unit interfaces. Interfaces specify the names exported by
compilation units: they declare value names with their types, define
public data types, declare abstract data types, and so on. From the
file \var{x}".mli", the "ocamlc" compiler produces a compiled interface
in the file \var{x}".cmi".

\item
Arguments ending in ".ml" are taken to be source files for compilation
unit implementations. Implementations provide definitions for the
names exported by the unit, and also contain expressions to be
evaluated for their side-effects.  From the file \var{x}".ml", the "ocamlc"
compiler produces compiled object bytecode in the file \var{x}".cmo".

If the interface file \var{x}".mli" exists, the implementation
\var{x}".ml" is checked against the corresponding compiled interface
\var{x}".cmi", which is assumed to exist. If no interface
\var{x}".mli" is provided, the compilation of \var{x}".ml" produces a
compiled interface file \var{x}".cmi" in addition to the compiled
object code file \var{x}".cmo". The file \var{x}".cmi" produced
corresponds to an interface that exports everything that is defined in
the implementation \var{x}".ml".

\item
Arguments ending in ".cmo" are taken to be compiled object bytecode.  These
files are linked together, along with the object files obtained
by compiling ".ml" arguments (if any), and the OCaml standard
library, to produce a standalone executable program. The order in
which ".cmo" and ".ml" arguments are presented on the command line is
relevant: compilation units are initialized in that order at
run-time, and it is a link-time error to use a component of a unit
before having initialized it. Hence, a given \var{x}".cmo" file must come
before all ".cmo" files that refer to the unit \var{x}.

\item
Arguments ending in ".cma" are taken to be libraries of object bytecode.
A library of object bytecode packs in a single file a set of object
bytecode files (".cmo" files). Libraries are built with "ocamlc -a"
(see the description of the "-a" option below). The object files
contained in the library are linked as regular ".cmo" files (see
above), in the order specified when the ".cma" file was built. The
only difference is that if an object file contained in a library is
not referenced anywhere in the program, then it is not linked in.

\item
Arguments ending in ".c" are passed to the C compiler, which generates
a ".o" object file (".obj" under Windows). This object file is linked
with the program if the "-custom" flag is set (see the description of
"-custom" below).

\item
Arguments ending in ".o" or ".a" (".obj" or ".lib" under Windows)
are assumed to be C object files and libraries. They are passed to the
C linker when linking in "-custom" mode (see the description of
"-custom" below).

\item
Arguments ending in ".so" (".dll" under Windows)
are assumed to be C shared libraries (DLLs).  During linking, they are
searched for external C functions referenced from the OCaml code,
and their names are written in the generated bytecode executable.
The run-time system "ocamlrun" then loads them dynamically at program
start-up time.

\end{itemize}
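
As an illustration of how these kinds of arguments fit together, here
is a minimal sketch of a two-unit program. The unit names "Misc" and
"Main", the function "greet" and the output name "prog" are
hypothetical. The three sources are meant to be stored in separate
files, and the commands in the trailing comment show one possible way
to compile and link them.
\begin{verbatim}
(* File misc.mli: interface of the (hypothetical) unit Misc *)
val greet : string -> string

(* File misc.ml: implementation, checked against misc.cmi *)
let greet name = "Hello, " ^ name

(* File main.ml: refers to Misc, so misc.cmo must be linked first *)
let () = print_endline (Misc.greet "world")

(* Possible command sequence:
     ocamlc -c misc.mli                 produces misc.cmi
     ocamlc -c misc.ml                  produces misc.cmo
     ocamlc -c main.ml                  produces main.cmi and main.cmo
     ocamlc -o prog misc.cmo main.cmo   links; Misc is initialized first
*)
\end{verbatim}
Listing "main.cmo" before "misc.cmo" in the last command would instead
be rejected at link time, for the ordering reason given above.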

The output of the linking phase is a file containing compiled bytecode
that can be executed by the OCaml bytecode interpreter:
the command named "ocamlrun". If "a.out" is the name of the file
produced by the linking phase, the command
\begin{alltt}
        ocamlrun a.out \nth{arg}{1} \nth{arg}{2} \ldots \nth{arg}{n}
\end{alltt}
executes the compiled code contained in "a.out", passing it as
arguments the character strings \nth{arg}{1} to \nth{arg}{n}.
(See chapter~\ref{c:runtime} for more details.)

On most systems, the file produced by the linking
phase can be run directly, as in:
\begin{alltt}
        ./a.out \nth{arg}{1} \nth{arg}{2} \ldots \nth{arg}{n}
\end{alltt}
The produced file has the executable bit set, and it manages to launch
the bytecode interpreter by itself.

\section{Options}\label{s:comp-options}

The following command-line options are recognized by "ocamlc".
The options "-pack", "-a", "-c" and "-output-obj" are mutually exclusive.
% Define boolean variables used by the macros in unified-options.etex
\newif\ifcomp \comptrue
\newif\ifnat \natfalse
\newif\iftop \topfalse
% unified-options gathers all options across the native/bytecode
% compilers and toplevel
\input{unified-options.tex}

\paragraph{Contextual control of command-line options}

The compiler command line can be modified ``from the outside''
with the following mechanisms. These are experimental
and subject to change. They should be used only for experimental and
development work, not in released packages.

\begin{options}
\item["OCAMLPARAM" \rm(environment variable)]
A set of arguments that will be inserted before or after the arguments from
the command line. Arguments are specified in a comma-separated list
of "name=value" pairs. A "_" is used to specify the position of
the command line arguments, i.e. "a=x,_,b=y" means that "a=x" should be
executed before parsing the arguments, and "b=y" after. Finally,
an alternative separator can be specified as the
first character of the string, within the set ":|; ,".
\item["ocaml_compiler_internal_params" \rm(file in the stdlib directory)]
A mapping of file names to lists of arguments that
will be added to the command line (and "OCAMLPARAM") arguments.
\item["OCAML_FLEXLINK" \rm(environment variable)]
Alternative executable to use on native
Windows for "flexlink" instead of the
configured value. Primarily used for bootstrapping.
\end{options}

\section{Modules and the file system}

This short section is intended to clarify the relationship between the
names of the modules corresponding to compilation units and the names
of the files that contain their compiled interface and compiled
implementation.

The compiler always derives the module name by taking the capitalized
base name of the source file (".ml" or ".mli" file).  That is, it
strips the leading directory name, if any, as well as the ".ml" or
".mli" suffix; then, it sets the first letter to uppercase, in order to
comply with the requirement that module names must be capitalized.
For instance, compiling the file "mylib/misc.ml" provides an
implementation for the module named "Misc". Other compilation units
may refer to components defined in "mylib/misc.ml" under the names
"Misc."\var{name}; they can also do "open Misc", then use unqualified
names \var{name}.
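
The derivation just described can be mimicked in a few lines of OCaml.
This is only an illustrative sketch, not the code used by the compiler;
the function name "module_name_of_source" is hypothetical.
\begin{verbatim}
(* Illustration only: strip the directory and the suffix, then capitalize. *)
let module_name_of_source path =
  String.capitalize_ascii
    (Filename.remove_extension (Filename.basename path))

let () =
  assert (module_name_of_source "mylib/misc.ml" = "Misc");
  assert (module_name_of_source "misc.mli" = "Misc")
\end{verbatim}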

The ".cmi" and ".cmo" files produced by the compiler have the same
base name as the source file. Hence, the compiled files always have
their base name equal (modulo capitalization of the first letter) to
the name of the module they describe (for ".cmi" files) or implement
(for ".cmo" files).

When the compiler encounters a reference to a free module identifier
"Mod", it looks in the search path for a file named "Mod.cmi" or "mod.cmi"
and loads the compiled interface
contained in that file. As a consequence, renaming ".cmi" files is not
advised: the name of a ".cmi" file must always correspond to the name
of the compilation unit it describes. It is admissible to move them
to another directory, if their base name is preserved, and the correct
"-I" options are given to the compiler. The compiler will flag an
error if it loads a ".cmi" file that has been renamed.

Compiled bytecode files (".cmo" files), on the other hand, can be
freely renamed once created. That's because the linker never attempts
to find by itself the ".cmo" file that implements a module with a
given name: it relies instead on the user providing the list of ".cmo"
files by hand.

\section{Common errors} \label{s:comp-errors}

This section describes and explains the most frequently encountered
error messages.

\begin{options}

\item[Cannot find file \var{filename}]
The named file could not be found in the current directory, nor in the
directories of the search path. The \var{filename} is either a
compiled interface file (".cmi" file), or a compiled bytecode file
(".cmo" file). If \var{filename} has the format \var{mod}".cmi", this
means you are trying to compile a file that references identifiers
from module \var{mod}, but you have not yet compiled an interface for
module \var{mod}. Fix: compile \var{mod}".mli" or \var{mod}".ml"
first, to create the compiled interface \var{mod}".cmi".

If \var{filename} has the format \var{mod}".cmo", this
means you are trying to link a bytecode object file that does not
exist yet. Fix: compile \var{mod}".ml" first.

If your program spans several directories, this error can also appear
because you haven't specified the directories to look into. Fix: add
the correct "-I" options to the command line.

\item[Corrupted compiled interface \var{filename}]
The compiler produces this error when it tries to read a compiled
interface file (".cmi" file) that has the wrong structure. This means
something went wrong when this ".cmi" file was written: the disk was
full, the compiler was interrupted in the middle of the file creation,
and so on. This error can also appear if a ".cmi" file is modified after
its creation by the compiler. Fix: remove the corrupted ".cmi" file,
and rebuild it.

\item[This expression has type \nth{t}{1}, but is used with type \nth{t}{2}]
This is by far the most common type error in programs. Type \nth{t}{1} is
the type inferred for the expression (the part of the program that is
displayed in the error message), by looking at the expression itself.
Type \nth{t}{2} is the type expected by the context of the expression; it
is deduced by looking at how the value of this expression is used in
the rest of the program. If the two types \nth{t}{1} and \nth{t}{2} are not
compatible, then the error above is produced.

In some cases, it is hard to understand why the two types \nth{t}{1} and
\nth{t}{2} are incompatible. For instance, the compiler can report that
``expression of type "foo" cannot be used with type "foo"'', and it
really seems that the two types "foo" are compatible. This is not
always true. Two type constructors can have the same name, but
actually represent different types. This can happen if a type
constructor is redefined. Example:
\begin{verbatim}
        type foo = A | B
        let f = function A -> 0 | B -> 1
        type foo = C | D
        f C
\end{verbatim}
This results in the error message ``expression "C" of type "foo" cannot
be used with type "foo"''.

\item[The type of this expression, \var{t}, contains type variables
      that cannot be generalized]
Type variables ("'a", "'b", \ldots) in a type \var{t} can be in either
of two states: generalized (which means that the type \var{t} is valid
for all possible instantiations of the variables) and not generalized
(which means that the type \var{t} is valid only for one instantiation
of the variables). In a "let" binding "let "\var{name}" = "\var{expr},
the type-checker normally generalizes as many type variables as
possible in the type of \var{expr}. However, this leads to unsoundness
(a well-typed program can crash) in conjunction with polymorphic
mutable data structures. To avoid this, generalization is performed at
"let" bindings only if the bound expression \var{expr} belongs to the
class of ``syntactic values'', which includes constants, identifiers,
functions, tuples of syntactic values, etc. In all other cases (for
instance, \var{expr} is a function application), a polymorphic mutable
data structure could have been created, and generalization is therefore
turned off for
all variables occurring in contravariant or non-variant branches of the
type. For instance, if the type of a non-value is "'a list" the
variable is generalizable ("list" is a covariant type constructor),
but not in "'a list -> 'a list" (the left branch of "->" is
contravariant) or "'a ref" ("ref" is non-variant).
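
The following sketch illustrates this rule; the names "empty", "ints",
"strs" and "cell" are chosen for the example. The types shown in
comments are the ones we would expect a recent toplevel to print, where
non-generalized variables appear as "'_weak1".
\begin{verbatim}
    (* A function application, but 'a occurs only in a covariant
       position: the variable is generalized. *)
    let empty = List.rev []          (* empty : 'a list *)
    let ints : int list = 1 :: empty
    let strs : string list = "a" :: empty

    (* "ref" is non-variant: the variable is not generalized. *)
    let cell = ref []                (* cell : '_weak1 list ref *)
    let () = cell := [1]             (* '_weak1 is now fixed to int *)
    (* From here on, cell := ["a"] would be a type error. *)
\end{verbatim}
Because the later assignment fixes the type of "cell" to "int list ref",
no non-generalized variable remains at the end of the unit.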

Non-generalized type variables in a type cause no difficulties inside
a given structure or compilation unit (the contents of a ".ml" file,
or an interactive session), but they cannot be allowed inside
signatures nor in compiled interfaces (".cmi" file), because they
could be used inconsistently later. Therefore, the compiler
flags an error when a structure or compilation unit defines a value
\var{name} whose type contains non-generalized type variables. There
are two ways to fix this error:
\begin{itemize}
\item Add a type constraint or a ".mli" file to give a monomorphic
type (without type variables) to \var{name}. For instance, instead of
writing
\begin{verbatim}
    let sort_int_list = List.sort Stdlib.compare
    (* inferred type 'a list -> 'a list, with 'a not generalized *)
\end{verbatim}
write
\begin{verbatim}
    let sort_int_list = (List.sort Stdlib.compare : int list -> int list);;
\end{verbatim}
\item If you really need \var{name} to have a polymorphic type, turn
its defining expression into a function by adding an extra parameter.
For instance, instead of writing
\begin{verbatim}
    let map_length = List.map Array.length
    (* inferred type 'a array list -> int list, with 'a not generalized *)
\end{verbatim}
write
\begin{verbatim}
    let map_length lv = List.map Array.length lv
\end{verbatim}
\end{itemize}

\item[Reference to undefined global \var{mod}]
This error appears when trying to link an incomplete or incorrectly
ordered set of files. There are two possible causes. Either you have
forgotten to provide an implementation for the compilation unit named
\var{mod} on the command line (typically, the file named
\var{mod}".cmo", or a library containing that file). Fix: add the
missing ".ml" or ".cmo" file to the command line.  Or you have provided
an implementation for the module named \var{mod}, but it comes too late
on the command line: the implementation of \var{mod} must come before
all bytecode object files that reference \var{mod}. Fix: change the
order of ".ml" and ".cmo" files on the command line.

Of course, you will always encounter this error if you have mutually
recursive functions across modules. That is, function "Mod1.f" calls
function "Mod2.g", and function "Mod2.g" calls function "Mod1.f".
In this case, no matter what permutations you perform on the command
line, the program will be rejected at link-time. Fixes:
\begin{itemize}
\item Put "f" and "g" in the same module.
\item Parameterize one function by the other.
That is, instead of having
\begin{verbatim}
mod1.ml:    let f x = ... Mod2.g ...
mod2.ml:    let g y = ... Mod1.f ...
\end{verbatim}
define
\begin{verbatim}
mod1.ml:    let f g x = ... g ...
mod2.ml:    let rec g y = ... Mod1.f g ...
\end{verbatim}
and link "mod1.cmo" before "mod2.cmo".
\item Use a reference to hold one of the two functions, as in:
\begin{verbatim}
mod1.ml:    let forward_g =
                ref((fun x -> failwith "forward_g") : <type>)
            let f x = ... !forward_g ...
mod2.ml:    let g y = ... Mod1.f ...
            let _ = Mod1.forward_g := g
\end{verbatim}
\end{itemize}
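
As a concrete variant of the last fix, the sketch below replaces the
two files by two submodules of a single file, so that it can be
compiled on its own. The functions "is_even" and "is_odd" are
hypothetical stand-ins for "f" and "g", and are only meant for
non-negative arguments.
\begin{verbatim}
module Mod1 = struct
  (* Placeholder that fails until Mod2 installs the real function. *)
  let forward_is_odd : (int -> bool) ref =
    ref (fun _ -> failwith "forward_is_odd")
  let is_even n = n = 0 || !forward_is_odd (n - 1)
end

module Mod2 = struct
  let is_odd n = n <> 0 && Mod1.is_even (n - 1)
  (* Tie the knot once is_odd is defined. *)
  let () = Mod1.forward_is_odd := is_odd
end

let () =
  assert (Mod1.is_even 10);
  assert (Mod2.is_odd 7)
\end{verbatim}
With separate files, the same initialization order is obtained by
linking "mod1.cmo" before "mod2.cmo".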

\item[The external function \var{f} is not available]
This error appears when trying to link code that calls external
functions written in C.  As explained in
chapter~\ref{c:intf-c}, such code must be linked with C libraries that
implement the required \var{f} C function.  If the C libraries in
question are not shared libraries (DLLs), the code must be linked in
``custom runtime'' mode.  Fix: add the required C libraries to the
command line, and possibly the "-custom" option.

\end{options}

\section{Warning reference} \label{s:comp-warnings}

This section describes and explains in detail some warnings:

\subsection{Warning 9: missing fields in a record pattern}

  When pattern matching on records, it can be useful to match only a few
  fields of a record. Eliding fields can be done either implicitly
  or explicitly by ending the record pattern with "; _".
  However, implicit field elision is at odds with pattern matching
  exhaustiveness checks.
  Enabling warning 9 prioritizes exhaustiveness checks over the
  convenience of implicit field elision and will warn on implicit
  field elision in record patterns. In particular, this warning can
  help to spot exhaustive record patterns that may need to be updated
  after the addition of new fields to a record type.

\begin{verbatim}
type 'a point = {x : 'a; y : 'a}
let dx { x } = x (* implicit field elision: triggers warning 9 *)
let dy { y; _ } = y (* explicit field elision: does not trigger warning 9 *)
\end{verbatim}
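
  To see how the warning helps when a record type evolves, consider the
  following sketch, in which the type and function names are
  hypothetical. If a field "z" is later added to "point", warning 9
  flags "norm1", whose pattern was meant to list every field, while
  "abscissa" stays silent because its elision is explicit.
\begin{verbatim}
type point = { x : float; y : float }

(* Lists every field: to be revisited whenever a field is added. *)
let norm1 { x; y } = abs_float x +. abs_float y

(* Explicitly ignores the remaining fields. *)
let abscissa { x; _ } = x
\end{verbatim}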

\subsection{Warning 52: fragile constant pattern}
\label{ss:warn52}

  Some constructors, such as the exception constructors "Failure" and
  "Invalid_argument", take as parameter a "string" value holding
  a text message intended for the user.

  These text messages are usually not stable over time: call sites
  building these constructors may refine the message in a future
  version to make it more explicit, etc. Therefore, it is dangerous to
  match over the precise value of the message. For example, until
  OCaml 4.02, "Array.iter2" would raise the exception
\begin{verbatim}
  Invalid_argument "arrays must have the same length"
\end{verbatim}
  Since 4.03 it raises the more helpful message
\begin{verbatim}
  Invalid_argument "Array.iter2: arrays must have the same length"
\end{verbatim}
  but this means that any code of the form
\begin{verbatim}
  try ...
  with Invalid_argument "arrays must have the same length" -> ...
\end{verbatim}
  is now broken and may suffer from uncaught exceptions.

  Warning 52 is there to prevent users from writing such fragile code
  in the first place. It does not occur on every matching on a literal
  string, but only in the case in which library authors expressed
  their intent to possibly change the constructor parameter value in
  the future, by using the attribute "ocaml.warn_on_literal_pattern"
  (see the manual section on builtin attributes in
  \ref{ss:builtin-attributes}):
\begin{verbatim}
  type t =
    | Foo of string [@ocaml.warn_on_literal_pattern]
    | Bar of string

  let no_warning = function
    | Bar "specific value" -> 0
    | _ -> 1

  let warning = function
    | Foo "specific value" -> 0
    | _ -> 1

>    | Foo "specific value" -> 0
>          ^^^^^^^^^^^^^^^^
> Warning 52: Code should not depend on the actual values of
> this constructor's arguments. They are only for information
> and may change in future versions. (See manual section 8.5)
\end{verbatim}

  In particular, all built-in exceptions with a string argument have
  this attribute set: "Invalid_argument", "Failure", "Sys_error" will
  all raise this warning if you match for a specific string argument.

  Additionally, built-in exceptions with a structured argument that
  includes a string also have the attribute set: "Assert_failure" and
  "Match_failure" will raise the warning for a pattern that uses a
  literal string to match the first element of their tuple argument.

  If your code raises this warning, you should {\em not} change the
  way you test for the specific string to avoid the warning (for
  example using a string equality inside the right-hand-side instead
  of a literal pattern), as your code would remain fragile. You should
  instead enlarge the scope of the pattern by matching on all possible
  values.

\begin{verbatim}
let warning = function
  | Foo _ -> 0
  | _ -> 1
\end{verbatim}

  This may require some care: if the scrutinee may return several
  different cases of the same pattern, or raise distinct instances of
  the same exception, you may need to modify your code to separate
  those several cases.

  For example,
\begin{verbatim}
try (int_of_string count_str, bool_of_string choice_str) with
  | Failure "int_of_string" -> (0, true)
  | Failure "bool_of_string" -> (-1, false)
\end{verbatim}
  should be rewritten into more atomic tests. For example,
  using the "exception" patterns documented in Section~\ref{s:exception-match},
  one can write:
\begin{verbatim}
match int_of_string count_str with
  | exception (Failure _) -> (0, true)
  | count ->
    begin match bool_of_string choice_str with
    | exception (Failure _) -> (-1, false)
    | choice -> (count, choice)
    end
\end{verbatim}

The only case where that transformation is not possible is if a given
function call may raise distinct exceptions with the same constructor
but different string values. In this case, you will have to check for
specific string values. This is dangerous API design and it should be
discouraged: it's better to define more precise exception constructors
than to store useful information in strings.
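
As a sketch of this advice, the earlier example could rely on a
dedicated exception instead of message strings. The type
"config_error", the exception "Config_error" and both functions below
are hypothetical names introduced for the illustration;
"int_of_string_opt" and "bool_of_string_opt" are the option-returning
variants provided by the standard library.
\begin{verbatim}
type config_error = Bad_count of string | Bad_choice of string
exception Config_error of config_error

(* Each failure is reported through its own constructor. *)
let parse count_str choice_str =
  let count =
    match int_of_string_opt count_str with
    | Some n -> n
    | None -> raise (Config_error (Bad_count count_str)) in
  let choice =
    match bool_of_string_opt choice_str with
    | Some b -> b
    | None -> raise (Config_error (Bad_choice choice_str)) in
  (count, choice)

(* Callers match on constructors, never on message strings. *)
let parse_or_default count_str choice_str =
  match parse count_str choice_str with
  | exception (Config_error (Bad_count _)) -> (0, true)
  | exception (Config_error (Bad_choice _)) -> (-1, false)
  | result -> result
\end{verbatim}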

\subsection{Warning 57: Ambiguous or-pattern variables under guard}
\label{ss:warn57}

  The semantics of or-patterns in OCaml is specified with
  a left-to-right bias: a value \var{v} matches the pattern \var{p} "|" \var{q}
  if it matches \var{p} or \var{q}, but if it matches both,
  the environment captured by the match is the environment captured by
  \var{p}, never the one captured by \var{q}.

  While this property is generally intuitive, there is at least one specific
  case where a different semantics might be expected.
  Consider a pattern followed by a when-guard:
  "|"~\var{p}~"when"~\var{g}~"->"~\var{e}, for example:
\begin{verbatim}
     | ((Const x, _) | (_, Const x)) when is_neutral x -> branch
\end{verbatim}
  The semantics is clear:
  match the scrutinee against the pattern, if it matches, test the guard,
  and if the guard passes, take the branch.
  In particular, consider the input "(Const"~\var{a}", Const"~\var{b}")", where
  \var{a} fails the test "is_neutral"~\var{a}, while \var{b} passes the test
  "is_neutral"~\var{b}.  With the left-to-right semantics, the clause above is
  {\em not} taken for this input: matching "(Const"~\var{a}", Const"~\var{b}")"
  against the or-pattern succeeds in the left branch and returns the
  environment \var{x}~"->"~\var{a}; the guard "is_neutral"~\var{a} is then
  tested and fails, so the branch is not taken.

  However, another semantics may be considered more natural here:
  any pair that has one side passing the test will take the branch. With this
  semantics the previous code fragment would be equivalent to
\begin{verbatim}
     | (Const x, _) when is_neutral x -> branch
     | (_, Const x) when is_neutral x -> branch
\end{verbatim}
  This is {\em not} the semantics adopted by OCaml.

 Warning 57 is dedicated to these confusing cases where the
 specified left-to-right semantics is not equivalent to a non-deterministic
 semantics (any branch can be taken) relative to a specific guard.
 More precisely, it warns when the guard uses ``ambiguous'' variables, that is,
 variables bound to different parts of the scrutinee by different sides of an
 or-pattern.
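
 The difference between the two semantics can be observed directly.
 In the sketch below, the type "term" and the functions "is_neutral",
 "biased" and "split" are hypothetical; warning 57 is disabled locally
 on "biased" only so that the example also compiles when warnings are
 treated as errors.
\begin{verbatim}
type term = Const of int | Other

let is_neutral x = (x = 0)

(* Left-to-right semantics: x comes from the left side when it matches. *)
let biased = function
  | ((Const x, _) | (_, Const x)) when is_neutral x -> true
  | _ -> false
[@@ocaml.warning "-57"]

(* One clause per side: the guard is tried on each side in turn. *)
let split = function
  | (Const x, _) when is_neutral x -> true
  | (_, Const x) when is_neutral x -> true
  | _ -> false

let () =
  assert (not (biased (Const 1, Const 0)));
  assert (split (Const 1, Const 0));
  assert (biased (Other, Const 0))
\end{verbatim}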