\chapter{Batch compilation (ocamlc)} \label{c:camlc}
%HEVEA\cutname{comp.html}
This chapter describes the OCaml batch compiler "ocamlc",
which compiles OCaml source files to bytecode object files and links
these object files to produce standalone bytecode executable files.
These executable files are then run by the bytecode interpreter
"ocamlrun".
\section{Overview of the compiler}
The "ocamlc" command has a command-line interface similar to that of
most C compilers. It accepts several types of arguments and processes them
sequentially, after all options have been processed:
\begin{itemize}
\item
Arguments ending in ".mli" are taken to be source files for
compilation unit interfaces. Interfaces specify the names exported by
compilation units: they declare value names with their types, define
public data types, declare abstract data types, and so on. From the
file \var{x}".mli", the "ocamlc" compiler produces a compiled interface
in the file \var{x}".cmi".
\item
Arguments ending in ".ml" are taken to be source files for compilation
unit implementations. Implementations provide definitions for the
names exported by the unit, and also contain expressions to be
evaluated for their side-effects. From the file \var{x}".ml", the "ocamlc"
compiler produces compiled object bytecode in the file \var{x}".cmo".
If the interface file \var{x}".mli" exists, the implementation
\var{x}".ml" is checked against the corresponding compiled interface
\var{x}".cmi", which is assumed to exist. If no interface
\var{x}".mli" is provided, the compilation of \var{x}".ml" produces a
compiled interface file \var{x}".cmi" in addition to the compiled
object code file \var{x}".cmo". The file \var{x}".cmi" produced
corresponds to an interface that exports everything that is defined in
the implementation \var{x}".ml".
\item
Arguments ending in ".cmo" are taken to be compiled object bytecode. These
files are linked together, along with the object files obtained
by compiling ".ml" arguments (if any), and the OCaml standard
library, to produce a standalone executable program. The order in
which ".cmo" and ".ml" arguments are presented on the command line is
relevant: compilation units are initialized in that order at
run-time, and it is a link-time error to use a component of a unit
before having initialized it. Hence, a given \var{x}".cmo" file must come
before all ".cmo" files that refer to the unit \var{x}.
\item
Arguments ending in ".cma" are taken to be libraries of object bytecode.
A library of object bytecode packs in a single file a set of object
bytecode files (".cmo" files). Libraries are built with "ocamlc -a"
(see the description of the "-a" option below). The object files
contained in the library are linked as regular ".cmo" files (see
above), in the order specified when the ".cma" file was built. The
only difference is that if an object file contained in a library is
not referenced anywhere in the program, then it is not linked in.
\item
Arguments ending in ".c" are passed to the C compiler, which generates
a ".o" object file (".obj" under Windows). This object file is linked
with the program if the "-custom" flag is set (see the description of
"-custom" below).
\item
Arguments ending in ".o" or ".a" (".obj" or ".lib" under Windows)
are assumed to be C object files and libraries. They are passed to the
C linker when linking in "-custom" mode (see the description of
"-custom" below).
\item
Arguments ending in ".so" (".dll" under Windows)
are assumed to be C shared libraries (DLLs). During linking, they are
searched for external C functions referenced from the OCaml code,
and their names are written in the generated bytecode executable.
The run-time system "ocamlrun" then loads them dynamically at program
start-up time.
\end{itemize}
The output of the linking phase is a file containing compiled bytecode
that can be executed by the OCaml bytecode interpreter:
the command named "ocamlrun". If "a.out" is the name of the file
produced by the linking phase, the command
\begin{alltt}
ocamlrun a.out \nth{arg}{1} \nth{arg}{2} \ldots \nth{arg}{n}
\end{alltt}
executes the compiled code contained in "a.out", passing it as
arguments the character strings \nth{arg}{1} to \nth{arg}{n}.
(See chapter~\ref{c:runtime} for more details.)
On most systems, the file produced by the linking
phase can be run directly, as in:
\begin{alltt}
./a.out \nth{arg}{1} \nth{arg}{2} \ldots \nth{arg}{n}
\end{alltt}
The produced file has the executable bit set, and it manages to launch
the bytecode interpreter by itself.
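As a minimal sketch of the compile, link and run cycle described above,
the following one-file program prints the arguments it receives. The
file name "hello.ml", the executable name "hello" and the helper
"describe_args" are hypothetical names chosen for this illustration;
"-o" is the "ocamlc" option that names the output file.
\begin{verbatim}
(* hello.ml (hypothetical file name).
   Compile and link:     ocamlc -o hello hello.ml
   Run:                  ocamlrun hello one two
   or, on most systems:  ./hello one two *)

(* Format each command-line argument together with its position. *)
let describe_args (argv : string array) : string list =
  Array.to_list
    (Array.mapi (fun i arg -> Printf.sprintf "argv[%d] = %s" i arg) argv)

(* The first element of Sys.argv names the program; the user
   arguments follow it. *)
let () = List.iter print_endline (describe_args Sys.argv)
\end{verbatim}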
\section{Options}\label{s:comp-options}
The following command-line options are recognized by "ocamlc".
The options "-pack", "-a", "-c" and "-output-obj" are mutually exclusive.
% Define boolean variables used by the macros in unified-options.etex
\newif\ifcomp \comptrue
\newif\ifnat \natfalse
\newif\iftop \topfalse
% unified-options gathers all options across the native/bytecode
% compilers and toplevel
\input{unified-options.tex}
\paragraph{Contextual control of command-line options}
The compiler command line can be modified ``from the outside''
with the following mechanisms. These are experimental
and subject to change. They should be used only for experimental and
development work, not in released packages.
\begin{options}
\item["OCAMLPARAM" \rm(environment variable)]
A set of arguments that will be inserted before or after the arguments from
the command line. Arguments are specified in a comma-separated list
of "name=value" pairs. A "_" is used to specify the position of
the command line arguments, i.e. "a=x,_,b=y" means that "a=x" should be
executed before parsing the arguments, and "b=y" after. Finally,
an alternative separator can be specified as the
first character of the string, within the set ":|; ,".
\item["ocaml_compiler_internal_params" \rm(file in the stdlib directory)]
A mapping of file names to lists of arguments that
will be added to the command line (and "OCAMLPARAM") arguments.
\item["OCAML_FLEXLINK" \rm(environment variable)]
Alternative executable to use on native
Windows for "flexlink" instead of the
configured value. Primarily used for bootstrapping.
\end{options}
\section{Modules and the file system}
This short section is intended to clarify the relationship between the
names of the modules corresponding to compilation units and the names
of the files that contain their compiled interface and compiled
implementation.
The compiler always derives the module name by taking the capitalized
base name of the source file (".ml" or ".mli" file). That is, it
strips the leading directory name, if any, as well as the ".ml" or
".mli" suffix; then, it sets the first letter to uppercase, in order to
comply with the requirement that module names must be capitalized.
For instance, compiling the file "mylib/misc.ml" provides an
implementation for the module named "Misc". Other compilation units
may refer to components defined in "mylib/misc.ml" under the names
"Misc."\var{name}; they can also do "open Misc", then use unqualified
names \var{name}.
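The naming rule can be sketched with standard library functions. The
function name "module_name_of_file" is hypothetical, and the code only
illustrates the rule stated above; it is not the code used by the
compiler.
\begin{verbatim}
(* Sketch of the naming rule: strip the directory and the suffix,
   then capitalize the first letter. *)
let module_name_of_file (path : string) : string =
  path
  |> Filename.basename          (* "mylib/misc.ml" -> "misc.ml" *)
  |> Filename.remove_extension  (* "misc.ml"       -> "misc"    *)
  |> String.capitalize_ascii    (* "misc"          -> "Misc"    *)

let () = assert (module_name_of_file "mylib/misc.ml" = "Misc")
let () = assert (module_name_of_file "misc.mli" = "Misc")
\end{verbatim}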
The ".cmi" and ".cmo" files produced by the compiler have the same
base name as the source file. Hence, the compiled files always have
their base name equal (modulo capitalization of the first letter) to
the name of the module they describe (for ".cmi" files) or implement
(for ".cmo" files).
When the compiler encounters a reference to a free module identifier
"Mod", it looks in the search path for a file named "Mod.cmi" or "mod.cmi"
and loads the compiled interface
contained in that file. As a consequence, renaming ".cmi" files is not
advised: the name of a ".cmi" file must always correspond to the name
of the compilation unit it implements. It is admissible to move them
to another directory, if their base name is preserved, and the correct
"-I" options are given to the compiler. The compiler will flag an
error if it loads a ".cmi" file that has been renamed.
Compiled bytecode files (".cmo" files), on the other hand, can be
freely renamed once created. That's because the linker never attempts
to find by itself the ".cmo" file that implements a module with a
given name: it relies instead on the user providing the list of ".cmo"
files by hand.
\section{Common errors} \label{s:comp-errors}
This section describes and explains the most frequently encountered
error messages.
\begin{options}
\item[Cannot find file \var{filename}]
The named file could not be found in the current directory, nor in the
directories of the search path. The \var{filename} is either a
compiled interface file (".cmi" file), or a compiled bytecode file
(".cmo" file). If \var{filename} has the format \var{mod}".cmi", this
means you are trying to compile a file that references identifiers
from module \var{mod}, but you have not yet compiled an interface for
module \var{mod}. Fix: compile \var{mod}".mli" or \var{mod}".ml"
first, to create the compiled interface \var{mod}".cmi".
If \var{filename} has the format \var{mod}".cmo", this
means you are trying to link a bytecode object file that does not
exist yet. Fix: compile \var{mod}".ml" first.
If your program spans several directories, this error can also appear
because you haven't specified the directories to look into. Fix: add
the correct "-I" options to the command line.
\item[Corrupted compiled interface \var{filename}]
The compiler produces this error when it tries to read a compiled
interface file (".cmi" file) that has the wrong structure. This means
something went wrong when this ".cmi" file was written: the disk was
full, the compiler was interrupted in the middle of the file creation,
and so on. This error can also appear if a ".cmi" file is modified after
its creation by the compiler. Fix: remove the corrupted ".cmi" file,
and rebuild it.
\item[This expression has type \nth{t}{1}, but is used with type \nth{t}{2}]
This is by far the most common type error in programs. Type \nth{t}{1} is
the type inferred for the expression (the part of the program that is
displayed in the error message), by looking at the expression itself.
Type \nth{t}{2} is the type expected by the context of the expression; it
is deduced by looking at how the value of this expression is used in
the rest of the program. If the two types \nth{t}{1} and \nth{t}{2} are not
compatible, then the error above is produced.
In some cases, it is hard to understand why the two types \nth{t}{1} and
\nth{t}{2} are incompatible. For instance, the compiler can report that
``expression of type "foo" cannot be used with type "foo"'', and it
really seems that the two types "foo" are compatible. This is not
always true. Two type constructors can have the same name, but
actually represent different types. This can happen if a type
constructor is redefined. Example:
\begin{verbatim}
type foo = A | B
let f = function A -> 0 | B -> 1
type foo = C | D
f C
\end{verbatim}
This results in the error message ``expression "C" of type "foo" cannot
be used with type "foo"''.
\item[The type of this expression, \var{t}, contains type variables
that cannot be generalized]
Type variables ("'a", "'b", \ldots) in a type \var{t} can be in either
of two states: generalized (which means that the type \var{t} is valid
for all possible instantiations of the variables) and not generalized
(which means that the type \var{t} is valid only for one instantiation
of the variables). In a "let" binding "let "\var{name}" = "\var{expr},
the type-checker normally generalizes as many type variables as
possible in the type of \var{expr}. However, this leads to unsoundness
(a well-typed program can crash) in conjunction with polymorphic
mutable data structures. To avoid this, generalization is performed at
"let" bindings only if the bound expression \var{expr} belongs to the
class of ``syntactic values'', which includes constants, identifiers,
functions, tuples of syntactic values, etc. In all other cases (for
instance, \var{expr} is a function application), a polymorphic mutable
value could have been created and generalization is therefore turned off for
all variables occurring in contravariant or non-variant branches of the
type. For instance, if the type of a non-value is "'a list" the
variable is generalizable ("list" is a covariant type constructor),
but not in "'a list -> 'a list" (the left branch of "->" is
contravariant) or "'a ref" ("ref" is non-variant).
Non-generalized type variables in a type cause no difficulties inside
a given structure or compilation unit (the contents of a ".ml" file,
or an interactive session), but they cannot be allowed inside
signatures nor in compiled interfaces (".cmi" file), because they
could be used inconsistently later. Therefore, the compiler
flags an error when a structure or compilation unit defines a value
\var{name} whose type contains non-generalized type variables. There
are two ways to fix this error:
\begin{itemize}
\item Add a type constraint or a ".mli" file to give a monomorphic
type (without type variables) to \var{name}. For instance, instead of
writing
\begin{verbatim}
let sort_int_list = List.sort Stdlib.compare
(* inferred type 'a list -> 'a list, with 'a not generalized *)
\end{verbatim}
write
\begin{verbatim}
let sort_int_list = (List.sort Stdlib.compare : int list -> int list);;
\end{verbatim}
\item If you really need \var{name} to have a polymorphic type, turn
its defining expression into a function by adding an extra parameter.
For instance, instead of writing
\begin{verbatim}
let map_length = List.map Array.length
(* inferred type 'a array list -> int list, with 'a not generalized *)
\end{verbatim}
write
\begin{verbatim}
let map_length lv = List.map Array.length lv
\end{verbatim}
\end{itemize}
\item[Reference to undefined global \var{mod}]
This error appears when trying to link an incomplete or incorrectly
ordered set of files. Either you have forgotten to provide an
implementation for the compilation unit named \var{mod} on the command line
(typically, the file named \var{mod}".cmo", or a library containing
that file). Fix: add the missing ".ml" or ".cmo" file to the command
line. Or, you have provided an implementation for the module named
\var{mod}, but it comes too late on the command line: the
implementation of \var{mod} must come before all bytecode object files
that reference \var{mod}. Fix: change the order of ".ml" and ".cmo"
files on the command line.
Of course, you will always encounter this error if you have mutually
recursive functions across modules. That is, function "Mod1.f" calls
function "Mod2.g", and function "Mod2.g" calls function "Mod1.f".
In this case, no matter what permutations you perform on the command
line, the program will be rejected at link-time. Fixes:
\begin{itemize}
\item Put "f" and "g" in the same module.
\item Parameterize one function by the other.
That is, instead of having
\begin{verbatim}
mod1.ml: let f x = ... Mod2.g ...
mod2.ml: let g y = ... Mod1.f ...
\end{verbatim}
define
\begin{verbatim}
mod1.ml: let f g x = ... g ...
mod2.ml: let rec g y = ... Mod1.f g ...
\end{verbatim}
and link "mod1.cmo" before "mod2.cmo".
\item Use a reference to hold one of the two functions, as in:
\begin{verbatim}
mod1.ml: let forward_g =
ref((fun x -> failwith "forward_g") : <type>)
let f x = ... !forward_g ...
mod2.ml: let g y = ... Mod1.f ...
let _ = Mod1.forward_g := g
\end{verbatim}
\end{itemize}
\item[The external function \var{f} is not available]
This error appears when trying to link code that calls external
functions written in C. As explained in
chapter~\ref{c:intf-c}, such code must be linked with C libraries that
implement the required \var{f} C function. If the C libraries in
question are not shared libraries (DLLs), the code must be linked in
``custom runtime'' mode. Fix: add the required C libraries to the
command line, and possibly the "-custom" option.
\end{options}
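The forward-reference fix listed under ``Reference to undefined
global'' can be sketched in a single file, with two submodules standing
in for "mod1.ml" and "mod2.ml". The functions "is_even" and "is_odd"
are hypothetical examples that play the roles of "f" and "g"; they
assume a non-negative argument.
\begin{verbatim}
(* Stands in for mod1.ml. *)
module Mod1 = struct
  (* Placeholder for Mod2.is_odd, filled in once Mod2 is initialized. *)
  let forward_is_odd : (int -> bool) ref =
    ref (fun _ -> failwith "forward_is_odd")

  let is_even n = if n = 0 then true else !forward_is_odd (n - 1)
end

(* Stands in for mod2.ml, linked after mod1.cmo. *)
module Mod2 = struct
  let is_odd n = if n = 0 then false else Mod1.is_even (n - 1)

  (* Tie the knot at initialization time. *)
  let () = Mod1.forward_is_odd := is_odd
end

let () = assert (Mod1.is_even 10 && Mod2.is_odd 7)
\end{verbatim}
Calling "Mod1.is_even" with a nonzero argument before "Mod2" has been
initialized would reach the placeholder and raise "Failure", so the
initialization order still matters with this approach.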
\section{Warning reference} \label{s:comp-warnings}
This section describes and explains in detail some warnings:
\subsection{Warning 9: missing fields in a record pattern}
When pattern matching on records, it can be useful to match only a few
fields of a record. Eliding fields can be done either implicitly
or explicitly by ending the record pattern with "; _".
However, implicit field elision is at odds with pattern matching
exhaustiveness checks.
Enabling warning 9 prioritizes exhaustiveness checks over the
convenience of implicit field elision and will warn on implicit
field elision in record patterns. In particular, this warning can
help to spot exhaustive record patterns that may need to be updated
after the addition of new fields to a record type.
\begin{verbatim}
type 'a point = {x : 'a; y : 'a}
let dx { x } = x (* implicit field elision: triggers warning 9 *)
let dy { y; _ } = y (* explicit field elision: does not trigger warning 9 *)
\end{verbatim}
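As a hedged sketch, the warning can be switched on for a single file
with the built-in attribute "ocaml.warning". The type "point3" and the
functions "sum" and "first" are hypothetical names used only for this
illustration.
\begin{verbatim}
[@@@ocaml.warning "+9"]  (* enable warning 9 for the rest of this file *)

type point3 = { x : int; y : int; z : int }

(* Every field is listed, so the pattern is exhaustive. If a field is
   later added to point3, warning 9 is reported here. *)
let sum { x; y; z } = x + y + z

(* Explicit elision with "; _": not reported by warning 9. *)
let first { x; _ } = x

let () = assert (sum { x = 1; y = 2; z = 3 } = 6)
let () = assert (first { x = 1; y = 2; z = 3 } = 1)
\end{verbatim}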
\subsection{Warning 52: fragile constant pattern}
\label{ss:warn52}
Some constructors, such as the exception constructors "Failure" and
"Invalid_argument", take as parameter a "string" value holding
a text message intended for the user.
These text messages are usually not stable over time: call sites
building these constructors may refine the message in a future
version to make it more explicit, etc. Therefore, it is dangerous to
match over the precise value of the message. For example, until
OCaml 4.02, "Array.iter2" would raise the exception
\begin{verbatim}
Invalid_argument "arrays must have the same length"
\end{verbatim}
Since 4.03 it raises the more helpful message
\begin{verbatim}
Invalid_argument "Array.iter2: arrays must have the same length"
\end{verbatim}
but this means that any code of the form
\begin{verbatim}
try ...
with Invalid_argument "arrays must have the same length" -> ...
\end{verbatim}
is now broken and may suffer from uncaught exceptions.
Warning 52 is there to prevent users from writing such fragile code
in the first place. It does not occur on every match against a literal
string, but only in the case in which library authors expressed
their intent to possibly change the constructor parameter value in
the future, by using the attribute "ocaml.warn_on_literal_pattern"
(see the manual section on builtin attributes in
\ref{ss:builtin-attributes}):
\begin{verbatim}
type t =
| Foo of string [@ocaml.warn_on_literal_pattern]
| Bar of string
let no_warning = function
| Bar "specific value" -> 0
| _ -> 1
let warning = function
| Foo "specific value" -> 0
| _ -> 1
> | Foo "specific value" -> 0
> ^^^^^^^^^^^^^^^^
> Warning 52: Code should not depend on the actual values of
> this constructor's arguments. They are only for information
> and may change in future versions. (See manual section 8.5)
\end{verbatim}
In particular, all built-in exceptions with a string argument have
this attribute set: "Invalid_argument", "Failure", "Sys_error" will
all raise this warning if you match for a specific string argument.
Additionally, built-in exceptions with a structured argument that
includes a string also have the attribute set: "Assert_failure" and
"Match_failure" will raise the warning for a pattern that uses a
literal string to match the first element of their tuple argument.
If your code raises this warning, you should {\em not} change the
way you test for the specific string to avoid the warning (for
example using a string equality inside the right-hand-side instead
of a literal pattern), as your code would remain fragile. You should
instead enlarge the scope of the pattern by matching on all possible
values.
\begin{verbatim}
let warning = function
| Foo _ -> 0
| _ -> 1
\end{verbatim}
This may require some care: if the scrutinee may return several
different cases of the same pattern, or raise distinct instances of
the same exception, you may need to modify your code to separate
those several cases.
For example,
\begin{verbatim}
try (int_of_string count_str, bool_of_string choice_str) with
| Failure "int_of_string" -> (0, true)
| Failure "bool_of_string" -> (-1, false)
\end{verbatim}
should be rewritten into more atomic tests. For example,
using the "exception" patterns documented in Section~\ref{s:exception-match},
one can write:
\begin{verbatim}
match int_of_string count_str with
| exception (Failure _) -> (0, true)
| count ->
begin match bool_of_string choice_str with
| exception (Failure _) -> (-1, false)
| choice -> (count, choice)
end
\end{verbatim}
The only case where that transformation is not possible is if a given
function call may raise distinct exceptions with the same constructor
but different string values. In this case, you will have to check for
specific string values. This is dangerous API design and it should be
discouraged: it's better to define more precise exception constructors
than store useful information in strings.
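A minimal sketch of that recommendation, assuming the parsing functions
are under your control, is given below. The exceptions "Bad_count" and
"Bad_choice" and the functions "parse_count", "parse_choice" and
"parse" are hypothetical; "int_of_string_opt" and "bool_of_string_opt"
are standard library functions.
\begin{verbatim}
(* Hypothetical exceptions: one constructor per failure cause. *)
exception Bad_count of string
exception Bad_choice of string

let parse_count s =
  match int_of_string_opt s with
  | Some n -> n
  | None -> raise (Bad_count s)

let parse_choice s =
  match bool_of_string_opt s with
  | Some b -> b
  | None -> raise (Bad_choice s)

(* The handler matches on constructors only, never on message text.
   The let-bindings make the evaluation order explicit. *)
let parse count_str choice_str =
  try
    let count = parse_count count_str in
    let choice = parse_choice choice_str in
    (count, choice)
  with
  | Bad_count _ -> (0, true)
  | Bad_choice _ -> (-1, false)

let () = assert (parse "3" "true" = (3, true))
let () = assert (parse "x" "true" = (0, true))
let () = assert (parse "3" "maybe" = (-1, false))
\end{verbatim}
The string carried by each exception remains available for error
reporting, but no handler depends on its exact contents.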
\subsection{Warning 57: Ambiguous or-pattern variables under guard}
\label{ss:warn57}
The semantics of or-patterns in OCaml is specified with
a left-to-right bias: a value \var{v} matches the pattern \var{p} "|" \var{q}
if it matches \var{p} or \var{q}, but if it matches both,
the environment captured by the match is the environment captured by
\var{p}, never the one captured by \var{q}.
While this property is generally intuitive, there is at least one specific
case where a different semantics might be expected.
Consider a pattern followed by a when-guard:
"|"~\var{p}~"when"~\var{g}~"->"~\var{e}, for example:
\begin{verbatim}
| ((Const x, _) | (_, Const x)) when is_neutral x -> branch
\end{verbatim}
The semantics is clear:
match the scrutinee against the pattern, if it matches, test the guard,
and if the guard passes, take the branch.
In particular, consider the input "(Const"~\var{a}", Const"~\var{b}")", where
\var{a} fails the test "is_neutral"~\var{a}, while \var{b} passes the test
"is_neutral"~\var{b}. With the left-to-right semantics, the clause above is
{\em not} taken for this input: matching "(Const"~\var{a}", Const"~\var{b}")"
against the or-pattern succeeds in the left branch and returns the
environment \var{x}~"->"~\var{a}; the guard
"is_neutral"~\var{a} is then tested and fails, so the branch is not taken.
However, another semantics may be considered more natural here:
any pair that has one side passing the test will take the branch. With this
semantics the previous code fragment would be equivalent to
\begin{verbatim}
| (Const x, _) when is_neutral x -> branch
| (_, Const x) when is_neutral x -> branch
\end{verbatim}
This is {\em not} the semantics adopted by OCaml.
Warning 57 is dedicated to these confusing cases where the
specified left-to-right semantics is not equivalent to a non-deterministic
semantics (any branch can be taken) relative to a specific guard.
More precisely, it warns when the guard uses ``ambiguous'' variables, that is,
variables bound to different parts of the scrutinee by different sides of an
or-pattern.
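
The difference between the two semantics can be observed on a small,
self-contained sketch. The type "expr" and the functions "is_neutral",
"biased" and "either_side" are hypothetical. Compiling "biased" is
expected to report warning 57; the program still builds unless warnings
are turned into errors.
\begin{verbatim}
type expr = Const of int | Var of string

let is_neutral x = (x = 0)

(* Or-pattern under a guard: x is bound by the left alternative
   whenever it matches, and the right one is then never tried. *)
let biased = function
  | ((Const x, _) | (_, Const x)) when is_neutral x -> "branch"
  | _ -> "other"

(* Two separate clauses: each side is tested with its own guard. *)
let either_side = function
  | (Const x, _) when is_neutral x -> "branch"
  | (_, Const x) when is_neutral x -> "branch"
  | _ -> "other"

let () = assert (biased (Const 1, Const 0) = "other")
let () = assert (either_side (Const 1, Const 0) = "branch")
let () = assert (biased (Var "v", Const 0) = "branch")
\end{verbatim}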