The Caml Light system
release 0.73
Documentation and user's manual
Xavier Leroy
December 2, 1997
Copyright © 1997 Institut National de Recherche en Informatique et en
Automatique
Contents
I Getting started
1 Installation instructions
1.1 The Unix version
1.2 The Macintosh version
1.3 The MS-Windows version
II The Caml Light language reference manual
2 The core Caml Light language
2.1 Lexical conventions
2.2 Global names
2.3 Values
2.4 Type expressions
2.5 Constants
2.6 Patterns
2.7 Expressions
2.8 Global definitions
2.9 Directives
2.10 Module implementations
2.11 Module interfaces
3 Language extensions
3.1 Streams, parsers, and printers
3.2 Guards
3.3 Range patterns
3.4 Recursive definitions of values
3.5 Local definitions using where
3.6 Mutable variant types
3.7 String access
3.8 Alternate syntax
3.9 Infix symbols
3.10 Directives
III The Caml Light commands
4 Batch compilation (camlc)
4.1 Overview of the compiler
4.2 Options
4.3 Modules and the file system
4.4 Common errors
5 The toplevel system (camllight)
5.1 Options
5.2 Toplevel control functions
5.3 The toplevel and the module system
5.4 Common errors
5.5 Building custom toplevel systems: camlmktop
5.6 Options
6 The runtime system (camlrun)
6.1 Overview
6.2 Options
6.3 Common errors
7 The librarian (camllibr)
7.1 Overview
7.2 Options
7.3 Turning code into a library
8 Lexer and parser generators (camllex, camlyacc)
8.1 Overview of camllex
8.2 Syntax of lexer definitions
8.3 Overview of camlyacc
8.4 Syntax of grammar definitions
8.5 Options
8.6 A complete example
9 The debugger (camldebug)
9.1 Compiling for debugging
9.2 Invocation
9.3 Commands
9.4 Executing a program
9.5 Breakpoints
9.6 The call stack
9.7 Examining variable values
9.8 Controlling the debugger
9.9 Miscellaneous commands
10 Profiling (camlpro)
10.1 Compiling for profiling
10.2 Profiling an execution
10.3 Printing profiling information
10.4 Known bugs
11 Using Caml Light under Emacs
11.1 Updating your .emacs
11.2 The caml editing mode
11.3 Running the toplevel as an inferior process
11.4 Running the debugger as an inferior process
12 Interfacing C with Caml Light
12.1 Overview and compilation information
12.2 The value type
12.3 Representation of Caml Light data types
12.4 Operations on values
12.5 Living in harmony with the garbage collector
12.6 A complete example
IV The Caml Light library
13 The core library
13.1 bool: boolean operations
13.2 builtin: base types and constructors
13.3 char: character operations
13.4 eq: generic comparisons
13.5 exc: exceptions
13.6 fchar: character operations, without sanity checks
13.7 float: operations on floating-point numbers
13.8 fstring: string operations, without sanity checks
13.9 fvect: operations on vectors, without sanity checks
13.10 int: operations on integers
13.11 io: buffered input and output
13.12 list: operations on lists
13.13 pair: operations on pairs
13.14 ref: operations on references
13.15 stream: operations on streams
13.16 string: string operations
13.17 vect: operations on vectors
14 The standard library
14.1 arg: parsing of command line arguments
14.2 baltree: basic balanced binary trees
14.3 filename: operations on file names
14.4 format: pretty printing
14.5 gc: memory management control and statistics
14.6 genlex: a generic lexical analyzer
14.7 hashtbl: hash tables and hash functions
14.8 lexing: the run-time library for lexers generated by camllex
14.9 map: association tables over ordered types
14.10 parsing: the run-time library for parsers generated by camlyacc
14.11 printexc: a catch-all exception handler
14.12 printf: formatting printing functions
14.13 queue: queues
14.14 random: pseudo-random number generator
14.15 set: sets over ordered types
14.16 sort: sorting and merging lists
14.17 stack: stacks
14.18 sys: system interface
15 The graphics library
15.1 graphics: machine-independent graphics primitives
16 The unix library: Unix system calls
16.1 unix: interface to the Unix system
17 The num library: arbitrary-precision rational arithmetic
17.1 num: operations on numbers
17.2 arith_status: flags that control rational arithmetic
18 The str library: regular expressions and string processing
18.1 str: regular expressions and high-level string processing
V Appendix
19 Further reading
19.1 Programming in ML
19.2 Descriptions of ML dialects
19.3 Implementing functional programming languages
19.4 Applications of ML
Foreword
This manual documents release 0.73 of the Caml Light system. It is
organized as follows.
- Part I, ``Getting started'', explains how to install Caml Light on your
machine.
- Part II, ``The Caml Light language reference manual'', is the reference
description of the Caml Light language.
- Part III, ``The Caml Light commands'', documents the Caml Light compiler,
toplevel system, and programming utilities.
- Part IV, ``The Caml Light library'', describes the modules provided in
the standard library.
- Part V, ``Appendix'', contains a short bibliography, an index of all
identifiers defined in the standard library, and an index of Caml Light
keywords.
Conventions
The Caml Light system comes in several versions: for Unix machines, for
Macintoshes, and for PCs. The parts of this manual that are specific to one
version are presented as shown below:
Unix: This is material specific to the Unix version.
Mac: This is material specific to the Macintosh version.
PC: This is material specific to the PC version.
License
The Caml Light system is copyright © 1989, 1990, 1991, 1992, 1993, 1994,
1995, 1996, 1997 Institut National de Recherche en Informatique et en
Automatique (INRIA). INRIA holds all ownership rights to the Caml Light
system. See the file COPYRIGHT in the distribution for the copyright notice.
The Caml Light system can be freely copied, but not sold. More precisely,
INRIA grants any user of the Caml Light system the right to reproduce it,
provided that the copies are distributed free of charge and under the
conditions given in the COPYRIGHT file. The present documentation is
distributed under the same conditions.
Availability by FTP
The complete Caml Light distribution resides on the machine ftp.inria.fr. The
distribution files can be transferred by anonymous FTP:
Host: ftp.inria.fr (Internet address 192.93.2.54)
Login name: anonymous
Password: your e-mail address
Directory: lang/caml-light
Files: see the index in file README
Part I
Getting started
Chapter 1
Installation instructions
This chapter explains how to install Caml Light on your machine.
1.1 The Unix version
Requirements. Any machine that runs under one of the various flavors of the
Unix operating system, and that has a flat, non-segmented, 32-bit or 64-bit
address space. 4M of RAM, 2M of free disk space. The graphics library
requires X11 release 4 or later.
Installation. The Unix version is distributed in source format, as a
compressed tar file named cl73unix.tar.gz. To extract, move to the directory
where you want the source files to reside, transfer cl73unix.tar.gz to that
directory, and execute
zcat cl73unix.tar.gz | tar xBf -
This extracts the source files into the current directory. The file INSTALL
contains complete instructions on how to configure, compile and install Caml
Light. Read it and follow the instructions.
Troubleshooting. See the file INSTALL.
1.2 The Macintosh version
Requirements. Any Macintosh with at least 1M of RAM (2M is recommended),
running System 6 or 7. About 850K of free space on the disk. The parts of
the Caml Light system that support batch compilation currently require the
Macintosh Programmer's Workshop (MPW) version 3.2. MPW is Apple's development
environment, and it is distributed by APDA, Apple's Programmers and Developers
Association. See the file READ ME in the distribution for APDA's address.
Installation. Create the folder where the Caml Light files will reside.
Double-click on the file cl73macbin.sea from the distribution. This displays
a file dialog box. Open the folder where the Caml Light files will reside,
and click on the Extract button. This will re-create all files from the
distribution in the Caml Light folder.
To test the installation, double-click on the application Caml Light. The
``Caml Light output'' window should display something like
> Caml Light version 0.73
#
In the ``Caml Light input'' window, enter 1+2;; and press the Return key. The
``Caml Light output'' window should display:
> Caml Light version 0.73
#1+2;;
- : int = 3
#
Select ``Quit'' from the ``File'' menu to return to the Finder.
If you have MPW, you can install the batch compilation tools as follows.
The tools and scripts from the tools folder must reside in a place where MPW
will find them as commands. There are two ways to achieve this result:
either copy the files in the tools folder to the Tools or the Scripts folder
in your MPW folder; or keep the files in the tools folder and add the
following line to your UserStartup file (assuming Caml Light resides in folder
Caml Light on the disk named My HD):
Set Commands "{Commands},My HD:Caml Light:tools:"
In either case, you now have to edit the camlc script, and replace the string
Macintosh HD:Caml Light:lib:
(in the first line) with the actual pathname of the lib folder. For example,
if you put Caml Light in folder Caml Light on the disk named My HD, the first
line of camlc should read:
Set stdlib "My HD:Caml Light:lib:"
Troubleshooting. Here is one commonly encountered problem.
Cannot find file stream.zi
(Displayed in the ``Caml Light output'' window, with an alert box telling
you that Caml Light has terminated abnormally.) This is an installation
error. The folder named lib in the distribution must always be in the
same folder as the Caml Light application. It's OK to move the
application to another folder, but remember to move the lib folder to
the same folder. (To return to the Finder, first select ``Quit'' from
the ``File'' menu.)
1.3 The MS-Windows version
Requirements. A PC equipped with an 80386, 80486 or Pentium processor, running
MS Windows 3.x, Windows 95 or Windows NT. About 3M of free space on the disk.
At least 8M of RAM is recommended.
Installation. The MS-Windows version is distributed as a self-extracting,
self-installing archive named cl73win.exe. Simply run it and follow the steps
of the installation program.
Part II
The Caml Light language reference manual
Chapter 2
The core Caml Light language
Foreword
This document is intended as a reference manual for the Caml Light language.
It lists all language constructs, and gives their precise syntax and informal
semantics. It is by no means a tutorial introduction to the language. A good
working knowledge of the language, as provided by the companion tutorial
Functional programming using Caml Light, is assumed.
No attempt has been made at mathematical rigor: words are employed with
their intuitive meaning, without further definition. As a consequence, the
typing rules have been left out, for lack of the mathematical framework
required to express them, while they are definitely part of a full formal
definition of the language. The reader interested in truly formal
descriptions of languages from the ML family is referred to The definition of
Standard ML and Commentary on Standard ML, by Milner, Tofte and Harper, MIT
Press.
Warning
Several implementations of the Caml Light language are available, and they
evolve with each release. Consequently, this document carefully distinguishes
the language and its implementations. Implementations can provide extra
language constructs; moreover, all points left unspecified in this reference
manual can be interpreted differently by the implementations. The purpose of
this reference manual is to specify those features that all implementations
must provide.
Notations
The syntax of the language is given in BNF-like notation. Terminal symbols
are set in typewriter font (like this). Non-terminal symbols are set in
italic font (like that). Square brackets [...] denote optional components.
Curly brackets {...} denote zero, one or several repetitions of the enclosed
components. Curly brackets with a trailing plus sign {...}+ denote one or
several repetitions of the enclosed components. Parentheses (...) denote
grouping.
2.1 Lexical conventions
Blanks
The following characters are considered blanks: space, newline, horizontal
tabulation, carriage return, line feed and form feed. Blanks are ignored, but
they separate adjacent identifiers, literals and keywords that would otherwise
be confused as one single identifier, literal or keyword.
Comments
Comments are introduced by the two characters (*, with no intervening blanks,
and terminated by the characters *), with no intervening blanks. Comments are
treated as blank characters. Comments do not occur inside string or character
literals. Nested comments are correctly handled.
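A minimal sketch of these comment rules, written as toplevel phrases terminated by ;; (the names three and not_a_comment are hypothetical, chosen for illustration). The nested comment is skipped as a whole, and the comment opener inside the string literal stays part of the string:

```ocaml
(* The comment below, including its nested comment, is treated as a blank. *)
let three = 1 (* outer (* nested *) still a comment *) + 2;;

(* A string literal may contain the comment opener; it stays part of the string. *)
let not_a_comment = "(* not a comment *)";;
```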
Identifiers
ident ::= letter {letter | 0...9 | _}
letter ::= A...Z | a...z
Identifiers are sequences of letters, digits and _ (the underscore
character), starting with a letter. Letters include at least the 52 lowercase
and uppercase letters from the ASCII set. Implementations can recognize as
letters other characters from the extended ASCII set. Identifiers cannot
contain two adjacent underscore characters (__). Implementations may limit the
number of characters of an identifier, but this limit must be above 256
characters. All characters in an identifier are meaningful.
Integer literals
integer-literal ::= [-] {0...9}+
| [-] (0x | 0X) {0...9 | A...F | a...f}+
| [-] (0o | 0O) {0...7}+
| [-] (0b | 0B) {0...1}+
An integer literal is a sequence of one or more digits, optionally preceded
by a minus sign. By default, integer literals are in decimal (radix 10). The
following prefixes select a different radix:
--------------------------------
|Prefix|Radix |
--------------------------------
|0x, 0X|hexadecimal (radix 16) |
|0o, 0O|octal (radix 8) |
|0b, 0B|binary (radix 2) |
--------------------------------
(The initial 0 is the digit zero; the O for octal is the letter O.)
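As a sketch of the radix prefixes (the variable names are hypothetical), the four literals below all denote thirty-one, and the optional minus sign comes before the prefix:

```ocaml
(* Four spellings of thirty-one, one per radix. *)
let dec = 31;;
let hex = 0x1F;;     (* hexadecimal: 1*16 + 15 *)
let oct = 0o37;;     (* octal, with the letter o: 3*8 + 7 *)
let bin = 0b11111;;  (* binary: 16 + 8 + 4 + 2 + 1 *)

(* The optional minus sign precedes the radix prefix. *)
let neg = -0x1F;;
```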
Floating-point literals
float-literal ::= [-] {0...9}+ [. {0...9}] [(e | E) [+ | -] {0...9}+]
Floating-point literals consist of an integer part, a decimal part and an
exponent part. The integer part is a sequence of one or more digits,
optionally preceded by a minus sign. The decimal part is a decimal point
followed by zero, one or more digits. The exponent part is the character e or
E followed by an optional + or - sign, followed by one or more digits. The
decimal part or the exponent part can be omitted, but not both, to avoid
ambiguity with integer literals.
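A short sketch of the three parts of a floating-point literal (variable names are hypothetical). The first three literals denote one and a half; the last line shows that dropping both the decimal part and the exponent part yields an integer literal instead:

```ocaml
let a = 1.5;;      (* integer part and decimal part *)
let b = 15e-1;;    (* integer part and exponent part *)
let c = 0.15E1;;   (* all three parts *)

(* A decimal point may be followed by zero digits. *)
let d = 2.;;

(* With neither a decimal part nor an exponent part, 15 is an integer literal. *)
let n = 15;;
```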
Character literals
char-literal ::= ` regular-char `
| ` \ (\ | ` | n | t | b | r) `
| ` \ (0...9) (0...9) (0...9) `
Character literals are delimited by ` (backquote) characters. The two
backquotes enclose either one character different from ` and \, or one of the
escape sequences below:
--------------------------------------------------------
|Sequence|Character denoted |
--------------------------------------------------------
|\\ |backslash (\) |
|\` |backquote (`) |
|\n |newline (LF) |
|\r |return (CR) |
|\t |horizontal tabulation (TAB) |
|\b |backspace (BS) |
|\ddd |the character with ASCII code ddd in decimal |
--------------------------------------------------------
String literals
string-literal ::= " {string-character} "
string-character ::= regular-char
| \ (\ | " | n | t | b | r)
| \ (0...9) (0...9) (0...9)
String literals are delimited by " (double quote) characters. The two
double quotes enclose a sequence of either characters different from " and \,
or escape sequences from the table below:
--------------------------------------------------------
|Sequence|Character denoted |
--------------------------------------------------------
|\\ |backslash (\) |
|\" |double quote (") |
|\n |newline (LF) |
|\r |return (CR) |
|\t |horizontal tabulation (TAB) |
|\b |backspace (BS) |
|\ddd |the character with ASCII code ddd in decimal |
--------------------------------------------------------
Implementations must support string literals up to $2^{16}-1$ characters in
length (65535 characters).
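A minimal sketch of the string escape sequences from the table above (the names are hypothetical). Each escape sequence counts as a single character of the string, and \ddd gives the character code in decimal:

```ocaml
(* \t is one tab character. *)
let tabbed = "tab:\there";;

(* A backslash before a double quote puts the quote inside the string. *)
let quoted = "say \"hi\"";;

(* Decimal codes 65, 66 and 67 are the letters A, B and C. *)
let abc = "\065\066\067";;
```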
Keywords
The identifiers below are reserved as keywords, and cannot be employed
otherwise:
and as begin do done downto
else end exception for fun function
if in let match mutable not
of or prefix rec then to
try type value where while with
The following character sequences are also keywords:
# ! != & ( ) * *. + +.
, - -. -> . .( / /. : ::
:= ; ;; < <. <- <= <=. <> <>.
= =. == > >. >= >=. @ [ [|
] ^ _ __ { | |] } '
Ambiguities
Lexical ambiguities are resolved according to the ``longest match'' rule:
when a character sequence can be decomposed into two tokens in several
different ways, the decomposition retained is the one with the longest first
token.
2.2 Global names
Global names are used to denote value variables, value constructors (constant
or non-constant), type constructors, and record labels. Internally, a global
name consists of two parts: the name of the defining module (the module
name), and the name of the global inside that module (the local name). The
two parts of the name must be valid identifiers. Externally, global names
have the following syntax:
global-name ::= ident
| ident __ ident
The form ident __ ident is called a qualified name. The first identifier is
the module name, the second identifier is the local name. The form ident is
called an unqualified name. The identifier is the local name; the module name
is omitted. The compiler infers this module name following the completion
rules given below, therefore transforming the unqualified name into a full
global name.
To complete an unqualified identifier, the compiler checks a list of
modules, the opened modules, to see if they define a global with the same
local name as the unqualified identifier. When one is found, the identifier
is completed into the full name of that global. That is, the compiler takes
as module name the name of an opened module that defines a global with the
same local name as the unqualified identifier. If several modules satisfy
this condition, the one that comes first in the list of opened modules is
selected.
The list of opened modules always includes the module currently being
compiled (checked first). (In the case of a toplevel-based implementation,
this is the module where all toplevel definitions are entered.) It also
includes a number of standard library modules that provide the initial
environment (checked last). In addition, the #open and #close directives can
be used to add or remove modules from that list. The modules added with #open
are checked after the module currently being compiled, but before the initial
standard library modules.
variable ::= global-name
| prefix operator-name
operator-name ::= + | - | * | / | mod | +. | -. | *. | /.
| @ | ^ | ! | := | = | <> | == | !=
| < | <= | > | >= | <. | <=. | >. | >=.
cconstr ::= global-name
| []
| ()
ncconstr ::= global-name
| prefix ::
typeconstr ::= global-name
label ::= global-name
Depending on the context, global names can stand for global variables
(variable), constant value constructors (cconstr), non-constant value
constructors (ncconstr), type constructors (typeconstr), or record labels
(label). For variables and value constructors, special names built with
prefix and an operator name are recognized. The tokens [] and () are also
recognized as built-in constant constructors (the empty list and the unit
value).
The syntax of the language restricts labels and type constructors to appear
in certain positions, where no other kind of global name is accepted. Hence
labels and type constructors have their own name spaces. Value constructors
and value variables live in the same name space: a global name in value
position is interpreted as a value constructor if it appears in the scope of a
type declaration defining that constructor; otherwise, the global name is
taken to be a value variable. For value constructors, the type declaration
determines whether a constructor is constant or not.
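A small sketch of how a global name in value position is resolved (the type color and the function is_red are hypothetical). In the match below, Red appears in the scope of the declaration of color, so it is read as a constant constructor; other is defined as a constructor by no type declaration, so it is taken as a variable pattern that matches any remaining value:

```ocaml
type color = Red | Green | Blue;;

(* Red is a constant constructor here; other is a variable pattern. *)
let is_red c =
  match c with
    Red -> true
  | other -> false;;
```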
2.3 Values
This section describes the kinds of values that are manipulated by Caml Light
programs.
2.3.1 Base values
Integer numbers
Integer values are integer numbers from $-2^{30}$ to $2^{30}-1$, that is -1073741824
to 1073741823. Implementations may support a wider range of integer values.
Floating-point numbers
Floating-point values are numbers in floating-point representation.
Everything about floating-point values is implementation-dependent, including
the range of representable numbers, the number of significant digits, and the
way floating-point results are rounded.
Characters
Character values are represented as 8-bit integers between 0 and 255.
Character codes between 0 and 127 are interpreted following the ASCII
standard. The interpretation of character codes between 128 and 255 is
implementation-dependent.
Character strings
String values are finite sequences of characters. Implementations must
support strings up to $2^{16}-1$ characters in length (65535 characters).
Implementations may support longer strings.
2.3.2 Tuples
Tuples of values are written (v1,...,vn), standing for the n-tuple of values
v1 to vn. Tuples of up to $2^{14}-1$ elements (16383 elements) must be
supported, though implementations may support tuples with more elements.
2.3.3 Records
Record values are labeled tuples of values. The record value written
{label1=v1 ;...;labeln =vn} associates the value vi to the record label
labeli, for i=1...n. Records with up to $2^{14}-1$ fields (16383 fields) must be
supported, though implementations may support records with more fields.
2.3.4 Arrays
Arrays are finite, variable-sized sequences of values of the same type.
Arrays of length up to $2^{14}-1$ (16383 elements) must be supported, though
implementations may support larger arrays.
2.3.5 Variant values
Variant values are either a constant constructor, or a pair of a non-constant
constructor and a value. The former case is written cconstr; the latter case
is written ncconstr(v), where v is said to be the argument of the non-constant
constructor ncconstr.
The following constants are treated like built-in constant constructors:
------------------------------
Constant Constructor
------------------------------
false the boolean false
true the boolean true
() the ``unit'' value
[] the empty list
------------------------------
2.3.6 Functions
Functional values are mappings from values to values.
2.4 Type expressions
typexpr ::= ' ident
| ( typexpr )
| typexpr -> typexpr
| typexpr {* typexpr}+
| typeconstr
| typexpr typeconstr
| ( typexpr {, typexpr} ) typeconstr
The table below shows the relative precedences and associativity of
operators and non-closed type constructions. The constructions with higher
precedences come first.
---------------------------------------------
|Operator |Associativity |
---------------------------------------------
|Type constructor application |-- |
|* |-- |
|-> |right |
---------------------------------------------
Type expressions denote types in definitions of data types as well as in
type constraints over patterns and expressions.
Type variables
The type expression ' ident stands for the type variable named ident. In data
type definitions, type variables are names for the data type parameters. In
type constraints, they represent unspecified types that can be instantiated by
any type to satisfy the type constraint.
Parenthesized types
The type expression ( typexpr ) denotes the same type as typexpr.
Function types
The type expression typexpr1 -> typexpr2 denotes the type of functions
mapping arguments of type typexpr1 to results of type typexpr2.
Tuple types
The type expression typexpr1 *...* typexprn denotes the type of tuples whose
elements belong to types typexpr1, ..., typexprn respectively.
Constructed types
Type constructors with no parameter, as in typeconstr, are type expressions.
The type expression typexpr typeconstr, where typeconstr is a type
constructor with one parameter, denotes the application of the unary type
constructor typeconstr to the type typexpr.
The type expression (typexpr1,...,typexprn) typeconstr, where typeconstr is
a type constructor with n parameters, denotes the application of the n-ary
type constructor typeconstr to the types typexpr1 through typexprn.
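A sketch of how the precedence and associativity rules above read in practice, using type constraints on patterns in let-bindings (the names sum_pair, add, add3 and nested are hypothetical):

```ocaml
(* int * int -> int reads as (int * int) -> int, because * binds tighter than ->. *)
let (sum_pair : int * int -> int) = fun (a, b) -> a + b;;

(* int -> int -> int reads as int -> (int -> int), because -> is right-associative. *)
let (add : int -> int -> int) = fun x y -> x + y;;
let (add3 : int -> int) = add 3;;

(* int list list applies the unary constructor list twice: a list of int lists. *)
let (nested : int list list) = [[1; 2]; [3]];;
```

Because -> associates to the right, add can be partially applied, as add3 shows.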
2.5 Constants
constant ::= integer-literal
| float-literal
| char-literal
| string-literal
| cconstr
The syntactic class of constants comprises literals from the four base types
(integers, floating-point numbers, characters, character strings), and
constant constructors.
2.6 Patterns
pattern ::= ident
| _
| pattern as ident
| ( pattern )
| ( pattern : typexpr )
| pattern | pattern
| constant
| ncconstr pattern
| pattern , pattern {, pattern}
| { label = pattern {; label = pattern} }
| [ ]
| [ pattern {; pattern} ]
| pattern :: pattern
The table below shows the relative precedences and associativity of
operators and non-closed pattern constructions. The constructions with higher
precedences come first.
----------------------------------------
|Operator |Associativity |
----------------------------------------
|Constructor application|-- |
|:: |right |
|, |-- |
|``|'' (or-pattern)     |left          |
|as |-- |
----------------------------------------
Patterns are templates that allow selecting data structures of a given
shape, and binding identifiers to components of the data structure. This
selection operation is called pattern matching; its outcome is either ``this
value does not match this pattern'', or ``this value matches this pattern,
resulting in the following bindings of identifiers to values''.
Variable patterns
A pattern that consists of an identifier matches any value, binding the
identifier to the value. The pattern _ also matches any value, but does not
bind any identifier.
Alias patterns
The pattern pattern1 as ident matches the same values as pattern1. If the
matching against pattern1 is successful, the identifier ident is bound to the
matched value, in addition to the bindings performed by the matching against
pattern1.
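A minimal sketch combining variable, wildcard and alias patterns (the function name head_and_list is hypothetical). Since as has the lowest precedence, the parentheses only make explicit that whole names the entire list:

```ocaml
(* x is bound to the head of the list, _ ignores the tail, and whole is
   bound to the entire list matched by the parenthesized pattern. *)
let head_and_list l =
  match l with
    (x :: _) as whole -> (x, whole)
  | [] -> failwith "head_and_list: empty list";;
```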
Parenthesized patterns
The pattern ( pattern1 ) matches the same values as pattern1. A type
constraint can appear in a parenthesized pattern, as in
( pattern1 : typexpr ). This constraint forces the type of pattern1 to be
compatible with typexpr.
``Or'' patterns
The pattern pattern1 | pattern2 represents the logical ``or'' of the two
patterns pattern1 and pattern2. A value matches pattern1 | pattern2 either
if it matches pattern1 or if it matches pattern2. The two sub-patterns
pattern1 and pattern2 must contain no identifiers. Hence no bindings are
returned by matching against an ``or'' pattern.
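A short sketch of an ``or'' pattern whose alternatives are constants, so that it binds no identifiers as required above. The function is_weekend and its numbering of days (6 and 7 for the weekend) are hypothetical:

```ocaml
(* Both alternatives are constants: the or-pattern binds nothing. *)
let is_weekend day =
  match day with
    6 | 7 -> true
  | _ -> false;;
```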
Constant patterns
A pattern consisting of a constant matches the values that are equal to this
constant.
Variant patterns
The pattern ncconstr pattern1 matches all variants whose constructor is equal
to ncconstr, and whose argument matches pattern1.
The pattern pattern1 :: pattern2 matches non-empty lists whose heads match
pattern1, and whose tails match pattern2. This pattern behaves like
prefix :: ( pattern1 , pattern2 ).
The pattern [ pattern1 ;...; patternn ] matches lists of length n whose
elements match pattern1 ...patternn, respectively. This pattern behaves like
pattern1 ::...:: patternn :: [].
Tuple patterns
The pattern pattern1 ,..., patternn matches n-tuples whose components match
the patterns pattern1 through patternn. That is, the pattern matches the
tuple values (v1,...,vn) such that patterni matches vi for i = 1, ..., n.
Record patterns
The pattern { label1 = pattern1 ;...; labeln = patternn } matches records
that define at least the labels label1 through labeln, and such that the
value associated to labeli matches the pattern patterni, for i = 1, ..., n. The
record value can define more labels than label1 ...labeln; the values
associated to these extra labels are not taken into account for matching.
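The following is a minimal sketch in Caml Light syntax. It exercises variable, wildcard, constant, ``or'', list, alias, record and tuple patterns. The type point and the functions count, describe, on_diagonal and swap are hypothetical names introduced only for illustration.

```ocaml
(* Hypothetical record type, used below to illustrate record patterns. *)
type point = { x : int; y : int };;

(* Variable and wildcard patterns, with list patterns built from :: *)
let rec count l =
  match l with
    [] -> 0
  | _ :: rest -> 1 + count rest;;

(* Constant, "or", list and alias patterns; rows are tried in order. *)
let describe l =
  match l with
    [] -> "empty"
  | [0] | [1] -> "a single bit"
  | [_] -> "a single element"
  | (_ :: _ :: _) as whole -> "a list of " ^ string_of_int (count whole);;

(* Record pattern binding the two fields to a and b. *)
let on_diagonal p =
  match p with
    { x = a; y = b } -> a = b;;

(* Tuple pattern in a function definition. *)
let swap (a, b) = (b, a);;
```

Some results:

- describe [1] returns "a single bit", because the second row is tried before [_].
- describe [4; 5; 6] returns "a list of 3".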
2.7 Expressions
expr ::= ident
| variable
| constant
| ( expr )
| begin expr end
| ( expr : typexpr )
| expr , expr {, expr}
| ncconstr expr
| expr :: expr
| [ expr {; expr} ]
| [| expr {; expr} |]
| { label = expr {; label = expr} }
| expr expr
| prefix-op expr
| expr infix-op expr
| expr . label
| expr . label <- expr
| expr .( expr )
| expr .( expr ) <- expr
| expr & expr
| expr or expr
| if expr then expr [else expr]
| while expr do expr done
| for ident = expr (to | downto) expr do expr done
| expr ; expr
| match expr with simple-matching
| fun multiple-matching
| function simple-matching
| try expr with simple-matching
| let [rec] let-binding {and let-binding} in expr
simple-matching ::= pattern -> expr {| pattern -> expr}
multiple-matching ::= pattern-list -> expr {| pattern-list -> expr}
pattern-list ::= pattern {pattern}
let-binding ::= pattern = expr
| variable pattern-list = expr
prefix-op ::= - | -. | !
infix-op ::= + | - | * | / | mod | +. | -. | *. | /. | ** | @ | ^ | :=
| = | <> | == | != | < | <= | > | >= | <. | <=. | >. | >=.
The table below shows the relative precedences and associativity of
operators and non-closed constructions. The constructions with higher
precedence come first.
---------------------------------------------
|Construction or operator |Associativity |
---------------------------------------------
|! |-- |
|. .( |-- |
|function application |left |
|constructor application |-- |
|- -. (prefix) |-- |
|** |right |
|mod |left |
|* *. / /. |left |
|+ +. - -. |left |
|:: |right |
|@ ^ |right |
|comparisons (= == < etc.) |left |
|not |-- |
|& |left |
|or |left |
|, |-- |
|<- := |right |
|if |-- |
|; |right |
|let match fun function try |-- |
---------------------------------------------
2.7.1 Simple expressions
Constants
Expressions consisting of a constant evaluate to this constant.
Variables
Expressions consisting in a variable evaluate to the value bound to this
variable in the current evaluation environment. The variable can be either a
qualified identifier or a simple identifier. Qualified identifiers always
denote global variables. Simple identifiers denote either a local variable,
if the identifier is locally bound, or a global variable, whose full name is
obtained by qualifying the simple identifier, as described in section 2.2.
Parenthesized expressions
The expressions ( expr ) and begin expr end have the same value as expr. Both
constructs are semantically equivalent, but it is good style to use
begin...end inside control structures:
if ... then begin ... ; ... end else begin ... ; ... end
and (...) for the other grouping situations.
Parenthesized expressions can contain a type constraint, as in
( expr : typexpr ). This constraint forces the type of expr to be compatible
with typexpr.
Function abstraction
The most general form of function abstraction is:
fun pattern_1^1 ... pattern_1^m -> expr1
| ...
| pattern_n^1 ... pattern_n^m -> exprn
Here pattern_i^j denotes the j-th pattern of the i-th row. This expression
evaluates to a functional value with m curried arguments. When this function
is applied to m values v1 ... vm, the values are matched against each pattern
row pattern_i^1 ... pattern_i^m for i from 1 to n. If one of these matchings
succeeds, that is if the value vj matches the pattern pattern_i^j for all
j = 1, ..., m, then the expression expri associated to the selected pattern
row is evaluated, and its value becomes the value of the function
application. The evaluation of expri takes place in an environment enriched
by the bindings performed during the matching.
If several pattern rows match the arguments, the one that occurs first in
the function definition is selected. If none of the pattern rows matches the
arguments, the exception Match_failure is raised.
If the function above is applied to fewer than m arguments, a functional
value is returned that represents the partial application of the function to
the arguments provided. This partial application is a function that, when
applied to the remaining arguments, matches all arguments against the pattern
rows as described above. Matching does not start until all m arguments have
been provided to the function; hence, partial applications of the function to
fewer than m arguments never raise Match_failure.
All pattern rows in the function body must contain the same number of
patterns. A variable must not be bound more than once in one pattern row.
Functions with only one argument can be defined with the function keyword
instead of fun:
function pattern1 -> expr1
| ...
| patternn -> exprn
The function thus defined behaves exactly as described above. The only
difference between the two forms of function definition is how a parsing
ambiguity is resolved. The two forms cconstr pattern (two patterns in a row)
and ncconstr pattern (one pattern) cannot be distinguished syntactically.
Function definitions introduced by fun resolve the ambiguity to the former
form; function definitions introduced by function resolve it to the latter
form (the former form makes no sense in this case).
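The sketch below shows both forms of function abstraction, using only integer operators from the initial environment. It uses function for a one-argument definition by cases, and fun for a curried two-argument function that is then partially applied. The names sign, add and add_ten are hypothetical.

```ocaml
(* "function": one argument, several rows; the first matching row is used. *)
let sign = function
    0 -> 0
  | n -> if n < 0 then -1 else 1;;

(* "fun": a curried function of two arguments with a single pattern row. *)
let add = fun x y -> x + y;;

(* Partial application: add_ten is a function awaiting the second argument. *)
let add_ten = add 10;;
```

Here add_ten 5 evaluates to 15, and sign 0 selects the first row.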
Function application
Function application is denoted by juxtaposition of expressions. The
expression expr1 expr2...exprn evaluates the expressions expr1 to exprn. The
expression expr1 must evaluate to a functional value, which is then applied
to the values of expr2,...,exprn. The order in which the expressions
expr1,...,exprn are evaluated is not specified.
Local definitions
The let and let rec constructs bind variables locally. The construct
let pattern1 = expr1 and...and patternn = exprn in expr
evaluates expr1...exprn in some unspecified order, then matches their values
against the patterns pattern1...patternn. If the matchings succeed, expr is
evaluated in the environment enriched by the bindings performed during
matching, and the value of expr is returned as the value of the whole let
expression. If one of the matchings fails, the exception Match_failure is
raised.
An alternate syntax is provided to bind variables to functional values:
instead of writing
ident = fun pattern1...patternm -> expr
in a let expression, one may write
ident pattern1 ...patternm = expr
Both forms bind ident to the curried function with m arguments and only one
case,
pattern1 ...patternm -> expr.
Recursive definitions of variables are introduced by let rec:
let rec pattern1 = expr1 and...and patternn = exprn in expr
The only difference from the let construct described above is that the
bindings of variables to values performed by the pattern-matching are
considered already performed when the expressions expr1 to exprn are
evaluated. That is, the expressions expr1 to exprn can reference identifiers
that are bound by one of the patterns pattern1,...,patternn, and expect them
to have the same value as in expr, the body of the let rec construct.
The recursive definition is guaranteed to behave as described above if the
expressions expr1 to exprn are function definitions (fun... or function...),
and the patterns pattern1...patternn each consist of a single variable, as in:
let rec ident1 = fun...and...and identn = fun...in expr
This defines ident1...identn as mutually recursive functions local to expr.
The behavior of other forms of let rec definitions is
implementation-dependent.
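The following sketch illustrates three features of local definitions. The function names are hypothetical.

- sum_of_squares uses simultaneous bindings with and, written with the alternate `ident pattern = expr` syntax.
- quot_rem matches a tuple pattern on the left of =.
- parity defines mutually recursive local functions in the form the text above guarantees: single variables bound to fun expressions.

```ocaml
(* "let ... and ... in": independent bindings, then the body. *)
let sum_of_squares a b =
  let sa = a * a and sb = b * b in
  sa + sb;;

(* A tuple pattern on the left of "=" destructures the pair by matching. *)
let quot_rem a b =
  let (q, r) = (a / b, a mod b) in
  string_of_int q ^ " r " ^ string_of_int r;;

(* Mutually recursive local functions, in the guaranteed form:
   each pattern is a single variable bound to a "fun" expression.
   Assumes n >= 0; a negative n would never reach 0. *)
let parity n =
  let rec even = fun k -> if k = 0 then true else odd (k - 1)
  and odd = fun k -> if k = 0 then false else even (k - 1) in
  if even n then "even" else "odd";;
```

For instance, quot_rem 17 5 returns "3 r 2" and parity 7 returns "odd".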
2.7.2 Control constructs
Sequence
The expression expr1 ; expr2 evaluates expr1 first, then expr2, and returns
the value of expr2.
Conditional
The expression if expr1 then expr2 else expr3 evaluates to the value of expr2
if expr1 evaluates to the boolean true, and to the value of expr3 if expr1
evaluates to the boolean false.
The else expr3 part can be omitted, in which case it defaults to else ().
Case expression
The expression
match expr
with pattern1 -> expr1
| ...
| patternn -> exprn
matches the value of expr against the patterns pattern1 to patternn. If the
matching against patterni succeeds, the associated expression expri is
evaluated, and its value becomes the value of the whole match expression. The
evaluation of expri takes place in an environment enriched by the bindings
performed during matching. If several patterns match the value of expr, the
one that occurs first in the match expression is selected. If none of the
patterns match the value of expr, the exception Match_failure is raised.
Boolean operators
The expression expr1 & expr2 evaluates to true if both expr1 and expr2
evaluate to true; otherwise, it evaluates to false. The first component,
expr1, is evaluated first. The second component, expr2, is not evaluated if
the first component evaluates to false. Hence, the expression expr1 & expr2
behaves exactly as
if expr1 then expr2 else false.
The expression expr1 or expr2 evaluates to true if at least one of expr1 and expr2
evaluates to true; otherwise, it evaluates to false. The first component,
expr1, is evaluated first. The second component, expr2, is not evaluated if
the first component evaluates to true. Hence, the expression expr1 or expr2
behaves exactly as
if expr1 then true else expr2.
Loops
The expression while expr1 do expr2 done repeatedly evaluates expr2 while
expr1 evaluates to true. The loop condition expr1 is evaluated and tested at
the beginning of each iteration. The whole while...done expression evaluates
to the unit value ().
The expression for ident = expr1 to expr2 do expr3 done first evaluates the
expressions expr1 and expr2 (the boundaries) into integer values n and p.
Then, the loop body expr3 is repeatedly evaluated in an environment where the
local variable named ident is successively bound to the values n, n+1, ...,
p-1, p. The loop body is never evaluated if n > p.
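Here is a small sketch of the while loop and the ascending for loop. The names sum_to and halvings are hypothetical. The mutable counters use ref, ! and := from the initial environment (! and := are listed in section 2.7.4).

```ocaml
(* Sum of 1, 2, ..., n; the body never runs when n < 1. *)
let sum_to n =
  let total = ref 0 in
  for i = 1 to n do total := !total + i done;
  !total;;

(* Count how many integer halvings bring n down to 1 (or below). *)
let halvings n =
  let k = ref 0 and m = ref n in
  while !m > 1 do m := !m / 2; k := !k + 1 done;
  !k;;
```

sum_to 0 returns 0, since the loop body is never evaluated when the lower bound exceeds the upper bound. halvings 8 returns 3.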
The expression for ident = expr1 downto expr2 do expr3 done first evaluates
the expressions expr1 and expr2 (the boundaries) into integer values n and p.
Then, the loop body expr3 is repeatedly evaluated in an environment where the
local variable named ident is successively bound to the values n, n-1, ...,
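p+1, p. The loop body is never evaluated if n < p.
In both cases, the whole for expression evaluates to the unit value ().
Exception handling
The expression
try expr
with pattern1 -> expr1
| ...
| patternn -> exprn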
evaluates the expression expr and returns its value if the evaluation of expr
does not raise any exception. If the evaluation of expr raises an exception,
the exception value is matched against the patterns pattern1 to patternn. If
the matching against patterni succeeds, the associated expression expri is
evaluated, and its value becomes the value of the whole try expression. The
evaluation of expri takes place in an environment enriched by the bindings
performed during matching. If several patterns match the exception value, the
one that occurs first in the try expression is selected. If none of the
patterns matches the exception value, the exception is raised again,
thereby transparently ``passing through'' the try construct.
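The sketch below illustrates exception handling. It assumes the predefined exceptions Not_found and Failure (the latter raised by failwith) of the initial environment. The names lookup, lookup_or and pass_through are hypothetical. The last function shows an exception that the inner handler does not match being raised again and caught by the outer try.

```ocaml
(* Hypothetical lookup in a list of (key, value) pairs; raises Not_found. *)
let rec lookup key l =
  match l with
    [] -> raise Not_found
  | (k, v) :: rest -> if k = key then v else lookup key rest;;

(* The handler turns Not_found into a default value. *)
let lookup_or key default l =
  try lookup key l with Not_found -> default;;

(* Failure is not matched by the inner handler, so it is raised again
   and caught by the outer one. *)
let pass_through () =
  try (try failwith "boom" with Not_found -> "inner")
  with Failure _ -> "outer";;
```

pass_through () returns "outer".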
2.7.3 Operations on data structures
Products
The expression expr1 ,..., exprn evaluates to the n-tuple of the values of
expressions expr1 to exprn. The evaluation order for the subexpressions is
not specified.
Variants
The expression ncconstr expr evaluates to the variant value whose constructor
is ncconstr, and whose argument is the value of expr.
For lists, some syntactic sugar is provided. The expression expr1 :: expr2
stands for the constructor prefix :: applied to the argument
( expr1 , expr2 ), and therefore evaluates to the list whose head is the
value of expr1 and whose tail is the value of expr2. The expression
[ expr1 ;...; exprn ] is equivalent to expr1 ::...:: exprn :: [], and
therefore evaluates to the list whose elements are the values of expr1 to
exprn.
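The following sketch shows tuple, variant and list construction. The type shape, the function area and the values pair, l1 and l2 are hypothetical. The type definition uses the syntax of section 2.8.1. The two list definitions denote the same list, since [1; 2; 3] is sugar for 1 :: 2 :: 3 :: [].

```ocaml
(* Hypothetical variant type (type definitions are covered in section 2.8.1). *)
type shape = Circle of float | Rect of float * float;;

(* Constructor application builds variant values; matching takes them apart. *)
let area s =
  match s with
    Circle r -> 3.14159 *. r *. r
  | Rect (w, h) -> w *. h;;

(* A tuple, and the two notations for lists: l1 and l2 are equal. *)
let pair = (1, "one");;
let l1 = 1 :: 2 :: 3 :: [];;
let l2 = [1; 2; 3];;
```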
Records
The expression { label1 = expr1 ;...; labeln = exprn } evaluates to the
record value { label1 = v1 ;...; labeln = vn }, where vi is the value of
expri for i = 1, ..., n. The labels label1 to labeln must all belong to the
same record type; all labels belonging to this record type must appear
exactly once in the record expression, though they can appear in any order.
The order in which expr1 to exprn are evaluated is not specified.
The expression expr1 . label evaluates expr1 to a record value, and returns
the value associated to label in this record value.
The expression expr1 . label <- expr2 evaluates expr1 to a record value,
which is then modified in-place by replacing the value associated to label in
this record by the value of expr2. This operation is permitted only if label
has been declared mutable in the definition of the record type. The whole
expression expr1 . label <- expr2 evaluates to the unit value ().
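The sketch below covers record construction, field access and in-place update. It assumes a hypothetical record type counter whose count label is declared mutable (see section 2.8.1). The names clicks and bump are also hypothetical.

```ocaml
(* Hypothetical record type with one immutable and one mutable label. *)
type counter = { name : string; mutable count : int };;

(* Labels may be written in any order, but each exactly once. *)
let clicks = { count = 0; name = "clicks" };;

(* In-place update of the mutable field; the whole expression returns (). *)
let bump c = c.count <- c.count + 1;;
```

After two calls bump clicks, the expression clicks.count evaluates to 2.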
Arrays
The expression [| expr1 ;...; exprn |] evaluates to an n-element array, whose
elements are initialized with the values of expr1 to exprn respectively. The
order in which these expressions are evaluated is unspecified.
The expression expr1 .( expr2 ) is equivalent to the application
vect_item expr1 expr2. In the initial environment, the identifier vect_item
resolves to a built-in function that returns the value of element number
expr2 in the array denoted by expr1. The first element has number 0; the
last element has number n-1, where n is the size of the array. The exception
Invalid_argument is raised if the access is out of bounds.
The expression expr1 .( expr2 ) <- expr3 is equivalent to
vect_assign expr1 expr2 expr3. In the initial environment, the identifier
vect_assign resolves to a built-in function that modifies in-place the array
denoted by expr1, replacing element number expr2 by the value of expr3. The
exception Invalid_argument is raised if the access is out of bounds. The
built-in function returns (). Hence, the whole expression
expr1 .( expr2 ) <- expr3 evaluates to the unit value ().
This behavior of the two constructs expr1 .( expr2 ) and
expr1 .( expr2 ) <- expr3 may change if the meaning of the identifiers
vect_item and vect_assign is changed, either by redefinition or by
modification of the list of opened modules. See the discussion below on
operators.
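The following sketch covers array construction, access and in-place update in the initial environment. The names v and get_or_zero are hypothetical. The out-of-bounds case assumes the bounds checking described above, which raises Invalid_argument.

```ocaml
(* A three-element array; elements are numbered 0, 1, 2. *)
let v = [| 10; 20; 30 |];;

(* Same as vect_assign v 1 25: updates v in place and returns (). *)
v.(1) <- 25;;

(* Out-of-bounds accesses raise Invalid_argument; here it is caught. *)
let get_or_zero a i =
  try a.(i) with Invalid_argument _ -> 0;;
```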
2.7.4 Operators
The operators written infix-op in the grammar above can appear in infix
position (between two expressions). The operators written prefix-op in the
grammar above can appear in prefix position (in front of an expression).
The expression prefix-op expr is interpreted as the application ident expr,
where ident is the identifier associated to the operator prefix-op in the
table below. Similarly, the expression expr1 infix-op expr2 is interpreted
as the application ident expr1 expr2, where ident is the identifier
associated to the operator infix-op in the table below. The identifiers
written ident above are then evaluated following the rules in section 2.7.1.
In the initial environment, they evaluate to built-in functions whose behavior
is described in the table. The behavior of the constructions prefix-op expr
and expr1 infix-op expr2 may change if the meaning of the identifiers
associated to prefix-op or infix-op is changed, either by redefinition of the
identifiers, or by modification of the list of opened modules, through the
#open and #close directives.
---------------------------------------------------------------------------
|Operator |Associated |Behavior in the default environment |
| |identifier | |
---------------------------------------------------------------------------
|+ |prefix + |Integer addition. |
|- (infix) |prefix - |Integer subtraction. |
|- (prefix) |minus |Integer negation. |
|* |prefix * |Integer multiplication. |
|/ |prefix / |Integer division. Raise Division_by_zero if |
| | |second argument is zero. The result is |
| | |unspecified if either argument is negative. |
|mod |prefix mod |Integer modulus. Raise Division_by_zero if |
| | |second argument is zero. The result is |
| | |unspecified if either argument is negative. |
|+. |prefix +. |Floating-point addition. |
|-. (infix) |prefix -. |Floating-point subtraction. |
|-. (prefix) |minus_float |Floating-point negation. |
|*. |prefix *. |Floating-point multiplication. |
|/. |prefix /. |Floating-point division. Raise |
| | |Division_by_zero if second argument is zero. |
|** |prefix ** |Floating-point exponentiation. |
|@ |prefix @ |List concatenation. |
|^ |prefix ^ |String concatenation. |
|! |prefix ! |Dereferencing (return the current contents of |
| | |a reference). |
|:= |prefix := |Reference assignment (update the reference |
| | |given as first argument with the value of the |
| | |second argument). |
|= |prefix = |Structural equality test. |
|<> |prefix <> |Structural inequality test. |
|== |prefix == |Physical equality test. |
|!= |prefix != |Physical inequality test. |
|< |prefix < |Test ``less than'' on integers. |
|<= |prefix <= |Test ``less than or equal'' on integers. |
|> |prefix > |Test ``greater than'' on integers. |
|>= |prefix >= |Test ``greater than or equal'' on integers. |
|<. |prefix <. |Test ``less than'' on floating-point numbers. |
|<=. |prefix <=. |Test ``less than or equal'' on floating-point |
| | |numbers. |
|>. |prefix >. |Test ``greater than'' on floating-point |
| | |numbers. |
|>=. |prefix >=. |Test ``greater than or equal'' on floating- |
| | |point numbers. |
---------------------------------------------------------------------------
The behavior of the +, -, *, /, mod, +., -., *. or /. operators is
unspecified if the result falls outside of the range of representable integers
or floating-point numbers, respectively. See chapter 13 for a more precise
description of the behavior of the operators above.
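Here are a few of the default operators at work. The names r, q, m, p, l and s are hypothetical. The precedence table applies as usual: !r * 2 + 1 parses as ((!r) * 2) + 1.

```ocaml
(* ! reads a reference, := updates it (and returns ()). *)
let r = ref 10;;
r := !r * 2 + 1;;

(* Integer division and remainder on non-negative arguments. *)
let q = 17 / 5 and m = 17 mod 5;;

(* Floating-point exponentiation, list and string concatenation. *)
let p = 2.0 ** 10.0;;
let l = [1; 2] @ [3];;
let s = "Caml" ^ " Light";;
```

After the assignment, !r is 21. The values q and m are 3 and 2.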
2.8 Global definitions
This section describes the constructs that bind global identifiers (value
variables, value constructors, type constructors, record labels).
2.8.1 Type definitions
type-definition ::= type typedef {and typedef}
typedef ::= type-params ident = constr-decl {| constr-decl}
| type-params ident = { label-decl {; label-decl} }
| type-params ident == typexpr
| type-params ident
type-params ::= nothing
| ' ident
| ( ' ident {, ' ident} )
constr-decl ::= ident
| ident of typexpr
label-decl ::= ident : typexpr
| mutable ident : typexpr
Type definitions bind type constructors to data types: either variant
types, record types, type abbreviations, or abstract data types.
Type definitions are introduced by the type keyword, and consist of one or
several simple definitions, possibly mutually recursive, separated by the and
keyword. Each simple definition defines one type constructor.
A simple definition consists of an identifier, possibly preceded by one or
several type parameters, and followed by a data type description. The
identifier is the local name of the type constructor being defined. (The
module name for this type constructor is the name of the module being
compiled.) The optional type parameters are either one type variable ' ident,
for type constructors with one parameter, or a list of type variables
(' ident1,...,' identn), for type constructors with several parameters.
These type parameters can appear in the type expressions of the right-hand
side of the definition.
Variant types
The type definition type-params ident = constr-decl1 |...| constr-decln
defines a variant type. The constructor declarations
constr-decl1,...,constr-decln describe the constructors associated to this
variant type. The constructor declaration ident of typexpr declares the local
name ident (in the module being compiled) as a non-constant constructor, whose
argument has type typexpr. The constructor declaration ident declares the
local name ident (in the module being compiled) as a constant constructor.
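The sketch below defines a parameterized variant type with one constant constructor (Leaf) and one non-constant constructor (Node) whose argument is a tuple. The names tree, size and sample are hypothetical.

```ocaml
(* Hypothetical parameterized variant type: Leaf is a constant
   constructor, Node a non-constant one whose argument is a triple. *)
type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree;;

(* Number of Node constructors in a tree. *)
let rec size t =
  match t with
    Leaf -> 0
  | Node (l, _, r) -> size l + 1 + size r;;

let sample = Node (Node (Leaf, "a", Leaf), "b", Leaf);;
```

size sample evaluates to 2.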
Record types
The type definition type-params ident = { label-decl1 ;...; label-decln }
defines a record type. The label declarations label-decl1,...,label-decln
describe the labels associated to this record type. The label declaration
ident : typexpr declares the local name ident in the module being compiled as
a label, whose argument has type typexpr. The label declaration
mutable ident : typexpr behaves similarly; in addition, it allows physical
modification of the argument to this label.
Type abbreviations
The type definition type-params ident == typexpr defines the type constructor
ident as an abbreviation for the type expression typexpr.
Abstract types
The type definition type-params ident defines ident as an abstract type. When
appearing in a module interface, this definition allows exporting a type
constructor while hiding how it is represented in the module implementation.
2.8.2 Exception definitions
exception-definition ::= exception constr-decl {and constr-decl}
Exception definitions add new constructors to the built-in variant type exn
of exception values. The constructors are declared as for a definition of a
variant type.
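The following sketch defines an exception whose constructor takes an argument. The exception is raised with raise and caught by a try ... with pattern like any other variant value. The names Negative, isqrt and describe_isqrt are hypothetical.

```ocaml
(* Hypothetical exception carrying the offending integer. *)
exception Negative of int;;

(* Integer square root by linear search; raises Negative on bad input. *)
let isqrt n =
  if n < 0 then raise (Negative n) else begin
    let r = ref 0 in
    while (!r + 1) * (!r + 1) <= n do r := !r + 1 done;
    !r
  end;;

(* Exception values are matched like any other variant values. *)
let describe_isqrt n =
  try string_of_int (isqrt n)
  with Negative k -> "negative: " ^ string_of_int k;;
```

For example, describe_isqrt (-3) returns "negative: -3" and isqrt 17 returns 4.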
2.9 Directives
directive ::= # open string
| # close string
| # ident string
Directives control the behavior of the compiler. They apply to the
remainder of the current compilation unit.
The two directives #open and #close modify the list of opened modules, that
the compiler uses to complete unqualified identifiers, as described in
section 2.2. The directive #open string adds the module whose name is given
by the string constant string to the list of opened modules, in first
position. The directive #close string removes the first occurrence of the
module whose name is given by the string constant string from the list of
opened modules.
Implementations can provide other directives, provided they follow the
syntax # ident string, where ident is the name of the directive, and the
string constant string is the argument to the directive. The behavior of
these additional directives is implementation-dependent.
2.10 Module implementations
implementation ::= {impl-phrase ;;}
impl-phrase ::= expr
| value-definition
| type-definition
| exception-definition
| directive
value-definition ::= let [rec] let-binding {and let-binding}
A module implementation consists of a sequence of implementation phrases,
terminated by double semicolons. An implementation phrase is either an
expression, a value definition, a type or exception definition, or a
directive. At run-time, implementation phrases are evaluated sequentially, in
the order in which they appear in the module implementation.
Implementation phrases consisting of an expression are evaluated for their
side-effects.
Value definitions bind global value variables in the same way as a
let...in... expression binds local variables. The expressions are evaluated,
and their values are matched against the left-hand sides of the = signs, as
explained in section 2.7.1. If the matching succeeds, the bindings of
identifiers to values performed during matching are interpreted as bindings to
the global value variables whose local name is the identifier, and whose
module name is the name of the module. If the matching fails, the exception
Match_failure is raised. The scope of these bindings is the phrases that
follow the value definition in the module implementation.
Type and exception definitions introduce type constructors, variant
constructors and record labels as described in sections 2.8.1 and 2.8.2. The
scope of these definitions is the phrases that follow the type or exception
definition in the module implementation. The evaluation of an implementation
phrase consisting of a type or exception definition produces no effect at run-time.
Directives modify the behavior of the compiler on the subsequent phrases of
the module implementation, as described in section 2.9. The evaluation of an
implementation phrase consisting of a directive produces no effect at
run-time. Directives apply only to the module currently being compiled; in
particular, they have no effect on other modules that refer to globals
exported by the module being compiled.
2.11 Module interfaces
interface ::= {intf-phrase ;;}
intf-phrase ::= value-declaration
| type-definition
| exception-definition
| directive
value-declaration ::= value ident : typexpr {and ident : typexpr}
Module interfaces declare the global objects (value variables, type
constructors, variant constructors, record labels) that a module exports, that
is, makes available to other modules. Other modules can refer to these
globals using qualified identifiers or the #open directive, as explained in
section 2.2.
A module interface consists of a sequence of interface phrases, terminated
by double semicolons. An interface phrase is either a value declaration, a
type definition, an exception definition, or a directive.
Value declarations declare global value variables that are exported by the
module implementation, and the types with which they are exported. The module
implementation must define these variables, with types at least as general as
the types declared in the interface. The scope of the bindings for these
global variables extends from the module implementation itself to all modules
that refer to those variables.
Type or exception definitions introduce type constructors, variant
constructors and record labels as described in sections 2.8.1 and 2.8.2.
Exception definitions and type definitions that are not abstract type
declarations also take effect in the module implementation; that is, the type
constructors, variant constructors and record labels they define are
considered bound on entrance to the module implementation, and can be referred
to by the implementation phrases. Type definitions that are not abstract type
declarations must not be redefined in the module implementation. In contrast,
the type constructors that are declared abstract in a module interface must be
defined in the module implementation, with the same names.
Directives modify the behavior of the compiler on the subsequent phrases of
the module interface, as described in section 2.9. Directives apply only to
the interface currently being compiled; in particular, they have no effect on
other modules that refer to globals exported by the interface being compiled.
Chapter 3
Language extensions
This chapter describes the language features that are implemented in Caml
Light, but not described in the Caml Light reference manual. In contrast with
the fairly stable kernel language that is described in the reference manual,
the extensions presented here are still experimental, and may be removed or
changed in the future.
3.1 Streams, parsers, and printers
Caml Light comprises a built-in type for streams (possibly infinite sequences
of elements, that are evaluated on demand), and associated stream expressions,
to build streams, and stream patterns, to destructure streams. Streams and
stream patterns provide a natural approach to the writing of recursive-descent
parsers.
Streams are presented by the following extensions to the syntactic classes
of expressions:
expr ::= ...
| [< >]
| [< stream-component {; stream-component} >]
| function stream-matching
| match expr with stream-matching
stream-component ::= ' expr
| expr
stream-matching ::= stream-pattern -> expr {| stream-pattern -> expr}
stream-pattern ::= [< >]
| [< stream-comp-pat {; stream-comp-pat} >]
stream-comp-pat ::= ' pattern
| expr pattern
| ident
Stream expressions are bracketed by [< and >]. They represent the
concatenation of their components. The component ' expr represents the
one-element stream whose element is the value of expr. The component expr
represents a sub-stream. For instance, if both s and t are streams of
integers, then [<'1; s; t; '2>] is a stream of integers containing the element
1, then the elements of s, then those of t, and finally 2. The empty stream
is denoted by [< >].
Unlike any other kind of expression in the language, stream expressions are
subject to lazy evaluation: the components are not evaluated when the
stream is built, but only when they are accessed during stream matching. The
components are evaluated once, the first time they are accessed; the following
accesses reuse the value computed the first time.
Stream patterns, also bracketed by [< and >], describe initial segments of
streams. In particular, the stream pattern [< >] matches all streams. Stream
pattern components are matched against the corresponding elements of a stream.
The component ' pattern matches the corresponding stream element against the
pattern. The component expr pattern applies the function denoted by expr to
the current stream, then matches the result of the function against pattern.
Finally, the component ident simply binds the identifier to the stream being
matched. (The current implementation limits ident to appear last in a stream
pattern.)
Stream matching proceeds destructively: once a component has been matched,
it is discarded from the stream (by in-place modification).
Stream matching proceeds in two steps: first, a pattern is selected by
matching the stream against the first components of the stream patterns; then,
the following components of the selected pattern are checked against the
stream. If the following components do not match, the exception Parse_error
is raised. There is no backtracking here: stream matching commits to the
pattern selected according to the first element. If none of the first
components of the stream patterns match, the exception Parse_failure is
raised. The Parse_failure exception causes the next alternative to be tried,
if it occurs during the matching of the first element of a stream, before
matching has committed to one pattern.
See Functional programming using Caml Light for a gentler introduction
to streams, and for some examples of their use in writing parsers. A more
formal presentation of streams, and a discussion of alternate semantics, can
be found in Parsers in ML by Michel Mauny and Daniel de Rauglaudre, in the
proceedings of the 1992 ACM conference on Lisp and Functional Programming.
3.2 Guards
Cases of a pattern matching can include guard expressions, which are arbitrary
boolean expressions that must evaluate to true for the match case to be
selected. Guards occur just before the -> token and are introduced by the
when keyword:
match expr
with pattern1 [when cond1] -> expr1
| ...
| patternn [when condn] -> exprn
(Same syntax for the fun, function, and try ...with constructs.) During
matching, if the value of expr matches some pattern patterni which has a
guard condi, then the expression condi is evaluated (in an environment
enriched by the bindings performed during matching). If condi evaluates to
true, then expri is evaluated and its value returned as the result of the
matching, as usual. But if condi evaluates to false, the matching is resumed
against the patterns following patterni.
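Here is a short sketch of guards; classify is a hypothetical name. A row whose guard evaluates to false is skipped, and matching resumes with the next row.

```ocaml
(* Rows with "when" guards; the final wildcard catches everything else. *)
let classify n =
  match n with
    0 -> "zero"
  | k when k < 0 -> "negative"
  | k when k mod 2 = 0 -> "positive even"
  | _ -> "positive odd";;
```

classify 6 returns "positive even". classify 7 fails the second and third guards and falls through to the last row.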
3.3 Range patterns
In patterns, Caml Light recognizes the form `c`..`d` (two character
constants separated by ..) as a shorthand for the pattern
`c` | `c1` | `c2` |...| `cn` | `d`
where c1,c2,...,cn are the characters that occur between c and d in the
ASCII character set. For instance, the pattern `0`..`9` matches all
characters that are digits.
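Range patterns combine with ordinary alternatives. Below is a minimal sketch of a character classifier; the function name char_class is hypothetical.

```caml
(* Classifies a character using range patterns. *)
let char_class = function
    `0`..`9` -> "digit"
  | `a`..`z` | `A`..`Z` -> "letter"
  | _ -> "other";;
```

For instance, char_class `7` should return "digit" and char_class `Q` should return "letter".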
3.4 Recursive definitions of values
Besides let rec definitions of functional values, as described in the
reference manual, Caml Light supports a certain class of recursive definitions
of non-functional values. For instance, the following definition is accepted:
let rec x = 1 :: y and y = 2 :: x;;
and correctly binds x to the cyclic list 1::2::1::2::..., and y to the cyclic
list 2::1::2::1::... Informally, the class of accepted definitions consists of
those definitions where the defined variables occur only inside function
bodies or as a field of a data structure. Moreover, the patterns in the
left-hand sides must be identifiers, nothing more complex.
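Because such a list has no end, it can only be traversed a bounded number of steps. The sketch below takes a finite prefix of the cyclic list from the example above. The names xs, ys and take are hypothetical.

```caml
(* Accepted: xs and ys occur only as fields of :: cells. *)
let rec xs = 1 :: ys and ys = 2 :: xs;;

(* Returns the first n elements of a (possibly cyclic) list. *)
let rec take n l =
  if n <= 0 then [] else
    match l with
      [] -> []
    | x :: rest -> x :: take (n - 1) rest;;

(* By contrast, let rec z = z + 1;; falls outside the accepted class,
   since z is neither inside a function body nor a data structure field. *)
```

Here take 5 xs should yield [1; 2; 1; 2; 1].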
3.5 Local definitions using where
A postfix syntax for local definitions is provided:
expr ::= ...
| expr where [rec] let-binding
The expression expr where let-binding behaves exactly as
let let-binding in expr, and similarly for where rec and let rec.
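The definitions below sketch both forms; the names hypot, sq, sum_to and loop are hypothetical. The first is equivalent to let sq z = z *. z in sqrt (sq x +. sq y). The second uses where rec for a local recursive helper.

```caml
(* Length of the hypotenuse; sq is a local helper defined after use. *)
let hypot x y = sqrt (sq x +. sq y) where sq z = z *. z;;

(* Sum of the integers 1..n, with a local recursive loop. *)
let sum_to n = loop 0 n
  where rec loop acc k = if k = 0 then acc else loop (acc + k) (k - 1);;
```

For instance, hypot 3.0 4.0 should give 5.0 and sum_to 10 should give 55.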
3.6 Mutable variant types
The argument of a value constructor can be declared ``mutable'' when the
variant type is defined:
type foo = A of mutable int
| B of mutable int * int
| ...
This allows in-place modification of the argument part of a constructed value.
Modification is performed by a new kind of expression, written ident <- expr,
where ident is an identifier bound by pattern-matching to the argument of a
mutable constructor, and expr denotes the value that must be stored in place
of that argument. Continuing the example above:
let x = A 1 in
begin match x with A y -> y <- 2 | _ -> () end;
x
returns the value A 2. The notation ident <- expr also works if ident is an
identifier bound by pattern-matching to the value of a mutable field in a
record. For instance,
type bar = {mutable lbl : int};;
let x = {lbl = 1} in
begin match x with {lbl = y} -> y <- 2 end;
x
returns the value {lbl = 2}.
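A mutable constructor argument can serve as an in-place counter. Below is a minimal sketch in which the type counter and the function bump are hypothetical. The identifier n is bound by matching to the mutable argument of Count, so n <- n + 1 updates the value stored in c.

```caml
type counter = Count of mutable int;;

(* Increments the counter in place. *)
let bump c = match c with Count n -> n <- n + 1;;

let c = Count 0;;
bump c; bump c;;
```

After the two calls, c should be the value Count 2.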
3.7 String access
Extra syntactic constructs are provided to access and modify characters in
strings:
expr ::= ...
| expr .[ expr ]
| expr .[ expr ] <- expr
The expression expr1 .[ expr2 ] is equivalent to the application
nth_char expr1 expr2. In the initial environment, the identifier nth_char
resolves to a built-in function that returns the character number expr2 in
the string denoted by expr1. The first element has number 0; the last
element has number n-1, where n is the length of the string. The exception
Invalid_argument is raised if the access is out of bounds.
The expression expr1 .[ expr2 ] <- expr3 is equivalent to
set_nth_char expr1 expr2 expr3. In the initial environment, the identifier
set_nth_char resolves to a built-in function that modifies in-place the string
denoted by expr1, replacing character number expr2 by the value of expr3.
The exception Invalid_argument is raised if the access is out of bounds. The
built-in function returns ().
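The two constructs can be combined as in the sketch below. The function names count_char and upcase_first are hypothetical. The sketch assumes the library functions string_length, sub_string, int_of_char and char_of_int from the standard modules.

upcase_first works on a fresh copy, so the argument string is not modified in place. The value 32 is the distance between lowercase and uppercase letters in ASCII.

```caml
(* Counts the occurrences of character ch in string s, using s.[i]. *)
let count_char ch s =
  let n = ref 0 in
  for i = 0 to string_length s - 1 do
    if s.[i] = ch then incr n
  done;
  !n;;

(* Returns a copy of s whose first character, if it is a lowercase
   ASCII letter, is turned into uppercase with t.[0] <- ... *)
let upcase_first s =
  let t = sub_string s 0 (string_length s) in
  if string_length t > 0 && t.[0] >= `a` && t.[0] <= `z` then
    t.[0] <- char_of_int (int_of_char t.[0] - 32);
  t;;
```

For instance, count_char `l` "hello" should return 2, and upcase_first "caml" should return "Caml".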
3.8 Alternate syntax
The syntax of some constructs has been slightly relaxed:
- An optional ; may terminate a sequence, list expression, or record
expression. For instance, begin e1 ; e2 ; end is syntactically correct
and synonymous with begin e1 ; e2 end.
- Similarly, an optional | may begin a pattern-matching expression. For
instance, function | pat1 -> expr1 |... is syntactically correct and
synonymous with function pat1 -> expr1 |....
- The tokens && and || are recognized as synonyms for & (sequential
``and'') and or (sequential ``or''), respectively.
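The phrases below sketch each relaxation in turn; the names describe, digits, point, origin and greet are hypothetical.

```caml
(* Optional leading | before the first case; && used for &. *)
let describe = function
  | 0 -> "zero"
  | n when n > 0 && n < 10 -> "small"
  | _ -> "other";;

(* Optional trailing ; in a list expression and a record expression. *)
let digits = [1; 2; 3;];;
type point = {x : int; y : int};;
let origin = {x = 0; y = 0;};;

(* Optional trailing ; in a sequence. *)
let greet () = begin print_string "hello"; print_newline (); end;;
```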
3.9 Infix symbols
Sequences of ``operator characters'', such as <=> or !!, are read as a single
token from the infix-symbol or prefix-symbol class:
infix-symbol ::= (= | < | > | @ | ^ | | | & | ~ | + | - | * | / | $ | %) {operator-char}
prefix-symbol ::= (! | ?) {operator-char}
operator-char ::= ! | $ | % | & | * | + | - | . | / | : | ; | < | = | > | ? | @ | ^ | | | ~
Tokens from these two classes generalize the built-in infix and prefix
operators described in chapter 2:
expr ::= ...
| prefix-symbol expr
| expr infix-symbol expr
variable ::= ...
| prefix prefix-symbol
| prefix infix-symbol
No #infix directive (section 3.10) is needed to give infix symbols their infix
status. The precedences and associativities of infix symbols in expressions
are determined by their first character(s): symbols beginning with ** have
highest precedence (exponentiation), followed by symbols beginning with *, /
or % (multiplication), then + and - (addition), then @ and ^ (concatenation),
then all other symbols (comparisons). The updated precedence table for
expressions is shown below. We write ``*...'' to mean ``any infix symbol
starting with *''.
----------------------------------------------------------------------
|Construction or operator |Associativity |
----------------------------------------------------------------------
|!... ?... |-- |
|. .( .[ |-- |
|function application |left |
|constructor application |-- |
|- -. (prefix) |-- |
|**... |right |
|*... /... %... mod |left |
|+... -... |left |
|:: |right |
|@... ^... |right |
|comparisons (= == < etc.), all other infix symbols|left |
|not |-- |
|& && |left |
|or || |left |
|, |-- |
|<- := |right |
|if |-- |
|; |right |
|let match fun function try |-- |
----------------------------------------------------------------------
Some infix and prefix symbols are predefined in the default environment (see
chapters 2 and 13 for a description of their behavior). The others are
initially unbound and must be bound before use, with a
let prefix infix-symbol = expr or let prefix prefix-symbol = expr binding.
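As a sketch of how the first-character rule plays out, the two symbols below are bound for componentwise operations on integer pairs. The symbols +| and *| are hypothetical and not predefined.

Since *| starts with *, it binds tighter than +|, which starts with +. No #infix directive is needed for either.

```caml
(* Componentwise addition of integer pairs: additive precedence. *)
let prefix +| = fun (a, b) (c, d) -> (a + c, b + d);;

(* Scaling of an integer pair: multiplicative precedence. *)
let prefix *| = fun k (a, b) -> (k * a, k * b);;

(* Parsed as (2 *| (1, 1)) +| (3, 4). *)
let v = 2 *| (1, 1) +| (3, 4);;
```

Given the precedence rules above, v should be (5, 6).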
3.10 Directives
In addition to the standard #open and #close directives, Caml Light provides
three additional directives.
#infix "id"
Change the lexical status of the identifier id: in the remainder of the
compilation unit, id is recognized as an infix operator, just like +.
The notation prefix id can be used to refer to the identifier id itself.
Expressions of the form expr1 id expr2 are parsed as the application
prefix id expr1 expr2. The argument to the #infix directive must be an
identifier, that is, a sequence of letters, digits and underscores
starting with a letter; otherwise, the #infix declaration has no effect.
Example:
#infix "union";;
let prefix union = fun x y -> ... ;;
[1,2] union [3,4];;
#uninfix "id"
Remove the infix status attached to the identifier id by a previous
#infix "id" directive.
#directory "dir-name"
Add the named directory to the path of directories searched for compiled
module interface files. This is equivalent to the -I command-line option
to the batch compiler and the toplevel system.
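The #infix and #uninfix directives described above can be exercised as follows. This is a minimal sketch that fills in the elided definition of union with one possible list union. It assumes the standard list functions list_it (right fold: list_it f [a1; ...; an] b is f a1 (... (f an b))) and mem.

```caml
#infix "union";;

(* Elements of l1 not already present, consed in front of l2. *)
let prefix union = fun l1 l2 ->
  list_it (fun x acc -> if mem x acc then acc else x :: acc) l1 l2;;

let u = [1; 2] union [2; 3];;

#uninfix "union";;

(* union is now an ordinary identifier again. *)
let u2 = union [4] [5];;
```

With this definition, u should be [1; 2; 3] and u2 should be [4; 5].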
Part III
The Caml Light commands
Chapter 4
Batch compilation (camlc)
This chapter describes how Caml Light programs can be compiled
non-interactively, and turned into standalone executable files. This is
achieved by the command camlc, which compiles and links Caml Light source
files.
Mac: This command is not a standalone Macintosh application. To run camlc,
you need the Macintosh Programmer's Workshop (MPW) programming
environment. The programs generated by camlc are also MPW tools, not
standalone Macintosh applications.
4.1 Overview of the compiler
The camlc command has a command-line interface similar to that of most C
compilers. It accepts several types of arguments: source files for module
implementations; source files for module interfaces; and compiled module
implementations.
- Arguments ending in .mli are taken to be source files for module
interfaces. Module interfaces declare exported global identifiers,
define public data types, and so on. From the file x.mli, the camlc
compiler produces a compiled interface in the file x.zi.
- Arguments ending in .ml are taken to be source files for module
implementations. Module implementations bind global identifiers to
values, define private data types, and contain expressions to be
evaluated for their side-effects. From the file x.ml, the camlc compiler
produces compiled object code in the file x.zo. If the interface file
x.mli exists, the module implementation x.ml is checked against the
corresponding compiled interface x.zi, which is assumed to exist. If no
interface x.mli is provided, the compilation of x.ml produces a compiled
interface file x.zi in addition to the compiled object code file x.zo.
The file x.zi produced corresponds to an interface that exports
everything that is defined in the implementation x.ml.
- Arguments ending in .zo are taken to be compiled object code. These
files are linked together, along with the object code files obtained by
compiling .ml arguments (if any), and the Caml Light standard library, to
produce a standalone executable program. The order in which .zo and .ml
arguments are presented on the command line is relevant: global
identifiers are initialized in that order at run-time, and it is a
link-time error to use a global identifier before having initialized it.
Hence, a given x.zo file must come before all .zo files that refer to
identifiers defined in the file x.zo.
The output of the linking phase is a file containing compiled code that can
be executed by the Caml Light runtime system: the command named camlrun. If
caml.out is the name of the file produced by the linking phase, the command
camlrun caml.out arg1 arg2 ... argn
executes the compiled code contained in caml.out, passing it as arguments the
character strings arg1 to argn. (See chapter 6 for more details.)
Unix: On most Unix systems, the file produced by the linking phase can be run
directly, as in:
./caml.out arg1 arg2 ... argn
The produced file has the executable bit set, and it manages to launch
the bytecode interpreter by itself.
PC: The output file produced by the linking phase is directly executable,
provided it is given extension .EXE. Hence, if the output file is named
caml_out.exe, you can execute it with the command
caml_out arg1 arg2 ... argn
Actually, the produced file caml_out.exe is a tiny executable file
prepended to the bytecode file. The executable simply runs the camlrun
runtime system on the remainder of the file. (As a consequence, this
is not a standalone executable: it still requires camlrun.exe to
reside in one of the directories in the path.)
4.2 Options
The following command-line options are recognized by camlc.
-c Compile only. Suppress the linking phase of the compilation. Source
code files are turned into compiled files, but no executable file is
produced. This option is useful to compile modules separately.
-ccopt option
Pass the given option to the C compiler and linker, when linking in
``custom runtime'' mode (see the -custom option). For instance, -ccopt
-Ldir causes the C linker to search for C libraries in directory dir.
-custom
Link in ``custom runtime'' mode. In the default linking mode, the linker
produces bytecode that is intended to be executed with the shared runtime
system, camlrun. In the custom runtime mode, the linker produces an
output file that contains both the runtime system and the bytecode for
the program. The resulting file is considerably larger, but it can be
executed directly, even if the camlrun command is not installed.
Moreover, the ``custom runtime'' mode enables linking Caml Light code
with user-defined C functions, as described in chapter 12.
Chapter 4. Batch compilation (camlc) 39
Unix: Never strip an executable produced with the -custom option.
PC: This option requires the DJGPP port of the GNU C compiler to be
installed.
-g Cause the compiler to produce additional debugging information. During
the linking phase, this option adds information at the end of the
executable bytecode file produced. This information is required by the
debugger camldebug and also by the catch-all exception handler from the
standard library module printexc.
During the compilation of an implementation file (.ml file), when the -g
option is set, the compiler adds debugging information to the .zo file.
It also writes a .zix file that describes the full interface of the .ml
file, that is, all types and values defined in the .ml file, including
those that are local to the .ml file (i.e. not declared in the .mli
interface file). Used in conjunction with the -g option to the toplevel
system (chapter 5), the .zix file gives access to the local values of the
module, making it possible to print or ``trace'' them. The .zix file is
not produced if the implementation file has no explicit interface, since,
in this case, the module has no local values.
-i Cause the compiler to print the declared types, exceptions, and global
variables (with their inferred types) when compiling an implementation
(.ml file). This can be useful to check the types inferred by the
compiler. Also, since the output follows the syntax of module
interfaces, it can help in writing an explicit interface (.mli file) for
a file: just redirect the standard output of the compiler to a .mli
file, and edit that file to remove all declarations of unexported
globals.
-I directory
Add the given directory to the list of directories searched for compiled
interface files (.zi) and compiled object code files (.zo). By default,
the current directory is searched first, then the standard library
directory. Directories added with -I are searched after the current
directory, but before the standard library directory. When several
directories are added with several -I options on the command line, these
directories are searched from right to left (the rightmost directory is
searched first, the leftmost is searched last). (Directories can also be
added to the search path from inside the programs with the #directory
directive; see chapter 3.)
-lang language-code
Translate the compiler messages to the specified language. The
language-code is fr for French, es for Spanish, de for German, ... (See
the file camlmsgs.txt in the Caml Light standard library directory for a
list of available languages.) When an unknown language is specified, or
no translation is available for a message, American English is used by
default.
-o exec-file
Specify the name of the output file produced by the linker.
Unix: The default output name is a.out, in keeping with the tradition.
PC: The default output name is caml_out.exe.
Mac: The default output name is Caml.Out.
-O module-set
Specify which set of standard modules is to be implicitly ``opened'' at
the beginning of a compilation. There are three module sets currently
available:
cautious
provides the standard operations on integers, floating-point numbers,
characters, strings, arrays, ..., as well as exception handling,
basic input/output, etc. Operations from the cautious set perform
range and bound checking on string and array operations, as well as
various sanity checks on their arguments.
fast
provides the same operations as the cautious set, but without sanity
checks on their arguments. Programs compiled with -O fast are
therefore slightly faster, but unsafe.
none
suppresses all automatic opening of modules. Compilation starts in
an almost empty environment. This option is not of general use,
except to compile the standard library itself.
The default compilation mode is -O cautious. See chapter 13 for a
complete listing of the modules in the cautious and fast sets.
-p Compile and link in profiling mode. See the description of the profiler
camlpro in chapter 10.
-v Print the version number of the compiler.
-W Print extra warning messages for the following events:
- A #open directive is useless (no identifier in the opened module is
ever referenced).
- A variable name in a pattern matching is capitalized (this often
corresponds to a misspelled constant constructor).
Unix: The following environment variable is also consulted:
LANG
When set, controls which language is used to print the compiler
messages (see the -lang command-line option).
PC: The following option is also supported:
@response-file
Process the files whose names are listed in file response-file,
just as if these names appeared on the command line. File names in
response-file are separated by blanks (spaces, tabs, newlines).
This option makes it possible to overcome silly limitations on the
length of the command line.
The following environment variables are also consulted:
CAMLLIB
Contains the path to the standard library directory.
LANG
When set, controls which language is used to print the compiler
messages (see the -lang command-line option).
4.3 Modules and the file system
This short section is intended to clarify the relationship between the names
of the modules and the names of the files that contain their compiled
interface and compiled implementation.
The compiler always derives the name of the compiled module by taking the
base name of the source file (.ml or .mli file). That is, it strips the
leading directory name, if any, as well as the .ml or .mli suffix. The
produced .zi and .zo files have the same base name as the source file; hence,
the compiled files produced by the compiler always have their base name equal
to the name of the module they describe (for .zi files) or implement (for .zo
files).
For compiled interface files (.zi files), this invariant must be preserved
at all times, since the compiler relies on it to load the compiled interface
file for the modules that are used from the module being compiled. Hence, it
is risky and generally incorrect to rename .zi files. It is admissible to
move them to another directory, if their base name is preserved, and the
correct -I options are given to the compiler.
Compiled bytecode files (.zo files), on the other hand, can be freely
renamed once created. This is because (1) .zo files contain the true name of
the module they define, so there is no need to derive that name from the file
name; and (2) the linker never attempts to find by itself the .zo file that
implements a module of a given name: it relies on the user providing the list
of .zo files by hand.
4.4 Common errors
This section describes and explains the most frequently encountered error
messages.
Cannot find file filename
The named file could not be found in the current directory, nor in the
directories of the search path. The filename is either a compiled
interface file (.zi file), or a compiled bytecode file (.zo file). If
filename has the format mod.zi, this means you are trying to compile a
file that references identifiers from module mod, but you have not yet
compiled an interface for module mod. Fix: compile mod.mli or mod.ml
first, to create the compiled interface mod.zi.
If filename has the format mod.zo, this means you are trying to link a
bytecode object file that does not exist yet. Fix: compile mod.ml
first.
If your program spans several directories, this error can also appear
because you haven't specified the directories to look into. Fix: add
the correct -I options to the command line.
Corrupted compiled interface file filename
The compiler produces this error when it tries to read a compiled
interface file (.zi file) that has the wrong structure. This means
something went wrong when this .zi file was written: the disk was full,
the compiler was interrupted in the middle of the file creation, and so
on. This error can also appear if a .zi file is modified after its
creation by the compiler. Fix: remove the corrupted .zi file, and
rebuild it.
This expression has type t1, but is used with type t2
This is by far the most common type error in programs. Type t1 is the
type inferred for the expression (the part of the program that is
displayed in the error message), by looking at the expression itself.
Type t2 is the type expected by the context of the expression; it is
deduced by looking at how the value of this expression is used in the
rest of the program. If the two types t1 and t2 are not compatible, then
the error above is produced.
In some cases, it is hard to understand why the two types t1 and t2 are
incompatible. For instance, the compiler can report that ``expression of
type foo cannot be used with type foo'', and it really seems that the two
types foo are compatible. This is not always true. Two type
constructors can have the same name, but actually represent different
types. This can happen if a type constructor is redefined. Example:
type foo = A | B;;
let f = function A -> 0 | B -> 1;;
type foo = C | D;;
f C;;
This results in the error message ``expression C of type foo cannot be
used with type foo''.
Incompatible types with the same names can also appear when a module is
changed and recompiled, but some of its clients are not recompiled.
That's because type constructors in .zi files are not represented by
their name (that would not suffice to identify them, because of type
redefinitions), but by unique stamps that are assigned when the type
declaration is compiled. Consider the three modules:
mod1.ml: type t = A | B;;
let f = function A -> 0 | B -> 1;;
mod2.ml: let g x = 1 + mod1__f(x);;
mod3.ml: mod2__g mod1__A;;
Now, assume mod1.ml is changed and recompiled, but mod2.ml is not
recompiled. The recompilation of mod1.ml can change the stamp assigned
to type t. But the interface mod2.zi will still use the old stamp for
mod1__t in the type of mod2__g. Hence, when compiling mod3.ml, the
system complains that the argument type of mod2__g (that is, mod1__t with
the old stamp) is not compatible with the type of mod1__A (that is,
mod1__t with the new stamp). Fix: use make or a similar tool to ensure
that all clients of a module mod are recompiled when the interface mod.zi
changes. To check that the Makefile contains the right dependencies,
remove all .zi files and rebuild the whole program; if no ``Cannot find
file'' error appears, you're all set.
The type inferred for name, that is, t, contains non-generalizable type variables
Type variables ('a, 'b, ...) in a type t can be in either of two states:
generalized (which means that the type t is valid for all possible
instantiations of the variables) and not generalized (which means that
the type t is valid only for one instantiation of the variables). In a
let binding let name = expr, the type-checker normally generalizes as
many type variables as possible in the type of expr. However, this leads
to unsoundness (a well-typed program can crash) in conjunction with
polymorphic mutable data structures. To avoid this, generalization is
performed at let bindings only if the bound expression expr belongs to
the class of ``syntactic values'', which includes constants, identifiers,
functions, tuples of syntactic values, etc. In all other cases (for
instance, expr is a function application), a polymorphic mutable structure
could have been created and generalization is therefore turned off.
Non-generalized type variables in a type cause no difficulties inside a
given compilation unit (the contents of a .ml file, or an interactive
session), but they cannot be allowed in types written in a .zi compiled
interface file, because they could be used inconsistently in other
compilation units. Therefore, the compiler flags an error when a .ml
implementation without a .mli interface defines a global variable name
whose type contains non-generalized type variables. There are two
solutions to this problem:
- Add a type constraint or a .mli interface to give a monomorphic type
(without type variables) to name. For instance, instead of writing
let sort_int_list = sort (prefix <);;
(* inferred type 'a list -> 'a list, with 'a not generalized *)
write
let sort_int_list = (sort (prefix <) : int list -> int list);;
- If you really need name to have a polymorphic type, turn its defining
expression into a function by adding an extra parameter. For
instance, instead of writing
let map_length = map vect_length;;
(* inferred type 'a vect list -> int list, with 'a not generalized *)
write
let map_length lv = map vect_length lv;;
mod__name is referenced before being defined
This error appears when trying to link an incomplete or incorrectly
ordered set of files. Either you have forgotten to provide an
implementation for the module named mod on the command line (typically,
the file named mod.zo, or a library containing that file). Fix: add the
missing .ml or .zo file to the command line. Or, you have provided an
implementation for the module named mod, but it comes too late on the
command line: the implementation of mod must come before all bytecode
object files that reference one of the global variables defined in module
mod. Fix: change the order of .ml and .zo files on the command line.
Of course, you will always encounter this error if you have mutually
recursive functions across modules. That is, function mod1__f calls
function mod2__g, and function mod2__g calls function mod1__f. In this
case, no matter what permutations you perform on the command line, the
program will be rejected at link-time. Fixes:
- Put f and g in the same module.
- Parameterize one function by the other. That is, instead of having
mod1.ml: let f x = ... mod2__g ... ;;
mod2.ml: let g y = ... mod1__f ... ;;
define
mod1.ml: let f g x = ... g ... ;;
mod2.ml: let rec g y = ... mod1__f g ... ;;
and link mod1 before mod2.
- Use a reference to hold one of the two functions, as in the following,
where <type> stands for the type of g:
mod1.ml: let forward_g =
ref((fun x -> failwith "forward_g") : <type>);;
let f x = ... !forward_g ... ;;
mod2.ml: let g y = ... mod1__f ... ;;
mod1__forward_g := g;;
Unavailable C primitive f
This error appears when trying to link code that calls external functions
written in C in ``default runtime'' mode. As explained in chapter 12,
such code must be linked in ``custom runtime'' mode. Fix: add the
-custom option, as well as the (native code) libraries and (native code)
object files that implement the required external functions.
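The reference-based fix for mutually recursive functions across modules is described above under ``mod__name is referenced before being defined''. It can be tried out in a single file, as in the sketch below.

The sketch makes the elided parts concrete with a hypothetical parity test, where f tests for evenness and g for oddness. The first half stands for mod1.ml and the second half for mod2.ml. In a real two-module program, the last phrase would read mod1__forward_g := g;; inside mod2.ml.

```caml
(* Stands for mod1.ml: f needs g, which is not defined yet, so it
   calls it through a reference holding a dummy function of g's type. *)
let forward_g = ref ((fun n -> failwith "forward_g") : int -> bool);;
let f n = if n = 0 then true else !forward_g (n - 1);;

(* Stands for mod2.ml: g calls f directly, then fills in the reference
   before any call to f needs it. *)
let g n = if n = 0 then false else f (n - 1);;
forward_g := g;;
```

Once the assignment has run, f 4 should return true and f 3 should return false.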
Chapter 5
The toplevel system (camllight)
This chapter describes the toplevel system for Caml Light, which permits
interactive use of the Caml Light system through a read-eval-print loop. In
this mode, the system repeatedly reads Caml Light phrases from the input, then
typechecks, compiles and evaluates them, then prints the inferred type and
result value, if any. The system prints a # (sharp) prompt before reading
each phrase. A phrase can span several lines. Phrases are delimited by ;;
(the final double-semicolon).
From the standpoint of the module system, all phrases entered at toplevel
are treated as the implementation of a module named top. Hence, all toplevel
definitions are entered in the module top.
Unix: The toplevel system is started by the command camllight. Phrases are
read on standard input, results are printed on standard output, errors
on standard error. End-of-file on standard input terminates camllight
(see also the quit system function below).
The toplevel system does not perform line editing, but it can easily be
used in conjunction with an external line editor such as fep; just run
fep -emacs camllight or fep -vi camllight. Another option is to use
camllight under GNU Emacs, which gives the full editing power of Emacs
(see the directory contrib/camlmode in the distribution).
At any point, the parsing, compilation or evaluation of the current
phrase can be interrupted by pressing ctrl-C (or, more precisely, by
sending the intr signal to the camllight process). This goes back to
the # prompt.
Mac: The toplevel system is presented as the standalone Macintosh
application Caml Light. This application does not require the
Macintosh Programmer's Workshop to run.
Once launched from the Finder, the application opens two windows,
``Caml Light Input'' and ``Caml Light Output''. Phrases are entered in
the ``Caml Light Input'' window. The ``Caml Light Output'' window
displays a copy of the input phrases as they are processed by the Caml
Light toplevel, interspersed with the toplevel responses. The
``Return'' key sends the contents of the Input window to the Caml Light
toplevel. The ``Enter'' key inserts a newline without sending the
contents of the Input window. (This can be configured with the
``Preferences'' menu item.)
The contents of the input window can be edited at all times, with the
standard Macintosh interface. A history of previously entered phrases
is maintained, and can be accessed with the ``Previous entry''
(command-P) and ``Next entry'' (command-N) menu items.
To quit the Caml Light application, either select ``Quit'' from the
``Files'' menu, or use the quit function described below.
At any point, the parsing, compilation or evaluation of the current
phrase can be interrupted by pressing ``command-period'', or by
selecting the item ``Interrupt Caml Light'' in the ``Caml Light'' menu.
This goes back to the # prompt.
PC: The toplevel system is presented as a Windows application named
Camlwin.exe. It should be launched from the Windows file manager or
program manager.
The ``Terminal'' window is split in two panes. Phrases are entered
and edited in the bottom pane. The top pane displays a copy of the
input phrases as they are processed by the Caml Light toplevel,
interspersed with the toplevel responses. The ``Return'' key sends the
contents of the bottom pane to the Caml Light toplevel. The ``Enter''
key inserts a newline without sending the contents of the bottom pane.
(This can be configured with the ``Preferences'' menu item.)
The contents of the bottom pane can be edited at all times, with the
standard Windows interface. A history of previously entered phrases
is maintained and displayed in a separate window.
To quit the Camlwin application, either select ``Quit'' from the
``File'' menu, or use the quit function described below.
At any point, the parsing, compilation or evaluation of the current
phrase can be interrupted by selecting the ``Interrupt Caml Light''
menu item. This goes back to the # prompt.
A text-only version of the toplevel system is available under the name
caml.exe. It runs under MSDOS as well as under Windows in a DOS
window. No editing facilities are provided.
5.1 Options
The following command-line options are recognized by the caml or camllight
commands.
-g Start the toplevel system in debugging mode. This mode gives access to
values and types that are local to a module, that is, not exported by the
interface of the module. When debugging mode is off, these local objects
are not accessible (attempts to access them produce an ``Unbound
identifier'' error). When debugging mode is on, these objects become
visible, just like the objects that are exported in the module interface.
In particular, values of abstract types are printed using their concrete
representations, and the functions local to a module can be ``traced''
(see the trace function in section 5.2). This applies only to the
modules that have been compiled in debugging mode (either by the batch
compiler with the -g option, or by the toplevel system in debugging
mode), that is, those modules that have an associated .zix file.
-I directory
Add the given directory to the list of directories searched for compiled
interface files (.zi) and compiled object code files (.zo). By default,
the current directory is searched first, then the standard library
directory. Directories added with -I are searched after the current
directory, but before the standard library directory. When several
directories are added with several -I options on the command line, these
directories are searched from right to left (the rightmost directory is
searched first, the leftmost is searched last). Directories can also be
added to the search path once the toplevel is running with the #directory
directive; see chapter 3.
-lang language-code
Translate the toplevel messages to the specified language. The
language-code is fr for French, es for Spanish, de for German, ... (See
the file camlmsgs.txt in the Caml Light standard library directory for a
list of available languages.) When an unknown language is specified, or
no translation is available for a message, American English is used by
default.
-O module-set
Specify which set of standard modules is to be implicitly ``opened'' when
the toplevel starts. There are three module sets currently available:
cautious
provides the standard operations on integers, floating-point numbers,
characters, strings, arrays, ..., as well as exception handling,
basic input/output, ... Operations from the cautious set perform range
and bound checking on string and vector operations, as well as
various sanity checks on their arguments.
fast
provides the same operations as the cautious set, but without sanity
checks on their arguments. Programs compiled with -O fast are
therefore slightly faster, but unsafe.
none
suppresses all automatic opening of modules. Compilation starts in
an almost empty environment. This option is not of general use.
The default compilation mode is -O cautious. See chapter 13 for a
complete listing of the modules in the cautious and fast sets.
Unix: The following environment variables are also consulted:
LANG
When set, controls which language is used to print the compiler
messages (see the -lang command-line option).
LC_CTYPE
If set to iso_8859_1, accented characters (from the ISO Latin-1
character set) in string and character literals are printed as is;
otherwise, they are printed as decimal escape sequences (\ddd).
5.2 Toplevel control functions
The standard library module toplevel, opened by default when the toplevel
system is launched, provides a number of functions that control the toplevel
behavior, load files in memory, and trace program execution.
value quit : unit -> unit
Exit the toplevel loop and terminate the camllight command.
value include : string -> unit
Read, compile and execute source phrases from the given file. The .ml
extension is automatically added to the file name, if not present. This
is textual inclusion: phrases are processed just as if they were typed
on standard input. In particular, global identifiers defined by these
phrases are entered in the module named top, not in a new module.
value load : string -> unit
Load in memory the source code for a module implementation. Read,
compile and execute source phrases from the given file. The .ml
extension is automatically added if not present. The load function
behaves much like include, except that a new module is created, whose
name is the base name of the source file. Global identifiers defined in the
source file are entered in this module, instead of the top module as in
the case of include. For instance, assuming file foo.ml contains the
single phrase
let bar = 1;;
executing load "foo" defines the identifier foo__bar with value 1.
Caveat: the loaded module is not automatically opened: the identifier
bar does not automatically complete to foo__bar. To achieve this, you
must execute the directive #open "foo" afterwards.
value compile : string -> unit
Compile the source code for a module implementation or interface (.ml or
.mli file). Compilation proceeds as with the batch compiler, and
produces the same results as camlc -c. If the toplevel system is in
debugging mode (option -g or function debug_mode below), the compilation
is also performed in debugging mode, as when giving the -g option to the
batch compiler. The result of the compilation is left in files (.zo,
.zi, .zix). The compiled code is not loaded in memory. Use load_object
to load a .zo file produced by compile.
value load_object : string -> unit
Load in memory the compiled bytecode contained in the given file. The
.zo extension is automatically added to the file name, if not present.
The bytecode file must have been produced either by the standalone compiler
camlc or by the compile function. Global identifiers defined in the file
being loaded are entered in their own module, not in the top module, just
as with the load function.
value trace : string -> unit
After the execution of trace "foo", all calls to the global function
named foo will be ``traced''. That is, the argument and the result are
displayed for each call, as well as the exceptions escaping out of foo,
either raised by foo itself, or raised by one of the functions called
from foo. If foo is a curried function, each argument is printed as it
is passed to the function. Only functions implemented in ML can be
traced; system primitives such as string_length or user-supplied C
functions cannot.
value untrace : string -> unit
Executing untrace "foo" stops all tracing over the global function named
foo.
value verbose_mode: bool -> unit
verbose_mode true causes the compile function to print the inferred types
and other information. verbose_mode false reverts to the default silent
behavior.
value install_printer : string -> unit
install_printer "printername" registers the function named printername as
a printer for objects whose types match its argument type. That is, the
toplevel loop will call printername when it has such an object to print.
The printing function printername must use the format library module to
produce its output, otherwise the output of printername will not be
correctly located in the values printed by the toplevel loop.
value remove_printer : string -> unit
remove_printer "printername" removes the function named printername from
the table of toplevel printers.
value set_print_depth : int -> unit
set_print_depth n limits the printing of values to a maximal depth of n.
The parts of values whose depth exceeds n are printed as ... (ellipsis).
value set_print_length : int -> unit
set_print_length n limits the number of value nodes printed to at most n.
Remaining parts of values are printed as ... (ellipsis).
value debug_mode: bool -> unit
Set whether extended module interfaces must be used (debug_mode true) or
not (debug_mode false). Extended module interfaces are .zix files that
describe the actual implementation of a module, including private types
and variables. They are generated when compiling with camlc -g, or with
the compile function above when debug_mode is true. When debug_mode is
true, toplevel phrases can refer to private types and variables of
modules, and private functions can be traced with trace. Setting
debug_mode true is equivalent to starting the toplevel with the -g
option.
value cd : string -> unit
Change the current working directory.
value directory : string -> unit
Add the given directory to the search path for files. Same behavior as
the -I option or the #directory directive.
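The toplevel session below is a minimal sketch of several of these functions used together. The function fib, the type point with its printer print_point, and the helper interval are hypothetical names chosen for illustration; the printer also assumes that the format module provides print_string and print_int. The exact layout of the trace output and of elided values is not shown, since it is produced by the toplevel itself.

```ocaml
(* 1. Tracing a toplevel function (fib is a hypothetical example) *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2);;
trace "fib";;          (* each call now displays its argument and its result *)
fib 3;;                (* the nested calls are traced; the final value is 2 *)
untrace "fib";;        (* back to untraced evaluation *)

(* 2. A user-defined printer (point and print_point are hypothetical) *)
type point = { x : int; y : int };;
let print_point p =
  (* output goes through the format module, as install_printer requires *)
  format__print_string "<";
  format__print_int p.x;
  format__print_string ",";
  format__print_int p.y;
  format__print_string ">";;
install_printer "print_point";;
{ x = 1; y = 2 };;     (* now displayed by print_point *)
remove_printer "print_point";;

(* 3. Limiting the size of printed values *)
let rec interval a b = if a > b then [] else a :: interval (a + 1) b;;
set_print_length 10;;
interval 1 100;;       (* only part of the list is shown, then ... *)
```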
5.3 The toplevel and the module system
Toplevel phrases can refer to identifiers defined in modules other than the
top module with the same mechanisms as for separately compiled modules:
either by using qualified identifiers (modulename__localname), or by using
unqualified identifiers that are automatically completed by searching the list
of opened modules. (See section 2.2.) The modules opened at startup are
given in the documentation for the standard library. Other modules can be
opened with the #open directive.
However, before referencing a global variable from a module other than the
top module, a definition of that global variable must have been entered in
memory. At start-up, the toplevel system contains the definitions for all the
identifiers in the standard library. The definitions for user modules can be
entered with the load or load_object functions described above. Referencing a
global variable for which no definition has been provided by load or
load_object results in the error ``Identifier foo__bar is referenced before
being defined''. Since this is a tricky point, let us consider some examples.
1. The library function sub_string is defined in module string. This module
is part of the standard library, and is one of the modules automatically
opened at start-up. Hence, both phrases
sub_string "qwerty" 1 3;;
string__sub_string "qwerty" 1 3;;
are correct, without having to use #open, load, or load_object.
2. The library function printf is defined in module printf. This module is
part of the standard library, but it is not automatically opened at
start-up. Hence, the phrase
printf__printf "%s %s" "hello" "world";;
is correctly executed, while
printf "%s %s" "hello" "world";;
causes the error ``Variable printf is unbound'', since none of the
currently opened modules define a global with local name printf.
However,
#open "printf";;
printf "%s %s" "hello" "world";;
executes correctly.
3. Assume the file foo.ml resides in the current directory, and contains the
single phrase
let x = 1;;
When the toplevel starts, references to x will cause the error ``Variable
x is unbound''. References to foo__x will cause the error ``Cannot find
file foo.zi'', since the typechecker is attempting to load the compiled
interface for module foo in order to find the type of x. To load in
memory the module foo, just do:
load "foo";;
Then, references to foo__x typecheck and evaluate correctly. Since load
does not open the module it loads, references to x will still fail with
the error ``Variable x is unbound''. You will have to do
#open "foo";;
explicitly, for x to complete automatically into foo__x.
4. Finally, assume the file foo.ml above has been previously compiled with
the camlc -c command. The current directory therefore contains a
compiled interface foo.zi, claiming that foo__x is a global variable with
type int, and a compiled bytecode file foo.zo, defining foo__x to have
the value 1. When the toplevel starts, references to foo__x will cause
the error ``foo__x is referenced before being defined''. In contrast
with case 3 above, the typechecker has succeeded in finding the compiled
interface for module foo. But execution cannot proceed, because no
definition for foo__x has been entered in memory. To do so, execute:
load_object "foo";;
This loads the file foo.zo in memory, therefore defining foo__x. Then,
references to foo__x evaluate correctly. As in case 3 above, references
to x still fail, because load_object does not open the module it loads.
Again, you will have to do
#open "foo";;
explicitly, for x to complete automatically into foo__x.
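The compile function of section 5.2 makes it possible to reach the situation of case 4 without leaving the toplevel. A minimal sketch, assuming the same file foo.ml (containing the single phrase let x = 1;;) in the current directory:

```ocaml
compile "foo.ml";;     (* like camlc -c foo.ml: writes foo.zi and foo.zo, loads nothing *)
load_object "foo";;    (* loads foo.zo, so foo__x is now defined in memory *)
foo__x;;               (* evaluates to 1 *)
#open "foo";;          (* lets the unqualified x complete to foo__x *)
x;;
```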
5.4 Common errors
This section describes and explains the most frequently encountered error
messages.
Cannot find file filename
The named file could not be found in the current directory, nor in the
directories of the search path.
If filename has the format mod.zi, this means the current phrase
references identifiers from module mod, but you have not yet compiled an
interface for module mod. Fix: either load the file mod.ml, which will
also create in memory the compiled interface for module mod; or use camlc
to compile mod.mli or mod.ml, creating the compiled interface mod.zi,
before you start the toplevel.
If filename has the format mod.zo, this means you are trying to load with
load_object a bytecode object file that does not exist yet. Fix:
compile mod.ml with camlc before you start the toplevel. Or, use load
instead of load_object to load the source code instead of a compiled
object file.
If filename has the format mod.ml, this means load or include could not
find the specified source file. Fix: check the spelling of the file
name, or write it if it does not exist.
mod__name is referenced before being defined
You have neglected to load in memory an implementation for a module, with
load or load_object. This is explained in full detail in section 5.3
above.
Corrupted compiled interface file filename
See section 4.4.
Expression of type t1 cannot be used with type t2
See section 4.4.
The type inferred for the value name, that is, t, contains type variables that cannot be generalized
See section 4.4.
5.5 Building custom toplevel systems: camlmktop
The camlmktop command builds Caml Light toplevels that contain user code
preloaded at start-up.
Mac: This command is not available in the Macintosh version.
The camlmktop command takes as argument a set of .zo files, and links them
with the object files that implement the Caml Light toplevel. The typical use
is:
camlmktop -o mytoplevel foo.zo bar.zo gee.zo
This creates the bytecode file mytoplevel, containing the Caml Light toplevel
system, plus the code from the three .zo files. To run this toplevel, give it
as argument to the camllight command:
camllight mytoplevel
This starts a regular toplevel loop, except that the code from foo.zo, bar.zo
and gee.zo is already loaded in memory, just as if you had typed:
load_object "foo";;
load_object "bar";;
load_object "gee";;
on entrance to the toplevel. The modules foo, bar and gee are not opened,
though; you still have to do
#open "foo";;
yourself, if this is what you wish.
5.6 Options
The following command-line options are recognized by camlmktop.
-ccopt option
Pass the given option to the C compiler and linker, when linking in
``custom runtime'' mode. See the corresponding option for camlc, in
chapter 4.
-custom
Link in ``custom runtime'' mode. See the corresponding option for camlc,
in chapter 4.
-g Add debugging information to the toplevel file produced, which can then
be debugged with camldebug (chapter 9).
-I directory
Add the given directory to the list of directories searched for compiled
object code files (.zo).
-o exec-file
Specify the name of the toplevel file produced by the linker.
Unix: The default is camltop.out.
PC: The default is camltop.exe. The name must have the .exe extension.
Chapter 6
The runtime system (camlrun)
The camlrun command executes bytecode files produced by the linking phase of
the camlc command.
Mac: This command is an MPW tool, not a standalone Macintosh application.
6.1 Overview
The camlrun command comprises three main parts: the bytecode interpreter,
that actually executes bytecode files; the memory allocator and garbage
collector; and a set of C functions that implement primitive operations such
as input/output.
The usage for camlrun is:
camlrun options bytecode-executable arg1 ... argn
The first non-option argument is taken to be the name of the file containing
the executable bytecode. (That file is searched for in the executable path as
well as in the current directory.) The remaining arguments are passed to the
Caml Light program, in the string array sys__command_line. Element 0 of this
array is the name of the bytecode executable file; elements 1 to n are the
remaining arguments arg1 to argn.
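As an illustration, the following hypothetical program args.ml prints each element of sys__command_line together with its index. It is a sketch for this section, not a file from the distribution.

```ocaml
(* File args.ml (hypothetical): print every element of sys__command_line *)
let main () =
  for i = 0 to vect_length sys__command_line - 1 do
    print_int i;
    print_string ": ";
    print_string sys__command_line.(i);
    print_newline ()
  done;;

main ();;
```

After compiling it with camlc -o args args.ml, running camlrun args one two should print three numbered lines: the name of the bytecode file, then one, then two.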
As mentioned in chapter 4, in most cases, the bytecode executable files
produced by the camlc command are self-executable, and manage to launch the
camlrun command on themselves automatically. That is, assuming caml.out is a
bytecode executable file,
caml.out arg1 ... argn
works exactly as
camlrun caml.out arg1 ... argn
Notice that it is not possible to pass options to camlrun when invoking
caml.out directly.
6.2 Options
The following command-line option is recognized by camlrun.
-V Print out the camlrun version number. Exit immediately without executing
any byte-code file.
The following environment variables are also consulted:
CAMLRUNPARAM
Set the garbage collection parameters. This variable must be a sequence
of parameter specifications. A parameter specification is an option
letter followed by an = sign and a decimal number. There are four
options, corresponding to the four fields of the control record
documented in section 14.5:
s (minor_heap_size) Size of the minor heap.
i (major_heap_increment) Minimum size increment for the major heap.
o (space_overhead) The major GC speed setting.
v (verbose) Whether to print GC messages or not. 0 is false; 1 is
true; other values may give unexpected results.
For example, under csh the command
setenv CAMLRUNPARAM 's=250000 v=1'
tells a subsequent camlrun to set its initial minor heap size to about
1 megabyte (250000 words of 4 bytes each, on a 32-bit machine) and to print
its GC messages.
PATH
List of directories searched to find the bytecode executable file.
6.3 Common errors
This section describes and explains the most frequently encountered error
messages.
filename: no such file or directory
If filename is the name of a self-executable bytecode file, this means
that either that file does not exist, or that it failed to run the
camlrun bytecode interpreter on itself. The second possibility indicates
that Caml Light has not been properly installed on your system.
Cannot exec camlrun
(When launching a self-executable bytecode file.) The camlrun command
could not be found in the executable path. Check that Caml Light has
been properly installed on your system.
Cannot find the bytecode file
The file that camlrun is trying to execute (i.e. the file given as first
non-option argument to camlrun) either does not exist, or is not a valid
executable bytecode file.
Truncated bytecode file
The file that camlrun is trying to execute is not a valid executable
bytecode file. Probably it has been truncated or mangled since it was created.
Erase and rebuild it.
Uncaught exception
The program being executed contains a ``stray'' exception. That is, it
raises an exception at some point, and this exception is never caught.
This causes immediate termination of the program. If you wish to know
which exception thus escapes, use the printexc__f function from the
standard library (and don't forget to link your program with the -g
option).
Out of memory
The program being executed requires more memory than is available. Either
the program builds data structures that are too large, or the program
contains too many nested function calls and the stack overflows. In some cases, your
program is perfectly correct, it just requires more memory than your
machine provides. (This happens quite frequently on small
microcomputers, but is unlikely on Unix machines.) In other cases, the
``out of memory'' message reveals an error in your program:
non-terminating recursive function, allocation of an excessively large
array or string, attempts to build an infinite list or other data
structure, ...
To help you diagnose this error, run your program with the -v option to
camlrun. If it displays lots of ``Growing stack...'' messages, this is
probably a looping recursive function. If it displays lots of ``Growing
heap...'' messages, with the heap size growing slowly, this is probably
an attempt to construct a data structure with too many (infinitely many?)
cells. If it displays few ``Growing heap...'' messages, but with a huge
increment in the heap size, this is probably an attempt to build an
excessively large array or string.
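For the ``Uncaught exception'' case above, the following sketch shows the typical use of printexc__f as the entry point of a standalone program. The file prog.ml and its failing lookup are hypothetical: the program deliberately lets Not_found escape from assoc so that printexc__f has something to report. Compile it with camlc -g -o prog prog.ml, as recommended above.

```ocaml
(* File prog.ml (hypothetical): main lets an exception escape *)
let main () =
  let table = [("a", 1); ("b", 2)] in
  print_int (assoc "c" table);   (* "c" is absent, so assoc raises Not_found *)
  print_newline ();;

(* printexc__f applies main to () and, if an exception escapes,
   prints its name instead of a bare ``Uncaught exception'' *)
printexc__f main ();;
```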
Chapter 7
The librarian (camllibr)
Mac: This command is an MPW tool, not a standalone Macintosh application.
7.1 Overview
The camllibr program packs in one single file a set of bytecode object files
(.zo files). The resulting file is also a bytecode object file and also has
the .zo extension. It can be passed to the link phase of the camlc compiler
in place of the original set of bytecode object files. That is, after
running
camllibr -o library.zo mod1.zo mod2.zo mod3.zo mod4.zo
all calls to the linker with the form
camlc ... library.zo ...
are exactly equivalent to
camlc ... mod1.zo mod2.zo mod3.zo mod4.zo ...
The typical use of camllibr is to build a library composed of several
modules: this way, users of the library have only one .zo file to specify on
the command line to camlc, instead of a bunch of .zo files, one per module
contained in the library.
The linking phase of camlc is clever enough to discard the code
corresponding to useless phrases: in particular, definitions for global
variables that are never used after their definitions. Hence, there is no
problem with putting many modules, even rarely used ones, into one single
library: this will not result in bigger executables.
The usage for camllibr is:
camllibr options file1.zo ... filen.zo
where file1.zo through filen.zo are the object files to pack together. The
order in which these file names are presented on the command line is relevant:
the compiled phrases contained in the library will be executed in that order.
(Remember that it is a link-time error to refer to a global variable that has
not yet been defined.)
7.2 Options
The following command-line options are recognized by camllibr.
-I directory
Add the given directory to the list of directories searched for the input
.zo files. By default, the current directory is searched first, then the
standard library directory. Directories added with -I are searched after
the current directory, but before the standard library directory. When
several directories are added with several -I options on the command
line, these directories are searched from right to left (the rightmost
directory is searched first, the leftmost is searched last).
-o library-name
Specify the name of the output file. The default is library.zo.
PC: The following option is also supported:
@response-file
Process the files whose names are listed in file response-file,
just as if these names appeared on the command line. File names in
response-file are separated by blanks (spaces, tabs, newlines).
This option makes it possible to overcome silly limitations on the
length of the command line.
7.3 Turning code into a library
To develop a library, it is usually more convenient to split it into several
modules that reflect the internal structure of the library. From the
standpoint of the library users, however, it is preferable to view the library
as a single module, with only one interface file (.zi file) and one
implementation file (.zo file): linking is easier, and there is no need to
put a bunch of #open directives, nor to have to remember the internal
structure of the library.
The camllibr command allows having a single .zo file for the whole library.
Here is how the Caml Light module system can be used (some say ``abused'') to
have a single .zi file for the whole library. To be more concrete, assume
that the library comprises three modules, windows, images and buttons. The
idea is to add a fourth module, mylib, that re-exports the public parts of
windows, images and buttons. The interface mylib.mli contains definitions for
those types that are public (exported with their definitions), declarations
for those types that are abstract (exported without their definitions), and
declarations for the functions that can be called from the user's code:
(* File mylib.mli *)
type 'a option = None | Some of 'a;; (* a public type *)
type window and image and button;; (* three abstract types *)
value new_window : int -> int -> window (* the public functions *)
and draw_image : image -> window -> int -> int -> unit
and ...
The implementation of the mylib module simply equates the abstract types and
the public functions to the corresponding types and functions in the modules
windows, images and buttons:
(* File mylib.ml *)
type window == windows__win
and image == images__pixmap
and button == buttons__t;;
let new_window = windows__open_window
and draw_image = images__draw
and ...
The files windows.ml, images.ml and buttons.ml can open the mylib module, to
access the public types defined in the interface mylib.mli, such as the option
type. Of course, these modules must not reference the abstract types nor the
public functions, to avoid circularities.
Types such as windows__win in the example above can be exported by the
windows module either abstractly or concretely (with their definition).
Often, it is necessary to export them concretely, because the other modules in
the library (images, buttons) need to build or destructure values of that
type directly. Even if windows__win is exported concretely by the windows module,
that type will remain abstract to the library user, since it is abstracted by
the public interface mylib.
The actual building of the library mylib proceeds as follows:
camlc -c mylib.mli # create mylib.zi
camlc -c windows.mli windows.ml images.mli images.ml
camlc -c buttons.mli buttons.ml
camlc -c mylib.ml # create mylib.zo
mv mylib.zo tmplib.zo # renaming to avoid overwriting mylib.zo
camllibr -o mylib.zo windows.zo images.zo buttons.zo tmplib.zo
Then, copy mylib.zi and mylib.zo to a place accessible to the library users.
The other .zi and .zo files need not be copied.
Chapter 8
Lexer and parser generators (camllex, camlyacc)
This chapter describes two program generators: camllex, that produces a
lexical analyzer from a set of regular expressions with associated semantic
actions, and camlyacc, that produces a parser from a grammar with associated
semantic actions.
These program generators are very close to the well-known lex and yacc
commands that can be found in most C programming environments. This chapter
assumes a working knowledge of lex and yacc: while it describes the input
syntax for camllex and camlyacc and the main differences with lex and yacc, it
does not explain the basics of writing a lexer or parser description in lex
and yacc. Readers unfamiliar with lex and yacc are referred to ``Compilers:
principles, techniques, and tools'' by Aho, Sethi and Ullman (Addison-Wesley,
1986), ``Compiler design in C'' by Holub (Prentice-Hall, 1990), or ``Lex &
Yacc'', by Mason and Brown (O'Reilly, 1990).
Streams and stream matching, as described in section 3.1, provide an
alternative way to write lexers and parsers. The stream matching technique is
more powerful than the combination of camllex and camlyacc in some cases
(higher-order parsers), but less powerful in other cases (precedences).
Choose whichever approach is better suited to your parsing problem.
Mac: These commands are MPW tools, not standalone Macintosh applications.
8.1 Overview of camllex
The camllex command produces a lexical analyzer from a set of regular
expressions with attached semantic actions, in the style of lex. Assuming the
input file is lexer.mll, executing
camllex lexer.mll
produces Caml Light code for a lexical analyzer in file lexer.ml. This file
defines one lexing function per entry point in the lexer definition. These
functions have the same names as the entry points. Lexing functions take as
argument a lexer buffer, and return the semantic attribute of the
corresponding entry point.
Lexer buffers are an abstract data type implemented in the standard library
module lexing. The functions create_lexer_channel, create_lexer_string and
create_lexer from module lexing create lexer buffers that read from an input
channel, a character string, or any reading function, respectively. (See the
description of module lexing in chapter 13.)
When used in conjunction with a parser generated by camlyacc, the semantic
actions compute a value belonging to the type token defined by the generated
parsing module. (See the description of camlyacc below.)
8.2 Syntax of lexer definitions
The format of lexer definitions is as follows:
{ header }
rule entrypoint =
parse regexp { action }
| ...
| regexp { action }
and entrypoint =
parse ...
and ...
;;
Comments are delimited by (* and *), as in Caml Light.
8.2.1 Header
The header section is arbitrary Caml Light text enclosed in curly braces. It
can be omitted. If it is present, the enclosed text is copied as is at the
beginning of the output file. Typically, the header section contains the
#open directives required by the actions, and possibly some auxiliary
functions used in the actions.
8.2.2 Entry points
The names of the entry points must be valid Caml Light identifiers.
8.2.3 Regular expressions
The regular expressions are in the style of lex, with a more Caml-like syntax.
` char `
A character constant, with the same syntax as Caml Light character
constants. Match the denoted character.
_ Match any character.
eof Match the end of the lexer input.
" string "
A string constant, with the same syntax as Caml Light string constants.
Match the corresponding sequence of characters.
[ character-set ]
Match any single character belonging to the given character set. Valid
character sets are: single character constants ` c `; ranges of
characters ` c1 ` - ` c2 ` (all characters between c1 and c2, inclusive);
and the union of two or more character sets, denoted by concatenation.
[ ^ character-set ]
Match any single character not belonging to the given character set.
regexp *
(Repetition.) Match the concatenation of zero or more strings that match
regexp.
regexp +
(Strict repetition.) Match the concatenation of one or more strings that
match regexp.
regexp ?
(Option.) Match either the empty string, or a string matching regexp.
regexp1 | regexp2
(Alternative.) Match any string that matches either regexp1 or regexp2.
regexp1 regexp2
(Concatenation.) Match the concatenation of two strings, the first
matching regexp1, the second matching regexp2.
( regexp )
Match the same strings as regexp.
Concerning the precedences of operators, * and + have highest precedence,
followed by ?, then concatenation, then | (alternation).
8.2.4 Actions
The actions are arbitrary Caml Light expressions. They are evaluated in a
context where the identifier lexbuf is bound to the current lexer buffer.
Some typical uses for lexbuf, in conjunction with the operations on lexer
buffers provided by the lexing standard library module, are listed below.
lexing__get_lexeme lexbuf
Return the matched string.
lexing__get_lexeme_char lexbuf n
Return the nth character in the matched string. The first character
corresponds to n=0.
lexing__get_lexeme_start lexbuf
Return the absolute position in the input text of the beginning of the
matched string. The first character read from the input text has
position 0.
lexing__get_lexeme_end lexbuf
Return the absolute position in the input text of the end of the matched
string. The first character read from the input text has position 0.
entrypoint lexbuf
(Where entrypoint is the name of another entry point in the same lexer
definition.) Recursively call the lexer on the given entry point.
Useful for lexing nested comments, for example.
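The following lexer definition is a minimal sketch that puts the pieces of this section together: character constants and sets, repetition, eof, the lexbuf operations, and a second entry point, comment, that calls itself recursively to skip nested comments. The file name lexer.mll, the module grammar that is assumed to define the token type (see camlyacc below), and the constructors INT, PLUS, MINUS, TIMES, LPAREN, RPAREN and EOL are hypothetical.

```
{
#open "grammar";;   (* hypothetical module defining the token type *)
}
rule token = parse
    [` ` `\t`]+      { token lexbuf }      (* skip blanks *)
  | `\n`             { EOL }
  | "(*"             { comment lexbuf; token lexbuf }
  | [`0`-`9`]+       { INT (int_of_string (lexing__get_lexeme lexbuf)) }
  | `+`              { PLUS }
  | `-`              { MINUS }
  | `*`              { TIMES }
  | `(`              { LPAREN }
  | `)`              { RPAREN }
  | eof              { raise End_of_file }
and comment = parse
    "*)"             { () }                (* end of the current comment *)
  | "(*"             { comment lexbuf; comment lexbuf }  (* nested comment *)
  | eof              { failwith "unterminated comment" }
  | _                { comment lexbuf }
;;
```

Running camllex lexer.mll on this file would produce lexer.ml, defining the two functions token and comment.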
8.3 Overview of camlyacc
The camlyacc command produces a parser from a context-free grammar
specification with attached semantic actions, in the style of yacc. Assuming
the input file is grammar.mly, executing
camlyacc options grammar.mly
produces Caml Light code for a parser in the file grammar.ml, and its
interface in file grammar.mli.
The generated module defines one parsing function per entry point in the
grammar. These functions have the same names as the entry points. Parsing
functions take as arguments a lexical analyzer (a function from lexer buffers
to tokens) and a lexer buffer, and return the semantic attribute of the
corresponding entry point. Lexical analyzer functions are usually generated
from a lexer specification by the camllex program. Lexer buffers are an
abstract data type implemented in the standard library module lexing. Tokens
are values from the concrete type token, defined in the interface file
grammar.mli produced by camlyacc.
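A minimal driver sketch follows. It assumes a hypothetical grammar.mly whose entry point main has type int, and a hypothetical lexer.mll whose entry point token returns values of type grammar__token; the names parse_string and parse_channel are also illustrative.

```ocaml
(* Parse one input taken from a character string *)
let parse_string s =
  grammar__main lexer__token (lexing__create_lexer_string s);;

(* Parse one input read from an input channel *)
let parse_channel ic =
  grammar__main lexer__token (lexing__create_lexer_channel ic);;

print_int (parse_channel std_in);;
print_newline ();;
```

One plausible build sequence, assuming the driver lives in a hypothetical main.ml: camlyacc grammar.mly, then camllex lexer.mll, then camlc -c grammar.mli grammar.ml lexer.ml main.ml (in that order, so that each interface exists before it is used), and finally camlc -o calc grammar.zo lexer.zo main.zo.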
8.4 Syntax of grammar definitions
Grammar definitions have the following format:
%{
header
%}
declarations
%%
rules
%%
trailer
Comments are enclosed between /* and */ (as in C) in the ``declarations''
and ``rules'' sections, and between (* and *) (as in Caml) in the ``header''
and ``trailer'' sections.
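As a sketch of this layout, here is a small hypothetical grammar.mly for integer arithmetic. It omits the optional header and trailer, uses the %token, %start and %type declarations described below, and also uses yacc-style %left precedence declarations, assumed here to behave as in yacc (a later line binds tighter than an earlier one).

```
/* File grammar.mly (hypothetical): integer arithmetic */
%token <int> INT
%token PLUS MINUS TIMES LPAREN RPAREN EOL
/* PLUS and MINUS have the lowest precedence */
%left PLUS MINUS
/* TIMES binds tighter than PLUS and MINUS */
%left TIMES
%start main
%type <int> main
%%
main:
    expr EOL                 { $1 }
;
expr:
    INT                      { $1 }
  | LPAREN expr RPAREN       { $2 }
  | expr PLUS expr           { $1 + $3 }
  | expr MINUS expr          { $1 - $3 }
  | expr TIMES expr          { $1 * $3 }
;
%%
```

Running camlyacc grammar.mly on this file would produce grammar.ml and grammar.mli, the latter defining the token type with constructors INT of int, PLUS, MINUS, and so on.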
8.4.1 Header and trailer
The header and the trailer sections are Caml Light code that is copied as is
into file grammar.ml. Both sections are optional. The header goes at the
beginning of the output file; it usually contains #open directives required by
the semantic actions of the rules. The trailer goes at the end of the output
file.
8.4.2 Declarations
Declarations are given one per line. They all start with a % sign.
%token symbol...symbol
Declare the given symbols as tokens (terminal symbols). These symbols
are added as constant constructors for the token concrete type.
%token < type > symbol...symbol
Declare the given symbols as tokens with an attached attribute of the
given type. These symbols are added as constructors with arguments of
the given type for the token concrete type. The type part is an
arbitrary Caml Light type expression, except that all type constructor
names must be fully qualified (e.g. modname__typename) for all types
except standard built-in types, even if the proper #open directives (e.g.
#open "modname") were given in the header section. That's because the
header is copied only to the .ml output file, but not to the .mli output
file, while the type part of a %token declaration is copied to both.
%start symbol...symbol
Declare the given symbols as entry points for the grammar. For each
entry point, a parsing function with the same name is defined in the
output module. Non-terminals that are not declared as entry points have
no such parsing function. Start symbols must be given a type with the
%type directive below.
%type < type > symbol...symbol
Specify the type of the semantic attributes for the given symbols. This
is mandatory for start symbols only. Other nonterminal symbols need not
be given types by hand: these types will be inferred when running the
output files through the Caml Light compiler (unless the -s option is in
effect). The type part is an arbitrary Caml Light type expression,
except that all type constructor names must be fully qualified (e.g.
modname__typename) for all types except standard built-in types, even if
the proper #open directives (e.g. #open "modname") were given in the
header section. That's because the header is copied only to the .ml
output file, but not to the .mli output file, while the type part of a
%type declaration is copied to both.
%left symbol...symbol
%right symbol...symbol
%nonassoc symbol...symbol
Associate precedences and associativities with the given symbols. All
symbols on the same line are given the same precedence. They have higher
precedence than symbols declared before in a %left, %right or %nonassoc
line. They have lower precedence than symbols declared after in a %left,
%right or %nonassoc line. The symbols are declared to associate to the
left (%left), to the right (%right), or to be non-associative
(%nonassoc). The symbols are usually tokens. They can also be dummy
nonterminals, for use with the %prec directive inside the rules.
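To see what these declarations decide, it helps to write the groupings out explicitly. The following Caml Light fragment is a worked example, not camlyacc output: the parentheses show how 1 - 2 - 3 is grouped under %left MINUS and how it would be grouped under a hypothetical %right MINUS, and how 1 + 2 * 3 is grouped because TIMES is declared on a later line than PLUS.

```ocaml
(* With %left MINUS, 1 - 2 - 3 is grouped as (1 - 2) - 3. *)
let with_left_assoc = (1 - 2) - 3;;

(* With a hypothetical %right MINUS, it would be grouped as 1 - (2 - 3). *)
let with_right_assoc = 1 - (2 - 3);;

(* TIMES is declared on a later line than PLUS, so it has higher
   precedence: 1 + 2 * 3 is grouped as 1 + (2 * 3). *)
let times_binds_tighter = 1 + (2 * 3);;

(* If PLUS had the higher precedence, the grouping would be (1 + 2) * 3. *)
let plus_binds_tighter = (1 + 2) * 3;;
```

Under %nonassoc MINUS, an input such as 1 - 2 - 3 would instead be rejected as a syntax error, as in yacc.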
8.4.3 Rules
The syntax for rules is as usual:
nonterminal :
symbol ... symbol { semantic-action }
| ...
| symbol ... symbol { semantic-action }
;
Rules can also contain the %prec symbol directive in the right-hand side part,
to override the default precedence and associativity of the rule with the
precedence and associativity of the given symbol.
Semantic actions are arbitrary Caml Light expressions that are evaluated to
produce the semantic attribute attached to the defined nonterminal. The
semantic actions can access the semantic attributes of the symbols in the
right-hand side of the rule with the $ notation: $1 is the attribute for the
first (leftmost) symbol, $2 is the attribute for the second symbol, etc.
Actions occurring in the middle of rules are not supported. Error recovery
is not implemented.
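Semantic actions need not compute final results directly; a common alternative is to have them build a syntax tree that is evaluated later. The sketch below assumes a hypothetical module ast (file ast.ml) that defines the tree type and an evaluator. With it, the actions of the desk calculator of section 8.6 could read { Const $1 } and { Add($1, $3) } instead of { $1 } and { $1 + $3 }, the header would contain #open "ast";;, and, following the rule stated for %type in section 8.4.2, the start symbol would be declared with the fully qualified type %type <ast__expr> Main.

```ocaml
(* File ast.ml (hypothetical): a syntax tree for the desk calculator. *)
type expr =
    Const of int
  | Add of expr * expr
  | Sub of expr * expr
  | Mul of expr * expr
  | Div of expr * expr
  | Neg of expr
;;

(* Evaluate a tree bottom-up; division by zero raises an exception,
   just as the direct { $1 / $3 } action would. *)
let rec eval e =
  match e with
    Const n -> n
  | Add (a, b) -> eval a + eval b
  | Sub (a, b) -> eval a - eval b
  | Mul (a, b) -> eval a * eval b
  | Div (a, b) -> eval a / eval b
  | Neg a -> - (eval a)
;;
```

For instance, the tree built for 1 - 2 * 3 would be Sub(Const 1, Mul(Const 2, Const 3)), which eval maps to -5.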
8.5 Options
The camlyacc command recognizes the following options:
-v Generate a description of the parsing tables and a report on conflicts
resulting from ambiguities in the grammar. The description is put in
file grammar.output.
-s Generate a grammar.ml file with smaller phrases. Semantic actions are
presented in the grammar.ml output file as one large vector of functions.
By default, this vector is built by a single phrase. When the grammar is
large, or contains complicated semantic actions, the resulting phrase may
require large amounts of memory to be compiled by Caml Light. With the
-s option, the vector of actions is constructed incrementally, one phrase
per action. This lowers the memory requirements for the compiler, but it
is no longer possible to infer the types of nonterminal symbols:
typechecking is turned off on symbols that do not have a type specified
by a %type directive.
-bprefix
Name the output files prefix.ml, prefix.mli, prefix.output, instead of
the default naming convention.
8.6 A complete example
The all-time favorite: a desk calculator. This program reads arithmetic
expressions on standard input, one per line, and prints their values. Here is
the grammar definition:
/* File parser.mly */
%token <int> INT
%token PLUS MINUS TIMES DIV
%token LPAREN RPAREN
%token EOL
%left PLUS MINUS          /* lowest precedence */
%left TIMES DIV           /* medium precedence */
%nonassoc UMINUS          /* highest precedence */
%start Main               /* the entry point */
%type <int> Main
%%
Main:
    Expr EOL                  { $1 }
;
Expr:
    INT                       { $1 }
  | LPAREN Expr RPAREN        { $2 }
  | Expr PLUS Expr            { $1 + $3 }
  | Expr MINUS Expr           { $1 - $3 }
  | Expr TIMES Expr           { $1 * $3 }
  | Expr DIV Expr             { $1 / $3 }
  | MINUS Expr %prec UMINUS   { - $2 }
;
Here is the definition for the corresponding lexer:
(* File lexer.mll *)
{
#open "parser";; (* The type token is defined in parser.mli *)
exception Eof;;
}
rule Token = parse
    [` ` `\t`]       { Token lexbuf }     (* skip blanks *)
  | [`\n` ]          { EOL }
  | [`0`-`9`]+       { INT(int_of_string (get_lexeme lexbuf)) }
  | `+`              { PLUS }
  | `-`              { MINUS }
  | `*`              { TIMES }
  | `/`              { DIV }
  | `(`              { LPAREN }
  | `)`              { RPAREN }
  | eof              { raise Eof }
;;
Here is the main program, which combines the parser with the lexer:
(* File calc.ml *)
try
  let lexbuf = lexing__create_lexer_channel std_in in
  while true do
    let result = parser__Main lexer__Token lexbuf in
    print_int result; print_newline(); flush std_out
  done
with lexer__Eof ->
  ()
;;
To compile everything, execute:
camllex lexer.mll # generates lexer.ml
camlyacc parser.mly # generates parser.ml and parser.mli
camlc -c parser.mli
camlc -c lexer.ml
camlc -c parser.ml
camlc -c calc.ml
camlc -o calc lexer.zo parser.zo calc.zo
Chapter 9
The debugger (camldebug)
This chapter describes the Caml Light source-level replay debugger camldebug.
Unix: The debugger resides in the directory contrib/debugger in the
distribution. It requires a Unix system that provides BSD sockets.
Mac: The debugger is not available.
PC: The debugger is not available.
9.1 Compiling for debugging
Before the debugger can be used, the program must be compiled and linked with
the -g option: all .zo files that are part of the program should have been
created with camlc -g, and they must be linked together with camlc -g.
Compiling with -g entails no penalty on the running time of programs: .zo
files and bytecode executable files are bigger and take slightly longer to
produce, but the executable files run at exactly the same speed as if they had
been compiled without -g. It is therefore perfectly acceptable to compile
always in -g mode.
9.2 Invocation
9.2.1 Starting the debugger
The Caml Light debugger is invoked by running the program camldebug with the
name of the bytecode executable file as argument:
camldebug program
The following command-line options are recognized:
-stdlib directory
Look for the standard library files in directory instead of in the
default directory.
-s socket
Use socket for communicating with the debugged program. See the
description of the command set socket (section 9.8.7) for the format of
socket.
-c count
Set the maximum number of checkpoints to count.
-cd directory
Run the debugger program from the working directory directory, instead of
the current directory.
-emacs
Tell the debugger it is executing under Emacs. (See section 11.4 for
information on how to run the debugger under Emacs.)
9.2.2 Quitting the debugger
The command quit exits the debugger. You can also exit the debugger by typing
an end-of-file character (usually ctrl-D).
Typing an interrupt character (usually ctrl-C) will not exit the debugger,
but will terminate the action of any debugger command that is in progress and
return to the debugger command level.
9.3 Commands
A debugger command is a single line of input. It starts with a command name,
which is followed by arguments depending on this name. Examples:
run
goto 1000
set arguments arg1 arg2
A command name can be truncated as long as there is no ambiguity. For
instance, go 1000 is understood as goto 1000, since there are no other
commands whose name starts with go. For the most frequently used commands,
ambiguous abbreviations are allowed. For instance, r stands for run even
though there are other commands starting with r. You can test the validity
of an abbreviation using the help command.
If the previous command has been successful, a blank line (typing just RET)
will repeat it.
9.3.1 Getting help
The Caml Light debugger has a simple on-line help system, which gives a brief
description of each command and variable.
help
Print the list of commands.
help command
Give help about the command command.
help set variable, help show variable
Give help about the variable variable. The list of all debugger
variables can be obtained with help set.
help info topic
Give help about topic. Use help info to get a list of known topics.
9.3.2 Accessing the debugger state
set variable value
Set the debugger variable variable to the value value.
show variable
Print the value of the debugger variable variable.
info subject
Give information about the given subject. For instance, info breakpoints
will print the list of all breakpoints.
9.4 Executing a program
9.4.1 Events
Events are "interesting" locations in the source code, corresponding to the
beginning or end of evaluation of "interesting" sub-expressions. Events are
the unit of single-stepping (stepping goes to the next or previous event
encountered in the program execution). Also, breakpoints can only be set at
events. Thus, events play the role of line numbers in debuggers for
conventional languages.
During program execution, a counter is incremented at each event
encountered. The value of this counter is referred to as the current time.
Thanks to reverse execution, it is possible to jump back and forth to any time
of the execution.
Here is where the debugger events (written <) are located in the source
code:
- Following a function application:
(f arg)<
- After receiving an argument to a function:
fun x< y< z -> < ...
If a curried function is defined by pattern-matching with several cases,
events corresponding to the passing of arguments are displayed on the
first case of the function, because pattern-matching has not yet
determined which case to select:
fun pat1< pat2< pat3 -> < ...
| ...
- On each case of a pattern-matching definition (function, match...with
construct, try...with construct):
function pat1 -> < expr1
| ...
| patN -> < exprN
- Between subexpressions of a sequence:
expr1; < expr2; < ...; < exprN
- In the two branches of a conditional expression:
if cond then < expr1 else < expr2
- At the beginning of each iteration of a loop:
while cond do < body done
for i = a to b do < body done
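As an illustration, the hypothetical function below is annotated with comments showing where the rules above would place events. This is a hand annotation, not debugger output; the events actually generated for a module can be listed with the info events command (section 9.8.8).

```ocaml
(* Comments mark where the event rules above would apply. *)
let rec length l =            (* event after receiving the argument l *)
  match l with
    [] -> 0                   (* event at the start of this case *)
  | _ :: rest ->              (* event at the start of this case *)
      1 + length rest         (* event after the application (length rest),
                                 which is not a tail call *)
;;
```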
Exceptions: A function application followed by a function return is replaced
with a jump by the compiler (tail-call optimization). In this case, no event is
put after the function application. Also, no event is put after a function
application when the function is a primitive function (written in C). Finally,
several events may correspond to the same location in the compiled program.
In that case, the debugger cannot distinguish them, and selects one of the
events to associate with the given code location. The event chosen is a
"function application" event if there is one at that location, or otherwise
the event which appears last in the source. This heuristic generally picks the
"most interesting" event associated with the code location.
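The tail-call exception can be illustrated by two hypothetical ways of summing a list. In sum, the recursive call is followed by an addition, so it is an ordinary application and gets an event after it. In sum_acc, the recursive call is the last thing the function does, so, according to the rule above, it is compiled as a jump and no event follows it. Both functions compute the same result; only the events available for stepping differ.

```ocaml
(* Not a tail call: the result of (sum rest) is still needed by the
   addition, so an event is placed after the application. *)
let rec sum l =
  match l with
    [] -> 0
  | x :: rest -> x + sum rest
;;

(* Tail call: (sum_acc (acc + x) rest) is the last thing evaluated,
   so it is compiled as a jump and no event follows it. *)
let rec sum_acc acc l =
  match l with
    [] -> acc
  | x :: rest -> sum_acc (acc + x) rest
;;
```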
9.4.2 Starting the debugged program
The debugger starts executing the debugged program only when needed. This
allows setting breakpoints or assigning debugger variables before execution
starts. There are several ways to start execution:
run Run the program until a breakpoint is hit, or the program terminates.
step 0
Load the program and stop on the first event.
goto time
Load the program and execute it until the given time. Useful when you
already know approximately at what time the problem appears. Also useful
to set breakpoints on function values that have not been computed at time
0 (see section 9.5).
The execution of a program is affected by certain information it receives
when the debugger starts it, such as the command-line arguments to the program
and its working directory. The debugger provides commands to specify this
information (set arguments and cd). These commands must be used before
program execution starts. If you try to change the arguments or the working
directory after starting your program, the debugger will kill the program
(after asking for confirmation).
9.4.3 Running the program
The following commands execute the program forward or backward, starting at
the current time. The execution will stop either when specified by the
command or when a breakpoint is encountered.
run Execute the program forward from current time. Stops at next breakpoint
or when the program terminates.
reverse
Execute the program backward from current time. Mostly useful to go to
the last breakpoint encountered before the current time.
step [count]
Run the program and stop at the next event. With an argument, do it
count times.
backstep [count]
Run the program backward and stop at the previous event. With an
argument, do it count times.
next [count]
Run the program and stop at the next event, skipping over function calls.
With an argument, do it count times.
finish
Run the program until the current function returns.
9.4.4 Time travel
You can jump directly to a given time, without stopping on breakpoints, using
the goto command.
As you move through the program, the debugger maintains a history of the
successive times you stop at. The last command can be used to revisit these
times: each last command moves one step back through the history. That is
useful mainly to undo commands such as step and next.
goto time
Jump to the given time.
last [count]
Go back to the latest time recorded in the execution history. With an
argument, do it count times.
set history size
Set the size of the execution history.
9.4.5 Killing the program
kill
Kill the program being executed. This command is mainly useful if you
wish to recompile the program without leaving the debugger.
9.5 Breakpoints
A breakpoint causes the program to stop whenever a certain point in the
program is reached. It can be set in several ways using the break command.
Breakpoints are assigned numbers when set, for further reference.
break
Set a breakpoint at the current position in the program execution. The
current position must be on an event (i.e., neither at the beginning, nor
at the end of the program).
break function
Set a breakpoint at the beginning of function. This works only when the
functional value of the identifier function has been computed and
assigned to the identifier. Hence this command cannot be used at the
very beginning of the program execution, when all identifiers are still
undefined. Moreover, C functions are not recognized by the debugger.
break @ [module] line
Set a breakpoint in module module (or in the current module if module is
not given), at the first event of line line.
break @ [module] line column
Set a breakpoint in module module (or in the current module if module is
not given), at the event closest to line line, column column.
break @ [module] # character
Set a breakpoint in module module at the event closest to character
number character.
break address
Set a breakpoint at the code address address.
delete [breakpoint-numbers]
Delete the specified breakpoints. Without argument, all breakpoints are
deleted (after asking for confirmation).
info breakpoints
Print the list of all breakpoints.
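The restriction on break function means that a breakpoint can only be set on a function once the phrase defining it has been evaluated. In the hypothetical program below, add_five is the result of applying make_adder; a break add_five command would therefore have to be issued after that phrase has run, for instance after moving there with goto (section 9.4.2), rather than at time 0.

```ocaml
(* A function that returns a new function value at run time. *)
let make_adder n = fun x -> x + n;;

(* add_five only exists once this phrase has been evaluated,
   so "break add_five" cannot be used before that point. *)
let add_five = make_adder 5;;

let result = add_five 10;;
```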
9.6 The call stack
Each time the program performs a function application, it saves the location
of the application (the return address) in a block of data called a stack
frame. The frame also contains the local variables of the caller function.
All the frames are allocated in a region of memory called the call stack. The
command backtrace (or bt) displays parts of the call stack.
At any time, one of the stack frames is "selected" by the debugger;
several debugger commands refer implicitly to the selected frame. In
particular, whenever you ask the debugger for the value of a local variable,
the value is found in the selected frame. The commands frame, up and down
select whichever frame you are interested in.
When the program stops, the debugger automatically selects the currently
executing frame and describes it briefly as the frame command does.
frame
Describe the currently selected stack frame.
frame frame-number
Select a stack frame by number and describe it. The frame currently
executing when the program stopped has number 0; its caller has number 1;
and so on up the call stack.
backtrace [count], bt [count]
Print the call stack. This is useful to see which sequence of function
calls led to the currently executing frame. With a positive argument,
print only the innermost count frames. With a negative argument, print
only the outermost -count frames.
up [count]
Select and display the stack frame just "above" the selected frame,
that is, the frame that called the selected frame. An argument says how
many frames to go up.
down [count]
Select and display the stack frame just "below" the selected frame,
that is, the frame that was called by the selected frame. An argument
says how many frames to go down.
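To make the frame numbering concrete, consider the hypothetical program below, in which no call is in tail position, so each caller's frame stays on the stack. If the program were stopped inside double, one would expect frame 0 to be double, frame 1 its caller add_one_to_double, and frame 2 compute; up would then move from frame 0 towards frame 2, and down back again.

```ocaml
let double x = x * 2;;

(* (double x) is not a tail call: its result is still used by +. *)
let add_one_to_double x = double x + 1;;

(* Likewise, (add_one_to_double x) is not a tail call. *)
let compute x = add_one_to_double x + 1;;

(* Stopped inside double, the stack would read, innermost first:
   frame 0: double, frame 1: add_one_to_double, frame 2: compute. *)
let result = compute 10;;
```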
9.7 Examining variable values
The debugger can print the current value of a program variable (either a
global variable or a local variable relative to the selected stack frame). It
can also print selected parts of a value by matching it against a pattern.
Variable names can be specified either fully qualified
(module-name__var-name) or unqualified (var-name). Unqualified names either
correspond to local variables, or are completed into fully qualified global
names by looking at a list of "opened" modules that define the same name
(see section 9.8.5 for how to open modules in the debugger.) The completion
follows the same rules as in the Caml Light language (see section 2.2).
print variables
Print the values of the given variables.
match variable pattern
Match the value of the given variable against a pattern, and print the
values bound to the identifiers in the pattern.
The syntax of patterns for the match command extends the one for Caml Light
patterns:
pattern ::= ident
| _
| ( pattern )
| ncconstr pattern
| pattern , pattern {, pattern}
| { label = pattern {; label = pattern} }
| [ ]
| [ pattern {; pattern} ]
| pattern :: pattern
| # integer-literal pattern
| > pattern
The pattern ident, where ident is an identifier, matches any value, and
binds the identifier to this value. The pattern # n pattern matches a list, a
vector or a tuple whose n-th element (counting from 0) matches pattern. The
pattern > pattern
matches any constructed value whose argument matches pattern, regardless of
the constructor; it is a shortcut for skipping a constructor.
Example: assuming the value of a is Constr{x = [1;2;3;4]}, the command
match a > {x = # 2 k} prints k = 3.
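To try this example, the debugged program needs a value of that shape. The definitions below are one way to build it; the type names r and t are assumptions, only Constr and the field x come from the example. The last phrase performs the same selection in ordinary Caml Light, which also shows that # 2 selects the third element of the list.

```ocaml
(* A record with a list field, wrapped in a one-constructor variant. *)
type r = {x : int list};;
type t = Constr of r;;

let a = Constr {x = [1; 2; 3; 4]};;

(* "match a > {x = # 2 k}" skips the constructor, selects field x and
   binds k to element number 2, counting from 0. Equivalent code: *)
let k =
  match a with
    Constr {x = _ :: _ :: k :: _} -> k
  | _ -> failwith "list too short"
;;
```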
set print_depth d
Limit the printing of values to a maximal depth of d.
set print_length l
Limit the printing of values to at most l nodes printed.
9.8 Controlling the debugger
9.8.1 Setting the program name and arguments
set program file
Set the program name to file.
set arguments arguments
Give arguments as command-line arguments for the program.
A shell is used to pass the arguments to the debugged program. You can
therefore use wildcards, shell variables, and file redirections inside the
arguments. To debug programs that read from standard input, it is recommended
to redirect their input from a file (using set arguments < input-file),
otherwise input to the program and input to the debugger are not properly
separated.
9.8.2 How programs are loaded
The loadingmode variable controls how the program is executed.
set loadingmode direct
The program is run directly by the debugger. This is the default mode.
set loadingmode runtime
The debugger executes the Caml Light runtime camlrun on the program.
Rarely useful; moreover it prevents the debugging of programs compiled in
"custom runtime" mode.
set loadingmode manual
The user starts the program manually, when asked by the debugger. Allows
remote debugging (see section 9.8.7).
9.8.3 Search path for files
The debugger searches for source files and compiled interface files in a list
of directories, the search path. The search path initially contains the
current directory . and the standard library directory. The directory
command adds directories to the path.
Whenever the search path is modified, the debugger will clear any
information it may have cached about the files.
directory directorynames
Add the given directories to the search path. These directories are
added at the front, and will therefore be searched first.
directory
Reset the search path. This requires confirmation.
9.8.4 Working directory
Each time a program is started in the debugger, it inherits its working
directory from the current working directory of the debugger. This working
directory is initially whatever it inherited from its parent process
(typically the shell), but you can specify a new working directory in the
debugger with the cd command or the -cd command-line option.
cd directory
Set the working directory for camldebug to directory.
pwd Print the working directory for camldebug.
9.8.5 Module management
Like the Caml Light compiler, the debugger maintains a list of opened modules
in order to resolve variable name ambiguities. The opened modules also
affect the printing of values: whether fully qualified names or short names
are used for constructors and record labels.
When a program is executed, the debugger automatically opens the modules of
the standard library it uses.
open modules
Open the given modules.
close modules
Close the given modules.
info modules
List the modules used by the program, and the open modules.
9.8.6 Turning reverse execution on and off
In some cases, you may want to turn reverse execution off. This speeds up the
program execution, and is also sometimes useful for interactive programs.
Normally, the debugger takes checkpoints of the program state from time to
time. That is, it makes a copy of the current state of the program (using the
Unix system call fork). If the variable checkpoints is set to off, the
debugger will not take any checkpoints.
set checkpoints on/off
Select whether the debugger makes checkpoints or not.
9.8.7 Communication between the debugger and the program
The debugger communicates with the program being debugged through a Unix
socket. You may need to change the socket name, for example if you need to
run the debugger on one machine and your program on another.
set socket socket
Use socket for communication with the program. socket can be either a
file name, or an Internet port specification host:port, where host is a
host name or an Internet address in dot notation, and port is a port
number on the host.
On the debugged program side, the socket name is passed either by the -D
command line option to camlrun, or through the CAML_DEBUG_SOCKET environment
variable.
9.8.8 Fine-tuning the debugger
Several variables make it possible to fine-tune the debugger. Reasonable defaults are
provided, and you should normally not have to change them.
set processcount count
Set the maximum number of checkpoints to count. More checkpoints
facilitate going far back in time, but use more memory and create more
Unix processes.
As checkpointing is quite expensive, it must not be done too often. On the
other hand, backward execution is faster when checkpoints are taken more
often. In particular, backward single-stepping is more responsive when many
checkpoints have been taken just before the current time. To fine-tune the
checkpointing strategy, the debugger does not take checkpoints at the same
frequency for long displacements (e.g. run) and small ones (e.g. step). The
two variables bigstep and smallstep contain the number of events between two
checkpoints in each case.
set bigstep count
Set the number of events between two checkpoints for long displacements.
set smallstep count
Set the number of events between two checkpoints for small displacements.
The following commands display information on checkpoints and events:
info checkpoints
Print a list of checkpoints.
info events [module]
Print the list of events in the given module (the current module, by
default).
9.9 Miscellaneous commands
list [module] [beginning] [end]
List the source of module module, from line number beginning to line
number end. By default, 20 lines of the current module are displayed,
starting 10 lines before the current position.
source filename
Read debugger commands from the script filename.
Chapter 10
Profiling (camlpro)
This chapter describes how the execution of Caml Light programs can be
profiled, by recording how many times functions are called, branches of
conditionals are taken, ...
Mac: This command is not available.
PC: This command is not available.
10.1 Compiling for profiling
Before profiling an execution, the program must be compiled in profiling mode,
using the -p option to the batch compiler camlc (see chapter 4). When
compiling modules separately, the -p option must be given both when compiling
the modules (production of .zo files) and when linking them together.
The amount of profiling information can be controlled by adding one or
several letters after the -p option, indicating which parts of the program
should be profiled:
a all options
f function calls : a count point is set at the beginning of function
bodies
i if ...then ...else ... : count points are set in both then branch and
else branch
l while, for loops: a count point is set at the beginning of the loop body
m match branches: a count point is set at the beginning of the body of
each branch
t try ...with ... branches: a count point is set at the beginning of the
body of each branch
For instance, compiling with -pfilm profiles function calls, if...then
...else..., loops and pattern matching.
The -p option without additional letters defaults to -pfm, meaning that only
function calls and pattern matching are profiled.
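For instance, compiling the hypothetical function below with -pfi would, following the descriptions above, place count points where the comments indicate: one at the start of the function body for f, and one in each branch of each if...then...else for i. The comments are a hand annotation, not camlpro output.

```ocaml
(* Number of steps for n to reach 1 in the Collatz iteration. *)
let rec collatz_steps n =
  (* f: count point at the start of the function body *)
  if n = 1 then
    0                                  (* i: then branch of the outer if *)
  else
    (* i: else branch of the outer if *)
    if n mod 2 = 0 then
      1 + collatz_steps (n / 2)        (* i: then branch of the inner if *)
    else
      1 + collatz_steps (3 * n + 1)    (* i: else branch of the inner if *)
;;
```

Evaluating collatz_steps 6 calls the function 9 times (for 6, 3, 10, 5, 16, 8, 4, 2 and 1), so the f count point would be expected to reach 9 for that run.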
10.2 Profiling an execution
Running a bytecode executable file that has been compiled and linked with -p
records the execution counts for the specified parts of the program and saves
them in a file called camlpro.dump in the current directory.
More precisely, the dump file camlpro.dump is written when the io__exit
function is called. The linker, called with the -p option, adds io__exit 0 as
the last phrase of the bytecode executable, in case the original program never
calls io__exit. However, if the program terminates with an uncaught
exception, the dump file will not be produced.
If a compatible dump file already exists in the current directory, then the
profiling information is accumulated in this dump file. This allows, for
instance, the profiling of several executions of a program on different
inputs.
10.3 Printing profiling information
The camlpro command produces a source listing of the program modules where
execution counts have been inserted as comments. For instance,
camlpro foo.ml
prints the source code for the foo module, with comments indicating how many
times the functions in this module have been called. Naturally, this
information is accurate only if the source file has not been modified since
the profiling execution took place.
The following options are recognized by camlpro:
compiler options -stdlib, -I, -include, -O, -open, -i, -lang
See chapter 4 for the detailed usage.
-f dumpfile
Specifies an alternate dump file of profiling information.
-F string
Specifies an additional string to be output with profiling information.
By default, camlpro will annotate programs with comments of the form (* n *)
where n is the counter value for a profiling point. With option -F s,
the annotation will be (* sn *).
An additional argument specifies the output file. For instance
camlpro -f ../test/camlpro.dump foo.ml foo_profiled.ml
will save the annotated program in file foo_profiled.ml. Otherwise, the
annotated program is written on the standard output.
10.4 Known bugs
The following situation (file x.ml)
let a = 1;;
x__a ;;
will break the profiler. More precisely, one should avoid referring to symbols
of the current module with the qualified symbol syntax.
Chapter 11
Using Caml Light under Emacs
This chapter describes how Caml Light can be used in conjunction with GNU
Emacs version 19 (version 18 is also partially supported).
Unix: The Emacs Lisp files implementing the Caml/Emacs interface are in
contrib/camlmode in the distribution.
Mac: The Caml/Emacs interface is not available.
PC: The Caml/Emacs interface is not available.
11.1 Updating your .emacs
The following initializations must be added to your .emacs file:
(setq auto-mode-alist (cons '("\\.ml[iylp]?" . caml-mode) auto-mode-alist))
(autoload 'caml-mode "caml" "Major mode for editing Caml code." t)
(autoload 'run-caml "inf-caml" "Run an inferior Caml process." t)
(autoload 'camldebug "camldebug" "Run the Caml debugger." t)
11.2 The caml editing mode
The caml-mode function is a major editing mode for Caml source files. It
provides the correct syntax tables, comment syntax, ... for the Caml language.
An extremely crude indentation facility is provided, as well as a slightly
enhanced next-error command (to display the location of a compilation error).
The following key bindings are performed:
TAB (function caml-indent-command)
At the beginning of a line, indent that line like the line above.
Successive TABs increase the indentation level by 2 spaces (by default;
can be set with the caml-mode-indentation variable).
M-TAB (function caml-unindent-command)
Decrease the indentation level of the current phrase.
C-x ` (function caml-next-error)
Display the next compilation error, just as next-error does. In
addition, it puts the point and the mark around the exact location of the
error (the subexpression that caused the error). Under Emacs 19, that
subexpression is also highlighted.
M-C-h (function caml-mark-phrase)
Mark the Caml phrase that contains the point: the point is put at the
beginning of the phrase and the mark at the end. Phrases are delimited
by ;; (the final double-semicolon). This function does not properly
ignore ;; inside string literals or comments.
C-x SPC
When the Caml debugger is running as an inferior process (section 11.4
below), set a breakpoint at the current position of the point.
M-C-x or C-c C-e (function caml-eval-phrase)
When a Caml toplevel is running as an inferior process (section 11.3
below), send it the Caml phrase that contains the point. The phrase
will then be evaluated by the inferior toplevel as usual. The phrase is
delimited by ;; as described for the caml-mark-phrase command.
C-c C-r (function caml-eval-region)
Send the region to a Caml toplevel running in an inferior process.
11.3 Running the toplevel as an inferior process
M-x run-caml starts a Caml toplevel with input and output in an Emacs buffer
named *inferior-caml*. This gives you the full power of Emacs to edit the
input to the Caml toplevel. A history of input lines is maintained, as in
Shell mode. This includes the following commands (see the function
comint-mode for a complete description):
RET Send the current line to the toplevel.
M-n and M-p
Move to the next or previous line in the history.
M-r and M-s
Regexp search in the history.
C-c C-c
Send a break (interrupt signal) to the Caml toplevel.
Phrases can also be sent to the Caml toplevel for evaluation from any buffer
in Caml mode, using M-C-x, C-c C-e or C-c C-r.
11.4 Running the debugger as an inferior process
The Caml debugger is started by the command M-x camldebug, giving as argument
the name of the executable file progname to debug. Communication with the
debugger takes place in an Emacs buffer named *camldebug-progname*. The
editing and history facilities of Shell mode are available for interacting
with the debugger.
In addition, Emacs displays the source files containing the current event
(the current position in the program execution) and highlights the location of
the event. This display is updated synchronously with the debugger action.
The following bindings for the most common debugger commands are available
in the *camldebug-progname* buffer (see section 9.3 for a full explanation of
the commands):
M-r run command: execute the program forward.
M-s step command: execute the program one step forward.
M-b back command: execute the program one step backward.
M-l last command: go back one step in the command history.
C-c >
down command: select the stack frame below the current frame.
C-c <
up command: select the stack frame above the current frame.
C-c C-f
finish command: run till the current function returns.
In a buffer in Caml editing mode, C-x SPC sets a breakpoint at the current
position of the point.
Chapter 12
Interfacing C with Caml Light
This chapter describes how user-defined primitives, written in C, can be added
to the Caml Light runtime system and called from Caml Light code.
12.1 Overview and compilation information
12.1.1 Declaring primitives
User primitives are declared in a module interface (a .mli file), in the same
way as a regular ML value, except that the declaration is followed by the =
sign, the function arity (number of arguments), and the name of the
corresponding C function. For instance, here is how the input primitive is
declared in the interface for the standard library module io:
value input : in_channel -> string -> int -> int -> int
= 4 "input"
Primitives with several arguments are always curried. The C function does not
necessarily have the same name as the ML function.
Values thus declared primitive in a module interface must not be implemented
in the module implementation (the .ml file). They can be used inside the
module implementation.
12.1.2 Implementing primitives
User primitives with arity n less than or equal to 5 are implemented by C
functions that take n arguments of type value, and return a result of type
value. The type value is the type of the representations for Caml Light
values. It encodes objects of several base types (integers, floating-point
numbers, strings, ...), as well as Caml Light data structures. The type value
and the associated conversion functions and macros are described in detail
below. For instance, here is the declaration for the C function implementing
the input primitive:
value input(channel, buffer, offset, length)
value channel, buffer, offset, length;
{
...
}
When the primitive function is applied in a Caml Light program, the C
function is called with the values of the expressions to which the primitive
is applied as arguments. The value returned by the function is passed back to
the Caml Light program as the result of the function application.
User primitives with arity greater than 5 are implemented by C functions
that receive two arguments: a pointer to an array of Caml Light values (the
values for the arguments), and an integer which is the number of arguments
provided:
value prim_with_lots_of_args(argv, argn)
value * argv;
int argn;
{
... argv[0] ...; /* The first argument */
... argv[6] ...; /* The seventh argument */
}
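As a self-contained illustration of this calling convention, here is a sketch of a hypothetical seven-argument primitive sum7_stub that adds its integer arguments. So that it compiles without the Caml Light headers, the value type and the MODEL_Val_long and MODEL_Long_val macros are stand-ins for the definitions in mlvalues.h, assuming integers are encoded as 2n+1; real stub code should include mlvalues.h and use Val_long and Long_val instead.

```c
/* Stand-ins for mlvalues.h (assumption: integer n is stored as 2n+1).
   Real stub code includes mlvalues.h instead of defining these. */
typedef long value;
#define MODEL_Val_long(x) ((value)(((unsigned long)(x) << 1) + 1))
#define MODEL_Long_val(v) ((long)(v) >> 1)

/* Hypothetical primitive of arity 7: it receives a pointer to the array
   of its arguments and the number of arguments, as described above. */
value sum7_stub(value *argv, int argn)
{
    long total = 0;
    int i;
    for (i = 0; i < argn; i++)
        total += MODEL_Long_val(argv[i]);   /* decode each argument */
    return MODEL_Val_long(total);           /* encode the result */
}
```

On the Caml Light side, such a primitive would be declared with arity 7 in a module interface, following the pattern of section 12.1.1.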
Implementing a user primitive is actually two separate tasks: on the one
hand, decoding the arguments to extract C values from the given Caml Light
values, and encoding the return value as a Caml Light value; on the other
hand, actually computing the result from the arguments. Except for very
simple primitives, it is often preferable to have two distinct C functions to
implement these two tasks. The first function actually implements the
primitive, taking native C values as arguments and returning a native C value.
The second function, often called the ``stub code'', is a simple wrapper
around the first function that converts its arguments from Caml Light values
to C values, calls the first function, and converts the returned C value to a
Caml Light value. For instance, here is the stub code for the input primitive:
value input(channel, buffer, offset, length)
value channel, buffer, offset, length;
{
return Val_long(getblock((struct channel *) channel,
&Byte(buffer, Long_val(offset)),
Long_val(length)));
}
(Here, Val_long, Long_val and so on are conversion macros for the type value,
that will be described later.) The hard work is performed by the function
getblock, which is declared as:
long getblock(channel, p, n)
struct channel * channel;
char * p;
long n;
{
...
}
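The same split can be sketched in isolation. Below, clamp_core is a hypothetical core function that works on native C longs, and clamp_stub is its stub code, which only decodes the arguments, calls the core function, and encodes the result. The value type and the MODEL_ macros are stand-ins for mlvalues.h, assuming integers are encoded as 2n+1; they are not the real definitions.

```c
/* Stand-ins for mlvalues.h (assumption: integer n is stored as 2n+1). */
typedef long value;
#define MODEL_Val_long(x) ((value)(((unsigned long)(x) << 1) + 1))
#define MODEL_Long_val(v) ((long)(v) >> 1)

/* Core function: native C arguments and result; it knows nothing
   about Caml Light values. */
long clamp_core(long x, long lo, long hi)
{
    if (x < lo) return lo;
    if (x > hi) return hi;
    return x;
}

/* Stub code for a hypothetical primitive of arity 3: convert the
   arguments, call the core function, convert the result back. */
value clamp_stub(value x, value lo, value hi)
{
    return MODEL_Val_long(clamp_core(MODEL_Long_val(x),
                                     MODEL_Long_val(lo),
                                     MODEL_Long_val(hi)));
}
```

A matching declaration would be value clamp : int -> int -> int -> int = 3 "clamp_stub" (hypothetical). Keeping clamp_core free of the value type makes it reusable and testable from plain C.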
To write C code that operates on Caml Light values, the following include
files are provided:
------------------------------------------------------------------------
|Include file |Provides |
------------------------------------------------------------------------
|mlvalues.h |definition of the value type, and conversion macros |
|alloc.h |allocation functions (to create structured Caml Light |
| |objects) |
|memory.h |miscellaneous memory-related functions (for in-place |
| |modification of structures, etc). |
------------------------------------------------------------------------
These files reside in the Caml Light standard library directory (usually
/usr/local/lib/caml-light).
12.1.3 Linking C code with Caml Light code
The Caml Light runtime system comprises three main parts: the bytecode
interpreter, the memory manager, and a set of C functions that implement the
primitive operations. Some bytecode instructions are provided to call these C
functions, designated by their offset in a table of functions (the table of
primitives).
In the default mode, the Caml Light linker produces bytecode for the
standard runtime system, with a standard set of primitives. References to
primitives that are not in this standard set result in the ``unavailable C
primitive'' error.
In the ``custom runtime'' mode, the Caml Light linker scans the bytecode
object files (.zo files) and determines the set of required primitives. Then,
it builds a suitable runtime system, by calling the native code linker with:
- the table of the required primitives
- a library that provides the bytecode interpreter, the memory manager, and
the standard primitives
- libraries and object code files (.o files) mentioned on the command line
for the Caml Light linker, that provide implementations for the user's
primitives.
This builds a runtime system with the required primitives. The Caml Light
linker generates bytecode for this custom runtime system. The bytecode is
appended to the end of the custom runtime system, so that it will be
automatically executed when the output file (custom runtime + bytecode) is
launched.
To link in ``custom runtime'' mode, execute the camlc command with:
- the -custom option
- the names of the desired Caml Light object files (.zo files)
- the names of the C object files and libraries (.o and .a files) that
implement the required primitives. (Libraries can also be specified with
the usual -l syntax.)
12.2 The value type
All Caml Light objects are represented by the C type value, defined in the
include file mlvalues.h, along with macros to manipulate values of that type.
An object of type value is either:
- an unboxed integer
- a pointer to a block inside the heap (such as the blocks allocated
through one of the alloc_* functions below)
- a pointer to an object outside the heap (e.g., a pointer to a block
allocated by malloc, or to a C variable).
12.2.1 Integer values
Integer values encode 31-bit signed integers (63-bit on 64-bit processors).
They are unboxed (unallocated).
12.2.2 Blocks
Blocks in the heap are garbage-collected, and therefore have strict structure
constraints. Each block includes a header containing the size of the block
(in words), and the tag of the block. The tag governs how the contents of the
blocks are structured. A tag lower than No_scan_tag indicates a structured
block, containing well-formed values, which is recursively traversed by the
garbage collector. A tag greater than or equal to No_scan_tag indicates a raw
block, whose contents are not scanned by the garbage collector. For the
benefit of ad-hoc polymorphic primitives such as equality and structured
input-output, structured and raw blocks are further classified according to
their tags as follows:
---------------------------------------------------------------------
|Tag |Contents of the block |
---------------------------------------------------------------------
|0 to No_scan_tag-1  |A structured block (an array of Caml Light |
| |objects). Each field is a value. |
|Closure_tag |A closure representing a functional value. |
| |The first word is a pointer to a piece |
| |of bytecode, the second word is a value |
| |containing the environment. |
|String_tag |A character string. |
|Double_tag |A double-precision floating-point number. |
|Abstract_tag |A block representing an abstract datatype. |
|Final_tag |A block representing an abstract datatype with |
| |a ``finalization'' function, to be called when |
| |the block is deallocated. |
---------------------------------------------------------------------
12.2.3 Pointers to outside the heap
Any pointer to outside the heap can be safely cast to and from the type value.
This includes pointers returned by malloc, and pointers to C variables
obtained with the & operator.
12.3 Representation of Caml Light data types
This section describes how Caml Light data types are encoded in the value
type.
12.3.1 Atomic types
-------------------------------------------------
|Caml type |Encoding |
-------------------------------------------------
|int |Unboxed integer values. |
|char |Unboxed integer values (ASCII code). |
|float |Blocks with tag Double_tag. |
|string |Blocks with tag String_tag. |
-------------------------------------------------
12.3.2 Product types
Tuples and arrays are represented by pointers to blocks, with tag 0.
Records are also represented by zero-tagged blocks. The ordering of labels
in the record type declaration determines the layout of the record fields:
the value associated with the label declared first is stored in field 0 of the
block, the value associated with the label declared next goes in field 1, and so
on.
12.3.3 Concrete types
Constructed terms are represented by blocks whose tag encodes the constructor.
The constructors for a given concrete type are numbered from 0 to the number
of constructors minus one, following the order in which they appear in the
concrete type declaration. Constant constructors are represented by
zero-sized blocks (atoms), tagged with the constructor number. Non-constant
constructors declared with an n-tuple as argument are represented by a block of
size n, tagged with the constructor number; the n fields contain the
components of its tuple argument. Other non-constant constructors are
represented by a block of size 1, tagged with the constructor number; the
field 0 contains the value of the constructor argument. Example:
------------------------------------------------------------------------
|Constructed term|Representation |
------------------------------------------------------------------------
|() |Size = 0, tag = 0 |
|false |Size = 0, tag = 0 |
|true |Size = 0, tag = 1 |
|[] |Size = 0, tag = 0 |
|h::t |Size = 2, tag = 1, first field = h, second field = t |
------------------------------------------------------------------------
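To make the list encoding concrete, here is a self-contained C model of blocks: each block records its size and tag, [] is an atom with tag 0, and h::t is a two-field block with tag 1 whose field 1 holds the tail. The struct model_block layout and all names are hypothetical simplifications (the real runtime keeps size and tag in a single header word, accessed through Wosize_val, Tag_val and Field), and list elements are stored as plain C integers rather than encoded values.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t value;   /* stand-in for the real value type */

/* Hypothetical block model: size and tag, followed by the fields. */
struct model_block {
    size_t wosize;        /* number of fields, as Wosize_val would report */
    unsigned tag;         /* constructor number, as Tag_val would report */
    value field[2];       /* room for the two fields of a cons cell */
};

#define CONS_TAG 1        /* [] is constructor 0, :: is constructor 1 */

/* Count the elements of a modelled list by following field 1 (the tail)
   until reaching the atom for [] (size 0, tag 0). */
size_t model_list_length(const struct model_block *l)
{
    size_t n = 0;
    while (l->tag == CONS_TAG) {
        n++;
        l = (const struct model_block *) l->field[1];
    }
    return n;
}
```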
12.4 Operations on values
12.4.1 Kind tests
- Is_int(v) is true if value v is an immediate integer, false otherwise.
- Is_block(v) is true if value v is a pointer to a block, and false if it
is an immediate integer.
12.4.2 Operations on integers
- Val_long(l) returns the value encoding the long int l
- Long_val(v) returns the long int encoded in value v
- Val_int(i) returns the value encoding the int i
- Int_val(v) returns the int encoded in value v
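The following sketch models these macros so that the encoding can be checked in isolation. It assumes the scheme used by mlvalues.h stores an integer n as 2n+1, so that integers have their low bit set while pointers to word-aligned blocks have it clear; this is an assumption to verify against mlvalues.h, and the MODEL_ names are hypothetical. If the assumption holds, it also explains why integers are 31 rather than 32 bits wide: one bit is taken by the tag.

```c
typedef long value;   /* stand-in for the real value type */

/* Models of the mlvalues.h macros (assumption: integer n is stored as
   2n+1, so the low bit distinguishes integers from aligned pointers). */
#define MODEL_Val_long(x)  ((value)(((unsigned long)(x) << 1) + 1))
#define MODEL_Long_val(v)  ((long)(v) >> 1)
#define MODEL_Is_int(v)    (((v) & 1) != 0)
#define MODEL_Is_block(v)  (((v) & 1) == 0)
```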
12.4.3 Accessing blocks
- Wosize_val(v) returns the size of value v, in words, excluding the
header.
- Tag_val(v) returns the tag of value v.
- Field(v,n) returns the value contained in the nth field of the structured
block v. Fields are numbered from 0 to Wosize_val(v)-1.
- Code_val(v) returns the code part of the closure v.
- Env_val(v) returns the environment part of the closure v.
- string_length(v) returns the length (number of characters) of the string
v.
- Byte(v,n) returns the nth character of the string v, with type char.
Characters are numbered from 0 to string_length(v)-1.
- Byte_u(v,n) returns the nth character of the string v, with type unsigned
char. Characters are numbered from 0 to string_length(v)-1.
- String_val(v) returns a pointer to the first byte of the string v, with
type char *. This pointer is a valid C string: there is a null
character after the last character in the string. However, Caml Light
strings can contain embedded null characters, which will confuse the usual
C functions over strings.
- Double_val(v) returns the floating-point number contained in value v,
with type double.
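The warning about embedded null characters has a practical consequence for stub code: functions such as strlen stop at the first null, whereas string_length reports the full length. The hypothetical helper count_char below takes the length explicitly, as a stub would obtain it from string_length(v), and uses memchr instead of the null-terminated string functions. A plain C array stands in for the Caml Light string.

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of c in the first len bytes of s. Using an explicit
   length (as string_length(v) would provide) rather than strlen keeps the
   scan correct when s contains embedded null characters. */
size_t count_char(const char *s, size_t len, char c)
{
    size_t n = 0;
    const char *p = s, *end = s + len;
    while (p < end && (p = memchr(p, c, (size_t)(end - p))) != NULL) {
        n++;
        p++;   /* continue searching after the match */
    }
    return n;
}
```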
The expressions Field(v,n), Code_val(v), Env_val(v), Byte(v,n), Byte_u(v,n)
and Double_val(v) are valid l-values. Hence, they can be assigned to,
resulting in an in-place modification of value v. Assigning directly to
Field(v,n) must be done with care to avoid confusing the garbage collector
(see below).
12.4.4 Allocating blocks
From the standpoint of the allocation functions, blocks are divided according
to their size as zero-sized blocks, small blocks (with size less than or equal
to Max_young_wosize), and large blocks (with size greater than
Max_young_wosize). The constant Max_young_wosize is declared in the include
file mlvalues.h. It is guaranteed to be at least 64 (words), so that any
block with constant size less than or equal to 64 can be assumed to be small.
For blocks whose size is computed at run-time, the size must be compared
against Max_young_wosize to determine the correct allocation procedure.
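For a block whose size is computed at run time, the decision described above can be sketched as a simple classification. The enumeration, the function classify_block and the constant MODEL_MAX_YOUNG_WOSIZE are hypothetical; real code must compare against Max_young_wosize from mlvalues.h, which is only guaranteed to be at least 64 and may be larger.

```c
#include <stddef.h>

/* Hypothetical stand-in: the real constant is Max_young_wosize from
   mlvalues.h and may be larger than this guaranteed minimum. */
#define MODEL_MAX_YOUNG_WOSIZE 64

enum alloc_kind { ALLOC_ATOM, ALLOC_SMALL, ALLOC_LARGE };

/* Classify a block size (in words): zero-sized blocks are atoms,
   sizes up to Max_young_wosize are small, anything bigger is large. */
enum alloc_kind classify_block(size_t wosize)
{
    if (wosize == 0)
        return ALLOC_ATOM;    /* use Atom(t); never allocate it */
    if (wosize <= MODEL_MAX_YOUNG_WOSIZE)
        return ALLOC_SMALL;   /* e.g. alloc(n, t) */
    return ALLOC_LARGE;       /* needs the large-block allocation procedure */
}
```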
- Atom(t) returns an ``atom'' (zero-sized block) with tag t. Zero-sized
blocks are preallocated outside of the heap. It is incorrect to try and
allocate a zero-sized block using the functions below. For instance,
Atom(0) represents (), false and []; Atom(1) represents true. (As a
convenience, mlvalues.h defines the macros Val_unit, Val_false and
Val_true.)
- alloc(n,t) returns a fresh small block of size n <= Max_young_wosize words,
with tag t. If this block is a structured block (i.e. if t ...
value initscr: unit -> window = 1 "curses_initscr"
and endwin: unit -> unit = 1 "curses_endwin"
and refresh: unit -> unit = 1 "curses_refresh"
and wrefresh : window -> unit = 1 "curses_wrefresh"
and newwin: int -> int -> int -> int -> window = 4 "curses_newwin"
and mvwin: window -> int -> int -> unit = 3 "curses_mvwin"
and addch: char -> unit = 1 "curses_addch"
and mvwaddch: window -> int -> int -> char -> unit = 4 "curses_mvwaddch"
and addstr: string -> unit = 1 "curses_addstr"
and mvwaddstr: window -> int -> int -> string -> unit = 4 "curses_mvwaddstr"
;; (* lots more omitted *)
To compile this interface:
camlc -c curses.mli
To implement these functions, we just have to provide the stub code; the
core functions are already implemented in the curses library. The stub code
file, curses.c, looks like:
#include <curses.h>
#include <mlvalues.h>
value curses_initscr(unit)
value unit;
{
return (value) initscr(); /* OK to coerce directly from WINDOW * to value
since that's a block created by malloc() */
}
value curses_wrefresh(win)
value win;
{
wrefresh((WINDOW *) win);
return Val_unit;
}
value curses_newwin(nlines, ncols, x0, y0)
value nlines, ncols, x0, y0;
{
return (value) newwin(Int_val(nlines), Int_val(ncols),
Int_val(x0), Int_val(y0));
}
value curses_addch(c)
value c;
{
addch(Int_val(c)); /* Characters are encoded like integers */
return Val_unit;
}
value curses_addstr(s)
value s;
{
addstr(String_val(s));
return Val_unit;
}
/* This goes on for pages. */
(Actually, it would be better to create a library for the stub code, with each
stub code function in a separate file, so that linking would pick only those
functions from the curses library that are actually used.)
The file curses.c can be compiled with:
cc -c -I/usr/local/lib/caml-light curses.c
or, even simpler,
camlc -c curses.c
(When passed a .c file, the camlc command simply calls cc on that file, with
the right -I option.)
Now, here is a sample Caml Light program test.ml that uses the curses
module:
#open "curses";;
let main_window = initscr () in
let small_window = newwin 10 5 20 10 in
mvwaddstr main_window 10 2 "Hello";
mvwaddstr small_window 4 3 "world";
refresh();
for i = 1 to 100000 do () done;
endwin()
;;
To compile this program, run:
camlc -c test.ml
Finally, to link everything together:
camlc -custom -o test test.zo curses.o -lcurses
Part IV
The Caml Light library
Chapter 13
The core library
This chapter describes the functions provided by the Caml Light core library.
This library is special in two ways:
- It is automatically linked with the user's object code files by the camlc
command (chapter 4). Hence, the globals defined by these libraries can
be used in standalone programs without having to add any .zo file on the
command line for the linking phase. Similarly, in interactive use, these
globals can be used in toplevel phrases without having to load any .zo
file in memory.
- The interfaces for the modules below are automatically ``opened'' when a
compilation starts, or when the toplevel system is launched. Hence, it
is possible to use unqualified identifiers to refer to the functions
provided by these modules, without adding #open directives. Actually,
the list of automatically opened modules depends on the -O option given to
the compiler or to the toplevel system:
--------------------------------------------------------------
|-O option |Opened modules (reverse opening |
| |order) |
--------------------------------------------------------------
|-O cautious (default) |io, eq, int, float, ref, pair, |
| |list, vect, char, string, bool, exc, |
| |stream, builtin |
|-O fast |io, eq, int, float, ref, pair, list, |
| |fvect, fchar, fstring, bool, exc, |
| |stream, builtin |
|-O none |builtin |
--------------------------------------------------------------
Conventions
For easy reference, the modules are listed below in alphabetical order of
module names. For each module, the declarations from its interface file are
printed one by one in typewriter font, followed by a short comment. All
modules and the identifiers they export are indexed at the end of this report.
13.1 bool: boolean operations
value prefix & : bool -> bool -> bool
value prefix && : bool -> bool -> bool
value prefix or : bool -> bool -> bool
value prefix || : bool -> bool -> bool
The boolean and is written e1 & e2 or e1 && e2. The boolean or is
written e1 or e2 or e1 || e2. Both constructs are sequential,
left-to-right: e2 is evaluated only if needed. Actually, e1 & e2 is
equivalent to if e1 then e2 else false, and e1 or e2 is equivalent to
if e1 then true else e2.
value prefix not : bool -> bool
The boolean negation.
value string_of_bool : bool -> string
Return a string representing the given boolean.
13.2 builtin: base types and constructors
This module defines some types and exceptions for which the language
provides special syntax, and which are therefore treated specially by the
compiler.
type int
type float
type string
type char
The types of integers, floating-point numbers, character strings, and
characters, respectively.
type exn
The type of exception values.
type bool = false | true
The type of boolean values.
type 'a vect
The type of arrays whose elements have type 'a.
type unit = ()
The type of the unit value.
type 'a list = [] | prefix :: of 'a * 'a list
The type of lists.
type 'a option = None | Some of 'a
The type of optional values.
exception Match_failure of string * int * int
The exception raised when a pattern-matching fails. The argument
indicates the position in the source code of the pattern-matching (source
file name, position of the first character of the matching, position of
the last character).
13.3 char: character operations
value int_of_char : char -> int
Return the ASCII code of the argument.
value char_of_int : int -> char
Return the character with the given ASCII code. Raise
Invalid_argument "char_of_int" if the argument is outside the range
0 to 255.
value string_of_char : char -> string
Return a string representing the given character.
value char_for_read : char -> string
Return a string representing the given character, with special characters
escaped following the lexical conventions of Caml Light.
13.4 eq: generic comparisons
value prefix = : 'a -> 'a -> bool
e1 = e2 tests for structural equality of e1 and e2. Mutable structures
(e.g. references and arrays) are equal if and only if their current
contents are structurally equal, even if the two mutable objects are not
the same physical object. Equality between functional values raises
Invalid_argument. Equality between cyclic data structures may not
terminate.
value prefix <> : 'a -> 'a -> bool
Negation of prefix =.
value prefix < : 'a -> 'a -> bool
value prefix <= : 'a -> 'a -> bool
value prefix > : 'a -> 'a -> bool
value prefix >= : 'a -> 'a -> bool
Structural ordering functions. These functions coincide with the usual
orderings over integers, strings and floating-point numbers, and extend
them to a total ordering over all types. The ordering is compatible with
prefix =. As in the case of prefix =, mutable structures are compared by
contents. Comparison between functional values raises Invalid_argument.
Comparison between cyclic structures may not terminate.
value compare: 'a -> 'a -> int
compare x y returns 0 if x=y, a negative integer if x<y, and a positive
integer if x>y. The same restrictions as for = apply. compare can be
used as the comparison function required by the set and map modules.
value min: 'a -> 'a -> 'a
Return the smaller of the two arguments.
value max: 'a -> 'a -> 'a
Return the greater of the two arguments.
value prefix == : 'a -> 'a -> bool
e1 == e2 tests for physical equality of e1 and e2. On integers and
characters, it is the same as structural equality. On mutable
structures, e1 == e2 is true if and only if physical modification of e1
also affects e2. On non-mutable structures, the behavior of prefix == is
implementation-dependent, except that e1 == e2 implies e1 = e2.
value prefix != : 'a -> 'a -> bool
Negation of prefix ==.
13.5 exc: exceptions
value raise : exn -> 'a
Raise the given exception value.
A few general-purpose predefined exceptions.
exception Out_of_memory
Raised by the garbage collector, when there is insufficient memory to
complete the computation.
exception Invalid_argument of string
Raised by library functions to signal that the given arguments do not
make sense.
exception Failure of string
Raised by library functions to signal that they are undefined on the
given arguments.
exception Not_found
Raised by search functions when the desired object could not be found.
exception Exit
This exception is not raised by any library function. It is provided for
use in your programs.
value failwith : string -> 'a
Raise exception Failure with the given string.
value invalid_arg : string -> 'a
Raise exception Invalid_argument with the given string.
13.6 fchar: character operations, without sanity checks
This module implements the same functions as the char module, but does
not perform bound checks on the arguments of the functions. The
functions are therefore faster than those in the char module, but calling
these functions with incorrect parameters (that is, parameters that would
cause the Invalid_argument exception to be raised by the corresponding
functions in the char module) can crash the program.
13.7 float: operations on floating-point numbers
value int_of_float : float -> int
Truncate the given float to an integer value. The result is unspecified
if it falls outside the range of representable integers.
value float_of_int : int -> float
Convert an integer to floating-point.
value minus : float -> float
value minus_float : float -> float
Unary negation.
value prefix + : float -> float -> float
value prefix +. : float -> float -> float
value add_float : float -> float -> float
Addition.
value prefix - : float -> float -> float
value prefix -. : float -> float -> float
value sub_float : float -> float -> float
Subtraction.
value prefix * : float -> float -> float
value prefix *. : float -> float -> float
value mult_float : float -> float -> float
Product.
value prefix / : float -> float -> float
value prefix /. : float -> float -> float
value div_float : float -> float -> float
Division.
value prefix ** : float -> float -> float
value prefix **. : float -> float -> float
value power : float -> float -> float
Exponentiation.
value eq_float : float -> float -> bool
value prefix =. : float -> float -> bool
Floating-point equality. Equivalent to generic equality, just faster.
value neq_float : float -> float -> bool
value prefix <>. : float -> float -> bool
Negation of eq_float.
value prefix <. : float -> float -> bool
value lt_float : float -> float -> bool
value prefix >. : float -> float -> bool
value gt_float : float -> float -> bool
value prefix <=. : float -> float -> bool
value le_float : float -> float -> bool
value prefix >=. : float -> float -> bool
value ge_float : float -> float -> bool
Usual comparisons between floating-point numbers.
value acos : float -> float
value asin : float -> float
value atan : float -> float
value atan2 : float -> float -> float
value cos : float -> float
value cosh : float -> float
value exp : float -> float
value log : float -> float
value log10 : float -> float
value sin : float -> float
value sinh : float -> float
value sqrt : float -> float
value tan : float -> float
value tanh : float -> float
Usual transcendental functions on floating-point numbers.
value ceil : float -> float
value floor : float -> float
Round the given float to an integer value. floor f returns the greatest
integer value less than or equal to f. ceil f returns the least integer
value greater than or equal to f.
value abs_float : float -> float
Return the absolute value of the argument.
value mod_float : float -> float -> float
mod_float a b returns the remainder of a with respect to b.
value frexp : float -> float * int
frexp f returns the pair of the significand and the exponent of f (when f
is zero, the significand x and the exponent n of f are equal to zero;
when f is non-zero, they are defined by f = x *. 2 ** n).
value ldexp : float -> int -> float
ldexp x n returns x *. 2 ** n.
value modf : float -> float * float
modf f returns the pair of the fractional and integral part of f.
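frexp, ldexp and modf have the same names and meanings as the C library functions in math.h, on which these Caml Light primitives are presumably built (an assumption; this chapter does not say so). The following C sketch checks the documented identities using the C versions. Note that C's modf returns the fractional part and stores the integral part through a pointer, whereas the Caml Light version returns both as a pair.

```c
#include <math.h>

/* Split f into significand and exponent, then rebuild it: for finite f,
   ldexp(x, n) gives back f exactly, since f = x * 2^n. */
double roundtrip_frexp(double f)
{
    int n;
    double x = frexp(f, &n);   /* in C, 0.5 <= |x| < 1 when f != 0 */
    return ldexp(x, n);
}
```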
value string_of_float : float -> string
Convert the given float to its decimal representation.
value float_of_string : string -> float
Convert the given string to a float, in decimal. The result is
unspecified if the given string is not a valid representation of a float.
13.8 fstring: string operations, without sanity checks
This module implements the same functions as the string module, but does
not perform bound checks on the arguments of the functions. The
functions are therefore faster than those in the string module, but
calling these functions with incorrect parameters (that is, parameters
that would cause the Invalid_argument exception to be raised by the
corresponding functions in the string module) can crash the program.
13.9 fvect: operations on vectors, without sanity checks
This module implements the same functions as the vect module, but does
not perform bound checks on the arguments of the functions. The
functions are therefore faster than those in the vect module, but calling
these functions with incorrect parameters (that is, parameters that would
cause the Invalid_argument exception to be raised by the corresponding
functions in the vect module) can crash the program.
13.10 int: operations on integers
Integers are 31 bits wide (or 63 bits on 64-bit processors). All
operations are taken modulo 2^31 (or 2^63). They do not fail on overflow.
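To see concretely what ``modulo 2^31'' means, the C sketch below reduces an arbitrary 64-bit result into the range of a 31-bit signed integer, from -2^30 to 2^30-1. It models the arithmetic only and is not how the runtime implements it; wrap31 is a hypothetical name.

```c
#include <stdint.h>

/* Reduce x modulo 2^31 into the signed range [-2^30, 2^30 - 1],
   mimicking how 31-bit integers wrap around on overflow. */
int64_t wrap31(int64_t x)
{
    uint64_t u = (uint64_t)x & 0x7FFFFFFFu;   /* keep the low 31 bits */
    if (u & 0x40000000u)                       /* bit 30 acts as the sign bit */
        return (int64_t)u - ((int64_t)1 << 31);
    return (int64_t)u;
}
```

For instance, under this model adding 1 to 2^30-1 wraps around to -2^30.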
exception Division_by_zero
value minus : int -> int
value minus_int : int -> int
Unary negation. You can write -e instead of minus e.
value succ : int -> int
succ x is x+1.
value pred : int -> int
pred x is x-1.
value prefix + : int -> int -> int
value add_int : int -> int -> int
Addition.
value prefix - : int -> int -> int
value sub_int : int -> int -> int
Subtraction.
value prefix * : int -> int -> int
value mult_int : int -> int -> int
Multiplication.
value prefix / : int -> int -> int
value div_int : int -> int -> int
value prefix quo : int -> int -> int
Integer division. Raise Division_by_zero if the second argument is 0.
Give unpredictable results if either argument is negative.
value prefix mod : int -> int -> int
Remainder. Raise Division_by_zero if the second argument is 0. Give
unpredictable results if either argument is negative.
value eq_int : int -> int -> bool
Integer equality. Equivalent to generic equality, just faster.
value neq_int : int -> int -> bool
Negation of eq_int.
value lt_int : int -> int -> bool
value gt_int : int -> int -> bool
value le_int : int -> int -> bool
value ge_int : int -> int -> bool
Usual comparisons between integers.
value abs : int -> int
Return the absolute value of the argument.
value max_int : int
value min_int : int
The greatest and smallest integer values.
Bitwise operations
value prefix land : int -> int -> int
Bitwise logical and.
value prefix lor : int -> int -> int
Bitwise logical or.
value prefix lxor : int -> int -> int
Bitwise logical exclusive or.
value lnot : int -> int
Bitwise complement.
value prefix lsl : int -> int -> int
value lshift_left : int -> int -> int
n lsl m, or equivalently lshift_left n m, shifts n to the left by m bits.
value prefix lsr : int -> int -> int
n lsr m shifts n to the right by m bits. This is a logical shift:
zeroes are inserted regardless of sign.
value prefix asr : int -> int -> int
value lshift_right : int -> int -> int
n asr m, or equivalently lshift_right n m, shifts n to the right by m
bits. This is an arithmetic shift: the sign bit is replicated.
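The difference between the two right shifts is easiest to see on negative numbers. The C sketch below models both on 31-bit integers held in an int32_t; model_lsr and model_asr are hypothetical names, and model_lsr assumes a shift count between 1 and 30. Under this model, -1 lsr 1 yields 2^30-1, the largest 31-bit integer, while -1 asr 1 stays -1.

```c
#include <stdint.h>

#define INT31_MASK 0x7FFFFFFFu

/* Model of n lsr m on 31-bit integers: view n as 31 unsigned bits and
   shift zeroes in from the left (assumes 1 <= m <= 30). */
int32_t model_lsr(int32_t n, unsigned m)
{
    uint32_t bits = (uint32_t)n & INT31_MASK;
    return (int32_t)(bits >> m);
}

/* Model of n asr m: the sign bit is replicated. Written so as not to rely
   on the implementation-defined behaviour of >> on negative C integers. */
int32_t model_asr(int32_t n, unsigned m)
{
    if (n >= 0)
        return n >> m;
    return ~(~n >> m);
}
```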
Conversion functions
value string_of_int : int -> string
Convert the given integer to its decimal representation.
value int_of_string : string -> int
Convert the given string to an integer, in decimal (by default) or in
hexadecimal, octal or binary if the string begins with 0x, 0o or 0b.
Raise Failure "int_of_string" if the given string is not a valid
representation of an integer.
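The base-selection rule can be sketched in C as follows. parse_caml_int is a hypothetical helper, not the library's implementation: it recognises only the lowercase prefixes 0x, 0o and 0b, accepts an optional leading minus sign (an assumption, since the description above does not mention signs), does not check for overflow, and returns false where int_of_string would raise Failure "int_of_string".

```c
#include <stdbool.h>

/* Numeric value of digit character c, or -1 if it is not a digit
   in any base up to 16. */
static int digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse s with base 10 by default, or 16/8/2 after a 0x/0o/0b prefix.
   Returns false on bad input (where int_of_string raises Failure). */
bool parse_caml_int(const char *s, long *result)
{
    long base = 10, acc = 0;
    bool negative = false;
    if (*s == '-') { negative = true; s++; }   /* assumption: sign allowed */
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'o' || s[1] == 'b')) {
        base = s[1] == 'x' ? 16 : s[1] == 'o' ? 8 : 2;
        s += 2;
    }
    if (*s == '\0')
        return false;                           /* no digits at all */
    for (; *s != '\0'; s++) {
        int d = digit_value(*s);
        if (d < 0 || d >= base)
            return false;                       /* invalid digit for base */
        acc = acc * base + d;                   /* overflow not checked */
    }
    *result = negative ? -acc : acc;
    return true;
}
```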
13.11 io: buffered input and output
type in_channel
type out_channel
The abstract types of input channels and output channels.
exception End_of_file
Raised when an operation cannot complete because the end of the file has
been reached.
value stdin : in_channel
value std_in : in_channel
value stdout : out_channel
value std_out : out_channel
value stderr : out_channel
value std_err : out_channel
The standard input, standard output, and standard error output for the
process. std_in, std_out and std_err are respectively synonymous with
stdin, stdout and stderr.
value exit : int -> 'a
Flush all pending writes on std_out and std_err, and terminate the
process, returning the given status code to the operating system (usually
0 to indicate no errors, and a small positive integer to indicate
failure). This function should be called at the end of all standalone
programs that output results on std_out or std_err; otherwise, the
program may appear to produce no output, or its output may be truncated.
Output functions on standard output
value print_char : char -> unit
Print the character on standard output.
value print_string : string -> unit
Print the string on standard output.
value print_int : int -> unit
Print the integer, in decimal, on standard output.
value print_float : float -> unit
Print the floating-point number, in decimal, on standard output.
value print_endline : string -> unit
Print the string, followed by a newline character, on standard output.
value print_newline : unit -> unit
Print a newline character on standard output, and flush standard output.
This can be used to simulate line buffering of standard output.
Output functions on standard error
value prerr_char : char -> unit
Print the character on standard error.
value prerr_string : string -> unit
Print the string on standard error.
value prerr_int : int -> unit
Print the integer, in decimal, on standard error.
value prerr_float : float -> unit
Print the floating-point number, in decimal, on standard error.
value prerr_endline : string -> unit
Print the string, followed by a newline character, on standard error, and
flush standard error.
Input functions on standard input
value read_line : unit -> string
Flush standard output, then read characters from standard input until a
newline character is encountered. Return the string of all characters
read, without the newline character at the end.
value read_int : unit -> int
Flush standard output, then read one line from standard input and convert
it to an integer. Raise Failure "int_of_string" if the line read is not
a valid representation of an integer.
value read_float : unit -> float
Flush standard output, then read one line from standard input and convert
it to a floating-point number. The result is unspecified if the line
read is not a valid representation of a floating-point number.
General output functions
value open_out : string -> out_channel
Open the named file for writing, and return a new output channel on that
file, positioned at the beginning of the file. The file is truncated to
zero length if it already exists. It is created if it does not already
exist. Raise sys__Sys_error if the file could not be opened.
value open_out_bin : string -> out_channel
Same as open_out, but the file is opened in binary mode, so that no
translation takes place during writes. On operating systems that do not
distinguish between text mode and binary mode, this function behaves like
open_out.
value open_out_gen : sys__open_flag list -> int -> string -> out_channel
open_out_gen mode rights filename opens the file named filename for
writing, as above. The extra argument mode specifies the opening mode (see
sys__open). The extra argument rights specifies the file permissions, in
case the file must be created (see sys__open). open_out and open_out_bin
are special cases of this function.
value open_descriptor_out : int -> out_channel
open_descriptor_out fd returns a buffered output channel writing to the
file descriptor fd. The file descriptor fd must have been previously
opened for writing, else the behavior is undefined.
value flush : out_channel -> unit
Flush the buffer associated with the given output channel, performing all
pending writes on that channel. Interactive programs must be careful
about flushing std_out and std_err at the right time.
value output_char : out_channel -> char -> unit
Write the character on the given output channel.
value output_string : out_channel -> string -> unit
Write the string on the given output channel.
value output : out_channel -> string -> int -> int -> unit
output chan buff ofs len writes len characters from string buff, starting
at offset ofs, to the output channel chan. Raise
Invalid_argument "output" if ofs and len do not designate a valid
substring of buff.
value output_byte : out_channel -> int -> unit
Write one 8-bit integer (as the single character with that code) on the
given output channel. The given integer is taken modulo 256.
value output_binary_int : out_channel -> int -> unit
Write one integer in binary format on the given output channel. The only
reliable way to read it back is through the input_binary_int function.
The format is compatible across all machines for a given version of Caml
Light.
value output_value : out_channel -> 'a -> unit
Write the representation of a structured value of any type to a channel.
Circularities and sharing inside the value are detected and preserved.
The object can be read back by the function input_value. The format is
compatible across all machines for a given version of Caml Light.
value output_compact_value : out_channel -> 'a -> unit
Same as output_value, but uses a different format, which occupies less
space in the file, but takes more time to generate and read back.
value seek_out : out_channel -> int -> unit
seek_out chan pos sets the current writing position to pos for channel
chan. This works only for regular files. On files of other kinds (such
as terminals, pipes and sockets), the behavior is unspecified.
value pos_out : out_channel -> int
Return the current writing position for the given channel.
value out_channel_length : out_channel -> int
Return the total length (number of characters) of the given channel.
This works only for regular files. On files of other kinds, the result
is meaningless.
value close_out : out_channel -> unit
Close the given channel, flushing all buffered write operations. The
behavior is unspecified if any of the functions above is called on a
closed channel.
General input functions
value open_in : string -> in_channel
Open the named file for reading, and return a new input channel on that
file, positioned at the beginning of the file. Raise sys__Sys_error if
the file could not be opened.
value open_in_bin : string -> in_channel
Same as open_in, but the file is opened in binary mode, so that no
translation takes place during reads. On operating systems that do not
distinguish between text mode and binary mode, this function behaves like
open_in.
value open_in_gen : sys__open_flag list -> int -> string -> in_channel
open_in_gen mode rights filename opens the file named filename for
reading, as above. The extra arguments mode and rights specify the
opening mode and file permissions (see sys__open). open_in and
open_in_bin are special cases of this function.
value open_descriptor_in : int -> in_channel
open_descriptor_in fd returns a buffered input channel reading from the
file descriptor fd. The file descriptor fd must have been previously
opened for reading, else the behavior is undefined.
value input_char : in_channel -> char
Read one character from the given input channel. Raise End_of_file if
there are no more characters to read.
value input_line : in_channel -> string
Read characters from the given input channel, until a newline character
is encountered. Return the string of all characters read, without the
newline character at the end. Raise End_of_file if the end of the file
is reached at the beginning of a line.
value input : in_channel -> string -> int -> int -> int
input chan buff ofs len attempts to read len characters from channel
chan, storing them in string buff, starting at character number ofs. It
returns the actual number of characters read, between 0 and len
(inclusive). A return value of 0 means that the end of file was reached.
A return value between 0 and len exclusive means that no more characters
were available at that time; input must be called again to read the
remaining characters, if desired. Exception Invalid_argument "input" is
raised if ofs and len do not designate a valid substring of buff.
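Because input may return fewer characters than requested, reading a whole channel with it requires a loop that stops only when input returns 0. The sketch below copies everything from an input channel to an output channel this way. copy_channel and the buffer size of 1024 are illustrative choices, not part of the library.

```caml
(* Copy everything from ic to oc. input may return fewer characters
   than requested, so keep reading until it reports end of file (0). *)
let copy_channel ic oc =
  let buf = create_string 1024 in
  let rec loop () =
    let n = input ic buf 0 (string_length buf) in
    if n > 0 then begin
      output oc buf 0 n;
      loop ()
    end in
  loop ();;
```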
value really_input : in_channel -> string -> int -> int -> unit
really_input chan buff ofs len reads len characters from channel chan,
storing them in string buff, starting at character number ofs. Raise
End_of_file if the end of file is reached before len characters have been
read. Raise Invalid_argument "really_input" if ofs and len do not
designate a valid substring of buff.
value input_byte : in_channel -> int
Same as input_char, but return the 8-bit integer representing the
character. Raise End_of_file if an end of file was reached.
value input_binary_int : in_channel -> int
Read an integer encoded in binary format from the given input channel.
See output_binary_int. Raise End_of_file if an end of file was reached
while reading the integer.
value input_value : in_channel -> 'a
Read the representation of a structured value, as produced by
output_value or output_compact_value, and return the corresponding value.
This is not type-safe. The type of the returned object is not 'a
properly speaking: the returned object has one unique type, which cannot
be determined at compile-time. The programmer should explicitly give the
expected type of the returned value, using the following syntax:
(input_value chan : type). The behavior is unspecified if the object in
the file does not belong to the given type.
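A minimal sketch of saving a structured value with output_value and reading it back with input_value, using the explicit type constraint recommended above. save_table and load_table are hypothetical names. Opening the file in binary mode is a precaution, so that no newline translation can alter the data on systems that distinguish text mode from binary mode.

```caml
(* Write any structured value to a file *)
let save_table filename table =
  let oc = open_out_bin filename in
  output_value oc table;
  close_out oc;;

(* Read it back; input_value is not type-safe, so the expected
   type is stated explicitly *)
let load_table filename =
  let ic = open_in_bin filename in
  let table = (input_value ic : (string * int) list) in
  close_in ic;
  table;;
```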
value seek_in : in_channel -> int -> unit
seek_in chan pos sets the current reading position to pos for channel
chan. This works only for regular files. On files of other kinds, the
behavior is unspecified.
value pos_in : in_channel -> int
Return the current reading position for the given channel.
value in_channel_length : in_channel -> int
Return the total length (number of characters) of the given channel.
This works only for regular files. On files of other kinds, the result
is meaningless.
value close_in : in_channel -> unit
Close the given channel. Anything can happen if any of the functions
above is called on a closed channel.
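The usual idiom for reading a text file line by line is to call input_line until it raises End_of_file. The sketch below writes a list of strings as lines with output_string and reads them back that way. write_lines and read_lines are hypothetical helpers.

```caml
(* Write each string of the list as one line of the file *)
let write_lines filename lines =
  let oc = open_out filename in
  do_list (fun s -> output_string oc s; output_string oc "\n") lines;
  close_out oc;;

(* Read all lines of the file, stopping when input_line raises End_of_file *)
let read_lines filename =
  let ic = open_in filename in
  let lines = ref [] in
  (try
     while true do
       lines := input_line ic :: !lines
     done
   with End_of_file -> ());
  close_in ic;
  rev !lines;;
```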
13.12 list: operations on lists
value list_length : 'a list -> int
Return the length (number of elements) of the given list.
value prefix @ : 'a list -> 'a list -> 'a list
List concatenation.
value hd : 'a list -> 'a
Return the first element of the given list. Raise Failure "hd" if the
list is empty.
value tl : 'a list -> 'a list
Return the given list without its first element. Raise Failure "tl" if
the list is empty.
value rev : 'a list -> 'a list
List reversal.
value map : ('a -> 'b) -> 'a list -> 'b list
map f [a1; ...; an] applies function f to a1, ..., an, and builds the
list [f a1; ...; f an] with the results returned by f.
value do_list : ('a -> unit) -> 'a list -> unit
do_list f [a1; ...; an] applies function f in turn to a1, ..., an,
discarding all the results. It is equivalent to
begin f a1; f a2; ...; f an; () end.
value it_list : ('a -> 'b -> 'a) -> 'a -> 'b list -> 'a
it_list f a [b1; ...; bn] is f (... (f (f a b1) b2) ...) bn.
value list_it : ('a -> 'b -> 'b) -> 'a list -> 'b -> 'b
list_it f [a1; ...; an] b is f a1 (f a2 (... (f an b) ...)).
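The difference between it_list (which folds from the left) and list_it (which folds from the right) shows up as soon as the function is not associative, as with subtraction. The comments give the values predicted by the two definitions above.

```caml
let sum = it_list (prefix +) 0 [1; 2; 3];;       (* 6 *)
let left = it_list (prefix -) 0 [1; 2; 3];;      (* ((0 - 1) - 2) - 3 = -6 *)
let right = list_it (prefix -) [1; 2; 3] 0;;     (* 1 - (2 - (3 - 0)) = 2 *)

(* Reversing a list with it_list, copying it with list_it *)
let reversed = it_list (fun acc x -> x :: acc) [] [1; 2; 3];;   (* [3; 2; 1] *)
let copied = list_it (fun x acc -> x :: acc) [1; 2; 3] [];;     (* [1; 2; 3] *)
```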
value map2 : ('a -> 'b -> 'c) -> 'a list -> 'b list -> 'c list
map2 f [a1; ...; an] [b1; ...; bn] is [f a1 b1; ...; f an bn]. Raise
Invalid_argument "map2" if the two lists have different lengths.
value do_list2 : ('a -> 'b -> unit) -> 'a list -> 'b list -> unit
do_list2 f [a1; ...; an] [b1; ...; bn] calls in turn
f a1 b1; ...; f an bn, discarding the results. Raise
Invalid_argument "do_list2" if the two lists have different lengths.
value it_list2 : ('a -> 'b -> 'c -> 'a) -> 'a -> 'b list -> 'c list -> 'a
it_list2 f a [b1; ...; bn] [c1; ...; cn] is
f (... (f (f a b1 c1) b2 c2) ...) bn cn. Raise
Invalid_argument "it_list2" if the two lists have different lengths.
value list_it2 : ('a -> 'b -> 'c -> 'c) -> 'a list -> 'b list -> 'c -> 'c
list_it2 f [a1; ...; an] [b1; ...; bn] c is
f a1 b1 (f a2 b2 (... (f an bn c) ...)). Raise
Invalid_argument "list_it2" if the two lists have different lengths.
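The two-list variants walk both lists in step. Below, it_list2 computes a dot product and map2 adds two lists element by element. dot is a hypothetical helper; the last phrase checks that lists of different lengths are rejected with Invalid_argument.

```caml
(* Hypothetical helper: dot product of two integer lists *)
let dot l1 l2 = it_list2 (fun acc x y -> acc + x * y) 0 l1 l2;;
let d = dot [1; 2; 3] [4; 5; 6];;                    (* 4 + 10 + 18 = 32 *)

(* Element-wise sums *)
let sums = map2 (prefix +) [1; 2; 3] [10; 20; 30];;  (* [11; 22; 33] *)

(* Lists of different lengths are rejected *)
let mismatch_detected =
  try let _ = map2 (prefix +) [1; 2] [1] in false
  with Invalid_argument _ -> true;;                  (* true *)
```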
value flat_map : ('a -> 'b list) -> 'a list -> 'b list
flat_map f [l1; ...; ln] is (f l1) @ (f l2) @ ... @ (f ln).
value for_all : ('a -> bool) -> 'a list -> bool
for_all p [a1; ...; an] is (p a1) & (p a2) & ... & (p an).
value exists : ('a -> bool) -> 'a list -> bool
exists p [a1; ...; an] is (p a1) or (p a2) or ... or (p an).
value mem : 'a -> 'a list -> bool
mem a l is true if and only if a is structurally equal (see module eq) to
an element of l.
value memq : 'a -> 'a list -> bool
memq a l is true if and only if a is physically equal (see module eq) to
an element of l.
value except : 'a -> 'a list -> 'a list
except a l returns the list l where the first element structurally equal
to a has been removed. The list l is returned unchanged if it does not
contain a.
value exceptq : 'a -> 'a list -> 'a list
Same as except, with physical equality instead of structural equality.
value subtract : 'a list -> 'a list -> 'a list
subtract l1 l2 returns the list l1 where all elements structurally equal
to one of the elements of l2 have been removed.
value union : 'a list -> 'a list -> 'a list
union l1 l2 appends before list l2 all the elements of list l1 that are
not structurally equal to an element of l2.
value intersect : 'a list -> 'a list -> 'a list
intersect l1 l2 returns the list of the elements of l1 that are
structurally equal to an element of l2.
value index : 'a -> 'a list -> int
index a l returns the position of the first element of list l that is
structurally equal to a. The head of the list has position 0. Raise
Not_found if a is not present in l.
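A few applications of the list-as-set functions above to a list containing a duplicate, to make the distinction between "first occurrence" (except) and "all occurrences" (subtract) concrete.

```caml
let l = [1; 2; 3; 2; 4];;
let drop_first = except 2 l;;                  (* [1; 3; 2; 4]: only the first 2 goes *)
let drop_all = subtract l [2; 4];;             (* [1; 3]: every 2 and 4 goes *)
let common = intersect [1; 2; 3; 4] [4; 2];;   (* [2; 4], in the order of the first list *)
let pos = index 3 l;;                          (* 2, counting from 0 *)
let absent = try index 5 l with Not_found -> (-1);;   (* -1 *)
```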
value assoc : 'a -> ('a * 'b) list -> 'b
assoc a l returns the value associated with key a in the list of pairs l.
That is, assoc a [ ...; (a,b); ...] = b if (a,b) is the leftmost binding
of a in list l. Raise Not_found if there is no value associated with a
in the list l.
value assq : 'a -> ('a * 'b) list -> 'b
Same as assoc, but use physical equality instead of structural equality
to compare keys.
value mem_assoc : 'a -> ('a * 'b) list -> bool
Same as assoc, but simply return true if a binding exists, and false if
no bindings exist for the given key.
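Association lists are searched from the left, so the first binding of a key hides any later one. assoc_default is a hypothetical helper that turns Not_found into a default value.

```caml
(* "red" is bound twice; the leftmost binding wins *)
let table = [("red", 1); ("green", 2); ("red", 3)];;
let r = assoc "red" table;;               (* 1 *)
let has_blue = mem_assoc "blue" table;;   (* false *)

(* Hypothetical helper: look a key up, falling back to a default *)
let assoc_default key default l =
  try assoc key l with Not_found -> default;;

let b = assoc_default "blue" 0 table;;    (* 0 *)
```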
13.13 pair: operations on pairs
value fst : 'a * 'b -> 'a
Return the first component of a pair.
value snd : 'a * 'b -> 'b
Return the second component of a pair.
value split : ('a * 'b) list -> 'a list * 'b list
Transform a list of pairs into a pair of lists:
split [(a1,b1); ...; (an,bn)] is ([a1; ...; an], [b1; ...; bn])
value combine : 'a list * 'b list -> ('a * 'b) list
Transform a pair of lists into a list of pairs:
combine ([a1; ...; an], [b1; ...; bn]) is [(a1,b1); ...; (an,bn)]. Raise
Invalid_argument "combine" if the two lists have different lengths.
value map_combine : ('a * 'b -> 'c) -> 'a list * 'b list -> 'c list
map_combine f ([a1; ...; an], [b1; ...; bn]) is
[f (a1, b1); ...; f (an, bn)]. Raise Invalid_argument "map_combine" if
the two lists have different lengths.
value do_list_combine : ('a * 'b -> unit) -> 'a list * 'b list -> unit
do_list_combine f ([a1; ...; an], [b1; ...; bn]) calls in turn
f (a1, b1); ...; f (an, bn), discarding the results. Raise
Invalid_argument "do_list_combine" if the two lists have different
lengths.
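split and combine are inverse operations on lists of equal length. The sketch below also uses map_combine to build labels from the two halves; the names bound are illustrative.

```caml
let pairs = [("a", 1); ("b", 2); ("c", 3)];;
let (names, values) = split pairs;;     (* ["a"; "b"; "c"] and [1; 2; 3] *)
let rebuilt = combine (names, values);; (* equal to pairs *)

(* map_combine applies the function to each reassembled pair *)
let labels = map_combine (fun (n, v) -> n ^ string_of_int v) (names, values);;
(* ["a1"; "b2"; "c3"] *)
```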
13.14 ref: operations on references
type 'a ref = ref of mutable 'a
The type of references (mutable indirection cells) containing a value of
type 'a.
value prefix ! : 'a ref -> 'a
!r returns the current contents of reference r. Could be defined as
fun (ref x) -> x.
value prefix := : 'a ref -> 'a -> unit
r := a stores the value of a in reference r.
value incr : int ref -> unit
Increment the integer contained in the given reference. Could be defined
as fun r -> r := succ !r.
value decr : int ref -> unit
Decrement the integer contained in the given reference. Could be defined
as fun r -> r := pred !r.
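References are typically used as local accumulators inside otherwise functional code. count_if below is a hypothetical helper that counts the matching elements of a list with incr.

```caml
(* Hypothetical helper: count the elements of l that satisfy p,
   using a local reference as a counter *)
let count_if p l =
  let n = ref 0 in
  do_list (fun x -> if p x then incr n) l;
  !n;;

let big = count_if (fun x -> x > 2) [1; 2; 3; 4];;   (* 2 *)
```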
13.15 stream: operations on streams
type 'a stream
The type of streams containing values of type 'a.
exception Parse_failure
Raised by parsers when none of the first components of the stream patterns
is accepted.
exception Parse_error
Raised by parsers when the first component of a stream pattern is
accepted, but one of the following components is rejected.
value stream_next : 'a stream -> 'a
stream_next s returns the first element of stream s, and removes it from
the stream. Raise Parse_failure if the stream is empty.
value stream_from : (unit -> 'a) -> 'a stream
stream_from f returns the stream which fetches its terminals using the
function f. This function could be defined as:
let rec stream_from f = [< 'f(); stream_from f >]
but is implemented more efficiently.
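stream_from is convenient for streams whose elements are computed on demand. The sketch below builds a stream of natural numbers from a function that updates a private counter; naturals is a hypothetical name.

```caml
(* Each call of the function returns the next natural number *)
let naturals () =
  let n = ref (-1) in
  stream_from (fun () -> incr n; !n);;

let s = naturals ();;
let first = stream_next s;;    (* 0 *)
let second = stream_next s;;   (* 1 *)
```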
value stream_of_string : string -> char stream
stream_of_string s returns the stream of the characters in string s.
value stream_of_channel : in_channel -> char stream
stream_of_channel ic returns the stream of characters read on channel ic.
value do_stream : ('a -> unit) -> 'a stream -> unit
do_stream f s scans the whole stream s, applying the function f in turn
to each terminal encountered.
value stream_check : ('a -> bool) -> 'a stream -> 'a
stream_check p returns the parser which returns the first terminal of the
stream if the predicate p returns true on this terminal, and raises
Parse_failure otherwise.
value end_of_stream : 'a stream -> unit
Return () iff the stream is empty, and raise Parse_failure otherwise.
value stream_get : 'a stream -> 'a * 'a stream
stream_get s returns the first element of the stream s, and a stream
containing the remaining elements of s. Raise Parse_failure if the
stream is empty. The stream s is not modified. This function makes it
possible to access a stream non-destructively.
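Because stream_get leaves its argument untouched, it can serve as a non-destructive "peek". peek below is a hypothetical helper (character constants are written between backquotes in Caml Light). The last phrases show that the second component of the result holds the remaining elements.

```caml
(* Hypothetical helper: look at the next element without consuming it *)
let peek s = fst (stream_get s);;

let s = stream_of_string "abc";;
let looked = peek s;;          (* `a`; s still starts with `a` *)
let taken = stream_next s;;    (* `a` again, now removed from s *)
let following = peek s;;       (* `b` *)

(* The second component of stream_get holds the remaining elements *)
let (_, rest) = stream_get (stream_of_string "xyz");;
let after_x = stream_next rest;;   (* `y` *)
```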
13.16 string: string operations
value string_length : string -> int
Return the length (number of characters) of the given string.
value nth_char : string -> int -> char
nth_char s n returns character number n in string s. The first character
is character number 0. The last character is character number
string_length s - 1. Raise Invalid_argument "nth_char" if n is outside
the range 0 to (string_length s - 1). You can also write s.[n] instead
of nth_char s n.
value set_nth_char : string -> int -> char -> unit
set_nth_char s n c modifies string s in place, replacing the character
number n by c. Raise Invalid_argument "set_nth_char" if n is outside the
range 0 to (string_length s - 1). You can also write s.[n] <- c instead
of set_nth_char s n c.
value prefix ^ : string -> string -> string
s1 ^ s2 returns a fresh string containing the concatenation of the
strings s1 and s2.
value concat : string list -> string
Return a fresh string containing the concatenation of all the strings in
the argument list.
value sub_string : string -> int -> int -> string
sub_string s start len returns a fresh string of length len, containing
the characters number start to start + len - 1 of string s. Raise
Invalid_argument "sub_string" if start and len do not designate a valid
substring of s; that is, if start < 0, or len < 0, or
start + len > string_length s.
value create_string : int -> string
create_string n returns a fresh string of length n. The string initially
contains arbitrary characters.
value make_string : int -> char -> string
make_string n c returns a fresh string of length n, filled with the
character c.
value fill_string : string -> int -> int -> char -> unit
fill_string s start len c modifies string s in place, replacing the
characters number start to start + len - 1 by c. Raise
Invalid_argument "fill_string" if start and len do not designate a valid
substring of s.
value blit_string : string -> int -> string -> int -> int -> unit
blit_string s1 o1 s2 o2 len copies len characters from string s1,
starting at character number o1, to string s2, starting at character
number o2. It works correctly even if s1 and s2 are the same string, and
the source and destination chunks overlap. Raise
Invalid_argument "blit_string" if o1 and len do not designate a valid
substring of s1, or if o2 and len do not designate a valid substring of
s2.
value replace_string : string -> string -> int -> unit
replace_string dest src start copies all characters from the string src
into the string dest, starting at character number start in dest. Raise
Invalid_argument "replace_string" if copying would overflow string dest.
value eq_string : string -> string -> bool
value neq_string : string -> string -> bool
value le_string : string -> string -> bool
value lt_string : string -> string -> bool
value ge_string : string -> string -> bool
value gt_string : string -> string -> bool
Comparison functions (lexicographic ordering) between strings.
value compare_strings : string -> string -> int
General comparison between strings. compare_strings s1 s2 returns 0 if
s1 and s2 are equal, or else -2 if s1 is a prefix of s2, or 2 if s2 is a
prefix of s1, or else -1 if s1 is lexicographically before s2, or 1 if s2
is lexicographically before s1.
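The special results -2 and 2 make compare_strings usable as a prefix test. is_prefix is a hypothetical helper built on that property.

```caml
(* compare_strings distinguishes prefixes from other orderings *)
let c1 = compare_strings "abc" "abc";;    (* 0 *)
let c2 = compare_strings "ab" "abc";;     (* -2: the first is a prefix of the second *)
let c3 = compare_strings "abc" "ab";;     (* 2 *)
let c4 = compare_strings "abc" "abd";;    (* -1 *)

(* Hypothetical helper: s1 is a prefix of s2 (possibly equal to it) *)
let is_prefix s1 s2 =
  let c = compare_strings s1 s2 in
  c = 0 or c = (-2);;
```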
value string_for_read : string -> string
Return a copy of the argument, with special characters represented by
escape sequences, following the lexical conventions of Caml Light.
value index_char: string -> char -> int
index_char s c returns the position of the leftmost occurrence of
character c in string s. Raise Not_found if c does not occur in s.
value rindex_char: string -> char -> int
rindex_char s c returns the position of the rightmost occurrence of
character c in string s. Raise Not_found if c does not occur in s.
value index_char_from: string -> int -> char -> int
value rindex_char_from: string -> int -> char -> int
Same as index_char and rindex_char, but start searching at the character
position given as second argument. index_char s c is equivalent to
index_char_from s 0 c, and rindex_char s c to
rindex_char_from s (string_length s - 1) c.
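index_char_from combined with sub_string is enough to cut a string into fields. split_on below is a hypothetical helper, not a library function. It never calls index_char_from with a starting position past the end of the string, and it keeps empty fields.

```caml
(* Hypothetical helper: cut s into the fields separated by the
   character sep; empty fields are kept *)
let split_on sep s =
  let len = string_length s in
  let rec fields start =
    (* Position of the next separator, or len if there is none left *)
    let stop =
      if start >= len then len
      else (try index_char_from s start sep with Not_found -> len) in
    let field = sub_string s start (stop - start) in
    if stop = len then [field] else field :: fields (stop + 1) in
  fields 0;;
```

For example, split_on `,` "a,b,,c" yields ["a"; "b"; ""; "c"].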
13.17 vect: operations on vectors
value vect_length : 'a vect -> int
Return the length (number of elements) of the given vector.
value vect_item : 'a vect -> int -> 'a
vect_item v n returns the element number n of vector v. The first
element has number 0. The last element has number vect_length v - 1.
Raise Invalid_argument "vect_item" if n is outside the range 0 to
(vect_length v - 1). You can also write v.(n) instead of vect_item v n.
value vect_assign : 'a vect -> int -> 'a -> unit
vect_assign v n x modifies vector v in place, replacing element number n
with x. Raise Invalid_argument "vect_assign" if n is outside the range 0
to vect_length v - 1. You can also write v.(n) <- x instead of
vect_assign v n x.
value make_vect : int -> 'a -> 'a vect
make_vect n x returns a fresh vector of length n, initialized with x.
All the elements of this new vector are initially physically equal to x
(see module eq). Consequently, if x is mutable, it is shared among all
elements of the vector, and modifying x through one of the vector entries
will modify all other entries at the same time.
value make_matrix : int -> int -> 'a -> 'a vect vect
make_matrix dimx dimy e returns a two-dimensional array (a vector of
vectors) with first dimension dimx and second dimension dimy. All the
elements of this new matrix are initially physically equal to e. The
element (x,y) of a matrix m is accessed with the notation m.(x).(y).
value init_vect : int -> (int -> 'a) -> 'a vect
init_vect n f returns a fresh array of length n, with element number i
equal to f i.
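The sharing described for make_vect matters most when the initial value is itself a vector. In the sketch below, every row of shared is the same vector, so an assignment through one row shows through all the others. Building the rows with init_vect calls the function once per element, so each row is a distinct vector.

```caml
(* All three rows are physically the same vector *)
let shared = make_vect 3 (make_vect 2 0);;
shared.(0).(0) <- 1;;
let seen = shared.(2).(0);;       (* 1: the assignment shows through every row *)

(* init_vect builds a fresh row for each index *)
let rows = init_vect 3 (fun _ -> make_vect 2 0);;
rows.(0).(0) <- 1;;
let unaffected = rows.(2).(0);;   (* 0 *)
```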
value concat_vect : 'a vect -> 'a vect -> 'a vect
concat_vect v1 v2 returns a fresh vector containing the concatenation of
vectors v1 and v2.
value sub_vect : 'a vect -> int -> int -> 'a vect
sub_vect v start len returns a fresh vector of length len, containing the
elements number start to start + len - 1 of vector v. Raise
Invalid_argument "sub_vect" if start and len do not designate a valid
subvector of v; that is, if start < 0, or len < 0, or
start + len > vect_length v.
value copy_vect : 'a vect -> 'a vect
copy_vect v returns a copy of v, that is, a fresh vector containing the
same elements as v.
value fill_vect : 'a vect -> int -> int -> 'a -> unit
fill_vect v ofs len x modifies the vector v in place, storing x in
elements number ofs to ofs + len - 1. Raise Invalid_argument "fill_vect"
if ofs and len do not designate a valid subvector of v.
value blit_vect : 'a vect -> int -> 'a vect -> int -> int -> unit
blit_vect v1 o1 v2 o2 len copies len elements from vector v1, starting at
element number o1, to vector v2, starting at element number o2. It works
correctly even if v1 and v2 are the same vector, and the source and
destination chunks overlap. Raise Invalid_argument "blit_vect" if o1 and
len do not designate a valid subvector of v1, or if o2 and len do not
designate a valid subvector of v2.
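Because blit_vect handles overlapping source and destination ranges, it can shift elements within a single vector. The phrases below move the first four elements one position to the right.

```caml
let v = [| 1; 2; 3; 4; 5 |];;
(* Copy elements 0..3 of v onto elements 1..4 of the same vector *)
blit_vect v 0 v 1 4;;
let after = list_of_vect v;;   (* [1; 1; 2; 3; 4] *)
```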
value list_of_vect : 'a vect -> 'a list
list_of_vect v returns the list of all the elements of v, that is:
[v.(0); v.(1); ...; v.(vect_length v - 1)].
value vect_of_list : 'a list -> 'a vect
vect_of_list l returns a fresh vector containing the elements of l.
value map_vect : ('a -> 'b) -> 'a vect -> 'b vect
map_vect f v applies function f to all the elements of v, and builds a
vector with the results returned by f:
[| f v.(0); f v.(1); ...; f v.(vect_length v - 1) |].
value map_vect_list : ('a -> 'b) -> 'a vect -> 'b list
map_vect_list f v applies function f to all the elements of v, and builds
a list with the results returned by f:
[ f v.(0); f v.(1); ...; f v.(vect_length v - 1) ].
value do_vect : ('a -> unit) -> 'a vect -> unit
do_vect f v applies function f in turn to all the elements of v,
discarding all the results:
f v.(0); f v.(1); ...; f v.(vect_length v - 1); ().
Chapter 14
The standard library
This chapter describes the functions provided by the Caml Light standard
library. Like the modules from the core library, the modules from the
standard library are automatically linked with the user's object code files by
the camlc command. Hence, the globals defined by these libraries can be used
in standalone programs without having to add any .zo file on the command line
for the linking phase. Similarly, in interactive use, these globals can be
used in toplevel phrases without having to load any .zo file in memory.
Unlike the modules from the core library, the modules from the standard
library are not automatically "opened" when a compilation starts, or when
the toplevel system is launched. Hence it is necessary to use qualified
identifiers to refer to the functions provided by these modules, or to add
#open directives.
Conventions
For easy reference, the modules are listed below in alphabetical order of
module names. For each module, the declarations from its interface file are
printed one by one in typewriter font, followed by a short comment. All
modules and the identifiers they export are indexed at the end of this report.
14.1 arg: parsing of command line arguments
This module provides a general mechanism for extracting options and
arguments from the command line to the program.
Syntax of command lines: A keyword is a character string starting with a
-. An option is a keyword alone or followed by an argument. There are
four types of keywords: Unit, String, Int, and Float. Unit keywords do
not take an argument. String, Int, and Float keywords take the following
word on the command line as an argument. Arguments not preceded by a
keyword are called anonymous arguments.
Examples (cmd is assumed to be the command name):
cmd -flag (a unit option)
cmd -int 1 (an int option with argument 1)
cmd -string foobar (a string option with argument "foobar")
cmd -float 12.34 (a float option with argument 12.34)
cmd 1 2 3 (three anonymous arguments: "1", "2", and "3")
cmd 1 2 -flag 3 -string bar 4
(four anonymous arguments, a unit option, and
a string option with argument "bar")
type spec =
String of (string -> unit)
| Int of (int -> unit)
| Unit of (unit -> unit)
| Float of (float -> unit)
The concrete type describing the behavior associated with a keyword.
value parse : (string * spec) list -> (string -> unit) -> unit
parse speclist anonfun parses the command line, calling the functions in
speclist whenever appropriate, and anonfun on anonymous arguments. The
functions are called in the same order as they appear on the command
line. The strings in the (string * spec) list are keywords and must
start with a -, else they are ignored. For the user to be able to
specify anonymous arguments starting with a -, include for example
("--", String anonfun) in speclist.
exception Bad of string
Functions in speclist or anonfun can raise Bad with an error message to
reject invalid arguments.
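A sketch of a typical call to arg__parse for a hypothetical program that accepts a -v flag, a -level option with an integer argument, and anonymous file names. Since arg is a standard-library module, its identifiers are written in qualified form here (an #open "arg" directive would allow the short names). The -level handler shows how Bad rejects an invalid argument.

```caml
(* Settings of a hypothetical program, updated by the option handlers *)
let verbose = ref false
and level = ref 0
and files = ref ([] : string list);;

arg__parse
  [("-v", arg__Unit (fun () -> verbose := true));
   ("-level", arg__Int (fun n ->
                if n < 0
                then raise (arg__Bad "-level expects a non-negative integer")
                else level := n))]
  (* Anonymous arguments are accumulated in reverse order *)
  (fun name -> files := name :: !files);;
```

Invoked as cmd -v -level 3 a.ml b.ml, this would leave verbose set to true, level set to 3, and files set to ["b.ml"; "a.ml"].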
14.2 baltree: basic balanced binary trees
This module implements balanced ordered binary trees. All operations
over binary trees are applicative (no side-effects). The set and map
modules are based on this module. This module gives a more direct
access to the internals of the binary tree implementation than the set
and map abstractions, but is more delicate to use and not as safe. For
advanced users only.
type 'a t = Empty | Node of 'a t * 'a * 'a t * int
The type of trees containing elements of type 'a. Empty is the empty
tree (containing no elements).
type 'a contents = Nothing | Something of 'a
Used with the functions modify and split, to represent the presence or
the absence of an element in a tree.
value add: ('a -> int) -> 'a -> 'a t -> 'a t
add f x t inserts the element x into the tree t. f is an ordering
function: f y must return 0 if x and y are equal (or equivalent), a
negative integer if x is smaller than y, and a positive integer if x is
greater than y. The tree t is returned unchanged if it already contains
an element equivalent to x (that is, an element y such that f y is 0).
The ordering f must be consistent with the orderings used to build t with
add, remove, modify or split operations.
value contains: ('a -> int) -> 'a t -> bool
contains f t checks whether t contains an element satisfying f, that is,
an element x such that f x is 0. f is an ordering function with the same
constraints as for add. It can be coarser (identify more elements) than
the orderings used to build t, but must be consistent with them.
value find: ('a -> int) -> 'a t -> 'a
Same as contains, except that find f t returns the element x such that
f x is 0, or raises Not_found if none has been found.
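The functions of baltree take an ordering function rather than an element to compare against. The sketch below builds such functions from a hypothetical comparison cmp_int, so that cmp_int x returns 0 exactly on elements equal to x, a negative integer on elements greater than x, and a positive integer on smaller ones, which matches the convention required by add.

```caml
(* Hypothetical comparison: cmp_int x y is negative when x is smaller
   than y, zero when they are equal, positive when x is greater *)
let cmp_int x y = if x < y then (-1) else if x > y then 1 else 0;;

(* cmp_int x is the ordering function that add expects for element x *)
let insert x t = baltree__add (cmp_int x) x t;;

(* Build a tree from a list; the second 3 leaves the tree unchanged *)
let tree = it_list (fun t x -> insert x t) baltree__Empty [5; 3; 8; 3];;

let has_8 = baltree__contains (cmp_int 8) tree;;   (* true *)
let has_4 = baltree__contains (cmp_int 4) tree;;   (* false *)
let three = baltree__find (cmp_int 3) tree;;       (* 3 *)
let missing = try baltree__find (cmp_int 4) tree with Not_found -> 0;;   (* 0 *)
```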
value remove: ('a -> int) -> 'a t -> 'a t
remove f t removes one element x of t such that f x is 0. f is an
ordering function with the same constraints as for add. t is returned
unchanged if it does not contain any element satisfying f. If several
elements of t satisfy f, only one is removed.
value modify: ('a -> int) -> ('a contents -> 'a contents) -> 'a t -> 'a t
General insertion/modification/deletion function. modify f g t searches t
for an element x satisfying the ordering function f. If one is found, g
is applied to Something x; if g returns Nothing, the element x is
removed; if g returns Something y, the element y replaces x in the tree.
(It is assumed that x and y are equivalent, in particular, that f y is
0.) If the tree does not contain any x satisfying f, g is applied to
Nothing; if it returns Nothing, the tree is returned unchanged; if it
returns Something x, the element x is inserted in the tree. (It is
assumed that f x is 0.) The functions add and remove are special cases
of modify, but are slightly more efficient.
value split: ('a -> int) -> 'a t -> 'a t * 'a contents * 'a t
split f t returns a triple (less, elt, greater) where less is a tree
containing all elements x of t such that f x is negative, greater is a
tree containing all elements x of t such that f x is positive, and elt is
Something x if t contains an element x such that f x is 0, and Nothing
otherwise.
value compare: ('a -> 'a -> int) -> 'a t -> 'a t -> int
Compare two trees. The first argument f is a comparison function over
the tree elements: f e1 e2 is zero if the elements e1 and e2 are equal,
negative if e1 is smaller than e2, and positive if e1 is greater than e2.
compare f t1 t2 compares the fringes of t1 and t2 by lexicographic
extension of f.
14.3 filename: operations on file names
value current_dir_name : string
The conventional name for the current directory (e.g. "." in Unix).
value concat : string -> string -> string
concat dir file returns a file name that designates file file in
directory dir.
value is_absolute : string -> bool
Return true if the file name is absolute or starts with an explicit
reference to the current directory (./ or ../ in Unix), and false if it
is relative to the current directory.
value check_suffix : string -> string -> bool
check_suffix name suff returns true if the filename name ends with the
suffix suff.
value chop_suffix : string -> string -> string
chop_suffix name suff removes the suffix suff from the filename name.
The behavior is undefined if name does not end with the suffix suff.
value basename : string -> string
value dirname : string -> string
Split a file name into a directory name and a base file name.
concat (dirname name) (basename name) returns a file name which is
equivalent to name. Moreover, after setting the current directory to
dirname name (with sys__chdir), references to basename name (which is a
relative file name) designate the same file as name before the call to
chdir.
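As an illustration of check_suffix and chop_suffix, the following sketch derives an object file name from a source file name. The helper object_file_name and the ".ml"/".zo" suffix pair are assumptions chosen for the example; they are not part of the filename module. Testing the suffix first keeps chop_suffix out of its undefined case.

```caml
(* Hypothetical helper, not part of the library: turn "dir/main.ml"
   into "dir/main.zo". *)
let object_file_name src =
  if filename__check_suffix src ".ml"
  then filename__chop_suffix src ".ml" ^ ".zo"
  else failwith ("not a .ml file: " ^ src);;

object_file_name "main.ml";;   (* "main.zo" *)
```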
14.4 format: pretty printing
This module implements a pretty-printing facility to format text within
``pretty-printing boxes''. The pretty-printer breaks lines at specified
break hints, and indents lines according to the box structure.
Rule of thumb for casual users:
use simple boxes (as obtained by open_box 0);
use simple break hints (as obtained by print_cut () that outputs a simple
break hint, or by print_space () that outputs a space indicating a break
hint);
once a box is opened, display its material with basic printing functions
(e.g. print_int and print_string);
when the material for a box has been printed, call close_box () to close
the box;
at the end of your routine, evaluate print_newline () to close all
remaining boxes and flush the pretty-printer.
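The rule of thumb above can be sketched as follows. The function print_int_list is a hypothetical example, not a library function: it opens a single box, prints the list elements with a break hint after each separator, closes the box, and lets print_newline flush the pretty-printer.

```caml
(* Hypothetical example: print an integer list as [1; 2; 3] in one box,
   with a break hint after each separator. *)
let print_int_list l =
  format__open_box 1;            (* offset 1: continuation lines align after "[" *)
  format__print_string "[";
  let rec loop = function
      [] -> ()
    | [x] -> format__print_int x
    | x :: rest ->
        format__print_int x;
        format__print_string ";";
        format__print_space ();  (* one space, or a line break *)
        loop rest in
  loop l;
  format__print_string "]";
  format__close_box ();;

print_int_list [1; 2; 3]; format__print_newline ();;   (* prints [1; 2; 3] *)
```

With a long list, the breaks after the semicolons let the pretty-printer split the output over several lines instead of overflowing the right margin.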
You may alternatively consider this module as providing an extension to
the printf facility: you can simply add pretty-printing annotations to
your regular printf formats, as explained below in the documentation of
the function fprintf.
The behaviour of pretty-printing commands is unspecified if there is no
opened pretty-printing box. Each box opened via one of the open_
functions below must be closed using close_box for proper formatting.
Otherwise, some of the material printed in the boxes may not be output,
or may be formatted incorrectly.
In case of interactive use, the system closes all opened boxes and
flushes all pending text (as with the print_newline function) after each
phrase. Each phrase is therefore executed in the initial state of the
pretty-printer.
Boxes
value open_box : int -> unit
open_box d opens a new pretty-printing box with offset d. This box is
the general purpose pretty-printing box. Material in this box is
displayed ``horizontal or vertical'': break hints inside the box may
lead to a new line, if there is no more room on the line to print the
remainder of the box, or if a new line may lead to a new indentation
(demonstrating the indentation of the box). When a new line is printed
in the box, d is added to the current indentation.
value close_box : unit -> unit
Close the most recently opened pretty-printing box.
Formatting functions
value print_string : string -> unit
print_string str prints str in the current box.
value print_as : int -> string -> unit
print_as len str prints str in the current box. The pretty-printer
formats str as if it were of length len.
value print_int : int -> unit
Print an integer in the current box.
value print_float : float -> unit
Print a floating point number in the current box.
value print_char : char -> unit
Print a character in the current box.
value print_bool : bool -> unit
Print a boolean in the current box.
Break hints
value print_space : unit -> unit
print_space () is used to separate items (typically to print a space
between two words). It indicates that the line may be split at this
point. It either prints one space or splits the line. It is equivalent
to print_break 1 0.
value print_cut : unit -> unit
print_cut () is used to mark a good break position. It indicates that
the line may be split at this point. It either prints nothing or splits
the line. This allows line splitting at the current point, without
printing spaces or adding indentation. It is equivalent to
print_break 0 0.
value print_break : int -> int -> unit
Insert a break hint in a pretty-printing box. print_break nspaces offset
indicates that the line may be split (a newline character is printed) at
this point, if the contents of the current box do not fit on one line.
If the line is split at that point, offset is added to the current
indentation. If the line is not split, nspaces spaces are printed.
value print_flush : unit -> unit
Flush the pretty printer: all opened boxes are closed, and all pending
text is displayed.
value print_newline : unit -> unit
Equivalent to print_flush followed by a new line.
value force_newline : unit -> unit
Force a newline in the current box. This is not the normal way of
pretty-printing; you should prefer break hints.
value print_if_newline : unit -> unit
Execute the next formatting command if the preceding line has just been
split. Otherwise, ignore the next formatting command.
Margin
value set_margin : int -> unit
set_margin d sets the value of the right margin to d (in characters):
this value is used to detect line overflows that lead to split lines.
Nothing happens if d is smaller than 2 or bigger than 999999999.
value get_margin : unit -> int
Return the position of the right margin.
Maximum indentation limit
value set_max_indent : int -> unit
set_max_indent d sets the value of the maximum indentation limit to d (in
characters): once this limit is reached, boxes are rejected to the left,
if they do not fit on the current line. Nothing happens if d is smaller
than 2 or bigger than 999999999.
value get_max_indent : unit -> int
Return the value of the maximum indentation limit (in characters).
Formatting depth: maximum number of boxes allowed before ellipsis
value set_max_boxes : int -> unit
set_max_boxes max sets the maximum number of boxes simultaneously opened.
Material inside boxes nested deeper is printed as an ellipsis (more
precisely as the text returned by get_ellipsis_text ()). Nothing happens
if max is not greater than 1.
value get_max_boxes : unit -> int
Return the maximum number of boxes allowed before ellipsis.
value over_max_boxes : unit -> bool
Test whether the maximum number of boxes allowed has already been reached.
Advanced formatting
value open_hbox : unit -> unit
open_hbox () opens a new pretty-printing box. This box is
``horizontal'': the line is not split in this box (new lines may still
occur inside boxes nested deeper).
value open_vbox : int -> unit
open_vbox d opens a new pretty-printing box with offset d. This box is
``vertical'': every break hint inside this box leads to a new line.
When a new line is printed in the box, d is added to the current
indentation.
value open_hvbox : int -> unit
open_hvbox d opens a new pretty-printing box with offset d. This box is
``horizontal-vertical'': it behaves as a ``horizontal'' box if it fits
on a single line, otherwise it behaves as a ``vertical'' box. When a new
line is printed in the box, d is added to the current indentation.
value open_hovbox : int -> unit
open_hovbox d opens a new pretty-printing box with offset d. This box is
``horizontal or vertical'': break hints inside this box may lead to a
new line, if there is no more room on the line to print the remainder of
the box. When a new line is printed in the box, d is added to the
current indentation.
Tabulations
value open_tbox : unit -> unit
Open a tabulation box.
value close_tbox : unit -> unit
Close the most recently opened tabulation box.
value print_tbreak : int -> int -> unit
Break hint in a tabulation box. print_tbreak spaces offset moves the
insertion point to the next tabulation (spaces being added to this
position). Nothing occurs if the insertion point is already on a tabulation
mark. If there is no next tabulation on the line, then a newline is
printed and the insertion point moves to the first tabulation of the box.
If a new line is printed, offset is added to the current indentation.
value set_tab : unit -> unit
Set a tabulation mark at the current insertion point.
value print_tab : unit -> unit
print_tab () is equivalent to print_tbreak 0 0.
Ellipsis
value set_ellipsis_text : string -> unit
Set the text of the ellipsis printed when too many boxes are opened (a
single dot, ".", by default).
value get_ellipsis_text : unit -> string
Return the text of the ellipsis.
Redirecting formatter output
value set_formatter_out_channel : out_channel -> unit
Redirect the pretty-printer output to the given channel.
value set_formatter_output_functions :
(string -> int -> int -> unit) -> (unit -> unit) -> unit
set_formatter_output_functions out flush redirects the pretty-printer
output to the functions out and flush. The out function performs the
pretty-printer output. It is called with a string s, a start position p,
and a number of characters n; it is supposed to output characters p to
p+n-1 of s. The flush function is called whenever the pretty-printer is
flushed using print_flush or print_newline.
value get_formatter_output_functions :
unit -> (string -> int -> int -> unit) * (unit -> unit)
Return the current output functions of the pretty-printer.
Multiple formatted output
type formatter
Abstract data type corresponding to a pretty-printer and all its
machinery. Defining new pretty-printers permits the output of material
in parallel on several channels. Parameters of the pretty-printer are
local to the pretty-printer: margin, maximum indentation limit, maximum
number of boxes simultaneously opened, ellipsis, and so on, are specific
to each pretty-printer and may be fixed independently. A new formatter
is obtained by calling the make_formatter function.
value std_formatter : formatter
The standard formatter used by the formatting functions above. It is
defined using make_formatter with output function output stdout and
flushing function fun () -> flush stdout.
value err_formatter : formatter
A formatter to use with formatting functions below for output to standard
error. It is defined using make_formatter with output function
output stderr and flushing function fun () -> flush stderr.
value make_formatter :
(string -> int -> int -> unit) -> (unit -> unit) -> formatter
make_formatter out flush returns a new formatter that writes according to
the output function out, and flushing function flush. Hence, a formatter
to out channel oc is returned by
make_formatter (output oc) (fun () -> flush oc).
value pp_open_hbox : formatter -> unit -> unit
value pp_open_vbox : formatter -> int -> unit
value pp_open_hvbox : formatter -> int -> unit
value pp_open_hovbox : formatter -> int -> unit
value pp_open_box : formatter -> int -> unit
value pp_close_box : formatter -> unit -> unit
value pp_print_string : formatter -> string -> unit
value pp_print_as : formatter -> int -> string -> unit
value pp_print_int : formatter -> int -> unit
value pp_print_float : formatter -> float -> unit
value pp_print_char : formatter -> char -> unit
value pp_print_bool : formatter -> bool -> unit
value pp_print_break : formatter -> int -> int -> unit
value pp_print_cut : formatter -> unit -> unit
value pp_print_space : formatter -> unit -> unit
value pp_force_newline : formatter -> unit -> unit
value pp_print_flush : formatter -> unit -> unit
value pp_print_newline : formatter -> unit -> unit
value pp_print_if_newline : formatter -> unit -> unit
value pp_open_tbox : formatter -> unit -> unit
value pp_close_tbox : formatter -> unit -> unit
value pp_print_tbreak : formatter -> int -> int -> unit
value pp_set_tab : formatter -> unit -> unit
value pp_print_tab : formatter -> unit -> unit
value pp_set_margin : formatter -> int -> unit
value pp_get_margin : formatter -> unit -> int
value pp_set_max_indent : formatter -> int -> unit
value pp_get_max_indent : formatter -> unit -> int
value pp_set_max_boxes : formatter -> int -> unit
value pp_get_max_boxes : formatter -> unit -> int
value pp_over_max_boxes : formatter -> unit -> bool
value pp_set_ellipsis_text : formatter -> string -> unit
value pp_get_ellipsis_text : formatter -> unit -> string
value pp_set_formatter_out_channel : formatter -> out_channel -> unit
value pp_set_formatter_output_functions : formatter ->
(string -> int -> int -> unit) -> (unit -> unit) -> unit
value pp_get_formatter_output_functions :
formatter -> unit -> (string -> int -> int -> unit) * (unit -> unit)
The basic functions to use with formatters. The usual functions operating
on the standard formatter are defined by partial application of these
primitives. For instance, print_string is equal to
pp_print_string std_formatter.
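Because make_formatter only needs an output function and a flush function, a formatter does not have to write to a channel. The sketch below accumulates the output in a string reference; the helper string_formatter is hypothetical, and repeated string concatenation makes it quadratic, so it is only suitable for short outputs.

```caml
(* Hypothetical helper: a formatter that appends its output to a string
   reference instead of writing to a channel. *)
let string_formatter () =
  let contents = ref "" in
  let out s pos len = contents := !contents ^ sub_string s pos len in
  let no_flush () = () in
  (format__make_formatter out no_flush, contents);;

let (ff, result) = string_formatter ();;

format__pp_open_hbox ff ();
format__pp_print_string ff "x =";
format__pp_print_space ff ();      (* a horizontal box never splits: one space *)
format__pp_print_int ff 42;
format__pp_close_box ff ();
format__pp_print_flush ff ();;
(* !result is now "x = 42" *)
```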
value fprintf : formatter -> ('a, formatter, unit) format -> 'a
fprintf ff format arg1 ... argN formats the arguments arg1 to argN
according to the format string format, and outputs the resulting string
on the formatter ff. The format is a character string which contains
three types of objects: plain characters and conversion specifications
as specified in the printf module, and pretty-printing indications. The
pretty-printing indication characters are introduced by a @ character,
and their meanings are:
[: open a pretty-printing box. The type and offset of the box may be
optionally specified with the following syntax: the < character,
followed by an optional box type indication, then an optional integer
offset, and the closing > character. Box type is one of h, v, hv, or
hov, which stand respectively for a horizontal, vertical,
``horizontal-vertical'' and ``horizontal or vertical'' box.
]: close the most recently opened pretty-printing box.
,: output a good break as with print_cut ().
space (that is, @ followed by a blank): output a space, as with print_space ().
\n: force a newline, as with force_newline ().
;: output a good break as with print_break. The nspaces and offset
parameters of the break may be optionally specified with the following
syntax: the < character, followed by an integer nspaces value, then an
integer offset, and a closing > character.
.: flush the pretty printer as with print_newline ().
@: a plain @ character.
value printf : ('a, formatter, unit) format -> 'a
Same as fprintf, but output on std_formatter.
value eprintf: ('a, formatter, unit) format -> 'a
Same as fprintf, but output on err_formatter.
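A short sketch of fprintf with pretty-printing indications and a user-defined %a printer. The name pp_int_list is our own, not the library's. The printer receives the formatter as its first argument, and the break hint written as @ followed by a space allows the list to be split after any separator.

```caml
(* Hypothetical printer for %a, of type formatter -> int list -> unit. *)
let rec pp_int_list ff l =
  match l with
    [] -> ()
  | [x] -> format__fprintf ff "%d" x
  | x :: rest -> format__fprintf ff "%d;@ " x; pp_int_list ff rest;;

(* "@[" and "@]" delimit a box; "@." flushes and ends the line. *)
format__printf "@[[%a]@]@." pp_int_list [1; 2; 3];;
```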
14.5 gc: memory management control and statistics
type stat = {
minor_words : int;
promoted_words : int;
major_words : int;
minor_collections : int;
major_collections : int;
heap_words : int;
heap_chunks : int;
live_words : int;
live_blocks : int;
free_words : int;
free_blocks : int;
largest_words : int;
fragments : int
}
The memory management counters are returned in a stat record. All the
numbers are computed since the start of the program. The fields of this
record are:
minor_words Number of words allocated in the minor heap.
promoted_words Number of words allocated in the minor heap that survived
a minor collection and were moved to the major heap.
major_words Number of words allocated in the major heap, including the
promoted words.
minor_collections Number of minor collections.
major_collections Number of major collection cycles, not counting the
current cycle.
heap_words Total size of the major heap, in words.
heap_chunks Number of times the major heap size was increased.
live_words Number of words of live data in the major heap, including the
header words.
live_blocks Number of live objects in the major heap.
free_words Number of words in the free list.
free_blocks Number of objects in the free list.
largest_words Size (in words) of the largest object in the free list.
fragments Number of wasted words due to fragmentation. These are one-word
free blocks placed between two live objects. They cannot be inserted in
the free list, thus they are not available for allocation.
The total amount of memory allocated by the program is (in words)
minor_words + major_words - promoted_words. Multiply by the word size (4
on a 32-bit machine, 8 on a 64-bit machine) to get the number of bytes.
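The formula above can be evaluated directly from a stat record. This is a minimal sketch; the helper name allocated_words is our own. Opening the gc module makes the record labels visible without qualification. Multiplying the result by the word size gives the number of bytes.

```caml
#open "gc";;   (* makes the labels of the stat record visible *)

(* Words allocated since the program started:
   minor_words + major_words - promoted_words. *)
let allocated_words () =
  let s = stat () in
  s.minor_words + s.major_words - s.promoted_words;;

print_int (allocated_words ()); print_newline ();;
```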
type control = {
mutable minor_heap_size : int;
mutable major_heap_increment : int;
mutable space_overhead : int;
mutable verbose : bool
}
The GC parameters are given as a control record. The fields are:
minor_heap_size The size (in words) of the minor heap. Changing this
parameter will trigger a minor collection.
major_heap_increment The minimum number of words to add to the major heap
when increasing it.
space_overhead The major GC speed is computed from this parameter. This
is the percentage of heap space that will be "wasted" because the GC does
not immediately collect unreachable objects. The GC will work more (use
more CPU time and collect objects more eagerly) if space_overhead is
smaller. The computation of the GC speed assumes that the amount of live
data is constant.
verbose This flag controls the GC messages on standard error output.
value stat : unit -> stat
Return the current values of the memory management counters in a stat
record.
value print_stat : io__out_channel -> unit
Print the current values of the memory management counters (in
human-readable form) into the channel argument.
value get : unit -> control
Return the current values of the GC parameters in a control record.
value set : control -> unit
set r changes the GC parameters according to the control record r. The
normal usage is:
#open "gc";;           (* Make the labels of the control record visible. *)
let r = gc__get () in  (* Get the current parameters. *)
r.verbose <- true;     (* Change some of them. *)
gc__set r;;            (* Set the new values. *)
value minor : unit -> unit
Trigger a minor collection.
value major : unit -> unit
Finish the current major collection cycle.
value full_major : unit -> unit
Finish the current major collection cycle and perform a complete new
cycle. This will collect all currently unreachable objects.
14.6 genlex: a generic lexical analyzer
This module implements a simple ``standard'' lexical analyzer, presented
as a function from character streams to token streams. It implements
roughly the lexical conventions of Caml, but is parameterized by the set
of keywords of your language.
type token =
Kwd of string
| Ident of string
| Int of int
| Float of float
| String of string
| Char of char
The type of tokens. The lexical classes are: Int and Float for integer
and floating-point numbers; String for string literals, enclosed in
double quotes; Char for character literals, enclosed in backquotes; Ident
for identifiers (either sequences of letters, digits, underscores and
quotes, or sequences of ``operator characters'' such as +, *, etc); and
Kwd for keywords (either identifiers or single ``special characters''
such as (, }, etc).
value make_lexer: string list -> (char stream -> token stream)
Construct the lexer function. The first argument is the list of
keywords. An identifier s is returned as Kwd s if s belongs to this
list, and as Ident s otherwise. A special character s is returned as
Kwd s if s belongs to this list, and causes a lexical error (exception
Parse_error) otherwise. Blanks and newlines are skipped. Comments
delimited by (* and *) are skipped as well, and can be nested.
Example: a lexer suitable for a desk calculator is obtained by
let lexer = make_lexer ["+";"-";"*";"/";"let";"="; "("; ")"]
The associated parser would be a function from token stream to, for
instance, int, and would have rules such as:
let rec parse_expr = function
    [< parse_atom n1; (parse_remainder n1) n2 >] -> n2
and parse_atom = function
    [< 'Int n >] -> n
  | [< 'Kwd "("; parse_expr n; 'Kwd ")" >] -> n
and parse_remainder n1 = function
    [< 'Kwd "+"; parse_expr n2 >] -> n1+n2
  | ...
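Filled out, the desk-calculator sketch might look as follows. This is a minimal version under our own assumptions: it handles only integers, parentheses and +, it is right-recursive (harmless for +), and the names parse_atom and eval are ours. The empty stream pattern [< >] ends an expression when no + follows.

```caml
#open "genlex";;

let lexer = make_lexer ["+"; "("; ")"];;

(* expr ::= atom remainder     remainder ::= "+" expr | (nothing) *)
let rec parse_expr = function
    [< parse_atom n1; (parse_remainder n1) n2 >] -> n2
and parse_atom = function
    [< 'Int n >] -> n
  | [< 'Kwd "("; parse_expr n; 'Kwd ")" >] -> n
and parse_remainder n1 = function
    [< 'Kwd "+"; parse_expr n2 >] -> n1 + n2
  | [< >] -> n1;;

let eval s = parse_expr (lexer (stream_of_string s));;

eval "1 + (2 + 3)";;   (* 6 *)
```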
14.7 hashtbl: hash tables and hash functions
Hash tables are hashed association tables, with in-place modification.
type ('a, 'b) t
The type of hash tables from type 'a to type 'b.
value new : int -> ('a,'b) t
new n creates a new, empty hash table, with initial size n. The table
grows as needed, so n is just an initial guess. Better results are said
to be achieved when n is a prime number. Raise
Invalid_argument "hashtbl__new" if n is less than 1.
value clear : ('a, 'b) t -> unit
Empty a hash table.
value add : ('a, 'b) t -> 'a -> 'b -> unit
add tbl x y adds a binding of x to y in table tbl. Previous bindings for
x are not removed, but simply hidden. That is, after performing
remove tbl x, the previous binding for x, if any, is restored. (This is
the semantics of association lists.)
value find : ('a, 'b) t -> 'a -> 'b
find tbl x returns the current binding of x in tbl, or raises Not_found
if no such binding exists.
value find_all : ('a, 'b) t -> 'a -> 'b list
find_all tbl x returns the list of all data associated with x in tbl.
The current binding is returned first, then the previous bindings, in
reverse order of introduction in the table.
value remove : ('a, 'b) t -> 'a -> unit
remove tbl x removes the current binding of x in tbl, restoring the
previous binding if it exists. It does nothing if x is not bound in tbl.
value do_table : ('a -> 'b -> unit) -> ('a, 'b) t -> unit
do_table f tbl applies f to all bindings in table tbl, discarding all the
results. f receives the key as first argument, and the associated value
as second argument. Each binding is presented exactly once to f. The
order in which the bindings are passed to f is unpredictable, except that
successive bindings for the same key are presented in reverse
chronological order (most recent first).
value do_table_rev : ('a -> 'b -> unit) -> ('a, 'b) t -> unit
Same as do_table, except that successive bindings for the same key are
presented in chronological order (oldest first).
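A short session illustrating the association-list semantics of add, find_all and remove. The type constraint on tbl fixes the key and data types from the start; the helper sum_data is hypothetical and relies on do_table presenting hidden bindings as well as current ones.

```caml
let tbl = (hashtbl__new 17 : (string, int) hashtbl__t);;

hashtbl__add tbl "x" 1;;
hashtbl__add tbl "x" 2;;     (* hides the binding of "x" to 1 *)
hashtbl__find tbl "x";;      (* 2 *)
hashtbl__find_all tbl "x";;  (* [2; 1]: current binding first *)
hashtbl__remove tbl "x";;    (* the binding of "x" to 1 reappears *)
hashtbl__find tbl "x";;      (* 1 *)

(* Sum of all data in a table, hidden bindings included. *)
let sum_data t =
  let total = ref 0 in
  hashtbl__do_table (fun _ v -> total := !total + v) t;
  !total;;
```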
The polymorphic hash primitive
value hash : 'a -> int
hash x associates a positive integer to any value of any type. It is
guaranteed that if x = y, then hash x = hash y. Moreover, hash always
terminates, even on cyclic structures.
value hash_param : int -> int -> 'a -> int
hash_param n m x computes a hash value for x, with the same properties as
for hash. The two extra parameters n and m give more precise control
over hashing. Hashing performs a depth-first, right-to-left traversal of
the structure x, stopping after n meaningful nodes were encountered, or m
nodes, meaningful or not, were encountered. Meaningful nodes are:
integers; floating-point numbers; strings; characters; booleans; and
constant constructors. Larger values of m and n mean that more nodes
are taken into account to compute the final hash value, and therefore
collisions are less likely to happen. However, hashing takes longer.
The parameters m and n govern the tradeoff between accuracy and speed.
14.8 lexing: the run-time library for lexers generated by camllex
Lexer buffers
type lexbuf =
{ refill_buff : lexbuf -> unit;
lex_buffer : string;
mutable lex_abs_pos : int;
mutable lex_start_pos : int;
mutable lex_curr_pos : int;
mutable lex_last_pos : int;
mutable lex_last_action : lexbuf -> obj }
The type of lexer buffers. A lexer buffer is the argument passed to the
scanning functions defined by the generated scanners. The lexer buffer
holds the current state of the scanner, plus a function to refill the
buffer from the input.
value create_lexer_channel : in_channel -> lexbuf
Create a lexer buffer on the given input channel.
create_lexer_channel inchan returns a lexer buffer which reads from the
input channel inchan, at the current reading position.
value create_lexer_string : string -> lexbuf
Create a lexer buffer which reads from the given string. Reading starts
from the first character in the string. An end-of-input condition is
generated when the end of the string is reached.
value create_lexer : (string -> int -> int) -> lexbuf
Create a lexer buffer with the given function as its reading method.
When the scanner needs more characters, it will call the given function,
giving it a character string s and a character count n. The function
should put n characters or less in s, starting at character number 0, and
return the number of characters provided. A return value of 0 means end
of input.
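To illustrate the protocol expected by create_lexer, the sketch below builds a reading function over a fixed string. The name make_reader is hypothetical. It behaves much like create_lexer_string, although the library's own implementation may differ. The resulting lexbuf would then be passed to a scanner generated by camllex.

```caml
(* Hypothetical helper: a reading function for lexing__create_lexer that
   serves the characters of a fixed string, at most n at a time, and
   returns 0 once the string is exhausted. *)
let make_reader text =
  let pos = ref 0 in
  fun buf n ->
    let avail = string_length text - !pos in
    let count = if n < avail then n else avail in
    blit_string text !pos buf 0 count;   (* copy into buf from index 0 *)
    pos := !pos + count;
    count;;

let lexbuf = lexing__create_lexer (make_reader "let x = 1");;
```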
Functions for lexer semantic actions
The following functions can be called from the semantic actions of lexer
definitions (the ML code enclosed in braces that computes the value
returned by lexing functions). They give access to the character string
matched by the regular expression associated with the semantic action.
These functions must be applied to the argument lexbuf, which, in the
code generated by camllex, is bound to the lexer buffer passed to the
parsing function.
value get_lexeme : lexbuf -> string
get_lexeme lexbuf returns the string matched by the regular expression.
value get_lexeme_char : lexbuf -> int -> char
get_lexeme_char lexbuf i returns character number i in the matched
string.
value get_lexeme_start : lexbuf -> int
get_lexeme_start lexbuf returns the position in the input stream of the
first character of the matched string. The first character of the stream
has position 0.
value get_lexeme_end : lexbuf -> int
get_lexeme_end lexbuf returns the position in the input stream of the
character following the last character of the matched string. The first
character of the stream has position 0.
14.9 map: association tables over ordered types
This module implements applicative association tables, also known as
finite maps or dictionaries, given a total ordering function over the
keys. All operations over maps are purely applicative (no side-effects).
The implementation uses balanced binary trees, and therefore searching
and insertion take time logarithmic in the size of the map.
type ('a, 'b) t
The type of maps from type 'a to type 'b.
value empty: ('a -> 'a -> int) -> ('a, 'b) t
The empty map. The argument is a total ordering function over the keys.
This is a two-argument function f such that f e1 e2 is zero if the
elements e1 and e2 are equal, f e1 e2 is strictly negative if e1 is
smaller than e2, and f e1 e2 is strictly positive if e1 is greater than
e2. For example, a suitable ordering function for type int is prefix -.
You can also use the generic structural comparison function eq__compare.
value add: 'a -> 'b -> ('a, 'b) t -> ('a, 'b) t
add x y m returns a map containing the same bindings as m, plus a binding
of x to y. Previous bindings for x in m are not removed, but simply
hidden: they reappear after performing a remove operation. (This is the
semantics of association lists.)
value find:'a -> ('a, 'b) t -> 'b
find x m returns the current binding of x in m, or raises Not_found if no
such binding exists.
value remove: 'a -> ('a, 'b) t -> ('a, 'b) t
remove x m returns a map containing the same bindings as m except the
current binding for x. The previous binding for x is restored if it
exists. m is returned unchanged if x is not bound in m.
value iter: ('a -> 'b -> unit) -> ('a, 'b) t -> unit
iter f m applies f to all bindings in map m, discarding the results. f
receives the key as first argument, and the associated value as second
argument. The order in which the bindings are passed to f is
unspecified. Only current bindings are presented to f: bindings hidden
by more recent bindings are not passed to f.
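A short session showing that maps are applicative and that bindings are hidden rather than replaced. The maps use prefix - as the ordering, as suggested above. The helper count_bindings is hypothetical and relies on iter visiting only current bindings.

```caml
let m0 = (map__empty (prefix -) : (int, string) map__t);;
let m1 = map__add 2 "two" (map__add 1 "one" m0);;
let m2 = map__add 1 "uno" m1;;    (* hides 1 -> "one" in m2 only *)

map__find 1 m2;;                  (* "uno" *)
map__find 1 m1;;                  (* "one": m1 is not modified *)
map__find 1 (map__remove 1 m2);;  (* "one": the hidden binding reappears *)

(* iter only visits current bindings. *)
let count_bindings m =
  let n = ref 0 in
  map__iter (fun _ _ -> incr n) m;
  !n;;

count_bindings m2;;               (* 2 *)
```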
14.10 parsing: the run-time library for parsers generated by camlyacc
value symbol_start : unit -> int
value symbol_end : unit -> int
symbol_start and symbol_end are to be called in the action part of a
grammar rule only. They return the position of the string that matches
the left-hand side of the rule: symbol_start() returns the position of
the first character; symbol_end() returns the position of the last
character, plus one. The first character in a file is at position 0.
value rhs_start: int -> int
value rhs_end: int -> int
Same as symbol_start and symbol_end above, but return the position of the
string matching the nth item on the right-hand side of the rule, where n
is the integer parameter to rhs_start and rhs_end. n is 1 for the
leftmost item.
value clear_parser : unit -> unit
Empty the parser stack. Call it just after a parsing function has
returned, to remove all pointers from the parser stack to structures that
were built by semantic actions during parsing. This is optional, but
lowers the memory requirements of the program.
exception Parse_error
Raised when a parser encounters a syntax error.
14.11 printexc: a catch-all exception handler
value f: ('a -> 'b) -> 'a -> 'b
printexc__f fn x applies fn to x and returns the result. If the
evaluation of fn x raises any exception, the name of the exception is
printed on standard error output, and the program aborts with exit code
2. Typical use is printexc__f main (), where main, with type unit->unit,
is the entry point of a standalone program, to catch and print stray
exceptions. For printexc__f to work properly, the program must have been
linked with the -g option.
14.12 printf: formatting printing functions
type ('a, 'b, 'c) format
The type of format strings. 'a is the type of the parameters of the
string, 'c is the result type for the printf-style function, and 'b is
the type of the first argument given to %a and %t printing functions.
value fprintf: out_channel -> ('a, out_channel, unit) format -> 'a
fprintf outchan format arg1 ... argN formats the arguments arg1 to argN
according to the format string format, and outputs the resulting string
on the channel outchan. The format is a character string which contains
two types of objects: plain characters, which are simply copied to the
output channel, and conversion specifications, each of which causes
conversion and printing of one argument. Conversion specifications
consist of the % character, followed by optional flags and field widths,
followed by one conversion character. The conversion characters and
their meanings are:
d or i: convert an integer argument to signed decimal
u: convert an integer argument to unsigned decimal
x: convert an integer argument to unsigned hexadecimal, using lowercase
letters.
X: convert an integer argument to unsigned hexadecimal, using uppercase
letters.
s: insert a string argument
c: insert a character argument
f: convert a floating-point argument to decimal notation, in the style
dddd.ddd
e or E: convert a floating-point argument to decimal notation, in the
style d.ddd e+-dd (mantissa and exponent)
g or G: convert a floating-point argument to decimal notation, in style f
or e, E (whichever is more compact)
b: convert a boolean argument to the string true or false
a: user-defined printer. Takes two arguments and applies the first one to
outchan (the current output channel) and to the second argument. The
first argument must therefore have type out_channel -> 'b -> unit and the
second 'b. The output produced by the function is therefore inserted in
the output of fprintf at the current point.
t: same as %a, but takes only one argument (with type
out_channel -> unit) and applies it to outchan.
Refer to the C library printf function for the meaning of flags and field
width specifiers. If too few arguments are provided, printing stops just
before converting the first missing argument.
value printf: ('a, out_channel, unit) format -> 'a
Same as fprintf, but output on std_out.
value eprintf: ('a, out_channel, unit) format -> 'a
Same as fprintf, but output on std_err.
value sprintf: ('a, unit, string) format -> 'a
Same as fprintf, except that the result of the formatting is returned as
a string instead of being written on a channel.
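A short sketch in Caml Light of the conversions described above. It uses the qualified names printf__printf, printf__fprintf and printf__sprintf so that no #open directive is needed; print_point is a hypothetical helper written for the example, with the type out_channel -> 'b -> unit that %a requires.

```ocaml
(* A user-defined printer for %a: its type is
   out_channel -> int * int -> unit, an instance of out_channel -> 'b -> unit. *)
let print_point oc (x, y) = printf__fprintf oc "(%d, %d)" x y;;

(* Plain characters are copied; each conversion consumes one argument. *)
printf__printf "%s: %d items, hex %x, empty %b\n" "queue" 255 255 false;;

(* %a consumes two arguments: the printer and the value it prints. *)
printf__printf "origin = %a\n" print_point (0, 0);;

(* sprintf returns the formatted text; %5d pads the integer to width 5. *)
let label = printf__sprintf "[%5d]" 42;;
```

Field widths such as %5d follow the C printf conventions referred to above.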
value fprint: out_channel -> string -> unit
Print the given string on the given output channel, without any
formatting. This is the same function as output_string of module io.
value print: string -> unit
Print the given string on std_out, without any formatting. This is the
same function as print_string of module io.
value eprint: string -> unit
Print the given string on std_err, without any formatting. This is the
same function as prerr_string of module io.
14.13 queue: queues
This module implements queues (FIFOs), with in-place modification.
type 'a t
The type of queues containing elements of type 'a.
exception Empty
Raised when take is applied to an empty queue.
value new: unit -> 'a t
Return a new queue, initially empty.
value add: 'a -> 'a t -> unit
add x q adds the element x at the end of the queue q.
value take: 'a t -> 'a
take q removes and returns the first element in queue q, or raises Empty
if the queue is empty.
value peek: 'a t -> 'a
peek q returns the first element in queue q, without removing it from the
queue, or raises Empty if the queue is empty.
value clear : 'a t -> unit
Discard all elements from a queue.
value length: 'a t -> int
Return the number of elements in a queue.
value iter: ('a -> unit) -> 'a t -> unit
iter f q applies f in turn to all elements of q, from the least recently
entered to the most recently entered. The queue itself is unchanged.
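A minimal Caml Light sketch of the first-in, first-out behavior. The helpers queue_of_list and drain are hypothetical, not part of the queue module; they show that take returns elements in insertion order and that peek leaves the queue unchanged.

```ocaml
(* Build a queue from a list; the head of the list is added first. *)
let queue_of_list l =
  let q = queue__new () in
  do_list (fun x -> queue__add x q) l;
  q;;

(* Take every element, least recently added first; q is left empty. *)
let rec drain q =
  if queue__length q = 0 then []
  else let x = queue__take q in x :: drain q;;

let q = queue_of_list ["a"; "b"; "c"];;
let first = queue__peek q;;   (* peek does not remove "a" *)
let all = drain q;;           (* takes "a", then "b", then "c" *)
```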
14.14 random: pseudo-random number generator
value init : int -> unit
Initialize the generator, using the argument as a seed. The same seed
will always yield the same sequence of numbers.
value full_init : int vect -> unit
Same as init but takes more data as seed. It is not useful to give more
than 55 integers.
value int : int -> int
random__int bound returns a random number between 0 (inclusive) and bound
(exclusive). bound must be positive and smaller than 2^30.
value float : float -> float
random__float bound returns a random number between 0 (inclusive) and
bound (exclusive).
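A small Caml Light sketch, assuming a six-sided die as the use case: roll and rolls are hypothetical helpers. Because random__int n returns a number in 0 .. n-1, one is added to get 1 .. n, and re-seeding with random__init replays the same sequence.

```ocaml
(* Roll an n-sided die: random__int n is in 0 .. n-1, hence the + 1. *)
let roll n = 1 + random__int n;;

(* k successive rolls of a six-sided die, in the order they were drawn. *)
let rec rolls k =
  if k <= 0 then [] else let r = roll 6 in r :: rolls (k - 1);;

(* The same seed always yields the same sequence. *)
random__init 2024;;
let first_run = rolls 10;;
random__init 2024;;
let second_run = rolls 10;;
```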
14.15 set: sets over ordered types
This module implements the set data structure, given a total ordering
function over the set elements. All operations over sets are purely
applicative (no side-effects). The implementation uses balanced binary
trees, and is therefore reasonably efficient: insertion and membership
take time logarithmic in the size of the set, for instance.
type 'a t
The type of sets containing elements of type 'a.
value empty: ('a -> 'a -> int) -> 'a t
The empty set. The argument is a total ordering function over the set
elements. This is a two-argument function f such that f e1 e2 is zero if
the elements e1 and e2 are equal, f e1 e2 is strictly negative if e1 is
smaller than e2, and f e1 e2 is strictly positive if e1 is greater than
e2. For example, a suitable ordering function for type int is prefix -.
You can also use the generic structural comparison function eq__compare.
value is_empty: 'a t -> bool
Test whether a set is empty or not.
value mem: 'a -> 'a t -> bool
mem x s tests whether x belongs to the set s.
value add: 'a -> 'a t -> 'a t
add x s returns a set containing all elements of s, plus x. If x was
already in s, s is returned unchanged.
value remove: 'a -> 'a t -> 'a t
remove x s returns a set containing all elements of s, except x. If x
was not in s, s is returned unchanged.
value union: 'a t -> 'a t -> 'a t
value inter: 'a t -> 'a t -> 'a t
value diff: 'a t -> 'a t -> 'a t
Union, intersection and set difference.
value equal: 'a t -> 'a t -> bool
equal s1 s2 tests whether the sets s1 and s2 are equal, that is, contain
the same elements.
value compare: 'a t -> 'a t -> int
Total ordering between sets. Can be used as the ordering function for
doing sets of sets.
value elements: 'a t -> 'a list
Return the list of all elements of the given set. The elements appear in
the list in some non-specified order.
value iter: ('a -> unit) -> 'a t -> unit
iter f s applies f in turn to all elements of s, and discards the
results. The elements of s are presented to f in a non-specified order.
value fold: ('a -> 'b -> 'b) -> 'a t -> 'b -> 'b
fold f s a computes (f xN ... (f x2 (f x1 a))...), where x1 ... xN are
the elements of s. The order in which elements of s are presented to f
is not specified.
value choose: 'a t -> 'a
Return one element of the given set, or raise Not_found if the set is
empty. Which element is chosen is not specified, but equal elements will
be chosen for equal sets.
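The following Caml Light sketch builds integer sets with prefix - as the ordering function, as suggested above; int_set and sorted_elements are hypothetical helpers. Subtraction is adequate for small integers but can overflow near the limits of the int range, where eq__compare is the safer choice. Since elements returns its list in an unspecified order, the result is sorted before it is inspected.

```ocaml
(* Build an integer set from a list; duplicates are ignored. *)
let int_set l = it_list (fun s x -> set__add x s) (set__empty (prefix -)) l;;

(* set__elements gives no particular order, so sort the result. *)
let sorted_elements s = sort__sort (prefix <=) (set__elements s);;

let a = int_set [5; 1; 3; 1];;
let b = int_set [3; 4; 5];;
let union_ab = sorted_elements (set__union a b);;
let inter_ab = sorted_elements (set__inter a b);;
let diff_ab = sorted_elements (set__diff a b);;
let total = set__fold (prefix +) a 0;;   (* 1 + 3 + 5 *)
```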
14.16 sort: sorting and merging lists
value sort : ('a -> 'a -> bool) -> 'a list -> 'a list
Sort a list in increasing order according to an ordering predicate. The
predicate should return true if its first argument is less than or equal
to its second argument.
value merge : ('a -> 'a -> bool) -> 'a list -> 'a list -> 'a list
Merge two lists according to the given predicate. Assuming the two
argument lists are sorted according to the predicate, merge returns a
sorted list containing the elements from the two lists. The behavior is
undefined if the two argument lists were not sorted.
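A short Caml Light sketch of sort and merge. The predicate means "less than or equal", so prefix <= sorts in increasing order, and any other total preorder, such as comparing strings by length, can be supplied instead.

```ocaml
(* Increasing order: the predicate is "less than or equal". *)
let ascending = sort__sort (prefix <=) [3; 1; 4; 1; 5; 9; 2; 6];;

(* Any total preorder works, e.g. comparing strings by their length. *)
let by_length =
  sort__sort (fun a b -> string_length a <= string_length b) ["ccc"; "a"; "bb"];;

(* merge assumes both inputs are already sorted with the same predicate. *)
let merged = sort__merge (prefix <=) [1; 4; 9] [2; 3; 10];;
```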
14.17 stack: stacks
This module implements stacks (LIFOs), with in-place modification.
type 'a t
The type of stacks containing elements of type 'a.
exception Empty
Raised when pop is applied to an empty stack.
value new: unit -> 'a t
Return a new stack, initially empty.
value push: 'a -> 'a t -> unit
push x s adds the element x at the top of stack s.
value pop: 'a t -> 'a
pop s removes and returns the topmost element in stack s, or raises Empty
if the stack is empty.
value clear : 'a t -> unit
Discard all elements from a stack.
value length: 'a t -> int
Return the number of elements in a stack.
value iter: ('a -> unit) -> 'a t -> unit
iter f s applies f in turn to all elements of s, from the element at the
top of the stack to the element at the bottom of the stack. The stack
itself is unchanged.
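As an illustration of last-in, first-out order, here is a hypothetical bracket checker in Caml Light: every opening bracket is pushed, and every closing bracket must match the element popped from the top. The exception Mismatch is introduced for the example; popping an empty stack raises stack__Empty, which here signals a closing bracket with nothing open.

```ocaml
exception Mismatch;;

(* true if every ( and [ in str is closed by the matching bracket. *)
let balanced str =
  let s = stack__new () in
  try
    for i = 0 to string_length str - 1 do
      let c = nth_char str i in
      if c = `(` || c = `[` then stack__push c s
      else if c = `)` then (if stack__pop s <> `(` then raise Mismatch)
      else if c = `]` then (if stack__pop s <> `[` then raise Mismatch)
    done;
    stack__length s = 0              (* nothing left unclosed *)
  with Mismatch -> false             (* wrong kind of closing bracket *)
     | stack__Empty -> false;;       (* closing bracket with nothing open *)
```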
14.18 sys: system interface
This module provides a simple interface to the operating system.
exception Sys_error of string
Raised by some functions in the sys and io modules, when the underlying
system calls fail. The argument to Sys_error is a string describing the
error. The texts of the error messages are implementation-dependent, and
should not be relied upon to catch specific system errors.
value command_line : string vect
The command line arguments given to the process. The first element is
the command name used to invoke the program.
value interactive: bool
True if we're running under the toplevel system. False if we're running
as a standalone program.
type file_perm == int
value s_irusr : file_perm
value s_iwusr : file_perm
value s_ixusr : file_perm
value s_irgrp : file_perm
value s_iwgrp : file_perm
value s_ixgrp : file_perm
value s_iroth : file_perm
value s_iwoth : file_perm
value s_ixoth : file_perm
value s_isuid : file_perm
value s_isgid : file_perm
value s_irall : file_perm
value s_iwall : file_perm
value s_ixall : file_perm
Access permissions for files. r is reading permission, w is writing
permission, x is execution permission. usr means permissions for the
user owning the file, grp for the group owning the file, oth for others.
isuid and isgid are for set-user-id and set-group-id files, respectively.
The remaining three, s_irall, s_iwall and s_ixall, combine the corresponding
usr, grp and oth permissions.
type open_flag =
O_RDONLY (* open read-only *)
| O_WRONLY (* open write-only *)
| O_RDWR (* open for reading and writing *)
| O_APPEND (* open for appending *)
| O_CREAT (* create the file if nonexistent *)
| O_TRUNC (* truncate the file to 0 if it exists *)
| O_EXCL (* fails if the file exists *)
| O_BINARY (* open in binary mode *)
| O_TEXT (* open in text mode *)
The flags for open.
value exit : int -> 'a
Terminate the program and return the given status code to the operating
system. In contrast with the function exit from module io, this exit
function does not flush the standard output and standard error channels.
value open : string -> open_flag list -> file_perm -> int
Open a file. The second argument is the opening mode. The third
argument is the permissions to use if the file must be created. The
result is a file descriptor opened on the file.
value close : int -> unit
Close a file descriptor.
value remove : string -> unit
Remove the given file name from the file system.
value rename : string -> string -> unit
Rename a file. The first argument is the old name and the second is the
new name.
value getenv : string -> string
Return the value associated with a variable in the process environment.
Raise Not_found if the variable is unbound.
value chdir : string -> unit
Change the current working directory of the process. Note that there is
no easy way of getting the current working directory from the operating
system.
value system_command : string -> int
Execute the given shell command and return its exit code.
value time : unit -> float
Return the processor time, in seconds, used by the program since the
beginning of execution.
exception Break
Exception Break is raised on user interrupt if catch_break is on.
value catch_break : bool -> unit
catch_break governs whether user interrupt terminates the program or
raises the Break exception. Call catch_break true to enable raising
Break, and catch_break false to let the system terminate the program on
user interrupt.
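A small Caml Light sketch combining several of the functions above; getenv_default, timed and interruptible are hypothetical helpers. getenv signals an unbound variable with Not_found, time reports processor time in seconds, and catch_break true turns a user interrupt into the Break exception. The sketch assumes that interrupts should terminate the program again once the computation is over.

```ocaml
(* Look up an environment variable, with a fallback when it is unbound. *)
let getenv_default name default =
  try sys__getenv name with Not_found -> default;;

(* Apply f to x and also return the processor time it used, in seconds. *)
let timed f x =
  let start = sys__time () in
  let result = f x in
  (result, sys__time () -. start);;

(* Run f x (which should return unit); return false if it was interrupted.
   Assumes the default behavior (terminate on interrupt) is wanted afterwards. *)
let interruptible f x =
  sys__catch_break true;
  try f x; sys__catch_break false; true
  with sys__Break -> sys__catch_break false; false;;

(* The program name is the first element of command_line. *)
let program_name = sys__command_line.(0);;
```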
Chapter 15
The graphics library
This chapter describes the portable graphics primitives that come standard in
the implementation of Caml Light on micro-computers.
Unix: On Unix workstations running the X11 window system, an implementation
of the graphics primitives is available in the directory
contrib/libgraph in the distribution. See the file README in this
directory for information on building and using camlgraph, a toplevel
system that includes the graphics primitives, and linking standalone
programs with the library. Drawing takes place in a separate window
that is created when open_graph is called.
Mac: The graphics primitives are available from the standalone application
that runs the toplevel system. They are not available from programs
compiled by camlc and run under the MPW shell. Drawing takes place in
a separate window, that can be made visible with the ``Show graphics
window'' menu entry.
PC: The graphics primitives are available from the Windows application that
runs the toplevel system. They are not available from programs
compiled by camlc and run in a DOS command window. Drawing takes place
in a separate window.
The screen coordinates are interpreted as shown in the figure below. Notice
that the coordinate system used is the same as in mathematics: y increases
from the bottom of the screen to the top of the screen, and angles are
measured counterclockwise (in degrees). Drawing is clipped to the screen.
[Figure: the screen coordinate system. x is measured horizontally from the
left edge up to size_x(), y vertically from the bottom edge up to size_y();
a pixel is marked at (x,y).]
Here are the graphics mode specifications supported by open_graph on the
various implementations of this library.
Unix: The argument to open_graph has the format "display-name geometry",
where display-name is the name of the X-windows display to connect to,
and geometry is a standard X-windows geometry specification. The two
components are separated by a space. Either can be omitted, or both.
Examples:
open_graph "foo:0"
connects to the display foo:0 and creates a window with the default
geometry
open_graph "foo:0 300x100+50-0"
connects to the display foo:0 and creates a window 300 pixels wide
by 100 pixels tall, at location (50,0)
open_graph " 300x100+50-0"
connects to the default display and creates a window 300 pixels
wide by 100 pixels tall, at location (50,0)
open_graph ""
connects to the default display and creates a window with the
default geometry.
Mac: The argument to open_graph is ignored.
PC: The argument to open_graph has the format "widthxheight" or
"widthxheight+x+y", where width and height are the initial dimensions
of the graphics window, and x and y give the position of the upper-left
corner of the graphics window. If omitted, (width,height) default to
(600,400) and (x,y) default to (10, 10).
15.1 graphics: machine-independent graphics primitives
exception Graphic_failure of string
Raised by the functions below when they encounter an error.
Initializations
value open_graph: string -> unit
Show the graphics window or switch the screen to graphic mode. The
graphics window is cleared. The string argument is used to pass optional
information on the desired graphics mode, the graphics window size, and
so on. Its interpretation is implementation-dependent. If the empty
string is given, a sensible default is selected.
value close_graph: unit -> unit
Delete the graphics window or switch the screen back to text mode.
value clear_graph : unit -> unit
Erase the graphics window.
value size_x : unit -> int
value size_y : unit -> int
Return the size of the graphics window. Coordinates of the screen pixels
range over 0 .. size_x()-1 and 0 .. size_y()-1. Drawings outside of this
rectangle are clipped, without causing an error. The origin (0,0) is at
the lower left corner.
Colors
type color == int
A color is specified by its R, G, B components. Each component is in the
range 0..255. The three components are packed in an int: 0xRRGGBB,
where RR are the two hexadecimal digits for the red component, GG for the
green component, BB for the blue component.
value rgb: int -> int -> int -> color
rgb r g b returns the integer encoding the color with red component r,
green component g, and blue component b. r, g and b are in the range
0..255.
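The 0xRRGGBB packing can be illustrated with plain integer arithmetic. In this Caml Light sketch, pack and components are hypothetical helpers; by the description above, pack r g b should agree with graphics__rgb r g b, and components recovers the three components from a packed color.

```ocaml
(* Pack three components in 0..255 as 0xRRGGBB. *)
let pack r g b = r * 65536 + g * 256 + b;;

(* Recover the (red, green, blue) components of a packed color. *)
let components c = ((c / 65536) mod 256, (c / 256) mod 256, c mod 256);;
```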
value set_color : color -> unit
Set the current drawing color.
value black : color
value white : color
value red : color
value green : color
value blue : color
value yellow : color
value cyan : color
value magenta : color
Some predefined colors.
value background: color
value foreground: color
Default background and foreground colors (usually, either black
foreground on a white background or white foreground on a black
background). clear_graph fills the screen with the background color.
The initial drawing color is foreground.
Point and line drawing
value plot : int -> int -> unit
Plot the given point with the current drawing color.
value point_color : int -> int -> color
Return the color of the given point.
value moveto : int -> int -> unit
Position the current point.
value current_point : unit -> int * int
Return the position of the current point.
value lineto : int -> int -> unit
Draw a line with endpoints the current point and the given point, and
move the current point to the given point.
value draw_arc : int -> int -> int -> int -> int -> int -> unit
draw_arc x y rx ry a1 a2 draws an elliptical arc with center x,y,
horizontal radius rx, vertical radius ry, from angle a1 to angle a2 (in
degrees). The current point is unchanged.
value draw_ellipse : int -> int -> int -> int -> unit
draw_ellipse x y rx ry draws an ellipse with center x,y, horizontal
radius rx and vertical radius ry. The current point is unchanged.
value draw_circle : int -> int -> int -> unit
draw_circle x y r draws a circle with center x,y and radius r. The
current point is unchanged.
value set_line_width : int -> unit
Set the width of points and lines drawn with the functions above. Under
X Windows, set_line_width 0 selects a width of 1 pixel and a faster, but
less precise drawing algorithm than the one used when set_line_width 1 is
specified.
Text drawing
value draw_char : char -> unit
value draw_string : string -> unit
Draw a character or a character string with lower left corner at current
position. After drawing, the current position is set to the lower right
corner of the text drawn.
value set_font : string -> unit
value set_text_size : int -> unit
Set the font and character size used for drawing text. The
interpretation of the arguments to set_font and set_text_size is
implementation-dependent.
value text_size : string -> int * int
Return the dimensions of the given text, if it were drawn with the
current font and size.
Filling
value fill_rect : int -> int -> int -> int -> unit
fill_rect x y w h fills the rectangle with lower left corner at x,y,
width w and height h, with the current color.
value fill_poly : (int * int) vect -> unit
Fill the given polygon with the current color. The array contains the
coordinates of the vertices of the polygon.
value fill_arc : int -> int -> int -> int -> int -> int -> unit
Fill an elliptical pie slice with the current color. The parameters are
the same as for draw_arc.
value fill_ellipse : int -> int -> int -> int -> unit
Fill an ellipse with the current color. The parameters are the same as
for draw_ellipse.
value fill_circle : int -> int -> int -> unit
Fill a circle with the current color. The parameters are the same as for
draw_circle.
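fill_poly expects its vertices as an (int * int) vect. The Caml Light sketch below computes the vertices of a regular polygon; regular_polygon is a hypothetical helper, and its rounding (adding 0.5 before int_of_float) assumes non-negative coordinates. The drawing calls are shown in a comment because they need an open graphics window.

```ocaml
let pi = 4.0 *. atan 1.0;;

(* Vertices of a regular n-sided polygon centred at (cx, cy) with radius r,
   each coordinate rounded to the nearest pixel. *)
let regular_polygon cx cy r n =
  let vertex i =
    let a = 2.0 *. pi *. float_of_int i /. float_of_int n in
    let x = float_of_int cx +. float_of_int r *. cos a
    and y = float_of_int cy +. float_of_int r *. sin a in
    (int_of_float (x +. 0.5), int_of_float (y +. 0.5)) in
  let v = make_vect n (0, 0) in
  for i = 0 to n - 1 do v.(i) <- vertex i done;
  v;;

(* Usage, with the graphics window open:
     graphics__set_color graphics__blue;
     graphics__fill_poly (regular_polygon 200 200 100 6);;  *)
```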
Images
type image
The abstract type for images, in internal representation. Externally,
images are represented as matrices of colors.
value transp : color
In matrices of colors, this color represents a ``transparent'' point:
when drawing the corresponding image, all pixels on the screen
corresponding to a transparent pixel in the image will not be modified,
while other points will be set to the color of the corresponding point in
the image. This allows superimposing an image over an existing
background.
value make_image : color vect vect -> image
Convert the given color matrix to an image. Each sub-array represents
one horizontal line. All sub-arrays must have the same length;
otherwise, exception Graphic_failure is raised.
value dump_image : image -> color vect vect
Convert an image to a color matrix.
value draw_image : image -> int -> int -> unit
Draw the given image with lower left corner at the given point.
value get_image : int -> int -> int -> int -> image
Capture the contents of a rectangle on the screen as an image. The
parameters are the same as for fill_rect.
value create_image : int -> int -> image
create_image w h returns a new image w pixels wide and h pixels tall, to
be used in conjunction with blit_image. The initial image contents are
random.
value blit_image : image -> int -> int -> unit
blit_image img x y copies screen pixels into the image img, modifying img
in-place. The pixels copied are those inside the rectangle with lower
left corner at x,y, and width and height equal to those of the image.
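A Caml Light sketch of a sprite with a transparent background; disc_matrix is a hypothetical helper. It fills a color matrix with transp and paints a disc in it, so that drawing the resulting image only changes the pixels of the disc. Each sub-array is one horizontal line, as required by make_image.

```ocaml
#open "graphics";;

(* An n x n color matrix holding a disc of color c on a transparent
   background; all sub-arrays have the same length n. *)
let disc_matrix n c =
  let m = make_matrix n n transp in
  let r = n / 2 in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      let dx = j - r and dy = i - r in
      if dx * dx + dy * dy <= r * r then m.(i).(j) <- c
    done
  done;
  m;;

(* Usage, with the graphics window open; the screen under the
   transparent points is left untouched:
     draw_image (make_image (disc_matrix 21 red)) 100 100;;  *)
```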
Mouse and keyboard events
type status =
{ mouse_x : int; (* X coordinate of the mouse *)
mouse_y : int; (* Y coordinate of the mouse *)
button : bool; (* true if a mouse button is pressed *)
keypressed : bool; (* true if a key has been pressed *)
key : char } (* the character for the key pressed *)
To report events.
type event =
Button_down (* A mouse button is pressed *)
| Button_up (* A mouse button is released *)
| Key_pressed (* A key is pressed *)
| Mouse_motion (* The mouse is moved *)
| Poll (* Don't wait; return immediately *)
To specify events to wait for.
value wait_next_event : event list -> status
Wait until one of the events specified in the given event list occurs,
and return the status of the mouse and keyboard at that time. If Poll is
given in the event list, return immediately with the current status. If
the mouse cursor is outside of the graphics window, the mouse_x and
mouse_y fields of the returned status are outside the range
0..size_x()-1, 0..size_y()-1. Keypresses are queued, and dequeued one by
one when the Key_pressed event is specified.
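A minimal interactive sketch in Caml Light built on wait_next_event; scribble is a hypothetical function. It draws a dot wherever a mouse button is pressed and returns when the q key is typed. It needs a display, so it is only defined here, not run.

```ocaml
#open "graphics";;

(* Draw a dot wherever a mouse button is pressed; stop when q is typed. *)
let scribble () =
  open_graph "";
  let rec loop () =
    let st = wait_next_event [Button_down; Key_pressed] in
    if st.keypressed && st.key = `q` then ()
    else begin
      if st.button then fill_circle st.mouse_x st.mouse_y 3;
      loop ()
    end in
  loop ();
  close_graph ();;
```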
Mouse and keyboard polling
value mouse_pos : unit -> int * int
Return the position of the mouse cursor, relative to the graphics window.
If the mouse cursor is outside of the graphics window, mouse_pos()
returns a point outside of the range 0..size_x()-1, 0..size_y()-1.
value button_down : unit -> bool
Return true if the mouse button is pressed, false otherwise.
value read_key : unit -> char
Wait for a key to be pressed, and return the corresponding character.
Keypresses are queued.
value key_pressed : unit -> bool
Return true if a keypress is available; that is, if read_key would not
block.
Sound
value sound : int -> int -> unit
sound freq dur plays a sound at frequency freq (in hertz) for a duration
dur (in milliseconds). On the Macintosh, the frequency is rounded to the
nearest note in the equal-tempered scale.
Chapter 16
The unix library: Unix system calls
The unix library (distributed in contrib/libunix) makes many Unix system calls
and system-related library functions available to Caml Light programs. This
chapter describes briefly the functions provided. Refer to sections 2 and 3
of the Unix manual for more details on the behavior of these functions.
Not all functions are provided by all Unix variants. If some functions are
not available, they will raise the exception Invalid_argument when called.
Programs that use the unix library must be linked in ``custom runtime''
mode, as follows:
camlc -custom other options unix.zo other files -lunix
For interactive use of the unix library, run camllight camlunix.
Mac: This library is not available.
PC: This library is not available.
16.1 unix: interface to the Unix system
Error report
type error =
ENOERR
| EPERM (* Not owner *)
| ENOENT (* No such file or directory *)
| ESRCH (* No such process *)
| EINTR (* Interrupted system call *)
| EIO (* I/O error *)
| ENXIO (* No such device or address *)
| E2BIG (* Arg list too long *)
| ENOEXEC (* Exec format error *)
| EBADF (* Bad file number *)
| ECHILD (* No children *)
| EAGAIN (* No more processes *)
| ENOMEM (* Not enough core *)
| EACCES (* Permission denied *)
| EFAULT (* Bad address *)
| ENOTBLK (* Block device required *)
| EBUSY (* Mount device busy *)
| EEXIST (* File exists *)
| EXDEV (* Cross-device link *)
| ENODEV (* No such device *)
| ENOTDIR (* Not a directory *)
| EISDIR (* Is a directory *)
| EINVAL (* Invalid argument *)
| ENFILE (* File table overflow *)
| EMFILE (* Too many open files *)
| ENOTTY (* Not a typewriter *)
| ETXTBSY (* Text file busy *)
| EFBIG (* File too large *)
| ENOSPC (* No space left on device *)
| ESPIPE (* Illegal seek *)
| EROFS (* Read-only file system *)
| EMLINK (* Too many links *)
| EPIPE (* Broken pipe *)
| EDOM (* Argument too large *)
| ERANGE (* Result too large *)
| EWOULDBLOCK (* Operation would block *)
| EINPROGRESS (* Operation now in progress *)
| EALREADY (* Operation already in progress *)
| ENOTSOCK (* Socket operation on non-socket *)
| EDESTADDRREQ (* Destination address required *)
| EMSGSIZE (* Message too long *)
| EPROTOTYPE (* Protocol wrong type for socket *)
| ENOPROTOOPT (* Protocol not available *)
| EPROTONOSUPPORT (* Protocol not supported *)
| ESOCKTNOSUPPORT (* Socket type not supported *)
| EOPNOTSUPP (* Operation not supported on socket *)
| EPFNOSUPPORT (* Protocol family not supported *)
| EAFNOSUPPORT (* Address family not supported by protocol family *)
| EADDRINUSE (* Address already in use *)
| EADDRNOTAVAIL (* Can't assign requested address *)
| ENETDOWN (* Network is down *)
| ENETUNREACH (* Network is unreachable *)
| ENETRESET (* Network dropped connection on reset *)
| ECONNABORTED (* Software caused connection abort *)
| ECONNRESET (* Connection reset by peer *)
| ENOBUFS (* No buffer space available *)
| EISCONN (* Socket is already connected *)
| ENOTCONN (* Socket is not connected *)
| ESHUTDOWN (* Can't send after socket shutdown *)
| ETOOMANYREFS (* Too many references: can't splice *)
| ETIMEDOUT (* Connection timed out *)
| ECONNREFUSED (* Connection refused *)
| ELOOP (* Too many levels of symbolic links *)
| ENAMETOOLONG (* File name too long *)
| EHOSTDOWN (* Host is down *)
| EHOSTUNREACH (* No route to host *)
| ENOTEMPTY (* Directory not empty *)
| EPROCLIM (* Too many processes *)
| EUSERS (* Too many users *)
| EDQUOT (* Disc quota exceeded *)
| ESTALE (* Stale NFS file handle *)
| EREMOTE (* Too many levels of remote in path *)
| EIDRM (* Identifier removed *)
| EDEADLK (* Deadlock condition. *)
| ENOLCK (* No record locks available. *)
| ENOSYS (* Function not implemented *)
| EUNKNOWNERR
The type of error codes.
exception Unix_error of error * string * string
Raised by the system calls below when an error is encountered. The first
component is the error code; the second component is the function name;
the third component is the string parameter to the function, if it has
one, or the empty string otherwise.
value error_message : error -> string
Return a string describing the given error code.
value handle_unix_error : ('a -> 'b) -> 'a -> 'b
handle_unix_error f x applies f to x and returns the result. If the
exception Unix_error is raised, it prints a message describing the error
and exits with code 2.
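A Caml Light sketch of the two ways of dealing with Unix_error: catching it and using error_message, or letting handle_unix_error report it. The helpers error_text and check_file are hypothetical; check_file uses the access function described later in this chapter with F_OK.

```ocaml
#open "unix";;

(* Format a Unix_error as a one-line message: function, argument, reason. *)
let error_text err fn arg =
  if arg = "" then fn ^ ": " ^ error_message err
  else fn ^ " \"" ^ arg ^ "\": " ^ error_message err;;

(* Return false, with a message on standard error, if the file
   cannot be accessed. *)
let check_file name =
  try access name [F_OK]; true
  with Unix_error (err, fn, arg) ->
    prerr_string (error_text err fn arg); prerr_string "\n"; false;;

(* A standalone program typically wraps its (hypothetical) entry point:
     handle_unix_error main ();;  *)
```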
Interface with the parent process
value environment : unit -> string vect
Return the process environment, as an array of strings with the format
``variable=value''. See also sys__getenv.
Process handling
type process_status =
WEXITED of int
| WSIGNALED of int * bool
| WSTOPPED of int
The termination status of a process. WEXITED means that the process
terminated normally by exit; the argument is the return code. WSIGNALED
means that the process was killed by a signal; the first argument is the
signal number, the second argument indicates whether a ``core dump'' was
performed. WSTOPPED means that the process was stopped by a signal; the
argument is the signal number.
type wait_flag =
WNOHANG
| WUNTRACED
Flags for waitopt and waitpid. WNOHANG means do not block if no child
has died yet, but immediately return with a pid equal to 0. WUNTRACED
means report also the children that receive stop signals.
value execv : string -> string vect -> unit
execv prog args executes the program in file prog, with the arguments
args, and the current process environment.
value execve : string -> string vect -> string vect -> unit
Same as execv, except that the third argument provides the environment to
the program executed.
value execvp : string -> string vect -> unit
Same as execv, except that the program is searched in the path.
value fork : unit -> int
Fork a new process. The returned integer is 0 for the child process, the
pid of the child process for the parent process.
value wait : unit -> int * process_status
Wait until one of the child processes dies, and return its pid and
termination status.
value waitopt : wait_flag list -> int * process_status
Same as wait, but takes a list of options to avoid blocking, or also
report stopped children. The pid returned is 0 if no child has changed
status.
value waitpid : wait_flag list -> int -> int * process_status
Same as waitopt, but waits for the process whose pid is given. Negative
pid arguments represent process groups.
value system : string -> process_status
Execute the given command, wait until it terminates, and return its
termination status. The string is interpreted by the shell /bin/sh and
therefore can contain redirections, quotes, variables, etc. The result
WEXITED 127 indicates that the shell couldn't be executed.
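A Caml Light sketch of process creation and status collection; describe_status and child_status are hypothetical helpers. The child calls sys__exit, which, unlike the exit of module io, does not flush the standard channels, so output buffered by the parent before fork is not written twice.

```ocaml
#open "unix";;

(* Human-readable form of a process_status. *)
let describe_status = function
    WEXITED n -> "exited with code " ^ string_of_int n
  | WSIGNALED (s, core) ->
      "killed by signal " ^ string_of_int s ^ (if core then " (core dumped)" else "")
  | WSTOPPED s -> "stopped by signal " ^ string_of_int s;;

(* Fork a child that exits at once with the given code, then collect
   its termination status in the parent with waitpid. *)
let child_status code =
  match fork () with
    0 -> sys__exit code
  | pid -> let (_, status) = waitpid [] pid in status;;
```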
value getpid : unit -> int
Return the pid of the process.
value getppid : unit -> int
Return the pid of the parent process.
value nice : int -> int
Change the process priority. The integer argument is added to the
``nice'' value. (Higher values of the ``nice'' value mean lower
priorities.) Return the new nice value.
Basic file input/output
type file_descr
The abstract type of file descriptors.
value stdin : file_descr
value stdout : file_descr
value stderr : file_descr
File descriptors for standard input, standard output and standard error.
type open_flag =
O_RDONLY (* Open for reading *)
| O_WRONLY (* Open for writing *)
| O_RDWR (* Open for reading and writing *)
| O_NDELAY (* Open in non-blocking mode *)
| O_APPEND (* Open for append *)
| O_CREAT (* Create if nonexistent *)
| O_TRUNC (* Truncate to 0 length if existing *)
| O_EXCL (* Fail if existing *)
The flags to open.
type file_perm == int
The type of file access rights.
value open : string -> open_flag list -> file_perm -> file_descr
Open the named file with the given flags. Third argument is the
permissions to give to the file if it is created. Return a file
descriptor on the named file.
value close : file_descr -> unit
Close a file descriptor.
value read : file_descr -> string -> int -> int -> int
read fd buff start len reads len characters from descriptor fd, storing
them in string buff, starting at position start in string buff. Return the
number of characters actually read.
value write : file_descr -> string -> int -> int -> int
write fd buff start len writes len characters to descriptor fd, taking
them from string buff, starting at position start in string buff. Return
the number of characters actually written.
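Because read and write report how many characters were actually transferred, a copy loop has to check both counts. A Caml Light sketch, with copy_fd and copy_file as hypothetical helpers and a buffer size of 4096 chosen arbitrarily:

```ocaml
#open "unix";;

(* Copy everything readable from fd_in to fd_out through a fixed buffer.
   read returns 0 at end of file; write may accept fewer characters than
   requested, so write_all loops until the whole chunk is written. *)
let copy_fd fd_in fd_out =
  let buffer_size = 4096 in
  let buf = create_string buffer_size in
  let rec write_all start len =
    if len > 0 then begin
      let n = write fd_out buf start len in
      write_all (start + n) (len - n)
    end in
  let rec loop () =
    let n = read fd_in buf 0 buffer_size in
    if n > 0 then begin write_all 0 n; loop () end in
  loop ();;

(* Copy file src to dst, creating dst with permissions 0o644 if needed.
   For brevity, the descriptors are not closed if an error occurs. *)
let copy_file src dst =
  let fd_in = open src [O_RDONLY] 0 in
  let fd_out = open dst [O_WRONLY; O_CREAT; O_TRUNC] 0o644 in
  copy_fd fd_in fd_out;
  close fd_in;
  close fd_out;;
```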
Interfacing with the standard input/output library (module io).
value in_channel_of_descr : file_descr -> in_channel
Create an input channel reading from the given descriptor.
value out_channel_of_descr : file_descr -> out_channel
Create an output channel writing on the given descriptor.
value descr_of_in_channel : in_channel -> file_descr
Return the descriptor corresponding to an input channel.
value descr_of_out_channel : out_channel -> file_descr
Return the descriptor corresponding to an output channel.
Seeking and truncating
type seek_command =
SEEK_SET
| SEEK_CUR
| SEEK_END
Positioning modes for lseek. SEEK_SET indicates positions relative to
the beginning of the file, SEEK_CUR relative to the current position,
SEEK_END relative to the end of the file.
value lseek : file_descr -> int -> seek_command -> int
Set the current position for a file descriptor, and return the resulting
position.
value truncate : string -> int -> unit
Truncates the named file to the given size.
value ftruncate : file_descr -> int -> unit
Truncates the file corresponding to the given descriptor to the given
size.
File statistics
type file_kind =
S_REG (* Regular file *)
| S_DIR (* Directory *)
| S_CHR (* Character device *)
| S_BLK (* Block device *)
| S_LNK (* Symbolic link *)
| S_FIFO (* Named pipe *)
| S_SOCK (* Socket *)
type stats =
{ st_dev : int; (* Device number *)
st_ino : int; (* Inode number *)
st_kind : file_kind; (* Kind of the file *)
st_perm : file_perm; (* Access rights *)
st_nlink : int; (* Number of links *)
st_uid : int; (* User id of the owner *)
st_gid : int; (* Group id of the owner *)
st_rdev : int; (* Device minor number *)
st_size : int; (* Size in bytes *)
st_atime : int; (* Last access time *)
st_mtime : int; (* Last modification time *)
st_ctime : int } (* Last status change time *)
The information returned by the stat calls.
value stat : string -> stats
Return the information for the named file.
value lstat : string -> stats
Same as stat, but in case the file is a symbolic link, return the
information for the link itself.
value fstat : file_descr -> stats
Return the information for the file associated with the given descriptor.
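As a sketch, a predicate testing whether a name denotes a directory can be built on stat, assuming the Unix_error exception of this library is raised for missing files. The helper is_directory is hypothetical. Because stat follows symbolic links, a link pointing to a directory also counts as a directory here; lstat would report S_LNK instead.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical predicate: true if name exists and is a directory.
   stat raises Unix_error when the file does not exist. *)
let is_directory name =
  try (stat name).st_kind = S_DIR
  with Unix_error(_, _, _) -> false;;
```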
Operations on file names
value unlink : string -> unit
Remove the named file.
value rename : string -> string -> unit
rename old new changes the name of a file from old to new.
value link : string -> string -> unit
link source dest creates a hard link named dest to the file named source.
File permissions and ownership
type access_permission =
R_OK (* Read permission *)
| W_OK (* Write permission *)
| X_OK (* Execution permission *)
| F_OK (* File exists *)
Flags for the access call.
value chmod : string -> file_perm -> unit
Change the permissions of the named file.
value fchmod : file_descr -> file_perm -> unit
Change the permissions of an opened file.
value chown : string -> int -> int -> unit
Change the owner uid and owner gid of the named file.
value fchown : file_descr -> int -> int -> unit
Change the owner uid and owner gid of an opened file.
value umask : int -> int
Set the process's file mode creation mask, and return the previous mask.
value access : string -> access_permission list -> unit
Check that the process has the given permissions over the named file.
Raise Unix_error otherwise.
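Since access reports failure by raising Unix_error rather than returning a boolean, a small wrapper is often convenient. The helper has_permissions below is hypothetical. Keep in mind that the answer can become stale between the check and a later use of the file, so it is best treated as a hint.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical wrapper: true if the process has all the permissions
   in perms over the named file, false if access raises Unix_error. *)
let has_permissions name perms =
  try access name perms; true
  with Unix_error(_, _, _) -> false;;
```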
File descriptor hacking
value fcntl_int : file_descr -> int -> int -> int
Interface to fcntl in the case where the argument is an integer. The
first integer argument is the command code; the second is the integer
parameter.
value fcntl_ptr : file_descr -> int -> string -> int
Interface to fcntl in the case where the argument is a pointer. The
integer argument is the command code. A pointer to the string argument
is passed as argument to the command.
Directories
value mkdir : string -> file_perm -> unit
Create a directory with the given permissions.
value rmdir : string -> unit
Remove an empty directory.
value chdir : string -> unit
Change the process working directory.
value getcwd : unit -> string
Return the name of the current working directory.
type dir_handle
The type of descriptors over opened directories.
value opendir : string -> dir_handle
Open a descriptor on a directory.
value readdir : dir_handle -> string
Return the next entry in a directory. Raise End_of_file when the end of
the directory has been reached.
value rewinddir : dir_handle -> unit
Reposition the descriptor to the beginning of the directory.
value closedir : dir_handle -> unit
Close a directory descriptor.
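The directory functions combine into the usual loop: open, call readdir until it raises End_of_file, then close. A minimal sketch with a hypothetical list_directory helper; the entries "." and ".." are returned like any other entry, and no particular order is guaranteed.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: the entries of a directory, in the order readdir
   returns them ("." and ".." included). readdir signals the end of the
   directory by raising End_of_file. *)
let list_directory name =
  let d = opendir name in
  let entries = ref [] in
  begin try
    while true do
      entries := readdir d :: !entries
    done
  with End_of_file -> ()
  end;
  closedir d;
  rev !entries;;
```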
Pipes and redirections
value pipe : unit -> file_descr * file_descr
Create a pipe. The first component of the result is opened for reading;
it is the exit of the pipe. The second component is opened for writing;
it is the entrance to the pipe.
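A minimal sketch of a pipe used within a single process: bytes written on the entrance (second component) come out of the exit (first component). The helper pipe_round_trip is hypothetical, and the message is assumed to be short, since with no concurrent reader a write larger than the pipe's capacity would block.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: send a short message through a pipe and read it
   back from the other end. *)
let pipe_round_trip message =
  let (exit_fd, entrance_fd) = pipe () in
  let len = string_length message in
  let _ = write entrance_fd message 0 len in
  close entrance_fd;
  let buffer = create_string len in
  let received = read exit_fd buffer 0 len in
  close exit_fd;
  sub_string buffer 0 received;;
```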
value dup : file_descr -> file_descr
Duplicate a descriptor.
value dup2 : file_descr -> file_descr -> unit
dup2 fd1 fd2 duplicates fd1 to fd2, closing fd2 if already opened.
value open_process_in: string -> in_channel
value open_process_out: string -> out_channel
value open_process: string -> in_channel * out_channel
High-level pipe and process management. These functions run the given
command in parallel with the program, and return channels connected to
the standard input and/or the standard output of the command. The
command is interpreted by the shell /bin/sh (cf. system). Warning:
writes on channels are buffered, hence be careful to call flush at the
right times to ensure correct synchronization.
value close_process_in: in_channel -> process_status
value close_process_out: out_channel -> process_status
value close_process: in_channel * out_channel -> process_status
Close channels opened by open_process_in, open_process_out and
open_process, respectively, wait for the associated command to terminate,
and return its termination status.
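For example, the output of a shell command can be read through open_process_in and its exit status checked with close_process_in. The helper first_line_of_command is hypothetical and assumes a command that prints at most one line; for longer output, read everything before closing, since a command still writing when the channel is closed may be killed by SIGPIPE.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: run cmd through /bin/sh, return the first line it
   prints, and fail unless it exits with status 0. *)
let first_line_of_command cmd =
  let ic = open_process_in cmd in
  let line = try input_line ic with End_of_file -> "" in
  match close_process_in ic with
    WEXITED 0 -> line
  | _ -> failwith ("command failed: " ^ cmd);;
```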
Symbolic links
value symlink : string -> string -> unit
symlink source dest creates the file dest as a symbolic link to the file
source.
value readlink : string -> string
Read the contents of a symbolic link, that is, the file name it points to.
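The following sketch creates a symbolic link, reads it back with readlink, and checks its kind with lstat. The file names are hypothetical scratch names. The target does not need to exist, which also shows why lstat rather than stat is used here: stat would follow the dangling link and raise Unix_error.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: create a dangling symbolic link, read it back,
   check its kind with lstat, then remove it. *)
let symlink_demo () =
  let link_name = "symlink_demo.tmp" in
  (try unlink link_name with Unix_error(_, _, _) -> ());
  symlink "no_such_target.tmp" link_name;
  let target = readlink link_name in
  let is_link = (lstat link_name).st_kind = S_LNK in
  unlink link_name;
  (target, is_link);;
```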
Named pipes
value mkfifo : string -> file_perm -> unit
Create a named pipe with the given permissions.
Special files
value ioctl_int : file_descr -> int -> int -> int
Interface to ioctl in the case where the argument is an integer. The
first integer argument is the command code; the second is the integer
parameter.
value ioctl_ptr : file_descr -> int -> string -> int
Interface to ioctl in the case where the argument is a pointer. The
integer argument is the command code. A pointer to the string argument
is passed as argument to the command.
Polling
value select :
file_descr list -> file_descr list -> file_descr list -> float ->
file_descr list * file_descr list * file_descr list
Wait until some input/output operations become possible on some channels.
The three list arguments are, respectively, a set of descriptors to check
for reading (first argument), for writing (second argument), or for
exceptional conditions (third argument). The fourth argument is the
maximal timeout, in seconds; a negative fourth argument means no timeout
(unbounded wait). The result is composed of three sets of descriptors:
those ready for reading (first component), ready for writing (second
component), and over which an exceptional condition is pending (third
component).
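With a timeout of 0.0, select polls the descriptors without waiting. A minimal sketch, with hypothetical helpers readable_now and select_demo, checks the exit of a pipe before and after one byte is written on its entrance.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: is fd ready for reading right now? *)
let readable_now fd =
  match select [fd] [] [] 0.0 with
    ([], _, _) -> false
  | _ -> true;;

(* Returns (false, true): nothing to read at first, then one byte. *)
let select_demo () =
  let (exit_fd, entrance_fd) = pipe () in
  let before = readable_now exit_fd in
  let _ = write entrance_fd "x" 0 1 in
  let after = readable_now exit_fd in
  close exit_fd;
  close entrance_fd;
  (before, after);;
```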
Locking
type lock_command =
F_ULOCK (* Unlock a region *)
| F_LOCK (* Lock a region, and block if already locked *)
| F_TLOCK (* Lock a region, or fail if already locked *)
| F_TEST (* Test a region for other process' locks *)
Commands for lockf.
value lockf : file_descr -> lock_command -> int -> unit
lockf fd cmd size puts a lock on a region of the file opened as fd. The
region starts at the current read/write position for fd (as set by
lseek), and extends size bytes forward if size is positive, size bytes
backwards if size is negative, or to the end of the file if size is zero.
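Because lockf measures the region from the current position, locking a whole file means seeking to offset 0 first and passing a size of 0. The hypothetical with_file_lock below does this, and releases the lock even if the protected function raises an exception. It assumes fd is opened for writing, as POSIX requires for F_LOCK.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: run f () while holding an exclusive lock on the
   whole file opened as fd (fd must be opened for writing). *)
let with_file_lock fd f =
  let unlock () =
    let _ = lseek fd 0 SEEK_SET in
    lockf fd F_ULOCK 0 in
  let _ = lseek fd 0 SEEK_SET in
  lockf fd F_LOCK 0;                (* blocks while another process holds it *)
  try
    let result = f () in
    unlock ();
    result
  with e ->
    unlock ();
    raise e;;
```

Since f may move the file position, the sketch seeks back to offset 0 before unlocking, so that the unlocked region matches the locked one.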
Signals
type signal =
SIGHUP (* hangup *)
| SIGINT (* interrupt *)
| SIGQUIT (* quit *)
| SIGILL (* illegal instruction (not reset when caught) *)
| SIGTRAP (* trace trap (not reset when caught) *)
| SIGABRT (* used by abort *)
| SIGEMT (* EMT instruction *)
| SIGFPE (* floating point exception *)
| SIGKILL (* kill (cannot be caught or ignored) *)
| SIGBUS (* bus error *)
| SIGSEGV (* segmentation violation *)
| SIGSYS (* bad argument to system call *)
| SIGPIPE (* write on a pipe with no one to read it *)
| SIGALRM (* alarm clock *)
| SIGTERM (* software termination signal from kill *)
| SIGURG (* urgent condition on IO channel *)
| SIGSTOP (* sendable stop signal not from tty *)
| SIGTSTP (* stop signal from tty *)
| SIGCONT (* continue a stopped process *)
| SIGCHLD (* to parent on child stop or exit *)
| SIGIO (* input/output possible signal *)
| SIGXCPU (* exceeded CPU time limit *)
| SIGXFSZ (* exceeded file size limit *)
| SIGVTALRM (* virtual time alarm *)
| SIGPROF (* profiling time alarm *)
| SIGWINCH (* window changed *)
| SIGLOST (* resource lost (eg, record-lock lost) *)
| SIGUSR1 (* user defined signal 1 *)
| SIGUSR2 (* user defined signal 2 *)
The type of signals.
type signal_handler =
Signal_default (* Default behavior for the signal *)
| Signal_ignore (* Ignore the signal *)
| Signal_handle of (unit -> unit) (* Call the given function
when the signal occurs. *)
The behavior on receipt of a signal.
value kill : int -> signal -> unit
Send a signal to the process with the given process id.
value signal : signal -> signal_handler -> unit
Set the behavior to be taken on receipt of the given signal.
value pause : unit -> unit
Wait until a non-ignored signal is delivered.
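A common use of signal: by default, writing on a pipe whose reading end is closed terminates the process with SIGPIPE; with the signal ignored, the write raises Unix_error with the error code EPIPE instead, which the program can handle. A sketch, assuming the EPIPE constructor of this library's error type; the helper name is hypothetical, and the change of SIGPIPE behavior persists after the call.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: ignore SIGPIPE, then write on a pipe whose
   reading end is already closed. Returns "EPIPE" if the write failed
   with that error code, "written" otherwise. *)
let write_to_closed_pipe () =
  signal SIGPIPE Signal_ignore;
  let (exit_fd, entrance_fd) = pipe () in
  close exit_fd;
  let outcome =
    (try let _ = write entrance_fd "x" 0 1 in "written"
     with Unix_error(EPIPE, _, _) -> "EPIPE") in
  close entrance_fd;
  outcome;;
```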
Time functions
type process_times =
{ tms_utime : float; (* User time for the process *)
tms_stime : float; (* System time for the process *)
tms_cutime : float; (* User time for the children processes *)
tms_cstime : float } (* System time for the children processes *)
The execution times (CPU times) of a process.
type tm =
{ tm_sec : int; (* Seconds 0..59 *)
tm_min : int; (* Minutes 0..59 *)
tm_hour : int; (* Hours 0..23 *)
tm_mday : int; (* Day of month 1..31 *)
tm_mon : int; (* Month of year 0..11 *)
tm_year : int; (* Year - 1900 *)
tm_wday : int; (* Day of week (Sunday is 0) *)
tm_yday : int; (* Day of year 0..365 *)
tm_isdst : bool } (* Daylight time savings in effect *)
The type representing wallclock time and calendar date.
value time : unit -> int
Return the current time since 00:00:00 GMT, Jan. 1, 1970, in seconds.
value gettimeofday : unit -> float
Same as time, but with resolution better than 1 second.
value gmtime : int -> tm
Convert a time in seconds, as returned by time, into a date and a time.
Assumes Greenwich meridian time zone.
value localtime : int -> tm
Convert a time in seconds, as returned by time, into a date and a time.
Assumes the local time zone.
value alarm : int -> int
Schedule a SIGALRM signal after the given number of seconds, and return
the number of seconds remaining on any previously scheduled alarm (0 if
none).
value sleep : int -> unit
Stop execution for the given number of seconds.
value times : unit -> process_times
Return the execution times of the process.
value utimes : string -> int -> int -> unit
Set the last access time (second arg) and last modification time (third
arg) for a file. Times are expressed in seconds from 00:00:00 GMT, Jan.
1, 1970.
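When turning a tm record into a date, remember the offsets given in its definition: tm_year counts from 1900 and tm_mon from 0. A minimal sketch of a hypothetical utc_date helper that formats a time in seconds as YYYY-MM-DD, using gmtime so that the result does not depend on the local time zone:

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: format a time in seconds since the epoch (as
   returned by time) as YYYY-MM-DD, in the GMT time zone. *)
let utc_date t =
  let date = gmtime t in
  let two_digits n =
    if n < 10 then "0" ^ string_of_int n else string_of_int n in
  string_of_int (date.tm_year + 1900) ^ "-" ^
  two_digits (date.tm_mon + 1) ^ "-" ^ two_digits date.tm_mday;;
```

For instance, utc_date 0 gives "1970-01-01", and utc_date (time ()) gives today's date in GMT.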
User id, group id
value getuid : unit -> int
Return the user id of the user executing the process.
value geteuid : unit -> int
Return the effective user id under which the process runs.
value setuid : int -> unit
Set the real user id and effective user id for the process.
value getgid : unit -> int
Return the group id of the user executing the process.
value getegid : unit -> int
Return the effective group id under which the process runs.
value setgid : int -> unit
Set the real group id and effective group id for the process.
value getgroups : unit -> int vect
Return the list of groups to which the user executing the process
belongs.
type passwd_entry =
{ pw_name : string;
pw_passwd : string;
pw_uid : int;
pw_gid : int;
pw_gecos : string;
pw_dir : string;
pw_shell : string }
Structure of entries in the passwd database.
type group_entry =
{ gr_name : string;
gr_passwd : string;
gr_gid : int;
gr_mem : string vect }
Structure of entries in the groups database.
value getlogin : unit -> string
Return the login name of the user executing the process.
value getpwnam : string -> passwd_entry
Find an entry in passwd with the given name, or raise Not_found.
value getgrnam : string -> group_entry
Find an entry in group with the given name, or raise Not_found.
value getpwuid : int -> passwd_entry
Find an entry in passwd with the given user id, or raise Not_found.
value getgrgid : int -> group_entry
Find an entry in group with the given group id, or raise Not_found.
Internet addresses
type inet_addr
The abstract type of Internet addresses.
value inet_addr_of_string : string -> inet_addr
value string_of_inet_addr : inet_addr -> string
Conversions between strings with the format XXX.YYY.ZZZ.TTT and Internet
addresses. inet_addr_of_string raises Failure when given a string that
does not match this format.
Sockets
type socket_domain =
PF_UNIX (* Unix domain *)
| PF_INET (* Internet domain *)
The type of socket domains.
type socket_type =
SOCK_STREAM (* Stream socket *)
| SOCK_DGRAM (* Datagram socket *)
| SOCK_RAW (* Raw socket *)
| SOCK_SEQPACKET (* Sequenced packets socket *)
The type of socket kinds, specifying the semantics of communications.
type sockaddr =
ADDR_UNIX of string
| ADDR_INET of inet_addr * int
The type of socket addresses. ADDR_UNIX name is a socket address in the
Unix domain; name is a file name in the file system.
ADDR_INET(addr,port) is a socket address in the Internet domain; addr is
the Internet address of the machine, and port is the port number.
type shutdown_command =
SHUTDOWN_RECEIVE (* Close for receiving *)
| SHUTDOWN_SEND (* Close for sending *)
| SHUTDOWN_ALL (* Close both *)
The type of commands for shutdown.
type msg_flag =
MSG_OOB
| MSG_DONTROUTE
| MSG_PEEK
The flags for recv, recvfrom, send and sendto.
value socket : socket_domain -> socket_type -> int -> file_descr
Create a new socket in the given domain, and with the given kind. The
third argument is the protocol type; 0 selects the default protocol for
that kind of socket.
value socketpair :
socket_domain -> socket_type -> int -> file_descr * file_descr
Create a pair of unnamed sockets, connected together.
value accept : file_descr -> file_descr * sockaddr
Accept connections on the given socket. The returned descriptor is a
socket connected to the client; the returned address is the address of
the connecting client.
value bind : file_descr -> sockaddr -> unit
Bind a socket to an address.
value connect : file_descr -> sockaddr -> unit
Connect a socket to an address.
value listen : file_descr -> int -> unit
Set up a socket for receiving connection requests. The integer argument
is the maximal number of pending requests.
value shutdown : file_descr -> shutdown_command -> unit
Shut down a socket connection. SHUTDOWN_SEND as second argument causes
reads on the other end of the connection to return an end-of-file
condition. SHUTDOWN_RECEIVE causes writes on the other end of the
connection to return a closed pipe condition (SIGPIPE signal).
value getsockname : file_descr -> sockaddr
Return the address of the given socket.
value getpeername : file_descr -> sockaddr
Return the address of the host connected to the given socket.
value recv : file_descr -> string -> int -> int -> msg_flag list -> int
value recvfrom :
file_descr -> string -> int -> int -> msg_flag list -> int * sockaddr
Receive data from a socket: recv on a connected socket, recvfrom on an
unconnected socket, in which case the address of the sender is also
returned.
value send : file_descr -> string -> int -> int -> msg_flag list -> int
value sendto :
file_descr -> string -> int -> int -> msg_flag list -> sockaddr -> int
Send data over a socket: send on a connected socket, sendto on an
unconnected socket, with the destination address given as last argument.
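A sketch of send and recv on a pair of connected Unix-domain sockets. It assumes that the string and integer arguments follow the buffer, offset, length convention of read and write, and that the message is small enough to arrive in a single recv; on a stream socket, a general program should loop until all bytes are received. The helper name is hypothetical.

```caml
#open "unix";;   (* assumes a toplevel or program linked with the unix library *)

(* Hypothetical helper: exchange a short message over a socketpair.
   Arguments of send/recv: socket, buffer, offset, length, flags. *)
let socketpair_round_trip message =
  let (a, b) = socketpair PF_UNIX SOCK_STREAM 0 in
  let len = string_length message in
  let _ = send a message 0 len [] in
  let buffer = create_string len in
  let received = recv b buffer 0 len [] in
  close a;
  close b;
  sub_string buffer 0 received;;
```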
High-level network connection functions
value open_connection : sockaddr -> in_channel * out_channel
Connect to a server at the given address. Return a pair of buffered
channels connected to the server. Remember to call flush on the output
channel at the right times to ensure correct synchronization.
value shutdown_connection : in_channel -> unit
"Shut down" a connection established with open_connection; that is,
transmit an end-of-file condition to the server reading on the other side
of the connection.
value establish_server : (in_channel -> out_channel -> unit) -> sockaddr -> unit
Establish a server on the given address. The function given as first
argument is called for each connection with two buffered channels
connected to the client. A new process is created for each connection.
The function establish_server never returns normally.
Host and protocol databases
type host_entry =
{ h_name : string;
h_aliases : string vect;
h_addrtype : socket_domain;
h_addr_list : inet_addr vect }
Structure of entries in the hosts database.
type protocol_entry =
{ p_name : string;
p_aliases : string vect;
p_proto : int }
Structure of entries in the protocols database.
type service_entry =
{ s_name : string;
s_aliases : string vect;
s_port : int;
s_proto : string }
Structure of entries in the services database.
value gethostname : unit -> string
Return the name of the local host.
value gethostbyname : string -> host_entry
Find an entry in hosts with the given name, or raise Not_found.
value gethostbyaddr : inet_addr -> host_entry
Find an entry in hosts with the given address, or raise Not_found.
value getprotobyname : string -> protocol_entry
Find an entry in protocols with the given name, or raise Not_found.
value getprotobynumber : int -> protocol_entry
Find an entry in protocols with the given protocol number, or raise
Not_found.
value getservbyname : string -> string -> service_entry
Find an entry in services with the given name and protocol name (for
instance "tcp"), or raise Not_found.
value getservbyport : int -> string -> service_entry
Find an entry in services with the given port number and protocol name,
or raise Not_found.
Terminal interface
The following functions implement the POSIX standard terminal interface.
They provide control over asynchronous communication ports and
pseudo-terminals. Refer to the termios man page for a complete
description.
type terminal_io = {
Input modes:
mutable c_ignbrk: bool; (* Ignore the break condition. *)
mutable c_brkint: bool; (* Signal interrupt on break condition. *)
mutable c_ignpar: bool; (* Ignore characters with parity errors. *)
mutable c_parmrk: bool; (* Mark parity errors. *)
mutable c_inpck: bool; (* Enable parity check on input. *)
mutable c_istrip: bool; (* Strip 8th bit on input characters. *)
mutable c_inlcr: bool; (* Map NL to CR on input. *)
mutable c_igncr: bool; (* Ignore CR on input. *)
mutable c_icrnl: bool; (* Map CR to NL on input. *)
mutable c_ixon: bool; (* Recognize XON/XOFF characters on input. *)
mutable c_ixoff: bool; (* Emit XON/XOFF chars to control input flow. *)
Output modes:
mutable c_opost: bool; (* Enable output processing. *)
Control modes:
mutable c_obaud: int; (* Output baud rate (0 means close connection).*)
mutable c_ibaud: int; (* Input baud rate. *)
mutable c_csize: int; (* Number of bits per character (5-8). *)
mutable c_cstopb: int; (* Number of stop bits (1-2). *)
mutable c_cread: bool; (* Reception is enabled. *)
mutable c_parenb: bool; (* Enable parity generation and detection. *)
mutable c_parodd: bool; (* Specify odd parity instead of even. *)
mutable c_hupcl: bool; (* Hang up on last close. *)
mutable c_clocal: bool; (* Ignore modem status lines. *)
Local modes:
mutable c_isig: bool; (* Generate signal on INTR, QUIT, SUSP. *)
mutable c_icanon: bool; (* Enable canonical processing
(line buffering and editing) *)
mutable c_noflsh: bool; (* Disable flush after INTR, QUIT, SUSP. *)
mutable c_echo: bool; (* Echo input characters. *)
mutable c_echoe: bool; (* Echo ERASE (to erase previous character). *)
mutable c_echok: bool; (* Echo KILL (to erase the current line). *)
mutable c_echonl: bool; (* Echo NL even if c_echo is not set. *)
Control characters:
mutable c_vintr: char; (* Interrupt character (usually ctrl-C). *)
mutable c_vquit: char; (* Quit character (usually ctrl-\). *)
mutable c_verase: char; (* Erase character (usually DEL or ctrl-H). *)
mutable c_vkill: char; (* Kill line character (usually ctrl-U). *)
mutable c_veof: char; (* End-of-file character (usually ctrl-D). *)
mutable c_veol: char; (* Alternate end-of-line char. (usually none). *)
mutable c_vmin: int; (* Minimum number of characters to read
before the read request is satisfied. *)
mutable c_vtime: int; (* Maximum read wait (in 0.1s units). *)
mutable c_vstart: char; (* Start character (usually ctrl-Q). *)
mutable c_vstop: char (* Stop character (usually ctrl-S). *)
}
value tcgetattr: file_descr -> terminal_io
Return the status of the terminal referred to by the given file
descriptor.
type setattr_when = TCSANOW | TCSADRAIN | TCSAFLUSH
value tcsetattr: file_descr -> setattr_when -> terminal_io -> unit
Set the status of the terminal referred to by the given file descriptor.
The second argument indicates when the status change takes place:
immediately (TCSANOW), when all pending output has been transmitted
(TCSADRAIN), or after flushing all input that has been received but not
read (TCSAFLUSH). TCSADRAIN is recommended when changing the output
parameters; TCSAFLUSH, when changing the input parameters.
value tcsendbreak: file_descr -> int -> unit
Send a break condition on the given file descriptor. The second argument
is the duration of the break, in 0.1s units; 0 means standard duration
(0.25s).
value tcdrain: file_descr -> unit
Wait until all output written on the given file descriptor has been
transmitted.
type flush_queue = TCIFLUSH | TCOFLUSH | TCIOFLUSH
value tcflush: file_descr -> flush_queue -> unit
Discard data written on the given file descriptor but not yet
transmitted, or data received but not yet read, depending on the second
argument: TCIFLUSH flushes data received but not read, TCOFLUSH flushes
data written but not transmitted, and TCIOFLUSH flushes both.
type flow_action = TCOOFF | TCOON | TCIOFF | TCION
value tcflow: file_descr -> flow_action -> unit
Suspend or restart reception or transmission of data on the given file
descriptor, depending on the second argument: TCOOFF suspends output,
TCOON restarts output, TCIOFF transmits a STOP character to suspend
input, and TCION transmits a START character to restart input.
Chapter 17
The num library: arbitrary-precision rational arithmetic
The num library (distributed in contrib/libnum) implements exact-precision
rational arithmetic. It is built upon the state-of-the-art BigNum
arbitrary-precision integer arithmetic package, and therefore achieves very
high performance.
The functions provided in this library are fully documented in The CAML
Numbers Reference Manual by Valerie Menissier-Morain, technical report 141,
INRIA, July 1992 (available by anonymous FTP from ftp.inria.fr, directory
INRIA/publications/RT, file RT-0141.ps.Z). A summary of the functions is given
below.
Programs that use the num library must be linked in "custom runtime" mode,
as follows:
camlc -custom other options nums.zo other files -lnums
For interactive use of the num library, run camllight camlnum.
Mac: This library is not available.
PC: This library is available by default in the standard runtime system and
in the toplevel system. Programs that use this library can be linked
normally, without the -custom option.
17.1 num: operations on numbers
Numbers (type num) are arbitrary-precision rational numbers, plus the
special elements 1/0 (infinity) and 0/0 (undefined).
type num = Int of int | Big_int of big_int | Ratio of ratio
The type of numbers.
value normalize_num : num -> num
value numerator_num : num -> num
value denominator_num : num -> num
Arithmetic operations
value prefix +/ : num -> num -> num
value add_num : num -> num -> num
Addition
value minus_num : num -> num
Unary negation.
value prefix -/ : num -> num -> num
value sub_num : num -> num -> num
Subtraction
value prefix */ : num -> num -> num
value mult_num : num -> num -> num
Multiplication
value square_num : num -> num
Squaring
value prefix // : num -> num -> num
value div_num : num -> num -> num
Division
value quo_num : num -> num -> num
value mod_num : num -> num -> num
Euclidean division: quotient and remainder
value prefix **/ : num -> num -> num
value power_num : num -> num -> num
Exponentiation
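A minimal sketch of exact arithmetic with the num library, assuming a toplevel started with camllight camlnum or a program linked as described above: fractions add without rounding error, and integer results grow beyond the range of int. The names sum_of_fractions and fact are hypothetical.

```caml
#open "num";;   (* assumes the num library is linked in *)

(* 1/3 + 1/6 is exactly 1/2: no rounding error. *)
let sum_of_fractions = (Int 1 // Int 3) +/ (Int 1 // Int 6);;

(* Hypothetical factorial on numbers; products too large for int
   are represented as big integers. *)
let rec fact n =
  if n <= 0 then Int 1 else Int n */ fact (n - 1);;
```

Here eq_num sum_of_fractions (Int 1 // Int 2) is true, and string_of_num (fact 25) is "15511210043330985984000000".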
value is_integer_num : num -> bool
Test if a number is an integer
value integer_num : num -> num
value floor_num : num -> num
value round_num : num -> num
value ceiling_num : num -> num
Approximate a number by an integer. floor_num n returns the largest
integer smaller than or equal to n. ceiling_num n returns the smallest
integer greater than or equal to n. integer_num n returns the integer
closest to n; in case of ties, it rounds towards zero. round_num n returns
the integer closest to n; in case of ties, it rounds away from zero.
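The four rounding functions differ most visibly on a tie such as -5/2 = -2.5. Following the descriptions above, the sketch expects floor_num to give -3, ceiling_num -2, integer_num -2 (tie rounded towards zero) and round_num -3 (tie rounded away from zero); agrees is a hypothetical helper.

```caml
#open "num";;   (* assumes the num library is linked in *)

(* -5/2 = -2.5 lies exactly halfway between -3 and -2. *)
let minus_five_halves = Int (-5) // Int 2;;

(* Hypothetical helper: does rounding function f map -5/2 to expected? *)
let agrees f expected = eq_num (f minus_five_halves) (Int expected);;

(* Expected, from the descriptions above:
     floor_num   -> -3
     ceiling_num -> -2
     integer_num -> -2   (tie rounded towards zero)
     round_num   -> -3   (tie rounded away from zero) *)
let all_agree =
  agrees floor_num (-3) && agrees ceiling_num (-2) &&
  agrees integer_num (-2) && agrees round_num (-3);;
```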
value sign_num : num -> int
Return -1, 0 or 1 according to the sign of the argument.
value prefix =/ : num -> num -> bool
value prefix </ : num -> num -> bool
value prefix >/ : num -> num -> bool
value prefix <=/ : num -> num -> bool
value prefix >=/ : num -> num -> bool
value prefix <>/ : num -> num -> bool
value eq_num : num -> num -> bool
value lt_num : num -> num -> bool
value le_num : num -> num -> bool
value gt_num : num -> num -> bool
value ge_num : num -> num -> bool
Usual comparisons between numbers
value compare_num : num -> num -> int
Return -1, 0 or 1 if the first argument is less than, equal to, or
greater than the second argument.
value max_num : num -> num -> num
value min_num : num -> num -> num
Return the greater (resp. the smaller) of the two arguments.
value abs_num : num -> num
Absolute value.
value succ_num: num -> num
succ_num n is n+1.
value pred_num: num -> num
pred_num n is n-1.
value incr_num: num ref -> unit
incr_num r is r := succ_num !r, where r is a reference to a number.
value decr_num: num ref -> unit
decr_num r is r := pred_num !r, where r is a reference to a number.
Coercions with strings
value string_of_num : num -> string
Convert a number to a string, using fractional notation.
value approx_num_fix : int -> num -> string
value approx_num_exp : int -> num -> string
Approximate a number by a decimal. The first argument is the required
precision. The second argument is the number to approximate.
approx_num_fix uses decimal notation; the first argument is the number of
digits after the decimal point. approx_num_exp uses scientific
(exponential) notation; the first argument is the number of digits in the
mantissa.
value num_of_string : string -> num
Convert a string to a number.
Coercions between numerical types
value int_of_num : num -> int
value num_of_int : int -> num
value nat_of_num : num -> nat__nat
value num_of_nat : nat__nat -> num
value num_of_big_int : big_int -> num
value big_int_of_num : num -> big_int
value ratio_of_num : num -> ratio
value num_of_ratio : ratio -> num
value float_of_num : num -> float
value num_of_float : float -> num
value sys_print_num : int -> string -> num -> string -> unit
value print_num : num -> unit
17.2 arith_status: flags that control rational arithmetic
value arith_status: unit -> unit
Print the current status of the arithmetic flags.
value get_error_when_null_denominator : unit -> bool
value set_error_when_null_denominator : bool -> unit
Get or set the flag null_denominator. When on, attempting to create a
rational with a null denominator raises an exception. When off,
rationals with null denominators are accepted. Initially: on.
value get_normalize_ratio : unit -> bool
value set_normalize_ratio : bool -> unit
Get or set the flag normalize_ratio. When on, rational numbers are
normalized after each operation. When off, rational numbers are not
normalized until printed. Initially: off.
value get_normalize_ratio_when_printing : unit -> bool
value set_normalize_ratio_when_printing : bool -> unit
Get or set the flag normalize_ratio_when_printing. When on, rational
numbers are normalized before being printed. When off, rational numbers
are printed as is, without normalization. Initially: on.
value get_approx_printing : unit -> bool
value set_approx_printing : bool -> unit
Get or set the flag approx_printing. When on, rational numbers are
printed as a decimal approximation. When off, rational numbers are
printed as a fraction. Initially: off.
value get_floating_precision : unit -> int
value set_floating_precision : int -> unit
Get or set the parameter floating_precision. This parameter is the
number of digits displayed when approx_printing is on. Initially: 12.
Chapter 18
The str library: regular expressions and string processing
The str library (distributed in contrib/libstr) provides high-level string
processing functions, some based on regular expressions. It is intended to
support the kind of file processing that is usually performed with scripting
languages such as awk, perl or sed.
Programs that use the str library must be linked in "custom runtime" mode,
as follows:
camlc -custom other options str.zo other files -lstr
For interactive use of the str library, run camllight camlstr.
Mac: This library is not available.
PC: This library is not available.
18.1 str: regular expressions and high-level string processing
Regular expressions
type regexp
The type of compiled regular expressions.
value regexp: string -> regexp
Compile a regular expression. The syntax for regular expressions is the
same as in Gnu Emacs. The special characters are \$^.*+?[]. The
following constructs are recognized:
. matches any character except newline
* (postfix) matches the previous expression zero, one or several
times
+ (postfix) matches the previous expression one or several times
? (postfix) matches the previous expression once or not at all
[..] character set; ranges are denoted with -, as in a-z; an initial ^,
as in ^0-9, complements the set
^ matches at beginning of line
$ matches at end of line
\| (infix) alternative between two expressions
\(..\) grouping and naming of the enclosed expression
\1 the text matched by the first \(...\) expression (\2 for the
second expression, etc.)
\b matches word boundaries
\ quotes special characters.
value regexp_case_fold: string -> regexp
Same as regexp, but the compiled expression will match text in a
case-insensitive way: uppercase and lowercase letters will be considered
equivalent.
String matching and searching
value string_match: regexp -> string -> int -> bool
string_match r s start tests whether the characters in s starting at
position start match the regular expression r. The first character of a
string has position 0, as usual.
value search_forward: regexp -> string -> int -> int
search_forward r s start searches the string s for a substring matching
the regular expression r. The search starts at position start and
proceeds towards the end of the string. Return the position of the first
character of the matched substring, or raise Not_found if no substring
matches.
value search_backward: regexp -> string -> int -> int
Same as search_forward, but the search proceeds towards the beginning of
the string.
value matched_string: string -> string
matched_string s returns the substring of s that was matched by the
latest string_match, search_forward or search_backward. The user must
make sure that the parameter s is the same string that was passed to the
matching or searching function.
value match_beginning: unit -> int
value match_end: unit -> int
match_beginning() returns the position of the first character of the
substring that was matched by string_match, search_forward or
search_backward. match_end() returns the position of the character
following the last character of the matched substring.
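search_forward and match_end together allow scanning a string for successive matches, restarting each search where the previous match ended. A sketch with a hypothetical all_numbers helper, which collects every run of digits:

```caml
#open "str";;   (* assumes the str library is linked in *)

(* Hypothetical helper: every run of digits in s, from left to right. *)
let all_numbers s =
  let digits = regexp "[0-9]+" in
  let rec scan pos found =
    if pos >= string_length s then rev found else
    let start = try search_forward digits s pos with Not_found -> -1 in
    if start < 0 then rev found
    else scan (match_end ()) (matched_string s :: found) in
  scan 0 [];;
```

For instance, all_numbers "a1 b22 c333" returns ["1"; "22"; "333"]. The pattern never matches the empty string, so each step advances and the scan terminates.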
value matched_group: int -> string -> string
matched_group n s returns the substring of s that was matched by the nth
group \(...\) of the regular expression during the latest string_match,
search_forward or search_backward. The user must make sure that the
parameter s is the same string that was passed to the matching or
searching function.
value group_beginning: int -> int
value group_end: int -> int
group_beginning n returns the position of the first character of the
substring that was matched by the nth group of the regular expression.
group_end n returns the position of the character following the last
character of the matched substring.
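Groups are typically used to pick a string apart after a successful match. A sketch with a hypothetical parse_key_value helper. Note the doubled backslashes, since \( and \) must reach regexp through a Caml Light string literal, and that matched_group has to be called before the next matching operation overwrites the match information.

```caml
#open "str";;   (* assumes the str library is linked in *)

(* A lowercase name, "=", then digits up to the end of the string.
   Both parts are groups. *)
let key_value = regexp "\\([a-z]+\\)=\\([0-9]+\\)$";;

(* Hypothetical helper: "width=80" gives ("width", 80). *)
let parse_key_value s =
  if string_match key_value s 0
  then (matched_group 1 s, int_of_string (matched_group 2 s))
  else failwith "parse_key_value";;
```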
Replacement
value global_replace: regexp -> string -> string -> string
global_replace regexp repl s returns a string identical to s, except that
all substrings of s that match regexp have been replaced by repl. The
replacement text repl can contain \1, \2, etc; these sequences will be
replaced by the text matched by the corresponding group in the regular
expression. \0 stands for the text matched by the whole regular
expression.
value replace_first: regexp -> string -> string -> string
Same as global_replace, except that only the first substring matching the
regular expression is replaced.
value global_substitute: regexp -> (string -> string) -> string -> string
global_substitute regexp subst s returns a string identical to s, except
that all substrings of s that match regexp have been replaced by the
result of function subst. The function subst is called once for each
matching substring, and receives s (the whole text, not the matched
substring) as argument; it can pass s to matched_string or matched_group
to retrieve the text of the current match.
value substitute_first: regexp -> (string -> string) -> string -> string
Same as global_substitute, except that only the first substring matching
the regular expression is replaced.
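The following Caml Light sketch, assuming the str library is linked in, doubles the numbers found in a string. The helper double_match is hypothetical; it illustrates one way to use the argument passed to the substitution function: since that argument is the whole text, the helper calls matched_string on it to obtain the substring currently being replaced.

```caml
#open "str";;

(* Hypothetical helper: whole is the entire text being processed;
   matched_string whole returns the number currently matched. *)
let double_match whole =
  string_of_int (2 * int_of_string (matched_string whole));;

let digits = regexp "[0-9]+";;

print_endline (global_substitute digits double_match "a1 b20");;  (* a2 b40 *)
print_endline (substitute_first digits double_match "a1 b20");;   (* a2 b20 *)
```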
Splitting
value split: regexp -> string -> string list
split r s splits s into substrings, taking as delimiters the substrings
that match r, and returns the list of substrings. For instance,
split (regexp "[ \t]+") s splits s into blank-separated words.
value bounded_split: regexp -> string -> int -> string list
Same as split, but splits into at most n substrings, where n is the extra
integer parameter.
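A Caml Light sketch of split and bounded_split, assuming the str library is linked in; the input strings are arbitrary.

```caml
#open "str";;

(* Runs of blanks and tabs act as a single delimiter, because the
   regular expression ends with +. *)
let words = split (regexp "[ \t]+") "one two\tthree";;

(* At most 2 substrings: the text after the first comma stays whole. *)
let fields = bounded_split (regexp ",") "a,b,c" 2;;

do_list print_endline words;;    (* one, two, three on separate lines *)
do_list print_endline fields;;   (* a, then b,c *)
```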
Joining
value concat: string list -> string
Same as string__concat: concatenate a list of strings.
value join: string -> string list -> string
Concatenate a list of strings. The first argument is a separator, which is
inserted between consecutive strings.
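A minimal Caml Light sketch contrasting concat and join, assuming the str library is linked in; the strings are arbitrary.

```caml
#open "str";;

(* concat glues the strings together with nothing in between. *)
let glued = concat ["ab"; "cd"; "ef"];;       (* "abcdef" *)
(* join inserts its first argument between consecutive strings. *)
let listed = join ", " ["ab"; "cd"; "ef"];;   (* "ab, cd, ef" *)

print_endline glued;;
print_endline listed;;
```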
Extracting substrings
value string_before: string -> int -> string
string_before s n returns the substring of all characters of s that
precede position n (excluding the character at position n).
value string_after: string -> int -> string
string_after s n returns the substring of all characters of s that follow
position n (including the character at position n).
value first_chars: string -> int -> string
first_chars s n returns the first n characters of s. This is the same
function as string_before.
value last_chars: string -> int -> string
last_chars s n returns the last n characters of s.
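A Caml Light sketch of the four extraction functions on one string, assuming the str library is linked in; the string "hello" and the position 2 are arbitrary. Positions start at 0, so position 2 holds the first l.

```caml
#open "str";;

let s = "hello";;
let before = string_before s 2;;   (* "he": positions 0 and 1 *)
let after = string_after s 2;;     (* "llo": positions 2 to 4 *)
let head = first_chars s 2;;       (* "he", same as string_before *)
let tail = last_chars s 2;;        (* "lo" *)

do_list print_endline [before; after; head; tail];;
```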
Formatting
value format: ('a, unit, string) printf__format -> 'a
Same as printf__sprintf.
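A one-line Caml Light sketch, assuming the str library is linked in; the format string and its arguments are arbitrary. The %s and %d conversions behave as in printf__sprintf.

```caml
#open "str";;

(* Builds the string instead of printing it, like printf__sprintf. *)
let label = format "%s=%d" "x" 42;;   (* "x=42" *)

print_endline label;;
```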
Part V
Appendix
Chapter 19
Further reading
For the interested reader, we list below some references to books and reports
related (sometimes loosely) to Caml Light.
19.1 Programming in ML
The books below are programming courses taught in ML. Their main goal is to
teach programming, not to describe ML in full detail, though most contain
fairly good introductions to the ML language. Some of these books use the
Standard ML dialect instead of the Caml dialect, so you will have to keep in
mind the differences in syntax and semantics.
- Pierre Weis and Xavier Leroy. Le langage Caml. InterEditions, 1993.
The natural companion to this manual, provided you read French. This
book is a step-by-step introduction to programming in Caml, and presents
many realistic examples of Caml programs.
- Guy Cousineau and Michel Mauny. Approche fonctionnelle de la
programmation. Ediscience, 1995.
Another Caml programming course written in French, with many original
examples.
- Lawrence C. Paulson. ML for the working programmer. Cambridge
University Press, 1991.
A good introduction to programming in Standard ML. Develops a theorem
prover as a complete example. Contains a presentation of the module
system of Standard ML.
- Jeffrey D. Ullman. Elements of ML programming. Prentice Hall, 1993.
Another good introduction to programming in Standard ML. No realistic
examples, but a very detailed presentation of the language constructs.
- Ryan Stansifer. ML primer. Prentice-Hall, 1992.
A short but nice introduction to programming in Standard ML.
- Thérèse Accart Hardin and Véronique Donzeau-Gouge Viguié. Concepts et
outils de la programmation. Du fonctionnel à l'impératif avec Caml et
Ada. InterEditions, 1992.
A first course in programming that introduces the main programming
notions in Caml, then shows how they underlie Ada. Intended for beginners;
slow-paced for others.
- Rachel Harrison. Abstract Data Types in Standard ML. John Wiley & Sons,
1993.
A presentation of Standard ML from the standpoint of abstract data types.
Makes intensive use of the Standard ML module system.
- Harold Abelson and Gerald Jay Sussman. Structure and Interpretation of
Computer Programs. The MIT Press, 1985. (French translation: Structure
et interprétation des programmes informatiques, InterEditions, 1989.)
An outstanding course on programming, taught in Scheme, a modern
dialect of Lisp. Well worth reading, even if you are more interested in
ML than in Lisp.
19.2 Descriptions of ML dialects
The books and reports below are descriptions of various programming languages
from the ML family. They assume some familiarity with ML.
- Xavier Leroy and Pierre Weis. Manuel de référence du langage Caml.
InterEditions, 1993.
The French edition of the present reference manual and user's manual.
- Robert Harper. Introduction to Standard ML. Technical report
ECS-LFCS-86-14, University of Edinburgh, 1986.
An overview of Standard ML, including the module system. Terse, but
still readable.
- Robin Milner, Mads Tofte and Robert Harper. The Definition of Standard
ML. The MIT Press, 1990.
A complete formal definition of Standard ML, in the framework of
structural operational semantics. This book is probably the most
mathematically precise definition of a programming language ever written.
It is heavy on formalism and extremely terse, so even readers who are
thoroughly familiar with ML will have major difficulties with it.
- Robin Milner and Mads Tofte. Commentary on Standard ML. The MIT Press,
1991.
A commentary on the book above that attempts to explain its most
delicate parts and motivate the design choices. Easier to read than the
Definition, but still rather demanding.
- Guy Cousineau and Gérard Huet. The CAML primer. Technical report 122,
INRIA, 1990.
A short description of the original Caml system, from which Caml Light
has evolved. Some familiarity with Lisp is assumed.
- Pierre Weis et al. The CAML reference manual, version 2.6.1. Technical
report 121, INRIA, 1990.
The manual for the original Caml system, from which Caml Light has
evolved.
- Michael J. Gordon, Arthur J. Milner and Christopher P. Wadsworth.
Edinburgh LCF. Lecture Notes in Computer Science volume 78,
Springer-Verlag, 1979.
This is the first published description of the ML language, from the time
when it was nothing more than the control language for the LCF theorem
prover. This book is now obsolete, since the ML language has evolved
considerably since then, but it is still of historical interest.
- Paul Hudak, Simon Peyton-Jones and Philip Wadler. Report on the
programming language Haskell, version 1.1. Technical report, Yale
University, 1991.
Haskell is a purely functional language with lazy semantics that shares
many important points with ML (first-class functions, polymorphic typing),
but has interesting features of its own (dynamic overloading, also called
type classes).
19.3 Implementing functional programming languages
The references below are intended for those who are curious to learn how a
language like Caml Light is compiled and implemented.
- Xavier Leroy. The ZINC experiment: an economical implementation of the
ML language. Technical report 117, INRIA, 1990. (Available by anonymous
FTP on ftp.inria.fr.)
A description of the ZINC implementation, the prototype ML implementation
that has evolved into Caml Light. Large parts of this report still apply
to the current Caml Light system, in particular the description of the
execution model and abstract machine. Other parts are now obsolete. Yet
this report still gives a complete overview of the implementation
techniques used in Caml Light.
- Simon Peyton-Jones. The implementation of functional programming
languages. Prentice-Hall, 1987. (French translation: Mise en œuvre des
langages fonctionnels de programmation, Masson, 1990.)
An excellent description of the implementation of purely functional
languages with lazy semantics, using the technique known as graph
reduction. The part of the book that deals with the transformation from
ML to enriched lambda-calculus directly applies to Caml Light. You will
find a good description of how pattern-matching is compiled and how types
are inferred. The remainder of the book does not apply directly to Caml
Light, since Caml Light is not purely functional (it has side-effects),
has strict semantics, and does not use graph reduction at all.
- Andrew W. Appel. Compiling with continuations. Cambridge University
Press, 1992.
A complete description of an optimizing compiler for Standard ML, based
on an intermediate representation called continuation-passing style.
Shows how many advanced program optimizations can be applied to ML. Not
directly relevant to the Caml Light system, since Caml Light does not use
continuation-passing style at all, and makes little attempt at
optimizing programs.
19.4 Applications of ML
The following reports show ML at work in various, sometimes unexpected, areas.
- Emmanuel Chailloux and Guy Cousineau. The MLgraph primer. Technical
report 92-15, École Normale Supérieure, 1992. (Available by anonymous
FTP on ftp.ens.fr.)
Describes a Caml Light library that produces PostScript pictures through
high-level drawing functions.
- Xavier Leroy. Programmation du système Unix en Caml Light. Technical
report 147, INRIA, 1992. (Available by anonymous FTP on ftp.inria.fr.)
A Unix systems programming course, demonstrating the use of the Caml
Light library that gives access to Unix system calls.
- John H. Reppy. Concurrent programming with events: the Concurrent ML
manual. Cornell University, 1990. (Available by anonymous FTP on
research.att.com.)
Concurrent ML extends Standard ML of New Jersey with concurrent processes
that communicate through channels and events.
- Jeannette M. Wing, Manuel Faehndrich, J. Gregory Morrisett and Scott
Nettles. Extensions to Standard ML to support transactions. Technical
report CMU-CS-92-132, Carnegie-Mellon University, 1992. (Available by
anonymous FTP on reports.adm.cs.cmu.edu.)
How to integrate basic database operations into Standard ML.
- Emden R. Gansner and John H. Reppy. eXene. Bell Labs, 1991. (Available
by anonymous FTP on research.att.com.)
An interface between Standard ML of New Jersey and the X Window System.
Index to the library
! (infix), 112
!= (infix), 98
& (infix), 95
&& (infix), 95
* (infix), 99, 102
** (infix), 100
**. (infix), 100
**/ (infix), 169
*. (infix), 99
*/ (infix), 169
+ (infix), 99, 102
+. (infix), 99
+/ (infix), 168
- (infix), 99, 102
-. (infix), 99
-/ (infix), 169
/ (infix), 99, 102
/. (infix), 99
// (infix), 169
< (infix), 97
<. (infix), 100
</ (infix), 170
<= (infix), 97
<=. (infix), 100
<=/ (infix), 170
<> (infix), 97
<>. (infix), 100
<>/ (infix), 170
= (infix), 97
=. (infix), 100
=/ (infix), 170
== (infix), 98
> (infix), 97
>. (infix), 100
>/ (infix), 170
>= (infix), 97
>=. (infix), 100
>=/ (infix), 170
@ (infix), 109
^ (infix), 114
|| (infix), 95
abs, 102
abs_float, 100
abs_num, 170
accept, 163
access, 156
acos, 100
add, 119, 131, 133, 136, 137
add_float, 99
add_int, 102
add_num, 168
alarm, 160
approx_num_exp, 170
approx_num_fix, 170
arg (module), 118
arith_status, 171
arith_status (module), 171
asin, 100
asr (infix), 103
assoc, 111
assq, 111
atan, 100
atan2, 100
background, 144
Bad (exception), 119
baltree (module), 119
basename, 121
big_int_of_num, 171
bind, 163
black, 144
blit_image, 147
blit_string, 114
blit_vect, 116
blue, 144
bool (module), 95
bounded_split, 174
Break (exception), 141
builtin (module), 96
button_down, 148
catch_break, 141
cd, 51
ceil, 100
ceiling_num, 169
char (module), 97
char_for_read, 97
char_of_int, 97
chdir, 141, 156
check_suffix, 121
chmod, 155
choose, 138
chop_suffix, 121
chown, 155
clear, 130, 136, 139
clear_graph, 143
clear_parser, 134
close, 141, 153
close_box, 122
close_graph, 143
close_in, 109
close_out, 107
close_process, 157
close_process_in, 157
close_process_out, 157
close_tbox, 125
closedir, 157
combine, 111
command_line, 140
compare, 97, 120, 138
compare_num, 170
compare_strings, 115
compile, 49
concat, 114, 120, 174
concat_vect, 116
connect, 163
contains, 119
copy_vect, 116
cos, 100
cosh, 100
create_image, 147
create_lexer, 132
create_lexer_channel, 132
create_lexer_string, 132
create_string, 114
current_dir_name, 120
current_point, 145
cyan, 144
debug_mode, 50
decr, 112
decr_num, 170
denominator_num, 168
descr_of_in_channel, 153
descr_of_out_channel, 154
diff, 138
directory, 51
dirname, 121
div_float, 99
div_int, 102
div_num, 169
Division_by_zero (exception), 102
do_list, 109
do_list2, 110
do_list_combine, 112
do_stream, 113
do_table, 131
do_table_rev, 131
do_vect, 117
draw_arc, 145
draw_char, 145
draw_circle, 145
draw_ellipse, 145
draw_image, 146
draw_string, 145
dump_image, 146
dup, 157
dup2, 157
elements, 138
empty, 133, 137
Empty (exception), 136, 139
End_of_file (exception), 104
end_of_stream, 113
environment, 151
eprint, 136
eprintf, 127, 135
eq (module), 97
eq_float, 100
eq_int, 102
eq_num, 170
eq_string, 115
equal, 138
err_formatter, 126
error_message, 151
establish_server, 164
exc (module), 98
except, 110
exceptq, 110
execv, 151
execve, 151
execvp, 152
exists, 110
exit, 104, 140
Exit (exception), 98
exp, 100
Failure (exception), 98
failwith, 99
fchar (module), 99
fchmod, 155
fchown, 155
fcntl_int, 156
fcntl_ptr, 156
filename (module), 120
fill_arc, 146
fill_circle, 146
fill_ellipse, 146
fill_poly, 146
fill_rect, 146
fill_string, 114
fill_vect, 116
find, 120, 131, 133
find_all, 131
first_chars, 175
flat_map, 110
float, 137
float (module), 99
float_of_int, 99
float_of_num, 171
float_of_string, 101
floor, 100
floor_num, 169
flush, 106
fold, 138
for_all, 110
force_newline, 123
foreground, 144
fork, 152
format, 175
format (module), 121
fprint, 135
fprintf, 127, 135
frexp, 101
fst, 111
fstat, 155
fstring (module), 101
ftruncate, 154
full_init, 137
full_major, 129
fvect (module), 101
gc (module), 128
ge_float, 100
ge_int, 102
ge_num, 170
ge_string, 115
genlex (module), 129
get, 129
get_approx_printing, 171
get_ellipsis_text, 125
get_error_when_null_denominator, 171
get_floating_precision, 171
get_formatter_output_functions, 126
get_image, 147
get_lexeme, 132
get_lexeme_char, 133
get_lexeme_end, 133
get_lexeme_start, 133
get_margin, 123
get_max_boxes, 124
get_max_indent, 124
get_normalize_ratio, 171
get_normalize_ratio_when_printing, 171
getcwd, 156
getegid, 161
getenv, 141
geteuid, 161
getgid, 161
getgrgid, 162
getgrnam, 161
getgroups, 161
gethostbyaddr, 165
gethostbyname, 165
gethostname, 164
getlogin, 161
getpeername, 163
getpid, 152
getppid, 152
getprotobyname, 165
getprotobynumber, 165
getpwnam, 161
getpwuid, 162
getservbyname, 165
getservbyport, 165
getsockname, 163
gettimeofday, 160
getuid, 160
global_replace, 174
global_substitute, 174
gmtime, 160
Graphic_failure (exception), 143
graphics (module), 143
green, 144
group_beginning, 173
group_end, 173
gt_float, 100
gt_int, 102
gt_num, 170
gt_string, 115
handle_unix_error, 151
hash, 131
hash_param, 131
hashtbl (module), 130
hd, 109
in_channel_length, 109
in_channel_of_descr, 153
include, 49
incr, 112
incr_num, 170
index, 111
index_char, 115
index_char_from, 115
inet_addr_of_string, 162
init, 137
init_vect, 116
input, 108
input_binary_int, 108
input_byte, 108
input_char, 108
input_line, 108
input_value, 108
install_printer, 50
int, 137
int (module), 101
int_of_char, 97
int_of_float, 99
int_of_num, 171
int_of_string, 103
integer_num, 169
inter, 138
interactive, 140
intersect, 111
invalid_arg, 99
Invalid_argument (exception), 98
io (module), 104
ioctl_int, 158
ioctl_ptr, 158
is_absolute, 121
is_empty, 137
is_integer_num, 169
it_list, 109
it_list2, 110
iter, 134, 136, 138, 139
join, 174
key_pressed, 148
kill, 159
land (infix), 103
last_chars, 175
ldexp, 101
le_float, 100
le_int, 102
le_num, 170
le_string, 115
length, 136, 139
lexing (module), 132
lineto, 145
link, 155
list (module), 109
list_it, 110
list_it2, 110
list_length, 109
list_of_vect, 117
listen, 163
lnot, 103
load, 49
load_object, 49
localtime, 160
lockf, 158
log, 100
log10, 100
lor (infix), 103
lseek, 154
lshift_left, 103
lshift_right, 103
lsl (infix), 103
lsr (infix), 103
lstat, 155
lt_float, 100
lt_int, 102
lt_num, 170
lt_string, 115
lxor (infix), 103
magenta, 144
major, 129
make_formatter, 126
make_image, 146
make_lexer, 130
make_matrix, 116
make_string, 114
make_vect, 116
map, 109
map (module), 133
map2, 110
map_combine, 112
map_vect, 117
map_vect_list, 117
match_beginning, 173
match_end, 173
Match_failure (exception), 21-23, 96
matched_group, 173
matched_string, 173
max, 98
max_int, 103
max_num, 170
mem, 110, 137
mem_assoc, 111
memq, 110
merge, 138
min, 98
min_int, 103
min_num, 170
minor, 129
minus, 99, 102
minus_float, 99
minus_int, 102
minus_num, 169
mkdir, 156
mkfifo, 157
mod (infix), 102
mod_float, 101
mod_num, 169
modf, 101
modify, 120
mouse_pos, 147
moveto, 145
mult_float, 99
mult_int, 102
mult_num, 169
nat_of_num, 171
neq_float, 100
neq_int, 102
neq_string, 115
new, 130, 136, 139
nice, 152
normalize_num, 168
not (infix), 96
Not_found (exception), 98
nth_char, 113
num (module), 168
num_of_big_int, 171
num_of_float, 171
num_of_int, 171
num_of_nat, 171
num_of_ratio, 171
num_of_string, 170
numerator_num, 168
open, 140, 153
open_box, 122
open_connection, 164
open_descriptor_in, 107
open_descriptor_out, 106
open_graph, 143
open_hbox, 124
open_hovbox, 124
open_hvbox, 124
open_in, 107
open_in_bin, 107
open_in_gen, 107
open_out, 105
open_out_bin, 106
open_out_gen, 106
open_process, 157
open_process_in, 157
open_process_out, 157
open_tbox, 125
open_vbox, 124
opendir, 156
or (infix), 95
out_channel_length, 107
out_channel_of_descr, 153
Out_of_memory (exception), 98
output, 106
output_binary_int, 106
output_byte, 106
output_char, 106
output_compact_value, 107
output_string, 106
output_value, 106
over_max_boxes, 124
pair (module), 111
parse, 119
Parse_error (exception), 113, 134
Parse_failure (exception), 112
parsing (module), 134
pause, 159
peek, 136
pipe, 157
plot, 144
point_color, 144
pop, 139
pos_in, 109
pos_out, 107
power, 100
power_num, 169
pp_close_box, 127
pp_close_tbox, 127
pp_force_newline, 127
pp_get_ellipsis_text, 127
pp_get_formatter_output_functions, 127
pp_get_margin, 127
pp_get_max_boxes, 127
pp_get_max_indent, 127
pp_open_box, 127
pp_open_hbox, 127
pp_open_hovbox, 127
pp_open_hvbox, 127
pp_open_tbox, 127
pp_open_vbox, 127
pp_over_max_boxes, 127
pp_print_as, 127
pp_print_bool, 127
pp_print_break, 127
pp_print_char, 127
pp_print_cut, 127
pp_print_float, 127
pp_print_flush, 127
pp_print_if_newline, 127
pp_print_int, 127
pp_print_newline, 127
pp_print_space, 127
pp_print_string, 127
pp_print_tab, 127
pp_print_tbreak, 127
pp_set_ellipsis_text, 127
pp_set_formatter_out_channel, 127
pp_set_formatter_output_functions, 127
pp_set_margin, 127
pp_set_max_boxes, 127
pp_set_max_indent, 127
pp_set_tab, 127
pred, 102
pred_num, 170
prerr_char, 105
prerr_endline, 105
prerr_float, 105
prerr_int, 105
prerr_string, 105
print, 136
print_as, 122
print_bool, 122
print_break, 123
print_char, 104, 122
print_cut, 123
print_endline, 104
print_float, 104, 122
print_flush, 123
print_if_newline, 123
print_int, 104, 122
print_newline, 104, 123
print_num, 171
print_space, 122
print_stat, 129
print_string, 104, 122
print_tab, 125
print_tbreak, 125
printexc (module), 134
printf, 127, 135
printf (module), 134
push, 139
queue (module), 136
quit, 49
quo (infix), 102
quo_num, 169
raise, 98
random (module), 137
ratio_of_num, 171
read, 153
read_float, 105
read_int, 105
read_key, 148
read_line, 105
readdir, 156
readlink, 157
really_input, 108
recv, 163
recvfrom, 163
red, 144
ref (module), 112
regexp, 172
regexp_case_fold, 173
remove, 120, 131, 133, 138, 141
remove_printer, 50
rename, 141, 155
replace_first, 174
replace_string, 115
rev, 109
rewinddir, 156
rgb, 144
rhs_end, 134
rhs_start, 134
rindex_char, 115
rindex_char_from, 115
rmdir, 156
round_num, 169
s_irall, 140
s_irgrp, 140
s_iroth, 140
s_irusr, 140
s_isgid, 140
s_isuid, 140
s_iwall, 140
s_iwgrp, 140
s_iwoth, 140
s_iwusr, 140
s_ixall, 140
s_ixgrp, 140
s_ixoth, 140
s_ixusr, 140
search_backward, 173
search_forward, 173
seek_in, 108
seek_out, 107
select, 158
send, 164
sendto, 164
set, 129
set (module), 137
set_approx_printing, 171
set_color, 144
set_ellipsis_text, 125
set_error_when_null_denominator, 171
set_floating_precision, 171
set_font, 145
set_formatter_out_channel, 125
set_formatter_output_functions, 125
set_line_width, 145
set_margin, 123
set_max_boxes, 124
set_max_indent, 123
set_normalize_ratio, 171
set_normalize_ratio_when_printing, 171
set_nth_char, 114
set_print_depth, 50
set_print_length, 50
set_tab, 125
set_text_size, 145
setgid, 161
setuid, 161
shutdown, 163
shutdown_connection, 164
sign_num, 169
signal, 159
sin, 100
sinh, 100
size_x, 144
size_y, 144
sleep, 160
snd, 111
socket, 163
socketpair, 163
sort, 138
sort (module), 138
sound, 148
split, 111, 120, 174
sprintf, 135
sqrt, 100
square_num, 169
stack (module), 139
stat, 129, 154
std_err, 104
std_formatter, 126
std_in, 104
std_out, 104
stderr, 104, 153
stdin, 104, 153
stdout, 104, 153
str (module), 172
stream (module), 112
stream_check, 113
stream_from, 113
stream_get, 113
stream_next, 113
stream_of_channel, 113
stream_of_string, 113
string (module), 113
string_after, 175
string_before, 175
string_for_read, 115
string_length, 113
string_match, 173
string_of_bool, 96
string_of_char, 97
string_of_float, 101
string_of_inet_addr, 162
string_of_int, 103
string_of_num, 170
sub_float, 99
sub_int, 102
sub_num, 169
sub_string, 114
sub_vect, 116
substitute_first, 174
subtract, 111
succ, 102
succ_num, 170
symbol_end, 134
symbol_start, 134
symlink, 157
sys (module), 139
Sys_error (exception), 139
sys_print_num, 171
system, 152
system_command, 141
take, 136
tan, 100
tanh, 100
tcdrain, 166
tcflow, 167
tcflush, 167
tcgetattr, 166
tcsendbreak, 166
tcsetattr, 166
text_size, 145
time, 141, 160
times, 160
tl, 109
toplevel (module), 49
trace, 50
transp, 146
truncate, 154
umask, 155
union, 111, 138
unix (module), 149
Unix_error (exception), 151
unlink, 155
untrace, 50
utimes, 160
vect (module), 115
vect_assign, 116
vect_item, 115
vect_length, 115
vect_of_list, 117
verbose_mode, 50
wait, 152
wait_next_event, 147
waitopt, 152
waitpid, 152
white, 144
write, 153
yellow, 144
Index of keywords
and, see let, type, exception, value, where
begin, 19, 20
do, see while, for
done, see while, for
downto, see for
else, see if
end, 19, 20
exception, 28, 29
for, 19, 23
fun, 19
function, 19, 30
if, 19, 22
in, see let
let, 19, 21
match, 19, 23, 30
mutable, 27, 32
not, 19
of, see type, exception
or, 19, 23
prefix, 19, 25, 34
rec, see let, where
then, see if
to, see for
try, 19, 24
type, 27, 29
value, 29
when, 31
where, 32
while, 23
with, see match, try