Table of Contents

Name

pure - the Pure interpreter

Synopsis

pure [-h] [-i] [-n] [-q] [-v[level]] [script ...] [-- args ...]
pure [-h] [-i] [-n] [-q] [-v[level]] -x script [args ...]

Options

-h
Print help message and exit.
-i
Force interactive mode (read commands from stdin).
-n
Suppress automatic inclusion of the prelude.
-q
Quiet startup (suppresses sign-on message in interactive mode).
-v
Set verbosity level. See below for details.
-x
Execute script with given command line arguments.
--
Stop option processing and pass the remaining command line arguments in the argv variable.

Description

Pure is a modern-style functional programming language based on term rewriting. Pure programs are basically collections of equational rules used to evaluate expressions in a symbolic fashion by reducing them to normal form. A brief overview of the language can be found in the PURE OVERVIEW section below. (In case you're wondering, the name ``Pure'' actually refers to the adjective. But you can also write it as ``PURE'' and take this as a recursive acronym for the ``Pure Universal Rewriting Engine''.)

pure is the Pure interpreter. The interpreter has an LLVM backend which JIT-compiles Pure programs to machine code, hence programs run blazingly fast and interfacing to C modules is easy, while the interpreter still provides a convenient, fully interactive environment for running Pure scripts and evaluating expressions.

If any source scripts are specified on the command line, they are loaded and executed, after which the interpreter exits. Otherwise the interpreter enters the interactive read-eval-print loop. You can also use the -i option to enter the interactive loop (continue reading from stdin) even after processing some source scripts. To exit the interpreter, just type the quit command or the end-of-file character (^D on Unix) at the beginning of the command line.

When the interpreter is in interactive mode and reads from a tty, commands are read using readline(3) (providing completion for all commands listed in section INTERACTIVE USAGE below, as well as for global function and variable symbols) and, when exiting the interpreter, the command history is stored in ~/.pure_history, from where it is restored the next time you run the interpreter.

Options and source files are processed in the order in which they are given on the command line. Processing of options and source files ends when either the -- or the -x option is encountered. The -x option must be followed by the name of a script to be executed, which becomes the ``main script'' of the application. In either case, any remaining parameters are passed to the executing script by means of the global argc and argv variables, denoting the number of arguments and the list of the actual parameter strings, respectively. In the case of -x this also includes the script name as argv!0. The -x option is useful, in particular, to turn Pure scripts into executable programs by including a ``shebang'' like

#!/usr/local/bin/pure -x

as the first line in your main script. (This trick only works with Unix shells, though.)

On startup, the interpreter also defines the version variable, which is set to the version string of the Pure interpreter, and the sysinfo variable, which provides a string identifying the host system. These are useful if parts of your script depend on the particular version of the interpreter and the system it runs on.

If available, the prelude script prelude.pure is loaded by the interpreter prior to any other other definitions, unless the -n option is specified. The prelude as well as other source scripts specified with a relative pathname are first searched for in the current directory and then in the directory specified with the PURELIB environment variable. If the PURELIB variable is not set, a system-specific default is used.

The -v option is most useful for debugging the interpreter, or if you are interested in the code your program gets compiled to. The level argument is optional; it defaults to 1. Six different levels are implemented at this time (two more bits are reserved for future extensions). For most purposes, only the first two levels will be useful for the average Pure programmer; the remaining levels are most likely to be used by the Pure interpreter developers.

1 (0x1)
denotes echoing of parsed definitions and expressions;
2 (0x2)
adds special annotations concerning local bindings (de Bruijn indices, subterm paths; this can be helpful to debug tricky variable binding issues);
4 (0x4)
adds abstract code snippets (matching automata etc.; you probably want to see this only when working on the guts of the interpreter).
8 (0x8)
dumps the ``real'' output code (LLVM assembler, which is as close to the native machine code for your program as it gets; you definitely don't want to see this unless you have to inspect the generated code for bugs or performance issues).
16 (0x10)
adds debugging messages from the bison(1) parser; useful for debugging the parser.
32 (0x20)
adds debugging messages from the flex(1) lexer; useful for debugging the lexer.

These values can be or'ed together, and, for convenience, can be specified in either decimal or hexadecimal. Thus 0xff always gives you full debugging output (which isn't most likely be used by anyone but the Pure developers).

Note that the -v option is only applied after the prelude has been loaded. If you want to debug the prelude, use the -n option and specify the prelude.pure file explicitly on the command line. Alternatively, you can also use the interactive list command (see the INTERACTIVE USAGE section below) to list definitions along with additional debugging information.

Pure Overview

Pure is a fairly simple language. Programs are simply collections of equational rules defining functions, let commands binding global variables, and expressions to be evaluated. Here's a simple example, entered interactively in the interpreter:

> // my first Pure example
> fact 1 = 1;
> fact n::int = n*fact (n-1) if n>1;
> let x = fact 10; x;
3628800

The language is free-format (blanks are insignificant). As indicated, definitions and expressions at the toplevel have to be terminated with a semicolon. Comments have the same syntax as in C++ (using // for line-oriented and /* ... */ for multiline comments; the latter may not be nested). Lines beginning with #! are treated as comments, too; as already discussed above, on Unix-like systems this allows you to add a ``shebang'' to your main script in order to turn it into an executable program.

On the surface, Pure is quite similar to other modern functional languages like Haskell and ML. But under the hood it is a much more dynamic and reflective language, more akin to Lisp. In particular, Pure is dynamically typed, so functions can be fully polymorphic and you can add to the definition of an existing function at any time:

> fact 1.0 = 1.0;
> fact n::double = n*fact (n-1) if n>1;
> fact 10.0;
3628800.0
> fact 10;
3628800

Also, due to its term rewriting semantics, Pure can do symbolic evaluations:

> square x = x*x;
> square (a+b);
(a+b)*(a+b)

The Pure language provides built-in support for machine integers (32 bit), bigints (implemented using GMP), floating point values (double precision IEEE), character strings (UTF-8 encoded) and generic C pointers (these don't have a syntactic representation in Pure, though, so they need to be created with external C functions). Truth values are encoded as machine integers (as you might expect, zero denotes ``false'' and any non-zero value ``true'').

Expressions are generally evaluated from left to right, innermost expressions first, i.e., using call by value semantics. Pure also has a few built-in special forms (most notably, conditional expressions and the short-circuit logical connectives && and ||) which take some of their arguments using call by name semantics.

Expressions consist of the following elements:

Constants: 4711, 4711L, 1.2e-3, "Hello, world!\n"
The usual C'ish notations for integers (decimal, hexadecimal, octal), floating point values and double-quoted strings are all provided, although the Pure syntax differs in some minor ways, as discussed in the following. First, there is a special notation for denoting bigints. Note that an integer constant that is too large to fit into a machine integer will be interpreted as a bigint automatically. Moreover, as in Python an integer literal immediately followed by the uppercase letter ``L'' will always be interpreted as a bigint constant, even if it fits into a machine integer. This notation is also used when printing bigint constants. Second, character escapes in Pure strings have a more flexible syntax borrowed from the author's Q language, which provides notations to specify any Unicode character. In particular, the notation \n, where n is an integer literal written in decimal (no prefix), hexadecimal (`0x' prefix) or octal (`0' prefix) notation, denotes the Unicode character (code point) #n. Since these escapes may consist of a varying number of digits, parentheses may be used for disambiguation purposes; thus, e.g. "\(123)4" denotes character #123 followed by the character `4'. The usual C-like escapes for special non-printable characters such as \n are also supported. Moreover, you can use symbolic character escapes of the form \&name;, where name is any of the XML single character entity names specified in the ``XML Entity definitions for Characters'', see http://www.w3.org/TR/xml-entity-names/. Thus, e.g., "\©" denotes the copyright character (code point 0x000A9).
Function and variable symbols: foo, foo_bar, BAR, bar2
These consist of the usual sequence of ASCII letters (including the underscore) and digits, starting with a letter. Case is significant, but it doesn't carry any meaning (that's in contrast to languages like Prolog and Q, where variables must be capitalized). Pure simply distinguishes function and variable symbols on the left-hand side of an equation by the ``head = function'' rule: Any symbol which occurs as the head symbol of a function application is a function symbol, all other symbols are variables -- except symbols explicitly declared as ``constant'' a.k.a. nullary symbols, see below. Another important thing to know is that in Pure, keeping with the tradition of term rewriting, there's no distinction between ``defined'' and ``constructor'' function symbols; any function symbol can also act as a constructor if it happens to occur in a normal form term.
Operator and constant symbols: x+y, x==y, not x
As indicated, these take the form of an identifier or a sequence of ASCII punctuation symbols, as defined in the source using corresponding prefix, postfix and infix declarations, which are discussed in section DECLARATIONS. Enclosing an operator in parentheses, such as (+) or (not), turns it into an ordinary function symbol. Symbols can also be defined as nullary to denote special constant symbols. See the prelude for examples.
Lists and tuples: [x,y,z], x..y, x:xs, x,y,z
The necessary constructors to build lists and tuples are actually defined in the prelude: `[]' and `()' are the empty list and tuple, `:' produces list ``conses'', and `,' produces ``pairs''. As indicated, Pure provides the usual syntactic sugar for list values in brackets, such as [x,y,z], which is exactly the same as x:y:z:[]. Moreover, the prelude also provides an infix `..' operator to denote arithmetic sequences such as 1..10 or 1.0,1.2..3.0. Pure's tuples are a bit unusual, however: They are constructed by just ``paring'' things using the `,' operator, for which the empty tuple acts as a neutral element (i.e., (),x is just x, as is x,()). The pairing operator is associative, which implies that tuples are completely flat (i.e., x,(y,z) is just x,y,z, as is (x,y),z). This means that there are no nested tuples (tuples of tuples), if you need such constructs then you should use lists instead. Also note that the parentheses are not part of the tuple syntax in Pure, although you can use parentheses, just as with any other expression, for the usual purpose of grouping expressions and overriding default precedences and associativity. This means that a list of tuples will be printed (and must also be entered) using the ``canonical'' representation (x1,y1):(x2,y2):...:[] rather than [(x1,y1),(x2,y2),...] (which denotes just [x1,y1,x2,y2,...]).
List comprehensions: [x,y; x = 1..n; y = 1..m; x<y]
Pure also has list comprehensions which generate lists from an expression and one or more ``generator'' and ``filter'' clauses (the former bind a pattern to values drawn from a list, the latter are just predicates determining which generated elements should actually be added to the output list). List comprehensions are in fact syntactic sugar for a combination of nested lambdas, conditional expressions and ``catmaps'' (a list operation which combines list concatenation and mapping a function over a list, defined in the prelude), but they are often much easier to write.
Function applications: foo x y z
As in other modern FPLs, these are written simply as juxtaposition (i.e., in ``curried'' form) and associate to the left. Operator applications are written using prefix, postfix or infix notation, as the declaration of the operator demands, but are just ordinary function applications in disguise. E.g., x+y is exactly the same as (+) x y.
Conditional expressions: if x then y else z
Evaluates to y or z depending on whether x is ``true'' (i.e., a nonzero integer). An exception is generated if the condition is not an integer. Conditional expressions are special forms with call-by-name arguments y and z; only one of the branches is actually evaluated. (The logical operators && and || are treated in a similar fashion, in order to implement short-circuit semantics.)
Lambdas: \x -> y
These work pretty much like in Haskell. More than one variable may be bound (e.g, \x y -> x*y), which is equivalent to a nested lambda (\x -> \y -> x*y). Pure also fully supports pattern-matching lambda abstractions which match a pattern against the lambda argument and bind multiple lambda variables in one go, such as \(x,y) -> x*y.
Case expressions: case x of rule; ... end
Matches an expression, discriminating over a number of different patterns; similar to the Haskell case construct.
When expressions: x when rule; ... end
An alternative way to bind local variables by matching a collection of subject terms against corresponding patterns. Similar to Aardappel's when construct, but Pure allows more than one definition. Note that multiple definitions in a when clause are processed from left to right, so that later definitions may refer to the variables in earlier ones. In fact, a when expression with multiple definitions is treated like several nested when expressions, with the first binding being the ``outermost'' one.
With expressions: x with rule; ... end
Defines local functions. Like Haskell's where construct, but can be used anywhere inside an expression (just like Aardappel's where, but Pure uses the keyword with which better lines up with case and when). Also note that while Haskell lets you do both function definitions and ``pattern bindings'' in its where clauses, in Pure you have to use with for the former and when for the latter. This is necessary because Pure, in contrast to Haskell, does not distinguish between defined functions and constructors and thus there is no magic to figure out whether an equation is meant as a function definition or a pattern binding.

Expressions are parsed according to the following precedence rules: Lambda binds most weakly, followed by when, with and case, followed by conditional expressions (if-then-else), followed by the ``simple'' expressions (i.e., all other kinds of expressions involving operators, function applications, constants, symbols and other primary expressions). Precedence and associativity of operator symbols are given by their declarations (in the prelude or the user's program), and function application binds stronger than all operators. Parentheses can be used to override default precedences and associativities as usual.

At the toplevel, a Pure program basically consists of rules a.k.a. equations defining functions, variable definitions a.k.a. global ``pattern bindings'', and expressions to be evaluated.

Rules: lhs = rhs;
The basic form can also be augmented with a condition if guard tacked on to the end of the rule (which restricts the applicability of the rule to the case that the guard evaluates to a nonzero integer), or the keyword otherwise denoting an empty guard which is always true (this is nothing but syntactic sugar to point out the ``default'' case of a definition; the interpreter just treats this as a comment).

Pure also provides some abbreviations for factoring out common left-hand or right-hand sides in collections of rules; see below for details.

Global variable bindings: let lhs = rhs;
This binds every variable in the left-hand side pattern to the corresponding subterm of the evaluated right-hand side.
Toplevel expressions: expr;
A singleton expression at the toplevel, terminated with a semicolon, simply causes the given value to be evaluated (and the result to be printed, when running in interactive mode).

Basically, the same rule syntax is used to define functions at the toplevel and in with expressions, as well as inside case and when expressions for the purpose of performing pattern bindings (however, for obvious reasons guards are not permitted in when expressions). When matching against a function call or the subject term in a case expression, the rules are always considered in the order in which they are written, and the first matching rule (whose guard evaluates to a nonzero value, if applicable) is picked. (Again, the when construct is treated differently, because each rule is actually a separate pattern binding.)

In any case, the left-hand side pattern must not contain repeated variables (i.e., rules must be ``left-linear''), except for the ``anonymous'' variable `_' which matches an arbitrary value without binding a variable symbol. Moreover, a left-hand side variable may be followed by one of the special type tags ::int, ::bigint, ::double, ::string, to indicate that it can only match a constant value of the corresponding built-in type. (This is useful if you want to write rules matching any object of one of these types; note that there is no way to write out all ``constructors'' for the built-in types, as there are infinitely many.)

The left-hand side of a rule can be omitted if it is the same as for the previous rule. This provides a convenient means to write out a collection of equations for the same left-hand side which discriminates over different conditions:

lhs       = rhs if guard;
          = rhs if guard;
          ...
          = rhs otherwise;

Pure also allows a collection of rules with different left-hand sides but the same right-hand side(s) to be abbreviated as follows:

lhs       |
          ...
lhs       = rhs;

This is useful if you need different specializations of the same rule which use different type tags on the left-hand side variables. For instance:

fact n::int    |
fact n::double |
fact n         = n*fact(n-1) if n>0;
               = 1 otherwise;

In fact, the left-hand sides don't have to be related at all, so that you can also write something like:

foo x | bar y = x*y;

The same works in case expressions, which is convenient if different cases should be mapped to the same value, e.g.:

case ans of "y" | "Y" = 1; _ = 0; end;

Here are some more definitions showing most of the elements discussed above in action:

fact n  = n*fact (n-1) if n>0;
        = 1 otherwise;
fib n   = a  when a, b   = fibs n end
             with fibs n = 0, 1 if n<=0;
                         = case fibs (n-1) of
                             a, b = b, a+b;
                           end;
             end;
let facts = map fact (1..10); let fibs = map fib (1..100);
facts; fibs;

This is a little list comprehension example: Erathosthenes' classical prime sieve.

primes n        = sieve (2..n) with
  sieve []      = [];
  sieve (p:qs)  = p : sieve [q; q = qs; q mod p];
end;

For instance:

> primes 100;
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]

If you dare, you can actually have a look at the catmap-lambda-if-then-else expression the comprehension expanded to:

> list primes
primes n = sieve (2..n) with sieve [] = []; sieve (p:qs) = p:sieve
(catmap (\q -> if q mod p then [q] else []) qs) end;

We mention in passing that list comprehensions are also a useful device to organize backtracking searches. For instance, here's an algorithm for the n queens problem, which returns the list of all placements of n queens on an n x n board (encoded as lists of n pairs (i,j) with i = 1..n), so that no two queens hold each other in check.

queens n        = search n 1 [] with
  search n i p  = [reverse p] if i>n;
                = cat [search n (i+1) ((i,j):p); j = 1..n; safe (i,j) p];
  safe (i,j) p  = not any (check (i,j)) p;
  check (i1,j1) (i2,j2)
                = i1==i2 || j1==j2 || i1+j1==i2+j2 || i1-j1==i2-j2;
end;

Exception Handling

Pure also offers a useful exception handling facility. To raise an exception, you just invoke the built-in function throw with the value to be thrown as the argument. To catch an exception, you use the built-in special form catch with the exception handler (a function to be applied to the exception value) as the first and the expression to be evaluated as the second (call-by-name) argument. For instance:

> catch error (throw hello_world);
error hello_world

Exceptions are also generated by the runtime system if the program runs out of stack space, when a guard does not evaluate to a truth value, and when the subject term fails to match the pattern in a pattern-matching lambda abstraction, or a let, case or when construct. These types of exceptions are reported using the symbols stack_fault, failed_cond and failed_match, respectively, which are declared as constant symbols in the standard prelude. You can use catch to handle these kinds of exceptions just like any other. For instance:

> fact n = if n>0 then n*fact(n-1) else 1;
> catch error (fact foo);
error failed_cond
> catch error (fact 100000);
error stack_fault

(You'll only get the latter kind of exception if the interpreter does stack checks, see the discussion of the PURE_STACK environment variable in the CAVEATS AND NOTES section.)

Note that unhandled exceptions are reported by the interpreter with a corresponding error message:

> fact foo;
<stdin>:2.0-7: unhandled exception 'failed_cond' while evaluating 'fact foo'

Exceptions can also be used to implement non-local value returns. For instance, here's a variation of our n queens algorithm which only returns the first solution. Note the use of throw in the recursive search routine to bail out with a solution as soon as we found one. The value thrown there is caught in the main routine. If no value gets thrown, the function regularly returns with () to indicate that there is no solution.

queens1 n       = catch reverse (search n 1 []) with
  search n i p  = throw p if i>n;
                = void [search n (i+1) ((i,j):p); j = 1..n; safe (i,j) p];
  safe (i,j) p  = not any (check (i,j)) p;
  check (i1,j1) (i2,j2)
                = i1==i2 || j1==j2 || i1+j1==i2+j2 || i1-j1==i2-j2;
end;

E.g., let's compute a solution for a standard 8x8 board:

> queens 8;
(1,1):(2,5):(3,8):(4,6):(5,3):(6,7):(7,2):(8,4):[]

Declarations

As you probably noticed, Pure is very terse. That's because, in contrast to hopelessly verbose languages like Java, you don't declare much stuff in Pure, you just define it and be done with it. Usually, all necessary information about the defined symbols is inferred automatically. However, there are a few toplevel constructs which let you declare special symbol attributes and manage programs consisting of several source modules. These are: operator and constant symbol declarations, extern declarations for external C functions (described in the next section), and using clauses which provide a simple include file mechanism.
Operator and constant declarations: infix level op ...;
Ten different precedence levels are available for user-defined operators, numbered 0 (lowest) thru 9 (highest). On each precedence level, you can declare (in order of increasing precedence) infix (binary non-associative), infixl (binary left-associative), infixr (binary right-associative), prefix (unary prefix) and postfix (unary postfix) operators. For instance:

infixl 6 + - ;
infixl 7 * / div mod ;

Moreover, constant symbols are introduced using a declaration of the form:

nullary symbol ...;

Examples for all of these can be found in the prelude which declares a bunch of standard (arithmetic, relational, logical) operator symbols as well as the list and pair constructors `:' and `,' and the constant symbols `[]' and `()' denoting the empty list and tuple, respectively.

Using clause: using name ...;
Causes each given script to be included, at the position of the using clause, but only if the script was not included already. The script name can be specified either as a string denoting the proper filename (possibly including path and/or filename extension), or as an identifier. In the latter case, the .pure filename extension is added automatically. In both cases, the script is searched for in the current directory and the directory named by the PURELIB environment variable. (The using clause also has an alternative form which allows dynamic libraries to be loaded, this will be discussed in the following section.)

C Interface

Accessing C functions from Pure programs is dead simple. You just need an extern declaration of the function, which is a simplified kind of C prototype. The function can then be called in Pure just like any other. For instance, the following commands, entered interactively in the interpreter, let you use the sin function from the C library (of course you could just as well put the extern declaration into a script):

> extern double sin(double);
> sin 0.3;
0.29552020666134

For clarity, the parameter types can also be annotated with parameter names, e.g.:

extern double sin(double x);

Parameter names in prototypes only serve informational purposes and are for the human reader; they are effectively treated as comments by the compiler.

The interpreter makes sure that the parameters in a call match; if not, the call is treated as a normal form expression. The range of supported C types is a bit limited right now (void, bool, char, short, int, long, double, as well as arbitrary pointer types, i.e.: void*, char*, etc.), but in practice these should cover most kinds of calls that need to be done when interfacing to C libraries.

Since Pure only has 32 bit machine integers and GMP bigints, a variety of C integer types are provided which are converted from/to the Pure types in a straightfoward way. The short type indicates 16 bit integers which are converted from/to Pure machine ints using truncation and sign extension, respectively. The long type always denotes 64 bit integers, even if the corresponding C type is actually 32 bit (as it usually is on most contemporary systems). This type is to be used if a C function takes or returns 64 bit integer values. For a long parameter you can either pass a Pure machine int (which is sign-extended to 64 bit) or a Pure bigint (which is truncated to 64 bit if necessary). 64 bit return values are always converted to (signed) Pure bigints.

Concerning the pointer types, char* is for string arguments and return values which need translation between Pure's internal utf-8 representation and the system encoding, while void* is for any generic kind of pointer (including strings, which are not translated when passed/returned as void*). Any other kind of pointer (except expr*, see below) is effectively treated as void* right now, although in a future version the interpreter may keep track of the type names for the purpose of checking parameter types.

The expr* pointer type is special; it indicates a Pure expression parameter or return value which is just passed through unchanged. All other types of values have to be ``unboxed'' when they are passed as arguments (i.e., from Pure to C) and ``boxed'' again when they are returned as function results (from C to Pure). All of this is handled by the runtime system in a transparent way, of course.

It is even possible to augment an external C function with ordinary Pure equations, but in this case you have to make sure that the extern declaration of the function comes first. For instance, we might want to extend our imported sin function with a rule to handle integers:

> sin 0;
sin 0
> sin x::int = sin (double x);
> sin 0;
0.0

Sometimes it is preferable to replace a C function with a wrapper function written in Pure. In such a case you can specify an alias under which the original C function is known to the Pure program, so that you can still call the C function from the wrapper. An alias is introduced by terminating the extern declaration with a clause of the form ``= alias''. For instance:

> extern double sin(double) = c_sin;
> sin x::double = c_sin x;
> sin x::int = c_sin (double x);
> sin 0.3; sin 0;
0.29552020666134
0.0

External C functions are resolved by the LLVM runtime, which first looks for the symbol in the C library and Pure's runtime library (or the interpreter executable, if the interpreter was linked statically). Thus all C library and Pure runtime functions are readily available in Pure programs. Other functions can be provided by including them in the runtime, or by linking the interpreter against the corresponding modules. Or, better yet, you can just ``dlopen'' shared libraries at runtime with a special form of the using clause:

using "lib:libname[.ext]";

For instance, if you want to call the GMP functions directly from Pure:

using "lib:libgmp";

After this declaration the GMP functions will be ready to be imported into your Pure program by means of corresponding extern declarations.

Shared libraries opened with using clauses are searched for on the usual system linker path (LD_LIBRARY_PATH on Linux). The necessary filename suffix (e.g., .so on Linux or .dll on Windows) will also be supplied automatically. You can also specify a full pathname for the library if you prefer that. If a library file cannot be found, or if an extern declaration names a function symbol which cannot be resolved, an appropriate error message is printed.

Standard Library

Pure comes with a collection of Pure library modules, which includes the standard prelude. Right now the library is pretty rudimentary, but it offers the necessary functions to work with the built-in types (including arithmetic and logical operations) and to do most kind of list processing you can find in ML- and Haskell-like languages. Please refer to the prelude.pure file for details on the provided operations. Also, the beginnings of a system interface can be found in the system.pure module. In particular, this also includes operations to do basic I/O. More stuff will be provided in future releases.

Interactive Usage

In interactive mode, the interpreter reads definitions and expressions and processes them as usual. The input language is just the same as for source scripts, and hence individual definitions and expressions must be terminated with a semicolon before they are processed. For instance, here is a simple interaction which defines the factorial and then uses that definition in some evaluations. Input lines begin with ``>'', which is the interpreter's default command prompt:

> fact 1 = 1;
> fact n = n*fact (n-1) if n>1;
> let x = fact 10; x;
3628800
> map fact (1..10);
[1,2,6,24,120,720,5040,40320,362880,3628800]

When running interactively, the interpreter also accepts a number of special commands useful for interactive purposes. Here is a quick rundown of the currently supported operations:

! command
Shell escape.
cd dir
Change the current working dir.
clear [symbol ...]
Purge the definitions of the given symbols (functions or global variables). If no symbols are given, purge all definitions (after confirmation) made after the most recent save command (or the beginning of the interactive session). See the DEFINITION LEVELS AND OVERRIDE MODE section below for details.
help [args]
Display the pure(1) manpage, or invoke man(1) with the given arguments.
list [option ...] [symbol ...]
List defined symbols in various formats. See the LIST COMMAND section below for details.
ls [args]
List files (shell ls(1) command).
override
Enter ``override'' mode. This allows you to add equations ``above'' existing definitions in the source script, possibly overriding existing equations. See the DEFINITION LEVELS AND OVERRIDE MODE section below for details.
pwd
Print the current working dir (shell pwd(1) command).
quit
Exits the interpreter.
run script
Loads the given script file and adds its definitions to the current environment. This works more or less like a using clause, but loads the script ``anonymously'', as if the contents of the script had been typed at the command prompt. That is, run doesn't check whether the script is being used already and it puts the definitions on the current temporary level (so that clear can be used to remove them again).
save
Begin a new level of temporary definitions. A subsequent clear command (see above) will purge all definitions made after the most recent save (or the beginning of the interactive session). See the DEFINITION LEVELS AND OVERRIDE MODE section below for details.
stats [on|off]
Enables (default) or disables ``stats'' mode, in which various statistics are printed after an expression has been evaluated. Currently, this just prints the cpu time in seconds for each evaluation, but in the future additional profiling information may be provided.
underride
Exits ``override'' mode. This returns you to the normal mode of operation, where new equations are added `below'' previous rules of an existing function. See the DEFINITION LEVELS AND OVERRIDE MODE section below for details.

Note that these special commands are only recognized at the beginning of the interactive command line. (Thus you can escape a symbol looking like a command by prefixing it with a space.)

Some commands which are especially important for effective operation of the interpreter are discussed in more detail in the following sections.

List Command

In interactive mode, the list command can be used to obtain information about defined symbols in various formats. This command recognizes the following options. Options may be combined, thus, e.g., list -tvl is the same as list -t -v -l.
-c
Annotate printed definitions with compiled code (matching automata). Works like the -v4 option of the interpreter.
-d
Disassembles LLVM IR, showing the generated LLVM assembler code of a function. Works like the -v8 option of the interpreter.
-e
Annotate printed definitions with lexical environment information (de Bruijn indices, subterm paths). Works like the -v2 option of the interpreter.
-f
Print information about function symbols only.
-g
Indicates that the following symbols are actually shell glob patterns and that all matching symbols should be listed.
-h
Print a short help message.
-l
Long format, prints definitions along with the summary symbol information. This implies -s.
-s
Summary format, print just summary information about listed symbols.
-t[level]
List only ``temporary'' symbols and definitions at the given level (the current level by default) or above. The level parameter, if given, must immediately follow the option character. A level of 1 denotes all temporary definitions, whereas 0 indicates all definitions (which is the default if -t is not specified). See the DEFINITION LEVELS AND OVERRIDE MODE section below for information about the notion of temporary definition levels.
-v
Print information about variable symbols only.

Note that some of the options (in particular, -c and -d) may produce excessive amounts of information. By setting the PURE_MORE environment variable accordingly, you can specify a shell command to be used for paging, usually more(1) or less(1) .

For instance, to list all definitions in all loaded scripts (including the prelude), simply say:

> list

This may produce quite a lot of output, depending on which scripts are loaded. The following command will only show summary information about the variable symbols along with their current values (using the ``long format''):

> list -lv
argc     var  argc = 0;
argv     var  argv = [];
sysinfo  var  sysinfo = "i686-pc-linux-gnu";
version  var  version = "0.4";
4 variables

If you're like me then you'll frequently have to look up how some operations are defined. No sweat, with the Pure interpreter there's no need to dive into the sources, the list command can easily do it for you. For instance, here's how you can list the definitions of all list ``zipping'' operations from the prelude in one go:

> list -g zip*
zip (x:xs) (y:ys) = (x,y):zip xs ys;
zip _ _ = [];
zip3 (x:xs) (y:ys) (z:zs) = (x,y,z):zip3 xs ys zs;
zip3 _ _ _ = [];
zipwith f (x:xs) (y:ys) = f x y:zipwith f xs ys;
zipwith f _ _ = [];
zipwith3 f (x:xs) (y:ys) (z:zs) = f x y z:zipwith3 f xs ys zs;
zipwith3 f _ _ _ = [];

Definition Levels and Override Mode

To help with incremental development, the interpreter also offers some facilities to manipulate the current set of definitions interactively. To these ends, defined symbols and their definitions are organized into different subsets called levels. The prelude, as well as other source programs specified when invoking the interpreter, are always at level 0, while the interactive environment starts at level 1.

Each save command introduces a new temporary level, and each subsequent clear command ``pops'' the symbols and definitions on the current level (including any definitions read using the run command) and returns you to the previous one. This gives you a ``stack'' of up to 255 temporary environments which enables you to ``plug and play'' in a safe fashion, without affecting the rest of your program. Example:

> save
save: now at temporary definitions level #2
> foo (x:xs) = x+foo xs;
> foo [] = 0;
> list foo
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
> clear
This will clear all temporary definitions at level #2. Continue (y/n)? y
clear: now at temporary definitions level #1
> list foo
> foo (1..10);
foo [1,2,3,4,5,6,7,8,9,10]

We've seen already that normally, if you enter a sequence of equations, they will be recorded in the order in which they were written. However, it is also possible to override definitions in lower levels with the override command:

> foo (x:xs) = x+foo xs;
> foo [] = 0;
> list foo
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55
> save
save: now at temporary definitions level #2
> override
> foo (x:xs) = x*foo xs;
> list foo
foo (x:xs) = x*foo xs;
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
0

Note that the equation `foo (x:xs) = x*foo xs;' was inserted before the previous `foo (x:xs) = x+foo xs;' rule, which is at level #1.

Even in override mode, new definitions will be added after other definitions at the current level. This allows us to just continue adding more high-priority definitions overriding lower-priority ones:

> foo [] = 1;
> list foo
foo (x:xs) = x*foo xs;
foo [] = 1;
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
3628800

Again, the new equation was inserted above the existing lower-priority rules, but below our previous `foo (x:xs) = x*foo xs;' equation entered at the same level. As you can see, we have now effectively replaced our original definition of `foo' with a version that calculates list products instead of sums, but of course we can easily go back one level to restore the previous definition:

> clear
This will clear all temporary definitions at level #2. Continue (y/n)? y
clear: now at temporary definitions level #1
clear: override mode is on
> list foo
foo (x:xs) = x+foo xs;
foo [] = 0;
> foo (1..10);
55

Note that clear reminded us that override mode is still enabled (save will do the same if override mode is on while pushing a new definitions level). To turn it off again, use the underride command. This will revert to the normal behaviour of adding new equations below existing ones:

> underride

Caveats and Notes

Debugging. There's no symbolic debugger yet. So printf(3) (available in the system standard library module) should be your friend. ;-)

Tuples and parentheses. Please note that parentheses are really only used to group expressions and are not part of the tuple syntax; tuples are in fact not really part of the Pure language at all, but are implemented in the prelude. As you can see there, the pairing operator `,' used to construct tuples is (right-)associative. We call these the ``poor man's tuples'' since they are always flat and thus there are no nested tuples (if you need this then you should use lists instead). This also implies that an expression like [(1,2),(3,4)] is in fact exactly the same as [1,2,3,4]. If you want to denote a list of tuples, you must use the syntax (1,2):(3,4):[] instead; this is also the notation used when the interpreter prints such objects.

Special forms. Special forms are recognized at compile time only. Thus the catch function as well as the short-circuit logical connectives && and || are only treated as special forms in direct (saturated) calls. They can still be used if you pass them around as function values or partial applications, but in this case they lose all their special call-by-name argument processing.

Manipulating function applications. The ``head = function'' rule means that the head symbol f of an application f x1 ... xn occurring on (or inside) the left-hand side of an equation, pattern binding, or pattern-matching lambda expression, is always interpreted as a literal function symbol (not a variable). This implies that you cannot match the ``function'' component of an application against a variable, and thus you cannot directly define a generic function which operates on arbitrary function applications. As a remedy, the prelude provides three operations to handle such objects: applp, a predicate which checks whether a given expression is a function application, and fun and arg, which determine the function and argument parts of such an expression, respectively. (This may seem a little awkward, but as a matter of fact the ``head = function'' rule is quite convenient since it covers the common cases without forcing the programmer to declare ``constructor'' symbols (except nullary symbols). Also note that in standard term rewriting you do not have rules parameterizing over the head symbol of a function application either.)

Numeric types. If possible, you should always decorate numeric variables on the left-hand sides of function definitions with the appropriate type tags, like ::int or ::double. This often helps the compiler to generate better code and makes your programs run faster.

Talking about the built-in types, please note that int (the machine integers) and bigint (the GMP ``big'' integers) are really different kinds of objects, and thus if you want to define a function operating on both kinds of integers, you'll also have to provide equations for both. This also applies to equations matching against constant values of these types; in particular, a small integer constant like `0' only matches machine integers, not bigints; for the latter you'll have to use the ``big L'' notation `0L'.

External C functions. The interpreter always takes your extern declarations of C routines at face value. It will not go and read any C header files to determine whether you actually declared the function correctly! So you have to be careful to give the proper declarations, otherwise your program will probably segfault calling the function.

You also have to be careful when passing generic pointer values to external C routines, since currently there is no type checking for these; any pointer type other than char* and expr* is effectively treated as void*. This considerably simplifies lowlevel programming and interfacing to C libraries, but also makes it very easy to have your program segfault all over the place! Therefore it is highly recommended that you wrap your lowlevel code in Pure routines and data structures which do all the checks necessary to ensure that only the right kind of data is passed to C routines.

Stack size and tail recursion. Pure programs may need a considerable amount of stack space to handle recursive function calls, and the interpreter itself also takes its toll. So you may have to configure your system accordingly (8 MB of stack space is recommended for 32 bit systems, systems with 64 bit pointers probably need more). If the PURE_STACK environment variable is defined, the interpreter performs advisory stack checks and raises a Pure exception if the current stack size exceeds the given limit. The value of PURE_STACK should be the maximum stack size in kilobytes. Please note that this is only an advisory limit which does not change the program's physical stack size. Your operating system should supply you with a command such as ulimit(1) to set the real process stack size. Also note that this feature isn't 100% foolproof yet, since for performance reasons the stack will be checked only on certain occasions, such as entry into a global function.

Fortunately, Pure normally does proper tail calls (if LLVM provides that feature on the platform at hand), so most tail-recursive definitions should work fine in limited stack space. For instance, the following little program will loop forever if your platform supports the required optimizations:

loop = loop;

In the current implementation, a tail call will be eliminated only if the call is done directly, i.e., through an explicit call, not through a (global or local) function variable. Otherwise the call will be handled by the runtime system which is written in C and can't do proper tail calls because C can't (at least not in a portable way). This also affects mutually recursive global function calls, since there the calls are handled in an indirect way, too, through an anonymous global variable. (This is done so that a global function definition can be changed at any time during an interactive session, without having to recompile the entire program.) However, mutual tail recursion does work with local functions, so it's easy to work around this limitation.

Scheme programmers should note that conditional expressions (if-then-else) are tail-recursive in both branches, just like in Scheme, while the logical operators && and || are not tail-recursive. This is because the logical operators always return a proper truth value (0 or 1) which wouldn't be possible with tail call semantics.

Files

~/.pure_history
Interactive command history.
prelude.pure
Standard prelude. If available, this script is loaded before any other definitions, unless -n was specified.

Environment

PURELIB
Directory to search for source files, including the prelude. If PURELIB is not set, it defaults to some default location specified at installation time.
PURE_MORE
Shell command to be used for paging through output of the list command, when the interpreter runs in interactive mode.
PURE_PS
Command prompt used in the interactive command loop ("> " by default).
PURE_STACK
Maximum stack size in kilobytes (default: 0 = unlimited).

License

GPL V3 or later. See the accompanying COPYING file for details.

Author

Albert Graef <Dr.Graef@t-online.de>, Dept. of Computer Music, Johannes Gutenberg University of Mainz, Germany.

See Also

Aardappel
Another functional programming language based on term rewriting, http://wouter.fov120.com/aardappel.
Haskell
A popular non-strict FPL, http://www.haskell.org.
LLVM
The LLVM code generator framework, http://llvm.org.
ML
A popular strict FPL. See Robin Milner, Mads Tofte, Robert Harper, D. MacQueen: The Definition of Standard ML (Revised). MIT Press, 1997.
Q
Another term rewriting language by yours truly, http://q-lang.sf.net.


Table of Contents