A *procedure* in SETL is a sequence of computational steps which have been
given a name and which, using one or more data items passed to it for
processing, will compute and deliver a value. Most of the built-in SETL
operators, for example **max**, which returns the maximum of two values x and y, and **cos**, which returns the cosine of a floating-point number x passed to it, are procedures in this sense. However, since no finite collection will ever exhaust the whole catalog of procedures that a programmer may want to use, it is important to have a way of defining, and then using, as many additional operations as are helpful.

To make the preceding point more convincing, we can consider a simple example Suppose that the weights of individual eggs in batches coming from a chicken farm are measured daily, thus producing batches of measurements, each of which can be thought of as a set of numbers, e.g.,

{2.7,2.85,1.90,...,1.86} (1)

Suppose that in order to enforce some sort of quality control, various statistical properties are to be reported for each batch, including the weights of the three largest and the three smallest eggs in the batch.

To make this calculation easily, it would be convenient to use a pre-programmed procedure to which a set s like (1) can be passed, and which would then produce a tuple t

[1 86,1.90,...,2.7,2.85] (2)

such that all the members of s are arranged in increasing order. Since this procedure would simply sort the members of s, it can appropriately be called 'sort'. We would like to be able to produce the ordered tuple t from the set (1) simply by writing

t := sort(s). (3)

Note that if this can be done, then to print the three largest and three smallest measurements we have only to write

Of course, sorting the set s is not hard and can be done by the simple method explained in Section 4.7.2, which is to say, using the code

t := [ ];whiles /= { }looptwith:= (x :=min/s); (4) sless:= x;end loop;

However, what we want is to package the code (4), giving it the name sort and invoking it by this name. By doing this we make it possible to get the effect of the code (4), without having to concern ourselves with its inner workings, simply by writing (3). To "package" bits of code in this way becomes absolutely essential when one is constructing large programs (say a few hundred lines or more). Such large programs can only be built successfully if they are organized hierarchically into a modular collection of sub-procedures. Typically such a collection will include both high-level functions which simply make use of facilities provided by lower-level functions, and low-level procedures, like the sort which we have been discussing, which encapsulate generally useful primitive operations. Like most other programming languages, SETL does provide a facility for defining as many new procedures as you need, and we now proceed to explain how this is done.

To package or encapsulate the code (4), all we need to do is to enclose it between procedure header and trailer lines and add a **return** statement. This gives procedure sort(s);

proceduresort(s); t := [ ];whiles /= { }looptwith:= (x :=min/s); sless:= x;end loop; (5)returnt;endsort;

In (5) the procedure header line is

This line, introduced by the special keyword **procedure**, opens the procedure (5), gives it a name (in this case, the name sort), and also names its *formal parameters* (sometimes simply called parameters), i.e., the names of values which will be passed to the procedure whenever it is used (as in (3)), and from which the procedure will calculate the value that it returns. (In (5), the value returned is t, and there is only one formal parameter, namely, s.) The concluding trailer line

endsort; (5b)

marks the end of the procedure.

Finally the command

returnt; (5c)

appearing in (5) both indicates the point at which the procedure computation has finished calculating the value which it is to produce and defines the value that the procedure will return.

To call or invoke the procedure sort defined by (5), we have only to write sort(e), where e can be any set-valued expression (provided that the set members are all integers, or all real numbers, etc.). This automatically calculates and makes available the value returned by the procedure (5). For example, if we write

the result will be

The expression e occurring in such an invocation sort(e) of the procedure sort is called the actual parameter, or supplied argument, of the invocation. Whenever evaluation of a procedure invocation like (5d) begins, the value of the actual parameter (or parameters) appearing in it is transmitted to the procedure invoked and becomes the initial value of the procedure's formal parameter (or parameters).

To examine the behavior of SETL function call more closely let us consider the following invocation of the procedure 'sort', and trace through the way it works.

x :=sort({5, 1, 2, 7,0}); (6)

As with all assignment statements, execution of (6) begins with evaluation of its right-hand side. Since sort is the name of a procedure, evaluation of the procedure call appearing on the right-hand side of the assignment (6) involves the following steps:

- The current of 'actual' parameter value {5,1, 2, 7, 0} that appears in the procedure
invocation is assigned as the initial value of the formal parameter variable
s appearing in the procedure header line of the procedure (5).
**Figure 5.1 Detour and Return in Function Invocations** - Execution of the procedure (5) begins: the statements appearing in the
body of this procedure are executed in the ordinary way. However, when any
formal parameter appears in the body of the procedure, the
corresponding actual parameter value passed to the procedure is used.
- As soon as a
**return**statement is encountered, control is passed back from the procedure (5) to the instruction immediately following the call (6). Just before this happens, the expression following the keyword**return**is evaluated and becomes the value that the procedure (5) yields (e.g., becomes the value of the variable x in (6)).

This "detour and return" action of function invocations is shown schematically in Figure 5.1.

The following analogy should help to clarify the important distinction between the formal parameters and the actual parameters of a procedure. The formal parameters of a procedure can be compared to the ingredient names in a cookbook recipe. For example, a recipe may say "break an egg into half a cup of flour and stir." The names egg and flour appearing in such a recipe are formal names which stand for all the actual eggs and actual half cups of flour that will be used when the recipe is actually followed. As in the case of a function, new actual items, i.e., a different egg and a different half cup of flour, must be supplied each time the recipe is used, even though the formal names egg and flour appearing in the recipe remain the same. Continuing this analogy, the text of the recipe can be compared to the body of a procedure, which will yield something (e.g., a cake) when actual ingredients matching the formal ingredient names to which it refers are supplied.

It is also instructive to consider an example involving two invocations of the sort routine, with two different parameters:

x :=sort(s1) +sort(s2); (6b)

Suppose that when (6b) is executed s1 and s2 happen to have the values {3, 1, 0} and {-3, -1, 0} respectively. Then evaluation of sort(s1) will produce the value [0, 1, 3] and evaluation of sort(s2) will produce the value [-3, -1, 0], so that after (6b) is executed the variable x will have the value [O, 1, 3, -3, -1, 0].

The way this happens is as follows. As with all assignment statements, execution of (6b) begins with evaluation of its right-hand side, i.e., sort(s1) + sort(s2). This is an expression and is evaluated by first evaluating its two subexpressions sort(sl) and sort(s2) and then combining the two resulting values using the " + " operator.

The value of x in statement (6b) will be the same as the value of x resulting from the execution of

temp1 :=sort(s1); temp2 :=sort(s2); (7) x := temp1 + temp2;

As you can see, (7) involves two successive invocations of sort, followed by a use of the " +" operator to combine the two results produced.

The following important rules govern the use of procedures.

- The formal parameters that appear in the procedure heading must be valid
identifiers, that is to say, they must be variable names; furthermore no two
formal parameters can have the same name. For example, both
**procedure**p1(s * t); (8a)**procedure**p2(s, t, s); (8b)are illegal: (8a) because the parameter s*t is not a simple variable, and (8b) because the first and the third formal parameters of p2 are identical. On the other hand, any actual parameter of a function invocation can be an (arbitrarily complicated) expression, and actual parameters can be repeated. For example,

x :=

**sort**({x**in**ss | x > 0}); (9a)is legal if ss is a set (and if ss were {-10, 20, -20, 15, 10} would give x the value [10, 15, 20]). Similarly, if dot_prod(x,y) is a function which calculates and returns the dot-product of the two tuples x and y, then

a := dot_prod(u,u); (9b)

is legal (and will put the sum of the squared components of the tuple u into a).

- Each invocation of a procedure must have exactly as many actual parameters as the procedure has formal parameters. When
a procedure is invoked, the value of its first (resp. second, third, etc.) actual parameter becomes the value of its first (resp. second, third, etc.) formal parameter. For example, if a procedure whose header line is
**procedure**intermingle(a, b, c);is invoked by

x := intermingle ({x in s | x > 0}, {y **in**s2 | y < 0}, {x**in**s | x > 0});then a and c initially get the value {x

**in**s | x > 0}, and the value {y**in**s2 | y < 0} is transmitted to b. - The body of a procedure can contain any number of
**return**statements and often will contain more than one. The following code, which simply calculates and returns the maximum of two quantities, exemplifies this remark:**procedure**my_very_own_max_function(x, y);**if**x > y**then****return**x;**else****return**y;**end if**;**end**my_very_own_max_function;If no

**return**statement is encountered, execution of the procedure will terminate when and if its trailer line**end**proc_name is reached, and in this case the undefined value**OM**will be returned.Note: Other programming languages make the distinction between a function which returns a value, and a procedure, which does not. This distinction is not present in SETL: a procedure may or may not return a value.

Note that the keyword

**return**can be followed by an arbitrary expression. This expression may be complex; in fact, the whole body of the function may simply consist of a single return statement and nothing else, as in**procedure**positive_elements_in(s); -- returns the set of positive elements of s**return**{x**in**s | x > 0};**end**positive_elements_in;**Figure 5.2 Patterns of Control Transfer in Multiple Function Calls.** -
Procedures can invoke other procedures (including themselves) without
restriction. When control is transferred to a procedure f which in turn
invokes a function g, execution will proceed within the body of f until an invocation of g is encountered, at which point execution of f will be suspended and execution of g will begin. Thereafter, g will execute until a
**return**statement is encountered within g, at which point g will terminate, sending control, and possibly a value, back to f. Subsequently, when a**return**statement is encountered in f, f will itself be terminated, sending control (and a value) back to the procedure from which f was invoked. This will lead to patterns of control transfer like that shown in Figure 5.2. - Procedure invocations are themselves expressions and can be used freely
as parts of more complex expressions. For example, if sort is a function
which returns the elements of a set s in sorted order as a tuple, and
sum_square is a function which returns the sum of the squares of the three first elements of a tuple, then we can write
**print**(sum_square(sort(s)));to display the sum of the squares of the three smallest elements of s.

To illustrate the use of procedures, we will now exhibit a variety of procedures for sorting a set or tuple of elements into order. One simple, well-known way of sorting is the so-called bubble-sort method, which, simply stated, operates as follows: as long as there are two adjacent elements that are out of order in the sequence, interchange them. This is not a very efficient sorting method (and in the form presented here it is even more inefficient than the standard bubble sort), but it is one of the simplest to state and program. The input to the procedure is a tuple, and the output is another tuple, whose elements are in increasing order. Note that the code that follows applies equally well to a tuple of integers, a tuple of floating-point numbers, or a tuple of strings: in all three cases the comparison operator " > " defines the desired ordering.

proceduresort(t); -- sorts a tuple by the bubble-sort methodwhile existsiin[1..#t - 1] | t(i) > t(i + 1)loop[t(i),t(i + 1)] := [t(i + 1),t(i)];end loop;returnt;endsort;

(The attentive reader will notice that this procedure modifies its own parameter t and will wonder whether the value of the actual parameter will be modified when sort is invoked. In fact, the value of the actual parameter will not really be affected outside sort; but the rule guaranteeing this will only be stated in Section 5.5. This same remark also applies to several of the procedures presented later in this section.)

As we mentioned, the procedure just shown can be used to sort any tuple of integers, of reals, or of strings. For example, if we write

the result will be

More complex sorting routines than that shown are often needed. One reason for this is that sorting is often used to arrange more complex "records" into an order determined by some common "subfield" of the records. In SETL, such records are typically represented as tuples. Suppose, for example, that a group of students have taken a course in which their grades on a series of homework exercises and examinations have been collected, producing a tuple of tuples having the following form:

records := [["Gonzalez, Aldo", 80, 87,OM, 73, 90,..], ["Woburn, Linda", 82, 89, 85, 91, 90, 65,..], ["Luciano, Luigi", 80, 81, 75, 79,OM, 70,..],...]

Grades are assumed to be represented by integers, and missed exercises or
examinations by occurrences of **OM**. One might then want to arrange these
records in various orders, e.g.,

- Alphabetic order of student names
- Order of grade averages, with largest first
- Order of grades on midterm examination, largest first
- Order of number of exercises not handed in, largest first, etc.

To make it easy to sort these records according to any of their fields, we modify our original sorting procedure, so that it takes two arguments:

- The tuple of records to be sorted.
- The record component by which the records must be sorted.

This leads to the following procedure (which, however, does not treat **OM**
components correctly: see the following discussion).

proceduresort1(t, pos); -- t is a tuple of records (tuples) to be sorted. -- pos is the index of the component in each record, along which -- the records are to be sorted in increasing order.while existsiin[1..#t-1] | t(i)(pos) > t(i + 1)(pos)loop[t(i),t(i + 1)] := [t(i + 1),t(i)];end loop;returnt;endsort1;

Using this function, we can print the class records in alphabetical order simply by writing

forxinsort1(records,1)loopend loop;

Suppose now that we want to list these records in order of decreasing
midterm grades, with students who have missed the midterm coming last. If
the midterm is the 11th entry in the record, we may be tempted to sort the
records (into increasing order) according to that component and then list them
in reverse. The attentive reader will notice that sort1 as written will not work in the presence of missing grades: recall the convention that a missed test is marked as **OM** in the record. The comparison (**OM** > x) where x is a non-**OM** value is not meaningful, and in fact the-SETL system will stop any program at the point at which such a comparison is attempted. As a necessary modification to our sorting procedure, we therefore replace the comparison that drives the **while** loop, so that a value of **OM** is regarded as smaller than any existing
grade. Using the "is undefined" (i.e. questionmark) operator, we simply replace t(i)(pos) by t(i)(pos)?(-1). The improved sorting routine then reads

proceduresort2(t,pos); -- T is a tuple of records, some of whose components may beOM. -- pos is the index of the record component along which the records -- are to be sorted in increasing order.while existsiin[1..#t - 1] | t(i)(pos)?(-1) > t(i + 1)(pos)?(-1)loop[t(i), t(i + 1)] := [t(i + 1), t(i)];end loop;returnt;endsort2;

With this modification, we can print the desired ordering of records by midterm grades using the following code (recall that a student's name is the first component of his/her record, the midterm grade is the 11th component of the record, and this grade may be undefined):

ordered := sort2(records, 11);foriin[#ordered,#ordered-1..1]loopend loop;

A program that makes use of procedures ordinarily includes commands that
invoke these procedures; otherwise the procedures might as well not be there. As we have explained, the first function invoked can
in turn invoke any or all of the other functions, but at least one instruction not belownging to any procedure is needed to trigger this first invocation. In a program including one or more procedures, the "directly executed" portion of the program, i.e., everything not included in any procedure, is called the main block of the program, or the main program for short. This block of instructions has exactly the form of a **program** body, as described in Chapters 2 and 3, and it must precede all procedures. The main program and all the procedures which follow it must be prefixed by a **program** header line of the usual form, and a corresponding trailer line starting with the keyword end must follow the last procedure.

For example, a complete program consisting of the sort function shown previously and the two fragments of code which invoke it would have the following overall structure:

programprint_grade_info; -- program to print student grade records input_handle :=open("student_record_file","TEXT-IN"); -- file operations are described later in this chapterreada(input_handle,records); -- acquire the basic dataforxinsort (records,1)loopend loop;foriin[#ordered,#ordered-1..1]loopend loop;proceduresort(t, pos); -- t is a tuple of records. pos is the position of the record component -- according to which the records are to be sorted in increasing order.while existsiin[1..#t - 1] | t(i)(pos)?(-1) > t(i + 1)(pos)?(-1)loop[t(i),t(i + 1)] := [t(i + 1),t(i)];end loop;returnt;endsort;endprint_grade_info;

Execution of such a program begins at the first statement of its main program block and ends as soon as the last statement of its main program block has been executed (or when a stop statement is encountered; see Section 4.5).

As a next example, we define a procedure that takes a string and returns a similar string in which all lowercase alphabetic characters have been changed into the corresponding uppercase characters. Blanks and punctuation marks are not affected.

procedurecapitalize(s); -- capitalizes the string s and returns -- the result. Nonalphabetic characters are left alone small_letters := "abcdefghijklmnopqrstuvwxyz"; big_letters := "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; capital_of := {[let, big_letters(i)]: let = small_letters(i)}; -- maps each small letter into the corresponding capital.return+ /[capital_of(let)?let: let = s(i)]; -- Note that the map capital_of is defined over alphabetic characters -- only. Nonalphabetic characters, such as punctuation marks, are not -- converted, but left as they are. This is the purpose of the "? let" -- expression.endcapitalize;

A procedure can have any number of parameters, even no parameters. For example, suppose that we want to use a procedure which reads an input string, uses the capitalize procedure to capitalize this input, and returns the capitalized result. This function can be written as follows:

procedureread_a_line; -- procedure to read and capitalize a line file_handle :=open("test_file","TEXT-IN");reada(file_handle,x); -- read a quoted stringclose(file_handle);returnifx =OMthenOMelsecapitalize(x)end if;endread_a_line;

To invoke a parameterless procedure of this kind, one must write its name, followed by an empty parameter list. For example, to invoke the next_line procedure and print the capitalized string that it returns, we would write

We emphasize that the empty parameter list, i.e. the "( )" following the name of the parameterless procedure next_line, is obligatory.

As a further illustration of the use of procedures, we give a set of procedures for adding, subtracting, multiplying, and dividing polynomials in a single variable with real coefficients. Such polynomials are ordinarily printed in a standard algebraic form like

In the procedures that follow we will assume that a polynomial is represented internally by a SETL map which sends the exponent of each term of the polynomial into the coefficient of that term. For example, the polynomial shown previously would be represented internally by the map

As in algebra, we simply omit terms whose coefficients are zero.

Developing a package of procedures for manipulating polynomials represented in this way is easy.

To add (resp. subtract) two polynomials, we simply add (resp. subtract) the coefficients of corresponding terms. So the addition of two polynomials can proceed as follows:

proceduresum(p1,p2); -- computes the sum of two polynomials result := { };forc = p1(e)loop-- iterate over terms of first polynomialifp2(e) /=OMthen-- second polynomial has matching term cr := c + p2(e); -- coefficient of resultifcr /= 0.0then-- term is present result(e) := cr;end if;elseresult(e) := c;end if;end loop;forc = p2(e) | p1(e) =OM loop-- add terms in second polynomial that are -- not present in first result(e) := c;end loop;returnresult;endsum;

Note that the result of the second loop can be replaced by the following more compact expression.

We can also abbreviate the first loop by using the "?" operator and obtain the following compact procedure:

proceduresum(p1, p2); -- forms the sum of two polynomialsreturn{[e,c]: c1 = p1(e) | (c := c1 + (p2(e)?0.0)) /= 0.0} + {[e,c2]: c2 = p2(e) | p1(e) =OM};endsum;

Adapting this we can easily write a procedure for polynomial difference:

procedurediff(p1 , p2); -- forms the difference of two polynomialsreturn{[e,c]: c1 = p1(e) | (c := c1 - (p2(e)?0.0)) /= 0.0} + {[e,-c2]: c2 = p2(e) | p1(e) =OM};enddiff;

To multiply two polynomials, we multiply and sum all pairs of their individual terms. Finally, we eliminate terms which turn out to have zero coefficients. This is simply

procedureprod(p1, p2); -- forms the product of two polynomials p := {};forc1 = p1(e1), c2 = p2(e2)loopp(e1 + e2) := p(e1 + e2)?0.0 + c1 * c2;end loop;return{[e,c]: c = p(e) | c /= 0.0};endprod;

Next, we show how to divide a polynomial p1 by a polynomial p2. Let c_{1}x^{j1} be the leading term of p1, i.e., the term having largest exponent, and let c_{1}x^{j2}
be the leading term of p2. Then we subtract (c1/c2)x^{j1-j2}) times p2 from p1,
to eliminate the leading term of p1, and so on repeatedly until all terms of p1 with exponents larger than j_{2} have been eliminated. The collection of all terms by which p2 is multiplied constitutes the terms of the quotient.

procedurediv(p1,p2); -- forms the quotient polynomial p1/p2ifp2 = { }thenreturnOM;end if; -- this is the case p2 = 0. e1 :=max/[e: c = p1(e)]; -- largest exponent of p1 e2 :=max/[e: c = p2(e)]; -- largest exponent of p2 qcoeff := { }; -- start with an empty quotientforjin[e1 - e2, e1 - e2 - 1..0] | p1(e2 + j) /= 0.0loopqcoeff(j) := p1(e2 + j) / p2(e2); p1 := diff(p1,{[e + j,qcoeff(j) * c]: c = p2(e)});end loop;returnqcoeff; -- return the map representing the quotient.enddiv;

We note that techniques for manipulating polynomials by computer have been studied very intensively, and that muchh more efficient methods than those used in these simple illustrative procedures are known. See Knuth, The Art of Computer Programming, Vol. 2, for an account of these developments, which go beyond the scope of the present book.

In writing a long program, which can involve hundreds of procedures, it
is irritating, as well as highly error-inducing, to have to remember which
variables had been used for which purposes through the whole of a long text.
To see this, consider a function invocation imbedded in a **while** loop like

i:= O; j:= O;while(i + j) < f(j)loop. . .

and suppose that f is an invocation of a function whose body is found somewhere else in a long program text. It is entirely plausible that, unknown to the author of the code (1), the body of the function f should make use of the convenient variable name 'i', e.g., in a loop like

foralliin[1..#t] |... (2)

But then, if the i appearing in (1) and the i appearing in (2) were regarded as representing the same variable, the function invocation f(j) which occurs in the **while** loop could change the value of i in ways not at all hinted at by the outward form of the code (1). Were this the case, a programmer wishing to write a loop like (1) would first have to examine the body of the function f, to avoid variable name conflict. This would introduce many highly undesirable interactions between widely separated parts of a lengthy program and make large programs harder to write.

To avoid these very undesirable effects, most programming languages
make use of rules which restrict the *scope* of names. The SETL *scope rule*
is as follows. In the absence of explicit declarations, variables retain their
meaning only within a single procedure (or main program). This implies that
ordinarily a variable i appearing in one procedure and a variable i appearing in another procedure are treated as distinct. In effect, the SETL compiler (invisibly) applies the following renaming procedure to the program text which it processes:

- The main program which begins the program text is numbered zero, and the procedures which follow this main program are numbered 1, 2, .. in their order of occurrence.
- Every variable name xxx used in the n-th procedure, including the names of its formal parameters, is implicitly changed to xxx_n.

As an example, consider the program

programexample; x := {3,0,1,2};proceduresort(s); -- sorts by selection t := [ ];whiles /= { }looptwith:= (x := min/ s); sless:= x;end loop;returnt;endsort;proceduresquares(x); -- forms and returns the tuple of squares of the -- components of the tuple xreturn[e * e: e = x(i)];endsquares;endexample;

Given this program as input, the SETL compiler will implicitly apply the renaming rules (a), (b), and therefore it will really see the following renamed variant:

programexample; x_O := {3,0,1,2}; -- main programproceduresort(s_1); -- procedure number 1 t_1 := [ ];whiles_1 /= { } loop t_1with:= (x_1 :=min/s_1); s_1less:= x_1;end loop;returnt_1;endsort;proceduresquares(x_2); -- procedure number 2return[e_2 * e_2: e_2 = x_2(i_2)];endsquares;endexample;

As stated previously, rule (b) serves to isolate variables having the same
name from each other if they are used in different procedures. Variables used
in this way are said to be *local* to the procedures in which they appear.

In some cases, however, we do want a variable used in several procedures to refer to the same object. For example, one or more "major" data objects may be used by all the functions in a related group of functions. To see this, consider the case of a group of functions written as part of an inquiry system to be used by the executives of a bank. This might involve many functions, for example,

procedurepayments(customer name); -- returns a given customer's payment -- recordproceduretel_no(customer_name); -- returns a given customer's telephone -- numberprocedureoverdue(ndays); -- returns set of a customers whose -- payments are more than ndays -- overdue ...etc.

All these procedures will have to make use of one or more "master files".
(When represented in SETL, these "files" are likely to be sets of tuples representing records, maps sending customer names, or perhaps customer identifiers such as social security or account numbers, into associated records, etc.) Instead of insisting that these master files be passed as parameters to all the procedures that need to use them, it is more reasonable to make them
available directly to every procedure, giving them easily recognizable variable
names such as master_customer_file. To make this possible, SETL provides
a special form of statement, called the **var** declaration. By writing

at the start of the overall program in which the listed functions appear, we make master customer_file a global variable which designates the same object in all the procedures which reference this variable. The required layout of a program using one or more global variables is shown in the following example:

programbanking_system; -- header line for overall programvarmaster_customer_file; -- declaration of global variable -- (additional global variable declarations come here) -- (body of "main" program of banking_system comes here)procedurepayments (customer_name); -- first procedure ...endpayments;proceduretel_no (customer_name); -- second procedure ...endtel_no;procedureoverdue (n_days); -- third procedure ...endoverdue; -- (more procedures can come here)endbanking_system;

The statement

appearing first in this example is called a *declaration* rather than an *executable statement* because it serves to establish the meaning of certain names rather than to trigger any particular calculation.

The simplest form of a var declaration is

i.e., it consists of the keyword **var** followed by a comma-separated list of distinct variable identifiers.

Such declaration can appear in one of several positions:

- In a
**program**, before the program's first executable statement. Variable identifiers appearing in such a declaration are defined to be global variables directly accessible to each following**procedure**in the program. A**var**declaration appearing in this position is called a*global***var**declaration. - In a
**procedure**within a program, before the procedure's first executable statement. A**var**declaration appearing in this position is called a*local***var**declaration. Variable identifiers appearing in such declarations are defined to be*local variables*accessible only within the procedure (and in sub-procedures nested within it). Since variable names not appearing in any**var**declaration are in any case local to the procedures in which they appear,**var**declarations appearing in this position often serve only to document the way in which a procedure uses its variables. However, if the procedure is recursive (see Section 5.4),**var**declarations appearing in it have a more significant effect, which will be described more fully later. - In a
**package**or**package body**. We postpone discussion of**var**declarations in**package**s, which are tools for organizing large SETL programs, to Chapter 7. - At the start of a
**class**or**class body**. We postpone discussion of**var**declarations in classes and class bodies to Chapter 8, where this important aspect of SETL is discussed.

Any number of **var** declarations may appear either at the start of a
program or within a procedure, but all such declarations must precede the first executable statement of the program or procedure in which they appear. No variable should appear twice in **var** declarations (either global **var** declarations or declarations within a single procedure), nor is it legal for any procedure parameter name to appear in a global **var** declaration.

A global variable retains its value between invocations of the procedures that use it.

To sum up, there are two ways in which values can be communicated
between separate **procedures**:

- By being passed as parameters or returned by procedures.
- By direct global communication, i.e., by being the values of variables which have been declared to be global and hence are accessible to more than one procedure.

Method (ii) is powerful, but potentially undisciplined, since it allows procedures to influence each other in ways that their invocations hide. It is therefore good programming practice to avoid using more than a very few declared global variables. Generally speaking, variables should be made global only if

- They represent major data objects accessed by more than one of a program's procedures, and their usage is subject to clearly understood rules of style which pervade the entire program.
- They represent flags or other conditions that many procedures need to test (e.g., to determine whether particular debugging traces should be produced), but that play no role in the normal functioning of these procedures and are rarely modified.
- They need to be shared between procedures that do not call each other and must be kept alive between successive invocations of these procedures.
- They represent constants, too complex to be set up conveniently using a
**const**declaration (see the following section), which need to be used whenever a procedure is invoked. - They need to be accessible to all logical copies of a recursive procedure (see Section 5.4).

The capitalize function appearing in Section 5.1 can be used to illustrate point (d). As written, this forms the map

each time it is invoked. To do this is of course wasteful of computer time.
Using the **const** declaration described in the following section we would instead declare capital_of to be a constant having this value, but this requires writing out all the elements of capital_of explicitly, a nuisance since this involves typing 104 apostrophes, 51 commas, 52 brackets, etc. It is more convenient to declare

and then to add the instructions

small_letters :="abcdefghijklmnopqrstuvwxyz"; big_letters :="ABCDEFGHIJKLMNOPQRSTUVWXYZ"; capital_of := {[c,big_letters(i)]: 1 = small_letters(c)};

as part of a main program block before the first use of capitalize. The capitalize function then reduces to the following simple form:

procedurecapitalize(s);return+/ [capital_of(let)?let: let = s(i)];endcapitalize;

It is often convenient to use a symbolic name for a constant appearing repeatedly in a program. Among other things, naming a constant and using its name rather than its explicit representation make it muchh easier to modify your program if modification subsequently becomes necessary. To define constants, one or more **const** declarations are used. These have muchh the same form as **var** declarations with initialization clauses, except that an initialization clause is required for each name listed in a **const** declaration. An example is

**const** declarations have muchh the same semantics as **var** declarations with initialization clauses, but the name values that they declare cannot be changed since no subsequent assignment to a name declared constant is allowed.

This example illustrates the following rules:

- Each
**const**name must be a valid SETL identifier. By virtue of its appearance in**const**declaration, this identifier becomes a*constant identifier*, i.e., a synonym for the constant denotation, const_expn, that follows it in the**const**declaration. It retains this meaning throughout the scope of the identifier. - Each const_expnj appearing to the right of an equals sign in a declaration like (1) must be a valid constant expression. Such expressions are built out of the following:
- Elementary constant denotations, each of which designates an integer, a real number, or a quoted string.
- Constant identifiers, i.e., identifiers of constants introduced by earlier
**const**declarations.For example, it is possible to write

**const**one := 1 two := 2, one_and_two := {one, two};This is equivalent to

**const**one1 := 1, two := 2;**const**one_and_two := {1,2}; - Compound constant denotations can also appear in
**const**declarations. Examples are**const**complex_thing := [{"A",1},{"B",2},{{ }}];**const**let_1 := "alpha", let_2 := "beta", let_map := {["A",let_1],["B",let_2]};**const**two_pi := 2.0 * 3.14159;**const**sixty_blanks := 60 * " ";As the preceding examples show, fairly general expressions can appear in initialization clauses, though of course they must all be evaluable in terms of constants that have appeared in earlier

**var**and**const**declarations. The best way of determining whether a particular initialization clause is legal is simply to try it out. If it does not generate a syntax error it will work correctly.

The following kind of problem, often called the "buckets and well" puzzle, commonly appears on IQ tests. Suppose that one is given several buckets of various sizes, and that a well full of water is available. To focus on a simple specific case, suppose that just two buckets, a 3-quart bucket and a 5-quart bucket, are given. We are required to use them to measure out exactly 4 quarts of water. Since *exactly* this amount of water is to be measured out, no nonprecise operation is allowed. This means that only three kinds of operation can be used in a solution of this problem:

- Any bucket can be filled brim-full from the well;
- Any bucket can be emptied completely;
- Any bucket can be poured into any other, until either the first bucket becomes completely empty or the second bucket becomes brim-full.

As an example, the following is a way of measuring out exactly 4 quarts using only a 3- and a 5-quart bucket.

- Fill the 5-quart bucket.
- Pour the 5-quart bucket into the 3-quart bucket (leaving 2 quarts in the 5-quart bucket).
- Empty the 3-quart bucket. Pour the contents of the 5-quart bucket into the 3-quart bucket. (Now 2 quarts is in the 3-quart bucket, and the 5-quart bucket is empty).
- Fill the 5-quart bucket.
- Pour the 5-quart bucket into the 3-quart bucket, until the 3-quart bucket becomes full. (This leaves exactly 4 quarts in the 5-quart bucket.)
- Empty the 3-quart bucket. (Now exactly 4 quarts has been measured out.)

The fact that it is easy to program a computer to solve problems of this
kind might be considered surprising, since such solutions are often considered
to require intelligence. Nevertheless a systematic approach is not hard to find. The key idea is that of *state*. Specifically, as one moves through the steps of any solution to this kind of problem, the objects being manipulated (in this case, the buckets) will at any moment be in some particular condition. In the case we consider, this condition or state is determined by the amount of water in each of the buckets. We can represent this state as a tuple, of as many components as there are buckets. Initially, when both buckets are empty, the state is [0,0]. The target state for the example considered is that in which exactly 4 quarts has been measured into the 5-quart bucket; this is represented by the tuple [0, 4]. The state in which both buckets are completely full is [3, 5], that in which the 3-quart bucket is full and the 5-quart bucket is empty is [3, 0], etc. In this representation, the problem solution given by (i-vii) would be represented as the following sequence of states:

This way of looking at the problem makes it plain that what we need to
consider is the set of all possible states, and the manner in which new states
can be reached from old. Suppose that the tuple *state* represents the amount of water currently in the buckets, so that state(i) is the amount of water in the i-th bucket, and that the tuple 'size' represents the sizes of all the given buckets, so that size (i) is the capacity of the i-th bucket. In the buckets and well problem, only the three manipulations (a), (b), and (c) are allowed. If bucket i is poured into bucket j until either i becomes empty or j becomes full, then the amount poured will be

Hence the following procedure returns the collection of all states than can be reached in a single step from an initially given state:

procedurenew_states_from(state);return{empty(state,j): j in [1..#state]} + {fill(state,j): jin[1..#state]} + {pour(state,i,j): iin[1..#state], jin[1..#state] | (i /= j)};endnew_states_from;procedureempty(state,j); -- empties bucket j state(j) := 0;returnstate;endempty;procedurefill(state,j); -- fills bucket j state(j) := size(j); -- the "size" tuple is assumed to be globalreturnstate;endfill;procedurepour(state,i,j); -- pour bucket i into bucket j amount := state(i)min(size(j) - state(j)); -- amount that can be poured state(i) -:= amount; -- out of i and into j state(j) +:= amount;returnstate;endpour;

We can now solve our problem by a systematic process of state exploration. We start in the initial, all buckets empty, state to generate all the states that can be reached in one step from this starting state. Then we generate all states that can be reached in one step from these second-level states, etc. States that have been encountered previously are ignored; the ones that remain are precisely those which can be reached from the start in two steps but no fewer. From these, we generate all states which can be generated in three steps but no fewer, and so forth. As we go along, we check to see whether the target state has yet been reached. Eventually, we either reach the target state, thereby solving our problem, or find that no new states can be generated, even though the target state has not been reached. In this latter case, the problem clearly has no solution.

Figure 5.3 illustrates the notion of state search and shows some of the states that come up during search for a solution of our two-bucket example:

Note that in this figure we only show transitions which lead to states that have not been seen before. Other transitions are redundant, since the shortest path from start state to the target state will never pass through the same state twice.

To be sure that we can reconstruct the path from start to target once the target has been reached, we proceed as follows. Whenever a new state ns is seen for the first time it will have been generated from some immediately preceding old state os. As states are generated, we keep a map reached_from which maps each new state ns into the old state os from which ns has been reached. Once the target state has been reached, we can use this map to chain back from the target to the start state. Then the desired solution is simply the reverse of the sequence thereby generated.

The following code implements this state-generation and backchaining procedure. It is deliberately written in a manner that hides all information concerning the structure of states, as well as all details concerning the way in which new states arise from old. This makes it possible to use the same routine to solve many different kinds of state-exploration problems.

procedurefind_path(start,target); -- general state-exploration procedure. reached_from := {[start,start]}; -- the start state is considered -- to have been reached from itself just_seen := {start}; -- initially, only the start state has been seen got_it :=false; -- we don't have the solution yetwhilejust_seen /= { }loop-- while there exist newly seen states brand_new := { }; -- look for states that have not been seen beforeforold_stateinjust_seen, new_stateinnew_states_from(old_state) | reached_from(new_state) =OM loopbrand_newwith:= new_state; -- record a brand_new state reached_from(new_state) := old_state; -- and record its originifnew_state = targetthengot_it :=true;exit; -- since problem has been solvedend if;end loop;ifgot_itthen exit; end if; -- since problem has been solved just_seen := brand_new; -- now the brand-new states -- define those which have just been seenend loop;if notgot_itthen return OM; end if; -- since all states have been explored, and the target -- has not been found, we know that no solution exists. -- at this point the target has been found, so we chain back from the target -- to reconstruct the path from start to target rev_path := [target]; -- initialize the path to be builtwhile(last_state := rev_path(#rev_path)) /= startlooprev_path with:= reached_from(last_state); -- chain backwards to the startend loop;return[rev_path(j): jin[#rev_path, #rev_path - 1..1]]; -- reverse the pathendfind_path;

The following main program can be used to acquire a problem specification interactively and to invoke the find_path routine to solve it. Again we hide all problem-specific information in appropriate procedures.

programbuckets; -- Hann Xin divides winevarsize; -- global variable for storing problem specification prob_specs := get_prob_specs( ); [start, target, size] := prob_specs;if(path := find_path(start,target)) =OMthenforxinpathloopend loop;end if;procedurenew_states_from(state);return{empty(state,j): j in [1.. #state]} + {fill(state,j): jin[1.. #state]} + {pour(state,i,j): iin[1..#state], jin[1..#state] | (i /= j)};endnew_states_from;procedureempty (state,j); -- empties bucket j state(j) := 0;returnstate;endempty;procedurefill(state,j); -- fills bucket j state(j) := size(j); -- the "size" tuple is assumed to be globalreturnstate;endfill;procedurepour(state,i,j); -- pour bucket i into bucket j amount := state(i)min(size(j) - state(j)); -- amount that can be poured state(i) - := amount; -- out of i and into j state(j) + := amount;returnstate;endpour;procedurefind_path(start, target); -- general state-exploration -- procedure. text is on previous page ...endfind_path;procedureget_prob_specs; -- acquires and returns specifications of problem -- this can be replaced by a procedure that acquires problem specifications interactively, -- as explained in Chapter 10 start := [0,0,0]; target := [1,1,0]; size := [2,4,7];return[start, target, size];endget_prob_specs;endbuckets;

Since the notion of problem state used in the foregoing is general and since we have written the find_path procedure and the main program block shown in a manner which insulates them from the details of the problems that they solve, we can use these procedures to handle any path-finding problem of the same general class as the buckets and well problem. Another amusing problem of this kind is the goat, wolf, and cabbage puzzle. In this puzzle, a man, who brings with him a goat, a wolf, and a cabbage, comes to a river which he must cross in a boat just large enough for himself and one but not two of the objects 'goat', 'wolf', and 'cabbage'. He can never leave the goat and wolf, or the cabbage and goat, alone together, since in the first case the wolf would eat the goat and in the second the goat would eat the cabbage. How is he to cross the river?

To develop a program to solve this puzzle, we have only to rewrite the new_states_from procedure and the parameterless get_prob_specs procedure. First, we need to decide on a representation of the states of the puzzle. We can designate the four objects appearing in the puzzle by their initials as "G", "W", "C", and "M" (man), respectively, and represent each state of the puzzle by a pair [l, r], where l designates the set of all objects remaining to the left of the river, and r designates the set of all objects that have been moved across the river. For example,

represents the state in which the wolf and the cabbage have been moved across, and the man has returned to the left side of the river to get the goat. The start state is then

The new_states_from procedure appropriate for this problem can be represented as follows:

Run this program and you will see how the puzzle can be solved.procedurenew_states_from(state); [l, r] := state; -- "unpack" state into its "left" and "right" portionsreturnif"M"inlthen-- the man is on the left {[l - {"M",x}, r + {"M",x}]: x in l | x /= "M"andis_legal(l - {"M",x})} +ifis_legal(l - {"M"})then{[l - {"M"}, r + {"M"}]}else{ }end if-- and can go right alone, or with one objectelse-- the man is on the right {[l + {"M", x}, r - {"M",x}]: x in r | x /= "M"andis_legal(r - {"M",x})} +ifis_legal(r - {"M"})then{[l + {"M"}, r - {"M"}]}else{ }end if-- and can go left alone, or with one objectend if;endnew_states_from;procedureis_legal(s); -- verify that goat and cabbage or goat and wolf, are not alone on the same sidereturn"M" in sor not({"G","C"}subsetsor{"G","W"}subsets);endis_legal;

Path-finding programs like those described in the preceding paragraphs have always been of interest to artificial intelligence researchers. Artificial Intelligence can be defined as the attempt to imbue computers with human-like capabilities. The workings of the human mind, although still profoundly mysterious, can be described as follows. Various extremely sophisticated perceptual systems, which operate far beloww the level accessible to consciousness, capture and decode events in the external world and pass their conclusions to consciousness. These conclusions appear as a never-ending stream of perceptions which tell us what we think is in the world, but reveal little about the way in which they arise. This is true both for visual and for sound perception, including perception of speech: even at a noisy party we are able to pick words and sentences out of the incoming flow of sound, without knowing how we filter out distractions or locate word or syllable boundaries. These perceptual mechanisms, which use muchh of the brain's active surface, constantly maintain a model of our environment, of patterns of motion in this environment, of our position in it, and even of such fine details as the faces of other persons present, with clues to their emotional reactions and likely actions. Failure of any one of the many perceptual mechanisms involved can lead directly to devastating diagnosis: inability to recognize words, or faces, or our own bodies, or objects at all; to know that we see; to know that we are blind.

Another major group of mental mechanisms, equally unconscious, monitor and coordinate the smooth motion of our bodies through space and in gravity. Failures here can have equally devastating effects, e.g. uncontrollable trembling or stuttering, uncorrectable by any conscious attempt to make one's hand or tongue pick up the desired cup or speak the few desired words. So small a cause as bad signals from the inner ear's tiny balance sensor can in a moment leave one barely able to crawl nauseated along the floor, as the world seems to whirl violently around one.

Attempts to give computers perceptual capabilities that can compete with those of humans belowng to specialized branches of artificial intelligence: Computer Vision, Computer Analysis of Speech, Written Language Analysis, Robotics. Alongside of these, another branch of the subject concerns itself with the duplication of more abstract mental capabilities: the ability to plan, reason, solve puzzles, prove mathematical theorems, program. This is also quite difficult, since reason itself is doubtless guided by its own essential stream of unconscious perceptions, which give human reasoning a sense of fitness, analogy, and direction that computers lack. The overall consequence is that humans can learn by assembling related fragments of information into useful wholes. Computers, still lacking this ability, must still be programmed. For a person, the Encyclopedia Britannica, or the Library of Congress, is a treasure-trove of usable information. For a computer, as for a squid, it is simply a mystery, even though the computer is far better than the squid at storing, alphabetizing, and to some extent categorizing this information, all without being able to use it. But the squid's visual abilities are far more advanced.

The central importance of the human ability to integrate interests artificial intelligence researchers in all means of generating structured wholes from initially unordered heaps of information. The path-finding routines described in the preceding section do this, and so have been muchh studied. How far can path-finding approaches be pushed? Though encouragingly general, they are ultimately overwhelmed by the size of the state spaces that they may need to search.

A small buckets-and-well problem like that considered above has a small state space: since the allowed operations always leave an integer number of quarts in each bucket, and since the two buckets have sizes 3 and 5, the set of possible states has at most 4 * 6 = 24 elements. But what if we allowed 40 buckets, with sizes up to 9 quarts? Then the number of states could be as muchh as 10 ** 40, i.e. 10,000,000,000,000,000,000,000,000,000,000,000,000,000 states: surely too many to search blindly. What then to do?

A decomposition strategy adapted to their special structure will often work for buckets-and-well problems. As an example, consider any bucket problem in which buckets of sizes s1 and s2 without any common factor appear. Examples are [3,5], [5,7], [6,7], etc. These two buckets can be used to measure out 1 quart. To do so, we just use the fact that the greatest common factor F of s1 and s2 can always be expressed either as A * s1 - B * s2 or as B * s2 - A * s1. (Euclid's algorithm for calculating F also calculates these A and B.) Since s1 and s2 have no common factor, F must be 1; so we have either A * s1 - B * s2 = 1 or B * s2 - A * s1 = 1. If the first formula holds we can fill the bucket of size s1 A times and pour it into the other bucket, which we just empty whenever it gets full. Clearly 1 quart must be left at the end. The case in which the other formula holds is just the same. Once measured out, this 1 quart can be poured repeatedly into any desired one of the other buckets, so we can clearly bring all the other buckets to any state we like.

Every pouring operation available to us always leaves at least one of the buckets completely full or completely empty. Hence a target state is only reachable if it includes at least one completely full or one completely empty bucket. Since this bucket can be emptied in one step if it is full, and vice-versa, we might as well suppose that it is empty in the target state we seek. Call this the 'spare' bucket. Using the '1 quart' technique just explained, bring all the buckets except the s1 and s2 quart buckets to their desired state, and put the target amount for the s1 quart bucket into the 'spare' bucket (which we assume is no smaller). Express the target amount W of water for the s2-quart bucket as W = A * s1 - B * s2 or W = B * s2 - A * s1. Using the 'repeated pouring' technique explained in the preceding paragraph, bring W quarts into the s2-quart bucket, using only the s1 and s2-quart buckets. Finally, pour the 'spare' bucket into the s1-quart bucket. The problem is now solved.

These arguments tell us that, in the cases considered, a state is reachable from the starting state if and only if it includes at least one completely full or completely empty bucket. So the total number of these accessible states is

For example, if the vector of bucket sizes is [3,5,7,11,13,17,19,23], then 103,594,260 out of a total of 111,546,435 possible states are reachable from the start [0,0,0,0,0,0,0,0]. This formula is useful for the discussion of modified path-search procedures given beloww.

Ordinary software practice uses an army of programmers to generate a river of programs tailored to an endless variety of special situations. Research in artificial intelligences seeks to replace all this by finding a single magic key (or, perhaps, small box of magic keys): the one program, or complex of programs, smart enough to write all the programs we want, given only loose indications of what is wanted. Search programs like that considered above have often been regarded as first steps to this magic key. They must clearly be made capable of dealing with potentially enormous search spaces if they are to play this role. Toward this end, a wide variety of strategies have been attempted.

- Identification of intermediate points. If we can identify,
*a priori*, some point P that must lie along the start-to-target path to be constructed, then our problem decomposes into two easier ones: find a path from the start S to P, and from P to the target T. If this*a priori*identification step can be repeated, direct construction of the desired path may become possible. This was the key to our solution of the 'Towers of Hanoi' problem considered in Section XXX. To move all the disks one must move the bottom-most one, and this can only be moved from a to b if all the smaller disks have first been moved to the third peg c. Hence all solution paths must certainly involve the intermediate state: large disk in original position - target peg clear - all other disks on third peg. This reasoning, iterated, leads directly to the problem solution given in Section XXX, and even tells us that this solution is of the optimal length: 2^{n}- 1 for an n-ring Hanoi puzzle. The full state space for this puzzle has 3^{n}points, since any ring could in principle appear on any of the 3 pegs. Our reasoning therefore allows us to ignore all but a fraction (2/3)^{n}of the state space, which for an n = 20 Hanoi puzzle is .03%. - Search pruning. If
*a priori*reasoning can show that, if a path from S to T exists, then the shortest such path, or some other identifiable path, will or can omit certain pre-identified points, then we can check each new point to see if it is one of these points, and if so can drop it immediately. - Search guided by heuristics. If the true distance of each state from the target state were known, search would always be easy: in each state S we could simply search for an adjacent state S' one step closer to the target, and move from S to S'. Even if no exact way of calculating this distance is available, we can hope to use some easy formula approximating this true distance as a guide to searching: this will at tell us when we are 'getting warmer'. To use such a formula, we prefer points which seem to lie close to the target to other points, always searching forward from points seemingly closest to the target, and only considering points at a greater estimated distance when all such points have been exhausted. A variant find_path procedure using this strategy is given beloww. It can be seen to work well for buckets-and-well problems.
- Relaxed search. We can begin by trying to form, not a real path between the specified start and target states, but an initial rough plan for a subsequent final final search. This 'plan' can consist of a series of states which we hope will be reachable from each other by paths that are easier to find than the full path from start to finish. To generate the plan we then take steps which are not themselves legal, but which we nevertheless have reason to believe can be 'fleshed out' by fuller sequences of legal steps. To realize the plan once it has generated, we simply search for links between its successive nodes. Nodes which cannot be linked to their successors, either after exhaustive search or after a search effort deemed sufficient, can be dropped from the plan, after which the 'fleshing out' process can resume, now trying to complete the rougher plan which results. When its planning component fails, this approach will eventually reduce to a simple, planless effort to connect the start and target.
- Decomposition. It may be possible to map the state-space of the problem we are trying to solve onto some more easily searched space, in such a way that any two nodes in the original space which map into adjacent nodes N_i, N_j in the reduced space can be connected by a path P_i_j in the original space that is relatively easy to find. We can then try to find a path N_1, N_2,...,N_k in the reduced space, from the mapped initial state to the mapped final state, and then try to connect a series of states S_1, S_2,...,S_k mapping to the steps N_1, N_2,...,N_k in this plan by paths in the original state space.
- Two-way search. As children interested in paper-and-pencil mazes quickly learn, it is often easiest to construct a path from a specified start to a desired target by working simultaneously forward from the start and backward from the target. This will be the case if the state space 'branches' rapidly. Suppose, for example, that the path to each point reached in state space can be continued in one of 10 ways. Then comprehensive search n levels deep from any given point will generate a set of 10**n points. Plainly blind search by this method for a path between a source and a target 14 steps away will fail in the situation considered, since (1/2) * 10**14 states would probably need to be examined before a path was found. But if we search simultaneously forward from the source and backward from the target until some common point is reached, we may only need to go 7 steps forward from the source and 7 steps backward from the target, and so might find a solution after examining only 10**7 states, a far more feasible collection.

All of these strategies are highly fallible, and need not all work well even for buckets-and-well problems, which at first glance seem so transparent. Nevertheless buckets-and-well problems provide an interesting laboratory for study of the search strategies we have listed. Not all the strategies listed are easy to apply. The combinatorial detail critical to buckets-and-well problems leaves us without any obvious way of identifying either intermediate states which a solution must traverse, or large numbers of states which can be omitted without disrupting all solutions. The easiest strategies to apply are the 'decomposition','relaxed search', and the 'search guided by heuristics' approaches. One way of trying to decompose a buckets-and-well problem is by bring one of the buckets to the desired target condition, and then work on the other buckets without using this 'finished' bucket. A closely related 'search guided by heuristics' approach is to use the number of buckets which have not yet reached their target content as a heuristic measure of distance from the desired target. A surprisingly elementary 'relaxed search' approach, explored beloww and seen to work well,is simply to form'rough' paths by allowing the contents of any bucket to be changed arbitrarily.

The following variant of our first find_path procedure implements a general form of 'search guided by heuristics' strategy. It always searches forward from points seemingly closest to the target, putting all other points in a set called 'set_aside'. When all points along what seems to be the most direct path to the target have been exhausted, search backtracks to the points in 'set_aside' which seem to be closest to the target.

This find_path procedure is instrumented by insertion of statements which print out the distance-to-target estimate being used whenever it changes. The two lines of code inserted for this purpose are shown in italics.

A estimated distance-to-target function which reflects our expectation that we can find a solution which brings more and more buckets into their target condition is given following the 'find_path' procedure. Experiments with this function show that it works rather well. For example, with buckets of sizes [3,5,7,11,13,17,19,23] the target [3,4,5,3,5,2,2,9] is found at the end of a 38-step path after searching just 6,543 of the 103,594,260 reachable states. With the distance-to-target feature turned off, we would expect about half the states to be searched.

With buckets of sizes [3,5,7,11,13,17,19,23,29] the target [3,4,5,3,5,2,2,9,22] is found at the end of a 34-step path after searching just 10,007 of the 3,020,137,890 reachable states. With buckets of size [3,5,7,11,13,17,19,23,29,31] the target [3,4,5,3,5,2,2,9,22,30] is found at the end of a 94-step path after searching just 41,473 of the 94,053,692,040 reachable states.

The distance-to-target used for buckets-and-well problems should reflect our expectation that we can find a solution which brings more and more buckets into their target condition. Hence we simply estimate distance-to-target as the number of buckets which have not yet reached this condition. This function is:procedurefind_path(start,target); -- general state-exploration procedure, incorporating use of estimated distance to solution reached_from := {[start,start]}; -- the start state is considered -- to have been reached from itself best_dist := dist_to_target(start,target); just_seen := {start}; -- initially, only the start state has been seen set_aside := { }; -- collection of states temporarily set aside, because state believed closer to solution is known got_it :=false; -- we don't have the solution yetwhilejust_seen /= { }loop-- while there exist newly seen states brand_new := { }; -- look for states that have not been seen beforeforold_stateinjust_seenloopifdist_to_target(old_state,target) > best_distthenset_asidewith:= old_state; -- set this node aside, perhaps temporarilycontinue; -- and do not generate new states from this nodeend if;fornew_stateinnew_states_from(old_state) | reached_from(str(new_state)) =OM loopreached_from(str(new_state)) := old_state; -- and record its originif(dtt := dist_to_target(new_state,target)) <= best_distthenbrand_new with:= new_state; -- record a brand_new statebest_dist min:= dtt; -- we may now have a better distanceifdtt < best_distthen print("best_dist: ",dtt);end if;elseset_aside with:= new_state; -- set this node aside, perhaps temporarilyend if;ifnew_state = targetthengot_it :=true;exit; -- since problem has been solvedend if;end loop; -- end for new_state just_seen := brand_new; -- now the brand-new states -- define those which have just been seenifgot_itthen exit; end if; -- since problem has been solvedif#set_aside = 0then exit; end if; -- all states have been triedif#just_seen > 0then continue; end if; -- distance to target may still diminish -- otherwise we must backtrack, restarting with all the best of the states that have been set aside best_dist := min/[dist_to_target(state,target): state in set_aside]; just_seen := {stateinset_aside | dist_to_target(state,target) = best_dist}; set_aside -:= just_seen; -- the states now to be processed are no longer 'set aside'end loop; -- end for old_stateend loop; -- end whileif notgot_itthen return OM; end if;-- since all states have been explored, and the target -- has not been found, we know that no solution exists. -- at this point the target has been found, so we chain back from the target -- to reconstruct the path from start to target rev_path := [target]; -- initialize the path to be builtwhile(last_state := rev_path(#rev_path)) /= startlooprev_path with:= reached_from(str(last_state)); -- chain backwards to the startend loop;return[rev_path(j): j in [#rev_path, #rev_path - 1..1]]; -- reverse the pathendfind_path;

proceduredist_to_target(state,targ); -- estimated distance to target -- return 1; -- disable the estimatereturn#[t: t = targ(j) | t /= state(j)];enddist_to_target;

**Plan-guided path construction**. For buckets-and-well problems we can combine heuristically guided search with path planning by using this same pathfinding and distance-to-target function under the control of a top-level routine which first uses a relaxed new_states_from function to generate the plan, and then ties to flesh out the plan by filling in real paths between its steps. The relaxed new_states_from function can simply allow the state of any buckets to be changed arbitrarily. This is:

The top-level routine is as follows:procedurerelaxed_new_states_from(state); -- variant new states function, for rough planningreturn{new_state: j in [1..#state], k in [0..size(j)] | (new_state(j) := k) /= OM};endrelaxed_new_states_from;

Experiment shows that the planning strategy shown above improves the efficiency of path-finding somewhat, though at the cost of an increase in the length of the paths found. For buckets of sizes [3,5,7,11,13,17,19,23] the target [3,4,5,3,5,2,2,9] is found after searching 743 instead of 22,628 of the 103,594,260 reachable states, but the length of the path found increases from 24 to 82. For the larger cases reported above results are as follows:procedure find_path_by_planning(start,target);varnew_states_from; -- allowed-step function used by find_path routine -- first allow 'roughly correct' steps, to generate a plan new_states_from := relaxed_new_states_from;if(plan := find_path(start,target)) =OM then return OM; end if; -- no plan can be found -- now use 'exact' steps, to fill in the plan new_states_from := exact_new_states_from; details := []; -- will show how j-th step of plan was filled in plan_index := 1; -- next step of plan to be filled inwhileplan_index < #planloop-- try to fill in the plan with exact steps, abandoning plan steps which don't workif(steps := find_path(plan(plan_index),plan(plan_index + 1))) =OM then-- cannot take this step of plan; drop a future plan step if possibleifplan_index + 1 < #planthenplan(plan_index + 1) := [ ];elseifplan_index > 1then-- drop the prior step of the plan, and back up plan(plan_index) := [ ]; plan_index -:= 1;else-- impossible to connect start with targetreturn OM;end if;else-- record sequence of steps to the next plan point details(plan_index) := steps; plan_index +:= 1;end if;end loop; -- at this point we are done, and simply need to assemble all subsequences -- of steps into an overall solution, dropping repeated nodesreturn+/[steps(ifj = 1then1else2end if..): stepsindetails];endfind_path_by_planning;

Sizes | Target | Nodes Searched | Path Length | Prior Searched | Prior Length |

[3,5,7,11,13,17,19,23,29] | [3,4,5,3,5,2,2,9,22] | 1,126 | 94 | 10,007 | 34 |

[3,5,7,11,13,17,19,23,29,31] | [3,4,5,3,5,2,2,9,22,30] | 1,368 | 250 | 41,473 | 94 |

- The 'factorial' function n!, given by */ [i : i
**in**[1.. n]], satisfies the identityn! = **if**n = 1**then**1**else**n * ((n-1)!)**end if**; - The sum sigma(t) = +/t of all the components of a tuple t satisfies the identity
sigma(t) = **if**#t = 0**then**#t = 1**OM**elseif**then**t(1)**else**t(1) + sigma(t(2..))**end if**; - The tuple sort(s) representing the elements of a set s in sorted order satisfies the identity
sort(s) = **if**#s = 0**then**[ ]**else**[**min**/s] + sort(s**less****min**/ s)**end if**;

This same function sort(s) also satisfies many other interesting identities. Suppose, for example, that we pick an arbitrary element x from the set s and then divide the remaining elements of s into two parts, the first, L, containing all elements less than x, the second, G, containing all elements greater than x. Then if we sort the elements of L and G and concatenate the resulting sorted tuples, sandwiching x between them, we clearly get a tuple t which contains all the elements of s in sorted order. This shows that the function sort(s) also satisfies the identity

sort(s) =if(x :=arb(s)) =OM then[ ]elsesort({yins: y < x}) + [x] + sort({yins: y > x})end if;

Identities of the kind appearing in the preceding examples are called *recursive definitions*, and the functions appearing in them are called *recursively defined functions*. Such recursive definitions all have the following features:

- For certain particular simple or minimal values (like n = 1 in (i) or t = [ ] in (ii)) of the parameter variable x of a recursively defined function f(x), the value of f(x) is defined explicitly.
- For all other parameter values x, the value of f(x) is expressed in terms of the value that f produces for one or more smaller argument values x1,x2, . . xn. That is, there exists a relationship of the general form
f(x) = some_combination(f(x1),f(x2),..,f(xn)) - Repeated use of the relationship (b) will eventually express any value f(x) in terms of various values f(y) each of which has a parameter y which is minimal in the sense of (a), so that all values f(y) in terms of which f(x) is ultimately expressed are known explicitly.

Any recursive relationship satisfying (a, b, c) gives a method for calculating f(x) for each allowed argument x. Like many other programming languages, SETL allows one to express such recursive calculations very simply and directly, by writing recursive procedures, i.e., procedures which invoke themselves. This can be done for each of the three examples given, which then take on the following forms:

procedurefactorial(n); -- calculates the factorial n!return ifn = 1then1elsen * factorial(n - 1)end if;endfactorial;proceduresigma(t); -- calculates the sum of the components of t.return if#t = 0then 0 elseif# t = 1thent(1)elset(1) +sigma(t(2..))end if;endsigma;proceduresort(s); -- recursive sorting procedurereturn ifs = { }then[ ]else[min/s] + sort(sless min/ s)end if;endsort;proceduresort(s); -- second variant of recursive sorting procedurereturn if(x :=arbs) =OM then[ ]elsesort({yins | y < x}) + [x] + sort({yins | y > x})end if;endsort;

These examples illustrate the following general remarks concerning recursive procedures:

- Syntactically, recursive procedures have the same form as other procedures. Their only distinguishing trait is that recursive procedures invoke
themselves.
- The same name-scoping rules apply to recursive as to other procedures.

Note that a recursive procedure f(s) uses itself but always applies itself to arguments smaller than s; this is why the calculation of f eventually terminates.

A recursive procedure f need not invoke itself directly: It can invoke
another procedure g which invokes f, or g can invoke some h which then invokes f, etc. A group of procedures which invoke each other is sometimes called a *mutually recursive* family of procedures, and any procedure belownging to such a mutually recursive family is itself called recursive.

For an example of such a mutually recursive family, consider the problem of defining an overall order for SETL objects, which will allow any two SETL objects to be compared to each other. (Such an order could, for example, serve as the basis for an output routine which alway arranged the elements of sets in increasing order, thereby making it easier to locate elements in large sets when they were printed.) To define such an order, we can agree on the following conventions:

**OM**always comes first, integers before reals, reals before strings, strings before atoms, atoms before tuples, and tuples before sets.- Among themselves, integers and reals are arranged in their standard order,
strings in their standard alphabetical order, and atoms in the order of their
external printed representations; i.e., if x and y are two atoms then x comes
before y if and only if
**str**(x) <**str**(y). (Recall that**str**(x) operator produces a string identical with the external printed form of the object x; see Section 2.5.3) - Tuples are arranged in lexicographic order, i.e., t
_{1}comes before t_{2}if, in the first component in which t_{1}and t_{2}differ, t_{1}has a smaller component than t2. - To compare two sets, first arrange their elements in order. This allows them to be regarded as tuples; then apply rule (c).

The following mutually recursive group of procedures implements the ordering strategy we have just described.

procedureis_bigger(x,y); -- return true if x >= y in the -- order just describedreturn ifx = y or y =OM then trueelseifx =OMthen falseelseiftype(x) /=type(y) then type_number(type(x)) > type_number(type(y))elseifis_integer(x)thenx >= yelseifis_real(x)thenx >= yelseifis_string(x)thenx >= yelseifis_atom(x)thenstr(x) >str(y)elseifis_tuple(x)thenlex_compare(x,y)elselex_compare(sort(x), sort(y))end if; -- x and y are setsendis_bigger;procedurebiggest(S); -- find largest element in S, -- in the ordering defined by is_bigger. big :=arb(S);forxinSloopifis_bigger(x, big)then-- x may be biggest big := x;end if; endloop;returnbig;endbiggest;

proceduresort(S);ifS = { }then return[ ];elseb := biggest(S);returnsort(Slessb)withb;end if;endsort;procedurelex_compare(t1,t2); -- compare two different tuples, -- in their lexicographic order, components being compared by is_biggerreturnexistsc1 = t1(i) | is_bigger(c1,t2(i));endlex_compare;proceduretype_number(typ); -- converts typ, which is the -- name of a valid SETL type, into an integer tno := {["INTEGER", 1], ["REAL",2], ["STRING",3], ["ATOM",4], ["TUPLE",5],["SET",6]};returntno(typ);endtype_number;

Until now we have regarded recursive SETL procedures simply as SETL representations of recursive mathematical relationships and have ignored the question of how they are implemented, i.e., how the calculations which they define are actually performed. Our abstract view is really the best way to look at the matter, since the sequence of steps used to evaluate a recursive procedure can be complex and tricky to follow even when the mathematical relationship on which it is based is simple and easy to understand. Nevertheless one needs to understand how recursive calculations are performed. For example, when an incorrectly programmed recursive procedure malfunctions, one needs to know what is happening in order to diagnose the problem and correct it.

Implementation of recursive procedures, like that of mutually recursive groups of functions, is based upon the following rule. Whenever a procedure f invokes itself, a new logical copy of the procedure is created, initial parameter values are passed to this new logical copy, and execution of this new logical copy begins with its first statement. While the new copy of f is executing, the old copy of the function f, from which the new copy was created, remains in existence, but execution of it is suspended. The new copy can in turn invoke f, thereby creating a third copy of f, which can even go on in the same way to create yet a fourth copy, etc. However, if the recursion has been written correctly, the arguments x passed to the successive copies of f will be getting smaller and smaller. Eventually one of them will get small enough for the corresponding value f(x) to be evaluated directly. Once this happens, the currently active copy of the procedure f will execute a statement

for some directly evaluable expression e. This will pass the value of e back to the place from which the current copy of f (call it CCF) was invoked. CCF will then become superfluous and will disappear. The immediately prior copy of f will then become active, and when it finishes its execution it will in turn pass a value back to the copy of f from which it has been invoked and disappears, etc. Eventually a value, and control, will be returned to the very first copy of f, and the whole recursive evaluation will be completed as soon as this first copy executes a **return** statement.

As an example of this process of recursive evaluation, suppose that the recursive sort routine shown earlier in this section is invoked, and that initially the argument value {30,0,60,40} is transmitted to it. This will trigger the following steps of recursive evaluation.

- Copy 1 of sort begins to evaluate sort({30, 0, 60, 40})
- The minimum element 0 is removed from the set s, and sort is invoked
recursively to evaluate sort({30,60,40})
- Copy 2 of sort begins to evaluate sort({30, 60, 40})
- The minimum element 30 is removed from the set s, and sort is invoked recursively to evaluate sort( {60, 40} )
- Copy 3 of sort begins to evaluate sort({60,40})
- The minimum element 40 is removed from the set s, and sort is invoked recursively to evaluate sort({60})
- Copy 4 of sort begins to evaluate sort({60})
- The minimum (and only) element 60 is removed from the set s, and sort is invoked recursively to evaluate sort({ }).
- Copy 5 of sort immediately returns [ ] as the value of sort({ }) to copy 4 and disappears.
- Copy 4 of sort appends the returned value [ ] to [60], returns the result [60] to copy 3, and disappears.
- Copy 3 appends the returned value [60] to [40], returns the result
[40, 60] to copy 2, and disappears.
- Copy 2 appends the returned value [40, 60] to [30], returns the result
[30, 40, 60] to copy 1, and disappears.
- Copy 1 appends the returned value [30,40,60] to [0], and returns [0, 30, 40, 60], as the final result of the whole recursive evaluation, to the place from which sort was first invoked.

The complexity of this sequence of steps underscores the fact that whenever possible a recursive SETL function like sort should be looked at as the transcription of a recursive mathematical relationship, in this case, the very obvious relationship

rather than in terms of the sequence of steps required for its evaluation. However, the way in which recursive procedures are evaluated becomes relevant if they are miswritten and consequently malfunction. Certain common pathologies are associated with malfunctioning recursive routines, and one needs to be able to recognize them when they appear. A common error is to write a recursion which does not handle its easy, directly evaluable cases correctly, or which for some reason never reaches a directly evaluable case. If this happens, a recursive procedure will create more and more copies of itself without limit, until the entire memory of the computer on which it is running is exhausted, and a final, "MEMORY OVERFLOW" error message is emitted.

In somewhat more complex cases, a malfunctioning recursive procedure
will loop indefinitely, first creating additional copies of itself, then returning from and erasing these, then again creating new copies of itself, again returning from and erasing these, etc., without any overall progress to termination. Such a nonterminating recursive loop is likely to produce muchh the same symptoms as a nonterminating **while** loop; namely, the program will run on, either with no output or with a flood of repetitive output, until somebody notices that it has outrun its time limit and terminates it forcibly. This situation is most easily diagnosed at an interactive terminal, simply by printing out the parameters transmitted to the recursive function each time it is invoked; this pattern of parameters will fail to show the logical pattern upon which your hopes for eventual termination of the recursion rest.

Having said all this, we now go on to describe another interesting recursive procedure, appropriately called quicksort.

This quicksort sorting method works as follows: If the tuple t of elements to be sorted has no elements or just one element, we have nothing to do, since an empty tuple or a tuple with just one element is always sorted. Otherwise,

we remove the first element x from t and divide what remains into two parts, the first ("small_pile") consisting of all those components smaller than x, the second ("large_pile") consisting of all those components at least as large as x. We then sort these two piles separately. This can most readily be done just by using quicksort itself recursively. Finally, we recombine to get all the original components in their sorted order. This is done by putting the sorted small_pile first, followed by the element x, and then followed by the sorted large_pile.

See Figure 5.4 for further explanation of the way in which quicksort works. Code for this procedure can be written as follows:

procedurequick_sort(t); -- quicksort procedure, first formif#t < 2then returnt;end if; x := t(1); -- take the first component small_pile := [y: y = t(i) | y < x]; large_pile := [y: y = t(i) | y >= x and i > 1];returnquick_sort(small_pile) + [x] + quick_sort(large_pile);endquick_sort;

By using SETL expression features more strenuously, we can write this whole procedure in just one statement, namely as

procedurequick_sort(t); -- quicksort procedure, second formreturn if#t < 2thentelsequick_sort([y: y = t(i) | y < t(1)]) + [t(1)] + quick_sort([y: y = t(i) | y >= t(1) and i > 1])end if;endquick_sort;

The quicksort procedure that has just been presented sorts by separating an
array to be sorted into two piles which can be sorted separately and then
combined. This recursive approach, sometimes called *divide and conquer*,
forms the basis for many efficient data-manipulation algorithms. It is often
most effective to divide the problem given originally into two halves of nearly
equal size. Quicksort does not always lead to this equal division, since random
selection of a component x of a tuple t may cause it to be divided into parts [y: y **in** t | y < x] and [y: y **in** t | y > x] which are very different in size. For this reason, we will now describe another recursive sorting technique, called mergesort, which does begin by dividing the tuple t that is to be sorted into two parts of equal size. This procedure works as follows:

- Divide t (at its middle) into two equal parts t
_{1}and t2, and sort them separately. - Then merge the two sorted parts t1, t
_{2}of t, by removing either the first component of t_{1}or the first component of t2, whichever is smaller, and putting it first in the sorted version of the full tuple t, after which we can continue recursively, merging the remaining components of t_{1}and t2. Code for this procedure is as follows:

proceduresort(t); -- recursive merge_sort procedurereturn if#t < 2thent -- since a tuple of length 0 or 1 is ipso facto sortedelsemerge(sort(t(1..#t/2)), sort(t(#t/2 + 1..)))end if;endsort;proceduremerge(t1,t2); -- auxiliary recursive procedure for mergingreturn ift_{1}= [ ]thent2elseift_{2}= [ ]thent1elseift1(1) < t2(1)then[t1(1)] + merge(t1(2..),t2)else[t2(1)] + merge(t1, t2(2..))end if;endmerge;

Instead of programming the merge procedure recursively, we can write it
iteratively. For this, we have only to work sequentially through the two tuples
t1 and t_{2} to be merged, maintaining pointers i1, i2 to the first component of each which has not yet been moved to the final sorted tuple t being built up. Then we repeatedly compare t1(i1) to t2(i2), move the smaller of the two to t, and increment the index of the component that has just been moved to t. This revised merge procedure is as follows:

proceduremerge(t1,t2); -- iterative variant of merge procedure t := []; -- merged tuple to be built up i1 := i2 := 1; -- indices of first components not yet movedwhilei1 <= #t1andi2 <= #t2loopift1(i1) < t2(i2)then-- move t1(i1) to t twith:= t1(i1); i1 +:=1;else-- move t2(i2) to t t with:= t2(i2); i2 +:= 1;end if;end loop;returnt + t1(i1..) + t2(i2..); -- note that at most one of t1(i1..) and t2(i2..) is non-nullendmerge;

If the components of a tuple t are arranged in random order, then to find the component or components having a given value we must search serially through every one of the components of t. Clearly no component of t can go unexamined, since this may be precisely the component we are looking for. On the other hand, if the components of t are numbers or character strings, and if they are arranged in sorted order, then, as everyone who has ever looked up a word in a dictionary or a name in a telephone book should realize, a muchh faster searching procedure is available. The most elegant expression of this searching procedure is recursive and is as follows:

- Compare the item x being sought to the middle item t(#t / 2) of the sorted tuple t. If x is greater than (resp. not greater than) this middle item, search recursively in the left half of t, otherwise in the right half of t.
- The search ends when the vector in which we are searching has length equal to 1.

In coding this procedure, we maintain two quantities lo, hi, which are respectively the low and the high limits of the zone of t in which we must still search. When the search procedure is first called, lo should be 1 and hi should be #t. When lo and hi become equal, we return their common value. If this locates a component of t equal to x, we have found what we want; otherwise we can be sure that x is not present in t, i.e., that no component of t is precisely equal to x.

Recursive code for this searching procedure is as follows:

It is easy to express this search iteratively rather than recursively: we can simply writeproceduresearch(x, t, lo, hi); -- binary search for x in t between lo and hireturn iflo = hithenloelseifx <= t(mid := (lo + hi)/2)thensearch (x,t,lo,mid)elsesearch (x,t,mid + 1,hi)end if;endsearch;

proceduresearch(x, t); -- iterative form of binary search procedure lo := 1; hi := #t; -- initialize search limitswhilelo < hi loopifx <= t(mid := (lo + hi)/2)thenhi := mid;elselo:= mid + 1;end if;end loop;returnlo;endsearch;

Binary searching can be enormously more efficient than simple serial searching. Suppose, for example, that the sorted tuple t to be searched is of length 1,000,000. Then to search t serially several million elementary operations will be required. On the other hand, since 1,000,000 is roughly 2**20, only 20 probes will be required to locate a component of t by binary searching. So binary searching is roughly 50,000 times as fast as serial searching for sorted tuples of this length. This illustrates the vast efficiency advantage that can be gained by proper choice of the algorithm that you will use.

Among the many different kinds of puzzles that can be bought in toyshops, the Towers of Hanoi puzzle is a classic. This puzzle involves a board with three identical pegs and a set of rings of decreasing size that fit snugly around any of the pegs. As initially set up, the puzzle is as shown in Figure 5.5.

To solve the puzzle one must move all the disks from the particular peg (peg 1) on which they are originally placed to one of the other pegs (say, to peg 3). However, only one disk can be moved at a time, and it is forbidden ever to place a larger disk on top of a smaller disk.

Recursion gives us an amazingly effective way of writing a solution to this problem. The key idea is this: since a large disk can never be placed atop a smaller, all the disks except the bottom one must be moved to peg 2 before we can move the bottom disk from peg 1 to peg 3. Hence, to move a pile of n disks from peg 1 to peg 3, we must

- move a pile of (n-1) disks from peg 1 to peg 2
- move the n-th disk from peg 1 to peg 3
- move a pile of (n-1) disks from peg 2 to peg 3

The following elegant recursive procedure generates the sequence of moves required; each move is represented as a pair [f,t] showing the pegs from which and to which a peg is moved.

proceduremoves(ndisks,fr,to,via); -- moves n disks from peg fr to peg toreturnifndisks = 1then[[fr, to]]elsemoves(ndisks - 1, fr, via, to) + [[fr, to]] + moves(ndisks - 1, via, to, fr)end if;endmoves;

The procedures we have seen so far are given some collection of parameter
values and calculate a single result value, which it returns, from them. Occasionally, however, one wants to use procedures in a somewhat different way;
namely, one wants to invoke a procedure expressly in order to modify some
object that already exists. In this case, such a procedure is invoked for its effect, rather than for the value it delivers. This use of procedures moves us away from the notions of "value" and "expression" and focuses more on the somewhat different notion of *program state*, i.e., the collection of all values that local and global variables have at each moment during a computation. What we will be describing in this section is the way in which procedures are used to modify this program state. There are two ways in which procedures can have this effect: one of them is to modify one or more of their calling parameters; the second is to modify one or more global variables.

This use of procedures is perfectly legal in SETL and is accomplished as follows. A procedure's header line lists its parameters, as for example in

Parameters listed in this way can be modified within the body of the procedure (i.e., within my_proc), but parameter values are ordinarily local to the procedure, so that these modifications are not be transmitted back to the point from which the procedure was invoked. For example, if we define the procedure

procedurechange_parameter(x); (1a) x := 0;returnx;endchange_parameter;

and invoke it by

y := 1; z := change_parameter(y); (2)

then the **print** statement will produce the output

This reflects the fact that the **return** statement in the **procedure** returns the final value of the variable x (which is local to the **procedure**), but that modifications to the procedure parameter x are not transmitted back to the point of invocation and therefore do not affect the value of the actual argument y appearing in
(2). Thus the argument y remains unchanged.

This is the rule which ordinarily applies to **procedure**s, and which is most
appropriate for **procedure**s used as functions. However, it is possible to bypass this rule, and to create **procedure**s which do modify one or more of the actual arguments with which they are invoked. To do this, one simply prefixes the qualifier **rw** (meaning read/write parameter) to each parameter corresponding to one of these modifiable arguments. Suppose, for example, that we modify the procedure (1a), making it

procedurechange_parameter(rwx); x := 0;returnx;endchange_parameter;

Then the output of the **print** statement in (2) will change to

reflecting the fact that now changes in the value of the parameter x of the procedure (1b) will be transmitted back to the point from which the procedure was invoked.

Procedures whose parameters are qualified in this way will generally not be used as functions that return values (though technically it is legal to use them as functions). Instead, they will ordinarily be invoked simply by writing their names followed by their actual argument lists, as is illustrated by

y:= 1; change_parameter(y);

which produces the output

Any **procedure** my_proc(x1,..,xn) can be invoked in this way, simply by
writing a statement of the form

my_proc(a1,..., an); (4a)

where a1,...,an is any list of expressions (called, as usual, the actual arguments of the invocation (4a)). An invocation like (4a) is logically equivalent to an invocation

junk_variable := my_proc(a1,...,an); (4b)

where junk_variable can be the name of any variable whose value is never used for anything else.

Of course, if the procedure my_proc invoked by (4a) does not modify any
of its arguments, an invocation like (4a) will generally not be very useful, since none of the arguments a1,...,an will change, and since the value returned by my_proc is simply thrown away. On the other hand, if the procedure my_proc does modify its arguments, then the invocation (4a) will trigger corresponding modifications of any arguments which correspond to parameters carrying the qualification **rw**.

Procedures which modify some of their arguments and which are normally invoked in this way are often called simple-procedures, as distinct from functions, i.e. from procedures which do not modify their arguments and are normally invoked in the manner illustrated by

Since the value returned by a simple-procedure will just be thrown away, the expression e appearing in a statement

within such a procedure is usually without significance and may as well be **OM**. SETL allows

to be abbreviated simply as

and this is the form of the **return** statement which is appropriate to use in simple-procedures. Note also that a **return** statement immediately preceding the trailer line of a simple-procedure can be omitted.

Simple procedures with no parameters and which do not return any value can be invoked just by writing their names followed by a semicolon, as in

my_simple_proc_without_parameters; -- invokes procedure with this name.

As an example, here is a simple-procedure which "compresses" a tuple by dropping out all of its **OM** components:

procedurecompress (rwt); t := [xint | x /=OM]; (5a)endcompress;

(Here we have made use of one of the rules stated previously to save writing
a **return** statement just before the trailer line of this proc.)

Note that if x initially has the value [1,**OM**,**OM**,**OM**,2,**OM**,3], then the
invocation

compress(x); (6a)

will give x the value [1,2,3].

As a matter of style, note also that instead of writing (5a) we could have written a closely related function, namely,

procedurecompress (t);return[xint | x /=OM]; (5b)endcompress;

in which case would have had to write

x := compress(x); (6b)

to get the effect of (6a). The form (6a) is sometimes slightly more convenient to write, and it is this convenience that can induce us to write a simple-procedure rather than a function for some purpose we have in mind.

In addition to the parameter qualifier **rw**, two additional qualifiers **rd** and **wr** are provided. These parameter qualifiers have the following significance:

rd
| read parameter: can be read and written within its procedure, but modifications to it will not be transmitted back to the corresponding actual
argument. |

rw
| read/write parameter: can be read and written within its procedure, and modifications to it will be transmitted back to the corresponding actual
argument. |

wr
| write-only parameter: can be written and will be transmitted back to the corresponding actual argument, but will not be read. |

If none of these qualifiers is attached to a particular procedure parameter,
the parameter will be treated as if it were qualified with **rd**. Thus **rd** is the default qualifier for otherwise unqualified parameters of procedures.

Next suppose that a procedure called my_proc has one parameter x which is qualified with **rw** or **wr**. In this case an invocation

my_proc(e); (7a)

of my_proc is translated by introducing an otherwise unused temporary variable (call it var), and treating (7a) exactly as if it were

var := e; my_proc(var); (7b) e := var;

The last line indicates that the only expressions which can appear as actual arguments in place of parameters qualified by **rw** or **wr** are those which can legally appear to the left of an assignment operator. (See Section 3.12 for a comprehensive discussion of these assignment targets). This means that the invocations

and

are illegal, but

are legal and translate as

var := tuple(x); my_proc(var); tuple(x) := var;and

var := [x,y]; my_proc(var); [x,y] := var;

respectively.

One final, rather esoteric, point deserves mention. Actual argument values
are transmitted to a procedure and become the values of its formal parameters
immediately upon invocation of the procedure. These values are transmitted
by copying; i.e., each parameter receives a logically independent copy of the
appropriate actual argument value upon procedure invocation. If the procedure modifies its parameters, it is these copied values that are modified while the procedure runs; the original argument values remain unchanged. Moreover, even if the procedure
transmits changes in its parameter values back to the point of invocation,
these changes are only transmitted when the procedure executes a **return**, at
which time an assignment like that appearing in (7b) takes place. These rules
are natural enough and normally require little thought. However, examples
which show their effects can be contrived. For example, consider the following
code, in which the variable y is global:

Note finally that the last line of output produced by this program, which will be produced by theprogramesoteric;varx, y; -- This declaration makes x and y global x := "initial_val_of_x"; y := "initial_val_of_y"; manipulate(x,x,y); -- invoke procedure shown belowwproceduremanipulate(rw u,rw v,rw w);endmanipulate;endesoteric;

since after return from 'manipulate' y gets the value assigned to w by 'manipulate'.

EXERCISES

1. Write a procedure whose inputs are a tuple t of integers and a tuple s of integers
in increasing order, and which returns a tuple t_{1} of length #s + 1 defined as
follows: the first component of t_{1} is the number of components of t which are not greater than s(1); for j between 2 and #s, the j-th component of t_{1} is the number of components of t which are greater than s(j-1) but not greater than s(j); and the last component of t_{1} is the number of components of t which are greater than the last component of s. Try to make your procedure efficient.

2. "Bags," used in some programming languages, are like sets, but each element of a bag can occur several times (i.e., any specified number of times). In SETL, a bag b can be represented in two obvious ways.

(a) by a tuple: i.e., the elements of B can be arranged in some arbitrary order and made the components of a tuple; or

(b) by a map, which sends each element of B into the number of times that it occurs in B.

Write a pair of procedures that convert between these two different representations of a bag B. Also, write a collection of procedures which extend the following set operations to bags in the most useful way:

(i) b1 + b2, b1*b2, and b2-b2 (where b1 and b2 are bags)

(ii) x **in** b (where b is a bag and x is arbitrary)

3. The following table describes the tax due on D dollars of taxable income. Write a procedure which, given D, will return the amount of tax due.

Income Over | But Not Over | Tax |

2,300 | 3,400 | 14% |

3,400 | 4,000 | 154 + 16% of Amount Over --3,400 |

4,000 | 6,500 | 314 + 18% of Amount Over 4,400 |

6,500 | 8,500 | 692 + 19% of Amount Over 6,500 |

8,500 | 10,800 | 1,072 + 21% of Amount Over 8,500 |

10,800 | 12,900 | 1,555 + 24% of Amount Over 10,800 |

12,900 | 15,000 | 2,059 + 26% of Amount Over 12,900 |

15,000 | 18,200 | 2,605 + 30% of Amount Over 15,000 |

18,200 | 23,500 | 3,565 + 34% of Amount Over 18,200 |

23,500 | 28,800 | 5,367 + 39% of Amount Over 23,500 |

28,800 | 34,100 | 7,434 + 44% of Amount Over 28,800 |

34,101 | 41,500 | 9,766 + 49% of Amount Over 34,100 |

41,500 | 55,300 | 13,392 + 55% of Amount Over 41,500 |

55,300 | 81,800 | 20,982 + 63% of Amount Over 55,300 |

81,800 | 108,300 | 37,677 + 68% of Amount Over 81,800 |

108,300 | ---------- | 55,697 + 70% of Amount Over 108,800 |

4. Write a program which will read in a sequence of lines, each containing someone's name, first name first, and print out an alphabetized list of these names, in alphabetic order of last names. Repeat this exercise, but this time print the alphabetized list with last names first.

Three Exercises on Permutations

A *permutation* is a one-to-one mapping of a set s of n items
into itself. If the set s consists of the integers from 1 to n,
then such a permutation can be represented as a vector v of length n
such that every integer from 1 to n appears as a component of v.
The following exercises concern various properties of permutations.

5. The *product* prod(v1,v2) of two permutations v1 and v2 is the vector v such that
v(i) = v1(v2(i)) for each i in {1.. #v}. The *identity permutation* e of n integers is
the permutation represented by the vector [1,2,..,n]. The *inverse* inv(v) of a
permutation is the permutation such that prod(v,inv(v)) = e. Write two SETL
procedures prod and inv which realize these operations. Test them with the help
of a procedure rand_perm(n) that generates a different random permutation of
the integers from 1 to n each time it is called.

6. Check the following facts concerning permutations by generating a few random permutations and verifying that each fact asserted holds for these permutations. (The routines described in Ex. 5 should be used for this purpose.)

- The product of two permutations is a permutation, and the product of permutations is associative.
- prod(inv(v), v) = e for each permutation v.
- prod(inv(u), inv(v)) = inv(prod(v, u)) for any two permutations u, v of n elements.
- Define power(u, k) to be the product of k copies of the permutation v. Check that power(v, j + k) = prod(power(v,j), power(v, k)). Check that for each permutation v there exists a positive integer k such that power (v, k) = e.
- Is prod(u, v) = prod(v, u) true for every pair u, v of permutations of n items?

7. A simple recursive procedure to generate all the permutations of the elements of a set s is the following:

procedurepermutations(s);ifs = { }thenreturn{[ ]};elsereturn{[x] + P: x in s, P in permutations(s less x)};end if;endpermutations;

It is often more convenient to generate permutations one by one, by successive calls to a generating procedure. For example, a program to generate all permutations (rearrangements) of the integers 1 thru n can be built up as follows. Start with the numbers in the sequence s = [1. . n]. Then repeatedly find the last element s(j) in the sequence s such that s(j + 1) > s(j). Let s(i) be the last element following s(j) such that s(i) > s(j). Interchange s(i) with s(j), and then reverse the sequence of elements following the j-th position. This gives the next permutation s.

Write this permutation-generation procedure in SETL, and use it to write out the list of all permutations of the integers 1 thru 5. Use this same procedure to create a program which reads in a string of length 5 and prints it out in all possible permutations, but without any repetitions.

8. If a second-order polynomial P(x) = A*(x**2) + B*x + C with integer coefficients A, B, C has a first-order polynomial M* x + N with integer coefficients as a factor, then M is a factor of A and N is a factor of C. Write a procedure which uses this fact to test polynomials like P(x) to see whether they can be factored and that produces the two factors of P if P can be factored. How efficient can you make this factorization procedure? Can you devise a similar procedure for factoring third-order polynomials with integer coefficients?

9. Many years ago, tokens on the New York City subway system cost 60 cents. Tokens are sold at change booths. Purchasers normally pay for tokens without saying anything, simply by passing a sum of money to the token booth attendant. Certain sums of money (e.g., $1, which will purchase only one token) are unambiguous. Others, like a $5 bill, are ambiguous, since they will purchase anywhere from one to eight tokens. On the other hand, $5.50 is unambiguous, since the likely reason for adding the last 50 cents is to pay for nine rather than just eight tokens. Write a program which will read a tuple designating a collection of bills and coins, decide whether this is ambiguous or unambiguous, and print out an appropriate response (which might be either 'How many tokens do you want?' or 'Here are n tokens').

10. Before Britain began to use decimal coinage, its money consisted of pence, shillings worth 20 pence each, and pounds worth 12 shillings each. Write a procedure to add sums of money represented in this way, reducing the sum to pounds, shillings, and pence. (Sums of money can conveniently be represented as triples.) Write a procedure that will subtract sums of money represented as pounds, shillings, and pence, and which could have been used to make change in predecimal British shops.

11. Write a function whose argument is a tuple t with integer or real coefficients, and which returns the positions of all the local maxima in t, i.e., all the components of t which are larger than either of their neighboring components.

Exercises on Recursion

12. The greatest common divisor gcd(x,y) of two positive integers is the largest
positive integer z such that (x **mod** z) = 0 and (y **mod** z) = 0. (If x and y are equal,
then gcd(x, y) = x). Write procedures each of which calculates gcd(x, y) efficiently
by exploiting one of the following mathematical relationships:

(a) gcd(x,y) = gcd(x - y,y) **if** x > y

(b) gcd(x,0) = x and gcd(x,y) = gcd(x **mod** y,y) **if** x > y.

(c) gcd(x,y) = 2 * gcd(x/2, y/2) **if** x and y are both even.

(d) gcd(x,y) = gcd(x/2, y) **if** x is even and y is odd

(e) gcd(x,y) = gcd(x - y, y) **if** x and y are both odd and x > y.

13. Suppose that we make the gcd procedure of Ex. 12 into an infix operator **gcd** and
then evaluate **gcd**/ s for a set s. What result does this produce?
Assuming that s1 and s2 are non-null sets, is the identity

always true? What will happen if on,e of s1 or s2 is null?

14. A rational number m/n (with integer numerator and denominator) can be represented in SETL as an ordered pair [m,n]. Using this representation, write definitions for procedures called rs, rd, rp, and rq, which respectively form the sum, difference, product, and quotient of two fractions. These procedures should reduce fractions to lowest terms, for which purpose one of the gcd procedures developed in Ex. 12 will be found useful.

15. Supposing that fractions have the representation described in Ex. 14, write a procedure which takes a set of fractions and sorts them into increasing numerical order.

16. The following mathematical relationships can be used as the basis for recursive procedures for calculating various mathematical functions. Write out appropriate recursive procedures for each of these functions.

(a) The value x occurs as a component of a tuple t if and only if it occurs either as a component of the left half of t or as a component of the right half of t.

(b) The sum of all the components of a tuple t of integers is the sum of the left half of t plus the sum of the right half of t.

(c) The reverse of a tuple t is the reverse of its right half, followed by the reverse of its left half.

Think of at least four other relationships of this kind, and write out recursive procedures based on these relationships.

17. The *Fibonnacci numbers* F(n) are defined as follows:

(a) Write a recursive procedure for calculating F(n).

(b) Write a procedure which calculates F(n) without using recursion.

18. Write a recursive procedure to calculate the number of different ways that an integer n can be written as the sum of two squares, as the sum of two cubes, and as the sum of three cubes. Print out a table of these values and see whether they suggest any interesting general facts.

19. To compute the power x**n, one can multiply x**m by x**k for any positive integers m and k satisfying m + k = n. Write a recursive procedure which uses this fact to determine the minimum number M(n) of multiplications needed to calculate x**n. Print out a table of M(n) for all n from 1 to 100.

20. Take mergesort (Section 5.4.2) and one other recursive procedure, and track their recursive operation by inserting code which computes the level of recursion reached by every invocation of the procedure being tracked. (A global variable should be introduced for this purpose.) Messages like the following should be printed:

invoking mergesort from recursion level 3 entering mergesort at recursion level 4, parameter is. .. returning from mergesort to recursion level 3, result is...

21. The *correlation* corr(u,v) of two vectors u, v of n real numbers is the quotient

where Mu and Mv are the means (i.e., average) of u and v, respectively, while Va(u) and Va(v) are the variances of u and v, respectively. (The variance of a vector v is the sum of the all squares (v(i) - Mv)**2, i running from 1 to #v, where Mv is the mean of v).

Write SETL procedures which calculate and return this value. Use this procedure to calculate and print the correlation of 10 randomly selected pairs of vectors. What is the largest value that corr(u,v) can possibly have? What is the smallest?

22. Write a procedure which will read a number written in any specified number base from 2 to 36 and convert it to the integer it represents in decimal notation. Numbers in bases beloww 10 will involve only the digits 0 thru 9; numbers written in larger bases will use the capital letters A thru Z, in increasing order, as additional digits. For example, base 16 numbers will be written by using the characters

Also, write a procedure which will convert an integer to its string representation in any of these bases. These programs should allow for the fact that an illegal character might occur in a string which is to be converted to an integer.

23. Write a program which can be used to prepare an alphabetized directory of your friends' names, addresses, and telephone numbers. The input to this program is assumed to be a list of multiline entries, each starting with a line having the format

where 'key' designates an alphabetic key which determines the alphabetic position of the given entry. (These keys are not to be printed in the final directory.) For example, two entries might be

*Smith Mary Smith 222 Flowery Ridge Ossining, N.Y. 10520 (914)284-1234 *Termites Acme Exterminators (Termite Specialists) (Recommended by Mary) (202)789-1212

24. Write a "personalized letter" generator. The inputs to this program should be a form letter L and a file F containing "addresses" and "variations." The letter L is given as a text containing substrings **j**, and the file F given as a sequence of items **sl**s2**...**sn,each sj being some "personalizing" string.The expanded form of the letter is produced by inserting the address in an appropriate position and replacing each substring **j** in the form L by the string sj.

For example, if L begins

Dear **1**: Since only **2** weeks remain before you will graduate from **3**,

and the first entry in F is

Ms. Nancy Holman#353 Bleecker St#N.Y.C., 10012 NY **Nancy**six**New York University

the "personalized" letter generated will be

Ms. Nancy Holman 353 Bleeker St N.Y.C., 10012 NY Dear Nancy: Since only six weeks remain before you will graduate from New York University, ...

The "personalized" letters that your program generates should be right-justified and attractively formatted. Try to think of, and implement, features which will improve the utility of the personalized letter generator.

30. Manhattan Island was purchased in 1626 for $24. If instead this money had been deposited in a bank account drawing 6% annual interest, how muchh would be in the account now?

26. The set of distances between the centers of cities x, y directly connected by a road not going through any other city is given by a map dist(x, y). (Whenever dist(x, y) is defined, so is dist(y,x), and of course dist(x,y) = dist(y, x).) Write a program that will use this information to calculate the shortest driving distance between any two cities (whether or not they are connected directly by a road). This information should be printed out as an intercity distance chart of the usual form. Also, print out a chart which describes the shortest driving route between cities by listing the city z that you should drive to first if you want to go from x to z.

27. Write a procedure which, given two tuples t_{1} and t2, prints out a list of the number
of times each component of t_{1} occurs as a component of t2.

28. Write a procedure whose parameters are a string x and a set s of strings and which returns the element of s which has the largest number of successive character pairs in common with x. How would you structure this procedure if it is to be called repeatedly, always with the same s, but with many different values of x?

29. Write a procedure that determines whether a character C is a letter, digit, blank or special character. Try to make your code efficient.

Variables and constants declared in outer procedures are globally available within their nested procedures, unless they have been redeclared at some intermediate level of the hierarchy. E.g., in the example above, assuming that their names are distinct, top_lev_var_1, top_lev_var_2, second_lev_var_1, second_lev_var_2, and third_lev_var_1 are all accessible by code inproceduretop_level;vartop_lev_var_1,top_lev_var_2,...;consttop_lev_const_1 := val1,top_lev_const_2 := val2,...;forjin[...]loopdo_something_or_the_other; ...end loop; ...procedurenested_1;varsecond_lev_var_1,second_lev_var_2,...;constsecond_lev_const_1 := val3,second_lev_const_2 := val4,...;forjin[...]loopdo_something_else; ...end loop; ...proceduresubnested; -- a procedure subnested within nested_1varthird_lev_var_1,third_lev_var_2,...;constthird_lev_const_1 := val5,third_lev_const_2 := val6,...; do_something_tricky; ...endsubnested;endnested_1;procedurenested_2; -- another procedure directly nested in the top level procedure ...endnested_2;endtop_level;

The names of nested procedures are only available within the procedure in which they have been nested; in respect to namescoping, a procedure declaration placed at a certain level has the same effect as a constant declaration placed at that level.

The following example illustrates the rules that apply.

The output produced isprogramtest; print(top_level(2)); -- call procedureproceduretop_level(n);procedurenested_1(n);procedureother_nested(n);endother_nested;endnested_1;procedureother_nested(n);endother_nested;procedureother_nested2(n); other_nested(n); -- call procedureendother_nested2;endtop_level;endtest;

Hello from top_level Hello from top_level Hello from nested_1 Hello from deeply subnested procedure Hello from other_nested Hello from other_nested Hello from other_nested Hello from other_nestedThis is because:

- The program prints the first line before calling any subprocedure.
- The call to nested_1 first prints the second line, and then calls the version of other_nested declared within it, rather than the declaration of other_nested external to it, which the redeclaration of other_nested within nested_1 hides. For this reason, the third line printed is
Hello from deeply subnested procedure

- The top-level program's subsequent call to other_nested sees the version of this routine declared directly within it, so the fourth line printed is
Hello from other_nested Hello from other_nested

- The
**procedure**other_nested2 sees this same version of other_nested, and so prints the same line.

Nested procedures are typically used in a just a few ways. For example, they can be used to make a given procedure more readable and less error-prone by structuring it into a series of smaller blocks of no interest elsewhere, or to encapsulate procedures that are internally recursive. In this latter usage one typically nests a recursive 'workhorse' procedure (or a mutually recursive group of procedures) inside a short 'master' procedure which justs initializes an environment for the workhorse and sets it it to work.

Consider, for example, the problem of 'pretty-printing' deeply nested SETL objects in a form which improves their readability. One way of doing this is to print the opening and closing brackets of sets and tuples on separate lines, inside of which we print their elements in recursively indented fashion. The code which follows accomplishes this:

This routine will display the objectprocedurepretty_print(obj); -- top-level pretty-print routinevarindent_step := 4,indentation := -indent_step; -- amount of indentation wanted, set up as a global available to the workhorse pretty_print_recurse(obj); -- just call the recursive workhorseprocedurepretty_print_recurse(obj); -- the recursive workhorse indentation +:= indent_step; -- adjust the indentation levelif(is := is_set(obj))oris_tuple(obj)then-- print brackets, and then print elements recursivelyforsubobjinobjlooppretty_print_recurse(subobj);end loop;else-- print the object directly, but indentedend if; indentation -:= indent_step; -- restore the indentation levelendpretty_print_recurse;endpretty_print;

[1, 2, {4, ["Me", "You"], 3}, {44, ["He", "She"], 33}]in the indented form

[ 1 2 { 4 [ Me You ] 3 } { 44 [ He She ] 33 } ]Another typical use of procedure nesting is to 'memoize' another function. A function is said to be 'memoized' if previously calculated values of the function are stored in an auxiliary map from which they can be retrieved instead of having to be re-evaluated. This can speed them up greatly if they are either expensive to calculate or recursive and crudely written. As an example of this, consider calculation of the n-th Fibonacci number (see Exercise XXX), whose recursive definition is simply

For example, fibonacci(28) is 317811. But in calculating this you may notice that it takes a strangely long time. To see why this is, we can modify the fibonacci routine so that it counts the number of recursive calls to it, and print out this number of calls, as inprocedurefibonacci(n);return ifn < 3then1elsefibonacci(n - 1) + fibonacci(n - 2)end if;endfibonacci;

It will be seen that the number of calls to fibonacci made in the course of this evaluation is 635,621. The reason for this surprisingly large number is that the recursive Fibonacci routine calls itself twice each time it is called, so that the number of calls grows roughly as fast as the Fibonacci numbers themselves. To see what this means, we can modify the preceding test so that it counts just the number of times that fibonacci(1) is evaluated:programtest;varnumber_of_calls := 0;procedurefibonacci(n); number_of_calls +:= 1; -- note one more callreturn ifn < 3then1elsefibonacci(n - 1) + fibonacci(n - 2)end if;endfibonacci;endtest;

It will be seen that fibonacci(0) was evaluated 121,393 times.programtest;varnumber_of_evals_of_fib0 := 0;procedurefibonacci(n);ifn = 1thennumber_of_evals_of_fib0 +:= 1;end if; -- note one more callreturn ifn < 3then1elsefibonacci(n - 1) + fibonacci(n - 2)end if;endfibonacci;endtest;

The cure for this is to memorize, i.e. store and use previously calculated values instead of recalulating over and over again. We can do this by nesting the real Fibonacci formula inside a shell which checks to see if the desired value has already been calculated, and uses the recursive formula only if it has not been. In a first version (in which we include evaluation-counting), this gives

We get the same result, 317,811, but now the number_of_calls was 53 instead of 635,621. This encourages us to calculate fibonacci(280), which turns out to beprogramtest;varnumber_of_calls := 0;varpreviously_calculated := { }; -- this will map n for which fibonacci(n) has previously been calculated -- directly into fibonacci(n)procedurefibonacci(n); number_of_calls +:= 1; -- note one more call if (pc := previously_calculated(n)) /=OMthenreturnpc;end if;returnpreviously_calculated(n) := inner_fibonacci(n); -- otherwise calculate and keep the new valueprocedureinner_fibonacci(n);return ifn < 3then1elsefibonacci(n - 1) + fibonacci(n - 2)end if;endinner_fibonacci;endfibonacci;endtest;

calculated in 557 calls. If instead we had used the original recursive routine, the number of calls would have been roughly twice fibonacci(280), and so the calculation would have run far past the expected disappearance of the universe, which may be no more than a million billion years, or

One detail of the preceding code is ugly: The map 'previously_calculated' should really be internal to the fibonacci complex, rater than being obtrusively visible to the top-level program. We can readily accomplish this by returning fibonacci as a 'closure' from another routine, which gives our final version of the preceding code ('closures' are explained later in this chapter):

programtest; fibonacci := make_fibonacci(); -- get the 'fibonacci closure'proceduremake_fibonacci(); -- returns the fibonacci routine, as a closurevarpreviously_calculated := { }; -- this will map n for which fibonacci(n) has previously been calculated -- directly into fibonacci(n)returnfibonacci;procedurefibonacci(n); if (pc := previously_calculated(n)) /=OMthenreturnpc;end if;returnpreviously_calculated(n) := inner_fibonacci(n); -- otherwise calculate and keep the new valueprocedureinner_fibonacci(n);return ifn < 3then1elsefibonacci(n - 1) + fibonacci(n - 2)end if;endinner_fibonacci;endfibonacci;endmake_fibonacci;endtest;

Procedures play various roles and in particular serve to clarify the logical structure of a complex program by dividing it into subsections whose names hint at their purposes. However, the use of procedures is a bit "heavy" syntactically, in part because procedures require header and trailer lines to introduce them, in part because the variables of a procedure are logically isolated from all other procedures (unless these variables are made global, but then they become accessible to all procedures, which, as pointed out in Section 5.2, is often highly undesirable). In some cases it is nice to use small isolated groups of short parameterless procedures which need to share many variables among themselves. (Unless such groups are small and carefully isolated, this procedure is very dangerous and is strongly discouraged!) This can be done by nesting the group in an outer procedure (or short program) which serves to isolate them. The outer procedure should then declare all the variables which the nested procedures need to access. The following example, of a short program rather than a procedure, illustrates what is meant.

programquadratic;vara,b,c,x; -- the variables used read_data; -- the three steps solve_equation; output_results;proceduresolve_equation; x := (-b + sqrt(b * b - 4.0 * a * c))/ 2.0 * a;end;procedureoutput_results;end;procedureread_data; [a,b,c] := [3.0,5.0,1.0];endread_data;endquadratic;

Effective programming depends more on the proper use of procedures than on any other single factor. Your use of procedures should aim to achieve various important stylistic goals:

- Procedures are used to "paragraph" programs, i.e., to divide them into
manageably short subsections, each performing some easily definable
logical function, which can be read and understood in relative independence of each other. Here the key term is
*independence*: it is critically important to write your procedures in a manner that isolates each of them as muchh as possible from the internal details of other procedures. Very few data objects should be shared globally between procedures; sharing is dangerously productive of errors, so that all data object sharing should be carefully planned, should adhere to clearly understood stylistic rules, and must be scrupulously documented. Be sparing in your use of global**var**declarations! - Procedures are also used to
*abbreviate*, i.e., to give frequently used compound constructions a name facilitating their repeated use. This will often give rise to short procedures, the shortest of which may reduce to a single**return**statement. Code sequences used more than a very few times should be replaced by short procedures, since such procedures will need to be debugged only once, although repeated code sequences can be repeated incorrectly, and can interact in unanticipated ways with code surrounding them (for example, by accidental overlap of names). These facts make repetition of code sequences dangerous, and their replacement by procedures advantageous.

- Procedures define one's conceptual approach to a programming task and
are used to clarify and help document programs. If this is done well, a
program's topmost procedure will document the main phases of the
program and explain the principal data structures passed between its
phases. Then each intermediate level procedure will both realize and
"flowchart" an important substep of processing. Each bottom-level procedure will realize some well-defined utility operation and will be separately readable.
The narrative commentary that accompanies the program should be organized around the layout of its procedures. Comments concerning overall approach and main shared data objects will accompany top-level procedures, and detailed remarks on particular algorithms will be attached to the low-level sub-procedures that implement these algorithms.

- Procedures are used to decompose programs into separate parts which have different degrees of generality/specificity, or which have significantly different "flavors" in some other regard. The "buckets and well" example considered in Section 5.3.1 exemplifies this point. In this program, procedures new_states_from, pour, fill, etc., concentrate all details particular to the specific problem being solved, while procedure find_path, which simply realizes a general technique for searching over states and constructing paths, is independent of these details. This separation makes it possible to use find_path to solve other problems of the same kind, simply by replacing new_state_from and its subsidiary routines.
- When one is writing a program which addresses a mathematical or application area which makes use of some well-established family of concepts,
it can be very advantageous to define SETL representations for all the
kinds of objects used in this area, and then to write a collection of utility
procedures which can be used to apply all the important operations of the
area to these objects. These procedures should be written in a way which
allows their user to ignore the internal details of the objects representations, making it possible for him to think more as a specialist in the
application area rather than as a programmer. This is the important
principle of "information hiding": structure your programs in a way which allows the representational details of objects manipulated by the highest-level programs to be concealed from the authors of these programs. (So
important is this principle that some programming languages
include syntactic mechanisms for enforcing it rigorously.) A family of
procedures that manipulate objects whose internal representational details
are known only to these procedures is sometimes called a
*package*. The package of polynomial manipulation procedures shown in Section 5.1.3 is an example; other examples appear in the exercises.

It is worth saying a bit more concerning the paragraphing of code. Elusive errors easily creep into programs whose logic is spread over many lines. For this reason, one should always strive to break programs into independent "paragraphs" no more than 10 or so lines in length. (Longer paragraphs can be used where this is unavoidable, but as these grow to a page or more in size, the likelihood of troublesome multiple errors, as well as the difficulty of understanding what is going on when the code is read subsequently, will rise rapidly.) The two main constructs that can help you to paragraph code adequately are

- procedures
- the
**case**statement

Each procedure and refinement whose integrity is not compromised by an
undisciplined use of shared global variables constitutes an independent paragraph of code. Moreover, since only one of its alternatives will be performed
each time a **case** statement is executed, the separate alternatives of a **case** statement can be regarded as independent paragraphs. Hence, whenever the body of a procedure extends over more than a few dozen lines, most of this body should consist of one or more **case** statements each of whose alternatives is short. If this is not done, then the rules of good style are being violated; and this violation should either have compelling justification or be removed.

Nesting of loops and of **if**'s also raises interesting stylistic questions. Since iterations will rarely be nested more than three deep, nested iterations can generally be used without significant confusion resulting. When deeper nests start to build up, or the body of an outermost iteration tends to grow long, an effort should be made to relegate parts of its body to one or more separate procedures.

Deep nesting of **if**s leads very rapidly to confusion. Where at all possible nested **if**s more than two deep should be replaced by uses of **case** statements, or by segregation of the more deeply nested alternatives into procedures. A third alternative is to "flatten" a deeply nested **if** construct into an **if** construct which is less deeply nested, but in which the alternatives of the original **if**-nest have been combined using the Boolean **and**, **or**, etc. (However, this will tend to generate longish sequences of **elseif**s.) For example, instead of writing

ifa > Othenifb < Othena +:= 1;elsea -:= 1;endif;elseifb < Othenb +:= 1;elseb -:=1; endif;endif;

it is preferable to "flatten" and write

ifa > Oandb < Othena +:= 1;elseifa > Oandb >= Othena -:= 1;elseifa <= Oandb < Othenb +:= 1;elseifa <= Oandb >= Othenb -:= 1:

Still better, one can use the following **case** statement:

caseof(a>Oandb<O): a +:=1; (a>Oandb>O): a -:=1; (a<Oandb<O): b +:=1; (a<Oandb>O): b -:=1;endcase;

Note than an extended **if**..**elseif**..**elseif**...construct has some of the same paragraphing advantages as an extended sequence of **case** alternatives. However, **if** alternatives are less fully independent than **case** alternatives, since implicit conditions accumulate from each branch of an **if** statement to the next. Some of the confusion which this will cause can be avoided by using auxiliary comments to indicate the conditions under which each branch of an extended **if** will be executed, but it is even safer to use a **case** statement instead.

SETL supports some of the handy string primitives whose use was pioneered in the SNOBOL programming language. These generally have the form

operation_name(scanned_string, pattern_string). (1)

Each of these operations attempts to match a portion of its scanned_string
parameter in a manner defined by the pattern string. If a portion of the
scanned string is successfully matched, it is removed from the scanned_string
and returned by the function. If not even the first character of ss belowngs to
*ps*, then *ss* is unchanged and the function (1) yields the empty string.

The most often used string primitive is called **span**. The pattern string in this primitive is a sequence of characters. **Span** finds the longest initial segment of the scanned string which consists entirely of characters from the pattern string and breaks it off. If the first character of the scanned string is not in the pattern string, **span** yields the empty string (we also say that it *fails*) and the scanned string is unaffected.

Here are a few illustrations of the action of the **span** primitive; Suppose that
*ss* has the value "If, gentlemen." Then

has the value "If" and gives *ss* the value ", gentlemen". Also,

has the empty string as value and does not change ss.

The remaining string-scanning primitives provided by SETL are as follows:

any(ss,ps) (2)

breaks off and yields the first character of *ss* if this belowngs to *ps*. If the first character of *ss* does not belowngs to *ps*, then *ss* is unchanged and the value returned by **any** is the empty string. For example, the code fragment

ss := "ABC";any(ss, "AEIOU")," ",ss," ",any(ss, "AEIOU")," ",ss);

will yield

Think this through: the first value results because all the argument expressions of the **print** statement must be evaluated before the **print** statement itself is executed. Compare this to

ss := "ABC";any(ss, "AEIOU")," ",ss," ",any(ss, "AEIOU")," ",ss);

The string scanning primitive

break(ss,ps) (3)

scans *ss* from the left up to but not including the first character which does belowng to *ps*. This part of *ss* is broken off and becomes the value of the function (3). If the very first character of *ss* belowngs to *ps*, then (3) has a nullstring value and *ss* is not changed.

The scanning primitive

len(ss, n) (4)

has an integer second parameter. If #*ss* > = *n*, then (4) yields the value *ss*(1..*n*) and the assignment *ss* := *ss*(*n* + 1..) is performed; otherwise (4) yields
ss and *ss* is changed to the null string.

The primitive

match(ss,ps) (5)

yields *ps* if #*ps* <= #*ss* and if *ps* = *ss*(1..#*ps*). In this case the assignment
*ss* := *ss*( #*ps* + 1..) is performed. Otherwise (5) yields the nullstring value and ss is unchanged.

The primitive

notany(ss,ps) (6)

breaks off and yields the first character of *ss* if this does not belowng to the string *ps*. In the contrary case (6) yields the nullstring value and *ss* is unchanged.

Each of the preceding string primitives is also provided a "right-to-left" form which starts from the right, at the last character of the scanned string, and processes from right to left, rather than from left to right, starting at the first character of the scanned_string as in the cases already considered. The following table shows the right-to-left variant of each of the primitives described previously.

Left-to-Right Variant | Right-to-Left Variant |

any(ss,ps) | rany(ss,ps) |

break(ss,ps) | rbreak(ss,ps) |

len(ss,n) | rlen(ss,n) |

match(ss,ps) | rmatch(ss,ps) |

notany(ss,ps) | rnotany(ss,ps) |

span(ss,ps) | rspan(ss,ps) |

Two additional string utilities are provided to make productions of decently formatted string output easier. These are

The **lpad** primitive returns the string obtained by padding its first argument *ss* out to length *n* (which must be an integer) by adding as many blanks to the left of *ss* as necessary. If #*ss* >= *n*, then **lpad**(*ss*,*n*) is simply *ss*. The **rpad** primitive behaves similarly but adds blanks on the right.

One of the first problems that arises when one begins to program a compiler
for a programming language (like SETL, BASIC, or any of the other languages
with which you may be familiar) is to break the source form of the program
into a stream of individual identifiers, constants, and operators (collectively, these items are called *tokens*). The program that the computer will read must be decomposed into these elements before we can determine its meaning. For example, on reading the fragment

of text, one must break it up into the sequence of symbols

Note that the first of these items is an identifier, the second an operator sign the last a constant, etc. (Blanks separating tokens are ordinarily eliminated as the source text is scanned).

A procedure which performs this kind of decomposition of strings representing successive lines of program text is called a *lexical scanner*.
It is easy to write a lexical scanner for a simple language using the string
scanning operations that we have just described. We will now show how to
do this, but to avoid complications, we will suppose that the following rules
apply: .

- The program text to be scanned contains only identifiers, operator signs, integers, floating-point constants, and blanks.
- An
*identifier*is any string starting with an alphabetic and containing only alphabetic and numeric characters. - Any
*special character*(i.e., characters like " + ," "-," "," and ":," which are not blank, alphabetic, or numeric) will be regarded as an operator. - An
*integer*is a sequence of numerics not followed by a period. A*floating-point number*is a string of numerics including at most one period.

From the string being analyzed, the following procedure repeatedly breaks off a section consisting of a run of blanks, a run of digits, an identifier, or a single "special" character of some other kind. Blanks are ignored. If a run of digits is found, we check to see whether a decimal point and a second run of digits follow. If so, they are concatenated to the run of digits originally found. In each case, a nonblank section broken from ss constitutes a token, and it is added to the tuple of tokens which is eventually returned. The code assumes that num and alphanum are constants which must be initialized as follows:

programlexer; -- lexical scanprogramconstnum := "0123456789", alphanum := "abcdefghijklmnopqrstuvwxyz" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "0123456789";time forall good men 35 + 35.35 + 35. . . ;"));procedurelex_scan(stg); -- lexical scan routine where the -- parameter is a string. tup := [ ]; -- Initialize the tuple to be -- returned. stg + := " "; -- Add a terminating blank.whilestg /= ""looptoken :=span(stg," \t"); -- Break off a run of blanks, aiftoken = ""thentoken :=span(stg, num);end if; --ora number,iftoken = ""thentoken :=span(stg, alphanum);end if; --ora variable nameiftoken = ""thentoken :=len(stg,1);end if; --ora single letter.iftoken(1)in" \t"then continue;end if; -- Ignore blanks.iftoken(1)innumthen-- Testforfollowing "."and-- numerics.if match(stg, ".") = "."then-- Lookfordigits following the -- decimal point. token + := "." +span(stg, num);end if;end if; tupwith:= token; -- Add token to tuple being -- built up.end loop;returntup;endlex_scan;endlexer;

The following code generates a *cross-reference listing* or *concordance* of a source text. The source text is assumed to consist of a sequence of strings containing words separated by punctuation marks or blanks. The words present in the source text are printed in alphabetical order, each word being followed by a formatted list of all the lines in which it occurs.

programconcordance; -- concordance generator var capital_of, alphabetics; -- maps small letters to capitals var line_number; -- number of the current line var file_handle; -- handle to file beingprocessed initialize(capital_of,alphabetics); -- All upperandlowercasealphabetics. make_concordance("test_file");proceduremake_concordance(file_name); -- make concordanceforspecified file line_number := 0; -- Initialize line_number count. lines_word_is_in := { }; -- Initialize this to the empty map. file_handle :=open(file_name,"TEXT-IN"); --openfileforreadingwhile(tuple_of_words := break_next_line(line_number)) /=OMloop-- break_next_linereadsa line of textanddecomposes it into the words it -- contains by capitalizing themandeliminating punctuation marks.forwordintuple_of_wordslooplines_word_is_in(word) := lines_word_is_in(word)?[]withline_number;end loop;end loop; -- Now sort, putting all words encountered into alphabetical order. This -- is done by using the quicksortproceduredescribedinSection 5.4.1.for[word, lines]insort(lines_word_is_in)loopnprint(word + 20 * " ")(1..20)); arrange(lines); -- Arrange the line numbers neatly.end loop;endmake_concordance;procedurebreak_next_line(rwline_number); -- Inputandscanning routine. -- Thisprocedure readsa line of inputandscans it tobreakout the words -- which it contains. -- These words are capitalizedandplacedina tuple. line_number +:= 1; -- Advance the line number.geta(file_handle,line); --readline ofraw input,as explainedinsection XXXifline =OMthen returnOM;end if; --returnOMas signal that there are no more lines words := []; -- Start a new tuple of words.whileline /= ""loop-- Until the line has been digested.if break(line,alphabetics) =OMthen-- Dropanyleading nonalphabetic quit; -- charactersandquitifthere are none.end if; -- Some alphabetic characters left. wordswith:= capitalize(span(line,alphabetics));end loop;returnwords;endbreak_next_line;procedurearrange(lines); -- Routine toandarranges them neatlyinfields six characters wide.whilelines /= [ ]loop-- Until all line numbers are processed, group := lines(1..10min#lines); --breakoff a first group of up to ten lines. lines := lines(11min(#lines + 1)..);lpad(str(ln), 6): lningroup]);end loop;endarrange;procedurecapitalize(word); -- Capitalizes its parameterreturn"" +/ [capital_of(let)?let: letinword]; -- Returning capitalized versionendcapitalize;proceduresort(s); -- Quicksortprocedure, second form t:= [y: yins]; -- Get first element of unsorted t_{1}:= t(1); -- tuplereturn if# t < 2thentelsesort([y: y = t(i)|y(1) < t1(1)]) + [t1] + sort([y: y = t(i) | y(1) >= t1(1)andi > 1])end if;endsort;procedureinitialize(rwcapital_map,rwalphabet_string); -- Initialization routine small_lets := "abcdefghijklmnopqrstuvwxyz"; big_lets := "ABCDEFGHIJKLMNOPQRSTUVWXYZ"; alphabet_string := small_lets + big_lets; capital_map := {[small_let, big_lets(i)]: small_let = small_lets(i)};endinitialize;endconcordance;

Our third example is a margin justification procedure which takes a sequence of words separated by blanks and arranges them into lines which fit between left_margin and right margin with the first nonblank character placed in position left_margin and the last nonblank character placed in position right_margin. Extra blanks are inserted at random positions between the words to force "justification" of the right margin. Procedures of this sort are often used in text preparation programs.

procedurejustify(tuple_of_lines,left_margin,right_margin); -- line justification procedure tuple_of_words := [ ] +/ [break_words(line): lineintuple_of_lines]; --all the words in the linesuntilis_lastloop-- iterate over all sections line_words := break_next_line(tuple_of_words,right_margin - left_margin + 1); -- 'break_next_line' breaks offandreturns the tuple of -- words to be placed on the next line.if(is_last := (tuple_of_words = []))then-- Output last line with no justification.inline_words]);else-- Print justified line. spaces := -- Calculate vector of extra spaces. put_spaces(#line_words,right_margin - ((left_margin-1) +/[#word + 1: word in line_words]));end if;end loop;endjustify;procedurebreak_words(line); -- Breaks line at blanks and returns a tuple of words. tup := [ ]; -- Initialize tuple of words to be collectedwhileline /= ""loop-- now collect word :=span(line," \t"); -- remove whitespaceifword = ""thenword :=break(line," \t");end if; -- if no whitespace,try for a wordifword(1) /= " "thentupwith:= word;end if; -- collect nonempty wordsend loop;returntup; --returnthe list of words collectedendbreak_words;procedurebreak_next_line(rwtuple_of_words,nchars); -- This procedure breaks off and returns the longest sequence of words that will fit -- into nchars character positions; this sequence is broken off from tuple_of_words. sum := 0;forword = tuple_of_words(i)loopif(sum +:= #word + 1) > ncharsthen-- Too far, back up one word. save := tuple_of_words(1..i - 1); tuple_of_words := tuple_of_words(i..);returnsave;end if;end loop; -- Else this is last line; blank tuple_of_words and return all words save := tuple_of_words; tuple_of_words := [];returnsave;endbreak_next_line;procedureput_spaces(between_kwords, nblanks); -- This procedure finds the positions where n blanks are to be placed between -- k words. The blanks are placed evenly. space_count := (size := (between_kwords - 1)) * [0];forjin[1..nblanks]loopspace_count((jmodsize) + 1) +:= 1; -- Place a blank.end loop;returnspace_count;endput_spaces;

Atoms can be made members of sets or tuples (e.g., by the **with** operator) and can be tested for set membership (by the **in** and **notin** operators). Moreover, previously generated atoms which have been put into sets or made into components of tuples can reappear when one iterates over a set or tuple in which they have been placed.

To facilitate debugging of programs which use atoms, the *print* (but not the *read* operation) can be applied to atoms. The internal representation of an atom carries a system-generated integer called its *serial number*; when an atom is printed, the representation of it is placed on the output medium as

where *nnn* is the serial number of the atom. Thus, for example, if the very first statement in a program is

the output produced, namely

will represent a set of 11 *distinct* atoms.

Another important use of atoms is to represent objects which have a continuing identity, independent of any varying data attributes, associated with them. Consider, for example, the problem of maintaining a simple data base, which keeps track of a few items of data (e.g., name, address, and telephone number) for each of a varying group of people.

A given person would of course retain his or her identity if he or she changed
address, telephone number, or even name. Since these information items may
change, it is not always appropriate to identify a person with a tuple [name,
address, tel_no] even if this tuple gives all available information. The most
appropriate treatment of such situation may be to represent the person by an
atom *x* and to maintain three maps, called *name*, *address*, and *tel_no*, which
map *x* into the name, address, and telephone number of the person represented
by *x*. Then a name change for person x can be implemented simply by writing:

To give a small example of the use of atoms, we shall suppose that a graph
*G* is given as a set of ordered pairs, each pair [*x*,*y*] representing a directed edge of *G* going from node *x* of the graph to node *y* of the graph. In graph theory, one often wishes to form new graphs from old by introducing new points and edges that serve to simplify some mathematical argument. Suppose, in particular, that for some reason we wish to introduce two new graph nodes *n1* and *n2*, and to connect *n1* to each node of *G* which is the initial point of an edge in *G*, and also to introduce an edge [*x*,*n2*] for each node *x* of *G* which is the second node or "target" of an edge of *G*. This will define a new graph *G*2 within which the original graph *G*, with all its edges and nodes, is embedded as a subgraph.

To represent this construction in SETL, it is reasonable to introduce new
atoms for the points *n1* and *n2*. This leads us to the following short and quite straightforward code fragment:

n1 :=newat(); -- Generate first new point. n2 :=newat(); -- Generate second new point. -- Now introduce new edges to build G2. G2 := G + {[n1,x]: xindomainG} + {[y,n2]: yin rangeG};

In this section we collect a few additional examples which illustrate the use of the facilities discussed in this chapter.

Suppose that we are given a system of *n* linear equations in *n* unknowns *x*1,*x*2, ...,*x*n. We can suppose that these equations have the form

a_{11}*x_{1}+a_{12}*x_{2}+ ... +a_{1n}*x_{n}=b_{1}a_{21}*x_{1}+a_{22}*x_{2}+ ... +a_{2n}*x_{n}=b_{2}(1) . . .a_{n1}*x_{1}+a_{n2}*x_{2}+ ... +a_{nn}*x_{n}=b_{n}

Solution of equations of this kind is one of the most fundamental problems
of numerical analysis and has been intensively studied. Without wishing to
enter very far into the enormous literature that has developed around this
problem, we shall present a simple SETL code for solving such systems of
equations. The technique we will use is a variant of the famous (though
essentially straightforward) technique introduced by Karl Friedrich Gauss
(1777-1855, "The Prince of Mathematicians"). This technique is known as *Gaussian elimination*.

The idea can be summarized as follows: Each equation in the system (1)
involves n coefficients *a*_{j1}, *a*_{j2},...,*a*_{jn} If in any equation all of these coefficients are zero, then the whole left-hand side of the equation is zero, and the whole equation reduces to

If the quantity *b _{j}* occurring on the right-hand side is not zero the original system of equations (1) simply has no solutions. A system of equations (1) which either contains an equation all of whose coefficients

If this is the case, we can take any one of the equations in (1), say the first, and find at least one nonzero coefficient, say *a*_{1j}, on its left-hand side. Then we can form an equivalent but somewhat different system of equations by subtracting *a*_{kj}/*a*_{1j} times the first equation from the *k*-th equation for each *k* = 2,...,n. This subtraction eliminates the coefficient *a*_{kj} from all these other equations; i.e., it makes the coefficient *a*_{kj} of the variable *xj* equal to zero for k = 2,...,*n*. Hence we can regard equations 2,...,*n* as a system of (*n*-1) equations for the (*n*-1) unknowns *x2*,...,*xn*. Then, proceeding recursively, we can solve these equations for *x*2, ...,*xn*. Once this has been done, we can substitute the values of *x2*,...,*xn* into the first equation, thereby reducing it to a single linear equation in a single unknown. This final equation can then be solved for the remaining variable *x _{1}* by a single subtraction followed by a division.

Since in this procedure the subtractions applied to the *b*_{j} on the right sides of the equations (1) exactly parallel those applied to the left sides of the same equations, it is most convenient to transpose the ters and so rewrite equations (1) as

a_{11}*x_{1}+a_{12}*x_{2}+ ... +a_{1n}*x_{n}+a_{1(n + 1)}= 0a_{21}*x_{1}+a_{22}*x_{2}+ ... +a_{2n}*x_{n}+a_{2(n + 1)}= 0 (1) . . .a_{n1}*x_{1}+a_{n2}*x_{2}+ ... +a_{nn}*x_{n}+a_{n(n + 1)}= 0

That is, we can view our system of equations as a homogeneous system defined by an (n + 1) by n matrix, whose rows are successively simplified by subtracting an appropriate multiple of each from those that follow. The codes seen beloww use this representation of the system of equations to be solved.

We can write SETL code representing the Gaussian elimination procedure most clearly if we
write it recursively. To do this, we will need to use both an outer procedure
*Gauss* which sets up initial parameters and an inner "workhorse" procedure *Gauss-solve* which performs the actual arithmetic operations. Since the value of the array of coefficients *M* must be accessed and manipulated by all recursively generated invocations of the *Gauss_solve* routine (see Section 5.4), we adopt the (typical) expedient of making it a global variable. Thus the only parameters that need to be passed to *Gauss_solve* are a set, namely, the set of variables for which a first nonzero coefficient still has to be found, and an integer, namely the number of the next equation to be considered. The *Gauss-solve* routine returns **OM** if it encounters a "singular" equation all of whose coefficients are zero; otherwise, it returns a vector giving the values of the variables for which it has solved.

programtest;consteps := 1.0E-4; -- Define a utility real constantcloseto zero. var glob_M; -- Matrix of equation coefficients. var glob_soln_col; -- Length of matrix rows. -- (Note: these declarations must precede the firstprocedure). soln:= Gauss(M := [[1.0,1.0,5.2345,3.234],[1.0,2.02345,6.66,4.756],[11.0,32.0,6.662345,14.7234556]]); -- test inputin[1..#M]]: rowin[1..#M]]); -- checking the solutionprocedureGauss(M); -- Solves equations by Gaussian elimination. glob_M := M; -- Make original matrix globally available. glob_soln := []; -- Initialize tuple of solution values. glob_soln_col := #glob_M + 1; -- the solution columnreturnGauss_solve({1..#M},1);endGauss;procedureGauss_solve(var_numbers,next_eqn); -- Inner recursionforGaussian elimination. -- Var_numbers is the set of all indices of variables still to be processed; -- next_eqn is the index of the next equation to be examined.ifvar_numbers = {}then return[];end if; -- No variables,returnthe empty solution. row := glob_M(next_eqn); -- Get the row of coefficients.if not(exists vninvar_numbers |abs(row(vn)) > eps)thenreturnOM; -- Sincesystemis singular.end if;forjin[next_eqn + 1..ngm := #glob_M]looprow_j := glob_M(j); subtract := row_j(vn) / row(vn); -- factorforrow to be subtracted.forvnxinvar_numberswithglob_soln_collooprow_j(vnx) -:= subtract * row(vnx);end loop; glob_M(j) := row_j;end loop; -- Now call Gauss_solve recursively to solveforthe remaining variables.if(soln := Gauss_solve(var_numbersless:= vn,next_eqn + 1)) =OMthenreturnOM; -- Since a singularity has been detected.end if; -- Substitute to determine the value of the vn-th variable. soln(vn) := (row(#row) -/ [soln(vnx) * row(vnx): vnxinvar_numbers]) / row(vn);returnsoln;endGauss_solve;endtest;

It is not difficult to rework this procedure to use iterations rather than recursions. The iterative form of the procedure is shown beloww. The relationship between the recursive and the iterative form of this code is typical and is worth close study. Note that the iterative form of the procedure must implicitly save information (such as the order in which variables are processed) which the recursive form of the procedure saves implicitly (namely in the multiple procedure invocations which are created when the recursive procedure is executed). This is the reason that the quantity *var_order*, which has no counterpart in the recursive procedure, appears in the iterative variant shown. Aside from this, note that the *Gauss_solve* routine only invokes itself when it is near the point at which it will return; hence the only items of information which need to be saved for use after return from this invocation are *vn* (the number of the variable currently being processed) and *row*. However, *row* is just M(*vn*); thus only *vn* needs to be saved. This explains why we are able to transform the recursive procedure shown previously into the following more efficient iterative procedure. The initial sequence of recursive calls that would otherwise be required is first represented by a "forward elimination" pass over the rows of M, and in which the subsequent sequence of recursive returns becomes an iterative "back-substitution" pass.

procedureGauss(M); -- Solves equations by Gaussian elimination.consteps := 1.0E-4; -- Define a constantcloseto zero. soln := [ ]; -- Initialize solutions to be built. var_numbers := {1..n := #M}; -- Initially, all variables need to be processed. var_order := [ ]; -- This tuple will record the orderinwhich variables are processed. last_col := n + 1; -- index of the final columnforiin[1..n]loop-- Process rows one after another. row := M(i);if not(exists vninvar_numbers |abs(row(vn)) >= eps)thenreturnOM; -- Sincesystemis singular.end if;forjin[i + 1..n]looprow_j := M(j); subtract := row_j(vn) / row(vn); -- Amount to be subtracted.forvnxin(var_numberswithlast_col)looprow_j(vnx) -:= subtract * row(vnx);end loop; M(j) := row_j;end loop; var_orderwith:= vn; -- Note variable just processed var_numbersless:= vn; --andexclude itfromfurther processing.end loop; -- Next we work through the variablesinthe reverse orderfromthatinwhich they were initially processed -- Note that at this point the set var_numbers has become empty.foriin[n,n - 1..1]looprow := M(i); vn := var_order(i); soln(vn) := (row(n + 1) -/ [soln(vnx) * row(vnx): vnxinvar_numbers]) / row(vn); var_numberswith:= vn;end loop;returnsoln; -- Return the formal solution.endGauss;

Our next example will serve to illustrate some of the internal workings of an interactive text editor (though actually the program to be given will support only a few of the features which a full-scale editor would provide, and even these are highly simplified). This editor has the following capabilities:

- A vector of strings representing a text file to be edited can be passed to it.
- The editor prompts its user for a command by printing "?" and waits for him to respond.
- The allowed responses are as follows:
- A response of the form "/ABCD..E/abc..e" makes ABCD..E a member of a collection of
*search strings*that the editor maintains and indicates that some of the occurrences of*ABCD..E*in the text file are to be replaced by*abc..e*. Note that here*ABCD..E*and abc..e are intended to represent arbitrary strings which need not be of the same length;*abc..e*can even be null. Moreover, the delimiting character which we have written "/", can be any character which does not appear in*ABCD..E*. - A response of the form "/ABCD..E" with just one occurrence of the initial delimiting character indicates that ABCD..E is no longer to be searched for.
- A response of the form "//" indicates that searching is to start again from the beginning of the text file. A response of the form "done" indicates that editing is complete and triggers a return from the edit procedure.
- A nullstring response searches forward in the text file for the next following occurrence of any search string ABCD..E. If any such occurrence is found, it is displayed on the user's terminal, with a line of underscore characters placed immediately above it to mark its position. After this, another null response will trigger a search, but the response "/" will replace the string ABCD..E that has just been found by the corresponding string abc..e.

- A response of the form "/ABCD..E/abc..e" makes ABCD..E a member of a collection of

procedureedit(rw text); -- Text editor routine. line_no := line_pos :=1; -- Start at the first character of -- the first line of the text file. replacement := search_strings := { }; -- Initially no search strings -- have been defined. last pos :=OM; -- last pos will be the last -- character position in a zone -- located by searching; See the -- search procedure beloww. -- Initially, this is undefined. first chars := " "; -- first_chars is a string -- consisting of the first -- characters of all search strings.loop doif(r := response( )) = "STOP"thenreturn;elseifr = " "then-- Search forward from current position search(line_no, line pos, last_pos, search_strings, first_chars, text); -- See the search procedure given beloww for an account of its parameters.iflast pos =OMthenelseoverbar(line pos, last pos, text~line_no));endif;elseif# r = 1then-- Try to make replacement.iflast_pos =OMthen-- Successful search did not precede -- replacement.else-- Perform replacement text(line_no)(line_pos.... last_pos) := replacement(text(line_no) (line_pos..last_pos));OM; -- invalidate the search positionendif;else-- The user's response was at -- least two characters long. c := r(1); -- Get first character of this -- response.ifnot exists iin[2..#r] | c = r(i)then-- Drop search string. replacement(strg := r(2..)) :=OM; search_stringsless:= strg; -- Recalculate the "first-chars" -- string. first chars := " +/{x(1): xinsearch_strings};elseif# r = 2then-- "//"; hence restart search at top. line_no := line_pos := 1; last_pos :=OM; -- Invalidate search position.else-- A new replacement is being -- defined. replacement(strg := r(2..i-1)) := r(i + 1..); search_strings with:= strg; -- Recalculate the set of initial -- characters. first_chars := +/{x(1): xinsearch_strings}; last_pos := om; -- Invalidate any prior search.endifnot;endif;end loop;endedit;

proceduresearch(rw line_no, rw line pos, rw last_pos, search_strings, first_chars, text); -- This procedure searches forward, starting at a given text line and given -- character position, for the first position P at which any member of the -- setsearch_stringsof strings occurs. If such a position is found, then --line_nois set appropriately, line_pos is set to P andlast_posis set to the -- index of the last character matched. If no such position is found, then -- last_pos becomesOMwhileline_noandline_posremain the same. [old_line_no, old_line_pos] := [line_no, line_pos]; -- save to restore -- Iflast_posis notOM, indicating that a successful search has just taken -- place, then the search starts one character afterline_pos; this prevents -- repetitive searching.iflast_pos /=OMthenline_pos + := 1;end; search_string := text(line_no)(line_pos..); (whileline_no <= # text) (whilesearch_string /= ") -- While a portion of the current line remains to be examined.if(lead := break(search_string, first chars)) =OMthen-- No significant character in this line, so go to next line.quit;else-- See if one of the strings we -- are looking for is found here. line_pos + := #lead -- advance the line positionif existsstg in search_strings |match(search_string, stg) /=OMthen last_pos := line pos + # stg-1; -- end of matched zone return;else-- no match; advance by one -- character position line_pos + :=1; search_string := search_string(2..);end if;end if;endwhilesearch_string /= "; line_no + :=1; -- advance line number line_pos := 1; -- re-initialise line_pos search_string := text(line no);endwhileline_no; last_pos :=OM; -- note that search was -- unsuccessful [line_no, line pos] := [old_line no, old_line pos];endsearch;procedureoverbar(lpos, lastpos, line); -- displays string found print((lpos-1)*" " + (lastpos - Ipos + 1)*"-"); print(line);endoverbar;procedureresponse; -- reads user's responseget("SYS--INPUT", In);returnIn;endresponse;

SETL regards procedures as 'first-class' values, i.e. lets them be used in muchh the same way as any other kind of value. In particular, procedure values can be assigned to variables, passed as parameters, made elements of sets and components of tuples, etc. As an example of this, consider the simple function

This clearly expects a function-like value to be passed as an argument. And in fact we can pass the cosine function as a function value by writing make_table(cos); or the square root by writing make_table(sqrt);proceduremake_table(fcn); -- prints table of values of 'fcn'fori in [0..100]loopprint(fcn(float(i)/100.0)); -- print values at spacing of 0.01end loop;endmake_table;

This is a first tiny example of a family a very powerful programming techniques which culminate in the 'object oriented' programming ideas explored at length in Chapter 8.

- Argument binding: given an n-argument function P, create an n - k argument function Q by giving constant (possibly function) values to k selected arguments of P.
- Argument identification: n-argument function P(x,y,z,u,...),form functions like P(x,x,u,y,...).
- Composition: given two single-argument function P and Q, form their
*composition*, namely the function which maps each x into P(Q(x)). This operation plainly has multi-variable generalizations. For example, given three 2-argument functions f(x,y), g(x,y), h(x,y), create the 4-parameter function f(g(x,y),h(u,v)). - Vector, set,and map extension: given a single-argument function P(x), create the function Q which maps a vector t into [P(x): x in t], or a set s into {P(x): x in s}, or a map f into {[x,P(y)]: y = f(s)}.
- Filtering: given a single-argument Boolean-valued function P(x), create the function Q which maps a vector t into [x in t | P(x)], or a set s into {x in t | P(x)}.
- Iteration a specified number of times, or to convergence: given a single-argument function f(x), create the function rept(f,k)(x) which is the k-fold repetition f(f(f(...f(x)...))) of f. If these repetitions eventually stabilize for every x, we can also create the 'ultimate' value function f_infinity(x) to which they converge.

SETL's 'closure' construct makes all of these operations available. Argument binding gives a simple example. Consider, e.g., the small program

This is built on the very simple addition functionprogramtest; -- 'closure' example 1 add_999 := bind_y(sum,999); - creates a 1-variable function which adds 999 to its argumentin[1..10]]);proceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y);returnhas_one_param;procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;endbind_y;endtest;

The statement

procedurebind_y(fcn,y);returnhas_one_param;procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;endbind_y;

The essential points to notice are that

- The value returned by 'bind_y' is a procedure value, namely the value of bind_y's internal procedure 'has_one_param'.
- 'has_one_param' accesses a variable (the bind-y argument y) that is accessible inside 'bind_y' but not outside it.

This is the programming pattern that leads to closure formation:

- We call a procedure P which
*returns a procedure value*, namely the value of some internal procedure Q of P. - Q accesses one or more variables that are accessible inside 'bind_y' but not outside it. it is exactly these variables that become 'bound'into the procedure value formed and returned by P. They must be frozen on return from P because they are no longer accessible outside it. Global variables whose scopes extend outside P do not become bound when we return from P (but may become bound subsequently, if P is a more deeply sub-nested procedure from which an extended series of returns is subsequently made.)

The semantic rules which apply to closures, especially in the more complex case in which several nested procedures are put into a tuple or other composite SETL object and then simultaneously returned from the procedure which forms them, can be explained as follows. Procedures P which access no values except their arguments and variables local to P can obviously be used as values, since they define fully self-standing sequences of operations not dependent on anything external except the parameters transmitted to them. This idea extends readily to procedures P which also access some variables globally, provided that these extra variables are global to the entire program in which P exists, since such P never access any external quantities not available in their execution environment. This observation can be used to get a closure-like effect adequate for the small program shown above, even if the full SETL closure mechanism did not exist. We would merely have to introduce one new program-global variable, which we shall call 'the_fcn', and rewrite our example as

The SETL closure mechanism makes the transformation seen in the this simple example, but does so systematically and automatically, whenever procedure values P originally defined internally to a procedure Q are returned from Q. Upon any such return, the system generates new invisible 'shadow globals' like the auxiliary variable 'the_fcn' seen in the example above, performs initializing assignments to these variables like ourprogramtest; -- 'closures' simulatedvarthe_fcn; -- our auxiliary program-global variable add_999 := bind_y(sum,999); - creates a 1-variable function which adds 999 to its argumentin[1..10]]);proceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y); the_fcn := fcn; -- assign the function-valued parameter of bind_y to the program-global variable 'the_fcn'returnhas_one_param;procedurehas_one_param(x);returnthe_fcn(x,y);endhas_one_param; -- now the one external quantity accessed by 'has_one_param' is program-global, hence unproblematicalendbind_y;endtest;

and replaces the original functions P being returned by new functions P' which refer to the 'shadow globals' instead of the externally accessed variables originally appearing. These modified versions of the procedures P are the actual 'closures' which are then returned.

The SETL closure mechanism generates new shadow variables whenever return is made from a procedure Q which forms internal procedure values P which access quantities global to Q but not accessible outside Q. This is seen in our next example, which produces the output

[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [999, 1998, 2997, 3996, 4995, 5994, 6993, 7992, 8991, 9990]The code is

If the two successive callsprogramtest; -- 'closure' example 2 add_999 := bind_y(sum,999); -- creates a 1-variable function which adds 999 to its argument mult_999 := bind_y(prod,999); -- creates a 1-variable function which multiplies its argument by 999in[1..10]]); -- call the first closurein[1..10]]); -- call the second closureproceduresum(x,y);returnx + y;endsum;procedureprod(x,y);returnx * y;endprod;procedurebind_y(fcn,y);returnhas_one_param;procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;endbind_y;endtest;

and

did not generate distinct shadow globals, this would instead generate the output

[999, 1998, 2997, 3996, 4995, 5994, 6993, 7992, 8991, 9990] [999, 1998, 2997, 3996, 4995, 5994, 6993, 7992, 8991, 9990],since it would behave like

programtest; -- 'closure' example 2, incorrect variant var the_fcn; -- just one auxiliaryprogram-global variable add_999 := bind_y(sum,999); -- creates a 1-variable function which adds 999 to its argument mult_999 := bind_y(prod,999); -- creates a 1-variable function which multiplies its argument by 999in[1..10]]); -- call the first closurein[1..10]]); -- call the second closureproceduresum(x,y);returnx + y;endsum;procedureprod(x,y);returnx * y;endprod;procedurebind_y(fcn,y); the_fcn := fcn; -- assign the function-valued parameter of bind_y to the oneprogramglobal variable 'the_fcn'returnhas_one_param;procedurehas_one_param(x);returnthe_fcn(x,y);endhas_one_param;endbind_y;endtest;

In this one-global' version, but not in the real closure version which precedes it, the second assignment

over-writes the effect of the prior operation-forming assignment

More of the force of the semantic rules for closures become visible if we consider a procedure which forms and returns two internal procedures simultaneously, as in the following example.

In this example, we create two 1-variable functions. The first adds a value w, initially 999, to its argument, while the second allows the value w to be changed. The output produced by the sequence of calls shown isprogramtest; -- 'closure' example 3: simultaneous closure of two functions [add_it,set_it] := bind_y(sum,999); -- creates two 1-variable functions. One adds a value to its argument, -- the other allows this value to be modifiedin[1..10]]); -- call the addition function set_it(444); -- change the amount to be addedin[1..10]]); -- call the addition function againproceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y);return[has_one_param,changes_bound];procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;procedurechanges_bound(z);returny := z;endchanges_bound;endbind_y;endtest;

[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [445, 446, 447, 448, 449, 450, 451, 452, 453, 454],clearly showing that the two functions created communicate through a shared variable corresponding to the 'y' they both access. The effect is just as if we had written

However, the rule that each closure creation, whether of a single or of multiple simultaneously returned functions, creates new shadow variables whenever return is made from the closure-building procedure, remains valid. This is shown by our next example:programtest; -- 'closure' example 3: simulation var global_y; -- auxiliaryprogram-global variable [add_it,set_it] := bind_y(sum,999); -- creates two 1-variable functions. One adds a value to its argument, -- the other allows this value to be modifiedin[1..10]]); -- call the addition function set_it(444); -- change the amount to be addedin[1..10]]); -- call the addition function againproceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y); global_y := y; -- initialize the auxiliary globalreturn[has_one_param,changes_bound];procedurehas_one_param(x);returnfcn(x,global_y);endhas_one_param;procedurechanges_bound(z); global_y := z;endchanges_bound;endbind_y;endtest;

The output produced isprogramtest; -- 'closure' example 4; independence of successively formed pairs of closures [add_it,set_it] := bind_y(sum,999); -- creates two 1-variable functions, like those in preceding example [add_it2,set_it2] := bind_y(sum,999); -- creates a fully independent pair of 1-variable functionsin[1..10]]); -- call the first closure, first pairin[1..10]]); -- call the first closure, second pair set_it(444); set_it2(555); -- reset the value added, independently for the pairsin[1..10]]); -- call the first closure, first pair againin[1..10]]); -- call the first closure, second pair againproceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y);return[has_one_param,changes_bound];procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;procedurechanges_bound(z);returny := z;endchanges_bound;endbind_y;endtest;

[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [445, 446, 447, 448, 449, 450, 451, 452, 453, 454] [556, 557, 558, 559, 560, 561, 562, 563, 564, 565]This shows clearly that two entirely independent pairs of closures have been formed, since 'set_it' clearly controls the value added by 'add_it', while 'set_it2' controls the value added by 'add_it2'.

Our next example shows that closures retain their properties and behavior even if returned as part of a SETL composite more complex than a simple pair. In the example, this is a map, which sends the string "sum_op" into the 'add_it' closure of our earlier example, and the string "sum_op" into the former 'set_it' closure.

Again the output produced isprogramtest; -- 'closure' example 5: returns closures as elements of a SETL map ops_map := bind_y(sum,999); -- creates two 1-variable functions, like those in preceding example ops_map2 := bind_y(sum,999); -- creates a fully independent pair of 1-variable functionsin[1..10]]); -- retrieve the first closure from its map and call itin[1..10]]); -- retrieve the second closure from its map and call it ops_map("control_op")(444); ops_map2("control_op")(555);in[1..10]]); -- call the first closure againin[1..10]]); -- call the first closure againproceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y);return{["sum_op",has_one_param],["control_op",changes_bound]};procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;procedurechanges_bound(z); y := z;endchanges_bound;endbind_y;endtest;

[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [445, 446, 447, 448, 449, 450, 451, 452, 453, 454] [556, 557, 558, 559, 560, 561, 562, 563, 564, 565]showing that two entirely independent pairs of closures have been formed.

The next few examples illustrate the various function contructions listed at the start of this section. Each merely returns a function built on a simple expression which realizes the desired composition. First we show how easy it is to create a 'function composition' operator:

The closure seen in our second example extends any operation on single values to the corresponding componentwise operation on tuples, or element-wise operation on sets.programtest; -- illustration of function composition cos_of_sin := compose(cos,sin); -- creates composition of two functionsfloat(x)): xin[1..10]]);procedurecompose(fcn1,fcn2);returnresult_fcn;procedureresult_fcn(x);returnfcn1(fcn2(x));endresult_fcn;endbind_y;endtest;

The next example shows the formation of a closure which turns a specified Boolean-valued function into a filtering operation applicable to tuples.programtest; -- illustrates extension of operation to tuple float_to_tup := tup_op(float); -- appliesfloatoperation to tupleproceduretup_op(fcn);returnresult_fcn;procedureresult_fcn(tup_or_set);return[fcn(x): xintup_or_set];endresult_fcn;endtup_op;endtest;

The final example in this series shows how to form iterated versions of single-argument functions, i.e. constructions like f(f(f(...f(x)...))).programtest; -- 'filtering' illustration keep_evens_only := filter(even); -- creates filter from boolean-valued functionprocedurefilter(fcn);returnresult_fcn;procedureresult_fcn(tup_or_set);return[xintup_or_set | fcn(x) =true];endresult_fcn;endfilter;endtest;

Once formed, closures behave and can be used like any other procedure values,and in particular can be passed as parameters to further closure-forming operations. This is shown by our next example, which passes the one-parameter operation formed by binding the second parameter of 'sum' to the tuple-extension operation seen in an earlier example.programtest; -- iterated version of a single-argument function cos_repeated_20_times := iterate(cos,20); -- 20-fold iteration of the cosine functionprocedureiterate(fcn,k);returnresult_fcn;procedureresult_fcn(x);forjin[1..k]loopx := fcn(x);end loop;returnx;endresult_fcn;enditerate;endtest;

As the reader should anticipate, the output produced isprogramtest; -- illustrates extension of operation to tuple [add_it,set_it] := bind_y(sum,999); -- creates two 1-variable functions. One adds a value to its argument, -- the other allows this value to be modified add_to_tup := tup_op(add_it); -- extends addition operation to tupleproceduretup_op(fcn);returnresult_fcn;procedureresult_fcn(tup_or_set);return[fcn(x): xintup_or_set];endresult_fcn;endtup_op;proceduresum(x,y);returnx + y;endsum;procedurebind_y(fcn,y);return[has_one_param,changes_bound];procedurehas_one_param(x);returnfcn(x,y);endhas_one_param;procedurechanges_bound(z);returny := z;endchanges_bound;endbind_y;endtest;

[1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009] [445, 446, 447, 448, 449, 450, 451, 452, 453, 454]If the manner in which this output is produced is not clear to you, please review the whole of the present section closely.

As the preceding examples indicate,the SETL closure operations can be used to form arbitrarily large groups G of procedures wrapped around shared values common to them all. Each such group of functions can be regarded as an 'abstract object', whose state is defined by the internal values common to the closures in G, and which is manipulable by the functions of G and only by these functions. This exactly matches the semantic recipe fundamental to the kind of 'object oriented' programming explored in Chapter 8. Thus systematic use of communicating groups of closures affords an alternative to 'object oriented' programming (and vice-versa). However, the 'object oriented' programming explained in Chapter 8 provide much richer systematic mechanisms than those available through the use of closures alone.

**abend_trap**() has more interesting implications for debugging. This is a parameterless procedure called automatically whenever the standard SETL run-time error sequence begins. If a value (which must be a parameterless procedure) has been supplied for the special variable '**abend_trap**', it will be executed when and if its enveloping SETL program has generated an error, replacing the automatic abort which would otherwise take place. Aside from the fact that it is impossible to return from this 'customized abend' routine to continue execution, it behaves as a normal (but forced) procedure call, within which further abend_traps are possible and behave in the same way. This makes it possible to program a limited form of 'psuedo return': for an example of this, see the 'debug_watch' discussion beloww. (Since each such trap wil current print an abort message a few lines long, abend_trap routines should be written carefully to avoid an endless flood of output).
Often, however, the abend routine used will simply print some final diagnostic information and then stop. An example of such rudimentary use is

programtest; -- illustration of the use of 'abend_trap'varstep;abend_trap:=lambda();stop; end lambda; step := 1; x := 0; step := 2; x +:= "some string";endtest;

The 'debug_watch' code seen beloww can be inserted at the start of a program being debugged, just after any initial declarations. This provides a few global variable declarations, along with one 'main program' line and a few short auxiliary routines. The original main program of the code being debugged must then be wrapped as a procedure called 'run', with the header and trailer lines seen(the sample body seen in the example beloww should be removed). Trace-dode insertions like those seen should be manually adapted in the obvious way and inserted at each of the code points to be watched during debugging.

The sample program given here can be compiled and executed to see the detailed post-mortem dump produced.

After an abort message of standard form this program produces the outputprogramtest; -- more elaborate illustration ofabend_trapusageusestring_utility_pak;vardebug_count := 0,debug_watch_start := OM, debug_watch,debug_captions; -- globals for debug; move to start of programabend_trap:= rerun; run;procedurererun; -- program re_execution after crash debug_count := 0; debug_watch_start := debug_count - 5;abend_trap:= watch_trap; run;endrerun;procedurewatch_trap; -- debug info dump procedureif is_integer(debug_watch)thenprint("Failed on cycle: ",debug_watch);elsedebug_captions := breakup(debug_captions,","); print("Failed at count: ",debug_count," at location ",debug_captions(1)); for x = debug_watch(j) loop print(debug_captions(j + 1)?"Value",": ",x); end loop;end if;endwatch_trap;procedurerun; -- wrapper for main body of program being debugged -- debug_captions := "place,a,b,c"; -- list of variable names; insert this and following at program points being watched -- if (debug_count +:= 1) > (debug_watch_start?debug_count) then -- debug_watch := [a,b,c]; -- else -- debug_watch := debug_count; -- end if; -- Example:forj in [1..1000]loopdebug_captions := "loc1,j,n"; -- this point is being watchedif(debug_count +:= 1) > (debug_watch_start?debug_count)thendebug_watch := [j,n];elsedebug_watch := debug_count;end if; n := 10000/(j - 777); -- this point is being watched debug_captions := "loc2,j,m,n"; -- this point is being watchedif(debug_count +:= 1) > (debug_watch_start?debug_count)thendebug_watch := [j,m,n];elsedebug_watch := debug_count;end if; m := 10000/(j - 773 - n); -- this point is also being watchedend loop;endrun;endtest;

Failed at count: 1350 at location loc2 j: 675 m: -5000 n: -98which pinpoints the point of and reason for failure quite precisely.

EXERCISES 1. Write the values which x, y, and z will have after each of the following sequences is executed.

(a) | x := "abc"; | y := span(x,"ABC"); |

(b) | x := "abc"; | y := any(x,"ABC"); |

(c) | x := "abc"; | y := span(x,"ab"); z := rany(y,"ab"); |

(d) | x := "abc"; | y := break(x,"ABC"); |

(e) | x := "abc"; | y := break(x,"abc"); |

(f) | x := "abc"; | y := rbreak(x,"ABCabc"); |

(g) | x := "abc"; | y := len(x, 4); |

(h) | x := "abc"; | y := notany(x,"ABC"); |

(i) | x := "abc"; | y := rnotany(x,"ABC"); |

2. Write a program which will read a string s and will delete all sequences of blank spaces immediately preceding a punctuation mark, and then insert a blank space immediately after each punctuation mark that is not followed by either a blank or a numeric character.

3. Write a program which prints a set s of words in an alphabetized, neatly formatted arrangement; the words printed should be lined up in rows and columns. As many columns as possible should be used, but at least two blank spaces must separate any two words printed on the same line.

4. Modify the lexical scanner procedure of Section 5.8.1.1 so that it returns a pair [toks_and types, val_map], where toks_and_types is a tuple of pairs [tok, tok_typ], each tok being a token appearing in the source text scanned, and top_typ is the type (i.e., "integer", "floating point", "identifier", or "special") of tok. The quantity val_map should be a map sending the string form of each integer and floating-point number appearing in the sequence of tokens to its value.

5. As written, the lexical scanner procedure of Section 5.8.1.1 always treats the underbar character as a special character and does not allow floating-point numbers like ".3" which begin with a period. Modify this procedure so that it allows underbars within identifiers (but not as the first character of identifiers) and allows floating-point numbers to start with the "." character.

6. Modify the concordance program shown in Section 5.8.1.2 so that

- all words less than three characters long are omitted from the concordance;
- the program begins by reading a list of "insignificant" words, which occur on a sequence of lines terminated by a line containing the string "*****". It then omits these words from the concordance. (Multiple insignificant words can also occur, separated by blanks, on a single line of the data file listing all the insignificant words).

7. Modify the concordance program shown in Section 5.8.1.2 so that it begins by reading a blank-separated list of words and reports only on the occurrences of words belownging to this list.

8. Modify the concordance program shown in Section 5.8.1.2 so that it reports only on "infrequent" words, i.e., words that occur no more than twice. Words belownging to a specified set s of words should be ignored even if they are infrequent. Programs of this kind can be used to locate "suspicious" identifiers in other programs, i.e., identifiers which may have been misspelled or simply forgotten during program composition.

9. The simplified text editor shown in Section 5.10.2 does not protect its user against any of the errors that are likely to occur during a lengthy edit session. Add code which will alleviate this deficiency by implementing the following additional features: (a) Demand that "//", rather than any arbitrary string of two identical characters, be used to restart editing from the first line of the file F being edited, and that "/", rather than any arbitrary one-character string, be used to trigger a replacement. (b) Allow an additional command "x", which should produce a formatted display of all search strings, with their replacement strings. (c) Allow an additional command "f", which should undo the last correction made. Your system should allow up to five successive changes to be undone using the "f" command. (d) Allow the command "-" to trigger a search backward through the file, i.e., a search from the current character position through earlier positions and lines.

10. Browse through the user's manual of some text editor of medium complexity to become familiar with the various features it provides. Select an interesting one of these features, and modify the text editor code shown in Section 5.8.1.3 so that it implements the feature which you have selected.

11. Modify the character-string search procedure shown in Section 5.10.2 so that it can locate strings which run over from one line to the next. How should the editor program of Section 5.10.2 be modified to allow easy editing of strings of this kind?

12. The function **sin**(x) is the sum of the infinite power series whose n-th term is

(a) Let S_{5}(x) and S_{1O}(x) denote the first 5 and first 10 terms of this series respec-
tively. Calculate and print the difference S_{5}(x) - **sin**(x) and S_{1O}(x) - **sin**(x) for
each value of x from 0.0 to 3.14159 by steps of 0.1. What maximum deviation
between S_{5}(x) and **sin**(x) do you find? Can you find a constant b such that
addition of b to S_{5}(x) reduces this maximum deviation?

(b) Repeat part (a) for **cos**(x). This is the sum of the infinite series whose n-th term is

13. Certain types of forests are subject to infestation by budworms. The following rules can be used to model the results of such an infestation. We suppose for simplicity that the forest consists of an n by m rectangular array of trees. In a given year, any tree will be either healthy, infested, or leafless, having been infested the year before. A tree infested one year will be leafless the next year; a tree leafless one year will be healthy the next year. A tree healthy one year will be healthy the next year unless its neighbor to the north, south, east, or west is infested, in which case it will also become infested the next year.

Write a program which will simulate the progress of a budworm infestation obeying these rules. Track the progress of an infestation which starts with just one infested tree, and the progress of an infestation that starts with a row of three infested trees. Your program should print out a diagram of the forest in each of a sequence of years, together with a count of the number of infested, leafless, and healthy trees.

14. Write a procedure which can be used to print a coarse "graph" for any floating- pomt-valued function f of a floating-point variable x. This should be written as a procedure with floating-point parameters lo, hi (the lower and upper limit of the values of x for which f(x) will be graphed), lo_range, hi range (the lower and upper limits of the range of f that will be graphed), and an integer parameter n ~the number of lines on the printed output listing that the graph should occupy). Your procedure should call a subprocedure, "f_to graph" to obtain the values of the function to be graphed. Vertical and horizontal axes should be printed, with the vertical axis at the extreme left of the output listing. These axes should carry suitable markings to indicate the scale. The x axis should run horizontally. How would you change this procedure if the x axis is to run vertically down the length of the output listing?

15. Write a procedure which can be used to print a graph showing the values of several functions f(x). The main input to this procedure should be a sequence of tuples t of floating-point numbers all having the same length. Each of these tuples represents a sequence of values of one function f(x). Two floating-point numbers, lo and hi, defining the minimum and maximum values of the domain over which the dependent variable x has been evaluated to produce the tuple t, are also given. In addition, there are two more inputs: a character string whose j-th character will be used to print points belownging to the graph of the j-th function, and an integer n indicating the number of lines of the output listing which the graph is to occupy.

Your procedure should be written to accept an arbitrary number of tuples t. The scale of the graph should be adjusted to reflect the largest and the smallest values appearing in any of the tuples t. Axes should be printed with scales marked on both the x and y axis. If the tuples t are too long to be displayed with the x axis running horizontally, the graph should be turned 90 degrees so that the x axis runs vertically down the listing.

16. Write a procedure P which can be used to generate a variety of commercial reports
in graphical form. The inputs to P should be two tuples, tl and t2, of sales or
production figures; t_{1} representing the "current year" and t_{2} the "prior year." The
third parameter of P should be a two-character string defining the bar chart
desired, encoded in the following way:

"m"-monthly figures desired "c"-cumulative monthly figures desired "d"-difference between current and previous year desired "p"-percentage difference between current and previous year desired

The "d" chart should be organized as a series of adjacent pairs of bars showing figures for the current year and the previous year. Axes should be printed with the vertical axis using an appropriate scale and the horizontal axis carrying the names of the months. The "p" chart requires only a single bar for each month. What other useful features can you design and implement for a program of this kind?

17. Write a procedure which prints "bar charts" or "histograms." The inputs of this procedure should be a tuple t of floating-point numbers and an integer n indicating the number of lines on your listing that the chart is to occupy. A set of bars representing the components of t in graphic form should be printed. The scale of the bars should be adjusted to reflect the largest component and the smallest component of t, and the thickness of the bars should be adjusted to the length of t and the number of columns available on the output listing. Axes should be printed, the vertical axis being scaled. If t is too long for the required number of bars to fit horizontally, the chart should be turned 90 degrees so that the bars of the chart are horizontal.

18. Generalizing the procedure of Ex. 16, write a procedure which prints bar charts with bars which are divided into different "zones" representing different sets of quantities. The main input to this procedure should be a sequence of tuples t of floating-point numbers all having the same length. (But think of a good way to handle the case in which not all tuples have the same length!) The auxiliary inputs to the routine are a character string whose j-th character will be used to print the j-th zone of each bar and an integer n indicating the number of lines that the chart is to occupy on your listing. The procedure should be written to allow an arbitrary number of tuples t as parameters. If the tuples t are too long for the required number of bars to fit horizontally, the chart should be turned 90 degrees so that the bars are horizontal.

19. Write a procedure "Function_to_Graph" which can be used to print a graph of the "level curves" or "contours" for a floating-point-valued function of two variables x and y, where

and

The procedure should read in the number of contours desired. The printout should identify each contour by marking its outline according to its order from maximum to minimum.

20. Write a translation program which translates French to English word by word. (Warning: such a program will produce extremely mediocre translations.) The program should read a file of lines containing successive blank-separated pairs of French words and their English translations, and then read a French passage to be translated and print out its English translation.

21. Modify the word-by-word translation program described in Ex. 20 so that it becomes interactive, and so that it is prepared for the fact that certain French words might have several possible translations into English. When such words are encountered during translation, a numbered menu of all of them should be displayed, and the user should then have the ability to continue by selecting one of these possible translations.

22. *Pert charts* are used by project administrators to track progress and
monitor critical activities in large projects. To set up such a chart, one
first reads in a set s of pairs [activity1, activity2] defining the collection
of all activities that must finish before any given activity2 can start. One
also reads a map T sending each activity to its expected duration. Then one
calculates the earliest time that each activity A can finish, and for each such
A, the set of all activities whose completion is critical to completing A by
this time. Then one can print a list of all activities in order of their
completion times. Finally, working back from the last activity, which marks the
completion of the whole project, one can calculate the set of all critical
activities, that is, all activities which must be completed on time if
completion of the whole project is not to be delayed. One can also calculate
and print the degree of "slack" available for each activity, i.e.,
the amount that its completion could be delayed without slowing completion of
the whole project. Develop a program that calculates this information and
prints it out in a set of attractively formatted tables.

23. (Continuation of Ex. 22) Once started, large projects often begin to "slip" because some of their critical activities are not completed on time. Modify the pert program of Ex. 22 to allow it to read a list of activities which have already been started, together with their expected completion times, and to produce a new list of critical activities, and a revised table of "slack" for all (started and unstarted) activities. Can you design and implement any additional features which would make this pert program a more useful planning tool, especially if it is to be used interactively?

24. A meteorological station measures the temperature every hour, producing records arranged as a sequence of tuples t, each t having length 24 and representing a day's temperature measurements (the first being taken at midnight). Write a program which will read these data and print out a record of the highs, lows, and mean temperature for the entire day, and also the highs, lows, and mean temperature for the "daylight" hours (7 A.M. through 6 P.M).

25. The Bureau of Crime Statistics receives annual reports from all cities and incorporated towns, showing the number of major felonies recorded for the year. It then calculates the total number of cities and towns reporting felonies in the ranges < 100,101-500, 501-1000,1001-2000, and more than 2000. Assume that the file of data being read is a set of lines, each of which contains the name of a town and the number of reported felonies, separated by a blank. Write a program for preparing and printing this report.

26. When commands need to be entered interactively at a terminal, it is convenient to allow the shortest unambiguous prefix of any command to serve as an abbreviation for the command. Write a procedure which makes this possible. (Hint: alphabetize the set of allowed commands and locate prefixes by a fast search in this alphabetized list.)

27. Large sets of alphabetic strings which need to be stored can be represented in compressed form by arranging them in alphabetical order. Then all the strings beginning with a particular character, say "a"., can be preceded by the string "la", and the initial letter "a" dropped from all of them. Similarly, if the group of strings beginning with "a" contains more than two successive strings whose second character is "b", then the whole group of such strings can be prefixed by the string "2b", and the initial letters "ab" dropped from all of them. This transformation can be applied to as many initial characters as are appropriate.

Write a procedure which takes a set s of strings, alphabetizes it, and compresses it by using this technique. Write another procedure which takes a set s of strings represented in this form and prints s in its original alphabetized form.

is so very convenient, SETL is at pains to generalize it in various ways. One method provided is the overloading operations like '+" and '*' when they are applied either to SETL objects (like sets, strings. or tuples), or to user-defined object classes of the kind described in Chapter 8. A second method is the system of 'error extensions' explained in Section XXX. The present section describes yet a third such method, use of operator names prefixed by symbols having no other significance in the SETL syntax.

The characters usable in this way are:

~ ! @ $ % & \ ' £(option-3) ¢(option-4) ° (option-5) ¤ (option-6) ¦ (option-7) ¥ (option-8) º (option-b) ¶ (option-d) Æ (option-j) Â (option-l) µ (option-m) ¿ (option-o) ¹ (option-p) § (option-s) (option-t) · (option-w) Å (option-x) ´ (option-y) ½ (option-z) à(shift-option-7) ± (shift-option-=) ¸ (shift-option-P) × (shift-option-V)

These characters will have various representations in various national and operating environments, but generally their representations will be visible and recognizably 'operator-like'. The 'keyboard subset' of these characters, namely those that appear directly on a standard keyboard, are those most likely to have platform-independent print representations, and for portability you may want to confine your use of these characters to that set, which is

SETL allows any of these characters to be used either as an infix or prefix operator sign (but not both simultaneously), subject to precisely the same syntactic rules as the standard '-' sign (which can also be used either as an infix or a prefix operator.) For example, we can write

To make it possible to give a semantic meaning to these extended operator constructions, SETL associates a string name with each of them, as shown in the following table. (These names are taken from the macintosh appearance of the characters, in the BBEdit text viewer.)

~ | TILDE | ! | BANG_ |

@ | _AT_ | $ | DOLL_ |

% | PERCENT_ | & | AMP_ |

\ | BACKSL_ | ' | APOS_ |

£ (option-3) | BRITPOUND_ | § (option-s) | BETA_ |

¢ (option-4) | CENT_ | ° (option-5) | INFIN_ |

¤ (option-6) | _NOTE_ | ¦ (option-7) | PARA_ |

¥ (option-8) | DOT_ | º (option-b) | INTEGRAL_ |

¶ (option-d) | DIFF_ | Æ (option-j) | DELT_ |

Â (option-l) | NTSGN_ | µ (option-m) | MU_ |

¿ (option-o) | THORN_ | ¹ option-p) | SMALLPI_ |

(option-t) | DAGGER_ | · (option-w) | SIGMA_ |

Å (option-x) | APPROXE_ | ´ (option-y) | YEN<_/TD> |

½ (option-z) | OMEGA_ | à(shift-option-7) | GRCROSS_ |

± (shift-option-=) | PLMIN_ | ¸ (shift-option-P) | PI_ |

× (shift-option-V) | DIAMOND_ |

The string names seen in the preceding table can be used a two-parameter procedure with any binary occurence of an operator sign appearing in the list above, or a one-parameter procedure with any unary occurence of such an operator sign. For example, to give meaning to the operator signs occuring in the expression

one would simply write procedure definitions with headers

procedureAMP_(x,y); ...body...;endAMP_; -- 'binary' definitionprocedureAT_(x,y); ...body...;endAT_; -- 'binary' definition; not usable at same time as 'unary' definitionprocedureAT(x); ...body...;endAT; -- 'unary' definition; not usable at same time as 'binary' definition

Note that since SETL allows these extended operator signs to be used either as infix or as prefix operators, sequences of such characters are always treated as separate operator signs, e.g x @@ y has exactly the same meaning as x @ (@ y). The only exception to this rule is the exponentiation operator '**', which involves two successive operator signs.

Still more flexibility results from the fact that SETL allows any of the special operator characters listed above to be followed immediately by any alphabetic string (terminated by an end-of-line whitespace character) ad treats the resulting combination as a multi-character operator sign. For example, the expression

is allowed, and is understood to involve the four operators '@a', '@b', '@c', and '@', the first two having binary and the latter two having unary significance. To associate procedures with such extended operator names,one writes procedure definitions with headers like

procedureAT_A(x,y); ...body...;endAT_A; -- 'binary' definitionprocedureAT_B(x,y); ...body...;endAT_B; -- 'binary' definitionprocedureAT_C(x); ...body...;endAT_C; -- 'unary' definition

Suppose, as a second example, that we wish to introduce an operator called @dot which forms the dot-product of two vectors of equal length, i.e., the sum of the products of their corresponding components. This can be done as follows:

procedureAT_DOT(u,v);if#u /= #vthenreturnOM;elsereturn+/ [u(i) * v(i): i in [1..#v]];end if;end AT_DOT;

Once this operator has been defined, we can invoke it simply by writing

Another example is the useful operator !, which forms the composition of two (possibly multivalued) maps (see Section 3.8.4 for an explanation of the meaning of map composition.)

procedurebang_(f, g);return{[x,y]: xindomaing, zing{x}, yinf{z}};endbang_;

User-defined infix operators of this kind can be combined with the token ":=" to form assigning operators (see Section 3.12.1). For example, in the presence of the preceding definition we can write

to abbreviate the common construct

By defining a function of one parameter as an operator rather than an ordinary
**procedure**, we save what might otherwise be irritating parentheses. For example, if we define a unary operator minus by writing

procedureat_minus(u);return[-x: xinu];endat_minus;

Then the negative of a vector u can be formed by writing

If instead of this we made minus an ordinary function, we would have to write

instead.

One useful monadic operator is the prefix-print operator '@' defined by

procedureat_(u);returnu;endat_;

which can be inserted into expressions to capture their values during debugging. For example, if we suspect a bug in

programbad; i := 1; k := 2; j := 1/((i + k) * (i - k) - i * i + k * k);endbad;

we can easily spot the problem by adding the above definition of '@' and then changing it to

programbad; i := 1; k := 2; j := 1/@((i + k) * (i - k) - i * i + k * k);endbad;

The arguments of a user-defined infix or prefix operation always carry the
implicit qualifier rd, so that attempting to give them either of the qualifications **wr** or **rw** is illegal.

The precedence of any user-defined binary operator is the same as that of the '-' operator.

EXERCISES Thedot-productof a pairu,v of equally long vectors with integer or real coefficients is the sum +/[u(i)*v(i): i in [1..#v]]. 1. Write a prefix operator .rv n which returns a randomly chosen integer-valued vector of length n each time it is invoked. Use it and the operator .dot defined in Section 5.6.2 to test the validity of the following statements concerning vector dot-products:

(a) | (x.dot y) | = | (y.dot x) |

(b) | (x.dot x) | >= | (max/x)*(max/x) |

(c) | (x dot y)**2 | <= | (x.dot x)*(y.dot y) |

(a) | (x.dot y) | <= | (max/x)*(max/y)* # x |

2. The sum of two integer or real vectors x and y of equal length is [x(i) + y(i): i in [1..#x]], and their difference is [x(i)-y(i): i in [1..#x]]. Write definitions for two op's called .s and .d which produce these two vectors. Proceed as in Ex. I to test the following statements:

(a) | ((x.s y) .s z) | = | (x.s (y.s z)) |

(b) | (x.s (y .d x)) | = | y |

(c) | ((x.s y) .dot z) | = | (x.dot z) + (y.dot z) |

(a) | ((x.d y) .dot z) | = | (x.dot z)-(y.dot z) |

3. Write a procedure which, given a tuple

t, calculates a map which sends each componentxoftinto the index of the first occurrence ofxwithint.4. The storage space needed to represent a map

fcan sometimes be reduced very considerably by writingfin the formf(x) =f1(x)? (ifxinsthenf2(x)elseOMend), wheref1 has a small domain,shas a simple representation, and f2 is a programmed function. Write a procedurecompresswhich, givenf,s, andf2, will calculatef1. The functionf2 should be called bycompress, and it is assumed that user of the compress is required to supply code representingf2.5. Write a room assignment program which reads information concerning available rooms and classes needing rooms and generates a room assignment. The first of the two data items read by your program should be a map from room numbers to seating capacities. The second input read by your program should be a tuple of triples, each consisting of a class number (a string of the form

n.m wherenis a course number andma section number), number of students, and hour (possible hours are 8, 9,10,11,... up to 20). No two classes meeting at the same hour should be scheduled into the same room. Your program should print out a table, arranged by hour and room, of assignments. Starting with the largest class scheduled to meet in a given hour, each class should be assigned the smallest room into which it will fit. Classes which cannot be scheduled should be appropriately listed. Empty rooms should be indicated in the output table you print.The next three exercises relate to the earlier exercises on Boolean identities, found in Section 2.5.4.1.

6. A Boolean implication, which we will write as an infix operator

x.impy, istrueif eitherxisfalseoryistrue. Thusx.impyis equivalent to (notx) ory. Write a SETLopdefinition for this operator, which will be used in the next two exercises.7. Using the .imp operator defined in Ex. 6 and the method for checking Boolean statements described in Section 2.5.4.1, show that each of the following statements is true regardless of the Boolean values of the variables occurring in it. (a) (x

or noty) = (y .imp x) (b) ((xandy) .imp z) = (x .imp (y .imp z)) (c) (x .imp (y or z)) = ((x .imp y) or (x .imp z)) (d) ((x .imp y)andx) .imp y (e) (x .impnotx). impnotx (f) x .imp (y .imp x) (g) (notx) .imp (x .imp y)8. None of the following Boolean formulae is valid for all Boolean values of

xandy; each represents a common logical fallacy. Proceeding as in Ex. 7, write a SETL program which will find a case in which each of these formulae evaluates to (a) ((x imp y) and y) .imp x (b) ((x .imp y) and (x .imp z)) .imp (y .imp z) (c) ((x or y) and x) .imp not y (d) ((x .imp y) and not x) .imp not y 9. When a sequence of data items is read by a read statement of the formit will often be appropriate to check the items read to make sure that they have appropriate types and lie in appropriate ranges. For this purpose, the following approach, based upon the notion of "descriptor string," may be convenient: read(x,y,..z),

- Capital letters are used in the following way to designate the principal SETL
object classes:
Letter Value Letter Value *I*integer *T*tuple *R*real *E*set *S*string *A*atom - The ranges of integers and of real numbers can be constrained. For example,
*I*-100..100 designates an integer belownging to the set {-100..100},*I*0.. designates a non-negative integer,*R*-1.0..1.0 designates a real number lying between -1.0 and + 1.0. - The descriptors
*T*and*E*can be qualified to show the types of their components or members. For example*T*(*IIR*) describes a tuple of length 3 whose components are an integer, an integer, and a real, respectively;*T*.*I*describes an unknown-length tuple of integers;*E*.*T*(*II*) describes a set of pairs of integers. - To describe successive items in a list of variables being read, descriptors are
simply concatenated. For example, if three items
*x*,*y*,*z*, the first an integer, the second a set of pairs of integers, and the third a tuple of strings, are being read, we would describe it by IE.T(II)T.S..

Write a multiparameter procedure read_check whose first parameter is a descriptor string defining the data expected and whose remaining parameters are the variables whose values are to be read, e.g., in the example appearing in (d). we would writeread_check("IE.T(II)T.S", x,y,z); The read_check procedure should generate a report if it encounters any data of unexpected form. Of course, the read_check procedure must be foolproof. 10. Modify the read check procedure of Ex. 9 so that it echoes and labels all data read. For this modified procedure, the sequence of names of the variables being read should follow the data descriptor in the procedure's first parameter. These names should be separated from the data descriptor and from each other by blanks.

- File management operations:
**open**,**close**,**fexists**,**fsize**,**eof**. - Variants of the
**reada**and**print**operations:,**printa****nprint**,**n**,**printa****read**,**reads**,**str**,**unsstr**. - Line-oriented input operations:
**get**,.**geta** - Binary file I/O and related string operations:
**getb**,**putb**,**binstr**,**unbinstr**. - Random-access file operations:
**gets**,**puts**.

The file operations described in this section are of relatively low level, and to smooth their use it is often best to wrap them in auxiliary procedures or objects using the techniques described in Chapters 5 and 8.

The file operations described in this section are 'non-interactive'. To create interactive I/O involving windows, dynamically displayed text and graphs, point-and-click operations, etc. one uses the SETL graphical interface objects and operations described in Chapter 9.

- the one-parameter function
**fexists**(file_name) returns**true**if the named file exists,**false**otherwise. - The two-parameter functions
**open**(file_name,"TEXT-OUT");**open**(file_name,"TEXT-APPEND");**open**(file_name,"TEXT-IN");**open**(file_name,"BINARY-OUT");**open**(file_name,"BINARY-IN");**open**(file_name,"RANDOM");

convert the string name of a file into the 'handle' needed by other operations to reference the file. Their usage is illustrated by

file_handle := **open**(file_name,"RAND**OM**");Files can be opened only once, and must exist to be opened in any mode but "TEXT-OUT", "BINARY-OUT", or "RANDOM". The mode in which a file is be opened constrains the operations which can be applied to it, e.g. a file opened as "TEXT-IN" or "BINARY-IN" cannot be written to.

Both the "TEXT-OUT" and "BINARY-OUT" modes of file access begin by erasing the file being opened (or creating it if it does not already exist). Newly opened files are always positioned at their start. "TEXT-APPEND" is very similar to "TEXT-OUT", except that it starts writing the file being opened at its end.

The following list shows the operations which can be applied to files opened in the various possible modes:

- "TEXT-OUT" -
**close**,**fsize**,,**printa****n**.**printa** - "TEXT-APPEND" -
**close**,**fsize**,,**printa****n**.**printa** - "TEXT-IN" -
**close**,**fsize**,**reada**,.**geta** - "BINARY-OUT" -
**close**,**fsize**,**putb**. - "BINARY-IN" -
**close**,**fsize**,**getb**. - "RANDOM" -
**close**,**fsize**,**gets**,**puts**.

**close**(file_handle); closes the file referenced by file_handle, allowing it to be re-opened subsequently. Since SETL's operating environment only allows a limited number of files (about 20) to be opened simultaneously, it is necessary, when many files are being manipulated, to close some of them in order that others may be opened.Note that the files opened by a program are closed automatically when the program stops.

- the one-parameter function
**fsize**(file_handle) returns the length in bytes of the file referenced by file_handle. - the parameterless function
**eof**() can be called immediately after a file input operation like**reada**orhas been executed. It returns**geta****true**if the last input statement executed encountered an end of file,**false**otherwise.

(file_handle,x,y,..) statements convert their successive arguments (other than the first) to strings and print them to the file represented by 'file_handle'. A newline is written after the final character printed. To suppress this final action, use**printa****n**instead.**printa****reada**(file_handle,x,y,..); statements read one complete SETL object from the file represented by 'file_handle' for each of their (variable number of) arguments, and assign it to this argument. This statement generates an error it it encounters a quantity which is not the legal print form of a SETL object.**n**(file_handle,x,y,..) statements convert their successive arguments (other than the first) to strings and print them to the file represented by 'file_handle', without writing a newline after the final character printed.**printa****nprint**(x,y,..) statements convert their successive arguments to strings and print them to SETL's standard output file or console. No newline is written after the final character printed.**reada**(file_handle,x,y,..); statements read one complete SETL object from the specified file for each of its (variable number of) arguments, and assigns it to this argument. It generates an error it it encounters a quantity which is not the legal print form of a SETL object. The working of these statements is explored at length in Section XXX.**read**(x,y,..); statements read one complete SETL object from the standard SETL input file or console for each of its (variable number of) arguments, and assigns it to this argument. It generates an error it it encounters a quantity which is not the legal print form of a SETL object. Note however that in some environments a convenient standard input file may not be available.These statements are explored at length in Section XXX.

**reads**(stg,x,y,..); statements read from their string first argument rather than from a file. They read one complete SETL object from their string first argument for each of its (variable number of) arguments, and assigns it to this argument. The string section read is removed from the start of the string 'stg'.- the one-parameter function
**str**(x) converts any SETL object to its readable string representation. - the one-parameter function
**unstr**(stg) converts the readable string representation of SETL objects back into a SETL object O having this string representation. There may be several such objects, of which one is typically a string and the other a SETL object of some other kind, e.g. the printed representations of the null set {} and of the string "{}" are identical; the printed representation of 123 is identical with that of the string "123". When there is a non-string object O having 'stg' as its readable string representation,**unstr**(stg) will generally be this object, e.g.**print**(**unstr**("{}") = {}," ",**unstr**("[1,2,3]") = [1,2,3])yields "TRUE TRUE". If there is no such object O having 'stg' as its readable string representation,

**unstr**will often make a 'best efforts' attempt to convert at least the first part of 'stg' to an object. (**reada**and r**read**ead, which use**unstr**as a subroutine, behave in the same way.) In particular, variable names will be recognized as strings. For example,**print**(**unstr**("123a_b")," ",**unstr**("a_b")," ",**unstr**("a_b+..."))yields '123 a_b a_b'. However, when its input begins with an unexpected non-alphanumeric character,

**unstr**generally gives up and generates an error. For example,**unstr**("") generates not but an error.**OM**Since as just seen there can be several SETL objects (one often a string, the other not) having the same printed representation, the operations

**str**and**unstr**, and accordingly their file-oriented variants**reada**and, are generally but not invariably inverse to one another. The 'binary' file I/O operations described beloww repair this deficiency**printa**

Here is an example. The program

handle :=open("junk","TEXT-OUT");(handle,[4,5,6]," ",456," ",123);printaclose(handle); handle :=open("junk","TEXT-IN");(handle,line);getaclose(handle); handle :=open("junk","TEXT-IN");reada(handle,line);

yields

[4, 5, 6] 456 123 TRUE[4, 5, 6] FALSE

Note again that in many SETL operating environments no very convenient standard input file may be available.

- the one-parameter function
**binstr**maps any SETL value to a special binary string representation from which x can be reconstructed unambiguously using**unbinstr**. That is,**unbinstr(binstr**(x)) is always x. The only exceptions are for values x involving atoms or procedures; these may not be reconstructible in subsequent runs, when atoms might be renumbered, or after recompilation of procedures written by the**binstr**operator. - see
**binstr**for documentation of the one-parameter function**unbinstr** **putb**(file_handle,x,y,..); statements write one binary-coded SETL value to the file referenced by file_handle for each of their (variable number of) arguments. Objects are converted to tagged strings on exactly the same ay that**binstr**would convert them.**getb**(file_handle,x,y,..); statements read one binary-coded SETL value from the file referenced by file_handle for each of their (variable number of) arguments, and assigns the value read to this argument. These values must have been written by**putb**. The**binstr/unbinstr**rules stated above apply, since a**binstr**type of encoding is used by**putb**and assumed by**getb**.

writes and then rereads the procedure 'myproc', which is correctly executed after being reread. Execution is successful because the reread operation is part of the same uninterrupted SETL run as the preceding binary write of the procedure. Note also that the 'programtest; --binstr(myproc));unbinstr(s)();proceduremyproc;endmyproc;endtest;

writes the same procedure that our earlier example does. Then the programprogramtest; -- ohand :=open("junk","BINARY-OUT"); putb(ohand,s :=binstr(myproc));unbinstr(s)();proceduremyproc;endmyproc;endtest;

attempts to read and execute this procedure. But here the binary read operation fails and generates the error messageprogramtest; -- ohand :=open("junk","BINARY-IN"); getb(ohand,s);unbinstr(s)();endtest;

*** Abnormal End -- source file => line => 5 column => 9 Internal values are not preserved across program executions

**gets**(file_handle,start,n,x); statements read a string of length n from the file referenced by file_handle, starting at character 'start', and assigns the string read to x. The file must have been opened as "RANDOM" for this operation to be used.**puts**(file_handle,start,stg); statements writes the string 'stg' to the file referenced by file_handle, starting at character 'start'. The file, which is enlarged as muchh as necessary, must have been opened as "random" for this operation to be used.

These operations, which resemble SETL's string-slice and slice-assignment operations, treat files as long strings stored on disk. They move arbitrary strings of characters to and from such files without any conversion.

An extended example of the use of these operations is given beloww.

programtest; -- filereadaexample, version 2 line := "Doe John 81 85 Unsubmitted 75 Absent 68";reads(line,last_name,first_name,hw_1,hw_2,hw_3,hw_4,quiz_1,hw_5);for",first_name," ",last_name," is: " ,(0 +/ [if is_string(x)then0elsexend if: xin(tup := [hw_1,hw_2,hw_3,hw_4,hw_5])]) / #tup);endtest;

If the input file lines can be lightly punctuated with '[' and ']' marks indicating data groupings, this particularly elementary style of file input becomes even more flexible, as the following variant micro-program hints.

programtest; -- filereadaexample, version 2 line := "Doe John [81 85 Unsubmitted 75 68] [Absent]";reads(line,last_name,first_name,homework_record,exam_record]);for",first_name," ",last_name," is: " ,0 +/ (tup := [if is_string(x)then0elsexend if: x in homework_record]) / #tup);endtest;

SETL provides no built-in output-formatting facility, but it is not hard to create one using SETL's string-manipulation capabilities. The code shown below by creating a simplified PERL-like formatting facility, using which we can introduce formatting procedures by writing statements resembling

address_label := format( "========================================" + "\n" + "| @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |" + "\n" + -- name "| @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |" + "\n" + -- address "| @<<<<<<<<<<<<<<<<<<<<<<<<<<, @< @<<<< |" + "\n" + -- city state zip "========================================" );

and then writing

to convert any tuple containing the data that such a format expects to a printable string.

Fields in such a format into which values will be inserted are introduced by the reserved character '@', They can then continue either in the form '<<<...' (designating a left-justified field), or '>>>...' (right-justified field), '|||...' (centered field), or "####.###" (decimal fiesld of specified precision). Text in a format not belownging to any such field is carried forward without change.

The code required is

procedureformat(fmt_string); -- build a format closure var fmt; -- to form clousre returned beloww fmt := digest_format(fmt_string);return lambda(tup);returnput_in_format(tup,fmt);end lambda; -- return closureendformat;proceduredigest_format(fmt_string); -- digest a format stringinto a format tuple fmt_tup := []; -- will collect format tuplewhilefmt_string /= ""loop-- devour the input string piece :=break(fmt_string,"@");ifpiece /= ""thenfmt_tupwith:= piece;end if; -- collect string atsgn :=match(fmt_string,"@"); -- lookforfield stArtifatsgn /= ""then-- a field begins rest :=span(fmt_string,"<>|#");case(c := rest(1))when"<",">","|" => fmt_tupwith:= [c,#rest + 1]; -- collect a pairwhen"#" =>match(fmt_string,"."); tail :=span(fmt_string,"#"); -- determine desired precision fmt_tupwith:= [c,#rest + 1,#tail]; -- collect a tripleend case;endif;end loop;returnfmt_tup; -- return the collected tupleenddigest_format;procedureput_in_format(tup,fmt); -- insert tuple of values into digested format stg := ""; val_ctr := 0; --counter of values from tupleforeltinfmtloop-- process format text and fieldsif is_string(elt)thento_add := elt; -- string elements are used directlyelse-- must be tuple designating field val := tup(val_ctr +:= 1); -- otherwise get net valuefromtuple [sgn,n,p] := elt; -- unpack tuple element to_add :=casesgnwhen"<" => left_just(val,n) -- format field appropriatelywhen">" => right_just(val,n)when"|" => center(val,n)when"#" => decimal(val,n,p)end case;end if; stg +:= to_add; -- add formatted item to stringend loop;returnstg; -- return the string constructedendput_in_format;procedureleft_just(val,n); -- left-justify value in field of length n stg :=str(val) + n * " ";returnstg(1..n);endleft_just;procedureright_just(val,n); -- right-justify value in field of length n stg := n * " " +str(val);returnstg(#stg - n + 1..#stg);endright_just;procedurecenter(val,n); -- center value in field of length n ns := #(stg :=str(val)); stg := stg(1..nminns); -- padwithappropriate number of blanks on eftand thenleft-justifyreturnleft_just(((n - #stg) / 2) * " " + stg,n);endcenter;proceduredecimal(val,n,p); -- center value in field of length n stg :=str(val); befdot :=break(stg,".");returnright_just(befdot,n) + left_just(stg,p + 1);enddecimal;

The following small program can be used to test the procedure package seen above.

programtest; -- filereadaexample, version 2 address_label := format( "========================================" + "\n" + "| @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< |" + "\n" + -- name "| @|||||||||||||||||||||||||||||||||||| |" + "\n" + -- address "| @##.## @>>>>>>>>>>>>>>>>>>>, @< @<<<< |" + "\n" + -- city bill state zip "========================================" );endtest;

In this section we illustrate the use of the 'line-oriented' file operations by giving a procedure which can be used to sort very large files of text lines. The approach used is as follows. We regard the file to be sorted as a sequence of 'runs',each consiting of as many successive lines as are already in ascending order. The algorithm used, which is a 'tape' style sort which reads sequentially though each file used as input and writes reads sequentially to the end of each file used as output, actually sorts a pair of files together, so we start with such a pair of files, which we will call f1 and f2. To sort just one file f1 we simply let f2 be an empty file.

Sucessive pairs of runs from f1 and f2 are then merged together into a second pair f3, f4 of files, by taking the smallest available element of such a pair of runs and moving it to f3 as long as and ascending run results, but when the merged run cannot be continued because the next elements read from f1 and f2 are both larger than the last element moved to f3, we start building the next run at the end of f4, and so alternatingly until all the lines of f1 and f2 have been moved to f3 and f4 together. Once this point has been reached, we continue the same overall process, but now in the reverse direction, merging runs from f3 and f4 back into f1 and f2, and so back and forth until all runs have merged into one, at which point the whole files is sorted. Since each merge step combines two runs into one, each pass will cut the number of surviving runs by a factor of 2, so the total number of passes will be the logarithm of the total numberof lines in the input files. Thus our procedure attains the performance expected of sorting algorithms. Note also that if the two input files are already sorted, the algorithm will merge them in a single pass.

A few additional remarks will clarify the details of the following code, which works in the way just explained. The top level routines are file_merge(source1,source2,dest), which sorts a pair of source files into a destination file, and file_sort(source,dest), which just calls 'file_merge' with an empty file as second argument. file_merge(source1,source2,dest) calls the next-level routine files_merge(f1,f2,f3,f4) to merge runs from f1 and f2 into f3 and f4, as explained above, and repeats this step in alternating dirctions until just one fully sorted result remains. To make this possible, files_merge returns the number of ascending runs which remain at the end of each call to it.

In this procedure, as in all like procedures that use files, one must be sure to release all files by closing them as soon as a cycle of file use concludes. This allows files to be alternately written and read, and ensures that the files are properly rewound when next opened for reading. Be careful! if this is not done, subsequent file-open operations, and then the file I/O operations which depend on them, may fail without warning.

files_merge(f1,f2,f3,f4) carries out the single complete pass for which it is responsible by opening input handles to f1,f2 and output handles to f3,f4, and then calling the bottom-level routine 'more_runs' to merge runs from f1, f2 into f3 and f4 alternately

All in all, the code is

procedurefile_sort(source,dest); -- sort source file into dest file file_merge(source,OM,dest);endfile_sort;procedurefile_merge(source1,source2,dest); -- sort pair of source files into dest file f1 := of1 := source1 + ".1"; f2 := of2 := source1 + ".2"; f3 := dest; f4 := of4 := source1 + ".3"; -- we will use three auxiliary files altogether erase(f4); files_merge(source1,source2?f4,f1,f2);while(num_runs := files_merge(f1,f2,f3,f4)) > 1loop[f1,f2,f3,f4] := [f3,f4,f1,f2];end loop;iff3 /= destthencopy(f3,dest);end if; -- result is in the correct file erase(of1); erase(of2); erase(of4);endfile_merge;procedurefiles_merge(f1,f2,f3,f4); -- merge ascending runs from the first two files into the last two files; return the number of runs. f1_handle :=open(f1); f2_handle :=open(f2); -- open two files for reading f3_handle :=open("<" + f3); f4_handle :=open("<" + f4); -- open two files for writing num_runs := 1; [prior_f1,prior_f2] := []; -- initially no prior lines have been readwhile(pair := more_runs(f1_handle,prior_f1,f2_handle,prior_f2,f3_handle)) /= [] loop num_runs +:= 1; -- count up one more run [prior_f1,prior_f2] := pair; -- unpack lines read previously -- merge next pair of runs into the alternate output fileif(pair := more_runs(f1_handle,prior_f1,f2_handle,prior_f2,f4_handle)) = []thenclose(f1_handle);close(f2_handle);close(f3_handle);close(f4_handle); -- release all the files return num_runs;end if; num_runs +:= 1; -- count up one more run [prior_f1,prior_f2] := pair; -- unpack lines read previouslyend loop;close(f1_handle);close(f2_handle);close(f3_handle);close(f4_handle); -- release all the filesreturnnum_runs;endfiles_merge;proceduremore_runs(f1_handle,prior_f1,f2_handle,prior_f2,f3_handle); -- merge runs from f1 and f2 into f3;return[]iftheendof both input files is reachedifprior_f1 =OMthengeta(f1_handle,f1_elt);elsef1_elt := prior_f1;end if; -- try to get next element of each inputifprior_f2 =OMthengeta(f2_handle,f2_elt);elsef2_elt := prior_f2;end if;iff1_elt =OMand f2_elt =OMthenprint("immediate end");return[];end if; -- theendof both input files is reachediff1_elt =OMthen-- no more elements in f1printa(f3_handle,prior_elt := f2_elt); -- move f2 element to outputgeta(f2_handle,f2_elt); -- read another f2 elementelseif f2_elt =OMthen-- no more elements in f2printa(f3_handle,prior_elt := f1_elt); -- move f1 element to outputgeta(f1_handle,f1_elt); -- read another f1 elementelseif f1_elt < f2_eltthen-- use element of f1, which is smallerprinta(f3_handle,prior_elt := f1_elt); -- move f1 element to outputgeta(f1_handle,f1_elt); -- read another f1 elementelse-- use element of f2, which is smallerprinta(f3_handle,prior_elt := f2_elt); -- move f2 element to outputgeta(f2_handle,f2_elt); -- read another f2 elementend if;whilef1_elt /=OMor f2_elt /=OMloop-- now loop, moving the whole of an increasing runiff1_elt =OMthen-- no more elements in f1iff2_elt < prior_eltthenreturn[OM,f2_elt];end if; -- run ends; return f2_eltprinta(f3_handle,prior_elt := f2_elt); -- move f2 element to outputgeta(f2_handle,f2_elt); -- read another f2 elementelseif f2_elt =OMthen-- no more elements in f2iff1_elt < prior_eltthenreturn[f1_elt];end if; -- run ends; return f1_eltprinta(f3_handle,prior_elt := f1_elt); -- move f1 element to outputgeta(f1_handle,f1_elt); -- read another f1 elementelseif f1_elt < f2_eltthen-- use element of f1, which is smalleriff1_elt >= prior_eltthen-- can use the smaller elementprinta(f3_handle,prior_elt := f1_elt); -- move f1 element to outputgeta(f1_handle,f1_elt); -- read another f1 elementelseif f2_elt >= prior_eltthen-- can use the larger elementprinta(f3_handle,prior_elt := f2_elt); -- move f2 element to outputgeta(f2_handle,f2_elt); -- read another f2 elementelse-- neither element is goodreturn[f1_elt,f2_elt]; -- run ends; return both elementsend if;else-- use element of f2, which is smalleriff2_elt >= prior_eltthen-- can use the smaller elementprinta(f3_handle,prior_elt := f2_elt); -- move f1 element to outputgeta(f2_handle,f2_elt); -- read another f1 elementelseif f1_elt >= prior_eltthen-- can use the larger elementprinta(f3_handle,prior_elt := f1_elt); -- move f1 element to outputgeta(f1_handle,f1_elt); -- read another f1 elementelse-- neither element is goodreturn[f1_elt,f2_elt]; -- run ends; return both elementsend if;end if;end loop;return[]; -- all runs endedendmore_runs;procedureerase(file); -- file erasureprocedureclose(open("<" + file)); -- open the file for writing, andthenimmediatelycloseitenderase;procedurecopy(source,dest); -- file copyproceduresource_handle :=open(source); -- open the source file for reading dest_handle :=open("<" + dest); -- open the dest file for writinggeta(source_handle,line); -- read a first linewhileline /=OMloopprinta(dest_handle,line); -- move remaining linesgeta(source_handle,line);end loop;close(source_handle);close(dest_handle);endcopy;

The following main program can be used to test and time the procedures seen above. It shows that about 500 8-character lines can be sorted per second on a 450 Mhz Power Macintosh. The standard sorting routine on a Unix system of like performance will sort about ??? 8-character lines per second.

programtest; -- timing and test program for file sorting routines. handle :=open("<junk"); handle2 :=open("<junk3"); -- set up some test data for j in [1..n := 50000]loopstg := "00000000" + str(n - j);printa(if odd(j)thenhandleelsehandle2end if,stg(#stg - 6..));end loop;close(handle);close(handle2); print(time()); -- time the following operation file_merge("junk","junk3","junk2"); -- sort pair of source files into dest file print(time()); -- ********** Procedures go here **********endtest;

The commonest use of the binary file operations **getb** and **putb** is to save the state or history of some (often interactive) application which is to be restored subsequently. The following small program shows this code pattern.

programtest; -- binary state outputandrestoration y := ""; x := "xxx"; z := OM; -- data to be saved out_handle :=open("junk","BINARY-OUT"); -- binary write putb(out_handle,y,x,z);close(out_handle); -- and then later... in_handle :=open("junk","BINARY-IN"); -- binary reread getb(in_handle,y,x,z);close(in_handle);endtest;

Another way of accomplishing the same thing is

programtest; -- binary state outputandrestoration y := ""; x := "xxx"; z := OM; -- data to be saved out_handle :=open("junk","BINARY-OUT"); -- binary write putb(out_handle,[y,x,z]); -- data packedintupleclose(out_handle); -- and then later... in_handle :=open("junk","BINARY-IN"); -- binary reread getb(in_handle,yxz); [y,x,z] := yxz; -- unpack tupleclose(in_handle);endtest;

As a later example in this chapter shows, the closely related operators **binstr** and **unbinstr** can be used to flatten arbitrary SETL objects for storage in databases.

Interactive text editors will often support 'multiple undo', allowing a reversion of a large, perhps arbitrary, number of edit steps. This is typically implemented by saving the substring s replaced by any edit operation along with the upper and lower bounds u, l in the post-edit-step string of the section replaced. These triples are saved in a list (called the 'edit log') of tuples [s,u,l]. The immediately pre-edit state of a string 'current_string' can then be restored simply by executing

iterating as often as desired for multiple undo. The edit_log can easily be saved using **putb** and **getb**. It should be clear that I/O operations of a lower degree of precision would not be as convenient for this, since arbitary strings must be saved and restored.

Sometimes one will want to work with very large lists of SETL objects, e.g. tuples of 100,000,000 components, and these may need to be kept on disk rather than in RAM. This can be done by representing the tuple components as strings in **binstr** form, keeping these components in a main random access file F which represents the tuple, and setting up an auxiliary index file IX of fixed-length integer pairs (say 10 bytes) which indicates where each of these components starts, and its length. Components which are changed can be removed from their current positions and rewritten to the end of the file F. When F reaches some specified maximum length we can attempt to compress it by sorting the IX file into ascending order and then repacking F to eliminate all the 'holes' that have opened up when changed components were moved to the end of F.

The following code realizes these ideas. Index-file entries are 5 bytes, allowing tuples to have up to 2**40, or roughly 1 trillion, elements; more thanmost current disk systems can handle. (Since this code is set up to be compatible with the more elaborate 'database' procedures given slightly later, it contains several superfluous, but harmless, lines.)

programtest; --usestring_utility_pak;constzerozero := "\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"; -- coded pair of zeroes var bstr,the_hash; -- binstr form of value, and its hash var hash_locn; -- hashtable location of value located after search var first_empty,num_full := 0; -- first empty hashtable location on chain; number of entries used var hashtable_size; -- current hashtable size var m_handle,ix_handle,ht_handle,ix_size; -- working handles and sizes var rm_handle,rix_handle,rix_size; -- working handles and sizes for hashtable range vector var master_name; -- name of the master file test_tuples; -- invoke test codefortuple-related functions -- ************ principal file_tuple manipulation routines ************procedurefile_tup_get(dbid,ix); -- read SETL component from tupleif(offs := 10 * ix - 9) > ix_sizethen returnOM;end if; -- index out of range -- see if last element is coded [0,0]gets(ix_handle,offs,10,pos_ln_stg); -- get the position and length of the component [pos,ln] := stg_to_pair(pos_ln_stg); -- decodegets(m_handle,pos,ln,stg); -- read the component datareturnunbinstr(stg); -- decode and return the componentendfile_tup_get;procedurefile_tup_write(dbid,ix,val); -- write modified SETL component to tupleif(ixloc := 10 * ix - 9) > ix_sizethen-- index out of range nomstg := #(om_stg :=binstr(OM)); ixloc := ix_size + 1; -- first location in index file to be written m_size :=fsize(m_handle); -- get size of the main fileforjin[ix_size/10 + 2..ix]loop-- write seqence of OMs to the main fileputs(m_handle,msp1 := m_size + 1,om_stg); m_size +:= nomstg;puts(ix_handle,ixloc,pair_to_stg(msp1,nomstg)); ixloc +:= 10; ix_size +:= 10; -- keep the index size currentend loop;end if; -- we change an existing tuple component by writing the new value to the end of the file -- and changing its index entry -- if the component is the last component, the new value can over-write the old [st,ln] := file_append(m_handle,val); -- append SETL value, getting start and length in main fileputs(ix_handle,ixloc,pair_to_stg(st,ln)); -- get the position and length of the component ix_sizemax:= (ixloc + 9); -- enlarge the size if necessaryreturnval; --returnthe original valueendfile_tup_write;procedurefile_tup_append(dbid,val); -- append SETL value to end of tuple [st,ln] := file_append(m_handle,val); -- append SETL value, getting start and length in main fileputs(ix_handle,ix_size + 1,pair_to_stg(st,ln)); -- write the new component data ix_size +:= 10; -- note the additionreturnix_size - 9; -- return the relative location of the new index entryendfile_tup_append;procedurefile_append(handle,val); -- append SETL value to end of file fs :=fsize(handle);puts(handle,fsp1 := fs + 1,bs :=binstr(val));return[fsp1,#bs]; -- return starting point and length of stringendfile_append;procedurefile_tup_len(dbid); -- total length of file tuplereturnix_size/10; -- decode and return the componentendfile_tup_len;procedurefile_tup_setlen(dbid,the_len); -- set total length of file tupleifthe_len >= (len_now := file_tup_len(dbid))then return;end if; offs := 10 * len_now + 1;forjin[(the_lenmax0) + 1..len_now]loopputs(ix_handle,offs -:= 10,zerozero);end loop; ix_size := 10 * the_len; -- keep size up to dateendfile_tup_setlen; -- ************ initialization routines ************procedureclear_for_test(name); -- erase all files preparatory to test erase(name); erase(name + ".ix"); erase(name + ".ht"); erase(name + ".rix"); erase(name + ".rm");endclear_for_test;procedureopen_db(file_name); -- returns dbid m_handle :=open(master_name := file_name,"RANDOM"); -- access the main domain file ix_size :=fsize(ix_handle :=open(file_name + ".ix","RANDOM")); -- access the domain indexifix_size > 0then gets(ix_handle,ix_size - 9,10,stg);elsestg:= "";end if;whilestg = zerozeroloop--otherwisechain backwards as long as the pair encountered is [0,0]if(ix_size -:= 10) > 0then gets(ix_handle,ix_size - 9,10,stg);elsestg:= "";end if;end loop; hashtable_size :=fsize(ht_handle :=open(file_name + ".ht","RANDOM")); -- access the hash table rix_size :=fsize(rix_handle :=open(file_name + ".rix","RANDOM")); -- access the range index rm_handle :=open(file_name + ".rm","RANDOM"); -- access the main range filereturn"dbid1"; -- temporary dbidendopen_db; -- ************ debugging routines ************procedurefile_tup_reconstruct(dbid); -- reconstruct file tuple (for debugging)return[file_tup_get(dbid,j): jin[1..file_tup_len(dbid)]];endfile_tup_reconstruct;procedureerase(file); -- file erasure procedureclose(open(file,"TEXT-OUT")); --openthe file for writing, and then immediately close itenderase; -- ************ miscellaneous utilities ************procedurestg_to_pair(stg); -- convert pair of integers to 10-byte string xy := 0;forjin[10,9..1]loopxy *:= 256; xy +:=abs(stg(j));end loop;return[xymod1099511627776,xy /1099511627776];endstg_to_pair;procedurepair_to_stg(x,y); -- convert pair of integers to 10-byte string stg := ""; xy := x + 1099511627776 * y;forjin[1..10]loopstg +:=char(xymod256); xy /:= 256;end loop;returnstg;endpair_to_stg;endtest;

The two procedures pair_to_stg and stg_to_pair encode and decode pairs of integers not larger than 10**12 as 10-byte strings.

The code shown can be tested using the following test routine. Testing shows that a 450 Mhz Power Macintosh can handle about 1000 component reads per second. About 2/3 of this time is spent opening and closing the main and index files, work which can be avoided if handles to these files are kept in global variables. This easy improvement raises the capacity of a 450 Mhz Power Macintosh to 3000 component reads per second.

proceduretest_tuples; -- test codefortuple-related functions clear_for_test("junk"); -- erase all file preparatory to test dbid := open_db("junk");forjin[1..10]loopfile_tup_append(dbid,"convert pair" +str(j));end loop;thenwrite component 10"); file_tup_setlen(dbid,8);endtest_tuples;

Next suppose that files of the type just described are to be used, not merely as tuples T of SETL values, but as sets, making it necessary to provide a function file_loc(val) which returns the tuple index at which a give value is found (retuning **OM** if there is no such index.) Of the many techniques available for doing this, we shall explore just one, a so-called 'hashing' approach. This works by mapping the string representation of each setl value V into a fixed 5-byte integer hash(V), deliberately chosen to vary unpredictably if V is changed. These 'hash' values reference a table HT of size S roughly comparable to the length #T of the tuple T. The locator information for V is inserted at position hash(V) mod S + 1 of HT. The hash function serves to ensure that distinct components V_1, V_2 of T do not map to identical positions hash(V_1) mod S + 1, hash(V_2) mod S + 1 more often than the laws of probability would indicate. HT is kept in a file of 15-byte entries storing coded pairs of integers not larger than 10**12. When a location in HT is occupied by an entry, it stores the hash of the entry, its index in the tuple T, and the location in HT of the next entry having an identical hash. Empty locations in HT contain pointers to the next and previous empty locations, making it easy to find and occupy an empty location when one is needed (i.e. when an element being inserted into T is mapped to the same location in HT as an element already there.) We keep the table HT roughly half full at all times, doubling it in size when it becomes more than 3/4 full and cutting it in half when it becomes less than 1/4 full. In other regards, the management of changed components is like that used in the somewhat simpler 'file-tuple' code seen above.

procedureinitialize_hashfile(main_file_name,size); -- initialize a hashfile of a given size -- we write a chain of empty entries,of the form [prev,0,next] htt_handle :=open(main_file_name + ".ht","RANDOM"); offs := 1;puts(htt_handle,offs,quad_to_stg(0,0,offs + 30,0)); offs +:= 30; -- first elementforjin[2..size - 1]loopputs(htt_handle,offs,tts := quad_to_stg(offs - 30,0,offs + 30,0)); offs +:= 30;end loop;puts(htt_handle,offs,quad_to_stg(offs - 30,0,0,0)); -- last element first_empty := 1; num_full := 0;puts(htt_handle,30 * size + 1,pair_to_stg(0,0)); -- put zeroed wasted space info at theendof the hashtableclose(htt_handle); -- release the hashtablefor useendinitialize_hashfile;procedureopen_db(file_name); -- returns dbid m_handle :=open(master_name := file_name,"RANDOM"); -- access the maindomainfile hashtable_size :=fsize(ht_handle :=open(file_name + ".ht","RANDOM")) / 30; -- access the hash tablegets(ht_handle,30 * hashtable_size + 1,10,stg); --getthe wasted space infofromtheendof the hashtable [domfile_wasted,rangefile_wasted] := stg_to_pair(stg); -- decode it rm_handle :=open(file_name + ".rm","RANDOM"); -- access the mainrangefile domfile_size :=fsize(m_handle); rangefile_size :=fsize(rm_handle); -- note the file sizesreturn"dbid1"; -- temporary dbidendopen_db;procedureclose_db(dbid); --closespecified database, saving the information about wasted spaceputs(ht_handle,30 * hashtable_size + 1,pair_to_stg(domfile_wasted,rangefile_wasted)); -- put the wasted space info at theendof the hashtableclose(m_handle);close(rm_handle);close(ht_handle); --closeall the filesendclose_db;procedureget_map_val(dbid,dom_value); --get rangevaluefrom domainvalueif(locn := hashfile_locate(dbid,dom_value)) <= 0then returnOM;end if; --domainelementnotfound; val is OMgets(ht_handle,locn + 10,10,pos_ln_stg); --getthe positionandlength of therangecomponent [pos,ln] := stg_to_pair(pos_ln_stg); -- decodegets(rm_handle,pos,ln,stg); --readtherangedatareturn unbinstr(stg); -- decodeand returnthe componentendget_map_val;procedureset_map_val(dbid,dom_value,range_value); -- setrangevalue -- the length of newdomainvalues must be added to domfile_size,andthe length of deleteddomainvalues to domfile_wasted -- the length of newrangevalues must be added to rangefile_size,andthe length of deletedorchangedrangevalues to rangefilefile_wastedifrange_value =OMthen-- map value is being deletedif(locn := hashfile_locate(dbid,dom_value)) <= 0then returnOM;end if; --domainelementnotfound, nothing to delete [ln,rln] := release_hash_locn(locn - 10); -- release the hash entry at locn. returns lengths of items erased domfile_wasted +:= ln; rangefile_wasted +:= rln; -- note added wasted space may_halve_hashtable(dbid); -- halve the size of a a hashtableifneeded may_compress_domain(dbid); may_compress_range(dbid); -- compressrange/domain ifneededreturnOM;end if; --otherwisea map value is being insertedorchanged. -- If thedomainelement is new, the operation resembles 'hashfile_insert'if(locn := hashfile_locate(dbid,dom_value)) <= 0then--domainelementnotfound --nprint("notfound at prior hashchainend: ",locn); empty_spot := get_empty_spot(-locn); --getan empty spot,andattach it to the hash chainif any[st,ln] := file_append(m_handle,dom_value); -- append thedomainvalue to thedomainfile domfile_size +:= ln; [rst,rln] := file_append(rm_handle,range_value); -- appendrangevalue, getting startandlengthinmainrangefileputs(ht_handle,empty_spot,hex_to_stg(the_hash,0,st,ln,rst,rln)); -- write the new entry, which is theendof its chain rangefile_size +:= rln;returnrange_value; --returnthe original valueend if; --otherwisewe simply change therangeentry at locn; the operation resembles 'file_tup_write' [rst,rln] := file_append(rm_handle,range_value); -- append SETL value, getting startandlengthin rangefilegets(ht_handle,locn + 15,5,stg); --getlength ofrangeelement being deleted rangefile_wasted +:= int_of(stg); rangefile_size +:= rln; -- note added wasted spaceandadded space may_compress_range(dbid); -- compressrange/domain ifneededputs(ht_handle,locn + 10,pair_to_stg(rst,rln)); -- write the positionandlength of therangeelementreturnrange_value; --returnthe original valueendset_map_val;procedurehashfile_insert(dbid,dom_value); -- insert a value into a hashtableif(locn := hashfile_locate(dbid,dom_value)) > 0then returnlocn;end if; -- element is found;returnits location empty_spot:= get_empty_spot(-locn); --getan empty spot,andattach it to the hash chainif any[st,ln] := file_append(m_handle,dom_value); -- append thedomainvalue to thedomainfileputs(ht_handle,empty_spot,quad_to_stg(the_hash,0,st,ln)); -- write the new entry, which is theendof its chain --nprint(" hashfinsert: ",dom_value," ",locn," insert at: ",empty_spot,[the_hash,0,st,ln]);returnempty_spot + 10; --returnthe location of the new elementinthe hash tableendhashfile_insert;procedurehashfile_delete(dbid,value); -- delete a valuefroman unordered tuplewithauxiliary hashtableif(locn := hashfile_locate(dbid,value)) <= 0then returnlocn;end if; -- elementnotpresent; nothing to do --otherwisethe global variable hash_locn will contain the hashtable location of the value located [ln,-] := release_hash_locn(locn - 10); -- release the hash entry at locn domfile_wasted +:= ln; may_compress_domain(dbid); -- compress datafileifneeded may_halve_hashtable(dbid); -- halve the size of a a hashtableifneededendhashfile_delete;procedurerelease_hash_locn(hash_locn); -- release the hash entry at hash_locn -- attach the hashtable entry at (the global) hash_locn to the empty locations chain, as the new first_emptygets(ht_handle,first_empty,15,trip_stg); --getht entry at first_empty location [-,-,next_of_first] := stg_to_trip(trip_stg); -- decode itputs(ht_handle,ofe := first_empty,quad_to_stg(hash_locn,0,next_of_first,0)); -- revise former first_empty locationgets(ht_handle,hash_locn + 15,15,stg); [ln,-,rln] := stg_to_trip(stg); --getinfo on items erasedputs(ht_handle,hash_locn,quad_to_stg(0,0,first_empty,0)); -- write the new first_empty location first_empty := hash_locn; -- hash_locn becomes the new first_emptyreturn[ln,rln]; --returninfo on items erasedendrelease_hash_locn;procedureget_empty_spot(locn); --getan empty spotina hashtable, attaching it to a hash chainiflocn /= 0. -- this also calls the may_double_hashtable routine, to track hashtable occupancyiflocn = 0then-- make insertion into spot indicated by the_hash; this is empty -- first drop the empty elementfromthe chain of empty elements ht_place := 30 * (the_hashmodhashtable_size) + 1; -- spot indicated by the_hashgets(ht_handle,ht_place,15,trip_stg); --getht entry at current location [prev,-,next] := stg_to_trip(trip_stg); -- decode itifnext /= 0then-- there is a next; drop linkfromnextgets(ht_handle,next,15,trip_stg); --getht entry at next location [-,-,next_of_next] := stg_to_trip(trip_stg); -- decode itputs(ht_handle,next,trip_to_stg(prev,0,next_of_next));end if;ifprev /= 0then-- there is a prev; drop linkfromprevgets(ht_handle,prev,15,trip_stg); --getht entry at prev location [prev_of_prev,-,-] := stg_to_trip(trip_stg); -- decode itputs(ht_handle,prev,trip_to_stg(prev_of_prev,0,next));else--ifthere is no prev, this is the first element, so its next should become the new first_empty first_empty := next;end if; may_double_hashtable(dbid); -- enlarge the hashtableifit is getting fullreturnht_place;end if; --otherwisewe make insertion intoanyempty locationinthe hashtable, attaching this to the hash chain item at locngets(ht_handle,first_empty,15,trip_stg); --getht entry at first_empty location [-,-,next_of_first] := stg_to_trip(trip_stg); -- decode itgets(ht_handle,next_of_first,15,trip_stg); --getnext ht entryinthe empty chain [-,-,next_of_next] := stg_to_trip(trip_stg); -- decode itputs(ht_handle,next_of_first,trip_to_stg(0,0,next_of_next)); -- zero the 'prev' pointerinthe new first element empty_spot := first_empty; -- this is our 'empty spot' first_empty := next_of_first; -- note the new first emptygets(ht_handle,locn,5,int_stg); --getfirst part of ht entry at locn, to do attachment elt_hash := int_of(int_stg); -- decode itputs(ht_handle,locn,pair_to_stg(elt_hash,empty_spot)); -- write, thus attaching empty spot to hash chain may_double_hashtable(dbid);returnempty_spot;endget_empty_spot;proceduremay_double_hashtable(dbid); -- double the size of the hashtableifneeded.if4 * (num_full +:= 1) < 3 * hashtable_sizethen return;end if; -- occupancy ok rebuild(2 * hashtable_size); -- double the hashtableendmay_double_hashtable;proceduremay_halve_hashtable(dbid); -- halve the size of the hashtableifneeded.if4 * (num_full +:= 1) > hashtable_sizeorhashtable_size < 128then return;end if; -- occupancy ok rebuild(hashtable_size/2); -- halve the hashtableendmay_halve_hashtable;proceduremay_compress_domain(dbid); -- compress thedomainfileifneededifdomfile_size > 2 * domfile_wastedthen return;end if; -- occupancy OK hashfile_compress(false); -- compress thedomainfileendmay_compress_domain;proceduremay_compress_range(dbid); -- compress therangefileifneededifrangefile_size > 2 * rangefile_wastedthen return;end if; -- occupancy OK hashfile_compress(true); -- compress therangefileendmay_compress_range;procedurehashfile_compress(is_rt); -- compress a pair of files representing a hashtable -- 'is_rt' istrue ifwe are compressing therangefile rather than thedomainfile file_handle :=ifis_rtthenrm_handleelsem_handleend if; -- attach current location indicators to the index file, as final component. -- This is written as a line file, so that the line-file sort described previously can be used. nix_handle :=open("x.ix","TEXT-OUT"); --openthe auxiliary file -- pad the hashfile to full lengthifneededif(fs :=fsize(ht_handle)) < (ht30 := 30 * hashtable_size)thenputs(ht_handle,fs + 1,(ht30 - fs) * "\x00");end if;forlocin[11,41..30 * hashtable_size - 1]loop--loopover the hashtable, getting its non-empty elementsgets(ht_handle,loc,20,stg); [st,ln,st2,ln2] := stg_to_quad(stg);ifln = 0then continue;end if; -- bypass empty elementsifis_rtthen printa(nix_handle,st2," ",ln2," ",loc);else printa(nix_handle,st," ",ln," ",loc);end if;end loop;close(nix_handle); -- releaseforsorting file_sort("x.ix","x.ix2"); erase("x.ix"); -- sort the index file by its first component (locationindata file of item referenced) -- work thru this sorted file, moving referenced elements downward to new positions (innew file) nix_handle :=open("x.ix2","TEXT-IN"); --openthe fileforreading erase("x.m"); x_handle :=open("x.m","RANDOM"); --openan auxiliary fileforstoring the moved data writing_point := 1; -- data spaceinthe new file used so farreada(nix_handle,st,ln,loc); --readthe next index triplewhile not eof()loop--loopthru all the lines, which are nowinorder of increasing startsgets(ifis_rtthenrm_handleelsem_handleend if,st,ln,stg); --read fromthe old data fileputs(x_handle,writing_point,stg); -- write to the new data fileifis_rtthen-- note the new positionsinthe original hashtableputs(ht_handle,(oloc := loc) + 10,pts := pair_to_stg(writing_point,ln)); -- write therangepositionandlengthelseputs(ht_handle,oloc := loc,pts := pair_to_stg(writing_point,ln));end if; writing_point +:= ln; -- advance the write pointreada(nix_handle,st,ln,loc); --readthe next index tripleend loop;close(nix_handle); -- release the fileclose(file_handle); erase(nfn := master_name +ifis_rtthen".rt"else""end if); file_handle :=open(nfn,"RANDOM"); --nullthe old file preparatory to copying copy_file(x_handle,file_handle); -- replace the old data file by the new data file (copy)ifis_rtthenrm_handle := file_handle;elsem_handle := file_handle;end if;close(x_handle); erase("x.m"); erase("x.ix2"); -- erase the new data fileandthe sorted index fileifis_rtthenrangefile_wasted := 0; rangefile_size := writing_point - 1;elsedomfile_wasted := 0; domfile_size := writing_point - 1;end if;endhashfile_compress;procedurerebuild(new_size); -- rebuild hashtable at new size initialize_hashfile("x",new_size); -- initialize a hashfile of the new size -- now we transfer all the hashtable elementsfromthe old to the new table -- this is done by searching thru the old hash tableforall the non-empty elements --andinserting them into the newhash table at positions determined by their hash, -- which is availableinthe table entry. x_handle :=open("x.ht","RANDOM"); -- access the new tableforjin[1,31..30 * (hashtable_size - 1)]loopgets(ht_handle,j,30,hex_stg); --getthe entry [the_hash,-,dloc,dlen,rloc,the_rlen] := stg_to_hex(hex_stg); -- decode itifdlen = 0then continue;end if; -- bypass the empty entries newloc := hashentry_locate(x_handle,the_hash,new_size); -- locate the new-hashtable position of the entry -- this routine also attaches the new entry to theendof the appropriate new-hashtable chainputs(x_handle,newloc,hex_to_stg(the_hash,0,dloc,dlen,rloc,the_rlen)); -- write the new entry, which is lastinits chainend loop;close(ht_handle); -- to allow erasure erase(ht := master_name + ".ht"); -- finish by copying the new hashtable to the old,and thenerasing the new ht_handle :=open(ht,"RANDOM"); copy_file(x_handle,ht_handle); -- do file copy hashtable_size := new_size; -- note the new hashtable sizeclose(x_handle); erase("x.ht"); -- remove the auxiliary fileendrebuild;procedurecopy_file(from_handle,to_handle); -- file copy operation new_size :=fsize(from_handle);foroffsin[0,32768..fsize(from_handle) - 1]loop-- move by blocksof 32Kgets(from_handle,offs + 1,32768min(new_size - offs),stg);puts(to_handle,offs + 1,stg);end loop;endcopy_file;procedurehashentry_locate(handle,the_hash,sz); -- locate ansreturnthe new hashtable position of an entrywithspecified hash -- this routine also attaches the new entry to theendof the appropriate new-hashtable chain locn := orig_locn := 30 * (the_hashmodsz) + 1; -- starting pointforhash table searchwhilelocn /= 0loop-- search the hash chaingets(handle,locn,20,quad_stg); --getht entry at current location [prev_or_hash,chnex,next_or_domloc,dom_ln] := stg_to_quad(quad_stg); -- decode itifdom_ln = 0then-- location emptyiflocn = orig_locnthen-- the entry found is the firstinits hash chainifnext_or_domloc /= 0then-- there is a next; drop linkfromnextgets(handle,next_or_domloc,15,trip_stg); --getht entry at next location [-,-,next_of_next] := stg_to_trip(trip_stg); -- decode itputs(handle,next_or_domloc,trip_to_stg(prev_or_hash,0,next_of_next));end if;ifprev_or_hash /= 0then-- there is a prev; drop linkfromprevgets(handle,prev_or_hash,15,trip_stg); --getht entry at prev location [prev_of_prev,-,-] := stg_to_trip(trip_stg); -- decode itputs(handle,prev_or_hash,trip_to_stg(prev_of_prev,0,next_or_domloc));else--ifthere is no prev, this is the first element, so its next should become the new first_empty first_empty := next_or_domloc;end if;returnlocn;else-- the entry found isnotthe firstinits hash chain -- wereturnthe first empty locationinthe hashtable, attaching this to the hash chain item at locngets(handle,first_empty,15,trip_stg); --getht entry at first_empty location [-,-,next_of_first] := stg_to_trip(trip_stg); -- decode itgets(handle,next_of_first,15,trip_stg); --getnext ht entryinthe empty chain [-,-,next_of_next] := stg_to_trip(trip_stg); -- decode itputs(handle,next_of_first,trip_to_stg(0,0,next_of_next)); -- zero the 'prev' pointerinthe new first element empty_spot := first_empty; -- this is our 'empty spot' first_empty := next_of_first; -- note the new first emptygets(handle,locn,5,int_stg); --getfirst part of ht entry at locn, to do attachment elt_hash := int_of(int_stg); -- decode itputs(handle,locn,pair_to_stg(elt_hash,empty_spot)); -- write, thus attaching empty spot to hash chainreturnempty_spot;end if;end if;ifchnex = 0then--endof hash chain reached;notfoundgets(handle,first_empty,15,trip_stg); --getht entry at first_empty location [-,-,next_of_first] := stg_to_trip(trip_stg); -- decode itgets(handle,next_of_first,15,trip_stg); --getnext ht entryinthe empty chain [-,-,next_of_next] := stg_to_trip(trip_stg); -- decode itputs(handle,next_of_first,trip_to_stg(0,0,next_of_next)); -- zero the 'prev' pointerinthe new first element empty_spot := first_empty; -- this is our 'empty spot' first_empty := next_of_first; -- note the new first emptygets(handle,locn,5,int_stg); --getfirst part of ht entry at locn, to do attachment elt_hash := int_of(int_stg); -- decode itputs(handle,locn,pair_to_stg(elt_hash,empty_spot)); -- write, thus attaching empty spot to hash chainreturnempty_spot;end if; locn := chnex; -- advanceinsearchloop;end loop; -- should neverexitabort("should neverexit");endhashentry_locate;procedurecount_hashtable_empties(); -- examine the hashtable empty chain count := 0; cur := first_empty;whilecur /= 0loopcount +:= 1;gets(ht_handle,cur,15,trip_stg); [-,-,cur] := stg_to_trip(trip_stg); -- decode itend loop;endcount_hashtable_empties;procedurehashfile_locate(dbid,value); --findthe location of a valueina hashtable -- this returns the byte locationinthe hashtable of thedomainfield, a positive value,ifthe item is found; --otherwise0ifthe first location tried is empty; --otherwiseit returns -(the locationinht of the last hash-hain entry. -- the hashtble structure is hash,next_in_chain,dom_loc,dom_len,rng_loc,rng_lenforfull entries; -- prev,0,nextforempty entries the_hash := hash(bstr :=binstr(value)); -- convert the value to binary string formandhash it locn := orig_locn := 30 * (the_hashmodhashtable_size) + 1; -- starting pointforhash table search lbstr := #bstr; -- length of the binary stringwhilelocn /= 0loop-- search the hash chaingets(ht_handle,locn,20,quad_stg); --getht entry at current location [prev_or_hash,chnex,next_or_domloc,dom_ln] := stg_to_quad(quad_stg); -- decode itifdom_ln = 0then-- first location empty;notfoundreturn iflocn = orig_locnthen0else-locnend if;end if;ifprev_or_hash = the_hashthen-- hashesmatch, so we may have found the desired elementifdom_ln = lbstrthen-- lengthsmatch, so we may have found the desired elementgets(m_handle,next_or_domloc,dom_ln,stg); --getdata at location referencedifstg = bstrthen returnlocn + 10;end if; -- item foundend if;end if;ifchnex = 0then return-locn;end if; --endof hash chain reached;notfound locn := chnex ; -- advanceinsearchloop;end loop; -- should neverexitabort("should neverexit");endhashfile_locate;procedurefile_append(handle,val); -- append SETL value toendof file fs :=fsize(handle);puts(handle,fsp1 := fs + 1,bs :=binstr(val));return[fsp1,#bs]; --returnstarting pointandlength of stringendfile_append; -- ******** encoding utilitiesandhash fcn ********procedurehash(stg); -- hash a string into a 5-byte integerconstsmaller_prime := 1099511627689; -- 2 * 2 * 2 * 3 * 3 * 1487 + 1, the largest prime no larger than 1099511627776 = 256**5.constprimroot := 892417237418; -- a primitive root modulo this prime the_hash := 0;forcinstgloopthe_hash := (the_hash * primroot +abs(c))modsmaller_prime;end loop;return(the_hash * the_hash)modsmaller_prime;endhash;procedurestg_of(x); -- integer to 5-byte string stg := "";forjin[1..5]loopstg +:=char(xmod256); x /:= 256;end loop;returnstg;endstg_of;procedureint_of(stg); -- integerfrom5-byte string x := 0;forjin[5,4..1]loopx *:= 256; x +:=abs(stg(j));end loop;returnx;endint_of;procedurepair_to_stg(x,y); -- convert pair of integers to 10-byte stringreturnstg_of(x) + stg_of(y);endpair_to_stg;procedurestg_to_pair(stg); -- convert pair of integers to 10-byte stringreturn[int_of(stg(offs..offs + 4)): offsin[1,6]];endstg_to_pair;proceduretrip_to_stg(x,y,z); -- convert triple of integers to 15-byte stringreturnstg_of(x) + stg_of(y) + stg_of(z);endtrip_to_stg;procedurestg_to_trip(stg); -- convert pair of integers to 15-byte stringreturn[int_of(stg(offs..offs + 4)): offsin[1,6,11]];endstg_to_trip;procedurequad_to_stg(x,y,z,w); -- convert quad of integers to 20-byte stringreturnstg_of(x) + stg_of(y) + stg_of(z) + stg_of(w);endquad_to_stg;procedurestg_to_quad(stg); --getquad of integersfrom20-byte stringreturn[int_of(stg(offs..offs + 4)): offsin[1,6..16]];endstg_to_quad;procedurehex_to_stg(x,y,z,w,u,v); -- convert hex of integers to 30-byte stringreturnstg_of(x) + stg_of(y) + stg_of(z) + stg_of(w) + stg_of(u) + stg_of(v);endhex_to_stg;procedurestg_to_hex(stg); -- gwt hex of integersfrom30-byte stringreturn[int_of(stg(offs..offs + 4)): offsin[1,6..26]];endstg_to_hex;procedureerase(file); -- file erasureprocedureclose(open(file,"TEXT-OUT")); --openthe fileforwriting,and thenimmediatelycloseitenderase;