Control structures

Execution of a SETL program proceeds sequentially, one statement being executed after the other. In the simplest case, the order of execution is simply the order in which the statements are written in the program. For example, consider the following:

				a := 1;
				print("Initially, a = ", a);
				a := a + 1;
				print("Finally, a = ", a);

In this example, the variable a is assigned the value 1; then the first message is printed; a is then assigned the value 2; and finally the second message is printed.

Only the simplest computations can be carried out by such straight-line programs. In order to perform more complex calculations, we need to be able to describe conditional computations, i.e. computations that are to be executed only when certain conditions are met. We also need to program repeated computations, i.e. computations to be executed a number of times (100 times, or for all elements in a set, or until a certain calculation converges, or as long as a certain value has not been reached, etc.).

The order in which these more complex computations are executed is specified in the program text by means of language constructs commonly called control structures. In this chapter we will examine the most important control structures of the SETL language, namely: the if statement, case statement, for statement, while statement, and loop statement. The if , case, for, and some variant of the while constructs are commonly found in most modern programming languages and are regarded as the basic tools of "structured programming." The for construct in SETL is a bit richer than the loop constructs provided by most other languages, and it is specially tailored for the objects that characterize SETL, namely sets, tuples, and maps.

4.1 The if Statement

The if statement is used to route program execution along one of several alternate paths, chosen according to some stated condition. An example is

				if  balance > O then
					print("Your line of credit is:", balance);
					print("you are overdrawn by:", - balance);
				end if;

				print("Do you want additional information (y/n)?");

Here, the condition (i.e. whether the value of balance is positive or negative) determines which of two messages is printed. If the condition being tested is TRUE (i.e., the balance is positive) the statement following the keyword then is executed; if the condition is FALSE, the statement following the keyword else is executed instead.

After execution of the statements in either branch of the if statement, program execution continues from the first statement following the end of the if . In the preceding example, after execution of one of the branches of the if , the query "Do you want additional information(y/n)?" will be printed.

Any number (except 0) of statements can appear in either branch of an if statement. For example, we can write:

				if  line >= 50 then
					page := page + 1;
					line := 1;
					line := line + 1;
				end if;

In this case, if the condition 'line >= 50' is true, then the assignments to page and line are performed; otherwise,line is incremented. The syntax of the form of the if statement shown previously is

				if  condition then
					group of statements
					group of statements
				end if;

The construct 'condition' denotes any Boolean expression (see Section 2.2.4), i.e., any expression which yields either TRUE or FALSE. The group of statements in each branch designates any sequence of executable statements, which can be assignments, control statements such as other ifs, loops, etc.

The end of the if construct is indicated by the keywords end if.The following example illustrates the use of nested if statements and shows the way in which comments can be used to disambiguate the ends of nested if s.

			if  balance > O then

				print("Your line of credit is:", balance);


				print("you are overdrawn by:", - balance);

				if  line >= 50 then
					page := page + 1;
					line := 1;
					line := line + 1;
				end if;	-- end if line >= 50
			end if;	-- end if balance > O
Here is another such example:
			if  a /= 0 then

				if  b**2 > 4.0 * a * c then

					discr := sqrt(b ** 2 - 4.0 * a * c);
					print("r1 =",(-b + discr) / 2.0 * a);
					print("r2 =",(-b - discr) / 2.0 * a);

					print("Complex roots");
					re_part :=b / 2.0 * a;
					im_part := sqrt(4.0 * a * c - b ** 2) / 2.0 * a;
					print("r1 = ", re_part, "+ i",im_part);
					print("r1 = ", re_part, "- i",im_part);

				end if;	-- end if  b**2

				if  b /= 0 then
					print("Single root:",c/b);

					print("degenerate equation: a = b = 0");

				end if;	-- end if b /= 0

			end if;	-- end if a /= 0;

4.1.1 Omitting the else branch of an if statement

Sometimes we want to perform a series of actions when a certain condition is met, but to do nothing if it isn't. In this case it is possible to omit the else branch of an if statement, as illustrated in the following simple example:

if token notin keywords then print("Unrecognized operator:", token); end if;

If the condition is true, the statements following the then are executed; if the condition is false, this kind of if statement does nothing.

4.1.2 The null statement

For reasons of readability, it is often desirable to indicate both branches of an if statement, even if one of them is to do nothing. A -do nothing- statement is provided for this purpose. It is written as:


and causes no computation at all. This allows us to write the previous example as follows:

			if  token notin keywords then
				print("Unrecognized operator:", token);
			else null;
			end if;

This can also be expressed as

			if  token in keywords then
				print("Unrecognized operator:", token);
			end if;
Since SETL does not allow empty 'if ' statements (or loop statements), the null statement must be used whenever an 'if ', with its condition, is written and compiled before its body is filled in. In this case, the null statement stands as a surrogate for the body of statements to be filled in later.

4.1.3 Multiple alternatives in an if statement

We often encounter the following programming situation: when the condition of an if statement is false, we immediately perform another test to choose among another pair of alternatives, and so on. This can be expressed by means of nested if s but can be more clearly stated by "continuing" the if statement by means of a special construct to designate subsequent alternatives. In SETL, this is done using elseif, as shown in the following example:

		if  month = "February" then

			if  year mod 4 = O and year mod 200/= 0 then
				days := 29;
				days := 28;
			end if;

		elseif month in {"September","April","June","November"} then
				days := 30;
				days := 31;
		end if;

Here, three alternatives are being examined: whether the month is February, or is one of the 30-day months, or is one of the remaining months. Any number of alternatives can appear in this more general if construct, whose syntax is

Figure 4.1 if_Statement Syntax Diagrams

as follows (see Figure 4.1):

				if  condition then
					group of statements
				elseif condition then
					group of statements
					group of statements
				end if;

Note the important syntactic point:

-elseif is a single word, and it indicates an alternate test within the current if statement.

-else if, on the other hand, indicates that within the else branch of the current if statement, a nested if statement is present, which will need its own closing end. Be warned: if you use else if when elseif should be used, syntax errors, namely "missing end" messages, will result.

4.1.4 An important note on indentation and programming style

The physical layout of a SETL program on a printed page (or the screen) is of no concern to the SETL compiler. As long as the syntax of the language is obeyed, the user is free to write successive SETL statements with bizarrely varying indentation, to place several statements on the same line of text, etc. For the human reader, on the other hand, a good choice of program layout can make all the difference between clarity and hopeless muddle. This is particularly true when a program needs to to be read and understood by several programmers. Proper indentation should reflect program structure in such a way as to serve as additional implicit documentation on the intent of a program. The following maxim should be kept in mind: Programming is a social activity. If the programs you write are of any interest, there is a high likelihood that somebody else will want to examine them, so as to extend, modify, or simply understand their workings. (Often enough, this somebody else may be you, going back to a program written months before, trying to recapture the thought processes that led you to various programming decisions.) In other words, a program must be seen as a tool for communication, not only from programmer to computer, but also among programmers. From this perspective, it is easy to see that good indentation and program layout, helpful choice of variable names, and ample, carefully considered documentation, are hallmarks which distinguish the professional programmer's work from that of the amateur.

In the case of if statements, it is natural to regard the group of statements in each branch of the if as subordinate to the condition which introduces them. This is clearly reflected in the text if we indent the statements in each branch, with respect to the if and else keywords, as was done in the preceding examples. An additional rule to follow is to place the else in a line by itself, unless the corresponding branch reduces to a single short statement (for example: null;). The examples in this text follow these rules, as well as other ones which we will mention in connection with other control structures. As is usually the case for rules of style, these should only be regarded as guidelines and suggestions, to be tempered by individual taste. However, some consistent indentation and paragraphing style should be chosen.

4.1.5 The if expression

An if statement often is used to assign one of several values to a given variable. For example, one may write

if a > b then maxab := a; else maxab := b; end if;

In this case, the if expression (also called conditional expression) provides a clearer way of achieving the same intent. The syntax of an if expression is similar to that of the if statement (Figure 4.1). It denotes a value that depends on the outcome of a test (or tests). The general syntax of an if expression is

the following:

					test 2       				           (1)
					expr2 .....
				end if

This construct may be used in any position where an expression of any other kind would be acceptable. For example the if statement (1) can be written as

	maxab := if  a > b then a else b end if;		(2)

The following are also valid examples of if expressions:

	print(if  filler = "" then "***" else filler + "*" end if);
	print((if  filler = "" then "**" else filler end if) + "*");
		distance := distance + (if  edge = OM then 0
			else length(edge) end if);

Unlike the if statement, the if expression must obey the following rules:

  1. In an if expression, an else part must always be present (to ensure that the expression has a value in all cases).

  2. The terminator of an if expression must be a simple end if.

  3. There is no semicolon preceding the keywords elseif and else in an if expression. This is because these keywords are preceded by expressions. In contrast, these same keywords are preceded by semicolons in an if statement, because in that case a semicolon terminates the statement previous to the keyword.

If expressions can be nested, as the following rewriting of our "days in the month" example shows:

days := if  month = "February" then
		(if  year mod 4 = O and year mod 200 /= 0 then 29 else 28 end if)
	elseif month in {"September","April","June","November"} then 30
	else 31
	end if;

Figure 4.3 Case-of Statement Syntax Diagrams

4.2 The case Statement

The case statement is a generalization of the if statement. Whereas the if statement controls the flow of execution of a program by choosing between two alternatives, the case statement allows us to choose among any number of alternative paths of execution (see Figure 4.3). The case statement is available in two forms. Of these, the first and most general is

					when test1 => statement1
					when test2 => statement2
					when test3 =>: statement3
					when testn => statementn
					otherwise => statemente
				end case;

Each of statement1, statement2.. and statemente must be a sequence of one or more statements. Each of the expressions testl, test2.. must be a Boolean expression. Execution of this form of the case statement proceeds as follows:

  1. The expressions test1, test2..are evaluated. If one of them, say testi, yields true, then the corresponding statement, i.e. statement i, is executed, and then execution proceeds to the first statement that follows the case statement. If several of the expressions test1, test2.. evaluate to true, then any one of them is chosen and the corresponding statement is executed. The case statement thus differs from a similar sequence of if and elseif statements, where the tests are made in sequence.

  2. If none of the tests evaluates to true, then statemente, which follows the else clause of the statement, is executed. This else clause is optional. If the else clause is absent, and none of the tests in the case statement evaluates to true, the case statement is simply bypassed, and execution continues with the first statement that follows it.
In this case, statementn is a candidate for being executed if any one of the tests testl, test2.. yields true.

As a first example of the use of a case statement, the following SETL fragment calculates the volume of various geometric figures:

			when figure = "cube" =>
				volume := side ** 3;
			when figure = "sphere"=>
				volume := (4/3) * PI * radius**3;
			when figure = "cylinder"=>
				volume := PI * radius**2 * height;

		otherwise =>

				print("Sorry, I don't recognize this figure");
				volume := 0.0;
		end case;

As this example shows, it is quite common for the tests in a case statement simply to test a particular variable or expression for equality with a series of constants. The following second form of the case statement simplifies the writing of case statements for which this is true (see Figure 4.4):

		case expr

			when constant1 => statement1=>

			when constant2 =>  statement2

			when constantn => statementn

			otherwise => statemente

		end case;			

Figure 4.4 case-expr Statement Syntax Diagram

The expression in the header is evaluated (once) to give a test value. If the evaluation yields one of the constants prefixed to a branch of the case, say constantk, then the associated statement sequence is executed. The otherwise sequence is executed if the value of expr does not appear as the prefix of any branch of the case statement. The otherwise sequence can be omitted if no action is to be taken when this happens. In this case statement form, multiple constants can be attached to one branch by writing

when constant1, constant2,..., constantn => statements

If this is done, the block will be executed if the value of the expression in the case header equals any of the values constant1,..., constantn.

4.2.1 The case expression

One will sometimes want to use a case construct simply to assign one of several alternative values to a variable. This can be done with a case statement, for example:

		case day

			when Sunday => discount := 0.6;

			when Saturday => discount := 0.4;

			when Monday, Tuesday, Wednesday, Thursday, Friday =>
				discount := 0.0;
		end case;

In this example, the purpose of the case statement is simply to assign an appropriate value to the the variable 'discount'. In such situations the case expression can be used instead. A case expression can appear wherever an expression can appear. Its syntax can be that described by either of the syntax diagrams in Figure 4.5.

Evaluation of a case expression closely resembles that of the case statement. The execution of a case expression of the form (1) proceeds as follows:

  1. The expression following the case keyword is evaluated, yielding some value v.

  2. If v equals the value of one of the constants that mark a branch of the case expression, then the value of the expression tagged by that constant is the value of the case expression.

    Figure 4.5 case Expression Syntax Diagrams

  3. If none of the constants equals v, then the value of the expression following the keyword else is the value of the case expression.

Using this construct, the preceding example can be rewritten as follows:

			discount := case day
					when Sunday => 0.6,
					when Saturday=>  0.4
					otherwise=> 0.0
				end case;

Note that a comma is used to separate successive alternatives of the case expression and that no comma appears before the else keyword.

The second form of the case statement has no expression following the keyword case, and in it each case is marked by a list of expressions, each of which must yield a Boolean value. The value of such a case expression is the value of the expression tagged by a value of TRUE.

4.3 Loops

Almost every program involves some iteration. Whenever we need to deal with aggregates of data (all the books in a catalog, all the students in a class, all the prime numbers less than 1000, etc.) we probably need to specify that some computation is to be performed repeatedly. For example, we may want to do the following:

  1. List all the members of a set (for example, all the students registered in a given course).

  2. Modify each component of a tuple (for example, discount all entries in a price list by 10 percent).

  3. Modify selected members of a tuple, for example, raise by 6 percent the tax charged to every Texas resident appearing in a tuple, while leaving unchanged the taxes paid by residents of other states.

    We may even want to perform an action repeatedly when no data aggregates are involved. For example:

  4. Perform a series of actions a stated number of times (e.g., print the string -*-*-*-*-*-*- 10 times).

  5. Perform a series of actions as long as certain condition is true (e.g., to estimate the logarithm (base 2) of a number, we can divide it repeatedly by two as long as the result is greater than 1 and count the number of times the division is performed).

  6. Perform a series of actions until some condition is met (e.g., read input data until an end-of-file is detected).

The first three types of looping are expressed in SETL by using set and tuple iterators. Iterations of type (d) are expressed by using numeric or 'for loop' iterators. Types (e) and (f) correspond to while and until loops, respectively.

We now start our review of these various loop constructs, beginning with the simplest and most "natural" ones: the set and tuple iterators. We have already encountered various iterator forms when we discussed tuple and set formers. We will now examine these iterators in greater detail.

4.3.1 Set iterators

The set iterator is used to specify that a certain calculation is to be performed for each of the elements in a given set. In its simplest form, it reads as follows:

		for x in S loop

			sequence of statements       			    (1)

		end loop;

The meaning of (1) is as follows:

  1. Obtain the elements of the set S in succession.

  2. Perform the sequence of statements repeatedly once for each element of S.

  3. During successive iterations of the loop, assign the value of successive elements of S to variable x.

For example, suppose that S is the following set:

{"Springfield", "Albany", "Sacramento", "Boston"}

Then the loop

			for city in S loop
				print(city, " is a state capital.");
			end loop;

will produce the following output (in some order):

			Springfield is a state capital.
			Albany is a state capital.
			Sacramento is a state capital.
			Boston is a state capital.

The variable x in the construct "x in S" is called the bound variable of the iterator, or simply the iteration variable or loop variable. As you can see from the example, its name is arbitrary. We chose to call it "city" in this case but we could have called it "c", or "capital_city", or whatever; i.e., exactly the same output would have been obtained with the loop

for c in S loop print(c, " is a state capital."); end loop;

Each time the sequence of statements (also called the loop body) is executed, the bound variable is assigned the value of another element of S. The loop body is executed exactly as many times as there are elements in S. When all elements of S have been dealt with, the program moves on to execute the statements that follow the end of the loop.

Consider the following example:

		Fib13 := {1,1,2,3,5,8,13,21,34,55,89,144,233};
		count := 0;

		for n in Fib13 loop
			if  n mod 3 = 0 then
				print(n, " is a multiple of 3");
				count := count + 1;
			end if;
		end loop;

		print("There are ", count," multiples of 3 in Fibl3");

The purpose of this short code fragment is to list the multiples of 3 that appear in the set Fib13 (which happens to be the set of the first 13 so-called Fibonacci numbers). Each element of Fib13 is tested for divisibility by 3 and printed if the test succeeds. A count is kept of the multiples of 3 that we encounter, and this count is printed at the end. The output of this program is

			144 is a multiple of 3
			21 is a multiple of 3
			3 is a multiple of 3
			There are 3 multiples of 3 in Fib13

You may be surprised by the order in which the numbers 3,144, and 21 appear in the output. Why are they not listed in the same order as in the set Fib13? The reason is of course that sets have no particular ordering, and when we iterate over a set, we don't know in what order its various elements will be obtained. All we know is that we will obtain all of them, in some order, and that is all that matters. (When order matters, we must use tuples instead of sets. More about this later).

The bound variable that appears in a set iterator gets its values from successive elements of the set over which we iterate. When the iteration is complete, that is to say, when all elements of the set have been assigned to the loop variable, the loop variable gets the value OM. The following loop

			for number in {1,3,10} + {15,30} loop
				print("number is:", number);
			end loop;

			print("Now number is:", number);

produces the output:

			number is: 3
			number is: 1
			number is: 15
			number is: 10
			number is: 30
			Now number is: <OM>

Note two points about this example:

  1. We can iterate over any expression whose value is a set (i.e., the expression does not have to be a simple variable).

  2. OM, the undefined value, is printed as <OM>.

The reason for calling the loop variable a bound variable should be clear: the values taken by the loop variable are controlled by the iteration mechanism; its successive values correspond to the elements of the set over which we iterate, taken in an order over which the programmer has no control, namely the order in which the elements are stored internally.

The inquiring reader might wonder about the effect of making an explicit assignment to the bound variable in the middle of an iteration or even of making an assignment to the set over which the iteration takes place (the domain of iteration). SETL does not forbid such assignments, although such a practice is not likely to be useful. The SETL system is protected in that such assignments do not affect the way in which the iteration proceeds. For example, consider the following fragment:

				s := {1,2,3};
				for x in s loop print(x); end loop; 

This produces the output


in some order. Now consider:

				s := {1,2,3};
				for x in s loop 
					x := 10 * x; 
					s with := x;
				end loop; 

The output of this fragment will be


The assignments to x did not affect the iteration, nor did the modifications to s. It is only on exit from the loop that the modifications are affected. During the iteration, the domain of iteration is kept constant. Conditional set iterators

Consider the following problem: the holdings of a library are described by means of a set catalog and a series of maps: author, subject, and so on. We want to list those books in the catalog whose subject is calculus. This can be achieved by means of the program fragment:

			for book in catalog loop
				if  subject(book) = "calculus" then
				end if;
			end loop;

The same effect is achieved by the following code:

			for book in catalog | subject(book) = "calculus" loop
			end loop;

The vertical bar: "|", already introduced in Section 3.5.3, is read "such that", so that the last iterator can be expressed in English as follows: "iterate over the elements of catalog which are such that their subject is "calculus."" In other words, the "such that" construct appearing in a conditional iterator allows us to specify an iteration over a specified subset of a given set (Figure 4.6).

The general form of the conditional iterator is the following:

			for name in set expression | Boolean condition loop
			end loop;

In this construct, Boolean condition designates any Boolean expression, i.e., any expression that yields either true or false as its value. The meaning of this construct can be stated as follows:

  1. Iterate over the elements of set expression and assign the successive values of these elements to name.

  2. After each of these assignments, evaluate the Boolean condition. If the condition yields true, perform the list of statements. Otherwise, skip directly to the next value of set expression.

Figure 4.6 Simple Iterator Syntax Diagrams

Typically the iteration variable will appear in the Boolean condition. This is shown in our previous example.

However it is possible, though inelegant, to write a conditional iteration whose Boolean condition does not depend on the iteration variable. For example:

for x in S | true loop

is equivalent to:

for x in S loop

because the Boolean condition is true for all elements of S.

The following iteration is slightly less artificial than the preceding example:

for x in S | flag loop

where -flag- is some Boolean variable. It selects the elements of S according to the current setting of -flag-. This variable may be set elsewhere in the program, perhaps in the body of the iteration loop. However the intent of (2) is expressed more clearly by the equilavent code

				if  flag then
					for x in S loop
				end if

which should be preferred to (2) on stylistic grounds.

4.3.2 Tuple iterators

Iterations over tuples can be described in exactly the same manner as iteration over sets. That is, they can be given the following form:

		for name in expression | Boolean condition loop
		end loop optional tokens;

If expression is a set expression, the loop is a set iteration. If expression yields a tuple, it is a tuple iteration. One significant difference between set and tuple iterators is that for the latter we know the order in which the components of the tuple will be examined by the iteration. Namely, they are produced in order of increasing index. For example,

	width := [1, 3, 5, 7, 9, 2, 2];
	for w in width loop print(w * "*"); end loop;
always produces the following output:


In this example, the iteration variable w takes on the values of the components of the tuple -width-, exactly in the order in which they occur: first 1, 3, 5, 7, 9, and finally 2, 2. (Question: what would the picture look like if we had defined width as {1, 3, 5, 7, 9, 2, 2}?)

If a Boolean condition is present, the tuple iterator obeys the same rule as the set iterator: the body of the loop is executed only for those tuple components for which the condition yields true.

4.3.3 String iterators

An iteration over a character string is specified in exactly the same manner as an iteration over a tuple. The following example illustrates this.

		no_vowels := "";
		for c in "antidisestablishmentarianism" | c notin "aeiou" loop
			no_vowels +:= c;
		end loop;

The output of this loop is the string: "ntdsstblshmntrnsm"

The action of a string iterator is similar to that of a tuple iterator: successive components (in this case, characters) are assigned to the loop variable, and the body of the loop is executed for those values of the loop variable that satisfy the stated Boolean condition. The characters are iterated over in the order in which they appear in the string, from left to right.

4.3.4 Numerical iterators

An iterative computation is often expressed as follows: "Repeat the following calculation N times". Here the iterative process does not depend on a data aggregate, such as a set or a tuple, but rather depends on an integer, namely the value of N. This is the iterative construct most commonly supported by programming languages. In SETL, this type of iteration is expressed by a simple variant of the tuple iterator: performing a computation C repeatedly N times is equivalent to performing C once for each one of the integers in the range: 1, 2, 3..up to N. This range of values is described in SETL by writing

[1.. N]

and thus the repeated computation of C is expressed as follows:

				for i in [1..N] loop
				end loop;

The construct [1..N] looks like a tuple former, and indeed in contexts where a tuple is permissible, it is a valid tuple expression, as we saw in our discussion of tuple formers (Section 3.6). In an iteration this construct designates the range of values taken on by the loop variable in the course of the iteration. Note that an iteration variable appears here, just as it did in set and tuple iterations. This variable takes on the values specified by the range construct, in the order indicated, which is to say from 1 up to N in steps of 1.

Because of the importance of numerical iterators in programming, SETL provides a still more general form to describe them. We now proceed to explain this more general numerical iterator form. The general form of the numerical iterator

Any numerical iterator defines the sequence of integer values to be taken on by the iteration variable of a loop. The simple iterator form given previously specifies that the beginning (or lower bound) of the iteration is 1, and the iteration end (or upper bound) is N. The step between successive values of the sequence iterated over is 1. In a more general numerical iterator, these three quantities-lower bound, upper bound, and step-can be specified individually. To do so, use the following construct:

			[first,second..last]				 (3)
where first, second, and last are integer-valued expressions. For example,
       [1,3..9]  specifies the sequence 1, 3, 5, 7, 9
       [2,5..17] specifies the sequence 2, 5, 8, 11, 14, 17

As these examples indicate, the sequence iterated over is calculated as follows:

  1. The lower bound is the first expression in the iterator.

  2. The step between successive elements is the difference between the second expression and the first. If the second expression is missing, then, as in the examples of Section 4.3.4, the step is understood to be 1.

  3. Successive elements of the sequence are produced by repeatedly adding the value of the step, until we reach a value exceeding the last expression.

    (But see the following discussion.)

This description immediately raises three questions:

  1. What happens if the step is negative?
  2. What happens is the upper bound is not in the generated sequence?
  3. What happens if the step is zero?

The answer to (i) is what you would intuitively expect, namely, if the step is negative, then the elements of the sequence are produced in decreasing order. In that case, the third expression must be smaller that the first. For example, the iterator


specifies the sequence 10, 8, 6, 4, 2, 0 because the step is 8-10 =-2.

This form of the iterator is often used when the elements of a tuple must be processed in reverse order. For example, suppose that the elements of tuple T are numbers sorted in increasing order, and we want to list them in decreasing order, starting from the largest. The following loop will accomplish this:

		for i in [#t, #t - 1..1] loop
		end loop;

In this example, the first element of the sequence is given by the expression #t; the first value of the iteration variable i is therefore the index of the last element of t. The next value is #t - 1, from which we conclude that the step for this sequence is -1. The last value iterated over is 1.

Next consider the second question raised, namely, what if the final value appearing in the construct [first,second..Iast] is not in the generated sequence? For example, what is the sequence generated by the following iterators:




The answer to this question is determined by the following rule: a sequence iterated over is generated by successive additions of the step to the first element. If the sequence is increasing (i.e., if the step is positive) we generate all numbers in the sequence which are smaller than or equal to the last element. If the sequence is decreasing we generate all the numbers that are larger than or equal to the last element. Thus, for example

		[1,3..10]  specifies the sequence 1, 3, 5, 7, 9
		[15,10..1] specifies the sequence 15, 10, 5

What about [1,3..1]? According to the rule just stated, we start with 1. The step is 2. The next value in the sequence would be 3, but that is already greater than the stated upper bound of 1. Thus this iterator generates a singleton sequence, whose only element is 1. This leaves one final question: what is the meaning of the iterator if the step of the sequence is zero? In that case, the convention used by SETL is that the iteration is empty, i.e., iterates over no values at all. A loop whose iterator has a step of zero is simply not executed. The following are examples of empty loops:

		for i in [1,1..1000] loop 
			print("This message will never be seen");
		end loop;
		for x in {} loop 
			print("Nor will this one because {} has no elements");
		end loop;
		for i in [] loop 
			print("Need we say more");			
		end loop;

The previous rule also answers another lingering question: what is the value of the loop variable on exit from a numerical loop? We saw that in the case of set and tuple iterators, the loop variable became undefined on exit from the loop. In the case of numeric iterators, the value of the loop variable on exit is the last value in the sequence:

first, first + step, first + 2 * step,..

that lies inside the specified range. If the step is positive, this means the last value in that sequence that is no larger than the stated bound; if the step is negative, it is the last value which is no smaller than the bound.

4.3.5 Additional loop control statements: continue and exit

The continue and exit statements increase the syntactic flexibility of SETL's loop constructs. Their syntax is simply




The actions caused by continue and exit are as follows.

When a continue statement is executed in the body of a loop, execution of the rest of the body is skipped, and the iteration proceeds to the next value of the loop variable. Thus, the loop

		for x in S | C(x) loop
			some statements...
		end loop;

can be expressed as follows:

		for x in S loop
			if C(x) then
				some statements..
			else continue;
			end if;
		end loop;

Execution of a exit statement terminates the execution of a loop and causes execution to continue from the first statement following the end of that loop.

For example, consider the following fragment:

		sum := 0;
		for x in [1..100] loop
			sum := sum + x;
			if  sum > 10 then exit; end if;
		end loop;

This code fragment adds the integers in the range 1..100 until the sum obtained is greater than 10. After five iterations through the loop, sum is 1 + 2 + 3 + 4 + 5 = 15, and at that point the exit statement is executed. The value printed is 15, and the 95 iterations that remain are simply not executed.

continue and exit statements always refer to the innermost loop within which they appear.

The continue statement might be typically used in a search loop, when an object x satisfying a property C(x) is to be found in some data aggregate S, and then processed in some way. When so used, the body of the loop will include code that tests each element of S for the property C. It may be the case that we can determine that a given element y of S does not have the property C, without completing execution of the loop body. In that case, the continue statement allows us to avoid processing y and to proceed to the next element of S. We will see examples of such use later.

Like the continue statement, exit also typically appears in search loops. However, whereas continue bypasses unsuccessful cases, exit is used to signal that there is no need to continue with the iteration, either because the search has been successful or because it has become clear that the search will remain unsuccessful even if the remaining elements are examined.

To illustrate the use of these statements let us return to the problem of producing a table of prime numbers. This time, we will write our program as a series of loops. Moreover, we will start with a simple solution to the problem and improve this initial solution in order to develop more and more efficient versions of it. Our initial solution simply restates the definition of prime number: it is a number that has no factors except 1 and itself. In order to determine whether N is prime, we divide N by all numbers smaller than itself. If any of these divisions turns out to have no remainder, N is not prime, and we do not need to continue examining other divisors. If no division is exact, N is prime. Our first version reads as follows:

program primes1;

	N := 1000;			-- The desired range.
	primes := [];			-- Sequence to be constructed.

	for num in [2..N] loop			-- Examine all numbers in the range
		factor_was_found := false;		-- no factor was found yet
		for factor in [2..num - 1] loop -- Range of its possible divisors

			if  num mod factor = 0 then	-- num has an exact divisor. Skip it.
				factor_was_found := true;	-- note that a factor was found
				exit;		-- having found a factor, we need search no longer
			end if;

		end loop;
		if  factor_was_found then continue; end if;	-- skip all non-primes
		primes with := num;			-- If we reach this point, num is a prime.

	end loop;

	print("Primes in the range 1 to ", N, ":");

end primes1;

This simple program involves many redundant calculations, which we now proceed to discover and remove.

First, note that an even number (with the exception of 2) cannot be a prime number. There is therefore no need to iterate over all numbers in the range [2..N]. It is sufficient to consider only the odd numbers in that range. By the same token, these numbers can have only odd divisors. The outer loop should therefore have the range

for num in [3,5..N] loop

and the inner one

for factor in [3,5..num - 1] loop

This modification of the initial program makes it four times faster (only half as many operations are performed during each of the two nested levels of iteration).

Next, note that to determine whether -num- is prime, we do not need to examine all its possible divisors: it is sufficient to examine its possible prime divisors, i.e., all prime numbers smaller than it. In fact, we only need to examine all the primes which are no greater than the square root of -num-, since if num is not prime, at least one of its factors must be a prime no larger than this square root. If we modify the inner loop accordingly, we obtain the following program:

		program primes2;

		N := 1000;  -- The range.
		primes := [2];  -- The first prime.

		for num in [3,5..N] loop
			factor_was_found := false;		-- no factor was found yet

			for factor in primes loop

				if  num mod factor = 0 then
					factor_was_found := true;	-- note that a factor was found
					exit;	-- having found a factor, we need search no longer
				end if;

				if  factor * factor > num then exit; end if;
					-- need not test larger factors

			end loop;

			if  factor_was_found then continue; end if;	-- skip all non-primes
			primes with := num;	-- collect all primes

		end loop;

		print("primes in the range 1 to ", N, ":");

		end primes2;

Our next improvement generalizes the observation that allowed us to eliminate all even numbers from consideration. Numbers of the form 6n, 6n + 2, and 6n + 4 can never be primes, since they are even. Numbers of the form 6n + 3 can never be primes, since they are divisible by 3. So we need only examine numbers of the form 6n + 1 and 6n - 1. This leads us to an improved program which reads as follows:

		program primes3;

		N := 1000;  -- The range.
		primes := [2,3];  -- The first two primes.
		for num_part in [6,12..N] loop	-- advance by steps of 6

			for offset in [-1,1] loop		-- to try 6n - 1, and then 6n + 1
				num := num_part + offset;		-- try 6n - 1, and then 6n + 1
				factor_was_found := false;		-- no factor was found yet

				for factor in primes loop

					if  num mod factor = 0 then
						factor_was_found := true;	-- note that a factor was found
						exit;	-- having found a factor, we need search no longer
					end if;

					if  factor * factor > num then exit; end if;
					-- need not test larger factors

				end loop;	-- end of the 'for factor' loop

				if  factor_was_found then continue; end if;	-- skip all non-primes
				primes with := num;	-- collect all primes

			end loop;	-- end of the 'for offset' loop

		end loop;	-- end of the 'for num_part' loop

		print("primes in the range 1 to ", N, ":");

		end primes3;

4.3.6 Map iterators

We have emphasized repeatedly that maps are sets. Hence to iterate over all the elements p of a map f we can simply write

for p in f loop...

In this iteration, the bound variable p is assigned successive elements of f, which are all ordered pairs. If within the body of such a loop we wanted to refer to successive elements in the domain of f, we could "unpack" p by writing

		for p in f loop 
			x := p(1); 	-- x is in the domain of f
			y := p(2); 	-- y is the corresponding point in the range.

This same unpacking effect could also be obtained either by placing a tuple assignment of the form

[x,y] := p;

(see Section 3.12) at the start of the body of the iteration or by changing the iteration header itself to read

for [x,y] in f loop ...

Because of the importance of this type of iteration a still more elegant, map-like alternative notation is provided for it, namely,

         for y = f(x) loop      					           (5)

This form of iterator is called a map iterator. Note that both the variables and y are bound by this iterator: x receives successive values taken from the domain of f, while simultaneously y is set to the corresponding range value f(x). For example, suppose that f is the following map:

	{	["New York", "Albany"], ["California", "Sacramento"], 
		["Massachusetts","Boston"], ["Illinois", "Springfield"], 
		["North Dakota", "Fargo"], ["Idaho", "Boise"]} 

and that mid_west is the set:

{"Kansas", "Illinois", "South Dakota", "North Dakota", "Michigan", "Iowa", "Nebraska"}

then the following loop:

		for capital = f(state) | state notin mid_west loop
			print("The capital of ", state, " is ", capital); 
		end loop;

will have the following output:

		The capital of New York is Albany
		The capital of California is Sacramento
		The capital of Idaho is Boise
		The capital of Massachusetts is Boston

The syntax appearing in (5) can also be used for tuple iterators. If T is a tuple, then the iterator

for comp = T(i) loop

assigns the integer values 1,2..#T to -i- and simultaneously assigns the values of the corresponding components of T to comp. The advantage of this form over the simple tuple iterator is that it makes the index of each component available at the same time as the component. (The use of a syntax like that of map iterators for tuple iterators once again underlines the logical similarity between tuples and maps: tuples are very similar to maps whose domain is a set of integers).

The iterator (5) can be used only for single-valued maps, and the system will generate a run-time error if we attempt to use it on a multivalued map. To iterate over a multivalued map, the following form is provided:

			for s = f{x} loop      				          (6) 

Like (5), this construct, sometimes called a multivalued map iterator, controls both the values of x and s. The variable x receives successive values from the domain of f, and s becomes the corresponding image set of x, that is to say, f{x}. For example, let f be the map

{[i, j]: i in [1..4], j in [1..4] | i > j}

Then the iteration

		for s = f{x} | odd(#s) loop
			print(s, " is the image of ", x); 
		end loop;

will produce the following output:

		{1,2,3} is the image of 4
		{1} is the image of 2

4.3.7 Compound iterators

A compound iterator is a useful shorthand notation to describe a simple but commonly occuring nested iteration loop structure. For example, the code fragment

		for x in S1 loop
  			for y in S2 loop
			end loop;
		end loop;
can be written as follows:
		for x in S1, y in S2 loop
		end loop;

Any number of nested loops can be combined in this fashion. A single end statement closes all of them. The iterators in a compound iterator are under stood to be nested from left to right. The rightmost iterator in the compound is the innermost; its loop variable changes most rapidly.

All iterator forms can appear in a compound iterator: set and tuple iterators, numeric iterators, map iterators. For example, you can write

for x in S, y in [1.. x - 1], z = f(t) loop ....

Compound iterators can also have a such that clause. Such a clause is understood to apply to the innermost iterator in the compound; that is to say, this clause is evaluated for every assignment to the innermost loop variable.

Continue and exit statements appearing within a compound iterator application always refer to the outermost iteration of the compound iterator: there is no way to continue of quit any of the inner members of the compound. (If it is necessary to do so, the iterators must be written in the equivalent nested bu uncompunded form).

4.3.8 Other kinds of loop constructs

Each of the iterators discussed so far generates a sequence of values: the successive elements of a set, the components of a tuple, the characters of string. We have seen how iteration loops are described by means of successive iterators: the body of a loop is executed once for each value that appears in the generated sequence. Different kinds of loop constructs, called while and until loops, are used to describe computations that repeat until a desired state of affairs is reached, rather than according to some preset sequence of values. For example, we may want to process input data which are to be read from a file, but we may not know how many items are actually present in the file. In this case, we need to express the following intent: "Process the input as long as there is data to process." A second example is furnished by numerical analysis. Many numerical problems have the following general flavor: find a sequence of better and better approximations to a desired value (for example, to the root of an equation) and stop when the answer is "close enough". (Close enough usually means that rather than looking for an exact answer, we are satisfied with an answer which differs from the exact one by a very small number, say 1.0 E-7.) In these cases, we generally cannot state in advance how many times the loop body may have to be repeated. SETL provides several loop constructs for dealing with such common situations. The simplest of these is the indefinite loop, whose syntax is as follows:

			end loop;

The indefinite loop is not used very often, because ordinarily the condition under which it will terminate execution can be expressed more clearly by using one of two much more generally used loop forms, namely the while and the until loop. Let us now examine these. The while loop

A while loop is written as follows:

		 while condition loop
		end loop;

Execution of such a loop proceeds as follows:

The condition is evaluated. If its value is TRUE, then the loop body is executed. After each execution of the body, the condition is evaluated again, and as long as it yields TRUE, the body continues to be executed. As soon as the condition becomes FALSE, looping ends, and execution proceeds with the first statement that follows the loop.

If the first evaluation of the condition yields FALSE, then the loop body is not executed at all. It follows that a while loop can be executed zero or more times.

Let us look at some examples. The processing of a stream of data received from input is a typical case. Suppose that we want to read a list of names and print those that start with A. We do not know the number of items in the data stream, and it may even be that there are none. Fortunately, the SETL system uses a very simple convention to indicate that data have been exhausted. When we attempt to read data from a file but have reached the end of the file, the read statement yields OM. Thus, the following simple code fragment can be used to handle a stream of input data and stop when the end of the data has been reached:

	file_to_read := open("test_file","TEXT-IN");		-- open a file to allow reading
	reada(file_to_read,name);        -- Get first name from the file. 
	count := 0;

	while name /= OM loop	-- As long as we read something

		if  name(1) = "A" then
			count +:=1;
		end if;

		reada(file_to_read,name);	-- Acquire next data item from input file.

	end loop;

	close(file_to_read);		-- release the file for subsequent use
	print(count, " names starting with A were found");

(The open, reada, and close statements seen here operate respectively as follows: open makes the desired data file available for reading; reada reads one data item from the file; close releases the file, so that other problems can read it subsequently. Section XXX details the conventions that apply to these operations.)

Note that in this code we execute one reada statement before the loop, to "prime" the loop, so to speak. Doing this ensures that -name- receives a value before the first evaluation of the while condition. If the input file was not empty, then -name- is not OM, and the body of the loop is executed. If the input file was empty, then -name- is OM, and the loop is bypassed altogether. At the end of each execution of the loop body, we perform another reada operation. As long as something is read, the loop will be executed again. As soon as the stream of input data is exhausted, the reada statement will yield OM, the while condition evaluates to false, and execution of the while loop will terminate. Program execution will then proceed to the statement following the loop, which in the preceding case is the one that prints the little statistical report on the data.

Our next, more complex example is motivated by the following practical problem. Suppose that the catalog of a school specifies a set of prerequisites for each course that is offered. That is to say, for each course C, it specifies a set of courses which the student must take before being allowed to take C. Needless to say, the prerequisites of C often have further prerequisites of their own, and we will sometimes want to know the full set of courses that have to be taken before C is tackled. These include the prerequisites of C, the prerequisites of those prerequisites, and so on. Let us assume that the map -prerequisites- contains the standard information that appears in a school catalog, that is to say, the list of immediate prerequisites of each course C. Then the desired set can be obtained as follows:

P := prerequisites{C};		-- get the "immediate" prerequisites for the course

all_P := P;			-- initialize the set we aim to build

while P /= {} loop		-- as long as there is some prerequisite that
				-- has not been processed.

	course from P;			-- take one of them.

	all_P with:= course;		-- add to full set of prerequisites

	P +:= (prerequisites{course} - all_P); -- add all the prerequisites of P to the set 

end loop;

print("Before taking ", C, " the following must have been taken"); 

This example deserves careful study, because it embodies a very common program schema, sometimes called the use of a workpile. The set P originally consists of the immediate prerequisites of C. Each of these is placed in all_P, which is to be built up to the full set of prerequisites we are gathering, and each of their prerequisites in turn must be placed in all_P, and also into the set P, to see whether further prerequisites are implied by them. The process terminates when we reach courses that have no prerequisites at all (there must be some of those!). The "workpile" set P shrinks with each execution of the from statement but can increase again with the addition of new prerequisites of the course we have just extracted from P. Workpile algorithms of this kind typically involve while loops. The until loop

The syntax of the until loop is similar to that of the while loop. We write

	until condition loop
	end loop;

An until loop is executed as follows: The body of an until loop is always executed at least once. After it is executed the loop condition is evaluated. If this yields TRUE, then execution proceeds to the first statement following the loop. If it yields FALSE, the body of the loop is executed again. We can therefore say that the test of a while loop is performed at the beginning of the loop body, and that the test of an until loop is performed at the end of the loop body. Note also that the body of an until loop is always executed one or more times, in contrast to that of a while loop, which may not be executed at all.

As an example, let us consider the problem of finding the smallest number of steps that can take us from one point in a graph to another. In order to tackle this problem we must say a word about graphs, and about the ways in which they are generally described in SETL. A graph consists of a set of vertices, and a set of edges which connect the vertices. Edges of a graph can be represented in SETL by ordered pairs, whose first component is the starting edge. For example, the simple graph
is described by the following set of pairs (i.e., edges):

{[A, B], [B, A], [A, C], [C, D], [B, D], [D, A]}

Since in SETL a set of pairs is at the same time a map, we can also regard this representation as a successor map (also called an adjacency list) whose domain is the set of vertices of the graph. Then, for each vertex V, the value of the mapping successor{V} is the set of vertices that are reachable from V by means of some edge that starts at V. For example, in the graph shown, successor{B} is the set {A,D}, because of the existence of edges from B to A and D.

Using this convention, our problem can be stated as follows: given a graph G described by its set of edges, and given two vertices s (source) and t (target), find the length of the shortest path between s and t, i.e., the smallest number of edges that must be traversed in order to go from s to t. If we do not know a priori what path to take, we may have to explore a substantial number of paths starting from s, until we find one that reaches t. A possible way of organizing this exploration is to find all the vertices that can be reached from s in one step, two steps, etc., until we reach t. Our problem will therefore be solved by the following:

	seen := {s}; 
	length := 0;
	until t in seen loop
			-- add to seen all the vertices that can be reached by
			-- following one more edge from vertices already reached. 
		for v in seen loop
			seen +:= successor{v};
		end loop;
		length +:= 1;
	end loop;
	print("There is a path of length ", length, " from ", s, " to ", t);

Various shortcomings of this code are easily noted. For example, what if our graph is such that there is no path from s to t? As written, the preceding algorithm will iterate indefinitely, and the condition -t in seen- will never be met; i.e., we will endlessly retrace the edges that lead out of the vertices already reached. In order to prevent this behavior, we can modify our algorithm, so that at each step -seen- contains only those vertices that have not been reached on previous steps. This can be achieved as follows:

	length := 0;
	seen := {s}; -- The set of vertices reached at each step.
	reached := {s}; -- The set of all vertices reached so far.
	until t in seen or seen = {} loop
			-- We collect the new vertices reachable from the latest set,
			-- which were not reached previously.
		seen := (+/{successor{v}: v in seen} - reached);
		reached +:= seen;
		length +:= 1;
	end loop;
	if  seen = {} then
		print(t, "is not reachable from", s);
		print("There is a path of length", length, "from", s, "to", t);
	end if;
See Section 5.3.1 for a further example continuing this theme.

4.5 The stop statement

The stop statement is used simply to terminate execution when for any reason your program has decided that it should not go on (e.g., when all necessary work has been finished). This statement can be used either in your main program or in any procedure (see Chapter 5). A typical example of its use is

if workpile = {} then stop; end if;

Of course, your program will always stop by itself when it has executed the last statement of your main program. So no stop statement is needed there (even though it does no harm to put one in).

4.6 The assert statement

The form of an assert statement is

			assert expn;            			     (1)

where expn designates any Boolean-valued expression. To execute such a statement, the expn it contains is evaluated. If the resulting value is false, a message of the form "ASSERTION FAILED AT LINE XXX OF PROCEDURE YYY" is printed, and execution terminates. If true, then control passes immediately to the statement following the assert statement. More precisely, a false assertion will terminate execution if the check assertions feature of the SETL execution-time system is switched on. Moreover, if the confirm assertions feature of the SETL execution-time system is switched on, then each true assertion will produce a message "ASSERTION PASSED AT LINE XXX OF PROCEDURE YYY". (See the discussion in Chapter XXX of the execution-time assert preference.)

Assert statements are ordinarily used in a program for one of two reasons:

  1. To document and to check logical conditions which the programmer knows to be critical for correct functioning of her program. Used in this way, assert statements constitute a powerful program debugging aid. See Sections 6.2 and 6.7 for additional discussion of this point.

  2. To trigger any side effects caused by evaluation of the Boolean -expn- that the assert statement contains. Note that this -expn- can contain assignments or other subexpressions (such as existential or universal quantifiers) whose evaluation causes side effects. Evaluation of the assert statement (1) will always trigger these side effects even if assertion checking is switched off. (See the discussion of initial program parameter assert in Chapter 9).

Perhaps the most common case of this second use of the assert statement is in constructs of the form

assert exists x in s | C(x);

This construct can be used whenever one is certain that the set {x in s | C(x)} is non-null, and in this case it will always give x a value such that C(x) is true. A similar, somewhat more elaborate, use of the assert statement is shown in

assert (exists x in s | C(x)) or (exists x in s1 | C1(x));

Assuming that the assertion is true, execution of this statement will always set x either to an element of s for which C(x) is true or to an element of s1 for which C1(x) is true.

4.7 Programming Examples

4.7.1 An interpreter for a simple language

One of the most typical uses of the case statement is to program an interpreter. An interpreter is simply a program that executes sequences of commands written in some formalized language. An interpreter works by reading one command at a time, executing it, and then reading the next command, etc. Interpreters serve as an obvious means of creating special-purpose languages, and we will say more about this at the end of this section, but first we will present an example of an interpreter. This will make use of most of the control structures that we have examined so far in this chapter.

We will write an interpreter for a simplified version of the so-called Turtle language used in a popular system for grade-school computer education. The Turtle language consists of a series of commands that control the motion of a "turtle" on a screen or on a sheet of paper. The motions of the turtle generate a picture, and the purpose of the interpreter is to read a series of commands in Turtle language and construct the corresponding picture. The state of the Turtle at any given time is described by its coordinates and its direction of motion. The turtle can be commanded to move forward a certain number of steps, turn left or right, and put its pen down (to draw) or up (to move without drawing a line). The full list of commands and their syntax is the following:

FORWARD N—Move forward N steps.
RIGHT—Turn right from current direction of motion.
LEFT—Turn left from current direction of motion.
PEN_UP—Move without leaving a trace.
PEN_DOWN—Draw every motion.
DRAW—Display picture of motions so far.
END—Terminate picture, draw it, and stop.

For example, the following sequence of commands

                 FORWARD 5
                 FORWARD 10
                 FORWARD 5
                 FORWARD 5
                 FORWARD 10
generates the following picture:
		  * * * * * * * * * * *
		  *         *         *
		  *         *         *
		  *         *         *
		  *         *         *
		  *         * * * * * *

(Turtle starts here) Construction of a Turtle language interpreter

Our interpreter for the Turtle language will consist largely of a simple case statement, each of whose options correspond to one command in the Turtle language. That is to say, the basic structure of the interpreter will be as follows:

		case command of

		when "RIGHT" => .....

		when"LEFT" => .....

		when"FORWARD" => .....


Of course, we have to fill the dotted sections with an exact description of the actions that represent the corresponding motion of the turtle. This requires that we decide on how to represent the picture being drawn, and also the position and direction of motion of the turtle at each step.

First let us examine the matter of picture representation. In order to keep our task simple, we assume that the track of the turtle will be displayed by means of print commands. Each print statement generates one line of output, and it is reasonable to describe the picture as a sequence of lines. To make matters definite, we must choose the height and width of the picture: We choose size 50 by 50, so that it can fit easily on a simple page of printed output. This size will not change during execution of the program so we just initialize the picture to be an array consisting of 50 strings of length 50, consisting only of blanks:

picture := 50 * [50 * " "];

Notice the double use of the replication operation: the expression 50 *" " yields a string of 50 blanks; the brackets around this expression give us a tuple whose only element is such a string; and the outer replication operation yields a tuple with 50 elements, each of which is a blank string.

Of course this is not the only possible way of representing the picture. The far more powerful SETL graphic capabilities described in Chapter XXX can be used. However, we shall see that this choice makes creation and display of of a crude picture very simple.

The position of the turtle at each step is defined by giving a line and a character position on the line. If we think of each line as drawn horizontally across the picture, then the pair [row, column] designates the turtle position. In our simple interpreter the turtle can move in one of four directions, which we can label "NORTH," "EAST," "SOUTH," and "WEST," with the usual (Northern Hemisphere) convention that north is up. We choose to start the turtle on its trek from the lower left-hand corner of the picture, facing north.

Next let us sketch the actions performed upon each Turtle command. The turning commands —RIGHT and LEFT—are the simplest: they change only the direction of motion of the turtle, not its position, and they do not add anything to the picture being drawn. In what follows we have chosen to implement those commands simply by looking up the direction that lies to the right or left of the present direction of motion. This look-up operation uses SETL maps.

The pen commands —PEN_UP and PEN_DOWN — affect neither the position nor the direction of motion of the turtle. We describe their effect by using a Boolean variable called -tracing-, which is interrogated whenever the turtle actually moves.

The only nontrivial command is —FORWARD N— where N is some positive integer. This command alters the position of the turtle and produces a segment of the picture if the -tracing- indicator is true. Clearly the action of FORWARD depends on the current direction of motion. If the turtle faces east, the motion will be to the right, along a line or row. The same is true if the turtle faces west. On the other hand, if the turtle faces north or south, then its motion is along a column, and its row position is altered. The forward statement is therefore best described by a case statement. Let -distance- designate the extent of the specified forward motion, and let [new row, new_col] be the coordinates at which the turtle finds itself after the motion. Then the effect of FORWARD can be described as follows:

			case direction of
			when "NORTH") => 	new_row	:= row - distance;
					new_col	:= column;
			when "EAST") => 	new_col	:= column + distance;
					new_row	:= row;

Finally, how is the picture itself to be created? We want to fill in the trajectory described by the turtle by using some printable character, say the asterisk: "*". After each FORWARD command, we want to place asterisks along the line from [row, column] to [new row, new_column]. This is simple if the motion is horizontal, i.e., new_row = row, since in this case the line to be drawn is a part of the current row. If we recall that the picture is described by an array of horizontal lines or rows, then it is clear that the line on which the turtle is currently moving is given by picture(row). The motion of the turtle fills a substring of this row, and in the case of eastward motion this can be expressed as follows:

picture(row)(col..new_col) := distance * "*";

Westward motion is equally simple to describe. North-south motion is a trifle harder to handle. In such a motion, the turtle stays on the same column but crosses several rows. The line it traces has one character on each row traversed. We lay down the line as follows:

		for i in [row . . new_row] loop
			picture(i)(column) := "*";
		end loop;

We want our interpreter to read any number of turtle commands, and we do not know a priori how many there will be. We therefore enclose our basic case statement in another loop, this one bracketed by the 'endless' loop lines:



end loop;

Finally, the statement


which our interpreter must associate with the end command, will terminate interpretation.

Putting this all together, and remembering to keep the turtle within its 50 by 50 box, we get the following program.

program turtle;

	right := {["NORTH","EAST"], ["EAST","SOUTH"], ["SOUTH","WEST"], ["WEST","NORTH"]};
		-- The map giving the direction to the left of any direction is obviously
		-- the inverse of the -right- map.

	left := {[d1,d2]: [d2,d1] in right};
	picture := 50 * [50 * " "];		-- initialize an empty picture

		-- Initially the turtle is at the lower left-hand of the picture, facing north.

	direction := "NORTH";
	row := 50;
	column := 1;
	tracing := false;	-- initially, the turtle's pen is up
	loop		-- Main loop of the interpreter.


		case command

		when "RIGHT" => direction := right(direction);

		when "LEFT" => direction := left(direction);

		when "PEN_UP" => tracing := false;

		when "PEN_DOWN" => tracing := true;

		when "DRAW", "END" =>

			for line in picture loop
				print(line);	-- draw 50 lines, to make 1 picture
			end loop;

			picture := 50 * [50 * " "];		-- start over, with a new  empty picture

			if  command = "END" then stop; end if;

		when "FORWARD" => read(distance);

			case direction
				when"NORTH" => new_row := (row - distance) max 1;
				new_col := column;		-- note the new turtle position

				when"EAST" => new_col := (column + distance) min 50;
				new_row := row;		-- note the new turtle position

				when"WEST" => new_col := (column - distance) max 1;
				new_row := row;		-- note the new turtle position

				when"SOUTH" => new_row := (row + distance) min 50;
				new_col := column;		-- note the new turtle position

			end case;
			if  tracing then
				if  new_row = row then

				-- Find first and last column needed for tracing
					min_col := column min new_col;
					max_col := column max new_col;
					picture(row)(min_col..max_col) := (distance + 1) * "*";


				-- Find first and last row.
					min_row := row min new_row;
					max_row := row max new_row;

					for r in [min_row..max_row] loop
						picture(r)(column) := "*";
					end loop;

				end if;

			end if;

			row := new_row;		-- note the new turtle position
			column := new_col;

		otherwise  => print("INVALID COMMAND:", COMMAND);

		end case;

	end loop;

end turtle;

Several details of this program deserve notice:

  1. Two Turtle commands trigger output: the DRAW command, and the END command. It is therefore natural to place both commands in the same case tag and to add another simple check, made after the picture has been produced, to determine whether the program should stop.

  2. We all make mistakes, and the interpreter should be prepared to handle less-than-perfect instructions. What should the interpreter do, for example, with the commands
    			FORWAED 10

    and so on? In this program we have chosen to notify the user that a command just read is not part of the known set of Turtle commands. This is the purpose of the otherwise clause of the case statement. A more ambitious program might try to recognize misspellings of the known commands, accept abbreviations for them, accept upper- and lowercase names for commands, and so on. Some of these extensions are pursued in the following exercises.

  3. A different sort of error is exemplified by the command

    FORWARD 200

    which attempts to move the turtle beyond the bounds of the picture. In the preceding program, we have made sure that the values of new_row and new_col are always in the range 1 to 50.

  4. Using a printer, or printing an array of characters to the screen, is not the best way to display a picture. If you run the program as written, you will notice that the separation between successive lines is greater than that between successive characters on a line. As a result the picture looks cramped in the horizontal direction. A more aesthetic result is obtained if we count each horizontal step as two characters, or always add a blank between horizontal characters. This nicety is left to the reader.

4.7.2 Various elementary sorting techniques

Sorting is the problem of taking a set or tuple of items (such as integers, real numbers, or strings) which can be compared to one another and putting them in order. Dozens of interesting ways of using a computer to sort are known, and a few of the more interesting high-efficiency sorting techniques will be presented in later chapters. In this section, we present only some very simple sorting methods, which serve to illustrate various control structures discussed in this chapter. The first and simplest of these, the bubble sort method, sorts a tuple. It works simply by scanning the tuple for adjacent components which are out of order and interchanging them if they are found. In this way, larger items "bubble up" to their proper position in the tuple. When no out-of-order pairs remain, the tuple is sorted.

In SETL this is simply

		while exists i in [1..#t-1] | t(i) > t(i + 1) loop
			[t(i),t(i + 1)] := [t(i + 1),t(i)]; -- interchange the items.
		end loop;

This bubble-sort procedure has a number of interesting variants. In one of them, we simply sweep repeatedly through the tuple, interchanging all pairs of adJacent items which are out of order.

If we perform this sweeping operation at least as many times as the tuple has components, all items will be swept into their proper positions, since even If the smallest item originally came last it will have time to move down to the first position in the tuple.

We can express this "sweeping" procedure as

		for number_of_times in [1..#t] loop
			for i in [1..#t - 1] | t(i) > t(i + 1) loop
				[t(i), t(i + 1)] := [t(i + 1), t(i)]; -- interchange
			end loop;
		end loop;

This can also be put more succinctly as

	for number_of_times in [1..#t], i in [1..#t - 1] | t(i) > t(i + 1) loop
		[t(i), t(i + 1)] := [t(i + 1), t(i)];
	end loop;

A very different sorting method is to search repeatedly for the minimum element of a tuple, put it at the end of a new tuple which is being built up, and delete it from the original tuple. This is called the selection sort method. It can be written as

	new_tup := [];				-- initialize tuple to be built up.

	for i in [1..#t] loop

		min_till_now := t(1); min_place := 1;		-- save minimum element
                                              			-- scanned, and its location
		for j in [2..#t] | t(j) < min_till_now loop

			min_till_now := t(j);			-- save value of newly found
							-- minimum		
			min_place := j;			-- save position of new minimum

		end loop;

		new_tup with:= min_till_now;		-- put minimum element at end
								-- of new tuple
		t(min_place..min_place) := [];		-- delete minimum element

	end loop;

Beyond the methods shown here, you will find that it is instructive to review all the ways you can think of to sort a deck of cards by hand and to express these hand-sorting techniques in SETL.


1. A set of Markov productions is an ordered collection of rules of the form s1 > s2, where s1 and s2 are both character strings, neither of which contains the character ">". The string s1 is called the left side of the production s1 > s2, and the string s2 is called its right side. To apply such a set of productions to a string s, one searches through s, looking for a substring which coincides with the left-hand side of some production; if any such production is found, this substring is replaced by the right-hand side of the production.

Write a Markov production interpreter program which reads in a set of Markov productions and a string s and then applies n successive productions to s, displaying the result every m steps.

2. How would you express a for loop of the form

for n in [1..k] | C(n) loop...

in terms of a while loop? What about for loops of the form

for x in t | C(x) loop...


for x = t(i) | C(x) loop...

where t is a tuple?

3. Write a program which will compare two poker hands (each consisting of five cards) and decide which of the two is the winning hand according to the rules of Poker.

4.8 Reading and Writing Data - First Steps

Till now we have been using the two basic input-output commands, reada and print, which allow a SETL program to communicate with the rest of the world, informally. It is now time to discuss these commands more systematically. (More elaborate SETL input-output features, such as printa, get, put, etc., are described in the Section 4.9.).

The easiest way to send output to the terminal (that is, the SETL output window) is to use the print statement. This has the form


where each of exp1,...,expk is an expression. Any valid expression can appear in a print statement, and any valid SETL value, including Boolean values and atoms, can be printed. In particular, sets and tuples can be printed. Thus it is perfectly acceptable to write

print(2 + 2, {1,2,3},[{1},{{2}},[{3}]],"HELLO THERE");

The output produced by this print statement will look like

4 {3,1,2} [{1},{{2}},[{3}]]HELLO THERE

This example illustrates several details concerning the print primitive:

  1. Expressions are evaluated before being printed

  2. The elements of sets are grouped within set brackets, and tuple components are grouped within tuple brackets. For ease of reading, set elements and tuple components are separated by commas (even though this can lead to ambiguities when structures containing strings are printed).

  3. Strings are printed without quotation marks; e.g., when we print the constant "HELLO THERE" only the characters



  4. No blanks except those you print explicitly appear in the output. The statement

    print("N is",3);

    produces only the characters

    N is3

    To get better-formatted output you need to write

    print("N is ",3);


    print("N is"," ",3);
    Both of these will produce
    N is 3

    as output.

  5. Since sets have no particular order when they are printed, set elements can appear in any order.

  6. Integers are printed in standard decimal formats. Their representations require a number of characters defined by their size and nature. Floating-point numbers are always printed with a fixed number of decimal places and in exponential or 'scientific' form if required. E.g., 2.3 is printed as

    and 2.3 ** 100.0 is printed as


  7. Other kinds of SETL values will be represented by strings of the form #nnn, where nnn denotes the integer "serial number" of the atom. Note that these rules inevitably lead to a degree of ambiguity, as shown, e.g., in the output produced by
    				print(OM, " <om>");
    				print(TRUE," TRUE ",FALSE," FALSE");

    which is

    				<om> <om>

  8. Since sets are not printed in any particular order, it can be hard to locate elements in the printed representation of sets, especially large sets.

  9. A single print statement with many arguments will always try to put all the output which it produces on a single logical line of output. If the value or values to be printed are too large and complex to fit on a single line, they will be printed on as many lines as necessary. When this happens, the points at which one physical line of print ends and the next begins will fall haphazardly, and it can be something of a trial to read the resulting output.

  10. Each print statement starts a new logical line. A parameterless print statement can be used to generate a blank print line. Thus, whereas


    will produce the output


    the command

    print("AA"); print("BB"); print; print("CC");

    will produce the output


  11. As illustrated by the preceding examples, successive output items produced by a single print statement are not separated by 'automatically inserted' blanks and do not start a new line.

The SETL print facility is quite easy to use but does not produce output comparing in elegance with the formatted output generated by programs written in various other languages, especially languages such as PL/I or COBOL, which have a commercial orientation. To produce more elegant formulated output in SETL, it is necessary (albeit easy) to make use of string primitives which the language provides (see Sections 2.5.3 and 5.8). These allow one to build up output strings of arbitrary format and complexity. Note in particular that the str operator produces the very same string representation of a value that the print command would print but makes this string available as an internal object which can be manipulated by using the string operations which SETL provides. These facilities make it easy to program an arbitrarily complex "pretty print" function in SETL. Such a procedure can indent nested sets and tuples nicely, can sort their elements to make search easier, etc. Utilities of this kind are well worth developing when large objects need to be printed and inspected repeatedly.

To read input from a designated file the reada statement is used. This has the form

	        reada(file_handle,Lhs1, Lhs2, . . ., Lhsk),    		         (1)

where each of Lhs1,..., Lhsk is either a simple variable or a more complex expression of the kind which could legally appear on the left-hand side of an assignment statement: see Section 3.12. Before using the statement (1) you must create a 'text-file input handle' by 'opening' the file you want to read. This is done by using the SETL 'open' operator, e.g in a statement like The statement (1) reads in a sequence of SETL values from the standard input file and makes them the values of Lhs1, Lhs2,..., Lhsk, respectively. For example, if the next three items in the input file are

				{1 2 3}
				[{1}, 2, A],

then the command

reada(file_handle,x, y, z)

will give x, y, and z the respective values {1,2,3}, "HELLO THERE", and {},, ]. This example illustrates several of the following rules governing the reada primitive.

  1. Successive items in a bracketed SETL value to be read can be separated either by commas or by blanks. For example, to read in the set {1,2,3} we can write its external representation either as

    {1, 2, 3}

    or as

    {1 2,3}

    or as

    {1 2 3}


  2. Unbracketed items separated by blanks will be read in by successive read statements even if they all appear on a single line. None of them will be bypassed, and reading will advance from one line to the next only when more input data are needed to complete the line being read. For example if the first three lines of the input file are

                    1 2 3 4 5 {6 7
                      8 9

    then the commands

    		reada(file_handle,x, y, z);
    		reada(file_handle,u, v);

    will give the variables x through z the same values that they would be given by the following assignments:

    x:= 1; y:= 2; z:= 3; u:=4; v:= 5; w:= {6,7,8,9,10};

  3. When read in, valid identifiers, i.e., unbroken strings of letters and numbers starting with a letter, will be read as strings even if they are not enclosed in quotation marks. For example, if the input file contains

    [A BB C123],

    then the command reada(file_handle,x) will have the same effect as the assignment

    x := ["A", "BB", "C123"];

  4. The Boolean values true and false can be read in if they are written in the form in which they would be printed by a print command. That is, true and false can be read in if they are represented as TRUE and FALSE in the input file. For example, if the input file contains [TRUE, FALSE] the command reada(file_handle,x) will give x the same value that the assignment

    x := [true, false];

    would give it. These rules imply that the reada and print operations are inverses of each other in many cases, i.e., that a file of data written by print can almost be read back in by using read. Unfortunately, this is not always the case (however, this perfect inverse relationship does hold for SETL "binary" input-output primitives, namely getb and putb; see Chapter 9). For example, if the string "Hello there" is written out using print and then read back in using reada, it will appear as the pair "Hello" "there" of successive string items. Moreover, if the string "Hello!there" is written out using print then any attempt to read the result will cause an error, since the unquoted character "!" happens to be indigestible to the reada primitive. (Also, the external form of an atom is indigestible to the reada primitive ) Thus reada and print are only inverses to one another if the value being printed and then read back in contains no quoted strings which are not valid identifiers (and also contains no atoms).

As reada operations are successively executed, an implicit "read position" pointer moves progressively forward in the file being read, past one SETL value at a time, until eventually the very end of the input file is reached. Thus the input file behaves as a "tape" on which successive SETL va1ues are written and from which they can be read. Even when the end of the input file has been reached, the read operation will continue to execute without any errors occurring, but in this case all further values read from the input file will be OM. Therefore the input file behaves exactly as if its actual contents were followed by infinitely many OMs. To detect the actual end of input, one must use another SETL primitive operation, represented by the built-in function call eof() (test end of file). This can be used in expressions just as any other function call is, but its value is always false if the last read operation executed did not encounter the end of the input file which it is reading. Conversely, eof() is true if the end of the input file was reached by the last reada statement executed. The value returned by eof() changes as soon as a first attempt is made to read past the end of the input file. For example, if the input file contains just the three items {1} {2} {3}, then the loop

for j in [1..4] loop reada(file_handle,x); print(x, eof()); end loop;

will produce the output

			{1} FALSE
			{2} FALSE
			{3} FALSE
			<om> TRUE

It follows that to read all data items present in the input and print them out one wants to use a loop which tests the eof() condition immediately after an item is read, as in the following example:

			loop do -- loop to read and echo all items in the input file
				if eof() then quit; end if;
			end loop;

If a bracketed item is not properly closed and one attempts to read it, then a run-time error occurs. For example, any attempt to read an input file whose last two lines are


or whose last line is


is fatal.

4.8.1 Final Remarks. Character sets

The simple reada and print primitives described in this section get input from a designated file and send output to the standard output file. Other more advanced input-output primitives (described in Chapter 9) allow output to be read from other files. These files are made available to your SETL program in a manner which necessarily depends (to a certain extent at least) on the operating system being used. See Chapter 9 for additional details.

SETL implementations use the ASCII character set and thus support the full SETL character set.


1. Write a program which reads a set s of integers and prints out a list, in ascending order, of all the members of s which are prime.

2. A set of vectors, each of length n, and a vector x of the same length n are given. Write code which selects the element of s which has the largest number of components in common with x.

3. Write a program to read a character string, reverse the order of its characters, and print it out.

4. Write a program that will scan a string of characters containing parentheses, square brackets, and set brackets. Determine whether it is properly bracketed. (A string is properly bracketed if each left bracket or parenthesis is matched by a following right bracket or parenthesis of the same kind. For example, {[]} is properly bracketed, but {[ }] is not.)

5. Write a program that reads in successive pairs of strings s, t of the same length and determines whether t can be obtained from s by substituting for the characters of s in some single-valued way. For example, "ipstf" is obtained from "horse" by the substitution {["h","i"], ["o","p"], ["r","s"],["s","t"], ["e","f"]}, but "beer" cannot be obtained from anna' in this way, since two different characters would have to be substituted for "a". 6.

  1. Write a program which will translate an arbitrary message into Morse code The Morse codes for all characters of the alphabet and for the most common punctuation marks are shown in Figure 4.8.

    A:  .-I:  ..Q:  --.-Y:  -.--
    B:  -...J:  .--R:  .-.Z:  --..
    C:  -.-.K:  -.-S:  ...,:  --.--
    D:  -..L:  .-..T:  -.:  .-.-.-
    E:  .M:  --U:  ..- ;:  -.-.-.
    F:  ..-.N:  -.V:  ...-:  :  ---...
    G:  --.O:  ---W:  .--':  .----.
    H:  ....P:  .--.X:  -..--:  -...-

    Figure 4.8 Morse Codes for Alphabetic and Special Characters

  2. Write a program which will translate Morse code back into English.

7. A publisher produces books both in hard cover and paperback. Any given book can be either long, medium, or short and can be either elementary or advanced. A short, elementary paperback book sells for $5. Exactly $2 is added to the price of a book If it is hard cover; medium-length books sell for exactly $1 more than short books, and long books for exactly $3 more. The price of a book is doubled if it is advanced. Write a small program which will print out all possible categories of books, with their prices.

8. Write a program that will read an integer n and print its successive digits separated by spaces, starting with its leftmost digit.

9. Write a program which can read an arbitrary integer and print it in English. For example, -143 should print as "minus one hundred forty-three". Can you do the same for Spanish, French, Italian, Russian, or German?

10. Write a program to read in three points x, y, z, each represented by a pair of real numbers, and determine whether these three points (a) all lie along a line (b) form the corners of an isosceles or equilateral triangle (c) form the corners of a right triangle. Print out an appropriate message in each case.

11. Write a program which will read in a sequence of lines, each containing someone's name, family name first, and print out an alphabetized list of these names, in alphabetical order of last names. Repeat this exercise, but this time print the alphabetized list with given names first, in alphabetical order of given names.

12. Making use of a map from family names into their probable ethnic origins, write a program which reads a list of names and attempts to guess the ethnic origins of their bearers. Your program should also make use of facts like the following to increase its coverage: names beginning with Mc are probably Irish, with Mac probably Scottish; names ending in -ski are probably Polish, in ian probably Armenian, in wetz probably East European Jewish, in ini probably Italian, etc. How well does your program guess the family origins of your classmates? Modify this program so that it uses first names to guess sex. Note here that names ending in -a are probably female, etc.

13. A college collects statistics on the members of its entering freshman class. The basic information for each student is a line in a data file, consisting of the following items, in sequence, separated by blanks: student's last name, first name, age(in years), sex(M or F), marital status (0 = single, 1 = married, 2 = divorced or separated)

Write a program to print out the following information:

    (i) Percentage of students under 21 years old

    (ii) Percentage over 21 years old

    (iii) Percentage over 30 years old

    (iv) Percentage male and female

    (v) Percentage of males single, married, and divorced or separated

    (vi) Percentage of females single, married, divorced, or separated

14. An automobile sales agency employs 25 salespersons. Sales records are kept on cards, each card holding the following information, separated by blanks:

    (a) Person's last name

    (b) Make of car sold

    (c) Amount of sale

    (d) Net amount of sale (i.e., total amount minus discount allowed for trade-in)

Write a program which will read a monthly file of such cards and print out the commission due to each salesperson. The rules determining commissions are as follows:

  1. Standard commission is 5% on the first $20,000 of net sales, 6% on the next $10,000 of net sales, and 7% on all sales over $30,000.

  2. Individual sales totaling more than $10,000 earn a 1% bonus.

  3. Sales on which less than $500 trade-in is allowed earn a bonus of one half of 1%.

15. A factory's payroll is prepared from a set of daily time cards and a mapping f giving the hourly wage rate for each employee. Each time card contains an employee's social security number followed by the number of hours worked on each particular day. The mapping f sends each employee social security number m to the employee's name, hourly wage rate, and tax withholding rate. Total pay is number of hours worked, times hourly base rate, times (1 - r), where r is the tax withholding rate. However, all hours in excess of 40 are paid at a time-and-a-half rate. Write a program to read a file representing a week's payroll records, and print out a payroll showing employee name, social security number, total pay, tax withheld, and net pay.

16. Suppose that the daily time cards of Ex. 15 are grouped into batches separated by cards which contain only the single digit 0, with the Monday batch coming first, Tuesday next, etc., and that work performed on weekends is paid at a double-time rate. Modify the program of Ex. 15 to handle this case also.

17. In bowling, a complete game consists of 10 frames. Either one or two balls is rolled in a frame. If all 10 pins are knocked down by the first ball rolled in a frame (this event is called a "strike") the score for the frame is 10, plus the number of pins knocked down by the next two balls rolled. If all 10 pins are knocked down by the two balls rolled in a frame (called a "spare"), the score for the frame is 10, plus the number of pins knocked down by the next ball rolled. Otherwise the score for the frame is the number of pins knocked down by the two balls rolled in the frame. If a spare is rolled in the 10th frame, then you are allowed an extra ball; if a strike is rolled in the 10th frame, then you are allowed two extra balls. Write a program which will read a tuple representing the number of pins knocked down by each ball rolled during a game and print out the corresponding score.

18. A telegram is transmitted as a single string of characters, with words separated by blanks but the end of each line marked by a dummy word Z. Write a program which will count the number of words in the actual telegram. Words with more than eight letters in them are to count as two words.

19. Write a program that will read three strings s1, s2, s3 and then determine whether s2 occurs as a substring of s1 after all characters in s3 have been eliminated from s1.