The normal stages of a program's life cycle are:
This chapter discusses various key aspects of this program life cycle providing hints that aim to help the inexperienced programmer to cope effectively with the pragmatic problems normally associated with program design, debugging, and maintenance.
For example, if you write
immediately before proceeding on this assumption, your error will be pinpointed immediately; if you omit this check, you may have to find your way back to this error from some obscure symptom. A related idea is to introduce, and use, a collection of small standard procedures to handle these touchy situations in ways that are more instructive. For example, by introducing the macros (see Chapter 8).
procedure len_from(i,j); return j - i + 1; end; procedure take_len(s,i,k); return s(i..i + k-1); end;
we can accurately extract a string of length k from s starting at character position i by writing
take_len(s,i,k);
and can evaluate the length of s(i..j) by writing
len_from(i,j)
A good first step, but one that should not be allowed to hold you up too long, is to look closely at whatever fragments of correct output have been produced. If little or no output is correct, then your program may have failed before even the first print statement was executed. This hint may help you narrow the bug hunt. On the other hand, if some output is correct, then the program was probably functioning correctly till some point past the statement which produced the last correct output. Find the point in your program at which this output was produced, and see what comes before and after it. Again, this may narrow the hunt. Examine the erroneous output carefully and try to see whether its logical pattern reminds you of any particular section of your program. This also can sometimes yield useful hints concerning the likely location of the bug, especially if different parts of your output data are produced by recognizably different sections of code. If certain items of output that you expect are missing, try to see what evidence there is that all the code that you expected to execute did actually execute: remember that unanticipated data may have caused your program to follow an unexpected path, so that it may have bypassed, or may never have reached, the code sections which were supposed to have produced the output which you are surprised not to see. Analysis of evidence of this general kind will in favorable cases point the finger of suspicion at certain narrow program sections. However, in less favorable cases, the available evidence will be ambiguous. In this case, you will need to generate more extensive traces and dumps. This can be done in one of two ways:
Section 6.6 will have more to say about technique (b), which is related to the general issue of formal program verification. The following more pragmatic hints will help you to apply this technique effectively. It is particularly important to place assert statements in sections of code known to involve delicate constructions, especially if (as in the case of the "off by one" bugs considered in the last section) the necessary checks are simple. Since the correct functioning of a program often hinges upon the assumption that key variables will change in a consistent way as iterative execution proceeds (for example, always increasing or always decreasing) it can be useful to save the last previous value of each significant variable var and to write checks which compare the last previous value of var with its current value. This can be done by introducing an auxiliary variable last_var for each var and writing an assignment
whenever it is desirable to save the last value of var. Then checks like
etc., will all prove useful.
Ultimately, however, the problem with a purely assertion-based debugging technique is that it is not easy to formulate the necessary checks comprehensively enough to make it unlikely that a bug (which probably relates to something that has been overlooked) can slip through.
Hence one must often fall back on method (a), which generates additional raw evidence for inspection. The problem in using this method is to avoid burying yourself in too voluminous a trace of the thousands or millions of events that take place as a program executes. To avoid this danger a carefully planned sequence of probes is necessary. A good idea is to resurvey your program, mentally list its main phases, and determine all the data objects which each phase passes to the next phase. If your program has been well designed, i.e., has been built up from modules interacting with each other in well-structured fashion, there should not be too many of these objects, and then it is reasonable to print them out for inspection. Before inspecting this information, review the logic of your program, and make sure you know just what features you expect to find in values of the variables that you have printed. Try to be aware of every feature on which any part of your program depends. Then check the actual data. If the data printed at the end of a phase looks correct in every detail, then this phase is probably correct. If something strange-looking appears in the data produced by a given phase, although the data supplied to this phase look correct, then there is probably something wrong with the code of this phase.
When this stage of debugging is reached, you will at least have determined which of the several phases of your code contains the error for which you are hunting. At this point, it is a good idea to think over all the evidence that you have examined, and see if any compelling picture of the problem seems to suggest itself. Sometimes the fact that the offending phase has now been located removes enough confusion for the difficulty to be guessed quickly. If not, you will have to carry your tracing to a more detailed level. This is a matter of inserting print and assert statements more densely into the offending phase, in order to locate the particular subphase that contains the error. As before, this is the subphase to which good data are being supplied, but which is seen to pass bad data along to its successor.
(iii.a) Don't be too quick to suspect them. Though such problems do crop up from time to time, they are much rarer than errors in your newly written programs. Remember that dozens of people are using the same software systems that you are, and that if the problem afflicting you is a system-level problem, it would affect all of these people. Before you become willing to blame your problem on anything other than a fault in your program, you should always have examined your program with great care, located a section just a few lines long which you can be sure is receiving correct input (because you have printed and inspected its input) and producing bad output (again, you must have printed and inspected this output). Finally, meticulous examination of these few lines, with review of the definition of all the operations these lines involve, of the parenthesization of those lines, and of any applicable rules of operator precedence must give you "courtroom" evidence that the system is not performing according to its specifications. At this point you are almost (but still not quite) in position to report a system problem to the expert in charge of maintaining your copy of SETL system (or of the operating system within which the SETL system runs). Before doing so, however, you should try to simplify the evidence still further, isolating the malfunctioning lines into a malfunctioning program just a few lines long, and then paring this program down still further if possible, ideally to the point at which it contains just three lines: an assignment initialising a very few variables, a single line which obviously does not function as it should, and a print statement which confirms the fact that this line has failed to act in the manner demanded by the rules of SETL. If the system problem which you think lies at the root of your troubles disappears somewhere during this sequence of steps, the cause of your difficulties may not be a system problem at all, but an error or misunderstanding on your part, which your attempts to locate the suspected systems problem may have clarified. In this case, chastened, you should return to your original program, fix the error in it, and continue your debugging. If, however, you do succeed in creating a very short program which gives unmistakable evidence of system malfunction, you should transmit a complete, clean copy of this program to a systems expert. This should be accompanied by a clear explanation of the problem you have pinned down. He will then take steps to fix the SETL system, or to have it fixed.
Note that problems in the SETL system, like problems in your own programs, are most likely to concern marginal, rarely exercised cases. Though the system has been in use for a few years and has been tested fairly extensively, exhaustive testing of so complex a system is simply not possible. (See Section 6.4 for a discussion of some of the issues involved in attempts to test programs comprehensively.)
There are a few cases in which it is reasonable to jump a little more rapidly to the conclusion that a system bug is affecting you. One is the case in which two runs of absolutely identical programs and data yield different results. Another is the case in which insertion into your program of a statement which is harmless by definition changes the behavior of the program significantly. For example, if insertion of a print changes your program's flow of control, something is obviously amiss at the system level. This may be evidence that can be reported to an expert immediately (but see the caution extended in (f)).
To reduce the level of your own confusion, it is sometimes helpful to work over your problem with a friend, trying to explain what is going on, and reviewing salient parts of the logic of your program with him, till he begins to understand it. A more expert consultant will often be able to spot the trouble that you have missed, but even if your consultant is less expert than you yourself, you will often find that the very act of explaining the problem lets you spot what is wrong.
| Bug | Likely SETL symptom |
| Variable not given any initial value | "Illegal data-type" error |
| Incorrect termination of a loop (e.g., count off by 1) | Missing items in data collections, sums or sets too small if loop terminates too soon. "Illegal data-type" error if loop terminates too late |
| Incorrect limits in string and tuple slices (e.g., count off by 1) | (Similar to incorrect loop termination) |
| Incorrectly structured while loop conditions or bodies, or incorrect initial conditions in while loops | Program does not terminate |
| Incorrect treatment of initial cases in recursions, or bad procedure calls | Program does not terminate, possible memory overflow |
| Omission of exit or continue statement | Program "runs on" into code not intended for execution. |
| Omitted or improperly inserted data-value update | Failure of relationships expected subsequently; unexpected values or jumps. Effects can be subtle. |
| Shared global variable unexpectedly modified by invoked procedure or function | Efforts can be very subtle, and particularly hard to find. Beware of global variables! |
| Misspelled variables, e.g., AO for A0, Bl for B1, c1 for ci | "Illegal data-type" error, possibly no output |
| Reading unexpected data | Can cause wide variety of effects |
| Some unanticipated characters encountered by the string-scan operations | Program does not terminate |
| Failure to set a program switch | Effects can be subtle |
| Parameters out of order in procedure call | "Illegal data-type" error (generally easy to find) |
| Variable inadvertently modified by assignment to a variable intended to be different but having the same name | If no data-type error is caused, effects can be subtle |
| Variables out of order in read statement | "Illegal data-type" error (generally easy to find) |
| Read operations of program inconsistent with data actually present in input file | "Illegal data-type" error (generally easy to find) |
| Target of an assignment statement misspelled | Effects can be very subtle |
| Not resetting a counter | Effects can be very subtle (see "Incorrect loop termination") |
| Complex, incorrect combination of Boolean conditions | Effects can be very subtle |
| Too few or too many parentheses in expressions; misunderstanding of precedence rules | Effects can be very subtle |
Debugging is the process of searching for the exact location of a program error when you know that some error is definitely there. Testing is the systematic exercise of a program which you believe might be correct, in an effort to see whether bugs are really absent. If testing shows a bug, debugging starts again. If your tests are not systematically designed, then bugs may go undetected and remain in the program. All one knows about a poorly tested program is that it works in the few cases for which it has been tried; it may fail in many others.
Test design is as important a part of program development as the choice of algorithms and data structures. Development of a test plan should begin while a program is being written. A procedure which is hard to test is apt to be bug-prone, and should be simplified if possible. By keeping testability in mind, you will avoid unnecessarily complex constructions, and produce cleaner, sounder code.
Testing falls into three distinguishable phases, which we will call first-stage testing, second-stage or quality assurance testing, and maintenance or regression testing. First-stage testing begins as soon as a program is complete enough for execution to be possible. Its guiding hypothesis is that bugs are present in sufficient numbers to prevent much of the program from working at all. During first-stage testing, one aims to make the main facilities of the program being debugged operable by finding and removing bugs quickly. Quality assurance testing begins when first stage testing ends. It assumes that a few obscure bugs remain in the program to be tested and aims to test systematically enough to smoke them out. Maintenance testing aims to ensure that new bugs are not introduced into old programs during their extension and repair.
Perhaps because realism might lead to suicide, programmers are generally overoptimistic concerning the likelihood that a program that they have just written will work right away. Careful preparation of a first-stage test plan serves to counteract this common illusion; the more realistic attitude thereby engendered encourages more careful initial program inspection, and this often reduces the number of bugs present when first-stage testing begins. This is why programs developed cautiously often become operational quickly, whereas programs developed in too optimistic a frame of mind often begin to work only after frustrating and totally unexpected delays.
An effective way of organising tests is to group them into a single procedure called test_prog, whose one parameter s is a string consisting of test names separated by asterisks. The test_prog can then have the following structure:
procedure test_prog(s) -- skeleton of test procedure
while s /= "" loop
span(s,"*"); tn := break(s,"*");
if tn = "" then exit; end if;
print("Beginning Test", tn);
case tn of
(put sequence of named tests here)
else
print("Unknown test name");
end case;
end loop;
end test prog;
To trigger a sequence of tests named test_4, test_2, etc., one has only to write something like test_prog("test_1*test_2*..."). Later, when first-stage testing is complete, this call, and test_prog, can be left in the program P that has been tested, but the argument of the test_prog call can be changed so that it reads test_prog(getspp("TEST=/")) (see Chapter 9 for an account of the getspp library procedure.) If this is done, no tests will be executed unless P is invoked with a command line parameter of the form TESTS = testl*test2...*testn, in which case the named tests will be performed. This approach makes it easy to retest a program in which unexpected trouble has developed. Of course, the test facilities available should be carefully documented at the start of the test_prog procedure.
Especially when a long program P is being developed, it may be desirable to begin testing before all parts of P have been coded (or even designed) in detail. Of course, such an approach will be reasonable only if P has a sound, highly modular overall design, and only if the missing sections of P have been designed in enough detail so that you can be sure that no inconsistency will develop when they are designed and coded in detail. This mode of organizing development and testing is sometimes called top-down testing. It has the advantage of allowing testing and development to proceed in parallel. A related advantage is that it can provide particularly early confirmation of overall design soundness, or, if a design proves to be unsound (say, in terms of human factors, i.e., usability), it can give early warning of trouble.
If a top-down approach to development and testing is taken it will be found useful to provide a standard, multiparameter library routine having the name MISSING_SECTION. Then parts of your program that have not yet been coded can be replaced by invocations
where the string-valued parameter -name_of_missing_section- should assign the missing section a name that can be printed. The MISSING-SECTION procedure should also allow optional additional parameters, so that it can be invoked by
where p1, p2,..name various parameters with which the missing section would have to deal or which might explain why it was (perhaps unexpectedly) invoked.
Second-stage (or quality assurance) testing should aim to exercise a program comprehensively, in at least the following senses:
In preparing a comprehensive collection of tests, you will therefore need to survey your program systematically, listing marginal situations of this kind as exhaustively as you can; then you should prepare at least one test that will force each logically possible situation to occur.
Even where resources do not permit fully independent organization of the activity of program testing, it is well to ensure that every line of a complex program is read and understood by at least two programmers, each of whom will be able to spot problems and suggest tests that the other might have overlooked.
If a chart has been prepared showing the program features exercised by each test (see iv above) it can be used when a test fails to suggest what part of the program should be examined first to find the cause of failure. If some one of these program sections has just been changed, it will of course come under immediate suspicion.
Two kinds of machine resource are relevant to the study of program efficiency: execution time and data storage required. In what follows, we will concentrate our attention on the first of these, because in many cases it is time which is critical. Data storage requirements were a pressing concern in the early days of computing, because memory was an extremely expensive hardware component and computers were equipped with only a few thousand words of it. As a result, early programming languages such as FORTRAN included various complex mechanisms for managing the limited memory resources available, which forced the programmer to concern himself with low-level details which were extraneous to the actual application to be programmed. Memory has in the meantime become a much cheaper commodity, and concern with its optimal utilization can in many cases (but by no means all) be left to the machine itself (in the form of some facility for dynamic storage allocation). This is what we will do in most of what follows.
for i in [1..1000] loop for j in [1..2000] loop for k in [1..3000] loop for l in [1..4000] loop (1) sum +:= (2 * i**3 + j**3 + k**2); end loop; end loop; end loop; end loop;
In this code, for each successive value i, the variable j iterates over 2000 different values; for each value of i and j, the variable k iterates over 3000 values; and for each value of i, j, and k, the variable l iterates over 4000 values. Thus, all in all, the innermost statement of the code fragment (1) will be executed 1000 x 2000 x 3000 x 4000 times, i.e., 240 billion times. This statement involves 6 multiplications and 4 additions, so that at least 2.4 trillion elementary arithmetic operations are required to execute the code (1). Even on a computer capable of executing 400 million arithmetic operations per second (a fairly typical performance figure nowadays) and even if the code (1) were written in a programming language capable of exploiting this raw arithmetic capability to the utmost, 6,000 seconds would be needed to execute the code (1). Since an hour is about 4000 seconds, this is about 1.5 hours. However, since SETL (which pays a price in efficiency for its very high level) is about 30 times less efficient than this, execution of the SETL code (1) would require about 2 days of computer time. This makes it quite clear that in writing SETL programs one needs some way of estimating the computational resource which will be consumed to execute the code that one sets down.
Instructions whose operands are simple data types: integers, floating-point numbers, Booleans, all take roughly one cycle to execute. This includes
Arithmetic operations on integers and floating-point numbersBoolean operations.
Comparison operations ( =, /=, >, <, etc.)
At the level of the actual machine, there may be secondary differences among some of these; for example, a floating-point division is typically slower than an integer addition. In this discussion we can safely ignore these differences, and we shall do so in what follows.
The simple assignment statement
where v is a simple variable name, also takes roughly one cycle to execute (of course, only after the value of expr has been obtained).
| Cardinality: | #S takes one cycle to evaluate, regardless of the actual number of elements of S |
| Membership: | (x in S) where x is a simple item (numeric or Boolean) takes typically three to six cycles to evaluate, and increases only slowly with the size of the set S |
There is nothing obvious about these remarks! You might wonder how the membership test can actually be performed without examining all the elements of S. The answer lies in clever choice of the internal data structures used by the SETL run-time system itself These structures are discussed in Chapter 10 and we will not say more about them now, except to note that the possibility of using sets as freely as we do in SETL depends in part in the constant execution time required by the membership operation. Note that this constant time behavior applies when x is a simple (nonstructured) value. In general, the membership test x in S takes as long as (or a little longer than) an equality test x = y. The relevance of this remark will become apparent in the following section.
| Insertion and deletion: | For simple objects the operations x with S and S less x, take a nearly constant number of cycles to execute. Exceptions relate to questions of storage allocation, and will be discussed later. The actual number of cycles needed for insertion of a simple value into a set of moderate size is about 10-15 cycles. |
| Map retrieval: | To evaluate or to modify f(x), where f is a map and x is a simple value, takes nearly constant time, comparable to the time required for set membership operation. Once again, efficient implementation of this fundamental operation is secured by the use of a sophisticated data structure about which we will say more in Chapter 10. |
This process is described by the following equivalent SETL fragment:
S3 := S1; -- Initialize result for x in S2 loop -- Iterate over second set. if x notin S1 then S3 with:= x; (2) end if; end loop;
After code like this has been executed internally, S3 will clearly be equal to (S1 + S2). To estimate the execution cost of the loop in (2), we can reason as follows: the loop body is executed (#S2) times, i.e., once for each element of S2. For each such element we perform a membership test and (possibly) a set insertion, both of which take constant time. Therefore the loop takes an amount of time proportional to #S2. The initialization of S3 amounts to making a copy of S1, which takes a time proportional to the size of S1. Putting all of this together, and neglecting small differences between the execution time of membership and insertion operations, we can conclude that the cost of a union operation is proportional to the sum of the cardinalities of the sets involved. This is not an exact statement, but it is sufficiently accurate to convey an idea of the order of magnitude of the execution time of this set operation.
We can apply a similar analysis to the set intersection operation (S1*S2). This is evaluated by the SETL run-time system in the following fashion:
S3 := {};
for x in S1 loop
if x in S2 then
S3 with:= x; (3)
end if;
end loop;
Using much the same reasoning, we can easily see that the execution time for this fragment is proportional to the cardinality of S1, i.e., to the size of the smaller of the two sets entering into the intersection operation.
Set difference, set inclusion, and subset testing operations are performed in similar fashion and can therefore be expected to take an amount of time proportional to the cardinality of one of their arguments.
Other SETL primitives involve similar (implicit) iterations over sets. The most obvious of these is the print statement: print(S) requires the examination and output of each element of S. Set equality and inequality are more complex examples. To test whether S1 and S2 are equal, the SETL run-time system performs the following sequence of actions:
procedure equal(S1,S2); if # S2 /= # S1 then return false; else for x in S1 loop if x notin S2 then return false; end if; end loop; return true; -- All elements are common. end if; end equal;
We mentioned in the previous section that the membership operation must often perform an equality test involving the object being tested for membership. This means that if S1 is a set of sets, and su is a set, then the operation
will take a time proportional to the size of su, because it will involve testing su and one or more element(s) of S1 for equality. It should be clear that when sets of sets of sets of..are involved on both sides, the membership test operation will become more complex, and the size of the subobjects of S1 will have to be taken carefully into consideration.
On the other hand, membership testing, i.e., the operation (x in T) where T is a tuple or string, proceeds by examining all elements of T in succession, in the manner suggested by the following equivalent SETL procedure:
procedure intuple(x,T); for y = T(i) loop if x = y then return true; end if; end loop; return false; -- x was not found in T. end intuple;
Given the way that tuples and strings are represented internally, there is no more efficient way to tell a priori where x is found in T, if indeed it is found in T at all. If x is not in T, this will only be ascertained after all elements of T have been examined, and if x is in T, it may be the first, second, ... n-th component. We can say that on the average, half of the elements of T will have to be examined before x is found. This means that a membership test applied to a tuple T can be expected to take a number of cycles proportional to #T. (Compare this with the more advantageous constant time behaviour of membership tests on sets).
If x is itself a structured object, and T is a tuple of such structures, then the equality test in the loop (5) can be expected to be proportional to the size of x. Therefore in this case the membership test (x in T) will take time proportional to (#x * #T).
Tuple and string concatenation are analogous to set union: they take a time proportional to the sum of the sizes of the operands, i.e., to the size of the result. Testing tuples and strings for equality is also linear in the size of the operands, as it requires that each component of each object (in the worst case) be examined.
The execution times of the basic SETL operations are summarized in Table 6.1. We can estimate the cost of any straight-line program simply by adding
| Operation | Time Units | |
| Arithmetic | i + j, i * j | 1 |
| Assignment | x := y | 1 |
| Indexing | T(i) | 1 |
| Map retrieval | M(x) | 5 |
| Set membership | x in S | 5 |
| Tuple membership | x in T | #T |
| Set union | S1 + S2 | #S1 + #S2 |
| Concatenation | T1 + T2 | #T1 + #T2 |
| Quantifier | exists x in S | C(x) | #S * (1 + cost(C)) |
the costs of each instruction in it. However, other issues, discussed in the next section, arise if we go on to examine the more interesting problem of estimating the cost of programs with various loop structures.
or over a map, tuple, or string, as in
produces set elements (or map values, tuple components, or string characters) at a rate of one per cycle. Essentially the same remark applies to numerical iterators, like those in (1). Hence to estimate the time required to execute a loop, we have to multiply the number of times the loop will be executed by the (average) time that it will take to execute the body of the loop. An obvious generalization of this rule applies to imbedded loops: if one for loop is imbedded within another, then the time required to execute the outer loop is the number of times it will be executed, multiplied by the (average) number of times the imbedded loop will be executed, multiplied by the time required to execute the body of the embedded loop. For example, the double loop
will execute in a time roughly equal to 1000 * 500 multiplied by the amount of time required to execute the loop body, since the embedded loop over j will execute an average of 500 times for each successive value of i. (This number 500 is halfway between the number one of times that j changes when i = 1 and the number (1000) of times that j changes when i = 1000.)
Since quantifiers and set formers are in effect prepackaged iterations, very similar rules apply to them. An existential quantifier like
... exists x in s | C(x) ... (6)
is evaluated as if it were the following SETL fragment:
maybe := false; for x in s loop if C(x) then maybe := true; exit; end if; end loop;an internal variable that holds the Boolean result of the test. This SETL fragment will execute in time equal to the number of items examined, multiplied by the average time required to evaluate the Boolean condition C(x). If the quantifier (6) evaluates to false, then all the members of s will need to be examined, so the time required will be #s multiplied by the average time to evaluate C(x). If (6) evaluates to true, then iteration over s will terminate as soon as an x that actually satisfies C(x) is found; since iteration over a set is performed in a somewhat unpredictable order, the number of iterations needed to find such an x should be roughly # s/( # sat + 1) where sat is the set of all x satisfying the condition C(x).
Similar remarks apply to set formers and the tuple formers, except that:
must always proceed until all the elements of s have been examined. As an application of these rules, note that execution of the harmless-looking code fragment
f := {};
for x in s loop
f(x) := {y in s | (exists z in s | C(x, y, z))}; (7)
end loop;
involves three nested loops: first the for loop which appears explicitly, next the implicit iteration over s in the setformer {y in s..}, and then finally the implicit iteration over s in the quantifier exists z in s...Therefore the number of cycles required to execute (7) is roughly as high as the cube of the number of elements of the set s.
k := 0; while (k +:= 1) < n and t(k) /= OM loop statements.. end loop;
where k is incremented each time through the loop, behaves in very much the same way as a for loop and therefore will terminate after no more than n - 1 iterations. On the other hand, consider the loop
t := n * [0]; while exists x = t(i) | x = 0 loop print(t); t(i) := 1; t(1..i - 1) := (i - 1) * [0]; end loop;
This begins by generating a tuple t := [0,0,0,...0] of n zeroes and then repeatedly sets the first nonzero coordinate of t to 1 and all the coordinates preceding this coordinate to zero, thus carrying out a (left-to-right) form of binary counting. The sequence of tuples printed is
[0,0,0,...0] [1,0,0,...0] [0,1,0,...0] [1,1,0,...0] [0,0,1,...0] [1,0,1,...0], etc.
and is plainly of length 2n, in that it generates all binary numbers from 0 to 2n - 1. Hence the number of cycles required to execute the while loop (8) is at least 2n, which means that if n = 50 the loop will execute for roughly 320 years, even if 100,000 iterations are performed per second!
For a more realistic example of the way in which while loops are typically used, consider a particularly simple sorting method, dubbed flip-sort.
while exists i in [1..#t1] | t(i) > t(i + 1) loop [t(i),t(i + 1)] := [t(i + 1),t(i)]; (9) end loop;
This searches a tuple t for out-of-order components and interchanges a pair of such components whenever one is found. Plainly the number of cycles required to execute (9) is the average time required to search the tuple t for an out-of-order pair of adjacent components, multiplied by the number of interchanges required to put t in sorted order. Even though precise analysis of these times requires subtle analysis going far beyond the scope of this book, it is not hard to estimate these time requirements crudely. We can guess that, as long as an out-of-order pair exists, one such pair will be found after searching through some fraction of the length of tuple t being sorted; thus evaluation of the existential quantifier appearing in the first line of (9) is estimated to require cn cycles, where c is some constant which we will not attempt to evaluate here and n is #t. Moreover, since each execution of the body of the while loop (9) corrects exactly one case in which a pair of elements appears in inverted order, the expected number of times that (9) must iterate to put t into its final sorted order should be roughly equal to the number of pairs of components of t which occur in inverted order. In a random arrangement of the components of t, roughly half the components to the left of a given t(i) should be larger than t(i), and roughly half the components to the right of t(i) should be smaller than t(i). Thus each component t(i) of t should appear in roughly #t/2 inverted pairs, and it follows, since this way of looking at things counts inverted pairs twice, once for each of the components in such a pair that the expected number of inverted pairs in a randomly chosen arrangement of the components of t should be roughly (1/4)n2). Multiplying this expression by cn, representing the estimated time required to evaluate the existential quantifier in the first line of (9), we arrive at
(1/4) cn3 (10)
for the time required to execute this 'flip-sort' code on a tuple of length n.
The approximations which we have made in arriving at the estimate (10) are too crude for the constant (c/4) appearing in (10) to be particularly meaningful. (Ex. 9 following Section 6.5 outlines an experimental procedure for estimating this coefficient more accurately.) The significant feature of the estimate (10) is that it tells us that the time required to sort a tuple by the flip-sorting method is proportional to the cube of the length of t, i.e., that - sorting a tuple of length 10 by this method should take roughly 120 cycles sorting a tuple of length 100 roughly 120,000 cycles, and sorting a tuple of length 1000 roughly 120,000,000 cycles. These figures, which are not very favorable, reflect the rapidity with which the cube of n grows as n increases. In the jargon of algorithm analysis, one says that flip-sort is an 'n cube' algorithm - or an 0(n3) algorithm. Clearly, any sorting algorithm whose time requirement grows less rapidly than the cube of the length of t will be very much superior to flip sort as a technique for sorting large tuples t.
Let us therefore modify our naive flip-sort algorithm in order to obtain a more efficient sorting method. We begin by noticing that the existential quantifier in (9) forces us to scan the tuple from the beginning, each time we find an out-of-order pair. This is wasteful because an out-of-order element may be out of place with respect to several of its successors, not just one. A better approach is to try to push the winner of each such exchange as far as it will go, and go back to the beginning of the tuple only after we have made a full pass over the tuple. This leads to the following:
for j in [1.. #t - 1] loop for i in [1 .. #t - 1] loop if t(i) > t(i + 1) then [t(i),t(i + 1)] := [t(i + 1),t(i)]; (11) end if; end loop; end loop;
Note that this code will push the largest element in the tuple to the last position the first time the inner loop is executed, the second largest element will be pushed to the second position from the end on the next pass through the inner loop, and so on. Each execution of the inner loop adds one element to the fully sorted portion of t which is built up at its end. The outer loop is executed (#t1) times to ensure that every element of t finds its proper place.
A further improvement is possible. Elements that have reached their correct position need not be examined in successive passes because we know that they are in their proper places. We can therefore modify (11) to read as follows:
for j in [1..#t - 1] loop
for i in [1..#t - j] loop
if t(i) > t(i + 1) then
[t(i),t(i + 1)] := [t(i + 1),t(i)]; (12)
end if;
end loop;
end loop;
The fragment (12) is the bubble-sort method that we examined in Section 5.1.1.
Next let us examine the execution time of (12). Again, let n be #t. The outer loop is executed (n - 1) times. The inner loop is executed (n - 1), (n - 2), ... 3,2,1 times, that is to say, (n/2) times on the average. The body of the inner loop consists of various comparison, indexing and assignment operations, all of which take some constant time c1. Therefore the total execution time of (12) is roughly c1 * (n - 1) * (n/2) cycles. We therefore can say that the code (12) executes in time proportional to the square of the size of t. This is a considerable improvement over our initial flip sort. Several further refinements can be made to (12), but these improvements only affect the constant c1 and do not modify the n2 behavior that we have just obtained.
The order-of-magnitude analysis that we have just performed is the most frequent and useful kind of algorithm performance analysis. Often it is enough to know what the order-of-magnitude behavior of a program is to estimate whether it is acceptable for the small, medium, large, or very large amount of data that is to be processed.
The transformation that took us from (9) to (12) deserves further discussion. What we did was to remove an existential expression
which describes some exhaustive search over s, and replace it with a more intelligent search over that set. This shows a fairly typical use of existential quantifiers: they allow us to write a crude specification for some kind of search process, in order to obtain a working algorithm quickly. When the time comes to improve the efficiency of that algorithm, we can often replace the existential quantifier with a more precise (and therefore more efficient) search, better adapted to the problem at hand. This stepwise approach to program efficiency is one of the most important principles of good programming practice. Proper use of SETL consists precisely in starting with as condensed and abstract a formulation as can be achieved by use of set formers, quantified expressions, power sets, etc. Once such an abstract program is running, one can then replace these abstract constructs, in those places where they are unacceptably inefficient, by more efficient lower-level constructs which make use of sophisticated data arrangements.
However, in most cases the important expense is the time the called procedure actually takes to execute. If the called procedure consists only of loops and simple statements, we can estimate its execution time using the techniques sketched in the previous section. On the other hand if the called procedure calls other procedures in turn, or more interestingly calls itself recursively, then the analysis of its efficiency is considerably more subtle. To indicate the kind of reasoning required, and also introduce some useful notation, let us first examine the "quintessential recursive procedure," namely the factorial:
procedure fact(N); return if N = 0 then 1 else N * fact(N - 1) end if; (1) end fact;
Since the factorial of a positive number N is the product of successive integers up to N, an iterative calculation of fact(N) will require N multiplications, i.e., take time proportional to N. It is conventional to indicate this by saying that iterative calculation of fact (N) uses time O(N).
This is also the case for our recursive definition. To establish this, we reason as follows: if N is not zero, execution of fact (N) will require one comparison operation (the test N = O) one multiplication, and one recursive call to fact involving a smaller argument, namely N1. If we let T(N) designate the time it takes to execute fact when its argument is N, this leads to the following equation:
T(N) = 2 + T(N - 1) (2)
where the constant 2 corresponds to the two elementary operations noted.
This equation is called a recurrence relation for T and defines the value T for each possible argument N, as a function of the value that T(N) takes as an smaller argument. Note that this recurrence parallels the recursive definition of fact, in which the value calculated is also defined in terms of another invocation of the same procedure. To solve the recurrence relation means to express T(N) as some explicit function of N. For the recurrence (2), this can be done easily, as follows. The relation (2) holds for arbitrary values of N (except 0), so that substituting N for N1 in (2) we find that
We can use this to replace T(N1) on the right-hand side of (2), which gives
Repeating the same reasoning, we can express T(N - 2) in terms of T(N - 3), and so on. This process ends when we express T(1) as 2 + T(0). T(0), that is to say the time required to evaluate the factorial when its argument vanishes, is just the execution time of the test (N = O) because in that case the procedure does not recurse. Thus T(0) = 1. Putting all of this together, we obtain
T(N) = 2 * N + 1 (3)
which indicates that our recursive factorial program also executes in a time proportional to its argument N. The calculation just described neglected the actual cost of the procedure call itself, and that of the return instruction. Taking these into account would modify (3) only by constant factors, i.e., would gives us
where the constants a and b are 10. Hence our conclusion that calculation of fact(N) requires time proportional to N remains correct.
Next let us examine a more complex example of a recursive procedure: the Towers of Hanoi procedure of Section 5.4.4. Recall that the Towers of Hanoi problem is that of moving N rings, stacked in decreasing order of size on a peg A, to a peg B, using some intermediate peg C as a ring holder, and respecting the ordering of the rings, so that a ring never lies over a smaller ring. As seen in Section 5.4.4, the solution is given by means of the following recursive procedure:
procedure Hanoi(N, fr, to, via);
if N = 0 then return;
else
Hanoi(N - 1, fr, via, to);
print("from peg ", fr, " to peg ", to);
Hanoi(N - 1, via, to, fr);
end if;
end Hanoi;
Using the same reasoning as before, we can readily establish the following facts:
T(N) = a + 2 * T(N - 1) (5)
and by the "boundary condition"
T(O) = b (6)
where a and b are small constants. Once again, we can use this recurrence relation for T(N) to express T(N1) in terms of T(N2), T(N2) in terms of T(N3), and so on, until T(N) is expressed only in terms of a and b. These successive replacements look as follows:
T(N) = a + 2 * (a + 2 * T(N - 2))
= a * (1 + 2) + 4 * T(N - 3)
= a * (1 + 2) + 4 (a + 2 * T(N - 4))
= a * (1 + 2 + 4) + 8 * T(N - 4)
The pattern should be clear: each substitution adds another power of 2 to the coefficient of a, and another factor of 2 to the coefficient of T(Ni), so that this coefficient is always 2i-1. Our calculation stops when i = N, at which point we have
We conclude that procedure Hanoi takes an amount of time which is an exponential function of its input parameter N (the number of rings to be moved) This means that adding one more ring doubles its execution time. Clearly, a procedure with this behavior can only be useful for relatively small values of its argument. (The Buddhist legend from which this problem originates states that 64 such rings are being moved by dedicated monks since the beginning of time, and that the completion of their task will mark the end of the world. The preceding analysis indicates that we need not be too concerned yet about this prospect.)
Procedures are only useful in practice if they do not have the explosive time requirement that the Hanoi procedure illustrates: factorial is linear in its argument, as we saw previously, and the sorting procedures discussed so far are cubic or quadratic in the number of elements to be sorted. Most of the procedures that we use in practice consume time O(n), O(n2) or O(n3) at most.
Finally let us analyze the performance of one more simple recursive procedure, namely quicksort, which was presented in Section 5.4.1. This procedure, which can sort any homogeneous set s of integers, floating-point numbers, or strings, is simply
procedure quick_sort(s);
if #s = 0 then return [ ]; end if;
x := arb s; (7)
return quick_sort({y in s | y < x})
+ [x] + quick_sort({y in s | y > x});
end quick_sort;
Let T(n) be the number of cycles that this procedure will typically require to sort a set of n elements. Building up the two sets which appear in the final return statement of (7) will require a number of steps proportional to the size of s, in that a full pass over s is required in each case. Thus the time required to execute (7) is equal to some small constant multiple c*n of the length n of s (c = 3 is a fair guess), plus the time required to execute the two recursive invocations of quicksort which appear in the second return statement of (7). Since typically the element x chosen from s by the arb function of (7) will lie roughly halfway between the largest and the smallest elements of s, each of the two sets {y in s | y < x} and {y in s | y > x} should contain approximately half the elements of s. Thus, given that T(n) is the time required to sort a collection of n elements by the quicksort method, sorting each of these sets by use of quicksort should require roughly T(n/2) cycles. It follows that T(n) satisfies the recursive relationship
T(n) = 2 * T(n/2) + c * n (8)
The first of the terms on the right of (8) represents the time typically required to execute the two recursive invocations of quicksort appearing in (7), and the term c * n represents all the work needed to prepare for the two invocations.
We solve (8) in the same fashion as in our two previous examples, by substituting the expression (8) for the occurrence of T on the right of (8), which gives
T(n) = c * n + 2 * c * (n/2) + 4 * T(n/4) = 2 * c * n + 4 * T(n/4), (8a)
and then substituting (8) for T(n) on the right of (8a) to get
T(n) = 2 * c * n + 4 * c * (n/4) + 8 * T(n/8) = 3 * c * n + 8 * T(n/8). (8b)
Continuing inductively in this way we will clearly get
T(n) = 4 * c * n + 16 * T(n/16), T(n) = 5 * c * n + 32 * T(n/32),
and so forth, until eventually, when the power of 2 in the denominator on the right becomes roughly equal to n (which will happen after log n steps, where log n designates the logarithm of n to the base 2), we will find that T(n) is roughly
i.e., that T(n) can be estimated as the product of a small constant c (still roughly 3), times n, times the logarithm of n. One therefore says, in the jargon of algorithm analysis, that quicksort is an "n log n" or "O(n log n)" algorithm.
For n at all large and c roughly equal to 3, c * n * log n will be vastly smaller than the cube or even the square of n. For example, for n = 1000, n ** 3 is 1,000,000,000, n ** 2 is 1,000,000, whereas c*n* log n is only 30,000. Therefore quicksort can be used to sort large amounts of data, which could not be sorted in any reasonable amount of time using flip or even bubble sort. For example, if #t= 10,000, and on a computer capable of executing 100,000 of our nominal instruction cycles per second, sorting t using the flip-sort method will require approximately 16 hours, bubble sort will take about 20 minutes, whereas quicksort will accomplish the same operation in roughly 4 seconds.
This simple example shows why it is so important to find algorithms whose time requirements do not rise rapidly as the arguments passed to them grow larger. Very considerable efforts have been devoted to the search for such high-efficiency algorithms during the past decade, and a great many ingenious and important algorithms having efficient behaviour have been devised. Most of these algorithms lie beyond the scope of the present introductory work. For basic accounts of this important material, see the References at the end of this chapter.
To understand this important point, note first of all that copying is sometimes logically necessary. Consider, for example, the code
s := {1,2,3,4,5,6,7,8,9,10,15,20};
s1 := s;
s with := 25;
s1 less := 2;
print("s = ",s);
print("s1 = ",s1);
The output that this will produce is
s = {1 2 3 4 5 6 7 8 9 10 15 20 25}
s1 = {1 3 4 5 6 7 8 9 10 15 20}.
Since two different values will have been created by the time we reach the final step of (1), it is clear that somewhere along the way to this final step the single set constant assigned to the variable s will have to be copied. This copying can be done when the value of s is assigned to the second variable s1 in the second line of (1) (copying on assignment), or can be done in the third line of (1), when the value of s (but not that of s1) is modified by addition of the extra element 25 (copying on modification). Exactly where copying actually takes place will depend on the particular version of the SETL system that you are using, and especially on whether or not this compiler includes an "optimization" phase. But in any case, whenever two variables appear to be sharing the value of a composite object, and one of these variables is modified, the other one must be provided with a fresh copy of that composite object, because in SETL assigning to one variable never affects another. Copying a set or tuple with n components can require O(n) cycles. Hence execution of an apparently cheap operation like t(i) := x can require a number of cycles proportional to the length of t, if the value of t has been assigned to some other variable.
On the other hand, copying is frequently unnecessary, and both the optimizing version of the SETL compiler and the SETL run-time library include mechanisms for avoiding copying when it is not logically necessary. (Since these are implementation-level mechanisms, and fairly complex ones at that, we shall say nothing about how this is done.) When no copying is involved, the operation s with x is only two or so times slower than the membership test x in s, and similar remarks apply to the other operations in the group we have been considering. For example, the assignment t(i) := x can be done in just one of our nominal "cycles," and in the same circumstances map assignment is roughly five times as slow.
This is but one symptom of the fact that the storage management which the SETL system performs contributes to the total execution time of a program. This contribution will be small if the objects manipulated by the program do not change size very frequently, but can be very substantial if large objects are constantly created, shared, updated, and then discarded. Programming techniques that avoid storage management costs are known and studied in data-structure courses, for which a number of excellent texts exist. Such data structures can be constructed in SETL, and the next section gives one example of such, namely the linked list. In this text we have largely ignored the subject of data structures of this type, in accordance with the tenet of high-level programming which states that the burden of storage management is best left to the system. The gains in ease of programming thereby attained are well worth the execution costs incurred. Nevertheless, the fact that storage management can become expensive should be kept in mind. Note also that in some circumstances, such as in the programming and control of real-time devices (i.e., physical equipment such as a motor, a cardiac monitor, a radar, an air-conditioning system, etc.) where a computer system must respond with minimal delay, no hidden execution costs can be allowed, and the programmer must know, within a few milli- or microseconds, the time that a given code sequence will take to execute. Such systems cannot be programmed in SETL or in any other high-level language such as APL or SNOBOL For such systems, lower-level languages, in which all storage allocation is described in full detail by the programmer, must be used.
In order to focus our discussion, we have to examine more details of the implementation of the basic SETL types. In this section we will describe the way tuples are actually represented in the SETL run-time system and examine the advantages and disadvantages of this representation. This will allow us to understand the potential advantages of an alternative representation which is efficient in cases where the standard representation for tuples is comparatively expensive. Note that the representations of sets and maps used by the run-time system are considerably more complex than the representation of tuples.
Internally, that is to say in the memory of the computer on which your program is running, a tuple can be represented by a contiguous sequence of cells, or words, that contain the components of the tuple. These components appear in order, so that the first component of the tuple appears in the first of these cells, the second component in the second, etc. For example, if f is the tuple [2,3,5,7,11], it could be stored in memory as follows:
| 2 | 3 | 5 | 7 | 11 |
When working with such a representation the SETL run-time system would always know where the first component of t is found. From this, it can easily calculate the location of the second, third, components of t, and so on: the location of any component t(i) is simply (i - 1) plus the location of the first component. cells after the location of t(1). This is why a tuple is the most efficient way storing data that is referred to by number, i.e., according to its ordering: when such a collection of data is represented as a tuple, all items in this collections are equally accessible. Tuples are built for efficient indexing.
Other operations on tuples are not as easy to perform. Consider
whose purpose is to shorten t by removing two of its components. After this operation is performed, t will be stored as follows:
| 2 | 7 | 11 |
Thus we now have t(2) = 7, t(3) = 11, etc. This requires that the components of t that appear after the deleted section be "moved down" two positions. For a long tuple, this can represent a substantial expense. Similarly, operations that enlarge a tuple, by insertion or concatenation, such as
t with:= 13; t(1..3) := [1,2,3,4,5]; t +:= [17,19,23,29];
may require that a certain number of elements of t be moved or copied.: there is no room in the portion of memory currently allocated for t for the expansion required by one of the preceding operations, then a larger area will have to be found by the storage manager, into which the whole of t will have to be copied and then modified. We conclude that operations that modify the size of a tuple repeatedly carry a potentially substantial penalty in storage allocation activity. In cases where data are ordered but are not accessed by number, i.e. when no indexing operations are performed, and where substantial changes in the size of a tuple will occur, other representations exist that lead to smaller storage allocation and data motion costs. The most familiar among such representations is the chain, or linked list.
In the abstract, a linked list is a structure in which data are organized according to some linear order, i.e., with a first element, a second one, and so on, and so that it is easy to go from one element to its immediate successor. A tuple is of course an example of a list, but one in which elements are stored contiguously. In general a linked list is not stored contiguously, and the mechanism that takes us from one element to the next must be described explicitly. In SETL we may choose to use a map (call it next) which takes us from any element to its successor in the ordering. Then rather than having to use contiguous locations, successive elements of the list can be anywhere. In this case we can think of the location that holds each element of the list a being marked by an arbitrary tag, for which SETL atoms will be used in what follows. The contents of each such location are also described by a mapping (call it val_of), so that if C is a cell in this list, val_of(C) is the element stored at C, and next_of(C) is the cell that contains the next element. In addition we need to have an explicit way of referring to the first element of the list: a variable comp1 must be used, whose value is the first cell. Then, all in all, val_of(comp1) is the first element of the list, val_of(next_of(comp1)) is the value of the second element of the list, etc. The last cell cn in the list is distinguished by having no successor, which we express by setting next_of(cn) = OM.
We are now ready to build and manipulate lists. In order to create a list with a single element x, we execute the following fragment:
comp1 := newat(); -- Define cell. val_of(comp1) := x; -- place element in cell. next_of(comp1) := OM; -- First element is currently the last also.
In order to add a new element y after x on this list, we execute the following fragment:
new := newat(); -- Define new cell. val_of(new) := y; -- Place element in it. next_of(comp1) := new; -- First element now has successor. next_of(new) := OM; -- Second one does not.
If instead we want to add a new element in front of the list, we execute
new := newat(); -- New cell. val_of(new) := z; -- And its value. next_of(new) := comp1; -- First element now becomes second. comp1 := new; -- And most recent element is first.
Note that each invocation of newat() just represents a way of obtaining a unique marker for each element in the list. Instead of atoms we could have used integers, strings, or indeed any kind of SETL value. We choose to use atoms to emphasize that each cell in the list requires a "name" of some sort, but that this name serves as a place-marker for the value that goes with it and has no other significant relationship to that value.
In order to transform any tuple t into this list representation, we can make use of the following procedure:
procedure into_list(t);
if t = [ ] then return OM; end if;
first := newat();
cell := first;
succ := newat();
for i in [1..#t] loop -- Place tuple component
val_of(cell) := t(i); -- Cell for next.
next_of(cell) := succ; -- In which next component
cell := succ; -- will be stored.
succ := newat();
end loop;
next_of(cell) := OM; -- Last element.
return first; -- First allocated cell.
end into_list;
This procedure (which assumes that next_of and val_of are globally available, map-valued quantities) returns the first cell of the list, so that by applying the mappings next_of and val_of repeatedly we can reach all the elements of the list. We have assumed that next and val_of are global variables, defined outside this procedure. This is indeed the way in which a program using lists is usually structured: a single next mapping is used to describe the chaining of all lists present in the program. The chained representation of a tuple that is obtained from procedure into_list can be pictured as follows (Figure 6.1):
At first sight, representing a tuple in this way may not appear to be such a good idea. Of course, it is not hard to iterate over tuples having this representation: we simply start at comp1 and apply next repeatedly to step along, always applying val_of to get the component value corresponding to whatever index we have reached. For example, instead of writing
if exists x = t(i) | C(x) then ... else... (1a)
as we would if t were a standard SETL tuple, we would write
i := comp1; -- Initialize search index while i /= OM loop x := val_of(i); -- Value of next element. if c(x) then exit; end if; -- If found, search is successful (1b) i := next_of(i); -- else advance search index end loop; if i /= OM then ... else...
Although no less efficient than (la), the code (lb) is certainly more comp1ex and hard to read. Moreover, accessing a given component t(k) of t is much less efficient in the list representation, since for standard tuples the operation
is performed in one or two cycles, whereas if t has the list representation we will instead have to count up to k as we step through the list, as the following fragment shows:
i := comp1; for j in [1..k-1] loop i := next_of(i); end; x := val_of(i);
whose execution requires at least k cycles; that is to say, indexing operations are not efficient when applied to lists. On the other hand, certain operations that modify the size of tuples can be performed much more rapidly in the list representation than for standard SETL tuples. For example, for a standard tuple the operation which inserts x immediately after the i-th component of t requires time proportional to the length of t, since to create the expanded tuple all of the elements of t will have to be copied into new positions. On the other hand, if t has the list representation, and we know the name ni of the cell after which the insertion is to be performed, this operation can be done in just a few cycles, since all we have to do is
Similar remarks apply to the operation t(i..i) := [ ] which deletes a given component from a tuple in list representation. The following two procedures represent these operations. In writing these procedures, we suppose that a single pair of maps I>next and val_of will be used to represent all lists, and that the variables next and val_of have been declared global. We also suppose that only one logical reference to any of these maps is ever extant, so that no copying (see the preceding section) ever needs to be performed.
var next,val_of; -- Global Variables
next := {}; -- Initialize to null set
val_of := {}; -- Initialize to null set
procedure insert(x, ni); -- Inserts x immediately after the list
-- component
ix := newat(); -- whose cell is ni. Create a new cell ix,
next_of(ix) := next_of(ni); -- and make it the predecessor of next_of(ni)
next_of(ni) := ix; -- and the successor of ni
val_of(ix) := x; -- Place value in cell.
end insert;
procedure delete(ni);
-- delete the component immediately following that whose cell is ni
next_of(ni) := next_of(next_of(ni)?ni); -- unless ni is the last index in its list make
-- ni's successor the successor of ni's
-- original successor
end delete;
Provided tuples t1 and t2 are represented as lists, that t1 will not be required after t1 and t2 are concatenated, and that the name i of the last cell of t1 is easily available, the concatenation of t1 and t2 can also be formed in a number of cycles independent of the length of t1 and t2. The following procedure, in which we assume that each tuple t1 n list form is represented by a pair [first, last] consisting of the first and the last cell of t, shows this:
procedure concat(t1,t2); [t1_first,t1_last] := t1; -- unpack the first and -- last cells of t1 [t2_first,t2_last] := t2; -- and those of t2 if t1_first = OM then -- t1 is an empty list return t2; elseif t2_first = OM then -- t2 is an empty list return t1; else next_of(t1_last) := t2_first; -- link the two lists return[t1_first,t2_last]; end if; end concat;
The mapping next_of, and the variables used to keep track of first and last elements of lists, together define what are called pointers in other programming languages.
Quite a few other very useful trick representations of tuples, sets, maps, etc., are known. These representations make use of more comp1ex arrangements of pointers, i.e., internal mappings between elements. If the family of operations applied to a SETL object is appropriately limited, use of an appropriate one of these special representations can be very advantageous. Since further exploration of this very important issue would take us beyond the scope of the present work, we refer the reader to the References at the end of this chapter for additional material concerning the issue of data structuring.
EXERCISES
1. Write a while loop that takes 2**(2**n) cycles to execute, for some n. Write a while loop that take 2**(2**(2**n)) cycles to execute. For how many centuries will these loops run, if n = 100?
2. If the monks in charge of the Towers of Hanoi began moving the 64 rings in the year 1 million B.C. and if they more one ring per second, when will the world end?
3. By adding a mapping prev that takes a given list element to its predecessor in the list, it is possible to iterate over a list forward and backward. Write a procedure to transform a standard tuple into this "doubly-linked" list representation.
4. Write procedures to insert and delete elements from a doubly-linked list.
5. Estimate the cost of the list concatenation procedure of Section 6.5.9. Assuming that a map operation such as next_of(last) := first; takes five cycles, and that copying a tuple of size n takes 2*n cycles, determine the size of t1, t2 for which the linked representation is preferable to the standard contiguous representation for performing tuple concatenation.
The examination of various sorting examples in the previous sections indicates the very general fact that several algorithms, of quite different efficiencies, can usually be found to perform a given task. Several questions then arise:
The answer to (a) is found in the efficiency analysis which we sketched in Section 6.5. Given two algorithms, we can compare the number of operations they are expected to take in various cases and thus determine which one is preferable. Question (b), namely how do we know that a given algorithm is indeed the best possible for a given task, is much harder, and is the subject of so-called lower-bound analysis. For example, in the case of sorting lower-bound analysis gives the following result: "When we sort a set of size n by means of comparisons between its elements, 0(n log n) operations are required". This rigorous mathematical result (which we shall not prove here) shows that quicksort is probably close to the best way to sort, although bubble-sort is definitely not. Unfortunately such lower bounds are known only for a few interesting problems, and in general we have no systematic way of knowing how close to optimal a newly invented algorithm is.
Question (c), namely how can we improve on a given algorithm in order to improve it, is the source of much current research and also of large amounts of programming lore. Some basic techniques for transforming a program into a more efficient one take the form of very general prescriptions, such as
These prescriptions are simple and intuitive and can be expected to be generally useful. They can be applied almost mechanically and in fact can be automated to some extent, so that a clever compiler can apply such prescriptions by itself. In fact, most good compilers perform transformations corresponding to (1) and (2). Reuse of storage, as suggested in (3), is harder to automate, and the application of this rule requires greater ingenuity on the part of the programmer. The removal of recursive calls in favor of iterations has been studied in great detail in the context of languages such as LISP, which use recursion very heavily. We will examine a nontrivial example of recursion removal in a few pages.
In addition to the general rules stated in (1.)-(4.), there are certain optimizing program transformations that apply to very specific program constructs. Catalogs of such transformations can be found in some of the works listed in the References. Here we will only give some simple examples of applications of the rules (1), (2), and (3).
The simplest kind of efficiency-improving transformation which we can perform is to identify expressions that are common to several calculations and that can be evaluated once instead of repeatedly. For example, when calculating the roots of a quadratic equation, we may write the following familiar formulae:
x1 := (-b + sqrt(b**2-4.0*a*c))/(2.0*a); (1) x2 := (-b - sqrt(b**2-4.0*a*c))/(2.0*a);
In this case it is plain that the expression involving the square root is the same for both roots, which makes it possible to rewrite (1) as follows:
discr := sqrt(b**2-4.0*a*c); x1 := (-b + discr)/(2.0*a); x2 := (-b - discr)/(2.0*a);
Transforming from (1) into (2) saves four arithmetic operations and one square root evaluation and makes the expressions simpler to type. We remark that the expression (2.0*a), which is a common subexpression to x1 and x2, is still evaluated repeatedly in (2). Replacing the two evaluations by a single one by saving the result in some other variable would eliminate one arithmetic instruction but add one assignment statement and therefore would hardly be a saving at all.
As a less trivial example of removal of redundant subexpressions, consider the evaluation of polynomials at a point. That is to say, given some polynomial
we want to determine its numerical value at some point, e.g., for x = 10. If we represent the polynomial by a tuple P containing its coefficients in decreasing order, then we can evaluate the polynomial for x = c by means of the following code:
degree := #P - 1; value := +/[a*c**(degree-i + 1): a = P(i)];
Note that in the loop we evaluate c**n, c**(n-1), c**(n-2), etc. These exponentiations are evaluated as repeated multiplications, so that the number of operations performed will be
where the O(n) term takes care of the iteration over P and of a final summation of individual terms. In other words, as written the fragment (3) takes an amount of time proportional to the square of the degree of the polynomial. Substantial improvement can be obtained if we observe that the powers of c can be calculated in increasing order rather than in decreasing order, so that c**n is obtained from c**(n-1) by means of a single multiplication. An improved version is the following:
power := 1;
value := 0;
for i in [#P,#P - 1..1] loop
term := power * P(i); (4)
value +:= term;
power *:= c;
end loop;
Here the loop is executed n times (where n is the degree of P) and each execution of the loop takes three arithmetic instructions and a few assignments, so that the algorithm is now of order n, rather than n2. This is a substantial improvement for large enough n.
Let us note that a similar improvement can be obtained by a different approach, which is several centuries old, and is known as Horner's rule. We can rewrite the polynomial (*) by means of successive factorizations:
This suggests the following method for evaluating P:
value := 0;
for a = P(i) loop
value := value * c + a;
end loop;
Now the loop contains only two arithmetic operations and one assignment. This is certainly superior to (4), and is therefore the method of choice for evaluating polynomials. Note however that (5) was obtained by means of a clever rewriting of the original problem, and the solution was not the result of some systematic transformation rule, but was "pulled out of a hat," as it were. It is appropriate to remember at this point that the development and optimization of algorithms require a healthy dose of inventiveness (guided, to be sure, by a knowledge of existing algorithms and optimization techniques)
In order to illustrate the kind of program improvement that can be obtained by recursion removal, we will examine in some detail the (by now familiar) quicksort method. In its simplest version, we have described quicksort as follows:
procedure quicksort(S);
if S = {} then return [ ];
else
x := arb S;
lo := {y in S | y < x};
hi := {y in S | y > x}; (1)
return quicksort(lo) + [x] + quicksort(hi);
end if;
end quicksort;
The skeleton of this algorithm consists of two steps: partition and recurse. That is to say: pick some element of the set to be sorted, and partition the rest of the set into two disjoint sets, corresponding to elements that are smaller (resp. larger) than the partitioning element. Then apply this process recursively to each partition. The two very substantial improvement that can be brought to bear on the efficiency of quicksort are performing the partitioning in place and replacing the recursion with a loop.
Partitioning in Place. To see how to apply this improvement we first observe that after S has been partitioned into lo and hi, the value of S is not used again. This suggests that, instead of building lo and hi at each step as new sets, we may want to reuse the space occupied by S, to place lo and hi in it. How to do this is not immediately apparent: as long as S, lo, and hi are sets, we do not have a simple way of describing how they are actually stored. If however we make them into tuples forming part of a larger tuple, then it is possible to describe them by their size and their starting locations. Suppose S is a tuple of size n, and that the partitioning element which we choose turns out to be the k-th one (in increasing order). Then the locations 1, 2..k-1 can be used for lo, and the locations k + 1, k + 2..n can be used for hi. Then, rather than passing lo and hi as tuples in the recursive call, we can instead pass the end points of each and then proceed to partition each subtuple in place as well. (How the partitioning in place is actually performed remains to be discussed.) This leads to the following (tentative) improved version of quicksort:
T:= [x in S]; -- Initial tuple to be sorted. quicksort2(1,#T); -- First call to routine. procedure quicksort2(m, n); -- Apply quicksort to the portion of T (which is global) between T(m) and -- T(n) inclusive. if m >= n then return; -- Trivial case. else k = partition(m,n); -- k is the position of the partitioning element quicksort(m,k - 1); -- Elements smaller than it. quicksort(k + 1,n); -- Elements greater than it. end if; end quicksort2;
In this version of the algorithm, no composite objects are constructed (except for the global tuple T). Each call to quicksort is given an explicit region of T on which to operate. This forces us to mention the indices m, n, k, etc., explicitly, which makes for a longer and less readable piece of code. We must now show that the partitioning itself can be done in place. This is an interesting algorithm in its own right.
Partitioning an Array in Place. The problem is the following: given a tuple t, and an arbitrary element e of t, reorder the elements of t so that all elements l < e appear before e, and all elements h > e appear after e. For simplicity let us assume that e is chosen to be the first element of t. A first thought might be the following: scan t from left to right, and whenever an element e1 > e is found, move it up. But exactly where to? If we are to reorder t in place, then in order not to loose any information, we can only exchange elements. As a moment's reflection will show, the proper way to proceed is the following:
The following code (which deserves very careful scrutiny) performs this partitioning process:
procedure partition(i,j);
-- Partition the portion of array T from T(i) to T(j),
-- using T(i) as the partitioning element e. The procedure places e at
-- position k, and places all elements smaller than e at positions < k.
e := T(i); -- Partitioning element.
l := i + 1; -- Index for left-to-right scan.
r := j; -- Index for right-to-left scan.
while l < r loop -- Scans have not met in middle.
while l < r and T(l) < e loop -- bypass smaller elements.
l +:= 1;
end loop;
while l < r and T(r) > e loop -- bypass larger elements.
r -:= 1;
end loop;
-- Now T(l) > e, and T(r) < e. Exchange them if the two scans have not met yet.
if l < r then
[T(l), T(r)] := [T(r), T(l)];
-- Skip over these two elements which are now in proper place.
l +:= 1; if r > l then r -:= 1; end if;
end if;
end loop;
-- Now l = r. l = r or l - 1 (if it is positive) is the correct position
-- for the partitioning element. Place it there, and place T(r), which is no
-- larger than it, at the original position of the partitioning element.
if l > #T or T(l) > T(i) then l := (l - 1) min j; end if;
if l > 0 then T(i) := T(l); T(l) := e; end if;
return l;
end partition;
The top-level driver code for this routine can simply be
procedure quicksort(i,j); if i >= j then return; end if; mid := partition(i,j); quicksort(i,mid - 1); quicksort(mid + 1,j); end quicksort;
Both these procedures assume that the tuple T is globally available.
At this point, our first optimization goal has been achieved: we have a quicksort algorithm that works in place, that is to say, without constructing any new composite objects in the course of its execution. This has been achieved by explicitly describing the elementary operations that place each element of the object being sorted at some specified position in a single tuple.
In other words, we have deliberately avoided the use of composite objects in the explicit fashion characterizing version (1) of quicksort. Of course, when we avoid manipulation of composite objects, much of the power of high-level languages such as SETL is lost. In fact, our version of quicksort now looks remarkably similar to a PASCAL or an Ada version of the same algorithm. This indicates the relative advantages of these various languages: SETL and other languages that manipulate arbitrary composite objects are the most convenient programming tools and lead most rapidly and with smallest programming effort to correct and lucid versions of algorithms. However, when efficiency considerations become critical, we must often abandon the richer semantic primitives of these languages (or drop to a lower-level language altogether) in order to control the efficiency of every statement we write.
Removing the Recursion from Quicksort. The next step in our improvement of quicksort is to replace the recursive calls within quicksort by some iterative construct which can be expected to be cheaper (recall that a procedure call is as expensive as 10 to 20 arithmetic operations). The skeleton of the recursive version of quicksort is the following:
procedure quicksort(i,j); ... quicksort(i,k - 1); quicksort(k + 1,j);
Each invocation of quicksort receives some values for i and j and assigns some value to k. Therefore, to eliminate recursive invocations of the procedure, we must save the current values of i, j, and k, because they might be used after we return from the recursive call. This can be achieved by means of a stack on which we "pile up" the parameters which we need to preserve. A stack is a tuple which is manipulated as follows: elements are added to it at its end and deleted from it from the same end. In other words, the operations that modify a tuple are the following:
stack with:= x; -- "push" operator z frome stack; -- "pop" operator
Whenever we perform a recursive call, we use the stack to save whatever needs to be saved, i.e., whatever current values of variables and parameters will be needed after the recursive call. Whenever we return from the recursive call, we obtain, from the stack, all values that we need to resume the computation at the point where we left off. This is in fact what the SETL run-time system does: whenever a recursive call is executed, current values are saved before the (new instance of the) procedure starts executing. However, the run-time system does this in a stereotyped fashion which is often inefficient. In the case at hand, the two recursive calls are sandwiched between stacking and unstacking operations, whose action corresponds to the following code:
procedure quicksort(i,j); ... stack with:= [i,j,k]; -- Save current values. quicksort(i, k-l); -- Perform recursive call. [i,j,k] frome stack; -- Restore current values. stack with:= [i,j,k]; -- Save them again. quicksort(k + l,j); [i,j,k] frome stack; -- restore current values. end quicksort;
(Recall that the stack manipulations are the ones performed by the run-time system.) Several remaining inefficiencies should now be apparent: the most glaring one relates to the fact that the second recursive call is the last statement in the body of quicksort. In other words, there are no computations to be done after the second call, and therefore there is no need to save and restore current values of variables and parameters at this point. A smaller inefficiency is the fact that the second call only requires the values of k + 1 and j, so that there is therefore no need to save i either. So in fact we can do with the following:
procedure quicksort(i,j); ... stack with:= [k,j]; quicksort(i,k-1); [k,j] frome stack; quicksort(k + 1,j); end quicksort;
At this point, the meaning of the stack should be clear: it allows us to keep track of portions of the tuple T which remain to be sorted. With this understanding, we realize that quicksort will continue calling itself recursively as long as there is some portion of T still unsorted, i.e., as long as there is something left on the stack that indicates some pending invocation of quicksort. Having this insight, we can transform our algorithm into an iterative one that loops as long as the stack is not empty:
while stack /= [ ] loop [i,j] frome stack; -- Values to work on. ... stack with:= [i, k - 1]; -- to be worked on. stack with:= [k + 1,j]; -- ditto. end loop;
This is still incomplete: as written, each step deletes one pair of indices from the stack and adds two such pairs to it: clearly the stack will grow without bound, and the algorithm never terminates. What prevents this disaster from happening is that successive pairs describe shorter and shorter portions of the tuple being sorted. Whenever j - i < 2, there is nothing to sort. Therefore we should only save a pair on the stack if it describes a portion of the tuple of size greater than 1. A final improvement is to save only one pair on the stack and go to work immediately on the other, rather than stacking it and unstacking it at once. In order to keep the stack small, it turns out best to stack the indices of the larger portion to be sorted and go to work on the smaller one. Putting all this together, we obtain the following:
procedure quicksort(rw T);
-- Optimized version of quicksort procedure. It operates in place, with
-- explicit stack manipulation, and the partitioning process is performed
-- on line.
i:= 1;
j:= #T; -- Initial portion to be sorted is the
-- whole of T.
stack := [ ];
loop
e := T(i); -- Partitioning element.
l := (orig_l := i) + 1; -- Index for left-to-right scan.
orig_r := r := j; -- Index for right-to-left scan.
while l < r loop -- Scans have not met in middle.
while l < r and T(l) < e loop -- bypass smaller elements.
l +:= 1;
end loop;
while l < r and T(r) > e loop -- bypass larger elements.
r -:= 1;
end loop;
-- Now T(l) > e, and T(r) < e. Exchange them if the two scans have not met yet.
if l < r then
[T(l), T(r)] := [T(r), T(l)];
-- Skip over these two elements, which are now in proper place.
l +:= 1; if r > l then r -:= 1; end if;
end if;
end loop;
if l > #T or T(l) > T(i) then l := (l - 1) min j; end if;
if l > 0 then T(i) := T(l); T(l) := e; end if;
mid := l;
-- Save larger partition on stack.
if orig_r - mid > mid - orig_l then
[savei, savej] := [mid + 1,orig_r];
j := mid - 1; i := orig_l; -- Work on [orig_l..mid - 1]
else
[savei, savej] := [orig_l,mid - 1];
i := mid + 1; j := orig_r; -- Work on [mid + 1..orig_r]
end if;
-- Save larger partition only if it is longer than 1.
if savej - savei >= 1 then stack with:= [savei, savej]; end if;
-- Work on smaller partition only if it is longer than 1.
-- Otherwise take the top pending partition to work on.
if j - i = 1 and T(i) > T(j) then [T(i),T(j)] := [T(j),T(i)]; end if;
if j - i < 2 then
if stack = [ ] then -- Nothing left to do.
exit;
else
[i,j] frome stack;
end if;
end if;
end loop;
end quicksort;
This is very far from the short and intuitive algorithm with which we started. However, this final version is considerably more efficient than the original. It is in this final form that the quicksort procedure is usually implemented. Moreover, this more efficient version is still incorrect in some details and would have to be touched up to function properly! For example it will not work if T contains duplicate elements; moreover, during the