CHAPTER 6

Program Development, Testing, and Debugging

The normal stages of a program's life cycle are:

  1. Initial conception, formulation of requirements.
  2. Overall design of a program that will meet these requirements
  3. Detailed design and coding.
  4. Program review, with rework and extension as needed to clarify, simplify or improve efficiency.
  5. Development of a test plan, testing and debugging, removal of errors and retest.
  6. Operational use of program.
  7. Enhancement and repair during continuing operational use.
  8. Retirement.

This chapter discusses various key aspects of this program life cycle providing hints that aim to help the inexperienced programmer to cope effectively with the pragmatic problems normally associated with program design, debugging, and maintenance.

6.1 Bugs: How To Minimize Them

Any error affecting the behavior of a program is called a bug. Bugs are inevitable, but a few cardinal rules can help minimize the degree to which they infest your programs.

  1. Know that they will occur. Since any small error, e.g., forgetting a line, typing "-" where " + " is meant, misspelling an identifier or keyword, incorrectly inserting parentheses into an expression, will cause a bug, you must train and discipline yourself to higher levels of logical and typographical accuracy in programming than are required in any other human activity. Be suspicious. Program defensively. Check your programs scrupulously for syntactic and logical correctness, several times if necessary, before you try to run them. If in doubt as to the meaning of any operation or programming language construct, look it up.

  2. When bugs occur, your problem is to locate, recognize, and remove them. Bugs cannot be located unless you know the programming language with which you are working well enough to recognize problems when you are looking at them. Bugs cannot be eliminated until you have understood them well enough to know why and how they cause the faults that betray their presence. Finding bugs, like finding needles in a haystack, calls for systematic sifting, for careful detective work. A program is a delicate piece of machinery, and it is simple folly to think that you can make it work by kicking it hard in some random way to make its pieces fall into place. Because they involve many submechanisms, all of which must interface correctly if they are to work together properly, programs, like elaborate combination locks, require careful analysis and attentive sensing of their hidden internals when they need repair. The novice who tries to fix a malfunctioning program without fully understanding the way in which it is working is attempting a task that is far less hopeful than that faced by someone who tries to open an unfamiliar safe without understanding its workings or combination. "The sequence 33-8-19-27 doesn't work? Then I'll try 23-92-69-46. This doesn't work either? Then maybe 17-51-85-34 will be luckier." A student who allows himself to be drawn into of this sort of thoughtless, random attempt to repair a program will inevitably find that his efforts drag on unsuccessfully, not only till the end of the term or year, but until the end of the solar system, without revealing anything. What is needed instead is a systematic, analytic approach.

  3. Though programs are almost never entirely bug-free, observance of the rules of good programming style can reduce the density of bugs in your initial program drafts and allow bugs to be found more quickly once testing of your program begins. Finding the right approach to the programming task that confronts you, the right style in which to start writing the code that you need, is of prime importance. To find this "right approach" requires careful consideration of the logical structure of your programming task, with the aim of defining a collection of intuitively transparent operations that work well together and can be used to accomplish this task in as straightforward a way as possible. Code should impress by its clarity, naturalness, and inevitability, all of which make avoidance and exposure of bugs easier, rather than by obscure trickery and impenetrable cleverness. Programs that achieve brevity without sacrificing clarity are most desirable, since lines of code that you never need to write will never contain bugs. Effective brevity is attained by a correct choice of intermediate operations and by systematic use of these operations to produce the program you require. SETL is in itself a powerful programming language, but especially for larger, more complex applications it may be well to program by first inventing a still more powerful language specially adapted to your intended application. Then your initial program draft can be written in this (possibly unimplemented) language, after which it can be transcribed into SETL to make it executable. In this sort of approach, the primitives of your invented language will become the procedures of your SETL code. By using an auxiliary language in this way and by handling its transcription into SETL in as mechanical a style as possible, valuable protection against error is gained. See Sections 5.2 and 8.5 for a discussion of related issues.

  4. Careful program documentation also serves to expose and eliminate bugs. Good documentation will add an important degree of redundancy to your program. Your code expresses your intent in one way and your comments express the same intent in another. Discrepancies between the two indicate the presence of bugs. Carefully thought-out comments should be added to a program as soon as the code is written. Some comments will in fact be written before the code to which they refer, in order to guide composition of the code. Any additional comments needed to make documentation complete should be added to the code while it is still fresh; this creates an opportunity to review the code, checking it for logical faults. After the whole text, code plus comments, has been constructed and put into proper format, it should be left to "cool" for a few hours or days, after which it should be reviewed attentively and suspiciously. Such a "cooling-off period" will dispel some of the initial misapprehensions which may have crept into a code and thus will allow various systematic errors to be corrected.

  5. As has been said, brevity in coding is desirable, but this should be the kind of brevity that flows naturally from an effective overall approach to the programming task at hand, not the undesirable brevity which comes from stinting redundancy (e.g., by using short, unmnemonic, variable names). Use the features of the SETL language vigorously and eliminate clumsy circumlocutions where direct modes of expression exist, but avoid obscure tricks even where these gain brevity.

  6. Certain constructions, for example those that perform elementary arithmetic computations to determine positions in strings and tuples (for which "off-by-one" errors can easily occur) are bug-prone and need to be approached with caution. For example, what is the length of a string s(i..j), is it j - i, j - i + 1, or j - i - 1? To ensure that s(i..j) is exactly k characters long, what value do we give j: i + k - 1,or i + k + 1? Learn to recognize these trouble spots, double-check them when preparing your code, and surround them with assert checks when you do use them.

    For example, if you write

    assert #s(i..j) = j - i;

    immediately before proceeding on this assumption, your error will be pinpointed immediately; if you omit this check, you may have to find your way back to this error from some obscure symptom. A related idea is to introduce, and use, a collection of small standard procedures to handle these touchy situations in ways that are more instructive. For example, by introducing the macros (see Chapter 8).

    			procedure len_from(i,j); return j - i + 1; end;
    			
    			procedure take_len(s,i,k); return s(i..i + k-1); end;
    

    we can accurately extract a string of length k from s starting at character position i by writing

    			take_len(s,i,k);
    

    and can evaluate the length of s(i..j) by writing

    			len_from(i,j)
    
    

  7. As Donald Knuth has remarked, premature optimization is the root of much evil in programming. Compulsive (and ultimately ineffective) attempts to gain minor efficiency advantages often complicate programs and introduce bugs into them. As you compose a program, remember that substantial efficiency advantages will be gained globally by choice of effective algorithms, not locally by compounding of minor advantages.

  8. Your program test plan should begin to be developed as your program is being written, and a substantial portion of the collection of test- and debug-oriented print and assert statements that you will use to test your program should be composed and entered as soon as the first draft of the program beings to approach completion. Early attention to your test plan will serve to pinpoint complex program sections that require careful testing. These are also the sections whose logic needs to be inspected most closely before testing begins. See Sections 6.4 and 6.6 for additional discussion of this point.

6.2 Finding Bugs

Even, alas, if you are very systematic and professional, some bugs will creep into your program, and the problem will then be to find and fix them. The following remarks should help you learn how to do this effectively. Debugging always starts with evidence that a program error has occurred somewhere during a program run. The problem in debugging is to work one's way back, from the visible symptom first noticed, to the underlying error. The errors one is looking for can be called the error sources or primal anomalies. These are the first (incorrectly written) operations or statements which get correct data from what has gone before them but pass data that are no longer correct to what comes after them. They are the instructions at which your program first "runs off the rails." The initial evidence of error that you see may relate only indirectly to these primary error sources. The difficulty of finding the erroneous statements is complicated by the fact that the full history of an extensive computation comprises a vast mass of data, impossible to survey comprehensively. In debugging you must therefore aim to explore as narrow a path as possible, while still finding your way back to one or more primal anomalies.

A good first step, but one that should not be allowed to hold you up too long, is to look closely at whatever fragments of correct output have been produced. If little or no output is correct, then your program may have failed before even the first print statement was executed. This hint may help you narrow the bug hunt. On the other hand, if some output is correct, then the program was probably functioning correctly till some point past the statement which produced the last correct output. Find the point in your program at which this output was produced, and see what comes before and after it. Again, this may narrow the hunt. Examine the erroneous output carefully and try to see whether its logical pattern reminds you of any particular section of your program. This also can sometimes yield useful hints concerning the likely location of the bug, especially if different parts of your output data are produced by recognizably different sections of code. If certain items of output that you expect are missing, try to see what evidence there is that all the code that you expected to execute did actually execute: remember that unanticipated data may have caused your program to follow an unexpected path, so that it may have bypassed, or may never have reached, the code sections which were supposed to have produced the output which you are surprised not to see. Analysis of evidence of this general kind will in favorable cases point the finger of suspicion at certain narrow program sections. However, in less favorable cases, the available evidence will be ambiguous. In this case, you will need to generate more extensive traces and dumps. This can be done in one of two ways:

  1. By inserting additional print statements into your program, to make it print out something of a "motion picture" of what has happened.

  2. By inserting various other checks, especially assert statements, which check assumptions on which your program depends, but which you suspect might be failing.

Section 6.6 will have more to say about technique (b), which is related to the general issue of formal program verification. The following more pragmatic hints will help you to apply this technique effectively. It is particularly important to place assert statements in sections of code known to involve delicate constructions, especially if (as in the case of the "off by one" bugs considered in the last section) the necessary checks are simple. Since the correct functioning of a program often hinges upon the assumption that key variables will change in a consistent way as iterative execution proceeds (for example, always increasing or always decreasing) it can be useful to save the last previous value of each significant variable var and to write checks which compare the last previous value of var with its current value. This can be done by introducing an auxiliary variable last_var for each var and writing an assignment

last_var := var;

whenever it is desirable to save the last value of var. Then checks like

assert var = last_var;
or
assert var /= last_var;
or
assert last_var = OM or var * last_var = {} and var /= last_var;

etc., will all prove useful.

Ultimately, however, the problem with a purely assertion-based debugging technique is that it is not easy to formulate the necessary checks comprehensively enough to make it unlikely that a bug (which probably relates to something that has been overlooked) can slip through.

Hence one must often fall back on method (a), which generates additional raw evidence for inspection. The problem in using this method is to avoid burying yourself in too voluminous a trace of the thousands or millions of events that take place as a program executes. To avoid this danger a carefully planned sequence of probes is necessary. A good idea is to resurvey your program, mentally list its main phases, and determine all the data objects which each phase passes to the next phase. If your program has been well designed, i.e., has been built up from modules interacting with each other in well-structured fashion, there should not be too many of these objects, and then it is reasonable to print them out for inspection. Before inspecting this information, review the logic of your program, and make sure you know just what features you expect to find in values of the variables that you have printed. Try to be aware of every feature on which any part of your program depends. Then check the actual data. If the data printed at the end of a phase looks correct in every detail, then this phase is probably correct. If something strange-looking appears in the data produced by a given phase, although the data supplied to this phase look correct, then there is probably something wrong with the code of this phase.

When this stage of debugging is reached, you will at least have determined which of the several phases of your code contains the error for which you are hunting. At this point, it is a good idea to think over all the evidence that you have examined, and see if any compelling picture of the problem seems to suggest itself. Sometimes the fact that the offending phase has now been located removes enough confusion for the difficulty to be guessed quickly. If not, you will have to carry your tracing to a more detailed level. This is a matter of inserting print and assert statements more densely into the offending phase, in order to locate the particular subphase that contains the error. As before, this is the subphase to which good data are being supplied, but which is seen to pass bad data along to its successor.

  1. Once the bug location has been pinned down to a program section roughly a dozen lines long, review the logic of these lines. Read them very closely, looking for some misunderstanding which could have produced the anomalous data which you know that this section has generated. Try again to correlate data features with the operations responsible for producing these features. If this doesn't work, take the data supplied to the erroneous subphase, and try to trace the way that the subphase will act on this data, by hand, step by step, until you spot some error.

  2. In most cases, these steps will find the bug without too irritating an expenditure of effort. However, in the stubbornest, fortunately rare, cases, the problem for which you are hunting may still elude clear identification. In these particularly resistant cases one of three causes may be at fault.

    1. If the algorithm which you are using is complex, you may have misunderstood its logic. It may be that no single line of your code is wrong: rather, its overall pattern may be subtly wrong, causing it to produce the output you see, rather than the results you wrongly expected it to generate. Global logic errors of this sort are often quite confusing. If you come to suspect that a problem of this sort has occurred, you should reason once more through the structure of your program, trying to convince yourself by careful analysis that it is logically sound. Section 6.6 describes the formal rules that underlie reasoning of this sort.

    2. There may be nothing wrong: you may simply have misunderstood what output your program was supposed to produce. Or you may have been looking at the wrong phase of a program which really does contain a bug, because you thought that the output of this phase showed some error, while in reality the bug was elsewhere. Or you may not have been running the program you thought you were running, or the version of the program you thought you were running, or your program may have been reading input data different from what you assumed. In such case, take a few minutes to cool off, review the whole situation, including the logic of your program, once more, and start over.

    3. Your problem may be caused by a true "system bug," that is, an error, not in your program, but in one of the layers of the software including the SETL compiler, run-time library, or operating system under which you are running. Concerning bugs of this kind we can say the following:

      (iii.a) Don't be too quick to suspect them. Though such problems do crop up from time to time, they are much rarer than errors in your newly written programs. Remember that dozens of people are using the same software systems that you are, and that if the problem afflicting you is a system-level problem, it would affect all of these people. Before you become willing to blame your problem on anything other than a fault in your program, you should always have examined your program with great care, located a section just a few lines long which you can be sure is receiving correct input (because you have printed and inspected its input) and producing bad output (again, you must have printed and inspected this output). Finally, meticulous examination of these few lines, with review of the definition of all the operations these lines involve, of the parenthesization of those lines, and of any applicable rules of operator precedence must give you "courtroom" evidence that the system is not performing according to its specifications. At this point you are almost (but still not quite) in position to report a system problem to the expert in charge of maintaining your copy of SETL system (or of the operating system within which the SETL system runs). Before doing so, however, you should try to simplify the evidence still further, isolating the malfunctioning lines into a malfunctioning program just a few lines long, and then paring this program down still further if possible, ideally to the point at which it contains just three lines: an assignment initialising a very few variables, a single line which obviously does not function as it should, and a print statement which confirms the fact that this line has failed to act in the manner demanded by the rules of SETL. If the system problem which you think lies at the root of your troubles disappears somewhere during this sequence of steps, the cause of your difficulties may not be a system problem at all, but an error or misunderstanding on your part, which your attempts to locate the suspected systems problem may have clarified. In this case, chastened, you should return to your original program, fix the error in it, and continue your debugging. If, however, you do succeed in creating a very short program which gives unmistakable evidence of system malfunction, you should transmit a complete, clean copy of this program to a systems expert. This should be accompanied by a clear explanation of the problem you have pinned down. He will then take steps to fix the SETL system, or to have it fixed.

      Note that problems in the SETL system, like problems in your own programs, are most likely to concern marginal, rarely exercised cases. Though the system has been in use for a few years and has been tested fairly extensively, exhaustive testing of so complex a system is simply not possible. (See Section 6.4 for a discussion of some of the issues involved in attempts to test programs comprehensively.)

      There are a few cases in which it is reasonable to jump a little more rapidly to the conclusion that a system bug is affecting you. One is the case in which two runs of absolutely identical programs and data yield different results. Another is the case in which insertion into your program of a statement which is harmless by definition changes the behavior of the program significantly. For example, if insertion of a print changes your program's flow of control, something is obviously amiss at the system level. This may be evidence that can be reported to an expert immediately (but see the caution extended in (f)).

    4. It should be clear from what has been said that one of the very first things that you will want to trace when you start to analyze a malfunctioning program is the input data it is reading. Always echo this data by printing it out immediately after it is read. Your input data may not be what you think it is, or you may be reading it incorrectly.

    5. Especially if a difficult bug is being pursued, debugging as an activity tends to create an atmosphere of confusion, which grows like a thundercloud as the mind struggles to free itself from the misapprehension which first allowed the bug to slip in. Particularly difficult bugs sometimes make one feel that one is going insane, since the laws of logic seem to be breaking down. To combat this perilous confusion, you must maintain a very deliberate, step-at-a-time, and above all skeptical, attitude while you are debugging. Verify the situation at every turn; look at what really is in your source text rather than trying to remember what was there; print out a record of what your program really is doing rather than guess what is going on. Inexperienced student programmers often come to advisors with old versions of programs they are trying to debug, claiming that "I ran this program on Tuesday, and I made two or three changes that I am sure are harmless, and now it doesn't work." A more experienced programmer, who knows that the only valid evidence to work from is a current, single, untorn listing showing program and output unmistakably together, will only laugh at this.

      To reduce the level of your own confusion, it is sometimes helpful to work over your problem with a friend, trying to explain what is going on, and reviewing salient parts of the logic of your program with him, till he begins to understand it. A more expert consultant will often be able to spot the trouble that you have missed, but even if your consultant is less expert than you yourself, you will often find that the very act of explaining the problem lets you spot what is wrong.

    6. Even when a program has once begun to function (and often even when it has been used successfully and intensively over a considerable period), it may still contain bugs, which can lurk within sections of code which are rarely, perhaps almost never, exercised. For this reason, code inserted for debugging should generally not be removed once the bug is found. Don't throw away your crutches: it may become necessary to debug the same program again! Instead of removing debug code, you can comment it out by inserting a '--' sign at the start of each line inserted for debugging. (Only inserted lines that never generated any evidence useful for debugging should be wholly removed.) Another technique, particularly useful during extended development and debugging of large programs, is to make the most valuable debug prints and checks conditional, by including them in if statements containing conditions which are normally false but can be turned on by supplying program initialization parameters. (See Chapter 9 for a discussion of program parameters.) If this is done, it becomes possible to examine the inner working of a malfunctioning program quickly, without having to recompile it all.

6.3 A Checklist of Common Bugs

Certain bugs occur frequently, and the experienced programmer learns to recognize their characteristic symptoms. Here is a checklist of commonly occurring bugs, with some indication of the symptoms they are likely to produce. We only list bugs that would pass through compilation undetected.

BugLikely SETL symptom
Variable not given any initial value"Illegal data-type" error
Incorrect termination of a loop (e.g., count off by 1)Missing items in data collections, sums or sets too small if loop terminates too soon. "Illegal data-type" error if loop terminates too late
Incorrect limits in string and tuple slices (e.g., count off by 1)(Similar to incorrect loop termination)
Incorrectly structured while loop conditions or bodies, or incorrect initial conditions in while loopsProgram does not terminate
Incorrect treatment of initial cases in recursions, or bad procedure callsProgram does not terminate, possible memory overflow
Omission of exit or continue statementProgram "runs on" into code not intended for execution.
Omitted or improperly inserted data-value updateFailure of relationships expected subsequently; unexpected values or jumps. Effects can be subtle.
Shared global variable unexpectedly modified by invoked procedure or functionEfforts can be very subtle, and particularly hard to find. Beware of global variables!
Misspelled variables, e.g., AO for A0, Bl for B1, c1 for ci"Illegal data-type" error, possibly no output
Reading unexpected dataCan cause wide variety of effects
Some unanticipated characters encountered by the string-scan operationsProgram does not terminate
Failure to set a program switchEffects can be subtle
Parameters out of order in procedure call"Illegal data-type" error (generally easy to find)
Variable inadvertently modified by assignment to a variable intended to be different but having the same nameIf no data-type error is caused, effects can be subtle
Variables out of order in read statement"Illegal data-type" error (generally easy to find)
Read operations of program inconsistent with data actually present in input file"Illegal data-type" error (generally easy to find)
Target of an assignment statement misspelledEffects can be very subtle
Not resetting a counterEffects can be very subtle (see "Incorrect loop termination")
Complex, incorrect combination of Boolean conditionsEffects can be very subtle
Too few or too many parentheses in expressions; misunderstanding of precedence rulesEffects can be very subtle

6.4 Program Testing

Debugging is the process of searching for the exact location of a program error when you know that some error is definitely there. Testing is the systematic exercise of a program which you believe might be correct, in an effort to see whether bugs are really absent. If testing shows a bug, debugging starts again. If your tests are not systematically designed, then bugs may go undetected and remain in the program. All one knows about a poorly tested program is that it works in the few cases for which it has been tried; it may fail in many others.

Test design is as important a part of program development as the choice of algorithms and data structures. Development of a test plan should begin while a program is being written. A procedure which is hard to test is apt to be bug-prone, and should be simplified if possible. By keeping testability in mind, you will avoid unnecessarily complex constructions, and produce cleaner, sounder code.

Testing falls into three distinguishable phases, which we will call first-stage testing, second-stage or quality assurance testing, and maintenance or regression testing. First-stage testing begins as soon as a program is complete enough for execution to be possible. Its guiding hypothesis is that bugs are present in sufficient numbers to prevent much of the program from working at all. During first-stage testing, one aims to make the main facilities of the program being debugged operable by finding and removing bugs quickly. Quality assurance testing begins when first stage testing ends. It assumes that a few obscure bugs remain in the program to be tested and aims to test systematically enough to smoke them out. Maintenance testing aims to ensure that new bugs are not introduced into old programs during their extension and repair.

6.4.1 First-stage testing

First-stage testing should work through a program "bottom up," first testing the bottom-level procedures which implement the basic operations used by the rest of the program. Once the code realizing these operations has been checked and found to be operable, the testing process will focus on intermediate-level procedures, and once these have been checked one will begin testing the program's main capabilities. Attempts to short-circuit this systematic, level-by-level test procedure by jumping directly to tests of higher program levels are more apt to waste time than to speed things up, since the lower-level causes of higher-level failures will then be hard to understand. For systematic testing, test input will need to be prepared for each procedure to be tested; this should be designed to make the output produced easy to inspect. If any of the procedures being tested make use of difficult or obscure data structures, it may be necessary to develop auxiliary output procedures which print these data structures in formats that clarify their logical meaning. When such procedures become necessary, they should be written and tested immediately.

Perhaps because realism might lead to suicide, programmers are generally overoptimistic concerning the likelihood that a program that they have just written will work right away. Careful preparation of a first-stage test plan serves to counteract this common illusion; the more realistic attitude thereby engendered encourages more careful initial program inspection, and this often reduces the number of bugs present when first-stage testing begins. This is why programs developed cautiously often become operational quickly, whereas programs developed in too optimistic a frame of mind often begin to work only after frustrating and totally unexpected delays.

An effective way of organising tests is to group them into a single procedure called test_prog, whose one parameter s is a string consisting of test names separated by asterisks. The test_prog can then have the following structure:

procedure test_prog(s) -- skeleton of test procedure

while s /= "" loop
	span(s,"*"); tn := break(s,"*"); 
	if tn = "" then exit; end if;
	print("Beginning Test", tn);
	case tn of
		(put sequence of named tests here)
		else
			print("Unknown test name");
		end case;
	end loop;
end test prog;

To trigger a sequence of tests named test_4, test_2, etc., one has only to write something like test_prog("test_1*test_2*..."). Later, when first-stage testing is complete, this call, and test_prog, can be left in the program P that has been tested, but the argument of the test_prog call can be changed so that it reads test_prog(getspp("TEST=/")) (see Chapter 9 for an account of the getspp library procedure.) If this is done, no tests will be executed unless P is invoked with a command line parameter of the form TESTS = testl*test2...*testn, in which case the named tests will be performed. This approach makes it easy to retest a program in which unexpected trouble has developed. Of course, the test facilities available should be carefully documented at the start of the test_prog procedure.

Especially when a long program P is being developed, it may be desirable to begin testing before all parts of P have been coded (or even designed) in detail. Of course, such an approach will be reasonable only if P has a sound, highly modular overall design, and only if the missing sections of P have been designed in enough detail so that you can be sure that no inconsistency will develop when they are designed and coded in detail. This mode of organizing development and testing is sometimes called top-down testing. It has the advantage of allowing testing and development to proceed in parallel. A related advantage is that it can provide particularly early confirmation of overall design soundness, or, if a design proves to be unsound (say, in terms of human factors, i.e., usability), it can give early warning of trouble.

If a top-down approach to development and testing is taken it will be found useful to provide a standard, multiparameter library routine having the name MISSING_SECTION. Then parts of your program that have not yet been coded can be replaced by invocations

MISSING_SECTION(name_of missing_section);

where the string-valued parameter -name_of_missing_section- should assign the missing section a name that can be printed. The MISSING-SECTION procedure should also allow optional additional parameters, so that it can be invoked by

MISSING_SECTION("name_of missing", "p1 p2 ... pk", [p1, ..., pk]);

where p1, p2,..name various parameters with which the missing section would have to deal or which might explain why it was (perhaps unexpectedly) invoked.

6.4.2 Quality assurance testing

Second-stage (or quality assurance) testing should aim to exercise a program comprehensively, in at least the following senses:

  1. It is obvious that parts of your program that have never been executed during debugging may well contain unrecognized errors. The battery of tests you develop should therefore force every line of your program to be executed at least once.

  1. If your program branches on a Boolean condition, then you will want to supply at least one test case in which the condition evaluates to true, and another in which the condition evaluates to FALSE.

  2. Improper treatment of extreme values is a common cause of program failure. A program may work for nonnull sets, tuples, or strings, but not for the corresponding null cases; for positive integers n but not if n = 0; for integers less than the length of some string, but not for integers equal to this length, etc. It may work when a while or a for loop which it contains is entered, but fail if the loop is bypassed entirely.

    In preparing a comprehensive collection of tests, you will therefore need to survey your program systematically, listing marginal situations of this kind as exhaustively as you can; then you should prepare at least one test that will force each logically possible situation to occur.

  3. Once a list of all procedures, loops, branches, code sections, and marginal cases to be tested has been collected and a comprehensive set of tests has been developed, it may be worth preparing a formal test coverage matrix which shows which tests exercise each program feature. A chart of this kind makes it easier to spot cases that have never been tested. It can also help to select tests to be run during regression testing (see the following discussion) and can help to pinpoint program sections to be examined when a test fails. Such a chart will also make it easier to avoid running too many tests, all of which exercise the same limited group of program features but never use others. Note that, if regarded as a kind of test, "production" use of a program is subject to this objection; i.e., daily use of a program often exercises only a limited subset of its features. This is why programs that have been in heavy use for extended periods sometimes fail when their usage pattern changes significantly.

  4. Compilers sometimes include features which make it easier to determine the coverage provided by a family of tests. For example, it may be possible to generate a listing of all program statements executed during a sequence of runs, of all branches taken, of all procedures invoked, etc. The SETL profiling facilities is useful for this. You will want to familiarize yourself with this facility, which can help assure that the test sets you develop for your programs are adequate.

  5. Once developed, test sets become an important adjunct to the programs that they test. Such test sets should therefore be organized in a manner which facilitates their long-term maintenance and reuse. The tests that are available, and the coverage they provide, should be adequately documented.

  6. Programmers often find it hard to bring sufficient enthusiasm to the task of systematically rooting obscure bugs out of code that they themselves have written. In part, this is a matter of optimism; in part, a result of the mental fatigue which tends to set in at the end of a lengthy code development; in part, a consequence of the difficulty of overcoming the very mind-set which introduced an error in the first place. For all these reasons, it is good practice to make testing of large programs the responsibility of a quality assurance group independent of the development group that produced these programs. If this is done, then, knowing that an independent group of programmers will probe their work systematically to find shortcomings, the original development group will be encouraged to simplify their product so as to improve its reliability.

Even where resources do not permit fully independent organization of the activity of program testing, it is well to ensure that every line of a complex program is read and understood by at least two programmers, each of whom will be able to spot problems and suggest tests that the other might have overlooked.

6.4.3 Regression testing

Regression testing is testing routinely applied whenever a previously working program is amended, to ensure that newly introduced code has not caused new errors. Tests which will be used in this way should be written so as to be self-checking, i.e., to produce little or no output if they have run correctly, but to produce copious output pinpointing a problem as closely as possible when an error is detected. This can be done by organizing the tests so that they perform various calculations, always in at least two different but logically equivalent ways. If these paired computations produce the same result, then either no output, or a simple message "TEST xxx PASSED", should be printed, but if a discrepancy is detected output which shows the discrepancy and displays the values of all variables related to the discrepancy should be printed.

If a chart has been prepared showing the program features exercised by each test (see iv above) it can be used when a test fails to suggest what part of the program should be examined first to find the cause of failure. If some one of these program sections has just been changed, it will of course come under immediate suspicion.

6.5 Analysis of Program Efficiency

Up to this point, we have generally ignored one of fundamental issues of programming practice: how much machine resource, i.e., execution time and memory, are actually required to execute a given program. This omission was deliberate, and reflects a basic maxim of programming: Make it right before you make it fast. It is now time to repair this omission, and examine two related topics:

  1. The estimation of program efficiency.

  2. The study of means to improve program efficiency by refinement and transformation of a correct but inefficient working program.

Two kinds of machine resource are relevant to the study of program efficiency: execution time and data storage required. In what follows, we will concentrate our attention on the first of these, because in many cases it is time which is critical. Data storage requirements were a pressing concern in the early days of computing, because memory was an extremely expensive hardware component and computers were equipped with only a few thousand words of it. As a result, early programming languages such as FORTRAN included various complex mechanisms for managing the limited memory resources available, which forced the programmer to concern himself with low-level details which were extraneous to the actual application to be programmed. Memory has in the meantime become a much cheaper commodity, and concern with its optimal utilization can in many cases (but by no means all) be left to the machine itself (in the form of some facility for dynamic storage allocation). This is what we will do in most of what follows.

6.5.1 Estimation of program efficiency

It is very easy to use SETL (or any other programming language) to write programs which would take years, or even hundreds or thousands of years, to finish executing. Consider, for example, the code fragment

	for i in [1..1000] loop
		for j in [1..2000] loop
			for k in [1..3000] loop
				for l in [1..4000] loop 				 (1)
					sum +:= (2 * i**3 + j**3 + k**2);
				end loop;
			end loop;
		end loop;
	end loop;

In this code, for each successive value i, the variable j iterates over 2000 different values; for each value of i and j, the variable k iterates over 3000 values; and for each value of i, j, and k, the variable l iterates over 4000 values. Thus, all in all, the innermost statement of the code fragment (1) will be executed 1000 x 2000 x 3000 x 4000 times, i.e., 240 billion times. This statement involves 6 multiplications and 4 additions, so that at least 2.4 trillion elementary arithmetic operations are required to execute the code (1). Even on a computer capable of executing 400 million arithmetic operations per second (a fairly typical performance figure nowadays) and even if the code (1) were written in a programming language capable of exploiting this raw arithmetic capability to the utmost, 6,000 seconds would be needed to execute the code (1). Since an hour is about 4000 seconds, this is about 1.5 hours. However, since SETL (which pays a price in efficiency for its very high level) is about 30 times less efficient than this, execution of the SETL code (1) would require about 2 days of computer time. This makes it quite clear that in writing SETL programs one needs some way of estimating the computational resource which will be consumed to execute the code that one sets down.

6.5.2 The execution time of elementary instructions

Let us begin by describing a basic unit of expense. There is some elementary time interval, which we will call D, which is the time it takes to execute the simplest SETL instructions. All instructions consume an amount of time which is some multiple of D. The actual value of D depends on the computer and on the specific SETL implementation being used and can be anywhere between a millionth to a ten-thousandth of a second. In what follows, estimates of execution times will always be given in units of D. We will refer to D as the basic instruction cycle, or cycle for short.

Instructions whose operands are simple data types: integers, floating-point numbers, Booleans, all take roughly one cycle to execute. This includes

Arithmetic operations on integers and floating-point numbers

Boolean operations.

Comparison operations ( =, /=, >, <, etc.)

At the level of the actual machine, there may be secondary differences among some of these; for example, a floating-point division is typically slower than an integer addition. In this discussion we can safely ignore these differences, and we shall do so in what follows.

The simple assignment statement

v := expr;

where v is a simple variable name, also takes roughly one cycle to execute (of course, only after the value of expr has been obtained).

6.5.3 Operations on sets

Operations on composite objects (sets, tuples, and maps) fall into two categories: instructions that take a fixed number of cycles, regardless of the size of the object to which they apply, and instructions whose execution time is a function of the size of their arguments. Let us examine each of these categories of set operations.

6.5.3.1 Elementary operations on sets

The following operations on sets can be performed in a constant number of cycles:

Cardinality:#S takes one cycle to evaluate, regardless of the actual number of elements of S
Membership:(x in S) where x is a simple item (numeric or Boolean) takes typically three to six cycles to evaluate, and increases only slowly with the size of the set S

There is nothing obvious about these remarks! You might wonder how the membership test can actually be performed without examining all the elements of S. The answer lies in clever choice of the internal data structures used by the SETL run-time system itself These structures are discussed in Chapter 10 and we will not say more about them now, except to note that the possibility of using sets as freely as we do in SETL depends in part in the constant execution time required by the membership operation. Note that this constant time behavior applies when x is a simple (nonstructured) value. In general, the membership test x in S takes as long as (or a little longer than) an equality test x = y. The relevance of this remark will become apparent in the following section.

Insertion and deletion:For simple objects the operations x with S and S less x, take a nearly constant number of cycles to execute. Exceptions relate to questions of storage allocation, and will be discussed later. The actual number of cycles needed for insertion of a simple value into a set of moderate size is about 10-15 cycles.
Map retrieval:To evaluate or to modify f(x), where f is a map and x is a simple value, takes nearly constant time, comparable to the time required for set membership operation. Once again, efficient implementation of this fundamental operation is secured by the use of a sophisticated data structure about which we will say more in Chapter 10.

6.5.3.2 Global operations on sets

Operations that construct composite objects can be expected to take an amount of time proportional to the size of the object they build. This means that set union, intersection, and difference take longer when their arguments are larger. To illustrate this point let us examine the set union operation S1 + S2. This is evaluated in several steps:

  1. Initialize by making a copy of S1. This can be expected to take a number a steps proportional to #S1.

  2. Iterate over S2, and check each element of S2 for membership in S1. Whenever an element is found which is not in S1, insert it into the union set.

This process is described by the following equivalent SETL fragment:

	S3 := S1;           	 -- Initialize result
	for x in S2 loop        -- Iterate over second set.
		if x notin S1 then
			S3 with:= x;                                 (2)
		end if;
	end loop;		

After code like this has been executed internally, S3 will clearly be equal to (S1 + S2). To estimate the execution cost of the loop in (2), we can reason as follows: the loop body is executed (#S2) times, i.e., once for each element of S2. For each such element we perform a membership test and (possibly) a set insertion, both of which take constant time. Therefore the loop takes an amount of time proportional to #S2. The initialization of S3 amounts to making a copy of S1, which takes a time proportional to the size of S1. Putting all of this together, and neglecting small differences between the execution time of membership and insertion operations, we can conclude that the cost of a union operation is proportional to the sum of the cardinalities of the sets involved. This is not an exact statement, but it is sufficiently accurate to convey an idea of the order of magnitude of the execution time of this set operation.

We can apply a similar analysis to the set intersection operation (S1*S2). This is evaluated by the SETL run-time system in the following fashion:

  1. Initialize the result to the empty set.

  2. Determine which one of the two sets has the smaller cardinality (for definiteness, let us say it is S1). Then iterate over S1, testing each of its elements for membership in S2. Insert each element of S1 which is also a member of S2 into the intersection set being built up. This process corresponds to the following SETL fragment:

		S3 := {};
		for x in S1 loop
			if x in S2 then
				S3 with:= x; 				  (3)
			end if;
		end loop;

Using much the same reasoning, we can easily see that the execution time for this fragment is proportional to the cardinality of S1, i.e., to the size of the smaller of the two sets entering into the intersection operation.

Set difference, set inclusion, and subset testing operations are performed in similar fashion and can therefore be expected to take an amount of time proportional to the cardinality of one of their arguments.

Other SETL primitives involve similar (implicit) iterations over sets. The most obvious of these is the print statement: print(S) requires the examination and output of each element of S. Set equality and inequality are more complex examples. To test whether S1 and S2 are equal, the SETL run-time system performs the following sequence of actions:

  1. Compare the cardinalities of S1 and S2. If these cardinalities differ, the two sets are distinct.

  2. If the two sets have the same cardinality, then we must examine each element of one set, and check it for membership in the other set. If an element is found which does not belong in both, we know that the two sets are not equal. If the two sets are equal, each element will have to be examined, and the process of establishing equality will take a time proportional to the common size of S1 and S2. The internal equality testing process can be described by the following SETL procedure:

	procedure equal(S1,S2);
		
		if # S2 /= # S1 then return false;
		else
		
			for x in S1 loop
				if x notin S2 then return false; end 		if;
			end loop;
		
			return true; -- All elements are common.
		
		end if;
		
	end equal;

We mentioned in the previous section that the membership operation must often perform an equality test involving the object being tested for membership. This means that if S1 is a set of sets, and su is a set, then the operation

su in S1

will take a time proportional to the size of su, because it will involve testing su and one or more element(s) of S1 for equality. It should be clear that when sets of sets of sets of..are involved on both sides, the membership test operation will become more complex, and the size of the subobjects of S1 will have to be taken carefully into consideration.

6.5.4 Operations on tuples and strings

Indexing, i.e., the retrieval or modification of the i-th component of a tuple or string, is the basic operation on indexable objects like strings and tuples. This operation takes a small constant amount of time, typically two to three basic cycles.

On the other hand, membership testing, i.e., the operation (x in T) where T is a tuple or string, proceeds by examining all elements of T in succession, in the manner suggested by the following equivalent SETL procedure:

	procedure intuple(x,T);

		for y = T(i) loop
			if x = y then return true; end if;
		end loop;
	
		return false; 		-- x was not found in T.

	end intuple;

Given the way that tuples and strings are represented internally, there is no more efficient way to tell a priori where x is found in T, if indeed it is found in T at all. If x is not in T, this will only be ascertained after all elements of T have been examined, and if x is in T, it may be the first, second, ... n-th component. We can say that on the average, half of the elements of T will have to be examined before x is found. This means that a membership test applied to a tuple T can be expected to take a number of cycles proportional to #T. (Compare this with the more advantageous constant time behaviour of membership tests on sets).

If x is itself a structured object, and T is a tuple of such structures, then the equality test in the loop (5) can be expected to be proportional to the size of x. Therefore in this case the membership test (x in T) will take time proportional to (#x * #T).

Tuple and string concatenation are analogous to set union: they take a time proportional to the sum of the sizes of the operands, i.e., to the size of the result. Testing tuples and strings for equality is also linear in the size of the operands, as it requires that each component of each object (in the worst case) be examined.

The execution times of the basic SETL operations are summarized in Table 6.1. We can estimate the cost of any straight-line program simply by adding

Table 6.1. Execution Time of Various SETL Primitives

OperationTime Units

Arithmetici + j, i * j1
Assignmentx := y1
IndexingT(i)1
Map retrievalM(x)5
Set membershipx in S5
Tuple membershipx in T#T
Set unionS1 + S2#S1 + #S2
ConcatenationT1 + T2#T1 + #T2
Quantifierexists x in S | C(x)#S * (1 + cost(C))

i, j designate integers, T designates a tuple, S a set, and M a map.

the costs of each instruction in it. However, other issues, discussed in the next section, arise if we go on to examine the more interesting problem of estimating the cost of programs with various loop structures.

6.5.5 The execution time of programs containing loops

Iteration over a set, as for example in

for x in s loop ... end loop;

or over a map, tuple, or string, as in

for x = t(i) loop ... end loop;

produces set elements (or map values, tuple components, or string characters) at a rate of one per cycle. Essentially the same remark applies to numerical iterators, like those in (1). Hence to estimate the time required to execute a loop, we have to multiply the number of times the loop will be executed by the (average) time that it will take to execute the body of the loop. An obvious generalization of this rule applies to imbedded loops: if one for loop is imbedded within another, then the time required to execute the outer loop is the number of times it will be executed, multiplied by the (average) number of times the imbedded loop will be executed, multiplied by the time required to execute the body of the embedded loop. For example, the double loop

for i in [1..1000], j in [1..i] loop ... end loop;

will execute in a time roughly equal to 1000 * 500 multiplied by the amount of time required to execute the loop body, since the embedded loop over j will execute an average of 500 times for each successive value of i. (This number 500 is halfway between the number one of times that j changes when i = 1 and the number (1000) of times that j changes when i = 1000.)

Since quantifiers and set formers are in effect prepackaged iterations, very similar rules apply to them. An existential quantifier like

		... exists x in s | C(x) ...    		(6)  

is evaluated as if it were the following SETL fragment:

		maybe := false;
		
		for x in s loop
			if C(x) then maybe := true; exit; end if;
		end loop;
an internal variable that holds the Boolean result of the test. This SETL fragment will execute in time equal to the number of items examined, multiplied by the average time required to evaluate the Boolean condition C(x). If the quantifier (6) evaluates to false, then all the members of s will need to be examined, so the time required will be #s multiplied by the average time to evaluate C(x). If (6) evaluates to true, then iteration over s will terminate as soon as an x that actually satisfies C(x) is found; since iteration over a set is performed in a somewhat unpredictable order, the number of iterations needed to find such an x should be roughly # s/( # sat + 1) where sat is the set of all x satisfying the condition C(x).

Similar remarks apply to set formers and the tuple formers, except that:

  1. Each insertion into a set takes somewhat longer than a simple iterative step, because we must check for and eliminate duplicate elements; and

  2. The implicit iteration appearing in a set or tuple former like

{x in s | C(x)}

must always proceed until all the elements of s have been examined. As an application of these rules, note that execution of the harmless-looking code fragment

	f := {};

	for x in s loop
		f(x) := {y in s | (exists z in s | C(x, y, z))};         (7)
	end loop;

involves three nested loops: first the for loop which appears explicitly, next the implicit iteration over s in the setformer {y in s..}, and then finally the implicit iteration over s in the quantifier exists z in s...Therefore the number of cycles required to execute (7) is roughly as high as the cube of the number of elements of the set s.

6.5.6 The execution time of while loops

The possibility that a program can loop forever in an ill-constructed while loop should serve to alert us to the fact that analysis of the time required to execute a while loop can be much subtler than for loop analysis. Of course some while loops are easily analyzed. For example, if the variable k is not modified in its body, the loop

	k := 0;
	while (k +:= 1) < n and t(k) /= OM loop 
		statements..
	end loop;

where k is incremented each time through the loop, behaves in very much the same way as a for loop and therefore will terminate after no more than n - 1 iterations. On the other hand, consider the loop

	t := n * [0];

	while exists x = t(i) | x = 0 loop
		print(t);
		t(i) := 1;
		t(1..i - 1) := (i - 1) * [0];
	end loop;

This begins by generating a tuple t := [0,0,0,...0] of n zeroes and then repeatedly sets the first nonzero coordinate of t to 1 and all the coordinates preceding this coordinate to zero, thus carrying out a (left-to-right) form of binary counting. The sequence of tuples printed is

			[0,0,0,...0]
			[1,0,0,...0]
			[0,1,0,...0]
			[1,1,0,...0]
			[0,0,1,...0]
			[1,0,1,...0], etc.

and is plainly of length 2n, in that it generates all binary numbers from 0 to 2n - 1. Hence the number of cycles required to execute the while loop (8) is at least 2n, which means that if n = 50 the loop will execute for roughly 320 years, even if 100,000 iterations are performed per second!

For a more realistic example of the way in which while loops are typically used, consider a particularly simple sorting method, dubbed flip-sort.

	while exists i in [1..#t1] | t(i) > t(i + 1) loop
		[t(i),t(i + 1)] := [t(i + 1),t(i)];              (9)
	end loop;

This searches a tuple t for out-of-order components and interchanges a pair of such components whenever one is found. Plainly the number of cycles required to execute (9) is the average time required to search the tuple t for an out-of-order pair of adjacent components, multiplied by the number of interchanges required to put t in sorted order. Even though precise analysis of these times requires subtle analysis going far beyond the scope of this book, it is not hard to estimate these time requirements crudely. We can guess that, as long as an out-of-order pair exists, one such pair will be found after searching through some fraction of the length of tuple t being sorted; thus evaluation of the existential quantifier appearing in the first line of (9) is estimated to require cn cycles, where c is some constant which we will not attempt to evaluate here and n is #t. Moreover, since each execution of the body of the while loop (9) corrects exactly one case in which a pair of elements appears in inverted order, the expected number of times that (9) must iterate to put t into its final sorted order should be roughly equal to the number of pairs of components of t which occur in inverted order. In a random arrangement of the components of t, roughly half the components to the left of a given t(i) should be larger than t(i), and roughly half the components to the right of t(i) should be smaller than t(i). Thus each component t(i) of t should appear in roughly #t/2 inverted pairs, and it follows, since this way of looking at things counts inverted pairs twice, once for each of the components in such a pair that the expected number of inverted pairs in a randomly chosen arrangement of the components of t should be roughly (1/4)n2). Multiplying this expression by cn, representing the estimated time required to evaluate the existential quantifier in the first line of (9), we arrive at

		(1/4) cn3			 (10)

for the time required to execute this 'flip-sort' code on a tuple of length n.

The approximations which we have made in arriving at the estimate (10) are too crude for the constant (c/4) appearing in (10) to be particularly meaningful. (Ex. 9 following Section 6.5 outlines an experimental procedure for estimating this coefficient more accurately.) The significant feature of the estimate (10) is that it tells us that the time required to sort a tuple by the flip-sorting method is proportional to the cube of the length of t, i.e., that - sorting a tuple of length 10 by this method should take roughly 120 cycles sorting a tuple of length 100 roughly 120,000 cycles, and sorting a tuple of length 1000 roughly 120,000,000 cycles. These figures, which are not very favorable, reflect the rapidity with which the cube of n grows as n increases. In the jargon of algorithm analysis, one says that flip-sort is an 'n cube' algorithm - or an 0(n3) algorithm. Clearly, any sorting algorithm whose time requirement grows less rapidly than the cube of the length of t will be very much superior to flip sort as a technique for sorting large tuples t.

Let us therefore modify our naive flip-sort algorithm in order to obtain a more efficient sorting method. We begin by noticing that the existential quantifier in (9) forces us to scan the tuple from the beginning, each time we find an out-of-order pair. This is wasteful because an out-of-order element may be out of place with respect to several of its successors, not just one. A better approach is to try to push the winner of each such exchange as far as it will go, and go back to the beginning of the tuple only after we have made a full pass over the tuple. This leads to the following:

for j in [1.. #t - 1] loop
	for i in [1 .. #t - 1] loop

		if t(i) > t(i + 1) then
			[t(i),t(i + 1)] := [t(i + 1),t(i)];		  (11)
		end if;  				

	end loop;
end loop;

Note that this code will push the largest element in the tuple to the last position the first time the inner loop is executed, the second largest element will be pushed to the second position from the end on the next pass through the inner loop, and so on. Each execution of the inner loop adds one element to the fully sorted portion of t which is built up at its end. The outer loop is executed (#t1) times to ensure that every element of t finds its proper place.

A further improvement is possible. Elements that have reached their correct position need not be examined in successive passes because we know that they are in their proper places. We can therefore modify (11) to read as follows:

     
for j in [1..#t - 1] loop
	for i in [1..#t - j] loop

		if t(i) > t(i + 1) then
			[t(i),t(i + 1)] := [t(i + 1),t(i)]; 		 (12)
    	end if;

	end loop;
end loop;

The fragment (12) is the bubble-sort method that we examined in Section 5.1.1.

Next let us examine the execution time of (12). Again, let n be #t. The outer loop is executed (n - 1) times. The inner loop is executed (n - 1), (n - 2), ... 3,2,1 times, that is to say, (n/2) times on the average. The body of the inner loop consists of various comparison, indexing and assignment operations, all of which take some constant time c1. Therefore the total execution time of (12) is roughly c1 * (n - 1) * (n/2) cycles. We therefore can say that the code (12) executes in time proportional to the square of the size of t. This is a considerable improvement over our initial flip sort. Several further refinements can be made to (12), but these improvements only affect the constant c1 and do not modify the n2 behavior that we have just obtained.

The order-of-magnitude analysis that we have just performed is the most frequent and useful kind of algorithm performance analysis. Often it is enough to know what the order-of-magnitude behavior of a program is to estimate whether it is acceptable for the small, medium, large, or very large amount of data that is to be processed.

The transformation that took us from (9) to (12) deserves further discussion. What we did was to remove an existential expression

... exists x in s | C(x)

which describes some exhaustive search over s, and replace it with a more intelligent search over that set. This shows a fairly typical use of existential quantifiers: they allow us to write a crude specification for some kind of search process, in order to obtain a working algorithm quickly. When the time comes to improve the efficiency of that algorithm, we can often replace the existential quantifier with a more precise (and therefore more efficient) search, better adapted to the problem at hand. This stepwise approach to program efficiency is one of the most important principles of good programming practice. Proper use of SETL consists precisely in starting with as condensed and abstract a formulation as can be achieved by use of set formers, quantified expressions, power sets, etc. Once such an abstract program is running, one can then replace these abstract constructs, in those places where they are unacceptably inefficient, by more efficient lower-level constructs which make use of sophisticated data arrangements.

6.5.7 Efficiency analysis of recursive routines

The presence of procedure calls adds another dimension to efficiency analysis. To begin with, execution of the procedure call instruction itself has a certain cost. This cost can be divided into a fixed portion and a variable portion. The first part of the cost (on the order of five cycles) does what is needed to remember the point of call (so we can return to it) and start the called procedure. The variable portion of the cost is proportional to the number of actual parameters being passed to the called procedure and can be expected to add two to five cycles per parameter. This indicates that a procedure call involving just a few parameters is as time-consuming as 10 to 15 arithmetic operations.

However, in most cases the important expense is the time the called procedure actually takes to execute. If the called procedure consists only of loops and simple statements, we can estimate its execution time using the techniques sketched in the previous section. On the other hand if the called procedure calls other procedures in turn, or more interestingly calls itself recursively, then the analysis of its efficiency is considerably more subtle. To indicate the kind of reasoning required, and also introduce some useful notation, let us first examine the "quintessential recursive procedure," namely the factorial:

	procedure fact(N);
		return if N = 0 then 1 else N * fact(N - 1) end if;       (1)
	end fact;

Since the factorial of a positive number N is the product of successive integers up to N, an iterative calculation of fact(N) will require N multiplications, i.e., take time proportional to N. It is conventional to indicate this by saying that iterative calculation of fact (N) uses time O(N).

This is also the case for our recursive definition. To establish this, we reason as follows: if N is not zero, execution of fact (N) will require one comparison operation (the test N = O) one multiplication, and one recursive call to fact involving a smaller argument, namely N1. If we let T(N) designate the time it takes to execute fact when its argument is N, this leads to the following equation:

                  T(N) = 2 + T(N - 1)              (2)

where the constant 2 corresponds to the two elementary operations noted.

This equation is called a recurrence relation for T and defines the value T for each possible argument N, as a function of the value that T(N) takes as an smaller argument. Note that this recurrence parallels the recursive definition of fact, in which the value calculated is also defined in terms of another invocation of the same procedure. To solve the recurrence relation means to express T(N) as some explicit function of N. For the recurrence (2), this can be done easily, as follows. The relation (2) holds for arbitrary values of N (except 0), so that substituting N for N1 in (2) we find that

T(N - 1) = 2 + T(N - 2).

We can use this to replace T(N1) on the right-hand side of (2), which gives

T(N) = 4 + T(N - 2)

Repeating the same reasoning, we can express T(N - 2) in terms of T(N - 3), and so on. This process ends when we express T(1) as 2 + T(0). T(0), that is to say the time required to evaluate the factorial when its argument vanishes, is just the execution time of the test (N = O) because in that case the procedure does not recurse. Thus T(0) = 1. Putting all of this together, we obtain

				T(N) = 2 * N + 1					(3)

which indicates that our recursive factorial program also executes in a time proportional to its argument N. The calculation just described neglected the actual cost of the procedure call itself, and that of the return instruction. Taking these into account would modify (3) only by constant factors, i.e., would gives us

T(N) = aN + b

where the constants a and b are 10. Hence our conclusion that calculation of fact(N) requires time proportional to N remains correct.

Next let us examine a more complex example of a recursive procedure: the Towers of Hanoi procedure of Section 5.4.4. Recall that the Towers of Hanoi problem is that of moving N rings, stacked in decreasing order of size on a peg A, to a peg B, using some intermediate peg C as a ring holder, and respecting the ordering of the rings, so that a ring never lies over a smaller ring. As seen in Section 5.4.4, the solution is given by means of the following recursive procedure:

 
procedure Hanoi(N, fr, to, via);

	if N = 0 then return;
	else
		Hanoi(N - 1, fr, via, to);
		print("from peg ", fr, " to peg ", to);
		Hanoi(N - 1, via, to, fr);
	end if;

end Hanoi;

Using the same reasoning as before, we can readily establish the following facts:

  1. When the argument N is zero, Hanoi does not recurse but executes a single test and returns.

  2. When N is greater than zero, Hanoi executes a test, a print statement, and two recursive calls with argument (N - 1). In other words, the time consumed by the procedure is described by the recurrence relation

                        T(N) = a + 2 * T(N - 1)			(5)

and by the "boundary condition"

               	    	 T(O) = b				(6)

where a and b are small constants. Once again, we can use this recurrence relation for T(N) to express T(N1) in terms of T(N2), T(N2) in terms of T(N3), and so on, until T(N) is expressed only in terms of a and b. These successive replacements look as follows:

             T(N) = a + 2 * (a + 2 * T(N - 2))
                  = a * (1 + 2) + 4 * T(N - 3)
                  = a * (1 + 2) + 4 (a + 2 * T(N - 4))
                  = a * (1 + 2 + 4) + 8 * T(N - 4)

The pattern should be clear: each substitution adds another power of 2 to the coefficient of a, and another factor of 2 to the coefficient of T(Ni), so that this coefficient is always 2i-1. Our calculation stops when i = N, at which point we have

T(N) = a(1 + 2 + ... + 2N-2) + 2N-1 * T(0)

a(2N-1) + 2N-1 * b = 2N-1 * (a + b) - a

We conclude that procedure Hanoi takes an amount of time which is an exponential function of its input parameter N (the number of rings to be moved) This means that adding one more ring doubles its execution time. Clearly, a procedure with this behavior can only be useful for relatively small values of its argument. (The Buddhist legend from which this problem originates states that 64 such rings are being moved by dedicated monks since the beginning of time, and that the completion of their task will mark the end of the world. The preceding analysis indicates that we need not be too concerned yet about this prospect.)

Procedures are only useful in practice if they do not have the explosive time requirement that the Hanoi procedure illustrates: factorial is linear in its argument, as we saw previously, and the sorting procedures discussed so far are cubic or quadratic in the number of elements to be sorted. Most of the procedures that we use in practice consume time O(n), O(n2) or O(n3) at most.

Finally let us analyze the performance of one more simple recursive procedure, namely quicksort, which was presented in Section 5.4.1. This procedure, which can sort any homogeneous set s of integers, floating-point numbers, or strings, is simply

procedure quick_sort(s);

	if #s = 0 then return [ ]; end if;
	x := arb s;						(7)
	
	return quick_sort({y in s | y < x})
		+ [x] + quick_sort({y in s | y > x});

end quick_sort;

Let T(n) be the number of cycles that this procedure will typically require to sort a set of n elements. Building up the two sets which appear in the final return statement of (7) will require a number of steps proportional to the size of s, in that a full pass over s is required in each case. Thus the time required to execute (7) is equal to some small constant multiple c*n of the length n of s (c = 3 is a fair guess), plus the time required to execute the two recursive invocations of quicksort which appear in the second return statement of (7). Since typically the element x chosen from s by the arb function of (7) will lie roughly halfway between the largest and the smallest elements of s, each of the two sets {y in s | y < x} and {y in s | y > x} should contain approximately half the elements of s. Thus, given that T(n) is the time required to sort a collection of n elements by the quicksort method, sorting each of these sets by use of quicksort should require roughly T(n/2) cycles. It follows that T(n) satisfies the recursive relationship

				T(n) = 2 * T(n/2) + c * n              (8)

The first of the terms on the right of (8) represents the time typically required to execute the two recursive invocations of quicksort appearing in (7), and the term c * n represents all the work needed to prepare for the two invocations.

We solve (8) in the same fashion as in our two previous examples, by substituting the expression (8) for the occurrence of T on the right of (8), which gives

	T(n) = c * n + 2 * c * (n/2) + 4 * T(n/4) = 2 * c * n + 4 * T(n/4),		(8a)
	

and then substituting (8) for T(n) on the right of (8a) to get

	T(n) = 2 * c * n + 4 * c * (n/4) + 8 * T(n/8) = 3 * c * n + 8 * T(n/8).		(8b)
	

Continuing inductively in this way we will clearly get

			T(n) = 4 * c * n + 16 * T(n/16),
			T(n) = 5 * c * n + 32 * T(n/32),

and so forth, until eventually, when the power of 2 in the denominator on the right becomes roughly equal to n (which will happen after log n steps, where log n designates the logarithm of n to the base 2), we will find that T(n) is roughly

c * n * log n + n * T(1),

i.e., that T(n) can be estimated as the product of a small constant c (still roughly 3), times n, times the logarithm of n. One therefore says, in the jargon of algorithm analysis, that quicksort is an "n log n" or "O(n log n)" algorithm.

For n at all large and c roughly equal to 3, c * n * log n will be vastly smaller than the cube or even the square of n. For example, for n = 1000, n ** 3 is 1,000,000,000, n ** 2 is 1,000,000, whereas c*n* log n is only 30,000. Therefore quicksort can be used to sort large amounts of data, which could not be sorted in any reasonable amount of time using flip or even bubble sort. For example, if #t= 10,000, and on a computer capable of executing 100,000 of our nominal instruction cycles per second, sorting t using the flip-sort method will require approximately 16 hours, bubble sort will take about 20 minutes, whereas quicksort will accomplish the same operation in roughly 4 seconds.

This simple example shows why it is so important to find algorithms whose time requirements do not rise rapidly as the arguments passed to them grow larger. Very considerable efforts have been devoted to the search for such high-efficiency algorithms during the past decade, and a great many ingenious and important algorithms having efficient behaviour have been devised. Most of these algorithms lie beyond the scope of the present introductory work. For basic accounts of this important material, see the References at the end of this chapter.

6.5.8 The cost of copying, or Space is Time after all

Some SETL operations, like (s with x) where s is a set, and (t with x) where t is a tuple, also s less x, x from s, x fromb t, and x frome t, modify a composite object (i.e., a set s or tuple t) which may be large. The same remark applies to the tuple assignment t(i) := x, and to map assignments like f(i) := x and f{i} := x. The time required to execute these operations will vary dramatically depending on whether or not the large composite argument of the operation needs to be copied.

To understand this important point, note first of all that copying is sometimes logically necessary. Consider, for example, the code

	s := {1,2,3,4,5,6,7,8,9,10,15,20};
	s1 := s;
	s with := 25;
	s1 less := 2;
	print("s = ",s);
	print("s1 = ",s1);

The output that this will produce is

		 s = {1 2 3 4 5 6 7 8 9 10 15 20 25}
		s1 = {1 3 4 5 6 7 8 9 10 15 20}.

Since two different values will have been created by the time we reach the final step of (1), it is clear that somewhere along the way to this final step the single set constant assigned to the variable s will have to be copied. This copying can be done when the value of s is assigned to the second variable s1 in the second line of (1) (copying on assignment), or can be done in the third line of (1), when the value of s (but not that of s1) is modified by addition of the extra element 25 (copying on modification). Exactly where copying actually takes place will depend on the particular version of the SETL system that you are using, and especially on whether or not this compiler includes an "optimization" phase. But in any case, whenever two variables appear to be sharing the value of a composite object, and one of these variables is modified, the other one must be provided with a fresh copy of that composite object, because in SETL assigning to one variable never affects another. Copying a set or tuple with n components can require O(n) cycles. Hence execution of an apparently cheap operation like t(i) := x can require a number of cycles proportional to the length of t, if the value of t has been assigned to some other variable.

On the other hand, copying is frequently unnecessary, and both the optimizing version of the SETL compiler and the SETL run-time library include mechanisms for avoiding copying when it is not logically necessary. (Since these are implementation-level mechanisms, and fairly complex ones at that, we shall say nothing about how this is done.) When no copying is involved, the operation s with x is only two or so times slower than the membership test x in s, and similar remarks apply to the other operations in the group we have been considering. For example, the assignment t(i) := x can be done in just one of our nominal "cycles," and in the same circumstances map assignment is roughly five times as slow.

This is but one symptom of the fact that the storage management which the SETL system performs contributes to the total execution time of a program. This contribution will be small if the objects manipulated by the program do not change size very frequently, but can be very substantial if large objects are constantly created, shared, updated, and then discarded. Programming techniques that avoid storage management costs are known and studied in data-structure courses, for which a number of excellent texts exist. Such data structures can be constructed in SETL, and the next section gives one example of such, namely the linked list. In this text we have largely ignored the subject of data structures of this type, in accordance with the tenet of high-level programming which states that the burden of storage management is best left to the system. The gains in ease of programming thereby attained are well worth the execution costs incurred. Nevertheless, the fact that storage management can become expensive should be kept in mind. Note also that in some circumstances, such as in the programming and control of real-time devices (i.e., physical equipment such as a motor, a cardiac monitor, a radar, an air-conditioning system, etc.) where a computer system must respond with minimal delay, no hidden execution costs can be allowed, and the programmer must know, within a few milli- or microseconds, the time that a given code sequence will take to execute. Such systems cannot be programmed in SETL or in any other high-level language such as APL or SNOBOL For such systems, lower-level languages, in which all storage allocation is described in full detail by the programmer, must be used.

6.5.9 Data structures used for the efficient realization of some SETL primitives: the linked list

As seen in the preceding pages, execution of some of the most important SETL operations, for example, set union and tuple concatenation, may require time proportional to the size of the arguments to which these operations are applied. However, if the programs performing these unions and concatenations use the relevant sets and tuples only in restricted, particularly favorable ways, we can sometimes improve their efficiency very greatly by using alternate more complex representations for the objects appearing in these programs. Although doing this is something of a violation of the SETL "spirit," which emphasises ease of programming over efficiency of program, SETL can at least be used to explain these efficiency-oriented techniques.

In order to focus our discussion, we have to examine more details of the implementation of the basic SETL types. In this section we will describe the way tuples are actually represented in the SETL run-time system and examine the advantages and disadvantages of this representation. This will allow us to understand the potential advantages of an alternative representation which is efficient in cases where the standard representation for tuples is comparatively expensive. Note that the representations of sets and maps used by the run-time system are considerably more complex than the representation of tuples.

Internally, that is to say in the memory of the computer on which your program is running, a tuple can be represented by a contiguous sequence of cells, or words, that contain the components of the tuple. These components appear in order, so that the first component of the tuple appears in the first of these cells, the second component in the second, etc. For example, if f is the tuple [2,3,5,7,11], it could be stored in memory as follows:

235711

When working with such a representation the SETL run-time system would always know where the first component of t is found. From this, it can easily calculate the location of the second, third, components of t, and so on: the location of any component t(i) is simply (i - 1) plus the location of the first component. cells after the location of t(1). This is why a tuple is the most efficient way storing data that is referred to by number, i.e., according to its ordering: when such a collection of data is represented as a tuple, all items in this collections are equally accessible. Tuples are built for efficient indexing.

Other operations on tuples are not as easy to perform. Consider

t(2..3) := [ ];

whose purpose is to shorten t by removing two of its components. After this operation is performed, t will be stored as follows:

2711

Thus we now have t(2) = 7, t(3) = 11, etc. This requires that the components of t that appear after the deleted section be "moved down" two positions. For a long tuple, this can represent a substantial expense. Similarly, operations that enlarge a tuple, by insertion or concatenation, such as

			t with:= 13;
			t(1..3) := [1,2,3,4,5];
			t +:= [17,19,23,29];

may require that a certain number of elements of t be moved or copied.: there is no room in the portion of memory currently allocated for t for the expansion required by one of the preceding operations, then a larger area will have to be found by the storage manager, into which the whole of t will have to be copied and then modified. We conclude that operations that modify the size of a tuple repeatedly carry a potentially substantial penalty in storage allocation activity. In cases where data are ordered but are not accessed by number, i.e. when no indexing operations are performed, and where substantial changes in the size of a tuple will occur, other representations exist that lead to smaller storage allocation and data motion costs. The most familiar among such representations is the chain, or linked list.

In the abstract, a linked list is a structure in which data are organized according to some linear order, i.e., with a first element, a second one, and so on, and so that it is easy to go from one element to its immediate successor. A tuple is of course an example of a list, but one in which elements are stored contiguously. In general a linked list is not stored contiguously, and the mechanism that takes us from one element to the next must be described explicitly. In SETL we may choose to use a map (call it next) which takes us from any element to its successor in the ordering. Then rather than having to use contiguous locations, successive elements of the list can be anywhere. In this case we can think of the location that holds each element of the list a being marked by an arbitrary tag, for which SETL atoms will be used in what follows. The contents of each such location are also described by a mapping (call it val_of), so that if C is a cell in this list, val_of(C) is the element stored at C, and next_of(C) is the cell that contains the next element. In addition we need to have an explicit way of referring to the first element of the list: a variable comp1 must be used, whose value is the first cell. Then, all in all, val_of(comp1) is the first element of the list, val_of(next_of(comp1)) is the value of the second element of the list, etc. The last cell cn in the list is distinguished by having no successor, which we express by setting next_of(cn) = OM.

We are now ready to build and manipulate lists. In order to create a list with a single element x, we execute the following fragment:

	comp1 := newat();    -- Define cell.
	val_of(comp1) := x; -- place element in cell.
	next_of(comp1) := OM; -- First element is currently the last also.

In order to add a new element y after x on this list, we execute the following fragment:

	new := newat();        -- Define new cell.
	val_of(new) := y;    -- Place element in it.
	next_of(comp1) := new;  -- First element now has successor.
	next_of(new) := OM;     -- Second one does not.

If instead we want to add a new element in front of the list, we execute

	new := newat();		-- New cell.
	val_of(new) := z;	-- And its value.
	next_of(new) := comp1;	-- First element now becomes second.
	comp1 := new;		-- And most recent element is first.

Note that each invocation of newat() just represents a way of obtaining a unique marker for each element in the list. Instead of atoms we could have used integers, strings, or indeed any kind of SETL value. We choose to use atoms to emphasize that each cell in the list requires a "name" of some sort, but that this name serves as a place-marker for the value that goes with it and has no other significant relationship to that value.

In order to transform any tuple t into this list representation, we can make use of the following procedure:

    procedure into_list(t);
        if t = [ ] then return OM; end if;

        first := newat();
        cell := first;
        succ := newat();

        for i in [1..#t] loop   -- Place tuple component
                val_of(cell) := t(i);   -- Cell for next.
                next_of(cell) := succ;  -- In which next component
                cell := succ;   -- will be stored.
                succ := newat();
        end loop;
        
        next_of(cell) := OM;       -- Last element.
        return first;   -- First allocated cell.
    end  into_list;                

This procedure (which assumes that next_of and val_of are globally available, map-valued quantities) returns the first cell of the list, so that by applying the mappings next_of and val_of repeatedly we can reach all the elements of the list. We have assumed that next and val_of are global variables, defined outside this procedure. This is indeed the way in which a program using lists is usually structured: a single next mapping is used to describe the chaining of all lists present in the program. The chained representation of a tuple that is obtained from procedure into_list can be pictured as follows (Figure 6.1):

Figure 6.1 A Tuple Represented by a Chained List of Elements

At first sight, representing a tuple in this way may not appear to be such a good idea. Of course, it is not hard to iterate over tuples having this representation: we simply start at comp1 and apply next repeatedly to step along, always applying val_of to get the component value corresponding to whatever index we have reached. For example, instead of writing

	if exists x = t(i) | C(x) then ... else...  	(1a)

as we would if t were a standard SETL tuple, we would write


	i := comp1;			-- Initialize search index
	while i /= OM loop 
		x := val_of(i);        -- Value of next element.
		if c(x) then exit; end if; -- If found, search is successful 		(1b)
		i := next_of(i);          -- else advance search index
	end loop;
	if i /= OM then ... else...

Although no less efficient than (la), the code (lb) is certainly more comp1ex and hard to read. Moreover, accessing a given component t(k) of t is much less efficient in the list representation, since for standard tuples the operation

x := t(k)

is performed in one or two cycles, whereas if t has the list representation we will instead have to count up to k as we step through the list, as the following fragment shows:

	i := comp1;
	for j in [1..k-1] loop i := next_of(i); end;
	x := val_of(i);

whose execution requires at least k cycles; that is to say, indexing operations are not efficient when applied to lists. On the other hand, certain operations that modify the size of tuples can be performed much more rapidly in the list representation than for standard SETL tuples. For example, for a standard tuple the operation which inserts x immediately after the i-th component of t requires time proportional to the length of t, since to create the expanded tuple all of the elements of t will have to be copied into new positions. On the other hand, if t has the list representation, and we know the name ni of the cell after which the insertion is to be performed, this operation can be done in just a few cycles, since all we have to do is

  1. create a new atom ix to name the cell for the new component value x;
  2. link ix at the appropriate position into the list representing t.

Similar remarks apply to the operation t(i..i) := [ ] which deletes a given component from a tuple in list representation. The following two procedures represent these operations. In writing these procedures, we suppose that a single pair of maps I>next and val_of will be used to represent all lists, and that the variables next and val_of have been declared global. We also suppose that only one logical reference to any of these maps is ever extant, so that no copying (see the preceding section) ever needs to be performed.

	var next,val_of;		-- Global Variables
	
	next := {};			-- Initialize to null set
	val_of := {};			-- Initialize to null set
	
    procedure insert(x, ni);                -- Inserts x immediately after the list
                                        -- component
        ix := newat();                  -- whose cell is ni. Create a new cell ix,
        next_of(ix) := next_of(ni);           -- and make it the predecessor of next_of(ni)
        next_of(ni) := ix;                 -- and the successor of ni
        val_of(ix) := x;                -- Place value in cell.
    end  insert;
        
        
    procedure delete(ni);
        
                -- delete the component immediately following that whose cell is ni
                next_of(ni) := next_of(next_of(ni)?ni); -- unless ni is the last index in its    list make
                                -- ni's successor the successor of ni's
                                -- original successor
    end  delete;

Provided tuples t1 and t2 are represented as lists, that t1 will not be required after t1 and t2 are concatenated, and that the name i of the last cell of t1 is easily available, the concatenation of t1 and t2 can also be formed in a number of cycles independent of the length of t1 and t2. The following procedure, in which we assume that each tuple t1 n list form is represented by a pair [first, last] consisting of the first and the last cell of t, shows this:

	procedure concat(t1,t2);
	
		[t1_first,t1_last] := t1;		-- unpack the first and
							-- last cells of t1
		
		[t2_first,t2_last] := t2;		-- and those of t2
		
		if t1_first = OM then			-- t1 is an empty list
			return t2;
		elseif t2_first = OM then		-- t2 is an empty list
			return t1;
		else
			next_of(t1_last) := t2_first;	-- link the two lists
			return[t1_first,t2_last];
		end if;
	
	end  concat;

The mapping next_of, and the variables used to keep track of first and last elements of lists, together define what are called pointers in other programming languages.

Quite a few other very useful trick representations of tuples, sets, maps, etc., are known. These representations make use of more comp1ex arrangements of pointers, i.e., internal mappings between elements. If the family of operations applied to a SETL object is appropriately limited, use of an appropriate one of these special representations can be very advantageous. Since further exploration of this very important issue would take us beyond the scope of the present work, we refer the reader to the References at the end of this chapter for additional material concerning the issue of data structuring.

EXERCISES

1. Write a while loop that takes 2**(2**n) cycles to execute, for some n. Write a while loop that take 2**(2**(2**n)) cycles to execute. For how many centuries will these loops run, if n = 100?

2. If the monks in charge of the Towers of Hanoi began moving the 64 rings in the year 1 million B.C. and if they more one ring per second, when will the world end?

3. By adding a mapping prev that takes a given list element to its predecessor in the list, it is possible to iterate over a list forward and backward. Write a procedure to transform a standard tuple into this "doubly-linked" list representation.

4. Write procedures to insert and delete elements from a doubly-linked list.

5. Estimate the cost of the list concatenation procedure of Section 6.5.9. Assuming that a map operation such as next_of(last) := first; takes five cycles, and that copying a tuple of size n takes 2*n cycles, determine the size of t1, t2 for which the linked representation is preferable to the standard contiguous representation for performing tuple concatenation.

6.5.10 Some techniques for program transformation

The examination of various sorting examples in the previous sections indicates the very general fact that several algorithms, of quite different efficiencies, can usually be found to perform a given task. Several questions then arise:

  1. How can we find the most efficient method or algorithm to perform a computation?

  2. How can we know that it is the most efficient one available?

  3. Given one algorithm which is not the most efficient for the problem at hand, can we modify it in some systematic fashion in order to improve its efficiency, perhaps by transforming it into one of the more efficient algorithms of its family?

The answer to (a) is found in the efficiency analysis which we sketched in Section 6.5. Given two algorithms, we can compare the number of operations they are expected to take in various cases and thus determine which one is preferable. Question (b), namely how do we know that a given algorithm is indeed the best possible for a given task, is much harder, and is the subject of so-called lower-bound analysis. For example, in the case of sorting lower-bound analysis gives the following result: "When we sort a set of size n by means of comparisons between its elements, 0(n log n) operations are required". This rigorous mathematical result (which we shall not prove here) shows that quicksort is probably close to the best way to sort, although bubble-sort is definitely not. Unfortunately such lower bounds are known only for a few interesting problems, and in general we have no systematic way of knowing how close to optimal a newly invented algorithm is.

Question (c), namely how can we improve on a given algorithm in order to improve it, is the source of much current research and also of large amounts of programming lore. Some basic techniques for transforming a program into a more efficient one take the form of very general prescriptions, such as

  1. Do not evaluate the same expression repeatedly.

  2. In particular, if a calculation is performed repeatedly in a loop and yields the same result every time, then perform the calculation once outside the loop, save the result, and use it within the loop.

  3. Try to reuse storage occupied by a variable if the value of that variable is not needed any further.

  4. When possible, replace recursive calls by iterative loops.

These prescriptions are simple and intuitive and can be expected to be generally useful. They can be applied almost mechanically and in fact can be automated to some extent, so that a clever compiler can apply such prescriptions by itself. In fact, most good compilers perform transformations corresponding to (1) and (2). Reuse of storage, as suggested in (3), is harder to automate, and the application of this rule requires greater ingenuity on the part of the programmer. The removal of recursive calls in favor of iterations has been studied in great detail in the context of languages such as LISP, which use recursion very heavily. We will examine a nontrivial example of recursion removal in a few pages.

In addition to the general rules stated in (1.)-(4.), there are certain optimizing program transformations that apply to very specific program constructs. Catalogs of such transformations can be found in some of the works listed in the References. Here we will only give some simple examples of applications of the rules (1), (2), and (3).

6.5.10.1 The elimination of repeated calculations

The simplest kind of efficiency-improving transformation which we can perform is to identify expressions that are common to several calculations and that can be evaluated once instead of repeatedly. For example, when calculating the roots of a quadratic equation, we may write the following familiar formulae:

		x1 := (-b + sqrt(b**2-4.0*a*c))/(2.0*a);        (1)
		x2 := (-b - sqrt(b**2-4.0*a*c))/(2.0*a);

In this case it is plain that the expression involving the square root is the same for both roots, which makes it possible to rewrite (1) as follows:

		discr := sqrt(b**2-4.0*a*c);
		x1 := (-b + discr)/(2.0*a);
		x2 := (-b - discr)/(2.0*a);

Transforming from (1) into (2) saves four arithmetic operations and one square root evaluation and makes the expressions simpler to type. We remark that the expression (2.0*a), which is a common subexpression to x1 and x2, is still evaluated repeatedly in (2). Replacing the two evaluations by a single one by saving the result in some other variable would eliminate one arithmetic instruction but add one assignment statement and therefore would hardly be a saving at all.

As a less trivial example of removal of redundant subexpressions, consider the evaluation of polynomials at a point. That is to say, given some polynomial

anxn + an - 1 xn - 1 + ... + a0 (*)

we want to determine its numerical value at some point, e.g., for x = 10. If we represent the polynomial by a tuple P containing its coefficients in decreasing order, then we can evaluate the polynomial for x = c by means of the following code:

			degree := #P - 1;
			value := +/[a*c**(degree-i + 1): a = P(i)];

Note that in the loop we evaluate c**n, c**(n-1), c**(n-2), etc. These exponentiations are evaluated as repeated multiplications, so that the number of operations performed will be

n + (n-1) + (n-2) + + 0(n) = n2 + 0(n) = 0(n2)

where the O(n) term takes care of the iteration over P and of a final summation of individual terms. In other words, as written the fragment (3) takes an amount of time proportional to the square of the degree of the polynomial. Substantial improvement can be obtained if we observe that the powers of c can be calculated in increasing order rather than in decreasing order, so that c**n is obtained from c**(n-1) by means of a single multiplication. An improved version is the following:

                power := 1;
                value := 0;
                for i in [#P,#P - 1..1] loop
                        term := power * P(i);        (4)
                        value +:= term;
                        power *:= c;
                end loop;

Here the loop is executed n times (where n is the degree of P) and each execution of the loop takes three arithmetic instructions and a few assignments, so that the algorithm is now of order n, rather than n2. This is a substantial improvement for large enough n.

Let us note that a similar improvement can be obtained by a different approach, which is several centuries old, and is known as Horner's rule. We can rewrite the polynomial (*) by means of successive factorizations:

((..(anx + an - 1)x + an - 2)..)) + aO

This suggests the following method for evaluating P:

                value := 0;
                for a = P(i) loop
                        value := value * c + a;
                end loop;

Now the loop contains only two arithmetic operations and one assignment. This is certainly superior to (4), and is therefore the method of choice for evaluating polynomials. Note however that (5) was obtained by means of a clever rewriting of the original problem, and the solution was not the result of some systematic transformation rule, but was "pulled out of a hat," as it were. It is appropriate to remember at this point that the development and optimization of algorithms require a healthy dose of inventiveness (guided, to be sure, by a knowledge of existing algorithms and optimization techniques)

6.5.10.2 Reuse of storage and recursion elimination: the case of quicksort

In order to illustrate the kind of program improvement that can be obtained by recursion removal, we will examine in some detail the (by now familiar) quicksort method. In its simplest version, we have described quicksort as follows:

	procedure quicksort(S);

		if S = {} then return [ ];
		else
			x := arb S;
			lo := {y in S | y < x};
			hi := {y in S | y > x};               		  (1)
			return quicksort(lo) + [x] + quicksort(hi);
		end if;

	end quicksort;

The skeleton of this algorithm consists of two steps: partition and recurse. That is to say: pick some element of the set to be sorted, and partition the rest of the set into two disjoint sets, corresponding to elements that are smaller (resp. larger) than the partitioning element. Then apply this process recursively to each partition. The two very substantial improvement that can be brought to bear on the efficiency of quicksort are performing the partitioning in place and replacing the recursion with a loop.

Partitioning in Place. To see how to apply this improvement we first observe that after S has been partitioned into lo and hi, the value of S is not used again. This suggests that, instead of building lo and hi at each step as new sets, we may want to reuse the space occupied by S, to place lo and hi in it. How to do this is not immediately apparent: as long as S, lo, and hi are sets, we do not have a simple way of describing how they are actually stored. If however we make them into tuples forming part of a larger tuple, then it is possible to describe them by their size and their starting locations. Suppose S is a tuple of size n, and that the partitioning element which we choose turns out to be the k-th one (in increasing order). Then the locations 1, 2..k-1 can be used for lo, and the locations k + 1, k + 2..n can be used for hi. Then, rather than passing lo and hi as tuples in the recursive call, we can instead pass the end points of each and then proceed to partition each subtuple in place as well. (How the partitioning in place is actually performed remains to be discussed.) This leads to the following (tentative) improved version of quicksort:

	T:= [x in S];		-- Initial tuple to be sorted.
	quicksort2(1,#T);		-- First call to routine.

procedure quicksort2(m, n);

-- Apply quicksort to the portion of T (which is global) between T(m) and
-- T(n) inclusive.

	if m >= n then return; -- Trivial case.
	else
		k = partition(m,n);		-- k is the position of the partitioning element
	
		quicksort(m,k - 1);		-- Elements smaller than it.
		quicksort(k + 1,n);		-- Elements greater than it.
	end if;	

end quicksort2;

In this version of the algorithm, no composite objects are constructed (except for the global tuple T). Each call to quicksort is given an explicit region of T on which to operate. This forces us to mention the indices m, n, k, etc., explicitly, which makes for a longer and less readable piece of code. We must now show that the partitioning itself can be done in place. This is an interesting algorithm in its own right.

Partitioning an Array in Place. The problem is the following: given a tuple t, and an arbitrary element e of t, reorder the elements of t so that all elements l < e appear before e, and all elements h > e appear after e. For simplicity let us assume that e is chosen to be the first element of t. A first thought might be the following: scan t from left to right, and whenever an element e1 > e is found, move it up. But exactly where to? If we are to reorder t in place, then in order not to loose any information, we can only exchange elements. As a moment's reflection will show, the proper way to proceed is the following:

  1. Going from left to right, find an element e1 > e, which must therefore appear after e in the desired reordering.

  2. Going from right to left, find an element e2 < e, which must therefore appear before e .

  3. Interchange e1 and e2 .

  4. Continue scanning from left to right from the position of e2 , and from right to left from the position of e1 , interchanging whenever the next pair of out-of-place elements is found.

  5. Stop when those two scans meet (at the correct location for e).

The following code (which deserves very careful scrutiny) performs this partitioning process:

procedure partition(i,j);
        
        -- Partition the portion of array T from T(i) to T(j),
        -- using T(i) as the partitioning element e. The procedure places e at
        -- position k, and places all elements smaller than e at positions < k.
        
        
        e := T(i);      -- Partitioning element.
        l := i + 1;     -- Index for left-to-right scan.
        r := j;         -- Index for right-to-left scan.
        
	while l < r loop 	-- Scans have not met in middle.
 
                while l < r and T(l) < e loop     -- bypass smaller elements.
                        l +:= 1;
                end loop;
        
        
                while l < r and T(r) > e loop     -- bypass larger elements.
                        r -:= 1;
                end loop;
         
        -- Now T(l) > e, and T(r) < e. Exchange them if the two scans have not met yet.
        
                if l < r then
                        [T(l), T(r)] := [T(r), T(l)];
                                -- Skip over these two elements which are now in proper place.
                        l +:= 1; if r > l then r -:= 1; end if;
                end if;
        
        end loop;

        -- Now l = r. l = r or l - 1 (if it is positive) is the correct position
        -- for the partitioning element. Place it there, and place T(r), which is  no
        -- larger than it, at the original position of the partitioning element.

        if l > #T or T(l) > T(i) then l := (l - 1) min j; end if;
        if l > 0 then T(i) := T(l);  T(l) := e; end if;
        
        return l;

end partition;

The top-level driver code for this routine can simply be

	procedure quicksort(i,j);
	    if  i >= j then  return;  end if;
	    mid := partition(i,j);
	    quicksort(i,mid - 1); quicksort(mid + 1,j);
	end quicksort;

Both these procedures assume that the tuple T is globally available.

At this point, our first optimization goal has been achieved: we have a quicksort algorithm that works in place, that is to say, without constructing any new composite objects in the course of its execution. This has been achieved by explicitly describing the elementary operations that place each element of the object being sorted at some specified position in a single tuple.

In other words, we have deliberately avoided the use of composite objects in the explicit fashion characterizing version (1) of quicksort. Of course, when we avoid manipulation of composite objects, much of the power of high-level languages such as SETL is lost. In fact, our version of quicksort now looks remarkably similar to a PASCAL or an Ada version of the same algorithm. This indicates the relative advantages of these various languages: SETL and other languages that manipulate arbitrary composite objects are the most convenient programming tools and lead most rapidly and with smallest programming effort to correct and lucid versions of algorithms. However, when efficiency considerations become critical, we must often abandon the richer semantic primitives of these languages (or drop to a lower-level language altogether) in order to control the efficiency of every statement we write.

Removing the Recursion from Quicksort. The next step in our improvement of quicksort is to replace the recursive calls within quicksort by some iterative construct which can be expected to be cheaper (recall that a procedure call is as expensive as 10 to 20 arithmetic operations). The skeleton of the recursive version of quicksort is the following:

		procedure quicksort(i,j);
			...
			quicksort(i,k - 1);
			quicksort(k + 1,j);

Each invocation of quicksort receives some values for i and j and assigns some value to k. Therefore, to eliminate recursive invocations of the procedure, we must save the current values of i, j, and k, because they might be used after we return from the recursive call. This can be achieved by means of a stack on which we "pile up" the parameters which we need to preserve. A stack is a tuple which is manipulated as follows: elements are added to it at its end and deleted from it from the same end. In other words, the operations that modify a tuple are the following:

			stack with:= x; -- "push" operator
			z frome stack; -- "pop" operator

Whenever we perform a recursive call, we use the stack to save whatever needs to be saved, i.e., whatever current values of variables and parameters will be needed after the recursive call. Whenever we return from the recursive call, we obtain, from the stack, all values that we need to resume the computation at the point where we left off. This is in fact what the SETL run-time system does: whenever a recursive call is executed, current values are saved before the (new instance of the) procedure starts executing. However, the run-time system does this in a stereotyped fashion which is often inefficient. In the case at hand, the two recursive calls are sandwiched between stacking and unstacking operations, whose action corresponds to the following code:

	procedure quicksort(i,j);
			...
		stack with:= [i,j,k]; 	-- Save current values.
		quicksort(i, k-l); 	-- Perform recursive call.
		[i,j,k] frome stack; 	-- Restore current values.
		stack with:= [i,j,k]; 	-- Save them again.
		quicksort(k + l,j);
		[i,j,k] frome stack;	 -- restore current values.

	end quicksort;

(Recall that the stack manipulations are the ones performed by the run-time system.) Several remaining inefficiencies should now be apparent: the most glaring one relates to the fact that the second recursive call is the last statement in the body of quicksort. In other words, there are no computations to be done after the second call, and therefore there is no need to save and restore current values of variables and parameters at this point. A smaller inefficiency is the fact that the second call only requires the values of k + 1 and j, so that there is therefore no need to save i either. So in fact we can do with the following:

	procedure quicksort(i,j);
		...
		stack with:= [k,j];
		quicksort(i,k-1);
		[k,j] frome stack;
		quicksort(k + 1,j);

	end quicksort;

At this point, the meaning of the stack should be clear: it allows us to keep track of portions of the tuple T which remain to be sorted. With this understanding, we realize that quicksort will continue calling itself recursively as long as there is some portion of T still unsorted, i.e., as long as there is something left on the stack that indicates some pending invocation of quicksort. Having this insight, we can transform our algorithm into an iterative one that loops as long as the stack is not empty:

	while stack /= [ ] loop
		[i,j] frome stack;	-- Values to work on.
		...
		stack with:= [i, k - 1]; -- to be worked on.
		stack with:= [k + 1,j]; -- ditto.
	end loop;

This is still incomplete: as written, each step deletes one pair of indices from the stack and adds two such pairs to it: clearly the stack will grow without bound, and the algorithm never terminates. What prevents this disaster from happening is that successive pairs describe shorter and shorter portions of the tuple being sorted. Whenever j - i < 2, there is nothing to sort. Therefore we should only save a pair on the stack if it describes a portion of the tuple of size greater than 1. A final improvement is to save only one pair on the stack and go to work immediately on the other, rather than stacking it and unstacking it at once. In order to keep the stack small, it turns out best to stack the indices of the larger portion to be sorted and go to work on the smaller one. Putting all this together, we obtain the following:

procedure quicksort(rw T);

    -- Optimized version of quicksort procedure. It operates in place, with
    -- explicit stack manipulation, and the partitioning process is performed
    -- on line.
    
    i:= 1;          
    j:= #T;                 -- Initial portion to be sorted is the
                            -- whole of T.
    
    stack := [ ];        

    loop

        e := T(i);      -- Partitioning element.
        l := (orig_l := i) + 1;     -- Index for left-to-right scan.
        orig_r := r := j;         -- Index for right-to-left scan.
        
        while l < r loop                        -- Scans have not met in middle.
 
                while l < r and T(l) < e loop     -- bypass smaller elements.
                        l +:= 1;
                end loop;
        
        
                while l < r and T(r) > e loop     -- bypass larger elements.
                        r -:= 1;
                end loop;
         
        -- Now T(l) > e, and T(r) < e. Exchange them if the two scans have not met yet.
        
                if l < r then
                        [T(l), T(r)] := [T(r), T(l)];
                                -- Skip over these two elements, which are now in proper place.
                        l +:= 1; if r > l then r -:= 1; end if;
                end if;
        
        end loop;
 
        if l > #T or T(l) > T(i) then l := (l - 1) min j; end if;
        if l > 0 then T(i) := T(l);  T(l) := e; end if;
        
        mid := l;

                    -- Save larger partition on stack.
        if orig_r - mid > mid - orig_l then
                [savei, savej] := [mid + 1,orig_r];
                j := mid - 1; i := orig_l;             -- Work on [orig_l..mid - 1]
        else
                [savei, savej] := [orig_l,mid - 1];
                i := mid + 1; j := orig_r;                       -- Work on [mid + 1..orig_r]
        end if;
        
                    -- Save larger partition only if it is longer than 1.
        if savej - savei >= 1 then stack with:= [savei, savej]; end if;
    
            -- Work on smaller partition only if it is longer than 1.
            -- Otherwise take the top pending partition to work on.

        if j - i = 1 and T(i) > T(j) then [T(i),T(j)] := [T(j),T(i)]; end if;
        
        if j  - i < 2 then 
            if stack = [ ] then   -- Nothing left to do.
                 exit;
            else
                 [i,j] frome stack;
            end if;
        end if;

end loop;
    
end quicksort;

This is very far from the short and intuitive algorithm with which we started. However, this final version is considerably more efficient than the original. It is in this final form that the quicksort procedure is usually implemented. Moreover, this more efficient version is still incorrect in some details and would have to be touched up to function properly! For example it will not work if T contains duplicate elements; moreover, during the