CHAPTER 9

The Language in Action: A Gallery of Programming Examples.

In this, the last chapter of the introductory part of this book, we illustrate the use of SETL by giving a variety of programs which exhibit its features and can serve as useful models ol style. Some of the smaller programs present significant algorithms; the larger examples show how more substantial programming problems and applications can be addressed.

9.1 Eulerian Paths in a Graph

A graph is a collection of nodes, pairs of which are connected by edges (see Section XXX). Graphs come in two varieties, directed graphs, each of whose edges has a specified starting node and target node, and undirected graphs, whose edges can be traversed in either direction. The most natural SETL representation of a directed graph G is a set of ordered pairs [x, y], each such pair representing an edge with starting node x and target node y. It is convenient to represent an undirected graph G in the same way, but in this case the reversed edge [y, x] belongs to G whenever [x, y] belongs to G. This representation also allows us to regard G as a multivalued map: G{x} is the set of nodes connected to x by some edge. The following algorithm makes use of this fact.

Given an undirected graph G, the Eulerian path problem, named after the famous mathematician Leonhard Euler (1707-83), "who calculated as other men breathe," is to traverse all the edges of G exactly once by a single unbroken path p which starts at some node x of the graph and ends at some other node y (which might be the same as x). We can think of such a path, called a Eulerian path, as "using up" edges as it traverses them. Euler used the following argument to determine which graphs contain paths p of this kind. If a node z along p is different from the starting and ending nodes x and y of p, then immediately after p has reached z along one edge p will leave it along some other edge, and thus p will always use up an even number of the edges which touch any node z of p not equal to x or y. The same remark applies to the starting node x if x = y, but if x and y are different then p must use up an odd number of the edges touching x and an odd number of the edges touching y. It follows that a Eulerian path p which traverses all the edges of G just once can only exist if G is connected and either has no nodes x touched by an odd number of edges or has exactly two such nodes x, y; and in this latter case every Eulerian path p must start at one of x, y and end at the other.

Suppose, conversely, that G has either no nodes or exactly two nodes which are touched by an odd number of edges. Then we can construct an Eulerian path p as follows. If every node of G is touched by an even number of edges of G, let x be any node of G; otherwise let x be one of the two nodes x, y of G touched by an odd number of edges. Start the path p at x, and extend p as long as possible by stepping from its endpoint along any edge of G that has not been traversed before. Since we consider an edge to be used up as soon as it is traversed, the construction of p uses up more and more edges of G and therefore must eventually stop. Hence p must be finite. Suppose that p ends at a node y. Clearly all the edges touching y must have been traversed by p, since otherwise p could be extended by some edge. Thus, if the starting node x of p is touched by an odd number of edges, p must end at some other node y which is also touched by an odd number of edges, whereas if x is touched by an even number of edges, then p must return to x and end there. In either case, removing all edges traversed by p from G will leave behind a graph G' each of whose nodes is touched by an even number of edges. If p does not already traverse all the edges of G, then some node z along p will be touched by some untraversed edge. In this case, one can construct a path q by starting from z with this edge and extending q along untraversed edges as long as possible. Since the remarks concerning p apply to q as well, and since q can be regarded as a path in the graph G', and since all of the nodes preceding G are touched by an even number of edges, the path q must both begin and end at z; i.e., q must be a circuit. Hence we can insert q into p, thereby constructing a path which first follows p to z, then follows q until q finally returns to z, and then follows the remainder of p to its end. Call this extended path by the same name p. Repeating the construction and insertion of circuits like q as often as possible, we must eventually build up a path p which traverses all the edges of the original graph G.

The two following procedures realize the Eulerian path construction described in the preceding paragraphs. Procedure build_path starts a new path and extends it as far as possible, deleting (from G) the edges traversed by this path; Procedure Euler_path installs the path sections returned by build_path into the overall Eulerian path that it constructs and returns.

```program Euler;    -- Eulerian path construction

graph := {[1,2], [2,3], [3,4], [4,1], [4,2]};             -- a small graph
print(euler_path(graph + {[y, x]: [x, y] in graph}));        -- which is undirected.

procedure Euler_path(G);                -- constructs Eulerian path for graph G

nodes := domain(G);                        -- all nodes in the graph.

if #(odds := {x in nodes | odd(#G{x})}) > 2 then
return OM;      -- since more than two nodes are
end if;             -- touched by an odd number of edges

-- odds is the set of all nodes of G that
-- are touched by an odd number of edges
x := arb(odds)?arb(nodes);      -- pick a node of odds if possible
-- otherwise pick any node of G

path := [x] + build_path(x,G);

while exists z = path(i) | G{z} /= {} loop
new_p := build_path(z, G); -- insert new section into path
G -:= ({[y,x]: [x,y] in new_p} + {e: e in new_p});
path := path(i..i - 1) + new_p + path(i..);
end loop;

return path;

end Euler_path;

procedure build_path(x, rw G);        -- builds maximal path section starting at x,
-- and deletes all edges traversed

p := [ ];

while (y := arb G{x}) /= OM loop
-- while there exists an edge leaving the last point reached
p with:= y;                 -- extend path to traverse the edge
G -:= {[x, y], [y, x]};        -- delete the edge just traversed
x := y;                        -- step to y
end loop;

return p;

end build_path;

end euler;
```

9.2 Topological Sorting

Certain problems, of which scheduling problems are typical, require one to arrange the nodes n of a graph G in a list such that every edge of G goes from a node n1 to a node n2 coming later in the list. This is called the problem of topological sorting. Suppose, for example, that a student must choose the order in which he or she will take the courses required to qualify as a computer science major, some of which have other courses as prerequisites. Suppose also that we represent the prerequisite relationship as a set G of pairs, agreeing that whenever course n1 is a prerequisite of course n2, we will put the pair [n1,n2] into G. Then, mathematically speaking, G is a graph; in heuristic terms, G{n1} is the set of all courses for which n1 is a prerequisite. (Note the connection of the topological sorting problem with the transitive computation of prerequisites described in Section 4.3.8.1.)

To sort a collection of courses topologically is simply to arrange them in any order in which they could actually be taken, given that all the prerequisites of each course n must be taken before n is taken. To do this we can simply find some course n which has no (unfulfilled) prerequisites, put n first in the list L, drop all edges [n, n1] from G (since n is no longer an unfulfilled prerequisite), and then continue recursively as long as courses without unfulfilled prerequisites remain. Written as a recursive SETL routine, this is short and direct:

```procedure top_sort1(G,nodes);      -- topological sorting procedure, recursive form
return if exists n in nodes | n notin range(G) then
[n] + top_sort1(G lessf n, nodes less n) else [] end if;
end top_sort1;
```
Invocation of top_sort1(G) will return a tuple t consisting of some or all of the nodes of G. If it is possible to sort nodes of G topologically, then every node of G will appear in t. This will be the case if and only if G admits no cycle of nodes such that
```    n1 is prerequisite to n2 is prerequisite to n3 is prerequisite to ...
is prerequisite to nk is prerequisite to n1.```
To see this, note that it is clear that when such a cycle of mutually prerequisite nodes exists, no node in the cycle can ever be put into the tuple t returned by top_sort1. Conversely, if a node n0 belongs to no such cycle, then eventually top_sort1 will have processed all the predecessors (i.e., prerequisites) of n0, and after this top_sort1 must eventually put n0 into the tuple t it returns. This shows that the set of all nodes belonging to any cycle like (n1,n2,...,nk,n1) is simply
`       nodes - {x in top_sortl(G,nodes)},`
so that (1) can also be used to test a graph G for the presence of cycles.

Like many other tail recursions, i.e., recursive procedures which only call themselves immediately before returning, the topological sort procedure seen above can be rewritten as an iteration (see Section 5.4). Written in this way, the topological sort procedure becomes:

```procedure top_sort2(G);         --  first iterative form of topological sort

nodes := domain(G) + range(G);   --  Here we calculate the set of all nodes; this makes it unnecessary to
--  pass the set of nodes as an additional parameter.

t := [ ];                    --  initialize the tuple to be returned

while exists n in nodes | n notin range(G) loop
t with:= n;
G lessf:= n;
nodes less:= n;
end loop;

return t;

end top_sort2;
```

It is possible to improve the efficiency of (3) very substantially by keeping the current value of the set

`            {n in nodes | n notin range G} `
available at all times. To do this, we proceed as follows:
1. For each node n, we maintain a count of the number of the predecessors of n which have not yet been put into t.

2. When n is put into t, we reduce this count by 1 for all nodes nl in Gl~n}.

3. If count(x) falls to zero, then x becomes a member of the preceding set.
These observations, which could be derived step by step from the more general formal differencing principles discussed in Section 6.5, underlie the following revised form of (3):
```procedure top_sort3(G);
-- second iterative form of the topological sorting procedure
nodes := (domain G) + (range G);
count := {};    -- initialize the count function described previously

ready := nodes; -- The following loop will remove elements that have
for [x, y] in G loop
count(y) := (count(y)?0) + 1;
ready less := y; -- since y has a predecessor
end loop;
-- At this point 'ready' is the set of all nodes without predecessors

t := [ ];   -- t is the tuple being built up

t with:= n;

for n1 in G{n} loop
if (count(n1) -:= 1) = 0 then ready with:= n1; end if;
end loop;

end loop;

return t;

end top_sort3;
```
It is not hard to see that the preceding code examines each edge of the graph G just twice. Thus the time needed to execute this code is linearly proportional to #G.

9.3 The Stable Assignment Problem

Suppose that the members of a population of n students are applying to a collection of m colleges. We suppose also that each student finds a certain collection of colleges acceptable, and that he/she ranks these colleges in order of decreasing preference. Finally we suppose that each college c can admit only a given quota Q(c) of the students who apply to it, and that it is able to rank all the students in order of decreasing preference. We do not suppose that any of these preferences is necessarily related to any other; that is, different students can rank colleges in radically different orders, and different colleges may find quite different types of students preferable. The problem we consider is that of assigning students to colleges in such a way as to satisfy the following three conditions:

1. No college accepts more than Q(c) students;

2. A college c never admits a student s1 if it has filled its quota Q(c) and there exists an unassigned student s2 to whom college c is acceptable and whom college c prefers to student s1.

3. There is no situation in which student s1 is assigned to college c1 and student s2 is assigned to college c2, but both the students involved and the colleges involved prefer to switch; that is, s1 prefers c2 to c1, s2 prefers c1 to c2, c1 prefers s2 to s1, c2 prefers s1 to s2.

This problem was studied by David Gale and Lloyd Shapley (American Mathematical Monthly, 1962, pp. 9-15), who gave a simple algorithm for finding an assignment satisifying conditions (a), (b), and (c). The algorithm is just this: Each student applies to his first-choice college. Then each college c puts the topmost-ranked Q(c) students who have applied to it on an active list and notifies the others that they have been rejected. All rejected students now apply to their second-choice colleges. Then all colleges rerank their applicants, keep the first Q(c) of these applicants, and again notify the others that they have been rejected. This cycle of reapplication and reranking continues until no rejected students have any more colleges on their list of acceptable colleges. It is clear that the assignment produced by this procedure satisfies condition (a). Condition (b) is also satisfied, since if s2 finds college c acceptable, he/she will eventually apply to college c and can then bump any student s1 whom c finds less acceptable, but will never subsequently be bumped except by a student whom c finds more acceptable. Finally, condition (c) is satisfied, since if s1 prefers c2 to c1 he/she must have applied to c2 before c1 but been bumped from c2's active list by a student that c2 prefers to s1. But when this happened c2's active list could not have contained any student that c2 does not prefer to s1. Therefore, since the students on college c2's active list never grow any less attractive from c2's point of view, c2 will never regard any student on its final active list as less desirable than s2.

The Gale-Shapley iteration only continues as long as some student has just been rejected by the latest college to which they have applied. Since this rejection 'uses up' one of the items on that student's list of acceptable colleges, the nmber of iteerations required can be no more than the sum of the lengths of all these lists, plus 1. The code which follows reflects this fact.

Programmed in SETL, the Gale-Shapley algorithm is as follows.

```program gale_shapley; -- Gale-Shapley assignment algorithm
const A := "A", B := "B", CC := "CC", D := "D";  -- constants designating colleges

stpref := {[1, [A, B, CC]], [2, [B, CC, A, D]], [3, [CC, A, B]],
[4, [B, A, CC]],[5, [B, A, CC, D]]};   -- students' choices
colpref:= {[A,[1,2,3,4]],[B,[4,3,2,1]],[CC,[2,4,3]],[D,[1,2,4,5]]};
-- colleges' rankings of applicants
quot := {[A,2], [B,1], [CC,1], [D,2]}; -- size of entering classes

print(assign(stpref, colpref, quot));        --invoke assignment algorithm

procedure assign(stud_pref,coll_pref,quota);         -- Gale_Shapley stable assignment algorithm
-- stud_pref maps each student into the vector of colleges he/she finds
-- acceptable, ranked in decreasing order of preference; coll_pref(c)(s1 s2)
-- is true if college c finds student s1 preferable to student s2, false otherwise.
-- The map quota gives the number of students each college will accept.

colleges := domain(quota);
active := {[c,[]]: c in colleges};           -- set up an empty active list for each college
applicants := domain(stud_pref);             -- initialize the pool of applicants
maxits := 1 +/ [#colls: [-,colls] in stud_pref];    -- maximum reauired iterations; see preceding  remark

for j in [1..maxits] loop      -- iterate as often as needed

new_applicants := applicants; -- save the set of applicants, which will be iterated over

for s in applicants | stud_pref(s) /= [] loop
-- each unsatisfied student who has a college to apply to does so
first_choice fromb stud_pref(s);
active(first_choice) with:= s;
new_applicants less:= s;
end loop;

applicants := new_applicants; -- bring the set of applicants into its new condition

for c in colleges | #active(c) > quota(c) loop     -- drop all 'over quota' applicants

active(c) := pref_sort(active(c),coll_pref(c));     -- rerank all who have applied

for k in [quota(c) + 1..#active(c)] loop
applicants with:= active(c)(k);        -- return student to applicant pool

end loop;

active(c) := active(c)(1..#active(c) min quota(c));        -- cut back active list

end loop;

if not (exists s in applicants | stud_pref(c) /= []) then exit; end if;

end loop;

return [active, applicants];    -- pattern of assignments is complete
-- we  return  the list of  accepted students, by colleges,
-- and the set of students not accepted by any college to which they applied
end assign;

procedure pref_sort(apvect,order);
-- this returns the current group of applicants in the order of the college's choice.
applicants := {x: x in apvect};        -- convert to set
return [x in order | x in applicants] + [x in applicants | x notin order];        -- unranked students come last
end pref_sort;

end gale_shapley;
```

9.4 A Text Preparation Program

Text preparation programs aid in the preparation of printed material by arranging text in attractively indented, justified, centered, and titled paragraphs and pages. You may well have used some utility program of this type: they are commonly available under such names as Script, Runoff, Troff, etc. In this section, we will describe the internal structure of a simplified version of such a program.

Our program, which we will call Prepare, accepts source text containing embedded command lines as input and reformats the text in the manner specified by the command lines. Command lines are distinguished from text lines by the fact that the former start with a period as their first character, and by the fact that this initial character is followed by a few other characters signifying one of the allowed Prepare commands, as listed later. In its ordinary mode of operation, Prepare collects words from the text it is formatting and fills up successive lines until no additional words will fit on the line being filled. Then the line is right-justified and printed. However, commands can also be used to center a line, and lines can be terminated without being filled (we call this action a break). Text can also be arranged in several special table formats, as described later.

The Prepare program treats any unbroken sequence of nonblank characters as a word. An autoparagraphing feature, which causes every text line starting with a blank to start a new paragraph, is also available. Margins and spacing are controllable by commands. A literal command, which causes text following it to be printed exactly as it stands, is available to override the normal reformatting action of Prepare . Facilities for automatic numbering of sections and subsections are also available. If the activity of Prepare discloses incon- sistencies or errors in the commands presented to it, a file of diagnostic warnings is printed. The formatting commands supported by Prepare are listed in a table below. However, it will be easier to read these commands if you keep in mind the fact that they sense and manipulate the following variables, which are crucial to Prepare's activity:

 Variable Name Meaning Page_honzontal Horizontal width of paper Page_vertical Number of lines on page Spacing Current spacing of lines; 1 = single spacing Left_margin Current indentation for left margin Right_margin Current right indentation for right margin Old margins Saved prior values of margins Current line Line of output text currently being built up Fill Controls collection of words into current_line Justify Switch controlling right justification of output lines/TD> Line_count Counts number of lines output so far on current page Page_number_stack Stack of page and subpage numbers Number_pages Switch for page numbering Header_number_stack Stack of section and subsection numbers Title Current page title Subtitle Current page subtitle Chapter number Current chapter number

The commands supported by the Prepare system are as follows:

Now we give SETL code for our text preparation system.

```program prepare;        -- text preparation system
use random_pak;

var                                     -- global variables
last_page_had_output := true,        -- did the preceding page allow any input to be processed?
Page_horizontal,            -- horizontal width of paper
Page_vertical,             -- vertical height of paper, in lines
Spacing,                    -- current spacing of lines
Left_margin,                -- left margin
Right_margin,               -- right margin
Autoparagraph,              -- switch for autoparagraphing
Tuple_of_words,             -- collects words of input for output
Justify,                    -- controls right justification
Fill,                         -- controls filling of lines
Line_count,                 -- counts number of lines on page
Page_number_stack,       -- stack of page & subpage numbers
Number_pages,              -- switch for page numbering
Main_title,                 -- page title
Subtitle,                  -- page subtitle
Fillj_save,              -- saves fill & justify during LIT
First_page,                 -- switch for first page
Chapter_number,          -- current chapter number
Margin_save,                -- saves margins during indented note
Figure_lines,              -- number of lines reserved for figure
Figure_flag,              -- switch to leave space for figure
Page_figure_flag,        -- leaves space for figure on top of next    page
Indent_flag,              -- switch for indentation
Para_indent_flag,        -- switch for paragraph indentation
Number_blanks,              -- number of spaces to indent
Paragraph_spacing,       -- current spacing between paragraphs
Paragraph_indent;        -- number of spaces to indent for paragraph

var ihandle,ohandle,rand_handle;        -- file handles; also handle for random number generatiion

const  -- constants designating all commands
BR := "BR",S := "S",B := "B",FG := "FG",I := "I",P := "P",C := "C",NT := "NT",EN := "EN",TP := "TP",
NM := "NM",NNM := "NNM",CH := "CH",NC := "NC",T := "T",FT := "FT",SB := "SB",PG := "PG",ESP := "ESP",
HD := "HD",NHD := "NHD",J := "J",NJ := "NJ",F := "F",NF := "NF",LIT := "LIT",ELI := "ELI",LM := "LM",
RM := "RM",PV := "PV",SP := "SP",AP := "AP",SS := "SS",NAP := "NAP";

const Legal_ops :=   -- legal commands
{BR, S, B, FG, I, P, C, NT, EN, TP, NM, NNM, CH, NC, T, FT, SB, PG,
ESP, HD, NHD, J, NJ, F, NF, LIT, ELI, LM, RM, PV, SP, AP, SS, NAP};

const Cause_new_line :=         -- these commands cause Tuple_of_words to be
-- emptied. Text immediately following these
-- commands is output at beginning of new line.
{BR, S, B, I, C, NT, EN, PG, CH, J, NJ, F, NF, LIT, ELI, LM, RM};

-- The text preparation system's main job is to FILL and/or JUSTIFY the text found in its
-- source file. The action of the main procedure depends on the FILL and JUSTIFY settings.

-- If both the FILL and JUSTIFY switches are off, text is printed in the same
-- format as input. In all other cases words of text are broken out of the input
-- line and placed in a Tuple_of_words. If FILL is off and JUSTIFY is on
-- (i.e., we are justifying but not filling lines), then Tuple_of_words is printed in justified form
-- immediately after a line of input is processed. If FILL is on then Tuple_of_words
-- is filled until one of the following three cases arises:

-- (1) a command line is encountered. (a line begining with a period).

-- (2) a new paragraph is to begin.

-- (3) end of input.

-- However the entire Tuple_of_words is not output at this time. Those words that do not
-- fill the last line of output remain in Tuple_of_words until a specific command that
-- causes a new line is encountered, or until case 2 or case 3 is encountered.

-- NOTE: Only if FILL is on can ore than one line of input be held temporarily in Tuple_of_words
-- before being output

-- 'Autoparagraphing' is a major feature of the text preparation system.
-- If the Autoparagraphing switch is on, (it initially isn't), an input line beginning
-- with a blank causes Tuple_of_words to be emptied, a preset number of lines (initially 1)
-- to be skipped, and a preset indentation (initially 5 spaces) to be applied.

-- Command lines differ from text lines in that they begin with a period.

-- The source text to be processed should be  placed in a file named PREP.IN.
-- The file PREP.OUT will then contain the output.

-- Two major types of error are detected by the program:

-- Command error. This is caused by either an invalid command or an error
-- in the command's parameters. In this case the command is ignored and
-- an appropriate message is printed.

-- Justify error. If JUSTIFY is on and FILL is off, too many words in a line
-- of input (initially, more than 60 characters) or a single word will cause
-- a justify error. This type of error terminates processing.

-- Here are explanations of some of the subtler uses of variables in the following code:

-- Indent_flag/Para_indent_flags:
-- Two different flags are used two handle the two following situations.
-- Indent_flag is used when the command INDENT is encountered.
-- Para_indent_flag is used in conjunction with the Autoparagraph command.
-- Since the INDENT command can be used when Autoparagraph is on, two flags are necessary.

-- Figure_flag/Page_figure_flags: When the FG command is encountered,
-- Figure_flag is turned on so as to leave a specified number of lines blank
-- the next time Tuple_of_words is output. If an insufficient number of lines
-- are left on the current page for the figure, blank lines must be left at the top of
-- the next page. Page_figure_flag is turned on in this case.

-- First_page. This switch is initially on, but is turned off after the first page
-- It causes page headers for the first page to be output at the same time that
-- the first words of tTuple_of_words output.

-- Page headers are not printed during initialization, since we may first want to  execute
-- commands which will change certain default initializations.
-- This is especially necessary in connection with the following commands;

--     Fill      Fill      Fill      Fill -> ON (initialization value)
--     Justify      Justify      Justify      Justify -> ON
--     Page number switch      Page number switch      Page number switch      Page number switch -> ON
--     Title      Title      Title      Title -> NULL
--     Subtitle      Subtitle      Subtitle      Subtitle -> NULL
--     Chapter#      Chapter#      Chapter#      Chapter# -> 1

-- **** MAIN PROGRAM OF TEXT PREPARATION SYSTEM ****

initialize;            -- initialize all global variables  and set input and output files

loop                    -- remain in loop until all text is processed

geta(ihandle,line);           -- read line of input

if line = OM then exit; end if;    -- end of text input

if match(line,".") /= "" then    -- a command line

cmd := break(line," \t"); span(line," \t");        -- break out the comand

if cmd notin Legal_ops then error_proc(cmd); continue; end if;

if Fill then print_lines; end if;
-- output the text collected in Tuple_of_words in its correct format.
-- Words that remain in Tuple_of_words are those that do not fill the last line.

command_tuple := command_check(cmd,line);
-- 'command check' checks validity of the command line.
-- command_tuple contains the command and its parameters.

if command_tuple /= OM then
handle_command(command_tuple);  -- 'handle command' carries out the     command.
else
error_proc(cmd);
end if;

continue;

end if;                 -- end if match(line,".")

if not (Fill or Justify) then
output(line);           -- output line as it was read in
else                -- if Autoparagraph is on, a blank space at the beginning
-- of the paragraph signals for new paragraph.

if Autoparagraph and line(1) = " " then paragraph; end if;

span(line," ");         -- bypass remaining blanks

until line = "" loop    -- this loop places words of input into Tuple_of_words

if (next_word := break(line," ")) = "" then
next_word := line;
else
span(line," ");         -- bypass extra blanks
end if;

Tuple_of_words with:= next_word;        -- collect the word

end loop;

end if;

if Justify and not Fill then print_lines; end if;
-- if Fill is off, Tuple_of_words is printed after each line of text is read

end loop;

finalize;                       -- take final cleanup steps

-- ***************** END OF MAIN PROGRAM *****************

procedure initialize;     -- parameter & file name initialization
-- this procedure initializes all global variables and opens the input & output files.

ihandle := open("prep_in","TEXT-IN");            -- open source file
ohandle := open("prep_out","TEXT-OUT");            -- open output file
rand_handle := start_random(1000000,OM);        -- open a stream o f random numbers, using current time as seed

Page_vertical := 58;            -- default lines per page
Page_horizontal := 60;          -- default spaces per line
Left_margin := 0;               -- default margins
Right_margin := 60;             -- default margins
Spacing := 1;                   -- single spacing
Paragraph_spacing := 1;         -- lines between paragraphs
Paragraph_indent := 5;          -- default paragraph indentation
Indent_flag := false;           -- switch that controls indentation
Para_indent_flag := false;      -- controls indentation of paragraphs
Figure_flag := Page_figure_flag := false;
First_page := true;             -- turned off when first 'page' occurs
Print_header := true;           -- initially on
Main_title := Subtitle := "";
Autoparagraph := false;         -- initially off
Tuple_of_words := [];           -- contains words of text to be processed
Page_number_stack := [1];       -- initially on first page
Fill := true;                   -- initially on
Justify := true;                    -- initially on
Line_count := 1;                -- counts lines on page
Number_pages := true;           -- page numbering initially on
Chapter_number := 1;            -- advances with each chapter

end initialize;

procedure page; -- page advance procedure
-- This procedure is invoked whenever output proceeds to a new page. This
-- procedure puts out a line containing a page advance character, then the page
-- number, title, & subtitle if these are switched on.

if not last_page_had_output then print("PAGE IS TOO SMALL TO HOLD ANY OUTPUT"); stop; end if;
-- prevent endless looping if page size has  been set too small

Line_count := 0;                        -- zero the line count
First_page := false;                    -- drop the 'isfirst page' flag

if Number_pages then    -- build up first line with page number, which may involve subpages as in nn.smm.sskk
Number_line := "PAGE " +/[str(Page_number_stack(j)) + ".": j in [1..#Page_number_stack]];
Number_line := Number_line(1..#Number_line - 1);         -- drop last dot
Page_number_stack(#Page_number_stack) +:= 1;             -- advance top page number on  stack
output(Number_line); output("");                         -- output page number, then empty line
end if;

if Print_header then            -- print the title and subtitle, centered
center(OM,Main_title); center(OM, Subtitle);
output("");                -- skip 1 line
end if;

if Page_figure_flag then          -- leave room for a figure at the bottom of the page
Page_figure_flag := false;
blankout(Figure_lines);      -- enough lines for figure
end if;

end page;

procedure output(line);         -- this is the main output routine of the Prepare program

if First_page then page; end if;            -- output a page header if the First_page flag is set
nblanks := Spacing;     -- number of lines to skip

if Figure_flag then        -- skip space for figure

Figure_flag := false;     -- if figure can fit on this page, room is left for it. If there is
-- not enough space, room is left on the top of the next page.
-- Page_figure_flag is used in thelater case.

if Figure_lines + Line_count + Spacing >= Page_vertical then
Page_figure_flag := true;            -- set flag to leave space at  bottom of page
else
end if;

end if;

Line_count +:= 1;  -- counts lines on each page

printa(ohandle,line);

if Line_count >= Page_vertical then page; end if;    -- advance page if page is full
blankout(nblanks - 1);                        -- print blank lines if double print blank lines if double-  or triple-spacing, etc.

end output;

procedure command_check(cmd,line);   -- breaks command out of line
-- this procedure checks command and parameter validity; it also sends back
-- the command and its parameters in a Tuple.

case cmd

when  BR, EN, PG, NM, NNM, SP, ESP, HD, NHD,
J, NJ, F, NF, LIT, ELI, AP, NAP =>  -- no parameters
return [cmd];

when CH, T, FT, SB =>           -- these commands have one string parameter

if line = "" then return OM; else return [cmd,line]; end if;

when  S, B, FG, I, TP, NC, LM, RM, SS, PV => -- these commands have one integer parameter

if (param := integer_check(line)) = OM then
return OM;                              -- error encountered
else
return [cmd,param];
end if;

-- the remaining commands are treated separately

when NT =>

if line = OM then                       -- has form 'NT text'.
return [cmd,"NOTE"];   -- If 'text' is OMitted then the word 'NOTE' is used
else
return [cmd,line];     -- return command with trailing text
end if;

when C =>               -- has form 'C n; text'. n is optional

if match(line,";") = "" then     -- semicolon is not first character, so integer is  present

the_int := break(line,";");        -- find the integer
if (param := integer_check(the_int)) = OM then return OM; end if;    -- verify
span(line," /t");                -- drop trailing whitespace

if match(line,";") = "" then return OM; end if;        -- semicolon is missing
span(line," ");                    -- drop trailing whitespace

if line = "" then return OM; else return [cmd,param,line]; end if;        -- text is required

else                 -- semicolon is first character, so integer is not present
span(line," /t");                -- drop trailing whitespace
if line = "" then return OM; else return [cmd,OM,line]; end if;        -- text is required
end if;

when P =>               -- has the form 'P n1 n2'

if (param := integer_check(line)) = OM then return OM; end if;    -- check first integer

span(line," /t");                -- drop trailing whitespace
if (param2 := integer_check(line)) = OM then return OM; end if;    -- check second integer

return [cmd, param, param2];     -- return command with both parameters

end case;

end command_check;

procedure integer_check(rw line);     -- checks validity of string integer

if (param  := span(line,"-0123456789")) = "" or line = "-" then return OM; end if;
if match(param,"-") = "" then return unstr(param); else return -unstr(param); end if;

end integer_check;

procedure handle_command(command_tuple); -- command interpeter
-- this command interpeter handles all Prepare commands.

[cmd,p1,p2] := command_tuple;
if (cmd in Cause_new_line) then print_remaining_line; end if;
--print("handle_command: ",cmd);
case cmd

when BR =>             -- break command
return;

when I =>            -- indent
Indent_flag := true;
Number_blanks := p1 max 0 min (Right_margin - 10);

when NM =>          -- resume page numbering
Number_pages := true;

when NNM =>                     -- end page numbering
Number_pages := false;

when NC =>                      -- set chapter number
Chapter_number := p1;

when T =>                       -- supply title
Main_title := p1;

when SB =>                      -- subtitle
Subtitle := p1;

when SP =>                      -- start subpage
page;
Page_number_stack with:= 1;

when ESP =>                     -- end subpage
page;

if #Page_number_stack > 1 then  -- drop one level
junk frome Page_number_stack;
Page_number_stack(#Page_number_stack) +:= 1;
end if;

when HD =>                              -- print page headers

when NHD =>                             -- end page headers

when J =>                               -- begin justification
Justify := true;

when NJ =>                              -- end justification
Justify := false;

when F =>                               -- fill lines
Fill := true;

when NF =>                              -- end filling lines
Fill := false;

when PV =>                              -- lines per page
Page_vertical := p1;

when LIT =>                             -- suspend fill/justify
Fillj_save := [Fill, Justify];
Fill := Justify := false;

when ELI =>                             -- resume fill/justify
[Fill, Justify] := Fillj_save;

when LM =>          -- set left margin
Left_margin := p1 max 0 min (Right_margin - 10);
Page_horizontal := Right_margin - Left_margin;

when RM =>          -- set right margin
Right_margin := p1 min (Page_horizontal + Left_margin) max (Left_margin + 10);
Page_horizontal := Right_margin - Left_margin;

when SS =>         -- set spacing
Spacing := p1 max 1 min 5;

when AP =>         -- start autoparagraphing
Autoparagraph := true;

when NAP =>                     -- end autoparagraphing
Autoparagraph := false;

when P =>                       -- set paragraph spacing
Paragraph_indent := p1;
Paragraph_spacing := p2;

when S =>           -- skip n spacings
if (p1 * Spacing) + Line_count > Page_vertical then
page;
else
blankout((p1 max 0) * spacing);
end if;

when B =>           -- skip n lines
if p1 + Line_count > Page_vertical then
page;
else
blankout(p1 max 0);
end if;

when FG =>              -- leave lines for figure
Figure_flag := true;
Figure_lines := p1;

when C =>           -- center text
center(p1, p2);

when PG =>                       -- start new page if current is not empty
if Line_count > 0 then page; end if;

when TP =>                      -- start new page if less than p1 lines remain

if Line_count + p1 >= Page_vertical then page; end if;

when CH =>                      -- chapter
chapter(p1);

when NT =>                      -- indented note
blankout(2);                                                -- skip two blank lines
if p1 = "" then p1 := "NOTE"; end if;                        -- use "NOTE" if caption absent
center(OM,p1);                                                -- print the note header
blankout(1);                                                -- skip line
Margin_save := [Right_margin,Left_margin,Page_horizontal];    -- save the margin settings
Left_margin +:= (Page_horizontal / 4);                        -- indent left margin
Right_margin:= Right_margin - (Page_horizontal / 4);        -- indent right margin
Page_horizontal := Right_margin - Left_margin;                -- set available space

when EN =>                      -- end indented note
blankout(2);                                                -- skip two blank lines
[Right_margin,Left_margin,Page_horizontal] := Margin_save;    -- restore the margin settings

otherwise =>

print("****** UNKNOWN COMMAND *****");

end case;

end handle_command;

procedure paragraph;    -- autoparagraph procedure. This procedure is called when a space
-- begins a line of input (and AUTOPARAGRAPH is on).

if Fill then print_lines; end if;          -- first the Tuple_of_words is completely output.
print_remaining_line;

Para_indent_flag := true;       -- the next time the Tuple is printed, indentation
-- will be made. (see procedure 'indenter').
blankout(paragraph_spacing - 1);        -- write designated number of blank lines  before paragraph

end paragraph;

procedure blankout(nlines);             -- leaves nlines empty. If the end of page
-- is reached no more lines are blanked out.
if First_page then page; end if;

for j in [1..nlines] loop
Line_count +:= 1;
printa(ohandle,"");       -- outputs a blank line
if Line_count >= Page_vertical then page; return; end if;        -- new page, no more blank lines
end loop;
last_page_had_output := true;        -- note  this non_recurring output

end blankout;

procedure indenter;     -- if Para_indent_flag or Indent_flag is on, this procedure causes indentation.
-- This is done by adding blanks to the beginning of the first word in the Tuple.
-- Number_blanks contains the number of blanks to be indented.

If Tuple_of_words = [] then return; end if;        -- nothing to do in this case

if Para_indent_flag and Indent_flag = false then    -- indent for paragraph only
Number_blanks := Paragraph_indent;                -- use size of paragraph indent
end if;

if Indent_flag or Para_indent_flag then    -- we  indent  only oneline
Para_indent_flag := false; Indent_flag := false;        -- drop flags, since  indenting this line

--prefix the first word in the Tuple with blanks
Tuple_of_words(1) := (" " * Number_blanks) + Tuple_of_words(1);
end if;

end indenter;

procedure print_lines;          -- generic printing procedure

indenter;
printed_lines := if Fill and Justify then fill_and_justifier() else fill_only() end if;

for line in printed_lines loop          -- Fill on, Justify off
line := (" " * Left_margin) + line;
output(line);
end loop;

end print_lines;

procedure fill_and_justifier;       -- produces output that is filled and justified
return fill_may_justifier(true);        -- call for fill and justify
end fill_and_justifier;

procedure fill_only;       -- produces output that is filled but not justified
return fill_may_justifier(false);        -- call for fill with no justify
end fill_only;

procedure fill_may_justifier(justify_flag);       -- produces output that is filled and possibly justified

last_page_had_output := true;        -- some input is processed

indenter;            -- indent line if indent is on
processed_lines := [];        -- start to collect justified lines

loop
nwords := nchar := 0;         -- number of words and of characters in line so far

exited := false;            -- exit flag for following loop

for word in Tuple_of_words loop
nchar +:= #word + 1; nwords +:= 1;        -- count words and characters collected
if nchar > (Page_horizontal + 1) then final_word  := word; exited := true; exit; end if;
-- exit if full, but insist on collecting at least one word
end loop;

if not exited then

if not Fill then             -- process last line, even though not yet full
processed_lines with:= put_spaces(#Tuple_of_words,Page_horizontal - nchar + 2,Justify);
Tuple_of_words := [];        -- drop these words, which have been used
end if;

return processed_lines;

end if;

temp_line := put_spaces(nwords - 1,Page_horizontal - nchar + #final_word + 2,Justify);
Tuple_of_words := Tuple_of_words(nwords..);

if (wd := Tuple_of_words(1)) /=  OM  and #wd > page_horizontal then         -- excessively longword
Tuple_of_words(1..1) := [wd(1..page_horizontal),wd(page_horizontal + 1..)];    -- split  into parts
end if;

processed_lines with:= temp_line;

end loop;

end fill_may_justifier;

procedure justifier_error;
abort("TEXT PREPARATION TERMINATED DUE TO ERROR IN USE OF JUSTIFY");
stop;
end justifier_error;

procedure put_spaces(nwords,nblanks,justify_flag);        -- may insert spaces into line to force right may insert spaces into line to force right-justification

space_count  := (nspots := (nwords -  1) max 0) * [1] + [0];
-- this array will count number of blanks in spots following each word except last

if justify_flag and nspots > 0 then        --  insert  spaces to justify

for j in [1..nblanks] loop        -- insert required total of blanks in random positions
space_count((random(rand_handle) mod nspots) + 1) +:= 1;
end loop;

end if;

return "" +/ [Tuple_of_words(j) + space_count(j) * " ": j in [1..nwords]];        -- now collect words

end put_spaces;

procedure print_remaining_line; -- procedure prints remaining line from  Tuple_of words

if Tuple_of_words = [] then return; end if;
temp_line := (" " * Left_margin);
word fromb Tuple_of_words;

while word /= OM loop
temp_line +:= word + " ";
word fromb Tuple_of_words;
end loop;

output(temp_line);

end print_remaining_line;

procedure center(n,text);   -- center text on column n

if n = OM then n := (Page_horizontal / 2) + Left_margin; end if;
n := (n - #text/2) max 0;
line := (n * " ") + text;
output(line);

end center;

procedure chapter(text);                -- new chapter is to begin, chapter headers are printed.

page;
blankout(3);
center(OM, "CHAPTER " + str(Chapter_number));

Chapter_number +:=1;
blankout(2);
center(OM, text);
blankout(3);
Main_title := text; -- set the title to the chapter text

-- the following are reset to their initial values
Subtitle := "";
Spacing := 1;
Justify := true;
Fill := true;
Left_margin := 0;
Right_margin := 60;
Page_horizontal := 60;

end chapter;

procedure error_proc(cmd); -- prints out error message
print("ERROR ENCOUNTERED WITH COMMAND: ", cmd, " - COMMAND IGNORED");
end error_proc;

procedure finalize;    -- finalize system, first print what's left in the tuple.

if Fill then print_lines; end if;
print_remaining_line;
close(ihandle);
close(ohandle);

end finalize;

end prepare;```

9.5 A Simplified Financial Record-keeping System

Next we will give SETL code representing some small part of the operations of a bank, albeit in simplified form. The system to be represented corresponds in a rough way to the "Checking Plus" service offered by Citibank in New York City. Note, however, that the simple code shown does not deal adequately with all the anomalies and error conditions that a full-scale banking system would have to handle, nor does it support all the functions that are actually required. For example, the code we give does not provide any way for customer accounts to be opened or closed. A more ambitious commerical application showing how such matters can be treated would be very instructive, but since the issues that enter into the design of a full-scale commercial systcm can grow to be quite complex, we will not attempt to discuss the whole interesting range of questions that enter into the design of such systems.

The simplified system which we consider is aware of a collection of customers, each of whom has an account. A customer's account consists of two parts: a balance representing funds available to the customer, and an overdraft debit representing the amount that the customer has drawn against the Checking Plus feature of his/her account. This debit is limited for each account not to exceed a given credit_limit, established when the account is opened. The bank pays 5% per annum daily interest on positive balances in checking accounts, and charges 18% per annum daily interest on overdraft debits. Like most commercial application programs, the following code maintains a data base, i.e., a collection of maps which collectively represents the situation with which the program must deal, and reads a transaction file whose entries inform it of changes in this situation. Using these files it produces various output documents, for example, lists of checks deposited for transmission to other banks, and monthly statements which are mailed to customers.

The transactions supported by our simplified system are as follows:

 Transaction Code Explanation deposit (D) Customer deposits either cash, a check drawn on another bank, or a check drawn on this bank. withdrawal (W) A customer appears at a teller's booth and attempts to withdraw cash. payment (PA) Customer transfers a stated sum from his available balance to reduce his overdraft debit. presentation (P) Check is presented by another bank for payment. clear (C) Another bank informs this bank that a check has cleared for payment. return (R) A previously deposited check, sent to another bank for payment, is returned either as a bad check or for lack of available funds. (Checks written without sufficient funds cause their writer account to be debited \$10.00). end of day (DAY) End of banking day has arrived, daily interest is to be credited/debited to all accounts.

On the last day of each month, an end_of_day transaction triggers the production of bank statements which are sent to each customer. On the last day of December, this statement includes an indication of interest charged and interest earned during the year.

Each transaction handled is represented by a single line (string) in the transaction file. This line always starts with a code letter identifying the transaction and for the rest consists of various fields, separated by blanks.

The fields expected for the various transactions supported are as follows:

 D customer number,amount,bank_number,account_number (missing if cash deposit) W customer_number,amount,teller_terminal_number PA customer_number,amount P customer_number,amount,check number,bank number C check number R check number reason DAY (None)

The global data structures used to support our simplified banking system are as follows:

 cust_info This map sends each customer_number into the record maintained for the corresponding customer.

The components of a customer record are

 balance available balance currently available balance_deposited balance showing checks deposited but not yet cleared overdraft_debit amount currently drawn against "Checking Plus" overdraft_limit maximum overdraft allowed transactions_this_month list of all completed transactions this month interest earned total interest earned this year interest_paid total interest paid for overdrafts this year name customer name social security number customer social security number address customer address telephone number customer telephone number bank_info This map sends the numerical code of each bank from which checks will be accepted into the bank's address information. pending_checks When checks deposited are sent along to another bank for confirmation of payment, they are issued unique numerical identifiers. This map sends each such identifier into the transaction to which it corresponds.

Having now outlined all the transactions which our simplified banking system will support and listed the principal data structures which it uses, we are in position to give the code itself

```program bank_checking;   -- simplified check simplified check-processing program
use sort_pak;

-- ******* DECLARATION OF GLOBAL VARIABLES AND CONSTANTS *******

var         -- global variables
Cust_info,        -- maps account number into customer record
Bank_info,        -- maps bank number into bank address, etc.
Pending_checks,    -- maps each suspended transaction number into a detailed transaction record
This_banks_code,    -- code identifying this bank
Check_counter,    -- counter identifying checks sent to other banks for verification
Message_list,    -- maps each bank identifier into a list of messages to be sent to the bank.
Transfile,    -- file of transactions, to be processed
Last_day;    -- last day for which 'DAY' operation was run

var transaction_handle;            -- handle for transactions file
var is_January;                    -- flag for first month of year
var simulated_date := OM;        -- nominal date in banking simulation

var traversed:= {};                    -- code points traversed

const                       -- strings indicating transaction results

cash_dep := "CASH_DEP",                 -- cash deposit
cash_withdrawal := "CASH_WITHDRAWAL",     -- cash withdrawal
payment := "PAYMENT",                    -- payment of check
ckplus_payment := "CKPLUS_PAYMENT",     -- "checking "checking-plus" payment
deposit := "DEPOSIT",                    -- check deposited
overdrawn := "OVERDRAWN",                -- charge for overdrawn check
nofunds := "NOFUNDS",                    -- funds not available to pay check

const        -- constants designating transaction codes
D := "D",W := "W",PA := "PA",P := "P",C := "C",R := "R",DAY := "DAY";

const Transaction_codes := {D, W, PA, P, C, R, DAY};
-- constants designating transactions.

const Involves_customer := {D, W, PA, P};
-- transactions whose second parameter is a customer number.

const Needs_updating := {D, W, PA, P, C, R};
-- transactions which modify customer record.

const digits := "0123456789"; -- the decimal digits

const Annual_rate := 6,    -- interest paid on checking balances
Overdraft_rate := 18;    -- interest charged on overdrafts

-- ****** MAIN PROGRAM OF BANKING SYSTEM ******
--setup;  stop;
initialize_system;
-- call initialization procedure to read in all required global data structures.

loop

if transaction = OM then exit; end if;    -- all transactions processed
if #transaction = 0 then continue; end if;        -- skip blanks
if transaction(#transaction) = "#" then exit; end if;        -- exit  on endmarks
if "#" in transaction then rbreak(transaction,"#");  rspan(transaction,"# \t"); end if;
process_transaction(transaction);        -- otherwise process transaction

end loop;

finalize_system;     -- write state of system to output file
print; print; print("END OF TRANSACTION PROCESSING"); print("Traversed: ",merge_sort(traversed));

procedure seen(pt);            -- a code a code-point is traversed
traversed with:= pt;
end seen;

procedure setup;            -- set up initial data

input_file := "banking_input_file"; Transfile := "transactions_file";
input_handle := open(input_file,"BINARY-OUT");   -- open the input file for reading.
This_banks_code := "Bank_0001"; Check_counter := 0;

balance_available := 100000;            -- cust 1 info
balance_deposited := 100000;
overdraft_debit := 0;
overdraft_limit := 100000;
transactions_this_month := [];
interest_earned := 0;
interest_paid := 0;
name := "Jack";
soc_sec_no := "098-90-1888";
address := "67 Broome St., NYC,NY,10011";
tel_no := "212-998-1111";

cust_1_info :=  [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, interest_earned,

balance_available := 200000;            -- cust 2 info
balance_deposited := 200000;
overdraft_debit := 0;
overdraft_limit := 200000;
transactions_this_month := [];
interest_earned := 0;
interest_paid := 0;
name := "Jill";
soc_sec_no := "089-09-2777";
address := "76 Groom St, NYC, NY, 10211";
tel_no := "212-678-1234";

cust_2_info :=  [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, interest_earned,

Cust_info := {["001-001-0001",cust_1_info],["001-001-0002",cust_2_info]};
Pending_checks := {};

bank_name := "Prettybank";                -- bank 1 info
bank_address := "101 Wall St, NYC, NY, 10001";
bank_tel :=  "212-444-4444";

bank_name := "Time Savings Bank";                -- bank 2 info
bank_address := "201 Tall St, Bklyn, NY, 20022";
bank_tel :=  "718-666-0000";

Bank_info := {["Bank_0001",bank_1_info],["Bank_0002",bank_2_info]};
Last_day:= "01/27/101";
putb(input_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);
close(input_handle);       -- now finished with input file; release it
input_file := "banking_input_file"; Transfile := "transactions_file";
input_handle := open(input_file,"BINARY-IN");   -- open the input file for reading.
getb(input_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);
close(input_handle);       -- now finished with input file; release it
print(This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);

end setup;

procedure process_transaction(t);
-- the principal transaction the principal transaction-processing procedure.
print("\ntransaction: ",t," ");

if (dec := decode_transaction(t)) = OM then return; end if;        -- since transaction is bad.
[code,cust_number,amount,p4,p5] := dec;    -- get fields of transaction
number:= cust_number;        -- for the 'C' and 'R' transactions,wich use a  generated transaction serial number

if code in Involves_customer then

[balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month, interest_earned,
interest_paid, name, soc_sec_no, address, tel_no] := Cust_info(cust_number)?[];     -- obtain fields of customer record
-- empty record signals that customer is bad
-- the vector of items constituting a customer's record. Note that all
-- amounts are kept as integer numbers of pennies.
-- make balance_available, balance_deposited, overdraft_debit, overdraft_limit, etc. available.
--print("Involves_customer: ",cust_number," ",Cust_info(cust_number)," ",balance_deposited);
end if;

case code

when D =>             -- deposit

if p4 = OM then
seen("DA");
balance_available +:= amount;     -- deposit is cash: accept it immediately
balance_deposited +:= amount;
transactions_this_month with:= post(CASH_DEP,amount);

elseif p4 = This_banks_code then         -- check is drawn on this bank
seen("DB");
-- We handle a check drawn on this bank as a combination of a 'P' transaction
-- with the transaction (either 'C' or 'R') that responds to this 'P' transaction.
-- For this, it is convenient to allow this procedure to call itself recursively
balance_deposited +:= amount;
Cust_info(cust_number) := [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no];

process_transaction("P " + p5 + " " + dollar(amount) + " 0 " + This_banks_code);
-- generate a  payment transaction for the account  on  which the check is drawn
pending_checks("0") := t;  -- for consistency, note that pending check is drawn on this bank

result := Message_list(This_banks_code)(1);        -- get the result  of this payment transaction
Message_list(This_banks_code) := [ ];     -- get result and clear message list

process_transaction(result);             -- process the resulting 'C' or 'R'; check OK or refused
return;                                 -- since all steps of transaction are now complete

else         -- the check is drawn on another bank. Note, but do not credit, the deposit.
seen("DC");
balance_deposited +:= amount;
identifier := str(Check_counter +:= 1);
Pending_checks(identifier) := t;     -- save transaction for later completion.
Message_list(p4) with:= "P " + p5 + " " + dollar(amount) + " " + identifier + " " + This_banks_code;
-- send notification to bank on which the check is drawn

end if;

when W =>    -- withdrawal

if ok_withdraw(amount,balance_available,overdraft_debit,balance_deposited,overdraft_limit) then
seen("WA");
send_teller(p4,"PAYMENT APPROVED");
transactions_this_month with:= post(CASH_WITHDRAWAL,amount);
else
seen("WB");
send_teller(p4, NOFUNDS);
end if;

when PA =>            -- payment of portion of overdraft debit

will_pay := amount min balance_available min overdraft_debit;
if will_pay = 0 then return; end if; -- bypass transaction  if payment  = 0
seen("PAB");
balance_available -:= will_pay;
balance_deposited -:= will_pay;        -- update balance_deposited
overdraft_debit -:= will_pay;
transactions_this_month with:= post(CKPLUS_PAYMENT,will_pay);

when P =>            -- presentation (for approval) of check by other bank

if (c_info := Cust_info(cust_number)) = OM then        -- check is bad
seen("PA");
Message_list(p5) with:= "R " + p4 + " X " + BAD_CHECK;
-- note: the "reason" parameter is forced into fourth position by the inserted X
return;                -- abort transaction
end if;

[balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no] := c_info;
-- make fields of customer info available

if ok_withdraw(amount, balance_available, overdraft_debit,balance_deposited,overdraft_limit) then
seen("PB");
Message_list(p5) with:= "C " + p4;    -- confirm clearance
transactions_this_month with:= post(PAYMENT, amount);

else        -- check must be refused for insufficient funds
seen("PC");
Message_list(p5) with:= "R " + p4 + " X " + NOFUNDS;
-- note: the "reason" parameter is forced into fourth position by the inserted X.
-- in this case the customer is charged a \$10.00 fee, or whatever smaller amount remains in his account
charge := 1000 min (balance_available + overdraft_limit - overdraft_debit) max 0;
assert(ok_withdraw(charge,balance_available,overdraft_debit,balance_deposited,overdraft_limit));
transactions_this_month with:= post(OVERDRAWN, charge);
end if;

when C =>    -- pending check clears
seen("C");
assert(dec := decode_transaction(Pending_checks(number)))/= OM;

-- We can make this assertion because the system represented here does not
-- allow customer accounts to be closed. However, this assertion would
-- continue to hold true even in a more realistic system, since in such a system
-- we would not close an account until all its outstanding deposit transactions
-- have been completed.

Pending_checks(number) := OM; -- drop from pending list
[-, cust_number,amount] := dec;  -- get customer number and amount
[balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no] := Cust_info(cust_number);
balance_available +:= amount; -- credit to available balance
transactions_this_month with:= post(PAYMENT, amount);

when R =>        -- pending check fails to clear
seen("R");
reason := p4;
-- in this, case the p4 field contains the reason for refusal of the check transmitted for approval

assert((dec := decode_transaction(Pending_checks(number)))/= OM);
-- see comment following case(C)

Pending_checks(number) := OM; -- drop from pending list
[-,cust_number,amount] := dec;  -- get customer number and amount
[balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no] := Cust_info(cust_number);
balance_deposited -:= amount;     -- debit the estimated total of deposits.
transactions_this_month with:= post(reason, amount);

when DAY =>    -- end of banking day: take end end of banking day: take end-of-day, and if necessary end-of-month, actions.
end_of_day;        -- take end of day actions
seen("DAY");
if day_field(daystring()) = "01" then
seen("MONTH");
end_of_month;
end if;

otherwise =>        -- have some system error.
-- take end_of day action, save system, and note error.

print("SYSTEM ERROR *** ILLEGAL TRANSACTION:", t);
finalize_system;
stop;

end case;

if code in Needs_updating then             -- customer information must be updated
Cust_info(cust_number) := [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no];

print("Updated: ",cust_number," ",Cust_info(cust_number));
end if;

print ("MESSAGE LIST", Message_list);

end process_transaction;

procedure ok_withdraw(amount,rw bal_avail,rw over_debit,rw bal_deposit,over_limit);
-- This auxiliary procedure checks to see whether the stated 'amount' can
-- be withdrawn from an account, by increasing the overdraft debit if
-- necessary. If so, the balance available, amount provisionally on deposit,
-- and overdraft debit are appropriately adjusted, and true is returned;
-- otherwise false is returned.

if amount > (bal_avail + over_limit - over_debit) then          -- no good
return false;
end if;

bal_avail -:= (amt_frm_bal := amount min bal_avail);
bal_deposit -:= amt_frm_bal;

over_debit +:= (amount - amt_frm_bal);
return true;

end ok_withdraw;

procedure post(trans_type,amount);
-- This auxilliary routine converts transactions into strings consisting of an
-- amount, a coded indicator of the transaction type, and a date; the result is
-- suitable for printing in a customer's end suitable for printing in a customer's end-of-month statement.

return daystring() + " " + trans_type + " " + dollar(amount);
end post;

procedure decode_transaction(t);   -- decodes string form of transaction
-- This procedure reads the string form of a transaction and decodes it into
-- the various blank the various blank-separated fields of which it consists. It verifies that each
-- field has the expected type. If any field is found to be bad, or if any field is
-- missing, then the transaction is posted to a "rejected transactions" list, and
-- this procedure returns OM.

-- Otherwise, a tuple c consisting of the converted fields is returned.

-- Map from transaction type to pattern of fields expected for transaction.
-- See procedure  See procedure -field_check-, below, for an explanation of the codes
-- appearing here.

const XCABX := "XCABX",XCAX := "XCAX",XCA := "XCA",XXAXX := "XXAXX",
XX := "XX",XXXX := "XXXX",X := "X";        -- checkstring constants

const Check_strings
:= {[D,XCABX],[W,XCAX],[PA,XCA],[P,XXAXX],[C,XX],[R,XXXX],[DAY,X]};

savet := t;       -- save original form of transaction string
decoded_trans := [];     -- tuple for decoded form of transaction
nfield := 1;      -- counter for field number
check_string := "T";   -- check character for first field is 'T'

while t /= "" and nfield <= #check_string loop

span(t," "); -- span off blanks
if (raw_field := break(t," ")) =  "" and nfield = 4 and decoded_trans(1) = "D" then exit; end if;
-- special case of cash deposit; just 3 fields

if (field := field_check(raw_field,check_string(nfield))) = OM then
return OM;
end if;

-- If the first field has just been decoded, use it to determine what further checks are necessary.
if nfield = 1 then check_string := Check_strings(field); end if;

decoded_trans with:= field;
nfield +:= 1;

end loop;

if #decoded_trans = #check_string
or (decoded_trans(1) = D and #decoded_trans = 3) then
return decoded_trans;
end if;

Bad_transactions with:= t;   -- otherwise missing or superfluous fields
return OM;

end decode_transaction;

procedure field_check(field,test_char); -- auxiliary test/convert procedure

-- This procedure checks the  This procedure checks the -field- passed to it for conformity with the
-- expected field type, which is described by its  expected field type, which is described by its -test_char- argument.
-- The allowed test_char characters, and their significance, are as follows:

-- 'T': must be transaction code
-- 'X': no test required
-- 'C': must be customer account number
-- "A" : must be dollar amount
-- "B" : must be identifier of correspondent bank

-- If the test fails, then OM is returned; if the test succeeds, and the field type is
-- "A" , then the field is converted from standard DDDD.CC 'dollars and cents'
-- form to an integer number of cents,

case test_char

when "T" => return if field in Transaction_codes then field else OM end if;
when "X" => return field;
when "C" => return if Cust_info(field) = OM then OM else field end if;

when "A" => dollars := span(field, Digits);
if match(field,".") = OM then return OM; end if;
cents := span(field, Digits);
if #cents /= 2 or field /= "" then return OM; end if;
return 100 * unstr(dollars) + unstr(cents);

when "B" => return if field /= This_banks_code and Bank_info(field) = OM then
OM else field end if;
otherwise => return OM;

end case;

end field_check;

procedure initialize_system; -- system initialization code

-- First we acquire the name of the input file for this run of the banking system,
-- which is supplied as a command which is supplied as a command-line parameter;
--    input_file := getspp("OLD = OLD.DAT/OLD.DAT");

-- Next we read the code for this bank, the pending transaction counter, the
-- master customer file, the bank address file, and the last previous processing
-- date, from the specified input information file.
input_file := "banking_input_file"; Transfile := "transactions_file";

input_handle := open(input_file,"BINARY-IN");   -- open the input file for reading.
getb(input_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,Last_day);
close(input_handle);       -- now finished with input file; release it

-- Next various subsidiary initializations are performed.

Message_list := {[bank, [ ]]: x = Bank_info(bank)};
-- start an empty message file for each correspondent bank

--    Transfile := getspp("TRANS = TRANS.DAT/TRANS.DAT");
transaction_handle := open(Transfile,"TEXT-IN");    -- open file of transactions

end initialize_system;

procedure finalize_system;        -- end end-of-run 'dump' procedure
return;
-- First we acquire the name of the output file for this run of the banking
-- system, which is supplied on the command line.

output_file := getspp("NEW = NEW.DAT/NEW.DAT");
output_handle := open(output_file, "BINARY-OUT"); -- open the output file for writing.

-- Next we write the code for this bank, the pending transaction counter, the
-- master customer file, and the bank file to be specified output file

putb(output_handle,This_banks_code,Check_counter,Cust_info,Pending_checks,Bank_info,daystring());

close(output_handle);      -- now finished with output file; release it(See Section 9.1).

end finalize_system;

procedure send_teller(terminal_no, msg);
-- In an actual system, this procedure would send the message  In an actual system, this procedure would send the message -msg- to the
-- teller terminal identified by  teller terminal identified by -terminal_no-. Since it is not easy to use SETL
-- to send messages to more than one terminal, we simplify this procedure
-- drastically and simply print  drastically and simply print -msg-, with an indication of the number of the
-- terminal to which msg should actually be sent.

print(msg," has been sent to terminal ",terminal_no);
end send_teller;

procedure end_of_day;       -- end of day procedure
-- This procedure is called at the end of each banking day. In practice, it would
-- write out a collection of files, including the following:

-- (a) for each bank with which this bank does business, a file of messages, each
-- representing either a
-- (i) confirmation that a check transmitted for approval was actually approved;
-- (ii) rejection of a check, with an indication of the reason for rejection;
-- (iii) request for approval of a check,
-- (b) a list of bad transactions, for visual inspection and possible reentry.

-- We begin by crediting interest payments and making interest charges for all
-- customers.

-- First check to ensure that interest has not already been credited today.
if daystring() /= Last_day then

for [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no] = Cust_info(cust_number) loop

interest_earned +:= (earned := (balance_available * Annual_rate) / 36500);
balance_available +:= earned; balance_deposited +:= earned;

-- Next, make charges on the customer's overdraft debit
interest_paid +:= (owed := (Overdraft_debit * Overdraft_rate) / 36500);

-- Draw this interest out of the account if possible. If not enough remains,
-- interest will be charged as an overdraft, even though this causes the actual
-- overdraft to exceed its stated limit.

if not ok_withdraw(owed,balance_available,overdraft_debit,balance_deposited,overdraft_limit) then
-- run an "excess overdraft"

overdraft_debit +:= owed - balance_available;
balance_deposited -:= balance_available;
balance_available := 0;

end if;

Cust_info(cust_number) := [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no];

end loop;

is_January := (month(daystring()) = "01");

end if;

-- Write a file of messages for each bank with which this bank does business.

for bank_inf = Bank_info(code) | code /= This_banks_code loop

write_message_file(bank_inf,Message_list(code));
Message_list(code) := [ ]; -- clear the message list to avoid resending.
end loop;

-- Write out the file of bad transactions.

end end_of_day;

procedure write_message_file(bank_inf, mess_list);
-- In a realistic system, this procedure might write a list of messages to a file, possibly on
-- magnetic tape, which was then sent by internet, air express, or special courier to one of
-- the banks with which this bank does business. However, in our simplified
-- system, we simply print out a  system, we simply print out a -bank_inf- as a header and follow it by the
-- individual messages of mess_list.

print; print; print(bank_inf); print; print;

for m in mess_list loop print(m); end loop;

end write_message_file;

-- In a realistic system, this procedure might write its list of transactions to an
-- on on-line disk file, which would then be scrutinized and manually edited,
-- reference being made if necessary to the original handwritten or typed
-- document which first ordered the transaction. However, in our simplified
-- system, we simply print out the list of bad transactions.

for m in list loop print(m); end loop;

print("END OF BAD TRANSACTION LIST"); print;

procedure month(dstg); return dstg(1..2); end month;        -- extract  month from daystring

procedure end_of_month;            -- end end-of-month procedure
-- This procedure, called on the last day of each month, prepares a monthly
-- statement for each customer. If the month is January, a statement of total
-- interest charged/earned appears on the statement, and the accrued interest
-- fields in the customer record are cleared.
--print("end_of_month:",Cust_info);
if daystring() = Last_day then return; end if;    -- since statements have already been prepared.

for [balance_available, balance_deposited, overdraft_debit, overdraft_limit, transactions_this_month,
interest_earned, interest_paid, name, soc_sec_no, address, tel_no] = Cust_info(cust_number) loop

print("Monthly Statement as of ",(ds := daystring())(1..2),"/",ds(3..4),"/",ds(5..)); print;
for trans in transactions_this_month loop print(trans); end loop;

transactions_this_month := [ ];

if is_January then        -- test for January
print;
print("SAVE THIS STATEMENT-IT CONTAINS VALUABLE TAX INFORMATION");
print;
print("Interest earned: ",interest_earned);
print("Interest paid: ",interest_paid);
end if;

end loop;

end end_of_month;

procedure dollar(amt);     -- converts numerical amount to dollar
return str(amt / 100) + "." + if #(cts := str(amt mod 100)) = 2 then cts else cts + "0" end if;
end dollar;

procedure day_field(stg); return stg(3..4); end day_field;        -- get day from date string

procedure daystring();         -- extracts month_day_year from nominal date

dat := date();         -- get the calendar date
return dat(1..2) + dat(4..5) + dat(7..9);        -- simplifies full date

end daystring;

end bank_checking;```

9.6 A Turing-Machine Simulator

1. Some new character c' is written into the tape square at which the read-write head is positioned, replacing the character c that was there;

3. The read-write head either moves one step right, or one step left or remains where it is.

Plainly, these actions of the Turing machine can be defined by a map action(c, s), whose two parameters are a tape character c and an internal state s, and whose value is a tuple [c',s',n'], consisting of the tape character c' that will overwrite c, the new internal state s' of the read-write head, and an indicator n of the direction of head motion, which must be either + 1 (move right),-1 (move left), or 0 (don't move). The following procedures read in the description of a Turing machine, check this description for validity, read in the initial contents of the Turing machines' tape, and then proceed to imitate its actions. The tape is represented by a tuple 'tape' whosej-th component is the character written in thej-th square. Blank squares contain the blank character. The Turing machine stops when it reaches an internal state s such that action(c,s) is undefined. We assume that the Turing machine description read in initially is a set of quintuples [c,s,c',s',n'], each representing an action-map entry [[c,s],c',s', n]. This description is checked to verify that the action map it describes is really single-valued. The auxiliary procedure print_tape prints the contents of the Turing machine tape after each cycle of operation.

```program Turing_simulate;   -- Turing machine simulator

if (atps := read_check()) = OM then stop; end if;  -- illegal specification

[action,tape,position,state] := atps; -- unpack action table,
-- initial tape, initial position, and nitial state and change internal state

while (act := action(tape(position), state)) /= OM loop         -- until stop

[tape(position),state,n] := act;        -- write new character to tape

if (position +:= n) < 1 then        -- moved left to brand moved left to brand-new square
tape := [" "] + tape;         -- add blank to left of tape
position := 1;                -- and adjust position pointer
elseif position > #tape then    -- moved right to brand moved right to brand-new square;
tape with:= " ";            -- add blank to right of tape
end if;

print_tape(tape, position, state);

end loop;

print("Simulation ended. Character and state are: ",tape(position)," ",state);

procedure read_check;        -- reads and checks action table, tape, initial position, and initial state

in_handle := open("turing_data","TEXT-IN");
action := {[[c,s],[c2,s2,n]]: [c,s,c2,s2,n] in actuples};
not_single := false;

for im = action{cs} | #im > 1 loop    -- action is not single action is not single-valued

not_single := true;
print();
print("action is indeterminate in condition", cs);
print("actions could be:");

for [c2, s2, n] in im loop
print(c2, s2, n);
end loop;

print();

end loop;

if not_single then return OM; end if;       -- as indication of error in action table

check((bad_cs := {cs: [c2,s2,n] = action(cs) | n notin {-1, 1, 0}}) = {},
"Illegal tape-motion indicators occur for conditions:",bad_cs);

check(is_integer(position), "Illegal initial position:", position);
check(is_tuple(tape), "Illegal initial tape:", tape);
check(forall t = tape(i) | is_string(t) and # t = 1,"Illegal initial tape", tape);

-- now add extra blanks to the initial tape if necessary
if position > #tape then            -- extend tape with additional blank squares
tape +:= #tape_position * [" "];
elseif position < 1 then
tape := (1 - position) * [" "];    -- add extra blank squares to left
position := 1;                    -- adjust index of position on extended tape
end if;

return [action, tape, position, state];

procedure print_tape(tape,position,state);        -- Turing machine tape print utility.
-- This procedure is used to display the state of the Turing machine tape at
-- the end of each cycle of simulation

const sq := 18, hsq := 9;            -- one one-fourth and one-eighth screen size
const screen_size := 72;            -- number of characters on terminal

topline := screen_size * "_";
topline(4 * hsq + 1..4 * hsq + 4) := botline := screen_size * "_";

tape_string := (hsq * " ") +/ tape + (hsq * " ");
-- convert tape to string and pad with blanks.
tape_string := tape_string(position..position + 2 * hsq - 1);

picture := +/ ["|" + t + " ": t in tape_string];
picture(1) := " ";        -- remove first vertical bar.
print(); print(topline); print(picture + " " + str(state)); print(botline);

end print_tape;

procedure check(condition, message, quantity); -- utility macro for input utility macro for input-
if condition then return true;  end if;         -- condition OK

print(message, quantity);         -- diagnostic message and offending quantity
return false;                    -- as indication of error

end check;

end Turing_simulate;```

Here is a small data file that you can use to test the preceding simulator. It describes a Turing machine with just two tape characters 'a' and 'b' and two internal states 'move_left' and 'move_right'. The machine is designed to change any run of a's into which its reading head is intially placed into a run of b's. It starts in the 'move_right' state. As long as it is positioned over an 'a' it stays in this state and continues moving right, one square at a time. When it encounters the character 'b' or ' ' it switches to the 'move_left' state and moves one position to the left. Then, as long as a's are encountered, it changes them to 'b' and continues moving left. After the last 'a' it stops.

The data file that describes this simple program is as follows. The intial tape is [b,a,a,a,a], and the reading head is initially placed on the second character, with the machine in 'move_right' state. The tuple representing the machine's state-transition table comes first in the file.

```	[
[a,move_right,a,move_right,1],
[b,move_right,b,move_left,-1],
[" ",move_right,b,move_left,-1],
[a,move_left,b,move_left,-1],
]
[b,a,a,a,a]
2
move_right
```

You may wish to develop other machines of this type, and, if ambitious, try to reconstruct Turing's proof that a properly designed single machine of this simple kind can simulate the action of any computer program.

9.7 Huffman Coding of Text Files

The standard ASCII alphabet of computer characters contains 127 characters, each of which is usually represented at the machine level by a sequence of 8 binary bits. If large volumes of English-language text need to be stored, this internal coding, which uses just as much computer memory space to represent a rare character like 'z' as to represent a common character like 'e', is by no means optimal. It is better to represent frequently occuring characters by shorter sequences of bits, even though this forces one to lengthen the internal encoding of less frequent characters, since overall this will diminish the total storage required to store typical texts. An effective method for using variable- length encodings of this kind was described by David Huffman and has become known as Huffman coding. Huffman's technique is to arrange all the characters to be encoded as the terminal nodes of a binary tree, in the manner shown in Figure 9.1. This tree should be set up so that commonly occurring characters appear near its "root" node and rare characters appear at a greater distance from its root.

There will always exist a unique path from the root node of such a tree to each terminal node or "twig" of the tree, and any such path can always be described by a unique sequence of zeroes and ones, where '0' means "take the left branch" and '1' means "take the right branch down the tree. As the code for a character c we can therefore use the binary sequence describing the path

Figure 9.1 Binary Huffman Tree with Characters Attached to its Terminal Nodes.

from the root node of the tree to the terminal node at which c is attached. For example, the tree shown in Figure 9.1 would assign the code '000' to 'E', the code '0010' to 'T', the code '0101' to I, etc. To encode a sequence of characters, we simply concatenate the sequences of zeroes and ones representing its individual characters. To decode a sequence s of zeroes and ones, we start from the root of the Huffman tree which defines our encoding and use the leftmost bits of s to guide us down a path in the tree. As soon as we reach a twig of the tree we add the character attached to this twig to the sequence of decoded characters we are building up. The sequence of bits that led us to this character is then detached from s, and we return to the root of the Huffman tree and continue the decoding process using what remains of s. The three routines which follow embody this encoding and decoding technique. The 'Huff' procedure takes a character string and encodes it using Huffman's method. 'Puff', which is the inverse of 'Huff', takes the encoded form of a string s and recovers the original form of s. The third procedure, called setup, takes maps 'left' and 'right' representing a Huffman tree and uses them to initialize various global data objects required by the Huff and Puff routines.

The algorithm uses '1' and '0' to represent bits.

```program Huffman;          -- Huffman  code procedures

var H_code,           -- maps each character into its Huffman code
H_root,       -- root node of Huffman tree
H_left,       -- maps each node of the Huffman tree to its left descendant
H_right,      -- maps each node of the right tree to its left descendant
H_char;       -- maps terminal nodes of the Huffman tree to the characters they represent

huff_test;              -- test code for Huffman program

procedure setup(root,left,right,chr); -- auxiliary initialization routine

-- We begin by using the procedure arguments to initialize all but the first of
-- the global variables listed above.

H_left := left; H_right := right;
H_root := root; H_char := chr;

-- Next we calculate H_code(c) for each character c

parent := {[y,x]: [x,y] in (H_left + H_right)};         -- This maps each tree node to its parent

H_code := {};           -- begin calculating Huffman codes from tree structure

for c = H_char(node) loop

bits := "";                      -- initially, path is null

while node /= H_root loop               -- chain up to the root, noting how we got there
bits := if H_left(par := parent(node)) = node then "0" else "1" end if  + bits;
node := par; -- step up to parent
end loop;

H_code(c) := bits;              -- record Huffman code for the current character

end loop;

end setup;

procedure Huff(stg);            -- calculates Huffman code for string stg
return "" +/ [H_code(c): c = stg(i)];   -- concatenate codes of individual characters
end Huff;

procedure Puff(Huff_stg);             -- decodes a Huffman-coded string

stg := "";-- initialize decoded string
node := H_root;-- start at Huffman-tree root

for b = Huff_stg(j) loop -- examine binary bits of Huff_stg in order

node := if b = "0" then H_left(node) else H_right(node) end if;

if (c := H_char(node)) /= OM then-- have reached twig
stg +:= c; -- append to decoded portion
node := H_root;-- restart at Huffman-tree root
end if;

end loop;

return stg;

end Puff;

-- routines listed below shold be inserted here.....

end Huffman;

```

The encoding and decoding procedures shown sidestep the question of how to find the tree that will give us a maximum degree of text compression. Of course, the rule for finding this tree, given the frequency with which each character occurs in the text we are to encode, is Huffman's essential discovery. His rule is as follows: we begin by finding the two characters c1, c2 of lowest frequency. These are then logically "conglomerated" into a single joint character c, of which c1 and c2 become the left and right descendants, respectively. We remove c1 and c2 from the collection of characters which remain to be processed and replace them by c. Continuing this until only one character remains, we will have bulit the Huffman tree.

Represented in SETL, this procedure is as follows:

```procedure Huff_tree(freq);      -- Huffman tree-build routine
-- freq is assumed to map all the characters of our alphabet into their
-- expected frequencies of occurrence.

-- This procedure returns a quadruple [root, left, right, chr] consisting of the
-- Huffman tree root, its left and right descendancy maps, and a map chr
-- which sends each terminal node of the tree into the character attached to this node.

-- Since the code which follows will represent tree nodes by character strings,
-- the chr map is just the identity map on single-character strings and is
-- conveniently set up right here.

chr := {[c,c]: c in domain freq};

left := right := {};      -- initialize the descendancy mappings

while #freq > 1 loop        -- iterate till all nodes have been conglomerated  into 1

[c1,freq_cl] := get_min(freq); [c2,freq_c2] := get_min(freq);
freq(c := (c1 + c2)) := freq_cl + freq_c2;        -- form "group" character
left(c) := c1; right(c) := c2;    -- make c1 and c2 descendants of c
end loop;

return [arb domain freq, left, right, chr]; -- which is necessarily the tree root

end Huff_tree;

procedure get_min(rw freq);
-- This auxiliary procedure finds the character c of minimum frequency,
-- returns c and its frequency, and deletes c from the domain of freq. Note
-- that it uses a "dangerous" program construction, legal in SETL, but
-- certainly not recommended for use in any context which is at all complex;
-- namely it is a function which modifies the argument with which it is called.

min_freq := min/[f: f = freq(c)];
assert(exists f = freq(c) | f = min_freq);        -- find the minimizing c
freq(c) := OM;        -- remove it from the domain of the 'freq' mapping
-- this modifies the input argument (which is read-write).
-- DANGEROUS!

return [c,f];        -- return character and its frequency

end get_min;

procedure huff_test;            -- test code for Huffman program
order := "e taionshrdluqwypfgjkmbvcxz";       -- blank is second most frequent
freq := {[c,30 - j]: c = order(j)};         -- character in English text.
[root, left, right, chr] := huff_tree(freq);
setup(root, left, right, chr);
print(huff("hello there"));
print(puff(huff("hello there")));

end huff_test;

```

The output produced is

```		00001011111011110101011010100000001011111111011
hello there
```

showing that the Huffman coded form of 'hello there' is 47 bits, or just under 6 bytes long, saving nearly 50% of the original 11 byte message.

Various improvements and extensions of the procedures described in this section appear in Exercises 13-18.

9.8 A Game-playing Program

In this section, we will explore the basic structure of programs which play board games like chess and checkers which involve two players, whom we shall call "A" and "B" . The momentary state s = [p, x] of any such game can be defined by giving the position p of the various pieces or counters used in the game, and by stating which of the players, x = "A" or x = "B" , is to move next. Given any such state s, the rules of the game will determine the moves which are legal and hence will determine the set of all possible new states s1,..., sk, exactly one of which must be chosen by the active player, i.e., the player whose turn it is to move. We shall suppose in what follows that the map has_turn(s) determines this player (i.e., has_turn(s) is just x, if as previously s has the form [p,x]). We also suppose that the map next_states(s) gives us the set {s1,..., sk} of states to which the active player can move. Any such game will end as soon as certain states, called terminal states, are reached. (In chess, for example, these are the states in which one of the players has been checkmated.)

For purposes of analysis it is convenient to suppose that when a terminal state s is reached, D dollars are transferred from player B to player A. We can suppose either that the sum D is fixed or that it depends on s. It is actually more convenient to make the latter assumption, and we shall do so, supposing accordingly that we are given a function A_wins(s) defined on all terminal states s, and that when a terminal state s is reached the sum A_wins(s) is transferred from B to A. Plainly A is the winner if D > 0, B is the winner if D < 0, and the game counts as a tie if D = 0. It is convenient to suppose that A_wins(s) = OM if the state s is not a terminal state; then the condition A_wins(s) /= OM can be used to test for terminal states. The three functions has_turn(s) (whose value must be either "A" or "B" ), next_states(s), and A_wins(s) serve to encapsulate the basic rules of any two-player game we wish to study.

Next, to begin to understand the strategic considerations which determine the laws of effective play, it is useful to extend the function A_wins(s), which is only defined for terminal states, so that it becomes a function A_can_win(s), defined for all states. We do this in the following recursive way:

```	A_can_win(s) = A_wins(s)?			(1)
if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
else min/ [A_can_win(sy): sy in next_states(s)] end if;```

The meaning of this formula can be explained as follows:

1. If the state s is terminal, the game is over and the amount that A can win is exactly the amount that A has in fact won.

2. Otherwise, if it is A's turn to move, he will choose the move that is most favorable to him, shifting the game into that state sy in next_states(s) for which A_can_win(sy) is as large as possible. Conversely, if it is B's turn to move, she will defend herself as well as possible against A's attempts to win a maximum amount. B does this by shifting the game into the state sy for which A's attainable winnings are as small as possible. Since A wins what B loses, and vice versa, this is at the same time the state in which B's winnings are as large as possible.

It is not hard to see that if the function A_can_win defined by (1) is known, and if both players expect their opponents to play with perfect accuracy, player A should always use her turn to move to a state sy such that A_can_win(sy) is as large as possible, and player B should always use his turn to move to a state sy such that A_can_win(sy) is as small as possible. To show this, suppose that the sequence of states traversed in the history of a game, from the moment at which it reaches state s up to the moment at which the game terminates, is S = s1,s2,..., sn, Using (1) it is easy to see that if A uses this strategy, A_can_win(sj) will never decrease, so that by using our recommended strategy A guarantees that when the game terminates he will win at least the amount A_can_win(s). Conversely, if B uses the strategy we recommend, then formula (1) shows that A_can_win(sj) will never increase. Hence, if player A ever makes a move which decreases the value of A_can_win from v to some value u which is less than v, then after this B can prevent him from recovering, i.e., from ever winning more than u. If follows that, if A gives his opponent credit for playing optimally, A must never "give ground" in regard to the function A_can_win, i.e., that when it is his turn to move he should always move to a new state sj such that such that A_can_win(sj) is as large as possible. (Of course, if he does this, then A_can_win(sj) = A_can_win(s); see (1)). Reasoning by symmetry, we also see that B should always move to a new state sy such that A can win(sj) is as small as possible.

These considerations indicate that any game-playing program will need to calculate the function (1). However, if the game being analyzed is at all complex, it will not be feasible just to use the recursive definition (1) as it stands, since the tree of possible moves and countermoves which (1) would examine will tend to grow very rapidly. For example, if at every level A has just 4 possible moves and B has 4 possible countermoves, then 256 different positions can evolve from an initial state s after A and B make two moves each, 64,000 different positions after A and B have made 4 moves each, and hence the recursion (1) would have roughly 16,000,000 positions to examine if we used it to look ahead through all possible combinations of 6 moves of A and 6 countermoves of B.

This makes it plain that it is important to accelerate calculation of the function A_can_win as much as we can. Several techniques for doing this have been developed, but we shall only describe a few particularly important methods of this kind, namely

1. Memoization of the position-evaluation function

2. Exploitation of symmetries

3. The 'alpha-beta' or 'reasonable expectations' pruning method

4. Heuristic guidance of the best-move search

The first two of these techniques are elementary, but the 'reasonable expectations' pruning method is subtler.

We will use the well-known children's game of 'Tik-Tak-Toe' or 'Noughts and Crosses" as a running example. This game is played on a 3 by 3 grid, originally empty. The players, whom we shall call 'A' and 'B', move alternately. Player 'A' (resp. 'B') can put an 'A' (resp. 'B') at any empty position in the grid. The first player to get 3 identical letters in a row, column, or diagonal wins. If the grid fills without any player winning, it is a tie. (As, alas, it always is if both players move optimally.) The termination and next_states routines for this game are given below. States of the game are represented as pairs [board,mover], where 'mover' is always 'A' or 'B',and 'board' is a tuple of three strings representing the three rows of the grid. Each row is a string of three characters, each of which is 'A', 'B', or '.' (representing an empty square.)

The termination routine, 'A_wins' first finds all 'non-blocked' rows, columns, and diagonals, namely those which do not contain letters placed by both players. If none remain, the game is a tie. If any is full, then the player whose letter occupies this row, column, or diagonal is the winner. Otherwise there is no winner, so OM is returned, and the game goes on.

The 'next_states' routine first checks to see if either player has won, and if so returns an empty list so as not to continue a game that is already decided. Otherwise it fills an empty square with the activeplayes lett in all possible ways, and returns the resulting list of states.

```procedure A_wins(s);            -- termination routine for Tik-Tak-Toe

[board,mover] := s;        -- unpack the state

only_1_rows := {row in [1..ncols] | #{br: br = board(row)(col) | br /= "."} < 2};       -- rows with only 1 kind of fill
only_1_cols := {col in [1..ncols] | #{br: row in [1..ncols] | (br := board(row)(col)) /= "."} < 2};    -- cols with only 1 kind of fill
only_1_diags := {};
if #{br: row in [1..ncols] | (br := board(row)(row)) /= "."} < 2 then only_1_diags with:= "up"; end if;                -- diagonal with only 1 kind of fill
if #{br: row in [1..ncols] | (br := board(row)(4 - row)) /= "."} < 2 then only_1_diags with:= "down"; end if;        -- diagonal with only 1 kind of fill

if only_1_rows = {} and only_1_cols = {} and only_1_diags = {} then return 0; end if;        -- tie

if exists win_row in only_1_rows | #{col in [1..ncols] | board(win_row)(col) /= "."} = 3 then return if board(win_row)(1) = "A" then 1 else -1 end if;    end if;        -- find winner
if exists win_col in only_1_cols | #{row in [1..ncols] | board(row)(win_col) /= "."} = 3 then return if board(1)(win_col) = "A" then 1 else -1 end if; end if;
if  "up" in only_1_diags and #{row in [1..ncols] | board(row)(row) /= "."} = 3 then return if board(1)(1) = "A" then 1 else -1 end if; end if;        -- find winner
if  "down" in only_1_diags and #{row in [1..ncols] | board(row)(4 - row) /= "."} = 3 then return if board(1)(3) = "A" then 1 else -1 end if; end if;    -- find winner

return OM;        -- nobody wins

end A_wins;

procedure next_states(s);        -- next_states routine for Tik-Tak-Toe

if A_wins(s) /= OM then return []; end if;        -- if the game is decided, don't generate new states

[board,mover] := s;        -- unpack the state
empties := {[i,j]: i in [1..ncols], j in [1..ncols] | board(i)(j) = "."};
-- usable empty spots on board

new_boards := [];
nm  := if mover = "A" then "B"  else "A" end if;
for [i,j] in empties loop new_board := board; new_board(i)(j) := mover; new_boards with:= [new_board,nm]; end loop;

return new_boards;

end next_states;

procedure has_turn(s); return s(2); end has_turn;```

Even this very simple game has surprisingly many potential states. Each of its 9 grid positions can be in any of 3 conditions: empty, filled with an 'A', filled with a 'B'. Either of the two players can be the next to move. Thus the number of possible states is 2 * 3**9, i.e. 39,366. This make it clear that we want to remember the values A_can_win(s) for states for which this function has already been calculated, rather than repeating its evaluating recursion unnecesssarily over the many thousands of states involved. That is, instead of using the raw evaluation procedure

```procedure A_can_win(s); -- position evaluator

val := A_wins(s)?
if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
else min/ [A_can_win(sy): sy in next_states(s)] end if;
return val;

end A_can_win;```

we need to 'memoize' it by introducing a global value 'seen_already', originally empty, which maps states that have already been evaluated into their 'win-lose-tie' values, and then return this value if it is already available. This gives us the following, greatly improved version of the evaluation function, embodying our first improvement, memoization of the position-evaluation function.

```procedure A_can_win(s); -- position evaluator

if (val:= seen_already(s)) /= OM then return val; end if;

seen_already(s) := val := A_wins(s)?         -- note value of state
if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
else min/ [A_can_win(sy): sy in next_states(s)] end if;
return val;

end A_can_win;```

The overall program that can be built around this evaluator is as follows.

```program test;                        -- game playing program

const ncols:= 3;
const all_A := ncols * "A",all_B:= ncols * "B";             -- winning and losing configuration

var count := 0;
board_start := ncols * [ncols * "."];		-- empty Tik-Tak-Toe board

game_start := [board_start,"A"];                  -- empty board, A moves first

print(if (val := A_can_win(game_start)) = 1 then "A wins" elseif val = -1 then "B wins" else  "Game is tie" end if);

procedure A_can_win(s); -- position evaluator
-- ... as above
end A_can_win;
-- other procdeures as above ...
end test;
```

Execution of this program shows that to evaluate the game outcome (that is, the value of the starting state, which is, as children know, a tie) the code calculates the values of approximately 11,500 game states. This makes it plain that further optimzations are desirable. One such is to exploit the game's symmetry: since the board on which it is played and the rules of play are both symmetrical, we need not evaluate any position whose mirror image has already been evaluated. The following variant of 'A_can_win' exploits this fact, for vertical mirroring of board positions. It illustrates the second general optimization technique listed above, namely exploitation of symmetries.

```procedure A_can_win(s); -- position evaluator

if (val:= seen_already(s)) /= OM then return val; end if;
if (val:= seen_already(mirror(s))) /= OM then return val; end if;

seen_already(s) := val := A_wins(s)?         -- note value of state
if has_turn(s) = "A"  then max/ [A_can_win(sy): sy in next_states(s)]
else min/ [A_can_win(sy): sy in next_states(s)] end if;
return val;

end A_can_win;
procedure mirror(s); [board,mover] := s; return [[board(3),board(2),board(1)],mover]; end mirror;
```

Execution of the Tik-Tak-Toe program with this improvement shows that it evaluates the game outcome after examining 6,000 states, about 50% fewer than if we ignore symmetry.

Next we turn our attention from these important but elementary techniques to the third and most sophisticated of the above-listed game-algorithm improvements, 'reasonable expectations' pruning. To introduce the notion of search pruning, we can first observe that for win-lose-tie games like Tik-Tak-Toe, a player A need who has found a winning strategy among the sucessors of a given state s in which he has the move need never examine any additional sucessors, since if the state s arises he (a) can choose the winning strategy, and(b) might as well choose the winning strategy, since in any case the best he can do is win. This leads to the simple following variant of 'A_can_win':

```procedure A_can_win(s);
if (val:= seen_already(s)) /= OM then return val; end if;
if (val:= seen_already(rev(s))) /= OM then return val; end if;
if (v := A_wins(s)) /= OM then return v; end if;

if has_turn(s) = "A" then

max_till_now := -1;

for sy in next_states(s) loop
if (max_till_now max:= A_can_win(sy)) >= 1 then seen_already(s) := 1; return 1; end if;
end loop;

return max_till_now;

else        -- has_turn(s) = "B"

min_till_now := 1;

for sy in next_states(s) loop
if (min_till_now min:= A_can_win(sy)) <= -1 then seen_already(s) := -1; return -1; end if;
end loop;

return min_till_now;

end if;

end A_can_win;```

The fact that the loops in this form of 'A_can_win' are terminated early, i.e., terminated as soon as a winning strategy is found, sometimes improves the efficiency of (3) very substantially; this is the kind of improvement that we want. Executing the Tic-Tac_Toe prgom with this version of 'A_can_win', we see that it establishes that the starting position is a tie after evaluating roughly 4,500 positions, 25% less than the number examined if the search improvement which it embodies is not used.

Indeed, the sooner we can find a winning strategy (if one exists) the more effective such improvements should become. This leads us to the fourth of our improvement methods, heuristic guidance of the best-move search. Here we do not select moves for evaluation in random order, but give preference to moves which are deemed likeliest to lead to a win. The way in wich this is done will depend on the game being played. In our Tik-Tak-Toe case, it is reasonable (i) never to select a positon which lies entirely in rows, columns, and diagonals in which the opposing player has alredy mad a mark(since such a move can never contribute to a win), and (ii) to prefer moves into positions which create the largest number of 'threats' for the opponent.

The following routine incorporates this idea. It rates available empty squares according to the number of rows, columns, and diagonals which run through them and contain no square marked by the opponent. These are then returned in decreasing order of desirability (sorting is done with the standard merge-sort routine found in the sort_pak package distributed with SETL.)

```procedure best_move_list(s);                    -- estimate the value of a move to the current mover

[board,mover]:= s;            -- unpack state

value := {};        -- map from positions to  their values
opponent := arb({"A","B"} less mover);            -- the other guy

empties := {[i,j]: i in [1..ncols], j in [1..ncols] | board(i)(j) = "."};
-- usable empty spots on board

for row in [1..ncols] | not (exists cl in [1..ncols] | board(row)(cl) = opponent) loop         -- check rows
for col in [1..ncols] | board(row)(col) = "."  loop value(row,col) := value(row,col)?0 +  1; end loop;
end loop;

for col in [1..ncols] | not  (exists row in [1..ncols] | board(row)(col) = opponent) loop          -- check columns
for row in [1..ncols] | board(row)(col) = "."  loop value(row,col) := value(row,col)?0 +  1; end loop;
end loop;

if not (exists row in [1..ncols] | board(row)(row) = opponent) then           -- check rising diagonal
for row in [1..ncols] | board(row)(row) = "." loop value(row,row) := value(row,row)?0 +  1; end loop;
end if;

if not (exists row in [1..ncols]|  board(row)(4 - row) = opponent) then     -- check falling diagonal
for row in [1..ncols] | board(row)(4 - row) = "." loop value(row,4 - row) := value(row,4 - row)?0 +  1; end loop;
end if;

empties -:= domain(value);        -- keep the senseless moves, just in case

return [y: [x,y] in merge_sort([[-y,x]: [x,y] in value])] + [x: x in empties];        -- return the inverse  map, as a list

end best_move_list;```
```
The following slightly modified version of 'next_states' uses the state ordering produced by 'best_move_list'.

procedure next_states(s);        -- next_states routine for Tic next_states routine for Tik-Tak-Toe

if A_wins(s) /= OM then return []; end if;        -- if the game is decided, dont generate new states
[board,mover]:= s;            -- unpack state
new_boards := [];            -- will build list

nm  := if mover = "A" then "B"  else "A" end if;            -- the other guy
for [i,j] in best_move_list(s) loop new_board := board; new_board(i)(j) := mover; new_boards with:= [new_board,nm]; end loop;
--print("new_boards: ",#new_boards);    if #new_boards = 0 then print(A_wins(s),s); end if;
return new_boards;

end next_states;```

With this further improvement the program establishes that the starting position is a tie after evaluating roughly 4,500 positions, no real improvement over our last method. The probable reason is that, since both players aim for a win, they explore alomost all move-tree banches other than than those that our earlier improvements have already eliminated. However, carful ordering of the next move list is of clear benefit when used in connection with the more sohisticated pruning techniques discussed below.

The precding discussion suggests the following variant evaluation procedure, which comes closer to the idea of 'reasonable expectations' search pruning. Suppose that one of the players, knowing that Tik-Tak-Toe is always a tie in the case of optimal play by both sides, decides to rest content with a tie instead always of trying for a win. (This 'conservative' play will lead to the same outcome in the case of an opponent playing optimally, but will miss chances to win against an opponent who makes errors.) This is easily done by changing the one line

`        if (max_till_now max:= A_can_win(sy)) >= 1 then seen_already(s) := 1; return 1; end if;`

in our earlier version of 'A_can_win' to

`     if (max_till_now max:= A_can_win(sy)) >= 0 then seen_already(s) := max_till_now; return max_till_now; end if;`

With this change the program confirms that the starting position is a tie after evaluating only 425 positions. (Without 'best move first' ordering, 675 positions would be evaluated instead.) This drastic improvement in search efficiency suggests pushing this idea as far as we can, which can be done as follows. Suppose that instead of evaluating the simple function A_can_win(y). which depends only on the current game state y, we elaborate it to the following two-parameter variant:

mover_can_win(s,hoped_for)

Here as before, s is a game state in process of evaluation, but now the extra parameter 'hoped_for' is the amount that the player active in state s can reasonably hope for, given the amount that the opponent player may be certain of in the context in which s comes under consideration. Let WORST denote the value of the worst outcome that a player need fear (in our Tic-Tak-Toe example, this is -1, representing loss of the game.) Then we can write the following recursion for 'mover_can_win':

```procedure mover_can_win(s,hoped_for);            -- returns  amount that the active player can win if game state s is reached

if (val := mover_wins(s)) /= OM then return val; end if;
-- if the game has ended, return its outcome

certain_of := WORST;            -- on first encountering an unexplored state, we fear the worst
[board,mover] := s;                -- unpack the state

for sy in next_states(s) loop

[-,new_mover] := sy;        -- see who moves next

if mover = new_mover then     -- I have another turn
certain_of max:= mover_can_win(sy,hoped_for);        -- continue in whatever way is best
else             -- it becomes the opponent's turn
certain_of max:= -mover_can_win(sy,-certain_of);
-- if I am already certain of winning the amount 'certain_of',  the opponent cannot
-- possibly win more than the negative of this amount, and should not hope to do more.
-- Also, if the value of 'sy' to the opponent is x, it is  Also, if the value of 'sy' to the opponent is x, it is -x to me.
end if;

if certain_of >= hoped_for then return certain_of; end if;
-- no point continuing search if best possible outcome has been found

end loop;

return certain_of;        -- this state value is now known

end mover_can_win;```

This variant of best-move searching expresses the idea of 'reasonable expectations' or'alpha-beta' search pruning. The memoized version of this, with the same use of symmetry as before, is

```procedure mover_can_win(s,hoped_for);            -- returns  amount that the active player can win if game state s is reached

if (val:= seen_already(s)) /= OM then return val; end if;        -- memoization
if (val:= seen_already(rev(s))) /= OM then return val; end if;

if (val := mover_wins(s)) /= OM then seen_already(s) := val; return val; end if;
-- if the game has ended, return its outcome

certain_of := WORST;            -- on first encountering an unexplored state, we fear the worst
[board,mover] := s;                -- unpack the state

for sy in next_states(s) loop

[-,new_mover] := sy;        -- see who moves next

if mover = new_mover then     -- I have another turn
certain_of max:= mover_can_win(sy,hoped_for);        -- continue in whatever way is best
else             -- it becomes the opponent's turn
certain_of max:= -mover_can_win(sy,-certain_of);
-- if I am already certain of winning the amount 'certain_of',  the opponent cannot
-- possibly win more than the negative of this amount, and should not hope to do more.
-- Also, if the value of 'sy' to the opponent is x, it is  Also, if the value of 'sy' to the opponent is x, it is -x to me.
end if;

if certain_of >= hoped_for then seen_already(s) := certain_of; return certain_of; end if;
-- no point continuing search if best possible outcome has been found

end loop;

return seen_already(s) := certain_of; certain_of;        -- this state value is now known

end mover_can_win;

procedure mover_wins(s);             -- symmetrical form of A_wins
[-,mover] := s;
return if (val := A_wins(s)) = OM then OM elseif mover  = "A" then val else -val end if;
end mover_wins;```

This variant of the evaluation algorithm shows that Tik-Tak-Toe is a tie after evaluating only 1500 position, not nearly as few as needed when one of the players tries only for a tie, but nevertheless much bettr than the 4500 evaluations needed if reasonable expectations pruning is not used. (Not also that if we omit the use of best-move-first ordering, the Tik-Tak-Toe evaluation cost of reasonable expectations pruning rises to 2700.

In spite of this very substantial improvement in search time which 'reasonable expectations pruning achieves, complex games will still lead to trees of moves which are so deep and branch so rapidly that full exploration using this algorithm is quite impossible. One technique used to cope with this fundamental difficulty is to limit the number of recursive levels that the code explorea. When this limit is reached, we use some ad hoc estimate, called an evaluation heuristic, to approximate the value of A_can_win(s). In effect, this approach pretends to replace the full game that we would like to analyze by a truncated game that is played for some limited number L of moves and then terminated with a payoff determined by the evaluation heuristic. To play the full game, we then reanalyze this truncated game each time it is a given player's turn to move and choose the best move in the truncated game as her recommended move in the real game. Assuming that A_estimate(s) is the estimated value of state s to player A, it is easy to modify the preceding codes to incorporate such a limit on the number of levels of move and countermove that will be examined. Doing so, we get

```procedure est_mover_can_win(s,hoped_for,limit);
-- returns estimated amount that the active player can win if game state s is reached

if (val:= seen_already(s)) /= OM then return val; end if;        -- memoization
if (val:= seen_already(rev(s))) /= OM then return val; end if;

if (val := mover_wins(s)) /= OM then seen_already(s) := val; return val; end if;
-- if the game has ended, return its outcome

if (lim -:= 1) = O then
val := if has_turn(s) = "A"  then A_estimate(s) else -A_estimate(s) end if;
end if;

certain_of := WORST;            -- on first encountering an unexplored state, we fear the worst
[board,mover] := s;                -- unpack the state

for sy in next_states(s) loop

[-,new_mover] := sy;        -- see who moves next

if mover = new_mover then     -- I have another turn
certain_of max:= mover_can_win(sy,hoped_for);        -- continue in whatever way is best
else             -- it becomes the opponent's turn
certain_of max:= -mover_can_win(sy,-certain_of);
-- if I am already certain of winning the amount 'certain_of',  the opponent cannot
-- possibly win more than the negative of this amount, and should not hope to do more.
-- Also, if the value of 'sy' to the opponent is x, it is  Also, if the value of 'sy' to the opponent is x, it is -x to me.
end if;

if certain_of >= hoped_for then seen_already(s) := certain_of; return certain_of; end if;
-- no point continuing search if best possible outcome has been found

end loop;

return seen_already(s) := certain_of; certain_of;        -- this state value is now known

end est_mover_can_win;```

9.9 Implementation of a Macroprocessor

Languages of relatively low level, like assembly language and C, are very commonly provided with macroprocessors, but a mccro capability can be a useful adjunct to languages of higher level also.

Macros are abbreviations that obviate the need to write similar pieces of code repeatedly. They allow programmers to introduce convenient shorthand notations for constructs that are used many times in a program. Macros, like procedures, are defined once and can then be used several times.

Macros and procedures resemble each other in that both give ways of associating names with bodies of code text and of invoking this code when the name is mentioned. However, when a macro is mentioned in a program after having been defined, the program text which it represents is substituted directly for the invoking occurrence of the macro name; this substitution is called macro-expansion and is to be contrasted with the detour-and-return action (see Section 5.1) triggered by a procedure invocation. That is to say, macros make use of a purely textual mechanism; they simply replace the name of the macro by its definition at the point where the name appears. This means that unlike procedures (which can be invoked before their definition has been seen), macros must be defined before they are used; i.e., the definition of a macro must appear physically in a program before the macro is first used.

In this section we will first describe a general purpose macro capability and then show how to implement it. We allow macros of three basic kinds: parameterless macros, macros with explicit parameters, and macros with both explicit and generated parameters. These have the following three forms:

```(1)	MACRO m_name;
macro-body
ENDM;

(2)	MACRO m_name(p_name1,p_name2...,p_namek);
macro-body
ENDM;

(3)	MACRO m_name(p_name1,..., p_namek; gp_name1,..., gp_namen);
macro-body
ENDM;```

Macros without parameters provide for the simplest kind of abbreviation: the name of such a macro simply stands for its macro body, which replaces the macro name whenever this name appears. For example, we can write:

```	MACRO countup;			(4)
t := t + 1;
if t > limit then
errmsg('out of bounds');
end if
ENDM;```

Following the appearance of definition (4) in a program, module, or procedure, any subsequent appearance of the name countup, for example, in the line

`			(5) countup`
triggers replacement of (5) by the body of (4), i.e., by the four lines of text shown previously.

Macros with parameters

Macros with parameters are introduced by macro definitions of the form

```	MACRO m_name(p_name1,...,p_namek);
macro-body
ENDM;```

Here, m_name can be any legal identifier, which becomes the name of the macro introduced by (1); p_name1,...,p_namek, called the formal parameters of the macro, can be any list of distinct identifiers. The body of the macro can be any sequence of tokens.

After being introduced by a macro definition (1), the macro m_name can be invoked simply by using its name, followed by a list of k actual arguments, at any place within a text. Suppose, to be specific, that this invoking occurrence is

`		m_name(arg1,..., argk)	(2)`

Then the macroprocessor implemented below replaces the macro invocation (2) with an occurrence of the body of the corresponding macro definition (1), but in this body every occurrence of a formal parameter name p_namej will have been replaced by an occurrence of the corresponding argument argj. We emphasize again that this is to be done by replacement of text, and not, as in the case of a procedure call, by evaluation of arguments and transmission of their values. This means that the arguments argj of macro invocation need not even be complete, evaluable expressions; indeed, they can be arbitrary sequences of keywords, operator signs, constants, or identifiers. (However, since commas are used to separate the successive arguments of a macro invocation, no argument of such an invocation can contain an embedded comma.) This gives macros a syntactic flexibility which procedures do not have and which is sometimes useful. Suppose, for example, that we wanted to print out a series of examples illustrating the use of the compound operator in a language like SETL. In SETL, this could be done directly by using the following code:

```	v:= [1,2,3,4,5];
print("Combining the components of v using the operator + gives", +/v);
print("Combining the components of v using the operator * gives", */v);
print("Combining the components of v using the operator max gives",max/v);```
By using a suitable macro, we could abbreviate this repetitive code, as follows:
```	MACRO print_op(opsign, op);
print("Combining the components of v using the operator", opsign,
"gives", op/v)
ENDM;
v := [1,2,3,4,5];

print_op("+", +);		(3)
print_op("*", *);
print op("max", max);```

This illustrates the possibility of transmitting an isolated operator sign to a macro as an argument; notice that no corresponding possibility exists for procedures.

Macros with generated parameters

In addition to its ordinary parameters and arguments, macros can make use of generated parameters which play the role for macros that local variables play for procedures. To make use of this feature we write macro definitions having the form

```	MACRO m_name(p_name1,...,p_namek; gp_name1,...,gp_namen);		(6)
macro-body
ENDM;```

The additional parameters gp_name1,...,gp_namen appearing after the first semicolon in (6) but not in (1) are called generated parameters. The programmer does not supply arguments corresponding to parameters of this kind when a macro like (6) is invoked. Instead, one invokes a macro like (6) in exactly the same way as the macro (1). However, when a macro like (6) with generated parameters is invoked, the macroprocessor generates new tokens (of an artificial form that cannot be used accidentally by the programmer) and substitutes them for occurrences of the corresponding generated parameter names in the body of (6).

A common use of this option is to generate a supply of fresh variable names when these are required for local use within the substituted body of a macro. Suppose, for example, that we want to write a macro which tests the value of an expression e for membership in a given set s, and which returns immediately from the procedure invoking the macro in case the test (e in s) fails. Suppose also that in case of failure we want to return both a numerical error indication and the value of the expression e. If we write

```	 MACRO double_check(e, error_no); if e notin s then return [error_no, e];
end if ENDM;```
we would not get exactly the desired effect because when this macro is invoked, it will insert the actual argument for e in two places, which will lead to repeated evaluation of e (notice that e appears twice in the body). For example:
`	double_check(f(y) + g(y), 15);`
would expand into
```	if f(y) + g(y) notin s then
return [15,f(y) + g(y)];
end  if;```
In order to avoid this double evaluation we can use the following macro (which has a :generated parameter);
```	MACRO in_check(e, error_no; temp)
if (temp := (e)) notin s then return [error_no, temp]; end if	(7)
ENDM;```
To invoke this macro we would, for example, write
```	in_check(t +:= x,1);
...
in_check(t +:= y,2);  	(8)
...```

Note that if (as in (8)) an argument expression e, causing some side effect, is passed to the macro (7), it becomes essential that the value of e should be assigned to an auxiliary variable (the generated parameter temp) and that e should not be evaluated twice. Note also that each use of (7) will generate a new name for the parameter temp so that no accidental interference will occur between invocations of this macro. Finally, note the use of a precautionary extra pair of parentheses around the occurrence of the parameter e in the body of (7); these parentheses ensure that the argument transmitted to the macro in place of e will be handled as a unit, no matter what its actual syntactic form happens to be.

Macro nesting

Macro bodies can contain invocations of other macros; and macro names can be transmitted to other macros as arguments. For example, suppose that we define the following two macros:
```	MACRO triple(pa);
pa, pa, pa
ENDM;

MACRO q;
"hello there"
ENDM;```
Then, after expansion, the macro invocation
`		triple(q)`
becomes
`	"hello there", "hello there", "hello there"`
This example illustrates the fact that macro expansion is outside-in and recursive. That is to say, the expansion of a given macro body may trigger the expansion of an inner macro invocation.

Macro bodies can also contain embedded macro definitions. For example, the definition

```	MACRO def_x(pa);
MACRO x; pa endm;		(9)
ENDM;```
is legal. An embedded macro definition imd becomes active when one invokes the macro M in which imd is embedded, thus causing the body of M to be expanded. As an example, note that after expansion the sequence
```	def_x("aaa");
x x x
def x("bbb");		(10)
x x x```
becomes
```	"aaa" "aaa" "aaa"
"bbb" "bbb" "bbb"		(11)```
This happens in the following way. The first line in (10) is expanded and, according to the definition (9), becomes the macro definition
`		MACRO x; "aaa" ENDM;`
Subsequently the second line of (10) is expanded. It generates the first line of (11). After this, the third line of (10) is expanded into
`		MACRO x; "bbb" ENDM;`
This changes the meaning of the macro x, causing the fourth line of (10) to expand into the second line of (11).

Dropping and redefining macros

If a macro is only needed over a limited portion of a program, it is possible to "undefine" it so that the name of the macro can be used for another purpose. To erase a macro definition, one uses a macor definition with a completely empty macro body (the body should not even contain blanks), as in
`		MACRO x;ENDM;`
Once a macro has been dropped, its name reverts to 'ordinary token' status. For example,
```	MACRO x; print("now you see it") ENDM;
x;
MACRO x;ENDM; 	-- this drops x from macro-status
x;
MACRO x; print("now you don't"); ENDM;
x;```
expands into
```	print("now you see it");
x;
print("now you don't");```
This follows since the first line of (12) makes x a macro equivalent to "now you see it," but then the third line of (12) drops x from macro status, so that the fourth line of (12) carries over unchanged to become the second line of (13). The new definition of x is then seen, invoked, and expanded.

Considerably more elaborate macro features than those we have described are supported by many programming languages, especially by assembly languages. However, high level languages like SETL have less need for complex macro features than do lower-level languages, and thus the macro facility described in the preceding paragraphs would be found adequate for a language like SETL. Let us remark that macros, like procedures, can perform the useful function of hiding low-level details and thus help make a program more readable and more modular. The information-hiding capability of macros is most useful when we want to shield a program from possible changes in the structure of composite objects which it manipulates.

Implementation

The context within which our macroprocessor is to be implemented is assumed to be as follows:

(i) The macroprocessor reads a succession of tokens, obtained by decomposing some input file into successive tokens.

(ii) When the special token MACRO is encountered, a macro definition is opened. This token must be followed by a macro name, which can in turn be followed by a list of formal parameters and generated formal parameters, in the manner explained in Sections 8.2.3 and 8.2.4. The macro body following such a macro opener is collected and saved in a map def_of, which associates each macro name with its list of parameters, its list of generated parameters, and its macro body.

(iii) When a macro invocation starting with a token belonging to the domain of the map def_of is encountered, its actual arguments are collected, and the invocation is replaced by a substituted version of the macro body. This substituted text is logically inserted immediately in front of the remainder of the input file and reprocessed by the macro-expansion mechanism, thereby ensuring that macro invocations and definitions embedded within macro bodies will be treated in the manner described in Section 8.2.5.

(iv) The macroprocessor makes various syntactic checks. For example, it checks that the parameters appearing in a macro definition are all distinct, and that each macro invocation has as many arguments as the corresponding macro definition has parameters. If an error is detected, a diagnostic message is printed, and any macro action in progress is simply bypassed.

(v) The macroprocessor is structured as a module, which exports just one procedure, namely a parameterless procedure called next_tok, which can be called repeatedly to obtain the sequence of tokens representing the input file after macro expansion. When the input file is exhausted, next_tok will return OM. The macroprocessor module imports just one procedure, namely a parameterless procedure called input_tok. Successive calls to input_tok generate the sequence of input tokens which constitute the macroprocessor's initial input.

```program macroprocessor; -- macroprocessor test
use get_lines_pak;        -- use text  input utility
var
gmac_ctr,        -- counter for generated macro arguments
def_of,            -- maps macro names into their definitions
expanded_tokens;    -- vector of tokens obtained by prior macro expansion

var line_no,line_now,text;     -- globals for input reader

const alphanums :=
"abcdefghijklmnopqrstuvwxyzABCEDFGHIJKLMNOPQRSTUVWXYZ";

const Illformed_list := "ILLFORMED MACRO PARAMETER LIST";
-- error message to avoid later trouble in macro arglist
const comma := ",";

line_now := "";        -- initially line is empty
line_no := 1;        -- initialize line count

gmac_ctr := 0;     -- generated macro argument counter
def_of := {};     -- initially no definitions
expanded_tokens := [ ];  -- initially no prior tokens
text := get_lines("macro.in"); -- read input

toks := [];

while (wd := next_tok( )) /= OM loop toks with:= wd; end loop;
print("" +/ toks);

procedure input_tok(want_blanks);    -- input reader; operates in a 'want blanks' and an 'ignore blanks' mode

if (blanks := span(line_now," \t\n\r"))  /= "" and want_blanks then return blanks; end if;
-- remove or return whitespace

while line_now = "" loop    -- get next line of text
if line_no > #text then return OM; end if;    -- signal end of input
line_now := text(line_no) + "\n"; line_no +:= 1;    -- otherwise set up line
if (blanks := span(line_now," \t\n\r"))  /= "" and want_blanks then return blanks; end if;
end loop;

wd := span(line_now,alphanums);

return if wd /= "" then wd else len(line_now,1) end if;        -- return next word, or at least one character

end input_tok;

procedure another_tok(want_blanks);    -- "token feeder" for macro processor
-- This returns the token standing at the head of expanded_tokens unless
-- expanded_tokens is empty, in which case it calls the "primary" token source
-- input_tok to get the token to be returned. It also operates in a 'want blanks' and an 'ignore blanks' mode

while (tok fromb expanded_tokens) /= OM loop
if tok(1) notin " \t\n\r" or want_blanks then return tok; end if;    -- return nonblank, or blank if wanted
end loop;

return input_tok(want_blanks);

end another_tok;

procedure next_tok;
-- called to obtain successive tokens in the sequence of tokens generated by macro expansion
-- tokens returned by 'input_tok' are returned after all tokens generated by prior macro expansions
-- have been returned.

loop    -- we return to this point whenever macro errors are detected

if (tok := another_tok(true)) = OM then return OM; end if;
-- end of input file encounted

if (tok /= "MACRO") and (mdef := def_of(tok)) = OM then        -- token is ordinary;
end if;

if tok = "MACRO" then        -- start new macro definition

if (parm_list := get_parm_list()) = OM
or (mac_body := get_macro_body()) = OM then
continue;        -- since macro is bad
end if;

[mac_name,mac_pars,mac_gpars] := parm_list;         -- get macro name and parameters

def_of(mac_name) := if mac_body = [] then OM     -- macro drop
else
[mac_pars,#mac_gpars,template(mac_body,mac_pars,mac_gpars)] end if;

else           -- have a macro invocation

[mac_pars,n_gpars,mac_template] := mdef;        -- look up macro definition

if (arg_list := get_arg_list(#mac_pars)) = OM then    -- abort expansion
continue;        -- since number of arguments and number of parameters differ
end if;

for n in [1..n_gpars] loop arg_list with:= [generated_parm()]; end loop;
-- generate additional parameters as required and replace the macro at the
-- start of the expanded_tokens vector by its expansion

expanded_tokens :=
[] +/ [if is_string(mac_tok) then [mac_tok] else arg_list(mac_tok) end if:
mac_tok = mac_template(j)] + expanded_tokens;
end if;

-- now that macro has been expanded, the top of the loop will try again to
-- supply the requested token
end loop;

end next_tok;

procedure get_parm_list;      -- gets sequence of parameters for macro
-- The sequence of parameters collected by this procedure must be a
-- comma comma-separated list opened by a left par~tnthesis and closed by a right
-- parenthesis. If this syntax is violated, or if two parameters are identical,
-- an error message is printed, and OM is returned.

have_gen_parms := false; -- flag: No generated parameters yet

mac_parms := mac_gparms := [];     -- initializes parameters and generalized parameters

if (name := namecop := another_tok(false)) = OM or name /= span(namecop,alphanums) then

print("ILLFORMED MACRO NAME");
return OM;

elseif (tok := another_tok(false)) = ";" then        -- terminating semicolon, macro with no parameters

return [name,[],[]];     -- no parameters

end if;

if not check(tok = "(",Illformed_list) then return OM; end if;        -- otherwise macro list must open with paren

until tok = ")" loop        -- until terminating parenthesis

if not check((tok := another_tok(false)) /= OM, illformed_list) then return OM; end if;        -- get a token

if have_gen_parms then mac_gparms with:= tok; else mac_parms with:= tok; end if;
-- this must be a parameter or generalized parameter
if not check((tok := another_tok(false)) /= OM, illformed_list) then return OM; end if;        -- get following punctuation token

if not check(tok = "," or (tok = ";" and not have_gen_parms) or tok = ")", illformed_list) then return OM; end if;
if tok = ";" then have_gen_parms := true; end if;

end loop;

if not check(another_tok(false) = ";", Illformed_list) then return OM; end if;            -- look for terminating semicolon

return [name,mac_parms,mac_gparms];        -- return macro name, parameter names, and generated parameter names

end get_parm_list;

procedure get_arg_list(n);      -- gets specified number of arguments for macro

if n = 0 then return []; end if;        -- no arguments to get

arglist := [];            -- will collect
current_arg := [];        -- initialize current argument as empty token sequence
if another_tok(false) /= "(" then return OM; end if;        -- check for "(" which opens arglist

while n > 0 loop          -- collect successive arguments

if (tok := another_tok(true)) = OM then return OM; end if;            -- end of file reached

if tok /= "," and not (tok = ")" and n = 1) then
current_arg with:= tok;        -- not delimiting comma or terminating ")", so collect
else                -- have a delimiting comma

arglist with:= current_arg; current_arg := [];        -- collect an argument and start a new ome
n -:= 1;            -- decrement the number of remaining arguments

end if;

end loop;

if tok /= ")" then return OM; end if;        -- check for ")" which terminates arglist

return arglist;

end get_arg_list;

procedure get_macro_body;        -- collects sequence of tokens up to ENDM
-- to allow multiple levels of macro nesting, this procedure must count nesting levels,
-- incrementing the level each time that the token "MACRO" is encountered, and decrementing
-- it each time "ENDM" occurs. Only an occurence of "ENDM" at the original 0 level ends the body.

mbody := [ ];            -- collected body
macnum  := 0;            -- number of embedded macros

loop        -- here we accept whitespace tokens
tok := another_tok(true);

if not check(tok /= OM, "MACRO BODY NOT PROPERLY ENDED: ") then return OM; end if;

if tok = "MACRO" then macnum +:= 1; end if;
if tok = "ENDM" then if (macnum -:= 1) = -1 then exit; end if; end if;
mbody with:= tok;

end loop;

return mbody;

end get_macro_body;

procedure template(mac_body,mac_pars,mac_gpars);
-- This procedure builds the "macro template" stored as the definition of a
-- macro. The template consists of the string constituting the macro body,
-- but with every parameter and generated parameter replaced by an integer.

counter := 0;    -- start count at zero

replacement := {[t,(counter +:=1)]: t in mac_pars + mac_gpars};
-- This maps every macro parameter into its replacement integer

return [replacement(t)?t: t in mac_body];

end template;

procedure generated_parm;     -- auxiliary procedure to produce generated macro parameters.
-- The macro parameters generated by this procedure have the form 'ZZZn',
-- where n is the string representation of an integer. ~ where n is the string representation of an integer. ~-

return "ZZZ" + str(gmac_ctr +:= 1);

end generated_parm;

procedure check(condition,msg);            -- error  checkutility
return if not (condition) then err_msg(msg) else true end if;
end check;

procedure err_msg(message); print(message); return false; end err_msg;    -- error message utility

end macroprocessor;```
The folloing small input file can be used to test the macroprocessor code given above.
```	First some simple macros.
MACRO aaa(xxx , fff;
uuu,vvv); xxx yyy uuu fff ffff vv vvv ENDM
whatza.in aaa(a b c,d) bbb ccc
MACRO aaa;ENDM
aaa you me hoo
Now for some nested macros!
MACRO aaa; MACRO aaa(xxx , fff;
uuu,vvv); xxx yyy uuu fff ffff vv vvv ENDM MACRO uuu; MACRO vvv; www_www ENDM ENDM ENDM
aaa
aaa(BBB,CCC) uuu vvv```

EXERCISES

1. A nondeterministic Turing machine is a Turing machine TM whose action mapping is not constrained to be single-valued. In addition, one particular internal state of each such machine must be designated as its "failed" state. Such machines can be regarded as describing indefinitely large families of computations which proceed in parallel. More specifically, we start with a given tape, tape position, and internal machine state, as in the case of an ordinary Turing machine. Then, whenever the internal state and the character under the machine's read head are such that action(character,state) is multivalued (consisting, say, of n values), we create as many logical copies of the machine as needed and assign one of them to take each of these n actions and continue the computation. This can generate a rapidly expanding set of computations, all proceeding in parallel. If a particular logical copy TMj of TM reaches the special "failed" internal state, the particular path of computation which it is following ceases, and TMj is simply deleted. As soon as any computation TMk reaches an ordinary "stop" condition all other computations are deleted, and the result calculated by this successful logical copy TMk of TM becomes the final result of the nondeterministic computation. On the other hand, if all computations TMk reach the "failed" internal state, the nondeterministic Turing machine computation is considered to have failed. Modify the Turing machine simulation program shown in Section 9.6 so that it can simulate both ordinary and nondeterministic Turing machines.

2. A multitape Turing machine is one which has several separate tapes, with a read-write head on each, whose action on each cycle is determined by its internal state and by the characters found under all of its read-write heads. Modify the Turing machine simulation program shown in Section 9.6 so that it can simulate multitape Turing machines with any specified number of heads.

3. Can you think of any well-defined computing automaton or computational process whose activity could not be simulated by a SETL program? Review Exercises 1 and 2 before you answer.

4. The macroprocessor shown in Section 9.9 is programmed to regard every comma in a macro argument list as a separator. For example, if my mac is a macro name, then the invocation

`          my_mac(f(x, y), z)`
is considered to have three components, namely
`         f(x      y)      z`
This is not the best convention: it would be better to regard commas contained within parentheses or brackets as being invisible to the macroprocessor, so that the macro call shown would be regarded as having just two arguments f(x,y) and z. Modify the macroprocessor so that it behaves in this way.

5. (Continuation of Ex. 4). Especially if the modification suggested in Exercise 4 is made, use of a macroprocessor becomes subject to two dangers:

(a) If the parenthesis terminating an argument list is missing, much of the body of text following a macro invocation may be swallowed up in what appears to be a very long final argument.

(b) If the keyword ENDM ending a macro is missing or misspelled, the text following a macro definition may be swallowed up by the macro definition.

Modify the macroprocessor of Exercise 4 so as to limit each macro argument to 50 tokens and each macro definition to 200 tokens.

Exercises related to the "check processing" system of Section 9.5

6. Modify the check processing system so that it tracks

(a) The total dollar volume of transactions handled each day.

(b) The total dollar credit/debit that the bank using the system has built up against each of its correspondent banks. These quantities should be printed out as additional information by the DAY transaction.

7. Modify the check processing system, adding a new transaction DEL which prints out a list of all accounts for which more than a month has gone by without at least 10% of a customer's outstanding overdraft debit having been paid.

8. Modify the check processing system, adding the following two transactions:

(a) A transaction AB ('abuse') which shows all accounts for which an excess overdraft has accumulated or against which more than 10 ''insufficient funds" charges have been made during the current month.

(b) A transaction | ("idle") which shows all accounts against which no checks have been drawn during the past 6 months.

9. Modify the check processing system, adding transactions O and CL which allow new customer accounts to be opened and closed. Closing of accounts should be handled carefully: such accounts should be marked as having been closed but should not actually be deleted while there exist outstanding transactions, still to be returned by other banks, that might affect the account which is being closed. When an account is finally closed, the balance remaining in it should be used to pay off any outstanding overdraft_debit, and a check for the amount remaining in the account after this final payment should be prepared for mailing. How will you handle an account closing when the balance remaining is insufficient to pay off the overdraft debit?

10. Modify the check processing system so that it can add a short advertisement to the monthly statements being prepared for mailing to customers. The text of this advertisement should be supplied by a transaction of the form

where n is an integer, and where this line will be followed by n more lines giving the text of the advertisement. This transaction must be run just before the DAY transaction which triggers preparation of monthly statements.

11. If you have a checking account, save the next monthly statement you get from your bank, and scrutinize it carefully. How many of the features of this statement suggest that your bank is using a program similar to the check processing program shown in Section 9.5? What features reveal the use of processing steps that our simplified check processing system does not perform? If you can find any such feature, choose one of them and modify the check processing system to include it.

12. To what kinds of human error (e.g. misread or illegible checks, bad adresses entered for accounts, etc.) might the check processing system described in Section 9.5 be exposed? How would it neeed to be modified to make detection of such errors easier, and to allow them to be corrected manually?

13. The degree of compression attained by the Huffman coding procedure shown in Section 9.7 can be increased by using the fact that the probability of encountering a character depends on the character that has just been encountered. That is, we can calculate not one, but a whole family of Huffman trees, one for each high- probability character c in our alphabet; this tree should position other characters d according to the probability that d follows c.

Develop a modified Huffman package which uses these more refined probabilities, and also a modified Huff_tree code which calculates all the Huffman trees required.

14. Storing a Huffman tree requires memory space proportional to the size of the alphabet whose characters are attached to the terminal nodes of the tree. If the improved technique described in Exercise 13 is used, such a tree will have to be stored for each character in the alphabet, and the amount of space required for this can grow unpleasantly large (especially if the data compression procedure is to be reprogrammed for a small machine). In this case, the following expedient can be used to reduce the amount of storage required:

(a) For each character c, establish a limit L(c) which will bound the number of nodes used in the modified Huffman tree built from the frequency count developed for letters following c. This limit should be larger for commonly occurring characters c, smaller for infrequent characters.

(b) For each c, find the L(c) characters which most frequently follow c and "lump" all the other characters into a new character c'. The sum of the frequencies of all these "lumped" characters then becomes the frequency of c'.

(c) Build a Huffman tree for the alphabet of L(c) + 1 characters left after step (b). Then let the code of each character not "lumped" into c' be determined as in Exercise 13, but let the code of each character x "lumped" into c' be the concatenation of the normal Huffman code of c' with the standard internal SETL code of c.

Modify the Huffman encode/decode procedures to incorporate this space- saving refinement.

15. If the "Huff' and "Puff" procedures shown in Section 9.7 are really to be used for compressing large texts, we will want them to produce densely packed character strings rather than SETL-level sequences of zeroes and ones. To achieve this without having to abandon SETL in favor of a language in which sequences of bits can be manipulated directly, we can break the sequence of zeroes and ones that "Huff" would most naturally produce into 8-bit sections, each of which is then represented by a single SETL character. Conversely, when decoding, we can first convert each character in the string being decoded into a string of zeroes and ones. Modify the Huffman routines shown in Section 9.7 so that they work in this way. Your modified setup procedure should construct the extra data structures needed to convert characters into 8-bit sequences of zeroes and ones, and vice versa.

16. The decoding procedure shown in Section 9.7 and further described in Exercise 14 can be accelerated by keeping a map Decode which sends the start (say the first 8 bits) of the sequence s being decoded either into a pair [c, n], where c is the first character obtained by decoding s and n is the number of bits of s that represent this character, or into the node of the Huffman tree that is reached after walking down the tree in the manner determined by the first 8 bits of s, if these 8 bits do not lead us to a terminal node. Rewrite these routines by incorporating the suggested improvements.

17. The Huffman setup procedure shown in Section 9.7 can be made more efficient by saving the sequence of zeroes and ones describing the path from each Huffman tree node traversed. This information can be stored at the node. This makes it unnecessary for the setup procedure to traverse any edge of the Huffman tree more than once. Rewrite setup, incorporating this improvement.

18. The Huff_tree procedure shown in Section 9.7 can be made more efficient by using the treelike structures described in Section 9.7 to accelerate the aux- iliary get_min procedure. Rewrite Huff_tree and get_min, incorporating this improvement.

19. In playing a game, one may wish not only to win as much as possible, but also to win in the smallest possible number of moves. A recursion much like formula (1) of Section 9.8 can be used to determine the minimum number of steps which the winning player will need to bring the game to a successful conclusion. Find this recursion, and use it to develop a variant of the "alpha-beta" game-playing procedure which tells the winning player how to win as rapidly as possible and tells the losing player how to postpone his inevitable defeat as long as possible.

20. The "alpha-beta" game-playing program (see Est_A_can_win, Section 9.8) operates most efficiently if moves likely to return a large Est_A_can_win value are explored first. To guess in advance which moves these are likely to be, one can save the values calculated by Est_A_can_win during each cycle of play and use these values as estimates of move quality the next time it is the same player's turn to move. Write a variant of the Est_A_can_procedure which incorporates this improvement.

SETL Reserved Words

The words on the following page have a predefined meaning within a SETL program, and should only be used for their defined purpose.
 and arb assert body case const continue domain else elseif end exit find for forall from fromb frome if in incs lambda less lessf loop max min mod not notin npow null or otherwise package pow procedure program range rd return rw sel stop subset then until use var when while with wr class native cos sin tan acos asin fix floor ceil float newat abs sign even odd sqrt log exp str char unstr open close gets puts fsize nprint nprinta print printa read reada reads binstr unbinstr get geta eof len any notany span break match lpad rlen om atan tanh atan2 rany rnotany rspan rbreak rmatch rpad is_atom is_boolean is_integer is_real is_string is_set is_map is_tuple is_procedure type date time fexists system command_line abort abend_trap true false

Syntax Diagrams

Throughout this text, syntax diagrams are used to describe the grammatical structure of SETL constructs. For convenience, all syntax diagrams are collected in this appendix.

Each diagram describes the structure of a language construct. Each path through a given diagram traces one valid instance of the corresponding construct. The following conventions are used in drawing a syntax diagram:

(a) Syntactic classes are written in lowercase and enclosed in rectangular boxes.

(b) Terminal symbols of the language (delimiters and keywords) are in boldface and enclosed in rounded boxes.

(c) When the presence of a construct in a given diagram is optional (say the declarations in a program) then a path that bypasses the optional construct appears in the graph above that construct. For example, a procedure body includes the following:

(d) Repetition is indicated by a backward path that passes under the repeated construct. For example, a list of constants is a sequence of one or more constants, separat_d by commas. The corresponding syntax graph for the construct 'constant list' is the following: