Foreword
A word on the audience for whom this book is intended.
Any technical book must, by emphasizing certain details and leaving others unspoken, make certain assumptions about the prior knowledge which its reader brings to its study. This book assumes that its reader has a good knowledge of standard programming techniques, particularly of string manipulation and parsing, and a general familiarity with those parts of mathematics which are analyzed in detail in the main series of definitions and proof scenarios to which much of its bulk is devoted. Less of knowledge of formal logic is assumed. For this reason we try to present needed ideas from logic in a reasonably self-contained way, emphasizing guiding ideas likely to be important in pragmatic extensions of the work begun here, rather than technicalities. Foundational issues, for example consideration of the strength or necessity of axioms, or the precise relationship of our formalism to other weaker or stronger formalisms studied in the literature, are neglected. Because we expect our readers to be programmers of some sophistication, syntactic details of the kind that often appear early in books on logic are underplayed, and we repeatedly assume that anything programmable with relative ease can be taken as routine, and that the properties of such programmable operators can be proved when necessary to some theoretical discussion. This reflects our feeling that understanding develops top-down, focusing on details only as these become necessary.
This belief implies that too much detail is more likely to impede than to promote understanding. Who reads, or would want to read, the Whitehead-Russell Principia, or to testify that its hundreds of formula-filled pages are without error? But since we ask this question, why do we include hundreds of formula-filled pages in this book, and hope to regard it as a successor to this very same Principia? The reason lies in the fact that our formal proof text is fully computer-checked. The CD_ROM accompanying this book, gives all these proofs in computer readable form, along with software capable of checking them. Though relatively useless to the human reader unless their correctness can be verified mechanically, long lists of formulae become useful once such verification has been achieved.
Chapter 1. Introduction
... This then is the advantage of our method: that immediately ... guided only by characters in a safe and really analytic way, we bring to light truths that others have barely achieved by an immense effort of mind and by chance. And therefore we are able to present results within our century which otherwise would hardly be attained in the course of millennia.Gottfried Wilhelm Leibniz, 1679 1.1 Loomings
Logic begins with Aristotle's systematic enumeration of the forms of syllogism, as an attempt to improve the rigor of philosophical (and possibly also political) reasoning. Euclid then demonstrated that reasoning at Aristotle's syllogistic level of rigor could cover a substantial body of knowledge, namely the whole of geometry as known in his day. Subsequent mediaeval work, first in the Islamic world and later in Europe, began to uncover new algebraic forms of symbolic reasoning. Fifteen centuries after Euclid, Leibniz proposed that algebra be extended to a larger symbolism covering all rigorous thought. So two basic demands, for rigor and for extensive applicability, are fundamental to logic.
Leibniz did little to advance his proposal, which only began to move forward with the much later work of Boole (on the algebra of propositions), the 1879 Concept-Notation (Begriffsschrift) of Frege, and Peano's axiomatization of the foundations of arithmetic. This stream of work reached a pinnacle in Whitehead and Russell's 1910 demonstration that the whole corpus of mathematics could be covered by an improved Frege-like logical system.
Developments in mathematics had meanwhile prepared the ground for the Whitehead-Russell work. Mathematics can be seen as the combination of two forms of thought. Of these, the most basic is intuitive, and, as shown by geometry (or more primitively by arithmetic), often inspired by experience with the physical world which it captures and abstracts. But mathematics works on this material by systematically manipulating collections of statements about it. Thus the second face of mathematics is linguistic and formal. Mathematics attains rigor by demanding that the statement sequences which it admits as proofs conform to rigid formal constraints. For this to be possible, the pre-existing, intuition-inspired content of mathematics must be progressively resolved into carefully formalized concepts, and thus ultimately into sentences which a Leibniz-like formal logical language can cover. A major step in this analysis was Descartes reduction, via his coordinate method, of 2- and 3-dimensional geometry to algebra. To complete this, it became necessary to solve a nagging technical problem, the 'problem of the continuum', concerning the system of numbers used. An intuition basic to certain types of geometric reasoning is that no continuous curve can cross from one side of a line to another without intersecting the line in at least one point. To capture this principle in an algebraic model of the whole of geometry one must give a formal definition of the system of 'real' numbers which models the intuitively conceived real axis, must top this by giving a formal definition of the notion of continuity, and must use this definition to prove the fundamental theorem that a continuous function cannot pass from a positive to a negative value without becoming zero somewhere between.
This work was accomplished gradually during the 19th century. The necessary definition of continuity appeared in Cauchy's Cours d'Analyse of 1821. A formal definition of the system of 'real' numbers rigorously completing Cauchy's work was given in Dedekind's 1872 study Continuity and Irrational Numbers. Together these two efforts showed that the whole of classical calculus could be based on the system of fractions, and so, by a short step, on the whole numbers. What remained was to analyze the notion of number itself into something more fundamental. Such an analysis, of the notion of number into that of sets of arbitrary objects standing in 1-1 correspondence, appeared in Frege's 1884 Foundations of Arithmetic, was generalized and polished in Cantor's transfinite set theory of 1895, and was approached in alternative, more conventionally axiomatic terms by Peano in his 1894 Mathematical Formulary. Like Whitehead and Russell's Principia Mathematica, the series of definitions and theorems found later in this work walks the path blazed by Cauchy, Dedekind, Frege, Cantor, and Peano.
As set theory evolved, its striving for ultimate generality came to be limited by certain formal paradoxes, which become unavoidable if the doors of formal set-theoretic definition are opened too widely. These arise very simply. Suppose, for example, that we allow ourselves to consider 'the set of all sets that are not members of themselves'. In a formal notation very close to that continually used below, this is simply s = {x: x notin x}. But now consider the proposition 's in s'. On formal grounds this is equivalent to 's in {x: x notin x}', and so by the very definition of set membership to the proposition 's notin s'. So in these few formal steps we have derived the proposition
s in s *eq s notin s,
a situation around which no coherent logical system can be built. The means adopted to avoid this immediate collapse of the formal structure that one wants to build is to restrict the syntax of the set-formers which can legally be written, in a way which forbids constructions like {x: x notin x} without ruling out the similar but somewhat more limited expressions needed to express the whole of standard mathematics. These fine adjustments to the formal structure of logic were worked out, first by Russell and Whitehead, later and a bit differently by their successors.
A higher technical polish was put on all this work by 20th century efforts. Cantor's work was extended, and began to be formalized, by Zermelo in 1908, and more completely formalized by Fraenkel in 1923. The axiomatization of set theory at which they arrived is called Zermelo-Fraenkel set theory. Starting in 1905 the great German mathematician David Hilbert began the influential series of studies of the algebra of logic, later summarized in his 1939 work Foundations of Mathematics (with Paul Bernays). First in his 1925 paper 'An Axiomatization of Set Theory', and then in a fuller 1928 version, John von Neumann elegantly recast the Zermelo-Fraenkel set formalism, along with Frege's analysis of the concept of number, by encoding the integers set-theoretically: the number 0 as the empty set, 1 as the singleton-set {0}, 2 as the set {0,1}, and more generally each integer n as the n-element set {0,1,...,n - 1}. A corresponding, equally elegant definition of the notions of ordinal and cardinal numbers (both finite and infinite) was given in von Neumann's carefully honed formalism, from which the more computer-oriented exposition found later in the present work derives very closely.
Especially at first, Hilbert's logical studies stood in a positive relation to the program proposed by Leibniz, since it was hoped that close analysis of the algebra of logic might in principle lead to a set of algorithms allowing any mathematical statement to be decided by a suitable calculation. But the radical attack on the intuitive soundness of non-constructive Cantorian reasoning and of the conventional foundations of mathematics published by the Dutch mathematician L.E. J. Brouwer in 1918 led Hilbert's work in a different direction. Hilbert hoped that the 'meta-mathematical' tools he was developing could be used to reply to Brouwer's critique. For this reply, a combinatorial analysis of the algebra of logic, to which Brouwer could have no objections since only constructive arguments would be involved, would be used metamathematically to demonstrate formal limits on what could be proved within standard mathematics, and in particular to show that no contradiction could follow from any standard proof. Once done, this would demonstrate the formal consistency of standard mathematics within a Brouwerian framework. But things turned out differently. In a startling and fundamentally new development, the metamathematical techniques pioneered by the Hilbert school were used in 1931 by Kurt Gödel to show that Hilbert's program was certainly unrealizable, since no logical system of the type considered by Hilbert could be used to prove its own consistency. The brilliance of this result changed the common professional view of logic, which came to be seen, not as a Leibnizian engine for the formal statement and verification of ordinary mathematics, but as a negatively-oriented tool for proving various qualitative and quantitative limits on the power of formalized mathematical systems.
In the late 1940's the coming of the computer brought in new influences. Expression in a rigorously defined system of formulae makes mathematics amenable to computer processing, and daily work with computer programs makes the absolute rigor of formalized mathematical systems obvious. The possibility of using computer assistance to lighten the tedium (so evident in Russell and Whitehead) of fully formalized proof began to make the Leibniz program seem more practical. (Initially it was even hoped that suitably pruned computer searches could be used rather directly to find many of the ordinary proofs used in mathematics). The fact that the methods of formalized proof could be used to check and verify the correctness of computer programs gave economic importance to what would otherwise remain an esoteric endeavor. Computerized proof verifier systems, emphasizing various styles of proof and potential application areas, began to appear in the 1960's. The system described in the present text belongs to this stream of work.
A fully satisfactory formal logical system should be able to digest 'the whole of mathematics', as this develops by progressive extension of mathematics-like reasoning to new domains of thought. To avoid continual reworking of foundations, one wants the formal system taken as basic to remain unchanged, or at any rate to change only by extension as such efforts progress. In any fundamentally new area work and language will initially be controlled more by guiding intuitions than by precise formal rules, as when Euclid and his predecessors first realized that the intuitive properties of geometric figures in 2 and 3 dimensions, and also some familiar properties of whole numbers, could be covered by modes of reasoning more precise than those used in everyday life. Similarly, the initially semiformal languages that developed around studies of the 'complex' and 'imaginary' roots of algebraic equations, the 'infinitesimal' quantities spoken of in early versions of the calculus, the 'random' quantities of the probabilist, and the physicist's 'Dirac delta functions', all need to be absorbed into a single formal system. This is done by modeling the intuitively grasped objects appearing in important semi-formalized languages by precisely defined objects of the formal system, in a way that maps all the useful statements of the imprecise initial language into corresponding formulae. If less than vital, statements of the initial language that do not fit into its formalized version can then be dismissed as 'misunderstandings'.
The mathematical developments surveyed in the preceding discussion succeeded in re-expressing the intuitive content of geometry, arithmetic, and calculus ('analysis') in set-theoretic terms. The geometric notion of 'space' maps into 'set of all pairs (or triples) of real numbers', preparing the way for consideration of the 'set of all n-tuples of real numbers', as 'n-dimensional space', and of more general related constructs as 'infinite dimensional' and 'functional' spaces. The 'figures' originally studied in geometry map, via the 'locus' concept, into sets of such pairs, triples, etc. The next necessary step is to analyze the notion of real number into something more basic, the essential technical requirement for this being to ensure that no function roots (e.g. Pythagoras' square root of 2) are 'missing'. As noted above, this was accomplished by Dedekind, who reduced 'real number x' to 'nonempty set x of rational numbers, bounded above, such that every rational not in x is larger than every rational in x'. To eliminate everything but set theory from the formal foundations of mathematics, it only remains (since 'fractions' can be seen as pairs of numbers) to reduce the notion of 'integer' to set-theoretic terms. This was done by Cantor and Frege: an integer is the class of all finite sets in 1-1 correspondence with any one such set. None of the other important mathematical developments enumerated in the preceding paragraph required fundamental extension of the set-theoretic foundation thereby attained. Gauss realized that the 'complex' numbers used in algebra could be modeled as pairs of real numbers, Kolmogrorov modeled 'random' variables as functions defined on an implicit set-theoretic measure space, and Laurent Schwartz interpreted the initially puzzling 'delta functions' in terms of a broader notion of generalized function defined systematically in set theoretic terms. So all of these concepts were digested without forcing any adjustment of the set-theoretic foundation constructed for arithmetic, analysis, and geometry. This foundation also supports all the more abstract mathematical constructions elaborated in such 20th century fields as topology, abstract algebra, and category theory. Indeed, these were expressed set-theoretically from their inception. So (if we ignore a few ongoing explorations whose significance remains to be determined) set theory currently stands as a comfortable and universal basis for the whole of mathematics.
It can even be said that set theory captures a set of reality-derived intuitions more fundamental than such basic mathematical ideas as that of number. Arithmetic would be very different if the real-world process of counting did not return the same result each time a set of objects was counted, or if a subset of a finite set S of objects proved to have a larger count than S. So, even though Peano showed how to characterize the integers and derive many of their properties using axioms free of any explicit set-theoretic content, his approach robs the integers of much of their intuitive significance, since in his reduced context they cannot be used to count anything. For this and the other reasons listed above, we will work with a thoroughly set-theoretic formalism, contrived to mimic the language and procedures of standard mathematics closely.
The special nature of mathematical reasoning within human reason in general
The syllogistic patterns characteristic of mathematical reasoning derive from, and thus often reappear in, other reasoned forms of human discourse, for example in arguments offered by lawyers and philosophers. Mathematical reasoning is distinguished within this world of reason by its rigorous adherence to the pattern originally set by Euclid. Some fixed set of statements, the axioms, perhaps carrying some insight about an observed or intuited world, must be firmly set down. Certain named predicates (and perhaps also function symbols) will appear in these axioms. The ensuing discourse (which may be lengthy) must work exclusively with properties of these predicates (and symbols) which follow formally from the axioms, precisely as if these predicates had no meaning other than that which the axioms give them. When new vocabulary is introduced (as will generally be necessary to provide intellectual variety and sustain interest) this must be by formal definition in terms of predicates (and function symbols) which either are those found in the axioms or which appear earlier in the discourse. Such extensions of vocabulary are subject to rules which ensure that all new symbols introduced can be regarded as tools of art which add nothing fundamental to the axioms. That is, mathematics' rules of definition ensure that allowed extensions of vocabulary cannot make it possible to prove any statement made in the original vocabulary that could not be proved, in the axioms' original vocabulary, from the axioms. This rule, which insists that definitions must be devoid of all hidden axiomatic content, is fundamental to mathematics. It will appear in our later technical discussions as the conservative extension principle.
Legal, philosophical, and scientific reasoning commonly fail to observe the rules which restrict mathematical discourse, since these styles of agument allow new terms with explicitly or implicitly assumed properties to be introduced far more freely. Science cannot avoid this, since it is dedicated to exploration of the world in all its variety, and must therefore speak of what it finds as best it can. But unconstrained introduction, into a line of reasoning, of even a few new terms having implicitly assumed properties can readily become an engine of deception (and of self-deception). Science tries to avoid such self-deception by taking all of its reasoned outcomes as provisional subject to comparison with observed reality. If observation conflicts with the outcome of a line of scientific reasoning, the assumptions and informal definitions entering into it will be adjusted until better agreement is attained. Legal and philosophical reasoning, lacking this mechanism, remain more permanently able to be used as engines of deception (perhaps deliberate) or of self-deception (which has its intellectual delights).
1.2 Proof verifiers
A Proof verifier is an interactive program for manipulation of the state of a mathematical discourse. It allows computer checking of such discourse in full detail, and collection of the resulting theorems for subsequent re-use. It must
(a) only allow theorems to be derived;
(b) allow all theorems to be derived.
Besides their theoretical interest, proof verifiers have one potential practical use: Program Verification. To adapt a proof verifier to this use, we can simply annotate (ordinary procedural) programs with assertions A breaking every loop in their control flow. Then, for every path forward through the annotated program P and its assignments
x1 := expn(x1,...,xn) running from an assertion A1 immediately before such an assignment to an assertion A2 immediately after the assignment we must show that
(FORALL x1,...,xn | A1(x1,...,xn) *imp A2(expn(x1,...,xn),x2,...,xn))
holds. Once this has been done systematically throughout the program, we can be sure that the program is correct.
To give proofs acceptable to a programmed verifier, i.e. proofs every one of whose details can be checked by a computer, we must 'walk in shackles'; but then we want these shackles to be as light as possible. That is, we want the ordinary small steps of mathematical discourse to remain small, rather than expanding into tedious masses of detail. We aim for a formalized interactive conversation with the computer whose general 'feel' resembles that of ordinary mathematical exposition. The better we succeed in achieving this, the closer the verifier comes to passing the 'Turing test', at least in the restricted mathematical setting in which it is designed to operate. So the internal structure of a successful proof verifier can be seen as a model both of mathematics and of mathematical intelligence, which is an important, albeit limited, form of intelligence in general.
1.3 Informal introduction to the formalism in which we will work
A proof verifier must provide various tools. First of all, it must allow the elementary steps of proofs to be expressed by formulae in some agreed-on system. These formulae become the elementary steps which the system allows. The system-provided tools, which embody the system's 'deduction rules', must allow manipulation of these formulae in ways which mimic the normal flow of a mathematical discourse.
The collection of proofs presented to a verifier for validation is expressed as a sequence of logical formulae, to which we may attach formalized annotations to guide the action of the verifier. Given such a sequence of formulae, the verifier first checks all the statements presented to it for syntactic legality, and then goes on to verify the successive statements of each proof. As in ordinary proof, the verifier's user aims to guide discourse along paths which bring designated target theorems into the collection of proved statements. This is done by arranging the formulae (proof steps) of the discourse in such a way as to ensure that each step encountered satisfies the conditions required for it to be accepted as a consequence of what has gone before. This will be the case in various situations, each corresponding to one of the basic deduction rules which the system allows. Broadly speaking, these are as follows:
(i) Immediate deduction
The collection of statements already accepted as proved are always included in a 'penumbra' D of additional statements which follow from them as elementary consequences. The verifier as programmed is able to check that each statement in D follows immediately from statements already accepted. Some well-known examples are as follows.
(a) If a formula F in a proof is preceded by an (already accepted) formula G, and by a second (already accepted) formula of the form 'G *imp F', where '*imp' is the operator sign designating implication, then F will be accepted;
(b) If a formula 'x in E' in a proof is preceded by an (already accepted) formula 'x in H', and by a second (already accepted) formula 'E incs H', 'where 'incs' is the operator sign designating set-theoretic inclusion, then 'x in E' will be accepted;
(c) If (c.1) we are given a formula having the syntactic structure 'P(e)', where 'P(x)' is a formula containing a variable x, and P(e) is the result of replacing each of the occurrences of x in P with an occurrence of the (syntactically well-formed) subexpression 'e'; (c.2) the formula P(e) is preceded by an (already accepted) formula (FORALL x | P(x)), where the symbol 'FORALL' designates the 'universal quantifier' construct of logic, then P(e) will be accepted.
The more we can enlarge the available family of immediate deductions by extending a verifier's immediate-deduction algorithms, the more we will succeed in reducing the number of steps needed to reach our target theorems. Means for doing this are explored later in this chapter, and then more systematically in Chapter 3.
(ii) Proof by 'supposition' and 'discharge' ('Natural Deduction')
At any point in a proof, any syntactically well-formed statement S can be introduced for provisional use by including a verifier directive of the form
Suppose ==> S.
Conclusions can be drawn from such statements in the normal way, but such conclusions are not accepted as having been definitively proved, but only as having been 'provisionally proved', subject to the 'assumption' expressed by S. However, if such an assumption S can be shown to lead to the impossible conclusion 'false', then S can be 'discharged', i.e. its negation 'not S' can be accepted as a definitely proved formula. This manner of proceeding mimics the familiar method of 'proof by contradiction' (also called 'reductio ad absurdum') of ordinary mathematical discourse.
(iii) Use of definitions
Statements which introduce entirely new constant or function names can be true 'by definition'. Suppose, for example, that constants b and c, and a monadic function symbol f, have already been introduced into a discourse, and that d is a name not previously used. Then the statement
d = f(b,f(c,b))
can be accepted immediately, since it merely defines d, i.e. makes an initial reference to an object d concerning which we know nothing else. Such definitions are subject to rules which serve to ensure that the new symbols introduced by such definitions imply only those properties of previously introduced symbols which are entailed by our previous knowledge concerning them. For example, a statement like
b = f(b,f(d,b))
is not a valid definition for a new constant d, since at the very least it implies that there exists some x for which b = f(b,f(x,b)) (and this may be false).
Definitions serve various purposes. At their simplest they are merely abbreviations which concentrate attention on interesting constructs by assigning them names which shorten their syntactic form. (But of course the compounding of such abbreviations can change the appearance of a discourse completely, transforming what would otherwise be an exponentially lengthening welter of bewildering formulae into a sequence of sentences which carry helpful intuitions). Beyond this, definitions serve to 'instantiate', that is, to introduce the objects whose special properties are crucial to an intended argument. Like the selection of crucial lines, points, and circles from the infinity of geometric elements that might be considered in a Euclidean argument, definitions of this kind often carry a proof's most vital ideas.
As explained in more detail below, we use the dictions of set theory, in particular its general set formers, as an essential means of instantiating new objects. As we will show later by writing a hundred or so short statements which define all the essential foundations of standard mathematics, set theory gives us a very flexible and powerful tool for making definitions.
Our system allows four forms of definition. The first of these is definition using set formers (or 'algebraic constructions' more generally), as exemplified by
Un(s) := {y: x in s, y in x}
(which defines 'the set of all elements of elements of s', i.e. 'the union of all elements of s'), and assigns it the symbol 'Un' (which must never have been used previous to this definition). A second example is
Less_usual(s) := {y: x in s, y in x} - s
(which defines 'the set of all elements of elements of s which are not directly elements of s').
The second form of definition allowed generalizes this kind of set-theoretic definition in a less commonly used but very powerful way. In ordinary definitions, the symbol being defined can only appear on the left-hand side of the definition, not on its right. This standard rule prohibits 'circular' definitions. In a recursive definition this rule is relaxed. Here the symbol being defined, which must designate a function of one or more variables, can also appear on the right of the definition, but only in a special way. More specifically, we allow function definitions like
f(s,t) := d({g(f(x,h1(t)),s,t): x in s | P(x,f(x,h2(t)),s,t)})
where it is assumed that d, g, h1, h2, and P are previously defined symbols and that f is the symbol being defined by the formula displayed. Here circularity is avoided by the fact that the value of f(s,t) can be calculated from values f(x,t') for which we can be sure that x is a member of s, so x must come before s in the implicit (possibly infinite) sequence of steps which build sets up from their members, starting with the empty set as the only necessary foundation object for the so-called 'pure' theory of sets.
'Transfinite recursive' definitions like that displayed above give us access to the sledgehammer technique called 'transfinite induction', which like other sledgehammers we use occasionally to break through key obstacles, but generally set aside.
The third and fourth forms of definition allowed, 'Skolemization' and use of 'theories', are explained later.
1.4 More about our formalism
Any formalism begins with some initial 'endowment', i.e. system of allowed formulae and built-in rules for the derivation of new formulae from old. If one intends to use such a formalism as a basis for metamathematical reasoning, one may aim to simplify the implied combinatorial analyses of the formalism by minimizing this endowment. But we intend to use our formalism to track ordinary mathematical reasoning as closely and comfortably as we can; hence we streamline the endowment of formulae and formula transformations with which our system begins, but try to maximize its power. Accordingly, the system we propose incorporates various very powerful means for definition of objects and proof of their properties.
Propositional and predicate calculus
First consider what is most necessary, which we will handle in entirely standard ways. The apparatus of Boolean reasoning is needed if we are to make such statements as 'a and b are both true', 'a or b is true', 'a implies b', etc. The 'propositional calculus' required for this is elementary, and easily automated. We simply adopt this calculus, writing its operators as '&' (conjunction), 'or' (disjunction), 'not' (negation), '*imp' (implication), '*eq' (logical equivalence). Our system is decidable, in the sense it includes an algorithm able to detect statements which are universally true by virtue of their propositional form. This will, for example, automatically detect that
(p *imp q) *imp ((not q) *imp (not p)) and
(F(x + y) = F(F(x)) *imp F(F(x)) = 0) *imp (F(F(x)) /= 0) *imp (F(x + y) /= F(F(x)))) are both always true. The first of these formulae belongs directly to the 'propositional calculus'. Automatic treatment of the second formula uses a fundamental internal system operation called 'blobbing', which works by reducing formulae to skeletal forms legal in some tractable sublanguage of the full set-theoretic language in which we work. Applied to the second formula displayed above, 'blobbing' sees it to have a Boolean skeleton identical to that of the first. More is said about this important technique below.
Statements of the form 'for all..' and 'there exists ...', as in 'for all integers n greater than 2 there exists a unique non-decreasing sequence of prime integers whose product is n', are obviously needed for mathematics. To handle these, we adopt the standard apparatus of the 'predicate calculus' (or more properly 'first order predicate calculus'). This extends the propositional calculus by allowing its proposition-symbols p,q,... to be replaced by predicate subformulae constructed recursively out of
(i) constants c and variables x denoting specified or arbitrary objects drawn from some (implicit) universe U of objects.
(ii) Named predicates, e.g. P(x,y), Ord(x), Between(x,c,z), depending on some given number of constants and variables, which for each combination x,y,... yield some true/false (i.e. Boolean) value.
(iii) Named function symbols, e.g. f(x), g(x,y), h(x,c,z), depending on some given number of constants and variables, which for each combination x,y,... chosen from the 'universe' U yield an object belonging to this same universe.
(iv) Two 'quantifiers',
(FORALL x | P(x)) and (EXISTS x | P(x)), respectively representing the constructs 'for all possible values of the variable x , P(x) (the statement which follows the vertical bar) is true' and 'there exists some value of the variable x for which P(x) (the statement which follows the vertical bar) is true'. For example, to express the condition that at least one of the predicates P(x) and Q(x) is true for each possible value of the variable x, we write
(FORALL x | P(x) or Q(x)).
To state that exactly one of these conditions is true for every possible value of the variable x, we can write
(FORALL x | (P(x) or Q(x)) & (not (P(x) & Q(x))))
To state that for each possible value of the variable x having the property P(x) there exists a value standing in the relationship R(x,y) to it, we can write
or equivalently (1a) (FORALL x | P(x) *imp (EXISTS y | R(x,y))),
(1b) (FORALL x | (EXISTS y | P(x) *imp R(x,y))).
It should be plain that this predicate notation allows us to write universally and existentially quantified statements generally, provided only that names are available for all the multivariable predicates in which we are interested.
Intuitively speaking, a universally quantified (resp. existentially quantified) formula represents the conjunction (resp. disjunction) of all possible cases of the formula; e.g., (FORALL x | P(x)) can be regarded as a formalized abbreviation for the 'infinite conjunction' that might be written informally as
P(x1) & P(x2) & P(x3) & , ...
where x1, x2, x3,... is an enumeration of all the values which the variable x can assume. Similarly, an existentially quantified statement like (EXISTS x | P(x)) can be regarded as a formalized abbreviation for the 'infinite disjunction' that might be written as
P(x1) or P(x2) or P(x3) or ... .
This shows us why the two predicate formulae (1a) and (1b) displayed above are equivalent, namely this informal style of interpretation explicates (FORALL x | P(x) *imp (EXISTS y | R(x,y))) as
(P(x1) *imp (EXISTS y | R(x1,y))) & (P(x2) *imp (EXISTS y | R(x2,y))) & ...
and hence as
(1) (P(x1) *imp (R(x1,x1) or R(x1,x2) or R(x1,x3) or ...)
& (P(x2) *imp (R(x2,x1) or R(x2,x2) or R(x2,x3) or ...)
& ... .
Expansion of (FORALL x | EXISTS y | P(x) *imp R(x,y)) in exactly the same way results in
(2) ((P(x1) *imp R(x1,x1)) or (P(x1) *imp R(x1,x2)) or (P(x1) *imp R(x1,x3)) or ...)
& ((P(x2) *imp R(x1,x1)) or (P(x2) *imp R(x1,x2)) or (P(x2) *imp R(x1,x3)) or ...)
& ... .
Applying the standard propositional reduction of the implication operator 'p *imp q' to '(not p) or q' and using the commutativity of the disjunction operator 'or', we can rewrite the first line of (1) as
(not P(x1)) or (R(x1,x1) or R(x1,x2) or R(x1,x3) or ...)
and the first line of (2) as
(not P(x1) or R(x1,x1)) or (not P(x1) or R(x1,x2)) or (not P(x1) or R(x1,x3)) ... ,
respectively, and similarly for all later lines. But since disjunction is idempotent, i.e.
p or p or p or ...
is exactly equivalent to p, the two propositional expansions seen above are equivalent. Hence the claimed equivalence of
(FORALL x | P(x) *imp (EXISTS y | R(x,y))) and (FORALL x | EXISTS y | P(x) *imp R(x,y))
is intuitively apparent. We will explain later how the predicate calculus manages to handle all of this formally.
Set theory: the third main ingredient of our formalism
We view set theory as the established language of mathematics and take a rich version of it as fundamental. In particular, the language with which we will work includes a full sublanguage of set formers, constrained just enough to avoid paradoxical constructions like the {x: x notin x} setformer discussed above. Setformer expressions like
{e(x): x in s | P(x)},
{e(x,y): x in s(y) | P(x,y)},
{e(x,y,z): x in s(z), y in s'(x,z) | P(x,y,z)}
and even
{e(x,y,z,w): x in s(w), y in s'(x,w), z in s''(x,y,w) | P(x,y,z,w)}
are all allowed, as are
{e(x): x *incin s | P(x)},
{e(x,y): x *incin s(y) | P(x,y)},
{e(x,y,z): x *incin s(z), y *incin s'(x,z) | P(x,y,z)},
and
{e(x,y,z,w): x *incin s(w), y in s'(x,w), z *incin s''(x,y,w) | P(x,y,z,w)},
which use the sign '*incin' designating set inclusion in place of one or more occurrences of the sign 'in' (designating set membership).
Set formers have several crucial advantages as language elements. First of all, they give us very powerful means for defining most mathematical objects of strategic interest. This allows the very succinct series of mathematical definitions given later, which lead in roughly 100 lines from rudimentary set theoretic concepts to core statements in analysis (e.g. the Cauchy integral theorem). A second advantage of set formers traces back to the fact that the human mind is 'perception dominated', in the sense that we all depend heavily upon many innate perceptual abilities, which operate rapidly and subconsciously, and by which the conscious (and reasoning) abilities of the mind are largely limited. Perceivable things and relationships can be dealt with rapidly. Where direct perception fails, we must fall back on more tortuous processes of reconstruction and detection, slowing progress by orders of magnitude. Hence the importance of notations, diagrams, graphs, animations, and scientific visualization techniques generally (e.g. the Arabic numerals, algebra, calculus, 'commutative diagrams' in topology, etc.). Among innate perceptual abilities we count the ability to decode spoken and written language, to remember phrases and simple relationships among them, and to recognize various language-like but somewhat more abstract syntactic structures. From this point of view, much of the importance of set theory and its set-former notations lies in the fact that their syntax reveals various simplifications and relationships with which the mind operates comfortably. These include:
(i) Various algebraic transformations of set formers, of which
{e(x): x in {e'(y): y in s | Q(y)} | P(x)} = {e(e'(y)): y in s | P(e'(y)) & Q(y)}
and
{e(x): x in {e'(y,z): y in s1, z in s2 | Q(y,z)} | P(x)} =
{e(e'(y,z)): y in s1, z in s2 | P(e'(y,z)) & Q(y,z)}
are typical.
(ii) Setformer expressions make various important monotonicity and domination relationships visible. For example, a glance at
{e(x): x in s | F(x) in s - t} tells us that this expression is monotone increasing in s and monotone decreasing in t. From this, a statement like
(g(a) incs g(b) & h(a) *incin h(b)) *imp ({e(x): x in s | F(x) in g(a) - h(a)} incs {e(x): x in s | F(x) in g(b) - h(b)})is obvious by elementary reasoning concerning set unions, differences, and inclusions, which an algorithm can handle very adequately.
Deductions like this are frequent in the long sequence of steps which we will use to verify the standard mathematical material at which this text aims. Hence the stress we lay on deduction methods like that just explained, which we make available within our system under such names as ELEM ('elementary set-theoretic deduction', expanded as much as we dare), SIMPLF (deduction methods based on algebraic simplification), etc. Hence also the special methods provided to deal with set-theoretic, predicate, and algebraic monotonicity.
The setformer constructs described above, and the other elementary operations of set theory, play two roles. On the one hand, they define operations on finite sets which can be implemented explicitly, for example by programming them systematically so as to create a full programming language which allows free use of finite sets as data objects. On the other hand, they define a language in which one can talk about a much larger universe of infinite sets, even though such sets can have no explicit representation other than the formulae used to speak of them. Since the formulae used to speak of infinite sets are the same as those used for finite sets, and since much the same axioms are assumed for sets of both kinds, many of the properties deduced for infinite sets stand in analogy to the more directly visible properties of finite sets.
A few simple but basic set constructs. The operation {x,y} which forms the (unordered) pair of two sets is an important but entirely elementary set operation. For this we have
z in {x,y} *eq (z = x or z = y).Then plainly {x,x} satisfies z in {x,x} *eq z = x, so {x,x} is the singleton {x} whose only member is x.The setformer expression
Un(x) := {z: y in x, z in y}defines the set of all z which are elements of some element of x. This is the so-called 'general union set' of x, which can be thought of as 'the union of all elements of x'. Since we havez in Un({x,y}) *eq (z in x or z in y),Un({x,y}) is the set of all z which are either members of x or of y. This very commonly used operation is generally written as x + y. Given any two sets x and y, it gives us a way of constructing a set at least as large as either of them, of which both are subsets.We can use the union operator to define the sets having three, four, etc. given elements by writing
{x,y,z} = {x,y} + {z}, {x,y,z,w} = {x,y,z} + {w},...It is easily proved from these definitions thatu in {x,y,z} *eq (u = x or u = y or u = z), u in {x,y,z,w} *eq (u = x or u = y or u = z or u = w),etc. The intersection operator, which gives the common part of two sets s and t, can be defined directly by a setformer:s * t := {x: x in s | x in t}.The powerset operator, which gives the set of all subsets of a set s, can also be defined by a setformer expression:pow(s) := {x: x *incin s}.The choice operator 'arb'. The less elementary 'choice' operation arb(s) reflects the intuition, verifiably true in the hereditarily finite case discussed in Chapter 2, that all sets can be constructed in an order in which all the elements of set s are constructed before s itself is constructed. Since, as we shall see, a finite string representation is available for each hereditarily finite set, we can arrange such sets in order of the length of their string representations. Then arb(s) can be defined for each finite set as the first member of s, in this standard order. We complete this definition for the one special case in which s has no members, i.e. is the null set, by agreeing that arb({}) = {}. Then, for each nonempty set s, arb(s) must be disjoint from s, since if x were a common member of s and arb(s), x would have to be an element of s coming earlier than arb(s) in standard order, contradicting our definition of arb(s) as the first element of s in this order. Hence, whenever this notion of 'construction in some standard order' applies, we can expect the 'arb' operator, defined in the manner just explained, to satisfy
(FORALL s | (s = {} & arb(s) = {}) or (arb(s) in s & arb(s) * s = {})).This statement, intuitively justified in the manner just explained, is taken as an axiom in the version of set theory used in this book. It is assumed to apply to all sets, whether finite or infinite. In conventional terms, this axiom states a very strong form of the so-called 'axiom of choice': arb chooses a first element from each nonempty set, 'first' in the sense that there exists no other element of s which is also an element of arb(s).It follows that there can exist no set x for which
x in x.For if there were, we would have arb({x}) = x, and so x would be a common element of {x} and arb({x}), contradicting our assumption concerning 'arb'. It follows similarly that there can exist no 'membership cycle', i.e. no sequence x1,x2,...,xn of sets of which each is a member of the next and for which the last is a member of the first. For if there were, we would have arb({x1,x2,...,xn}) = xj for some j, and then either xj - 1 or xn would be a common element of arb({x1,x2,...,xn}) and {x1,x2,...,xn}. Much the same argument shows that there can exist no infinite sequence x1,x2,...,xn,... for which each xj + 1 is a member of xj. Note however that x, {x},{{x}},... is always a sequence each of whose components is an element of the next following component.The 'arb' operator as the basis for proofs by transfinite induction. The standard (Peano) principle of mathematical induction is equivalent to the statement that every non-empty set s of integers contains a smallest element n0. For suppose that P(n) is a predicate, defined for integers, for which the implication
(FORALL n | (FORALL m | (m < n) *imp P(m)) *imp P(n))has been established, that is, for which P(n) must be true for a given n if it is true for all smaller m. Then P(n) must be true for all integers n. For if not, the set of all integers n such that P(n) is false will be nonempty, and so will contain a smallest integer n0. But then P(m) is clearly true for all m < n0, implying that P(n0) is true, contrary to assumption.Use of the 'arb' operator allows us to extend this very convenient style of inductive reasoning to entirely general sets, irrespective of whether they are finite or infinite. Suppose, more specifically, that P(s) is a predicate, defined for sets, for which the implication
(FORALL s | (FORALL t | (t in s) *imp P(t)) *imp P(s))has been established. That is, we suppose that P(s) must be true for a given s if it is true for all members of s. Then P(s) must be true for all sets s. For if not, then P(s) must be false for some member s1 of s. Repeating this argument, we see that there must exist a member s2 of s1 for which P(s2) is false, then a member s3 of s2 for which P(s3) is false, and so forth. This gives us an infinite sequence s = s0,s1,s2,...,sn,..., each component of which is a member of the preceding component, which we have seen to be impossible.This very broad generalization of the ordinary principle of mathematical induction is called the principle of transfinite induction. It plays much the same role for the infinite ordinals discussed in the next section that the ordinary principle of mathematical induction plays for integers.
Ordered pairs We need, in many situations, not the unordered pair construct {x,y} described above, but rather an ordered pair construct [x,y]. The only properties of [x,y] that we require are: (i) [x,y] is defined for any two sets x, y and is itself a set; (ii) the pair [x,y] defines its two components x and y uniquely, i.e. there exist operations car(z) and cdr(z) such that car([x,y]) = x and cdr([x,y]) = y for all x and y. It is not necessary to add these statements as additional set-theoretic axioms, since the necessary pairing operations can be defined using the unordered pair construct {x,y} and the arb operator, in any number of (artificial) ways (none of them having any particular significance). For example, we can use the definition
[x,y] := {{x},{{x},{{y},y}}}.Then arb([x,y]) = {x}, since the only other element of {{x},{{x},{{y},y}}} has the element {x} in common with [x,y]. Thus the expression arb(arb([x,y])) always reconstructs x from [x,y]. Moreover {{x},{{x},{{y},y}}} - {{x}} = {{{x},{{y},y}}}, soarb(arb([x,y] - {arb([x,y])}) - {arb(x)} = {{y},y},and therefore the expression arb(arb(arb([x,y] - {arb([x,y])}) - {arb(x)}) reconstructs y from [x,y]. The reader is invited to amuse him/herself by inventing other like constructions having similar properties.Once ordered pairs and the operators which extract their components have been defined in this way, it is easy to define the general set-theoretic notion of 'relationship' and the associated notions of 'single-valued mapping', 'inverse relationship', and '1-1 relationship'. A relationship or mapping, or just map, is simply a set of ordered pairs. To formalize this, we have only to write
is_map(f) := (f = {[car(x),cdr(x)]: x in f}).The domain and range of a relationship are then defined in the usual way asdomain(f) := {car(x): x in f}andrange(f) := {cdr(x): x in f}respectively. A relationship is single-valued if the first component u of each pair [u,v] in it defines the associated second component v uniquely. Formally this isSvm(f) := is_map(f) & (FORALL x in f | (FORALL y in f | (car(x) = car(y)) *imp (x = y)))The inverse of a relationship is defined byinv(f) := {[cdr(x),car(x)]: x in f}.A relationship is 1-1 if it and its inverse are both single-valued. Other standard constructs involving mappings, for example the composition of two mappings, are equally easy to define.Integers and ordinal numbers in set theory. As noted above, John von Neumann suggested that the fundamental mathematical notion of 'integer' be expressed set-theoretically by encoding 0,1,..,.n,... set theoretically as {},{0},...,{0,1,..,n - 1},... The set Z of all integers is then
{0,1,..,n,...}.
All of these sets s, including the infinite set Z, have the following properties:
(i) any member of a member of s is also a member of s;
(ii) given any two distinct members x,y of s, one of x and y must (come earlier in the sequence in which we have enumerated the members of s, and so must) be a member of the other;
von Neumann then realized that sets having these two properties had exactly the properties of 'ordinal numbers' as originally defined by Cantor, so that (i) and (ii) can be taken as the definition of the notion of ordinal number. Besides its striking directness and simplicity, this definition has the advantage (over Cantor's original definition) of representing each ordinal number by a unique set. Moreover, all the basic operations on infinite ordinals which Cantor introduced take on simple set-theoretic forms if ordinals are defined in this way. For example, for the integers in their von Neumann representation, each integer m less than an integer n is a member of n; hence the arithmetic relationship 'm < n' can be defined 'm in n', i.e. by the simplest of all set theoretical relationships. We use this definition, i.e. 's less than t' means simply 's in t', for arbitrary ordinals s.
Instantiation and proof by use of 'Theories'
The 'theory' mechanism which our system provides relates to logical proof in something like the way in which the use of 'procedures' relates to programming practice. It facilitates introduction of symbol groups or single symbols (like the standard mathematical summation operator S and the rather similar product operator P) which derive from previously defined functions and constants ('+' and '0' in the case of S, multiplication and '1' in the case of P), that have the properties required for definition of the new symbols. As these examples indicate, our 'theory' mechanism eases an important class of instantiations which need to be justified by supporting theorems. It adds a touch of second-order logic capability to the first-order system in which we work.
The syntax used to work with 'theories' is described by the following procedure-like template.
THEORY theory_name(list_of_assumed_symbols) assumptions ==>(list_of_defined_symbols) conclusions END theory_name;
The formal description of the important 'theory of sigma', which we will use as a running example, illustrates the way in which we set up and use theories. This theory captures a construction, ubiquitous in mathematical practice, which is normally written using 'three dots' notation, e.g. as f1 + f2 + ... + fk.
THEORY SIGMA_theory(s,PLUZ,e) e in s (FORALL x in s | (FORALL y in s | x PLUZ y in s)) (FORALL x in s | x PLUZ e = x) (FORALL x in s | (FORALL y in s | x PLUZ y = y PLUZ x)) (FORALL x in s | (FORALL y in s | (FORALL z in s | (x PLUZ y) PLUZ z = x PLUZ (y PLUZ z)))) ==> (SIGMA) (FORALL f | (Finite(f) & Svm(f) & range(f) *incin s) *imp (SIGMA(f) in s & SIGMA({}) = e & (FORALL x,y | SIGMA({[x,y]}) = y) & (FORALL t | SIGMA(f) = SIGMA(f | (domain(f) * t)) PLUZ SIGMA(f | (domain(f) - t))) & (FORALL x in domain(f) | SIGMA(f) = SIGMA(f | (domain(f) - {x})) PLUZ arb(f{x})) & (FORALL g | (Finite(g) & Svm(g) & domain(f) = domain(g)) *imp SIGMA(f) = SIGMA({[y,SIGMA(f | INV_IM{g,y})]: y in range(g)})))) END SIGMA_theory(The final conclusion displayed encapsulates a general 'rearrangement of sums' principle). The assumed_symbols of this theory are s, PLUZ, and e, and its only defined symbol is SIGMA. 'Finite' and 'Svm' are standard set-theoretic predicates, which we assume to have been defined prior to the introduction of the theory displayed: 'Finite(f)' states that f is finite, and 'Svm(f)' states that f is a single-valued map. Similarly, 'domain(f)' and 'range(f)' denote the domain and range of f respectively, 'f|d' denotes the restriction of f to d (namely the largest possible map which is included in f and whose domain is included in d), and 'INV_IM{g,y}' denotes the set of all elements of the domain of g which g maps into the element y. f{x} designates the range of f on the set {x}, and arb(f{x}) the unique element of this range, i.e. the image of x under the single-valued mapping f.
Were the mechanisms of second-order predicate calculus available to us, the meaning of the theory could be rendered precisely by
(FORALL s | (FORALL PLUZ | (FORALL e | (EXISTS SIGMA | (e in s & (FORALL x in s | (FORALL y in s | x PLUZ y in s)) & (FORALL x in s | x PLUZ e = x) & (FORALL x in s | (FORALL y in s | x PLUZ y = y PLUZ x)) & (FORALL x in s | (FORALL y in s | (FORALL z in s | (x PLUZ y) PLUZ z = x PLUZ (y PLUZ z))))) *imp (FORALL f | (Finite(f) & Svm(f) & range(f) *incin s) *imp (SIGMA(f) in s & SIGMA({}) = e & (FORALL x,y | SIGMA({[x,y]}) = y) & (FORALL t | SIGMA(f) = SIGMA(f | (domain(f) * t)) PLUZ SIGMA(f | (domain(f) - t))) & (FORALL x in domain(f) | SIGMA(f) = SIGMA(f | (domain(f) - {x})) PLUZ arb(f{x})) & (FORALL g | (Finite(g) & Svm(g) & domain(f) = domain(g)) *imp SIGMA(f) = SIGMA({[y,SIGMA(f | INV_IM{g,y})]: y in range(g)}))))))))Informally speaking, this second-order formula states that given any set s and commutative-associative operator defined on it, there must exist a monadic function SIGMA which relates to them in the manner stated in the conclusion of the quantified formula displayed. If our formalism allowed the second-order mechanisms (of quantification over function and relation symbols, which it does not) seen here, and were this second-order formula proved, we could substitute any three actual symbols for which the hypotheses of the formula had been proved for the three universally quantified function symbols s, PLUZ, and e which appear, thereby obtaining the existentially quantified conclusion
(EXISTS SIGMA | (e in s & (FORALL x in s | (FORALL y in s | x PLUZ y in s)) & (FORALL x in s | x PLUZ e = x) & (FORALL x in s | (FORALL y in s | x PLUZ y = y PLUZ x)) & (FORALL x in s | (FORALL y in s | (FORALL z in s | (x PLUZ y) PLUZ z = x PLUZ (y PLUZ z))))) *imp (FORALL f | (Finite(f) & Svm(f) & range(f) *incin s) *imp (SIGMA(f) in s & SIGMA({}) = e & (FORALL x,y | SIGMA({[x,y]}) = y) & (FORALL t | SIGMA(f) = SIGMA(f | (domain(f) * t)) PLUZ SIGMA(f | (domain(f) - t))) & (FORALL x in domain(f) | SIGMA(f) = SIGMA(f | (domain(f) - {x})) PLUZ arb(f{x})) & (FORALL g | (Finite(g) & Svm(g) & domain(f) = domain(g)) *imp SIGMA(f) = SIGMA({[y,SIGMA(f | INV_IM{g,y})]: y in range(g)})))))This last statement (still second-order, since it is quantified over the function symbol SIGMA) would allow us to introduce a new symbol SIGMA_ for which
(e in s & (FORALL x in s | (FORALL y in s | x PLUZ y in s)) & (FORALL x in s | x PLUZ e = x) & (FORALL x in s | (FORALL y in s | x PLUZ y = y PLUZ x)) & (FORALL x in s | (FORALL y in s | (FORALL z in s | (x PLUZ y) PLUZ z = x PLUZ (y PLUZ z))))) *imp (FORALL f | (Finite(f) & Svm(f) & range(f) *incin s) *imp (SIGMA_(f) in s & SIGMA_({}) = e & (FORALL x,y | SIGMA_({[x,y]}) = y) & (FORALL t | SIGMA_(f) = SIGMA_(f | (domain(f) * t)) PLUZ SIGMA_(f | (domain(f) - t))) & (FORALL x in domain(f) | SIGMA_(f) = SIGMA_(f | (domain(f) - {x})) PLUZ arb(f{x})) & (FORALL g | (Finite(g) & Svm(g) & domain(f) = domain(g)) *imp SIGMA_(f) = SIGMA_({[y,SIGMA_(f | INV_IM{g,y})]: y in range(g)}))))is known. This final statement is now first-order.The second-order mechanisms needed to proceed in just the manner explained are not available in our first-order setting. The theory mechanism that is provided serves as a partial but adequate substitute for it.
After these introductory remarks we return to a detailed consideration of the general theory template displayed at the start of this section. In it, 'theory_name' names the theory in which we are interested. A theory's 'list_of_assumed_symbols' is analogous to the parameter list of a procedure. It is a comma-separated list of symbol names, which stand for other symbols which must replace the assumed_symbols whenever the theory is applied. The members of the list of 'assumptions' which follow must be formulae which, aside from basic predicate and set-theoretic constructions (quantifiers and set formers), involve only elements of the list_of_assumed_symbols, possibly along with other symbols that have been defined previously to introduction of the theory, in the context in which the theory is introduced. The formal description of the 'theory of SIGMA' given above illustrates these rules.
The 'conclusions' which follow the syntactic delimiter '==>' in the general template must be formulae which, aside from basic predicate and set-theoretic constructions, involve only elements of the list_of_assumed_symbols and the list_of_defined_symbols, along with other symbols that have previously been defined in the context in which the theory is introduced. The elements of the (comma-delimited) list_of_defined_symbols are symbol names, which must be defined within the theory, more precisely as part of a proof (given within the theory), of the theory's stated conclusions. Each defined_symbol is replaced with a previously unused symbol whenever the theory is applied.
Once a theory has been introduced in the manner just explained, and before it can be used, a sequence of theorems and definitions culminating in those which appear as the conclusions of the theory must be proved in the theory. The syntax used to begin this process, which temporarily 'enters' the theory, is simply
ENTER_THEORY theory_nameThis statement creates a subordinate proof context in which the assumed_symbols of the theory, together with all its stated assumptions, are available. Then, using these assumptions, one must give definitions of all the theory's defined_symbols, and proofs of all its conclusions. Once this has been done, one can return from the subordinate logical context to the parent context from which it was entered by executing another ENTER_THEORY command, which now must name the parent theory to which we are returning. (Proof always begins in a top-level context named 'set_theory'). After return, the theory's conclusions become available for application. Note also that theories previously developed in the parent context of a new theory T are available for application during the construction of T.
The syntax (analogous to that for 'calling' procedures) used to apply theories is
APPLY(new_symbol:defined_symbol_of_theory,...) theory_name(list_of_replacements_for_assumed_symbols)As indicated, the keyword 'APPLY' is followed by a comma-delimited sequence of colon-separated pairs which associates each defined_symbol of the theory with a previously unused symbol, which then replaces the defined_symbol in the set of conclusions that results from successful application of the theory. Next there must follow a comma-delimited list of symbols defined previously, equal in length to the theory's list of assumed_symbols, which specifies the symbols which are to replace the assumed_symbols at the point of application. Our verifier replaces all the assumed_symbols appearing in the theory's assumptions with these replacement symbols, and searches the logical context available at the point of theory application for theorems identical with the resulting formulae. If any of these is missing, the requested theory application is refused. If all are found, then the conclusions of the theory are turned into theorems by replacing every occurrence of the theory's defined symbols by the corresponding new_symbol and every occurrence of the theory's assumed symbols by its specified replacement symbol.
Assume, for example, that the 'SIGMA_theory' displayed above has been made available (in the way explained above), and that theorems
e in Z , (FORALL x in Z | x + 0 = x) , (FORALL x in Z | (FORALL y in Z | x + y = y + x)) , (FORALL x in Z | (FORALL y in Z | (FORALL z in Z | (x + y) + z = x + (y + z))))
have been proved (separately from the theory) for the integers Z, and integer addition. Then the verifier instruction
makes the symbol SIG (which must not have been defined previously) available, and gives us the theorem APPLY(SIG:SIGMA) SIGMA_theory(Z,+,0)
(FORALL f | (Finite(f) & Svm(f) & range(f) *incin s) *imp (SIG(f) in Z & SIG({}) = 0 & (FORALL x,y | S({[x,y]}) = y) & (FORALL t | SIG(f) = SIG(f | (domain(f) * t)) + SIG(f | (domain(f) - t))) & (FORALL x in domain(f) | SIG(f) = SIG(f | (domain(f) - {x})) + arb(f{x})) & (FORALL g | (Finite(g) & Svm(g) & domain(f) = domain(g)) *imp SIG(f) = SIG({[y,SIG(f | INV_IM{g,y})]: y in range(g)}))))without further proof.The theory of equivalence classes is a second important 'theory' example.
THEORY equivalence_classes(P,s) (FORALL x in s | (FORALL y in s | (P(x,y) *eq P(y,x)) & P(x,x))) (FORALL x in s | (FORALL y in s | (FORALL z in s | (P(x,y) & P(y,z)) *imp P(x,z)))) ==>(Eqc,F) (FORALL x in s | F(x) in Eqc) & (FORALL y in Eqc | (arb(y) in s & F(arb(y)) = y)) (FORALL x in s | (FORALL y in s | P(x,y) *eq (F(x) = F(y)))) (FORALL x in s | P(x,arb(F(x)))) (FORALL x in s | x in F(x)) END equivalence_classes;
This states that any dyadic 'equivalence relation' P(x,y) can be represented in the form P(x,y) *eq (F(x) = F(y)) by some monadic function F. (Conventionally, one speaks of F(x) as the equivalence class of x; notice, however, that we are deliberately 'hiding' such secondary facts as '{} notin Eqc', 's=Un(Eqc)', and '(FORALL y in Eqc, x in y | F(x) = y)'). The theory of equivalence classes is one of a family of easy but widely applicable results which represent various kinds of monadic relationships in terms of elementary relationships which are especially easy to work with (often because decision algorithms apply to them). For example, one can easily show that any partial ordering on set elements x,y can be represented in the form F(x) *incin F(y). Results of this kind lend particular importance to the relationships to which they apply.
1.5 An informal overview of the sequence of formal set-theoretic proofs to be given later
This text culminates in the sequence of definitions and proofs found in Chapters XX-YY. The theorems (with proofs set up to be verifiable by our system) fall into the following categories:
Basic elementary results
(i) Definition and basic properties of ordered pairs. These are fundamental to many of the following definitions, e.g. of maps and of the Cartesian product.
(ii) Definition of the notions of map, single-valued map, 1-1-map, map restriction, domain, range, map product, etc. and derivation of the ubiquitous elementary properties of maps, as a long series of elementary theorems. Some of these properties of maps are captured for convenience in a theory called 'fcn_symbol' which can be used to prove basic properties of set formers defining single-valued maps.
Ordinals
(iii) Definition of the notion of 'ordinal', and proof of the basic properties of ordinals. Completely formal proofs of all the basic properties of ordinal numbers will be given in Chapter 5. But to make these proofs more comprehensible it is well to translate some of them, and some of the key definitions used in them, into the more comfortable language of ordinary mathematics. We follow von Neumann in defining an ordinal as a set (I) properly ordered by membership, and for which (II) members of members are also members. The key results proved are: (a) the collection of all ordinals is itself properly ordered by membership, and members of ordinals are ordinals, but (b) the collection of all ordinals is not a set. Also, (c) proceeding recursively in the manner explained in Section XX, we define a standard enumeration for every set and show that this puts the members of the set in 1-1 correspondence with an ordinal. This is the 'enumerability principle' fundamental to our subsequent work with cardinal numbers.
The von Neumann representation ties the ordinal concept very directly to the most basic concepts of set theory, allowing the properties of ordinals to be established by reasoning that uses only elementary properties of sets and set formers, with occasional use of transfinite induction. (For ease of use, statement and proof of this general principle are captured as a theory called 'transfinite_induction': the principle follows very directly from our strong form of the axiom of choice).
For example, in the von Neumann representation, the next ordinal after an ordinal s is simply s + {s}. To see that s' = s + {s} must be an ordinal, note first that each member of a member of s' is either a member of a member of s, or directly a member of s; and hence in any case a member of s'; thus s' has property (II). The proof that s' also has property (I) is equally elementary and is left to the reader. Together these show that s' is an ordinal. Other equally elementary results concerning ordinals, whose proof is also left to the reader are:
a. The intersection s * t of any two ordinals is an ordinal.
b. Any member t of an ordinal s is an ordinal.
Let s be an ordinal. Since any member of a member t of s is a member of s by (i), any member t of s is a subset of s. Thus for ordinals the membership relation 't in s' implies the inclusion relation 't *incin s'. On the other hand, if t is also an ordinal and t *incin s, then either t = s or t in s. To prove this, suppose that t /= s, and consider the element x = arb(s - t). Any element y of t is also an element of s, so by (ii) we have either y in x, y = x, or x in y. Both y = x and x in y would imply x in t which is impossible. Thus we must have y in x whenever y in t, i.e. t *incin x. But x - t must be null. Indeed, let z in x - t. Then z in x, but also z in s - t, contradicting the fact that x = arb(s - t) is disjoint from s - t. Hence x = t, i.e. t is an element of s, proving our assertion that any subset t of s which is also an ordinal must either be identical to s or must be a member of s. That is, for ordinals the relationship '*incin s' is equivalent to the condition 'is a member of s or is equal to s.'
Next we show that, given any two distinct ordinals s and t, one is a member of the other. Suppose that this is not the case. Then if s = s * t then s is a subset of t, and hence, by the result just proved, is a member of t. Similarly, if t = s * t then t is a member of s. So it follows that s /= s * t and t /= s * t. Since s * t is an ordinal and a subset of s, it follows by the result just proved that s * t in s; similarly s* t in t, so s * t in s * t, which is impossible since the membership operator can admit no cycles. This proves our claim.
It follows that if s and t are both ordinals, the intersection s * t is the smaller of s and t, while the union s + t is the larger of s and t. If O is any non-empty set of ordinals, then x = arb(O) is a member of O and hence an ordinal. By definition of arb, x must be disjoint from O. Hence if y is any other member of O, y in x is impossible so x in y must be true. That is, arb(O) must be the smallest of all the elements of O. Moreover the union Un(O) of all the elements of O must be an ordinal, since if x in Un(O) and y in x then there is an s in O such that x in s, from which it follows that y in s and so y in Un(O), proving that Un(O) has property (i). Moreover if x in Un(O) and y in Un(O), then there must exist s in O and t in O such that x in s and y in t. Then one of s and t, say s, must include the other, and so x and y must both be members of s. Since s is an ordinal and therefore has property (ii), it follows that either x in y, x = y, of y in x. Hence Un(O) also has property (ii). This shows that the union Un(O) of any set of ordinals must itself be an ordinal, which is easily seen to be the smallest ordinal including all the members of O.
Using the statements just proved it is easy to show that if s is an ordinal, then s' = s + {s} is the least ordinal greater than s. Indeed, we have shown above that s' is an ordinal. Moreover s in s', so s' is larger than s in the ordering of ordinals. If t is any ordinal larger than s, i.e. s in t, then either s' in t, s' = t, or t in s' by what has been proved above. But t in s' is impossible, since it would imply that either t in s or t = s, and so in either case would lead to an impossible membership cycle. Therefore either s' in t or s' = t, i.e. t is no smaller than s', proving that s' is the least ordinal greater than s, as asserted. It is therefore reasonable to write s + {s} as next(s).
Any ordinal s which is greater than every integer n must have all such n as members, proving that the set Z of all integers must be a subset of the set s. Hence Z must be the smallest ordinal which is greater than every integer n. Therefore the smallest members of the collection of all ordinals can be written as
0,1,..,n,...,Z,next(Z),next(next(Z)),...
in their natural order (of membership). In his initial series of papers on ordinals Georg Cantor introduced a variety of constructions for ordinals which generalize various arithmetic constructions for ordinary integers and which allow the sequence of ordinal notations shown above to be extended systematically.
Well ordering: the principle of transfinite enumerability
The ordinal numbers, as we (or von Neumann, or Cantor) have defined them capture an abstract notion of sequential enumeration, even for sets which are not restricted to be finite. A crucial property of the ordinals is that they allow any set s to be enumerated, irrespective of whether s is finite or infinite. This is the so-called Well-Ordering Theorem. This famous result is not hard to prove given the very generous variant of set theory which we allow, which as explained earlier lets us write very general recursive definitions in set theoretic notation, and also admits free use of the choice operator 'arb'.
To prove the well-ordering theorem, we first show that the collection Ord of all ordinals is not a set, i.e. that there is no set O such that s is an ordinal if and only if s in O. For otherwise s = Un(O) would be an ordinal by what we have just proved, and so as shown above s + {s} is also an ordinal, implying that s is a member of a member of s, and so s in s, which is impossible.
Next define a function enum(X,S) of two parameters by writing
enum(X,S) := if S *incin {enum(y,S): y in X} then S else arb(S - {enum(y,S): y in X}) end if.
That is, we define enum(X,S) to be the element of S - {enum(y,S): y in X} chosen by 'arb' if {enum(y,S): y in X} differs from S; otherwise enum(X,S) is simply S. This definition implies that the elements enum(0,S),enum(1,S),enum(2,S),...,enum(Z,S),... have the following values:
enum(0,S) = arb(S) enum(1,S) = arb(S - {arb(S)}) enum(2,S) = arb(S - {arb(S),enum(1,S)}) ... enum(Z,S) = arb(S - {arb(S),enum(1,S),enum(2,S),...}) ...The crucial fact, proved in the next paragraph, is that the elements enum(x,S) remain distinct, for distinct ordinals x, as long as {enum(y,S): y in x} is a proper subset of S. Note also that as the ordinal x increases, so does the set {enum(y,S): y in x}.It is easy to prove that enum(x,S) and enum(y,S) must be distinct if x and y are distinct ordinals and both enum(x,S) and enum(y,S) are different from S. Indeed, one of x and y, say y, must be a member of the other, and then by definition we must have enum(x,S) = arb(S - {enum(z,S): z in x}), so enum(x,S) in S - {enum(z,S): z in x}, while enum(y,S) in {enum(z,S): z in x}. It follows from this that there must exist an ordinal x for which S = {enum(z,S): z in x}. For if this is false, then by what we have just proved the mapping z :-> enum(z,S) maps the collection of all ordinals in 1-1 fashion into a subset of the set S. But an axiom of set theory (the so-called 'Axiom of Replacement', detailed below) tells us that every collection which can be put in 1-1 correspondence with a set must itself be a set. Hence it would follow that the collection of all ordinals is a set, contradicting what has been proved above.
Since we have just shown that there exists an ordinal x such that S = {enum(z,S): z in x}, there must exist a least such ordinal y (which we can define as
arb({y in next(x) | S = {enum(z,S): z in y}}).It is easily seen (we leave details to the reader) that z :-> enum(z,S) maps this y in 1-1 fashion onto S, completing our proof of the Well-Ordering Theorem.Cardinal numbers
(iv) Definition of 'cardinality' and of the operator #s which gives the (possibly infinite) number of members of a set s. The cardinality of a set is defined as the smallest ordinal which can be put into 1-1 correspondence with the set, and it is proved that (a) there is only one such ordinal, and (b) this is also the smallest ordinal which can be mapped onto s by a single-valued map.
The proof of the Well-Ordering Theorem puts us in position to introduce the notion of cardinal number and to prove the basic elementary properties of these numbers. We define the cardinals as a subset of the ordinals; an ordinal x is called a cardinal if x cannot be put into 1-1 correspondence with any smaller ordinal. By the Well-Ordering Theorem, any set s can be put in 1-1 correspondence with some ordinal, and arguing as above it follows that s can be put in 1-1 correspondence with some smallest ordinal x. Since the composition of two 1-1 mappings is itself 1-1, it follows that this unique x must itself be a cardinal. We call this cardinal the cardinality of s, and write it (using the standard number sign) as #s.
In this section we also define the notions of cardinal sum and product of two sets a and b. These are respectively defined as #(copy_a + copy_b), where copy_a and copy_b are disjoint copies of a and b, and the cardinality of the Cartesian product a *PROD b of a and b. Using these definitions, it is easy to prove the associative and distributive laws of cardinal arithmetic. We also prove a few basic properties of the #s operator, e.g. its monotonicity.
(v) A set s is then defined to be finite if it has no 1-1 mapping into a proper subset of itself, or, equivalently, is not the single-valued image of any such proper subset. We prove that the null set and any singleton are finite, and (using transfinite induction) that the collection of finite sets is closed under the union, Cartesian product, and power set operators. It is proved that s is finite if and only if its cardinality #s is finite. We then prepare for the introduction of signed integer arithmetic by proving all the basic arithmetic properties of unsigned integers and then defining the cardinal subtraction operator a MINUS b and showing that for finite cardinals subtraction has its expected properties. We also prove that integer division with remainder is always possible. These results are proved with the help of a modified version of the principle of induction which is demonstrated for finite sets: given any predicate P(x) not true for all finite sets, there exists a finite set s for which P(s) is false, but P(s') is true for all proper subsets of s. Like the rather similar transfinite induction, this principle is captured for convenience in a theory.
(vi) Sets which are not finite are said to be infinite. By considering the cardinality #s_inf of the infinite set s_inf whose existence is assumed in an axiom of infinity, we prove that there exists an infinite cardinal, and so can define the set Z of integers as the least infinite ordinal, and show that this is a cardinal, and is in fact the set of all finite cardinals. The set Z of all integers is infinite, since the 1-1 correspondence n :-> next(n) maps Z to a subset of itself (the zero integer, i.e. {}, is not in the range of 'next'). It is not hard to see that if the set s is finite, so is next(s) = s + {s}. Indeed, if s + {s} is infinite, there exists a 1-1 mapping f of s + {s} to a proper subset of itself. The range of the mapping f must therefore omit some element of s + {s}, i.e. must either omit s or some element x of s. Consider the latter of these two cases. We can plainly construct a 1-1 mapping g of s + {s} onto itself which interchanges x and s. Then the composition of f and g is a 1-1 mapping of s + {s} into itself whose range omits the value s. This shows that if next(s) is infinite, there must always exist a 1-1 mapping f of next(s) into s, but then f maps s into s - {f(s)}, so s is also infinite. I.e., s is infinite if next(s) is infinite, implying that next(s) is finite if s is finite.
It follows that all the integers 0 = {}, 1 = next(0), 2 = next(1),... are finite, and so each of these ordinals must also be a cardinal. Moreover, the infinite ordinal Z must also be a cardinal. Indeed, if this is not the case, there would exist a 1-1 mapping f of Z into a smaller ordinal, i.e. to some integer n in Z. But then f would also map the subset next(n) of Z into its proper subset n, implying that next(n) is infinite, which we have seen to be impossible. Thus Z is not only the smallest infinite ordinal but also the smallest infinite cardinal. This implies that
#0 = 0, #1 = 1, #2 = 2,... ,#Z = Z(every cardinal is its own cardinality, and every ordinal less than or equal to Z is a cardinal). On the other hand, the cardinality of next(Z) = Z + {Z} is simply Z. Indeed, we have seen that there exists a 1-1 mapping f of Z into itself whose range omits the integer 0; this can plainly be extended to a 1-1 mapping of Z + {Z} into Z. This same argument shows that if #s = Z then #next(s) = Z also. Therefore the sequence of cardinalities of the ordinals0,1,2,...,Z,next(Z),next(next(Z)),next(next(next(Z))),...is0,1,2,...,Z,Z,Z,Z,...That is, all the infinite ordinals displayed, though distinct, have the same cardinality. Any set s whose cardinality #s is Z is said to be denumerable, or countably infinite; and a set which is either finite or denumerable is said to be countable. Our next question is: how can we be sure that uncountable sets, namely sets whose cardinality exceeds Z, actually exist?(vii) Another idea is plainly needed if we are to show that there exist any cardinals larger than Z. As a digression, we prove that the sum and product of any two infinite cardinals degenerates to their maximum (hence there are no more rational numbers than there are integer numbers), but (Cantor's Theorem) that the power set of any cardinal always has a larger cardinality. Cantor noted that for any set s, the set pow(s) of all subsets of s must have cardinality larger than that of s. For suppose the contrary, i.e. suppose that there exists a 1-1 mapping f of s onto pow(s). Then consider the subset {x: x in s | x notin f(x)} of s. This must have the form f(y) for some y in s; hence f(y) = {x: x in s | x notin f(x)}. But then y in f(y) is equivalent to y in {x: x in s | x notin f(x)}, i.e. to y notin f(y), which is impossible. (Incidentally, since a 1-1 correspondence between reals and pow(Z) can be found, this implies that real numbers form an uncountable set).
Since s always has a 1-1 embedding into pow(s) (we can simply map each x in s into the singleton {x}), the cardinality of s is never greater than that of pow(s). The theorem of Cantor proved in the preceding paragraph shows that in fact we always have #s < #pow(s), i.e. #s in #pow(s). Hence #pow(Z) is an infinite cardinal which is definitely larger than Z; similarly #pow(#pow(Z)) is larger than #pow(Z) and so forth, proving that there must exist infinitely many infinite cardinals. In fact, we can easily prove that there exists a 1-1 correspondence between the collection of all ordinals and the collection of all cardinals. For this, we simply need to make the transfinite inductive definition
alph(x) := arb({z: z in (next(#pow(Un({alph(y): y in x}))) - {alph(y): y in x}) | is_cardinal(z)}),where 'is_cardinal' is the predicate, easily expressible in elementary set-theoretic terms, which states that its argument y is a cardinal number. Since all the occurrences of 'alph' on the right-hand side of this definition lie in the scope of constraints of the form 'y in x', this is a legal transfinite definition according to the rule stated earlier. For each ordinal x, this formula defines alph(x) to be the smallest cardinal (if any) which is not more than #pow(Un({alph(y): y in x}) but is not one of the cardinals alph(y) for any ordinal y less than x. Since we have seen above that u = Un({alph(y): y in x} is an ordinal at least as large as any of the alph(y) for y in x, and also that #pow(u) is larger than u, the set next(#pow(u)) - {alph(y): y in x}) must be nonempty, and so alph(x) must indeed be the smallest cardinal greater than all of the cardinals alph(y) for any ordinal y in x. It is easily seen (details are left to the reader) that alph(y) < alph(z) if y < z. Hence the function 'alph' is a 1-1, monotone increasing map of the collection of all ordinals to the collection of all cardinals. It is not hard to prove that every cardinal must appear as on of the alph(y). Thus 'alph' actually puts the collection of all ordinals in 1-1 correspondence with the collection of all cardinals. For small ordinals we havealph(0) = 0, alph(1) = 1, alph(2) = 2,..., alph(Z) = Z.A mystery, first encountered by Cantor, occurs at the very next position in this sequence. alph(next(Z)) is the smallest cardinal greater than Z. We have seen that the cardinal number #pow(Z) is larger than Z; hence alph(next(Z)) <= #pow(Z). But is this inequality actually an equality, or does there exist a cardinal number between Z and #pow(Z)? Indeed, do there exist infinitely many cardinal numbers in this range? This is the so-called 'Continuum problem', originally stated by Cantor. Its very surprising resolution, ultimately achieved by Kurt Gödel and Paul Cohen, required over 60 years of penetrating work: the statement alph(next(Z)) = #pow(Z) is independent of the axioms of set theory, which admit both of models in which this statement is true and of many structurally distinct models in which it is false.All the semi-formal proofs given above will recur, in completely formalized versions, in Chapter 5. The semi-formal proofs given in this section can serve as intuitive guides to the larger mass of detail appearing in these formal proofs.
Survey of the major sequence of definitions and proofs considered in this text
(viii) The set of signed integers is then introduced as the set of pairs [x,0] (representing the positive integers) and [0,x] (representing the integers of negative sign). [0,0] is the 'signed integer' 0, and the 1-1 mapping x :-> [x,0], whose inverse is simply y :-> car(y), embeds Z into the set of signed integers, in a manner allowing easy extension of the addition, subtraction, multiplication, and division operators to signed integers. In preparation for introduction of the set of rational numbers, it is proved that the set of signed integers is an 'integral domain'. At this point, we are well on the royal road of standard mathematics.
(ix) Next we introduce two important 'theories' mentioned above: the theory of equivalence classes and the theory of SIGMA. As previously noted, the theory of SIGMA is a formal substitute for the common but informal mathematical use of 'three dot' summation (and product) notations like
a1 + a2 + ... + an and a1 * a2 * ... * an.
The theory of equivalence classes characterizes the dyadic predicates R(x,y) which can be represented in terms of the equality predicate using a monadic function, i.e. as R(x,y) *eq (F(x) = F(y)). These R are the so-called 'equivalence relationships', and for each such R defined for all x belonging to a set s, the theory of equivalence classes constructs F (for which arb turns out to be an inverse), and the set into which F maps s. This range is the 'family of equivalence classes' defined by the dyadic predicate R. The construction seen here, which traces back to Gauss, is ubiquitous in 20th century mathematics.
To illustrate the use of the theory of SIGMA we digress slightly from our main line of development to prove the prime factorization theorem: every integer greater than 1 can be factored as a product of prime integers, essentially in only one way.
(x) Next the family Q of rational numbers is defined as the set of equivalence classes arising from the set of all pairs [n,m] of signed integers for which m /= 0. To do this we consider the equivalence relationship Same_frac([n,m],[n',m']) := n * m' = n' * m. The mapping n :-> [n,1], whose inverse is simply car(x), embeds the signed integers into the rationals in a manner preserving all elementary algebraic operations, and also preserving order. From the fact that the set of signed integers is an ordered integral domain we easily prove that the rationals are an ordered field.
(xi) Our next step, following Cantor, is to define real numbers as equivalence classes of 'Cauchy sequences' si of rationals. Here, a sequence is a 'Cauchy sequence' if it satisfies
(FORALL x in Q | (EXISTS n in Z | (FORALL i, j in Z | (x > 0 & i > n & j > n) *imp abs(si - sj) < x))).The equivalence relation used isSame_real(s,t) = (FORALL x in Q | (EXISTS n in Z | (x > 0 & i > n & j > n) *imp abs(si - ti) < x)).Arithmetic operations for these equivalence classes are easily derived from the corresponding functions for rationals, and the 'completeness' of the set of real numbers, a key goal of early 19-th century foundational work on analysis, can be proved without difficulty.
Since it is required for the elementary discussion of complex numbers, we prove the existence and basic properties of the square root, which is shown to exist for any non-negative real number.
(xii) Next the complex numbers are introduced as pairs of real numbers, and their elementary properties are established. In particular, they are shown to constitute a field, within which the field of real numbers has a natural embedding. The modulus of a complex number is defined and its basic properties demonstrated.
(xiii) This completes our preliminary work. What remains is to give the formal details of those parts of standard mathematical analysis needed to state and prove our assigned target result, the Cauchy integral theorem. For this, various familiar results concerning differentiation and integration are needed, first for functions of a real variable, then for functions of a complex variable. Our approach is as follows. The space of all real functions of a real variable is defined, along with the (pointwise) operations of addition, subtraction, and multiplication for functions, function comparison, the positive part of a function, and the least upper bound of a set of functions. Various elementary facts concerning this space of functions are established. In particular, it is shown that they form a ring under addition and multiplication. This allows application of the previously developed 'theory of sigma' to define the sum of an arbitrary finite sequence of real functions. In preparation for the definition of the (ordinary Lebesgue) integral, the sum of an absolutely convergent series of positive real numbers is defined, and the basic properties of such sums are established. This prepares for definition of the sum of an absolutely convergent series of positive real functions, and for a proof of a few basic properties of such series.
In more direct preparation for definition of the integral, we define 'block' functions as real-valued functions of a real variable which are constant inside some finite interval of the real axis, and zero outside this interval. The integral of such a function is simply the area under its graph, which is an elementary rectangular block.
The greatest lower bound of a set of real numbers bounded below is then defined. This is immediately used to define the (Lebesgue) 'upper' integral of an arbitrary non-negative real function of a real variable. This is the greatest lower bound of the sum the integrals of all infinite sequences of non-negative block functions, extended over all such sequences whose (pointwise) sum exceeds the value f(x) at each real point x. Using this, we can define the integral of an arbitrary real function f (which now can have values of both signs) as the difference of the upper integrals of its positive and negative parts.
A function f of a real value is defined to be continuous if it satisfies the standard 'epsilon-delta' condition. To define the derivative of such functions by the technique we adopt, the extension of this definition to the space of real-valued functions of two real variables is needed. To set this up, we first define n-dimensional Euclidean space as the set of all real-valued maps whose domain is the set of integers less than n. The standard Euclidean distance function is defined in this space and its basic properties are proved. Once this has been done, the space of continuous real-valued functions on a Euclidean space of any number of dimensions can be defined by extending the 'epsilon-delta' formulation to this slightly more general setting. We can then define a real-valued function f of one real variable to be (continuously) differentiable if there exists a real-valued function g of two real variables such that (x - y)g(x,y) = f(x) - f(y) for all real x and y. We prove that if such a g exists it is unique, in which case we define the derivative of f as the function h of one variable satisfying h(x) = g(x,x).
Next this whole discussion is carried over to complex functions of a complex variable. We successively define the space of all such functions, the complex Euclidean space of n dimensions with its norm, and the sum, difference, and product for complex-valued functions, either of a single complex variable, or of a point in complex Euclidean space. The 'epsilon-delta' definition of continuity is extended to the complex case for both these classes of functions. This allows direct extension of the notion of derivative, and of its elementary properties, to complex-valued functions of a complex variable.
A set of points in the complex plane is defined to be open if it is the union of the interiors of a set of circles, and a complex function defined in such a set is defined to be analytic if it is differentiable within the set.
Next we define the complex exponential function cexp as the unique complex function analytic everywhere in the complex plane and satisfying the equations Dcexp= cexp and cexp([0,0]) = [1,0], where Dcexp denotes the derivative of cexp. The constant pi is then defined as the smallest positive real root of cexp([0,x]) = [-1,0].
Directly after this, we define the notion of a continuous complex function of a real variable by extending the 'epsilon-delta' formulation to this case in the obvious way. A similar extension of the construction used in the real case gives us the notion of a differentiable complex-valued function of a real variable (i.e. of a smooth curve in the complex plane), and of its derivative. The complex line integral of a complex function g defined on such a curve is then taken to be the ordinary integral of the complex product of g by Df (where as before Df is the derivative of f); the integral of the complex-valued function h = g * Df (which is a function of a single real variable) is by definition obtained by adding the real integrals of the real and imaginary parts of h. We show that the line integrals of an analytic function g over any two curves lying in its domain of analyticity are the same, provided that the two curves lie sufficiently close to one another. Using this, we show that the line integral over the periphery of the unit circle of the quotient function f/(z - w) is 2*pi*i*f(w) for every function f analytic in an open set including the unit circle and its interior, and for every point interior to the unit circle.
Satisfied with this somewhat special form of the Cauchy integral theorem, we rest from our labors.
This chapter prepares for the extensive account of our verifier system given in Chapter 3 by describing and analyzing two of the system's basic ingredients, the propositional calculus from which we take all necessary properties of the logical operations &, or, not, *imp, and *eq, and the (first order) predicate calculus, which to these propositional mechanisms adds compound functional and predicate constructions and the two quantifiers FORALL and EXISTS. Chapter 2. Propositional and Predicate-calculus preliminaries.
Why predicate calculus? Our aim is to develop a mechanism capable of ensuring that the logical formulae in which we are interested are universally valid. Since, as we shall see in Chapter 4, there can exist no algorithm capable of making this determination in all cases, we must use the mechanism of proof. This embeds the formulae in which we are interested in some system of sequences of formulae, within which we can define a property Is_a_proof(p) capable of being verified by an algorithm, such that we can be certain that the final component t of any sequence p satisfying Is_a_proof(p) is universally valid. Then we can use intuition freely to find aesthetically pleasing sequences p, the proofs, leading to interesting end goals t, the theorems. In principle, any system of formulae and sequences of formulae having this property is acceptable. The propositional/predicate calculus and set theory in which we work is merely one such formalism, of interest because of its convenience and wide use, and because much effort has gone into ensuring its reliability.
2.1. The propositional calculus
The propositional calculus constitutes the 'bottom-most' part of the full logical formalism with which we will work in this book. It provides only the operations &, or, not, *imp, and *eq, and the two constants 'true' and 'false', all other symbolic constructions being reduced ('blobbed') down to single letters when propositional deductions must be made. An example given earlier, i.e. the formula
(F(x + y) = F(F(x)) *imp F(F(x)) = 0) *imp (F(F(x)) /= 0) *imp (F(x + y) /= F(F(x)))whose 'blobbed' propositional skeleton is
(p *imp q) *imp ((not q) *imp (not p)), illustrates what is meant.
Formulae of the propositional calculus are built starting with string names designating propositional variables and combining them using the dyadic infix operators '&', 'or', '*imp', and '*eq' and the monadic operator 'not'. Parentheses are used to group the subparts of formulae. The only precedence relation supported is the rule that '&' binds more tightly than 'or', so parentheses must normally be used rather liberally. Syntactically, the propositional calculus is a simple operator language, whose (syntactically valid) formulae parse unambiguously into syntax trees, each of whose internal nodes is marked either with one of the allowed infix operators, in which case it has two descendants, or with the monadic operator 'not', in which case it has one descendant. Each leaf of such a tree is marked either with the name of a propositional variable or with one of the two allowed constant symbols 'true' and 'false'.
An example is
(pan *imp quack) *imp ((not quack) *imp (not true)). Here the propositional variables which appear are 'pan' and 'quack', and the constant 'true' also appears.
Since the derivation of the syntax tree of a propositional formula from its string form ('parsing') and of the string form from the syntax tree ('unparsing') are both standard programming operations, we generally regard these two structures as being roughly synonymous and use whichever is convenient without further ado.
As in other logical systems we can think of our formulae either in terms of the values of functions which they represent, or as statements deducible from one another under certain circumstances, and so as the ingredients of some system of formalized proof. We begin with the first approach. In this way of looking at things, each propositional variable represents one of the truth values 1 or 0, which the propositional operators combine in standard ways. The following more formal definition captures this idea:
Definition: An assignment for a collection of propositional formulae is a single-valued function A mapping each of its constants and variables into one of the two values 1 and 0. Each assignment is required to map 'true' into 1 and 'false' into 0. The assignment is said to cover each of the formulae in the collection.
Given any such assignment A, and a formula F which it covers, the value Val(A,F) of the assignment A for the expression F is the Boolean value defined in the following recursive way.
(i) If the formula F is just a variable x or is one of the constants 'true' and 'false', then Val(A,F) = A(x).
(ii) If the formula F has the form 'G & H', then Val(A,F) is the minimum of Val(A,G) and Val(A,H).
(iii) If the formula F has the form 'G or H', then Val(A,F) is the maximum of Val(A,G) and Val(A,H).
(iv) If the formula F has the form 'not G', then Val(A,F) = 1 - Val(A,G).
(v) If the formula F has the form 'G *imp H', then Val(A,F) = Val(A,'(not G) or H').
(vi) If the formula F has the form 'G *eq H', then Val(A,F) = Val(A,'(G & H) or ((not G) & (not H))').
Definition: A propositional formula F is a tautology if Val(A,F) = 1 for all the assignments A covering it.
So tautologies are propositional formulae which evaluate to true no matter what truth values are assigned to their variables. Examples are
p or (not p) , q *imp (p *imp q) , p *imp (q *imp (p & q)) ,and many others, some listed below. These are the propositional formulae which possess 'universal logical validity'.Since the number of possible assignments A for a propositional formula F is at most 2n, where n is the number of variables in the formula, we can determine whether F is a tautology by evaluating Val(A,F) for all such A. An alternative approach is to establish a system of proof by singling out some initial collection of tautologies (which we will call 'axioms') from which all remaining tautologies can be derived using rules of inference, which must also be defined. (This is the 'logical system' approach). The axioms and rules of inference can be chosen in many ways. Though not at all the smallest possible set, the following collection has a familiar and convenient algebraic flavor.
(i) (p & q) *eq (q & p) (ii) ((p & q) & r) *eq (p & (q & r)) (iii) (p & p) *eq p (iv) (p or q) *eq (q or p) (v) ((p or q) or r) *eq (p or (q or r)) (vi) (p or p) *eq p (vii) (not (p & q)) *eq ((not p) or (not q)) (viii) (not (p or q)) *eq ((not p) & (not q)) (ix) ((p or q) & r) *eq ((p & r) or (q & r)) (x) ((p & q) or r) *eq ((p or r) & (q or r)) (xi) (p *eq q) *imp ((p & r) *eq (q & r)) (xii) (p *eq q) *imp ((p or r) *eq (q or r)) (xiii) (p *eq q) *imp ((not p) *eq (not q)) (xiv) (p *eq q) *imp (q *imp p) (xv) (p *imp q) *eq ((not p) or q) (xvi) (p *eq q) *eq ((p *imp q) & (q *imp p)) (xvii) (p & q) *imp p (xviii) (p *eq q) *imp ((q *eq r) *imp (p *eq r)) (xix) (p *eq q) *imp (q *eq p) (xx) (p *eq p) (xxi) (p & (not p)) *eq false (xxii) (p or (not p)) *eq true (xxiii) (not (not p)) *eq p (xxiv) (p & true) *eq p (xxv) (p & false) *eq false (xxvi) (p or true) *eq true (xxvii) (p or false) *eq p (xxviii) (not (true)) *eq false (xxix) (not (false)) *eq true (xxx) trueThe preceding are to be understood as axiom 'templates' or 'schemas', in the sense that all formulae resulting from one of them by substitution of syntactically legal propositional formulae P,Q,... for the letters p,q,... occurring in them are also axioms. For example,
(((p or q) or (r *imp r)) & ((p or q) or (r *imp r))) *eq ((p or q) or (r *imp r))is a substituted instance of (iii) and therefore is also regarded as an axiom.The reader can verify that all of the axioms listed are in fact tautologies.
In the presence of this lush collection of axioms we need only one rule of inference (namely the 'modus ponens' of mediaeval logicians). From any two formulae of the form
pandp *imp qthis allows us to deduce q. As with the axioms, this rule is to be understood as a template, covering all of its substituted instances.To ensure that the tautologies are exactly the derivable propositional formulae we must prove that (I) only tautologies can be derived, and (II) all tautologies can be derived. (I) is easy. We reason as follows. All the axioms are tautologies. Moreover, since
Val(A,p *imp q) = Max(1 - Val(A,p),Val(A,q)),it follows that if Val(A,p *imp q) and Val(A,p) are both 1, so is Val(A,q). So if 'p *imp q' and p are both tautologies, then so is q. This proves our claim (I).Proving claim (II) takes a bit more work, whose general pattern is much like that used to reduce multivariate polynomials to their canonical form. Starting with any syntactically well formed propositional formula F, we can proceed in the following way to derive a chain of formulae equivalent to F (via an explicit chain of equivalences Fi *eq Fi + 1). Note that axioms (xviii-xx) ensure that the equivalence relator '*eq' has the same transitivity, symmetry, and reflexivity properties as equality, while (xi-xiii) allow us to replace any subexpression of an expression formed using only the three operators &, or, not by any equivalent subexpression.
Using these facts and (xv-xvi) we first descend recursively through the syntax tree of F, replacing any occurrence of one of the operations *imp, *eq by an equivalent expression involving only &, or, not. This reduces F to an equivalent formula involving only the operators &, or, not. Then, using (vii-viii) and (x), we systematically push 'not' and 'or' operators down in the syntax tree, moving '&' operators up. Subformulae of the form (not (not p)) are simplified to p using axiom (xxiii). Axioms (xxiv-xxix) can be used to simplify expressions containing the constants 'true' and 'false'. When this work is complete F will been have reduced to an equivalent formula F' which is either one of the constants 'true' or 'false' or has the form a1 & ... & ak, where each aj is a disjunction of the form
b1 or ... or bh,each bm being either a propositional variable or the negation of a propositional variable. (ii) and (v) allow us to think of these conjunctions and disjunctions without worrying about how they are parenthesized. Then (iv) and (vi) can be used to bring all the bm involving a particular propositional variable together within each aj.Now assume that F is a tautology, so that every one of the formulae to which we have reduced it must also be a tautology (since the substitutions performed all convert tautologies to tautologies), and so our final formula F' is a tautology. We will now further reduce F', so that it becomes the formula 'true'. Unless F' is already 'true', in each aj, there must occur at least one pair bm, bn of disjuncts such that bm is a propositional variable of which bn is the negation, 'not bm'. Indeed, if this is not the case, then any propositional variable which occurs in aj will occur either negated or non-negated, but not both. Given this, we can assign the value 0 to each non-negated variable and the value 1 to each negated variable. Then every bm in aj will evaluate to 0, so the whole expression b1 or ... or bh will evaluate to 0, that is, aj will evaluate to 0. But as soon as this happens the whole formula a1 & ... & ak will evaluate to 0. This shows that there exists an assignment A such that Val(A,F') = 0, contradicting the fact that F' is a tautology. This contradiction proves our claim that each aj must contain at least one pair bm, bn of disjuncts which agree except for the presence of a negation operator in one but not in the other.
Given this fact, (xxii) tells us that 'bm or bn' simplifies to 'true', so that (xxvi) can be used repeatedly to simplify aj to 'true'. Since this is the case for each aj, repeated use of (xxiv) allows us to reduce any tautology to 'true' using a chain of equivalences. Since this chain of equivalences can as well be traversed in the reverse direction, we can equally well expand the axiom 'true' (axiom (xxx)) into our original formula F using a chain of equivalences. Then (xiv) can be used to convert this chain of equivalences into a chain of implications, giving us a proof of F by repeated uses of modus ponens.
Any set of axioms from which all the statements (i-xxx) can be derived as theorems can clearly be used as an axiomatic basis for the propositional calculus. This allows much leaner sets of axioms to be used. We refrain from exploring this point, which lacks importance for the rest of our discussion.
However, it is worth embedding the notion of 'tautology' in a wider, relativized, set of ideas. Suppose that we write
|= Fto indicate that the formula F is a tautology, and|- Fto indicate that F is a provable formula of the propositional calculus. The preceding discussion shows that |= F and |- F are equivalent conditions. This result can be generalized as follows. Let S designate any finite set of syntactically well-formed formulae of the propositional calculus. We can then writeS |= Fto indicate that, for each assignment A covering both F and all the formulae in S, we have Val(A,F) = 1 whenever Val(A,G) = 1 for all G in S. Also, we writeS |- Fto indicate that F follows by propositional proof if the statements in S are added to the axioms of propositional calculus (each of them acting as an individual axiom, not as a template). Then it is easy to show thatS |= F if and only if S |- F.To show this, first suppose that S |= F. Let C designate the conjunctionG1 & ... & Gkof all the formulae in S. Then since Val(A,H1 & H2) = Min(Val(A,H1),Val(A,H2)) for any two formulae H1,H2, it follows that Val(A,C) = 1 if and only if Val(A,G) = 1 for all G in S. We haveVal(A,C *imp F) = Val(A,(not C) or F) = Max(1 - Val(A,C),Val(F))for all assignments A covering C *imp F (i.e. covering both F and all the formulae in S). It follows that for each assignment A covering both F and all the formulae in S, we have Val(A,C *imp F) = 1, since if 1 - Val(A,C) /= 1 then Val(A,C) must be 1 and so Val(F) must be 1. Thus|= C *imp F,and so it follows that|- C *imp F,i.e. C *imp F can be proved from the axioms of propositional calculus alone. But then if the statements in S are added as additional axioms we can prove F by first proving C *imp F and then using the statements in S to prove the conjunction C. This shows that S |= F implies S |- F.Next suppose that S |- F, and let A be an assignment A covering both F and all the formulae in S so that Val(A,G) = 1 for every statement G in S. Then Val(A,G) = 1 for every statement G that can be used as an axiom in the proof of F from the standard axioms of propositional calculus and the statements in S as additional axioms. But we have seen above that if Val(A,p *imp q) and Val(A,p) are both 1, so is Val(A,q). Since derivation of q from p and p *imp q is the only inference step allowed in propositional calculus proofs, it follows that S |= F, completing our proof that the conditions S |= F and S |- F are equivalent.
We shall see that similar statements apply to the much more general predicate calculus studied in the following section. In that section, we will need the following extension of the preceding results to countably infinite collections of propositional formulae.
Definition. A (finite or infinite) collection S of formulae of the propositional calculus is said to be consistent if the proposition 'false' cannot be deduced from S, i.e.
S |- falseis false. We say that S has a model A if there exists some assignment A covering all the formulae of S such that Val(A,F) = 1 for every F in S.Theorem (Compactness): Let S be a denumerable collection of formulae of the propositional calculus. Then the following three conditions are equivalent:
(i) S is consistent.
(ii) Every finite subset of S is consistent.
(iii) S has a model.
Proof: Since subsets of a consistent S are plainly consistent, (i) implies (ii). On the other hand, any proof of 'false' from the statements of S is of finite length by definition, and so uses only a finite number of the statements of S. Thus (ii) implies (i), so (ii) and (i) are equivalent.
Next suppose that S is not consistent, so that 'false' can be proved from some finite subset S' of the statements in S. Let C be the conjunction of all the statements in S'. It follows from the discussion immediately preceding the statement of the present theorem that |- C *imp false, and so Val(A,'C *imp false') = 1 for any assignment A covering all the propositional symbols in S. This gives Val(A,C) = 0 for all such A, so that S has no model. This proves that (iii) implies (i).
Next we show that (i) implies (iii). For this, let Sj be an increasing sequence of finite subsets of S whose union is all of S. Each Sj is plainly consistent, so
Sj |- falseis false for each j, and thereforeSj |= falseis false, since we have shown above that these two conditions are equivalent for finite Sj. That is, for each j there must exist an assignment Aj covering all the variables appearing in any formula of Sj, such that Val(Aj, Sj) = 1. Let v1, v2, v3,... be an enumeration of all the variables appearing in any of the formulae of S. Then each vk must be in the domain of all Aj for all j beyond a certain point J = Jk.Let I0 designate the sequence of all integers. Since Aj(v1) must have one of the two values 0 and 1, there must exist an infinite subsequence I1 of I0 for all j of which Aj(v1) has the same value. Call this value B(v1). Arguing in the same way we see that here must exist an infinite subsequence I2 of I1 and a Boolean value B(v2) such that
B(v2) = Aj(v2) for all j in I2.Arguing repeatedly in this way we eventually construct values B(vk) for each k such that for each finite m, there exist infinitely many j such thatB(vn) = Aj(vn) for all n from 1 to m.Now consider any of the formulae G of S. Since G can involve only finitely many propositional variables vj, all its variables will be included in the set {v1,...,vk} for each sufficiently large k. Take any Aj for which B(vn) = Aj(vn) for all n from 1 to k. Then it is clear that for some i greater than j, we haveVal(B,G) = Val(Ai,G) = 1.Hence Val(B,G) = 1 for all G in S, so that B is a model of S, proving that (i) implies (iii), and thereby completing the proof of our theorem. QEDUsing the Compactness Theorem, we can show that the conditions S |- F and S |= F are equivalent even in the case in which S is an infinite set of propositional formulae.
To show this, first assume that S |= F. Then the set S + {not F} of propositions is plainly not consistent, and so by the Compactness Theorem S must contain some finite subset S0 such that S0 + {not F} is not consistent. Then plainly S0 |= F, so we have S0 |- F. This clearly implies S |- F; so S |- F follows from S |= F.
But, as noted at the end of the proof of the Compactness Theorem, S |= F follows from S |- F even if S is infinite, completing the proof of our claim.
2.2. The predicate calculus
The predicate calculus constitutes the next main part of the logical formalism used in this book. This calculus enlarges the propositional calculus, preserving all its operations but also allowing compound functional and predicate terms and the two quantifiers FORALL and EXISTS. An example is the formula
((FORALL x,y | F(x + y) = F(F(x))) *imp F(F(x)) = 0) *imp ((EXISTS x | F(F(x)) /= 0) *imp (F(x + y) /= F(F(x)))).Formulae of the predicate calculus are built starting with string names of three kinds, respectively designating 'individual' variables, function symbols, and predicate symbols. These are combined into 'terms', 'atomic formulae', and 'formulae' using the following recursive syntactic rules.
(i) Any variable name is a term. (We assume variable names to be alphanumeric and to start with lower case letters).
(ii) Each function symbol has some fixed finite number k of arguments. If f is a function symbol of k arguments, and t1,...,tk are any k terms, then f(t1,...,tk) is a term. (We assume function names to be alphanumeric and to start with lower case letters).
(iii) Each predicate symbol has some fixed finite number k of arguments. If P is a predicate symbol of k arguments, and t1,...,tk are any k terms, then P(t1,...,tk) is an atomic formula. (We assume predicate names to be alphanumeric and to start with upper case letters).
(iv) Formulae are formed starting from atomic formulae and using the operators and syntactic rules of the propositional calculus and the two quantifiers FORALL and EXISTS. More precisely, if e and f are any two predicate formulae and v1,...,vn are any n variable names, with n>0, then the following expressions are predicate formulae:
e & f, e or f, e *imp f, e *eq f, not e, (FORALL v1,...,vn | e), (EXISTS v1,...,vn | e).Like propositional formulae, the formulae of predicate calculus parse unambiguously into syntax trees each of whose internal nodes is marked either (i) with one of the propositional operators, and then has as many descendants as the corresponding propositional node, or (ii) with a function or predicate symbol, in which case its descendants correspond to the arguments of the function or predicate symbol; (iii) a quantifier FORALL or EXISTS involving n variable names, in which case the node has n + 1 descendants, the first n marked with the n variable names appearing in the quantifier and the
n + 1-st which is the syntax tree of the expression e that is being quantified. Each leaf of such a tree is marked either with the name of an individual va