Category Archives: Computability

A Fun, Quick Theorem

I like this theorem, and it’s pretty straightforward to prove since I’ve done all the legwork in previous posts.

Theorem: There are sets of natural numbers that no computer program can enumerate.

Proof:

  1. The set of all computer programs is enumerable (countably infinite). (Proof here)
  2. The set of all sets of natural numbers is not enumerable (uncountably infinite). (Proof here)
  3. Therefore there are more sets of natural numbers than there are computer programs to enumerate them.
  4. Therefore there are sets of natural numbers that no computer program can enumerate.

\Box
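
(In the cardinality notation I should really get around to introducing, steps 1–3 amount to the one-liner

|\{\text{computer programs}\}|=\aleph_0<2^{\aleph_0}=|\mathcal P(\mathbb N)|

where \mathcal P(\mathbb N) is the set of all sets of natural numbers.)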

Nothing too earth-shattering, I just thought it was cute.

Oh yeah, did I mention there are different sizes of infinities? I guess I should talk about that next time.

Gödel! #5 An Introduction to Gödel’s Theorems Chapter 3 Part 1

3 Axiomatized formal theories

Alright, cards on the table: I found this chapter kind of boring until around the end, so I’m just going to try and skim through it to get to the interesting parts.

3.1 Formalization as an ideal

Gödel’s theorems are statements about formal languages. Why do we care about formal languages? It’s pretty straightforward: formal languages allow us to ensure correctness by following specific rules regarding structure and syntax. In a natural language, such as English, you can have sentences with ambiguous meanings. For example:

“John knows a woman with a cat named Amy.”

could have two possible meanings: either John knows a woman who has a cat whose name is Amy, or John knows a woman named Amy who has a cat. This won’t upset us too much in our day-to-day lives, since the intended meaning can usually be inferred from context. But proving something logically requires precision. Thus, we use a formal language (such as first-order logic, or even any given programming language) to give our sentences a single, unambiguous parsing.

Even in proofs in mathematics or computer science, however, we don’t always use such rigorous formalization, as that much precision can be tedious. What matters more in these cases is conveying an understanding of a concept. But the underpinnings of the formal language exist, and could, if one desired, be spelled out with perfect rigour.

3.2 Formalized languages

Normally we are interested in interpreted languages, ie: ones with not just a syntax for deriving valid structures, but one where those structures have actual meaning. I could symbolize one version of the John/Amy sentence from above with Kja\wedge Wa\wedge Ca but that sentence is meaningless unless I also inform you that Kxy means “x knows y“, Wx means “x is a woman”, Cx means “x has a cat”, and j,a mean “John” and “Amy” respectively.

Thus, we will define a language L as being a pair \langle\mathcal L,\mathcal I\rangle where \mathcal L is a syntactically defined system of expressions and \mathcal I is an intended interpretation of those expressions.

Starting with \mathcal L: it will be based on a finite alphabet of symbols (I’m pretty sure you can get away with relaxing the requirement to an effectively enumerable alphabet, but the book says finite so we’ll go with finite). Some of these symbols will make up \mathcal L‘s logical vocabulary, for example: connectives, quantifiers, parentheses, identity… Other symbols will constitute \mathcal L‘s non-logical vocabulary: predicates and relations, functions, constants, variables… We will also need a system of rules for determining which sequences of symbols are well-formed formulae of \mathcal L (referred to throughout the text as its wffs).

For example, in first-order logic, our logical vocabulary is \{(,),\wedge,\vee,\neg,\rightarrow,\leftrightarrow,=,\forall,\exists\}. Our non-logical vocabulary is a bit more complicated in that it needs to potentially address infinitely many variables. There are two ways to do this. The way I like to think of it is having \{f^i_j,P^i_j|i,j\in\mathbb N\} which gives you an infinite set containing all of your variables f^0_0,f^0_1,f^0_2,\ldots, all of your k-place predicates P^k_0,P^k_1,P^k_2,\ldots and all of your k>0-place functions f^k_0,f^k_1,f^k_2,\ldots. But, of course, this results in an infinite alphabet of symbols. The book instead accomplishes this by having your variables be something like x,x^\prime,x^{\prime\prime},\ldots which operates on a finite alphabet (x and ^\prime). In either case, we will typically just denote variables as x,y,z, predicates as P,Q,R, functions as f,g,h and so on. As stated in the previous section, we don’t always need to use perfect rigour, but it’s important to understand how it would be accomplished. The union of the logical and non-logical vocabularies forms the language’s alphabet. To see how first-order logic determines its wffs, see Wikipedia. It is important for our purposes that determining whether a given string of symbols is a wff is effectively decidable.
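
To get a feel for why wff-hood can be decided mechanically, here’s a minimal sketch in Python of a wff-checker for a toy propositional fragment (the grammar in the comments is my own illustrative choice, much simpler than the book’s first-order languages):

# A toy wff-checker: decides whether a string is a well-formed formula of a
# tiny propositional language. Grammar (fully parenthesized):
#   wff ::= "p" | "q" | "r" | "~" wff | "(" wff op wff ")"
#   op  ::= "&" | "v" | ">"

def parse(s, i=0):
    """Try to parse a wff starting at index i; return the index after it, or None."""
    if i >= len(s):
        return None
    if s[i] in "pqr":                      # atomic wff
        return i + 1
    if s[i] == "~":                        # negation of a wff
        return parse(s, i + 1)
    if s[i] == "(":                        # (wff op wff)
        j = parse(s, i + 1)
        if j is None or j >= len(s) or s[j] not in "&v>":
            return None
        k = parse(s, j + 1)
        if k is None or k >= len(s) or s[k] != ")":
            return None
        return k + 1
    return None

def is_wff(s):
    """A string is a wff iff it parses exactly, with nothing left over."""
    return parse(s) == len(s)

print(is_wff("(p&~q)"))    # True
print(is_wff("(p&&q)"))    # False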

We then use \mathcal I to set the interpretation of our language. It can’t do this by manually setting the truth conditions for each wff (there are too many of them). What it does, rather, is determine the domain of quantification, the references of the language’s terms, which predicates are true of which terms, and the rules for determining the truth of a sentence. For example, \mathcal I might set the domain of quantification to the set of people, set the values of the constants m,n to Socrates and Plato respectively, and the meaning of the predicate F to “is wise”. Then \mathcal I continues by indicating which predicates are true of which terms, for example that F is true of both m and n. Finally, \mathcal I sets up rules for determining whether wffs are true. For example, a wff of the form A\wedge B is true iff A is true and B is true. This is tedious but straightforward. Again, however, it is important that this process be an effectively decidable one.
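
To illustrate how such compositional truth rules work, here’s a minimal sketch of an evaluator for the same toy propositional fragment as above (quantifiers and domains omitted; the interp dictionary plays the role of \mathcal I for atomic sentences):

# A sketch of I's truth rules: an interpretation assigns truth values to atoms,
# and truth of compound wffs is then fixed by recursion on structure,
# e.g. "A & B is true iff A is true and B is true".

def evaluate(wff, interp):
    """wff is a nested tuple: 'p', ('~', A), or (op, A, B)."""
    if isinstance(wff, str):               # atomic sentence: look it up
        return interp[wff]
    if wff[0] == "~":                      # ~A is true iff A is not true
        return not evaluate(wff[1], interp)
    op, a, b = wff
    a, b = evaluate(a, interp), evaluate(b, interp)
    if op == "&": return a and b           # conjunction
    if op == "v": return a or b            # disjunction
    if op == ">": return (not a) or b      # material conditional
    raise ValueError("unknown connective")

# interp fixes the atoms: say p = "Socrates is wise" is true, q is false.
print(evaluate(("&", "p", ("~", "q")), {"p": True, "q": False}))   # True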

3.3 Axiomatized formal theories

The following things are required to construct an axiomatized formal theory T:

(a) First, some wffs of our T‘s language are selected as its axioms. There must be an effective decision procedure to determine whether a given wff is an axiom of T. This doesn’t mean that T must have a finite number of axioms: we can have an axiom schema which tells us that sentences of such-and-such a form are axioms. For example, in Zermelo-Fraenkel set theory any wff of the form \forall z\exists y\forall x(x\in y\leftrightarrow(x\in z\wedge\phi)) (where \phi can be (essentially) any property expressible in the language) is an axiom, giving the theory infinitely many axioms.

(b) We also need some form of deductive apparatus or proof system in order to prove things in T. I’ve talked about proof systems before and demonstrated two: truth tables and Fitch-style calculus. The exact system used for T is irrelevant so long as it is effectively decidable whether a derivation from premises \varphi_1,\varphi_2,\ldots,\varphi_n to a conclusion \psi is valid. Note that this is different from having an effective procedure to actually create this derivation. All it has to do is determine whether a given derivation, once provided, is valid for the system.

(c) Thus, given an axiomatized formal theory T, since we can effectively decide which wffs are T-axioms, and whether a derivation in T‘s proof system is valid, it follows that it is also effectively decidable whether a given array of wffs forms a T-proof (a schematic sketch of this follows the list).

(d) For the remainder of this book, when we discuss theories, what we really mean is axiomatized formal theories. Many times in logic “theory” simply means any collection of sentences, thus we must be careful to make this distinction here.
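
Here is the schematic sketch promised in (c). The functions is_axiom and follows_by_a_rule stand in for T‘s own effectively decidable tests, whatever they happen to be:

# Point (c) as code: if axiom-hood and rule applications are decidable,
# then so is whether a given array of wffs constitutes a T-proof.

def is_proof(wffs, is_axiom, follows_by_a_rule):
    """A T-proof: every line is an axiom or follows from earlier lines."""
    for i, line in enumerate(wffs):
        if is_axiom(line):
            continue
        if follows_by_a_rule(line, wffs[:i]):   # earlier lines only
            continue
        return False                            # line i is unjustified
    return True

Since this is just a finite loop over two decidable tests, the whole check is itself effectively decidable.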

Next time we’ll finish up chapter 3 where it actually gets interesting.

Gödel! #4 An Introduction to Gödel’s Theorems 2.2

2.4 Effective enumerability

Similar to effective computability and effective decidability, effective enumerability describes which sets can be enumerated using a strictly mechanical procedure (such as our Turing machines described in section 2.2). Now it is easy to imagine a program that enumerates a finite set: it just spits out the elements of the set and then halts. Consider the program:

\texttt{for (int i=0;i<5;i++)\{}\\\texttt{\indent print(i);}\\\texttt{\}}

This program will enumerate the set \{0,1,2,3,4\}. But what does it mean to effectively enumerate an infinite set? Consider the program:

\texttt{for (int i=0;;i++)\{}\\\texttt{\indent print(i);}\\\texttt{\}}

We can get an intuitive sense that this program will enumerate the set of all natural numbers, \mathbb N, but can we formalize the notion of a non-terminating program being used to enumerate an infinite set? This isn’t discussed in the book, but I like to think of it as follows:

A program \Pi is said to list an element e iff, after some finite number of steps of execution, \Pi prints out e. If \Pi lists every element in a set B then \Pi is said to enumerate B. Thus, B is effectively enumerable iff there exists a program \Pi which enumerates it.

So in our program above, we can now see formally that every element n\in\mathbb N will be listed after exactly n+1 steps (if we ignore the steps involved in managing the loop logic).

An interesting fact is that every finite set is effectively enumerable. To observe this, consider a finite set \{n_0,n_1,\ldots,n_k\}. This set is then enumerated by the program:

\texttt{print(}n_0\texttt{);}\\\texttt{print(}n_1\texttt{);}\\\ldots\\\texttt{print(}n_k\texttt{);}

Thus, given any finite set, we can in essence simply store every element in the set into memory and spit them out on demand.

2.5 Effectively enumerating pairs of numbers

What follows is a useful theorem that we will use time and time again.

Theorem 2.2: The set of ordered pairs of numbers \langle i,j\rangle is effectively enumerable.

Proof: Consider a table of ordered pairs of numbers:

\langle 0,0\rangle \rightarrow \langle 0,1\rangle \langle 0,2\rangle \rightarrow \langle 0,3\rangle \ldots
\swarrow \nearrow  \swarrow
\langle 1,0\rangle \langle 1,1\rangle \langle 1,2\rangle \langle 1,3\rangle \ldots
\downarrow  \nearrow  \swarrow
\langle 2,0\rangle \langle 2,1\rangle \langle 2,2\rangle \langle 2,3\rangle \ldots
 \swarrow
\langle 3,0\rangle \langle 3,1\rangle \langle 3,2\rangle \langle 3,3\rangle \ldots
\downarrow
\vdots \vdots \vdots \vdots \ddots

We can use a computer to zig-zag through this table (in the order indicated by the arrows), which will eventually arrive at every possible pair. Of course, we can’t store an infinite table in memory, so we generate it on the fly: we build a finite table of arbitrary size and grow it as needed, relying on the unbounded memory of our idealized computer. Thus, we have created a program which enumerates the set \mathbb N^2.

QED
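
If you want to see the zig-zag as an actual program, here’s one way I might write it (a sketch, not the book’s construction): walk the diagonals i+j=0,1,2,\ldots, alternating direction so the path matches the arrows, generating pairs on the fly with no table at all:

# Enumerate N^2 by visiting the diagonals i + j = 0, 1, 2, ... in turn,
# alternating sweep direction so the path matches the arrows in the table.
# No table is stored; pairs are generated on the fly.

def pairs():
    d = 0
    while True:                       # diagonal d holds all <i,j> with i+j=d
        ks = range(d + 1)
        for k in (ks if d % 2 else reversed(ks)):
            yield (k, d - k)          # alternate direction per diagonal
        d += 1

gen = pairs()
print([next(gen) for _ in range(6)])  # [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)]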

The book mentions that you can explicitly create a computable function f:\mathbb N\rightarrow\mathbb N^2 that will do this without the use of the table; however, it doesn’t go into details, and even if it did, the exercise can get quite messy. The way I actually know how to do it doesn’t involve the table at all, and goes the other way: \pi:\mathbb N^2\rightarrow\mathbb N. It looks like:

\pi(x,y)=2^x(2y+1)-1

You can also define inverse functions \pi_0,\pi_1 such that \pi(\pi_0(n),\pi_1(n))=n, but doing this formally is complex and involves the \mu-operator (minimization), which I’m not ready to get into yet.
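
That said, \pi itself is easy to play with, and for this particular pairing the inverses can be recovered by ordinary arithmetic, since n+1 factors uniquely as a power of two times an odd number. Here’s an illustrative sketch (not the book’s route) that sidesteps the formal machinery:

# The pairing function pi(x, y) = 2^x (2y + 1) - 1 and its inverses:
# n + 1 factors uniquely as 2^x * (odd), so x and y can be recovered
# by stripping factors of 2.

def pi(x, y):
    return 2**x * (2*y + 1) - 1

def unpair(n):
    m, x = n + 1, 0
    while m % 2 == 0:        # strip factors of 2 to find x
        m, x = m // 2, x + 1
    return (x, (m - 1) // 2) # the remaining odd part is 2y + 1

assert all(unpair(pi(x, y)) == (x, y) for x in range(20) for y in range(20))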

That wraps up chapter 2. Next time: axiomatized formal theories and why we should care.

Gödel! #2 An Introduction to Gödel’s Theorems 2.0

2 Decidability and enumerability

Here we go over some basic notions that will be crucial later.

2.1 Functions

As I imagine anyone reading this is aware (although it’s totally cool if you’re not… that’s why it’s called learning), a function f:\Delta\rightarrow\Gamma is a rule f that takes something from its domain \Delta and turns it into something from its co-domain \Gamma. We will be dealing exclusively with total functions, which means that f is defined for every element in \Delta. Or, more plainly, we can use anything in \Delta as an argument for f and have it make sense. This is contrasted with the notion of partial functions, which can have elements of the domain that f isn’t designed to handle. We will not be using partial functions at any point in this book (or so it promises).

So, given a function f:\Delta\rightarrow\Gamma, some definitions:

The range of a function is the subset of \Gamma that f can actually reach from elements of \Delta, ie: \{f(x)|x\in\Delta\}. In other words, the range is the set of all possible outputs of f.

f is surjective iff for every y\in\Gamma there is some x\in\Delta such that f(x)=y. Equivalently, f is surjective iff every member of its co-domain is a possible output of f iff its co-domain and its range are identical. This property is also called onto.

f is injective iff it maps distinct elements of \Delta to distinct elements of \Gamma. Equivalently, f is injective iff x\neq y implies that f(x)\neq f(y). This property is also called one-to-one because each possible output corresponds to exactly one input.

f is bijective iff it is both surjective and injective. Because f is defined for every element of \Delta (total), can reach every member of \Gamma (surjective) and matches each thing to exactly one other thing (injective), an immediate corollary of this is that \Delta and \Gamma have the same number of elements. This is an important result that we will use quite often when discussing enumerability.
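
For finite sets, these definitions can be checked directly. A toy sketch, where a (total) function is represented as a dictionary from its domain to its co-domain:

# f is given as a dict from the domain Delta to the co-domain Gamma.

def is_surjective(f, gamma):
    return set(f.values()) == set(gamma)   # range equals co-domain

def is_injective(f):
    return len(set(f.values())) == len(f)  # no two arguments share a value

f = {0: "a", 1: "b", 2: "c"}
print(is_injective(f), is_surjective(f, {"a", "b", "c"}))  # True True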

2.2 Effective decidability, effective computability

Deciding is the idea of determining whether a property or a relation applies in a particular case. For example, if I ask you to evaluate the predicate “is red” against the term “Mars”, you would say yes. If I gave you the predicate “halts in a finite number of steps” and the computer program \texttt{while (true);}, you would probably say no. In either case you have just decided that predicate.

Computing is the idea of applying a function to an argument and figuring out what the result is. If I give you the function f(x)=x+1 and the argument x=3 you would compute the value 4. If I give you the function f(x)=\text{the number of steps a computer program }x\text{ executes before halting} and, as its argument, the same computer program as above, you would conclude that the result is infinite. In both cases you have just computed that function.

What effectiveness comes down to is the notion of whether something can be done by a computer. Effective decidability is the condition that a property or relation can be decided by a computer in a finite number of operations. Effective computability is the condition that the result of a function applied to an argument can be calculated by a computer in a finite number of operations. For each notion, consider the two sets of two examples above. In each, the first is effectively decidable/computable and the second is not, for reasons I hope will eventually be clear.

This raises an obvious question: what is a computer? Or, more to the point, what can computers do exactly? For our purposes we will be using a generalized notion of computation called a Turing machine (named for its inventor, Alan Turing). Despite its name, a Turing machine is not actually a mechanical device, but rather a hypothetical one. Imagine you have an infinite strip of tape, extending forever in both directions. This tape is divided up into squares, each square containing either a zero or a one. Imagine also that you can walk up and down the tape and look at the square you’re standing next to. You have four options at this point (and can decide which to take based on whether you’re looking at a zero or a one, as well as a condition called the “state” of the machine): you can either move to the square on your left, move to the square on your right, change the square you’re looking at to a zero, or change it to a one. It may surprise you, but the Turing machine I have just described is basically a computer, and can execute any algorithm that can be run on today’s state-of-the-art machines.
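
To make this concrete, here’s a minimal sketch of a Turing machine simulator (using the standard variant that writes a symbol and moves in a single step; the example machine and its state names are purely illustrative):

# The tape is a dict of square -> 0/1 (squares default to 0), and the
# program is a table mapping (state, scanned symbol) ->
# (symbol to write, head move, next state).

def run(program, state, steps=10_000):
    tape, head = {}, 0
    for _ in range(steps):
        if state == "halt":
            break
        write, move, state = program[(state, tape.get(head, 0))]
        tape[head] = write
        head += {"L": -1, "R": +1}[move]
    return tape

# Example machine: write three 1s, moving right, then halt.
program = {
    ("s0", 0): (1, "R", "s1"),
    ("s1", 0): (1, "R", "s2"),
    ("s2", 0): (1, "R", "halt"),
}
print(run(program, "s0"))   # {0: 1, 1: 1, 2: 1}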

In fact, throughout the history of computability theory, whenever a new model has been developed of what can be done algorithmically by a computer (such as \lambda-calculus, \mu-recursive functions, and even modern programming languages), each of these notions has turned out to be equivalent to the Turing machine, as well as to each other. Thus, Alan Turing and Alonzo Church separately came up with what is now called the Church-Turing thesis (although the book only deals with Turing, hence “Turing’s thesis”):

Turing’s thesis: the numerical functions that are effectively computable in an informal sense (ie: where the answer can be arrived at by a step-by-step application of discrete, specific numerical operations, or “algorithmically”) are just those functions which are computable by a properly programmed Turing machine. Similarly, the effectively decidable properties and relations are just the numerical properties and relations which are decidable by a suitable Turing machine.

Of course, we are unable to rigorously define an “intuitive” or “informal” notion of what can be computed, so Turing’s thesis can never be formally proven; however, all attempts to disprove it have been thoroughly rebutted.

You might wonder, however, about just how long it might take such a simple machine to solve complex problems. And you would be right to do so: Turing machines are notoriously hard to program, and take an enormous number of steps to solve most interesting problems. If we were to actually use such a Turing machine to try and get a useful answer to a question (as opposed to, say, writing a C++ program) it could very realistically take lifetimes to calculate. By what right, then, do we call this “effective”? Another objection might concern the idea of an infinite storage medium, something no physically realizable computer could provide.

Both of these objections can be answered at once: when we discuss computability, we are not so much interested in how practical it is to run a particular program. What interests us is to know what is computable in principle, rather than in practice. The reason for this is simple: when we discover a particular problem that cannot be solved by a Turing machine in a finite number of steps, this result is all the more surprising for the liberal attitude we’ve taken towards just how long we will let our programs run, or how much space we will allow them to take up.

One final note in this section. The way we’ve defined Turing machines, they operate on zeroes and ones. This of course reflects how our modern computers represent numbers (and hence why the Turing thesis refers to “numerical” functions, properties and relations). So how then can we effectively compute functions or decide properties of other things, such as truth values or sentences? This is simple. We basically encode such things into numbers and then perform numerical operations upon them.

For example, most programming languages encode the values of \texttt{True} and \texttt{False} as 1 and 0, respectively. We can do the same thing with strings. An immediate example is ASCII encoding of textual characters to numeric values which is standardized across virtually all computer architectures. Later in the book we will learn another, more mathematically rigorous way to do the same.
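
For instance, in Python (where booleans convert to 1 and 0, and ord gives a character’s ASCII code point):

# Encoding non-numeric things as numbers: booleans as 0/1,
# and a string as its list of ASCII code points.
print(int(True), int(False))        # 1 0
print([ord(c) for c in "Godel"])    # [71, 111, 100, 101, 108]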

The rest of this chapter is about the enumerability and effective enumerability of sets, but I’m going to hold off on talking about those until next time.