Category Archives: Math

A Fun, Quick Theorem

I like this theorem, and it’s pretty straightforward to prove since I’ve done all the legwork in previous posts.

Theorem: There are sets of natural numbers that no computer program can enumerate.

Proof:

  1. The set of all computer programs is enumerable (countably infinite). (Proof here)
  2. The set of all sets of natural numbers is not enumerable (uncountably infinite). (Proof here)
  3. Therefore there are more sets of natural numbers than there are computer programs to enumerate them.
  4. Therefore there are sets of natural numbers that no computer program can enumerate.

\Box

Nothing too earth-shattering, I just thought it was cute.

Oh yeah, did I mention there are different sizes of infinities? I guess I should talk about that next time.


The Enumerability of Strings

In my last post I mentioned that the set of finite strings over a countable (possibly infinite) alphabet is enumerable. I want to go over the details of that because I think it’s neat but I don’t think I made a compelling case for it in that post.

We’ll need a few definitions to get us started:

An alphabet \Sigma is simply a set of symbols. For example, our English alphabet uses the symbols \{a,b,c,\ldots,x,y,z\} and the Greek alphabet uses the symbols \{\alpha,\beta,\gamma,\ldots,\chi,\psi,\omega\}. Both of these examples are finite alphabets, but we can also imagine a countably infinite alphabet. For generic purposes, we’ll consider the symbols of this alphabet to be \{\sigma_0,\sigma_1,\sigma_2,\sigma_3,\ldots\}. For the purposes of our proofs, all the alphabets we discuss below will be countably infinite.

A string over an alphabet \Sigma is a sequence of symbols in \Sigma, usually indicated between quotes. For example, an English string might be “apple” or “xoarotl”. A generic string would look something like “\sigma_3\sigma_{99}\sigma_0\sigma_0\sigma_{806}“. We also impose the restriction that strings must be finite in length.

And now the meat of what we want to show:

Lemma: For any n>0, the set of length-n strings is enumerable.

Proof of Lemma: This can be proved by induction. The set of length-1 strings is trivially enumerable by the function \lambda_1 where \lambda_1(i)=\sigma_i. Then, given a function \lambda_k that enumerates the length-k strings, we can enumerate the length-(k+1) strings by creating the following table, where the strings \lambda_k(i) run across the x-axis and the symbols of our alphabet run down the y-axis. In each spot in the table, we append the symbol from the y-axis to the string from the x-axis to obtain the listed result (let \& be the concatenation operator).

\lambda_k(0)\&\sigma_0 \rightarrow \lambda_k(1)\&\sigma_0 \lambda_k(2)\&\sigma_0 \rightarrow \lambda_k(3)\&\sigma_0 \ldots
\swarrow \nearrow  \swarrow
\lambda_k(0)\&\sigma_1 \lambda_k(1)\&\sigma_1 \lambda_k(2)\&\sigma_1 \lambda_k(3)\&\sigma_1 \ldots
\downarrow  \nearrow  \swarrow
\lambda_k(0)\&\sigma_2 \lambda_k(1)\&\sigma_2 \lambda_k(2)\&\sigma_2 \lambda_k(3)\&\sigma_2 \ldots
 \swarrow
\lambda_k(0)\&\sigma_3 \lambda_k(1)\&\sigma_3 \lambda_k(2)\&\sigma_3 \lambda_k(3)\&\sigma_3 \ldots
\downarrow
\vdots \vdots \vdots \vdots \ddots

We can then generate \lambda_{k+1} by moving along the diagonals of this table in the order indicated by the arrows. With some reflection, you should be able to see that this will enumerate all of the length-(k+1) strings. Thus, by induction, for a given n>0, we can enumerate all of the length-n strings over our alphabet. The function that enumerates them will be called \lambda_n.

\Box
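To make the inductive step a little more concrete, here is a minimal Python sketch of it (my own illustration, not part of the original argument). It assumes the symbols \sigma_i are represented by a hypothetical sigma(i) function and strings by tuples of symbols; next_length walks the diagonals of the table above in one fixed direction rather than the alternating zig-zag, which changes the order but still visits every entry.

```python
from itertools import count

def sigma(i):
    """Stand-in for the i-th symbol of our countably infinite alphabet."""
    return f"s{i}"          # hypothetical rendering; any injective naming works

def lambda_1():
    """Enumerate the length-1 strings: the i-th string is just sigma_i."""
    for i in count():
        yield (sigma(i),)

def next_length(lambda_k):
    """Given an enumerator of the length-k strings, build an enumerator of the
    length-(k+1) strings by walking the diagonals of the table whose (i, j)
    entry is lambda_k(i) & sigma_j."""
    def lambda_k_plus_1():
        seen = []              # the portion of lambda_k generated so far
        src = lambda_k()
        for d in count():      # the d-th diagonal holds the entries with i + j = d
            seen.append(next(src))
            for i in range(d + 1):
                yield seen[i] + (sigma(d - i),)
    return lambda_k_plus_1
```

So lambda_2 = next_length(lambda_1) enumerates the length-2 strings, lambda_3 = next_length(lambda_2) the length-3 strings, and so on.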

Theorem: The set of strings over a countably infinite alphabet \Sigma is enumerable.

Proof: This proof is almost trivial after the last one, but it makes use of the same diagonalization algorithm as well as all of the \lambda_i functions. Consider the following table:

\lambda_1(0) \rightarrow \lambda_1(1) \lambda_1(2) \rightarrow \lambda_1(3) \ldots
\swarrow \nearrow  \swarrow
\lambda_2(0) \lambda_2(1) \lambda_2(2) \lambda_2(3) \ldots
\downarrow  \nearrow  \swarrow
\lambda_3(0) \lambda_3(1) \lambda_3(2) \lambda_3(3) \ldots
 \swarrow
\lambda_4(0) \lambda_4(1) \lambda_4(2) \lambda_4(3) \ldots
\downarrow
\vdots \vdots \vdots \vdots \ddots

Now, again by travelling along the diagonal (as indicated by the arrows) we can see that all of the \Sigma-strings will be enumerated, since the i^{th} row contains all of the length-i strings. Thus we can enumerate all the strings over a countably infinite alphabet.

\Box
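Continuing the sketch from the lemma (and reusing sigma, lambda_1, next_length and the count import from it), the zig-zag over the rows \lambda_1,\lambda_2,\lambda_3,\ldots looks something like this; again it walks the diagonals in one direction rather than alternating, which doesn’t affect what gets enumerated.

```python
def all_strings():
    """Enumerate every non-empty finite string by walking the diagonals of the
    table whose rows are the enumerations lambda_1, lambda_2, lambda_3, ..."""
    rows = []                           # one live row generator per length
    next_row = lambda_1
    for d in count():
        rows.append(next_row())         # open the row of length-(d+1) strings
        next_row = next_length(next_row)
        for r in range(d + 1):
            yield next(rows[r])         # the d-th diagonal of the table
```

Its first few outputs are ('s0',), ('s1',), ('s0', 's0'), ('s2',), ('s0', 's1'), ('s0', 's0', 's0'), \ldots: the one-symbol strings interleaved with progressively longer ones, so every finite string eventually shows up.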

This isn’t quite the enumeration I alluded to in my last post, which uses an explicit mathematical formula instead of the diagonalization trick; however, this version is much easier to illustrate and does the trick equally well.

This theorem has a couple of interesting applications. As we saw in the previous post, we can consider a proof to be a “string” over an “alphabet” where the symbols are individual sentences. Thus, by this theorem, the set of proofs in a language is enumerable (assuming we have an effective way to decide which strings observe the rules of our proof system). We can get a couple of other interesting interpretations, too:

Corollary: (i) The set of all tuples of natural numbers, (ii) the set of finite subsets of \mathbb N, (iii) the set of images, and (iv) the set of all computer programs are all enumerable.

Proof of (i): A tuple of natural numbers can be thought of as a string over the alphabet \mathbb N. By our theorem, the set of all tuples of natural numbers is enumerable.

Proof of (ii): A finite subset of \mathbb N can be thought of as an unordered tuple of its elements. Each such subset therefore appears somewhere in the enumeration of tuples from (i), possibly several times under different orderings; since repetitions are allowed in our enumerations, this set is enumerable.

Proof of (iii): An image can be thought of as a string over an alphabet of pixel colours. When we do the diagonalization trick from above, instead of looking at the length of a string, we need to consider two different dimensions: the width and height of the image. This can be done with a simple inverse pairing function applied to the original string length. Thus, the set of all images is enumerable.

Proof of (iv): Similar to a proof, a computer program can be thought of as a linear sequence of instructions. Thus, a computer program is a “string” over an “alphabet” of possible instructions and variables (such as loop controls, conditional logic, arithmetic operations, etc…). Thus the set of all computer programs (in a given language) is enumerable.

\Box

(On a personal note, I totally finished writing this just as the podcast I was listening to ended. Great timing or what?)

Gödel! #4 An Introduction to Gödel’s Theorems 2.2

2.4 Effective enumerability

Similar to effective computability and effective decidability, effective enumerability describes which sets can be enumerated using a strictly mechanical procedure (such as our Turing machines described in section 2.2). Now it is easy to imagine a program that enumerates a finite set: it just spits out the elements of the set and then halts. Consider the program:

\texttt{for (int i=0;i<5;i++)\{}\\\texttt{\indent print(i);}\\\texttt{\}}

This program will enumerate the set \{0,1,2,3,4\}. But what does it mean to effectively enumerate an infinite set? Consider the program:

\texttt{for (int i=0;;i++)\{}\\\texttt{\indent print(i);}\\\texttt{\}}

We can get an intuitive sense that this program will enumerate the set of all natural numbers, \mathbb N, but can we formalize the notion of a non-terminating program being used to enumerate an infinite set? This isn’t discussed in the book, but I like to think of it as follows:

A program \Pi is said to list an element e iff, after some finite number of steps of execution, \Pi prints out e. If \Pi lists every element in a set B, then \Pi is said to enumerate B. Thus, B is effectively enumerable iff there exists a program \Pi which enumerates it.

So in our program above, we can now see formally that every element n\in\mathbb N will be listed after exactly n+1 steps (if we ignore the steps involved in managing the loop logic).

An interesting fact is that every finite set is effectively enumerable. To observe this, consider a finite set \{n_0,n_1,\ldots,n_k\}. This set is then enumerable by the program:

\texttt{print(}n_0\texttt{);}\\\texttt{print(}n_1\texttt{);}\\\ldots\\\texttt{print(}n_k\texttt{);}

Thus, given any finite set, we can in essence simply store every element in the set into memory and spit them out on demand.

2.5 Effectively enumerating pairs of numbers

What follows is a useful theorem that we will use time and time again.

Theorem 2.2: The set of ordered pairs of numbers \langle i,j\rangle is effectively enumerable.

Proof: Consider a table of ordered pairs of numbers:

\langle 0,0\rangle \rightarrow \langle 0,1\rangle \langle 0,2\rangle \rightarrow \langle 0,3\rangle \ldots
\swarrow \nearrow  \swarrow
\langle 1,0\rangle \langle 1,1\rangle \langle 1,2\rangle \langle 1,3\rangle \ldots
\downarrow  \nearrow  \swarrow
\langle 2,0\rangle \langle 2,1\rangle \langle 2,2\rangle \langle 2,3\rangle \ldots
 \swarrow
\langle 3,0\rangle \langle 3,1\rangle \langle 3,2\rangle \langle 3,3\rangle \ldots
\downarrow
\vdots \vdots \vdots \vdots \ddots

We can use a computer to zig-zag through this table (in the order indicated by the arrows), which will eventually arrive at every possible combination. We can’t actually store an infinite table in memory, so we must generate it on the fly: we build a finite portion of the table and extend it as needed, which is fine given the unlimited memory of our idealized computer system. Thus, we have created a program which enumerates the set \mathbb N^2.

QED
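For what it’s worth, the on-the-fly generation is very short to write down. Here is a Python sketch (mine, not the book’s) that visits diagonal d — the pairs \langle i,j\rangle with i+j=d — in a fixed direction rather than zig-zagging, which changes the order but not what gets enumerated.

```python
from itertools import count, islice

def pairs():
    """Enumerate N x N by walking the anti-diagonals of the table above:
    the d-th diagonal contains exactly the pairs <i, j> with i + j = d."""
    for d in count():
        for i in range(d + 1):
            yield (i, d - i)

print(list(islice(pairs(), 10)))
# -> [(0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (2, 0), (0, 3), (1, 2), (2, 1), (3, 0)]
```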

The book mentions that you can explicitly create a computable function f:\mathbb N\rightarrow\mathbb N^2 that will do this without the use of the table; however, it doesn’t go into details, and even if it did, the exercise can get quite messy. The way I actually know how to do it doesn’t involve the table at all and actually goes the other way, \pi:\mathbb N^2\rightarrow\mathbb N. It looks like:

\pi(x,y)=2^x(2y+1)-1

You can also define inverse functions \pi_0,\pi_1 such that \pi(\pi_0(n),\pi_1(n))=n, but this is complex and involves \mu-calculus which I’m not ready to get into yet.
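That said, for this particular \pi you can compute the inverses quite directly, no \mu machinery needed, by factoring out powers of two: since n+1=2^x(2y+1) uniquely, \pi_0(n) is the number of times 2 divides n+1, and \pi_1(n) falls out of the remaining odd part. A quick sketch (my own illustration):

```python
def pi(x, y):
    """The pairing function above: pi(x, y) = 2^x * (2y + 1) - 1."""
    return 2 ** x * (2 * y + 1) - 1

def unpair(n):
    """Return (pi_0(n), pi_1(n)): strip the factors of 2 out of n + 1, then
    recover y from the leftover odd part."""
    m, x = n + 1, 0
    while m % 2 == 0:
        m //= 2
        x += 1
    return x, (m - 1) // 2

# Sanity checks that pi and unpair really are inverses of one another.
assert unpair(pi(3, 7)) == (3, 7)
assert all(pi(*unpair(n)) == n for n in range(1000))
```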

That wraps up chapter 2. Next time: axiomatized formal theories and why we should care.

Gödel! #3 An Introduction to Gödel’s Theorems 2.1

2.3 Enumerable sets

Another useful notion we will be using time and again is that of effective enumerability. As with effective decidability and effective computability, we will first explain what it means to be enumerable in principle, before moving on to effective enumerability proper. Straight from the book:

A set \Sigma is enumerable if its members can – at least in principle – be listed off in some order (a zero-th, first, second) with every member appearing on the list; repetitions are allowed, and the list may be infinite.

What this means is that you can give a (possibly infinite) list that will contain every single member of \Sigma, and each member is guaranteed to appear a finite number of entries into this list. The finite case is fairly obvious: if \Sigma=\emptyset (where \emptyset denotes the empty set containing no elements) then trivially all elements of \Sigma will appear on any list (say, the empty list containing no entries). If \Sigma is larger, but still finite, we can imagine just going through each of the elements and listing them one by one. For example, 0, 1, 2, 3, 4, 5 is an enumeration of the finite set \{0,1,2,3,4,5\}. Similarly, 0, 1, 5, 3, 4, 3, 0, 1, 2 also enumerates that set, although with redundancies and not in a natural order.

The tricky case is infinite lists. The condition you need to pay special attention to in this instance is that each element of \Sigma must appear a finite number of entries into the list. So, for example, the following lists each enumerate \Sigma=\mathbb N (where \mathbb N denotes the infinite set containing all natural numbers):

  1. 0, 1, 2, 3, 4, 5,\ldots
  2. 1, 0, 3, 2, 5, 4,\ldots
  3. 0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4,\ldots

Notice that in each case (assuming our patterns hold) we can determine exactly how far into the list a given number n will appear. In 1., n is in the n^\text{th} position. In 2., n appears n+1 entries down the list if n is even, and n-1 entries down if n is odd. In the third example, n appears \sum_{i=0}^n i entries into the list. Note that to make the math easier, we start our counting at zero: thus, the left-most element listed is the “zero-th”, the next is the first, the next is the second, and so on. Now 2. and 3. are contrived examples, but the point they make is that each n appears a finite number of entries into the list, and we can tell exactly how far into the list it is. Contrast that with the following non-examples:

  1. 0, 2, 4, 6, ... , 1, 3, 5, 7,\ldots
  2. 100, 99, 98, 97,\ldots
  3. 1, 9, 0, 26, 82, 0, 13,\ldots

In 1., all the odd numbers seem to appear an infinite number of places into the list. This violates precisely the condition we just highlighted. In 2. there’s still an obvious pattern, but any number greater than 100 doesn’t seem to appear at all. Finally, in 3. there’s no clear pattern to how the numbers are being listed. It is entirely possible that this is the beginning of some valid enumeration, but without more information it’s impossible to tell. So despite the fact that \Sigma is enumerable, none of these three lists is a valid way of doing so.
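Going back to the valid enumerations for a moment, here is a quick sanity check, in Python, of the position claim for the second one (my own illustration, with zero-based positions as above): even n shows up at position n+1 and odd n at position n-1.

```python
from itertools import count, islice

def enum2():
    """The enumeration 1, 0, 3, 2, 5, 4, ...: swap each even number with its successor."""
    for k in count(0, 2):
        yield k + 1
        yield k

def position(n):
    return n + 1 if n % 2 == 0 else n - 1

first = list(islice(enum2(), 12))          # [1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10]
assert all(first[position(n)] == n for n in range(12))
```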

So hopefully that gives you a bit of an intuitive notion of the idea of enumerability. For the more formally-inclined, here is how this is defined mathematically:

The set \Sigma is enumerable iff either \Sigma is empty or else there is a surjective (onto) function f:\mathbb N\rightarrow\Sigma (so that \Sigma is the range of f). We say that such a function enumerates \Sigma.

The text proves that these two definitions are equivalent, but it’s fairly straightforward, so if you’re having trouble seeing it, I suggest sitting down and working out why these two versions of enumerability come out to the same thing. It should be similarly obvious that any subset of \mathbb N (finite or infinite) is also enumerable. However:

Theorem 2.1: There are infinite sets that are not enumerable.

Proof: Consider the set \mathbb B of infinite binary strings (ie: the set containing strings like ``011001011001..."). Obviously \mathbb B is infinite. Suppose, for the purposes of contradiction (also known as reductio), that some enumerating function f:\mathbb N\rightarrow \mathbb B does exist. Then, for example, f will look something like:

0\mapsto s_0:\underline{0}110010010\ldots\\1\mapsto s_1:1\underline{1}01001010\ldots\\2\mapsto s_2:10\underline{1}1101100\ldots\\3\mapsto s_3:000\underline{0}000000\ldots\\4\mapsto s_4:1110\underline{1}11000\ldots\\\ldots

The exact values of s_i aren’t important (as we will see) so this example will abstract to the general case. What we are going to do now is construct a new string, t, such that t does not appear in the enumeration generated by f. We will do this by generating t character-by-character. To determine the n^\text{th} character in t, simply look at the n^\text{th} character of s_n and flip it. Thus, given our example enumeration above, the first 5 characters of t would be ``10010\ldots", which we get by just this method (for convenience, the n^\text{th} character of each s_n has been underlined). Now all we have to do is notice that t will differ from each of the s_i’s at precisely the i^{th} position. As such, t does not appear in the enumeration generated by f. Thus, f is not an enumeration of \mathbb B, which contradicts our hypothesis that \mathbb B is enumerable.

QED
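If you like to see these things mechanically, here is a small Python rendering of the diagonal construction (mine, not the book’s). An infinite binary string is modelled as a function from positions to bits, and an enumeration as a function from \mathbb N to such strings; the returned t flips the diagonal, so it differs from s_n at position n no matter what enumeration you feed in.

```python
def diagonal_flip(s):
    """Given a purported enumeration s (s(n) is the n-th infinite binary string,
    itself a function from positions to bits), return a string t that the
    enumeration misses: t differs from s(n) at position n."""
    return lambda n: 1 - s(n)(n)

# A toy "enumeration" just to illustrate: s(n) has a 1 exactly in positions <= n.
s = lambda n: (lambda k: 1 if k <= n else 0)
t = diagonal_flip(s)
print([t(k) for k in range(5)])   # [0, 0, 0, 0, 0] -- t differs from each s(n) at n
```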

This gives us some interesting corollaries depending on how you want to interpret the set \mathbb B:

For example, a binary string b\in\mathbb B can be thought of as representing a real number 0\leq b\leq 1 written in binary (ie: ``0010110111..." would represent 0.0010110111\ldots and ``0000000000..." would represent 0). Thus we know that the real numbers in the interval [0,1] are not enumerable (and so neither is the set of all real numbers \mathbb R).

Another way to think of \mathbb B is that it is the set of sets of natural numbers. To see this, interpret a given string b=``b_0b_1b_2\ldots" to be the set b^\prime=\{n|b_n=1\}, where a number n\in b^\prime iff b_n=1 and n\not\in b^\prime iff b_n=0. So for example, if b=``10101000111..." then b^\prime=\{0,2,4,8,9,10,\ldots\}. Thus, the set of sets of natural numbers (denoted \mathcal P\mathbb N) is also not enumerable.
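In code, this direction of the correspondence is a one-liner; here is a tiny Python check of the example above (using a finite prefix of b, which is all we ever need to decide whether a particular n is in the set):

```python
def string_to_set(bits):
    """Read a binary string b_0 b_1 b_2 ... as the set {n : b_n = 1}."""
    return {n for n, b in enumerate(bits) if b == "1"}

print(sorted(string_to_set("10101000111")))   # [0, 2, 4, 8, 9, 10]
```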

In later chapters we will learn the notion of a characteristic function, which is a function f:\mathbb N\rightarrow\{0,1\} associated with a numerical property P, mapping n\mapsto 0 if Pn holds and n\mapsto 1 if \neg Pn holds. (This may seem backwards, since 0 typically denotes \texttt{False} and 1 denotes \texttt{True}, however we will see the reasons for this in due course.) If we consider an element b=``b_0b_1b_2\ldots"\in\mathbb B to describe a characteristic function b^\prime by n\mapsto b_n, then we can observe that the set of all characteristic functions is similarly non-enumerable.

Next time we will finish up chapter 2 by discussing the limitations of what can be effectively enumerated by a computer program.

Gödel! #2 An Introduction to Gödel’s Theorems 2.0

2 Decidability and enumerability

Here we go over some basic notions that will be crucial later.

2.1 Functions

As I imagine anyone reading this is aware (although it’s totally cool if you’re not… that’s why it’s called learning), a function f:\Delta\rightarrow\Gamma is a rule f that takes something from its domain \Delta and turns it into something from its co-domain \Gamma. We will be dealing exclusively with total functions, which means that f is defined for every element in \Delta. Or, more plainly, we can use anything in \Delta as an argument for f and have it make sense. This is contrasted with the notion of partial functions, which can have elements of the domain that f isn’t designed to handle. We will not be using partial functions at any point in this book (or so it promises).

So, given a function f:\Delta\rightarrow\Gamma, some definitions:

The range of a function is the subset of \Gamma that f can possibly get to from elements of \Delta, ie: \{f(x)|x\in\Delta\}. In other words, the range is the set of all possible outputs of f.

f is surjective iff for every y\in\Gamma there is some x\in\Delta such that f(x)=y. Equivalently, f is surjective iff every member of its co-domain is a possible output of f iff its co-domain and its range are identical. This property is also called onto.

f is injective iff it maps different elements of \Delta to different elements of \Gamma. Equivalently, f is injective iff x\neq y implies that f(x)\neq f(y). This property is also called one-to-one because it matches everything with exactly one corresponding value.

f is bijective iff it is both surjective and injective. Because f is defined for every element of \Delta (total), can reach every member of \Gamma (surjective) and matches each thing to exactly one other thing (injective), an immediate corollary of this is that \Delta and \Gamma have the same number of elements. This is an important result that we will use quite often when discussing enumerability.

2.2 Effective decidability, effective computability

Deciding is the idea of determining whether a property or a relation applies in a particular case. For example, if I ask you to evaluate the predicate “is red” against the term “Mars”, you would say yes. If I asked you to evaluate the predicate “halts in a finite number of steps” against the computer program \texttt{while (true);}, you would probably say no. In either case you have just decided that predicate.

Computing is the idea of applying a function to an argument and figuring out what the result is. If I give you the function f(x)=x+1 and the argument x=3, you would compute the value 4. If I give you the function f(x)=\text{the number of steps a computer program }x\text{ executes before halting} and, as the argument, the same computer program as above, you would conclude that the result is infinite. In both cases you have just computed that function.

What effectiveness comes down to is the notion of whether something can be done by a computer. Effective decidability is the condition that a property or relation can be decided by a computer in a finite number of operations. Effective computability is the condition that the result of a function applied to an argument can be calculated by a computer in a finite number of operations. For each notion, consider the two sets of two examples above. In each, the first is effectively decidable/computable and the second is not, for reasons I hope will eventually be clear.

This raises an obvious question: what is a computer? Or, more to the point, what can computers do exactly? For our purposes we will be using a generalized notion of computation called a Turing machine (named for its inventor, Alan Turing). Despite its name, a Turing machine is not actually a mechanical device, but rather a hypothetical one. Imagine you have an infinite strip of tape, extending forever in both directions. This tape is divided up into squares, each square containing either a zero or a one. Imagine also that you can walk up and down and look at the square you’re standing next to. You have four options at this point (and can decide which to take based on whether you’re looking at a zero or a one, as well as a condition called the “state” of the machine): you can either move to the square to your left, move to the square on your right, change the square you’re looking at to a zero, or change it to a one. It may surprise you, but the Turing machine I have just described is basically a computer, and can execute any algorithm that can be run on today’s state-of-the-art machines.
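If it helps to see this as a program: below is a toy Python simulator for exactly the machine just described (my own sketch, not anything from the book). A program maps (state, current symbol) to one of the four actions — move left, move right, write 0, write 1 — plus a new state; textbook formulations usually bundle writing and moving into a single step, but the spirit is the same.

```python
def run(program, tape=None, state="start", max_steps=100):
    """Simulate the machine described above.  The doubly infinite tape of 0s and
    1s is modelled as a dict from position to symbol (anything absent is 0); at
    each step the machine looks up (state, symbol under the head) and either
    moves left ('L'), moves right ('R'), or writes a 0 or a 1, then changes state."""
    tape, pos = dict(tape or {}), 0
    for _ in range(max_steps):
        if state == "halt":
            break
        action, state = program[(state, tape.get(pos, 0))]
        if action == "L":
            pos -= 1
        elif action == "R":
            pos += 1
        else:                      # the action is a symbol to write
            tape[pos] = action
    return tape, state

# A hypothetical little program: write 1s rightwards until hitting a 1 already
# on the tape, then halt.
program = {
    ("start", 0): (1, "wrote"),    # blank square: write a 1 ...
    ("wrote", 1): ("R", "start"),  # ... then move right and repeat
    ("start", 1): (1, "halt"),     # square already holds a 1: stop
}
print(run(program, tape={3: 1}))   # ends with 1s in squares 0 through 3
```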

In fact, throughout the history of computability theory, whenever a new model has been developed of what can be done algorithmically by a computer (such as \lambda-calculus, \mu-calculus, and even modern programming languages), each of these notions has turned out to be equivalent in power to the Turing machine, and to each of the others. Thus, Alan Turing and Alonzo Church separately came up with what is now called the Church-Turing thesis (although the book only deals with Turing, hence “Turing’s thesis”):

Turing’s thesis: the numerical functions that are effectively computable in an informal sense (ie: where the answer can be arrived at by a step-by-step application of discrete, specific numerical operations, or “algorithmically”) are just those functions which are computable by a properly programmed Turing machine. Similarly, the effectively decidable properties and relations are just the numerical properties and relations which are decidable by a suitable Turing machine.

Of course we are unable to rigorously define an “intuitive” or “informal” notion of what can be computed, so Turing’s thesis can never be formally proven; however, all attempts to disprove it have failed.

You might wonder, however, about just how long it might take such a simple machine to solve complex problems. And you would be right to do so: Turing machines are notoriously hard to program, and take an enormous number of steps to solve most interesting problems. If we were to actually use such a Turing machine to try and get a useful answer to a question (as opposed to, say, writing a C++ program) it could very realistically take lifetimes to calculate. By what right, then, do we call this “effective”? Another objection might concern the idea of an infinite storage medium, which no physical computer could actually provide.

Both of these objections can be answered at once: when we discuss computability, we are not so much interested in how practical it is to run a particular program. What interests us is to know what is computable in principle, rather than in practice. The reason for this is simple: when we discover a particular problem that cannot be solved by a Turing machine in a finite number of steps, this result is all the more surprising for the liberal attitude we’ve taken towards just how long we will let our programs run, or how much space we will allow them to take up.

One final note in this section. The way we’ve defined Turing machines, they operate on zeroes and ones. This of course reflects how our modern computers represent numbers (and hence why the Turing thesis refers to “numerical” functions, properties and relations). So how then can we effectively compute functions or decide properties of other things, such as truth values or sentences? This is simple. We basically encode such things into numbers and then perform numerical operations upon them.

For example, most programming languages encode the values of \texttt{True} and \texttt{False} as 1 and 0, respectively. We can do the same thing with strings. An immediate example is the ASCII encoding of textual characters to numeric values, which is standardized across virtually all computer architectures. Later in the book we will learn another, more mathematically rigorous way to do the same.
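As a trivial illustration of the encoding idea (Python’s ord gives Unicode code points rather than strict ASCII, but the principle — symbols in, numbers out, reversibly — is the same):

```python
text = "Godel"
codes = [ord(c) for c in text]                  # [71, 111, 100, 101, 108]
assert "".join(chr(n) for n in codes) == text   # and we can decode back again
```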

The rest of this chapter is about the enumerability and effective enumerability of sets, but I’m going to hold off on talking about those until next time.