Something that is dear to my thought process, not least because it is related to an argument that I present in my own God Doesn't; We Do, is the question of handling probability in spaces that contain infinitely many points. This topic is pretty thorny and ugly, and there are many places where it gets very weird. More importantly, parts of this topic are far from settled in the philosophy of mathematics, just as the interpretation of probability itself is far from settled. Indeed, these are among the bitterest debates in the philosophy of mathematics at present, the latter probably the bitterest by a considerable margin and intense enough to have boiled over into the philosophy of science and the field of ontology.
I don't intend (and am not qualified) to get deep into those debates, but I do want to talk about the issue some since it is a highly seductive area to play in and one in which our intuition is likely to lead us like the siren's song directly to the rocks of nonsense. I am guilty of this type of error myself, as I have admitted in a previous post where I note that I made this sort of error in God Doesn't; We Do. The primary aspect of how weird all this gets that I want to talk about is related to probability density functions being put on infinite spaces.
Probability Density Functions and Infinite Spaces
A probability density function is a rule that tells us how we will assign probabilities to the set of outcomes we are interested in investigating, called the sample space. Some sample spaces are finitely big, like the set of outcomes when rolling a six-sided die. Other sample spaces are infinitely big, like the range of possible values between 0 and 1 on the real number line, or all of the natural numbers (1, 2, 3, ...). Other sample spaces are finitely big in reality but with such a large number of possible outcomes as to be modelled with infinite sample spaces, like the range of all possible heights of all conceivably possible human beings. The probability density function (PDF) is what tells us the rule that assigns probabilities to various outcomes or ranges of outcomes.
In the case of the six-sided die, matters are relatively simple. If we assume a standard die (numbered 1, 2, 3, 4, 5, 6, one numeral on each face) that is fair (each outcome equally likely to occur in any roll of the die), then the PDF assigns a probability of 1/6 to each of those outcomes. This PDF is called a uniform PDF. [For those interested in the debate about probabilistic interpretation, the frequentist approach argues that this is meaningful because if we perform this experiment a very large number of times, we will get close to 1/6 of the total outcomes showing each individual value; the propensitist approach argues that this is meaningful because there are six equally likely faces dividing up a total probability of 1; and the famous Law of Large Numbers connects the two approaches without resolving the debate.]
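The frequentist reading of that uniform PDF is easy to see for yourself. Here is a minimal sketch (the number of rolls and the random seed are arbitrary choices of mine) showing the empirical frequencies of a simulated fair die settling near 1/6, as the Law of Large Numbers predicts:

```python
import random
from collections import Counter

# Simulate many rolls of a fair six-sided die. Under the uniform PDF,
# each face has probability 1/6, and the Law of Large Numbers says the
# empirical frequency of each face should approach that value.
random.seed(0)  # arbitrary seed, just for reproducibility
rolls = [random.randint(1, 6) for _ in range(600_000)]
counts = Counter(rolls)

for face in range(1, 7):
    freq = counts[face] / len(rolls)
    print(face, round(freq, 4))  # each frequency lands close to 1/6 ≈ 0.1667
```

Nothing here resolves the frequentist/propensitist debate, of course; the simulation simply illustrates the connection the Law of Large Numbers draws between the two readings.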
In the case of the values 0 to 1 on the real line (denoted [0,1] if we include 0 and 1), we can assign a uniform PDF that gives an equal chance of picking any value, but that chance for any particular value is zero, which is not very useful. Perhaps the most useful way to handle this problem is by using calculus (or, in this case, geometry). We can think of the function (PDF) that is the horizontal line at height one on the interval [0,1], and then say that the probability that a value is in any interval [a,b] within [0,1] is the area under the line at height 1, above the horizontal axis, and between the endpoints of the outcome interval, a and b. For example, if we choose a=1/4 and b=3/4, then we get that the probability that a randomly selected element of [0,1] will occur in [1/4, 3/4] under the uniform distribution is 1/2. In general, the PDF could be a curve (though this wouldn't be a uniform PDF), and this area under the curve (the definite integral of the PDF on the specified interval) gives the probability of finding the random variable within that interval. This can be generalized to abstract integration (which defines modern probability theory).
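The "area under the line at height 1" idea can be made concrete with a crude Riemann sum. This is only a sketch (the step count is an arbitrary choice); for the uniform PDF the area over [a,b] is just the rectangle of height 1 and width b-a:

```python
# Under the uniform PDF f(x) = 1 on [0, 1], the probability of landing
# in [a, b] is the area under f between a and b, i.e. b - a.
# A Riemann sum approximates that area by adding up thin rectangles.
def uniform_prob(a, b, steps=100_000):
    """Approximate the integral of f(x) = 1 from a to b."""
    width = (b - a) / steps
    return sum(1.0 * width for _ in range(steps))

print(uniform_prob(0.25, 0.75))  # ≈ 0.5, matching b - a for a=1/4, b=3/4
print(uniform_prob(0.0, 1.0))    # ≈ 1.0, the total probability
```

For a non-uniform PDF, the `1.0` in the sum would be replaced by the height of the curve at each point, which is exactly the definite integral described above.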
Wait a minute! The total probability has to be 1, and this is like adding up a whole bunch of zeros! Good observation. We get out of this problem by noting that probability (even in the most abstract sense) is only countably additive, and the number of values in any interval of the real number line is uncountably infinite. Thus, we get the very odd and slightly uncomfortable idea that modern mathematics indicates (for good reasons, see the previous post in this series) that if you add up enough zeros, you can get something nonzero. Indeed, if you think carefully enough about this, it creates another important weirdness (called Exhibit A, see bottom of post).
In the case of the range of heights, we know that more people are near some average height but that not everyone is. The PDF that would tell you how likely it is that someone chosen at random would have a height that falls in a given range is (nearly) what is known as the normal distribution, which is the famous "bell curve." It is not really the normal distribution, not only because there are only finitely many conceivable humans with finitely many conceivable human heights, not only because of sexual dimorphism in humans, but particularly (meaning including if we ignore those other details) because no one can have negative height no matter what is hypothetically possible in the positive direction. The take-home message of this example is that some PDFs assign different values to different ranges, translating to some ranges being more likely than others (in fact, infinitely more likely than others).
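That last parenthetical can be made concrete. Below is a sketch, using entirely hypothetical parameters (a mean of 170 cm and a standard deviation of 10 cm that I am making up for illustration), showing a normal PDF assigning wildly different probabilities to two ranges of the same width:

```python
import math

# The normal CDF can be written with the error function, so the
# probability of a value falling in [lo, hi] is CDF(hi) - CDF(lo).
def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical height distribution: mean 170 cm, std dev 10 cm.
def prob_in_range(lo, hi, mu=170.0, sigma=10.0):
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# Two ranges of equal width (10 cm) get very different probabilities:
print(prob_in_range(165, 175))  # near the mean: roughly 0.38
print(prob_in_range(195, 205))  # out in the tail: under 0.01
```

(Note that this idealized model also assigns a tiny but nonzero probability to negative heights, which is exactly the sort of defect mentioned above.)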
So when does this get weird?
But what about in the case of the full set of natural numbers? I offer this example in God Doesn't; We Do to appeal to intuition, and by doing so, I listened to the siren song too much and committed an error (that is but isn't). Link. The song is so sweet that even a mathematician can fall prey to it. The example is to consider selecting a random number from various intervals of whole numbers: between one and six (following above), between one and ten, one and one hundred, one and one million, one and one billion, and so on. If we assume a uniform distribution in each case, we see probabilities of selecting some special number (say 1, which is in all of the collections mentioned) as being 1/6, then 1/10, then 1/100, then 1/1,000,000, then 1/1,000,000,000. These numbers are getting smaller and smaller and smaller as we make the sample space bigger and bigger and bigger.
The implication that follows our intuition is easy. What if we go all the way to infinity? Well, we get probability zero for drawing 1 from all of the natural numbers with a uniform distribution on them, right? No.
The problem is that if we try to add up all those zeros, we don't actually have enough of them to get a nonzero number. We get zero. But the total probability for the whole space has to be one. On the other hand, if we say the probability of selecting each number is anything nonzero with the probabilities all equal to one another, then the sum of all of those values, no matter how small, is infinite. We cannot get one this way. We cannot even put the uniform PDF on the set of natural numbers because that PDF cannot exist (i.e. it is logically contradictory).
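Both horns of that dilemma can be sketched with partial sums. This toy calculation (the particular epsilon is an arbitrary choice of mine) shows that a constant probability of zero never accumulates to 1, while any constant positive probability eventually blows past 1 and keeps growing:

```python
# Two failed attempts at a "uniform PDF" on the natural numbers, viewed
# through partial sums. Assigning probability 0 to each number leaves the
# running total at 0 forever; assigning any fixed epsilon > 0 makes the
# total exceed 1 and grow without bound. No constant value can total 1.
def partial_sum(p_each, terms):
    """Total probability of the first `terms` numbers at p_each apiece."""
    return p_each * terms

print(partial_sum(0.0, 10**9))      # 0.0, no matter how many terms
epsilon = 1e-9                       # arbitrary tiny positive value
print(partial_sum(epsilon, 10**9))   # already at 1.0 after a billion terms
print(partial_sum(epsilon, 10**10))  # 10.0, overshooting and still growing
```

Since these are the only two options for a constant assignment, the uniform PDF on the naturals is ruled out.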
But it feels so natural...
It's not being referred to here as a siren's song for nothing! Our intuition makes us expect that this makes sense, but it doesn't. So what gives?
What gives is the main point of this post--I'll make a conjecture about why this song is so alluring--but first I want to address a very common misconception people have at this point. Many people think at this point that the absence of the possibility of a uniform distribution* on the full set of natural numbers means that it is somehow impossible to choose at random from amongst all natural numbers. This is incorrect. It's just that the PDF that describes how likely elements are to be chosen cannot be uniform. We have to use a PDF that assigns different values to different numbers, and when we say "choose a number at random from amongst all natural numbers," that is automatically and necessarily what we mean (see *). The PDF we choose must be such that if we add up all of the probabilities for every number, we get 1.
One example has already sneakily been given. Consider the PDF that assigns probability 1/6 for selecting each of the numbers 1, 2, 3, 4, 5, and 6 and zero for every number bigger than 6. This is the same PDF we had for a fair six-sided die above, extended to the infinite space of natural numbers. An infinite number of possible outcomes here are assigned probability zero (meaning that they cannot occur). In particular, not every number is equally likely to occur (e.g. 7 is less likely than 6).
Another example assigns a nonzero probability to every number in the natural numbers. If we consider the PDF that assigns probability 1/2^n to the number n for every natural number n, then the probability that we would select one is 1/2, that we'd get two is 1/4, that we'd get three is 1/8, and so on, which works as a PDF (because the series (infinite sum) over all values of n is 1). Here, not every number is equally likely to occur, clearly.
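That geometric-series claim is easy to check numerically. This sketch adds up the first several dozen terms of the PDF just described; the tail beyond them is vanishingly small:

```python
# The PDF assigning probability 1/2**n to the natural number n.
# Its total mass is the geometric series 1/2 + 1/4 + 1/8 + ... = 1,
# so it is a legitimate PDF on the naturals, just not a uniform one.
def p(n):
    return 2.0 ** -n

# Partial sum of the first 59 terms; the remaining tail is below 2**-59.
total = sum(p(n) for n in range(1, 60))
print(total)       # ≈ 1.0
print(p(1), p(2))  # 0.5 and 0.25: not every number is equally likely
```

Any assignment whose values sum to 1 over all the naturals would do; this one is just a particularly tidy example.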
Some folks argue at this point that we can imagine a situation in which we have a bag with labelled balls in it, one for each natural number. We reach in and select one at random. That means there can be a uniform PDF on the natural numbers then, right? Nope. It doesn't get around the problem. The problems with this bag are copious and discussed in the section below.
*Some folks argue that we can define a very small thing called an "infinitesimal," smaller than every positive real number although still bigger than zero, that fills this gap. Infinitesimals are the reciprocals of infinities, and there are entire fields of mathematics that have developed (and that are still developing) to try to describe and work with them. There are problems. See either my post about the error that is but isn't or Exhibit A at the bottom of this post.
Why is it so alluring then? What gives?
I suspect that because (countable) infinity is a value that represents a strong limit cardinal, we are actually not very good at conceiving of what infinity really means. Indeed, since all of our usual attempts to conceive of infinity are of the constructivist type, using something like succession or exponentiation to try to make our concept grow "toward infinity," we are falling for the idea that we can "get to" infinity when indeed, we cannot get to infinity. That's the meaning of it being a strong limit cardinal. To "get there," we have to cheat and just jump there.
Our intuition doesn't work that way, though. It says, "ten things, yeah, okay; one hundred; yeah; a million, no problem; you can see where this is going." Then it cheats and makes the jump. The thing is, by using this method of trying to get there, we will never get there. Add one more? Always finite. Put another zero on the end? Still finite. Double the number of digits? Still finite every. single. time.
Rephrased in terms of probabilities, we say "one in ten, yeah, okay; one in a hundred, yeah; one in a million, no problem; you can see where this is going." Then our intuition cheats us and makes the jump. The problem is, it jumps to something that isn't there, but our intuition is satisfied that it has landed somewhere meaningful.
That means that when we're doing the "you can see where this is going" step, our intuition is pulling a fast one on us by thinking of successively lower PDFs that look like the dice thing, and then it jumps. When it jumps is when we punt on not being able to really fathom the numbers any longer. When it jumps, what it doesn't realize it has done is assign a tail behavior to those remaining values above our last example (maybe it's a googolplex or a googolplex to a googolplex power; it really doesn't matter). That is, it does something that either drops off to zero at some point and ever after, or it trails off to zero fast enough so that the total sum of the probabilities of every possible outcome is still one.
But what about that bag example?
Yeah, so the bag with infinitely many labelled balls in it. This has some issues with it. First, how did the balls all get into the bag in the first place? If added sequentially or even in ever-growing (but still finite) lumps, the bag is never filled if we fill at regular intervals or any nonconvergent time intervals (convergent time intervals very rapidly have us putting balls in the bag faster than the Planck time, which is not good for making physical sense--but then neither is a bag that can hold infinitely many balls, presumably all of the same size).
If we suppose that we somehow magically have that bag, we still have a problem, though. Our discussion above really reveals what it is: we know the PDF cannot be uniform, and so we conclude that it cannot be possible to assign equal probability to every ball in the bag. This makes a lot of sense as soon as we imagine it.
This bag, presumably, is meant to be reached into in order to pick a ball. That makes us imagine its dimensions. If it is only finitely big around, then it must be infinitely deep. At some point, I think we all have to admit that the balls way down in there (there is no bottom, actually) are less likely to be chosen than the balls nearer the top. This is where our intuition does that jumping thing. Down to any specified depth, we're good to go and can force the analogy. The problem is that infinitely many balls are below that specified depth, and our intuition automatically either ignores those balls (or all of the balls below some depth related to the specified one, like 1000 times deeper) or assigns a diminishing significance to their role in the problem (and thus a shrinking likelihood of choosing them).
On the other hand, if the bag is infinitely big around, then we may have a bottom to the bag, but the bag has no edges. We do the same fuzzing out around the edges, ignoring vast swaths of balls (almost all of them, in fact) which defy our intuitions around the edges.
Thus, this bag has to have a non-uniform PDF assigned to picking balls out of it, and once we realize that, the reasons start to seem pretty obvious. Such a bag, even in principle, forces us to ignore most of the balls in it.
In general, then, bag analogy or not, the reason we feel like it is intuitive to do the impossible and put a uniform PDF on a space with infinite measure comes down to having to ignore the vast majority of the values when we ultimately cheat and jump the gap to the strong limit cardinal that is infinity. Our ignorance of the majority of the values is so complete that we are too ignorant to realize we're ignoring them, and them is almost all of them.
I'll end this here, except for the alluded-to Exhibit A. This is quite long and weird enough as it is.
A savvy reader will have noticed that, when discussing the interval [a,b] inside of [0,1], we said that we get a total probability of 1 for [0,1] and probability of b-a for [a,b] by the uncomfortable (and weird) notion that we can get these values if we add up sufficiently many zeros (uncountably many), because we aren't bound by the countable additivity that defines probabilities. This observation will strike you as weird when you are made aware of the fact that the number of values in [0,1] and in [a,b] is exactly the same. Indeed, it's the same as the number of values on the entire real line. Thus, by adding together the same sufficiently many zeros, we can get any finite value and infinite values (we can even get zero, but not by considering true intervals).
What the hell?
This is a major point I will keep making while I talk about what's inside this infinite rabbit hole. It underscores the frustrations involved in attempting to use infinitesimals to get around the problem at the center of this post: that uniform PDFs cannot be put on spaces of infinite measure. That point is that knowing how many infinitesimals we're adding together, up to a certain equivalence class of them, cannot tell us what the sum is. Here, we saw that adding that many zeros gave us 1 and gave us b-a (=1/2 in the example I used). Those aren't the same, but we added exactly the same number of infinitesimals of the same kind together to get both of those values. Indeed, technically, we know exactly how many infinitesimals of a certain kind we're adding together, but that information is insufficient to tell us what the sum should be.
Another good example of this is that if we look at the natural numbers, it is very intuitive to want to say that the probability of picking an even from the naturals (with a uniform distribution) is 1/2 and that the probability of picking a number divisible by 3 is 1/3. Indeed, attempts to make this work, like Erdős's natural density approach, have been pushed for a long time. Here's the problem, though. The number of even natural numbers is the same as the number of natural numbers divisible by 3, which is the same as the number of natural numbers overall. So if I give you that many infinitesimals (of the kind reciprocating countable infinity) and have you add them up, how can you distinguish between getting 1/3, 1/2, and 1 (or any other value) as the answer? The natural density of the set of primes is 0, so starting from countably many infinitesimals of that kind, how do you know you don't get a sum of zero? Alternatively, if we were to use the primes as the base set, how do you know you don't get a sum of infinity? The usual way to sneak out of this is to say that you define the infinitesimal so that adding the right infinity of them together gives you 1, but then you cannot account for the 1/2 or the 1/3 you want in those other cases.
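To be fair to the natural density idea, it is easy to compute: it is the limit of the proportion of numbers up to N that are in the set, as N grows. This sketch shows those proportions converging to 1/2 for the evens and 1/3 for multiples of 3, which is exactly what intuition wants, even though both sets have just as many elements as the naturals themselves:

```python
# Natural density: the limit of |{k <= N : k in S}| / N as N grows.
# For the evens this tends to 1/2 and for multiples of 3 to 1/3, even
# though both sets are the same (countable) size as the naturals.
def density_up_to(predicate, N):
    """Proportion of the numbers 1..N satisfying the predicate."""
    return sum(1 for k in range(1, N + 1) if predicate(k)) / N

for N in (10, 1_000, 100_000):
    print(N,
          density_up_to(lambda k: k % 2 == 0, N),   # → 1/2
          density_up_to(lambda k: k % 3 == 0, N))   # → 1/3
```

The trouble described above is that this limit is a statement about orderings and proportions, not a genuine uniform PDF: no assignment of one fixed value per number recovers it.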
What about Exhibit A in the case of the infinitely broad magical bag containing balls labelled with all of the natural numbers? This is a great visual example of the preceding paragraph: how many layers (each holding infinitely many balls) deep is this bag? If the bag is infinitely broad, then one layer could contain all of the balls, but two layers contain just as many, as do three, four, five, and, indeed, even infinitely many such layers. We can argue that it doesn't matter, but it does since intuition is our (bad) guide here. If I had two identical bags of this kind except that one had one layer and one had four, intuition tells us that the probability of drawing any particular ball from the four-layer bag should be 1/4 of that of drawing it from the one-layer bag--but it isn't. In each bag, there is still only one ball for each natural number.
I consider this to be quite important to anyone who wants to be cautious enough to get their arguments involving infinity right.
My conclusion is the same as in the previous post in this series: infinity is a useful axiomatically defined abstraction that if used to assign properties to God immediately renders that God an abstraction as well, and abstractions don't do anything like answer prayers, make worlds, or anything else of that kind.
Edit (11 Feb. 2013): Part three, discussing the issue of infinity being a strong limit cardinal from the other side, is now available (Link).
(15 Feb. 2013): Part four, discussing the impossibility of the uniform PDF on an infinite space more tightly, is now available (Link).
(21 Feb. 2013): Part five, discussing the apparently very human endeavor of selecting axioms, say about infinity, is now available (Link).
If you enjoy my writing (and honesty!), you can read more of it in my first book, God Doesn't; We Do: Only Humans Can Solve Human Challenges. If you choose to pick it up, I thank you for your support of myself, my family, and indie authors in general.