Monday, June 2, 2014

A Problem with Evidence

How do we know when to call something we observe "evidence" for some idea we have about the world?

As it turns out, this is a very hard question. The answer matters too. On some of the going understandings of the concept, it is correct to say that there is evidence for God, Santa, and flying saucers, which I think causes most people to raise a skeptical eyebrow. Should we use such an understanding of evidence? Personally, I don't think so. I think there's something profoundly broken about the idea that, properly understood, there can be evidence for something that is not the case.

I. Backstory and introduction

I have become involved in a discussion about evidence, largely with philosophers of religion (non-professional philosophers, to be clear--just as I'm neither a philosopher nor a scientist, which will prove a relevant admission as we go from here). They are operating by one definition of the term, and frankly, I don't like it. This is an attempt to put down some of my thoughts about the topic of evidence. It, unfortunately, is lengthy.

I don't want to bog this down with a lengthy background, so instead I'll offer a short summary of the key points.
  • There are a number of usages for the word "evidence" that all fall within the same general sphere of meaning but which differ from each other significantly;
  • In many cases, philosophers and scientists tend to use different definitions of evidence, and many scientists seem to hold a working understanding of evidence that is at odds with the more common philosophical definitions (this being complicated because many scientists don't bother to consider the philosophical definitions, the philosophical definitions seem superficially good, some notable scientists seem to agree with the definition, and the usual so on that makes any controversial topic complex);
  • I feel the more common philosophical definitions, though sophisticated, have some problems, and I wish to present those here. To be more specific, I feel like they are irresponsibly misleading and miss a great deal of what people seem to mean by the term "evidence";
  • I will offer something of a definition for "evidence" that attempts to account for some or all of the problems I feel are present in the main philosophical definition I encounter, one that I think falls nearer to what is meant by the scientific use of the term. Particularly, I will make the case that when talking about evidence, we are referring to a body of observations, except in certain special cases. I do not think it is likely to be in our best interests to discuss whether or not individual observations constitute evidence except in those special cases;
  • Because my purposes here are directly rooted in arguments about God and religious belief, I will entertain a number of asides to discuss how the material under discussion applies to the question of God's existence; and
  • Mirroring the definition of knowledge as justified true belief, which requires that the belief be true to qualify, I will make a case that we should only use the word "evidence" for information that points to something true. That is, I will argue that there is no evidence for anything false.
A number of terms crop up here that may also not have meanings that are universally agreed upon. Importantly, among these is the word "observation," which I have borrowed from the philosophical definition I'm most interested in critiquing. (I think I would have used the word "data" had I not taken "observation" from their language.) Ideally, I think observations should be considered objective information that we have somehow gathered about reality.

There is no way that I'm aware of that I could be comprehensive with this endeavor, particularly in the given format and without a very long period of serious research that I simply don't have time for (nobody pays me to do this stuff, folks). I don't pretend that I've given anything like a final word on the immensely complex and unsettled topic of "evidence," although I do feel like I am trying to add to the conversation about it in a productive manner, by which I mostly mean for philosophers, as scientists seem to be getting along quite well with whatever their working definitions happen to be.

Regretfully, this is far too long as it is. More regretfully, it's not nearly long enough to do the thing right.

II. The complexity of the problem

Evidence is a tricky term. It has one (or more) meaning(s) in everyday usage, which academics refer to as the "folk concepts," the statements of which can be found in any English-language dictionary. It has another, more precise meaning in science (three, actually, that are similar and situational). It has a variety of meanings to philosophers.

Unfortunately, none of these definitions for "evidence" agrees except in general spirit, which makes it a ripe area for arguments and publishing lots of papers in sophisticated philosophy journals, notably in the philosophy of science and epistemology. The academic debate about what constitutes evidence is rather hotter than most people realize or care (or, I'll note, will ever care, at least directly), and curiously, the people who rely on it perhaps the most and who are most readily identifiable with it, scientists, appear to care the least. I would guess that this is because they are largely satisfied with their working definition because it does what all things scientific must do: it works.

Worse, when people use the word "evidence," sometimes they clearly refer to a single observation ("These flowers are evidence she loves me.") and at other times to a collection of observations ("The evidence for the Higgs boson allows us to conclude it exists."). These two usages have to be--but can't really be--teased apart to get some much-needed clarity on the notion of what we mean by the term evidence and thus how we should use that term. I am sympathetic to the notion that evidence should be viewed in most cases as a body of observations, although there are circumstances in which a single observation suffices.

Complicating matters still further is the usage surrounding the critical role that evidence plays in legal proceedings. In legal matters, we talk about different kinds of evidence, notably circumstantial and direct evidence, where circumstantial evidence relies upon inference to reach the conclusion and direct does not. The usual legal understanding is that when lacking direct evidence, circumstantial evidence has to accrue (via corroborating circumstantial evidence) into a body of observations that, together, are sufficient to make and decide the case (to a particular standard of burden of proof, higher in criminal cases than in civil). In law, circumstantial evidence becomes worth more as possible alternative explanations for the observations are ruled out.

Law is significant in this discussion because "the evidence" is used to make a case that is judged to favor either the plaintiff or the defendant. This causes a difficulty in the form of giving the illusion that "the evidence," meaning the available body of information pertinent to both the situation and the case, can point to something that is not true, as when an innocent man is convicted or a guilty one acquitted. Critical to note is that legal cases are adjudicated by people approximating the net worth of various observations, labelled "evidence," in deciding a case, and people are lamentably a rather unreliable indicator of the truth of the matter at trial. Something else to realize here is that there are stated, though slightly fuzzily applied, standards in different kinds of trials ("preponderance of evidence," i.e. greater than 50% favors one party or the other, and "beyond reasonable doubt," which is far stronger). These situations can be likened to statistical confidence tests at different levels of confidence.

Yet another example of a common, and complicating, statement in legal proceedings would be, "to make a decision, the court needs more evidence." This kind of a statement could be taken to mean a variety of things, but it carries the implication that observations constitute evidence on their own because they support one case or another. Some better ways to phrase this, I think, would be to say that "the court needs more information," that "the court needs more potential evidence," or that "the information provided constitutes evidence only at a confidence level of p when a decision of this type requires a confidence level of q" (with q larger than p).

Clearly, there's a lot going on in the idea of evidence, much of it not-too-clear, and different people use it in many different ways. The group I haven't talked about yet is philosophers.

III. Getting to my specific problem, a philosophical definition

The definition of evidence that many philosophers are using currently--one very popular in the philosophy of religion--is one I think has some pretty serious problems. (Note: philosophers recognize at least four distinct definitions of evidence, based upon Jeffrey Jay Lowder's recent brief Google-hangout summary of The Book of Evidence by noted philosopher Peter Achinstein at Johns Hopkins University.) I'll state it in a rough form here:
An observation A is evidence for a hypothesis H if the probability that H is true given A (along with background information) is larger than the probability that H is true given background information alone.
If I were J.R.R. Tolkien writing The Hobbit right now, I'd say, "Now, this definition has a few pretty serious problems with it that you, no doubt, saw at once, but you would not have done nearly as well if you had dedicated your entire life to parsing out abstract ideas in their most pure form without necessarily caring how they attach to reality." Of course, I'm not him, so I'll just point out that under this definition, J.R.R. Tolkien's brilliant and popular children's novel, The Hobbit, constitutes evidence for hobbits, dwarves, dragons, elves, talking giant spiders that hate being called attercop, and all manner of other imaginary things. The probability that they exist is higher because someone talked about them than it would be if no one ever thought to. This shouldn't bode well for such a definition.
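To see just how permissive this is, here is a small Bayesian sketch (with entirely hypothetical numbers): under this definition, any observation even trivially more likely if a hypothesis is true than if it isn't "raises the probability," and so counts as evidence, no matter how absurd the hypothesis.

```python
# A minimal sketch (hypothetical numbers) of the probability-raising
# definition: observation A counts as "evidence" for H whenever
# P(H | A) > P(H), no matter how absurd H is.

def posterior(prior, p_obs_given_h, p_obs_given_not_h):
    """Bayes's theorem: P(H | A) from a prior and the two likelihoods."""
    p_obs = p_obs_given_h * prior + p_obs_given_not_h * (1 - prior)
    return p_obs_given_h * prior / p_obs

# H = "dragons exist", A = "fire is observed".
# If dragons existed, fire would be certain; fire is also nearly
# certain without them -- but not quite, so the ratio exceeds 1.
prior = 1e-12
p_post = posterior(prior, p_obs_given_h=1.0, p_obs_given_not_h=0.999)

# The observation "raises the probability," so by this definition
# fire is, technically, evidence for dragons.
assert p_post > prior
print(p_post)  # still astronomically small, yet larger than the prior
```

The posterior remains astronomically small, of course, but the definition asks only whether it went up at all, which is exactly the trouble.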

Indeed, to really upset ourselves about this definition, let's consider just a sparse few more things that it technically tells us because the conclusion is slightly more likely given the observation than without it:
  • Being from Baltimore is evidence that you are a murderer.
  • Fire is evidence for dragons.
  • Water molecules on Mars are evidence that Mars is made out of cheese.
  • Owning a gun is evidence that you will commit suicide.
  • Being from the wrong neighborhood is evidence that you're a criminal.
  • Having Hussein as a middle name is evidence that you're a Muslim.
  • The existence of jet contrails is evidence for chemtrails.
  • Their surprising popularity is evidence that homeopathic remedies work.
  • The existence of arguments for a hypothesis is evidence for that hypothesis, as is the existence of a believer in the argument. 
  • The (statistical) wave nature of light is evidence for the discredited luminiferous aether.
  • The (erroneous) observation of neutrinos that (actually did not) travel faster than the speed of light is evidence that there are particles that travel faster than the speed of light (at least when this observation is taken on its own, which is a possibility under this bizarre definition, which considers individual observations and grades them as evidence or not). 
  • That they are occasionally "right" is evidence for astrological horoscopes and thus astrology.
  • Easter baskets are evidence for the Easter Bunny.
  • Everything is evidence for a God with a Plan.
  • Correlation is evidence for causation.
In case you believe that I'm being flippant in bringing up some of these examples: to "better" make the point about how this definition of "evidence" works, some of the philosophically sophisticated are happy to pass around a "proof" that there is legitimately evidence for Santa Claus, something to do with the sound or appearance of footprints.

What's the problem? Well, to be Tolkienish, I'm sure you can see it now, unless, that is, you're a philosopher who has spent an awful lot of time getting into this line of thinking: it flies in the face of what we mean (the folk concept) when we use the word evidence. My wife, who has little interest in such squabbles, upon finding out about the "evidence for Santa" thing, said, and I quote her verbatim, "If they say they have evidence for Santa, that means that they at least partially believe Santa exists. If they're adults, don't talk to those people." She refused to accept the "more sophisticated" definition.

Worse, this definition, because it allows such statements, is profoundly misleading, not just potentially but actively and actually, as my wife's comment demonstrates. People think evidence for something implies truth. Of course, the real power of science literally lies in rejecting this understanding of evidence (see last bullet point above).

A curious point is also raised about the usefulness of this definition of evidence. Note, for instance, that the observation that you have a lottery ticket is simultaneously evidence for winning the lottery during the next drawing and for losing it.

Being fair

Please do not let me mislead you into thinking that philosophers who concern themselves with this matter are stupid, because they're not. They're just too technical and abstract at the same time. They have a robust and sophisticated understanding of the matter that makes this definition still work in practice, and I am led to understand that some prominent scientists and many lay people agree heartily with it once they understand it.

Now, it must be granted that all such examples would be qualified with "these are examples of very weak evidence...," but the point remains that under this definition, each of those statements, and many more besides, is technically true. Thus this definition is misleading. The problem, like I say, is that it is not how most people are willing to use the term; it is misleading; and it isn't terribly useful for science on its own (or, as one scientist I spoke with worded it, "it is terrible and almost useless!"). Take note that true-believers in anything will latch onto the phrase "there is evidence for [insert whacky belief]" and run with it--thus, it's not just misleading, it's irresponsibly misleading and potentially dangerous.

To elaborate on that critically important parenthetical point, beliefs like religious beliefs--which despite all else routinely lead to horrible abuses--are maintained on biases like cherry-picked "evidence." For a fundamentalist, hearing "there is evidence for God," particularly from an atheist, is more than enough to get on with, and so too for most typical believers. A common colloquial understanding of the term "evidence" is that it constitutes sufficient justification to warrant belief. Worse, for an apologist this definition is pure gold, and it is no wonder I ran into this definition first by dealing with people interested chiefly in the philosophy of religion. Their main line, with which they bamboozle themselves and other believers, is "we have evidence for our faith." Supporting religious beliefs, or providing tools that allow others to do so, on a technicality is simply dangerous and irresponsible. If a better definition is available, it is unquestionable that we should use it instead.

Philosophers deal with this problem by pointing out that we use evidence to make a case for a hypothesis or against it. It isn't really the evidence but the case that we use to determine the truth-value of a hypothesis. Cases can include arguments or not, but typically they involve many evidential observations collected together and pointing in a single direction.

In other words, what really matters for philosophers using this definition, when it comes to making decisions, isn't evidence exactly, but rather bodies of evidence and the case made by them. "Sure, footprints in the fireplace are evidence for Santa," they might argue, "but the whole body of evidence about Santa collectively weighs against belief." Thus, for them, there is no problem. Never mind the fact that they're putting on their most serious attitude and making a case that there's evidence for an obvious fiction.

Their position is that evidence comes in various degrees, and strong evidence outweighs weak evidence, perhaps extraordinarily heavily. Put all together, a case is formed that allows us to decide upon hypotheses. One could gather evidence bearing on the Santa question, see that there are thousands of bits of very weak evidence in favor (mostly circumstantial or correlative), realize that the evidence against is very strong, and conclude that there is no Santa. The assumption is that even in complex matters that aren't obvious works of fiction, a rational agent or team of them using corrective measures like scientific protocol and peer review will do just that. Their definition is fine and gives them a relationship between observations and the notion of evidence for a hypothesis or other idea.

This is all well and good, but the problem of its being misleading and not directly useful in the sciences lingers.

An aside into a curious possible loophole

This definition admits a curious loophole that I think is actually pretty important, though this is something of a fringe opinion--not that it matters much because I don't want to use this philosophers' definition for "evidence" anyway.

If we allow that a hypothesis or belief can have a probability of zero, almost surely, either a priori or on background knowledge, then there is only one kind of evidence that could have a chance of making that probability higher: almost certain evidence, the kind that grants a conclusion with literally all but 100% certainty.

Here's an example: Santa Claus, this time done right. I suggest that the reason we recoil against the idea that there is any evidence for Santa Claus is because we know Santa is a fiction as part of our background knowledge. As I'm framing it, one way to view it would be that knowing Santa is a fictional character implies that the probability of the Santa-hypothesis is zero, almost surely. (NB: "Almost surely" is a technical mathematical term that needs to be included to avoid question-begging categorical denial.) Put in mathematical shorthand,
P(Santa | background)=0, almost surely, because part of our background knowledge is knowing that Santa is a fictional character and thus almost surely not real.
If I'm right, we are almost absolutely sure that Santa doesn't exist, but we still leave open the tiniest possibility of standing corrected. What I mean is that it would take almost certain evidence for Santa--like actually meeting an unequivocal Santa on Christmas night--to have any chance of raising the probability that Santa exists. In this situation, no circumstantial observation changes the probability of the Santa claim because it's totally overwhelmed by the fact that we know Santa is a fiction.

Critically, the only kind of observation or collection of observations that could constitute evidence by the philosophical definition under examination is one that bears almost sure confirmation; literally no other observation can qualify as "evidence." Since we do not possess any almost sure evidence for Santa, we can safely conclude that we do not possess any evidence for Santa. (Mathematically, this could possibly be addressed by using l'Hôpital's rule, or something like it, to resolve the indeterminate forms arising in Bayes's theorem.) To construe it like the philosophers do,
If P(Santa | background)=0, a.s., then P(Santa | obs. + background) can only get bigger if the observation in question for Santa is almost surely evidential for Santa Claus.
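A crude numerical sketch (treating "zero, almost surely" as an exact zero purely for illustration) shows why, under Bayes's theorem, no ordinary observation can budge such a prior:

```python
# A sketch of the "zero prior" loophole: under Bayes's theorem, if
# P(H | background) = 0, no ordinary observation can raise it.
# (Numbers are illustrative, not measurements.)

def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Return P(H | obs) given P(H) and the two likelihoods."""
    p_obs = p_obs_given_h * prior + p_obs_given_not_h * (1 - prior)
    if p_obs == 0.0:
        # 0/0: the observation is impossible unless H is true --
        # the "almost certain evidence" case described in the text.
        return None  # indeterminate; demands separate treatment
    return p_obs_given_h * prior / p_obs

# Circumstantial observation (footprints in the fireplace): possible
# with or without Santa, so the zero prior is untouched.
assert bayes_update(0.0, 0.9, 0.01) == 0.0

# "Almost sure" observation: impossible unless Santa is real.
# Bayes's theorem alone returns an indeterminate form.
assert bayes_update(0.0, 1.0, 0.0) is None
```

Only an observation that is effectively impossible without the hypothesis produces the indeterminate case, which is just a numerical way of saying that only almost certain evidence could ever matter here.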
This raises the question of whether or not almost sure evidence can exist for a known fiction, which is an interesting enough question for people to work on, I suppose. I would contend that it can, at least in principle, but perhaps not. (Would a real-life Santa that matches enough of the properties in the stories really be the Santa in the stories?) I do think it's beyond question, though, that we could have almost certain evidence for a sufficiently Santa-like something to get on with calling it Santa Claus without too much fuss. (This philosophical beef jerky is more entertaining if we imagine finding paleolithic teddy-bear-like creatures on a forest moon in the far reaches of the galaxy--are they, or are they not, Ewoks?)

What about God?

Where this question gets particularly curious, and poignant, is regarding "theism," a blanket term for about a bajillion different positions that don't agree with each other. I would contend that this question can be analyzed by (a) not choosing a prior probability for "the God hypothesis" at all, (b) considering all of the background knowledge we have that bears on the matter, which I'd contend renders the background probability for "the God hypothesis" arbitrarily small, which is to say effectively zero, almost surely, and then (c) concluding that no observation but almost certain confirmation would raise the probability that God exists at all. This would make it the case that unless something absolutely and unquestionably had God's fingerprints on it, it couldn't be construed as evidence for God--and this is on their definition of evidence, the one I think is way too permissive.

Importantly, I don't categorically deny that this is possible, in principle, and I certainly don't just assume it isn't possible to have any kind of evidence for God, but wishy-washy crap like "life" and "consciousness" simply shouldn't cut it (especially since one need not even assume strict materialism to come up with other, probably bad, pseudo-explanations for life and consciousness). I simply think that there are no observations that point to the existence of God sufficiently to qualify as evidence, although there could have been, and books like the Bible use mythology to show us how ancient people pretended there was.

A more accurate representation of my position is that I think there are certain things that would count (like continued miracles, unequivocal and obvious benefits to believing the right religion, the obvious fulfilment of the promises of Jesus, or God directly and unequivocally communicating with every person), but our background knowledge of the world contains none of these, or any sort of thing like them. Instead, every set of observations for which we have explanations can be satisfactorily accounted for without a God, and the set of observations still lacking explanations is getting pretty small in every avenue where it might matter. Instead of a single observation that truly points to God, we get mysteries (e.g. consciousness), arguments about abstractions (like ontological ones, among others), and heaps of confirmation bias (and listing only these is actually being quite kind).

At any rate, if it makes sense to say that it is possible to have almost sure evidence, and thus almost sure probabilities for hypotheses (based upon our backgrounds, e.g. knowing that a story is a fiction, ancient mythology, etc.), which I think it is (because to say I'm almost sure my desk exists seems pretty reasonable), then in the cases where that occurs, an observation or body of them would only qualify as evidence when it is almost certain itself.

IV. Another philosophical definition

Philosophers knock on an important door with another definition, according to Lowder's summary of Achinstein: that which raises the probability that a hypothesis is true to greater than half, i.e. something that makes the truth of a hypothesis more likely than not. This is the "preponderance of evidence" definition in the legal sense. There are some curious issues here, some of which may have resulted from misunderstanding the very brief introduction I saw for it.

First, this definition is a weird one in the context of the discussion about "an observation A is evidence when [something]" because other than direct, concrete observations, individual observations only rarely would meet this criterion. (An example of a direct, concrete observation: "I have goats on my farm. Here, see this goat?" "By Jove, you do have goats on this farm!" These, I think, are almost sure, conferring 100% confidence, almost surely.) Indeed, this definition seems to apply best to collections of observations that collectively raise our confidence above some stated limit, here half. This may be the point I misunderstood from their brief introduction, but to my credit, it appears that they also got snagged on this point in their conversation.

Also, notice that this definition isn't very useful except in civil law--the "preponderance" requirement is, in an important sense, arbitrary and weak. Scientists, engineers, and courts routinely demand much stronger standards before we consider a matter settled and the observations to be evidence, like 95%, 99%, 99.9%, five-sigma, six-sigma, and "beyond reasonable doubt," none of which are accounted for by this restricted definition. That the "evidence" bar in this other philosophical attempt to define the term is set so low renders it rather misleading as well as largely useless.
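For a rough sense of how far apart these bars really are, here is a quick calculation (under the standard normal approximation, using Python's math.erfc) converting common thresholds into tail probabilities:

```python
# A quick look (standard normal approximation) at how much stronger
# common scientific thresholds are than "preponderance of evidence"
# (i.e. anything over 50%).

import math

def two_sided_p(sigma):
    """Two-sided tail probability for a z-score of `sigma`."""
    return math.erfc(sigma / math.sqrt(2))

print(f"1.96 sigma -> p ~ {two_sided_p(1.96):.3f}")  # ~0.05 (the 95% level)
print(f"5 sigma    -> p ~ {two_sided_p(5.0):.2e}")   # ~5.7e-07

# Preponderance is satisfied the instant confidence passes 0.5;
# particle physics' five-sigma bar is roughly a million times stricter.
```

The gap between "more likely than not" and five-sigma is the gap the preponderance definition simply ignores.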

V. The scientific definition(s)

Apparently, this whole argument arose because I differ from my philosophically inclined friends by firmly espousing something like the scientific definition (perhaps my training as a physicist, though incomplete, served some purpose).

The scientific definition can best be summarized in a colloquial fashion by the usual humility and honesty of science. One need only listen to how readily scientists will say that some observation X is not evidence for some hypothesis H the moment they know that H is discredited, wrong, or false. Fire is not evidence for phlogiston; the wave nature of light is not evidence for the luminiferous aether; and life is not evidence for vitalism. Scientists do not tend to say that there is evidence for discredited or obsolete scientific models. (Note that one important difference relevant to another heated philosophy/science discussion currently raging is that part of how science makes so much progress so rapidly is that it discards discredited ideas, which philosophy has some intrinsic problems with doing.)

In other words, generally speaking, scientists tend not to accept that the term "evidence" applies to anything we know to be false. I have personally been using this line of argument for a while now--there is no evidence for something false, only evidence that can be misattributed to a false idea. In that, I think of evidence as a body of facts and observations that reflect reality, and only reality. Note that we have another word for such observations: data (or, in some cases, just "observations"). That raises the question of when data qualify as evidence, which is really just where we started.

Before we get to that, there's another problem that scientists have with the "raises the probability of the hypothesis" philosophical approach to evidence. The idea that a single observation can be considered evidence for some hypothesis, outside of immediately sufficient circumstances, which I will discuss later, is dangerously misleading. When I talked with a working scientist about this last week, his immediate, recoiling response was that "a single datum could be construed to be evidence for almost anything, and the error bars on a single datum's support for any model would be so enormous as to render it meaningless!" Thinking of evidence as single observations is probably an error--except in the cases where we have direct, concrete observation, like seeing a goat on a farm or recognizing that something hit a particle detector.

That last point is one I think is important in this discussion also; the role the hypothesis plays isn't trivial in determining if something is evidence or not. Particularly, we want to avoid confirmation bias. Notice that we don't need a theory that encompasses protons to notice when one hits a particle detector, and thus we don't have to get caught up with the confirmation bias-laden activity of starting with a hypothesis and then seeking evidence for it.

In science, the real heavy-lifting kind of observations are predictive ones that would disconfirm the model if they weren't satisfied, so instead of seeking to confirm a model, we seek to break it and call it good only if it resists our best efforts. Only when that happens are careful scientists usually eager to consider their observations to be evidence. In science, models and hypotheses and theories are all throw-away entities; the data themselves are ultimately the core of what matters. The data become "evidence" when the model starts getting sufficient confirmation.

On the other hand, a single observation of something hitting a particle detector doesn't necessarily count as evidence--there's too much room for error. A few decades ago, a sensitive bit of equipment was seeking to detect magnetic monopoles, which are known to be very rare if they exist naturally at all, and it had the right kind of signal come up one day. But that signal occurred when no one was present (labs being far less sophisticated then than now), and it registered in a way that left far too much doubt that it was a legitimate observation rather than some kind of coincidental interference--perhaps the foundation of the building settled ever so slightly and jarred the detector. No one knows. That observation is evidence under the philosophical definition, but no scientist I am aware of seriously considers it to be evidence for natural magnetic monopoles.

Overall, the scientific usage of the term evidence seems to run along the idea that evidence is a body of knowledge that supports a model that is provisionally true. Both of these conditions--support and provisional truth--need to be satisfied to qualify a set of data as evidence.

Regarding "provisional truth," this is determined by a number of complex factors including the support of all relevant data, the explanatory salience of the model, the predictive power of the model, the ranges the model is considered valid over, the confidence with which we can say the data supports the model, and the consistency of the model with other successful models in related fields of science. It is the goal of many scientific endeavors to offer models that qualify as being taken as provisionally true.

NB: There are a few understandings of evidence from the scientific perspective, at least three, but they generally run along the same theme and apply in different arenas. I'm sidestepping this detail in the interest of brevity, which I've already lost with a great deal left to go.

VI. A modest proposal

My preliminary proposal is pretty straightforward, and it sort of blends two of the understandings that philosophers use and tries to keep to the scientific understanding of evidence, which is actually useful and not misleading. Further, I think it reflects the everyday "folk" use of the word in many applications.
A body of observations O is evidence for a hypothesis H if, and only if, it is a consistent part of a larger body of observations, called the evidential closure of O, comprised of all observations bearing significantly upon H, such that the probability that H is true given O (plus its evidential closure) is sufficiently great to warrant justified belief that H is true. In this case, we could call an observation A in O an evidential observation.
To summarize this definition in plainer language, I'm saying that an observation should only be considered "evidence" (more carefully, an evidential observation) for a hypothesis if it is a consistent part of a large number of observations that, taken together along with all other observations that have relevance, constitute support that justifies belief in the hypothesis. In short, we only have evidence if all of the relevant information we have, taken together, justifies accepting the hypothesis at a given level of confidence, and then the specific body of observations that provide inferential or direct support for the hypothesis is the evidence.
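As a toy rendering of this proposal (my own framing, with made-up numbers, treating each observation as a likelihood ratio), one could combine everything relevant and only grant the label "evidence" when the resulting posterior clears a stated confidence threshold:

```python
# A toy rendering (illustrative, not a formal proposal) of the
# definition above: a body of observations counts as evidence for H
# only if, taken together with everything relevant, the posterior
# for H clears a stated confidence threshold.

def is_evidence(prior, likelihood_ratios, threshold=0.95):
    """Combine all relevant observations (given as likelihood ratios
    P(obs|H) / P(obs|not-H)) and test the posterior against a threshold."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    posterior = odds / (1 + odds)
    return posterior >= threshold

# Many weak, circumstantial observations (ratios barely above 1)
# never clear the bar, so none of them qualify as evidential.
assert not is_evidence(0.01, [1.1] * 20)

# A body of strongly diagnostic observations does, and only then
# would each member be called an evidential observation.
assert is_evidence(0.01, [50, 40, 30])
```

The design point is that "evidence" is a verdict on the whole body of observations at a stated confidence level, never a property any single weak observation earns on its own.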

The body of observations that collectively justify acceptance of the hypothesis, not any observation individually, is what we should consider to be evidence, and we could call an observation in that body an "evidential observation" if we wanted to. The key here is that something should only constitute evidence for a hypothesis if that hypothesis has, on the whole, strong enough reasons to be believed to be taken as provisionally true.

Thinking of evidence as a body of observations, instead of thinking of individual observations themselves as being evidence, comports fairly well, but imperfectly, with the way lay people, scientists, and lawyers use the word, so it is not a radical overhaul to suggest that it be treated specifically as such.

So, about God...?

As a consequence of this definition, being from Baltimore is not evidence for being a murderer. Owning a gun is not evidence that you will commit suicide. Water molecules on Mars are not evidence that Mars is made of cheese. There's no evidence for Santa Claus; there is no evidence for the Easter Bunny. In all of these cases, the body of observations relevant to these hypotheses is not sufficient to justify provisional truth, and so these observations are neither evidential observations nor are they evidence.

Also, and for the same reason, there is no evidence for God. The total body of observations relevant to the question of God's existence simply isn't sufficient to justify knowledge that God exists, and thus these observations do not constitute evidence for God's existence. Those who believe and apologize for belief may have observations, even suggestive ones, to support their belief, but they do not have evidence.

Again, this need not necessarily be the case, though. I think it is eminently reasonable to suggest that if we found the world ordered in the way that the ancient scriptures, like the Bible, imagined it was ordered, we would have sufficient reasons to believe that God exists. Thus the relevant observations supporting some specific brand of theism would constitute evidence. It's the fact that we cannot conclude that the existence of God is a truth of the world that prevents the observations we have from constituting evidence for the existence of God. (Of course, an epistemically hidden God presents a superficially tough nut because belief in such a thing can only proceed on faith, by which I mean belief without evidence, but it gets weirder than that because a properly epistemically hidden God technically cannot even have observations that lead to belief. Thus, this case may be moot.)

NB: Concerning the Bible, on the suggested definition of evidence, the Bible does not constitute evidence that the world was ordered in the way the Bible imagines, although on the philosophical definition I have a problem with, it does.

The evidential closure

The purpose of the introduction of the evidential closure of a set of observations into the definition is that otherwise a set of observations O is subject to having been cherry-picked, and that's not acceptable. My thought here is that the evidential closure of O includes all observations that bear significantly upon H, and if O doesn't confer justification in light of those other observations, O shouldn't be called evidence because it isn't truly sufficient to justify belief.

For example, if we limit ourselves to "usual" sizes, speeds, and gravitational environments, Newtonian mechanics has an enormous body of observations that would (and should--as we'll see) constitute evidence for it, but the observation of the precession of the perihelion of Mercury, for example, bears upon the hypothesis without being a part of O, indicating that something more robust than Newtonian mechanics is needed (here, general relativity seems to do the trick). The precession of the perihelion of Mercury is in the evidential closure of the body of evidence for Newtonian mechanics. In this case, the evidential closure of O contains observations that are relevant to the hypothesis, the "truth" of Newtonian mechanics, that one could accidentally or dishonestly avoid by choosing O to suit a belief that Newtonian mechanics is "true." This kind of cherry-picking shouldn't be acceptable, but as we will see shortly, the matter with Newtonian mechanics is a special kind of case that complicates matters.

Best-available evidence

One other quick note to make is that there is a subset of what I called the evidential closure of O that we have to call the "best-available evidence." This is almost exactly what it says it is, the best evidence that we have at the given moment. There's a little bit of an issue here.

The "best available evidence" regarding life on Mars currently leads us to conclude that there is no life on Mars. There are tantalizing observations that suggest it is possible, maybe even probable, but we cannot make the conclusion that there is life on Mars by the "best available evidence" we have. The issue is that if we use this phrase to say "the (best available) evidence indicates that there is no life on Mars," and later we find life on Mars, we will not continue to say that the evidence indicates no life on Mars. Instead, people will say "by the best available evidence at the time, we could not conclude that there was life on Mars." This may actually be an abuse of the term "evidence," as is revealed by the fact that exchanging "data" or "potential evidence" for "evidence" in "best available evidence" completely eliminates the problem.

Confidence and ranges


The example of Newtonian mechanics, brought up above, is a pretty good one for talking about ranges (of relevance) and provisional truth--thus evidence. Scientists, I think, would all agree that there is copious evidence for Newtonian mechanics even though they universally know better than to say Newtonian mechanics is "true." General relativity supersedes it. Of course, we must stay constantly aware that we know that "true" isn't a real property of scientific models. Provisional truth is the relevant idea.

Newtonian mechanics is, like all models, an approximation that is useful and provisionally true, provided that we are limiting our range of relevance to large, "slow" objects in particular gravitational circumstances. In the cases where the error is small enough, Newtonian mechanics is provisionally true even though general relativity is more accurate. (GPS is an example where very small gravitational influences matter profoundly.) On the ranges where it has low error, Newtonian mechanics is provisionally true, and thus we have evidence for it on those ranges.

This last bit needs highlighting. Scientific models seek to be useful and to provide some decent degree of explanatory salience, their utility being to describe and make predictions about phenomena. Use is limited to the range over which the model is sufficiently accurate. For another example, the small angle approximation that says that the sine of an angle is approximately equal to the measure of that angle for small angles provides a model that's useful over a certain range of small angles, and that's all that matters.
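The range-of-validity idea can be made concrete with a quick calculation. Here is the relative error of the small-angle approximation sin(x) ≈ x at a few arbitrary angles (chosen by me for illustration), showing the model degrading as we leave its useful range:

```python
from math import sin, radians

# Relative error of the small-angle approximation sin(x) ~= x.
# For x > 0, x always overshoots sin(x), so the error is positive.
for degrees in (1, 5, 10, 20, 45):
    x = radians(degrees)
    rel_error = (x - sin(x)) / sin(x)
    print(f"{degrees:>2} degrees: relative error {rel_error:.3%}")
```

At one degree the error is a few thousandths of a percent; by 45 degrees it exceeds ten percent. Where to draw the line depends entirely on how much error the application can tolerate, which is exactly the point about ranges.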

When we say that a hypothesis is provisionally true, part of what that entails is an acknowledgement that we're only referencing the range over which it is useful, limits that can be described quite accurately when we are aware of them.


In science, confidence is measured by statistics performed on bodies of data. To be able to draw good conclusions, a fair amount of data may be needed, ranging from dozens of elements in the sample for some kinds of conclusions (e.g. some medical and psychological studies) to billions for others (like in particle accelerators). Confidence is a measure of how sure we can be that our model is "true" in the sense that it accurately describes and predicts the data over its relevant ranges. It's worth noting that this is all very well understood and can be applied quite effectively by scientists working with the statistical tools relevant to their fields (something other scientists are eager to point out when a colleague uses the wrong statistical methods, since doing so is a virtually guaranteed publication).

Here's where the "justified as provisionally true" part of the definition that I offered for "evidence" comes in. One of the philosophical definitions says that something is evidence if it makes the probability that the hypothesis is true greater than half (so the hypothesis is more likely to be true than false, the "preponderance of evidence" from law). Science is already doing this, but it's not doing it with such a low bar. Statistical confidence intervals do a better job, and those are usually 95% or something much stricter, not 50%.

And here we can see that by using statistics we can grade a body of observations in terms of what we mean by calling them "evidence." We can state our confidence, as a probability, that some hypothesis H is true given a set of observations O (plus background), and ask whether that probability is sufficiently great to warrant justified belief that H is true. If we are using 95% confidence, we can say that the observations constitute evidence at the 95% confidence level (and, by implication, not necessarily at stricter levels).

For many kinds of research, 95% is sufficiently great, and for other kinds, we need to be sure to better than one part in millions or billions. Importantly, statistics on a body of observations allows us not only to decide when it constitutes evidence but also to state exactly the confidence we have in that determination. If we conclude with 95% confidence that a body of observations constitutes evidence for a hypothesis, we're automatically stating that there's a 5% chance that we're wrong and that those observations are not actually evidence for that hypothesis at all.
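As a minimal sketch of grading observations by confidence level (the scenario is invented for illustration): suppose we observe 61 heads in 100 flips of a coin and ask whether that body of observations is evidence that the coin is biased. An exact two-sided binomial test against the fair-coin null does the grading:

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n, heads = 100, 61
p_value = 2 * binom_tail(n, heads)  # two-sided, since the null is symmetric
print(f"two-sided p-value: {p_value:.4f}")
print("evidence for bias at the 95% level:", p_value < 0.05)
print("evidence for bias at the 99% level:", p_value < 0.01)
```

The same observations qualify as evidence at the 95% confidence level but not at the 99% level, which is exactly the graded judgment the proposal calls for.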

What could we call observations that aren't good enough to qualify as evidence? Observations. Data. Potential evidence. We already have words for this, and it may be a grave error to haggle over whether or not they are evidence. We lose nothing by making statements like, "the data suggest that [insert hypothesis here] is true." When the data is sufficiently suggestive to conclude provisional truth, we can consider it evidence. When we start to suspect that the data is pointing in a particular direction, we could call it "potential evidence" or merely "data." This still bucks the lay usage, of course, by making it more precise, but it makes it more careful in a way that is far, far less misleading and far, far more useful than the profoundly misleading philosophical definition.


Idealizing this definition would read something like this, putting it roughly for convenience:
An observation A is evidential for a hypothesis H if it raises the probability that we're right to think H is true and H is actually true.
This is an idealization because in many important cases we cannot know that a hypothesis is true. (For an example where we can, I feel that I'm fully justified in saying that it is true that I'm typing this on a desktop computer with a black keyboard. For an example where we cannot, we cannot technically know whether a coin is perfectly fair, though we could become confident to arbitrarily good precision given enough time.) The idealization here mirrors another one famous in philosophy: the problem of knowledge.

Going back to Plato, knowledge has been understood to be justified true belief. The issue is that we use data, which is construed as evidence, to justify a belief, and, not knowing for sure what is and isn't true in most cases, we use data, construed as evidence, to determine whether or not a belief is true. Plato idealized, and "truth" hung out there as an ideal, never mind if it could be reached or not. In this conundrum, the entire field of epistemology has its roots.

"True," though, in this definition of knowledge, is an idealization of what I called provisionally true earlier. Nothing is true but reality, and we can only know what's true for (almost) certain in pretty special circumstances, which we'll discuss in more detail momentarily. For all the rest of the cases, the most interesting ones, we have to rely upon the provisional truth (based upon the many things mentioned earlier) of our hypotheses instead of their (certain) truth, which is out of our epistemic reach. (NB: Mathematicians might say that certain statements are "true" and that they can know it, but those truths are abstract logical consequences of axioms to which they are slaves and thus "true" in a meaningfully different sense than what is meant by something being "true" about reality.)

So, my conception of evidence is designed to mirror the definition of knowledge. No matter how justified a belief may be, it does not constitute knowledge unless it is also the case that that belief is true. That is, knowledge has to accurately reflect reality. Similarly, it seems to me that within the core of the general idea of evidence is that it represents a set of observations supporting knowledge, not mere beliefs.

Thus, for an observation to be considered evidence for a hypothesis, it is my contention that we should require also that the hypothesis it supports is actually true. In that sense, evidence is a stronger form of data; evidence is data that is sufficient to justify that it supports something true. (Again, I've already accounted for the fact that we often can't know what is true, above, by discussing confidence values and provisional truth.)


Now consider the question of the existence of God, looking at the idealized form of the suggested definition. If it is the case that God does not indeed exist, since the claim that God does exist would be false, no body of observations would constitute evidence under the idealized definition. I've worded this more eloquently in the past: If God does not exist, there is no evidence for God, only evidence misattributed to God.

VII. On the special cases, the sufficient ones

It is reasonable to conclude that some observations carry enough potency to confer immediate knowledge and thus to, on their own, constitute evidence. If, for instance, you are dealt the queen of hearts from a deck of cards, that observation alone is sufficient to conclude that you are holding the queen of hearts, a red card, a face card, a card worth ten points in various games, and the like. The thing here is that a sufficient condition to justify the almost sure truth of certain hypotheses has been met.

A single observation can be sufficient to raise the probability that the given hypothesis is true beyond reasonable doubt and thus constitutes direct evidence and should be treated as such. The law, in fact, calls this "direct evidence," and it carries immense weight in a case ("We have direct evidence that the gun that fired the shots was on the person of the defendant on the night of the crime" means that there is no doubt that the murder weapon was in the hands of the accused at the right time, and though it may be circumstantial to having committed the murder, it is enormously heavy in supporting that case.)

Of note, we see this kind of thing come up in the sciences, particularly in terms of the discovery of new kinds of objects--species, states of matter, planets, physical processes, and so on. A single observation is sufficient to establish the truth of a hypothesis and thus constitutes, on its own, evidence for that hypothesis. This is not a challenge to the suggested definition because any body of relevant observations that includes a sufficient observation will automatically pass the bar of whatever reasonable confidence level we wish to state. (Indeed, I suspect that evidence of this kind appears often in the form of almost sure evidence, if that line of thinking carries validity.)  

Strongly suggestive but insufficient observations

A single observation can be sufficient to raise the probability to a high degree of confidence as well. For example, there is a diagnostic test for a cartilage tear in the shoulder (the passive distraction test for a SLAP tear) with 94% specificity. This means that a positive passive distraction test result strongly suggests a SLAP tear in the shoulder. Incidentally, the actual confidence in the hypothesis of a SLAP tear given a positive test result depends upon both the sensitivity (53%) and the specificity, and turns out to be 72% certainty in this case (citation). Here, a single observation confers 72% confidence in the relevant hypothesis.
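The 72% figure follows from Bayes' theorem. The sensitivity and specificity are the ones cited above; the pre-test probability (the prevalence of tears among patients given the test) of about 23% is my own assumption, chosen only so the calculation reproduces the cited result:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Bayes' theorem: P(condition | positive test result)."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Sensitivity (53%) and specificity (94%) from the text; the ~23%
# pre-test probability is an assumed value, not from the cited study.
ppv = positive_predictive_value(0.53, 0.94, 0.23)
print(f"P(SLAP tear | positive test) = {ppv:.1%}")
```

Note how strongly the answer depends on the assumed pre-test probability, which is one more reason a lone observation shouldn't be read as a verdict.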

On both philosophical definitions I have discussed, this constitutes evidence for a SLAP tear, but on the more robust definition I am suggesting, it only constitutes evidence up to the 72% level of confidence. If a decision, like surgery, requires a higher degree of confidence than 72%, one should not call this observation evidence on its own, though it is a highly suggestive observation. Imagine for a moment an orthopedic surgeon coming into the exam room and telling you that your positive passive distraction test is evidence for a SLAP tear and that you need surgery. If you have your wits about you, you will probably immediately ask how good the evidence is from just one manipulative test, since for most people the words "need surgery" reasonably require a pretty strong justification. Now imagine finding out that it is only 72% certain. How do you feel about the word "evidence" in this case? Personally, I feel it too strong for the circumstances and thus misleading. I think you would agree if having agreed to a surgery, the surgeon returned to tell you the good news that your labrum was not actually torn in the first place.

This case is important (and one example of many) because it isn't just very weak evidence under the philosophical definition that is misleading. A positive PDT result is strong evidence under the philosophical definition, and yet it is still potentially misleading even to use the word "evidence" without qualifying it to its confidence level. On its own, a positive PDT result is evidence that there is a 72% chance of having a SLAP tear, not that there is a SLAP tear. The difference is that the former statement is true.

The same is true in the lottery ticket example I mentioned earlier. Holding a lottery ticket is extraordinarily strong "evidence" that you will lose the lottery jackpot--a 99.9999994% chance of losing, if by "losing" we mean "not winning," using the current one-in-175,223,510 Powerball Jackpot odds. Doesn't it feel more than a bit presumptuous, though, to say that having a lottery ticket is evidence that you will not win the lottery?
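The cited figure is just the complement of the jackpot odds:

```python
jackpot_odds = 175_223_510  # one-in-N Powerball jackpot odds cited above

# Probability of "losing," meaning "not winning the jackpot."
p_lose = 1 - 1 / jackpot_odds
print(f"P(not winning the jackpot) = {p_lose:.7%}")
```
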

If such an attitude were common, I don't think many tickets would sell (and even if this is a moral victory, my point is that I don't think we necessarily think of evidence that way). The fact that "you will not win the lottery" technically may not be a true statement cuts against calling an observation like that evidence, even if it does constitute evidence for a very high confidence in that belief. (So the moral victory is available here not just by trying to get people to accept a weird definition of "evidence" but also by teaching them enough introductory statistics to understand evidence in light of the confidence value justifying it--as many universities now do by requiring introductory statistics as a service course for many non-technical majors.)

VIII. Additional issues

There are five additional issues that still stand out, at least on this preliminary attempt to lay out my thoughts about this complicated topic. They are
  1. The idea that arguments can be construed as evidence;
  2. The issue of "background";
  3. Talking about the probability that a hypothesis is true at all;
  4. The idea of certainty in general; and
  5. Not all concepts are hypotheses; some are beliefs and stories instead.

Arguments as evidence?

It's a pretty odd thing that I'm seeing people attempting to make the case that arguments can constitute evidence since arguments are a linguistic construct (premises connected by logic to a conclusion) dealing ultimately with abstractions while evidence is data, meaning direct observations of reality. Calling an argument evidence is a category error, and using a definition of "evidence" that allows such a thing has to face this problem.

There is an important relationship between data and evidence that involves arguments: for data to be considered evidence, we need an argument to connect it to the hypothesis. This doesn't make the argument itself evidence, though. It just says that we need linguistic constructs dealing with abstractions to connect reality to the abstractions that we use to describe reality.

In the sciences, because we don't do science just out of nowhere, the connection is frequently pretty clear, though. The model under investigation is itself already an attempted description of reality and thus is already connected to the observations that led to formulating the model. (Remember elementary-school science: a hypothesis is an educated guess.) This connection is reinforced or broken by investigating how successfully the model is able to make accurate predictions within its useful range.

Generally, I'd say the same was true of theism until real science came along, at which point superstitions (including the superstitions that support belief in revelation) were revealed to be unreliable methods of gaining accurate knowledge about anything (this being a huge difference between accurate and merely useful ideas). God was an attempted description of reality, but the lack of predictive power--and, indeed, the lack of descriptive and explanatory salience, together with its failure to mesh with other fields of inquiry--of theistic attribution cut our ties to that model, showing us that observations do not support it. All that supports it now are the arguments that try to shoehorn cherry-picked data back into the unfit model, and those arguments do not constitute evidence, nor do they make evidence out of observations that aren't evidential.


The issue of "background"

Determining which observations are in the background and which aren't is critical to the success of any definition of evidence that compares against background knowledge. This determination, in fact, might be very hard to make. For instance, if the observation in question is part of the background knowledge, the union (combination) of {observation} and background is just background, so conditioning the probability of the truth of the hypothesis on the background together with the observation will yield the same probability as conditioning on the background alone--which makes the observation not evidence (read: not evidential) on the philosophical definition I'm taking issue with (unless we can consider it evidential for another reason).

Picking this apart seems to require calculating probabilities from counterfactual states we pretend are background, ones in which by "background" is really meant "background knowledge except this particular observation." This casts a shadow on the whole approach. Many things are interconnected causally, and removing an observation from our body of background knowledge may require substantial modifications. For instance, if we take the observation of life out of our background knowledge to determine if life constitutes evidence for God, we may have to take out a number of other properties that are causally linked to life, and then we're comparing apples and oranges. Simply put, not everything can be treated like an experiment, and those things that can't be shouldn't be.

This whole pile of rot can be avoided simply, though, just by ceasing to think of evidence as individual observations and starting to think of it as much of our everyday language indicates: as a body of observations bearing upon a matter of fact (consider: "the evidence for biological evolution" instead of "the evidences for biological evolution"). (Note, facts are, by definition, true.)

Probability that a hypothesis is true and certainty

When we talk about the probability that a hypothesis is true, it is technically a discussion of what we know, not of reality itself. A hypothesis is either true or false, and in a manner of speaking, except in the special sufficient cases, we can't really know which it is. Reality, though, is always true. This means that we're not really talking about the probability that the hypothesis is true; we're talking about how likely we are to be right if we say it is true. So when we're saying something like "p is the probability that hypothesis H is true," what we're really saying is "p is how confident I can be that hypothesis H reflects reality accurately." This is accounted for by statistical confidence testing, as discussed above.

Importantly, thinking about the probability that a hypothesis is true is kind of wrongheaded on its own and is therefore likely to be misleading. It causes us to put a bit too much stock in our hypotheses instead of putting it where it belongs, in the observations themselves. Our hypotheses follow from our observations in the first place, and the observations are the relevant bits that we can be quite sure have something to do with reality, which is not necessarily true for hypotheses, particularly when broadly construed as they often are in this discussion. Calling a hypothesis "true," though, seems really to be a reflection of how useful, salient, and consistent it is with better-established knowledge. Confusing the map for the terrain is enormously common, and learning to reject it is a properly big deal.

That might not be a hypothesis

Philosophers and scientists squabble also about what constitutes a hypothesis. Particularly, scientists have a real and legitimate issue with the idea that just any idea constitutes a hypothesis, which seems to be the implication of the philosophical definitions of evidence.

A hypothesis is, so far as I can tell and without getting technical, an idea about the world formed by examining the information that we have, including other models that have proven to be quite successful at what they do (the background or a preliminary set of interesting data). It has to make some kind of testable prediction, and there has to be a way to falsify it. Another quality that's a bit more ambiguous is that hypotheses really shouldn't be too ad hoc.

So that brings us back to God's existence. Is it even a hypothesis? I don't think so. First of all, I don't think there's an object of attribution more ad hoc than a deity, particularly God, as it is frequently conceived. God is usually left undefined except that "whatever we see, God is the explanation for it somehow." This is why I am an "ignatheist," someone who thinks that the notion of God is too vague to deserve any consideration, although when specifics happen to be given, say as in classical theism, I reject them (for good reasons).

Secondly, "God did this or that" is not falsifiable, and "God exists" is not falsifiable. In fact, "God provides an explanation for this or that" is also not falsifiable because whatever real explanation is given, "God did it that way" is a natural reply that keeps people believing, and that reply is not falsifiable. Take biological evolution, for example--a huge swath of the American public tacitly denies biological evolution by saying that it happened because God guided it the way it went, which is a weak form of Intelligent Design. In the same sense, the idea of God doesn't make any testable predictions either. It seems to, but in every instance, it's possible simply to say that things went according to God's Plan, which apparently includes not being put to any tests.

This raises a question that demands a pretty good answer: What on earth does it mean to say that an (individual) observation is evidence for a "hypothesis" that doesn't even qualify as a hypothesis?

Additionally, now that we have developed better tools to make sense of the world--proven by the fact that we can do things like put functional robots on Mars and treat a huge swath of deadly diseases, for example--than were available thousands or even hundreds of years ago, particularly the maturation of science, our background knowledge no longer leaves any room for the addition of God. This is why I suggest that the working probability of God, treated like a "hypothesis," is zero, almost surely. That "hypothesis" is off the table. It's not a hypothesis. Almost certain evidence would be needed to overturn that fact.

This, again, to briefly divert from the topic of evidence, is the position I've called ignatheism, which can be summarized thusly: Theism is not even wrong except when it bothers to be, and then it's still wrong.

IX. Summary

I hope here I've made a case that:
  • I don't think we should call observations evidence unless they support a hypothesis that is (provisionally) true, this mirroring the formal definition for knowledge, which requires that a belief be true to constitute knowledge. This would elevate evidence as a special kind of set of observations, the kind that points us to provisionally true hypotheses and thus away from error and nonsense;
  • There is a way to do this by taking evidence to be a body of observations that all together, including all additional observations that bear on the matter, lead us to justified belief that the hypothesis is true;
  • Lacking certainty, this can be accomplished by understanding provisional truth, including stating the confidence values that describe our degree of justification and thus the level of certainty with which we feel the observations constitute evidence;
  • The pure and abstract philosophical definition under examination and gaining traction lately amongst some philosophers isn't just misleading, it's irresponsibly and dangerously misleading, and should not be promoted without serious amendment, for which I've offered a tentative first attempt; and
  • Theism shouldn't be on this table at all until something like almost certain evidence is available. I have recently made a case that continued miracles would be sufficient for the task, so I'm not putting as impossible a bar here as people may think I am.
TL;DR: If nothing else gets taken away here, the big idea I've argued for is that we shouldn't call observations evidence except when they support ideas that are true. Even better, we should consider evidence to be bodies of observations that are collectively sufficient to determine provisional truth of the matter in question.


  1. I found my first serious mistake: "Of course, I'm not him, so I'll just point out that under this definition, J.R.R. Tolkien's brilliant and popular children's novel, The Hobbit, constitutes evidence for hobbits, dwarves, dragons, elves, talking giant spiders that hate being called atterop, and all manner of other imaginary things."

    I believe Bilbo enraged the spiders with the term "attercop," not "atterop."

    1. Crap! I even looked up the spelling of that and then still managed the typo. I had thought it was "attacop." Doubly wrong. So wrong.

  2. JL: "She refused to accept the "more sophisticated" definition."

    Yeah, she's a keeper.

  3. JL: "Note, for instance, that the observation that you have a lottery ticket is simultaneously evidence for winning the lottery during the next drawing and losing it."

    This seems trenchant, and rhetorically useful.

  4. This is a lot, and I would like to re-read and digest some more, but mostly a) I'm grateful, and b) impressed.

    That's a lot of great work (at least for me) above. A fair amount of it I already agree with, a lot of it helped crystallize some thoughts I was having that hadn't fully formed, and there are some great insights I was never going to come up with left to my own devices.