Interpretations of probability

A quick intro to the main interpretations of probability

Since probability calculus has been axiomatized, Kolmogorov’s axiomatization being the standard one, and the one we briefly considered in this course, one might simply say that probability is whatever satisfies the axioms of probability, much in the same way in which, say, Euclidean items are whatever satisfies Hilbert’s axiomatization of geometry. Many quantities, such as normalized length, satisfy the axioms of probability. However, such quantities do not provide an interpretation of probability in the sense of an analysis of the notion of probability, which, presumably, is what one has in mind when one asks what probability is. Hence, assuming that the question is not ill-posed, one may feel the need to engage in some mathematical/philosophical considerations.

The main interpretations of probability are best divided into two groups:

Epistemological interpretations, according to which probability is primarily related to human knowledge or belief.
Objective interpretations, according to which probability is about a feature of reality independent of human knowledge or belief. Sometimes reality is taken to be the physical world; at times it is taken to include a sort of Platonic realm of mathematical and logical entities.

The Classical interpretation (Bernoulli, Laplace, and most everyone up to the 1800’s)

This interpretation was first developed in the late seventeenth century, especially by Jacob Bernoulli (Ars Conjecturandi, published posthumously in 1713), and was later codified by Laplace (Philosophical Essay on Probabilities, 1814). For Laplace:

Determinism obtains in the natural world. Hence, probability is epistemic.
To determine the probability of, say, getting a 2 when tossing a fair die, we construct the ratio between favorable cases and possible cases. Probability is that ratio.
Ratios between favorable and possible cases can be easily shown to obey the axioms of probability calculus.
Passage to the limit allows then the construction of probabilities not expressible as rational numbers (fractions having integers as numerator and denominator).

NOTE: This is required by the fact that in science probabilities can be expressed by irrational numbers.
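
The ratio of favorable to possible cases can be sketched in a few lines of Python. This is only an illustration: the function name classical_probability is hypothetical, and the six faces of the die are simply assumed to be equally possible.

```python
from fractions import Fraction

def classical_probability(favorable, possible):
    """Ratio of favorable cases to (assumed equally possible) cases."""
    possible = set(possible)
    favorable = set(favorable) & possible
    return Fraction(len(favorable), len(possible))

die = range(1, 7)  # the six faces of a fair die

p_two = classical_probability({2}, die)
p_even = classical_probability({2, 4, 6}, die)
p_not_two = classical_probability(set(die) - {2}, die)

print(p_two, p_even, p_not_two)  # 1/6 1/2 5/6
# Disjoint cases add up: Pr(2) + Pr(not-2) = 1
print(p_two + p_not_two)  # 1
```

Exact fractions are used because, at this stage, classical probabilities are ratios of whole numbers.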

What possible cases? For example, in tossing a fair die, one could have a sample universe of 2 and not-2 and claim that consequently Pr(2)=Pr(not-2)=1/2, which will not do. The answer is to require that the possible cases be equiprobable. To avoid circularity (defining probability by appealing to equiprobability) one defines equiprobable cases as those for which there are no relevant rational grounds for choosing among them (Principle of Indifference). Hence, the case not-2 subdivides into 5 equiprobable cases. So, for Laplace, if I know a coin is loaded but I don’t know how, Pr(H)=Pr(T)=1/2, as I have no ground to determine which side is favored.

NOTE: for objective interpretations this is nonsense. If the coin is loaded, then certainly Pr(H) ≠ Pr(T), and so neither equals 1/2.

Problems:

The probability of a single event (e.g., the murder of Caesar) cannot be determined by constructing a ratio, as there seem to be no relevant equiprobable cases.
Bertrand’s paradox. The Principle of Indifference gives inconsistent results because the same situation can be described in different, but equivalent, ways. Suppose a factory produces cubes with side length between 0 and 1. Treating the output as uniformly distributed over side length, as the Principle of Indifference suggests, gives Pr(side is less than 1/2)=1/2. But the same cubes can equally be described as having face area between 0 and 1, and treating the output as uniformly distributed over face area gives, by analogy, Pr(face area is less than 1/4)=1/4. Yet, necessarily, a cube has side less than 1/2 if and only if its faces have area less than 1/4, so the same event receives two different probabilities.
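
The clash between the two descriptions can be checked numerically. The following is a rough Monte Carlo sketch, not part of the original argument; the helper name estimate, the number of trials, and the seed are arbitrary choices made for the illustration.

```python
import random

def estimate(sample_side, trials=100_000, seed=0):
    """Estimate Pr(side < 1/2) when side lengths are drawn by sample_side(rng)."""
    rng = random.Random(seed)
    hits = sum(sample_side(rng) < 0.5 for _ in range(trials))
    return hits / trials

# Indifference over side length: side is uniform on (0, 1).
by_side = estimate(lambda rng: rng.random())

# Indifference over face area: area is uniform on (0, 1), so side = sqrt(area).
by_area = estimate(lambda rng: rng.random() ** 0.5)

print(by_side)  # close to 1/2
print(by_area)  # close to 1/4
```

Both runs estimate the probability of the very same event (side less than 1/2), yet the answer depends on which quantity indifference is applied to.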

The frequency interpretation (Venn, Reichenbach, von Mises)

Probability theory is taken to be a mathematical science dealing with mass random events, which are unpredictable in detail but whose numerical proportion in the long run with respect to a given set of events (the reference class) is predictable.

Example: the proportion of heads when flipping a coin many times; births (or deaths) of, say, males in a population; raindrop distributions, etc.

NOTE: analogy with, say, dynamics, whose subject matter is force.

Gamblers and statisticians have long known of the intimate relation between probability and frequency: if Pr(2)=1/6, then in the long run the frequency of 2 within the class of all the outcomes tends towards 1/6. The frequency interpretation holds that the probability of an event or property M in a reference class B is (perhaps in an idealized way) the frequency of M within B.
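
The long-run behavior can be illustrated with a simulated die. This is only a sketch: a pseudo-random generator stands in for a physical die, and the function name relative_frequency and the seed are assumptions of the example.

```python
import random

def relative_frequency(n_rolls, face=2, seed=0):
    """Relative frequency of `face` in n_rolls simulated rolls of a fair die."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 6) == face for _ in range(n_rolls))
    return hits / n_rolls

# The frequency of 2 in the reference class of all rolls settles near 1/6.
for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

Short runs can be far from 1/6; only the long-run proportion is stable, which is why the interpretation speaks of frequencies in the long run.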

NOTE:

As frequencies are taken to be objective features of reality, this is an objective interpretation.
The frequency interpretation satisfies the axioms of probability theory, though with some caveats regarding countable additivity.

Problems:

If the frequency is taken to be finite, then only probabilities expressed by rational numbers are allowed, which is problematic. The answer is to allow infinite (limiting) frequencies. The idea here is that one can introduce limits as in physics.
M may not occur. For example, an unflipped coin lacks a probability for tails in this interpretation.
M may be a single event, e.g. the Civil War or Caesar’s murder, in which case no frequency can be provided.

NOTE: von Mises did not consider this a serious objection: just as the mechanical definition of work does not apply to the everyday notion of work, so this interpretation need not agree with our everyday notion of probability.

Since M must be given with respect to B, an individual will have M only qua B, not absolutely. So, Pr(I live to 90) must be understood of me qua male, or qua philosopher, or qua white, and so on.

NOTE: this may be a problem that affects other interpretations as well, however.
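
The dependence on the reference class can be made concrete with a toy computation. The records below are entirely made up for the illustration, and the function name frequency is hypothetical; the point is only that the same individual falls under several classes with different frequencies.

```python
from fractions import Fraction

# Made-up records: (sex, occupation, lived_to_90). Not real data.
population = [
    ("male", "philosopher", True),
    ("male", "philosopher", False),
    ("male", "farmer", False),
    ("male", "farmer", False),
    ("female", "philosopher", True),
    ("female", "farmer", True),
    ("female", "farmer", False),
    ("female", "farmer", False),
]

def frequency(records, in_reference_class):
    """Frequency of living to 90 within the chosen reference class."""
    members = [r for r in records if in_reference_class(r)]
    return Fraction(sum(r[2] for r in members), len(members))

qua_male = frequency(population, lambda r: r[0] == "male")
qua_philosopher = frequency(population, lambda r: r[1] == "philosopher")
qua_both = frequency(population, lambda r: r[0] == "male" and r[1] == "philosopher")

print(qua_male, qua_philosopher, qua_both)  # 1/4 2/3 1/2
```

A male philosopher in this toy population gets three different values for Pr(I live to 90), depending on which class he is placed in.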

The Logical Interpretation (Keynes, Jeffreys, Carnap)

The basic idea of the logical interpretation is that probability is the measurement of partial entailment (with probabilities 1 and 0 as limiting cases), that is, the measurement of the evidential link between evidence E and the hypothesis H supported by E. As such the logical interpretation tries to provide a framework for inductive logic. We have already seen this in our discussion of entailment in terms of conditional probability.

There are several versions of this interpretation, but the most famous is by Carnap (Logical Foundations of Probability, 1950).

Consider a language with 3 names, a, b, c, and a predicate F. This language has 8 state descriptions, that is, statements saying for each individual whether it has F or not:

(1) Fa&Fb&Fc
(2) -Fa&Fb&Fc
(3) Fa&-Fb&Fc
(4) Fa&Fb&-Fc
(5) -Fa&-Fb&Fc
(6) -Fa&Fb&-Fc
(7) Fa&-Fb&-Fc
(8) -Fa&-Fb&-Fc

When we look at the state descriptions, we note that some differ only by permutation of names. For example, (2), (3), and (4) all have two things with F and one with -F. Together, (2), (3), and (4) constitute a structure description. There are four structure descriptions:

{1}: every individual is an F
{2,3,4}: two individuals are F and one is not
{5,6,7}: one individual is F and two are not
{8}: no individual is F.

Now one defines a function m* that assigns weights to structure and state descriptions in two steps:

All structure descriptions get the same weight; hence, in our case each gets weight 1/4.
Each state description within a given structure description gets the same weight. So, since {1} has only one state description, (1) gets weight 1/4; by contrast, as {2,3,4} has three state descriptions, (2), (3), and (4) each get weight 1/12, that is, 1/4 divided by 3.
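
The two steps can be sketched in Python for the three-name language. This is an illustration under stated assumptions: a state description is represented as a tuple of truth values for Fa, Fb, Fc, the variable names are hypothetical, and the states come out in a different order from the list above.

```python
from fractions import Fraction
from itertools import product

NAMES = ("a", "b", "c")

# A state description says, for each name, whether it has F (True) or not.
state_descriptions = list(product((True, False), repeat=len(NAMES)))

# A structure description groups the states by how many individuals are F.
structures = {}
for state in state_descriptions:
    structures.setdefault(sum(state), []).append(state)

# Step 1: every structure description gets the same weight.
# Step 2: that weight is split equally among its state descriptions.
m_star = {}
for members in structures.values():
    for state in members:
        m_star[state] = Fraction(1, len(structures)) / len(members)

for state, weight in m_star.items():
    label = "&".join(("" if has_f else "-") + "F" + n for has_f, n in zip(state, NAMES))
    print(label, weight)
```

The printed weights are 1/4 for Fa&Fb&Fc and -Fa&-Fb&-Fc and 1/12 for each of the other six state descriptions, and they sum to 1.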

Note that:

Such assignments are a priori, much as in the classical theory.
It turns out that m* satisfies the axioms of probability.
The logical interpretation is an objective interpretation. (For Carnap, however, probability is essentially tied to a language, in this case to the very simple one we considered.)
Since any statement in the language is expressible in terms of state descriptions, m* can be extended to any statement.

At this point, given any two statements h and e, one can introduce a confirmation function c* such that

c* (h,e) = [m*(h&e)]/m*(e).

Clearly, c* (h,e) does the job of Pr(h|e). c* is introduced expressly to account for our ability to learn from experience. c* can be generalized to a family of functions, but considering that is beyond our goals here.
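
How c* lets one learn from experience can be seen in a small computation for the three-name language. This is a sketch under stated assumptions: statements are represented as Python predicates on state descriptions (tuples of truth values for Fa, Fb, Fc), and the function names m_star and c_star are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

N = 3  # individuals a, b, c, at positions 0, 1, 2

def m_star(statement):
    """m* of a statement: the sum of the weights of the states where it holds."""
    total = Fraction(0)
    for state in product((True, False), repeat=N):
        if statement(state):
            # 1/(N+1) per structure description, split among its comb(N, k) states
            total += Fraction(1, N + 1) / comb(N, sum(state))
    return total

def c_star(h, e):
    """c*(h, e) = m*(h & e) / m*(e)."""
    return m_star(lambda s: h(s) and e(s)) / m_star(e)

Fa = lambda s: s[0]
Fb = lambda s: s[1]
Fc = lambda s: s[2]
no_evidence = lambda s: True  # a tautology

print(c_star(Fc, no_evidence))                # 1/2
print(c_star(Fc, Fa))                         # 2/3
print(c_star(Fc, lambda s: Fa(s) and Fb(s)))  # 3/4
```

Each observed F raises the confirmation of the hypothesis that c is also F, from 1/2 to 2/3 to 3/4.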

Most of the problems of the logical theory center on the attempt to provide a framework for inductive logic:

It is unclear why one should pick c* as the confirmation function, as it is not the only one that allows learning from experience. In other words, how does one decide a priori what the calibration of the confirmation function should be?
It is unclear what e amounts to in specific cases. For example, if I toss a die and get 5, is my evidence that I got 5, or that the die made a noise when it landed, or that it had a certain trajectory, or…?

Propensity interpretation (Popper)

In this view, probability is a physical disposition to produce outcomes of a certain kind.

NOTE:

Presumably, such dispositions are causally effective.
This interpretation allows one to make sense of single-case probabilities, such as ‘the probability that this atom will be observed at position a is 2/3’.
This is an objective interpretation, as propensities are taken to be features of the world.

For some, the outcomes are long-run (but not infinite) frequencies: a fair coin has a propensity to land with T half the time in the long run. Note that 1/2 does not measure this tendency, whose strength, as it were, is close to 1.

For others, the outcomes are single outcomes: the propensity of a fair coin to come up with T is 1/2.

Problems:

It is unclear what such propensities are, and therefore it is hard to see how this interpretation clarifies what probability is.
It is unclear whether Bayes’ Theorem, which ties a conditional probability to its inverse, can be couched in propensity terms, because propensities seem tied to causation, which is asymmetric in such a way that at times, while it makes sense to say that B causes A, it makes little sense to say that A causes B. So, if Pr(P|D) measures the propensity of disease D to produce a positive test result P, then Pr(D|P) seems to make little sense if understood as the propensity of the positive test result to produce the disease.
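
The inversion that Bayes’ Theorem performs can be sketched as follows. All the numbers (prevalence 1/100, Pr(P|D)=9/10, Pr(P|not-D)=1/10) are hypothetical values chosen only for the illustration, and the function name bayes is an assumption of the example.

```python
from fractions import Fraction

def bayes(p_e_given_h, p_h, p_e_given_not_h):
    """Pr(H|E) from Pr(E|H), Pr(H), and Pr(E|not-H), via Bayes' Theorem."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)  # total probability of E
    return p_e_given_h * p_h / p_e

# Hypothetical values, not taken from any real test.
p_d = Fraction(1, 100)               # Pr(D): prevalence of the disease
p_pos_given_d = Fraction(9, 10)      # Pr(P|D)
p_pos_given_not_d = Fraction(1, 10)  # Pr(P|not-D)

p_d_given_pos = bayes(p_pos_given_d, p_d, p_pos_given_not_d)
print(p_d_given_pos)  # 1/12
```

The calculus delivers Pr(D|P) = 1/12 without difficulty; the problem raised above is what that number could mean as a propensity, since the test result does not produce the disease.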

The Subjectivist interpretation (de Finetti, Jeffrey)

Probability is degree of belief held by a rational agent, that is, an agent whose degrees of belief (minimally):

(1) Satisfy the axioms of probability.
(2) Are updated by conditioning: Pr(A) becomes Pr(A|E) in the face of new evidence E.
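
Updating by conditioning can be sketched for a simple case. The example is an assumption made for illustration: the agent's degrees of belief are spread evenly over the faces of a die, the evidence E is that the outcome is even, and the function name condition is hypothetical.

```python
from fractions import Fraction

def condition(prior, evidence):
    """Return the updated degrees of belief Pr(. | E) from a prior over outcomes."""
    p_e = sum(p for outcome, p in prior.items() if outcome in evidence)
    if p_e == 0:
        raise ValueError("cannot condition on evidence of probability 0")
    return {o: (p / p_e if o in evidence else Fraction(0)) for o, p in prior.items()}

prior = {face: Fraction(1, 6) for face in range(1, 7)}
posterior = condition(prior, evidence={2, 4, 6})  # E: the outcome is even

print(prior[2], posterior[2])   # 1/6 1/3
print(sum(posterior.values()))  # 1
```

The updated degrees of belief still sum to 1, so the agent who conditions continues to satisfy the axioms.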

NOTE:

Most people violate probability calculus, especially with respect to conditional probability, and therefore a normative component (the rationality of the agent) is necessary.
Probability is then presented as the logic of partial belief with classical logic as a limiting case.

Many subjectivists (e.g., de Finetti) analyze degrees of belief (probabilities) in terms of (possible) betting behavior. Consider a bet where one wins W=1 if A is true and loses L if it is false. The probability you attribute to A is fixed by what you think is the fair value of L expressed in units of W, that is, the value you would assign to L if you did not know which side of the bet you would have to take. For example, suppose you consider fair the arrangement whereby one wins $1 if A is true and loses $1/3 if A is false. Then you believe that Pr(A)=1/4. In fact the arrangement is fair when

1 × Pr(A) = (1/3) × (1 - Pr(A)),

that is,

(4/3) × Pr(A) = 1/3,

Pr(A) = 1/4.

Here probability is understood in terms of utility (in the example, dollars) and rational preference. (Since you don’t know which side of the bet you’ll get, you’ll settle for a fair bet.)
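
The fairness condition W × Pr(A) = L × (1 - Pr(A)) solves to Pr(A) = L/(W+L), which can be checked with a short sketch. The function names implied_probability and expected_gain are hypothetical, and utility is assumed to be linear in dollars.

```python
from fractions import Fraction

def implied_probability(win, loss):
    """Pr(A) at which winning `win` if A and losing `loss` if not-A is a fair bet."""
    return Fraction(loss) / (Fraction(win) + Fraction(loss))

def expected_gain(p, win, loss):
    """Expected gain of the bet for someone whose degree of belief in A is p."""
    return p * win - (1 - p) * loss

p = implied_probability(1, Fraction(1, 3))
print(p)                                    # 1/4
print(expected_gain(p, 1, Fraction(1, 3)))  # 0
```

An expected gain of 0 is what makes the bet acceptable from either side.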

Others (e.g., Ramsey) try to obtain both probability and utility from rational preference in a two-step procedure whereby one first obtains probability from rational preferences, and then utility from probability and rational preference. Roughly, the procedure is as follows. First, Ramsey introduces ‘ethically neutral’ statements, namely statements which are in themselves indifferent to you, so that their only significance lies in their association with the outcomes of gambles. Suppose now that you prefer A over B, that statement P is ethically neutral, and that you are indifferent between two gambles: (i) get A if P is true and B if P is false, and (ii) get A if P is false and B if P is true. Then, by definition, Pr(P)=1/2. Note that P can now be used to set up a lottery in just the same way a fair coin can. The probabilities of many other ethically neutral statements can be obtained analogously. Once the set of ethically neutral statements whose probability we know is large enough, they can be used in place of lotteries in determining utilities. Hence, one determines utilities and then the probabilities of the remaining ethically neutral statements. Finally, knowing utilities, one can obtain the probabilities of non-ethically neutral statements by appealing to the expected values of bets.

Other constraints beyond (1)-(2) have been proposed, but dealing with them would take us too far afield.