Author: Amos Bond
DO NOT QUOTE OR DISTRIBUTE
Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief
James M. Joyce
Department of Philosophy, University of Michigan

Traditional epistemology is dogmatic: it is organized around the concept of full belief. It is also alethic: it takes truth to be the cardinal epistemic virtue that a full belief can possess. Someone who fully believes a proposition categorically accepts it as true, categorically rejects it as false, or suspends judgment on the matter. From a purely epistemic perspective, such categorical beliefs are evaluated on the basis of what William James calls the “two great commandments” of epistemology: Believe the truth! Avoid error! Other central concepts of dogmatic epistemology – knowledge, justification, reliability, sensitivity, and so on – are understood in terms of their relationships to this ultimate standard of truth.

Some epistemologists, inspired by Bayesian approaches in decision theory and statistics, have sought to replace the dogmatic model with a probabilistic one in which partial beliefs, or “credences,” play the leading role. A person’s credence in a proposition X is her level of confidence in its truth. It corresponds, roughly, to the degree to which she is disposed to presuppose X in her theoretical and practical reasoning. Credences are inherently gradational: the strength of a partial belief in X can range from certainty of its truth, through maximal uncertainty (in which X and its negation are believed equally strongly), to complete certainty of falsehood. These variations in confidence are warranted by differing states of evidence about X, and they rationalize different choices among options whose outcomes depend on X’s truth-value. It is a central normative doctrine of probabilistic epistemology that rational credences should be coherent: they should obey the laws of probability.
In the idealized case where a believer has a numerically precise, real-valued credence for every proposition,¹ these laws are (where T is the logical truth, ⊥ is the logical contradiction, and X and Y are arbitrary propositions):

¹ These are the laws of finitely additive probability. The results discussed below extend to the countably additive case. Weaker versions of these principles apply to subjects who lack precise credences. Also, this formulation assumes a framework in which there is only one logical truth. On more fine-grained views of propositions one will need to add a condition saying that every logical truth (falsehood) has the same probability.

Non-triviality. b(⊥) < b(T).
Boundedness. b(X) is in the interval with endpoints b(⊥) and b(T).
Additivity. b(X ∨ Y) + b(X ∧ Y) = b(X) + b(Y).

While this formulation may seem strange, one can secure the familiar version of the laws of probability by adding the convention that b(⊥) = 0 and b(T) = 1.² We have chosen a more general formulation to emphasize that these identities are mere measurement conventions.

Philosophers have tried to justify the requirement of coherence in many ways. Some, following Ramsey (1931), de Finetti (1937) and Savage (1971), offer pragmatic arguments to show that incoherent believers are disposed to make self-defeating choices. Others, like Howson and Urbach (1989) and Christensen (1996), argue that incoherence generates inconsistencies in value judgments. Still others, notably van Fraassen (1983) and Shimony (1988), have sought to tie coherence to rules governing the estimation of relative frequencies. Finally, Joyce (1998) hoped to clarify the normative status of coherence, and to establish an alethic foundation for probabilistic epistemology, by showing that having credences that obey the laws of probability is conducive to accuracy. The central claims of that article were as follows:

1. Partial beliefs should be evaluated on the basis of a gradational conception of accuracy, according to which the accuracy of a belief in a true/false proposition is an increasing/decreasing function of the belief’s strength.
2. One can identify a small set of constraints that any reasonable measure of gradational accuracy should satisfy.
3. Relative to any measure of gradational accuracy that satisfies the constraints, it can be shown that: (3a) each incoherent system of credences is accuracy dominated in the sense that there is a coherent system that is strictly more accurate in every possible world; and (3b) coherent credences are never accuracy dominated.

² There is also a conventional element in Non-triviality. Conjoining Additivity with the stipulation that b(⊥) > b(T) produces an anti-probability, a function whose complement b⁻(X) = 1 – b(X) is a probability. The difference between representing degrees of belief using probabilities or anti-probabilities is a matter of taste: the two ways of speaking are entirely equivalent. Those who think of the strength of a belief as the degree to which a believer “regards the proposition as true” will prefer a probabilistic representation, whereas those who think of the strength of a belief as the degree to which a believer “regards the proposition as false” will favor an anti-probabilistic one.


4. Accuracy domination is an epistemic defect: the fact that incoherent credences are accuracy dominated while coherent credences are not provides a purely epistemic reason to favor coherent credences.

This essay will clarify and reevaluate these claims. As it happens, the restrictions on accuracy measures imposed in Joyce (1998) are stronger than needed to obtain the desired result. Moreover, neither point (3b) nor point (4) was adequately defended. These deficiencies will be corrected here, and it will be argued that the requirement of probabilistic coherence for credences can indeed be given a ‘nonpragmatic vindication’ along the lines of 1-4.

Formal Framework. We imagine an (idealized) believer with sharp degrees of confidence in propositions contained in some ordered set X = 〈X1, X2,…, XN〉. For simplicity we will assume that X is finite, and that its elements form a partition, so that, as a matter of logic, exactly one Xn is true.³ A subject’s degrees of belief can then be represented by a credence function b that assigns a real number b(X) between zero and one (inclusive) to each X ∈ X. One can think of b as a vector⁴ 〈b1, b2,…, bN〉 where each bn = b(Xn) measures the subject’s confidence in the truth of Xn on a scale where 1 and 0 correspond, respectively, to certainty of truth and certainty of falsehood. The N-dimensional cube BX = [0, 1]^N contains all credence functions defined on X. Its subsets include both (a) the collection VX of all consistent truth-value assignments to elements of X, and (b) the family PX of all probability functions on X. If we let 0 signify falsity and 1 denote truth, we can identify VX with the set of binary sequences vn = 〈vn1, vn2,…, vnN〉 where vnj = 0 if n ≠ j and vnj = 1 if n = j. It is useful to think of the various truth-value assignments vn as ‘possible worlds’. Since not all credences are probabilities, and all truth-value assignments are probabilities, we have VX ⊂ PX ⊂ BX, with all inclusions strict.
Moreover, it is easy to see that a credence function b obeys the laws of probability if and only if it is a weighted average of truth-value assignments, so that b = Σj λj⋅vj where the λj are non-negative real numbers that sum to 1. Geometrically speaking, PX is VX’s convex hull. For example, if X = 〈X1, X2, X3〉, then VX is the set of points {〈1, 0, 0〉, 〈0, 1, 0〉, 〈0, 0, 1〉} in ℜ^3, PX is the triangle with these points as vertices, and the regions above (1 < b1 + b2 + b3) and below (1 > b1 + b2 + b3) this triangle contain the credence assignments that violate the laws of probability.

³ The results obtained here can be extended to the case where X is denumerably infinite and not a partition.
⁴ Notation: (a) Vector quantities are in bold; their arguments are not. (b) Variables ranging over integer indices (i, j, k, m, n) are always lower case, and their maximum value is the associated upper case letter, so that, e.g., n ranges over 1, 2,…, N. (c) A vector x = 〈x1, x2,…, xN〉 will often be denoted 〈xn〉.

Epistemic Utility and Scoring Rules. It requires substantive philosophical argument to determine which features of credence assignments are desirable from an epistemic perspective. We will adopt the useful fiction that the notion of overall epistemic goodness for partial beliefs is sufficiently clear and determinate to admit of quantification, so that for each credence function b and truth-value assignment v there is a real number that measures the ‘epistemic utility’ of having credences b in world v. The choice of an epistemic utility function reflects our views about what sorts of traits make beliefs worth holding from the purely epistemic perspective. It will be convenient to measure epistemic utilities using a non-negative scale on which lower numbers indicate higher overall epistemic quality. Formally, such an epistemic scoring rule is a family of real-valued functions SX, one for each partition of propositions X, each of which assigns a real number SX(b, v) ≥ 0 to each b in BX and v in VX. (When the underlying partition is clear we will write S(b, v).) Intuitively, SX(b, v) measures the overall ‘epistemic disutility’ of the credences in b when v gives the truth-values of propositions in X, thereby capturing the extent to which b’s credences diverge from some epistemic ideal at v. The term ‘scoring rule’ comes from economics, where values of S are seen as imposing penalties for making inaccurate probabilistic predictions. If, say, a meteorologist is paid to predict rain in Cleveland, then her employer might seek to promote accuracy by docking her pay $S(b, v) where b is the predicted chance of rain on a given day and v is 1 or 0 depending upon whether or not it rains that day.
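The meteorologist example can be made concrete with a minimal Python sketch. The text has not singled out any particular penalty at this point, so the quadratic form (b − v)² used here is an assumption chosen purely for illustration, and the names `quadratic_penalty` and `docked_pay` are hypothetical.

```python
def quadratic_penalty(b: float, v: int) -> float:
    """Assumed penalty S(b, v) = (b - v)**2 for a single forecast.

    b is the predicted chance of rain in [0, 1]; v is 1 if it rained, else 0.
    The quadratic form is an illustrative assumption, not a rule fixed by the text.
    """
    if not 0.0 <= b <= 1.0 or v not in (0, 1):
        raise ValueError("need 0 <= b <= 1 and v in {0, 1}")
    return (b - v) ** 2


def docked_pay(base_pay: float, b: float, rained: bool) -> float:
    """Pay left after the employer docks $S(b, v) for the day's forecast."""
    return base_pay - quadratic_penalty(b, 1 if rained else 0)


if __name__ == "__main__":
    # A confident forecast is docked little if it rains and more if it does not.
    print(docked_pay(100.0, 0.9, rained=True))
    print(docked_pay(100.0, 0.9, rained=False))
```

Any non-negative penalty that is lower for forecasts nearer the outcome could be substituted for `quadratic_penalty` without changing the structure of the example.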
When scoring rules are so construed, it is vital to know whether they create incentives that encourage subjects to make honest and accurate predictions. Our focus will be different. Rather than thinking of a subject as being motivated to minimize her penalty, as economists do, we will use scoring rules to gauge those qualities of credences that have epistemological significance. Instead of viewing epistemic scoring rules as setting penalties that believers might suffer, we view them as tools of evaluation that third parties can use to assess the overall epistemic quality of a person’s opinions. The fact that one set of credences incurs a lower penalty than another at a given world should be taken to mean that, from a purely epistemic point of view, it would be better in that world to hold the first set of credences than to hold the second. It is, of course, quite consistent with this that the agent has an incentive structure that encourages her to hold beliefs that diverge greatly from the epistemic ideal.
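The third-party use of scoring rules described above can be sketched in Python by comparing two credence functions world by world. Everything named here is an assumption for illustration: the mean squared penalty `quadratic_score` stands in for an unspecified scoring rule, the helper names are hypothetical, and the sample credences are invented. On a partition, a credence function is coherent exactly when its values are non-negative and sum to 1, which is the same as being a weighted average of the truth-value assignments with weights λj = bj.

```python
def worlds(n):
    """The n truth-value assignments v_1, ..., v_n for an n-cell partition."""
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]


def is_coherent(b, tol=1e-9):
    """On a partition, b is a probability iff it is non-negative and sums to 1."""
    return all(x >= -tol for x in b) and abs(sum(b) - 1.0) <= tol


def mixture(weights):
    """Weighted average sum_j weights[j] * v_j of the truth-value assignments."""
    vs = worlds(len(weights))
    return tuple(sum(w * v[n] for w, v in zip(weights, vs))
                 for n in range(len(weights)))


def quadratic_score(b, v):
    """Assumed penalty: mean squared distance between credences and truth-values."""
    return sum((bn - vn) ** 2 for bn, vn in zip(b, v)) / len(b)


def dominates(c, b, score=quadratic_score):
    """True if c incurs a strictly lower penalty than b at every world."""
    return all(score(c, v) < score(b, v) for v in worlds(len(b)))


b = (0.5, 0.3, 0.5)                    # sums to 1.3, so b is incoherent
excess = (sum(b) - 1.0) / len(b)
c = tuple(x - excess for x in b)       # shift each credence down by the same amount

if __name__ == "__main__":
    print(is_coherent(b), is_coherent(c))   # b fails the check, c passes
    print(mixture(c) == c)                  # c is the mixture with weights c itself
    print(dominates(c, b))                  # c is better than b in every world
```

The equal shift keeps every credence inside [0, 1] for this particular b; in general a projection that also clips negative values would be needed, and whether dominance holds for other scoring rules is exactly what the later sections investigate.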


Estimation and Accuracy. The interest of any epistemic scoring rule depends on what virtues it captures. While systems of beliefs can possess many laudable qualities – they might be informative, highly explanatory, justified, reliably produced, safe, useful for making decisions, and so on – we will be principally concerned with questions about overall accuracy. No matter what other features credences possess, epistemological evaluation is always centrally concerned with the relationship between belief and truth. Indeed, many of the features of belief just mentioned are desirable in virtue of their connection to accuracy and truth. It is natural, then, to treat accuracy as an overriding value in assessments of the epistemic quality of credences. Whatever other virtues or defects they might possess, if one system of credences is more accurate than another, then, from the purely epistemic perspective, the first set is a better set of credences to have. Accuracy is thus the cardinal epistemic value for credences.

But what does it mean to say that credences are accurate, and how is this sort of accuracy to be assessed? There are two related issues here. First, systems of credences can be accurate in various respects, and we need to decide which of these epistemologists care about. Second, even after the right notion of accuracy is identified, there will still be many ways to measure accuracy, and we need to determine which of these is most appropriate for epistemological evaluation. As a step toward answering the first question, we can exploit the fact that a person’s credences determine her best estimates of epistemologically significant quantities, the sorts of quantities that, from a purely epistemic perspective, it is important to be right about. The accuracy of a system of credences can then be assessed by looking at how close its estimates are to the actual values of the quantities in question.
For these purposes, a ‘quantity’ is any function that assigns real numbers to possible worlds in VX. Here are two natural ‘epistemologically significant’ quantities:⁵
• Truth-values: Quantities are propositions, thought of as indicator functions that map VX into {0, 1}.⁶ For each proposition Y, Y(v) = 1 means that Y is true at v and Y(v) = 0 means that Y is false at v.

⁵ Some might include objective chances in this list. For current purposes, however, it is not useful to focus on the connection between credences and chances. Credences can often be portrayed as estimates of objective chances, but unlike the cases of truth-values and frequencies the relationship is not uniform. There are situations, namely those in which a believer has ‘inadmissible’ information in the sense of Lewis (1980), in which degrees of belief and estimates of objective chance diverge.
⁶ The choice of 1 to represent truth and of 0 to represent falsity is pure convention; any choice with T(v) > ⊥(v) will do just as well. If one wants to set T(v) < ⊥(v), the ideas developed here lead to an ‘anti-probability’ representation for credences. See footnote 2.


• Relative frequencies: Each quantity is associated with a set of propositions Y = {Y1, Y2,…, YK}. Every world v is mapped to the proportion of Y’s elements it makes true, so that FreqY(v) = [Y1(v) +…+ YK(v)]/K. One’s choices about which quantities to focus on will be tied up with one’s view of what credences are. There are accounts that construe credences as truth-value estimates (Jeffrey (1986), Joyce (1998)), or that tie them to estimates of relative frequency (Shimony (1988), van Fraassen (1983)). Once appropriate epistemic quantities have been selected, the next step is to explain how credences fix estimates. Estimation is straightforward when b obeys the laws of probability: the correct estimate for any quantity F is then its expected value computed relative to b, so that Estb(F) = ∑n b(vn)·F(vn). Since expectation is additive, Estb(F + G) = Estb(F) + Estb(G), it follows that for any set of propositions {Yk} one has both Estimated Truth-value Additivity. Estb(∑k Yk) = ∑k b(Yk). Estimated Frequency Additivity. Estb(Freq(Y)) = ∑k b(Yk)/K. Coherent credences can thus be summed to estimate either the number of truths or the relative frequency of truths in a set of propositions. These are universal facts about coherent credences: as long as b is coherent, the above identities hold no matter which set of propositions is chosen. When a credence function b violates the laws of probability these equations fail, and it becomes unclear what estimates b sanctions. Fortunately, in the special case of truth-value estimation there is a principle that does apply to all credences, solely in virtue of what they are, whether or not they obey the laws of probability. Alethic Principle. A rational believer’s estimate for the truth-value of any proposition will coincide with her credence for it: b(Y) = Estb(Y) for all credence functions b (coherent or not) and all propositions Y. 
This reflects the intuition that, because credences are used to estimate quantities that depend on truth-values, the relationship between a credence and its associated truth-value estimate should be as direct as possible. A person’s credence for Y is thus a kind of ‘summary statistic’ that encapsulates all those features of her evidential situation that are in any way relevant to estimates of Y’s truth-value. Given this minimal understanding of the way that credences fix truth-value estimates, we can ask how the accuracies of such estimates should be evaluated.
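The expectation formula and the two additivity identities for coherent credences can be checked numerically. The following Python sketch assumes a three-cell partition, represents a proposition as the set of cells at which it is true, and uses invented credences; the helper names (`worlds`, `indicator`, `estimate`) are hypothetical. Because each world vn corresponds to exactly one cell Xn, b(vn) is simply bn.

```python
def worlds(n):
    """The n truth-value assignments v_1, ..., v_n for an n-cell partition."""
    return [tuple(1 if j == i else 0 for j in range(n)) for i in range(n)]


def indicator(cells):
    """A proposition as an indicator: 1 at worlds whose true cell lies in `cells`."""
    cells = frozenset(cells)
    return lambda v: 1 if v.index(1) in cells else 0


def estimate(b, quantity):
    """Est_b(F) = sum_n b(v_n) * F(v_n), the expectation of F under a coherent b."""
    return sum(bn * quantity(v) for bn, v in zip(b, worlds(len(b))))


b = (0.2, 0.3, 0.5)                               # coherent: non-negative, sums to 1
Y = [indicator({0, 1}), indicator({1, 2})]        # two propositions built from cells


def number_of_truths(v):
    """How many members of Y are true at world v."""
    return sum(y(v) for y in Y)


def frequency(v):
    """Freq_Y(v): the proportion of Y's members that are true at v."""
    return number_of_truths(v) / len(Y)


credences = [estimate(b, y) for y in Y]           # b(Y_k) for each proposition in Y

if __name__ == "__main__":
    # Both pairs agree up to floating-point rounding.
    print(estimate(b, number_of_truths), sum(credences))
    print(estimate(b, frequency), sum(credences) / len(Y))
```

For an incoherent b the function `estimate` is no longer licensed by the text, which is why the Alethic Principle is needed to say anything about the estimates such credences sanction.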

As already noted, the aim in estimation is to get as close as one can to the value of the estimated quantity. This underwrites two key tenets of epistemic evaluation. The first says that estimates are appropriately assessed for accuracy on a ‘closeness counts’ scale.

GRADATIONAL ACCURACY. At world v, if the estimates in {Est*(Fj)} are uniformly closer to the values of F1,…, FJ than are the estimates in {Est(Fj)}, so that either Est(Fj) > Est*(Fj) > Fj(v) or Est(Fj) < Est*(Fj) < Fj(v) holds for all j, then the first set of estimates is more accurate than the second.

The second tenet pertains to accuracy assessments across possible worlds. While it is no defect for one set of estimates to be less accurate than another at a given world, it is problematic if some alternative set of estimates is more accurate at every world. For then, the estimator can do better by shifting over to the other estimates no matter what the world is like! Let us say, then, that a set of estimates {Est(Fj)} is defective when some competing set {Est*(Fj)} has higher overall accuracy under every logically consistent assignment of truth-values to propositions in the underlying partition. We assert the following as a basic criterion of epistemic rationality:

Basic Criterion. It is epistemically irrational to hold beliefs that sanction estimates {Est(Fj)} for epistemically significant quantities when there is another set of estimates {Est*(Fj)} that is more accurate for every logically consistent assignment of truth-values to propositions in X.

This is a modest way of endorsing the idea that accuracy in estimation is an overriding epistemic value. We will use it to establish coherence as an epistemic asset, and incoherence as a defect, by showing that incoherent credences always sanction defective estimates, whereas coherent credences never do. The ALETHIC PRINCIPLE and BASIC CRITERION together ensure that rational credences obey the first two laws of probability.
Combined, they entail that it is an error to have credences that sanction one set of estimates {Est(Fj)} when there is another {Est*(Fj)} for which Est(Fj) > Est*(Fj) > Fj(v) or Est(Fj) < Est*(Fj) < Fj(v) holds for all j and v. It follows directly that estimates should track values of estimated quantities at least to the extent that bounds on the estimated quantity translate into bounds on the estimates. E-Boundedness. If f1 ≥ F(v) ≥ f2 for all v ∈ VX, then f1 ≥ Est(F) ≥ f2.

This has the following implications: • Est(T) = 1 for T any logical truth, and Est(⊥) = 0 for ⊥ any contradiction. • 1 ≥ Est(Y) ≥ 0 for any proposition Y. The ALETHIC PRINCIPLE allows us to substitute ‘b’ for ‘Est’, which turns these into the first two laws of probability. A more substantive argument is required to establish that credences should satisfy the third law of probability. One strategy is to augment BASIC CRITERION with further constraints on rational estimation, and to show that these force the estimation operator to be additive. Two common requirements are: Dominance. If F(v) ≥ G(v) for all v ∈ VX, then Est(F) ≥ Est(G). Independence. If F(v) = G(v) for all v in some subset W of VX, then Est(F) ≥ Est(G) iff Est(F*) ≥ Est(G*) for all other quantities F* and G* that (i) agree with one another on W, and (ii) agree, respectively, with F and G on VX ~ W. This general approach is reflected in most justifications of coherence, including Dutch-book arguments and representation theorems. Alternatively, one can simply require that estimates be additive. Jeffrey, for example, once called this, “as obvious as the laws of logic” (1986, p. 52). This is unlikely to move anyone, however, since the additivity of truth-value estimates is straightforwardly equivalent to the additivity of credences. A slightly less objectionable approach would be to introduce the following principle, which does not so obviously presuppose that credences are additive. Calibration. For all credence functions b (coherent or not) and all sets of propositions Y, if b(Y) = b for all Y ∈ Y, then Estb(Freq(Y)) = b. This is intermediate between ALETHIC and the idea that estimated frequencies are summed credences. It reflects an intuition that is surely central to degrees of belief: what can it mean to assign credence b(Y) to Y unless one is committed to thinking that propositions with Y’s overall epistemic profile are true roughly b(Y) proportion of the time? 
Despite this, Calibration is still too similar to additivity to serve as a premise in the latter’s justification. Calibration requires every uniform distribution of credences over a partition {Yk} to be coherent, so that b(Yk) = 1/K
for all k. This is a strong requirement, and opponents of probabilism will surely want to know what there is, other than a prior commitment to additivity, that prevents a rational person from believing both Y and ~Y to degree 0.45 while still acknowledging the logical fact that exactly half the elements of {Y, ~Y} are true. Rather than pursuing these strategies, we will assume only ALETHIC and BASIC CRITERION, and will derive the requirement of probabilistic coherence with the help of some plausible principles about what epistemic accuracy consists in. The aim is to establish coherence as an epistemic virtue, and incoherence as an epistemic vice, by showing that adhering to the laws of probability enhances the accuracy of credences, and that violating them encourages inaccuracy, relative to all reasonable ways of assessing accuracy. We shall spend the next few sections discussing various properties that measures of epistemic accuracy might possess. Following that we will prove a theorem which shows that, under fairly weak conditions, coherence promotes accuracy. We will then discuss the significance of this result, and will tie it to some interesting facts about the estimation of relative frequencies in the paper’s last section. Basic Assumptions. Begin by considering three very modest requirements on epistemic scoring rules. The first asks epistemic scoring rules to assign lower penalties to credence functions whose values are uniformly closer to the truth. TRUTH-DIRECTEDNESS. For any distinct credence functions b and c, if b’s truth-value estimates are uniformly closer to the truth than c’s at a world v in VX, so that either cn ≥ bn ≥ vn or cn ≤ bn ≤ vn for all n, then S(c, v) > S(b, v). This is merely GRADATIONAL ACCURACY and the ALETHIC PRINCIPLE applied to truth-value estimates. Rules that fail this test let people improve the epistemic quality of their opinions by becoming less certain of truths and more certain of falsehoods. 
Such rules do not make accuracy a cardinal virtue, and so are not instruments of pure epistemic assessment. We shall also assume that small changes in credences produce only small changes in accuracy, and that accuracy varies smoothly with changes in credence. SMOOTHNESS. For each truth-value assignment v, S(b, v) is a continuous, twice differentiable function of each bn in b.


Mathematically, this comes to the requirement that each of the second partial derivatives ∂²S(b, v)/∂bn² exists at each point (b, v). Though plausible in its own right, this is an assumption of convenience since the main results below do not hinge on it. We also impose a simple symmetry requirement to ensure that the overall epistemic accuracy of a credence function is independent of how propositions happen to be numbered.

PERMUTATION INVARIANCE. For any partition X = 〈Xn〉, any credence function b = 〈bn〉 on X, and any permutation σ of {1, 2,…, N}, one has S(〈bσ(n)〉, vσ(n)) = S(〈bn〉, vn) for all n.

So, no change in accuracy ensues if the propositions, credences and truth-values are rearranged so that each proposition retains its same credence and truth value.

Atomism or Holism? The above requirements are consistent with either an atomistic or a holistic conception of accuracy. On an atomistic picture, the accuracy of each bn can be ascertained independently of the values of other credences in b. On a holistic conception, individual credences only have well-defined accuracies within the context of a system – one can speak of the accuracy of the function b but not of the various bn – and overall accuracy is not simply a matter of amalgamating accuracies of individual components. To illustrate, let X = 〈X1, X2, X3〉 and v = 〈1, 0, 0〉. Truth-directedness ensures that accuracy improves when b(X1) is moved closer to 1 and b(X2) is moved closer to 0, and this is true whatever value b(X3) happens to have. If, however, b(X1) and b(X2) both shift toward 1, so that b becomes more accurate in its first coordinate but less accurate in its second, then it is consistent with TRUTH-DIRECTEDNESS that accuracy increases for some values of b(X3) but decreases for others. On an atomistic conception this should not happen: the effect of changes in b(X1) and b(X2) on overall accuracy should not depend on what credences appear elsewhere in b.
To put it differently, one should be able to ignore cases where b and c agree when assessing the relative change in accuracy occasioned by a shift in credences from b to c. One can enforce atomism by requiring accuracy measures to obey an analogue of the decision-theoretic ‘sure thing principle’.


SEPARABILITY.⁷ Suppose Y is a subset of X, and let b, b*, c, c* be credence functions in BX such that
b(X) = b*(X) and c(X) = c*(X) for all X ∈ Y, and
b(X) = c(X) and b*(X) = c*(X) for all X ∈ X ~ Y.
Then S(b, v) ≥ S(c, v) if and only if S(b*, v) ≥ S(c*, v).

This ensures that the overall epistemic accuracy of a person’s credences over Y can be assessed in a way that is independent of what credences are assigned outside Y. Many scoring rules are separable. Consider, for example, functions of the additive form S(b, v) = Σn λX(Xn)⋅sn(bn, vn), where each component function sn measures the accuracy of bn on a scale that decreases/increases in the first coordinate when the second coordinate is one/zero, and where the weights λX(Xn) are non-negative real numbers summing to one that reflect the degree to which the accuracy of credences for Xn matters to the overall epistemic accuracy of credences over X. Such a function is separable as long as (a) each sn depends only on Xn, bn and vn (and not on Xk, bk or vk for k ≠ n), and (b) each λX(Xn) depends only on X and Xn (and not on b or v).⁸ Those inclined toward a holist conception of fit between belief and world will deny SEPARABILITY, arguing that accuracy, like many epistemological concepts (e.g., justification), is holistic in character. The issue, at bottom, is whether one thinks of estimation as a ‘local’ process in which one’s estimate of X’s truth-value reflects one’s thinking about X taken in isolation, or whether it is a more ‘global’ process that forces one to take a stand on truth-values of propositions that do not entail either X or its negation. We shall not pursue this debate here since many of the more important results obtained below do not depend on it.

Content Independence. Another set of issues concerns the extent to which the standard of accuracy for credences should be allowed to vary in response to features of propositions other than truth-values.
⁷ In Joyce (1998), truth-directedness and separability were combined into one notion, which was called Normality.
⁸ The weights may depend on X. If Xn is an element of another set Y it can happen that λX(Xn) < λY(Xn), in which case the credence for Xn matters less to overall accuracy in the context of X than in the context of Y.

It is consistent with everything said so far that being right or wrong about one truth has a greater effect on overall accuracy, ceteris paribus, than being right or wrong about another. It might be, for example, that a shift in credence toward a proposition’s truth-value matters more or less depending upon whether the proposition is more or less informative. Or, in addition to being awarded points for having credences that are near truth-values,
subjects might get credit for having credences that are also close to objective chances, so that a credence of 0.8 for a truth is more accurate if the proposition’s objective chance is 0.7 than if it is 0.2. Or, it might be that assigning high credences to falsehoods is less of a detriment to accuracy when the falsehood has high ‘verisimilitude’. Or, perhaps assigning a high credence to a truth is worth more or less depending on the epistemic or practical costs of falsely believing it. Those who take an austere view will maintain that nothing but credences and truth-values matters to assessments of epistemic accuracy. They will insist on:

EXTENSIONALITY. Let b and b* be credence functions defined, respectively, over X = 〈X1,…, XN〉 and X* = 〈X*1,…, X*N〉, and let v and v* be associated truth-value assignments. If bn = b*n and vn = v*n for all n, then SX(b, v) = SX*(b*, v*).

EXTENSIONALITY requires the same basic yardstick to be used in evaluating all credences. It ensures that b’s overall accuracy at v will be a function only of b’s credences and v’s truth-values. Additional facts about propositions – their levels of justification, informativeness, chances, verisimilitude, practical importance and so on – have no influence on evaluations of accuracy except insofar as they affect credences and truth-values.

EXTENSIONALITY is a strong and useful requirement. For example, it entails that SX(v, v) = SX(v*, v*) < SX(1 – v, v) = SX(v*, 1 – v*) for all v and v*. This makes it possible to fix a single scale of accuracy for credences over X that holds good for all truth-value assignments (using SX(v, v) as the zero and SX(1 – v, v) as the unit). An even more telling illustration of its effects can be obtained by considering its impact on the additive rules. In theory, the component functions in such a rule might reflect standards of evaluation that vary proposition by proposition, so that sn(bn, vn) and sk(bk, vk) differ even when bn = bk and vn = vk.⁹ In addition, even with a single standard of evaluation, accuracy with respect to Xn might still have more impact on overall accuracy than accuracy with respect to Xk because λX(Xn) > λX(Xk). EXT eliminates this variability: it requires that sn(b, v) = sk(b, v) for all b and v, and λX(Xn) = 1/N for all n. All additive rules then assume the simple form S(b, v) = 1/N⋅Σn s(bn, vn) where s represents a single standard for evaluating the accuracy of a credence given a truth value.

⁹ Note that, by PERMUTATION INVARIANCE, sn and sk cannot differ merely in virtue of their position in the ordering. They might, however, differ because they attach to propositions with different informational content.
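The effect of EXTENSIONALITY on additive rules can be illustrated with a short Python sketch. It contrasts a general additive rule, whose weights may vary with the partition, against the uniform form S(b, v) = 1/N⋅Σn s(bn, vn). The component penalty `squared`, both weight vectors, and the sample credences are assumptions invented for the example, and the function names are hypothetical.

```python
def additive_score(b, v, weights, components):
    """General additive rule: S(b, v) = sum_n weights[n] * components[n](b[n], v[n])."""
    return sum(w * s(bn, vn)
               for w, s, bn, vn in zip(weights, components, b, v))


def extensional_score(b, v, s):
    """Form required by EXTENSIONALITY: one standard s and equal weights 1/N."""
    n = len(b)
    return additive_score(b, v, [1.0 / n] * n, [s] * n)


def squared(bn, vn):
    """Assumed single-credence penalty, chosen only for illustration."""
    return (bn - vn) ** 2


# The same credences and truth-values, imagined over two partitions X and X*.
b, v = (0.6, 0.3, 0.1), (1, 0, 0)

# Hypothetical proposition-dependent weights, e.g. reflecting informativeness.
weights_X = (0.5, 0.25, 0.25)
weights_X_star = (0.2, 0.4, 0.4)

score_X = additive_score(b, v, weights_X, [squared] * 3)
score_X_star = additive_score(b, v, weights_X_star, [squared] * 3)

if __name__ == "__main__":
    # With content-sensitive weights the two partitions are scored differently,
    # even though credences and truth-values match coordinate by coordinate.
    print(score_X, score_X_star)
    # The extensional form depends on nothing but b and v.
    print(extensional_score(b, v, squared))
```

Because `extensional_score` takes only credences and truth-values as inputs, it cannot register any difference between X and X*, which is the sense in which EXT imposes a single yardstick.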


As further indication of EXTENSIONALITY’s potency, notice that applying it to X = 〈X, ~X〉 and X* = 〈~X, X〉 enforces a kind of symmetry between credences for propositions and their negations. Given b, c ∈ [0, 1] and v ∈ {0, 1}, EXT entails that S(〈b, c〉, 〈v, 1 – v〉) = S(〈c, b〉, 〈1 – v, v〉). So, assigning credence of b to X and credence of c to ~X when X has a given truth-value is the same, insofar as accuracy is concerned, as assigning credence of c to X and credence of b to ~X when X has the opposite truth-value. In the special case where c = 1 – b, this becomes S(〈b, 1 – b〉, 〈v, 1 – v〉) = S(〈1 – b, b〉, 〈1 – v, v〉).

This last equation is independently plausible, and one might extend it to incoherent credences, so that S(〈b, c〉, 〈v, 1 – v〉) = S(〈1 – b, 1 – c〉, 〈1 – v, v〉) for all choices of b and c. Failures of this identity do have an odd feel. Suppose Jack sets credences of 0.8 and 0.4 for X and ~X, while Mack, Jack’s counterpart in another possible world, sets his at 0.2 and 0.6. Imagine that X is true in Jack’s world, but false in Mack’s world. It seems unfair for Jack’s accuracy to exceed Mack’s since each has a credence for X that is 0.2 units from its truth-value, and a credence for ~X that is 0.4 units from its truth-value. The reversal of these truth-values in the different worlds does not seem relevant to the accuracies of Jack’s or Mack’s credences. This sort of reasoning suggests the requirement of

0/1-SYMMETRY. Let vi and vj be truth-value assignments for the partition X = 〈Xn〉 with vi(Xi) = vj(Xj) = 1. Let b and b* be credence functions for X that are identical except that b(Xi) = 1 – b*(Xi) and b(Xj) = 1 – b*(Xj). Then, SX(b, vi) = SX(b*, vj).

The combination of 0/1-SYMMETRY and EXTENSIONALITY entails a condition that was endorsed in (Joyce 1998):

NORMALITY. Let b and b* be defined, respectively, over X = 〈X1,…, XN〉 and X* = 〈X*1,…, X*N〉, and let v and v* be associated truth-value assignments.
If |bn – vn| = |b*n – v*n| for all n, then SX(b, v) = SX*(b*, v*). This makes accuracy entirely dependent on absolute distances between credences and truth-values, so that S(b, v) = F(|b1 – v1|,…, |bN – vN|) where F(x1,…, xN) is a continuous, real function that decreases monotonically in each argument. So, NORMALITY forces any additive rule to take the form S(b, v) = Σn λn⋅f(|bn – vn|), where f: ℜ → ℜ is monotonically decreasing. Note that NORMALITY is weaker than the combination of 0/1-SYMMETRY and EXTENSIONALITY because it allows for the possibility that S’s various component functions have different weights.


The appropriateness of the preceding conditions as epistemic norms is up for debate. Detractors will contend that our inclinations to judge believers right or wrong often depend on the believed proposition’s informativeness, its objective chance, or on the practical or epistemic cost of being mistaken about it. Being confident to a high degree in a very specific truth is a more significant cognitive achievement than being equally confident in some less specific truth. Having a credence far from a proposition’s objective chance seems like a defect even if that credence is close to the proposition’s truth-value. Being highly confident in a true proposition whose truth-value matters a great deal seems ‘more right’ than being confident to the same degree in a true proposition whose truth-value is a matter of indifference. For all these reasons, some will argue, we need a notion of epistemic accuracy that is more nuanced than EXTENSIONALITY or NORMALITY allow. Those on the other side will portray these objections as conflating issues about what makes a belief accurate with questions about how hard it is to arrive at a justified accurate belief about some topic, or how important it is to hold such a belief. For example, the more informative a proposition is, ceteris paribus, the more evidence it takes to justify a belief in its truth. This makes it harder for a believer to confidently, justifiably, and truly believe it. While this added ‘degree of difficulty’ might be relevant to evaluations of the belief’s justification, it is not germane to its accuracy. Likewise, even though more might hang on being right about one proposition than another, this does not alter the fact that someone who correctly believes them both to the same degree is equally accurate. Once again, we leave it to readers to adjudicate these issues. Until the very end of the paper, EXTENSIONALITY, 0/1-SYMMETRY and NORMALITY will be treated as optional.

Some Examples.
It might be useful to consider some examples of additive scores that satisfy all the conditions listed so far. Consider first the exponential scores: $\alpha_z(b, v) = \frac{1}{N}\sum_n |v_n - b_n|^z$, for $z > 0$. The best known of these is the Brier score, $\mathrm{Brier}(b, v) = \alpha_2(b, v)$, which identifies the inaccuracy of each truth-value estimate with the squared distance between it and the actual truth-value. Since Brier (1950), meteorologists have used scores of this form to gauge the accuracy of probabilistic weather forecasts. Another popular score is the absolute value measure $\alpha_1(b, v)$, which has been defended by Patrick Maher (2002) and Paul Horwich (1982), among others, as the correct yardstick for measuring probabilistic inaccuracy. Exponential scores differ in the degrees of convexity or concavity of their component functions. For a fixed truth-value v, a component s(•, v) is convex at b

just when s(b + δ, v) – s(b, v) > s(b, v) – s(b – δ, v) for small δ. This means that any prospective gain in accuracy that a believer might achieve by moving her credence incrementally closer to v is exceeded by the loss in accuracy she would suffer by moving her credence away from v by the same increment. s(•, v) is concave at b when s(b + δ, v) – s(b, v) < s(b, v) – s(b – δ, v) for small δ. It is flat at b when this inequality is replaced by an identity. As a way of getting a handle on these concepts, imagine a random process that, with equal probability, slightly raises or lowers credences of magnitude b for propositions with truth-value v. If epistemic inaccuracy is measured using a convex rule, this process is, on balance, detrimental to accuracy: it would be better, on average, if credences just stayed at b. If inaccuracy is concave, the process should improve accuracy, on balance. If inaccuracy is flat it should have no average effect. An additive score’s convexity/concavity properties are reflected in the second derivatives of its components. A positive/zero/negative value of s’’(b, v) signifies that the score is convex/flat/concave at b. For the exponential scores, we have $\alpha_z''(b, 1) = z(z-1)(1-b)^{z-2}$ and $\alpha_z''(b, 0) = z(z-1)\,b^{z-2}$. These are positive, zero, or negative throughout (0, 1) depending upon whether z is greater than, equal to, or less than 1. $\alpha_z$ rules with $z > 1$ are thus everywhere convex for both truth-values; they penalize incremental shifts in credence away from a truth-value more than they reward similar shifts toward that truth-value. $\alpha_1$ is everywhere flat for both truth-values; its penalty for shifting away from a truth-value is equal to its reward for shifting toward it. $\alpha_z$ rules with $z < 1$ are concave for both truth-values: they reward incremental shifts toward truth-values more than they penalize similar shifts away from them. We will pursue issues about convexity at more length below.
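As a rough numerical check on this classification, the following Python sketch estimates the sign of the second difference of a single exponential-score component. The function names, the test point b = 0.3, the step size and the tolerance are illustrative assumptions, not part of the paper.

```python
def alpha_component(b, v, z):
    """One component |v - b|**z of the exponential score (the 1/N factor is dropped)."""
    return abs(v - b) ** z


def second_difference(s, b, v, delta=1e-3):
    """s(b+d, v) + s(b-d, v) - 2*s(b, v): positive means convex at b, negative concave."""
    return s(b + delta, v) + s(b - delta, v) - 2.0 * s(b, v)


def classify(z, b=0.3, v=1, tol=1e-12):
    """Label the exponential component with exponent z as convex, flat or concave at b."""
    diff = second_difference(lambda x, t: alpha_component(x, t, z), b, v)
    if diff > tol:
        return "convex"
    if diff < -tol:
        return "concave"
    return "flat"


for z in (0.5, 1, 2, 3):
    print(z, classify(z))
```

A finite second difference is only a local spot check at one credence; the closed-form second derivatives in the text are what establish the result on the whole of (0, 1).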
The $l_z$-metrics [10] form a closely related group of scoring rules. For $z > 1$, the function $l_z(x, y) = \frac{1}{N}\left[\sum_n |x_n - y_n|^z\right]^{1/z}$ defines a consistent notion of distance on $\Re^N$. One can interpret inaccuracy as ‘distance from truth’ by setting $S(b, v) = l_z(b, v)$. If one is willing to forgo a connection between epistemic scoring and distance, one can extend this to $0 < z < 1$, so that each exponential rule $\alpha_z(b, v)$ has an associated $l_z(b, v)$. It is easy to show that one of these two is convex/flat/concave whenever the other is, and that all the critical points of the two functions agree. [11] Despite this similarity, we will see in the next section that $\alpha_z(b, v)$ and $l_z(b, v)$ can provide quite different measures of inaccuracy.

[10] Save for the factor 1/N, these are identical to what mathematicians call the $l_p$ metrics. I have changed the subscript so as not to confuse it with the value of a probability function, which will sometimes be represented here by the letter ‘p’.
[11] To prove this, note that if $g(x) = [f(x)]^{1/z}$ then (i) $\frac{\partial g}{\partial x_j} = \frac{1}{z}[f(x)]^{1/z - 1}\,\frac{\partial f}{\partial x_j}$, which is zero whenever $\frac{\partial f}{\partial x_j}$ is zero, and (ii) $\frac{\partial^2 g}{\partial x_j^2} = \frac{1-z}{z^2}[f(x)]^{1/z - 2}\left[\frac{\partial f}{\partial x_j}\right]^2 + \frac{1}{z}[f(x)]^{1/z - 1}\,\frac{\partial^2 f}{\partial x_j^2}$.


Now consider the power rules (Selten (1998)), which have components of the form $\pi_z(b, 1) = 1 - [z\,b^{z-1} - (z-1)\,b^z]$ and $\pi_z(b, 0) = (z-1)\,b^z$. It is easy to see that these are everywhere convex in both components for $2 \ge z > 1$. Note the Brier score is just the z = 2 power rule. Spherical scores have components $\beta_z(b, 1) = 1 - \frac{b^{z-1}}{(b^z + (1-b)^z)^{(z-1)/z}}$ and $\beta_z(b, 0) = 1 - \frac{(1-b)^{z-1}}{(b^z + (1-b)^z)^{(z-1)/z}}$, where $z > 1$. The only one of these in common use is ‘the’ spherical score $\beta_2$ (hereafter β). One can show [12] that β’’(b, 1) is negative/zero/positive depending upon whether b is less than/equal to/greater than (7 – √17)/8, and that β’’(b, 0) is negative/zero/positive depending upon whether b is greater than/equal to/less than (1 + √17)/8. So, unlike the exponential scores, the spherical score is neither uniformly convex nor uniformly concave. There is, however, a region symmetric about ½ – between (7 – √17)/8 and (1 + √17)/8 – where both its components are convex. Last, the logarithmic score has components χ(b, 1) = – ln(b) and χ(b, 0) = – ln(1 – b). Its second derivatives, $\chi''(b, 1) = 1/b^2$ and $\chi''(b, 0) = 1/(1-b)^2$, are positive everywhere between zero and one, so this rule is convex.

Strictly Proper Measures. Economists call a scoring rule strictly proper when it gives a coherent agent an incentive to announce her actual credence as her estimate of X’s probability. Likewise, an inaccuracy measure is strictly proper when each coherent credence function uniquely minimizes expected inaccuracy, relative to its own probability assignments.

PROPRIETY. $\mathrm{Exp}_b(S(b)) = \sum_n b(v_n)\,S(b, v_n) < \sum_n b(v_n)\,S(c, v_n) = \mathrm{Exp}_b(S(c))$ for every $b \in P_X$ and $c \in B_X$ with $c \ne b$.

It is easy to show that an additive rule is strictly proper if and only if its component functions are strictly proper in the sense that $b\,s_n(b, 1) + (1-b)\,s_n(b, 0) < b\,s_n(c, 1) + (1-b)\,s_n(c, 0)$ for all distinct b, c ∈ [0, 1].
When the rule satisfies EXTENSIONALITY, this becomes a constraint on a single pair of functions s(b, 1) and s(b, 0). When the rule is also 0/1-symmetric it becomes a constraint on the single function s(b, 1).

[12] If $D(b) = (b^2 + (1-b)^2)^{1/2}$, then $\beta'(b, 1) = -(1-b)/D(b)^3$, $\beta'(b, 0) = b/D(b)^3$, $\beta''(b, 1) = (-4b^2 + 7b - 2)/D(b)^5$ and $\beta''(b, 0) = (-4b^2 + b + 1)/D(b)^5$.
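The closed forms in this footnote can be compared against finite differences. The sketch below does so for the spherical score β; the helper names and the step size are assumptions made for illustration only.

```python
import math


def D(b):
    """Normalizing term D(b) = sqrt(b^2 + (1 - b)^2)."""
    return math.sqrt(b * b + (1 - b) ** 2)


def beta(b, v):
    """Spherical inaccuracy component for truth-value v (1 or 0)."""
    return 1 - (b if v == 1 else 1 - b) / D(b)


def beta_second_derivative(b, v):
    """Closed-form second derivative quoted in the footnote."""
    numerator = -4 * b * b + 7 * b - 2 if v == 1 else -4 * b * b + b + 1
    return numerator / D(b) ** 5


def numerical_second_derivative(b, v, d=1e-4):
    """Central second difference, used only as a sanity check."""
    return (beta(b + d, v) + beta(b - d, v) - 2 * beta(b, v)) / d ** 2


LOW = (7 - math.sqrt(17)) / 8    # beta''(b, 1) changes sign here
HIGH = (1 + math.sqrt(17)) / 8   # beta''(b, 0) changes sign here

for b in (0.1, 0.5, 0.9):
    print(b, beta_second_derivative(b, 1), beta_second_derivative(b, 0))
```

Since LOW + HIGH = 1, the interval between them is symmetric about ½, which matches the region in which both components are convex.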


Readers are invited to verify the following facts:
• The Brier score, α2, is the only strictly proper exponential score.
• No lz-score is proper.
• Every power rule πz is strictly proper.
• β2 is the only strictly proper spherical score.
• The logarithmic score χ is strictly proper.
Clearly, propriety places strong restrictions on accuracy measures. Indeed, as Schervish (1989) shows, a necessary and sufficient condition for PROPRIETY in additive, extensional rules is the existence of a strictly increasing, positive function h such that

Schervish. h’(b) = –s’(b, 1)/(1 – b) = s’(b, 0)/b. [13]

An equivalent characterization was given earlier in Savage (1970), who showed that s(b, 1) and s(b, 0) define a strictly proper additive scoring rule if and only if there is some twice differentiable, decreasing positive function g on [0, 1] such that

Savage.

s(b, v) = g(b) + (v – b)⋅g’(b)

One can see that the two characterizations are equivalent by setting h(b) = – g’(b). Here are the Schervish and Savage functions for the Brier, spherical and logarithmic scores.




| | Brier | Spherical β | Logarithmic χ |
| s(b, 1) | $(1-b)^2$ | $1 - b/D(b)$ | $-\ln(b)$ |
| s(b, 0) | $b^2$ | $1 - (1-b)/D(b)$ | $-\ln(1-b)$ |
| g(b) | $b(1-b)$ | $1 - D(b)$ | $-(1-b)\ln(1-b) - b\ln(b)$ |
| h’(b) | $2$ | $1/D(b)^3$ | $1/[b(1-b)]$ |

Here $D(b) = (b^2 + (1-b)^2)^{1/2}$.

[13] If we relax EXTENSIONALITY, then these relationships hold for each $s_n$.
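The propriety claims listed above can be spot-checked numerically. The sketch below searches a grid of announced credences c for the one that minimizes expected inaccuracy relative to a coherent credence b. The grid resolution, the sample credences and all function names are assumptions chosen for illustration, and a grid search of this kind is evidence, not a proof.

```python
import math


def brier(c, v):
    return (v - c) ** 2


def absolute(c, v):
    return abs(v - c)


def log_score(c, v):
    return -math.log(c if v == 1 else 1 - c)


def spherical(c, v):
    return 1 - (c if v == 1 else 1 - c) / math.sqrt(c * c + (1 - c) ** 2)


# Candidate credences strictly inside (0, 1), in steps of 0.001.
GRID = [i / 1000 for i in range(1, 1000)]


def expected_inaccuracy(s, b, c):
    """Expectation, by the lights of credence b, of holding credence c."""
    return b * s(c, 1) + (1 - b) * s(c, 0)


def best_credence(s, b):
    """Grid point that minimizes expected inaccuracy relative to b."""
    return min(GRID, key=lambda c: expected_inaccuracy(s, b, c))


def looks_proper(s, credences=(0.2, 0.5, 0.7)):
    """True if each sampled credence b is its own expected-inaccuracy minimizer."""
    return all(abs(best_credence(s, b) - b) < 1e-9 for b in credences)


for name, rule in [("Brier", brier), ("log", log_score),
                   ("spherical", spherical), ("absolute", absolute)]:
    print(name, looks_proper(rule))
```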


There are many strictly proper scores. As Gibbard (2006) emphasizes, the Schervish equations provide a recipe for constructing them: for if s(b, 1) is any strictly decreasing differentiable function of b, then setting $s(b, 0) = \int_0^b x\,h'(x)\,dx$, with h’(x) = –s’(x, 1)/(1 – x), yields a strictly proper rule. For example, if s(b, 1) = 1 – b, then h’(b) = 1/(1 – b) and s(b, 0) = – b – ln(1 – b). The quantity h’ is interesting. As Gibbard observes, the derivative of h at b reflects the “urgency” of avoiding inaccuracy in the vicinity of b. To see this, it helps to think of –s’(b, 1) and s’(b, 0) as gauges of “urgency” as well. Both functions express the rate at which inaccuracy decreases or increases with small shifts in credence toward or away from the believed proposition’s truth-value: the larger the value of –s’(b, 1), the better/worse it is for a person with credence b(X) = b to become slightly more/less confident of X when it is true; the larger s’(b, 0) is, the better/worse it is for the person to become slightly less/more confident of X when it is false. The functions thus indicate the stringency of the local dependence of inaccuracy on credence. When their values are high it is “urgent” to shift one’s credence toward rather than away from the truth-value (if one shifts at all) because there is a large difference in accuracy between going the right way and going the wrong way. If we think of –s’(b, 1) and s’(b, 0) as measuring the absolute urgency of holding an accurate credence at b, then h’ is a kind of discounted urgency that diminishes absolute urgency in inverse proportion to the distances from credences to truth-values. Formally, if h’(b, v) = s’(b, v)/(b – v), then –s’(b, 1) and h’(b, 1) measure absolute and discounted urgency for beliefs about truths, while s’(b, 0) and h’(b, 0) measure these quantities when b is the credence of a falsehood.
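The recipe can be tried out on the example s(b, 1) = 1 – b. The sketch below integrates x⋅h’(x) numerically with a midpoint rule, compares the result with the closed form – b – ln(1 – b), and then checks on a grid that the resulting pair of components makes each sampled credence its own best estimate. The step counts, grid and function names are illustrative assumptions.

```python
import math


def s_true(b):
    """Chosen component for truths: s(b, 1) = 1 - b."""
    return 1.0 - b


def h_prime(x):
    """Schervish: h'(x) = -s'(x, 1) / (1 - x); here s'(x, 1) = -1."""
    return 1.0 / (1.0 - x)


def s_false_numeric(b, steps=20000):
    """Midpoint-rule approximation of the integral of x*h'(x) from 0 to b."""
    width = b / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * width
        total += x * h_prime(x)
    return total * width


def s_false(b):
    """Closed form of the same integral: -b - ln(1 - b)."""
    return -b - math.log(1.0 - b)


def expected_inaccuracy(b, c):
    return b * s_true(c) + (1 - b) * s_false(c)


GRID = [i / 1000 for i in range(1, 1000)]


def best_credence(b):
    """Grid point minimizing expected inaccuracy relative to credence b."""
    return min(GRID, key=lambda c: expected_inaccuracy(b, c))


print(s_false_numeric(0.5), s_false(0.5))
print(best_credence(0.3), best_credence(0.8))
```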
The absolute member in each pair treats each decrease in the accuracy of a credence as having the same urgency no matter where it originates; the discounted measure regards such decreases as increasingly more urgent in inverse proportion to the distance between the credence and the relevant truth-value. While these measures can, in principle, distinguish between the urgency of holding accurate beliefs about truths and the urgency of holding accurate beliefs about falsehoods, it is natural to think that there should be no difference along this dimension. But, is absolute or discounted urgency at issue? Depending on the answer, there are two constraints one might impose on additive scoring rules:

0/1-SYMMETRY FOR ABSOLUTE URGENCY. Any additive scoring rule must be such that –s’(b, 1) = s’(1 – b, 0).


0/1-SYMMETRY FOR DISCOUNTED URGENCY. Any additive scoring rule must be such that h’(b, 1) = h’(b, 0).

The first of these, a consequence of NORMALITY, makes absolute urgency depend solely on absolute distances from credences to truth-values, irrespective of which truth-value is at issue. The second, a consequence of PROPRIETY, forces discounted urgency to have this feature. These principles are independent and do not conflict. Rules satisfying both – like the Brier, spherical, and logarithmic scores – have Schervish functions with h’(b) = h’(1 – b). Exponential rules αz with z ≠ 2 satisfy the first principle but not the second. Power rules πz with z ≠ 2 satisfy the second principle but not the first. In keeping with our treatment of NORMALITY, we take no position on the status of 0/1-symmetry for absolute urgency, at least for the moment. We will, however, argue that discounted urgency must be 0/1-symmetric by establishing that epistemic scoring rules must be strictly proper. The argument to be given was inspired by, but is slightly different from, one for the same conclusion found in Gibbard (2006). [14] Borrowing Gibbard’s terminology, call a coherent credence function b immodest with respect to a measure of epistemic inaccuracy S when b uniquely minimizes expected inaccuracy from its own perspective, so that Expb(S(c)) > Expb(S(b)) for all c ∈ BX other than b. An immodest credence function, then, expects itself to be more accurate than any competitor, and thus has no reason, at least none based in considerations of accuracy, to prefer any set of credences to itself. A modest credence function, in contrast, assigns some other credence function a higher expected accuracy than it assigns itself. So, someone whose credences are modest expects that she would do better, in terms of overall accuracy, by holding opinions other than her own. Modest credences are epistemically defective: they undermine their own adoption.
Recall that a person whose credences obey the laws of probability is committed to using expectations derived from her credences to make estimates. These estimates represent her best judgments about the actual values of quantities. Now, if, relative to a person’s own credences, some alternative system of beliefs has a lower expected inaccuracy, then, by her own estimation, that system provides a more accurate picture of the world than the one she holds. This puts her in an untenable doxastic situation. Since overall accuracy is an overriding epistemic

[14] Gibbard argues that immodesty in credences is required for “guidance value.” Coherent agents, he claims, can only rationally rely on their credences to choose acts in the pursuit of goals if their credences are immodest.


value, the person has a prima facie [15] epistemic reason, grounded in her own beliefs, to distrust those very beliefs. This is a probabilistic version of Moore’s paradox. Just as a rational person cannot fully believe ‘X but I don’t believe X,’ so a person cannot rationally hold a set of credences that require her to estimate that some other specific set is more accurate. The epistemically modest person is always in this pathological position: her beliefs undermine themselves. It should be stressed that epistemic modesty is unlike garden-variety cases in which a person knows that some unknown set of beliefs is actually more accurate than her own. We find ourselves in this position all the time. Indeed, whenever we assign any intermediate credence to a proposition X we are aware that one of the extreme credences, 0 or 1, is more accurate de facto. But, without further evidence about X’s truth-value we do not know which extreme credence is best, and so the estimate of X’s truth-value that best reflects our evidence falls between the extremes. In contrast, for the modest believer there is a credence function that, in light of all her evidence about truth-values, has a lower expected inaccuracy than her own. So, if she were fully aware of the implications of her credences for accuracy, [16] she would be forced to judge that holding these other credences would be a better way to minimize inaccuracy. Moreover, she would be able to make this judgment without acquiring any further evidence about X! Given that modesty is a defect, it would be a serious flaw in a measure of inaccuracy if it required obviously rational credences to be modest. To see how bad things can be, consider the exponential scores. A coherent agent who assigns credence b to X will set $\mathrm{Exp}_b(\alpha_z(c)) = b(1-c)^z + (1-b)\,c^z$. As long as $z \ne 1$, this has an extreme point at $q = b^y/[b^y + (1-b)^y]$ where $y = 1/(z-1)$. q is a maximum when z < 1, and a minimum when z > 1.
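To make the behaviour of q concrete, the sketch below evaluates the critical point and the expected exponential score for a credence of b = 0.8 under several exponents. The choice of b and the function names are assumptions used for illustration.

```python
def expected_alpha(b, c, z):
    """Expected exponential inaccuracy of credence c, by the lights of credence b."""
    return b * (1 - c) ** z + (1 - b) * c ** z


def critical_point(b, z):
    """q = b^y / (b^y + (1 - b)^y) with y = 1/(z - 1); requires z != 1."""
    y = 1.0 / (z - 1.0)
    return b ** y / (b ** y + (1 - b) ** y)


b = 0.8
for z in (3, 2, 1.5):
    q = critical_point(b, z)
    print(z, q, expected_alpha(b, q, z), expected_alpha(b, b, z))

# For z < 1 the critical point is a maximum, so the minimum sits at an endpoint.
print(expected_alpha(b, 1.0, 0.5), expected_alpha(b, b, 0.5))
```

With z = 3 the minimizer falls between ½ and b, with z = 1.5 it lies beyond b, and only z = 2 returns b itself; with z = 0.5 the extreme credence 1 has lower expected inaccuracy than b.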
So, while all the αz agree that someone who is absolutely certain about X’s truth-value can be immodest, only the Brier score α2 permits immodesty across the board. Setting z > 2 encourages doxastic conservatism. Someone who is the least bit uncertain about X can always improve expected accuracy by shifting her credence to q, which lies between her credence and ½. Thus, when z > 2 one can only be immodest about credences of 1, ½ or 0. When 1 < z < 2 doxastic extremism rules: a person who is leaning, even slightly, toward thinking that X is true (or false), can improve expected accuracy by leaning

[15] It does not follow that she has an all-things-considered reason to change her beliefs. Epistemic considerations are only one among many reasons that a person might have to alter her beliefs.
[16] The person might not be able to identify the credence function that maximizes expected accuracy relative to her credences if she lacks information about either (a) the way in which epistemic accuracy is measured, or (b) the contents of her own credences. Unlike the normal case, the additional information she needs to determine that some other set of beliefs is more accurate, on balance, than her own is not information about the truth-values of the propositions to which her credences attach. She needs epistemological information about the way that accuracy is measured, or information about her own belief state.


even more strongly in that direction. Again, the message is that one should be either entirely opinionated or completely non-committal. The extremism is even more pronounced for the absolute value score α1, or for any αz with z ≤ 1. Here a person who is more confident than not of X’s truth (falsity) does best, by her own lights, by jumping to the conclusion that X is certainly true (false). To see the pernicious effects of this, imagine someone who believes that a given die is fair, and so assigns a credence of 5/6 to Xj = ‘face j will not come up on the next toss’ for j = 1, 2,…, 6. If αz measures inaccuracy for z ≤ 1, the person will expect herself to be less accurate than someone who holds the logically inconsistent position that every Xj is certainly true! Thus, in all cases, except for the proper rule α2, the mere fact that a credence falls in the intervals (0, ½) or (½, 1) is sufficient grounds for judging it unsound. Worse yet, this judgment is independent of both the content of the believed proposition and the believer’s evidential position with respect to it. [17] In general, it would be a serious flaw in a proposed measure of epistemic accuracy if it forced probabilistically coherent systems of credences to be modest. Even those who deny that epistemic rationality requires obedience to the laws of probability should grant that coherent credences cannot be prohibited, a priori, on grounds of inaccuracy alone. After all, for any assignment of probabilities p, a person could, at least in principle, [18] have evidence that makes it reasonable for her to believe that each X in X has p(X) as its objective chance. [19] Moreover, this could exhaust her information about X’s truth-value. But, according to the ‘Principal Principle’ of Lewis (1980), someone who knows that the objective chance of X is p(X), and who does not possess any additional information relevant to X’s truth-value, should adopt p(X) as her credence for X.
p is thus the rational credence function for the person to hold. So, any system of credences that satisfies the laws of probability is at least a permissible state of opinion – there are evidential conditions under which it would be rational to hold it. Any acceptable measure of epistemic inaccuracy should respect this fact by making coherent systems


[17] All failures of modesty have this character if EXTENSIONALITY holds. Some credence values are prohibited independent of the propositions to which they attach or the believer’s evidence with respect to them!
[18] Alan Hájek has pointed out that it might be difficult to acquire evidence for such beliefs, e.g., it might take an infinite amount of evidence to learn that some event has probability 1/π. While quite true, this does not affect the main point. Even if it takes a highly idealized cognizer with vast amounts of evidence to rationally believe that a certain probability function gives the chances of the elements of X, the mere possibility of such a being suffices to make this argument work.
[19] Some have held objective chances are not probabilities. This seems unlikely, but explaining why would take us too far afield. Nothing said here presupposes that all chance distributions are realized as probabilities. It is only being assumed that for any probability distribution p over a partition X it is possible that a person can reasonably believe that the objective chance of each X in X is p(X). This weak assumption is especially compelling when EXTENSIONALITY is assumed. For in this case, the requirement is only that there could be some partition or other for which each X in X has p(X) as its objective probability.


immodest. It should, that is, ensure that Expb(S(c)) > Expb(S(b)) for all b ∈ PX and c ∈ BX, which is just what PROPRIETY demands.

Convexity. While PROPRIETY delimits the range of allowable inaccuracy scores, significant variation still remains. A further restriction is provided by considering the effects of a scoring rule’s convexity or concavity on its accuracy evaluations. Recall that an additive scoring rule is convex/flat/concave at (b, v) exactly if all its components satisfy s(b + δ, v) – s(b, v) >/=/< s(b, v) – s(b – δ, v) for small δ. More generally, a scoring rule, additive or not, is (everywhere) convex/flat/concave at v iff ½⋅S(b, v) + ½⋅S(c, v) >/=/< S(½⋅b + ½⋅c, v) for all credence functions b and c. So, if Barb and Cindy have credences b and c, and if Mel’s credences m are an even compromise between the two, so that m(X) = ½⋅b(X) + ½⋅c(X) for each X, then a convex/flat/concave rule will make Mel’s beliefs more/as/less accurate than the average accuracies of the beliefs of Barb and Cindy. Here are two useful and general ways of stating that S(•, v) is convex, each assumed to hold for all v ∈ VX and all b, c ∈ BX.
• S(b + δ, v) – S(b, v) > S(b, v) – S(b – δ, v) for every vector of real numbers δ = 〈δn〉 with 0 < bn ± δn < 1.
• For any credence functions b1, b2,…, bm, and μ1, μ2,…, μm ≥ 0 with ∑k μk = 1, one has ∑k μk⋅S(bk, v) > S((∑k μk⋅bk), v).
Recall also that an additive measure S(•, v) is convex at b if its components are convex at b, and a sufficient condition for this is that sn’’(b, vn) > 0. As our discussion of exponential rules suggested, the convexity properties of a scoring rule determine the degree of epistemic conservatism or adventurousness that it encourages. Altering any credence involves risking error, since one might move away from the truth, but it also carries the prospect of increased accuracy, since one might move closer to the believed proposition’s truth-value.
The more convex a score is at a point, the greater the emphasis it places on the avoidance of error as opposed to the pursuit of truth near that point. The more concave it is, the greater the emphasis it places on the pursuit of truth as opposed to the avoidance of error. As William James famously observed, the requirements to avoid error and to believe the truth – epistemology’s two “great commandments” – are in tension, and different epistemologies might stress one at the expense of the other. James endorsed a liberal view that accents the second commandment, while W. K. Clifford, his conservative foil, emphasized the first. This debate plays out in the current

context as a dispute about convexity/concavity properties of measures of epistemic accuracy. [20] Convexity encourages Cliffordian conservatism in the evaluation of credences by making the accuracy costs of moving away from the truth greater than the benefits of comparable movements in the direction of the truth. This makes it relatively risky to change credences, and so discourages believers from making such changes without being compelled by their evidence. In contrast, concavity fosters Jamesian extremism by making the accuracy costs of moving away from a truth smaller than the benefits of moving the same distance toward it. This can encourage believers to alter their credences without corresponding changes in evidence. Flat measures set the costs of error and the benefits of believing the truth equal, and so it becomes a matter of indifference whether or not one makes a small change in credence. To illustrate, suppose that a single ball will be drawn at random from an urn containing nine white balls and one black ball. On the basis of this evidence, a person might reasonably settle on a credence of b = 0.1 for the proposition that the black ball will be drawn and a credence of b = 0.9 for the proposition that a white ball will be drawn. The ball is drawn, and we learn that it is black. We are then asked to advise the person, without telling her what ball was drawn, whether or not to take a pill that will randomly raise or lower her credence for a black draw, with equal probability, by 0.01, while leaving her credence for a white draw at 0.9. If we are only interested in improving the person’s accuracy, then our advice should depend on the convexity of the inaccuracy score for truths at credence 0.1. For a rule that is convex here, like the Brier score, the pill’s potential disadvantages outweigh its advantages. For a rule that is concave, like the spherical score, the potential benefits are, on average, worth the risks.
For a rule that is flat at 0.1, like the absolute value score, there is no advantage either way. As this case illustrates, concavity or flatness in scoring rules gives rise to an epistemology in which the pursuit of accuracy can be furthered, or at least not hindered, by the employment of belief-forming or belief-altering processes that permit credences to vary randomly and independently of the truth-values of the propositions believed. This is clearly objectionable. The epistemic liberality of non-convex scoring rules encourages changes of opinion that are inadequately tied to corresponding changes in a believer’s evidence. As a result, believers can make themselves better off, in terms of accuracy, by ignoring their evidence and letting their opinions be guided by random processes that have nothing to do with the truth-value of the proposition believed. In any plausible epistemology, inaccuracy

[20] Not every aspect of the James/Clifford debate is captured by the convexity question. For example, James maintained that the requirement to believe the truth can be justified on the basis of pragmatic considerations, whereas Clifford believed that epistemic conservatism was justified on both practical and moral grounds.


will be at least slightly Cliffordian; the penalties for belief changes that decrease accuracy will be at least a little more onerous, on average, than the penalties for staying put and forgoing a potential increase in accuracy. It might be, of course, that believers have non-epistemic reasons for altering their beliefs in the absence of any changes in their evidence, but from a purely epistemic point of view this sort of behavior should not be endorsed. It is no response to observe that strictly proper scoring rules discourage the use of random belief-forming mechanisms even when they are flat or concave. If inaccuracy is measured using β, say, then a person with credences 〈0.1, 0.9〉 for 〈black, white〉 will feel no inclination to take a pill that will change her credences to 〈0.11, 0.9〉 or 〈0.09, 0.9〉 with equal probability. On the basis of all her evidence, she expects her current credence to be more accurate, on balance, than any others she might adopt (assuming, of course, that she knows how accuracy is assessed). Yet, it remains true that we, with more relevant information, are forced to judge that she would be well advised to let herself be guided by a random process that has just as much chance of moving her away from the truth as it has of moving her toward it. This is objectionable whatever the person herself may think. Moreover, the person’s own position is vexed. It would be one thing if she were refraining from taking the pill on the grounds that belief-revision processes should not be randomly correlated with the truth. But, this is not the story. If she knows how inaccuracy is measured, she declines the pill because she is unsure whether or not a random belief-forming process will improve her expected accuracy. She thinks it probably will not (because she thinks v is probably 0), but she also recognizes that there is a one-in-ten chance that it will. This situation is intolerable as well.
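The pill example can be worked through numerically. The sketch below computes, for a black draw (v = 1) and a starting credence of 0.1, the average inaccuracy after the pill minus the inaccuracy before it, under three scores. Only the component for the black-draw proposition is scored, since the credence for a white draw is left at 0.9 and contributes equally before and after. The function names are assumptions for illustration.

```python
import math


def brier(b, v):
    return (v - b) ** 2


def absolute(b, v):
    return abs(v - b)


def spherical(b, v):
    return 1 - (b if v == 1 else 1 - b) / math.sqrt(b * b + (1 - b) ** 2)


def pill_effect(score, b=0.1, v=1, shift=0.01):
    """Average inaccuracy after a +/- shift with equal probability, minus inaccuracy before.

    A positive value means the pill hurts accuracy on average; negative means it helps.
    """
    after = 0.5 * score(b + shift, v) + 0.5 * score(b - shift, v)
    return after - score(b, v)


for name, rule in [("Brier", brier), ("spherical", spherical), ("absolute", absolute)]:
    print(name, pill_effect(rule))
```

Under the Brier score the effect is positive, under the spherical score it is negative, and under the absolute value score it is zero up to rounding, in line with the convex, concave and flat cases described in the text.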
Our epistemology should not leave believers to wonder about whether, as an empirical matter, they would be wise to leave their opinions to be decided by random processes that are uncorrelated with the truth. The only way to avoid this is by requiring inaccuracy measures to be convex. Thus we have

CONVEXITY. ½⋅S(b, v) + ½⋅S(c, v) > S(½⋅b + ½⋅c, v) for any credence functions b and c.

Joyce (1998) defends this principle (though the justification given there is less satisfactory than the one given here). In a criticism, Maher (2002) denies that measures of epistemic accuracy should be convex. Maher’s case rests ultimately on the claim that the non-convex absolute value score, α1(b, v) = |b – v|, is a plausible measure of inaccuracy for credences. Maher offers two considerations in support of α1. First, he writes, “it is natural to measure the inaccuracy of b with respect to the proposition X in possible world v by |b(X) – v(X)|. It is also natural to

take the total inaccuracy of b to be the sum of its inaccuracies with respect to each proposition.” (Maher 2002, p. 77) Second, he points out that there are situations in which we use α1 to measure accuracy of various types. For instance, one naturally averages when calculating students’ final grades, which is tantamount to thinking that the inaccuracy of their answers is best measured by the absolute value score. Neither argument is convincing. While it may be natural to use the absolute value score in grading, it is not an appropriate measure for other things. When testing an archer’s accuracy, for example, we use a target of concentric circles rather than concentric squares aligned with vertices up/down and left/right. There is a sound reason for this. With the square target, an archer whose inaccuracy tends to be confined mainly along the vertical or horizontal axis is penalized less than one whose inaccuracy is distributed more evenly over both dimensions. For example, an arrow that hits nine inches below and one inch right of the bull’s-eye is deemed more accurate than one that hits 7.5 inches from the bull’s-eye at a 45° angle from vertical. While one can surely contrive scenarios in which accuracy along the vertical or horizontal dimension is more important than accuracy in other directions, this is not the norm. When evaluating the accuracy of archers there are no “preferred” directions; an error along any line running through the bull’s-eye counts for just as much as an error along any other such line. The square target uses an absolute value metric, while the circular one employs the Euclidean distance metric, the analogue of the Brier score. Both modes of measurement can seem “natural” in some circumstances, but unnatural in others. Moreover, for all its ‘naturalness’, the absolute value measure produces absurd results if used across the board. We have already seen that α1 is not strictly proper, but this is just the tip of the iceberg.
Measuring accuracy using α1 – or any extensional, everywhere non-convex rule – permits logically inconsistent beliefs to dominate reasonable probabilistically coherent beliefs with respect to accuracy. Suppose a fair die is about to be tossed, and let Xj say that it lands with j = 1,…, 6 spots uppermost. Though it seems natural to set bj = 1/6, the absolute value score forces one to pay an inescapable penalty, not just an expected penalty, for doing so. For if c is the logically inconsistent [21] truth-value assignment that sets cj = 0 for all j, then it is easy to see that α1(b, v) = 10/36 > α1(c, v) = 1/6 for every truth-value assignment v. So, no matter how the truth-values turn out, a believer does better by adopting the inconsistent assignment 〈0,0,0,0,0,0〉 over the obviously correct consistent assignment 〈1/6, 1/6, 1/6, 1/6, 1/6, 1/6〉. [22] Here we cross the border from probabilistic incoherence into outright logical inconsistency. A believer minimizes expected inaccuracy by being absolutely certain that every Xj is false even though logic dictates that one of them must be true. Measures that encourage this sort of inconsistency should be eschewed. This includes the absolute-value rule as well as every other additive rule whose components are uniformly nonconvex.

[21] A credence assignment is logically inconsistent (not merely probabilistically incoherent) when it either assigns probability zero to all elements of some logical partition or when it assigns probability one to all members of some logically inconsistent set.

Prospects for a Nonpragmatic Vindication of Probabilism.

We now have three constraints on inaccuracy measures with compelling epistemic rationales: TRUTH-DIRECTEDNESS, PROPRIETY, and CONVEXITY. Five others – SEPARABILITY, SMOOTHNESS, EXTENSIONALITY, 0/1-SYMMETRY, and NORMALITY – are perhaps less convincingly established. In this section, we restrict our attention to rules that are truth-directed, proper and convex in order to explain how this triad of properties singles out probabilistically coherent credences as being especially excellent from the purely epistemic perspective. The combination of TRUTH-DIRECTEDNESS, PROPRIETY, and CONVEXITY opens the way for a nonpragmatic vindication of probabilism, which justifies the requirement of probabilistic coherence by showing how incoherence generates inaccuracy.

The idea of vindicating coherence on the basis of accuracy considerations – and without the use of Dutch book arguments or representation theorems – stems from the work of van Fraassen (1983) and Shimony (1988). These articles sought, in slightly different ways, to show that incoherence leads believers to make poorly calibrated estimates of relative frequency, while coherence enhances calibration. Unfortunately, frequency calibration is a poor standard of epistemic assessment. The case against it is made in Joyce (1998), though many of the basic points were raised earlier in Seidenfeld (1985).
Joyce (1998) sought to improve on the van Fraassen/Shimony approach by focusing on truth-values rather than frequencies, and arguing that the following will be true on any ‘reasonable’ measure of epistemic inaccuracy for truth-value estimates:

I. For any incoherent credence function c there is a coherent credence function b that is strictly more accurate than c under every logically possible assignment of truth-values, so that S(c, v) > S(b, v) for all v ∈ VX.

[22] The same follows for all everywhere non-convex rules. Something not quite so bad also happens for everywhere convex rules that are not sufficiently uniform in their convexity. Here one ends up with the uniform 1/6 distribution being dominated by a probabilistically incoherent, but still logically consistent credence distribution.
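The fair-die example from the discussion of the absolute value score can be verified exactly. The sketch below assumes that both scores are normalized by the number of propositions, which is what the figures 10/36 and 1/6 in the text imply; the function names are hypothetical. It also checks the midpoint inequality of CONVEXITY for the two assignments, to illustrate where the absolute value score and the Brier score part ways.

```python
from fractions import Fraction


def absolute_value_score(b, v):
    """Average absolute gap between credences and truth-values (alpha-1)."""
    return sum(abs(b_j - v_j) for b_j, v_j in zip(b, v)) / len(b)


def brier_score(b, v):
    """Average squared gap between credences and truth-values."""
    return sum((b_j - v_j) ** 2 for b_j, v_j in zip(b, v)) / len(b)


def midpoint(b, c):
    """The even mixture of two credence assignments."""
    return [(b_j + c_j) / 2 for b_j, c_j in zip(b, c)]


SIDES = 6
# One world per face: exactly one X_j is true in each.
worlds = [[Fraction(int(i == j)) for i in range(SIDES)] for j in range(SIDES)]
uniform = [Fraction(1, SIDES)] * SIDES   # the coherent assignment
all_zero = [Fraction(0)] * SIDES         # the logically inconsistent assignment

# Collect the distinct scores each assignment receives across the six worlds.
absolute_uniform = {absolute_value_score(uniform, v) for v in worlds}
absolute_zero = {absolute_value_score(all_zero, v) for v in worlds}
brier_uniform = {brier_score(uniform, v) for v in worlds}
brier_zero = {brier_score(all_zero, v) for v in worlds}

# Midpoint check in one world: average of the scores minus score of the average.
v = worlds[0]
mix = midpoint(uniform, all_zero)
absolute_gap = (absolute_value_score(uniform, v)
                + absolute_value_score(all_zero, v)) / 2 - absolute_value_score(mix, v)
brier_gap = (brier_score(uniform, v)
             + brier_score(all_zero, v)) / 2 - brier_score(mix, v)

print(absolute_uniform, absolute_zero)
print(brier_uniform, brier_zero)
print(absolute_gap, brier_gap)
```

Under the absolute value score the all-zero assignment receives 1/6 in every world against 10/36 for the uniform assignment, so it dominates. Under the Brier score the ordering is reversed in every world (5/36 against 1/6). The midpoint gap is zero for the absolute value score in this case, so the strict inequality in CONVEXITY fails, while it is strictly positive for the Brier score.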


When S(c, v) > S(b, v) for all v ∈ VX, say that b “accuracy dominates” c. It must be emphasized that (I) does not imply that each coherent b accuracy dominates every incoherent c. In fact, for every coherent b there is a truth-value assignment v and (infinitely) many alternative credence assignments c, both coherent and incoherent, such that S(b, v) > S(c, v). Likewise, for any truth-value assignment v and every coherent b other than v, there will always be (infinitely) many c, coherent and incoherent, such that S(b, v) > S(c, v). [23] Joyce (1998) also presupposed, but did not actually prove, that:

II. No coherent credence function b is accuracy-dominated in this way by any other credence function c, whether coherent or incoherent: there is always a truth-value assignment v such that S(c, v) > S(b, v).

(II) is a straightforward consequence of PROPRIETY. When S is strictly proper it must be true that Σn b(vn)⋅S(b, vn) < Σn b(vn)⋅S(c, vn) for all b ∈ PX and all c ∈ BX other than b, which cannot happen if there is some c with S(b, vn) > S(c, vn) for all vn ∈ VX.

In Joyce (1998) accuracy domination was seen as an epistemic defect, and the fact that incoherent credence functions are accuracy dominated was portrayed as a strong, and purely epistemic, reason to prefer coherent credence functions to incoherent ones. The underlying normative principle was an instance of the BASIC CRITERION:

III. A credence function that is accuracy-dominated relative to any reasonable measure of inaccuracy cannot be rationally held.

The combination of (I), (II) and (III) was supposed to provide a purely epistemic reason for preferring coherent credences to incoherent ones. For this argument to work, there needs to be some account of what makes a scoring rule ‘reasonable’. Joyce (1998) required the equivalent of SMOOTHNESS, TRUTH-DIRECTEDNESS, SEPARABILITY, NORMALITY and CONVEXITY supplemented by a symmetry principle that forces complementary mixtures of equally accurate credence functions to be equally accurate. [24] Maher (2002) and Gibbard (2006) object to this principle, and Gibbard rejects NORMALITY. These objections have merit, and it would be best to find a vindication of probabilism that avoids such controversial premises. Fortunately, PROPRIETY, which played no role in Joyce (1998), makes this possible. None of SMOOTHNESS, SEPARABILITY, NORMALITY or the strong symmetry principle is required to establish (I) and (II); the combination of TRUTH-DIRECTEDNESS, PROPRIETY and CONVEXITY (over PX) suffices.

[23] To make this explicit, (I) says ∀c ∈ BX ~ PX, ∃b ∈ PX, ∀v ∈ VX, S(c, v) > S(b, v). This is consistent with both (A) ∀b ∈ BX, ∃c ∈ BX ~ PX, ∃v ∈ VX, S(c, v) < S(b, v) and (B) ∀v ∈ VX, ∀b ∈ PX ~ {v}, ∃c ∈ BX ~ PX, S(c, v) < S(b, v). (A) and (B) are generally true, but neither poses any problem for our argument. Coherence cannot ensure de facto accuracy – there will always be incoherent credences infinitesimally removed from the truth – but (I) shows that it avoids a kind of necessary inaccuracy from which incoherent credences suffer.

[24] The precise requirement is that S(λb + (1 – λ)c, v) = S((1 – λ)b + λc, v) when S(b, v) = S(c, v) for any 0 < λ < 1.

THEOREM: Let S be a scoring rule defined on a finite partition X = 〈Xn〉. If S is truth-directed and strictly proper, and S(•, v) is convex on PX for every v ∈ VX, then (I) and (II) hold.

Proof: As we have seen, (II) is an obvious consequence of PROPRIETY. We prove (I) using the method of Fan et al. (1957). Fix an incoherent credence function c. For each n, define a map from the set of coherent credence functions PX into the real numbers by setting fn(b) = S(b, vn) – S(c, vn). fn(b) is the difference in S-score between b and c at the world vn that makes the proposition Xn true. Note, for future reference, that each fn is also convex on PX because each S(•, vn) is strictly convex on PX. To prove (I) it suffices to show that there is some b ∈ PX with fn(b) < 0 for all n. Suppose, for purposes of reductio, that this is not so, and that for every coherent b there is an n with fn(b) ≥ 0.

Let A ⊂ ℜ^N contain all N-tuples 〈f1(b), f2(b),…, fN(b)〉 with b ∈ PX. Denote A’s convex hull by A+. A+ contains all points a = 〈an〉 ∈ ℜ^N for which there are b1, b2,…, bM ∈ PX and μ1, μ2,…, μM ≥ 0 with ∑m μm = 1 such that an = ∑m μm⋅fn(bm). Let B contain all y = 〈yn〉 ∈ ℜ^N with yn < 0 for all n. B is convex and contains none of its boundary points (those for which some yn = 0).

Note first that A+ and B are disjoint. Since each fn is convex on PX, an = ∑m μm⋅fn(bm) ≥ fn(∑m μm⋅bm). But, since ∑m μm⋅bm is itself a point of PX, our reductio assumption entails that fn(∑m μm⋅bm) ≥ 0 for some n.
Thus, A+ ∩ B = ∅ because each a ∈ A+ is non-negative at some coordinate an. Since A+ and B are disjoint and convex, the Hyperplane Separation Theorem ensures that there is a hyperplane H in ℜ^N that separates the two sets. This means that:

• There are real coefficients λ1, λ2,…, λN, not all zero, and a t ≥ 0 such that
  (i) ∑n λn⋅an ≥ t if 〈an〉 ∈ A+
  (ii) ∑n λn⋅hn = t if 〈hn〉 ∈ H
  (iii) ∑n λn⋅yn ≤ t if 〈yn〉 ∈ B.
• Points in A+ ∩ H or B ∩ H are on the boundary of A+ or B, respectively.

Since B does not contain its boundary points, inequality (iii) must be strict. However, the only way to have ∑n λn⋅yn < t throughout B is for each λn to be non-negative. If not, one can find a point 〈yn〉 that is both in B and on the wrong side of H. Suppose (without loss of generality) that there are indices j, k ≥ 1 such that λ1,…, λj < 0 = λj+1 = … = λk–1 < λk,…, λN. Choose 〈yn〉 so that

  yn < (t + 1)/(j⋅λn) when n ≤ j,
  yn = –1 when j < n < k,
  0 > yn > –1/((N – k + 1)⋅λn) when k ≤ n.

Each yn < 0, so 〈yn〉 ∈ B. But, since ∑n ≤ j λn⋅yn > ∑n ≤ j (t + 1)/j = t + 1 and ∑n ≥ k λn⋅yn > ∑n ≥ k –1/(N – k + 1) = –1, it follows that ∑n λn⋅yn = ∑n ≤ j λn⋅yn + ∑n ≥ k λn⋅yn > t. This puts 〈yn〉 on the wrong side of H.

Since none of the λn is negative and at least one is non-zero we can define a probability function p over X by setting p(Xn) = λn/(∑n λn). By dividing inequality (i) through by ∑n λn we obtain

  (i*) ∑n p(Xn)⋅an ≥ t/(∑n λn) ≥ 0 for all 〈an〉 ∈ A+.

Since A is a subset of A+, it follows that ∑n p(Xn)⋅fn(b) ≥ 0 for all b ∈ PX. In particular, since p ∈ PX we have ∑n p(Xn)⋅fn(p) ≥ 0. But this means that ∑n p(Xn)⋅S(p, vn) ≥ ∑n p(Xn)⋅S(c, vn), which contradicts the fact that S is strictly proper. Our reductio assumption thus fails: there is at least one b ∈ PX with S(b, vn) < S(c, vn) for all n. QED
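The theorem is an existence result and its proof is non-constructive, but for the Brier score in particular a dominating coherent b can be exhibited directly. The sketch below is not the method of the proof: it relies on the separate, standard geometric fact that the Euclidean projection of a point onto a convex set is strictly closer than the point itself to every member of that set, and it uses a common sort-based routine for projecting onto the probability simplex. All names and the sample credences are hypothetical. The final check illustrates (II) on a coarse grid of rivals only.

```python
from fractions import Fraction
from itertools import product


def brier_score(b, v):
    """Average squared gap between the credences b and the truth-values v."""
    return sum((b_n - v_n) ** 2 for b_n, v_n in zip(b, v)) / len(b)


def project_onto_simplex(c):
    """Euclidean projection of c onto the set of probability assignments."""
    descending = sorted(c, reverse=True)
    running_total = 0
    shift = 0
    for k, value in enumerate(descending, start=1):
        running_total += value
        candidate = (running_total - 1) / k
        if value - candidate > 0:
            shift = candidate  # the last k that passes this test fixes the shift
    return [max(c_n - shift, 0) for c_n in c]


def dominates(b, c, worlds):
    """True when b is strictly more accurate than c in every world."""
    return all(brier_score(b, v) < brier_score(c, v) for v in worlds)


N = 3
# One world per cell of the partition: exactly one X_n is true in each.
worlds = [[int(i == n) for i in range(N)] for n in range(N)]

# A hypothetical incoherent assignment: the credences sum to 13/10.
c = [Fraction(1, 2), Fraction(2, 5), Fraction(2, 5)]
b = project_onto_simplex(c)

# Illustration of (II): no rival on a tenths grid accuracy-dominates the coherent b.
grid = [Fraction(i, 10) for i in range(11)]
rivals_dominating_b = [r for r in product(grid, repeat=N) if dominates(r, b, worlds)]

print(b, dominates(b, c, worlds), rivals_dominating_b)
```

Here the projection is the coherent assignment 〈2/5, 3/10, 3/10〉, which has a strictly lower Brier score than c in each of the three worlds, while nothing on the grid dominates it in turn.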

The significance one assigns to this piece of mathematics will depend on whether one thinks that epistemic inaccuracy is truth-directed, strictly proper and convex, and on how plausible one finds (III). The merits of the three requirements have been discussed, so let’s focus on (III). Aaron Bronfman (unpublished) has raised serious questions about (III)’s normative status. [25] The basic thrust of the objection, albeit not in Bronfman’s terms, runs thus: (III) has a wide-scope reading and a narrow-scope reading. Read wide, it says that a credence function c is defective whenever some alternative b accuracy-dominates it on every reasonable measure of inaccuracy. Read narrowly, it says that c is defective when, for each reasonable measure of accuracy S, there is a b (possibly dependent on S) that accuracy-dominates c. As should be clear, (I) and (II) only vindicate probabilism if (III) is read narrowly. For, it is consistent with (I) and (II), and with details of the above proof, that for each incoherent c and each truth-directed, proper, convex scoring rule S there is a coherent assignment bS that is uniformly more accurate than c is with respect to S, but that no single coherent b accuracy dominates c with respect to every S. A narrow reading of (III) is thus required.

Bronfman argues that (III) is of questionable normative force when read narrowly. If no single coherent system of credences b is unequivocally better than the incoherent c, then a believer cannot move from c to b without risking increased inaccuracy relative to some reasonable scoring rule in some world. Since this is also true of coherent credences – for every coherent b there is an incoherent c such that S(c, v) < S(b, v) for some reasonable S and some truth-value assignment v – (I) and (II) offer no compelling rationale for having credences that obey the laws of probability. The mistake in Joyce (1998), Bronfman claims, lies in assuming that a credence function that is defective according to each reasonable measure of inaccuracy is thereby defective simpliciter, in much the same way that a person who is rich under every reasonable way of measuring wealth is unqualifiedly rich.

To see the problem with this assumption, consider an analogy. Suppose ethicists and psychologists somehow decide that there are exactly two plausible theories of human flourishing, both of which make geographical location central to well-being. Suppose also that, on both accounts, it turns out that for every city in the U.S. there is an Australian city with the property that a person living in the former would be better off living in the latter. The first account might say that Bostonians would be better off living in Sydney, while the second says they would do better living in Coober Pedy. Does it follow that any individual Bostonian will be better off living in Australia? It would if both theories said that a Bostonian will be better off living in Sydney. But, if the first theory ranks Sydney > Boston > Coober Pedy, and the second ranks Coober Pedy > Boston > Sydney, then we cannot definitively conclude that the person will be better off in Sydney, nor that she will be better off in Coober Pedy. So, while both theories say that a Bostonian would be better off living somewhere or other in Australia, it seems incorrect to conclude that she will be better off in Australia per se because the theories disagree about which places in Australia would make her better off.

[25] The same objection was raised independently by Franz Huber and Alan Hájek (who inspired the Australia example). An excellent discussion of this, and related points, can be found in Huber (forthcoming).

While the force of Bronfman’s objection must be conceded, it only applies if there is no determinate fact about which reasonable measure of inaccuracy is correct. If any truth-directed, strictly proper, convex scoring rule is as good as any other when it comes to measuring inaccuracy, then (I)-(III) cannot vindicate coherence without the help of a dubious inference from ‘c is defective on every reasonable measure of inaccuracy’ to ‘c is unqualifiedly defective’. If, on the other hand, there is some single correct measure of epistemic inaccuracy, then the wide and narrow scope readings of (III) collapse and Bronfman’s worries become moot. It may be that the correct scoring rule varies with changes in the context of epistemic evaluation, and it may even be that we are ignorant of what the rule is, but the nonpragmatic vindication of probabilism is untouched by Bronfman’s objection as long as there is a correct rule. Consider two analogies. Some philosophers claim that the standards for the truth of knowledge assertions vary with context, but that in any fixed context a single standard applies. Under these conditions, if every standard of evaluation has it that knowledge requires truth then knowledge requires truth per se.
Similarly, as Andy Egan has noted, even if we do not know which ethical theory is correct, as long as there is some correct theory, then the fact that every reasonable candidate theory tells us to help those in need means that we should help those in need. So, the argument from (I)-(III) to the requirement of coherence goes through with the help of one further premise:

IV. Only one scoring rule functions as the correct measure of epistemic accuracy in any context of epistemic evaluation.

How plausible is this premise? It turns out to be quite plausible. While it seems unlikely that any one scoring rule will apply across all contexts, it will often be reasonable, given our interests and goals as epistemic evaluators, to settle on some single way of measuring accuracy in the context at hand. Indeed, it turns out that the Brier score has properties that make it an excellent tool for assessing the accuracy of credences across a wide range of contexts. It is probably not the ideal score to use in every situation – for some purposes we might want to assess the accuracies of credences in ways that are non-extensional or non-separable – but it does apply in a wide range of circumstances. In particular, when we want to assess accuracy in a way that satisfies all of SMOOTHNESS, TRUTH-DIRECTEDNESS, SEPARABILITY, NORMALITY and CONVEXITY, it is hard to do better than the Brier score.

Homage to the Brier Score.

There are a number of reasons for using the Brier score to assess epistemic accuracy. First, in addition to being truth-directed, strictly proper, and convex, it is smooth, separable, extensional and normal. In many contexts of evaluation – specifically those in which questions of holistic dependence or informativeness are ignored – these are reasonable properties for a scoring rule to have. Moreover, as Savage (1971) showed, the Brier score is the only rule with these properties that can be extended to a measure of accuracy for probability estimates generally. It is natural to think of truth-value estimation as a species of probability estimation. One can assess such estimates using an extended scoring rule that takes each b ∈ BX and p ∈ PX to a real number S+(b, p) ≥ 0 that gives the inaccuracy of b’s values as estimates of the probabilities assigned by p. In keeping with GRADATIONAL ACCURACY, if bn is always strictly between cn and pn, then S+(b, p) < S+(c, p). S+ extends a truth-value based rule S when S+(b, v) = S(b, v) for every v. Extended scoring rules can be strictly proper, convex, separable, additive or normal. In his (1971) Savage proved the following result (in slightly different terms):

Theorem: If an extended scoring rule S+ is strictly proper, convex, additive and normal, then it has the quadratic form S+(b, p) = Σn λn⋅(pn – bn)².

So, if one thinks that accuracy evaluations for truth-values should dovetail with accuracy evaluations for probability estimates, and that the latter should be strictly proper, convex, additive and normal, then one will assess truth-value estimates using a function of the form S+(b, v) = Σn λn⋅(vn – bn)².
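A small numerical check may help fix ideas about the quadratic form. The sketch below does not reproduce Savage’s result, which runs from the listed properties to the quadratic form; it only illustrates the converse direction for one case, namely that a weighted quadratic rule with positive weights is strictly proper on truth-values, in the sense that expected inaccuracy by the lights of a coherent p is minimized at b = p alone. The weights, the credences and the tenths grid are all hypothetical choices made for the illustration.

```python
from fractions import Fraction
from itertools import product


def quadratic_score(b, v, weights):
    """Weighted quadratic inaccuracy: the sum over n of w_n * (v_n - b_n)**2."""
    return sum(w_n * (v_n - b_n) ** 2 for w_n, b_n, v_n in zip(weights, b, v))


def expected_score(b, p, weights):
    """Expected inaccuracy of b, with worlds weighted by the coherent p."""
    size = len(p)
    # One world per cell of the partition: exactly one X_n is true in each.
    worlds = [[int(i == n) for i in range(size)] for n in range(size)]
    return sum(p_n * quadratic_score(b, v, weights) for p_n, v in zip(p, worlds))


# Hypothetical positive weights and a hypothetical coherent credence function.
weights = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
p = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]

# Search a tenths grid of candidate credence assignments, coherent or not.
grid = [Fraction(i, 10) for i in range(11)]
candidates = list(product(grid, repeat=len(p)))
best = min(candidates, key=lambda b: expected_score(b, p, weights))
lowest = expected_score(best, p, weights)
ties = sum(1 for b in candidates if expected_score(b, p, weights) == lowest)

print(best, lowest, ties)
```

With equal weights λn = 1/N the same function returns the Brier score, so the check covers that rule as a special case when the weights are changed accordingly.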
If, in addition, one also accepts EXTENSIONALITY, one must use the Brier score. Savage provided yet another compelling characterization of the Brier score. Instead of assuming NORMALITY, which makes the inaccuracy of b as an estimate of p a function of the absolute differences |pn – bn|, he insisted on S+(b, p) = S+(p, b) for all coherent b and p. Again, the score so characterized has the quadratic form Σn λn⋅(vn – bn)². Selten (1998) obtained the same result using a related symmetry property. Selten offers an argument that is compelling for both properties. He imagines a case in which we know that either p or b is the right probability, but do not know which. He writes:

The [inaccuracy] of the wrong theory is a measure of how far it is from the truth. It is only fair to require that this measure is “neutral” in the sense that it treats both theories equally. If p is wrong and b is right, then p should be considered to be as far from the truth as b in the opposite case that b is wrong and p is right.… A scoring rule which is not neutral [in this way] is discriminating on the basis of the location of the theories in the space of all probability distributions.… Theories in some parts of this space are treated more favorably than those in some other parts without any justification. (Selten 1998, p. 54, with minor notational changes)

This defense seems correct, at least when considerations about the informativeness of propositions are being set aside.

A final desirable feature of the Brier score has to do with the relationship between truth-value estimates and frequency estimates. Let Z be an arbitrary finite set of propositions, and let {Zj} be any partitioning of Z into disjoint subsets. nj is the cardinality of Zj, and N = Σj nj is the cardinality of Z. Imagine a person with credences b who makes an estimate fj for the frequency of truths in each Zj. Following Murphy (1973), we can gauge the accuracy of her estimates using an analogue of the Brier score called the calibration index.

Cal({Zj}, 〈fj〉, v) = Σj (nj/N)⋅(FreqZj(v) – fj)²

As already noted, a coherent believer will use average credences as estimates of truth-frequencies, so that fj = ∑Z ∈ Zj b(Z)/nj. It is then possible to write:

Cal({Zj}, b, v) = (1/N)⋅[∑j α2Zj(b, v) – 2⋅∑j (∑ Y ≠ Z ∈ Zj (v(Y) – b(Z))⋅(v(Z) – b(Y)))]

This messy equation assumes a simple and illuminating form when propositions are grouped by credence. Suppose that each element of Z has a credence in {b1, b2,…, bJ}, and let Zj = {Z ∈ Z: b(Z) = bj}. It then follows that

Cal({Zj}, b, v) = (1/N)⋅Σj ΣZ ∈ Zj (FreqZj(v) – bj)².
So, relative to this partitioning, b produces frequency estimates that are perfectly calibrated (Cal = 0) when half of the propositions assigned value 1/2 are true, two-fifths of those assigned value 2/5 are true, three-fourths of those assigned value 3/4 are true, and so on. b’s estimates are maximally miscalibrated (Cal = 1) when all truths in Z are assigned credence 0, and all falsehoods are assigned credence 1.

As Murphy showed, relative to this partition the Brier score is a straight sum of the calibration index and the average variance in truth-values across the elements of {Zj}. For a given v, the variance in truth-value across Zj is given by s²(Zj, v) = (1/nj)⋅∑ Z ∈ Zj (FreqZj(v) – v(Z))². To measure the average amount of variation across all the sets in {Zj}, Murphy weighted each Zj by its size to obtain the discrimination index [26]

Dis({Zj}, b, v) = ∑j (nj/N)⋅s²(Zj, v).

This measures the degree to which b’s values sort elements of Z into classes that are homogeneous with respect to truth-value. Perfect discrimination (Dis = 0) occurs when each Zj contains only truths or only falsehoods. Nondiscrimination is maximal (Dis = 1/4) when every Zj contains exactly as many truths as falsehoods. As Murphy demonstrated, the sum of the calibration and discrimination indexes is just the Brier score. [27]

MURPHY DECOMPOSITION. Cal({Zj}, b, v) + Dis({Zj}, b, v) = Brier(b, v)

The Brier score thus incorporates two quantities that seem germane to assessments of epistemic accuracy. Other things equal, it enhances accuracy when credences sanction well-calibrated estimates of truth-frequency. It is likewise a good thing, ceteris paribus, if credences sort propositions into classes of similar truth-values. Even so, neither calibration nor discrimination taken alone is an unalloyed good. As Brier noted, there are ways of improving one at the expense of the other that harm overall accuracy. One can, for example, ensure perfect calibration over a set of propositions that is closed under negation by assigning each proposition a credence of ½. Such credences are highly inaccurate, however, because they do not discriminate truths from falsehoods. Conversely, one achieves perfect discrimination by assigning credence one to every falsehood and credence zero to every truth, but one is then inaccurate because one is maximally miscalibrated. The moral here is that calibration and discrimination are components of accuracy that must be balanced off against one another in a fully adequate epistemic scoring rule.
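The decomposition just described is easy to confirm on a small example. The sketch below groups propositions by credence, computes the calibration and discrimination indexes as defined above, and compares their sum with the Brier score taken as the average squared error over Z. The function name and the sample credences and truth-values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def murphy_decomposition(credences, truths):
    """Return (Cal, Dis, Brier) with propositions grouped by their credence."""
    total = len(credences)
    groups = defaultdict(list)  # credence value b_j -> truth-values in Z_j
    for b, v in zip(credences, truths):
        groups[b].append(v)

    cal = Fraction(0)
    dis = Fraction(0)
    for b_j, values in groups.items():
        n_j = len(values)
        freq = Fraction(sum(values), n_j)                       # Freq_Zj(v)
        variance = sum((freq - v) ** 2 for v in values) / n_j   # s^2(Z_j, v)
        cal += Fraction(n_j, total) * (freq - b_j) ** 2
        dis += Fraction(n_j, total) * variance

    brier = sum((v - b) ** 2 for b, v in zip(credences, truths)) / total
    return cal, dis, brier


# Ten hypothetical propositions: four at credence 1/2, four at 3/4, two at 1/4.
credences = [Fraction(1, 2)] * 4 + [Fraction(3, 4)] * 4 + [Fraction(1, 4)] * 2
truths = [1, 0, 1, 0, 1, 1, 1, 0, 0, 1]

cal, dis, brier = murphy_decomposition(credences, truths)
print(cal, dis, brier)
```

In this example the first two groups are perfectly calibrated and all of the calibration penalty comes from the two propositions held at credence 1/4, of which half are true. The two extreme cases mentioned in the text can be run through the same function: uniform credence ½ over a set that is half truths gives Cal = 0 and Dis = 1/4, while credence one for every falsehood and zero for every truth gives Dis = 0 and Cal = 1.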
The fact that the Brier score, a rule with so many other desirable properties, balances the two off in such a simple and beautiful way provides yet another compelling reason to prefer it as a measure of epistemic accuracy across a wide range of contexts of evaluation. This is not to say that the Brier score is the best rule to use in every context. Some legitimate modes of epistemic evaluation surely require us to weight propositions by informativeness, in which case some (specific) quadratic rule of the form Σn λn⋅(vn – bn)² might be called for. Doubtless there are other options as well. Still, as long as some single truth-directed, strictly proper, convex rule is appropriate in a given context it will be true in that context that coherence fosters accuracy and that incoherence hinders it.

[26] Murphy actually broke the discrimination index into two components.

[27] For the proof, use s²(Zj, v) = (1/nj)⋅[∑ Z ∈ Zj v(Z)² – nj⋅(FreqZj(v))²].


References

Brier, G. W. (1950) “Verification of Forecasts Expressed in Terms of Probability,” Monthly Weather Review 78: 1-3.
Bronfman, A. (unpublished) “A Gap in Joyce’s Argument for Probabilism.”
Christensen, D. (1996) “Dutch Books Depragmatized: Epistemic Consistency for Partial Believers,” Journal of Philosophy 93: 450-79.
de Finetti, B. (1937) “La prévision : ses lois logiques, ses sources subjectives,” Annales de l'institut Henri Poincaré 7: 1-68. Translated as “Foresight: Its Logical Laws, Its Subjective Sources,” in H. Kyburg and H. Smokler, eds., Studies in Subjective Probability. New York: John Wiley, 1964: 93-158.
de Finetti, B. (1974) Theory of Probability, vol. 1. New York: John Wiley and Sons.
Fan, K., Glicksberg, I. and Hoffman, A. J. (1957) “Systems of Inequalities Involving Convex Functions,” Proceedings of the American Mathematical Society 8: 617-622.
Horwich, P. (1982) Probability and Evidence. New York: Cambridge University Press.
Howson, C. and Urbach, P. (1989) Scientific Reasoning: The Bayesian Approach. La Salle: Open Court.
Huber, F. (unpublished) “The Consistency Argument for Ranking Functions.”
Jeffrey, R. (1986) “Probabilism and Induction,” Topoi 5: 51-58.
Joyce, J. (1998) “A Non-Pragmatic Vindication of Probabilism,” Philosophy of Science 65: 575-603.
Lewis, D. (1980) “A Subjectivist’s Guide to Objective Chance,” in R. Jeffrey, ed., Studies in Inductive Logic and Probability, vol. 2. Berkeley: University of California Press: 263-94.
Maher, P. (2002) “Joyce’s Argument for Probabilism,” Philosophy of Science 69: 73-81.
Murphy, A. (1973) “A New Vector Partition of the Probability Score,” Journal of Applied Meteorology 12: 595-600.
Ramsey, F. P. (1931) “Truth and Probability,” in R. Braithwaite, ed., The Foundations of Mathematics and Other Logical Essays. London: Kegan Paul: 156-98.
Savage, L. J. (1971) “Elicitation of Personal Probabilities,” Journal of the American Statistical Association 66: 783-801.
Savage, L. J. (1972) The Foundations of Statistics, 2nd edition. New York: Dover.
Schervish, M. (1989) “A General Method for Comparing Probability Assessors,” The Annals of Statistics 17: 1856-1879.
Seidenfeld, T. (1985) “Calibration, Coherence, and Scoring Rules,” Philosophy of Science 52: 274-294.
Selten, R. (1998) “Axiomatic Characterization of the Quadratic Scoring Rule,” Experimental Economics 1: 43-62.
Shimony, A. (1988) “An Adamite Derivation of the Calculus of Probability,” in J. H. Fetzer, ed., Probability and Causality. Dordrecht: D. Reidel: 151-161.
van Fraassen, B. (1983) “Calibration: A Frequency Justification for Personal Probability,” in R. Cohen and L. Laudan, eds., Physics Philosophy and Psychoanalysis. Dordrecht: D. Reidel: 295-319.

