Oct 17, 2015

ON HIDDEN VARIABLES: VALUE AND EXPECTATION NO-GO THEOREMS

ANDREAS BLASS AND YURI GUREVICH

Abstract. No-go theorems assert that hidden-variable theories, subject to appropriate hypotheses, cannot reproduce the predictions of quantum theory. We examine two species of such theorems, value no-go theorems and expectation no-go theorems. The former assert that hidden variables cannot match the predictions of quantum theory about the possible values resulting from measurements; the latter assert that hidden variables cannot match the predictions of quantum theory about the expectation values of measurements. We sharpen the known results of both species, which allows us to clarify the similarities and differences between the two species. We also repair some flaws in existing definitions and proofs.

1. Introduction

This paper is about “no-go” theorems asserting the impossibility of schemes for explaining the probabilistic aspects of quantum mechanics in terms of ordinary, classical probability. Such schemes are often called hidden-variable theories. They postulate that a quantum state, even if it is a pure state and thus contains as much information as quantum mechanics permits, actually describes an ensemble of systems with different values for some additional, hidden properties that are not taken into account in quantum mechanics. The ensemble given by a quantum state is thus composed of sub-ensembles, each having specific values for the hidden variables. The idea is that, once the values of these hidden variables are specified, all the properties of the system become determinate (or at least more determinate than quantum mechanics says). Thus the randomness in quantum predictions results (entirely or at least partially) from the randomness involved in choosing a particular element, with particular values of the hidden variables, from the ensemble that a quantum state describes.

Part of the first author’s work was done as a visiting researcher at Microsoft Research; another part was done as a visiting fellow at the Isaac Newton Institute for Mathematical Sciences.


No-go theorems for hidden-variable interpretations of quantum mechanics assert that, under reasonable assumptions, a hidden-variable interpretation cannot reproduce the predictions of quantum mechanics. There are many no-go theorems in the literature. Although they all share the basic idea, “hidden-variable theories cannot succeed,” they differ from one another in the particular description of what a hidden-variable theory is and what is meant by succeeding. A typical no-go theorem can be formulated in terms of a hypothesis saying what a hidden-variable theory should look like and a conclusion saying that certain predictions of quantum mechanics can never result from such a theory.

In this paper, we examine two species of such theorems, value no-go theorems and expectation no-go theorems. We sharpen the results of both species, which allows us to clarify both the similarities and the differences between the two species.

The value approach originated in the work of Bell [1, 2] and of Kochen and Specker [11] in the 1960s. A very readable overview of this work, with some simplifications and historical information, is given by Mermin [12]. Value no-go theorems establish that, under suitable hypotheses, hidden-variable theories cannot reproduce the predictions of quantum mechanics concerning the possible results of measurements. There is no need to consider the probabilities of possible results or the expectation values of measurements; the measured values alone provide a discrepancy between hidden-variable theories and quantum theory. The hypotheses that are used to deduce these theorems concern the measurements of observables in quantum states.

The expectation approach was developed in the last decade by Spekkens [16] and by Ferrie, Emerson, and Morris [6, 7, 8], with [8] giving the sharpest result. In this approach, the discrepancy between hidden-variable theories and quantum mechanics appears in the predictions of the expected values of measurements.
There is no need to consider the actual values obtained by measurements or the probability distributions over these values. The hypotheses that are used to deduce these results concern the measurement of effects, i.e. the elements of positive operator-valued measurements (POVMs). Effects are represented by Hermitian operators with spectrum on the real interval [0, 1]. They are regarded as representing yes-or-no questions, the probability of “yes” for effect E in state |ψ⟩ being ⟨ψ|E|ψ⟩.

Although both approaches involve measurements associated to Hermitian operators, they are different sorts of measurements. In the value approach, Hermitian operators serve as observables, and measuring one of them produces a number in its spectrum. In the expectation approach, certain Hermitian operators serve as effects, and measuring one of them produces “yes” or “no”, i.e., 1 or 0, even if the spectrum contains (or even consists entirely of) other points. The only Hermitian operators for which these two uses coincide are the projections, the operators whose spectrum is included in {0, 1}. We sharpen the results of both approaches so that only projection measurements are used.

The present work started with repairing various flaws in the literature on expectation no-go theorems. Although the papers purport to specify the exact assumptions needed to obtain their no-go results, some of them bring in, afterward, an additional assumption of convex-linearity; another erroneously claims that this assumption follows from the others. In addition, the assumptions are sometimes ambiguous, and one of the papers relies on an erroneous result of Bugajski [4], which needs some additional hypotheses to become correct. We explain the flaws that we found and how to circumvent them, and we strengthen the Ferrie, Morris, and Emerson result in [8] by substantially weakening the hypotheses. We do not need arbitrary effects, or even arbitrary sharp effects, but only rank-1 projections. Accordingly, we need convex-linearity only for the hidden-variable picture of states, not for that of effects.

Theorem 1. For no quantum system are there a measurable space Λ, a convex-linear map T from density matrices ρ to probability measures on Λ, and a map S from rank-1 projections E in the Hilbert space of the system into measurable functions from Λ to [0, 1], such that $\operatorname{Tr}(\rho E) = \int_\Lambda S(E)\, dT(\rho)$ for all ρ and E.

Some of the literature on expectation no-go theorems emphasizes a symmetry between states and effects. We explain why such symmetry is to be expected only when the Hilbert space of states is finite-dimensional and the space of possible values for the hidden variables is not merely finite-dimensional but finite.

We formulate the value no-go theorems in terms of the maps, from observables to real numbers, that a hidden-variable theory would assign to individual systems. We define a value map v for a set O of Hermitian operators on Hilbert space H to be a function that assigns to each operator A ∈ O a number v(A) in the spectrum of A in such a way that, for any pairwise commuting operators A_1, ..., A_n ∈ O, the tuple (v(A_1), ..., v(A_n)) belongs to the joint spectrum of the tuple (A_1, ..., A_n). (The notion of joint spectra is uncommon in the quantum literature, so we explain it and its relevant properties.) Our value no-go theorem is close to one of Bell’s results, as interpreted by Mermin [12].
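As a small illustration of the definition of a value map, the following Python sketch treats only the special case of operators that are simultaneously diagonal in one fixed basis, where the joint spectrum is simply the set of tuples of diagonal entries taken at a common index. The helper names `joint_spectrum` and `respects_joint_spectrum` and the three-dimensional example are assumptions made for illustration; they are not taken from the paper, and the sketch says nothing about operators that are not diagonal in the chosen basis.

```python
# Sketch: joint spectrum for operators that are diagonal in one fixed basis.
# Each operator is represented by its list of diagonal entries (an assumption
# made here so that no eigenvalue computation is needed).

def joint_spectrum(diagonals):
    """Return the set of joint eigenvalue tuples of simultaneously diagonal operators."""
    dim = len(diagonals[0])
    return {tuple(d[k] for d in diagonals) for k in range(dim)}

def respects_joint_spectrum(values, diagonals):
    """Check that the proposed values (one per operator) form a joint eigenvalue tuple."""
    return tuple(values) in joint_spectrum(diagonals)

# Rank-1 projections onto the three basis vectors of a 3-dimensional space.
P1 = [1, 0, 0]
P2 = [0, 1, 0]
P3 = [0, 0, 1]

spectrum = joint_spectrum([P1, P2, P3])
# Every entry below lies in the individual spectrum {0, 1}, but only tuples
# containing exactly one 1 lie in the joint spectrum.
ok = respects_joint_spectrum((0, 1, 0), [P1, P2, P3])
bad = respects_joint_spectrum((1, 1, 0), [P1, P2, P3])
print(sorted(spectrum), ok, bad)
```

The point of the example is that the joint-spectrum condition is strictly stronger than requiring each v(A) to lie in the spectrum of A separately: the assignment (1, 1, 0) passes the individual test but fails the joint one.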


Theorem 2. Suppose that H is a Hilbert space of dimension ≥ 3.
(1) There is a finite set of projections for which no value map exists.
(2) If dim(H) < ∞ then there is a finite set of rank-1 projections for which no value map exists.

The desired finite sets of projections are constructed explicitly in the proof. The condition dim(H) ≥ 3 is necessary. In the case of dim(H) = 2 there are counterexamples [2, 12] that produce not only correct values but also correct probabilities for pure states; we slightly simplify the verification of that. These counterexamples do not violate Theorem 1 merely because they apply only to pure states and do not admit convex linearity.

The condition dim(H) < ∞ in (2) is also necessary. If dim(H) = ∞ then the zero function is a value map for the set of all finite-rank projections.

Note that there is no implication in either direction between our two theorems. One says that a hidden-variable theory cannot predict the correct values for measured quantities (though it might predict correct expectations) while the other says that a hidden-variable theory cannot predict the correct expectations (though it might predict the correct values, with incorrect probabilities). Thus, there are two separate reasons why hidden-variable theories must fail.

We postpone to future work a similar study of no-go theorems for local hidden-variable theories. In these theories, a certain amount of contextuality is allowed, which means that the measured value of an observable can depend on which other, commuting observables are measured along with it, but only if those other observables are local in a suitable sense. There are value no-go theorems for such theories [1, 12], but they rely on a stronger notion of value map. Consider, for example, two observables that do not commute and therefore cannot, in general, be simultaneously measured.
They might nevertheless share a common eigenvector |ψ⟩ and would then have simultaneous definite values when the state of the system is |ψ⟩. In this case, a hidden-variable theory should provide a value map that assigns appropriately correlated values for these two observables.

This paper is organized as follows. In Section 2, we describe in detail the ingredients of various hidden-variable theories. Section 3 is devoted to expectation no-go theorems. We begin by describing the work of Spekkens [16] and of Ferrie, Emerson, and Morris [6, 7, 8], pointing out the flaws that we found and suggesting how to circumvent them. At the end of the section, we prove our expectation no-go result, Theorem 1, which strengthens the result of Ferrie, Morris, and Emerson in [8]. Section 4 is devoted to value no-go theorems, ending with the proof of Theorem 2. Section 5 is devoted to giving a mathematical basis for the intuitive idea that a hidden-variable theory for one Hilbert space should specialize to a hidden-variable theory for any closed subspace, because the latter space just represents a subset of the states of the former. Thus a no-go theorem for the subspace should imply a no-go theorem for the larger space. We prove theorems that support this intuition in several cases. Section 6 examines an example, due to Bell [2] and described by Mermin [12], of a hidden-variable theory for pure states in the case of a two-dimensional Hilbert space. The example shows that the assumption of dimension at least 3 cannot be omitted from Theorem 2. Our expectation no-go result, Theorem 1, applies in all dimensions from 2 up, and we point out why the example does not contradict the theorem.

The paper has two appendices. The first discusses the notion of convex-linearity, which played a role in some of the flaws we found in the literature. The second presents a no-go theorem adapted to the original framework described by Spekkens [16], minimally modified to remove unintended aspects and ambiguities.

2. Hidden-Variable Theories

In this section, we describe some of the differences between various approaches to hidden variables. These differences include what sorts of quantum states are considered, what sorts of measurements are considered, and which predictions of quantum mechanics should be matched by the hidden-variable theory.

2.1. States. Most hidden-variable theories begin with states in the usual sense of quantum mechanics and seek to make their properties more determinate by adjoining hidden variables. In some cases, however, they begin with a more primitive notion, that of a preparation, a way of producing systems in a specific quantum state. Different preparations might produce the same state.
In [16], Spekkens works with a notion of ontological model of quantum theory, in which distribution functions (describing how a quantum ensemble is composed of more determinate sub-ensembles) are assigned to preparation procedures. He gives the name “preparation noncontextuality” to the hypothesis that different preparations of the same quantum state yield the same distribution function, i.e., that the distribution function is determined by the quantum state. This hypothesis is in force for most of [16], but it is pointed out explicitly as a hypothesis that could, in principle, be questioned.

Other hidden-variable theories begin with quantum states rather than with preparations, so that preparation noncontextuality is built into the foundational framework of these theories. They seek to analyze quantum states as ensembles obtained by mixing sub-ensembles with more determinate properties. These sub-ensembles are viewed in different ways by the various theories, but these viewpoints are ultimately equivalent. For example, Mermin [12] talks about individual systems in the quantum ensemble while von Neumann [18] talks about dispersion-free sub-ensembles. Other authors [16, 6, 7, 8] do not refer to the sub-ensembles explicitly but work with distribution functions over a space whose points are best viewed as parametrizing such sub-ensembles.

Even after one decides to work with quantum states, one still has a choice whether to work only with pure states or to admit mixed states as well. At first sight, the difference between these two options might seem unimportant. After all, any mixed state is a weighted average of pure states. So, given interpretations of pure states as ensembles, we can use weighted mixtures of these ensembles to represent mixed states. The situation is, however, more subtle. A single mixed state may be represented as a weighted average of pure states in more than one way. Can the associated weighted averages of ensembles depend on which of these representations we use? In general, the answer is yes, and then we do not obtain a single, well-defined ensemble to represent this mixed state. Well-definedness of the ensemble representations of mixed states is not automatic but rather imposes a non-trivial consistency requirement on the representations of the pure states.
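The non-uniqueness of pure-state decompositions can be checked on the simplest case. The following Python sketch (our own illustration, using exact rational arithmetic from the standard library; the helper names `projector` and `mix` are hypothetical) builds the maximally mixed state of a 2-dimensional system in two ways, once from the basis states |0⟩, |1⟩ and once from the states proportional to (1, 1) and (1, −1), and confirms that the two mixtures give the same density matrix.

```python
from fractions import Fraction

def projector(v):
    """Return |v><v| / <v|v> for a real 2-component vector v (need not be normalized)."""
    norm_sq = sum(x * x for x in v)
    return [[Fraction(a * b, norm_sq) for b in v] for a in v]

def mix(weights, states):
    """Weighted average of 2x2 density matrices."""
    return [[sum(w * s[i][j] for w, s in zip(weights, states)) for j in range(2)]
            for i in range(2)]

half = Fraction(1, 2)
# Decomposition 1: equal mixture of the basis states |0> and |1>.
rho_z = mix([half, half], [projector((1, 0)), projector((0, 1))])
# Decomposition 2: equal mixture of |+> and |->, proportional to (1, 1) and (1, -1).
rho_x = mix([half, half], [projector((1, 1)), projector((1, -1))])

# Both mixtures give the maximally mixed state I/2.
print(rho_z == rho_x)
```

A pure-state hidden-variable representation therefore has to assign the same mixed ensemble to both decompositions if it is to extend consistently to this mixed state; nothing in the representation of the four pure states alone forces that.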
In Section 6, we shall describe an example, essentially due to Bell, of a hidden-variable representation of pure states (for a 2-dimensional Hilbert space) that cannot be extended to mixed states while respecting weighted averages.

To summarize this situation, we list four approaches to the issue of what states (or preparations) should be given a hidden-variable interpretation. (We use “mixed” here to mean “possibly mixed”; pure states are included among the mixed ones.)
(1) Pure states, with no consistency requirement on the representation.
(2) Pure states, subject to the consistency requirement allowing a well-defined extension to mixed states, by respecting weighted averages.
(3) Mixed states, with no consistency requirement.
(4) Mixed states, subject to the requirement of respecting weighted averages.

In items 2 and 4, “respecting weighted averages” means that the collection of sub-ensembles associated to a weighted mixture of some given states is the corresponding weighted average of the sub-ensembles for the given states. In item 2, respect for weighted averages serves as a method for extending the hidden-variable interpretation from pure to mixed states. In item 4, respect for weighted averages is a requirement imposed on the assumed interpretation of mixed states. These two items are equivalent, in the sense that the mixed-state interpretations considered in item 4 are exactly the (unique) extensions to mixed states of the pure-state interpretations considered in item 2. The other two items in the list, items 1 and 3, are more liberal because they do not require any respect for weighted averages.

The preceding list of four (or three in view of the equivalence between items 2 and 4) approaches could be doubled by including analogous versions with preparations in place of states.

Notation 3. The concept of respecting weighted averages has several names in the literature. The formal definition of the concept, namely that the function f in question satisfies f(ax + by) = af(x) + bf(y) whenever a and b are nonnegative real numbers with sum 1, looks like the definition of linearity except that it applies only to the restricted options for a and b that produce weighted averages, also known as convex combinations. Because of this, some authors, for example Spekkens [16], use the term convex linear, and we shall follow this terminology. Other authors (see, for example, [6, 7, 8]) prefer the shorter name affine, though this would seem more natural for the related concept where a or b can be negative and the only constraint on them is a + b = 1. In Appendix A, we look more closely at the notion of convex-linearity.

2.2. Measurements.
We consider next the sorts of measurements that a hidden-variable theory should explain. In quantum mechanics, measurements are ordinarily represented by certain Hermitian operators on the Hilbert space of states of a system. In this context, those operators are usually called observables.

Before turning to the question of which operators should be treated in a hidden-variable theory, we first address a prior issue, analogous to the issue of state versus preparation in the previous subsection. The analogous issue here is measurement versus apparatus. It is entirely possible that different experimental arrangements measure the same observable. In such a situation, those arrangements should produce the same results (the same statistical distribution of measured values) for any particular quantum state, but it is not clear that they should produce the same results on each of the sub-ensembles considered in a hidden-variable theory. Spekkens’s ontological models [16] assign measurement values not to observables but to measurement procedures. He introduces the name “measurement noncontextuality” for the hypothesis that different measurement procedures for the same observable result in the same outcomes. (Actually, he deals only with measurements of effects; see below.)

When hidden-variable theories take observables to be the entities to be measured in their sub-ensembles, either because of an explicit assumption of measurement noncontextuality or because observables are built into the foundation of the theory, there still remains a choice as to which observables are to be considered and what is meant by measuring them.

A traditional viewpoint is that observables are arbitrary¹ Hermitian operators and that a measurement of such an operator in some state produces a real number in the spectrum of the operator. For simplicity, we shall pretend for a while that our Hilbert spaces are finite-dimensional, so that a measurement produces an eigenvalue of the operator. We shall see in Section 5 that no-go theorems for finite-dimensional Hilbert spaces typically imply the corresponding theorems for infinite-dimensional spaces, so in these cases our simplifying assumption does not really lose generality.

Quantum mechanics gives well-known formulas for the probabilities of the various eigenvalues and therefore also for quantities like the expectation of the measured values. For a hidden-variable theory to successfully match the predictions of quantum mechanics, one would reasonably require it to predict, in particular, the possible values of any measurement (namely the eigenvalues of the observable being measured) and their respective probabilities.
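For concreteness, the well-known formulas can be spelled out in a 2-dimensional example. The Python sketch below (our own illustration with hypothetical helper names, restricted to real vectors and using exact rationals) takes the observable with eigenvalues +1 and −1 and eigenvectors proportional to (1, 1) and (1, −1), computes the probability of each eigenvalue in the state (3/5, 4/5) as the squared overlap with the normalized eigenvector, and then forms the expectation as the probability-weighted sum of eigenvalues.

```python
from fractions import Fraction

def born_probability(eigvec, psi):
    """Squared overlap |<v|psi>|^2 / (<v|v><psi|psi>) for real vectors."""
    overlap = sum(a * b for a, b in zip(eigvec, psi))
    norm_v = sum(a * a for a in eigvec)
    norm_psi = sum(b * b for b in psi)
    return Fraction(overlap * overlap) / (norm_v * norm_psi)

# Spectral data of the observable: eigenvalue -> (unnormalized) eigenvector.
eigen_data = {1: (1, 1), -1: (1, -1)}

# A pure state with exactly representable components.
psi = (Fraction(3, 5), Fraction(4, 5))

# Possible values are the eigenvalues; each comes with its probability.
probs = {value: born_probability(vec, psi) for value, vec in eigen_data.items()}
expectation = sum(value * p for value, p in probs.items())
print(probs, expectation)
```

The three kinds of prediction discussed in this paper are all visible here: the set of possible values (the keys of `probs`), their probabilities, and the single number `expectation`. Value no-go theorems concern only the first, expectation no-go theorems only the last.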
It turns out, somewhat surprisingly, that several no-go theorems work under considerably weaker demands on what the hidden-variable theory must accomplish. Specifically, some theorems show that a hidden-variable theory cannot even predict the correct values for all observables, even if one doesn’t care about probabilities or even the expectation values. Other theorems show that a hidden-variable theory cannot even predict the correct expectations, even if one doesn’t care about the particular values or probabilities. For brevity, we shall refer to theorems of these two sorts as “value no-go” and “expectation no-go” theorems, respectively.

¹ We ignore here the complications arising from superselection rules, which make some Hermitian operators unobservable.

Another common view of measurements in quantum mechanics is based not on observables but on particular Hermitian operators called effects and on certain sets of effects called positive operator-valued measures (POVMs). An effect is a Hermitian operator E whose spectrum lies in the interval [0, 1] of the real line. Among the effects are the sharp effects, those whose spectrum is included in the two-element set {0, 1}. The sharp effects are simply the projection operators from the Hilbert space onto its closed subspaces. Arbitrary effects are weighted averages of sharp ones. A POVM is a set of effects whose sum is the identity operator I. Notice that every effect E is a member of at least one POVM, namely {E, I − E}; unless E = I, it is also a member of numerous other POVMs.

A POVM {E_k : k ∈ S}, where S is some index set (usually finite), is intended to model a measurement whose outcome is a member of S, the probability of outcome k for state |ψ⟩ being ⟨ψ|E_k|ψ⟩, or, in the case of mixed states with density matrix ρ, Tr(E_k ρ). Measurement of an observable A amounts to measurement of the POVM consisting of the projections to the eigenspaces of A. Arbitrary POVMs are more general in two respects, first that the effects in a POVM need not be sharp, and second that these effects need not commute with one another. Despite the additional generality, it is known that general POVM measurements can be reduced to measurements of observables in a larger Hilbert space, one in which the original Hilbert space is isometrically embedded. For details, see for example [20, Section 5.1] or [19].

Because actually measuring a general POVM can be a complicated process, involving an enlargement of the original Hilbert space, it is not clear that POVMs are so fundamental that a hidden-variable theory should be required to produce correct predictions for them.
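The probability rule for POVMs can be made concrete with a two-outcome example. The following Python sketch (our own illustration; the particular effect, state, and helper name `trace_product` are assumptions chosen for easy arithmetic) forms the POVM {E, I − E} from an unsharp effect E and evaluates Tr(E_k ρ) for a pure-state density matrix ρ.

```python
from fractions import Fraction

def trace_product(a, b):
    """Tr(a b) for 2x2 matrices given as nested lists."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))

identity = [[1, 0], [0, 1]]
# An unsharp effect: its eigenvalues 3/4 and 1/4 lie in [0, 1] but are not 0 or 1.
E = [[Fraction(3, 4), 0], [0, Fraction(1, 4)]]
complement = [[identity[i][j] - E[i][j] for j in range(2)] for i in range(2)]
povm = {"yes": E, "no": complement}

# Density matrix |psi><psi| of the pure state psi = (3/5, 4/5).
rho = [[Fraction(9, 25), Fraction(12, 25)],
       [Fraction(12, 25), Fraction(16, 25)]]

# Outcome probabilities Tr(E_k rho); they sum to 1 because the effects sum to I.
probabilities = {k: trace_product(Ek, rho) for k, Ek in povm.items()}
print(probabilities)
```

Note that the outcomes here are the labels "yes" and "no", not the eigenvalues 3/4 and 1/4 of E; the eigenvalues enter only through the probabilities.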
In particular, it is not clear that enlargement of the Hilbert space makes sense for the sub-ensembles considered by such theories. It is therefore preferable for no-go theorems to apply even when the hidden-variable theory is required to work correctly only for those POVMs whose measurement does not require enlarging the Hilbert space. Such POVMs include, in particular, those consisting of mutually commuting, sharp effects.

It makes sense to speak of measuring a single effect E; this means measuring the POVM {E, I − E}. In other words, it is a yes-or-no measurement, with “yes” corresponding to E and “no” to I − E. The probability of the answer “yes” when effect E is measured in a pure state |ψ⟩ is ⟨ψ|E|ψ⟩; for a mixed state with density matrix ρ, it is the trace Tr(Eρ).

When a hidden-variable theory uses POVMs and effects as the measurements for which values are predicted, we encounter a third notion of noncontextuality, in addition to the preparation noncontextuality and measurement noncontextuality mentioned above. The question here is whether the measurement of an effect E depends only on E itself or on the entire POVM of which E is a member. For a quantum state, the probability of getting “yes” when measuring E depends only on E, but that does not necessarily imply that the same situation obtains for all the sub-ensembles within that state. The assertion that, even for the sub-ensembles, it is only E that matters, not the whole POVM, is the third sort of noncontextuality.

This issue also arises, as made very clear in [12], when measurements are given by observables rather than effects and POVMs. Noncontextuality in this context means that the result of measuring an observable A does not depend on what other observables might be measured along with A. (In this framework, those other observables must commute with A and with each other, for otherwise they could not be measured simultaneously. The framework does not envision enlarging the Hilbert space.)

We shall use the word determinate to refer to all sorts of noncontextuality. The intended meaning is that a hidden-variable theory’s analysis of some aspect of quantum theory (such as states or observables or effects) should be completely determined by what is explicitly mentioned, regardless of other aspects of the situation (preparations or apparatuses or other simultaneous measurements).

Remark 4. Before leaving the discussion of measurements, we point out, to avoid possible confusion, that, although an effect E is, in particular, a Hermitian operator and thus an observable, measuring it as an effect is quite different from measuring it as an observable.
According to quantum theory, the former always produces 1 (“yes”) or 0 (“no”); the latter always produces one of the eigenvalues of E. The two sorts of measurement coincide only when E is a sharp effect.

3. Expectation No-Go Theorems

In this section, we discuss, clarify, and extend the work of Spekkens [16] and of Ferrie et al. [6, 7, 8], which yields what we called expectation no-go theorems above. That is, under suitable hypotheses, it is shown that hidden-variable theories cannot correctly predict the expectation values of effects. To describe and clarify the contents of these papers, we begin with the earliest of them, [16], comment on various aspects in need of clarification, and then indicate how these aspects are developed in the three papers of Ferrie et al.

The papers under discussion differ somewhat in the hypotheses that they explicitly assume, and they also differ in their names for the theories that satisfy their hypotheses. We shall use the generic name “probability representations” for these theories. In the rest of this section, we shall describe in considerable detail the variations in content of these theories; see Remark 6 for variations in the terminology.

3.1. Spekkens’s no-go theorem. The following definition is essentially from [16], but see the commentary following the definition for more details.

Definition 5. Given a Hilbert space H, a probability representation (Spekkens version) for quantum systems described by H consists of

• a measure space Λ,
• for every density operator ρ on H, a nonnegative real-valued measurable function µρ on Λ, normalized so that $\int_\Lambda \mu_\rho(\lambda)\, d\lambda = 1$,
• for every POVM {E_k}, a set $\{\xi_{E_k}\}$ of nonnegative real-valued measurable functions on Λ that sum to the unit function on Λ, subject to the requirement that, if E_k = 0, then the associated function $\xi_{E_k}$ is identically zero,

such that for all density operators ρ and all POVM elements E_k, we have $\operatorname{Tr}(\rho E_k) = \int_\Lambda d\lambda\, \mu_\rho(\lambda)\, \xi_{E_k}(\lambda)$.
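To see what the defining equation of Definition 5 asks for, it may help to look at a fragment where it can be met trivially. The Python sketch below is our own toy illustration, not a construction from [16]: it restricts attention to density matrices and effects that are diagonal in one fixed basis of a 2-dimensional space, takes Λ = {0, 1} with counting measure (so the integral becomes a finite sum), and sets µρ(λ) and ξE(λ) equal to the corresponding diagonal entries. All names in the code are hypothetical. Since only a commuting fragment is represented, this is not a probability representation in the sense of the definition, which quantifies over all density operators and all POVMs.

```python
from fractions import Fraction

LAMBDA = (0, 1)  # hypothetical finite hidden-variable space with counting measure

def mu(rho):
    """Distribution over LAMBDA assigned to a diagonal density matrix."""
    return [rho[k][k] for k in LAMBDA]

def xi(effect):
    """Response function assigned to a diagonal effect."""
    return [effect[k][k] for k in LAMBDA]

def quantum_expectation(rho, effect):
    """Tr(rho E) for 2x2 matrices."""
    return sum(rho[i][j] * effect[j][i] for i in range(2) for j in range(2))

def hidden_variable_expectation(rho, effect):
    """Sum over lambda of mu_rho(lambda) * xi_E(lambda)."""
    return sum(m * x for m, x in zip(mu(rho), xi(effect)))

# A diagonal mixed state and a diagonal (unsharp) effect.
rho = [[Fraction(2, 3), 0], [0, Fraction(1, 3)]]
E = [[Fraction(1, 4), 0], [0, 1]]

lhs = quantum_expectation(rho, E)
rhs = hidden_variable_expectation(rho, E)
print(lhs, rhs)
```

The agreement of the two numbers reflects the fact that commuting observables admit a classical probabilistic description; the difficulty addressed by the no-go theorems arises only when non-commuting states and effects must be handled by one and the same pair of assignments.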

The intention behind this definition is that each point λ ∈ Λ represents a particular sub-ensemble as provided by the hidden-variable theory. A quantum state ρ represents an ensemble composed of various of these sub-ensembles, mixed together according to the probability measure µρ(λ) dλ. When an effect E is measured on a system from the sub-ensemble λ, the probability of getting a “yes” answer is ξE(λ). Note that even a sharp effect can have, in a sub-ensemble λ, a nontrivial probability of producing “yes”; the probability need not be 0 or 1. This is discussed in detail in the early part of [16].

Note that the last part of Definition 5 requires the expectation value for an effect E in a state ρ, as computed by quantum mechanics, namely Tr(ρE), to agree with the prediction of the hidden-variable theory, the weighted average of the probabilities ξE(λ) weighted according to the composition µρ of the state ρ. This is the only agreement demanded here between quantum mechanics and a hidden-variable theory; that is why we refer to the resulting no-go theorem as an expectation no-go theorem.

We have deviated here in several ways from Spekkens’s formulation in [16], and we pause to explain the deviations. First, while giving the definition, Spekkens explains “density operator” as “a positive trace-class operator”. We take “density operator” in its usual meaning, which requires that the trace of ρ is 1. We assume that this is what Spekkens intended, both because of the terminology and because of the required normalization of µρ. If ρ were an arbitrary trace-class operator, then we would expect $\int_\Lambda \mu_\rho(\lambda)\, d\lambda$ to equal the trace of ρ rather than 1.

Second, Spekkens refers to Λ as a measurable space rather than a measure space. The difference is that a measurable space consists just of a set Λ and a σ-algebra of subsets called the measurable sets; a measure space has, in addition, a specific measure defined on this σ-algebra. The integrals in the definition, both in the normalization condition for µρ and in the equation at the end of the definition, presuppose the availability of a fixed measure denoted by dλ. So we assume that Spekkens intended Λ to be a measure space, and we have formulated our definition accordingly.

Third, we have required the functions µρ and ξE to be measurable. This requirement is needed in order for the integrals in the definition to make sense.

Because the probability densities associated to states (density operators) ρ are given by functions µρ, they are, when considered as measures on Λ, always absolutely continuous with respect to the fixed measure dλ. This aspect of the definition does not seem well motivated. It remains in force in [6] and [7], but in the more recent paper [8] it is replaced by a broader viewpoint, taking Λ to be a measurable space (not a measure space, i.e., no fixed measure) and representing states ρ by measures rather than by functions.

Remark 6.
We already mentioned the ontological models from [16]; these assign density functions µ and outcome functions ξ to preparations and measurement procedures, respectively, rather than to states ρ and effects E. The hypotheses for probability representations that we gave above are what one obtains by adding to the notion of ontological models the additional hypotheses of preparation noncontextuality and measurement noncontextuality. In the same paper [16], Spekkens introduces a notion of “quasiprobability representation”, which requires the functions µρ and ξE to be determined independently of the preparation of ρ and the apparatus measuring E, but which allows these functions to take negative values. Thus, our notion of probability representation can be obtained by adjoining, to the notion of quasiprobability representation, the additional hypothesis of nonnegativity. In other words, the three notions of nonnegative quasiprobability representation, noncontextual ontological model (both from [16]), and probability representation (in our terminology) coincide. The coincidence of the first two of these accounts for the title of [16].

In formulating Definition 5, we have retained one ambiguity from [16], namely the third form of noncontextuality, mentioned in Section 2: Does ξE depend only on E or also on the POVM that E is a member of? The notation ξE, which mentions only E and not a whole POVM, suggests the former, but the wording of the relevant clause in the definition of “quasiprobability representation” in [16], “every POVM . . . is represented by a set . . . of real-valued functions . . . ,” suggests the latter. We adopt the former interpretation, that ξE is determined by E, for two reasons. First, the formulation of measurement noncontextuality in [16] supports this interpretation. Second, this interpretation seems to be essential for the proof of the no-go theorem in [16].

To complete our discussion of the hypotheses in [16], one more assumption needs to be discussed, namely convex-linearity. This assumption is not present in the definitions of “quasiprobability representation” and “ontological model” nor in the additional assumptions of nonnegativity and noncontextuality. It is, however, explicitly asserted both for density matrices and for effects as if it were a necessary property of such models. Specifically, equations (7) and (8) of [16] say that, for probability distributions {w_j},

$$\text{if } \rho = \sum_j w_j \rho_j, \text{ then } \mu_\rho(\lambda) = \sum_j w_j\, \mu_{\rho_j}(\lambda)$$

and

$$\text{if } E = \sum_j w_j E_j, \text{ then } \xi_E(\lambda) = \sum_j w_j\, \xi_{E_j}(\lambda).$$

Spekkens gives a quite plausible argument for the first of these equations, namely that an ensemble represented by the convex combination ρ can be prepared by first choosing a value of j at random, with each j having probability wj, and then preparing the corresponding state ρj. The corresponding sub-ensembles should then be given by the corresponding weighted mixtures of the sub-ensembles of the ρj’s. The plausibility of convex-linearity might be reduced if one considers the fact that the same ρ can result from such a mixture in many ways, so convex-linearity imposes some highly nontrivial constraints on the µ functions. Any uneasiness resulting from this consideration can,


however, be ascribed to the assumption of preparation noncontextuality rather than to convex-linearity. The uneasiness results from the requirement that all the many ways to prepare a ρ ensemble must yield the same mixture of sub-ensembles.

Despite the plausibility of convex-linearity, it does not follow from just the definitions in [16] or from our version, Definition 5 above. To see this, suppose that the functions ξE do not span the whole space of square-integrable functions on Λ, so that there is a function σ orthogonal to all of these ξE’s, where “orthogonal” means that $\int \sigma(\lambda)\,\xi_E(\lambda)\,d\lambda = 0$. One could then modify the µρ functions by adding to each one some multiple of σ, obtaining µ′ρ = µρ + cρ σ and still satisfying the definitions. Here the coefficients cρ can be chosen arbitrarily for all of the density operators ρ. By choosing them in a sufficiently incoherent way, one could arrange that $\mu'_\rho(\lambda) \neq \sum_j w_j \mu'_{\rho_j}(\lambda)$.

If, on the other hand, the ξE’s do span the whole space of functions on Λ, then Spekkens’s desired equation $\mu_\rho(\lambda) = \sum_j w_j \mu_{\rho_j}(\lambda)$ does follow, for all but a measure-zero set of λ’s, because the two sides of this equation must give the same result when integrated against any ξE.

Unfortunately, nothing in the definitions requires the ξE’s to span the whole space. For example, given any probability representation, we can obtain another, physically equivalent one as follows. Replace Λ by the disjoint union Λ1 ⊔ Λ2 of two copies of Λ. Define the measure of any subset of Λ1 ⊔ Λ2 to be the average of the original measures of its intersections with the two copies of Λ. Define all the functions µρ and ξE on the new space by simply copying the original values on both of the Λi’s. The result is a probability representation in which the ξE’s span only the space of functions that are the same on the two copies of Λ.
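
The two constructions above can be combined in a small finite toy model. The following Python sketch is only an illustration under simplifying assumptions that are not in the text: Λ is the two-point set {0, 1} with counting measure, the "states" are classical densities rather than images of density operators, and all names (`double`, `integrate`, `perturb`, `sigma`) are hypothetical. It doubles Λ, picks a σ that integrates to zero against every function copied onto both halves, and perturbs only the density assigned to a mixture.

```python
from fractions import Fraction as Fr

# Toy assumption: Lambda = {0, 1} with counting measure, doubled to the four
# points (0, copy 1), (1, copy 1), (0, copy 2), (1, copy 2).
# Averaging over the two copies gives every point the weight 1/2.
WEIGHT = Fr(1, 2)

def double(values):
    """Copy a function on Lambda = {0, 1} onto both copies of Lambda."""
    return list(values) + list(values)

def integrate(mu, xi):
    """Integral of mu * xi over the doubled space."""
    return sum(WEIGHT * m * x for m, x in zip(mu, xi))

# sigma is +1 on (0, copy 1), -1 on (0, copy 2) and 0 elsewhere, so it
# integrates to zero against every function that agrees on the two copies.
sigma = [Fr(1), Fr(0), Fr(-1), Fr(0)]

def perturb(mu, c):
    """Return mu + c * sigma."""
    return [m + c * s for m, s in zip(mu, sigma)]

mu_a = double([Fr(1), Fr(0)])
mu_b = double([Fr(0), Fr(1)])
mu_mix = double([Fr(1, 2), Fr(1, 2)])  # density of the mixture (a + b) / 2

# "Incoherent" coefficients: 0 for a and b, 1/4 for their mixture.
new_a = perturb(mu_a, Fr(0))
new_b = perturb(mu_b, Fr(0))
new_mix = perturb(mu_mix, Fr(1, 4))

xi = double([Fr(1, 3), Fr(3, 4)])  # an arbitrary [0, 1]-valued response function
one = double([Fr(1), Fr(1)])       # the constant function 1

same_expectation = integrate(new_mix, xi) == integrate(mu_mix, xi)
still_normalized = integrate(new_mix, one) == 1
still_nonnegative = all(v >= 0 for v in new_mix)
convex_combination = [Fr(1, 2) * x + Fr(1, 2) * y for x, y in zip(new_a, new_b)]
convex_linear = convex_combination == new_mix
```

In this toy model the perturbed density of the mixture is still a nonnegative, normalized density that reproduces the same expectation, yet it differs from the equal-weight combination of the densities of its components, which is the failure of convex-linearity described above.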
Convex-linearity plays an important role in Spekkens’s proof of the no-go theorem in [16], so, in order to support this proof, it should be added either as a requirement in the definition of the probability representations under consideration or as a hypothesis in the no-go theorem.

Convex-linearity leads to another problem in [16]. Spekkens asserts that, if a function f is convex-linear on a convex set S of operators that span the space of Hermitian operators (and f takes the value zero on the zero operator if the latter is in S), then f can be uniquely extended to a linear function on this space. Unfortunately, such a linear extension need not exist in the general case, when zero is not in S.² For a simple example, consider the function that is identically 1 on an S that spans the space of Hermitian operators, does not contain 0, but does contain two orthogonal projections and their sum. Any linear extension would have to take the value 1 + 1 = 2, not 1, on that sum. Because of this difficulty, we give, in Appendix A, a careful discussion of convex-linearity and its relation to linearity.

² Spekkens gives a formula purporting to define a linear extension of f in general, but it is not well-defined because it involves some arbitrary choices. He also gives, in footnote 18 of the newer version [17] of his paper, an argument purporting to show that his formula is independent of those choices, but that argument fails. It involves dividing by an appropriate constant C to turn two nonnegative linear combinations, the two sides of an equation, into convex combinations so that the assumption of convex-linearity can be applied. But the necessary divisor C may need to be different for the two sides of the equation.

The no-go theorem in [16] says that, when the Hilbert space H has dimension at least 2, there cannot be a probability representation (Spekkens version), subject to the clarifications above, and satisfying the additional hypothesis of convex-linearity both for states and for effects. We give a careful proof of this theorem in Appendix B.

3.2. Ferrie and Emerson’s no-go theorems. We turn next to a discussion of the papers [6, 7] of Ferrie and Emerson. These papers use the notion of frames in Hilbert spaces, a generalization of the notion of basis. We did not find frames useful, so we describe the relevant part of these papers in a way that minimizes reference to frames.

In both [6] and [7], a quasiprobability representation of quantum states³ is defined as a linear and invertible map T from the space of Hermitian operators on H to L²(Λ, µ). Here Λ is a measure space⁴, with measure µ, and L²(Λ, µ) is the space of real-valued, square-integrable functions on it, modulo equality µ-almost everywhere. Note that both L²(Λ, µ) and the space of Hermitian operators on H are real Hilbert spaces, the latter having the inner product defined by Tr(AB). As far as we can see, the motivation for using L²(Λ, µ) rather than L¹(Λ, µ) comes neither from intuition nor from physics but rather from the mathematical benefits of having a Hilbert space and from the authors’ desire to use the frame formalism.

The intuition behind a quasiprobability representation in this sense is that each λ ∈ Λ represents an assignment of possible values to the hidden variables, or equivalently it represents one of the sub-ensembles provided by the hidden-variable theory. For a density operator ρ, the function T(ρ) is the probability distribution on sub-ensembles in the ensemble described by ρ.

³ Not of quantum mechanics but merely of quantum states. A representation of quantum mechanics would also include an interpretation for measurements.

⁴ We use the notation Λ for consistency with [16] and [7]. The corresponding space is called Γ in [6] and Ω in [8].


We believe that, when requiring T to be invertible, the authors of [6, 7] meant only to require that it be one-to-one, not that it be surjective as the usual meaning of “invertible” would imply. In other words, “invertible” was intended to mean merely “invertible on the range of T .” Comparing the work in these papers with our commentary on [16] above, we note that in both [6] and [7], the definition of “density operator” includes, as we expected, the requirement that the trace be 1; the space Λ is explicitly equipped with a fixed measure µ (corresponding to the implicit dλ in [16]); and the functions representing states and effects are required to be measurable. Because states are represented by functions in the presence of the fixed measure µ, the probability distributions of the sub-ensembles within a quantum state’s ensemble are always absolutely continuous with respect to µ, just as in [16]. Concerning the question whether an effect E completely determines the function ξE or whether ξE can depend also on the POVM in which E occurs, [6] contains the same ambiguity as [16], but [7] unambiguously requires determinateness here: ξE depends only on E. Concerning convex-linearity, the situation in these papers [6, 7] is rather complicated. As already indicated, the definition of a quasiprobability representation of states in these papers explicitly requires linearity. For the broader notion of a quasiprobability representation of quantum mechanics (incorporating not just states but effects), the discussion in [6, Section 3.2] begins in the context of frame representations, which are necessarily linear. But it continues with what the authors call a reformulation of the axioms of quantum mechanics, and this reformulation does not mention convex-linearity. Indeed, the axioms listed there are very similar to those of Spekkens [16] that we put into Definition 5. Just as in our discussion of [16], the axioms do not imply convex-linearity. 
In [7, Section IV.B], we find a notion of “frame representation of quantum theory” that implies linearity. Later, in Sections V.A and V.B, there are notions of “classical representation of quantum theory” and of “quasi-probability representation of quantum theory,” neither of which mentions or implies convex-linearity. Lemma 2 in Section V.B asserts that the mappings in a quasi-probability representation of quantum theory are affine, but this lemma is incorrect. (The error in the proof is the assumption, in the last displayed implication, that a convex combination pµσ1 + (1 − p)µσ2 of two µ-functions representing states is again such a µ-function, so that the preceding displayed implication can be applied to it.)


3.3. Ferrie, Morris, and Emerson’s no-go theorem. The difficulties in [16, 6, 7] that we have pointed out here are resolved in [8]. In the abstract and introduction of [8], the authors describe their contribution as being primarily the extension of the earlier results in [6, 7] from finite-dimensional Hilbert spaces to infinite-dimensional ones. In view of results to be presented in Section 5 below, showing that in many situations no-go theorems for one Hilbert space automatically extend to similar theorems for any larger Hilbert space, we regard this extension as less important than the contribution in [8] of giving precise formulations that correct the deficiencies of prior work.

The properties required of hidden-variable theories in [8] constitute Definition 10 below, but before formulating this definition we need to introduce notations for the spaces and subsets involved, and we need to point out some relationships between these spaces.

Notation 7.
• In the following, let Λ be a measurable space. Recall that this means that the set Λ is equipped with a specified σ-algebra Σ of subsets.
• F(Λ, Σ), often abbreviated to simply F, is the space of bounded, measurable, real-valued functions on Λ. It is a vector space over the real numbers, and we equip it with the supremum norm, ‖f‖ = sup{|f(λ)| : λ ∈ Λ}.
• F[0,1](Λ, Σ) or simply F[0,1] is the subset of F consisting of those functions whose values lie in the interval [0, 1].
• M(Λ, Σ), often abbreviated to simply M, is the space of bounded, signed, real-valued measures on Λ. It is a vector space over the real numbers, and we equip it with the total variation norm. That is, if µ ∈ M, then µ can be expressed as µ₊ − µ₋, where µ₊ and µ₋ are positive measures with disjoint supports (called the positive and negative parts of µ). Then ‖µ‖ = µ₊(Λ) + µ₋(Λ).
• M+1(Λ, Σ) or simply M+1 is the subset of M consisting of the probability measures, i.e., the positive measures with total measure equal to 1.
• H is a complex Hilbert space.
• B(H), often abbreviated to simply B, is the real Banach space of bounded, self-adjoint operators H → H; its norm is the operator norm ‖A‖ = sup{‖Ax‖ : x ∈ H, ‖x‖ = 1}.
• B[0,1](H) or simply B[0,1] is the subset of B consisting of the effects, i.e., operators A ∈ B such that both A and I − A are positive⁵, or equivalently such that the spectrum of A lies in the interval [0, 1].
• T(H), often abbreviated simply T, is the vector subspace of B consisting of the (self-adjoint) trace-class operators. These are the operators A whose spectrum consists only of (real) eigenvalues αi (eigenvalues with multiplicity > 1 are repeated in this list; the continuous spectrum is empty or {0}) such that the sum $\sum_i |\alpha_i|$ is finite; this sum serves as the norm ‖A‖ of A in T. (Note that this norm is usually not equal to the operator norm, the norm of A in B, which equals the supremum of the |αi|.)
• T+1(H) or simply T+1 is the subset of T consisting of the density operators, positive operators of trace 1.

Remark 8. We have modified some of the notations from [8]. In the first place, we have removed a subscript s from T and B. The subscript’s purpose was to indicate that these spaces consist only of self-adjoint operators. Since we do not deal with more general operators in this context, the subscript seemed superfluous. Also, what we have called F[0,1], M+1, B[0,1], and T+1 have in [8] the notations E(Λ, Σ), S(Λ, Σ), E(H), and S(H), respectively. The double use of E and S served the useful purpose of indicating which ingredients of quantum theory correspond to which ingredients of a hidden-variable theory, but they also prevented any abbreviations omitting (Λ, Σ) or H. We hope that our notations will be easier to remember, since the main symbols (F, M, B, T) indicate the vector spaces in which these subsets lie, while the subscripts hint at the restriction that characterizes elements of the subset.

In contrast to [16, 6, 7] there is no specified measure on Λ. As in these papers discussed earlier, a point λ ∈ Λ represents specific values for all the hidden variables, and thus represents a specific sub-ensemble for the hidden-variable theory.
A quantum state will then be viewed as a mixture of such sub-ensembles according to a probability measure on Λ, i.e., an element of M+1 . This approach avoids any assumption of absolute continuity of these measures with respect to an a priori given measure; there simply is no a priori given measure.
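
The difference between the two norms on self-adjoint operators noted in Notation 7 can be seen on a very small example. The following Python sketch is illustrative only: it assumes a real symmetric 2 × 2 matrix, uses the closed-form eigenvalues available in that case, and all function names are hypothetical.

```python
import math

def eigenvalues_sym2(a, b, d):
    """Eigenvalues of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean - radius, mean + radius)

def trace_norm(eigs):
    """Norm in T: the sum of the absolute values of the eigenvalues."""
    return sum(abs(x) for x in eigs)

def operator_norm(eigs):
    """Norm in B for a self-adjoint matrix: the largest absolute eigenvalue."""
    return max(abs(x) for x in eigs)

# Arbitrary example matrix [[1, 2], [2, -2]]; its eigenvalues are -3 and 2.
eigs = eigenvalues_sym2(1.0, 2.0, -2.0)
t_norm = trace_norm(eigs)      # 3 + 2 = 5
op_norm = operator_norm(eigs)  # max(3, 2) = 3
```

For this matrix the trace norm is 5 while the operator norm is 3, so the two norms differ even in dimension 2.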

⁵ I is the identity operator. Positivity of a self-adjoint operator A means that all of its spectrum lies in the non-negative half of the real line. Equivalently, it means that ⟨ψ|A|ψ⟩ ≥ 0 for all |ψ⟩ ∈ H.


The normed vector spaces introduced above are connected by two duality relations. First, every f ∈ F induces a continuous linear functional f̄ on M by integration:

\[ \bar{f}(\mu) = \int_\Lambda f \, d\mu. \]

We shall not need to deal with the entire dual space⁶ M′ of M but only with the part consisting of functionals f̄ arising from F.

Second, the dual T′ of T can be identified with B as follows. Every B ∈ B induces a continuous linear functional B̄ on T by

\[ \bar{B}(W) = \operatorname{Tr}(BW), \]

because the product of a bounded operator and a trace-class operator is again in the trace class (i.e., the trace class is an ideal in the ring of bounded operators). Furthermore, every bounded linear functional on T arises in this way from a unique B ∈ B. The correspondence B ↦ B̄ is an isometric isomorphism between B and T′, and so one often identifies these two spaces. For details about this duality, see, for example, [15, Theorem 23].

Remark 9. Note that this duality relationship is not symmetric. That is, although each W ∈ T induces a continuous linear functional on B, namely A ↦ Tr(AW), these will not be all of the linear functionals on B unless H is finite-dimensional. Spekkens [16] emphasizes a certain symmetry between states and measurements, and, at the end of the paper, he seeks to give an “even-handed” proof of a no-go theorem, respecting this symmetry. The fact that B, the space in which measurements live, is the dual of T, the space in which states live, but not vice versa, suggests that the actual situation is not really symmetrical.

One reflection of this asymmetry arises when we try to prove a no-go theorem for probability representations (Spekkens version) as defined above. After building into that definition our clarifications and corrections of Spekkens’s assumptions, the proof that we obtained, and which we record in Appendix B below, is not even-handed in the sense desired by Spekkens. We do not have any even-handed proof of an expectation no-go theorem.

⁶ In general, and even for nice measurable spaces like the real line ℝ with the σ-algebra of Borel sets, M′ is an unpleasantly complicated space. In particular, in this special case of ℝ, the linear functional assigning to each measure µ ∈ M the total measure of all the individual points, $\sum_{x \in \mathbb{R}} \mu(\{x\})$, is not of the form f̄ for any f ∈ F. For more information about the dual of M, see, for example, [5] and the references cited there.


The asymmetry in the duality relationship between B and T is specific to the case of infinite-dimensional spaces. In the case of finite-dimensional H, say of dimension d, all bounded linear operators are in the trace class, so B and T are the same when considered just as vector spaces. Their norms, though not identical, are equivalent in the sense that each is bounded by a constant (depending on d) multiple of the other. They are identified with the space of Hermitian d × d matrices.

To get a really smooth symmetry, though, one would need not only that H is finite-dimensional but also that Λ is finite. That additional finiteness would make M and F dual to each other and would avoid the messiness that arises in M′ in the general case. Unfortunately, finiteness of Λ is quite a restrictive assumption. Consider, for example, a spin-1/2 particle in an eigenstate of the z-spin. The hidden variables in this situation would have to determine the spin components in all directions other than z, and there is a continuum of possibilities there. It seems that finiteness of Λ becomes plausible only if one can argue that, because of limited precision of measurements, the spaces of measurement outcomes can be discretized and thus treated as finite. See Section 6 below for further discussion of symmetry (or its absence) in the light of some specific examples.

We are now in a position to present the notion that Ferrie et al. [8] call a classical representation of quantum mechanics. We prefer to call it a probability representation, viewing it as an updating and clarification of the notion introduced in Definition 5.

Definition 10. A probability representation (Ferrie-Morris-Emerson version) for quantum systems described by H consists of
• a measurable space Λ,
• a convex-linear map T from the set T+1 of density matrices into the set M+1 of probability measures, and
• a convex-linear map S from the set B[0,1] of effects into the set F[0,1] of measurable functions from Λ to [0, 1],
subject to, for all ρ ∈ T+1 and all E ∈ B[0,1],

\[ \operatorname{Tr}(\rho E) = \int_\Lambda S(E) \, dT(\rho). \]

The correspondence between this definition and the earlier Definition 5 is that the measure T (ρ) is what was previously written µρ (λ) dλ, and S(E) was previously ξE . The “trace equals integral” requirement in the last clause of the definition still says that the expectation of the effect E in the state ρ is the same whether computed in quantum mechanics (the trace) or in the hidden-variable theory (the integral).


Theorem 1 of [8] asserts that such a probability representation is impossible (provided H has dimension at least 2). The proof has a gap, which we fill in the next section, and we simultaneously make some other improvements to the theorem and its proof.

3.4. Our Expectation No-Go Theorem. In this section, we prove the first main result of this paper, a no-go theorem that strengthens Theorem 1 of [8]. Our theorem and its proof are based on the result in [8] but differ from it in two major respects. First, we use a weaker hypothesis, requiring the existence of S(E) only for (certain) sharp effects E, not for all effects. We impose no convex-linearity assumption on S. Second, we fill a gap that apparently resulted from quoting a misstated fact in [4]. In addition to these changes, we also remove an unnecessary paragraph in the otherwise terse proof.

The following definition, our final updating of the notion of “probability representation,” expresses the hypotheses necessary for our theorem. The conventions in Notation 7 remain in force.

Definition 11. A probability representation (our version) for quantum systems described by H consists of
• a measurable space Λ,
• a convex-linear map T from the set T+1 of density matrices into the set M+1 of probability measures, and
• a map S from the set of rank-1 projections in H into the set F[0,1] of measurable functions from Λ to [0, 1],
subject to, for all ρ ∈ T+1 and all rank-1 projections E,

\[ \operatorname{Tr}(\rho E) = \int_\Lambda S(E) \, dT(\rho). \]

This definition differs from the previous version, Definition 10, in that the domain of S is no longer the set B[0,1] of all effects but the much smaller set of sharp effects of rank 1. The requirement that S be convex-linear is removed, because it would make no sense when the domain of S is not convex.

Remark 12. The restriction to sharp effects is significant because, as explained in Remark 4, measuring an effect E is not in general the same as measuring the observable that is given by the same self-adjoint operator E. The two sorts of measurement are the same if and only if E is a sharp effect, i.e., a projection operator from H to a closed subspace. Thus, sharp effects are the area common to the effect-based hidden-variable notions considered in this section and the observable-based hidden-variable theories to be discussed in Section 4 below.


Definition 11 reduces the domain of S not only to the set of sharp effects but to the even smaller set of projections for which the rank, the dimension of the range, is 1. This additional reduction is included simply as a mathematical optimization of the theorem.

Since the domain of S is, in Definition 11, no longer a convex set, there is no requirement that S be convex-linear. In principle, a quite arbitrary function could serve as S, though, as we shall see in the proof of the theorem below, the last clause of the definition, equating a trace to an integral, implies a remnant of linearity for S, namely that S is one-to-one and its inverse is the restriction to the range of S of a linear transformation.

We now turn to our expectation no-go theorem, Theorem 1 in the introduction, expressing it in the language of probability representations.

Theorem 13. If the Hilbert space H has dimension at least 2, then there is no probability representation (our version) for quantum systems described by H.

Proof. Suppose, toward a contradiction, that we have a probability representation (our version), consisting of Λ, T, S, for some H of dimension at least 2. We begin by working with the convex-linear map T : T+1 → M+1, and our first objective is to extend it to a linear map, still called T, from all of T into M. For general information about such extensions of convex-linear maps, see Appendix A, but for the case at hand it is convenient to give the following very specific argument.

Any trace-class self-adjoint operator A ∈ T can be written as the difference of two positive trace-class operators, A = A₊ − A₋, where A₊ has the same positive eigenvalues and corresponding eigenspaces as A but is identically zero on all the eigenspaces corresponding to nonpositive eigenvalues. Similarly, −A₋ matches the negative eigenvalues and eigenspaces of A; we reverse its sign to get the positive operator A₋. As long as neither A₊ nor A₋ is zero, we can multiply them by suitable scalars to produce operators with trace 1, i.e., elements of T+1, and thus we can write A = bB − cC where B, C ∈ T+1 and b, c are positive real numbers. If one or both of A₊ and A₋ is zero, then we still have such a formula for A but one or both of b and c will be zero. So we always have A = bB − cC where B, C ∈ T+1 and b, c ≥ 0. Note for future reference that in this situation Tr(A) = b − c and, for the particular construction of A₊, A₋, B, and C given here, ‖A‖ = b + c. (The norm here is that in T, which we defined as the sum of the absolute values of the eigenvalues.)
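
The decomposition A = bB − cC just described can be sketched in a few lines of Python. This is a minimal illustration, not part of the proof: it assumes a finite-dimensional A that is already diagonal in its eigenbasis (so operators are represented by lists of eigenvalues), and the names `decompose` and `fallback` are hypothetical. When b or c is zero, the corresponding B or C is an arbitrary density operator, and the sketch simply picks one.

```python
from fractions import Fraction as Fr

def decompose(eigs):
    """Return (b, B, c, C) with A = b*B - c*C, where A, B, C are given by
    their eigenvalue lists in the eigenbasis of A and B, C have trace 1."""
    pos = [max(x, Fr(0)) for x in eigs]    # eigenvalues of the positive part
    neg = [max(-x, Fr(0)) for x in eigs]   # eigenvalues of the negative part
    b, c = sum(pos), sum(neg)
    # Arbitrary density operator, used only when its coefficient is 0.
    fallback = [Fr(1)] + [Fr(0)] * (len(eigs) - 1)
    B = [x / b for x in pos] if b else fallback
    C = [x / c for x in neg] if c else fallback
    return b, B, c, C

eigs = [Fr(3, 2), Fr(0), Fr(-1, 2)]        # arbitrary example spectrum
b, B, c, C = decompose(eigs)
reconstructed = [b * x - c * y for x, y in zip(B, C)]

trace_of_A = sum(eigs)                     # equals b - c
norm_of_A = sum(abs(x) for x in eigs)      # trace norm, equals b + c
```

For the example spectrum (3/2, 0, −1/2) this gives b = 3/2 and c = 1/2, so Tr(A) = b − c = 1 and ‖A‖ = b + c = 2, in line with the two identities noted above.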


We extend T to a map T : T → M by setting, with notation as above,

\[ T(A) = b\,T(B) - c\,T(C). \]

Even though A can have many representations as bB − cC with B, C ∈ T+1 and b, c ≥ 0, they all yield the same T(A). Indeed, if b′B′ − c′C′ is another such representation, then from bB − cC = A = b′B′ − c′C′, we obtain bB + c′C′ = b′B′ + cC. Furthermore, since all of B, C, B′, C′ have trace 1, we also have b + c′ = b′ + c. (If b + c′ = 0, then b = c = b′ = c′ = 0 and the desired equality is trivial, so assume b + c′ > 0.) Therefore

\[ \frac{b}{b+c'}\,B + \frac{c'}{b+c'}\,C' = \frac{b'}{b'+c}\,B' + \frac{c}{b'+c}\,C. \]

Here, both sides are convex combinations, so convex-linearity of T yields

\[ \frac{b}{b+c'}\,T(B) + \frac{c'}{b+c'}\,T(C') = \frac{b'}{b'+c}\,T(B') + \frac{c}{b'+c}\,T(C). \]

Transposing some terms and clearing fractions (remembering that b + c′ = b′ + c), we get

\[ b\,T(B) - c\,T(C) = b'\,T(B') - c'\,T(C'), \]

which means that T(A) is well-defined. An easy computation then shows that T is linear.

We claim that T is a bounded linear transformation. To this end, consider some A with ‖A‖ ≤ 1 in T. Then, as indicated above, we can represent A as bB − cC with B, C ∈ T+1, with b, c ≥ 0, and with b + c = ‖A‖ ≤ 1. Now T(B) and T(C) are measures with norm 1 in M. So T(A) = bT(B) − cT(C) has norm at most b + c ≤ 1. This completes the proof that T : T → M is a bounded linear transformation.

It follows that T induces a bounded linear transformation on the dual spaces, T′ : M′ → T′. In detail, T′ sends any bounded linear functional h ∈ M′ (which means h : M → ℝ) to the bounded linear functional T′(h) = h ∘ T : T → ℝ;

\[ T'(h)(A) = h(T(A)) \quad \text{for all } h \in \mathcal{M}' \text{ and all } A \in \mathcal{T}. \]
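
The passage from T to T′ has a familiar finite-dimensional analogue, in which a linear map is a matrix and the induced map on functionals is its transpose. The Python sketch below is only that analogue, with hypothetical names and arbitrary numbers: a 3 × 2 matrix stands in for T, vectors in ℝ² stand in for trace-class operators, and vectors in ℝ³ stand in for measures on a three-point space. It checks the defining identity T′(h)(A) = h(T(A)) on one example.

```python
from fractions import Fraction as Fr

# Hypothetical stand-in for T: a 3x2 matrix with columns summing to 1,
# so that probability vectors in R^2 are sent to probability vectors in R^3.
T = [[Fr(1, 2), Fr(0)],
     [Fr(1, 2), Fr(1, 3)],
     [Fr(0),    Fr(2, 3)]]

def apply(matrix, vector):
    """Matrix-vector product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def pair(functional, vector):
    """Value of a linear functional (given by its coefficients) on a vector."""
    return sum(f * v for f, v in zip(functional, vector))

T_dual = transpose(T)        # plays the role of T'

h = [Fr(1), Fr(-2), Fr(5)]   # a functional on R^3
A = [Fr(3), Fr(4)]           # a vector in R^2

lhs = pair(apply(T_dual, h), A)   # T'(h)(A)
rhs = pair(h, apply(T, A))        # h(T(A))
```

Both sides evaluate to 55/6 for these values. In the infinite-dimensional setting of the proof the same identity defines T′, but the identification of dual spaces requires the care described in Subsection 3.3.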

Recall from the discussion in Subsection 3.3 how the dual space T′ of T is identified with B and part of the dual space M′ of M is identified with F. Via these identifications, T′ : M′ → T′ restricts to a bounded linear transformation, which we still call T′, from F to B. Untangling the definitions, we find that, for each f ∈ F, T′(f) is the unique element of B that satisfies

\[ \operatorname{Tr}(T'(f)A) = \int_\Lambda f \, dT(A) \quad \text{for all } A \in \mathcal{T}. \tag{1} \]

Indeed, the left side of this equation is the value obtained by applying to A ∈ T the functional identified with T ′ (f ) ∈ B, while the right side


is the value obtained by applying to the measure T(A) the functional identified with f ∈ F. Note also that this equation, though true for all A ∈ T, would still suffice to uniquely determine T′(f) if it were asserted only for A ∈ T+1; this is because, as we showed above, the linear span of T+1 is the whole space T.

We now invoke the last clause in Definition 11 to find that, for all rank-1 projections E and all ρ ∈ T+1,

\[ \operatorname{Tr}(E\rho) = \int_\Lambda S(E) \, dT(\rho) = \operatorname{Tr}(T'(S(E))\rho). \]

But this is, as we saw in the preceding paragraph, enough to show that T′(S(E)) = E.

Recall that we imposed no linearity conditions on S. Nevertheless, because T′ is linear, this last equation gives what can be viewed as a weak linearity requirement for S. On its range, S is inverted by a linear transformation T′.

So far, we have followed the argument in [8] fairly closely, just adding some details, for example the reason why T is bounded, and noting that a drastically reduced domain of S suffices. At this point, though, Ferrie et al. claim, quoting Bugajski [4], that the linearity of T′ implies that it preserves a property called coexistence. Unfortunately, this preservation claim needs not only that T′ is linear but also that it preserves positivity and sends the constant function 1 to the identity operator. T′ actually has these properties, but this needs to be checked; we give the proof below. Also, although we could work with the general notion of coexistence, it turns out to be more convenient to use an equivalent formulation, from [9], for the special case of two effects. (For readers interested in the general notion, we suggest [4] and [9].)

In preparation for the next step in the proof, we need some computations. The first of these is to compute T′(1), where 1 ∈ F means the constant function with value 1. Referring to the formula (1) characterizing T′ and remembering that it suffices to have this formula for A ∈ T+1, we see that T′(1) is the unique bounded linear operator that satisfies, for all ρ ∈ T+1,

\[ \operatorname{Tr}(T'(1)\rho) = \int_\Lambda dT(\rho) = T(\rho)(\Lambda) = 1 = \operatorname{Tr}(\rho) = \operatorname{Tr}(I\rho), \]

where the third equality comes from the fact that T maps T+1 into the space M+1 of probability measures. Thus, T ′ (1) = I. The other computation that we need is conveniently summarized in the following lemma. Recall that a bounded linear operator A is said


to be positive if ⟨ψ|A|ψ⟩ ≥ 0 for all |ψ⟩ ∈ H and that A ≤ B means that B − A is positive.

Lemma 14. If f ∈ F is nonnegative (meaning f(λ) ≥ 0 for all λ ∈ Λ), then T′(f) is a positive operator. Therefore, if f ≤ g pointwise in F then T′(f) ≤ T′(g) in B.

Proof. The second assertion follows immediately from the first applied to g − f, because T′ is linear. To prove the first assertion, suppose f ∈ F is nonnegative, and let |ψ⟩ be any vector in H. The conclusion we want to deduce, ⟨ψ|T′(f)|ψ⟩ ≥ 0, is obvious if |ψ⟩ = 0, so we may assume that |ψ⟩ is a non-zero vector. Normalizing it, we may assume further that its length is 1. Then |ψ⟩⟨ψ| ∈ T+1 and therefore T(|ψ⟩⟨ψ|) ∈ M+1. Using equation (1), we compute

\[ \langle\psi|T'(f)|\psi\rangle = \operatorname{Tr}\bigl(T'(f)\,|\psi\rangle\langle\psi|\bigr) = \int_\Lambda f \, dT(|\psi\rangle\langle\psi|) \ge 0, \]

where we have used that both the measure T(|ψ⟩⟨ψ|) and the integrand f are nonnegative.⁷

⁷ The proof would break down here if we were working with possibly negative quasiprobabilities.

The following lemma says, in view of a criterion of Heinosaari [9, equation (2)], that any two elements of F[0,1] coexist.

Lemma 15. If f, g ∈ F[0,1], then there exists h ∈ F[0,1] such that all four of h, f − h, g − h, and 1 − f − g + h are nonnegative.

Proof. Define h(λ) = min{f(λ), g(λ)} for all λ ∈ Λ. Then the first three of the assertions in the lemma are obvious, and the fourth becomes obvious if we observe that f + g − h = max{f, g} ≤ 1.

Corollary 16. For any two rank-1 projections A, B of H, there exists an operator H ∈ B such that all four of H, A − H, B − H, and I − A − B + H are positive operators.

Proof. Apply Lemma 15 with f = S(A) and g = S(B), let h be the function given by the lemma, and let H = T′(h). The nonnegativity of h, f − h, g − h, and 1 − f − g + h implies, by Lemma 14, the positivity of T′(h) = H, T′(S(A) − h) = A − H, T′(S(B) − h) = B − H, and T′(1 − S(A) − S(B) + h) = I − A − B + H, where we have also used the linearity of T′, the fact that T′(1) = I, and the formula T′(S(A)) = A for all A in the domain of S.

Let us apply this corollary to two specific rank-1 projections. Fix two orthonormal vectors |0⟩ and |1⟩. (This is where we use that H has dimension at least 2.) Let |+⟩ = (|0⟩ + |1⟩)/√2. We use the projections A = |0⟩⟨0| and B = |+⟩⟨+| to the subspaces spanned by |0⟩ and |+⟩. Let H be as in Corollary 16 for these projections A and B. From the positivity of H and of A − H, we get that 0 ≤ ⟨1|H|1⟩ and that

\[ 0 \le \langle 1|(A - H)|1\rangle = \langle 1|A|1\rangle - \langle 1|H|1\rangle = -\langle 1|H|1\rangle, \]

where we have used that |1⟩, being orthogonal to |0⟩, is annihilated by A. Combining the two inequalities, we infer that ⟨1|H|1⟩ = 0 and therefore, since H is positive, H|1⟩ = 0. Similarly, using the orthogonal vectors |+⟩ and |−⟩ = (|0⟩ − |1⟩)/√2 in place of |0⟩ and |1⟩, and B in place of A, we obtain H|−⟩ = 0. So, being linear, the operator H is identically zero on the subspace of the Hilbert space spanned by |1⟩ and |−⟩; note that |0⟩ is in this subspace, so we have H|0⟩ = 0.

Now we use the part of Corollary 16 that has not yet been used, namely the positivity of I − A − B + H. Since H|0⟩ = 0, we can compute

\[ 0 \le \langle 0|(I - A - B + H)|0\rangle = \langle 0|0\rangle - \langle 0|A|0\rangle - \langle 0|B|0\rangle = 1 - 1 - \tfrac{1}{2} = -\tfrac{1}{2}, \]

where ⟨0|B|0⟩ = |⟨0|+⟩|² = 1/2. This contradiction completes the proof of the theorem.

4. Value No-Go Theorems

We turn now to a different species of no-go theorems, ones saying that hidden-variable theories cannot even produce the correct outcomes for individual measurements, let alone the correct probabilities or expectation values. Such theorems considerably predated the expectation no-go theorems considered in the preceding section. Value no-go theorems were first established by Bell [1, 2] and then by Kochen and Specker [11]; we shall also refer to the user-friendly exposition given by Mermin [12].

Note that there is no implication in either direction between value no-go theorems and expectation no-go theorems. The former say that a hidden-variable theory cannot predict the correct values for measured quantities, but it might still predict the correct expectations; the latter say that a hidden-variable theory cannot predict the correct expectations, but it might still predict the correct values.

Of course, in order to formulate value no-go theorems, one must specify what “correct outcomes for individual measurements” means. For this purpose, we need the notion of the joint spectrum of commuting operators on Hilbert space, and we devote the next subsection to summarizing the basic facts about joint spectra.


4.1. Joint Spectra. A general reference for the notion of joint spectrum is [3, Section 6.5]. Let $A_1, \dots, A_n$ be a finite list of pairwise commuting, self-adjoint operators on a Hilbert space $\mathcal{H}$. The notion of the joint spectrum of such a list is a natural generalization of the notion of the spectrum of a single self-adjoint operator.

The simplest case occurs when the operators are simultaneously diagonalizable, i.e., when $\mathcal{H}$ admits an orthonormal basis consisting of common eigenvectors of all the $A_i$'s. In this case, the joint spectrum consists of the $n$-tuples of scalars $\nu = (\nu_1, \dots, \nu_n) \in \mathbb{R}^n$ that occur as the eigenvalues for such common eigenvectors. That is, $\nu$ belongs to the joint spectrum if and only if there is a non-zero vector $|\psi\rangle \in \mathcal{H}$ such that $A_i|\psi\rangle = \nu_i|\psi\rangle$ for $i = 1, \dots, n$.

If $\mathcal{H}$ is finite-dimensional, then this simple case is the only one that can arise, but for infinite-dimensional $\mathcal{H}$ we must take into account the possibility of a continuous spectrum (instead of, or in addition to, the discrete spectrum given by eigenvectors). A point $\nu \in \mathbb{R}^n$ belongs to the joint spectrum $\sigma(A_1, \dots, A_n)$ of $A_1, \dots, A_n$ if and only if it is approximately a tuple of eigenvalues in the following sense: For every positive $\varepsilon$, there is a unit vector $|\psi\rangle \in \mathcal{H}$ (an approximate simultaneous eigenvector) such that, for each $i = 1, \dots, n$, we have $\|A_i|\psi\rangle - \nu_i|\psi\rangle\| < \varepsilon$.

The joint spectrum of a tuple of self-adjoint operators is a closed subset of $\mathbb{R}^n$. If the operators are bounded, then so is their joint spectrum. Just as for a single operator, there is a spectral decomposition leading to a functional calculus for tuples of commuting self-adjoint operators. In more detail, there is a unique spectral measure $E$, a countably additive map from Borel subsets of $\mathbb{R}^n$ to projection operators on $\mathcal{H}$, such that, for each $i$,
\[ A_i = \int_{\mathbb{R}^n} x_i \, dE(x_1, \dots, x_n). \]
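In the finite-dimensional case, the description of the joint spectrum via common eigenvectors can be tried out numerically. The following is a minimal sketch, not part of the paper's argument: the function name `joint_spectrum`, the rounding tolerance, and the use of a random linear combination to find common eigenvectors are all choices made for this illustration.

```python
import numpy as np

def joint_spectrum(ops, decimals=8, seed=0):
    """Joint spectrum of pairwise commuting Hermitian matrices (finite dimension only)."""
    ops = [np.asarray(A, dtype=complex) for A in ops]
    for A in ops:
        if not np.allclose(A, A.conj().T):
            raise ValueError("operators must be self-adjoint")
        for B in ops:
            if not np.allclose(A @ B, B @ A):
                raise ValueError("operators must commute pairwise")
    # A generic real linear combination takes distinct values on distinct joint
    # eigenvalue tuples, so each of its eigenvectors is a common eigenvector.
    rng = np.random.default_rng(seed)
    combo = sum(c * A for c, A in zip(rng.normal(size=len(ops)), ops))
    _, vecs = np.linalg.eigh(combo)
    points = set()
    for psi in vecs.T:
        # nu_i = <psi|A_i|psi> for the common unit eigenvector psi
        nu = tuple(float(np.round(np.vdot(psi, A @ psi).real, decimals)) + 0.0
                   for A in ops)
        points.add(nu)
    return points

# Rank-1 projections onto two orthogonal directions in a 3-dimensional space.
P1 = np.diag([1.0, 0.0, 0.0])
P2 = np.diag([0.0, 1.0, 0.0])
spectrum = joint_spectrum([P1, P2])
```

Because the two directions do not span the 3-dimensional space, the all-zero tuple appears in `spectrum` alongside the two tuples with a single one; in dimension 2 the same pair of projections would give only the latter two points.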

The joint spectrum $\sigma(A_1, \dots, A_n)$ can be characterized as the support of this spectral measure, i.e., the set of points $\nu \in \mathbb{R}^n$ such that $E(B) \ne 0$ for all neighborhoods $B$ of $\nu$. The preceding information about joint spectra is explicit in [3, Section 6.5]. (For the boundedness of the joint spectrum of commuting bounded operators, look at the proof of Theorem 1 in that section.) What follows is implicit in the statement, on page 155 of [3], that most of Section 1, Subsection 4, which concerns functions of a single operator, can be repeated in the present context of several commuting operators. We fill in some arguments that are not given in that subsection of [3]. Given a Borel function $f : \mathbb{R}^n \to \mathbb{R}$, one defines
\[ f(A_1, \dots, A_n) = \int_{\mathbb{R}^n} f(x_1, \dots, x_n) \, dE(x_1, \dots, x_n). \]

We shall use this notion only for continuous f , and in this case we have the following useful information.

Proposition 17. Let $A_1, \dots, A_n$ be commuting, self-adjoint operators, with joint spectrum $\sigma(A_1, \dots, A_n)$. Then, for any continuous $f : \mathbb{R}^n \to \mathbb{R}$, we have $f(A_1, \dots, A_n) = 0$ if and only if $f$ vanishes identically on $\sigma(A_1, \dots, A_n)$. Furthermore, a point $\nu \in \mathbb{R}^n$ belongs to $\sigma(A_1, \dots, A_n)$ if and only if every continuous function $f : \mathbb{R}^n \to \mathbb{R}$ that satisfies $f(A_1, \dots, A_n) = 0$ also satisfies $f(\nu) = 0$.

Proof. Although we have two “if and only if” statements to prove, their “only if” halves say the same thing, so we need only to prove three implications:
(1) If a continuous function $f : \mathbb{R}^n \to \mathbb{R}$ vanishes identically on $\sigma(A_1, \dots, A_n)$, then $f(A_1, \dots, A_n) = 0$.
(2) If $f(A_1, \dots, A_n) = 0$ for a continuous $f$ and if $\nu \in \sigma(A_1, \dots, A_n)$, then $f(\nu) = 0$.
(3) If $\nu \notin \sigma(A_1, \dots, A_n)$, then there is a continuous $f : \mathbb{R}^n \to \mathbb{R}$ with $f(A_1, \dots, A_n) = 0$ but $f(\nu) \ne 0$.

Item (1) here is clear from the definition of $f(A_1, \dots, A_n)$. It is the integral of $f$ with respect to $E$, and $f$ vanishes on the support of $E$. For item (2), we use the generalization to several commuting operators of a fact from the cited subsection of [3], namely that
\[ \|f(A_1, \dots, A_n)\| = E\text{-}\sup\{|f(\nu)| : \nu \in \sigma(A_1, \dots, A_n)\}. \]

Here the notation $E\text{-}\sup$ means the essential supremum with respect to the spectral measure $E$, which is the infimum of all the numbers $a$ such that $E(\{\nu : |f(\nu)| > a\}) = 0$. In the situation of item (2), we therefore have that this essential supremum is zero. Suppose now, toward a contradiction, that $\nu \in \sigma(A_1, \dots, A_n)$ is a point for which $f(\nu) \ne 0$. Since $f$ is continuous and $f(\nu) \ne 0$, there is an open neighborhood $N$ of $\nu$ such that, for all $x \in N$, $|f(x)| > \tfrac{1}{2}|f(\nu)| > 0$. Since the essential supremum of $|f|$ is zero, there is an $a < \tfrac{1}{2}|f(\nu)|$ for which $E(\{x : |f(x)| > a\}) = 0$. But the set $\{x : |f(x)| > a\}$ includes $N$, so $E(N) = 0$. This is a contradiction, because every neighborhood $N$ of a point $\nu$ in the joint spectrum must have $E(N) \ne 0$.


Finally, to prove item (3), suppose $\nu \notin \sigma(A_1, \dots, A_n)$ and notice that, thanks to item (1), we need only find a continuous $f$ that vanishes identically on $\sigma(A_1, \dots, A_n)$ but does not vanish at $\nu$. Since $\sigma(A_1, \dots, A_n)$ is closed, the function sending each point in $\mathbb{R}^n$ to its distance from $\sigma(A_1, \dots, A_n)$ does the job.

The last assertion in Proposition 17 can be summarized as: The joint spectrum of $A_1, \dots, A_n$ consists of all those points $(\nu_1, \dots, \nu_n)$ that satisfy all the same equations as the operators themselves. Here “equations” should be understood as equations between continuous functions.

Just as the points in the spectrum of a single Hermitian operator $A$ are, according to quantum theory, the possible results of a measurement of $A$, so the points in the joint spectrum of $A_1, \dots, A_n$ are the possible outcomes of a simultaneous measurement of all of $A_1, \dots, A_n$. Note that both mathematics and physics require the operators $A_1, \dots, A_n$ here to commute: mathematics in order that the joint spectrum be defined, and physics in order that these observables be simultaneously measurable.

We record, for future reference, some very special cases of the definition of joint spectrum. These all fall under the simple case mentioned at the beginning of this subsection: the operators will be simultaneously diagonalizable, so the joint spectrum consists of the eigenvalues for the common eigenvectors of the operators $A_1, \dots, A_n$. If the $A_i$ are projections, then each point in their joint spectrum is a tuple of zeros and ones. If $A_1, \dots, A_n$ are the rank-1 projections to an orthogonal set of directions, then their joint spectrum contains all the $n$-tuples consisting of a single one and $n - 1$ zeros. The only other point that could be in the joint spectrum is the $n$-tuple of all zeros; it is present if and only if the directions to which the $A_i$'s project do not span the whole space $\mathcal{H}$.

4.2. Value Maps.
Now we are ready to define precisely what is expected of a hidden-variable theory in order for it to predict the correct values for observables. The following definition, which is based on the discussion in [12, Section II], is intended to provide that specification. Definition 18. Let H be a Hilbert space, and let O be a set of observables, i.e., self-adjoint operators on H. A value map for O in H is a function v assigning to each observable A ∈ O a number v(A) in the spectrum of A, in such a way that, whenever A1 , . . . , An are pairwise commuting elements of O, then (v(A1 ), . . . , v(An )) is in the joint spectrum of (A1 , . . . , An ).


The intention behind this definition is that, in a hidden-variable theory, a quantum state represents an ensemble of individual systems, each of which has definite values for observables. That is, each individual system has a value map associated to it, describing what values would be obtained if we were to measure observable properties of the system. A believer in such a hidden-variable theory would expect a value map for the largest possible $\mathcal{O}$, the set of all self-adjoint operators on $\mathcal{H}$, unless there were superselection rules rendering some such operators unobservable.

The part of Definition 18 about pairwise commuting operators says exactly that, if one measures the observables $A_1, \dots, A_n$ simultaneously, which is possible because they commute, then the values one obtains should be among the possibilities permitted by quantum mechanics, namely the $n$-tuples in the joint spectrum of the operators. On the other hand, for observables that do not commute, quantum mechanics does not allow them to be simultaneously exactly measured, does not describe possible simultaneous values, and thus does not impose restrictions on value maps.

4.3. No-Go Theorem. A hidden-variable theory should do more than just provide some value maps describing the properties of the sub-ensembles inside the quantum states. It should provide, for each quantum state $\rho$, a probability distribution $\mu_\rho$ over the set of value maps that accounts for the measured values of observables in $\mathcal{O}$. The precise meaning of “accounts for” is as follows. For each observable $A \in \mathcal{O}$, there is a probability distribution $\mu_\rho^A$ induced on the spectrum of $A$ by
\[ \mu_\rho^A(X) = \mu_\rho(\{v : v(A) \in X\}) \]
for all subsets $X$ of the spectrum of $A$. This induced probability distribution should agree with the probability distribution predicted by quantum theory for the observable $A$ in the state $\rho$.
One would thus expect that a no-go theorem in this context would say that there is no way to assign, to each state, an appropriate probability distribution over value maps. Surprisingly, the no-go theorems of Bell [1, 2] and Kochen and Specker [11] are far stronger. They say that, for $\mathcal{H}$ of dimension at least 3, there are no value maps at all for $\mathcal{H}$ and the set $\mathcal{O}_{\mathrm{all}}$ of all self-adjoint operators on $\mathcal{H}$. Better yet, there are no value maps for certain specific finite$^{8}$ subsets $\mathcal{O}$ of $\mathcal{O}_{\mathrm{all}}$.

$^{8}$ In the case of finite-dimensional $\mathcal{H}$, where each observable has only a finite spectrum, we can use the compactness theorem of propositional logic to infer, from the no-go theorem for $\mathcal{O}_{\mathrm{all}}$, that there is also a no-go theorem for some finite $\mathcal{O} \subseteq \mathcal{O}_{\mathrm{all}}$. The compactness argument does not, however, produce a specific example of such an $\mathcal{O}$.

We strengthen this result by tightly restricting the sort of observables that are needed in $\mathcal{O}$. This is Theorem 2 from the introduction.

Theorem 19. Suppose that the dimension of the Hilbert space is at least 3.
(1) There is a finite set $\mathcal{O}$ of projections for which no value map exists.
(2) If the dimension is finite then there is a finite set $\mathcal{O}$ of rank-1 projections for which no value map exists.
The desired finite sets of projections are constructed explicitly in the proof.

Remark 20. The assumption in part (2) of Theorem 19 that the dimension of $\mathcal{H}$ is finite cannot simply be omitted. If $\dim(\mathcal{H})$ is infinite, then the set $\mathcal{O}$ of all finite-rank projections admits a value map, namely the constant zero function. This works because the definition of “value map” imposes constraints on only finitely many observables at a time.

Proof. We start with proving Theorem 19.2, i.e., part (2) of Theorem 19. Arguably the result is implicit in [2, Section 5] but it is not explicitly stated there and no specific $\mathcal{O}$ of the desired sort is given. In [11] and [12], the result is explicitly proved for 3-dimensional $\mathcal{H}$, but the extension to larger $\mathcal{H}$, which is easy if one just wants to extend a general no-go theorem, is not quite so obvious under the restriction to finitely many rank-1 projections. Because of this situation, we outline both versions of the proof, referring to these older papers for much of the work but filling in the additional arguments needed to get our result.

Proof of Theorem 19.2 following Bell. Bell [2, Section 5] works from three basic properties of (what we call) a value map $v$, namely
(1) For every rank-1 projection $|\psi\rangle\langle\psi|$ (where $|\psi\rangle$ is a unit vector), $v(|\psi\rangle\langle\psi|)$ is 0 or 1.
(2) If $v(|\varphi\rangle\langle\varphi|) = 1$ and $|\psi\rangle$ is orthogonal to $|\varphi\rangle$, then $v(|\psi\rangle\langle\psi|) = 0$.
(3) If $v(|\psi_1\rangle\langle\psi_1|) = v(|\psi_2\rangle\langle\psi_2|) = 0$ for two orthogonal unit vectors $|\psi_1\rangle$ and $|\psi_2\rangle$, then also $v(|\psi\rangle\langle\psi|) = 0$ for all unit vectors $|\psi\rangle$ of the form $\alpha|\psi_1\rangle + \beta|\psi_2\rangle$.

All three of these follow from the definition of value map provided $\mathcal{O}$ contains all of the rank-1 projections of $\mathcal{H}$. Property (1) is immediate from the fact that the spectrum of a non-trivial projection is included


in $\{0, 1\}$. Similarly, Property (2) follows from the facts that, if $|\varphi\rangle$ and $|\psi\rangle$ are orthogonal, then the projections $|\varphi\rangle\langle\varphi|$ and $|\psi\rangle\langle\psi|$ commute and their joint spectrum is $\{(0, 0), (0, 1), (1, 0)\}$. (If $\mathcal{H}$ were only 2-dimensional, this joint spectrum would be only $\{(0, 1), (1, 0)\}$, but Property (2) would still follow for the same reason: $(1, 1)$ is not in the joint spectrum.)

To prove Property (3), complete $\{|\psi_1\rangle, |\psi_2\rangle\}$ to an orthonormal basis for $\mathcal{H}$, say $\{|\psi_1\rangle, |\psi_2\rangle, \dots, |\psi_n\rangle\}$. The associated rank-1 projections $|\psi_i\rangle\langle\psi_i|$ commute, and their joint spectrum consists of the vectors in which one component is 1 and all the rest are 0. Since $v$ vanishes on the first two of these projections, we must have $v(|\psi_i\rangle\langle\psi_i|) = 1$ for some $i \ge 3$. But then the desired equation in (3) follows from (2) because $\alpha|\psi_1\rangle + \beta|\psi_2\rangle$ is orthogonal to $|\psi_i\rangle$. (This argument appears to require $\dim(\mathcal{H}) \ge 3$ in order to have a $|\psi_i\rangle$ to work with here, but this appearance is wrong. If $\dim(\mathcal{H}) = 2$ then Property (3) holds vacuously because $\{|\psi_1\rangle, |\psi_2\rangle\}$ is an orthonormal base for $\mathcal{H}$, so $v$ must send one of the associated projections to 1. The real use of $\dim(\mathcal{H}) \ge 3$ comes later.)

Bell deduces from these three properties and $\dim(\mathcal{H}) \ge 3$ that $v$ is continuous. More explicitly, he shows that, if $v(|\varphi\rangle\langle\varphi|) = 0$ and $v(|\psi\rangle\langle\psi|) = 1$, for unit vectors $|\varphi\rangle$ and $|\psi\rangle$, then $\||\varphi\rangle - |\psi\rangle\| > \tfrac{1}{2}$. His argument involves applying the three properties to some auxiliary vectors in addition to $|\varphi\rangle$ and $|\psi\rangle$. Bell completes the proof of the no-go theorem by observing that, since $v$ must take both values 0 and 1, this continuity result is a contradiction. So there cannot be a value map defined on all of the rank-1 projections.

For our purposes, namely producing a finite set $\mathcal{O}$ of rank-1 projections with no value map, we must work a bit more. Using the fact that $\dim(\mathcal{H})$ is finite and at least 2, start with an orthonormal base $O_1$ for $\mathcal{H}$ and enlarge it to a finite superset $O_2$ with the property that every two vectors $|\varphi\rangle, |\psi\rangle \in O_2$ can be joined by a chain in $O_2$, $|\varphi\rangle = |\chi_0\rangle, |\chi_1\rangle, \dots, |\chi_l\rangle = |\psi\rangle$, in which the distance between any two consecutive terms is at most $\tfrac{1}{2}$. So, for each two consecutive terms, Bell's argument gives us $v(|\chi_i\rangle\langle\chi_i|) = v(|\chi_{i+1}\rangle\langle\chi_{i+1}|)$. Of course, the argument involves the auxiliary vectors mentioned above, in addition to these two consecutive $|\chi\rangle$'s, but there are only finitely many of these auxiliary vectors. Adjoin all of those vectors, for all $i$, to $O_2$ and let $\mathcal{O}$ consist of the rank-1 projections associated to the resulting vectors. If $v$ were a value map for $\mathcal{O}$, then, by Bell's argument, we would have $v$ constant on the rank-1 projections associated to the vectors in $O_2$ and therefore in particular the vectors in the orthonormal base $O_1$. That is absurd, because a value map, when applied to the projections associated to an


orthonormal base, always produces a single 1 and the rest 0's. So $\mathcal{O}$ is as required by the theorem.

Proof of Theorem 19.2 following Kochen-Specker and Mermin. When the dimension of $\mathcal{H}$ is exactly 3, the constructions given by Kochen and Specker [11] and Mermin [12, Section IV] provide the desired $\mathcal{O}$. More precisely, the proof of Theorem 1 in [11] uses a Boolean algebra generated by a finite set of one-dimensional subspaces of $\mathcal{H}$, and it shows that the projections to those subspaces constitute an $\mathcal{O}$ of the required sort. Mermin works instead with squares $S_i^2$ of certain spin components of a spin-1 particle, but these are projections to 2-dimensional subspaces of $\mathcal{H}$, and the complementary rank-1 projections $I - S_i^2$ serve as the desired $\mathcal{O}$.

When the dimension of $\mathcal{H}$ is greater than 3, but still finite, we shall see in Theorem 21 below how to bootstrap the result from lower to higher dimensions. Notice that, if one merely wants a no-go theorem saying that some $\mathcal{O}$ has no value map, then this bootstrapping is easy, as noted in [1, 11, 12]. Work is needed only to get all the operators in $\mathcal{O}$ to be rank-1 projections.

Proof of Theorem 19.1. The case where $\dim(\mathcal{H})$ is finite was covered by Theorem 19.2, so it remains to treat the case of infinite-dimensional $\mathcal{H}$. Let $\mathcal{K}$ and $\mathcal{L}$ be Hilbert spaces, with $\dim(\mathcal{K}) = 3$ and $\dim(\mathcal{L}) = \dim(\mathcal{H})$. Note that then their tensor product $\mathcal{K} \otimes \mathcal{L}$ has the same dimension as $\mathcal{H}$, so it can be identified with $\mathcal{H}$. Let $\mathcal{O}$ be as in Theorem 19 for the 3-dimensional $\mathcal{K}$. Let $\mathcal{O}' = \{P \otimes I_{\mathcal{L}} : P \in \mathcal{O}\}$, where $I_{\mathcal{L}}$ is the identity operator on $\mathcal{L}$. Then $\mathcal{O}'$ is a set of infinite-rank projections of $\mathcal{K} \otimes \mathcal{L} = \mathcal{H}$, having the same algebraic structure as $\mathcal{O}$. It follows that there is no value map for $\mathcal{O}'$. This completes the proof of Theorem 19.

We note that the measurements involved in Theorem 19.2, namely the rank-1 projections, are the same as those involved in our expectation no-go Theorem 13. We hope that, by reducing both species of no-go theorems to an extremely simple sort of measurement, and furthermore a sort where measurement as observable and measurement as effect coincide, we have clarified the similarities as well as the differences between the two species.
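For readers who want to see a finite set of rank-1 projections with no value map checked by machine, here is a small sketch. It does not use the constructions cited in the proof above; instead it uses the well-known 18-vector set of Cabello, Estebaranz and García-Alcaine in dimension 4, which is not discussed in this paper and is swapped in only because it is compact enough to enumerate. The names `BASES`, `bases_are_orthogonal`, and `count_assignments` are invented for this illustration. The check relies on the fact recorded in Section 4.1: the rank-1 projections of an orthonormal basis have a joint spectrum consisting of the tuples with exactly one 1.

```python
from itertools import combinations

# Nine orthogonal bases of R^4 (vectors left unnormalized); each of the
# 18 distinct vectors occurs in exactly two of the bases.
BASES = [
    [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, -1, 0, 0)],
    [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, -1, 0)],
    [(1, -1, 1, -1), (1, -1, -1, 1), (1, 1, 0, 0), (0, 0, 1, 1)],
    [(1, -1, 1, -1), (1, 1, 1, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
    [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (1, 0, 0, -1)],
    [(1, -1, -1, 1), (1, 1, 1, 1), (1, 0, 0, -1), (0, 1, -1, 0)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (1, -1, 0, 0), (0, 0, 1, 1)],
    [(1, 1, -1, 1), (-1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, 1, 1, -1), (-1, 1, 1, 1), (1, 0, 0, 1), (0, 1, -1, 0)],
]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bases_are_orthogonal(bases):
    """True if every listed basis consists of mutually orthogonal vectors."""
    return all(dot(u, v) == 0
               for basis in bases for u, v in combinations(basis, 2))

def count_assignments(bases):
    """Count 0/1 assignments to the vectors with exactly one 1 in each basis.

    Any value map on the associated rank-1 projections would yield such an
    assignment, so a count of zero means that no value map exists.
    """
    vectors = sorted({v for basis in bases for v in basis})
    index = {v: k for k, v in enumerate(vectors)}
    masks = [sum(1 << index[v] for v in basis) for basis in bases]
    count = 0
    for assignment in range(1 << len(vectors)):
        if all(bin(assignment & m).count("1") == 1 for m in masks):
            count += 1
    return count

num_vectors = len({v for basis in BASES for v in basis})
num_value_maps = count_assignments(BASES)
```

The exhaustive search agrees with a parity count: nine bases would need nine 1's in total, but each vector is counted twice, so the total would have to be even.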


5. Bootstrapping the dimension Our objective in this section is to show that, in many cases, a no-go theorem for a Hilbert space H automatically yields no-go theorems for larger Hilbert spaces, ones that contain H as closed subspaces. The section has independent value and can be read independently except that it needs the definition of value map and two definitions (Spekkens’s and ours) of probability representation. Intuitively, such dimension bootstrapping results are to be expected. If hidden-variable theories could explain the behavior of quantum systems described by the larger Hilbert space, say H′ , then they could also provide an explanation for systems described by the subspace H. The latter systems are, after all, just a special case of the former, consisting of the pure states that happen to lie in H or mixtures of such states. The no-go theorems under discussion here, both ours (Theorems 13 and 19) and those from the previous literature ([16, 6, 7, 8, 1, 2, 11, 12]), give much more information than just the impossibility of matching the predictions of quantum-mechanics with a hidden-variable theory. They establish that hidden-variable theories must fail in very specific ways. It is not so obvious that these specific sorts of failures, once established for a Hilbert space H, necessarily also apply to its superspaces H′ . We shall prove two theorems saying that no-go results for a Hilbert space H′ follow directly from no-go results for a subspace H. The two theorems differ in the sort of no-go results that they apply to; one is for expectation no-go results as in Theorem 13; the other is for value no-go results as in Theorem 19. We shall also comment on the situation for the results in [16, 6, 7]. We begin with the theorem dealing with value no-go results. This is the most important part of this section, because it was used in the proof of Theorem 19.2 above. 
There, we invoked constructions from the literature proving the result for H of dimension 3 but we claimed the result for all finite dimensions from 3 up. That claim is supported by the following theorem. Theorem 21. Suppose H ⊆ H′ are finite-dimensional Hilbert spaces. Suppose further that O is a finite set of rank-1 projections of H for which no value map exists. Then there is a finite set O′ of rank-1 projections of H′ for which no value map exists. Proof. Clearly, if two Hilbert spaces are isomorphic and if one of them has a finite set O of rank-1 projections with no value map, then the other also has such a set. It suffices to conjugate the projections in O by any isomorphism between the two spaces. Thus, the existence of


such a set $\mathcal{O}$ depends only on the dimension of the Hilbert space, not on the specific space. Proceeding by induction on the dimension of $\mathcal{H}'$, we see that it suffices to prove the theorem in the case where $\dim(\mathcal{H}') = \dim(\mathcal{H}) + 1$.

Given such $\mathcal{H}$ and $\mathcal{H}'$, let $|\psi\rangle$ be any unit vector in $\mathcal{H}'$, and observe that its orthogonal complement, $|\psi\rangle^\perp$, is a subspace of $\mathcal{H}'$ of the same dimension as $\mathcal{H}$ and thus isomorphic to $\mathcal{H}$. By the induction hypothesis, this subspace $|\psi\rangle^\perp$ has a finite set $\mathcal{O}$ of rank-1 projections for which no value map exists. Each element of $\mathcal{O}$ can be regarded as a rank-1 projection of $\mathcal{H}'$; indeed, if the projection was given by $|\varphi\rangle\langle\varphi|$ in $|\psi\rangle^\perp$, then we can just interpret the same formula $|\varphi\rangle\langle\varphi|$ in $\mathcal{H}'$, using the same unit vector $|\varphi\rangle \in |\psi\rangle^\perp$. Let $\mathcal{O}_1$ consist of all the projections from $\mathcal{O}$, interpreted as projections of $\mathcal{H}'$, together with one additional rank-1 projection, namely $|\psi\rangle\langle\psi|$.

What can a value map $v$ for $\mathcal{O}_1$ look like? It must send $|\psi\rangle\langle\psi|$ to one of its eigenvalues, 0 or 1. Suppose first that $v(|\psi\rangle\langle\psi|) = 0$. Then, using the fact that $|\psi\rangle\langle\psi|$ commutes with all the other elements of $\mathcal{O}_1$, we easily compute that what $v$ does to those other elements amounts to a value map for $\mathcal{O}$. But $\mathcal{O}$ was chosen so that it has no value map, and so we cannot have $v(|\psi\rangle\langle\psi|) = 0$. Therefore $v(|\psi\rangle\langle\psi|) = 1$. (It follows that $v$ maps all the other elements of $\mathcal{O}_1$ to zero, but we shall not need this fact.)

We have thus shown that any value map for the finite set $\mathcal{O}_1$ must send $|\psi\rangle\langle\psi|$ to 1. Repeat the argument for another unit vector $|\psi'\rangle$ that is orthogonal to $|\psi\rangle$. There is a finite set $\mathcal{O}_2$ of rank-1 projections such that any value map for $\mathcal{O}_2$ must send $|\psi'\rangle\langle\psi'|$ to 1. No value map can send both $|\psi\rangle\langle\psi|$ and $|\psi'\rangle\langle\psi'|$ to 1, because these two projections commute and $(1, 1)$ is not in their joint spectrum, which contains only $(0, 0)$, $(1, 0)$, and $(0, 1)$. Therefore, there can be no value map for the union $\mathcal{O}_1 \cup \mathcal{O}_2$, which thus serves as the $\mathcal{O}'$ required by the theorem.

The finiteness of $\dim(\mathcal{H}')$ is essential in this theorem.
If the theorem were true for infinite-dimensional H′ , then the same would be the case for Theorem 19, contrary to Remark 20. The next theorem, in contrast, does not require dimensions to be finite. Theorem 22. Let H′ be a Hilbert space and H a closed subspace of H′ . From any probability representation (our version) for quantum systems described by H′ , one can directly construct such a representation for systems described by H.


Strictly speaking, this theorem is vacuous, since Theorem 13 says that there is no probability representation (our version) for quantum systems described by any Hilbert space of dimension $\ge 2$. The intention, however, is that the construction here is considerably easier than that in Theorem 13. In particular, if we knew Theorem 13 only for 2-dimensional $\mathcal{H}$, this would suffice to get the full Theorem 13. This fact supports our assessment, in Section 3, that the careful development and rigorous proofs in [8] are a greater contribution than the extension to infinite-dimensional Hilbert spaces. (Additional support will come later in this section.)

Proof. We construct a probability representation (our version) $\Lambda$, $T$, and $S$ for quantum systems described by $\mathcal{H}$ (with notation as in Definition 11) from any such representation $\Lambda'$, $T'$, and $S'$ for the larger Hilbert space $\mathcal{H}'$. To begin, we set $\Lambda = \Lambda'$. To define $T$ and $S$, we use the inclusion map $i : \mathcal{H} \to \mathcal{H}'$, sending each element of $\mathcal{H}$ to itself considered as an element of $\mathcal{H}'$, and we use the adjoint $p : \mathcal{H}' \to \mathcal{H}$, which is the orthogonal projection of $\mathcal{H}'$ onto $\mathcal{H}$.

Given any density operator $\rho \in \mathcal{T}_{+1}(\mathcal{H})$, we can expand it to a density operator $\bar\rho = i \circ \rho \circ p \in \mathcal{T}_{+1}(\mathcal{H}')$. Note that this expansion is very natural: If $\rho$ corresponds to a pure state $|\psi\rangle \in \mathcal{H}$, i.e., if $\rho = |\psi\rangle\langle\psi|$, then $\bar\rho$ corresponds to the same $|\psi\rangle \in \mathcal{H}'$. If, on the other hand, $\rho$ is a mixture of states $\rho_i$, then $\bar\rho$ is the mixture, with the same coefficients, of the $\bar\rho_i$. Define $T : \mathcal{T}_{+1}(\mathcal{H}) \to \mathcal{M}_{+1}(\Lambda)$ by $T(\rho) = T'(\bar\rho)$.

The definition of $S$ is similar. Notice that, if $E$ is a rank-1 projection in $\mathcal{H}$, then $\bar E = i \circ E \circ p$ is a rank-1 projection in $\mathcal{H}'$. So we can define $S(E) = S'(\bar E)$. Again, the passage from $E$ to $\bar E$ is very natural. If $E$ projects to the one-dimensional subspace spanned by $|\psi\rangle \in \mathcal{H}$, then $\bar E$ projects to the same subspace, now considered as a subspace of $\mathcal{H}'$.

This completes the definition of $\Lambda$, $T$, and $S$. Most of the requirements in Definition 11 are trivial to verify. For the last requirement, the agreement between the expectation computed as a trace in quantum mechanics and the expectation computed as an integral in the probability representation, it is useful to notice first that $p \circ i$ is the identity operator on $\mathcal{H}$. We can then compute, for any $\rho \in \mathcal{T}_{+1}(\mathcal{H})$ and any rank-1 projection $E$ on $\mathcal{H}$,
\[
\begin{aligned}
\int_\Lambda S(E) \, dT(\rho) &= \int_\Lambda S'(\bar E) \, dT'(\bar\rho) \\
&= \operatorname{Tr}(\bar\rho \bar E) \\
&= \operatorname{Tr}(i \circ \rho \circ p \circ i \circ E \circ p) \\
&= \operatorname{Tr}(i \circ \rho \circ E \circ p) \\
&= \operatorname{Tr}(\rho \circ E \circ p \circ i) \\
&= \operatorname{Tr}(\rho \circ E),
\end{aligned}
\]
as required.
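The trace identity at the heart of this proof is easy to confirm numerically. The sketch below is only an illustration under stated assumptions: the dimensions, the random isometry standing in for the inclusion $i$, and the names `i_map`, `p_map`, and `random_unit` are all invented here, and $p$ is taken to be the adjoint of $i$ as in the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
d, d_big = 2, 5  # assumed dimensions of H and of the larger space H'

# Isometry i: H -> H' with orthonormal columns; p is its adjoint.
raw = rng.normal(size=(d_big, d)) + 1j * rng.normal(size=(d_big, d))
i_map, _ = np.linalg.qr(raw)   # reduced QR, so i_map has shape (d_big, d)
p_map = i_map.conj().T

def random_unit(dim):
    """A random unit vector in C^dim."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

# A density operator on H (mixture of two pure states) and a rank-1 projection E.
psi1, psi2, phi = random_unit(d), random_unit(d), random_unit(d)
rho = 0.3 * np.outer(psi1, psi1.conj()) + 0.7 * np.outer(psi2, psi2.conj())
E = np.outer(phi, phi.conj())

# The extensions used in the proof: rho_bar = i rho p and E_bar = i E p.
rho_bar = i_map @ rho @ p_map
E_bar = i_map @ E @ p_map

lhs = np.trace(rho_bar @ E_bar)   # Tr(rho_bar E_bar), computed in H'
rhs = np.trace(rho @ E)           # Tr(rho E), computed in H
```

The two traces agree up to rounding because $p \circ i$ is the identity on $\mathcal{H}$, which is exactly the step used in the displayed computation.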

To finish this section, we briefly discuss the possibility of transferring no-go theorems as in [16, 6, 7] from a Hilbert space $\mathcal{H}$ to a larger space $\mathcal{H}'$. To be specific, we consider probability representations (Spekkens version) as in Definition 5, subject to the assumptions of determinateness ($\xi_E$ depends only on the effect $E$, not on the POVM containing it) and convex-linearity of both of the maps $\rho \mapsto \mu_\rho$ and $E \mapsto \xi_E$.

Proposition 23. Let $\mathcal{H}$ be a closed subspace of the Hilbert space $\mathcal{H}'$. If $\mathcal{H}'$ admits a probability representation (Spekkens version) satisfying determinateness and convex-linearity, then so does $\mathcal{H}$.

Proof. At first, it might seem that we can proceed exactly as in the proof of Theorem 22, transforming the density operators $\rho$ and effects $E$ of the subspace $\mathcal{H}$ to density operators $\bar\rho = i \circ \rho \circ p$ and effects $\bar E = i \circ E \circ p$ on the superspace $\mathcal{H}'$, and then using this transformation to convert a probability representation (Spekkens version) for $\mathcal{H}'$, say $\Lambda', \mu', \xi'$, to one for $\mathcal{H}$. In detail, we would use the same measure space, $\Lambda = \Lambda'$, and we would set $\mu_\rho = \mu'_{\bar\rho}$ and $\xi_E = \xi'_{\bar E}$. This approach works well as far as $\bar\rho$ and $\mu_\rho$ are concerned, but there is a problem with $\bar E$ and $\xi_E$. Definition 5 requires that, if $\{E_k : k \in K\}$ is a POVM, i.e., if the effects $E_k$ have sum $I$, then $\sum_{k \in K} \xi_{E_k}(\lambda) = 1$ for all $\lambda \in \Lambda$. Given that $\xi'$ satisfies this requirement on $\mathcal{H}'$, we want that $\xi$ satisfies it on $\mathcal{H}$. So we would like to argue that, if $\{E_k : k \in K\}$ is a POVM in $\mathcal{H}$, then $\{\bar E_k : k \in K\}$ is a POVM in $\mathcal{H}'$, which would give us that $\sum_k \xi_{E_k}(\lambda) = \sum_k \xi'_{\bar E_k}(\lambda) = 1$. Unfortunately, $\{\bar E_k : k \in K\}$ will not be a POVM for $\mathcal{H}'$ (unless $\mathcal{H} = \mathcal{H}'$). Indeed, using the fact that $\{E_k : k \in K\}$ is a POVM, we can compute
\[ \sum_{k \in K} \bar E_k = \sum_k i \circ E_k \circ p = i \circ \Big(\sum_k E_k\Big) \circ p = i \circ I \circ p = i \circ p. \]


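Here $i \circ p$ is the transformation $i \circ p : \mathcal{H}' \to \mathcal{H}'$ that projects orthogonally to the subspace $\mathcal{H}$; it is not the identity unless $\mathcal{H} = \mathcal{H}'$.

To correct the problem, we modify the definition of $\bar E$ as follows. Fix an arbitrary unit vector $|\alpha\rangle \in \mathcal{H}$. Then define $\bar E$ to be the unique linear operator on $\mathcal{H}'$ such that
\[ \bar E|\psi\rangle = \begin{cases} E|\psi\rangle & \text{if } |\psi\rangle \in \mathcal{H}, \\ \langle\alpha|E|\alpha\rangle \, |\psi\rangle & \text{if } |\psi\rangle \perp \mathcal{H}. \end{cases} \]
In other words, $\bar E$ agrees with $E$ on $\mathcal{H}$ and with a scalar multiple of the identity on the orthogonal complement of $\mathcal{H}$, the multiplier of the identity being $\langle\alpha|E|\alpha\rangle$. Another way to write $\bar E$ uses the operator $I - i \circ p$, which projects $\mathcal{H}'$ onto the orthogonal complement of $\mathcal{H}$; we have
\[ \bar E = i \circ E \circ p + \langle\alpha|E|\alpha\rangle (I - i \circ p). \]
This new version of $\bar E$ overcomes the problem with the old one, because, if $\sum_k E_k = I$, then, because $|\alpha\rangle$ is a unit vector, $\sum_k \langle\alpha|E_k|\alpha\rangle = 1$ and
\[ \sum_k \bar E_k = \sum_k i \circ E_k \circ p + \sum_k \langle\alpha|E_k|\alpha\rangle (I - i \circ p) = i \circ p + 1 \cdot (I - i \circ p) = I. \]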

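Furthermore, this extension process from $E$ to $\bar E$ sends the identity and zero operators on $\mathcal{H}$ to the identity and zero operators on $\mathcal{H}'$, and the process respects weighted averages. Using the new extension process, we define $\xi_E = \xi'_{\bar E}$, and we claim that the result is a probability representation (Spekkens version) for $\mathcal{H}$. The only non-trivial thing to check is the final requirement that the quantum-theoretic expectation values $\operatorname{Tr}(\rho E)$ agree with the hidden-variable theory's expectation values $\int d\lambda \, \mu_\rho(\lambda) \xi_E(\lambda)$. We compute
\[
\begin{aligned}
\int_\Lambda d\lambda \, \mu_\rho(\lambda) \xi_E(\lambda) &= \int_\Lambda d\lambda \, \mu'_{\bar\rho}(\lambda) \xi'_{\bar E}(\lambda) \\
&= \operatorname{Tr}(\bar\rho \bar E) \\
&= \operatorname{Tr}\big(i \rho p \cdot (i E p + \langle\alpha|E|\alpha\rangle (I - i p))\big) \\
&= \operatorname{Tr}(i \rho p \, i E p) + \langle\alpha|E|\alpha\rangle \operatorname{Tr}(i \rho p (I - i p)).
\end{aligned}
\]
The first term here was computed earlier and found to be $\operatorname{Tr}(\rho E)$, which is the desired result, so it remains to check that the second term vanishes. Up to a factor $\langle\alpha|E|\alpha\rangle$, it is
\[ \operatorname{Tr}(i \rho p - i \rho p \, i p) = \operatorname{Tr}(i \rho p - i \rho p) = 0, \]
where we have used that $p i$ is the identity operator of $\mathcal{H}$. This completes the proof of the proposition.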
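The difference between the naive extension $i \circ E \circ p$ and the corrected extension $i \circ E \circ p + \langle\alpha|E|\alpha\rangle(I - i \circ p)$ can be seen in a small numerical experiment. This is a sketch under assumptions chosen for illustration only: a qubit embedded in a 4-dimensional space by a random isometry, the three-outcome "trine" POVM as the test measurement, and the names `extend`, `povm`, `total`, and `naive_total`.

```python
import numpy as np

rng = np.random.default_rng(2)
d, d_big = 2, 4  # assumed dimensions of H and H'

# Isometry i: H -> H' (orthonormal columns) and its adjoint p.
raw = rng.normal(size=(d_big, d)) + 1j * rng.normal(size=(d_big, d))
i_map, _ = np.linalg.qr(raw)
p_map = i_map.conj().T
I_big = np.eye(d_big)
proj_H = i_map @ p_map  # i∘p, the orthogonal projection of H' onto H

alpha = np.array([1.0, 0.0], dtype=complex)  # the arbitrary unit vector in H

def extend(E):
    """Corrected extension: i E p + <alpha|E|alpha> (I - i p)."""
    weight = np.vdot(alpha, E @ alpha).real
    return i_map @ E @ p_map + weight * (I_big - proj_H)

# Trine POVM on the qubit: E_k = (2/3)|v_k><v_k| for three directions 120 degrees apart.
povm = []
for theta in (0.0, 2 * np.pi / 3, 4 * np.pi / 3):
    v = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
    povm.append((2.0 / 3.0) * np.outer(v, v.conj()))

rho = np.diag([0.25, 0.75]).astype(complex)  # a density operator on H
rho_bar = i_map @ rho @ p_map

naive_total = sum(i_map @ E @ p_map for E in povm)  # equals i∘p, not I
total = sum(extend(E) for E in povm)                # equals I on H'
expectations_match = all(
    np.isclose(np.trace(rho_bar @ extend(E)), np.trace(rho @ E)) for E in povm
)
```

With the corrected extension the extended effects again sum to the identity on the larger space, while the expectation values against $\bar\rho$ are unchanged, matching the two computations in the proof.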


Thus, for example, to prove the no-go theorems of Spekkens [16] and of Ferrie and Emerson [6, 7] (with appropriate clarifications as discussed above in Section 3), it would suffice to prove them for two-dimensional Hilbert spaces (in quantum computing terminology, one-qubit spaces); the theorems would automatically carry over to all larger Hilbert spaces. Because of the need for clarifications in these theorems, we give, in Appendix B, a proof of a Spekkens-style no-go theorem for Hilbert spaces of dimension two.

Remark 24. The proof of Proposition 23 involved choosing an arbitrary unit vector $|\alpha\rangle$ in $\mathcal{H}$. This arbitrariness can be avoided when $\mathcal{H}$ is finite-dimensional by averaging over all $|\alpha\rangle$'s. That is, if $\dim(\mathcal{H}) = d$, then we can replace the definition of $\bar E$ in the proof with
\[ \bar E|\psi\rangle = \begin{cases} E|\psi\rangle & \text{if } |\psi\rangle \in \mathcal{H}, \\ \tfrac{1}{d} \operatorname{Tr}(E) \, |\psi\rangle & \text{if } |\psi\rangle \perp \mathcal{H}, \end{cases} \]
and, since $\operatorname{Tr}(I) = d$, the rest of the proof would work as before.

6. Bell's Example and Symmetry

Theorem 13 applies to all Hilbert spaces of dimension at least 2. We cannot expect any sort of no-go result in lower dimensions, because quantum theory in Hilbert spaces of dimensions 0 and 1 is trivial and therefore classical. The second part of Theorem 19 applies only to Hilbert spaces whose dimension is finite and at least 3. We have already indicated in Remark 20 why the theorem fails in infinite dimensions and in the first part of Theorem 19 why a modified version holds in infinite dimensions. What about dimension 2?

Bell has given, in [1, 2], hidden-variable theories for a two-dimensional Hilbert space. More precisely, he has assigned to each pure state $|\psi\rangle$ in such a Hilbert space $\mathcal{H}$ a probability distribution on value maps, such that the resulting probability distributions for any observable agree with the predictions of quantum theory. In this section, we summarize the improved version of Bell's example described by Mermin [12], we simplify part of his argument, and we explain why the example doesn't contradict Theorem 13.
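We work with the Hilbert space $\mathcal{H}$ of 2-component vectors over $\mathbb{C}$, so that operators on $\mathcal{H}$ are given by $2 \times 2$ matrices. Let $\vec\sigma$ be the 3-component “vector” whose entries are the Pauli matrices
\[ \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]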


If $\vec n$ is any 3-component unit vector in $\mathbb{R}^3$, then the dot product $\vec n \cdot \vec\sigma$ is a Hermitian operator with eigenvalues $\pm 1$. Every pure state of $\mathcal{H}$ is an eigenstate, for eigenvalue $+1$, of $\vec n \cdot \vec\sigma$ for a unique $\vec n$. We use the notation $|\vec n\rangle$ for this eigenstate. (If $\mathcal{H}$ represents the states of a spin-$\tfrac{1}{2}$ particle, then the operator $\tfrac{1}{2} \vec n \cdot \vec\sigma$ represents the spin component in the direction $\vec n$, and so $|\vec n\rangle$ represents the state in which the spin is definitely aligned in the direction $\vec n$. It is a special property of spin $\tfrac{1}{2}$ that all pure states are of this form; for higher spins, a superposition of states with definite spin directions need not have a definite spin direction.) Any observable, i.e., any Hermitian operator on $\mathcal{H}$, can be expressed as $A = a_0 I + (\vec a \cdot \vec\sigma)$ for some scalar $a_0 \in \mathbb{R}$ and vector $\vec a \in \mathbb{R}^3$. Its eigenvalues are $a_0 \pm \|\vec a\|$.

The hidden-variable theory, as presented in [12, Section 3], assigns to each state $|\vec n\rangle$ a family of sub-ensembles labeled by unit vectors $\vec m \in \mathbb{R}^3$, the probability distribution of $\vec m$ being uniform on the unit sphere in $\mathbb{R}^3$. In the sub-ensemble of $|\vec n\rangle$ given by $\vec m$, the observable $a_0 I + (\vec a \cdot \vec\sigma)$ has the (definite) value
\[ \begin{cases} a_0 + \|\vec a\| & \text{if } (\vec m + \vec n) \cdot \vec a \ge 0, \\ a_0 - \|\vec a\| & \text{if } (\vec m + \vec n) \cdot \vec a < 0. \end{cases} \]
Mermin writes that elementary integration confirms that, for any fixed state $|\vec n\rangle$, the average over all $\vec m$ of the values assigned to an observable $a_0 I + (\vec a \cdot \vec\sigma)$ agrees with the result $a_0 + (\vec a \cdot \vec n)$ predicted by quantum mechanics. In fact, the required integration is so elementary that it was done by Archimedes. All one needs is the theorem that, when a sphere is cut by a plane, its area is divided in the same ratio as the length of the diameter perpendicular to the plane.

To verify that the average over $\vec m$ of the values of $a_0 I + (\vec a \cdot \vec\sigma)$ in the state $|\vec n\rangle$ is $a_0 + (\vec a \cdot \vec n)$, we begin with a couple of simplifications. First, we may assume that $a_0 = 0$, because a general $a_0$ would just be added to both sides of the equation that we are trying to prove. Second, thanks to the rotational symmetry of the situation (where any rotation is applied to all three of $\vec a$, $\vec n$ and $\vec m$), we may assume that the vector $\vec a$ points in the $z$-direction. Finally, by scaling, we may assume that $\vec a = (0, 0, 1)$. So our task is to prove that the average over $\vec m$ of the values assigned to $\sigma_z$ is $n_z$.

By definition, the value assigned to $\sigma_z$ is $\pm 1$, where the sign is chosen to agree with that of $m_z + n_z$. In view of how $\vec m$ is chosen, this $m_z + n_z$ is the $z$-coordinate of a random point on the unit sphere centered at $\vec n$. So the question reduces to determining what fraction of this sphere lies above the $x$-$y$ plane. This plane cuts this unit sphere horizontally

at a level $n_z$ below the sphere's center. So, by Archimedes's theorem, it divides the sphere's area in the ratio of $1 + n_z$ (above the plane) to $1 - n_z$ (below the plane). That is, the value assigned to $\sigma_z$ is +1 with probability $(1 + n_z)/2$ and −1 with probability $(1 - n_z)/2$. Thus, the average value of $\sigma_z$ is $n_z$, as required.

This hidden-variable theory can be viewed in the framework of Section 4. Each of the vectors $\vec m + \vec n$ corresponds to a value map, namely the map sending any observable $a_0 I + (\vec a\cdot\vec\sigma)$ to the value described above. It is not difficult to verify that this is indeed a value map, because there are so few commuting observables for our 2-dimensional H. Two observables commute if and only if their $\vec a$'s are parallel or antiparallel. That is, they differ by only a scalar factor on the $\vec a\cdot\vec\sigma$ part and an arbitrary change of the $a_0 I$ part. The mere existence of a value map (let alone a good probability distribution on value maps for all the states) shows that, in Theorem 19, the hypothesis of dimension ≥ 3 cannot be weakened so as to allow dimension 2.

What happens if we try to fit this hidden-variable theory into the framework of Section 3? A natural choice for Λ is the space of all the value maps obtained above, or, more geometrically, the space of their parametrizations $\vec m + \vec n$. Since both $\vec m$ and $\vec n$ are unit vectors, Λ will be the ball of radius 2 centered at the origin of $\mathbb{R}^3$. For any pure state $|\vec n\rangle$, the associated probability distribution $T(|\vec n\rangle\langle\vec n|)$ is the uniform distribution on the two-dimensional surface of a unit sphere centered at $\vec n$, because we are choosing $\vec m$ randomly while $\vec n$ is fixed. Notice that the framework of Definition 5 does not handle this situation well, because these probability distributions are not absolutely continuous with respect to any natural probability distribution on Λ. (What a physicist might call the probability density on Λ associated to a state is not a function but a distribution.)
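The agreement between the hidden-variable average and the quantum prediction $a_0 + (\vec a\cdot\vec n)$ can also be checked numerically. The following Python sketch is only an illustration, not part of the original argument: it samples $\vec m$ uniformly on the unit sphere (using the same Archimedes fact, that the z-coordinate of a uniform point is uniform on [−1, 1]) and averages the assigned values. The function names, the sample size and the seed are assumptions chosen for this example.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform point on the unit sphere in R^3.

    By Archimedes' theorem the z-coordinate is uniform on [-1, 1];
    the azimuthal angle is uniform on [0, 2*pi).
    """
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def assigned_value(a0, a, n, m):
    """Value of a0*I + a.sigma in the sub-ensemble m of the state |n>."""
    norm_a = math.sqrt(dot(a, a))
    m_plus_n = tuple(mi + ni for mi, ni in zip(m, n))
    return a0 + norm_a if dot(m_plus_n, a) >= 0 else a0 - norm_a


def hidden_variable_average(a0, a, n, samples=100_000, seed=0):
    """Monte Carlo average over m of the assigned values (illustrative)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += assigned_value(a0, a, n, random_unit_vector(rng))
    return total / samples


def quantum_expectation(a0, a, n):
    """Quantum prediction a0 + a.n for the state |n>."""
    return a0 + dot(a, n)


if __name__ == "__main__":
    n = (0.6, 0.0, 0.8)          # unit vector labelling the state
    a0, a = 0.5, (0.0, 0.0, 1.0)  # the observable 0.5*I + sigma_z
    print(hidden_variable_average(a0, a, n), quantum_expectation(a0, a, n))
```

The two printed numbers should agree up to sampling error, which shrinks like the inverse square root of the number of samples.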
So we work instead with the framework of Ferrie, Morris, and Emerson [8], as summarized in Definition 10 above or with the more liberal Definition 11. Both of these definitions require a convex-linear map T from the set $T_{+1}$ of density matrices (representing mixed states) to the set $M_{+1}$ of probability measures on Λ. The hidden-variable theory under consideration has, so far, provided measures only for the pure states, i.e., the density matrices of the special form $|\vec n\rangle\langle\vec n|$; to such a density matrix, it associated the uniform measure on the unit sphere surface centered at $\vec n$. To obtain a probability representation, in either the Ferrie-Morris-Emerson version or our version, we must extend this map convex-linearly to all density matrices.
No such extension exists. Here is an example showing what goes wrong. Consider the four pure states corresponding to spin in the directions of the positive x, negative x, positive z and negative z axes. The corresponding density operators are the projections
\[ \frac{I + \sigma_x}{2}, \quad \frac{I - \sigma_x}{2}, \quad \frac{I + \sigma_z}{2}, \quad \frac{I - \sigma_z}{2}, \]
respectively. Averaging the first two with equal weights, we get $\frac12 I$; averaging the last two gives the same result. So a convex-linear extension T would have to assign to the density operator $\frac12 I$ the average of the probability measures assigned to the pure states with spins in the ±x directions and also the average of the probability measures assigned to pure states with spins in the ±z directions. But these two averages are visibly very different. The first is concentrated on the union of two unit spheres tangent to the y-z-plane at the origin, while the second is concentrated on the union of two unit spheres tangent to the x-y-plane at the origin.

Thus, Bell's example of a hidden-variable theory for 2-dimensional H does not fit the assumptions in any of the expectation no-go theorems. It does not, therefore, clash with the fact that those theorems, unlike the value no-go theorems, apply in the 2-dimensional case.

Another way to view this situation is as a demonstration that the hypothesis of convex-linearity cannot be omitted from the expectation no-go theorems. In comparison with Definition 10, which described the hypotheses used by Ferrie, Morris, and Emerson [8], our Definition 11 dropped the requirement of convex-linearity for effects; Bell's example shows that we cannot also drop that requirement for states.

In view of the idea of symmetry or even-handedness suggested by Spekkens [16], one might ask whether there is a dual version of Theorem 13, that is, a version that requires convex-linearity for effects but looks only at pure states and does not require any convex-linearity for states. The answer is no; with such requirements there is a trivial example of a successful hidden-variable theory, regardless of the dimension of the Hilbert space, so there cannot be a no-go theorem. The example can be concisely described as taking the quantum state itself as the "hidden" variable.
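Returning briefly to the obstruction found earlier in this section for Bell's example: the following Python sketch, offered only as an illustration, checks that the equal mixtures of the ±x projections and of the ±z projections are the same density operator, and exhibits a point of Λ that lies on one of the spheres carrying the first averaged measure but on neither sphere carrying the second. The helper names (`lin`, `projector`, `on_unit_sphere`) and the chosen test point are assumptions made for this example.

```python
import math

# 2x2 real matrices as nested tuples (sigma_y is not needed here).
I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))
SZ = ((1, 0), (0, -1))


def lin(c1, A, c2, B):
    """Entrywise linear combination c1*A + c2*B of two 2x2 matrices."""
    return tuple(
        tuple(c1 * A[i][j] + c2 * B[i][j] for j in range(2)) for i in range(2)
    )


def projector(sign, sigma):
    """Density operator (I + sign*sigma)/2 of the pure state with spin along +/- that axis."""
    return lin(0.5, I2, 0.5 * sign, sigma)


# Equal-weight mixtures of the +/-x states and of the +/-z states.
mix_x = lin(0.5, projector(+1, SX), 0.5, projector(-1, SX))
mix_z = lin(0.5, projector(+1, SZ), 0.5, projector(-1, SZ))


def on_unit_sphere(point, center):
    """True if `point` lies on the unit sphere centered at `center`."""
    return math.isclose(math.dist(point, center), 1.0)


# The point m + n with m = n = (1, 0, 0).
point = (2.0, 0.0, 0.0)
in_x_support = on_unit_sphere(point, (1, 0, 0)) or on_unit_sphere(point, (-1, 0, 0))
in_z_support = on_unit_sphere(point, (0, 0, 1)) or on_unit_sphere(point, (0, 0, -1))
```

Here `mix_x` and `mix_z` coincide, while `in_x_support` and `in_z_support` differ, so no single measure can be assigned consistently to $\frac12 I$.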
In more detail, let Λ be the set of all states, i.e., the projective space obtained from the set of unit vectors of H by identifying any two that differ only by a phase factor. Let T assign to each pure state $|\psi\rangle\langle\psi|$ the probability measure on Λ concentrated at the point $\lambda_{|\psi\rangle}$ that corresponds to the vector $|\psi\rangle$. Let S assign to each effect E the function on Λ defined by
\[ S(E)(\lambda_{|\psi\rangle}) = \langle\psi|E|\psi\rangle. \]
We have trivially arranged for this to give the correct expectation for any effect E and any pure state $|\psi\rangle$. The formula for S(E) is clearly convex-linear (in fact, linear) as a function of E. Of course, T cannot be extended convex-linearly to mixed states, so that Theorem 13 does not apply.

Appendix A. Convex-Linearity

As we pointed out, near the end of Section 3.1, Spekkens [16] erroneously claims that, if a function f is convex-linear on a convex set S of operators that span the space of Hermitian operators (and f takes the value zero on the zero operator if the latter is in S), then f can be uniquely extended to a linear function on this space. The correct version of the result extends f not to a linear function but to a translated-linear function, i.e., a composition of translations and a linear function. The rest of this section is devoted to a proof of this fact, in its natural level of generality. It applies to arbitrary real vector spaces; that the space consists of Hermitian operators is irrelevant.

The convex hull, Conv(S), of a subset S of a real vector space V consists of the convex combinations $a_1 v_1 + \cdots + a_n v_n$ of vectors $v_1, \ldots, v_n \in S$ where $a_1 + \cdots + a_n = 1$ and every $a_i \ge 0$. The affine hull, Aff(S), of S consists of the affine combinations $a_1 v_1 + \cdots + a_n v_n$ of vectors $v_1, \ldots, v_n \in S$ where $a_1 + \cdots + a_n = 1$ but some coefficients $a_i$ may be negative. A set is convex if it contains all the convex combinations of its members; similarly, it is an affine space if it contains all the affine combinations of its members. An easy computation shows that convex hulls are convex and affine hulls are affine spaces; that is, Conv(Conv(S)) = Conv(S) and Aff(Aff(S)) = Aff(S).

An affine space A in a vector space V is said to be parallel to a linear subspace L of V if $A = u_0 + L = \{u_0 + v : v \in L\}$ for some $u_0 \in V$.
It is easy to see that, if an affine space A is parallel to a linear space L as above, then (i) L is unique, (ii) $u_0 \in A$, (iii) any vector in A can play the role of the translator $u_0$, and (iv) A is either equal to L or disjoint from L.

Lemma 25 (§1 in [14]). Any affine subspace A of a real vector space V is parallel to a linear subspace L of V.

In other words, any affine subspace is a translation of a linear subspace. For example, in $\mathbb{R}^2$, we have that Aff{(0, 1), (1, 0)} is parallel
to the diagonal y = −x, and Aff{(0, 1), (1, 0), (1, 1)} is (and thus is parallel to) $\mathbb{R}^2$.

Proof. If A contains the zero vector $\vec 0$ then it is a linear subspace. Indeed, if v ∈ A then any multiple $av = av + (1 - a)\vec 0 \in A$. And if u, v ∈ A then $u + v = 2(\frac12 u + \frac12 v) \in A$.

For the general case, let $u_0$ be any vector in the affine space A. It suffices to show that $L = \{v - u_0 : v \in A\}$ is an affine space, because then the preceding paragraph shows that it is a linear space, and clearly $A = u_0 + L$. Any affine combination $a_1(v_1 - u_0) + \cdots + a_n(v_n - u_0)$ of vectors in L (so the $v_i$ are in A and the sum of the $a_i$ is 1) can be rewritten as $(a_1 v_1 + \cdots + a_n v_n) - u_0$, which is in L.

Let V and W be real vector spaces, S a subset of V, C = Conv(S) its convex hull, and A = Aff(S) its affine hull. Recall that a transformation f : C → W is convex-linear on S if $f(a_1 v_1 + \cdots + a_n v_n) = a_1 f(v_1) + \cdots + a_n f(v_n)$ for any convex combination $a_1 v_1 + \cdots + a_n v_n$ of vectors $v_i$ from S. A transformation f : A → W is translated-linear if it has the form $f(v) = w_0 + h(v - u_0)$ for some $w_0 \in W$, some $u_0 \in A$, and some linear function h : L → W defined on the linear space $L = A - u_0$ parallel to A.

Proposition 26. With notation as above, any transformation f : C → W that is convex-linear on S has a unique extension to a translated-linear function on A.

Proof. Notice first that translations $v \mapsto v - u_0$ and linear functions both preserve affine combinations. A translated-linear function, being the composition of two translations and a linear function, therefore also preserves affine combinations. This observation implies the uniqueness part of the proposition. Indeed, every element of A is an affine combination $a_1 s_1 + \cdots + a_n s_n$ of elements of S, and therefore any translated-linear extension of f must map it to $a_1 f(s_1) + \cdots + a_n f(s_n)$.

To prove the existence part of the proposition, it will be useful to work with the graphs of functions.
For any function g : S → W with S ⊆ V, its graph is the subset of V ⊕ W consisting of the pairs (s, g(s)) for s ∈ S.^9 We record for future reference that the graph of g is a linear subspace of V ⊕ W if and only if the domain of g is a linear subspace of V and g is a linear transformation from that domain to W. We also note that the projection $\pi : V \oplus W \to V : (v, w) \mapsto v$ is a linear transformation that sends the graph of any g to the domain of g.

^9 In set-theoretic foundations, a function is usually defined as a set of ordered pairs, and so g is the same thing as its graph.

In the situation of the proposition, let f : C → W be a transformation that is convex-linear on S, and let F ⊆ V ⊕ W be its graph. Also, let $F^-$ be the graph of the restriction of f to S. Notice that the convex-linearity of f on S means exactly that F is the convex hull of $F^-$. It follows that F and $F^-$ have the same affine hull, because
\[ \mathrm{Aff}(F) = \mathrm{Aff}(\mathrm{Conv}(F^-)) \subseteq \mathrm{Aff}(\mathrm{Aff}(F^-)) = \mathrm{Aff}(F^-) \subseteq \mathrm{Aff}(F). \]

We claim that this affine hull $\mathrm{Aff}(F^-)$ is the graph of a function; that is, it does not contain two distinct elements (v, w) and (v, w′) with the same first component v. To see this, suppose we had two such elements in $\mathrm{Aff}(F) = \mathrm{Aff}(F^-)$, say
\[ (v, w) = a_1 (s_1, f(s_1)) + \cdots + a_m (s_m, f(s_m)) \]
and
\[ (v, w') = b_1 (t_1, f(t_1)) + \cdots + b_n (t_n, f(t_n)), \]
where all the $s_i$'s and $t_j$'s are in S and where
\[ a_1 + \cdots + a_m = b_1 + \cdots + b_n, \tag{2} \]
because both sides are equal to 1. So we have
\[ a_1 s_1 + \cdots + a_m s_m = b_1 t_1 + \cdots + b_n t_n, \tag{3} \]
because both sides are equal to v, and we want to prove w = w′, i.e.,
\[ a_1 f(s_1) + \cdots + a_m f(s_m) = b_1 f(t_1) + \cdots + b_n f(t_n). \tag{4} \]

In the special case where all coefficients $a_i$ and $b_j$ are ≥ 0, the vector v is in C and both sides of (4) are equal to f(v). The general case reduces to this special case as follows. In all three equations (2)–(4), move every summand with a negative coefficient to the other side, and then divide the resulting equations by the left side of the rearranged equation (2). As a result we return to the special case already treated. Since the old version of (4) follows from the new one, this completes the proof of our claim that $\mathrm{Aff}(F) = \mathrm{Aff}(F^-)$ is the graph of a function.

By Lemma 25, the affine space Aff(F) is parallel to a linear subspace H of V ⊕ W, say $\mathrm{Aff}(F) = (u_0, w_0) + H$, where $u_0 \in V$ and $w_0 \in W$. From the fact that Aff(F) is the graph of a function, it follows immediately that H is also the graph of a function. Indeed, if H contains (v, w) and (v, w′), then Aff(F) contains $(v + u_0, w + w_0)$ and $(v + u_0, w' + w_0)$, so $w + w_0 = w' + w_0$ and w = w′. Let h be the function whose graph is H. Because H is a linear subspace of V ⊕ W, we know that h is a linear transformation from some linear subspace L of V into W.
The fact that $(u_0, w_0) + H = \mathrm{Aff}(F)$ tells us, by applying the linear projection π : V ⊕ W → V, that $u_0 + L$ equals
\[ \pi(\mathrm{Aff}(F)) = \mathrm{Aff}(\pi(F)) = \mathrm{Aff}(C) = A, \]
where the first equality comes from linearity of π and the second from the fact that F is the graph of the function f whose domain is C. So A is parallel to the linear subspace L of V.

Furthermore, for each v ∈ C, we have $(v, f(v)) \in F \subseteq \mathrm{Aff}(F) = (u_0, w_0) + H$, so $(v - u_0, f(v) - w_0)$ is in the graph H of h. That is, $h(v - u_0) = f(v) - w_0$ and so $f(v) = w_0 + h(v - u_0)$. Thus, the translated-linear function $v \mapsto w_0 + h(v - u_0)$ is the desired extension of f.
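As a concrete illustration of Proposition 26 (not part of the proof), the following Python sketch extends a function given on three affinely independent points of $\mathbb{R}^2$ to their affine hull, which is all of $\mathbb{R}^2$, by means of affine (barycentric) coefficients. The set S, the values of f on S, and all function names are assumptions chosen for this example; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def barycentric(v, s0, s1, s2):
    """Affine coefficients (a0, a1, a2), summing to 1, with v = a0*s0 + a1*s1 + a2*s2.

    Solves v - s0 = a1*(s1 - s0) + a2*(s2 - s0) by Cramer's rule.
    """
    (x, y), (x0, y0), (x1, y1), (x2, y2) = v, s0, s1, s2
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("points are not affinely independent")
    a1 = Fraction((x - x0) * (y2 - y0) - (x2 - x0) * (y - y0), det)
    a2 = Fraction((x1 - x0) * (y - y0) - (x - x0) * (y1 - y0), det)
    return (1 - a1 - a2, a1, a2)


# Hypothetical data: three points of R^2 and the values of f on them.
S = [(1, 0), (0, 1), (1, 1)]
F_ON_S = [Fraction(2), Fraction(3), Fraction(7)]


def extend(v):
    """The unique affine-combination-preserving extension of f to Aff(S) = R^2."""
    coeffs = barycentric(v, *S)
    return sum(a * w for a, w in zip(coeffs, F_ON_S))


def h(d, u0=(1, 0)):
    """Linear part: h(d) = extend(u0 + d) - extend(u0), so extend(v) = w0 + h(v - u0)."""
    shifted = (u0[0] + d[0], u0[1] + d[1])
    return extend(shifted) - extend(u0)
```

In this example the extension works out to 4x + 5y − 2, so it sends the origin to −2 rather than 0: it is translated-linear but not linear, which is exactly the distinction the proposition draws.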

Remark 27. A linear function h on a subspace L of a vector space V can be extended to a linear function $\bar h$ on all of V. Extend any basis of L to a basis of V, define $\bar h$ arbitrarily on the new basis vectors that are not in L, and extend the resulting function by linearity to all of V. For transformations defined on all of V, we have a simpler formula for translated-linear functions, because
\[ w_0 + \bar h(v - u_0) = w_0 + \bar h(v) - \bar h(u_0) = \bar h(v) + w_1, \]
where $w_1 = w_0 - \bar h(u_0)$.

On the other hand, in contrast to Proposition 26, this $\bar h$ is not unique (unless L = V). Also, in the case of infinite-dimensional spaces, the extension process requires the axiom of choice (to extend bases) and need not be well-behaved with respect to natural topologies on the vector spaces.

Appendix B. No-Go Theorem for Spekkens Version

This appendix is devoted to proving the following no-go theorem for the original Spekkens version of probability representations, subject to the clarifications discussed in Section 3.1.

Theorem 28. For a Hilbert space H of dimension at least two, there is no probability representation (Spekkens version) subject to determinateness and convex-linearity.

Proof. In view of Proposition 23, it suffices to prove the theorem under the assumption that H has dimension exactly two.

To begin, we recall the form of density operators and effects in a two-dimensional Hilbert space H. A basis for the Hermitian operators
on H is given by the identity and the three Pauli matrices
\[ I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \sigma_x = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \sigma_y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad \sigma_z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]
It will be convenient to use vector notation, denoting the triple of matrices $(\sigma_x, \sigma_y, \sigma_z)$ by $\vec\sigma$. Then the general Hermitian matrix looks like
\[ wI + x\sigma_x + y\sigma_y + z\sigma_z = wI + \vec x\cdot\vec\sigma, \]
where w and the three components of $\vec x$ are real numbers. The eigenvalues of this Hermitian matrix are
\[ w \pm \sqrt{x^2 + y^2 + z^2} = w \pm \|\vec x\|. \]
In particular, the trace of this matrix is 2w, and the matrix is positive if and only if $w \ge \|\vec x\|$.

Density matrices are the Hermitian, positive matrices of trace 1, so they have the form
\[ \rho = \rho(\vec x) = \frac12 (I + \vec x\cdot\vec\sigma), \]
where $\|\vec x\| \le 1$. As indicated by the notation, we parametrize these density matrices by three-component vectors $\vec x$ of norm ≤ 1. The three-dimensional ball that serves as the parameter space here is called the Bloch sphere (with its interior).

Similarly, effects have the form
\[ E = E(m, \vec p) = mI + p\sigma_x + q\sigma_y + r\sigma_z = mI + \vec p\cdot\vec\sigma \]
with
\[ \|\vec p\| \le m \le 1 - \|\vec p\| \]
(because E and I − E are positive operators) and therefore $\|\vec p\| \le \frac12$. The parameter space here, consisting of all four-component vectors satisfying these inequalities, is a double cone over a three-dimensional ball of radius 1/2.

We record for future reference the traces
\[ \mathrm{Tr}(I) = 2, \qquad \mathrm{Tr}(\sigma_x) = \mathrm{Tr}(\sigma_y) = \mathrm{Tr}(\sigma_z) = 0 \]
and the multiplication table
\[ \sigma_x\sigma_y = -\sigma_y\sigma_x = i\sigma_z, \qquad \sigma_y\sigma_z = -\sigma_z\sigma_y = i\sigma_x, \qquad \sigma_z\sigma_x = -\sigma_x\sigma_z = i\sigma_y, \]
and
\[ \sigma_x^2 = \sigma_y^2 = \sigma_z^2 = I. \]
From these facts, it is easy to compute that
\[ \mathrm{Tr}(\rho(\vec x)E(m, \vec p)) = m + \vec x\cdot\vec p, \]
where the factor 1/2 in the definition of $\rho(\vec x)$ has cancelled the factor 2 arising from Tr(I).

Given this background information, we are ready to prove Theorem 28. Suppose, toward a contradiction, that we have a probability representation (Spekkens version) satisfying determinateness and convex-linearity, for a two-dimensional H. In view of Proposition 26, we know that
\[ \mu_{\rho(\vec x)}(\lambda) = \vec x\cdot\vec A(\lambda) + C(\lambda) \]
and
\[ \xi_{E(m,\vec p)}(\lambda) = \vec p\cdot\vec B(\lambda) + mD(\lambda) + F(\lambda) \]
for some nine functions $A_i(\lambda)$, $B_i(\lambda)$, $C(\lambda)$, $D(\lambda)$, $F(\lambda)$ where the index i ranges from 1 to 3. (The "translated" part of "translated-linear" accounts for C and F.)

The definition of probability representation (Spekkens version) leads to some simplifications. $E(0, \vec 0)$ is the zero operator, whose associated ξ function is required to be identically zero. That gives us $F(\lambda) = 0$ for all λ, so we can simply omit F from the formula for ξ. Also, $E(1, \vec 0)$ is the identity operator, whose associated ξ function is required to be identically 1. That gives us $D(\lambda) = 1$ for all λ. So we can simplify the ξ formula above to read
\[ \xi_{E(m,\vec p)}(\lambda) = \vec p\cdot\vec B(\lambda) + m. \]
Next, consider the requirement that
\[ \mathrm{Tr}(\rho(\vec x)E(m, \vec p)) = \int \xi_{E(m,\vec p)}\,\mu_{\rho(\vec x)}\,d\lambda. \]
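We already evaluated the trace on the left side of this equation at the end of the preceding section. The integral on the right side is
\[ \int \bigl[(\vec p\cdot\vec B(\lambda))(\vec x\cdot\vec A(\lambda)) + (\vec p\cdot\vec B(\lambda))C(\lambda) + m(\vec x\cdot\vec A(\lambda)) + mC(\lambda)\bigr]\,d\lambda. \]
Comparing the trace and the integral, and equating coefficients of the various monomials in m, $\vec p$, and $\vec x$, we find that
\[ \int B_i(\lambda)A_j(\lambda)\,d\lambda = \delta_{i,j}, \tag{5} \]
\[ \int B_i(\lambda)C(\lambda)\,d\lambda = 0, \tag{6} \]
\[ \int A_i(\lambda)\,d\lambda = 0, \tag{7} \]
and
\[ \int C(\lambda)\,d\lambda = 1. \tag{8} \]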
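The left side of this comparison rests on the trace identity $\mathrm{Tr}(\rho(\vec x)E(m, \vec p)) = m + \vec x\cdot\vec p$. As a sanity check, and only as an illustration, the following Python sketch builds the 2 × 2 matrices explicitly and evaluates the trace numerically. The helper names (`combo`, `matmul`, `trace`, `rho`, `effect`) and the sample parameter values are assumptions made for this example.

```python
# 2x2 complex matrices as nested tuples.
I2 = ((1, 0), (0, 1))
SX = ((0, 1), (1, 0))
SY = ((0, -1j), (1j, 0))
SZ = ((1, 0), (0, -1))


def combo(w, vec):
    """Return w*I + vec . sigma as a 2x2 tuple of complex numbers."""
    x, y, z = vec
    return tuple(
        tuple(
            w * I2[i][j] + x * SX[i][j] + y * SY[i][j] + z * SZ[i][j]
            for j in range(2)
        )
        for i in range(2)
    )


def matmul(A, B):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def trace(A):
    return A[0][0] + A[1][1]


def rho(x):
    """Density matrix (I + x . sigma)/2 for a Bloch vector x with |x| <= 1."""
    return combo(0.5, tuple(0.5 * c for c in x))


def effect(m, p):
    """Effect m*I + p . sigma; a genuine effect when |p| <= m <= 1 - |p|."""
    return combo(m, p)


def trace_identity_gap(x, m, p):
    """|Tr(rho(x) E(m, p)) - (m + x . p)|, which should vanish up to rounding."""
    lhs = trace(matmul(rho(x), effect(m, p)))
    rhs = m + sum(xi * pi for xi, pi in zip(x, p))
    return abs(lhs - rhs)
```

For instance, `trace_identity_gap((0.3, -0.4, 0.5), 0.5, (0.1, 0.2, -0.2))` should be zero up to floating-point rounding.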

Next, we extract as much information as we can from the assumption that all the functions $\mu_\rho$ and $\xi_E$ are nonnegative.

In the case of $\xi_E$, this means that, as long as $\|\vec p\| \le m,\ 1 - m$ (so that $E(m, \vec p)$ is an effect), we must have $m + \vec p\cdot\vec B(\lambda) \ge 0$ for all λ. Temporarily consider a fixed λ and a fixed $m \in [0, \frac12]$. To get the most information out of the inequality $m + \vec p\cdot\vec B(\lambda) \ge 0$, we choose the "worst" vector $\vec p$, i.e., we make $\vec p\cdot\vec B(\lambda)$ as negative as possible, by choosing $\vec p$ in the opposite direction to $\vec B(\lambda)$ and with the largest permitted magnitude, namely m. That is, we take
\[ \vec p = -\frac{m}{\|\vec B(\lambda)\|}\,\vec B(\lambda) \]
so that our inequality becomes $0 \le m(1 - \|\vec B(\lambda)\|)$, and therefore
\[ \|\vec B(\lambda)\| \le 1 \quad \text{for all } \lambda. \]
Repeating the exercise for $m \in [\frac12, 1]$ gives no new information.

So we turn to the case of $\mu_{\rho(\vec x)}$, for which the nonnegativity requirement reads
\[ \vec x\cdot\vec A(\lambda) + C(\lambda) \ge 0. \]
For each fixed λ, we consider the "worst" $\vec x$, namely a vector $\vec x$ in the direction opposite to $\vec A(\lambda)$ and with the maximum allowed magnitude, namely 1. So we take
\[ \vec x = -\frac{\vec A(\lambda)}{\|\vec A(\lambda)\|} \]
and obtain the inequality $0 \le -\|\vec A(\lambda)\| + C(\lambda)$. Thus, we have
\[ \|\vec A(\lambda)\| \le C(\lambda) \quad \text{for all } \lambda. \]
In particular, $C(\lambda)$ is everywhere nonnegative.

A trivial consequence of $\|\vec A(\lambda)\| \le C(\lambda)$ is that $|A_1(\lambda)| \le C(\lambda)$. Similarly, a trivial consequence of $\|\vec B(\lambda)\| \le 1$ is $|B_1(\lambda)| \le 1$. Putting this information into the i = j = 1 case of equation (5), and also using (8), we find that
\[ 1 = \int B_1(\lambda)A_1(\lambda)\,d\lambda \le \int |B_1(\lambda)|\cdot|A_1(\lambda)|\,d\lambda \le \int 1\cdot C(\lambda)\,d\lambda = 1. \]
So both of the inequalities here must be equalities. In particular, $|B_1(\lambda)| = 1$ for almost all λ except where $C(\lambda) = 0$. Similarly, we get that, for almost all λ except where $C(\lambda) = 0$, we also have $|B_2(\lambda)| = |B_3(\lambda)| = 1$ and therefore $\|\vec B(\lambda)\| = \sqrt 3$. Since we also know $\|\vec B(\lambda)\| \le 1$, we must conclude that $C(\lambda) = 0$ almost
everywhere. But that contradicts equation (8), and so the proof of the no-go theorem is complete.

References

[1] John S. Bell, "On the Einstein-Podolsky-Rosen paradox," Physics 1 (1964) 195–200.
[2] John S. Bell, "On the problem of hidden variables in quantum mechanics," Reviews of Modern Physics 38 (1966) 447–452.
[3] Michael S. Birman and Michael Z. Solomjak, Spectral Theory of Self-Adjoint Operators in Hilbert Space (Russian), Leningrad University Press (1980). English translation, Reidel (1987).
[4] Slawomir Bugajski, "Classical frames for a quantum theory—a bird's-eye view," International Journal of Theoretical Physics 32 (1993) 969–977.
[5] "Dual of the space of finite measures," http://math.stackexchange.com/questions/74875
[6] Christopher Ferrie and Joseph Emerson, "Frame representations of quantum mechanics and the necessity of negativity in quasi-probability representations," Journal of Physics A: Mathematical and Theoretical 41 (2008) 352001, also arXiv:0711.2658.
[7] Christopher Ferrie and Joseph Emerson, "Framed Hilbert space: hanging the quasi-probability pictures of quantum theory," New Journal of Physics 11 (2009) 063040, also arXiv:0903.4843.
[8] Christopher Ferrie, Ryan Morris and Joseph Emerson, "Necessity of negativity in quantum theory," Physical Review A 82 (2010) 044103, also arXiv:0910.3198.
[9] Teiko Heinosaari, "A simple sufficient condition for the coexistence of quantum effects," Journal of Physics A: Mathematical and Theoretical 46 (2013) 152002.
[10] Teiko Heinosaari, Jukka Kiukas, and Daniel Reitzner, "Coexistence of effects from an algebra of two projections," Journal of Physics A: Mathematical and Theoretical 47 (2014) 225301.
[11] Simon Kochen and Ernst Specker, "The problem of hidden variables in quantum mechanics," Journal of Mathematics and Mechanics 17 (1967) 59–87.
[12] N. David Mermin, "Hidden variables and the two theorems of John Bell," Reviews of Modern Physics 65 (1993) 803–815.
[13] Michael A. Nielsen and Isaac A. Chuang, Quantum Computation and Quantum Information, Cambridge University Press (2000).
[14] R. Tyrrell Rockafellar, Convex Analysis, Princeton University Press (1970).
[15] Paul Skoufranis, "Trace class operators," http://www.math.tamu.edu/~pskoufra/OANotes-TraceClassOperators.pdf
[16] Robert W. Spekkens, "Negativity and contextuality are equivalent notions of nonclassicality," Physical Review Letters 101(2) (2008) 020401, also arXiv:0710.5549.
[17] Robert W. Spekkens, "Negativity and contextuality are equivalent notions of nonclassicality," arXiv:0710.5549v2.
[18] Johann von Neumann, Mathematische Grundlagen der Quantenmechanik, Grundlehren der mathematischen Wissenschaften 38, Springer-Verlag (1932). English translation, Mathematical Foundations of Quantum Mechanics, translated by Robert Beyer, Princeton University Press (1955).
[19] Wikipedia, "POVM", retrieved 2 August 2015, https://en.wikipedia.org/?oldid=660642371
[20] John Watrous, CS 766/QIC 820 Theory of Quantum Information (Fall 2011) at https://cs.uwaterloo.ca/~watrous/LectureNotes.html

Mathematics Department, University of Michigan, Ann Arbor, MI 48109–1043, U.S.A.
E-mail address: [email protected]

Microsoft Research, One Microsoft Way, Redmond, WA 98052, U.S.A.
E-mail address: [email protected]