Strongly Symmetric Spectral Convex Sets are Jordan Algebra State Spaces

The title of this post is also the title of my most recent paper on the arXiv, from 2019, with Joachim Hilgert of the University of Paderborn. The published version is titled Spectral Properties of Convex Bodies, Journal of Lie Theory 30 (2020) 315-355.

Here's a preprint version of the J Lie Theory article:

Compared to the arXiv version, the J Lie Theory version (and the above-linked preprint) has less detail about the background results by other authors which we use (mainly Jiri Dadok's theory of polar representations of compact Lie groups, and the Madden-Robertson theory of regular convex bodies), and a bit more detail in the proof of our main result. It also has a more extensive discussion of infinite-dimensional Jordan-algebraic systems, in the context of discussing ways in which the Jordan-algebraic systems can be narrowed down to the complex quantum ones. I like the title of the arXiv version better, since it states the main result of the paper. But Joachim was scheduled to give a talk at the celebration of Jimmie Lawson's 75th birthday in 2018, while we were writing up the main result; he gave a talk on our work titled Spectral Properties…, and since we decided to publish the paper in the proceedings, which are a special issue of J Lie Theory, it ended up with the same title as the talk.

Here are slides from a talk on the result I gave to an audience of mathematicians and mathematical physicists:

MadridOperatorAlgebras2019

Although the paper's main theorem is a result in pure mathematics, and, I think, interesting even purely from that point of view, it is also a result in the generalized probabilistic theories (GPT) framework for formulating physical theories from a very general point of view, which describes physical systems in terms of the probabilities of the various results of all possible ways of observing (we often say "measuring") those systems. The state in which a system has been prepared (whether by an experimenter or by some natural process) is taken to be defined by specifying these probabilities of measurement-results, and it is then very natural to take the set of all possible states in which a system can be prepared to be a compact convex set. Such sets are usually taken to live in some real affine space, for instance the three-dimensional one of familiar Euclidean geometry, which may be taken to host the qubit, whose state space is a solid ball, sometimes called the Bloch ball. In this framework, measurement outcomes are associated with affine functionals on the set taking real values---in fact, values between 0 and 1 --- probabilities, and measurements are associated with lists of such functionals, which add up to the unit functional---the constant functional taking the value 1 on all states of the system. (This ensures that whatever state of the system is prepared, the probabilities of outcomes of a measurement add up to 1.) This allows an extremely wide variety of convex sets as state spaces, most of which are neither the state spaces of quantum systems nor classical systems. An important part of the research program of those of us who spend some of our time working in the GPT framework is to characterize the state spaces of quantum systems by giving mathematically natural axioms, or axioms concerning the physical properties exhibited by, or the information-processing protocols we can implement using, such systems, such that all systems having these properties are quantum systems. To take just a few examples of the type of properties we might ask about: can we clone states of such systems? Do we have what Schroedinger called "steering" using entangled states of a pair of such systems? Can we define a notion of entropy in a way similar to the way we define the von Neumann entropy of a quantum system, and if so, are there thermodynamic protocols or processes similar to those possible with quantum systems, in which the entropy plays a similar role? Are there analogues of the spectral theorem for quantum states (density matrices), of the projection postulate of quantum theory, of the plethora of invertible transformations of the state space that are described, in the quantum case, by unitary operators? We often limit ourselves (as Joachim and I did in our paper) to finite-dimensional GPT systems to make the mathematics easier while still allowing most of the relevant conceptual points to become clear.
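
To make the formalism concrete, here is a minimal numerical sketch (mine, not from the paper; plain numpy, using the standard Bloch-ball parametrization of the qubit) of a GPT system: a state is a point of the Bloch ball written as an affine vector, an effect is an affine functional with values in [0,1], and a two-outcome measurement is a pair of effects summing to the unit functional.

```python
import numpy as np

# Minimal GPT sketch of a qubit (my own illustration, not from the paper):
# a state is a point of the Bloch ball written as the affine vector (1, r),
# an effect is a vector e, and the outcome probability is the affine functional e . state.

def state(r):
    r = np.asarray(r, dtype=float)
    assert np.linalg.norm(r) <= 1 + 1e-12, "qubit states live in the closed Bloch ball"
    return np.concatenate(([1.0], r))

unit_effect = np.array([1.0, 0.0, 0.0, 0.0])   # the constant functional taking value 1 on all states

def spin_measurement(n):
    """Two-outcome measurement along unit vector n: the effects e+ and e- sum to the unit effect."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)
    e_plus = np.concatenate(([0.5], 0.5 * n))
    return e_plus, unit_effect - e_plus

omega = state([0.3, 0.0, 0.8])                  # some (mixed) preparation
e_plus, e_minus = spin_measurement([0.0, 0.0, 1.0])
p_plus, p_minus = e_plus @ omega, e_minus @ omega
print(p_plus, p_minus, p_plus + p_minus)        # probabilities in [0, 1], summing to 1
```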

This paper with Joachim builds on work by me, Markus Mueller and Cozmin Ududec, who showed that three principles characterize irreducible finite-dimensional Jordan-algebraic systems (plus finite-dimensional classical state spaces). Since these systems were shown (by Jordan, von Neumann, and Wigner in the 1930s, shortly after Jordan defined the algebras named after him) to be just the finite-dimensional quantum systems over the real, complex, and quaternionic numbers, plus systems whose state space is a ball (of any finite dimension), plus three-dimensional quantum theory over the octonions (associated with the so-called exceptional Jordan algebra), this already gets us very close to a characterization of the usual complex quantum state space (of density matrices) and the associated measurement theory described by positive operators. The principles are (1) A generalized spectral decomposition: every state is a convex combination of perfectly distinguishable pure states, (2) Strong Symmetry: every set of perfectly distinguishable pure states may be taken to any other such set (of the same size) by a symmetry of the state space, and (3) that there is no irreducible three- (or more-) path interference. Joachim and I characterized the same class of systems using only (1) and (2). In order for you to understand these properties, I need to explain some terms used in them: pure states are defined as states that cannot be viewed as convex combinations of any other states---that is, there is no "noise" involved in their preparation---they are sometimes called "states of maximal information". The states in a given list of states are "perfectly distinguishable" from each other if, when we are guaranteed that the state of a system is one of those in the list, there is a single measurement that can tell us which state it is. The measurement that does the distinguishing may, of course, depend on the list in question. Indeed one can take it as a definition of a classical system, at least in this finite-dimensional context, that there is a single measurement that is capable of distinguishing the states in any list of distinct pure states of the system.
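
In the familiar complex quantum case, principle (1) is just the spectral theorem for density matrices: the eigendecomposition exhibits any state as a convex combination of mutually orthogonal (hence perfectly distinguishable) pure states. A quick numerical illustration (my own sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Quantum instance of principle (1): any density matrix rho is a convex combination of
# perfectly distinguishable pure states, namely its orthonormal eigenvectors.
d = 3
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = G @ G.conj().T
rho /= np.trace(rho).real                  # a generic mixed state

probs, vecs = np.linalg.eigh(rho)          # eigenvalues = convex weights, eigenvectors = pure states
recon = sum(p * np.outer(vecs[:, k], vecs[:, k].conj()) for k, p in enumerate(probs))
print(np.allclose(recon, rho))             # True: rho = sum_k p_k |k><k|

# The distinguishing measurement is the projective measurement in the eigenbasis:
# prepared in eigenstate |j>, outcome k occurs with probability |<k|j>|^2 = delta_jk.
print(np.allclose(np.abs(vecs.conj().T @ vecs) ** 2, np.eye(d)))
```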

If one wants to narrow things down further from the Jordan-algebraic systems to the complex quantum systems, there are known principles that will do it: for instance, energy observability (from the Barnum, Mueller, Ududec paper linked above, although it should be noted that it's closely related to concepts of Alfsen and Shultz ("dynamical correspondence") and of Connes ("orientation")): that the generators of continuous symmetries of the state space are also observables, and are conserved by the dynamics that they generate, a requirement very reminiscent of Noether's theorem on conserved generators of symmetries. Mathematically speaking, we formulate this as a requirement that the Lie algebra of the symmetry group of the state space embeds, injectively and linearly, into the space of observables (which we take to be the ambient real vector space spanned by the measurement outcomes) of the system, in such a way that the embedded image of a Lie algebra generator is conserved by the dynamics it generates. In the quantum case, this is just the fact that the Lie algebra su(n) of an n-dimensional quantum system's symmetry group is the real vector space of traceless anti-Hermitian matrices, which embeds linearly (over the reals) and injectively into the Hermitian matrices (indeed, bijectively onto the traceless Hermitian matrices), which are of course the observables of a finite-dimensional quantum system. This embedding is so familiar to physicists that they usually just consider the generators of the symmetry group to be Hermitian matrices ("Hamiltonians"), and map them back to the anti-Hermitian generators by considering multiplication by i (that's the square root of -1) as part of the "generation" of unitary evolution. This is discussed in the arXiv version, but the Journal of Lie Theory version discusses it more extensively, and along the way indicates some results on infinite-dimensional Jordan-algebraic systems, since Alfsen and Shultz, and Connes, worked in frameworks allowing some infinite-dimensional systems. See also John Baez' excellent recent paper, Getting to the Bottom of Noether's Theorem. One can also narrow things down to complex quantum systems by requiring that systems compose in a "tomographically local" way, which means that there is a notion of composite system, made up of two distinct systems, such that all states of the composite system, even the entangled ones (a notion which makes sense in this general probabilistic context, not only in quantum theory), are determined by the probabilities they give to pairs of local measurement outcomes (i.e. the way in which they correlate (or fail to correlate) these outcomes).
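
Here is a toy numerical illustration (mine, for the quantum case only) of the embedding just described: a traceless anti-Hermitian generator A in su(n) is mapped to the Hermitian observable H = iA, and H is conserved by the unitary evolution U(t) = exp(tA) = exp(-itH) that A generates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy illustration of the embedding in the quantum case: a traceless anti-Hermitian
# generator A (an element of su(n)) maps to the Hermitian observable H = iA, and H is
# conserved by the unitary dynamics U(t) = exp(tA) = exp(-itH) that A generates.
n = 3
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (M - M.conj().T) / 2
A -= (np.trace(A) / n) * np.eye(n)                   # make A traceless: A is now in su(n)

H = 1j * A                                            # the embedded observable
print(np.allclose(H, H.conj().T), np.isclose(np.trace(H), 0))   # Hermitian and traceless

w, v = np.linalg.eigh(H)                              # diagonalize H in order to exponentiate
t = 0.7
U = v @ np.diag(np.exp(-1j * t * w)) @ v.conj().T     # U(t) = exp(-itH) = exp(tA)
print(np.allclose(U @ H @ U.conj().T, H))             # True: H is conserved by the flow it generates
```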

What's wrong---and what's right---about quantum parallelism as an explanation for quantum speedup

I couldn't help crying "Nooo!!!!" on Twitter to the following statement by Pierre Pariente, "Strategic Analyst chez L’Atelier BNP Paribas," from a 2015 article "Quantum Computing Set to Revolutionize the Health Sector"

L’Atelier's more mathematically-inclined readers will recognise the general rule that with n qbits, a quantum computer may be in a quantum superposition of 2^n states and will thus possess the capacity to solve that number of problems simultaneously.

In the interest of not just being negative, I should explain what's wrong with this statement---and why it does capture something relevant to many examples of what most quantum computation theorists believe to be theoretical speed advantages of quantum algorithms over classical algorithms.  I'll give the short version first, and maybe in another post I'll explain a bit (groan) more.

What's wrong with quantum parallelism?

If you want, you can run a quantum version of a classical algorithm for solving some problem---say, computing a function on an n-bit-long input string---on a superposition of all 2^n states representing different input strings.  This is often called "quantum parallelism".  If we consider computing the value of some function, F, on each input x to be "solving a different problem", then in some very weak sense we might think of this as "solving 2^n different problems simultaneously".  However, the drug designers, or radiotherapists, or whoever one is going to be selling one's quantum computers and software to, are presumably going to want to know the answers to the problems.  But you cannot read out, from a quantum computer that has run such a "quantum parallel" computation, the answers to all of these problems (the value of the function on all inputs).  In fact, you can read out the answer to only one of the problems (the value, F(x), of F on a particular input x).  If you have run in superposition and you measure the register in which the algorithm writes the value F(x), you'll get the answer on a randomly drawn input (the famous randomness of quantum theory), and if you measure a part of the computer where the input string was stored, you'll also get the value of said randomly drawn input.  There is absolutely no advantage here over just running the classical algorithm on a randomly chosen input.
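
A tiny state-vector simulation (my own sketch, with an arbitrary randomly chosen F) of the point just made: run the "quantum parallel" evaluation on a uniform superposition of inputs and then measure, and all you get is (x, F(x)) for a uniformly random x, no better than one classical evaluation on a random input.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 3
F = {x: int(rng.integers(0, 2)) for x in product((0, 1), repeat=n)}   # an arbitrary n-bit -> 1-bit function

# Basis states |x>|y> of an n-qubit input register and a 1-qubit output register,
# with the oracle acting as U_F |x>|y> = |x>|y XOR F(x)>.
basis = [(x, y) for x in product((0, 1), repeat=n) for y in (0, 1)]
index = {b: i for i, b in enumerate(basis)}

psi = np.zeros(len(basis))
for x in product((0, 1), repeat=n):
    psi[index[(x, 0)]] = 1 / np.sqrt(2 ** n)          # uniform superposition over inputs, output |0>

after = np.zeros_like(psi)
for (x, y), i in index.items():
    after[index[(x, y ^ F[x])]] += psi[i]             # one "quantum parallel" application of U_F

# Measure both registers: the outcome is (x, F(x)) for a uniformly random x.
probs = after ** 2
outcome = basis[rng.choice(len(basis), p=probs)]
print(outcome)    # e.g. ((0, 1, 1), F[(0, 1, 1)]): no more than a classical evaluation on a random input
```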

What's right with quantum parallelism

So how can this quantum parallelism nevertheless be useful?  The reason for this is essentially the existence of many mutually incompatible observables in quantum theory (position and momentum being the most famous example of a pair of incompatible observables, though they are not the ones usually used in proposals for quantum computation).  In particular, quantum theory allows such a superposition state to be effectively (and often efficiently) measured in ways other than reading out the value of F and the input that gave rise to it---ways incompatible with that readout.  This can tell us certain things about the function F other than its values on specific inputs, and these things may even depend on its values on exponentially many inputs.  The incompatibility of these measurements with the one that reads out the input and the function value implies that once the global information is obtained, the information about inputs and function values is no longer available---an example of the well-known fact that "measurement disturbs the state" of a quantum system.
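
The simplest worked example of this is Deutsch's algorithm, sketched below (my own minimal simulation): a single "quantum parallel" evaluation of a one-bit function F, followed by a measurement incompatible with the (x, F(x)) readout, reveals the global bit F(0) XOR F(1), while destroying access to the individual values F(0) and F(1).

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)         # Hadamard gate

def deutsch_global_bit(F):
    """One-query Deutsch sketch: returns F(0) XOR F(1) for a function F: {0,1} -> {0,1}."""
    ket0 = np.array([1.0, 0.0])
    minus = H @ np.array([0.0, 1.0])                 # output register prepared in |-> (phase kickback)
    psi = np.kron(H @ ket0, minus)                   # uniform superposition over the two inputs
    U = np.zeros((4, 4))                             # oracle U_F |x>|y> = |x>|y XOR F(x)>
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ F(x)), 2 * x + y] = 1
    psi = U @ psi
    psi = np.kron(H, np.eye(2)) @ psi                # measure the input qubit in the incompatible X basis
    p1 = psi[2] ** 2 + psi[3] ** 2                   # probability of reading 1 on the input qubit
    return int(round(p1))

print(deutsch_global_bit(lambda x: 0), deutsch_global_bit(lambda x: x))   # 0 (constant), 1 (balanced)
```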

Why do we care?

Many problems of practical interest take this form:  we have a program (say, a classical circuit---from which we can always construct a quantum circuit) for computing a function F, but we want to know something about the function that we don't know how to just read off from the information we used to construct the circuit. For example, if F takes values in an ordered set (like the real numbers, to some precision) we might want to know its maximum value, and probably also an input on which it takes this maximum. (This application, "optimization", is mentioned in the article.)  Or, we might know because of the way the circuit for computing the function is constructed that it is periodic (we will need the set of inputs to have some structure that allows us to define periodicity, of course---maybe they are interpreted as binary notation for integers, for example), but we might not know---and might want to know---the period.  To oversimplify a bit, solving this problem is the key ingredient in Shor's algorithm for factoring large integers using quantum computation---the one that breaks currently used public-key cryptography protocols.  If we could compute and read out the values of the function on all inputs, of course we could figure out just about anything we want about it---but since there are 2^n inputs, doing things this way is going to take exponential (in n) resources---e.g., 2^n calculations of the function.  To repeat, it is the fact that quantum computing allows such a superposition state to be effectively measured in ways other than reading out the value of F that nevertheless gives us potential access, after just one function evaluation, to a certain amount of information about "global" aspects of the function, like its period, that depend on many values.  We are still very limited as to how much information we can get about the function in one run of a "quantum parallel" function evaluation---certainly, it is only polynomially much, not the exponentially much one could get by reading out its value on all inputs.  But in some cases, we can get significant information about some global aspect of the function that we care about, with a number of quantum-parallel function evaluations that, if they were mere classical function evaluations, would leave us with the ability to get far less information, or even no information, about that global aspect of the function.  How significant this speedup is may depend on what global information we want to know about the function, and what we know about it ahead of time.  If we know it is periodic, then in good cases, we can use polynomially many evaluations to get the period in situations where it's thought we'd need exponentially many classical evaluations.  This is the basis of the "near-exponential" speedup of factoring by Shor's algorithm (it is not quite exponential, presumably because the best classical algorithm is not just to try to brute-force find the period that is found in the quantum algorithm, but is more sophisticated).  For relatively unstructured optimization problems, quantum algorithms usually make use of Grover's algorithm, which, again, does use quantum parallelism, and can find the optimum to high precision with roughly the *square root* of the number of function evaluations needed classically.
If the classical algorithm needs (some constant times) 2^n evaluations, in other words, the quantum one will need roughly (some constant times) 2^{n/2}---still exponential, though with half the exponent; an advantage which could still be useful if the large constant cost factor that comes from doing quantum operations instead of classical ones does not swamp it.
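
Just to put numbers on that last point (a trivial back-of-envelope sketch, with all constant factors ignored as in the text):

```python
# Back-of-envelope version of the square-root speedup discussed above
# (all constant factors ignored, as in the text).
for n in (20, 40, 60):
    classical = 2.0 ** n           # ~2^n classical function evaluations
    grover = 2.0 ** (n / 2)        # ~2^(n/2) Grover-style quantum-parallel evaluations
    print(f"n={n}: ~{classical:.2e} classical vs ~{grover:.2e} quantum evaluations")
```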

Quantum algorithms beyond quantum-parallelism

There are of course other scenarios in which quantum computing may be useful, that are not well described as "finding out unknown properties of a function by running a circuit for it"---especially those where the problem itself comes from quantum physics, or happens to have a mathematical structure similar to that of quantum physics.   Typically these have the property that one is still "finding out unknown properties of a quantum circuit by running it", but the circuit is not necessarily a quantum version of a classical function evaluation, but rather represents some mathematical structure, or physical evolution, that is well-described by a quantum circuit.  The most obvious example is of course the evolution of some quantum physical system!  Some of these kinds of problems---like quantum chemistry applications---are among those potentially relevant to the health applications that are the topic of the BNP article.

Anthony Aguirre is looking for postdocs at Santa Cruz in Physics of the Observer

Anthony Aguirre points out that UC Santa Cruz is advertising for postdocs in the "Physics of the Observer" program; and although review of applications began in December with a Dec. 15 deadline "for earliest consideration", if you apply fast you will still be considered.  He explicitly states they are looking for strong applicants from the quantum foundations community, among other things.

My take on this: The interaction of quantum and spacetime/gravitational physics is an area of great interest these days, and people doing rigorous work in quantum foundations, quantum information, and general probabilistic theories have much to contribute.  It's natural to think about links with cosmology in this context.  I think this is a great opportunity, foundations postdocs and students, and Anthony and Max are good people to be connected with, very proactive in seeking out sources of funding for cutting-edge research and very supportive of interdisciplinary interaction.  The California coast around Santa Cruz is beautiful, SC is a nice funky town on the ocean, and you're within striking distance of the academic and venture capital powerhouses of the Bay Area.  So do it!

Martin Idel: the fixed-point sets of positive trace-preserving maps on quantum systems are Jordan algebras!

Kasia Macieszczak is visiting the ITP at Leibniz Universität Hannover (where I arrived last month, and where I'll be based for the next 7 months or so), and gave a talk on metastable manifolds of states in open quantum systems.  She told me about a remarkable result in the Master's thesis of Martin Idel at Munich: the fixed point set of any trace-preserving, positive (not necessarily completely positive) map on the space of Hermitian operators of a finite-dimensional quantum system is a Euclidean Jordan algebra.  It's not necessarily a Jordan subalgebra of the usual Jordan algebra associated with the quantum system (whose Jordan product is the symmetrized matrix multiplication, $a \circ b := \tfrac{1}{2}(ab + ba)$).  We use the usual characterization of the projector onto the fixed-point space of a linear map $T$, the Cesàro-mean limit $P := \lim_{N \to \infty} \tfrac{1}{N}\sum_{n=1}^{N} T^n$.  The maximum-rank fixed point is $P(\mathbb{1})$ (where $\mathbb{1}$ is the identity matrix), which we'll call $\sigma$, and the Jordan product on the fixed-point space is the original one "twisted" to have $\sigma$ as its unit: for fixed points $a$ and $b$, this Jordan product, which I'll denote by $\circ_\sigma$, is

$$a \circ_\sigma b := \tfrac{1}{2}\left(a\,\sigma^{-1}\,b + b\,\sigma^{-1}\,a\right),$$

which we could also write in terms of the original Jordan product as $a \circ_\sigma b = \Omega^{-1}\!\big(\Omega(a) \circ \Omega(b)\big)$, where $\Omega$ is the map defined by $\Omega(x) := \sigma^{-1/2}\, x\, \sigma^{-1/2}$.
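
Here is a small numerical check of the statement (my own sketch, using an arbitrarily chosen block-diagonal CPTP map rather than a general positive trace-preserving map): compute the fixed-point projector as a Cesàro mean, set sigma = P(identity), and verify that the sigma-twisted Jordan product of (approximate) fixed points is again an (approximate) fixed point. With this block construction the fixed-point space is two-dimensional and sigma is generically not a multiple of the identity, so the twist is nontrivial.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_tp_kraus(d, n_kraus=2):
    """Random Kraus operators {K_i} with sum_i K_i^dag K_i = I, giving a CPTP map."""
    K = [rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(n_kraus)]
    S = sum(k.conj().T @ k for k in K)
    w, v = np.linalg.eigh(S)
    S_m12 = v @ np.diag(w ** -0.5) @ v.conj().T
    return [k @ S_m12 for k in K]

# Two independent channels on 2x2 blocks, embedded block-diagonally in d = 4, so that the
# fixed-point space is two-dimensional and sigma = P(identity) is generically NOT the identity.
d = 4
def embed(blk, pos):
    M = np.zeros((d, d), dtype=complex)
    M[pos:pos + 2, pos:pos + 2] = blk
    return M
kraus = [embed(a, 0) for a in random_tp_kraus(2)] + [embed(b, 2) for b in random_tp_kraus(2)]

def T(X):
    return sum(k @ X @ k.conj().T for k in kraus)

def cesaro_projector(X, N=10000):
    """Approximate P(X) = lim (1/N) sum_{n=1}^N T^n(X)."""
    acc, Y = np.zeros_like(X), X
    for _ in range(N):
        Y = T(Y)
        acc += Y
    return acc / N

sigma = cesaro_projector(np.eye(d, dtype=complex))     # the maximal-rank fixed point P(identity)
sigma_inv = np.linalg.inv(sigma)

def twisted_jordan(a, b):
    """Jordan product with sigma as unit: (a sigma^{-1} b + b sigma^{-1} a) / 2."""
    return 0.5 * (a @ sigma_inv @ b + b @ sigma_inv @ a)

def rand_herm():
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (M + M.conj().T) / 2

a = cesaro_projector(rand_herm())                      # (approximate) fixed points
b = cesaro_projector(rand_herm())

for X in (sigma, a, b, twisted_jordan(a, b), twisted_jordan(a, a)):
    print(np.linalg.norm(T(X) - X))   # all small (~1e-3 or less; the Cesaro mean converges only like 1/N)
```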

Idel's result, Theorem 6.1 in his thesis, is stated in terms of the map on all complex matrices, not just the Hermitian ones; the fixed-point space is then the complexification of the Euclidean Jordan algebra.  In the case of completely positive maps, this complexification is "roughly a $C^*$-algebra" according to Idel.  (I suspect, but don't recall offhand, that it is a direct sum of full matrix algebras, i.e. isomorphic to a quantum system composed of several "superselection sectors" (the full matrix algebras in the sum), but, as in the Euclidean case, not necessarily a $C^*$-subalgebra of the ambient matrix algebra.)

I find this a remarkable result because I'm interested in places where Euclidean Jordan algebras appear in nature, or in mathematics.  One reason for this is that the finite-dimensional ones are in one-to-one correspondence with homogeneous, self-dual cones; perhaps I'll discuss this beautiful fact another time.  Alex Wilce, Phillip Gaebeler and I related the property of homogeneity to "steering" (which Schrödinger considered a fundamental weirdness of the newly developed quantum theory) in this paper.  I don't think I've blogged about this before, but Matthew Graydon, Alex Wilce, and I have developed ways of constructing composite systems of the general probabilistic systems based on reversible Jordan algebras, along with some results that I interpret as no-go theorems for such composites when one of the factors is not universally reversible.  The composites are still based on Jordan algebras, but are necessarily (if we wish them to still be Jordan-algebraic) not locally tomographic unless both systems are quantum.  Perhaps I'll post more on this later, too.  For now I just wanted to describe this cool result of Martin Idel's that I'm happy to have learned about today from Kasia.

ITFP, Perimeter: selective guide to talks. #1: Brukner on quantum theory with indefinite causal order

Excellent conference the week before last at Perimeter Institute: Information Theoretic Foundations for Physics.  The talks are online; herewith a selection of some of my favorites, heavily biased towards ideas new and particularly interesting to me (so some excellent ones that might be of more interest to you may be left off the list!).  Some of what would have been possibly of most interest and most novel to me happened on Weds., when the topic was spacetime physics and information, and I had to skip the day to work on a grant proposal.  I'll have to watch those online sometime.  This was going to be one post with thumbnail sketches/reviews of each talk, but as usual I can't help running on, so it may be one post per talk.

All talks available here, so you can pick and choose. Here's #1 (order is roughly temporal, not any kind of ranking...):

Caslav Brukner kicked off with some interesting work on physical theories with indefinite causal structure.  Normally in formulating theories in an "operational" setting (in which we care primarily about the probabilities of physical processes that occur as part of a complete compatible set of possible processes) we assume a definite causal (partial) ordering, so that one process may happen "before" or "after" another, or "neither before nor after".  The formulation is "operational" in that an experimenter or other agent may decide upon, or at least influence, which set of processes, out of possible compatible sets, the actual process will be drawn from, and then nature decides (but with certain probabilities for each possible process, that form part of our theory) which one actually happens.  So for instance, the experimenter decides to perform a Stern-Gerlach experiment with a particular orientation X of the magnets; then the possible processes are, roughly, "the atom was deflected in the X direction by an angle theta," for various angles theta.  Choose a different orientation, Y, for your apparatus, and you choose a different set of possible compatible processes.  ("The atom was deflected in the Y direction by an angle theta.")  Then we assume that if one set of compatible processes happens after another, an agent's choice of which complete set of processes is realized later can't influence the probabilities of processes occurring in an earlier set.  "No signalling from the future", I like to call this; in formalized operational theories it is sometimes called the "Pavia causality axiom".  Signaling from the past to the future is fine, of course.  If two complete sets of processes are incomparable with respect to causal order ("spacelike-separated"), the no-signalling constraint operates both ways: neither Alice's choice of which compatible set is realized, nor Bob's, can influence the probabilities of processes occurring at the other agent's site.  (If it could, that would allow nearly-instantaneous signaling between spatially separated sites---a highly implausible phenomenon only possible in preposterous theories such as the Bohmian version of quantum theory with "quantum disequilibrium", and Newtonian gravity.)  Anyway, Brukner looks at theories that are close to quantum, but in which this assumption doesn't necessarily apply: the probabilities exhibit "indeterminate causal structure".  Since the theories are close to quantum, they can be interpreted as allowing "superpositions of different causal structures", which is just the sort of thing you might think you'd run into in, say, theories combining features of quantum physics with features of general relativistic spacetime physics.  As Caslav points out, since in general relativity the causal structure is influenced by the distribution of mass and energy, you might hope to realize such indefinite causal structure by creating a quantum superposition of states in which a mass is in one place, versus being in another.  (There are people who think that at some point---at some combination of spatial scale (the separation between the regions in which the mass is located) and mass scale (the amount of mass to be placed in "coherent" superposition)---the possibility of such superpositions breaks down.  Experimentalists at Vienna (where Caslav---a theorist, but one who likes to work with experimenters to suggest experiments---is on the faculty) have created what are probably the most significant such superpositions.)

Situations with a superposition of causal orders seem to exhibit some computational advantages over standard causally-ordered quantum computation, like being able to tell in fewer queries (one?) whether a pair of unitaries commutes or anticommutes.  Not sure whose result that was (Giulio Chiribella and others?), but Caslav presents some more recent results on query complexity in this model, extending the initial results.  I am generally wary about results on computation in theories with causal anomalies.  The stuff on query complexity with closed timelike curves, e.g. by Dave Bacon and by Scott Aaronson and John Watrous, has seemed uncompelling---not the correctness of the mathematical results, but rather the physical relevance of the definition of computation---to me, for reasons similar to those given by Bennett, Leung, Smith and Smolin.  But I tend to suspect that Caslav and the others who have done these query results use a more physically compelling framework, because they are well versed in the convex operational or "general probabilistic theories" framework, which aims to make the probabilistic behavior of processes consistent under convex combination ("mixture", i.e. roughly speaking letting somebody flip coins to decide which input to present your device with).  Inconsistency with respect to such mixing is part of the Bennett/Leung/Smolin/Smith objection to the CTC complexity classes as originally defined.
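
For concreteness, here is a toy linear-algebra sketch (mine, following the usual textbook description of the "quantum switch", not any particular paper's conventions) of the one-query commute-vs-anticommute test mentioned above: a control qubit in the state |+> coherently controls whether UV or VU is applied, and measuring the control in the +/- basis reveals which case holds.

```python
import numpy as np

def switch_minus_probability(U, V, psi):
    """Quantum-switch sketch: probability of finding the control qubit in |->.
    For unitary U, V this is 0 if UV = VU and 1 if UV = -VU."""
    # Control in |+>: branch |0> applies U V to psi, branch |1> applies V U.
    state = np.concatenate([(U @ V @ psi) / np.sqrt(2), (V @ U @ psi) / np.sqrt(2)])
    # Project the control onto |-> = (|0> - |1>)/sqrt(2) and take the squared norm.
    minus_component = (state[:len(psi)] - state[len(psi):]) / np.sqrt(2)
    return float(np.vdot(minus_component, minus_component).real)

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
psi = np.array([1, 0], dtype=complex)
print(switch_minus_probability(X, Z, psi))   # ~1.0: X and Z anticommute
print(switch_minus_probability(X, X, psi))   # ~0.0: X commutes with itself
```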

[Update:  This article at Physics.org quotes an interview with Scott Aaronson responding to the Bennett et al. objections.  Reasonably enough, he doesn't think the question of what a physically relevant definition of CTC computing is has been settled.  When I try to think about this issue sometimes I wonder if the thorny philosophical question of whether we court inconsistency by trying to incorporate intervention ("free choice of inputs") into a physical theory is rearing its head.  As often with posts here, I'm reminding myself to revisit the issue at some point... and think harder.]

Thinking about Robert Wald's take on the loss, or not, of information into black holes

A warning to readers: As far as physics goes, I tend to use this blog to muse out loud about things I am trying to understand better, rather than to provide lapidary intuitive summaries for the enlightenment of a general audience on matters I am already expert on. Musing out loud is what's going on in this post, for sure. I will try, I'm sure not always successfully, not to mislead, but I'll be unembarrassed about admitting what I don't know.

I recently did a first reading (so, skipped and skimmed some, and did not follow all calculations/reasoning) of Robert Wald's book "Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics".  I like Wald's style --- not too lengthy, focused on getting the important concepts and points across and not getting bogged down in calculational details, but also aiming for mathematical rigor in the formulation of the important concepts and results.

Wald uses the algebraic approach to quantum field theory (AQFT), and his approach to AQFT involves looking at the space of solutions to the classical equations of motion as a symplectic manifold, and then quantizing from that point of view, in a somewhat Dirac-like manner (the idea is that Poisson brackets, which are natural mathematical objects on a symplectic manifold, should go to commutators  between generalized positions and momenta, but what is actually used is the Weyl form of the commutation relations), doing the Minkowski-space (special relativistic, flat space) version before embarking on the curved-space, (semiclassical general relativistic) one.   He argues that this manner of formulating quantum field theory has great advantages in curved space, where the dependence of the notion of "particle" on the reference frame can make quantization in terms of an expansion in Fourier modes of the field ("particles") problematic.  AQFT gets somewhat short shrift among mainstream quantum field theorists, I sense, in part because (at least when I was learning about it---things may have changed slightly, but I think not that much) no-one has given a rigorous mathematical example of an algebraic quantum field theory of interacting (as opposed to freely propagating) fields in a spacetime with three space dimensions.  (And perhaps the number of AQFT's that have been constructed even in fewer space dimensions is not very large?).  There is also the matter pointed out by Rafael Sorkin, that when AQFT's are formulated, as is often done, in terms of a "net" of local algebras of observables (each algebra associated with an open spacetime region, with compatibility conditions defining what it means to have a "net" of algebras on a spacetime, e.g. the subalgebra corresponding to a subset of region R is a subalgebra of the algebra for region R; if two subsets of a region R are spacelike separated then their corresponding subalgebras commute), the implicit assumption that every Hermitian operator in the algebra associated with a region can be measured "locally"  in that region actually creates difficulties with causal locality---since regions are extended in spacetime, coupling together measurements made in different regions through perfectly timelike classical feedforward of the results of one measurement to the setting of another, can create spacelike causality (and probably even signaling).  See Rafael's paper "Impossible measurements on quantum fields".   (I wonder if that is related to the difficulties in formulating a consistent interacting theory in higher spacetime dimension.)

That's probably tangential to our concerns here, though, because it appears we can understand the basics of the Hawking effect, of radiation by black holes, leading to black-hole evaporation and the consequent worry about "nonunitarity" or "information loss" in black holes, without needing a quantized interacting field theory.  We treat spacetime, and the matter that is collapsing to form the black hole, in classical general relativistic terms, and the Hawking radiation arises in the free field theory of photons in this background.

I liked Wald's discussion of black hole information loss in the book.  His attitude is that he is not bothered by it, because the spacelike hypersurface on which the state is mixed after the black hole evaporates (even when the states on similar spacelike hypersurfaces before black hole formation are pure) is not a Cauchy surface for the spacetime.  There are non-spacelike, inextensible curves that don't intersect that hypersurface.  The pre-black-hole spacelike hypersurfaces on which the state is pure are, by contrast, Cauchy surfaces---but some of the trajectories crossing such an initial surface go into the black hole and hit the singularity, "destroying" information.  So we should not expect purity of the state on the post-evaporation spacelike hypersurfaces any more than we should expect, say, a pure state on a hyperboloid of revolution contained in a forward light-cone in Minkowski space --- there are trajectories that never intersect that hyperboloid.

Wald's talk at last year's firewall conference is an excellent presentation of these ideas; most of it makes the same points made in the book, but with a few nice extra observations. There are additional sections, for instance on why he thinks black holes do form (i.e. rejects the idea that a "frozen star" could be the whole story), and dealing with anti-de Sitter / conformal field theory models of black hole evaporation. In the latter he stresses the idea that early and late times in the boundary CFT do not correspond in any clear way to early and late times in the bulk field theory (at least that is how I recall it).

I am not satisfied with a mere statement that the information "is destroyed at the singularity", however.  The singularity is a feature of the classical general relativistic mathematical description, and near it the curvature becomes so great that we expect quantum aspects of spacetime to become relevant.  We don't know what happens to the degrees of freedom inside the horizon with which variables outside the horizon are entangled (giving rise to a mixed state outside the horizon), once they get into this region.  One thing that a priori seems possible is that the spacetime geometry, or maybe some pre-spacetime quantum (or post-quantum) variables that underlie the emergence of spacetime in our universe (i.e. our portion of the universe, or multiverse if you like), may go into a superposition (the components of which have different values of these inside-the-horizon degrees of freedom that are still correlated (entangled) with the post-evaporation variables).  Perhaps this is a superposition including pieces of spacetime disconnected from ours, perhaps of weirder things still involving pre-spacetime degrees of freedom.  It could also be, as speculated by those who also speculate that the state on the post-evaporation hypersurface in our (portion of the) universe is pure, that these quantum fluctuations in spacetime somehow mediate the transfer of the information back out of the black hole in the evaporation process, despite worries that this process violates constraints of spacetime causality.  I'm not that clear on the various mechanisms proposed for this, but would look again at the work of Susskind, and Susskind and Maldacena ("ER=EPR"), to try to recall some of the proposals.  (My rough idea of the "ER=EPR" proposals is that they want to view entangled "EPR" ("Einstein-Podolsky-Rosen") pairs of particles, or at least the Hawking radiation quanta and their entangled partners that went into the black hole, as also associated with miniature "wormholes" ("Einstein-Rosen", or ER, bridges) in spacetime connecting the inside to the outside of the black hole; somehow this is supposed to help out with the issue of nonlocality, in a way that I might understand better if I understood why nonlocality threatens to begin with.)

The main thing I've taken from Wald's talk is a feeling of not being worried by the possible lack of unitarity in the transformation from a spacelike pre-black-hole hypersurface in our (portion of the) universe to a post-black-hole-evaporation one in our (portion of the) universe. Quantum gravity effects at the singularity either transfer the information into inaccessible regions of spacetime ("other universes"), leaving (if things started in a pure state on the pre-black-hole surface) a mixed state on the post-evaporation surface in our portion of the universe, but still one that is pure in some sense overall, or they funnel it back out into our portion of the universe as the black hole evaporates. It is a challenge, and one that should help stimulate the development of quantum gravity theories, to figure out which, and exactly what is going on, but I don't feel any strong a priori compulsion toward one or the other of a unitary or a nonunitary evolution from pre-black-hole to post-evaporation spacelike hypersurfaces in our portion of the universe.


Quantum imaging with entanglement and undetected photons, II: short version

Here's a short explanation of the experiment reported in "Quantum imaging with undetected photons" by members of Anton Zeilinger's group in Vienna (Barreto Lemos, Borish, Cole, Ramelow, Lapkiewicz and Zeilinger).  The previous post also explains the experiment, but in a way that is closer to my real-time reading of the article; this post is cleaner and more succinct.

It's most easily understood by comparison to an ordinary Mach-Zehnder interferometry experiment. (The most informative part of the wikipedia article is the section "How it works"; Fig. 3 provides a picture.)  In this sort of experiment, photons from a source such as a laser encounter a beamsplitter and go into a superposition of being transmitted and reflected.  One beam goes through an object to be imaged, and acquires a phase factor---a complex number of modulus 1 that depends on the refractive index of the material out of which the object is made, and the thickness of the object at the point at which the beam goes through.  You can think of this complex number as an arrow of length 1 lying in a two-dimensional plane; the arrow rotates as the photon passes through material, with the rate of rotation depending on the refractive index of the material. (If the thickness and/or refractive index varies on a scale smaller than the beamwidth, then the phase shift may vary over the beam cross-section, allowing the creation of an image of how the thickness of the object---or at least, the total phase imparted by the object, since the refractive index may be varying too---varies in the plane transverse to the beam.  Otherwise, to create an image rather than just measure the total phase it imparts at a point, the beam may need to be scanned across the object.)  The phase shift can be detected by recombining the beams at the second beamsplitter, and observing the intensity of light in each of the two output beams, since the relative probability of a photon coming out one way or the other depends on the relative phase of the two input beams; this dependence is called "interference".
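
For reference, the two-mode calculation behind that last sentence takes only a few lines (my own sketch, using one common convention for a lossless 50/50 beamsplitter):

```python
import numpy as np

B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)      # one common convention for a lossless 50/50 beamsplitter

def output_intensities(phi):
    phase = np.diag([np.exp(1j * phi), 1.0])       # the object imparts phase phi on one arm
    amp_in = np.array([1.0, 0.0])                  # a photon enters one input port
    amp_out = B @ phase @ B @ amp_in               # beamsplitter, phase, beamsplitter
    return np.abs(amp_out) ** 2                    # detection probabilities at the two outputs

for phi in (0.0, np.pi / 2, np.pi):
    print(phi, output_intensities(phi))            # sin^2(phi/2) and cos^2(phi/2)
```

The two output intensities come out as sin^2(phi/2) and cos^2(phi/2), so varying the phase phi imparted by the object shows up directly as interference fringes.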

Now open the homepage of the Nature article and click on Figure 1 to enlarge it.  This is a simplified schematic of the experiment done in Vienna.  Just as in ordinary Mach-Zehnder interferometry, a beam of photons is split on a beamsplitter (labeled BS1 in the figure).  One can think of each photon from the source going into a superposition of being reflected and transmitted at the first beamsplitter.  The transmitted part is downconverted by passing through the nonlinear crystal NL1 into an entangled pair consisting of a yellow and a red photon; the red photon is siphoned off by a dichroic (color-dependent) beamsplitter, D1, and passed through the object O to be imaged, acquiring a phase dependent on the refractive index of the object and its thickness.  The phase, as I understand things, is associated with the photon pair even though it is imparted by passing only the red photon through the object.  In order to observe the phase via interferometry, one needs to involve both the red and yellow photon, coherently.  (If one could observe it as soon as it was imparted to the pair by just interacting with the yellow photon, one could send a signal from the interaction point to the yellow part of the beam instantaneously, violating relativity.)  The red part of the beam is then recombined (at dichroic beamsplitter D2) with the reflected portion of the beam (which is still at the original wavelength), and that portion of the beam is passed through another nonlinear crystal, NL2.  This downconverts the part of the beam that is at the original wavelength into a red-yellow pair, with the resulting red component aligned with---and indistinguishable from---the red component that has gone through the object.  The phase associated with the photon pair created in the transmitted part of the beam, whose red member went through the object, is now associated with the yellow photons in the transmitted beam, since the red photons in that beam have been rendered indistinguishable from the ones created in the reflected beam, and so retain no information about the relative phase.  This means that the phase can be observed by siphoning out the red photons (at dichroic beamsplitter D3), recombining just the yellow photons with a beamsplitter BS2, and observing the intensities at the two outputs of this final beamsplitter, precisely as in the last stage of an ordinary Mach-Zehnder experiment.  The potential advantage over ordinary Mach-Zehnder interferometry is that one can image the total phase imparted by the object at a wavelength different from the wavelength of the photons that are interfered and detected at the final stage, which could be an advantage for instance if good detectors are not available at the wavelength one wants to image the object at.
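
Here's a heavily idealized toy version of the amplitude bookkeeping (my own sketch, in the spirit of, but not copied from, Eq. (1) of the paper): two path amplitudes, each a joint (yellow, red) state, with a parameter controlling whether the red mode that passed through the object is aligned with the red mode produced at NL2. The interference in the yellow detectors survives only in the aligned case.

```python
import numpy as np

B = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)      # beamsplitter BS2 acting on the two yellow modes

def yellow_detector_probs(phi, red_overlap):
    """Toy model: two path amplitudes, each a joint (yellow, red) state.
    red_overlap = 1: the red mode that passed the object is perfectly aligned with the red
    mode produced at NL2; red_overlap = 0: the two red modes are fully distinguishable."""
    red_a = np.array([1.0, 0.0])                                     # red state on the NL1 path
    red_b = np.array([red_overlap, np.sqrt(1 - red_overlap ** 2)])   # red state on the NL2 path
    joint = np.zeros((2, 2), dtype=complex)                          # rows: yellow mode a/b, columns: red basis
    joint[0] = np.exp(1j * phi) * red_a / np.sqrt(2)                 # NL1 path carries the object phase phi
    joint[1] = red_b / np.sqrt(2)                                    # NL2 path
    joint = B @ joint                                                # recombine the yellow modes at BS2
    return np.sum(np.abs(joint) ** 2, axis=1)                        # trace out the (never-detected) red photon

for phi in (0.0, np.pi / 2, np.pi):
    print("aligned   ", phi, yellow_detector_probs(phi, 1.0))
    print("misaligned", phi, yellow_detector_probs(phi, 0.0))
```

When the red modes are orthogonal ("misaligned"), the red photon records which-path information about the yellow modes and the yellow-detector probabilities sit at 1/2 regardless of phi; when they are identical, the phi-dependence shows up in the yellow counts even though no detected photon ever touched the object.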

Quantum imaging with entanglement and undetected photons in Vienna

[Update 9/1:  I have been planning (before any comments, incidentally) to write a version of this post which just provides a concise verbal explanation of the experiment, supplemented perhaps with a little formal calculation.  However, I think the discussion below comes to a correct understanding of the experiment, and I will leave it up as an example of how a physicist somewhat conversant with but not usually working in quantum optics reads and quickly comes to a correct understanding of a paper.  Yes, the understanding is correct even if some misleading language was used in places, but I thank commenter Andreas for pointing out the latter.]

Thanks to tweeters @AtheistMissionary and @robertwrighter for bringing to my attention this experiment by a University of Vienna group (Gabriela Barreto Lemos, Victoria Borish, Garrett D. Cole, Sven Ramelow, Radek Lapkiewicz and Anton Zeilinger), published in Nature, on imaging using entangled pairs of photons.  It seems vaguely familiar, perhaps from my visit to the Brukner, Aspelmeyer and Zeilinger groups in Vienna earlier this year; it may be that one of the group members showed or described it to me when I was touring their labs.  I'll have to look back at my notes.

This New Scientist summary prompts the Atheist and Robert to ask (perhaps tongue-in-cheek?) if it allows faster-than-light signaling.  The answer is of course no. The New Scientist article fails to point out a crucial aspect of the experiment, which is that there are two entangled pairs created, each one at a different nonlinear crystal, labeled NL1 and NL2 in Fig. 1 of the Nature article.  [Update 9/1: As I suggest parenthetically, but in not sufficiently emphatic terms, four sentences below, and as commenter Andreas points out,  there is (eventually) a superposition of an entangled pair having been created at different points in the setup; "two pairs" here is potentially misleading shorthand for that.] To follow along with my explanation, open the Nature article preview, and click on Figure 1 to enlarge it.  Each pair is coherent with the other pair, because the two pairs are created on different arms of an interferometer, fed by the same pump laser.  The initial beamsplitter labeled "BS1" is where these two arms are created (the nonlinear crystals come later). (It might be a bit misleading to say two pairs are created by the nonlinear crystals, since that suggests that in a "single shot" the mean photon number in the system after both nonlinear crystals  have been passed is 4, whereas I'd guess it's actually 2 --- i.e. the system is in a superposition of "photon pair created at NL1" and "photon pair created at NL2".)  Each pair consists of a red and a yellow photon; on one arm of the interferometer, the red photon created at NL1 is passed through the object "O".  Crucially, the second pair is not created until after this beam containing the red photon that has passed through the object is recombined with the other beam from the initial beamsplitter (at D2).  ("D" stands for "dichroic mirror"---this mirror reflects red photons, but is transparent at the original (undownconverted) wavelength.)  Only then is the resulting combination passed through the nonlinear crystal, NL2.  Then the red mode (which is fed not only by the red mode that passed through the object and has been recombined into the beam, but also by the downconversion process from photons of the original wavelength impinging on NL2) is pulled out of the beam by another dichroic mirror.  The yellow mode is then recombined with the yellow mode from NL1 on the other arm of the interferometer, and the resulting interference observed by the detectors at lower right in the figure.

It is easy to see why this experiment does not allow superluminal signaling by altering the imaged object, and thereby altering the image.  For there is an effectively lightlike or timelike (it will be effectively timelike, given the delays introduced by the beamsplitters and mirrors and such) path from the object to the detectors.  It is crucial that the red light passed through the object be recombined, at least for a while, with the light that has not passed through the object, in some spacetime region in the past light cone of the detectors, for it is the recombination here that enables the interference between light not passed through the object, and light passed through the object, that allows the image to show up in the yellow light that has not (on either arm of the interferometer) passed through the object.  Since the object must be in the past lightcone of the recombination region where the red light interferes, which in turn must be in the past lightcone of the final detectors, the object must be in the past lightcone of the final detectors.  So we can signal by changing the object and thereby changing the image at the final detectors, but the signaling is not faster-than-light.

Perhaps the most interesting thing about the experiment, as the authors point out, is that it enables an object to be imaged at a wavelength that may be difficult to efficiently detect, using detectors at a different wavelength, as long as there is a downconversion process that creates a pair of photons with one member of the pair at each wavelength.  By not pointing out the crucial fact that this is an interference experiment between two entangled pairs [Update 9/1: per my parenthetical remark above, and Andreas' comment, this should be taken as shorthand for "between a component of the wavefunction in which an entangled pair is created in the upper arm of the interferometer, and one in which one is created in the lower arm"], the description in New Scientist does naturally suggest that the image might be created in one member of an entangled pair, by passing the other member through the object,  without any recombination of the photons that have passed through the object with a beam on a path to the final detectors, which would indeed violate no-signaling.

I haven't done a calculation of what should happen in the experiment, but my rough intuition at the moment   is that the red photons that have come through the object interfere with the red component of the beam created in the downconversion process, and since the photons that came through the object have entangled yellow partners in the upper arm of the interferometer that did not pass through the object, and the red photons that did not pass through the object have yellow partners created along with them in the lower part of the interferometer, the interference pattern between the red photons that did and didn't pass through the object corresponds perfectly to an interference pattern between their yellow partners, neither of which passed through the object.  It is the latter that is observed at the detectors. [Update 8/29: now that I've done the simple calculation, I think this intuitive explanation is not so hot.  The phase shift imparted by the object "to the red photons" actually pertains to the entire red-yellow entangled pair that has come from NL1 even though it can be imparted by just "interacting" with the red beam, so it is not that the red photons interfere with the red photons from NL2, and the yellow with the yellow in the same way independently, so that the pattern could be observed on either color, with the statistical details perfectly correlated. Rather, without recombining the red photons with the beam, no interference could be observed between photons of a single color, be it red or yellow, because the "which-beam" information for each color is recorded in different beams of the other color.  The recombination of the red photons that have passed through the object with the undownconverted photons from the other output of the initial beamsplitter ensures that the red photons all end up in the same mode after crystal NL2 whether they came into the beam before the crystal or were produced in the crystal by downconversion, thereby ensuring that the red photons contain no record of which beam the yellow photons are in, and allowing the interference due to the phase shift imparted by the object to be observed on the yellow photons alone.]

As I mentioned, not having done the calculation, I don't think I fully understand what is happening.  [Update: Now that I have done a calculation of sorts, the questions raised in this paragraph are answered in a further Update at the end of this post.  I now think that some of the recombinations of beams considered in this paragraph are not physically possible.]  In particular, I suspect that if the red beam that passes through the object were mixed with the downconverted beam on the lower arm of the interferometer after the downconversion, and then peeled off before detection, instead of having been mixed in before the downconversion and peeled off afterward, the interference pattern would not be observed, but I don't have a clear argument why that should be.  [Update 8/29: the process is described ambiguously here.  If we could peel off the red photons that have passed through the object while leaving the ones that came from the downconversion at NL2, we would destroy the interference.  But we obviously can't do that; neither we nor our apparatus can tell these photons apart (and if we could, that would destroy interference anyway).  Peeling off *all* the red photons before detection actually would allow the interference to be seen, if we could have mixed back in the red photons first; the catch is that this mixing-back-in is probably not physically possible.]  Anyone want to help out with an explanation?  I suspect one could show that this would be the same as peeling off the red photons from NL2 after the beamsplitter but before detection, and only then recombining them with the red photons from the object, which would be the same as just throwing away the red photons from the object to begin with.  If one could image in this way, then that would allow signaling, so it must not work.  But I'd still prefer a more direct understanding via a comparison of the downconversion process with the red photons recombined before, versus after.  Similarly, I suspect that mixing in and then peeling off the red photons from the object before NL2 would not do the job, though I don't see a no-signaling argument in this case.  But it seems crucial, in order for the yellow photons to bear an imprint of interference between the red ones, that the red ones from the object be present during the downconversion process.

The news piece summarizing the article in Nature is much better than the one at New Scientist, in that it does explain that there are two pairs, and that one member of one pair is passed through the object and recombined with something from the other pair.  But it does not make it clear that the recombination takes place before the second pair is created---indeed it strongly suggests the opposite:

According to the laws of quantum physics, if no one detects which path a photon took, the particle effectively has taken both routes, and a photon pair is created in each path at once, says Gabriela Barreto Lemos, a physicist at Austrian Academy of Sciences and a co-author on the latest paper.

In the first path, one photon in the pair passes through the object to be imaged, and the other does not. The photon that passed through the object is then recombined with its other ‘possible self’ — which travelled down the second path and not through the object — and is thrown away. The remaining photon from the second path is also reunited with itself from the first path and directed towards a camera, where it is used to build the image, despite having never interacted with the object.

Putting the quote from Barreto Lemos about a pair being created on each path before the description of the recombination suggests that both pair-creation events occur before the recombination, which is wrong. But the description in this article is much better than the New Scientist description---everything else about it seems correct, and it gets the crucial point right (that there are two pairs, one member of which passes through the object and is recombined with elements of the other pair at some point before detection), even if it is misleading about exactly where the recombination point is.

[Update 8/28: clearly if we peel the red photons off before NL2, and then peel the red photons created by downconversion at NL2 off after NL2 but before the final beamsplitter and detectors, we don't get interference because the red photons peeled off at different times are in orthogonal modes, each associated with one of the two different beams of yellow photons to be combined at the final beamsplitter, so the interference is destroyed by the recording of "which-beam" information about the yellow photons, in the red photons. But does this mean if we recombine the red photons into the same mode, we restore interference? That must not be so, for it would allow signaling based on a decision to recombine or not in a region which could be arranged to be spacelike separated from the final beamsplitter and detectors.  But how do we see this more directly?  Having now done a highly idealized version of the calculation (based on notation like that in and around Eq. (1) of the paper) I see that if we could do this recombination, we would get interference.  But to do that we would need a nonphysical device, namely a one-way mirror, to do this final recombination.  If we wanted to do the other variant I discussed above, recombining the red photons that have passed the object with the red (and yellow) photons created at NL2 and then peeling all red photons off before the final detector, we would even need a dichroic one-way mirror (transparent to yellow, one-way for red), to recombine the red photons from the object with the beam coming from NL2.  So the only physical way to implement the process is to recombine the red photons that have passed through the object with light of the original wavelength in the lower arm of the interferometer before NL2; this just needs an ordinary dichroic mirror, which is a perfectly physical device.]

Free will and retrocausality at Cambridge II: Conspiracy vs. Retrocausality; Signaling and Fine-Tuning

Expect (with moderate probability) substantial revisions to this post, hopefully including links to relevant talks from the Cambridge conference on retrocausality and free will in quantum theory, but for now I think it's best just to put this out there.

Conspiracy versus Retrocausality

One of the main things I hoped to straighten out for myself at the conference on retrocausality in Cambridge was whether the correlations between measurement settings and "hidden variables" involved in a retrocausal explanation of Bell-inequality-violating quantum correlations are necessarily "conspiratorial", as Bell himself seems to have thought.  The idea seems to be that correlations between measurement settings and hidden variables must be due to some "common cause" in the intersection of the backward light cones of the two.  That is, a kind of "conspiracy" coordinating the relevant hidden variables that can affect the measurement outcome with all sorts of intricate processes that can affect which measurement is made, such as those affecting your "free" decision as to how to set a polarizer, or, in case you set up a mechanism to control the polarizer setting according to some apparatus reasonably viewed as random ("the Swiss national lottery machine" was the one envisioned by Bell), the functioning of this mechanism.  I left the conference convinced once again (after doubts on this score had been raised in my mind by some discussions at New Directions in the Philosophy of Physics 2013) that the retrocausal type of explanation Price has in mind is different from a conspiratorial one.

Deflationary accounts of causality: their impact on retrocausal explanation

Distinguishing "retrocausality" from "conspiratorial causality" is subtle, because it is not clear that causality makes sense as part of a fundamental physical theory.   (This is a point which, in this form, apparently goes back to Bertrand Russell early in this century.  It also reminds me of David Hume, although he was perhaps not limiting his "deflationary" account of causality to causality in physical theories.)  Causality might be a concept that makes sense at the fundamental level for some types of theory, e.g. a version ("interpretation") of quantum theory that takes measurement settings and outcomes as fundamental, taking an "instrumentalist" view of the quantum state as a means of calculating outcome probabilities giving settings, and not as itself real, without giving a further formal theoretical account of what is real.  But in general, a theory may give an account of logical implications between events, or more generally, correlations between them, without specifying which events cause, or exert some (perhaps probabilistic) causal influence on others.  The notion of causality may be something that is emergent, that appears from the perspective of beings like us, that are part of the world, and intervene in it, or model parts of it theoretically.  In our use of a theory to model parts of the world, we end up taking certain events as "exogenous".  Loosely speaking, they might be determined by us agents (using our "free will"), or by factors outside the model.  (And perhaps "determined" is the wrong word.)   If these "exogenous" events are correlated with other things in the model, we may speak of this correlation as causal influence.  This is a useful way of speaking, for example, if we control some of the exogenous variables:  roughly speaking, if we believe a model that describes correlations between these and other variables not taken as exogenous, then we say these variables are causally influenced by the variables we control that are correlated with them.  We find this sort of notion of causality valuable because it helps us decide how to influence those variables we can influence, in order to make it more likely that other variables, that we don't control directly, take values we want them to.  This view of causality, put forward for example in Judea Pearl's book "Causality", has been gaining acceptance over the last 10-15 years, but it has deeper roots.  Phil Dowe's talk at Cambridge was an especially clear exposition of this point of view on causality (emphasizing exogeneity of certain variables over the need for any strong notion of free will), and its relevance to retrocausality.

This makes the discussion of retrocausality more subtle, because it raises the possibility that a retrocausal and a conspiratorial account of what's going on with a Bell experiment might describe the same correlations (between the Swiss National lottery machine, or whatever controls my whims in setting a polarizer, all the variables these things are influenced by, and the polarizer settings and outcomes in a Bell experiment), differing only in the causal relations they posit between these variables.  That might be true, if a retrocausalist decided to try to model the process by which the polarizer was set.  But the point of the retrocausal account seems to be that it is not necessary to model this in order to explain the correlations between measurement results.  The retrocausalist posits a lawlike relation of correlation between measurement settings and some of the hidden variables that are in the past light cone of both measurement outcomes.  As long as this retrocausal influence does not influence observable past events, but only the values of "hidden", although real, variables, there is nothing obviously more paradoxical about imagining this than about imagining---as we do all the time---that macroscopic variables that we exert some control over, such as measurement settings, are correlated with things in the future.   Indeed, as Huw Price has long (I have only recently realized for just how long) been pointing out, if we believe that the fundamental laws of physics are symmetric with respect to time-reversal, then it would be the absence of retrocausality (if we dismiss its possibility), or, even if we accept its possibility to the limited extent needed to potentially explain Bell correlations, its relative scarcity, that needs explaining.  Part of the explanation, of course, is likely that causality, as mentioned above, is a notion that is useful for agents situated within the world, rather than one that applies to the "view from nowhere and nowhen" that some (e.g. Price, who I think coined the term "nowhen") think is, or should be, taken by fundamental physical theories.  Therefore whatever asymmetries---these could be somewhat local-in-spacetime even if extremely large-scale, or due to "spontaneous" (i.e. explicit, even if due to a small perturbation) symmetry-breaking---are associated with our apparently symmetry-breaking experience of the directionality of time may also be the explanation for why we introduce the causal arrows we do into our description, and therefore why we so rarely introduce retrocausal ones.  At the same time, such an explanation might well leave room for the limited retrocausality Price would like to introduce into our description, for the purpose of explaining Bell correlations, especially because such retrocausality does not allow backwards-in-time signaling.

Signaling (spacelike and backwards-timelike) and fine-tuning. Emergent no-signaling?

A theme that came up repeatedly at the conference was "fine-tuning"---that no-spacelike-signaling, and possibly also no-retrocausal-signaling, seem to require a kind of "fine-tuning" in a hidden-variable model that uses such influences to explain quantum correlations.  Why, in Bohmian theory, if we have spacelike influence of variables we control on physically real (but not necessarily observable) variables, should things be arranged just so that we cannot use this influence to remotely control observable variables, i.e. signal?  Similarly, one might ask why, if we have backwards-in-time influence of controllable variables on physically real variables, things are arranged just so that we cannot use this influence to remotely control observable variables at an earlier time.  I think---and I think this possibility was raised at the conference---that a possible explanation, suggested by the above discussion of causality, is that for macroscopic agents such as us, with usually-reliable memories, some degree of control over our environment, and persistence over time, to arise, it may be necessary that the scope of such macroscopic "observable" influences be limited, in order that there be a coherent macroscopic story at all for us to tell---in order for us even to be around to wonder about whether there could be such signaling or not.  (So the term "emergent no-signaling" in the section heading might be slightly misleading: signaling, causality, control, and limitations on signaling might all necessarily emerge together.) Such a story might end up involving thermodynamic arguments, about the sorts of structures that might emerge in a metastable equilibrium, or that might emerge in a dynamically stable state dependent on a temperature gradient, or something of the sort.  Indeed, the distribution of hidden variables (usually, positions and/or momenta) according to the squared modulus of the wavefunction, which is necessary to get agreement of Bohmian theory with quantum theory and also to prevent signaling (and which does seem like "fine-tuning" inasmuch as it requires a precise choice of probability distribution over initial conditions), has on various occasions been justified by arguments that it represents a kind of equilibrium that would be rapidly approached even if it did not initially obtain.  (I have no informed view at present on how good these arguments are, though I have at various times in the past read some of the relevant papers---Bohm himself, and Sheldon Goldstein, are the authors who come to mind.)
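
To be concrete about the constraint that is being described as fine-tuned, here is a schematic statement in standard hidden-variable notation (my own formulation, not tied to any particular talk at the conference):
\[
P(a,b\,|\,x,y) \;=\; \int \! d\lambda\, \rho(\lambda)\, P(a,b\,|\,x,y,\lambda),
\qquad
\sum_b P(a,b\,|\,x,y) \;=\; P(a\,|\,x)\quad\text{for all } y,
\]
where $x, y$ are the local and remote measurement settings, $a, b$ the outcomes, and $\lambda$ the hidden variables. The response functions $P(a,b\,|\,x,y,\lambda)$ may depend on the remote setting, nonlocally or retrocausally, but agreement with quantum theory requires the marginal on the right to be independent of $y$ once $\lambda$ is averaged over $\rho$. In Bohmian mechanics this averaging is over the "quantum equilibrium" distribution $\rho(q) = |\psi(q)|^2$ of configurations; for generic other distributions the marginal would depend on $y$ and signaling would be possible, which is the sense in which the choice of $\rho$ looks like fine-tuning.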

I should mention that at the conference the appeal of such statistical/thermodynamic arguments for "emergent" no-signaling was questioned---I think by Matthew Leifer, who with Rob Spekkens has been one of the main proponents of the idea that no-signaling can appear to be a kind of fine-tuning, and that it would be desirable to have a model which gave a satisfying explanation of it---on the grounds that one might expect "fluctuations" away from the equilibria, metastable structures, or steady states, whereas we don't observe even small fluctuations away from no-signaling---the law seems to hold with certainty.  This is an important point, and although I suspect there are adequate rejoinders, I don't see at the moment what these might be like.