Limitations on quantum dimensionality reduction

The Johnson-Lindenstrauss Lemma is a classic result which implies that any set of n real vectors can be compressed to O(log n) dimensions while only distorting pairwise Euclidean distances by a constant factor. Here we consider potential extensions of this result to the compression of quantum states. We show that, by contrast with the classical case, there does not exist any distribution over quantum channels that significantly reduces the dimension of quantum states while preserving the 2-norm distance with high probability. We discuss two tasks for which the 2-norm distance is indeed the correct figure of merit. In the case of the trace norm, we show that the dimension of low-rank mixed states can be reduced by up to a square root, but that essentially no dimensionality reduction is possible for highly mixed states.


Introduction
The Johnson-Lindenstrauss (JL) Lemma [1] is a dimensionality reduction result which has found a vast array of applications in computer science and elsewhere (see e.g. Refs. [2][3][4]). It can be stated as follows.
Theorem 1 (Johnson-Lindenstrauss Lemma [1]). For all dimensions $d$, $e$, there is a distribution $\mathcal{D}$ over linear maps $\mathcal{E}: \mathbb{R}^d \to \mathbb{R}^e$ such that, for all real vectors $v$, $w$,
$$\Pr_{\mathcal{E} \sim \mathcal{D}}\left[(1-\epsilon)\|v-w\|_2 \le \|\mathcal{E}(v)-\mathcal{E}(w)\|_2 \le \|v-w\|_2\right] \ge 1 - \exp(-\Omega(\epsilon^2 e)),$$
where $\|\cdot\|_2$ is the Euclidean ($\ell_2$) distance.
The lemma is usually applied via the following corollary, which follows by taking a union bound.
Corollary 2. Given a set $S$ of $n$ $d$-dimensional real vectors, there is a linear map $\mathcal{E}: \mathbb{R}^d \to \mathbb{R}^{O(\log n/\epsilon^2)}$ that preserves all Euclidean distances in $S$, up to a multiple of $1-\epsilon$. Further, there is an efficient randomized algorithm to find and implement $\mathcal{E}$.
There are several remarkable aspects of this result. First, the target dimension does not depend on the source dimension $d$ at all. Second, the randomized algorithm can be simply stated as: choose a random $e$-dimensional subspace with $e = O(\log n/\epsilon^2)$, project each vector in $S$ onto this subspace, and rescale the result by a constant that does not depend on $S$. Third, this algorithm is oblivious: in other words, $\mathcal{E}$ does not depend on the vectors whose dimensionality is to be reduced.
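The oblivious character of the algorithm can be illustrated with a minimal sketch. Note the assumptions: instead of projecting onto a random subspace, the code below uses a matrix of independent Gaussian entries, a common stand-in with similar behaviour, and it is not rescaled to be contractive. The function names, dimensions and seeds are illustrative only.

```python
import math
import random

def gaussian_map(d, e, rng):
    """Return an e x d matrix with i.i.d. N(0, 1/e) entries (assumed stand-in
    for 'project onto a random e-dimensional subspace and rescale')."""
    scale = 1.0 / math.sqrt(e)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(e)]

def project(matrix, v):
    """Apply the linear map to a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]

def dist(v, w):
    """Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))

def distortion_ratios(points, e, seed=0):
    """Ratio (distance after the map) / (distance before) for every pair."""
    rng = random.Random(seed)
    E = gaussian_map(len(points[0]), e, rng)  # chosen without looking at the points
    images = [project(E, p) for p in points]
    return [dist(images[i], images[j]) / dist(points[i], points[j])
            for i in range(len(points)) for j in range(i + 1, len(points))]

point_rng = random.Random(1)
points = [[point_rng.uniform(-1, 1) for _ in range(300)] for _ in range(6)]
ratios = distortion_ratios(points, e=150)
print(min(ratios), max(ratios))
```

All pairwise ratios should cluster around 1, even though the map was drawn before the points were inspected.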
More generally, let $\ell_p^d$ be the vector space $\mathbb{R}^d$ equipped with the $\ell_p$-norm $\|\cdot\|_p$. A randomized embedding from $\ell_p^d$ to $\ell_p^e$ with distortion$^a$ $1/(1-\epsilon)$ and failure probability $\delta$ is a distribution $\mathcal{D}$ over maps $\mathcal{E}: \mathbb{R}^d \to \mathbb{R}^e$ such that, for all $v, w \in \mathbb{R}^d$,
$$\Pr_{\mathcal{E} \sim \mathcal{D}}\left[(1-\epsilon)\|v-w\|_p \le \|\mathcal{E}(v)-\mathcal{E}(w)\|_p \le \|v-w\|_p\right] \ge 1 - \delta.$$
This definition does not allow the distance between vectors to increase; such embeddings are called contractive. The JL Lemma states that there exists a randomized embedding from $\ell_2^d$ to $\ell_2^e$ with distortion $1/(1-\epsilon)$ and failure probability $\exp(-\Omega(\epsilon^2 e))$. Another natural norm to consider in this context is $\ell_1$. In this case the situation is less favorable: it has been shown by Charikar and Sahai [5] that there exist $O(d)$ points in $\ell_1^d$ such that any linear embedding into $\ell_1^e$ must incur distortion $\Omega(\sqrt{d/e})$. Brinkman and Charikar later gave a set of $n$ points for which any (even nonlinear) embedding achieving distortion $D$ requires $n^{\Omega(1/D^2)}$ dimensions [6].

The JL Lemma in quantum information theory
The JL Lemma immediately gives rise to a protocol for quantum fingerprinting [7], or in other words efficient equality testing. Imagine that Alice and Bob each have an $n$-bit string, and are required to send quantum states of the shortest possible length to a referee, who has to use these states to determine if their bit strings are equal (this is the so-called SMP, or simultaneous message passing, model of communication complexity [8]). Associate each bit string with an orthonormal basis vector of $\mathbb{R}^{2^n}$. Then the JL Lemma guarantees that there exists a map from $\mathbb{R}^{2^n}$ into $\mathbb{R}^{O(n)}$ such that the inner products between all of these $2^n$ vectors are preserved, up to a small constant. So Alice and Bob each simply apply this map to their vectors, renormalize the output (which makes very little difference to the inner products), and send the $O(\log n)$ qubit states corresponding to the resulting $O(n)$-dimensional vectors to the referee, who applies the swap test to the states [7]. Given two states $|\psi\rangle$, $|\phi\rangle$, this test accepts with probability $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$. As the inner products are approximately preserved by the map into $\mathbb{R}^{O(n)}$, the referee can distinguish between the two cases of the states he receives being equal or distinct, with constant probability.
$^a$ We use this somewhat clumsy definition of distortion for consistency with prior work.
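As a small illustration of the acceptance probability quoted above, the following sketch evaluates $\frac{1}{2} + \frac{1}{2}|\langle\psi|\phi\rangle|^2$ for state vectors given as lists of amplitudes. It only computes the formula and does not simulate the swap-test circuit; the function names are illustrative.

```python
import math

def inner(psi, phi):
    """Inner product <psi|phi> of two state vectors given as lists of amplitudes."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi, phi))

def swap_test_accept_probability(psi, phi):
    """Acceptance probability 1/2 + 1/2 |<psi|phi>|^2 of the swap test."""
    return 0.5 + 0.5 * abs(inner(psi, phi)) ** 2

zero = [1, 0]
one = [0, 1]
plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]

print(swap_test_accept_probability(zero, zero))  # identical states
print(swap_test_accept_probability(zero, one))   # orthogonal states
print(swap_test_accept_probability(zero, plus))  # overlap 1/2
```

Identical states are always accepted, while orthogonal states are accepted only half the time, which is why a constant gap in inner products suffices for the referee.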
More generally, Alice and Bob can use a similar SMP protocol to solve the following task: given quantum states $|\psi_A\rangle$, $|\psi_B\rangle$, each picked from a set of $k$ states, determine $\langle\psi_A|\psi_B\rangle$ up to a constant. Whatever the initial dimension of the states, the JL Lemma (strictly speaking, an easy extension of the JL Lemma to complex vectors) guarantees that they can be compressed to $O(\log k)$ dimensions with at most constant distortion, implying that the referee can estimate $\langle\psi_A|\psi_B\rangle$ up to a constant using only $O(\log\log k)$ qubits of communication.
However, there is a problem with this protocol. While it is oblivious in the sense that it does not depend on the $k$ states which are given as input, it is not oblivious in the following quantum sense: Alice and Bob each need to know what their states are in order to apply the embedding.$^b$ One would expect the right quantum analogue of a randomized embedding to map quantum states to quantum states in an oblivious fashion. Such an algorithm can be expressed as a distribution over quantum channels (completely positive, trace preserving (CPTP) maps [9,10]), which are the class of physically implementable operations in quantum theory.
Let $B(d)$ denote the set of $d$-dimensional Hermitian operators. The distance between quantum states $\rho, \sigma \in B(d)$ can be measured using the Schatten $p$-norm $\|\rho-\sigma\|_p$, which is defined as $\|X\|_p = \left(\sum_i |\lambda_i(X)|^p\right)^{1/p}$, where $\lambda_i(X)$ is the $i$th eigenvalue of $X$. The case $p = 1$ is known as the trace norm, and $p = 2$ is sometimes known as the Hilbert-Schmidt norm. We have the following definition.
Definition 1. A quantum embedding from $S \subseteq B(d)$ to $B(e)$ in the Schatten $p$-norm, with distortion $1/(1-\epsilon)$ and failure probability $\delta$, is a distribution $\mathcal{D}$ over quantum channels $\mathcal{E}: B(d) \to B(e)$ such that, for all $\rho, \sigma \in S$,
$$\Pr_{\mathcal{E} \sim \mathcal{D}}\left[(1-\epsilon)\|\rho-\sigma\|_p \le \|\mathcal{E}(\rho)-\mathcal{E}(\sigma)\|_p \le \|\rho-\sigma\|_p\right] \ge 1 - \delta.$$
Rather than only considering embeddings that succeed for all states in $B(d)$, we generalize the definition to subsets of states. An interesting such subset is the pure states, for which one might imagine stronger embeddings can be obtained. Indeed, a closely related notion has been studied before by Winter [11], and more recently Hayden and Winter [12], under the name of quantum identification for the identity channel. In this setting, the sender Alice has a pure state $|\psi\rangle \in \mathbb{C}^d$ and the receiver Bob is given the description of a pure state $|\phi\rangle \in \mathbb{C}^d$. Alice encodes her state $|\psi\rangle$ as a quantum message using a quantum channel $\mathcal{E}: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ and sends it to Bob, who performs a measurement $(D_\phi, I - D_\phi)$ on the message. The goal is to obtain approximately the same measurement statistics as if Bob had performed the measurement $(|\phi\rangle\langle\phi|, I - |\phi\rangle\langle\phi|)$ on $|\psi\rangle$:
$$\forall\, |\psi\rangle, |\phi\rangle: \quad \left|\mathrm{tr}[D_\phi\, \mathcal{E}(|\psi\rangle\langle\psi|)] - |\langle\phi|\psi\rangle|^2\right| \le \epsilon.$$
Winter showed in Ref. 11 that, for constant $\epsilon$, this can be achieved with $e = O(\sqrt{d})$; note that the resulting states $\mathcal{E}(|\psi\rangle\langle\psi|)$ are highly mixed. Winter's result allows the development of a one-way protocol for testing equality of $n$-bit strings using $\frac{1}{2}\log_2 n + O(1)$ qubits of communication from Alice to Bob, which is still the best known separation between one-way quantum and classical communication complexity for total functions [13]. In our terminology, the result of Ref. 11 shows that there exists a quantum embedding from $B(d)$ to $B(O(\sqrt{d}))$ that approximately preserves the trace distance between (initially) pure states. But note that one aspect of Winter's result is stronger than we need: he showed the existence of a channel such that the distance is approximately preserved between all pairs of states. Here, we are interested in finding distributions $\mathcal{D}$ over channels $\mathcal{E}$ such that, for an arbitrary pair of states, the distance is approximately preserved with high probability; this is potentially a weaker notion. In particular, it is not necessarily true that the individual channel obtained by averaging over $\mathcal{D}$ will preserve the distance between an arbitrary pair of states.
$^b$ On the other hand, if the unphysical operation of postselection is allowed, the JL Lemma can be applied directly.
We pause to mention that the JL Lemma has found some other uses in quantum information theory. Cleve et al. [14] used it to give an upper bound on the amount of shared entanglement required to win a particular class of non-local games. Gavinsky, Kempe and de Wolf [15] used it to give a simulation of arbitrary quantum communication protocols by quantum SMP protocols (with exponential overhead). Embeddings between norms have also been used. Aubrun, Szarek and Werner [16,17] have used a version of Dvoretzky's theorem on "almost-Euclidean" subspaces of matrices under Schatten norms to give counterexamples to the additivity conjectures of quantum information theory. And, more recently, Fawzi, Hayden and Sen [18] have used ideas from the theory of low-distortion embeddings of the "$\ell_1(\ell_2)$"-norm to prove the existence of strong entropic uncertainty relations.

Our results
In this paper, we show that the dimensionality reduction that can be achieved by quantum embeddings is very limited. We begin, in Sec. 2, by considering the Schatten 2-norm (which is just the vector 2-norm on matrices). We show that, in stark contrast to the JL Lemma, any quantum embedding which preserves the 2-norm distance between (say) orthogonal pure states with constant distortion and constant failure probability can only achieve at most a constant reduction in dimension.
One potential criticism of this result is that the 2-norm is not usually seen as a physically meaningful distance measure, as compared with the trace norm. However, we argue in Sec. 3 that for certain problems the 2-norm is indeed the correct distance measure. We discuss two problems, equality testing without a reference frame and state discrimination with a random measurement, where the 2-norm appears naturally as the figure of merit.
In Sec. 4 we turn to the trace norm, for which we have upper and lower bounds. On the upper bound side, we extend the result of Winter [11] to show that low-rank mixed states are also amenable to dimensionality reduction; roughly speaking, $d$-dimensional mixed states of rank $r$ can be embedded into $O(\sqrt{rd})$ dimensions with constant distortion. On the other hand, we show using the 2-norm lower bound that highly mixed states cannot be embedded into low dimension: there is a lower bound of $\Omega\left(\sqrt{d}\,\|\rho-\sigma\|_1/\|\rho-\sigma\|_2\right)$ on the target dimension of any constant distortion trace norm embedding that succeeds with constant probability for the pairs $U\rho U^\dagger$, $U\sigma U^\dagger$ for all unitary operators $U$. In particular, this implies an $\Omega(\sqrt{d})$ lower bound for any embedding which succeeds for a unitarily invariant set of states. In the case that $|\rho-\sigma|$ is proportional to a projector (i.e. all non-zero eigenvalues of $\rho-\sigma$ are equal in absolute value), our upper and lower bounds coincide.
Finally, some notes on miscellaneous notation. $F_d$ will denote the unitary operator which swaps (or flips) two $d$-dimensional quantum systems (i.e. $F_d = \sum_{i,j=1}^{d} |i\rangle\langle j| \otimes |j\rangle\langle i|$), and $I_d$ will denote the $d$-dimensional identity matrix. Whenever we say that $U \in U(d)$ is a random unitary operator, we mean that $U$ is picked uniformly at random according to Haar measure on the unitary group $U(d)$.

Dimensionality Reduction in the 2-Norm
We now show that quantum dimensionality reduction in the 2-norm is very limited.
Theorem 3. Let $\mathcal{D}$ be a distribution over quantum channels (CPTP maps) $\mathcal{E}: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ such that, for fixed quantum states $\rho \ne \sigma$ and for all unitary operators $U \in U(d)$,
$$\Pr_{\mathcal{E} \sim \mathcal{D}}\left[\|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2 \ge (1-\epsilon)\|\rho-\sigma\|_2\right] \ge 1 - \delta.$$
Then $e \ge (1-\delta)(1-\epsilon)^2 d$.
Note that the above lower bound on target dimension holds for any embedding of a unitarily invariant set of states. For example, taking $\rho$ and $\sigma$ to be orthogonal pure states and inserting $\epsilon = \delta = 0$ recovers the (unsurprising) result that any embedding that exactly preserves distances between all orthogonal pure states with certainty must satisfy $e \ge d$. More generally, if we have an embedding which succeeds with constant probability and has constant distortion, the target dimension can be no smaller than $\Omega(d)$. In order to prove the theorem, we will need the following two technical lemmas, which are proved in Appendix A.
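Lemma 4. Let $\mathcal{E}: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel (CPTP map). Then
$$\mathrm{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)] \le de.$$
Lemma 5. Let $\rho$ and $\sigma$ be $d$-dimensional quantum states. Then
$$\int U^{\otimes 2} (\rho-\sigma)^{\otimes 2} (U^\dagger)^{\otimes 2}\, dU = \frac{\|\rho-\sigma\|_2^2}{d^2-1}\left(F_d - \frac{I_{d^2}}{d}\right).$$
The following lemma is the key to most of the results in this paper.
Lemma 6. Let $\rho$ and $\sigma$ be quantum states and let $\mathcal{E}: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel. Then
$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \le \frac{e}{d} \cdot \frac{1 - 1/e^2}{1 - 1/d^2}\, \|\rho-\sigma\|_2^2.$$
Proof. We have
$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU = \int \mathrm{tr}\left[\mathcal{E}(U(\rho-\sigma)U^\dagger)^2\right] dU$$
$$= \int \mathrm{tr}\left[F_e\, \mathcal{E}(U(\rho-\sigma)U^\dagger)^{\otimes 2}\right] dU$$
$$= \mathrm{tr}\left[F_e\, \mathcal{E}^{\otimes 2}\left(\int U^{\otimes 2}(\rho-\sigma)^{\otimes 2}(U^\dagger)^{\otimes 2}\, dU\right)\right]$$
$$= \frac{\|\rho-\sigma\|_2^2}{d^2-1}\left(\mathrm{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)] - d\, \mathrm{tr}[\mathcal{E}(I_d/d)^2]\right)$$
$$\le \frac{\|\rho-\sigma\|_2^2}{d^2-1}\left(de - d\, \mathrm{tr}[\mathcal{E}(I_d/d)^2]\right) \le \frac{\|\rho-\sigma\|_2^2}{d^2-1}\left(de - \frac{d}{e}\right),$$
and the final expression equals the claimed bound. We use linearity of $\mathcal{E}$ in the first equality, and the second equality is the tensor product trick $\mathrm{tr}[X^2] = \mathrm{tr}[F_e X^{\otimes 2}]$ for $e$-dimensional operators $X$. The fourth equality is Lemma 5, the first inequality is Lemma 4, and the second inequality is simply $\mathrm{tr}\,\rho^2 \ge 1/e$ for all $e$-dimensional states $\rho$.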
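The tensor product trick used above, $\mathrm{tr}[X^2] = \mathrm{tr}[F X^{\otimes 2}]$, can be checked numerically on a small example. The sketch below builds the swap operator from its definition and compares both sides for a $2 \times 2$ Hermitian matrix; the helper names and the test matrix are illustrative choices.

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def kron(A, B):
    """Kronecker (tensor) product of two matrices."""
    p, q = len(B), len(B[0])
    return [[A[i // p][j // q] * B[i % p][j % q] for j in range(len(A[0]) * q)]
            for i in range(len(A) * p)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def swap_operator(d):
    """F_d = sum_{i,j} |i><j| (x) |j><i|, which maps |i>|j> to |j>|i>."""
    F = [[0] * (d * d) for _ in range(d * d)]
    for i in range(d):
        for j in range(d):
            F[i * d + j][j * d + i] = 1
    return F

X = [[2, 1 - 1j], [1 + 1j, -1]]               # an arbitrary Hermitian matrix
lhs = trace(matmul(X, X))                      # tr[X^2]
rhs = trace(matmul(swap_operator(2), kron(X, X)))  # tr[F (X (x) X)]
print(lhs, rhs)
```

Both quantities equal the sum of squared absolute values of the entries of $X$, as expected for a Hermitian matrix.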
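We are finally ready to prove Theorem 3.
Proof of Theorem 3. We will prove something slightly stronger: that for a random $U$, the 2-norm is not approximately preserved under a map $\mathcal{E}$ picked from $\mathcal{D}$, unless $e$ is almost as large as $d$. So assume
$$\int \Pr_{\mathcal{E} \sim \mathcal{D}}\left[\|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2 \ge (1-\epsilon)\|\rho-\sigma\|_2\right] dU \ge 1 - \delta,$$
where we use the unitary invariance of the 2-norm. By Markov's inequality, this implies that
$$\int \mathbb{E}_{\mathcal{E} \sim \mathcal{D}}\left[\|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\right] dU \ge (1-\delta)(1-\epsilon)^2 \|\rho-\sigma\|_2^2,$$
implying in turn that there must exist some $\mathcal{E}$ such that
$$\int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \ge (1-\delta)(1-\epsilon)^2 \|\rho-\sigma\|_2^2.$$
So let $\mathcal{E}: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be a quantum channel that does satisfy this inequality. Then we have
$$(1-\delta)(1-\epsilon)^2 \|\rho-\sigma\|_2^2 \le \int \|\mathcal{E}(U\rho U^\dagger) - \mathcal{E}(U\sigma U^\dagger)\|_2^2\, dU \le \frac{e}{d}\|\rho-\sigma\|_2^2,$$
where the second inequality follows from Lemma 6, assuming that $e \le d$. We have shown that $e \ge (1-\delta)(1-\epsilon)^2 d$, completing the proof of the theorem.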
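To get a feel for the bound, the following sketch evaluates $(1-\delta)(1-\epsilon)^2 d$ for a few parameter choices, reading the final inequality of the proof as $e \ge (1-\delta)(1-\epsilon)^2 d$ with $\epsilon$ the distortion parameter and $\delta$ the failure probability. The function name is hypothetical.

```python
import math

def min_target_dimension(d, eps, delta):
    """Smallest integer e allowed by e >= (1 - delta) * (1 - eps)^2 * d."""
    return math.ceil((1 - delta) * (1 - eps) ** 2 * d)

print(min_target_dimension(100, 0, 0))        # exact, certain embedding: no reduction
print(min_target_dimension(1000, 0.5, 0.5))   # constant distortion and failure probability
print(min_target_dimension(1024, 0.1, 0.1))
```

Even with distortion parameter and failure probability both equal to one half, the target dimension stays within a constant factor of $d$, in contrast with the logarithmic target dimension of the classical JL Lemma.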

Operational Meaning of the 2-Norm
In this section, we discuss the meaning of the 2-norm distance between quantum states. It is usually assumed that the trace norm is the "right" measure of distance between states, and proofs going via the 2-norm usually do so only for calculational simplicity. However, here we argue that the 2-norm is of interest in its own right, by giving two operational interpretations of this distance measure.

Equality testing without a reference frame
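Consider the following equality-testing game. We are given a description of two different states $\rho$ and $\sigma$. An adversary prepares two systems in one of the states $\rho \otimes \rho$, $\sigma \otimes \sigma$, $\rho \otimes \sigma$ or $\sigma \otimes \rho$, with equal probability of each. He then applies an unknown unitary $U$ to each system (i.e. he applies $U \otimes U$ to the joint state). Our task is to determine whether the two systems have the same state or different states. This models equality testing in a two-party scenario in which the preparer and tester do not share a reference frame [19]. One protocol for solving this task is simply to apply the swap test [7] to the two states we are given, output "same" if the test accepts, and "different" otherwise. When applied to two states $\rho$, $\sigma$, this test accepts with probability $\frac{1}{2} + \frac{1}{2}\mathrm{tr}\,\rho\sigma$, so for any $U$ the overall probability of success is
$$\frac{1}{4}\left[\left(\tfrac{1}{2} + \tfrac{1}{2}\mathrm{tr}\,\rho^2\right) + \left(\tfrac{1}{2} + \tfrac{1}{2}\mathrm{tr}\,\sigma^2\right) + 2\left(\tfrac{1}{2} - \tfrac{1}{2}\mathrm{tr}\,\rho\sigma\right)\right] = \frac{1}{2} + \frac{1}{8}\|\rho-\sigma\|_2^2.$$
Using our previous result, we now show that this is optimal.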
Theorem 7. The maximal probability of success of the above game is $\frac{1}{2} + \frac{1}{8}\|\rho-\sigma\|_2^2$.
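Proof. Let $(M, I - M)$ be an arbitrary POVM where the operator $M$ corresponds to the answer "same". Then the probability of success achieved by this POVM for a given $U$ is $\frac{1}{2} + \frac{1}{2}B$, where $B$ is the bias, which is equal to
$$B = \frac{1}{2}\mathrm{tr}\left[M\, U^{\otimes 2}(\rho-\sigma)^{\otimes 2}(U^\dagger)^{\otimes 2}\right].$$
If the adversary adopts the strategy of picking $U$ uniformly at random, the average bias obtained is
$$\frac{1}{2}\mathrm{tr}\left[M \int U^{\otimes 2}(\rho-\sigma)^{\otimes 2}(U^\dagger)^{\otimes 2}\, dU\right],$$
which by Lemma 5 is equal to
$$\frac{\|\rho-\sigma\|_2^2}{2(d^2-1)}\, \mathrm{tr}\left[M\left(F_d - \frac{I_{d^2}}{d}\right)\right].$$
This expression is maximized by setting $M$ equal to a projector onto the subspace spanned by the eigenvectors of $F_d - I_{d^2}/d$ with positive eigenvalues. As $F_d$ has $d(d+1)/2$ eigenvalues equal to 1, and $d(d-1)/2$ eigenvalues equal to $-1$, we obtain $\mathrm{tr}[M(F_d - I_{d^2}/d)] = (d^2-1)/2$. This implies that the average bias is at most $\frac{1}{4}\|\rho-\sigma\|_2^2$. As the worst-case bias can only be lower, this implies the claimed result.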
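As a sanity check on Theorem 7, the sketch below computes the success probability of the swap-test strategy directly from the acceptance probability $\frac{1}{2} + \frac{1}{2}\mathrm{tr}\,\rho\sigma$ and compares it with $\frac{1}{2} + \frac{1}{8}\|\rho-\sigma\|_2^2$. It assumes the adversary chooses uniformly among the four product preparations $\rho\otimes\rho$, $\sigma\otimes\sigma$, $\rho\otimes\sigma$, $\sigma\otimes\rho$; the unknown unitary is omitted because it does not change any of the traces involved. Function names are illustrative.

```python
def trace_product(A, B):
    """tr[A B] for square matrices given as nested lists."""
    n = len(A)
    return sum(A[i][k] * B[k][i] for i in range(n) for k in range(n))

def swap_accept(A, B):
    """Swap-test acceptance probability 1/2 + 1/2 tr[A B] on the product state A (x) B."""
    return 0.5 + 0.5 * complex(trace_product(A, B)).real

def swap_test_success(rho, sigma):
    """Success probability of answering 'same' exactly when the swap test accepts."""
    same = swap_accept(rho, rho) + swap_accept(sigma, sigma)
    different = 2 * (1 - swap_accept(rho, sigma))
    return 0.25 * (same + different)

def hs_distance_squared(rho, sigma):
    """Squared 2-norm (Hilbert-Schmidt) distance tr[(rho - sigma)^2]."""
    diff = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(rho, sigma)]
    return complex(trace_product(diff, diff)).real

ket0 = [[1, 0], [0, 0]]
ket1 = [[0, 0], [0, 1]]
plus = [[0.5, 0.5], [0.5, 0.5]]

for rho, sigma in [(ket0, ket1), (plus, ket0)]:
    print(swap_test_success(rho, sigma), 0.5 + hs_distance_squared(rho, sigma) / 8)
```

In both cases the two printed numbers coincide, matching the value claimed to be optimal.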

Performing a random measurement
The second game we will discuss is state discrimination with a fixed or random measurement. Imagine we are given a state which is promised to be either $\rho$ or $\sigma$, with equal probability of each, and we wish to determine which is the case. It is well known that the largest bias achievable by choosing an appropriate measurement is $\frac{1}{2}\|\rho-\sigma\|_1$ (recall from the previous section that the bias $B$ and the success probability $p$ have the relationship $p = \frac{1}{2} + \frac{B}{2}$). But how well can we do if the measurement we apply does not in fact depend on $\rho$ and $\sigma$?
We will see that $\|\rho-\sigma\|_2$ is closely related to the optimal bias achievable by performing one of the following two measurements, and deciding whether the state is $\rho$ or $\sigma$ based on the outcome.
• The uniform (isotropic) POVM whose measurement elements consist of normalized projectors onto all states $|\psi\rangle$.
• A projective measurement in a random basis (i.e. applying a random unitary operator and measuring in the computational basis).
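In general, the largest bias achievable by measuring a POVM $M$ which consists of measurement operators $M_i$ can be written as
$$\frac{1}{2}\sum_i \left|\mathrm{tr}[M_i(\rho-\sigma)]\right|.$$
Each measurement operator of the uniform POVM is given by the projector onto some state $|\psi\rangle$, normalized by a factor of $d$ (to check that this is right, note that $d\int |\psi\rangle\langle\psi|\, d\psi = I_d$). So the bias induced by the uniform POVM is
$$\frac{d}{2}\int \left|\langle\psi|(\rho-\sigma)|\psi\rangle\right| d\psi.$$
In the case of a measurement in a random basis $U \in U(d)$, we can calculate the expected bias as follows:
$$\int \frac{1}{2}\sum_{i=1}^{d} \left|\langle i|U(\rho-\sigma)U^\dagger|i\rangle\right| dU = \frac{d}{2}\int \left|\langle\psi|(\rho-\sigma)|\psi\rangle\right| d\psi,$$
so these quantities are the same. They are also closely related to the 2-norm distance, as we will now see.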
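The bias of the uniform POVM can be estimated by Monte Carlo sampling. The sketch below assumes the bias is $\frac{d}{2}\int|\langle\psi|(\rho-\sigma)|\psi\rangle|\,d\psi$, which is our reading of the definitions in this section, and samples Haar-random pure states as normalized complex Gaussian vectors. Function names, sample count and seed are illustrative.

```python
import math
import random

def random_pure_state(d, rng):
    """A normalized complex Gaussian vector, which is a uniformly random pure state."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm = math.sqrt(sum(abs(a) ** 2 for a in v))
    return [a / norm for a in v]

def expectation(X, psi):
    """<psi| X |psi> for a Hermitian matrix X (real part taken to drop rounding noise)."""
    d = len(psi)
    return sum(psi[i].conjugate() * X[i][j] * psi[j]
               for i in range(d) for j in range(d)).real

def uniform_povm_bias(rho, sigma, samples=20000, seed=0):
    """Monte Carlo estimate of (d/2) * E_psi |<psi|(rho - sigma)|psi>| (assumed form)."""
    rng = random.Random(seed)
    d = len(rho)
    X = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(rho, sigma)]
    total = sum(abs(expectation(X, random_pure_state(d, rng))) for _ in range(samples))
    return 0.5 * d * total / samples

ket0 = [[1, 0], [0, 0]]
ket1 = [[0, 0], [0, 1]]
bias = uniform_povm_bias(ket0, ket1)
print(bias)
```

For these two orthogonal qubit states the estimate should be close to 1/2, compared with the optimal bias $\frac{1}{2}\|\rho-\sigma\|_1 = 1$ and with $\|\rho-\sigma\|_2 = \sqrt{2}$.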
Theorem 8. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then [equation missing]
The lower bound in Theorem 8 was shown by Ambainis and Emerson [20] (see also the proof of Matthews, Wehner and Winter [21]), and the upper bound is not hard. However, as this result does not appear to be widely known, we include a proof (which is essentially the same as that of Ref. 21) in Appendix B.
In fact, the corresponding upper and lower bounds on the bias hold for any fixed POVM whose measurement vectors form a 4-design [20], and the upper bound even holds for any fixed POVM whose vectors form a 2-design. This result can be useful in cases where one wishes to perform state discrimination without necessarily being able to construct the optimal measurement efficiently [22]. See Ref. 21 for much more detail on the bias achievable in state discrimination with fixed measurements.

Dimensionality Reduction in the Trace Norm
In this section we consider embeddings that reduce dimension while preserving the trace norm distance between states. As no quantum channel can increase this distance, we first observe that any such embedding will automatically be contractive.

Upper bound
It was previously shown by Winter [11] that, in our language, $d$-dimensional pure states can be embedded into $B(O(\sqrt{d}))$ with constant distortion. We now extend this result to general mixed states, by showing that rank $r$ mixed states can be embedded into dimension $O(\sqrt{rd})$ with constant distortion. The embedding is conceptually very simple: apply a random unitary and trace out a subsystem. However, when the target dimension $e$ does not divide $d$, we are forced to consider random isometries $V: \mathbb{C}^d \to \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$ instead of unitaries, where $\lceil x \rceil$ is the smallest integer $y$ such that $y \ge x$. Recall that an isometry is a norm-preserving linear map, i.e. a map taking an orthonormal basis of one space to an orthonormal set of vectors in another (potentially larger) space. A random isometry is defined as a fixed isometry followed by a random unitary. Formally, our embedding is a distribution over the following quantum channels $\mathcal{E}_V$.
Definition 2. Let $d$ and $e$ be positive integers such that $e \le d$. For any isometry $V: \mathbb{C}^d \to \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$, let $\mathcal{E}_V: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ be the quantum channel that consists of performing $V$, then tracing out (discarding) the second subsystem.
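The channel of Definition 2 can be sketched for small dimensions as "apply an isometry, then take a partial trace". The example below uses one fixed, hand-picked isometry purely for illustration, whereas the embedding in the text draws the isometry at random; all function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[i][j]).conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def partial_trace_second(M, e, k):
    """Trace out the second (k-dimensional) factor of an operator on C^e (x) C^k."""
    return [[sum(M[a * k + b][c * k + b] for b in range(k)) for c in range(e)]
            for a in range(e)]

def channel_E_V(rho, V, e):
    """E_V(rho) = tr_B[V rho V^dagger] for an isometry V from C^d to C^e (x) C^ceil(d/e)."""
    d = len(rho)
    k = math.ceil(d / e)
    assert len(V) == e * k and len(V[0]) == d
    return partial_trace_second(matmul(matmul(V, rho), dagger(V)), e, k)

# Illustrative fixed isometry from C^3 into C^2 (x) C^2: basis vector i goes to basis vector i.
V = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
maximally_mixed = [[1 / 3, 0, 0], [0, 1 / 3, 0], [0, 0, 1 / 3]]
third_basis_state = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]

out_mixed = channel_E_V(maximally_mixed, V, e=2)
out_pure = channel_E_V(third_basis_state, V, e=2)
print(out_mixed)
print(out_pure)
```

The output is always an $e$-dimensional operator with the same trace as the input, as required of a quantum channel.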
We now analyze the performance of the embedding obtained by picking a random V and applying this channel.
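Theorem 9. Let $d$ be a positive integer, and let $\rho$ and $\sigma$ be arbitrary $d$-dimensional mixed states such that $\rho$ has rank $r$. Fix $\epsilon$ such that $0 < \epsilon < 1$. For any $e$ such that $2\sqrt{rd/\epsilon} \le e \le d$, let $\mathcal{D}$ be the distribution on channels $\mathcal{E}_V: B(\mathbb{C}^d) \to B(\mathbb{C}^e)$ that is uniform on isometries $V: \mathbb{C}^d \to \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$. Then
$$\Pr_{\mathcal{E}_V \sim \mathcal{D}}\left[\|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \le (1-\epsilon)\|\rho-\sigma\|_1\right] \le d \exp(-K\epsilon d)$$
for a universal constant $K$ which may be taken to be $(1 - \ln 2)/(2\ln 2) \approx 0.22$.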
In order to prove this theorem, we will need the following technical lemma, which is proven in Appendix C.
Lemma 10. Let $H = H_A \otimes H_B$ be a finite-dimensional Hilbert space decomposed into subsystems $A$ and $B$. For any projector $P$ onto a subspace of $H$, let $P^\perp = I - P$ be the projector onto the orthogonal subspace, and let $D$ be the projector onto the support of $\mathrm{tr}_B P$. Then, for any $|\psi\rangle \in H$,
$$\mathrm{tr}[(D \otimes I)P^\perp|\psi\rangle\langle\psi|P^\perp] \le \mathrm{tr}[(D \otimes I)|\psi\rangle\langle\psi|]\, \mathrm{tr}[P^\perp|\psi\rangle\langle\psi|].$$
We will also need the following useful result of Bennett et al. [23] (see also Ref. 11).
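Lemma 11. Let $|\psi\rangle$ be a $d$-dimensional pure state, let $P$ be the projector onto a $t$-dimensional subspace of $\mathbb{C}^d$, and let $U \in U(d)$ be picked according to Haar measure. Then, for any $\eta \ge 0$,
$$\Pr_U\left[\mathrm{tr}[P\, U|\psi\rangle\langle\psi|U^\dagger] \ge (1+\eta)\frac{t}{d}\right] \le \exp\left(-t\,\frac{\eta - \ln(1+\eta)}{\ln 2}\right).$$
Proof of Theorem 9. We will upper bound the probability of the embedding failing, i.e.
$$\Pr_{\mathcal{E}_V \sim \mathcal{D}}\left[\|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \le (1-\epsilon)\|\rho-\sigma\|_1\right].$$
Let $S_+$ and $S_-$ be the disjoint sets of indices of $(\rho-\sigma)$'s positive and negative eigenvalues, respectively. Set $s = |S_+|$, and note that $s \le \mathrm{rank}(\rho) = r$ [24]. For a fixed $V$, expand $V(\rho-\sigma)V^\dagger$ as follows:
$$V(\rho-\sigma)V^\dagger = \sum_{i \in S_+} \lambda_i |\psi_i\rangle\langle\psi_i| + \sum_{i \in S_-} \lambda_i |\psi_i\rangle\langle\psi_i|,$$
where $\lambda_i$ are the non-zero eigenvalues of $\rho-\sigma$ and $|\psi_i\rangle$ is the image under $V$ of the corresponding eigenvector. For any states $\rho'$ and $\sigma'$, it holds that
$$\|\rho'-\sigma'\|_1 = 2\max_{0 \le M \le I} \mathrm{tr}[M(\rho'-\sigma')];$$
in a protocol for distinguishing $\rho'$ and $\sigma'$, $M$ is a measurement operator corresponding to the outcome that the state was $\rho'$. Thus, in order for it to hold that $\|\mathcal{E}_V(\rho-\sigma)\|_1 \ge (1-\epsilon)\|\rho-\sigma\|_1$, it suffices to exhibit an operator $M$ such that $0 \le M \le I$ and
$$\mathrm{tr}[M\, \mathcal{E}_V(\rho-\sigma)] \ge \tfrac{1}{2}(1-\epsilon)\|\rho-\sigma\|_1.$$
To find such an operator, set
$$P_V = \sum_{i \in S_+} |\psi_i\rangle\langle\psi_i|.$$
Note that $P_V$ is the projector onto a random $s$-dimensional subspace of $\mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$. Now let $D_V$ be the projector onto the support of $\mathrm{tr}_B P_V$. Then
$$\mathrm{tr}[D_V\, \mathcal{E}_V(\rho-\sigma)] = \sum_{i \in S_+} \lambda_i\, \mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] - \sum_{i \in S_-} |\lambda_i|\, \mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|]. \qquad (1)$$
For all $i \in S_+$, $\mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] = 1$, and for all $i \in S_-$, it holds that $\mathrm{tr}[P_V |\psi_i\rangle\langle\psi_i|] = 0$. Aside from this constraint, each individual state $|\psi_i\rangle$, $i \in S_-$, is picked at random and can be expressed in terms of a general random state $|\phi\rangle \in \mathbb{C}^e \otimes \mathbb{C}^{\lceil d/e \rceil}$ as
$$|\psi_i\rangle = \frac{P_V^\perp|\phi\rangle}{\sqrt{\langle\phi|P_V^\perp|\phi\rangle}},$$
where $P_V^\perp = I - P_V$ and the denominator is non-zero with probability 1. Then
$$\mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] = \frac{\mathrm{tr}[(D_V \otimes I)P_V^\perp|\phi\rangle\langle\phi|P_V^\perp]}{\mathrm{tr}[P_V^\perp|\phi\rangle\langle\phi|]} \le \mathrm{tr}[(D_V \otimes I)|\phi\rangle\langle\phi|],$$
where the inequality is Lemma 10. For any $e$ such that $e \ge s\lceil d/e \rceil$, $D_V$ has rank $s\lceil d/e \rceil$ with probability 1. So, for any such $e$, $D_V \otimes I$ has rank $s\lceil d/e \rceil^2$ with probability 1. Applying Lemma 11, for any $\eta \ge 0$,
$$\Pr\left[\mathrm{tr}[(D_V \otimes I)|\phi\rangle\langle\phi|] \ge (1+\eta)\frac{s\lceil d/e \rceil}{e}\right] \le \exp\left(-s\lceil d/e \rceil^2\, \frac{\eta - \ln(1+\eta)}{\ln 2}\right),$$
and hence
$$\Pr\left[\mathrm{tr}[D_V\, \mathrm{tr}_B |\psi_i\rangle\langle\psi_i|] \ge (1+\eta)\frac{s\lceil d/e \rceil}{e}\right] \le \exp\left(-s\lceil d/e \rceil^2\, \frac{\eta - \ln(1+\eta)}{\ln 2}\right).$$
Using a union bound over $S_-$ in Eq. (1), for any $e$ satisfying $e \ge s\lceil d/e \rceil$ it holds that
$$\Pr\left[\|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \le \left(1 - (1+\eta)\frac{s\lceil d/e \rceil}{e}\right)\|\rho-\sigma\|_1\right] \le d \exp\left(-s\lceil d/e \rceil^2\, \frac{\eta - \ln(1+\eta)}{\ln 2}\right).$$
We now set $\eta = \frac{\epsilon e}{s\lceil d/e \rceil} - 1$. This gives the following bound, valid when $\epsilon e \ge s\lceil d/e \rceil$:
$$\Pr\left[\|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \le (1-\epsilon)\|\rho-\sigma\|_1\right] \le d \exp\left(-\frac{s(d/e)\lceil d/e \rceil}{\ln 2}\left(\frac{\epsilon e}{s\lceil d/e \rceil} - 1 - \ln\frac{\epsilon e}{s\lceil d/e \rceil}\right)\right).$$
Now the function $f(x) = x(1 + \ln(1/x))$ increases with $x$ in the range $0 < x \le 1$, so for any $e$ such that $\frac{s\lceil d/e \rceil}{\epsilon e} \le 1/2$, we have
$$\Pr\left[\|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \le (1-\epsilon)\|\rho-\sigma\|_1\right] \le d \exp\left(-\frac{\epsilon d\,(1 - f(1/2))}{\ln 2}\right) = d \exp\left(-\epsilon d\, \frac{1 - \ln 2}{2\ln 2}\right).$$
Thus this inequality holds for any $e$ such that $\epsilon e \ge 2s\lceil d/e \rceil$. As $\lceil d/e \rceil \le 2d/e$ for $e \le d$, this will be satisfied for any $e \ge 2\sqrt{sd/\epsilon}$, and in particular any $e \ge 2\sqrt{rd/\epsilon}$, implying for any such $e$
$$\Pr_{\mathcal{E}_V \sim \mathcal{D}}\left[\|\mathcal{E}_V(\rho) - \mathcal{E}_V(\sigma)\|_1 \le (1-\epsilon)\|\rho-\sigma\|_1\right] \le d \exp\left(-\epsilon d\, \frac{1 - \ln 2}{2\ln 2}\right)$$
as required.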
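The parameters of Theorem 9 are easy to tabulate. The sketch below takes the constant $K = (1 - \ln 2)/(2\ln 2)$ from the text and assumes that the target dimension condition reads $e \ge 2\sqrt{rd/\epsilon}$ and that the failure probability is bounded by $d\exp(-K\epsilon d)$; the precise placement of $\epsilon$ in both expressions is our reading of the proof. Function names are hypothetical.

```python
import math

K = (1 - math.log(2)) / (2 * math.log(2))   # approximately 0.22, as quoted in the text

def min_embedding_dimension(d, r, eps):
    """Smallest integer e with e >= 2 * sqrt(r * d / eps) (assumed form of the condition)."""
    return math.ceil(2 * math.sqrt(r * d / eps))

def failure_bound(d, eps):
    """Assumed upper bound d * exp(-K * eps * d) on the failure probability."""
    return d * math.exp(-K * eps * d)

print(round(K, 4))
print(min_embedding_dimension(10000, 1, 0.25))   # pure-state differences, d = 10^4
print(min_embedding_dimension(10000, 9, 0.25))   # rank 9
print(failure_bound(10000, 0.25))
```

The target dimension grows like the square root of $rd$, and for large $d$ the failure probability is negligible.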
Although this result is expressed in terms of the rank of the input states, a similar result would apply to states which are very close (in trace norm) to having low rank, but for simplicity we do not discuss this here.

Lower bound
It turns out that Lemma 6 is also strong enough to give a bound on embeddings of the trace norm, via a similar proof to that of Theorem 3. Charikar and Sahai [5] showed that there exists a set of $O(d)$ $d$-dimensional vectors whose dimension cannot be significantly reduced while preserving their $\ell_1$ distances. One might expect the same to be true for the trace norm, as the trace norm on diagonal matrices is just the $\ell_1$-norm of the diagonal entries. However, note that this does not follow immediately from Charikar and Sahai's work, as it is conceivable that an embedding mapping diagonal to non-diagonal matrices could do better. Nevertheless, we now show that dimensionality reduction is impossible for some sets of highly mixed states.
So we see that achieving any significant dimensionality reduction for arbitrary highly mixed states is impossible, and even for pure states the dimension can only be reduced by a square root (which was already known [11]).
Rearranging gives the theorem.
This implies that the protocol of Theorem 9 is optimal for certain families of states, up to constant factors. Consider the family of pairs $U\rho U^\dagger$, $U\sigma U^\dagger$ for all $U \in U(d)$, where $\rho$ and $\sigma$ are proportional to projectors onto orthogonal $r$-dimensional subspaces of $\mathbb{C}^d$. Then
$$\frac{\|\rho-\sigma\|_1}{\|\rho-\sigma\|_2} = \frac{2}{\sqrt{2/r}} = \sqrt{2r},$$
implying that embeddings of this family with constant distortion and failure probability have a lower bound on the target dimension of $\Omega(\sqrt{rd})$, which is achieved by the embedding of Theorem 9.
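The norm ratio for this family can be checked from the spectrum alone. The sketch below computes Schatten norms from a list of eigenvalues and evaluates the ratio of the trace norm to the 2-norm for states proportional to orthogonal rank-$r$ projectors, whose difference has eigenvalues $+1/r$ and $-1/r$, each $r$ times. Function names are illustrative.

```python
import math

def schatten_norm(eigenvalues, p):
    """Schatten p-norm of a Hermitian operator, computed from its eigenvalues."""
    return sum(abs(x) ** p for x in eigenvalues) ** (1.0 / p)

def projector_pair_spectrum(r):
    """Non-zero eigenvalues of rho - sigma for rho = P/r, sigma = Q/r with
    P, Q projectors onto orthogonal r-dimensional subspaces."""
    return [1.0 / r] * r + [-1.0 / r] * r

for r in (1, 4, 16):
    spectrum = projector_pair_spectrum(r)
    ratio = schatten_norm(spectrum, 1) / schatten_norm(spectrum, 2)
    print(r, ratio, math.sqrt(2 * r))
```

The ratio equals $\sqrt{2r}$, so multiplying by $\sqrt{d}$ gives a quantity of order $\sqrt{rd}$, in line with the scaling quoted in the text.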

Conclusions
We have shown that in the 2-norm, any constant-distortion embedding of a unitarily invariant set of $d$-dimensional states must have target dimension $\Omega(d)$, in contrast to the classical situation where an exponential reduction can be achieved. In the trace norm, the situation is somewhat better: $d$-dimensional states of rank $r$ can be embedded in $O(\sqrt{rd})$ dimensions with constant distortion, but there is a lower bound of $\Omega\left(\sqrt{d}\,\|\rho-\sigma\|_1/\|\rho-\sigma\|_2\right)$ dimensions on any constant distortion embedding that succeeds for the pairs of states $U\rho U^\dagger$ and $U\sigma U^\dagger$, for all unitary $U$.
Although the trace distance is often the most physically relevant distance measure to consider, we also argued that for certain tasks, the 2-norm distance is in fact the relevant distance measure between states. This occurs when the basis in which the states were prepared is unknown or the measurement apparatus does not depend on the states to be distinguished.
The alert reader will have noticed that, in the case where one is interested in embedding a unitarily invariant set of states, the embedding might as well start by performing a random unitary. Furthermore, as any quantum channel can be represented as an isometry into a larger space followed by tracing out a subsystem, this makes any embedding seem somewhat similar to the embedding used in Theorem 9. But note that the latter embedding is subtly different, as it can be seen as performing a fixed isometry followed by a random unitary, rather than vice versa. Further analysis of this embedding might allow the gap between the upper and lower bounds in the trace norm to be closed.
Another open question is whether bounds could be obtained on the possible dimensionality reduction when multiple copies of the input state are available. For example, if a very large number of copies are allowed, tomography can be performed, the input state can be approximately determined, and the JL Lemma applied. Presumably, even for a lower number of copies, stronger dimensionality reduction is possible than in the single-copy case. One could also ask whether stronger dimensionality reduction can be achieved by allowing some additional classical information; for some results in this direction, see Ref. 18.

Appendix A. Lemmas Relating to 2-Norm Embeddings
We now prove the subsidiary lemmas required for the proof of Lemma 6.
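Proof of Lemma 4. Assume that $\mathcal{E}$ has the Kraus (operator-sum) decomposition
$$\mathcal{E}(\rho) = \sum_i A_i \rho A_i^\dagger, \qquad \mathrm{tr}[A_i^\dagger A_j] = 0 \text{ for } i \ne j.$$
(Note that such a representation does indeed exist, from the unitary freedom in the Kraus decomposition [9].) Then write
$$\mathrm{tr}[F_e\, \mathcal{E}^{\otimes 2}(F_d)] = \sum_{i,j} \mathrm{tr}[F_e (A_i \otimes A_j) F_d (A_i^\dagger \otimes A_j^\dagger)]$$
$$= \sum_{i,j} \mathrm{tr}[(A_j \otimes A_i)(A_i^\dagger \otimes A_j^\dagger)]$$
$$= \sum_{i,j} \mathrm{tr}[A_j A_i^\dagger]\, \mathrm{tr}[A_i A_j^\dagger]$$
$$= \sum_i \left(\mathrm{tr}[A_i^\dagger A_i]\right)^2$$
$$\le e \sum_i \mathrm{tr}[A_i^\dagger A_i] = de.$$
The fourth equality uses the orthogonality of the $A_i$ and cyclicity of the trace, and the final inequality uses the facts that $\mathrm{tr}[A_i^\dagger A_i] \le e$ (as $A_i^\dagger A_i \le I_d$ has rank at most $e$) and $\sum_i \mathrm{tr}[A_i^\dagger A_i] = \mathrm{tr}[I_d] = d$.
Proof of Lemma 5. For brevity, set $\Delta := \int U^{\otimes 2}(\rho-\sigma)^{\otimes 2}(U^\dagger)^{\otimes 2}\, dU$. Because of the averaging ("twirling") over the unitary group, $\Delta$ must be a linear combination of the identity and swap operators on the space of two $d$-dimensional systems [25]. To evaluate this, we write $\Delta = \alpha I_{d^2} + \beta F_d$ and calculate
$$\mathrm{tr}[\Delta] = (\mathrm{tr}[\rho-\sigma])^2 = 0, \qquad \mathrm{tr}[\Delta F_d] = \mathrm{tr}[(\rho-\sigma)^2] = \|\rho-\sigma\|_2^2,$$
implying that
$$\alpha d^2 + \beta d = 0, \qquad \alpha d + \beta d^2 = \|\rho-\sigma\|_2^2.$$
Solving for $\alpha$ and $\beta$ gives the claimed result.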
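The last step of the proof of Lemma 5 amounts to solving two linear equations for the coefficients of the identity and the swap operator, called `alpha` and `beta` below (names chosen here for illustration). The sketch uses exact rational arithmetic and relies on the two facts $\mathrm{tr}[\Delta] = 0$ and $\mathrm{tr}[\Delta F_d] = \mathrm{tr}[(\rho-\sigma)^2]$, together with $\mathrm{tr}[I_{d^2}] = d^2$, $\mathrm{tr}[F_d] = d$ and $F_d^2 = I_{d^2}$.

```python
from fractions import Fraction

def twirl_coefficients(d, hs_norm_squared):
    """Solve alpha*d^2 + beta*d = 0 and alpha*d + beta*d^2 = t for (alpha, beta),
    where t = tr[(rho - sigma)^2]."""
    t = Fraction(hs_norm_squared)
    beta = t / (d * d - 1)
    alpha = -beta / d
    return alpha, beta

d, t = 2, 2   # e.g. two orthogonal pure qubit states, for which tr[(rho - sigma)^2] = 2
alpha, beta = twirl_coefficients(d, t)
print(alpha, beta)
print(alpha * d * d + beta * d, alpha * d + beta * d * d)   # should reproduce 0 and t
```

The solution has the form $\beta = t/(d^2-1)$ and $\alpha = -\beta/d$, so the twirled operator is proportional to $F_d - I_{d^2}/d$.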

Appendix B. Proof of Theorem 8
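We follow the strategy of Matthews, Wehner and Winter [21] to prove Theorem 8. We will use two subsidiary results, which are formalized as separate lemmas.
Lemma B.1. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then
$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^2 = \frac{\mathrm{tr}[(\rho-\sigma)^2]}{d(d+1)}.$$
Proof. We use the tensor product trick:
$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^2 = \mathrm{tr}\left[(\rho-\sigma)^{\otimes 2} \int d\psi\, (|\psi\rangle\langle\psi|^{\otimes 2})\right] = \frac{\mathrm{tr}[(\rho-\sigma)^{\otimes 2}(I_{d^2} + F_d)]}{d(d+1)} = \frac{\mathrm{tr}[(\rho-\sigma)^2]}{d(d+1)},$$
noting that $\rho-\sigma$ is traceless and that $\int d\psi\, (|\psi\rangle\langle\psi|^{\otimes 2})$ is proportional to the projector onto the symmetric subspace of two $d$-dimensional systems.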
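Lemma B.2. Let $\rho$, $\sigma$ be $d$-dimensional quantum states. Then
$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^4 \le \frac{9\, \mathrm{tr}[(\rho-\sigma)^2]^2}{d(d+1)(d+2)(d+3)}.$$
Proof. This is the same technique as the previous lemma, but is a little more involved. Writing
$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^4 = \mathrm{tr}\left[(\rho-\sigma)^{\otimes 4} \int d\psi\, (|\psi\rangle\langle\psi|^{\otimes 4})\right],$$
we note that $\int d\psi\, (|\psi\rangle\langle\psi|^{\otimes 4})$ is proportional to the projector onto the symmetric subspace of four $d$-dimensional systems, which we write as
$$\frac{1}{4!}\sum_{\pi \in S_4} P_\pi,$$
where $S_4$ is the symmetric group on four elements and $P_\pi$ is the operator that permutes the four systems according to the permutation $\pi$. Let $\mathrm{Cyc}(\pi)$ denote the sequence of cycle lengths in $\pi$ (e.g. $\mathrm{Cyc}((12)(3)) = (2, 1)$). Then, for any $d$-dimensional operator $X$, it holds that
$$\mathrm{tr}[X^{\otimes 4} P_\pi] = \prod_{c \in \mathrm{Cyc}(\pi)} \mathrm{tr}[X^c],$$
which can be shown diagrammatically or by explicitly writing out the $P_\pi$ matrix. In particular, $\mathrm{tr}\, P_\pi = d^{|\mathrm{Cyc}(\pi)|}$. Permutations of four elements break down into five conjugacy classes, as follows: there is one of the form $(1)(2)(3)(4)$; six of the form $(12)(3)(4)$; three of the form $(12)(34)$; eight of the form $(123)(4)$; and six of the form $(1234)$. Thus
$$\mathrm{tr}\left[\frac{1}{4!}\sum_{\pi \in S_4} P_\pi\right] = \frac{d^4 + 6d^3 + 11d^2 + 6d}{24} = \frac{d(d+1)(d+2)(d+3)}{24}.$$
We can now calculate
$$\int d\psi\, \langle\psi|(\rho-\sigma)|\psi\rangle^4 = \frac{1}{d(d+1)(d+2)(d+3)}\left(3\, \mathrm{tr}[(\rho-\sigma)^2]^2 + 6\, \mathrm{tr}[(\rho-\sigma)^4]\right),$$
where we use the fact that $\rho-\sigma$ is traceless to ignore all terms corresponding to permutations with fixed points. The upper bound claimed in the statement of the lemma follows by simply noting that $\mathrm{tr}[(\rho-\sigma)^4] \le \mathrm{tr}[(\rho-\sigma)^2]^2$.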
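The counting of conjugacy classes used in this proof can be verified by brute force. The sketch below enumerates all 24 permutations of four elements, groups them by cycle type, and confirms that exactly nine have no fixed points (three double transpositions and six 4-cycles), which is where the coefficients 3 and 6 come from. Function names are illustrative.

```python
from collections import Counter
from itertools import permutations

def cycle_type(perm):
    """Cycle lengths of a permutation given as a tuple mapping i -> perm[i],
    sorted in decreasing order."""
    seen = set()
    lengths = []
    for start in range(len(perm)):
        if start in seen:
            continue
        length = 0
        x = start
        while x not in seen:
            seen.add(x)
            x = perm[x]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

counts = Counter(cycle_type(p) for p in permutations(range(4)))
no_fixed_points = sum(n for ctype, n in counts.items() if 1 not in ctype)
print(dict(counts))
print(no_fixed_points)
```

The class sizes also reproduce the dimension formula: summing $d^{|\mathrm{Cyc}(\pi)|}$ over the classes gives $d^4 + 6d^3 + 11d^2 + 6d$.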
We are finally ready to prove Theorem 8, which we restate for convenience.
Proof of Theorem 8. The upper bound is straightforward: [equation missing] where the first inequality is Jensen's inequality, and the equality is Lemma B. [text missing]

Appendix C. Proof of Lemma 10
We now prove Lemma 10, which we restate for convenience.
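Proof of Lemma 10. The inequality clearly holds if $\mathrm{tr}[P^\perp|\psi\rangle\langle\psi|] = 0$, so assuming this is not the case and dividing both sides by $\mathrm{tr}[P^\perp|\psi\rangle\langle\psi|]$, the left-hand side is equal to
$$\frac{\mathrm{tr}[(D \otimes I)(I - P)|\psi\rangle\langle\psi|(I - P)]}{1 - \mathrm{tr}[P|\psi\rangle\langle\psi|]}.$$
The key observation which will allow us to simplify this expression is that $(D \otimes I)P = P = P(D \otimes I)$. To see this, note that the support of $P$ is contained within the subspace onto which $D \otimes I$ projects, implying that $D \otimes I$ acts as the identity with respect to $P$. The left-hand side thus simplifies to
$$\frac{\mathrm{tr}[(D \otimes I)|\psi\rangle\langle\psi|] - \mathrm{tr}[P|\psi\rangle\langle\psi|]}{1 - \mathrm{tr}[P|\psi\rangle\langle\psi|]} \le \frac{\mathrm{tr}[(D \otimes I)|\psi\rangle\langle\psi|]\left(1 - \mathrm{tr}[P|\psi\rangle\langle\psi|]\right)}{1 - \mathrm{tr}[P|\psi\rangle\langle\psi|]} = \mathrm{tr}[(D \otimes I)|\psi\rangle\langle\psi|]$$
as claimed.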