Continuous LWE is as Hard as LWE
(and Applications to Gaussian Mixture Learning)
Aparna Gupte∗ Neekon Vafa† Vinod Vaikuntanathan‡
MIT MIT MIT
January 31, 2022
Abstract
We show a direct and conceptually simple reduction from the classical learning with errors
(LWE) problem to its continuous analog called CLWE (Bruna, Regev, Song and Tang, STOC
2021). This allows us to bring to bear the powerful machinery of LWE-based cryptography to
the applications of CLWE.
As a concrete application, we show a nearly tight hardness result for the problem of distinguishing
between a mixture of Gaussians in $\mathbb{R}^n$ and the standard multivariate Gaussian,
under the (plausible and widely believed) exponential hardness of the classical LWE problem.
In particular, we demonstrate a mixture of roughly $O(\sqrt{\log n})$ Gaussians in $\mathbb{R}^n$ which is
indistinguishable from the standard multivariate Gaussian $N(0, I_{n \times n})$ with poly(log n) samples and
poly(n) time. This gives us a tight computational gap as the problem can be solved in slightly
quasipolynomial time, even with only roughly log n samples.
Our result improves on Bruna, Regev, Song and Tang (STOC 2021) who show the hardness
of learning mixtures of more than $\sqrt{n}$ Gaussians under the worst-case quantum hardness of
lattice problems. The best known polynomial-time algorithms can learn any mixture of O(1)
Gaussians.
Our key technique is an improved reduction from classical LWE to LWE with k-sparse secrets
(Goldwasser, Kalai, Peikert and Vaikuntanathan, ITCS 2010; Micciancio, Theory of Computing,
2018) where the multiplicative increase in the noise is only $O(\sqrt{k})$, independent of the ambient
dimension n.
∗Email: agupte@mit.edu
†Email: nvafa@mit.edu.
‡Email: vinodv@mit.edu
Contents
1 Introduction
1.1 Our Results
1.2 Other Applications
1.3 Technical Overview
1.4 Open Questions and Future Directions
2 Preliminaries
2.1 Lattices and Discrete Gaussians
2.2 Learning with Errors
3 Reducing LWE to CLWE
4 Hardness of k-sparse LWE
5 Hardness of Density Estimation for Mixtures of Gaussians
6 Low-Sample Algorithm for $\mathrm{hCLWE}^{(g)}$
1 Introduction
The problem of learning a mixture of Gaussians is of fundamental importance in many fields of
science [TTM+85, MP00]. Given a set of g multivariate Gaussians in n dimensions, parameterized
by their means $\mu_i \in \mathbb{R}^n$, covariance matrices $\Sigma_i \in \mathbb{R}^{n \times n}$, and non-negative weights $w_1, \ldots, w_g$
summing to one, the Gaussian mixture model is defined to be the distribution generated by picking
a Gaussian i ∈ [g] with probability wi and outputting a sample from N (µi,Σi).
Dasgupta [Das99] initiated the study of this problem in computer science. A strong notion of
learning mixtures of Gaussians is that of parameter estimation, that is, to estimate all µi, Σi and
wi given samples from the distribution. If one assumes the Gaussians in the mixture are well-
separated, then the problem is known to be tractable, for a constant number of Gaussians [Das99,
SK01, VW02, AM05, KSV05, DS07, BV08, MV10, BS15, HP15, RV17, HL18, KSS18, DKS18].
Moitra and Valiant [MV10] and Hardt and Price [HP15] also show that for parameter estimation,
there is an information theoretic lower bound on the sample complexity that is exponential in the
number of Gaussian components, namely g.
Consequently, it makes sense to ask for a weaker notion of learning the mixture of Gaussians in
this case, where the goal is to output a density estimate for the mixture of Gaussians that is $\epsilon$-close
in statistical distance to the underlying mixture of Gaussians [FSO06]. That is, given some samples
from the Gaussian mixture, one can ask if there is an efficient algorithm that outputs some density
oracle (e.g. a circuit) that on any input x ∈ Rn, outputs an estimate of the density at x which
closely approximates the density of the underlying Gaussian mixture. The sample complexity of the
density estimation problem does not suffer from the exponential dependence in g, as was the case
for parameter estimation. In fact, Diakonikolas, Kane, and Stewart [DKS17] show a $\mathrm{poly}(n, g, 1/\epsilon)$
upper bound on the sample complexity, by giving an exponential time algorithm for the problem
of density estimation. Given this, one could hope for a more efficient, ideally $\mathrm{poly}(n, g, 1/\epsilon)$-time,
algorithm for the problem of density estimation.
Unfortunately, Diakonikolas, Kane, and Stewart [DKS17] show that even this weaker notion of
learning mixtures of Gaussians has a super-polynomial lower bound in the restricted statistical query
(SQ) model (see [Kea98, FGR+17] for a formal description of the SQ model). Explicitly,
they show that any SQ algorithm giving density estimates requires $n^{\Omega(g)}$ queries to an SQ oracle of
precision $n^{-O(g)}$; this is super-polynomial as long as g is super-constant.
However, this lower bound does not show anything about arbitrary polynomial time algorithms for
density estimation. Known algorithms for density estimation (e.g. [MV10]) run in $\mathrm{poly}(n, 1/\epsilon)$
time in dimension n and output an estimate with statistical distance $\epsilon$, when the number
of Gaussians g in the mixture is some fixed constant. However, the dependence on g is exponential;
in [MV10], the dependence is $n^{f(g)}$ for some f, meaning they only show that this runs in polynomial
time for constant g.
Recently, Bruna, Regev, Song and Tang [BRST21] show that an algorithm for outputting a
density estimate implies an algorithm for (widely believed to be hard) worst-case lattice problems.
That is, they give a reduction from worst-case lattice problems to outputting a density estimate
for mixtures of $g > \sqrt{n}$ Gaussians, giving a lower-bound for outputting Gaussian mixture density
estimates under well-founded cryptographic assumptions.
1.1 Our Results
Instead of focusing on density estimation for mixtures of Gaussians, we focus our attention on
showing hardness for the easier problem of distinguishing N := N (0, In×n) and a certain mixture
of g Gaussians which has large statistical distance from it. Let S be some distribution over Rn,
let H(g)(s) be some mixture of g Gaussians over Rn indexed by a vector s ∈ Rn, and let H(g)(S)
denote the resulting distribution over mixtures of Gaussians when choosing and fixing some s ∼ S
across all samples. As long as ∆(N,H(g)(s)) ≥ 1/2 for all s, it turns out that distinguishing N
and H(g)(S) is easier than density estimation. From here on out, we will fix H(g)(s) to be some
particular mixture of Gaussians that we consider (indexed by s). (In particular, jumping ahead, we
remark that it will be the homogeneous CLWE distribution truncated to g Gaussians with secret
direction s.)
Our main results consist of a reduction from LWE to CLWE and an improved leakage-resilience
theorem for k-sparse LWE, which when put together, give us an exponential improvement to roughly
$\sqrt{\log n}$ Gaussians as compared to the $\sqrt{n}$ Gaussians in Bruna et al. [BRST21], at the expense of
a stronger computational assumption.
Theorem 1 (Informal). Assume that the exponential LWE assumption holds. Then, the problem
of distinguishing N and H(g)(S) for roughly $g = O(\sqrt{\log n})$ Gaussians with poly(log n) samples
requires quasipolynomial time.
The learning with errors (LWE) problem [Reg09] asks to distinguish uniformly random pairs from “LWE samples”
$(a_i, b_i = \langle a_i, s \rangle + e_i \pmod{q})$ where the LWE secret $s \in \mathbb{Z}_q^n$ is chosen at random and fixed for all
samples, $a_i \sim \mathbb{Z}_q^n$ is uniformly random, and $e_i \sim D_{\mathbb{Z},\sigma}$ is drawn from a discrete Gaussian distribution
with standard deviation σ. The LWE problem has been very well-studied in the cryptography
community and lies at the center of efforts by the National Institute of Standards and Technology
(NIST) to develop post-quantum cryptosystems. In particular, for q = poly(n) and $\sigma = O(\sqrt{n})$, the
LWE problem is believed to be exponentially hard; that is, hard for $2^{n^\epsilon}$-time algorithms that have
$2^{n^\epsilon}$ LWE samples, for any $\epsilon < 1$ (see, e.g. [LP11]).
In other words, we show an exponentially tighter result for density estimation for mixtures of
Gaussians than [BRST21] ($\sqrt{\log n}$ vs. $\sqrt{n}$ Gaussians) under a stronger hardness assumption. We
remark that translating the stronger hardness assumption into the tighter lower bound requires
substantially new techniques which we elaborate on in the rest of the introduction.
One crucial difference in the mixture of Gaussians we consider is that the secret direction dis-
tribution S is now discrete, where in [BRST21], it was continuous over Rn. This allows us to
give simple algorithmic upper bounds, letting us state a tight computational gap for the task of
distinguishing these distributions, N and H(g)(S).
Theorem 2 (Informal). There is an algorithm running in quasipolynomial time in n distinguishing
N and H(g)(S) using roughly O(log n) samples (for the same H(g)(S) as in Theorem 1).
Combining the above two theorems, assuming exponential LWE, we get that the time complexity
of distinguishing N and H(g)(S) with poly(log n) samples, where g is roughly $O(\sqrt{\log n})$, is exactly
quasipolynomial in n. (The remaining question is then which exact quasipolynomial time-bound it
is).
1.2 Other Applications
We mention that our hardness result for CLWE can also be applied in showing (further) hardness
of learning single periodic neurons, i.e., neural networks with no hidden layers and a periodic
activation function ϕ(t) = cos(2πγt) with frequency γ. Song, Zadik, and Bruna [SZB21] give a
direct reduction from CLWE to learning single periodic neurons, showing hardness of learning this
class of functions assuming the hardness of CLWE. Our reduction from LWE to CLWE shows that
this hardness result can be based directly on LWE instead of worst-case lattice assumptions, as done
in [BRST21]. Furthermore, our results expand the scope of their reduction in two ways. First, their
reduction shows hardness of learning periodic neurons with frequency $\gamma \geq \sqrt{n}$, while ours, based
on exponential hardness of LWE, applies to frequencies almost as small as $\gamma = O(\sqrt{\log n})$, which
covers a substantially larger class of periodic neurons. Second, our hardness of k-sparse CLWE from
(standard) LWE shows that even learning sparse features (instead of features drawn from the unit
sphere Sn−1) is hard under LWE for appropriate parameter settings.
We also note that our hardness result for k-sparse LWE may be useful in other settings. Partic-
ularly, sparse binary secrets are attractive in practical contexts (e.g. post-quantum cryptographic
objects [NIS]) as well as theoretical ones (e.g. where having a low-norm secret, such as in reducing
noise blowup in fully-homomorphic encryption, is beneficial).
1.3 Technical Overview
Bruna, Regev, Song and Tang [BRST21] introduced a continuous version of LWE called CLWE,
and showed that CLWE is hard assuming worst-case lattice assumptions, in a similar way to how
LWE is hard assuming worst-case lattice assumptions.
Definition 1 (CLWE Distribution [BRST21], informally and rescaled). Let $\gamma, \beta \in \mathbb{R}$, and let $\mathcal{S}$ be
a distribution over the (n−1)-sphere, $S^{n-1} \subset \mathbb{R}^n$. Let $\mathrm{CLWE}(m, \mathcal{S}, \gamma, \beta)$ be the distribution given by
sampling $a_1, \ldots, a_m \sim N(0, I_{n \times n})$, $w \sim \mathcal{S}$, $e_1, \ldots, e_m \sim N(0, \beta^2)$ and outputting
$(a_i, \gamma \cdot \langle a_i, w \rangle + e_i \pmod{1})$ for all $i \in [m]$. We refer to n as the dimension and m as the number of samples.
While hardness from worst-case lattice assumptions phrases the hardness of LWE and CLWE as
an analogy, our main conceptual contribution is a direct reduction from LWE to CLWE. At a high
level, the goal of this reduction is to reduce samples from Znq to N := N (0, In×n) (the multivariate
Gaussian), secrets from Znq to Sn−1, the (n − 1) dimensional sphere embedded in Rn, and errors
from discrete to continuous Gaussians (and also mod q to mod 1).
One useful tool in going from discrete to continuous for these distributions is adding continuous
Gaussian noise in various places. As an example, to make samples from Tnq := U([0, q)n) (i.e. the
n-wise product of the continuous uniform distribution over [0, q)) instead of Znq , we add a sufficiently
wide continuous Gaussian to the samples, and argue that this converts Znq to Tnq at some small cost
to the width of the noise. Two other types of changes are needed to make the reduction go through.
First, we need to fix the norm of the LWE secret, to make it a direction in Rn, and second, we
have to convert continuous uniform samples to continuous Gaussian samples. We make the secret
direction have fixed norm by using instead a binary secret s ∼ {+1,−1}n and relying on a work
of Micciancio [Mic18] to argue hardness. We describe how we convert uniform samples to Gaussian
samples immediately below. See Figure 1 for a full breakdown of the reduction.
To go from uniform to Gaussian samples, Boneh et al. [BLMR13] give a general reduction from
discrete uniform samples to “coset-sampleable” distributions, and as one example, they show how
to reduce discrete uniform samples to discrete Gaussian samples, at the cost of a log(q) multiplica-
tive overhead in the dimension, which in some sense is unavoidable information-theoretically. We
improve this reduction and circumvent this lower bound in the continuous version by having no
overhead in the dimension, i.e. the dimension of both samples are the same. The key ingredient to
this improvement is a simple Gaussian pre-image sampling algorithm, which on input z ∼ U([0, q)),
outputs y such that q · y = z (mod q) and y is statistically close to a continuous Gaussian (when
marginalized over z ∼ U([0, q))). (See Lemma 12 for a more precise statement.)
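A minimal one-dimensional sketch of the pre-image sampling idea, under stated assumptions: the function name `gaussian_preimage`, the width `tau`, and the truncation parameter `tail` are illustrative choices and are not taken from Lemma 12. Given z in [0, q), the sketch samples y from a Gaussian of width τ restricted to the coset z/q + Z, so that q · y = z (mod q) holds by construction; when z is uniform, the output should look like a continuous Gaussian of width τ.

```python
import math
import random

def gaussian_preimage(z, q, tau, rng, tail=6):
    """Sample y in z/q + Z with probability proportional to exp(-pi * y^2 / tau^2)."""
    frac = z / q                           # y must equal z/q modulo 1
    radius = int(math.ceil(tail * tau))    # assumption: mass beyond this radius is negligible
    shifts = list(range(-radius, radius + 1))
    weights = [math.exp(-math.pi * (frac + k) ** 2 / tau ** 2) for k in shifts]
    k = rng.choices(shifts, weights=weights, k=1)[0]
    return frac + k

rng = random.Random(0)
q, tau = 17, 2.0
ys = [gaussian_preimage(rng.uniform(0, q), q, tau, rng) for _ in range(20000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
# A continuous Gaussian of width tau has variance tau^2 / (2*pi);
# the empirical variance should be close to that value.
print(mean, var, tau ** 2 / (2 * math.pi))
```

The sketch keeps the dimension unchanged: one input coordinate yields one output coordinate, which is the property the paragraph above emphasizes.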
Bruna et al. [BRST21] show that a homogeneous version of CLWE, called hCLWE, which we
denote here as H(g)(S), has a natural interpretation as a certain distribution of mixtures of Gaus-
sians. They show that any distinguisher between H(g)(S) and the standard multivariate Gaussian
N turns out to be enough to solve CLWE, which thus solves worst-case lattice problems. Therefore,
density estimation for Gaussian mixtures implies a solver for CLWE, and so under worst-case lattice
assumptions, density estimation for $g > \sqrt{n}$ Gaussian mixtures is hard. (The condition that $g > \sqrt{n}$
is a consequence of their worst-case to average-case reduction.)
This direct reduction from LWE to CLWE opens up a large toolkit of techniques that were
developed in LWE-based cryptography. In this work, we leverage tools from leakage-resilient cryp-
tography [Mic18, BD20] to greatly improve the hard instance of [BRST21]. It turns out the number
of Gaussians g in the mixture at the end of the day roughly corresponds to the norm of the secrets
in LWE. Thus, if we can assume hardness of low-norm secrets LWE, then we get hardness for a
small number of Gaussians.
Indeed, we achieve hardness of low norm LWE secrets by reducing LWE to k-sparse LWE,
using an improved leakage-resilience theorem for LWE with k-sparse secrets. We call a vector
s ∈ {+1, 0,−1}n k-sparse if it has exactly k non-zero entries.
Theorem 3 (Informal). Assume LWE in dimension $\ell$ with n samples is hard with secrets $s \sim \mathbb{Z}_q^\ell$
and errors of width σ. Then, LWE in dimension n with k-sparse secrets is hard for errors of width
$O(\sqrt{k} \cdot \sigma)$, as long as $k \log_2(n) \gg \ell \log_2(q)$.
We note that showing hardness for LWE with sparse binary secrets is attractive in other settings,
both practical and theoretical. In practice, LWE-based cryptosystems sometimes use sparse secrets
(and small corresponding errors) to get concrete efficiency gains, and in theory, sparse secrets allow
LWE hardness to be interpreted in a fine-grained way.
It turns out that for our purposes, it is crucial that the blowup in the noise is only a multiplicative
$O(\sqrt{k})$ factor. Micciancio [Mic18] gives a simple proof for the hardness of LWE with secrets
$s \sim \{+1,-1\}^n$ and an $O(\sqrt{n})$ blowup in the noise. In fact, we can view our k-sparse
hardness result as a generalization of the work of Micciancio [Mic18] to arbitrary sparsity k, instead
of sparsity k = n, which corresponds to $\{+1,-1\}^n$. At the same time, we wish to get a smaller $\sqrt{k}$
blowup in the noise. Brakerski and Döttling [BD20] give a general reduction from LWE to LWE
with arbitrary secret distributions with large enough entropy, but the noise blowup when applying
their results directly to k-sparse secrets is $\sqrt{kmn} \gg \sqrt{k}$ (where m is the number of samples), which
is too large for our purposes.
We now describe how we get this improvement to only $O(\sqrt{k})$ blowup in the noise. Our starting
point is the reduction of Micciancio [Mic18], which gives a reduction for $\{+1,-1\}^n$ secrets with
$O(\sqrt{n})$ noise blowup. Typically, reductions like this would map standard LWE to the binary LWE
distribution and uniform to uniform, but the reduction of [Mic18] takes a different form. The main
insight of that work is that it suffices to give an efficiently computable randomized mapping ϕ
that maps the uniform distribution, $U(\mathbb{Z}_q^{n \times m})$ (or just $\mathbb{Z}_q^{n \times m}$ to abuse notation) to the binary LWE
distribution, $\mathrm{LWE}_{\mathrm{bin}}$ (where secrets are binary) but also maps the standard LWE distribution (with
secret matrices instead of vectors) to another standard LWE distribution (with secret matrices
instead of vectors). Very informally, we have
$$\varphi(\mathbb{Z}_q^{m \times n}) = \mathrm{LWE}_{\mathrm{bin}}, \quad \text{and} \quad \varphi(\mathrm{LWE}) = \mathrm{LWE}.$$
The argument of why such a ϕ is sufficient is that under the LWE assumption, $\mathrm{LWE} \approx \mathbb{Z}_q^{n \times m}$,
so a distinguisher for $\mathrm{LWE}_{\mathrm{bin}}$ and $\mathbb{Z}_q^{n \times m}$ would imply a distinguisher for $\mathrm{LWE}_{\mathrm{bin}} = \varphi(\mathbb{Z}_q^{n \times m})$ and
$\mathrm{LWE} = \varphi(\mathrm{LWE})$, which would then imply a distinguisher for $\mathbb{Z}_q^{n \times m}$ and LWE, by applying our
mapping ϕ in the reduction. Thus, constructing some efficient ϕ is sufficient.
As a first attempt, one might try ϕ(B) = [B,Bz + e], where z ∼ {+1,−1}n and e is some noise
of width σ. This indeed maps ϕ(Zm×nq ) = LWEbin. Furthermore, ϕ(LWE) is almost the same as
LWE by the leftover hash lemma, except that the noise matrix becomes [E,Ez + e]. However, this
noise is no longer Gaussian, as the noise is correlated with the secret z. To salvage this, Micciancio
[Mic18] carefully constructs a gadget matrix $Q \in \mathbb{Z}_q^{n \times O(n)}$ to make the correlations cancel out and
modifies the mapping ϕ appropriately, along with adding more sources of randomness. Explicitly,
the mapping becomes
$$\varphi(B) = \left[\, [s,\; s \cdot a^\top + B,\; G]\, Q^\top Z,\;\; s + e \,\right],$$
where $s \sim \mathbb{Z}_q^m$, $a \sim \mathbb{Z}_q^{n-1}$, $G \sim D_{\mathbb{Z},\sigma}^{m \times O(n)}$, $e \sim D_{\mathbb{Z},2\sigma}^m$, and $Z = \mathrm{diag}(z)$ for $z \sim \{+1,-1\}^n$.
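The problem with the first attempt ϕ(B) = [B, Bz + e] can be checked on a toy instance. The sketch below is only an illustration under assumptions: the sizes are far too small for security, the noise is drawn from {−1, 0, 1} as a stand-in for a discrete Gaussian, and the helper names `mat_vec` and `center` are hypothetical. It confirms that when B is itself an LWE matrix AS + E, the last column of ϕ(B) is an LWE sample under A whose noise is exactly Ez + e, which depends on the new secret z.

```python
import random

def mat_vec(M, v, q):
    """Matrix-vector product modulo q."""
    return [sum(a * b for a, b in zip(row, v)) % q for row in M]

def center(x, q):
    """Representative of x mod q in (-q/2, q/2]."""
    x %= q
    return x - q if x > q // 2 else x

rng = random.Random(1)
q, ell, n, m = 3329, 4, 6, 5   # toy sizes (assumptions)

# LWE in matrix form: B = A*S + E (mod q), with B of shape m x n.
A = [[rng.randrange(q) for _ in range(ell)] for _ in range(m)]
S = [[rng.randrange(q) for _ in range(n)] for _ in range(ell)]
E = [[rng.choice([-1, 0, 1]) for _ in range(n)] for _ in range(m)]
B = [[(sum(A[i][t] * S[t][j] for t in range(ell)) + E[i][j]) % q
      for j in range(n)] for i in range(m)]

# First-attempt mapping: phi(B) = [B, B z + e] with z in {+1,-1}^n.
z = [rng.choice([-1, 1]) for _ in range(n)]
e = [rng.choice([-1, 0, 1]) for _ in range(m)]
Bz = mat_vec(B, z, q)
last_col = [(Bz[i] + e[i]) % q for i in range(m)]

# Viewed as LWE under A, the new secret is S z and the new noise is E z + e.
ASz = mat_vec(A, mat_vec(S, z, q), q)
noise = [center(last_col[i] - ASz[i], q) for i in range(m)]
expected = [sum(E[i][j] * z[j] for j in range(n)) + e[i] for i in range(m)]
print(noise == expected)   # prints True
```

Because the noise term Ez + e is a function of z, it is not an independent Gaussian, which is the correlation that the gadget matrix Q is designed to cancel.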
Our main technical contribution is to give a similar mapping ϕ that works for the case when z
is k-sparse. Ultimately, this boils down to carefully adjusting the gadget matrix Q and the matrix
Z to work in the k-sparse case. For a full description, see Section 4 (particularly Lemma 17 and
Lemma 18).
1.4 Open Questions and Future Directions
The best algorithms for learning mixtures of Gaussians run in polynomial time only for constantly
many Gaussians. We show hardness (under a plausible setting of LWE) for roughly $\sqrt{\log n}$ Gaussians.
In fact, for our distribution of Gaussians, we know from Bruna et al. [BRST21] that there
exists an algorithm running in time roughly $2^{O(g^2)}$, which becomes almost polynomial at the extremes
of our parameter settings, which makes our lower bound nearly tight assuming LWE in these
parameter settings.
One way to interpret our result is that if there is an algorithm for estimating density of mixtures
of g Gaussians (even just in our case) in time $\mathrm{poly}(n) \cdot 2^{g^{2-\epsilon}}$ using poly(log n) samples, then we
get a state-of-the-art algorithm for LWE (runtime $2^{n^\delta}$ where δ < 1). Moitra and Valiant [MV10]
have an $n^{f(g)}$ dependence in their runtime. Is it possible to
do any better, and does this improve any state-of-the-art algorithms for LWE?
2 Preliminaries
For a distribution D, we write x ∼ D to denote a random variable x being sampled from D. For
any n ∈ N, we let $D^n$ denote the n-fold product distribution, i.e. $(x_1, \ldots, x_n) \sim D^n$ is generated
by sampling each $x_i \sim D$ independently. For any finite set S, we write U(S) to denote the discrete
uniform distribution over S; we abuse notation and write x ∼ S to denote x ∼ U(S). For any
continuous set S, we write U(S) to denote the continuous uniform distribution over S (i.e. having
support S and constant density); we also abuse notation and write x ∼ S to denote x ∼ U(S).
For distributions $D_1, D_2$ supported on a measurable set $X$, we define the statistical distance
between $D_1$ and $D_2$ to be $\Delta(D_1, D_2) = \frac{1}{2} \int_{x \in X} |D_1(x) - D_2(x)| \, dx$. We say that distributions $D_1, D_2$
are $\epsilon$-close if $\Delta(D_1, D_2) \leq \epsilon$. For a distinguisher A running on two distributions $D_1$, $D_2$, we say
that A has advantage $\epsilon$ if
$$\left| \Pr_{x \sim D_1}[A(x) = 1] - \Pr_{x \sim D_2}[A(x) = 1] \right| \geq \epsilon,$$
where the probability is also over any internal randomness of A.
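As a small worked illustration of these two definitions, the following sketch uses the discrete analog (a sum in place of the integral) on two hypothetical three-point distributions `D1` and `D2`. It also checks the standard fact that the test accepting exactly the points where D1 is heavier than D2 has advantage equal to the statistical distance.

```python
def statistical_distance(p, q):
    """Half the L1 distance between two distributions given as dicts outcome -> probability."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def advantage(accept, p, q):
    """|Pr_{x~p}[A(x)=1] - Pr_{x~q}[A(x)=1]| for a deterministic test A given as a predicate."""
    pr_p = sum(w for x, w in p.items() if accept(x))
    pr_q = sum(w for x, w in q.items() if accept(x))
    return abs(pr_p - pr_q)

# Hypothetical example distributions on {0, 1, 2}.
D1 = {0: 0.5, 1: 0.3, 2: 0.2}
D2 = {0: 0.25, 1: 0.25, 2: 0.5}

delta = statistical_distance(D1, D2)
best = advantage(lambda x: D1[x] > D2[x], D1, D2)
print(delta, best)   # both are 0.3 up to floating-point rounding
```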
We let In×n ∈ {0, 1}n×n denote the n×n identity matrix. When n is clear from context, we write
this simply as I. For any matrix $M \in \mathbb{R}^{m \times n}$, we let $M^\top$ be its transpose matrix, and for $\ell \in [n]$,
we write $M_{[\ell]} \in \mathbb{R}^{m \times \ell}$ to denote the submatrix of M consisting of just the first $\ell$ columns, and we
write $M_{]\ell[} \in \mathbb{R}^{m \times (n - \ell)}$ to denote the submatrix of M consisting of all but the first $\ell$ columns.
For any vector $v \in \mathbb{R}^n$, we write $\|v\|$ to mean the standard $\ell_2$-norm of v, and we write $\|v\|_\infty$ to
denote the $\ell_\infty$-norm of v, meaning the maximum absolute value of any component. For n ∈ N, we
let Sn−1 ⊂ Rn denote the (n − 1)-dimensional sphere embedded in Rn, or equivalently the set of
unit vectors in Rn. By Zq, we refer to the ring of integers modulo q, represented by {0, . . . , q − 1}.
By Tq, we refer to the set R/qZ = [0, q) ⊆ R where addition (and subtraction) is taken modulo q
(i.e. Tq is the torus scaled up by q). We denote T := T1 to be the standard torus. By taking a
real number mod q, we refer to taking its representative as an element of Tq in [0, q) unless stated
otherwise.
Definition 2 (Min-Entropy). For a discrete distribution D with support S, we let $\tilde{H}_\infty(D)$ denote
the min-entropy of D,
$$\tilde{H}_\infty(D) = -\log_2 \left( \max_{s \in S} \Pr_{x \sim D}[x = s] \right).$$
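Lemma 1 (Leftover Hash Lemma [HILL99]). Let $\ell, n, q \in \mathbb{N}$, $\epsilon \in \mathbb{R}_{>0}$, and let $\mathcal{S}$ be a distribution
over $\{-1, 0, 1\}^n \subseteq \mathbb{Z}_q^n$. Suppose $\tilde{H}_\infty(\mathcal{S}) \geq \ell \log_2(q) + 2 \log_2(1/\epsilon)$. Then, the distributions given by
$(A, As \pmod{q})$ and $(A, b)$ where $A \sim \mathbb{Z}_q^{\ell \times n}$, $s \sim \mathcal{S}$, $b \sim \mathbb{Z}_q^\ell$ have statistical distance at most $\epsilon$.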
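To make the entropy condition concrete, the sketch below evaluates it for secrets drawn uniformly from vectors in {−1, 0, 1}^n with exactly k non-zero entries, which have min-entropy log2(C(n, k)) + k bits. The function names and the numeric parameters are illustrative assumptions, not settings used in the paper.

```python
import math

def min_entropy_k_sparse(n, k):
    """Min-entropy (bits) of the uniform distribution over vectors in {-1,0,1}^n
    with exactly k non-zero entries: there are C(n, k) * 2^k such vectors."""
    return math.log2(math.comb(n, k)) + k

def lhl_condition(n, k, ell, q, eps):
    """Check H_inf(S) >= ell * log2(q) + 2 * log2(1/eps) from the leftover hash lemma."""
    return min_entropy_k_sparse(n, k) >= ell * math.log2(q) + 2 * math.log2(1 / eps)

# Illustrative parameters (assumptions).
print(min_entropy_k_sparse(1024, 32))
print(lhl_condition(n=1024, k=32, ell=8, q=3329, eps=2 ** -20))
print(lhl_condition(n=1024, k=32, ell=32, q=3329, eps=2 ** -20))
```

The second call succeeds and the third fails, showing how the admissible LWE dimension ℓ is limited by the entropy of the sparse secret.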
2.1 Lattices and Discrete Gaussians
A rank n integer lattice is a set Λ = BZn ⊆ Zd of all integer linear combinations of n linearly
independent vectors B = [b1, . . . ,bn] in Zd. The dual lattice Λ∗ of a lattice Λ is defined as the set
of all vectors y ∈ Rd such that 〈x,y〉 ∈ Z for all x ∈ Λ.
For arbitrary $x \in \mathbb{R}^n$ and $c \in \mathbb{R}^n$, let
$$\rho_{s,c}(x) = \frac{1}{s^n} \exp\left( -\pi \|(x - c)/s\|^2 \right)$$
denote the density function of the standard Gaussian over Rn of width s ∈ R>0 centered at c. Let
Ds,c be the corresponding distribution. Note that Ds,c is the n-dimensional Gaussian distribution
with mean c and covariance matrix s2/(2π) · In×n. When c = 0, we omit the subscript notation of
c on ρ and D.
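The covariance claim can be checked by matching exponents with the usual parameterization of a Gaussian. A short worked check for a single coordinate with c = 0, where σ² denotes the variance in the standard parameterization:

```latex
\rho_s(x) = \frac{1}{s} \exp\!\left( -\frac{\pi x^2}{s^2} \right)
          = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{x^2}{2\sigma^2} \right)
\quad \text{with} \quad \sigma^2 = \frac{s^2}{2\pi}.
```

In particular, $D_1$ has standard deviation $1/\sqrt{2\pi} \approx 0.399$ per coordinate.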
For an n-dimensional lattice Λ ⊆ Rn and point c ∈ Rn, we can define the discrete Gaussian of
width s to be given by the mass function
$$D_{\Lambda + c, s}(x) = \frac{\rho_s(x)}{\rho_s(\Lambda + c)}$$
supported on $x \in \Lambda + c$, where by $\rho_s(\Lambda + c)$ we mean $\sum_{y \in \Lambda} \rho_s(y + c)$.
We now give the smoothing parameter as defined by [Reg09] and some of its standard properties.
Definition 3 ([Reg09], Definition 2.10). For an n-dimensional lattice Λ and $\epsilon > 0$, we define $\eta_\epsilon(\Lambda)$
to be the smallest s such that $\rho_{1/s}(\Lambda^* \setminus \{0\}) \leq \epsilon$.
Lemma 2 ([Reg09], Lemma 2.12). For an n-dimensional lattice Λ and $\epsilon > 0$, we have
$$\eta_\epsilon(\Lambda) \leq \sqrt{\frac{\ln(2n(1 + 1/\epsilon))}{\pi}} \cdot \lambda_n(\Lambda).$$
Here λi(Λ) is defined as the minimum length of the longest vector in a set of i linearly independent
vectors in Λ.
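For the integer lattice Z, which is its own dual and has λ_1 = 1, both the smoothing parameter and the bound of Lemma 2 can be computed directly. The sketch below is a numerical illustration only; it assumes the unnormalized Gaussian mass exp(−πx²/s²) in the smoothing condition, and the function names are hypothetical.

```python
import math

def rho_dual_nonzero(s, terms=50):
    """Gaussian mass of width 1/s on Z minus {0}: 2 * sum_{k>=1} exp(-pi * k^2 * s^2).
    Assumption: unnormalized mass, truncated to `terms` terms."""
    return 2.0 * sum(math.exp(-math.pi * (k * s) ** 2) for k in range(1, terms + 1))

def smoothing_parameter_Z(eps, lo=1e-3, hi=10.0, iters=100):
    """Bisection for the smallest s with rho_dual_nonzero(s) <= eps."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rho_dual_nonzero(mid) <= eps:
            hi = mid
        else:
            lo = mid
    return hi

def lemma2_bound(eps, n=1, lambda_n=1.0):
    """Upper bound sqrt(ln(2n(1 + 1/eps)) / pi) * lambda_n from Lemma 2."""
    return math.sqrt(math.log(2 * n * (1 + 1 / eps)) / math.pi) * lambda_n

eps = 0.01
eta = smoothing_parameter_Z(eps)
print(eta, lemma2_bound(eps))
```

For Z the bound is close to tight, since the mass of the dual lattice is dominated by the two shortest non-zero vectors.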
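Lemma 3 ([Reg09], Corollary 3.10). For any n-dimensional lattice Λ, $\epsilon \in (0, 1/2)$, $\sigma, \sigma' \in \mathbb{R}_{>0}$,
and $z \in \mathbb{R}^n$, if
$$\eta_\epsilon(\Lambda) \leq \frac{1}{\sqrt{1/(\sigma')^2 + (\|z\|/\sigma)^2}},$$
then for $v \sim D_{\Lambda,\sigma'}$ and $e \sim D_\sigma$, $\langle z, v \rangle + e$ has statistical distance at most $4\epsilon$ from $D_{\sqrt{(\sigma' \|z\|)^2 + \sigma^2}}$.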
Lemma 4 ([MR07], Lemma 4.1). For an n-dimensional lattice Λ, $\epsilon > 0$, $c \in \mathbb{R}^n$, and all $s \geq \eta_\epsilon(\Lambda)$,
we have
$$\Delta(D_{s,c} \bmod P(\Lambda),\; U(P(\Lambda))) \leq \epsilon/2,$$
where P (Λ) is the half-open fundamental parallelepiped of Λ.
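Lemma 5 ([MR07], implicit in Lemma 4.4). For an n-dimensional lattice Λ, for all $\epsilon > 0$, $c \in \mathbb{R}^n$,
and all $s \geq \eta_\epsilon(\Lambda)$, we have
$$\rho_s(\Lambda + c) = \rho_{s,-c}(\Lambda) \in \left[ \frac{1 - \epsilon}{1 + \epsilon},\, 1 \right] \cdot \rho_s(\Lambda).$$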
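Lemma 5 says that above the smoothing parameter, the Gaussian mass of a coset barely depends on the shift c. A quick numerical illustration for Λ = Z follows; it is a sketch under assumptions: the sum is truncated, the unnormalized mass exp(−πx²/s²) is used (the ratio is unaffected by normalization), and `rho_coset` is a hypothetical helper name.

```python
import math

def rho_coset(s, c, radius=60):
    """Sum over k in Z of exp(-pi * (k + c)^2 / s^2), truncated to |k| <= radius."""
    return sum(math.exp(-math.pi * (k + c) ** 2 / s ** 2) for k in range(-radius, radius + 1))

s = 2.0
base = rho_coset(s, 0.0)                                   # mass of the lattice itself
ratios = [rho_coset(s, c / 10) / base for c in range(10)]  # shifts c = 0.0, 0.1, ..., 0.9
print(base, min(ratios), max(ratios))
```

For s = 2 all ratios lie within about 1.4e-5 of 1 and never exceed 1, matching the interval in the lemma with ε of order exp(−4π).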
Now we recall other facts related to lattices.
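Lemma 6 ([MP13], Theorem 3). Suppose $v \in \mathbb{Z}^m$ with gcd(v) = 1, and suppose $y_i \sim D_{\mathbb{Z},\sigma_i}$ for all
$i \in [m]$. As long as $\sigma_i \geq \sqrt{2} \|v\|_\infty \eta_{\epsilon/(2m)}(\mathbb{Z})$ for all $i \in [m]$, then we have that $y = \sum_{i \in [m]} y_i v_i$ is $O(\epsilon)$-close
to $D_{\mathbb{Z},\sigma}$ where $\sigma = \sqrt{\sum_{i \in [m]} \sigma_i^2 v_i^2}$.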
Lemma 7 ([Mic18], Lemma 2.2). For $w \sim U(\mathbb{Z}_q^\ell)$, the probability that $\gcd(w, q) \neq 1$ is at most
$\log(q)/2^\ell$.
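The bound in Lemma 7 can be compared against the exact probability on small examples. In the sketch below, the exact value uses the fact that each prime p dividing q divides all ℓ coordinates with probability p^(−ℓ), independently across primes; the logarithm in the bound is assumed to be base 2, and all function names are illustrative.

```python
import math
from functools import reduce
from itertools import product

def prime_factors(q):
    """Distinct prime factors of q by trial division."""
    ps, p = [], 2
    while p * p <= q:
        if q % p == 0:
            ps.append(p)
            while q % p == 0:
                q //= p
        p += 1
    if q > 1:
        ps.append(q)
    return ps

def prob_gcd_not_one(q, ell):
    """Exact Pr[gcd(w_1, ..., w_ell, q) != 1] for w uniform in Z_q^ell."""
    return 1.0 - math.prod(1.0 - p ** (-ell) for p in prime_factors(q))

def brute_force(q, ell):
    """Same probability by enumerating all of Z_q^ell (only feasible for tiny q, ell)."""
    bad = sum(1 for w in product(range(q), repeat=ell) if reduce(math.gcd, w, q) != 1)
    return bad / q ** ell

q, ell = 30, 3
exact = prob_gcd_not_one(q, ell)
print(exact, brute_force(q, ell), math.log2(q) / 2 ** ell)
```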
Definition 4. We say that a matrix T ∈ Zk×m is primitive if TZm = Zk, i.e., if T : Zm → Zk is
surjective.
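Lemma 8 ([Mic18], Lemma 2.6). For any primitive matrix $T \in \mathbb{Z}^{k \times m}$ and positive reals $\alpha, \sigma > 0$,
if $T T^\top = \alpha^2 I$ and $\eta_\epsilon(\ker(T)) \leq \sigma$, then $T(D_{\mathbb{Z}^m,\sigma})$ and $D_{\mathbb{Z}^k,\alpha\sigma}$ are $\epsilon$-close.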
We also use the notation N (µ,Σ) to denote a multivariate Gaussian distribution with mean
µ ∈ Rn and covariance matrix Σ ∈ Rn×n for symmetric positive semi-definite Σ.
We now define mixtures of Gaussians, following the definition for estimating the density for mixtures
of Gaussians as given in [BRST21].
Definition 5. Let $\mathcal{G}_{n,k}$ be the set of all mixtures of k Gaussians in $\mathbb{R}^n$. That is, $\mathcal{G}_{n,k}$ contains
exactly the distributions P that can be written as
$$P = \sum_{i \in [k]} w_i \cdot N(\mu_i, \Sigma_i),$$
for weights $w_i \in [0, 1]$ summing to 1, arbitrary means $\mu_i \in \mathbb{R}^n$, and covariance matrices $\Sigma_i \in \mathbb{R}^{n \times n}$. We
define the problem of density estimation for Gn,k to be the following problem. Given sample access
to an arbitrary (and unknown) P ∈ Gn,k, with probability ≥ 9/10, output a distribution Q (as an
evaluation oracle) such that ∆(P,Q) ≤ 1/10.
2.2 Learning with Errors
Throughout, we work with decisional versions of LWE, CLWE, and hCLWE.
Definition 6 (LWE Distribution). Let $n, m, q \in \mathbb{N}$, let $\mathcal{A}$ be a distribution over $\mathbb{R}^n$, $\mathcal{S}$ be a distribution
over $\mathbb{Z}^n$, and $\mathcal{E}$ be a distribution over $\mathbb{R}$. Let $\mathrm{LWE}(q, m, \mathcal{A}, \mathcal{S}, \mathcal{E})$ be the distribution given by
sampling $a_1, \ldots, a_m \sim \mathcal{A}$, $s \sim \mathcal{S}$, and $e_1, \ldots, e_m \sim \mathcal{E}$, and outputting $(a_i, s^\top a_i + e_i \pmod{q})$ for
all $i \in [m]$. We refer to n as the dimension and m as the number of samples. Whenever q is clear
from the distribution on $\mathcal{A}$, we omit it for brevity.
We also consider the case where $\mathcal{S}$ is a distribution over $\mathbb{Z}^{n \times j}$ and $\mathcal{E}$ is a distribution over $\mathbb{R}^j$.
In this case, the output of each sample is $(a_i, S^\top a_i + e_i \pmod{q})$, where $S \sim \mathcal{S}$ and $e_i \sim \mathcal{E}$.
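A minimal sketch of sampling from the vector-secret LWE distribution with uniform samples and secrets over Z_q and discrete Gaussian errors. The discrete Gaussian sampler truncates its support, and the function names and toy parameters are illustrative assumptions.

```python
import math
import random

def sample_discrete_gaussian(sigma, rng, tail=10):
    """Sample x in Z with probability proportional to exp(-pi * x^2 / sigma^2).
    Assumption: support truncated to |x| <= ceil(tail * sigma)."""
    radius = int(math.ceil(tail * sigma))
    xs = list(range(-radius, radius + 1))
    ws = [math.exp(-math.pi * x * x / sigma ** 2) for x in xs]
    return rng.choices(xs, weights=ws, k=1)[0]

def lwe_samples(q, n, m, sigma, rng):
    """Return (secret, samples); each sample is (a_i, <s, a_i> + e_i mod q)."""
    s = [rng.randrange(q) for _ in range(n)]
    samples = []
    for _ in range(m):
        a = [rng.randrange(q) for _ in range(n)]
        e = sample_discrete_gaussian(sigma, rng)
        b = (sum(si * ai for si, ai in zip(s, a)) + e) % q
        samples.append((a, b))
    return s, samples

rng = random.Random(0)
s, samples = lwe_samples(q=97, n=8, m=5, sigma=3.0, rng=rng)
for a, b in samples:
    print(a, b)
```

The decisional problem asks to tell such pairs apart from pairs (a, b) where b is uniform in Z_q and independent of a.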
Definition 7 (CLWE Distribution [BRST21]). Let n,m, q ∈ N, γ, β ∈ R, and let A be a distribution
over Rn×m and S be a distribution over Sn−1. Let CLWE(q,m,A,S, γ, β) be the distribution given
by sampling a1, · · · ,am ∼ A, w ∼ S, e1, · · · , em ∼ Dβ and outputting (ai, γ · 〈ai,w〉+ ei (mod q))
for all i ∈ [m]. We refer to n as the dimension and m as the number of samples. We omit q if q = 1
and omit S if S = U(Sn−1), as is standard for CLWE.
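A minimal sketch of sampling from the CLWE distribution in the standard case q = 1, samples from D_1^n and a uniformly random unit-vector secret. It relies on the fact that D_s is a Gaussian with standard deviation s/√(2π) per coordinate; the function name and toy parameters are illustrative assumptions.

```python
import math
import random

def clwe_samples(n, m, gamma, beta, rng):
    """Return (w, samples) with samples (a_i, gamma * <a_i, w> + e_i mod 1),
    where a_i ~ D_1^n, e_i ~ D_beta, and w is uniform on the unit sphere."""
    std = 1.0 / math.sqrt(2 * math.pi)      # D_s has standard deviation s / sqrt(2*pi)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in g))
    w = [x / norm for x in g]               # normalized Gaussian vector: uniform direction
    samples = []
    for _ in range(m):
        a = [rng.gauss(0.0, std) for _ in range(n)]
        e = rng.gauss(0.0, beta * std)
        z = (gamma * sum(ai * wi for ai, wi in zip(a, w)) + e) % 1.0
        samples.append((a, z))
    return w, samples

rng = random.Random(0)
w, samples = clwe_samples(n=4, m=3, gamma=2.0, beta=0.05, rng=rng)
for a, z in samples:
    print(a, z)
```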
Definition 8 (hCLWE Distribution [BRST21]). Let n,m ∈ N, γ, β ∈ R, and let A be a distribution
over Rn×m and S be a distribution over Sn−1. Let hCLWE(m,A,S, γ, β) be the distribution
CLWE(m,A,S, γ, β), but conditioned on the second entry of every sample being 0 (mod 1).
We refer to n as the dimension and m as the number of samples. We omit S if S = U(Sn−1), as is
standard for hCLWE.
Note that the hCLWE distribution is itself a mixture of Gaussians. Explicitly, for a secret s ∼ S,
we can write the density of hCLWE(1, D1, s, γ, β) at point x ∈ Rn as
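$$\rho(x) \cdot \sum_{k \in \mathbb{Z}} \rho_\beta(k - \gamma \cdot \langle s, x \rangle) = \sum_{k \in \mathbb{Z}} \rho_{\sqrt{\beta^2 + \gamma^2}}(k) \cdot \rho(\pi_{s^\perp}(x)) \cdot \rho_{\beta/\sqrt{\beta^2 + \gamma^2}}\left( \langle s, x \rangle - \frac{\gamma}{\beta^2 + \gamma^2} k \right), \tag{1}$$
where $\pi_{s^\perp}(x)$ denotes the projection onto the orthogonal complement of s. Thus, we can view
hCLWE samples as being drawn from a mixture of Gaussians of width $\beta/\sqrt{\beta^2 + \gamma^2} \approx \beta/\gamma$ in the
secret direction, and width 1 in all other directions.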
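The mixture identity (1) can be sanity-checked numerically along the secret direction. Writing t = ⟨s, x⟩, the factor ρ(π_{s⊥}(x)) is common to both sides and is dropped in the sketch below, which is a numerical illustration under assumptions: the sums over k are truncated, ρ is taken as the unnormalized mass exp(−πx²/s²), and the helper names are hypothetical.

```python
import math

def rho(x, s=1.0):
    """Unnormalized Gaussian mass exp(-pi * x^2 / s^2)."""
    return math.exp(-math.pi * (x / s) ** 2)

def lhs(t, gamma, beta, K=200):
    """rho(t) * sum_k rho_beta(k - gamma * t), truncated to |k| <= K."""
    return rho(t) * sum(rho(k - gamma * t, beta) for k in range(-K, K + 1))

def rhs(t, gamma, beta, K=200):
    """Mixture form: component k has weight rho_r(k), mean gamma*k/r^2 and width beta/r,
    where r = sqrt(beta^2 + gamma^2)."""
    r = math.sqrt(beta ** 2 + gamma ** 2)
    return sum(rho(k, r) * rho(t - gamma * k / r ** 2, beta / r) for k in range(-K, K + 1))

gamma, beta = 3.0, 0.2
for t in (-1.3, -0.4, 0.0, 0.25, 0.9):
    print(t, lhs(t, gamma, beta), rhs(t, gamma, beta))
```

Reading off the right-hand side, the k-th component is centered at γk/(β² + γ²) along s, so consecutive components are spaced roughly 1/γ apart while each has width roughly β/γ.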
Definition 9 (Truncated hCLWE Distribution [BRST21]). Let n,m, g ∈ N, γ, β ∈ R, and let S be a
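distribution over $S^{n-1}$. Let $\mathrm{hCLWE}^{(g)}(m, \mathcal{S}, \gamma, \beta)$ be the distribution $\mathrm{hCLWE}(m, D_1^n, \mathcal{S}, \gamma, \beta)$, but
restricted to the central g Gaussians, where by central g Gaussians, we mean the central g Gaussians
in writing hCLWE samples as a mixture of Gaussians, as in Eq. 1. Explicitly, for secret $s \sim \mathcal{S}$, the
density of one sample at a point $x \in \mathbb{R}^n$ is
$$\sum_{k = -\lfloor g/2 \rfloor}^{\lfloor (g-1)/2 \rfloor} \rho_{\sqrt{\beta^2 + \gamma^2}}(k) \cdot \rho(\pi_{s^\perp}(x)) \cdot \rho_{\beta/\sqrt{\beta^2 + \gamma^2}}\left( \langle s, x \rangle - \frac{\gamma}{\beta^2 + \gamma^2} k \right). \tag{2}$$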
The following theorem tells us that distinguishing a truncated version of the hCLWE Gaussian
mixture from the standard Gaussian is enough to distinguish the original Gaussian mixture from
the standard Gaussian. In particular, we can use density estimation to solve hCLWE since the
truncated version has a finite number of Gaussians.
Theorem 4 (Proposition 5.2 of [BRST21]). Let $n, m \in \mathbb{N}$, $\gamma, \beta \in \mathbb{R}_{>0}$ with $\beta < 1/32$ and $\gamma \geq 1$.
Let $\mathcal{S}$ be a distribution over $S^{n-1}$. For sufficiently large m and for $g = 2\gamma\sqrt{\ln m/\pi}$, if there is an
algorithm running in time T that distinguishes $\mathrm{hCLWE}^{(2g+1)}(m, \mathcal{S}, \gamma, \beta)$ and $D_1^{n \times m}$ with constant
probability, then there is a time T + poly(n,m) algorithm distinguishing $\mathrm{hCLWE}(m, D_1^n, \mathcal{S}, \gamma, \beta)$ and
$D_1^{n \times m}$ with constant probability. In particular, if there is an algorithm running in time T that
solves density estimation for $\mathcal{G}_{n,2g+1}$, then there is a time T + poly(n,m) algorithm distinguishing
$\mathrm{hCLWE}(m, D_1^n, \mathcal{S}, \gamma, \beta)$ and $D_1^{n \times m}$ with constant probability.
We also use a lemma which says that if CLWE is hard, then so is hCLWE.
Lemma 9 (Lemma 4.1 of [BRST21]). There is a poly(n, 1/β)-time reduction M such that M maps
samples from $\mathrm{CLWE}(D_1^n, s, \gamma, \beta)$ to $\mathrm{hCLWE}(D_1^n, s, \gamma, 2\beta)$ and maps $D_1^n \times U(\mathbb{T}_1)$ to $D_1^n$.
3 Reducing LWE to CLWE
Our main result in this section is a reduction from decisional LWE to decisional CLWE. Explicitly:
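Theorem 5. Let $q, n, m, m_1 \in \mathbb{N}$ with $m_1 \geq m$, and let $\gamma, \beta, \sigma, \epsilon \in \mathbb{R}_{>0}$. If there is a T-time
distinguisher with advantage $\epsilon$ between $\mathrm{CLWE}(m_1, D_1^m, \gamma, \beta)$ and $D_1^{m \times m_1} \times U(\mathbb{T}^{m_1})$, then there is
a time $T + \mathrm{poly}(n, m, m_1, q, \lambda)$ distinguisher with advantage $\epsilon/O(m_1)$ up to additive negl(λ)
factors between $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathbb{Z}_q^n, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{n \times m} \times \mathbb{Z}_q^m)$, for
$$\gamma = O\left( \sqrt{m (\ln m_1 + \omega(\log \lambda))} \right),$$
$$\beta = O\left( \frac{\sqrt{m}}{q} \cdot \sqrt{\sigma^2 + \ln m_1 + \omega(\log \lambda)} \right),$$
as long as $\log(q)/2^n = \mathrm{negl}(\lambda)$, $m \geq 2n \log_2 q$, and $\sigma \geq C \cdot \sqrt{\ln m_1 + \omega(\log \lambda)}$ for some universal
constant C.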
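Reducing LWE to CLWE (Theorem 5)

Step                 Samples                 Secrets (· γ)                                         Errors                      # Samples   Adv.
Start (LWE)          $\mathbb{Z}_q^n$        $\mathbb{Z}_q^n$                                      $D_{\mathbb{Z},\sigma}$     $m$         $\epsilon/O(m_1)$
Step 1 (Theorem 6)   $\mathbb{Z}_q^{n_1}$    $\{+1,-1\}^{n_1}$                                     $D_{\mathbb{Z},\sigma_1}$   $m_1$       $\epsilon$
Step 2 (Lemma 10)    $\mathbb{Z}_q^{n_1}$    $\{+1,-1\}^{n_1}$                                     $D_{\sigma_2}$              $m_1$       $\epsilon$
Step 3 (Lemma 11)    $\mathbb{T}_q^{n_1}$    $\{+1,-1\}^{n_1}$                                     $D_{\sigma_3}$              $m_1$       $\epsilon$
Step 4 (Lemma 13)    $D_\tau^{n_1}$          $\frac{1}{\sqrt{n_1}}\{+1,-1\}^{n_1}$ ($\cdot\, q\sqrt{n_1}$)   $D_{\sigma_3}$    $m_1$       $\epsilon$
Step 5 (Lemma 14)    $D_\tau^{n_1}$          $S^{n_1-1}$ ($\cdot\, q\sqrt{n_1}$)                   $D_{\sigma_3}$              $m_1$       $\epsilon$
CLWE (Lemma 15)      $D_1^{n_1}$             $S^{n_1-1}$ ($\cdot\, \tau\sqrt{n_1}$)                $D_\beta$                   $m_1$       $\epsilon$

Figure 1: This table shows the steps in the reduction from LWE to CLWE. When a distribution
is not explicitly specified, it is taken to be the uniform distribution. Here, “Adv.” is short for
advantage of the distinguisher, which holds up to additive negl(λ) factors.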
Remark 1. The use of the security parameter λ here is only to talk about distinguishing advantage;
in particular, one can set λ = Θ(1) independently of all other parameters (so negl(λ) = o(1)) to get
that a distinguisher with advantage Ω(1) for CLWE implies a distinguisher with advantage Ω(1/m1)
for LWE.
This reduction goes via a series of transformations, which we briefly outline below:
1. We convert secrets s ∼ Znq to binary secrets s ∼ {+1,−1}n1 for slightly larger n1 and noise
σ1, now with m1 samples instead of m.
2. We convert discrete Gaussian errors $e \sim D_{\mathbb{Z},\sigma_1}$ to continuous Gaussian errors $e \sim D_{\sigma_2}$ for $\sigma_2$
slightly larger than $\sigma_1$.
3. We convert discrete uniform samples $a \sim \mathbb{Z}_q^{n_1}$ to continuous uniform samples $a \sim \mathbb{T}_q^{n_1}$ with
errors from $D_{\sigma_3}$, where $\sigma_3$ is slightly larger than $\sigma_2$.
4. We convert uniform $a \sim \mathbb{T}_q^{n_1}$ to Gaussian $a \sim D_\tau^{n_1}$ where the secret is effectively scaled up
by a factor of q; viewing it as a CLWE distribution, we have parameter $\gamma_0 = q\sqrt{n_1}$ and unit
vector secret $w \sim \frac{1}{\sqrt{n_1}}\{+1,-1\}^{n_1}$.
5. We now re-randomize the secret distribution to be a continuously uniformly random element
of $S^{n_1-1}$ instead of discrete uniform over $\frac{1}{\sqrt{n_1}}\{+1,-1\}^{n_1}$.
6. We scale variables to bring us to the standard formulation of CLWE for parameters γ, β set
appropriately.
Setting of parameters. If we start with dimension $n$ and $m$ samples with error width $\sigma$:
1. After the first step, we get $n_1 = m$, $m_1$ samples, and $\sigma_1 = 2\sigma\sqrt{m}$, with a multiplicative advantage loss of $O(m_1)$.
2. After the second step, we get $\sigma_2 = \sqrt{\sigma_1^2 + 4\ln(m_1) + \omega(\log \lambda)}$.
3. After the third step, we get $\sigma_3 = \sqrt{\sigma_2^2 + 9n_1(\ln n_1 + \ln m_1 + \omega(\log \lambda))}$, as long as $\sigma_2 \ge 3\sqrt{n_1}\sqrt{\ln n_1 + \ln m_1 + \omega(\log \lambda)}$.
4. After the fourth step, we get $\tau = \sqrt{\ln n_1 + \ln m_1 + \omega(\log \lambda)}$, where the secret now has norm $\gamma_0 = q\sqrt{n_1}$.
5. Nothing changes in the fifth step.
6. After the sixth step, we have $\gamma = \gamma_0 \cdot \tau/q$ and $\beta = \sigma_3/q$.
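
The parameter bookkeeping in the list above can be traced numerically. The following is a minimal sketch, not part of the original reduction: the function name `lwe_to_clwe_parameters` and the argument `slack` (a concrete stand-in for the asymptotic term ω(log λ)) are assumptions made for illustration.

```python
import math

def lwe_to_clwe_parameters(m, m1, q, sigma, slack):
    """Track sigma_1, sigma_2, sigma_3, tau, gamma, beta through Steps 1-6.

    `slack` is a hypothetical concrete value used in place of omega(log lambda).
    """
    n1 = m                                                    # Step 1 sets n1 = m
    sigma1 = 2 * sigma * math.sqrt(m)                         # Step 1
    sigma2 = math.sqrt(sigma1**2 + 4 * math.log(m1) + slack)  # Step 2
    log_term = math.log(n1) + math.log(m1) + slack
    if sigma2 < 3 * math.sqrt(n1) * math.sqrt(log_term):      # Step 3 precondition
        raise ValueError("sigma_2 is too small for Step 3")
    sigma3 = math.sqrt(sigma2**2 + 9 * n1 * log_term)         # Step 3
    tau = math.sqrt(log_term)                                 # Step 4
    gamma0 = q * math.sqrt(n1)                                # norm of the scaled secret
    return {"sigma1": sigma1, "sigma2": sigma2, "sigma3": sigma3, "tau": tau,
            "gamma": gamma0 * tau / q, "beta": sigma3 / q}    # Step 6

params = lwe_to_clwe_parameters(m=64, m1=128, q=2**20, sigma=10.0, slack=5.0)
```

With these illustrative inputs the final `gamma` equals √m · τ, matching the expression γ = γ0 · τ/q with γ0 = q√n1 and n1 = m.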
Step 1: Converting uniform secrets to binary secrets. We can reduce the standard LWE problem above to a version where secrets are drawn uniformly from $s \sim \{+1,-1\}^{n_1}$ for some slightly larger $n_1$. This has the effect of making the secret both short and have $\ell_2$ norm (in $\mathbb{R}^{n_1}$) exactly $\sqrt{n_1}$. From [Mic18], we know a reasonably tight reduction between these two problems.
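Theorem 6 ([Mic18], Theorem 3.1 and Lemma 2.9). Let $q, n, m, m_1 \in \mathbb{Z}$, $\sigma \in \mathbb{R}$. If a $T$-time algorithm has advantage $\varepsilon$ in distinguishing $\mathrm{LWE}(m_1, \mathbb{Z}_q^{m+1}, \{+1,-1\}^{m+1}, D_{\mathbb{Z},\sigma'})$ and $U(\mathbb{Z}_q^{(m+1)\times m_1} \times \mathbb{Z}_q^{m_1})$, then there is a time $T + \mathrm{poly}(n, m, q, \lambda)$ algorithm with advantage $\varepsilon/O(m_1)$ (up to additive $\mathrm{negl}(\lambda)$) in distinguishing $\mathrm{LWE}(m+1, \mathbb{Z}_q^n, \mathbb{Z}_q^n, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{n\times(m+1)} \times \mathbb{Z}_q^{m+1})$, as long as $\log(q)/2^n = \mathrm{negl}(\lambda)$, $\sigma \ge 4\sqrt{\omega(\log \lambda) + \ln m + \ln m_1}$, $m \ge 2n\log_2 q + \omega(\log \lambda)$, and $\sigma' = 2\sigma\sqrt{m+1}$.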
Remark 2. Note that we phrase the parameter requirements differently here than is done in [Mic18],
mainly because we want to delink the security parameter from n. Explicitly:
• The requirements $q \le 2^{\mathrm{poly}(m)}$ and $n \ge \omega(\log m)$ in [Mic18] are needed only to make sure that the first row of a primitive matrix is close to uniform over $\mathbb{Z}_q$. Indeed, Lemma 2.2 of [Mic18] shows the statistical distance is at most $\log(q)/2^n$. Thus, the requirement $\log(q)/2^n = \mathrm{negl}(\lambda)$ is sufficient.
• We require $\sigma \ge 4\sqrt{\omega(\log \lambda) + \ln m + \ln m_1}$ instead of $\sigma \ge \omega(\sqrt{\log m})$ for various triangle inequalities to go through to get $\mathrm{negl}(\lambda)$ overall statistical distance.
Step 2: Converting discrete errors to continuous errors. Now, we make the error distribu-
tion statistically close to a continuous Gaussian instead of a discrete Gaussian. Essentially, all we
do is add a small continuous Gaussian noise to the second component and argue that this makes
the noise look like a continuous Gaussian instead of a discrete one.
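Lemma 10. Let $n, m, q \in \mathbb{N}$, $\sigma \in \mathbb{R}_{>0}$. For any distribution $\mathcal{S}$ over $\mathbb{Z}^n$, suppose there is a $T$-time distinguisher between $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_{\sigma'})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{T}_q^m)$, where
\[ \sigma' = \sqrt{\sigma^2 + 4\ln(m) + \omega(\log \lambda)}. \]
If $\sigma > \sqrt{4\ln m + \omega(\log \lambda)}$, then there is a distinguisher between $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$ running in time $T + \mathrm{poly}(m, n, q, \lambda)$.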
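Proof. We run our original distinguisher for $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_{\sigma'})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{T}_q^m)$. For every sample $(a, b)$ (from either $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_{\mathbb{Z},\sigma})$ or $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$), we sample a continuous Gaussian $e' \sim D_{\sigma''}$, where $\sigma''$ will be set later, and send $(a, b + e' \pmod q)$ to the distinguisher.
By Lemma 4, we know that the distribution of $e' \pmod 1$ has statistical distance at most $\varepsilon$ to $U([0,1))$ as long as $\sigma'' \ge \eta_\varepsilon(\mathbb{Z})$. Therefore, if we are given samples from $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$, due to symmetry of $b \sim \mathbb{Z}_q$, we can set $\varepsilon = \lambda^{-\omega(1)}/m$ to have $b + e' \pmod q$ look $\mathrm{negl}(\lambda)/m$-close to $\mathbb{T}_q$, making it look like samples from $U(\mathbb{Z}_q^{n\times m} \times \mathbb{T}_q^m)$.
If we are given samples from $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_{\mathbb{Z},\sigma})$, then the second component can be seen as having noise $e + e'$, where $e \sim D_{\mathbb{Z},\sigma}$ and $e' \sim D_{\sigma''}$. Applying Lemma 3, as long as $1/\sqrt{1/\sigma^2 + 1/(\sigma'')^2} \ge \eta_\varepsilon(\mathbb{Z})$, then $e + e'$ will look $O(\varepsilon)$-close to $D_{\sqrt{\sigma^2 + (\sigma'')^2}}$. Thus, as long as $\sigma, \sigma'' \ge \sqrt{2} \cdot \eta_\varepsilon(\mathbb{Z})$, it all goes through, as taking errors mod $q$ (i.e. in $\mathbb{T}_q$ instead of $\mathbb{R}$) can only decrease statistical distance. Now, applying Lemma 2, we can set $\varepsilon = \lambda^{-\omega(1)}/m$ and $\sigma'' = \sqrt{4\ln(m) + \omega(\log \lambda)}$, and as long as $\sigma > \sqrt{4\ln(m) + \omega(\log \lambda)}$, all goes through. Now, applying the triangle inequality over all $m$ samples, we get $\mathrm{negl}(\lambda)$-closeness of all samples.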
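
The per-sample transformation used in this proof is short enough to sketch. The snippet below is an illustration only: the helper names are hypothetical, and it assumes the convention that the width-s Gaussian D_s has density proportional to exp(-πx²/s²), i.e. standard deviation s/√(2π).

```python
import math
import random

def sample_width(s, rng):
    """Sample from D_s, assuming density proportional to exp(-pi * x^2 / s^2)."""
    return rng.gauss(0.0, s / math.sqrt(2 * math.pi))

def smooth_error(sample, q, sigma_pp, rng):
    """Map (a, b) to (a, b + e' mod q) with fresh continuous noise e' ~ D_{sigma''}."""
    a, b = sample
    return a, (b + sample_width(sigma_pp, rng)) % q

rng = random.Random(0)
a_out, b_out = smooth_error(([1, 2, 3], 5), q=17, sigma_pp=3.0, rng=rng)
```

The first component is passed through untouched; only the second component moves from Z_q to the torus T_q.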
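Step 3: Converting discrete to continuous samples. Now, we convert discrete uniform samples $a \sim \mathbb{Z}_q^n$ to continuous uniform samples $a \sim \mathbb{T}_q^n$.
Lemma 11. Let $n, m, q \in \mathbb{N}$, $\sigma \in \mathbb{R}$. Let $\mathcal{S}$ be a distribution over $\mathbb{Z}^n$ where all elements in the support have fixed norm $r$, and suppose that
\[ \sigma \ge 3r\sqrt{\ln n + \ln m + \omega(\log \lambda)}. \]
Suppose there is a $T$-time distinguisher between the distributions $\mathrm{LWE}(m, \mathbb{T}_q^n, \mathcal{S}, D_{\sigma'})$ and $U(\mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m)$, where we set
\[ \sigma' = \sqrt{\sigma^2 + 9r^2(\ln n + \ln m + \omega(\log \lambda))}. \]
Then, there is a $T + \mathrm{poly}(m, n, \lambda, q)$ time distinguisher between the distributions $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_\sigma)$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{T}_q^m)$.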
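Proof. We run our distinguisher for $\mathrm{LWE}(m, \mathbb{T}_q^n, \mathcal{S}, D_{\sigma'})$ and $U(\mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m)$. Let $\varepsilon = \mathrm{negl}(\lambda)/m$, and let $\sigma'' \ge \sqrt{2} \cdot \eta_\varepsilon(\mathbb{Z}^n)$. For each sample $(a, b)$ (from either $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_\sigma)$ or $U(\mathbb{Z}_q^{n\times m} \times \mathbb{T}_q^m)$), we sample a continuous Gaussian $a' \sim (D_{\sigma''})^n$ and send $(a + a' \pmod q, b)$ to the distinguisher. By Lemma 4, we know that the distribution of $a' \pmod 1$ has statistical distance at most $\varepsilon = \mathrm{negl}(\lambda)/m$ to $U([0,1)^n)$. Thus, by symmetry over $a \sim (\mathbb{Z}_q)^n$, the distribution of $a + a' \pmod q$ will be $\mathrm{negl}(\lambda)/m$-close to uniform over $(\mathbb{T}_q)^n$. Therefore, by the triangle inequality, if we are given samples from $U(\mathbb{Z}_q^{n\times m} \times \mathbb{T}_q^m)$, the reduction gives samples to the distinguisher that are $\mathrm{negl}(\lambda)$-close to $U(\mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m)$.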
If we are given samples from LWE(m,Znq ,S, Dσ), then the reduction gives us (taking everything
mod q)
(a + a′, 〈a, s〉+ e) = (a + a′, 〈a + a′, s〉+ e− 〈a′, s〉) = (a + a′, 〈a + a′, s〉+ e′),
where we define
e′ = e− 〈a′, s〉
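over $\mathbb{R}$. Conditioned on $a + a' \bmod q$, $a'$ is a discrete Gaussian distributed according to $D_{\mathbb{Z}^n + (a + a'),\sigma''}$. By Lemma 3, as long as $\sigma \ge r\sigma''$, the distribution of $e'$ is $O(\varepsilon) = \mathrm{negl}(\lambda)/m$ close to $D_{\sigma'}$, where
\[ \sigma' = \sqrt{\sigma^2 + r^2(\sigma'')^2}. \]
Averaging the distribution of $e'$ over $s$ will not change the distribution over $e'$. Therefore, if we are given the $m$ samples from $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathcal{S}, D_\sigma)$, the reduction gives us samples $\mathrm{negl}(\lambda)$-close to $\mathrm{LWE}(m, \mathbb{T}_q^n, \mathcal{S}, D_{\sigma'})$, as desired.
To set parameters, we choose $\sigma'' = 3\sqrt{\ln n + \ln m + \omega(\log \lambda)}$ to ensure that $\sigma'' \ge \sqrt{2} \cdot \eta_{\mathrm{negl}(\lambda)/m}(\mathbb{Z}^n)$. This gives
\[ \sigma' = \sqrt{\sigma^2 + 9r^2(\ln n + \ln m + \omega(\log \lambda))}, \]
along with the requirement that
\[ \sigma \ge r\sigma'' = 3r\sqrt{\ln n + \ln m + \omega(\log \lambda)}. \]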
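
The algebraic identity at the heart of this proof, that adding a' to the sample leaves the second component consistent once the error is redefined as e' = e - <a', s>, can be checked numerically. This is a small sanity-check sketch with hypothetical names; the Gaussian widths are arbitrary because the identity does not depend on them, only on s being an integer vector.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def step3_mismatch(n, q, seed=0):
    """Distance on T_q between b and <a + a', s> + e' for one random sample."""
    rng = random.Random(seed)
    s = [rng.choice([-1, 1]) for _ in range(n)]         # integer secret
    a = [rng.randrange(q) for _ in range(n)]            # discrete uniform sample
    e = rng.gauss(0.0, 1.0)                             # original error
    a_prime = [rng.gauss(0.0, 1.0) for _ in range(n)]   # smoothing noise a'
    b = (dot(a, s) + e) % q                             # second component, unchanged
    a_new = [(x + y) % q for x, y in zip(a, a_prime)]   # a + a' (mod q)
    e_new = e - dot(a_prime, s)                         # e' = e - <a', s>
    diff = (b - (dot(a_new, s) + e_new)) % q
    return min(diff, q - diff)                          # wrap-around distance

mismatch = step3_mismatch(n=8, q=97)
```

Reducing a + a' modulo q changes the inner product with s only by a multiple of q, which is why the mismatch vanishes up to floating-point error.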
Step 4: Converting uniform to Gaussian samples.
Lemma 12. Let $t \in \mathbb{R}_{>0}$ be a parameter. There is a $\mathrm{poly}(n, t, q, \lambda)$-time algorithm such that on input $z \in \mathbb{T}_q^n$, the algorithm outputs some $y \in \mathbb{R}^n$ such that $q \cdot y = z \pmod q$. Moreover, if $z$ is uniform over $\mathbb{T}_q^n$, then the distribution on the outputs $y$ is $\mathrm{negl}(\lambda)/t$-close to $(D_\tau)^n$, where $\tau = \sqrt{\ln n + \ln t + \omega(\log \lambda)}$.
Remark 3. In the discrete setting, there is in some sense a necessary multiplicative Ω(log q) over-
head in the dimension due to entropy arguments, but the above shows that we can overcome that
barrier in the continuous case.
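Proof. We give each coordinate of $y$ separately. By the triangle inequality, it suffices to show how to sample $y \in \mathbb{R}$ such that $qy = z \pmod q$ and such that if $z \sim \mathbb{T}_q$, then $y$ is $\mathrm{negl}(\lambda)/(tn)$-close to $D_\tau$.
We sample
\[ y \sim D_{\mathbb{Z} + z/q,\tau}, \]
which can be sampled efficiently (see e.g. [BLP+13], Section 5.1 of full version), where we have $\mathrm{negl}(\lambda)/(tn)$ statistical distance between $y$ and $D_{\mathbb{Z}+z/q,\tau}$, and always satisfy $y \in \mathbb{Z} + z/q$. Since $y \in \mathbb{Z} + z/q$, it follows that $qy \in q\mathbb{Z} + z$, which implies that $qy = z \pmod q$.
Now, we need to argue that the distribution of $y$ looks $\mathrm{negl}(\lambda)/(tn)$-close to $D_\tau$ when $z \sim U([0, q))$. To see this, observe that $z/q$ is distributed uniformly on $[0, 1)$, so it suffices to show that $D_{\mathbb{Z}+r,\tau}$ for $r \sim [0, 1)$ is statistically close to $D_\tau$. Note that for fixed $r \in [0, 1)$, we can view the distribution $D_{\mathbb{Z}+r,\tau}$ as a continuous distribution with density
\[ D_{\mathbb{Z}+r,\tau}(x) = \delta(x - r \bmod 1) \cdot \frac{\rho_\tau(x)}{\rho_\tau(\mathbb{Z} + r)} \]
for arbitrary $x \in \mathbb{R}$, where $\delta(\cdot)$ is the Dirac delta function. Thus, as long as $\tau \ge \eta_\varepsilon(\mathbb{Z})$ (for $\varepsilon$ set later), the density of the marginal distribution $D_{\mathbb{Z}+r,\tau}$ where $r \sim U([0, 1))$ is given by
\begin{align*}
D_{\mathbb{Z}+U([0,1)),\tau}(x) &= \int_0^1 1 \cdot D_{\mathbb{Z}+r,\tau}(x) \, dr \\
&= \int_0^1 \delta(x - r \bmod 1) \cdot \frac{\rho_\tau(x)}{\rho_\tau(\mathbb{Z} + r)} \, dr \\
&= \frac{\rho_\tau(x)}{\rho_\tau(\mathbb{Z} + x)} \\
&\in \left[1, \frac{1+\varepsilon}{1-\varepsilon}\right] \cdot \frac{\rho_\tau(x)}{\rho_\tau(\mathbb{Z})} \\
&\propto \left[1, \frac{1+\varepsilon}{1-\varepsilon}\right] \cdot \rho_\tau(x),
\end{align*}
where the inclusion comes from Lemma 5. Therefore, a standard calculation shows that the statistical distance between $D_{\mathbb{Z}+U([0,1)),\tau}$ and $D_\tau$ is at most $O(\varepsilon)$. Setting $\varepsilon = \lambda^{-\omega(1)}/(t \cdot n)$, we need to take $\tau \ge \eta_{\lambda^{-\omega(1)}/(t \cdot n)}(\mathbb{Z})$, which we can do by setting $\tau = \sqrt{\ln n + \ln t + \omega(\log \lambda)}$ by Lemma 2.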
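
To make the coordinate-wise lifting concrete, here is a deliberately naive sketch. It is not the efficient sampler of [BLP+13]: it enumerates a truncated window of the coset Z + z/q and samples proportionally to exp(-πy²/τ²), which assumes that density convention for ρ_τ. The function names and the `tail` cutoff are illustrative assumptions.

```python
import math
import random

def sample_coset_gaussian(c, tau, rng, tail=12):
    """Naively sample y in Z + c with weight exp(-pi * y^2 / tau^2), truncated."""
    bound = math.ceil(tail * tau) + 1
    support = [k + c for k in range(-bound, bound + 1)]
    weights = [math.exp(-math.pi * (y / tau) ** 2) for y in support]
    return rng.choices(support, weights=weights, k=1)[0]

def lift_to_gaussian(z, q, tau, rng):
    """Given z in T_q^n, return y with q * y = z (mod q), one coordinate at a time."""
    return [sample_coset_gaussian(zi / q, tau, rng) for zi in z]

rng = random.Random(0)
z = [0.25, 3.5, 6.75]
y = lift_to_gaussian(z, q=7, tau=3.0, rng=rng)
```

Every output coordinate lies in Z + z_i/q by construction, so q·y agrees with z modulo q regardless of which point of the coset is drawn.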
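Lemma 13. Let $n, m, q \in \mathbb{N}$, $\sigma, r, \gamma \in \mathbb{R}$. Let $\mathcal{S}$ be a distribution over $\mathbb{Z}^n$ where all elements in the support have fixed norm $r$. Suppose there is a $T$-time distinguisher between the distributions $\mathrm{LWE}(m, q, D_\tau^n, q \cdot \mathcal{S}, D_\sigma) = \mathrm{CLWE}(m, q, D_\tau^n, \frac{1}{r} \cdot \mathcal{S}, \gamma, \sigma)$ and $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$, where $\gamma = r \cdot q$ and
\[ \tau = \sqrt{\ln n + \ln m + \omega(\log \lambda)}. \]
Then, there is a $T$-time distinguisher between the distributions $\mathrm{LWE}(m, \mathbb{T}_q^n, \mathcal{S}, D_\sigma)$ and $U(\mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m)$.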
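Proof. We run the distinguisher for $\mathrm{LWE}(m, q, D_\tau^n, q \cdot \mathcal{S}, D_\sigma)$ and $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$. For each sample $(a, b)$ from either $\mathrm{LWE}(m, \mathbb{T}_q^n, \mathcal{S}, D_\sigma)$ or $U(\mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m)$, we invoke Lemma 12 on $a$ with parameter $t = m$ to get some $y \in \mathbb{R}^n$ with statistical distance $\mathrm{negl}(\lambda)/m$ from $D_\tau^n$ such that $q \cdot y = a \pmod q$. We then send $(y, b)$ to the distinguisher. If $(a, b)$ is a sample from $\mathrm{LWE}(m, \mathbb{T}_q^n, \mathcal{S}, D_\sigma)$, then for secret $s \sim \mathcal{S}$, since $s \in \mathbb{Z}^n$, we have
\begin{align*}
(y, b) = (y, \langle a, s \rangle + e \pmod q) &= (y, \langle q \cdot y, s \rangle + e \pmod q) \\
&= (y, \langle y, q \cdot s \rangle + e \pmod q),
\end{align*}
where this is now $\mathrm{negl}(\lambda)/m$ close to a sample from $\mathrm{LWE}(m, q, D_\tau^n, q \cdot \mathcal{S}, D_\sigma)$. Applying this reduction to $U(\mathbb{T}_q^{n\times m} \times \mathbb{T}_q^m)$ clearly gives us a sample statistically close to $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$ by Lemma 12 and the triangle inequality over all $m$ samples.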
Step 5: Converting the secret to a random direction. The distribution on the secret as
given above is not uniform over the sphere, so we apply the worst-case to average-case reduction
for CLWE (Claim 2.22 in [BRST21]). For completeness, we provide a proof.
Lemma 14 ([BRST21], Claim 2.22). Let $n, m, q \in \mathbb{N}$, and let $\tau, \sigma \in \mathbb{R}_{>0}$. Let $\mathcal{S}$ be a distribution over $\mathbb{R}^n$ of fixed norm 1. Suppose there is a $T$-time distinguisher between the distributions $\mathrm{CLWE}(m, q, D_\tau^n, \gamma, \sigma)$ and $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$. Then, there is a $T + \mathrm{poly}(n, m, q)$ time distinguisher between the distributions $\mathrm{CLWE}(m, q, D_\tau^n, \mathcal{S}, \gamma, \sigma)$ and $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$. That is, CLWE with secrets drawn from the (possibly discrete) distribution $\mathcal{S}$ reduces to CLWE whose secret is a uniformly random unit vector.
Proof. We run the distinguisher for $\mathrm{CLWE}(m, q, D_\tau^n, \gamma, \sigma)$ and $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$. Let $R \in \mathbb{R}^{n\times n}$ be a uniformly random rotation matrix in $\mathbb{R}^n$, fixed for all samples. When giving the distinguisher a sample, we get $(a, b)$ from either $\mathrm{CLWE}(m, q, D_\tau^n, \mathcal{S}, \gamma, \sigma)$ or $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$, and send $(Ra, b)$ to the distinguisher. If $(a, b)$ is drawn from $\mathrm{CLWE}(m, q, D_\tau^n, \mathcal{S}, \gamma, \sigma)$, then we have
\begin{align*}
(Ra, b) = (Ra, \gamma\langle a, s \rangle + e \pmod q) &= (Ra, \gamma\langle Ra, Rs \rangle + e \pmod q) \\
&= (a', \gamma \cdot \langle a', w \rangle + e \pmod q),
\end{align*}
for $a \sim D_\tau^n$, $s \sim \mathcal{S}$, and $e \sim D_\sigma$, where we set $a' = Ra$ and $w = Rs$ (fixed for all samples). For an arbitrary rotation $R$, since the distribution on $a$ is spherically symmetric, we have $a' = Ra \sim (D_\tau)^n$, independently of $R$. For a random rotation matrix $R$, for arbitrary $s$, we have that $w = Rs$ is a uniformly random unit vector in $\mathbb{R}^n$. Since this holds for arbitrary $s$, this also holds when averaging over the distribution $s \sim \mathcal{S}$. If $(a, b)$ is drawn from $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$, then $(Ra, b)$ is distributed identically to $(a, b)$, since the distribution on $a' = Ra$ is spherically symmetric. Thus, the reduction maps the distributions perfectly.
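
The two facts used above, that an orthogonal map preserves inner products and sends a unit secret to a unit vector, can be illustrated with a small sketch. The sampler below applies Gram-Schmidt to Gaussian vectors, which is one standard way to draw from the orthogonal group; it may return a reflection rather than a rotation, which does not affect the two properties checked here. All names are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def random_orthogonal(n, rng):
    """Orthonormalize Gaussian vectors (Gram-Schmidt) into an orthogonal matrix."""
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:                       # remove components along earlier rows
            c = dot(u, v)
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > 1e-9:                      # skip (measure-zero) degenerate draws
            rows.append([vi / norm for vi in v])
    return rows

def apply(R, x):
    return [dot(row, x) for row in R]

rng = random.Random(1)
n = 5
R = random_orthogonal(n, rng)
s = [sign / math.sqrt(n) for sign in (1, -1, 1, 1, -1)]   # discrete unit secret
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = apply(R, s)                                           # re-randomized secret
```

Because <Ra, Rs> = <a, s>, the label b of a sample never needs to be touched; only the sample a is rotated.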
Step 6: Going from mod q to mod 1. Now, we divide a by τ , multiply γ by τ/q, and divide
e by q to finally reduce to decisional CLWE as defined in [BRST21]. To be precise:
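Lemma 15. Suppose there is a $T$-time distinguisher between the distributions $\mathrm{CLWE}(m, D_1^n, \gamma', \sigma/q)$ and $D_1^{n\times m} \times U(\mathbb{T}^m)$, where $\gamma' = \gamma \cdot \tau/q$. Then, there is a $T + \mathrm{poly}(n, m, q, \lambda)$ time distinguisher between the distributions $\mathrm{CLWE}(m, q, D_\tau^n, \gamma, \sigma)$ and $D_\tau^{n\times m} \times U(\mathbb{T}_q^m)$.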
Proof. The reduction follows by rescaling the samples appropriately.
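
One way to read "rescaling the samples" is to map (a, b) to (a/τ, b/q mod 1), so that γ<a, w> + e (mod q) becomes (γτ/q)<a/τ, w> + e/q (mod 1). The following sketch checks that reading numerically; the function name and the concrete numbers are illustrative assumptions, not taken from the source.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def rescale_sample(a, b, tau, q):
    """Map a mod-q CLWE sample with width-tau a to a mod-1 sample with width-1 a."""
    return [x / tau for x in a], (b / q) % 1.0

rng = random.Random(0)
q, tau, gamma, e = 97.0, 3.0, 50.0, 0.3
w = [0.5, -0.5, 0.5, 0.5]                          # unit-norm secret
a = [rng.gauss(0.0, tau) for _ in range(4)]        # stand-in for a Gaussian sample
b = (gamma * dot(a, w) + e) % q

a1, b1 = rescale_sample(a, b, tau, q)
expected = ((gamma * tau / q) * dot(a1, w) + e / q) % 1.0
gap = min(abs(b1 - expected), 1.0 - abs(b1 - expected))
```

The same map sends D_τ^{n×m} × U(T_q^m) to D_1^{n×m} × U(T^m), so both distributions in the lemma are handled by one transformation.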
Now, we are ready to give a proof of Theorem 5. (See Figure 1 for a sketch.)
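Proof of Theorem 5. Throughout this proof, when we say advantage, we omit additive $\mathrm{negl}(\lambda)$ terms. Suppose there is no $T + \mathrm{poly}(n, m, m_1, q, \lambda)$ time distinguisher with advantage $\varepsilon/O(m_1)$ between $\mathrm{LWE}(m, \mathbb{Z}_q^n, \mathbb{Z}_q^n, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$.
Then, by Theorem 6, there is no $T + \mathrm{poly}(n, m, m_1, q, \lambda)$-time distinguisher between $\mathrm{LWE}(m_1, \mathbb{Z}_q^m, \{+1,-1\}^m, D_{\mathbb{Z},\sigma_1})$ and $U(\mathbb{Z}_q^{m\times m_1} \times \mathbb{Z}_q^{m_1})$ with advantage $\varepsilon$, where $\sigma_1 = 2\sigma\sqrt{m}$, and all other sufficient conditions are met by the hypotheses of the theorem. Note that we are setting $n_1 = m$.
Then, by Lemma 10, there is no $T + \mathrm{poly}(n, m, m_1, q, \lambda)$-time distinguisher between $\mathrm{LWE}(m_1, \mathbb{Z}_q^m, \{+1,-1\}^m, D_{\sigma_2})$ and $U(\mathbb{Z}_q^{m\times m_1} \times \mathbb{T}_q^{m_1})$ with advantage $\varepsilon$, where $\sigma_2^2 = \sigma_1^2 + 4\ln m_1 + \omega(\log \lambda)$. Note that $\sigma_1^2 = 4\sigma^2 m \ge 4m \cdot C^2(\ln m_1 + \omega(\log \lambda)) > 4\ln m + \omega(\log \lambda)$, as needed.
Then, by Lemma 11, there is no $T + \mathrm{poly}(n, m, m_1, q, \lambda)$-time distinguisher between $\mathrm{LWE}(m_1, \mathbb{T}_q^m, \{+1,-1\}^m, D_{\sigma_3})$ and $U(\mathbb{T}_q^{m\times m_1} \times \mathbb{T}_q^{m_1})$ with advantage $\varepsilon$, where $\sigma_3^2 = \sigma_2^2 + 9m(\ln m + \ln m_1 + \omega(\log \lambda))$, as long as $\sigma_2 \ge 3\sqrt{m}\sqrt{\ln m + \ln m_1 + \omega(\log \lambda)}$, which we are given, since
\[ \sigma_2^2 > \sigma_1^2 = 4\sigma^2 \cdot m \ge C^2(\ln m_1 + \omega(\log \lambda)) \cdot m, \]
where $C$ is chosen to be a sufficiently large constant.
Then, by Lemma 13, there is no $T + \mathrm{poly}(n, m, m_1, q, \lambda)$-time distinguisher between
\[ \mathrm{LWE}(m_1, q, D_\tau^m, \{+q,-q\}^m, D_{\sigma_3}) = \mathrm{CLWE}\left(m_1, q, D_\tau^m, \left\{\tfrac{1}{\sqrt{m}}, -\tfrac{1}{\sqrt{m}}\right\}^m, \gamma_0, \sigma_3\right) \]
and $D_\tau^{m\times m_1} \times U(\mathbb{T}_q^{m_1})$ with advantage $\varepsilon$ for $\gamma_0 = \sqrt{m} \cdot q$ and $\tau = \sqrt{\ln m + \ln m_1 + \omega(\log \lambda)}$.
Then, by Lemma 14, there is no $T + \mathrm{poly}(n, m, m_1, q, \lambda)$-time distinguisher between $\mathrm{CLWE}(m_1, q, D_\tau^m, \gamma_0, \sigma_3)$ and $D_\tau^{m\times m_1} \times U(\mathbb{T}_q^{m_1})$ with advantage $\varepsilon$.
Lastly, by Lemma 15, there is no $T$-time distinguisher between $\mathrm{CLWE}(m_1, D_1^m, \gamma, \beta)$ and $D_1^{m\times m_1} \times U(\mathbb{T}^{m_1})$ with advantage $\varepsilon$, where $\gamma = \gamma_0 \cdot \tau/q = \sqrt{m} \cdot \tau$ and $\beta = \sigma_3/q$.
Unraveling parameters, we have
\[ \gamma = \sqrt{m} \cdot \tau = O\left(\sqrt{m(\ln m_1 + \omega(\log \lambda))}\right), \]
and
\begin{align*}
\beta^2 = \frac{\sigma_3^2}{q^2} &= O\left(\frac{\sigma_2^2 + m\ln m_1 + m \cdot \omega(\log \lambda)}{q^2}\right) = O\left(\frac{\sigma_1^2 + m\ln m_1 + m \cdot \omega(\log \lambda)}{q^2}\right) \\
&= O\left(m \cdot \frac{\sigma^2 + \ln m_1 + \omega(\log \lambda)}{q^2}\right),
\end{align*}
as desired.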
4 Hardness of k-sparse LWE
In this section, we reduce from standard LWE to a version where secrets are sparse, in the sense
that they have few non-zero entries.
Definition 10. For $k, n \in \mathbb{N}$ with $k \le n$, let $S_{n,k}$ be the subset of vectors in $\{-1, 0, +1\}^n$ with exactly $k$ non-zero entries. We call $s \in \mathbb{Z}^n$ $k$-sparse if $s \in S_{n,k}$.
Lemma 16. We have $\tilde{H}_\infty(S_{n,k}) \ge k\log_2(n/k)$.
Proof. Observe that $|S_{n,k}| = \binom{n}{k} \cdot 2^k$. Using the bound $(n/k)^k \le \binom{n}{k}$, we have
\[ \tilde{H}_\infty(S_{n,k}) \ge \log_2\left(\left(2 \cdot \frac{n}{k}\right)^k\right) \ge k\log_2(n/k), \]
as desired.
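
The counting argument in this proof is easy to check for concrete values. The helper below is a hypothetical illustration that computes log2 of the support size of the uniform distribution on k-sparse ternary vectors and compares it to the stated lower bound.

```python
import math

def sparse_secret_entropy(n, k):
    """log2 |S_{n,k}|, where |S_{n,k}| = C(n, k) * 2^k."""
    return math.log2(math.comb(n, k) * 2**k)

def entropy_lower_bound(n, k):
    """The bound k * log2(n / k) from Lemma 16."""
    return k * math.log2(n / k)

exact = sparse_secret_entropy(1024, 16)
bound = entropy_lower_bound(1024, 16)
```

For n = 1024 and k = 16 the bound is 16 · log2(64) = 96 bits, while the exact value is larger because of both the 2^k sign choices and the slack in (n/k)^k ≤ C(n, k).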
Micciancio [Mic18] gave a simplified proof of hardness for n-sparse secrets (i.e. binary secrets
with entries in {+1,−1}), and we show that his result with slight modification extends to the
k-sparse setting in a natural way. Explicitly, we have the following.
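Theorem 7. Let $q, m, n, \ell, k \in \mathbb{N}$ with $1 < k < n$, and let $\sigma, \varepsilon \in \mathbb{R}_{>0}$. Suppose $\log(q)/2^\ell = \mathrm{negl}(\lambda)$, $\sigma \ge 4\sqrt{\omega(\log \lambda) + \ln n + \ln m}$, and $k\log_2(n/k) \ge (\ell + 1)\log_2(q) + \omega(\log \lambda)$. Suppose there is no $T + \mathrm{poly}(n, m, q, \lambda)$-time distinguisher with advantage $\varepsilon$ between $\mathrm{LWE}(n-1, \mathbb{Z}_q^\ell, \mathbb{Z}_q^{m\times\ell}, D_{\mathbb{Z}^m,\sigma})$ and $U(\mathbb{Z}_q^{\ell\times(n-1)} \times \mathbb{Z}_q^{m\times(n-1)})$, and further suppose there is no $T$-time distinguisher with advantage $\varepsilon$ between $\mathrm{LWE}(n+1, \mathbb{Z}_q^{\ell+1}, \mathbb{Z}_q^{m\times(\ell+1)}, D_{\mathbb{Z}^m,2\sigma})$ and $U(\mathbb{Z}_q^{(\ell+1)\times(n+1)} \times \mathbb{Z}_q^{m\times(n+1)})$. Then, there is no $T$-time distinguisher with advantage $2\varepsilon$ (up to additive $\mathrm{negl}(\lambda)$ factors) between $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ and $U(\mathbb{Z}_q^{m\times n} \times \mathbb{Z}_q^m)$, where $\sigma' = 2\sigma\sqrt{k+1}$.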
Definition 11. Let $n, k \in \mathbb{Z}$ with $k \le n$. For all $i \in [n]$, we define $e_i$ to be the $i$th standard basis column vector, i.e. having a 1 in the $i$th coordinate and 0s elsewhere. We then define $u \in \mathbb{Z}^n$ to be $u = \sum_{i=1}^k e_i$, i.e. 1s in the first $k$ coordinates and 0 elsewhere.
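Lemma 17. There is a $\mathrm{poly}(n)$-time computable matrix $Q \in \mathbb{Z}^{n\times(2n+5)}$ such that $Q_{[n]}$ is invertible, $u^\top Q_{[n]} = e_1^\top$, the vector $v^\top = u^\top Q_{]n[} \in \mathbb{Z}^{n+5}$ satisfies $\|v\|_2 = 2\sqrt{k}$ and $\|v\|_\infty = 2$, and $Q_{]1[}(D_{\mathbb{Z}^{2n+4},\sigma})$ and $D_{\mathbb{Z}^n,2\sigma}$ are $\mathrm{negl}(\lambda)/t$ close as long as $\sigma \ge \sqrt{6} \cdot \sqrt{\omega(\log \lambda) + \ln n + \ln t}$ for a parameter $t$.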
Proof. We use essentially the same gadget Q as in Lemma 2.7 of [Mic18], except we modify two
entries of the matrix and add two columns. Specifically, we set Qk,k+1 = 0 (instead of −1),
Qk,n+k+1 = 0 (instead of 1), and add two columns to the end that are all 0 except for two en-
tries of 1 in Qk,2n+4 and Qk,2n+5.
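We will give it explicitly as follows. Let the matrix $X \in \mathbb{Z}^{n\times(n-1)}$ be defined by
\[ X = \begin{pmatrix}
-1 & & & & & & & \\
1 & -1 & & & & & & \\
& \ddots & \ddots & & & & & \\
& & 1 & -1 & & & & \\
& & & 1 & 0 & & & \\
& & & & 1 & -1 & & \\
& & & & & \ddots & \ddots & \\
& & & & & & 1 & -1 \\
& & & & & & & 1
\end{pmatrix}, \]
where the row with the abnormal 0 is the $k$th row. Similarly, let $Y \in \mathbb{Z}^{n\times(n-1)}$ be defined by
\[ Y = \begin{pmatrix}
1 & & & & & & & \\
1 & 1 & & & & & & \\
& \ddots & \ddots & & & & & \\
& & 1 & 1 & & & & \\
& & & 1 & 0 & & & \\
& & & & 1 & 1 & & \\
& & & & & \ddots & \ddots & \\
& & & & & & 1 & 1 \\
& & & & & & & 1
\end{pmatrix}, \]
where the row with the abnormal 0 is again the $k$th row. We then define $Q \in \mathbb{Z}^{n\times(2n+5)}$ by
\[ Q = [e_1, X, -e_n, Y, e_n, e_1, e_1, e_k, e_k]. \]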
First, notice that $Q_{[n]}$ is invertible, since it is upper-triangular with 1s on the diagonal. Next, notice that $u^\top Q_{[n]} = e_1^\top$, as $u^\top e_1 = 1$ and the sum of the first $k$ entries in each column of $X$ is 0 by construction. We can write $v^\top = u^\top Q_{]n[} = [0, 2, 2, \cdots, 2, 0, \cdots, 0, 1, 1, 1, 1]$, which has $\ell_2$ norm
\[ \sqrt{(k-1) \cdot 2^2 + 4 \cdot 1^2} = 2\sqrt{k}. \]
It is also clear that $\|v\|_\infty = 2$. All that remains is to show that $Q_{]1[}(D_{\mathbb{Z}^{2n+4},\sigma})$ and $D_{\mathbb{Z}^n,2\sigma}$ are $\mathrm{negl}(\lambda)/t$-close, which we do below.
To show that $Q_{]1[}(D_{\mathbb{Z}^{2n+4},\sigma})$ and $D_{\mathbb{Z}^n,2\sigma}$ are $\mathrm{negl}(\lambda)/t$ close, we first prove the preconditions of Lemma 8 and then invoke it. Let $T = Q_{]1[} \in \mathbb{Z}^{n\times(2n+4)}$.
First, we show that $T$ is primitive. It suffices to show that for every standard basis column vector $e_i$, there is some $g_i \in \mathbb{Z}^{2n+4}$ such that $e_i = Tg_i$. For all $j \in [2n+4]$, we define $f_j$ to be the $j$th standard basis column vector in $\mathbb{R}^{2n+4}$. Let $g_1 = f_{2n+1}$ and $g_{k+1} = f_k$. It can be easily checked that $e_1 = Tg_1$ and $e_{k+1} = Tg_{k+1}$. Then, for all $i$ such that $1 < i \le k$ or $k+1 < i \le n$, let $g_i = f_{i-1} + g_{i-1}$. Using an inductive argument, and by the construction of $T$, it follows that
\begin{align*}
Tg_i &= T(f_{i-1} + g_{i-1}) \\
&= Tf_{i-1} + Tg_{i-1} \\
&= (e_i - e_{i-1}) + e_{i-1} \\
&= e_i.
\end{align*}
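It is easy to check that $TT^\top = 4I$. Finally, we bound the smoothing parameter of the lattice $\Lambda = \ker(T)$. Since $T \in \mathbb{Z}^{n\times(2n+4)}$ and $T$ has full rank, its kernel $\Lambda$ has dimension $n + 4$. The columns of the following matrix give a basis for the lattice $\Lambda$:
\[ V = \begin{pmatrix}
\tilde{Y} & e_1 & & -e_{k-1} & \\
-\tilde{X} & -e_1 & & -e_{k-1} & \\
& 1 & 1 & & \\
& 1 & -1 & & \\
-\tilde{Z}_{k-1} & & & 1 & 1 \\
-\tilde{Z}_{k-1} & & & 1 & -1
\end{pmatrix} \in \mathbb{Z}^{(2n+4)\times(n+4)}, \]
where we define
\[ \tilde{X} = \begin{pmatrix} -1 & & & \\ 1 & -1 & & \\ & \ddots & \ddots & \\ & & 1 & -1 \end{pmatrix} \in \mathbb{Z}^{n\times n}, \qquad
\tilde{Y} = \begin{pmatrix} 1 & & & \\ 1 & 1 & & \\ & \ddots & \ddots & \\ & & 1 & 1 \end{pmatrix} \in \mathbb{Z}^{n\times n}, \]
and
\[ \tilde{Z}_{k-1} = \begin{pmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{pmatrix} \in \mathbb{Z}^{1\times n}. \]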
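Here $\tilde{Z}_{k-1}$ is the zero matrix except for the $(k-1)$th column, which has a 1 entry. By direct computation, it is easy to see that the columns of $V$ lie in $\ker(T)$. To see that $V$ is a basis for $\ker(T)$, we can show that its columns are linearly independent by constructing a matrix $W \in \mathbb{Z}^{(n+4)\times(2n+4)}$ such that $WV = 2I_{(n+4)\times(n+4)}$. Indeed, we can do so in the following way. We can first define matrices
\[ I_+ = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & & 0 & 1 & & \\
& & & & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix} \in \mathbb{Z}^{n\times n}, \qquad
I_- = \begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & & 0 & -1 & & \\
& & & & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix} \in \mathbb{Z}^{n\times n}, \]
where the abnormal row is the $(k-1)$th row, and then define
\[ W = \begin{pmatrix}
I_+ & I_- & & & & \\
& & 1 & 1 & & \\
& & 1 & -1 & & \\
\tilde{Z}_k & -\tilde{Z}_k & & & 1 & 1 \\
& & & & 1 & -1
\end{pmatrix} \in \mathbb{Z}^{(n+4)\times(2n+4)}, \]
where, similarly to before, $\tilde{Z}_k \in \mathbb{Z}^{1\times n}$ is the one-hot vector with a 1 in the $k$th column. It is straightforward to verify that $WV = 2I_{(n+4)\times(n+4)}$, showing that the columns of $V$ are linearly independent.
By looking at the columns of $V$, we have $\lambda_{n+4}(\Lambda) \le \sqrt{6}$, so by Lemma 2, we have $\eta_\varepsilon(\Lambda) \le \sqrt{6} \cdot \sqrt{\omega(\log \lambda) + \ln n + \ln t} \le \sigma$, where we set $\varepsilon = \mathrm{negl}(\lambda)/t$. Therefore by Lemma 8, we get that $Q_{]1[}(D_{\mathbb{Z}^{2n+4},\sigma})$ and $D_{\mathbb{Z}^n,2\sigma}$ are $\mathrm{negl}(\lambda)/t$ close if $\sigma \ge \sqrt{6} \cdot \sqrt{\omega(\log \lambda) + \ln n + \ln t}$.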
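
The matrix identities claimed in this proof can be checked mechanically for small parameters. The sketch below builds Q, T (Q without its first column), V and W with plain Python lists and tests the stated properties. The block layout of V and W follows the reading of the construction used here (columns e_1, 0, -e_{k-1}, 0 next to the two n-row blocks), and all function names are hypothetical.

```python
def zeros(r, c):
    return [[0] * c for _ in range(r)]

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def hstack(*blocks):
    return [sum((blk[i] for blk in blocks), []) for i in range(len(blocks[0]))]

def unit_col(n, i, sign=1):
    col = zeros(n, 1)
    col[i - 1][0] = sign                       # 1-based row index
    return col

def bidiagonal(n, cols, d):
    M = zeros(n, cols)
    for j in range(cols):
        M[j][j] = d                            # diagonal entry
        if j + 1 < n:
            M[j + 1][j] = 1                    # entry just below the diagonal
    return M

def gadget_Q(n, k):
    X, Y = bidiagonal(n, n - 1, -1), bidiagonal(n, n - 1, 1)
    X[k - 1][k - 1] = 0                        # the abnormal 0 in row k
    Y[k - 1][k - 1] = 0
    e1, ek, en = unit_col(n, 1), unit_col(n, k), unit_col(n, n)
    return hstack(e1, X, unit_col(n, n, -1), Y, en, e1, e1, ek, ek)

def kernel_basis_V(n, k):
    Xt, Yt = bidiagonal(n, n, -1), bidiagonal(n, n, 1)
    def extra(i, s):                           # four extra columns, row i (1-based)
        return [s if i == 1 else 0, 0, -1 if i == k - 1 else 0, 0]
    top = [Yt[i] + extra(i + 1, 1) for i in range(n)]
    mid = [[-x for x in Xt[i]] + extra(i + 1, -1) for i in range(n)]
    zk = [-1 if j == k - 2 else 0 for j in range(n)]       # -Z_{k-1}
    tail = [[0] * n + [1, 1, 0, 0], [0] * n + [1, -1, 0, 0],
            zk + [0, 0, 1, 1], zk + [0, 0, 1, -1]]
    return top + mid + tail

def left_inverse_W(n, k):
    rows = []
    for i in range(n):
        src = k - 1 if i == k - 2 else i       # abnormal row k-1 reads column k
        plus = [1 if j == src else 0 for j in range(n)]
        minus = [-x for x in plus] if i == k - 2 else list(plus)
        rows.append(plus + minus + [0, 0, 0, 0])
    zk = [1 if j == k - 1 else 0 for j in range(n)]        # Z_k
    rows += [[0] * (2 * n) + [1, 1, 0, 0], [0] * (2 * n) + [1, -1, 0, 0],
             zk + [-x for x in zk] + [0, 0, 1, 1], [0] * (2 * n) + [0, 0, 1, -1]]
    return rows

def check_gadget(n, k):
    Q = gadget_Q(n, k)
    T = [row[1:] for row in Q]
    uQ = matmul([[1] * k + [0] * (n - k)], Q)[0]           # u^T Q
    v = uQ[n:]
    V, W = kernel_basis_V(n, k), left_inverse_W(n, k)
    scaled_id = lambda d, c: [[c if i == j else 0 for j in range(d)] for i in range(d)]
    return (uQ[:n] == [1] + [0] * (n - 1)
            and sum(x * x for x in v) == 4 * k and max(map(abs, v)) == 2
            and matmul(T, [list(c) for c in zip(*T)]) == scaled_id(n, 4)
            and matmul(T, V) == zeros(n, n + 4)
            and matmul(W, V) == scaled_id(n + 4, 2))
```

Calling `check_gadget(6, 3)` exercises every identity used above: u^T Q_{[n]} = e_1^T, the two norms of v, TT^T = 4I, TV = 0 and WV = 2I.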
Lemma 18. There is a $\mathrm{poly}(n)$ time algorithm that on input $z \in S_{n,k}$ outputs a matrix $Z \in \mathbb{Z}^{n\times n}$ (as a function of $z$) that satisfies the following properties:
• $Z$ is a permutation matrix with signs, i.e. a permutation matrix where the non-zero entries could be $\pm 1$ instead of just 1,
• $Z = Z^\top = Z^{-1}$, and
• $Zz = u$.
Proof. We can define $Z$ as follows. Let
\begin{align*}
T_{\le k} &= \{i \in [k] : z_i \ne 0\}, & T_{>k} &= \{i \in [n] \setminus [k] : z_i \ne 0\}, \\
T^*_{\le k} &= \{i \in [k] : z_i = 0\}, & T^*_{>k} &= \{i \in [n] \setminus [k] : z_i = 0\}.
\end{align*}
Intuitively, $T_{\le k}$ and $T_{>k}$ partition the non-zero coordinates of $z$ based on whether they lie in the first $k$ coordinates, and $T^*_{\le k}$ and $T^*_{>k}$ partition the zero-coordinates of $z$ based on whether they lie in the first $k$ coordinates. Note that by $k$-sparsity of $z$, we have
\[ |T_{>k}| = k - |T_{\le k}| = |[k] \setminus T_{\le k}| = |T^*_{\le k}|. \]
Therefore, we can choose an arbitrary bijection $f : T_{>k} \to T^*_{\le k}$.
For all $i \in T_{\le k}$, we set $Z_{i,i} = z_i \in \{+1,-1\}$. For all $i \in T^*_{>k}$, we set $Z_{i,i} = 1$. For all $i \in T_{>k}$, we set $Z_{f(i),i} = z_i \in \{+1,-1\}$ and $Z_{i,f(i)} = Z_{f^{-1}(f(i)),f(i)} = z_i \in \{+1,-1\}$. We set all other entries of $Z$ to be 0. It is clear from this definition that $Z = Z^\top$.
First, observe that $Z$ is a signed permutation matrix. For all $i \in T_{\le k} \cup T^*_{>k}$, $Z$ is the identity map up to signs (on basis vectors $e_i$), and for all $i \in T_{>k}$, $Z$ consists of signed transpositions $Ze_i = z_i e_{f(i)}$ and $Ze_{f(i)} = z_i e_{f^{-1}(f(i))} = z_i e_i$. Therefore, $Z$ is a signed permutation matrix, and furthermore we have also shown $Z^2 = I_{n\times n}$. Therefore, $Z = Z^{-1}$.
Lastly, we show $Zz = u$. We can decompose $z$ as $z = z_{\le k} + z_{>k}$ in the natural way by considering the non-zero coordinates of $z$ on $[k]$ and $[n] \setminus [k]$ respectively. We then have
\[ Zz = Z(z_{\le k} + z_{>k}) = Zz_{\le k} + Zz_{>k} = 1_{T_{\le k}} + 1_{T^*_{\le k}} = u, \]
as desired.
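
The construction of Z translates directly into code. The sketch below uses 0-based indices and picks the bijection f by pairing the two index sets in increasing order, which is one arbitrary choice among many; the function name is hypothetical.

```python
def signed_permutation(z, k):
    """Build Z with Z = Z^T = Z^{-1} and Z z = u for a k-sparse z in {-1, 0, 1}^n."""
    n = len(z)
    Z = [[0] * n for _ in range(n)]
    head_zero = [i for i in range(k) if z[i] == 0]         # T*_{<=k}
    tail_nonzero = [i for i in range(k, n) if z[i] != 0]   # T_{>k}
    assert len(head_zero) == len(tail_nonzero), "z must have exactly k non-zeros"
    f = dict(zip(tail_nonzero, head_zero))                 # bijection T_{>k} -> T*_{<=k}
    paired = set(f.values())
    for i in range(n):
        if i in f:                                         # signed transposition
            Z[f[i]][i] = z[i]
            Z[i][f[i]] = z[i]
        elif i not in paired:                              # T_{<=k} or T*_{>k}
            Z[i][i] = z[i] if z[i] != 0 else 1
    return Z

z = [0, -1, 0, 1, 0, -1]
k = 3
Z = signed_permutation(z, k)
Zz = [sum(Z[i][j] * z[j] for j in range(len(z))) for i in range(len(z))]
```

For this z, the non-zero entries in positions 4 and 6 are swapped into the empty positions 1 and 3 with their signs cancelled, so `Zz` is the indicator of the first three coordinates.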
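Definition 12. We define a randomized mapping $\varphi$ as follows. Let $Q$ be as defined in Lemma 17. We sample $z \sim S_{n,k}$, $s \sim \mathbb{Z}_q^m$, $a \sim \mathbb{Z}_q^{n-1}$, $e \sim D_{\mathbb{Z}^m,2\sigma}$, $G \sim D_{\mathbb{Z}^{m\times(n+5)},\sigma}$. Let $Z \in \mathbb{Z}^{n\times n}$ be as defined in Lemma 18 as a function of $z$. On input $B \in \mathbb{Z}_q^{m\times(n-1)}$, we define
\[ \varphi(B; z, s, a, e, G) = \left[\left[s, s \cdot a^\top + B, G\right] Q^\top Z, \; s + e\right]. \]
First, we show that $\varphi$ maps $B \sim U(\mathbb{Z}_q^{m\times(n-1)})$ to $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$.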
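Lemma 19. Assume the same hypothesis as Theorem 7. For $B \sim U(\mathbb{Z}_q^{m\times(n-1)})$, we have $\varphi(B)$ and $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ are $\mathrm{negl}(\lambda)$-close.
Proof. We fix $a \in \mathbb{Z}_q^{n-1}$, $z \in S_{n,k}$ and we argue that $\varphi(B)$ maps to $\mathrm{LWE}(m, \mathbb{Z}_q^n, z, D_{\mathbb{Z},\sigma'})$, i.e. the LWE distribution with secret $z$. Averaging over $a$ and $z$ gives the desired result.
First, we show that $X = \left[s, s \cdot a^\top + B, G\right] Q^\top Z$ looks uniform. By construction, $[s, s \cdot a^\top + B]$ has distribution $U(\mathbb{Z}_q^{m\times n})$, by using the independent randomness of $s$ and $B$. We can write
\[ X = [s, s \cdot a^\top + B] Q_{[n]}^\top Z + G Q_{]n[}^\top Z. \]
Since $Q_{[n]}$ and $Z$ are invertible, by a one-time pad argument, we have $X \sim U(\mathbb{Z}_q^{m\times n})$, independent of $G$ and $e$.
Now, we have to argue that the conditional distribution on $x = s + e$ is equal to $Xz + e'$ for some Gaussian noise $e'$. We can directly write
\begin{align*}
x - Xz &= s + e - \left([s, s \cdot a^\top + B] Q_{[n]}^\top Z + G Q_{]n[}^\top Z\right) z \\
&= s + e - [s, s \cdot a^\top + B] Q_{[n]}^\top u - G Q_{]n[}^\top u \\
&= s + e - [s, s \cdot a^\top + B] e_1 - Gv \\
&= e - Gv,
\end{align*}
where we use the fact that $Zz = u$, $u^\top Q_{[n]} = e_1^\top$ and $u^\top Q_{]n[} = v^\top$.
For all $j \in [m]$, let $g_j \in \mathbb{Z}^{n+5}$ be the $j$th row of $G$. For each entry (row) $\tilde{e}_j$ of $e - Gv$, we can write $\tilde{e}_j = e_j - g_j^\top v = \langle [e_j, g_j], [1, -v] \rangle$ and apply Lemma 6 with the vector $v' = [1, -v]$ to argue that $\tilde{e}_j$ is $O(\varepsilon)$-close to $D_{\mathbb{Z},\sigma'}$ with
\[ \sigma' = \sqrt{(2\sigma)^2 + \sum_{i \in [n+5]} (\sigma v_i)^2} = \sigma\sqrt{4 + \|v\|_2^2} = 2\sigma\sqrt{k+1}, \]
as long as $\sigma \ge \sqrt{2}\,\|v\|_\infty\, \eta_{\varepsilon/(2(n+6))}(\mathbb{Z})$. Now, using the triangle inequality over all $m$ rows to get overall statistical distance $\mathrm{negl}(\lambda)$, we can set $\varepsilon = \mathrm{negl}(\lambda)/m$, for which
\[ \sigma \ge \sqrt{2} \cdot 2 \cdot \eta_{\mathrm{negl}(\lambda)/(mn)}(\mathbb{Z}) \]
is sufficient. By Lemma 2, this holds as long as $\sigma \ge 4\sqrt{\ln m + \ln n + \omega(\log \lambda)}$, which we are given.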
Next, we show $\varphi$ maps the standard LWE (with matrices as secrets) to standard LWE in slightly different dimensions, very much following the proof of Claim 3.3 of [Mic18].
Lemma 20. Assume the same hypothesis as Theorem 7. Let $D_1$ denote the distribution of $SA + E \pmod q$, where $A \sim U(\mathbb{Z}_q^{\ell\times(n-1)})$, $S \sim U(\mathbb{Z}_q^{m\times\ell})$, $E \sim D_{\mathbb{Z},\sigma}^{m\times(n-1)}$. Let $D_2$ denote the distribution of $\hat{S}\hat{A} + \hat{E} \pmod q$, where $\hat{A} \sim U(\mathbb{Z}_q^{(\ell+1)\times(n+1)})$, $\hat{S} \sim U(\mathbb{Z}_q^{m\times(\ell+1)})$, $\hat{E} \sim D_{\mathbb{Z},2\sigma}^{m\times(n+1)}$. Then, $\varphi(D_1)$ is $\mathrm{negl}(\lambda)$-close to $D_2$.
The proof goes exactly as in Claim 3.3 of [Mic18]. The only differences are in our matrices Q,Z,
and our distribution of secrets z ∼ Sn,k. The full differences are as follows.
• While our $Z$ is different, since $Z = Z^\top$ is a permutation matrix with signs, it still holds that $Z \cdot D_{\mathbb{Z},2\sigma}^n = D_{\mathbb{Z},2\sigma}^n$ due to symmetry.
• We have $Q_{]1[}(D_{\mathbb{Z},\sigma}^{2n+4})$ is $\mathrm{negl}(\lambda)/m$-close to $D_{\mathbb{Z},2\sigma}^n$ by Lemma 17.
• The probability that $w$ (in their notation) is not primitive is at most $\log(q)/2^\ell = \mathrm{negl}(\lambda)$, as desired.
• When applying the leftover hash lemma (Lemma 1), the min-entropy of $z \sim S_{n,k}$ is now $k\log_2(n/k)$. Thus, we require $k\log_2(n/k) \ge (\ell+1)\log_2(q) + \omega(\log \lambda)$ instead of $n \ge (\ell+1)\log_2(q) + \omega(\log m)$.
For completeness, we provide a self-contained proof, exactly following Claim 3.3 of [Mic18].
Proof of Lemma 20. Let $B \sim D_1$. Let $Y = [s, sa^\top + B]$. By linearity, we can decompose $Y$ as $Y = Y_s + Y_e$, where $Y_s = [s, sa^\top + SA]$ and $Y_e = [0, E]$. Similarly, we can write
\[ \varphi(B) = \left[\left[s, s \cdot a^\top + B, G\right] Q^\top Z, \; s + e\right] = [X_s, s] + [X_e, e], \]
where $X_s = Y_s Q_{[n]}^\top Z$ and $X_e = [Y_e, G] Q^\top Z = [E, G] Q_{]1[}^\top Z$. Our goal is to now show that $[X_s, s]$ is statistically close to $\hat{S}\hat{A}$, and that $[X_e, e]$ is statistically close to $\hat{E}$, where $\hat{S}\hat{A} + \hat{E}$ is a sample from $D_2$. If this holds, then $\varphi(B)$ is statistically close to $\hat{S}\hat{A} + \hat{E}$, which completes the proof.
First, let us look at $[X_e, e]$. Note that $e$ is a discrete Gaussian vector of width $2\sigma$ independent of everything else, so the last column has the desired distribution. Furthermore, note that $E$ and $G$ have entries that are discrete Gaussian of width $\sigma$, so $[E, G] \sim D_{\mathbb{Z},\sigma}^{m\times(2n+4)}$. By Lemma 17, setting $t = m$, we can use the triangle inequality over all $m$ rows to get that $[E, G] Q_{]1[}^\top$ is $\mathrm{negl}(\lambda)$ close to $D_{\mathbb{Z},2\sigma}^{m\times n}$ as long as $\sigma \ge \sqrt{6}\sqrt{\omega(\log \lambda) + \ln n + \ln m}$. Since $Z$ is a signed permutation, by symmetry, we then know that $X_e = [E, G] Q_{]1[}^\top Z$ is $\mathrm{negl}(\lambda)$ close to $D_{\mathbb{Z},2\sigma}^{m\times n}$, and thus $[X_e, e]$ is $\mathrm{negl}(\lambda)$ close to $D_{\mathbb{Z},2\sigma}^{m\times(n+1)}$, which is the same distribution as $\hat{E}$. Note that this depends only on $e, G, E$.
To finish, we look at $[X_s, s]$. We now define
\[ \hat{S} = [s, S]\, W^{-1} \in \mathbb{Z}_q^{m\times(\ell+1)}, \]
where $W$ is a uniformly random invertible matrix over $\mathbb{Z}_q^{(\ell+1)\times(\ell+1)}$. Since $W$ is invertible, using the randomness of $S$ and $s$, $\hat{S}$ is uniformly random independently of $W$. Next, we define
\[ \hat{A} = W H Q_{[n]}^\top Z^\top [I_{n\times n}, z] \in \mathbb{Z}_q^{(\ell+1)\times(n+1)}, \quad \text{where} \quad H = \begin{pmatrix} 1 & a^\top \\ 0 & A \end{pmatrix} \in \mathbb{Z}_q^{(\ell+1)\times n}. \]
Note that we have the identity $Q_{[n]}^\top Z^\top z = Q_{[n]}^\top Z z = Q_{[n]}^\top u = e_1$ by Lemmas 18 and 17, as well as the identity $\hat{S} W H = [s, S] H = Y_s$. Therefore,
\[ \hat{S}\hat{A} = \hat{S} W H Q_{[n]}^\top Z^\top [I_{n\times n}, z] = Y_s Q_{[n]}^\top Z^\top [I_{n\times n}, z] = [Y_s Q_{[n]}^\top Z, \; Y_s e_1] = [X_s, s], \]
as desired.
Now, we have to show that $\hat{S}$ and $\hat{A}$ have the correct distributions. We have already shown that $\hat{S}$ has the correct distribution (only depending on $S$ and $s$), so it suffices to show that $\hat{A}$ has the correct distribution given $S$ and $s$, using the randomness of $A, a, W$ and $z$. First, let us look at the matrix $WH$. Let $w$ be the first column of $W$. The first column of $WH$ will be exactly $w$. Since $W$ is a uniformly random invertible matrix, $w$ is distributed uniformly among all primitive vectors in $\mathbb{Z}_q^{\ell+1}$, i.e. so that $\gcd(w, q) = 1$. By Lemma 7, as long as $\log(q)/2^\ell = \mathrm{negl}(\lambda)$, which we have assumed, then the distribution of $w$ is $\mathrm{negl}(\lambda)$-close to uniform over $\mathbb{Z}_q^{\ell+1}$. The remaining columns of $WH$ will be $W \begin{pmatrix} a^\top \\ A \end{pmatrix}$, which by using the uniform randomness of $a$ and $A$, and the invertibility of $W$, will be uniformly random and independent of $w$. Therefore, $WH \in \mathbb{Z}_q^{(\ell+1)\times n}$ is $\mathrm{negl}(\lambda)$-close to uniformly random. Now, since $Q_{[n]}^\top$ and $Z^\top$ are invertible, we have $W H Q_{[n]}^\top Z^\top$ is $\mathrm{negl}(\lambda)$-close to uniform, independently of $z$. Let $A' = W H Q_{[n]}^\top Z^\top$, which we have just shown is $\mathrm{negl}(\lambda)$-close to uniform, independently of $z$. Note that
\[ \hat{A} = A'[I_{n\times n}, z] = [A', A'z]. \]
Applying the leftover hash lemma (Lemma 1) and Lemma 16, since $k\log_2(n/k) \ge (\ell+1)\log_2(q) + \omega(\log \lambda)$, we know $\hat{A}$ is $\mathrm{negl}(\lambda)$-close to uniform, independently of $\hat{S}$ and $\hat{E}$. This completes the proof that $\varphi(D_1)$ and $D_2$ are $\mathrm{negl}(\lambda)$-close.
With the above claims, we are ready to prove the main theorem of this section.
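Proof of Theorem 7. We will show the contrapositive. Suppose we have a $T$-time distinguisher between $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ and $U(\mathbb{Z}_q^{m\times n} \times \mathbb{Z}_q^m)$ with advantage $2\varepsilon$.
We have two cases. Suppose that this distinguisher distinguishes between $U(\mathbb{Z}_q^{m\times n} \times \mathbb{Z}_q^m) = U(\mathbb{Z}_q^{m\times(n+1)})$ and $D_2$ as given in Lemma 20, with advantage $\varepsilon$. Then, we have a $T$-time distinguisher between $\mathrm{LWE}(n+1, \mathbb{Z}_q^{\ell+1}, \mathbb{Z}_q^{m\times(\ell+1)}, D_{\mathbb{Z}^m,2\sigma})$ and $U(\mathbb{Z}_q^{(\ell+1)\times(n+1)} \times \mathbb{Z}_q^{m\times(n+1)})$ where we simply discard the samples, i.e. the first part in $\mathbb{Z}_q^{(\ell+1)\times(n+1)}$ (the matrix $\hat{A}$).
Now, for the second case, suppose that this distinguisher does not distinguish between $U(\mathbb{Z}_q^{m\times n} \times \mathbb{Z}_q^m) = U(\mathbb{Z}_q^{m\times(n+1)})$ and $D_2$ with advantage $\varepsilon$. Then, we have a $T$-time distinguisher between $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ and $D_2$ with advantage $\ge 2\varepsilon - \varepsilon = \varepsilon$ by the triangle inequality. Now, we can use this distinguisher to distinguish $\mathrm{LWE}(n-1, \mathbb{Z}_q^\ell, \mathbb{Z}_q^{m\times\ell}, D_{\mathbb{Z}^m,\sigma})$ and $U(\mathbb{Z}_q^{\ell\times(n-1)} \times \mathbb{Z}_q^{m\times(n-1)})$ by once again discarding the samples, i.e. the first part in $\mathbb{Z}_q^{\ell\times(n-1)}$ (the matrix $A$), and then by applying $\varphi$ to the remaining part in $\mathbb{Z}_q^{m\times(n-1)}$. Now, using Lemmas 19 and 20, the resulting distributions coming out of $\varphi$ when given $U(\mathbb{Z}_q^{m\times(n-1)})$ and $D_1$ will be $\mathrm{negl}(\lambda)$-close to $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ and $D_2$, respectively. Thus, our assumed distinguisher will be correct, where the only runtime increase is in the randomized transformation $\varphi$, taking time $\mathrm{poly}(n, m, q, \lambda)$.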
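Now, we state a simpler version of Theorem 7 that is easier to use.
Corollary 1. Suppose $\log(q)/2^\ell = \mathrm{negl}(\lambda)$, $\sigma \ge 4\sqrt{\omega(\log \lambda) + \ln n + \ln m}$, and $k\log_2(n/k) \ge (\ell+1)\log_2(q) + \omega(\log \lambda)$. Then, if $\mathrm{LWE}(n, \mathbb{Z}_q^\ell, \mathbb{Z}_q^\ell, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{\ell\times n} \times \mathbb{Z}_q^n)$ have no $T + \mathrm{poly}(n, m, q, \lambda)$ time distinguisher with advantage $\varepsilon$, then $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$ have no $T$-time distinguisher with advantage $2m\varepsilon$ (up to additive $\mathrm{negl}(\lambda)$ factors), where $\sigma' = 2\sigma\sqrt{k+1}$.
Proof. If $\mathrm{LWE}(n, \mathbb{Z}_q^\ell, \mathbb{Z}_q^\ell, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{\ell\times n} \times \mathbb{Z}_q^n)$ cannot be distinguished with advantage $\varepsilon$, then by a hybrid argument, the version where the secrets are matrices (with $m$ dimensions instead of 1) cannot be distinguished with advantage $m\varepsilon$ (up to additive $\mathrm{negl}(\lambda)$ factors). Then, applying Theorem 7, $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma'})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$ cannot be distinguished with advantage $2m\varepsilon$, where we reparameterize to absorb small additive factors, with the observation that LWE is harder when the dimension and noise grow, and easier when the number of samples grows.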
5 Hardness of Density Estimation for Mixtures of Gaussians
Now, using tools from the previous sections, we reduce LWE to density estimation for mixtures of
Gaussians, using similar ideas as [BRST21]. Our machinery from the previous sections now allows
us to give a fine-grained version of hardness of learning mixtures of Gaussians.
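Theorem 8 (Reducing k-sparse LWE to k-sparse hCLWE). Let $n, m, q, k, g \in \mathbb{N}$, $\sigma \in \mathbb{R}_{>0}$. Suppose $q = \omega\left(\sqrt{\sigma^2 + k(\ln m + \ln n)}\right)$ and $\sigma \ge 3\sqrt{k}\sqrt{\ln n + \ln m + \omega(1)}$. Suppose that $\mathrm{LWE}(m, \mathbb{Z}_q^n, S_{n,k}, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{n\times m} \times \mathbb{Z}_q^m)$ have no $T + \mathrm{poly}(n, m, q, 1/\beta)$-time distinguisher with advantage $\Omega(1)$. Then, there is no $T$-time algorithm distinguishing $\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$ and $D_1^{n\times m}$ with advantage $\Omega(1)$ for
\[ \gamma = 2\sqrt{k(\ln m + \ln n)}, \]
\[ \beta = 20 \cdot \frac{\sqrt{\sigma^2 + k(\ln m + \ln n)}}{q}, \text{ and} \]
\[ g = 8\sqrt{k}\sqrt{(\ln m)^2 + \ln(m)\ln(n)}. \]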
Proof. Throughout, we set the security parameter to constant, say $\lambda = 2$. First, we apply Lemma
10 to make the errors continuous, for width $\sigma_2 = \sqrt{\sigma^2 + 4\ln m + \omega(1)}$. Note that
$\sigma \ge \sqrt{9k(\ln n + \ln m + \omega(1))} \ge \sqrt{4\ln m + \omega(1)}$, as needed for Lemma 10.
Then, we apply Lemma 11 to make the samples continuous uniform as opposed to discrete
uniform, where the width becomes
\[ \sigma_3 = \sqrt{\sigma_2^2 + 9k(\ln n + \ln m + \omega(1))} \le 10\sqrt{\sigma^2 + k(\ln m + \ln n)}, \]
as long as $\sigma_2 \ge 3\sqrt{k}\,\sqrt{\ln n + \ln m + \omega(1)}$, which is true because $\sigma_2 \ge \sigma$. Then, we apply Lemma 13
to make the samples look Gaussian instead of uniform, for $\tau \ge \sqrt{\ln n + \ln m + \omega(1)}$, so we can set
$\tau = 2\sqrt{\ln m + \ln n}$. Then, we apply Lemma 15 to rescale, to get
\[ \gamma = \sqrt{k} \cdot \tau = 2\sqrt{k(\ln m + \ln n)}, \]
and
\[ \beta = \frac{\sigma_3}{q} \le 10 \cdot \frac{\sqrt{\sigma^2 + k(\ln m + \ln n)}}{q} = o(1). \]
Let $\beta' = 2\beta$. We can then reduce the problem of distinguishing $\mathrm{CLWE}(m, D_1^n, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$ and
$D_1^{n\times m} \times U(\mathbb{T}^m)$ to the problem of distinguishing $\mathrm{hCLWE}(m, D_1^n, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta')$ and $D_1^{n\times m}$, in additive
time $\mathrm{poly}(n, 1/\beta)$ by Lemma 9. Lastly, by Theorem 4, since $\beta' = o(1) < 1/32$, we know there is
no $T$ time algorithm distinguishing $\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta')$ and $D_1^{n\times m}$ with
$g = 4\gamma\sqrt{\ln m/\pi} < 8\sqrt{k}\,\sqrt{(\ln m)^2 + \ln(m)\ln(n)}$ with constant advantage.
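To make the parameter chain in the proof of Theorem 8 concrete, the following Python sketch traces the widths $\sigma \to \sigma_2 \to \sigma_3$ and computes $\gamma$, $\beta$, $\beta'$ and $g$ for given inputs. It is only illustrative: the function name `theorem8_parameters` and the argument `slack`, which stands in for the unspecified $\omega(1)$ terms, are assumptions and do not appear in the analysis.

```python
import math


def theorem8_parameters(n, m, q, k, sigma, slack=1.0):
    """Trace the parameter chain from the proof of Theorem 8.

    `slack` is a hypothetical concrete stand-in for the omega(1) terms.
    """
    log_mn = math.log(m) + math.log(n)
    # Hypothesis: sigma >= 3 sqrt(k) sqrt(ln n + ln m + omega(1)).
    if sigma < 3 * math.sqrt(k * (log_mn + slack)):
        raise ValueError("sigma is too small for the hypothesis of Theorem 8")

    sigma2 = math.sqrt(sigma**2 + 4 * math.log(m) + slack)    # Lemma 10
    sigma3 = math.sqrt(sigma2**2 + 9 * k * (log_mn + slack))  # Lemma 11
    tau = 2 * math.sqrt(log_mn)                               # Lemma 13
    gamma = math.sqrt(k) * tau                                # Lemma 15
    beta = sigma3 / q                                         # Lemma 15
    beta_prime = 2 * beta                                     # Lemma 9
    g = 4 * gamma * math.sqrt(math.log(m) / math.pi)          # Theorem 4

    return {
        "gamma": gamma,
        "beta_prime": beta_prime,
        "g": g,
        # Closed-form bounds stated in Theorem 8.
        "beta_bound": 20 * math.sqrt(sigma**2 + k * log_mn) / q,
        "g_bound": 8 * math.sqrt(k)
        * math.sqrt(math.log(m) ** 2 + math.log(m) * math.log(n)),
    }


params = theorem8_parameters(n=1024, m=64, q=2**20, k=8, sigma=40.0)
print(params)
```

The returned `beta_bound` and `g_bound` are the closed-form expressions from the theorem statement, so comparing them with `beta_prime` and `g` shows how much room the stated bounds leave for a particular choice of inputs.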
Remark 4. Note that we do not apply the worst-case to average-case reduction for the secrets to
reduce to CLWE. Instead, we keep our secret distribution discrete over the sphere.
Now, we combine this with our reduction from regular LWE to k-sparse LWE to get the following:
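Theorem 9 (Reducing LWE to k-sparse hCLWE). Suppose for some constant $\epsilon < 1$ we have
$4\sqrt{\omega(\log \ell) + \ln n + \ln m} \le \sigma \le \ell$, $3\sqrt{k}\,\sqrt{\ln n + \ln m + \omega(1)} \le \sigma$, $\ell^2 \le q \le \mathrm{poly}(\ell)$,
$k \le O(n^{1-\epsilon})$, $\ell \le n$, and $k\log_2(n/k) = 2\ell\log_2(q)$. Suppose that $\mathrm{LWE}(n, \mathbb{Z}_q^{\ell}, \mathbb{Z}_q^{\ell}, D_{\mathbb{Z},\sigma})$ has no $T(\ell) + \mathrm{poly}(n)$
time distinguisher with advantage at least $1/\mathrm{poly}(\ell)$. Then, there is no algorithm distinguishing
$\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$ and $D_1^{n\times m}$ with $\Omega(1)$ advantage for $g = O\big(\sqrt{k \cdot \log \ell \cdot \log n}\big)$ in time $T(\ell)$
in $\mathbb{R}^n$, where $m = \mathrm{poly}(\ell) = \mathrm{poly}(k\log n/\log q)$, $\gamma = 2\sqrt{k(\ln m + \ln n)}$, and some $\beta = o(q^{-1/5})$.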
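Proof. First, we know $\mathrm{LWE}(n, \mathbb{Z}_q^{\ell}, \mathbb{Z}_q^{\ell}, D_{\mathbb{Z},\sigma})$ has no $T(\ell)$-time distinguisher with advantage at least
$1/(100m)$, where we set $m = \mathrm{poly}(\ell) \le n$ and $\lambda = \ell$. Applying Corollary 1, we know this implies that
$\mathrm{LWE}(m, \mathbb{Z}_q^{n}, S_{n,k}, D_{\mathbb{Z},3\sigma\sqrt{k}})$ has no $T + \mathrm{poly}(n, q)$-time distinguisher with advantage $\ge 1/50$, as long
as $k\log_2(n/k) \ge 2\ell\log_2(q)$, $\log(q)/2^{\ell} = \mathrm{negl}(\ell)$, and $\sigma \ge 4\sqrt{\ln n + \ln m + \omega(\log \ell)}$. Note that all of
these conditions are met by the hypotheses on the parameters. Now, we can apply Theorem 8 to
argue that as long as $q = \omega\big(\sqrt{k\sigma^2 + k(\ln m + \ln n)}\big)$ and $\sigma \ge 3\sqrt{k}\,\sqrt{\ln n + \ln m + \omega(1)}$, we get no $T$-time
distinguisher between $\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$ and $D_1^{n\times m}$ for $g = 8\sqrt{k}\,\sqrt{(\ln m)^2 + \ln(m)\ln(n)}$,
$\gamma = 2\sqrt{k(\ln m + \ln n)}$, and
\[ \beta = O\left(\frac{\sqrt{k\sigma^2 + k(\ln m + \ln n)}}{q}\right). \]
By our assumptions on parameters, we get
\[ \sqrt{k\sigma^2 + k(\ln m + \ln n)} \le O\big(\sigma\sqrt{k\ln n}\big) \le O\left(\sigma\sqrt{\frac{k}{\epsilon}\ln(n/k)}\right) \le O\big(\ell \cdot \sqrt{\ell\log(q)}\big) = O\big(\ell^{3/2}\sqrt{\log \ell}\big) = o(q^{4/5}), \]
so the assumption $q = \omega\big(\sqrt{k\sigma^2 + k(\ln m + \ln n)}\big)$ is satisfied, and in fact
\[ \beta = o(q^{4/5}/q) = o(q^{-1/5}). \]
Note that the number of Gaussians $g$ can be bounded as
\[ g = 8\sqrt{k}\,\sqrt{O((\log \ell)^2) + O(\log \ell \cdot \log n)} = O\big(\sqrt{k}\,\sqrt{\log \ell \cdot \log n}\big), \]
as desired. Lastly, $m = \mathrm{poly}(\ell) = \mathrm{poly}(k\log n/\log q)$, as desired.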
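Corollary 2. Let $\epsilon, \delta \in (0, 1)$ be arbitrary constants with $\delta < \epsilon$. Assuming
\[ \mathrm{LWE}\big(2^{\ell^{\delta}}, \mathbb{Z}_q^{\ell}, \mathbb{Z}_q^{\ell}, D_{\mathbb{Z},\sigma}\big) \]
has no $T(\ell) = 2^{O(\ell^{\epsilon})}$ time distinguisher from $U\big(\mathbb{Z}_q^{\ell\times 2^{\ell^{\delta}}} \times \mathbb{Z}_q^{2^{\ell^{\delta}}}\big)$ with advantage at least $1/\mathrm{poly}(\ell)$,
where $\sigma = \ell^{2/3}$ and $q = \ell^2$, then there is no algorithm distinguishing $\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$
and $D_1^{n\times m}$ in time $2^{\log_2(n)^{\epsilon/\delta}}$ (which is quasipolynomial in $n$), where $m = \mathrm{poly}(\log n)$,
$g = O\big((\log n)^{1/(2\delta)} \cdot \log\log n\big)$, $\gamma = O\big((\log n)^{1/(2\delta)}\sqrt{\log\log n}\big)$ and $\beta = o(q^{-1/5})$.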
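Proof. We set $n = 2^{\ell^{\delta}}$ and $k = 4\ell^{1-\delta}\log_2(\ell)$ in Theorem 9. (Since $k = n^{o(1)}$, we replace $k\log_2(n/k)$
with $k\log_2(n)$ at the cost of a $(1 - o(1))$ factor.)
Let us first confirm that all the hypotheses of Theorem 9 hold. First, observe that
\[ 4\sqrt{\omega(\log \ell) + \ln n + \ln m} = O\big(\sqrt{\omega(\log \ell) + \ell^{\delta} + O(\log \ell)}\big) = O(\ell^{\delta/2}) = o(\sqrt{\ell}) \le \sigma. \]
Next, we have
\[ 3\sqrt{k}\,\sqrt{\ln n + \ln m + \omega(1)} \le O\big(\sqrt{k\ell^{\delta}}\big) = O\big(\sqrt{\ell^{1-\delta}\log \ell \cdot \ell^{\delta}}\big) = O\big(\sqrt{\ell\log \ell}\big) = o(\sigma). \]
For the last non-trivial condition, we have
\[ k\log_2(n) = 4\ell^{1-\delta}\log_2(\ell)\,\ell^{\delta} = 4\ell\log_2(\ell) = 2\ell\log_2(q). \]
If we have a
\[ 2^{\ell^{\epsilon}} = 2^{\log_2(n)^{\epsilon/\delta}} \]
time distinguisher for the mixture of Gaussians, we get a $2^{\ell^{\epsilon}} + \mathrm{poly}(n) = 2^{O(\ell^{\epsilon})} = T(\ell)$ time
algorithm for LWE. The number of samples here is $m = \mathrm{poly}(\ell) = \mathrm{poly}(\log n)$. The number of
Gaussians becomes
\[ g = O\big(\sqrt{k}\cdot\sqrt{\log \ell \cdot \log n}\big) = O\big((\log n)^{1/(2\delta)} \cdot \log\log n\big), \]
and furthermore
\[ \gamma = O\big(\sqrt{k(\ln m + \ln n)}\big) = O\big((\log n)^{1/(2\delta)}\sqrt{\log\log n}\big), \]
and $\beta$ is unchanged at $o(q^{-1/5})$.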
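The arithmetic behind the parameter choice in the proof of Corollary 2 can be checked numerically. The sketch below is an illustration only; the function name `corollary2_parameters` is hypothetical, and it works with $\log_2 n = \ell^{\delta}$ instead of $n$ itself because $n = 2^{\ell^{\delta}}$ is far too large to represent for interesting values of $\ell$.

```python
import math


def corollary2_parameters(ell, delta):
    """Instantiate n = 2^(ell^delta), k = 4 ell^(1-delta) log2(ell), q = ell^2."""
    log2_n = ell**delta
    k = 4 * ell ** (1 - delta) * math.log2(ell)
    q = ell**2
    return {
        "log2_n": log2_n,
        "k": k,
        "q": q,
        # Sparsity condition of Theorem 9, with k log2(n/k) replaced by k log2(n).
        "k_log2_n": k * log2_n,
        "two_ell_log2_q": 2 * ell * math.log2(q),
        # Scale of the number of Gaussians, sqrt(k * log(ell) * log(n)), in base 2.
        "g_scale": math.sqrt(k * math.log2(ell) * log2_n),
    }


params = corollary2_parameters(ell=4096, delta=0.5)
print(params)
```

With base-2 logarithms, `g_scale` equals $2\sqrt{\ell}\log_2 \ell$, which matches the $(\log n)^{1/(2\delta)} \cdot \log\log n$ growth up to constants since $\ell = (\log_2 n)^{1/\delta}$.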
We give another setting of parameters where the number of Gaussians in the mixture is larger,
but the assumption on LWE is weaker because the number of samples is reduced.
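Corollary 3. Let $\alpha > 1$ be an arbitrary constant. Assuming $\mathrm{LWE}(n, \mathbb{Z}_q^{\ell}, \mathbb{Z}_q^{\ell}, D_{\mathbb{Z},\sigma})$ and $U(\mathbb{Z}_q^{\ell\times n} \times \mathbb{Z}_q^{n})$
has no $T(\ell) + \mathrm{poly}(n)$ time distinguisher with advantage $1/\mathrm{poly}(m)$ where $n = \ell^{\alpha}$, $\sigma = \sqrt{k}$, and
$q = \ell^2$, then there is no algorithm distinguishing $\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$ and $D_1^{n\times m}$ with constant
advantage in time $T(\ell) = T(n^{1/\alpha})$, where $g = O\big(n^{1/(2\alpha)} \cdot \log n\big)$, $\gamma = O\big(n^{1/(2\alpha)} \cdot \sqrt{\log n}\big)$, and some
$\beta = o(q^{-1/5})$.
In particular, if $T(\ell) = \mathrm{poly}(\ell)$, then assuming the LWE problem is hard to distinguish for $\mathrm{poly}(\ell)$-time
algorithms, so is the problem on $\mathrm{hCLWE}^{(g)}$.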
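Proof. We set $k = 4\ell/(\alpha - 1) = 4n^{1/\alpha}/(\alpha - 1)$ and apply Theorem 9. Observe that
\[ k\log_2(n/k) = \frac{4\ell}{\alpha - 1} \cdot \log_2\left(\frac{\ell^{\alpha}}{4\ell/(\alpha - 1)}\right) = \frac{4\ell}{\alpha - 1} \cdot \big((\alpha - 1)\log_2(\ell) - O(1)\big) \]
\[ = 4\ell\log_2(\ell) - O(\ell) \]
\[ = 2\ell\log_2(q) - O(\ell), \]
as necessary (the $O(\ell)$ term doesn't change the proof of Theorem 8). Let us see that the other
hypotheses of Theorem 9 hold. We have
\[ 4\sqrt{\omega(\log \ell) + \ln n + \ln m} = O\big(\sqrt{\omega(\log \ell)}\big) = o(\sigma), \]
and also
\[ 3\sqrt{k}\,\sqrt{\ln n + \ln m + \omega(1)} = O\big(\sqrt{\ell} \cdot \sqrt{\ln \ell}\big) = o(\sigma), \]
as desired. Also note that $k = O(n^{1/\alpha})$, and since $1/\alpha < 1$, there exists some $\epsilon' > 0$ such that
$k \le O(n^{1-\epsilon'})$.
If we have a time $T(n^{1/\alpha}) = T(\ell)$ distinguisher for hCLWE, we get a $T(\ell) + \mathrm{poly}(n)$ time
distinguisher for LWE. The number of Gaussians becomes
\[ g = O\big(\sqrt{k}\cdot\sqrt{\log \ell \cdot \log n}\big) = O\big(n^{1/(2\alpha)} \cdot \log n\big), \]
and furthermore
\[ \gamma = O\big(\sqrt{k(\ln m + \ln n)}\big) = O\big(n^{1/(2\alpha)}\sqrt{\log n}\big), \]
and $\beta$ is unchanged at $o(q^{-1/5})$.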
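As a worked check of the $O(\ell)$ term in the proof of Corollary 3 (an illustration, not part of the original argument), substituting $k = 4\ell/(\alpha - 1)$, $n = \ell^{\alpha}$ and $q = \ell^2$ makes the slack explicit:

\[ k\log_2(n/k) = \frac{4\ell}{\alpha - 1}\left((\alpha - 1)\log_2 \ell - \log_2\frac{4}{\alpha - 1}\right) = 2\ell\log_2(q) - \frac{4\ell}{\alpha - 1}\log_2\frac{4}{\alpha - 1}. \]

For example, with $\alpha = 2$ we have $k = 4\ell$ and $n/k = \ell/4$, so $k\log_2(n/k) = 4\ell\log_2 \ell - 8\ell$. The correction is linear in $\ell$ with a constant that depends only on $\alpha$.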
6 Low-Sample Algorithm for hCLWE(g)
Theorem 10. Let $\gamma = 2\sqrt{k(\ln n + \ln m)}$ and $\beta = o(q^{-1/5})$. Further, let $t := |S_{n,k}| = \binom{n}{k} \cdot 2^k$
denote the number of $k$-sparse $\{-1, 0, +1\}$-secrets and suppose $\log\log(\log t/\log q) = o(\log q)$. Then,
for some $m = O(k\log n/\log q)$, there is an $O\big(m \cdot \binom{n}{k} \cdot 2^k\big)$-time algorithm that distinguishes between
$\mathrm{hCLWE}^{(g)}(m, D_1^n, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$ and $D_1^{n\times m}$ with advantage at least $1/2$.
Remark 5. This theorem can be generalized for other settings of β, γ, but we state it this way
because it suffices for our purposes. It also works for the setting of non-truncated hCLWE.
Remark 6. While the runtime of this algorithm is similar to the algorithm solving hCLWE given
in Theorem 7.5 of [BRST21] as applied in a black-box way, the sample complexity needed here is
$O(k\log n/\log q)$, as opposed to roughly $2^{O(\gamma^2)} = n^{\Omega(k)}$.
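Algorithm 1: Low-sample algorithm for $\mathrm{hCLWE}^{(g)}$
Input: Sampling oracle to distribution $D$.
Output: 1 to indicate $D = \mathrm{hCLWE}^{(g)}$ and 0 otherwise.
Draw $m$ samples $\mathbf{a}_1, \ldots, \mathbf{a}_m \sim D$.
for $\mathbf{s} \in \frac{1}{\sqrt{k}} S_{n,k}$ do
    Compute $f_{\mathbf{s}}(\mathbf{a}_i) = \langle \mathbf{a}_i, \mathbf{s} \rangle \bmod \gamma/\gamma'^2$ for all $i \in [m]$.
    if $f_{\mathbf{s}}(\mathbf{a}_i) \in [-a\beta/\gamma', a\beta/\gamma']$ for all $i \in [m]$ then
        return 1.
return 0.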
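The following Python sketch shows one way Algorithm 1 could be run on toy parameters. It uses $\gamma' = \sqrt{\gamma^2 + \beta^2}$ and $a = \sqrt{\log(1/\delta)}$ as defined in the proof of Theorem 10. All function names are hypothetical, and `sample_hclwe` is an assumed sampler for the non-truncated distribution: it places Gaussian components of width $\beta/\gamma'$ at multiples of $\gamma/\gamma'^2$ along the secret direction, which is the structure the proof relies on. It is a sketch under these assumptions, not a reference implementation.

```python
import itertools
import math
import random

# D_1 has density proportional to exp(-pi x^2), i.e. standard deviation 1/sqrt(2 pi).
D1_STD = 1 / math.sqrt(2 * math.pi)


def sparse_secrets(n, k):
    """Enumerate (1/sqrt(k)) * S_{n,k}: normalized k-sparse {-1, 0, +1} vectors."""
    scale = 1 / math.sqrt(k)
    for support in itertools.combinations(range(n), k):
        for signs in itertools.product((-1.0, 1.0), repeat=k):
            s = [0.0] * n
            for index, sign in zip(support, signs):
                s[index] = sign * scale
            yield s


def inner(u, v):
    return sum(x * y for x, y in zip(u, v))


def centered_mod(x, period):
    """Representative of x mod period in [-period/2, period/2)."""
    return (x + period / 2) % period - period / 2


def low_sample_distinguisher(samples, k, gamma, beta, delta):
    """Return 1 if some candidate secret places every sample in the window, else 0."""
    n = len(samples[0])
    gamma_prime = math.sqrt(gamma**2 + beta**2)
    period = gamma / gamma_prime**2
    window = math.sqrt(math.log(1 / delta)) * beta / gamma_prime
    for s in sparse_secrets(n, k):
        if all(abs(centered_mod(inner(a_i, s), period)) <= window for a_i in samples):
            return 1
    return 0


def sample_null(n, m, rng):
    """m samples from D_1^n."""
    return [[rng.gauss(0.0, D1_STD) for _ in range(n)] for _ in range(m)]


def sample_hclwe(secret, m, gamma, beta, rng, max_shift=50):
    """Assumed sampler for non-truncated hCLWE with a unit-norm secret direction."""
    gamma_prime = math.sqrt(gamma**2 + beta**2)
    shifts = list(range(-max_shift, max_shift + 1))
    weights = [math.exp(-math.pi * j * j / gamma_prime**2) for j in shifts]
    samples = []
    for _ in range(m):
        j = rng.choices(shifts, weights=weights)[0]
        # Component along the secret: center gamma*j/gamma'^2, width beta/gamma'.
        z = gamma * j / gamma_prime**2 + rng.gauss(0.0, beta / gamma_prime * D1_STD)
        x = [rng.gauss(0.0, D1_STD) for _ in range(len(secret))]
        proj = inner(x, secret)
        samples.append([xi + (z - proj) * si for xi, si in zip(x, secret)])
    return samples
```

The loop over `sparse_secrets` visits all $t = \binom{n}{k} 2^k$ candidates, so the running time is $O(mt)$ inner products, as in the analysis. For instance, with $n = 6$ and $k = 2$ there are 60 candidate directions.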
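Proof. For the sake of this proof, we take the representatives of $\mathbb{T}_q$ to be in the interval $[-q/2, q/2)$.
Further, let $\gamma' = \sqrt{\gamma^2 + \beta^2}$ and $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{s} \in \frac{1}{\sqrt{k}} S_{n,k}$. We define $f_{\mathbf{s}} : \mathbb{R}^n \to \mathbb{T}_{\gamma/\gamma'^2}$ by
\[ f_{\mathbf{s}}(\mathbf{a}) := \langle \mathbf{a}, \mathbf{s} \rangle \bmod \gamma/\gamma'^2. \]
We use the main idea in the proof of Claim 5.3 in [BRST21] to give an algorithm that distinguishes
the two distributions. Given $m$ samples $\mathbf{a}_1, \ldots, \mathbf{a}_m$ from an unknown distribution $D$, we compute
$\langle \mathbf{a}_i, \mathbf{s} \rangle \bmod \gamma/\gamma'^2$ for all possible secret directions $\mathbf{s} \in \frac{1}{\sqrt{k}} S_{n,k}$ and for all samples $i \in [m]$. This
takes time $O(mt)$. If there is some $\mathbf{s}$ such that $f_{\mathbf{s}}(\mathbf{a}_i)$ is small for all samples $i \in [m]$, then we guess
$D = \mathrm{hCLWE}^{(g)}$, and otherwise we guess $D = D_1^{n\times m}$.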
Now, suppose that the input distribution is $D = \mathrm{hCLWE}^{(g)}(m, D_1^n, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$. Let $\mathbf{s}^*$ be the
randomly sampled but fixed secret direction. Then for all the $m$ samples $\mathbf{a}_i$, we have that $f_{\mathbf{s}^*}(\mathbf{a}_i)$
is distributed as $D_{\beta/\gamma'} \bmod \gamma/\gamma'^2$. This can be seen from Equation 2. As an aside, note that
by Claim 5.3 of [BRST21] this holds even when the input distribution is not truncated, that is,
$D = \mathrm{hCLWE}(m, D_1^n, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$.
For a parameter $\delta > 0$ specified later, let $a = \sqrt{\log(1/\delta)}$. By a standard Chernoff bound, the
probability mass of $D_{\beta/\gamma'}$ that is outside the interval $[-a\beta/\gamma', a\beta/\gamma']$ is at most $\delta$. Taking a union
bound over the $m$ samples $\mathbf{a}_i$, the probability that there exists some sample indexed by $i \in [m]$ such
that
\[ f_{\mathbf{s}^*}(\mathbf{a}_i) = \big(\langle \mathbf{a}_i, \mathbf{s}^* \rangle \bmod \gamma/\gamma'^2\big) \notin [-a\beta/\gamma', a\beta/\gamma'] \]
is at most $m\delta$. Therefore, if $D = \mathrm{hCLWE}^{(g)}(m, D_1^n, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$, the algorithm outputs 1 with
probability at least $1 - m\delta$.
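On the other hand, if $\mathbf{a}_i \sim D_1^n$, then for any fixed $\mathbf{s} \in \frac{1}{\sqrt{k}} S_{n,k}$, we have that $\langle \mathbf{a}_i, \mathbf{s} \rangle \sim D_1$,
independently of $\mathbf{s}$. By Lemma 4 and Lemma 2,
\[ \Delta\big(D_1^m \bmod \gamma/\gamma'^2,\ U(\mathbb{T}^m_{\gamma/\gamma'^2})\big) \le m\exp(-\gamma'^4/\gamma^2)/2 \le m\exp(-\gamma^2)/2. \]
Therefore for a fixed $\mathbf{s}$ and independent samples $\mathbf{a}_i$, we have that $\big(f_{\mathbf{s}}(\mathbf{a}_i) = \langle \mathbf{a}_i, \mathbf{s} \rangle \bmod \gamma/\gamma'^2\big)_{i\in[m]}$
is $m\exp(-\gamma^2)/2$-close to $(u_i)_{i\in[m]} \sim U\big(\mathbb{T}^m_{\gamma/\gamma'^2}\big)$. The probability that $u_i \in [-a\beta/\gamma', a\beta/\gamma']$ for all
$i \in [m]$ is at most $(2a\beta\gamma'/\gamma)^m$. This means that
\[ \Pr_{\mathbf{a}_i}\Big[(f_{\mathbf{s}}(\mathbf{a}_i))_{i\in[m]} \in [-a\beta/\gamma', a\beta/\gamma']^m\Big] \le \Delta\big(D_1^m \bmod \gamma/\gamma'^2,\ U(\mathbb{T}^m_{\gamma/\gamma'^2})\big) + \left(2a\beta \cdot \frac{\gamma'}{\gamma}\right)^m \]
\[ \le m\exp(-\gamma^2)/2 + \left(2a\beta \cdot \frac{\gamma'}{\gamma}\right)^m. \]
Taking a union bound over all the $t$ secret directions $\mathbf{s} \in \frac{1}{\sqrt{k}} S_{n,k}$, we get that the probability that
there exists some $\mathbf{s} \in \frac{1}{\sqrt{k}} S_{n,k}$ such that for all $i \in [m]$, $f_{\mathbf{s}}(\mathbf{a}_i) \in [-a\beta/\gamma', a\beta/\gamma']$ is at most
\[ t \cdot m\exp(-\gamma^2)/2 + t\left(2a\beta \cdot \frac{\gamma'}{\gamma}\right)^m. \]
Putting all parts together, we get that the advantage of the distinguisher is at least
\[ 1 - m\delta - t\left(2a\beta \cdot \frac{\gamma'}{\gamma}\right)^m - t \cdot m\exp(-\gamma^2)/2. \]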
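Since $\gamma = 2\sqrt{k(\ln m + \ln n)}$ and $\beta = o(q^{-1/5})$, let us now set $\delta = \frac{1}{10m}$ and $m = \Theta\left(\frac{\log t}{\log q}\right)$. Then
$m\delta \le 1/10$ and
\[ \frac{t \cdot m \cdot \exp(-\gamma^2)}{2} \le \frac{2^k \cdot n^k \cdot m \cdot \exp(-4k(\ln m + \ln n))}{2} = \frac{(2n)^k \cdot m \cdot (mn)^{-4k}}{2} \le \frac{1}{10}. \]
Lastly, to get advantage greater than $1/2$, we want $m$ such that
\[ t \le \frac{1}{10} \cdot \frac{1}{\left(2a\beta\frac{\gamma'}{\gamma}\right)^m}. \]
Taking the log on both sides,
\[ \log t \le -\log 10 - m\left(\log 2 + \log a + \log \beta + \log\frac{\gamma'}{\gamma}\right) = \Theta\big(m(-\log\log m + \log q)\big) = \Theta(m\log q), \]
where we use that $\log a = O(\log\log m) = O(\log\log(\log t/\log q)) = o(\log q)$, $\beta = o(q^{-1/5})$ and
$\gamma'/\gamma \le 2$.
This gives us that the advantage of the distinguisher is at least $1 - 3/10 > 1/2$. Since $\log t \le
k\log(2n) = O(k\log n)$, this makes $m = \Theta(k\log n/\log q)$.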
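To see how the advantage bound from the proof of Theorem 10 behaves for concrete numbers, the sketch below evaluates $1 - m\delta - t(2a\beta\gamma'/\gamma)^m - t \cdot m\exp(-\gamma^2)/2$ with $\delta = 1/(10m)$ and searches for the smallest $m$ that pushes it above $1/2$. The function names and the concrete value of $\beta$ are assumptions made for illustration; the theorem itself only requires $\beta = o(q^{-1/5})$.

```python
import math


def advantage_lower_bound(n, k, m, beta):
    """Evaluate the advantage bound from the proof with delta = 1/(10 m)."""
    t = math.comb(n, k) * 2**k  # number of k-sparse {-1, 0, +1} secrets
    gamma = 2 * math.sqrt(k * (math.log(m) + math.log(n)))
    gamma_prime = math.sqrt(gamma**2 + beta**2)
    delta = 1 / (10 * m)
    a = math.sqrt(math.log(1 / delta))
    # Some wrong secret passes the window test on all m samples.
    false_positive = t * (2 * a * beta * gamma_prime / gamma) ** m
    # Distance of D_1 mod gamma/gamma'^2 from uniform, summed over secrets.
    non_uniformity = t * m * math.exp(-(gamma**2)) / 2
    return 1 - m * delta - false_positive - non_uniformity


def min_samples(n, k, beta, m_max=10_000):
    """Smallest m <= m_max whose bound exceeds 1/2, or None if there is none."""
    for m in range(1, m_max + 1):
        if advantage_lower_bound(n, k, m, beta) > 0.5:
            return m
    return None


print(min_samples(n=100, k=3, beta=1e-3))
```

The search is only meant for small toy parameters, since $t$ is converted to a floating-point number. It illustrates that the false-positive term $t(2a\beta\gamma'/\gamma)^m$ is what determines the number of samples, in line with $m = \Theta(\log t/\log q)$.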
Now, we combine Theorem 10 and Corollary 2 to get the following tightness for the mixtures of
Gaussians we consider.
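Corollary 4. Let $\epsilon, \delta \in (0, 1)$ be arbitrary constants with $\delta < \epsilon$. Assuming
\[ \mathrm{LWE}\big(2^{\ell^{\delta}}, \mathbb{Z}_q^{\ell}, \mathbb{Z}_q^{\ell}, D_{\mathbb{Z},\sigma}\big) \]
has no $T(\ell) = 2^{O(\ell^{\epsilon})}$ time distinguisher from $U\big(\mathbb{Z}_q^{\ell\times 2^{\ell^{\delta}}} \times \mathbb{Z}_q^{2^{\ell^{\delta}}}\big)$ with advantage at least $1/\mathrm{poly}(\ell)$,
where $\sigma = \sqrt{\ell}$ and $q = \ell^2$, then there is no algorithm distinguishing $\mathrm{hCLWE}^{(g)}(m, \frac{1}{\sqrt{k}} S_{n,k}, \gamma, \beta)$
and $D_1^{n\times m}$ with $\Omega(1)$ advantage in time $2^{\log_2(n)^{\epsilon/\delta}}$ (which is quasipolynomial in $n$), where $m =
\mathrm{poly}(\log n)$, $g = O\big((\log n)^{1/(2\delta)} \cdot \log\log n\big)$, $\gamma = O\big((\log n)^{1/(2\delta)}\sqrt{\log\log n}\big)$ and some $\beta = o(q^{-1/5})$.
Yet, there is a distinguisher running in time $2^{O((\log n)^{1/\delta}\log\log n)}$ using $O(\log(n)^{1/\delta})$ samples.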
Proof. The first part of the statement is immediate from Corollary 2. The second part of the
statement follows from Theorem 10. To see this, in Corollary 2, we set $k = O(\ell^{1-\delta}\log(\ell)) =
O((\log n)^{(1-\delta)/\delta}\log\log n)$, which implies the run-time of the algorithm becomes
\[ m \cdot n^{O(k)} = \mathrm{poly}(\log \ell) \cdot n^{O(k)} = n^{O(k)} = n^{O((\log n)^{(1-\delta)/\delta} \cdot \log\log n)} = 2^{O((\log n)^{1/\delta} \cdot \log\log n)}. \]
Moreover, the number of samples necessary is
\[ O(\log t/\log q) = O(k\log n/\log q) = O(\ell) = O\big(\log(n)^{1/\delta}\big). \]
Lastly, as needed by Theorem 10, we have
\[ \log\log(\log t/\log q) = O(\log\log \ell) = O(\log\log q) = o(\log q), \]
as desired.
References
[AM05] Dimitris Achlioptas and Frank McSherry. On spectral learning of mixtures of distributions.
In International Conference on Computational Learning Theory, pages 458–469.
Springer, 2005. 1
[BD20] Zvika Brakerski and Nico Döttling. Hardness of LWE on general entropic distributions.
In Anne Canteaut and Yuval Ishai, editors, Advances in Cryptology - EUROCRYPT
2020 - 39th Annual International Conference on the Theory and Applications of Cryptographic
Techniques, Zagreb, Croatia, May 10-14, 2020, Proceedings, Part II, volume
12106 of Lecture Notes in Computer Science, pages 551–575. Springer, 2020. 4
[BLMR13] Dan Boneh, Kevin Lewi, Hart William Montgomery, and Ananth Raghunathan. Key
homomorphic prfs and their applications. In Ran Canetti and Juan A. Garay, editors,
Advances in Cryptology - CRYPTO 2013 - 33rd Annual Cryptology Conference, Santa
Barbara, CA, USA, August 18-22, 2013. Proceedings, Part I, volume 8042 of Lecture
Notes in Computer Science, pages 410–428. Springer, 2013. 3
[BLP+13] Zvika Brakerski, Adeline Langlois, Chris Peikert, Oded Regev, and Damien Stehlé. Classical
hardness of learning with errors. In Proceedings of the forty-fifth annual ACM
symposium on Theory of computing, pages 575–584, 2013. 13
[BRST21] Joan Bruna, Oded Regev, Min Jae Song, and Yi Tang. Continuous LWE. In Proceedings
of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, pages 694–707,
2021. 1, 2, 3, 4, 5, 8, 9, 14, 15, 23, 27
[BS15] Mikhail Belkin and Kaushik Sinha. Polynomial learning of distribution families. SIAM
Journal on Computing, 44(4):889–911, 2015. 1
[BV08] S Charles Brubaker and Santosh S Vempala. Isotropic pca and affine-invariant clustering.
In Building Bridges, pages 241–281. Springer, 2008. 1
[Das99] Sanjoy Dasgupta. Learning mixtures of gaussians. In 40th Annual Symposium on Foundations
of Computer Science, FOCS ’99, 17-18 October, 1999, New York, NY, USA,
pages 634–644. IEEE Computer Society, 1999. 1
[DKS17] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Statistical query lower bounds
for robust estimation of high-dimensional gaussians and gaussian mixtures. In 2017
IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages
73–84. IEEE, 2017. 1
[DKS18] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. List-decodable robust mean
estimation and learning mixtures of spherical gaussians. In Proceedings of the 50th
Annual ACM SIGACT Symposium on Theory of Computing, pages 1047–1060, 2018. 1
[DS07] Sanjoy Dasgupta and Leonard J Schulman. A probabilistic analysis of em for mixtures
of separated, spherical gaussians. Journal of Machine Learning Research, 8:203–226,
2007. 1
[FGR+17] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S Vempala, and Ying Xiao.
Statistical algorithms and a lower bound for detecting planted cliques. Journal of the
ACM (JACM), 64(2):1–37, 2017. 1
[FSO06] Jon Feldman, Rocco A Servedio, and Ryan O’Donnell. Pac learning axis-aligned mixtures
of gaussians with no separation assumption. In International Conference on Computational
Learning Theory, pages 20–34. Springer, 2006. 1
[HILL99] Johan Håstad, Russell Impagliazzo, Leonid A Levin, and Michael Luby. A pseudorandom
generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396,
1999. 6
[HL18] Samuel B Hopkins and Jerry Li. Mixture models, robustness, and sum of squares proofs.
In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing,
pages 1021–1034, 2018. 1
[HP15] Moritz Hardt and Eric Price. Tight bounds for learning a mixture of two gaussians. In
Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages
753–760, 2015. 1
[Kea98] Michael Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the
ACM (JACM), 45(6):983–1006, 1998. 1
[KSS18] Pravesh K Kothari, Jacob Steinhardt, and David Steurer. Robust moment estimation
and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM
SIGACT Symposium on Theory of Computing, pages 1035–1046, 2018. 1
[KSV05] Ravindran Kannan, Hadi Salmasian, and Santosh Vempala. The spectral method for
general mixture models. In International Conference on Computational Learning Theory,
pages 444–457. Springer, 2005. 1
[LP11] Richard Lindner and Chris Peikert. Better key sizes (and attacks) for lwe-based encryption.
In Cryptographers’ Track at the RSA Conference, pages 319–339. Springer, 2011. 2
[Mic18] Daniele Micciancio. On the hardness of learning with errors with binary secrets. Theory
Comput., 14(1):1–17, 2018. 3, 4, 5, 7, 8, 11, 16, 17, 21
[MP00] G. J. McLachlan and D. Peel. Finite mixture models. Wiley Series in Probability and
Statistics, 2000. 1
[MP13] Daniele Micciancio and Chris Peikert. Hardness of sis and lwe with small parameters.
In Annual Cryptology Conference, pages 21–39. Springer, 2013. 7
[MR07] Daniele Micciancio and Oded Regev. Worst-case to average-case reductions based on
gaussian measures. SIAM Journal on Computing, 37(1):267–302, 2007. 7
[MV10] Ankur Moitra and Gregory Valiant. Settling the polynomial learnability of mixtures of
gaussians. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science,
pages 93–102. IEEE, 2010. 1, 5
[NIS] NIST. Post-quantum cryptography standardization. https://csrc.nist.gov/
Projects/Post-Quantum-Cryptography. 3
[Reg09] Oded Regev. On lattices, learning with errors, random linear codes, and cryptography.
Journal of the ACM (JACM), 56(6):1–40, 2009. 2, 7
[RV17] Oded Regev and Aravindan Vijayaraghavan. On learning mixtures of well-separated
gaussians. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science
(FOCS), pages 85–96. IEEE, 2017. 1
[SK01] Sanjeev Arora and Ravi Kannan. Learning mixtures of arbitrary gaussians. In Proceedings
of the thirty-third annual ACM symposium on Theory of computing, pages 247–257,
2001. 1
[SZB21] Min Jae Song, Ilias Zadik, and Joan Bruna. On the cryptographic hardness of learning
single periodic neurons. arXiv preprint arXiv:2106.10744, 2021. 3
[TTM+85] D. M. Titterington, A. F. M. Smith, and U. E. Makov. Statistical Analysis of Finite
Mixture Distributions. Wiley, 1985. 1
[VW02] Santosh Vempala and Grant Wang. A spectral algorithm for learning mixtures of distributions.
In The 43rd Annual IEEE Symposium on Foundations of Computer Science,
2002. Proceedings., pages 113–122. IEEE, 2002. 1