entropy
Article
On the Optimal Error Exponent of Type-Based Distributed
Hypothesis Testing †
Xinyi Tong 1 , Xiangxiang Xu 2,‡ and Shao-Lun Huang 2,*
1 Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518055, China; txy18@mails.tsinghua.edu.cn
2 Tsinghua Shenzhen International Graduate School, Shenzhen 518055, China; xuxx@mit.edu
* Correspondence: twn2gold@gmail.com
† This work was presented in part at the 2021 IEEE International Symposium on Information Theory (ISIT),
Melbourne, Victoria, Australia, 12–20 July 2021.
‡ Current address: Department of Electrical Engineering and Computer Science, Massachusetts Institute of
Technology, Cambridge, MA 02139, USA.
Abstract: Distributed hypothesis testing (DHT) has emerged as a significant research area, but
the information-theoretic optimality of coding strategies is often hard to establish. This
paper studies DHT problems under the type-based setting, which is motivated by popular
federated learning methods. Specifically, two communication models are considered: (i) DHT problem
over noiseless channels, where each node observes i.i.d. samples and sends a one-dimensional
statistic of observed samples to the decision center for decision making; and (ii) DHT problem over
AWGN channels, where the distributed nodes are restricted to transmit functions of the empirical
distributions of the observed data sequences due to practical computational constraints. For both
of these problems, we present the optimal error exponent by providing both the achievability and
converse results. In addition, we offer corresponding coding strategies and decision rules. Our results
not only offer coding guidance for distributed systems, but also have the potential to be applied to
more complex problems, enhancing the understanding and application of DHT in various domains.
Keywords: hypothesis testing; distributed system; information theory; local geometry
Citation: Tong, X.; Xu, X.; Huang, S.-L. On the Optimal Error Exponent of Type-Based Distributed
Hypothesis Testing. Entropy 2023, 25, 1434. https://doi.org/10.3390/e25101434
Academic Editors: T. Aaron Gulliver and Songze Li
Received: 4 August 2023
Revised: 1 October 2023
Accepted: 8 October 2023
Published: 10 October 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
1. Introduction
Distributed hypothesis testing (DHT) is a significant problem in the field of information
theory [1]. In this problem, each distributed node observes partial data generated from
the joint distribution and transmits an encoded message through a communication channel
to a decision center, aiming to detect the true hypothesis. The primary goal of DHT is to
maximize the decision error exponent in the asymptotic regime, where many different
communication models [2–6] were considered in the previous literature. The main challenges of
the DHT arise in two respects. Firstly, due to the intricate distributed structures, most of the
existing works have focused on demonstrating achievability results, with converse results
being limited to specific cases, such as the 1-bit [3], $\log_2 3$-bit [7], and $O(\log_2 n)$-bit [1]
communication channels. Secondly, many of the achievability results were established
using random coding with auxiliary random variables [8], which are difficult to implement
in real systems.
Notice that the distributed encoders in many real applications are required to process
high-dimensional data [9], such as images, text, and audio. Consequently, many of the
federated learning algorithms focus on computing quantities such as the statistics,
empirical risks, and gradients of data [10], which can be viewed as certain functions of the
empirical distribution (type) of the data (for example, given the data $x_1, \dots, x_n$ and feature
function $f(x)$, the statistic $\frac{1}{n}\sum_{i=1}^{n} f(x_i) = \sum_{x} \hat{P}_X(x) f(x)$ is a linear function of the empirical
distribution $\hat{P}_X$).
Entropy 2023, 25, 1434. https://doi.org/10.3390/e25101434 https://www.mdpi.com/journal/entropy
Entropy 2023, 25, 1434 2 of 24
Motivated by this observation, we investigate the optimal decision error exponent of
DHT based on the empirical distributions (type-based) under two common communication
models. The first problem considers a noiseless channel, which is the typical mathematical
model in real federated learning scenarios. This reflects the fact that federated learning
often assumes that the nodes and the central machine can exchange information precisely,
while the dimensionality of the transmitted signals is limited [9]. Specifically, it
is assumed that each node can only transmit the empirical mean of a one-dimensional
feature, and such settings have gained significant attention recently in federated and
multi-modal machine learning [9,11]. The second problem assumes that the signal of
each node, encoded with the empirical distribution, is transmitted over an additive white
Gaussian noise (AWGN) channel, which is a widely-used mathematical model for real-
world channels [12]. The main goal of this paper is to establish the optimal error exponent
for the aforementioned two problems by presenting: (i) the converse bound for the error
exponent; and (ii) a practical coding strategy that achieves the converse bound.
The contributions of this paper are summarized as follows. First, in Section 4.1, we
demonstrate the optimal error exponent for the type-based hypothesis testing over noiseless
channels, where one-dimensional functions for all nodes and the corresponding decision
rule are provided. Moreover, by applying the information geometric approach in [13],
the hypotheses and the feature functions of each node can be modeled as vectors in the joint
and marginal distribution spaces, respectively. In Section 4.3, the optimal feature function
of each node can be interpreted as a decomposition of the hypothesis vector in the joint
distribution space into vectors in the marginal distribution spaces, where each decomposed
component indicates the contribution of the corresponding node in making the inference.
Second, we establish the optimal achievable error exponent of the type-based hypothesis testing
over AWGN channels by presenting both the achievability and converse results. In par-
ticular, the achievability part is based on a mixture coding strategy of both the amplify-
and-forward and decode-and-forward strategies. Specifically, when the observed empirical
distribution at a distributed node is sufficiently close to one of the true marginal distribu-
tions with respect to the two hypotheses, the node is confident of the true hypothesis. Then,
we apply the decode-and-forward strategy, which first estimates the true hypothesis based
on the observed empirical distribution, and then we apply the binary phase shift keying
(BPSK) to transmit the decoded bit to the decision center. On the other hand, when the
observed empirical distribution is far from both true marginal distributions, we apply the
amplify-and-forward strategy to encode and transmit the observed empirical distribution
by the pulse amplitude modulation (PAM) to the decision center. By applying the proposed
coding strategy and conducting the log-likelihood ratio test at the decision center, we show
in Section 5.2 the achievable error exponent. Finally, we demonstrate the converse results of
the error exponent in Section 5.3 based on a genie-aided approach. The main idea is to provide
the distributed nodes with additional information. By either revealing the true hypothesis to
the distributed nodes or eliminating the channel noise, we show that the error exponent
in Section 5.2 is also an upper bound of the optimal error exponent, which establishes
the optimality.
2. Problem Formulations
Suppose that there are $K$ random variables $X^K \triangleq (X_1, \dots, X_K)$. In this paper, we
consider the binary hypothesis testing problem, and the two hypotheses $H_0$ and $H_1$ are
defined as:
$$
\begin{aligned}
H_0 &: (x_1^{(1)}, \dots, x_K^{(1)}), \dots, (x_1^{(n)}, \dots, x_K^{(n)}) \overset{\text{i.i.d.}}{\sim} P^{(0)}_{X^K}, \\
H_1 &: (x_1^{(1)}, \dots, x_K^{(1)}), \dots, (x_1^{(n)}, \dots, x_K^{(n)}) \overset{\text{i.i.d.}}{\sim} P^{(1)}_{X^K},
\end{aligned}
\tag{1}
$$
where the observable data are i.i.d. generated according to either $P^{(0)}_{X^K}$ or $P^{(1)}_{X^K}$ from the
alphabet set $(\mathcal{X}_1, \dots, \mathcal{X}_K)$. In addition, we assume that there are $K$ distributed nodes,
where the $k$-th ($k = 1, \dots, K$) node can only observe the samples $X_k \triangleq \{x_k^{(1)}, \dots, x_k^{(n)}\}$.
To facilitate clarity in our illustration, we concentrate on the discrete case, assuming that
each alphabet $\mathcal{X}_k$ is discrete, and $\mathcal{X} \triangleq \mathcal{X}_1 \times \cdots \times \mathcal{X}_K$. In addition, for a joint distribution
$Q_{X^K} \in \mathcal{P}^{\mathcal{X}}$, we use $[Q_{X^K}]_{X_k}$ to denote its marginal distribution with respect to $X_k$. We also
denote $P^{(i)}_{X_1}, \dots, P^{(i)}_{X_K}$ as the marginal distributions of $P^{(i)}_{X^K}$, for $i = 0, 1$. In the distributed
hypothesis testing problem, we introduce a common assumption in the distributed setup [14]
that the generating distributions $P^{(0)}_{X^K}$ and $P^{(1)}_{X^K}$ satisfy
$D(P^{(1)}_{X^K} \| P^{(0)}_{X^K}) < \infty$ and $D(P^{(0)}_{X^K} \| P^{(1)}_{X^K}) < \infty$,
to avoid the trivial irregularities. Due to the type-based restriction, we further assume
that $P^{(0)}_{X_k} \neq P^{(1)}_{X_k}$, $k = 1, \dots, K$. Otherwise, the transmitted message as a function of the
empirical distribution would be uninformative for distinguishing the hypotheses. In the
following, we denote $\hat{P}_{X_k}$ as the empirical distribution of $X_k$, defined as:
$$
\hat{P}_{X_k}(x_k) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{x_k^{(i)} = x_k\}. \tag{2}
$$
2.1. Type-Based Hypothesis Testing over Noiseless Channels
As shown in Figure 1, node $k$ ($k = 1, \dots, K$) can encode the observed data $X_k$ and
transmit a scalar signal by function $u_k$. Due to the computational requirement as introduced
in Section 1, we impose a restriction whereby the encoder $u_k$ is explicitly dependent on
the empirical distribution $\hat{P}_{X_k}$, i.e., $u_k : \mathcal{P}^{\mathcal{X}_k} \to \mathbb{R}$, where $\mathcal{P}^{\mathcal{X}_k}$ denotes the set of probability
distributions defined on the alphabet $\mathcal{X}_k$. The most direct method is to transmit
the empirical distributions themselves by encoding them into the real space, which can lead to
computational difficulty for federated learning data. In this paper, we further consider one
of the most commonly used approaches in federated learning [15,16] and assume that $u_k$
computes a one-dimensional statistic
$$
u_k(\hat{P}_{X_k}) = \frac{1}{n} \sum_{i=1}^{n} f_k(x_k^{(i)}) = \mathbb{E}_{\hat{P}_{X_k}}[f_k(X_k)], \tag{3}
$$
where the feature function $f_k : \mathcal{X}_k \to \mathbb{R}$. Then, the decision center collects the statistics
$\{u_k(\hat{P}_{X_k})\}_{k=1}^{K}$ and makes a decision $\hat{H}$ on the true hypothesis. We prove in Section 4 that the further
restriction to computing the empirical means of features is without loss of generality,
in the sense that the decision center can decide as well as if it observed the types themselves.
Additionally, the error probability is defined as
$$
\mathbb{P}_n(\hat{H} \neq H) \triangleq \sum_{i \in \{0,1\}} P_H(H_i)\, \mathbb{P}_n(\hat{H} \neq H \mid H = H_i),
$$
where $H$ denotes the true hypothesis, $P_H(H_0)$ and $P_H(H_1)$ are the prior probabilities,
and $\mathbb{P}_n(\cdot)$ is the probability measure defined from the data sampling process (1). In particular,
we focus on the asymptotic error decaying rate, i.e., the error exponent, defined as
$$
E \triangleq \lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \neq H), \tag{4}
$$
where all logarithms are base $e$ unless otherwise specified. The goal is to find the maximal
error exponent of (4) and design the feature functions $f_1, \dots, f_K$ and the detailed decision
rule such that this error exponent can be achieved based on the log-likelihood ratio
test (LLRT).
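The identity in (3) between the sample mean of a feature and its expectation under the type can be checked numerically. The following is a minimal sketch; the sample values and the feature function are illustrative assumptions and do not come from the paper.

```python
from collections import Counter

def empirical_distribution(samples):
    """Return the type of `samples` as a dict: symbol -> relative frequency, as in (2)."""
    n = len(samples)
    counts = Counter(samples)
    return {symbol: count / n for symbol, count in counts.items()}

def statistic_from_samples(samples, f):
    """Sample mean (1/n) * sum_i f(x^(i))."""
    return sum(f(x) for x in samples) / len(samples)

def statistic_from_type(p_hat, f):
    """The same quantity written as E_{P_hat}[f(X)] = sum_x P_hat(x) f(x), as in (3)."""
    return sum(prob * f(x) for x, prob in p_hat.items())

# Hypothetical observations of one node over the alphabet {0, 1, 2}.
samples = [0, 2, 1, 1, 0, 2, 2, 1, 2, 2]

def feature(x):
    """Illustrative feature function f_k (an assumption, not the optimal feature)."""
    return x * x - 1.0

p_hat = empirical_distribution(samples)
u_samples = statistic_from_samples(samples, feature)
u_type = statistic_from_type(p_hat, feature)
```

Both routes give the same scalar, which is why a node that only keeps its type loses nothing when it is asked to report an empirical mean.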
[Figure 1 graphic: each node $k = 1, \dots, K$ observes $x_k^{(1)}, \dots, x_k^{(n)}$, forms the type $\hat{P}_{X_k}$, and sends $\mathbb{E}_{\hat{P}_{X_k}}[f_k(X_k)]$ to the decision center, which outputs $\hat{H}$.]
Figure 1. The transmission procedures for the type-based distributed hypothesis testing problem
over noiseless channels.
2.2. Type-Based Hypothesis Testing over AWGN Channels
As depicted in Figure 2, we employ the identical hypothesis testing formulation
as presented in (1). In this context, it is assumed that nodes 1 through $K$ encode and
transmit a length-$m$ sequence using functions $g_1, \dots, g_K$, which operate based on their
respective observations, through additive white Gaussian noise (AWGN) channels to the
central decision center. To accommodate the computational constraints, we require that the
encoder $g_k$ ($k = 1, \dots, K$) is a function of the empirical distribution $\hat{P}_{X_k}$, i.e.,
$$
g_k : \mathcal{P}^{\mathcal{X}_k} \to \mathbb{R}^m, \quad k = 1, \dots, K. \tag{5}
$$
Moreover, the average power constraints of the AWGN channels are:
$$
\frac{1}{m} \mathbb{E}\left[ \left\| g_k(\hat{P}_{X_k}) \right\|^2 \right] \le p_k, \quad k = 1, \dots, K, \tag{6}
$$
where the expectations are taken over the data sampling process defined in (1). Then, the
decision center makes a decision $\hat{H}$ based on the received signals
$g_1(\hat{P}_{X_1}) + Z_1, \dots, g_K(\hat{P}_{X_K}) + Z_K$, where the noises are drawn from
$$
Z_k \sim \mathcal{N}(0, \sigma_k^2 I_m), \quad k = 1, \dots, K, \tag{7}
$$
and $I_m$ denotes the $m \times m$ identity matrix.
Additionally, we make the following assumption to make the errors arising from the
AWGN channels and the decision process comparable, so that the trade-off between them
can be described. In detail, we assume that the sequence length $m$ also increases with $n$,
and there exists a positive constant $\mu$ such that
$$
\lim_{n \to \infty} \frac{n}{m(n)} = \mu. \tag{8}
$$
Our goal is to design the optimal encoders $g_1, \dots, g_K$, subject to the constraints
(5) and (6), as well as the decision rule $\hat{H}$, where we have assumed $P_H(H_0) = P_H(H_1) = \frac{1}{2}$
for explicit mathematical expression, such that the error exponent as defined in (4)
is maximized.
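To make the channel model (5) to (8) concrete, the following sketch simulates one node. The repetition encoder used here is a hypothetical placeholder chosen only because it trivially meets the power constraint (6); it is not the mixed amplify-and-forward and decode-and-forward scheme studied in Section 5, and all numerical values are assumptions.

```python
import math
import random

def encode_repetition(statistic, m, power):
    """Hypothetical encoder g_k: clip the statistic to [-1, 1] and repeat it m times.

    Every output satisfies ||g||^2 / m <= power, so the average power
    constraint (6) holds regardless of the data distribution.
    """
    clipped = max(-1.0, min(1.0, statistic))
    amplitude = math.sqrt(power) * clipped
    return [amplitude] * m

def awgn_channel(signal, sigma, rng):
    """Add independent N(0, sigma^2) noise to each coordinate, as in (7)."""
    return [s + rng.gauss(0.0, sigma) for s in signal]

rng = random.Random(0)       # fixed seed so the run is reproducible
n, mu = 1000, 4.0            # number of samples and the ratio mu = n / m from (8)
m = int(n / mu)              # number of channel uses
power, sigma = 2.0, 1.0      # p_k and sigma_k (illustrative values)

statistic = 0.3              # stands in for some scalar function of the type
sent = encode_repetition(statistic, m, power)
received = awgn_channel(sent, sigma, rng)

per_symbol_power = sum(s * s for s in sent) / m
# The decision center can average out the noise over the m channel uses.
estimate = sum(received) / (m * math.sqrt(power))
```

With $m$ growing linearly in $n$, the noise in `estimate` shrinks at a rate comparable to the sampling fluctuation of the type, which is the trade-off that assumption (8) is meant to expose.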
…
[Figure 2 graphic: each node $k = 1, \dots, K$ observes $x_k^{(1)}, \dots, x_k^{(n)}$, forms the type $\hat{P}_{X_k}$, and transmits an encoded signal that is corrupted by noise $Z_k \sim \mathcal{N}(0, \sigma_k^2 I_m)$ before reaching the decision center, which outputs $\hat{H}$.]
Figure 2. The transmission procedures for the type-based distributed hypothesis testing problem
over AWGN channels.
3. Related Works
Distributed hypothesis testing problems, also known as multiterminal hypothesis
testing [1,3,14] or decentralized detection [17,18], have been extensively explored in the
literature. In scenarios where each node can observe a single observation and send an
encoded message to the central machine, the authors of [17] demonstrated that deter-
mining the optimal coding scheme is NP-hard, while [18,19] provided characterizations
for the minimum decoding error rate and the optimal coding scheme for conditionally
independent nodes.
Furthermore, in situations where each node can observe n samples and transmit an
encoded message to the decision center, [3,5,14,20] investigated the optimal decoding error
exponents for the case of K = 2 nodes, with [21] generalizing the results to K > 2 nodes.
Additionally, the author of [5] studied the Neyman–Pearson-like test, which further con-
strained the encoded messages to being an empirical functional mean, and provided
optimal functions for the scenario with K = 2 nodes. The outcome presented in Section 4
can be perceived as a generalization of such setups to the case with K > 2 nodes.
On the other hand, DHT over noisy channels represents a novel and highly significant
sub-problem within the broader context. While current research has primarily focused
on transmission over discrete memoryless channels, certain aspects of this sub-problem
have been investigated. For instance, some studies have explored scenarios involving side
information [22] and cases that counteract independence assumptions [23]. Additionally,
optimal Type-II error considerations have been examined [24], along with investigations
into the optimal pairs of Type-I and Type-II errors [25].
Diverging from the existing literature, the present paper delves into the DHT problem
in the context of widely considered AWGN channels while also addressing the implications
of computational demands. This novel approach fills a critical research gap and extends
the understanding of DHT to a broader set of channel conditions, thus contributing to the
advancement of the field.
4. Type-Based Hypothesis Testing over Noiseless Channels
In this section, we present the optimal error exponent along with the corresponding
decision rule for the type-based hypothesis testing over noiseless channels. We commence
by introducing the optimal error exponent under the condition that the decision center has
access to the empirical distributions from different nodes.
…
Definition 1. The quantities $D_i^*(R_{X_1}, \dots, R_{X_K})$, for $i = 0, 1$, are defined as
$$
D_i^*(R_{X_1}, \dots, R_{X_K}) \triangleq \min_{Q_{X^K} \in \mathcal{S}} D(Q_{X^K} \| P^{(i)}_{X^K}), \tag{9}
$$
where
$$
\mathcal{S} \triangleq \left\{ Q_{X^K} : [Q_{X^K}]_{X_k} = R_{X_k},\ k = 1, \dots, K \right\},
$$
which represents the set of all distributions with given marginals $R_{X_1}, \dots, R_{X_K}$.
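For intuition, the minimization in (9) can be evaluated numerically for $K = 2$. The sketch below uses iterative proportional fitting, a classical procedure that, for a strictly positive reference distribution, converges to the minimum-divergence distribution with the prescribed marginals. This is a swapped-in numerical technique and not a method taken from the paper; the joint distribution and the marginals are illustrative assumptions.

```python
import math

def kl_divergence(q, p):
    """D(q || p) in nats for joint pmfs stored as nested lists."""
    total = 0.0
    for q_row, p_row in zip(q, p):
        for q_val, p_val in zip(q_row, p_row):
            if q_val > 0.0:
                total += q_val * math.log(q_val / p_val)
    return total

def project_onto_marginals(p, r1, r2, iterations=500):
    """Iterative proportional fitting, assuming every entry of p is positive."""
    q = [row[:] for row in p]
    n_rows, n_cols = len(r1), len(r2)
    for _ in range(iterations):
        # Rescale each row so that the first marginal equals r1.
        for a in range(n_rows):
            row_sum = sum(q[a])
            q[a] = [v * r1[a] / row_sum for v in q[a]]
        # Rescale each column so that the second marginal equals r2.
        col_sums = [sum(q[a][b] for a in range(n_rows)) for b in range(n_cols)]
        for a in range(n_rows):
            q[a] = [q[a][b] * r2[b] / col_sums[b] for b in range(n_cols)]
    return q

def d_star(p, r1, r2):
    """Numerical value of (9) for K = 2 under the reference joint pmf p."""
    return kl_divergence(project_onto_marginals(p, r1, r2), p)

# Hypothetical joint pmf of (X_1, X_2) under one hypothesis, and target marginals.
p_joint = [[0.4, 0.1], [0.1, 0.4]]
r1, r2 = [0.7, 0.3], [0.6, 0.4]

q = project_onto_marginals(p_joint, r1, r2)
value = d_star(p_joint, r1, r2)
```

When the target marginals coincide with those of `p_joint`, the projection returns `p_joint` itself and the value is zero, matching the intuition that typical marginals cost nothing in the exponent.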
The following result provides the operational meaning of (9), which can be proved by
Sanov's theorem [12].
Lemma 1. When $H_i$ is the true hypothesis, the probability that nodes $1, \dots, K$ observe the empirical
distributions $\hat{P}_{X_1}, \dots, \hat{P}_{X_K}$, respectively, is given by
$$
\mathbb{P}_n(\hat{P}_{X_1}, \dots, \hat{P}_{X_K} \mid H = H_i) \doteq \exp\left( -n D_i^*(\hat{P}_{X_1}, \dots, \hat{P}_{X_K}) \right), \quad i = 0, 1,
$$
where $\doteq$ is the conventional dot-equal notation, i.e., we denote $f_n \doteq g_n$ when
$\lim_{n \to \infty} \frac{1}{n} \log f_n = \lim_{n \to \infty} \frac{1}{n} \log g_n$. In addition, by applying the log-likelihood ratio test to detect the true hypothesis,
the optimal decision error exponent based on the empirical distributions is
$$
E^* \triangleq \min_{R_{X_1}, \dots, R_{X_K}} \max_{i \in \{0,1\}} D_i^*(R_{X_1}, \dots, R_{X_K}). \tag{10}
$$
Note that the type-based hypothesis testing problem assumes that the signal from
each node is a function of the empirical distribution. Hence, the optimal error exponent
in (4) will not exceed E∗. In the following, we prove that error exponent E∗ can be achieved
and provide the corresponding decision rule.
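As a sanity check on the min-max form of (10), consider the single-node special case $K = 1$ with a binary alphabet. There the set $\mathcal{S}$ in (9) contains only $R_{X_1}$ itself, so $D_i^*(R_{X_1}) = D(R_{X_1} \| P^{(i)}_{X_1})$ and (10) can be evaluated by a grid search. The sketch below is an illustration under these assumptions; the two Bernoulli parameters are made up.

```python
import math

def kl_binary(r, p):
    """D(Bernoulli(r) || Bernoulli(p)) in nats, with the convention 0 log 0 = 0."""
    total = 0.0
    for q_val, p_val in ((r, p), (1.0 - r, 1.0 - p)):
        if q_val > 0.0:
            total += q_val * math.log(q_val / p_val)
    return total

def minimax_exponent(p0, p1, grid_size=20001):
    """Grid search for min over R of max_i D(R || P^(i)), the K = 1 case of (10)."""
    best_r, best_value = None, float("inf")
    for j in range(grid_size):
        r = j / (grid_size - 1)
        value = max(kl_binary(r, p0), kl_binary(r, p1))
        if value < best_value:
            best_r, best_value = r, value
    return best_r, best_value

# Hypothetical marginals: P^(0) = Bernoulli(0.2) and P^(1) = Bernoulli(0.6).
p0, p1 = 0.2, 0.6
r_tilde, e_star = minimax_exponent(p0, p1)

# At the minimizer the two divergences should (nearly) coincide.
gap = abs(kl_binary(r_tilde, p0) - kl_binary(r_tilde, p1))
```

The minimizer lies strictly between the two hypotheses and balances the two divergences up to the grid resolution, which is the worst-case type that dominates the error probability.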
4.1. Optimal Feature
First, we introduce the following definitions of exponential and linear families, which
will be useful for delineating our results.
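Definition 2 (Exponential family). Given distribution $P_Z(z)$, and a function $T : \mathcal{Z} \to \mathbb{R}$, we
define the distribution $\tilde{P}_Z^{(\lambda)}(\,\cdot\,; T, P_Z)$ as
$$
\tilde{P}_Z^{(\lambda)}(z; T, P_Z) \triangleq P_Z(z) \exp(\lambda T(z) - \alpha(\lambda)), \quad \text{for all } z \in \mathcal{Z}, \tag{11}
$$
with $\alpha(\lambda) \triangleq \log \sum_{z' \in \mathcal{Z}} P_Z(z') \exp(\lambda T(z'))$. In addition, we use
$$
\mathcal{E}_{\mathcal{Z}}(T, P_Z) \triangleq \left\{ \tilde{P}_Z^{(\lambda)}(\,\cdot\,; T, P_Z) : \lambda \in \mathbb{R} \right\} \tag{12}
$$
to denote the exponential family passing through $P_Z$ with $T$ being the natural statistic.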
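The tilting operation in (11) is straightforward to compute on a finite alphabet. The following sketch builds a few members of the family in (12); the base distribution and the statistic are illustrative assumptions.

```python
import math

def exponential_tilt(p, t, lam):
    """Return the tilted pmf P(z) * exp(lam * T(z) - alpha(lam)) from (11)."""
    weights = {z: p[z] * math.exp(lam * t[z]) for z in p}
    alpha = math.log(sum(weights.values()))  # log-partition function alpha(lam)
    return {z: w * math.exp(-alpha) for z, w in weights.items()}

def mean(q, t):
    """Expectation of the statistic T under the pmf q."""
    return sum(q[z] * t[z] for z in q)

# Hypothetical base distribution P_Z and natural statistic T.
p_z = {"a": 0.5, "b": 0.3, "c": 0.2}
t_stat = {"a": -1.0, "b": 0.0, "c": 2.0}

lambdas = (-1.0, 0.0, 1.0)
family = {lam: exponential_tilt(p_z, t_stat, lam) for lam in lambdas}
means = [mean(family[lam], t_stat) for lam in lambdas]
```

Setting $\lambda = 0$ recovers $P_Z$, which is why the family is said to pass through $P_Z$, and the mean of $T$ increases with $\lambda$ because the tilt shifts mass toward symbols with larger $T(z)$.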
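Definition 3 (Linear family). Given a function $h : \mathcal{Z} \to \mathbb{R}$, we define the linear family $\mathcal{L}_{\mathcal{Z}}(h)$ as
$$
\mathcal{L}_{\mathcal{Z}}(h) \triangleq \left\{ Q_Z \in \mathcal{P}^{\mathcal{Z}} : \mathbb{E}_{Q_Z}[h(Z)] = 0 \right\}. \tag{13}
$$
In addition, we define the half-spaces $\mathcal{S}_{\mathcal{Z}}^{(0)}(h)$ and $\mathcal{S}_{\mathcal{Z}}^{(1)}(h)$ as
$$
\begin{aligned}
\mathcal{S}_{\mathcal{Z}}^{(0)}(h) &\triangleq \left\{ Q_Z \in \mathcal{P}^{\mathcal{Z}} : \mathbb{E}_{Q_Z}[h(Z)] \le 0 \right\}, \\
\mathcal{S}_{\mathcal{Z}}^{(1)}(h) &\triangleq \left\{ Q_Z \in \mathcal{P}^{\mathcal{Z}} : \mathbb{E}_{Q_Z}[h(Z)] \ge 0 \right\}.
\end{aligned}
$$
Then, for $i = 0, 1$ and $t > 0$, we define the sets
$$
\mathcal{D}_i(t) \triangleq \left\{ (R_{X_1}, \dots, R_{X_K}) : D_i^*(R_{X_1}, \dots, R_{X_K}) < t \right\}.
$$
We also define $\mathcal{D}(t) \triangleq \mathcal{D}_0(t) \cap \mathcal{D}_1(t)$. It can be verified that, for all $t \ge 0$, both $\mathcal{D}_0(t)$ and
$\mathcal{D}_1(t)$ are convex subsets of $\mathcal{P}^{\mathcal{X}_1} \times \cdots \times \mathcal{P}^{\mathcal{X}_K}$, and thus $\mathcal{D}(t)$ is also convex. In addition, we
have the following lemma.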
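Lemma 2. For $E^*$ as defined in (10), we have $\mathcal{D}(t) = \emptyset$ for all $t \in [0, E^*]$ and $\mathcal{D}(t) \neq \emptyset$ for all
$t > E^*$. Additionally, a unique $(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \in \mathcal{P}^{\mathcal{X}_1} \times \cdots \times \mathcal{P}^{\mathcal{X}_K}$ exists such that
$$
D_0^*(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) = D_1^*(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) = E^*. \tag{14}
$$
Proof. See Appendix A.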
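Based on Lemma 2, it follows from the separating hyperplane theorem (see, e.g.,
Section 2.5.1 of [26]) that functions $(f_1^*, \dots, f_K^*)$, where $f_k^* : \mathcal{X}_k \to \mathbb{R}$, $k = 1, \dots, K$, exist,
such that for all $(R_{X_1}, \dots, R_{X_K}) \in \mathcal{D}_0(E^*)$,
$$
\sum_{i=1}^{K} \sum_{x_i \in \mathcal{X}_i} R_{X_i}(x_i) f_i^*(x_i) = \sum_{i=1}^{K} \mathbb{E}_{R_{X_i}}[f_i^*(X_i)] \le 0, \tag{15}
$$
and for all $(R_{X_1}, \dots, R_{X_K}) \in \mathcal{D}_1(E^*)$,
$$
\sum_{i=1}^{K} \mathbb{E}_{R_{X_i}}[f_i^*(X_i)] \ge 0. \tag{16}
$$
Furthermore, we denote
$$
h^*(x^K) \triangleq \sum_{i=1}^{K} f_i^*(x_i), \tag{17}
$$
and then we have the following proposition. Given $P_Z \in \mathcal{P}^{\mathcal{Z}}$ and $\mathcal{S} \subset \mathcal{P}^{\mathcal{Z}}$, we adopt the
notation [27,28] $D(\mathcal{S} \| P_Z) \triangleq \inf_{Q_Z \in \mathcal{S}} D(Q_Z \| P_Z)$, where $\mathcal{P}^{\mathcal{Z}}$ denotes the set of all distributions
supported on $\mathcal{Z}$.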
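Proposition 1. The optimal exponent $E^*$ as defined in (10) satisfies
$$E^* = D\left(\mathcal{S}^{(0)}_{\mathcal{X}}(h^*) \,\middle\|\, P^{(1)}_{X^K}\right) = D\left(\mathcal{S}^{(1)}_{\mathcal{X}}(h^*) \,\middle\|\, P^{(0)}_{X^K}\right). \tag{18}$$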
Proof. See Appendix B.
Consequently, we establish the optimality of E∗ and provide the corresponding deci-
sion rule.
Theorem 1. Let $f_1^*, \dots, f_K^*$ denote the features as defined in (15) and (16). The optimal error exponent of (4) is given by
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \ne H) = E^*, \tag{19}$$
where $E^*$ is defined in (10). In addition, the corresponding decision rule $\hat{H}$ is
$$\sum_{k=1}^{K} \mathbb{E}_{\hat{P}_{X_k}}[f_k^*(X_k)] \underset{\hat{H} = H_0}{\overset{\hat{H} = H_1}{\gtrless}} 0. \tag{20}$$
Proof. See Appendix C.
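The decision rule (20) only needs one scalar per node, namely the empirical mean of that node's feature. The sketch below shows this computation under stated assumptions: the feature tables passed in are hypothetical placeholders rather than the optimal $f_k^*$ (which would come from (15) and (16)), and ties at zero are resolved in favor of $H_0$, a choice the paper does not specify.

```python
from collections import Counter

def empirical_distribution(samples, alphabet):
    """Type (empirical distribution) of a sample sequence over a finite alphabet."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts[x] / n for x in alphabet}

def decide(samples_per_node, features, alphabets):
    """Return (decision, statistic), with decision 1 for H1 and 0 for H0.

    The statistic is sum_k E_{P_hat_{X_k}}[f_k(X_k)] as in (20);
    ties (statistic == 0) go to H0, which is an assumption of this sketch.
    """
    statistic = 0.0
    for samples, f, alphabet in zip(samples_per_node, features, alphabets):
        p_hat = empirical_distribution(samples, alphabet)
        statistic += sum(p_hat[x] * f[x] for x in alphabet)  # one scalar per node
    return (1 if statistic > 0 else 0), statistic

# Hypothetical two-node example with binary alphabets and made-up features.
alphabets = [(0, 1), (0, 1)]
features = [{0: -1.0, 1: 1.0}, {0: -0.5, 1: 0.5}]
samples_per_node = [[1, 1, 0, 1], [0, 0, 1, 0]]
decision, statistic = decide(samples_per_node, features, alphabets)
```

In a distributed deployment each node would send only its own summand, and the decision center would add the received scalars and compare the sum with zero.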
4.2. General Geometric Structure
The geometry associated with Proposition 1 and Theorem 1 is depicted in Figure 3. In this figure, each point represents a distribution in $\mathcal{P}^{\mathcal{X}}$, and the decision boundary (20) corresponds to the linear family $\mathcal{L}_{\mathcal{X}}(h^*)$ defined as in (13). In addition, from Corollary 3.1 of [27], $\lambda_0, \lambda_1 \in \mathbb{R}$ exist such that
$$Q^{(i)}_{X^K} \triangleq \tilde{P}^{(\lambda_i)}_{X^K}(\,\cdot\,; h^*, P^{(i)}_{X^K}), \quad i = 0, 1, \tag{21}$$
satisfy
$$D\left(\mathcal{S}^{(1-i)}_{\mathcal{X}}(h^*) \,\middle\|\, P^{(i)}_{X^K}\right) = D\left(Q^{(i)}_{X^K} \,\middle\|\, P^{(i)}_{X^K}\right), \tag{22}$$
where $\tilde{P}^{(\lambda_i)}_{X^K}(\,\cdot\,; h^*, P^{(i)}_{X^K})$, $i = 0, 1$, are as defined in (11). In this context, $Q^{(0)}_{X^K}$ and $Q^{(1)}_{X^K}$ in (21) are the I-projections [27] of $P^{(0)}_{X^K}$ and $P^{(1)}_{X^K}$ onto this linear family, respectively, which also induces the two exponential families $\mathcal{E}_{\mathcal{X}}(h^*, P^{(0)}_{X^K})$ and $\mathcal{E}_{\mathcal{X}}(h^*, P^{(1)}_{X^K})$ with $h^*$ as their common natural statistic. Additionally, the points in $\mathcal{D}_0(E^*)$ and $\mathcal{D}_1(E^*)$ are separated by the linear family $\mathcal{L}_{\mathcal{X}}(h^*)$.

Figure 3. The geometric structure in distributed hypothesis testing, with $Q^{(i)}_{X^K}$ denoting the I-projection of $P^{(i)}_{X^K}$ onto the linear family $\mathcal{L}_{\mathcal{X}}(h^*)$, $i = 0, 1$. The linear family $\mathcal{L}_{\mathcal{X}}(h^*)$ divides $\mathcal{D}_0(E^*)$ and $\mathcal{D}_1(E^*)$ into different half-spaces.
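The I-projection onto a linear family $\{Q : \mathbb{E}_Q[h] = 0\}$ can be found numerically by tilting the base distribution along $h$ and tuning $\lambda$ until the constraint holds. The sketch below does this by bisection on a toy alphabet. The distribution `p`, the statistic `h`, the search interval, and all function names are assumptions for illustration; the method relies on $h$ taking both signs and on the tilted mean being increasing in $\lambda$.

```python
import math

def tilt(p, h, lam):
    """Exponential tilting of p along h, normalized to sum to one."""
    w = [pz * math.exp(lam * hz) for pz, hz in zip(p, h)]
    s = sum(w)
    return [x / s for x in w]

def mean(q, h):
    return sum(qz * hz for qz, hz in zip(q, h))

def kl(q, p):
    """KL divergence D(q || p) in nats."""
    return sum(qz * math.log(qz / pz) for qz, pz in zip(q, p) if qz > 0)

def i_projection(p, h, lo=-50.0, hi=50.0, iters=200):
    """Bisection on lambda so that the tilted distribution satisfies E_Q[h] = 0."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(p, h, mid), h) < 0:
            lo = mid   # mean too small: increase lambda
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, tilt(p, h, lam)

p = [0.5, 0.3, 0.2]      # illustrative base distribution (assumed values)
h = [-1.0, 0.5, 1.0]     # illustrative statistic taking both signs
lam, q = i_projection(p, h)
q_alt = [0.4, 0.4, 0.2]  # another member of the same linear family
```

Comparing `kl(q, p)` with `kl(q_alt, p)` gives a quick sanity check that the tilted solution is no farther from `p` than another point satisfying the same constraint.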
4.3. Local Information Geometric Analysis
Although an explicit information geometry has been shown, we apply the local infor-
mation geometric framework [13] to provide fundamental insights into this problem. Some
useful notations and definitions in local information geometry are introduced as follows.
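Definition 4 ($\epsilon$-neighborhood). Given a finite alphabet $\mathcal{Z}$, and letting $R_Z$ be a distribution supported on $\mathcal{Z}$ with all entries being positive, its $\epsilon$-neighborhood $\mathcal{N}^{\mathcal{Z}}_{\epsilon}(R_Z)$ is defined as
$$\mathcal{N}^{\mathcal{Z}}_{\epsilon}(R_Z) \triangleq \left\{ P_Z \in \mathcal{P}^{\mathcal{Z}} : \sum_{z \in \mathcal{Z}} \frac{(P_Z(z) - R_Z(z))^2}{R_Z(z)} \le \epsilon^2 \right\}.$$
Then, with $R_Z$ used as the reference distribution, each distribution $P_Z \in \mathcal{P}^{\mathcal{Z}}$ can be equivalently expressed as a vector $\phi \in \mathbb{R}^{|\mathcal{Z}|}$ or a function $f : \mathcal{Z} \to \mathbb{R}$ with
$$\phi(z) \triangleq \frac{P_Z(z) - R_Z(z)}{\sqrt{R_Z(z)}}, \qquad f(z) \triangleq \frac{\phi(z)}{\sqrt{R_Z(z)}}, \qquad \forall\, z \in \mathcal{Z}, \tag{23}$$
referred to as the information vector and feature function associated with $P_Z$, respectively. This provides a three-way correspondence $P_Z \leftrightarrow \phi \leftrightarrow f$, which will be useful in our derivations.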
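The correspondence in (23) is easy to check numerically. The sketch below computes the information vector and feature function of a distribution relative to a reference, and tests membership in the $\epsilon$-neighborhood of Definition 4. The function names and the example distributions are assumptions made for illustration.

```python
import math

def information_vector(p, r):
    """phi(z) = (P(z) - R(z)) / sqrt(R(z)), as in (23)."""
    return [(pz - rz) / math.sqrt(rz) for pz, rz in zip(p, r)]

def feature_function(p, r):
    """f(z) = phi(z) / sqrt(R(z)) = (P(z) - R(z)) / R(z), as in (23)."""
    return [(pz - rz) / rz for pz, rz in zip(p, r)]

def in_neighborhood(p, r, eps):
    """Membership test for the eps-neighborhood: chi-square distance <= eps^2."""
    chi2 = sum((pz - rz) ** 2 / rz for pz, rz in zip(p, r))
    return chi2 <= eps ** 2

r = [0.5, 0.3, 0.2]        # reference distribution (assumed values)
p = [0.52, 0.29, 0.19]     # nearby distribution (assumed values)
phi = information_vector(p, r)
f = feature_function(p, r)
```

Two identities follow directly from the definitions: the squared Euclidean norm of `phi` equals the chi-square distance used in Definition 4, and `f` has zero mean under the reference distribution.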
Based on Definition 4, we introduce the local assumption that
$$P^{(i)}_{X^K} \in \mathcal{N}^{\mathcal{X}}_{\epsilon}(P_{X^K}), \quad \text{for } i = 0, 1. \tag{24}$$
We use $\psi^{(i)} \leftrightarrow P^{(i)}_{X^K}$, $i = 0, 1$, to represent the corresponding information vectors [cf. (23)]. For each $k = 1, \dots, K$, and given feature $f_k : \mathcal{X}_k \to \mathbb{R}$, we define the corresponding information vector $\phi_k \in \mathbb{R}^{|\mathcal{X}_k|}$, where $P_{X_k} \triangleq [P_{X^K}]_{X_k}$ is used as the reference distribution. Note that for $i = 0, 1$, the correspondence $B_k^{\mathrm{T}} \psi^{(i)} \leftrightarrow P^{(i)}_{X_k}$ exists, where $P^{(i)}_{X_k} \triangleq [P^{(i)}_{X^K}]_{X_k}$ represents the corresponding marginal distributions. Specifically, $B_k$ is an $|\mathcal{X}| \times |\mathcal{X}_k|$ dimensional matrix with entries [29]
$$B_k(x^K, \hat{x}_k) \triangleq \sqrt{\frac{P_{X^K}(x^K)}{P_{X_k}(\hat{x}_k)}}\; \delta_{x_k \hat{x}_k}, \tag{25}$$
where $\delta_{x_k \hat{x}_k}$ represents the Kronecker delta.
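Moreover, the feature $f_k$ defined on $\mathcal{X}_k$, when considered as a mapping from $\mathcal{X}$ to $\mathbb{R}$, corresponds to the information vector $B_k \phi_k$ in $\mathbb{R}^{|\mathcal{X}|}$. Leveraging this correspondence, we can further establish the information vector for $h(x^K) = \sum_{k=1}^{K} f_k(x_k)$ as
$$\sum_{i=1}^{K} B_i \phi_i = B_0 \phi_0 \in \mathbb{R}^{|\mathcal{X}|}, \tag{26}$$
where we have defined
$$B_0 \triangleq \begin{bmatrix} B_1 & \cdots & B_K \end{bmatrix} \quad \text{and} \quad \phi_0 \triangleq \begin{bmatrix} \phi_1 \\ \vdots \\ \phi_K \end{bmatrix}, \tag{27}$$
and where for each $k = 1, \dots, K$, $\phi_k \in \mathbb{R}^{|\mathcal{X}_k|}$ denotes the information vector corresponding to $f_k$.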
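The marginalization property of $B_k$ stated above can be checked in one line. The following short derivation is a sketch that reads the entries in (25) as $\sqrt{P_{X^K}(x^K)/P_{X_k}(\hat{x}_k)}\,\delta_{x_k \hat{x}_k}$ and uses the definition of the information vector in (23) with $P_{X^K}$ as the reference distribution:

```latex
\begin{align*}
\bigl[B_k^{\mathrm{T}} \psi^{(i)}\bigr](\hat{x}_k)
  &= \sum_{x^K} \sqrt{\frac{P_{X^K}(x^K)}{P_{X_k}(\hat{x}_k)}}\,
     \delta_{x_k \hat{x}_k}\,
     \frac{P^{(i)}_{X^K}(x^K) - P_{X^K}(x^K)}{\sqrt{P_{X^K}(x^K)}} \\
  &= \frac{1}{\sqrt{P_{X_k}(\hat{x}_k)}}
     \sum_{x^K :\, x_k = \hat{x}_k}
     \bigl( P^{(i)}_{X^K}(x^K) - P_{X^K}(x^K) \bigr) \\
  &= \frac{P^{(i)}_{X_k}(\hat{x}_k) - P_{X_k}(\hat{x}_k)}{\sqrt{P_{X_k}(\hat{x}_k)}} .
\end{align*}
```

The last line is the information vector of the marginal $P^{(i)}_{X_k}$ relative to $P_{X_k}$, which is the correspondence $B_k^{\mathrm{T}} \psi^{(i)} \leftrightarrow P^{(i)}_{X_k}$.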
Additionally, given a matrix $A \in \mathbb{R}^{m_1 \times m_2}$, we use $A^{\dagger}$ to denote its Moore–Penrose inverse [30], and we define the associated column space $\mathcal{R}(A) \triangleq \{Ax : x \in \mathbb{R}^{m_2}\}$ and projection matrix $\Pi_A \triangleq A A^{\dagger}$. Then, we can establish the local counterpart of $E^*$ in Theorem 1 as follows.
Theorem 2. Under the local assumption (24), let $\psi^{(i)} \leftrightarrow P^{(i)}_{X^K}$, $i = 0, 1$, denote the corresponding information vectors. Then, for $h^*$ as defined in (17), we have the correspondence $h^* \leftrightarrow B_0 \phi_0^*$, where
$$\phi_0^* \triangleq B_0^{\dagger}\left(\psi^{(1)} - \psi^{(0)}\right), \tag{28}$$
and where $B_0$ is defined in (27). In addition, the optimal exponent $E^*$ in (10) can be expressed as
$$E^* = \frac{1}{8} \left\| B_0 \phi_0^* \right\|^2 + o(\epsilon^2). \tag{29}$$
Proof. See Appendix D.
Note that from Theorem 2, we have
$$h^* \leftrightarrow B_0 B_0^{\dagger}\left(\psi^{(1)} - \psi^{(0)}\right) = \Pi_{B_0}\left(\psi^{(1)} - \psi^{(0)}\right),$$
where $\Pi_{B_0}$ is the projection matrix associated with the subspace $\mathcal{R}(B_0)$. The optimal feature $B_0 \phi_0^*$ in (26) corresponds to the projection of the sufficient statistic $f_{\mathrm{LLR}} \leftrightarrow (\psi^{(1)} - \psi^{(0)})$ onto the function space that encompasses all possible $h$'s satisfying the form $h(x^K) = \sum_{k=1}^{K} f_k(x_k)$. In other words, $B_0 \phi_0^*$ represents the best approximation of $f_{\mathrm{LLR}}$ within the function space of interest, which leads to the optimal decision error exponent $E^*$ as shown in (29).
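The projection interpretation can be made concrete for $K = 2$ binary nodes. The sketch below builds $B_1$ and $B_2$ from (25), projects $\psi^{(1)} - \psi^{(0)}$ onto $\mathcal{R}(B_0)$, and evaluates the leading term of (29). It is an illustration under assumptions: the three joint distributions are made-up values, and Gram-Schmidt orthogonalization is used in place of the Moore–Penrose inverse, since projecting onto the column space yields the same vector $\Pi_{B_0}(\psi^{(1)} - \psi^{(0)})$.

```python
import math
from itertools import product

def build_B(P, k, sizes):
    """B_k of (25): rows indexed by joint symbols x^K, columns by values of x_k."""
    states = list(product(*[range(s) for s in sizes]))
    marginal = [sum(P[x] for x in states if x[k] == a) for a in range(sizes[k])]
    return [[math.sqrt(P[x] / marginal[a]) if x[k] == a else 0.0
             for a in range(sizes[k])] for x in states]

def columns_of(matrix):
    return [list(col) for col in zip(*matrix)]

def orthonormal_basis(columns, tol=1e-10):
    """Gram-Schmidt; columns that are linearly dependent are dropped."""
    basis = []
    for c in columns:
        v = list(c)
        for b in basis:
            coef = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return basis

def project(v, basis):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        coef = sum(vi * bi for vi, bi in zip(v, b))
        out = [o + coef * bi for o, bi in zip(out, b)]
    return out

sizes = (2, 2)
states = list(product(range(2), range(2)))
P = dict(zip(states, [0.40, 0.10, 0.10, 0.40]))    # reference joint (assumed)
P0 = dict(zip(states, [0.42, 0.09, 0.10, 0.39]))   # joint under H0 (assumed)
P1 = dict(zip(states, [0.39, 0.11, 0.10, 0.40]))   # joint under H1 (assumed)

psi0 = [(P0[x] - P[x]) / math.sqrt(P[x]) for x in states]
psi1 = [(P1[x] - P[x]) / math.sqrt(P[x]) for x in states]
diff = [a - b for a, b in zip(psi1, psi0)]

B0_columns = columns_of(build_B(P, 0, sizes)) + columns_of(build_B(P, 1, sizes))
basis = orthonormal_basis(B0_columns)
proj = project(diff, basis)                         # Pi_{B0}(psi1 - psi0)
local_exponent = sum(x * x for x in proj) / 8.0     # leading term of (29)
```

In this toy case $\mathcal{R}(B_0)$ has dimension 3 rather than 4, because the columns of $B_1$ and $B_2$ share the direction $\sqrt{P_{X^K}}$; the component of `diff` outside $\mathcal{R}(B_0)$ is the part of the log-likelihood ratio that additive features cannot capture.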
Moreover, from (26), this optimal feature can be decomposed into $K$ components in the subspaces $\mathcal{R}(B_k)$, for $k = 1, \dots, K$,
$$B_0 \phi_0^* = \sum_{k=1}^{K} B_k \phi_k^*, \tag{30}$$
where $\phi_0^*$ is stacked by $\phi_k^* \in \mathbb{R}^{|\mathcal{X}_k|}$, $k = 1, \dots, K$, as in (27). This decomposition structure is depicted in Figure 4 for the case $K = 2$.
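[Figure 4: the vector $B_0 \phi_0^* = B_1 \phi_1^* + B_2 \phi_2^*$ in $\mathcal{R}(B_0)$, its components $B_1 \phi_1^*$ and $B_2 \phi_2^*$, and the orthogonal projections $\Pi_{B_1}(B_0 \phi_0^*)$ and $\Pi_{B_2}(B_0 \phi_0^*)$ onto $\mathcal{R}(B_1)$ and $\mathcal{R}(B_2)$.]
Figure 4. The information decomposition structure in distributed hypothesis testing with $K = 2$ nodes, compared with the orthogonal decompositions on the subspace $\mathcal{R}(B_k)$ for each node $k = 1, 2$.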
Remark 1. The vectors $B_k \phi_k^*$ are not simply the orthogonal projections of $B_0 \phi_0^*$ onto the subspaces $\mathcal{R}(B_k)$, since these subspaces, for $k = 1, \dots, K$, are not mutually orthogonal. Therefore, the decomposition of $B_0 \phi_0^*$ depends on the Gram matrix [30] of the subspaces $\mathcal{R}(B_k)$, as illustrated in Figure 4. Furthermore, the orthogonal projection of $B_0 \phi_0^*$ onto the subspace $\mathcal{R}(B_k)$ can be interpreted as characterizing the optimal error exponent of the binary hypothesis testing problem with observations of $X_k$ only [12]. When the subspaces $\mathcal{R}(B_k)$ are orthogonal to each other, the optimal inference approach is straightforward: the optimal information is extracted from each node by orthogonal projection. However, when the subspaces $\mathcal{R}(B_k)$ are not orthogonal, different nodes may share various forms of common information. Our result shows how to handle this shared information and extract the optimal features through the decomposition of the information vector over non-orthogonal subspaces.
5. Type-Based Hypothesis Testing over AWGN Channels
This section presents the optimal error exponent of the type-based hypothesis testing
problem over AWGN channels, along with the corresponding coding strategy. To begin,
we introduce several notations that will help in the presentation of the results.
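Definition 5. Let $[K] \triangleq \{1, 2, \dots, K\}$, and for a subset $\omega \subseteq [K]$ and $i = 0, 1$, we define
$$D_i^{\omega}\left(\{R_{X_k}\}_{k \in \omega}\right) \triangleq \min_{Q_{X^K} \in \mathcal{S}_{\omega}} D\left(Q_{X^K} \,\middle\|\, P^{(i)}_{X^K}\right), \tag{31}$$
where
$$\mathcal{S}_{\omega} \triangleq \left\{ Q_{X^K} : [Q_{X^K}]_{X_k} = R_{X_k},\ k \in \omega \right\}.$$
It is easy to see that $D_i^{[K]}(\cdot) = D_i^*(\cdot)$, where $D_i^*(\cdot)$ is as defined in (9). Moreover, we define the following error exponent with respect to $\omega \subseteq [K]$:
$$E_{\omega} \triangleq \min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta_k\}_{k \in [K] \setminus \omega}} \max\left\{ D_0^{\omega}\left(\{R_{X_k}\}_{k \in \omega}\right) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k - \sqrt{p_k})^2}{2 \mu \sigma_k^2},\ D_1^{\omega}\left(\{R_{X_k}\}_{k \in \omega}\right) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k + \sqrt{p_k})^2}{2 \mu \sigma_k^2} \right\}, \tag{32}$$
where we have used $A \setminus B$ to represent the relative complement of set $B$ in set $A$, and where $\mu$ is as defined in (8). We can also find that $E_{[K]} = E^*$, where $E^*$ is as defined in (10). Finally, we define the quantity $E$, which will be shown to be the optimal error exponent:
$$E \triangleq \min_{\omega \in \Im([K])} E_{\omega}, \tag{33}$$
where $\Im([K])$ denotes the power set of $[K]$.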
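To get a feel for (32), it helps to evaluate the extreme case $\omega = \varnothing$, in which no empirical distribution is available and every node contributes only a BPSK-type term. The following is a sketch of that special case; it assumes that the divergence terms $D_0^{\varnothing}$ and $D_1^{\varnothing}$ vanish because no marginal constraint is imposed, and it writes $c_k \triangleq 2 \mu \sigma_k^2$ as shorthand introduced only for this derivation.

```latex
\begin{align*}
A(\theta) &\triangleq \sum_{k=1}^{K} \frac{(\theta_k - \sqrt{p_k})^2}{c_k},
\qquad
B(\theta) \triangleq \sum_{k=1}^{K} \frac{(\theta_k + \sqrt{p_k})^2}{c_k}, \\
\max\{A(\theta), B(\theta)\}
  &\ge \frac{A(\theta) + B(\theta)}{2}
   = \sum_{k=1}^{K} \frac{\theta_k^2 + p_k}{c_k}
   \ge \sum_{k=1}^{K} \frac{p_k}{c_k}, \\
E_{\varnothing}
  &= \min_{\theta} \max\{A(\theta), B(\theta)\}
   = \sum_{k=1}^{K} \frac{p_k}{2 \mu \sigma_k^2},
\end{align*}
```

where both inequalities hold with equality at $\theta_1 = \cdots = \theta_K = 0$. This matches the form of the BPSK term that appears in the local expression (51).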
Theorem 3. The optimal error exponent of (4) is given by
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \ne H) = E. \tag{34}$$
In the following, we prove Theorem 3 by establishing both the achievability and the converse results.
5.1. The Coding Strategy for Distributed Nodes
First, we define the different regimes of empirical distributions, for each $k = 1, \dots, K$ and for some $\gamma \in (0, 1)$. The specific choice of $\gamma$ does not affect the achievable error exponent as long as $\gamma \in (0, 1)$; the parameter only determines where the encoder switches between the decode-and-forward and amplify-and-forward coding strategies introduced in Section 1.
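Decode-and-forward regime:
$$\mathcal{M}_k^{(0)} \triangleq \left\{ R_{X_k} : D\left(R_{X_k} \,\middle\|\, P^{(0)}_{X_k}\right) < n^{-\gamma} \right\},$$
$$\mathcal{M}_k^{(1)} \triangleq \left\{ R_{X_k} : D\left(R_{X_k} \,\middle\|\, P^{(1)}_{X_k}\right) < n^{-\gamma} \right\}.$$
Amplify-and-forward regime:
$$\mathcal{M}_k^{c} \triangleq \left\{ R_{X_k} : \min\left\{ D\left(R_{X_k} \,\middle\|\, P^{(0)}_{X_k}\right), D\left(R_{X_k} \,\middle\|\, P^{(1)}_{X_k}\right) \right\} \ge n^{-\gamma} \right\}. \tag{35}$$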
Note that for each $k = 1, \dots, K$, the probability that the empirical distribution $\hat{P}_{X_k}$ falls in $\mathcal{M}_k^c$ is of order $\exp(-n^{1-\gamma})$. Consequently, in the amplify-and-forward regime, we can transmit such empirical distributions with exponentially large power by Pulse Amplitude Modulation (PAM) while still satisfying the power constraint. Specifically, let $\mathcal{P}^{(n)}_{X_k}$ be the set of all possible empirical distributions of $X_k$ with $n$ samples, and denote $\eta_k \triangleq |\mathcal{P}^{(n)}_{X_k} \cap \mathcal{M}_k^c|$. We define the bijective function $\xi_k : \mathcal{P}^{(n)}_{X_k} \cap \mathcal{M}_k^c \to \{1, \dots, \eta_k\}$ as the indices of empirical distributions. Then, according to the observed empirical distribution, the encoder of node $k$ ($k = 1, \dots, K$) is designed to transmit the signal
$$Q_k(\hat{P}_{X_k}) \triangleq \xi_k(\hat{P}_{X_k}) \cdot \exp\left(n^{\frac{1-\gamma}{2}}\right). \tag{36}$$
Furthermore, if the empirical distribution is in one of the decode-and-forward regimes, the node first detects the true hypothesis and then transmits the detected bit using Binary Phase Shift Keying (BPSK) with the appropriate power. By employing these strategies, the achievability result can be obtained through repeated transmissions from all the distributed nodes. In other words, the resulting encoder for node $k$ is defined as follows:
$$\boldsymbol{g}_k^* = [g_k^*, \cdots, g_k^*], \quad k = 1, \dots, K, \tag{37}$$
where
$$g_k^*(\hat{P}_{X_k}) \triangleq \begin{cases} \sqrt{p_k - \delta(n, \gamma)}, & \text{if } \hat{P}_{X_k} \in \mathcal{M}_k^{(0)}, \\ -\sqrt{p_k - \delta(n, \gamma)}, & \text{if } \hat{P}_{X_k} \in \mathcal{M}_k^{(1)}, \\ Q_k(\hat{P}_{X_k}), & \text{if } \hat{P}_{X_k} \in \mathcal{M}_k^{c}, \end{cases} \tag{38}$$
and where
$$\delta(n, \gamma) \triangleq \max_{k \in [K]} \frac{\mathbb{P}_n(\hat{P}_{X_k} \in \mathcal{M}_k^c)}{\mathbb{P}_n(\hat{P}_{X_k} \notin \mathcal{M}_k^c)} \cdot (n + 1)^{2|\mathcal{X}_k|} \cdot \exp\left(2 n^{\frac{1-\gamma}{2}}\right). \tag{39}$$
Proposition 2. The encoders as defined in (38) satisfy the power constraint (6), and
$$\lim_{n \to \infty} \delta(n, \gamma) = 0. \tag{40}$$
Proof. See Appendix E.
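The per-symbol encoder in (38) can be sketched in a few lines. This is an illustration under assumptions rather than the construction used in the proofs: `delta` is passed in by the caller instead of being computed from (39), `index` stands for the value $\xi_k(\hat{P}_{X_k})$ and must be supplied by a hypothetical enumeration of types, and a type that happens to lie in both decode-and-forward sets is assigned to $H_0$ first.

```python
import math

def kl(r, p):
    """KL divergence D(r || p) in nats."""
    return sum(rz * math.log(rz / pz) for rz, pz in zip(r, p) if rz > 0)

def regime(p_hat, p0, p1, n, gamma):
    """Classify an empirical distribution into the regimes around (35)."""
    threshold = n ** (-gamma)
    if kl(p_hat, p0) < threshold:
        return "H0"   # decode-and-forward, detected H0 (checked first by assumption)
    if kl(p_hat, p1) < threshold:
        return "H1"   # decode-and-forward, detected H1
    return "AF"       # amplify-and-forward

def encode(p_hat, p0, p1, n, gamma, power, delta, index=None):
    """One entry of the encoder in (38); `index` plays the role of xi_k(p_hat)."""
    r = regime(p_hat, p0, p1, n, gamma)
    amplitude = math.sqrt(power - delta)
    if r == "H0":
        return amplitude
    if r == "H1":
        return -amplitude
    return index * math.exp(n ** ((1 - gamma) / 2))  # PAM level as in (36)

p0, p1 = [0.7, 0.3], [0.3, 0.7]   # illustrative marginals under H0 and H1
n, gamma = 1000, 0.5
bpsk_h0 = encode([0.68, 0.32], p0, p1, n, gamma, power=1.0, delta=0.0)
bpsk_h1 = encode([0.32, 0.68], p0, p1, n, gamma, power=1.0, delta=0.0)
pam = encode([0.5, 0.5], p0, p1, n, gamma, power=1.0, delta=0.0, index=3)
```

The full encoder in (37) repeats this scalar over the channel uses available to node $k$.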
5.2. Decision Rule and Achievable Error Exponent
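After the decision center receives the output signals $\boldsymbol{g}_1^*(\hat{P}_{X_1}) + Z_1, \cdots, \boldsymbol{g}_K^*(\hat{P}_{X_K}) + Z_K$, it computes
$$\theta_k \triangleq \frac{1}{m} \sum_{i=1}^{m} \left[\boldsymbol{g}_k^*(\hat{P}_{X_k}) + Z_k\right]_i, \quad k = 1, \dots, K,$$
where $[\cdot]_i$ denotes the $i$-th entry of a given vector. Then, we conduct the log-likelihood ratio test (LLRT) to detect the true hypothesis:
$$\log \frac{\mathbb{P}_n(\theta_1, \cdots, \theta_K \mid H = H_0)}{\mathbb{P}_n(\theta_1, \cdots, \theta_K \mid H = H_1)} \underset{\hat{H} = H_1}{\overset{\hat{H} = H_0}{\gtrless}} 0. \tag{41}$$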
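Note that, since exponentially large power is allocated to the empirical distributions in the amplify-and-forward regime (cf. (35), (36)), the decision center can correctly detect the coding regime of the nodes with super-exponentially high probability, i.e., for $k = 1, \dots, K$,
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n\left( \hat{P}_{X_k} \in \mathcal{M}_k^c \,\middle|\, \theta_k \le \exp\left(n^{\frac{1-\gamma}{4}}\right) \right) = \infty,$$
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n\left( \hat{P}_{X_k} \notin \mathcal{M}_k^c \,\middle|\, \theta_k > \exp\left(n^{\frac{1-\gamma}{4}}\right) \right) = \infty. \tag{42}$$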
Therefore, we can assume that the decision center knows the coding regime of the nodes and define the following regime of the received signals with respect to subset $\omega \subseteq [K]$:
$$\Theta_{\omega} \triangleq \left\{ (\theta_1, \cdots, \theta_K) : \theta_k > \exp\left(n^{\frac{1-\gamma}{4}}\right),\ \forall k \in \omega,\ \text{and } \theta_{k'} \le \exp\left(n^{\frac{1-\gamma}{4}}\right),\ \forall k' \in [K] \setminus \omega \right\},$$
for all $\omega \in \Im([K])$. When the received signals $(\theta_1, \cdots, \theta_K) \in \Theta_{\omega}$, the decision center can recover the empirical distributions $\hat{P}_{X_k}$ ($k \in \omega$) from the received signals $\theta_k$ by the decoder:
$$Q_k^{-1}(\theta_k) \triangleq \xi_k^{-1}\left( \left\lfloor \theta_k / \exp\left(n^{\frac{1-\gamma}{2}}\right) + 0.5 \right\rfloor \right), \tag{43}$$
where $\lfloor \cdot \rfloor$ denotes the floor function [31]. The following result shows that the decoding error of (43) can be neglected.
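Proposition 3. For all $\hat{P}_{X_k} \in \mathcal{P}^{(n)}_{X_k} \cap \mathcal{M}_k^c$, $k = 1, \dots, K$,
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}\left(Q_k^{-1}(\theta_k) \ne \hat{P}_{X_k}\right) = \infty. \tag{44}$$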
Proof. See Appendix F.
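The regime detection in (42) and the index recovery in (43) amount to a threshold test followed by rounding. The sketch below illustrates both steps under assumptions: the noise values are fixed numbers chosen by hand rather than Gaussian samples, and the final lookup $\xi_k^{-1}$ is left to the caller.

```python
import math

def is_amplify_and_forward(theta, n, gamma):
    """Threshold test on the averaged received signal, as in the definition of Theta_omega."""
    return theta > math.exp(n ** ((1 - gamma) / 4))

def decode_index(theta, n, gamma):
    """Index recovery floor(theta / exp(n^((1-gamma)/2)) + 0.5) from (43)."""
    return math.floor(theta / math.exp(n ** ((1 - gamma) / 2)) + 0.5)

n, gamma = 1000, 0.5
scale = math.exp(n ** ((1 - gamma) / 2))  # PAM spacing used by the encoder (36)
sent_index = 3
received_high = sent_index * scale + 20.0   # hand-picked positive noise
received_low = sent_index * scale - 40.0    # hand-picked negative noise
recovered_high = decode_index(received_high, n, gamma)
recovered_low = decode_index(received_low, n, gamma)
```

Rounding succeeds whenever the averaged noise is smaller in magnitude than half the spacing `scale`; because the spacing grows with $n$ while the noise variance does not, the failure probability becomes negligible, which is the content of Proposition 3.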
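In the following, we denote $p_k' \triangleq p_k - \delta$, for $k = 1, \dots, K$, and discuss the decision error exponent when the received signals are in $\Theta_{\omega}$. For $k \in \omega$, the empirical distribution $\hat{P}_{X_k}$ can be recovered by (43), and for $k \in [K] \setminus \omega$, node $k$ detects the hypothesis according to the observed empirical distribution and transmits the detected bit by BPSK (cf. (38)) through the AWGN channel. Then, the decision center detects the true hypothesis from the received signals by LLRT (41), which can be reduced to
$$\tilde{E}_0^{\omega}(\theta_1, \cdots, \theta_K) \underset{\hat{H} = H_0}{\overset{\hat{H} = H_1}{\gtrless}} \tilde{E}_1^{\omega}(\theta_1, \cdots, \theta_K), \tag{45}$$
where for $i = 0, 1$,
$$\tilde{E}_i^{\omega}(\theta_1, \cdots, \theta_K) \triangleq \min_{\bar{\omega} \in \Im([K] \setminus \omega)} D_i^*(\bar{P}_{X_1}, \cdots, \bar{P}_{X_K}) + \sum_{k \in \bar{\omega}} \frac{(\theta_k - \sqrt{p_k'})^2}{2 \mu \sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega})} \frac{(\theta_{k'} + \sqrt{p_{k'}'})^2}{2 \mu \sigma_{k'}^2},$$
where $\Im([K] \setminus \omega)$ denotes the power set of $[K] \setminus \omega$, and where for $k = 1, \dots, K$,
$$\bar{P}_{X_k} \triangleq \begin{cases} Q_k^{-1}(\theta_k), & \text{if } k \in \omega, \\ P^{(0)}_{X_k}, & \text{if } k \in \bar{\omega}, \\ P^{(1)}_{X_k}, & \text{if } k \in [K] \setminus (\omega \cup \bar{\omega}). \end{cases} \tag{46}$$
Consequently, the decision error exponent is characterized by the following proposition.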
Consequently, the decision error exponent is characterized by the following proposition.
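Proposition 4. For any $\epsilon > 0$ and $\omega \in \Im([K])$, the decision error exponent by the decision rule (45) satisfies
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n\left( \hat{H} \ne H,\ (\theta_1, \cdots, \theta_K) \in \Theta_{\omega} \right) \ge E - \epsilon, \tag{47}$$
where $E$ is as defined in (33).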
Proof. See Appendix G.
Noticing that the overall decision error probability is
$$\mathbb{P}_n(\hat{H} \ne H) = \sum_{\omega \in \Im([K])} \mathbb{P}_n\left( \hat{H} \ne H,\ (\theta_1, \cdots, \theta_K) \in \Theta_{\omega} \right),$$
the following proposition establishes the achievable error exponent by the coding strategy (38).
Proposition 5. By using the encoders $g_1^*, \cdots, g_K^*$ as defined in (38), and the decision rule $\hat{H}$ from (41), the achievable error exponent is given by $E$, i.e.,
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \ne H) \ge E, \tag{48}$$
where $E$ is as defined in (33).
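The step from Proposition 4 to Proposition 5 is a union-bound argument over the $2^K$ regimes. A sketch of it, assuming $K$ is fixed as $n$ grows and that the limits involved exist, reads:

```latex
\begin{align*}
\mathbb{P}_n(\hat{H} \ne H)
  &= \sum_{\omega \in \Im([K])}
     \mathbb{P}_n\bigl(\hat{H} \ne H,\ (\theta_1, \cdots, \theta_K) \in \Theta_{\omega}\bigr)
   \le 2^K \max_{\omega \in \Im([K])}
     \mathbb{P}_n\bigl(\hat{H} \ne H,\ (\theta_1, \cdots, \theta_K) \in \Theta_{\omega}\bigr), \\
-\frac{1}{n} \log \mathbb{P}_n(\hat{H} \ne H)
  &\ge -\frac{K \log 2}{n}
     + \min_{\omega \in \Im([K])}
       \Bigl( -\frac{1}{n} \log
       \mathbb{P}_n\bigl(\hat{H} \ne H,\ (\theta_1, \cdots, \theta_K) \in \Theta_{\omega}\bigr) \Bigr).
\end{align*}
```

Letting $n \to \infty$, the first term vanishes and (47) bounds each term inside the minimum by $E - \epsilon$; since $\epsilon > 0$ is arbitrary, (48) follows.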
5.3. The Converse Result
In this section, we show that $E$ is indeed an upper bound of (4), which establishes Theorem 3. Our main technique is a genie-aided approach, which provides different kinds of additional information to the nodes and computes the corresponding error exponents under this additional information. As depicted in Figure 5, given an index set $\omega \in \Im([K])$, suppose that for all $k \in \omega$, node $k$ knows the channel noise in advance and can cancel it; then, the channel is effectively noiseless, and the decision center can perfectly receive the empirical distribution $\hat{P}_{X_k}$. On the other hand, suppose that for all $k' \in [K] \setminus \omega$, the true hypothesis $H$ is revealed to node $k'$. With such additional information, we can establish the following upper bound of (4) (cf. (33)).
Proposition 6. Given index set $\omega \in \Im([K])$, suppose that for all $k \in \omega$, the decision center can obtain $\hat{P}_{X_k}$ perfectly. Additionally, for all $k' \in [K] \setminus \omega$, node $k'$ can obtain the true hypothesis $H$. The resulting optimal decision error exponent is
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \ne H) = E_{\omega}, \tag{49}$$
where $E_{\omega}$ is as defined in (32).
Proof. See Appendix H.
[Figure 5: for all $k \in \omega$, node $k$ observes $x_k^{(1)}, \cdots, x_k^{(n)}$ and its empirical distribution $\hat{P}_{X_k}$ reaches the decision center directly; for all $k' \in [K] \setminus \omega$, node $k'$ observes $x_{k'}^{(1)}, \cdots, x_{k'}^{(n)}$ together with $H$, and sends $g_{k'}(\cdot)$ of $(\hat{P}_{X_{k'}}, H)$ over the channel with noise $Z_{k'} \sim \mathcal{N}(0, \sigma_{k'}^2 I_m)$; the decision center outputs $\hat{H}$.]
Figure 5. A geometric explanation of the genie-aided approach, which can lead to $E_{\omega}$ as the upper bound of the error exponent in (4).
Notice that Proposition 6 holds for all $\omega \in \Im([K])$, and that the DHT over AWGN channels without the additional information cannot perform better than the genie-aided system in Proposition 6. We then conclude the following error exponent upper bound.
Proposition 7. For all possible encoders $g_1, \cdots, g_K$ under the power constraint (6), the corresponding error exponent with respect to the LLRT decision rule satisfies
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \ne H) \le E, \tag{50}$$
where $E$ is as defined in (33).
Finally, by combining Propositions 5 and 7, Theorem 3 is proved.
Remark 2 (Local-geometric interpretation). Note that the expression of the optimal error exponent $E$ as defined in (33) is quite intricate, which could limit our understanding. To simplify the analysis, we introduce the local geometry assumption as given in (24). In Appendix I, we demonstrate that the error exponent corresponds to a more manageable expression
$$E = \min_{\omega \in \Im([K])} \left\{ \frac{1}{8} \left\| B_{\omega} B_{\omega}^{\dagger} \left( \psi_{\omega}^{(1)} - \psi_{\omega}^{(0)} \right) \right\|^2 + \sum_{k \in [K] \setminus \omega} \frac{p_k}{2 \mu \sigma_k^2} \right\} + o(\epsilon^2), \tag{51}$$
where for $\omega = \{i_1, \cdots, i_{|\omega|}\}$, we have defined
$$B_{\omega} \triangleq \begin{bmatrix} B_{i_1} & \cdots & B_{i_{|\omega|}} \end{bmatrix}, \tag{52}$$
and $\psi_{\omega}^{(i)} \leftrightarrow [P^{(i)}_{X^K}]_{X_{i_1} \cdots X_{i_{|\omega|}}}$, $i = 0, 1$. Given $\omega \in \Im([K])$, the first term in (51) represents the optimal error exponent (cf. (29)) when the decision center can access the empirical distributions $\hat{P}_{X_k}$, $k \in \omega$. The second term corresponds to the optimal error exponent when node $k$, $k \in [K] \setminus \omega$, knows the true hypothesis $H$ and transmits the bit using BPSK modulation. The total error exponent is the sum of these two parts, and $E$ is the minimum of this sum over all possible splits of the index set $[K]$. In other words, $E$ is determined by the least favorable split between nodes whose empirical distributions reach the decision center and nodes that transmit bits with BPSK modulation.
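The minimization over splits in (51) is a finite search over the power set of $[K]$. The sketch below enumerates the subsets and adds the BPSK term for the complement. It is an illustration only: `first_term` is a hypothetical callable standing in for $\frac{1}{8}\|B_{\omega} B_{\omega}^{\dagger}(\psi_{\omega}^{(1)} - \psi_{\omega}^{(0)})\|^2$, and the table of values, powers and noise variances in the example are made-up numbers.

```python
from itertools import combinations

def power_set(K):
    """All subsets of {1, ..., K} as frozensets."""
    items = range(1, K + 1)
    for size in range(K + 1):
        for subset in combinations(items, size):
            yield frozenset(subset)

def local_exponent(K, first_term, p, sigma2, mu):
    """Minimize first_term(omega) + sum_{k not in omega} p_k / (2 mu sigma_k^2)."""
    best = None
    for omega in power_set(K):
        bpsk = sum(p[k] / (2 * mu * sigma2[k])
                   for k in range(1, K + 1) if k not in omega)
        total = first_term(omega) + bpsk
        if best is None or total < best[0]:
            best = (total, omega)
    return best

# Made-up values of the first term for K = 2 (not derived from any distribution).
table = {
    frozenset(): 0.0,
    frozenset({1}): 0.02,
    frozenset({2}): 0.03,
    frozenset({1, 2}): 0.04,
}
p = {1: 1.0, 2: 1.0}            # assumed transmit powers
sigma2 = {1: 10.0, 2: 100.0}    # assumed noise variances
best_value, best_omega = local_exponent(2, table.__getitem__, p, sigma2, mu=1.0)
```

With these numbers the search compares the four candidate sums 0.055, 0.025, 0.08 and 0.04 and returns the smallest one together with the subset that attains it.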
6. Discussion
This paper discusses the DHT problem over two communication models. The first is the noiseless channel, which is the model most commonly considered in current distributed learning and federated learning systems [9,11]. For the noiseless channels, we show that by using one-dimensional statistics from different nodes, it is possible to achieve the same error exponent as when the decision center has knowledge of the corresponding empirical distributions. This result is significant as it simplifies the coding process at distributed nodes, allowing them to transmit only the necessary statistics rather than the entire empirical distribution, which provides a practical implementation of the result in [5]. This finding supports the practice of transmitting statistics, the most widely used strategy in distributed learning and federated learning [11].
For the AWGN channels, this paper introduces a novel coding strategy that combines decode-and-forward and amplify-and-forward techniques. The underlying concept of this coding strategy is based on the observation that the probability of the empirical distribution deviating significantly from the true marginal distribution diminishes exponentially. Consequently, by employing sufficiently large power, we can transmit the empirical distribution almost perfectly to the decision center while satisfying the average power constraint. When the prior probabilities of the two hypotheses are not 1/2, the strategy still achieves the optimal error exponent; the only difference is that the BPSK points for the two hypotheses are adjusted according to the power constraint. The demonstrated optimality of the achieved decision error exponent further indicates that the proposed coding strategy attains the theoretical limit within the given constraints of the problem.
7. Conclusions
This paper focuses on investigating DHT problems over both noiseless channels and
AWGN channels, where the distributed nodes are constrained to encoding the received
empirical distributions, driven by practical computational considerations. In the first
problem, we demonstrate that utilizing one-dimensional statistics of distributed nodes
and simply summing them up as the decision rule can lead to the optimal error exponent.
For the second problem, we propose a coding strategy that combines decode-and-forward
and amplify-and-forward techniques. We further introduce a genie-aided approach to
establish the optimality of the achieved decision error exponent. Overall, our findings
offer valuable insights into coding techniques for distributed nodes, and the established
strategies can be extended to more general scenarios, broadening the applicability of DHT
in diverse settings.
Author Contributions: X.T., X.X. and S.-L.H. contributed to the conceptualization, methodology,
and writing of this paper. All authors have read and agreed to the published version of the manuscript.
Funding: The research of Shao-Lun Huang is supported in part by National Key R&D Program
of China under Grant 2021YFA0715202 and the Shenzhen Science and Technology Program under
Grant KQTD20170810150821146.
Institutional Review Board Statement: Not applicable.
Data Availability Statement: Data sharing not applicable.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript;
or in the decision to publish the results.
Abbreviations
The following abbreviations are used in this manuscript:
DHT Distributed hypothesis testing
AWGN Additive white Gaussian noise
BPSK Binary phase shift keying
LLRT Log-likelihood ratio test
PAM Pulse amplitude modulation
Appendix A. Proof of Lemma 2
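We have the following facts:
$$D_0^*\bigl(P^{(1)}_{X_1}, \dots, P^{(1)}_{X_K}\bigr) \le D\bigl(P^{(1)}_{X^K} \,\big\|\, P^{(0)}_{X^K}\bigr)$$
and
$$D_1^*\bigl(P^{(0)}_{X_1}, \dots, P^{(0)}_{X_K}\bigr) \le D\bigl(P^{(0)}_{X^K} \,\big\|\, P^{(1)}_{X^K}\bigr),$$
from which we know $\mathcal{D}(\tilde{t}) \neq \emptyset$, where $\tilde{t} \triangleq \min\bigl\{D\bigl(P^{(0)}_{X^K} \big\| P^{(1)}_{X^K}\bigr), D\bigl(P^{(1)}_{X^K} \big\| P^{(0)}_{X^K}\bigr)\bigr\}$. Moreover, from the facts $\mathcal{D}(0) = \emptyset$ and
$$\mathcal{D}(t_1) \subset \mathcal{D}(t_2), \quad \text{for all } 0 \le t_1 \le t_2, \tag{A1}$$
we define
$$t_0 \triangleq \sup\{t \ge 0 : \mathcal{D}(t) = \emptyset\}. \tag{A2}$$
We also have
$$\mathcal{D}(t) \neq \emptyset \implies \mathcal{D}(t - \epsilon) \neq \emptyset \ \text{ for some } \epsilon > 0. \tag{A3}$$
Indeed, since $\mathcal{D}(t)$ is non-empty, $(R_{X_1}, \dots, R_{X_K})$ and $\epsilon > 0$ exist such that
$$D_i^*(R_{X_1}, \dots, R_{X_K}) < t - \epsilon$$
for $i = 0, 1$, and thus $\mathcal{D}(t - \epsilon)$ is non-empty.
To sum up, from (A1)–(A3) we obtain $\mathcal{D}(t) \neq \emptyset$ for all $t > t_0$ and $\mathcal{D}(t) = \emptyset$ for all $t \le t_0$.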
Furthermore, to prove (14), we define
$$\bar{\mathcal{D}}_i(t) \triangleq \{(R_{X_1}, \dots, R_{X_K}) : D_i^*(R_{X_1}, \dots, R_{X_K}) \le t\},$$
and $\bar{\mathcal{D}}(t) \triangleq \bar{\mathcal{D}}_0(t) \cap \bar{\mathcal{D}}_1(t)$. Then, for all $t > t_0$ we have
$$\min_{R_{X_1}, \dots, R_{X_K}} \max_{i \in \{0,1\}} D_i^*(R_{X_1}, \dots, R_{X_K}) = \min_{(R_{X_1}, \dots, R_{X_K}) \in \bar{\mathcal{D}}(t)} \max_{i \in \{0,1\}} D_i^*(R_{X_1}, \dots, R_{X_K}) \in [t_0, t],$$
where the second minimum exists since $\bar{\mathcal{D}}(t)$ is closed and bounded. This implies that $t_0 = E^*$ (cf. (10)). Hence, marginal distributions $\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}$ exist such that
$$D_i^*(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) = E^*, \quad i = 0, 1. \tag{A4}$$
Finally, to illustrate the uniqueness of $(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K})$, suppose that (14) also holds for $(\tilde{R}'_{X_1}, \dots, \tilde{R}'_{X_K}) \neq (\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K})$. Let $\tilde{R}''_{X_k} \triangleq (\tilde{R}_{X_k} + \tilde{R}'_{X_k})/2$ for $k = 1, \dots, K$; then, it follows from the strong convexities of $D_0^*(\cdot)$ and $D_1^*(\cdot)$ that
$$D_i^*(\tilde{R}''_{X_1}, \dots, \tilde{R}''_{X_K}) < t_0, \quad i = 0, 1,$$
which contradicts (A2).
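The midpoint step in the uniqueness argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses the plain K-L divergence $D(\cdot\|P)$ on a single alphabet as a stand-in for $D_i^*(\cdot)$, and the distributions `p`, `r1`, `r2` are hypothetical values that do not come from the paper.

```python
import math

def kl(r, p):
    """Kullback-Leibler divergence D(r || p) in nats for finite distributions."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0)

# Hypothetical reference distribution and two distinct candidate distributions.
p = [0.5, 0.3, 0.2]
r1 = [0.6, 0.2, 0.2]
r2 = [0.3, 0.3, 0.4]

# Midpoint of the two candidates, mirroring R'' = (R + R') / 2.
mid = [(a + b) / 2 for a, b in zip(r1, r2)]

d1, d2, d_mid = kl(r1, p), kl(r2, p), kl(mid, p)
average = (d1 + d2) / 2

# Strict convexity: the divergence at the midpoint is strictly below the average.
print(f"D(r1||p)={d1:.5f}  D(r2||p)={d2:.5f}  D(mid||p)={d_mid:.5f}  average={average:.5f}")
```

If two distinct points both attained the value $t_0$, the same strict inequality would place their midpoint strictly below $t_0$, which is the contradiction used above.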
Appendix B. Proof of Proposition 1
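We know that $\mathcal{D}_i(E^*) \subset \mathcal{S}^{(i)}_X(h^*)$, for $i = 0, 1$. This implies that $\mathcal{S}^{(i)}_X(h^*) \subset \mathcal{D}^c_{1-i}(E^*)$, where for $t \ge 0$ and $i = 0, 1$, we have defined $\mathcal{D}^c_i(t) \triangleq (\mathcal{P}_{\mathcal{X}_1} \times \cdots \times \mathcal{P}_{\mathcal{X}_K}) \setminus \mathcal{D}_i(t)$.
Moreover, let $(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \in \mathcal{P}_{\mathcal{X}_1} \times \cdots \times \mathcal{P}_{\mathcal{X}_K}$ be as defined in Lemma 2; then, we have
$$(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \in \mathcal{L}_X(h^*) = \mathcal{S}^{(0)}_X(h^*) \cap \mathcal{S}^{(1)}_X(h^*). \tag{A5}$$
As a result, for $i = 0, 1$ we have
$$\begin{aligned}
E^* &= D_i^*(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \\
&\ge D\bigl(\mathcal{S}^{(1-i)}_X(h^*) \,\big\|\, P^{(i)}_{X^K}\bigr) \\
&= \min_{(R_{X_1}, \dots, R_{X_K}) \in \mathcal{S}^{(1-i)}_X(h^*)} D_i^*(R_{X_1}, \dots, R_{X_K}) \\
&\ge \min_{(R_{X_1}, \dots, R_{X_K}) \in \mathcal{D}^c_i(E^*)} D_i^*(R_{X_1}, \dots, R_{X_K}) \\
&\ge E^*,
\end{aligned} \tag{A6}$$
which implies (18).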
Appendix C. Proof of Theorem 1
On the one hand, note that from the Markov relation
$$H - (\hat{P}_{X_1}, \dots, \hat{P}_{X_K}) - \bigl(u_1(\hat{P}_{X_1}), \dots, u_K(\hat{P}_{X_K})\bigr),$$
the minimum possible decision error can be obtained when we choose the empirical distributions $\hat{P}_{X_1}, \dots, \hat{P}_{X_K}$ themselves as the statistics.
On the other hand, from Proposition 1, the error exponents associated with the type I error and the type II error are $D\bigl(\mathcal{S}^{(1)}_X(h^*) \,\big\|\, P^{(0)}_{X^K}\bigr)$ and $D\bigl(\mathcal{S}^{(0)}_X(h^*) \,\big\|\, P^{(1)}_{X^K}\bigr)$, respectively. From (18), both exponents are $E^*$, and thus the error exponent for $\mathbb{P}_n(\hat{H} \neq H)$ is also $E^*$.
Appendix D. Proof of Theorem 2
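To begin, we define $\psi \triangleq \psi^{(1)} - \psi^{(0)}$. Then, for given $f_k : \mathcal{X}_k \to \mathbb{R}$ it follows from Lemma 17 of [13] that the exponent based on the feature $h(x^K) = \sum_{k=1}^{K} f_k(x_k)$ is
$$E = \frac{1}{8} \cdot \frac{\langle \psi, \zeta \rangle^2}{\|\zeta\|^2} + o(\epsilon^2),$$
where we have defined $\zeta \triangleq B_0 \phi_0 \in \mathcal{R}(B_0)$, and where $\phi_0$ is as defined in (27).
Then, note that the projection matrix $\Pi_{B_0}$ satisfies $\Pi_{B_0} = (\Pi_{B_0})^2$ and $\zeta = \Pi_{B_0} \zeta$. Therefore, from the Cauchy–Schwarz inequality we have
$$\frac{\langle \psi, \zeta \rangle^2}{\|\zeta\|^2} = \frac{(\psi^{\mathrm{T}} \Pi_{B_0} \zeta)^2}{\|\zeta\|^2} = \frac{\langle \Pi_{B_0} \psi, \zeta \rangle^2}{\|\zeta\|^2} \le \|\Pi_{B_0} \psi\|^2,$$
where the inequality holds with equality if and only if $\zeta$ takes the optimal values
$$\zeta^* = c \cdot \Pi_{B_0} \psi,$$
or equivalently, $B_0 \phi_0^* = c \cdot B_0 B_0^{\dagger} \psi$ for some constant scalar $c \neq 0$.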
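The Cauchy–Schwarz step can be checked numerically. The following is a minimal sketch, assuming a randomly generated matrix in place of $B_0$ and a random vector in place of $\psi$ (both hypothetical); the projector onto the column space is built with Gram–Schmidt rather than with the pseudo-inverse used in the text, which yields the same orthogonal projection.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(columns):
    """Gram-Schmidt: orthonormal basis of the span of the given column vectors."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    return basis

def project(basis, v):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [oi + c * qi for oi, qi in zip(out, q)]
    return out

random.seed(0)
dim, rank = 6, 3
cols = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(rank)]  # stand-in for B_0
psi = [random.gauss(0, 1) for _ in range(dim)]                          # stand-in for psi

basis = orthonormal_basis(cols)
proj_psi = project(basis, psi)
bound = dot(proj_psi, proj_psi)          # squared norm of the projected psi

def ratio(zeta):
    return dot(psi, zeta) ** 2 / dot(zeta, zeta)

def random_zeta():
    """A random vector in the column space, i.e. zeta = B_0 phi for random phi."""
    coeffs = [random.gauss(0, 1) for _ in range(rank)]
    return [sum(c * col[i] for c, col in zip(coeffs, cols)) for i in range(dim)]

largest_ratio = max(ratio(random_zeta()) for _ in range(2000))
ratio_at_optimum = ratio(proj_psi)       # zeta proportional to the projected psi
print(largest_ratio, ratio_at_optimum, bound)
```

No sampled $\zeta$ in the range of the matrix exceeds the bound, and the bound is met when $\zeta$ is proportional to the projection of $\psi$.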
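To determine the value of $c$, note that we have $\zeta^* \leftrightarrow h^*$, where $h^*$ is the optimal feature as defined in (17). Note that in (21), for each $i = 0, 1$, $Q^{(i)}_{X^K}$ depends only on the product $\lambda_i h^*$; we may assume $\lambda_0 = 1/2$ and simply use $\lambda$ to denote $\lambda_1$. Then, we have
$$\begin{aligned}
Q^{(0)}_{X^K}(x^K) &= \tilde{P}^{(\frac{1}{2})}_{X^K}\bigl(x^K; h^*, P^{(0)}_{X^K}\bigr) \\
&= P^{(0)}_{X^K}(x^K) \Bigl[ 1 + \frac{1}{2} \Bigl( h^*(x^K) - \mathbb{E}_{P^{(0)}_{X^K}}\bigl[h^*(X^K)\bigr] \Bigr) + o(\epsilon) \Bigr] \\
&= \Bigl( P_{X^K}(x^K) + \sqrt{P_{X^K}(x^K)}\, \psi^{(0)}(x^K) \Bigr) \cdot \Bigl[ 1 + \frac{1}{2} \frac{\zeta(x^K)}{\sqrt{P_{X^K}(x^K)}} + o(\epsilon) \Bigr] \\
&= P_{X^K}(x^K) + \sqrt{P_{X^K}(x^K)} \cdot \Bigl( \psi^{(0)}(x^K) + \frac{1}{2} \zeta(x^K) \Bigr) + o(\epsilon),
\end{aligned}$$
which implies the correspondence
$$Q^{(0)}_{X^K}(x^K) \leftrightarrow \Bigl( \psi^{(0)} + \frac{1}{2} \zeta + o(\epsilon) \Bigr).$$
Similarly, we have
$$Q^{(1)}_{X^K}(x^K) \leftrightarrow \bigl( \psi^{(1)} + \lambda \zeta + o(\epsilon) \bigr).$$
Then, it follows from the second-order Taylor series expansion of the K-L divergence that (see, e.g., Lemma 10 of [13])
$$\begin{aligned}
D\bigl(Q^{(0)}_{X^K} \,\big\|\, P^{(0)}_{X^K}\bigr) &= \frac{1}{8} \|\zeta\|^2 + o(\epsilon^2), \\
D\bigl(Q^{(1)}_{X^K} \,\big\|\, P^{(1)}_{X^K}\bigr) &= \frac{\lambda^2}{2} \|\zeta\|^2 + o(\epsilon^2).
\end{aligned} \tag{A7}$$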
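The second-order expansion behind (A7) can be illustrated on a small alphabet. This is a sketch under stated assumptions: `p` and `direction` are hypothetical, the perturbation is written as $q = p + \epsilon \cdot \text{direction}$, and the information vector is taken as $(q - p)/\sqrt{p}$ in the local-geometry convention, so that the approximation reads $D(q\|p) \approx \frac{1}{2}\|\xi\|^2$.

```python
import math

def kl(q, p):
    """Kullback-Leibler divergence D(q || p) in nats (all entries assumed positive)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

p = [0.5, 0.3, 0.2]              # hypothetical reference distribution
direction = [1.0, -2.0, 1.0]     # sums to zero, so q remains normalized

def compare(eps):
    q = [pi + eps * di for pi, di in zip(p, direction)]
    xi = [(qi - pi) / math.sqrt(pi) for qi, pi in zip(q, p)]   # information vector
    approx = 0.5 * sum(x * x for x in xi)                      # half squared norm
    return kl(q, p), approx

rel_errors = []
for eps in (1e-1, 1e-2, 1e-3):
    exact, approx = compare(eps)
    rel_errors.append(abs(exact - approx) / approx)
    print(f"eps={eps:g}  exact={exact:.3e}  approx={approx:.3e}  rel_err={rel_errors[-1]:.3e}")
```

The relative gap shrinks roughly in proportion to $\epsilon$, which is what the $o(\epsilon^2)$ remainder predicts.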
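Moreover, note that since (cf. Lemma 9 of [13])
$$\begin{aligned}
\mathbb{E}_{Q^{(0)}_{X^K}}\bigl[h^*(X^K)\bigr] &= \Bigl\langle \psi^{(0)} + \frac{1}{2} \zeta, \zeta \Bigr\rangle + o(\epsilon^2), \\
\mathbb{E}_{Q^{(1)}_{X^K}}\bigl[h^*(X^K)\bigr] &= \bigl\langle \psi^{(1)} + \lambda \zeta, \zeta \bigr\rangle + o(\epsilon^2),
\end{aligned}$$
we have
$$\begin{aligned}
0 &= \mathbb{E}_{Q^{(1)}_{X^K}}\bigl[h^*(X^K)\bigr] - \mathbb{E}_{Q^{(0)}_{X^K}}\bigl[h^*(X^K)\bigr] \\
&= \Bigl\langle \psi + \Bigl(\lambda - \frac{1}{2}\Bigr) \zeta, \zeta \Bigr\rangle + o(\epsilon^2) \\
&= c \Bigl\langle \psi + \Bigl(\lambda - \frac{1}{2}\Bigr) c \cdot \Pi_{B_0} \psi, \Pi_{B_0} \psi \Bigr\rangle + o(\epsilon^2) \\
&= c \cdot \Bigl[ 1 + \Bigl(\lambda - \frac{1}{2}\Bigr) c \Bigr] \cdot \|\Pi_{B_0} \psi\|^2 + o(\epsilon^2).
\end{aligned} \tag{A8}$$
By (A7), the condition $D\bigl(Q^{(0)}_{X^K} \,\big\|\, P^{(0)}_{X^K}\bigr) = D\bigl(Q^{(1)}_{X^K} \,\big\|\, P^{(1)}_{X^K}\bigr)$ gives $\lambda^2 = 1/4$, while (A8) with $c \neq 0$ requires $1 + (\lambda - \frac{1}{2}) c = 0$, which rules out $\lambda = 1/2$. As a result, $c = 1$ and $\lambda = -\frac{1}{2}$.
Then, we obtain
$$\zeta^* = \Pi_{B_0} \psi = B_0 B_0^{\dagger} \psi = B_0 \phi_0^*,$$
where $\phi_0^* \triangleq B_0^{\dagger} \psi$.
Finally, the optimal error exponent is
$$E^* = \frac{1}{8} \cdot \|\Pi_{B_0} \psi\|^2 + o(\epsilon^2) = \frac{1}{8} \cdot \|B_0 \phi_0^*\|^2 + o(\epsilon^2).$$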
Appendix E. Proof of Proposition 2
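According to Sanov's theorem, $\mathbb{P}_n(\hat{P}_{X_k} \in \mathcal{M}_k^c) \doteq \exp\bigl(-n^{1-\gamma}\bigr)$, and $\mathbb{P}_n(\hat{P}_{X_k} \notin \mathcal{M}_k^c) \doteq 1$. Then, we have
$$\frac{\mathbb{P}_n(\hat{P}_{X_k} \in \mathcal{M}_k^c) \cdot (n+1)^{2|\mathcal{X}_k|} \cdot \exp\bigl(2 n^{\frac{1-\gamma}{2}}\bigr)}{\mathbb{P}_n(\hat{P}_{X_k} \notin \mathcal{M}_k^c)} \doteq \exp\bigl(-n^{1-\gamma}\bigr),$$
which converges to 0 as $n \to \infty$. Additionally, for the power constraint,
$$\begin{aligned}
\mathbb{E}\bigl[g_k^{*2}(\hat{P}_{X_k})\bigr] &\le (p_k - \delta(n, \gamma)) \cdot \mathbb{P}_n(\hat{P}_{X_k} \notin \mathcal{M}_k^c) + \Bigl( |\mathcal{M}_k^c| \cdot \exp\bigl(n^{\frac{1-\gamma}{2}}\bigr) \Bigr)^2 \cdot \mathbb{P}(\hat{P}_{X_k} \in \mathcal{M}_k^c) \\
&\le (p_k - \delta(n, \gamma)) \cdot \mathbb{P}_n(\hat{P}_{X_k} \notin \mathcal{M}_k^c) + (n+1)^{2|\mathcal{X}_k|} \cdot \exp\bigl(2 n^{\frac{1-\gamma}{2}}\bigr) \cdot \mathbb{P}(\hat{P}_{X_k} \in \mathcal{M}_k^c) \\
&\le p_k.
\end{aligned}$$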
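To see why the polynomial and sub-exponential factors are dominated, one can track the logarithm of the ratio above. The sketch below treats the dot-equalities as exact, which is a simplification, and the values of `gamma` and `alphabet_size` are hypothetical choices with $0 < \gamma < 1$.

```python
import math

def log_ratio(n, gamma=0.2, alphabet_size=3):
    """log of (n+1)^(2|X_k|) * exp(2 n^((1-gamma)/2)) * exp(-n^(1-gamma))."""
    polynomial = 2 * alphabet_size * math.log(n + 1)   # number-of-types factor
    amplitude = 2 * n ** ((1 - gamma) / 2)             # squared amplitude factor
    sanov = -(n ** (1 - gamma))                        # probability of the rare event
    return polynomial + amplitude + sanov

values = {n: log_ratio(n) for n in (10, 10**2, 10**4, 10**6)}
for n, value in values.items():
    print(f"n={n:>8d}  log ratio = {value:12.2f}")
```

For small $n$ the polynomial factor can still dominate, but the term $-n^{1-\gamma}$ takes over as $n$ grows, so the contribution of the rare event to the transmit power vanishes.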
Appendix F. Proof of Proposition 3
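Note that equivalently,
$$\theta_k = g_k^*(\hat{P}_{X_k}) + \tilde{Z}_k, \tag{A9}$$
where $\tilde{Z}_k \sim \mathcal{N}(0, \sigma_k^2 / m)$. We then apply the typical result for the Gaussian tail [32], i.e., for any $\alpha > 0$,
$$-\lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}\bigl(\tilde{Z}_k > \alpha\bigr) = \frac{\alpha^2}{2 \mu \sigma_k^2},$$
which implies that
$$\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}\Bigl( Q_k^{-1}\bigl(Q_k(\hat{P}_{X_k}) + \tilde{Z}_k\bigr) \neq \hat{P}_{X_k} \Bigr) \ge \lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}\Bigl( |\tilde{Z}_k| > \frac{1}{2} \exp\bigl(n^{\frac{1-\gamma}{2}}\bigr) \Bigr) = \infty.$$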
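The Gaussian tail exponent can be checked with the standard library. This is a sketch under an explicit assumption: the number of channel uses is taken as $m = n/\mu$, which is the relation that makes the variance $\sigma_k^2/m$ consistent with the stated limit; the values of `alpha`, `sigma` and `mu` are hypothetical.

```python
import math

def tail_exponent(n, alpha=1.0, sigma=1.0, mu=1.0):
    """-(1/n) log P(Z > alpha) for Z ~ N(0, sigma^2 / m), assuming m = n / mu."""
    m = n / mu                                   # assumed relation between m and n
    std = sigma / math.sqrt(m)                   # standard deviation of the averaged noise
    tail = 0.5 * math.erfc(alpha / (std * math.sqrt(2.0)))   # Gaussian Q-function
    return -math.log(tail) / n

limit = 1.0 ** 2 / (2 * 1.0 * 1.0 ** 2)          # alpha^2 / (2 mu sigma^2)
exponents = {n: tail_exponent(n) for n in (50, 200, 600)}
for n, value in exponents.items():
    print(f"n={n:4d}  exponent={value:.4f}  limit={limit:.4f}")
```

The finite-$n$ values approach the limit from above, since the Q-function carries an extra sub-exponential factor. Larger $n$ would underflow `math.erfc`, so the sketch stops at moderate block lengths.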
Appendix G. Proof of Proposition 4
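Note that
$$\begin{aligned}
&\mathbb{P}_n\bigl((\theta_1, \cdots, \theta_K), (\theta_1, \cdots, \theta_K) \in \Theta_\omega \,\big|\, H = H_0\bigr) \\
&\doteq \mathbb{P}_n\bigl((\theta_1, \cdots, \theta_K),\ \hat{P}_{X_k} \in \mathcal{M}_k^c\ \forall k \in \omega,\ \hat{P}_{X_{k'}} \notin \mathcal{M}_{k'}^c\ \forall k' \in [K] \setminus \omega \,\big|\, H = H_0\bigr) \qquad \text{(A10)} \\
&= \sum_{\bar{\omega} \in \Im([K] \setminus \omega)} \Bigl\{ \prod_{k' \in \bar{\omega}} \mathbb{P}\bigl(\theta_{k'} \,\big|\, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\bigr) \cdot \prod_{k'' \in [K] \setminus (\omega \cup \bar{\omega})} \mathbb{P}\bigl(\theta_{k''} \,\big|\, \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\bigr) \\
&\qquad \cdot \prod_{k \in \omega} \sum_{\hat{P}_{X_k} \in \mathcal{P}^{(n)}_{\mathcal{X}_k}} \mathbb{P}(\theta_k | \hat{P}_{X_k})\, \mathbb{P}_n\bigl(\hat{P}_{X_k},\ \hat{P}_{X_k} \in \mathcal{M}_k^c\ \forall k \in \omega,\ \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\ \forall k' \in \bar{\omega}, \\
&\qquad\qquad \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\ \forall k'' \in [K] \setminus (\omega \cup \bar{\omega}) \,\big|\, H = H_0\bigr) \Bigr\}, \qquad \text{(A11)}
\end{aligned}$$
where (A10) comes from (42). By decoding the empirical distributions from $\theta_k$ with $Q_k^{-1}(\cdot)$ for $k \in \omega$ and Proposition 3, we have
$$\begin{aligned}
&\sum_{\hat{P}_{X_k} \in \mathcal{P}^{(n)}_{\mathcal{X}_k}} \mathbb{P}(\theta_k | \hat{P}_{X_k})\, \mathbb{P}_n\bigl(\hat{P}_{X_k},\ \hat{P}_{X_k} \in \mathcal{M}_k^c\ \forall k \in \omega,\ \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\ \forall k' \in \bar{\omega},\ \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\ \forall k'' \in [K] \setminus (\omega \cup \bar{\omega}) \,\big|\, H = H_0\bigr) \\
&\doteq \mathbb{P}\bigl(\theta_k \,\big|\, \hat{P}_{X_k} = Q_k^{-1}(\theta_k)\bigr)\, \mathbb{P}_n\bigl(\hat{P}_{X_k} = Q_k^{-1}(\theta_k),\ \hat{P}_{X_k} \in \mathcal{M}_k^c\ \forall k \in \omega,\ \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\ \forall k' \in \bar{\omega}, \\
&\qquad\qquad \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\ \forall k'' \in [K] \setminus (\omega \cup \bar{\omega}) \,\big|\, H = H_0\bigr) \\
&\doteq \mathbb{P}\bigl(\theta_k \,\big|\, \hat{P}_{X_k} = Q_k^{-1}(\theta_k)\bigr) \cdot \exp\bigl(-n \cdot D_0^*(\bar{P}_{X_1}, \cdots, \bar{P}_{X_K})\bigr).
\end{aligned}$$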
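With
$$\mathbb{P}\bigl(\theta_{k'} \,\big|\, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\bigr) \doteq \exp\Bigl( -n \cdot \frac{(\theta_{k'} - \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2} \Bigr),$$
and
$$\mathbb{P}\bigl(\theta_{k''} \,\big|\, \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\bigr) \doteq \exp\Bigl( -n \cdot \frac{(\theta_{k''} + \sqrt{p'_{k''}})^2}{2 \mu \sigma_{k''}^2} \Bigr),$$
we have
$$\mathbb{P}_n\bigl((\theta_1, \cdots, \theta_K), (\theta_1, \cdots, \theta_K) \in \Theta_\omega \,\big|\, H = H_0\bigr) \doteq \sum_{\bar{\omega} \in \Im([K] \setminus \omega)} \Bigl\{ \prod_{k \in \omega} \mathbb{P}\bigl(\theta_k \,\big|\, \hat{P}_{X_k} = Q_k^{-1}(\theta_k)\bigr) \cdot \exp\bigl(-n \cdot \tilde{E}_0^{\omega}(\theta_1, \cdots, \theta_K)\bigr) \Bigr\}.$$
Similarly,
$$\mathbb{P}_n\bigl((\theta_1, \cdots, \theta_K), (\theta_1, \cdots, \theta_K) \in \Theta_\omega \,\big|\, H = H_1\bigr) \doteq \sum_{\bar{\omega} \in \Im([K] \setminus \omega)} \Bigl\{ \prod_{k \in \omega} \mathbb{P}\bigl(\theta_k \,\big|\, \hat{P}_{X_k} = Q_k^{-1}(\theta_k)\bigr) \cdot \exp\bigl(-n \cdot \tilde{E}_1^{\omega}(\theta_1, \cdots, \theta_K)\bigr) \Bigr\}.$$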
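Note that $\mathbb{P}\bigl(\theta_k \,\big|\, \hat{P}_{X_k} = Q_k^{-1}(\theta_k)\bigr)$ is not related to $\bar{\omega}$ and $H$, and then we can derive the decision rule (45) with LLRT. To compute the error exponent, we use Proposition 3 and the fact that $\mathbb{P}\bigl(\theta_k \,\big|\, \hat{P}_{X_k} = Q_k^{-1}(\theta_k)\bigr) \doteq 1$ when $\theta_k = Q_k(\hat{P}_{X_k})$. Then, the optimal error exponent corresponds to
$$\min_{\{\hat{P}_{X_k}\}_{k \in \omega},\, \{\theta_{k'}\}_{k' \in [K] \setminus \omega}}\ \max_{i = 0, 1}\ \min_{\bar{\omega} \in \Im([K] \setminus \omega)} \Bigl[ D_i^*\bigl(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}\bigr) + \sum_{k \in \bar{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega})} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2} \Bigr], \tag{A12}$$
where for $k = 1, \cdots, K$, and $\bar{\omega} \in \Im([K] \setminus \omega)$,
$$\bar{R}^{\bar{\omega}}_{X_k} \triangleq \begin{cases} \hat{P}_{X_k}, & \text{if } k \in \omega \\ P^{(0)}_{X_k}, & \text{if } k \in \bar{\omega} \\ P^{(1)}_{X_k}, & \text{if } k \in [K] \setminus (\omega \cup \bar{\omega}) \end{cases}. \tag{A13}$$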
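The inner minimization over $\bar{\omega}$ in (A12) can be sketched by enumerating subsets. The example below is a toy illustration under stated assumptions: it evaluates only the two Gaussian-like sums and omits the divergence term $D_i^*$, the subsets of $[K] \setminus \omega$ are enumerated with `itertools.combinations`, and all numerical values and helper names (`subsets`, `assignment`, `gaussian_penalty`) are hypothetical.

```python
import math
from itertools import combinations

def subsets(items):
    """Yield every subset of `items` as a frozenset (the candidates for omega_bar)."""
    items = sorted(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def assignment(k, omega, omega_bar):
    """Which marginal the case distinction in (A13) assigns to node k."""
    if k in omega:
        return "empirical"   # decoded type of node k
    if k in omega_bar:
        return "P0"          # marginal under H_0
    return "P1"              # marginal under H_1

def gaussian_penalty(theta, omega, omega_bar, p, sigma2, mu):
    """Sum of the two Gaussian-like terms in (A12); the divergence term is omitted."""
    total = 0.0
    for k in theta:
        if k in omega:
            continue
        center = math.sqrt(p[k]) if k in omega_bar else -math.sqrt(p[k])
        total += (theta[k] - center) ** 2 / (2 * mu * sigma2[k])
    return total

# Toy instance: K = 3 nodes; node 0 sends its type, nodes 1 and 2 send BPSK points.
omega = {0}
others = {1, 2}
theta = {1: 1.0, 2: -1.0}
p = {1: 1.0, 2: 1.0}
sigma2 = {1: 1.0, 2: 1.0}
mu = 1.0

penalties = {ob: gaussian_penalty(theta, omega, ob, p, sigma2, mu) for ob in subsets(others)}
best = min(penalties, key=penalties.get)
labels = [assignment(k, omega, best) for k in range(3)]
print(sorted(best), penalties[best], labels)
```

In this toy instance the received values sit exactly on the BPSK points, so the subset that labels node 1 with $P^{(0)}$ and node 2 with $P^{(1)}$ incurs zero Gaussian penalty.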
To finish the proof, we introduce the following lemma.
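Lemma A1. For arbitrary functions $v_1, \cdots, v_\ell : \mathcal{Z} \mapsto \mathbb{R}$ and $w_1, \cdots, w_{\ell'} : \mathcal{Z} \mapsto \mathbb{R}$, where $\mathcal{Z}$ is a given set, we have
$$\min_{z \in \mathcal{Z}} \max\bigl\{ \min\{v_1(z), \cdots, v_\ell(z)\},\ \min\{w_1(z), \cdots, w_{\ell'}(z)\} \bigr\} = \min_{i \in \{1, \cdots, \ell\},\, j \in \{1, \cdots, \ell'\}}\ \min_{z \in \mathcal{Z}} \max\bigl\{ v_i(z), w_j(z) \bigr\}. \tag{A14}$$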
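The identity (A14) rests on the pointwise fact that $\max\{\min_i v_i, \min_j w_j\} = \min_{i,j} \max\{v_i, w_j\}$. A minimal numerical check, assuming a finite set $\mathcal{Z}$ and randomly tabulated functions (all values hypothetical):

```python
import random

random.seed(1)
Z = range(50)                                            # finite stand-in for the set Z
v = [[random.random() for _ in Z] for _ in range(3)]     # v_1, v_2, v_3 tabulated on Z
w = [[random.random() for _ in Z] for _ in range(4)]     # w_1, ..., w_4 tabulated on Z

# Left-hand side of (A14): minimize over z the max of the two inner minima.
lhs = min(max(min(vi[z] for vi in v), min(wj[z] for wj in w)) for z in Z)

# Right-hand side of (A14): minimize over pairs (i, j), then over z.
rhs = min(min(max(vi[z], wj[z]) for z in Z) for vi in v for wj in w)

print(lhs, rhs)
```

Both sides select one of the tabulated values, so they agree exactly rather than only up to rounding.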
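With Lemma A1, we only need to compare each component in (A12), i.e.,
$$\begin{aligned}
\min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)}\ \min_{\{\hat{P}_{X_k}\}_{k \in \omega},\, \{\theta_{k'}\}_{k' \in [K] \setminus \omega}} \max\Bigl\{ & D_0^*\bigl(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}\bigr) + \sum_{k \in \bar{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega})} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2}, \\
& D_1^*\bigl(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}\bigr) + \sum_{k \in \bar{\omega}'} \frac{(\theta_k - \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega}')} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2} \Bigr\}.
\end{aligned} \tag{A15}$$
Given $\bar{\omega}$ and $\bar{\omega}'$, let $\tilde{\omega} = \bar{\omega} \cap \bar{\omega}'$. By selecting $\theta_k = \sqrt{p'_k}$ for $k \in \tilde{\omega}$ and $\theta_k = -\sqrt{p'_k}$ for $k \in [K] \setminus (\omega \cup (\bar{\omega} \cup \bar{\omega}'))$ in the minimization of (A15), (A15) equals
$$\begin{aligned}
\min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)}\ \min_{\{\hat{P}_{X_k}\}_{k \in \omega},\, \{\theta_{k'}\}_{k' \in \bar{\omega} \cup \bar{\omega}' \setminus \tilde{\omega}}} \max\Bigl\{ & D_0^*\bigl(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}\bigr) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2}, \\
& D_1^*\bigl(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}\bigr) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k + \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} - \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2} \Bigr\}.
\end{aligned} \tag{A16}$$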
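In the following, we denote $\Omega \triangleq [K] \setminus (\omega \cup (\bar{\omega} \cup \bar{\omega}'))$. For those indices $k \in \tilde{\omega}$ or $k \in \Omega$, although they do not contribute to the Gaussian-like error exponents, they restrict that $\bar{R}^{\bar{\omega}}_{X_k} = \bar{R}^{\bar{\omega}'}_{X_k} = P^{(0)}_{X_k}$ or $\bar{R}^{\bar{\omega}}_{X_k} = \bar{R}^{\bar{\omega}'}_{X_k} = P^{(1)}_{X_k}$. By letting $\bar{R}^{\bar{\omega}}_{X_k} = \bar{R}^{\bar{\omega}'}_{X_k} = \hat{P}_{X_k}$ ($k \in \tilde{\omega}$ or $k \in \Omega$), which can be optimized, we find the lower bound of (A15):
$$\begin{aligned}
\text{(A15)} \ge{} & \min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)}\ \min_{\{\hat{P}_{X_k}\}_{k \in \omega \cup \tilde{\omega} \cup \Omega},\, \{\theta_{k'}\}_{k' \in \bar{\omega} \cup \bar{\omega}' \setminus \tilde{\omega}}} \max\Bigl\{ D_0^*\bigl(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}\bigr) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2}, \\
& \qquad D_1^*\bigl(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}\bigr) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k + \sqrt{p'_k})^2}{2 \mu \sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} - \sqrt{p'_{k'}})^2}{2 \mu \sigma_{k'}^2} \Bigr\} \\
={} & \min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)} E_{\omega \cup \tilde{\omega} \cup \Omega} - \epsilon \ \ge\ E_e - \epsilon,
\end{aligned} \tag{A17}$$
where we have used the fact that $\lim_{n \to \infty} p'_k = p_k$,
$$D_0^*\bigl(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}\bigr) = D_0^{\omega \cup \tilde{\omega} \cup \Omega}\bigl(\{\hat{P}_{X_k}\}_{k \in \omega \cup \tilde{\omega} \cup \Omega}\bigr),$$
$$D_1^*\bigl(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}\bigr) = D_1^{\omega \cup \tilde{\omega} \cup \Omega}\bigl(\{\hat{P}_{X_k}\}_{k \in \omega \cup \tilde{\omega} \cup \Omega}\bigr),$$
and have substituted $-\theta_{k'}$ for $\theta_{k'}$.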
Appendix H. Proof of Proposition 6
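Let the encoders for $k \in [K] \setminus \omega$ be functions of $H$ and $\hat{P}_{X_k}$. The upper bound comes from the fact that the type is also generated from the hypothesis $H$. Therefore, the encoder on both the hypothesis and the type is just a function of the true hypothesis. Suppose that $\rho_k : \{0, 1\} \mapsto \mathbb{R}^m$ ($k \in [K] \setminus \omega$) satisfies $\frac{1}{m} \mathbb{E}\bigl[\|\rho_k(H)\|^2\bigr] \le p_k$. Let $\rho_k^{(i)}$ denote the $i$-th entry of $\rho_k$, and
$$\rho_k^{(i)}(H) \triangleq \begin{cases} \kappa_k^{(i)}, & \text{if } H = H_0 \\ \bar{\kappa}_k^{(i)}, & \text{if } H = H_1 \end{cases}, \tag{A18}$$
where $\frac{1}{2} \kappa_k^{(i)2} + \frac{1}{2} \bar{\kappa}_k^{(i)2} = p_k^{(i)}$ and $\sum_{i=1}^{m} p_k^{(i)} = p_k$. The error exponent with respect to the LLRT is
$$\begin{aligned}
\min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta_k^{(i)}\}_{k \in [K] \setminus \omega,\, i = 1, \cdots, m}} \max\Bigl\{ & \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{(\theta_k^{(i)} - \kappa_k^{(i)})^2}{2 \sigma_k^2} + D_0^{\omega}\bigl(\{R_{X_k}\}_{k \in \omega}\bigr), \\
& \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{(\theta_k^{(i)} - \bar{\kappa}_k^{(i)})^2}{2 \sigma_k^2} + D_1^{\omega}\bigl(\{R_{X_k}\}_{k \in \omega}\bigr) \Bigr\}.
\end{aligned} \tag{A19}$$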
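Here, we explain the optimality of $\bar{\kappa}_k^{(i)} = -\kappa_k^{(i)} = -\sqrt{p_k^{(i)}}$, under which let $R^*_{X_k}, \theta_k^{(i)*}$ be the solution to problem (A19). For other pairs of $(\bar{\kappa}_k^{(i)}, \kappa_k^{(i)})$, $|\bar{\kappa}_k^{(i)} - \kappa_k^{(i)}| < 2\sqrt{p_k^{(i)}}$. Let
$$\tilde{\theta}_k^{(i)*} = \kappa_k^{(i)} + (\bar{\kappa}_k^{(i)} - \kappa_k^{(i)}) \cdot \frac{\sqrt{p_k^{(i)}} - \theta_k^{(i)*}}{2\sqrt{p_k^{(i)}}},$$
so that $\theta_k^{(i)*} = \sqrt{p_k^{(i)}}$ is mapped to $\kappa_k^{(i)}$ and $\theta_k^{(i)*} = -\sqrt{p_k^{(i)}}$ is mapped to $\bar{\kappa}_k^{(i)}$. Then, we have
$$\frac{\bigl(\theta_k^{(i)*} - \sqrt{p_k^{(i)}}\bigr)^2}{2 \sigma_k^2} \ge \frac{\bigl(\tilde{\theta}_k^{(i)*} - \kappa_k^{(i)}\bigr)^2}{2 \sigma_k^2},$$
and
$$\frac{\bigl(\theta_k^{(i)*} + \sqrt{p_k^{(i)}}\bigr)^2}{2 \sigma_k^2} \ge \frac{\bigl(\tilde{\theta}_k^{(i)*} - \bar{\kappa}_k^{(i)}\bigr)^2}{2 \sigma_k^2},$$
which will lead to a smaller error exponent (cf. (A19)) and the optimality is proved. The solution to problem (A19) is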
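$$\begin{aligned}
& \lim_{n \to \infty}\ \min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta_k^{(i)}\}_{k \in [K] \setminus \omega,\, i = 1, \cdots, m}} \max\Bigl\{ \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{\bigl(\theta_k^{(i)} - \sqrt{p_k^{(i)}}\bigr)^2}{2 \sigma_k^2} + D_0^{\omega}\bigl(\{R_{X_k}\}_{k \in \omega}\bigr), \\
& \qquad\qquad \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{\bigl(\theta_k^{(i)} + \sqrt{p_k^{(i)}}\bigr)^2}{2 \sigma_k^2} + D_1^{\omega}\bigl(\{R_{X_k}\}_{k \in \omega}\bigr) \Bigr\} \\
& = \min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta_k\}_{k \in [K] \setminus \omega}} \max\Bigl\{ D_0^{\omega}\bigl(\{R_{X_k}\}_{k \in \omega}\bigr) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k - \sqrt{p_k})^2}{2 \mu \sigma_k^2},\ D_1^{\omega}\bigl(\{R_{X_k}\}_{k \in \omega}\bigr) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k + \sqrt{p_k})^2}{2 \mu \sigma_k^2} \Bigr\} \\
& = E_\omega.
\end{aligned} \tag{A20}$$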
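The optimality of the antipodal choice can be illustrated for a single entry. The sketch below isolates the Gaussian part of (A19) and ignores the divergence terms, which is a simplification; it uses the fact that the minimum over $\theta$ of the larger of two quadratics centered at $\kappa$ and $\bar{\kappa}$ is attained at their midpoint. The power `p`, the noise variance and the grid resolution are hypothetical.

```python
import math

def gaussian_part(kappa, kappa_bar, sigma2=1.0):
    """min over theta of max{(theta-kappa)^2, (theta-kappa_bar)^2} / (2 sigma^2).

    The minimizing theta is the midpoint, so the value depends only on the gap.
    """
    half_gap = (kappa - kappa_bar) / 2
    return half_gap ** 2 / (2 * sigma2)

p = 2.0                       # hypothetical per-entry power budget
radius = math.sqrt(2 * p)     # kappa^2 / 2 + kappa_bar^2 / 2 = p  <=>  circle of this radius

# Scan all constellation pairs that meet the power constraint with equality.
angles = [2 * math.pi * j / 3600 for j in range(3600)]
best_value, best_angle = max(
    (gaussian_part(radius * math.cos(a), radius * math.sin(a)), a) for a in angles
)
best_kappa = radius * math.cos(best_angle)
best_kappa_bar = radius * math.sin(best_angle)
print(best_value, best_kappa, best_kappa_bar)
```

The scan selects a pair with $\bar{\kappa} = -\kappa$ and $|\kappa| = \sqrt{p}$, with value $p / (2\sigma^2)$, in agreement with the BPSK points used in (A20).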
Appendix I
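Based on the results in Appendix D, $E_\omega$ as defined in (32) satisfies
$$\begin{aligned}
E_\omega = \min_{\phi_\omega \in \mathbb{R}^{k_\omega},\, \{\theta_k\}_{k \in [K] \setminus \omega}} \max\Bigl\{ & \frac{1}{8} \bigl\| B_\omega \bigl(B_\omega^{\dagger} \psi_\omega^{(0)} - \phi_\omega\bigr) \bigr\|^2 + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k - \sqrt{p_k})^2}{2 \mu \sigma_k^2}, \\
& \frac{1}{8} \bigl\| B_\omega \bigl(B_\omega^{\dagger} \psi_\omega^{(1)} - \phi_\omega\bigr) \bigr\|^2 + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k + \sqrt{p_k})^2}{2 \mu \sigma_k^2} \Bigr\} + o(\epsilon^2),
\end{aligned} \tag{A21}$$
where $k_\omega \triangleq \sum_{k \in \omega} |\mathcal{X}_k|$, and then the result can be easily verified using Lagrangian multipliers.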
References
1. Han, T.S.; Amari, S. Statistical inference under multiterminal data compression. IEEE Trans. Inf. Theory 1998, 44, 2300–2324.
[CrossRef]
2. Ahlswede, R.; Csiszár, I. Hypothesis testing with communication constraints. IEEE Trans. Inf. Theory 1986, 32, 533–542. [CrossRef]
3. Han, T.S.; Kobayashi, K. Exponential-type error probabilities for multiterminal hypothesis testing. IEEE Trans. Inf. Theory 1989,
35, 2–14. [CrossRef]
4. Amari, S.I.; Han, T.S. Statistical inference under multiterminal rate restrictions: A differential geometric approach. IEEE Trans.
Inf. Theory 1989, 35, 217–227. [CrossRef]
5. Watanabe, S. Neyman–Pearson test for zero-rate multiterminal hypothesis testing. IEEE Trans. Inf. Theory 2017, 64, 4923–4939.
[CrossRef]
6. Shimokawa, H.; Han, T.S.; Amari, S. Error bound of hypothesis testing with data compression. In Proceedings of the 1994 IEEE
International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 114. [CrossRef]
7. Xu, X.; Huang, S.L. On Distributed Learning with Constant Communication Bits. IEEE J. Sel. Areas Inf. Theory 2022, 3, 125–134.
[CrossRef]
8. Sreekumar, S.; Gündüz, D. Strong Converse for Testing Against Independence over a Noisy channel. In Proceedings of the 2020
IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 1283–1288. [CrossRef]
9. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-Efficient Learning of Deep Networks from
Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale,
FL, USA, 20–22 April 2017; Volume 54, pp. 1273–1282.
10. Vapnik, V. Principles of Risk Minimization for Learning Theory. In Proceedings of the 4th International Conference on Neural
Information Processing Systems, San Francisco, CA, USA, 2–5 December 1991; pp. 831–838.
11. Srivastava, N.; Salakhutdinov, R. Multimodal learning with deep boltzmann machines. J. Mach. Learn. Res. 2014, 15, 2949–2980.
12. Cover, T.M.; Thomas, J.A. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing); Wiley-Interscience:
Hoboken, NJ, USA, 2006.
13. Huang, S.L.; Makur, A.; Wornell, G.W.; Zheng, L. On universal features for high-dimensional learning and inference. arXiv 2019,
arXiv:1911.09105.
14. Han, T.S. Hypothesis testing with multiterminal data compression. IEEE Trans. Inf. Theory 1987, 33, 759–772. [CrossRef]
15. Scardapane, S.; Wang, D.; Panella, M.; Uncini, A. Distributed learning for random vector functional-link networks. Inf. Sci. 2015,
301, 271–284.
16. Georgopoulos, L.; Hasler, M. Distributed machine learning in networks by consensus. Neurocomputing 2014, 124, 2–12. [CrossRef]
17. Tsitsiklis, J.; Athans, M. On the complexity of decentralized decision making and detection problems. IEEE Trans. Autom. Control
1985, 30, 440–446. [CrossRef]
18. Tsitsiklis, J.N. Decentralized detection by a large number of sensors. Math. Control. Signals Syst. 1988, 1, 167–182. [CrossRef]
19. Tenney, R.R.; Sandell, N.R. Detection with distributed sensors. IEEE Trans. Aerosp. Electron. Syst. 1981, AES-17, 501–510.
[CrossRef]
20. Shalaby, H.M.; Papamarcou, A. Multiterminal detection with zero-rate data compression. IEEE Trans. Inf. Theory 1992, 38, 254–267.
[CrossRef]
21. Zhao, W.; Lai, L. Distributed testing with zero-rate compression. In Proceedings of the 2015 IEEE International Symposium on
Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 2792–2796.
22. Sreekumar, S.; Gündüz, D. Distributed hypothesis testing over noisy channels. In Proceedings of the 2017 IEEE International
Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 983–987.
23. Zaidi, A. Hypothesis Testing Against Independence Under Gaussian Noise. In Proceedings of the 2020 IEEE International
Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 1289–1294. [CrossRef]
24. Salehkalaibar, S.; Wigger, M.A. Distributed hypothesis testing over a noisy channel. In Proceedings of the International Zurich
Seminar on Information and Communication (IZS 2018), Zurich, Switzerland, 21–23 February 2018; pp. 25–29.
25. Weinberger, N.; Kochman, Y.; Wigger, M. Exponent trade-off for hypothesis testing over noisy channels. In Proceedings of the
2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1852–1856.
26. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
27. Csiszár, I.; Shields, P.C. Information Theory and Statistics: A Tutorial; Now Publishers Inc.: Delft, The Netherlands, 2004.
28. Csiszár, I. The method of types [information theory]. IEEE Trans. Inf. Theory 1998, 44, 2505–2523. [CrossRef]
29. Huang, S.L.; Xu, X.; Zheng, L. An information-theoretic approach to unsupervised feature selection for high-dimensional data.
IEEE J. Sel. Areas Inf. Theory 2020, 1, 157–166. [CrossRef]
30. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2012.
31. Graham, R.L.; Knuth, D.E.; Patashnik, O.; Liu, S. Concrete mathematics: A foundation for computer science. Comput. Phys. 1989,
3, 106–107. [CrossRef]
32. Blair, J.; Edwards, C.; Johnson, J.H. Rational Chebyshev approximations for the inverse of the error function. Math. Comput. 1976,
30, 827–830. [CrossRef]