entropy

Article

On the Optimal Error Exponent of Type-Based Distributed Hypothesis Testing †

Xinyi Tong 1, Xiangxiang Xu 2,‡ and Shao-Lun Huang 2,*

1 Tsinghua–Berkeley Shenzhen Institute, Shenzhen 518055, China; txy18@mails.tsinghua.edu.cn
2 Tsinghua Shenzhen International Graduate School, Shenzhen 518055, China; xuxx@mit.edu
* Correspondence: twn2gold@gmail.com
† This work was presented in part at the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Victoria, Australia, 12–20 July 2021.
‡ Current address: Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.

Abstract: Distributed hypothesis testing (DHT) has emerged as a significant research area, but the information-theoretic optimality of coding strategies is often hard to address. This paper studies DHT problems under the type-based setting, which is motivated by popular federated learning methods. Specifically, two communication models are considered: (i) the DHT problem over noiseless channels, where each node observes i.i.d. samples and sends a one-dimensional statistic of the observed samples to the decision center for decision making; and (ii) the DHT problem over AWGN channels, where the distributed nodes are restricted to transmit functions of the empirical distributions of the observed data sequences due to practical computational constraints. For both problems, we present the optimal error exponent by providing both the achievability and converse results. In addition, we offer the corresponding coding strategies and decision rules. Our results not only offer coding guidance for distributed systems, but also have the potential to be applied to more complex problems, enhancing the understanding and application of DHT in various domains.

Keywords: hypothesis testing; distributed system; information theory; local geometry

Citation: Tong, X.; Xu, X.; Huang, S.-L. On the Optimal Error Exponent of Type-Based Distributed Hypothesis Testing. Entropy 2023, 25, 1434. https://doi.org/10.3390/e25101434

Academic Editors: T. Aaron Gulliver and Songze Li

Received: 4 August 2023
Revised: 1 October 2023
Accepted: 8 October 2023
Published: 10 October 2023

Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Entropy 2023, 25, 1434. https://doi.org/10.3390/e25101434 https://www.mdpi.com/journal/entropy

1. Introduction

Distributed hypothesis testing (DHT) is a significant problem in the field of information theory [1]. In this problem, each distributed node observes partial data generated from the joint distribution and transmits an encoded message through a communication channel to a decision center, which aims to detect the true hypothesis. The primary goal of DHT is to maximize the decision error exponent in the asymptotic regime, and many different communication models [2–6] have been considered in the literature. The main challenges of DHT arise in two respects. Firstly, due to the intricate distributed structures, most of the existing works have focused on demonstrating achievability results, with converse results being limited to specific cases, such as the 1-bit [3], $\log_2 3$-bit [7], and $O(\log_2 n)$-bit [1] communication channels. Secondly, many of the achievability results were established using random coding with auxiliary random variables [8], which are difficult to implement in real systems.

Notice that the distributed encoders in many real applications are required to process high-dimensional data [9], such as images, text, and audio. Consequently, many federated learning algorithms focus on computing quantities such as the statistics, empirical risks, and gradients of the data [10], which can be viewed as certain functions of the empirical distribution (type) of the data. For example, given the data $x_1, \dots, x_n$ and a feature function $f(x)$, the statistic $\frac{1}{n} \sum_{i=1}^{n} f(x_i) = \sum_{x} \hat{P}_X(x) f(x)$ is a linear function of the empirical distribution $\hat{P}_X$.

Motivated by this observation, we investigate the optimal decision error exponent of DHT based on the empirical distributions (type-based) under two common communication models. The first problem considers a noiseless channel, which is the typical mathematical model in real federated learning scenarios. This reflects the fact that federated learning often assumes that the nodes and the central machine can exchange information without error, while the dimensionality of the transmitted signals is limited [9]. Specifically, it is assumed that each node can only transmit the empirical mean of a one-dimensional feature, and such settings have gained significant attention recently in federated and multi-modal machine learning [9,11]. The second problem assumes that the signal of each node, encoded from the empirical distribution, is transmitted over an additive white Gaussian noise (AWGN) channel, which is a widely used mathematical model for real-world channels [12]. The main goal of this paper is to establish the optimal error exponent for these two problems by presenting: (i) the converse bound for the error exponent; and (ii) a practical coding strategy that achieves the converse bound.

The contributions of this paper are summarized as follows. First, in Section 4.1, we demonstrate the optimal error exponent for the type-based hypothesis testing over noiseless channels, and provide the one-dimensional feature functions for all nodes together with the corresponding decision rule.
Moreover, by applying the information geometric approach in [13], the hypotheses and the feature functions of each node can be modeled as vectors in the joint and marginal distribution spaces, respectively. In Section 4.3, the optimal feature function of each node is interpreted as a decomposition of the hypothesis vector in the joint distribution space into vectors in the marginal distribution spaces, where each decomposed component indicates the contribution of the corresponding node in making the inference.

Second, we establish the optimal achievable error exponent of the type-based hypothesis testing over AWGN channels by presenting both the achievability and converse results. In particular, the achievability part is based on a mixture coding strategy that combines the amplify-and-forward and decode-and-forward strategies. Specifically, when the observed empirical distribution at a distributed node is sufficiently close to one of the true marginal distributions under the two hypotheses, the node is confident of the true hypothesis. In that case, we apply the decode-and-forward strategy, which first estimates the true hypothesis based on the observed empirical distribution, and then transmits the decoded bit to the decision center using binary phase shift keying (BPSK). On the other hand, when the observed empirical distribution is far from both true marginal distributions, we apply the amplify-and-forward strategy, which encodes the observed empirical distribution and transmits it to the decision center using pulse amplitude modulation (PAM). By applying the proposed coding strategy and conducting the log-likelihood ratio test at the decision center, we show in Section 5.2 the achievable error exponent. Finally, we demonstrate the converse results of the error exponent in Section 5.3 based on a genie-aided approach. The main idea is to provide additional information to the distributed nodes.
By either revealing the true hypothesis to the distributed nodes or eliminating the channel noise, we show that the error exponent in Section 5.2 is also an upper bound of the optimal error exponent, which establishes the optimality.

2. Problem Formulations

Suppose that there are $K$ random variables $X^K \triangleq (X_1, \dots, X_K)$. In this paper, we consider the binary hypothesis testing problem, and the two hypotheses $H_0$ and $H_1$ are defined as:

$$
\begin{aligned}
H_0 &: (x_1^{(1)}, \cdots, x_K^{(1)}), \cdots, (x_1^{(n)}, \cdots, x_K^{(n)}) \overset{\text{i.i.d.}}{\sim} P^{(0)}_{X^K}, \\
H_1 &: (x_1^{(1)}, \cdots, x_K^{(1)}), \cdots, (x_1^{(n)}, \cdots, x_K^{(n)}) \overset{\text{i.i.d.}}{\sim} P^{(1)}_{X^K},
\end{aligned}
\tag{1}
$$

where the observable data are generated i.i.d. according to either $P^{(0)}_{X^K}$ or $P^{(1)}_{X^K}$ from the alphabets $(\mathcal{X}_1, \cdots, \mathcal{X}_K)$. In addition, we assume that there are $K$ distributed nodes, where the $k$-th ($k = 1, \cdots, K$) node can only observe the samples $X_k \triangleq \{x_k^{(1)}, \dots, x_k^{(n)}\}$.

To facilitate clarity in our illustration, we concentrate on the discrete case, assuming that each alphabet $\mathcal{X}_k$ is discrete, and $\mathcal{X} \triangleq \mathcal{X}_1 \times \cdots \times \mathcal{X}_K$. In addition, for a joint distribution $Q_{X^K} \in \mathcal{P}^{\mathcal{X}}$, we use $[Q_{X^K}]_{X_k}$ to denote its marginal distribution with respect to $X_k$. We also denote $P^{(i)}_{X_1}, \cdots, P^{(i)}_{X_K}$ as the marginal distributions of $P^{(i)}_{X^K}$, for $i = 0, 1$. In the distributed hypothesis testing problem, we introduce a common assumption in the distributed setup [14] that the generating distributions $P^{(0)}_{X^K}$ and $P^{(1)}_{X^K}$ satisfy $D(P^{(0)}_{X^K} \| P^{(1)}_{X^K}) < \infty$ and $D(P^{(1)}_{X^K} \| P^{(0)}_{X^K}) < \infty$, to avoid trivial irregularities. Due to the type-based restriction, we further assume that $P^{(0)}_{X_k} \neq P^{(1)}_{X_k}$, $k = 1, \cdots, K$. Otherwise, the transmitted message, as a function of the empirical distribution, would be uninformative for distinguishing the hypotheses. In the following, we denote $\hat{P}_{X_k}$ as the empirical distribution of $X_k$, defined as:

$$
\hat{P}_{X_k}(x_k) \triangleq \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left\{ x_k^{(i)} = x_k \right\}. \tag{2}
$$

2.1. Type-Based Hypothesis Testing over Noiseless Channels

As shown in Figure 1, node $k$ ($k = 1, \cdots, K$) can encode the observed data $X_k$ and transmit a scalar signal by a function $u_k$. Due to the computational requirement introduced in Section 1, we impose the restriction that the encoder $u_k$ depends explicitly on the empirical distribution $\hat{P}_{X_k}$, i.e., $u_k : \mathcal{P}^{\mathcal{X}_k} \mapsto \mathbb{R}$, where $\mathcal{P}^{\mathcal{X}_k}$ denotes the set of probability distributions defined on the alphabet $\mathcal{X}_k$. The most direct method is to transmit the empirical distributions by encoding them into the real space, which can lead to computational difficulty for federated learning data. In this paper, we further consider one of the most commonly used approaches in federated learning [15,16] and assume that $u_k$ computes a one-dimensional statistic

$$
u_k(\hat{P}_{X_k}) = \frac{1}{n} \sum_{i=1}^{n} f_k(x_k^{(i)}) = \mathbb{E}_{\hat{P}_{X_k}}[f_k(X_k)], \tag{3}
$$

where the feature function $f_k : \mathcal{X}_k \mapsto \mathbb{R}$. Then, the decision center collects the statistics $\{u_k(\hat{P}_{X_k})\}_{k=1}^{K}$ and makes a decision $\hat{H}$ on the true hypothesis. We prove in Section 4 that the further restriction to computing empirical means of features is without loss of generality, in the sense that the decisions are as good as those made when the types themselves are observed. Additionally, the error probability is defined as

$$
\mathbb{P}_n(\hat{H} \neq H) \triangleq \sum_{i \in \{0,1\}} P_H(H_i) \, \mathbb{P}_n(\hat{H} \neq H \mid H = H_i),
$$

where $H$ denotes the true hypothesis, $P_H(H_0)$ and $P_H(H_1)$ are the prior probabilities, and $\mathbb{P}_n(\cdot)$ is the probability measure defined from the data sampling process (1). In particular, we focus on the asymptotic error decaying rate, i.e., the error exponent, defined as

$$
E \triangleq \lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \neq H), \tag{4}
$$

where all logarithms are base $e$ unless otherwise specified. The goal is to find the maximal error exponent of (4) and to design the feature functions $f_1, \cdots, f_K$ and the detailed decision rule such that this error exponent can be achieved based on the log-likelihood ratio test (LLRT).
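The identity in (3) can be checked numerically. The following is a minimal sketch in Python with NumPy, not taken from the paper: the alphabet $\{0, 1, 2, 3\}$, the distribution `p_true`, and the feature values `f` are arbitrary assumptions chosen only for illustration. It computes the type (2) of a sample sequence and confirms that the empirical mean of the feature equals the linear functional of the type.

```python
import numpy as np

def empirical_distribution(samples, alphabet_size):
    """Type of an integer-coded sample sequence, as in Eq. (2)."""
    counts = np.bincount(samples, minlength=alphabet_size)
    return counts / len(samples)

def statistic_from_type(p_hat, f):
    """One-dimensional statistic u_k written as a linear function of the type, Eq. (3)."""
    return float(np.dot(p_hat, f))

rng = np.random.default_rng(0)
alphabet_size = 4
p_true = np.array([0.1, 0.2, 0.3, 0.4])  # assumed data distribution (illustrative)
f = np.array([-1.0, 0.5, 2.0, 0.0])      # assumed feature values f(x) (illustrative)

# Draw n = 1000 i.i.d. samples and compute their type.
samples = rng.choice(alphabet_size, size=1000, p=p_true)
p_hat = empirical_distribution(samples, alphabet_size)

# The two sides of Eq. (3): direct sample mean vs. expectation under the type.
sample_mean = float(np.mean(f[samples]))
type_mean = statistic_from_type(p_hat, f)
print(sample_mean, type_mean)
```

Because the statistic depends on the samples only through `p_hat`, a node that stores the type can evaluate any feature mean without revisiting the raw data.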
[Figure 1: for each node $k = 1, \cdots, K$, the samples $x_k^{(1)}, \cdots, x_k^{(n)}$ are mapped to the type $\hat{P}_{X_k}$ and then to the statistic $\mathbb{E}_{\hat{P}_{X_k}}[f_k(X_k)]$, which is sent to the decision center; the decision center outputs $\hat{H}$.]

Figure 1. The transmission procedures for the type-based distributed hypothesis testing problem over noiseless channels.

2.2. Type-Based Hypothesis Testing over AWGN Channels

As depicted in Figure 2, we employ the same hypothesis testing formulation as presented in (1). In this context, nodes $1$ through $K$ encode their respective observations with functions $g_1, \cdots, g_K$ and transmit the resulting length-$m$ sequences over additive white Gaussian noise (AWGN) channels to the decision center. To accommodate the computational constraints, we require that the encoder $g_k$ ($k = 1, \cdots, K$) is a function of the empirical distribution $\hat{P}_{X_k}$, i.e.,

$$
g_k : \mathcal{P}^{\mathcal{X}_k} \mapsto \mathbb{R}^m, \quad k = 1, \cdots, K. \tag{5}
$$

Moreover, the average power constraints of the AWGN channels are:

$$
\frac{1}{m} \mathbb{E}\left[ \left\| g_k(\hat{P}_{X_k}) \right\|^2 \right] \leq p_k, \quad k = 1, \cdots, K, \tag{6}
$$

where the expectations are taken over the data sampling process defined in (1). Then, the decision center makes a decision $\hat{H}$ based on the received signals $g_1(\hat{P}_{X_1}) + Z_1, \cdots, g_K(\hat{P}_{X_K}) + Z_K$, where the noises are drawn from

$$
Z_k \sim \mathcal{N}\left(0, \sigma_k^2 I_m\right), \quad k = 1, \cdots, K, \tag{7}
$$

and $I_m$ denotes the $m \times m$ identity matrix. Additionally, we make the following assumption so that the errors arising from the AWGN channels and from the decision process are comparable, and the trade-off between them can be described. In detail, we assume that the sequence length $m$ also increases with $n$, and there exists a positive constant $\mu$ such that

$$
\lim_{n \to \infty} \frac{n}{m(n)} = \mu. \tag{8}
$$

Our goal is to design the optimal encoders $g_1, \cdots, g_K$, subject to the constraints (5) and (6), as well as the decision rule $\hat{H}$, such that the error exponent defined in (4) is maximized. Here we assume $P_H(H_0) = P_H(H_1) = \frac{1}{2}$ to obtain explicit mathematical expressions.
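To make the constraints (5) to (8) concrete, here is a deliberately simplified sketch in Python with NumPy. It is not the coding scheme analyzed in Section 5: it implements only a hypothetical BPSK repetition encoder whose bit is a local guess computed from the type. The names `local_bit`, `bpsk_encoder`, and `awgn_channel`, the marginals `p0` and `p1`, and the numerical values of `n`, `mu`, `power`, and `sigma` are all assumptions introduced for illustration.

```python
import numpy as np

def kl(q, p):
    """KL divergence D(q || p) in nats, with the convention 0 log 0 = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def local_bit(p_hat, p0, p1):
    """Hypothetical local guess: 1 if the type is closer (in KL) to p1 than to p0."""
    return int(kl(p_hat, p1) < kl(p_hat, p0))

def bpsk_encoder(p_hat, p0, p1, m, power):
    """Encoder g_k: type -> R^m as in Eq. (5); every codeword has (1/m)||g||^2 = power."""
    sign = 1.0 if local_bit(p_hat, p0, p1) == 1 else -1.0
    return sign * np.sqrt(power) * np.ones(m)

def awgn_channel(signal, sigma, rng):
    """Add i.i.d. N(0, sigma^2) noise to each coordinate, as in Eq. (7)."""
    return signal + sigma * rng.standard_normal(signal.shape)

rng = np.random.default_rng(1)
n, mu = 200, 2.0
m = int(round(n / mu))            # block length chosen so that n / m = mu, cf. Eq. (8)
power, sigma = 1.5, 1.0           # assumed power budget p_k and noise level sigma_k

p0 = np.array([0.5, 0.5])         # assumed marginal under H0
p1 = np.array([0.2, 0.8])         # assumed marginal under H1
p_hat = np.array([0.3, 0.7])      # an observed type (fixed here for reproducibility)

codeword = bpsk_encoder(p_hat, p0, p1, m, power)
received = awgn_channel(codeword, sigma, rng)

avg_power = float(np.sum(codeword ** 2) / m)   # left-hand side of Eq. (6) for this codeword
decoded_bit = int(np.sum(received) > 0)        # matched-filter decision at the center
print(m, avg_power, decoded_bit)
```

Since every codeword of this encoder has per-symbol energy exactly `power`, the expectation in (6) is met with equality regardless of the data distribution.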
[Figure 2: for each node $k = 1, \cdots, K$, the samples $x_k^{(1)}, \cdots, x_k^{(n)}$ are mapped to the type $\hat{P}_{X_k}$ before encoding and transmission.]

Figure 2. The transmission procedures for the type-based distributed hypothesis testing problem over AWGN channels.

3. Related Works

Distributed hypothesis testing problems, also known as multiterminal hypothesis testing [1,3,14] or decentralized detection [17,18], have been extensively explored in the literature. In scenarios where each node observes a single sample and sends an encoded message to the central machine, the authors of [17] demonstrated that determining the optimal coding scheme is NP-hard, while [18,19] provided characterizations of the minimum decoding error rate and the optimal coding scheme for conditionally independent nodes. Furthermore, in situations where each node observes $n$ samples and transmits an encoded message to the decision center, [3,5,14,20] investigated the optimal decoding error exponents for the case of $K = 2$ nodes, with [21] generalizing the results to $K > 2$ nodes.
Additionally, the author of [5] studied the Neyman–Pearson-like test, which further constrained the encoded messages to be empirical means of functions of the samples, and provided optimal functions for the scenario with $K = 2$ nodes. The result presented in Section 4 can be viewed as a generalization of such setups to the case with $K > 2$ nodes.

On the other hand, DHT over noisy channels represents a novel and highly significant sub-problem within the broader context. While current research has primarily focused on transmission over discrete memoryless channels, certain aspects of this sub-problem have been investigated. For instance, some studies have explored scenarios involving side information [22] and cases that counteract independence assumptions [23]. Additionally, optimal Type-II error considerations have been examined [24], along with investigations into the optimal pairs of Type-I and Type-II errors [25]. Diverging from the existing literature, the present paper studies the DHT problem over the widely considered AWGN channels while also addressing the implications of computational demands. This extends the understanding of DHT to a broader set of channel conditions.

4. Type-Based Hypothesis Testing over Noiseless Channels

In this section, we present the optimal error exponent along with the corresponding decision rule for the type-based hypothesis testing over noiseless channels. We commence by introducing the optimal error exponent under the condition that the decision center has access to the empirical distributions from the different nodes.

Definition 1.
The quantities $D_i^*(R_{X_1}, \cdots, R_{X_K})$, for $i = 0, 1$, are defined as

$$
D_i^*(R_{X_1}, \cdots, R_{X_K}) \triangleq \min_{Q_{X^K} \in \mathcal{S}} D(Q_{X^K} \| P^{(i)}_{X^K}), \tag{9}
$$

where

$$
\mathcal{S} \triangleq \left\{ Q_{X^K} : [Q_{X^K}]_{X_k} = R_{X_k}, \ k = 1, \cdots, K \right\},
$$

which represents the set of all distributions with given marginals $R_{X_1}, \cdots, R_{X_K}$.

The following result provides the operational meaning of (9), which can be proved by Sanov's theorem [12].

Lemma 1. When $H_i$ is the true hypothesis, the probability that nodes $1, \cdots, K$ observe the empirical distributions $\hat{P}_{X_1}, \cdots, \hat{P}_{X_K}$, respectively, is given by

$$
\mathbb{P}_n(\hat{P}_{X_1}, \cdots, \hat{P}_{X_K} \mid H = H_i) \doteq \exp\left( -n D_i^*(\hat{P}_{X_1}, \cdots, \hat{P}_{X_K}) \right), \quad i = 0, 1,
$$

where $\doteq$ is the conventional dot-equal notation, i.e., we denote $f_n \doteq g_n$ when $\lim_{n \to \infty} \frac{1}{n} \log f_n = \lim_{n \to \infty} \frac{1}{n} \log g_n$. In addition, by applying the log-likelihood ratio test to detect the true hypothesis, the optimal decision error exponent based on the empirical distributions is

$$
E^* \triangleq \min_{R_{X_1}, \cdots, R_{X_K}} \max_{i \in \{0,1\}} D_i^*(R_{X_1}, \cdots, R_{X_K}). \tag{10}
$$

Note that the type-based hypothesis testing problem assumes that the signal from each node is a function of the empirical distribution. Hence, the optimal error exponent in (4) will not exceed $E^*$. In the following, we prove that the error exponent $E^*$ can be achieved and provide the corresponding decision rule.

4.1. Optimal Feature

First, we introduce the following definitions of exponential and linear families, which will be useful for delineating our results.

Definition 2 (Exponential family). Given a distribution $P_Z(z)$ and a function $T : \mathcal{Z} \to \mathbb{R}$, we define the distribution $\tilde{P}_Z^{(\lambda)}(\,\cdot\,; T, P_Z)$ as

$$
\tilde{P}_Z^{(\lambda)}(z; T, P_Z) \triangleq P_Z(z) \exp(\lambda T(z) - \alpha(\lambda)), \quad \text{for all } z \in \mathcal{Z}, \tag{11}
$$

with $\alpha(\lambda) \triangleq \log \sum_{z' \in \mathcal{Z}} P_Z(z') \exp(\lambda T(z'))$. In addition, we use

$$
\mathcal{E}_{\mathcal{Z}}(T, P_Z) \triangleq \left\{ \tilde{P}_Z^{(\lambda)}(\,\cdot\,; T, P_Z) : \lambda \in \mathbb{R} \right\} \tag{12}
$$

to denote the exponential family passing through $P_Z$ with $T$ being the natural statistic.

Definition 3 (Linear family). Given a function $h : \mathcal{Z} \to \mathbb{R}$, we define the linear family $\mathcal{L}_{\mathcal{Z}}(h)$ as

$$
\mathcal{L}_{\mathcal{Z}}(h) \triangleq \left\{ Q_Z \in \mathcal{P}^{\mathcal{Z}} : \mathbb{E}_{Q_Z}[h(Z)] = 0 \right\}. \tag{13}
$$

In addition, we define the half-spaces $\mathcal{S}^{(0)}_{\mathcal{Z}}(h)$ and $\mathcal{S}^{(1)}_{\mathcal{Z}}(h)$ as

$$
\begin{aligned}
\mathcal{S}^{(0)}_{\mathcal{Z}}(h) &\triangleq \left\{ Q_Z \in \mathcal{P}^{\mathcal{Z}} : \mathbb{E}_{Q_Z}[h(Z)] \leq 0 \right\}, \\
\mathcal{S}^{(1)}_{\mathcal{Z}}(h) &\triangleq \left\{ Q_Z \in \mathcal{P}^{\mathcal{Z}} : \mathbb{E}_{Q_Z}[h(Z)] \geq 0 \right\}.
\end{aligned}
$$

Then, for $i = 0, 1$ and $t > 0$, we define the sets

$$
\mathcal{D}_i(t) \triangleq \left\{ (R_{X_1}, \dots, R_{X_K}) : D_i^*(R_{X_1}, \dots, R_{X_K}) < t \right\}.
$$

We also define $\mathcal{D}(t) \triangleq \mathcal{D}_0(t) \cap \mathcal{D}_1(t)$. It can be verified that, for all $t \geq 0$, both $\mathcal{D}_0(t)$ and $\mathcal{D}_1(t)$ are convex subsets of $\mathcal{P}^{\mathcal{X}_1} \times \cdots \times \mathcal{P}^{\mathcal{X}_K}$, and thus $\mathcal{D}(t)$ is also convex. In addition, we have the following lemma.

Lemma 2. For $E^*$ as defined in (10), we have $\mathcal{D}(t) = \emptyset$ for all $t \in [0, E^*]$ and $\mathcal{D}(t) \neq \emptyset$ for all $t > E^*$. Additionally, a unique $(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \in \mathcal{P}^{\mathcal{X}_1} \times \cdots \times \mathcal{P}^{\mathcal{X}_K}$ exists such that

$$
D_0^*(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) = D_1^*(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) = E^*. \tag{14}
$$

Proof. See Appendix A.

Based on Lemma 2, it follows from the separating hyperplane theorem (see, e.g., Section 2.5.1 of [26]) that functions $(f_1^*, \dots, f_K^*)$, where $f_k^* : \mathcal{X}_k \to \mathbb{R}$, $k = 1, \cdots, K$, exist such that for all $(R_{X_1}, \dots, R_{X_K}) \in \mathcal{D}_0(E^*)$,

$$
\sum_{i=1}^{K} \sum_{x_i \in \mathcal{X}_i} R_{X_i}(x_i) f_i^*(x_i) = \sum_{i=1}^{K} \mathbb{E}_{R_{X_i}}[f_i^*(X_i)] \leq 0, \tag{15}
$$

and for all $(R_{X_1}, \dots, R_{X_K}) \in \mathcal{D}_1(E^*)$,

$$
\sum_{i=1}^{K} \mathbb{E}_{R_{X_i}}[f_i^*(X_i)] \geq 0. \tag{16}
$$

Furthermore, we denote

$$
h^*(x^K) \triangleq \sum_{i=1}^{K} f_i^*(x_i). \tag{17}
$$

Given $P_Z \in \mathcal{P}^{\mathcal{Z}}$ and $\mathcal{S} \subset \mathcal{P}^{\mathcal{Z}}$, we adopt the notation [27,28] $D(\mathcal{S} \| P_Z) \triangleq \inf_{Q_Z \in \mathcal{S}} D(Q_Z \| P_Z)$, where $\mathcal{P}^{\mathcal{Z}}$ denotes the set of all distributions supported on $\mathcal{Z}$. Then we have the following proposition.

Proposition 1. The optimal exponent $E^*$ as defined in (10) satisfies

$$
E^* = D\left( \mathcal{S}^{(0)}_{\mathcal{X}}(h^*) \,\middle\|\, P^{(1)}_{X^K} \right) = D\left( \mathcal{S}^{(1)}_{\mathcal{X}}(h^*) \,\middle\|\, P^{(0)}_{X^K} \right). \tag{18}
$$

Proof. See Appendix B.

Consequently, we establish the optimality of $E^*$ and provide the corresponding decision rule.

Theorem 1. Let $f_1^*, \dots, f_K^*$ denote the features as defined in (15) and (16). The optimal error exponent of (4) is given by

$$
\lim_{n \to \infty} -\frac{1}{n} \log \mathbb{P}_n(\hat{H} \neq H) = E^*, \tag{19}
$$

where $E^*$ is defined in (10).
In addition, the corresponding decision rule $\hat{H}$ is

$$
\sum_{k=1}^{K} \mathbb{E}_{\hat{P}_{X_k}}[f_k^*(X_k)] \ \underset{\hat{H} = H_0}{\overset{\hat{H} = H_1}{\gtrless}} \ 0. \tag{20}
$$

Proof. See Appendix C.

4.2. General Geometric Structure

The geometry associated with Proposition 1 and Theorem 1 is depicted in Figure 3. In this figure, each point represents a distribution in $\mathcal{P}^{\mathcal{X}}$, and the decision boundary (20) corresponds to the linear family $\mathcal{L}_{\mathcal{X}}(h^*)$ defined as in (13). In addition, from Corollary 3.1 of [27], $\lambda_0, \lambda_1 \in \mathbb{R}$ exist such that

$$
Q^{(i)}_{X^K} \triangleq \tilde{P}^{(\lambda_i)}_{X^K}(\,\cdot\,; h^*, P^{(i)}_{X^K}), \quad i = 0, 1, \tag{21}
$$

satisfy

$$
D\left( \mathcal{S}^{(1-i)}_{\mathcal{X}}(h^*) \,\middle\|\, P^{(i)}_{X^K} \right) = D\left( Q^{(i)}_{X^K} \,\middle\|\, P^{(i)}_{X^K} \right), \tag{22}
$$

where $\tilde{P}^{(\lambda_i)}_{X^K}(\,\cdot\,; h^*, P^{(i)}_{X^K})$, $i = 0, 1$, are as defined in (11). In this context, $Q^{(0)}_{X^K}$ and $Q^{(1)}_{X^K}$ in (21) are the I-projections [27] of $P^{(0)}_{X^K}$ and $P^{(1)}_{X^K}$ onto this linear family, respectively, which also induces the two exponential families $\mathcal{E}_{\mathcal{X}}(h^*, P^{(0)}_{X^K})$ and $\mathcal{E}_{\mathcal{X}}(h^*, P^{(1)}_{X^K})$ with $h^*$ as their common natural statistic. Additionally, all the points in $\mathcal{D}_0(E^*)$ and $\mathcal{D}_1(E^*)$ are separated by the linear family $\mathcal{L}_{\mathcal{X}}(h^*)$.
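The relation between the exponential family (11) and the linear family (13) used in (21) and (22) can be illustrated numerically. The sketch below, in Python with NumPy, tilts a distribution $P$ by $\exp(\lambda h)$ and finds by bisection the $\lambda$ at which the tilted distribution lands on $\mathcal{L}(h)$; this works because $\lambda \mapsto \mathbb{E}_{\tilde{P}^{(\lambda)}}[h]$ is nondecreasing. The three-point distribution `p` and the function `h` are arbitrary assumptions (in the paper, $h^*$ comes from the separating hyperplane argument, which is not reproduced here), and the helper names are hypothetical.

```python
import numpy as np

def tilt(p, h, lam):
    """Exponentially tilted distribution of Eq. (11) with natural statistic h."""
    w = p * np.exp(lam * h)
    return w / w.sum()

def log_partition(p, h, lam):
    """alpha(lambda) = log sum_z P(z) exp(lambda h(z))."""
    return float(np.log(np.sum(p * np.exp(lam * h))))

def kl(q, p):
    """KL divergence D(q || p) in nats, with the convention 0 log 0 = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def i_projection(p, h, lo=-50.0, hi=50.0, iters=200):
    """Bisection on lambda so that the tilted distribution satisfies E[h] = 0.

    Assumes p has full support and h takes both negative and positive values,
    so that a root exists inside [lo, hi].
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.dot(tilt(p, h, mid), h) < 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, tilt(p, h, lam)

p = np.array([0.6, 0.3, 0.1])     # assumed distribution with E_p[h] < 0
h = np.array([-1.0, 0.5, 2.0])    # assumed statistic (illustrative, not h* from the paper)

lam, q = i_projection(p, h)
divergence = kl(q, p)             # equals -alpha(lam), since E_q[h] = 0
print(lam, q, divergence)
```

On the linear family, $D(Q \| P) = \lambda \mathbb{E}_Q[h] - \alpha(\lambda) = -\alpha(\lambda)$, which gives a quick consistency check for the value of `divergence`.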
Figure 3. The geometric structure in distributed hypothesis testing, with $Q^{(i)}_{X^K}$ denoting the I-projection of $P^{(i)}_{X^K}$ onto the linear family $\mathcal{L}_X(h^*)$, $i = 0, 1$; the linear family $\mathcal{L}_X(h^*)$ divides $\mathcal{D}_0(E^*)$ and $\mathcal{D}_1(E^*)$ into different half-spaces.

4.3.
Local Information Geometric Analysis

Although an explicit information geometry has been shown, we apply the local information geometric framework [13] to provide fundamental insights into this problem. Some useful notations and definitions in local information geometry are introduced as follows.

Definition 4 ($\epsilon$-neighborhood). Given a finite alphabet $\mathcal{Z}$, and letting $R_Z$ be a distribution supported on $\mathcal{Z}$ with all entries being positive, its $\epsilon$-neighborhood $\mathcal{N}^{\mathcal{Z}}_{\epsilon}(R_Z)$ is defined as

$$\mathcal{N}^{\mathcal{Z}}_{\epsilon}(R_Z) \triangleq \left\{ P_Z \in \mathcal{P}^{\mathcal{Z}} : \sum_{z \in \mathcal{Z}} \frac{(P_Z(z) - R_Z(z))^2}{R_Z(z)} \le \epsilon^2 \right\}.$$

Then, with $R_Z$ used as the reference distribution, each distribution $P_Z \in \mathcal{P}^{\mathcal{Z}}$ can be equivalently expressed as a vector $\phi \in \mathbb{R}^{|\mathcal{Z}|}$ or a function $f : \mathcal{Z} \to \mathbb{R}$ with

$$\phi(z) \triangleq \frac{P_Z(z) - R_Z(z)}{\sqrt{R_Z(z)}}, \qquad f(z) \triangleq \frac{\phi(z)}{\sqrt{R_Z(z)}}, \qquad \forall z \in \mathcal{Z}, \tag{23}$$

referred to as the information vector and feature function associated with $P_Z$, respectively. This provides a three-way correspondence $P_Z \leftrightarrow \phi \leftrightarrow f$, which will be useful in our derivations.

Based on Definition 4, we introduce the local assumption that

$$P^{(i)}_{X^K} \in \mathcal{N}^{\mathcal{X}}_{\epsilon}(P_{X^K}), \quad \text{for } i = 0, 1. \tag{24}$$

We use $\psi^{(i)} \leftrightarrow P^{(i)}_{X^K}$, $i = 0, 1$, to represent the corresponding information vectors [cf. (23)]. For each $k = 1, \dots, K$, and given feature $f_k : \mathcal{X}_k \to \mathbb{R}$, we define the corresponding information vector $\phi_k \in \mathbb{R}^{|\mathcal{X}_k|}$, where $P_{X_k} \triangleq [P_{X^K}]_{X_k}$ is used as the reference distribution. Note that for $i = 0, 1$, the correspondence $B_k^{\mathrm{T}} \psi^{(i)} \leftrightarrow P^{(i)}_{X_k}$ exists, where $P^{(i)}_{X_k} \triangleq [P^{(i)}_{X^K}]_{X_k}$ represents the corresponding marginal distributions. Specifically, $B_k$ is an $|\mathcal{X}| \times |\mathcal{X}_k|$ dimensional matrix with entries [29]

$$B_k(x^K, \hat{x}_k) \triangleq \sqrt{\frac{P_{X^K}(x^K)}{P_{X_k}(\hat{x}_k)}}\; \delta_{x_k \hat{x}_k}, \tag{25}$$

where $\delta_{x_k \hat{x}_k}$ represents the Kronecker delta.

Moreover, the feature $f_k$ defined on $\mathcal{X}_k$, when considered as a mapping from $\mathcal{X}$ to $\mathbb{R}$, corresponds to the information vector $B_k \phi_k$ in $\mathbb{R}^{|\mathcal{X}|}$.
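The two correspondences stated around (25) can be checked numerically. The following is a minimal sketch for $K = 2$ binary nodes, assuming a hypothetical reference distribution `P` and a hypothetical perturbed distribution `P0`; nodes are indexed from 0 in the code, and all names are illustrative rather than taken from the paper.

```python
import math

# Hypothetical reference joint distribution P_{X^K} for K = 2 binary nodes.
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
JOINT = sorted(P)  # fixed ordering of the joint alphabet

def marginal(dist, k):
    """Marginal of node k (0-based) of a joint distribution."""
    out = {}
    for x, prob in dist.items():
        out[x[k]] = out.get(x[k], 0.0) + prob
    return out

def info_vector(dist, ref, keys):
    """phi(z) = (P(z) - R(z)) / sqrt(R(z)), cf. (23)."""
    return [(dist[z] - ref[z]) / math.sqrt(ref[z]) for z in keys]

def B(k):
    """Matrix B_k of (25): rows indexed by x^K, columns by values of x_k."""
    Pk = marginal(P, k)
    cols = sorted(Pk)
    rows = [[math.sqrt(P[x] / Pk[a]) if x[k] == a else 0.0 for a in cols]
            for x in JOINT]
    return rows, cols

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

B1, cols1 = B(0)
P1 = marginal(P, 0)

# Check 1: B_k phi_k is the information vector of f_k viewed as a function on
# the joint alphabet, i.e. sqrt(P(x^K)) * f_k(x_k).
f1 = {0: 1.0, 1: -1.0}                       # zero-mean feature under P1
phi1 = [math.sqrt(P1[a]) * f1[a] for a in cols1]
lifted = matvec(B1, phi1)
direct = [math.sqrt(P[x]) * f1[x[0]] for x in JOINT]

# Check 2: B_k^T psi is the information vector of the marginal distribution.
P0 = {(0, 0): 0.42, (0, 1): 0.09, (1, 0): 0.19, (1, 1): 0.30}
psi0 = info_vector(P0, P, JOINT)
projected = matvec(transpose(B1), psi0)
marg_vec = info_vector(marginal(P0, 0), P1, cols1)
```

Both `lifted` versus `direct` and `projected` versus `marg_vec` should agree entry by entry, which is exactly what the definition (25) is designed to achieve.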
Leveraging this correspondence, we can further establish the information vector for $h(x^K) = \sum_{k=1}^{K} f_k(x_k)$ as

$$\sum_{i=1}^{K} B_i \phi_i = B_0 \phi_0 \in \mathbb{R}^{|\mathcal{X}|}, \tag{26}$$

where we have defined

$$B_0 \triangleq \begin{bmatrix} B_1 & \cdots & B_K \end{bmatrix} \quad \text{and} \quad \phi_0 \triangleq \begin{bmatrix} \phi_1 \\ \vdots \\ \phi_K \end{bmatrix}, \tag{27}$$

and where for each $k = 1, \dots, K$, $\phi_k \in \mathbb{R}^{|\mathcal{X}_k|}$ denotes the information vector corresponding to $f_k$. Additionally, given a matrix $A \in \mathbb{R}^{m_1 \times m_2}$, we use $A^{\dagger}$ to denote its Moore–Penrose inverse [30], and we define the associated column space $\mathcal{R}(A) \triangleq \{Ax : x \in \mathbb{R}^{m_2}\}$ and projection matrix $\Pi_A \triangleq A A^{\dagger}$. Then, we can establish the local counterpart of $E^*$ in Theorem 1 as follows.

Theorem 2. Under the local assumption (24), let $\psi^{(i)} \leftrightarrow P^{(i)}_{X^K}$, $i = 0, 1$, denote the corresponding information vectors. Then, for $h^*$ as defined in (17), we have the correspondence $h^* \leftrightarrow B_0 \phi_0^*$, where

$$\phi_0^* \triangleq B_0^{\dagger}\left(\psi^{(1)} - \psi^{(0)}\right), \tag{28}$$

and where $B_0$ is defined in (27). In addition, the optimal exponent $E^*$ in (10) can be expressed as

$$E^* = \frac{1}{8}\left\|B_0 \phi_0^*\right\|^2 + o(\epsilon^2). \tag{29}$$

Proof. See Appendix D.

Note that from Theorem 2, we have

$$h^* \leftrightarrow B_0 B_0^{\dagger}\left(\psi^{(1)} - \psi^{(0)}\right) = \Pi_{B_0}\left(\psi^{(1)} - \psi^{(0)}\right),$$

where $\Pi_{B_0}$ is the projection matrix associated with the subspace $\mathcal{R}(B_0)$. The optimal feature $B_0 \phi_0^*$ in (26) corresponds to the projection of the sufficient statistic $f_{\mathrm{LLR}} \leftrightarrow (\psi^{(1)} - \psi^{(0)})$ onto the function space that encompasses all possible $h$ of the form $h(x^K) = \sum_{k=1}^{K} f_k(x_k)$. In other words, $B_0 \phi_0^*$ represents the best approximation of $f_{\mathrm{LLR}}$ within the function space of interest, which leads to the optimal decision error exponent $E^*$ as shown in (29). Moreover, from (26), this optimal feature can be decomposed into $K$ components in the subspaces $\mathcal{R}(B_k)$, for $k = 1, \dots, K$,

$$B_0 \phi_0^* = \sum_{k=1}^{K} B_k \phi_k^*, \tag{30}$$

where $\phi_0^*$ is stacked by $\phi_k^* \in \mathbb{R}^{|\mathcal{X}_k|}$, $k = 1, \dots, K$, as in (27). This decomposition structure is depicted in Figure 4 for the case $K = 2$.

[Figure 4 labels: subspaces $\mathcal{R}(B_1)$, $\mathcal{R}(B_2)$, $\mathcal{R}(B_0)$; vectors $B_1\phi_1^*$, $B_2\phi_2^*$ and $B_0\phi_0^* = B_1\phi_1^* + B_2\phi_2^*$; orthogonal projections $\Pi_{B_1}(B_0\phi_0^*)$ and $\Pi_{B_2}(B_0\phi_0^*)$.]

Figure 4.
The information decomposition structure in distributed hypothesis testing with $K = 2$ nodes, compared with the orthogonal decompositions on the subspace $\mathcal{R}(B_k)$ for each node $k = 1, 2$.

Remark 1. The vectors $B_k \phi_k^*$ are not simply the orthogonal projections of $B_0 \phi_0^*$ onto the subspaces $\mathcal{R}(B_k)$, since these subspaces, for $k = 1, \dots, K$, are not mutually orthogonal. Therefore, the decomposition of $B_0 \phi_0^*$ depends on the Gram matrix [30] of the subspaces $\mathcal{R}(B_k)$, as illustrated in Figure 4. Furthermore, the orthogonal projection of $B_0 \phi_0^*$ onto the subspace $\mathcal{R}(B_k)$ can be interpreted as characterizing the optimal error exponent of the binary hypothesis testing problem with the observations of $X_k$ alone [12].

When the subspaces $\mathcal{R}(B_k)$ are orthogonal to each other, the optimal inference approach is straightforward: the optimal information is extracted from each node by orthogonal projection. However, when the subspaces $\mathcal{R}(B_k)$ are not orthogonal, different nodes may share various forms of common information. Our result shows how to handle this shared information and extract the optimal features through the decomposition of the information vector over non-orthogonal subspaces.

5. Type-Based Hypothesis Testing over AWGN Channels

This section presents the optimal error exponent of the type-based hypothesis testing problem over AWGN channels, along with the corresponding coding strategy. To begin, we introduce several notations that will help in the presentation of the results.

Definition 5.
Let $[K] \triangleq \{1, 2, \cdots, K\}$, and for subset $\omega \subseteq [K]$, $i = 0, 1$, we define

$$D^{\omega}_i\left(\{R_{X_k}\}_{k \in \omega}\right) \triangleq \min_{Q_{X^K} \in \mathcal{S}_{\omega}} D\left(Q_{X^K} \,\middle\|\, P^{(i)}_{X^K}\right), \tag{31}$$

where

$$\mathcal{S}_{\omega} \triangleq \left\{ Q_{X^K} : [Q_{X^K}]_{X_k} = R_{X_k},\ k \in \omega \right\}.$$

It is easy to see that $D^*_i(\cdot) = D^{[K]}_i(\cdot)$, where $D^*_i(\cdot)$ is as defined in (9). Moreover, we define the following error exponent with respect to $\omega \subseteq [K]$:

$$E_{\omega} \triangleq \min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta_k\}_{k \in [K] \setminus \omega}} \max\left\{ D^{\omega}_0\left(\{R_{X_k}\}_{k \in \omega}\right) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k - \sqrt{p_k})^2}{2\mu\sigma_k^2},\ D^{\omega}_1\left(\{R_{X_k}\}_{k \in \omega}\right) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k + \sqrt{p_k})^2}{2\mu\sigma_k^2} \right\}, \tag{32}$$

where $A \setminus B$ denotes the relative complement of set $B$ in set $A$, and where $\mu$ is as defined in (8). We also have $E_{[K]} = E^*$, where $E^*$ is as defined in (10). Finally, we define the quantity $\mathcal{E}$, which will be shown to be the optimal error exponent:

$$\mathcal{E} \triangleq \min_{\omega \in \Im([K])} E_{\omega}, \tag{33}$$

where $\Im([K])$ denotes the power set of $[K]$.

Theorem 3. The optimal error exponent of (4) is given by

$$\lim_{n \to \infty} -\frac{1}{n} \log P_n(\hat{H} \neq H) = \mathcal{E}. \tag{34}$$

In the following, we prove Theorem 3 by establishing both the achievability and the converse results.

5.1. The Coding Strategy for Distributed Nodes

First, we define the different regimes of empirical distributions, for each $k = 1, \cdots, K$ and for some $\gamma \in (0, 1)$. The specific choice of $\gamma$ does not affect the achievable error exponent as long as $\gamma \in (0, 1)$; it only serves to separate the decode-and-forward and amplify-and-forward coding strategies introduced in Section 1.

Decode-and-forward regime:

$$\mathcal{M}^{(0)}_k \triangleq \left\{ R_{X_k} : D\left(R_{X_k} \,\middle\|\, P^{(0)}_{X_k}\right) < n^{-\gamma} \right\}, \qquad \mathcal{M}^{(1)}_k \triangleq \left\{ R_{X_k} : D\left(R_{X_k} \,\middle\|\, P^{(1)}_{X_k}\right) < n^{-\gamma} \right\}.$$

Amplify-and-forward regime:

$$\mathcal{M}^{c}_k \triangleq \left\{ R_{X_k} : \min\left\{ D\left(R_{X_k} \,\middle\|\, P^{(0)}_{X_k}\right), D\left(R_{X_k} \,\middle\|\, P^{(1)}_{X_k}\right) \right\} \ge n^{-\gamma} \right\}. \tag{35}$$

Note that for each $k = 1, \cdots, K$, the probability that the empirical distribution $\hat{P}_{X_k}$ falls in $\mathcal{M}^{c}_k$ scales as $\exp(-n^{1-\gamma})$. Consequently, in the amplify-and-forward regime, we can transmit such empirical distributions with exponentially large power by Pulse Amplitude Modulation (PAM) while still satisfying the power constraint.
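The regime split above amounts to two K-L divergence threshold tests. The sketch below classifies an observed type accordingly; the distributions, sample size, and function names are hypothetical, and the tie-breaking (checking $\mathcal{M}^{(0)}_k$ first) is an assumption that only matters for small $n$, where the two decode-and-forward sets could overlap.

```python
import math

def kl(r, p):
    """K-L divergence D(r || p) in nats."""
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0)

def regime(r, p0, p1, n, gamma):
    """Classify a type r of node k, cf. (35).

    Returns "M0" or "M1" (decode-and-forward) or "Mc" (amplify-and-forward).
    """
    threshold = n ** (-gamma)
    if kl(r, p0) < threshold:
        return "M0"
    if kl(r, p1) < threshold:
        return "M1"
    return "Mc"

# Hypothetical marginals under the two hypotheses, n = 1000 samples.
p0, p1 = [0.7, 0.3], [0.3, 0.7]
n, gamma = 1000, 0.5

near_h0 = regime([0.69, 0.31], p0, p1, n, gamma)  # typical under H0
near_h1 = regime([0.31, 0.69], p0, p1, n, gamma)  # typical under H1
atypical = regime([0.50, 0.50], p0, p1, n, gamma) # far from both
```

Types close to either hypothesis are mapped to a single bit, whereas atypical types, which occur with vanishing probability, are forwarded in full.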
Specifically, let $\mathcal{P}^{(n)}_{X_k}$ be the set of all possible empirical distributions of $X_k$ with $n$ samples, and denote $\eta_k \triangleq |\mathcal{P}^{(n)}_{X_k} \cap \mathcal{M}^{c}_k|$. We define the bijective function $\xi_k : \mathcal{P}^{(n)}_{X_k} \cap \mathcal{M}^{c}_k \to \{1, \dots, \eta_k\}$ that indexes these empirical distributions. Then, according to the observed empirical distribution, the encoder of node $k$ ($k = 1, \cdots, K$) is designed to transmit the signal

$$Q_k(\hat{P}_{X_k}) \triangleq \xi_k(\hat{P}_{X_k}) \cdot \exp\left(n^{\frac{1-\gamma}{2}}\right). \tag{36}$$

Furthermore, if the empirical distributions are in the decode-and-forward regimes, we first detect the true hypothesis and then transmit the detected bit using Binary Phase Shift Keying (BPSK) with the appropriate power. By employing these strategies, the achievability result can be obtained through repeated transmissions from all the distributed nodes. In other words, the resulting encoder for node $k$ is defined as follows:

$$\mathbf{g}^*_k = [g^*_k, \cdots, g^*_k], \quad k = 1, \cdots, K, \tag{37}$$

where

$$g^*_k(\hat{P}_{X_k}) \triangleq \begin{cases} \sqrt{p_k - \delta(n, \gamma)}, & \text{if } \hat{P}_{X_k} \in \mathcal{M}^{(0)}_k, \\ -\sqrt{p_k - \delta(n, \gamma)}, & \text{if } \hat{P}_{X_k} \in \mathcal{M}^{(1)}_k, \\ Q_k(\hat{P}_{X_k}), & \text{if } \hat{P}_{X_k} \in \mathcal{M}^{c}_k, \end{cases} \tag{38}$$

and where

$$\delta(n, \gamma) \triangleq \max_{k \in [K]} \frac{P_n(\hat{P}_{X_k} \in \mathcal{M}^{c}_k)}{P_n(\hat{P}_{X_k} \notin \mathcal{M}^{c}_k)} \cdot (n + 1)^{2|\mathcal{X}_k|} \cdot \exp\left(2 n^{\frac{1-\gamma}{2}}\right). \tag{39}$$

Proposition 2. The encoders as defined in (38) satisfy the power constraint (6), and

$$\lim_{n \to \infty} \delta(n, \gamma) = 0. \tag{40}$$

Proof. See Appendix E.

5.2. Decision Rule and Achievable Error Exponent

After the decision center receives the output signals $\mathbf{g}^*_1(\hat{P}_{X_1}) + \mathbf{Z}_1, \cdots, \mathbf{g}^*_K(\hat{P}_{X_K}) + \mathbf{Z}_K$, we then compute

$$\theta_k \triangleq \frac{1}{m} \sum_{i=1}^{m} \left[\mathbf{g}^*_k(\hat{P}_{X_k}) + \mathbf{Z}_k\right]_i, \quad k = 1, \cdots, K,$$

where $[\cdot]_i$ denotes the $i$-th entry of a given vector. Then, we conduct the log-likelihood ratio test (LLRT) to detect the true hypothesis:

$$\log \frac{P_n(\theta_1, \cdots, \theta_K \mid H = H_0)}{P_n(\theta_1, \cdots, \theta_K \mid H = H_1)} \underset{\hat{H}=H_1}{\overset{\hat{H}=H_0}{\gtrless}} 0. \tag{41}$$

Note that exponentially large power is allocated for the empirical distributions in the amplify-and-forward regime (cf.
(35), (36)); the decision center can correctly detect the coding regime of the nodes with super-exponentially high probability, i.e., for $k = 1, \cdots, K$,

$$\lim_{n \to \infty} -\frac{1}{n} \log P_n\left( \hat{P}_{X_k} \in \mathcal{M}^{c}_k \,\middle|\, \theta_k \le \exp\left(n^{\frac{1-\gamma}{4}}\right) \right) = \infty,$$

$$\lim_{n \to \infty} -\frac{1}{n} \log P_n\left( \hat{P}_{X_k} \notin \mathcal{M}^{c}_k \,\middle|\, \theta_k > \exp\left(n^{\frac{1-\gamma}{4}}\right) \right) = \infty. \tag{42}$$

Therefore, we can assume that the decision center knows the coding regime of the nodes and define the following regime of the received signals with respect to subset $\omega \subseteq [K]$:

$$\Theta_{\omega} \triangleq \left\{ (\theta_1, \cdots, \theta_K) : \theta_k > \exp\left(n^{\frac{1-\gamma}{4}}\right), \forall k \in \omega, \text{ and } \theta_{k'} \le \exp\left(n^{\frac{1-\gamma}{4}}\right), \forall k' \in [K] \setminus \omega \right\},$$

for all $\omega \in \Im([K])$. When the received signals $(\theta_1, \cdots, \theta_K) \in \Theta_{\omega}$, the decision center can recover the empirical distributions $\hat{P}_{X_k}$ ($k \in \omega$) from the received signals $\theta_k$ by the decoder:

$$Q^{-1}_k(\theta_k) \triangleq \xi^{-1}_k\left( \left\lfloor \theta_k / \exp\left(n^{\frac{1-\gamma}{2}}\right) + 0.5 \right\rfloor \right), \tag{43}$$

where $\lfloor \cdot \rfloor$ denotes the floor function [31]. The following result shows that the decoding error of (43) can be neglected.

Proposition 3. For all $\hat{P}_{X_k} \in \mathcal{P}^{(n)}_{X_k} \cap \mathcal{M}^{c}_k$, $k = 1, \cdots, K$,

$$\lim_{n \to \infty} -\frac{1}{n} \log P\left(Q^{-1}_k(\theta_k) \neq \hat{P}_{X_k}\right) = \infty. \tag{44}$$

Proof. See Appendix F.

In the following, we denote $p'_k \triangleq p_k - \delta$, for $k = 1, \cdots, K$, and discuss the decision error exponent when the received signals are in $\Theta_{\omega}$. For $k \in \omega$, the empirical distribution $\hat{P}_{X_k}$ can be recovered by (43), and for $k \in [K] \setminus \omega$, node $k$ detects the hypothesis according to the observed empirical distribution and transmits the detected bit by BPSK (cf. (38)) through the AWGN channel. Then, the decision center detects the true hypothesis from the received signals by the LLRT (41), which can be reduced to

$$\tilde{E}^{\omega}_0(\theta_1, \cdots, \theta_K) \underset{\hat{H}=H_0}{\overset{\hat{H}=H_1}{\gtrless}} \tilde{E}^{\omega}_1(\theta_1, \cdots, \theta_K), \tag{45}$$

where for $i = 0, 1$,

$$\tilde{E}^{\omega}_i(\theta_1, \cdots, \theta_K) \triangleq \min_{\bar{\omega} \in \Im([K] \setminus \omega)} D^*_i(\bar{P}_{X_1}, \cdots, \bar{P}_{X_K}) + \sum_{k \in \bar{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega})} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2},$$

where $\Im([K] \setminus \omega)$ denotes the power set of $[K] \setminus \omega$, and where for $k = 1, \cdots, K$,

$$\bar{P}_{X_k} \triangleq \begin{cases} Q^{-1}_k(\theta_k), & \text{if } k \in \omega, \\ P^{(0)}_{X_k}, & \text{if } k \in \bar{\omega}, \\ P^{(1)}_{X_k}, & \text{if } k \in [K] \setminus (\omega \cup \bar{\omega}). \end{cases} \tag{46}$$

Consequently, the decision error exponent is characterized by the following proposition.

Proposition 4. For any $\epsilon > 0$ and $\omega \in \Im([K])$, the decision error exponent by the decision rule (45) satisfies

$$\lim_{n \to \infty} -\frac{1}{n} \log P_n\left(\hat{H} \neq H, (\theta_1, \cdots, \theta_K) \in \Theta_{\omega}\right) \ge \mathcal{E} - \epsilon, \tag{47}$$

where $\mathcal{E}$ is as defined in (33).

Proof. See Appendix G.

Noticing that the overall decision error probability is

$$P_n(\hat{H} \neq H) = \sum_{\omega \in \Im([K])} P\left(\hat{H} \neq H, (\theta_1, \cdots, \theta_K) \in \Theta_{\omega}\right),$$

the following proposition establishes the achievable error exponent by the coding strategy (38).

Proposition 5. By using the encoders $\mathbf{g}^*_1, \cdots, \mathbf{g}^*_K$ as defined in (38), and the decision rule $\hat{H}$ from (41), the achievable error exponent is given by $\mathcal{E}$, i.e.,

$$\lim_{n \to \infty} -\frac{1}{n} \log P_n(\hat{H} \neq H) \ge \mathcal{E}, \tag{48}$$

where $\mathcal{E}$ is as defined in (33).

5.3. The Converse Result

In this section, we show that $\mathcal{E}$ is indeed an upper bound of (4), which establishes Theorem 3. Our main technique is a genie-aided approach, which provides different kinds of additional information to the nodes and computes the corresponding error exponents under this additional information. As depicted in Figure 5, given index set $\omega \in \Im([K])$, suppose that for all $k \in \omega$, node $k$ can know and cancel the channel noise in advance; then, the channel is noiseless, and the decision center can perfectly receive the empirical distribution $\hat{P}_{X_k}$. On the other hand, suppose that for all $k' \in [K] \setminus \omega$, the true hypothesis $H$ is revealed to node $k'$; then, with such additional information, we can establish the following upper bound of (4) (cf. (33)).

Proposition 6. Given index set $\omega \in \Im([K])$, suppose that for all $k \in \omega$, the decision center can obtain $\hat{P}_{X_k}$ perfectly. Additionally, for all $k' \in [K] \setminus \omega$, node $k'$ can obtain the true hypothesis $H$. The resulting optimal decision error exponent is

$$\lim_{n \to \infty} -\frac{1}{n} \log P_n(\hat{H} \neq H) = E_{\omega}, \tag{49}$$

where $E_{\omega}$ is as defined in (32).

Proof. See Appendix H.
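As a sanity check on (32), consider the special case $\omega = \emptyset$, in which the genie reveals $H$ to every node. The set $\mathcal{S}_{\emptyset}$ is then unconstrained, so both divergence terms vanish, and by symmetry and convexity the min-max is attained at $\theta_k = 0$, which gives $E_{\emptyset} = \sum_{k} p_k / (2\mu\sigma_k^2)$. The sketch below compares this closed form with a brute-force grid search for $K = 2$; the numerical values of `p`, `sigma2` and `mu` are hypothetical, with `mu` standing for the constant defined in (8).

```python
import math

def exponent_pair(theta, p, sigma2, mu):
    """The two Gaussian terms inside the max of (32) when omega is empty."""
    e0 = sum((t - math.sqrt(pk)) ** 2 / (2 * mu * s)
             for t, pk, s in zip(theta, p, sigma2))
    e1 = sum((t + math.sqrt(pk)) ** 2 / (2 * mu * s)
             for t, pk, s in zip(theta, p, sigma2))
    return e0, e1

def e_empty_grid(p, sigma2, mu, half_width=3.0, steps=121):
    """Brute-force min over theta of max(e0, e1) on a grid, for K = 2."""
    grid = [-half_width + 2 * half_width * i / (steps - 1) for i in range(steps)]
    best = math.inf
    for t1 in grid:
        for t2 in grid:
            best = min(best, max(exponent_pair((t1, t2), p, sigma2, mu)))
    return best

def e_empty_closed_form(p, sigma2, mu):
    """Closed form sum_k p_k / (2 mu sigma_k^2), attained at theta = 0."""
    return sum(pk / (2 * mu * s) for pk, s in zip(p, sigma2))

# Hypothetical powers, noise variances and mu (not from the paper).
p, sigma2, mu = [1.0, 2.0], [1.0, 0.5], 2.0
grid_value = e_empty_grid(p, sigma2, mu)
closed_value = e_empty_closed_form(p, sigma2, mu)
```

The grid contains $\theta = (0, 0)$, so the two values should coincide up to floating-point error. Since $\mathcal{E} = \min_{\omega} E_{\omega}$, this quantity is one of the candidates that bound the optimal exponent from above.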
[Figure 5 (genie-aided setting; image and caption not recoverable). Surviving labels: node $k$ sends $\hat{P}_{X_k}$; node $k'$ observes $x^{(1)}_{k'}, \cdots, x^{(n)}_{k'}$ and $H$; channel noise $Z_{k'} \sim \mathcal{N}(0, \sigma^2_{k'} I_m)$; decision center.]
… $0$. (A3)

Indeed, since $\mathcal{D}(t)$ is non-empty, there exist $(R_{X_1}, \dots, R_{X_K})$ and $\epsilon > 0$ such that

$$D^*_i(R_{X_1}, \dots, R_{X_K}) < t - \epsilon,$$

for $i = 0, 1$, and thus $\mathcal{D}(t - \epsilon)$ is non-empty. To sum up, from (A1)–(A3) we obtain $\mathcal{D}(t) \neq \emptyset$ for all $t > t_0$ and $\mathcal{D}(t) = \emptyset$ for all $t \le t_0$.

Furthermore, to prove (14), we define

$$\bar{\mathcal{D}}_i(t) \triangleq \left\{ (R_{X_1}, \dots, R_{X_K}) : D^*_i(R_{X_1}, \dots, R_{X_K}) \le t \right\},$$

and $\bar{\mathcal{D}}(t) \triangleq \bar{\mathcal{D}}_0(t) \cap \bar{\mathcal{D}}_1(t)$. Then, for all $t > t_0$ we have

$$\min_{R_{X_1}, \dots, R_{X_K}} \max_{i \in \{0,1\}} D^*_i(R_{X_1}, \dots, R_{X_K}) = \min_{(R_{X_1}, \dots, R_{X_K}) \in \bar{\mathcal{D}}(t)} \max_{i \in \{0,1\}} D^*_i(R_{X_1}, \dots, R_{X_K}) \in [t_0, t],$$

where the second minimum exists since $\bar{\mathcal{D}}(t)$ is closed and bounded. This implies that $t_0 = E^*$ (cf. (10)). Hence, there exist marginal distributions $\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}$ such that

$$D^*_i(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) = E^*, \quad i = 0, 1. \tag{A4}$$

Finally, to establish the uniqueness of $(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K})$, suppose that (14) also holds for $(\tilde{R}'_{X_1}, \dots, \tilde{R}'_{X_K}) \neq (\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K})$. Let $\tilde{R}''_{X_k} \triangleq (\tilde{R}_{X_k} + \tilde{R}'_{X_k})/2$ for $k = 1, \dots, K$; then, it follows from the strong convexity of $D^*_0(\cdot)$ and $D^*_1(\cdot)$ that

$$D^*_i(\tilde{R}''_{X_1}, \dots, \tilde{R}''_{X_K}) < t_0, \quad i = 0, 1,$$

which contradicts (A2).

Appendix B. Proof of Proposition 1

We know that $\mathcal{D}_i(E^*) \subset \mathcal{S}^{(i)}_X(h^*)$, for $i = 0, 1$. This implies that $\mathcal{S}^{(i)}_X(h^*) \subset \mathcal{D}^{c}_{1-i}(E^*)$, where for $t \ge 0$ and $i = 0, 1$, we have defined

$$\mathcal{D}^{c}_i(t) \triangleq \left(\mathcal{P}^{\mathcal{X}_1} \times \cdots \times \mathcal{P}^{\mathcal{X}_K}\right) \setminus \mathcal{D}_i(t).$$

Moreover, let $(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \in \mathcal{P}^{\mathcal{X}_1} \times \cdots \times \mathcal{P}^{\mathcal{X}_K}$ be as defined in Lemma 2; then, we have

$$(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \in \mathcal{L}_X(h^*) = \mathcal{S}^{(0)}_X(h^*) \cap \mathcal{S}^{(1)}_X(h^*). \tag{A5}$$

As a result, for $i = 0, 1$ we have

$$\begin{aligned} E^* &= D^*_i(\tilde{R}_{X_1}, \dots, \tilde{R}_{X_K}) \\ &\ge D\left(\mathcal{S}^{(1-i)}_X(h^*) \,\middle\|\, P^{(i)}_{X^K}\right) = \min_{(R_{X_1}, \dots, R_{X_K}) \in \mathcal{S}^{(1-i)}_X(h^*)} D^*_i(R_{X_1}, \dots, R_{X_K}) \\ &\ge \min_{(R_{X_1}, \dots, R_{X_K}) \in \mathcal{D}^{c}_i(E^*)} D^*_i(R_{X_1}, \dots, R_{X_K}) \\ &\ge E^*, \end{aligned} \tag{A6}$$

which implies (18).

Appendix C.
Proof of Theorem 1

On the one hand, note that from the Markov relation

$$H - (\hat{P}_{X_1}, \dots, \hat{P}_{X_K}) - (u_1(\hat{P}_{X_1}), \dots, u_K(\hat{P}_{X_K})),$$

the minimum possible decision error can be obtained when we choose the empirical distributions $\hat{P}_{X_1}, \dots, \hat{P}_{X_K}$ themselves as the statistics.

On the other hand, from Proposition 1, the error exponents associated with the type I error and the type II error are $D(\mathcal{S}^{(1)}_X(h^*) \| P^{(0)}_{X^K})$ and $D(\mathcal{S}^{(0)}_X(h^*) \| P^{(1)}_{X^K})$, respectively. From (18), both exponents are $E^*$, and thus the error exponent for $P_n(\hat{H} \neq H)$ is also $E^*$.

Appendix D. Proof of Theorem 2

To begin, we define $\psi \triangleq \psi^{(1)} - \psi^{(0)}$. Then, for given $f_k : \mathcal{X}_k \to \mathbb{R}$ it follows from Lemma 17 of [13] that the exponent based on the feature $h(x^K) = \sum_{k=1}^{K} f_k(x_k)$ is

$$E = \frac{1}{8} \cdot \frac{\langle \psi, \zeta \rangle^2}{\|\zeta\|^2} + o(\epsilon^2),$$

where we have defined $\zeta \triangleq B_0 \phi_0 \in \mathcal{R}(B_0)$, and where $\phi_0$ is as defined in (27).

Then, note that the projection matrix $\Pi_{B_0}$ satisfies $\Pi_{B_0} = (\Pi_{B_0})^2$ and $\zeta = \Pi_{B_0} \zeta$. Therefore, from the Cauchy–Schwarz inequality we have

$$\frac{\langle \psi, \zeta \rangle^2}{\|\zeta\|^2} = \frac{(\psi^{\mathrm{T}} \Pi_{B_0} \zeta)^2}{\|\zeta\|^2} = \frac{\langle \Pi_{B_0} \psi, \zeta \rangle^2}{\|\zeta\|^2} \le \left\|\Pi_{B_0} \psi\right\|^2,$$

where the inequality holds with equality if and only if $\zeta$ takes the optimal value

$$\zeta^* = c \cdot \Pi_{B_0} \psi,$$

or equivalently, $B_0 \phi_0^* = c \cdot B_0 B_0^{\dagger} \psi$ for some constant scalar $c \neq 0$.

To determine the value of $c$, note that we have $\zeta^* \leftrightarrow h^*$, where $h^*$ is the optimal feature as defined in (17). Note that in (21), for each $i = 0, 1$, $Q^{(i)}_{X^K}$ depends only on the product $\lambda_i h^*$; we may therefore assume $\lambda_0 = 1/2$ and simply use $\lambda$ to denote $\lambda_1$. Then, we have

$$\begin{aligned} Q^{(0)}_{X^K}(x^K) &= \tilde{P}^{(1/2)}_{X^K}\left(x^K; h^*, P^{(0)}_{X^K}\right) \\ &= P^{(0)}_{X^K}(x^K)\left[ 1 + \frac{1}{2}\left( h^*(x^K) - \mathbb{E}_{P^{(0)}_{X^K}}\left[h^*(X^K)\right] \right) + o(\epsilon) \right] \\ &= \left( P_{X^K}(x^K) + \sqrt{P_{X^K}(x^K)}\, \psi^{(0)}(x^K) \right) \cdot \left[ 1 + \frac{\zeta(x^K)}{2\sqrt{P_{X^K}(x^K)}} + o(\epsilon) \right] \\ &= P_{X^K}(x^K) + \sqrt{P_{X^K}(x^K)} \cdot \left( \psi^{(0)}(x^K) + \frac{1}{2}\zeta(x^K) \right) + o(\epsilon), \end{aligned}$$

which implies the correspondence

$$Q^{(0)}_{X^K} \leftrightarrow \left( \psi^{(0)} + \frac{1}{2}\zeta + o(\epsilon) \right).$$

Similarly, we have

$$Q^{(1)}_{X^K} \leftrightarrow \left( \psi^{(1)} + \lambda\zeta + o(\epsilon) \right).$$
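Then, it follows from the second-order Taylor series expansion of the K-L divergence that (see, e.g., Lemma 10 of [13])

$$D\left(Q^{(0)}_{X^K} \,\middle\|\, P^{(0)}_{X^K}\right) = \frac{1}{8}\|\zeta\|^2 + o(\epsilon^2), \qquad D\left(Q^{(1)}_{X^K} \,\middle\|\, P^{(1)}_{X^K}\right) = \frac{\lambda^2}{2}\|\zeta\|^2 + o(\epsilon^2). \tag{A7}$$

Moreover, note that since (cf. Lemma 9 of [13])

$$\mathbb{E}_{Q^{(0)}_{X^K}}\left[h^*(X^K)\right] = \left\langle \psi^{(0)} + \frac{1}{2}\zeta, \zeta \right\rangle + o(\epsilon^2), \qquad \mathbb{E}_{Q^{(1)}_{X^K}}\left[h^*(X^K)\right] = \left\langle \psi^{(1)} + \lambda\zeta, \zeta \right\rangle + o(\epsilon^2),$$

we have

$$\begin{aligned} 0 &= \mathbb{E}_{Q^{(1)}_{X^K}}\left[h^*(X^K)\right] - \mathbb{E}_{Q^{(0)}_{X^K}}\left[h^*(X^K)\right] \\ &= \left\langle \psi + \left(\lambda - \frac{1}{2}\right)\zeta, \zeta \right\rangle + o(\epsilon^2) \\ &= c \left\langle \psi + \left(\lambda - \frac{1}{2}\right) c \cdot \Pi_{B_0}\psi, \Pi_{B_0}\psi \right\rangle + o(\epsilon^2) \\ &= c \left[ 1 + \left(\lambda - \frac{1}{2}\right) c \right] \cdot \left\|\Pi_{B_0}\psi\right\|^2 + o(\epsilon^2). \end{aligned} \tag{A8}$$

As a result, it follows from $D(Q^{(0)}_{X^K} \| P^{(0)}_{X^K}) = D(Q^{(1)}_{X^K} \| P^{(1)}_{X^K})$ and (A8) that $c = 1$ and $\lambda = -\frac{1}{2}$. Then, we obtain

$$\zeta^* = \Pi_{B_0}\psi = B_0 B_0^{\dagger}\psi = B_0 \phi_0^*,$$

where $\phi_0^* \triangleq B_0^{\dagger}\psi$. Finally, the optimal error exponent is

$$E^* = \frac{1}{8} \cdot \left\|\Pi_{B_0}\psi\right\|^2 + o(\epsilon^2) = \frac{1}{8} \cdot \left\|B_0 \phi_0^*\right\|^2 + o(\epsilon^2).$$

Appendix E. Proof of Proposition 2

According to Sanov's theorem, $P_n(\hat{P}_{X_k} \in \mathcal{M}^{c}_k) \doteq \exp(-n^{1-\gamma})$, and $P_n(\hat{P}_{X_k} \notin \mathcal{M}^{c}_k) \doteq 1$. Then, we have

$$\frac{P_n(\hat{P}_{X_k} \in \mathcal{M}^{c}_k)}{P_n(\hat{P}_{X_k} \notin \mathcal{M}^{c}_k)} \cdot (n + 1)^{2|\mathcal{X}_k|} \cdot \exp\left(2 n^{\frac{1-\gamma}{2}}\right) \doteq \exp\left(-n^{1-\gamma}\right),$$

which converges to 0 as $n \to \infty$. Additionally, for the power constraint,

$$\begin{aligned} \mathbb{E}\left[g^{*2}_k(\hat{P}_{X_k})\right] &\le (p_k - \delta(n, \gamma)) \cdot P_n(\hat{P}_{X_k} \notin \mathcal{M}^{c}_k) + \left( |\mathcal{M}^{c}_k| \cdot \exp\left(n^{\frac{1-\gamma}{2}}\right) \right)^2 \cdot P(\hat{P}_{X_k} \in \mathcal{M}^{c}_k) \\ &\le (p_k - \delta(n, \gamma)) \cdot P_n(\hat{P}_{X_k} \notin \mathcal{M}^{c}_k) + (n + 1)^{2|\mathcal{X}_k|} \cdot \exp\left(2 n^{\frac{1-\gamma}{2}}\right) \cdot P(\hat{P}_{X_k} \in \mathcal{M}^{c}_k) \\ &\le p_k. \end{aligned}$$

Appendix F. Proof of Proposition 3

Note that equivalently,

$$\theta_k = g^*_k(\hat{P}_{X_k}) + \tilde{Z}_k, \tag{A9}$$

where $\tilde{Z}_k \sim \mathcal{N}(0, \sigma_k^2/m)$. We then apply the standard result for the Gaussian tail [32], i.e., for any $\alpha > 0$,

$$-\lim_{n \to \infty} \frac{1}{n} \log P\left(\tilde{Z}_k > \alpha\right) = \frac{\alpha^2}{2\mu\sigma_k^2},$$

which implies that

$$\lim_{n \to \infty} -\frac{1}{n} \log P\left( Q^{-1}_k\left(Q_k(\hat{P}_{X_k}) + \tilde{Z}_k\right) \neq \hat{P}_{X_k} \right) \ge \lim_{n \to \infty} -\frac{1}{n} \log P\left( |\tilde{Z}_k| > \frac{1}{2}\exp\left(n^{\frac{1-\gamma}{2}}\right) \right) = \infty.$$

Appendix G. Proof of Proposition 4

Note that

$$\begin{aligned} &P_n\left((\theta_1, \cdots, \theta_K), (\theta_1, \cdots, \theta_K) \in \Theta_{\omega} \mid H = H_0\right) \\ &\doteq P_n\left( (\theta_1, \cdots, \theta_K), \hat{P}_{X_k} \in \mathcal{M}^{c}_k, \forall k \in \omega, \hat{P}_{X_{k'}} \notin \mathcal{M}^{c}_{k'}, \forall k' \in [K] \setminus \omega \,\middle|\, H = H_0 \right) && \text{(A10)} \\ &= \sum_{\bar{\omega} \in \Im([K] \setminus \omega)} \Bigg\{ \prod_{k' \in \bar{\omega}} P\left(\theta_{k'} \,\middle|\, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\right) \cdot \prod_{k'' \in [K] \setminus (\omega \cup \bar{\omega})} P\left(\theta_{k''} \,\middle|\, \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\right) \\ &\qquad \cdot \prod_{k \in \omega} \sum_{\hat{P}_{X_k} \in \mathcal{P}^{(n)}_{X_k}} P(\theta_k \mid \hat{P}_{X_k})\, P_n\Big( \hat{P}_{X_k}, \hat{P}_{X_k} \in \mathcal{M}^{c}_k, \forall k \in \omega, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}, \forall k' \in \bar{\omega}, \\ &\qquad\qquad \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}, \forall k'' \in [K] \setminus (\omega \cup \bar{\omega}) \,\Big|\, H = H_0 \Big) \Bigg\}, && \text{(A11)} \end{aligned}$$

where (A10) comes from (42). By decoding the empirical distributions from $\theta_k$ with $Q^{-1}_k(\cdot)$ for $k \in \omega$ and using Proposition 3, we have

$$\begin{aligned} &\sum_{\hat{P}_{X_k} \in \mathcal{P}^{(n)}_{X_k}} P(\theta_k \mid \hat{P}_{X_k})\, P_n\Big( \hat{P}_{X_k}, \hat{P}_{X_k} \in \mathcal{M}^{c}_k, \forall k \in \omega, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}, \forall k' \in \bar{\omega}, \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}, \forall k'' \in [K] \setminus (\omega \cup \bar{\omega}) \,\Big|\, H = H_0 \Big) \\ &\doteq P\left(\theta_k \mid \hat{P}_{X_k} = Q^{-1}_k(\theta_k)\right) P_n\Big( \hat{P}_{X_k} = Q^{-1}_k(\theta_k), \hat{P}_{X_k} \in \mathcal{M}^{c}_k, \forall k \in \omega, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}, \forall k' \in \bar{\omega}, \\ &\qquad \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}, \forall k'' \in [K] \setminus (\omega \cup \bar{\omega}) \,\Big|\, H = H_0 \Big) \\ &\doteq P\left(\theta_k \mid \hat{P}_{X_k} = Q^{-1}_k(\theta_k)\right) \cdot \exp\left( -n \cdot D^*_0(\bar{P}_{X_1}, \cdots, \bar{P}_{X_K}) \right). \end{aligned}$$

With

$$P\left(\theta_{k'} \,\middle|\, \hat{P}_{X_{k'}} \in \mathcal{M}^{(0)}_{k'}\right) \doteq \exp\left( -n \cdot \frac{(\theta_{k'} - \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2} \right),$$

and, since the BPSK symbol transmitted in $\mathcal{M}^{(1)}_{k''}$ is $-\sqrt{p'_{k''}}$ (cf. (38)),

$$P\left(\theta_{k''} \,\middle|\, \hat{P}_{X_{k''}} \in \mathcal{M}^{(1)}_{k''}\right) \doteq \exp\left( -n \cdot \frac{(\theta_{k''} + \sqrt{p'_{k''}})^2}{2\mu\sigma_{k''}^2} \right),$$

we have

$$P_n\left((\theta_1, \cdots, \theta_K), (\theta_1, \cdots, \theta_K) \in \Theta_{\omega} \mid H = H_0\right) \doteq \sum_{\bar{\omega} \in \Im([K] \setminus \omega)} \left\{ \prod_{k \in \omega} P\left(\theta_k \mid \hat{P}_{X_k} = Q^{-1}_k(\theta_k)\right) \right\} \cdot \exp\left( -n \cdot \tilde{E}^{\omega}_0(\theta_1, \cdots, \theta_K) \right).$$

Similarly,

$$P_n\left((\theta_1, \cdots, \theta_K), (\theta_1, \cdots, \theta_K) \in \Theta_{\omega} \mid H = H_1\right) \doteq \sum_{\bar{\omega} \in \Im([K] \setminus \omega)} \left\{ \prod_{k \in \omega} P\left(\theta_k \mid \hat{P}_{X_k} = Q^{-1}_k(\theta_k)\right) \right\} \cdot \exp\left( -n \cdot \tilde{E}^{\omega}_1(\theta_1, \cdots, \theta_K) \right).$$

Note that $P(\theta_k \mid \hat{P}_{X_k} = Q^{-1}_k(\theta_k))$ does not depend on $\bar{\omega}$ or $H$, so the decision rule (45) follows from the LLRT. To compute the error exponent, we use Proposition 3 and the fact that $P(\theta_k \mid \hat{P}_{X_k} = Q^{-1}_k(\theta_k)) \doteq 1$ when $\theta_k = Q_k(\hat{P}_{X_k})$. Then, the optimal error exponent corresponds to

$$\min_{\{\hat{P}_{X_k}\}_{k \in \omega},\, \{\theta_{k'}\}_{k' \in [K] \setminus \omega}}\ \max_{i = 0, 1}\ \min_{\bar{\omega} \in \Im([K] \setminus \omega)} D^*_i(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}) + \sum_{k \in \bar{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega})} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2}, \tag{A12}$$

where for $k = 1, \cdots, K$, and $\bar{\omega} \in \Im([K] \setminus \omega)$,

$$\bar{R}^{\bar{\omega}}_{X_k} \triangleq \begin{cases} \hat{P}_{X_k}, & \text{if } k \in \omega, \\ P^{(0)}_{X_k}, & \text{if } k \in \bar{\omega}, \\ P^{(1)}_{X_k}, & \text{if } k \in [K] \setminus (\omega \cup \bar{\omega}). \end{cases} \tag{A13}$$

To finish the proof, we introduce the following lemma.

Lemma A1. For arbitrary functions $v_1, \cdots, v_{\ell} : \mathcal{Z} \to \mathbb{R}$ and $w_1, \cdots, w_{\ell'} : \mathcal{Z} \to \mathbb{R}$, where $\mathcal{Z}$ is a given set, we have

$$\min_{z \in \mathcal{Z}} \max\left\{ \min\{v_1(z), \cdots, v_{\ell}(z)\}, \min\{w_1(z), \cdots, w_{\ell'}(z)\} \right\} = \min_{i \in \{1, \cdots, \ell\},\, j \in \{1, \cdots, \ell'\}}\ \min_{z \in \mathcal{Z}} \max\left\{ v_i(z), w_j(z) \right\}. \tag{A14}$$

With Lemma A1, we only need to compare each component in (A12), i.e.,

$$\min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)}\ \min_{\{\hat{P}_{X_k}\}_{k \in \omega},\, \{\theta_{k'}\}_{k' \in [K] \setminus \omega}} \max\Bigg\{ D^*_0(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}) + \sum_{k \in \bar{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega})} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2},$$

$$\qquad D^*_1(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}) + \sum_{k \in \bar{\omega}'} \frac{(\theta_k - \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in [K] \setminus (\omega \cup \bar{\omega}')} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2} \Bigg\}. \tag{A15}$$

Given $\bar{\omega}$ and $\bar{\omega}'$, let $\tilde{\omega} = \bar{\omega} \cap \bar{\omega}'$. By selecting $\theta_k = \sqrt{p'_k}$ for $k \in \tilde{\omega}$ and $\theta_k = -\sqrt{p'_k}$ for $k \in [K] \setminus (\omega \cup (\bar{\omega} \cup \bar{\omega}'))$ in the minimization of (A15), (A15) equals

$$\min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)}\ \min_{\{\hat{P}_{X_k}\}_{k \in \omega},\, \{\theta_{k'}\}_{k' \in (\bar{\omega} \cup \bar{\omega}') \setminus \tilde{\omega}}} \max\Bigg\{ D^*_0(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2},$$

$$\qquad D^*_1(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k + \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} - \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2} \Bigg\}. \tag{A16}$$

In the following, we denote $\Omega \triangleq [K] \setminus (\omega \cup (\bar{\omega} \cup \bar{\omega}'))$. For those indices $k \in \tilde{\omega}$ or $k \in \Omega$, although they do not contribute to the Gaussian-like error exponents, they impose the restriction $\bar{R}^{\bar{\omega}}_{X_k} = \bar{R}^{\bar{\omega}'}_{X_k} = P^{(0)}_{X_k}$ or $\bar{R}^{\bar{\omega}}_{X_k} = \bar{R}^{\bar{\omega}'}_{X_k} = P^{(1)}_{X_k}$. By instead letting $\bar{R}^{\bar{\omega}}_{X_k} = \bar{R}^{\bar{\omega}'}_{X_k} = \hat{P}_{X_k}$ ($k \in \tilde{\omega}$ or $k \in \Omega$) be free variables that can be optimized, we find the lower bound of (A15).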
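$$(\text{A15}) \ge \min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)}\ \min_{\{\hat{P}_{X_k}\}_{k \in \omega \cup \tilde{\omega} \cup \Omega},\, \{\theta_{k'}\}_{k' \in (\bar{\omega} \cup \bar{\omega}') \setminus \tilde{\omega}}} \max\Bigg\{ D^*_0(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k - \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} + \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2},$$

$$\qquad D^*_1(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}) + \sum_{k \in \bar{\omega} \setminus \tilde{\omega}} \frac{(\theta_k + \sqrt{p'_k})^2}{2\mu\sigma_k^2} + \sum_{k' \in \bar{\omega}' \setminus \tilde{\omega}} \frac{(\theta_{k'} - \sqrt{p'_{k'}})^2}{2\mu\sigma_{k'}^2} \Bigg\}$$

$$= \min_{\bar{\omega}, \bar{\omega}' \in \Im([K] \setminus \omega)} E_{\omega \cup \tilde{\omega} \cup \Omega} - \epsilon \ge \mathcal{E} - \epsilon, \tag{A17}$$

where we have used the fact that $\lim_{n \to \infty} p'_k = p_k$,

$$D^*_0(\bar{R}^{\bar{\omega}}_{X_1}, \cdots, \bar{R}^{\bar{\omega}}_{X_K}) = D^{\omega \cup \tilde{\omega} \cup \Omega}_0\left( \{\hat{P}_{X_k}\}_{k \in \omega \cup \tilde{\omega} \cup \Omega} \right), \qquad D^*_1(\bar{R}^{\bar{\omega}'}_{X_1}, \cdots, \bar{R}^{\bar{\omega}'}_{X_K}) = D^{\omega \cup \tilde{\omega} \cup \Omega}_1\left( \{\hat{P}_{X_k}\}_{k \in \omega \cup \tilde{\omega} \cup \Omega} \right),$$

and have substituted $-\theta_{k'}$ for $\theta_{k'}$.

Appendix H. Proof of Proposition 6

Let the encoders for $k \in [K] \setminus \omega$ be functions of $H$ and $\hat{P}_{X_k}$. The upper bound comes from the fact that the type is itself generated from the hypothesis $H$. Therefore, an encoder that depends on both the hypothesis and the type is effectively a function of the true hypothesis alone. Suppose that $\rho_k : \{0, 1\} \to \mathbb{R}^m$ ($k \in [K] \setminus \omega$) satisfies $\frac{1}{m}\mathbb{E}[\|\rho_k(H)\|^2] \le p_k$. Let $\rho^{(i)}_k$ denote the $i$-th entry of $\rho_k$, and

$$\rho^{(i)}_k(H) \triangleq \begin{cases} \kappa^{(i)}_k, & \text{if } H = H_0, \\ \bar{\kappa}^{(i)}_k, & \text{if } H = H_1, \end{cases} \tag{A18}$$

where $\frac{1}{2}\left(\kappa^{(i)}_k\right)^2 + \frac{1}{2}\left(\bar{\kappa}^{(i)}_k\right)^2 = p^{(i)}_k$ and $\frac{1}{m}\sum_{i=1}^{m} p^{(i)}_k = p_k$. The error exponent with respect to the LLRT is

$$\min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta^{(i)}_k\}_{k \in [K] \setminus \omega,\, i = 1, \cdots, m}} \max\Bigg\{ \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{\left(\theta^{(i)}_k - \kappa^{(i)}_k\right)^2}{2\sigma_k^2} + D^{\omega}_0\left(\{R_{X_k}\}_{k \in \omega}\right),\ \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{\left(\theta^{(i)}_k - \bar{\kappa}^{(i)}_k\right)^2}{2\sigma_k^2} + D^{\omega}_1\left(\{R_{X_k}\}_{k \in \omega}\right) \Bigg\}. \tag{A19}$$

Here, we explain the optimality of $\bar{\kappa}^{(i)}_k = -\kappa^{(i)}_k = -\sqrt{p^{(i)}_k}$, under which we let $R^*_{X_k}, \theta^{(i)*}_k$ be the solution to problem (A19). For other pairs $(\bar{\kappa}^{(i)}_k, \kappa^{(i)}_k)$, we have $|\bar{\kappa}^{(i)}_k - \kappa^{(i)}_k| < 2\sqrt{p^{(i)}_k}$. Let

$$\tilde{\theta}^{(i)}_k = \kappa^{(i)}_k + \left(\bar{\kappa}^{(i)}_k - \kappa^{(i)}_k\right) \cdot \frac{\sqrt{p^{(i)}_k} - \theta^{(i)*}_k}{2\sqrt{p^{(i)}_k}},$$

i.e., the image of $\theta^{(i)*}_k$ under the affine map that sends $\sqrt{p^{(i)}_k}$ to $\kappa^{(i)}_k$ and $-\sqrt{p^{(i)}_k}$ to $\bar{\kappa}^{(i)}_k$. Then, we have

$$\frac{\left(\theta^{(i)*}_k - \sqrt{p^{(i)}_k}\right)^2}{2\sigma_k^2} \ge \frac{\left(\tilde{\theta}^{(i)}_k - \kappa^{(i)}_k\right)^2}{2\sigma_k^2},$$

and

$$\frac{\left(\theta^{(i)*}_k + \sqrt{p^{(i)}_k}\right)^2}{2\sigma_k^2} \ge \frac{\left(\tilde{\theta}^{(i)}_k - \bar{\kappa}^{(i)}_k\right)^2}{2\sigma_k^2},$$

so any other pair leads to a smaller error exponent (cf. (A19)), and the optimality is proved.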
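The key fact behind the optimality argument above is that, under the per-symbol constraint $\frac{1}{2}\kappa^2 + \frac{1}{2}\bar{\kappa}^2 = p$, the separation $|\bar{\kappa} - \kappa|$ is at most $2\sqrt{p}$ and is attained only by antipodal signalling. The sketch below checks this by sweeping all feasible pairs on the constraint circle; the parametrisation by an angle and the value of `p` are illustrative assumptions, not part of the proof.

```python
import math

def signalling_pair(p, angle):
    """A feasible pair with kappa^2 / 2 + kappa_bar^2 / 2 = p."""
    radius = math.sqrt(2.0 * p)
    return radius * math.cos(angle), radius * math.sin(angle)

def separation(p, angle):
    """Return (|kappa_bar - kappa|, kappa, kappa_bar) for the given angle."""
    kappa, kappa_bar = signalling_pair(p, angle)
    return abs(kappa_bar - kappa), kappa, kappa_bar

p = 1.5  # hypothetical per-symbol power
angles = [2.0 * math.pi * i / 3600 for i in range(3600)]
candidates = [separation(p, a) for a in angles]
best_sep, best_kappa, best_kappa_bar = max(candidates)
```

For a fixed pair, the min-max over $\theta$ of the two quadratic terms in (A19) equals $((\kappa - \bar{\kappa})/2)^2 / (2\sigma_k^2)$, so a larger separation directly translates into a larger Gaussian exponent, which is why the maximiser found here is the antipodal pair.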
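The solution to problem (A19) is

$$\begin{aligned} &\lim_{n \to \infty}\ \min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta^{(i)}_k\}_{k \in [K] \setminus \omega,\, i = 1, \cdots, m}} \max\Bigg\{ \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{\left(\theta^{(i)}_k - \sqrt{p^{(i)}_k}\right)^2}{2\sigma_k^2} + D^{\omega}_0\left(\{R_{X_k}\}_{k \in \omega}\right), \\ &\qquad\qquad \frac{1}{n} \sum_{k \in [K] \setminus \omega} \sum_{i=1}^{m} \frac{\left(\theta^{(i)}_k + \sqrt{p^{(i)}_k}\right)^2}{2\sigma_k^2} + D^{\omega}_1\left(\{R_{X_k}\}_{k \in \omega}\right) \Bigg\} \\ &= \min_{\{R_{X_k}\}_{k \in \omega},\, \{\theta_k\}_{k \in [K] \setminus \omega}} \max\Bigg\{ D^{\omega}_0\left(\{R_{X_k}\}_{k \in \omega}\right) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k - \sqrt{p_k})^2}{2\mu\sigma_k^2},\ D^{\omega}_1\left(\{R_{X_k}\}_{k \in \omega}\right) + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k + \sqrt{p_k})^2}{2\mu\sigma_k^2} \Bigg\} \\ &= E_{\omega}. \end{aligned} \tag{A20}$$

Appendix I

Based on the results in Appendix D, $E_{\omega}$ as defined in (32) satisfies

$$E_{\omega} = \min_{\phi_{\omega} \in \mathbb{R}^{k_{\omega}},\, \{\theta_k\}_{k \in [K] \setminus \omega}} \max\Bigg\{ \frac{1}{8}\left\| B_{\omega}\left(B^{\dagger}_{\omega}\psi^{(0)}_{\omega} - \phi_{\omega}\right) \right\|^2 + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k - \sqrt{p_k})^2}{2\mu\sigma_k^2},\ \frac{1}{8}\left\| B_{\omega}\left(B^{\dagger}_{\omega}\psi^{(1)}_{\omega} - \phi_{\omega}\right) \right\|^2 + \sum_{k \in [K] \setminus \omega} \frac{(\theta_k + \sqrt{p_k})^2}{2\mu\sigma_k^2} \Bigg\} + o(\epsilon^2), \tag{A21}$$

where $k_{\omega} \triangleq \sum_{k \in \omega} |\mathcal{X}_k|$, and then the result can be verified using Lagrange multipliers.

References

1. Han, T.S.; Amari, S. Statistical inference under multiterminal data compression. IEEE Trans. Inf. Theory 1998, 44, 2300–2324. [CrossRef] 2. Ahlswede, R.; Csiszár, I. Hypothesis testing with communication constraints. IEEE Trans. Inf. Theory 1986, 32, 533–542. [CrossRef] 3. Han, T.S.; Kobayashi, K. Exponential-type error probabilities for multiterminal hypothesis testing. IEEE Trans. Inf. Theory 1989, 35, 2–14. [CrossRef] 4. Amari, S.I.; Han, T.S. Statistical inference under multiterminal rate restrictions: A differential geometric approach. IEEE Trans. Inf. Theory 1989, 35, 217–227. [CrossRef] 5. Watanabe, S. Neyman–Pearson test for zero-rate multiterminal hypothesis testing. IEEE Trans. Inf. Theory 2017, 64, 4923–4939. [CrossRef] 6. Shimokawa, H.; Han, T.S.; Amari, S. Error bound of hypothesis testing with data compression. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 114. [CrossRef] 7. Xu, X.; Huang, S.L. On Distributed Learning with Constant Communication Bits. IEEE J. Sel. Areas Inf. Theory 2022, 3, 125–134. [CrossRef] 8. Sreekumar, S.; Gündüz, D.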
Strong Converse for Testing Against Independence over a Noisy channel. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 1283–1288. [CrossRef] 9. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A.Y. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; Volume 54, pp. 1273–1282. 10. Vapnik, V. Principles of Risk Minimization for Learning Theory. In Proceedings of the 4th International Conference on Neural Information Processing Systems, San Francisco, CA, USA, 2–5 December 1991; pp. 831–838. Entropy 2023, 25, 1434 24 of 24 11. Srivastava, N.; Salakhutdinov, R. Multimodal learning with deep boltzmann machines. J. Mach. Learn. Res. 2014, 15, 2949–2980. 12. Cover, T.M.; Thomas, J.A. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing); Wiley-Interscience: Hoboken, NJ, USA, 2006. 13. Huang, S.L.; Makur, A.; Wornell, G.W.; Zheng, L. On universal features for high-dimensional learning and inference. arXiv 2019, arXiv:1911.09105. 14. Han, T.S. Hypothesis testing with multiterminal data compression. IEEE Trans. Inf. Theory 1987, 33, 759–772. [CrossRef] 15. Scardapane, S.; Wang, D.; Panella, M.; Uncini, A. Distributed learning for random vector functional-link networks. Inf. Sci. 2015, 301, 271–284. 16. Georgopoulos, L.; Hasler, M. Distributed machine learning in networks by consensus. Neurocomputing 2014, 124, 2–12. [CrossRef] 17. Tsitsiklis, J.; Athans, M. On the complexity of decentralized decision making and detection problems. IEEE Trans. Autom. Control 1985, 30, 440–446. [CrossRef] 18. Tsitsiklis, J.N. Decentralized detection by a large number of sensors. Math. Control. Signals Syst. 1988, 1, 167–182. [CrossRef] 19. Tenney, R.R.; Sandell, N.R. Detection with distributed sensors. 
IEEE Trans. Aerosp. Electron. Syst. 1981, AES-17, 501–510. [CrossRef] 20. Shalaby, H.M.; Papamarcou, A. Multiterminal detection with zero-rate data compression. IEEE Trans. Inf. Theory 1992, 38, 254–267. [CrossRef] 21. Zhao, W.; Lai, L. Distributed testing with zero-rate compression. In Proceedings of the 2015 IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 2792–2796. 22. Sreekumar, S.; Gündüz, D. Distributed hypothesis testing over noisy channels. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 983–987. 23. Zaidi, A. Hypothesis Testing Against Independence Under Gaussian Noise. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 1289–1294. [CrossRef] 24. Salehkalaibar, S.; Wigger, M.A. Distributed hypothesis testing over a noisy channel. In Proceedings of the International Zurich Seminar on Information and Communication (IZS 2018), Zurich, Switzerland, 21–23 February 2018; pp. 25–29. 25. Weinberger, N.; Kochman, Y.; Wigger, M. Exponent trade-off for hypothesis testing over noisy channels. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1852–1856. 26. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. 27. Csiszár, I.; Shields, P.C. Information Theory and Statistics: A Tutorial; Now Publishers Inc.: Delft, The Netherlands, 2004. 28. Csiszár, I. The method of types [information theory]. IEEE Trans. Inf. Theory 1998, 44, 2505–2523. [CrossRef] 29. Huang, S.L.; Xu, X.; Zheng, L. An information-theoretic approach to unsupervised feature selection for high-dimensional data. IEEE J. Sel. Areas Inf. Theory 2020, 1, 157–166. [CrossRef] 30. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2012. 31. 
Graham, R.L.; Knuth, D.E.; Patashnik, O.; Liu, S. Concrete mathematics: A foundation for computer science. Comput. Phys. 1989, 3, 106–107. [CrossRef] 32. Blair, J.; Edwards, C.; Johnson, J.H. Rational Chebyshev approximations for the inverse of the error function. Math. Comput. 1976, 30, 827–830. [CrossRef] Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.