Physical-Layer Arithmetic for Federated Learning in Uplink MU-MIMO Enabled Wireless Networks Tao Huang',Baoliu Yef,Zhihao Qu,Bin Tang',Lei Xief,Sanglu Lut National Key Laboratory for Novel Software Technology,Nanjing University,China College of Computer and Information,Hohai University,China Email:schrodinger.huang@gmail.com,yebl@nju.edu.cn,quzhihao@hhu.edu.cn,[tb,Ixie,sanglu)@nju.edu.cn Abstract-Federated learning is a very promising machine To improve the communication efficiency,methods based learning paradigm where a large number of clients cooperatively on reducing the size of updates needed to transmit have been train a global model using their respective local data.In this paper,we consider the application of federated learning in widely studied [5]-[9],by performing techniques like quanti- wireless networks featuring uplink multiuser multiple-input and zation,sparsification.etc.In addition to simply compressing multiple-output (MU-MIMO),and aim at optimizing the commu- client-side updates,there is also work focusing on improving nication efficiency during the aggregation of client-side updates the communication efficiency of federated learning by utiliz- by exploiting the inherent superposition of radio frequency ing the characteristics of networks.Specifically,for wireless (RF)signals.We propose a novel approach named Physical- Layer Arithmetic (PhyArith),where the clients encode their local networks where uplink multiuser multiple-input and multiple- updates into aligned digital sequences which are converted into output (MU-MIMO)is enabled,i.e.,clients can concurrently RF signals for sending to the server simultaneously,and the send their updates to the server in the same time-frequency server directly recovers the exact summation of these updates resource,recent work like [10]-[13]study exploiting the as required from the superimposed RF signal by employing a superposition of radio frequency (RF)signals to effectively customized sum-product algorithm.PhyArith is compatible with aggregate updates by averaging them,based on the technique commodity devices due to the use of full digital operation in both the client-side encoding and the server-side decoding processes. called over-the-air computation (AirComp).These AirComp and can also be integrated with other updates compression based based work use uncoded analog transmission to send aligned acceleration techniques.Simulation results show that PhyArith client-side updates in the form of vector,each of which has further improves the communication efficiency by 1.5 to 3 times the same number of elements placed in the same order.Clients for training LeNet-5,compared with solutions only applying in these work are assumed to be equipped with a special updates compression. device featuring linear-analog-modulation and pre-channel- compensation (or simply assumed have no fading channel). I.INTRODUCTION such that the superimposed RF signal can be well utilized for Federated learning and related decentralized learning are fast update aggregation.However,these work are not dedicated the kind of machine learning paradigm where the goal is to to obtaining the exact average of updates.While the channel train a high quality centralized model,while training data noise is not negligible and the pre-channel-compensation of remains distributed over a large number of clients [1]-[4]. each client is not perfect,their obtained average may contain In each round of learning algorithms for this paradigm,each significant aggregation error,which may seriously affect the client independently computes an update to the current model convergence of the global model to train. based on its local data,and sends this update to a central In this paper,we study utilizing the superposition of RF server,where the client-side updates are aggregated to compute signals to efficiently aggregate client-side updates in federated a new global model.Communicating the model updates in learning by using commodity devices dedicated to modern each round has been observed to be a significant performance wireless networks featuring uplink MU-MIMO,e.g.,802.11ax bottleneck [5][6]for this paradigm.It is particularly serious [14].where neither linear-analog-modulation nor pre-channel- for federated learning,since its typical clients are smart phones compensation is necessarily available.We target at reliably ob- and loT devices,which are with unreliable and relatively slow taining the exact average of updates,such that the convergence network connections. of the global model to train is guaranteed for realistic channel conditions.Inspired by the fact that the amount of information This work was supported in part by the National K&D Program of China of all updates is greater than that of their summation,our under Grant 2018YFB1004704,in part by the National Natural Science Foun- idea is that we can directly recover the exact summation dation of China under Grant 61832005.Grant 61872171,and Grant 61872174 in part by the Key R&D of Jiangsu Province under Grant BE2017152,in based on the received superimposed RF signal,and then part by the Natural Science Foundation of Jiangsu Province under Grant calculate their average.It should have lower outage probability BK20190058.in part by Project funded by China Postdoctoral Science compared with the conventional multiuser detection (MUD) Foundation under Grant 2019M661709,and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization. based solution to deal with such mutual interference in uplink Baoliu Ye and Bin Tang are corresponding authors. MU-MIMO,i.e.,separating these colliding data streams based
Physical-Layer Arithmetic for Federated Learning in Uplink MU-MIMO Enabled Wireless Networks Tao Huang† , Baoliu Ye† , Zhihao Qu‡ , Bin Tang† , Lei Xie† , Sanglu Lu† †National Key Laboratory for Novel Software Technology, Nanjing University, China ‡College of Computer and Information, Hohai University, China Email: schrodinger.huang@gmail.com, yebl@nju.edu.cn, quzhihao@hhu.edu.cn, {tb, lxie, sanglu}@nju.edu.cn Abstract—Federated learning is a very promising machine learning paradigm where a large number of clients cooperatively train a global model using their respective local data. In this paper, we consider the application of federated learning in wireless networks featuring uplink multiuser multiple-input and multiple-output (MU-MIMO), and aim at optimizing the communication efficiency during the aggregation of client-side updates by exploiting the inherent superposition of radio frequency (RF) signals. We propose a novel approach named PhysicalLayer Arithmetic (PhyArith), where the clients encode their local updates into aligned digital sequences which are converted into RF signals for sending to the server simultaneously, and the server directly recovers the exact summation of these updates as required from the superimposed RF signal by employing a customized sum-product algorithm. PhyArith is compatible with commodity devices due to the use of full digital operation in both the client-side encoding and the server-side decoding processes, and can also be integrated with other updates compression based acceleration techniques. Simulation results show that PhyArith further improves the communication efficiency by 1.5 to 3 times for training LeNet-5, compared with solutions only applying updates compression. I. INTRODUCTION Federated learning and related decentralized learning are the kind of machine learning paradigm where the goal is to train a high quality centralized model, while training data remains distributed over a large number of clients [1]–[4]. In each round of learning algorithms for this paradigm, each client independently computes an update to the current model based on its local data, and sends this update to a central server, where the client-side updates are aggregated to compute a new global model. Communicating the model updates in each round has been observed to be a significant performance bottleneck [5] [6] for this paradigm. It is particularly serious for federated learning, since its typical clients are smart phones and IoT devices, which are with unreliable and relatively slow network connections. This work was supported in part by the National K&D Program of China under Grant 2018YFB1004704, in part by the National Natural Science Foundation of China under Grant 61832005, Grant 61872171, and Grant 61872174, in part by the Key R&D of Jiangsu Province under Grant BE2017152, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20190058, in part by Project funded by China Postdoctoral Science Foundation under Grant 2019M661709, and in part by the Collaborative Innovation Center of Novel Software Technology and Industrialization. Baoliu Ye and Bin Tang are corresponding authors. To improve the communication efficiency, methods based on reducing the size of updates needed to transmit have been widely studied [5]–[9], by performing techniques like quantization, sparsification, etc. In addition to simply compressing client-side updates, there is also work focusing on improving the communication efficiency of federated learning by utilizing the characteristics of networks. Specifically, for wireless networks where uplink multiuser multiple-input and multipleoutput (MU-MIMO) is enabled, i.e., clients can concurrently send their updates to the server in the same time-frequency resource, recent work like [10]–[13] study exploiting the superposition of radio frequency (RF) signals to effectively aggregate updates by averaging them, based on the technique called over-the-air computation (AirComp). These AirComp based work use uncoded analog transmission to send aligned client-side updates in the form of vector, each of which has the same number of elements placed in the same order. Clients in these work are assumed to be equipped with a special device featuring linear-analog-modulation and pre-channelcompensation (or simply assumed have no fading channel), such that the superimposed RF signal can be well utilized for fast update aggregation. However, these work are not dedicated to obtaining the exact average of updates. While the channel noise is not negligible and the pre-channel-compensation of each client is not perfect, their obtained average may contain significant aggregation error, which may seriously affect the convergence of the global model to train. In this paper, we study utilizing the superposition of RF signals to efficiently aggregate client-side updates in federated learning by using commodity devices dedicated to modern wireless networks featuring uplink MU-MIMO, e.g., 802.11ax [14], where neither linear-analog-modulation nor pre-channelcompensation is necessarily available. We target at reliably obtaining the exact average of updates, such that the convergence of the global model to train is guaranteed for realistic channel conditions. Inspired by the fact that the amount of information of all updates is greater than that of their summation, our idea is that we can directly recover the exact summation based on the received superimposed RF signal, and then calculate their average. It should have lower outage probability compared with the conventional multiuser detection (MUD) based solution to deal with such mutual interference in uplink MU-MIMO, i.e., separating these colliding data streams based
on the superimposed RF signal,recovering each update,and to compress the sparse update of each client,such that they averaging them,as shown in Fig.1.We denote our solution are aligned and can be efficiently aggregated by PhyArith. as Physical-Layer Arithmetic (PhyArith),where the clients The main contributions are summarized as follows:(1) encode their local updates into aligned digital sequences which We show that even if not applying pre-channel-compensation, are converted into RF signals for sending to the server simulta- the superimposed RF signal can be exploited to efficiently neously,and the server directly recovers the exact summation aggregate updates,without incurring any aggregation error. of these updates from the superimposed RF signal through a (2)Based on full digital operation,we design the client-side customized decoder. updates encoding process of PhyArith,such that it can be applied on commodity 802.11 devices,and their summation can be reliably recovered and verified.We also design its Client server-side decoding process where a customized sum-product Client Update algorithm is provided to directly recover the summation of a ,1▣ large number of updates.(3)PhyArith can be integrated with Client Updaterl- other updates compression methods.Simulation results show RE sic Process of clients that Phy Arith further improves the communication efficiency Process of server by 1.5 to 3 times for training LeNet-5,compared with the solutions which apply quantization/sparsification to compress updates,and aggregate them via MUD. Uptim- II.RELATED WORK 黑器 A.Quantization and Sparsification Averaged update Summation of updates Updates quantization and sparsification are the two main approaches to overcome the communication bottleneck in decentralized learning.Updates quantization was first pro- posed in [5],where elements not less than zero are encoded Fig.1.Recovery of averaged update based on MUD and PhyArith as bit 1,and those less than zero are encoded as bit 0. Following such 1-bit quantization,subsequent work like [8] For this purpose,three main challenges should be well [9]introduce multilevel updates quantization,so as to balance addressed.The first challenge is,how to encode each update at the communication efficiency and updates accuracy.Updates the client with commodity devices based on full digital opera- sparsification was first proposed in [6],where only elements tion,such that PhyArith can conveniently and reliably recover with large absolute value are encoded using 1-bit quantization. their exact summation.and effectively verify its correctness.To and sent out along with their associate index.Following this address this challenge,we encode updates into aligned digital approach,various sparsification methods,like sending fixed sequences by taking two's-complement representation [15]as proportion of large elements [7],are proposed. source codes,and systematic low-density parity check(LDPC) codes [16]as channel codes,such that the addition and B.AirComp subtraction operations during aggregation can be unified,and Recent AirComp based work focus on leveraging the su- error correction capability can be provided.We also append perposition of RF signal to aggregate client-side updates. the cyclic-redundancy-checksum(CRC)[17]with each update, Specifically,[12][13]study performing power control and such that its linear property can be exploited to verify the client selection to align the signal power received at the server correctness of the directly recovered summation.The second and avoid incurring serious aggregation error.[10]studies challenge is,how to directly recover the exact summation aggregating updates during wireless networks with power and of updates at the server,which is equipped with multiple bandwidth limitations,based on coded digital transmission in antennas,based on the superimposed RF signal of these different time-frequency resources,and uncoded analog trans- encoded updates emanating from a large number of different mission in the same time-frequency resource,i.e.,performing clients.To address this challenge,from the superimposed AirComp.It is also worth noting that although recent AirComp RF signal,we propose a sum-product algorithm [18]derived related work are all based on uncoded analog transmission,its polynomial time complexity decoder to directly obtain the original idea appeared in [19],which relies on lattice codes marginal probability mass function(PMF)of the summation of to achieve reliable over-the-air computation.However,it also updates.By maximizing the obtained PMF,we can recover the assumes no fading channel,which can not be directly applied summation,verify its correctness,and update the global model. on commodity devices at the client-side.And at the server- And the third challenge is,how to integrate PhyArith with side,it lacks practical efficient decoding approach. those updates compression methods like sparsification,where the client-side updates are not necessarily aligned due to C.MUD compression.To address this challenge,we adopt compressed- Different from AirComp,the conventional approach to sensing here by using a shared underdetermined sensing matrix deal with the superimposed RF signal is multiuser detection
on the superimposed RF signal, recovering each update, and averaging them, as shown in Fig. 1. We denote our solution as Physical-Layer Arithmetic (PhyArith), where the clients encode their local updates into aligned digital sequences which are converted into RF signals for sending to the server simultaneously, and the server directly recovers the exact summation of these updates from the superimposed RF signal through a customized decoder. Update|V |−1 Update1 Update0 MUD decoder Update∗ |V |−1 Update∗ 1 Update∗ 0 Averaged update PhyArith decoder Averaged update Summation of updates PhyArith based solution MUD based solution Propagate Process of server Process of clients · · · · · · · · · RF signal Superimposed Client RF signal Client Client · · · Fig. 1. Recovery of averaged update based on MUD and PhyArith For this purpose, three main challenges should be well addressed. The first challenge is, how to encode each update at the client with commodity devices based on full digital operation, such that PhyArith can conveniently and reliably recover their exact summation, and effectively verify its correctness. To address this challenge, we encode updates into aligned digital sequences by taking two’s-complement representation [15] as source codes, and systematic low-density parity check (LDPC) codes [16] as channel codes, such that the addition and subtraction operations during aggregation can be unified, and error correction capability can be provided. We also append the cyclic-redundancy-checksum (CRC) [17] with each update, such that its linear property can be exploited to verify the correctness of the directly recovered summation. The second challenge is, how to directly recover the exact summation of updates at the server, which is equipped with multiple antennas, based on the superimposed RF signal of these encoded updates emanating from a large number of different clients. To address this challenge, from the superimposed RF signal, we propose a sum-product algorithm [18] derived polynomial time complexity decoder to directly obtain the marginal probability mass function (PMF) of the summation of updates. By maximizing the obtained PMF, we can recover the summation, verify its correctness, and update the global model. And the third challenge is, how to integrate PhyArith with those updates compression methods like sparsification, where the client-side updates are not necessarily aligned due to compression. To address this challenge, we adopt compressedsensing here by using a shared underdetermined sensing matrix to compress the sparse update of each client, such that they are aligned and can be efficiently aggregated by PhyArith. The main contributions are summarized as follows: (1) We show that even if not applying pre-channel-compensation, the superimposed RF signal can be exploited to efficiently aggregate updates, without incurring any aggregation error. (2) Based on full digital operation, we design the client-side updates encoding process of PhyArith, such that it can be applied on commodity 802.11 devices, and their summation can be reliably recovered and verified. We also design its server-side decoding process where a customized sum-product algorithm is provided to directly recover the summation of a large number of updates. (3) PhyArith can be integrated with other updates compression methods. Simulation results show that PhyArith further improves the communication efficiency by 1.5 to 3 times for training LeNet-5, compared with the solutions which apply quantization/sparsification to compress updates, and aggregate them via MUD. II. RELATED WORK A. Quantization and Sparsification Updates quantization and sparsification are the two main approaches to overcome the communication bottleneck in decentralized learning. Updates quantization was first proposed in [5], where elements not less than zero are encoded as bit 1, and those less than zero are encoded as bit 0. Following such 1-bit quantization, subsequent work like [8] [9] introduce multilevel updates quantization, so as to balance the communication efficiency and updates accuracy. Updates sparsification was first proposed in [6], where only elements with large absolute value are encoded using 1-bit quantization, and sent out along with their associate index. Following this approach, various sparsification methods, like sending fixed proportion of large elements [7], are proposed. B. AirComp Recent AirComp based work focus on leveraging the superposition of RF signal to aggregate client-side updates. Specifically, [12] [13] study performing power control and client selection to align the signal power received at the server and avoid incurring serious aggregation error. [10] studies aggregating updates during wireless networks with power and bandwidth limitations, based on coded digital transmission in different time-frequency resources, and uncoded analog transmission in the same time-frequency resource, i.e., performing AirComp. It is also worth noting that although recent AirComp related work are all based on uncoded analog transmission, its original idea appeared in [19], which relies on lattice codes to achieve reliable over-the-air computation. However, it also assumes no fading channel, which can not be directly applied on commodity devices at the client-side. And at the serverside, it lacks practical efficient decoding approach. C. MUD Different from AirComp, the conventional approach to deal with the superimposed RF signal is multiuser detection
(MUD),where updates are separately recovered and aggre-threshold.As a conclusion,we can use PhyArith to improve gated.The state-of-the-art polynomial time complexity MUD the communication efficiency of federated learning without approach includes Gaussian message based interference can- incurring any effect on its convergence,even if pre-channel- cellation (GM-IC)[20][211,and linear minimum mean square compensation is not applied. error based interference cancellation (LMMSE-IC)[22],etc. Like our server-side decoder in PhyArith,these approaches are ha =h.lhol =+1. he=-ha,hal=+l,坠yAil also derived from the sum-product algorithm,which operates hol =a=+1.MUD on the factor-graph [18]in a message-passing manner. III.MOTIVATION In this section,through a theoretical analysis,we show that even if pre-channel-compensation of AirComp is not applied. 56T8 10 11 12 the superimposed RF signal can still be exploited to efficiently SNR (dB) aggregate updates,without incurring any aggregation error. Consider the following simple channel with two-input and one- Fig.2.Symbol time in theory for different decoders output,denoted by IV.SYSTEM MODEL y=hoso +h1s1+w, A.Model of Federated Learning where y ER is the channel output symbol,ho,hER are the The goal of federated learning is to learn a model with channel gain,so,si are independent channel input symbols parameters embodied in a vector M from data stored across uniformly chosen in-1.+1,and noise w~A(0.) a large number of clients.For simplicity,we consider syn- We study the average symbol time needed to accurately chronized algorithms for federated learning where quantization obtain function f(so,s1)=so +s1 for MUD and Ph- is taken to compress client-side updates and accelerate the yArith in theory.To calculate the symbol time,we study communication.For cases where no compression methods are the entropy (i.e.,the amount of information,in bits)[23] applied,i.e.,directly sending floating-point updates,enabling of so.s1 and so+s1 first.Easy to find that entropy PhyArith is similar to the quantization case,by treating the H(s0)=H(s1)=-0.51og20.5-0.5log20.5=1,and signed integer exponent of each floating-point number as the H(s0+s1)=-0.51og20.25-0.5log20.5=1.5.And then coefficient for de-quantization,and treating the significand we study the capacity of above channel,characterised by as the quantized update element.And for cases where other mutual information,i.e.,the amount of information can be updates compression methods like sparsification are applied, retrieved from the channel,in bits per symbol time [23].For we show how to enable PhyArith for them in Section VIII. conventional successive-interference-cancellation (SIC)based A typical round t of such synchronized algorithms consists MUD,we have the mutual information between y and so. of the following steps:(1)A set of clients,denoted by V= i.e.,I(y;so)=H(y)-H(yso),and that between y and s1 tvo,...,Uv-1,is selected.Each client vV downloads on condition of so,i.e,I(;s1lso)=H(so)-H(y小so,s1). the current model M:from the server.(2)Each vV com- Therefore,the symbol time to obtain so +s1 for SIC based putes an updated model M?based on their local data.(3)The MUD is given by quantized update vector Ap =Q(M?-M:),vEV(Q() max[H(so)/I(y;so),H(s1)/I(y;s1lso)} (1) is the quantization function),as well as the de-quantization function Q(),are sent from each client to the server.(4)The For PhyArith,we calculate the mutual information between server aggregates these updates by averaging and construct an y and so +s1 instead,which is given by I(y;so +s1)= improved global model M+1=Mt +ntAt,where nt is H(y)-H(ylso +s1).Therefore,the symbol time to obtain the learning rate,and A:=v(A?)/IVI.Here we so +s1 is given by assume the quantization function Q()is like those proposed H(s0+s1)/I(y;s0+51): (2) in [8][9].As a result,without loss of generality,we have A is a vector of signed integers,and Q()can be formed as With (1)and (2),we plot Fig.2 for ho =h1,ho =+1 and ho =-h1,hol=+1,where calculation processes of entropy Q(A)=BA?+Y1, H(y),H(ylso:s1),H(ylso)and H(ylso +s1)are provided where B,Yo are floating-point de-quantization coefficients, in Appendix-A.We can find that while ho=h,hol =+1, B>0,and 1 is a vector with all entries being 1. which can be viewed as pre-channel-compensation is perfectly applied and function distortion (i.e.,f(so,s1)-y)is small, B.Model of Uplink MU-MIMO Channel symbol time of PhyArith is always less than that of MUD. Here we present the model of wireless networks like And while ho =-h1,hol =+1,which can be viewed as no 802.11ax,where uplink MU-MIMO is enabled.Let U pre-channel-compensation applied and function distortion is {uo,.,-}denote the set of antennas of the server. extremely serious,symbol time of PhyArith is also less than Clients in V are equipped with single antenna,which simul- that of MUD when signal-noise-ratio (SNR)is above some taneously transmit their local updates to the server,during
(MUD), where updates are separately recovered and aggregated. The state-of-the-art polynomial time complexity MUD approach includes Gaussian message based interference cancellation (GM-IC) [20] [21], and linear minimum mean square error based interference cancellation (LMMSE-IC) [22], etc. Like our server-side decoder in PhyArith, these approaches are also derived from the sum-product algorithm, which operates on the factor-graph [18] in a message-passing manner. III. MOTIVATION In this section, through a theoretical analysis, we show that even if pre-channel-compensation of AirComp is not applied, the superimposed RF signal can still be exploited to efficiently aggregate updates, without incurring any aggregation error. Consider the following simple channel with two-input and oneoutput, denoted by y = h0s0 + h1s1 + w, where y ∈ R is the channel output symbol, h0, h1 ∈ R are the channel gain, s0, s1 are independent channel input symbols uniformly chosen in {−1, +1}, and noise w ∼ N (0, σ2 ). We study the average symbol time needed to accurately obtain function f(s0, s1) = s0 + s1 for MUD and PhyArith in theory. To calculate the symbol time, we study the entropy (i.e., the amount of information, in bits) [23] of s0, s1 and s0 + s1 first. Easy to find that entropy H(s0) = H(s1) = −0.5 log2 0.5 − 0.5 log2 0.5 = 1, and H(s0 + s1) = −0.5 log2 0.25 − 0.5 log2 0.5 = 1.5. And then we study the capacity of above channel, characterised by mutual information, i.e., the amount of information can be retrieved from the channel, in bits per symbol time [23]. For conventional successive-interference-cancellation (SIC) based MUD, we have the mutual information between y and s0, i.e., I(y; s0) = H(y) − H(y|s0), and that between y and s1 on condition of s0, i.e., I(y; s1|s0) = H(y|s0) − H(y|s0, s1). Therefore, the symbol time to obtain s0 + s1 for SIC based MUD is given by max{H(s0)/I(y; s0), H(s1)/I(y; s1|s0)}. (1) For PhyArith, we calculate the mutual information between y and s0 + s1 instead, which is given by I(y; s0 + s1) = H(y) − H(y|s0 + s1). Therefore, the symbol time to obtain s0 + s1 is given by H(s0 + s1)/I(y; s0 + s1). (2) With (1) and (2), we plot Fig. 2 for h0 = h1, |h0| = +1 and h0 = −h1, |h0| = +1, where calculation processes of entropy H(y), H(y|s0, s1), H(y|s0) and H(y|s0 + s1) are provided in Appendix-A. We can find that while h0 = h1, |h0| = +1, which can be viewed as pre-channel-compensation is perfectly applied and function distortion (i.e., f(s0, s1) − y) is small, symbol time of PhyArith is always less than that of MUD. And while h0 = −h1, |h0| = +1, which can be viewed as no pre-channel-compensation applied and function distortion is extremely serious, symbol time of PhyArith is also less than that of MUD when signal-noise-ratio (SNR) is above some threshold. As a conclusion, we can use PhyArith to improve the communication efficiency of federated learning without incurring any effect on its convergence, even if pre-channelcompensation is not applied. 0 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 SNR (dB) Symbol time h0 = h1, |h0| = +1, PhyArith h0 = −h1, |h0| = +1, PhyArith |h0| = |h1| = +1, MUD Fig. 2. Symbol time in theory for different decoders IV. SYSTEM MODEL A. Model of Federated Learning The goal of federated learning is to learn a model with parameters embodied in a vector M from data stored across a large number of clients. For simplicity, we consider synchronized algorithms for federated learning where quantization is taken to compress client-side updates and accelerate the communication. For cases where no compression methods are applied, i.e., directly sending floating-point updates, enabling PhyArith is similar to the quantization case, by treating the signed integer exponent of each floating-point number as the coefficient for de-quantization, and treating the significand as the quantized update element. And for cases where other updates compression methods like sparsification are applied, we show how to enable PhyArith for them in Section VIII. A typical round t of such synchronized algorithms consists of the following steps: (1) A set of clients, denoted by V = {v0, · · · , v|V |−1}, is selected. Each client v ∈ V downloads the current model Mt from the server. (2) Each v ∈ V computes an updated model Mv t based on their local data. (3) The quantized update vector Av t = Qv(Mv t − Mt), v ∈ V (Qv(·) is the quantization function), as well as the de-quantization function Q−1 v (·), are sent from each client to the server. (4) The server aggregates these updates by averaging and construct an improved global model Mt+1 = Mt + ηtAt, where ηt is the learning rate, and At = P v∈V Q−1 v (Av t )/|V |. Here we assume the quantization function Qv(·) is like those proposed in [8] [9]. As a result, without loss of generality, we have Av t is a vector of signed integers, and Q−1 v (·) can be formed as Q −1 v (Av t ) = βvAv t + γv1, where βv, γv are floating-point de-quantization coefficients, βv > 0, and 1 is a vector with all entries being 1. B. Model of Uplink MU-MIMO Channel Here we present the model of wireless networks like 802.11ax, where uplink MU-MIMO is enabled. Let U = {u0, · · · , u|U|−1} denote the set of |U| antennas of the server. Clients in V are equipped with single antenna, which simultaneously transmit their local updates to the server, during
the same time-frequency resource,e.g..the same subcarriers=[..].These codewords of channel codes within the same orthogonal frequency-division multiplexing are then mapped into m baseband symbols via modulation, (OFDM)symbols in 802.1lax.For the j-th time-frequency denoted by s=[sv.0,..,sv.m-1],and sent out. resource unit (e.g.,1 subcarrier within 1 OFDM symbol),the uplink MU-MIMO channel can be presented as follows Quantized update vector A Yuoj Svo.j Wuo.j Slice =H, a. Quantized update yector chunk Quantized undate vecto chank yuwv1-1.j Svvl-13] WuU1-1.j Quantized update vector chunk a. Encod where wj is i.i.d.random variable following circularly nor- (source codes,checksum Bit-sequence CRC mal distribution C(0,202),and denotes the additive noise of antenna u for the j-th time-frequency resource unit.C Parity bits denotes the observation of u,sv.jES(where SCC)denotes Codeword x Modalate,interpolate,up-convert the baseband symbol of v obtained via modulation,and the channel gain matrix H[hh where vector RF signal =h以o…,h2,uw- Fig.3.Encoding process of each client V.OVERVIEW OF PHYARITH Here we provide an overview of PhyArith based on the A.Source Codes model of federated learning and uplink MU-MIMO channel Regarding the source codes,in the quantized update vector presented above.PhyArith consists of two phases,i.e.,the client-side encoding phase and the server-side decoding phase. chunk av=[.,dv.j,avj+1,..].each its element avj is a signed integer,and is encoded into c bits bvjo,bv.j At the client-side,each quantized update vector A?is sliced into chunks,which are encoded into aligned digital sequence within b =[6v.0,...,b.k1]using two's-complement rep- resentation [15].according to the following rule through the source codes and channel codes encoder by ap- pending checksums and parity bits,and simultaneously trans- (Lav./2--1]mod 2, av,j0, mitted through the uplink MU-MIMO channel.The source (2c+avj)/2-1-1]mod 2,o.w. (3) codes taken here are based on two's-complement representa- tion [15],and the channel codes are based on systematic low- where I=0,...,c-1,bv.jo is the most significant bit and density parity check(LDPC)codes [6],such that the additionis the least significant bit.Here we should ensure and subtraction operations during aggregation can be unified, 2e-1≥max{lau,llv∈V,j∈N}.Compared to other systems and error correction capability can be provided.The appended for representing signed numbers (e.g.,one's complement),the checksum is the cyclic-redundancy-checksum (CRC)[17]. advantage of two's-complement is that the fundamental arith- such that its linear property can be exploited to verify the metic operations of addition,subtraction,and multiplication correctness of the directly recovered summation.At the server- are identical to those for unsigned binary numbers,such that it side,from the superimposed RF signal uo,, 1 would be much convenient for our PhyArith to directly decode the summations of a large number of updates chunks are the summation of update vectors. directly recovered through a customized decoder derived from B.Checksum the sum-product algorithm.After verifying their correctness based on the linear property of CRC,these summations are The checksum appended to each bit-sequence of source averaged and assembled into (BA+1)/IVI,which codes is cyclic-redundancy-checksum (CRC)[17].Together is then taken to update the global model to train. with the bit-sequence encoded from a.they form the source- word b..We use CRC in PhyArith because it is a linear VI.ENCODING PROCESS OF PHYARITH function with the following property In this section,we demonstrate the encoding process of PhyArith.As shown in Fig.3,the quantized update CRC(b⊕b)=CRC(b)⊕CRC(b))=0⊕0=0,(4) vector A of client v is sliced into fixed size chunks where denotes bitwise-XOR,and CRC()returns the check- first,i.e,Ag=L…,aw,…].For each chunk a= sum of input in the form of sequence of bits.This property ,+1,],vZ.it is encoded into a bit-can help us verify the correctness of the decoded summation sequence of the source codes.The obtained bit-sequence of of update vectors.We present this verification process in detail source codes is then appended with its checksum (we use in Section VII-C. cyclic-redundancy-checksum,i.e.,CRC,in PhyArith),and forms thek bits sourceword of the channel codes,denoted C.Channel Codes by vector b=[6.0,...,bv.k-].Each such sourceword is In order to provide error correction capability in PhyArith, further encoded into the n bits codeword of the channel codes the k bits sourceword be,consisting of the bit-sequence by appending a sequence of parity bits,denoted by vector encoded from quantized update vector chunk a and its CRC
the same time-frequency resource, e.g., the same subcarriers within the same orthogonal frequency-division multiplexing (OFDM) symbols in 802.11ax. For the j-th time-frequency resource unit (e.g., 1 subcarrier within 1 OFDM symbol), the uplink MU-MIMO channel can be presented as follows yu0,j · · · yu|U|−1,j = Hj sv0,j · · · sv|V |−1,j + wu0,j · · · wu|U|−1,j , where wu,j is i.i.d. random variable following circularly normal distribution CN (0, 2σ 2 ), and denotes the additive noise of antenna u for the j-th time-frequency resource unit. yu,j ∈ C denotes the observation of u, sv,j ∈ S (where S ⊂ C) denotes the baseband symbol of v obtained via modulation, and the channel gain matrix Hj = [h j v0 , · · · , h j v|V |−1 ], where vector h j v = [h j v,u0 , · · · , hj v,u|U|−1 ] T . V. OVERVIEW OF PHYARITH Here we provide an overview of PhyArith based on the model of federated learning and uplink MU-MIMO channel presented above. PhyArith consists of two phases, i.e., the client-side encoding phase and the server-side decoding phase. At the client-side, each quantized update vector Av t is sliced into chunks, which are encoded into aligned digital sequence through the source codes and channel codes encoder by appending checksums and parity bits, and simultaneously transmitted through the uplink MU-MIMO channel. The source codes taken here are based on two’s-complement representation [15], and the channel codes are based on systematic lowdensity parity check (LDPC) codes [16], such that the addition and subtraction operations during aggregation can be unified, and error correction capability can be provided. The appended checksum is the cyclic-redundancy-checksum (CRC) [17], such that its linear property can be exploited to verify the correctness of the directly recovered summation. At the serverside, from the superimposed RF signal [yu0,j , · · · , yu|U|−1,j ] T , the summations of a large number of updates chunks are directly recovered through a customized decoder derived from the sum-product algorithm. After verifying their correctness based on the linear property of CRC, these summations are averaged and assembled into P v∈V (βvAv t +γv1)/|V |, which is then taken to update the global model to train. VI. ENCODING PROCESS OF PHYARITH In this section, we demonstrate the encoding process of PhyArith. As shown in Fig. 3, the quantized update vector Av t of client v is sliced into fixed size chunks first, i.e., Av t = [· · · , av, · · · ]. For each chunk av = [· · · , av,j , av,j+1, · · · ], av,j ∈ Z, it is encoded into a bitsequence of the source codes. The obtained bit-sequence of source codes is then appended with its checksum (we use cyclic-redundancy-checksum, i.e., CRC, in PhyArith), and forms the k bits sourceword of the channel codes, denoted by vector bv = [bv,0, · · · , bv,k−1]. Each such sourceword is further encoded into the n bits codeword of the channel codes by appending a sequence of parity bits, denoted by vector xv = [xv,0, · · · , xv,n−1]. These codewords of channel codes are then mapped into m baseband symbols via modulation, denoted by sv = [sv,0, · · · , sv,m−1], and sent out. Quantized update vector Av t Slice Quantized update vector chunk Quantized update vector chunk · · · Quantized update vector chunk av Encode (source codes, checksum) Bit-sequence CRC Encode (channel codes) Sourceword bv Parity bits Codeword xv Modulate, interpolate, up-convert RF signal Fig. 3. Encoding process of each client A. Source Codes Regarding the source codes, in the quantized update vector chunk av = [· · · , av,j , av,j+1, · · · ], each its element av,j is a signed integer, and is encoded into c bits {bv,j0 , · · · , bv,jc−1 } within bv = [bv,0, · · · , bv,k−1] using two’s-complement representation [15], according to the following rule bv,jl = ( ⌊av,j/2 c−l−1 ⌋ mod 2, av,j ≥ 0, ⌊(2c + av,j )/2 c−l−1 ⌋ mod 2, o.w., (3) where l = 0, · · · , c − 1, bv,j0 is the most significant bit and bv,jc−1 is the least significant bit. Here we should ensure 2 c−1 ≥ max{|av,j ||v ∈ V, j ∈ N}. Compared to other systems for representing signed numbers (e.g., one’s complement), the advantage of two’s-complement is that the fundamental arithmetic operations of addition, subtraction, and multiplication are identical to those for unsigned binary numbers, such that it would be much convenient for our PhyArith to directly decode the summation of update vectors. B. Checksum The checksum appended to each bit-sequence of source codes is cyclic-redundancy-checksum (CRC) [17]. Together with the bit-sequence encoded from av, they form the sourceword bv. We use CRC in PhyArith because it is a linear function with the following property CRC(bv ⊕bv′ ) = CRC(bv)⊕CRC(bv′ ) = 0⊕0 = 0, (4) where ⊕ denotes bitwise-XOR, and CRC(·) returns the checksum of input in the form of sequence of bits. This property can help us verify the correctness of the decoded summation of update vectors. We present this verification process in detail in Section VII-C. C. Channel Codes In order to provide error correction capability in PhyArith, the k bits sourceword bv, consisting of the bit-sequence encoded from quantized update vector chunk av and its CRC
is encoded into n bits codeword x=[v.0,..,v.n-1]of Sourceword-sumz和 channel codes.The channel codes adopted here are systematic ∑ek,(au+w1) low-density parity check (LDPC)codes [16].For each k bits Sourceword-sum z1 ∑ey(a+e1) sourceword bo,it is appended with n-k parity bits provided Sourceword-sum z2 by LDPC codes encoder,and form the n bits codeword x. ∑.e月n+1) To enable PhyArith,we require the quantized update vector Sourceword-sum到- chunks transmitted simultaneously share the same LDPC dd up and codes codebook,i.e.,the mapping from b to x should be eed update vector chun the same for different client v.Besides,for modulation,the vector chank mapping from x to s=[sv.0,....so.m-1]should be the same as well. Averaged update vector At D.Compatibility with Commodity 802.11 Devices Fig.4.Decoding process of server In 802.1lax,similar with PhyArith,systematic LDPC codes are taken for error correction.For fixed coding rate,the code- A.Building of the Factor-Graph book mapping from scrambled sourceword to LDPC codes codeword is fixed.Besides,for fixed modulation,the mapping We first show how to directly decode each sourceword- sum zi based on the superimposed RF signal.Let y;= from codeword to symbols transmitted in the same time- frequency resource is also fixed.As a conclusion,the encoding be the j-th observation vector.The process of 802.1lax is fully compatible with PhyArith.That decoding problem here can be formulated as the following optimization problem is,we can directly use commodity single antenna 802.1lax devices to send the sourceword b of PhyArith via a raw z,…,z-1 socket,where the channel codes related process is provided =ag,maxP(z0,·,z4y-1yo,·,ym-1) (5) by 802.1lax devices. z0,…,zy1-1 Since jointly optimize zo,...,z-1 is too complicated to VII.DECODING PROCESS OF PHYARITH solve.We can relax(5)as follows In this section,we demonstrate the decoding process of zij=arg max P(zi.jlyo,..ym-1), PhyArith.Limited by the decoding complexity,client set V in PhyArith is randomly partitioned into subsets V= where the marginal probability can be written as the following {V6,…,v-i}first,where VVi∈y,lVA≤△and△is sum-product form via factorization the subset size limit.Then,as shown in Fig.4,based on the superimposed RF signal,PhyArith directly decodes the P(2,ly0,…,ym-1 sourceword-sum of each subset Vi,which is denoted by = ∑P(yo,…,ym-1,z0,…,z4-1)/P(yo,…,ym-1) w24,3 z1=[,0…,2.k-il=ab A∑[P(yo,…,ym-ilso…,s-i) w24.d where B.,is an odd integer given by v1-1 ΠIΠIs,x,b6∑a,b。-zrlh】 B+1,with probability 1/2, i'=0 vEV UEV (6 6n·2-1,0.w. Here denotes all random variables exceptz and is a randomly chosen integer ensuring Vo E V,B.. I(sv,x,b)is the indicator function to ensure x is a valid 2f∈N,B,·2s≥l.For these obtained sourceword-sumz: LDPC codes codeword encoded from be,and su is a valid verified to be correct,the summation of update vector chunks symbol sequence modulated from x.And 6()denotes the of Vi,i.e..v(B+),is recovered.And for these Dirac delta function. Zi verified to be incorrect,retransmission is requested.Finally, A general method to efficiently obtain the marginal prob- all these recovered summations can be added up and averaged, ability of each zi;is the factor-graph based sum-product then assembled into the averaged update vector At,given by algorithm [18],which operates in a message-passing man- ner.Specifically,while=2,the factorization (6)can A,=[…,∑a+1/W,…] be visualized by a factor-graph,a kind of bipartite graph ViEV vEV like Fig.5,which consists of two kind of nodes.One is called variable node (denoted by circles),representing random Note that here we let each B be a large odd integer to facilitate variables,and the other is called check node (denoted by the verification of zi in Section VII-C,and make sure the squares),representing the local functions of variables.By recovered v1)is accurate in Section VII-D. exchanging messages in the form of probability mass function
is encoded into n bits codeword xv = [xv,0, · · · , xv,n−1] of channel codes. The channel codes adopted here are systematic low-density parity check (LDPC) codes [16]. For each k bits sourceword bv, it is appended with n−k parity bits provided by LDPC codes encoder, and form the n bits codeword xv. To enable PhyArith, we require the quantized update vector chunks transmitted simultaneously share the same LDPC codes codebook, i.e., the mapping from bv to xv should be the same for different client v. Besides, for modulation, the mapping from xv to sv = [sv,0, · · · , sv,m−1] should be the same as well. D. Compatibility with Commodity 802.11 Devices In 802.11ax, similar with PhyArith, systematic LDPC codes are taken for error correction. For fixed coding rate, the codebook mapping from scrambled sourceword to LDPC codes codeword is fixed. Besides, for fixed modulation, the mapping from codeword to symbols transmitted in the same timefrequency resource is also fixed. As a conclusion, the encoding process of 802.11ax is fully compatible with PhyArith. That is, we can directly use commodity single antenna 802.11ax devices to send the sourceword bv of PhyArith via a raw socket, where the channel codes related process is provided by 802.11ax devices. VII. DECODING PROCESS OF PHYARITH In this section, we demonstrate the decoding process of PhyArith. Limited by the decoding complexity, client set V in PhyArith is randomly partitioned into subsets V = {V0, · · · , V|V |−1} first, where ∀Vi ∈ V, |Vi | ≤ ∆ and ∆ is the subset size limit. Then, as shown in Fig. 4, based on the superimposed RF signal, PhyArith directly decodes the sourceword-sum of each subset Vi , which is denoted by zi = [zi,0, · · · , zi,k−1] = X v∈Vi βˆ vbv, where βˆ v is an odd integer given by βˆ v = ( βv · 2 ζ + 1, with probability 1/2, βv · 2 ζ − 1, o.w., and ζ is a randomly chosen integer ensuring ∀v ∈ V, βv · 2 ζ ∈ N, βv · 2 ζ ≫ 1. For these obtained sourceword-sum zi verified to be correct, the summation of update vector chunks of Vi , i.e., P v∈Vi (βvav + γv1), is recovered. And for these zi verified to be incorrect, retransmission is requested. Finally, all these recovered summations can be added up and averaged, then assembled into the averaged update vector At, given by At = h · · · , X Vi∈V X v∈Vi (βvav + γv1)/|V |, · · · i . Note that here we let each βˆ v be a large odd integer to facilitate the verification of zi in Section VII-C, and make sure the recovered P v∈Vi (βvav + γv1) is accurate in Section VII-D. Sourceword-sum z|V|−1 Sourceword-sum z2 Sourceword-sum z1 Sourceword-sum z0 · · · Superimposed RF signal Sum-product derived decoder Recover or request retransmit P v∈Vl (βvav + γv1) P v∈Vj (βvav + γv1) P v∈Vi (βvav + γv1) · · · · · · Add up and average Averaged update vector chunk Averaged update vector chunk Averaged update vector chunk · · · Averaged update vector At Assemble Fig. 4. Decoding process of server A. Building of the Factor-Graph We first show how to directly decode each sourcewordsum zi based on the superimposed RF signal. Let yj = [yu0,j , · · · , yu|U|−1,j ] T be the j-th observation vector. The decoding problem here can be formulated as the following optimization problem z ∗ 0 , · · · , z ∗ |V |−1 =arg max z0,··· ,z|V|−1 P(z0, · · · , z|V |−1|y0, · · · , ym−1). (5) Since jointly optimize z0, · · · , z|V |−1 is too complicated to solve. We can relax (5) as follows z ∗ i,j = arg max zi,j P(zi,j |y0, · · · , ym−1), where the marginal probability can be written as the following sum-product form via factorization P(zi,j |y0, · · · , ym−1) = X ∼zi,j P(y0, · · · , ym−1, z0, · · · , z|V |−1)/P(y0, · · · , ym−1) = A· X ∼zi,j P(y0, · · · , ym−1|sv0 , · · · , sv|V |−1 )· |V |− Y 1 i ′=0 [ Y v∈Vi ′ I(sv, xv, bv)]δ(k X v∈Vi ′ βˆ vbv − zi ′k1) . (6) Here ∼ zi,j denotes all random variables except zi,j . I(sv, xv, bv) is the indicator function to ensure xv is a valid LDPC codes codeword encoded from bv, and sv is a valid symbol sequence modulated from xv. And δ(·) denotes the Dirac delta function. A general method to efficiently obtain the marginal probability of each zi,j is the factor-graph based sum-product algorithm [18], which operates in a message-passing manner. Specifically, while |V| = 2, the factorization (6) can be visualized by a factor-graph, a kind of bipartite graph like Fig. 5, which consists of two kind of nodes. One is called variable node (denoted by circles), representing random variables, and the other is called check node (denoted by squares), representing the local functions of variables. By exchanging messages in the form of probability mass function