Spectral Kurtosis Based RFI Mitigation for CHIME

We present the implementation of a spectral kurtosis based Radio Frequency Interference detection system on the CHIME instrument and its reduced-scale pathfinder. Our implementation extends single-receiver formulations to the case of a compact array, combining samples from multiple receivers to improve the confidence with which RFI is detected. Through comparison between on-sky data and simulations, we show that the statistical properties of the canonical spectral kurtosis estimator are functionally unchanged by cross-array integration. Moreover, by comparison of simultaneous data from CHIME and the Pathfinder, we evaluate our implementation's capacity for interference discrimination for compact arrays of various size. We conclude that a spectral kurtosis based implementation provides a scalable, high cadence RFI discriminator for compact multi-receiver arrays.


Introduction
The presence of non-astronomical signals in the electromagnetic spectrum ('Radio-Frequency Interference' or RFI) poses a significant hazard to successful radio observations. Sources of such interference include lightning and electrical discharges, inadvertent emissions from terrestrial electronics, radio-frequency telecommunication signals, and communications with satellites and aircraft (Galt 1991;Offringa et al. 2013;Sokolowski et al. 2016).
The Canadian Hydrogen Intensity Mapping Experiment (CHIME 8 ) is a newly-constructed radio interferometer with 1024 dual-polarization receivers continuously observing a 400-800 MHz band. A fully independent reduced-scale prototype, the CHIME Pathfinder (henceforth 'the Pathfinder') possesses 128 dualpolarization receivers and an identical observing band (Bandura et al. 2014;Newburgh et al. 2014). CHIME and the Pathfinder each follow an 'FX' correlator architecture, with an FPGA-based Fourier-transform stage (Bandura et al. 2016) followed by a GPU-based outer-product 'X-engine' (Klages et al. 2015;Recnik et al. 2015;Denman et al. 2015).
The considerable capabilities of the CHIME correlator present an opportunity for powerful, on-the-fly RFI mitigation. CHIME employs a 'software' correlator X-engine, and its easily-reconfigurable, generalpurpose hardware permits the introduction of additional data processing with minimal development effort. This includes extending the real-time processing system to include the detection and excision of RFI during the correlation process. The architecture of the CHIME X-engine is such that each processing 'node' contains data from only a few frequencies, but over the entire array; this constrains the selection of RFI-excision algorithms which may be applied in the GPUs. Our choice of a statistical excision algorithm, based on spectral kurtosis, was informed by both robust performance and modest computational cost. Additionally, as detailed below, the compact layout of CHIME permits multi-receiver RFI detection, lowering the effective threshold for RFI detections while retaining high time resolution.
We summarize the theoretical foundations of the spectral kurtosis estimator in §2.1 and present a multireceiver formulation of the estimator in §2.2. The details of its implementation on CHIME and the Pathfinder are included in §3.1 and the results obtained follow in §3.2.

The Spectral Kurtosis Estimator
Tests of the normalized higher moments of a distribution provide a powerful metric for the presence of RFI. Given the expected Gaussianity of the electric fields produced by natural sources, and the extreme non-Gaussianity of observed RFI sources, deviations in the statistical properties of radio measurements signal the presence of substantial artificial contamination.
The use of a kurtosis-based estimator as an indicator of significant non-Gaussianity, and therefore contamination, has a long history: Dwyer (1983Dwyer ( , 1984; Servière (1998); Vrabie et al. (2003) and Nita et al. (2007) explore a spectral-domain kurtosis measure, with further development and radio-astronomical application in Nita & Gary (2010a,b); Gary et al. (2010), andNita (2016). Spectral-domain kurtosis measurement provides a robust and overall-scaling-invariant probe of the data distribution, with substantial potential for discrimination.

Spectral Kurtosis
For a set of M independent complex values x j , representing the post-Fourier-transform timestream of a single frequency channel as received by the X-engine, we construct an unbiased spectral kurtosis estimator SK after Nita & Gary (2010a): For x j drawn from a circularly-symmetric complex Gaussian, the |x j | 2 will be χ 2 -distributed with two degrees of freedom. Hence, as shown by Nita & Gary (2010a), to first order the expected mean, variance, skewness, and kurtosis of SK are: Notably, the expected mean is invariant and the variance, skewness and kurtosis depend solely on M .

Spectral Kurtosis in a Compact Array Context
For an interferometer in which the receivers are separated by a maximum distance L, and in which the data from each receiver is used to compute a spectral kurtosis estimate over a period ∆t, any transient RFI signal will appear 'simultaneous' across the array in the case that L c ∆t. This is true of CHIME's compact configuration: the receivers are separated by at most ≈ 100 m, so any RFI signal will arrive at all the receivers within ≈ 0.3 µs, far shorter than the timescales at which spectral kurtosis estimates are produced in our implementation. For an array which is compact relative to the distance to sources of RFI, receivers will see the interference at similar levels, permitting their combination to enhance RFI detection.
The discrimination potential of a spectral kurtosis based estimator is limited by its variance, which depends solely on the total number of samples present in the input data, M . Therefore, increasing M corresponds to increasing the discrimination potential of the estimator. For a single receiver with a fixed sampling rate, improving the discrimination potential of the estimator necessarily reduces the cadence at which estimates are produced. Alternatively, combining measurements from multiple receivers can increase the size of input dataset (reducing the inherent variance in the estimator) without affecting the cadence at which kurtosis measurements are generated. If the receivers provide uncorrelated samples (see Appendix A for limitations on this) combining data from multiple receivers will increase the effective integration length, reducing the inherent variance in the estimator when calculated at a given cadence

Multi-Receiver Formulation of the Spectral Kurtosis Estimator
For a compact array of N independent receivers, in which spectral kurtosis estimates are desired every n time-samples, one possible method of combining the information from the different antennas would be to simply average the individual spectral kurtosis estimates after they are produced. We present here an alternative formulation which produces mathematically equivalent results by combining the intermediate cumulants from different receivers into a single overall estimator.
We begin by considering the spectral kurtosis estimate produced by the i th receiver: After normalizing each receiver's signal by its mean power µ i , we sum across the array to produce the normalized cumulants S 1 and S 2 : By substituting the definition of SK i from Equation 8 into Equation 11, we may re-express S 2 as: We may then construct a multi-receiver spectral kurtosis estimator analogous to that in Equation 2, using the normalized cumulants S 1 and S 2 and an effective number of samples M = nN : Equation 13 illustrates that computing a spectral kurtosis estimate in this manner, by directly combining samples across the array, is equivalent to averaging spectral kurtosis estimates generated from the separate receivers. It should be noted that this is only the case if the individual receivers' data is correctly normalized. The constant factor in front of Equation 13 motivates a convenient re-scaling of the multi-receiver spectral kurtosis estimator: The statistical properties of this estimator may be directly recovered from those of the constituent SK i ; particularly, the expected mean and variance are:

Statistical Biases
The statistical properties described in §2.1 and §2.2.1 are precisely correct in the case of continuous sampling of the underlying Gaussian distributions; this requirement is not formally met when implementing the SK estimator in digital hardware. In correlator systems which use discretized representations of the data, the saturation of the representable range by extreme values and the quantization effects of numerical representations may affect the statistical properties of SK; we discuss this effect in Appendix B.

Computational Cost of the Simplified Multi-Receiver Algorithm
The multi-receiver formulation described above has reduced computational requirements relative to the corresponding individual computations. The minimum number of operations required to calculate a spectral kurtosis estimate for n time-samples and N antennas with and without the combination of receivers are as follows: Both implementations require several core computations: Step The additional computations required to produce spectral kurtosis estimates in both cases are as follows: Total (7n + 2)N + 6 Although in both cases the leading term is of the form 7nN , the alternative formulation offers an O(N ) reduction in the number of operations required. For any value of N greater than one, the simplified multi-receiver algorithm requires fewer operations to compute at an identical cadence.
3. Application to CHIME

Implementation
The CHIME X-engine (Denman et al., in prep.) consists of 256 correlator nodes, each hosting two dual-chip AMD S9300x2 GPUs. Each of the four GPU chips processes time-ordered data for all of CHIME's 2048 receivers over one ≈ 390 kHz frequency channel, and is therefore able to compute SK estimates for a single sub-band. The Pathfinder's X-engine has a similar structure; its 16 compute nodes each receive 64 frequency channels from all 256 receivers, and subsequently utilize the same GPU kernels and acquisition software as CHIME to compute SK estimates.
Two GPU processing kernels were written in Open Computing Language (OpenCL 9 ) to compute realtime SK estimates at a variety of integration lengths. The kernels were then compiled into Heterogeneous System Architecture Code Objects (HSACOs 10 ) which are executed by CHIME's real-time data processing software, kotekan (Renard et al., in prep.). The first kernel computes, accumulates, and normalizes the power and square-power estimates from each independent receiver. The second kernel sums the output from the previous kernel across the array before computing a single SK estimate for the current time interval. This model allows for significant flexibility in the duration of time-integration and total number of receivers in the array, and is highly efficient. In the current implementation, full-array SK estimates are produced for each of CHIME's 1024 frequency channels after accumulations of 256 time-samples (a cadence of 0.655 ms). SK computations currently require ≈ 0.7% of the available computational power of the CHIME GPU X-engine, and may therefore run in parallel with the primary correlation systems with minimal impact.

Results
In a June 2018 engineering run, one hour of simultaneous SK estimates from both CHIME and the Pathfinder were recorded, with the values of SK output at a cadence of 0.655 ms. Out of a total 1024 frequency channels across the 400-800 MHz band, 892 and 272 were recorded to completion on CHIME and the Pathfinder respectively. This data provides a point of comparison between observations and the simulations described in Appendix B as well as verifying consistency between the two instruments.
We define 'detection significance' as the difference between the measured value of SK for a given set of time-samples and its simulated expectation, expressed in units of the standard deviation of SK when applied to RFI-free data. For example, a threshold of 'a standard deviations' in detection significance would correspond to removing data outside of the range mean( SK) ± a var( SK). The adoption of a symmetric interval in detection significance as the metric by which RFI is indicated implicitly relies on the large-nN nature of this implementation; thresholds based on the predicted false-alarm probability offer an alternative for smaller-nN cases in which the skewness of the SK is significant (see  for an example). Figure 1 depicts a short segment (∼ 650ms) of 'detection significance' values, computed at a cadence of 0.655ms, for all available frequency channels. Both fixed-frequency broadcasting signals and a broadband interference pulse (< 0.655ms in duration) are detected in the segment.

Comparison with Simulated Data
Data from both CHIME and the Pathfinder indicate that frequencies which are believed to be relatively free of RFI adhere closely to simulations; Figure 2 compares the simulated, RFI-free distribution to CHIME observations of frequency channels with different levels of persistent RFI contamination. Frequencies with low degrees of RFI contamination reproduce the expected SK distribution while RFI-saturated channels do not. Figure 3 compares the probability distributions of simulated and observed RFI detection significance values (as defined above) in both the CHIME and Pathfinder datasets. Applying a threshold to this value allows the excision of significantly contaminated time-intervals while also setting a well-constrained falsepositive rate for non-contaminated data. When applying a 5-standard-deviation detection threshold, we find that our implementation detects RFI in ∼16-18% of CHIME's band over the course of a day. The distribution of SK detection significance in sample data from both CHIME and the Pathfinder, as compared to a prediction from simulated RFI-free data. The excesses at high significance indicate the presence of RFI in both instruments' data, with the rightward shift in the CHIME data illustrating the improved interference-to-noise ratio obtained by the larger number of receivers.  The simultaneously-measured RFI detection significance in synchronous data from both CHIME and the Pathfinder, within the overlapping 216 frequency channels. The diagonal line indicates the √ N interference-to-noise ratio increase expected from their relative number of receivers.

Inter-Instrument Comparison
The simultaneous detection of transient events in both CHIME and the Pathfinder supports the interpretation of changes in SK as measuring exogenous signals. Figure 4 presents a comparison of simultaneous SK estimates from both instruments across all available frequencies; the two are strongly correlated, with a significance ratio consistent with the sensitivity difference predicted by simulations of the two instruments. We conclude that, for a given frequency and time interval, the two instruments provide consistent measurements of the presence of RFI. Moreover, through examination of RFI transient events, it becomes evident that both instruments are observing nearly identical surroundings.

Conclusion
We have shown that the excision potential of the SK estimator may be improved through the combination of signals from multiple receivers within a compact array. Our implementation offers a robust, computationally-efficient method for real-time RFI detection and removal for compact arrays. Data from both CHIME and the Pathfinder agree closely with numerical simulations, confirming that the effects of quantization on the SK estimator in our correlator system are entirely predictable. Particularly, we find that these effects do not substantially alter the integration-length-dependence of the estimator's statistical moments. We conclude that the multi-receiver formulation of the spectral kurtosis estimator provides hightime-resolution, low-computational-cost RFI excision whose sensitivity improves predictably with the size of the compact array.

A. Common-Mode Contributions to SK
The statistical properties of the spectral kurtosis estimator SK described in §2.1, depend on the number of samples which are considered. The multi-receiver formulation SK, described in §2.2, combines multiple receivers' samples to reduce the inherent variance in the estimators; however, this relies on the independence of each receiver's signal. In the presence of sources which contribute substantially to the antenna temperature, the various receivers may be dominated by a common-mode signal, introducing a correlation and reducing the effective number of independent samples used in computing the SK estimator. Figure 5 shows the results of a numerical simulation in which a number of independent Gaussian datasets, representing the noise-dominated receiver timestreams, are augmented by a common-mode Gaussian signal representing the 'source'. As the common-mode signal's amplitude is increased, the variance in SK increases, eventually reaching the value which would be predicted for a single-receiver timestream with identical properties. Notably, larger-N arrays see this effect at lower levels of common-mode signal, placing a sensitivity-dependent upper limit on the size of array for which the multi-receiver formalism will show practical benefits.
In the case of CHIME, the instantaneous correlation between receivers is rarely above a few percentthe exception occurs at solar transit, where correlations of order unity are observed at some frequencies. As data is generally of low quality during solar transit, and is discarded, this effect is considered negligible for our purposes. , which smoothly declines from the full complement to an effectively-single-receiver case when the common-mode signal dominates the system.

B. Digitization Effects on SK
The quantization of data within a telescope's digital processing systems may have significant effects on the properties of spectral kurtosis estimators as applied to the resultant data.  examine this effect and address it by constructing a Gamma distribution with empirically-derived shape and scale parameters, and then using the properties of this function to set the appropriate shape factor for their generalized spectral kurtosis estimator. In our implementation, given the large number of samples accumulated and the well-constrained properties of data within the CHIME digital signal processing system, we have opted to generate excision thresholds based directly on numerical simulations of the CHIME correlator's data characteristics.
To simulate the digitization-related truncation and rounding effects present in the CHIME correlator, a set of sample data was generated which mimicked the properties of CHIME's intermediate data products -4+4-bit complex Gaussian data (bounded between −7 and 7) with an RMS of ≈ 1.52 least-significant bits in each component. Following §2.2.1, SK estimates were computed for a variety of integration lengths and array sizes. The statistics of those estimates were averaged over multiple iterations and compared to analogous non-'digitized' (floating-point) simulations, the results of which are presented in Figure 6. -: Differences between the statistical moments of 'digitized' (heavily discretized) and non-digitized SK estimates for a variety of effective array sizes. For large numbers of inputs, the mean (left) converges to a value which is offset from the expected result by a constant, while the variance (right) converges to the same value in both digitized and non-digitized simulations.
In the case of CHIME-like discretization, digitization effects on the variance (and higher moments) of the SK estimator vanish for large numbers of inputs. The mean value of the SK estimator, however, converges to a value slightly offset from unity. This offset is added directly to the value of the RFI excision threshold; as the higher moments are unchanged for the large number of inputs considered, this restores behaviour identical to the threshold-determination methods described in §3.2.