A 32-Unit 240-GHz Heterodyne Receiver Array in 65-nm CMOS With Array-Wide Phase Locking

Zhi Hu, Student Member, IEEE, Cheng Wang, Student Member, IEEE, and Ruonan Han, Member, IEEE

Abstract—This paper reports a 32-unit phase-locked dense heterodyne receiver array at $f_{RF} = 240$ GHz. To synthesize a large receiving aperture without large sidelobe response, this chip has the following two features. The first feature is the small size of the heterodyne receiver unit, which is only $\frac{\lambda_{f_{RF}}}{4} \times \frac{\lambda_{f_{RF}}}{2}$. It allows for the integration of two interleaved $4 \times 4$ arrays within a $1.2 \text{ mm}^2$ die area for concurrent steering of two independent beams. Such unit compactness is enabled by the multi-functionality of the receiver structure, which simultaneously accomplishes local oscillator (LO) generation, inter-unit LO synchronization, input wave coupling, and frequency down-conversion. The second feature is the high scalability of the array, which is based on a strongly coupled 2-D LO network. Large array size is realizable simply by tiling more receiver units. With the upsizing of the array, our de-centralized design, contrary to its prior centralized counterparts, offers invariant phase noise of the entire LO network. Meanwhile, the entire LO network is also locked to a 75-MHz reference, facilitating phase-coherent pairing with external sub-terahertz transmitters. A chip prototype using a bulk 65-nm CMOS technology is implemented, with a dc power of 980 mW. Phase locking of the 240-GHz LO is achieved among all 32 units, with a measured phase noise of $-84 \text{ dBc/Hz}$ (1-MHz offset). The measured sensitivity (BW = 1 kHz) of a single unit is 58 fW. Compared to previous square-law detector arrays of comparable scale and density, this chip provides phase-sensitive detection with sensitivity improvement.

Index Terms—CMOS, compact electromagnetic design, heterodyne sub-terahertz receiver, high-density scalable array.

I. INTRODUCTION

IMAGING using sub-terahertz signals in reflective mode is gaining increased attention. Compared with current millimeter-wave radars at 24 and 77 GHz, sub-terahertz imaging arrays, owing to the short wavelength of the signal, generate smaller beamwidth (preferably under 1°) with a given aperture size (up to tens of cm$^2$). This potentially enables very high angular resolution in a compact imaging system, of which the sensing capability evolves from object detection to object recognition. This is critical for applications such as autonomous vehicles, where multiple sensing modalities are required to improve safety. To be more specific, sub-terahertz imaging is expected to complement Light Detection and Ranging (LiDAR) imaging in dust clouds, fog, and atmospheric turbulence, where sub-terahertz signal absorptive loss is much smaller than that of the IR waves [1]–[3]. In the 200–300-GHz transmission window, the atmospheric (50% relative humidity) absorptive loss is below 0.01 dB/m [4], which does not prohibit sensing at a distance of a few hundred meters.

The recent progress in CMOS-based sub-terahertz/terahertz electronics opens up new opportunities in building low-cost imagers for this band. Recently, ultrahigh-frequency square-law detectors based on MOS/HBT transistors and Schottky-barrier diodes have been adopted in focal-plane imaging arrays [7], [8], [21]. Since no high-frequency signals are routed globally, these detector arrays are intrinsically scalable to any aperture size. But since the baseband output of square-law detectors stems from the self-mixing of the weak input signal, the resultant sensitivity, quantified as the noise-equivalent power (NEP), is mediocre (NEP $\approx 10$–$100$ pW/Hz$^{1/2}$). This, in turn, demands large illumination power at this frequency range, which is highly challenging for solid-state electronics. Alternatively, heterodyne sensors, which mix the input signal with a strong local oscillator (LO) signal, are able to generate much higher baseband output and sensitivity compared to square-law detectors. Furthermore, in a coherent array, since the output of each receiver unit preserves the phase of the input signal, back-end analog/digital signal processing can synthesize an electronically steered beam response (i.e., beam forming). This offers superior frame rate and reliability compared to the mechanical scanning scheme in conventional LiDAR and terahertz imaging systems [22].

To perform high-resolution imaging, a heterodyne array should be large scale and dense. Specifically, to provide a 1° beam at $f_{RF} = 240$ GHz, a dense $\sim 6 \times 6 \text{ cm}^2$ array is needed. Due to the high cost and low yield of large-area chips, a more practical solution would be a “virtual array” [23]. A conceived instance is shown in Fig. 1(a): a sparse $10 \times 10$ transmitter (TX) array spanning $6 \times 6\text{ cm}^2$ is used to generate ultra-narrow beams, while a dense $10 \times 10$-element receiver (RX) array chip is used to select out the main lobe. The overall response is a single ultra-narrow beam ($\sim 1°$) [see Fig. 1(b)]. The bottom line is that an ultra-narrow sub-terahertz beam is obtainable and a dense RX array chip is indispensable. A dense RX array can decrease the scale and density of the TX array ($N_{TX} \propto N_{RX}^{-1}$), which gives the TX the capability of generating higher RF power (due to lower heat density), placing large-footprint RF phase shifters and multiplier chains (if used), and forming the array on the board or Si-interposer level.
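The aperture figure quoted above can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes a uniformly illuminated aperture whose half-power beamwidth is approximately $0.886\,\lambda/D$ radians; that factor is a textbook approximation and is not stated in the paper, and the function name is purely illustrative.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def aperture_for_beamwidth(freq_hz, beamwidth_deg, k=0.886):
    """Aperture width (m) that yields the requested half-power beamwidth.

    Assumes a uniformly illuminated aperture: HPBW ~= k * lambda / D (radians).
    k = 0.886 is a textbook value, not a number taken from the paper.
    """
    wavelength = C / freq_hz
    return k * wavelength / math.radians(beamwidth_deg)

d = aperture_for_beamwidth(240e9, 1.0)
print(f"wavelength = {C / 240e9 * 1e3:.3f} mm, aperture width = {d * 100:.1f} cm")
```

Under these assumptions the required width comes out slightly above 6 cm, in line with the $\sim 6 \times 6 \text{ cm}^2$ estimate.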

Unfortunately, the number of coherent heterodyne receiving pixels integratable in a single chip is currently very limited [see Fig. 2(a)]. In both [24] (a silicon micro-machined array) and [14] (a SiGe single-chip array), only eight coherent pixels are implemented. For a larger heterodyne receiving array, two critical problems remain to be addressed as follows.

1) Architectural Scalability: Traditional heterodyne arrays are built on a centralized architecture, where the LO signal is generated from a single source [e.g., a phase-locked loop (PLL)], and then distributed to all pixels through a corporate feed [Fig. 2(b)]. However, as the array scales up, the LO power shared by each unit decreases. That, along with the inherently high phase noise of sub-terahertz LO signals, leads to significant degradation of sensitivity, e.g., 71.4 pW (BW = 1 kHz) reported in [14]. Moreover, the loss, phase/amplitude mismatch, as well as the complication of the high-frequency global routing of LO, increase rapidly with the array scale, limiting the pixel number to about 8 (with a one-tier radial LO network [14]).

2) Pixel Footprint: The aforementioned inter-element pitch of $\frac{\lambda_{RF}}{2}$ for sidelobe suppression corresponds to a very tight area to accommodate the on-chip antenna and heterodyne circuitry of each receiver. At 240 GHz with the dielectric environment (silicon substrate and inter-metal-layer silicon dioxide), the maximum dimension of a pixel is only $\sim$0.3 mm. Unfortunately, the dimension of most resonant antennas is already $\frac{\lambda_{RF}}{2}$, and that of other distributed filtering/matching components is also a large fraction of (if not longer than) $\frac{\lambda_{RF}}{2}$. Such crowdedness further prevents the placement of the complicated LO distribution network mentioned earlier.

In this paper, a de-centralized architecture with intra-unit LO generation and a 2-D coupled LO network is presented, which is applicable to building highly scalable arrays. In addition to the challenges of building a large-scale array, here we point out the challenges of operating such a large-scale coupled array. The main problems include: 1) relative phase errors and 2) relative power errors between the LO signals generated from different elements; both stem from the oscillator coupling. Phase error can be quantitatively derived (as a function of the original oscillation frequencies) from Adler's equation [25]; it leads to broadening of the main lobe of the synthesized antenna pattern [26]. Power error, analyzed in [27], results in differences in conversion loss among elements. (Equations for both the phase and power errors of transmission-line-based coupling were discussed quantitatively in [27], which shows how these errors can be reduced by tuning the lengths and impedances of the transmission lines.)
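To make the dependence of phase error on frequency detuning concrete, the following minimal sketch evaluates the steady-state solution of the textbook form of Adler's equation, $d\phi/dt = \Delta\omega - \omega_L \sin\phi$, which settles at $\phi = \arcsin(\Delta\omega/\omega_L)$ inside the locking range. The function name, the sign convention, and the numerical values are illustrative assumptions and are not taken from the paper or from [25]–[27].

```python
import math

def adler_phase_error_deg(f_free_hz, f_lock_hz, lock_range_hz):
    """Steady-state phase offset (degrees) predicted by Adler's equation.

    d(phi)/dt = d_omega - omega_L * sin(phi) settles at
    phi = asin(d_omega / omega_L) when |d_omega| <= omega_L.
    lock_range_hz is the one-sided locking range, omega_L / (2*pi).
    """
    ratio = (f_lock_hz - f_free_hz) / lock_range_hz
    if abs(ratio) > 1.0:
        raise ValueError("detuning exceeds the locking range; no steady state")
    return math.degrees(math.asin(ratio))

# Hypothetical numbers: 100-MHz detuning against a 500-MHz one-sided locking range.
print(adler_phase_error_deg(120.0e9, 120.1e9, 0.5e9))
```

The example illustrates why strong coupling (a wide locking range) matters: for a fixed spread in free-running frequencies, a larger $\omega_L$ yields a smaller steady-state phase error.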

Meanwhile, a new multi-functional self-oscillating mixer structure is employed, allowing us to implement a pair of heterodyne pixels inside the tight ($\lambda/2 \times \lambda/2$) area. As a proof-of-concept, a 240-GHz receiver using 65-nm bulk CMOS technology is implemented (and originally reported in [28]). Two interleaved $4 \times 4$ sub-arrays are integrated within a 1.2 mm$^2$ area, plus built-in phase-locking circuitry. This paper demonstrates the implementation feasibility of heterodyne receiver arrays at sub-terahertz with large scale and high density. As shown in Fig. 2(a), our work breaks the existing tradeoff between the scale and sensitivity of sensing arrays. These, along with the enabled phase detection capability, potentially make high-resolution beam steering possible. The remainder of this paper is organized as follows. In Section II, details of the de-centralized architecture are given. In Section III, we focus on the design of a single pixel. In Sections IV and V, other critical topics, such as the formation of the array and the integrated phase-locking circuitry, are discussed. After that, the experimental results are presented in Section VI, and the conclusion with a comparison with the state-of-the-art is drawn in Section VII.

Compared to [28], this extended paper has the following new contents: 1) discussions on the necessities of building a dense large-scale array; 2) more detailed analysis on the operation of the self-oscillating harmonic mixer (SOHM); 3) more detailed description of the array-wide phase-locking scheme; and 4) more measurement results including IF noise floor, IF spectra of all elements, and relative phase of IF signals when the chip was rotated in the $E$- and $H$-planes.

II. SENSOR ARRAY ARCHITECTURE: DECENTRALIZATION

Our sensor array adopts the architecture shown in Fig. 2(c). Each pixel in the array has a built-in LO in addition to the on-chip antenna and downconversion mixer. Meanwhile, the oscillator forms strong coupling with its neighboring peers at the four pixel edges, so that a 2-D oscillator network with synchronized frequency and phase is established. At one boundary of this LO network, the oscillation frequency is extracted by a frequency divider, which then feeds its output into a chain of phase-frequency detector, charge pump, and low-pass filter (LPF). Finally, the output control signal of the chain $V_{ctrl}$ is distributed back to the frequency-tuning terminals of all the LOs. As a result, a PLL is formed, which makes the LO signal of each pixel coherent with the transmitter signal (not on this chip) through a low-frequency reference clock $f_{ref}$.

The increase of the array scale can be readily realized by “tiling” more such pixel units together. In comparison with the conventional architecture in Fig. 2(b), this new de-centralized LO generation scheme completely eliminates the global LO-distribution network. That means the LO power injected into each downconversion mixer remains constant and the tradeoff described in Section I between sensitivity and array size no longer exists. In fact, since the phase noise \( \mathcal{L}(f) \) (in dBc/Hz) of a coupled oscillator network decreases with a larger number of units \( n \) [i.e., \( \mathcal{L}_{n}(f) = \mathcal{L}_{n=1}(f) - 10 \log_{10} n \)], the phase accuracy obtained from each pixel is expected to even improve as the array scales up. Finally, we point out that the global signal \( V_{\text{ctrl}} \), with variations at only megahertz level, can be distributed globally through simple wire connections. Thus, the routing complexity remains low as the array scales up.
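The scaling law above is easy to tabulate. The sketch below applies the ideal $10 \log_{10} n$ improvement to a single-unit phase noise of $-95$ dBc/Hz, the simulated 120-GHz value quoted in Section III; it assumes identical, ideally coupled oscillators, so the numbers are an upper bound on the improvement rather than a prediction for the fabricated chip.

```python
import math

def coupled_phase_noise_dbc(single_unit_dbc, n_units):
    """Phase noise (dBc/Hz) of n ideally coupled, identical oscillators."""
    return single_unit_dbc - 10.0 * math.log10(n_units)

# Single-unit value: simulated phase noise of one pixel at 120 GHz, 1-MHz offset.
for n in (1, 2, 4, 8, 16, 32):
    print(f"n = {n:2d}: {coupled_phase_noise_dbc(-95.0, n):7.2f} dBc/Hz")
```

Each doubling of the unit count lowers the ideal phase noise by about 3 dB, so a 32-unit network would gain roughly 15 dB over a single oscillator under these assumptions.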

The above architecture effectively solves the scalability problem; however, the additional in-pixel LO, as well as the associated inter-pixel coupling structures, due to their large footprint, could potentially exacerbate the density problem described in Section I. In our design, this is addressed by condensing the fundamental LO (at \( f_0 \), the fundamental oscillation frequency), LO frequency doubler (at \( 2f_0 \)), antenna (receiving input signal at \( f_{RF} \)), and downconversion mixer (at \( f_{IF} = |f_{RF} - 2f_0| \)) into a single multi-functional circuit structure. The structure (to be described in detail in Section III) is essentially a SOHM with a built-in slot antenna. Fig. 3 shows our 240-GHz sensor array based on such a pixel design (\( f_0 = 120 \) GHz). Due to the compactness of the pixel, inside each \( \lambda_{RF}/2 \times \lambda_{RF}/2 \) space (denoted as a “cell”), a pair of SOHM pixels is placed back-to-back. The array is an integration of 4 \( \times \) 4 cells, and thus 32 pixels in total. This enables two modes in the back-end signal processing. 1) The outputs of each pixel pair are combined, so that each cell is equivalent to a single receiver with a dual-slot antenna. The symmetry of the combined antenna pattern is improved over that of the single pixel [29]. A beam can then be formed by the 4\( \times \)4-cell array. 2) Two 4\( \times \)4 arrays, one including all pixels at the top halves of the cells and the other including all the bottom halves (Fig. 3), are processed separately. With different phase shift gradients applied, two independent beams can be formed concurrently. For example, in Fig. 1, the two RX beams may pair with two generated lobes of the TX pattern\(^2\) and increase the overall scanning speed.

III. HETERODYNE PIXEL: VERSATILE STRUCTURE DESIGN

The array cell is built on a planar multi-slot structure in the top metal layer, and its 3-D structure is shown in Fig. 4(a). Regarding the LO signals, the two pixels at the top and bottom halves of the cell are coupled through a coplanar waveguide (CPW). Similarly, in the horizontal direction, each pixel is also coupled with its neighboring counterparts via CPW lines. In the vertical direction, two adjacent cells share a slotline.

A CPW line can be regarded as two slots with symmetric electrical fields pointing to opposite directions [Fig. 4(a)]. When we analyze a single pixel [shown in Fig. 4(b)], one slot of each CPW section is incorporated into the pixel, with a perfect magnetic conductor (PMC) boundary condition at the outer edge of that slot. At the top side of Fig. 4(b), half of the cell-sharing slot is also incorporated into the pixel under analysis, with a perfect electric conductor (PEC) boundary condition. As mentioned earlier, each cell has a dimension of $(\lambda_{RF}/2) \times (\lambda_{RF}/2)$, thus each pixel has a dimension of $(\lambda_{f_0}/8) \times (\lambda_{f_0}/4)$, where $f_0 \approx f_{RF}/2$.

The equivalent circuit of a single SOHM pixel is shown in Fig. 5. Next, we show how various signal/electromagnetic modes are manipulated independently in the circuit structure to achieve multi-functionality and compactness of the pixel.

\(^1\)Since the two pixels inside each cell have significant aperture overlap due to their close proximity, this chip is unable to form a beam with beamwidth as narrow as that obtained from an ordinary 4\( \times \)8 array with \( \lambda/2 \) pixel pitch.

\(^2\)In comparison, in a single-beam RX configuration (Fig. 1), the power of one of the two lobes is not utilized.

Fig. 4. 3-D structures of (a) cell containing two heterodyne receiving pixels and (b) single pixel with equivalent boundary conditions.

Fig. 5. Equivalent circuit of a single self-oscillating mixer unit. Note that the meandering section of $TL_4$ (Fig. 4) is not shown here.

A. Fundamental Oscillation at 120 GHz

In Fig. 5, the SOHM oscillates at $f_0 \approx 120$ GHz, and its second-harmonic signal ($2f_0 \approx 240$ GHz) is used as the LO. Topologically, the SOHM can be regarded as two self-feeding oscillators [30] coupled by central slotlines $TL_2$ and $TL_5$. To push the devices to the instability regime, the signal generated at the drain–source port of $M_1$ is fed to the gate–source port of the device via $TL_2$ cascaded with CPW line $TL_1$ (and similarly, $TL_1'$ for $M_2$). Note that the slot $TL_2$ only permits the propagation of the odd quasi-TE mode of the signal. That mode leads to out-of-phase voltages on the two conductors of $TL_2$. As a result, the two MOSFETs are forced to oscillate differentially at $f_0$. The generated waves at $f_0$ then propagate through the slotline $TL_5$ so that the short-terminated slotlines $TL_3$ and $TL_3'$ are connected in shunt with the drain–source port of corresponding MOSFETs on the two sides, respectively.

To further facilitate the analysis of SOHM, at $f_0$, we focus on only the half-circuit equivalence of the structure in Fig. 5. This is justified by the fact that, in slots $TL_2$ and $TL_5$, the TE-mode electrical-field vectors are always perpendicular to the vertical plane along the central line A-B in Fig. 4(b). That plane is then equivalent to a PEC boundary (denoted as “virtual ground plane” here). Subsequently, a single pixel can be separated into two parts to be analyzed independently. It is noteworthy that this virtual ground plane is equivalent to a perfectly conductive wall (i.e., PEC boundary condition), and thus, in a half circuit, all nodes that are connected to this wall are equipotential. In addition, in the half circuit, $TL_2$ and $TL_5$ are effectively still transmission lines with two conductors, each formed by a physical conductor and a virtual conductor (PEC).

As a result, the half-circuit equivalent of the pixel at $f_0$ is derived in Fig. 6(a). By disregarding the physical forms of the transmission lines, the circuit in Fig. 6(a) can be further transformed into the circuit in Fig. 6(b), through which the self-feeding topology is more clearly revealed. Here, the combination of the characteristic impedance and electrical phase of $TL_1$ ($Z_{TL_1} = 55\ \Omega$, $\phi_{TL_1, f_0} = 79^\circ$) ensures the optimal phase of the MOSFET complex voltage gain ($\phi_{V_{drain}/V_{gate},\mathrm{opt}} = 158^\circ$), which is critical to the generation of the maximum fundamental oscillation power [30]. Meanwhile, $C_1$, $TL_3$, and $TL_4$, connected in shunt, form a resonator at the oscillation frequency of 120 GHz. Placed at the periphery of the pixel, $TL_3$ has an electrical length of 135°, hence presenting a capacitive impedance; it resonates with $TL_4$, which has an electrical length of 60°, hence an inductive impedance. The varactor $C_1$, used for changing the oscillation frequency, is adjustable between 5 and 9 fF. In the simulation, the tuning range of $f_0$ (near 120 GHz) is 1.2 GHz.
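The capacitive/inductive roles of the two stubs follow from the input reactance of a lossless short-terminated line, $X = Z_0 \tan\theta$. The sketch below checks only the sign of that reactance for the two electrical lengths quoted above. It assumes that both $TL_3$ and $TL_4$ behave as ideal shorted stubs (the text states this explicitly only for $TL_3$) and uses a placeholder characteristic impedance of 50 Ω, which is not a value from the paper.

```python
import math

def shorted_stub_reactance(z0_ohm, theta_deg):
    """Input reactance (ohm) of a lossless short-terminated line: X = Z0 * tan(theta)."""
    return z0_ohm * math.tan(math.radians(theta_deg))

def reactance_kind(x_ohm):
    """Positive reactance is inductive, negative reactance is capacitive."""
    return "inductive" if x_ohm > 0 else "capacitive"

Z0 = 50.0  # placeholder characteristic impedance, not from the paper
for name, theta in (("TL3", 135.0), ("TL4", 60.0)):
    x = shorted_stub_reactance(Z0, theta)
    print(f"{name}: theta = {theta:5.1f} deg, X = {x:7.1f} ohm ({reactance_kind(x)})")
```

A stub longer than a quarter wavelength (but shorter than a half wavelength) has a negative tangent and therefore looks capacitive, while one shorter than a quarter wavelength looks inductive, matching the roles described for $TL_3$ and $TL_4$.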

Compared to microstrip and CPW transmission lines, slotlines typically have higher loss due to the lack of balanced wave formation to suppress radiation. In our design, however, such balanced waves are formed which effectively reduce radiative loss at $f_0$. From the field distribution shown in Fig. 7(a), we see that the standing waves inside the $a$-$b$ and $b$-$c$ sections of $TL_3$ are out-of-phase with their adjacent counterparts in nearby pixels. Meanwhile, the wave in the $c$-$d$ section is partially cancelled by its out-of-phase counterpart at the right half of the same pixel; and similar cancellation also occurs in $TL_4$.

The simulated $E$-field distribution is shown in Fig. 7(b). As analyzed earlier, the waves generated from the drain–source port of the MOSFETs are able to propagate through the central slotline $TL_2$ and $TL_5$, and then permeate the reactive $TL_3$ and $TL_3'$ on the borders to form resonance and coupling. The radiative power in simulation is negligible; that, along with the realized optimal device condition, leads to strong oscillation and low simulated phase noise [at 120 GHz, see Fig. 8(a)] of $-95$ dBc/Hz (or $-89$ dBc/Hz for the 240-GHz LO signal) at 1-MHz offset. The simulated pixel dc power is 43.2 mW.
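The two phase-noise figures quoted above are related by the usual frequency-multiplication penalty of $20 \log_{10} N$ for the $N$th harmonic. The short check below applies that standard rule for $N = 2$; it is an idealized relation that ignores any additive noise from the harmonic generation itself.

```python
import math

def multiplied_phase_noise_dbc(fundamental_dbc, harmonic):
    """Phase noise (dBc/Hz) after ideal frequency multiplication by `harmonic`."""
    return fundamental_dbc + 20.0 * math.log10(harmonic)

lo_240 = multiplied_phase_noise_dbc(-95.0, 2)
print(f"120 GHz: -95.0 dBc/Hz  ->  240 GHz: {lo_240:.1f} dBc/Hz")
```

Doubling the frequency costs about 6 dB, which is consistent with the $-95$ and $-89$ dBc/Hz values given for the 120-GHz oscillation and the 240-GHz LO.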

B. 240-GHz Harmonic LO Generation and Frequency Mixing

Inside each pixel, the differential self-feeding oscillators generate in-phase harmonic LO signal at $2f_0$. Compared to prior terahertz radiation sources based on similar harmonic oscillator structures [27], [31], additional functions are required for our heterodyne pixel design: 1) an incident RF signal should be efficiently coupled into the transistors and 2) the generated harmonic LO signal should be confined within the transistors for downconversion rather than being coupled into the free space. As a result, the above-mentioned two signals at $f_{RF}$ and $2f_0$ should be directed differently; that is, however, very challenging, because the signals have not only the same frequency but also the same wave mode (i.e., even mode).

Our heterodyne pixel design, although compact, realizes the above functions. First, for the even-mode harmonic LO signal generated from the two drain terminals, the associated wave inside slotline $TL_2$ and $TL_5$, if it exists, should have a balanced TM mode. However, a slot only supports the propagation of an unbalanced TE-mode wave; therefore, both drain terminals are equivalently open terminated and the LO signal is highly confined inside the devices [see Fig. 9(a)].

We also note that, through the $C_{gd}$ of the MOSFETs, a small portion of the LO power leaks to $TL_4$, but the associated radiative loss is still small. To understand this, we see from Fig. 7(a) that the E-field distribution at \( f_0 \) in the bottom pixel is a mirrored version of that in the top pixel; accordingly, the harmonic E-field distribution at \( 2f_0 \) follows the same rotational symmetry inside the two pixels [shown in Fig. 9(a)]. As a result, the associated broadside radiation from the two pixels (with a spacing of \( \lambda_{2f_0}/4 \)) still cancels. This is verified by the E-field distribution of the LO signal [Fig. 9(b)], which results in negligible radiative power in High Frequency Structure Simulator (HFSS) simulation. Note that, in Fig. 9(b), the LO waves generated at the MOSFET drains are unable to propagate through the central slotlines \( TL_2 \) and \( TL_5 \), as analyzed previously.

Next, for the incident sub-terahertz signal \( f_{RF} \approx 2f_0 \), the \( TL_4 \) pair functions as a slot dipole antenna. Note that the total length of the \( TL_4 \) branch at \( 2f_0 \) is 120°, which consists of a straight section (\( \sim 90° \)) for radiation coupling and a meandering section (\( \sim 30° \) at \( 2f_0 \), not shown in some of the previous figures) for impedance matching of the antenna. The received waves in common mode are injected into the MOSFET gates through the CPW \( TL_1 \) and \( TL_1' \). They are then mixed with the harmonic LO signal confined within the MOSFET channel, and downconverted to a common-mode drain current at \( f_{IF} = |f_{RF} - 2f_0| \). Through a quarter-wave RF choke at \( 2f_0 \), which causes negligible effect at \( f_{IF} \), the pixel output current is extracted. The simulated E-field distribution for the RF input signal is shown in Fig. 10(b). We see that the wave is guided to the gates of the MOSFETs, where it is then downmixed with the harmonic LO signal. To prevent the incident RF signal from exciting substrate-mode waves, a hemispheric silicon lens needs to be attached to the backside of the chip. In the HFSS simulation, a semi-infinite silicon medium at the chip back is used to emulate the lens. The simulated peak directivity and efficiency of the antenna are 4.8 dBi and 40%, respectively.

The simulated conversion loss of the pixel with a 50-Ω output load within the oscillator tuning range is 16.1 to 16.6 dB. Fig. 11 shows the baseband noise power spectral density (PSD) of the pixel output in the simulation. Below 100 MHz, the noise is dominated by the flicker noise of the devices. The white noise floor at higher IF frequency is close to \(-170 \) dBm/Hz. With \( f_{IF} \) of 5 MHz, where flicker noise dominates (see Fig. 11), the simulated noise figure (NF) is 46.5 dB. With \( f_{IF} \) of 100 MHz, the NF is lowered to 19.3 dB.

IV. RECEIVER ARRAY WITH COHERENT LO SIGNALS

Strong coupling between adjacent pixels is critical for maintaining the phase coherence of the distributedly generated LO signals across the array. Its mechanism has been discussed in Section III and the resultant E-field distribution at \( f_0 \) is shown in Fig. 7(a). Quantitative analyses of CPW-based coupling [e.g., segments \( ab \) and \( bc \) in Fig. 7(a)] and slotline-sharing-based coupling [e.g., segment \( cd \) in Fig. 7(a)], including the phase mismatch caused by array inhomogeneity, were previously given in [27]. Here, we present the simulation results showing two important features of the large-scale coherent array.

The first feature is beam steering which can be achieved by changing the phase of downconverted IF signals in the baseband. Fig. 12 shows the patterns of a steerable beam using a 4 × 4 sub-array; a \( -3\) dB beamwidth of \( \sim 16° \) is obtained in both the E- and H-planes. Alternatively, when the 4 × 8 configuration described in Section II is used, the highest sidelobes in the H-plane at \( \pm 20° \) steering angle can be further suppressed by \( \sim 6 \) dB.
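The baseband steering principle can be illustrated with the array factor of a small uniform linear array. The sketch below is a simplified model with assumptions that are not from the paper: isotropic elements, free-space propagation, a $\lambda/2$ pitch, and no silicon lens. It therefore does not reproduce the simulated patterns of Fig. 12; it only shows that applying a progressive phase weight to the IF outputs moves the response peak to the chosen angle.

```python
import cmath
import math

def array_factor(theta_deg, steer_deg, n=4, pitch_wavelengths=0.5):
    """Magnitude of the array factor of an n-element uniform linear array.

    The IF output of element m is multiplied by a phase weight that cancels
    the inter-element delay of a plane wave arriving from steer_deg.
    """
    k_d = 2.0 * math.pi * pitch_wavelengths
    psi = k_d * (math.sin(math.radians(theta_deg)) - math.sin(math.radians(steer_deg)))
    return abs(sum(cmath.exp(1j * m * psi) for m in range(n)))

# Response of a 4-element row steered to +20 degrees, sampled every 10 degrees.
for theta in range(-60, 61, 10):
    print(f"{theta:4d} deg: |AF| = {array_factor(theta, 20.0):.2f}")
```

The response reaches its maximum of $n$ at the steering angle because all weighted element outputs add in phase there, and it falls off elsewhere.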

The second feature is phase-noise reduction by coupling the oscillations of the pixels together [32]. Fig. 8(b) shows the simulated phase noise of the fundamental 120-GHz oscillation generated by various coupled oscillator arrays with sizes of \( 1 \times 1, 2 \times 1, 2 \times 2, 4 \times 2, 4 \times 4 \), and \( 8 \times 4 \). We see a clear slope of \( \sim 3 \) dB/octave, which agrees with the theoretical \( 10 \log_{10} n \) prediction given in Section II.

V. INTEGRATED PHASE-LOCKING CIRCUITRY

The block diagram of the on-chip PLL was illustrated earlier in Fig. 3. In this section, we present the details of several critical PLL blocks.

Fig. 13 shows the array PLL interface at the bottom border of the array, between the pixels at Row 8 Column 2 and Row 8 Column 3. An additional pair of slotlines is added in parallel to the bottom slotlines of these two pixels. This slotline pair is connected to a CPW network which, on the other end, is connected to a MOSFET switch inside the injection-locked frequency divider (ILFD) in Fig. 3. The equivalent circuit of this interface structure is also shown in Fig. 13, where all transmission lines of interest are regarded as CPWs. The top three CPWs are the integral part of the resonance tanks of the two pixels, and the network underneath is designed to present a real, high impedance to the two pixels. Through such intentional impedance mismatch, only 100 μW out of the ~7 mW of total oscillation power at \( f_0 \) is extracted. This lowers the perturbation to the array operation and in the meantime provides sufficient power for the ILFD. Between these two parts is a metal–oxide–metal (MOM) capacitor used for dc isolation, so that the gate of the MOSFET switch in the ILFD is independently biased through a short-terminated CPW stub.

After the oscillation signal at \( f_0 \) is coupled out, it is frequency divided in a divider chain (÷1600). The schematic of the high-frequency front end of the chain is shown in Fig. 14, which consists of a divide-by-4 ILFD cascaded with a divide-by-4 current-mode logic (CML) divider. The ILFD is essentially an \( L-C \) push-push oscillator oscillating at \( f_{osc} \), whose third-order harmonic at \( 3f_{osc} \) mixes with the injected signal at \( f_0 \). Through negative feedback, the down-mixed signal at \( f_0-3f_{osc} \) is equal to \( f_{osc} \) at steady state, i.e., \( f_{osc} \) is locked to \( f_0/4 \) (see [33] for more details). The output signals at \( f_0/4 \) are then fed to a pair of buffers used to isolate the resonance tank from the next stage. The simulated locking range with 100-μW injection power is 4.2 GHz, and the total dc power consumption of the ILFD including the buffers is 5.8 mW. The divider in the next stage is essentially a CML ring oscillator, of which the oscillation frequency is around \( f_0/16 \). Frequency injection is achieved by injecting currents at \( f_0/4 \) to all the CML inverter stages by modulating the tail current sources. The simulated locking range of the CML divider is 15 GHz and the power consumption is 4.7 mW. The rest of the divider chain is made of static flip-flops. The LPF of the PLL is implemented on the printed circuit board (PCB) due to the large capacitor required. A loop bandwidth of 50 kHz is chosen.
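The frequency plan of the divider chain can be followed with a few lines of arithmetic. The sketch below uses only the division ratios stated above (÷4 ILFD, ÷4 CML, ÷1600 overall); the variable and function names are illustrative, and the split of the remaining division among the static flip-flops is not specified in the text, so only its total is computed.

```python
TOTAL_DIVISION = 1600   # overall ratio of the divider chain
ILFD_RATIO = 4          # injection-locked frequency divider
CML_RATIO = 4           # current-mode logic divider

def ilfd_locked_frequency(f0_hz, harmonic=3):
    """Solve the lock condition f0 - harmonic * f_osc = f_osc for f_osc."""
    return f0_hz / (harmonic + 1)

def reference_frequency(f0_hz):
    """PLL reference frequency implied by a fundamental oscillation at f0."""
    return f0_hz / TOTAL_DIVISION

# Division left for the static flip-flop stages after the two front-end dividers.
static_ratio = TOTAL_DIVISION // (ILFD_RATIO * CML_RATIO)

f0 = 120e9
print(f"ILFD output : {ilfd_locked_frequency(f0) / 1e9:.1f} GHz")
print(f"CML output  : {f0 / (ILFD_RATIO * CML_RATIO) / 1e9:.2f} GHz")
print(f"static ratio: {static_ratio}")
print(f"reference   : {reference_frequency(f0) / 1e6:.1f} MHz")
```

For a nominal $f_0$ of 120 GHz this gives the 75-MHz reference mentioned in the abstract; the same function maps any measured $f_0$ to the reference frequency that locks it.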

VI. CHIP PROTOTYPE AND EXPERIMENTAL RESULTS

The chip was fabricated using Taiwan Semiconductor Manufacturing Company 65-nm low-power CMOS technology ($f_{\text{max}} \approx 200$ GHz). The die photograph is shown in Fig. 15(a). The area of the 32-receiver-unit array alone is 1.2 mm$^2$, and the total chip area including the PLL and pads is 2.8 mm$^2$. The packaging detail is shown in Fig. 15(b). As explained earlier, a hemispherical silicon lens (with 1-cm diameter) was attached to facilitate back-side radiation of the on-chip slot antennas. Between the lens and chip is a piece of undoped silicon wafer used to align the chip with the rectangular hole of the PCB. Since the Si lens is hemispherical, the incident angle and refracting angle are zero at the air–lens interface, so the inclusion of Si lens only alters the path length of incident RF waves without changing the antenna patterns and the relative phase relationships, and thus coherent array forming is still feasible. A lens with larger diameter can further reduce the systematic phase error of off-axis receiver elements. The measured dc power of the entire chip is 0.98 W.

First, we measured the output of the divider chain at $f_0/1600$, in order to determine the locking range of the chip. The measured spectrum when the array is locked is shown in Fig. 16(c), and the measured PLL locking range (for $f_0$) is 116.48–117.44 GHz (i.e., 232.96–234.88 GHz for $f_{\text{LO}}$).

Next, the IF outputs were measured when RF radiation was projected onto the chip. Fig. 16(a) shows the associated experimental setup. A Virginia Diodes, Inc. (VDI) WR-3.4 vector network analyzer (VNA) frequency extender was used as the radiation source, which has a total radiated power of −7.1 dBm (calibrated by a PM5 power meter) and an antenna gain of 24 dBi. The source-to-chip distance was 10 cm, which is greater than the calculated far-field limit of 4.8 cm. Two signal generators (Keysight E8257D and HP 83732B), which were coherently synchronized through a 10-MHz signal, were used to collectively provide the input reference signals of the chip and the radiation source. The chip output IF signals were multiplexed using an ADG726 chip on the PCB and were then amplified using two cascaded ZFL-500L.N amplifiers (50-Ω interfaces) with a calibrated gain of 49 dB and NF of 2.9 dB.

To show the function of the entire array, the IF spectra of all elements are shown in Fig. 17. $f_{\text{IF}}$ was set to 28.20 MHz [where the flicker noise is relatively low and the multiplexer still presents small insertion loss ($\sim 2$ dB)]. All IF signals were measured at the expected 28.20 MHz and locked. In addition, when we tuned the PLL reference frequency (which translates to $f_{\text{LO}}$), $f_{\text{IF}}$ of all elements shifted to the new expected value. This further confirms the desired array-wide LO frequency locking. The output of the element in Row 1 Column 3 has the highest $P_{\text{IF}}$ (−32.0 dBm, or −79.0 dBm after de-embedding the multiplexer loss and amplifier gain). Similar to the measurement in [14], the sideband spectrum around each IF tone in Fig. 17 is the downconverted LO phase noise mixed with the incident RF signal. Its power is proportional to the input RF power, and thus it is not considered to be the noise floor determining the minimum detectable power.

The receiver performs better at higher $f_{\text{IF}}$ (although the multiplexer needs to be bypassed), since at 28.20 MHz the noise is predominantly flicker noise. This is supported by the measured IF noise power spectrum of the element in Row 1 Column 3 from 1 to 500 MHz in Fig. 18(c). (The relatively flat region of the noise spectrum near 10 MHz can be attributed to the downconversion of the phase noise of the LO signal.) We therefore examined the receiver performance at 475 MHz [where the noise is already white according to Fig. 18(c)]. Fig. 18(a) shows the measured IF spectrum of the same element (Row 1 Column 3). The IF power $P_{\text{IF}}$ is −31.7 dBm, or −80.7 dBm after de-embedding the amplifier gain, which is slightly lower than the −79.0 dBm obtained when $f_{\text{IF}} = 28.20$ MHz. This is possibly due to the parasitic capacitance on the IF output path. The noise PSD at 475 MHz at the output of the amplifier chain is −121 dBm/Hz [−71 dBm with resolution bandwidth (RBW) = 100 kHz in Fig. 18(c)]. After de-embedding the gain and the NF of the amplifier, the output noise PSD at the IF port of the chip was calculated to be −172.2 dBm/Hz.
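The de-embedding steps quoted above are plain dB arithmetic and can be reproduced as follows. This is a minimal sketch that only covers the gain, multiplexer-loss, and RBW corrections; the final removal of the amplifier NF, which leads to the −172.2 dBm/Hz figure, is not spelled out in the text and is therefore not reproduced here. The helper names are illustrative.

```python
import math

AMP_GAIN_DB = 49.0  # calibrated gain of the two cascaded IF amplifiers
MUX_LOSS_DB = 2.0   # approximate multiplexer insertion loss at 28.20 MHz


def deembed_power_dbm(measured_dbm, gain_db, loss_db=0.0):
    """Refer a power measured after the IF chain back to the chip IF port."""
    return measured_dbm - gain_db + loss_db


def psd_dbm_per_hz(power_dbm, rbw_hz):
    """Normalize a noise power measured in a given RBW to a 1-Hz bandwidth."""
    return power_dbm - 10.0 * math.log10(rbw_hz)


# 28.20-MHz case: the multiplexer is in the signal path
p_if_28mhz = deembed_power_dbm(-32.0, AMP_GAIN_DB, MUX_LOSS_DB)
# 475-MHz case: the multiplexer is bypassed
p_if_475mhz = deembed_power_dbm(-31.7, AMP_GAIN_DB)
# Noise PSD at the amplifier output, from -71 dBm in a 100-kHz RBW
psd_amp_out = psd_dbm_per_hz(-71.0, 100e3)

print(f"P_IF at 28.20 MHz (chip IF port): {p_if_28mhz:.1f} dBm")
print(f"P_IF at 475 MHz (chip IF port):   {p_if_475mhz:.1f} dBm")
print(f"Noise PSD at amplifier output:    {psd_amp_out:.1f} dBm/Hz")
```

The three printed values correspond to the −79.0 dBm, −80.7 dBm, and −121 dBm/Hz figures given in the text.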

The phase of each IF signal was measured using the method shown in Fig. 19. Since a global phase shift does not contribute to the amplitude of a synthesized array pattern, we measured the phase of each IF signal with reference to the phase of one chosen IF signal (bypassing the multiplexer). A 2.5-cm diameter Si lens was used to alleviate the off-axis effect. We passed the signals through 25-MHz bandpass filters and fed them into a multi-channel oscilloscope to obtain the phase differences. We rotated the chip in both the $E$- and $H$-planes by a small angle of $10^\circ$; the measured relative phase (offset to the top right element) is shown in Fig. 19, where phase gradients along the horizontal and vertical directions can be observed in the two tables. At a larger rotation angle in both planes, the phase changes between adjacent elements in rows/columns were no longer monotonic. This could be attributed to non-ideal chip-axis alignment (causing refraction at the lens-air interface) and wave reflection on the surrounding metal lens mount.

\footnote{To measure the noise and IF response beyond the noise corner frequency, the on-board multiplexer (ADG726) was bypassed in order to eliminate its bandwidth limitation. The bandwidth of the ZFL-500L.N amplifier is 0.1 to 500 MHz, which is still usable for the testing.}

Fig. 17. IF spectra of all array elements (with the measured amplified IF power specifically annotated) at 28.20 MHz with RBW = 10 kHz. Calculated conversion loss and noise power values of all elements are given in the tables on the right. All arrays are oriented such that the bottom row faces the PLL circuitry [see Fig. 15(a) for reference].

Fig. 18. Measured spectrum of (a) IF signal (Row 1, Column 3) at 475 MHz with RBW = 10 kHz, and the noise spectrum of the same unit from 1 to 500 MHz at IF port when RF signal is absent, (b) screenshot from spectrum analyzer (100-time trace average), and (c) replotted noise power in dBm/Hz unit (RBW, the power gain and the NF of the IF amplifier de-embedded) and log–log scale. Spurs in the noise spectra are the input reference signal and its harmonics coupled to the IF port.

Fig. 19. Measurement of relative phase of array pixels. (a) Method of measurement based on multi-channel oscilloscope. (b) Phase distribution of all elements of the 4 × 8 array, when the chip was rotated. All values are relative to the phase of the element in Row 1, Column 4 (top right in the figure).

The directivity of the on-chip antenna in a pixel is needed to evaluate the receiver performance. The radiation pattern, shown in Fig. 20, was measured by rotating the chip and recording the magnitude of the IF signal at each azimuth and elevation angle. For the measurement of antenna gain, we assume that the received RF power is proportional to the measured IF power (i.e., conversion loss is constant during the measurement). The measured peak directivity $D_{RX}$ is 6.0 dB. The endfire responses are lower than the simulations, because the radiation was blocked by the lens fixture. According to the measured $D_{RX}$, the 240-GHz power injected into the pixel unit is $-40.9$ dBm, resulting in an estimated receiver conversion loss (amplifier gain de-embedded) of 39.8 dB and an NF (amplifier gain and NF de-embedded) of 41.6 dB when $f_{\text{IF}} = 475$ MHz. Note that these values include the loss of the pixel antenna, which is estimated to be 4 dB from simulation. The measured NF is much higher than the simulated value of 19 dB. This could be due to the higher-than-expected insertion loss at the chip-to-wafer interface mediated by superglue, and to the lower-than-expected oscillation activity of the elements, which arises when oscillators whose standalone oscillation frequencies differ from the coupled oscillation frequency are coupled together.
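The quoted conversion loss and NF are consistent with the measured IF power and output noise PSD at 475 MHz, as the following sketch illustrates. It assumes the usual definition of NF as the input-referred noise density relative to a −174 dBm/Hz thermal floor; the exact procedure used for the measurement is not detailed in this section, and the function names are illustrative.

```python
THERMAL_FLOOR_DBM_HZ = -174.0  # assumed thermal noise density at room temperature


def conversion_loss_db(p_rf_in_dbm, p_if_out_dbm):
    """Conversion loss as the ratio of injected RF power to IF output power."""
    return p_rf_in_dbm - p_if_out_dbm


def noise_figure_db(noise_psd_out_dbm_hz, conv_loss_db):
    """NF as the input-referred noise PSD relative to the thermal floor."""
    input_referred_psd = noise_psd_out_dbm_hz + conv_loss_db
    return input_referred_psd - THERMAL_FLOOR_DBM_HZ


p_rf_in = -40.9          # 240-GHz power injected into the pixel (dBm)
p_if_out = -80.7         # de-embedded IF power at 475 MHz (dBm)
noise_psd_out = -172.2   # de-embedded output noise PSD at 475 MHz (dBm/Hz)

cl = conversion_loss_db(p_rf_in, p_if_out)
nf = noise_figure_db(noise_psd_out, cl)

print(f"Conversion loss = {cl:.1f} dB")
print(f"Noise figure    = {nf:.1f} dB")
```

With the values from the text, this gives a conversion loss of 39.8 dB and an NF of 41.6 dB.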

The phase noise of the in-pixel LO signal was measured directly using the weak LO waves leaked from the chip. Such leakage near 240 GHz is due to the non-ideal radiation cancellation stemming from the variations of oscillation power among the pixels. The LO leakage was detected by the same VDI VNA extender [see Fig. 16(b)] operating in the RX mode. The downconverted spectrum of the 240-GHz LO and its phase noise profile are shown in Fig. 21. At 1-MHz offset, the measured LO phase noise was $-84$ dBc/Hz, which is 22 dB lower than that of the LO in [14] normalized to 240 GHz. Note that for our case, the coupling among pixels plays an important role in lowering the phase noise.
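The normalization used in this comparison can be sketched with the standard $20\log_{10}$ frequency scaling of phase noise. The example assumes that the −59 dBc/Hz listed for [14] in Table I refers to a carrier at the 320-GHz RF frequency listed in the same table; this pairing is our reading of the table rather than something stated explicitly.

```python
import math


def normalize_phase_noise(pn_dbc_hz, f_carrier_ghz, f_target_ghz):
    """Scale phase noise from one carrier frequency to another (20*log10 rule)."""
    return pn_dbc_hz + 20.0 * math.log10(f_target_ghz / f_carrier_ghz)


pn_this_work = -84.0  # measured at 1-MHz offset, 240-GHz LO (dBc/Hz)
pn_ref14 = -59.0      # value listed for [14] in Table I (dBc/Hz)
f_ref14_ghz = 320.0   # assumption: carrier frequency of [14], taken from Table I

pn_ref14_at_240 = normalize_phase_noise(pn_ref14, f_ref14_ghz, 240.0)
improvement_db = pn_ref14_at_240 - pn_this_work

print(f"[14] normalized to 240 GHz: {pn_ref14_at_240:.1f} dBc/Hz")
print(f"Improvement: {improvement_db:.1f} dB")
```

Under these assumptions the difference is roughly 22.5 dB, in line with the 22-dB improvement quoted above.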

Finally, we discuss the array-scale variation issues. Effects such as process variation (e.g., doping gradient) and temperature gradients affect the performance of a large-area array, since they change the RF performance of the transistors in receiver elements at different locations. As a result, the oscillation activities and fundamental oscillation frequencies (in the uncoupled case) of the elements differ. After coupling, although the oscillation frequencies are the same, the generated LO powers are different, which leads to different conversion losses, as we discussed in Section I and observed in the table of Fig. 17. We can also see from the table that elements in the lower rows have higher conversion loss due to the presence
TABLE I

<table>
<thead>
<tr>
<th>Detection Method</th>
<th>This Work</th>
<th>[14]</th>
<th>[15]</th>
<th>[5]</th>
<th>[6]</th>
<th>[7]</th>
<th>[8]</th>
</tr>
</thead>
<tbody>
<tr>
<td>Array Size</td>
<td>4 x 8</td>
<td>S</td>
<td>2 x 2</td>
<td>4 x 4</td>
<td>4 x 4</td>
<td>32 x 32</td>
<td></td>
</tr>
<tr>
<td>Array Scalability?</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
<td>Yes</td>
<td>Yes</td>
<td></td>
</tr>
<tr>
<td>RF Frequency (GHz)</td>
<td>240</td>
<td>320</td>
<td>650</td>
<td>280</td>
<td>320</td>
<td>280</td>
<td>860</td>
</tr>
<tr>
<td>LO Phase Noise (1-MHz Offset) (dBc/Hz)</td>
<td>-84</td>
<td>-59</td>
<td>Not reported (not PLL)</td>
<td>N/A</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Conversion Gain (dB)</td>
<td>-39.8</td>
<td>29.8</td>
<td>-44.1</td>
<td>N/A</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Sensitivity (pW) @ BW=1kHz</td>
<td>0.058</td>
<td>71.4</td>
<td>0.1</td>
<td>917</td>
<td>1080</td>
<td>250</td>
<td>3160</td>
</tr>
<tr>
<td>DC Power (mW)</td>
<td>980</td>
<td>117</td>
<td>600</td>
<td>6</td>
<td>38</td>
<td>180</td>
<td>2.56</td>
</tr>
<tr>
<td>Chip Area (mm²)</td>
<td>2.80</td>
<td>3.06</td>
<td>1.32</td>
<td>5.76</td>
<td>6.76</td>
<td>6.25</td>
<td>8.41</td>
</tr>
<tr>
<td>Technology</td>
<td>65nm CMOS</td>
<td>130nm SiGe</td>
<td>230nm SiGe</td>
<td>130nm CMOS</td>
<td>180nm SiGe</td>
<td>130nm SiGe</td>
<td>65nm CMOS</td>
</tr>
</tbody>
</table>

*Baseband amplifier gain and load impedance not reported. Required received RF power to obtain unity SNR given 1-kHz detection bandwidth.

VII. CONCLUSION

To facilitate the comparison of receiver arrays between the heterodyne scheme and square-law (direct) detection scheme, we adopt the sensitivity definition presented in [14], which is the input power level that leads to unity output SNR. The receiver bandwidth is assumed to be a practical value of 1 kHz. For heterodyne receivers, the sensitivity (in dBm) converted from the NF (in decibel) is then

\[
\text{Sensitivity}_{\text{dBm}} = -174 \text{ dBm} + \text{NF} + 30 \text{ dB} \tag{1}
\]

and for MOSFET and Schottky square-law detectors, the sensitivity (in Watt) converted from the noise equivalent power (NEP, in W/√Hz) is

\[
\text{Sensitivity}_{\text{Watt}} = \text{NEP} \cdot \sqrt{1000 \text{ Hz}}. \tag{2}
\]

Using (1) and the measured NF = 41.6 dB when \( f_{\text{IF}} = 475 \text{ MHz} \), we get the measured sensitivity of our receiver pixel to be 58 fW (i.e., −102 dBm) at \( f_{\text{IF}} = 475 \text{ MHz} \). Similarly, sensitivities of other state-of-the-art sub-terahertz/terahertz sensing arrays in silicon are also calculated (based on the reported performance) and listed in Table I. We see that our array improves the sensitivity by \( \sim 1200 \times \) compared with the heterodyne receiver array in [14], and by \( \sim 4300 \times \) compared with the best square-law detector arrays.
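The numbers above follow directly from (1) and (2) and can be reproduced with the short sketch below. The function names are illustrative, and the NEP value passed to the square-law helper is a hypothetical figure chosen only to show how (2) is applied; it is not taken from any cited work.

```python
import math


def heterodyne_sensitivity_dbm(nf_db, bw_hz=1000.0):
    """Equation (1): input power for unity SNR, in dBm, for a given NF and bandwidth."""
    return -174.0 + nf_db + 10.0 * math.log10(bw_hz)


def dbm_to_watt(p_dbm):
    """Convert a power in dBm to Watt."""
    return 10.0 ** (p_dbm / 10.0) * 1e-3


def square_law_sensitivity_watt(nep_w_per_rthz, bw_hz=1000.0):
    """Equation (2): sensitivity in Watt from an NEP in W/sqrt(Hz)."""
    return nep_w_per_rthz * math.sqrt(bw_hz)


sens_dbm = heterodyne_sensitivity_dbm(41.6)   # measured NF at f_IF = 475 MHz
sens_fw = dbm_to_watt(sens_dbm) * 1e15        # Watt -> femtowatt
print(f"Sensitivity = {sens_dbm:.1f} dBm = {sens_fw:.1f} fW")

# Improvement ratios, using the sensitivities (in pW) listed in Table I
ratio_vs_ref14 = 71.4 / 0.058
ratio_vs_square_law = 250.0 / 0.058
print(f"vs. [14]: ~{ratio_vs_ref14:.0f}x, vs. best square-law array: ~{ratio_vs_square_law:.0f}x")

# Hypothetical NEP of 7.9 pW/sqrt(Hz), for illustration of (2) only
example_w = square_law_sensitivity_watt(7.9e-12)
print(f"Example square-law sensitivity = {example_w * 1e12:.0f} pW")
```

The first line evaluates to about −102.4 dBm, i.e., roughly 58 fW, and the ratios come out near the $\sim 1200\times$ and $\sim 4300\times$ figures quoted in the text.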

With the de-centralized architecture and compact multi-functional pixels, our chip, for the first time, pushes the scale and density of the heterodyne receiver to a level that is on par with that of direct detector arrays. Very large aperture size becomes feasible now, and the only possible limits for array size are the process variation and uneven dc power distribution across the large die. Such high scalability, in combination with the enhanced sensitivity and phase detection capability, makes the presented sub-terahertz array technology attractive for the future implementation of high-resolution beam-forming imagers.

ACKNOWLEDGMENT

The authors would like to thank G. Zhang, J. Holloway, and Dr. X. Yi at MIT for technical discussions, and Dr. A. Westwood and K. Howard at Keysight Inc. for their support of the experimental instruments.

REFERENCES


Zhi Hu (S’15) received the B.S. degree in microelectronics from Fudan University, Shanghai, China, in 2015, and the M.S. degree in electrical engineering from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 2017, where he is currently pursuing the Ph.D. degree in electrical engineering. In 2016, he was a Visiting Researcher with IHP Microelectronics, Frankfurt (Oder), Germany.

His research interests include integrated circuit design in the terahertz and millimeter-wave frequency bands, with a special focus on improving the performance of on-chip terahertz power sources and receivers using multi-functional and large-scale dense array structures for imaging and sensing applications.

Mr. Hu is a member of the IEEE Solid-State Circuits Society and the IEEE Microwave Theory and Techniques Society. He was a recipient of the Best Student Paper Award (2nd place) of the 2017 IEEE Radio Frequency Integrated Circuits Symposium, the KLA-Tencor Scholarship in 2014, and the SCSK Scholarship in 2013.

Cheng Wang (S’15) was born in Sining, China, in 1987. He received the B.S. degree in engineering physics from Tsinghua University, Beijing, China, in 2008, and the M.S. degree in radio physics from the China Academy of Engineering Physics, Mianyang, China, in 2011. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA.

From 2011 to 2015, he was an Assistant Research Fellow with the Institute of Electronic Engineering, Mianyang. His research interests include millimeter/terahertz-wave gas spectroscopy, high-precision clock generation, broadband communication, and radar imaging.

Mr. Wang was a recipient of the Analog Devices, Inc. Outstanding Student Designer Award in 2016 and the IEEE Microwave Theory and Techniques Society Boston Chapter Scholarship in 2017.

Ruonan Han (S’10–M’14) received the B.Sc. degree in microelectronics from Fudan University, Shanghai, China, in 2007, the M.Sc. degree in electrical engineering from the University of Florida, Gainesville, FL, USA, in 2009, and the Ph.D. degree in electrical and computer engineering from Cornell University, Ithaca, NY, USA, in 2014.

In 2012, he was an Intern with Rambus Inc., Sunnyvale, CA, USA. He is currently an Associate Professor with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA. His research interests include microelectronic circuits and systems operating at millimeter-wave and terahertz frequencies.

Dr. Han is a member of the IEEE Solid-State Circuits Society and the IEEE Microwave Theory and Techniques Society. He was a recipient of the Cornell ECE Director’s Ph.D. Thesis Research Award, the Cornell ECE Innovation Award, two Best Student Paper Awards of the IEEE Radio Frequency Integrated Circuits Symposium from 2012 to 2017, the IEEE Microwave Theory and Techniques Society Graduate Fellowship Award, and the IEEE Solid-State Circuits Society Predoctoral Achievement Award. He is an Associate Editor of the IEEE TRANSACTIONS ON VERY-LARGE-SCALE INTEGRATION (VLSI) SYSTEMS, a Guest Editor of the IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, and also serves on the Technical Program Committee of the IEEE RFIC Symposium and the IEEE International Microwave Symposium, and on the Steering Committee of IMS in 2019. He held the MIT E. E. Landsman (1958) Career Development Chair Professorship, and was the winner of the National Science Foundation CAREER Award in 2017.