A Low-Skew, Low-Jitter Receiver Circuit for
On-Chip Optical Clock Distribution

by

Nigel Anthony Drego

B.S. Computer Engineering,
University of California, Irvine (2001)
Submitted to the Department of Electrical Engineering
and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2003
© Massachusetts Institute of Technology 2003. All rights reserved.
A Low-Skew, Low-Jitter Receiver Circuit for On-Chip Optical Clock Distribution

by

Nigel Anthony Drego

Submitted to the Department of Electrical Engineering and Computer Science on May 23, 2003, in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science

Abstract

As deep sub-micron CMOS technology continues to scale, typical H-Tree and other balanced clock distribution schemes are becoming increasingly vulnerable to the greater variation, propagation delay, crosstalk and other parasitics introduced by decreasing gate lengths and line widths. An alternative to balanced electrical clock distribution networks is the use of an optical distribution network at the global level. Light can be distributed to multiple receivers across the chip with low skew, and efficient, variation-robust optoelectronic conversion can make this technology a viable alternative. However, variation in the optical network can and will affect the optical signal received at the photodetector. Thus, variation-robust optical receiver circuit design and analysis is a critical step toward the implementation of on-chip optical clocks.

A third-generation receiver circuit has been designed using a fully-differential architecture and operates at 2GHz. The primary advantages of differential signaling are common-mode and power-supply rejection, enabling high-bandwidth amplifiers. Offset compensation circuitry has been developed to counteract mismatch effects common in modern deep sub-micron processes, while replica feedback biasing and a process-compensated current reference developed by Intel have been incorporated for stable biasing. With these circuit techniques in place, variation and noise analyses, which quantify receiver skew and jitter due to individual sources of variation and noise, reveal \(\sim 120\,\text{ps}\) of skew and \(\sim 35\,\text{ps}\) of worst-case jitter. The receiver circuit and associated test circuitry have been laid out and await fabrication.

Thesis Supervisor: Duane S. Boning
Title: Professor of Electrical Engineering and Computer Science

Thesis Supervisor: Michael Perrott
Title: Assistant Professor of Electrical Engineering and Computer Science
Acknowledgments

For 19 out of the nearly 23 years I have been alive, I have been given the privilege and luxury of receiving an education, and for this I could not be more grateful. Before all, I must thank God for all the opportunities that He has put in front of me, both good and bad.

To Mom and Dad: Thank you for being close, supporting me and loving me through every step; there is no more I could ask for. I only hope I have lived up to your dreams and expectations and made all your trials and tribulations worthwhile. You have sacrificed much so that I can be where I am today and I hope I can give you back some of that. Allour, you’re up next girl and you have me right at your side, as always.

A big shoutout to the 498 crew for all the laughs, mischief, fun and excitement that have never ceased. I couldn’t have made it through living with “stinky” and adjusting to MIT, or had so much fun here, without three people: Puneet, Gaia and Vidya. Vid, you’ve taught me so much and been the greatest friend a guy could have; the words “thank you” don’t seem sufficient.

My past and present labmates have given me so much: from the countless hours of technical help to deep philosophical discussions to some of the wackiest practical jokes I’ve been a part of. My thanks to Karen Gonzalez-Valentin Gettings (did I get it right, Karen? :), Joseph Panganiban, Mike Mills (has the Iron Monkey died yet?), Aaron Gower-Hall and Dave White.

I’d like to thank my two research advisors: Duane Boning and Michael Perrott. Duane, thank you for giving me the opportunity to work on a project that I knew very little about coming in, but have learned so much from. Your help and guidance have been paramount in making this thesis come together. I am ever grateful to Mike Perrott for the many circuit issues he has helped me tackle and for the simulation environment that every circuit designer needs to have. It has saved me many an hour of frustration over a particular tool, left nameless.

This work has been supported by the MARCO Interconnect Focus Center.
Contents

1 Introduction
   1.1 The Problems with Traditional Interconnect
   1.2 Clock Distribution: A Unique Engineering Challenge
   1.3 Related Work
      1.3.1 H-Tree Distribution Networks
      1.3.2 Active Deskew Mechanisms
      1.3.3 Optical Designs
   1.4 Thesis Organization

2 Sources of Variation
   2.1 Variation In The Optical Network
      2.1.1 The Optical Source
      2.1.2 Waveguides
      2.1.3 Couplers and Photodiodes
   2.2 CMOS Process Variation
      2.2.1 Front-End-Of-Line Variation
      2.2.2 Back-End-Of-Line Variation
      2.2.3 Mismatch
   2.3 Environmental Variation
      2.3.1 Power Supply Variation ($\Delta V_{DD}$)
      2.3.2 Temperature Variation ($\Delta T$)
      2.3.3 Noise
   2.4 Summary

3 Circuit Design for Robustness
   3.1 Circuit Topology
   3.2 Photodetector
   3.3 Transimpedance Stage
      3.3.1 Typical Transimpedance Stages
      3.3.2 Previous Transimpedance Stage
      3.3.3 Low Input-Impedance, High Gain Transimpedance Stage
   3.4 Amplification Stage
      3.4.1 A Passively-Loaded Differential Pair Amplifier
      3.4.2 Multiple Gain Stages
   3.5 Output Stage
   3.6 Offset Compensation
      3.6.1 Known Offset Compensation Methods
      3.6.2 A Simple Differential Pair
   3.7 Biasing
      3.7.1 Process-Compensated Current Reference
      3.7.2 Current Mirrors
      3.7.3 Other Biasing Methods
   3.8 Summary

4 Results And Analysis
   4.1 Nominal Circuit Operation
   4.2 Process Variation Analysis
      4.2.1 Process Corner Analysis
      4.2.2 Gate Length Variation Analysis ($\Delta L_{gate}$)
      4.2.3 Threshold Voltage Variation Analysis ($\Delta V_t$)
      4.2.4 Oxide Thickness Variation Analysis ($t_{ox}$)
      4.2.5 Resistor Variation Analysis
      4.2.6 Mismatch Analysis
   4.3 Environmental Variation
      4.3.1 Temperature Variation Analysis ($\Delta T$)
      4.3.2 Power-Supply Variation Analysis ($\Delta V_{DD}$)
   4.4 Noise Analysis
      4.4.1 Resistor Thermal Noise (Johnson Noise)
      4.4.2 Transistor Noise Sources
      4.4.3 External White Noise
      4.4.4 HSpice Noise Analysis
   4.5 Summary

5 Silicon Layout
   5.1 Common-Centroid Layout, Interdigitation and Dummy Devices
   5.2 Input Stage Layout
   5.3 Amplification Stage Layout
      5.3.1 Differential Pair Amplifier Layout
      5.3.2 Differential Pair Op-Amp and Low-Pass Filter Layout
      5.3.3 Layout of the Complete Amplification Stage
   5.4 Output Stage Layout
   5.5 Layout of Offset Compensation Circuitry
   5.6 Layout of Biasing Circuitry
   5.7 Layout of the Complete Receiver Circuit
   5.8 Layout Extraction and Verification
   5.9 Summary

6 Test Chip
   6.1 Demonstrating Basic Functionality
      6.1.1 Optical Setup
      6.1.2 Electrical Testing
   6.2 Circuit Variants
      6.2.1 Circuit Without Biasing
      6.2.2 Circuit Without Offset Compensation
      6.2.3 Optical vs. Electrical Variants
   6.3 Testing Summary

7 Final Remarks
   7.1 Issues Yet To Be Resolved
      7.1.1 Skew and Jitter
      7.1.2 Power Consumption
      7.1.3 Variation in the Optical Network
   7.2 Future Work
      7.2.1 Variation As a Useful Tool
      7.2.2 Low-Power Circuit Techniques
      7.2.3 Low-Noise Amplifiers
      7.2.4 Alternative Offset Compensation Techniques
      7.2.5 Integration of Active Deskew Mechanisms
      7.2.6 Integration and Characterization of Optical Components
   7.3 Contributions
List of Figures

1-1 Skew vs. Jitter: The top illustration depicts a clock signal arriving at Site B late in relation to Site A. In the bottom illustration, the arrival of the clock signal varies from the desired arrival time from cycle to cycle. In both illustrations, dashed lines depict ideal arrival times.
1-2 H-Tree Distribution Scheme

2-1 Waveguide structure
2-2 Guided-wave Optical Clock Distribution
2-3 New splitter design from [1]
2-4 Different coupling structures to couple light into integrated photodiodes [2]
2-5 Photodiode structure and integration
2-6 Top-View and Cross-Section of Important Transistor Dimensions, taken from [3]
2-7 Metal Interconnect Dimensions, taken from [3]
2-8 Fed-back inverter to demonstrate effect of power supply variation
2-9 Simple current mirror to demonstrate effect of power supply variation on biasing elements

3-1 Typical optoelectronic receiver topology
3-2 Typical transimpedance amplifier
3-3 Transimpedance stage used in [4]
3-4 Transimpedance stage with better control over input impedance and gain
3-5 Improved transimpedance stage with lower input impedance
3-6 Breaking the feedback loop to calculate $R_{in}$
3-7 Fully-differential input + transimpedance stage
3-8 Resistively loaded differential pair amplifier
3-9 Amplifier optimization using simulated $g_m$-curves
3-10 Total gain of amplification stage (in dB)
3-11 Replica feedback biasing
3-12 Differential signal at output of amplification stage
3-13 Cascaded inverters following the amplification stage
3-14 Offset compensation circuitry
3-16 Circuit used to generate multiples of $V_t$
3-17 Current mirror network

4-1 Nominal output waveforms generated by the receiver circuit
4-2 Output waveforms when subject to process corner variation
4-3 Output waveforms when subject to $L_{gate}$ variation
4-4 Output waveforms when subject to $V_t$ variation
4-5 Output waveforms when subject to $t_{ox}$ variation
4-6 Output waveforms when subject to resistor variation
4-7 Output waveforms when subject to mismatch. outp8 is shown in the dark, solid plots while outn8 is shown in the lighter, dashed plots.
4-8 (a) Histogram of duty cycle for outp8 subject to mismatch; (b) the same for outn8.
4-9 Histogram of generated skew when receiver is subject to mismatch
4-10 Output waveforms when subject to temperature variation
4-11 Output waveforms when subject to power supply variation
4-12 Output (outp8) noise spectral density of receiver circuit

5-1 Example of a common-centroid array
5-2 Example of a two-dimensional common-centroid array for MOSFETs
5-3 Guide to input stage layout
5-4 Layout of the input stage
5-5 Layout of a single differential pair amplifier
5-6 Layout of the differential pair op-amp
5-7 Layout of the last differential pair amplifier with low-pass filter and op-amp
5-8 Complete layout of the amplification stage
5-9 Layout of the output stage
5-10 Layout of the differential pair in the offset compensation circuitry
5-11 Layout of the MOS Triode resistors in the offset compensation circuitry
5-12 Guide to layout of biasing circuitry
5-13 Layout of biasing circuitry
5-14 Layout of complete receiver circuit
5-15 Extracted output waveforms to verify functionality

6-1 Toggle flip-flops to divide down a clock signal
6-2 Optical setup to test circuit
6-3 Photodiodes replaced by NFETs for electrical testing
6-4 Single-ended to differential conversion circuitry
List of Tables

2.1 Properties of Si and InP at 25°C, taken from [6]

3.1 Designed amplifier parameters
3.2 Current reference parameters
3.3 Generated $I_{\text{ref}}$ across process corners
3.4 Skew across process corners (using process-compensated current reference vs. ideal current sources)

4.1 TSMC 0.18$\mu$m process corners ($\Delta V_t$ and $\Delta L_{\text{gate}}$)
4.2 TSMC 0.18$\mu$m process corners ($\Delta t_{\text{ox}}$ and $\Delta W_{\text{gate}}$)
4.3 Generated skew, duty cycle, rise and fall times due to process corner variation
4.4 Generated skew due to gate length variation
4.5 Generated skew due to threshold variation
4.6 Generated skew due to oxide thickness variation
4.7 Generated skew due to resistor variation
4.8 MOSFET matching coefficients for the 0.18$\mu$m generation
4.9 Statistics for duty cycle in Monte Carlo simulations.
   * MSE = Mean Standard Error; 2 MSEs correspond well to a 95% confidence interval on the mean
   ** $S^2$ is the sample variance
   *** For a 95% confidence interval on $S^2$
4.10 Statistics for generated skew in Monte Carlo simulations. All definitions are the same as in the previous table.
4.11 Generated skew due to temperature variation
4.12 Generated skew due to power supply variation
4.13 Simulated worst-case TT corner RMS noise and jitter

6.1 Signal amplitudes and biases for electrical testing
Chapter 1

Introduction

This chapter provides motivation for the design of a receiver circuit for on-chip optical clock distribution. The following sections detail the problems with current interconnect technologies and the challenges present in clock distribution. Related work is also presented as reference for many of the ideas and issues in this thesis. An outline for the rest of this thesis is provided at the end of this chapter.

1.1 The Problems with Traditional Interconnect

With continued, aggressive scaling of Deep Sub-Micron (DSM) CMOS technologies to the 100-nm node and beyond, reliable global clock distribution in high-speed microprocessor and ASIC design faces increasing engineering challenges. Among these challenges are: (1) increased relative device and interconnect variation; (2) higher clock frequencies resulting in lower absolute skew and jitter budgets; (3) increasing fractions of overall chip power dissipation due to repeater insertion; (4) exacerbation of parasitics such as resistance and coupling capacitance due to decreasing wire dimensions; and (5) electromigration and self-heating effects [7].

Increasingly, these challenges are limiting the speed of VLSI designs. Initially, designs were logic-limited, meaning the speed of the individual transistors, or groups of transistors called gates, limited performance. However, designs are becoming more and more interconnect- or wire-limited [8]. As a result, either chip architectures must be fundamentally redesigned to account for this shift or a different interconnect technology must be used. Optics provides an attractive option for a radically different interconnect technology. The potential benefits optics has to offer include a lack of parasitics (such as RC delays and crosstalk), higher potential frequencies due to the absence of frequency-dependent signal loss and distortion, and lower-power interconnects [9]. While it may be infeasible to use optics for all on-chip interconnects due to the circuitry needed to interface between the electrical and optical domains, there are many arguments for using optics at a global level, particularly to distribute clock signals across the chip.

1.2 Clock Distribution: A Unique Engineering Challenge

This thesis explores the design of interface circuitry to convert from the optical domain to the electrical domain for a particular application: on-chip optical clock distribution. Clock distribution is a specific application of global signal distribution, with tightly constrained and unique engineering challenges. Clock distribution is fundamentally different from data distribution because a clock signal is “periodic, predictable, almost everything on the chip needs it, and delay does not matter, so long as the clock arrives everywhere at the same time” [10]. Unfortunately, in practical systems, clock signals do not arrive at precisely the same time, either across different locations on the chip or from one cycle to the next. A clock signal that arrives slightly out of phase at different spatial locations is said to be skewed. Jitter, by contrast, refers to cycle-to-cycle variation in the arrival time of the clock signal relative to its average arrival time (set by the clock period). Figure 1-1 illustrates the difference between skew and jitter. Skew and jitter are often due to static variation in circuit, device and interconnect parameters, dynamic variation in temperature and power supply, and other sources, such as noise.
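The distinction between skew and jitter can be made concrete with a short numerical sketch. The edge times below are hypothetical, not measured data. Skew is taken as the difference in average arrival offset between two sites, and jitter as the RMS spread of one site's arrival offsets about their own average, following the definitions above; the 500 ps period corresponds to the 2GHz clock targeted in this work.

```python
import statistics

PERIOD_PS = 500.0  # nominal 2GHz clock period

# Hypothetical edge arrival times (ps) at two sites over five clock cycles.
site_a = [0.0, 501.0, 999.0, 1502.0, 1998.0]
site_b = [30.0, 529.0, 1032.0, 1530.0, 2031.0]


def offsets(edges, period=PERIOD_PS):
    """Deviation of each edge from its ideal arrival time k * period."""
    return [t - k * period for k, t in enumerate(edges)]


def skew(a, b):
    """Static skew: difference between the average arrival offsets of two sites."""
    return statistics.mean(offsets(b)) - statistics.mean(offsets(a))


def rms_jitter(edges):
    """Cycle-to-cycle RMS spread of one site's arrival offsets about their mean."""
    return statistics.pstdev(offsets(edges))


print(f"skew = {skew(site_a, site_b):.1f} ps")
print(f"rms jitter: A = {rms_jitter(site_a):.2f} ps, B = {rms_jitter(site_b):.2f} ps")
```

Here Site B is late by about 30 ps on average (skew), while each site individually wanders by only a picosecond or two from cycle to cycle (jitter), mirroring the two illustrations in Figure 1-1.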

The use of optics to globally distribute a clock signal can help to reduce skew and jitter, so long as the interface circuitry that converts between the optical domain at the global level and the electrical domain at the local level is robust to the aforementioned sources of variation. The circuit described in this thesis uses design techniques to reduce the coupling between process and environmental variation and circuit performance (characterized by skew and jitter). The techniques used to achieve minimal skew and jitter include differential signaling, replica biasing, process compensation, and common-centroid layout. While none of these techniques is new in itself, their application to the design of a low-skew, low-jitter on-chip optical clock receiver circuit is.

Figure 1-1: Skew vs. Jitter: The top illustration depicts a clock signal arriving at Site B late in relation to Site A. In the bottom illustration, the arrival of the clock signal varies from the desired arrival time from cycle to cycle. In both illustrations, dashed lines depict ideal arrival times.

Common-mode and power-supply rejection are two fundamental reasons for using differential signaling. Furthermore, the low-pass filtering needed for stable biasing is made easier by using differential signals, which allow a reduction in the size of the passive devices (resistors and capacitors) needed. Replica biasing provides a convenient method of centering the small-swing analog signal about the switching threshold of an inverter. Process compensation is used in setting up a current reference that adjusts current flow to compensate for process and environmental variation. Lastly, common-centroid layout is a standard analog layout practice to minimize variation and mismatch in the circuit.
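Replica biasing relies on the fact that an inverter whose input is tied to its output settles at its own switching threshold $V_M$. The minimal sketch below computes that operating point, assuming long-channel square-law devices with no channel-length modulation and hypothetical parameters (not the TSMC 0.18μm models used in this work). It finds $V_M$ by bisection and checks the result against the standard closed-form expression for the same model.

```python
def switching_threshold(vdd, vtn, vtp_abs, kn, kp, iters=60):
    """Bisect for V_M, where the NMOS and PMOS saturation currents are equal.

    Assumes the long-channel square law, both devices saturated, and no
    channel-length modulation; an inverter with input shorted to output
    settles at this voltage under those assumptions.
    """
    def i_n(v):
        return 0.5 * kn * max(v - vtn, 0.0) ** 2

    def i_p(v):
        return 0.5 * kp * max(vdd - v - vtp_abs, 0.0) ** 2

    lo, hi = vtn, vdd - vtp_abs  # NMOS current rises and PMOS current falls across this range
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if i_n(mid) < i_p(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def switching_threshold_closed_form(vdd, vtn, vtp_abs, kn, kp):
    """Analytic V_M for the same square-law model, with r = sqrt(kp / kn)."""
    r = (kp / kn) ** 0.5
    return (vtn + r * (vdd - vtp_abs)) / (1.0 + r)


# Hypothetical device parameters, chosen only for illustration.
nominal = switching_threshold(1.8, 0.45, 0.45, kn=200e-6, kp=100e-6)
slow_n = switching_threshold(1.8, 0.50, 0.45, kn=200e-6, kp=100e-6)
print(f"V_M nominal = {nominal:.3f} V, with +50 mV NMOS V_t = {slow_n:.3f} V")
```

In this simplified model a shift in $V_t$ moves $V_M$ for every inverter built from the same devices, including the replica, which is why a replica-generated bias can follow the switching threshold of the inverters it feeds.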

1.3 Related Work

1.3.1 H-Tree Distribution Networks

There has been a great deal of effort put into electrical methods of achieving low-skew clock distribution. An H-Tree distribution scheme with balanced loads has been the most popular method for clock distribution in past and current high-performance designs. The idea behind an H-Tree is to make the total distance (and load) from the clock source to any end node the same (or close to the same). An example of such a scheme is illustrated in Figure 1-2. However, even with balanced H-Trees, process variability affecting both devices in the clock distribution network (i.e. transistors that make up repeaters) and interconnect wires leads to significant amounts of skew. Furthermore, power-supply noise is the primary cause of jitter in such networks. As a result, large amounts of statistical modelling and optimization are required to design balanced H-Tree distribution schemes that achieve desired clock frequencies where skew and jitter budgets occupy a small, constant fraction of the clock cycle [11].
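The balancing principle behind an H-Tree can be checked with a short geometric sketch. The tree below is idealized and unit-less (an assumption for illustration): each level draws a horizontal bar and two vertical legs of equal arm length, then recurses from the four tips with half the arm length. It ignores loads, repeaters and process variation, which are exactly the effects that still produce skew in real H-Trees.

```python
def h_tree_leaves(x, y, arm, depth, wire=0.0):
    """Return (x, y, wire_length) for every leaf of an idealized H-tree.

    From the centre (x, y) the wire runs +/-arm horizontally (the bar of the
    "H") and then +/-arm vertically (its legs); each of the four tips becomes
    the centre of a smaller H with half the arm length.
    """
    if depth == 0:
        return [(x, y, wire)]
    leaves = []
    for dx in (-arm, arm):        # horizontal bar of the H
        for dy in (-arm, arm):    # vertical legs at each end of the bar
            leaves += h_tree_leaves(x + dx, y + dy, arm / 2, depth - 1,
                                    wire + abs(dx) + abs(dy))
    return leaves


leaves = h_tree_leaves(0.0, 0.0, arm=8.0, depth=3)
lengths = {round(w, 9) for _, _, w in leaves}
print(len(leaves), "leaves; distinct source-to-leaf wire lengths:", lengths)
```

With three levels the 64 end nodes are spread evenly over the die, yet every one is the same wire distance (2 × (8 + 4 + 2) = 28 units) from the clock source.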

1.3.2 Active Deskew Mechanisms

As process variability grows relative to shrinking feature sizes, balanced H-Trees are no longer sufficient to meet skew budgets in high-performance microprocessor designs where clock frequencies are in excess of 1GHz. Intel’s IA-64 Enterprise processors now use deskew mechanisms integrated into H-Tree clock distribution networks [12]. These mechanisms operate by comparing distributed clocks with a carefully designed and balanced reference clock. Logic then uses digitally-controlled delay elements to adjust the distributed clocks such that they are in phase with the reference clock. Such designs have reduced global skew from 110ps to 28ps. It is likely that with finer granularity of the digitally-controlled delay elements, achievable skew will decrease further. However, deskew mechanisms are activated and programmed only once before shipping to the end consumer. As a result, these mechanisms compensate only for static process variation and do nothing for environmental (temperature and power supply) variation or noise. While these circuits could theoretically run during normal operation of microprocessors, it is unlikely they will be used in this manner due to the potential for catastrophic failure (as a result of dynamic synchronization changes and possible loss of the clock signal), resulting in total loss of chip functionality. Furthermore, these mechanisms do little to mitigate jitter. Consequently, designers are still faced with the problem of confronting temperature hot spots, power-supply fluctuation (and ground bounce), digital noise, aggressor signals and random noise, all of which cause skew and jitter.
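The effect of delay-element granularity on residual skew can be illustrated with a minimal sketch. It assumes a simplified model, not Intel's actual implementation: each clock domain has a measured static offset relative to the reference, and a hypothetical delay line with a symmetric range of taps of fixed step size corrects it once. The offsets are invented for illustration.

```python
def deskew(offsets_ps, step_ps, max_taps):
    """Pick a delay-line tap for each clock domain to cancel its static offset.

    offsets_ps: arrival of each domain relative to the reference clock (ps).
    step_ps:    delay added or removed per tap (delay-line granularity).
    max_taps:   taps available on either side of the delay line's midpoint.
    Returns the residual offsets after the one-time correction.
    """
    residuals = []
    for off in offsets_ps:
        taps = round(-off / step_ps)                # nearest tap setting
        taps = max(-max_taps, min(max_taps, taps))  # limited delay-line range
        residuals.append(off + taps * step_ps)
    return residuals


def global_skew(offsets_ps):
    """Spread between the earliest and latest arriving clock domains."""
    return max(offsets_ps) - min(offsets_ps)


measured = [-40.0, -12.5, 0.0, 18.0, 55.0]  # hypothetical static offsets (ps)
print("before deskew:", global_skew(measured), "ps")
for step in (10.0, 5.0, 2.5):
    residual = global_skew(deskew(measured, step, 32))
    print(f"step {step} ps -> residual skew {residual:.2f} ps")
```

Within range, each residual is bounded by half a delay step, so finer steps shrink the remaining skew. Because the taps are set once, any later temperature or supply drift passes through uncorrected, consistent with the limitation described above.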

1.3.3 Optical Designs

To date, most work on optical clock distribution networks remains in academia, and completely integrated on-chip optical distribution networks have yet to be demonstrated. However, Keeler et al. demonstrate a free-space distribution scheme in which short (femtosecond) pulses are used for skew and jitter removal [13]. Furthermore, the impulse-like nature of femtosecond pulses makes them ideal candidates for wavelength division multiplexing due to the large spectrum of frequencies contained in such a pulse. Additionally, as early as 1991, Delfyett et al. demonstrated optical clock distribution to 1024 ports via optical fiber [14]. Jitter between any two end nodes in this system was measured below 12ps. Both of these systems may be applicable to board-to-board or chip-to-chip communication, but not to intra-chip communication. Optical fiber cores are on the order of 50μm in diameter and are thus not practical for on-chip optical distribution. Furthermore, a free-space approach will likely not be used on-chip due to area, packaging and heat removal constraints.
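The link between short pulses and a wide spectrum follows from the time-bandwidth product. As a rough sketch, assuming transform-limited Gaussian pulses (time-bandwidth product of about 0.441 for full-width-at-half-maximum values) and a hypothetical 100 GHz channel spacing, the spectral width and the number of channels it could hold are:

```python
GAUSSIAN_TBP = 0.441  # FWHM time-bandwidth product of a transform-limited Gaussian pulse


def pulse_bandwidth_hz(fwhm_s):
    """Minimum spectral width (FWHM) of a transform-limited Gaussian pulse."""
    return GAUSSIAN_TBP / fwhm_s


def wdm_channels(bandwidth_hz, spacing_hz=100e9):
    """Number of channels of the given (hypothetical) spacing that fit in the spectrum."""
    return int(bandwidth_hz // spacing_hz)


for fwhm in (100e-15, 1e-12, 100e-12):
    bw = pulse_bandwidth_hz(fwhm)
    print(f"{fwhm * 1e15:8.0f} fs pulse -> {bw / 1e12:6.3f} THz, "
          f"{wdm_channels(bw)} x 100 GHz channels")
```

A 100 fs pulse spans several terahertz, enough for dozens of such channels, whereas a 100 ps pulse spans only a few gigahertz. Real pulses are rarely perfectly transform-limited, so these figures are lower bounds on spectral width.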

Nevertheless, there has been a fair amount of research dedicated to individual components of integrated optical systems (on-chip optical sources, photodetectors, waveguides, couplers, and associated driver and receiver circuitry) [6]. Our own research group has worked on receiver (clock and data) circuits for integrated on-chip silicon CMOS distribution networks. Mills designed a receiver circuit for on-chip data distribution based on a synchronous sense-amplifier [10]. This design is fundamentally different from any design that can be used for a clock receiver circuit due to the inherent differences between data and clock applications.

S. Sam designed a first-generation clock receiver circuit which uses modified inverters as its primary means of amplification, in a 0.35μm CMOS process with silicon P-I-N photodetectors [15]. However, the analog properties of the 0.35μm process at 1GHz are poor, and silicon P-I-N photodetectors require large areas due to their low responsivities. Since photodiode capacitance scales with photodiode area, there exists a tradeoff between optical power and photodiode size: a smaller, lower-capacitance photodiode collects less light and therefore requires more optical power. Furthermore, Sam’s chosen topology was found to be sensitive to process and environmental variation sources. Lum designed a second-generation clock receiver circuit with a completely different architecture [4]. Using a bandgap reference and voltage regulator in a 0.18μm CMOS process proved valuable in increasing robustness to environmental variation. Skew due to process variation remained high, however. Lum’s design also uses Indium Phosphide (InP) photodetectors which are flip-chip bonded to a standard silicon CMOS die. InP photodiodes allow for smaller photodetector areas (40μm × 40μm) and thus smaller diode capacitance. As a result, circuit performance is better due to a smaller RC time constant at the input node. However, the use of cascode amplifiers in this design limits its frequency response. While cascode topologies are typically used to achieve high gain and improve bandwidth by eliminating the Miller effect [16], the frequency response of an individual cascode amplifier is limited to below 1GHz even in a 0.18μm CMOS process. As a result, one achieves high DC gain but an AC gain roughly one order of magnitude lower, making biasing of the output signal about the switching threshold of an inverter difficult. Even if biasing is done correctly (as Lum takes great pains to do), scaling this architecture to higher frequencies for a given process proves difficult.
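The area/bandwidth argument can be sketched with a parallel-plate estimate of junction capacitance and the single RC pole it forms at the receiver input. All parameters here are assumptions for illustration: a relative permittivity of about 12 (close to both Si and InP), a 1μm depletion width, a 500Ω input resistance, and a hypothetical 100μm × 100μm diode for comparison with the 40μm × 40μm InP device. Perimeter and wiring capacitance are ignored.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def junction_capacitance(side_m, eps_r=12.0, depletion_m=1e-6):
    """Parallel-plate estimate of a square photodiode's junction capacitance (F).

    eps_r and depletion_m are assumed, illustrative values.
    """
    return EPS0 * eps_r * side_m ** 2 / depletion_m


def input_pole_hz(r_in_ohm, c_farad):
    """-3 dB frequency of the RC pole formed at the receiver input node."""
    return 1.0 / (2.0 * math.pi * r_in_ohm * c_farad)


R_IN = 500.0  # hypothetical receiver input resistance (ohms)
for label, side in (("40um x 40um", 40e-6), ("100um x 100um", 100e-6)):
    c = junction_capacitance(side)
    print(f"{label}: C = {c * 1e15:7.1f} fF, "
          f"input pole = {input_pole_hz(R_IN, c) / 1e9:5.2f} GHz")
```

Because capacitance grows with area, the larger diode has 6.25 times the capacitance and therefore an input pole 6.25 times lower in frequency, which is the motivation for the smaller InP photodetectors.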

1.4 Thesis Organization

This thesis is organized as follows: Chapter 2 details sources of variation that result in large skew and jitter, and that need to be kept in mind in the design of optical clock distribution. Circuit techniques presented in Chapter 3 aid in ensuring a robust receiver circuit design, with analysis of these techniques and whole circuit operation presented in Chapter 4. This analysis includes simulation results with process and environmental variation influencing the circuit operation. Noise analysis and parameter mismatch analysis are also presented in this chapter. In order to minimize and mitigate the effects of variation and mismatch, specific layout practices are used as detailed in Chapter 5. Issues relating to the layout, such as coupling and parasitics, are also discussed. As a follow-on to the layout, circuit variants and post-fabrication testing strategies are described in Chapter 6. Finally, building on the issues that arise in earlier chapters, Chapter 7 concludes with future research ideas and directions for on-chip optical clock distribution.
Chapter 2

Sources of Variation

This chapter explores and documents the many sources of variation that affect circuit operation. Among these sources are variation in the optical network (e.g. waveguides, photodiodes and couplers), process variation and environmental variation. All of these sources exhibit spatial and temporal dependencies to varying degrees.

2.1 Variation In The Optical Network

Since the clock source in an optical clock distribution scheme is a laser or laser-diode, and is distributed by an optical network, variations in this network are considered first. There are three main components, and thus sources of variation, in the optical network: 1) the optical source; 2) waveguides; and 3) couplers and photodiodes. Variation in each of these components can introduce significant skew and jitter. The following subsections describe the sources of variation found in each component and attempt to quantify their impact.

2.1.1 The Optical Source

The optical source is typically an on- or off-chip laser. Variation associated with a laser manifests itself as jitter (both long- and short-term). The first-order jitter contribution in laser diodes is the laser diode turn-on delay, and this jitter is on the order of tens of picoseconds. When a laser diode is used in an optical clock distribution scheme at the system level with a fanout of 10, jitter as low as 50ps has been reported [17]. Delfyett et al., however, use a mode-locked laser to eliminate this turn-on delay and reduce pulse-to-pulse jitter to \( \sim 400 \text{fs} \) [14]. It is important to note that Delfyett's setup generated a train of high-power femtosecond pulses. This is not the setup used in this work; rather, a square wave is generated at the desired clock frequency. More recent work on ultra-low-noise semiconductor mode-locked lasers demonstrates even lower jitter: Jiang et al. at the Research Laboratory of Electronics at MIT have demonstrated timing jitter of 86fs from 10Hz to 4.5GHz [18].

The above shows that jitter due to the optical source, namely the laser or laser diode, can be made extremely small relative to other variation sources. However, this jitter might be amplified due to other components of the optical network, specifically splitters and waveguides.

### 2.1.2 Waveguides

In order to distribute the optical signal generated by the optical source, an interconnection network is needed. This network plays much the same role as the metal interconnect network that distributes electrical signals in current integrated circuits. For this, a waveguide structure like that shown in Figure 2-1 is used. The waveguide typically consists of a core material (e.g. SiON) surrounded by a cladding (e.g. SiO\(_2\)). The core serves as the transmission medium for the optical signal. Because the core has a higher refractive index than the cladding, light striking the core-cladding boundary at angles of incidence greater than the critical angle undergoes total internal reflection. This confines the signal to the core and allows it to propagate along the waveguide.

Furthermore, the distribution network should be balanced, with waveguides as the transmission medium. This is analogous to an electrical H-Tree, in which aluminum or copper wires serve as the transmission medium. Figure 2-2 shows such an optical interconnection scheme with 16 clock receiver sites, where the signal is converted from the optical domain to the electrical domain. After this conversion, local clock distribution is achieved through conventional balanced electrical schemes. In such a structure, there are two primary sources of variation: 1) waveguide refractive index; and 2) splitting ratios.

The waveguide refractive index is determined by the thin films used to create the waveguide structure. Current thin-film technology is capable of less than 1% within-die variation in the thickness of the deposited films [19][6]. This thickness variation translates into a refractive index variation, which in turn produces skew through path delay variation. Eq. 2.1, which describes the optical delay through an arbitrary material, demonstrates this dependency.

\[ \text{Delay} = \frac{l_{\text{path}}}{c} \cdot n_{\text{path}} \] (2.1)
where $l_{\text{path}}$ is the path length, $c$ is the speed of light in vacuum and $n_{\text{path}}$ is the index of refraction of the path. Because of this dependency, a variation in the index of refraction of a particular path changes that path's delay, producing skew relative to other paths. Path lengths at the global level of a clock distribution tree are typically on the order of tens of millimeters, resulting in $50 - 100\text{ps}$ of optical delay depending on the index of refraction of the waveguide material. A 1% variation in $n_{\text{path}}$ therefore changes the delay of a path by roughly $0.5 - 1\text{ps}$. The skew between two paths whose indices deviate in opposite directions is then on the order of one to two picoseconds.
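The arithmetic above can be checked with a small Python sketch of Eq. 2.1. The 20mm path length and the core index of 1.5 are assumed, illustrative values and are not taken from a specific waveguide in this work. The function names are hypothetical.

```python
# Sketch of Eq. 2.1 (Delay = l_path * n_path / c) and the skew produced by a
# small relative change in the refractive index of one path.
C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def optical_delay(l_path_m: float, n_path: float) -> float:
    """Propagation delay (s) through a waveguide path of length l_path_m (m)."""
    return l_path_m * n_path / C0


def index_skew(l_path_m: float, n_path: float, rel_dn: float) -> float:
    """Extra delay (s) of a path whose index is n_path * (1 + rel_dn)."""
    return optical_delay(l_path_m, n_path * (1.0 + rel_dn)) - optical_delay(l_path_m, n_path)


# Hypothetical example: a 20 mm global path with an assumed core index of 1.5.
delay = optical_delay(20e-3, 1.5)
skew = index_skew(20e-3, 1.5, 0.01)
print(f"delay = {delay * 1e12:.1f} ps, skew for a +1% index shift = {skew * 1e12:.2f} ps")
```

Because the delay is linear in $n_{\text{path}}$, the skew is simply the same fraction of the nominal delay as the index error.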

Waveguide networks typically incorporate two-way splitters to transition from one level of the distribution network to the next lower level. Larger splitters, in which one waveguide feeds up to 16 subsequent waveguides, have also been demonstrated [20]. Every time a waveguide splits into two other waveguides, the optical power is divided between the two branches. Ideally, the split should be lossless and divide the power equally between the two branches. However, fabricated waveguide distribution networks show that losses do occur at the splits. While losses are undesirable, a uniform loss at the splits simply requires the optical source to deliver enough power for each end node to determine the correct data value (0 or 1).

More detrimental to the operation of a clock distribution network is uneven power splitting, in both two-way and N-way splitters. Due to limited lithography resolution, deviations in patterning result in "non-uniform power splitting and energy loss" [1]. Power splitting ratios near 55:45 have been reported. However, newer designs of the waveguide and splitting structures have dramatically reduced splitting losses ($\sim 0\text{dB}$). Splitting ratios are also nearing the ideal, with demonstrations of 51:49 (shown in Figure 2-3). Nevertheless, it is difficult to predict which branch of a split will receive the larger share of the power. As a result, a receiver circuit at one corner of a die may receive noticeably more optical power than a receiver at the opposite corner. Increasing the number of optical levels, and thus the number of splits between the source and an arbitrary receiver, only exacerbates this issue. Even with 51:49 splitters, two paths that each pass through three splits and diverge at the first one could see a power ratio of 53:47. If the number of splits is increased to four, the ratio could become 54:46.

Figure 2-3: New splitter design from [1]
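The compounding of splitting ratios can be reproduced with a short Python sketch. It assumes that every splitter has the same 51:49 ratio and that any excess loss is uniform, so the loss cancels in the ratio. It also assumes a worst case in which one receiver always takes the stronger branch and the other always takes the weaker one after their paths diverge at the first split. The function name is hypothetical.

```python
def worst_case_split(split_high: float, n_splits: int) -> tuple[float, float]:
    """Normalized optical power at two receivers whose paths diverge at the
    first split and then always take the stronger (A) or weaker (B) branch.
    Assumes identical split ratios and uniform (or zero) excess loss."""
    split_low = 1.0 - split_high
    p_a = split_high ** n_splits  # fraction of source power reaching receiver A
    p_b = split_low ** n_splits   # fraction of source power reaching receiver B
    total = p_a + p_b
    return p_a / total, p_b / total


for n in (1, 2, 3, 4):
    a, b = worst_case_split(0.51, n)
    print(f"{n} splits -> {100 * a:.0f}:{100 * b:.0f}")
```

For three and four splits this reproduces the 53:47 and 54:46 ratios quoted above, since the power ratio grows as $(51/49)^N$.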

A less well-known effect also occurs at splits: a passive splitter structure can introduce optical reflections and possibly even modal noise [14]. This adds a further jitter component to the total jitter seen at a particular node.
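How such a component enters a jitter budget can be sketched in Python. The sketch assumes that the individual RMS jitter contributions are statistically independent, in which case they combine as a root-sum-square; correlated or deterministic jitter would instead add linearly. The 86fs laser figure is the value quoted above from [18]. The splitter and receiver values are hypothetical placeholders, not measured results.

```python
import math


def rss_jitter(*rms_components_ps: float) -> float:
    """Root-sum-square combination of RMS jitter components (ps).
    Valid only if the components are statistically independent random jitter;
    correlated or deterministic jitter would instead add linearly."""
    return math.sqrt(sum(j * j for j in rms_components_ps))


laser_ps = 0.086     # mode-locked laser jitter reported in [18] (86 fs)
splitter_ps = 1.0    # hypothetical reflection/modal-noise contribution
receiver_ps = 5.0    # hypothetical receiver contribution
total_ps = rss_jitter(laser_ps, splitter_ps, receiver_ps)
print(f"total RMS jitter ~ {total_ps:.2f} ps")
```

Under this assumption, the largest component dominates the total, which is why a very low-jitter source matters less than the contributions added downstream.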

Once the optical signal is distributed through the waveguide network, it must be converted to an electrical current. Couplers feed the signal from the waveguide into a photodiode, which converts the optical energy into an electrical current. The next section will look at sources of variation in these devices.

2.1.3 Couplers and Photodiodes

Among the most critical devices in the optical network from a variation point of view is the coupler. The purpose of these devices is to couple the optical signal from the waveguide into an integrated photodiode. There are three primary methods of achieving this coupling as shown in Figure 2-4: 1) butt-coupling, which uses a mesa-structure for the lateral photodetector; 2) leaky-wave coupling by a step-less transition from field oxide to active detector area; and 3) mirror coupling using total reflection by sloping the end of a waveguide [2].

Due to the fabrication difficulties and the changes in CMOS process steps needed to build butt- and leaky-wave coupling devices, mirrors are most often used as coupling devices [2][6][1]. Furthermore, the waveguides are processed after the standard CMOS process. Assuming these processing steps are low-temperature, MOS parameters are not affected by the additional waveguide processing. Nevertheless, the angle, thickness and smoothness of the mirror all affect the efficiency of the coupler. These parameters also affect the other coupling methods, where vertical-wall uniformity, anti-reflective (AR) coating and taper make uniform structures even harder to create [19]. This structural variation appears as a variation in the effective quantum efficiency of the device, which ultimately translates into a variation in the optical power seen at the diode. Chapter 4 will detail how such variations in the optical power affect circuit operation and the output signal.

Photodiodes are the last stage of the optical network. At the output of the photodiode, the signal has been converted to the electrical domain. Since the photodiode is the converting element, it is especially critical that the conversion be uniform to ensure that the electrical signal is the same from one receiver location to another. There are two primary sources of variation in the photodiode: 1) non-uniformity in the internal structure of the photodiode itself; and 2) variation in the etching of the photodiode structure that results in a variation in the area of the photodiode. Both are now examined in more detail.

Non-uniformity in the structure of the photodiode depends strongly on how the photodiodes are fabricated and integrated into the CMOS die. The design in this thesis uses photodiodes that are constructed and integrated as follows (depicted in Figure 2-5). A separate wafer is used to fabricate the Indium Phosphide (InP) diodes. Once the diode structure has been grown across the complete diode wafer, the wafer is patterned and etched with a mask that matches the locations where the photodiodes are to be integrated on the CMOS die. This produces individual photodiode structures. The two wafers are then flip-chip bonded, with the bond formed between a low-level interconnect metal pad on the CMOS die and the top of each photodiode structure. Once the bond is achieved, the diode wafer is etched away, leaving the photodiode structures integrated on the CMOS die. One might expect diodes constructed in this manner to be much more uniform than diodes constructed individually. Two diodes placed close to each other should be even more uniform relative to each other.

Nevertheless, there will be some non-uniformity in the structure of photodiodes, even if they are placed close to each other. This non-uniformity will likely affect the quantum efficiency of each device relative to the others, which then translates into variation between the input photocurrents of the respective receiver circuits. Additionally, wafer bonding of the sort described above induces thermal stress because of the mismatch in thermal expansion coefficients shown in Table 2.1 [6]. With temperature gradients on the die, this thermal mismatch differs from site to site, resulting in stress mismatches and, in turn, variation in photodiode performance.
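A first-order feel for the size of this effect can be obtained from the free-expansion mismatch strain $(\alpha_{InP} - \alpha_{Si})\,\Delta T$, using the coefficients in Table 2.1. The Python sketch below is only an estimate: it ignores film thickness, elastic constants and stress relaxation. The 10K site-to-site temperature difference is a hypothetical value.

```python
# Thermal expansion coefficients from Table 2.1 (1/K).
ALPHA_SI = 2.6e-6
ALPHA_INP = 4.8e-6


def mismatch_strain(delta_t_k: float) -> float:
    """First-order free-expansion mismatch strain (alpha_InP - alpha_Si) * dT
    between a bonded InP photodiode and the Si die. Ignores film thickness,
    elastic constants and stress relaxation, so it is only an estimate."""
    return (ALPHA_INP - ALPHA_SI) * delta_t_k


# Hypothetical: two photodiode sites whose temperatures differ by 10 K.
site_difference = mismatch_strain(10.0)
print(f"differential mismatch strain ~ {site_difference:.1e}")
```

Because the strain is linear in temperature, the difference between two sites depends only on their temperature difference, not on the absolute bonding temperature.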

The second major variation source is the etching of the photodiodes. Variations in this step result in area differences between photodiodes. Since both the capacitance and the converted photocurrent are proportional to the area of the photodiode, variations in the area will result in variations in capacitance and converted photocurrent as well. This will directly affect the output signal relative to other sites on the die.

<table>
<thead>
<tr>
<th>Property</th>
<th>Si</th>
<th>InP</th>
</tr>
</thead>
<tbody>
<tr>
<td>Crystal structure</td>
<td>Diamond</td>
<td>Zincblende</td>
</tr>
<tr>
<td>Lattice constant (nm)</td>
<td>0.54307</td>
<td>0.58687</td>
</tr>
<tr>
<td>Bandgap (eV)</td>
<td>1.17</td>
<td>1.344</td>
</tr>
<tr>
<td>Thermal conductivity (W cm⁻¹ K⁻¹)</td>
<td>1.5</td>
<td>0.7</td>
</tr>
<tr>
<td>Thermal expansion coefficient (K⁻¹)</td>
<td>2.6 x 10⁻⁶</td>
<td>4.8 x 10⁻⁶</td>
</tr>
</tbody>
</table>

Table 2.1: Properties of Si and InP at 25°C taken from [6]
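The area dependence of photocurrent and capacitance can be illustrated with a Python sketch. It assumes a square mesa whose edges all move by the same etch bias. The 40μm × 40μm size is borrowed from the InP detector mentioned in Chapter 1, and the 0.5μm over-etch is purely hypothetical.

```python
def relative_area_change(side_um: float, edge_bias_um: float) -> float:
    """Fractional area change of a square photodiode mesa when each edge moves
    outward by edge_bias_um (negative values model over-etching)."""
    nominal = side_um ** 2
    etched = (side_um + 2.0 * edge_bias_um) ** 2
    return etched / nominal - 1.0


# Hypothetical: a 40 um x 40 um mesa over-etched by 0.5 um on every edge.
d_area = relative_area_change(40.0, -0.5)
# Photocurrent and junction capacitance both scale with area, so both shift by d_area;
# for a fixed input resistance the input RC time constant shifts by the same fraction.
print(f"area, photocurrent and capacitance change ~ {100 * d_area:.2f}%")
```

A sub-micron etch error thus produces a few percent of variation in both the signal current and the input-node capacitance.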

From the laser source to the photodiode, sources of variation are present even in the optical network and will affect the clock distribution network. In the following section, CMOS process variation will be explored in detail.

### 2.2 CMOS Process Variation

Among the biggest variation concerns are those due to CMOS process steps. Since process non-uniformities affect individual MOS and interconnect parameters, this variation is spatially dependent: the MOS and interconnect parameters at one location can differ significantly from those at another location on the die. Furthermore, these variations can have differing spatial dependencies, and as such will have differing intra-die, die-to-die and wafer-to-wafer components. This section explores process variation in three subsections. Section 2.2.1 looks at Front-End-Of-Line variation, which affects MOS parameters. Section 2.2.2 looks at Back-End-Of-Line variation, which affects the interconnect structure. Section 2.2.3 discusses device mismatch.

#### 2.2.1 Front-End-Of-Line Variation

Front-End-Of-Line (FEOL) variation directly affects the response of electrical components (MOSFETs, resistors and capacitors) on chip [21]. The MOS parameters of particular interest in this thesis are: 1) gate length \( (L_{gate}) \); 2) gate width \( (W_{gate}) \); 3) oxide thickness \( (t_{ox}) \); and 4) channel doping \( (n_{channel}) \) [3]. Figure 2-6 shows the critical transistor dimensions that are subject to variation. These parameters strongly affect device performance and are discussed individually below.

![Figure 2-6: Top-View and Cross-Section of Important Transistor Dimensions, taken from [3]](image)

**Channel Length Variation** \( (\Delta L_{gate}) \)

Channel length largely determines the performance of a MOSFET, as both the "on" resistance and the gate capacitance of a device depend on the channel length. Consequently, variation in the channel length may have a severe impact on device and circuit performance. A number of processing steps contribute to channel-length variation: photolithography, gate etch, spacer formation, ion implantation and thermal processing. In current processes, Across-Chip Linewidth Variation (ACLV) is \(\sim 10\%\) of the mean channel length [21]. The effect on circuit operation can be seen from the fundamental MOSFET equations:
\[ i_{d,\text{sat}} = \frac{\mu_n C_{ox}}{2} \left( \frac{W}{L} \right) (v_{gs} - V_t)^2 (1 + \lambda v_{ds}) \] (2.2)

\[ R_{on} = \left( \frac{L}{W} \right) \frac{1}{\mu_n C_{ox} (v_{gs} - V_t)} \] (2.3)

Eq. 2.2 describes the current through a MOSFET in the saturation region, assuming square-law behavior. Since all transistors in this design are biased in this region, only this region is discussed. As this equation shows, the saturation current is inversely proportional to the channel length, so gate-length variation changes the device current in proportion to \(1/L\). Furthermore, Eq. 2.3 shows that the on-resistance of a transistor is directly proportional to the gate length. This is no surprise, since for a fixed voltage, resistance is inversely proportional to current. More importantly, these variations contribute to gain, bandwidth and DC biasing variations in circuit performance. In the extreme case, if not properly accounted for, they could result in loss of circuit functionality.
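The inverse and direct proportionalities can be checked numerically with a small Python sketch of Eqs. 2.2 and 2.3. The device values ($\mu_n C_{ox}$, $W$, $L$, $V_{gs}$, $V_t$) are hypothetical. The +10% length excursion mirrors the ACLV figure quoted above.

```python
def id_sat(mu_cox, w, l, vgs, vt, lam=0.0, vds=0.0):
    """Square-law saturation current, Eq. 2.2."""
    return 0.5 * mu_cox * (w / l) * (vgs - vt) ** 2 * (1.0 + lam * vds)


def r_on(mu_cox, w, l, vgs, vt):
    """On-resistance, Eq. 2.3."""
    return (l / w) / (mu_cox * (vgs - vt))


# Hypothetical device: mu_n*C_ox = 300 uA/V^2, W = 2 um, L = 0.18 um, V_gs = 1.0 V, V_t = 0.45 V.
params = dict(mu_cox=300e-6, w=2e-6, vgs=1.0, vt=0.45)
l_nom = 0.18e-6
l_long = 1.10 * l_nom  # a +10% ACLV excursion

id_change = id_sat(l=l_long, **params) / id_sat(l=l_nom, **params) - 1.0
ron_change = r_on(l=l_long, **params) / r_on(l=l_nom, **params) - 1.0
print(f"I_d,sat change: {100 * id_change:.1f}%, R_on change: {100 * ron_change:.1f}%")
```

The percentage changes do not depend on the particular device values chosen, only on the relative length change.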

**Channel Width Variation** \((\Delta W_{\text{gate}})\)

Relative to channel length variation, channel width variation is a second-order effect. Narrow channel widths (i.e. minimum width devices) have edge effects that must be considered. However, since no devices in the signal path of the circuit in this design are of minimum width (or close), variation due to channel width variation can be safely ignored.

**Oxide Thickness and Channel Doping Variation** \((\Delta t_{ox} \text{ and } \Delta n_{channel})\)

Oxide thickness and channel doping variation are the primary causes of threshold voltage variation. Eqs. 2.2 and 2.3 both depend on \(V_t\). Because \(i_{d,\text{sat}}\) depends on the square of the gate overdrive \((v_{gs} - V_t)\), a small variation in the threshold voltage can cause a much larger relative variation in current, especially when the gate overdrive is small. Other effects such as mobile charge, hot carriers and body effect also affect the threshold voltage. Often the only way to compensate for \(V_t\) variation is to adjust the channel length, since short-channel devices are more susceptible to \(V_t\) variation. However, channel length directly affects device performance, which creates a tradeoff between performance and variability.
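The sensitivity to gate overdrive can be made concrete with a short Python sketch of the square law in Eq. 2.2. The sketch holds $v_{gs}$ and all other terms fixed while $V_t$ rises. The 20mV threshold shift and the three overdrive values are hypothetical.

```python
def current_ratio(overdrive_v: float, delta_vt_v: float) -> float:
    """I_d,sat(after)/I_d,sat(before) under the square law of Eq. 2.2 when the
    threshold voltage rises by delta_vt_v at fixed v_gs (other terms unchanged)."""
    return ((overdrive_v - delta_vt_v) / overdrive_v) ** 2


# Hypothetical +20 mV threshold shift at three gate overdrives.
for vov in (0.5, 0.2, 0.1):
    print(f"V_ov = {vov:.1f} V -> current change {100 * (current_ratio(vov, 0.02) - 1):.0f}%")
```

The same 20mV shift costs roughly 8% of the current at 0.5V overdrive but about 36% at 0.1V, which is why small-overdrive bias points are the most vulnerable.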

#### 2.2.2 Back-End-Of-Line Variation

Back-End-Of-Line (BEOL) variation refers to variation in the interconnection structure of integrated circuits. This structure is composed of layers (current processes have between six and eight layers) of metal (typically Al or Cu) lines. Between layers and adjacent lines an inter-layer dielectric (ILD) is also deposited. Figure 2-7 depicts this structure and the associated critical dimensions. The critical dimensions shown, line-spacing ($s$), ILD thickness ($h$), and metal thickness ($t$), affect circuit performance and signal integrity by producing variations in line resistance and capacitance, as is shown in the following subsections.

![Metal Interconnect Dimensions](image)

**Figure 2-7: Metal Interconnect Dimensions, taken from [3]**

**Line-Spacing Variation ($\Delta s$)**

The spacing between lines is a function of the width of the lines and the intended line spacing. Therefore, a variation in line width is also a variation in line spacing if two (or more) lines are near each other. As Eq. 2.4 [22] shows, signal delay depends on line resistance and capacitance. The resistance and capacitance of a line (Eqs. 2.5 and 2.6) both depend on the width of the line, so variability in line width and spacing causes delay variation. Note that $R_{sh}$ is the sheet resistance of the interconnect metal in $\Omega/\square$, and $A = w_{\text{line}} l_{\text{line}}$ in Eq. 2.6 is the area of the line. Eq. 2.6 also neglects fringing capacitance and coupling capacitance to nearby lines. Both are increasingly dominant capacitance contributors in modern processes due to decreasing line widths and spacings. As a result, capacitance variation will be even larger than that described by Eq. 2.6 alone.

\[ T_{\text{delay}} = 0.4R_{\text{int}}C_{\text{int}} + 0.7R_{\text{on}}C_{\text{int}} \] (2.4)

\[ R_{\text{int}} = R_{sh}\frac{l_{\text{line}}}{w_{\text{line}}} \] (2.5)

\[ C_{\text{int}} = \frac{A\varepsilon_{\text{SiO}_2}}{h} \] (2.6)
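Eqs. 2.4 through 2.6 can be combined in a short Python sketch to see how line-width variation propagates to delay. The sketch assumes $A = w_{\text{line}} l_{\text{line}}$ and keeps only the parallel-plate capacitance, exactly as in Eq. 2.6. The sheet resistance, line dimensions and driver resistance are hypothetical values.

```python
EPS_0 = 8.854e-12         # vacuum permittivity (F/m)
EPS_SIO2 = 3.9 * EPS_0    # permittivity of SiO2 (F/m)


def wire_delay(r_sh, l_line, w_line, h, r_on):
    """Delay (s) from Eqs. 2.4-2.6 using the parallel-plate model only
    (fringing and coupling capacitance neglected, as in Eq. 2.6)."""
    r_int = r_sh * l_line / w_line                   # Eq. 2.5
    c_int = (w_line * l_line) * EPS_SIO2 / h         # Eq. 2.6 with A = w_line * l_line
    return 0.4 * r_int * c_int + 0.7 * r_on * c_int  # Eq. 2.4


# Hypothetical 1 mm line: R_sh = 0.05 ohm/sq, h = 0.5 um, driver R_on = 1 kohm.
nominal = dict(r_sh=0.05, l_line=1e-3, h=0.5e-6, r_on=1e3)
for w in (0.45e-6, 0.5e-6, 0.55e-6):  # +/-10% around a 0.5 um line width
    print(f"w = {w * 1e6:.2f} um -> delay = {wire_delay(w_line=w, **nominal) * 1e12:.1f} ps")
```

In this simplified model $R_{\text{int}}C_{\text{int}}$ does not depend on width, because the resistance falls as fast as the capacitance rises. Width variation therefore acts only through the $R_{\text{on}}C_{\text{int}}$ term. With fringing and coupling capacitance included, the picture would change, as noted above.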

**ILD Thickness Variation ($\Delta h$)**

The Inter-layer Dielectric serves to insulate one metal layer from surrounding metal layers. The dielectric is usually a combined oxide and nitride film, which is produced by low-temperature deposition. While the deposition process is conformal and the films are highly uniform, pattern density causes polishing rates to vary in planarization processes [21]. The non-uniformity in polishing rates results in global variations in ILD thickness. From Eq. 2.6, line capacitance is most affected by this variation. Furthermore, although the dependency is not shown in Eq. 2.6, coupling capacitance between lines on different metal layers is greatly affected by ILD thickness.

**Metal Thickness Variation ($\Delta t$)**

Although Eq. 2.5 assumes $R_{sh}$ is a constant, in reality there is some variation in the sheet resistance. Eq. 2.7, where $\rho$ is the resistivity of the metal, illustrates the dependence of the sheet resistance on metal thickness. Metal thickness variability can be caused by a number of factors, including local pattern density, position on the wafer, plasma etch non-uniformity, wire width, and chemical-mechanical polishing (CMP) [21]. As such, the sheet resistance will vary somewhat, resulting in wire resistance variability.

\[ R_{sh} = \frac{\rho}{t} \] (2.7)

**Other Sources of Interconnect Variation**

The above factors are first-order factors in interconnect variability. In modern processes, there are additional factors that must be considered.

Liners are inserted to enclose metal conductors and protect them. However, during annealing of the interconnect structure, these liners react with the conductors and form alloys. The alloys have differing resistivities than the enclosed metal conductors. While this is not an issue by itself, the annealing process creates varying thickness in the liners. Thicker liners typically have more conductor metal content than do thinner liners and thus their resistivities are different.

Additionally, electromigration is becoming a large problem in modern designs. Electromigration is the gradual migration of metal atoms in interconnect metal, facilitated by high temperatures and unidirectional current flow. Both of these conditions are met in today’s designs. Electromigration causes the resistance of a wire to change from one location on the wire to another. At the extreme, electromigration causes voiding of the metal and metal failure.

2.2.3 Mismatch

Mismatch refers to time-independent random variations in the physical quantities of identically designed devices [23]. Many of the causes of mismatch are the same as the causes of variability in the parameters discussed above. However, mismatch is of particular concern for local variation. In this thesis, the differential circuitry is a prime victim of mismatch, so its causes and consequences must be studied and kept in mind throughout the design flow.

Looking again at the MOSFET in the saturation regime, Eq. 2.2, there are two major mismatch factors possible between two identical transistors: 1) differing dimensions ($W$ or $L$) or 2) differing threshold voltages ($V_t$). The threshold-voltage equation, Eq. 2.8, shows that the threshold voltage in turn has two components that can differ: a fixed part, $V_{t0}$, and a substrate-voltage-dependent part set by the body-effect coefficient $K$.

$$V_t = V_{t0} + K\left(\sqrt{|V_{SB}| + 2\phi_F} - \sqrt{2\phi_F}\right) \quad (2.8)$$

The dominant causes of mismatch in the fixed portion of the threshold voltage are fixed oxide charge, implantations and substrate doping. The major factor in the variation of $K$ is substrate doping. Likewise, mobility variation is a dominant cause of variation in the current factor, $\beta = \mu C_{ox} W/L$, in addition to the variability in dimensions.

Transistor mismatch has been shown to depend on device area. In [23] it is shown that the variance of each of the mismatch parameters discussed above can be predicted by Eq. 2.9. Here $P$ is the parameter of interest, $W$ and $L$ are the device width and length, and $A_p$ is an area proportionality constant for the parameter. $D_x$ is the distance between the devices, and $S_p$ is a constant relating the variation of $P$ to that spacing.

$$\sigma^2(\Delta P) = \frac{A_p^2}{WL} + S_p^2 D_x^2 \quad (2.9)$$

Usually the spacing term is omitted for devices very close to each other. Furthermore, it is very often the case that a designer is interested in matching two (or more) devices that are near to each other, as in differential pairs.

From Eq. 2.9 it is easy to see that increasing the area of identical devices reduces the variability of the mismatch parameters. With the spacing term omitted, the standard deviation scales as $1/\sqrt{WL}$, so quadrupling the area halves it. Other design and layout practices that minimize mismatch are discussed in Chapters 3 and 5.
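The area scaling in Eq. 2.9 can be illustrated with a small Python sketch. The area constant of 5mV·μm for threshold mismatch is a hypothetical value chosen for illustration, not a figure for the process used in this thesis. The spacing term defaults to zero, as is usual for adjacent devices.

```python
import math


def sigma_mismatch(a_p: float, w: float, l: float, s_p: float = 0.0, d_x: float = 0.0) -> float:
    """Standard deviation of the parameter mismatch Delta P from Eq. 2.9.
    Units must be consistent, e.g. a_p in mV*um with w and l in um."""
    return math.sqrt(a_p ** 2 / (w * l) + (s_p * d_x) ** 2)


A_VT = 5.0  # hypothetical area constant for V_t mismatch (mV*um)
small = sigma_mismatch(A_VT, w=2.0, l=0.18)
large = sigma_mismatch(A_VT, w=4.0, l=0.36)   # four times the gate area
print(f"sigma(dVt): {small:.2f} mV -> {large:.2f} mV with 4x area")
```

Halving the mismatch therefore costs a fourfold increase in gate area, along with the added capacitance that comes with it.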
2.3 Environmental Variation

The above section discussed variability due to the manufacturing process. There are additional sources of variation that are due to the environment. The three fundamental variation sources are power-supply variation, temperature variation and noise.

2.3.1 Power Supply Variation ($\Delta V_{DD}$)

Variability in the power supply arises in two forms. The first is resistive (IR) drop along the power grid. Because the wires forming the power grid have non-zero resistance, devices farthest from the power pads (often at the center of the die) experience a lower $V_{DD}$ than devices closer to the pads. The second source of power supply variation is instantaneous $V_{DD}$ or $GND$ bounce. In large digital ICs this is a major issue, as many (possibly thousands or more) signals switch from one rail to the other. This simultaneous switching activity draws large currents from $V_{DD}$ to $GND$. Often these transients are large enough to momentarily shift the level of either $V_{DD}$ or $GND$. This type of power supply variation is often grouped with noise, discussed below.

To see how $V_{DD}$ variation can affect circuit operation, consider an inverter whose output is connected to its input, shown in Figure 2-8. The output (or, equivalently, the input) node always sits at the switching threshold of the inverter. If the PMOS and NMOS devices are balanced, the switching threshold of the inverter is $V_{DD}/2$. As such, changes in $V_{DD}$ directly shift the switching threshold of the inverter.
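A rough sketch of this dependence follows, written in Python and assuming long-channel square-law devices with channel-length modulation ignored. Equating the saturated NMOS and PMOS currents gives $V_M = \frac{V_{tn} + r(V_{DD} - |V_{tp}|)}{1 + r}$ with $r = \sqrt{k_p/k_n}$. The threshold voltages and $k$ values used below are hypothetical.

```python
import math


def switching_threshold(vdd: float, vtn: float, vtp_abs: float, kn: float, kp: float) -> float:
    """Long-channel square-law switching threshold V_M of a CMOS inverter, found by
    equating the saturated NMOS and PMOS currents (channel-length modulation ignored)."""
    r = math.sqrt(kp / kn)
    return (vtn + r * (vdd - vtp_abs)) / (1.0 + r)


# Balanced devices: V_M = V_DD / 2, so a supply droop moves V_M by half as much.
for vdd in (1.8, 1.7):
    print(f"V_DD = {vdd:.1f} V -> V_M = {switching_threshold(vdd, 0.45, 0.45, 1e-3, 1e-3):.3f} V")
```

More generally, $V_M$ moves by $r/(1+r)$ of any change in $V_{DD}$, which reduces to one half for balanced devices.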

More generally, power supply variation increases or decreases the effective speed of a circuit. Higher supply voltages increase speed for a given gate length because of the increased current drive through each transistor. Conversely, lower supply voltages reduce speed.

Furthermore, power supply variation can strongly affect sensitive biasing circuitry, which must output very stable currents or voltages. In the simplest case, a diode-connected FET, like that shown in Figure 2-9, acts as a biasing element by mirroring a current. Since the drain and gate of the transistor are shorted together, the transistor behaves as a (nonlinear) resistor whose resistance is set by its sizing. In this simplest of cases, the entire supply voltage is dropped across a series resistor and the diode-connected FET. The generated current is therefore set by $V_{DD}$ and the sum of these series resistances. If $V_{DD}$ varies, so too will the current through the diode-connected FET, as well as the mirrored current. In light of such variation, this simple circuit is not very robust.
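The supply dependence can be sketched in Python, assuming the reference branch is a resistor $R$ in series with a diode-connected NMOS between $V_{DD}$ and ground. The sketch also assumes square-law behavior with $\lambda = 0$. Then $V_{DD} = IR + V_t + \sqrt{2I/k}$, which is a quadratic in $\sqrt{I}$. The values of $R$, $k$ and $V_t$ are hypothetical.

```python
import math


def reference_current(vdd: float, r: float, k: float, vt: float) -> float:
    """Current through a resistor r (ohm) in series with a diode-connected NMOS
    (square law, k = mu_n*C_ox*W/L, lambda = 0). Solves
    vdd = I*r + vt + sqrt(2*I/k) as a quadratic in x = sqrt(I)."""
    if vdd <= vt:
        return 0.0
    b = math.sqrt(2.0 / k)
    x = (-b + math.sqrt(b * b + 4.0 * r * (vdd - vt))) / (2.0 * r)
    return x * x


# Hypothetical values: R = 10 kohm, k = 1 mA/V^2, V_t = 0.45 V.
i_nom = reference_current(1.8, 10e3, 1e-3, 0.45)
i_low = reference_current(1.62, 10e3, 1e-3, 0.45)   # 10% supply droop
print(f"I_ref = {i_nom * 1e6:.1f} uA nominal, {i_low * 1e6:.1f} uA at -10% V_DD "
      f"({100 * (i_low / i_nom - 1):.1f}%)")
```

With these assumed values, a 10% supply droop reduces the reference current by more than 10%. The mirrored currents would follow the same change.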

2.3.2 Temperature Variation ($\Delta T$)

Temperature can vary due to 1) ambient temperature or 2) heat due to power dissipation. The latter is a more dominant effect in circuit performance as portions of a chip can locally heat due to increased activity factors. Heat can spread to surrounding circuitry if adequate heat removal is not present. In current designs, even with large, advanced heat removal techniques, local hot spots are a problem.

Figure 2-9: Simple current mirror to demonstrate effect of power supply variation on biasing elements

Two important MOSFET parameters are temperature dependent: threshold voltage ($V_t$) and mobility ($\mu$). Eq. 2.10 shows how $V_t$ depends on temperature. $C$ is a constant less than zero, meaning that an increase in temperature results in a decrease in $V_t$. Eq. 2.11 correspondingly shows the temperature dependence of mobility. Here again, an increase in temperature results in a decrease in mobility.

\[ V_t(T) = V_t(T_0)\left[1 + C(T - T_0)\right] \] (2.10)

\[ \mu(T) = \mu(T_0)\left(\frac{T}{T_0}\right)^k, \quad -3 < k < -1.2 \] (2.11)

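When these two equations are substituted into the saturation current equation, Eq. 2.2, the two effects oppose each other: the lower $V_t$ raises the current, while the lower mobility reduces it. Except at very small gate overdrives, the mobility reduction dominates, so an increase in temperature decreases the current in the saturation region. In an analog circuit, this shifts the DC operating point of the circuit.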
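The competition between the two temperature effects can be sketched in Python by substituting Eqs. 2.10 and 2.11 into Eq. 2.2 with $W/L$, $C_{ox}$ and $\lambda$ held fixed. The default values $V_t(T_0) = 0.45\text{V}$, $C = -2 \times 10^{-3}\,\text{K}^{-1}$ and $k = -1.5$ are hypothetical choices within plausible ranges, and the 380K hot-spot temperature is likewise illustrative.

```python
def vt_at(t: float, vt0: float, t0: float, c: float) -> float:
    """Threshold voltage at temperature t (K), Eq. 2.10 (c < 0)."""
    return vt0 * (1.0 + c * (t - t0))


def mobility_ratio(t: float, t0: float, k: float) -> float:
    """mu(T)/mu(T0) from Eq. 2.11 (k between -3 and -1.2)."""
    return (t / t0) ** k


def id_ratio(vgs: float, t: float, t0: float = 300.0, vt0: float = 0.45,
             c: float = -2e-3, k: float = -1.5) -> float:
    """I_d,sat(T)/I_d,sat(T0) from Eq. 2.2 with W/L, C_ox and lambda held fixed.
    The defaults for vt0, c and k are hypothetical values for illustration."""
    vt_t = vt_at(t, vt0, t0, c)
    return mobility_ratio(t, t0, k) * ((vgs - vt_t) / (vgs - vt0)) ** 2


# Heating a hot spot from 300 K to 380 K at a large and a small gate overdrive.
for vgs in (1.2, 0.5):
    print(f"V_gs = {vgs} V -> I_d(380 K)/I_d(300 K) = {id_ratio(vgs, 380.0):.2f}")
```

At the larger overdrive the current falls with temperature. At an overdrive of only 50mV, the falling threshold dominates and the current rises, illustrating why the bias point matters.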

2.3.3 Noise

Noise is typically defined as undesired random signals. Analog circuits are sensitive to noise because of the small signals present in the analog signal path; if the noise amplitude is too large, the actual signal will be obscured. When analog circuits are placed in the midst of digital circuits, they are more prone to noise. This noise comes from the high-frequency harmonics created by digital signals switching between the power and ground rails with very fast rise and fall times. These harmonics capacitively couple into the nearby analog circuitry, creating noise that interferes with the analog signals. In the clock receiver circuit designed in this thesis, this coupling of noise onto the analog signal results in jitter at the output.

2.4 Summary

This chapter has documented the numerous sources of variation that exist in the optical network, in the CMOS fabrication steps and in the operating environment. Many of the dependencies of circuit performance on these varying parameters were also detailed. The next chapter presents the circuit topology and design chosen to minimize the effects of the variation studied in this chapter.
Chapter 3

Circuit Design for Robustness

The design of an optical clock receiver circuit for on-chip optical clock distribution is presented in this chapter. Much of the design methodology focuses on robustness to the numerous variation sources presented in the last chapter. Section 3.1 presents the overall topology of the circuit. Subsequent sections will walk through the individual stages presented in the topology. Each section will also discuss the design decisions made to ensure robustness to variation.

3.1 Circuit Topology

The generalized architecture of most optoelectronic receiver circuits is shown in Figure 3-1. A photodiode directly converts the optical signal to an electrical signal in the form of a current, proportional to the intensity of the light shining on the photodiode. This electric current must then be converted to a voltage. A transimpedance amplifier is typically used to achieve this conversion. At the output of the transimpedance amplifier is a voltage signal of a few millivolts at most. In order to have a digital signal at the output, the signal must go through an amplification stage before being sent to the output stage, where final amplification occurs and the signal is railed to digital signal levels and buffered to drive the output load.
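The gain that the stages after the transimpedance amplifier must provide can be estimated with a short Python sketch. The photocurrent, transimpedance and 1.8V output swing below are hypothetical values, not the design values of this receiver. The sketch simply applies $V = I \cdot R_T$ and then divides the desired swing by the TIA output amplitude.

```python
import math


def post_tia_gain(i_photo_a: float, r_t_ohm: float, v_swing_v: float) -> tuple[float, float, float]:
    """Return (TIA output amplitude in V, required downstream voltage gain, gain in dB)."""
    v_tia = i_photo_a * r_t_ohm       # transimpedance conversion: V = I * R_T
    gain = v_swing_v / v_tia          # gain the amplifier and output stages must supply
    return v_tia, gain, 20.0 * math.log10(gain)


# Hypothetical: 20 uA photocurrent, 100 ohm transimpedance, 1.8 V digital swing.
v_tia, gain, gain_db = post_tia_gain(20e-6, 100.0, 1.8)
print(f"TIA output {v_tia * 1e3:.1f} mV -> downstream gain {gain:.0f} ({gain_db:.1f} dB)")
```

A millivolt-level TIA output thus implies a downstream gain in the high hundreds, which is why the amplification and output stages are split into separate blocks.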

The circuit in this chapter uses a fully-differential architecture. Among the most significant advantages of a differential architecture are Power-Supply Rejection (PSR) and Common-Mode Rejection (CMR). These allow for higher bandwidths and thus faster circuit operation.

The following sections present the circuit details of each stage as well as necessary biasing circuits.

### 3.2 Photodetector

A photodetector is a device that converts from the optical domain to the electrical domain. In most applications, and this thesis, a photodiode is used to achieve this conversion. However, other exotic devices such as phototransistors exist and are used in specialized applications. Since this receiver circuit is fabricated in a silicon CMOS process, silicon photodiodes could be used, simplifying integration issues. However, there are disadvantages associated with silicon photodiodes. In order to understand these disadvantages, we must look at the fundamental processes by which semiconductors absorb photons.

Photon energy is transferred to an electron only if the electron is in the valence band of a semiconductor. If a photon possesses an energy greater than the bandgap energy, $E_g$, of the material, then the photon can be absorbed by an electron, which moves to the conduction band. In this process, an electron-hole pair is generated. The energy of a photon is related to its wavelength by $E = hc_0/\lambda$, where $h$ is Planck's constant and $c_0$ is the speed of light in vacuum. As such, the longest (cutoff) wavelength of light that a semiconductor can absorb is given by Eq. 3.1. Furthermore, each photodetector has an absorption coefficient determined by the semiconductor used to fabricate it. This coefficient determines the penetration depth of light in the semiconductor and depends strongly on the wavelength of the light [6].

\[
\lambda_c = \frac{hc_0}{E_g} \quad (3.1)
\]
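As a quick numerical check of Eq. 3.1, the sketch below evaluates the cutoff wavelength for a few materials. The bandgap values are typical room-temperature textbook figures assumed for illustration; they are not taken from this thesis.

```python
# Cutoff wavelength from Eq. 3.1: lambda_c = h * c0 / E_g.
H = 6.62607015e-34   # Planck's constant [J*s]
C0 = 2.99792458e8    # speed of light in vacuum [m/s]
Q = 1.602176634e-19  # elementary charge [C]; converts eV to J


def cutoff_wavelength_um(eg_ev: float) -> float:
    """Longest wavelength (in micrometres) absorbed by a material with bandgap eg_ev."""
    return H * C0 / (eg_ev * Q) * 1e6


# Typical room-temperature bandgaps (assumed textbook values, not from this thesis).
for name, eg in [("Si", 1.12), ("GaAs", 1.42), ("InP", 1.34)]:
    print(f"{name}: E_g = {eg:.2f} eV -> lambda_c = {cutoff_wavelength_um(eg):.3f} um")
```

With these assumed bandgaps, Si absorbs out to roughly 1.107μm, GaAs to roughly 0.873μm and InP to roughly 0.925μm.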

The dependence of absorption on wavelength means that absorption has an onset (the absorption edge). The shape of this onset depends on the kind of transition needed to move an electron from the valence band to the conduction band. A direct band-to-band transition requires no phonon to conserve momentum when the electron moves from the valence band to the conduction band. An indirect band-to-band transition requires a phonon to enable the transition from the valence band to the conduction band. Because of this difference, the onset of absorption is much steeper for direct bandgap materials than for indirect bandgap materials. The key point is that direct bandgap materials absorb photons more readily and generate larger photocurrents than indirect bandgap materials.

Silicon is an indirect bandgap material. Direct bandgap materials include Gallium Arsenide (GaAs), Indium Phosphide (InP), and Indium-Gallium-Arsenide (InGaAs). Germanium (Ge) is strictly an indirect bandgap material, but its direct transition lies only slightly above its indirect gap, so it also absorbs strongly in the near infrared. As a result, the absorption coefficient of Si is one to two orders of magnitude lower than that of these materials. To compensate for this, a Si photodiode needs a much thicker absorption zone.

In addition, long minority carrier lifetimes degrade the bandwidth of Si photodiodes. GaAs and InP photodetectors integrated on Si are therefore more attractive for very fast photodetector circuits. Since the circuit in this thesis will operate in the GHz range, InP photodiodes are chosen to achieve these speeds. Because InP photodiodes cannot be fabricated directly on a Si wafer, they are fabricated on a separate handle wafer and then flip-chip bonded to the Si die. This integration process is shown in Figure 2-5 and discussed in Chapter 2, Section 2.1.3.

As discussed above, an optical signal is converted to an electrical signal by transferring photon energy to electrons. This conversion process results in a generated "photocurrent." However, the amplitude of this photocurrent is very small, often in the microampere ($\mu A$) range, depending on the optical power incident on the device. This small current must be converted to a voltage before it can be amplified. The following section describes how this is achieved.
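To see why the photocurrent lands in the microampere range, the sketch below uses the standard photodiode responsivity relation $R = \eta q \lambda / (h c_0)$. The optical power (10μW), wavelength (850nm) and quantum efficiency (80%) are hypothetical values chosen for illustration; none of them is specified in this chapter.

```python
H = 6.62607015e-34   # Planck's constant [J*s]
C0 = 2.99792458e8    # speed of light in vacuum [m/s]
Q = 1.602176634e-19  # elementary charge [C]


def responsivity_a_per_w(wavelength_um: float, quantum_efficiency: float) -> float:
    """Photodiode responsivity R = eta * q * lambda / (h * c0), in A/W."""
    return quantum_efficiency * Q * (wavelength_um * 1e-6) / (H * C0)


def photocurrent_a(optical_power_w: float, wavelength_um: float, quantum_efficiency: float) -> float:
    """Photocurrent for a given incident optical power."""
    return responsivity_a_per_w(wavelength_um, quantum_efficiency) * optical_power_w


# Hypothetical operating point: 10 uW at 850 nm with 80% quantum efficiency.
i_ph = photocurrent_a(10e-6, 0.85, 0.8)
print(f"R = {responsivity_a_per_w(0.85, 0.8):.3f} A/W, I_ph = {i_ph * 1e6:.2f} uA")
```

Under these assumptions the responsivity is about 0.55A/W and the photocurrent about 5.5μA, consistent with the "few μA" figure used later in this chapter.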

### 3.3 Transimpedance Stage

A transimpedance stage is used to convert a current to a voltage. The simplest form of a transimpedance element is a resistor. The relationship between current and voltage in a resistor is shown in Eq. 3.2, otherwise known as Ohm’s Law. The voltage across the terminals of a resistor is dependent on the resistance ($R$) and the current ($I$) flowing through the resistor.

$$V = IR \quad (3.2)$$

### 3.3.1 Typical Transimpedance Stages

Resistors by themselves are not typically used to achieve transimpedance due to the significant loading that would occur. Rather, an operational amplifier with resistive feedback (shown in Figure 3-2) is typically used.

However, this requires an operational amplifier with very high gain at the signal frequency. When the gain is high, the transimpedance gain approaches $R_f$, the value of the feedback resistor. Because the circuit will operate in the GHz range, the necessary high gain is hard to achieve: in the digital 0.18μm process, the maximum gain achievable at these frequencies is much less than 100 without using multiple gain stages. Low gain degrades the transimpedance relationship, as seen in Eq. 3.3. With a large enough feedback resistor, $R_f$, it may still be possible to achieve the necessary transimpedance gain. However, the larger the resistor used in this configuration, the lower the achievable bandwidth. This occurs because one terminal of the resistor is connected to the photodiode, which has a significant capacitance (approximately 160fF). Together, the large resistor and the photodiode capacitance form a low-pass filter that limits the bandwidth at the input. To use a large resistor for transimpedance, it must be decoupled from the input node. Ideally, a very low-impedance node should be connected to the photodiode to maximize the bandwidth at the input node. The following subsections present two circuits that are capable of high transimpedance gain at signal frequencies, the latter of which is used in this design.

\[
\frac{V_{out}}{I_{in}} = \frac{A \times R_f}{1 + A} \quad (3.3)
\]
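The two limitations described above can be made concrete with a short sketch: Eq. 3.3 shows how much transimpedance is lost for a modest amplifier gain $A$, and a simple RC corner shows why a large resistor cannot sit directly on the 160fF photodiode node. The amplifier is treated as an ideal voltage amplifier, and the 1kΩ and 10kΩ resistor values are hypothetical.

```python
import math


def feedback_tia_gain(a: float, r_f: float) -> float:
    """Magnitude of Eq. 3.3: V_out / I_in = A * R_f / (1 + A)."""
    return a * r_f / (1.0 + a)


def rc_pole_hz(r: float, c: float) -> float:
    """Corner frequency of a single-pole RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r * c)


C_PD = 160e-15  # photodiode capacitance quoted in the text [F]
R_F = 1e3       # hypothetical feedback resistor [ohm]

for a in (5, 10, 100):
    print(f"A = {a:>3}: Z_T = {feedback_tia_gain(a, R_F):6.1f} ohm (ideal {R_F:.0f} ohm)")

# A bare resistor tied directly to the photodiode node forms an RC pole.
for r in (1e3, 10e3):
    print(f"R = {r / 1e3:4.0f} kohm with 160 fF: pole at {rc_pole_hz(r, C_PD) / 1e9:.3f} GHz")
```

Even a gain of 10 recovers about 91% of $R_f$, but a 10kΩ resistor loaded by 160fF already has its corner near 100MHz, well below the GHz operating range.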

### 3.3.2 Previous Transimpedance Stage

Lum uses the transimpedance stage shown in Figure 3-3 [4]. The diode-connected transistor attached to the photodiode acts as a resistor and serves as the transimpedance element. By mirroring the current to a cascode amplification stage, the transimpedance gain is $\frac{N}{g_m}$, where $N$ is the current-mirror ratio. However, the input impedance is the inverse of the transconductance of the diode-connected MOSFET ($\frac{1}{g_m}$). Achieving both a high transimpedance gain and a low input impedance imposes two design constraints: (1) the diode-connected FET must be sized large to achieve a low input impedance, as transconductance is directly proportional to the width of a device; (2) the current-mirror ratio must be large to achieve a high transimpedance gain, or the diode-connected FET must be small to achieve low transconductance and thus high gain. If the diode-connected FET must be large for low input impedance, the mirror device at the bottom of the cascode stage must be even larger. These opposing constraints limit both the input impedance and the transimpedance gain. Furthermore, increasing device size scales up the capacitance seen at the input node, which is already significant due to two gate capacitances and two gate-to-drain capacitances. All of these capacitances add to the capacitance of the photodiode.

### 3.3.3 Low Input-Impedance, High Gain Transimpedance Stage

To achieve low input impedance and high transimpedance gain, the two parameters must be decoupled from each other. One way of doing this is to use a real resistor as the transimpedance element: the photocurrent is fed into the source of a transistor, which shunts the current into a resistor. This circuit is shown in Figure 3-4. With this transimpedance stage, the input impedance is the impedance looking into the source of the FET. Since the circuit closely resembles a common-gate amplifier, the input impedance, $R_{in}$, is still $\frac{1}{g_m}$. However, since the photocurrent is now shunted into the resistor, the transimpedance gain is simply the value of the resistor, $R$. This allows better control over both parameters: the input impedance is controlled by the sizing of the FET, and the transimpedance gain is controlled by the size of the resistor at the drain of the FET. Additionally, the capacitance at the input node is now only that of the photodiode, a single gate-to-drain capacitance, and a single gate capacitance.

While better control over the two parameters has been achieved, there has been no change in the input impedance. Lower input impedance can be achieved by adding feedback to the circuit above. By adding a feedback loop as pictured in Figure 3-5, the input impedance can be lowered by a factor of $g_m R_{out}$.

The easiest way to see this is to break the feedback loop at the source of transistor Mn2p, inject a test current at that point, and find the resulting voltage; the input impedance is the ratio of the two. This is shown in Figure 3-6. No AC current is assumed to flow toward the current source at node A, since an ideal current source presents an infinite impedance (it is an AC open circuit). The AC current therefore flows through transistor Mn2p and produces a voltage at the gate of Mn2p. The transconductance of the transistor relates the current to this voltage, which is given by Eq. 3.4.

$$v_a = \frac{i_{in}}{g_m} \tag{3.4}$$

Figure 3-5: Improved transimpedance stage with lower input impedance

Figure 3-6: Breaking the feedback loop to calculate $R_{in}$

There is also a gain through transistor Mn1p, which appears as a common-source stage. The gain from gate to drain of a common-source stage is typically $g_m R_{out}$. Here, however, we work backward from drain to gate: sustaining the voltage $v_a$ at its drain requires only $\frac{v_a}{g_m R_{out}}$ at its gate, where $R_{out}$ is the output resistance of Mn1p in parallel with the output resistance of the current source. The resulting voltage at node $b$ is given by Eq. 3.5. The input impedance is then $\frac{v_b}{i_{in}}$; Eq. 3.6 shows the reduced input impedance of the transimpedance stage with feedback.

\[
v_b = \frac{i_{in}}{g_{m1}(g_{m2}R_{out})} \tag{3.5}
\]

\[
R_{in} \approx \frac{v_b}{i_{in}} \approx \frac{i_{in}}{i_{in}\,(g_{m1}(g_{m2}R_{out}))} \approx \frac{1}{g_{m1}(g_{m2}R_{out})} \tag{3.6}
\]
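To put a number on the improvement promised by Eq. 3.6, the sketch below compares the plain common-gate input impedance, $1/g_m$, with the feedback version, and shows the corresponding input pole against the 160fF photodiode capacitance. The transconductances and $R_{out}$ are hypothetical values chosen only to show the scaling; like Eq. 3.6 itself, the sketch ignores other parasitics, so the second pole figure is meaningful only as a comparison.

```python
import math


def rin_common_gate(gm2: float) -> float:
    """Input impedance looking into the source of Mn2p without feedback: 1/g_m."""
    return 1.0 / gm2


def rin_regulated(gm1: float, gm2: float, r_out: float) -> float:
    """Approximate input impedance with feedback, Eq. 3.6: 1 / (g_m1 * g_m2 * R_out)."""
    return 1.0 / (gm1 * gm2 * r_out)


def input_pole_hz(r_in: float, c_in: float) -> float:
    """Pole formed by the input impedance and the input-node capacitance."""
    return 1.0 / (2.0 * math.pi * r_in * c_in)


GM1 = GM2 = 5e-3  # hypothetical transconductances [S]
R_OUT = 10e3      # hypothetical output resistance at the drain of Mn1p [ohm]
C_IN = 160e-15    # photodiode capacitance from the text [F]

r0 = rin_common_gate(GM2)
r1 = rin_regulated(GM1, GM2, R_OUT)
print(f"no feedback:   R_in = {r0:6.1f} ohm, input pole {input_pole_hz(r0, C_IN) / 1e9:6.1f} GHz")
print(f"with feedback: R_in = {r1:6.1f} ohm, input pole {input_pole_hz(r1, C_IN) / 1e9:6.1f} GHz")
```

With these values the feedback lowers the input impedance from 200Ω to 4Ω, the factor $g_m R_{out} = 50$ described above.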

This is only an approximation, as it neglects many high-frequency parameters as well as the back-gate (body) effect, \( g_{mb} \). Since the circuit is fully-differential, a replica of the circuit is added for the negative signal, as shown in Figure 3-7. The currents in all legs are intentionally kept large (500\( \mu \)A) to minimize the effects of noise. The resistors are 1k\( \Omega \), so the transimpedance gain is also 1k\( \Omega \). This sets the common-mode voltage of the signal at 1.3V (the 1.8V supply minus the 0.5V drop of 500\( \mu \)A across 1k\( \Omega \)), enough to ensure that the inputs of subsequent gain stages are sufficiently driven. Additionally, a relatively high common-mode voltage allows a large \( V_{ds} \) for the input transistors of the gain stages, ensuring that those transistors are well into the saturation region and that \( V_{ds} \) modulation does not significantly affect circuit operation. These subsequent gain stages are the subject of the next section.

### 3.4 Amplification Stage

With a transimpedance gain of 1k\( \Omega \), a photocurrent of a few \( \mu \)A will be converted into a voltage with a peak-to-peak amplitude of a few millivolts. To clock digital logic, this signal must be amplified to a rail-to-rail digital signal, requiring a gain on the order of 200\( V/V \). Achieving this gain in a single stage in a 0.18\( \mu m \) process at high bandwidth (> 1GHz) is not feasible. Instead, multiple gain stages, each with substantially lower individual gain, are used to achieve this amplification. Once the signal reaches \( \sim 150mV \) to 200\( mV \), it can be fed into output-stage inverters which rail the signal to digital logic levels. As such, the necessary analog amplification is reduced to \( \sim 20V/V \). Even this is hard to achieve in a single stage with a bandwidth requirement of \( >1GHz \), so multiple gain stages must be used.

### 3.4.1 A Passively-Loaded Differential Pair Amplifier

There are many amplifier topologies that one can use to achieve high gain. However, it is hard to achieve both high gain and high bandwidth. In this thesis, a low-gain, high-bandwidth amplifier is designed with the intention of using multiple amplifier stages. It is necessary to achieve high bandwidths so that the circuit is not operating too far past its 3dB point, as both gain and phase margin are severely degraded beyond this point.

Figure 3-8 shows a resistively loaded differential pair amplifier. The input differential signal drives transistors Mn and Mp. The signal is amplified through these common-source transistors with resistors Rlp and Rln being the loads. While these resistors are in parallel with the output resistance, \( R_{ds} \), of transistors Mn and Mp, Rlp and Rln are typically much smaller than \( R_{ds} \) and when put in parallel, dominate the resistance. The differential gain of the amplifier is given by Eq. 3.7.

\[
A_v = g_m R_l \tag{3.7}
\]
Additionally, two important parameters of differential amplifiers must be analyzed: (1) Common-Mode Rejection Ratio (CMRR); and (2) Power-Supply Rejection Ratio (PSRR). Good performance with respect to these two ratios is a fundamental reason a fully-differential approach was chosen.

The differential pair rejects common-mode input signals [16]. If the circuit is perfectly symmetrical, the common-mode gain of the amplifier is zero. In practical circuits, however, this is never the case, even when the circuit is designed and laid out as carefully as possible. Any slight mismatch in the transistor parameters or in the resistors at the drain of each transistor produces a small, finite common-mode gain. However, if the mismatch is relatively small, the CMRR will be much larger than in the case of a single-ended output. Noise will also likely affect both signals equally; such noise looks like a common-mode signal and is rejected by the differential pair. The PSRR is typically defined for an amplifier with a single-ended output, but it can also be defined for a differential output when mismatch and nonidealities are present. Much like the CMRR, the PSRR will be relatively high if the mismatch is small. Analyses of mismatch and noise are presented in Sections 4.2.6 and 4.4, respectively.
In this design, bandwidth is a critical design parameter, so resistors are used as loads instead of PMOS devices. Due to their lower mobility (compared to NMOS devices) and higher gate-drain capacitances, $C_{gd}$, PMOS devices limit the overall speed of an amplifier. While much higher gain could have been achieved with an actively-loaded differential pair, bandwidth would have been sacrificed. Furthermore, the resistors are small, so area is not a major concern. There is, however, a concern that resistor variation may be worse than PMOS device variation. The resistors chosen in this design are unsilicided P+ polysilicon resistors, which have a sheet resistance of $311 \Omega/\square$ and a $4\sigma$ variation of 27%. Although this variation appears large, matching between two identical, closely spaced resistors is much better. In Chapter 4, resistor variation is analyzed to determine its effects on circuit operation and performance.

Using the design optimization method discussed below, the designed amplifiers have a gain of 2-2.5 with a 3dB bandwidth of ~10GHz with 260μA of current being drawn through the tail current source. The nominal output common-mode voltage is 1.3V; Table 3.1 lists additional amplifier parameters. As will be seen in Subsection 3.4.2, multiple stages of these amplifiers will need to be used to achieve the appropriate amount of gain.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>$R_{lp}$, $R_{ln}$</td>
<td>3.84kΩ</td>
</tr>
<tr>
<td>Input transistor width (W)</td>
<td>3μm</td>
</tr>
<tr>
<td>Input transistor length (L)</td>
<td>0.18μm</td>
</tr>
<tr>
<td>Simulated gain ($A_V$)</td>
<td>~2 – 2.5</td>
</tr>
<tr>
<td>Simulated 3dB bandwidth</td>
<td>9.45GHz</td>
</tr>
<tr>
<td>Current through tail current source</td>
<td>260μA</td>
</tr>
<tr>
<td>Nominal output common-mode voltage ($V_{out,cm}$)</td>
<td>1.3V</td>
</tr>
</tbody>
</table>

Table 3.1: Designed amplifier parameters
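The entries of Table 3.1 can be cross-checked against Eq. 3.7 with a simple DC calculation. The sketch assumes the tail current splits equally between the two legs at balance and uses the 1.8V supply of this process; the transconductance values it prints are only what Eq. 3.7 implies for the quoted gain range, not simulated values.

```python
VDD = 1.8        # supply voltage [V]
R_L = 3.84e3     # load resistor from Table 3.1 [ohm]
I_TAIL = 260e-6  # tail current from Table 3.1 [A]

i_leg = I_TAIL / 2              # balanced pair: half the tail current in each leg
v_out_cm = VDD - i_leg * R_L    # DC drop across each load resistor
print(f"V_out,cm = {v_out_cm:.2f} V")  # about 1.30 V, in line with Table 3.1

# Eq. 3.7 rearranged: transconductance implied by a given gain.
for a_v in (2.0, 2.5):
    print(f"A_v = {a_v}: g_m = {a_v / R_L * 1e3:.3f} mS")
```

The 130μA per leg across 3.84kΩ drops about 0.5V, which reproduces the 1.3V output common-mode voltage in the table.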

#### Design Optimization

The design of the amplifier can be optimized using simulated $g_m$-curves of transistors [24]. The transconductance of a device is given by Eq. 3.8, and Eqs. 3.7 and 3.9 define the differential gain and swing of the amplifier. Dividing each of these three quantities by the width of the device leaves everything in terms of a current density, $I_{density} = I_D/W$, and independent of width. This is shown in Eqs. 3.10-3.12.

$$g_m = \mu_n C_{ox} \left( \frac{W}{L} \right) (V_{gs} - V_{tn}) = \sqrt{2\mu_n C_{ox}} \sqrt{\frac{W}{L}} \sqrt{I_D} \tag{3.8}$$

$$V_{swing} = I_{bias} R_l \tag{3.9}$$

$$g_m(I_{density}) \equiv \frac{g_m}{W} = \sqrt{2\mu_n C_{ox}} \sqrt{\frac{1}{L}} \sqrt{I_{density}} \tag{3.10}$$

$$A_v(I_{density}) \equiv \frac{A_v}{W} = g_m(I_{density}) \, R_l \tag{3.11}$$

$$V_{swing}(I_{density}) \equiv \frac{V_{swing}}{W} = I_{density} R_l \tag{3.12}$$

Eqs. 3.11 and 3.12 share a common term, $R_l$. Solving each equation for $R_l$ and setting the results equal to each other (Eq. 3.13), $g_m(I_{density})$ can be expressed in terms of the desired amplifier parameters, $A_v$ and $V_{swing}$, as shown in Eq. 3.14.

$$\frac{V_{swing}}{I_{density}} = \frac{A_v}{g_m(I_{density})} \tag{3.13}$$

$$g_m(I_{density}) = \frac{A_v \cdot I_{density}}{V_{swing}} \tag{3.14}$$

By simulating the $g_m$-curve of a single transistor in HSpice, the data can be brought into Matlab and processed to find the optimal design parameters. The designer can choose \( A_v \) and \( V_{swing} \) to meet the requirements of the circuit. The \( g_m \)-curve imported from HSpice is divided by the width of the simulated device and plotted versus \( I_{density} \). The right-hand side of Eq. 3.14, \( A_v I_{density}/V_{swing} \), is plotted versus \( I_{density} \) on the same axes. By Eq. 3.14, the intersection of the two curves gives the optimal current density, which satisfies the design parameters \( A_v \) and \( V_{swing} \). Figure 3-9 shows an example of these two curves plotted together. The widths of the input devices can then be picked such that the resulting current is sufficient to drive the load of the next stage, or any other parasitic load, while maintaining sufficient bandwidth headroom. Knowing the width and current, the size of the resistor follows directly from Eq. 3.9. The size of the tail FET (not shown) that acts as the current source is likewise easy to derive.
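The graphical procedure above can be sketched in a few lines. Since the HSpice data are not available here, a square-law curve from Eq. 3.10 with an assumed $\mu_n C_{ox}$ stands in for the simulated $g_m$-curve; the design targets and bias current are also hypothetical. The search itself works on tabulated points, the way imported simulator data would be handled.

```python
import math

# Synthetic g_m/W data standing in for an HSpice sweep. Square-law model with an
# assumed process constant; real short-channel curves deviate from this.
MU_N_COX = 300e-6  # hypothetical mu_n * C_ox [A/V^2]
L = 0.18e-6        # channel length [m]


def gm_per_width(j: float) -> float:
    """Eq. 3.10: g_m / W [S/m] versus current density j = I_D / W [A/m]."""
    return math.sqrt(2.0 * MU_N_COX / L) * math.sqrt(j)


def optimal_current_density(j_grid, gm_grid, a_v, v_swing):
    """Current density where the g_m/W curve meets A_v * j / V_swing (Eq. 3.14).

    Scans tabulated data for the first sign change and interpolates linearly.
    """
    diff = [gm - a_v * j / v_swing for j, gm in zip(j_grid, gm_grid)]
    for k in range(len(diff) - 1):
        if diff[k] >= 0.0 > diff[k + 1]:
            t = diff[k] / (diff[k] - diff[k + 1])
            return j_grid[k] + t * (j_grid[k + 1] - j_grid[k])
    raise ValueError("curves do not cross inside the sampled range")


A_V, V_SWING = 2.5, 0.4                     # hypothetical design targets
j_grid = [0.5 * k for k in range(1, 2001)]  # 0.5 ... 1000 A/m (same as uA/um)
gm_grid = [gm_per_width(j) for j in j_grid]
j_opt = optimal_current_density(j_grid, gm_grid, A_V, V_SWING)

I_BIAS = 130e-6                   # hypothetical per-leg bias current [A]
width_um = I_BIAS / j_opt * 1e6   # W = I_bias / j
r_load = V_SWING / I_BIAS         # Eq. 3.9 solved for R_l
print(f"j_opt = {j_opt:.1f} uA/um, W = {width_um:.2f} um, R_l = {r_load:.0f} ohm")
```

For a pure square-law curve the crossing also has a closed form, $I_{density} = (2\mu_n C_{ox}/L)(V_{swing}/A_v)^2$, which the numerical search reproduces; with real simulated curves only the numerical search applies.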

![Amplifier Optimization](image)

**Figure 3-9:** Amplifier optimization using simulated \( g_m \)-curves

### 3.4.2 Multiple Gain Stages

As seen in the previous section, each of the passively loaded amplifiers has a gain of about 2-2.5 V/V. To amplify the signal enough to send it to the output stage, four to five gain stages must be used. This results in a total gain that is more than strictly necessary. However, extra gain is useful because parasitic elements degrade bandwidth: loading caused by subsequent stages and interconnect reduces the $3dB$ bandwidth of each amplifier to $\sim$3GHz, which also lowers the gain at the operating frequency. The "extra" gain stages are therefore necessary.
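The stage count follows from requiring the product of the per-stage gains to reach the $\sim 20V/V$ of analog amplification identified in Section 3.4. A minimal sketch, ignoring loading:

```python
import math


def stages_needed(total_gain: float, stage_gain: float) -> int:
    """Smallest n with stage_gain ** n >= total_gain."""
    return math.ceil(math.log(total_gain) / math.log(stage_gain))


# ~20 V/V takes a few-mV signal to the 150-200 mV the output inverters need.
for g in (2.0, 2.5):
    n = stages_needed(20.0, g)
    print(f"stage gain {g}: {n} stages -> {g ** n:.0f} V/V unloaded")
```

A gain of 2.5 per stage needs four stages and a gain of 2 needs five, matching the four to five stages quoted above; in both cases the unloaded product exceeds 20V/V, leaving the margin that loading later consumes.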

The total gain of the amplification stage without any loading is nominally 186 V/V (45dB), with a $3dB$ bandwidth of 1.16GHz. This is seen in Figure 3-10. If the circuit is running in the GHz range, it will most likely be running beyond the $3dB$ point, where gain is reduced. For example, the gain at 2GHz is 81.9 V/V. The need for the "extra" gain stages discussed above is now obvious: with the output stage loading the amplification stage, the gain/bandwidth will be further degraded. Nevertheless, the five gain stages used are sufficient for amplification, even when simulated at the SS (Slow PMOS, Slow NMOS) process corner as will be seen in Chapter 4.
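The drop from a per-stage bandwidth of about 3GHz to an overall bandwidth near 1.16GHz is what one expects from cascading stages. The sketch below models the amplification stage as five identical, non-interacting single-pole stages; this idealization is an assumption, not the simulated circuit.

```python
import math


def cascade_bandwidth(f_stage: float, n: int) -> float:
    """-3 dB bandwidth of n identical, non-interacting single-pole stages."""
    return f_stage * math.sqrt(2.0 ** (1.0 / n) - 1.0)


def cascade_gain(a0_total: float, f_stage: float, n: int, f: float) -> float:
    """Gain magnitude of the same cascade at frequency f."""
    return a0_total / (1.0 + (f / f_stage) ** 2) ** (n / 2)


F_STAGE = 3e9  # loaded per-stage bandwidth quoted in Section 3.4.2 [Hz]
print(f"5-stage bandwidth: {cascade_bandwidth(F_STAGE, 5) / 1e9:.2f} GHz")
print(f"gain at 2 GHz: {cascade_gain(186.0, F_STAGE, 5, 2e9):.1f} V/V")
```

The model gives about 1.16GHz, close to the simulated bandwidth. Its gain estimate at 2GHz (about 74V/V) is somewhat below the simulated 81.9V/V, which is unsurprising since the real stages are neither identical nor purely single-pole.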

Figure 3-10: Total gain of amplification stage (in dB)

#### Replica Feedback Biasing

Since the signal is passed into output-stage inverters before it reaches full swing, it must be properly biased at the switching threshold of the inverter. Up to this point, the exact common-mode voltage mattered little (within a range) because of the common-mode rejection of differential amplifiers. However, to use an inverter as an amplifier, the signal must be properly biased: an improperly biased signal will be degraded, or even lost completely, at the output of an inverter. Furthermore, the DC value of the signal relative to the switching threshold of the inverter determines the duty cycle of the output signal. If the signal is biased precisely at the inverter switching threshold, the output signal will ideally have a 50% duty cycle; any deviation in the bias point results in an unequal duty cycle.
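The link between bias error and duty cycle can be quantified by assuming the single-ended signal at the inverter input is approximately sinusoidal. The sketch below computes the fraction of each period the input spends above the threshold; the 100mV peak amplitude is a hypothetical value. The inverter output is high for the complementary fraction, so its deviation from 50% has the same size.

```python
import math


def duty_cycle(offset_v: float, amplitude_v: float) -> float:
    """Fraction of each period a sinusoid sits above the inverter threshold.

    offset_v is the DC bias minus the switching threshold; amplitude_v is the
    single-ended peak amplitude. math.asin raises ValueError when
    |offset_v| > amplitude_v, i.e. when the input never crosses the threshold.
    """
    return 0.5 + math.asin(offset_v / amplitude_v) / math.pi


for offset_mv in (0, 10, 25, 50):
    d = duty_cycle(offset_mv * 1e-3, 0.1)
    print(f"offset {offset_mv:>2} mV on a 100 mV peak swing: duty cycle {d:.1%}")
```

A 10mV bias error already moves the duty cycle to about 53%, and an error larger than the signal amplitude means the output never switches at all.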

Typically, a correctly designed inverter will have a nominal switching threshold of $\frac{V_{DD}}{2}$. However, mismatch, process variation and environmental variation will cause the actual threshold to vary about $\frac{V_{DD}}{2}$. In order to ensure that the signal is properly biased regardless of mismatch and variation, a replica feedback biasing scheme is used in conjunction with the last gain stage, as shown in Figure 3-11.
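How far the threshold moves can be estimated with the long-channel (square-law) inverter model, which equates the NMOS and PMOS saturation currents at the switching point. The sketch ignores channel-length modulation and velocity saturation, uses the ≈0.45V threshold quoted for this process in Section 3.7.1, and normalizes device strengths; the variation cases are hypothetical.

```python
import math


def switching_threshold(vdd, vtn, vtp_abs, kn, kp):
    """Long-channel inverter threshold V_M from k_n (V_M - V_tn)^2 = k_p (V_DD - V_M - |V_tp|)^2."""
    r = math.sqrt(kp / kn)
    return (vtn + r * (vdd - vtp_abs)) / (1.0 + r)


VDD = 1.8
nominal = switching_threshold(VDD, 0.45, 0.45, 1.0, 1.0)  # matched devices
shifted = switching_threshold(VDD, 0.50, 0.45, 1.0, 1.0)  # NMOS V_t up by 50 mV
weak_p = switching_threshold(VDD, 0.45, 0.45, 1.0, 0.8)   # PMOS 20% weaker
print(f"{nominal:.3f} V, {shifted:.3f} V, {weak_p:.3f} V")
```

Matched devices give exactly $V_{DD}/2$, while a 50mV NMOS threshold shift or a 20% weaker PMOS moves the switching point by roughly 25mV, comparable to the bias errors that distort the duty cycle.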

The differential signal is low-pass filtered to extract its common-mode DC value. An op-amp compares this DC value with the switching threshold of a reference inverter whose input and output are tied together. The op-amp adjusts the tail current source, modifying the current through the amplifier until the common-mode DC value of the differential signal equals the switching threshold of the reference inverter. Because the reference inverter is a replica of those used in the output stage, the DC value is dynamically adjusted for process and environmental variation as the inverters' switching threshold shifts.
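The loop's behaviour can be illustrated with a behavioural model in which the op-amp is an ideal discrete-time integrator and the low-pass filter passes only the DC common mode. The last-stage load resistor, integrator gain and threshold shifts are hypothetical; the sketch only shows that the loop settles where the common mode equals the replica threshold, and that a well-chosen operating point keeps the required current change small.

```python
VDD = 1.8       # supply voltage [V]
R_LAST = 6.9e3  # hypothetical load resistor of the redesigned last stage [ohm]


def output_cm(i_tail: float) -> float:
    """DC common-mode output of a resistively loaded pair: half the tail current per leg."""
    return VDD - 0.5 * i_tail * R_LAST


def settle_tail_current(v_threshold, i_start=260e-6, k_int=1e-4, steps=500):
    """Discrete-time integrator standing in for the op-amp loop.

    A common mode above the replica inverter's threshold raises the tail
    current, which pulls the common mode down, and vice versa.
    """
    i_tail = i_start
    for _ in range(steps):
        error = output_cm(i_tail) - v_threshold  # filtered common mode minus threshold
        i_tail += k_int * error                  # k_int in A/V per iteration
    return i_tail


for v_th in (0.90, 0.88, 0.92):  # replica threshold shifted by variation
    i_tail = settle_tail_current(v_th)
    print(f"V_th = {v_th:.2f} V -> I_tail = {i_tail * 1e6:.1f} uA, V_cm = {output_cm(i_tail):.3f} V")
```

A ±20mV threshold shift changes the settled tail current by only about 2%, which is why the effect on the input transistors' $g_m$ stays small.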

Since the op-amp modifies the current through the amplifier, it also slightly modifies the transconductance, $g_m$, of the input transistors, which affects gain. This can be mitigated by designing the last amplifier stage specifically for this purpose rather than simply copying the stages before it: its swing and nominal operating point are set very close to the switching threshold of an inverter. By doing this, any dynamic adjustments of the current by the op-amp are very small, and so is their effect on the $g_m$ of the input transistors.

With the signal now amplified to a few hundred millivolts and properly biased, the output stage can rail the signal to digital logic levels.

### 3.5 Output Stage

Figure 3-12 shows the signal at the output of the amplification stage, just before the output stage. The role of the output stage is to take this differential signal and rail it to digital logic levels, in this case a low of 0V and a high of 1.8V. The easiest way of doing this is to use a series of cascaded inverters as depicted in Figure 3-13.

Figure 3-12: Differential signal at output of amplification stage

Since the gain of an individual inverter is approximately $5V/V$ at 2GHz, only one inverter is needed to rail the signal. Two extra inverters are cascaded to buffer the signal and produce acceptable rise and fall times. If process parameters vary, the input differential signal may not be as large in amplitude as the signal seen in Figure 3-12; the additional inverters then aid in railing the signal. In this case it is also important that the output of the first inverter be correctly biased for the subsequent inverter stages. A MOS triode device (a PFET with its gate tied to GND) is used in a resistive feedback loop around the first inverter, as seen in Figure 3-13. This resistive feedback ensures that subsequent inverter stages see a properly biased signal and that the duty cycle of the output signal is as close to 50% as possible.

Since three cascaded inverters may not be enough to drive an arbitrary load, additional inverters may have to be added on an application-specific basis in order to buffer the signal and drive the load present.

### 3.6 Offset Compensation

Earlier it was mentioned that mismatch reduces the otherwise infinite CMRR and PSRR to a finite value in practical fully-differential circuits. Mismatch can have an even more devastating effect if it is too large. Even a small offset, due to mismatch, at the output of the input stage is multiplied by the gain of the amplification stage, so the positive and negative halves of the differential signal end up with very different common-mode values. Going into the inverters of the output stage, the signal may then be lost altogether. Circuitry to compensate for this offset must be integrated if the receiver is to function.
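A short calculation shows how little offset it takes. The sketch multiplies an offset at the amplification-stage input by the 186V/V unloaded gain from Section 3.4.2. Since outp − outn then swings over the amplified offset ± the single-ended peak-to-peak swing, the two halves only cross if the amplified offset is smaller than that swing; the 150mV swing and the offset values are hypothetical.

```python
NOMINAL_GAIN = 186.0  # unloaded amplification-stage gain quoted in Section 3.4.2 [V/V]
SIGNAL_PP = 0.15      # hypothetical single-ended peak-to-peak swing at the output [V]


def output_offset(input_offset_v: float, gain: float = NOMINAL_GAIN) -> float:
    """DC difference between outp and outn caused by an offset at the amplifier input."""
    return input_offset_v * gain


def halves_still_cross(input_offset_v: float) -> bool:
    """outp - outn spans offset +/- SIGNAL_PP, so it crosses zero only if |offset| < SIGNAL_PP."""
    return abs(output_offset(input_offset_v)) < SIGNAL_PP


for vos_mv in (0.5, 1.0, 2.0):
    out = output_offset(vos_mv * 1e-3)
    state = "crossings preserved" if halves_still_cross(vos_mv * 1e-3) else "signal likely lost"
    print(f"{vos_mv} mV input offset -> {out * 1e3:.0f} mV output offset ({state})")
```

Under these assumptions an offset of only about 1mV at the amplifier input is enough to stop the two halves from crossing.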

### 3.6.1 Known Offset Compensation Methods

There are three common circuit techniques for offset compensation: autozeroing, correlated double sampling, and chopper stabilization [25].

The idea behind autozeroing and correlated double sampling is to sample the unwanted quantity (noise or offset) and then subtract it from the instantaneous value of the input signal. Both methods require two phases ($\phi_1$ and $\phi_2$): the offset or noise is sampled during phase $\phi_1$ and then nulled during a signal-processing phase, $\phi_2$. This works well in discrete-time or sampled-data systems, but not in continuous-time systems. Furthermore, both techniques assume the presence of a clock (or multiple clocks). Since the goal of the circuit in this thesis is to generate such a clock signal, this assumption is violated. A continuous-time approach to autozeroing is also presented in [25], but it, too, requires multiple phases of a clock.

The chopper stabilization technique eliminates low-frequency noise and offsets by modulating the signal undergoing amplification to a higher frequency where the low-frequency noise and offset do not exist, amplifying the signal at the higher frequency, and then demodulating the signal back to baseband. Once again, an additional, periodic, square-wave carrier signal is required to achieve the modulation. Effectively, a clock is needed and so this technique is also unavailable for use in this application.
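The generic chopper idea, and its reliance on a carrier, can be seen in a small numerical sketch: a DC input is multiplied by a ±1 square-wave carrier, passed through an amplifier with a DC offset, and multiplied by the same carrier again. Averaging over whole carrier periods stands in for the output low-pass filter. All numbers are hypothetical, and this is an illustration of the textbook technique, not of any circuit in this receiver.

```python
def square_carrier(n_samples: int, half_period: int):
    """+1/-1 chopping sequence with a 50% duty cycle."""
    return [1 if (k // half_period) % 2 == 0 else -1 for k in range(n_samples)]


def amplify(x, gain, offset):
    """Amplifier with a DC offset referred to its input."""
    return [gain * (v + offset) for v in x]


def mean(values):
    return sum(values) / len(values)


GAIN, OFFSET, V_IN = 10.0, 2e-3, 1e-3  # hypothetical values
N, HALF = 1000, 25                     # 20 full carrier periods

signal = [V_IN] * N
carrier = square_carrier(N, HALF)

plain = mean(amplify(signal, GAIN, OFFSET))
modulated = [m * v for m, v in zip(carrier, signal)]
demodulated = [m * y for m, y in zip(carrier, amplify(modulated, GAIN, OFFSET))]
chopped = mean(demodulated)  # averaging stands in for the output low-pass filter

print(f"without chopping: {plain * 1e3:.3f} mV, with chopping: {chopped * 1e3:.3f} mV, "
      f"ideal: {GAIN * V_IN * 1e3:.3f} mV")
```

The offset ends up multiplied by the carrier after demodulation and averages away, but only because a periodic carrier (effectively a clock) is available.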

3.6.2 A Simple Differential Pair

![Offset compensation circuitry](image)

Figure 3-14: Offset compensation circuitry

To compensate for possible mismatch, a simple offset compensation circuit was developed and added as illustrated in Figure 3-14. The outputs of the amplification stage (outp5, outn5) are low-pass filtered using the simple RC network shown. To ensure stability of the feedback loop, the RC time constant must be large; a MOS triode device is used to obtain a large resistance without the area penalty. However, an off-chip capacitor must be used, as the capacitance needs to be \(\sim 10\mu F\) to achieve a very low pole. As clock frequencies increase, this pole can move higher and the off-chip capacitor can be brought on-chip while maintaining stability of the feedback loop. Note that all of the techniques above also require large sampling capacitors. The low-pass filtered signals are compared using a simple differential pair as shown. The drains of the transistors that make up the differential pair are connected directly to the outputs of the input stage (outp, outn). This simply shifts the bias point of those signals so that any mismatch-induced offset is compensated.
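The pole of this filter is set by the triode-device resistance and the capacitor. The triode resistance is not given in the text, so the 1MΩ below is a hypothetical value, as is the 100kHz target pole; the sketch only shows how low the pole sits with 10μF and how much smaller the capacitor could be if a higher pole were acceptable.

```python
import math


def pole_hz(r_ohm: float, c_farad: float) -> float:
    """Corner frequency of a first-order RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def capacitance_for_pole(r_ohm: float, f_pole_hz: float) -> float:
    """Capacitance that places the RC pole at f_pole_hz."""
    return 1.0 / (2.0 * math.pi * r_ohm * f_pole_hz)


R_TRIODE = 1e6  # hypothetical MOS triode resistance [ohm]
print(f"10 uF off-chip: pole at {pole_hz(R_TRIODE, 10e-6) * 1e3:.1f} mHz")
print(f"C for a 100 kHz pole: {capacitance_for_pole(R_TRIODE, 100e3) * 1e12:.2f} pF")
```

With these assumptions the off-chip capacitor puts the pole near 16mHz, while a pole at 100kHz would need only about 1.6pF, small enough to integrate.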

To prevent loading on the input stage, the current through each leg of the differential pair is 10% (50μA) of the current through each leg of the input stage. Furthermore, the large resistance of the MOS triode device is much greater than the resistance in the last gain stage of the amplification stage and so loading is not an issue. All transistors in the offset compensation circuitry are long-channel devices (0.5μm) to make them more robust to process variation.

The results of Monte Carlo analysis in Chapter 4 show that this circuitry performs well.

### 3.7 Biasing

Both the transimpedance stage and the amplification stage require a number of current sources. To realize these current sources, generated currents are distributed to current mirrors that mirror the current into the tail FET of each differential pair, as well as into the current sources of the transimpedance stage. Distributing these currents requires a single, stable current reference from which all other currents are derived. In this thesis, a process-compensated current reference was chosen to output a very stable current, which is then mirrored, primarily by PMOS current mirrors, to provide currents elsewhere in the circuit.

### 3.7.1 Process-Compensated Current Reference

Narendra et al. have developed a process-compensated MOS current generation concept that does not require a reference voltage [5]; the circuit is shown in Figure 3-15. The idea is to take the saturation currents of two MOSFET devices, $I_1$ and $I_2$, and use the variation present in these two currents (due to process variability, as discussed in Chapter 2) to cancel out variation in their difference, i.e. \( I_{\text{ref}} = I_1 - I_2 \). Eqs. 3.15-3.16 give the saturation currents, \( I_1 \) and \( I_2 \). Chapter 2 showed that process and environmental variation will affect \( \beta \) through mobility (\( \mu \)) variation, and \( V_t \) through channel doping (\( N_a \)) and oxide thickness (\( t_{ox} \)) variation, as well as the length (\( l \)) of the device. Taking the derivative of the saturation currents, \( I_1 \) and \( I_2 \), with respect to a process parameter \( P \) gives the change in currents shown in Eqs. 3.17 and 3.18. To make \( \frac{dI_{\text{ref}}}{dP} \approx 0 \), the variation in \( I_1 \) must cancel the variation in \( I_2 \) (i.e. \( \frac{dI_{\text{ref}}}{dP} = \frac{dI_1}{dP} - \frac{dI_2}{dP} \approx 0 \)).

\[
I_1 \approx \beta z_1 (V_{gs1} - V_t)^2 \tag{3.15}
\]

\[
I_2 \approx \beta z_2 (V_{gs2} - V_t)^2 \tag{3.16}
\]

\[
\beta = \mu C_{ox}, \quad V_t \approx \frac{\sqrt{q N_a \varepsilon_s \phi}}{C_{ox}}, \quad z_1 = \frac{W_1}{2L}, \quad z_2 = \frac{W_2}{2L}
\]

\[
\frac{dI_1}{dP} \approx z_1 (V_{gs1} - V_t)^2 \frac{d\beta}{dP} - 2\beta z_1 (V_{gs1} - V_t) \frac{dV_t}{d\beta} \frac{d\beta}{dP} \tag{3.17}
\]

\[
\frac{dI_2}{dP} \approx z_2 (V_{gs2} - V_t)^2 \frac{d\beta}{dP} - 2\beta z_2 (V_{gs2} - V_t) \frac{dV_t}{d\beta} \frac{d\beta}{dP} \tag{3.18}
\]

One method of achieving this cancellation is to cross-cancel the two terms in Eqs. 3.17 and 3.18: the first term of Eq. 3.17 is equated to the second term of Eq. 3.18 and, analogously, the second term of Eq. 3.17 is equated to the first term of Eq. 3.18. Setting \(V_{gs2} = aV_t\) and \(V_{gs1} = bV_t\) facilitates this cross-cancellation, as shown in Eqs. 3.19 and 3.20. These steps ensure that the difference of the two currents produces a stable current that is, to first order, immune to the effects of variation.

\[
z_1 (bV_t - V_t)^2 = 2\beta z_2 (V_{gs2} - V_t) \frac{dV_t}{d\beta} \tag{3.19}
\]

\[
z_2 (aV_t - V_t)^2 = 2\beta z_1 (V_{gs1} - V_t) \frac{dV_t}{d\beta} \tag{3.20}
\]

Solving both of the above equations simultaneously and coupling with \(I_{ref} = I_1 - I_2\), one ends up with Eqs. 3.21 and 3.22 as constraints for the values \(a, b, z_1\) and \(z_2\).

\[
(a - 1)(b - 1) = 4 \tag{3.21}
\]

\[
\frac{z_1}{z_2} = \left(\frac{a - 1}{b - 1}\right)^{\frac{3}{2}} \tag{3.22}
\]
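The trade-off imposed by Eqs. 3.21 and 3.22 can be explored numerically. Given $b$ (with $V_{gs1} = bV_t$), the sketch solves Eq. 3.21 for $a$, Eq. 3.22 for $z_1/z_2$, and uses the saturation-current expressions (Eqs. 3.15-3.16) for the ratio $I_1/I_2$. The ≈0.45V threshold is the value quoted below for the 0.18μm process; the choices of $b$ are illustrative.

```python
def design_from_b(b: float):
    """Given b (V_gs1 = b * V_t), return a, z1/z2 and I1/I2 from Eqs. 3.21-3.22."""
    a = 1.0 + 4.0 / (b - 1.0)                            # Eq. 3.21
    z_ratio = ((a - 1.0) / (b - 1.0)) ** 1.5             # Eq. 3.22
    i_ratio = z_ratio * (b - 1.0) ** 2 / (a - 1.0) ** 2  # I1/I2 with V_gs = multiples of V_t
    return a, z_ratio, i_ratio


VT = 0.45  # approximate threshold voltage in the 0.18um process [V]
for b in (3.0, 4.0, 5.0):
    a, z_ratio, i_ratio = design_from_b(b)
    print(f"b={b:.1f}, a={a:.2f}, z1/z2={z_ratio:.3f}, I1/I2={i_ratio:.2f}, "
          f"V_gs1={b * VT:.2f} V, V_gs2={a * VT:.2f} V")
```

With $a = b = 3$ the two currents are equal and $I_{ref} = 0$; a useful reference current needs $b > 3$, and already at $b = 4$ the required gate-source voltage reaches 1.8V.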

Finding values of \(a, b, z_1\) and \(z_2\) that satisfy these constraints with a useful, nonzero \(I_{ref}\) requires \(a\) or \(b\) to exceed 3 (with \(a = b = 3\), Eq. 3.22 gives \(z_1 = z_2\), so \(I_1 = I_2\) and \(I_{ref} = 0\)), so that \(\frac{dI_{ref}}{dP} \approx 0\) is met with a usable current. However, in the TSMC 0.18\(\mu\)m, 1.8V process, where threshold voltages are \(\approx 0.45V\), this pushes \(V_{gs}\) for one of the devices toward or beyond \(V_{DD}\). Narendra et al. designed this circuit for a sub-1V process where \(V_t \approx 0.1V\). To use the circuit in a 1.8V process, the parameters were chosen as shown in Table 3.2.

Parameters \(a\) and \(b\) require that the gate voltages, \(V_{gs}\), of the two PMOS transistors be multiples of the threshold voltage. To generate these voltages, the circuit in Figure 3-16 is used. By placing three PMOS transistors in series, each with its gate and drain shorted, and passing a very small current through them, the voltage drop across each transistor is approximately the threshold voltage of the device. Each of the transistors has its body terminal shorted to its source to eliminate the back-gate effect when $V_{BS} > 0V$. This results in an important layout consideration (discussed in Chapter 5) to ensure latch-up does not occur when the body terminal is not connected to $V_{DD}$.

Since the parameters shown in Table 3.2 do not satisfy Eq. 3.21, the generated current will vary with process. Table 3.3 shows the generated $I_{ref}$ at the different process corners. At the SS corner (where all devices are “slow”, or weak), the current increases relative to the typical (TT) corner, and at the FF corner (where all devices are “fast”, or strong), the current decreases. This variation actually makes the circuit more robust. At the SS corner, $V_t$ typically increases, so the gate overdrive decreases for a fixed \( V_{gs} \); less current then flows through the device and \( g_m \) decreases, degrading amplifier gain with it. An increase in the generated current increases, rather than decreases, the current flowing through the amplifiers in the gain stages, helping to keep the variation in gain small. Analogously, the opposite occurs at the FF corner, also helping to keep the gain relatively constant across process corners. The reference currents generated by the network of current mirrors (discussed below) vary by nearly identical percentages across process corners.

<table>
<thead>
<tr>
<th>Process Corner</th>
<th>\( I_{ref} \)</th>
<th>Percent Difference from TT Corner</th>
</tr>
</thead>
<tbody>
<tr>
<td>TT</td>
<td>5.6626μA</td>
<td>0%</td>
</tr>
<tr>
<td>SS</td>
<td>6.2102μA</td>
<td>+9.67%</td>
</tr>
<tr>
<td>FF</td>
<td>4.4863μA</td>
<td>-20.77%</td>
</tr>
</tbody>
</table>

Table 3.3: Generated \( I_{ref} \) across process corners
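The percentages in Table 3.3 can be reproduced directly from the listed currents. The sketch below also estimates how the corner-dependent current shifts a gain-stage transconductance, assuming, as stated above, that the mirrored bias currents scale by the same percentage as \(I_{ref}\), and using the square-law relation \(g_m = \sqrt{2\beta I_D}\) with \(\beta\) held fixed. Holding \(\beta\) fixed isolates the effect of the bias current alone; the corner's own change in \(\beta\) and \(V_t\) is deliberately left out.

```python
import math

# Generated reference current at each process corner, in uA (Table 3.3).
I_REF_UA = {"TT": 5.6626, "SS": 6.2102, "FF": 4.4863}


def pct_from_tt(corner):
    """Percent difference of I_ref at a corner relative to the TT corner."""
    return 100.0 * (I_REF_UA[corner] - I_REF_UA["TT"]) / I_REF_UA["TT"]


def gm_scale_from_bias(corner):
    """Relative g_m change from bias current alone: g_m ~ sqrt(I_D) at fixed beta."""
    return math.sqrt(I_REF_UA[corner] / I_REF_UA["TT"])


for corner in ("TT", "SS", "FF"):
    print(f"{corner}: I_ref {pct_from_tt(corner):+.2f}%, "
          f"g_m from bias x{gm_scale_from_bias(corner):.3f}")
```

Under these assumptions the SS-corner current increase raises \(g_m\) by roughly 4.7% from bias alone, which works against the gain loss caused by the higher \(V_t\) at that corner; the FF corner moves in the opposite direction.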

Unfortunately, the generated current does not display the same compensating behavior across temperature: for +/-25°C variation in temperature, there is +/-11.9% variation in generated current. However, as Section 4.3.1 shows, the overall receiver skew still remains relatively low. Rather than simply looking at the generated \( I_{ref} \) across process corners and temperatures, the clock skew resulting from variation in the reference current across process corners can be analyzed to give these results more context. Table 3.4 summarizes skew across process corners for the complete circuit, both when using the process-compensated current reference and when ideal current sources are used wherever needed. Skew between the TT and FF corners is nearly the same with this current reference as with ideal current sources (about -44ps vs. -47.6ps), and skew between the TT and SS corners is only slightly worse (about +70ps vs. +64.7ps). This is discussed further in Chapter 4. For now, it suffices to say that the process-compensated current reference performs nearly as well as ideal current sources.
<table>
<thead>
<tr>
<th>Process Corners</th>
<th>Skew Using Current Reference</th>
<th>Skew Using Ideal Current Sources</th>
</tr>
</thead>
<tbody>
<tr>
<td>$TT \rightarrow FF$</td>
<td>$\sim -44ps$</td>
<td>$\sim -47.6ps$</td>
</tr>
<tr>
<td>$TT \rightarrow SS$</td>
<td>$\sim +70ps$</td>
<td>$\sim +64.7ps$</td>
</tr>
</tbody>
</table>

Table 3.4: Skew across process corners (using process-compensated current-reference vs. ideal current sources)

3.7.2 Current Mirrors

To be useful, the generated current, $I_{ref}$, must be scaled up using current mirrors. The current mirror network shown in Figure 3-17 multiplies $I_{ref}$ using ratioed current mirrors, and the current from each mirror is then supplied to the rest of the circuit where needed (the transimpedance and amplification stages).

![Current mirror network](image)

Figure 3-17: Current mirror network

Long-channel devices are used for all of the current mirrors shown to make them more robust to process variation and channel-length modulation effects. Cascode current mirrors could have been used to further increase robustness, but the drain-source voltage, $V_{ds}$, of the tail transistor would then be reduced, increasing the effects of channel-length modulation. Furthermore, a large $V_{ds}$ allows for better current matching.
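To illustrate the trade-off above, the sketch models a simple ratioed mirror whose output current depends on the \((W/L)\) ratio and on channel-length modulation through the \((1 + \lambda V_{ds})\) term. The \(\lambda\) values and \(V_{ds}\) values are hypothetical, chosen only to reflect that \(\lambda\) falls roughly in inverse proportion to channel length; they are not extracted from the TSMC models.

```python
def mirror_output(i_ref, ratio, lam, vds_out, vds_ref):
    """Output current of a simple ratioed current mirror.

    I_out = I_ref * ratio * (1 + lam * Vds_out) / (1 + lam * Vds_ref),
    where ratio = (W/L)_out / (W/L)_ref and lam is the channel-length
    modulation coefficient (1/V).
    """
    return i_ref * ratio * (1.0 + lam * vds_out) / (1.0 + lam * vds_ref)


def ratio_error_pct(lam, vds_out, vds_ref):
    """Percent deviation from the ideal (W/L) ratio caused by unequal V_ds."""
    return 100.0 * ((1.0 + lam * vds_out) / (1.0 + lam * vds_ref) - 1.0)


I_REF = 5.6626e-6   # TT-corner reference current from Table 3.3, in A
VDS_REF = 0.7       # hypothetical diode-connected V_ds, in V
VDS_OUT = 1.0       # hypothetical output-device V_ds, in V

for label, lam in (("short channel (assumed lambda)", 0.3),
                   ("long channel (assumed lambda)", 0.05)):
    i_out = mirror_output(I_REF, 10.0, lam, VDS_OUT, VDS_REF)
    print(f"{label}: I_out = {i_out * 1e6:.2f} uA, "
          f"ratio error = {ratio_error_pct(lam, VDS_OUT, VDS_REF):.2f}%")
```

With the same 0.3V difference in \(V_{ds}\), the smaller long-channel \(\lambda\) shrinks the ratio error from about 7.4% to about 1.4% in this model.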

3.7.3 Other Biasing Methods

One of the most common biasing methods involves generating a stable voltage reference. The most reliable and robust voltage reference to date is the bandgap voltage reference. The principle behind a bandgap voltage reference is to cancel temperature variation by adding two voltages with opposite temperature dependencies. Bipolar transistors make this possible: the base-emitter voltage of a bipolar transistor has a negative temperature dependence, while subtracting the base-emitter voltages of two bipolar transistors operated at different current densities generates a voltage that is proportional to absolute temperature (PTAT). Adding a suitably scaled PTAT voltage to a base-emitter voltage cancels the temperature dependence to first order. This circuit is widely used in analog design where a stable voltage and biasing scheme is needed. Lum also uses this circuitry in his design [4].
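The bandgap principle described above can be sketched numerically. The PTAT term is \(\Delta V_{BE} = (kT/q)\ln N\) for two transistors run at a current-density ratio \(N\); the base-emitter voltage is modelled here as a hypothetical linear function of temperature (0.65V at 300K, falling 2mV/K), which is only a first-order approximation of real \(V_{BE}\) behavior. The PTAT gain \(K\) is chosen so that the two slopes cancel.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q = 1.602176634e-19    # elementary charge, C

VBE_300K = 0.65        # hypothetical V_BE at 300 K, in V
VBE_SLOPE = -2e-3      # hypothetical dV_BE/dT, in V/K (linear model)
N_RATIO = 8            # assumed current-density ratio between the two BJTs


def delta_vbe(temp_k, n_ratio=N_RATIO):
    """PTAT voltage V_BE1 - V_BE2 = (kT/q) ln(N)."""
    return (K_B * temp_k / Q) * math.log(n_ratio)


def vbe(temp_k):
    """Linear (CTAT) model of a single base-emitter voltage."""
    return VBE_300K + VBE_SLOPE * (temp_k - 300.0)


# Gain on the PTAT term chosen so that d(V_ref)/dT = 0 for the linear model.
K_PTAT = -VBE_SLOPE / ((K_B / Q) * math.log(N_RATIO))


def vref(temp_k):
    """Bandgap output: CTAT V_BE plus scaled PTAT term."""
    return vbe(temp_k) + K_PTAT * delta_vbe(temp_k)


for t in (250.0, 300.0, 350.0):
    print(f"T = {t:.0f} K: V_BE = {vbe(t):.4f} V, "
          f"dV_BE = {delta_vbe(t) * 1e3:.2f} mV, V_ref = {vref(t):.4f} V")
```

In this idealized model the output stays at about 1.25V at every temperature, close to the extrapolated bandgap voltage of silicon that gives the circuit its name.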

Other voltage references that are purely CMOS (requiring no bipolar devices) have been demonstrated [26][27]. However, the fundamental issue with using voltage references in this work is that what is really required is not a voltage reference but a current reference. A stable voltage can be converted to a current by applying it to the gate of a transistor; however, process and environmental variation will then cause variation in the generated current. Furthermore, this variation will be opposite to the variation needed, and provided, by the current reference chosen for this design. If the gate voltage is kept constant while $V_t$ decreases (as at the FF corner, for example), the gate overdrive, $V_{gs} - V_t$, increases, resulting in increased current: precisely the opposite of what is required.
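The direction of this error can be checked with the square-law saturation current \(I_D = \frac{\beta}{2}(V_{gs} - V_t)^2\). The sketch holds a fixed gate voltage (0.8V, a hypothetical value) and shifts \(V_t\) by the ±100mV corner deltas of Table 4.1 around an assumed 0.45V nominal; \(\beta\) is a placeholder and its own corner dependence is ignored.

```python
BETA = 200e-6      # hypothetical current factor, in A/V^2
V_GS_FIXED = 0.8   # hypothetical fixed gate voltage from a voltage reference, in V
VT_CORNERS = {"SS": 0.55, "TT": 0.45, "FF": 0.35}  # 0.45 V +/- 100 mV (Table 4.1)


def square_law_current(vgs, vt, beta=BETA):
    """Saturation drain current (A) of a long-channel square-law MOSFET."""
    vov = vgs - vt
    return 0.5 * beta * vov * vov if vov > 0.0 else 0.0


i_tt = square_law_current(V_GS_FIXED, VT_CORNERS["TT"])
for corner, vt in VT_CORNERS.items():
    i_d = square_law_current(V_GS_FIXED, vt)
    print(f"{corner}: V_t = {vt:.2f} V, I_D = {i_d * 1e6:.2f} uA "
          f"({100.0 * (i_d / i_tt - 1.0):+.1f}% vs TT)")
```

In this model the FF current rises by about 65% and the SS current falls by about 49%, the opposite of the trend produced by the current reference in Table 3.3.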

Temperature-compensated CMOS current references have also been developed [28][29]. The design in [28] uses a 5V power supply with stacked transistors; in the 1.8V process used here, there is not enough headroom to use this circuit. Lee et al. [29] use PMOS transistors to create a square-root circuit. However, the transistors are biased in the sub-threshold region, where the currents are extremely small and very prone to process variation.

3.8 Summary

This chapter has detailed the design of the clock receiver circuit. Each section has discussed the design of each stage in the signal path as well as auxiliary circuitry and shown how each design decision has considered the effects of variation. The following
chapter will characterize and quantify the operation of the circuit.
Chapter 4

Results And Analysis

Chapter 3 discussed the design of the circuit and the specific design decisions made to make the circuit robust to variation. This chapter characterizes the operation of the circuit and quantifies its robustness to the many sources of variation detailed in Chapter 2. A brief summary of the nominal operation of the circuit and general results is provided in Section 4.1 to give context for the variation analysis that follows in Sections 4.2 to 4.4. This analysis is divided into three categories: 1) process variation (Section 4.2), including analysis of process-corner variation, individual parameter variation (such as $\Delta V_t$, $\Delta L_{gate}$, and $\Delta t_{ox}$), resistor variation and mismatch; 2) environmental variation (Section 4.3), which focuses on temperature ($\Delta T$) and power-supply ($\Delta V_{DD}$) variation; and 3) noise analysis (Section 4.4).

4.1 Nominal Circuit Operation

The most fundamental parameter to measure in a clock receiver circuit is the speed at which it can generate a robust clock. As can be seen from Figure 4-1, the circuit operates at $2GHz$ with rise and fall times of $\sim 45ps$, and dissipates $12.9mW$ of power. The duty cycle of the positive-going signal of the differential pair is $47.3\%/52.7\%$: within one cycle, the signal spends $47.3\%$ of its time above $V_{pp}$ and $52.7\%$ of its time below. The duty cycle of the negative-going signal is exactly the inverse, as one would expect.
Figure 4-1: Nominal output waveforms generated by the receiver circuit
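Duty cycle, as used above, is the fraction of each period the output spends above a decision threshold. The minimal sketch below measures it from a uniformly sampled waveform and converts a duty cycle into high and low times for the 500ps period of a 2GHz clock. The synthetic waveform, the 1ps sample step, and the \(V_{DD}/2\) threshold are illustrative assumptions, not simulator output.

```python
CLOCK_HZ = 2e9
PERIOD_PS = 1e12 / CLOCK_HZ  # 500 ps at 2 GHz


def duty_cycle(samples, threshold):
    """Fraction of uniformly spaced samples above threshold.

    Assumes the record spans a whole number of clock periods.
    """
    if not samples:
        raise ValueError("need at least one sample")
    above = sum(1 for v in samples if v > threshold)
    return above / len(samples)


def high_low_times_ps(duty):
    """Split one clock period into time-high and time-low, in ps."""
    return duty * PERIOD_PS, (1.0 - duty) * PERIOD_PS


# Synthetic 1.8 V square wave sampled every 1 ps for two periods,
# high for 236 ps of each 500 ps period (hypothetical data).
wave = [1.8 if (t % 500) < 236 else 0.0 for t in range(1000)]
d = duty_cycle(wave, threshold=0.9)
print(f"duty cycle = {100 * d:.1f}%, high/low = {high_low_times_ps(d)} ps")
print("47.3% duty cycle ->", high_low_times_ps(0.473), "ps")
```

At 2GHz, the nominal 47.3% duty cycle corresponds to roughly 236.5ps high and 263.5ps low in each cycle.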

A peak-to-peak amplitude of 20μA of generated differential photocurrent is needed to achieve this operation. With an optimistic quantum efficiency of 0.9, the photodiode pair would need to be exposed to ~22.2μW of differential optical power. A more realistic quantum efficiency for the photodiodes is in the 0.5-0.7 range, which by the same calculation raises the required differential optical power to roughly 29-40μW.
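The optical-power figures follow from \(P = I_{ph}/\mathcal{R}\), where the responsivity is \(\mathcal{R} = \eta q \lambda / (hc)\). The ~22.2μW figure above corresponds to \(\mathcal{R} = 0.9\) A/W, i.e., to treating responsivity as numerically equal to \(\eta\), which holds only near \(\lambda = hc/q \approx 1.24\mu m\); that equivalence is an assumption read off the quoted numbers, not something stated in the text. The sketch keeps the wavelength explicit so it can be changed.

```python
H = 6.62607015e-34     # Planck constant, J*s
C = 2.99792458e8       # speed of light, m/s
Q = 1.602176634e-19    # elementary charge, C


def responsivity(eta, wavelength_m):
    """Photodiode responsivity in A/W for quantum efficiency eta."""
    return eta * Q * wavelength_m / (H * C)


def required_power(i_photo_a, resp_a_per_w):
    """Optical power (W) needed to generate i_photo_a of photocurrent."""
    return i_photo_a / resp_a_per_w


I_DIFF = 20e-6                 # peak-to-peak differential photocurrent, A
LAMBDA_UNITY = H * C / Q       # wavelength where responsivity equals eta (~1.24 um)

for eta in (0.9, 0.7, 0.5):
    p = required_power(I_DIFF, responsivity(eta, LAMBDA_UNITY))
    print(f"eta = {eta}: {p * 1e6:.1f} uW")
```

At a shorter wavelength the responsivity drops in proportion to \(\lambda\), so the same photocurrent would need correspondingly more optical power.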

As Figure 3-10 shows, the 3dB point of the amplification stage is around 1.16GHz, so at 2GHz the circuit is already operating past the 3dB point of the amplifier. While the circuit could operate faster, more input optical power would be required to offset the loss in gain from operating considerably past the 3dB point.
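To give a rough sense of this gain penalty, the sketch assumes a single-pole response \(|H(f)| = 1/\sqrt{1 + (f/f_{3dB})^2}\) with \(f_{3dB} = 1.16GHz\). The real amplifier is multi-stage, so its roll-off beyond the 3dB point is likely steeper than this single-pole estimate.

```python
import math

F_3DB_HZ = 1.16e9   # 3 dB point of the amplification stage (Figure 3-10)


def single_pole_gain(freq_hz, f3db_hz=F_3DB_HZ):
    """Magnitude of a one-pole low-pass response, normalized to its DC gain."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / f3db_hz) ** 2)


for f in (1.16e9, 2e9, 3e9):
    g = single_pole_gain(f)
    print(f"{f / 1e9:.2f} GHz: |H| = {g:.3f} ({20 * math.log10(g):.1f} dB)")
```

Under this assumption a 2GHz fundamental sees roughly half of the low-frequency gain (about -6dB).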

The output waveforms shown in Figure 4-1 correspond to the circuit operating under ideal conditions: TT process corner, 25°C, VDD at exactly 1.8V, and no mismatch or noise present. As seen in Chapter 2, practical circuit operation is limited by a number of variation sources. The following sections explore how the circuit performs when these variations are introduced.

4.2 Process Variation Analysis

This section quantifies circuit operation when subject to process variation. First, analysis at the process corners is performed to determine how much skew each corner generates. Note that in process corner analysis, all transistors are varied in the same manner (i.e., at the SS corner, all PMOS transistors experience the same absolute change in threshold voltage, mobility, etc.; likewise, all NMOS transistors experience the same absolute change in those parameters). As such, corner analysis is useful for simulating how one instance of the circuit will perform relative to another instance, typically located far away on the same die. Second, the effects of $\Delta L_{gate}$, $\Delta V_t$ and $\Delta t_{ox}$ are analyzed individually; while these are modelled in the corner analysis, their individual effects are hard to extract from it. Lastly, resistor variation is analyzed.

4.2.1 Process Corner Analysis

<table>
<thead>
<tr>
<th>Process Corner</th>
<th>$\Delta V_t$</th>
<th>$\Delta L_{gate}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>TT</td>
<td>0mV (0%)</td>
<td>0mV (0%)</td>
</tr>
<tr>
<td>SS</td>
<td>+100mV (~+22%)</td>
<td>-67mV (~−15%)</td>
</tr>
<tr>
<td>FF</td>
<td>-100mV (~−22%)</td>
<td>+67mV (~+15%)</td>
</tr>
</tbody>
</table>

Table 4.1: TSMC 0.18$\mu$m process corners ($\Delta V_t$ and $\Delta L_{gate}$)

The circuit was simulated at the process corners shown in Tables 4.1 and 4.2. The values presented in the tables are assumed to correspond to 3$\sigma$-limits [30], with the SS corner values corresponding to $-3\sigma$ (in terms of device performance) and those of the FF corner corresponding to $+3\sigma$.

<table>
<thead>
<tr>
<th>Process Corner</th>
<th>$\Delta t_{ox}$</th>
<th>$\Delta W_{gate}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>TT</td>
<td>0 nm (0%)</td>
<td>0 $\mu$m (0%)</td>
</tr>
<tr>
<td>SS</td>
<td>+0.133 nm (~+3.26%)</td>
<td>-0.0147 $\mu$m (~-6.4%)</td>
</tr>
<tr>
<td>FF</td>
<td>-0.133 nm (~-3.26%)</td>
<td>+0.0147 $\mu$m (~+6.4%)</td>
</tr>
</tbody>
</table>

Table 4.2: TSMC 0.18 $\mu$m process corners ($\Delta t_{ox}$ and $\Delta W_{gate}$)

The output waveforms are shown in Figure 4-2. The circuit operates at all corners, but the output waveforms differ, most obviously in the arrival times of the edges and in the duty cycles. Only the positive signal (outp8) of the differential output is shown for clarity.

![Figure 4-2: Output waveforms when subject to process corner variation](image)

Table 4.3 quantifies the differences in these two parameters between the process corners. There is nearly 5 percentage points of variation in the duty cycle from the SS to the FF corner (51.5% vs. 46.8%) and 114ps of skew in the worst case (SS to FF corner).

<table>
<thead>
<tr>
<th>Process Corner</th>
<th>Duty Cycle</th>
<th>Skew (from TT Corner)</th>
<th>Rise Time</th>
<th>Fall Time</th>
</tr>
</thead>
<tbody>
<tr>
<td>TT</td>
<td>47.4%/52.6%</td>
<td>0ps</td>
<td>46ps</td>
<td>44ps</td>
</tr>
<tr>
<td>SS</td>
<td>51.5%/48.5%</td>
<td>$\sim +70ps$</td>
<td>44ps</td>
<td>53ps</td>
</tr>
<tr>
<td>FF</td>
<td>46.8%/53.2%</td>
<td>$\sim -44ps$</td>
<td>39ps</td>
<td>37ps</td>
</tr>
</tbody>
</table>

Table 4.3: Generated skew, duty cycle, rise and fall times due to process corner variation

By looking at skew due to process variation, general trends can be identified. Smaller gain at the SS corner, for instance, results in a larger latency through the receiver and a slower fall time (53ps vs. 44ps at TT), thus producing a positive skew. Conversely, larger gain at the FF corner results in a smaller latency through the receiver with sharper rise and fall times, producing a negative skew relative to the TT corner. However, to precisely quantify the effects of process variation, the parameters must be investigated individually.

4.2.2 Gate Length Variation Analysis ($\Delta L_{\text{gate}}$)

Chapter 2 showed that gate length variation can affect circuit performance, primarily by causing variation in the resulting current and $g_m$ of a transistor, and thus in the gain of the circuit. Simulations were done to characterize the effect of $\Delta L_{\text{gate}}$ on the performance of this circuit. Figure 4-3 shows the resulting output waveforms, and Table 4.4 summarizes the skew generated by gate length variation. In all of the simulations, every transistor in the circuit was modified by the same percentage of the minimum gate length (0.18$\mu$m).

Skew due to $\Delta L_{\text{gate}}$ is reduced by a factor of two from the design in [4]. While the circuit demonstrates improved insensitivity to $\Delta L_{\text{gate}}$, the modification of receiver gain is a concern that must be kept in mind. The best way of making the circuit robust to gate length variation is to increase the gate length of the devices in the circuit. However, doing so will sacrifice bandwidth and area. Designers must decide whether the tradeoff between desired circuit speed and robustness to gate length
variation is acceptable for the particular application they are designing for. In this design, minimum gate length devices were used primarily for maximum bandwidth as well as to understand the effect of variation on circuit operation at high performance.

<table>
<thead>
<tr>
<th>$\Delta L_{\text{gate}}$ (%)</th>
<th>Skew (from TT Corner)</th>
</tr>
</thead>
<tbody>
<tr>
<td>$-0.018\mu m$ (-10%)</td>
<td>$-36.9ps$</td>
</tr>
<tr>
<td>$-0.009\mu m$ (-5%)</td>
<td>$-19.6ps$</td>
</tr>
<tr>
<td>$+0.009\mu m$ (+5%)</td>
<td>$+20.4ps$</td>
</tr>
<tr>
<td>$+0.018\mu m$ (+10%)</td>
<td>$+39.5ps$</td>
</tr>
</tbody>
</table>

Table 4.4: Generated skew due to gate length variation

Figure 4-3: Output waveforms when subject to $L_{\text{gate}}$ variation
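Because the skew in Table 4.4 is nearly linear in \(\Delta L_{gate}\), a least-squares line gives a compact sensitivity figure. The sketch fits the four tabulated points plus the TT point (0%, 0ps); the same helper could be reused for the other single-parameter sweeps in this chapter. It is an ordinary least-squares fit written out by hand, not a method used in the original analysis.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Table 4.4: gate-length change (% of 0.18 um) vs. skew from TT (ps), plus TT itself.
dl_pct = [-10.0, -5.0, 0.0, 5.0, 10.0]
skew_ps = [-36.9, -19.6, 0.0, 20.4, 39.5]

slope, intercept = linear_fit(dl_pct, skew_ps)
print(f"sensitivity ~ {slope:.2f} ps per 1% change in L_gate "
      f"(intercept {intercept:.2f} ps)")
```

The fitted sensitivity is roughly 3.9ps per percent of gate-length change, consistent with the near-symmetric spread in the table.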
4.2.3 Threshold Voltage Variation Analysis ($\Delta V_t$)

Variation in threshold voltage changes the effective gate overdrive ($V_{gs} - V_t$) of a transistor (if everything else is held constant). The best way to ensure robustness to threshold voltage variation is to use large gate drive voltages, $V_{gs}$, so that the variation is small relative to the applied $V_{gs}$. Furthermore, for biasing elements, large $V_{gs}$ as well as process compensation improve receiver robustness. This design uses both of these techniques to increase receiver insensitivity to threshold voltage variation. Figure 4-4 shows the output waveforms when threshold voltage variation is applied to the circuit. The circuit was subject to threshold voltage variation of up to +/-20%; only the waveforms corresponding to +/-20% and +/-10% are shown in the figure for clarity. Table 4.5 quantifies the skew seen in the output waveforms.
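The benefit of a large gate drive can be made concrete with the square-law model: at fixed \(V_{gs}\), a threshold shift \(\Delta V_t\) scales the drain current by \(((V_{ov} - \Delta V_t)/V_{ov})^2\), where \(V_{ov} = V_{gs} - V_t\) is the nominal overdrive. The overdrive values below are hypothetical, and the long-channel square law overstates the sensitivity of velocity-saturated short-channel devices, so the numbers indicate the trend only.

```python
def current_change_pct(v_ov, d_vt):
    """Percent change in square-law drain current when V_t shifts by d_vt
    (volts) at fixed V_gs, for a nominal overdrive v_ov = V_gs - V_t (volts)."""
    if v_ov <= 0.0:
        raise ValueError("device must be on: v_ov > 0")
    new_ov = max(v_ov - d_vt, 0.0)
    return 100.0 * ((new_ov / v_ov) ** 2 - 1.0)


D_VT = 0.045  # a +10% shift of an assumed 0.45 V threshold, in volts

for v_ov in (0.2, 0.4, 0.6):
    print(f"V_ov = {v_ov:.1f} V: {current_change_pct(v_ov, D_VT):+.1f}% current "
          f"for +{D_VT * 1e3:.0f} mV of V_t")
```

Tripling the overdrive from 0.2V to 0.6V cuts the current change for the same threshold shift from about -40% to about -14% in this model.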

The receiver shows notable insensitivity to threshold variation, especially when the threshold voltage is decreased from nominal. There is, however, an asymmetry in the generated skew between increases and decreases in threshold voltage. Recall from Section 3.7.1 that the process-compensated current reference had a larger variation at the FF corner (-20.77%) than at the SS corner (+9.67%). A decrease in threshold voltage corresponds to a variation that contributes to the FF corner, whereas an increase corresponds to a variation that contributes to the SS corner. The asymmetry in generated skew can therefore be attributed to this asymmetry in current variation in the current reference. This suggests that redesigning the current reference to produce symmetric variation in the generated current should increase receiver insensitivity to positive threshold voltage variation.

4.2.4 Oxide Thickness Variation Analysis ($\Delta t_{ox}$)

Oxide thickness variation affects the gate capacitance of a MOSFET. This directly affects the current through a device as well as the load seen by the output of an amplifier as the gate capacitances of the load devices change. Figure 4-5 shows the output waveforms when the oxide thickness is varied, and Table 4.6 quantifies the skew seen in the figure. While the generated skew is small, it is not insignificant: once again, long gate lengths and process-compensation help to make the receiver more robust to this type of variation.
4.2.5 Resistor Variation Analysis

Since this design contains a large number of resistors, primarily in the input and amplification stages, it is important to analyze how variation in the sheet resistance of the poly resistors affects circuit performance. According to the TSMC design rules, the sheet resistance of P+ poly resistors can vary by as much as ±65Ω/□ about the nominal sheet resistance of 311Ω/□, a variation of approximately ±21%.

<table>
<thead>
<tr>
<th>$\Delta t_{ox}$ (%)</th>
<th>Skew (from TT Corner)</th>
</tr>
</thead>
<tbody>
<tr>
<td>-0.408nm (-10%)</td>
<td>-7.94ps</td>
</tr>
<tr>
<td>-0.204nm (-5%)</td>
<td>-5.05ps</td>
</tr>
<tr>
<td>+0.204nm (+5%)</td>
<td>+3.96ps</td>
</tr>
<tr>
<td>+0.408nm (+10%)</td>
<td>+9.65ps</td>
</tr>
</tbody>
</table>

Table 4.6: Generated skew due to oxide thickness variation
Figure 4-6: Output waveforms when subject to resistor variation

Figure 4-6 shows that even with resistor variation of +/-20%, the circuit performs well, with at most ~22ps of skew. Table 4.7 quantifies the generated skew.

<table>
<thead>
<tr>
<th>$\Delta R_{sh}$</th>
<th>Skew (from TT Corner)</th>
</tr>
</thead>
<tbody>
<tr>
<td>-20%</td>
<td>-5.53ps</td>
</tr>
<tr>
<td>-10%</td>
<td>-6.07ps</td>
</tr>
<tr>
<td>+10%</td>
<td>+9.02ps</td>
</tr>
<tr>
<td>+20%</td>
<td>+21.7ps</td>
</tr>
</tbody>
</table>

Table 4.7: Generated skew due to resistor variation

The variation in $R_{sh}$ is independent of resistor size to first order and so it is hard to design around resistor variation. However, since this circuit is fully differential, the absolute magnitude of the resistor values is not as significant as the relative matching between the two resistors in a differential pair, which is analyzed in the following subsection.
4.2.6 Mismatch Analysis

Section 2.2.3 introduced the notion of mismatch between transistors. Mismatch is typically modelled using Eq. 4.1 (identical to Eq. 2.9 without the spacing term) and by assigning Gaussian distributions to each of the mismatching parameters.

\[ \sigma^2(\Delta P) = \frac{A^2_m}{WL} \]  

(4.1)

For transistors, a common way of modelling mismatch is to lump all possible mismatch into two parameters: \( \sigma^2(\Delta V_t) \) and \( \sigma^2(\Delta \beta) \), which describe mismatch in threshold voltage and in \( \beta \), the current factor. Since \( \beta \) is not itself a parameter of the transistor model used in simulation, it is used to compute a new width, \( W' \), that models the mismatch in current. To determine the variances, \( \sigma^2 \), of the parameters, the coefficients \( A_m \) are needed. These vary from one process generation to the next; most fabs do not publish these coefficients for a particular process, and designers must rely on large-scale measurements done by researchers. For the 0.18µm process generation, the coefficients \( A_{V_t} \) and \( A_{\beta} \) are given in Table 4.8 [31].

<table>
<thead>
<tr>
<th>Mismatch Parameter Coefficient</th>
<th>0.18µm Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>\( A_{V_t} \)</td>
<td>4mV·µm</td>
</tr>
<tr>
<td>\( A_{\beta} \)</td>
<td>1%·µm</td>
</tr>
</tbody>
</table>

Table 4.8: MOSFET matching coefficients for the 0.18µm generation
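As a sketch of how Eq. 4.1 and Table 4.8 turn into Monte Carlo inputs, the code below computes \(\sigma(\Delta V_t)\) and \(\sigma(\Delta\beta/\beta)\) for a device of a given size and draws per-device samples, mapping the \(\beta\) deviation onto an effective width \(W' = W(1 + \Delta\beta/\beta)\) as described above. Since Eq. 4.1 describes the difference between two matched devices, each device is sampled with \(\sigma/\sqrt{2}\) so that the pair difference recovers the Eq. 4.1 value. The 2μm × 0.18μm geometry is hypothetical, and this is not the HSpice setup used in this work.

```python
import math
import random

A_VT_MV_UM = 4.0      # A_Vt from Table 4.8, mV*um
A_BETA_PCT_UM = 1.0   # A_beta from Table 4.8, %*um


def pair_sigmas(w_um, l_um):
    """Eq. 4.1: std. dev. of the *difference* between two matched devices.

    Returns (sigma(dV_t) in mV, sigma(dbeta/beta) in %).
    """
    root_area = math.sqrt(w_um * l_um)
    return A_VT_MV_UM / root_area, A_BETA_PCT_UM / root_area


def sample_device(w_um, l_um, rng):
    """Draw one device's V_t offset (mV) and effective width W' (um)."""
    s_vt, s_beta = pair_sigmas(w_um, l_um)
    dvt_mv = rng.gauss(0.0, s_vt / math.sqrt(2.0))        # per-device share
    dbeta = rng.gauss(0.0, s_beta / math.sqrt(2.0)) / 100.0
    return dvt_mv, w_um * (1.0 + dbeta)                   # W' = W(1 + dbeta/beta)


s_vt, s_beta = pair_sigmas(2.0, 0.18)
print(f"sigma(dVt) = {s_vt:.2f} mV, sigma(dbeta/beta) = {s_beta:.2f}%")

rng = random.Random(1)
diffs = [sample_device(2.0, 0.18, rng)[0] - sample_device(2.0, 0.18, rng)[0]
         for _ in range(20000)]
mean = sum(diffs) / len(diffs)
std = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1))
print(f"simulated pair sigma(dVt) = {std:.2f} mV")
```

For this hypothetical device, Eq. 4.1 gives about 6.7mV of threshold mismatch and about 1.7% of current-factor mismatch between a matched pair.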

In this circuit, there are two other significant sources of mismatch: input photocurrent and resistor mismatch. Gaussian distributions were also assumed for these parameters, with a 1σ variation of the input photocurrent of 10% and a 1σ variation of resistor matching of 5%. Both of these numbers are conservative: given that overall poly resistor variation in this process has a 4σ variation of 27%, a 1σ mismatch of 5% is extremely conservative.

Monte Carlo analysis was performed in HSpice to ensure functionality of the circuit given mismatch in the above parameters as well as to see variation in circuit performance. 100 Monte Carlo iterations were performed to obtain enough data points to
make the results statistically sound. Figure 4-7 shows the differential output signal over all 100 iterations. As can be seen, there are no failures present (failures would be signals that remain at the power supply rails through switching transients for a particular iteration). This simulation validates the offset compensation circuitry and gives high confidence that the circuit will function, and function well despite very substantial mismatch.

![Output waveforms when subject to mismatch. Outp8 is shown in the dark, straight plots while outn8 is shown in the lighter, dashed plots.](image)

From this simulation, the mean and standard deviation of two important parameters can be calculated: the duty cycle of the output signals and the generated skew due to mismatch. One would expect the duty cycle to have a mean of roughly 50% and the generated skew to be centered about a mean of 0ps.

Figure 4-8 shows histograms of the duty cycles of the output differential signals. They are roughly centered about 48% and 52% for the positive and negative signals, respectively. As expected, both appear approximately Gaussian, with slightly skewed distributions.

![Figure 4-8: (a) Histogram of duty cycle for outp8 subject to mismatch (b) The same for outn8.](image)

Table 4.9 shows the statistics associated with the above histograms. Since the Monte Carlo results appear to be normally distributed, the standard error of the mean is given by $\frac{S}{\sqrt{n}}$, where $S$ is the sample standard deviation and $n$ is the number of samples (100 in this case). The error on the variance is computed from a $\chi^2$ distribution [32], where the confidence interval on the variance is given by Eq. 4.2. For a 95% confidence interval, $\alpha = 0.05$, $\chi^2_{\alpha/2,n-1} = 128.422$ and $\chi^2_{1-\alpha/2,n-1} = 73.3611$; these values are used to establish the confidence intervals on the variances in Table 4.9.

$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,n-1}} \leq \sigma^2 \leq \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,n-1}} \quad (4.2)$$

<table>
<thead>
<tr>
<th>Signal</th>
<th>Mean Duty Cycle, $\mu$</th>
<th>2 MSEs*</th>
<th>$\sigma$</th>
<th>$S^2$ (%$^2$)**</th>
<th>Error in $S^2$ (%$^2$)***</th>
</tr>
</thead>
<tbody>
<tr>
<td>outp8</td>
<td>47.360%</td>
<td>+/- 0.414%</td>
<td>2.07%</td>
<td>4.28</td>
<td>-0.98/+1.50</td>
</tr>
<tr>
<td>outn8</td>
<td>52.557%</td>
<td>+/- 0.492%</td>
<td>2.46%</td>
<td>6.07</td>
<td>-1.40/+2.12</td>
</tr>
</tbody>
</table>

Table 4.9: Statistics for duty cycle in Monte Carlo simulations.

* MSE = Mean Standard Error, i.e., the standard error of the mean, $S/\sqrt{n}$; 2 MSEs corresponds closely to a 95% confidence interval on the mean

** $S^2$ is the sample variance

*** For a 95% confidence interval on $S^2$
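The entries of Table 4.9 can be reproduced from the sample standard deviation alone. The sketch below computes the ±2 MSE interval on the mean and the χ² interval of Eq. 4.2 using the critical values quoted in the text; if SciPy is available, the same values come from `scipy.stats.chi2.ppf(0.975, 99)` and `scipy.stats.chi2.ppf(0.025, 99)`, but the sketch hard-codes them to stay dependency-free.

```python
import math

N_SAMPLES = 100
CHI2_UPPER = 128.422   # chi^2_{alpha/2, n-1} for alpha = 0.05, n = 100
CHI2_LOWER = 73.3611   # chi^2_{1-alpha/2, n-1}


def two_mse(s, n=N_SAMPLES):
    """Half-width of the ~95% interval on the mean: 2 * S / sqrt(n)."""
    return 2.0 * s / math.sqrt(n)


def variance_ci(s, n=N_SAMPLES, chi_hi=CHI2_UPPER, chi_lo=CHI2_LOWER):
    """Eq. 4.2: 95% confidence interval on the population variance."""
    s2 = s * s
    return (n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo


for name, s in (("outp8", 2.07), ("outn8", 2.46)):
    lo, hi = variance_ci(s)
    s2 = s * s
    print(f"{name}: 2 MSEs = +/-{two_mse(s):.3f}%, S^2 = {s2:.2f}, "
          f"error in S^2 = {lo - s2:+.2f}/{hi - s2:+.2f}")
```

Small differences from the outn8 row arise because the tabulated $\sigma$ of 2.46% is itself rounded.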

These results validate the operation of not only the offset compensation circuitry but also the replica feedback biasing. Despite the presence of various mismatch sources, both sets of circuitry keep the signal biased very close to the switching threshold of the inverters of the output stage. Without them, there would be much larger variations in the duty cycle, as well as complete signal loss in worst-case mismatch situations.

Figure 4-9: Histogram of generated skew when receiver is subject to mismatch

Figure 4-9 shows a histogram of the simulated skew when the circuit is subject to mismatch. The histogram shows a close-to-normal distribution centered near $-1.5\,\text{ps}$ of skew. From the histogram, the maximum skew due to mismatch is approximately $+/-20\,\text{ps}$, about 4% of the $500\,\text{ps}$ cycle time of a $2GHz$ clock. Table 4.10 shows the statistics for the skew measurements of the Monte Carlo simulation. Since this simulation was set up with parameters distributed about those of the TT corner, a simple estimate of worst-case skew due to process variation follows: skew due to mismatch adds to skew due to process corners, and if the two are independent, their standard deviations combine in quadrature. By assuming that the process corner parameters in Tables 4.1 and 4.2 correspond to $+/-3\sigma$-limits, we can carry this through to the skew results presented in Table 4.3 by assuming that they too correspond to $+/-3\sigma$-limits of generated skew. Since the total skew from SS to FF corners is $114\,\text{ps}$, one standard deviation ($1\sigma$) of skew is approximately $\frac{114\,\text{ps}}{6} = 19\,\text{ps}$. From Table 4.10, a single standard deviation of skew due to mismatch is $6.13\,\text{ps}$. These can be combined according to Eq. 4.3, resulting in the standard deviation of skew due to worst-case process corners and mismatch. Multiplying by six (to find the +/-3σ worst-case limit), the worst-case skew for this design is \( \sim 120ps \), or \( \sim 24\% \) of the cycle time.

\[
\sigma_{\text{SkewWorstCase}} = \sqrt{\sigma_{\text{SS-FF}}^2 + \sigma_{\text{Mismatch}}^2} = 19.96ps
\]  

(4.3)
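The arithmetic behind Eq. 4.3 is short enough to check directly. The sketch converts the 114ps SS-to-FF spread into a 1σ value, combines it in quadrature with the 6.13ps mismatch σ from Table 4.10 (which assumes the two contributions are independent), and scales back to a ±3σ spread and a fraction of the 500ps cycle.

```python
import math

CYCLE_PS = 500.0               # period of a 2 GHz clock
SS_TO_FF_SKEW_PS = 114.0       # Table 4.3, taken as a +/-3 sigma spread
SIGMA_MISMATCH_PS = 6.13       # Table 4.10


def root_sum_square(*sigmas):
    """Combine independent standard deviations in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas))


sigma_corner = SS_TO_FF_SKEW_PS / 6.0
sigma_total = root_sum_square(sigma_corner, SIGMA_MISMATCH_PS)
worst_case_ps = 6.0 * sigma_total

print(f"sigma_corner = {sigma_corner:.2f} ps, sigma_total = {sigma_total:.2f} ps")
print(f"worst case = {worst_case_ps:.1f} ps "
      f"({100.0 * worst_case_ps / CYCLE_PS:.1f}% of the cycle)")
```

Because the mismatch term is small relative to the corner term, it adds less than 1ps to the combined standard deviation.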

<table>
<thead>
<tr>
<th>Signal</th>
<th>Mean Skew, \( \mu \)</th>
<th>2 MSEs*</th>
<th>\( \sigma \)</th>
<th>\( S^2 \)**</th>
<th>Error in \( S^2 \)***</th>
</tr>
</thead>
<tbody>
<tr>
<td>outp8</td>
<td>-1.22ps</td>
<td>+/- 1.226ps</td>
<td>6.13ps</td>
<td>37.5344ps²</td>
<td>-8.6ps²/ + 13.12ps²</td>
</tr>
</tbody>
</table>

Table 4.10: Statistics for generated skew in Monte Carlo simulations. All definitions are the same as the previous table.

This estimate of worst-case skew is likely overly pessimistic. According to [30], process corners are typically worst-case unidimensional parameters combined in a certain way (often in a manner that depicts best- or worst-case device performance) and extracted from many wafers. Furthermore, correlations between device parameters are not taken into account, often leading to simulations that are too pessimistic. Semiconductor fabs often consider spatial dependency data proprietary, making it difficult to know whether a single die contains devices that behave according to a single process corner or to multiple process corners. In the best case, where all devices on a die behave according to a single process corner, Monte Carlo simulations provide the best estimate of generated skew. In the worst case, however, where some devices behave according to one process corner, others according to another corner, and yet others according to a third corner, Eq. 4.3 is a better estimator of generated skew.

Even so, Eq. 4.3 remains conservative, because designers are usually not concerned with skew from one corner of the die to the opposite corner. It is rarely the case that a signal travels across the chip in such a fashion, and even if it did, it would most likely not be part of the critical path (the path determining the highest clock frequency at which the chip can operate). Moreover, in today’s high-speed designs, signals traversing the chip take multiple clock cycles to arrive at their destinations and must be resynchronized. Therefore, skew between two opposite corners of a die is far less critical than skew between two adjacent clock domains.

4.3 Environmental Variation

The previous section analyzed the major sources of process variation. However, there are also dynamic sources of variation as seen in Chapter 2; these change and affect the receiver while it is operating. Temperature and power supply variation are the primary sources of environmental variation. The next two subsections will quantify receiver skew due to these sources of variation.

4.3.1 Temperature Variation Analysis ($\Delta T$)

![Output waveforms when subject to temperature variation](image)

Figure 4-10: Output waveforms when subject to temperature variation

Eqs. 2.10 and 2.11 showed that temperature primarily affects threshold voltage and mobility. Since the process-compensated current reference compensates for exactly these two parameters, one would expect the resulting skew to be low. Figure 4-10 shows the output waveforms when the circuit is subject to temperatures as high as +25.2% above ambient temperature (25°C, 298.15K). Table 4.11 quantifies the generated skew.

<table>
<thead>
<tr>
<th>Temperature, K (% change from 298.15K)</th>
<th>Skew (from TT Corner)</th>
</tr>
</thead>
<tbody>
<tr>
<td>273.15K (-8.4%)</td>
<td>-14.0ps</td>
</tr>
<tr>
<td>323.15K (+8.4%)</td>
<td>+15.2ps</td>
</tr>
<tr>
<td>348.15K (+16.77%)</td>
<td>+35.2ps</td>
</tr>
<tr>
<td>373.15K (+25.16%)</td>
<td>+51.5ps</td>
</tr>
</tbody>
</table>

Table 4.11: Generated skew due to temperature variation
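The percentages in the first column of Table 4.11 are relative to the 25°C (298.15K) ambient. The sketch recomputes them and, as a rough summary, takes the end-to-end skew slope across the 0°C to 100°C range; the single-slope summary is an illustration added here, not a figure from the original analysis.

```python
T_NOMINAL_K = 298.15   # 25 C ambient

# Table 4.11: temperature (K) -> skew from nominal (ps); nominal itself is 0 ps.
SKEW_VS_TEMP = {273.15: -14.0, 298.15: 0.0, 323.15: 15.2,
                348.15: 35.2, 373.15: 51.5}


def pct_from_nominal(temp_k):
    """Percent change of an absolute temperature relative to 298.15 K."""
    return 100.0 * (temp_k - T_NOMINAL_K) / T_NOMINAL_K


t_lo, t_hi = min(SKEW_VS_TEMP), max(SKEW_VS_TEMP)
slope_ps_per_k = (SKEW_VS_TEMP[t_hi] - SKEW_VS_TEMP[t_lo]) / (t_hi - t_lo)

for t, skew in sorted(SKEW_VS_TEMP.items()):
    print(f"{t:.2f} K ({pct_from_nominal(t):+.2f}%): {skew:+.1f} ps")
print(f"end-to-end slope ~ {slope_ps_per_k:.3f} ps/K")
```

Across the full 100K range the skew changes by roughly 0.66ps per kelvin, although the data in the table grow somewhat faster at the hot end.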

The receiver performs well with respect to temperature variation; compared to Lum's design in [4], which used a bandgap voltage reference to achieve temperature insensitivity, this design performs slightly better. Because higher temperature degrades gain by reducing carrier mobility, when combined with the SS corner and a lowered $V_{DD}$, the gain of the amplification stage can be degraded so much that the receiver cannot output a signal unless the photocurrent is increased. It is therefore important to simulate all combinations of these conditions and ensure that the optical power at the photodiodes is sufficient for the receiver to output a clock signal even in the worst case. For this design, this requires > 40μA of differential input photocurrent (20μA at each photodiode).

4.3.2 Power-Supply Variation Analysis ($\Delta V_{DD}$)

Variation in the power supply affects many circuit parameters that can adversely impact the operation of the receiver circuit. Figure 4-11 shows the output waveforms when the circuit is subject to +/-10% power supply variation. As can be seen from the figure and Table 4.12, skew due to power supply variation is relatively low (at most ~32ps) even without a regulated power supply, and it may be reduced further by using one. Rather than incorporating a voltage regulator into the circuit, a separate, regulated analog power supply can be used to power all clock receiver circuits on a chip. Since there may be only 16-64 of these circuits on a chip (about 0.2-0.8W in total at the 12.9mW per receiver quoted in Section 4.1), the power required is relatively small. Furthermore, the separate supply helps ensure that digital switching noise coupled onto the main power supply does not easily couple into the analog supply.

<table>
<thead>
<tr>
<th>$\Delta V_{DD}$ (%)</th>
<th>Skew (from TT Corner)</th>
</tr>
</thead>
<tbody>
<tr>
<td>$-0.18V$ (-10%)</td>
<td>$-32.3\text{ps}$</td>
</tr>
<tr>
<td>$+0.18V$ (+10%)</td>
<td>$+19.3\text{ps}$</td>
</tr>
</tbody>
</table>

Table 4.12: Generated skew due to power supply variation
4.4 Noise Analysis

There are many sources of noise in an integrated circuit, including resistor thermal noise (also called Johnson noise), transistor noise sources (thermal noise, shot noise, and 1/f or flicker noise), and white noise caused by external electromagnetic interference (EMI). These noise sources and their effect on the clock receiver circuit are considered in this section.

### 4.4.1 Resistor Thermal Noise (Johnson Noise)

Thermal noise in resistors is a consequence of Brownian motion [33]. Charge carriers in a conductor are thermally agitated and create a randomly varying current, which gives rise to a randomly varying noise voltage across the resistor through Ohm's Law. The mean-square open-circuit noise voltage is given by Eq. 4.4:

$$\overline{e_n^2} = 4kTR\Delta f$$

(4.4)

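where $k$ is Boltzmann's constant, $T$ is the absolute temperature, $R$ is the resistance and $\Delta f$ is the noise bandwidth. The mean-square resistor noise therefore grows linearly with both temperature and resistance.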
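
As a quick numerical check of Eq. 4.4, the Python sketch below evaluates the RMS noise density $\sqrt{4kTR}$ of a resistor. The 300 K temperature and the choice of a 10kΩ resistor (the value used in the amplification-stage low-pass filter described in Chapter 5) are illustrative assumptions, not simulation settings from this thesis.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)

def thermal_noise_density(r_ohms: float, temp_k: float = 300.0) -> float:
    """RMS open-circuit noise voltage density of a resistor in V/sqrt(Hz).

    Eq. 4.4 evaluated for a 1 Hz bandwidth: sqrt(4 k T R).
    """
    return math.sqrt(4.0 * K_B * temp_k * r_ohms)

def thermal_noise_rms(r_ohms: float, bandwidth_hz: float, temp_k: float = 300.0) -> float:
    """RMS noise voltage (V) over a flat bandwidth; white noise scales with sqrt(bandwidth)."""
    return thermal_noise_density(r_ohms, temp_k) * math.sqrt(bandwidth_hz)

# A 10 kOhm resistor at an assumed 300 K
print(f"{thermal_noise_density(10e3) * 1e9:.1f} nV/sqrt(Hz)")  # prints 12.9 nV/sqrt(Hz)
```

Because the density scales as $\sqrt{R}$, a resistor four times larger is only twice as noisy in RMS terms.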

### 4.4.2 Transistor Noise Sources

There are various noise sources in FETs. The four of concern to this thesis are drain current noise, gate noise, shot noise and flicker noise. Drain current noise is very similar to thermal noise in resistors because a transistor channel is essentially a voltage-controlled resistor. As such, Eq. 4.5, which quantifies the drain current noise of FETs, closely resembles Eq. 4.4:

$$\overline{i_{nd}^2} = 4kT\gamma g_{d0}\Delta f$$

(4.5)

where $g_{d0}$ is the drain-source conductance at zero $V_{DS}$, and $\gamma$ is unity at $V_{DS} = 0V$ and 2 to 3 (or more) for short-channel devices in saturation.

Thermal agitation of the channel charge also results in gate noise [33]. This noise is generally small at low frequencies but can dominate at higher (RF) frequencies. It is modelled using Eq. 4.6:

\[ \overline{v_n^2} = 4kT\delta r_g \Delta f \]  

(4.6)

where \( r_g \) is

\[ r_g = \frac{1}{5 g_{d0}} \]  

(4.7)

and \( \delta \) is 4 to 6 for short-channel devices, although its precise behavior in such devices is not well understood.

Shot noise arises from the discrete nature of charge: carriers cross a barrier as independent, randomly timed "bundles" of charge, and this randomness produces white noise. It is described by Eq. 4.8, where \( q \) is the electronic charge and \( I_{DC} \) is the DC current flowing through the transistor.

\[ \overline{i_n^2} = 2qI_{DC}\Delta f \]  

(4.8)

As a reference, the RMS current noise density is approximately 18pA/\( \sqrt{Hz} \) for \( I_{DC} = 1\text{mA} \). In FETs, typically only the DC gate leakage current contributes shot noise. Since this current is usually very small, it is not normally a significant noise source. However, as processes scale and gate oxides become thinner, gate leakage is growing and will contribute more significantly to noise.
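
The 18pA/√Hz reference value can be reproduced directly from Eq. 4.8. The sketch below is a plain evaluation of that formula for a 1 Hz bandwidth; the function name is illustrative.

```python
import math

Q_E = 1.602176634e-19  # elementary charge (C)

def shot_noise_density(i_dc: float) -> float:
    """RMS shot-noise current density in A/sqrt(Hz): Eq. 4.8 with a 1 Hz bandwidth."""
    return math.sqrt(2.0 * Q_E * i_dc)

print(f"{shot_noise_density(1e-3) * 1e12:.1f} pA/sqrt(Hz)")  # prints 17.9 pA/sqrt(Hz)
```

Since the density scales with $\sqrt{I_{DC}}$, a nanoampere of gate leakage contributes about 1000 times less, roughly 0.018pA/√Hz, which is why gate shot noise is usually negligible today.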

The least understood but perhaps most significant noise source is 1/f, or flicker, noise, whose spectral density increases as frequency decreases. Many different phenomena share this 1/f character, but because the mechanism is not well understood, expressions for this noise source rely on empirical parameters. For MOSFETs the mean-square 1/f drain noise current is given by Eq. 4.9:

\[ \overline{i_n^2} = \frac{K}{f} \cdot \frac{g_m^2}{WLC_{ox}^2} \cdot \Delta f \]  

(4.9)

where \( K \) is an empirical constant for a given process. For the TSMC 0.18\( \mu \)m process, \( K \approx 9.3 \times 10^{-25} \text{ C}^2/\text{m}^2 \) for NMOS devices. The process models also include a frequency exponent $A$, giving a $1/f^A$ dependence, with $A \approx 0.83$ for NMOS devices.
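
Eq. 4.9 with the $1/f^A$ modification can be written as a small function. This is a sketch of the expression as stated in this section, not of HSpice's internal flicker-noise model, and the device values in the usage lines ($g_m$, $W$, $L$, $C_{ox}$) are hypothetical placeholders rather than values from this design.

```python
K_FLICKER = 9.3e-25  # C^2/m^2, TSMC 0.18um NMOS value quoted in the text
A_FLICKER = 0.83     # frequency exponent quoted in the text

def flicker_noise_psd(f_hz, gm_s, w_m, l_m, cox_f_per_m2, k=K_FLICKER, a=A_FLICKER):
    """Mean-square flicker drain-noise current per hertz (A^2/Hz).

    Eq. 4.9 with 1/f replaced by 1/f^a: (k / f^a) * gm^2 / (W * L * Cox^2).
    """
    return (k / f_hz ** a) * gm_s ** 2 / (w_m * l_m * cox_f_per_m2 ** 2)

# Hypothetical device, for illustration only
gm, w, l, cox = 1e-3, 10e-6, 0.18e-6, 8.5e-3
for f in (1e3, 1e6, 1e9):
    density = flicker_noise_psd(f, gm, w, l, cox) ** 0.5
    print(f"{f:.0e} Hz: {density:.3e} A/sqrt(Hz)")
```

For a fixed $g_m$, doubling the gate area $WL$ halves the mean-square flicker noise, one reason large devices are preferred in noise-sensitive stages.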

### 4.4.3 External White Noise

Noise due to external electromagnetic interference (EMI) typically has a Gaussian distribution but is difficult to include in simulations. Furthermore, since the receiver in this thesis uses a fully-differential signal path, this noise should be largely rejected because it appears as a common-mode variation. Nevertheless, mismatch degrades this rejection, so some noise will inevitably couple into the signal.

### 4.4.4 HSpice Noise Analysis

![Output (outp8) noise spectral density of receiver circuit](image)

Figure 4-12: Output (outp8) noise spectral density of receiver circuit

The HSpice circuit simulator also performs noise analysis. The noise sources it models are thermal, shot and flicker noise. Noise analysis was performed on the receiver circuit to determine the jitter that results purely from the circuit and device noise sources described above. HSpice outputs a noise spectral density, shown in Figure 4-12, whose $y$-axis has units of $\frac{V}{\sqrt{Hz}}$. Squaring the noise spectral density gives the mean-square noise density, and integrating it over the spectrum gives the mean-square open-circuit noise voltage at the output, as shown in Eq. 4.10:

$$\overline{V_{n, outp8}^2} = \int_{f_{min}}^{f_{max}} N^2(f)\, df$$

(4.10)

where $f_{min} = 1Hz$, $f_{max} = 100GHz$ and $N$ is the noise spectral density output by HSpice. Taking the square root of this quantity gives the equivalent RMS noise voltage at the output of the receiver circuit, $V_{n, outp8}$.
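
The integration in Eq. 4.10 was done in Matlab; the Python sketch below shows one way to perform the same calculation on sampled noise-analysis output using the trapezoidal rule. The function names and the 100-points-per-decade grid are assumptions for illustration, and the flat 1nV/√Hz density in the example is a made-up test input, not the receiver's spectrum.

```python
import math

def rms_from_density(freqs_hz, density_v_per_rthz):
    """RMS noise voltage from a sampled noise spectral density N(f) in V/sqrt(Hz).

    Squares N(f), integrates over frequency with the trapezoidal rule (Eq. 4.10),
    then takes the square root.
    """
    if len(freqs_hz) != len(density_v_per_rthz) or len(freqs_hz) < 2:
        raise ValueError("need at least two matching frequency/density samples")
    mean_square = 0.0
    for i in range(1, len(freqs_hz)):
        df = freqs_hz[i] - freqs_hz[i - 1]
        mean_square += 0.5 * (density_v_per_rthz[i] ** 2 + density_v_per_rthz[i - 1] ** 2) * df
    return math.sqrt(mean_square)

def log_grid(f_min_hz, f_max_hz, points_per_decade=100):
    """Logarithmically spaced frequency grid, similar to a decade-sweep noise analysis."""
    decades = math.log10(f_max_hz / f_min_hz)
    n = int(round(decades * points_per_decade))
    return [f_min_hz * 10 ** (decades * k / n) for k in range(n + 1)]

freqs = log_grid(1.0, 100e9)      # 1 Hz to 100 GHz, the limits used in the text
flat = [1e-9] * len(freqs)        # hypothetical white density of 1 nV/sqrt(Hz)
print(f"{rms_from_density(freqs, flat):.3e} V")  # about 3.162e-04 V
```

Restricting `freqs` to a narrower band gives the more optimistic, band-limited estimate discussed later in this section.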

A worst-case RMS jitter can then be derived from this result using Eq. 4.11 [34].

$$Jitter_{RMS} = \frac{V_{n, outp8}}{SR}$$

(4.11)

where

$$SR = \frac{dv_{outp8}}{dt}$$

(4.12)

Thus, the rise and fall times, which determine the slew rate (SR) of the circuit, must be small to minimize jitter.

The noise spectral density output of HSpice was imported into Matlab and the calculations above were performed to get the results shown in Table 4.13.

<table>
<thead>
<tr>
<th>$V_{n, outp8}$ (V)</th>
<th>Minimum SR (V/s)</th>
<th>RMS Jitter</th>
</tr>
</thead>
<tbody>
<tr>
<td>$88.8 \times 10^{-3}$</td>
<td>$3.151 \times 10^{10}$</td>
<td>$2.82ps$</td>
</tr>
</tbody>
</table>

Table 4.13: Simulated worst-case TT corner RMS noise and jitter
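
Eq. 4.11 and the Table 4.13 entries can be cross-checked with a few lines of Python. The `slew_rate_10_90` helper assumes the slope is taken as 80% of the output swing divided by the 10-90% rise time; that definition and the function names are assumptions made here.

```python
def rms_jitter(v_noise_rms: float, slew_rate_v_per_s: float) -> float:
    """Eq. 4.11: RMS jitter (s) = RMS output noise voltage (V) / slew rate (V/s)."""
    return v_noise_rms / slew_rate_v_per_s

def slew_rate_10_90(v_swing: float, t_rise_s: float) -> float:
    """Average slope over a 10-90% transition: 0.8 * swing / rise time (V/s)."""
    return 0.8 * v_swing / t_rise_s

jitter = rms_jitter(88.8e-3, 3.151e10)  # values from Table 4.13
period = 1.0 / 2e9                      # 2 GHz clock
print(f"{jitter * 1e12:.2f} ps, {100 * jitter / period:.2f}% of the period")
# prints: 2.82 ps, 0.56% of the period
```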

While the worst-case jitter is extremely small at $2.82ps$ (0.56% of the clock period), this method is conservative because it integrates over the entire spectrum. Integrating over a limited but significant range of frequencies yields a more optimistic jitter estimate. Furthermore, SR was computed from the 10-90% rise time, which gives a smaller slope than the instantaneous slope around the switching threshold of an inverter, because the waveform flattens non-linearly as it nears the power-supply rails; the smaller slope again overestimates jitter. However, this analysis does not account for power-supply noise. That noise source is hard to simulate, so in practice jitter is measured with a sampling oscilloscope that captures successive clock cycles and overlays them into an eye diagram, from which worst-case jitter can be determined.

Although jitter due to power-supply noise is hard to model and simulate, some insight can be gained from the skew simulations for power-supply variation. Section 4.3.2 showed that skew due to power-supply variation is $-32.3\text{ps}/+19.3\text{ps}$. If noise exists on the power-supply busses, these skew simulations give some indication of the magnitude of the resulting jitter. In the worst case, an instantaneous 10% drop in the power supply could shift a clock edge by $-32.3\text{ps}$. However, the actual jitter will depend strongly on the frequency content of the supply noise and on the bandwidth of the receiver. Furthermore, much of the noise will appear as a common-mode variation and be rejected by the fully-differential receiver. Nevertheless, power-supply noise is expected to dominate the other noise sources, with worst-case jitter close to the simulated supply-variation skew: roughly $35\text{ps}$, or $\sim7\%$ of the clock cycle. Separate analog and digital power supplies will help reduce noise on the receiver supply and the resulting jitter.
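
To turn Table 4.12 into a rough supply-noise sensitivity, the sketch below linearly interpolates between the simulated points. It assumes a disturbance slow enough that the receiver responds as in the static simulations and ignores common-mode rejection, so it gives a pessimistic, first-order bound rather than a jitter prediction; the function name and the piecewise-linear scheme are assumptions.

```python
# (delta V_DD in volts, skew in ps) from Table 4.12; 0 V / 0 ps is the TT reference point
SKEW_TABLE = [(-0.18, -32.3), (0.0, 0.0), (0.18, 19.3)]

def supply_induced_shift_ps(delta_vdd_v: float) -> float:
    """Piecewise-linear estimate of the clock-edge shift (ps) for a quasi-static
    supply deviation. Values outside the simulated +/-10% range are rejected
    rather than extrapolated."""
    lo_v, hi_v = SKEW_TABLE[0][0], SKEW_TABLE[-1][0]
    if not lo_v <= delta_vdd_v <= hi_v:
        raise ValueError("outside the simulated +/-10% supply range")
    for (v0, s0), (v1, s1) in zip(SKEW_TABLE, SKEW_TABLE[1:]):
        if v0 <= delta_vdd_v <= v1:
            return s0 + (s1 - s0) * (delta_vdd_v - v0) / (v1 - v0)
    raise AssertionError("unreachable: table covers the checked range")

print(f"{supply_induced_shift_ps(-0.001):.2f} ps for a 1 mV droop")
```

For example, a 1mV droop maps to roughly 0.18ps of edge shift (32.3ps × 1mV / 0.18V), which illustrates why only large, low-frequency supply excursions approach the 35ps worst case.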

### 4.5 Summary

This chapter has provided a detailed quantification of the circuit operation. Many sources of clock skew have been identified and analyzed. The circuit topology and architecture have greatly helped in reducing skew overall. A complete noise analysis was done using HSpice and the resultant jitter was analyzed. The simulated worst-case RMS jitter due to inherent thermal, shot and flicker noise at the TT corner is extremely small ($\sim3\text{ps}$). However, it is more likely that power-supply noise will dominate and result in worst-case jitter on the order of $35\text{ps}$. 

Chapter 5

Silicon Layout

Chapter 3 gave the detailed circuit design for the receiver circuit. After the circuit has been designed and simulated, it must be laid out in silicon. This chapter will describe the layout of the circuit for the TSMC 0.18µm process. A brief introduction to the common-centroid layout practice will precede the sections detailing the layout of each component of the receiver.

### 5.1 Common-Centroid Layout, Interdigitation and Dummy Devices

Common-centroid layout is a widely used practice for reducing stress-induced and spatially dependent mismatch. When matched devices (like the two load resistors in the differential pair amplifiers) are divided into sections arranged in a symmetric pattern, the centroid of each device lies at the intersection of its axes of symmetry [35]. It is then possible to arrange two matched devices so that their axes of symmetry and centroids coincide. A common example of such a common-centroid array is shown in Figure 5-1, where devices A and B each contain two identical sections laid out so that they share a common axis of symmetry. Notice also that the two sections of device A lie on the outside of device B. This arrangement of sections is known as an interdigitation pattern, and it is this pattern that produces the common axis of symmetry shared by the devices. The horizontal axis is also shared, but that results from the symmetry of the individual sections rather than from the interdigitation pattern. Since resistors typically have large aspect ratios, they are the only devices in this design that use the one-dimensional pattern shown.
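
The effect of interdigitation on a linear process gradient can be illustrated by computing section centroids. The sketch below treats each letter of a one-dimensional pattern as one section at unit pitch and assumes a purely linear gradient $p(x) = p_0 + gx$; dummy sections and higher-order gradients are not modeled, and the function names are illustrative.

```python
def centroid(pattern: str, device: str) -> float:
    """Mean position (in section pitches) of every section belonging to `device`."""
    positions = [i for i, d in enumerate(pattern) if d == device]
    return sum(positions) / len(positions)

def gradient_mismatch(pattern: str, slope: float = 1.0) -> float:
    """Difference between the average parameter of A and B sections under a linear
    gradient p(x) = p0 + slope * x. The offset p0 cancels, so it is omitted."""
    return slope * (centroid(pattern, "A") - centroid(pattern, "B"))

for p in ("ABBA", "AABB"):
    print(p, centroid(p, "A"), centroid(p, "B"), gradient_mismatch(p))
# ABBA 1.5 1.5 0.0
# AABB 0.5 2.5 -2.0
```

The interdigitated $ABBA$ arrangement places both centroids at the same point, so a linear gradient affects A and B equally, whereas the non-interdigitated $AABB$ arrangement converts the same gradient directly into mismatch.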

![Common axis of symmetry](image)

Figure 5-1: Example of a common-centroid array

Matching MOS transistors usually involves interdigitation of the devices into fingers and a two-dimensional common-centroid pattern as shown in Figure 5-2.

By sharing a common centroid, devices become less sensitive to process biases, stress gradients and package shifts. However, diffusion and etch effects require the use of dummy gates (also shown in Figure 5-2). Polysilicon does not etch uniformly and tends to overetch near the edges of a large poly opening. Dummy gates ensure that only they experience this overetch, so the fingers of the actual devices etch uniformly and mismatch is reduced. Furthermore, placing fingers of each device in opposite corners achieves both horizontal and vertical symmetry, further reducing mismatch. It is important to note that the transistors must share a common node. If they do not, a dummy gate can be placed between the sections so that each section shares a node with a dummy gate. The simplest use of this pattern is for matching pairs of transistors.

### 5.2 Input Stage Layout

Careful layout of the input stage is critical to correct receiver functionality. Because there is a large amount of gain after the input stage, the two halves of the input stage must be as close to identical as possible to minimize mismatch and the resulting offset. The layout of this stage is shown in Figure 5-4, with a guide to the layout shown in Figure 5-3. The circuit schematic is shown in Figure 3-7 for reference.

![Figure 5-3: Guide to input stage layout](image)

Since all devices need to be matched between the halves, an aggressive common-centroid layout pattern is used. This explains the dual rows of fingers for each set of matching devices. The large metal areas surrounding the devices are needed for low-resistance paths to the power and ground rails. Furthermore, biasing devices that need to be matched to the FET current sources are incorporated into the common-centroid layout patterns. It is critical that this stage is kept far from any on-chip power devices or any other devices with large swing and large currents, as noise on the power rails greatly affects this stage due to the small signal amplitudes at the input. Overall receiver sensitivity will be determined primarily by the noise affecting this stage.

### 5.3 Amplification Stage Layout

There are three main components in the amplification stage: 1) differential pair amplifiers with resistive loads; 2) differential pair op-amp; and 3) low-pass filter.
The layout of each of these components will be discussed before the layout of the complete amplification stage.

### 5.3.1 Differential Pair Amplifier Layout

Since the differential pair amplifier is replicated four times, it is laid out once and then copied three additional times in the layout. This modular approach simplifies layout and ensures that all four replicates are identical. The layout of this amplifier also uses common-centroid layout for matching of the input pair, the load resistors and the current source FETs at the tail, as shown in Figure 5-5.

At the top are the load resistors, each divided into two sections in the $ABBA$ pattern, with dummy resistors on both sides to prevent mismatch due to poly etch variation. The transistors immediately below resistors Rn2 and Rp2 are the input pair transistors. They are laid out in the $sD_a sB_D D_s/sD_a B_s A_D D_s$ two-dimensional common-centroid pattern, where the large $D$ stands for a dummy device. Each device has two fingers. To the left of those devices are the two large biasing devices, each with ten fingers laid out in the same orientation as the input pair.

### 5.3.2 Differential Pair Op-Amp and Low-Pass Filter Layout

Section 3.4.2 detailed the need for replica feedback biasing. The layout of the op-amp that makes up this biasing scheme is shown in Figure 5-6. The layout of this component is very similar to that of the differential pair amplifier. The primary difference between the two is that the op-amp is actively loaded with PMOS transistors rather than resistors. To ensure that the DC bias point of the last gain stage is exactly the
low-pass filter occupies the lower half of the layout. The two 10kΩ resistors are each divided into four sections and are laid out in an AABBBBAA pattern on the lower left of the layout, surrounded by dummy resistors. The 1pF MOS capacitor is adjacent to the resistors on the right. A MOS triode device directly above the capacitor provides further attenuation of the signal. The reference inverter is in the upper right corner of the layout, adjacent to the op-amp, with the last amplifier occupying the upper left corner.

### 5.3.3 Layout of the Complete Amplification Stage

The layout of the complete amplification stage is shown in Figure 5-8. The layout is very simple, with all the gain stages laid out adjacent to each other. The outputs of each gain stage are connected to the inputs of the following stage using metal 2 wires.

Figure 5-8: Complete layout of the amplification stage

### 5.4 Output Stage Layout

The output stage consists of three inverters for the positive half of the differential signal and three for the negative half. To maintain consistency and minimize mismatch even after the analog amplification of the signals, these inverters are also laid out in a common-centroid fashion. The layout of the entire output stage is shown in Figure 5-9.

Three distinct sets of devices are visible in the layout. The leftmost set corresponds to the first set of inverters. Notice that the first set contains additional PMOS devices; these are the two MOS triode devices used for resistive feedback around the first inverter on each side. Two additional dummy triode devices are laid out for symmetry and better matching. The second and third sets correspond directly to the second and third sets of inverters on each side. Each set of matched devices (both PMOS and NMOS) is laid out in the $s_Ds_As_BsDs/sDsBsAsDs$ common-centroid pattern. The wiring that connects each stage to the next is also laid out symmetrically so that the positive and negative signals see equivalent load and resistance.

### 5.5 Layout of Offset Compensation Circuitry

The offset compensation circuitry is relatively simple at the circuit level. However, its layout must be done very carefully to ensure that offsets due to mismatch within the offset compensation circuitry itself are extremely small. Long-channel devices are used to increase device area, making the devices less susceptible to variation and mismatch. Furthermore, common-centroid layout is used, as seen in Figure 5-10, which shows the layout of the differential pair with its biasing transistors. The on-chip MOS triode devices that act as resistors are shown in Figure 5-11. The capacitors that make up the low-pass filter are too large to be laid out on-chip and are therefore external.
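
The benefit of larger-area devices can be quantified with the widely used Pelgrom mismatch model, in which the standard deviation of threshold-voltage mismatch between two identically drawn transistors scales as $A_{VT}/\sqrt{WL}$. That model is not otherwise used in this thesis, and the coefficient $A_{VT} = 5\,\text{mV}\cdot\mu\text{m}$ below is a hypothetical placeholder, not a TSMC 0.18μm value.

```python
import math

A_VT_MV_UM = 5.0  # hypothetical mismatch coefficient (mV*um); substitute the foundry value

def sigma_delta_vt_mv(w_um: float, l_um: float, a_vt_mv_um: float = A_VT_MV_UM) -> float:
    """Standard deviation (mV) of the threshold-voltage difference between two
    identically drawn transistors under the Pelgrom model: A_VT / sqrt(W * L)."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)

short_ch = sigma_delta_vt_mv(2.0, 0.18)  # minimum-length device
long_ch = sigma_delta_vt_mv(2.0, 2.0)    # long-channel device of the same width
print(f"L = 0.18 um: {short_ch:.1f} mV, L = 2 um: {long_ch:.1f} mV")
```

Under this model, quadrupling the gate area halves the expected threshold mismatch, which is the trade of area for precision made in the offset compensation circuitry.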

### 5.6 Layout of Biasing Circuitry

The only remaining circuitry to be laid out is the biasing circuitry. Matching in this circuitry is not as critical as in the signal path. The long channel lengths (> 2μm) already reduce the circuit's sensitivity to variation and mismatch. Additionally, the only devices being matched have different widths, so common-centroid layout is less applicable here. Of concern, however, are the PMOS devices whose body connections do not connect to $V_{DD}$. These are called “hot-well” devices, and precautions must be taken in their layout to prevent latch-up: they need large guard rings surrounding them and must be spaced at least 25μm away from any other devices. The layout of the biasing circuitry is shown in Figure 5-13, with a guide to the layout in Figure 5-12.

![Layout of the MOS Triode resistors in the offset compensation circuitry](image)

Figure 5-11: Layout of the MOS Triode resistors in the offset compensation circuitry

Figure 5-12: Guide to layout of biasing circuitry (regions: current mirror network; process-compensated current reference and $V_i$ generator, without hot n-well transistors; hot n-well PMOS transistors with guard rings)

### 5.7 Layout of the Complete Receiver Circuit

The complete receiver is obtained by carefully laying out all the components described above and connecting them. The important constraints are that 1) connections between stages be short so that parasitics are limited; 2) connections between stages be balanced so that the positive and negative signals see exactly the same load; 3) overall area be minimized; and 4) wide power and ground busses be used to minimize IR drops and signal coupling. Figure 5-14 shows the layout of the complete receiver with an overlaid guide to each of the components.

As the layout shows, each stage in the signal path is laid out so that its outputs are as close as possible to the inputs of the next stage. Although it is hard to see in Figure 5-14, the wires connecting stages are as short as possible and balanced. The overall area of the circuit is $125.77\mu m \times 114.2\mu m$, or $0.01436mm^2$.

The power and ground busses are each $13\mu m$ wide. Since the circuit dissipates approximately $13mW$ from the 1.8V supply, it draws about $7.2mA$. To satisfy electromigration constraints, the busses need to be at least $1\mu m$ wide for every $1mA$ of current. Sizing them at 13μm is therefore sufficient for the 7.2mA the circuit draws and ensures that any IR drops will be small.

Figure 5-14: Layout of complete receiver circuit
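
The bus-sizing arithmetic can be written out explicitly. The 1.8V nominal supply is inferred from Table 4.12 (±0.18V corresponds to ±10%), and the variable names are illustrative.

```python
VDD_V = 1.8          # nominal supply implied by Table 4.12 (0.18 V = 10%)
POWER_MW = 13.0      # approximate dissipation quoted in the text
EM_MA_PER_UM = 1.0   # electromigration rule quoted in the text: 1 mA per um of width
BUS_WIDTH_UM = 13.0  # drawn power/ground bus width

current_ma = POWER_MW / VDD_V               # supply current drawn by the receiver
min_width_um = current_ma / EM_MA_PER_UM    # narrowest bus that meets the rule
margin = BUS_WIDTH_UM / min_width_um        # headroom of the drawn width over the minimum
print(f"I = {current_ma:.1f} mA, min width = {min_width_um:.1f} um, margin = {margin:.1f}x")
# prints: I = 7.2 mA, min width = 7.2 um, margin = 1.8x
```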

### 5.8 Layout Extraction and Verification

Once the circuit has been laid out, the connectivity and functionality must be verified. To do this, the laid out circuit is extracted and then compared to the schematic. The process of extraction results in a netlist of devices (transistors, resistors and capacitors) and their connectivity. This netlist is then input to the LVS (Layout vs. Schematic) tool that is part of the Cadence framework. The LVS tool compares the extracted netlist with that of the schematic. If any differences between the two are found, a report is generated summarizing the differences.

![Figure 5-15: Extracted output waveforms to verify functionality](image)

Once the layout matches the schematic in terms of connectivity, the extracted circuit must be simulated to verify correct functionality. The extraction tool also includes distributed parasitics (resistors and capacitors) from the layout in the extracted netlist. This netlist was simulated in HSpice, and Figure 5-15 shows the resulting output waveforms. The output is as expected, except that the rise and fall times are slightly slower because of the parasitics. With functionality verified, the layout is complete.

It is important to note that during initial layout and extraction cycles, the clock signal was observed coupling into the power and ground supplies. While the resulting supply ripple is small (~1mV p-p), it affects the input stage greatly. The ripple causes the biasing currents and voltages to ripple slightly as well, which in turn affects the large FET current sources in the input stage. Because these transistors are so large, even a 0.5mV ripple can cause a current variation of ~1μA. If the input photocurrent is 5μA, this variation is approximately 20% of the input signal, which can prevent the receiver from recovering the clock signal. To ensure that the signal is recoverable, the input photocurrent needs to be at least 10μA. As was discussed in Chapter 4, noise at the input stage will be the limiting factor in receiver sensitivity. With an input photocurrent of 10μA peak-to-peak, the recovered signal is very clean despite the coupling of the clock into the power supplies.
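
The 20% figure follows from treating the ripple-to-current conversion as linear. The sketch below back-computes the effective transconductance implied by the quoted numbers (1μA per 0.5mV, i.e. 2mS) and applies it to the two photocurrents discussed; the linear small-signal assumption and the name `GM_EFF` are introduced here for illustration.

```python
def ripple_fraction(ripple_v: float, gm_s: float, i_photo_a: float) -> float:
    """Ripple-induced current as a fraction of the photocurrent, assuming the ripple
    reaches the current source as a small-signal change: delta_I = gm * ripple."""
    return gm_s * ripple_v / i_photo_a

GM_EFF = 1e-6 / 0.5e-3  # 2 mS, implied by "0.5 mV ripple -> ~1 uA" in the text

for i_pd in (5e-6, 10e-6):
    print(f"{i_pd * 1e6:.0f} uA -> {100 * ripple_fraction(0.5e-3, GM_EFF, i_pd):.0f}%")
# 5 uA -> 20%
# 10 uA -> 10%
```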

### 5.9 Summary

This chapter has walked through the layout of each component in the receiver circuit. An overview of common-centroid layout practices was also presented. The Cadence LVS tool aided in verifying connectivity of the laid out circuit, and simulations of the extracted circuit verified functionality. With the exception of minor variation in the rise/fall times and duty cycle, the laid out circuit performs as expected.

Chapter 6

Test Chip

With the layout of the circuit complete, a strategy for testing the circuit once it is fabricated is necessary. This chapter will explore various circuit variants and outline a basic procedure for testing the circuit. Since the objective is to verify basic circuit functionality, all variants will be dedicated to this rather than measuring actual skew or jitter and comparing the results to those of simulation.

### 6.1 Demonstrating Basic Functionality

The easiest way to demonstrate that the circuit functions is to modulate a laser at $2GHz$ (preferably with sinusoidal modulation, although square-wave modulation will also work) and observe the outputs. However, viewing the outputs at the signal frequency is difficult because of the large capacitances and inductances involved in driving a signal off-chip. One possible approach is to drive the load with a chain of cascaded inverters, although the last few inverters would have to be extremely large to drive the pad and pin capacitances. A second, more feasible approach is to divide the clock signal down to a much slower frequency using a number of flip-flops, as shown in Figure 6-1. Once the clock signal has been divided down to the $10MHz - 100MHz$ range, it can be driven off-chip by large inverter buffers.
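
Assuming each flip-flop in Figure 6-1 is wired as a toggle (divide-by-2) stage, which is an assumption since the figure is not reproduced here, the number of stages that lands the output in the 10MHz to 100MHz window can be enumerated. The function name is illustrative.

```python
def divider_stages(f_in_hz: float, f_lo_hz: float, f_hi_hz: float) -> list[tuple[int, float]]:
    """Return (n, f_out) for every chain of n divide-by-2 stages whose output
    frequency falls inside [f_lo_hz, f_hi_hz]."""
    result = []
    n = 0
    f_out = f_in_hz
    while f_out >= f_lo_hz:
        if f_out <= f_hi_hz:
            result.append((n, f_out))
        n += 1
        f_out = f_in_hz / 2 ** n
    return result

print(divider_stages(2e9, 10e6, 100e6))
# [(5, 62500000.0), (6, 31250000.0), (7, 15625000.0)]
```

Under this assumption, five stages is the minimum; each additional stage halves the off-chip frequency and eases the output-driver requirements.
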

### 6.1.1 Optical Setup

The optical setup needed for testing must be very precise. All light directed at the photodiodes needs to be focused onto them with lenses so that as little optical power as possible is wasted. Since the optical signal must be differential, the negative optical signal can be created by splitting the original signal with a beam splitter and delaying one of the two beams. This delay is achieved simply by giving the negative beam a longer path. The second path must be $7.495\text{cm}$ ($250\text{ps} \times c$) longer than the first to obtain a signal that is $180^\circ$ out of phase at $2\text{GHz}$. Both beams must then be focused onto their respective photodiodes. The complete optical setup will look similar to Figure 6-2.
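
The 7.495cm figure is the distance light travels in half a clock period, $\Delta L = c/(2f)$. The sketch below computes it; the optional refractive-index argument is an addition made here, and the default $n = 1$ treats the air path as vacuum.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (m/s)

def half_period_path_difference_m(f_hz: float, n_medium: float = 1.0) -> float:
    """Extra path length (m) that delays one beam by half a clock period."""
    half_period_s = 1.0 / (2.0 * f_hz)
    return half_period_s * C_M_PER_S / n_medium

print(f"{half_period_path_difference_m(2e9) * 100:.3f} cm")  # prints 7.495 cm
```

Using $n \approx 1.0003$ for air shortens the required extra path by only about 22μm.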

All the optical components (laser, beam splitter, mirrors, and lenses) should be situated on a stable optical bench so that precise focusing can be achieved. Furthermore, all components should be precisely tunable since the photodiodes are only $40\mu m \times 40\mu m$ with a spacing of approximately $50\mu m$.

### 6.1.2 Electrical Testing

Since the photodiode integration process may take weeks to months to complete after the test chip returns from fabrication, electrical testing is desirable to prove functionality before the photodiodes are integrated. To accomplish this, a simple modification can be made to the input stage of the circuit: each photodiode is replaced by a PMOS transistor. Because a transistor's drain current is controlled by its gate voltage, a voltage applied to the gate of a correctly sized transistor produces a drain-source current equivalent to that generated by a photodiode. This is shown in Figure 6-3. The sources of the transistors are connected to $V_{DD}$ and the drains are connected to $\text{in}_p$ and $\text{in}_n$, the nodes the photodiodes would normally connect to. To maintain the same voltage at those nodes and source a current similar to that generated by the photodiodes, these input transistors should be sized at $\frac{W}{L} = \frac{3\mu m}{0.36\mu m}$. Each transistor should be driven with a $40mV_{pp}$ signal biased around 0.8V to produce $\sim10\mu A$ of current. Table 6.1 gives the appropriate signal amplitudes and biasing to generate a desired current at the TT corner.

In general, doubling the peak-to-peak voltage doubles the current. As the input becomes larger, transistor non-linearities increasingly affect the generated current, so the bias point must be shifted to compensate.

The one drawback of this approach is that it requires two input signals to be brought in from off-chip, and ensuring that they are exactly $180^\circ$ out of phase is difficult. Instead, on-chip single-ended to differential conversion circuitry has been designed. A simple differential pair amplifier is used, with the negative input being the DC value of the positive input (obtained through a low-pass filter), as shown in Figure 6-4.

The two outputs of the first differential amplifier differ in amplitude, but their phases are within $\pm 5^\circ$ of being $180^\circ$ apart. To balance the amplitudes, a second differential amplifier is added. While the outputs of this second amplifier are still asymmetric, the amplitudes of the positive and negative signals are closer to equal. Simulations show that the offset compensation circuitry already integrated into the receiver helps correct the offset caused by the imprecise single-ended to differential conversion.

The outputs are then used to drive the PMOS transistors that replace the photodiodes in the input stage. The differential pair amplifiers are designed to act as voltage buffers and have very little gain. Because of the parasitics in this stage and in the subsequent PMOS voltage-to-current conversion transistors, this gain is not sufficient, and extracted simulations show that the amplitude of the input to this conversion circuitry needs to be in the 0.1V to 1.0V range, depending on the process corner. A bias current of 750μA must also be provided for the differential amplifiers, although a slight modification to the biasing circuitry discussed in Section 3.7 enables on-chip generation of this bias current and saves a pin.

### 6.2 Circuit Variants

In order to prove functionality despite the presence of variation, a number of circuit replicates should be put on a single chip. In addition, there are some circuit variants that should be placed on the chip to isolate problems.

### 6.2.1 Circuit Without Biasing

To ensure that the biasing circuitry functions and is not a source of problems, one variant will be the complete circuit without the process-compensated current reference or the $V_i$ generator. The current mirror network will still be present. This variant requires an external voltage of $0.8925V$ to be applied to a single transistor in the current mirror network. The voltage must be extremely stable and accurate to the nearest 1mV.

### 6.2.2 Circuit Without Offset Compensation

A second variant should omit the integrated offset compensation circuitry, which was added to correct for mismatch in the transistors, resistors and input signals. This variant will show whether the circuit functions without offset compensation.

### 6.2.3 Optical vs. Electrical Variants

As was mentioned in Section 6.1.2, the photodiode integration process may take a few weeks to a few months to complete. A number of replicates of the circuit should be dedicated to optical testing, with photodiodes integrated, and a number should be dedicated to electrical testing with a voltage input as discussed previously. One of the electrical variants should also accept current inputs, although it is very difficult to modulate currents externally.

The number of replicates of each type will be determined by the number of pads available on the test chip. Since all variants (except the one without offset compensation) require two pads for the external capacitors and two pads for the outputs, at least four pads are needed for each replicate. The electrical variants need at least one more pad for the input signal. Because so many pads are needed per replicate, a multiplexor will be needed to fit a sufficient number of replicates without becoming pad-limited.

### 6.3 Testing Summary

This chapter has discussed methods of testing the circuit, primarily for functionality. Once functionality has been proven, skew and jitter measurement (perhaps using a scheme such as the one presented in [36]) can be the subject of a future generation of the chip. Variants of the circuit have been discussed to isolate and prove the functionality of various components of the receiver circuit. Lastly, attention should be paid to the number of pads available on the chip; a multiplexing scheme will be used to circumvent the pad limitation.

Chapter 7

Final Remarks

This thesis has presented a fully-differential optical receiver circuit for use in an optical clocking application. Significant improvements in simulated skew and jitter have been made relative to previous generations and designs. The circuit has been laid out, extracted and simulated successfully. A test chip remains to be laid out and fabricated.

### 7.1 Issues Yet To Be Resolved

Though significant improvements have been seen in simulated skew and jitter, there are a number of issues yet to be studied before the advantage of optical clocking is proven and/or realized.

### 7.1.1 Skew and Jitter

While simulated skew has been shown to be approximately 120ps, this remains high for high-frequency clock applications. If the clock runs at 2GHz (a period of 500ps), this skew corresponds to 24% of the clock cycle. The larger this percentage, the larger the timing margins that designers must build into their logic, and the harder that logic becomes to design. Furthermore, the reported jitter numbers are comparable to those of modern electrical clock networks. Even if active-deskewing methods are used in conjunction with optical clock networks, jitter remains a problem. While the analysis in Chapter 4 computed jitter in this receiver to be 6% of the clock period, scaling to higher frequencies will cause jitter to become a larger percentage of the clock period unless its absolute value shrinks accordingly, which is unacceptable.
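The percentages above follow from dividing an absolute timing error by the clock period. The sketch below repeats that arithmetic at a few frequencies, assuming (as the scaling argument implicitly does) that the absolute skew and jitter do not shrink as the clock speeds up. The 30ps jitter value is back-computed from the 6%-of-500ps figure rather than taken from a separate measurement.

```python
# Express a fixed absolute timing error as a fraction of the clock period.
# Assumption: absolute skew and jitter stay constant as frequency scales.

SKEW_PS = 120.0   # simulated receiver skew quoted in Section 7.1.1
JITTER_PS = 30.0  # 6% of a 500 ps period, back-computed from the Chapter 4 figure


def fraction_of_period(t_ps: float, f_ghz: float) -> float:
    """Return t_ps divided by the clock period at f_ghz."""
    period_ps = 1000.0 / f_ghz
    return t_ps / period_ps


if __name__ == "__main__":
    for f_ghz in (2.0, 4.0, 5.0):
        skew = 100.0 * fraction_of_period(SKEW_PS, f_ghz)
        jitter = 100.0 * fraction_of_period(JITTER_PS, f_ghz)
        print(f"{f_ghz:.0f} GHz: skew {skew:.0f}% of period, jitter {jitter:.0f}% of period")
```

At 4GHz the same 120ps of skew would consume 48% of the cycle, and the 30ps of jitter would grow from 6% to 12% of the period.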

7.1.2 Power Consumption

As power consumption becomes a larger issue in designs, how the optical clocking alternative compares to an electrical distribution scheme must be analyzed. Since the sensitivity of the receiver is primarily determined by noise considerations, the optical power needed to overcome noise will be larger than the minimum power estimated from the amplifier gain alone. If the power required by the optical network plus the power required by all receivers on chip exceeds that of the corresponding electrical circuitry at those levels, the potential power savings of optical clocking will not be realized [37].
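The break-even condition above can be written as a first-order budget. In the following sketch the source's electrical draw is the optical power delivered to all detectors, divided by the distribution efficiency and by the source's electrical-to-optical efficiency. The receivers' electrical power is then added. The parameter names and every number in the example (64 receivers, 50µW per detector, 10% link efficiency, 20% source efficiency, 5mW per receiver) are hypothetical, chosen only to show the bookkeeping.

```python
from dataclasses import dataclass


@dataclass
class OpticalClockBudget:
    """First-order power model of an optical clock network (hypothetical parameters)."""
    n_receivers: int
    p_opt_per_rx_w: float     # optical power each photodetector must receive (noise-limited)
    link_efficiency: float    # fraction of emitted optical power that reaches the detectors, 0-1
    source_efficiency: float  # electrical-to-optical conversion efficiency of the source, 0-1
    p_rx_elec_w: float        # electrical power drawn by one receiver circuit

    def source_power_w(self) -> float:
        """Electrical power the optical source draws to supply every detector."""
        emitted = self.n_receivers * self.p_opt_per_rx_w / self.link_efficiency
        return emitted / self.source_efficiency

    def total_w(self) -> float:
        """Source power plus the power of all receivers on chip."""
        return self.source_power_w() + self.n_receivers * self.p_rx_elec_w


def optical_saves_power(budget: OpticalClockBudget, p_electrical_w: float) -> bool:
    """True only if the optical alternative draws less than the electrical levels it replaces."""
    return budget.total_w() < p_electrical_w


if __name__ == "__main__":
    budget = OpticalClockBudget(64, 50e-6, 0.1, 0.2, 5e-3)
    print(round(budget.total_w(), 3))  # 0.48 W with these illustrative numbers
```

In this illustrative case, the receivers (0.32W) rather than the source (0.16W) dominate the total. This is one reason the low-power receiver techniques discussed in Section 7.2.2 matter.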

The number of levels of the clock distribution network that optical clocking permeates will be limited by power considerations. In the limit, all the power in the clock distribution network could be in the form of either high-velocity electrons or even higher-velocity photons. One hopes that the use of photons reduces the overall power budget, as photons are theoretically less susceptible to electromagnetic interference and variation.

7.1.3 Variation in the Optical Network

Chapter 2 showed that many sources of variation exist even in the optical network. Unfortunately, these variation sources have not yet been sufficiently characterized. Future work may reveal that they are detrimental to the advancement of optical clocking networks unless integrated optical components mature to the point where the potential advantage is realizable.
7.2 Future Work

Even with the incremental improvements in receiver design presented here, a number of areas remain in need of research. These areas range from further improvements in receiver circuit design to system-level modelling and characterization. This section presents a few ideas for future research.

7.2.1 Variation As a Useful Tool

Chapter 3 discussed the use of a process-compensated current reference. Section 3.7 showed that while this current reference was intended to generate a stable and robust current, the generated current actually varied with process variation, in a manner that aided overall receiver robustness. A possible future area of research is the design of circuits that vary in a manner that counteracts variation elsewhere in the circuit. For example, the current reference used in this design could be modified to vary even more, especially at the SS corner. Variation is here to stay; techniques that harness variation to counteract variation may prove extremely useful.
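As a minimal illustration of variation that counteracts variation, consider holding an amplifier's transconductance constant across corners. Under the long-channel square-law model, gm = sqrt(2k'(W/L)I_D). If the process parameter k' drops at a slow corner, raising the bias current in proportion to 1/k' restores gm. The sketch below assumes the square-law model, which is only a rough approximation in deep sub-micron processes. It also uses hypothetical k' and bias values, and it does not describe the behavior of the Intel current reference.

```python
import math


def gm_square_law(kprime: float, w_over_l: float, i_d: float) -> float:
    """Long-channel saturation transconductance, gm = sqrt(2 k' (W/L) I_D)."""
    return math.sqrt(2.0 * kprime * w_over_l * i_d)


def compensating_current(i_typ: float, kprime_typ: float, kprime_corner: float) -> float:
    """Bias current that restores the typical-corner gm when k' shifts.

    Holding gm constant in gm^2 = 2 k' (W/L) I_D requires I_D to scale as 1/k'.
    """
    return i_typ * kprime_typ / kprime_corner


if __name__ == "__main__":
    W_OVER_L = 20.0
    KP_TT, KP_SS = 200e-6, 160e-6  # hypothetical k' (A/V^2) at the TT and SS corners
    I_TT = 1e-3                    # hypothetical typical bias current (A)
    i_ss = compensating_current(I_TT, KP_TT, KP_SS)
    print(f"SS bias: {i_ss * 1e3:.2f} mA")
    print(gm_square_law(KP_SS, W_OVER_L, I_TT), gm_square_law(KP_SS, W_OVER_L, i_ss))
```

A reference whose output rises by about 25% at this hypothetical SS corner would exactly cancel the 20% drop in k' as far as gm is concerned. This is the kind of deliberate, directed variation suggested above.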

7.2.2 Low-Power Circuit Techniques

The general consensus in the circuit design realm is that low power and high speed do not go together. However, as power budgets seemingly grow without bound, clock frequencies will be limited by heat-removal concerns. In order to demonstrate an advantage for optical clocking, minimal skew and jitter are a necessary but not sufficient condition. Receiver circuits that combine low-power and high-speed techniques while remaining robust to variation must be developed. Furthermore, power must not simply be redistributed from the electrical portion of the distribution network to the optical portion: there is a limit to the optical power that optical sources, especially on-chip sources, can generate. As a result, low-power receiver circuits must retain high sensitivity, which leads into the next area of future work.
7.2.3 Low-Noise Amplifiers

The sensitivity of optical clock receiver circuits is not determined by the amount of realizable gain, since an arbitrary number of gain stages can be added to realize the necessary gain. Rather, noise at the receiver input limits sensitivity. Radio frequency (RF) circuits have long used low-noise amplifiers (LNAs) as their input stages. Integrating such amplifiers into the optical receiver architecture should be explored, as they may prove extremely useful in improving receiver sensitivity and jitter.
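The Friis formula from RF design makes the argument above quantitative. The total noise factor of a cascade is F1 + (F2 - 1)/G1 + (F3 - 1)/(G1 G2) + ..., so each later stage's noise is divided by the gain ahead of it. The first stage therefore dominates. Optical receivers are more commonly characterized by input-referred noise current than by noise factor, so the following is an analogy under that assumption rather than a receiver model. The stage noise factors and gains are hypothetical.

```python
def cascaded_noise_factor(stages):
    """Friis formula for a cascade of stages.

    stages: sequence of (noise_factor, power_gain) pairs, both linear (not dB),
    ordered from input to output.
    """
    total = 0.0
    gain_before = 1.0  # product of the gains of all preceding stages
    for index, (noise_factor, gain) in enumerate(stages):
        if index == 0:
            total += noise_factor
        else:
            total += (noise_factor - 1.0) / gain_before
        gain_before *= gain
    return total


if __name__ == "__main__":
    plain = [(4.0, 10.0), (4.0, 10.0), (4.0, 10.0)]     # hypothetical gain stages only
    with_lna = [(1.5, 10.0), (4.0, 10.0), (4.0, 10.0)]  # hypothetical LNA as the first stage
    print(cascaded_noise_factor(plain), cascaded_noise_factor(with_lna))
```

Adding more gain stages to the first cascade barely changes its noise factor (it approaches about 4.33), whereas replacing only the first stage with a quieter one lowers it to about 1.83.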

Additionally, techniques to reduce receiver sensitivity to power-supply noise and variation should be explored. Increasingly, power-supply noise is limiting the jitter performance of electrical clock distribution networks. Realizing receiver circuits that are robust to this noise source will be a significant aid to demonstrating the advantage of the optical alternative.
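A common first-order way to relate power-supply noise to jitter is a linear delay sensitivity: each stage's delay shifts by (∂t_d/∂V_DD)·ΔV_DD. The sketch below assumes this linear model. It contrasts the case where every stage sees the same supply disturbance (the shifts add linearly) with independent per-stage noise (the shifts add in root-sum-square). The sensitivities and noise amplitudes are hypothetical, not values from the Chapter 4 analysis.

```python
import math


def supply_jitter_correlated_ps(sensitivities_ps_per_mv, supply_noise_mv):
    """Timing error when every stage sees the same supply disturbance.

    Linear model: each stage's delay shifts by its sensitivity times the supply
    deviation, so the shifts add along the path. supply_noise_mv may be a peak
    or an RMS value; the result is in the same sense.
    """
    return abs(sum(sensitivities_ps_per_mv)) * supply_noise_mv


def supply_jitter_uncorrelated_ps(sensitivities_ps_per_mv, noise_mv_per_stage):
    """RMS timing error when each stage sees independent supply noise (root-sum-square)."""
    return math.sqrt(sum((s * n) ** 2 for s, n in zip(sensitivities_ps_per_mv, noise_mv_per_stage)))


if __name__ == "__main__":
    sens = [0.1, 0.1, 0.1]  # hypothetical delay sensitivities (ps/mV) of three stages
    print(supply_jitter_correlated_ps(sens, 50.0))
    print(supply_jitter_uncorrelated_ps(sens, [50.0, 50.0, 50.0]))
```

Under these assumptions, a shared 50mV disturbance produces about 15ps of timing error, versus about 8.7ps if the same amplitude were independent at each stage. Reducing each stage's delay sensitivity attacks both cases directly.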

7.2.4 Alternative Offset Compensation Techniques

This thesis presented a simple design for offset compensation. The other commonly used and studied offset compensation techniques all have undesirable properties that detract from their use in this design. While simulations show that the offset compensation circuitry works well, the large off-chip capacitors necessary for proper functionality hinder complete integration of this receiver. Circuit techniques that do not require large passive devices should be investigated, such as signal processing techniques for low-pass filtering and digital feedback mechanisms. However, the discrete nature of these systems may introduce significant complications.
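One form of the digital feedback mentioned above is a sign-sign (bang-bang) loop. A comparator reports only the sign of the residual offset, and a trim code driving a hypothetical offset-trim DAC steps one LSB against it each cycle. The sketch below assumes an ideal, noise-free comparator and a 6-bit signed trim range; both are illustrative assumptions, not a proposed implementation.

```python
def digital_offset_trim(offset_mv, lsb_mv, n_bits=6, iterations=200):
    """Sign-sign digital offset cancellation with an ideal (noise-free) comparator.

    Each iteration the comparator reports only the sign of the residual offset,
    and a signed trim code (driving a hypothetical trim DAC) steps one LSB to
    oppose it. Returns the final code and the residual offset in mV.
    """
    code_min = -(2 ** (n_bits - 1))
    code_max = 2 ** (n_bits - 1) - 1
    code = 0
    for _ in range(iterations):
        residual = offset_mv - code * lsb_mv
        step = 1 if residual > 0 else -1
        code = max(code_min, min(code_max, code + step))  # the DAC range is finite
    return code, offset_mv - code * lsb_mv


if __name__ == "__main__":
    print(digital_offset_trim(7.3, 0.5))
    print(digital_offset_trim(100.0, 0.5))
```

The first call never settles. Its code toggles between 14 and 15 indefinitely, a one-LSB limit cycle that leaves a residual offset of up to one LSB. The second call saturates at the top of the trim range. Both are examples of the complications that the discrete nature of such loops can introduce.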

7.2.5 Integration of Active Deskew Mechanisms

Because skew is largely deterministic, designers can characterize it and work around it more readily than jitter. Intel and others have demonstrated active compensation of skew through the use of deskewing circuitry [12][36]. An interesting future possibility for reducing skew is to couple the benefits of optics with the large-scale, chip-level feedback inherent to active deskew mechanisms. Coupling seemingly disparate alternatives and solutions presents new and exciting possibilities for tackling the many setbacks introduced by variation.
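Because skew is largely deterministic, even a static calibration can remove most of it. The sketch below assumes each receiver drives a hypothetical programmable delay line with a fixed step. It delays every clock edge so that the edge lines up with the latest measured arrival. This is a simplified one-shot calibration, not the continuous phase-detector feedback used in the active deskew schemes of [12][36]. The arrival times are illustrative values spanning roughly the 120ps of simulated skew.

```python
def deskew_codes(arrival_ps, step_ps, max_code):
    """Per-receiver delay codes that align each clock edge to the latest arrival.

    Assumes a hypothetical digitally programmable delay line per receiver
    with a fixed step of step_ps and codes 0..max_code.
    """
    latest = max(arrival_ps)
    return [min(max_code, round((latest - t) / step_ps)) for t in arrival_ps]


def residual_skew_ps(arrival_ps, codes, step_ps):
    """Spread of edge times after the programmed delays are added."""
    aligned = [t + c * step_ps for t, c in zip(arrival_ps, codes)]
    return max(aligned) - min(aligned)


if __name__ == "__main__":
    arrivals = [0.0, 35.0, 80.0, 120.0]  # hypothetical measured arrival times (ps)
    codes = deskew_codes(arrivals, step_ps=10.0, max_code=15)
    print(codes, residual_skew_ps(arrivals, codes, 10.0))
```

As long as no code saturates, rounding leaves each edge within half a step of the latest arrival, so the residual skew is bounded by one delay step. Jitter, being random, is not removed by such a calibration.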

7.2.6 Integration and Characterization of Optical Components

To date, integration of the optical distribution network (optical sources, waveguides, couplers, etc.) with receiver circuits has not been demonstrated. Testing of receiver circuits has been primarily achieved through free-space illumination and electrical testing. Integration of these components is critical to demonstration of a complete optical clocking solution.

Furthermore, such a distribution network must be completely characterized, with all major contributors of variation identified. Once this identification has been completed, further maturation of optical components as well as receiver circuits will be enabled.

7.3 Contributions

As Mills stated in his design of an optical receiver circuit for data applications, “great things are accomplished in small steps...” [10]. This thesis has built on the work and contributions of a host of other designers. It has shown that circuit techniques can reduce receiver-generated skew and jitter. Techniques such as fully-differential signaling and process compensation reduce the effects of variation. With additional work on many of the components in this design, particularly those concerning biasing and low-noise inputs, it is highly likely that optical receiver circuits will no longer be a limiting factor in demonstrating an advantage for optical clocking.

This thesis has demonstrated that high-speed optical clock receiver circuits are possible. Building on Lum’s work in [4] has resulted in a receiver circuit that operates twice as fast and is considerably less sensitive to process and environmental variation, while consuming roughly the same power and occupying 41% of the area. Additionally, the first prefabrication noise analysis of any high-speed optical clock receiver circuit has been performed; it shows that power-supply noise remains a critical factor even in optical clocking, primarily because of the receiver's sensitivity to it.

While demonstrating that incremental improvements bring the community closer to realizing a potential alternative to the electrical clock distribution network, this thesis has also highlighted potential roadblocks to such a realization. Nevertheless, continual improvements in receiver architecture, topology, and circuit design will result in ever smaller skew and jitter. If there is a single idea to be taken away from this thesis, it is, hopefully, that the design presented herein marks a transition from a receiver-limited approach to optical clocking to a more system-level focus on the advantages and pitfalls of the optical clocking alternative.
Bibliography


[19] D. Ahn. E-mail communication. D. Ahn is a PhD candidate in the Materials Science Department at MIT, 2002.


