Application of a Resonant Transmission Line Clock Driver

by

Charles J. Cavazos

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 21, 1999

© Massachusetts Institute of Technology, 1999. All Rights Reserved.

Author ................................................................. Charles J. Cavazos
May 21, 1999

Certified by ................................................................. Thomas F. Knight, Jr.
Senior Research Scientist
Thesis Supervisor

Accepted by ...............
Arthur C. Smith
Chairman, Department Committee on Graduate Theses
Abstract

Power consumption is a major concern when designing a high performance microprocessor. In conventional microprocessors, clock distribution accounts for a major portion of the total power consumed. With the continuing trend of faster frequencies, higher complexity, and larger die sizes, clock distribution is quickly becoming one of the major issues faced by designers. A new solution to this problem is to use an external transmission line as a resonant clock driver. The two main advantages of this technique are its low power consumption and the absence of clock skew.

This paper describes the design of a 1 mm x 1 cm die along with its substrate that are combined via flip-chip bonding. The chip is designed to simulate a conventional clock load and clock driver. The operating frequency for this design is 1 GHz. The total clock load is over 300 pF and the clock driver size is 6 cm. Discussed in detail are the circuits on-chip for functionality and skew measurements as well as the different substrate configurations for constructing a resonant transmission line.

Thesis Supervisor: Thomas F. Knight, Jr.
Title: Senior Research Scientist, MIT AI Laboratory
# Table of Contents

1 Introduction
2 Theory
3 Chip Design
  3.1 Clock Driver
  3.2 Functional Blocks
  3.3 Timing Circuits
  3.4 Pads
4 The Die
  4.1 Power supplies
  4.2 Clock
  4.3 Functionality
  4.4 Testability
5 Substrates
6 Testing
  6.1 Skew measurements
  6.2 Power measurements
7 Conclusion
List of Figures

Figure 2.1: Standard clock driver system
Figure 2.2: Transmission line clock driver
Figure 3.1: Chip design showing logic blocks
Figure 3.2: Clock driver schematic
Figure 3.3: Schematic of type I flip-flop
Figure 3.4: Schematic of type II flip-flop
Figure 3.5: Schematic of functional blocks
Figure 3.6: Schematic and layout of arbiters
Figure 3.7: Schematic of an I/O pad
Figure 4.1: Chip pad locations and labeling convention
Figure 4.2: Chip divided into functional sections with pad locations
Figure 5.1: Mesh configuration inside substrate for transmission line
Figure 5.2: Chip on substrate without transmission line
Figure 5.3: Chip on substrate with transmission line
Figure 5.4: Impedance variation in transmission line by signal shaping
Figure 5.5: Transmission line with varactors tapping along length
Figure 5.6: Side view of transmission line in substrate
Figure 5.7: Sections of transmission line with connections in substrate
Figure 6.1: Example of a skew measurement
List of Tables

Table 4.1: Pads dedicated for power supplies
Table 4.2: Pads dedicated to clocking functionality
Table 4.3: Pads dedicated for basic functionality
Table 4.4: Pads dedicated for testability
Table A.1: Pin numbering and label of internal signals
Chapter 1

Introduction

Power consumption is a major concern when designing a high performance microprocessor. In typical designs the clock distribution contributes a major portion of the total power consumed by a conventional microprocessor. The reason for this large power consumption is that the transition probability of a clock is 100% whereas that of an ordinary logic signal is 33% on average. The percentage of total power consumed through the clock distribution can be as much as 40% as in the case of the Alpha 21164 microprocessor from Digital Equipment Corporation. With the continuing trend of faster frequencies, higher complexity, and larger die sizes, clock distribution and power consumption will become even more important.

Conventional methods to reduce the amount of power dissipated in microprocessors have their limitations. For example, a common approach is to reduce the power supply voltage since it results in a quadratic improvement. However, the power supply voltage can only be reduced so far until the propagation delay becomes too large for the desired frequency of operation or the noise margins become too small. The performance loss becomes even more significant when the power supply approaches the sum of the threshold voltages of the devices. By trying to save on the power consumed in clock distribution, a considerable fraction of the total power can in turn be saved.
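
The scaling argument above can be illustrated with the standard first-order expression for CMOS switching power, P = αCV²f, where α is the transition probability. The sketch below is only an illustration: the 300 pF load and 1 GHz frequency are the targets of this design, while the 3.3 V and 2.5 V supply values and the function name are assumptions chosen for the example.

```python
def switching_power(alpha, c_farads, vdd, freq_hz):
    """First-order CMOS switching power: P = alpha * C * Vdd^2 * f."""
    return alpha * c_farads * vdd ** 2 * freq_hz

C_LOAD = 300e-12   # clock load targeted by this design (F)
F_CLK = 1e9        # target clock frequency (Hz)
VDD_HIGH = 3.3     # assumed supply voltage (V), for illustration only
VDD_LOW = 2.5      # assumed reduced supply (V), for illustration only

# Clock net: transition probability of 1.
p_clock = switching_power(1.0, C_LOAD, VDD_HIGH, F_CLK)
# The same capacitance switched like ordinary logic (about one third of the time).
p_logic = switching_power(1.0 / 3.0, C_LOAD, VDD_HIGH, F_CLK)
# Clock net again, at the reduced supply.
p_scaled = switching_power(1.0, C_LOAD, VDD_LOW, F_CLK)

print(f"clock net: {p_clock:.2f} W")
print(f"equal capacitance of ordinary logic: {p_logic:.2f} W")
print(f"clock net at reduced supply: {p_scaled / p_clock:.0%} of original")
```

Under these assumed voltages, lowering the supply reduces switching power by the square of the voltage ratio, which is the quadratic improvement referred to above.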

A new approach for reducing the power dissipation in clocking and producing sharp transition times across an entire chip has been proposed by Matt Becker, a Ph.D. student at MIT. His technique revolves around an external transmission line that resonates at the harmonics of a square wave to drive the clock net directly. The benefits of using a transmission line for clocking are that the entire line crosses the midpoint at the same time, which translates to little or no skew, and that the power consumed by clock distribution is reduced. The power consumption in this method comes from the parasitic resistance of the transmission line, which is considerably less than that dissipated using conventional methods.

For my thesis, I have designed a chip and substrate to produce experimental results for Matt Becker's theory. The chip is designed to simulate a conventional clock load and clock net. The fabricated chip includes circuits to test functionality as well as arbiters for on-chip clock skew measurements. The substrate contains a transmission line tuned to resonate at 1 GHz using Becker's impedance variation methods to support the first three odd harmonics of a square wave.

The following chapters contain an overview of the whole design process of the project. Chapter 2 describes the theory behind the resonant transmission line clock driver and how it saves power. Chapter 3 presents the hierarchical structure of the chip as well as design issues. Chapter 4 goes over the actual implementation of the design in the layout of the die. Chapter 5 describes the various methods for designing the substrate for this project. Chapter 6 shows how to test the completed design to get the skew and power measurements.
Chapter 2

Theory

On a typical microprocessor, a generated clock signal is buffered through several stages of inverters, where the last stage inverter drives the clock load. The last stage inverter is usually fairly large in order to drive the clock load effectively. The interconnect of the clock lines can be lumped into a large RC network. Figure 2.1 shows a typical view of a standard clock driver. Most of the power dissipated in this clocking scheme goes into the constant charging and discharging of the RC network and pre-driver capacitances.

![Diagram of a standard clock driver system.](image)

**Figure 2.1**: Standard clock driver system.

In designing a clock distribution network there are several desired characteristics. For one, there should be little to no clock skew from one side of the chip to the other. Also, the clock waveform throughout the chip should have sharp transition times. Common architectures of a clock network are an H-tree structure or mesh design. The main advantage of the H-tree structure is that it minimizes the skew between clock nodes since all nodes are equally distant from the clock source.
To reduce the power consumed in distributing the clock, some architectures turn off unused portions of the chip (a sleep mode). This adds to the complexity of the chip design. With a resonant transmission line as a clock driver, power is recovered by having the transmission line, and not the clock driver, charge and discharge the clock load. Since the transmission line is doing most of the work, the final stage clock driver can be scaled down, which adds to the power savings by reducing the pre-driver power. Figure 2.2 shows the setup for adding an external transmission line to drive the clock load on-chip.

![Figure 2.2: Transmission line clock driver.](image)

The theory behind this technique is as follows: The on-chip driver attempts to distribute a square wave onto the clock load and the external transmission line. Since the driver is small and cannot drive the entire load, only a small pulse will travel down the transmission line. The transmission line is open at the opposite end so that the pulse is fully reflected. The length of the transmission line is made so that when the pulse returns to the driver, it will coincide with the next pulse of the driver. This will result in a larger pulse height being sent back down the line. Eventually the transmission line will be resonating rail to rail at the given clock frequency. This model for the theory is not completely accurate due to parasitics and impedance variations in the transmission line, but provides a conceptual view of the process.
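
In this idealized picture, the shortest suitable line is one whose round-trip delay equals one clock period. The sketch below estimates that length for a uniform, lossless line. The relative dielectric constant of 9.8 (typical of alumina ceramic) and the function name are assumptions for illustration and are not taken from this design; parasitics and impedance variations are ignored.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum (m/s)

def resonant_line_length(freq_hz, eps_r):
    """Length of an open-ended line whose round-trip delay equals one clock period."""
    velocity = C_VACUUM / math.sqrt(eps_r)   # propagation velocity in the dielectric
    period = 1.0 / freq_hz
    return velocity * period / 2.0           # the pulse travels down and back in one period

# eps_r = 9.8 is an assumed value, not a parameter of the actual substrate.
length_m = resonant_line_length(1e9, 9.8)
print(f"line length: {length_m * 100:.2f} cm")
```
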
Chapter 3

Chip Design

Figure 3.1: Chip design showing logic blocks.

The chip is designed in the Hewlett-Packard 0.35 μm process and fabricated through MOSIS, a low cost prototyping and small-volume production service for VLSI circuit development. A slice, 1 mm x 1 cm, of a full size chip is fabricated for cost reasons. The chip consists of four major logic blocks: a clock driver, functional blocks, timing circuits (arbiters), and pads. Figure 3.1 shows the chip divided into the main logic blocks.

- Clock driver: to drive the clock load along with the resonant transmission line in the substrate.
- Functional blocks: to ensure that the chip is functioning correctly and to provide sufficient capacitance to simulate a typical chip.
- Timing circuits: to measure the clock skew between different points on the chip.
- Pads: clock, data, and power connections that join the chip to the substrate.

3.1 Clock Driver

An off-chip crystal oscillator will feed a 1 GHz signal onto the chip. This reference clock signal (clk_in) is then sent to eight drivers through an H-tree structure to ensure that it arrives at all the drivers at the same time. Each driver then buffers the reference clock signal through inverters that step up in size by a factor of four per stage. Since this chip is one-tenth the size of the Alpha microprocessor, which has a 60 cm inverter for its final clock driver, the last stage inverters are scaled accordingly. The last stage inverter of a single driver is one-eighth the size of a 6 cm x 0.4 μm inverter so that when all eight drivers are enabled, the clock driver has the combined strength of a single 6 cm x 0.4 μm inverter. Figure 3.2 shows the schematic of all eight clock drivers.

Figure 3.2: Clock driver schematic.

Each of the eight drivers on the chip has its own enable signal; however, one driver will always be on, as shown in Figure 3.2. The other drivers will be activated according to the decoding of the three bit control signal clk_en[0:2]. For example, clk_en[0:2] = 000 represents no additional drivers, while clk_en[0:2] = 110 represents turning on six additional drivers. This way the drive strength of the clock driver can be stepped up or stepped down. It is expected that with a transmission line, fewer drivers will be needed to drive the entire clock load effectively. The drivers on-chip have been designed to produce a 50% duty cycle clock waveform.
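
The relationship between clk_en[0:2] and drive strength can be sketched as follows. This is a minimal model that assumes clk_en[0] is the most significant bit (inferred from the example in which 110 enables six additional drivers) and that every driver contributes an equal share of the 6 cm total width. The function names are hypothetical.

```python
FULL_WIDTH_CM = 6.0   # combined width of all eight last-stage inverters
NUM_DRIVERS = 8

def active_drivers(clk_en):
    """Number of enabled drivers for a 3-bit clk_en[0:2] string such as '110'."""
    if len(clk_en) != 3 or any(bit not in "01" for bit in clk_en):
        raise ValueError("clk_en must be a 3-character string of 0s and 1s")
    return 1 + int(clk_en, 2)   # one driver is always on

def drive_width_cm(clk_en):
    """Effective last-stage inverter width for a given clk_en setting."""
    return active_drivers(clk_en) * FULL_WIDTH_CM / NUM_DRIVERS

for code in ("000", "110", "111"):
    print(code, active_drivers(code), drive_width_cm(code))
```

With all three bits set, the model returns the full 6 cm width; with clk_en[0:2] = 000 only the always-on driver remains, giving one-eighth of that width.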

3.2 Functional Blocks

![Schematic of type I flip-flop.](image)

**Figure 3.3:** Schematic of type I flip-flop.

![Schematic of type II flip-flop.](image)

**Figure 3.4:** Schematic of type II flip-flop.

The functional blocks on the chip consist entirely of two types of latches. Two latches of the same type are combined to create an edge-triggered flip-flop. The flip-flops are designed to have data dependent capacitances on the clock load. Figure 3.3 and Figure 3.4 show the two different types of flip-flops used in this project. The total gate capacitance seen by clk is 40 fF for the type I flip-flop and 27.7 fF for the type II flip-flop. There are four main sections of flip-flops, each section containing about 2,200 flip-flops (4,400 latches) that are connected together in a scan chain. Half of all these flip-flops are type I and half are type II. The clock load due to gate capacitance alone is roughly 300 pF.
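
The 300 pF figure can be checked from the numbers above. The short calculation below uses only the stated flip-flop counts and per-flip-flop capacitances; it assumes exactly 2,200 flip-flops per section and an even split between the two types.

```python
SECTIONS = 4
FLOPS_PER_SECTION = 2200      # approximate count per section
C_TYPE1_FF = 40.0             # clk gate capacitance of a type I flip-flop (fF)
C_TYPE2_FF = 27.7             # clk gate capacitance of a type II flip-flop (fF)

total_flops = SECTIONS * FLOPS_PER_SECTION
type1 = total_flops // 2      # half of the flip-flops are type I
type2 = total_flops - type1   # the rest are type II

load_fF = type1 * C_TYPE1_FF + type2 * C_TYPE2_FF
load_pF = load_fF / 1000.0
print(f"{total_flops} flip-flops, clock gate load = {load_pF:.1f} pF")
```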

**Figure 3.5:** Schematic of functional blocks.

The input to each latch has a mux in front of it. This is done so that the scan chain or the global data line common to all the flip-flops in that sector can be selected. Figure 3.5 shows a schematic of the functional blocks. The scan path will be used to test the functionality of the flip-flops. The global signal line will be used to see how the capacitance on the clock load can vary when the data signals are all ones or all zeros and switching from all ones to all zeros and vice versa.

### 3.3 Timing Circuits

The timing circuits on the chip will consist of several strategically placed arbiters. An arbiter is a circuit which indicates which of two transitioning signals occurred first. The arbiters are added to the design so the on-chip skew can be measured. The skews that are of interest are left end to right end, left end to middle, and middle to right end of the chip.

Skew will be measured by comparing the arrival time of a test clock signal (TB) with that of the local clock signal. The TB signal is generated off chip and is a 1 GHz square wave. Before entering the arbiter, the test clock signal goes through an external trombone delay circuit, where the delay through that path is varied and can be precisely measured. The delay will be varied until the order in which the two clock signals arrive at the arbiter is switched. The test clock signal will be routed from the center of the chip to the arbiters using an H-tree structure so as to reach all the arbiters at the same time. The test clock signal is also shielded by ground lines from any cross coupling with data lines on the chip.

The arbiters are implemented as cross-coupled NAND and NOR gates. Two cross-coupled NAND gates make up a rising-edge arbiter, where a low on either output indicates which input transitioned from low to high first. Similarly, two cross-coupled NOR gates make up a falling-edge arbiter, where a high on either output indicates which input transitioned from high to low first. Figure 3.6 shows the schematics and layout of the arbiters.
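
The behavior of the two arbiter types can be illustrated with a simple gate-level model. This is a logical sketch only, with hypothetical function names: it assumes one input always leads the other, so it does not model metastability or analog timing. The NAND pair idles with both inputs low and marks the first rising input with a low output; the NOR pair idles with both inputs high and marks the first falling input with a high output.

```python
def nand(a, b):
    return 0 if (a and b) else 1

def nor(a, b):
    return 0 if (a or b) else 1

def settle(gate, in_a, in_b, out_a, out_b):
    """Re-evaluate the cross-coupled gate pair until the outputs stop changing."""
    for _ in range(8):
        new_a = gate(in_a, out_b)
        new_b = gate(in_b, new_a)
        if (new_a, new_b) == (out_a, out_b):
            break
        out_a, out_b = new_a, new_b
    return out_a, out_b

def arbitrate(gate, idle, first):
    """Return (out_a, out_b) after both inputs leave the idle level, with `first` ('A' or 'B') leading."""
    active = 1 - idle
    outs = settle(gate, idle, idle, 0, 0)                  # reset: both inputs idle
    lead = (active, idle) if first == "A" else (idle, active)
    outs = settle(gate, lead[0], lead[1], *outs)           # the earlier edge arrives
    return settle(gate, active, active, *outs)             # the later edge arrives

print(arbitrate(nand, 0, "A"))  # rising-edge arbiter, A rises first
print(arbitrate(nor, 1, "B"))   # falling-edge arbiter, B falls first
```

In both cases the output pair latches the decision made by the earlier edge and holds it after the later edge arrives, until both inputs return to the idle level and reset the pair.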

The layout of the arbiters is mirrored side to side and then top to bottom to reduce any biasing of the input signals to the arbiter. The lengths of the devices making up the arbiters were increased by a factor of two for the NANDs and by a factor of one and a half for the NORs. This ensures that process variation from one end of the chip to the other does not affect the precision of the arbiters. The nwells and pwells of the arbiters are also isolated from the rest of the chip and powered independently in order to keep the sensitivity of the arbiters intact.
3.4 Pads

Since this chip is connected to a substrate using flip-chip bumping, there is a grid array of flip-chip bumps for the data, control, clock, and power signals. The bump size of the pads is 125 μm, with a pitch of 250 μm (center to center). The pads are also octagonal in shape to aid in the bumping of the die.

There are four types of pads on the chip: vdd pads, gnd pads, clk pads, and I/O pads. The vdd pads are directly connected to the metal 4 vdd lines. The gnd pads are similarly connected to the metal 4 gnd lines. The clk pads are connected to the metal 4 clk lines.
The I/O pads include some logic that supports either input mode or output mode. Figure 3.7 shows the schematic of an I/O pad. If the I/O pad is an input pad, meaning the I/O pad is driving the value of In, then the active signal is set low so that the pmos and nmos devices are both off. If the I/O pad is an output pad, then the active signal is set high so that the value of Out is driven onto the I/O pad.

![Figure 3.7: Schematic of an I/O pad.](image-url)
Chapter 4

The Die

The die contains 104 pads as shown in Figure 4.1. The pads are laid out in three rows along the length of the die. Table A.1 shows the breakdown of the pads to their function. The die is then connected to a custom built substrate via flip-chip connections.

![Figure 4.1: Chip pad locations and labeling convention.](image)

4.1 Power supplies

<table>
<thead>
<tr>
<th>Signal</th>
<th># of pads</th>
</tr>
</thead>
<tbody>
<tr>
<td>Vdd</td>
<td>26</td>
</tr>
<tr>
<td>Gnd</td>
<td>26</td>
</tr>
<tr>
<td>Vdd1</td>
<td>3</td>
</tr>
<tr>
<td>Gnd1</td>
<td>3</td>
</tr>
</tbody>
</table>

*Table 4.1: Pads dedicated for power supplies.*

On the die there are two power supplies. One is vdd and gnd, which power all of the circuits on the die except for the arbiters. The arbiters have their own separate power supply, vdd1 and gnd1. As mentioned before, the separate supply isolates the arbiters from the rest of the chip and reduces any noise due to power supply fluctuation.
The power pads are spread out evenly around the edge of the die in an alternating manner. The number of power pads was determined to be twice the number of constantly switching pads. Also, on the die there are many bypass capacitors placed between vdd and gnd, and between vdd1 and gnd1, to further reduce any power supply fluctuations.

4.2 Clock

<table>
<thead>
<tr>
<th>Signal</th>
<th># of pads</th>
</tr>
</thead>
<tbody>
<tr>
<td>clk_in</td>
<td>1</td>
</tr>
<tr>
<td>clk_pads</td>
<td>8</td>
</tr>
<tr>
<td>TB</td>
<td>1</td>
</tr>
<tr>
<td>clk_en[0:2]</td>
<td>3</td>
</tr>
</tbody>
</table>

*Table 4.2: Pads dedicated to clocking functionality.*

In the layout of the die, the TB signal is very important. This is the test clock signal that is sent to all the arbiters on the die. The TB signal must reach all the arbiters at exactly the same time in order for the skew measurement tests to function accurately. The TB signal is buffered through a few inverters and then distributed in an H-tree structure to reach all the arbiters. Every path from the TB signal to all the arbiters is exactly the same. Each path has the same length, number of vias, turns, and capacitive loading. The TB signal is also shielded with gnd lines running parallel to the TB signal throughout the die.

The clock distribution network is a mesh design. The eight clk_pads that are connected to the transmission line are laid down the length of the die and spaced evenly. The clock driver is located in the center of the die. There should be a considerable amount of skew without the transmission line, since the clock distribution network is a mesh and not an H-tree. With the transmission line, the skew should be reduced if not eliminated.
4.3 Functionality

<table>
<thead>
<tr>
<th>Signal</th>
<th># of pads</th>
</tr>
</thead>
<tbody>
<tr>
<td>scan_in[0:3]</td>
<td>4</td>
</tr>
<tr>
<td>scan_out[0:3]</td>
<td>4</td>
</tr>
<tr>
<td>global_d[0:3]</td>
<td>4</td>
</tr>
<tr>
<td>data_sel[0:3]</td>
<td>4</td>
</tr>
</tbody>
</table>

Table 4.3: Pads dedicated for basic functionality.

Figure 4.2: Chip divided into functional sections with pad locations.

The chip is roughly divided into four major sections of chained edge-triggered latches. Each section has its own scan_in (SI), scan_out (SO), global_d (GD), and data_sel (DS) as shown in Figure 4.2. The data_sel controls whether the input to the flip-flops is either the global_d value or the output of the previous flip-flop (the scan chain). For example if DS1=0 then all the flip-flops in section 1 will be set to receive inputs from their preceding flip-flop, and if DS1=1 then all the flip-flops will be set to whatever value GD1 is set to. From scan_in to scan_out of any of the sections, the signal would travel through 2,200 flip-flops.
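
The effect of data_sel on one section can be sketched with a small behavioral model. The function name and the four-element section are hypothetical; each real section holds roughly 2,200 flip-flops, and the model ignores clocking details other than one update per clock edge.

```python
def clock_section(state, data_sel, scan_in, global_d):
    """Advance one section of flip-flops by one clock edge.

    state is a list of 0/1 values, with index 0 nearest scan_in.
    Returns (new_state, scan_out), where scan_out is the last flip-flop's new value.
    """
    if data_sel == 1:
        new_state = [global_d] * len(state)      # every flip-flop loads global_d
    else:
        new_state = [scan_in] + state[:-1]       # each flip-flop loads its predecessor
    return new_state, new_state[-1]

# A 4-flip-flop section stands in for a full section (assumption for brevity).
section = [0, 0, 0, 0]
section, so = clock_section(section, data_sel=1, scan_in=0, global_d=1)   # load all ones
for bit in (0, 0, 0, 0):
    section, so = clock_section(section, data_sel=0, scan_in=bit, global_d=0)
    print(section, so)
```

In this model a value applied at scan_in reaches scan_out only after as many clock edges as there are flip-flops in the section, while global_d sets every flip-flop in a single edge.
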
4.4 Testability

<table>
<thead>
<tr>
<th>Signal</th>
<th># of pads</th>
</tr>
</thead>
<tbody>
<tr>
<td>arb_sel</td>
<td>1</td>
</tr>
<tr>
<td>HA[0:2]</td>
<td>3</td>
</tr>
<tr>
<td>HB[0:2]</td>
<td>3</td>
</tr>
<tr>
<td>LA[0:2]</td>
<td>3</td>
</tr>
<tr>
<td>LB[0:2]</td>
<td>3</td>
</tr>
</tbody>
</table>

Table 4.4: Pads dedicated for testability.

These signals are used to measure the on-chip skew of the chip's generated clock signal. Arb_sel selects the outputs of either the arbiters that trigger at 50% of Vdd or the arbiters that trigger at 30% of Vdd. The two types of arbiters measure the skew at different trigger points so that the slope as well as the skew of the clock can be determined. HA[0:2] and HB[0:2] are output signals where a high value on either signal indicates that its corresponding input's falling edge arrived first. A low on either LA[0:2] or LB[0:2] indicates that its corresponding input's rising edge arrived first. An index of zero corresponds to the left arbiters, an index of one to the center arbiters, and an index of two to the right arbiters.
Chapter 5

Substrates

For our purposes, the characteristics of the substrate are very important. Desired physical characteristics include low impedance, a multi-layered ceramic construction, tungsten wires, and a high dielectric constant. The substrate is connected to the die via flip-chip bonding. The pads are connected to the substrate via flip-chip bumping and routed through the substrate to pins. These pins are accessible from the outside and are used in the testing of the package.

Vdd and gnd signals are routed to pins near and around the chip. External bypass capacitors are placed between the vdd and gnd pins to reduce power fluctuations. The I/O signals of the chip are routed to a central location near the end of the substrate so that they can be combined with a board for testing. Tapping into the clock routing will depend on the configuration method of the transmission line.

The transmission line is created using a mesh design of interleaving vdd/gnd and clk wires. The transmission line will take up most of the layers in the substrate. By routing clk and vdd/gnd in multiple layers and then connecting the lines in parallel, the impedance of the transmission line is reduced. Figure 5.1 shows how the transmission line will look inside the substrate.
For testing purposes, there are different configurations of the substrate to which a chip is attached. One is used as a benchmark and has no transmission line in the substrate. It simply routes the inputs and outputs to their corresponding pins, as shown in Figure 5.2.

Other configurations will include some type of transmission line in the substrate. Figure 5.3 shows a chip connected to a substrate containing a transmission line.
In this project, different ways to construct the transmission line are explored. One approach is using a shaped transmission line. Here the number of the clk signals in the transmission line is varied along the length of the transmission line in order to produce the impedance variations needed for resonance of the first three odd harmonics. Figure 5.4 shows what this approach looks like in the substrate.
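
To show why the first three odd harmonics are the ones of interest, the sketch below lists the odd harmonics of a 1 GHz square wave from its Fourier series and evaluates the three-term partial sum. It is a purely mathematical illustration with hypothetical function names, using an ideal unit square wave with a 50% duty cycle; it does not model the substrate or the impedance shaping itself.

```python
import math

F_CLK = 1e9  # fundamental (clock) frequency in Hz

def odd_harmonics(count):
    """Frequencies and relative amplitudes of the first `count` odd harmonics of a square wave."""
    return [((2 * k + 1) * F_CLK, 4.0 / (math.pi * (2 * k + 1))) for k in range(count)]

def partial_square(t, count=3):
    """Fourier partial sum of a unit square wave (+1/-1, 50% duty cycle) at time t."""
    return sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in odd_harmonics(count))

for freq, amp in odd_harmonics(3):
    print(f"{freq / 1e9:.0f} GHz  relative amplitude {amp:.3f}")

period = 1.0 / F_CLK
print(partial_square(period / 4))   # value at the middle of the high half-cycle
```

The three components fall at 1, 3, and 5 GHz with amplitudes proportional to 1, 1/3, and 1/5, so a line that supports only these three already carries most of the square wave's shape.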

Figure 5.3: Chip on substrate with transmission line.

Figure 5.4: Impedance variation in transmission line by signal shaping.
Another method is building a regular transmission line using the mesh approach and tapping it along its length with varactors (variable capacitors). In this configuration the impedance of the transmission line along its length can be adjusted externally. Figure 5.5 shows what this approach looks like in the substrate, where the pins have been omitted for clarity.

![Diagram of transmission line with varactors tapping along length.]

**Figure 5.5:** Transmission line with varactors tapping along length.

To connect the transmission line to the clk pads, special routing of the signal lines is used so that the transmission line connects to the clk pads on the die at regular intervals. Figure 5.6 shows a side view of how the signals are routed to make the connections. Figure 5.7 shows selected sections of the routing.
Figure 5.6: Side view of transmission line in substrate.

Figure 5.7: Sections of transmission line with connections in substrate.
Chapter 6

Testing

Input signals, with the exception of clk_in and TB, are connected to switches that drive either high or low. Clk_in and TB are generated from a crystal oscillator divided down to produce a 1 GHz signal.

Input switches:

- Clk_en[0:2]
- Arb_sel
- Scan_in[0:3]
- Global_d[0:3]
- Data_sel[0:3]

All the output signals will be connected to an LED and pin. An ‘on’ LED represents a high signal and an ‘off’ LED represents a low. The LEDs chosen will have their own separate power supply so that the LEDs do not draw off power from the chip itself. The waveforms on the output can be examined by connecting an oscilloscope to the output pins.

Output LEDs/pins:

- Scan_out[0:3]
- HA[0:2]
- HB[0:2]
- LA[0:2]
- LB[0:2]

6.1 Skew measurements

On the chip, there are arbiters that compare the arrival times of the clk signal and the TB signal. The clk signal enters all the arbiters as ‘A’ and the TB signal enters all the arbiters as ‘B’. One of the values of HA[i] and HB[i] should be toggling while the other remains low. For example, if HA[1] is toggling and HB[1] = 0, the falling edge of the on-chip clock arrived at the arbiter before the falling edge of the TB signal. The same holds for the outputs LA[i] and LB[i]: one of them should be toggling while the other remains high. The output signal toggles because the arbiters reset at the frequency of the input signals, which is 1 GHz.

![Diagram of skew measurement](image)

**Figure 6.1:** Example of a skew measurement.

To measure the skew the following steps are taken:

1. Pick which arbiters to use (arb_sel=0 selects the 50% arbiters and arb_sel=1 selects the 30% arbiters).
2. Vary TB until the outputs of the selected arbiter switch from one to the other, or metastability occurs. For example, if comparing the falling edges at the left edge of the chip, adjust TB until HA[0]=toggling and HB[0]=0 switches to HA[0]=0 and HB[0]=toggling, or vice versa.
3. Record the time delay from the trombone delay (T1).
4. Vary TB until the outputs of the other arbiter switch from one to the other, or metastability occurs.
5. Record the time delay from the trombone delay (T2).
6. The skew is then just the difference T2-T1.

The edge that is of concern is simply picked by observing the output pin that corresponds to that edge. Figure 6.1 shows a diagram on how the skew measurement is made.
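
The procedure above can be sketched as a sweep of the trombone delay with an idealized arbiter. The arrival times, step size, and function name below are invented for illustration and are not measurements; the model also assumes that TB reaches both arbiters at the same instant and that the arbiter decision flips exactly when the two edges coincide.

```python
def crossover_delay(clk_arrival_ps, step_ps=1, max_delay_ps=1000):
    """Sweep the TB delay until TB stops arriving before the local clock edge.

    Returns the first delay setting (in ps) at which the arbiter decision flips.
    """
    for delay_ps in range(0, max_delay_ps + 1, step_ps):
        tb_first = delay_ps < clk_arrival_ps      # idealized arbiter decision
        if not tb_first:
            return delay_ps
    raise ValueError("no crossover found within the sweep range")

# Hypothetical clock arrival times at two arbiters (ps), invented for illustration.
left_arrival_ps = 120
right_arrival_ps = 155

t1 = crossover_delay(left_arrival_ps)    # steps 2-3: first arbiter
t2 = crossover_delay(right_arrival_ps)   # steps 4-5: second arbiter
skew_ps = t2 - t1                        # step 6
print(f"T1 = {t1} ps, T2 = {t2} ps, skew = {skew_ps} ps")
```

In this model the resolution of the measurement is set by the step size of the delay sweep.
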
6.2 Power measurements

Power measurements are done by measuring the current drawn from the power supply to the chip. These measurements are done on all the different substrate configurations. The inputs are kept the same from one substrate configuration to the next so that the current drawn by the pads is the same in every case.

For substrates that contain a transmission line, clk_en[0:2] can be adjusted to see what the minimum drive strength of the final stage clock driver can be with a transmission line aiding it.
Chapter 7

Conclusion

Reducing the amount of power consumed in the clocking of a chip is desirable for several reasons. The main reason is that the power dissipated by the clock is a significant portion of the total power dissipation of the chip. Another reason is that the transition probability of the clock is one, whereas other logic blocks switch a third of the time on average, so it makes sense to look for ways to decrease the power consumed by the clock.

In this paper, the design of a chip that simulates the clock load of a typical microprocessor has been described. Two of the on-chip circuits should be noted. The variable strength clock driver is particularly useful for determining how well the resonant transmission line aids in driving the entire clock load. By varying the clock driver strength, the percentage of power saved with this technique can be determined. The most difficult part of the design and layout of the die was the arbiters. The sensitivity of the arbiters is very important in measuring the clock skew accurately, since traditional techniques for measuring clock skew off-chip are not possible due to the flip-chip bonding of the die to the substrate.

Another difficult challenge was the design of the substrate. In order to construct a transmission line in the substrate, the substrate characteristics must be known and should be uniform over the entire substrate. The routing of the transmission lines in a multi-layered substrate proves difficult when having to regularly cross connect the transmission lines as well as connecting to the pads on the die.
The die has been fabricated; however, the substrate configurations have not yet been fabricated. Future work could refine the design of the substrate to produce a transmission line that resonates at exactly 1 GHz.
# Appendix A

<table>
<thead>
<tr>
<th>Pin#</th>
<th>Label</th>
<th>Pin#</th>
<th>Label</th>
<th>Pin#</th>
<th>Label</th>
<th>Pin#</th>
<th>Label</th>
</tr>
</thead>
<tbody>
<tr>
<td>A0</td>
<td>HA[0]</td>
<td>A26</td>
<td>Gnd</td>
<td>B12</td>
<td>clk_pad</td>
<td>C14</td>
<td>arb_sel</td>
</tr>
<tr>
<td>A3</td>
<td>Vdd</td>
<td>A29</td>
<td>Gnd</td>
<td>B15</td>
<td>clk_pad</td>
<td>C17</td>
<td>Vdd</td>
</tr>
<tr>
<td>A4</td>
<td>Gnd</td>
<td>A30</td>
<td>Vdd</td>
<td>B16</td>
<td>Gnd</td>
<td>C18</td>
<td>LA[1]</td>
</tr>
<tr>
<td>A5</td>
<td>Vdd</td>
<td>A31</td>
<td>SO[3]</td>
<td>B17</td>
<td>Vdd</td>
<td>C19</td>
<td>Gnd1</td>
</tr>
<tr>
<td>A6</td>
<td>Gnd</td>
<td>A32</td>
<td>Gnd</td>
<td>B18</td>
<td>clk_pad</td>
<td>C20</td>
<td>Vdd1</td>
</tr>
<tr>
<td>A8</td>
<td>SO[0]</td>
<td>A34</td>
<td>Gnd</td>
<td>B20</td>
<td>GD[3]</td>
<td>C22</td>
<td>Gnd</td>
</tr>
<tr>
<td>A9</td>
<td>Gnd</td>
<td>A35</td>
<td>Vdd</td>
<td>B21</td>
<td>clk_pad</td>
<td>C23</td>
<td>LB[2]</td>
</tr>
<tr>
<td>A10</td>
<td>Vdd</td>
<td>A36</td>
<td>Gnd</td>
<td>B22</td>
<td>Vdd1</td>
<td>C24</td>
<td>Vdd</td>
</tr>
<tr>
<td>A12</td>
<td>Gnd</td>
<td>A38</td>
<td>Vdd</td>
<td>C0</td>
<td>LA[0]</td>
<td>C26</td>
<td>Vdd</td>
</tr>
<tr>
<td>A14</td>
<td>clk_en[0]</td>
<td>B0</td>
<td>Gnd1</td>
<td>C2</td>
<td>LB[0]</td>
<td>C28</td>
<td>SI[2]</td>
</tr>
<tr>
<td>A15</td>
<td>Vdd</td>
<td>B1</td>
<td>Vdd1</td>
<td>C3</td>
<td>Gnd</td>
<td>C29</td>
<td>Vdd</td>
</tr>
<tr>
<td>A17</td>
<td>Gnd</td>
<td>B3</td>
<td>DS[0]</td>
<td>C5</td>
<td>Gnd</td>
<td>C31</td>
<td>SI[3]</td>
</tr>
<tr>
<td>A18</td>
<td>clk_in</td>
<td>B4</td>
<td>GD[0]</td>
<td>C6</td>
<td>Vdd</td>
<td>C32</td>
<td>Vdd</td>
</tr>
<tr>
<td>A19</td>
<td>Vdd1</td>
<td>B5</td>
<td>clk_pad</td>
<td>C7</td>
<td>Gnd</td>
<td>C33</td>
<td>Gnd</td>
</tr>
<tr>
<td>A20</td>
<td>Gnd1</td>
<td>B6</td>
<td>Gnd</td>
<td>C8</td>
<td>SI[0]</td>
<td>C34</td>
<td>Vdd</td>
</tr>
<tr>
<td>A21</td>
<td>TB</td>
<td>B7</td>
<td>Vdd</td>
<td>C9</td>
<td>Vdd</td>
<td>C35</td>
<td>Gnd</td>
</tr>
<tr>
<td>A22</td>
<td>Vdd</td>
<td>B8</td>
<td>clk_pad</td>
<td>C10</td>
<td>Gnd</td>
<td>C36</td>
<td>Vdd</td>
</tr>
<tr>
<td>A24</td>
<td>Gnd</td>
<td>B10</td>
<td>GD[1]</td>
<td>C12</td>
<td>Vdd</td>
<td>C38</td>
<td>Gnd</td>
</tr>
</tbody>
</table>

Table A.1: Pin numbering and label of internal signals.