An Ultra-Low Power Programmable Analog Bionic Ear Processor

Rahul Sarpeshkar, Christopher Salthouse, Ji-Jon Sit, Michael Baker, Serhii Zhak, Timothy Lu, Lorenzo Turicchia, and Stephanie Balster

Published April 2005

Abstract—We report a programmable analog bionic ear (cochlear implant) processor in a 1.5µm BiCMOS technology with a power consumption of 211µW and 77dB dynamic range of operation. The 9.58mm x 9.23mm processor chip runs on a 2.8V supply and has a power consumption that is lower than state-of-the-art A/D-then-DSP designs by a factor of 25. It is suitable for use in fully implanted cochlear-implant systems of the future which require decades of operation on a 100mAh rechargeable battery with a finite number of charge-discharge cycles. It may also be used as an ultra-low-power spectrum-analysis front end in portable speech-recognition systems. The power consumption of the processor includes the 100µW power consumption of a JFET-buffered electret microphone and an associated on-chip microphone front end. An automatic gain control circuit compresses the 77dB input dynamic range into a narrower internal dynamic range (IDR) of 57dB at which each of the 16 spectral channels of the processor operates. Each of these channels is made up of a bandpass filter with programmable low and high corner frequencies, an envelope detector with programmable attack and release times, and a logarithmic dual-slope analog-to-digital converter with programmable offset-calibration and sampling-rate parameters. The output bits of the processor are scanned and reported off chip in a format suitable for continuous-interleaved-sampling (CIS) stimulation of electrodes. Power-supply-insensitive biasing and circuit design provide robust operation of the processor in the high-RF-noise environment of current cochlear-implant systems. Constant-Gm subthreshold MOS biasing, current-reference distribution, and feedforward and feedback offset-calibration are used to combat the effects of temperature and transistor mismatch. The power consumption of this processor is below that of a very low power microphone-and-A/D front end alone. Thus, even zero digital power consumption will likely make an A/D-then-DSP design consume more power than this processor at the end of Moore’s law one or two decades in the future. This design suggests that the current trend of digitizing analog information at high speed and high precision as soon as possible, followed by processing in the digital domain, is not an efficient solution if power consumption is of paramount importance. Rather, it is more advantageous to do robust-and-programmable analog preprocessing and digitize higher-level information at lower speed and lower precision.

Index Terms—Bionic Ear, Cochlear Implant, Low Power, Hearing Aids, Speech Recognition, Spectrum Analysis

I. INTRODUCTION

Cochlear implants or bionic ears (BE) restore hearing in profoundly deaf patients. They function by transforming frequency patterns in sound into corresponding spatial electrode-stimulation patterns for the auditory nerve. Over the past 20 years, improvements in sound-processing strategies, in the number of electrodes and channels, and in the rate of stimulation have yielded improved sentence and word recognition scores in patients [1]. Next-generation implants will be fully implanted inside the body of the patient and consequently have very stringent requirements on the power consumption used for signal processing. Our processor is intended for use in such next-generation implants. It can operate on a 100mAh battery with a 1000 charge-and-discharge-cycle lifetime for 30 years. The digital outputs of the processor, its immunity to power-supply noise, and its programmability ensure ease of use with the other parts of an implant system such as its wireless communication link and programming interface.
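The battery-lifetime figure above can be checked with a short calculation. The sketch below is only a rough estimate under stated assumptions: the 211µW processor is the sole load, the battery delivers its full 100mAh at the 2.8V supply with no conversion loss, and every one of the 1000 cycles is a full discharge. Stimulation and telemetry power, which are not quantified here, are ignored. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365.25


def runtime_hours(capacity_mah, power_uw, supply_v):
    """Hours of operation per full charge for a constant-power load."""
    load_current_ma = (power_uw * 1e-6 / supply_v) * 1e3  # W / V = A, then to mA
    return capacity_mah / load_current_ma


def lifetime_years(capacity_mah, power_uw, supply_v, cycles):
    """Total years of operation if every cycle is a full discharge."""
    return runtime_hours(capacity_mah, power_uw, supply_v) * cycles / HOURS_PER_YEAR


hours = runtime_hours(100.0, 211.0, 2.8)
years = lifetime_years(100.0, 211.0, 2.8, 1000)
print(f"{hours / 24:.0f} days per charge, {years:.0f} years over 1000 cycles")
```

Under these assumptions the processor alone runs for roughly 55 days per charge, so 1000 cycles cover the 30-year target with margin left for the rest of the implant.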

1 The authors are with the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139 (email: rahuls@avnsl.mit.edu).

In addition to bionic ears, low-power processors such as the one described in this paper are useful in portable speech-recognition front ends of the future. Such systems will need programmable front-ends that take a microphone input, and output bits that represent spectral information.

A common speech-processing strategy, used in implants and in speech-recognition systems, employs a mel cepstrum filter bank with 8-20 channels. The mel scale maps frequencies to a perceptually linear scale [2]. Filter banks based on the mel scale use linearly spaced filter center frequencies up to 1kHz and logarithmically spaced center frequencies above 1kHz. In the ubiquitous cepstral techniques, a logarithmic measure of the spectral energy in each filter-bank channel is used for further processing. In implants, the number of functioning electrodes used for stimulation is also between 8 and 20, and having more electrodes is often not useful due to spatial interactions among the electrodes.
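The spacing described above can be illustrated with a short sketch. It uses the widely quoted approximation mel = 2595 log10(1 + f/700), which is an assumption here and not necessarily the mapping used to program the chip. The 16-channel count and the 100Hz-10kHz span are taken from the processor's description. The function names are illustrative.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale approximation (assumed, not taken from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_channels, f_low_hz, f_high_hz):
    """Center frequencies equally spaced on the mel axis, endpoints included."""
    mel_lo, mel_hi = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (mel_hi - mel_lo) / (n_channels - 1)
    return [mel_to_hz(mel_lo + i * step) for i in range(n_channels)]


for f in mel_center_frequencies(16, 100.0, 10000.0):
    print(f"{f:8.1f} Hz")
```

Equal steps on the mel axis give nearly uniform spacing in hertz at the low end and nearly constant frequency ratios at the high end.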

This paper describes a programmable processor for bionic ears and speech-recognition front ends. The signal-processing blocks of the processor are shown in Fig. 1. A microphone pre-amplifier (Audio Front End or AFE in Fig. 1) converts sound over a 77dB dynamic range into an electrical input that is fed to an automatic-gain-control circuit (AGC). The AGC compresses this input dynamic range to a narrower 57dB internal dynamic range (IDR) at which each of the 16 channels of the processor operates. The 57dB IDR fulfills the 50-60dB IDR requirement for good patient performance [3]. Each channel is made up of a programmable Gm-C band-pass filter, an envelope detector with programmable attack and release time constants, and a logarithmic dual-slope A/D converter with programmable calibration, sampling, and precision parameters. The output bits of the A/D converters are sequentially scanned via a shift register, which is clocked by an input electrode-stimulation clock or ‘continuous interleaved sampling’ (CIS) clock. These bits are fed to a bus whose data constitute the final output of the processor. Under normal operation, the output bits of the processor would be latched into DACs that control electrode stimulation currents.

Current reference circuits provide constant-gm and power-supply-noise-immune biasing throughout the chip via a current-distribution network. Current-mode Digital-to-Analog (DAC) converters that are necessary for altering the various parameters of the chip are also biased with these reference currents and implement the programmability of the system. Each channel of the chip is independently programmed via a shift-register-based multiplexing system that operates on a programmability-clock during patient programming. Several crucial voltage and current variables in the channel may be observed through visibility follower circuits that are controlled by the same shift-register multiplexing bits. The visibility of the processor is important for debugging any issues having to do with poor subject performance in a highly variable patient population.

The ultra low power specifications of our processor are in large part due to simple but powerful innovations in the design of its building-block analog circuits, some of which have been recently described in detail. These circuits include the microphone front end [4], the bandpass filters [5], the envelope detector [6], and the logarithmic analog-to-digital converter [7]. Although some parts of the processor have been reported on, this paper describes in detail how these parts and other new circuit building-blocks all fit together to create a robust, programmable, ultra-low-power, mixed-signal system. We present experimental data from the overall system that verifies its operation. Several simple-but-important issues in the design of a large-scale mixed-signal chip such as this one that are crucial for maintaining robust operation without sacrificing efficiency are discussed.

Subthreshold-MOS, silicon cochleae, and analog circuits for cochlear-implant processing have been previously proposed [8] - [14] as a means for implementing complex signal processing with very low power. This work proves the promise of such prior work by achieving numbers that make an analog processing system the most power-efficient option in the short and long term, while still preserving the very desirable features of high internal dynamic range per channel, programmability, and robustness present in digital systems. This processor presents an example of how delaying digitization by first doing analog preprocessing can lead to tremendous gains in power efficiency (the processor is effectively operating at 5µW per MIP vs. a DSP at 250µW/MIP) because of the exploitation of analog primitives for computation [15].

The organization of this paper follows the signal-flow in the chip: Section 2 describes the microphone front end of the processor. Section 3 describes the analog gain-control circuit. Sections 4A-4C describe the signal processing of a single channel, namely, the band-pass filtering, the envelope detection, and the logarithmic A/D conversion. Section 5 describes the current and voltage biasing scheme used in the processor. Section 6 discusses the digital control, programming, and I/O interface of the processor. Section 7 presents measured experimental results for the overall operation of the processor. Section 8 compares our processor to more traditional A/D-then-DSP implementations and provides a discussion of why preliminary digitization of the data is not a wise strategy for power-efficient architectures. Section 9 concludes by summarizing the key contributions of the paper.

II. THE MICROPHONE FRONT END

Fig. 2 shows the built-in microphone front-end circuit (the AFE circuit) that is used in our processor. A commercial JFET-buffered FG3329 electret microphone transduces sound into an electrical signal that is input to our processor. The microphone is shown inside the dotted box. Instead of using the traditional voltage output of the microphone, we use the ‘power supply’ drain terminal of the microphone’s internal source-follower as the input to the processor. Our strategy allows us to regulate the drain output of the microphone to a reference voltage via negative feedback, and sense and amplify the microphone’s output current in a transimpedance or sense-amplifier topology. The sense amplifier is implemented with an operational amplifier and feedback resistor R_f in Fig. 2. The advantage of sensing the microphone’s current rather than its voltage is that superior power supply rejection may be achieved since the microphone’s traditional power-supply terminal is now at a quiet reference voltage. Power-supply rejection is crucial in bionic-ear processors due to the ubiquitous presence of supply noise caused by RF and digital signals. Furthermore, the sense-amplifier topology allows easy multiplexing of auxiliary inputs through simple connection of the AUX input to the current-collecting terminal of the sense amplifier by a resistor. Pre-emphasis filtering in the front end can easily be implemented by replacing R_f with a frequency-dependent two-port network, such as a T-network.

A large feedback resistor is desirable for achieving high gain and ensuring that the output noise of the front end is dominated by the microphone’s noise rather than by the resistor’s thermal noise. However, with a large resistor, the dc voltage drop produced across it by the microphone’s dc bias current of approximately 20µA exceeds the limit set by the 2.8V supply. To solve this problem, a low-frequency negative-feedback loop formed by the G_m transconductor, M_1, and the capacitor in Fig. 2 subtracts the bias current of the microphone before it is amplified by the sense amplifier. This feedback loop is designed to be operational only at low frequencies (100Hz or below) such that ac microphone currents are amplified normally while low-frequency microphone currents are shunted away. To ensure that the overall front end has almost 80dB dynamic range (30dB SPL to 110dB SPL, where SPL = Sound Pressure Level, or 5µV to 50mV rms at the input), the input-referred noise of the operational amplifier and the noise due to M_1 must be kept low through appropriate device sizing and biasing. These details are described in [4]. Reference [4] also describes a biasing scheme for internal currents in the operational amplifier and for M_1 that uses strong capacitive coupling from the source terminal of a MOSFET to its gate to ensure nearly constant gate-to-source drive and consequently nearly constant current in spite of power-supply noise in the source. These ideas and the use of simple capacitive filtering can be used in conjunction to create a very low power and power-supply-noise-immune front end circuit as shown in Fig. 3, and described in detail in [4]. Fig. 4 shows that a 20dB gain with respect to the JFET-source-buffered terminal is attained by the front end up to 10kHz while the power-supply noise is attenuated by 50dB-90dB with respect to the same terminal. Fig. 5 shows that there is up to 70dB of RF-signal attenuation in the front end up to the measurement limit of 26MHz. RF telemetry signals in bionic implants are typically around this frequency range. The overall front end consumes only 100µW of power due to careful low-noise and low-power design. This number includes the power consumed by the JFET-buffered microphone. The noise of the front end is dominated by the noise of the JFET-buffered microphone.
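Two of the numbers in this section can be checked with simple arithmetic. The sketch below computes the dynamic range implied by the quoted rms input limits, and a loose upper bound on the feedback resistor if the microphone's dc bias current were not subtracted. The bound assumes the entire 2.8V supply is available as headroom, which overstates what a real amplifier allows. The function names are illustrative.

```python
import math


def dynamic_range_db(v_min_rms, v_max_rms):
    """Dynamic range in dB between two rms voltage levels."""
    return 20.0 * math.log10(v_max_rms / v_min_rms)


def max_feedback_resistance(headroom_v, bias_current_a):
    """Largest resistor whose dc drop at the given bias current fits in the headroom."""
    return headroom_v / bias_current_a


print(f"{dynamic_range_db(5e-6, 50e-3):.1f} dB")                   # input range quoted in the text
print(f"{max_feedback_resistance(2.8, 20e-6) / 1e3:.0f} kOhm")     # optimistic bound
```

The bound of about 140kΩ shows why the bias-current subtraction loop is needed before a larger, quieter feedback resistor can be used.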

III. THE AUTOMATIC GAIN CONTROL (AGC) CIRCUIT

The AGC circuit of the processor is implemented by regulating the transconductance of a simple transconductance-resistance variable gain amplifier (VGA) that follows the AFE circuit described in Section 2. The gain of the amplifier is given by the product of the transconductance $G_{m1}$ and the resistance $R$, and the regulation of $G_{m1}$ regulates the gain of the circuit without changing its bandwidth, which is determined by the resistance and output load capacitance. Fig. 6 shows that the VGA resistance is actually implemented as a transconductance $G_{m2}$ such that the gain of the circuit is given by $G_{m1}/G_{m2}$. The transconductors $G_{m1}$ and $G_{m2}$ both use the wide-linear-range transconductor (WLR) circuit described in [16]. The voltage reference inputs were biased very near the microphone reference voltage used in Fig. 2.

The gain regulation of the AGC is performed by the circuit shown in Fig. 7: The input $I_{in}$ is generated by an envelope detector whose output current is linearly proportional to the amplitude of the VGA’s output voltage. The operation of this envelope detector is identical to that of the envelope detector described in the single-channel processing circuits of Section 4. Thus, we shall delay the description of its operation to Section 4. A current $I_g$, proportional to $\ln(I_{in}/I_{ref})$, is output by the transconductance amplifier with bias current $I_1$ and converted to a voltage $V_g$ on the effective load resistance created by the transconductance amplifier with bias current $I_2$. The voltage $V_g$ is exponentiated by the output transistor to create $I_{out}$, which is conveyed through a cascode transistor and current mirror to create $I_{control}$, the gain-controlling current of the VGA of Fig. 6. The capacitor $C$ is necessary for achieving a small value of thermal voltage noise and, consequently, a small value of error in the gain fluctuations of the VGA. Since we desire the output dynamic range of the overall AGC to be around 60dB, these fluctuations must be significantly less than 1 part in 1000.

Since the $I_1$ and $I_2$ transconductors have the same linear range, the ratio $I_1/I_2$ is the same as the ratio of their transconductances, and determines the overall compression ratio of the AGC circuit. A simple mathematical analysis reveals that the circuit of Fig. 7 implements the equation

$$I_{out} = I_{ref} (I_{in}/I_{ref})^{-I_1/I_2}$$

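which, when applied to the VGA of Fig. 6 (whose gain is proportional to its control current, while $I_{in}$ is proportional to the amplitude of $V_{out}$), gives $V_{out} \propto V_{in} V_{out}^{-I_1/I_2}$ and hence a compressive power-law relationship between the input and output of the VGA of

$$V_{out} \propto V_{in}^{1/(1+(I_1/I_2))}$$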

The knee of the AGC circuit is set by a minimum circuit that limits the control current to the VGA to be the minimum of that output by the circuit of Fig. 7 and a user-settable current that limits the maximum gain of the VGA. The minimum circuit was implemented in our processor but is relatively straightforward and is not described in this paper.
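The steady-state behavior of the loop, including the knee, can be explored numerically. The sketch below is a behavioral model, not the circuit: it assumes the VGA gain is proportional to the control current, that the envelope-detector current is proportional to the output amplitude, and that the minimum circuit simply caps the gain at a hypothetical `g_max`. All quantities are normalized (`g0`, `v_ref`, and `g_max` are illustrative parameters). It solves the loop equation by bisection and measures the slope of the input-output curve on log-log axes.

```python
import math


def agc_output(v_in, ratio, g0=1.0, v_ref=1.0, g_max=float("inf")):
    """Steady-state output amplitude of the AGC loop.

    Solves v_out = v_in * min(g_max, g0 * (v_out / v_ref) ** (-ratio)),
    where ratio stands for I_1/I_2, by bisection on log10(v_out).
    """
    lo, hi = -12.0, 12.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        v_out = 10.0 ** mid
        gain = min(g_max, g0 * (v_out / v_ref) ** (-ratio))
        if v_out < v_in * gain:
            lo = mid  # trial output too small for this gain, move up
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


def loglog_slope(v_a, v_b, **kwargs):
    """Slope of the input-output curve between two inputs on log-log axes."""
    y_a, y_b = agc_output(v_a, **kwargs), agc_output(v_b, **kwargs)
    return (math.log10(y_b) - math.log10(y_a)) / (math.log10(v_b) - math.log10(v_a))


print(loglog_slope(1e-5, 1e-4, ratio=1.0, g_max=10.0))  # below the knee: linear region
print(loglog_slope(1e-1, 1e0, ratio=1.0, g_max=10.0))   # above the knee: compression
```

Under these assumptions the slope is 1 below the knee, where the gain is capped, and 1/(1 + I_1/I_2) above it, which is a compressive characteristic.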

Fig. 8 shows the overall steady-state input-output compression curve measured from the AGC circuit in our processor. The knee and compression are clearly visible at low and moderate input amplitudes in the dual log-scaled plot.
Figs. 9A and 9B show the attack and release characteristics of the AGC as it adapts to an abrupt increase or an abrupt decrease in the input amplitude, respectively. The fast attack and slower release time constants of the envelope detector are inherited by the AGC, which consequently adapts more quickly to increasing amplitudes than to decreasing amplitudes. Such asymmetry in the attack and release time constants of the AGC is necessary to mimic the behavior of the human auditory system.

The power consumption of the overall AGC circuit was measured to be 30µW. At its maximum gain setting, the AGC is designed to ensure that its minimum detectable signal is near but below the output noise floor of the preceding AFE (50µV rms) and that this noise is amplified by the AGC to be near but below the input-referred noise floor of the succeeding channel circuits (310µV rms). The gain of the AGC is then turned down to ensure that, at maximum amplitudes, it does not saturate the filters of the succeeding channel circuits by limiting the output signal strength to 500mV rms. The power consumption of the AGC circuit is limited by the need to preserve the 10kHz bandwidth necessary for this application while simultaneously maintaining the low noise floors required for wide-dynamic-range operation of the overall processor.

Programming of the AGC parameters is done with currents from a constant-gm bias circuit described in Section 5. The programming of the translinear circuit may also be implemented by off-chip current biasing.

IV. SINGLE-CHANNEL PROCESSING CIRCUITS

Each channel of the processor consists of a bandpass filter, an envelope detector, and a logarithmic A/D to extract the logarithm of the spectral energy in a given filter band of the spectrum. We shall describe each of these blocks briefly, focusing on key insights and results relevant to processor operation. Further details may be found in [5] - [7].

A. The Bandpass Filter
The bandpass filter is a fourth-order transconductance-capacitor filter and consists of two stages of a capacitive-attenuation second-order filter cascaded via a source-follower to prevent interstage loading as shown in Fig. 10. The upward and downward rolloff slopes of the resulting filter are then both 40dB/decade. Second-order rolloff slopes are known to be necessary and sufficient for good patient performance. In each stage, C1 and AC1 implement capacitive-attenuation to widen linear range in the transconductors [5], G1 and its attenuating capacitors implement the high-pass portion of the filter, G2 and its output-loading capacitors implement the low-pass portion of the filter, and G3 provides low-frequency adaptation to set the floating node of G2’s capacitive divider. Measurements of these filters reveal a 65dB operational dynamic range from 400µV rms at the input-referred noise floor to 750mV rms at 5% total harmonic distortion. The power consumption is 5.4µW for a 5kHz-10kHz filter and 112nW for a 100Hz-200Hz filter. The filter’s performance is in good accord with a theoretical noise-and-dynamic-range analysis, which is performed using the procedure outlined in [5]. Each channel of the processor has a 5-bit DAC whose output determines the low-frequency corner of the filter by setting the bias current of G1, and a 5-bit DAC whose output determines the high-frequency corner of the filter by setting the bias current of G2. The bias current of G3, which is less critical in determining the filter characteristics as long as it is small enough, is set to a constant current of approximately 10pA. To provide good Gm/C corner-frequency matching, the DAC’s reference currents are generated from an on-chip resistor-based constant-gm current reference described in Section 5. Experimental measurements from the array of bandpass filters are presented in Section 7.
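The 40dB/decade rolloffs follow directly from cascading two second-order stages. The sketch below uses an idealized model in which each stage is a first-order high-pass section followed by a first-order low-pass section with unity passband gain; this is an assumption made for illustration and ignores the capacitive attenuation, the interstage source follower, and the G3 adaptation path. The function names are illustrative.

```python
import math


def stage_response(f_hz, f_low_hz, f_high_hz):
    """Idealized second-order bandpass stage: first-order high-pass times first-order low-pass."""
    s_hp = 1j * f_hz / f_low_hz
    high_pass = s_hp / (1.0 + s_hp)
    low_pass = 1.0 / (1.0 + 1j * f_hz / f_high_hz)
    return high_pass * low_pass


def magnitude_db(f_hz, f_low_hz, f_high_hz, stages=2):
    """Magnitude in dB of a cascade of identical stages."""
    return 20.0 * math.log10(abs(stage_response(f_hz, f_low_hz, f_high_hz) ** stages))


def slope_db_per_decade(f_hz, f_low_hz, f_high_hz):
    """Magnitude change over the decade starting at f_hz."""
    return magnitude_db(10.0 * f_hz, f_low_hz, f_high_hz) - magnitude_db(f_hz, f_low_hz, f_high_hz)


# A 100Hz-200Hz channel, matching the lowest band quoted in the text.
print(slope_db_per_decade(0.1, 100.0, 200.0))    # far below the low corner
print(slope_db_per_decade(2.0e4, 100.0, 200.0))  # far above the high corner
```

Each first-order section contributes 20dB/decade, so the two-stage cascade approaches +40dB/decade well below the low corner and -40dB/decade well above the high corner.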

B. The Envelope Detector
The envelope detector circuit is shown in Fig. 11A. It consists of a transconducting rectifier followed by a current-mode peak detector that estimates the dc component of the rectified signal. The details of the rectifier circuit are shown in Figs. 11A and 11B, and the details of the peak-detector circuit are shown in Fig. 12. Fig. 11A shows that the rectifier is implemented by inserting a class AB current conveyor inside a $G_m$-C filter in a negative-feedback configuration. At frequencies above the corner frequency of the filter, the output current of the transconductor $I_{out}$ (or $I_{in}$ in Fig. 11A) is proportional to the input voltage to the filter since the node voltage on the capacitor is very near the dc voltage of the input, i.e., $I_{out} = -I_{in} = G_mV_{in}$. This current is split into two halves with positive current traveling through $M_p$ and its associated mirror, and negative current traveling through $M_n$ and its associated mirror. For small input signals, $M_p$ and $M_n$ are operated in weak inversion and function like MOS versions of an NPN and PNP transistor respectively with exponential current dependence. The positive and negative half currents are added to create a full-wave-rectified signal, and fed to the peak detector. However, Fig. 11 only shows a half-wave implementation for clarity. The local feedback amplifier, shown in symbolic form in Fig. 11A, and as a transistor schematic in Fig. 11B, serves to reduce the two diode-drop ‘dead zone’ of the $M_n$ and $M_p$ exponential elements by prebiasing $V_{out, TOP}$ and $V_{out, BOT}$ to enable Class AB operation. It also reduces the dead zone at dc by the dc gain of the amplifier. The corner frequency of the $G_m$-C filter is set below 100Hz since we are interested in sensing the envelopes of signals in the 100Hz-10kHz audio frequency range.

The minimum detectable signal of the rectifier is limited by two considerations. First, the current $I_{in}$, i.e. $G_mV_{in}$, must be large enough to charge the effective capacitance at the input node of the conveyor to a voltage that has sufficient amplitude to exceed the dead-zone voltage; this effect argues for having as small a dead zone as possible if one wants to have a tiny minimum detectable signal. However, there is a second effect: A small dead zone is detrimental to detecting small signals because it causes large transmission of the thermal noise currents at the input of the conveyor to the peak detector and creates a large rectified-thermal-noise current floor; a small dead zone effectively reduces the natural noise-reduction benefit of a thresholding nonlinearity by reducing the threshold. Thus, to get as small a minimum detectable signal as possible, there is an optimal dead-zone width at which both effects yield the same limiting noise floor. The analysis in [6] and [18] establishes the optimal width of the dead zone as a function of the bias current of the transconductor, the parasitic gate-to-source capacitances of the transistors, and the maximum desired frequency range of operation (10kHz). At this optimal width, one can compute that the maximum dynamic range $D_{optimum} = I_B/I_{n,mds}$ achievable in a transconductor with bias current $I_B$, with $n$ effective devices of shot noise, with an output current mirror ratio of $N$, and a minimum detectable current of $I_{n,mds} = G_mV_{n,mds}$ is given by

$$D_{optimum} = \frac{4}{\pi} \sqrt{\frac{2}{n \cdot q \cdot f_{MAX}}} \qquad (3)$$

where $q$ is the charge on the electron. In [6], we showed that we achieved this optimum dynamic range by presenting results from a 75dB (0.3mV pp – 1.7V pp), 2.8µW, 100Hz-10kHz envelope detector with input dc-insensitive operation as in Fig. 11A. The use of minimum-size subthreshold MOS devices for $M_n$ and $M_p$ with low values of gate-to-source capacitance was an important factor in achieving this dynamic range.

The peak detector shown in Fig. 12 is a current-mode lowpass filter [19] modified to have asymmetric attack and release time constants: If transistor $M_5$ is shorted, then the topology behaves like a lowpass filter with a time constant $\tau_a = [C_a(kT/q)]/(\kappa I_a)$, where $\kappa$ is the subthreshold exponential constant [17]. When the input current is rising, $M_5$ and $I_r$, which form a source follower, do effectively behave like a short since $M_5$ is strongly turned on in this regime. Thus, for rising inputs, the attack time constant is given by the above equation. For falling inputs, $M_5$ is strongly turned off and the source follower relies on $I_r$ to provide a slow return to equilibrium by charging $C_r$. The net effect is that we have a release time constant $\tau_r = [C_r(kT/q)]/(\kappa I_r)$.

The cascade of a rectifier and a peak detector yields an envelope detector. The detector quickly follows the rising portion of the full-wave-rectified input with a fast attack and holds this value with little decay in between cycles during a slow release.
Figure 13A shows data from a low-power (0.87 µW) version of the envelope detector with 63dB dynamic range used in each of the 16 channels. Figure 13B shows data from a higher-power wider-dynamic-range version (75dB, 2.8 µW) used in the AGC circuit of the processor. The outputs of both envelope detectors are relatively invariant with input frequency over the 100Hz-10kHz operating range [6].

A 2-bit DAC in each channel allows $I_a$ to be altered such that the attack time constant can vary from 1ms to 4ms. A 3-bit DAC in each channel allows $I_r$ to be varied such that the release time constant can vary from 3ms to 24ms. Fig. 14 shows how this programmability affects the envelope tracking of an input with a square-wave envelope. The measurements were taken by converting the current to a voltage across an instrumentation transconductor.
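The effect of separate attack and release time constants can be illustrated with a behavioral model. The sketch below is a discrete-time first-order tracker that switches time constant depending on whether the input is above or below the current estimate; it is an assumption-laden stand-in for the current-mode circuit of Fig. 12, not a model of it. The helper `time_constant` evaluates the form C(kT/q)/(κI) given in the text, with hypothetical default values for κ and the thermal voltage.

```python
import math

PHI_T = 0.02585  # thermal voltage kT/q near 300K, in volts (assumed temperature)


def time_constant(cap_f, bias_a, kappa=0.7):
    """tau = C (kT/q) / (kappa I); kappa = 0.7 is an assumed typical value."""
    return cap_f * PHI_T / (kappa * bias_a)


def track_envelope(samples, dt, tau_attack, tau_release):
    """First-order tracker with a fast time constant for rising inputs and a slow one for falling inputs."""
    y, out = 0.0, []
    for x in samples:
        tau = tau_attack if x > y else tau_release
        y += (x - y) * (1.0 - math.exp(-dt / tau))
        out.append(y)
    return out


# Square-wave envelope: 1ms at full level, then 24ms at zero, sampled every 10us.
dt = 1e-5
trace = track_envelope([1.0] * 100 + [0.0] * 2400, dt, tau_attack=1e-3, tau_release=24e-3)
print(trace[99], trace[-1])
```

With a 1ms attack and a 24ms release, the estimate reaches about 63% of the step after 1ms and then takes 24ms to fall to about 37% of that value, which is the kind of asymmetry the programmable DACs control.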

C. The Logarithmic A/D

Fig. 15 shows a schematic of the logarithmic A/D with offset and temperature compensation: The WLR circuit is a wide-linear-range transconductor with gate inputs, bump-linearization, and source-degeneration, which extend its linear range to 0.24V [16]. The comparator is a simple 5-transistor Operational Transconductance Amplifier (OTA).

The output current from the envelope detector is converted to a logarithmic voltage on a diode. The Proportional-To-Absolute Temperature (PTAT) dependence of this diode voltage is cancelled by the inverse PTAT dependence of the transconductance of the WLR transconductor [16], which linearly reconverts the diode voltage to a current. The ratio of this current to the bias current of the transconductor is then digitized in a dual-slope analog-to-digital conversion scheme described below. Any temperature dependence of the bias current of the transconductor is cancelled out in the ratiometric computation. The switches and clocking controls in Fig. 15 implement offset compensation via the automatic incorporation of auto zeroing in the conversion cycle.

The clocking sequence employed for dual-slope analog-to-digital conversion involves 3 phases: During the first auto-zeroing phase (AZ), the input to the WLR is switched to a reference current $I_{ref}$, the WLR’s offset is sampled and stored on $C_{az}$ and $C_{int}$ via negative-feedback action on the follower-configured WLR circuit, and the comparator’s offset is similarly sampled and stored on $C_{az}$. During the AZ phase, the offset current $I_{os}$ serves to auto-zero the WLR to operate from a starting negative-offset voltage. This strategy ensures that a substantial fraction of the full differential linear range of the transconductor is available for use in the circuit rather than just the positive half of the differential linear range if $I_{os}$ is zero. On the second or following integration phase, the AZ and AZdelayed feedback switches to the WLR and comparator are opened, and the input of the WLR is switched to the envelope-detector current $I_{in}$. The delayed opening of the AZdelayed switch prevents charge injection from the AZ switch from affecting the comparator output. During the integration phase, a current proportional to the logarithmic value of $I_{in}$ is integrated for a fixed period of time to generate a voltage on the integration capacitor $C_{int}$. On the third or following de-integration phase, the input to the WLR is switched to $I_{ref}$, the current $I_{os}$ is disconnected from $C_{int}$, and the negative-offset voltage on the WLR causes it to de-integrate the voltage developed on $C_{int}$ during integration. If a counter, initialized at 0, counts the number of clock cycles until the voltage on $C_{int}$ de-integrates to the auto-zeroed trip point of the comparator, then the frozen value of the counter after the comparator trips represents a digitization of $(\phi / V_L) \ln(I_{in} / I_{ref})(I_{in} / I_{os})$, where $\phi = kT/q$ is the PTAT thermal voltage and $V_L$ is the inverse-PTAT linear range of the transconductor. To ensure wide dynamic range, $V_L$ needs to be at least as large as the 180mV of input diode-voltage variation seen over a 60dB input-current variation, but it should not be too much larger, or the sampled thermal noise on $C_{az}$ will reduce the precision of the converter [7]. A small value of $V_L$ also reduces the power required to maintain the bandwidth necessary for settling during the auto-zeroing phase.
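The conversion cycle can be mimicked with a short behavioral sketch. It first checks the quoted diode-voltage swing for a 60dB change in current, then models the integrate and de-integrate phases in normalized units. The model is an idealization: offsets are taken as perfectly auto-zeroed, noise is ignored, and the bias-current ratio that sets the de-integration rate is lumped into a hypothetical `deint_step` parameter. The names, the 128-cycle integration window, and the 300K thermal voltage are assumptions.

```python
import math

PHI_T = 0.02585  # thermal voltage kT/q near 300K, in volts (assumed temperature)


def diode_swing_volts(current_range_db):
    """Change in diode voltage for a given change in current, expressed in dB."""
    return PHI_T * math.log(10.0 ** (current_range_db / 20.0))


def dual_slope_count(i_in, i_ref, v_l=0.24, n_int=128, deint_step=1.0):
    """Counter value of an idealized logarithmic dual-slope conversion.

    Integration: accumulate (PHI_T / v_l) * ln(i_in / i_ref) for n_int clocks.
    De-integration: remove deint_step per clock until the start level is reached.
    """
    rate = (PHI_T / v_l) * math.log(i_in / i_ref)
    level = 0.0
    for _ in range(n_int):
        level += rate
    count = 0
    while level > 1e-12:
        level -= deint_step
        count += 1
    return count


print(f"{diode_swing_volts(60.0) * 1e3:.0f} mV swing over 60dB")
for decade in range(4):
    print(decade, dual_slope_count(10.0 ** decade, 1.0))
```

Because the integrated quantity is a logarithm, each decade of input current adds approximately the same number of counts, which is the behavior wanted for a logarithmic measure of envelope energy.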

The use of two local-feedback loops to auto zero the WLR and comparator separately instead of one global-feedback loop, as is traditional, yields a 2.5 bit improvement in precision for the same power [7]. The improvement in precision results because a larger value of $C_{az}$ and $C_{int}$ can be used to reduce sampled thermal noise, while the power used in the more complex biasing-and-compensation circuitry of global feedback can be diverted into maintaining the bandwidth of the WLR.

Since the envelope detectors across various channels are likely to have gain mismatches and differing noise floors, it is advantageous to adjust $I_{ref}$ in each channel to compensate by providing calibration bits for each channel. Each channel therefore has a 3-bit DAC to alter the value of $I_{ref}$ to assume one of 8 possible values from approximately 50pA to 400pA.

Sampling rate and power are easily traded in this converter by adjusting the values of $I_b$ and $I_{os}$. A 2x increase in sampling rate at the same precision requires a 2x increase in $I_b$ and $I_{os}$ and also doubles the digital switching power consumption. At constant power, speed and precision may also be traded off against each other, albeit in a more advantageous fashion: Since the converter’s precision is thermal-noise limited [7], a reduction in capacitance by 4x worsens the precision by 1 bit. However, the converter can now run at a sampling rate that is four times faster.

The converter described in [7] achieved 8-bit precision at 300Hz sampling rate with 3µW of power (1µW of analog power and 2µW of switching digital power in a 1.5µm process) for appropriate choices of capacitances and bias currents. While these numbers are perfectly suitable for generating spectrograms of speech signals for a speech-recognition front end (the envelope of speech varies slowly at a 100Hz rate and there is much variability in the signal), they are more precise than required for cochlear-implant processors. Patients can discriminate 8-50 electrode-stimulation steps [20], which implies that a 5-bit quantizer may be perfectly adequate. It appears that a higher stimulation rate may be more beneficial than precise stimulation. Thus, in this processor, while keeping the power constant at 3µW per channel, we reduced the values of capacitances in the circuit to lower the precision to 7 bits and increased the sampling rate to 1kHz. As we discuss in Section 6, our processor allows the user of the chip to globally increase the sampling rate at the cost of increased power by altering DAC bits or by inputting external bias currents for $I_b$ and $I_{os}$.
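The constant-power trade between precision and sampling rate can be written out as a short calculation. The sketch assumes sampled thermal noise of the kT/C form, so the rms noise voltage scales as the inverse square root of capacitance, and it takes from the text that the sampling rate scales inversely with capacitance at fixed power. The function names are illustrative.

```python
import math


def bits_lost(cap_reduction):
    """Precision lost when capacitance shrinks by cap_reduction, for noise proportional to sqrt(1/C)."""
    return 0.5 * math.log2(cap_reduction)


def trade(bits, rate_hz, cap_reduction):
    """New (precision, maximum sampling rate) at constant power after shrinking the capacitors."""
    return bits - bits_lost(cap_reduction), rate_hz * cap_reduction


print(trade(8, 300.0, 4.0))  # starting from the 8-bit, 300Hz converter of [7]
```

Starting from 8 bits at 300Hz, a 4x capacitance reduction gives 7 bits with a rate of up to 1.2kHz, which is consistent with the 7-bit, 1kHz operating point chosen for this processor.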

V. CURRENT-AND-VOLTAGE BIASING

Fig. 16 shows our PTAT current-reference circuit that is designed to generate a current of $(kT/q)\ln(9)/R$ where $R$ is near 1MΩ, and the transistors are in subthreshold MOS operation. A description of how standard PTAT circuits work may be found in [21]. Our circuit uses cascode mirrors to reduce the drain-voltage dependencies of the currents, and capacitive bypassing to attenuate power-supply effects on the reference output. The capacitive bypassing keeps gate-to-source voltages of all transistors nearly invariant with high-frequency noise on the supply, and is based on some of our prior work on designing high-PSRR (Power Supply Rejection Ratio) amplifiers [22]. Buffered versions of the four gate voltages labeled $V_n$, $V_{nc}$, $V_p$, and $V_{pc}$ are used as reference voltages in various parts of the processor, primarily for setting cascode voltages. With a 2.8V supply, in our circuit, they are nearly at 0.45V, 1.05V, 2.0V, and 0.95V respectively.
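To give a feel for the magnitudes involved, the following sketch evaluates the idealized expression $(kT/q)\ln(9)/R$. The temperatures and the exact value of $R$ used below are illustrative assumptions, and the function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def ptat_current(temp_kelvin: float, resistance_ohm: float, ratio: float = 9.0) -> float:
    """Idealized PTAT reference current (kT/q) * ln(ratio) / R, in amperes."""
    thermal_voltage = K_BOLTZMANN * temp_kelvin / Q_ELECTRON
    return thermal_voltage * math.log(ratio) / resistance_ohm


# Illustrative values only: R = 1 Mohm exactly, room and body temperature.
for temp in (300.0, 310.0):
    print(temp, round(ptat_current(temp, 1.0e6) * 1e9, 1), "nA")
```

With $R$ set to exactly 1MΩ the expression gives roughly 57nA at 300K and scales in direct proportion to absolute temperature. Since $R$ is only stated to be near 1MΩ, the 45nA reference current reported for the chip would correspond, under this idealized formula, to a somewhat larger resistance of roughly 1.3MΩ.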

If a standard 5-transistor OTA in follower configuration is used as a buffer, p-differential-pair buffers exhibit superior rejection of power-supply noise when buffering voltages referenced to ground, and n-differential-pair buffers exhibit superior rejection of power-supply noise when buffering voltages referenced to the positive supply. This association arises because differential-pair-based buffers reject noise that effectively manifests as a common-mode voltage variation through matching and drain-voltage-insensitivity of transistor current. However, they reject power supply noise through drain-voltage insensitivity only, unless the entire design is fully differential (our design is not fully differential due to tight area and power constraints). Thus, on our chip, we buffered $V_n$ and $V_{nc}$ with p-buffers and $V_p$ and $V_{pc}$ with n-buffers.

The resulting PTAT reference current of 45nA is used in a current-distribution network to distribute scaled-up and scaled-down current copies to bias the various DAC reference currents, amplifier bias currents, and comparator bias currents all over our chip. The PTAT property results in constant-$g_m$ behavior with temperature such that the corner frequencies of the bandpass filters and the peak-detector time constants are invariant with temperature. The small PTAT variation with temperature has little effect on the operation of any other circuits in the processor, where it is either cancelled out (e.g., in the logarithmic A/D), or causes a small percentage change in the bandwidth, noise, or power that does not affect operation significantly (e.g., in the envelope detector). This processor is expected to function in an environment inside the body of a patient, where the temperature is typically well regulated, making such effects more inconsequential. Nevertheless, just as in bipolar biasing circuits, it is extremely important in subthreshold MOS operation to pay careful attention to temperature effects because of the exponential dependence of the current on temperature. Our processor simply would not operate robustly if the gate voltages of biasing transistors were set with potentiometer voltages, a common practice during quick characterization of a chip, but unsuitable for medical use. Furthermore, for portable speech-recognition front ends, where our processor could also be used, robust operation with temperature variations is important.
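The constant-$g_m$ property can be made explicit with a short derivation. It assumes the textbook subthreshold relation $g_m = \kappa I/(kT/q)$; the symbols $\kappa$ (subthreshold gate-coupling coefficient), $\alpha$ (the scale factor of a distributed current copy), and $C$ (a filter or peak-detector capacitance) are introduced here for illustration and do not appear in the circuit description above:

$$
I = \alpha\,\frac{kT}{q}\,\frac{\ln 9}{R}, \qquad
g_m = \frac{\kappa I}{kT/q} = \frac{\alpha\,\kappa\,\ln 9}{R}, \qquad
\tau = \frac{C}{g_m} = \frac{R\,C}{\alpha\,\kappa\,\ln 9}.
$$

The $kT/q$ factors cancel, so to the extent that $R$, $\kappa$, and $C$ are themselves independent of temperature, the transconductance and the resulting time constants do not vary with temperature.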

For simplicity, an early version of our processor distributed voltages created by reference currents on a diode to create current copies elsewhere on the chip. Since transistor threshold voltages can vary substantially across the chip, and subthreshold circuits are exponentially sensitive to these variations, we obtained relatively poor matching and consequently a huge power overhead of almost a factor of two. The power overhead arises because poor matching causes the bias voltage to have to be set at a worst-case value to get all circuits to work well. In contrast, by always distributing current and doing local mirroring, the matching is greatly improved. However, care must be taken to ensure that there are not too many stages of mirroring between the initial reference current and the final current copy that gets used, and that there is little power overhead because large currents are always used at the local site of generation and not distributed. The power overhead due to our biasing scheme was small and measured to be 3µW. Experimental data on matching and immunity to power-supply noise are presented in Section VII where we discuss the overall operation of the processor.

VI. DIGITAL CONTROL, PROGRAMMABILITY, AND THE I/O INTERFACE

The processor interacts with the outside digital world by outputting bits that represent the result of its computation, and by receiving digital bits that allow its various DAC parameters to be programmed. In addition, the processor allows a select set of analog waveforms from each channel to be visible. These analog waveforms could be potentially useful for debugging and testing subjects in a highly variable patient population. We shall discuss the output-bit interactions in section A and discuss the programmability-and-visibility interactions in section B below.

A. The CIS Output Interface

Fig. 17 reveals how a 7-bit digital number from each of the 16 channels is created, synchronously latched, and serially multiplexed onto a common output bus for use in electrode stimulation. The final output of the chip is the digital data on the F bus, which contains 7-bit numbers from each of the 16 channels, one at a time, in cyclic-and-sequential fashion. If 16 electrodes are correspondingly synchronously stimulated with current pulses whose charge magnitude is proportional to the 7-bit number, then the method of stimulation is referred to as “Continuous Interleaved Sampling” or CIS stimulation. This method of stimulation is the most common method of stimulation today because it minimizes current spreading and interaction amongst the electrodes. The figure only shows the digital circuitry in two adjacent channels, namely channel 1 and channel 2, but all channels have identical circuitry.

A 7-bit global counter cyclically counts from 0 to 127 as each successive period of an input clock passes. Since we actually need to count from 0 to 255 in our scheme (explained below), we treat the clock itself as the least significant ‘counter bit’ and effectively create an 8-bit counter even though we have a 7-bit counter. Equivalently, we are now counting from 0 to 255 half cycles of the clock. Thus, in Fig. 17, the clock is the C₀ bit, while the counter stages go from C₁-C₇. From now on, we will assume that we are counting from 0 to 255.
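The effective 8-bit count can be sketched in a few lines. The sketch assumes that the clock is low during the first half of each counter state and high during the second half; the polarity is not stated in the text, and the function names are hypothetical.

```python
def half_cycle_count(counter7: int, clock: int) -> int:
    """Combine the 7-bit counter (C7..C1) with the clock level (C0).

    Assumption: the clock is low in the first half of each counter state.
    """
    if not 0 <= counter7 <= 127:
        raise ValueError("counter7 must be a 7-bit value (0..127)")
    if clock not in (0, 1):
        raise ValueError("clock must be 0 or 1")
    return (counter7 << 1) | clock


def counter_bit(count8: int, n: int) -> int:
    """Value of bit C_n (n = 0..7) of the effective 8-bit count."""
    return (count8 >> n) & 1


# One full frame: every counter state is visited with the clock low, then high.
frame = [half_cycle_count(c, clk) for c in range(128) for clk in (0, 1)]
print(frame[:6], "...", frame[-1])
```

Stepping through all 128 counter states in this way visits every count from 0 to 255 once, in order.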

The dual-slope converters in each channel are configured such that de-integration is performed when the counter has values in the set {0 to 127}, auto-zeroing for the next conversion is performed from the end of de-integration to counter value 191, and integration for the next conversion is performed when the counter has values in the set {192 to 255}. There is only one counter for the whole chip and all converters are performing conversion in parallel in accordance with this scheme. Depending on the value of their input current magnitude sampled during integration, somewhere during the de-integration phase, each of the comparator outputs of the logarithmic converters will asynchronously go high. The exact value of the counter at the time a particular comparator output goes high is a digital measure of the log spectral energy in that comparator’s channel. Thus, at the time a comparator goes high, the value of the counter at this instant is latched into the first set of registers to create the 7-bit D numbers for each channel. Note that since the counter only goes from 0 to 127 during de-integration, we need only latch the C₀ through C₆ bits. The D numbers become valid at some asynchronous time which is different for each channel. To synchronize these numbers, they are all latched into a further set of registers to create the E numbers at the start of the integration phase (counter is at 192 or equivalently C₆ and C₇ are high). The 16 E numbers are now multiplexed onto the F bus by clocking a shift-register-controlled bank of tri-state buffers with a clock that is 16 times as fast as the slowest period-determining clock of the system, i.e., C₇. Thus, we use the C₃ bit of the counter, which is 16 times as fast as C₇, to synchronously move the E numbers onto the F bus, which is driven with the tri-state P drivers. The shift register is configured to cyclically shift an active token bit amongst its stages in sequential fashion.

To summarize and clarify the above description, we note that a 1kHz sampling rate per channel on C₇ would cause the CIS output clock to operate at 16kHz while the input clock would operate at 128kHz.
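The timing described above can be summarized in a small behavioral sketch. It is a simplified model under stated assumptions, not the on-chip logic: `trip_count` is a hypothetical name for the count at which a channel's comparator goes high, and the phase names are labels chosen for illustration.

```python
def int_signal(count8: int) -> int:
    """INT = C6 AND C7, which is high only for counts 192..255."""
    return ((count8 >> 6) & 1) & ((count8 >> 7) & 1)


def converter_phase(count8: int, trip_count: int) -> str:
    """Phase of one channel's dual-slope converter at a given count.

    trip_count (0..127) is the count at which this channel's comparator
    goes high; it is also the 7-bit D number latched for the channel.
    """
    if not 0 <= count8 <= 255:
        raise ValueError("count8 must be in 0..255")
    if not 0 <= trip_count <= 127:
        raise ValueError("trip_count must be in 0..127")
    if int_signal(count8):
        return "integrate"
    if count8 < trip_count:
        return "de-integrate"
    return "auto-zero"


def counter_bit_frequency_hz(sampling_rate_hz: float, n: int) -> float:
    """Frequency of bit C_n when C7 runs at the per-channel sampling rate."""
    return sampling_rate_hz * 2 ** (7 - n)


print(converter_phase(50, 60), converter_phase(100, 60), converter_phase(200, 60))
print(counter_bit_frequency_hz(1000, 3))  # CIS clock, C3
print(counter_bit_frequency_hz(1000, 0))  # input clock, C0
```

For a 1kHz sampling rate the sketch reproduces the 16kHz CIS clock and the 128kHz input clock quoted above.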

To prevent the tri-states from fighting each other and being simultaneously on during transitions, a hazardous and power-wasting situation, the tri-state control signal from the shift register is ANDed with C₃: Whenever C₃ goes low, all tri-states are inactivated after a slight delay. A half clock cycle later, when C₃ goes high again, only one tri-state with an active shift-register bit is activated. Thus, there is a half clock cycle gap between the activation of successive tri-states as long as the AND gate’s delay is small. Safe operation with no simultaneous activation of tri-states is assured if the AND gate’s delay is less than half a clock cycle. The output data of the F bus is guaranteed to be valid after the falling edge of C₃. At this point, the data has been driven onto the bus by some tri-state in the previous half period, this tri-state is about to go inactive, and all other tri-states are inactive. The C₃ clock is outputted as a ‘CIS clock’ for use in electrode stimulation.
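The gating of the tri-state drivers can be illustrated with an idealized, zero-delay model. It assumes that the token advances once per C₃ period, at the falling edge of C₃, and that token position 0 corresponds to the first channel; neither detail is given in the text, and all names are hypothetical.

```python
NUM_CHANNELS = 16


def c3(count: int) -> int:
    """Bit C3 of the effective 8-bit half-cycle count."""
    return (count >> 3) & 1


def token_index(count: int) -> int:
    """Position of the active shift-register token.

    Assumption: the token advances once per C3 period (16 half cycles),
    at the falling edge of C3.
    """
    return (count // 16) % NUM_CHANNELS


def tri_state_enables(count: int) -> list:
    """Enable of each driver: its shift-register bit ANDed with C3."""
    token = token_index(count)
    return [int(ch == token and c3(count) == 1) for ch in range(NUM_CHANNELS)]


def bus_schedule() -> list:
    """Channel driving the F bus (or None) at each count 0..255."""
    schedule = []
    for count in range(256):
        enables = tri_state_enables(count)
        if sum(enables) > 1:
            raise RuntimeError("two tri-states enabled at once")
        schedule.append(enables.index(1) if 1 in enables else None)
    return schedule


schedule = bus_schedule()
print(schedule[:20])
```

In this model each driver is enabled only while C₃ is high, so successive drivers are separated by a half period of C₃ in which the bus is undriven. The sketch does not model the AND-gate delay discussed above.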

In Figs. 15 and 17, the scheme described above is implemented by using C₆ AND C₇ to generate INT. All other timing signals (such as NOT(DEINT) and AZ) are generated asynchronously by a state-machine in each logarithmic A/D. Our choice of using a dual-slope conversion scheme is convenient because the sharing of the counter amongst all channels saves digital power, and because the counter conveniently provides various timing control signals.

In addition to outputting digital bits, the de-integration pulses from each channel are also directly reported off chip such that external analog-to-digital conversion may be performed on these pulses. Alternatively, external pulse-processing circuitry can directly transform these pulses to stimulate electrodes.

B. Programmability and Visibility

A programmability-and-visibility clock sequentially shifts an active token in a shift-register-based scanner on the chip [23]. When this token activates a particular channel, that channel can be programmed and its visible outputs can be examined. The active token can also activate no channel, the default mode when the processor is in use and not being programmed. If a programming control line is enabled, any of the 18 programmable bits of this channel can be altered and stored in on-chip latches. The 18 bits consist of 5 bits for setting a DAC that controls the low-frequency bandpass filter corner, 5 bits for setting a DAC that controls the high-frequency bandpass filter corner, 3 bits for setting the DAC release current in the envelope detector, 2 bits for setting the DAC attack current in the envelope detector, and 3 bits for setting the DAC reference current in the logarithmic A/D. In addition, 3 global bits can be configured to allow the sampling rate per channel to be 500Hz, 1kHz, 1.5kHz, 2kHz, or an externally settable value with a user-supplied bias current. Thus, there are 16 x 18 + 3 = 291 programmable bits in the system.
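The bit budget can be tallied with a short sketch. The field widths are those listed above; the field names and the packing order within the 18-bit word are hypothetical, since the on-chip latch ordering is not described.

```python
# Field widths per channel, as listed in the text. Names and order are
# illustrative assumptions, not the chip's actual latch layout.
CHANNEL_FIELDS = {
    "bandpass_low_corner_dac": 5,
    "bandpass_high_corner_dac": 5,
    "envelope_release_dac": 3,
    "envelope_attack_dac": 2,
    "log_adc_reference_dac": 3,
}
NUM_CHANNELS = 16
GLOBAL_BITS = 3  # sampling-rate selection


def bits_per_channel() -> int:
    return sum(CHANNEL_FIELDS.values())


def total_programmable_bits() -> int:
    return NUM_CHANNELS * bits_per_channel() + GLOBAL_BITS


def pack_channel_word(settings: dict) -> int:
    """Pack one channel's settings into an 18-bit word (hypothetical order)."""
    word = 0
    for name, width in CHANNEL_FIELDS.items():
        value = settings[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


print(bits_per_channel(), total_programmable_bits())
```

The tally gives 18 bits per channel and 291 bits in total, matching the count stated above.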

When a channel is visible, the input of the first highpass stage in the bandpass filter, the output of the bandpass filter, the rectifier output current, the peak-detector output current, the diode input to the logarithmic A/D, the dual-slope waveform of the logarithmic A/D, and the comparator output of the logarithmic A/D may be observed off chip. Internal current waveforms are transduced to voltages with an instrumentation transconductor whose bias current is electronically adjustable.

VII. EXPERIMENTAL PERFORMANCE OF THE PROCESSOR

Fig. 18 shows a die photo of the processor. The chip was fabricated in a 1.5µm AMI BiCMOS process available through MOSIS, the prototyping chip-fabrication service. The 9.58mm x 9.28mm chip has 11,640 NMOS transistors, 11,537 PMOS transistors, 287 capacitors, and a handful of bipolar transistors. It was tested on a custom printed circuit board, with a digital I/O system (DIO), an analog I/O system (AIO), and a PC running Matlab. The DIO is a DAQPAD-6507 from National Instruments, which interfaces to the USB port on a computer, and is used for programming the chip only. The AIO is a DAQPAD-6070E from National Instruments with 12 bit resolution, 1.25MS/s capability, a FireWire computer interface, and BNC connectors to connect with the test board. While the output bits of the chip are digital, the AIO system has enough bandwidth and resolution to allow us to view the digital output waveforms as analog signals. Matlab allows us to plot and visualize the data from the chip.

Fig. 19 shows the output frequency response of all channels when all filters are configured with the same DAC bits. The data reveal that there is good time-constant matching across all channels since all bandpass filters peak at the same frequency. In addition, the second-order roll-off slopes of a bandpass filter ought to be linear on a log-log plot. Since the output bits of the chip represent the logarithm of the spectral energy in each channel, they behave as if they are on a logarithmic scale, and exhibit a nice linear change with frequency. These data reveal that each channel is behaving as expected except that the gains of each channel are not the same, resulting in offset shifts in logarithmic bit space. Figs. 20A and 20B show that the offset calibration bits of the logarithmic A/D may be configured to attenuate these offsets: Fig. 20 shows that the logarithmic response of the bits to varying amplitudes of a sine tone at the peak frequency of the channel has much less offset after calibration (Fig. 20B) compared with before calibration (Fig. 20A). We also see that there is a 57dB internal dynamic range of operation from 700µV rms to 500mV rms. Fig. 21 shows that the bandpass filters in the array may be configured to span the entire audio frequency range from 200Hz to 6kHz with approximately 50dB of stop band rejection (1.5 decades with 40dB per decade).
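The internal dynamic range follows directly from the two amplitude limits quoted above. A minimal check, with a hypothetical function name:

```python
import math


def dynamic_range_db(v_max_rms: float, v_min_rms: float) -> float:
    """Dynamic range in dB between two rms voltage amplitudes."""
    return 20.0 * math.log10(v_max_rms / v_min_rms)


idr_db = dynamic_range_db(500e-3, 700e-6)
print(round(idr_db, 1))
```

The ratio of 500mV rms to 700µV rms is about 714, or approximately 57dB.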

Fig. 22 shows that a Matlab simulation of our chip generates an output similar to that from the chip: The input to the chip is a sine tone with a frequency that ramps up and down as revealed by its FFT in the lower-most plot. The Matlab plot in the center and the chip output at the very top generate broad responses across the channels due to the relatively broad filters in both implementations.

Fig. 23 shows photographs of real-time spectrograms generated by plotting the output bits from the 16 channels as a function of time with dark values representing high bit numbers and lighter values representing low bit numbers. By talking into the FG3329 microphone and watching the computer, a running spectrogram of one’s voice can be constructed as one speaks. The spectrograms were taken while the speaker was saying the word ‘bit’. The word is said twice in a row with a gap. On the spectrogram shown on the left, there is no RF noise in the supply. On the spectrogram shown on the right, we intentionally capacitively couple a 49MHz carrier modulated at 1kHz onto the power supply of the chip. The amplitude of the carrier as measured directly on the power supply was 430mVpp and the amplitude modulation depth was 100%. We see that there is barely any change in the two spectrograms even with large interference. Such robust operation is critical to the performance of the chip in real implant environments and confirms that the attention paid to power-supply-immune design was worthwhile. In addition, tests of RF interference performed with a large RF coil surrounding the test board reveal that very high levels of radiated RF (1Vrms on a 10-turn coil of dimensions 30cm X 30cm encircling the chip, and inducing a carrier amplitude of 400mVpp on the power supply) do not interfere with the operation of the chip.

The power of various portions of the processor was measured by measuring the current consumption of that portion with a Keithley electrometer (separable supply rails for each portion made this measurement possible) and then multiplying by 2.8V, the power supply voltage used for the processor. These power measurements are listed in Table 1, with a total power consumption of 211µW. The numbers were measured at a 1kHz sampling rate per channel with the lowest-frequency filter configured to have corners at 200Hz and 400Hz, and the highest-frequency filter configured to have corners at 5kHz and 10kHz.
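As a consistency check on the power budget, the sketch below sums the entries of Table 1 and converts the total into a supply current at 2.8V. The dictionary keys are shortened labels chosen for illustration.

```python
# Power of each portion of the processor in microwatts, from Table 1.
POWER_UW = {
    "microphone_front_end_and_microphone": 100,
    "automatic_gain_control": 30,
    "bandpass_filters": 19,
    "envelope_detectors": 14,
    "logarithmic_adc_analog": 13,
    "digital_control_and_cis_interface": 32,
    "bias_circuits": 3,
}
SUPPLY_VOLTAGE_V = 2.8

total_power_uw = sum(POWER_UW.values())
total_current_ua = total_power_uw / SUPPLY_VOLTAGE_V

print(total_power_uw, "uW")
print(round(total_current_ua, 1), "uA")
```

The entries sum to the reported 211µW, which corresponds to a total supply current of about 75µA.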

VIII. ANALOG VERSUS DIGITAL

We estimate that a traditional A/D-then-DSP implementation of our processing would use about 0.25mW-0.5mW for the microphone front end and A/D, and 250µW/MIP x 20 MIPS = 5mW for the other processing, yielding a total power consumption of about 5.5mW. These numbers are representative of state-of-the-art cochlear-implant processing. Thus, our 211µW implementation is at least 25 times more power efficient even though it is in a 1.5µm process, not in an advanced submicron process. If we subtract the 100µW power consumption of our microphone and microphone front end, we are doing 20MIPS of processing in 100µW of power, i.e., we are operating at around 5µW/MIP.
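The arithmetic behind this comparison can be reproduced as follows. All input figures are the estimates quoted above; the variable names are hypothetical.

```python
DSP_UW_PER_MIPS = 250.0        # estimate quoted in the text
PROCESSING_MIPS = 20.0         # estimate quoted in the text
FRONT_END_UW = (250.0, 500.0)  # microphone front end and A/D, low and high
ANALOG_TOTAL_UW = 211.0        # measured total for this processor
ANALOG_MIC_UW = 100.0          # microphone and microphone front end

dsp_uw = DSP_UW_PER_MIPS * PROCESSING_MIPS
digital_total_uw = tuple(front + dsp_uw for front in FRONT_END_UW)
advantage = tuple(total / ANALOG_TOTAL_UW for total in digital_total_uw)
analog_uw_per_mips = (ANALOG_TOTAL_UW - ANALOG_MIC_UW) / PROCESSING_MIPS

print(digital_total_uw)                      # (5250.0, 5500.0) uW
print(tuple(round(a, 1) for a in advantage))
print(round(analog_uw_per_mips, 2))
```

The A/D-then-DSP estimate comes to 5.25mW-5.5mW, a ratio of roughly 25 to 26 relative to 211µW, and the analog processing outside the microphone path works out to about 5.5µW/MIP, in line with the rounded figure of around 5µW/MIP.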

Needless to say, the digital µW/MIP number will constantly improve with Moore’s law. If we generously assume that it’s actually 0 at the end of Moore’s law, the power consumption of a very low power microphone front end, anti-alias filter, and A/D would likely still exceed 211µW. It must be remembered that A/D scaling in speed, power, and precision is much slower than Moore’s law and that some circuits in our analog implementation could also benefit from these improvements. For example, the envelope detector and logarithmic A/D would both have reduced power consumption in a more advanced technology with reduced parasitic capacitance. Reducing the IDR in each channel and doing more gain control would cut analog power consumption further. For example, a 40dB IDR system could operate with 140µW-150µW of power. The use of very narrowband filters or higher-order filters could increase our power consumption to 300µW. However, we would still be an order of magnitude more efficient than an A/D-then-DSP implementation. It should be noted that very narrowband filters and higher-order filters have poor timing resolution and usually do not improve patient performance or speech-recognition performance. A custom digital solution would certainly reduce our power advantage but the high cost of the A/D and microphone would still give us a significant advantage.

It is useful to understand why we are operating more efficiently than an A/D-then-DSP implementation: An A/D immediately creates a representation of the incoming information as a series of relatively high-precision and high-speed numbers (16bits at 44kHz is typical in this application) that by themselves carry very little meaningful information. This digitization costs a lot of power because it is expensive to do any task at high speed and high precision. The high precision is necessary if we want wide-dynamic-range operation and are doing all computations including gain control in the digital domain, and the high-speed is necessary to avoid aliasing. Then, a DSP takes all of these numbers and crunches them with millions of multiply-accumulate operations per second, burning power in several switching transistors. It finally extracts more meaningful log-spectral-energy information at a much slower 100Hz-1kHz rate in 16 parallel bands, and at 8-bit precision, due to the high variability in speech data. In contrast, analog preprocessing allows for efficient compression of the incoming data such that low-speed and low-precision A/D’s at a later stage of the computation quantize the meaningful information. Some of our prior work analyzes the optimal point for digitizing information in more general systems [15]: Too much analog preprocessing before digitization is inefficient because the costs required to maintain precision begin to rise steeply, while too little analog preprocessing before digitization is inefficient because analog degrees of freedom that can be exploited to improve computational efficiency are ignored in the digital system.
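The compression of information described above can be quantified with the figures given in this paragraph. The sketch simply multiplies precision, rate, and channel count; the function name is hypothetical.

```python
def bit_rate(bits_per_sample: int, sample_rate_hz: float, channels: int = 1) -> float:
    """Raw data rate in bits per second."""
    return bits_per_sample * sample_rate_hz * channels


front_end_bps = bit_rate(16, 44_000)              # A/D at the input
output_bps_fast = bit_rate(8, 1_000, channels=16)  # 16 bands at 1 kHz
output_bps_slow = bit_rate(8, 100, channels=16)    # 16 bands at 100 Hz

print(front_end_bps, output_bps_fast, output_bps_slow)
print(front_end_bps / output_bps_fast, front_end_bps / output_bps_slow)
```

The input A/D produces 704kb/s, whereas the log-spectral-energy outputs amount to between 12.8kb/s and 128kb/s, so the meaningful information is carried at a rate 5.5 to 55 times lower than that of the raw samples.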

Analog systems are more efficient than digital systems at low output precision while digital systems are more efficient than analog systems at high output precision [15]. In this processor, the output precision in each channel is 7 bits, and the output bandwidth is a few kHz at most. The IDR is near 60dB with gain control allowing 77dB of input dynamic range. An analog solution can therefore compete with a digital solution if we are careful in maintaining the necessary precision throughout the system. If the task required 14 bits of output precision, 72dB IDR, and 100kHz bandwidth at each channel, the A/D-then-DSP strategy would definitely be more efficient than our solution. An analog solution has to preserve its efficiency advantage by paying a lot of attention to robustness. The robustness is not attained in every device and in every signal as in a digital solution but at important locations in the signal-flow chain, where it matters. In other words, the robustness-efficiency tradeoff is addressed very differently in an analog system than in a digital system. The programmability is certainly not as great in an analog system as it can be in a digital one. However, as in our case, this is less of an issue when an algorithm that is known to work needs to be implemented and excessive programmability is attained for a huge loss in efficiency.

IX. CONCLUSIONS

We described the operation of a 16-channel, 211µW, 77dB dynamic range programmable analog bionic ear processor that cuts the power consumption of state-of-the-art A/D-then-DSP approaches by a factor of 25. The processor is effectively operating at 5µW/MIP. Even if significant changes are made to its architecture, the lower-power advantage of our processor is likely to stand at the end of Moore’s law because its total power consumption is less than that of a very low power A/D and microphone front end alone. The processor could run for 30 years on a 100mAh battery capable of 1000 recharges. It is thus suited for use in fully implanted bionic ears of the future and in portable speech-recognition systems, where it could function as an ultra-low-power spectrum analyzer. This design suggests that the current trend of digitizing analog information at high-speed and high precision as soon as possible followed by processing in the digital domain is not an efficient solution if power consumption is of paramount importance. Rather, it is more advantageous to do robust-and-programmable analog preprocessing and digitize higher-level information at lower speed and lower precision.

REFERENCES

Figure 1: Overall architecture of the analog bionic ear processor.

Figure 2: Analog front end (AFE) with a microphone-current-sense topology and low-frequency feedback.

Figure 3: Overall Circuit of the microphone front end of Figure 2.

Figure 4: AFE frequency response and supply rejection performance to 5MHz.

Figure 5: Microphone Front End Power-Supply Insensitivity to 26MHz.

Figure 6: The Variable Gain Amplifier.

Figure 7: The Translinear Compression Circuit used for Gain Control.

Figure 8: The AGC Input-Output Compression Characteristics.

Figure 9: Experimental AGC output waveforms showing: (A) the dynamics of attack with $T_a=10\text{ms}$; (B) the dynamics of release with $T_r=300\text{ms}$.

Figure 10: The capacitive-attenuation bandpass filter. Two second-order filter stages are cascaded.

Figure 11: The envelope-detector topology (A); the feedback amplifier implementation (B).

Figure 12: The current-mode peak-detector topology with adjustable attack and release time constants.

Figure 13: Experimentally measured envelope detector characteristics: (A) for the 63dB dynamic range used in the channels, $f=1$kHz; (B) for the 75dB dynamic range used in the AGC, $f=100$Hz, 1kHz, 10kHz.

Figure 14: Envelope detector: (A) attack-time programmability ($T_a=1$ms and $T_a=4$ms); (B) release-time programmability ($T_r=4$ms, $T_r=6$ms, $T_r=8$ms, $T_r=12$ms, $T_r=24$ms).

Figure 15: The Logarithmic A/D topology.

Figure 16: The on-chip reference current and voltage bias circuit.

Figure 17: The digital I/O interface for Continuous Interleaved Sampling (CIS) output stimulation.

Figure 18: A die photo of the bionic-ear processor chip.

Figure 19: Matching time constant data across all 16 channels of the chip at the same bandpass filter DAC bits.

Figure 20: Logarithmic A/D offsets: (A) before calibration; (B) after calibration. All 16 channels have an IDR of 57dB.

Figure 21: The Bits-vs-Frequency Curve for all 16 channels. The filters span the frequency range from 200Hz to 6kHz.

Figure 22: Experimental word spectrogram (top figure), Matlab-simulation word spectrogram (middle figure), and FFT word spectrogram (bottom figure) to compare the system against.

Figure 23: Spectrogram of the word /bit/ put through the actual system: (A) The spectrogram in the absence of the RF noise; (B) The spectrogram in the presence of a 49MHz parasitic sinusoid modulated with an in-band 1kHz signal with 100% AM modulation. The RF noise is capacitively coupled to the power supply causing a signal as large as 430mVpp on its bus. Nevertheless, we see that the spectrogram barely changes. Note that the time axes in the two plots have different origins.

<table>
<thead>
<tr>
<th>Portion of System</th>
<th>Power Consumption</th>
</tr>
</thead>
<tbody>
<tr>
<td>Microphone Front End and Microphone</td>
<td>100µW</td>
</tr>
<tr>
<td>Automatic Gain Control Circuit</td>
<td>30µW</td>
</tr>
<tr>
<td>Bandpass Filters</td>
<td>19µW</td>
</tr>
<tr>
<td>Envelope Detectors</td>
<td>14µW</td>
</tr>
<tr>
<td>Logarithmic A/D (analog section)</td>
<td>13µW</td>
</tr>
<tr>
<td>Digital Control and CIS Output Interface</td>
<td>32µW</td>
</tr>
<tr>
<td>Bias Circuits</td>
<td>3µW</td>
</tr>
<tr>
<td>Entire Chip Processor</td>
<td>211µW</td>
</tr>
</tbody>
</table>

Table 1: The power of various portions of the processor.