# HermesE: A 96-Channel Full Data Rate Direct Neural Interface in 0.13 $\mu$m CMOS

Hua Gao, Student Member, IEEE, Ross M. Walker, Student Member, IEEE, Paul Nuyujukian, Student Member, IEEE, Kofi A. A. Makinwa, Fellow, IEEE, Krishna V. Shenoy, Senior Member, IEEE, Boris Murmann, Senior Member, IEEE, and Teresa H. Meng, Fellow, IEEE

Abstract—A power and area efficient sensor interface consumes 6.4 mW from 1.2 V while occupying 5 mm  $\times$  5 mm in 0.13  $\mu m$  CMOS. The interface offers simultaneous access to 96 channels of broadband neural data acquired from cortical microelectrodes as part of a head-mounted wireless recording system, enabling basic neuroscience as well as neuroprosthetics research. Signals are conditioned with a front-end achieving 2.2  $\mu V_{\rm rms}$  input-referred noise in a 10 kHz bandwidth before conversion at 31.25 kSa/s by 10-bit SAR ADCs with 60.3 dB SNDR and 42 fJ/conv-step. Switched-capacitor filtering provides a well-controlled frequency response and utilizes windowed integrator sampling to mitigate noise aliasing, enhancing noise/power efficiency.

Index Terms—Biosignal conditioning, boxcar sampling, charge sampling, high channel count, low noise, low power, neural interface, switched capacitor, windowed integrator.

#### I. INTRODUCTION

ACQUISITION of neuronal electrical activity via chronically implanted electrode arrays has enabled a wide range of advances in electrophysiological experimentation [1], [2] towards basic neuroscience as well as neural prosthetics [3], [4]. Research tools created by IC designers are used to explore the function of the central and peripheral nervous systems [5]–[8], and have an impact on the way we diagnose, treat, and understand a broad range of neurological ailments such as epilepsy, chronic pain, obsessive compulsive disorder, and chronic neurodegenerative diseases [8]–[10]. We have been involved in the study of motor/premotor cortical activity [11] and its application towards prostheses, enabled in part by the "Hermes" series of mobile neural acquisition systems [6], [12], [13], which aim to provide high-quality broadband recording of ensemble neuronal activity in freely behaving primates over long periods of time (the merits of which are discussed in [2], [12]).

Manuscript received August 26, 2011; revised November 18, 2011; accepted November 20, 2011. Date of publication February 27, 2012; date of current version March 28, 2012. This paper was approved by Guest Editor Vivek De. This work was supported in part by the C2S2 Focus Center, one of six research centers funded under the Focus Center Research Program (FCRP), a Semiconductor Research Corporation entity, and in part by Analog Devices.

- H. Gao, R. M. Walker, B. Murmann, and T. H. Meng are with the Department of Electrical Engineering, Stanford University, Stanford, CA 94305 USA (e-mail: hgao@stanford.edu; rossw@stanford.edu).
- P. Nuyujukian is with the Department of Bioengineering, Stanford University, Stanford, CA 94305 USA (e-mail: paul@npl.stanford.edu).
- K. A. A. Makinwa is with the Electronic Instrumentation Laboratory, Delft University of Technology, Delft 2628CD, The Netherlands.
- K. V. Shenoy is with the Departments of Electrical Engineering and Bioengineering and Neurosciences Program, Stanford University, Stanford, CA 94305 USA.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSSC.2012.2185338

A small head-mounted enclosure houses the Hermes system [see Fig. 1(a)], which interfaces with an intracranial 96-channel Utah Electrode Array (UEA). Physical size and battery-life constraints have thus far limited our capabilities to record only from a subset of the available channels [6], [12], or to record a compressed version of neural activity in the form of threshold crossings [13]. The focus of this work is the design of a high channel count, high fidelity neural sensor interface IC [14] that achieves state-of-the-art power consumption and noise performance while providing instrument grade neural recordings for the primary purpose of enabling neuroscience research. The proposed IC will form the cornerstone of the next-generation HermesE [see Fig. 1(b)] wireless acquisition system, which extends our recording capabilities to 96 broadband channels. Combining our IC with a low power ultra-wideband transmitter [15] and a low voltage FPGA, HermesE will consume $\sim 30$ mW, an $11 \times$ reduction in power compared to what is achievable by scaling previous systems [6] to 96 broadband channels.

Several similar and recent works are presented in [5], [7], [8], which provide high channel count access to implanted electrode arrays. However, these ICs do not focus on instrument grade neural signal acquisition on par with commercially available systems in terms of noise, bandwidth, and channel-to-channel matching, although they do incorporate other functionality useful in research and treatment such as providing spike detection [5], [7] for prosthesis control or neural stimulation capabilities [8]. To our knowledge, the majority of neuroscience labs employ commercial, off-the-shelf systems (e.g., from Blackrock Microsystems, Neuralynx, or Plexon) to acquire high quality broadband ( $< 3 \mu V_{\rm rms}$  noise, 10 kHz bandwidth) neural recordings for research purposes, and these commercial systems are considered the state-of-the-art for recording. Our interface IC aims to be a robust and viable alternative that provides similar recording performance and can be used in low power (1.2 V), mobile recording systems.

This paper is organized as follows. Section II describes high level system considerations. Sections III and IV detail the designs of the signal conditioning circuitry and analog-to-digital converter, respectively. Sections V and VI cover the digital interface and floorplanning of the chip. Measurement results and comparisons to other systems are given in Sections VII and VIII.



Fig. 1. (a) Cross-section of the head-mounted Hermes system. (b) HermesE system block diagram.



Fig. 2. In-vivo extracellular neural signals

#### II. SYSTEM OVERVIEW

The signals picked up by an implanted neural electrode array typically consist of three components (illustrated in Fig. 2): extracellular neural action potentials (ENAP or "spikes"), the local field potential (LFP), and a relatively large DC offset. Neural spikes are short-duration, commonly biphasic pulses that last around 0.3 to 1 ms (1 ms for primates), picked up from a firing neuron in the vicinity of the electrode tip. Peak ENAP magnitudes vary from about 20 $\mu$V to 1 mV, depending on the separation distance of the neuron and electrode, and most of the energy content resides in the 500 Hz to 5 kHz band. Due to the 400 $\mu$m electrode spacing, a single neuron only couples ENAP signals into one electrode, but each electrode may pick up ENAP signals from multiple nearby neurons. Spike sorting [16] can be used to classify each neuron according to its spike shape. Given these characteristics, the ENAP conditioning sections are chosen to have 56 dB gain with an output range of $\pm 600$ mV differential (in our 1.2 V process), in a passband of 300 Hz to 10 kHz to fully encompass all of the relevant signal energy with some margin for reducing magnitude and phase distortion.

LFPs are slow oscillations (<200 Hz), up to about 3 mV peak amplitude, that arise from the aggregate firing of many neurons in one region of the brain. They are highly correlated across the implant site for UEA geometries, so only a few channels are required to sufficiently capture their behavior. Therefore, we dedicate four channels to acquire composite LFP and ENAP signals with a reduced gain of 40 dB and an extended passband of <1 Hz to 10 kHz, allowing the other 92 channels to have relaxed dynamic range requirements for acquiring ENAP only.
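As a rough sanity check (not taken from the paper), the Python sketch below maps the quoted peak signal amplitudes through the two chosen gains and compares the result with the $\pm 600$ mV output range. Treating the LFP and spike peaks as adding linearly is an assumption that yields a worst case.

```python
def db_to_ratio(gain_db):
    """Convert a voltage gain in dB to a linear ratio."""
    return 10 ** (gain_db / 20)

OUTPUT_LIMIT_MV = 600.0                 # +/-600 mV differential output range

# ENAP-only channels: 56 dB gain, spikes up to ~1 mV peak
enap_gain = db_to_ratio(56)             # ~631 V/V
enap_peak_out_mv = 1.0 * enap_gain      # ~631 mV

# ENAP/LFP channels: 40 dB gain; ~3 mV LFP plus ~1 mV spike, summed as a worst case (assumption)
lfp_gain = db_to_ratio(40)              # 100 V/V
lfp_peak_out_mv = (3.0 + 1.0) * lfp_gain  # 400 mV

print(f"ENAP:     {enap_gain:.0f} V/V -> {enap_peak_out_mv:.0f} mV peak (limit {OUTPUT_LIMIT_MV:.0f})")
print(f"ENAP/LFP: {lfp_gain:.0f} V/V -> {lfp_peak_out_mv:.0f} mV peak (limit {OUTPUT_LIMIT_MV:.0f})")
```

Under these assumptions the largest ~1 mV spikes land at roughly full scale on the 56 dB channels, while the 40 dB channels keep headroom for combined LFP and ENAP signals.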

At the electrode-tissue interface, an electrical double layer is formed by the metal electrode and ions very near its surface, polarizing the tip [17] and developing a large DC offset (up to about 15 mV in our systems). This DC component is removed through the use of AC coupling capacitors at the input of each channel. A more thorough account of neural signals and electrode arrays is given in [18] and the references therein.

Coupling through the electrode-tissue interface is primarily capacitive, and a very rough electrical model of the electrode is a series combination of a 1 nF capacitor and 100 k $\Omega$  resistor [19], though in reality the impedance is nonlinear and dependent on the time-varying conditions of the surrounding tissue. Background cortical activity and the electrode itself give rise to a noise floor on the order of 10  $\mu V_{\rm rms}$  [19] at the input, in a 10 kHz band. To achieve high fidelity recordings, the input-referred noise of the signal conditioning circuitry is targeted at 2  $\mu V_{\rm rms}$ .
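The following sketch shows why a 2 $\mu V_{\rm rms}$ target is adequate against a 10 $\mu V_{\rm rms}$ floor, assuming the front-end noise is uncorrelated with the background activity so that the two add in power. It also evaluates the rough series R-C electrode model at 1 kHz; the choice of frequency is arbitrary and for illustration only.

```python
import math

R_E = 100e3   # series resistance of the rough electrode model (ohm)
C_E = 1e-9    # series capacitance of the rough electrode model (F)

def electrode_impedance(f_hz):
    """|Z| of the series R-C electrode model at f_hz (ohm)."""
    x_c = 1.0 / (2 * math.pi * f_hz * C_E)
    return math.hypot(R_E, x_c)

background_uv = 10.0   # cortical + electrode noise floor in a 10 kHz band
frontend_uv = 2.0      # input-referred noise target of the signal conditioning
total_uv = math.hypot(background_uv, frontend_uv)   # uncorrelated sources add in power
increase_pct = 100 * (total_uv / background_uv - 1)

print(f"|Z| at 1 kHz = {electrode_impedance(1e3) / 1e3:.0f} kOhm")
print(f"total noise = {total_uv:.2f} uVrms (+{increase_pct:.1f} %)")
```

With these numbers the front-end raises the total noise by only about 2%.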

The HermesE system and neural interface IC block diagrams are shown in Fig. 1(b). A switched-capacitor (SC) band-pass filter (BPF) architecture is used in the signal conditioning section to ratiometrically and accurately set frequency corners. This immunity to process variation is an important quality that obviates the need for hand tuning, simplifying the usage by neuroscientists and engineers who will ultimately employ HermesE in practical settings. To enhance noise/power efficiency of the SC filter, a  $\rm G_m\text{-}OTA\text{-}C$  integrator is utilized for sampling.

Data conversion is performed at 31.25 kSa/s, slightly oversampling to increase alias rejection. The ADC resolution is chosen to be 10-bit for a peak-peak input range of 1.2 V differential, keeping quantization noise small relative to the noise of the signal conditioning section. Successive-approximation register (SAR) ADCs are used as they exhibit ultra-low power consumption for moderate resolution and low sample rates due to amplifier-less implementation. By burying active circuitry underneath MIM capacitors available in our process, our designs are sufficiently area-efficient to allocate one ADC per channel, avoiding the increased system complexity involved with time division multiplexing and minimizing buffer power [20] in the signal conditioning section.
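A back-of-envelope check of the quantization-noise argument, assuming an ideal 10-bit quantizer with uniform error ($\mathrm{LSB}/\sqrt{12}$) and the 56 dB ENAP gain. It also shows how modest the oversampling is relative to the 10 kHz band edge. Variable names are illustrative.

```python
import math

V_FS = 1.2                      # peak-peak differential ADC input range (V)
N_BITS = 10
GAIN_ENAP = 10 ** (56 / 20)     # 56 dB ENAP front-end gain, ~631 V/V

lsb = V_FS / 2**N_BITS                  # ~1.17 mV
q_noise_out = lsb / math.sqrt(12)       # uniform quantization noise, ~338 uVrms at the ADC input
q_noise_in = q_noise_out / GAIN_ENAP    # referred back to the electrode, ~0.54 uVrms

frontend_uv = 2.2                       # measured front-end noise (abstract)
total_uv = math.hypot(frontend_uv, q_noise_in * 1e6)  # uncorrelated: powers add

F_ADC = 31.25e3
BAND_EDGE = 10e3
oversampling = F_ADC / (2 * BAND_EDGE)  # ~1.56x the Nyquist rate of a 10 kHz band

print(f"LSB = {lsb * 1e3:.3f} mV, q-noise at input = {q_noise_in * 1e6:.2f} uVrms")
print(f"total = {total_uv:.2f} uVrms, oversampling = {oversampling:.2f}x")
```

Referred to the electrode, quantization noise is roughly a quarter of the front-end noise in amplitude, so it adds little in power. Counting the whole Nyquist band rather than only the 10 kHz signal band makes this estimate conservative.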

A configuration register sets global bias and timing parameters by controlling the bias generator and on-chip clock generator, which derives the signal conditioning and ADC clocks from an external 40 MHz master clock. An output register serializes the data for off-chip delivery.



Fig. 3. (a) Signal conditioning front-end schematic and (b), (c) phases of operation.

#### III. SIGNAL CONDITIONING

Fig. 3(a) shows the schematic of the signal conditioning circuitry, which amplifies and filters the raw electrode signals before digitization. Each channel processes signals from a different high impedance electrode (at  $V_{\rm in}$ ) referenced to a common low impedance electrode ( $V_{\rm inref}$ ). The reference electrode is typically laid on the surface of the brain adjacent to the array, in contact with cerebrospinal fluid, and provides rejection of common mode noise, interference, and bias shifts.

The signal conditioning circuitry consists of an AC coupled input transconductor (A1), a SC band-pass filter to bandlimit neural signals before digitization by the subsequent ADC, and source followers used to drive the ADC's input capacitances. A fully differential architecture is employed for robustness against on-chip common mode aggressors as well as charge injection in the SC filter. The chain is designed to achieve state-of-the-art noise/power efficiency in spite of the noise aliasing penalty incurred by the use of SC filtering, requiring optimization of the interaction between the SC filter and A1, as well as with the subsequent ADC. The resulting front-end provides a number of advantages over continuous time (CT) filters used in similar designs [5]–[10], [21]–[23], such as reduced area, reduced sensitivity to component variations, and enhanced channel-to-channel matching.

#### A. Switched-Capacitor Band-Pass Filter

SC filters are well suited for low frequency signal conditioning, but suffer from noise aliasing, leading to reduced noise/power efficiency compared to CT filters. In our application, where noise/power efficiency is key, prefiltering must be used to bandlimit high frequency noise before sampling in order to minimize this effect. However, practical prefilters have a finite transition band, necessitating an increased sampling frequency, which in turn incurs an increased spread in capacitance values. In our massively parallel design, efficiency must be balanced against area: a large capacitance spread may incur a high cost in chip area as well as power consumption, while the use of a low sampling frequency impairs prefiltering. We overcome these challenges through the use of a feedback integrator topology, a custom 1 fF MOM capacitor, $100 \times$ oversampling with a 1 MHz clock, and two steps of prefiltering.

The filter topology presented here is based on the SC biquad described in [24], which achieves low capacitance spread for low Q poles. The architecture in [24] is generally appropriate for acquisition of action potentials (though improvements in noise/power efficiency are desirable), where the 300 Hz–10 kHz bandpass response corresponds to poles with Q $\cong$ 0.2. While the biquad in [24] samples the input as a voltage on a sampling capacitor, the filter presented here samples the integrated output current of A1 in the $\Phi_1$ phase [see Fig. 3(b)], during which A1, A2, and $C_A$ form a $G_m$-OTA-C integrator. In the $\Phi_2$ phase [see Fig. 3(c)], the sampled charge is processed by A2 and A3 in a feedback loop that realizes the desired band-pass response.
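Assuming the 300 Hz and 10 kHz corners coincide with two well-separated real poles, the Q quoted above can be reproduced with the short sketch below.

```python
import math

def bandpass_q(f_low, f_high):
    """Q of a band-pass with real poles at f_low and f_high.

    (s + w1)(s + w2) = s^2 + (w1 + w2) s + w1*w2, so w0 = sqrt(w1*w2) and Q = w0 / (w1 + w2).
    """
    return math.sqrt(f_low * f_high) / (f_low + f_high)

q = bandpass_q(300.0, 10e3)
print(f"Q = {q:.3f}")
```

The result, about 0.17, rounds to the Q $\cong$ 0.2 stated in the text.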

The windowed integrator sampling technique [25], [26] used in $\Phi_1$ (sometimes referred to as charge [27]–[29] or boxcar sampling [30], [31]) provides many benefits that help reduce the power consumption and area of the signal conditioning circuitry. A major benefit is the inherent prefiltering provided by integration, in contrast to typical voltage sampling, which significantly aliases multiple sidebands of noise. Windowed integrator sampling lowers the equivalent noise bandwidth through *sinc* filtering, which can be understood by considering a fictitious CT signal

$$Q_{\text{in2,CT}}(t) = \int_{t-DT_s}^{t} I_{\text{out1d}}(t')\,dt' \tag{1}$$

where D is the duty cycle of the  $\Phi_1$  sampling phase,  $T_s=1/f_s$  is the SC clock period,  $I_{\rm out1d}$  is the differential output current from A1, and  $Q_{\rm in2,CT}$  is a running windowed integral of  $I_{\rm out1d}$ . The actual sampled charge input signal to the filter is the discrete time (DT) sequence  $Q_{\rm in2,DT}[n]=Q_{\rm in2,CT}(nT_s),$  which has a spectrum composed of folded sidebands of  $Q_{\rm in2,CT}(\omega).$  From (1) it can be shown that

$$\left| \frac{Q_{\text{in2,CT}}(\omega)}{I_{\text{out1d}}(\omega)} \right| = \frac{2\left|\sin\left(\frac{\omega DT_s}{2}\right)\right|}{|\omega|}. \tag{2}$$

In the baseband, where $\omega DT_s/2 \ll 1$, the magnitude response of the *sinc* function is essentially flat. Wideband noise at the upper sidebands is filtered by the *sinc* response before folding into the baseband (below $f_s/2$), with a bandwidth that depends on the duration of $\Phi_1$.
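To make (2) concrete, the sketch below evaluates it normalized to its DC value $DT_s$, i.e., as $|\sin(\pi f D T_s)/(\pi f D T_s)|$, for the ENAP setting $D = 0.3$ and $T_s = 1~\mu$s from Table I. The probe frequencies are arbitrary choices for illustration.

```python
import math

T_S = 1e-6   # SC clock period (f_s = 1 MHz)
D = 0.3      # Phi_1 duty cycle for the ENAP channels (Table I)

def window_gain(f_hz, d=D, t_s=T_S):
    """|Q_in2,CT / I_out1d| from (2), normalized to its DC value D*T_s."""
    x = math.pi * f_hz * d * t_s        # = omega * D * T_s / 2
    return 1.0 if x == 0 else abs(math.sin(x) / x)

for f in (10e3, 1e6, 2e6, 1 / (D * T_S)):
    g_db = 20 * math.log10(max(window_gain(f), 1e-12))  # clamp the null to avoid log(0)
    print(f"{f / 1e3:8.1f} kHz: {g_db:8.2f} dB")
```

Under these values the window is essentially flat at 10 kHz, but it attenuates the first sideband at $f_s$ by only about 1.3 dB and has its first null at $1/(DT_s) \approx 3.3$ MHz, which is consistent with the text's motivation for the additional 100 kHz prefiltering pole in A1.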

The noise equivalent bandwidth of the *sinc* prefilter in this design is $3.4 \times$ smaller than that of a typical voltage sampling circuit designed for 0.1% dynamic settling error in the same allotted time of $D \cdot T_s$. This ratio directly compares the effects of the two sampling schemes on noise in the signal path preceding the driver stage, though the comparison to noise from the driver stage itself must be framed in terms of kT/C. In this design, the noise charge sampled from the output branch devices in A1 is equivalent to the kT/C noise obtained by sampling the output voltage of a purely resistive driver with a 5.4 pF sampling capacitor. The use of such a sampling capacitor would necessitate much larger capacitances throughout the filter in order to achieve the specified gain and pole positions, significantly increasing area as well as the power required for settling, and would necessitate a low impedance output from A1.
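The 3.4× figure can be reproduced under one plausible reading of "typical voltage sampling circuit": a single-pole RC track-and-hold whose time constant is chosen so that it settles to 0.1% within the same window $DT_s$. That interpretation is an assumption; the sketch simply compares the two noise-equivalent bandwidths.

```python
import math

def sinc_window_neb(t_window):
    """One-sided noise-equivalent bandwidth of a boxcar integrator of length t_window.

    |H(f)|^2 / |H(0)|^2 = sinc^2(f * t_window); integrating over f >= 0 gives 1 / (2 * t_window).
    """
    return 1.0 / (2.0 * t_window)

def rc_sampler_neb(t_window, settling_error=1e-3):
    """NEB of a single-pole (RC) sampler that settles to settling_error within t_window.

    Settling to error e in time t needs tau = t / ln(1/e); a single pole has NEB = 1 / (4 * tau).
    """
    tau = t_window / math.log(1.0 / settling_error)
    return 1.0 / (4.0 * tau)

t_win = 0.3e-6   # D * T_s for the ENAP channels
ratio = rc_sampler_neb(t_win) / sinc_window_neb(t_win)
print(f"NEB ratio (RC sampler / windowed integrator) = {ratio:.2f}")
```

In this model the ratio is $\ln(1000)/2 \approx 3.45$, independent of the window length.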

While windowed integrator sampling provides inherent anti-aliasing, achieving state-of-the-art noise/power efficiency requires further filtering of A1's noise. Hence, A1 is designed to provide an additional 100 kHz prefiltering pole (see Section III-B).

Clock jitter can adversely affect the performance of windowed integrator sampling when employed in systems with high sample rate and SNR [26], [28], or when notch frequencies in the *sinc* response are specifically utilized for blocker rejection or other purposes [26], [29]. However, simulations show that this design, which has relatively low  $f_s$  and SNR and does not rely on precise notch frequencies, experiences insignificant effects from typical levels of clock jitter.

#### B. Input Transconductor (A1)

Fig. 4 shows the A1 schematic in detail; SC CMFB (not shown) sets the output common mode. A1 incorporates traditional low noise techniques, as well as some special considerations for interfacing with the SC filter. A1 essentially operates in CT: the output current is switched between the A2 inputs in $\Phi_1$ and resistor dummy loads in $\Phi_2$, which roughly match the loading conditions in $\Phi_1$ (keeping the voltage at A1's outputs similar in each phase).

Fig. 4. A1 detailed schematic

As seen in Fig. 3(b),  $C_{\rm AC}$  forms a capacitive divider with the input capacitance of A1,  $C_{\rm IN1}.$  The divider attenuates the input signal, and has implications for noise that are analogous to the feedback factor,  $\beta,$  in a feedback OTA. In this design, a large value of  $C_{\rm AC}$  is used (20 pF MIM) and half-width Miller neutralization [32] devices ( $M_{\rm N}$  in Fig. 4) are employed to eliminate the amplified  $C_{\rm gd}$  of the A1 input pair. The value of  $C_{\rm AC}$  has no other effect on the overall transfer function, in contrast to feedback neural amplifier designs [5]–[10], [21]–[23], [33], where  $C_{\rm AC}$  sets the ideal passband gain.
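The text gives $\alpha$ (Table I) and $C_{\rm AC}$ but not $C_{\rm IN1}$. The sketch below back-calculates the implied input capacitance and the noise penalty of the divider; these derived values are estimates, not reported figures.

```python
C_AC = 20e-12   # AC coupling capacitor (MIM)
ALPHA = 0.85    # alpha = C_AC / (C_AC + C_IN1), from Table I

# Solve alpha = C_AC / (C_AC + C_IN1) for the implied A1 input capacitance
c_in1 = C_AC * (1 - ALPHA) / ALPHA

# A1's own noise is divided by alpha when referred back to the electrode
noise_voltage_penalty = 1 / ALPHA
noise_power_penalty = noise_voltage_penalty ** 2

print(f"implied C_IN1 = {c_in1 * 1e12:.2f} pF")
print(f"noise penalty: x{noise_voltage_penalty:.3f} in voltage, x{noise_power_penalty:.2f} in power")
```

Under this estimate the divider costs about 18% in input-referred noise voltage, which is why a large $C_{\rm AC}$ and Miller neutralization are worthwhile.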

The input network includes pMOS reset switches [$M_{\rm RESET}$ in Fig. 3(a)], which are normally off but may be asserted to short the inputs of A1 to a common-mode voltage. When deasserted, the large off-state resistance of the reset switch and $C_{\rm AC}$ set a high pass corner < 1 Hz, which rejects the polarization potential offset between the active and reference electrodes. High G-force head movements can cause large differential voltage transients at the front-end due to physical motion of the electrodes relative to the brain tissue. If uncorrected, these transients saturate the front-end and relax according to the mHz high-pass time constant, leading to undesirably long recovery times [21] in a practical experimental setting. An automatic reset mechanism (shown in Section IV) is implemented to address this issue for each channel individually, by activating the input reset switches.
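The off-state resistance of $M_{\rm RESET}$ is not given. As an illustration only, the sketch below computes the resistance implied by a given high-pass corner and the resulting recovery time constant, approximating the node capacitance by $C_{\rm AC}$ alone (ignoring $C_{\rm IN1}$).

```python
import math

C_AC = 20e-12  # approximation: treat C_AC as the only capacitance at the A1 input node

def min_off_resistance(f_corner_hz, c=C_AC):
    """Off-state resistance needed for a high-pass corner at f_corner_hz: R = 1 / (2*pi*f*C)."""
    return 1.0 / (2 * math.pi * f_corner_hz * c)

for f_c in (1.0, 10e-3, 1e-3):
    r = min_off_resistance(f_c)
    tau = r * C_AC
    print(f"f_c = {f_c:g} Hz -> R = {r:.2e} ohm, tau = {tau:.3g} s")
```

In this model a corner in the mHz range corresponds to teraohm-level resistance and time constants of minutes, which illustrates why a per-channel automatic reset is worthwhile.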

The input pair ( $\rm M_1$  in Fig. 4) is implemented with large (2.4 mm/0.28  $\mu$ m) pMOS devices to mitigate 1/f noise. Thick oxide devices are used to avoid gate leakage that would cause bias shifts and noise due to the large off-state impedance of the reset switches. All other FETs in the signal conditioning section are thin oxide devices except for the  $\rm C_{shunt}$  MOSCAPs.  $\rm M_1$  and  $\rm M_2$  devices are biased in subthreshold close to  $\rm g_m/I_D=25~V^{-1}$ , while  $\rm M_3$  and  $\rm M_4$  are biased in inversion near 15  $\rm V^{-1}$  to decrease their noise contributions.

Nwell source degeneration resistors, $R_{\rm S}$, suppress $M_2$'s 1/f noise contribution to insignificant levels [33] with $g_{\rm m2}R_{\rm S}=6$, where $g_{\rm m2}$ is the transconductance of the $M_2$ device. The $14~\rm k\Omega$ resistors contribute thermal noise equivalent to that of a transistor with a $g_{\rm m}/I_{\rm D}$ of $5.5~\rm V^{-1}$, approximately one fifth of the noise power of the $M_1$ input pair. Such minimization of an active load's thermal noise is difficult or impossible here through FET device sizing alone, due to the low bias currents employed. The resistors produce negligible flicker noise, leading to a total noise contribution far below what can be achieved with a non-degenerated active load in this design.
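The $5.5~\rm V^{-1}$ equivalence can be cross-checked with a short calculation. It assumes that with $g_{\rm m2}R_{\rm S}=6$ the resistor's noise current at the drain is approximately $4kT/R_{\rm S}$, that the reference transistor's noise current is $4kTg_{\rm m}$ (noise factor of 1), and that $M_2$ carries the main-branch current, taken as $20 \times I_{\rm B}$ per the bias scaling described in this section.

```python
R_S = 14e3          # Nwell degeneration resistor (ohm)
I_B = 650e-9        # output-branch bias current (Table I)
I_MAIN = 20 * I_B   # assumption: M2 carries the main-branch current, 20x the output branch

GM_ID_M1 = 25.0     # gm/ID of the input pair (1/V)

# With heavy degeneration, R_S's noise current at the drain is approximately 4kT/R_S.
# Assumption: compare it with a transistor whose noise current is 4kT*gm (noise factor 1).
gm_equiv = 1.0 / R_S
gm_id_equiv = gm_equiv / I_MAIN
ratio_to_m1 = gm_id_equiv / GM_ID_M1

print(f"equivalent gm/ID = {gm_id_equiv:.2f} 1/V, noise power ratio to M1 = {ratio_to_m1:.2f}")
```

These assumptions recover a $g_{\rm m}/I_{\rm D}$ of about 5.5 V$^{-1}$ and a noise-power ratio of about 0.22 relative to an input device at 25 V$^{-1}$, consistent with the one-fifth figure.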

Thick oxide pMOS gate capacitors, $C_{\rm shunt}$ in Fig. 4, are placed at the folding node to implement a 100 kHz prefiltering pole. This pole attenuates large noise currents from the main branch, providing roughly 20 dB of rejection at the first sideband. The $C_{\rm shunt}$ capacitors are connected to $V_{\rm DD}$ and biased in inversion to provide 23 pF. While $R_{\rm F} \| R_{\rm B}$ and $C_{\rm shunt}$ experience significant spread over PVT, the prefiltering pole is far enough from the 10 kHz ENAP band edge that variations do not significantly affect the baseband magnitude response. Additionally, the antialiasing is strong enough that variations do not significantly affect the overall noise.
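Treating the prefilter as a single real pole at 100 kHz, the sketch below checks the ~20 dB rejection at the first sideband and the in-band droop, including a hypothetical 30% downward pole shift standing in for PVT spread (the actual spread is not reported). It also back-calculates the folding-node resistance implied by 23 pF.

```python
import math

F_POLE = 100e3      # prefiltering pole at the folding node
C_SHUNT = 23e-12    # MOS shunt capacitance

def pole_gain_db(f_hz, f_pole=F_POLE):
    """Magnitude of a single real pole, in dB."""
    return -10 * math.log10(1 + (f_hz / f_pole) ** 2)

# Resistance at the folding node implied by a single-pole model (R_F || R_B)
r_node = 1 / (2 * math.pi * F_POLE * C_SHUNT)

print(f"attenuation at f_s = 1 MHz: {pole_gain_db(1e6):.1f} dB")
print(f"droop at 10 kHz: {pole_gain_db(10e3):.3f} dB")
print(f"droop at 10 kHz, hypothetical pole at 70 kHz: {pole_gain_db(10e3, 70e3):.3f} dB")
print(f"implied R_F || R_B = {r_node / 1e3:.1f} kOhm")
```

Even with the pole pulled down to 70 kHz, the droop at 10 kHz stays below 0.1 dB in this model.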

Bias current in the output branch is scaled down  $20\times$  from the main branch in order to reduce the noise contributions of  $M_3$  and  $M_4$ , critical since  $M_3$  and  $M_4$  dominate aliased noise from the upper sidebands. The limit of bias current scaling in this design is determined by the worst case combination of in-band and out-of-band signals expected to pass through A1. This includes DC offsets, 60 Hz power line interference, and ENAP/LFP signals. Choosing the output branch current  $I_B=650$  nA provides sufficient current swing with  $>3\sigma$  margin for expected signals, allowing more than 5 mV of total input amplitude before clipping.
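A back-of-envelope version of the clipping estimate, assuming the differential output current saturates when one output branch is fully steered ($|I_{\rm out1d}| = 2I_{\rm B}$) and using the low-frequency $G_{\rm m}$ from Table I. The actual design margin also accounts for the >3σ signal statistics, which this sketch does not model.

```python
G_M = 240e-6   # low-frequency transconductance of A1 (Table I)
I_B = 650e-9   # output-branch bias current
ALPHA = 0.85   # input capacitive divider

# Assumption: |I_out1d| clips at 2 * I_B (one output branch fully steered)
i_max = 2 * I_B
v_a1_max = i_max / G_M              # at A1's input nodes
v_electrode_max = v_a1_max / ALPHA  # referred back through the C_AC divider

print(f"max input before clipping: {v_a1_max * 1e3:.2f} mV at A1, "
      f"{v_electrode_max * 1e3:.2f} mV at the electrode")
```

This gives about 5.4 mV at A1's inputs (about 6.4 mV at the electrode after the divider), in line with the more than 5 mV quoted above.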

The majority of aliased noise comes from $M_3$ and $M_4$ at the first sideband, where the *sinc* prefilter response is still fairly large. A consequence of the $C_{\rm shunt}$ capacitance is that significant noise is introduced from the $M_3$ cascode device when the impedance at the folding node becomes small and $M_3$ ceases to self-shunt its own noise. However, due to the 20:1 bias current scaling, this noise penalty is much smaller than that of allowing the large noise currents from the main branch to be aliased. Since noise from $M_3$ does not roll in until after the 100 kHz prefiltering pole, $M_3$ does not contribute significant 1/f noise. $M_4$ contributes both 1/f and thermal noise, though its thermal noise contribution is more significant due to aliasing. The prefiltering provided by the *sinc* frequency response and the 20:1 bias current scaling reduce the total noise contribution from $M_3$ and $M_4$ to about one quarter of the noise power from devices in the main branch (see Section III-C).

#### C. Overall Transfer Function and Noise

For baseband signals (where $f \ll f_s$), $I_{\rm out1d}$ does not change much over the integration window, $DT_s$. The sampled charge is processed in $\Phi_2$ [see Fig. 3(c)], which enforces a z-domain biquadratic transfer function with a band-pass response. The ADC samples the source follower outputs at the beginning of $\Phi_1$, when the source followers are disconnected from the filter. This leads to the overall baseband signal transfer function for acquiring ENAP signals alone (without LFP):

$$H_{BB}(z) = \frac{V_{od}(z)}{V_{id}(z)} = \frac{-\alpha G_{m} D T_{s} z^{-1} (1 - z^{-1})}{\frac{C_{2} C_{3}}{C_{B}} + C_{A} + C_{1} - (2C_{A} + C_{1}) z^{-1} + C_{A} z^{-2}}. \tag{3}$$
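The transfer function (3) can be evaluated directly on the unit circle with the Table I values. The Python sketch below does so with a plain logarithmic frequency sweep (the grid density is an arbitrary choice) and reports the peak gain and the -3 dB corners, which should land close to the 56 dB, 300 Hz, and 10 kHz design targets.

```python
import cmath
import math

# Table I values, ENAP-only configuration
ALPHA, G_M, D, T_S = 0.85, 240e-6, 0.3, 1e-6
C1, C2, C3, CA, CB = 94e-15, 1e-15, 250e-15, 1.5e-12, 1.33e-12

def h_bb(f_hz):
    """Evaluate H_BB(z) from (3) on the unit circle, z = exp(j*2*pi*f*T_s)."""
    zi = cmath.exp(-2j * math.pi * f_hz * T_S)  # z^-1
    num = -ALPHA * G_M * D * T_S * zi * (1 - zi)
    den = C2 * C3 / CB + CA + C1 - (2 * CA + C1) * zi + CA * zi**2
    return num / den

# Log-spaced sweep from 10 Hz to 100 kHz
freqs = [10 ** (1 + 4 * k / 20000) for k in range(20001)]
mags = [abs(h_bb(f)) for f in freqs]
peak = max(mags)
passband = [f for f, m in zip(freqs, mags) if m >= peak / math.sqrt(2)]
f_low, f_high = min(passband), max(passband)
peak_db = 20 * math.log10(peak)

print(f"peak gain {peak_db:.1f} dB, -3 dB corners {f_low:.0f} Hz to {f_high / 1e3:.2f} kHz")
```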

Here $V_{id}(z)$ is the z-transform of $V_{id}[n] = V_{id}((n-D)T_s)$, $\alpha = C_{AC}/(C_{AC} + C_{IN1})$ is the capacitive divider at the input nodes seen in Fig. 3(b), and $G_{\rm m}$ is the low frequency transconductance of A1.

TABLE I
SIGNAL CONDITIONING SECTION—DESIGN PARAMETERS

| Design Parameter                     | Value         |
|--------------------------------------|---------------|
| $G_{m}$                              | 240 μS        |
| $I_{\mathrm{B}}$                     | 650 nA        |
| D (ENAP, ENAP/LFP)                   | 0.3, 0.1      |
| $f_s$                                | 1 MHz         |
| $f_{-3dB,HP}$                        | 300 Hz        |
| $f_{-3dB,LP}$                        | 10 kHz        |
| $C_1$                                | 94 fF (MIM)   |
| $C_2$                                | 1 fF (MOM)    |
| $C_3$                                | 250 fF (MIM)  |
| $C_A$                                | 1.5 pF (MIM)  |
| $C_{B}$                              | 1.33 pF (MIM) |
| $C_{\text{shunt}}$                   | 23 pF (MOS)   |
| $R_{S}$                              | 14 kΩ (Nwell) |
| $C_{AC}$                             | 20 pF (MIM)   |
| $\alpha = C_{AC}/(C_{AC} + C_{IN1})$ | 0.85          |

The approximate passband gain and pole positions can be shown to be

$$A_{\text{passband}} \cong \frac{\alpha G_{\text{m}} D T_{\text{s}}}{C_{1}}$$

$$f_{-3 \text{ dB,HP}} \cong \frac{f_{\text{s}}}{2\pi} \cdot \frac{C_{2} C_{3}}{C_{1} C_{\text{B}}} \qquad f_{-3 \text{ dB},LP} \cong \frac{f_{\text{s}}}{2\pi} \cdot \frac{C_{1}}{C_{A}}. \tag{4}$$
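Plugging the Table I values into (4) gives the approximate figures below. The sketch also prints the two individual capacitor ratios whose product sets $f_{-3~{\rm dB,HP}}$, illustrating the reduced spread that the next paragraph attributes to the feedback integrator topology.

```python
import math

# Table I values (ENAP-only configuration)
ALPHA, G_M, D, T_S = 0.85, 240e-6, 0.3, 1e-6
F_S = 1 / T_S
C1, C2, C3, CA, CB = 94e-15, 1e-15, 250e-15, 1.5e-12, 1.33e-12

a_pass = ALPHA * G_M * D * T_S / C1
f_hp = F_S / (2 * math.pi) * (C2 * C3) / (C1 * CB)
f_lp = F_S / (2 * math.pi) * C1 / CA

print(f"A_passband ~ {a_pass:.0f} V/V ({20 * math.log10(a_pass):.1f} dB)")
print(f"f_HP ~ {f_hp:.0f} Hz, f_LP ~ {f_lp / 1e3:.2f} kHz")
# The HP corner needs an overall ratio near 1/500, built from two more modest factors
print(f"C2/C1 = 1/{C1 / C2:.0f}, C3/CB = 1/{CB / C3:.2f}, product = 1/{C1 * CB / (C2 * C3):.0f}")
```

With the rounded Table I values the high-pass estimate comes out near 318 Hz, close to the nominal 300 Hz, and the passband gain near 56 dB.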

The choice of  $f_{\rm s}=1$  MHz for  $100\times$  oversampling allows for effective and robust prefiltering, but results in large capacitance ratios required to implement the  $f_{-3~{\rm dB,HP}}$  pole. The feedback integrator topology alleviates capacitance spread by defining  $f_{-3~{\rm dB,HP}}$  with multiplicative ratios of capacitances. The output swing of A3 and the value of  $C_3$  define the maximum amount of sampled charge that can be rejected by the feedback loop in  $\Phi_2$ ; 250 fF was estimated to provide rejection for  $>3\sigma$  expected out-of-band signals.

To reduce the size of the  $C_B$  capacitor,  $C_2$  is implemented with a custom 1 fF lateral MOM capacitor. While poor matching between  $C_2$  and the MIM caps results in some global variability of the  $f_{-3~{\rm dB,HP}}$  pole, the variation (measured at 7% for our implementation) is acceptable given that the pole is intended to provide roughly 20 dB of attenuation to LFP frequencies when acquiring ENAP alone. For combined ENAP/LFP acquisition on four channels, the  $C_2$  and  $C_3$  switch clocks are disabled to extend the lower cutoff (resulting in the response  $H_{BB}(z)|_{C_2=0}$ ) and D is reduced to lower the gain (accommodating the larger signal). Table I shows the design parameters for the signal conditioning section.

The input-referred noise of the signal conditioning section is dominated by thermal and 1/f noise from the A1 transconductor. Tables II and III show the estimated noise breakdown and the simulated power dissipation for the signal conditioning circuitry, respectively. The estimated noise was checked against SpectreRF's Pnoise analysis, although poor simulation models of subthreshold noise prevented direct verification. The majority of noise from A1 is generated in the baseband by the input pair, although a significant portion is aliased from $M_3$ and $M_4$. Most of the power is dissipated in the A1 transconductor. The noise and power breakdowns show that SC filtering can be employed efficiently in this application, despite noise aliasing and amplifier settling time requirements.

TABLE II
SIGNAL CONDITIONING SECTION—ESTIMATED INPUT REFERRED NOISE

| Component                            | Noise Type | Estimated Input Referred Noise Voltage | Percent of Total Noise Power |
|--------------------------------------|------------|----------------------------------------|------------------------------|
| A1 Main Branch ($M_1$, $M_2$, $R_S$) | Wideband   | 1.36 μVrms                             | 47.2 %                       |
| A1 Main Branch ($M_1$, $M_2$, $R_S$) | 1/f        | 1.04 μVrms                             | 27.6 %                       |
| A1 Output Branch ($M_3$, $M_4$)      | Wideband   | 0.80 μVrms                             | 16.3 %                       |
| A1 Output Branch ($M_3$, $M_4$)      | 1/f        | 0.28 μVrms                             | 2.0 %                        |
| A3                                   |            | 0.38 μVrms                             | 3.7 %                        |
| Switch Resistances                   |            | 0.31 μVrms                             | 2.5 %                        |
| Other (A2, Source Followers)         |            | 0.17 μVrms                             | 0.8 %                        |
| Total                                |            | 1.98 μVrms                             | 100 %                        |

TABLE III
SIGNAL CONDITIONING SECTION—SIMULATED POWER DISSIPATION

| Component         | Simulated Power | Percent of Total |
|-------------------|-----------------|------------------|
| A1                | 34.7 μW         | 85.1 %           |
| A2                | 2.4 μW          | 6.0 %            |
| A3                | 3.1 μW          | 7.6 %            |
| Source Followers  | 0.5 μW          | 1.3 %            |
| Total (1 channel) | 40.7 μW         | 100 %            |
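Assuming the contributors in Table II are uncorrelated, their powers add. The sketch below checks that the entries combine to the 1.98 μVrms total, reproduces the percentage column, and recovers the roughly one-quarter ratio between output-branch and main-branch noise power. It also scales the simulated per-channel power of Table III to 96 channels.

```python
import math

# Table II: estimated input-referred noise per contributor (uVrms)
noise_uv = {
    "A1 main branch, wideband": 1.36,
    "A1 main branch, 1/f": 1.04,
    "A1 output branch, wideband": 0.80,
    "A1 output branch, 1/f": 0.28,
    "A3": 0.38,
    "Switch resistances": 0.31,
    "Other (A2, source followers)": 0.17,
}

# Uncorrelated contributors add in power
total_power = sum(v**2 for v in noise_uv.values())
total_uv = math.sqrt(total_power)
shares = {k: 100 * v**2 / total_power for k, v in noise_uv.items()}

main = noise_uv["A1 main branch, wideband"]**2 + noise_uv["A1 main branch, 1/f"]**2
output = noise_uv["A1 output branch, wideband"]**2 + noise_uv["A1 output branch, 1/f"]**2

# Table III: simulated signal-conditioning power per channel (uW)
power_uw = {"A1": 34.7, "A2": 2.4, "A3": 3.1, "Source followers": 0.5}
per_channel_uw = sum(power_uw.values())

print(f"total noise = {total_uv:.2f} uVrms")
for name, pct in shares.items():
    print(f"  {name}: {pct:.1f} %")
print(f"output/main branch noise power = {output / main:.2f}")
print(f"{per_channel_uw:.1f} uW/channel -> {96 * per_channel_uw / 1e3:.2f} mW for 96 channels")
```

The signal conditioning sections alone would account for about 3.9 mW of the 6.4 mW total quoted in the abstract, although the Table III numbers are simulated rather than measured.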

#### D. A2 and A3 OTA Design Considerations

The A2 and A3 OTAs are designed to achieve roughly 0.1% dynamic settling error at each amplifier output at the end of the $\Phi_2$ phase. The selection of D dictates the time available for A2 and A3 to settle in the $\Phi_2$ phase, $(1-D)\cdot T_s$. A2 is implemented with a differential pair with SC CMFB, since its open loop gain does not affect the filter transfer function severely (in contrast to the original biquad in [24]). A3 is implemented with a two-stage Miller compensated design with SC CMFB, since high gain and output swing are required in this amplifier's path for out-of-band signal rejection. Subthreshold biasing is used for all devices in A2 and A3 to provide high output swing as well as good transconductance efficiency. The large front-end gain results in minor noise contributions from A2 and A3.
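Assuming purely linear, single-pole settling (no slewing), the time available in $\Phi_2$ translates into a bound on the closed-loop time constant of A2 and A3, as sketched below for both values of D from Table I.

```python
import math

T_S = 1e-6   # SC clock period

def required_tau(d, settling_error=1e-3, t_s=T_S):
    """Largest single-pole closed-loop time constant that settles within (1 - D) * T_s.

    Assumes linear settling only: error = exp(-t / tau).
    """
    t_settle = (1 - d) * t_s
    return t_settle / math.log(1 / settling_error)

for d in (0.3, 0.1):  # ENAP and ENAP/LFP duty cycles
    tau = required_tau(d)
    print(f"D = {d}: tau <= {tau * 1e9:.0f} ns, closed-loop BW >= {1 / (2 * math.pi * tau) / 1e6:.2f} MHz")
```

Under this simplified model, roughly a 1.6 MHz closed-loop bandwidth suffices for D = 0.3; real designs also need margin for slewing, which is not modeled here.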

#### IV. ANALOG-TO-DIGITAL CONVERSION

In each channel, digitization is performed by a charge redistribution, SAR ADC [34], which operates by performing a binary search over the code-space until the code most closely corresponding to the sampled input voltage is found. A capacitor array serves to both hold the input sample and act as a DAC to generate comparison voltages for the binary code search algorithm. The ADC block diagram is shown in Fig. 5. A built-in saturation detector mitigates the impact of large, overloading input transients by automatically triggering a reset of the A1 input nodes if the number of consecutive  $\pm$  full-scale codes exceeds a programmable threshold.
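The saturation detector's behavior can be summarized in a few lines. In this hypothetical Python model, any code at either rail (0 or $2^{10}-1$) extends the run, and the run counter restarts after a reset is issued; both details are assumptions, since the text does not specify them.

```python
def saturation_resets(codes, threshold, n_bits=10):
    """Return sample indices at which a hypothetical detector would reset A1's inputs.

    Assumptions: a code at either rail extends the run, and the run counter
    restarts after each reset.
    """
    full_scale = (1 << n_bits) - 1
    run = 0
    resets = []
    for i, code in enumerate(codes):
        if code in (0, full_scale):
            run += 1          # another consecutive full-scale code
        else:
            run = 0           # any in-range code breaks the run
        if run > threshold:   # run length exceeds the programmable threshold
            resets.append(i)
            run = 0
    return resets

print(saturation_resets([500, 1023, 1023, 1023, 1023, 500], threshold=3))  # [4]
```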

Fig. 6 details the SAR logic implementation. A sequencer (first row of flops in Fig. 6) schedules the application of each test code by shifting a solitary "1" down the register as *clk\_sar* is pulsed high, successively asserting each test bit in the data register (second row of flops in Fig. 6). The 10-bit data word controls the DAC by switching capacitor plates between $V_{\rm ref}$ and Gnd and allowing voltages to settle according to charge conservation. During *clk\_sar* low, the comparator is strobed and its value latched and fed back to each of the data flops; this result indicates which half of the (remaining) code-space the sampled input belongs to, and governs the test code progression. Each test bit assertion also clocks the previous data flop to take in the (previous) comparator result, forming a compact way to reset a data bit whose weight contribution is found to be too large. Following each conversion, the data registers are set to "shift" mode, allowing the formation of inter-ADC daisy chains to pipe data across the array.

Fig. 5. SAR ADC block diagram, with saturation detector.
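The binary search performed by the sequencer and data register can be modeled behaviorally as below. This is an idealized sketch, assuming a single-ended input in $[0, V_{\rm ref})$, an ideal DAC, and an ideal comparator; it does not model the differential CDAC or the split array.

```python
def sar_convert(v_in, v_ref=1.2, n_bits=10):
    """Ideal successive-approximation (binary search) conversion of v_in."""
    code = 0
    for bit in reversed(range(n_bits)):        # MSB first, like the shifting "1" in the sequencer
        trial = code | (1 << bit)              # assert the next test bit
        v_dac = trial * v_ref / (1 << n_bits)  # ideal DAC output for the trial code
        if v_in >= v_dac:                      # comparator: input lies in the upper half
            code = trial                       # keep the bit
        # otherwise the bit is cleared again (its weight was too large)
    return code

print(sar_convert(0.5))  # 426
```

For example, with $V_{\rm ref} = 1.2$ V an input of 0.5 V resolves to code 426, the largest code whose DAC voltage does not exceed the input.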

#### A. Capacitor Array

Large area is a drawback of charge-redistribution SAR ADCs, as the total capacitance grows exponentially with resolution. To partially address this, we employ a minimalistic custom capacitor design instead of the provided MIM capacitor standard cells, which carry a moderate amount of overhead. Mismatch data for our process [35] suggest that minimally sized MIM capacitors (following design rules) more than satisfy the matching requirements of a 10 b array. We take advantage of this fact and use a 1 b/9 b split capacitor array [see Fig. 7(a)] to reduce the total capacitance by a factor of  $\sim$2 relative to a conventional binary array, at the expense of tighter matching requirements. Using these techniques, we are able to assign one ADC per channel within a total die area of 25 mm<sup>2</sup>. Bottom-plate sampling is implemented with an early sampling clock (*acquire\_e*) to minimize signal-dependent charge injection. Bootstrapped switches are not required given the low speed of operation.

For a series-coupled, two-stage capacitor array with  $B_1$  and  $B_2$  bits for the LSB and MSB arrays, respectively, the required series coupling capacitor is

$$C_{\rm ser} = \frac{2^{B_1}}{2^{B_1} - 1}\, C. \tag{5}$$
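The sketch below evaluates Eq. (5) with exact fractions and tallies the total capacitance of a 10 b array for different LSB/MSB partitions. It assumes (our assumption, not stated in the text) that the LSB subarray holds $2^{B_1}$ units including a termination dummy, consistent with Eq. (5), that the MSB subarray holds $2^{B_2}-1$ units, and that a conventional binary array holds $2^{10}$ units.

```python
from fractions import Fraction

def series_cap(b1: int) -> Fraction:
    """Series coupling capacitor from Eq. (5), in units of the unit cap C."""
    return Fraction(2**b1, 2**b1 - 1)

def total_cap_split(b1: int, b2: int) -> Fraction:
    """Total split-array capacitance in units of C (assumed unit counts)."""
    lsb = 2**b1          # LSB subarray including its dummy termination
    msb = 2**b2 - 1      # binary-weighted MSB subarray
    return lsb + msb + series_cap(b1)

for b1 in range(1, 5):
    print(f"{b1} b/{10 - b1} b: C_ser = {series_cap(b1)} C, "
          f"total = {float(total_cap_split(b1, 10 - b1)):.1f} C")
print("conventional 10 b binary array:", 2**10, "C")
```

Under these assumptions only $B_1 = 1$ yields an integer $C_{\rm ser}$ (2C), and the 1 b/9 b split roughly halves the capacitance (515 C versus 1024 C). More balanced splits save more area but require a fractional $C_{\rm ser}$, which is the tradeoff behind the 1 b/9 b choice in this subsection.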



Fig. 6. Implementation of the SAR logic. An extra negative-edge-triggered flop placed at the end of the shift register adds an intentional half-cycle delay to prevent min-path (hold-time) problems caused by globally accumulated clock skew when data are shifted from one ADC to the next.



Fig. 7. (a) SAR ADC capacitor array showing 1 b/9 b split. The physical bottom plates of the capacitors correspond with the curved lines in the schematic symbols. (b) Illustration of the 1 b subarray layout.

When applying a test code to the capacitor DAC (CDAC), a "1" corresponds to switching  $V_{\rm ref}$  into the array through the appropriate capacitor in the positive half-circuit ( $V_{{\rm in},p}$  side), and switching Gnd into the array in the negative half-circuit. Given a test code N between 0 and  $2^B-1$ , the resulting CDAC output voltages settle to

$$v_{op} = V_{cm} + V_{\rm ref} \left( 1 - \frac{N}{2^B} \right) - v_{{\rm in},m} \tag{6}$$

$$v_{om} = V_{cm} + V_{\rm ref} \cdot \frac{N}{2^B} - v_{{\rm in},p} \tag{7}$$

where  $B = B_1 + B_2$ . Defining the differential input  $v_{id} = v_{{\rm in},p} - v_{{\rm in},m}$ , the output differential and common-mode voltages are therefore

$$v_{od} = v_{id} - V_{\rm ref} \left(\frac{2N}{2^B} - 1\right) \tag{8}$$

$$v_{oc} = V_{cm} + \frac{1}{2}V_{\rm ref} - v_{ic} \tag{9}$$

where  $v_{ic}$  is the input common mode of the ADC. These equations show that  $V_{\rm ref}$  sets the ADC input range (a differential full scale of  $\pm V_{\rm ref}$ ), while  $V_{cm}$  helps set the comparator input common-mode level.  $V_{cm}$  is chosen to keep  $v_{op}$  and  $v_{om}$  within the supply rails during code tests, preventing excessive leakage through the transistor switches and junction diodes.
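A quick numerical check that Eqs. (8) and (9) follow from Eqs. (6) and (7), with $v_{od} = v_{op} - v_{om}$, $v_{oc} = (v_{op} + v_{om})/2$, $v_{id} = v_{{\rm in},p} - v_{{\rm in},m}$, and $v_{ic} = (v_{{\rm in},p} + v_{{\rm in},m})/2$. The voltage values are arbitrary illustrative numbers, not the chip's operating points.

```python
import math

def cdac_outputs(n, bits, v_ref, v_cm, v_inp, v_inm):
    """CDAC top-plate voltages from Eqs. (6) and (7) for test code n."""
    frac = n / 2**bits
    v_op = v_cm + v_ref * (1 - frac) - v_inm
    v_om = v_cm + v_ref * frac - v_inp
    return v_op, v_om

bits, v_ref, v_cm = 10, 0.6, 0.3     # arbitrary illustrative values (V)
v_inp, v_inm = 0.35, 0.25
v_id, v_ic = v_inp - v_inm, (v_inp + v_inm) / 2

for n in (0, 1, 511, 512, 1023):
    v_op, v_om = cdac_outputs(n, bits, v_ref, v_cm, v_inp, v_inm)
    # Eq. (8): differential output
    assert math.isclose(v_op - v_om, v_id - v_ref * (2 * n / 2**bits - 1), abs_tol=1e-12)
    # Eq. (9): common-mode output
    assert math.isclose((v_op + v_om) / 2, v_cm + v_ref / 2 - v_ic, abs_tol=1e-12)
print("Eqs. (8) and (9) agree with Eqs. (6) and (7)")
```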

Because ADC linearity relies heavily on capacitor matching, the array is built from unit capacitors and laid out using common-centroid techniques. For the split capacitor array, nonlinearity is also introduced by inaccuracy of the series capacitor and by parasitic capacitance at the top plate of the LSB subarray. Parasitic capacitance on the top plate of the MSB subarray introduces gain error in the ADC transfer function, but does not affect linearity and can largely be ignored in our application.

Fig. 8. Fully dynamic comparator.

Fig. 9. System-level phases of operation.

The particular choice of a 1 b/9 b split has notable advantages over other LSB/MSB partitions in terms of limiting systematic sources of nonlinearity. Equation (5) indicates that the required series capacitance for a 1 b LSB array is an integer multiple of the unit capacitance C (namely 2C), allowing  $C_{\rm ser}$  to be implemented with unit capacitors for good matching; other partitions require fractional multiples of the unit capacitance. Since no irregularly shaped capacitors are introduced, the layout of the array remains wholly uniform. The top-plate parasitic capacitance is correspondingly small, given the minimal routing requirements of a 1 b LSB array [see Fig. 7(b)]. Finally, the capacitors are oriented such that the large physical bottom-plate parasitics are lumped into the least sensitive nodes. Based on layout-extracted parasitics, the systematic INL and DNL (excluding random capacitor mismatch) are found to be below 0.1 LSB.

Fig. 10. (left) Die photo and (right) pixel (2-channel) layout with capacitors removed in one channel. The die dimensions are 5 mm  $\times$  5 mm and the pixel is 860  $\mu$m  $\times$  440  $\mu$m.

#### B. Comparator

We use a fully dynamic comparator (see Fig. 8) for low power consumption. Apart from leakage, no static current is consumed after evaluation completes, an important attribute when the decision time is much shorter than the allocated strobe period. Without a biased preamplifier, care must be taken to control the effect of noise, which is difficult to analyze in dynamic comparators given their primarily large-signal behavior. Nuzzo et al. [36] present a methodology to estimate noise using results from stochastic differential equations, analyzing the small-signal parameters in the different phases of comparator evaluation. Their results show that decreasing the input-pair overdrive during the initial transient period and increasing the capacitance on node X in Fig. 8 are the most effective means of reducing input-referred noise. Following these guidelines, we size up the input pair and use a minimum-sized footer device to reduce the overdrive [36]. Explicit capacitance C<sub>X</sub> is added to further reduce noise, as the resulting increase in total system power is negligible and evaluation speed is not a concern. Transient noise simulations indicate an effective input-referred comparator noise of 180  $\mu V_{\rm rms}$.
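One common way to reduce transient-noise runs (or measurements) of a comparator to a single input-referred number, which may or may not be how the 180 $\mu V_{\rm rms}$ figure was obtained, is to apply a small DC input, record the fraction of "1" decisions, and invert the Gaussian CDF. In the sketch below a hypothetical comparator model with Gaussian input noise ($\sigma$ = 180 $\mu$V) and an arbitrary 20 $\mu$V offset stands in for the simulator; probing at $\pm v_{\rm test}$ cancels the offset.

```python
import random
from statistics import NormalDist

def comparator(v_in, sigma, offset, rng):
    """Hypothetical noisy comparator: outputs '1' if v_in + noise > offset."""
    return v_in + rng.gauss(0.0, sigma) > offset

def estimate_sigma(v_test, trials, sigma, offset, seed=1):
    """Estimate input-referred noise from decision statistics at +/- v_test."""
    rng = random.Random(seed)
    inv = NormalDist().inv_cdf
    p_hi = sum(comparator(+v_test, sigma, offset, rng) for _ in range(trials)) / trials
    p_lo = sum(comparator(-v_test, sigma, offset, rng) for _ in range(trials)) / trials
    # P(1 | v) = Phi((v - offset) / sigma), so z_hi - z_lo = 2 * v_test / sigma
    return 2 * v_test / (inv(p_hi) - inv(p_lo))

sigma_hat = estimate_sigma(v_test=100e-6, trials=200_000, sigma=180e-6, offset=20e-6)
print(f"estimated input-referred noise: {sigma_hat * 1e6:.0f} uVrms")
```

Choosing $v_{\rm test}$ comparable to $\sigma$ keeps both probabilities away from 0 and 1, where the CDF inversion becomes ill-conditioned.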

Two inverters isolate the regenerative latch outputs from the state-dependent input capacitances of the SR latch, preventing the offset that mismatched load capacitances would otherwise induce [37]. Comparator offset contributes directly to offset in the ADC transfer function, but small amounts are tolerable in our application and can be removed during digital post-processing, so offset cancellation is unnecessary.



Fig. 11. Measured signal conditioning transfer function and noise spectral density for ENAP and LFP/ENAP configurations.

#### V. DIGITAL INTERFACE

Fig. 12. (a) ADC linearity and (b) tone test results.

For every sample period, 960 bits of data (96 channels  $\times$  10 bits) must be loaded into an output register in one or more parallel streams before serial delivery off-chip; these phases are illustrated in Fig. 9. To do this, the ADCs must be partitioned into one or more daisy chains to send data across the array. Tradeoffs among energy, wiring, and timing overheads depend on the number of partitions used. A single but lengthy daisy chain requires the fewest metal resources to pass data to the output register, but requires 960 clock cycles and wastes significant energy clocking flops that have already passed all the relevant bits. For example, the nth flop from the back end of a shift register needs to be clocked only n times, after which the data it passes is irrelevant (e.g., all zeros). At the other extreme, a fully parallel loading scheme wastes no flop clocking energy, but requires significant global wire routing across the chip and leaves almost no timing overhead; since this dead time is needed for inserting auxiliary bits into the bitstream, this extreme is not desirable in our case either. In populating the output register once, the total wasted flop energy can be shown to be

$$E_{\text{waste}} = 960 \left( \frac{960}{p} - 1 \right) C_{no\_DQ} V_{dd}^2 \tag{10}$$

where p is the number of parallel streams that feed the output register and  $C_{no\_DQ}$  is the switching capacitance of a flop over one clock cycle when there is no data transition. Thus, wasted energy is roughly proportional to 1/p, and increasing p eventually yields diminishing returns.
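Equation (10) can be evaluated directly to see the 1/p trend. The sketch uses $V_{dd}$ = 1.2 V (DVDD) and converts energy per frame to average power at 31.25 kSa/s; the value of $C_{no\_DQ}$ is a hypothetical placeholder, since the text does not give it.

```python
FS = 31.25e3        # frames (samples) per second
VDD = 1.2           # DVDD supply (V)
C_NO_DQ = 1e-15     # hypothetical clock-only switching capacitance per flop (F)

def wasted_flop_energy(p, c_no_dq=C_NO_DQ, vdd=VDD, n_bits=960):
    """Eq. (10): wasted clocking energy (J) for one load of the output register."""
    if n_bits % p:
        raise ValueError("p must divide n_bits evenly")
    return n_bits * (n_bits / p - 1) * c_no_dq * vdd ** 2

for p in (1, 2, 3, 4, 6, 12, 960):
    e = wasted_flop_energy(p)
    print(f"p = {p:4d}: {e * 1e12:7.1f} pJ/frame, {e * FS * 1e6:5.1f} uW average")
```

Going from p = 1 to p = 3 removes about two thirds of the waste, while further increases buy progressively less and cost more global wiring.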

The timing overhead in loading the output register is proportional to the number of ADCs in each partition. In the context of the overall HermesE system, this dead time is necessary for auxiliary bits such as frame counters, error checking bits, clock recovery sequences, or other data (such as accelerometer readings), to be inserted into the bitstream prior to wireless transmission.

Three partitions (p = 3) were deemed a reasonable compromise: they reduce wasted flop energy and wire routing while allowing a conservative 320 auxiliary bits to be inserted in each sample period of the transmitted bitstream. Thus, the output register is loaded using 3 streams at 40 MHz during a dedicated 8  $\mu$s per sample, and data is serially shifted off-chip during the remaining 24  $\mu$s at 40 MHz. This scheme confines the high-frequency switching activity to the output register, far away from the array channels, during the sensitive conditioning and conversion phases.
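The timing numbers above fit together as a simple frame budget, checked below using only values stated in the text: a 32 $\mu$s frame at 40 MHz offers 1280 serial bit slots, of which 960 carry ADC data and 320 (the 8 $\mu$s load window) remain for auxiliary bits.

```python
SAMPLE_RATE = 31.25e3      # samples/s per channel
F_CLK = 40e6               # array-shift and serial clock (Hz)
N_CH, BITS, P = 96, 10, 3  # channels, bits per sample, parallel streams

t_frame = 1 / SAMPLE_RATE              # sample period: 32 us
data_bits = N_CH * BITS                # 960 data bits per frame
t_load = (data_bits // P) / F_CLK      # 320 shift cycles -> 8 us
slots = round(t_frame * F_CLK)         # 1280 serial bit slots per frame
aux_bits = slots - data_bits           # slots left for auxiliary data
raw_rate = data_bits * SAMPLE_RATE     # raw ADC payload (b/s)

print(f"load {t_load * 1e6:.0f} us, shift out {(t_frame - t_load) * 1e6:.0f} us, "
      f"{aux_bits} aux bits, payload {raw_rate / 1e6:.0f} Mb/s of {F_CLK / 1e6:.0f} Mb/s")
```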

#### VI. FLOORPLAN AND GLOBAL DISTRIBUTION

An annotated die photo is shown in Fig. 10. The signal conditioning section and ADC are placed adjacent to each other, with the source follower buffers underneath the ADC capacitor array, to minimize the routing distance. By abutting one channel's circuitry with another that has been rotated by 180°, a tileable pixel is formed that can be arranged into an array, allowing a convenient and regular distribution network for clocks, power, and bias. This arrangement comes at the expense of longer routing for analog inputs to the middle of the chip, but the extra parasitics and coupling effects are small compared to those that exist off-chip and can be minimized with appropriate spacing or shielding. Simulations with layout-extracted parasitics show that on-chip input coupling effects are minimal.

The clock generator, configuration register, and output register are placed near the bottom edge of the chip, away from the array channels. One main horizontal clock branch occupies the space between the bottom two pixel rows and splits off into vertical clock branches that run in the column space between pixels. Configuration bits are also routed to each pixel in this fashion. Whenever possible, unnecessary clock transitions (e.g., the array data shifting clock during the acquisition/conversion phases) are gated from global distribution to reduce power consumption and switching noise.

Programmable constant- $g_{\rm m}$  bias generators are located in the bottom left corner of the chip. Current mirrors along the left edge of the chip distribute bias currents horizontally across the rows to each pixel. Analog input lines are also routed horizontally to each pixel from the left and right edges of the chip. Shielding is used wherever sensitive lines overlap with clock lines.

Within each pixel, two metal layers (M4/M5 or M5/M6) are used to both shield the MIM capacitors from the switching circuitry below and to distribute power down each pixel column. Power and off-chip reference voltages are brought on-chip via the bond pads along the top edge of the chip. Each column has its own dedicated set of supply pins, in order to reduce potential supply line coupling effects on-chip. All digital I/Os and test structure I/Os, along with their own dedicated supply pads, reside along the bottom edge of the chip.

#### VII. MEASUREMENT RESULTS

To characterize the IC, several chips were packaged using a chip-on-board solution. Test structures allow the signal conditioning circuitry and the ADC to be assessed individually as well as in a complete signal path configuration. Transfer function measurements of the signal conditioning test structure were taken using an HP33210A function generator and an SR760 spectrum analyzer. The source follower buffers are sufficient to drive the pad capacitance, PCB trace capacitance, and input capacitance of discrete unity-gain op-amp buffers placed close to the device under test. Noise measurements were taken from the output of the same test structure (with shorted inputs) using the SR760 spectrum analyzer. The integrated output noise was referred to the input using the peak gain of the measured transfer functions.

Fig. 13. Histogram of passband corner frequencies of the 92 ENAP channels across three chips. The high-pass corner distribution is shown in (a) and the low-pass corner distribution is shown in (b).

TABLE IV
TOTAL MEASURED CHIP POWER DISSIPATION AND BREAKDOWN

| Component        | Measured Power | Percent of Total |
|------------------|----------------|------------------|
| AVDD             | 3.89 mW        | 60.5 %           |
| DVDD             | 0.89 mW        | 13.8 %           |
| Output Register  | 1.28 mW        | 19.8 %           |
| Clock Generation | 0.17 mW        | 2.7 %            |
| I/O              | 0.09 mW        | 1.4 %            |
| Vrefs, Bias      | 0.11 mW        | 1.8 %            |
| Total            | 6.43 mW        | 100 %            |

The measured signal conditioning transfer functions and noise spectra are shown in Fig. 11. In the ENAP configuration, the passband gain is 56 dB and the measured bandwidth is 280 Hz to 10 kHz, indicating a 7% variation in  $f_{-3 \text{ dB,HP}}$  due to the custom  $C_2$  capacitor. In the ENAP/LFP configuration, the gain is reduced to 40 dB and the passband extends down to below  $\sim$1 Hz. The total integrated input-referred noise is 2.2  $\mu$Vrms for the ENAP case (measured from 1 Hz to 100 kHz). In the ENAP/LFP case, the total integrated noise is 14  $\mu$Vrms in the LFP band (1–100 Hz) and 3  $\mu$Vrms in the ENAP band (100 Hz–10 kHz). Minor tones and harmonics due to 60 Hz power line interference in the test setup are visible, but these will not be present in the final, battery-operated system. Small differential-mode artifacts from CMFB refresh appear as high-frequency tones in the noise measurements of Fig. 11; however, these transients are synchronous with the ADC sampling and have mostly settled by the sampling instant, so they are rejected. The total power of the signal conditioning test structure is measured to be 35  $\mu$W, resulting in a measured noise efficiency factor (NEF) [38] of 4.5 for the ENAP configuration.
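The NEF figure can be reproduced approximately from the numbers above with the standard definition $\mathrm{NEF} = V_{ni,\rm rms}\sqrt{2 I_{\rm tot} / (\pi \cdot U_T \cdot 4kT \cdot \mathrm{BW})}$. The sketch assumes T = 300 K, BW = 10 kHz, and $I_{\rm tot}$ = 35 $\mu$W / 1.2 V; these are our assumptions, and under them the result is about 4.6, close to the reported 4.5 (the exact value shifts with the assumed temperature and bandwidth).

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)

def nef(v_ni_rms, i_tot, bw, temp=300.0):
    """Noise efficiency factor:
    NEF = V_ni,rms * sqrt(2 * I_tot / (pi * U_T * 4kT * BW))."""
    u_t = K_B * temp / Q_E                       # thermal voltage kT/q
    return v_ni_rms * math.sqrt(2 * i_tot / (math.pi * u_t * 4 * K_B * temp * bw))

i_tot = 35e-6 / 1.2     # assumed: 35 uW drawn from the 1.2 V AVDD supply
print(f"NEF ~ {nef(2.2e-6, i_tot, bw=10e3):.2f}")
```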

ADC linearity is measured using conventional histogram testing, and the DNL and INL results [see Fig. 12(a)] are well within  $\pm 0.5$  LSB, confirming good matching among the capacitors. A tone test [see Fig. 12(b)] at 1 kHz shows an SFDR of around 80 dB and an SNDR of 60.26 dB. Total ADC power per channel is measured to be 1.1  $\mu$W including references, with the comparator consuming 132 nW. The corresponding figure of merit is 42 fJ/conv-step.
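The 42 fJ/conv-step figure is consistent with the Walden figure of merit $P/(2^{\rm ENOB} f_s)$, with ENOB $= (\mathrm{SNDR} - 1.76)/6.02$. The sketch assumes the Walden form and $f_s$ = 31.25 kSa/s (Nyquist-rate operation); the text does not spell out which FoM definition was used.

```python
def enob(sndr_db: float) -> float:
    """Effective number of bits from SNDR (dB)."""
    return (sndr_db - 1.76) / 6.02

def walden_fom(power_w: float, fs_hz: float, sndr_db: float) -> float:
    """Walden figure of merit P / (2**ENOB * f_s), in J per conversion step."""
    return power_w / (2 ** enob(sndr_db) * fs_hz)

fom = walden_fom(1.1e-6, 31.25e3, 60.26)
print(f"ENOB = {enob(60.26):.2f} b, FoM = {fom * 1e15:.1f} fJ/conv-step")
```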

The total chip power is measured to be 6.43 mW; its breakdown is shown in Table IV. The majority of the power (60%) is drawn from the 1.2 V AVDD supply, which powers the signal conditioning circuitry. The 1.2 V DVDD supply powers the ADCs, clock/configuration distribution, and the data shifting circuitry across the array. While each ADC consumes only 1.1  $\mu$W in conversion, the majority of DVDD power is consumed in passing data from the entire array to the output register. Because it is continuously clocked at 40 MHz, the 960-bit output register consumes a significant share of the total (1.28 mW in Table IV). In the context of the HermesE system, this register is unavoidable, since the bitstream must be serialized for transmission; we opted to place it in the front-end chip so that the FPGA can be kept small and power efficient.

To evaluate the immunity to process variation afforded by the use of a switched-capacitor architecture, we measured the -3 dB passband corner frequencies of the 92 ENAP channels across three chips and the distributions are shown in Fig. 13. The standard deviations of the corner frequencies are measured to be about 62 Hz ( $\sim$ 0.62% variation) for the low-pass edge and 3.4 Hz ( $\sim$ 1.2% variation) for the high-pass edge, demonstrating good matching across the array and between chips without the need for manual tuning.

Using the same benchtop test boards, *in-vivo* neural recordings were made from a 96-channel Utah Electrode Array (Cere-Port array by Blackrock Microsystems) implanted in the motor cortex of a rhesus macaque (Monkey L). Four ribbon cables (two 12" and two 18") were used to interface our test board with the implanted electrodes, and recordings were made during simple reach exercises. All experiments and procedures were approved by Stanford's Institutional Animal Care and Use Committee. Data from four channels, two ENAP and two ENAP/LFP, are shown in Fig. 14, confirming successful acquisition of neural signals. In the latter two channels, action potentials can be seen riding on a large, slowly varying local field potential. The two LFP waveforms are visibly correlated, suggesting that only a few LFP recording channels are needed across the electrode array.

#### VIII. COMPARISONS AND CONCLUSION

Table V summarizes and compares the system recording performance with that of other recent multichannel neural interface designs. The signal conditioning sections of the three other tabulated ICs [5], [7], [8] utilize capacitive AC coupling into a preamplifier that drives a capacitive load and has parallel RC feedback. The preamplifier gain is set by the feedback and coupling capacitors, while a high-pass corner is set by the large feedback resistance implemented using MOS pseudoresistors. The frequency response is further shaped by a second filtering or amplifying stage. Process variation is a concern: [5] opts to set the overall high-pass corner more accurately using a subsequent tunable  $\rm G_m\text{-}C$  filter, while [7], [8] provide means to hand-tune the resistances, capacitances, or bias currents to set the desired passband response, thereby addressing global variations but not channel-to-channel matching.

TABLE V
SYSTEM SUMMARY AND COMPARISON TO OTHER DESIGNS

| Section | Parameter | This Work | [5] | [6] | [7] | [8] |
|---|---|---|---|---|---|---|
| System Totals | Technology | 8M1P 0.13 μm MM | 2P3M 0.6 μm<br>BiCMOS | Discrete<br>multichip | 4M2P 0.35 μm<br>CMOS | 2P 0.35 μm |
| | VDD | 1.2 V | 3.3 V | 5 V | ±1.65 V | 3 V |
| | Die Size | 5 × 5 mm | 5.4 × 4.7 mm | - | 8.8 × 7.2 mm | 3.4 × 2.5 mm |
| | Channel Count | 96 | 100 | 32 | 128 | 128 |
| | Total Power | 6.5 mW | 8 mW | 82 mW<br>(recording) | 4.4 mW<br>(streaming mode) | 2.4 mW<br>(recording) |
| | Power/ch. | 68 μW | 80 μW | 2.56 mW<br>(recording) | 34 μW<br>(with power cycling) | 18.7 μW<br>(recording) |
| | Area/ch. | 0.26 mm<sup>2</sup> | 0.25 mm<sup>2</sup> | Discrete<br>multichip | 0.5 mm<sup>2</sup> | 0.05 mm<sup>2</sup><br>(recording) |
| | Output Data | 30 Mbps<br>raw data | spike detection/<br>1 raw ch. | 24 Mbps<br>raw data | spike features/<br>46.1 Mbps<br>raw data | 12.8 Mbps<br>raw data |
| Signal<br>Conditioning<br>Circuitry | Architecture | Open loop Gm,<br>SC BPF | Feedback OTA,<br>CT BPF | Feedback OTA,<br>CT BPF | Feedback OTA,<br>CT BPF | Feedback OTA,<br>CT BPF |
| | Bandwidth | <1 Hz - 10 kHz (LFP)<br>280 Hz - 10 kHz (ENAP) | 250 Hz - 5 kHz | 0.05 Hz - 5 kHz | 0.1 Hz, 200 Hz -<br>2.2 kHz<br>Configurable | 10 Hz - 5 kHz |
| | Input Referred Noise | 2.2 μVrms (ENAP) | 4.8 μVrms | 3.2 μVrms | 4.9 μVrms | 6.08 μVrms |
| Analog to Digital Conversion | Architecture | SAR / ch. | SAR / 100 ch. | SAR / 16 ch.<br>(discrete) | SAR / 16 ch. | SAR / 8 ch. |
| | Resolution | 10 bit | 10 bit | 12 bit<br>(14 bit but only<br>used 12 bit) | 6-9 bit<br>(configurable) | 8 bit |
| | Sample Rate (per ch.) | 31.25 kSa/s | 15.7 kSa/s<br>(one selectable<br>channel only) | 30 kSa/s | 40 kSa/s | 12.5 kSa/s |

Fig. 14. In-vivo recordings from Monkey L.

In comparison, our design exhibits well-controlled frequency corners set by capacitor ratios and SC clocks, simplifying its usage and making it more robust and predictable in practical experimental settings. Whereas [7], [8] utilize time-shared ADCs to save area at the expense of increased system complexity, this work employs a fine-grain pixel that relaxes chip-level overhead and management, localizes analog signal processing, and contributes to a scalable architecture. Finally, low power consumption and low-voltage operation allow integration with a 1.2 V Hermes platform that aims to provide high-fidelity, 96-channel broadband recording with an order-of-magnitude improvement in power efficiency over our previous Hermes systems.

#### ACKNOWLEDGMENT

The authors would like to thank Analog Devices for their design review, TSMC for fabrication, and Berkeley Design Automation for the use of the Analog FastSPICE Platform (AFS).

#### REFERENCES

- [1] J. Donoghue, J. Sanes, N. Hatsopoulos, and G. Gyngyi, "Neural discharge and local field potential oscillations in primate motor cortex during voluntary movements," *J. Neurophysiol.*, vol. 79, no. 1, pp. 159–173, Jan. 1998.
- [2] V. Gilja, C. A. Chestek, P. Nuyujukian, J. D. Foster, and K. V. Shenoy, "Autonomous head-mounted electrophysiology systems for freely-behaving primates," *Curr. Opin. Neurobiol.*, vol. 20, no. 5, pp. 676–686, Oct. 2010.

- [3] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn, and J. P. Donoghue, "Neuronal ensemble control of prosthetic devices by a human with tetraplegia," *Nature*, vol. 442, no. 7099, pp. 164–171, Jul. 2006.
- [4] A. B. Schwartz, "Cortical neural prosthetics," *Annu. Rev. Neurosci.*, vol. 27, pp. 487–507, Jul. 2004.
- [5] R. R. Harrison, R. J. Kier, C. A. Chestek, V. Gilja, P. Nuyujukian, S. I. Ryu, B. Greger, F. Solzbacher, and K. V. Shenoy, "Wireless neural recording with single low-power integrated circuit," *IEEE Trans. Neural Syst. Rehab. Eng.*, vol. 17, no. 4, pp. 322–329, Aug. 2009.
- [6] H. Miranda, V. Gilja, C. A. Chestek, K. V. Shenoy, and T. H. Meng, "HermesD: A high-rate long-range wireless transmission system for simultaneous multichannel neural recording applications," *IEEE Trans. Biomed. Circuits Syst.*, vol. 4, no. 3, pp. 181–191, Jun. 2010.
- [7] M. Chae, W. Liu, Z. Yang, T. Chen, J. Kim, M. Sivaprakasam, and M. Yuce, "A 128-channel 6 mW wireless neural recording IC with on-the-fly spike sorting and UWB transmitter," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2008, pp. 146–147.
- [8] F. Shahrokhi, K. Abdelhalim, D. Serletis, P. Carlen, and R. Genov, "The 128-channel fully differential digital integrated neural recording and stimulation interface," *IEEE Trans. Biomed. Circuits Syst.*, vol. 4, no. 3, pp. 149–161, Jun. 2010.
- [9] C. Qian, J. Parramon, and E. Sanchez-Sinencio, "A micropower low-noise neural recording front-end circuit for epileptic seizure detection," *IEEE J. Solid-State Circuits*, vol. 46, no. 6, pp. 1392–1405, Jun. 2011.
- [10] J. Lee, H. G. Rhew, D. R. Kipke, and M. P. Flynn, "A 64 channel programmable closed-loop neurostimulator with 8 channel neural amplifier and logarithmic ADC," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1935–1945, Sep. 2010.
- [11] M. M. Churchland, J. P. Cunningham, M. T. Kaufman, S. I. Ryu, and K. V. Shenoy, "Cortical preparatory activity: Representation of movement or first cog in a dynamical machine?," *Neuron*, vol. 68, no. 3, pp. 387–400, Nov. 2010.
- [12] G. Santhanam, M. Linderman, V. Gilja, A. Afshar, S. Ryu, T. Meng, and K. Shenoy, "HermesB: A continuous neural recording system for freely behaving primates," *IEEE Trans. Biomed. Eng.*, vol. 54, no. 11, pp. 2037–2050, Nov. 2007.
- [13] C. A. Chestek, V. Gilja, P. Nuyujukian, R. Kier, F. Solzbacher, S. Ryu, R. Harrison, and K. V. Shenoy, "HermesC: Low-power wireless neural recording system for freely moving primates," *IEEE Trans. Neural Syst. Rehab. Eng.*, vol. 17, no. 4, pp. 330–338, Aug. 2009.
- [14] R. M. Walker, H. Gao, P. Nuyujukian, K. Makinwa, K. V. Shenoy, T. H. Meng, and B. Murmann, "A 96-channel full data rate direct neural interface in 0.13 μm CMOS," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2011, pp. 144–145.
- [15] H. Miranda and T. H. Meng, "A programmable pulse UWB transmitter with 34% energy efficiency for multichannel neuro-recording systems," in *Proc. IEEE CICC*, 2010, pp. 1–4.
- [16] M. S. Lewicki, "A review of methods for spike sorting: The detection and classification of neural action potentials," *Network*, vol. 9, no. 4, pp. R53–R78, Nov. 1998.
- [17] R. C. Gesteland, B. Howland, J. Y. Lettvin, and W. H. Pitts, "Comments on microelectrodes," *Proc. IRE*, vol. 47, no. 11, pp. 1856–1862, Nov. 1959.
- [18] R. R. Harrison, "The design of integrated circuits to observe brain activity," *Proc. IEEE*, vol. 96, no. 7, pp. 1203–1216, Jul. 2008.
- [19] C. T. Nordhausen, P. J. Rousche, and R. A. Normann, "Optimizing recording capabilities of the Utah Intracortical Electrode Array," *Brain Res.*, vol. 637, no. 1–2, pp. 27–36, Feb. 1994.
- [20] M. S. Chae, W. Liu, and M. Sivaprakasam, "Design optimization for integrated neural recording systems," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 1931–1939, Sep. 2008.
- [21] W. Wattanapanitch, "An ultra-low-power neural recording amplifier and its use in adaptively-biased multi-amplifier arrays," M.S. thesis, Dept. Elect. Eng. Comput. Sci., MIT, Cambridge, MA, 2007.
- [22] R. R. Harrison, P. T. Watkins, R. J. Kier, R. O. Lovejoy, D. J. Black, B. Greger, and F. Solzbacher, "A low-power integrated circuit for a wireless 100-electrode neural recording system," *IEEE J. Solid-State Circuits*, vol. 42, no. 1, pp. 123–133, Jan. 2007.
- [23] F. Shahrokhi, K. Abdelhalim, and R. Genov, "128-channel fully differential digital neural recording and stimulation interface," in *Proc. IEEE ISCAS*, 2009, pp. 1249–1252.
- [24] R. Gregorian and G. T. Temes, *Analog MOS Integrated Circuits for Signal Processing*. New York: Wiley, 1986, pp. 280–296.
- [25] L. R. Carley and T. Mukherjee, "High-speed low-power integrating CMOS sample-and-hold amplifier architecture," in *Proc. IEEE CICC*, 1995, pp. 543–546.

- [26] A. Mirzaei, S. Chehrazi, R. Bagheri, and A. A. Abidi, "Analysis of first-order anti-aliasing integration sampler," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 10, pp. 2994–3005, Nov. 2008.
- [27] J. Yuan, "A charge sampling mixer with embedded filter function for wireless applications," in *Proc. 2nd Int. Conf. Microw. Millimeter Wave Technol.*, 2000, pp. 315–318.
- [28] G. Xu and J. Yuan, "Performance analysis of general charge sampling," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 52, no. 2, pp. 107–111, Feb. 2005.
- [29] J. L. Bohorquez, M. Yip, A. P. Chandrakasan, and J. L. Dawson, "A biomedical sensor interface with a *sinc* filter and interference cancellation," *IEEE J. Solid-State Circuits*, vol. 46, no. 4, pp. 746–756, Apr. 2011.
- [30] C. D. Ezekwe and B. E. Boser, "A mode-matching  $\Sigma\Delta$  closed-loop vibratory gyroscope readout interface with a  $0.004^\circ/s/\sqrt{\rm Hz}$  noise floor over a 50 Hz band," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 3039–3048, Dec. 2008.
- [31] C. D. Ezekwe, J. P. Vanderhaegen, X. Xing, and G. K. Balachandran, "A 6.7 nV/√Hz sub-MHz-1/f-corner 14 b analog-to-digital interface for rail-to-rail precision voltage sensing," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers*, 2011, pp. 246–248.
- [32] P. Gray, P. Hurst, S. Lewis, and R. Meyer, *Analysis and Design of Analog Integrated Circuits*, 4th ed. New York: Wiley, 2001, p. 849.
- [33] W. Wattanapanitch, M. Fee, and R. Sarpeshkar, "An energy-efficient micropower neural recording amplifier," *IEEE Trans. Biomed. Circuits Syst.*, vol. 1, no. 2, pp. 136–147, Jun. 2007.
- [34] J. L. McCreary and P. R. Gray, "All-MOS charge distribution analog-to-digital conversion techniques (Part I)," *IEEE J. Solid-State Circuits*, vol. 10, no. 6, pp. 371–379, Dec. 1975.
- [35] C. H. Diaz, D. D. Tang, and J. Y.-C. Sun, "CMOS technology for MS/RF SoC," *IEEE Trans. Electron Devices*, vol. 50, no. 3, pp. 557–566, Mar. 2003.
- [36] P. Nuzzo, F. De Bernardinis, P. Terreni, and G. Van der Plas, "Noise analysis of regenerative comparators for reconfigurable ADC architectures," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 6, pp. 1441–1454, Jul. 2008.
- [37] A. Nikoozadeh and B. Murmann, "An analysis of latch comparator offset due to load capacitor mismatch," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 53, no. 12, pp. 1398–1402, Dec. 2006.
- [38] M. S. J. Steyaert, W. M. C. Sansen, and C. Zhongyuan, "A micro-power low-noise monolithic instrumentation amplifier for medical purposes," *IEEE J. Solid-State Circuits*, vol. 22, no. 6, pp. 1163–1168, Dec. 1987.



**Hua Gao** (S'06) received the B.S. degree in electrical engineering from Columbia University, New York, NY, in 2007 and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 2009, where he is currently pursuing the Ph.D. degree.

He held internships at Boston Scientific, Marlborough, MA, in 2005 and at Analogic Corporation, Peabody, MA, in 2007. His research interests include the design of mixed-signal and low power integrated circuits and systems for biomedical applications.

Mr. Gao was a recipient of the William L. Everitt Student Award of Excellence from the Electrical Engineering Department, Columbia University and the Cadence Design Systems Stanford Graduate Fellowship, both in 2007.



**Ross M. Walker** (S'08) was born in Chattanooga, TN, in 1979. He received the B.S. degree in electrical engineering and the B.S. degree in computer science from the University of Arizona, Tucson, in 2005, and the M.S. degree in electrical engineering from Stanford University, Stanford, CA, in 2007, where he is currently pursuing the Ph.D. degree in electrical engineering.

From 2003 to 2004, he held internships with IBM and National Semiconductor, both in Tucson, AZ. In 2006, he held an internship at Linear Technology, Milpitas, CA. His research interests include mixed-signal integrated circuit design with emphasis on sensor interfacing, signal processing, and biomedical applications.



**Paul Nuyujukian** (S'05) received the B.S. degree in cybernetics from the University of California, Los Angeles, in 2006, and the M.S. degree in bioengineering from Stanford University, Stanford, CA, in 2011, where he is currently pursuing the M.D./Ph.D. degree in bioengineering.

His research interests include the development and clinical translation of neural prosthetics.



**Kofi A. A. Makinwa** (M'97–SM'05–F'11) received the B.Sc. and M.Sc. degrees from Obafemi Awolowo University, Nigeria, in 1985 and 1988, respectively, the M.E.E. degree from the Philips International Institute, The Netherlands, in 1989, and the Ph.D. degree from Delft University of Technology, Delft, The Netherlands, in 2004.

From 1989 to 1999, he was a Research Scientist with Philips Research Laboratories, Eindhoven, The Netherlands, where he worked on interactive displays and on front-ends for optical and magnetic recording systems. In 1999, he joined Delft University of Technology, where he is now an Antoni van Leeuwenhoek Professor with the Faculty of Electrical Engineering, Computer Science, and Mathematics. His main research interests include the design of precision analog circuitry, sigma-delta modulators, smart sensors, and sensor interfaces. This work has resulted in one book, 15 patents, and over 150 technical papers.

Prof. Makinwa is on the program committees of several international conferences, including the European Solid-State Circuits Conference (ESSCIRC) and the International Solid-State Circuits Conference (ISSCC). He has also served as a guest editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC). He is a co-recipient of several best paper awards, including those from the JSSC, ISSCC, Transducers, and ESSCIRC. In 2005, he received a Veni Award from the Netherlands Organization for Scientific Research and the Simon Stevin Gezel Award from the Dutch Technology Foundation. He is a Distinguished Lecturer of the IEEE Solid-State Circuits Society and a fellow of the Young Academy of the Royal Netherlands Academy of Arts and Sciences.



**Krishna V. Shenoy** (S'87–M'01–SM'06) received the B.S. degree in electrical engineering from the University of California, Irvine, in 1990, and the M.S. and Ph.D. degrees in electrical engineering from the Massachusetts Institute of Technology, Cambridge, in 1992 and 1995, respectively.

He was a Neurobiology Postdoctoral Fellow at the California Institute of Technology, Pasadena, from 1995 to 2001 and then joined Stanford University, Stanford, CA, where he is an Associate Professor in the Departments of Electrical Engineering and Bioengineering, and in the Neurosciences Program. His research interests include computational motor neurophysiology and neural prosthetic system design.

Dr. Shenoy was a recipient of the 1996 Hertz Foundation Doctoral Thesis Prize, a Burroughs Wellcome Fund Career Award in the Biomedical Sciences, an Alfred P. Sloan Research Fellowship, a McKnight Endowment Fund in Neuroscience Technological Innovations in Neurosciences Award, and a 2009 National Institutes of Health Director's Pioneer Award.



**Boris Murmann** (S'99–M'03–SM'09) received the Dipl.-Ing. (FH) degree in communications engineering from Fachhochschule Dieburg, Dieburg, Germany, in 1994, the M.S. degree in electrical engineering from Santa Clara University, Santa Clara, CA, in 1999, and the Ph.D. degree in electrical engineering from the University of California, Berkeley, in 2003.

From 1994 to 1997, he was with Neutron Mikroelektronik GmbH, Hanau, Germany, where he developed low-power and smart-power ASICs in automotive CMOS technology. Since 2004, he has been with the Department of Electrical Engineering, Stanford University, Stanford, CA, where he currently serves as an associate professor. His research interests are in the area of mixed-signal integrated-circuit design, with special emphasis on data converters and sensor interfaces.

Dr. Murmann was a co-recipient of the Best Student Paper Award at the VLSI Circuits Symposium in 2008 and a recipient of the Best Invited Paper Award at the IEEE Custom Integrated Circuits Conference (CICC). In 2009, he received the Agilent Early Career Professor Award. He currently serves as an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS and as a member of the IEEE International and European Solid-State Circuits Conference (ISSCC and ESSCIRC) program committees. He is a Distinguished Lecturer and elected AdCom member of the IEEE Solid-State Circuits Society.



**Teresa H. Meng** (S'82–M'83–SM'93–F'99) received the Ph.D. degree in electrical engineering and computer science from the University of California, Berkeley, in 1988.

She is the Reid Weaver Dennis Professor of Electrical Engineering with Stanford University, Stanford, CA. Her research activities during the first 10 years at Stanford focused on low-power circuit and system design, video signal processing, and wireless communications. In 1999, she took leave from Stanford and founded Atheros Communications, Inc., which developed semiconductor system solutions for communication network products. She returned to Stanford in 2000 to continue her research and teaching at the University. Her current research interests include bio-implant technologies, neural signal processing, and non-invasive medical treatments using focused EM energy. She has given plenary talks at major conferences in the areas of signal processing and wireless communications. She is the author of one book, several book chapters, and over 200 technical articles in journals and conferences.

Dr. Meng has received many awards and honors, including the Distinguished Alumni Award from the U.C. Berkeley Electrical Engineering and Computer Science Department and the National Taiwan University in 2010, the 2009 IEEE Solid-State Circuits Field Award, the DEMO Lifetime Achievement Award in 2009, the McKnight Technological Innovations in Neurosciences Award in 2007, the Distinguished Lecturer Award from the IEEE Signal Processing Society in 2004, the Bosch Faculty Scholar Award in 2003, the Innovator of the Year Award by MIT Sloan School eBA in 2002, the CIO 20/20 Vision Award in 2002, recognition as one of the Top 10 Entrepreneurs by Red Herring in 2001, a Best Paper Award from the IEEE Signal Processing Society, an NSF Presidential Young Investigator Award, an ONR Young Investigator Award, and an IBM Faculty Development Award, all in 1989, and the Eli Jury Award from U.C. Berkeley in 1988. She is a member of the National Academy of Engineering and the Academia Sinica of Taiwan.