# A 10-Gb/s 275-fsec Jitter Cryo-CMOS Charge-Sampling CDR for Quantum Computing Application

Lennart de Jong, Joachim I. Bas, *Student Member, IEEE*, Jiang Gong, *Student Member, IEEE*, Fabio Sebastiano, *Senior Member, IEEE*, and Masoud Babaie, *Senior Member, IEEE*

*Abstract*— This letter presents the first clock and data recovery (CDR) system operating at 4.2 K designed for quantum computing (QC) applications. Considering the benefits and challenges of cryogenic operation, a dedicated analog CDR architecture is employed to maintain high performance at both 300 and 4.2 K. The CDR incorporates a new complementary charge-sampling phase detector (PD) that achieves low power and low jitter. Fabricated in 40-nm CMOS, the proposed CDR operates at 10 Gb/s. At 300 K, it achieves a recovered clock jitter of 260 fs and a jitter tolerance of 2 UI<sub>PP</sub> at a 5-MHz jitter frequency while consuming 4.7 mW. At 4.2 K, the power consumption reduces to 3.1 mW, with a recovered clock jitter of 275 fs and a jitter tolerance of 0.85 UI<sub>PP</sub> at a 5-MHz jitter frequency, demonstrating its functionality for a high-speed cryogenic wireline link.

*Index Terms*— Charge-sampling phase detector (PD), clock and data recovery (CDR), cryogenic CMOS (cryo-CMOS), full-rate, low jitter, PD.

## I. INTRODUCTION

Clock and data recovery (CDR) is a crucial block in many established wireline applications. A relatively new application is quantum computing (QC), where cryogenic electronic interfaces have been proposed to address the scalability and the sheer interconnect complexity required to control and read out the thousands of quantum bits (qubits) needed to execute practical quantum algorithms [1]. As shown in Fig. 1, a high-speed cryogenic CDR is required to interface between the controller operating at cryogenic temperature (CT) and the classical algorithm running on a digital processor at room temperature (RT). Considering an envisioned surface-code cycle time of ~1 $\mu$s for spin qubits [2], such a CDR should support a data rate of at least 3 Gb/s to instruct the controller to choose one of eight possible instructions<sup>1</sup> for each of 1000 physical qubits.
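The 3-Gb/s target follows from a short count: selecting one of eight instructions takes $\lceil \log_2 8 \rceil = 3$ bits, and each of the 1000 qubits needs a new instruction every ~1-$\mu$s cycle. The Python sketch below reproduces that arithmetic; the function name `required_data_rate` is hypothetical, and it assumes fixed-length binary instruction codes with no framing, line-coding, or addressing overhead, none of which the letter specifies.

```python
import math


def required_data_rate(n_qubits, n_instructions, cycle_time_s):
    """Minimum raw link rate in bit/s for one instruction per qubit per cycle.

    Assumes fixed-length binary instruction codes and ignores framing,
    line-coding, and addressing overhead.
    """
    bits_per_instruction = math.ceil(math.log2(n_instructions))
    return n_qubits * bits_per_instruction / cycle_time_s


rate = required_data_rate(n_qubits=1000, n_instructions=8, cycle_time_s=1e-6)
print(f"required data rate: {rate / 1e9:.1f} Gb/s")
```

Any protocol overhead would push the real requirement above this floor, which is consistent with the letter phrasing 3 Gb/s as a minimum.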

Manuscript received 27 February 2023; accepted 3 April 2023. Date of publication 8 May 2023; date of current version 7 June 2023. This work was supported in part by Intel Corporation and in part by the Netherlands Organization for Scientific Research under the Veni Program under Project 17303. (*Corresponding author: Joachim I. Bas.*)

The authors are with the Department of Quantum and Computer Engineering, Delft University of Technology, 2628 CD Delft, The Netherlands (e-mail: J.I.Bas@student.tudelft.nl).

This article was presented at the IEEE MTT-S International Microwave Symposium (IMS 2023), San Diego, CA, USA, June 11–16, 2023.

Color versions of one or more figures in this letter are available at https://doi.org/10.1109/LMWT.2023.3267842.

Digital Object Identifier 10.1109/LMWT.2023.3267842

<sup>1</sup>Having eight instructions per qubit is expected to be sufficient, as any quantum circuit can be *efficiently* implemented using a typical universal set of quantum gates (i.e., H, T, S, and CNOT) and a limited auxiliary set of X/Y rotations [3].

Fig. 1. Simplified block diagram of a scalable QC system incorporating a cryo-CMOS high-speed wireline link.

Fig. 2. Proposed CDR architecture.

The distinguishing aspect of a CDR is its phase detector (PD). Linear phase detection in high-speed CDRs can be implemented using a pulse generator in combination with a mixer [4]. While the gain of this PD is high and constant over process, voltage, and temperature (PVT) variations, such a high-frequency pulse generator has a relatively large power consumption ($P_{\rm dc}$). At CT, a low $P_{\rm dc}$ is crucial due to the limited cooling power available. To reduce $P_{\rm dc}$, [5] realized a PD using charge-steering latches rather than the conventional current-steering technique. However, it recovers the data in return-to-zero (RZ) format, adding circuit complexity to convert the data back to non-return-to-zero (NRZ) format. A more power-efficient approach is presented in [6], which uses a master-slave sampler as the PD, allowing both a wide loop bandwidth and low data-dependent jitter. However, this structure cannot be used directly at CT since its gain depends on a transistor ON-resistance and on the input voltage of its voltage-controlled oscillator (VCO), both of which vary considerably with temperature. Moreover, due to the increased threshold voltage at CT [7], the samplers' ON-resistance around mid-rail is relatively high, degrading the loop bandwidth and jitter.

To overcome these limitations, we introduce a complementary charge-sampling PD integrated into a cryogenic CMOS (cryo-CMOS) CDR that achieves the lowest reported jitter thanks to its low pattern-dependent jitter and high phase-detection gain.

2771-957X © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.



Fig. 3. (a) Schematic and (b) conceptual waveforms of the proposed PD; its simplified schematic (c) before, (d) during, and (e) after a data transition.

## II. PROPOSED CRYO-CMOS CDR

Fig. 2 shows the proposed CDR architecture. The incoming single-ended data are converted to a differential signal that drives the complementary charge-sampling PD. The PD outputs a differential voltage $V_S\,(=V_{S+} - V_{S-})$ based on the phase error between the incoming NRZ data and the VCO output $V_O\,(=\text{VCO}_P - \text{VCO}_N)$. The V/I stage converts $V_S$ into a differential current and injects it into the loop filter, generating the tuning voltage $V_P\,(=V_{P+} - V_{P-})$ that controls the VCO. The type-II loop locks the VCO to the data edges with a 0.25-unit interval (UI) offset while keeping the absolute locking point largely insensitive to PVT variations. A digital calibration loop then shifts the VCO outputs by an additional 0.25 UI to center the retiming edge, optimizing the jitter tolerance of the system.
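Because the CDR is full-rate (one VCO period per UI), the UI offsets above map directly to time and VCO phase. A minimal Python sketch, assuming that full-rate relation and the 10-Gb/s data rate; the helper name `ui_to_time_and_phase` is hypothetical:

```python
def ui_to_time_and_phase(ui, data_rate_bps):
    """Convert a fraction of a unit interval (UI) to seconds and degrees of VCO phase.

    Assumes a full-rate clock, i.e., one VCO period equals one UI (360 degrees).
    """
    t_bit = 1.0 / data_rate_bps
    return ui * t_bit, ui * 360.0


RATE = 10e9  # 10 Gb/s
for label, ui in (("PD lock point", 0.25), ("after calibration shift", 0.25 + 0.25)):
    t, deg = ui_to_time_and_phase(ui, RATE)
    print(f"{label}: {ui:.2f} UI = {t * 1e12:.1f} ps = {deg:.0f} deg from the data edge")
```

The second line, 0.5 UI (50 ps at 10 Gb/s), is the bit center that the calibration loop targets.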

### A. Complementary Charge Sampling PD

Fig. 3(a) and (b) show the proposed PD and its *conceptual* waveforms. Switches $S_{1-4}$, tail capacitor $C_T$, and the parasitic capacitance $C_P$ on the tail node generate narrow pulses at the edges of the incoming data to recover a spectral component at the data rate. Transistors $M_{1,2}$ and the load impedance, consisting of sampling capacitors $C_S$ and resistors $R_D$, are mainly responsible for generating an output voltage based on the phase difference between the data edges and the VCO clock.

At instant-1 in Fig. 3(c), right before a data transition, the current in $M_{1,2}$ is zero and the tail node voltage $V_{TL0}$ is high. The charge stored on the tail node capacitance is equal to $(C_T + C_P)V_{TL0}$, as shown in Fig. 3(d). As the data transition occurs, the polarity of $C_T$ is flipped through $S_{1-4}$. Consequently, the tail node voltage $V_{TL}$ drops exponentially with a time constant of $\tau_1 = 2R_{ON}C_P$, where $R_{ON}$ is the ON-resistance of $S_{1-4}$, and would reach $V_{TL0}(C_P - C_T)/(C_T + C_P)$ if the circuit conditions remained unchanged. However, after $\sim 1 \times \tau_1$, $V_{TL}$ is low enough that $M_{1,2}$ turn on and start recharging $C_T$ and $C_P$, eventually generating the rising edge of the sampling period. The combination of the discharging due to the polarity flip and the immediate start of recharging by $M_{1,2}$ generates the sampling pulse $T_P$ at the tail node at every data transition, as illustrated in Fig. 3(d). The duration of $T_P$ is determined by the strength of $M_{1,2}$ and the size of $C_T$, where a stronger transistor results in a shorter pulse.
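The settling value quoted above follows from charge conservation on the tail node: before the flip the node holds $(C_T + C_P)V_{TL0}$, and flipping $C_T$ reverses the sign of its share. A minimal Python sketch of that step, assuming ideal switches and $M_{1,2}$ still off (the "unchanged conditions" case); the function name and the capacitor and voltage values in the example are hypothetical, not taken from the letter:

```python
def flipped_tail_voltage(v_tl0, c_t, c_p):
    """Tail-node voltage after C_T's polarity is flipped, with M1,2 still off.

    Charge conservation on the tail node: C_P keeps its charge C_P*V_TL0,
    while C_T now contributes -C_T*V_TL0 instead of +C_T*V_TL0, so
    (C_T + C_P) * V_final = (C_P - C_T) * V_TL0.
    """
    return v_tl0 * (c_p - c_t) / (c_t + c_p)


# Hypothetical values: V_TL0 = 0.5 V, C_T = 100 fF, C_P = 10 fF.
v_final = flipped_tail_voltage(0.5, 100e-15, 10e-15)
print(f"V_TL would settle at {v_final:.3f} V")
```

With $C_T \gg C_P$ the target approaches $-V_{TL0}$, so the tail node swings well below the level at which $M_{1,2}$ turn on; in the real circuit $M_{1,2}$ interrupt this discharge after about one $\tau_1$, as described above.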

During $T_P$, the PD schematic simplifies to the one shown in Fig. 3(e). $M_{1,2}$ convert the VCO's output voltage, $\text{VCO}_P - \text{VCO}_N = 2A_{\text{VCO}} \cos(\omega_{\text{VCO}}t + \phi)$, into a differential current. Since $R_D \gg 1/(C_S \omega_{\text{VCO}})$, this current is integrated on the sampling capacitors, creating a charge difference $Q_S = Q_{SP} - Q_{SN} \propto \int_0^{T_P} (\text{VCO}_P - \text{VCO}_N)\, dt$ [8]. As shown in case-1 in Fig. 3(b), if the VCO zero crossings are at the center of $T_P$, the integrated charge $Q_S$ is zero since the shaded red and blue areas are equal. Hence, $V_S$ remains zero at the end of the phase comparison, and the CDR is locked. When a phase error $\phi$ is present, the zero crossings are not centered; consequently, both $Q_S$ and $V_S$ are nonzero, indicating that the CDR is not locked [see case-2 in Fig. 3(b)]. During a phase comparison, the output common-mode (CM) voltage of the PD drops. Hence, $R_D$ is added to restore the output CM voltage by recharging $C_S$ between data transitions. After the sampling period, $V_S$ also partially discharges through $R_D$, so the PD behaves like a leaky integrator.
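The locking condition in case-1 and case-2 can be checked numerically. The Python sketch below integrates a normalized differential VCO waveform over a window of width $T_P$ centered on the expected zero crossing and compares the result with the closed form $-(2/\omega)\sin\phi\,\sin(\omega T_P/2)$. It assumes a full-rate clock ($T_{\rm VCO} = T_{\rm bit} = 1$), a rectangular sampling window, and ideal integration on $C_S$ (no leakage through $R_D$); amplitude and $G_M$ scale factors are dropped, and the function names are hypothetical.

```python
import math


def sampled_charge(phase_error, tp_over_tbit=0.5, n=4000):
    """Normalized Q_S for a VCO phase error in radians (midpoint-rule integral).

    Time is normalized to T_bit = T_VCO = 1. The differential VCO output is
    modeled as cos(w*t + pi/2 + phase_error), so with zero phase error its
    zero crossing lies at t = 0, the center of the window [-T_P/2, T_P/2].
    """
    w = 2 * math.pi
    dt = tp_over_tbit / n
    q = 0.0
    for k in range(n):
        t = -tp_over_tbit / 2 + (k + 0.5) * dt
        q += math.cos(w * t + math.pi / 2 + phase_error) * dt
    return q


def sampled_charge_closed_form(phase_error, tp_over_tbit=0.5):
    """Analytic value of the same integral: -(2/w) * sin(phi) * sin(w*T_P/2)."""
    w = 2 * math.pi
    return -(2 / w) * math.sin(phase_error) * math.sin(w * tp_over_tbit / 2)


for phi in (-0.2, 0.0, 0.2):
    print(f"phi = {phi:+.1f} rad: Q_S = {sampled_charge(phi):+.4f}")
```

The charge is zero for $\phi = 0$ and changes sign with the direction of the phase error, which is what lets the loop lock; the $\sin(\omega T_P/2)$ factor is the same $T_P$ dependence that appears in the PD gain estimate.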

Following an approach similar to [9], the PD gain can be estimated as $K_{PD} \approx 2\alpha G_M A_{\text{VCO}} R_D \sin(0.5\,\omega_{\text{VCO}} T_P)/\pi$, where $G_M$ is the large-signal transconductance of $M_{1,2}$ and $\alpha$ ($\approx 0.5$) is the data transition density. Since $\omega_{\text{VCO}} T_{\text{bit}} = 2\pi$ in this full-rate CDR, where $T_{\text{bit}}$ is the bit period, $K_{PD}$ is maximized when $T_P = 0.5T_{\text{bit}}$. Owing to its sinusoidal dependence on $T_P$, $K_{PD}$ varies negligibly (i.e., by less than 5%) even if $T_P$ varies from $0.4T_{\text{bit}}$ to $0.6T_{\text{bit}}$.
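The less-than-5% claim follows from the sinusoidal factor alone, because $G_M$, $A_{\rm VCO}$, $R_D$, and $\alpha$ cancel in the ratio. A minimal Python sketch, assuming the full-rate relation $\omega_{\rm VCO}T_{\rm bit} = 2\pi$; the function name is hypothetical, and unit values stand in for the unspecified $G_M$, $A_{\rm VCO}$, and $R_D$:

```python
import math


def k_pd(g_m, a_vco, r_d, tp_over_tbit, alpha=0.5):
    """PD gain estimate 2*alpha*G_M*A_VCO*R_D*sin(0.5*w_VCO*T_P)/pi.

    Assumes a full-rate clock, so 0.5*w_VCO*T_P = pi * (T_P / T_bit).
    """
    return 2 * alpha * g_m * a_vco * r_d * math.sin(math.pi * tp_over_tbit) / math.pi


nominal = k_pd(1.0, 1.0, 1.0, 0.5)
for tp in (0.4, 0.5, 0.6):
    ratio = k_pd(1.0, 1.0, 1.0, tp) / nominal
    print(f"T_P = {tp:.1f} T_bit: K_PD / K_PD(0.5 T_bit) = {ratio:.3f}")
```

The ratio is about 0.951 at both $0.4T_{\rm bit}$ and $0.6T_{\rm bit}$, a drop of roughly 4.9%.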

It is also instructive to investigate the impact of mismatch among the PD's components on its gain and locking point. First, since $K_{PD}$ is not a function of $C_S$, the locking point is insensitive to mismatch between the sampling capacitors. However, mismatch between the load resistors ($\Delta R_D$) and between the transistors ($\Delta G_M$) creates a nonzero output voltage even when the VCO zero crossings occur at the center of $T_P$. Consequently, the loop must shift the locking point to compensate for $\Delta R_D$ and $\Delta G_M$ and realize $\overline{V_S} = 0$ in the locked state. Based on Monte Carlo simulations, the locking-point shift is dominated by $\Delta G_M$ rather than $\Delta R_D$. Considering the more pronounced mismatch at CT [10], $M_{1,2}$ are sized $(W/L)_{1,2} = 4.8\ \mu\text{m}/160$ nm. This limits the $3\sigma$ variation of the CDR locking point to $\pm 2.5$ ps.

Switches $S_{1-4}$ and capacitor $C_T$ are sized to set $T_P$ to roughly $0.5T_{\text{bit}}$. Even with relatively small switches (i.e., $(W/L)_{S_{1-4}} = 2.4\ \mu\text{m}/40\ \text{nm}$), the falling edge of $T_P$ can be kept short (~10 ps), allowing the PD to operate at data rates up to 25 Gb/s. At CT, the resistance of an unsilicided poly resistor decreases by ~10% [1], and $G_M$ increases by ~20%–30% due to higher carrier mobility [11]. To maintain a high $K_{\text{PD}}$ of 0.3 V/rad over these variations, $R_D$ is made 3-bit trimmable around 7.5 k$\Omega$. Considering the tradeoff between the loop phase margin and the CM voltage drop, a $C_S$ of 40 fF is chosen. Thanks to its compact switches and charge-based operation, the PD exhibits an extremely low in-band phase noise of −150 dBc/Hz and consumes only 0.1 mW, including the switch driver, while operating at 10 Gb/s.
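The need for trimming can be estimated with a first-order model in which $K_{\rm PD}$ scales with $G_M R_D$, holding $A_{\rm VCO}$, $T_P$, and $\alpha$ fixed (a simplification). Using the quoted ~10% resistor decrease and 20%–30% $G_M$ increase, the Python sketch below computes the untrimmed change in $K_{\rm PD}$ and the $R_D$ that would restore it. Treating 7.5 k$\Omega$ as the 300-K setting is an assumption, and the function names are hypothetical.

```python
R_D_300K_KOHM = 7.5  # assumed 300-K setting (the letter gives 7.5 kOhm as the trim center)


def kpd_ratio(r_scale, gm_scale):
    """First-order K_PD change at CT, taking K_PD proportional to G_M * R_D."""
    return r_scale * gm_scale


def required_rd_kohm(gm_scale, r_300k_kohm=R_D_300K_KOHM):
    """R_D that keeps G_M * R_D, and hence K_PD, at its 300-K value."""
    return r_300k_kohm / gm_scale


for gm_scale in (1.2, 1.3):
    print(f"G_M x{gm_scale:.1f}: untrimmed K_PD x{kpd_ratio(0.9, gm_scale):.2f}, "
          f"R_D needed ~{required_rd_kohm(gm_scale):.2f} kOhm")
```

Under this model, even after the resistor's ~10% drop, $K_{\rm PD}$ would rise by roughly 8%–17% untrimmed, which is the kind of spread the 3-bit $R_D$ trim is meant to absorb.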

Fig. 4. (a) Simulated tail waveform over many bits of a PRBS17. (b) Relative locking point offset for different data transition densities. (c) Consecutive identical bits in a PRBS17. (d) PD jitter PSD due to its data dependency.

Fig. 5. (a) Chip micrograph. (b) Measured power breakdown at RT and 4.2 K.

Fig. 4(a) shows the PD's tail node voltage overlaid for a pseudorandom bit sequence (PRBS). Due to the random interval lengths between data transitions, the node voltage is higher when no transition has taken place for a longer time. This slightly alters the shape of $T_P$ and, consequently, the absolute locking point, as shown in Fig. 4(b). For a given PRBS, one can count the occurrences of each consecutive identical digit (CID) interval length and assign the correspondingly weighted locking-point value to the sequence, as illustrated in Fig. 4(c). The power spectral density (PSD) is then obtained with a fast Fourier transform. Integrating the spectrum in Fig. 4(d) over a 40-MHz bandwidth yields an extremely low pattern-dependent RMS jitter of 34 fs, making this structure suitable for ultra-low-jitter applications.
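The estimation procedure above (map each run length to a locking-point offset, build the resulting sequence, take its spectrum, and integrate within a band) can be sketched in Python with NumPy. Everything numeric here is illustrative: `hypothetical_offset_fs` is a placeholder curve rather than the data of Fig. 4(b), a PRBS15 replaces the PRBS17 of Fig. 4 to keep run time short, and the offset is assumed to update at each transition and hold until the next one. Because one period of a PRBS is used, the FFT yields exact spectral lines, so integrating all of them must recover the sequence variance (Parseval), which gives a built-in sanity check.

```python
import math

import numpy as np


def prbs_bits(order=15, taps=(15, 14)):
    """One period (2**order - 1 bits) of a maximal-length PRBS from a Fibonacci LFSR.

    Taps (15, 14) are the ones commonly listed for the PRBS15 polynomial x^15 + x^14 + 1.
    """
    state = [1] * order  # any nonzero seed works
    out = np.empty(2**order - 1, dtype=np.int8)
    for i in range(out.size):
        out[i] = state[-1]
        feedback = state[taps[0] - 1] ^ state[taps[1] - 1]
        state = [feedback] + state[:-1]
    return out


def hypothetical_offset_fs(run_length):
    """Placeholder lock-point offset (fs) versus run length; NOT the data of Fig. 4(b)."""
    return 50.0 * (1.0 - math.exp(-(run_length - 1) / 3.0))


def locking_offsets(bits, offset_of_run):
    """Per-bit lock-point offset, updated at each transition and held until the next.

    The wrap-around at the period boundary is ignored for simplicity.
    """
    x = np.zeros(bits.size)
    last_edge, current = 0, 0.0
    for n in range(1, bits.size):
        if bits[n] != bits[n - 1]:
            current = offset_of_run(n - last_edge)  # length of the run that just ended
            last_edge = n
        x[n] = current
    return x


def band_rms(x, f_bit, f_max):
    """RMS of the periodic sequence x, integrating its spectral lines over 0 < f <= f_max."""
    n = x.size
    if n % 2 == 0:
        raise ValueError("expects an odd-length period (no Nyquist bin)")
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / f_bit)
    power = 2.0 * np.abs(spectrum) ** 2 / n**2  # one-sided power of each line
    band = (freqs > 0) & (freqs <= f_max)
    return math.sqrt(power[band].sum())


bits = prbs_bits()
x = locking_offsets(bits, hypothetical_offset_fs)
print(f"pattern-dependent jitter within 40 MHz: {band_rms(x, 10e9, 40e6):.2f} fs")
```

The printed number reflects only the placeholder curve; substituting simulated offsets per run length would be needed to reproduce a figure like the 34 fs reported above.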

### B. Clock Alignment Calibration

Fig. 6. 10 Gb/s recovered data eye diagram at (a) 300 and (b) 4.2 K. Measured phase noise of the recovered clock at (c) 300 and (d) 4.2 K.

The clock edges must occur at the center of the incoming bits to maximize the CDR's tolerance to intersymbol interference (ISI) and jitter. However, the proposed complementary charge-sampling PD locks the VCO zero crossings 0.25 UI before the ideal point. Therefore, as shown in Fig. 2, a phase alignment loop is introduced to shift the sampling clock edges by 0.25 UI (i.e., 90°) and align them to the center of the incoming bits, thus optimizing the jitter tolerance. A mixer-based PD is chosen for this loop since its average output settles to zero when a 90° phase shift exists between its input signals. A low-pass filter suppresses the high-frequency components of the PD output, and the resulting baseband error voltage is then amplified and quantized. The loop filter is implemented as a digital integrator that adjusts the delay of a controllable delay line consisting of two inverters with a switched-capacitor bank in between. The digital delay line also acts as the clock buffer for the retiming flip-flop. The digital integrator is clocked at $\sim$1 MHz by an external clock source. Owing to the low speed of the loop, all of its components except the buffer consume only $\sim$130 $\mu$A.
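The calibration principle can be illustrated with a behavioral Python model. Its assumptions, none of which come from the letter: both mixer inputs are ideal sinusoids, the low-pass filter is replaced by an exact one-period average, the quantizer has 1 bit, and each integrator update moves the delay line by a hypothetical 2° step. The average mixer output is $0.5\cos\Delta\phi$, so a sign-driven integrator walks the delay code until the phase difference dithers around 90°. The function names are hypothetical.

```python
import math


def mixer_average(delta_phi, n=1000):
    """One-period average of cos(wt) * cos(wt + delta_phi), which equals 0.5*cos(delta_phi)."""
    acc = 0.0
    for k in range(n):
        theta = 2 * math.pi * (k + 0.5) / n
        acc += math.cos(theta) * math.cos(theta + delta_phi)
    return acc / n


def calibrate(initial_phase_deg=0.0, step_deg=2.0, cycles=200):
    """Sign-based digital integrator driving a delay-line code (behavioral model)."""
    code = 0
    for _ in range(cycles):
        phase = initial_phase_deg + code * step_deg
        error = mixer_average(math.radians(phase))
        code += 1 if error > 0 else -1  # 1-bit quantizer followed by an accumulator
    return initial_phase_deg + code * step_deg


print(f"settled phase: {calibrate():.0f} deg")  # ends within one step of 90 deg
```

A 1-bit loop like this never rests exactly at 90° but toggles between adjacent codes, so the residual alignment error is set by the delay step; the slow ~1-MHz update rate described above keeps that toggling far below the CDR loop bandwidth.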

## III. MEASUREMENT RESULTS

The cryo-CMOS CDR was fabricated in 40-nm CMOS and occupies an active area of $0.13\ \text{mm}^2$, as shown in Fig. 5(a). The chip is wire-bonded to a PCB and characterized at 300 K with short cables ($\sim$10 cm). To measure the performance at 4.2 K, the PCB is mounted at one end of a dipstick and lowered into liquid helium. The incoming data are generated at RT by a Keysight M8195A arbitrary waveform generator (AWG). The recovered clock is routed through an on-chip divider to an R&S FSWP8 to characterize its phase noise. A 50-$\Omega$ driver transmits the retimed data to RT to characterize the bit error rate (BER) and jitter tolerance. The channel loss of the cable and PCB trace is 6 dB at 5 GHz. The CDR has a frequency-locking range of $\pm 50$ MHz and is optimized separately at 300 and 4.2 K. The CDR core dissipates 4.7 and 3.1 mW from a 1.1-V supply at 300 and 4.2 K, respectively; its power breakdown is shown in Fig. 5(b). The VCO's power consumption decreases at 4.2 K due to the higher quality factor of its inductor at CT [12], [13].

Fig. 6(a) and (b) plot the measured recovered-data eye diagrams at 10 Gb/s at 300 and 4.2 K, respectively. The degradation of the eye opening at CT is due to the loss and bandwidth limitations of the long dipstick cables required for cryogenic measurements. Fig. 6(c) and (d) show the recovered clock phase noise at 300 and 4.2 K, respectively, for a 10-Gb/s PRBS input of length $2^{21} - 1$, the longest available in our instrument. The integrated RMS jitter from 10 kHz to 1 GHz is 260 fs at RT and remains almost the same (275 fs) at 4.2 K, since the AWG limits the in-band phase noise at both temperatures.
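The integrated jitter quoted above is conventionally obtained from the single-sideband phase noise $\mathcal{L}(f)$ as $\sigma_t = \sqrt{2\int \mathcal{L}(f)\,df}/(2\pi f_0)$. A minimal Python sketch of that conversion, using trapezoidal integration of $\mathcal{L}(f)$ in linear units between the given offsets (a denser grid is needed for sloped profiles); the function name, the flat −120 dBc/Hz profile, and the 10-GHz carrier in the example are hypothetical, not the measured data of Fig. 6:

```python
import math


def rms_jitter_s(offsets_hz, l_dbc_hz, f_carrier_hz):
    """Integrated RMS jitter (s) from an SSB phase-noise profile L(f) in dBc/Hz.

    sigma_t = sqrt(2 * integral of L(f) df) / (2*pi*f_carrier), with the integral
    evaluated by the trapezoidal rule on L(f) converted to linear units.
    """
    lin = [10 ** (l / 10) for l in l_dbc_hz]
    area = 0.0
    for i in range(len(offsets_hz) - 1):
        area += 0.5 * (lin[i] + lin[i + 1]) * (offsets_hz[i + 1] - offsets_hz[i])
    return math.sqrt(2 * area) / (2 * math.pi * f_carrier_hz)


# Hypothetical flat profile from 10 kHz to 1 GHz, integrated like the measurement above.
f = [1e4, 1e6, 1e8, 1e9]
L = [-120.0] * len(f)
print(f"RMS jitter: {rms_jitter_s(f, L, 10e9) * 1e15:.0f} fs")
```

Because an ideal divide-by-$N$ lowers $\mathcal{L}(f)$ by $20\log_{10}N$ while dividing $f_0$ by $N$, the result in seconds is unchanged, so measuring after the on-chip divider does not by itself alter the jitter figure (divider-added noise and the analyzer noise floor aside).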

Based on the measured jitter transfer curves in Fig. 7(a), the CDR loop bandwidth is 54 and 40 MHz at 300 and 4.2 K, respectively. The jitter tolerance curves in Fig. 7(b) are measured by applying a PRBS9 input generated by the AWG and measuring the BER with a Keysight M8046A. Each measurement point indicates the jitter amplitude at which the BER rises sharply above $10^{-12}$. The jitter tolerance is 2 UI<sub>PP</sub> at a 5-MHz jitter frequency at 300 K and reduces to 0.85 UI<sub>PP</sub> at 4.2 K due to ISI caused by the long cables (~2.5 m) between the CDR and the measurement instruments. Similar degradation is observed when cables of the same length are used to measure the jitter tolerance at RT. At both temperatures, the CDR fulfills the mask requirement of the SDH STM-256 standard.

Fig. 7. Measured (a) jitter transfer and (b) jitter tolerance at 300 and 4.2 K.

TABLE I

COMPARISON WITH STATE-OF-THE-ART CDRS

|                                               | This Work   | This Work   | [5]<br>JSSC'13 | [6]<br>JSSC'19 | [14]<br>JSSC'18 | [15]<br>JSSC'20 |
|-----------------------------------------------|-------------|-------------|----------------|----------------|-----------------|-----------------|
| Architecture                                  | Type-II PLL | Type-II PLL | Type-II PLL    | Type-I PLL     | Type-II AD-PLL  | Type-II AD-PLL  |
| Temperature [K]                               | 300         | 4.2         | 300            | 300            | 300             | 300             |
| Jitter Tolerance @ 5 MHz [UI<sub>PP</sub>]    | 2           | 0.85#       | 0.7            | 2              | 1               | 0.35            |
| Rec. Clock Jitter [ps]                        | 0.260*      | 0.275*      | 1.5            | 0.459          | 1.46            | 1.15            |
| Power [mW]                                    | 4.7         | 3.1         | 5              | 3              | 46              | 21.13           |
| Efficiency [pJ/bit]                           | 0.47        | 0.31        | 0.2            | 0.15           | 1.8             | 2.11            |
| Data Rate [Gb/s]                              | 10          | 10          | 25             | 20             | 25              | 10              |
| Area [10<sup>3</sup> μm<sup>2</sup>]          | 130         | 130         | 39             | 0.36           | 50              | 31              |
| Technology [nm]                               | 40          | 40          | 65             | 45             | 40              | 28              |
| Supply [V]                                    | 1.1         | 1.1         | 1              | 1              | 1.15            | 1               |

<sup>#</sup>Measured over a 2.5-m cable. <sup>\*</sup>Limited by instrument.

Table I compares the performance of the proposed CDR with prior-art NRZ CDRs. Thanks to the PD's simultaneously high gain, low power, and low jitter, this work exhibits the lowest recovered clock jitter at RT with competitive jitter tolerance and $P_{\rm dc}$. Moreover, this work is the first CDR operating at CT, making it suited for QC applications.

## IV. CONCLUSION

This letter introduced a CDR architecture that demonstrates low recovered clock jitter and high jitter tolerance thanks to its complementary charge-sampling PD and clock alignment loop. The proposed 40-nm CDR operates at 10 Gb/s, achieving a recovered clock jitter of 275 fs (260 fs) and a jitter tolerance of 0.85 UI<sub>PP</sub> (2 UI<sub>PP</sub>) at a 5-MHz jitter frequency while consuming 3.1 mW (4.7 mW) at 4.2 K (300 K). As the first cryo-CMOS CDR, it enables the data communication required between classical and quantum processors in future large-scale quantum computers.

## REFERENCES

- [1] B. Patra et al., "Cryo-CMOS circuits and systems for quantum computing applications," *IEEE J. Solid-State Circuits*, vol. 53, no. 1, pp. 309–321, Jan. 2018.
- [2] R. W. J. Overwater, M. Babaie, and F. Sebastiano, "Neural-network decoders for quantum error correction using surface codes: A space exploration of the hardware cost-performance tradeoffs," *IEEE Trans. Quantum Eng.*, vol. 3, pp. 1–19, 2022.
- [3] C. M. Dawson and M. A. Nielsen, "The Solovay–Kitaev algorithm," 2005, arXiv:quant-ph/0505030.
- [4] K. Wu and J. Lee, "A 2×25-Gb/s receiver with 2:5 DMUX for 100-Gb/s Ethernet," *IEEE J. Solid-State Circuits*, vol. 45, no. 11, pp. 2421–2432, Nov. 2010.
- [5] J. W. Jung and B. Razavi, "A 25-Gb/s 5-mW CMOS CDR/deserializer," *IEEE J. Solid-State Circuits*, vol. 48, no. 3, pp. 684–697, Mar. 2013.
- [6] L. Kong, Y. Chang, and B. Razavi, "An inductorless 20-Gb/s CDR with high jitter tolerance," *IEEE J. Solid-State Circuits*, vol. 54, no. 10, pp. 2857–2866, Oct. 2019.
- [7] A. Beckers, F. Jazaeri, and C. Enz, "Cryogenic MOSFET threshold voltage model," in *Proc. 49th Eur. Solid-State Device Res. Conf.* (*ESSDERC*), Sep. 2019, pp. 94–97.
- [8] J. Gong, E. Charbon, F. Sebastiano, and M. Babaie, "A low-jitter and low-spur charge-sampling PLL," *IEEE J. Solid-State Circuits*, vol. 57, no. 2, pp. 492–504, Feb. 2022.
- [9] J. Gong, E. Charbon, F. Sebastiano, and M. Babaie, "A cryo-CMOS PLL for quantum computing applications," *IEEE J. Solid-State Circuits*, early access, Nov. 29, 2022, doi: 10.1109/JSSC.2022.3223629.
- [10] P. A. 't Hart, M. Babaie, E. Charbon, A. Vladimirescu, and F. Sebastiano, "Characterization and modeling of mismatch in cryo-CMOS," *IEEE J. Electron Devices Soc.*, vol. 8, pp. 263–273, 2020.
- [11] M. Mehrpoo et al., "Benefits and challenges of designing cryogenic CMOS RF circuits for quantum computers," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2019, pp. 1–5.
- [12] B. Patra, M. Mehrpoo, A. Ruffino, F. Sebastiano, E. Charbon, and M. Babaie, "Characterization and analysis of on-chip microwave passive components at cryogenic temperatures," *IEEE J. Electron Devices Soc.*, vol. 8, pp. 448–456, 2020.
- [13] J. Gong, Y. Chen, F. Sebastiano, E. Charbon, and M. Babaie, "A 200 dB FoM 4-to-5 GHz cryogenic oscillator with an automatic common-mode resonance calibration for quantum computing applications," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2020, pp. 308–310.
- [14] M. Verbeke et al., "A 1.8-pJ/b, 12.5–25-Gb/s wide range all-digital clock and data recovery circuit," *IEEE J. Solid-State Circuits*, vol. 53, no. 2, pp. 470–483, Feb. 2018.
- [15] C. Yu, E. Sa, S. Jin, H. Park, J. Shin, and J. Burm, "A 6.5–12.5-Gb/s half-rate single-loop all-digital referenceless CDR in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 55, no. 10, pp. 2831–2841, Oct. 2020.