# A 100 GS/s 4-to-1 Analog Time-Interleaver in 55 nm SiGe BiCMOS

Hannes Ramon, Michiel Verplaetse, Michael Vanhoecke, Haolin Li, Johan Bauwelinck, Peter Ossieur, Xin Yin and Guy Torfs

*Abstract*—We demonstrate a 4-to-1 100 GS/s time-interleaver realized in a 55 nm BiCMOS technology. The interleaver comprises two stages of 2-to-1 sub-interleavers. Each sub-interleaver is implemented using a return-to-zero generation and summing architecture. This sub-interleaver architecture ensures lower clock feedthrough and contains an inherent feed-forward equalizer. ENOB measurements have been performed revealing the interleaver's ENOB of 4.9 at 3 GHz. Additionally, the transfer function is measured to show the capabilities of the inherent feedforward equalizer of the sub-interleavers. The measured analog output bandwidth of the 4-to-1 interleaver is 73 GHz. Last, a 100 GBd PAM-4 (200 Gb/s) signal is generated by interleaving four 25 GBd PAM-4 streams, while consuming 700 mW.

*Index Terms*—Interleaver, Digital-to-Analog, Return-to-zero, Equalizer, 100 GS/s, BiCMOS.

# I. INTRODUCTION

Digital signal processing (DSP) is currently deployed in most communication systems, ranging from wireless radio systems to coherent lightwave communication. Thanks to the reconfigurability and flexibility of digital systems, DSP is often favored over fully analog implementations.

Next-generation DSP-enabled optical networks will require sampling rates > 100 GS/s [1], [2]. One of the critical parts in a system chain utilizing DSP is the digital-to-analog converter (DAC). Here, CMOS DACs are preferred because they can be tightly integrated with the DSP. However, current CMOS DACs lack the sampling rate and/or bandwidth to support 100 GBd PAM-4 (200 Gbit/s) links [3], [4]. If the analog bandwidth of the DAC is insufficient, DSP is required to equalize the DAC transfer function, which reduces the effective output amplitude and thus also comes at a cost in bits. To increase the speed and bandwidth of the data converters, multiple lower-rate DACs can be combined in the frequency domain [5]–[8] or in the time domain [3], [9]–[16], provided that this additional step does not degrade the linearity to the point where there are insufficient bits to effectively perform higher-order modulation. In the case of frequency interleaving, each DAC generates a separate part of the output spectrum. All outputs are then individually upconverted via an analog mixer to the correct frequency band to obtain the desired analog signal. However, the heavy DSP involved, together with the precise tuning and digital compensation of local oscillators, filters and mixers, makes it a very challenging interleaving scheme.

Manuscript received ...; revised ...

Copyright (c) 2020 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Time-interleaving, on the other hand, combines the outputs of parallel DACs in the time domain. The signal is de-multiplexed in the digital domain before being sent to the parallel DACs. The analog signal can be obtained with linear passive or active combiners or with linear selectors. The passive combiner using a distributed topology in [3] increases the DAC sampling rate to 100 GS/s with an analog bandwidth of 13 GHz. A 200 GS/s linear active combiner with 44 GHz analog bandwidth is demonstrated in [13] using a bipolar process. Such linear combiner schemes [3], [13] extend their spurious-free dynamic range by adding the outputs of two complementary clocked DACs and cancelling the images in the even Nyquist zones [10]. Fundamentally, their analog output bandwidth cannot extend beyond the Nyquist frequency of the sub-DACs; with additional DSP, it can be extended to 1.5 times the Nyquist frequency of the sub-DACs [13].

Alternatively, time-domain interleaving of DACs with linear selectors can be used. A 2-to-1 analog multiplexer (AMUX) in 130 nm SiGe BiCMOS is reported in [14] with an analog bandwidth $>67\,\text{GHz}$ and a measured sampling rate of 56 GS/s. In [16], a 55 nm SiGe BiCMOS 2-to-1 AMUX with a sampling rate of 120 GS/s is reported, with a large power consumption of 2.2 W. The 2-to-1 AMUX from [15] achieved >110 GHz analog bandwidth at 180 GS/s using a 0.25 $\mu$m InP HBT process, but requires digital pre-processing to compensate for the limited switching speed [17]–[19]. In [20], we reported a 4-to-1 interleaver with an analog bandwidth beyond Nyquist at sampling rates up to 100 GS/s in a 55 nm SiGe BiCMOS process. The interleaver is based on the generation and summation of return-to-zero (RZ) signals from analog inputs. The advantage of this architecture is that it has reduced clock feedthrough and can simultaneously perform equalization, e.g. to compensate interconnection losses at its output.

This paper is an extension of our work presented in [20]. Here, we provide a detailed explanation of the architecture and its implementation. Furthermore, additional measurements and analyses are reported. In Section II, we introduce the return-to-zero summing time-interleaving architecture, and Section III shows the circuit implementation of each building block of the fabricated chip utilizing the 4-to-1 2-stage architecture. We end with measurement results and a conclusion in Sections IV and V, respectively.

H. Ramon, M. Verplaetse, M. Vanhoecke, H. Li, J. Bauwelinck, P. Ossieur, X. Yin and G. Torfs are with Ghent University - imec, IDLab, Department of Information Technology (e-mail: guy.torfs@ugent.be).

This work was supported by the Research Foundation Flanders (FWO) and the European H2020 project ICT Qameleon (Grant no. 780354).

# II. RETURN-TO-ZERO INTERLEAVER ARCHITECTURE

The AMUX is a popular choice for high-speed time-interleaving. A typical implementation is shown in Fig. 1 [14], [15], [21]. The AMUX can be seen as a linearized current-mode logic multiplexer that acts as an analog selector, switching between the two inputs.



Fig. 1. Typical implementation of an analog multiplexer.

Alternatively, an interleaver can also be modeled by introducing the return-to-zero (RZ) concept. If each input is multiplied with a clock pulse train $\{0,1\}$, zeroing the signal when the clock is 0 and copying the signal when the clock is 1, return-to-zero signals are generated for each input. Multiplying one of the two inputs with a clock with a phase of 0 degrees and the other with a clock with a phase of 180 degrees ensures that one input is being copied while the other is being zeroed, and vice versa. Adding both RZ-signals together yields the desired time-interleaved signal. This alternative form can be mapped onto the AMUX. The clock switches the current between the differential pair Q3-Q4, amplifying input 1 ($v_{ip1}$ - $v_{in1}$), and the differential pair Q5-Q6, amplifying input 2 ($v_{ip2}$ - $v_{in2}$). This ensures that the gain of the differential pairs alternates between 0 and the desired gain, generating RZ-signals. However, the RZ-signals are not readily present because both differential pair outputs are directly added together in the load resistors.
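The RZ view of interleaving can be illustrated with a minimal Python sketch. It is an idealized, sample-level model and not the circuit itself: the function names are hypothetical, each half-rate sample is assumed to be held for two full-rate slots, and both inputs are assumed to have the same length.

```python
def rz_gate(samples, clock):
    """Multiply each sample by a {0, 1} clock value: copy when 1, zero when 0."""
    return [s * c for s, c in zip(samples, clock)]


def interleave_2to1(in1, in2):
    """Time-interleave two half-rate streams by RZ generation and summing.

    Input 1 is gated by a 0-degree clock and input 2 by a 180-degree clock,
    so exactly one of the two RZ signals is non-zero in every full-rate slot.
    """
    held1 = [x for x in in1 for _ in range(2)]   # hold each sample for 2 slots
    held2 = [x for x in in2 for _ in range(2)]
    clk0 = [1, 0] * len(in1)                     # clock phase 0 degrees
    clk180 = [0, 1] * len(in2)                   # clock phase 180 degrees
    rz1 = rz_gate(held1, clk0)
    rz2 = rz_gate(held2, clk180)
    return [u + v for u, v in zip(rz1, rz2)]     # summing node


even = [1, 3, 5]      # samples destined for even full-rate slots
odd = [2, 4, 6]       # samples destined for odd full-rate slots
print(interleave_2to1(even, odd))   # [1, 2, 3, 4, 5, 6]
```

The two gated streams never overlap in this ideal model, which is why a plain sum is sufficient to reconstruct the full-rate sequence.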

The architecture proposed in this work and depicted in Fig. 2 is based on explicit RZ generation and addition. The RZ-signals are generated in a first step and added together in a second step.  $f_s$  is defined as the sampling frequency of the half-rate DACs, which is also the frequency of the clock applied to the interleaver.

This is in itself not very different from the AMUX, but the implementation of the RZ-generator block in Fig. 2 is important to maintain low distortion and limit clock feedthrough. Considering the circuit in Fig. 1, the clock determines which differential pair of the AMUX is turned on and which is turned off. The currents in the differential pairs carry the RZ-signals, and the rise and fall times of those currents are determined by the switching speed of the transistors of the AMUX. A limited rise and fall time leads to low-pass filtering. This effect caused by the limited switching speed of the transistors is described in [17]. On the other hand, duty-cycle distortion in the RZ currents leads to unwanted clock feedthrough at the cross-over of both inputs, as illustrated in Fig. 3, where the RZ-signal currents and their sum are plotted for equal and constant input signals. Clock feedthrough at the cross-over of both input signals can be observed in this simulation, where a DC signal should be generated. The interleaved signal is mixed with $2f_s$, giving rise to additional signal-dependent spectral components around $2f_s$.

Fig. 2. Return-to-zero interleaver concept. $f_s$ is the sampling frequency of the half-rate DACs.



Fig. 3. AMUX simulation results where the internal RZ current of each amplifier and the sum of the RZ currents are shown for the generation of a constant at 100 GS/s interleaving.

These spectral components around $2f_s$ are outside the Nyquist zone of a 2-to-1 interleaver. However, if the interleaver is used in a 4-to-1 implementation with 2 stages of 2-to-1 sub-interleavers (Fig. 6), as in this work, the undesired spectral components of stage 1 fall inside the spectrum of the final signal generated in stage 2 and hence cannot be filtered out. The second problem with the AMUX is that the limited switching speed of the transistors gives rise to RZ-signals that do not fully return to 0. This leads to undesired low-pass filtering of the interleaved signal and hence a reduced signal bandwidth. This filtering depends on the steepness of the clock edges and the switching speed of the transistors, and it can be compensated by signal pre-processing [17]–[19].
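The origin of the $2f_s$ component can be reproduced with a simple numerical sketch. It assumes ideal rectangular gates with a common duty cycle and equal DC inputs of 1, which is a strong simplification of the transistor-level behavior; the function names and the 5 % distortion value are chosen only for illustration.

```python
import cmath
import math


def gate(phase, start, duty):
    """Rectangular gate: 1 while (phase - start) mod 1 < duty, else 0."""
    return 1.0 if (phase - start) % 1.0 < duty else 0.0


def summed_rz(duty, points=1000):
    """Sum of two RZ currents over one clock period for equal DC inputs of 1."""
    out = []
    for n in range(points):
        phase = n / points                       # fraction of one clock period
        out.append(gate(phase, 0.0, duty) + gate(phase, 0.5, duty))
    return out


def harmonic(signal, k):
    """Magnitude of the k-th clock harmonic in one period of `signal`."""
    n_pts = len(signal)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / n_pts)
              for n, v in enumerate(signal))
    return abs(acc) / n_pts


ideal = summed_rz(duty=0.50)      # perfect 50 % duty cycle
skewed = summed_rz(duty=0.55)     # both gates stay on 5 % of a period too long

for name, sig in (("ideal", ideal), ("skewed", skewed)):
    print(name, round(harmonic(sig, 1), 4), round(harmonic(sig, 2), 4))
```

With a perfect duty cycle the sum is constant. With overlapping gates the sum shows one disturbance per cross-over, i.e. two per clock period, so the ripple appears at twice the clock frequency rather than at the clock frequency itself.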

In this work, we propose an RZ-generator that generates symmetric RZ-signals. The block diagram of the proposed RZ-generator is shown in Fig. 4. It consists of two paths: one path with gain a and one path with gain b that is mixed with the clock. The conversion gain of the mixer is assumed to be 1 and the DC component of the clock signal is zero. If a = b, a perfect RZ-signal is created as illustrated in Fig. 4. In this case, the total gain of the interleaver after the RZ-summing is 2a.



Fig. 4. Return-to-zero generator block diagram. If a = b, an RZ-signal is created as illustrated.

On the other hand, if $a \neq b$, imperfect RZ-signals are created. If we choose a < b, the RZ-signal overshoots and flips sign, which means that part of the signal that is being zeroed is subtracted from the other signal. Since both inputs are interleaved one after another, the part subtracted from the other input is in fact proportional to the previous (delayed) symbol at full rate, so the interleaver emphasizes the interleaved signal. A de-emphasized interleaved signal can be obtained by choosing a > b. By changing the difference between the gains a and b, an intrinsic 2-tap feed-forward equalizer (FFE) implementing $(a + b) + (a - b)z^{-1}$ is obtained inside the 2-to-1 interleaver. This is illustrated in Fig. 5. The intrinsic FFE is valuable at sampling rates >50 GS/s, where it can be used to e.g. compensate frequency-dependent interconnection losses. Limited switching speed in the AMUX is equivalent to the case where a > b, except that the ratio between a and b depends on the transistor speed and clock shape and is not controllable. Compared to the AMUX, the RZ-summing interleaver can fully control the RZ-signal generation, so no pre-processing is required, albeit at the cost that both the RZ-generator and the RZ-combiner contribute to the overall linearity degradation. On the other hand, the improved switching behavior allows the power consumption to be reduced, even though there are two steps in the interleaving process.
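The equivalence between unequal gains and a 2-tap FFE can be checked with a short sample-level sketch. The timing model is an assumption made for illustration: each half-rate input holds its symbol for two full-rate slots and the two inputs are offset by one slot, so the input being zeroed still carries the previous full-rate symbol. The clock is modeled as +1 or -1 with unity mixer gain, and all function names are hypothetical.

```python
def rz_generator(x, clk, a, b):
    """Direct path with gain a plus clocked path with gain b (clk is +1 or -1)."""
    return a * x + b * x * clk


def interleave_with_ffe(symbols, a, b):
    """2-to-1 RZ-summing interleaver acting on a full-rate symbol list.

    In slot n the selected input carries symbols[n], while the other input
    still holds symbols[n - 1] and is being zeroed.
    """
    out = []
    for n in range(1, len(symbols)):
        selected = rz_generator(symbols[n], +1, a, b)      # (a + b) * s[n]
        zeroed = rz_generator(symbols[n - 1], -1, a, b)    # (a - b) * s[n-1]
        out.append(selected + zeroed)
    return out


def two_tap_ffe(symbols, a, b):
    """Reference filter (a + b) + (a - b) z^-1 applied directly."""
    return [(a + b) * symbols[n] + (a - b) * symbols[n - 1]
            for n in range(1, len(symbols))]


data = [1, -1, -1, 1, 1, 1, -1]
for a, b in ((0.5, 0.5), (0.4, 0.6), (0.6, 0.4)):   # nominal, emphasis, de-emphasis
    y = interleave_with_ffe(data, a, b)
    assert all(abs(u - v) < 1e-12 for u, v in zip(y, two_tap_ffe(data, a, b)))
    print(a, b, [round(v, 2) for v in y])
```

For a = b the residual of the zeroed input vanishes and the output equals the selected symbol scaled by 2a. For a < b the residual is negative, which boosts transitions relative to repeated symbols.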

Higher order interleaving, while still taking advantage of the intrinsic FFE, can be achieved by placing the 2-to-1 interleavers in a multi-stage topology, interleaving all the inputs 2-by-2. To have a useful filter for each stage *i*, the coefficients  $c_i = a_i - b_i$  and  $d_i = a_i + b_i$  of all the RZ-generators in each stage *i* need to be equal to each other. The stages are numbered starting at 1 for the lowest sub-interleaving sampling rate and increasing for the higher internal sampling rates. The transfer functions of all the stages are multiplied with each other to obtain the total filter H(z) (1) for N stages.

$$H(z) = \prod_{i=1}^{N} \left( d_i + c_i z^{-2^{N-i}} \right) \tag{1}$$

For a 4-to-1 interleaver with 2 stages, we obtain the following filter transfer function H(z).

$$\begin{aligned} H(z) &= (d_1 + c_1 z^{-2})(d_2 + c_2 z^{-1}) \\ &= d_1 d_2 + c_2 d_1 z^{-1} + c_1 d_2 z^{-2} + c_1 c_2 z^{-3} \end{aligned} \tag{2}$$

Eq. (2) describes a third-order filter, although only three of its four coefficients can be set independently.
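The expansion of the cascaded filter and the dependency between its taps can be verified numerically. The sketch below assumes that stage i contributes a delay of $2^{N-i}$ full-rate samples, which reproduces the $z^{-2}$ and $z^{-1}$ terms of Eq. (2) for N = 2. The coefficient values and function names are hypothetical and serve only as an example.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def total_filter(stages):
    """Cascade of N stages; stage i (1-based) contributes d_i + c_i z^-(2^(N-i))."""
    n_stages = len(stages)
    h = [1.0]
    for i, (d, c) in enumerate(stages, start=1):
        delay = 2 ** (n_stages - i)
        stage = [d] + [0.0] * (delay - 1) + [c]
        h = poly_mul(h, stage)
    return h


# Hypothetical coefficient values, chosen only for illustration.
d1, c1 = 1.0, -0.2     # stage 1: d1 + c1 z^-2
d2, c2 = 1.0, -0.3     # stage 2: d2 + c2 z^-1
h = total_filter([(d1, c1), (d2, c2)])
print(h)               # taps for z^0, z^-1, z^-2, z^-3

# Only three taps are free: h[0] * h[3] always equals h[1] * h[2].
assert abs(h[0] * h[3] - h[1] * h[2]) < 1e-12
```

The final assertion reflects the constraint visible in Eq. (2): the product of the outer taps, $d_1 d_2 \cdot c_1 c_2$, is identical to the product of the inner taps, $c_2 d_1 \cdot c_1 d_2$.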

With this 2-stage architecture, the 4-to-1 interleaver can be conveniently placed in a 2-to-1 mode by configuring the first stage as a through. This is obtained by setting b = 0 and/or disabling the clock in the stage 1 interleavers.

# III. 4-TO-1 INTERLEAVER CIRCUIT IMPLEMENTATION

Fig. 6 shows the block diagram of the implemented 4-to-1 interleaver, consisting of two 2-to-1 sub-interleaving stages utilizing the RZ-summing architecture. The first stage generates two intermediate interleaved half-rate (50 GS/s) signals from four quarter-rate (25 GS/s) inputs. The second stage performs the final interleaving step from the two half-rate signals to the full rate (100 GS/s) output. An external 50 GHz clock is fed to stage 2 and to a quadrature clock divider, which generates quarter rate clocks for stage 1. Before the quadrature clocks enter the first interleaving stage, phase interpolators provide on-chip tuning of the divided clock phase with respect to the input data. The clocks are distributed across the chip with a suitable clock tree. The interleaver has a total gain close to 7 dB, with most of the gain shifted towards stage 2 for linearity considerations. Input-output interfaces are implemented through four  $100\,\Omega$  differential input buffers, a  $100\,\Omega$  differential clock input buffer and one  $100\,\Omega$  differential output buffer.
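The sample ordering implied by two cascaded 2-to-1 stages can be sketched as follows. The model is ideal and purely logical: the stream numbering and the pairing of streams 0 with 2 and 1 with 3 in stage 1 are assumptions about port naming, and the sequence length is assumed to be a multiple of 4.

```python
def demux4(full_rate):
    """Split a full-rate sequence into four quarter-rate streams (round robin)."""
    return [full_rate[k::4] for k in range(4)]


def interleave2(x, y):
    """Ideal 2-to-1 interleaver: alternate samples of x and y, x first."""
    out = []
    for u, v in zip(x, y):
        out.extend((u, v))
    return out


def interleave4(streams):
    """Two stages of 2-to-1 interleaving.

    Stage 1 pairs streams 0 with 2 and 1 with 3 so that stage 2, which
    alternates between the two stage-1 outputs, restores the original order.
    """
    half_a = interleave2(streams[0], streams[2])   # samples 0, 2, 4, ...
    half_b = interleave2(streams[1], streams[3])   # samples 1, 3, 5, ...
    return interleave2(half_a, half_b)


sequence = list(range(16))
quarter_rate = demux4(sequence)
assert interleave4(quarter_rate) == sequence
```

Pairing adjacent streams (0 with 1, 2 with 3) in stage 1 would not restore the original order in this model, because stage 2 alternates between the two half-rate signals on every full-rate sample.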

## A. Return-to-Zero Generator

Fig. 7 shows the schematic of the RZ-generators. It features 2 parallel degenerated differential pairs (Q1-Q2 and Q3-Q4). The current from Q1-Q2 goes through a Gilbert-cell mixer and the current from Q3-Q4 goes through a Gilbert-cell variable gain amplifier (VGA). The VGA represents gain block a of Fig. 4, and the Gilbert-cell mixer simultaneously represents gain block b and the mixer of the same figure. The gain b is tuned by the amplitude of the clock signal in the last clock buffer/driver. Both parallel paths are summed in the load resistors $R_{\rm L}$.

Fig. 5. Waveform examples of the FFE functionality in the 2-to-1 RZ-summing interleaver architecture for the nominal case (a = b), an emphasized signal (a < b) and a de-emphasized signal (a > b).

Fig. 6. Block diagram of the 4-to-1 interleaver implemented in a 55 nm BiCMOS technology.

Fig. 7. Schematic of the RZ-generator.

Fig. 8. Simulation results of the RZ-generator circuit (Fig. 7) of stage 2 with a 10 GHz input and 50 GHz clock for different values of a. b (clock amplitude $400\,\mathrm{mV_{ppd}}$) is kept constant in this simulation.

Fig. 9. Simulation results of the RZ-generator circuit (Fig. 7) of stage 2 with a 10 GHz input and 50 GHz clock for different values of b by changing the clock amplitude applied at the mixer. a is kept constant in this simulation.

Simulation results of the RZ-generator (stage 2) with a 50 GHz clock and a 10 GHz input are shown in Fig. 8 and Fig. 9, where RZ-signals can be observed for different ratios of a and b. A 10 GHz sine is added as a reference showing the envelope of the RZ-signal. In Fig. 8, b is kept constant while a is swept by changing the gain of the VGA, and in Fig. 9, b is swept by changing the amplitude of the clock signal at the mixer. A clock amplitude equal to $400\,\mathrm{mV_{ppd}}$ corresponds to a/b = 1. If a = b, a 50 GS/s 10 GHz RZ sine has frequency components at 10 GHz, 40 GHz and 60 GHz. By lowering a and keeping b constant, we reduce the amplitude of the frequency component at 10 GHz until, at a = 0, the 10 GHz component is completely gone and only the mixing products of 10 GHz with 50 GHz remain (40 GHz and 60 GHz). On the other hand, when we reduce *b* while keeping *a* constant, we lower the amplitude of the mixing products of 10 GHz with 50 GHz until they are completely gone at b = 0 (clock amplitude $0\,\mathrm{mV_{ppd}}$) and only 10 GHz remains.
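The frequency content described for the RZ sine can be reproduced with an idealized model of Fig. 4. The sketch assumes an ideal zero-mean square-wave clock and unity mixer gain, so it does not capture the bandwidth of the real circuit. The gain values and function names are illustrative only.

```python
import cmath
import math

F_IN = 10e9      # input sine frequency (Hz)
F_CLK = 50e9     # clock frequency (Hz)
N = 2000         # samples over one common period
T = 1 / F_IN     # 100 ps: one input cycle, five clock cycles


def rz_sine(a, b):
    """a * x(t) + b * x(t) * clk(t) with an ideal zero-mean square clock."""
    out = []
    for n in range(N):
        t = (n + 0.5) * T / N                              # avoid clock edges
        x = math.sin(2 * math.pi * F_IN * t)
        clk = 1.0 if math.sin(2 * math.pi * F_CLK * t) >= 0 else -1.0
        out.append((a + b * clk) * x)
    return out


def amplitude(signal, freq):
    """Single-bin DFT amplitude at `freq` (must be a multiple of 10 GHz)."""
    k = round(freq * T)
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / N)
              for n, v in enumerate(signal))
    return 2 * abs(acc) / N


for a, b in ((0.5, 0.5), (0.0, 0.5), (0.5, 0.0)):
    amps = [round(amplitude(rz_sine(a, b), f), 3) for f in (10e9, 40e9, 60e9)]
    print(f"a={a}, b={b}: 10/40/60 GHz amplitudes = {amps}")
```

In this model the 10 GHz amplitude equals a, while the 40 GHz and 60 GHz amplitudes scale with b. Because the clock is an ideal square wave, the model also contains weaker products around the odd clock harmonics (for example 140 GHz and 160 GHz), which a band-limited circuit would attenuate.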

The bandwidth requirements of the RZ-generators are high, since they need to support the mixing products of the input and the clock present in the RZ-signal. Therefore, shunt-series peaking was introduced with custom-designed inductors. The series inductor $L_s = 30\,\text{pH}$ is also conveniently used for bridging a large distance in the layout. In stage 1 and stage 2, $R_{\rm L}$, $R_e$, $I_{\rm tail}$ and $I_{\rm f}$ are equal to $45\,\Omega$, $64\,\Omega$, 4 mA and 2 mA, respectively. For convenience, stage 1 and stage 2 are almost identical except for the shunt peaking inductance $L_{\rm L}$, which is 30 pH in stage 1 and 70 pH in stage 2. Although stage 1 theoretically requires a lower bandwidth than stage 2, we chose to change only the shunt peaking inductance. A higher bandwidth in stage 1 also helps to obtain steeper rise and fall times of the RZ-signals, reducing the inter-symbol interference in the RZ-summer. Small-signal simulation results obtained by periodic AC simulation of the RZ-generator for stage 1 and stage 2 are shown in Fig. 10, resulting in a bandwidth of 65 GHz for stage 1 and 75 GHz for stage 2.



Fig. 10. Small signal AC simulation of RZ-generator for stage 1 and stage 2.

The clock buffer driving the mixer is displayed in Fig. 11. A limiting amplifier with a variable output swing of up to $450\,\mathrm{mV_{ppd}}$ for stage 1 and $540\,\mathrm{mV_{ppd}}$ for stage 2 is followed by emitter followers in which the cascode transistors Q5 and Q6 are cross-coupled to increase the bandwidth. The positive feedback in the cross-coupling creates a negative capacitance with the parasitic capacitance of the current source underneath, effectively reducing the load capacitance. The mixer driver output swing, which is set by the variable current source $I_{\text{tail}}$, determines the conversion gain of the mixer and hence the gain b. The common-mode output voltage of the emitter followers is around 1.5 V, which is too low for the mixer. Furthermore, the common-mode voltage also changes with the configured output swing. A resistive bias-T is used to level-shift the clock signal to $V_{\rm dd} = 2.5$ V. However, a bias-T removes the possibility to fix the mixer in a single direction, which is e.g. required to configure the RZ-generator in through-mode. Therefore, we added the possibility to set the bias voltage to 0 V or 2.5 V with the control signals enableN and enableP. Setting both bias voltages to 0 V could damage transistors in the RZ-generator; therefore, protection in the form of two NAND gates is used. $R_{\rm L}$ and $I_{\rm f}$ are equal to $150\,\Omega$ and 1 mA for stage 1 and to $90\,\Omega$ and 2 mA for stage 2, respectively.

## B. Return-to-Zero Summer

The RZ-signals are summed in the RZ-summer, which consists of two parallel differential pairs (Fig. 12). Their currents are summed in a regulated cascode (RGC) and $R_{\rm L}$. Series inductive peaking was added to increase the bandwidth, and a follower buffers the signal. The current sources ($I_{\rm casc}$) at the inputs of the RGC are required to increase the common-mode voltage at the collectors of the cascode transistors. A common-mode voltage that is too low leads to linearity loss and speed reduction in the cascode transistors. To reduce the offset generated by the RZ-generator and RZ-summer, an offset compensation loop was added. The offset loop conveniently attaches to the DC current sources.

Fig. 11. Schematic of the mixer clock driver.



Fig. 12. Schematic of the RZ-summer.

The summers of stage 1 and stage 2 share the same topology with $L_s = 30\,\text{pH}$, $R_L = 80\,\Omega$, $I_{casc} = 1.5\,\text{mA}$ and $I_{tail} = 5\,\text{mA}$. Stage 1 has a gain of 1 dB and stage 2 a gain of 3 dB; therefore, $R_e$ differs and is equal to $50\,\Omega$ for stage 1 and $37\,\Omega$ for stage 2. A larger current $I_f = 4\,\text{mA}$ in the emitter follower of stage 2, compared to $I_f = 2\,\text{mA}$ for stage 1, is required to drive the large output stage. To save power, the RGC can be disabled in stage 1.

A detailed schematic of the RGC is shown in Fig. 13. The RGC loop strives to decrease the input impedance of the cascode transistors by feedback, thus increasing the bandwidth at the current combination of the two parallel differential pairs. The effect of the RGC is displayed in Fig. 14 for stage 1 and stage 2. The bandwidth for the combiner in stage 1 goes from 58 GHz to 91 GHz and for stage 2 from 52 GHz to 84 GHz.



Fig. 13. Schematic of the RGC current summer in the RZ-combiner.



Fig. 14. Small signal AC simulation of RZ-summer with the RGC enabled and disabled for stage 1 and stage 2.

## C. Clocking

A half-rate clock is applied to the chip through a $100\,\Omega$ input buffer, from which two quarter-rate clocks in quadrature need to be generated. This is realized with a current-mode logic (CML) toggle flip-flop based divider consisting of two CML latches (Fig. 15b), where the in-phase clock is taken after the first latch of the flip-flop and the quadrature clock from the output of the second latch. These in-phase and quadrature clocks are then fed to the $360^{\circ}$ phase interpolator depicted in Fig. 15a. It consists of sign selectors, which choose the quadrant of the phase shift, and a normal phase interpolator, which provides the actual phase shift in the chosen quadrant by making a linear combination of the in-phase and quadrature inputs. The interleaver is highly sensitive to clock-to-data phase mismatch; therefore, the phase interpolator is used to fine-tune the phase of the two quarter-rate clocks with respect to the half-rate input clock.
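The principle of combining sign selection with a weighted sum of the in-phase and quadrature clocks can be sketched numerically. The cosine and sine weights used here are an idealization chosen for illustration; the weighting law of the implemented interpolator is not specified in the text, and the function names are hypothetical.

```python
import math


def interpolator_settings(target_deg):
    """Sign selection and weights for an ideal 360-degree phase interpolator.

    Returns (sign_i, sign_q, w_i, w_q) with non-negative weights such that
    sign_i*w_i*cos(wt) + sign_q*w_q*sin(wt) equals cos(wt - target).
    """
    theta = math.radians(target_deg % 360.0)
    i_part, q_part = math.cos(theta), math.sin(theta)
    sign_i = 1 if i_part >= 0 else -1        # quadrant selection
    sign_q = 1 if q_part >= 0 else -1
    return sign_i, sign_q, abs(i_part), abs(q_part)


def output_phase_deg(sign_i, sign_q, w_i, w_q):
    """Phase of the weighted sum of the in-phase and quadrature clocks."""
    return math.degrees(math.atan2(sign_q * w_q, sign_i * w_i)) % 360.0


for target in (30, 135, 200, 315):
    settings = interpolator_settings(target)
    print(target, settings, round(output_phase_deg(*settings), 3))
```

The two signs select one of four quadrants, and the non-negative weights then set the phase within that quadrant, mirroring the split between sign selectors and the normal phase interpolator.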

The clocking circuits and clock tree are implemented using broadband circuits, and hence the interleaver can work over a very broad range of sampling rates, limited by the clocking. On the upper side, the sampling rate is limited by the bandwidth of the clock tree, which is designed around 50 GHz. On the lower side, the sampling rate is limited by the high-pass filtering of



(a)  $360^{\circ}$  phase interpolator



(b) Quadrature clock divider

Fig. 15. Clocking circuit diagrams.

# IV. MEASUREMENTS

A die photograph of the chip can be seen in Fig. 16. It measures $1870\,\mu\text{m}$ by $1120\,\mu\text{m}$. The chip uses a single $V_{dd} = 2.5\,\text{V}$ supply voltage and is programmable with an on-chip Serial Peripheral Interface (SPI) controller. The chip is realized in a 55 nm BiCMOS technology and consumes 700 mW in 4-to-1 mode, for which the power breakdown is given in Fig. 17. Of this total, 45 % is dedicated to the interleaver core consisting of the 3 sub-interleavers and 30 % is used for the clocking, including the divider, phase interpolators and the clock trees for the half-rate and quarter-rate clocks. The remaining 25 % is dedicated to the 4 input buffers, output driver and biasing.
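The percentages of the power breakdown translate into absolute numbers as in the short calculation below. The energy-per-bit figure is a derived quantity that is not stated in the text; it assumes the 100 GBd PAM-4 operating point with 2 bits per symbol. The category labels are shorthand for the groups named above.

```python
TOTAL_MW = 700                      # total consumption in 4-to-1 mode
SHARES_PERCENT = {
    "interleaver core": 45,
    "clocking": 30,
    "I/O buffers and biasing": 25,
}

breakdown_mw = {name: TOTAL_MW * pct / 100 for name, pct in SHARES_PERCENT.items()}
for name, power in breakdown_mw.items():
    print(f"{name}: {power:.0f} mW")

# Derived figure (not stated in the text): energy per bit for 100 GBd PAM-4.
bit_rate = 100e9 * 2                # 2 bits per PAM-4 symbol
energy_pj_per_bit = TOTAL_MW * 1e-3 / bit_rate * 1e12
print(f"{energy_pj_per_bit:.2f} pJ/bit")
```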

For the experiments in this work, the chip is wirebonded to a custom designed 6 layer Megtron6 RF printed circuit board (PCB). To reduce the wirebond inductance, the chip is placed in a cavity.

A 4-channel 92 GS/s arbitrary waveform generator (AWG) is used to generate the data for the 4 inputs. The output of the interleaver is sampled either by a sampling oscilloscope or by a 160 GS/s real-time oscilloscope. The 50 GHz master clock is generated by a benchtop clock source, and a mechanical delay line is used to align the phase of the clock with the data. An on-chip phase-locked loop, locked to a reference, would allow the benchtop clock source to be removed, but this was outside the scope of this work. The trigger for the sampling oscilloscope is provided by a clock divider, which also provides a low-frequency trigger to the AWG to keep everything in sync with the clock of the interleaver. Fig. 18 shows the described measurement setup.

Fig. 16. Die photograph.

Fig. 17. Power breakdown of the interleaver in 4-to-1 mode.

## A. Interleaving Performance

As a first experiment, the chip is placed in 2-to-1 mode: a sine wave of 3.125 GHz is applied to one input while the other is held constant. The sine wave is hence interleaved with a constant zero signal, and an RZ-signal is created after interleaving, as shown in Fig. 19. The frequency of the input sine wave is the 50 GHz clock frequency divided by 16, which enables the use of a sampling oscilloscope.

Fig. 20 shows similar measurement results, but for varying equalizer settings analogous to the simulations in Fig. 8 and Fig. 9. The RZ sine waves show overshoot for a < b and undershoot for a > b.

Next, all signal inputs were applied, which resulted in a 3 GHz sine wave after interleaving. The ENOB of the resulting $400\,\mathrm{mV_{ppd}}$ 3 GHz sine, measured by capturing the full waveform, is 4.9 bits. The spectrum of the captured 3 GHz signal is shown in Fig. 21. We can see the 3rd and 5th harmonics as well as the 9th harmonic, which arises as the 3rd harmonic of the 9 GHz component in stage 2 and the output stage. The other harmonics are mainly due to the AWG. The ENOB as a function of output swing for a 3 GHz sine wave is displayed in Fig. 22. Up to 300-400 $\mathrm{mV_{ppd}}$, the ENOB is dominated by noise, and starting from 450-500 $\mathrm{mV_{ppd}}$, the linearity dominates.

Fig. 18. The measurement setup.

Fig. 19. 3.125 GHz sine wave interleaved with DC at 100 GS/s to create an RZ sine wave, showing the speed and performance of the interleaver.

Fig. 23 depicts the ENOB as a function of frequency. At 40 GHz, the maximum frequency we could measure with our setup, the ENOB is still 4.2 bits. The ENOB shows a downward trend at higher frequencies, which is mostly due to jitter in the sampling clock.
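For reference, the relation between ENOB and signal-to-noise-and-distortion ratio (SINAD) can be written as a few helper functions. This is a sketch using the standard textbook conversions; the exact post-processing used for the measurements is not described in the text, and the jitter expression gives only the bound that would apply if clock jitter were the sole impairment. All function names are hypothetical.

```python
import math


def enob_from_sinad(sinad_db):
    """Standard conversion: ENOB = (SINAD - 1.76 dB) / 6.02 dB."""
    return (sinad_db - 1.76) / 6.02


def sinad_from_enob(enob):
    """Inverse of enob_from_sinad."""
    return 6.02 * enob + 1.76


def sinad_db(captured, ideal):
    """SINAD of a captured sine against its best-fit ideal sine (same length)."""
    p_signal = sum(v * v for v in ideal)
    p_error = sum((c - v) ** 2 for c, v in zip(captured, ideal))
    return 10 * math.log10(p_signal / p_error)


def jitter_limited_sinad_db(freq_hz, rms_jitter_s):
    """SNR bound set by clock jitter alone for a full-scale sine."""
    return -20 * math.log10(2 * math.pi * freq_hz * rms_jitter_s)


print(round(sinad_from_enob(4.9), 2))   # SINAD equivalent to the 3 GHz result
print(round(sinad_from_enob(4.2), 2))   # SINAD equivalent to the 40 GHz result
```

Because the jitter-limited SINAD falls by 20 dB per decade of signal frequency, a jitter-dominated converter loses ENOB steadily as the output frequency increases, which is consistent with the downward trend described above.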

## B. Equalizer

In order to verify the built-in equalizer, we generated a 100 Gb/s NRZ pseudo-random bit sequence stream from four 25 Gb/s inputs. This stream was captured by a sampling oscilloscope in pattern-lock mode with 16 samples per symbol. Based on the captured interleaved data and the transmitted streams, we estimated the impulse response of the interleaver and the output cables to the oscilloscope. We performed this measurement for several settings of the built-in equalizer. Fig. 24 shows the measured filter transfer functions for several settings of stage 1, normalized to the gain at 25 GHz. Here, the equalizer settings of stage 2 were kept constant. Ideal transfer functions of the form $(a_1 + b_1) + (a_1 - b_1)z^{-2}$ fitted to the measurements are displayed in the same figure. The measurements show good agreement with the ideal transfer functions.

Fig. 20. 3.125 GHz interleaved with DC at 100 GS/s for different equalizer settings, creating undershoot if a > b and overshoot if a < b.

Fig. 21. Spectrum of a 3 GHz ( $f_0$ ) sine at $400\,\mathrm{mV_{ppd}}$ output swing.

A similar experiment was conducted where the equalizer settings of stage 2 were varied. Fig. 25 shows the obtained transfer functions as well as the fitted ideal transfer function of the form  $(a_2 + b_2) + (a_2 - b_2)z^{-1}$ , normalized to the gain at 50 GHz. The measurements show good agreement with the ideal transfer functions.
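The ideal transfer functions used for the fits can be evaluated with a short sketch. It assumes a full-rate sampling frequency of 100 GS/s and uses hypothetical gain values a and b; the function name and its parameters are chosen for illustration.

```python
import cmath
import math

FS = 100e9   # full-rate sampling frequency in 4-to-1 mode (Hz)


def ffe_gain_db(freq_hz, a, b, delay, f_norm_hz):
    """Gain of (a + b) + (a - b) z^-delay at freq_hz, relative to f_norm_hz."""
    def mag(f):
        z_inv = cmath.exp(-2j * math.pi * f / FS)
        return abs((a + b) + (a - b) * z_inv ** delay)
    return 20 * math.log10(mag(freq_hz) / mag(f_norm_hz))


# Hypothetical settings: a < b gives emphasis (gain rises towards f_norm).
a, b = 0.4, 0.6
for f in (0.0, 12.5e9, 25e9):
    stage1 = ffe_gain_db(f, a, b, delay=2, f_norm_hz=25e9)
    print(f"stage 1, {f / 1e9:5.1f} GHz: {stage1:6.2f} dB")
for f in (0.0, 25e9, 50e9):
    stage2 = ffe_gain_db(f, a, b, delay=1, f_norm_hz=50e9)
    print(f"stage 2, {f / 1e9:5.1f} GHz: {stage2:6.2f} dB")
```

At the normalization frequencies the delayed tap is exactly inverted ($z^{-2} = -1$ at 25 GHz and $z^{-1} = -1$ at 50 GHz), so the gain there is 2b while the DC gain is 2a. The normalized DC gain therefore equals the ratio a/b of the setting.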

## C. Data Generation

Fig. 26 displays the measured $250\,\mathrm{mV_{ppd}}$ 100 GBd PAM-4 (200 Gb/s) zero-padded PRBS $2^{15}-1$ in 4-to-1 mode, generated from four 25 GBd sequences. The 4 applied data streams are constructed such that a zero-padded PRBS $2^{15}-1$ is observed at the output of the interleaver. Only the internal equalizer was used to overcome the short 5 cm cable and printed circuit board traces, with approximately 6 dB loss at 50 GHz, to the remote sampling heads of the sampling oscilloscope. The measured bit-error rate is below $10^{-4}$. The corresponding eye diagram with the internal equalizer turned off is shown in Fig. 27 and has a peak-to-peak amplitude of around $500\,\mathrm{mV_{ppd}}$. The ratio level mismatch (RLM) of the output signal equalized to overcome the 6 dB loss goes from 0.985 for an output amplitude of $100\,\mathrm{mV_{ppd}}$ ($200\,\mathrm{mV_{ppd}}$ unequalized) to 0.9 for an output amplitude of $320\,\mathrm{mV_{ppd}}$ ($640\,\mathrm{mV_{ppd}}$ unequalized).
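As a reminder of how a level mismatch figure can be obtained from the four measured PAM-4 levels, the sketch below implements the level separation mismatch ratio as commonly defined for PAM-4 in IEEE 802.3. That this definition matches the one used for the values reported here is an assumption, and the level values in the example are hypothetical.

```python
def rlm(levels):
    """Level separation mismatch ratio of four PAM-4 levels.

    A value of 1.0 means perfectly equidistant levels.
    """
    v0, v1, v2, v3 = sorted(levels)
    v_mid = (v0 + v3) / 2
    es1 = (v1 - v_mid) / (v0 - v_mid)
    es2 = (v2 - v_mid) / (v3 - v_mid)
    return min(3 * es1, 3 * es2, 2 - 3 * es1, 2 - 3 * es2)


print(rlm([-3.0, -1.0, 1.0, 3.0]))            # equidistant levels
print(round(rlm([-3.0, -0.9, 0.9, 3.0]), 3))  # compressed inner eye
```

Compression of the outer levels, as expected when the output stage approaches its linear range, moves the inner levels relatively further from the mid level and lowers the ratio in the same way.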

Fig. 28 shows the output eye diagram of the interleaver in 4-to-1 mode where one input is a PAM-4 stream and the other 3 are zero. This creates an RZ-signal in which one PAM-4 symbol is followed by 3 zero symbols (30 ps). Here, the inter-symbol interference (ISI) can be observed. In order to reduce the ISI when all inputs are turned on, the "zero" line in Fig. 28 in between the PAM-4 symbols needs to be as flat as possible.

Fig. 22. Measured ENOB as function of output swing at 3 GHz.

Fig. 23. ENOB as function of frequency for $400\,\mathrm{mV_{ppd}}$ output swing.

The electrical bandwidth of the interleaver in 4-to-1 mode is estimated similarly to the filter measurements performed in Section IV-B. The resulting transfer function is displayed in Fig. 29, which shows a 3 dB bandwidth of 73 GHz.

## D. Comparison with State-of-the-Art

The state-of-the-art comparison is depicted in Table I. Compared to the silicon implementations, we have implemented a 4-to-1 interleaver operation with a much lower power consumption, while having a comparable sampling rate. The implementations in InP [15], [21] show superior speed thanks to much faster technologies, but those technologies are known to be more expensive, have lower yield and have less integration possibilities. [19] uses the Type 2 DSP bandwidth doubling technique described in the same paper to generate 168 GBd at 84 GS/s, using a quarter rate 42 GHz clock. This technique is not limited to the AMUX implementation and can also be applied to this work to further extend the symbol rate. Note that in this work, the bandwidth of the interleaver is measured in operation by capturing the interleaved output signal. In [14], [15], [19], [21], the bandwidth is measured

| Reference                     | [3]                 | [13]               | [14]                     | [16]                    | [19]              | [21]              | This work            |
|-------------------------------|---------------------|--------------------|--------------------------|-------------------------|-------------------|-------------------|----------------------|
| Technology                    | 28 nm<br>CMOS       | Bipolar<br>B7HF200 | 130 nm<br>SiGe<br>BiCMOS | 55 nm<br>SiGe<br>BiCMOS | 250 nm<br>InP HBT | 500 nm<br>InP HBT | 55 nm<br>SiGe BiCMOS |
| $f_{\rm T}/f_{\rm max}$ (GHz) | -/-                 | 200/275            | 300/500                  | 320/370                 | 460/480           | 290/320           | 320/370              |
| Architecture                  | Passive<br>combiner | Active<br>combiner | AMUX                     | AMUX                    | AMUX              | AMUX              | RZ summing           |
| Interleaving<br>factor        | 2                   | 2                  | 2                        | 2                       | 2                 | 2                 | 4 / 2                |
| Sampling<br>rate (GS/s)       | 100                 | 200                | 56                       | 120                     | 180 / 84          | 128               | 100                  |
| Bandwidth (GHz)               | 13                  | 44                 | 67                       | -                       | 110               | 63                | 73                   |
| Baud rate<br>(GBd)            | 50                  | 64                 | 56                       | 120                     | - / 168**         | -                 | 100                  |
| Power (W)                     | 2.5*                | 1.8                | 1.06                     | 2.2                     | 0.99              | 0.54              | 0.7 / 0.53           |
| ENOB                          | 5.3                 | >4.5               | -                        | 7.7                     | -                 | -                 | 4.9                  |

TABLE I. State-of-the-art comparison.

\*: DACs + passive transmission line as passive combiner.

\*\*: Generated by using DSP and Type 2 bandwidth doubling [19] using a quarter rate clock (42 GHz).
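
As a rough way to read the power and sampling-rate rows of Table I together, the sketch below divides power by sampling rate to get an energy per output sample. This metric is not used in the paper; it is an illustrative assumption, and the helper name `energy_per_sample_pj` is hypothetical. The values are copied from Table I for the silicon entries only, and the figure for [3] includes the DACs (see the \* footnote), so the comparison is indicative at best.

```python
# Power (W) and sampling rate (GS/s) of the silicon entries, copied from Table I.
# Caveat: the [3] power figure includes the DACs themselves (see the * footnote).
TABLE_I = {
    "[3]": (2.5, 100.0),
    "[13]": (1.8, 200.0),
    "[14]": (1.06, 56.0),
    "[16]": (2.2, 120.0),
    "This work (4-to-1)": (0.7, 100.0),
}


def energy_per_sample_pj(power_w: float, rate_gsps: float) -> float:
    """Energy per output sample in pJ.

    W / (GS/s) yields nJ per sample, so multiply by 1e3 to get pJ.
    """
    return power_w / rate_gsps * 1e3


if __name__ == "__main__":
    for ref, (power_w, rate_gsps) in TABLE_I.items():
        print(f"{ref:>20}: {energy_per_sample_pj(power_w, rate_gsps):.1f} pJ/sample")
```

For this work, 0.7 W at 100 GS/s corresponds to 7 pJ per output sample.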



Fig. 24. Measured filter transfer functions in 4-to-1 mode for different settings of the FFE of stage 1 while keeping the FFE parameters of stage 2 constant.



Fig. 25. Measured filter transfer functions in 4-to-1 mode for different settings of the FFE of stage 2 while keeping the FFE parameters of stage 1 constant.

using a vector network analyzer with no clock present and the AMUX fixed in one position. Measuring the bandwidth in operation gives a more realistic bandwidth measurement that also includes the degradation due to the limited switching speed of the transistors.

Fig. 26. Measured 100 GS/s PAM-4 generated from four 25 GBd PAM-4 inputs (internal FFE on to compensate the 6 dB loss at 50 GHz due to the PCB traces and cables).

Fig. 27. Measured 100 GS/s PAM-4 generated from four 25 GBd PAM-4 inputs (internal FFE off).



Fig. 28. Measured 100 GS/s 4-to-1 interleaved output where one input is a PAM-4 stream and the other three are zero. The output is a return-to-zero signal in which each PAM-4 symbol is followed by three zero symbols (30 ps).
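
The pattern in Fig. 28 can be reproduced with an idealized, sample-level model of return-to-zero generation and summing. The sketch below is only an illustration under stated assumptions: it ignores bandwidth, the two-stage structure and the FFE, the function name `rz_sum_interleave` is hypothetical, the slot order (input k occupies slot k) is assumed, and the PAM-4 levels are normalized to ±1 and ±3.

```python
import numpy as np


def rz_sum_interleave(streams):
    """Idealized N-to-1 time interleaving by RZ generation and summing.

    Each input is turned into a return-to-zero signal that is non-zero in
    only one of the N output slots per input symbol; the N RZ signals are
    then summed. Assumption: input k occupies slot k.
    """
    streams = np.asarray(streams, dtype=float)  # shape (N, n_symbols)
    n_in, n_sym = streams.shape
    out = np.zeros(n_in * n_sym)
    for k in range(n_in):
        rz = np.zeros(n_in * n_sym)
        rz[k::n_in] = streams[k]  # active in slot k, zero in the other slots
        out += rz
    return out


# One PAM-4 input and three zero inputs, as in Fig. 28 (normalized levels).
PAM4_LEVELS = np.array([-3.0, -1.0, 1.0, 3.0])
rng = np.random.default_rng(0)
data = rng.choice(PAM4_LEVELS, size=8)
streams = np.zeros((4, 8))
streams[0] = data

y = rz_sum_interleave(streams)

slot_ps = 1e3 / 100.0          # one output slot at 100 GS/s, in ps
zero_gap_ps = 3 * slot_ps      # three zero slots between PAM-4 symbols
```

With four 25 GBd inputs the output runs at 100 GS/s, so each slot lasts 10 ps and the three zero slots span the 30 ps mentioned in the caption.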



Fig. 29. Measured transfer function of the interleaver in 4-to-1 mode. A 3 dB bandwidth of 73 GHz is obtained.

# V. CONCLUSION

We demonstrated a 100 GS/s 4-to-1 time-interleaver based on a return-to-zero summing architecture. An ENOB of 4.9 bits was measured at an output swing of 400 mV<sub>ppd</sub>. The electrical bandwidth of the interleaver is 73 GHz. Data generation up to 100 GS/s was achieved at a power consumption of 700 mW for 4-to-1 interleaving.

#### REFERENCES

- [1] Ethernet Alliance, "The 2020 Ethernet Roadmap," https://ethernetalliance.org/technology/2020-roadmap/, March 2020.
- [2] M. Chagnon, "Optical communications for short reach," *Journal of Lightwave Technology*, vol. 37, no. 8, pp. 1779–1797, 2019.
- [3] H. Huang *et al.*, "An 8-bit 100-GS/s distributed DAC in 28-nm CMOS for optical communications," *IEEE Transactions on Microwave Theory and Techniques*, vol. 63, no. 4, pp. 1211–1218, April 2015.
- [4] J. Cao *et al.*, "29.2 A transmitter and receiver for 100Gb/s coherent networks with integrated 4x64GS/s 8b ADCs and DACs in 20nm CMOS," in *2017 IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2017, pp. 484–485.
- [5] C. Schmidt *et al.*, "Digital-to-analog converters using frequency interleaving: Mathematical framework and experimental verification," *Circuits, Systems, and Signal Processing*, vol. 37, no. 11, pp. 4929–4954, Nov 2018.
- [6] X. Chen *et al.*, "All-electronic 100-GHz bandwidth digital-to-analog converter generating PAM signals up to 190 GBaud," *Journal of Lightwave Technology*, vol. 35, no. 3, pp. 411–417, Feb 2017.
- [7] C. Laperle and M. O'Sullivan, "Advances in high-speed DACs, ADCs, and DSP for optical coherent transceivers," *Journal of Lightwave Technology*, vol. 32, no. 4, pp. 629–643, Feb 2014.
- [8] C. Kottke *et al.*, "Performance of bandwidth extension techniques for high-speed short-range IM/DD links," *Journal of Lightwave Technology*, vol. 37, no. 2, pp. 665–672, Jan 2019.
- [9] E. Olieman, A. Annema, and B. Nauta, "An interleaved full Nyquist high-speed DAC technique," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 3, pp. 704–713, March 2015.
- [10] J. Deveugele, P. Palmers, and M. S. J. Steyaert, "Parallel-path digital-to-analog converters for Nyquist signal generation," *IEEE Journal of Solid-State Circuits*, vol. 39, no. 7, pp. 1073–1082, July 2004.
- [11] P. T. M. van Zeijl and M. Collados, "On the attenuation of DAC aliases through multiphase clocking," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 56, no. 3, pp. 190–194, March 2009.
- [12] D. Ferenci, M. Grozing, and M. Berroth, "A 25 GHz analog multiplexer for a 50GS/s D/A-conversion system in InP DHBT technology," in *2011 IEEE Compound Semiconductor Integrated Circuit Symposium (CSICS)*, Oct 2011, pp. 1–4.
- [13] H. Hettrich *et al.*, "A linear active combiner enabling an interleaved 200 GS/s DAC with 44 GHz analog bandwidth," in *2017 IEEE Bipolar/BiCMOS Circuits and Technology Meeting (BCTM)*, Oct 2017, pp. 142–145.
- [14] T. Tannert *et al.*, "A SiGe-HBT 2:1 analog multiplexer with more than 67 GHz bandwidth," in *2017 IEEE Bipolar/BiCMOS Circuits and Technology Meeting (BCTM)*, Oct 2017, pp. 146–149.
- [15] M. Nagatani *et al.*, "An over-110-GHz-bandwidth 2:1 analog multiplexer in 0.25-μm InP DHBT technology," in *2018 IEEE/MTT-S International Microwave Symposium - IMS*, June 2018, pp. 655–658.
- [16] M. Collisi and M. Möller, "A 120 GS/s 2:1 analog multiplexer with high linearity in SiGe-BiCMOS technology," in *2020 IEEE BiCMOS and Compound Semiconductor Integrated Circuits and Technology Symposium (BCICTS)*, Nov 2020, pp. 214–215.
- [17] H. Yamazaki *et al.*, "Discrete multitone transmission at net data rate of 250 Gb/s using digital-preprocessed analog-multiplexed DAC with halved clock frequency and suppressed image," *Journal of Lightwave Technology*, vol. 35, no. 7, pp. 1300–1306, 2017.
- [18] H. Yamazaki *et al.*, "IMDD transmission at net data rate of 333 Gb/s using over-100-GHz-bandwidth analog multiplexer and Mach-Zehnder modulator," *Journal of Lightwave Technology*, vol. 37, no. 8, pp. 1772–1778, 2019.
- [19] M. Nagatani *et al.*, "A beyond-1-Tb/s coherent optical transmitter frontend based on 110-GHz-bandwidth 2:1 analog multiplexer in 250-nm InP DHBT," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 9, pp. 2301–2315, 2020.
- [20] H. Ramon *et al.*, "12.4 A 700 mW 4-to-1 SiGe BiCMOS 100 GS/s analog time-interleaver," in *2020 IEEE International Solid-State Circuits Conference (ISSCC)*, Feb 2020, pp. 214–215.
- [21] M. Nagatani *et al.*, "A 128-GS/s 63-GHz-bandwidth InP-HBT-based analog-MUX module for ultra-broadband D/A conversion subsystem," in *2017 IEEE MTT-S International Microwave Symposium (IMS)*, June 2017, pp. 134–136.