In 1955 Gilchrist et al. proposed a speed-independent RCA with a carry completion signal. In the 1960s that circuit was carefully analyzed and improved [19-21]. In 1980 Seitz used the RCA to illustrate his concept of an equipotential region and his approach to self-timed system design.
Here we use the RCA as an example CL to illustrate our approach to SIM design.
As shown in Section 4.2, the turn-on and turn-off delays of the OVD circuit are proportional to the equivalent capacitance $C_{eq}$ associated with the OVD circuit input. $C_{eq}$ depends linearly on the number of gates $N$ in the CMOS CL, so speeding up a SIM requires reducing $N$. This can be achieved by structurally decomposing the CMOS CL into subcircuits CL1, CL2, etc. Each subcircuit CLi is connected either to its own detecting circuit OVDi or, if its transitions do not affect the transition duration of the CL as a whole, directly to the power supply. Each detecting circuit OVDi generates its own OV signal, and these signals are combined by a multi-input OR (NOR) element whose output serves as the OV signal of the CMOS CL.
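The effect of decomposition on detector delay can be illustrated with a minimal sketch. It assumes only what is stated above (delay proportional to the equivalent capacitance, which is linear in the gate count). The proportionality constant `DELAY_PER_GATE`, the gate counts, and the function names are hypothetical, and the delay of the combining OR (NOR) element is ignored.

```python
def ovd_delay(n_gates, delay_per_gate):
    """OVD turn-on/turn-off delay, modeled as proportional to the gate count N."""
    return delay_per_gate * n_gates


def decomposed_delay(monitored, delay_per_gate):
    """Slowest detector when each monitored subcircuit CLi has its own OVDi.

    `monitored` lists the gate counts of the subcircuits that keep a detector;
    subcircuits wired directly to the power supply are simply left out.
    """
    return max(ovd_delay(n, delay_per_gate) for n in monitored)


# Hypothetical CL with 120 gates: 60 gates never limit the transition duration,
# the remaining 60 are split evenly between two detectors.
DELAY_PER_GATE = 0.05  # ns per gate, illustrative value only
single = ovd_delay(120, DELAY_PER_GATE)
split = decomposed_delay([30, 30], DELAY_PER_GATE)
print(f"single OVD: {single:.2f} ns, decomposed: {split:.2f} ns")
```

In this toy model the gain comes from two sources: gates removed from detection altogether, and the remaining gates being shared among several detectors.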
The computation time of a multi-bit RCA is determined by the length of the longest activated carry chain. Many papers have analyzed carry generation and carry propagation in the RCA [19-21], and many of them propose their own methods for estimating or calculating the average length of the longest activated carry chain. We do not intend to add another one.
Let us look inside the RCA. As mentioned above, the RCA consists of one-bit full adders, and each full adder consists of two parts: one forming the sum $s_i$ and one forming the carry $c_{i+1}$ (Fig. 16).
In a multi-bit RCA the sum-forming parts do not interact with each other and do not affect the transition duration of the RCA. Each carry-forming part receives the signal $c_i$ from the preceding carry-forming part and sends the signal $c_{i+1}$ to the following one.
To decompose the RCA we use three heuristics:
(i) We connect all sum-forming parts directly to the power supply.
(ii) We divide each carry-forming part into three subcircuits, numbered 1, 2 and 3 in Fig. 16. We connect all subcircuits 1 directly to the power supply because they do not have the input $c_i$ and therefore do not lie on the carry propagation path.
(iii) We connect all subcircuits 2 to OVD1 and all subcircuits 3 to OVD2. The outputs of OVD1 and OVD2 feed a two-input NOR gate that forms the RCA OV signal in positive logic (Fig. 17).
The input currents $I_1$ and $I_2$ of OVD1 and OVD2 for a 6-bit RCA and the longest transition duration are shown in Fig. 18.
Taking $V_{th1} = V_{th2} = 400$ mV, we calculated the OVD circuit parameters: $R_{11} = 5$ kΩ, $I_{th1} = 0.08$ mA, $R_{12} = 3$ kΩ, $I_{th2} = 0.13$ mA. The dependence of the OVD1 and OVD2 delays on the number of bits in the RCA is shown in Fig. 19.
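A quick consistency check of these values, together with the NOR combination described in (iii), can be sketched as follows. Two things here are assumptions made for illustration and are not taken from the text: the relation Vth = R * Ith, and the rule that a detector is active while its input current exceeds its threshold current. Both are consistent with the numbers quoted above. The function names are hypothetical.

```python
# Values quoted in the text, converted to SI units.
R11, ITH1 = 5e3, 0.08e-3   # ohm, ampere
R12, ITH2 = 3e3, 0.13e-3   # ohm, ampere
VTH = 0.4                  # volt


def threshold_voltage(r, i_th):
    """Voltage across resistance r at the threshold current (assumed V = R * I)."""
    return r * i_th


def ov_signal(i1, i2):
    """OV in positive logic: NOR of the two detector outputs.

    Assumption: OVDk is active while its input current exceeds Ithk.
    """
    ovd1_active = i1 > ITH1
    ovd2_active = i2 > ITH2
    return not (ovd1_active or ovd2_active)


print(threshold_voltage(R11, ITH1), threshold_voltage(R12, ITH2))
print(ov_signal(0.0, 0.0), ov_signal(0.2e-3, 0.0))
```

Under these assumptions both products land at or just below the 400 mV threshold, and OV is asserted only when neither detector sees current above its threshold.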
4.5 Comparison of SIMs with synchronous counterparts
The transition duration in a CL is a random variable. The probability of a transition with duration $D$ is determined by the implemented Boolean function and the distribution of input combinations. The possible values of $D$ lie in the interval $[0; D_{max}]$, where $D_{max}$ is the length of the critical path in the CL.
Let $D_{me} = \sum_i p_i D_i$ be the mathematical expectation of the transition duration in the CL, where $D_i$ is the length of the i-th SPP in the CL and $p_i$ is the probability that the i-th path is the longest activated SPP.
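As a small numerical illustration of this expectation, the sketch below computes the probability-weighted sum of path lengths for a made-up CL. The path lengths, probabilities, and the function name `expected_duration` are hypothetical.

```python
def expected_duration(paths):
    """Return Dme = sum(p_i * D_i) for (D_i, p_i) pairs whose probabilities sum to 1."""
    total_p = sum(p for _, p in paths)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("path probabilities must sum to 1")
    return sum(d * p for d, p in paths)


# Hypothetical CL with three SPPs: (length in ns, probability of being the
# longest activated path). The critical path (4 ns) is activated in only
# 20% of transitions.
paths = [(1.0, 0.5), (2.0, 0.3), (4.0, 0.2)]
d_me = expected_duration(paths)
d_max = max(d for d, _ in paths)
print(f"Dme = {d_me:.2f} ns, Dmax = {d_max:.2f} ns")
```

The gap between the expected and the maximal duration is what a speed-independent module can exploit.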
When the CL works in synchronous mode, the cycle duration $T_s$ is chosen according to the maximal transition duration $D_{max}$. A margin must be added to $D_{max}$ to ensure reliable operation under CL parameter variations: $T_s = kD_{max}$, where $k$ is a margin coefficient.
In a SIM the cycle duration is a random variable with expectation $T_{si} = gD_{me} + t_{off} + t_{if}$, where $g$ is a coefficient accounting for the increase in CL delay caused by the reduced power supply voltage, $t_{off}$ is the turn-off delay of the OVD circuit, and $t_{if}$ is the delay of the interface circuitry.
We define the efficiency $E$ of the speed-independent mode of CL operation as the relative increase in SIM performance over its synchronous counterpart. Taking performance as the reciprocal of the cycle duration, $E = (T_s - T_{si})/T_{si}$.
In general, the speed-independent mode is more efficient than the synchronous one if $T_s > T_{si}$ or, in other words, $kD_{max} > gD_{me} + t_{off} + t_{if}$.
In the case of the RCA, $D_{max} = n t_c$ and hence $T_s = k n t_c$, where $t_c$ is the delay of the carry-forming part and $n$ is the number of full adders in the RCA.
It has been shown that in an n-bit RCA $D_{me} \approx t_c \log_2(5n/4)$. Then, for speed-independent operation, $T_{si} = g t_c \log_2(5n/4) + t_{off} + t_{if}$.
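The logarithmic growth of the average longest carry chain can be checked with a small Monte Carlo sketch. It assumes uniformly distributed, independent operands and zero carry-in, and it counts a chain as one generating stage plus the propagating stages that follow it. The cited analyses may define chain length slightly differently, so only approximate agreement with log2(5n/4) should be expected. The function names are illustrative.

```python
import math
import random


def longest_carry_chain(a, b, n):
    """Length of the longest carry chain when adding n-bit a and b (carry-in 0).

    A chain starts at a bit where both operands are 1 (generate) and continues
    through following bits where exactly one operand is 1 (propagate).
    """
    longest = current = 0
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if ai & bi:            # generate: a new chain starts here
            current = 1
        elif ai ^ bi:          # propagate: extends a chain only if one is active
            current = current + 1 if current else 0
        else:                  # kill: any active chain ends
            current = 0
        longest = max(longest, current)
    return longest


def mean_longest_chain(n, trials, seed=0):
    """Average longest chain length over random n-bit operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials


for n in (8, 16, 32, 64):
    print(n, round(mean_longest_chain(n, 20000), 2), round(math.log2(5 * n / 4), 2))
```

Each printed row gives the word length, the simulated mean chain length in stages, and the value of log2(5n/4) for comparison.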
We have obtained the dependencies of $T_s$ and $T_{si}$ on the number of bits in the RCA; they are shown in Fig. 20. As can be seen, speed-independent operation of the RCA is more efficient when $n > 8$.
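The shape of this comparison can be reproduced with a short sketch that evaluates both cycle durations and finds the smallest word length at which the speed-independent mode wins. It assumes a critical path of n carry stages (Dmax = n * tc) for the synchronous case and the log2(5n/4) estimate for the expected duration. All parameter values below are hypothetical and are not the ones behind Fig. 20, so the crossover point they give need not match the one reported above.

```python
import math


def t_sync(n, tc, k):
    """Synchronous cycle: Ts = k * Dmax, with Dmax = n * tc assumed for an n-bit RCA."""
    return k * n * tc


def t_si(n, tc, g, t_off, t_if):
    """Expected SIM cycle: Tsi = g * tc * log2(5n/4) + toff + tif."""
    return g * tc * math.log2(5 * n / 4) + t_off + t_if


def crossover(tc, k, g, t_off, t_if, n_max=64):
    """Smallest n in 1..n_max with Ts > Tsi, or None if there is none."""
    for n in range(1, n_max + 1):
        if t_sync(n, tc, k) > t_si(n, tc, g, t_off, t_if):
            return n
    return None


# Illustrative parameters only (times in ns).
PARAMS = dict(tc=1.0, k=1.2, g=1.3, t_off=2.0, t_if=1.0)
print("crossover at n =", crossover(**PARAMS))
for n in (4, 8, 16, 32):
    ts = t_sync(n, PARAMS["tc"], PARAMS["k"])
    tsi = t_si(n, PARAMS["tc"], PARAMS["g"], PARAMS["t_off"], PARAMS["t_if"])
    print(n, round(ts, 2), round(tsi, 2))
```

Because the synchronous cycle grows linearly with n while the expected speed-independent cycle grows logarithmically, the fixed overheads of the OVD and interface circuitry dominate only for short adders.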
Acknowledgements
I would like to thank Igor Shagurin and Vlad Tsylyov of the Moscow Physical Engineering Institute for helpful discussions of this work. I am also grateful to Chris Jesshope of the University of Surrey and Mark Josephs of Oxford University, who kindly provided the latest material on their research in the area of delay-insensitive circuit design.
References
Miller, R.E., Switching Theory (Wiley, New York, 1965), vol. 2, Chapter 10.
Unger, S.H., Asynchronous Sequential Switching Circuits (Wiley, New York, 1969).
Armstrong, D.B., A.D. Friedman, and P.R. Menon, Design of Asynchronous Circuits Assuming Unbounded Gate Delays, IEEE Trans. on Computers C-18 (12) (1969) 1110-1120.
Seitz, C.L., System timing, in: C.A. Mead and L.A. Conway, eds., Introduction to VLSI Systems (Addison-Wesley, New York, 1980), Chapter 7.
Izosimov, O.A., I.I. Shagurin, and V.V. Tsylyov, Physical approach to CMOS module self-timing, Electronics Letters 26 (22) (1990) 1835-1836.
Veendrick, H.J.M., Short-circuit dissipation of static CMOS circuit and its impact on the design of buffer circuits, IEEE J. Solid-State Circuits SC-19 (4) (1984) 468-473.
Chappell, B.A., T.I. Chappell, S.E. Schuster, H.M. Segmuller, J.W. Allan, R.L. Franch, and P.J. Restle, Fast CMOS ECL receivers with 100-mV worst-case sensitivity, IEEE J. Solid-State Circuits SC-23 (1) (1988) 59-67.
Chu, S.T., J. Dikken, C.D. Hartgring, F.J. List, J.G. Raemaekers, S.A. Bell, B. Walsh, and R.H.W. Salters, A 25-ns Low-Power Full-CMOS 1-Mbit (128K×8) SRAM, IEEE J. Solid-State Circuits SC-23 (5) (1988) 1078-1084.
Frank, E.H., and R.F. Sproull, A Self-Timed Static RAM, in: Proc. Third Caltech VLSI Conference (Springer-Verlag, Berlin, 1983) pp.275-285.
Donoghue, W.J., and G.E. Noufer, Circuit for address transition detection, US Patent 4563599, 1986.
Huang, J.S.T., and J.W. Schrankler, Switching characteristics of scaled CMOS circuits at 77K, IEEE Trans. on Electron Devices ED-34 (1) (1987) 101-106.
Gilchrist, B., J.H. Pomerene, and S.Y. Wong, Fast Carry Logic for Digital Computers, IRE Trans. on Electronic Computers EC-4 (4) (1955) 133-136.
Hendrickson, H.C., Fast High-Accuracy Binary Parallel Addition, IRE Trans. on Electronic Computers EC-9 (4) (1960) 465-469.
Majerski, S., and M. Wiweger, NOR-Gate Binary Adder with Carry Completion Detection, IEEE Trans. on Electronic Computers EC-16 (1) (1967) 90-92.
Reitwiesner, G.W., The determination of carry propagation length for binary addition, IRE Trans. on Electronic Computers EC-9 (1) (1960) 35-38.
SPICE2G.6: MOSFET model parameters
No. | Name | Parameter | Units | Value (PMOS, NMOS)
1 | level | model index | - | 3, 3
2 | VTO | ZERO-BIAS THRESHOLD VOLTAGE | V | -1.337, 1.161
3 | KP | TRANSCONDUCTANCE | | 4.6×10^-5
4 | GAMMA | BULK THRESHOLD PARAMETER | | 0.501, 0.354
5 | PHI | SURFACE POTENTIAL | V | 0.695, 0.660
6 | RD | DRAIN OHMIC RESISTANCE | OHM | 33385
7 | RS | SOURCE OHMIC RESISTANCE | OHM | 33385
8 | CBD | ZERO-BIAS B-D JUNCTION | | 6.9×10^-15
9 | CBS | ZERO-BIAS B-S JUNCTION | | 6.9×10^-15
10 | IS | BULK JUNCTION SATURATION | | 9.22×10^-15
11 | PB | BULK JUNCTION POTENTIAL | V | 0.8, 0.8
12 | CGSO | GATE-SOURCE OVERLAP CAPACITANCE PER METER CHANNEL WIDTH | | 3.30×10^-10
13 | CGDO | GATE-DRAIN OVERLAP CAPACITANCE PER METER CHANNEL WIDTH | | 3.30×10^-10
14 | CGBO | GATE-BULK OVERLAP CAPACITANCE PER METER CHANNEL LENGTH | | 2.60×10^-9
15 | RSH | DRAIN AND SOURCE DIFFUSION