Efficient 3’s Complement Circuit for Ternary-ALU

Alok Saha¹, Sudeshna Dutta¹, Snigdha Dutta¹, Osman Hossain Siddique¹, Rimpa Dey¹ and Anup Kumar Das¹

¹Department of ECE, Dr. B. C. Roy Engineering College, Durgapur, INDIA
E-mail: saha81@gmail.com

Abstract—A ternary signal carries more information per wire than a binary one, which makes ternary computation effective in lowering the interconnect hurdle. Hence, the ternary computer can be a future alternative to its conventional (binary) counterpart. As a consequence, ternary arithmetic has become a prime choice of circuit/system researchers in recent times. The ternary adder/subtractor is an integral part of the Ternary Arithmetic Logic Unit (TALU), and 3’s complement is used to represent negative ternary numbers in the TALU. The current work presents a new two-step, low hardware-cost strategy to convert a ternary input into its three’s complement form. A novel optimization strategy to improve hardware efficiency (i.e. PDP) using conventional Enhancement-type Metal Oxide Semiconductor (E-MOS) transistors is explored and exploited to design the proposed 4-trit three’s complement generator on 32nano-meter standard Complementary-MOS technology with a 900mV supply rail at 27°C using nominal MOS transistors. Unbalanced trits “0”, “1” and “2” are denoted by ground, supply/2 and supply respectively. The circuit transient characteristic is validated through rigorous T-Spice simulations with every possible test pattern, and the speed-power result is examined and compared against a benchmark. The circuit performance is also evaluated under load variation. The 4-trit three’s complement circuit is then extended to a 16-trit 3’s complement generator, and the impact of process and environmental variation on the proposed circuit is studied.

Index Terms—Hardware Optimization, Power-Delay-Product, PVT-Variation, Ternary 3’s Complement, Unbalanced Ternary System

I. INTRODUCTION

Higher speed-power efficiency with reduced interconnect complexity has always been at the centre of demand for digital processing [1-2]. Fabrication complexity and the associated hazards owing to large interconnect are the major bottleneck for current binary-based sophisticated digital system implementation [3]. Higher-radix or multi-valued systems can offer some efficiency by processing more information with fewer logic blocks [2, 4] and have attracted researchers for a long time [5-6]. Being closer to the natural base e (≈ 2.718), the multi-valued ternary (base-3) system attracted W. Alexander to investigate the ternary computer in 1964 [7] and has received renewed interest among the circuit research community in recent times, as evidenced by [8-14]. The complement strategy is the usual way to represent negative numbers in the digital domain, and consequently the 3’s complement circuit serves as an essential part of the adder/subtractor block in a ternary ALU (Arithmetic Logic Unit) [2, 14-16]. As per recent studies [17-19], the complement method is also becoming preferable for Machine and Deep Learning, Data Security, Image Processing etc. Hence, an efficient ternary 3’s complement circuit is the need of the hour and is investigated in the present work.

Most recently, in 2022, S. Rani et al. [2] presented a 4-trit ternary adder/subtractor that exploits a novel 3’s complement circuit based on T-XOR (Ternary-XOR) for a low-power, high-speed ternary ALU. The use of a generic T-XOR is a matter of concern, as it leads to hardware inefficiency and related power-delay hazards. The present study explores a unique two-fold strategy that optimizes both the hardware complexity and the carry generation/propagation delay of the 3’s complement circuit. Firstly, the required logic cells are optimized by exploiting input constraints and applying don’t care conditions. Secondly, a novel strategy is adopted to reduce the internal carry generation/propagation delay as well as the corresponding hardware cost. Together these improve the overall speed-power efficiency of the circuit in favor of efficient ternary-ALU design. The proposed idea is explained with respect to a four-trit three’s complement circuit, which is designed using BSIM4 conventional E-MOS transistors on 32nano-meter standard Complementary-MOS technology at 27°C with a 900mV supply. Unbalanced trit values “0”, “1” and “2” are represented by 0V, supply/2 and supply respectively. The working of the proposed circuit is checked through extensive T-Spice transient simulations with custom ternary test inputs. The speed-power response of the proposed four-trit three’s complement circuit is then benchmarked against the idea presented in [2] under equal operating conditions, with a 0.1GHz test input and 1fF load. The power-delay performance of the designed circuit is tabulated for different load values (i.e. 1fF to 10fF). The circuit is next extended to a 16-trit 3’s complement circuit and validated through transient simulations. Finally, robustness with respect to PVT (Process Voltage Temperature) variation is measured and recorded.

The rest of this paper is organized as follows: Section-II explains the proposed idea for constructing the 3’s complement circuit in detail. Section-III presents the design, simulation results and benchmarking against the most recent competitive work. Section-IV concludes the paper.

II. PROPOSED 3’S COMPLEMENT GENERATOR: THEORETICAL PERSPECTIVE

The block-level data-flow model of the proposed ternary three’s complement circuit is depicted in Figure-1. The 4-trit ternary input in Figure-1 is represented by “X” {X₀, X₁, X₂, X₃}. Here X₀ is the LST (Least-Significant-Trit) and X₃ is the MST (Most-Significant-Trit). The corresponding two’s and three’s complement outputs are denoted by “Y” (Y₀, Y₁, Y₂, Y₃) and “Z” (Z₀, Z₁, Z₂, Z₃) respectively. The proposed hardware optimization strategy is applied to generate the two’s complement result “Y” at Stage-I and the three’s complement output “Z” at Stage-II in Fig. 1. The circuit optimization steps and the working principle of the proposed ternary 3’s complement circuit are described next.

![Fig.1 Proposed three’s Complement Computation Strategy](image)

### Table-I

**Trit Input “T”** | **STI (T₁)** | **NTI (T₂)** | **PTI (T₃)** | **TMVD (T₄)**
---|---|---|---|---
“0” | “2” | “2” | “2” | “0”
“1” | “1” | “0” | “2” | “2”
“2” | “0” | “0” | “0” | “0”

### Table-II

Ternary Two’s Complement I/O Relation

**Trit I/P “T”** | **Two’s Complement O/P**
---|---
0 (0V) | 2
1 (Supply/2) | 1
2 (Supply) | 0

### Table-III

| “X₀” | “Y₀” | “C₀” |
---|---|---|
0 | 2 | 1 |
1 | 1 | 0 |
2 | 0 | 0 |

### Table-IV

**Karnaugh-Map of S₀**

| “0” | “1” | “2” |
---|---|---|
0 | 0 | 0 |
1 | 1 | 1 |
2 | 1 | 1 |

### Table-V

**Karnaugh-Map of C₀**

| “0” | “1” | “2” |
---|---|---|
0 | 0 | 0 |
1 | 1 | 1 |
2 | 1 | 1 |

Three ternary inverters, STI (Standard/Simple T-Inverter), NTI (Negative T-Inverter) and PTI (Positive T-Inverter), along with the TMVD (Ternary Middle Value Decoder) [20], are the fundamental parts of the proposed 3’s complement circuit, and the corresponding excitation relation is presented in table-I. The trit input is symbolized by “T” in table-I and the corresponding STI, NTI, PTI and TMVD outputs are T₁, T₂, T₃ and T₄ respectively. The two’s complement of a ternary number can be generated by subtracting each trit value from ternary “2”, and the resulting I/O relation for a single trit “T” is shown in table-II. Close observation of tables I and II reveals that the 2’s complement of a ternary number can be obtained by applying STI to each trit of the number. This is exploited here to find the two’s complement “Y” of the 4-trit ternary input “X” at Stage-I (Fig.1).
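
The Stage-I relation can be illustrated with a minimal behavioural sketch in Python. It models only the logic values, not the transistor-level circuit, and uses the usual textbook definitions of STI, NTI and PTI for unbalanced ternary. The function names are illustrative and do not come from the paper.

```python
# Behavioural model of the three ternary inverters (unbalanced trits 0, 1, 2).
# Function names are hypothetical; only logic values are modelled.

def sti(t: int) -> int:
    """Standard ternary inverter: 0 -> 2, 1 -> 1, 2 -> 0."""
    return 2 - t

def nti(t: int) -> int:
    """Negative ternary inverter: output is 2 only for input 0."""
    return 2 if t == 0 else 0

def pti(t: int) -> int:
    """Positive ternary inverter: output is 0 only for input 2."""
    return 0 if t == 2 else 2

def twos_complement(x: list) -> list:
    """Stage-I: apply STI to every trit; x[0] is the least-significant trit."""
    return [sti(t) for t in x]

# STI of a trit equals (2 - trit), which is the table-II relation.
for t in range(3):
    assert sti(t) == 2 - t

print(twos_complement([1, 1, 0, 2]))
```

For the input pattern X = {1, 1, 0, 2} (LST first) the sketch returns {1, 1, 2, 0}.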

Stage-II in Fig.1 computes the final three’s complement “Z” by adding ternary “1” to the Stage-I two’s complement “Y”. As Y₀ (the LST of “Y”) is added to a fixed “1”, the corresponding half-adder sum (S₀) and carry (C₀) depend on Y₀ only. This is exploited to achieve the Stage-II hardware optimization in the proposed work. To achieve faster operating speed, the C₀ block is designed with the “X₀” input instead of “Y₀”, as elaborated in table-III. The k-maps for “S₀” and “C₀” are presented in table-IV and table-V respectively. The don’t care situation in the k-map is denoted by “d” here. The hardware-optimized transistor-level circuits for “S₀” and “C₀” as per the proposed idea are shown in Figure-2a and Figure-2b respectively.
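
The LST relation can be checked with a short behavioural sketch. It assumes plain modulo-3 arithmetic for the sum and compares the carry derived from Y₀ with the carry derived directly from X₀. The function names are hypothetical.

```python
# Behavioural check of the least-significant-trit block (names are hypothetical).

def s0(y0: int) -> int:
    """Sum trit when the fixed ternary 1 is added to Y0."""
    return (y0 + 1) % 3

def c0_from_y(y0: int) -> int:
    """Carry out of Y0 + 1: only Y0 = 2 overflows."""
    return 1 if y0 == 2 else 0

def c0_from_x(x0: int) -> int:
    """Same carry taken directly from X0, bypassing the Stage-I inverter."""
    return 1 if x0 == 0 else 0

for x0 in range(3):
    y0 = 2 - x0  # Stage-I STI output
    assert c0_from_x(x0) == c0_from_y(y0)
    print(x0, y0, s0(y0), c0_from_x(x0))
```

Because the carry does not wait for the STI output, the inverter delay is taken off the carry path, which is the speed argument made for table-III.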

![Fig.2 Circuit Diagram of a) “S₀” and b) “C₀”](image)
strategy is followed to construct the MOS-level circuit structure for $C_0$, and the same is shown in Fig.2b. Subsequent addition in the proposed circuit is performed with the $S_0$ (sum) and $C_0$ (carry) blocks as shown in Fig.1. Towards circuit optimization, the k-maps for $S_0$ and $C_0$ are presented in table-VI and table-VII respectively. As the carry input can never be “2”, that possibility is marked as don’t care in the corresponding k-maps for $S_0$ and $C_0$. Consider $S_0$ for $Y=0$ (col.-0 in table-VI): the resulting output is the same as the carry input “C”. The circuit equivalent is constructed by applying the NTI of $Y$ to the gate terminal of NMOS transistor $M_1$ in Fig.3a and passing the carry input “C” to the output. Two series-connected PMOS transistors $M_2$ and $M_3$ in Fig.3a construct the circuit equivalent for input $C=0$ and $Y=1$ or $2$. The complement of $C^0$ selects PMOS transistor $M_2$ for $C=0$, whereas $Y^N$ selects PMOS transistor $M_3$ for $Y=1$ or $2$. The same principle is followed to construct the rest of $S_0$. To mitigate carry propagation delay, the circuit for $C_0$ is constructed with “X” instead of “Y” as explained in table-VII, and the corresponding circuit equivalent is shown in Fig.3b. Details of the circuit construction are omitted for the sake of brevity; more detail can be found in [21].
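
The complete two-stage flow can be sketched behaviourally as follows. This is a logic-level illustration under stated assumptions, not the authors' transistor-level design: the carry rule used here (the carry stays 1 only while every lower input trit is 0) is one reading of "carry from X instead of Y", and the function names are hypothetical.

```python
from itertools import product

def threes_complement(x):
    """3's complement of a trit list; x[0] is the least-significant trit."""
    z = []
    carry = 1  # the fixed ternary 1 added at the LST
    for xi in x:
        yi = 2 - xi                 # Stage-I: STI gives the 2's complement trit
        z.append((yi + carry) % 3)  # sum block; carry is only ever 0 or 1
        # Carry taken from X (assumed rule): it survives only if xi is 0,
        # because that is the only case where yi is 2 and yi + 1 overflows.
        carry = 1 if (carry == 1 and xi == 0) else 0
    return z

def value(trits):
    """Unsigned value of a trit list with trits[0] as LST."""
    return sum(t * 3**i for i, t in enumerate(trits))

# Exhaustive check over all 81 four-trit inputs against (3^4 - X) mod 3^4.
for x in product(range(3), repeat=4):
    assert value(threes_complement(list(x))) == (81 - value(x)) % 81

print(threes_complement([1, 1, 0, 2]))
```

The loop body is the same for every trit position, which mirrors the remark that the sum and carry blocks can be repeated for any input length.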

The aforesaid two-fold hardware optimization strategy is adopted here to achieve improved speed-power characteristics compared to the generic T-XOR based design explored by S. Rani et al. in [2]. Although the circuit shown in Fig.1 is for a 4-trit input, the idea is generic and can be extended to any input length by repeating the $S_0$ and $C_0$ blocks. The design, transient simulation, benchmarking comparative study, speed-power evaluation with different load values and the robustness analysis under PVT variation of the proposed three’s complement circuit are explained in Section-III.

**III. DESIGN, ANALYSIS AND BENCHMARKING**

The proposed four-trit three’s complement circuit is designed with BSIM4 E-MOS transistors on 32nano-meter CMOS technology with a 900mV supply and 1fF load at the nominal temperature of 27°C. Voltages of 0mV, 450mV and 900mV represent trit values “0”, “1” and “2” respectively. The front-end schematic diagram of the designed circuit is shown in Figure-4. The circuit is validated by applying custom transient ternary inputs with 10ps rise/fall time through a PWL (Piece-Wise-Linear) input source. The resulting transient response with a 0.1GHz test input is shown in Figure-5. As an example, consider the test pattern $X = \{X_0=1; X_1=1; X_2=0; X_3=2\}$ applied in the time-frame 0ns to 10ns (Figure-5a). The corresponding 2’s complement output $Y = \{Y_0=1; Y_1=1; Y_2=2; Y_3=0\}$ of Stage-I is presented in Figure-5b. Finally, the three’s complement result $Z = \{Z_0=2; Z_1=1; Z_2=2; Z_3=0\}$ is obtained by adding ternary “1” to $Y$ in Stage-II and is shown in Figure-5c. The evaluated characteristics on 32nano-meter CMOS technology with a 900mV supply, 0.1GHz test input and 1fF load at 27°C are recorded in table-VIII.
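
The test pattern above can be cross-checked arithmetically by reading each trit vector as an unsigned base-3 number with index 0 as the LST. This is a worked check added for illustration, not a result from the simulations:

$$
\begin{aligned}
X &= 2\cdot 3^3 + 0\cdot 3^2 + 1\cdot 3^1 + 1\cdot 3^0 = 58,\\
Y &= 0\cdot 3^3 + 2\cdot 3^2 + 1\cdot 3^1 + 1\cdot 3^0 = 22 = (3^4 - 1) - X,\\
Z &= 0\cdot 3^3 + 2\cdot 3^2 + 1\cdot 3^1 + 2\cdot 3^0 = 23 = Y + 1,\\
X + Z &= 58 + 23 = 81 = 3^4 .
\end{aligned}
$$

Since $X + Z = 3^4$, $Z$ is the three’s complement of $X$ for a 4-trit word.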

![Fig. 4 Schematic of Proposed Four-trit three’s Complement Generator](image)

![Fig. 5a Four-trit Ternary Input “X”](image)

![Fig. 5b Stage-I O/P “Y”](image)

![Fig. 5c Stage-II O/P “Z”](image)

Table VIII

<table>
<thead>
<tr>
<th>Design</th>
<th>MOS Device</th>
<th>Power (µW) @0.1GHz</th>
<th>Delay (ps)</th>
<th>PDP (J) @0.1GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Four-Trit 3’s Complement Circuit</td>
<td>198</td>
<td>175.65</td>
<td>83.37</td>
<td>14.64×10^{-15}</td>
</tr>
</tbody>
</table>

Table IX

Benchmarking study with [2]

<table>
<thead>
<tr>
<th>Proposed in</th>
<th>Power (W)</th>
<th>Critical Delay (sec)</th>
<th>PDP (J)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[2], 2022</td>
<td>164.65×10^{-6}</td>
<td>75.34×10^{-12}</td>
<td>1.68×10^{-14}</td>
</tr>
<tr>
<td>Present</td>
<td>175.65×10^{-6}</td>
<td>83.37×10^{-12}</td>
<td>1.46×10^{-14}</td>
</tr>
</tbody>
</table>

Operating Condition: 0.1GHz input, 27°C temperature, 1fF load, Tech. 32nano-meter CMOS, Supply: 900mV

As shown in table-VIII, the proposed four-trit three’s complement circuit consists of 198 active devices and consumes 175.65µW average power when operated with a 0.1GHz ternary input at 27°C. The critical delay of the circuit is 83.37ps for a 1fF load. A benchmarking comparative study with the most recent counterpart [2] is presented in table-IX. To present a fair comparison, the four-trit three’s complement circuit is redesigned based on the idea presented in [2] by removing input-A and replacing the full-adder of [2] with a half-adder. Both circuits are compared under equal operating conditions (i.e. 900mV supply, 0.1GHz input and 27°C temperature) with a 1fF load.
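
The PDP figure follows directly from the reported power and delay. A minimal sketch of the unit conversion, with a hypothetical helper name, is:

```python
def pdp_joule(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in joules from power in microwatts and delay in picoseconds."""
    return (power_uw * 1e-6) * (delay_ps * 1e-12)

# Values reported for the four-trit circuit at 1fF load.
pdp = pdp_joule(175.65, 83.37)
print(f"{pdp:.3e} J")  # 1.464e-14 J
```

The result, about 14.64×10⁻¹⁵ J, matches the PDP entry reported for the four-trit circuit.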

Fast digital processing is the need of the hour. However, the corresponding power dissipation of high-density VLSI chips is now a matter of concern and calls for reducing the PDP (Power-Delay-Product) rather than delay or power alone. As per the study (table-IX), the proposed strategy can be applied to reduce overall delay and PDP with a marginal increase in average power.

Table X

<table>
<thead>
<tr>
<th>Output Load (fF)</th>
<th>Power Consumption (µWatt)</th>
<th>Delay (ps)</th>
<th>PDP (Joule)</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>175.65</td>
<td>83.37</td>
<td>14.64×10^{-15}</td>
</tr>
<tr>
<td>2</td>
<td>175.72</td>
<td>102.05</td>
<td>17.93×10^{-15}</td>
</tr>
<tr>
<td>3</td>
<td>175.81</td>
<td>121.59</td>
<td>21.37×10^{-15}</td>
</tr>
<tr>
<td>4</td>
<td>175.92</td>
<td>134.84</td>
<td>23.72×10^{-15}</td>
</tr>
<tr>
<td>5</td>
<td>176.02</td>
<td>149.00</td>
<td>26.24×10^{-15}</td>
</tr>
<tr>
<td>6</td>
<td>176.11</td>
<td>163.92</td>
<td>28.87×10^{-15}</td>
</tr>
<tr>
<td>7</td>
<td>176.18</td>
<td>182.02</td>
<td>32.07×10^{-15}</td>
</tr>
<tr>
<td>8</td>
<td>176.27</td>
<td>195.85</td>
<td>34.17×10^{-15}</td>
</tr>
<tr>
<td>9</td>
<td>176.38</td>
<td>208.03</td>
<td>36.69×10^{-15}</td>
</tr>
<tr>
<td>10</td>
<td>176.51</td>
<td>221.81</td>
<td>39.15×10^{-15}</td>
</tr>
</tbody>
</table>

Table XI
Performance of Designed 16-trit Three’s Complement Circuit

<table>
<thead>
<tr>
<th>Design</th>
<th>MOSFET</th>
<th>Power (µWatt) @0.1GHz</th>
<th>Delay (ps)</th>
<th>PDP (Joule) @0.1GHz</th>
</tr>
</thead>
<tbody>
<tr>
<td>Proposed 16-Trit 3’s Complement Circuit</td>
<td>954</td>
<td>369.91</td>
<td>92.11</td>
<td>34.07×10^{-15}</td>
</tr>
</tbody>
</table>

Table XII
PVT Variation Effect on Proposed 16-trit 3’s Complement Generator

<table>
<thead>
<tr>
<th>Design</th>
<th>Temp</th>
<th>Supply (mV)</th>
<th>Worst Power (µWatt)</th>
<th>Worst Delay (ps)</th>
<th>Worst PDP (Joule)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="9">16-trit three’s complement circuit</td>
<td rowspan="3">-40°C</td>
<td>810</td>
<td>101.09</td>
<td>105.64</td>
<td>10.68×10^{-15}</td>
</tr>
<tr>
<td>900</td>
<td>265.65</td>
<td>85.11</td>
<td>22.88×10^{-15}</td>
</tr>
<tr>
<td>990</td>
<td>311.52</td>
<td>72.01</td>
<td>22.43×10^{-15}</td>
</tr>
<tr>
<td rowspan="3">27°C</td>
<td>810</td>
<td>113.77</td>
<td>114.32</td>
<td>13.01×10^{-15}</td>
</tr>
<tr>
<td>900</td>
<td>369.91</td>
<td>92.11</td>
<td>34.07×10^{-15}</td>
</tr>
<tr>
<td>990</td>
<td>403.62</td>
<td>76.61</td>
<td>30.92×10^{-15}</td>
</tr>
<tr>
<td rowspan="3">85°C</td>
<td>810</td>
<td>132.49</td>
<td>122.54</td>
<td>16.24×10^{-15}</td>
</tr>
<tr>
<td>900</td>
<td>405.21</td>
<td>97.21</td>
<td>39.39×10^{-15}</td>
</tr>
<tr>
<td>990</td>
<td>593.12</td>
<td>80.52</td>
<td>47.76×10^{-15}</td>
</tr>
</tbody>
</table>

Fig.6 Circuit Reliability Measure under PVT Variation

The proposed 4-trit 3’s complement circuit has been investigated under different load conditions (i.e. 1fF to 10fF), and the corresponding speed-power performance at the nominal operating condition is tabulated in table-X. The idea is next applied to design a 16-trit 3’s complement circuit (schematic not shown). As presented in table-XI, the proposed circuit needs 954 MOS transistors and dissipates 369.91µW average power when driven with a 0.1GHz ternary input at 27°C with a 900mV supply. The propagation delay of the designed circuit is 92.11ps. The robustness of the proposed idea against PVT (Process Voltage Temperature) variation [22] is measured for the 16-trit 3’s complement circuit by applying fast, nominal and slow transistors with ±10% variation of the supply from nominal at temperatures of -40°C, 27°C and 85°C. The effect of PVT variation on the speed-power response is tabulated in Table-XII and shown graphically in Figure-6. As per the investigation, the proposed 16-trit three’s complement generator offers a worst-case PDP of 47.76×10^{-15} J with a 0.99V supply at 85°C and can be a design of choice for future TALUs.
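
The worst-case corner can be located mechanically from the tabulated PVT data. The sketch below assumes the Table-XII entries are power in µW and delay in ps, and takes the lowest temperature as -40°C as stated in the text. The variable and function names are illustrative.

```python
# (temperature in deg C, supply in mV, worst power in uW, worst delay in ps),
# transcribed from Table-XII; the -40 deg C value follows the prose description.
corners = [
    (-40, 810, 101.09, 105.64),
    (-40, 900, 265.65, 85.11),
    (-40, 990, 311.52, 72.01),
    (27, 810, 113.77, 114.32),
    (27, 900, 369.91, 92.11),
    (27, 990, 403.62, 76.61),
    (85, 810, 132.49, 122.54),
    (85, 900, 405.21, 97.21),
    (85, 990, 593.12, 80.52),
]

def pdp_fj(power_uw: float, delay_ps: float) -> float:
    """Power-delay product in femtojoules (1 uW x 1 ps = 1e-18 J = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3

worst_pdp = max(corners, key=lambda c: pdp_fj(c[2], c[3]))
slowest = max(corners, key=lambda c: c[3])

print("worst PDP corner:", worst_pdp[:2], round(pdp_fj(worst_pdp[2], worst_pdp[3]), 2), "fJ")
print("slowest corner:", slowest[:2], slowest[3], "ps")
```

With these values the largest PDP (about 47.76 fJ) occurs at 85°C with a 990mV supply, while the largest delay occurs at 85°C with an 810mV supply.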

IV. CONCLUSIONS

A new two-fold optimization strategy for the ternary three’s complement circuit with improved PDP is presented in this work. Firstly, hardware reduction at the logic level is achieved by identifying don’t care input possibilities and optimizing the logic accordingly. Secondly, a unique idea is applied to reduce the internal carry generation-propagation delay and thereby the overall power-delay-product. The proposed idea is discussed with respect to a four-trit three’s complement circuit, which is designed on 32nano-meter CMOS technology with a 900mV supply at 27°C. Ternary digits “0”, “1” and “2” are denoted by ground, supply/2 and supply respectively. The designed circuit is validated through T-Spice transient response with custom ternary test patterns. Speed-power performance is evaluated and benchmarked against the most recent strategy. The effect of load variation on the speed-power response is measured and tabulated. Next, the 4-trit 3’s complement circuit is extended to a 16-trit three’s complement circuit, which is validated and evaluated. The effect of PVT (Process Voltage Temperature) variation on the designed 16-trit 3’s complement circuit is investigated to assess robustness. As per the study, the proposed idea can be adopted to enhance the computing power of a ternary computer by enabling a hardware-optimized, speed-power efficient adder/subtractor unit for the Arithmetic-Logic-Unit of a ternary CPU. Robustness of the designed circuit against PVT variation makes the idea reliable and practical.

REFERENCES


