# Trigger-Wave Propagation in Arbitrary Metrics in Asynchronous Cellular Logic Arrays

Przemyslaw Mroszczyk and Piotr Dudek School of Electrical & Electronic Engineering The University of Manchester Manchester, M13 9PL, United Kingdom przemyslaw.mroszczyk@postgrad.manchester.ac.uk p.dudek@manchester.ac.uk

Abstract— This paper presents the idea of an asynchronous cellular pixel-parallel logic array for global image processing tasks using trigger-wave propagation in a medium with a hardware-controlled metric. The principles of wave propagation in cellular four-connected logic arrays emulating different distance measure norms are explained and verified using a simplified switched RC circuit model. The proposed gate array consists of only 13 transistors per pixel and was implemented in a standard 90 nm CMOS technology. It provides a propagation medium applicable to binary image skeletonization, Voronoi tessellation or distance transformation tasks where calculating distances in a particular metric (e.g. Euclidean, Manhattan, Chessboard) is desired.

Keywords—CMOS, dynamic logic, propagation, autowaves, collision detection, Euclidean metric, skeletonization;

## I. INTRODUCTION

Shape recognition usually involves medium-level image processing algorithms requiring global operations such as distance transformation (DT), skeletonization or tessellation. An interesting approach to the extraction of global image attributes, based on trigger-wave propagation and wave-front collision detection, was theoretically considered in the evaluation of the medial axis function (MAF) [1], and later practically observed in some chemical solutions reacting with incident light (the Belousov-Zhabotinsky reaction) [2]. Ideally, such waves (autowaves), when triggered from the edges of an object, propagate isotropically (with a constant speed in every direction) utilizing the locally stored energy of a medium, and collide or bend, denoting the medial axis points [1]. Several characteristic norms are typically used in image processing, such as the Euclidean, Manhattan and Chessboard norms. These are particular cases of a generic p-norm defined by the real number p (with the constraint $p \ge 1$) and, for a 2-dimensional vector (x, y), given by the formula:

$$\|(x,y)\|_{p} = (|x|^{p} + |y|^{p})^{\frac{1}{p}} \tag{1}$$

Assuming isotropic propagation in different p-norm spaces, the wave-front contours triggered from a single point are equidistant from that point in the respective norm (Fig. 1).
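As a brief illustration of (1), the following minimal sketch evaluates the p-norm of a 2-dimensional vector, treating $p \to \infty$ as the Chessboard limit. The function name `p_norm` is hypothetical and is used here only for illustration.

```python
import math

def p_norm(x: float, y: float, p: float) -> float:
    """Return the p-norm of the 2-D vector (x, y) for p >= 1, following Eq. (1)."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    if math.isinf(p):
        # Limit p -> infinity: the Chessboard norm.
        return max(abs(x), abs(y))
    return (abs(x) ** p + abs(y) ** p) ** (1.0 / p)

# Distance from the origin to the diagonal neighbour (1, 1) in three norms:
# Manhattan (p = 1), Euclidean (p = 2) and Chessboard (p -> infinity).
for p in (1, 2, math.inf):
    print(p, p_norm(1.0, 1.0, p))
```

The distance to the diagonal neighbour is the quantity that distinguishes the norms on a square grid, which is why it is printed here.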

Since the Euclidean metric is the most "natural" to use, several algorithms for calculating the approximate Euclidean distance measure were proposed [3], [4]. A hardware-oriented approach for Single Instruction Multiple Data (SIMD) fine-grain processor arrays was also presented in [5]. Direct hardware implementations of the trigger-wave propagation mechanism using asynchronous logic arrays were discussed in [6]-[9]. Such arrays provide a fast and energy-efficient computational engine for many image processing algorithms, e.g. hole filling, geodesic reconstruction and closed shape detection, where the correct operation is independent of the assumed metric. Also, the CNN implementations using trigger-wave propagation usually do not control the distance measure norm applied in image processing tasks [10], [11]. Some algorithms, however, such as distance transformation and skeletonization, when using the trigger-wave and collision detecting scheme, typically require a circular (Euclidean) propagation [1].



Fig. 1. Contours of the 2-dimensional propagation waves in different p-norms: a) Manhattan (p = 1), b) Euclidean (p = 2), c) Chessboard $(p \to \infty)$

Rounded shapes of the trigger-wave contours in VLSI processor arrays were observed in [6]-[8], [11], and some irregular ones in the CNN implementations in [10]; however, no further discussion explaining this phenomenon was provided. This paper is a continuation of our work from [8] and presents an in-depth analysis of the timing parameters of a single propagation gate and of the four-connected array of such cells, capable of approximating operation in an arbitrary p-norm for $1 \le p \le \infty$. Section II presents the typical VLSI implementation of the propagation gate, Section III presents its timing analysis, Section IV verifies the behaviour of the ideal switched RC circuit array, Section V discusses the design, simulation results and applications of the VLSI array designed in a standard 90 nm CMOS technology, and Section VI concludes the paper.

## II. TRIGGER-WAVE PROPAGATION IN VLSI ARRAYS

Trigger-wave propagation in VLSI circuits is usually implemented using an array of logic OR gates with the inputs connected to the outputs of their nearest neighbours. Typically cellular arrays with only four nearest neighbours  $P_N$ ,  $P_E$ ,  $P_S$  and  $P_W$  are considered [6]-[9] (Fig. 2a). In the initial state all the inputs (and outputs) of the OR gates are at the zero logic level. In order to trigger a wave, the additional signal m (unique for each pixel) is used to force the high state on the output of the selected gate. As a result the signal from that gate will trigger all its neighbours and the propagation will proceed as shown in Fig. 2b.



Fig. 2. Trigger-wave propagation concept: a) the propagation OR gate, b) the expansion of the wave triggered from the centre of the array
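The propagation concept of Fig. 2 can be sketched behaviourally as follows. This is a minimal sketch and not the circuit itself: it assumes ideal OR gates whose delay is one fixed unit regardless of how many inputs are active, and the function name `propagate` is hypothetical. Under that assumption the arrival time at each cell equals its Manhattan distance from the trigger point, so the wave front is a diamond.

```python
from collections import deque

def propagate(size: int, seeds):
    """Arrival time (in gate delays) of the wave at every cell of a size x size array."""
    arrival = [[None] * size for _ in range(size)]
    queue = deque()
    for r, c in seeds:
        # The per-pixel signal m forces the output of the selected gate high.
        arrival[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        # Each gate drives its four nearest neighbours: N, E, S, W.
        for dr, dc in ((-1, 0), (0, 1), (1, 0), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < size and 0 <= nc < size and arrival[nr][nc] is None:
                arrival[nr][nc] = arrival[r][c] + 1
                queue.append((nr, nc))
    return arrival

# Wave triggered from the centre of a 7 x 7 array.
times = propagate(7, [(3, 3)])
for row in times:
    print(" ".join(str(t) for t in row))
```

A gate whose delay depends on the number of simultaneously active inputs departs from this diamond-shaped front.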

The realisation of the OR gate discussed in this paper, initially used in [8], is presented in Fig. 3. It consists of two stages: the NOR gate (built on transistors $M_{1-5}$ and $M_8$) and the inverting gate (built on $M_6$ and $M_7$), and operates in two-phase dynamic logic fashion.



Fig. 3. Schematic diagram of the propagation OR gate used in [8]

During the initialization phase the high logic state on the global *precharge* line turns on the transistor $M_6$, discharging the parasitic capacitance of the output P to ground (low logic level) and precharging node NOR to $V_{DD}$ (high logic level) through the fully turned on weak keeping transistor $M_8$. In the evaluation phase the *precharge* line is at the low logic level, turning $M_6$ off. Assuming that the off-leakage currents of $M_{1-5}$ and $M_7$ do not affect the operation of the circuit (which is assured by employing weak keepers and proper transistor scaling), the gate remains in balance until any of the input signals turns to the high logic state. When this is the case, the NOR node quickly discharges to zero, turning on $M_7$, and the output node P turns to the high logic state, sending the propagation signal to the neighbouring cells.

When the propagation gate from Fig. 3 is being triggered from any of its neighbours $P_N$, $P_S$, $P_W$ or $P_E$, the corresponding n-MOS transistor from the pull-down network $M_{1-4}$ turns on and (assuming that the current of $M_8$ is negligible) quickly discharges the parasitic capacitance of node *NOR* through the small respective channel resistance. For two neighbours driving the gate simultaneously, the parallel connection of two channel resistances discharges the same capacitance faster, reducing the propagation time of this stage.

## III. TIMING ANALYSIS

In the following, we assume a regular 2-D array with only four nearest neighbour connectivity and a constant pixel pitch x. We will consider two cases of the wave-front propagation either along the cardinal or the diagonal direction of the array as shown in Fig. 4.



Fig. 4. The propagation of the wave triggered from the point *O* along the cardinal and the diagonal directions considered in points *A* and *B* respectively

The cells located along the cardinal directions are always triggered from only one neighbour and the wave front propagates with a constant cell-to-cell speed $v_A$. The cells located on the diagonal directions, however, are triggered from two neighbours simultaneously and propagate the signal with the higher respective cell-to-cell speed $v_B > v_A$. As a result, the wave triggered in a circuit array tends to accelerate towards the diagonal directions, which makes the propagation contours more circular [8]. For a cardinal direction, the propagation time is $T_C = x/v_A$ (Fig. 5a), whereas for a diagonal direction the wave propagates through its nearest neighbours, passing the distance 2x within the propagation time $T_D = 2x/v_B$ (Fig. 5b).



Fig. 5. The mechanism of the wave propagation in a regular four-connected circuit array along the directions: a) cardinal, b) diagonal

When calculating distances using wave propagation, the distances are determined by the propagation time. Therefore the diagonal distance d = |BB'| can be calculated from the propagation time ratio  $d = (T_D/T_C)x$  which is equal to:

$$d = \frac{2x}{v_B / v_A} \tag{2}$$

Assuming that the array operates in an arbitrary metric defined by the parameter p of the p-norm in (1), the distance d between two diagonal neighbours in the regular array with the pixel pitch x is given by:

$$d = 2^{1/p} x \tag{3}$$

From (2) and (3) the following relation can be derived:

$$\gamma = 2^{1 - \frac{1}{p}} \tag{4}$$

where $\gamma = v_B/v_A$ is the speed ratio parameter and $p \in [1, \infty]$ defines the *p*-norm from (1). Equation (4) combines the time and speed parameters of a single propagation gate with the geometric properties of the wave contours generated in the respective cell array.
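Equating (2) and (3) gives $2x/\gamma = 2^{1/p}x$, from which (4) follows. The sketch below evaluates (4) and its inverse, $p = 1/(1 - \log_2 \gamma)$, obtained by solving (4) for p. The function names are hypothetical and serve only to illustrate the relation.

```python
import math

def gamma_from_p(p: float) -> float:
    """Speed ratio v_B / v_A needed for operation in the given p-norm, Eq. (4)."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    # 1 / inf evaluates to 0.0, so p = math.inf yields gamma = 2.
    return 2.0 ** (1.0 - 1.0 / p)

def p_from_gamma(gamma: float) -> float:
    """Inverse of Eq. (4): the p-norm approximated for a given speed ratio."""
    if not 1.0 <= gamma <= 2.0:
        raise ValueError("gamma must lie in [1, 2]")
    if gamma == 2.0:
        return math.inf  # Chessboard limit
    return 1.0 / (1.0 - math.log2(gamma))

for p in (1, 2, math.inf):
    print(p, gamma_from_p(p))
```

The speed ratio therefore spans the range from 1 (Manhattan) through $\sqrt{2}$ (Euclidean) to 2 (Chessboard).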

## IV. SIMPLIFIED SWITCHED RC ARRAY MODEL

The asynchronous array of ideal switched *RC* propagation gates (Fig. 6) will be used to verify the relation between the timing parameters and the properties of the generated wave fronts discussed in Section III.



Fig. 6. Schematic diagram of the ideal switched RC propagation gate

The initial conditions specified for the proposed circuit array ensure that the capacitor $C_{NOR}$ is charged to $V_{DD}$ and the capacitor $C_P$ is discharged (the output P is at the zero logic level) before the propagation phase. Transistors $M_{1-5}$ (n-MOS) and $M_7$ (p-MOS) from Fig. 3 are implemented as ideal voltage-controlled switches with fixed channel resistances $R_N$ and $R_P$ respectively. A particular switch turns on when its control voltage (any of the input signals or the voltage across $C_{NOR}$) crosses a certain threshold. When this is the case for any of the input signals, the respective switch turns on, discharging $C_{NOR}$ with a time constant $\tau_1 = R_N C_{NOR}$. When any two inputs are driven simultaneously, the capacitance $C_{NOR}$ discharges at twice the speed, with a time constant $\tau_1/2$. When the voltage across $C_{NOR}$ falls below a certain value, the switch in the second stage turns on, charging the output capacitance $C_P$ with a time constant $\tau_2 = R_P C_P$. When the rising slope of the output signal P crosses the threshold voltage of the input switches, all the neighbouring gates are triggered and the propagation continues. The proposed circuit realisation of the gate consists of two stages, thus the propagation speed is inversely proportional to the sum of the respective time constants, $v \sim 1/(\tau_1 + \tau_2)$. Based on this relation, the speed ratio $\gamma$ introduced in Section III for the propagation speeds $v_A$ and $v_B$ of the proposed switched RC circuit equals:

$$\gamma = \frac{v_B}{v_A} = \frac{\tau_1 + \tau_2}{\tau_1/2 + \tau_2} = \frac{R_N C_{NOR} + R_P C_P}{R_N C_{NOR}/2 + R_P C_P} \tag{5}$$

For example, assuming that both of the time constants  $\tau_1$  and  $\tau_2$  are equal, the speed ratio is  $\gamma \approx 1.33$  which means that the observed propagation contour is close to a circle and the array will operate in the approximate Euclidean metric.
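The example above can be checked numerically with the following minimal sketch. It evaluates (5) and also solves it for the ratio $\tau_2/\tau_1 = (1 - \gamma/2)/(\gamma - 1)$ that yields a target speed ratio. The function names are hypothetical, and the effective p is obtained by inverting (4).

```python
import math

def speed_ratio(tau1: float, tau2: float) -> float:
    """Speed ratio gamma = v_B / v_A of the switched RC gate, Eq. (5)."""
    return (tau1 + tau2) / (tau1 / 2.0 + tau2)

def tau2_over_tau1(gamma: float) -> float:
    """Ratio tau2 / tau1 that produces the speed ratio gamma, from Eq. (5)."""
    if not 1.0 < gamma <= 2.0:
        raise ValueError("gamma must lie in (1, 2]")
    return (1.0 - gamma / 2.0) / (gamma - 1.0)

gamma_equal = speed_ratio(1.0, 1.0)                   # equal time constants
p_effective = 1.0 / (1.0 - math.log2(gamma_equal))    # inverse of Eq. (4)
print(gamma_equal, p_effective)
print(tau2_over_tau1(math.sqrt(2.0)))                 # ratio for the Euclidean case
```

For equal time constants the effective p is about 1.71, close to but below the Euclidean value of 2. The exact Euclidean ratio $\gamma = \sqrt{2}$ corresponds to $\tau_2 \approx 0.71\,\tau_1$.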

The operation of the proposed ideal RC model was verified in an HSpice simulator. In order to implement the switched resistance $R_N = 10\ \text{k}\Omega$ with a threshold voltage of 500 mV, a VCR (voltage-controlled resistor) element was used, with its resistance changing from 1 G$\Omega$ (switch open) to 10 k$\Omega$ (switch closed) within the control voltage range from 499.5 mV to 500.5 mV. The values of the RC elements were calculated to assure a fixed propagation time of 1 ns for each stage ($R_N = 10\ \text{k}\Omega$ and $C_{NOR} = C_P = 144.27$ fF). The parameter $\gamma$ can be set to any value between 1 and 2, depending on $R_P$ according to (5). The simulated propagation contours, generated in the array of $33 \times 33$ switched RC delay gates when the wave is triggered from the centre, are shown in Fig. 7. For different values of $R_P$, which determine the propagation time of the second stage and the speed ratio $\gamma$, different wave contours can be observed, confirming the analysis from Section III.
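The quoted capacitance can be reproduced with the short sketch below. It assumes that the 500 mV threshold is half of the initial capacitor voltage (that is, a 1 V supply, which the text does not state explicitly), so that the threshold crossing time of a stage is $RC \ln 2$. The $R_P$ values in the loop are illustrative only and are not taken from the simulations.

```python
import math

LN2 = math.log(2.0)

def crossing_time(r_ohm: float, c_farad: float) -> float:
    """Time for an RC stage to reach half of its voltage swing (assumed threshold)."""
    return r_ohm * c_farad * LN2

R_N = 10e3       # switched resistance quoted in the text, in ohms
T_STAGE = 1e-9   # target propagation time of one stage, in seconds

# Capacitance giving a 1 ns crossing time with R_N.
C = T_STAGE / (R_N * LN2)
print(f"C_NOR = C_P = {C * 1e15:.2f} fF")

# Illustrative (hypothetical) R_P values and the resulting speed ratio, Eq. (5).
for r_p in (1e3, 10e3, 100e3):
    t1 = crossing_time(R_N, C)
    t2 = crossing_time(r_p, C)
    gamma = (t1 + t2) / (t1 / 2.0 + t2)
    print(f"R_P = {r_p / 1e3:.0f} kOhm -> gamma = {gamma:.3f}")
```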



Fig. 7. Propagation contours observed in the  $33\times33$  cell array of the proposed switched *RC* gates for different  $\gamma$  values
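A behavioural counterpart of this experiment can be sketched without a circuit simulator. The event-driven model below is an illustration under stated assumptions and not the HSpice setup used here: each closed input switch adds one unit of discharge rate to node NOR, which crosses its threshold after the single-switch time `t1`, and the output stage then adds a fixed delay `t2`. The function names and the choice of a priority queue are hypothetical.

```python
import heapq

NEIGHBOURS = ((-1, 0), (0, 1), (1, 0), (0, -1))  # N, E, S, W

def nor_crossing_time(arrivals, t1):
    """Time at which node NOR crosses the threshold.

    arrivals: times at which the driving neighbours' outputs went high.
    t1: time to reach the threshold with a single closed switch.
    With k switches closed in parallel the discharge proceeds k times faster.
    """
    arrivals = sorted(arrivals)
    done = 0.0  # discharge progress, expressed in single-switch time
    for k, start in enumerate(arrivals, start=1):
        end = arrivals[k] if k < len(arrivals) else float("inf")
        if done + k * (end - start) >= t1:
            return start + (t1 - done) / k
        done += k * (end - start)
    raise ValueError("at least one arrival is required")

def simulate(size, t1, t2):
    """Output switching time of every cell for a wave triggered at the centre."""
    centre = size // 2
    fired = {}
    heap = [(0.0, centre, centre)]
    while heap:
        t, r, c = heapq.heappop(heap)
        if (r, c) in fired:
            continue  # stale entry: this cell has already switched
        fired[(r, c)] = t
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < size and 0 <= nc < size) or (nr, nc) in fired:
                continue
            # Re-evaluate the neighbour using all inputs known to be high so far.
            arrivals = [fired[(nr + a, nc + b)] for a, b in NEIGHBOURS
                        if (nr + a, nc + b) in fired]
            heapq.heappush(heap, (nor_crossing_time(arrivals, t1) + t2, nr, nc))
    return fired

# 9 x 9 array with equal stage times of 1 ns; print the middle row (times in ns).
f = simulate(9, 1.0, 1.0)
print([f[(4, c)] for c in range(9)])
```

In this model the cells on the cardinal axes switch at multiples of `t1 + t2`, while cells near the diagonals switch earlier than they would with a fixed gate delay, which is the mechanism that rounds the contours.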

## V. PROPAGATION GATE WITH METRIC CONTROL

The array discussed in Section IV operates (with a good approximation) in the Euclidean metric when the time constant $\tau_2$ is shorter than $\tau_1$. In practice this can be achieved by increasing $\tau_1$. The transistor-level implementation of the propagation gate employing this approach in the dynamic logic fashion is presented in Fig. 8 [8].



Fig. 8. The proposed propagation OR gate with adjustable timing parameters

The additional transistors $M_{10-13}$ (in series with $M_{1-5}$) increase the resistances $R_N$ of each pull-down branch (Fig. 6), which increases the time constant $\tau_1$ depending on $V_{MODE2}$. Transistor $M_9$, controlled by the voltage $V_{MODE1}$, limits the total current discharging the capacitance $C_{NOR}$, which makes $\tau_1$ less dependent on the number of the triggering neighbours. A similar effect can be observed in the proposed switched RC model when the time constant $\tau_2$ is much longer than $\tau_1$. The propagation gate shown in Fig. 8 was designed in a standard 90 nm CMOS technology. The post-layout simulations of the array consisting of $33 \times 33$ cells, when the wave is triggered from the centre, are presented in Fig. 9.



Fig. 9. The observed propagation contours in the VLSI array approximating various distance norms: a) City Block (p = 1), b) Manhattan/Euclidean (1 < p < 2), c) Euclidean (p = 2), d) Euclidean/Chessboard (p > 2)

In the proposed design, the transistor parameters are not critical in terms of the correct operation of the array; however, several issues of submicron CMOS technologies should be addressed. The proposed OR gate, realized as a dynamic logic circuit, may suffer from significant charge leakage, which may corrupt the initial state of the circuit. Using the weak keeping transistor $M_8$, together with $M_6$ operating in weak inversion when $V_P$ is set to about 150 mV for the evaluation phase, properly balances the off-leakage currents of the NOR pull-down network and the $M_7$ pull-up transistor, preserving the initial state of the gate. Transistor scaling also reduces the local random MOS parameter variation (the fabrication mismatch), which in practice affects the regularity of the observed wave contour. The operation of the designed array was verified using the mismatch Monte Carlo and corner MOS transistor models provided by the foundry. The proposed circuit array operates correctly when accounting for the fabrication mismatch (wave-front irregularities not higher than +/- 1 pixel) and across the PVT corners; however, it should be noted that long-distance spatial parameter variations may affect the operation of large arrays.

The proposed OR gate was used in the design of the propagation and collision detecting array introduced in [8], where, in addition to the propagation gate, there is also a collision detecting AND-LATCH gate capable of latching the collision events when a given cell receives signals from all its neighbours simultaneously (which happens to the cells located at the points where two wave fronts meet frontally). The post-layout simulation results of the array consisting of $96 \times 96$ cells, showing the collisions between waves triggered from several different points and detected by the circuit, are shown in Fig. 10 (an example of binary image tessellation). Due to the simplicity of the collision detecting circuit (necessary for the low area occupation), the obtained tessellations are not perfect when operating in the Manhattan or Chessboard norms (non-frontal collisions occur). However, using the proposed asynchronous processing scheme, circular waves can easily be generated and the very simple collision detecting mechanism becomes sufficient for aiding fast image processing in pixel-parallel cellular processor arrays.



Fig. 10. Waves triggered from several points propagating in different metrics: a) intermediate state of the propagation, b) obtained collision lines

## VI. CONCLUSIONS

In this paper the principles of trigger-wave propagation in cellular four-connected circuit arrays emulating different distance measure norms are explained and verified using an array of ideal switched *RC* circuit models. The proposed propagation gate, consisting of only 13 transistors, was designed in a standard 90 nm CMOS technology. The corresponding VLSI gate array can operate in an arbitrary *p*-norm, aiding global image processing tasks, e.g. skeletonization, tessellation and distance transformation, where operation in the specified (e.g. Euclidean) metric is essential.

## REFERENCES

- [1] H. Blum, "A transformation for extracting new descriptors of shape", in Models for Perception of Speech and Visual Form (W. Wathen-Dunn, Ed.), pp. 362-380, MIT Press, Cambridge, Mass., 1967.
- [2] V. Krinsky, V. Biktashev, N. Efimov, "Autowaves Principles for Parallel Image Processing", Physica D, vol. 49, pp. 247-253, 1991.
- [3] U. Montanari, "A Method for Obtaining Skeletons Using a Quasi-Euclidean Distance", Journal of the ACM, vol. 15, no. 4, Oct. 1968.
- [4] G. Borgefors, "Distance Transformations in Digital Images", Computer Vision, Graphics and Image Processing, vol. 34, pp. 344-371, 1986.
- [5] S. Razmjooei, P. Dudek, "Approximating Euclidean Distance Transform with Simple Operations in Cellular Processor Arrays", IEEE Workshop on Cellular Nanoscale Networks and Applications, CNNA 2010, Berkeley, Feb. 2010.
- [6] P. Dudek, "An Asynchronous Cellular Logic Network for Trigger-Wave Image Processing on Fine-Grain Massively Parallel Arrays", IEEE Trans. Circuits Syst. II, vol. 53, no.5, pp. 354-358, May 2006.
- [7] A. Lopich, P. Dudek, "Asynchronous Cellular Logic Network as a Co-Processor for a General-Purpose Massively Parallel Array", Int. J. Circuit Theory Appl., DOI: 10.1002/cta.679, 29 April 2010.
- [8] P. Mroszczyk, P. Dudek, "Trigger-Wave Collision Detecting Asynchronous Cellular Logic Array for Fast Image Skeletonization", ISCAS 2012, May 2012.
- [9] J. E. Eklund et al., "VLSI Implementation of a Focal Plane Image Processor - A Realization of the NSIP Concept", IEEE Trans. Very Large Scale Integration (VLSI) Syst., vol. 4, no. 3, Sep. 1996.
- [10] C. Rekeczky, L. O. Chua, "Computing with Front Propagation: Active Contour and Skeleton Models in Continuous Time CNN", J. VLSI Signal Process, vol. 23, pp. 373-402, 1999.
- [11] R. Carmona Galan et al., "A Bio-Inspired Two-Layer Mixed-Signal Flexible Programmable Chip for Early Vision", IEEE Transactions on Neural Networks, vol. 14, no. 5, pp. 1313-1336, Sep. 2003.