PARALLEL PROCESSING IN A CONTROL SYSTEMS ENVIRONMENT

© Prentice-Hall International, London, 1993, 364 pages. Series in Systems and Control Engineering, ISBN 0-13-651530-4, Price: £45.

Editors:

ERIC ROGERS

Advanced Systems Research Group, Department of Aeronautics and Astronautics, University of Southampton

YUN LI

Department of Electronics and Electrical Engineering, The University, Glasgow

CONTENTS

List of Contributors xiii

Preface xi

PART I - SYSTOLIC ARRAYS

The first generation, still widely used, of digital control systems implementations are based on the single general-purpose processor architecture arising from the von Neumann model. Further, the development of these systems, measured in terms of increased speed and improved performance, has been historically linked to developments in computing technology. This has led to the use of special-purpose digital signal processors and, subsequently, control-dedicated 'digital control processors', both of which use the uniprocessor architecture.

Rapid developments in very large scale integration (VLSI) circuit technology now permit the fabrication of multiple processing elements (PEs) onto a single silicon chip. This, in turn, has led to the development of architectures in the form of systolic and wavefront arrays which have been the subject of intensive research, motivated by a number of computationally intensive applications areas. For example, in digital signal processing, an area which shares strong structural links with control, these arrays have been applied to problems such as correlation and convolution, infinite impulse response filters, fast Fourier transform, orthogonal triangularization and LU decomposition.

In effect, a systolic array is a network of processors which rhythmically computes and passes data through the network. Its major features are (a) modularity and regularity n terms of VLSI implementation, (b) linear-rate pipelinability, (c) spatial and temporal locality, (d) data input to the array only pipelined through boundary PEs, and (e) synchrony of data flow and computation. A wavefront array has the same architecture but here the control of data flow and computation is self-timed and data-driven, rather than controlled by a synchronized global clock. At word-level, these two types of architecture directly map an algorithm onto the hardware PE level and in this section they are applied to control problems.

After a brief review of previous work and the relevant features of systolic/wavefront arrays, Chapter 1 considers the design of these architectures for a generic class of real-time feedback control schemes. A crucial point stressed here is that, in contrast to other areas of application, both processing speed (or throughput rate) and processing delay (or system latency) must be included in measuring the performance of a parallel architecture for real-time feedback control schemes. Chapter 2 then gives a systematic treatment of the design of systolic arrays for adaptive control schemes and includes architectures for least-squares estimation and predictive control. Finally, Chapter 3 develops systolic architectures for the Kalman filter whose various forms have found wide applications in control/digital signal processing. Hence this chapter should be seen as complementing the already existing designs.


1. VLSI SYSTOLIC/WAVEFRONT ARRAY PROCESSING FOR RECURSIVE FILTERING AND CONTROL

Y. Li and E. Rogers 3

1.1 Introduction 3

1.2 Preliminaries 3

1.2.1 Background and requirements 3

1.2.2 Related work 7

1.2.3 Systolic/wavefront arrays 9

1.2.4 A typical control computation 11

1.3 Architectural considerations 13

1.3.1 State augmentation and predictive control 13

1.3.2 Graph reversal and the semi-systolic array 15

1.3.3 Transfer-function verification 18

1.4 Systolic and wavefront architectures 19

1.4.1 Use of existing architectures 19

1.4.2 Type 1 architecture 21

1.4.3 Type 2 architecture 25

1.4.4 Wavefront architectures 27

1.5 Conclusions and further work 29

References 34


2. SYSTOLIC ARCHITECTURES FOR ADAPTIVE CONTROL

L. Chisci and G. Zappa 36

2.1 Introduction 36

2.2 Parallelization of adaptive control systems 38

2.3 Basic algorithm and systolic architecture 42

2.3.1 The basic algorithm 42

2.3.2 The basic systolic array 46

2.4 The recursive parameter estimator 51

2.5 Controller design 55

2.6 Systolic implementation of predictive controller design 61

2.6.1 Implementation I: orthogonally connected trapezarray 61

2.6.2 Implementation II: hexagonally connected triarray 64

2.6.3 Performance analysis 69

2.7 Conclusions 69

References 70


3. PARALLEL PROCESSING FOR KALMAN FILTERING

R.W. Stewart 72

3.1 Introduction 72

3.2 The Kalman Filter 73

3.2.1 Kalman filtering equations 74

3.2.2 Covariance filter 74

3.2.3 Information filter 75

3.2.4 Numerical properties of the Kalman filter 76

3.3 Direct parallel Kalman filter 77

3.3.1 Covariance Kalman latency 77

3.3.2 Parallel array implementation 77

3.3.3 Numerical implementation 80

3.4 Orthogonal matrix algebra for Kalman filtering 80

3.4.1 Least squares using QR decomposition 81

3.4.2 Givens transformations 82

3.4.3 Parallel QR triarray 84

3.4.4 Square root free Givens transformations 84

3.4.5 Numerical properties of Givens transformations 86

3.4.6 Cholesky decomposition 87

3.5 Least-squares based Kalman filter parallel arrays 88

3.5.1 Least-squares formulation 89

3.5.2 Recursive least-squares formulation 89

3.5.3 A triangular parallel array 91

3.5.4 Noise pre-whitening 93

3.5.5 Alternative designs 95

3.6 Square root Kalman filtering 95

3.6.1 Numerical properties 95

3.6.2 Cholesky factorization of the Kalman equations 96

3.6.3 Parallel square root Kalman filter array 97

3.6.4 LDU matrix square root Kalman filtering 98

3.7 Hardware parallel Kalman filter implementations 100

3.7.1 A dedicated Kalman processing element 100

3.7.2 Parallel array partitioning 101

3.8 Conclusions 102

References 104

PART II - ARCHITECTURES FOR INTELLIGENT CONTROL

Intelligent control is a general area which is receiving increasing attention by the general control systems community. In general, the implementation of these schemes can impose a heave computational load and this, coupled with their inherent parallelism, requires a parallel treatment. Amongst the various candidate architectures, artificial neural networks (ANNs) are currently the subject of much research effort and, in effect, provide a self-learning capability and ultra-high processing speed by massive parallelism. Typically, an ANN consists of a very large number of processing elements - the neurons - interconnected by weights which can be updated, or trained, during operation. The actual network topology varies according to the choice of various models and activation rules.

In the context of the work reported here, the well-known back-propagation model can be interpreted as a 'partitioned complete graph' with each partition termed a layer. Further, every neuron within a layer is run in parallel and every layer is run in a pipeline. Alternatively, in the Hopfield model all neurons are interconnected to each other with no topological hierarchy.

These networks can be implemented in either hardware or software. Further, their self-learning property, in particular, offers the promise of powerful techniques, in comparison to existing techniques such as adaptive schemes, for solving nonlinear control problems. The three chapters in this section are a cross-section of current 'state of the art' research on the application of ANNs to control problems.

Chapter 4 critically examines the roll of ANNs in the important general area of processing control with emphasis on currently difficult problems in modelling, estimation and prediction. Their use as 'software sensors' is also examined. Chapter 5 develops the multivariate B-spline based ANN for adaptive e or learning controllers involving least-squares estimation, stochastic approximation and nonlinear time series prediction.

The work of Chapter 5 can also be viewed as extracting useful concepts from fuzzy logic based systems and a comparison between these two types of systems is also included. This leads naturally on to Chapter 6 where the group method of data handling is mapped onto a multilayer perceptron type configured parallel network, and a detailed treatment of some related self-organizing systems and fuzzy logic based control schemes is given.


4. ARTIFICIAL NEURAL NETWORKS: A POSSIBLE TOOL FOR THE PROCESS ENGINEER

M.J. Willis, G.A. Montague, A.J. Morris and M.T. Tham 109

4.1 Introduction 109

4.2 Process modelling via artificial neural networks 110

4.2.1 Network training 111

4.2.2 Topology selection procedure 114

4.2.3 A simple comparison of the training paradigms 116

4.3 Dynamic modelling using a FANN 117

4.4 Industrial application results 120

4.4.1 Artificial neural networks in estimation 120

4.4.2 Artificial neural networks in control 122

4.4.3 Multi-input single-output (MIMO) neural network based control 125

4.4.4 Multi-input multi-output (MIMO) neural network based control 129

4.5 Conclusions 131

Acknowledgments 132

References 132


5. THE B-SPLINE NEUROCONTROLLER

M. Brown and C.J. Harris 134

5.1 Introduction 134

5.2 Inverse plant modelling 136

5.2.1 Parameter adaptation 137

5.3 Backwards error propagation for multilayer perceptions 140

5.3.1 The sigmoid 140

5.3.2 Multilayer networks 142

5.4 B-splines for nonlinear modelling 143

5.4.1 Polynomial basis functions 144

5.4.2 Computational cost 147

5.4.3 Input space metrics 148

5.4.4 Parallel implementation 150

5.4.5 Similarity with the Albus CMAC and fuzzy logic 150

5.5 Weight adaptation 152

5.5.1 Recursive least squares 152

5.5.2 Least-mean squares and normalized least-mean squares 153

5.5.3 Stochastic approximation LMS and NLMS 155

5.5.4 The Albus CMAC updating rule 156

5.5.5 Parameter convergence over a restricted domain 157

5.6 An example: nonlinear times series prediction 159

5.6.1 Piecewise linear B-splines 161

5.6.2 Piecewise linear and quadratic B-splines 162

5.6.3 Comparison with radial basis functions 162

5.7 Conclusions 164

Appendix 165

References 166


6. PARALLEL PROCESSING FOR SELF-ORGANIZING CONTROL SYSTEMS

D.A. Linkens 168

6.1 Introduction 168

6.2 Group method of data handling 169

6.2.1 Selection criteria 172

6.2.2 GMDH and parallel processing 172

6.3 Self-organizing control

6.3.1 Elementary SOC and principle of operation 176

6.3.2 High-speed SOC for multiple goal/multiple actuator

control 177

6.3.3 Adaptive learning control 180

6.4 Fuzzy logic control 183

6.4.1 Sequential fuzzy logic control 185

6.4.2 Parallel fuzzy logic control 188

6.4.3 Self-organizing fuzzy logic control 188

6.4.4 Simulation of simple fuzzy logic control 194

6.4.5 Simulation of self-organizing fuzzy logic control 196

6.5 Conclusions 203

Acknowledgments 204

References 204

PART - III TRANSPUTER NETWORKS

Instead of silicon fabrication, VLSI-oriented architectures, such as those detailed in the previous sections, can also be mapped onto, and simulated by, a parallel network configured by the user from commercially available microprocessors. This has obvious cost benefits but also retains benefits arising from the original architecture. These include modularity, spatial and temporal locality and boundary input/output (I/O) interfacing from the systolic array.

The application of processor networks to control engineering can be traced back to 1970s when decentralized and distributed machines began to be used in implementations. Currently the design and application of transputer based networks is a very active subject across a very broad spectrum of areas. This section contains a representative cross-section of control oriented applications.

In effect, a transputer is a microprocessor with a reduced instruction set computer (RISC) architecture, which supports parallelism at the hardware level and can be used as a 'building block' for concurrent networking. Basically, all members of the transputer family have a main processor, local memory, external memory interface and four intertansputer communication 'links' integrated on a single silicon chip, together with external interrupt and internal timers for real-time applications. The most important feature of the transputer is, from the parallel processing point of view, the on-chip inclusion of four communication links, which permit 'easy' construction of concurrent networks and 'fast; intertransputer communications. An in-depth treatment of the transputer system can be found in, for example, the references cited in Chapters 9 and 10.

Occam is the 'companion language' of the transputer system. This is a high-level language but is the lowest-level coding dedicated to transputers. It can directly map and match a concurrent procedure/module in software to one transputer in the hardware network. Once source of an in-depth treatment of the features of Occam is again the references cited in Chapters 9 & 10. Note also that other languages such as Parallel C, Pascal and Fortran can be run on a transputer network under certain compilers. Given that Occam is an 'easy to read' code, it, rather than other languages or pseudo-codes, is used to demonstrate the algorithms developed in the chapters of this section.

Applications to the control of robotic systems is the subject of Chapter 7 which consists of three main sections. These are (a) problem definition for the UMI RTX robotic manipulator, (b) system specification and high-level design issues using the DeMarco methodology, and (c) implementation issues. A case study is used to highlight the very strong benefits of using transputers in this very topical applications area and the ease with which they can be interfaced with other integrated circuit devices.

Target tracking, an area which should benefit significantly from appropriate application of parallel processing, is the subject of Chapter 8. The particular aspect of this very wide ranging area considered is the problem of tracking a large number of objects using information, corrupted by noise, from one or more sensors. further, the work detailed here on track partitioning, distribution and clustering configuration should also be of interest in developing similar type parallel implementations of other schemes.

Chapter 9 is based on the so-called heterogeneous system approach to parallel control and simulation problems. The motivation for this is other work which has revealed shortcomings in the ability of the transputer to cope with the demands of real-time control software. In particular, there is evidence to suggest that the architecture granularity (a measure of computation power to interprocessor communications overhead) is not appropriately matched. This, in turn, suggests that fusing the finer granularity of parallel digital signal processing (DSP) chips, such as the IMS A100, with the transputer's ability to handle irregular computations and manage parallel operations should prove highly effective.

Use of parallel processing enables the computations to be organized in a distributed sense. Hence it is possible t provide a fault tolerance capability, i.e. an operational failure results in performance degradation rather than complete operational failure. This is one of the major benefits of 'parallel processing for control; and is the subject of Chapter 10. Both software (Occam based) and hardware (transputer systems) methods are developed, including some configured to operate in real time.


7. DESIGN AND IMPLEMENTATION OF A TRANSPUTER BASED ROBOT CONTROL SYSTEM

M.I. Barlow, S.E. Burge and A.P. Roskilly 209

7.1 Introduction 209

7.2 Problem definition 210

7.2.1 The UMI RTX robotic manipulator 210

7.2.2 User requirements 211

7.3 Control system specification/design methodology 211

7.3.1 The DeMarco technique 212

7.3.2 DeMarco analysis of robot systems requirements 213

7.3.3 Partitioning of processes 218

7.4 System design and implementation 219

7.4.1 Software implementation 219

7.4.2 Hardware design and implementation 222

7.4.3 Control strategies 230

7.4.4 Parallelism and transputer assignments 230

7.5 Hints and caveats 231

7.6 Conclusions 232

References 232


8. DEVELOPMENT OF A MULTIPLE TARGET TRACKING ALGORITHM ON TRANSPUTERS

D.P. Atherton and D.M.A. Hussain 234

8.1 Introduction 234

8.2 Problem definition 236

8.2.1 The data association problem 236

8.2.2 State estimation 236

8.2.3 Initialization of the filters 239

8.3 The Track Splitting Algorithm 242

8.3.1 Track continuation 242

8.3.2 Track pruning 243

8.4 Parallel implementation 245

8.4.1 Track distribution configuration 247

8.4.2 Space partitioning configuration 250

8.4.3 Track clustering configuration 251

8.4.4 Intelligent tracking algorithm 252

8.5 Performance comparison 253

8.6 Conclusions 257

References 257


9. IMS A100/TRANSPUTER BASED HETEROGENEOUS ARCHITECTURES FOR EMBEDDED CONTROL PROBLEMS

Y.Li and E.Rogers 258

9.1 Introduction 258

9.2 A typical control computation and basic architectures 259

9.3 Transputer based coarse-grain architectures 263

9.3.1 Transputer basics 263

9.3.2 A synchronous architecture 265

9.3.3 A data-driven architecture 266

9.4 Transputer based heterogeneous architectures 267

9.4.1 Type 1 architecture 268

9.4.2 Type 2 architecture 271

9.4.3 Application to self-tuning/adaptive control 272

9.5 Conclusions 275

Appendix A 276

Appendix B 277

Appendix C 278

Appendix D 280

References 281


10. FAULT-TOLERANT PARALLEL SYSTEMS FOR CONTROL APPLICATIONS

A.M. Tyrrell 282

10.1 Introduction 282

10.2 Errors, faults and failures 284

10.3 Acceptance test design 285

10.4 Hardware methods for fault tolerance 285

10.5 Software methods for fault tolerance 287

10.5.1 Fault masking by N-version redundancy 287

10.5.2 Fault tolerance by backward error recovery 288

10.5.3 Fault tolerance by forward error recovery 288

10.6 Fault tolerance in parallel systems 289

10.6.1 Parallel systems 289

10.6.2 Fault-tolerant structures for parallel systems 290

10.6.3 Backward error recovery 290

10.6.4 Forward error recovery 292

10.7 Real-time methods 296

10.7.1 Parallel watch-dog mechanism 297

10.7.2 Busy polling watch-dog 299

10.7.3 Scheduling 299

10.8 A case study 301

10.8.1 Introduction to the problem 301

10.8.2 Distribution of tasks 302

10.8.3 Design of acceptance tests 302

10.8.4 Faults injected 303

10.8.5 Results 304

10.9 Conclusions 306

References 307

PART IV - PARALLEL MACHINES AND ALGORITHMS

An alternative to VLSI-oriented systems or the construction of user-configured networks using 'building block' type processors, is to employ existing parallel machines. An early example of these systems is the vector processor, of which the Cray is an example; but such systems do not fully exploit parallelism. A single-instruction multiple-data (SIMD) system does permit full parallelism and examples here include the Illiac IV and the ICL Distributed Array Processor. further, tin the CDC Cyber 205, pipelining techniques are combined with parallelism to increase efficiency and yield higher performance. They are, however, less suitable for general-purpose computing where the data are not inherently in the form of 'large' uniform arrays.

In contrast to SIMD systems, multiple-instruction multiple-date (MIMD) machines are best suited for general-purpose parallelism. This type of 'shared memory multiprocessor' is more cost effective but less scaleable than 'distributed memory multicomputers', since adding processors to a shared memory system can suffer from bus saturation. Further, the scaleability of a distributed memory system is strongly dependent on the topology used.

One of the most common and successful topologies is the hypercube architecture, which provides the best trade-off between the longest path between processors and the number of physical connections each processor must have. In the case of an n-dimensional machine, for example, the architecture typically consists of 2n processors, each of which is nearest-neighbor connected by n bidirectional and asynchronous communication channels. Examples include the Intel iPSC Ametek System 14, NCUBE/TEN and the Connection Machine from Thinking Machines.

Of the various parallel machines, user reconfigurable systems provide better flexibility for different applications. Further, they also allow topology reconfiguration by programs and are best suited for the design and simulation of VLSI-oriented architectures, such as those developed in Parts I and II. An early example is the Accelerated Processors Model 10 system which has between 4 and 12 grooves of 8 arithmetic logic units. Recent examples based on transputers include the Microway Quadputer, the Meiko Computing Surface System and the Parsytec clusters, and Giga Cube machines. These machines are also well suited to the development of the architectures of Part III.

Chapter 11 presents an overview of the hardware of various parallel machines from a control applications standpoint. Strong emphasis is placed on interprocessor connection/communication schemes and, in particular, the hypercube case. This is supported by brief case studies on process and vehicle control systems application.

In Chapter 12, the subject is the use of currently available parallel machines to develop software for control systems design. This take the form of a tutorial-style survey of the current 'state of the art', plus some open research problems using, as an illustrative basis, some algorithms and programming techniques for use with hypercubes. A very important point emerging here is that the development of parallel algorithms for (commonly used) control systems design algorithms is not simply a case of modifying existing (sequentially based) software to run in parallel.


11. PARALLEL COMPUTING ARCHITECTURES AND MACHINES FOR TIME-CRITICAL CONTROL APPLICATIONS

K.J. Hunt 311

11.1 Introduction 311

11.2 Parallel architectures and machines 312

11.2.1 Bus based connections 312

11.2.2 Crossbar connections 313

11.2.3 Hypercube connections 314

11.2.4 Multistage switch connections 316

11.2.5 Transputer based systems 317

11.3 Case studies 317

11.3.1 Case study 1: BBN (process control) 317

11.3.2 Case study 2: Alliant (vehicle simulation/control) 320

11.4 Conclusions 320

References 321


12. USING PARALLEL ALGORITHMS IN THE DESIGN OF CONTROL SYSTEMS

E. Rogers and Y.Li 322

12.1 Introduction 322

12.2 Applications areas 323

12.2.1 2S systems 323

12.2.2 Large space structures 327

12.3 Hypercubes 330

12.4 Algorithms 335

12.4.1 Multivariable frequency response matrix 335

12.4.2 Generalized algebraic Riccati equation 341

12.5 Conclusions 351

Appendix 353

References 355


INDEX 357