DSP Chips

1.  Introduction

This note assesses the technical capabilities of DSP processor implementations based on general-purpose standard parts, commercially available single-function custom processors and application-specific integrated circuits (ASICs).

2.  Sampling and quantisation

At the input of any digital processor the analogue signal must be sampled and quantised into a multi-bit word which represents the magnitude of the signal sample. There is a wide range of analogue-to-digital converter (ADC) techniques: successive approximation, sigma-delta and parallel (flash) approaches, which offer different conversion rates and accuracies. Table 1 and [1] list these for commercially available parts, giving typical accuracy and maximum conversion rates. The fastest converters are the flash type where, for an n-bit output word, 2^n - 1 comparators are required in the ADC. Here the conversion rate is closely related to accuracy, Table 1, and for short word-lengths sophisticated high speed comparator designs can be contemplated.

Converter type              Accuracy         Conversion rate
                            (no. of bits)    (ksample/s)
Successive approximation    10               50
                            12               100
Sigma-delta                 18               48
Serial-parallel             12               20,000
                            4                30,000,000
Parallel (flash)            8                1,000,000
                            14               10,000
Table 1 Commercial ADC parts.
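
As a minimal illustration of the word-length trade-off (a sketch, not a model of any part in Table 1), the following Python fragment computes the 2^n - 1 comparator count of a flash converter and applies an ideal uniform quantiser. The function names, the full-scale convention and the clipping behaviour are assumptions made for this example.

```python
def flash_comparator_count(n_bits: int) -> int:
    """Comparators needed by an n-bit flash ADC: 2^n - 1."""
    return 2 ** n_bits - 1


def quantise(sample: float, n_bits: int, full_scale: float = 1.0) -> int:
    """Map a sample in [-full_scale, +full_scale] to a signed n-bit code.

    Assumption: an ideal uniform quantiser that clips at the code limits.
    """
    levels = 2 ** (n_bits - 1)            # codes run from -levels to levels - 1
    code = round(sample / full_scale * levels)
    return max(-levels, min(levels - 1, code))


for n in (4, 8, 14):                      # illustrative word-lengths
    print(n, "bits ->", flash_comparator_count(n), "comparators")
```

The comparator count roughly doubles with every extra bit, which is one reason why very fast flash converters tend to have short word-lengths.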

For audio applications there is now a preference for the sigma-delta converters of Table 1. Table 2 gives more details on specific devices, with references to the Journal of Solid-State Circuits (JSSC), the International Solid-State Circuits Conference (ISSCC) and the Journal of the Audio Engineering Society (JAES).

Bandwidth (kHz)   fs (MHz)   Resolution (bits)   Application      Reference
4                 4          13                  speech           JSSC 12/88
4                 1          13                  speech           JSSC 4/89
20.5              5.25*      16                  audio            JSSC 6/93
24                6.14*      18                  audio            JAES 3/86
40                10.24      14                  ISDN             JSSC 4/89
100               3.25*      15                  cellular radio   ISSCC 94
1,000             50         12                  ultrasound       ISSCC 91
* These converters use a quantiser with more than a single bit.

Table 2 Commercial sigma-delta ADCs

3.  DSP processor functions

The main processing elements in a matched filter receiver are the matched filter or correlator to detect the coded spread sequence, the synchroniser to ensure correct timing information, an adaptive anti-multipath processor to accommodate time-varying, Doppler-shifted fading signals and possibly some form of multiple-access interference suppression.

Figure 1 Active correlator receiver.

There are two basic approaches to realising the code matched filter. The simplest is the active correlator, Figure 1, where the received spread sequence is multiplied chip-by-chip with an accurately synchronised receiver reference (spreading code) sequence. This is very simple to implement, but initial synchronisation is slow. The alternative is a full finite impulse response (FIR) matched filter, Figure 6.1, whose correlation peak both verifies that the coded sequence has been received and, through the timing of the output pulse, provides the synchronisation information.

Figure 2 Alternative finite impulse response filter design to the Figure 6.1 structure.
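
The difference between the two approaches can be sketched in a few lines of Python. The 7-chip Barker sequence, the 5-sample delay and the function names are illustrative assumptions, not details of the figures. The active correlator produces one number and is only meaningful when the reference is already aligned, whereas the FIR matched filter produces an output for every input sample, so its peak position reveals the timing.

```python
CODE = [1, 1, 1, -1, -1, 1, -1]     # 7-chip Barker sequence (illustrative)


def active_correlate(received, reference):
    """Chip-by-chip multiply and accumulate over one code period."""
    return sum(r * c for r, c in zip(received, reference))


def matched_filter(received, code):
    """FIR filter whose taps are the time-reversed code."""
    taps = code[::-1]
    output = []
    for n in range(len(received)):
        acc = 0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * received[n - k]
        output.append(acc)
    return output


delay = 5                                        # unknown to the receiver
received = [0] * delay + CODE + [0] * 4

output = matched_filter(received, CODE)
peak_index = max(range(len(output)), key=lambda i: output[i])
print("matched filter peak", output[peak_index], "at sample", peak_index)

# The active correlator only gives the full peak when correctly aligned.
print("aligned:   ", active_correlate(received[delay:delay + 7], CODE))
print("misaligned:", active_correlate(received[delay + 1:delay + 8], CODE))
```

In this sketch the peak appears at sample delay + 6, that is, when the last chip of the code has entered the filter.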

The RAKE anti-multipath processor is also based on similar correlator or FIR filter architectures, Figure 6.1, and the interference suppression processor uses not dissimilar components. The heart of all these functions is a multiply and accumulate (MAC) processing element, Figure 3, which is controlled from program memory and which accesses data samples, x(n), and filter coefficients, h(n), from associated memory. The performance benchmarks for ASICs can be found by examining the MAC processing capabilities of standard DSP microprocessors, Table 3.

Figure 3 Multiply and accumulate (MAC) processing element.
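
A software model of the MAC element may help to make the operation count concrete. This is a minimal sketch under assumed names (MacFir, step, mac_count are hypothetical); it stores the data samples x(n) in a delay line, holds the coefficients h(n) in a list and performs one multiply and accumulate per tap for each new input sample.

```python
from collections import deque


class MacFir:
    """FIR filter built from a single multiply and accumulate loop."""

    def __init__(self, h):
        self.h = list(h)                                  # coefficients h(n)
        self.x = deque([0] * len(self.h), maxlen=len(self.h))  # data memory
        self.mac_count = 0                                # MAC operations so far

    def step(self, sample):
        """Accept one input sample and return one output sample."""
        self.x.appendleft(sample)        # the oldest sample drops off the end
        acc = 0
        for coeff, data in zip(self.h, self.x):
            acc += coeff * data          # one MAC operation
            self.mac_count += 1
        return acc


fir = MacFir([1, 2, 3])
impulse_response = [fir.step(s) for s in (1, 0, 0, 0)]
print(impulse_response, "using", fir.mac_count, "MAC operations")
```

Each output sample costs as many MAC operations as there are taps, which is why the MAC time is the natural benchmark for filtering throughput.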

The processors in Table 3 have a range of MAC rates from 10 MHz (TMS 320C25) to 40 MHz (TMS 320C50), and this computational rate is continuously increasing with new chip launches and the reduction in semiconductor feature size each year. One of the major choices is between fixed-point and floating-point arithmetic, and hence the complexity and cost of the MAC design. Fixed-point processors are cheaper, ranging from £10 to £50 per part. Floating-point processors can be obtained for £25 but the average cost is nearer to £100. Generally speaking, the correlator and filtering functions of Figures 1 and 6.1 can be readily accomplished in fixed-point processors without introducing significant quantisation distortion. Recursive estimation and matrix operations, such as are used in adaptive filters, Chapter 9, tend to be the main consumers of floating-point hardware. Power consumption is of paramount importance for hand-held devices, and here the relative simplicity of the fixed-point processor has distinct advantages. Another major factor is the ready availability of software to code up the DSP algorithm. Here the choice of processor may be made in favour of a vendor with readily available software, such as cross-compilers, so that existing software developments can be reused.

Company          Part          MAC time (ns)   No. bits     No. bits
                                               fixed pt.    float pt.
AT&T             DSP16         55              16/36
                 DSP16A        33              16/36
                 DSP32         160             16           32/40
                 DSP32C        80              16 or 24     32/40
Motorola         DSP56001      74              24/56
                 DSP96002      75              32/64        44/96
Texas Inst.      TMS320C10     114-280         16/32
                 TMS320C25     80-100          16/32
                 TMS320C50     25-50           16/32
                 TMS320C30     50-75           24/32        32/40
                 TMS320C40     40-50                        32/40
                 TMS320C6201   5               32
Analog Devices   ADSP2100      125             16/40
                 ADSP2100A     80              16/40
                 ADSP21065L    6               32           32/40

Table 3 Widely used DSP microprocessor devices.

Processor simulation is often performed in high-level C-code software routines. Ultimately it is necessary to verify the design on real signals in order to model a complete receiver in detail. This stage will often be undertaken using the DSP microprocessors of Table 3, with the code cross-compiled into assembly language and the DSP chip incorporated into a development board such as the products listed in Table 4. This enables the system concept to be verified before committing a processor design to an ASIC production phase.

Company                             Product   Computing host   Constituent processor
Ariel                                         PC, SUN          DSP 56001
                                              & Macintosh      DSP 32
Atlanta Signal Processors (ASPI)    Several   PC               TMS 320 Series
DSP Research                        TIGER     PC               TMS 320C30
Image and Signal Processing (ISP)   Point     PC               TMS 320C30
                                                               TMS 320 Series
Loughborough                        Full      PC, SUN          DSP 32
Sound Images (LSI)                  range                      DSP 56000

Table 4 DSP board products.

4.  Coded matched filters

A spread spectrum receiver requires a linear-phase filter design so that the individual code chips can be delayed and summed to recognise and despread the received coded waveform. This favours the use of the linear-phase FIR design of Figure 6.1.

The requirement for linear-phase filtering, and the recognition that the FIR filter requires a large number of stages (32-256) to achieve a useful frequency response, have placed limitations on the use of the standard DSP parts in Table 3 for FIR filter implementation. With a single MAC element, the minimum input sample period is the product of the MAC time and the number of taps in the filter. Thus several manufacturers have developed specialist high sample rate FIR filter parts, Table 5. These recognise that the feedforward FIR filter can often use lower accuracy in the tap weights than the 16- or 24-bit fixed-point provision, and this permits the design of faster, lower precision MAC elements, more than one of which can be incorporated into the chip. Table 5 lists a range of FIR filter parts, developed over the last decade, with typically 8-32 taps per integrated circuit. These are usually cascadable to increase the filter order. These chips all use the alternative FIR architecture of Figure 2 in place of Figure 6.1.
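
The sample rate limit can be checked with a short calculation. The sketch below assumes a single MAC element and ignores any overhead for data movement or control, so the figures are upper bounds. The function name and the tap counts are illustrative, while the MAC times are taken from Table 3.

```python
def max_sample_rate_hz(mac_time_ns: float, n_taps: int) -> float:
    """Upper bound on the input sample rate for one MAC element.

    Assumption: sample period = MAC time x number of taps, no other overhead.
    """
    sample_period_s = mac_time_ns * 1e-9 * n_taps
    return 1.0 / sample_period_s


# MAC times from Table 3; tap counts chosen within the 32-256 range.
for part, mac_ns, taps in (("TMS320C50", 25, 64), ("DSP32C", 80, 256)):
    rate_khz = max_sample_rate_hz(mac_ns, taps) / 1e3
    print(f"{part}: {taps} taps at {mac_ns} ns per MAC -> {rate_khz:.1f} kHz")
```

Even the faster MAC time gives well under 1 MHz for a 64-tap filter, which is far below the 20-45 MHz sample rates quoted for the dedicated parts in Table 5.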

The Inmos A100 is interesting as it uses a serial-parallel multiplier implementation in which rate can be traded for arithmetic accuracy, Table 5. The filters in Table 5 typically combine 10 bits of input sample quantisation accuracy with 8-16 bit coefficient tap weight accuracy at sample rates of 2.5-45 MHz. They are thus fully compatible with the flash ADCs of Table 1 and, for the reduced processor accuracy, offer the same or an even faster computational rate than the single MAC processing element included in the DSP microprocessors of Table 3.

Figure 6.15 shows the degradation in performance for a FIR filter employing 12-bit and 8-bit filter coefficients compared with an infinite precision (i.e. floating-point arithmetic) design. Table 5 shows various products offered over the years by TRW, GEC Plessey Semiconductors (GPS), Inmos and Harris. The component from Marconi Electronic Devices Ltd (MEDL) is no longer available, as it had to use a silicon-on-sapphire process, rather than a simple silicon semiconductor process, to achieve this level of filter complexity in the mid-1980s. Small geometry silicon is now used exclusively for these products.

                                TRW        TRW        MEDL      Inmos     GPS          Harris
                                TDC 1028   TMC 2243   MA 7180   A100      PDSP 16256   HSP 43168
Number of taps                  8          3          9         32        16-128       16
Input data accuracy (bits)      4          10         10        16        16           10
Tap weight accuracy (bits)      4          10         8         4-16      16           10
Output signal accuracy (bits)   13         16         16/22     24        32           19
Sampling (MHz)                  20         20         20        10-2.5    25-3.1       45
Table 5 Examples of integrated digital FIR filters.
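
The effect of coefficient word-length mentioned above can be explored numerically. The sketch below is not a reproduction of Figure 6.15. It assumes a 33-tap Hamming-windowed low-pass design with a cutoff at one quarter of the sample rate, rounds the tap weights to 8 and 12 bits, and reports the worst-case magnitude of the resulting frequency response error. All function names and design parameters are illustrative assumptions.

```python
import math


def lowpass_taps(n_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sample rate."""
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        t = i - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(ideal * window)
    return taps


def quantise_taps(taps, n_bits):
    """Round each tap to a signed n-bit fraction (taps assumed below 1.0)."""
    scale = 2 ** (n_bits - 1)
    return [round(h * scale) / scale for h in taps]


def magnitude(taps, freq):
    """Magnitude of the frequency response at freq (fraction of sample rate)."""
    re = sum(h * math.cos(2 * math.pi * freq * k) for k, h in enumerate(taps))
    im = sum(h * math.sin(2 * math.pi * freq * k) for k, h in enumerate(taps))
    return math.hypot(re, im)


def worst_error_db(taps, n_bits, n_freqs=256):
    """Largest response error, in dB relative to unity gain, from 0 to fs/2."""
    quantised = quantise_taps(taps, n_bits)
    error = [h - q for h, q in zip(taps, quantised)]
    worst = max(magnitude(error, f / (2 * n_freqs)) for f in range(n_freqs + 1))
    return 20 * math.log10(worst) if worst > 0 else float("-inf")


ideal_taps = lowpass_taps(33, 0.25)
for bits in (8, 12):
    print(f"{bits}-bit coefficients: worst error {worst_error_db(ideal_taps, bits):.1f} dB")
```

In this example the error floor with 8-bit coefficients is higher than with 12-bit coefficients, in line with the qualitative behaviour described in the text.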

In many spread spectrum applications the coded sequence is binary, and the multipliers can then be simplified to have only single-bit (bipolar) tap weight control, replacing the multiplier operation with a multi-bit exclusive-or operation. The requirement for such FIR receivers has resulted in the development of the components in Table 6. One of these, the TRW TMC 2220, is optimised for complex processing of the in-phase (I) and quadrature (Q) demodulated signals. A major feature in Table 6 is that the simplification of the multiplier into the exclusive-or operation typically gives a fourfold increase in the filter length, extending it to a 64-tap capability. Note that some implementations use only single-bit input signal quantisation. This has been shown to incur only a 1 dB degradation in the code matched filter processing gain in the receiver.

                              TRW        TRW                 TRW        MEDL      ST TEL
                              1004       TMC 2220            TMC 2023   MA 7170   3310
Number of taps                64         32 (× 4)            64         64        64
Input data accuracy (bits)    1          4-real, 2-complex   1          4         3
Accumulator accuracy (bits)   analogue   10                  7          16        12
Sample rate (MHz)             10         20                  30         10        20
Table 6 Examples of integrated binary weighted correlators.
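
For the single-bit case, the replacement of the multiplier by an exclusive-or can be sketched as follows. The mapping of bit 0 to +1 and bit 1 to -1, the function names and the short test code are assumptions for illustration only. The correlation then equals the number of taps minus twice the number of disagreeing bit positions.

```python
def bipolar_correlate(rx_bits, code_bits):
    """Reference version: map bits to +1/-1 and multiply and accumulate."""
    def to_bipolar(bit):
        return 1 - 2 * bit              # 0 -> +1, 1 -> -1
    return sum(to_bipolar(r) * to_bipolar(c) for r, c in zip(rx_bits, code_bits))


def xor_correlate(rx_bits, code_bits):
    """Same result using exclusive-or and a count of disagreements."""
    disagreements = sum(r ^ c for r, c in zip(rx_bits, code_bits))
    return len(code_bits) - 2 * disagreements


code = [0, 0, 0, 1, 1, 0, 1]
received_clean = list(code)
received_one_error = [1, 0, 0, 1, 1, 0, 1]      # first chip inverted

for rx in (received_clean, received_one_error):
    print(xor_correlate(rx, code), bipolar_correlate(rx, code))
```

A single chip error lowers the correlation by two, and both functions agree for every input, which is why the hardware can dispense with true multipliers.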

In some applications there is a need to realise extremely long FIR filters (e.g. radar matched filtering), where the number of taps is 1,000 or greater. The FIR filter of Figures 6.1 and 2 requires N multiplies per input sample for an N-tap filter, precluding the use of the DSP microprocessors of Table 3, as the effective input sample rate is then only tens of kHz, which is too low.

Here, it is better to replace the convolution operation by a multiplication in the frequency domain, Figure 11.9, and deploy FFT processors to implement the DFT operations. This introduces block processing and necessitates FFTs of double the required filter length but, with FFTs, the overall number of MAC operations is reduced considerably.
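
The frequency domain approach can be sketched in pure Python as follows. This is a minimal illustration, not the Figure 11.9 design: it uses a simple recursive radix-2 FFT, zero-pads both sequences to a power-of-two length that is long enough to avoid circular wrap-around, and checks the result against direct convolution. All names are hypothetical.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. Unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def fast_convolve(x, h):
    """Linear convolution via X(k)H(k) and an inverse FFT."""
    out_len = len(x) + len(h) - 1
    n = 1
    while n < out_len:                  # zero-pad to avoid wrap-around
        n *= 2
    big_x = fft(list(x) + [0] * (n - len(x)))
    big_h = fft(list(h) + [0] * (n - len(h)))   # could be precomputed
    product = [a * b for a, b in zip(big_x, big_h)]
    y = fft(product, inverse=True)
    return [value.real / n for value in y[:out_len]]


def direct_convolve(x, h):
    """Reference time-domain convolution."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x = [1, 2, 3, 4]
h = [1, -1, 2]
print(direct_convolve(x, h))
print([round(v, 6) for v in fast_convolve(x, h)])
```

The zero-padding is the reason for the double-length transforms: a block of N samples convolved with N taps spreads over nearly 2N output points.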

Using the Figure 11.9 approach, a 4096-point FIR filter or convolver operating at the very impressive input data rate of 40 Msample/s was built in the late 1970s for a radar application. If the filter weights are fixed then the H(k) values can be precomputed and stored. For a 4096-point convolution the Figure 11.9 solution thus involves 8192×13×2 complex FFT MAC operations plus a further 8192 complex multiplies for the X(k)H(k) product operation. The total number of MAC operations is thus 8192×27, or about 0.22M, which is a considerable saving over the 4096^2 ≈ 16M operations of a conventional, Figure 6.1, FIR filter implementation.
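
The operation counts can be reproduced with a few lines of arithmetic. The sketch follows the counting used in the text (a double-length FFT and inverse FFT, each with log2 of the FFT length stages, plus one product per frequency bin) and assumes the filter length is a power of two. The function names are illustrative.

```python
import math


def fft_convolver_macs(filter_len: int) -> int:
    """MAC count per block for the frequency domain approach.

    Assumptions: FFT length is twice the filter length, H(k) is precomputed,
    and each FFT is counted as fft_len x log2(fft_len) operations.
    """
    fft_len = 2 * filter_len
    stages = int(math.log2(fft_len))
    fft_ops = fft_len * stages * 2       # forward and inverse transforms
    product_ops = fft_len                # X(k)H(k) multiplies
    return fft_ops + product_ops


def direct_macs(filter_len: int) -> int:
    """MAC count for filter_len outputs of a direct N-tap FIR filter."""
    return filter_len ** 2


n = 4096
print("FFT approach:", fft_convolver_macs(n))
print("Direct FIR:  ", direct_macs(n))
print("Saving factor: %.1f" % (direct_macs(n) / fft_convolver_macs(n)))
```

For the 4096-point case this gives a saving of roughly 75 times under the stated counting assumptions.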

This, plus the use of FFTs as spectrum analysers to recognise the presence of a carrier modulation for synchronisation, has spurred the development of integrated FFT chipsets, Table 7. This shows commercially available 1024- and 4096-point processors and an FFT development by CNET of France, compared to selected general purpose devices: the Analog Devices 21160 Sharc processor and the MPC 7400 (G4 AltiVec) RISC processor. With input sample rates of 10-20 MHz the integrated FFTs of Table 7 offer similar FIR filter speeds to the processors of Table 5, which is again much faster than the DSP microprocessor solution of Table 3.

                             GPS          SHARP             Fr TELECOM        AD      MPC
                             PDSP 16510   LH 9124/9320      (CNET)            21160   7400
No. transform points         1024         1024     4096     1024     8192     1024    1024
Input data accuracy (bits)   16           8-24     8-24     ?        ?        32      32?
Transform time (µs)          98           80.7     312      50       400      90      60
Input sample rate (MHz)      10           12       13       20       20       11      17
Complex FIR rate (ns/tap) 5 5

Table 7 Integrated FFT chipsets.

5.  Frequency generation

Table 8 provides typical performance details for some of the commercially available high switching speed synthesisers, which offer switching times of tens of ns. The GEC Plessey Semiconductors (GPS) synthesiser has a high output frequency but lower output accuracy in bits, and hence reduced spectral purity, compared to the other devices. The Stanford Telecom (ST TEL) synthesiser offers separate control of the output amplitude and phase. The Harris synthesiser has a larger number of control bits with consequently improved resolution capability. Resolution is given by the clock rate divided by 2^N, where N is the number of control bits, and hence it scales directly with the clock rate. All these synthesisers offer phase coherence from hop to hop for use in fast hopped spread spectrum systems.

                         Harris        ST TEL      GPS       Sciteq   An Dev   QCOM
                         HSP 45102/6   STEL 1179   SP 2001   DDS1     AD9955   Q2334
Output accuracy (bits)   12/16         12/13 A/o   8         12       12       12
No. of control bits      32            24          16        32       32       32
Max. clock (MHz)         25-40         25          350       25       100      50
Max. output (MHz)        10            10          100       11       12       10
Resolution (Hz)          0.0009        1500        5000      0.006    0.0015   0.005
Switching time (ns)      30            45          17        40       25       20?
Spectral purity (dBc)    -90           -75         -40       -60      -90      -72

Table 8 Integrated circuit frequency synthesisers.
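
The resolution relationship, clock rate divided by 2^N, can be evaluated directly. The sketch below uses the clock rates and control word lengths listed for the GPS and Sciteq parts in Table 8. The function names are illustrative, and the tuning word calculation assumes a simple phase accumulator in which the output frequency is the tuning word multiplied by the resolution.

```python
def dds_resolution_hz(clock_hz: float, control_bits: int) -> float:
    """Frequency step of the synthesiser: clock rate / 2^N."""
    return clock_hz / 2 ** control_bits


def tuning_word(target_hz: float, clock_hz: float, control_bits: int) -> int:
    """Nearest integer control word for a wanted output frequency.

    Assumption: output frequency = tuning word x resolution.
    """
    return round(target_hz / dds_resolution_hz(clock_hz, control_bits))


print("GPS SP 2001 (350 MHz, 16 bits): %.0f Hz" % dds_resolution_hz(350e6, 16))
print("Sciteq DDS1 (25 MHz, 32 bits):  %.4f Hz" % dds_resolution_hz(25e6, 32))
print("Word for 10 MHz from a 25 MHz, 32-bit part:", tuning_word(10e6, 25e6, 32))
```

These two results are of the same order as the 5000 Hz and 0.006 Hz resolution entries given for those parts in Table 8.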

6.  Summary

This note has reviewed the DSP requirements for matched filtering and other receiver processor functions. Tables have been provided to indicate some of the custom processors which are available today to implement matched filters for signal detection and high speed synthesisers for waveform generation. These tables can never be fully comprehensive, as new developments are continually being announced, but it is hoped that they provide system engineers with some idea of the capabilities of current ASIC products in terms of input sample rates and arithmetic accuracy.

7.  Reference