DSP Chips
This note assesses the technical capabilities of DSP processor implementations based on general purpose standard parts, commercially available single-function custom processors and application-specific integrated circuits (ASICs).
At the input of any digital processor the analogue signal must be sampled and quantised into a multi-bit word which represents the signal sample magnitude. There is a wide range of analogue-to-digital converter (ADC) techniques, including successive approximation, sigma-delta and parallel (flash) approaches, which offer different conversion rates and accuracy. Table 1 and [1] list these for commercially available parts, giving typical accuracy and maximum conversion rates. The fastest converters are the flash type where, for an n-bit output word, 2^n − 1 comparators are required in the ADC. Here the conversion rate is closely related to accuracy, Table 1, and for short word-lengths sophisticated high-speed comparator designs can be contemplated.
| Converter type | Accuracy (no. of bits) | Conversion rate (ksample/s) |
| Successive approximation | 10 | 50 |
| Successive approximation | 12 | 100 |
| Sigma-delta | 18 | 48 |
| Serial-parallel | 12 | 20,000 |
| Parallel (flash) | 4 | 30,000,000 |
| Parallel (flash) | 8 | 1,000,000 |
| Parallel (flash) | 14 | 10,000 |
For audio applications there is now a preference for the sigma-delta converters of Table 1. Table 2 gives more details on specific devices, with the corresponding references to the Journal of Solid-State Circuits (JSSC), the International Solid-State Circuits Conference (ISSCC) and the Journal of the Audio Engineering Society (JAES).
| Bandwidth (kHz) | fs (MHz) | Resolution (bits) | Application | Reference |
| 4 | 4 | 13 | speech | JSSC 12/88 |
| 4 | 1 | 13 | speech | JSSC 4/89 |
| 20.5 | 5.25* | 16 | audio | JSSC 6/93 |
| 24 | 6.14* | 18 | audio | JAES 3/86 |
| 40 | 10.24 | 14 | ISDN | JSSC 4/89 |
| 100 | 3.25* | 15 | cellular radio | ISSCC 94 |
| 1,000 | 50 | 12 | ultrasound | ISSCC 91 |
Table 2 Commercial sigma-delta ADCs.
The main processing elements in a matched filter receiver are the matched filter or correlator to detect the coded spread sequence, the synchroniser to ensure correct timing information, an adaptive anti-multipath processor to accommodate time-varying, Doppler-shifted fading signals and possibly some form of multiple-access interference suppression.
Figure 1 Active correlator receiver.
There are two basic approaches to realising the code matched filter. The simplest method is the active correlator, Figure 1, where the received spread sequence is multiplied chip-by-chip with an accurately synchronised receiver reference (spreading code) sequence. This is very simple to implement, but initial synchronisation is slow. The alternative is a full finite impulse response (FIR) matched filter, Figure 6.1, whose correlation peak verifies that the sequence has been received and whose timing provides the synchronisation information.
Figure 2 Alternative finite impulse response filter design to the Figure 6.1 structure.
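The difference between the two approaches can be illustrated with a minimal Python sketch. The 7-chip code, the 5-chip delay and the function names are illustrative assumptions, not taken from any of the parts discussed here, and the received signal is noise-free for clarity.

```python
# Illustrative 7-chip bipolar spreading code (an assumed example sequence).
CODE = [1, 1, 1, -1, -1, 1, -1]

def active_correlator(received, code, offset):
    """Multiply chip-by-chip with a reference aligned at `offset` and accumulate."""
    return sum(received[offset + k] * code[k] for k in range(len(code)))

def matched_filter(received, code):
    """FIR filter whose taps are the time-reversed code; returns every output sample."""
    taps = code[::-1]
    out = []
    for n in range(len(received)):
        acc = 0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(received):
                acc += h * received[n - k]   # one multiply-accumulate per tap
        out.append(acc)
    return out

# Received signal: the code embedded after DELAY chips of silence.
DELAY = 5
received = [0] * DELAY + CODE + [0] * 4

# The active correlator only gives the full output if the offset is already known.
aligned = active_correlator(received, CODE, DELAY)
misaligned = active_correlator(received, CODE, 0)

# The matched filter needs no prior timing: the peak position reveals the delay.
mf_out = matched_filter(received, CODE)
peak_index = max(range(len(mf_out)), key=lambda n: mf_out[n])
estimated_delay = peak_index - (len(CODE) - 1)
```

The active correlator costs one multiply-accumulate per chip but must search over offsets, whereas the matched filter costs one per tap for every input sample and delivers the timing directly.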
The RAKE anti-multipath processor is also based on similar correlator or FIR filter architectures, Figure 6.1, and the interference suppression processor uses not dissimilar components. The heart of all these functions is a multiply and accumulate (MAC) processing element, Figure 3, which is controlled from program memory and which accesses data samples, x(n), and filter coefficients, h(n), from associated memory. The performance benchmarks for ASICs can be found by examining the MAC processing capabilities of standard DSP microprocessors, Table 3.
Figure 3 Multiply and accumulate (MAC) processing element.
The processors in Table 3 have a range of MAC rates from 10 MHz (TMS 320C25) to 40 MHz (TMS 320C50), and this computational rate is continuously increasing with new chip launches and semiconductor feature size reduction each year. One of the major choices is between fixed-point and floating-point arithmetic, and hence the complexity and cost of the MAC design. Fixed-point processors are cheaper, ranging from £10 to £50 per part. Floating-point processors can be obtained for £25 but the average cost is nearer to £100. Generally speaking, the correlator and filtering functions of Figures 1 and 6.1 can be readily accomplished in fixed-point processors without introducing significant quantisation distortion. Recursive estimation and matrix operations, such as are used in adaptive filters, Chapter 9, tend to be the main consumers of floating-point hardware. Power consumption is of paramount importance for hand-held devices, and here the relative simplicity of the fixed-point processor has distinct advantages. Another major factor is the ready availability of software to code the DSP algorithm. Here the choice of processor may be made in favour of a vendor with readily available software, such as cross-compilers, so that existing software developments can be reused.
| Company | Part | MAC time (ns) | No. bits fixed pt. | No. bits float pt. |
| AT&T | DSP16 | 55 | 16/36 | |
| AT&T | DSP16A | 33 | 16/36 | |
| AT&T | DSP32 | 160 | 16 | 32/40 |
| AT&T | DSP32C | 80 | 16 or 24 | 32/40 |
| Motorola | DSP56001 | 74 | 24/56 | |
| Motorola | DSP96002 | 75 | 32/64 | 44/96 |
| Texas Inst. | TMS320C10 | 114-280 | 16/32 | |
| Texas Inst. | TMS320C25 | 80-100 | 16/32 | |
| Texas Inst. | TMS320C50 | 25-50 | 16/32 | |
| Texas Inst. | TMS320C30 | 50-75 | 24/32 | 32/40 |
| Texas Inst. | TMS320C40 | 40-50 | | 32/40 |
| Texas Inst. | TMS320C6201 | 5 | 32 | |
| Analog Devices | ADSP2100 | 125 | 16/40 | |
| Analog Devices | ADSP2100A | 80 | 16/40 | |
| Analog Devices | ADSP21065L | 6 | 32 | 32/40 |
Table 3 Widely used DSP microprocessor devices.
Processor simulation is often performed in high-level C-code software routines. Ultimately it is necessary to verify this on real signals to model, in detail, a complete receiver. This stage will often be undertaken using the DSP microprocessors of Table 3, with the code cross-compiled into assembly language and the DSP chip incorporated into a development board such as the products listed in Table 4. This enables the system concept to be verified before committing a processor design to an ASIC production phase.
| Company | Product | Computing host | Constituent processor |
| Ariel | | PC, SUN & Macintosh | DSP 56001, DSP 32 |
| Atlanta Signal Processors (ASPI) | Several | PC | TMS 320 Series |
| DSP Research | TIGER | PC | TMS 320C30 |
| Image and Signal Processing (ISP) | Point | PC | TMS 320C30 |
| | | | TMS 320 Series |
| Loughborough Sound Images (LSI) | Full range | PC, SUN | DSP 32, DSP 56000 |
Table 4 DSP board products.
A spread spectrum receiver requires a linear-phase filter design so that the individual code chips can be delayed and summed to recognise and despread the received coded waveform. This favours the use of the linear-phase FIR design of Figure 6.1.
The requirement for linear-phase filtering, and the recognition that the FIR filter requires a large number of stages (32-256) to achieve a useful frequency response, have placed limitations on the use of the standard DSP parts in Table 3 for FIR filter implementation. Here the input sample time is given by the product of the MAC time and the number of taps in the filter. Thus several manufacturers have developed specialist high sample rate FIR filter parts, Table 5. These recognise that the feedforward FIR filter can often use lower tap weight accuracy than the 16- or 24-bit fixed-point provision, and this permits the design of faster, lower precision MAC elements, more than one of which can be incorporated into the chip. Table 5 lists a range of FIR filter parts, developed over the last decade, with typically 8-32 taps per integrated circuit. These are usually cascadable to increase the filter order. These chips all use the alternative FIR architecture of Figure 2 in place of Figure 6.1.
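The sample-time relationship stated above can be made concrete with a short Python sketch. The function name and the two operating points are illustrative assumptions. The calculation also assumes a single MAC element and ignores any instruction or memory-access overhead, so it is an upper bound rather than a measured figure.

```python
def max_sample_rate_hz(mac_time_ns, num_taps):
    """Highest input sample rate a single-MAC processor can sustain for an N-tap FIR."""
    sample_time_ns = mac_time_ns * num_taps   # one MAC per tap per input sample
    return 1e9 / sample_time_ns

rate_64 = max_sample_rate_hz(100, 64)      # 100 ns MAC time, 64 taps
rate_1000 = max_sample_rate_hz(50, 1000)   # 50 ns MAC time, 1000 taps
```

With a 100 ns MAC time a 64-tap filter is limited to roughly 156 ksample/s, and a 1000-tap filter on a 50 ns part to about 20 ksample/s, which is why multi-MAC filter parts are attractive at MHz sample rates.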
The Inmos A100 is interesting as it uses a serial-parallel multiplier implementation where rate can be traded for arithmetic accuracy, Table 5. The filters in Table 5 typically combine 10 bits of input sample quantisation accuracy with 8-16 bit coefficient tap weight accuracy at sample rates of 2.5-45 MHz. They are thus fully compatible with the flash ADCs of Table 1 and, for the reduced processor accuracy, offer a computational rate equal to or even faster than that of the single MAC processing element included in the DSP microprocessors of Table 3.
Figure 6.15 shows the degradation in performance for an FIR filter employing 12-bit and 8-bit filter coefficients compared with an infinite precision design (i.e. floating-point arithmetic). Table 5 shows various products offered over the years by TRW, GEC Plessey Semiconductors (GPS), Inmos and Harris. The component from Marconi Electronic Devices Ltd (MEDL) is no longer available, as it had to use a silicon-on-sapphire process, rather than a simple silicon semiconductor process, to achieve this level of filter complexity in the mid-1980s. Small geometry silicon is now used exclusively for these products.
| Manufacturer | TRW | TRW | MEDL | Inmos | GPS | Harris |
| Part | TDC 1028 | TMC 2243 | MA 7180 | A100 | PDSP 16256 | HSP 43168 |
| Number of taps | 8 | 3 | 9 | 32 | 16-128 | 16 |
| Input data accuracy (bits) | 4 | 10 | 10 | 16 | 16 | 10 |
| Tap weight accuracy (bits) | 4 | 10 | 8 | 4-16 | 16 | 10 |
| Output signal accuracy (bits) | 13 | 16 | 16/22 | 24 | 32 | 19 |
| Sampling (MHz) | 20 | 20 | 20 | 10-2.5 | 25-3.1 | 45 |
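The effect of reduced tap weight accuracy can be explored with a small Python sketch. The Hamming-windowed sinc design, the 32-tap length and the function names are assumptions chosen for illustration. The sketch measures only the coefficient rounding error and does not attempt to reproduce the frequency response comparison of Figure 6.15.

```python
import math

def lowpass_taps(num_taps=32, cutoff=0.25):
    """Hamming-windowed sinc low-pass design (cutoff as a fraction of the sample rate)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def quantise(taps, bits):
    """Round each tap to a signed fixed-point value with `bits` bits (range -1 to +1)."""
    scale = 2 ** (bits - 1)
    return [round(h * scale) / scale for h in taps]

def max_error(taps, bits):
    """Largest absolute difference between the ideal and the quantised tap weights."""
    return max(abs(h - q) for h, q in zip(taps, quantise(taps, bits)))

taps = lowpass_taps()
err8 = max_error(taps, 8)     # bounded by half an LSB, 2**-8
err12 = max_error(taps, 12)   # bounded by half an LSB, 2**-12
```

Each extra coefficient bit halves the worst-case rounding error, which is the mechanism behind the trade between tap weight accuracy and filter performance.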
In many spread spectrum applications the coded sequence is binary, and the multipliers can then be simplified to have only single-bit (bipolar) tap weight control, replacing the multiplier operation with a multi-bit exclusive-or operation. The requirement for such FIR receivers has resulted in the development of the components in Table 6. One of these, the TRW TMC 2220, is optimised for complex processing of the in-phase (I) and quadrature (Q) demodulated signals. A major feature in Table 6 is that the simplification of the multiplier into the exclusive-or operation gives typically a fourfold increase in the filter length, extending it to a 64-tap capability. Note that some implementations use only single-bit input signal quantisation. This has been shown to incur only a 1 dB degradation in the code matched filter processing gain in the receiver.
| Manufacturer | TRW | TRW | TRW | MEDL | ST TEL |
| Part | 1004 | TMC 2220 | TMC 2023 | MA 7170 | 3310 |
| Number of taps | 64 | 32 (× 4) | 64 | 64 | 64 |
| Input data accuracy (bits) | 1 | 4-real, 2-complex | 1 | 4 | 3 |
| Accumulator accuracy (bits) | analogue | 10 | 7 | 16 | 12 |
| Sample rate (MHz) | 10 | 20 | 30 | 10 | 20 |
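For the case of single-bit input data and single-bit tap weights, the replacement of the multiplier by an exclusive-or can be sketched in Python as follows. The bit mapping (+1 to 0, −1 to 1), the 7-chip code and the function names are illustrative assumptions and do not describe the internal design of any part in Table 6.

```python
def to_bits(chips):
    """Pack a bipolar (+1/-1) chip sequence into an integer, mapping +1 -> 0 and -1 -> 1."""
    word = 0
    for c in chips:
        word = (word << 1) | (1 if c < 0 else 0)
    return word

def xor_correlate(word_a, word_b, length):
    """Bipolar correlation from XOR: agreements minus disagreements."""
    disagreements = bin(word_a ^ word_b).count("1")
    return length - 2 * disagreements

def multiply_correlate(a, b):
    """Reference result using ordinary multiplies, for comparison."""
    return sum(x * y for x, y in zip(a, b))

code = [1, 1, 1, -1, -1, 1, -1]
hard_limited = [1, 1, -1, -1, -1, 1, -1]   # received chips with one chip in error

xor_result = xor_correlate(to_bits(code), to_bits(hard_limited), len(code))
mult_result = multiply_correlate(code, hard_limited)
```

Both routes give the same correlation value, but the exclusive-or version needs only a bitwise gate per tap and a count of ones, which is what allows more taps to fit on one chip.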
In some applications there is a need to realise extremely long FIR filters (e.g. radar matched filtering, where the number of taps is 1,000 or greater). Now the FIR filter of Figures 6.1 and 2 requires N multiplies per input sample for an N-tap filter, precluding the use of the DSP microprocessors of Table 3, as the effective input sample rate is only tens of kHz, which is too low.
Here it is better to replace the convolution operation by a multiplication in the frequency domain, Figure 11.9, and deploy FFT processors to implement the DFT operations. This introduces block processing and necessitates FFTs of double the required filter length but, with FFTs, the overall number of MAC operations is reduced considerably.
Using the Figure 11.9 approach, a 4096-point FIR filter or convolver operating at the very impressive input data rate of 40 Msample/s was built in the late 1970s for a radar application. If the filter weights are fixed then the H(k) values can be precomputed and stored. For a 4096-point convolution the Figure 11.9 solution thus involves 8192 × 13 × 2 complex FFT MAC operations plus a further 8192 complex multiplies for the X(k)H(k) product operation. The total number of MAC operations is thus 8192 × 27 ≈ 221,000, which is a considerable saving over the 4096^2 ≈ 16M operations of a conventional, Figure 6.1, FIR filter implementation.
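The frequency-domain route and the operation count quoted above can be checked with a minimal Python sketch. The recursive radix-2 FFT, the function names and the short test sequences are illustrative assumptions. A practical design would use an optimised FFT and a block scheme such as overlap-save, neither of which is shown here.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out

def fast_convolve(x, h):
    """Linear convolution via zero-padded FFTs (double length for equal-length inputs)."""
    size = 1
    while size < len(x) + len(h) - 1:
        size *= 2
    X = fft(list(x) + [0] * (size - len(x)))
    H = fft(list(h) + [0] * (size - len(h)))   # could be precomputed for fixed weights
    Y = [a * b for a, b in zip(X, H)]          # the X(k)H(k) product
    y = fft(Y, inverse=True)
    return [v.real / size for v in y[:len(x) + len(h) - 1]]

def direct_convolve(x, h):
    """Time-domain convolution, one multiply-accumulate per tap per sample."""
    out = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out

def fft_route_ops(filter_len):
    """Operation count as tallied in the text: two FFTs of 2N points plus 2N multiplies."""
    n = 2 * filter_len
    return n * int(math.log2(n)) * 2 + n

def direct_ops(filter_len):
    """N multiplies per output sample over a block of N samples."""
    return filter_len ** 2

fast = fast_convolve([1, 2, 3, 4], [1, -1, 0.5])
slow = direct_convolve([1, 2, 3, 4], [1, -1, 0.5])
```

For a 4096-point filter, `fft_route_ops(4096)` evaluates to 221,184 against 16,777,216 for `direct_ops(4096)`, a reduction of roughly 76 times under this simple tally.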
This, plus the use of FFTs as spectrum analysers to recognise the presence of a carrier modulation for synchronisation, has spurred the development of FFT integrated chipsets, Table 7. This shows commercially available 1024- and 4096-point processors and an FFT development by CNET of France, compared with selected general purpose devices: the Analog Devices 21160 SHARC processor and the MPC 7400 (G4 AltiVec) RISC processor. With input sample rates of 10-20 MHz the integrated FFTs of Table 7 offer similar FIR filter speeds to the processors of Table 5, which is again much faster than the DSP microprocessor solution of Table 3.
| Manufacturer | GPS | SHARP | SHARP | Fr TELECOM (CNET) | Fr TELECOM (CNET) | AD | MPC |
| Part | PDSP 16510 | LH 9124/9320 | LH 9124/9320 | | | 21160 | 7400 |
| No. transform points | 1024 | 1024 | 4096 | 1024 | 8192 | 1024 | 1024 |
| Input data accuracy (bits) | 16 | 8-24 | 8-24 | ? | ? | 32 | 32? |
| Transform time (µs) | 98 | 80.7 | 312 | 50 | 400 | 90 | 60 |
| Input sample rate (MHz) | 10 | 12 | 13 | 20 | 20 | 11 | 17 |
| Complex FIR rate (ns/tap) | 5 | 5 | |||||
Table 7 Integrated FFT chipsets.
Table 8 provides typical performance details for some of the commercially available high switching speed synthesisers which offer switching times of tens of ns. The GEC Plessey Semiconductors (GPS) synthesiser has a high output frequency but lower output accuracy in bits, and hence reduced spectral purity, compared to the other devices. The Stanford Telecom (ST TEL) synthesiser offers separate control of the output amplitude and phase. The Harris synthesiser has a larger number of control bits with consequently improved resolution capability. Resolution is given by the clock rate divided by 2^N, where N is the number of control bits, and hence it is directly controlled by the clock rate. All these synthesisers offer phase coherence from hop to hop for use in fast hopped spread spectrum systems.
| Harris | ST TEL | GPS | Sciteq | An Dev | QCOM | |
| HSP 45102/6 | STEL 1179 | SP 2001 | DDS1 | AD9955 | Q2334 | |
| Output accuracy (bits) | 12/16 | 12/13 A/o | 8 | 12 | 12 | 12 |
| No. of control bits | 32 | 24 | 16 | 32 | 32 | 32 |
| Max. clock (MHz) | 25-40 | 25 | 350 | 25 | 100 | 50 |
| Max. output (MHz) | 10 | 10 | 100 | 11 | 12 | 10 |
| Resolution (Hz) | 0.0009 | 1500 | 5000 | 0.006 | 0.0015 | 0.005 |
| Switching time (ns) | 30 | 45 | 17 | 40 | 25 | 20? |
| Spectral purity (dBc) | -90 | -75 | -40 | -60 | -90 | -72 |
Table 8 Integrated circuit frequency synthesisers.
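The resolution relationship (clock rate divided by 2^N) can be written out as a short Python sketch. The function names and the tuning-word example are illustrative assumptions. The two operating points are taken from the clock rate and control bit entries of Table 8 for the DDS1 and SP 2001 columns.

```python
def dds_resolution_hz(clock_hz, control_bits):
    """Frequency step of a phase-accumulator synthesiser: f_clk / 2^N."""
    return clock_hz / 2 ** control_bits

def dds_output_hz(clock_hz, control_bits, tuning_word):
    """Output frequency for a given tuning (control) word."""
    return tuning_word * dds_resolution_hz(clock_hz, control_bits)

res_32 = dds_resolution_hz(25e6, 32)    # about 0.0058 Hz
res_16 = dds_resolution_hz(350e6, 16)   # about 5.3 kHz
quarter_clock = dds_output_hz(25e6, 32, 2 ** 30)
```

These two results are close to the 0.006 Hz and 5000 Hz resolution entries listed for those parts, and setting the tuning word to a quarter of the accumulator range gives an output at one quarter of the clock rate.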
This note has reviewed the DSP requirements for matched filtering and other receiver processor functions. Tables have been provided to indicate some of the custom processors which are available today to implement high speed synthesisers for waveform generation and matched filters for signal detection. These tables can never be fully comprehensive, as new developments are continually being announced, but it is hoped that they provide system engineers with some idea of the capabilities of current ASIC products in terms of input sample rates and arithmetic accuracy.