CN103138714A

CN103138714A - Hardware implementation of a high-performance least mean square (LMS) adaptive filter

Info

Publication number: CN103138714A
Application number: CN2013100871756A
Authority: CN
Inventors: 李奚鹏
Original assignee: SUZHOU LANGKUAN ELECTRONIC TECHNOLOGY Co Ltd
Current assignee: SUZHOU LANGKUAN ELECTRONIC TECHNOLOGY Co Ltd
Priority date: 2013-03-19
Filing date: 2013-03-19
Publication date: 2013-06-05

Abstract

The invention relates to the hardware implementation of adaptive filtering algorithms based on least mean squares (LMS) and presents a high-performance adaptive filter. The design is dominated by datapath operations and is based on the delayed LMS (DLMS) algorithm. The system comprises two modules and two recursive loops: the modules are a finite impulse response (FIR) filtering module and a coefficient update module, and the loops are a weight update loop and an error feedback loop. The FIR filtering module contains a multiply-accumulate (MAC) unit specific to the invention. The unit is implemented with pipelining and parallelism, and its delay hardly changes as the filter order changes. The MAC unit optimizes multiplication and addition as a whole: the multiplier uses modified Booth encoding to reduce the number of partial products and 4-2 compressors to reduce the number of accumulation stages. The coefficient update module uses a sign function, which reduces the hardware cost of that module. The resulting adaptive filter system is fast, small in area, and effective at filtering.

Description

Hardware implementation of a high-performance LMS adaptive filter

Technical field

The present invention relates to the implementation of adaptive filters based on the LMS algorithm.

Technical background

In high-speed digital communication, multipath fading and channel distortion can cause severe intersymbol interference (ISI), which has become one of the main difficulties that digital communication faces. Because adaptive equalizers can track time-varying signals, they are widely used in digital communication to eliminate ISI. The core component of an adaptive equalizer is the adaptive filter. Many adaptive algorithms exist for adaptive filters, and LMS is one of them. Because the hardware structure of the LMS algorithm is simple and its results are good, it is very widely used.

The adaptive principle of the LMS algorithm is expressed by the following formulas:

$$Y_k = X_k^T W_k \tag{1}$$

$$e_k = d_k - Y_k = d_k - X_k^T W_k \tag{2}$$

$$W_{k+1} = W_k + 2\mu e_k X_k \tag{3}$$

In formula (1), $Y_k$ is the filter output. In formula (2), $d_k$ is the desired signal and $e_k$ is the error signal, which is used to adjust the filter weight coefficients. $W_k = [w_0, w_1, w_2, \ldots, w_{L-1}]$ is the weight vector at time $k$, and $X_k = [x_0, x_1, x_2, \ldots, x_{L-1}]$ is the input data vector at time $k$, where $L$ is the order of the filter, i.e. the number of weight coefficients. Formula (3) is the weight iteration of the LMS algorithm, where $\mu$ is the step size, which affects the convergence rate and the steady-state error of LMS. Within the range of values that guarantees weight convergence, a larger $\mu$ gives faster convergence but also a larger steady-state error. The value of $\mu$ must therefore be chosen to suit the requirements of the system.
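Formulas (1) to (3) can be illustrated with a minimal floating-point sketch in Python. It is not the hardware of the invention. The function names `lms_filter` and `demo_lms`, the 3-tap test channel `h`, the noiseless test signal, and the value `mu=0.05` are all assumptions chosen only for illustration.

```python
import random


def lms_filter(x, d, num_taps, mu):
    """Apply formulas (1)-(3) sample by sample; return final weights and errors."""
    w = [0.0] * num_taps       # W_k, weight vector
    taps = [0.0] * num_taps    # X_k, newest sample first
    errors = []
    for x_k, d_k in zip(x, d):
        taps = [x_k] + taps[:-1]                                    # shift in new sample
        y_k = sum(wi * xi for wi, xi in zip(w, taps))               # formula (1)
        e_k = d_k - y_k                                             # formula (2)
        w = [wi + 2.0 * mu * e_k * xi for wi, xi in zip(w, taps)]   # formula (3)
        errors.append(e_k)
    return w, errors


def demo_lms(num_samples=2000, mu=0.05):
    """Identify a hypothetical 3-tap channel h from noiseless data (illustrative only)."""
    rng = random.Random(0)
    h = [0.5, -0.3, 0.2]       # assumed unknown system, not taken from the source
    x = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    d = [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
         for k in range(num_samples)]
    w, errors = lms_filter(x, d, num_taps=len(h), mu=mu)
    return h, w, errors
```

In this sketch the filter output and the weight update are computed inside the same loop iteration, which mirrors the constraint that both must finish within one sample interval.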

In the traditional LMS algorithm, FIR filtering and the weight coefficient update must be completed within the same sample interval, which limits the use of the algorithm in high-speed real-time environments. To address this limitation, G. Long et al. proposed the delayed LMS (DLMS) algorithm. In DLMS the FIR filtering module and the weight update module can run in parallel, which makes it possible to pipeline the FIR filtering module.

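The mathematical formulas and the weight iteration formula of the DLMS algorithm are, respectively:

$$Y_{k-D} = X_{k-D}^T W_{k-D-1} \tag{4}$$

$$e_{k-D} = d_{k-D} - Y_{k-D} \tag{5}$$

$$W_{k+1} = W_k + 2\mu e_{k-D} X_{k-D} \tag{6}$$

where $D$ is the delay, in sample periods, of the error and input data used in the weight update.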
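The effect of the delay in formula (6) can be sketched in Python as follows. This is a behavioral model under stated assumptions, not the circuit of the invention: the names `dlms_filter` and `demo_dlms`, the test channel `h`, and the parameter values are hypothetical, and the error at each step is computed with the weights available at that step, which is the common textbook form of DLMS.

```python
import random
from collections import deque


def dlms_filter(x, d, num_taps, mu, delay):
    """Delayed LMS: the update at time k uses e_{k-D} and X_{k-D}, as in formula (6)."""
    w = [0.0] * num_taps
    taps = [0.0] * num_taps
    pending = deque()          # (error, input vector) pairs; models pipeline registers
    errors = []
    for x_k, d_k in zip(x, d):
        taps = [x_k] + taps[:-1]
        y_k = sum(wi * xi for wi, xi in zip(w, taps))
        e_k = d_k - y_k
        errors.append(e_k)
        pending.append((e_k, taps))
        if len(pending) > delay:
            e_old, x_old = pending.popleft()      # e_{k-D}, X_{k-D}
            w = [wi + 2.0 * mu * e_old * xi for wi, xi in zip(w, x_old)]
    return w, errors


def demo_dlms(delay=3, num_samples=3000, mu=0.05):
    """Identify a hypothetical 3-tap channel with a delayed update (illustrative only)."""
    rng = random.Random(1)
    h = [0.5, -0.3, 0.2]       # assumed unknown system, not taken from the source
    x = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    d = [sum(h[i] * x[k - i] for i in range(len(h)) if k - i >= 0)
         for k in range(num_samples)]
    w, errors = dlms_filter(x, d, num_taps=len(h), mu=mu, delay=delay)
    return h, w, errors
```

With `delay=0` the model reduces to the ordinary LMS recursion. With a nonzero delay, the filtering step no longer has to wait for the update of the same sample, which is what allows the FIR path to be pipelined.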

Summary of the invention

The structure proposed by the invention greatly reduces the hardware cost of the adaptive filter and, by using pipelining and parallelism, greatly increases the operating speed of the system.

To meet these requirements, the invention adopts the following technical scheme.

According to claim 3, the invention adopts the DLMS (delayed LMS) algorithm, so that the FIR filtering module and the coefficient update module can be pipelined. For example, if the FIR filtering module is implemented with a 3-stage pipeline, the input data and the desired data are also delayed by 3 clock cycles.

According to claims 4 and 5, the invention proposes a special multiply-accumulate unit. Because a multiplication can be decomposed into additions and shifts, the multiplication and the accumulation that follows it can be optimized as a whole, so the delay of this multiply-accumulate unit is almost independent of the order of the filter.

According to claim 4, the multiplier in the invention uses modified Booth encoding to reduce the number of partial products, and 4-2 compressors arranged in a Wallace tree compress the partial products, which reduces the number of accumulation stages and the partial-product accumulation delay.

The invention has been verified in hardware. Each module was coded at RTL in Verilog. After functional simulation, an FPGA prototype was verified on the Quartus II platform. Logic synthesis was carried out in DC (Design Compiler), followed by post-synthesis simulation. Both the synthesis results and the simulation results show that the designed system runs stably at a clock frequency of 200 MHz and that its hardware cost is very small.

Description of drawings

Fig. 1: System block diagram of the DLMS adaptive filter

Fig. 2: Structure of the direct-form FIR filter

Fig. 3: Structure of the transposed-form FIR filter

Fig. 4: Structure of the fast multiply-accumulate (MAC) unit

Fig. 5: Structure of the 4-2 compressor

Fig. 6: Structure of the fast adder

Fig. 7: Structure of the weight update module

Embodiment

To meet the high-performance requirement, the invention specially optimizes the architecture of the filter system and introduces a new multiply-accumulate unit, which greatly increases the operating speed of the system.

As formulas (4), (5) and (6) show, the DLMS adaptive filter has two inputs: the sampled data input and the desired signal input. The DLMS structure therefore contains two modules, the FIR filtering module and the coefficient update module, and two recursive loops, the weight update loop and the error feedback loop. The system block diagram is shown in Fig. 1.

The FIR filter has two structures, direct form and transposed form, shown in Fig. 2 and Fig. 3 respectively. As Figs. 2 and 3 show, the maximum delay of the direct form is Tm + M·Ta and that of the transposed form is Tm + Ta, where Tm is the multiplier delay, Ta is the adder delay, and M is the order of the filter. The delay of the transposed-form FIR appears to be independent of M, but as M increases the load on the input grows and buffers must be inserted to increase drive strength. The larger M is, the larger the delay introduced by the buffers, so the speed advantage is lost. In addition, the transposed structure needs far more registers than the direct-form FIR, which wastes area. The invention therefore selects the direct-form FIR, optimizes it at the system level, and introduces pipelining to raise the clock frequency of the circuit.
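The two structures and their nominal critical-path delays can be compared with a small behavioral sketch in Python. It only models arithmetic and the delay expressions quoted above; buffer delay and register count are not modeled. All function names are hypothetical.

```python
def fir_direct(x, h):
    """Direct form: a tapped delay line followed by a chain of additions."""
    taps = [0] * len(h)
    out = []
    for sample in x:
        taps = [sample] + taps[:-1]
        out.append(sum(hi * ti for hi, ti in zip(h, taps)))
    return out


def fir_transposed(x, h):
    """Transposed form: the input is broadcast to all multipliers, sums are registered."""
    m = len(h)
    regs = [0] * m             # regs[m-1] stays 0 and acts as the chain input
    out = []
    for sample in x:
        products = [hi * sample for hi in h]
        out.append(products[0] + regs[0])
        regs = [products[i + 1] + regs[i + 1] for i in range(m - 1)] + [0]
    return out


def direct_form_delay(tm, ta, order):
    """Maximum delay of the unoptimized direct form, Tm + M*Ta."""
    return tm + order * ta


def transposed_form_delay(tm, ta):
    """Nominal maximum delay of the transposed form, Tm + Ta (buffers not modeled)."""
    return tm + ta
```

Both functions compute the same output sequence; they differ only in where the registers sit, which is what changes the critical path and the register count.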

The FIR filter circuit consists mainly of multipliers and adders. If it is implemented directly as in Fig. 2, the critical path delay is too large for high-speed operation, so the structure must be optimized. In hardware, a multiplication ultimately reduces to shifts and additions, so the multiplication and addition in the FIR structure can be optimized as a whole. To this end the invention proposes a distinctive multiply-accumulate (MAC) unit. The delay of a direct-form FIR built with this MAC unit hardly changes with the order M and is approximately Tm + Ta. The multiplier in this unit is a modified Booth multiplier. The partial products are compressed by 4-2 compressors arranged in a Wallace tree to reduce the number of circuit stages, and the final fast adder combines carry-lookahead and carry-select techniques. The block diagram of the whole MAC unit is shown in Fig. 4.

Modified Booth encoding is widely used in high-speed multiplier design because it halves the number of partial products, which reduces the number of adders and the computation time and so speeds up multiplication. The encoding rules of the modified Booth algorithm are shown in Table 1.

Table 1: Modified Booth encoding rules

[Table 1 appears as an image in the original publication and is not reproduced here.]
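Because Table 1 is not available as text, the following Python sketch uses the standard radix-4 modified Booth recoding, in which each overlapping group of three multiplier bits (y_{i+1}, y_i, y_{i-1}) maps to the digit y_i + y_{i-1} - 2·y_{i+1}, a value in {-2, -1, 0, 1, 2}. It is assumed, not confirmed, that Table 1 lists the same rule. The function names are hypothetical.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Returns digits in {-2, -1, 0, 1, 2}, least significant digit first.
    """
    if bits % 2:
        bits += 1                          # sign-extend to an even width
    u = multiplier & ((1 << bits) - 1)     # two's-complement bit pattern
    digits = []
    prev = 0                               # implicit bit y_{-1} = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1                  # y_i
        b1 = (u >> (i + 1)) & 1            # y_{i+1}
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Sum the Booth partial products; each one is 0, +/-A or +/-2A, shifted."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * 4 ** i for i, d in enumerate(digits))
```

For an 8-bit multiplier this recoding yields 4 partial products instead of 8, and every partial product needs only a shift and an optional negation of the multiplicand.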

Although Booth encoding halves the number of partial products, there are still many partial products when the multiplier is wide. If these partial products are added directly, the delay of the adder carry chain is too large and limits the speed of the system. The invention therefore introduces the 4-2 compressor, a highly parallel unit with 5 inputs and 3 outputs. A 4-2 compressor is logically equivalent to two full adders, and with an optimized structure its delay can be less than that of two full adders. Its greatest advantage is that its delay is independent of carry propagation: when N 4-2 compressors are connected in a row, the total delay equals the delay of a single 4-2 compressor. For this reason 4-2 compressors are often arranged as a Wallace tree array to compress partial products in fast multipliers. The 8-2 compressor in Fig. 4, for example, is built from three 4-2 compressors. The 4-2 compressor used in the invention has been optimized so that its delay is only 1.5 times that of a full adder. Its logic diagram is shown in Fig. 5.

The last operation in the FIR filtering module is a fast addition of the two partial products that remain after compression. The delay of a conventional ripple-carry adder is proportional to the operand width N, which is too large for a high-speed system. The invention therefore implements the fast adder with a carry-lookahead and square-root carry-select structure, shown in Fig. 6.

According to formula (6), the weight update circuit needs two multipliers and one adder. Because the step size $\mu$ is a constant, multiplication by it can be implemented as a shift, which saves one multiplier. Even so, the weight update circuit still needs one multiplier and one adder, so the hardware cost and the circuit delay remain large. To address this problem, a sign LMS algorithm has been proposed, which changes the weight iteration formula to $W_{k+1} = W_k + 2\mu e_{k-D}\,\mathrm{sgn}(X_{k-D})$. With this formula the weight update circuit needs only one adder and a few shift operations, which greatly reduces hardware cost. This comes at the expense of system performance (convergence rate and steady-state error). The structure of the weight update module is shown in Fig. 7.
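The multiplier-free update can be sketched in integer arithmetic in Python. The sketch assumes that 2μ is a power of two, written as 2^(-shift), so that multiplication by the step size becomes an arithmetic right shift; this choice and the names `sign` and `sign_lms_update` are assumptions for illustration, not details taken from Fig. 7.

```python
def sign(value):
    """Sign function: -1, 0 or +1."""
    return (value > 0) - (value < 0)


def sign_lms_update(weights, error, taps, shift):
    """One sign-LMS weight update with 2*mu = 2**-shift.

    error and taps are the delayed values e_{k-D} and X_{k-D}.
    Each weight needs only an add or subtract of the shifted error.
    """
    step = error >> shift                  # arithmetic shift replaces the multiplier
    return [w + sign(x) * step for w, x in zip(weights, taps)]
```

For example, `sign_lms_update([10, 20, 30], 64, [5, -3, 0], 4)` shifts the error to a step of 4 and then adds it, subtracts it, or leaves the weight unchanged according to the sign of each input sample.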

Claims

1. An adaptive filter system based on LMS, characterized in that:

the invention is a filter system dominated by data processing, in which the performance of the data path is critical.

2. In this system, the FIR filtering module lies in the data path, so its performance determines the performance of the system.

3. The system is required to operate at high speed.

4. Therefore, according to claim 1, the FIR filtering module, as the core data processing module, adopts pipelining and parallelism to improve speed.

5. In the traditional LMS algorithm, the FIR filtering module must complete within one cycle.

6. According to claim 2, the FIR filtering module must be pipelined to meet the high-speed requirement, so a modified LMS algorithm is needed.

7. The FIR filtering module consists mainly of delay units, multipliers and an accumulator.

8. According to claim 2, the multiplier must be a combinational multiplier with a special structure to meet the high-speed requirement, and the accumulator should use a special adder structure.

9. The order of the filter system in the invention is not fixed, which means that the system should remain fast even when the order is very large.

10. The delay of a traditional FIR filter depends on the order of the filter, so the traditional FIR filter structure should be improved to meet the requirements of the invention.