
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 59, NO. 3, MARCH 2012
A Novel Hybrid Monotonic Local Search Algorithm for FIR Filter Coefficients Optimization
Ahmed Shahein, Student Member, IEEE, Qiang Zhang, Niklas Lotze, and Yiannos Manoli, Senior Member, IEEE
Abstract: This work presents a formulation of the FIR filter problem with sum of power-of-two (POT) coefficients as a mixed integer linear problem and solves it heuristically. The optimization problem is formulated to minimize the number of nonzero bits in each coefficient without violating the filter specifications within the pass and stop bands. A novel, fast, and efficient local search optimization algorithm for the filter coefficients is proposed. The algorithm, called POTx, does not use a tree structure, in contrast to conventional MILP algorithms, and offers fast computation because of a presorted search space, a monotonic dedicated search space, and the use of abort conditions. The proposed approach achieves reductions comparable to those of nonheuristic approaches because of a hybrid allocation scheme and multiple optimization iterations. The usefulness of the proposed algorithm for low-power design of FIR filters is shown through the evaluation of several benchmark filters.

Index Terms: Canonic signed digit (CSD), finite impulse response (FIR), hybrid searching space, low power, mixed integer linear programming (MILP), monotonicity, power-of-two (POT).
Manuscript received December 10, 2010; revised April 01, 2011, June 29, 2011, and August 09, 2011; accepted August 10, 2011. Date of publication October 03, 2011; date of current version February 24, 2012. This paper was recommended by Associate Editor M. Mondin. A. Shahein, Q. Zhang, and N. Lotze are with the Fritz Huettinger Chair of Microelectronics, Department of Microsystems Engineering (IMTEK), University of Freiburg, 79110 Freiburg, Germany (see http://www.imtek.de/mikroelektronik/). Y. Manoli is with the Fritz Huettinger Chair of Microelectronics, Department of Microsystems Engineering (IMTEK), University of Freiburg, 79110 Freiburg, Germany, and also with the Institute of Micromachining and Information Technology, HSG-IMIT, 78052 Villingen-Schwenningen, Germany (see http://www.hsg-imit.de). Digital Object Identifier 10.1109/TCSI.2011.2165409

I. INTRODUCTION

Finite impulse response (FIR) filters are widely used in digital signal processing applications due to their stability and linear-phase characteristics. The filter algorithm involves a large number of multiplications, which are usually implemented using fixed-point or integer number representations, with the filter coefficients represented by a finite number of bits. In hard-wired ASIC designs, multiplications are replaced by shift-and-add operations, leading to multiplierless FIR filter design. From a power perspective, the fewer adders the filter uses, the less power it consumes [1]. Therefore, the optimized hardware implementation of FIR filters has gained significant attention in the research community [1]–[4]. The goal is to minimize the number of nonzero terms within the discrete coefficients, as each nonzero term corresponds to an additional adder in the hardware implementation.

Two common representations of discrete coefficients are used: binary and signed-digit. Binary representation uses only the digits 0 and 1, i.e., only addition; for example, $7 = 2^2 + 2^1 + 2^0$ is represented by (positive) nonzero terms only. In contrast, signed-digit representation uses addition and subtraction with digits in $\{-1, 0, 1\}$; for example, $7 = 2^3 - 2^0$ is represented by positive (addition) and negative (subtraction) nonzero terms. Thus, the discrete filter coefficients are represented as sums of either power-of-two (POT) or signed-power-of-two (SPT) terms. Binary representation includes one's complement, two's complement, and sign-magnitude representations with coefficient bits $b_k \in \{0, 1\}$. SPT representation, also called signed-digit representation, includes canonic signed digit (CSD) and minimal signed digit representations with coefficient digits $s_k \in \{-1, 0, 1\}$.

The community is divided about the optimum number representation for FIR filter implementation, so there is no absolute preferred representation. Moreover, the appropriate filter coefficient representation is considered part of the optimization problem. CSD representations generally require fewer nonzero bits and therefore reduce both multiplier complexity and logic depth [5]. However, when combining coefficient optimization with common subexpression elimination (CSE), most authors report the binary representation to be superior to CSD: the smaller number of possible representations in binary raises the probability of finding common subpatterns [5]–[8], which outweighs the possible gains of CSD representation, especially in large filters [7]. CSD therefore appears to be the optimal choice for implementations with minimum logic depth and without CSE, whereas binary representation appears to be the best choice for an overall minimization of the required adders (and therefore also offers the prospect of minimum power consumption) if the coefficient optimization is followed by CSE.
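The difference in nonzero-term counts between the two representations can be illustrated with a short sketch. It is an illustration under stated assumptions, not the conversion algorithm of [28]: it recodes an integer into canonic signed-digit form with the standard non-adjacent-form recurrence and compares the number of nonzero digits with the number of ones in the plain binary magnitude. The function names are hypothetical.

```python
def binary_nonzeros(x: int) -> int:
    """Nonzero bits in the binary magnitude of x (each one costs an adder)."""
    return bin(abs(x)).count("1")


def csd_digits(x: int) -> list[int]:
    """Canonic signed-digit digits of x, least significant first.

    Digits are -1, 0 or +1 and no two adjacent digits are nonzero
    (non-adjacent form), so the nonzero count is minimal.
    """
    digits = []
    while x != 0:
        if x % 2:                  # odd: pick +1 or -1 so the next digit is 0
            d = 2 - (x % 4)        # x = 1 (mod 4) -> +1, x = 3 (mod 4) -> -1
            x -= d
        else:
            d = 0
        digits.append(d)
        x //= 2                    # x is even here, so this is exact
    return digits


def csd_nonzeros(x: int) -> int:
    """Number of nonzero CSD digits of x."""
    return sum(d != 0 for d in csd_digits(x))


for value in (7, 15, 23):
    print(value, binary_nonzeros(value), csd_nonzeros(value))
```

For 7, 15, and 23 the sketch prints three, four, and four nonzero binary bits against two, two, and three nonzero CSD digits (e.g., $15 = 2^4 - 2^0$).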
The problem of minimizing the number of nonzero terms in an FIR coefficient set is mathematically modeled in the form of an object (objective) and a subject (constraints). The object is the problem to be solved, i.e., minimizing the number of nonzero terms (min(POT)). The subject comprises the constraints that must be preserved while solving the object, i.e., the problem is subject to the passband ripple $\delta_p$ and the stopband ripple $\delta_s$. The modeled problem is then optimized by employing either linear or nonlinear programming methods. Various optimization methods have been proposed in the literature, such as linear programming, polynomial programming, convex programming, and semidefinite programming. The choice of optimization method must be consistent with the problem model.
1549-8328/$26.00 © 2011 IEEE
TABLE I STATE-OF-THE-ART OPTIMIZATION TRENDS
The optimal solution of the modeled problem is obtained by considering all possibilities defined in the search space, which requires excessive computing time. The computational cost of conventional MILP increases exponentially with the number of variables to be optimized [9], since it searches for the global optimum. It is therefore often necessary to use a local search algorithm and/or heuristic methods, which solve the modeled problem in acceptable computing time. The solution obtained in this way is a local optimum.

II. STATE OF THE ART

Methods proposed in the literature consider different optimization aspects, as shown in Table I. The proposed methods aiming at a reduction of the hardware cost have different optimization goals, such as minimizing the number of SPT/POT terms, the normalized peak ripple, the ripple, or the quantization error, but they are all comparable to each other. Methods for finding the optimal solution include mixed integer linear programming (MILP) [1]–[3], [10]–[13], polynomial programming (PP) [14], and semidefinite programming (SDP) [15]. The conventional MILP algorithm has the advantage that it is guaranteed to produce the optimum design (global minimum), but it requires excessive computing resources for long filters. Several of the works proposing methods for finding the optimal solution are therefore limited to smaller filter orders due to their excessive runtime. Local search methods (also known as suboptimal methods) are presented in [16]–[22]. They offer a fast runtime, mostly providing suboptimal (local optimum) results. A least mean square (LMS) discrete space optimization procedure, suitable for filter orders up to 90, was developed in [20]. Several approaches have been proposed in the literature to optimize the hardware complexity of digital filters using various objects and subjects.
Table I gives an overview of the approaches that minimize the number of nonzero terms [1], [3], [17], [19], [23]. Here, $N$ is the filter length, $\delta_p$ and $\delta_s$ are the passband and stopband ripples, respectively, NPR is the normalized peak ripple [3], [12], [23], and SPT/POT is the number of nonzero terms; the quantization error is also listed as a criterion. All the algorithms listed in Table I are local search methods; however, [1], [3], and [23] are solved using systematic algorithms such as branch-and-bound (BAB) or MILP, whereas [17], [19], and this work use heuristic algorithms. The works in [1] and [3], as well as this work, limit the search space of each coefficient by setting an upper and a lower bound, which restricts the feasible coefficient region and thus reduces the computation time. While [23] and [17] integrate CSE within the optimization process, this work performs CSE on the optimized filter coefficient set. The approaches in [1], [3], and [23] limit the number of SPT terms in each coefficient, whereas the proposed algorithm does not. The approaches in [3] and [23] consider the NPR within the constraints, the approaches in [17] and [19] consider only the minimal NPR as a constraint of the optimization problem, and this work does not involve the NPR in the subject. Further approaches minimize the quantization error subject to the total number of SPT terms, as given in [11] and [24], while others minimize the magnitude ripple, as shown in [16] and [18].

III. CONTRIBUTIONS OF THIS WORK

Little work has been published so far on formulating the FIR filter problem as a mixed integer linear problem with binary coefficients, for which the optimization problem grows significantly compared to CSD due to the larger number of nonzero bits in binary representation [7]. Therefore, this work solves a mixed integer linear problem using a heuristic solver. The FIR problem has, however, been modeled with binary coefficients for convex [25] and SDP [15] optimizers. In this work, the FIR filter is mathematically modeled as a mixed integer linear problem in order to minimize the number of POT terms for a given filter specification subject to $\delta_p$ and $\delta_s$. A heuristic solver has been developed based on elementary MATLAB functions. A set of examples using benchmark filters exhibits the advantages of the proposed algorithm, which reduces the computing time by means of a bounded search space between lower and upper bounds, a presorted allocation scheme, and an abort condition for terminating the optimization process. Moreover, the proposed algorithm improves the optimization results by leaving the number of POT terms per coefficient unconstrained and by repeating the optimization process using a hybrid allocation scheme.
The proposed algorithm can be summarized as follows: 1) find/import the floating-point filter coefficients; 2) quantize/round the coefficients to a fixed number of bits; 3) define a search range for each coefficient, bounded by the next higher and the previous lower POT integer; 4) order the coefficients according to a certain allocation scheme; 5) for each coefficient, attempt one candidate with fewer POT terms at a time; 6) go to the next coefficient as long as the constraints are sustained; 7) optionally, re-sort the coefficients and iterate.

The rest of the paper is organized as follows. Section IV states the problem formulation. Section V describes the necessary notation, and Section VI presents the proposed POTx optimization algorithm. An illustrative example is then presented in Section VII. The performance of the developed heuristic solver and the proposed algorithm is evaluated in Section VIII. Section IX concludes the paper.

IV. PROBLEM FORMULATION

The FIR optimization problem is modeled with an integer linear object and a noninteger linear subject for a given set of filter specifications: passband edge $\omega_p$, stopband edge $\omega_s$, passband ripple $\delta_p$, stopband ripple $\delta_s$, and filter length $N$. The problem is formulated as a mixed integer linear problem and solved by a heuristic method. The object function of the optimization problem is shown in (1) and the subject function in (2). Although the problem is modeled for lowpass and highpass filters, it could be extended to bandpass filters as well.
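The subject (2) is checked by the POTMILP solver on discrete frequency grids. A minimal sketch of such a check is shown here, assuming NumPy, the half-coefficient storage used for symmetric filters, and coefficients already normalized back from their integer scaling; `amplitude_response`, `meets_specs`, and the grid size are hypothetical names and choices, not taken from POTMILP. It evaluates the standard zero-phase amplitude response of a symmetric impulse response for odd and even filter lengths.

```python
import numpy as np


def amplitude_response(h_half, n_taps, omega):
    """Zero-phase amplitude response H(omega) of a symmetric FIR filter.

    h_half holds h(0), ..., h(ceil(N/2) - 1); the full impulse response
    satisfies h(n) = h(N - 1 - n). Covers both odd and even N.
    """
    h_half = np.asarray(h_half, dtype=float)
    omega = np.atleast_1d(np.asarray(omega, dtype=float))
    center = (n_taps - 1) / 2.0
    pairs = n_taps // 2                       # number of mirrored tap pairs
    n = np.arange(pairs)
    resp = 2.0 * np.cos(np.outer(omega, center - n)) @ h_half[:pairs]
    if n_taps % 2:                            # odd N: centre tap counted once
        resp = resp + h_half[pairs]
    return resp


def meets_specs(h_half, n_taps, wp, ws, delta_p, delta_s, n_grid=512):
    """Check the linear ripple constraints on discrete frequency grids."""
    hp = amplitude_response(h_half, n_taps, np.linspace(0.0, wp, n_grid))
    hs = amplitude_response(h_half, n_taps, np.linspace(ws, np.pi, n_grid))
    return bool(np.all(np.abs(hp - 1.0) <= delta_p)
                and np.all(np.abs(hs) <= delta_s))


# Toy 3-tap lowpass h = [0.25, 0.5, 0.25], so H(w) = 0.5 + 0.5 cos(w).
print(meets_specs([0.25, 0.5], 3, 0.1 * np.pi, 0.9 * np.pi, 0.05, 0.05))
```

The toy filter passes with $\delta_p = \delta_s = 0.05$ because its worst-case deviation at both band edges is about 0.0245; a denser grid reduces the risk of missing a ripple peak between grid points.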
Fig. 1. Convergent subject.
$$\min \sum_{n=0}^{\lceil N/2 \rceil - 1} \mathrm{Cost}\big(h_s(n)\big) \tag{1}$$

subject to:

$$\begin{aligned} 1 - \delta_p \le H(\omega) \le 1 + \delta_p, &\quad \omega \in [0, \omega_p] \\ -\delta_s \le H(\omega) \le \delta_s, &\quad \omega \in [\omega_s, \pi] \end{aligned} \tag{2}$$

where $h_s(n)$ denotes the scaled filter coefficients obtained from the quantized filter coefficients $h_q(n)$, and $H(\omega)$ is the zero-phase amplitude response (evaluated with the coefficients normalized by the scaling factor). For a symmetric impulse response and odd filter length, with $M = (N-1)/2$, it is defined by [3]

$$H(\omega) = h_s(M) + 2\sum_{n=0}^{M-1} h_s(n)\cos\big((M-n)\,\omega\big)$$

while for a symmetric impulse response and even filter length [3]

$$H(\omega) = 2\sum_{n=0}^{N/2-1} h_s(n)\cos\Big(\big(\tfrac{N-1}{2}-n\big)\,\omega\Big).$$

An independent heuristic solver named POTMILP has been developed in MATLAB for handling the formulated problem and the proposed algorithm. The number of POT terms is minimized subject to the filter specifications. The subject is formulated for discrete values of $\omega$ by selecting a number of frequencies $\omega_i$ in the passband and stopband to obtain a grid of values at each constraint. The solution to this optimization problem yields a coefficient set with minimal POT terms for which the ripple in the passband and stopband remains within the allowed bounds. The ripple bounds are defined by the user as a hard constraint. If there is a discrepancy between the user-defined constraints and the calculated constraints, the developed solver converges towards the minimum ripple, as shown in Fig. 1. However, the unconvergent criterion offers more flexibility in the subject constraints, which accordingly results in a higher reduction of nonzero terms. As an example, if the user-defined stopband attenuation constraint is set to $A_s$ dB, the attenuation calculated after quantization, $A_{s,q}$, might exceed it. Consequently, the solver converges towards $A_{s,q}$ unless the unconvergent attribute is set.

V. PROBLEM NOTATIONS

A. Upper and Lower Bounds

In order to speed up the computation by reducing the number of search space candidates for each coefficient, a lower bound $L(n)$ and an upper bound $U(n)$ are used to limit the search space. Within these bounds, the solver iterates in monotonic steps. Since minimum cost is desirable, the value zero is considered before assuming that at least a single POT term is required to represent a coefficient. The bounds are modeled as given in (3):

$$L(n) = \operatorname{sgn}\big(h_s(n)\big)\, 2^{\lfloor \log_2 |h_s(n)| \rfloor}, \qquad U(n) = \operatorname{sgn}\big(h_s(n)\big)\, 2^{\lceil \log_2 |h_s(n)| \rceil} \tag{3}$$

i.e., the previous lower and the next higher POT integers of each nonzero scaled coefficient. A similar approach for bounding the search space was presented in [3], [18], [23], and [24]. In [3] and [23], the upper and lower bounds were determined by finding all values of each coefficient that meet the filter specifications.

B. Allocation Scheme

Instead of a tree search, a presorted allocation scheme is generated by the developed heuristic solver, which allows far faster computation. The presorted search space for the solver is represented by the allocation scheme given in (4):

$$\mathrm{Aloc} = [a_1, a_2, \ldots, a_{\lceil N/2 \rceil}] \tag{4}$$

where $a_i$ are the coefficient indices in the order in which they are optimized. The allocation schemes are based on sensitivity [2], [19], [26], deviation, and cost, as depicted in (5), (6), and (7), respectively. The allocation scheme based on sensitivity sorts the discrete coefficients in the search space in ascending order, i.e., from the least to the most sensitive coefficient. Equation (6) represents the deviation between the discrete coefficient and the nearest minimal-cost bound, i.e., a bound with a single POT term.
The cost-based scheme (7), in contrast, allocates the coefficients in descending order, from maximum cost down to minimum cost.
In the sensitivity measure (5), $H(\omega_k)$ and $H_q(\omega_k)$ are the frequency responses of the real and the quantized filters, respectively, and $K$ is the number of discrete sampling points over the entire frequency response.
$$D(n) = \min\big(|U(n) - h_s(n)|,\ |h_s(n) - L(n)|\big) \tag{6}$$

where $U(n)$ and $L(n)$ are the scaled upper and lower bounds, respectively, as given by (3), and $h_s(n)$ is the scaled coefficient.
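The deviation (6) and the cost-based ordering can be sketched as follows. This is a hypothetical illustration under stated assumptions: coefficients are scaled integers $h_s(n)$, the bounds follow the POT reading of (3), the cost is the bit count used in (7), and sorting by deviation in ascending order is an assumption, since the text does not state that direction. Python's `sorted` is stable, which reproduces the tie rule that the earlier coefficient in the sequence comes first.

```python
def cost(x: int) -> int:
    """Number of nonzero bits in the binary magnitude of x."""
    return bin(abs(x)).count("1")


def pot_bounds(x: int) -> tuple[int, int]:
    """Previous lower and next higher power of two of |x|, carrying x's sign."""
    m = abs(x)
    if m == 0:
        return 0, 0
    lower = 1 << (m.bit_length() - 1)
    upper = lower if lower == m else lower << 1
    sign = -1 if x < 0 else 1
    return sign * lower, sign * upper


def deviation(x: int) -> int:
    """Distance from x to its nearest single-POT bound."""
    lower, upper = pot_bounds(x)
    return min(abs(upper - x), abs(x - lower))


def allocation_by_cost(coeffs: list[int]) -> list[int]:
    """Coefficient indices from maximum to minimum cost (ties keep order)."""
    return sorted(range(len(coeffs)), key=lambda i: -cost(coeffs[i]))


def allocation_by_deviation(coeffs: list[int]) -> list[int]:
    """Coefficient indices by ascending deviation (direction assumed)."""
    return sorted(range(len(coeffs)), key=lambda i: deviation(coeffs[i]))


h_s = [3, -12, 7, 1, 22]
print(allocation_by_cost(h_s))        # [2, 4, 0, 1, 3]
print(allocation_by_deviation(h_s))   # [3, 0, 2, 1, 4]
```

In the example, 7 and 22 both have cost 3, so index 2 is placed before index 4 purely because it comes first in the coefficient sequence.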
$$\mathrm{Cost}\big(h_s(n)\big) = \sum_{k=0}^{B-1} b_k(n) \tag{7}$$

where $B$ is the quantization bit-width, $h_s(n)$ is the scaled filter coefficient, $\mathrm{Cost}(h_s(n))$ is the number of nonzero terms in the scaled coefficient, and $b_k(n) \in \{0, 1\}$ is the $k$th bit of the binary representation of $h_s(n)$.

Fig. 2(a) shows the impulse response of the floating-point filter coefficients for the given filter specifications. Fig. 2(b)–(f) shows each allocation scheme; the values are normalized. Moreover, hybrid allocation schemes have been developed to take more than one parameter into account. As for the individual schemes, the coefficients are sorted in ascending order of sensitivity or in descending order of cost. Fig. 2(e) and (f) show the hybrid searching space generated by combining sensitivity with deviation (SD) and sensitivity with cost (SC), respectively. If more than one coefficient has the same cost or sensitivity, the coefficient that comes first in the coefficient sequence is placed first.

Fig. 2. (a) Impulse response of the floating-point filter coefficients; searching space according to: (b) sensitivity; (c) deviation; (d) cost; (e) hybrid SD; and (f) hybrid SC.

C. Evolution

For each scaled coefficient $h_s(n)$, a set of candidate integers is created in ascending order, bounded by its lower and upper bounds, forming the evolution space in monotonic steps. $E_i$, with $0 \le i \le C_{\max}$, is defined as the set of candidate values with $i$ POT terms, where $C_{\max}$ is the maximum cost of a discrete coefficient represented in $B$ bits. $E_i$ is given by (8):

$$E_0 = \{0\}, \qquad E_i = \big\{\, s\,x : x \in \mathbb{Z},\ |L(n)| \le x \le |U(n)|,\ \mathrm{Cost}(x) = i \,\big\}, \quad i \ge 1 \tag{8}$$

where $s = 1$ if $h_s(n) \ge 0$ and $s = -1$ if $h_s(n) < 0$. The evolution sequence of a coefficient with cost $c = \mathrm{Cost}(h_s(n))$ is then given by (9):

$$V = [E_0, E_1, \ldots, E_{c-1}] \tag{9}$$

where $V$ holds the candidates of each discrete coefficient, grouped by equal cost. Thus, $E_0$ contains one element and $E_1 = \{L(n), U(n)\}$ contains two elements.

D. Runtime

The runtime of the proposed algorithm depends on the filter order and on the number of elements in the evolution sequence $V$. For each filter coefficient, the following quantities are defined: the number of elements in $V$ and the number of POT terms $c$ in $h_s(n)$.
The binomial coefficient is defined as $\binom{p}{q} = \frac{p!}{q!\,(p-q)!}$. Within $V$, there is one element in $E_0$ and two elements in $E_1$. The number of elements in each remaining evolution set is then evaluated by a binomial coefficient, as shown in (10). The worst-case runtime for each coefficient occurs when the solver has to check all the candidates in the sequence, as given in (11). The worst-case runtime considering multiple optimization iterations is given in (12), where $I$ represents the number of optimization iterations. The total runtime of the POT programming is given in (13), which results in the worst-case runtime (14), linear in the filter length.

VI. PROPOSED POTX ALGORITHM

The proposed POTx algorithm is illustrated by the flowchart shown in Fig. 3. The basic flow of the algorithm is as follows. The inputs to the POTx algorithm are the floating-point filter coefficients and the quantization bit-width $B$. The input floating-point coefficients are quantized and subsequently scaled. Then, a sorted search space is created following the concept in Section V-B. The algorithm processes all the coefficient sets by defining the maximum cost $C_{\max}$, creating the sets $E_i$ and the candidate sequence $V$ as explained in Section V-C, optimizing while preserving the ripple constraints, and looping through the optimized coefficients if permitted (i.e., if the subject is not violated).

The algorithm consists of two processes. The P-I process inspects the elements of the evolution sequence $V$ for a particular coefficient until it reaches the break-off (BO!) state or the no-change (NC!) state. The BO! state implies aborting at the current candidate without going through the remaining candidates, while the NC! state implies keeping the coefficient without replacing it. In process P-I, the filter constraints in the passband and stopband are probed for each inspected element of $V$. The vector $V$ holds the candidates for each coefficient in ascending order, from the lowest cost up to a cost that is less than the coefficient cost. For example, if the cost of the coefficient being optimized is 5, the maximum cost of the candidates in $V$ is 4. If the subject constraints are preserved for the optimized coefficient, the algorithm reaches the BO! state; otherwise, it reaches the NC! state. Introducing the BO! state accelerates the algorithm compared to the conventional MILP algorithm because of its iteration criterion in monotonic steps, as shown in Fig. 4. Moreover, it yields the minimum-cost candidate for the optimized coefficient, since it prevents the algorithm from inspecting higher-cost candidates in $V$. The P-II process resets the internal variables and flags to start a new iteration through the preoptimized coefficients with a new allocation scheme (Aloc) and a new maximum cost $C_{\max}$. Process P-II ensures an efficient optimization process with minimum-cost candidates for each coefficient through multiple loops over the optimized coefficients, as shown in Fig. 4 by the feedback arrow labeled Loop. Fig. 4 shows the equivalent searching structure of the POTx algorithm. A detailed illustrative example is presented in Section VII.
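Processes P-I and P-II can be sketched compactly. The sketch below is a hypothetical Python rendering, not the POTMILP implementation: it assumes scaled integer coefficients, the reading of the bounds and evolution sets (3), (8), and (9) given in Section V, a cost-based allocation re-evaluated at every pass, and a caller-supplied predicate `feasible` that stands in for the ripple check of (2). Within the first evolution set that contains feasible candidates, the one closest to the current value is kept (BO!); if no set does, the coefficient stays unchanged (NC!).

```python
from typing import Callable


def cost(x: int) -> int:
    """Number of nonzero bits in the binary magnitude of x."""
    return bin(abs(x)).count("1")


def evolution(x: int) -> list[list[int]]:
    """V = [E_0, ..., E_{c-1}]: candidates cheaper than x, grouped by cost."""
    c = cost(x)
    if c == 0:
        return []                                  # zero cannot get cheaper
    m, sign = abs(x), (-1 if x < 0 else 1)
    lower = 1 << (m.bit_length() - 1)              # previous lower POT
    upper = lower if lower == m else lower << 1    # next higher POT
    sets = [[0]]                                   # E_0 = {0}
    for i in range(1, c):                          # E_i: i ones within bounds
        sets.append([sign * v for v in range(lower, upper + 1) if cost(v) == i])
    return sets


def p1(coeffs: list[int], idx: int, feasible: Callable[[list[int]], bool]) -> str:
    """P-I for one coefficient; updates coeffs in place, returns 'BO' or 'NC'."""
    original = coeffs[idx]
    for e_i in evolution(original):                # ascending cost
        ok = []
        for cand in e_i:
            trial = coeffs.copy()
            trial[idx] = cand
            if feasible(trial):
                ok.append(cand)
        if ok:                                     # break-off: cheapest set wins
            coeffs[idx] = min(ok, key=lambda v: abs(v - original))
            return "BO"
    return "NC"                                    # no candidate kept the subject


def potx(coeffs: list[int], feasible: Callable[[list[int]], bool],
         iterations: int = 2) -> list[int]:
    """P-II: repeat P-I over all coefficients, re-sorting by cost each pass."""
    coeffs = list(coeffs)
    for _ in range(iterations):
        order = sorted(range(len(coeffs)), key=lambda i: -cost(coeffs[i]))
        for idx in order:
            p1(coeffs, idx, feasible)
    return coeffs


# Toy subject: every coefficient must stay within +/-2 of its start value.
start = [7, 13, 30]
result = potx(start, lambda t: all(abs(a - b) <= 2 for a, b in zip(t, start)))
print(result, sum(map(cost, start)), sum(map(cost, result)))
```

With the toy subject, the first pass replaces 30, 7, and 13 by 32, 8, and 12 and the second pass finds nothing cheaper, so the sketch prints `[8, 12, 32] 10 4`. Re-evaluating the cost order in every pass mimics the new allocation scheme introduced by P-II.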
The POTx algorithm is executed as follows: quantize and scale the coefficients; create the presorted allocation scheme; decide whether to iterate: if not, export the optimized coefficients, otherwise continue. If not all coefficients have been inspected, go to P-I; otherwise, go to P-II.

By using the unconvergent (UC) criterion described in Section IV, more flexibility is introduced in the constraints, so that a further reduction in the number of nonzero terms is achieved. A set of control attributes can be used to configure the algorithm in terms of allocation scheme, optimization iterations, and constraints. To clarify the applied options, the nomenclature POTx.y.z is used in the following. The x denotes the allocation scheme used for presorting the filter coefficients, the y denotes the solver mode (LP for multiple optimization iterations, NL for a single pass without iteration), and the z denotes the constraints (UC for unconvergent, CV for convergent). As an example, C.LP.UC uses the allocation scheme based on cost, with multiple optimization iterations and unconvergent constraints, and SD denotes the hybrid allocation scheme combining sensitivity and deviation.
621
Fig. 3. Algorithm flowchart.
Fig. 4. Proposed POTx algorithm searching space structure.

VII. EXAMPLE

This section presents an example to illustrate the proposed algorithm using the developed solver. A linear-phase lowpass FIR filter with normalized passband and stopband edge frequencies $\omega_p$ and $\omega_s$, respectively, is considered as a case study, with a desired passband ripple $\delta_p$, a stopband attenuation $A_s$ (in dB), a quantization bit-width $B$, and a filter length $N$. Due to symmetry, the optimization is performed on half of the coefficients. From the quantized filter coefficients, the scaled coefficients $h_s(n)$, the coefficient costs, and the allocation scheme generated according to cost are obtained.

The $E_i$ and $V$ functions are generated for each coefficient. The coefficients are arranged vertically according to the cost given by the allocation scheme, as shown by arrow 2 in Fig. 5, and the solver proceeds in the horizontal direction for all coefficients, as depicted by arrow 1 in Fig. 5. For each coefficient, the POTMILP solver checks the candidates within the evolution sets, starting with $E_0$, until it reaches a break-off (BO!) or no-change (NC!) state. Considering the first coefficient, the solver substitutes the candidate 0 (the only element of $E_0$) and then examines the filter response including the new coefficient value. If the filter constraints are not violated, the coefficient is replaced and the BO! state is reached. If the constraints are violated, POTMILP continues with the next evolution set $E_1$. The solver checks a complete evolution set in one run: if more than one candidate in the set satisfies the constraints, the value with the minimum difference from the original value is chosen. If none of the coefficient candidates satisfies the constraints, the coefficient is left unchanged, implying the NC! state. Once one of the evolution sets satisfies the constraints, the tool aborts, because the following evolution sets have more POT terms, i.e., a higher cost. For the complete coefficient set, the first optimization loop reduces the cost from 22 POTs to 13 POTs. This is the cost for just half the coefficients.

Fig. 5. Execution of proposed algorithm with multiple optimization iterations.

Fig. 6. Frequency response of filter shown in Section VII.
TABLE II BENCHMARK FILTERS
The second optimization iteration is marked in gray in Fig. 5; it uses a new presorted allocation scheme. The final result of the optimization process has an overall cost of 12 POTs.
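Applying the gain definition of Section VIII (relative reduction of nonzero terms with respect to the scaled coefficient set) to the half-coefficient costs of this example gives the following worked figures; the exact final value is slightly below the rounded figure of about 50%.

$$\mathrm{Gain}_{\mathrm{loop\,1}} = \frac{22 - 13}{22} \times 100\% \approx 40.9\%, \qquad \mathrm{Gain}_{\mathrm{final}} = \frac{22 - 12}{22} \times 100\% \approx 45.5\%$$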
Through the iterating loops of P-II described in Section VI, the proposed algorithm achieves additional savings in cost. Fig. 6 shows the frequency responses of the quantized filter, the intermediate optimized filter, and the final optimized filter. Both the intermediate and the final optimized filters exhibit a mean square error in their responses, which is the penalty paid for the cost reduction. Thus, an overall saving of about 50% in cost is achieved within a runtime of a few seconds, at a tolerable penalty in the filter response.

VIII. PERFORMANCE EVALUATION

Two main factors determine the performance of the proposed POTx algorithm: runtime and gain (Gain %). The gain is the percentage reduction in the number of nonzero terms in the optimized coefficient set compared to the scaled coefficient set; a higher gain therefore corresponds to fewer nonzero terms. Several benchmark filters are optimized using the POTx algorithm and the POTMILP solver. The implemented algorithm and the developed solver are executed on a 1.6-GHz Pentium processor with 1 GB of RAM. The specifications of the benchmark filters are given in Table II. The number of POT terms per coefficient was not constrained. Filters A, B, C, L2, and S2 are given in [27] as well as in [1]. Inspired by the method in [4], each filter is designed using the Remez algorithm with the stopband attenuation constraint tightened by 3 dB; the value of 3 dB was chosen based on preliminary quantitative simulations. Designing the filter with a tighter stopband constraint and then optimizing it for the given specifications provides extended flexibility for the optimization. For example, if the desired stopband attenuation is $A_s$ dB, the filter is designed for $A_s + 3$ dB but optimized for $A_s$ dB. A conversion algorithm [28] has been adopted to convert from two's complement to CSD representation in order to ensure a consistent comparison with other state-of-the-art algorithms.

The results of the evaluation of the POTx algorithm are shown in Table III. The coefficient deviation allocation scheme results in the lowest reduction of nonzero terms and the minimum runtime among the suboptimal settings (single run without iteration) compared to the other POTx settings. The hybrid allocation schemes, on the other hand, achieve the highest reductions in nonzero terms. Adopting the unconvergent approach generally results in the highest reductions in nonzero terms, because it offers more flexibility in the subject. Fig. 7 summarizes the results presented in Table III. The ratio of the maximum gain to the minimum gain ranges from 1.25 to 2.5, as shown in Fig. 7.

TABLE III PROPOSED ALGORITHM EVALUATION

Fig. 7. Summary of results presented in Table III.

POTSC.LP.UC, which uses the hybrid allocation scheme SC, exhibits one of the highest reductions in POT terms for the majority of the benchmark filters and can therefore be considered the best parameter set. POTD.NL.CV, however, offers the lowest runtime with acceptable optimization results because it has no iterating optimization loops, and can thus be considered a good trade-off between runtime and optimization quality. In general, the results reveal that further reductions in the number of nonzero terms can be attained by employing multiple optimization iterations with the unconvergent criterion.

The coefficient set derived using the proposed algorithm is compared to the results of the Remez algorithm (RMZ), Aktan's algorithm [1] (FIRGAM), and Shi's algorithm [29] (SHI). For the Remez algorithm, the MATLAB Remez function is used to satisfy the filter specifications, followed by a quantization that still fulfills the required specifications; it serves as the reference point for quantifying the gain achieved by the various algorithms. Table IV summarizes the performance of the proposed algorithm compared to the state-of-the-art algorithms FIRGAM and SHI. The number of SPT terms presented in Table IV is obtained by converting the filter coefficients generated by the POTx algorithm with the conversion algorithm described previously. The results show a considerable reduction in the number of nonzero terms as well as in the total number of adders MA + SA, where MA is the number of multiplier adders and SA is the number of structural adders. The computation time required by the proposed algorithm to optimize the filters is also shown in Table IV. The best solution time is the time required to compute the filter coefficient set with the minimum number of nonzero terms; for filter S2, for example, it corresponds to the computing time of the C.LP.UC setting. The total runtime, in contrast, is the time required for all the different POTx settings (the sum of the time column for each filter). The filters optimized using the SHI algorithm have the minimum total number of adders. Nevertheless, the proposed algorithm exhibits the fastest computation time: although filter S2 ends up with almost the same total number of adders, it required 1.98 min of runtime using the POTx algorithm compared to 16 h 42 min using the SHI algorithm. The results show the remarkable savings in computation runtime achieved by the POTx algorithm. The generated filter coefficients for filters B and S2 are given in Tables V and VI, respectively.
The presented coefficients are for half the filter due to symmetry. The frequency responses corresponding to the optimized coefficients given in Tables V and VI are shown in Figs. 8 and 9, respectively.

TABLE IV POTX ALGORITHM VERSUS STATE-OF-THE-ART ALGORITHMS

TABLE V FILTER B GENERATED INTEGER COEFFICIENTS BASED ON SD.LP.CV SETTING FOR THE POTX ALGORITHM

TABLE VI FILTER S2 GENERATED INTEGER COEFFICIENTS BASED ON SC.LP.CV SETTING FOR THE POTX ALGORITHM

Fig. 8. Filter B normalized frequency response of generated parameters given in Table V.

Fig. 9. Filter S2 normalized frequency response of generated parameters given in Table VI.

Common subexpression elimination (CSE) aims to identify and eliminate redundant computations consisting of two nonzero terms [30]. The authors combine three techniques to reduce the multiplier complexity: redundancy reduction, pseudo floating point (PFP) representation [30], [31], and CSE employing Hartley's algorithm [32] for high filter orders and the algorithm in [33] for small filter orders. The redundancy reduction removes zeros, ones, power-of-two terms, trailing shifts, and redundant terms (explained numerically below). The ones, the power-of-two terms, and the trailing shifts are excluded before CSE, as they are hard-wired routes without hardware overhead. Trailing shifts are obtained by extracting the power-of-two factor of each coefficient, as shown in the second column of Tables VII and VIII. A similar approach can be found in [16]. The PFP representation [30], [31] reduces the word-length requirement and has therefore been adopted before CSE. The most common subexpression (CS) pairs, together with their negated versions, are given in [32].
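Trailing-shift extraction and the odd-fundamental view used for redundancy reduction can be illustrated with a small sketch. It is a hypothetical illustration, assuming integer coefficients and assuming that a negative coefficient reuses the fundamental of its magnitude through a subtraction; the function names are invented here, and the sample values are the four filter S2 coefficients discussed later in this section.

```python
def split_shift(x: int) -> tuple[int, int]:
    """Return (odd fundamental, trailing shift k) with x == fundamental * 2**k."""
    if x == 0:
        return 0, 0
    k = (x & -x).bit_length() - 1   # position of the lowest set bit
    return x >> k, k


def odd_fundamentals(coeffs: list[int]) -> list[int]:
    """Distinct odd fundamentals that still need adders.

    Zeros and pure powers of two (fundamental 1) are hard-wired shifts,
    so they are dropped; signs are absorbed by add/subtract choices.
    """
    funds = {abs(split_shift(c)[0]) for c in coeffs}
    return sorted(f for f in funds if f > 1)


for c in (80, 136, 1008, 872):
    print(c, split_shift(c))
print(odd_fundamentals([80, 5, -17, 136, 64, 0, 1008, 63, 872]))
```

The loop prints the pairs (5, 4), (17, 3), (63, 4), and (109, 3), and the toy set collapses to the fundamentals `[5, 17, 63, 109]`: once a fundamental is built, its shifted copies cost no additional adders.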
TABLE VII FILTER B CSE
TABLE VIII FILTER S2 CSE
Fig. 10. Worst-case runtime versus filter order for various algorithms.
Throughout CSE each individual bit is not allowed to occur in more than one pair to assure overlapping is not counted twice [32]. The CS pairs used in this work are
The CSE coefficients corresponding to the coefficients generated using POTx (Tables V–VI) are given in Tables VII–VIII in CSD, respectively. Table VIII shows the CSE coefficients generated using the SPIRAL [34] online generator employing the algorithm of [33]. The last row in each table is the number of adders required within the multipliers (MA). Additionally, the number of CS pairs used in the filter coefficient set is added to obtain the overall number of MAs. The coefficient set for filter B employs 4 CS pairs, while the coefficient set for filter S2 employs only 6 CS pairs. Therefore, the numbers of MAs for filters B and S2 reported in Table IV include these 4 and 6 CS pairs, respectively. The effective bit-widths required for the CSD representation of filters B and S2 are 9 and 11, respectively.
Comparing Tables VI and VIII for filter S2, it can be observed that four coefficients are absent (after excluding the zeros and power-of-two coefficients): 80, 136, 1008, and 872. They are implicitly implemented through other coefficients. Consequently, redundant terms have been eliminated from the filter coefficient set by preserving the odd fundamentals only [35].
While MILP is not a viable option for higher filter orders, the proposed POTx algorithm shows remarkable savings in runtime. The worst-case runtime for various filter lengths using the proposed algorithm and the state-of-the-art algorithms FIRGAM [1] and SHI [29] is shown in Fig. 10. Even though the algorithm is iterated multiple times in the worst case, the runtime is substantially reduced by the POTx algorithm. The runtime of a local search algorithm does not increase exponentially as the filter length increases [17].
IX. CONCLUSION
The proposed algorithm reduces the number of nonzero bits by up to 40% compared to the original filter, and by up to 10% more than other state-of-the-art algorithms. Moreover, a computation time of less than 2 min is achieved for a filter length of 300. The proposed algorithm optimizes the POT terms in the discrete coefficients for the given filter characteristics. The worst-case runtime of the algorithm is linear, which makes it attractive for designing low power FIR filters even at high filter orders. The fast computation is achieved through the proposed presorted allocation scheme, the monotonicity of the candidate costs, and an efficient break-off condition. Owing to the hybrid allocation scheme, the unconvergent approach, and multiple optimization iterations, the proposed algorithm achieves reductions in the number of POT terms that are comparable to those of nonheuristic approaches.
ACKNOWLEDGMENT
The authors would like to acknowledge Chip Hong Chang and Mathias Faust for their admirable assistance. The authors are grateful to Johan Löfberg, Oscar Gustafsson, and Michael Maurer for their valuable discussions. The authors would also like to acknowledge the International Graduate Academy (IGA) of the University of Freiburg for its assistance and support.
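The odd-fundamental argument used when comparing Tables VI and VIII for filter S2 can be checked with simple integer arithmetic. The Python sketch below is illustrative only and is not the authors' implementation; the function names `odd_fundamental` and `group_by_fundamental` are hypothetical. It strips the trailing zero bits of each coefficient c to write it as c = f · 2^k with f odd. Coefficients that share the same |f| differ only by a hard-wired shift, so once f is realized they need no adder of their own; a pure power-of-two coefficient reduces to f = 1, which is consistent with excluding such coefficients from the comparison.

```python
def odd_fundamental(c: int) -> tuple[int, int]:
    """Return (f, k) such that c == f * 2**k and f is odd (c must be nonzero)."""
    if c == 0:
        raise ValueError("zero has no odd fundamental")
    m, k = abs(c), 0
    while m % 2 == 0:  # strip trailing zero bits; each one is a free wired shift
        m //= 2
        k += 1
    return (m if c > 0 else -m), k  # sign stays on f, so c == f * 2**k


def group_by_fundamental(coeffs: list[int]) -> dict[int, list[int]]:
    """Group nonzero coefficients by |odd fundamental| (hypothetical helper).

    Each group needs only one realization of its fundamental; the other
    members are obtained from it by shifting (and, if needed, negation).
    """
    groups: dict[int, list[int]] = {}
    for c in coeffs:
        if c == 0:
            continue  # zero taps need no hardware
        f, _ = odd_fundamental(c)
        groups.setdefault(abs(f), []).append(c)
    return groups


if __name__ == "__main__":
    # The four S2 coefficients reported as absent from the CSE set
    for c in (80, 136, 1008, 872):
        f, k = odd_fundamental(c)
        print(f"{c} = {f} * 2**{k}")
    # 80 = 5 * 2**4
    # 136 = 17 * 2**3
    # 1008 = 63 * 2**4
    # 872 = 109 * 2**3
```

For example, a power-of-two tap such as 64 gives `odd_fundamental(64) == (1, 6)`, i.e. it costs only a shift. Whether the fundamentals 5, 17, 63, and 109 are themselves realized elsewhere in the S2 coefficient set cannot be read from this section and is not assumed here.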
REFERENCES
[1] M. Aktan, A. Yurdakul, and G. Dundar, "An algorithm for the design of low-power hardware-efficient FIR filters," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 6, pp. 1536–1545, Jul. 2008.
[2] Y.-C. Lim, R. Yang, D. Li, and J. Song, "Signed power-of-two (SPT) term allocation scheme for the design of digital filters," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'98), vol. 5, pp. 359–362.
[3] O. Gustafsson, H. Johansson, and L. Wanhammar, "An MILP approach for the design of linear-phase FIR filters with minimum number of signed-power-of-two terms," in Proc. Eur. Conf. Circuit Theory Design, Espoo, Finland, 2001.
[4] R. Mehboob, S. Khan, and R. Qamar, "FIR filter design methodology for hardware optimized implementation," IEEE Trans. Consum. Electron., vol. 55, no. 3, pp. 1669–1673, Aug. 2009.
[5] C.-H. Chang and M. Faust, "On 'A new common subexpression elimination algorithm for realizing low-complexity higher order digital filters'," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 29, no. 5, pp. 844–848, May 2010.
[6] L. Aksoy, E. da Costa, P. Flores, and J. Monteiro, "Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 6, pp. 1013–1026, Jun. 2008.
[7] L. Aksoy, E. O. Gunes, E. Costa, P. Flores, and J. Monteiro, "Effect of number representation on the achievable minimum number of operations in multiple constant multiplications," in Proc. IEEE Workshop Signal Process. Syst., 2007, pp. 424–429.
[8] M. Imran, K. Khursheed, M. O'Nils, and O. Gustafsson, "On the number representation in sub-expression sharing," in Proc. Int. Conf. Signals Electron. Syst. (ICSES'10), pp. 17–20.
[9] Y. Yajun, "Multiplierless multirate FIR filter design and implementation," Ph.D. dissertation, National Univ. Singapore, Singapore, May 2003 [Online]. Available: http://scholarbank.nus.edu.sg/bitstream/handle/10635/14005/mythesis.pdf
[10] Z. G. Feng and K. L. Teo, "A discrete filled function method for the design of FIR filters with signed-powers-of-two coefficients," IEEE Trans. Signal Process., vol. 56, no. 1, pp. 134–139, Jan. 2008.
[11] N. Takahashi and K. Suyama, "Design of CSD coefficient FIR filters based on branch and bound method," in Proc. Int. Symp. Commun. Inf. Technol. (ISCIT'10), pp. 575–578.
[12] Y. Lim, "Design of discrete-coefficient-value linear phase FIR filters with optimum normalized peak ripple magnitude," IEEE Trans. Circuits Syst., vol. 37, no. 12, pp. 1480–1486, Dec. 1990.
[13] H. Q. Ta and T. Le-Nhat, "Design of FIR filter with discrete coefficients based on mixed integer linear programming," in Proc. Int. Conf. Signal Process. (9th ICSP'08), pp. 9–12.
[14] W.-S. Lu and T. Hinamoto, "Design of FIR filters with discrete coefficients via polynomial programming: Towards the global solution," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'07), pp. 2048–2051.
[15] W.-S. Lu, "Design of FIR filters with discrete coefficients: A semidefinite programming relaxation approach," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'01), vol. 2, pp. 297–300.
[16] Y. J. Yu and Y. C. Lim, "Design of linear phase FIR filters in subexpression space using mixed integer linear programming," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 54, no. 10, pp. 2330–2338, 2007.
[17] F. Xu, C. H. Chang, and C. C. Jong, "Design of low-complexity FIR filters based on signed-powers-of-two coefficients with reusable common subexpressions," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 10, pp. 1898–1907, 2007.
[18] C.-Y. Yao and C.-L. Sha, "Fixed-point FIR filter design and implementation in the expanding subexpression space," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'10), pp. 185–188.
[19] Z. Ye and C.-H. Chang, "Local search method for FIR filter coefficients synthesis," in Proc. IEEE Int. Workshop Electron. Design, Test Appl. (2nd DELTA'04), pp. 255–260.
[20] Y. Lim and S. Parker, "Discrete coefficient FIR digital filter design based upon an LMS criteria," IEEE Trans. Circuits Syst., vol. CAS-30, no. 10, pp. 723–739, Oct. 1983.
[21] H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients," IEEE Trans. Circuits Syst., vol. 36, no. 7, pp. 1044–1047, Jul. 1989.
[22] C.-L. Chen and A. N. Willson, Jr., "A trellis search algorithm for the design of FIR filters with signed-powers-of-two coefficients," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 46, no. 1, pp. 29–39, Jan. 1999.
[23] J. Yli-Kaakinen and T. Saramäki, "A systematic algorithm for the design of multiplierless FIR filters," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2001, vol. 2, pp. 185–188.
[24] T. Fujie, R. Ito, K. Suyama, and R. Hirabayashi, "A new heuristic signed-power of two term allocation approach for designing of FIR filters," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2003, vol. 4, pp. IV-285–IV-288.
[25] W. S. Lu, "Design of FIR digital filters with discrete coefficients via convex relaxation," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'05), vol. 2, pp. 1831–1834.
[26] A. Shahein, M. Becker, N. Lotze, M. Ortmanns, and Y. Manoli, "Optimized scheme for power-of-two coefficient approximation for low power decimation filters in sigma delta ADCs," in Proc. IEEE Midwest Symp. Circuits Syst. (51st MWSCAS'08), pp. 787–790.
[27] Suite of Constant Coefficient FIR Filters 2010, FIRsuite [Online]. Available: http://www.firsuite.net
[28] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. New York: Wiley, 1999.
[29] D. Shi and Y. J. Yu, "Design of linear phase FIR filters with high probability of achieving minimum number of adders," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 1, pp. 126–136, Jan. 2011.
[30] A. Vinod and E.-K. Lai, "An efficient coefficient-partitioning algorithm for realizing low-complexity digital filters," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 12, pp. 1936–1946, Dec. 2005.
[31] A. P. Vinod and E. M.-K. Lai, "Low power and high-speed implementation of FIR filters for software defined radio receivers," IEEE Trans. Wireless Commun., vol. 5, no. 7, pp. 1669–1675, Jul. 2006.
[32] R. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., vol. 43, no. 10, pp. 677–688, Oct. 1996.
[33] Y. Voronenko and M. Püschel, "Multiplierless multiple constant multiplication," ACM Trans. Algorithms, vol. 3, no. 2, 2007, art. no. 11.
[34] Software/Hardware Generation for DSP Algorithms, Spiral [Online]. Available: http://spiral.ece.cmu.edu/mcm/gen.html
[35] M. Faust and C.-H. Chang, "Minimal logic depth adder tree optimization for multiple constant multiplication," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS'10), pp. 457–460.
Ahmed Shahein received the B.Sc. and M.Sc. degrees in electronics and communications from the University of Ain Shams, Cairo, Egypt, in 2002 and 2007, respectively. He is currently working toward the Ph.D. degree at the Fritz Huettinger Chair of Microelectronics of the Institute of Microsystems Technology (IMTEK), University of Freiburg, Freiburg, Germany. His research interests lie in the fields of DSP, low power digital circuits, and low power digital filters.
Qiang Zhang graduated from Huazhong University of Science and Technology Hankou Branch Institute, China, in 1998. He received the Dipl.-Ing. (M.Sc.) degree in software engineering from the University of Freiburg, Germany, in 2010.
Niklas Lotze received the Dipl.-Ing. (M.Sc.) degree in microsystem technology from the University of Freiburg, Freiburg, Germany, in 2004. He was part of the Ph.D. program Embedded Microsystems of the University of Freiburg (2005–2008) and is now a Research Assistant at the Fritz Huettinger Chair of Microelectronics of the Institute of Microsystems Technology (IMTEK, University of Freiburg). His research interests lie in the field of ultra-low power, ultra-low voltage digital circuits.
Yiannos Manoli (M'82–SM'08) was born in Famagusta, Cyprus, in 1954. As a Fulbright scholar, he received the B.A. degree (summa cum laude) in physics and mathematics from Lawrence University, Appleton, WI, in 1978, the M.S. degree in electrical engineering and computer science from the University of California, Berkeley, in 1980, and the Dr.-Ing. degree in electrical engineering from the Gerhard Mercator University, Duisburg, Germany, in 1987. From 1980 to 1984, he was a Research Assistant at the University of Dortmund, Germany, in the field of A/D and D/A converters. In 1985, he joined the newly founded Fraunhofer Institute of Microelectronic Circuits and Systems, Duisburg, Germany, where he established a design group working on mixed-signal CMOS circuits, especially for monolithic integrated sensors and application-specific microcontrollers. From 1996 to 2001, he held the Chair of Microelectronics as full Professor with the Department of Electrical Engineering, University of Saarland, Saarbruecken, Germany. In 2001, he joined the Department of Microsystems Engineering (IMTEK) of the Albert-Ludwig-University, Freiburg, Germany, where he established the Chair of Microelectronics. With an endowment of the Fritz Huettinger Foundation and in memory of the founder of today's Huettinger Elektronik, the University of Freiburg named the chair "Fritz Huettinger Chair of Microelectronics" in 2010. Since 2005, he has additionally served as one of the three directors at the Institute of Micromachining and Information Technology of the Hahn-Schickard Gesellschaft (HSG-IMIT) in Villingen-Schwenningen, Germany. His current research interests are the design of low-voltage/low-power mixed-signal CMOS circuits, energy harvesting electronics, sensor readout circuits, as well as analog-to-digital converters. Additional research activities concentrate on motion and vibration energy transducers and on the field of inertial sensors and sensor fusion.
In 2000, he had the opportunity to spend half a year on a research project with Motorola (now Freescale) in Phoenix, AZ. In 2006, he spent his sabbatical semester with Intel, Santa Clara, CA, working on the readout electronics for a high-resolution accelerometer. Prof. Manoli and his group have received best paper awards at ESSCIRC 1988 and 2009, PowerMEMS 2006, MWSCAS 2007, and MSE-2007. The MSE-2007 award was granted for SpicyVOLTsim (www.imtek.de/svs), a web-based application for the animation and visualization of analog circuits, for which Yiannos Manoli also received the Media Prize of the University of Freiburg in 2005. He was the first to receive the Best Teaching Award of the Faculty of Engineering when it was introduced in 2008. For his creative and effective contributions to the teaching of microelectronics, he has also received the Excellence in Teaching Award of the University of Freiburg and the Teaching Award of the State of Baden-Wuerttemberg, both in 2010. He is a Distinguished Lecturer of the IEEE. He is on the Senior Editorial Board of the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS and on the Editorial Board of the Journal of Low Power Electronics. He served as Guest Editor of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in 2002 and the IEEE JOURNAL OF SOLID-STATE CIRCUITS in 2011. He has served on the committees of a number of conferences such as ISSCC, ESSCIRC, IEDM, and ICCD, and was Program Chair (2001) and General Chair (2002) of the IEEE International Conference on Computer Design (ICCD). He is a member of VDE, Phi Beta Kappa, and Mortar Board.


1549-8328/$26.00 © 2011 IEEE

TABLE I STATE-OF-THE-ART OPTIMIZATION TRENDS


Fig. 3. Algorithm flowchart.

Fig. 4. Proposed POTx algorithm searching space structure.

TABLE II BENCHMARK FILTERS

TABLE III PROPOSED ALGORITHM EVALUATION

Fig. 7. Summary of results presented in Table III.

TABLE IV POTX ALGORITHM VERSUS STATE-OF-THE-ART ALGORITHMS

TABLE VII FILTER B CSE

TABLE VIII FILTER S2 CSE
