JP3453618B2

JP3453618B2 - Processor for division and square root using polynomial approximation of root

Info

Publication number: JP3453618B2
Application number: JP14659192A
Authority: JP
Inventors: エス．エム．クエック; ヒューラリー; プラブーインヤネシュヴァール; エー．ウェアフレデリック
Original assignee: Hyundai Electronics Industries Co., Ltd.
Priority date: 1991-10-11
Filing date: 1992-05-11
Publication date: 2003-10-06
Anticipated expiration: 2018-10-06
Also published as: JPH07200266A

Description

Detailed Description of the Invention

[0001]

FIELD OF THE INVENTION The present invention relates to a mathematical processor for performing floating-point division and square root operations.

[0002]

BACKGROUND OF THE INVENTION Generally speaking, division and square root operations are performed in mathematical coprocessors by a tedious method of subtracting and shifting, much like working a long division by hand. The drawback of this method is that it takes a long time to complete all the necessary subtract and shift operations.

[0003] One method used to improve the speed of division and square root operations takes advantage of quadratic convergence. To determine y/x, the value 1/x is determined first, by some method other than tedious long division. A multiplication (y times 1/x) can then be used, which speeds up the calculation. One way to determine the value of 1/x would be to store all possible values in memory.

[0004] However, when a useful number of bits of accuracy is required, the memory would have to be very large. The value of 1/x is therefore calculated each time for each different value of x. One approach is to start with a low-accuracy initial approximation and then improve it using Newton-Raphson iteration. Such a method is described in S. Waser and M. J. Flynn, "Introduction to Arithmetic for Digital Systems Designers" (CBS College Publishing, New York, 1982).

[0005] This iteration uses the equation F_{K+1} = F_K (2 − x F_K), where F_K is an approximation of 1/x. By starting from an initial approximation of F_K and performing several iterations, the accuracy can be improved to the required level through quadratic convergence. The operand x is generally normalized to the range 1 ≤ x < 2, and the initial 1/x can be selected from values stored in a ROM. For square roots, this technique first calculates 1/√x and then multiplies by x to produce √x. If desired, the initial approximation of 1/√x can be improved by a Newton-Raphson iteration similar to the one above.
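The recurrence above can be tried out numerically. The following is a minimal sketch in floating point, not the patent's hardware; the operand `x = 1.7`, the crude seed `0.6`, and the function name are illustrative assumptions.

```python
def refine_reciprocal(x, f, steps):
    """Apply F_{K+1} = F_K * (2 - x * F_K) `steps` times, starting from f."""
    history = [f]
    for _ in range(steps):
        f = f * (2.0 - x * f)
        history.append(f)
    return history


x = 1.7
seed = 0.6  # deliberately crude guess for 1/1.7 (assumed, not a ROM value)
history = refine_reciprocal(x, seed, 4)

# Relative error |1 - x*F_K| of each iterate; it is squared at every step.
errors = [abs(1.0 - x * f) for f in history]
```

Because the relative error is squared on each pass, the number of correct bits roughly doubles per iteration, which is why a better seed directly reduces the number of iterations needed.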

[0006]

SUMMARY OF THE INVENTION The present invention provides an improved apparatus and method for determining the values of 1/x and 1/√x for a given operand x, and for determining the values of y/x and √x. This is achieved by using a polynomial equation. The coefficients of the polynomial equation are stored in a ROM. Different coefficients are used for different values of x. The most significant bits of the operand x are used as the input to the ROM to select the appropriate coefficients.

[0007] In the preferred embodiment, the polynomial is of second order, and the coefficients are precise enough to give a final result of single-precision accuracy. For double precision, the polynomial result is used as the initial approximation for a Newton-Raphson iteration. This gives the initial approximation much higher accuracy without enlarging the ROM that would otherwise supply the first Newton-Raphson approximation. As a result, only a single Newton-Raphson iteration is required to obtain a result of double-precision accuracy.

[0008] The polynomial used is preferably of the form Ax² + Bx + C. The present invention also uses a split multiplier array to speed up the evaluation of this polynomial. After x² is first computed, the multiplication of A by x² and the multiplication of B by x are performed simultaneously, each in one half of the multiplier array. This is possible because the stored coefficients do not need full precision, so each multiplication needs only part of the multiplier array. The sizes of the coefficients correspond to what is needed to give a result of the appropriate precision for each term of Ax² + Bx + C. In the preferred embodiment, A is 15 bits, B is 24 bits, and C is 32 bits. Separate values of the coefficients A, B, and C must be stored for the approximations of 1/x and 1/√x.

[0009] As a further aspect of the invention, a special multiplier is provided which can accept operands in carry/save format. This eliminates the need to convert a partial product to ordinary binary format by fully propagating the carries, so the speed of a sequence of multiplications is improved. This is achieved by providing Booth recoder logic that can accept an operand in either ordinary binary format or carry/save format.

[0010] The present invention also provides an improved rounding method that produces correct IEEE rounding with an operation involving only a single multiplication. The invention thus speeds up processor operation in several ways. First, the initial approximation is made very accurate by using a polynomial approximation. Second, the split multiplier array allows two or more operations to be performed simultaneously. Third, the Booth recoder logic is designed to accept intermediate operands in carry/save format, so that they need not be put into pure binary format before the next multiplication is done. Finally, the special rounding method of the invention requires very few calculation steps.

[0011] The techniques of this invention are used for division, square root, reciprocal, and reciprocal square root operations, except that the special rounding method is not used for the reciprocal and reciprocal square root operations.

[0012]

EMBODIMENT FIG. 1 is a block diagram of an overall semiconductor chip incorporating the present invention. A pair of operands is supplied on input buses 12 and 14. These are provided to a register file 16. The register file is in turn connected to a separate ALU 18 and a multiplier unit 20. Their outputs are supplied through a multiplexer 22 to an output 24. This arrangement allows the ALU and the multiplier unit to be used in parallel. The invention to which this application is directed is contained within multiplier unit 20.

Polynomial Approximation

[0013] FIG. 2 is a graph showing the 1/x curve for x between 1.0 and 2.0. As discussed in the background, to perform a division by a given operand x it is advantageous to calculate 1/x first and then multiply by the numerator operand y. The determination of 1/x starts from an approximation for a given value of x between 1.0 and 2.0. Except where noted, all operations (reciprocal, division, reciprocal square root, and square root) take as their first operand a number that, in accordance with IEEE Standard 754, is always normalized to lie between 1.0 and 2.0.

[0014] In the present invention, the interval between 1.0 and 2.0 is divided into 256 smaller intervals, and the 1/x curve of FIG. 2 in each interval i can be accurately approximated by A_i x² + B_i x + C_i. The coefficients A_i, B_i, and C_i for each of these 256 intervals are determined separately in advance.

[0015] For each interval, the coefficients of the polynomial Ax² + Bx + C are determined by selecting three points: the midpoint of the interval, and two other points located √3/2 times half the interval width away from the midpoint, one in each direction. These three points are then used to form three equations that are solved simultaneously to produce A_i, B_i, and C_i. The values of the coefficients A₁ to A₂₅₆, B₁ to B₂₅₆, and C₁ to C₂₅₆ are stored in a ROM 30, as shown in FIGS. 3 and 4 and described further below. ROM 30 is addressed by the 8 most significant bits of the operand x on line 32. One specific embodiment takes advantage of the fact that the 8 most significant bits of x are constant within each interval.
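The three-point fit described above can be sketched in software. This is an illustrative floating-point model only: the function names are assumptions, the three simultaneous equations are solved with divided differences (any exact solver would do), and the patent's ROM holds truncated fixed-point values rather than doubles.

```python
import math


def fit_quadratic(f, lo, hi):
    """Return (A, B, C) so that A*x*x + B*x + C matches f at three points:
    the midpoint of [lo, hi] and midpoint +/- (sqrt(3)/2) * half-width."""
    mid = 0.5 * (lo + hi)
    offset = (math.sqrt(3.0) / 2.0) * 0.5 * (hi - lo)
    x0, x1, x2 = mid - offset, mid, mid + offset
    y0, y1, y2 = f(x0), f(x1), f(x2)
    # Divided differences solve the three simultaneous equations.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = y0 - x0 * (d01 - a * x1)
    return a, b, c


# One coefficient triple per interval of width 1/256 between 1.0 and 2.0.
TABLE = [
    fit_quadratic(lambda x: 1.0 / x, 1.0 + i / 256.0, 1.0 + (i + 1) / 256.0)
    for i in range(256)
]


def approx_reciprocal(x):
    """Approximate 1/x for 1.0 <= x < 2.0 from the interval's coefficients."""
    a, b, c = TABLE[int((x - 1.0) * 256.0)]
    return (a * x + b) * x + c


# Worst absolute error seen on a fine grid of sample points.
worst = max(
    abs(approx_reciprocal(1.0 + k / 65536.0) - 1.0 / (1.0 + k / 65536.0))
    for k in range(65536)
)
```

The three sample points coincide with the Chebyshev nodes of the interval, which keeps the peak interpolation error small; in this model `worst` stays below 2⁻²⁸.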

[0016] This is done by defining x = x₀ + Δx, where x₀ consists of the 8 most significant bits of x (which are constant throughout each interval) and Δx is eight zeros followed by the remaining bits of x. When this is substituted into the expression Ax² + Bx + C and the terms are regrouped, the resulting expression is (A)Δx² + (2Ax₀ + B)Δx + (Ax₀² + Bx₀ + C). The values in parentheses are each constant throughout the interval and can be stored in the ROM as modified A, B, and C.
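The regrouping can be checked with exact rational arithmetic. In this sketch the coefficient values, the sample bit patterns, and the helper name `recenter` are made up for illustration; only the algebra comes from the text.

```python
from fractions import Fraction


def recenter(a, b, c, x0):
    """Return the modified (A, B, C) for evaluating the polynomial in dx = x - x0."""
    return a, 2 * a * x0 + b, a * x0 * x0 + b * x0 + c


# Hypothetical coefficients for one interval (not taken from the patent).
a, b, c = Fraction(7, 8), Fraction(-45, 16), Fraction(47, 16)

x0 = Fraction(0b10010110, 1 << 7)  # the 8 leading bits of x: 1.0010110 in binary
dx = Fraction(0b1011, 1 << 20)     # remaining bits, all below the 8 leading bits
x = x0 + dx

am, bm, cm = recenter(a, b, c, x0)
direct = a * x * x + b * x + c
recentered = am * dx * dx + bm * dx + cm
```

With exact fractions, `direct` and `recentered` are identical, confirming that the modified coefficients reproduce the original polynomial.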

[0017] When the initial approximation is made, the calculation is AΔx² + BΔx + C, where A, B, and C are their modified values. The advantage of this lies in the handling of significant digits. Δx has eight leading zeros and Δx² has 16 leading zeros, and these leading zeros carry over into the products formed from them. Thus the most significant bit of A can affect only the 17th and less significant bits of the approximation, and the most significant bit of B can affect only the 9th and less significant bits of the approximation. Because this initial approximation needs to be accurate to at most 28 bits, the coefficients need not all be stored to full precision.

[0018] In the preferred embodiment, A has 15 bits of precision, B has 24, and C has 32. This gives a total of 31 bits of precision for Ax² + Bx + C, because each of the three terms being added is accurate to at least 31 bits. The coefficients A, B, and C corresponding to this value of x are then supplied to registers 31, 33, and 34, respectively.
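The bit bookkeeping behind these widths can be written out explicitly. This is a simplified model, assuming each term's accuracy is its stored coefficient width plus the leading zeros of the factor it multiplies; the dictionary names are illustrative.

```python
# Stored coefficient widths from the preferred embodiment.
STORED_BITS = {"A": 15, "B": 24, "C": 32}

# Leading zeros of the factor each coefficient multiplies: dx^2, dx, and 1.
LEADING_ZEROS = {"A": 16, "B": 8, "C": 0}

# Bit position down to which each term of A*dx^2 + B*dx + C is accurate.
term_accuracy = {k: STORED_BITS[k] + LEADING_ZEROS[k] for k in STORED_BITS}

# The sum is only as accurate as its least accurate term.
sum_accuracy = min(term_accuracy.values())
```

Under this model the A term reaches 31 bits and the B and C terms reach 32, so the sum is good to 31 bits, matching the figure quoted in the text.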

[0019] FIGS. 3 and 4 show a multiplier divided into two halves, 36 and 38. For the polynomial calculation, the two halves are coupled to act as a single multiplier for the x² term, and are used as two separate multipliers for the simultaneous multiplications Ax² and Bx, as discussed below.

[0020] Referring to FIG. 13, the lookup of the modified A, B, and C is done during cycle 1. At the same time, Δx is multiplied by itself to produce the squared term, with the two half arrays operating together as one. The high-order bits of the Δx operand are supplied through operand multiplexer 40 and multiplexer 52 to the "A" input of half 36 of the carry/save adder (CSA) multiplier array, while the remaining low-order bits are supplied to array half 38. The Δx operand is also supplied through multiplexer 41 to Booth recoder 50. Booth recoder 50 then supplies the Booth-recoded version of the high-order bits of Δx to the "B" operand input of multiplier array 36, and the low-order bits to the "B" input of array 38. The product of the low-order multiplication, in its two carry/save (CS) components, is supplied to multiplexers 42 and 43, respectively.

[0021] As can be seen, the control signals for this cycle select the inputs labeled "0". These are then shifted, just as they would be within a multiplier array, and applied to inputs 44 and 45 of array half 36. Array 36 completes the calculation. The Δx² result is supplied from the product output of array half 36 through multiplexer 46 to register 47 for the "first" pipeline. (The "second" pipeline consists of carry/propagate (CP) adder 48 and second pipeline register 49.) At this stage the Δx² operand is stored in carry/save format.

[0022] In the second cycle, shown in FIG. 13 and illustrated in FIGS. 5 and 6 with the control signals for cycle 2, the Δx² term from register 47 and operand A are supplied to multiplier array 36 to calculate the AΔx² term. At the same time, the B operand and the Δx operand are supplied to multiplier array 38 so that the BΔx multiplication can be done simultaneously. Because not all 64 bits are needed for the coefficients, the two parts of the split multiplier array can be used to do this at the same time. Also during cycle 2, in the second pipeline consisting of carry/propagate (CP) adder 48 and register 49, a CP rounding operation is performed on Δx² to put it into pure binary format. This automatic CP rounding is not needed until the final result, and the binary forms of Δx² and of the other intermediate results are ignored.

[0023] As discussed in more detail below, the Δx² term in register 47 is supplied in CS format to multiplexers 41 and 51, which are coupled to the CS inputs of Booth recoder 50. Two multiplexers are used because of the S, C format. The Booth-encoded output is then supplied to the "B" operand input of array 36. This output is the 36 most significant bits of Δx², and the remaining bits from the Booth recoder are supplied to array 38. The A coefficient is supplied through multiplexer 52 to the "A" operand input of array 36.

[0024] The B coefficient is supplied from register 33 through multiplexer 41 and Booth recoder 50 to the "B" operand input of array 38. The Δx term on MA input 32 is applied through multiplexer 40 to the "A" operand input of array 38.

[0025] Also during cycle 2, the BΔx product of array 38 is combined with the C coefficient from register 34 in carry/save adder (CSA) 53 to provide the BΔx + C term. This output is supplied to CSA 54, where it is added to the AΔx² product of array 36, and the result is stored in register 47.

[0026] As can be seen, the x² term is used in carry/save format, without being converted to ordinary format through CP adder 48, since that conversion would require additional time. This capability is provided by special Booth recoder logic, which accepts either ordinary binary format or the CS format described in detail below, and which supplies all the necessary information for one of the operands to multiplier arrays 36 and 38.

[0027] In the third cycle of the single-precision calculation, shown in FIGS. 7 through 13, the operand y is multiplied by the polynomial Ax² + Bx + C. This is done while the polynomial is still in CS format, again saving conversion time. No Newton-Raphson iteration is needed for single-precision calculations, because the polynomial provides the level of precision required to go directly to the multiplication of the 1/x term by the y term. The control signals used for the remaining cycles are shown in FIGS. 9, 10, 11, and 12.
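The three-cycle single-precision flow (coefficient lookup, polynomial evaluation, multiplication by y) can be modeled in software. The sketch below is a floating-point approximation under stated assumptions: `build_rom` and `divide` are hypothetical names, the ROM holds doubles rather than the 15/24/32-bit fixed-point coefficients, and carry/save arithmetic is not modeled.

```python
import math


def build_rom(entries=256):
    """Fit modified (A, B, C) per interval, expressed in dx = x - x0."""
    rom = []
    h = 1.0 / entries
    off = (math.sqrt(3.0) / 2.0) * 0.5 * h
    for i in range(entries):
        x0 = 1.0 + i * h
        d = [0.5 * h - off, 0.5 * h, 0.5 * h + off]  # sample points as offsets
        y = [1.0 / (x0 + t) for t in d]
        s01 = (y[1] - y[0]) / (d[1] - d[0])
        s12 = (y[2] - y[1]) / (d[2] - d[1])
        a = (s12 - s01) / (d[2] - d[0])
        b = s01 - a * (d[0] + d[1])
        c = y[0] - d[0] * (s01 - a * d[1])
        rom.append((a, b, c))
    return rom


ROM = build_rom()


def divide(y, x):
    """Approximate y / x for 1.0 <= x < 2.0 with no iteration."""
    index = int((x - 1.0) * 256.0)       # cycle 1: address ROM with leading bits
    dx = x - (1.0 + index / 256.0)
    a, b, c = ROM[index]
    recip = a * dx * dx + b * dx + c     # cycle 2: A*dx^2 + B*dx + C
    return y * recip                     # cycle 3: multiply by the numerator


# Worst relative error of the quotient over a set of sample divisors.
worst = max(
    abs(divide(1.25, 1.0 + k / 4099.0) * (1.0 + k / 4099.0) / 1.25 - 1.0)
    for k in range(4099)
)
```

In this model the quotient's relative error stays below 2⁻²⁸ without any refinement step, which is consistent with the text's claim that the polynomial alone suffices for single precision.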

[0028] For a double-precision division, the operation proceeds in the same way up to the third cycle, as illustrated in the timing chart of FIG. 14. In the third cycle, the Newton-Raphson iteration is begun by multiplying the negated value of x by the polynomial Ax² + Bx + C (in CS form). The iteration is completed in cycle 4 to provide the required precision. Then, in cycle 5, the operand y is multiplied by the Newton-Raphson approximation of 1/x. Thus, for double precision, the use of the polynomial of the present invention means that only a single Newton-Raphson iteration is required.
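Why one iteration is enough can be seen with exact arithmetic. The sketch below assumes a starting approximation whose relative error is exactly 2⁻²⁸ (an illustrative stand-in for the polynomial result) and applies one step of F(2 − xF); the operand value is arbitrary.

```python
from fractions import Fraction


def newton_step(x, f):
    """One Newton-Raphson refinement of f as an approximation of 1/x."""
    return f * (2 - x * f)


x = Fraction(3, 2)
e0 = Fraction(1, 2 ** 28)   # assumed relative error of the starting value
f0 = (1 + e0) / x           # 1/x with relative error 2^-28
f1 = newton_step(x, f0)
e1 = abs(1 - x * f1)        # relative error after one step
```

The step squares the relative error, taking it from 2⁻²⁸ to 2⁻⁵⁶, which is beyond the 53-bit significand of IEEE double precision.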

[0029] The Booth recoding referred to above is discussed in A. D. Booth, "A Signed Binary Multiplication Technique" (Quarterly Journal of Mechanics and Applied Mathematics, Vol. 4, Part 2, 1951). The logic of the present invention allows the Booth recoding logic to accept an operand in carry/save format. In this way, as described further below, x² and other terms need not be put into ordinary format, which speeds up the operation.

[0030] The square root is also determined using the polynomial Ax² + Bx + C, but the approximation uses the 1/√x curve for x between 0.5 and 2.0. 512 intervals are used, and the coefficients for this reciprocal square root approximation are stored in a separate ROM or in a separate section of the same ROM 30. The actual square root is determined from the reciprocal square root by multiplying it by x, similar to the division procedure above.
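The square root path can be sketched the same way. This floating-point model assumes the 512 intervals divide the range 0.5 to 2.0 uniformly, which the text does not state; the function names are also illustrative.

```python
import math

LO, HI, N = 0.5, 2.0, 512
WIDTH = (HI - LO) / N  # assumption: equal-width intervals


def fit_quadratic(f, lo, hi):
    """Quadratic through f at the midpoint and midpoint +/- (sqrt(3)/2)*half-width."""
    mid = 0.5 * (lo + hi)
    off = (math.sqrt(3.0) / 2.0) * 0.5 * (hi - lo)
    x0, x1, x2 = mid - off, mid, mid + off
    y0, y1, y2 = f(x0), f(x1), f(x2)
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    return a, d01 - a * (x0 + x1), y0 - x0 * (d01 - a * x1)


RSQRT_TABLE = [
    fit_quadratic(lambda x: 1.0 / math.sqrt(x), LO + i * WIDTH, LO + (i + 1) * WIDTH)
    for i in range(N)
]


def sqrt_approx(x):
    """Approximate sqrt(x) for 0.5 <= x < 2.0 as x * (1/sqrt(x))."""
    i = min(int((x - LO) / WIDTH), N - 1)
    a, b, c = RSQRT_TABLE[i]
    rsqrt = (a * x + b) * x + c   # polynomial approximation of 1/sqrt(x)
    return x * rsqrt              # final multiplication by x


# Worst relative error of the square root over a set of sample operands.
worst = max(
    abs(sqrt_approx(0.5 + 1.5 * k / 3000.0) / math.sqrt(0.5 + 1.5 * k / 3000.0) - 1.0)
    for k in range(3000)
)
```

As with division, the only operation after the polynomial is a single multiplication, here by x itself rather than by a separate numerator.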

Split Array Multiplier

[0031] The Ax² and Bx terms were shown being calculated above, and the way this is done takes advantage of another aspect of the present invention. These terms are used to calculate a single-precision value of 1/x. This single-precision result is calculated to a precision of only 28 bits, so the values used in the calculation need no more than 28 bits of precision. Calculating the Ax² term and the Bx term each requires one multiplication, but again, because only a 28-bit result is needed, operands with only 28 bits of precision may be used. Thus the full 60 × 64 array is not needed to calculate these two terms, and it may be split so that the terms can be calculated simultaneously. The specific embodiment shown in FIGS. 3 through 12 illustrates this: the multiplier array is split into two sub-arrays 36 and 38, each of which has a 60-bit MA operand, and which have 36-bit and 28-bit MB operands, respectively. During the second stage of the second-order polynomial approximation, the product Ax² is calculated in the 60 × 36 sub-array while Bx is simultaneously calculated in the 60 × 28 sub-array, saving a full cycle compared with using the full array for each calculation.

[0032] In general, a multi-step calculation often includes two intermediate calculations, each of whose operands does not depend on the result of the other. If so, the intermediate calculations should be examined to see how many bits of precision their results and operands require. If the number of bits required for an operand of one intermediate calculation plus the number of bits required for an operand of the other does not exceed the maximum number of bits available in the array, the array may be split and both intermediate calculations performed simultaneously. The same analysis suggests that, for more complex calculations, the array could be split to handle three or more calculations at once.

[0033] Splitting the array is fairly simple: each of sub-arrays 36 and 38 is designed as an ordinary independent multiplier array. For split-array operation, each sub-array is simply supplied with its own operands and delivers its result in the usual way. To combine the sub-arrays into a larger unified array, logic is provided to do the following. One of the two operands is supplied in its entirety to each sub-array. The other operand is split into fragments, one for each sub-array. In this specific embodiment, the fragmenting and supplying of the operand is done by multiplexers 41 and 51 and Booth recoder 50. The values calculated by sub-arrays 36 and 38 are used as partial products. Each is shifted according to the position, within the whole operand that was split, of the least significant digit of its operand fragment.

[0034] FIGS. 15 and 16 illustrate this. A is a 60-bit operand which is multiplied by B, a 64-bit operand, using the 60 × 28 multiplier array (38) and the 60 × 36 multiplier array (36). A is multiplied by the least significant 28 bits of B in the 60 × 28 array to produce an 88-bit result C. At the same time, A is multiplied by the most significant 36 bits of B in the 60 × 36 array to produce a 96-bit result D. D is then logically shifted left by 28 bits (the actual hardware does this by shifting C right, to simplify the layout), producing a 124-bit value F whose leftmost 96 bits are the 96 bits of D and whose rightmost 28 bits are zeros (E₂₈ … E₁). C and F are then added together to produce the result G. FIG. 16 illustrates the fact that C and D are partial products, and also shows that, without an actual separate shifter, D can be supplied to the adder input in left-shifted fashion, or C can be supplied to the adder input in right-shifted fashion.
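The recombination of the two sub-array products can be checked with ordinary integers. This sketch uses Python integers in place of the hardware arrays; the function name is an assumption, while the letters C, D, F, and G follow the text.

```python
def split_multiply(a, b):
    """Multiply a 60-bit a by a 64-bit b using a 60x28 and a 60x36 product."""
    assert 0 <= a < (1 << 60) and 0 <= b < (1 << 64)
    b_low = b & ((1 << 28) - 1)   # least significant 28 bits of B
    b_high = b >> 28              # most significant 36 bits of B
    c = a * b_low                 # 60 x 28 sub-array: result C, up to 88 bits
    d = a * b_high                # 60 x 36 sub-array: result D, up to 96 bits
    f = d << 28                   # D shifted left 28 places: F, up to 124 bits
    return c + f                  # final sum G


a = 0x9ABCDEF01234567
b = 0xFEDCBA9876543210
g = split_multiply(a, b)
```

The result equals the ordinary product `a * b`, because B = B_high · 2²⁸ + B_low and multiplication distributes over that sum.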

[0035] In this specific embodiment, multiplexers 42 and 43 perform the shift of the product of array 38 by applying it to inputs 44 and 45 of array 36.

Booth Recoding

[0036] The multiplication process can be thought of as having two main stages: the first is the generation of partial products, and the second is the reduction of these partial products to a final product. One adder is needed to add the first two partial products, and in general one additional adder is needed for each additional partial product. The simplest form of multiplication operates on the multiplier bit by bit and produces one partial product for each bit of the multiplier.

[0037] An n × m bit multiplier built along these lines generates m partial products to be added. A simple general example is shown in FIG. 17. To build this multiplier, a set of m adders sums the products. Each partial product is either A or 0, as determined by the individual bits of B. This multiplier is simple but costly in size, and m additions are needed to generate the product. The number of partial products to be added can be reduced, however, by using a Booth recoder. Booth recoding allows the multiplication process to skip over contiguous strings of 1s and contiguous strings of 0s in the multiplier; for an n-bit multiplier, no more than n/2 partial products are produced.

[0038] The more commonly used modified Booth recoding processes overlapping groups of bits at a time, where the number b of new bits per group is expressed by the radix 2^b. Radix-2^b Booth recoding generates (n/b) + 1 partial products and lends itself to parallel hardware implementation. The modified recoding method encodes these overlapping groups according to whether they start, continue, or end a string of 1s. To ensure proper termination of a string of 1s, the number to be recoded must be padded with one least significant 0 and at least one most significant 0. For radix-4 (2²) Booth recoding, each group has three bits: two primary bits plus one overlap bit from the right (the less significant side).

【００３９】FIG. 18(a) shows how an 8-bit unsigned number is padded with one zero on the right and two zeros on the left and then divided into groups, and FIG. 18(b) shows how each 3-bit group is encoded by standard radix-4 Booth recoding. FIG. 19(a) is a block diagram of Booth recoding logic that accepts three inputs r₀, r₁ and r₂. These correspond to bits y_{i-1}, y_i and y_{i+1} of FIG. 18(b), respectively. An 8-bit number therefore yields five partial products and requires four adders. FIG. 19(b) shows how five of the Booth recoders of FIG. 19(a) can be used to compute the Booth recoding information for the unsigned 8-bit number of FIG. 18(a).
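The grouping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `booth_radix4_digits` is hypothetical, the digit formula uses the bit weights (+1, +1, -2) that the description gives later for y_{i-1}, y_i and y_{i+1}, and the padding zeros on the left are implicit because Python integers have unlimited width.

```python
def booth_radix4_digits(value, n_bits=8):
    """Recode an unsigned n-bit integer into radix-4 Booth digits, least significant first.

    Hypothetical helper for illustration; n_bits is assumed to be even.
    """
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    padded = value << 1              # one zero appended on the right
    n_groups = n_bits // 2 + 1       # (n/b) + 1 groups for b = 2
    digits = []
    for g in range(n_groups):
        group = (padded >> (2 * g)) & 0b111   # overlapping 3-bit group
        r0 = group & 1                # y_{i-1}, the overlap bit, weight +1
        r1 = (group >> 1) & 1         # y_i, weight +1
        r2 = (group >> 2) & 1         # y_{i+1}, weight -2
        digits.append(-2 * r2 + r1 + r0)
    return digits


# An 8-bit number always yields five digits, each in the range -2..+2.
print(booth_radix4_digits(255))
```

Each digit selects one partial product (0, ±A or ±2A), so the five digits correspond to the five partial products mentioned in the text.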

【００４０】FIG. 20 illustrates the summation of the partial products of A × B, where B has m bits. Note that only m/2 adders are required. A limiting feature of standard Booth recoding is that it handles only numbers in binary format. This is a drawback in operations that require successive multiplications, in which the result of one multiplication is used as an operand of the next. When the partial products of a multiplication are first summed, the result is initially in a redundant format such as carry-save. Before this result can undergo standard Booth recoding, it must be converted to binary format by a carry-propagate (CP) adder, which consumes valuable time. The CP adder must propagate carries from the rightmost position to the leftmost position, so converting an n-bit number to binary format incurs n carry delays. The CP adder of the present invention would have to perform CP addition on numbers of more than 100 bits, meaning more than 100 CP carry propagation delays. Performing such a CP addition after each multiplication in a series of multiplications would consume valuable time.
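To make the carry-save format concrete, the following sketch accumulates partial products with a 3:2 compression step in which no carry ever ripples across bit positions. It is an assumption-laden illustration: the function names are hypothetical, and it uses the simple "A or 0" partial products of the FIG. 17 style multiplier rather than Booth-recoded ones.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (s, c) such that s + c == x + y + z."""
    s = x ^ y ^ z                             # per-bit sum, carries ignored
    c = ((x & y) | (x & z) | (y & z)) << 1    # per-bit carry, shifted up one place
    return s, c


def sum_partial_products(a, b, m_bits):
    """Accumulate the m partial products of a * b, keeping the result in carry-save form."""
    s, c = 0, 0
    for i in range(m_bits):
        partial = (a << i) if (b >> i) & 1 else 0   # each partial product is A or 0
        s, c = carry_save_add(s, c, partial)
    return s, c


s, c = sum_partial_products(201, 173, 8)
# Only this final addition needs a full carry-propagate adder.
print(s + c == 201 * 173)
```

The pair (s, c) is the redundant result the text refers to; converting it to binary is the step whose delay the invention seeks to avoid between successive multiplications.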

【００４１】One aspect of the present invention that greatly increases the speed of successive multiplications is a method of pre-encoding redundant-format numbers for supply to the Booth recoding inputs without requiring the long delay of a CP adder. In accordance with this aspect of the invention, a circuit accepts numbers in carry-save format and processes them to create the inputs to a group of standard Booth recoders such as that of FIG. 19(a). The Booth recoders are organized in essentially the same way as in FIG. 19(b), except that the number is in carry-save form rather than binary and is processed by an intermediate layer of logic. This layer of logic is made up of several identical units; the logic components of one such carry-save recoder unit are illustrated in FIG. 21 and are generally designated by reference numeral 200.

【００４２】The layer of carry-save recoding (CSR) logic does not simply determine the binary equivalent of the carry-save number, because that would still incur n delays for n pairs of carry-save bits. Instead, each CSR unit accepts as inputs the two pairs of carry-save bits under primary consideration (S₁S₀, corresponding to a₁a₀, and C₁C₀, corresponding to b₁b₀), together with two inputs Ca and Cb from the CSR unit to its right (the next less significant group). Each CSR unit outputs r₀, r₁ and r₂ (the information required by the Booth recoding logic) and also outputs two bits, Ca_out and Cb_out, required by the next CSR unit.

【００４３】The logic is designed so that Cb_out is determined by the sum of S₁S₀ and C₁C₀ alone and is completely independent of Ca and Cb. The sum of S₁S₀ and C₁C₀ can range from 0 to 6, and Cb_out is set when the sum is greater than 1. Ca_out is determined by the sum of S₁S₀ and C₁C₀ together with Cb: Ca_out is set if the sum is 6, or if Cb is set and the sum is 5 or 1. (The mathematical justification for this is explained below.) Although Ca_out is affected by Cb, Cb was determined entirely by the preceding CSR unit, without reference to any earlier CSR unit (handling less significant bits). Note also that Ca is routed directly to r₀, which is its only use. A CSR unit can therefore be arranged to output Ca_out directly to the r₀ input of the next more significant Booth recoder, in which case the next CSR unit needs no Ca input.
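The two carry rules just stated can be written down directly. The sketch below is only a transcription of those rules into Python; the function name `csr_carries` and the integer encoding of the bit pairs (each pair as a value 0..3) are assumptions made for illustration.

```python
def csr_carries(s_pair, c_pair, cb_in):
    """Return (ca_out, cb_out) for a 2-bit CSR unit.

    s_pair and c_pair are the values (0..3) of S1S0 and C1C0; cb_in is 0 or 1.
    """
    total = s_pair + c_pair                   # ranges from 0 to 6
    cb_out = 1 if total > 1 else 0            # depends on the inputs alone
    ca_out = 1 if total == 6 or (cb_in and total in (1, 5)) else 0
    return ca_out, cb_out


# Tabulate the outputs for every possible sum, with Cb clear and with Cb set.
for total in range(7):
    s_pair = min(total, 3)
    c_pair = total - s_pair
    print(total, csr_carries(s_pair, c_pair, 0), csr_carries(s_pair, c_pair, 1))
```

Because `cb_out` never looks at `cb_in`, every unit's Cb_out is available at the same time, which is what keeps the total delay to a single unit.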

【００４４】A CSR unit does not need a valid Ca before computing its Cb_out. Each CSR unit therefore needs to wait only for inputs generated by its adjacent unit. No matter how many bits are involved or how many CSR units are chained together, there is a total of only one CSR unit propagation delay before every CSR unit has all the inputs it needs to compute the information required by the Booth recoders.

【００４５】FIG. 22 illustrates how carry-save recoders and Booth recoders are combined to provide Booth recoding information for a number consisting of eight pairs of carry-save bits. The carry-save recoders can also process ordinary binary numbers by setting the carry bits to zero.

【００４６】The circuit of FIG. 21 is only one possible circuit by which a CSR unit can supply inputs to radix-4 Booth recoding logic. As will become apparent below, the Ca and Cb outputs can be generated differently, and CSR units can also be constructed for Booth recoding logic of radix 8, 16 and so on.

【００４７】To understand the theory of operation of this aspect of the invention, radix-4 Booth recoding is used. As an alternative way of expressing the information in FIG. 18(b), note that the y_{i+1} bit has a weight of -2, the y_i bit has a weight of +1, and the y_{i-1} bit also has a weight of +1. (For radix-2^b Booth recoding, bit y_{i-1} has a weight of +1, bit y_{i+b-1} has a weight of -2^(b-1), and bit y_{i+j}, where 0 ≤ j ≤ (b-2), has a weight of 2^j.)
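The general weights given in parentheses above can be checked numerically. The following is a minimal sketch, assuming a hypothetical helper `booth_digits` and a group count of ceil((n + 1)/b), chosen so that at least one zero sits above the most significant bit; neither the name nor that exact group count comes from the description.

```python
def booth_digits(value, n_bits, b):
    """Radix-2^b Booth digits of an unsigned n-bit integer, least significant first."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    padded = value << 1                     # supplies y_{-1} = 0

    def y(k):
        """Bit y_k of the original value (y_{-1} and all bits above n_bits are 0)."""
        return (padded >> (k + 1)) & 1

    n_groups = -(-(n_bits + 1) // b)        # ceil((n_bits + 1) / b)
    digits = []
    for g in range(n_groups):
        i = g * b
        d = y(i - 1)                                    # weight +1
        d += sum(y(i + j) << j for j in range(b - 1))   # weights 2^j, 0 <= j <= b-2
        d -= y(i + b - 1) << (b - 1)                    # weight -2^(b-1)
        digits.append(d)
    return digits


print(booth_digits(255, 8, 2))   # radix 4
print(booth_digits(255, 8, 3))   # radix 8
```

The recoding preserves the value because the negative weight of the top bit of one group is exactly offset by the +1 weight that the same bit carries as the overlap bit of the next group.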

【００４８】A direct method of Booth recoding four carry-save bit pairs is shown in FIG. 23(a). The carry-save bits are fed into a 4-bit adder 200 having a carry-in C_in and a carry-out C_out, and the sum bits T₀ through T₃ are supplied to Booth recoders 100 in the standard manner. A very similar arrangement is shown in FIG. 23(b), in which two 2-bit adders with an intermediate carry C are used. This intermediate carry C still depends on C_in, so no propagation delay is saved by this arrangement.

【００４９】However, it is possible to make the intermediate carry C a predictive carry, that is, a carry determined entirely by adding the redundant-format bits without reference to the carry-in. Of course, the predictive carry will not always correspond to the true carry, but it can be corrected. This is because a carry added at the y_i position carries the same weight, and has the same net effect on the final product, as the y_{i-1} bit applied to input r₀. (Although different Booth recoding outputs may be generated, the net effect of the Booth recoding outputs will be the same.)

【００５０】To use the overlap bit y_{i-1} of the next more significant Booth recoder for correction, however, it must be decoupled from the T₁ output that goes to the y_{i+1} input r₂ of the current Booth recoder. This is shown in FIG. 24, in which carry-save recoder (CSR) units 200 replace the 2-bit adders 210 of FIG. 23(b). These CSR units generate the sum bits T₁ and T₀, a predictive carry Cb_out that is used as the carry-in of the next more significant unit, and an overlap bit Ca_out, decoupled from the T₁ output, that is used as the corrective carry, i.e. the y_{i-1} input of the next more significant Booth recoder.

【００５１】Keeping in mind that, in the arrangement of FIG. 23(b), the carry C and the T₁ bit (which is coupled as the y_{i-1} input of the next more significant Booth recoder) both have the same weight (+1), their combined effect can be grouped into three cases:
(A) the true carry and T₁ are both +1 (combined weight +2);
(B) exactly one of the true carry and T₁ is +1 (combined weight +1);
(C) neither the true carry nor T₁ is +1 (combined weight +0).
The predictive carry Cb_out and the corrective carry Ca_out are therefore generated so as to produce the correct combined weight in each of these cases. The predictive carry Cb_out must be generated for any configuration of a₁b₁a₀b₀ that could result in case (A) (depending on the carry-in), must not be generated for any configuration that could result in case (C), and may otherwise be generated arbitrarily. The corrective carry Ca_out depends on which case, (A), (B) or (C), actually results, so that Ca_out and Cb_out together produce the corresponding combined weight of +2, +1 or +0.

【００５２】The carry generation scheme of the circuit of FIG. 21 can thus be understood. As stated above, this circuit generates the Cb_out signal whenever the sum of S₁S₀ and C₁C₀ is 2, 3, 4, 5 or 6. A sum of 0, like a sum of 1 when the carry-in Cb is not set, always results in case (C) above, so Cb_out must not be generated for sums of 0 and 1. A sum of 6 always results in case (A), as can a sum of 5, so Cb_out must be generated for these. A sum of 2, 3 or 4 always results in case (B), and Cb_out is set. Ca_out then takes into account the effect of the carry-in Cb, and is set either when the result is case (A) or when the result is case (B) and Cb_out was not set. When the sum is 6, or when the sum is 5 and Cb is set, the result is case (A), so Ca_out is set. When Cb is set and the sum is 1, the result is case (B), and since Cb_out was not set for a sum of 1, Ca_out is set. In all other situations Ca_out is not set. As noted above, various other carry generation schemes can be used as long as the correct weights result for cases (A), (B) and (C). The particular carry generation scheme of FIG. 21 simplifies the circuit layout.
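The carry scheme described above can be exercised end to end in software. The sketch below chains 2-bit CSR units and radix-4 Booth digit formation for a number of eight carry-save bit pairs, then confirms that the recoded digits represent S + C. It is an illustrative model only: the function names are hypothetical, the units are evaluated in a sequential loop although the hardware produces all Cb_out signals in parallel, and the fifth, all-zero group is an assumed way of providing the required padding.

```python
def csr_unit(a, b, ca_in, cb_in):
    """2-bit CSR unit: returns (r0, r1, r2, ca_out, cb_out).

    a and b are the values (0..3) of the two carry-save bit pairs.
    """
    total = a + b                       # 0..6, independent of both carries
    t = (total + cb_in) & 0b11          # T1 T0: low two bits of sum plus carry-in
    r0, r1, r2 = ca_in, t & 1, t >> 1   # Booth inputs y_{i-1}, y_i, y_{i+1}
    cb_out = int(total > 1)             # predictive carry
    ca_out = int(total == 6 or (cb_in == 1 and total in (1, 5)))  # corrective carry
    return r0, r1, r2, ca_out, cb_out


def recode_carry_save(s, c, n_pairs=8):
    """Radix-4 Booth digits (least significant first) of the carry-save number (s, c)."""
    digits, ca, cb = [], 0, 0
    for g in range(n_pairs // 2 + 1):   # last group is zero padding that absorbs the carries
        a = (s >> (2 * g)) & 0b11
        b = (c >> (2 * g)) & 0b11
        r0, r1, r2, ca, cb = csr_unit(a, b, ca, cb)
        digits.append(-2 * r2 + r1 + r0)
    return digits


digits = recode_carry_save(0b10110111, 0b01101101)
print(digits, sum(d * 4 ** i for i, d in enumerate(digits)))
```

Setting `c` to zero reduces this to ordinary binary input, matching the remark that the carry-save recoders can also handle normal binary numbers.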

【００５３】This scheme can now be generalized to the replacement of b-bit adders by b-bit CSR units for generating predictive carries and the inputs to Booth recoding logic. The generalized scheme has the same three cases with the same weights. Again, the case is determined by the final total T_sum of the redundant-format bits plus the carry-in. Case (A) occurs when both T_sum ≥ 2^b (a true carry-out exists) and T_{b-1} (corresponding to the overlap bit) is set. Case (B) occurs when exactly one of these conditions is true, and case (C) occurs when neither is true.

【００５４】If these generalized rules are now applied to a 3-bit CSR unit, the circuit of FIG. 25 can be constructed, in which 3-bit CSR units are used together with radix-8 Booth recoders to Booth recode a 16-bit number in carry-save format, padded with the required zeros. Case (A) occurs when T_sum ≥ 8 and T₂ is set. Case (C) occurs when T_sum < 4, and case (B) occurs at all other times. The 3-bit CSR unit is designed along the same lines as the 2-bit CSR unit of FIG. 21: it generates Cb_out whenever the sum of a₂a₁a₀ and b₂b₁b₀ (the carry-save inputs) is greater than 3. Ca_out is generated whenever Cb is set and the sum is 14, 13, 12 or 11, all of which correspond to case (A), or when Cb is set and the sum is 3, which corresponds to case (B) with Cb_out not set.
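A generalized b-bit unit can be modeled by computing the case weight directly. In this hedged sketch, `csr_unit_b` and `recode_carry_save_b` are hypothetical names; Cb_out is taken to be "sum of the inputs ≥ 2^(b-1)", which matches the stated thresholds for b = 2 and b = 3, and Ca_out is derived as the combined weight minus Cb_out rather than transcribed from a listed table, so it is also set in any case (A) situation that arises with Cb clear. The group count ceil((n + 2)/b) is an assumption that leaves room for the extra bit of S + C plus a leading zero.

```python
def csr_unit_b(a, c, ca_in, cb_in, b):
    """b-bit CSR unit (b >= 2): returns (booth_digit, ca_out, cb_out)."""
    total = a + c                              # sum of the two b-bit input groups
    t_sum = total + cb_in                      # T_sum: inputs plus carry-in
    true_carry = t_sum >> b                    # 1 when T_sum >= 2^b
    t_top = (t_sum >> (b - 1)) & 1             # T_{b-1}, the overlap bit
    weight = true_carry + t_top                # case (A) = 2, (B) = 1, (C) = 0
    cb_out = int(total >= (1 << (b - 1)))      # predictive: ignores cb_in
    ca_out = weight - cb_out                   # corrective: makes up the difference
    low = t_sum & ((1 << (b - 1)) - 1)         # T_{b-2} .. T_0, weights 2^j
    digit = low - (t_top << (b - 1)) + ca_in   # radix-2^b Booth digit
    return digit, ca_out, cb_out


def recode_carry_save_b(s_word, c_word, n_bits, b):
    """Recode the carry-save pair (s_word, c_word) into radix-2^b Booth digits."""
    mask = (1 << b) - 1
    n_groups = -(-(n_bits + 2) // b)           # ceil((n_bits + 2) / b) zero-padded groups
    digits, ca, cb = [], 0, 0
    for g in range(n_groups):
        a = (s_word >> (b * g)) & mask
        c = (c_word >> (b * g)) & mask
        digit, ca, cb = csr_unit_b(a, c, ca, cb, b)
        digits.append(digit)
    return digits


digits = recode_carry_save_b(0xBEEF, 0x1234, 16, 3)
print(digits, sum(d * 8 ** g for g, d in enumerate(digits)) == 0xBEEF + 0x1234)
```

With b = 3 and a 16-bit carry-save number this uses six units feeding six radix-8 Booth digits.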

【００５５】CSR units can thus replace adders for applying numbers in carry-save format to Booth recoding logic, without incurring the full propagation delay that adders would require. CSR units can even be intermixed with CSR units of other sizes or with adders, although this is not the most efficient arrangement. The only important requirements are that the Cb_out carry corresponds to the y_i input of a Booth recoder and that the Ca corrector is paired with the overlap (y_{i-1}) bit of that Booth recoder. As just one example of the different arrangements possible, in FIG. 26 a 1-bit CSR unit 300 is combined with a 2-bit adder 310 and a 3-bit adder 320 to supply inputs to radix-4 Booth recoders 100 for a 100-bit number in carry-save format. If only adders were used and no CSR units, 100 CP carry propagation delays would be required. By using CSR unit 300, a predictive carry is generated in the middle of the number, effectively cutting the propagation delay in half. One possible design for CSR unit 300 is shown in FIG. 27. This halving of the propagation delay illustrates the power of the predictive carry.

【００５６】IEEE Exact Rounding
The preferred embodiment of the present invention incorporates a further aspect directed to providing IEEE exact rounding. By this is meant rounding that meets the requirements promulgated in ANSI/IEEE Standard 754-1985. The IEEE standard specifies two formats for floating point numbers: single precision and double precision. A floating point number has an exponent part e and a fraction part f, and its magnitude can be expressed as f × 2^e, where f is 1.f₁f₂f₃...f_n and n is the number of bits in the fraction part. Single precision numbers have 23 bits in the fraction part, and double precision numbers have 52 bits in the fraction part.
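As a small illustration of the f × 2^e representation, the following sketch splits a double precision number into its sign, exponent and 52 fraction bits using Python's standard `struct` module. The helper name `decompose_double` is hypothetical, the exponent bias of 1023 is the IEEE 754 double precision value, and the sketch is valid only for normalized numbers (not zero, denormals, infinities or NaN).

```python
import struct


def decompose_double(x):
    """Return (sign, exponent, fraction) of a normalized IEEE 754 double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023   # remove the exponent bias
    fraction = bits & ((1 << 52) - 1)          # the 52 bits f_1 ... f_52
    return sign, exponent, fraction


sign, e, frac = decompose_double(6.0)
# Rebuild the magnitude as 1.f * 2^e.
print(sign, e, (1 + frac / 2 ** 52) * 2 ** e)
```

Here 6.0 is stored as 1.5 × 2^2, so only the first fraction bit is set.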

【００５７】The IEEE standard describes rounding, in general, as taking a number regarded as infinitely precise and, if necessary, modifying it to fit the destination format while signaling that the result is inexact. Most mathematical operations are required to be performed as if an intermediate result were first produced with infinite precision and unbounded range and then rounded according to one of a set of rounding modes.

【００５８】The first rounding mode is called round to nearest. In this mode, the representable value nearest to the infinitely precise result is delivered. If the two nearest representable values are equally near, the one with its least significant bit zero is delivered. There are also three user-selectable rounding modes: round toward +infinity, in which the result is the format's value closest to and not less than the infinitely precise result; round toward -infinity, in which the result is the format's value closest to and not greater than the infinitely precise result; and round toward zero, in which the result is the format's value closest to and not greater in magnitude than the infinitely precise result.

【００５９】The IEEE standard applies the same method to single and double precision numbers; the reference point is always the least significant bit of the destination format. For single precision numbers this is the 23rd bit, and for double precision numbers it is the 52nd bit. For ease of discussion these two cases are not distinguished, and the term "LSB" is used to mean the least significant bit of the appropriate format. Also, with minor exceptions, the rounding procedure is the same for division as for square root. All discussion directed to quotients therefore applies to square roots as well, except where noted.

【００６０】For round to nearest, in the worst case the invention must determine whether the true quotient lies just below a 1/2 LSB point, just above it, or exactly on it. For the other rounding modes, in the worst case the invention must determine whether the true quotient lies just below an LSB point, just above it, or exactly on it. The LSB points are a subset of the 1/2 LSB points, namely those at which the value of the bit at the 1/2 LSB position is zero. Determining the relationship of the true quotient to a 1/2 LSB point is therefore sufficient to perform all rounding modes.

【００６１】Whatever computation is performed, the invention computes an initial result that is accurate to several bits less significant than the LSB. At this point the quotient is truncated at 1/4 LSB to produce the original computed quotient Q₀. The error is termed ε and is defined by the equation Q_r = Q₀ - ε, where Q_r is the true quotient. The data path is implemented so that |ε| < 1/4 LSB. The magnitude of the error may in fact be smaller than this, but the stated bound certainly holds, so -1/4 LSB < ε < +1/4 LSB.

【００６２】A further error of 1/4 LSB is now deliberately introduced by adding 1/4 LSB to Q₀ to create the calculated quotient Q_C. The range of ε is now 0 < ε < 1/2 LSB. This guarantees that the calculated quotient is strictly greater than the true quotient, which gives a better idea of where to "look" to find the true quotient.

【００６３】For notational purposes, let N be any number whose least significant bit is at the LSB position, whether that bit is 0 or 1. The general statement N ≤ Q_C < N + 1 then holds (the units here and hereafter are understood to be one LSB). This divides into two cases: (1) N ≤ Q_C < N + 1/2, and (2) N + 1/2 ≤ Q_C < N + 1. These are considered separately.

【００６４】In all cases 0 < ε < 1/2, so in case (1) N - 1/2 < Q_r < N + 1/2. All that needs to be determined, as noted above, is where Q_r stands relative to a 1/2 LSB point. The only such point in this range is at N. Note the range of Q_C in case (1): the only 1/2 LSB point it contains is also at N. Q_C is therefore truncated at the 1/2 LSB position to produce Q_t, which equals N.

【００６５】In case (2), N + 1/2 ≤ Q_C < N + 1, and the range of Q_r is N < Q_r < N + 1. The only 1/2 LSB point in this range is N + 1/2. Note again that this is also the only 1/2 LSB point within the range of Q_C, so Q_C is truncated at the 1/2 LSB position to produce Q_t, which equals N + 1/2. Thus in both case (1) and case (2), the critical value with which Q_r must be compared is obtained by truncating Q_C at the 1/2 LSB position to produce Q_t.

【００６６】Once this critical value has been identified, the relationship of Q_r to it can be determined by computing the partial remainder PR. For division, PR = y - x·Q_t; for square root, PR = x - Q_t·Q_t. If PR = 0, then Q_r = Q_t. If PR < 0, then Q_t - 1/2 < Q_r < Q_t.

【００６７】If PR > 0, then Q_t < Q_r < Q_t + 1/2.

【００６８】For the multiplication x·Q_t, all the high-order bits of the result are retained for comparison with y, and y is padded with enough less significant zeros to match the width of the x·Q_t product. In all cases, therefore, it can be determined whether Q_r lies just below a 1/2 LSB point, exactly on it, or just above it, which provides sufficient information for IEEE exact rounding.
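The rounding procedure for division can be modeled with exact rational arithmetic. The sketch below is an illustration under stated assumptions, not the hardware data path: quantities are measured in units of one LSB so that rounding to the LSB means rounding to an integer, operands are positive, the function name `round_quotient` and the mode names are hypothetical, and `q0` stands for any initial quotient whose error is below 1/4 LSB.

```python
from fractions import Fraction
import math


def round_quotient(y, x, q0, mode="nearest"):
    """Round y / x (x, y > 0, in LSB units) given q0 with |q0 - y/x| < 1/4."""
    qc = q0 + Fraction(1, 4)                # Q_C: now strictly above the true quotient
    qt = Fraction(math.floor(2 * qc), 2)    # Q_t: Q_C truncated at the 1/2 LSB position
    pr = y - x * qt                         # partial remainder; its sign locates y/x
    n = math.floor(qt)
    on_half_point = qt != n                 # Q_t is N + 1/2 rather than N
    if mode == "nearest":
        if not on_half_point:
            return n                        # true quotient is within 1/2 LSB of N
        if pr == 0:
            return n if n % 2 == 0 else n + 1   # exact tie: choose the even LSB
        return n + 1 if pr > 0 else n
    if mode == "up":                        # toward +infinity
        if on_half_point:
            return n + 1
        return n + 1 if pr > 0 else n
    if mode == "down":                      # toward -infinity (also toward zero here)
        if on_half_point:
            return n
        return n - 1 if pr < 0 else n
    raise ValueError(f"unknown mode: {mode}")


# 7 / 2 = 3.5 is an exact tie; a slightly low initial estimate still rounds correctly.
print(round_quotient(7, 2, Fraction(13, 4)))
```

Only the sign of the partial remainder is consulted, which is why a single multiplication and comparison suffices for every rounding mode.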

【００６９】It should be understood that the above description is illustrative and not restrictive. Many embodiments will be apparent to those skilled in the art upon reviewing the above description. The scope of the invention should therefore be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.

[Brief description of drawings]

【図１】FIG. 1 is an overall block diagram of the chip electronics incorporating the present invention.

【図２】FIG. 2 is a graph illustrating the selection of coefficients for the polynomial Ax² + Bx + C.

【図３】FIG. 3 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図４】FIG. 4 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図５】FIG. 5 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図６】FIG. 6 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図７】FIG. 7 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図８】FIG. 8 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図９】FIG. 9 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図１０】FIG. 10 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図１１】FIG. 11 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図１２】FIG. 12 is a block diagram illustrating the logic used for polynomial calculation according to the present invention.

【図１３】FIG. 13 is a timing chart illustrating the operations performed during the different cycles of a polynomial calculation for single precision numbers.

【図１４】FIG. 14 is a timing chart illustrating a polynomial calculation for double precision numbers.

【図１５】FIG. 15 is a block diagram and mathematical representation of the use of two sub-array multipliers to compute the partial products of a larger multiplication.

【図１６】FIG. 16 is a block diagram and mathematical representation of the use of two sub-array multipliers to compute the partial products of a larger multiplication.

【図１７】FIG. 17 is a diagram of a prior art N × M multiplier consisting of M pairs of multiplexers and adders.

【図１８】FIG. 18(a) is an explanatory diagram showing how an 8-bit unsigned number is padded with zeros and grouped for radix-4 Booth recoding; FIG. 18(b) is a diagram showing how each 3-bit group is encoded by standard radix-4 Booth recoding.

【図１９】FIG. 19(a) is a block diagram representing a radix-4 Booth recoder; FIG. 19(b) is a diagram of five of the Booth recoders of FIG. 19(a) arranged to compute the Booth recoding information of an 8-bit unsigned number B.

【図２０】FIG. 20 is an explanatory diagram showing how a multiplier is constructed from five sets of input multiplexers and adders controlled by radix-4 Booth recoding information.

【図２１】FIG. 21 is a block diagram of a carry-save recoder.

【図２２】FIG. 22 is a block diagram in which carry-save recoders and Booth recoders are arranged to compute the Booth recoding information of an unsigned carry-save number B of eight bit pairs.

【図２３】FIG. 23 is a block diagram in which 4-bit and 2-bit adders, respectively, are used to apply carry-save format numbers to Booth recoders.

【図２４】FIG. 24 is a block diagram in which the 2-bit adders of FIG. 23 are replaced by 2-bit carry-save recoders.

【図２５】FIG. 25 is a block diagram of 3-bit carry-save recoders used to generate the inputs to radix-8 Booth recoders.

【図２６】FIG. 26 is a block diagram of a 1-bit carry-save recoder used in combination with adders to apply a carry-save format number to Booth recoders.

【図２７】FIG. 27 is an explanatory diagram showing one possible circuit design for a 1-bit carry-save recoder.

[Explanation of symbols]

16 レジスタファイル 18 ALU 20 掛け算器ユニット
16: register file; 18: ALU; 20: multiplier unit

フロントページの続き (72)発明者ラリーヒューアメリカ合衆国，94040 カリフォルニア，マウンテンビュー，エスクエラアベニュー 333 (72)発明者インヤネシュヴァールプラブーアメリカ合衆国，95121 カリフォルニア，サンノセ，イーグルハーストドライブ 1770 (72)発明者フレデリックエー．ウェアアメリカ合衆国，94022 カリフォルニア，ロスアルトヒルズ，フリーモントパインズレーン 13961 (56)参考文献特開平２−181822（ＪＰ，Ａ) 特開昭63−163630（ＪＰ，Ａ) 特開平３−94328（ＪＰ，Ａ) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G06F 7/52 G06F 7/552 G06F 17/10
Continuation of front page
(72) Inventor: Larry Hugh, 333 Escuela Avenue, Mountain View, California 94040, United States
(72) Inventor: Inyane Schwar Prabhu, 1770 Eaglehurst Drive, San Jose, California 95121, United States
(72) Inventor: Frederick A. Ware, 13961 Fremont Pines Lane, Los Altos Hills, California 94022, United States
(56) References: JP-A-2-181822 (JP, A); JP-A-63-163630 (JP, A); JP-A-3-94328 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G06F 7/52; G06F 7/552; G06F 17/10

Claims

(57) [Claims]

1. An apparatus for multiplying operands, comprising: a multiplier array having a first array section, a second array section, and means for enabling or inhibiting the propagation of intermediate results from the first array section to the second array section; and means for supplying sets of operands to the first array section and the second array section, wherein, when the propagation of intermediate results from the first array section to the second array section is inhibited, the first array section and the second array section calculate, in parallel, two separate products from the sets of operands supplied to each, and, when the propagation of intermediate results from the first array section to the second array section is enabled, the first array section and the second array section calculate a single product from the supplied set of operands, with the calculation divided between the first array section and the second array section.

2. The apparatus of claim 1, wherein: the multiplier array calculates a first product by multiplying an m-bit first operand by an (n + p)-bit second operand; the first array section calculates a second product by multiplying an m-bit third operand by an n-bit fourth operand; the second array section calculates a third product by multiplying an m-bit fifth operand by a p-bit sixth operand; the means for enabling or inhibiting the propagation of intermediate results comprises means for selecting between a split array mode and a unified array mode, the propagation being enabled when the unified array mode is selected and inhibited when the split array mode is selected; and the means for supplying sets of operands to the first and second array sections comprises means for: a) supplying independent operands to the first and second array sections when the split array mode is selected; and b) when the unified array mode is selected, i) supplying the first operand to the first array section as the third operand and to the second array section as the fifth operand, ii) supplying the n least significant bits of the second operand to the first array section as the fourth operand, and iii) supplying the p most significant bits of the second operand to the second array section as the sixth operand.

3. An apparatus for rounding an initial result of an operation to calculate a rounded value whose lowest bit is a predetermined least significant bit (LSB), wherein the initial result has bits more significant than the LSB and at least a 1/2 LSB bit and a 1/4 LSB bit, located one and two bit positions below the LSB respectively, and the operation has a true result, the apparatus comprising: means for truncating the initial result at the 1/4 LSB position to calculate an initial number Q_0; means for adding 1/4 LSB to Q_0 to calculate a calculated result Q_c; means for truncating Q_c at the 1/2 LSB position to calculate a truncated result Q_t; means for determining a partial remainder for the operation, using Q_t as the result; and means for determining the rounded result from the initial result based on whether the partial remainder is positive, negative, or zero.

4. The apparatus of claim 3, wherein the operation is the division of a number y by a number x, and the partial remainder is equal to y − x·Q_t.

5. The apparatus of claim 3, wherein the operation is the square root of a number x, and the partial remainder is equal to x − Q_t·Q_t.

6. A method for rounding an initial result of an operation to calculate a rounded value whose lowest bit is a predetermined least significant bit (LSB), wherein the initial result has bits more significant than the LSB and at least a 1/2 LSB bit and a 1/4 LSB bit, located one and two bit positions below the LSB respectively, and the operation has a true result, the method comprising the steps of: truncating the initial result at the 1/4 LSB position to calculate an initial number Q_0; adding 1/4 LSB to Q_0 to calculate a calculated result Q_c; truncating Q_c at the 1/2 LSB position to calculate a truncated result Q_t; determining a partial remainder for the operation, using Q_t as the result; and determining the rounded result from the initial result based on whether the partial remainder is positive, negative, or zero.

7. An apparatus for determining the value of a function of x to a specified accuracy, where x consists of a group of most significant bits x_h and a group of least significant bits x_l, the apparatus comprising: a memory holding a plurality of coefficients for each of a plurality of values of x; an input bus for x having a plurality of bit lines, the portion of the bit lines corresponding to x_h being connected to an address input of the memory; and calculation means, connected to a data output of the memory and to the input bus, for determining an approximation of the function of x within the interval defined by x_h in accordance with the polynomial Ax² + Bx + C, using the coefficients A, B and C of that polynomial.

8. The apparatus of claim 7, wherein, in the memory, the coefficient C has a greater number of bits of precision than the coefficient B, and the coefficient B has a greater number of bits of precision than the coefficient A.

9. The apparatus of claim 7, wherein the calculation means comprises a multiplier array large enough to determine a result of the specified accuracy, the multiplier array comprising: a first array section; a second array section; means for inhibiting the propagation of intermediate results from the first array section to the second array section; and means for supplying two different ones of the coefficients to the first array section and the second array section at the same time, so that two terms of the polynomial are calculated in parallel.

10. The described apparatus, wherein the function of x is 1/x.

11. The overall effect of x_h on the polynomial approximation is factored into the coefficients in advance, and the calculation means uses the coefficients with x_l only.

12. A method for determining the value of a function of x to a specified precision, comprising the steps of: providing a memory that holds a plurality of coefficients for each of a plurality of intervals of x; and determining an approximation of the function within one of the intervals of x, in accordance with a polynomial approximation of the function, using the coefficients of that interval.

13. The method of claim 12, including the step of calculating an improved approximation by performing a Newton-Raphson iteration for the function on the approximation.

14. The method of claim 13, wherein x consists of a group of most significant bits x_h and a group of least significant bits x_l, the supplying step supplies the coefficients of the interval corresponding to the bits x_h, so that x_h is constant within that interval, the total effect of x_h on the polynomial approximation is factored into the coefficients, and the determining step uses the coefficients with x_l only.

15. The method of claim 14, wherein the polynomial is Ax² + Bx + C, and the determining step comprises finding A, B and C in parallel with calculating x_l², and calculating A·x_l² and B·x_l in parallel.

16. The method of claim 14, wherein the supplying step comprises providing a greater number of bits of precision for a first, lower-order coefficient than for a second, higher-order coefficient.

17. An apparatus for calculating Booth recoder input information for bits of an operand in carry-save format, comprising: operand bit inputs in carry-save format; a carry input; a predictive carry output that is independent of all carry inputs; and means for calculating the Booth recoder input information using the operand bit inputs and the carry input.

18. The apparatus of claim 17, further comprising a corrected carry output that depends on the carry input, wherein the Booth recoder input information consists of a sum output bit and the corrected carry output, both derived from the carry input and the operand bit inputs, and the corrected carry output is an output that contains the Booth recoding overlap bit.

19. An apparatus for calculating Booth recoder input information for a portion of an operand in carry-save format, the operand portion having n sum bits and n carry bits, the sum bits and carry bits having a combined sum S, the apparatus comprising: n operand-portion sum bit inputs, n operand-portion carry bit inputs, and a carry input; means for generating a predictive carry output at least when the sum S is greater than (2^n + 2^(n-1) - 2); means for generating a corrected carry output when the sum S is greater than (2^n + 2^(n-1) - 1), when the sum S is equal to (2^n + 2^(n-1) - 1) and the carry input is true, and when the predictive carry generation means generates no predictive carry and either the sum S is equal to (2^(n-1) - 1) and the carry input is true or the sum S is greater than (2^(n-1) - 1), the corrected carry output serving as a Booth recoding overlap bit input; and means for summing the bits of the operand portion and the carry input to calculate the Booth recoder input information.

20. The apparatus of claim 19, wherein the Booth recoder input information is radix-2^n Booth recoding information.

21. The apparatus of claim 20, wherein n is 2, the predictive carry generation means generates a predictive carry output when the sum S is greater than 1, and the corrected carry generation means generates a corrected carry output when the sum S is equal to 6, when the sum S is equal to 5 and the carry input is true, and when the sum S is equal to 1 and the carry input is true.

22. The apparatus of claim 19, wherein the predictive carry generation means generates a predictive carry output when the sum S is greater than (2^(n-1) - 1), and the corrected carry generation means generates a corrected carry output when the sum S is greater than (2^n + 2^(n-1) - 1), when the sum S is equal to (2^n + 2^(n-1) - 1) and the carry input is true, and when the sum S is equal to (2^(n-1) - 1) and the carry input is true.
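
The table-driven approximation recited in the claims on the function of x (a memory addressed by x_h, a quadratic in x_l, and a Newton-Raphson refinement) can be illustrated with a minimal Python sketch for the function 1/x. This is not the patented circuit. The table size `TABLE_BITS`, the function names `build_table` and `reciprocal`, the restriction to 1 <= x < 2, and the use of Taylor-series coefficients are all assumptions made for illustration; the claims do not state how the coefficients are derived or how many bits x_h has.

```python
# Illustrative sketch only: names, table size and coefficient choice are assumptions.
TABLE_BITS = 5  # hypothetical number of x_h bits below the leading 1


def build_table(table_bits=TABLE_BITS):
    """Return one (A, B, C) triple per interval, with x_h folded into the coefficients."""
    table = []
    for i in range(1 << table_bits):
        x_h = 1.0 + i / (1 << table_bits)
        # Assumed coefficients: Taylor expansion of 1/(x_h + x_l) about x_l = 0,
        # so that 1/x is approximated by A*x_l**2 + B*x_l + C.
        table.append((1.0 / x_h**3, -1.0 / x_h**2, 1.0 / x_h))
    return table


def reciprocal(x, table, table_bits=TABLE_BITS, newton_steps=1):
    """Approximate 1/x for 1 <= x < 2 by table lookup, a quadratic and Newton-Raphson."""
    if not 1.0 <= x < 2.0:
        raise ValueError("x must satisfy 1 <= x < 2")
    # The most significant bits (x_h) select the interval, i.e. the memory address.
    index = int((x - 1.0) * (1 << table_bits))
    # The remaining low-order part (x_l) is the only variable in the polynomial.
    x_l = x - (1.0 + index / (1 << table_bits))
    a, b, c = table[index]
    # In hardware the two products a*x_l*x_l and b*x_l could be formed in parallel.
    r = a * x_l * x_l + b * x_l + c
    # Newton-Raphson refinement for 1/x: r <- r * (2 - x * r).
    for _ in range(newton_steps):
        r = r * (2.0 - x * r)
    return r


if __name__ == "__main__":
    table = build_table()
    print(reciprocal(1.03, table, newton_steps=0))
    print(reciprocal(1.03, table, newton_steps=1))
```

With these assumed coefficients, the relative error of the quadratic alone is bounded by (x_l/x_h)^3, which is below 2^-15 for a 5-bit table, and each Newton-Raphson step roughly squares that relative error.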