JPH01116730A

JPH01116730A - Digital signal processor

Info

Publication number: JPH01116730A
Application number: JP27481087A
Authority: JP
Inventors: Atsumichi Murakami (村上 篤道); Isao Uesawa (上澤 功); Masatoshi Kameyama (亀山 正俊)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1987-10-30
Filing date: 1987-10-30
Publication date: 1989-05-09

Abstract

PURPOSE: To allow a high-speed signal processing system to be configured flexibly and easily by uniquely specifying the arithmetic unit's operation for each kind of computation and composing each microinstruction code from a function code combined with its corresponding control code. CONSTITUTION: The device has a five-stage pipeline. Two stages are added to the instruction execution pipeline: one that reads data from the data memory and inputs it to the arithmetic unit 106, and one that outputs data from the arithmetic unit 106 and either writes it to the data memory or performs accumulation or data rounding using the accumulator in the arithmetic unit. Within the arithmetic unit 106, a barrel shifter, a multiplier, and an arithmetic logic unit are arranged in parallel for the execution stage. A normalizing barrel shifter is connected after them for the write/accumulate stage, and its output serves either as the input to the rounding/accumulation adder or as the output of the arithmetic unit. A digital signal processor with a flexible and simple device configuration is thereby obtained.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a digital signal processor that performs arithmetic processing mainly on signal sequences.

[Conventional technology]

FIG. 7 is a block diagram showing the configuration of DSSP1 (Digital Speech Signal Processor 1), a conventional digital signal processor presented, for example, in the symposium proceedings (4810-1) of the 1986 National Convention of the Communications Division of the Institute of Electronics and Communication Engineers. In the figure:

(1) is the program counter PC, with a built-in stack, which controls instruction addresses.
(2) is the instruction mask ROM, which stores the microinstructions.
(3) is the instruction register IR0, which receives one microinstruction word per machine cycle from the instruction mask ROM (2) or from outside.
(4) is the instruction register IR1, which receives only those bit fields of the microinstruction in IR0 (3) that need decoding.
(5) is the instruction decoder, which decodes the microinstruction held in IR1 (4).
(6) is the program bus P-Bus, which distributes microinstructions to the functional units.
(7) is the register BI, which takes the immediate value (18 bits wide) in the microinstruction output on the P-Bus (6) and outputs it to the data bus D-Bus (8).
(8) is the 18-bit-wide data bus D-Bus, used for the internal data transfers that accompany computation.
(9) is the register AM, which receives the data memory address mode specification from the P-Bus (6).
(10) is the 4-word x 16-bit register AD, which holds the address pointer information used for indirect address generation.
(11) is the 3-bit page register PR, which specifies the page of the external data memory.
(12) is the 9-bit address arithmetic unit AAU, which can generate up to three addresses simultaneously.
(13) is the address register AR0, (14) is the address register AR1, and (15) is the address register AR2.
(16) is the address selector RAS.
(17) is the loop counter.
(18) is the status register SR, which indicates the operating mode and status of the processor.
(19) is the DMA control unit, which transfers data directly between the serial I/O ports SI0/1, SO0/1 (32) and the external data memory.
(20) is the address register AR, which holds the 12-bit address output to the external data memory.
(21) is the dual-port internal data memory 2P-RAM, with a capacity of 512 words x 18 bits, which allows two data items to be read or written simultaneously.
(22) is the register DP0, which holds the input data that is operated on.
(23) is the register DP1, which holds the input data that it is operated with.
(24) is the multiplier FMPL, which performs floating-point multiplication in the 12E6 bit format.
(25) is the register P, which holds the result of the multiplier FMPL (24).
(26) and (27) are selectors.
(28) is the floating-point arithmetic logic unit FALU, which mainly executes floating-point operations in the 12E6 bit format.
(29) is the set of 4-word x 18-bit accumulators ACC0 to ACC3, which hold the output of the FALU (28) and are used for accumulation and the like.
(30) is the data register DR, connected to the D-Bus (8) to hold temporarily the data read from or written to the external data memory.
(31) is the read/write control circuit R/W Cont. for the external data memory.
(32) is the set of serial I/O ports SI0/1, SO0/1, which perform full-duplex, two-channel serial data transfer with external devices.
(33) is the interrupt control circuit Int. Cont.
(34) is the external data memory bus control circuit Bus Cont.
(35) is the clock control circuit CLK Cont., which controls internal timing.
(36) is a selector.

FIG. 8 is a timing chart illustrating the microinstruction execution sequence of the digital signal processor DSSP1 shown in FIG. 7. In the figure:

(40) is the cycle timing, which consists of a four-phase clock.
(41) is the fetch-stage timing, the stage in which the program counter PC (1) outputs an address and a microinstruction is input to the instruction register IR0 (3).
(42) is the decode-stage timing, in which the instruction decoder (5) decodes the microinstruction input to the instruction register IR1 (4).
(43) is the timing at which the address arithmetic unit AAU (12) is updated during the decode stage.
(44) is the timing at which the floating-point multiplier FMPL (24) operates.
(45) is the timing at which the floating-point arithmetic logic unit FALU (28) performs its operation.
(46) is the timing of register-to-register data transfers over the data bus D-Bus (8).
(47) is the timing at which data is read from or written to the external data memory through the data register DR (30).

FIG. 9 shows the structure of the microinstructions of the digital signal processor DSSP1 shown in FIG. 7, which are 32 bits wide per word and are classified into four groups. (50) is the sequence instruction, which controls the order of instruction execution. (51) is the mode instruction, which sets modes and initial values for the status register SR (18), the address arithmetic unit AAU (12), and the DMA control unit (19). (52) is the operation instruction, which mainly controls execution on the floating-point arithmetic logic unit FALU (28) and the parallel data transfers that accompany it. (53) is the load instruction, which loads an immediate value into any register or into data memory.

Next, the operation is described. For brevity, each unit is referred to below by the abbreviation introduced above.

First, the overall operation is outlined with reference to FIG. 7. In this signal processor the P-Bus (6) and the D-Bus (8) are separate. Microinstruction input to IR0 (3), microinstruction transfer over the P-Bus (6), microinstruction decoding by the instruction decoder (5), and instruction execution by the D-Bus (8), the FMPL (24), the FALU (28), and so on proceed in parallel through pipelining. Every execution unit, including the D-Bus (8) and the 2P-RAM (21), is register-based: all of its inputs and outputs are connected to registers. A register drives its output at the leading edge of the machine cycle and is loaded at the trailing edge. The data actually processed is therefore not the value loaded into the register by the same microinstruction, but the value loaded by a microinstruction at least one earlier.

This is called delayed operation. Separating the parts of the arithmetic section with registers allows them to operate in parallel. For example, in this processor the FMPL (24) always performs one floating-point multiplication per machine cycle. To feed it operands, one microinstruction first loads the data into DP0 (22) and DP1 (23); the product is then obtained by reading the contents of P (25) with a microinstruction one or more steps later. Because DP0 (22), DP1 (23), and P (25) hold the data until it is read, a multiplication that would by itself require three microinstructions (data input, multiplication, data output) can, when multiplications are issued back to back, be performed at an effective rate of one per microinstruction.
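The delayed operation described above can be sketched in software. The following minimal Python model is an illustration only and is not taken from the patent: the class name `MultiplierPipeline` and the helper `run` are hypothetical, and ordinary Python integers stand in for 12E6 values. Registers are read at the start of a cycle and loaded at its end, so a product becomes visible in P two steps after its operands are issued, while a new operand pair can be issued on every step.

```python
class MultiplierPipeline:
    """Toy model of a free-running multiplier placed between the input
    registers DP0/DP1 and the product register P. The register names follow
    the text; the class itself is a hypothetical illustration."""

    def __init__(self):
        self.dp0 = 0  # operand register DP0
        self.dp1 = 0  # operand register DP1
        self.p = 0    # product register P

    def step(self, a, b):
        """Advance one machine cycle while issuing the operand pair (a, b).

        Returns the value visible in P during this cycle, that is, the value
        loaded at the end of the previous cycle.
        """
        visible_p = self.p            # outputs are driven at the leading edge
        next_p = self.dp0 * self.dp1  # the multiplier sees the old DP0/DP1
        # All registers are loaded together at the trailing edge.
        self.dp0, self.dp1, self.p = a, b, next_p
        return visible_p


def run(pairs):
    """Issue one operand pair per cycle, then flush with two idle cycles."""
    pipe = MultiplierPipeline()
    seen = [pipe.step(a, b) for a, b in pairs]
    seen += [pipe.step(0, 0), pipe.step(0, 0)]
    return seen
```

Calling `run([(2, 3), (4, 5), (6, 7)])` shows each product appearing two cycles after its operands were issued, with no gaps between successive products.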

In DSSP1 the FMPL (24) and the FALU (28) are connected through P (25), and the FALU (28) is configured so that it can accumulate the contents of P (25) in ACC0 to ACC3 (29).

This arrangement resembles the multiplier-accumulator pair described by Louis Schirm in the paper "Packing a signal processor onto a single digital board" (Electronics, December 20, 1979). Its purpose is to execute, in one machine cycle, one term of the sum-of-products operation that is used heavily in filtering, the butterfly operation of the FFT (Fast Fourier Transform), and the like.

The sum of products follows, for example, the expression below.

$$\sum_{i} a_i \cdot b_i$$

In this processor, one term of the sum of products requires three microinstructions: data input to DP0 (22) and DP1 (23), multiplication in the FMPL (24), and accumulation in the FALU (28) of the product loaded into P (25) with ACC0 to ACC3 (29). When terms are processed back to back, one term of the sum of products can of course be completed per microinstruction in effect. To sustain that rate, the two input data corresponding to a_i and b_i in the sum-of-products expression must be loaded into DP0 (22) and DP1 (23) on every microinstruction. The 2P-RAM (21) is therefore able to supply these two inputs.
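The cycle count implied by this description can be checked with a small model. The sketch below is an assumption-laden illustration, not the processor's actual control logic: `pipelined_dot` and `unpipelined_cycles` are hypothetical names, and the three stages (load, multiply, accumulate) are modeled as one machine cycle each. Under those assumptions, N terms take N + 2 cycles when issued back to back, against 3N cycles when each term waits for the previous one to finish.

```python
def pipelined_dot(a, b):
    """Accumulate sum(a[i] * b[i]) through a load -> multiply -> accumulate
    pipeline and count machine cycles. Purely illustrative."""
    n = len(a)
    dp = None   # operand pair latched in DP0/DP1
    p = None    # product latched in P
    acc = 0     # accumulator
    cycles = 0
    i = 0
    while i < n or dp is not None or p is not None:
        # Stage 3: accumulate the product latched in the previous cycle.
        if p is not None:
            acc += p
        # Stage 2: multiply the operands latched in the previous cycle.
        next_p = dp[0] * dp[1] if dp is not None else None
        # Stage 1: load the next operand pair, if any remain.
        next_dp = (a[i], b[i]) if i < n else None
        if i < n:
            i += 1
        dp, p = next_dp, next_p
        cycles += 1
    return acc, cycles


def unpipelined_cycles(n):
    """Cycles needed if every term runs its three steps before the next starts."""
    return 3 * n
```

For three terms, `pipelined_dot([1, 2, 3], [4, 5, 6])` returns the sum 32 after 5 cycles, whereas `unpipelined_cycles(3)` gives 9.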

To avoid contention on the D-Bus (8), a path is provided that transfers data read from the 2P-RAM (21) directly to DP0 (22) and DP1 (23) without passing through the D-Bus (8). Mainly in order to address these two inputs in the 2P-RAM (21), the AAU (12) has means for selecting and outputting two of the 9-bit address values output through AR0 (13), AR1 (14), and AR2 (15). The AAU (12) is configured so that up to three addresses can be specified simultaneously only in the case of two input data addresses for the 2P-RAM (21) plus one output data address for the external data memory via DR (30) and AR (20).

All address specification uses only the so-called indirect addressing method, based on address pointers set inside the AAU (12). For AR0 (13), increment, modulo, bit-reverse, and repeat addressing are available, as is updating of the increment base address and the increment value; the other registers, AR1 (14) and AR2 (15), support only simple increment. The AAU (12) can perform address arithmetic only in 9-bit natural binary form. When a 12-bit external data memory address is specified, these 9 bits are combined with the 3-bit memory page given by PR (11) to make 12 bits.
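The addressing modes named above can be illustrated as follows. This is a minimal sketch under stated assumptions: the text gives only the widths (9-bit AAU address, 3-bit page, 12-bit external address), so placing the page in the upper bits is an assumption, and the function names `external_address`, `modulo_increment`, and `bit_reverse` are hypothetical.

```python
ADDR_BITS = 9   # width of an AAU address (from the text)
PAGE_BITS = 3   # width of the page register PR (from the text)


def external_address(page, offset):
    """Combine a 3-bit page and a 9-bit AAU address into a 12-bit address.
    Placing the page in the upper bits is an assumption of this sketch."""
    if not 0 <= page < (1 << PAGE_BITS):
        raise ValueError("page must fit in 3 bits")
    if not 0 <= offset < (1 << ADDR_BITS):
        raise ValueError("offset must fit in 9 bits")
    return (page << ADDR_BITS) | offset


def modulo_increment(addr, step, base, length):
    """Advance addr by step inside the circular buffer [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index (FFT-style reordering)."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result
```

Modulo addressing of this kind suits circular delay lines in filters, and bit-reversed addressing suits the data reordering of an FFT, which is presumably why both are offered for AR0.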

The FMPL (24) and the FALU (28), on the other hand, operate in the 12E6 normalized floating-point format, so the 2P-RAM (21), DP0 (22), DP1 (23), P (25), ACC0 to ACC3 (29), DR (30), D-Bus (8), and BI (7) are all 18 bits wide, and a special operation mode is needed to compute a particular initial address value in the FALU (28). Consequently there is no data compatibility between AR0 (13), AR1 (14), AR2 (15), and AR (20) on the one hand and the operation result data loaded into ACC0 to ACC3 (29) on the other.
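The incompatibility between fixed-point addresses and floating-point data can be made concrete with a sketch of the conversions involved. The text does not define the 12E6 bit layout; the code below assumes, purely for illustration, a 12-bit two's-complement mantissa and a 6-bit two's-complement exponent (18 bits in total, matching the stated data width), with value = mantissa x 2^exponent. The names `flt`, `fix`, and `pack` are hypothetical.

```python
MANT_BITS = 12                           # assumed mantissa width
EXP_BITS = 6                             # assumed exponent width
MANT_MAX = (1 << (MANT_BITS - 1)) - 1    # 2047
MANT_MIN = -(1 << (MANT_BITS - 1))       # -2048
EXP_MAX = (1 << (EXP_BITS - 1)) - 1      # 31
EXP_MIN = -(1 << (EXP_BITS - 1))         # -32


def flt(x):
    """Fixed -> floating: return a normalized (mantissa, exponent) pair.
    Low-order bits that do not fit in the mantissa are truncated."""
    if x == 0:
        return 0, 0
    m, e = x, 0
    while m > MANT_MAX or m < MANT_MIN:   # too wide: shift right, losing bits
        m >>= 1
        e += 1
    while MANT_MIN <= m * 2 <= MANT_MAX and e > EXP_MIN:  # normalize left
        m *= 2
        e -= 1
    if e > EXP_MAX:
        raise OverflowError("exponent does not fit in 6 bits")
    return m, e


def fix(m, e):
    """Floating -> fixed: integer part of m * 2**e (fraction bits dropped)."""
    return m << e if e >= 0 else m >> -e


def pack(m, e):
    """Pack the pair into one 18-bit word, mantissa in the upper bits
    (field order is an assumption)."""
    return ((m & ((1 << MANT_BITS) - 1)) << EXP_BITS) | (e & ((1 << EXP_BITS) - 1))
```

A small integer survives the round trip, but a value wider than the mantissa does not: converting 100000 and back yields 99968, which is one reason a computed floating-point result cannot be used directly as an address.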

The DMA control unit (19) transfers data between the external data memory and the input/output data of the two full-duplex serial I/O channels SI0/1 and SO0/1 (32), independently of the microinstructions. Because DMA transfers use the D-Bus (8), AR (20), and DR (30), there is a risk that these internal resources will conflict with the microinstruction operations controlled by the instruction decoder (5). To avoid this, during a data transfer by the DMA control unit (19) the instruction decoder (5) is halted for six machine cycles per word, suspending microinstruction operation.
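The cost of this stall can be estimated with simple arithmetic. The sketch below uses the six-cycle stall per word stated here and the nominal 50 nsec machine cycle given in the timing description; the word rate passed in is a hypothetical input, and `dma_stall_fraction` is an illustrative name.

```python
CYCLE_NS = 50               # nominal machine cycle time in nanoseconds
STALL_CYCLES_PER_WORD = 6   # decoder halt per DMA word transferred


def dma_stall_fraction(words_per_second):
    """Fraction of all machine cycles during which microinstruction
    execution is halted for DMA, at the given aggregate word rate."""
    cycles_per_second = 1e9 / CYCLE_NS
    return words_per_second * STALL_CYCLES_PER_WORD / cycles_per_second
```

As an example under assumed rates, four serial streams (two channels, each with input and output) at 8000 words per second each give `dma_stall_fraction(32000)`, just under 1% of the available cycles.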

To summarize, DSSP1 can perform the following operations in parallel within a single microinstruction:

■ Up to three 9-bit address calculations by the AAU (12).
■ A 12E6 floating-point multiplication by the FMPL (24).
■ A 12E6 floating-point operation by the FALU (28).
■ A data transfer between the 2P-RAM (21) and the external data memory via the D-Bus (8) and DR (30).
■ A DMA data transfer between the two-channel full-duplex serial I/O ports SI0/1, SO0/1 (32) and the external data memory via the D-Bus (8) and DR (30).

Next, the microinstruction execution timing of DSSP1 is described with reference to FIG. 8. The DSSP1 machine cycle (40) is divided into four phases, P0 to P3, and the nominal cycle time of one machine cycle is a fast 50 nsec. It is therefore difficult in practice to carry out all three of the following within one machine cycle: reading the microinstruction from the instruction mask ROM (2), decoding it in the instruction decoder (5), and executing it on internal resources such as the FMPL (24) and the FALU (28). DSSP1 therefore assigns each of the three to its own one-machine-cycle stage, forming a three-stage pipeline that achieves high-speed operation. The stages of this pipeline perform the following.

■ Fetch stage (41)
The PC (1) outputs the microinstruction address, the microinstruction is read from the instruction mask ROM (2), and the microinstruction is loaded into IR0 (3).

■ Decode stage (42), (43)
The microinstruction is transferred from IR0 (3) to IR1 (4) and decoded by the instruction decoder (5), and the program control mode is set. The microinstruction is also transferred from IR0 (3) to the P-Bus (6), and the AAU (12) performs address calculation via AM (9) and AD (10).

■ Execution stage (44), (45), (46), (47)
Data operations by the FMPL (24) and the FALU (28), data transfer over the D-Bus (8), external data memory access via AR (20) and DR (30), and so on.

As a result, DSSP1 needs three machine cycles to execute one microinstruction, but pipelining makes it possible in effect to execute one microinstruction per machine cycle. There is consequently a delay of two machine cycles between the time a microinstruction is read from the instruction mask ROM (2) and the time it is actually executed. This is why, to rule out timing conflicts over internal resources, the internal bus is split into the P-Bus (6) and the D-Bus (8), and the instruction mask ROM (2) and the 2P-RAM (21) are correspondingly separate. With a branch instruction, however, the branch actually takes place in the decode stage, so the microinstruction being loaded into IR0 (3) at that moment would be executed; that is, the instruction written after a branch instruction would be executed unconditionally. To avoid this, DSSP1 automatically changes the next instruction to a NOP (no operation) while a branch instruction is executing. This feature is intended to simplify the writing of microinstructions, but a branch costs one machine cycle, and an indirect branch through the D-Bus (8) costs two. In general, if the order of instructions is chosen carefully, about 80% of unconditional branches can execute the following instruction without causing any problem, so this loss can be avoided; in DSSP1, however, that is not possible.
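The one-cycle branch penalty can be illustrated with a simplified model of which operations reach the execution stage. This is a sketch only: the tuple-based program encoding and the names `execution_slots` and `total_cycles` are hypothetical, only unconditional direct branches are modeled, and the two-cycle indirect branch through the D-Bus is left out.

```python
def execution_slots(program, limit=100):
    """Return the sequence of operations reaching the execution stage.

    program is a list of ("op", label), ("jmp", target_index) or ("halt",)
    tuples. After every branch the following slot is forced to NOP,
    modelling the one-machine-cycle loss described in the text.
    """
    slots = []
    pc = 0
    while pc < len(program) and len(slots) < limit:
        instr = program[pc]
        if instr[0] == "halt":
            break
        if instr[0] == "jmp":
            slots.append("jmp")
            slots.append("NOP")  # the instruction fetched behind the branch
            pc = instr[1]
        else:
            slots.append(instr[1])
            pc += 1
    return slots


def total_cycles(slots, pipeline_depth=3):
    """Machine cycles to drain the slots through a pipeline of given depth."""
    return len(slots) + pipeline_depth - 1 if slots else 0
```

For the program `[("op", "A"), ("jmp", 3), ("op", "B"), ("op", "C"), ("halt",)]` the execution stage sees A, jmp, NOP, C: instruction B is skipped, and one slot is spent on the inserted NOP. A design with a delay slot would instead let the programmer place useful work in that slot.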

Next, the DSSP1 microinstruction set is described with reference to FIG. 9. There are only four kinds of microinstruction: sequence, mode, operation, and load. Sequence instructions control branches, loops, and subroutine calls, and mainly act on the PC (1). Mode instructions set initial values and modes for the AAU (12), the selector (16), the loop counter (17), SR (18), and the DMA control unit (19). Load instructions load an immediate value (18 bits wide) into a register connected to the D-Bus (8) via BI (7). For these microinstructions, the resource operated on is fixed by the instruction. Operation instructions, by contrast, must directly specify all of the internal resources that can operate in parallel as described above. Operation instructions therefore need the most bits, and DSSP1 uses horizontal microinstructions 32 bits wide. The FMPL (24) is free-running and, as noted earlier, is not directed by instructions. The operation of the FALU (28) is specified directly by the instruction; the available operations include, for example, the following.

■ Absolute value: |X|
■ Sign correlation: Sign(Y)·X
■ Addition: X + Y
■ Subtraction: X − Y
■ Maximum: MAX(X, Y)
■ Minimum: MIN(X, Y)
■ Fixed-to-floating conversion: FLT(X)
■ Floating-to-fixed conversion: FIX(X)
■ Shift: R1, L1 to L8
■ Logic: AND, OR, EOR, NOT
■ Mantissa addition: XM + YM
■ Exponent subtraction: XE − YE

The first problem is that DSSP1 is built around floating-point arithmetic, whereas logical and address operations are fixed-point. As noted above, the two are not compatible. For example, to address memory using a computed result, one of the conversion instructions listed above must be executed in the FALU (28). Moreover, because ordinary signal processing rarely inputs or outputs data in floating-point form, a conversion instruction must be executed, and the data converted, at every data input and output.

The next problem is that bits are always truncated when floating-point data is normalized. Because a signal processor has finite arithmetic precision, computation errors are unavoidable. If they are handled by truncation alone, however, then viewed in terms of absolute value the result is always smaller than the true value, and the error is not randomized. Lengthening the arithmetic word can easily make the error negligibly small, but because signal processors are normally required to run at high speed, there is a limit to how far this can go.
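The bias introduced by truncation can be demonstrated numerically. The sketch below is illustrative and works on non-negative integers (magnitudes), dropping a chosen number of low-order bits; the function names are hypothetical. It compares plain truncation with rounding to nearest, implemented by adding half a unit in the last retained place before truncating.

```python
def truncate(x, drop):
    """Discard the low `drop` bits of a non-negative integer."""
    return (x >> drop) << drop


def round_nearest(x, drop):
    """Round a non-negative integer to the nearest multiple of 2**drop
    (ties go upward) by adding half a step before truncating."""
    return ((x + (1 << (drop - 1))) >> drop) << drop


def mean_error(quantize, drop, values):
    """Average of (quantized value - true value) over the given values."""
    return sum(quantize(v, drop) - v for v in values) / len(values)
```

Over the values 0 to 1023 with four bits dropped, truncation gives a mean error of -7.5 (every result is at or below the true value), while rounding gives a mean error of +0.5. In a recursive computation the one-sided truncation error accumulates rather than averaging out.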

This problem cannot be ignored, particularly in IIR (recursive) digital filters and in image signal processing that involves inter-frame processing, and in DSSP1 the result must be rounded to the nearest value using logical operation instructions or the like. Furthermore, in typical signal processing algorithms the required precision is often specified separately for each processing unit, and it does not necessarily match the arithmetic word length of the signal processor. In that case the format of the operand data must be converted repeatedly with the FALU (28), once for each processing unit.

The next problem is that in DSSP1 the only operation that can be processed at high speed is the sum of products described above. That is sufficient for the FFT and the FIR filter, the representative signal processing algorithms of the past, but operations such as the one expressed below are now also required to run at high speed.

$$\sum_{i} |a_i - b_i|^2$$

DSSP1 cannot support such an operation directly. It has to be broken down into individual elementary arithmetic operations, so three separate operations must be executed to compute one term. If the result of the expression is computed term by term, the delays mean that 3 x 3 = 9 instructions are needed per term, and the degree of processing parallelism falls drastically. The parallelism can of course be raised by saving intermediate results in the 2P-RAM (21) and splitting the work into a difference pass and a square-and-accumulate pass, but this makes it hard to use the limited data memory space effectively, and multiple sets of data cannot be processed.

Consider, for example, a binary tree search such as that shown in FIG. 10. Suppose the input vector A is loaded in the 2P-RAM (21), and that the reference vectors B, structured as a tree with one vector for each numbered node in the figure, are laid out in the external data memory as shown in FIG. 11. The evaluation function expressing how closely the input vector A approximates a reference vector B is the sum of absolute differences

$$\sum_{i=1}^{N} |a_i - b_i|, \quad A = (a_1, a_2, \ldots, a_N), \quad B = (b_1, b_2, \ldots, b_N)$$

At each level the candidate that minimizes this value is selected, descending the binary tree, until the reference vector with the closest approximation is finally obtained. When the current node number is n, the approximation is evaluated against the two reference vectors B at nodes 2n+1 and 2n+2, and the node number of the reference vectors to compare at the next level is computed from the result. Implementing this processing on DSSP1 requires the following numbers of instruction steps:

・Input data conversion: N+2 steps
・Evaluation value calculation for one vector: 9N+2 steps
・Rounding of the evaluation value: about 3 steps
・Comparison of evaluation values: 4 steps
・Calculation of the reference vector address of the next node: about 9 steps

Taking 2N steps as the ideal figure for the evaluation value calculation, this is about nine times the number of steps that would be needed if no address or input data conversion were required.
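The search procedure itself is compact when written in a high-level language, which highlights how much of the DSSP1 step count is overhead. The following sketch is illustrative only: the dictionary-based storage of reference vectors, the function names `sad` and `tree_search`, and the choice to start the descent from n = 0 are assumptions; the child numbering 2n+1 and 2n+2 and the sum-of-absolute-differences criterion come from the text.

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def tree_search(a, nodes, levels):
    """Descend a binary tree of reference vectors.

    nodes maps a node number to its reference vector. From node n the two
    candidates are nodes 2n+1 and 2n+2; the closer one becomes the new n.
    Returns the final node number and its distance from a.
    """
    n = 0
    best = None
    for _ in range(levels):
        left, right = 2 * n + 1, 2 * n + 2
        d_left, d_right = sad(a, nodes[left]), sad(a, nodes[right])
        n = left if d_left <= d_right else right
        best = min(d_left, d_right)
    return n, best
```

Each level costs two distance evaluations of N terms each, which is the 2N-step ideal the text uses as its baseline.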

Moreover, in processing of this kind the same operation does not repeat consecutively, so the programmer must constantly keep track of how each instruction relates to those before and after it. This not only degrades processing efficiency considerably but also makes programs very laborious to write, which is clearly a problem for software development effort as well.

[Problem that the invention seeks to solve]

Because the conventional digital signal processor is configured as described above, it has problems such as the following.

・Programs must always be written with the relationships between neighboring instructions in mind, and processing efficiency does not improve unless the same instruction is executed repeatedly in succession.
・The address and data formats are incompatible, so operations such as table lookup require a format conversion for every data item.
・The arithmetic unit targets only the sum of products, so for other operations efficiency drops severely and programs become laborious to write.
・The arithmetic precision of data is difficult to control, and rounding cannot be performed automatically.
・The two inputs and one output of an operation cannot all be read from and written to data memory simultaneously, so efficiency drops severely in, for example, vector data processing.
・The indirect addressing mode cannot be specified immediately within an instruction, so processing must be interrupted every time the addressing mode is changed.

The present invention was made to solve the problems described above. Its object is to obtain a digital signal processor that is highly flexible, simple in configuration, and achieves the following.

- Programs can be written without regard to the relationship between preceding and following instructions, and efficiency does not fall even in processing with few repetitions of the same operation.

- The address and data formats are compatible, allowing high-speed search.

- Not only sum-of-products but also other advanced operations are processed at high speed.

- Operation precision during data operations is controlled efficiently by simple means.

- High-speed input and output of vector data to and from the arithmetic unit.

- A highly flexible addressing method.

[Means for solving problems]

In the digital signal processor according to the present invention, the instruction execution pipeline has five stages, obtained by adding a stage that reads data from the data memory and inputs it to the arithmetic unit, and a stage that outputs data from the arithmetic unit and either writes it to the data memory or performs accumulation or data rounding using the accumulator in the arithmetic unit. Corresponding to the execution stage of the five stages, a barrel shifter, a multiplier, and an arithmetic logic unit are arranged side by side in the arithmetic unit. Corresponding to the write/accumulate stage, a normalization barrel shifter is connected after them, and its output serves either as the input to the rounding/accumulation adder or as the output of the arithmetic unit. The internal data memory consists of two banks of two-port memory; one read port of each bank is connected to the two input buses of the arithmetic unit, and the other, read/write, port is connected to the single output bus of the arithmetic unit or to the DMA transfer bus. The processor further includes an address generation unit that, in step with the instruction execution stages, generates the two-input, one-output data memory addresses for the arithmetic unit in parallel and two-dimensionally, and a DMA control unit that performs two-dimensional data transfers between the internal data memory and the external data memory over the DMA bus. The data formats of the address generation unit, the DMA control unit, and the arithmetic unit are made compatible.

Furthermore, in the digital signal processor according to the present invention, the arithmetic-unit operation for each of the various operations is uniquely specified, and each microinstruction code is formed by combining the corresponding function code with control codes for the normalization barrel shifter and for the two sources and one destination of the two-input, one-output operation.

[Effect]

The instruction execution pipeline stages of the present invention make it almost unnecessary to write microinstructions with delayed operation in mind, and allow highly efficient processing even when the same instruction is seldom repeated.

The arithmetic unit of the present invention executes the calculation of one term of a sum of products, a sum of absolute differences, or a sum of squared differences, together with digit adjustment and rounding of the data, in what is equivalently one machine cycle. In addition, the internal data memory and bus structure of the present invention carry out the two-input, one-output data transfers for the arithmetic unit in parallel with the operation. Combined with the address generator, which generates addresses two-dimensionally, this allows operations on vector data to be processed efficiently.

Because the data format of the address generator of the present invention is compatible with that of the arithmetic unit, no data conversion is needed in processing such as table lookup or dictionary reference.

The DMA control unit of the present invention performs two-dimensional data input and output with the external data memory in parallel with internal operations, effectively reducing the processing time needed to input and output operation data.

Finally, the microinstruction set of the present invention removes the complexity of program description by uniquely specifying the combination of operations of the internal hardware resources. By specifying, for each microinstruction, the number of data digits and the address generation formulas for the sources and destination, the digit adjustment of complex data operations and the scanning method of the various data memories can be controlled directly. This minimizes the need to consider the relationship between preceding and following instructions, simplifies program description, and makes it easy to write programs in a high-level language (for example, C).

[Embodiments of the invention]

An embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram outlining a digital signal processor according to the present invention. In the figure, (100) is an external program bus for connecting to an external expansion microinstruction memory; (101) is an internally implemented writable instruction memory WCS; (102) is a sequence control unit that receives microinstructions read from the external program bus (100) or the writable instruction memory WCS (101) and performs the prescribed operation control in the instruction execution pipeline; (103) is an address generation unit that generates the two-input, one-output addresses for the data memory in parallel; (104) is a set of three internal data buses, each 24 bits wide, provided to transfer the two-input, one-output data in parallel; (105) is an external data memory I/F unit that selects one of the three internal data buses (104) and connects it to the external data bus (111); (106) is an arithmetic unit that is connected to the three internal data buses (104) and performs the prescribed operations; (107) is an internal data memory M0 that has one read port and one read/write port and is connected to the internal data buses (104); (108) is an internal data memory M1 of the same kind; (109) is a DMA control unit with its own external data memory address generator and internal data memory address generator; (110) is a DMA bus for DMA transfers between the external data bus (111) and the internal data memory M0 (107) or M1 (108); (111) is an external data bus for connecting to external expansion data memory; (112) is a reset terminal for supplying an external reset signal to the sequence control unit (102); and (113) is an interrupt terminal for likewise supplying an external interrupt control signal.

FIG. 2 is a block diagram showing an example configuration of the arithmetic unit (106) in FIG. 1. In the figure, (120) is the X-bus, the one of the three internal data buses (104) that transfers operand data; (121) is the Y-bus, which likewise transfers operation data; (122) is the Z-bus, which likewise transfers output data; (123) is a barrel shifter B-SFT with a 24-bit word length that shifts or rotates the input data by a specified number of bits in one machine cycle; (124) is an arithmetic logic unit ALU with a 24-bit word length that performs a prescribed arithmetic or logic operation, or calculates an absolute difference, in one machine cycle; (125) is a multiplier MPY that performs a 24-bit multiplication in one machine cycle and outputs a 47-bit result; (126) is a data pipeline register DPR0 that temporarily holds the difference output of the ALU (124) and outputs it to the squaring input port of the multiplier MPY (125) so that the square of the difference can be calculated; (127) is a multiplexer that selects either the 24-bit output of the barrel shifter B-SFT (123) or the 24-bit output of the ALU (124) and outputs it to the data pipeline register DPR1 (129); (128) is a data pipeline register DPR2 that temporarily holds the 47-bit output of the multiplier MPY (125); (129) is a data pipeline register DPR1 that temporarily holds the 24-bit output of the multiplexer (127); (130) is a normalization barrel shifter N-SFT that selects and receives either the 24-bit data from DPR1 (129) or the 47-bit data from DPR2 (128), adjusts it by a specified number of digits in half a machine cycle, and outputs it as 24-bit data; (131) is the 24-bit output of the normalization barrel shifter N-SFT (130); (132) is the 24-bit accumulation output from the working register Wr (135); (133) is an accumulation/rounding adder AU; (134) is the 24-bit result output of the accumulation/rounding adder AU (133); (135) is a working register Wr organized as 24 bits × 8 words; (136) is the flag output of the ALU; (137) is a flag check circuit that tests this flag output (136) against a condition; (138) is a 24 × 1-bit condition test shift register tcsr that sequentially stores the 1-bit true/false result output by the flag check circuit; and (139) is a 1-bit carry that outputs, unchanged, the most significant of the bits shifted out when a shift toward the LSB, that is, a right shift, is specified in the normalization barrel shifter N-SFT (130).

FIG. 3 is a diagram explaining the relationship between the internal data memories and the internal data buses of the digital signal processor shown in FIG. 1. (140) is a demultiplexer that outputs the 24-bit data from the read port of the internal data memory M0 (107) to either the X-bus (120) or the Y-bus (121); (141) is a demultiplexer that outputs the 24-bit data from the read port of the internal data memory M1 (108) to either the X-bus (120) or the Y-bus (121); (142) is a multiplexer that selects the write data of either the Z-bus (122) or the DMA bus (110) and outputs it to the read/write port of the internal data memory M0 (107); (143) is likewise a multiplexer that selects the write data of either the Z-bus (122) or the DMA bus (110) and outputs it to the read/write port of the internal data memory M1 (108); (144) is a 2-2 selector for addresses that routes the write address, the D address (147), and the internal data memory address from the DMA control unit (109), the I address (148), to the read/write port of either the internal data memory M0 (107) or M1 (108); (145) is the S0 address, the read port address of the internal data memory M0 (107); (146) is the S1 address, the read port address of the internal data memory M1 (108); (147) is the write address for the internal data memory M0 (107) or M1 (108); and (148) is the I address, the internal data memory address corresponding to data transferred over the DMA bus (110).

FIG. 4 is a diagram explaining the configuration of the address generation unit (103) in FIG. 1. (150) is displacement data given as an immediate value in the microinstruction input to the sequence control unit (102); (151) is an address register AR of 24 bits × 4 words; (152) is an index modification register IXR of 12 bits × 4 words; (153) is a data input/output bus between the address register AR (151) and the X-bus (120); (154) is a data input/output bus between the index modification register IXR (152) and the X-bus (120); (155) is an address adder with a 24-bit word length; (156) is an address generator AGU, of which three independent systems are provided; (157) is a write address pipeline register DAPR3 that delays the 24-bit write address by one machine cycle; and (158) is likewise a write address pipeline register DAPR4.

FIG. 5 is a diagram explaining the five-stage instruction execution pipeline of the digital signal processor shown in FIG. 1. (160) is a machine cycle consisting of four phases; (161) is the fetch stage; (162) is the decode stage; (163) is the address update timing in the second half of the decode stage; (164) is the read stage; (165) is the execute stage; (166) is the normalization timing in the first half of the write/accumulate stage; and (167) is the write/accumulate stage.

FIG. 6 is a diagram showing part of an example microinstruction set of the digital signal processor shown in FIG. 1. In the figure, (170) is a load instruction; (171) is a branch instruction; (172) is a one-source operation instruction; (173) is a two-source operation instruction; (174) is a source designation code; (175) is a destination designation code; (176) is a source 0 designation code; and (177) is a source 1 designation code.

Next, the operation will be described. In what follows, each part is referred to by the abbreviation used in the description above.

First, the overall operation will be outlined with reference to FIG. 1.

Like the conventional example, the digital signal processor according to the present invention has a program bus (100) separate from the data buses (104). Under microinstruction control it executes in parallel the input of a microinstruction to the sequence control unit (102), data input and output of the arithmetic unit (106) over the data buses (104), the parallel generation of two-input, one-output data addresses by the address generation unit (103), and access to the internal data memories M0 (107) and M1 (108) or, through the external data memory I/F (105), to the external data memory. In addition, the DMA control unit (109) performs DMA transfers of data between the internal data memories M0 (107) and M1 (108) and the external data memory I/F (105) over the DMA bus (110), independently of these internal operations. Each execution unit is register-based, as in the conventional example. So that most instructions need not take the delayed-operation form, this processor includes the data input and output stages in the instruction execution pipeline. Consequently, to perform an addition in the arithmetic unit (106), for example, it is enough to execute an add instruction as a single microinstruction step, input and output included. A program combining various operations can therefore execute, in effect, one microinstruction per machine cycle.

However, the result of an instruction can be used only from three instruction steps later, corresponding to the difference in stage number relative to the read stage of a following instruction. Partly to avoid the loss this causes, most operations in this processor whose results must be used immediately are provided as compound operations handled by a single instruction.

As a result, this loss does not occur in most programs. The data word length and format of the arithmetic unit (106) and the address generation unit (103) are identical and fully compatible.

Therefore, in processing such as table lookup and dictionary reference, an operation result can be converted directly into a data memory address.

Next, the function of the arithmetic unit (106) will be described with reference to FIG. 2. B-SFT (123), ALU (124), and MPY (125) can each operate in one machine cycle, and they operate in the execute stage of the instruction execution pipeline. In the next stage, the write/accumulate stage, N-SFT (130) adjusts the number of digits, and the result (131) is either output to the Z-bus (122) and written to the data memory, or accumulated with or rounded against the contents (132) of Wr (135) by AU (133), with the result (134) set back into Wr (135). Here DPR1 (129) and DPR2 (128) are the registers that pass results to the next stage. With this configuration, compound operations are executed, for example, as follows.

Sum of products: MPY (125) → DPR2 (128) → N-SFT (130) → AU (133) → Wr (135)

Sum of absolute differences: ALU (124) → MUX (127) → DPR1 (129) → N-SFT (130) → AU (133) → Wr (135)

Sum of squared differences: ALU (124) → DPR0 (126) → MPY (125) → DPR2 (128) → N-SFT (130) → AU (133) → Wr (135)

The sum of squared differences is a delayed operation because it uses DPR0 (126). However, since in most cases this instruction is used only in consecutive runs, the resulting problem can be ignored.
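The three data paths above can be illustrated with a small behavioural sketch. It is not taken from the patent: the function names are hypothetical, the digit adjustment in N-SFT is assumed to be an arithmetic right shift, and register widths, overflow, and the one-step delay through DPR0 are deliberately not modelled.

```python
# Behavioural sketch only; widths, overflow and pipeline delays are not modelled.

def n_sft(value, shift):
    """N-SFT (130): digit adjustment, assumed here to be an arithmetic right shift."""
    return value >> shift

def accumulate(xs, ys, term, shift=0):
    """Pass one term per element pair through N-SFT, then accumulate in Wr via AU."""
    wr = 0  # working register Wr (135) used as the accumulator
    for x, y in zip(xs, ys):
        wr += n_sft(term(x, y), shift)  # AU (133) adds the shifted term to Wr
    return wr

def product_term(x, y):
    return x * y          # MPY (125) -> DPR2 (128)

def abs_diff_term(x, y):
    return abs(x - y)     # ALU (124) -> MUX (127) -> DPR1 (129)

def sq_diff_term(x, y):
    d = x - y             # ALU (124) -> DPR0 (126)
    return d * d          # squaring input of MPY (125) -> DPR2 (128)

if __name__ == "__main__":
    xs, ys = [1, 2, 3], [4, 5, 6]
    print(accumulate(xs, ys, product_term))
    print(accumulate(xs, ys, abs_diff_term))
    print(accumulate(xs, ys, sq_diff_term))
```

Only the `term` function changes between the three operations, which mirrors the way the three paths share the N-SFT, AU, and Wr stages.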

When rounding is performed, this processor uses the following procedure. The carry (139), that is, the most significant bit of the data shifted out by N-SFT (130), is taken as the carry, and rounding is achieved by performing a carry addition in AU (133). For this reason, the destination of the rounded result is limited to Wr (135).
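A minimal sketch of this carry-based rounding follows. The helper names are hypothetical, and Python's unbounded integers stand in for the 24-bit and 47-bit hardware values, so overflow behaviour is not modelled.

```python
def shift_right_with_carry(value, shift):
    """Arithmetic right shift; the carry is the most significant bit shifted out."""
    if shift == 0:
        return value, 0
    carry = (value >> (shift - 1)) & 1  # last bit to leave the register
    return value >> shift, carry

def round_shift(value, shift):
    """Round to nearest by adding the carry to the shifted value (AU carry addition)."""
    shifted, carry = shift_right_with_carry(value, shift)
    return shifted + carry

if __name__ == "__main__":
    print(round_shift(10, 2))   # 10 / 4 = 2.5
    print(round_shift(9, 2))    # 9 / 4 = 2.25
```

Adding the carry is equivalent to adding half of the final least significant bit before truncation, so exact halves round toward positive infinity in this model.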

Next, the flag check circuit (137) evaluates the flags (136) resulting from a comparison in ALU (124) according to the condition code specified by the microinstruction, outputs a 1-bit flag indicating whether the condition holds, and sets it sequentially into tcsr (138). For example, when finding the maximum or minimum of two input data, a history of which input was selected can be recorded. The contents set in tcsr (138), read across from MSB to LSB, correspond to the index code in a binary tree search.
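The way a sequence of comparison results builds up an index code can be sketched as follows. The class name and the shift direction are assumptions: each new flag is shifted in at the LSB so that the earliest decision ends up nearest the MSB, which is one arrangement consistent with reading the register from MSB to LSB.

```python
class Tcsr:
    """Sketch of a condition test shift register holding 1-bit comparison results."""

    def __init__(self, width=24):
        self.width = width
        self.value = 0

    def push(self, flag):
        """Shift left by one bit and insert the new flag at the LSB (assumed direction)."""
        mask = (1 << self.width) - 1
        self.value = ((self.value << 1) | (1 if flag else 0)) & mask

if __name__ == "__main__":
    tcsr = Tcsr()
    # Three tree levels: take branch 1, then branch 0, then branch 1.
    for decision in (True, False, True):
        tcsr.push(decision)
    print(bin(tcsr.value))
```

After three levels the register holds the path taken through the tree as a 3-bit code, with no separate bookkeeping instructions.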

The configuration of the internal data memory will be described with reference to FIG. 3. M0 (107) and M1 (108) are each a two-port RAM of 24 bits × 512 words. When two input data are output to the arithmetic unit (106) in parallel, the read-port outputs of M0 (107) and M1 (108) are routed by the selectors (140) and (141) to the X-bus (120) and the Y-bus (121). The addresses at this time are the S0 address (145), output to M0 (107), and the S1 address (146), output to M1 (108). Furthermore, when both the sources and the destination are in data memory, as in vector addition, data is written from the Z-bus (122) through MUX (142) or MUX (143) into M0 (107) or M1 (108) via its read/write port. In other words, no bus contention arises in internal operations.

The configuration of the address generation unit (103) will be described with reference to FIG. 4. The address generation unit (103) consists of three AGUs (156), responsible respectively for generating the S0 address, the S1 address, and the D address. Each AGU has an AR (151) of 24 bits × 4 words and an IXR (152) of 12 bits × 4 words, and two-dimensional address generation is made possible by having the address adder (155) add combinations of the three terms AR (151), IXR (152), and displacement (150).

Note that AGU (156) operates in the decode stage, but because there is a difference of two stages relative to the write/accumulate stage, the D address (147) is delayed by two machine cycles by DAPR3 (157) and DAPR4 (158) before being output from AGU (156). AR (151) and IXR (152) are each connected to the X-bus (120), and their data format is compatible with that of the arithmetic unit (106). Thus, to perform a table lookup, for example, data can be transferred directly from Wr (135) to AR (151) over the X-bus (120) and used as is in the address addition that produces the S0 address (145) or the S1 address (146).
Address addition may be performed using the O address (145) to the S1 address (146).

The instruction execution pipeline of this processor will be described with reference to FIG. 5. For each instruction, the pipeline consists of the following five stages.

(1) Fetch stage (161): output of the program counter and reading of one microinstruction word (48 bits wide).

(2) Decode stage: decoding of the microinstruction (162) and address addition (163).

(3) Read stage (164): reading of source data from the data memory, registers, and so on via the X-bus (120) and Y-bus (121).

(4) Execute stage (165): operation by B-SFT (123), ALU (124), and MPY (125).

(5) Write/accumulate stage: normalization by N-SFT (130) (166), followed by rounding or accumulation by AU (133) or writing to the data memory via the Z-bus (122).

In the write/accumulate stage (5), AU (133) and the data write via the Z-bus (122) share the same timing (167) because they are mutually exclusive: the output of AU (133) is set only in Wr (135), and AU (133) is not used when the Z-bus (122) is used.
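The timing consequence of the five stages can be checked with a short sketch. It assumes one instruction is issued per machine cycle and that a value written in one cycle can first be read in the following cycle; both are modelling assumptions, and the function names are hypothetical.

```python
STAGES = ["fetch", "decode", "read", "execute", "write/accumulate"]

def stage_cycle(instr_index, stage):
    """Machine cycle in which instruction `instr_index` occupies `stage`."""
    return instr_index + STAGES.index(stage)

def first_consumer(producer_index):
    """Earliest later instruction whose read stage falls after the producer's write."""
    write_cycle = stage_cycle(producer_index, "write/accumulate")
    j = producer_index + 1
    while stage_cycle(j, "read") <= write_cycle:
        j += 1
    return j

if __name__ == "__main__":
    print(first_consumer(0))
```

Under these assumptions the result of instruction 0 is first readable by instruction 3, which agrees with the three-instruction-step figure given earlier in the description for using an instruction's result.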

Executing instructions according to the above sequence makes it almost unnecessary to write programs that take cumbersome delays into account, so efficient microprograms can be produced even with a high-level language compiler.

The microinstructions of this processor are, for example, as shown in FIG. 6; all are one-word horizontal instructions with a 48-bit word length. Rather than specifying in parallel each internal resource that can operate simultaneously, this instruction set uses function codes that define, for each instruction, the combination of resource operations at every stage. This simplifies the writing of microinstructions.

This instruction set is broadly divided into load (170), branch (171), one-source operation (172), and two-source operation (173) instructions. Corresponding to the function code, a source code (174), a destination code (175), a source 0 code (176), and a source 1 code (177), which control the sources and destination, are set. When one of these codes targets the data memory, it serves as the addressing instruction code for the corresponding AGU (156) in the address generation unit (103). Which case applies is identified by the resource code. With this instruction set, the addressing mode, the normalization shift value, and similar settings can be changed for each operation instruction, so even complex signal processing algorithms can be programmed with minimal loss.

For example, to execute the binary tree search shown in FIG. 10, as in the conventional example, the calculation of the degree of approximation can be programmed on this processor as follows.

    ap N (aubaa sc0, sc1, wrz)    Repeat N times
        sc0: input vector address control
        sc1: reference vector address control
        wrz: working register designation

This takes N+1 machine cycles, and repeating it twice yields the degrees of approximation of the reference vectors for direction 0 and direction 1. Next, the processing that determines which has the greater degree of approximation and obtains the node number of the next stage can be written as follows.

    cmp*gs wr0, wr1    Compare and set the result in tcsr (138)
    op ap mvr WF2, ar12
    Total: 7 instructions

The number of machine cycles required per stage is therefore 2N+9. This is clearly highly efficient processing, close to the ideal value, and the program is also concise.
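The 2N+9 figure follows from the counts given above: two passes of N+1 machine cycles each, plus the 7 instructions of the decision step. The sketch below reproduces that arithmetic, assuming each of the 7 instructions takes one machine cycle; the function name is hypothetical.

```python
def cycles_per_tree_level(n):
    """Machine cycles for one level of the binary tree search, for vector length n."""
    approximation_pass = n + 1    # one repeated accumulate over n elements
    decision_instructions = 7     # compare and next-node address steps
    return 2 * approximation_pass + decision_instructions

if __name__ == "__main__":
    n = 16
    print(cycles_per_tree_level(n), "cycles against an ideal of", 2 * n)
```

For a vector length of 16, for instance, this gives 41 machine cycles against the ideal of 32, and the fixed overhead of 9 cycles shrinks in relative terms as N grows.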

In the above embodiment, the word length is 24 bits and the address space is 16 MW (24 bits), but other word lengths and data formats may be used.

Also, although the above embodiment was described for binary tree search, other signal processing algorithms achieve the same effects as the above embodiment.

It is also clear that the detailed specifications of the above embodiment are unrelated to the essence of the present invention and do not limit its content.

[Effect of the invention]

As described above, according to the present invention, a digital signal processing processor can be made highly adaptable, so that a high-speed signal processing system can be configured flexibly and easily, which satisfies the object of the present invention.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the configuration of a digital signal processing processor according to an embodiment of the present invention; FIG. 2 is a diagram showing the configuration of the arithmetic unit in FIG. 1; FIG. 3 is a diagram explaining the internal data memory configuration in FIG. 1; FIG. 4 is a diagram showing the configuration of the address generation section in FIG. 1; FIG. 5 is a diagram explaining the instruction execution timing of the digital signal processing processor shown in FIG. 1; FIG. 6 is a diagram showing an example of the microinstruction set of the digital signal processing processor shown in FIG. 1; FIG. 7 is a block diagram showing the configuration of DSSP1, an example of a conventional digital signal processing processor; FIG. 8 is a diagram explaining the instruction execution timing of DSSP1; FIG. 9 is a diagram showing the microinstruction set of DSSP1; FIG. 10 is a diagram explaining the operation of binary tree search; and FIG. 11 is a diagram showing an example of the arrangement in data memory of the reference vectors in FIG. 10.

(100) is the program bus, (101) is WCS, (102) is the sequence control unit, (103) is the address generation section, (104) is the data bus, (105) is the external data memory I/F, (106) is the arithmetic unit, (107) is M0, (108) is M1, (109) is the DMA control unit, (110) is the DMA bus, (111) is the external data bus, (120) is the X-bus, (121) is the Y-bus, (122) is the Z-bus, (123) is B-SIFT, (124) is ALU, (125) is MPY, (126) is DPR0, (127) is MUX, (128) is DPR1, (129) is DPRI, (130) is N-SIFT, (155) is AU, (155) is 酊, (159) is ”？−？ｌＪ, (140) is DMX, (141) is DMX, (142) is MUX, (143) is MUX, (144) is the 2-2 selector, (145) is the S0 address, (146) is the S1 address, (147) is the D address, (148) is the I address, (150) is the displacement, (151) is AR, (152) is IXR, (153) is the input/output path from the X-bus (120) to AR (151), (154) is the input/output path from the X-bus (120) to IXR (152), (155) is the address adder, (156) is AGU, (157) is DAPR3, (158) is DAPR4, (160) is the machine cycle, (161) is the fetch timing, (162) is the decode timing, (163) is the address update timing, (164) is the read timing, (165) is the execution timing, (166) is the normalization timing, (167) is the write/accumulate/rounding timing, (170) is a load instruction, (171) is a branch instruction, (172) is an example of a one-source operation instruction, (173) is an example of a two-source operation instruction, (174) is the source code, (175) is the destination code, (176) is the source 0 code, and (177) is the source 1 code.

In the figures, the same reference numerals denote the same or corresponding parts.

Claims

[Claims]

(1) A digital signal processing processor comprising: an instruction memory in which microinstructions defining various internal operations are written in advance; an instruction reading unit that reads the microinstructions from the instruction memory via a program bus every machine cycle; a sequence control unit that executes one word of the microinstruction per machine cycle by dividing execution into a five-stage pipeline of instruction read, decode, data read, operation execution, and data write or accumulation, each stage taking one machine cycle; a plurality of data input buses that transfer in parallel, in the data read stage of the five-stage pipeline, two input data corresponding to a binary operation; an arithmetic unit that, in the two stages of operation execution and data write or accumulation of the five-stage pipeline, performs various single or compound operations on the two input data transferred from the data input buses; a plurality of working registers that store the operation results of the arithmetic unit and can be read out onto the data input buses; one or more data output buses that transfer the output data of the arithmetic unit in the data write/accumulation stage of the five-stage pipeline; an output control unit that selects either the working registers or the data output bus and outputs the data of the arithmetic unit to it; a plurality of two-port memories, each having a read-only port and a read/write port, in which data can be read or written through both ports simultaneously, and which store the data corresponding to each term of a binary operation and the operation results individually and in blocks; a read control unit that selects one or more of the read-only ports of the plurality of two-port memories and reads input data from the two-port memories onto the data input buses; a write control unit that selects one or more of the read/write ports of the plurality of two-port memories and writes the operation result; an external data memory connection unit that selects one or more of the data input buses to read data from an external data memory, and selects one or more of the data output buses to write data to the external data memory; a direct memory transfer bus, separate from the data input buses and the data output buses, that selects one or more of the read/write ports of the plurality of two-port memories and connects them to the external data memory connection unit; a direct memory transfer control unit that inputs and outputs data in units of blocks between the external data memory connection unit and the two-port memories via the direct memory transfer bus, independently of the internal operations controlled by the sequence control unit; an external data memory contention control unit that arbitrates contention for connection to the external data memory connection unit among the data input buses, the data output buses, and the direct memory transfer bus; an address generation section consisting of a plurality of address generators, each provided with a plurality of address registers and index modification registers, that generate in parallel, in the decode stage of the five-stage pipeline, read and write addresses for the two-port memories or the external data memory for at least the two input data and one output data of the arithmetic unit; an address selection instruction unit that selects the corresponding one of the two-port memories or the external data memory connection unit according to the read and write addresses output from the address generation section, and designates the data memory address in synchronization with the read and write stages of the five-stage pipeline; and a direct memory transfer address generation section that has a plurality of transfer address control registers, which likewise perform data input/output via the data input bus and control the data transfer range, and that generates in parallel both the two-port memory address used for direct memory transfer and the address for the external data memory connection unit.

(2) The digital signal processing processor according to claim 1, wherein, in the arithmetic unit and the address generation section, the data input/output word length and data format of the arithmetic unit are unified with the word length and data format of the address registers and address operations of the address generation section, so that operation data can be used as an address as is.

(3) The digital signal processing processor according to claim 1, wherein the arithmetic unit comprises: a two-input barrel shifter that receives data from the data input section, can shift or rotate the input data by an arbitrary number of bits in one machine cycle, and can perform a logical operation between the result and the other input data; an arithmetic logic unit that can likewise perform at least addition, subtraction, and absolute difference calculation on two input data in one machine cycle; a multiplier that likewise performs multiplication of two input data and squaring of one input data in one machine cycle; a register that selects and temporarily holds any one of the operation result outputs of the barrel shifter, the arithmetic logic unit, and the multiplier; a normalization barrel shifter that adjusts the number of digits of the data by shifting it a predetermined number of bits; an adder that selects one of the plurality of working registers and accumulates the output of the normalization barrel shifter with it in at most 1/2 machine cycle; and a data output section that transfers the output of the normalization barrel shifter to the data output bus in at most 1/2 machine cycle when the adder is not used; and wherein, in the execution stage, any one of the barrel shifter, the arithmetic logic unit, and the multiplier is operated to execute various arithmetic and logic operations and absolute difference calculation; in the execution stage, the difference calculated by the arithmetic logic unit is held in a temporary register and squared by the multiplier, so that the squared difference is obtained in one machine cycle; and, in the data write or accumulation stage of the five-stage pipeline, the result of the execution stage is digit-adjusted by the normalization barrel shifter and then accumulated by the adder if accumulation is to be performed, or otherwise output directly to the data output bus, whereby single operations or compound operations such as a sum of products, a sum of absolute differences, or a sum of squared differences are executed.

(4) The digital signal processing processor according to claim 1, wherein, in each address generator of the address generation section, an initial value for address generation is written from the data input bus to the address register, an index modification value is written from the data input bus to the index modification register, and, after the relative address change amount instructed by the microinstruction is added to the address register, the content of the address register is updated with the result and at the same time used as the data memory address; and wherein two-dimensional or other various address designations are realized by a combination of a plurality of microinstructions by sequentially instructing the address generation mode to each of the address generators for each microinstruction.

(5) The digital signal processing processor according to claim 1, wherein, in the direct memory transfer control unit and the direct memory transfer address generation section, addresses for the external data memory connection unit are designated so that rectangular portions of k rows by l columns (k and l are positive integers) are sequentially designated within a two-dimensional data address space of m rows by n columns (m and n are positive integers); addresses for the plurality of two-port memories are designated in ascending order from an arbitrary start address, with one or more of the most significant bits of the address selecting one of the two-port memories, so that two-dimensional data is transferred between the external data memory and the two-port memories; the transfer direction and the number of data to be transferred are instructed by a microinstruction when the direct memory transfer is started, and the direct memory transfer control unit notifies the sequence control unit of completion when the transfer ends; whereby data input/output with the external data memory and internal arithmetic processing are performed in parallel in units of rectangular blocks of k rows by l columns.

(6) The digital signal processing processor according to claim 1, wherein, when conditional branching is performed in the sequence control unit, a plurality of conditions are stored in registers in advance, the instruction memory address to branch to when each condition is satisfied is stored in advance in a register corresponding to that condition, the plurality of conditions are tested in parallel under instruction from a microinstruction, a branch is made to the instruction memory address in the register corresponding to the highest-priority condition among those satisfied, and, if none of the conditions is satisfied, a branch is made to the instruction memory address in the register corresponding to that case, whereby branching on multiple conditions is realized with one microinstruction.

(7) The digital signal processing processor according to claim 1, wherein the sequence control unit includes a loop counter and a repeat counter whose initial values can be input and whose contents can be output via the data input bus; when a repeat is instructed by a microinstruction, the contents of the repeat counter are decremented by 1 for each microinstruction executed and the same microinstruction is executed repeatedly until the count reaches zero; and when a loop is instructed by a microinstruction, the contents of the loop counter are decremented by 1 and, if the result is not zero, a branch is made to the instruction memory address designated by the microinstruction, whereas if it is zero no branch is made and the loop operation ends; whereby a single microinstruction or a plurality of microinstructions are executed repeatedly a predetermined number of times.

(8) The digital signal processing processor according to claim 1, wherein, in the external data memory connection unit, the external data memory is divided into two parts at a boundary address set in advance by a microinstruction; when one part is addressed, it is treated as a high-speed memory that completes reading/writing in one machine cycle, and when the other part is addressed, it is treated as a low-speed memory for which the processor waits until an external read/write completion signal is detected.

(9) The digital signal processing processor according to claim 1, wherein part or all of the instruction memory is rewritable; microprograms corresponding to functional processing are written from an external device into this rewritable instruction memory, so that complex and varied processing is realized by the same processor; and, when no microprogram is being executed, the microprogram is written autonomously from an externally provided read-only memory.

(10) The digital signal processing processor according to claim 1, wherein a microinstruction set is formed by combining, in the microinstruction: a function code that uniquely defines the operation of each stage of the five-stage pipeline according to the type of operation; input designation codes and output designation codes that designate, for each microinstruction, the address generation mode of the address generators in the address generation section and individually designate the data memories for at least two inputs and one output, or one input and one output; resource designation codes that designate whether the data input/output target is a data memory or a register; the number of shift bits of the normalization barrel shifter in the arithmetic unit; and a test condition code, immediate data, and a holding condition code that are designated depending on the type of operation.

(11) The digital signal processing processor according to claim 1, wherein an address for inter-processor communication is arbitrarily allocated within the external data memory addresses, and an external first-in/first-out memory or two-port memory for communication is provided at this address to connect the processor to identical processors or other processors, whereby high-speed and complex processing is realized by a plurality of processors.

(12) The digital signal processing processor, wherein, in the normalization barrel shifter and the adder of the arithmetic unit, a mode is provided in which, when the number of digits is adjusted by shifting toward the least significant bit in the normalization barrel shifter, the most significant bit of the data truncated by the shift (the bits shifted out below the least significant bit) is used as a carry or borrow bit, and the adder performs addition with carry or borrow on the working register or on zero data, thereby performing accumulation with data rounding or a data rounding operation; and whether or not this mode is used is instructed by a mode setting for the adder or by direct control from the microinstruction.

(13) The digital signal processing processor according to claim 1 or 3, wherein, in the arithmetic logic unit of the arithmetic unit, when the magnitudes of two input data are compared or a specific bit in the input data is tested, it is determined whether the test condition on the comparison result designated in the microinstruction (e.g., the two input data are equal) is satisfied; the test result is shifted into a shift register from one direction for each microinstruction; and the contents of this shift register, viewed in parallel, are read out via the data input bus, so that the multiple test results obtained by executing the microinstruction multiple times are used as a search history code in tree search or for branching on multiple conditions.
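
The test-result shift register of claim (13) can be illustrated with a minimal Python sketch. It is a behavioral model only: the class and function names, the register width, the shift direction (new results enter at the least significant bit), and the example test condition are assumptions made for illustration and are not taken from the specification.

```python
class TestResultShiftRegister:
    """Fixed-width shift register that collects one test result per microinstruction."""

    def __init__(self, width=8):
        self.width = width  # register width in bits (assumed value)
        self.value = 0

    def shift_in(self, result):
        """Shift the register by one bit and store the new result in the lowest bit."""
        mask = (1 << self.width) - 1
        self.value = ((self.value << 1) | int(bool(result))) & mask

    def read(self):
        """Parallel read of all bits, as would be placed on the data input bus."""
        return self.value


def compare_ge(a, b):
    """Example test condition 'a >= b'; the actual condition is named by the microinstruction."""
    return a >= b


# Three successive comparisons leave the history code 0b101 in the register.
reg = TestResultShiftRegister(width=4)
for a, b in [(5, 3), (1, 2), (7, 7)]:
    reg.shift_in(compare_ge(a, b))
history_code = reg.read()
```

Reading the register once after several comparisons gives the whole decision history as a single word, which is what allows it to serve as a tree-search path code or as the selector for a multi-way branch.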