JPWO2004036416A1 - Processor having multi-bank register and method for controlling processor

Info

Publication number: JPWO2004036416A1
Application number: JP2004544709A
Authority: JP
Inventors: 祐教松本
Original assignee: 株式会社トプスシステムズ
Priority date: 2002-10-18
Filing date: 2002-10-18
Publication date: 2006-02-16
Anticipated expiration: 2022-10-18
Also published as: AU2002344110A1; WO2004036416A1; JP3958320B2

Abstract

命令長を長くせずに多くのレジスタを扱えて、パイプライン処理においてストールが発生しにくくすることを課題とする。プロセッサ（１００）が、命令デコーダと、演算器（１２）と、外部メモリ（１９）との間でデータ入出力を行なうメモリアクセス手段（１３等）と、該演算器および該メモリアクセス手段からアクセス可能なレジスタをそれぞれ備えた複数のレジスタバンク（１１）と、レジスタバンク指定命令に基づく該命令デコーダからのレジスタバンク指定制御信号により、該演算器が該レジスタバンクのいずれにアクセスするか、および、該メモリアクセス手段が該レジスタバンクのいずれにアクセスするかを制御するものであるバンクスイッチ手段とを備えている。またこのプロセッサ（１００）を用いたマルチプロセッサシステムや、プロセッサ（１００）の制御方法を提供する。
An object of the invention is to handle many registers without lengthening the instruction and to make stalls less likely in pipeline processing. A processor (100) comprises an instruction decoder; an arithmetic unit (12); memory access means (13, etc.) that inputs and outputs data to and from an external memory (19); a plurality of register banks (11), each having registers accessible from the arithmetic unit and the memory access means; and bank switch means that, in response to a register bank designation control signal issued by the instruction decoder on the basis of a register bank designation instruction, controls which register bank the arithmetic unit accesses and which register bank the memory access means accesses. A multiprocessor system using the processor (100) and a method for controlling the processor (100) are also provided.

Description

本発明は、コンピュータ機器の演算装置であるＣＰＵ（中央演算装置）やＤＳＰ（デジタルシグナルプロセッサ）などのプロセッサのアーキテクチャに関し、特に命令セットアーキテクチャを実装するマイクロプロセッサ及びその制御方法に関する。
The present invention relates to the architecture of processors, such as CPUs (Central Processing Units) and DSPs (Digital Signal Processors), that serve as the arithmetic units of computer equipment, and more particularly to a microprocessor that implements an instruction set architecture and to a method for controlling it.

様々な機器に用いられているプロセッサ、特にマイクロプロセッサは、近年の半導体微細加工技術の進歩を背景に、クロック周波数を高める等の手法によって演算処理能力の向上が図られている。一方、主記憶装置として働く外部メモリとプロセッサの間の転送速度も各種の技術を利用することにより向上しているが、接続ピンの数やメモリの応答速度などの物理的な制約があり、プロセッサの性能向上には追いついていない。このため、プロセッサは、メモリへのアクセス時には、内部レジスタへのアクセスに比べて余分なクロック数（通常、レイテンシーと呼ばれている）を消費してしまう。プロセッサの内部バスとメモリ間のバスとの間にあるこのようなデータ転送バンド幅のギャップを埋めるために、最近のプロセッサではキャッシュメモリがよく用いられている。
また、複数のプロセッサを連携動作させて、全体として高い演算処理能力を得る手法として、マルチプロセッサシステムが知られている。このマルチプロセッサシステムにおいては、各プロセッサの演算を連携させるために、それぞれに内蔵されるキャッシュメモリ間のデータにコヒーレンシー（一貫性）を保証しなければならない。
キャッシュメモリをそれぞれ備えた複数のプロセッサによりマルチプロセッサシステムを構築する場合には、各プロセッサのメモリへのアクセスを他のプロセッサが監視するスヌーピング（バススヌープ）によって上記コヒーレンシーを保証する方法が知られている。しかし、このスヌーピングは、プロセッサとメモリ間のバスのデータ転送バンド幅を低下させてしまう作用を有している。したがって、マルチプロセッサシステムを用いても、使用するプロセッサ数に比例してシステム全体での演算性能を向上させることは困難である。
マルチプロセッサシステムでは、プロセッサ間でメモリを共有する共有メモリシステムを用いてデータのコヒーレンシーを保証する方法もある。しかし共有メモリシステムでは、共有メモリとプロセッサ間のバスのデータ転送バンド幅が、マルチプロセッサシステムの性能を制限する要因となる。
プロセッサ単体での演算処理能力の向上を並列処理を用いて高める手法として、スーパーパイプラインやスーパースケーラ、ＶＬＩＷ（ＶｅｒｙＬｏｎｇＩｎｓｔｒｕｃｔｉｏｎＷｏｒｄ）といった方法が知られている。スーパーパイプラインやスーパースケーラはインプリメンテーション（実装）レベルで並列処理を行なうことにより演算処理能力を向上させる手法である。ＶＬＩＷは、内部に演算器を複数備えたプロセッサにおいて命令レベルで並列処理を行なうことにより、演算処理能力を向上させる手法である。いずれの並列処理方法においても、本来順次実行するプログラムを並列して処理させるために、必然的に、命令、オペランド、制御といった面からの依存関係が避けられない。例えば、メモリからレジスタへのデータ転送命令であるロード命令と、そのデータがロードされている当該レジスタをオペランドとする演算命令との間には、強い依存関係が生じてしまう。このロード命令においては、キャッシュにロードされるデータが無い場合（キャッシュミスが発生した場合）には外部メモリへのアクセスを伴う。この例のように、前述の依存関係は大きなレイテンシーを伴う可能性がある。その場合には、命令実行パイプラインがストールし、並列処理本来の演算性能が得られなくなってしまうという問題がある。
この並列処理の問題を解決するためには、実行する命令を動的にスケジューリングするリオーダリングバッファを用いたダイナミックスケジューリングが用いられる場合もある。しかし、この場合であっても、その効果があるのは、一般的には、１０数命令の範囲に限定されてしまう。したがって、キャッシュミスが発生した場合には、命令実行パイプラインがストールを埋め切れず、並列処理の利点が十分には発揮されない。また、ダイナミックスケジューリングにはハードウエアが複雑で回路規模が膨大になってしまうという問題もある。
プロセッサが所望のソフトウエアを実行する際に必要とする変数は、プロセッサ内のレジスタあるいはシステム上のメモリに置かれる。メモリへのアクセスは、前述のキャッシュミスの場合はもとより、キャッシュメモリにアクセスする変数が存在している場合であっても、レジスタへのアクセスに比べて遅い。
１つのプログラムに必要とされる変数の数は、プログラムに依存するものの、一般的には、数十から百数十程度である。従来のプロセッサでは、物理的な実装上の制限や、プロセッサを動作させる命令セットアーキテクチャ上での命令長の制限から、８〜３２個のレジスタを実装する例が多い。例えば、レジスタを３２個用いて演算の対象を２つ用いる２オペランドの命令セットアーキテクチャとした場合には、一命令を構成するビットのうち、各レジスタやオペランドの特定に５ビット、計１０ビットを必要とする。同じくレジスタを３２個用いて演算結果を演算の対象と異なるレジスタに格納する３オペランドの場合には、オペランドの特定に計１５ビットを必要とする。これらに、演算のタイプなどを示すオペコードを示すビットを加えたものが命令長となる。
ここで、命令長が長くなるとプログラム用のメモリサイズ、およびメモリとプロセッサの間においてプログラムの実行に用いるバスバンド幅をともに増やす必要が生じる。したがって、一般に、プロセッサ内部のインプリメンテーションの点からは、命令長を短くすることが演算性能を向上させるために有効である。
一方、レジスタ数を増やすと、キャッシュメモリや外部メモリへのアクセスを減らすことが可能となる。さらに、外部メモリとレジスタとのデータ転送を行なうロード命令やストア命令を削減することができるので、プログラムのステップの数が減少する。これらから、扱えるレジスタ数を増加させると、演算性能の低下が防止できるという側面もあるが、レジスタを指定するために、命令長は長くなってしまう。
実際のプロセッサのアーキテクチャにおいては、これらの相反する関係と使用目的等を考慮して、命令長が適宜設定されている。典型的には、レジスタ数を少なく設定する場合には、命令長は１６ビットとされ、レジスタ数を多く設定する場合には、命令長は３２ビットとされる。
また、命令セット上のレジスタ数はそのままで仮想的なレジスタ数を増やすテクニックとして「レジスタリネーミング」が知られている。
さらに、レジスタの有効活用を目的として、個々のレジスタごとに切り換えを行なう手法が、例えば特許文献１に開示されている。
特許文献１：特開平７−８４７８５号公報

Background Art

Processors used in a wide range of equipment, microprocessors in particular, have gained arithmetic processing capability through techniques such as raising the clock frequency, backed by recent advances in semiconductor microfabrication. The transfer rate between the processor and the external memory that serves as main storage has also improved through various techniques, but physical constraints such as the number of connection pins and the memory response time mean that it has not kept pace with processor performance. As a result, a memory access costs the processor extra clock cycles (usually called latency) compared with an access to an internal register. To bridge this gap in data transfer bandwidth between the processor's internal bus and the bus to memory, recent processors commonly use cache memory.
A multiprocessor system is a known technique for obtaining high overall arithmetic processing capability by having a plurality of processors operate in cooperation. In such a system, coherency (consistency) must be guaranteed among the data in the cache memories built into the individual processors so that their computations can work together.
When a multiprocessor system is built from a plurality of processors that each have a cache memory, a known way of guaranteeing this coherency is snooping (bus snooping), in which the other processors monitor each processor's accesses to memory. However, snooping reduces the data transfer bandwidth of the bus between processor and memory. Consequently, even with a multiprocessor system it is difficult to raise the arithmetic performance of the whole system in proportion to the number of processors used.
In a multiprocessor system, there is a method of guaranteeing data coherency by using a shared memory system in which memories are shared among processors. However, in the shared memory system, the data transfer bandwidth of the bus between the shared memory and the processor becomes a factor that limits the performance of the multiprocessor system.
Super-pipelining, superscalar execution, and VLIW (Very Long Instruction Word) are known techniques that use parallel processing to raise the arithmetic processing capability of a single processor. Super-pipelining and superscalar execution improve capability by performing parallel processing at the implementation level. VLIW improves capability by performing parallel processing at the instruction level in a processor that contains multiple arithmetic units. In every one of these methods, a program that is inherently sequential is processed in parallel, so dependencies in terms of instructions, operands, and control are unavoidable. For example, a strong dependency arises between a load instruction, which transfers data from memory to a register, and an arithmetic instruction that takes that register as an operand. When the data to be loaded is not in the cache (a cache miss), the load instruction involves an access to external memory. As this example shows, such dependencies can carry a large latency. In that case the instruction execution pipeline stalls, and the performance that parallel processing should deliver is not obtained.
To address this problem, dynamic scheduling with a reordering buffer, which schedules the instructions to be executed dynamically, is sometimes used. Even so, its benefit is generally limited to a window of ten-odd instructions. When a cache miss occurs, the instruction execution pipeline therefore cannot fully hide the stall, and the advantage of parallel processing is not fully realized. Dynamic scheduling also has the drawback that the hardware is complex and the circuit scale becomes very large.
Variables that the processor needs in order to execute the desired software are placed either in registers inside the processor or in memory on the system. An access to memory is slower than an access to a register, not only on a cache miss as described above but even when the variable being accessed is present in the cache memory.
The number of variables a single program needs depends on the program, but is generally from several tens to somewhat over one hundred. Conventional processors commonly implement 8 to 32 registers because of physical implementation limits and the instruction length limits imposed by the instruction set architecture that drives the processor. For example, in a two-operand instruction set architecture with 32 registers, where an operation takes two operands, identifying each register operand takes 5 of the bits that make up an instruction, for a total of 10 bits. In a three-operand architecture that likewise has 32 registers and stores the result in a register different from the source operands, identifying the operands takes a total of 15 bits. The instruction length is these bits plus the bits of the opcode, which indicates the type of operation and so on.
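The operand-field arithmetic in the preceding paragraph can be checked with a short sketch. The function name `operand_field_bits` is hypothetical and chosen only for illustration; the sketch assumes every operand is a register index encoded in the minimum number of bits.

```python
def operand_field_bits(num_registers: int, num_operands: int) -> int:
    """Bits needed to name `num_operands` registers out of `num_registers`.

    Assumes each operand is a plain register index in minimal binary encoding.
    """
    bits_per_operand = (num_registers - 1).bit_length()
    return bits_per_operand * num_operands


two_operand_bits = operand_field_bits(32, 2)    # 5 bits x 2 operands
three_operand_bits = operand_field_bits(32, 3)  # 5 bits x 3 operands
print(two_operand_bits, three_operand_bits)
```

Under these assumptions, a 16-bit instruction with three 5-bit operand fields would leave only one bit for the opcode, which illustrates why register count and instruction length pull against each other.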
Here, as the instruction length increases, it is necessary to increase both the memory size for the program and the bus bandwidth used for executing the program between the memory and the processor. Therefore, in general, from the viewpoint of implementation inside the processor, it is effective to shorten the instruction length in order to improve the calculation performance.
On the other hand, increasing the number of registers makes it possible to reduce accesses to cache memory and external memory. It also cuts the number of load and store instructions that transfer data between external memory and registers, so the program has fewer steps. Increasing the number of usable registers thus helps prevent a loss of arithmetic performance, but the instruction must grow longer in order to designate the registers.
In an actual processor architecture, the instruction length is appropriately set in consideration of these conflicting relations and the purpose of use. Typically, when the number of registers is set small, the instruction length is 16 bits, and when the number of registers is set large, the instruction length is 32 bits.
“Register renaming” is also known as a technique for increasing the number of virtual registers while leaving the number of registers in the instruction set unchanged.
In addition, Patent Document 1, for example, discloses a technique that switches registers individually, register by register, with the aim of using registers effectively.
Patent Document 1: Japanese Patent Laid-Open No. 7-84785

本発明は、プロセッサ単体で使用する際には、命令長を可能な限り短くして演算性能を高めつつ、メモリアクセスによるレイテンシーの発生を抑えることができるプロセッサアーキテクチャを提供することにより、上記問題の少なくともいくつかを解決することを課題とする。
これに加えて、本発明は、マルチプロセッサ構成においても、メモリのコヒーレンシーを保証しつつ、プロセッサとメモリ間のデータ転送バンド幅がスヌーピングによって低下することを防止することができるプロセッサアーキテクチャを提供することにより、上記問題の少なくともいくつかを解決することを課題とする。
An object of the present invention is to solve at least some of the above problems by providing a processor architecture that, when a processor is used on its own, keeps the instruction length as short as possible to raise arithmetic performance while suppressing the latency caused by memory accesses.
In addition, an object of the present invention is to solve at least some of the above problems by providing a processor architecture that, even in a multiprocessor configuration, guarantees memory coherency while preventing snooping from reducing the data transfer bandwidth between processor and memory.

本発明では、プロセッサ内のレジスタが複数のレジスタバンクに構成される。そして、そのプロセッサに備えられた演算装置およびメモリアクセス装置が、それぞれ異なる上記レジスタバンクに動的に接続される機構を実現することにより、上記プロセッサアーキテクチャを実現する。本発明では、全般に、レジスタバンクのスイッチ手段と、レジスタバンク設定命令、またはレジスタバンク設定修飾命令などからなるレジスタバンク指定命令とを用いることにより、例えば１６ビット等の短い命令長で、数多くのレジスタを扱うことを可能にしている。
具体的には、本発明では、レジスタバンク指定命令と通常の命令とを含む命令をデコードして制御信号を生成する命令デコーダと、該命令デコーダからの該制御信号に基づいて演算処理を行なう演算器と、該演算器の信号に基づいて外部メモリとの間でデータ入出力を行なうメモリアクセス手段と、該演算器および該メモリアクセス手段からアクセス可能なレジスタをそれぞれ備えた複数のレジスタバンクと、前記レジスタバンク指定命令に基づいて前記命令デコーダが生成するレジスタバンク指定制御信号により制御されるバンクスイッチ手段であって、該演算器がどのレジスタバンクにアクセスするかを定め、かつ、該メモリアクセス手段がどのレジスタバンクにアクセスするかを定めるバンクスイッチ手段とを備えてなり、該レジスタバンク指定命令は、該バンクスイッチ手段を制御することにより、該レジスタバンク指定命令より後にある通常の命令をオペランドを用いて実行するために、該オペランドが用いるレジスタに割り当てるレジスタバンクを切り換える命令である、マルチバンクレジスタを有するプロセッサが提供される。
命令デコーダは、本プロセッサの外部に備えられている命令記憶装置からフェッチされた命令を受け取りデコードして制御信号を生成する。また、メモリアクセス手段とは、外部メモリとのアクセスを提供する手段であり、外部バス、キャッシュメモリ、アドレス生成器等の当分野で周知の手段によってメモリとプロセッサのアクセスを提供し、データの転送が可能な任意の技術とすることができる。このメモリアクセス手段は、プロセッサのレジスタと外部メモリとのアクセスを提供する。
それぞれのレジスタバンクは、適当な単位のバンクにまとめられている複数のレジスタを含んでいる。各レジスタバンクは、例えば適当なビット数の数字であるようなバンク識別子によって区別される。本発明では複数のレジスタバンクが含まれているプロセッサが用いられるため、任意のレジスタを特定するには、バンク識別子とレジスタアドレスとを指定する。しかし、本プロセッサで実行される各命令においては、詳細を後述するように、バンクを切り換えるためのバンク設定命令やバンク設定修飾命令を用いることにより、それら以外の命令（通常の命令）においては、バンク識別子を指定する必要が無い。
レジスタバンク指定命令は、明示的にバンクを指定するバンク識別子を何らかの形で含むものである。例えば、このレジスタバンク指定命令は、オペランド数２の命令のソースオペランドやデスティネーションオペランドのバンクを指定するバンク識別子をオペランドとして持つような命令（本明細書において「バンク設定命令」という）とすることができる。このレジスタバンク指定命令は、例えば汎用レジスタのあるもの、といった個々のレジスタのみを指定するものではなく、オペコードと共に用いられるソースオペランドやデスティネーションオペランドといったオペランドの位置ごとにレジスタバンクの割り当てを指定するものとすることができる。このレジスタバンク指定命令は命令デコーダによりレジスタバンク指定制御信号となって演算器を動作させる。
本プロセッサで実行される命令のうち、レジスタバンク指定命令以外のものは、バンク識別子を含まない。このような命令をレジスタバンク指定命令と区別して、「通常の命令」と呼ぶ。通常の命令は、オペランドを用いるものとすることができる。本プロセッサで実行される通常の命令のオペランドには、基本的には、その命令に到達するまでにレジスタバンク指定命令によって明示的に指定されたレジスタバンクが割り当てられているものとすることができる。例えば、ソースオペランドにはバンク０、デスティネーションオペランドにはバンク３、というように割り当てられる。このため、通常の命令の命令長は、レジスタバンクを用いるからといって特に長くする必要はない。
このレジスタバンクの割り当ては、バンクスイッチ手段によって実現される。バンクスイッチ手段は、レジスタバンク指定命令に基づくレジスタバンク指定制御命令によって制御される。このバンクスイッチ手段は、例えば、それ以降の全ての命令に用いられるソースオペランドやデスティネーションオペランドのバンクを指定するレジスタである。この指定は、オペランドごとに変更しても良く、同じであっても良い。なお、レジスタバンク指定命令が明示的に含まれていない際のバンクスイッチ手段の動作は、適当な既定値に基づいているものとすることができる。また、レジスタバンク指定命令は、サブルーチン単位でレジスタバンクを指定するような命令には限られない。
本発明のようにして、必要に応じて付加的な命令による切り換えを行って複数のレジスタバンクを用いると、通常の命令に用いる命令長を短く保ったまま、多数のレジスタを使用することができる。このようにして多数のレジスタを使用できるために、これまでキャッシュメモリを用いて行なわれてきたように、データの局所性を利用したメモリアクセス頻度の抑制も可能になる。しかも、キャッシュメモリに比べてレジスタのほうが転送バンド幅を大きくとることができる。レジスタをより多く利用した方が、キャッシュメモリを多用するより高速な演算が可能となる。
本発明において、前記バンクスイッチ手段は、前記レジスタバンク指定制御信号により制御され、前記演算器が前記レジスタバンクのいずれかへアクセスするかを定めるとともに、前記メモリアクセス手段が他のいずれかの前記レジスタバンクへアクセスするかを定めるものであり、前記演算器によるレジスタへのアクセスと前記メモリアクセス手段によるレジスタへのアクセスとの同時アクセスが可能とすることができる。
この構成により、演算装置に接続されるレジスタバンクと異なるレジスタバンクをメモリアクセス装置に接続することができる。このように構成すれば、演算命令とメモリアクセス命令によるメモリとレジスタ間のデータ転送を並行して行うことが可能となる。プロセッサがあるレジスタバンクのレジスタを用いる演算の処理中に、別のレジスタバンクに対するメモリからのデータを読込んだり、演算結果が格納されているレジスタバンク内のデータをメモリに格納する、という処理をソフトウエアから明示的に行うことができる。その結果、ロード命令と演算命令とのデータの依存関係に起因するパイプラインのストールによるプロセッサの処理性能の低下を、簡便なハードウエア構成で容易に押さえることができる。なお、メモリアクセス手段は命令デコーダからの命令に基づいて外部メモリとレジスタとのアクセスを提供するものとすることができる。
本発明のプロセッサにおいて、前記レジスタバンク指定命令は、レジスタバンク設定修飾命令であり、前記レジスタバンク設定修飾命令は、前記バンクスイッチ手段を制御することにより、該レジスタバンク設定修飾命令の直後にある通常の命令をオペランドを用いて実行するために、該直後にある通常の命令の該オペランドについてのレジスタに割り当てるレジスタバンクを切り換える命令とすることができる。
この場合、レジスタバンク設定修飾命令がそれに後続する直後の命令のみのオペランドについてのレジスタに割り当てるレジスタバンクを切り換える。つまり、レジスタバンク設定修飾命令とは、直後の命令にのみ影響する修飾する作用を有するようなレジスタバンクを指定するための命令である。このレジスタバンク設定修飾命令は、レジスタバンク設定プリフィクスとも呼ぶ。レジスタバンク設定修飾命令により、プログラム中で柔軟にレジスタバンクの指定をすることが可能となり、命令長を抑えつつプログラミングの自由度を確保することができる。
本発明では、前記レジスタバンク設定修飾命令が直前にある場合には命令長が通常の命令の命令長から延長されて、該レジスタバンク設定修飾命令と直後の該通常の命令とが該延長された命令長を有する一命令になるようにするものとできる。ここで、レジスタバンク指定命令がレジスタバンク設定修飾命令であるときには、そのレジスタバンク設定修飾命令と通常の命令とが組み合わせて構成されて命令の一単位となる。レジスタバンク設定修飾命令のみでは一単位にはならない。なぜなら、レジスタバンク設定修飾命令は直後にある通常の命令のオペランドについてのレジスタを修飾するもので、それのみでは実行可能な命令ではないからである。
命令長を延長することとは、レジスタバンク設定修飾命令の直後の命令の動作を、そのレジスタバンク設定修飾命令に従って修飾して実行する動作を実行可能な一命令として認識させることの出来る命令セットアーキテクチャを構成すること、あるいはそういう命令セットアーキテクチャをデコードし得るデコーダを構成することであり、単に形式的に命令の区切りを変更するだけではない。また、アセンブラコードを一行に記載することと直接に対応するものでもない。
これら本発明のプロセッサにおいて、前記レジスタバンク指定命令がレジスタバンク設定命令であり、該レジスタバンク設定命令より後にある通常の命令においては、明示的なレジスタバンクの指定を必要とせずに、前記オペランドが用いるレジスタに対して該レジスタバンク指定命令に基づくレジスタバンクの割り当てが適用されることができる。レジスタバンク設定命令は、レジスタバンク指定命令の一種であり、その後の通常の命令におけるオペランドのバンクの割り当てを指定する。それより後にある通常の命令においては、レジスタバンクを特に指定することなく、レジスタバンク設定命令によって割り当てられたレジスタバンクが用いられる。
このようなレジスタバンク設定命令を用いると、通常の命令の命令長を、レジスタがレジスタバンクに構成されていないプロセッサアーキテクチャと同様の短い命令長としておいて、必要に応じてレジスタバンク設定命令を用いることができる。これにより、複数の命令から構成されるプログラム全体の平均命令長を短く保ちつつ、多数のレジスタを扱うことが可能となる。
本発明の別の態様として、命令記憶手段と、第１のプロセッサと、第２のプロセッサと、外部メモリとを少なくとも備えてなるマルチプロセッサシステムであって、該第１のプロセッサおよび該第２のプロセッサは、該命令記憶手段からのレジスタバンク指定命令と、通常の命令とを含む命令をデコードして制御信号を生成する命令デコーダと、該命令デコーダからの該制御信号に基づいて演算処理を行なう演算器と、該演算器の信号に基づいて該外部メモリとの間でデータ入出力を行なうメモリアクセス手段と、該演算器および該メモリアクセス手段からアクセス可能なレジスタをそれぞれ備えた複数のレジスタバンクと、前記レジスタバンク指定命令に基づいて前記命令デコーダが生成するレジスタバンク指定制御信号により制御されるバンクスイッチ手段であって、該演算器がどのレジスタバンクにアクセスするかを定め、かつ、該メモリアクセス手段がどのレジスタバンクにアクセスするかを定めるバンクスイッチ手段とをそれぞれ備えてなり、該レジスタバンク指定命令は、該バンクスイッチ手段を制御することにより、該レジスタバンク指定命令より後にある通常の命令をオペランドを用いて実行するために、該オペランドが用いるレジスタに割り当てるレジスタバンクを切り換える命令であり、該外部メモリは、該第１のプロセッサの該メモリアクセス手段と、該第２のプロセッサの該メモリアクセス手段とのいずれからもアクセス可能に接続されている、マルチプロセッサシステムが提供される。
この外部メモリは、マルチプロセッサシステムを構成する少なくとも二つ含まれるプロセッサからアクセスされる。これにより、例えば、ＳＭＰ（ＳｙｍｍｅｔｒｉｃａｌＭｕｌｔｉｐｒｏｃｅｓｓｏｒ）構成のマルチプロセッサシステム、ＣＰＵコアとＤＳＰコアとを連携させるマルチプロセッサの構成を実現することができる。本特徴により、マルチプロセッサシステムを構成する各プロセッサにおいてレジスタバンクを用いることが可能となる。プロセッサとメモリ間のバスによるデータ転送バンド幅のギャップを数多くのローカルレジスタで補うことを可能としている。また、各バンクのレジスタは、ソフトウエアから観測および制御可能であるため、データのコヒーレンシーの問題は発生しない。従って、プロセッサとメモリ間のバスのデータ転送バンド幅は、スヌーピングのために失われることは無い。このため、ＳＭＰ構成でプロセッサ数を増やして演算性能を向上させることや、特定目的の信号処理用演算回路を内蔵するＤＳＰとＣＰＵとの連携処理の能力を高めることが可能となる。
このようなマルチプロセッサシステムにおいて、前記バンクスイッチ手段は、前記レジスタバンク指定制御信号により制御され、前記演算器が前記レジスタバンクのいずれかへアクセスするかを定めるとともに、前記メモリアクセス手段が他のいずれかの前記レジスタバンクへアクセスするかを定めるものであり、前記演算器によるレジスタへのアクセスと前記メモリアクセス手段によるレジスタへのアクセスとの同時アクセスが可能であるものとすることができる。
同時アクセスを用いると、マルチプロセッサシステムを構成する各プロセッサの処理能力が向上するだけでなく、複数のプロセッサを連携させて演算処理を行なう場合にしばしば問題となる、プロセッサ間のデータ転送に起因するストールが減少する。
本発明では、このようなマルチプロセッサシステムであって、前記外部メモリにおいて前記第１のプロセッサがアクセスするメモリ領域と、前記第２のプロセッサがアクセスするメモリ領域との少なくとも一部が共有されているものとすることができる。
いわゆる共有メモリを用いた場合であっても、レジスタ数を増やすことにより、データの局所性を用いたデータのキャッシングと同様の効果で、プロセッサとメモリ間に必要とされるデータ転送を少なくすることができる。これにより、マルチプロセッサシステムで共有メモリを用いた場合であっても、メモリバンクを用いない場合に比べて演算性能が向上する。
本発明の他の態様として、レジスタバンク指定命令と通常の命令とを含む命令をデコードして制御信号を生成する命令デコーダと、該命令デコーダからの該制御信号に基づいて演算処理を行なう演算器と、該演算器の信号に基づいて外部メモリとの間でデータ入出力を行なうメモリアクセス手段と、該演算器および該メモリアクセス手段からアクセス可能なレジスタをそれぞれ備えた複数のレジスタバンクと、該演算器および該メモリアクセス手段がそれぞれどのレジスタバンクにアクセスするかを定めるバンクスイッチ手段とを備えてなるプロセッサの制御方法であって、該命令デコーダが該レジスタバンク指定命令に基づいてレジスタバンク指定制御信号を生成するステップと、該レジスタバンク指定制御信号により該バンクスイッチ手段を制御することにより、該レジスタバンク指定命令より後にある通常の命令のオペランドが用いるレジスタに割り当てるレジスタバンクを切り換えて、該演算器がどのレジスタバンクにアクセスするかを制御するステップと、該レジスタバンク指定制御信号により該バンクスイッチ手段を制御することにより、該オペランドが用いるレジスタに割り当てるレジスタバンクを切り換えて、該メモリアクセス手段がどのレジスタバンクにアクセスするかを制御するステップと、該通常の命令をオペランドを用いて実行するステップとを含むマルチバンクレジスタを有するプロセッサの制御方法が提供される。
本発明のプロセッサは、レジスタバンク指定制御信号により、演算器やメモリアクセス手段がアクセスするレジスタバンクを切り換え、その命令を実行する。これを適切に行なうには、上記各ステップを含む制御方法を行なうことができる。これにより、その後の命令のオペランドのレジスタに割り当てられるレジスタバンクが定められるので、このような制御方法によって命令長を短く保って多くのレジスタを扱うことが可能となる。
なお、上記の各ステップは、必ずしもこの順序を保って行なう必要はない。プログラムの各命令の配列等の結果として、様々な順序で実行されることができる。また、同時に行なわれるステップがあったり、各ステップの実行頻度が大きく異なっていても良い。
この制御方法において、該通常の命令を、オペランドを用いて実行する前記ステップが、あるレジスタバンクのレジスタへの前記演算器によるアクセスと、他のレジスタバンクのレジスタへの前記メモリアクセス手段によるアクセスとが同時に行なわれるステップを含むものとすることができる。
演算器があるレジスタバンクにアクセスする際に、メモリアクセス手段は、他のレジスタバンクにアクセスすることができる。これらのアクセスはともに命令の実行ステップに含まれている。この実行ステップの範囲は、例えば、プロセッサが動作するクロックを単位として、通常の命令を実行するのに要する最初のクロックから最後のクロックまでというように、時間によって定めることが出来る。この範囲において、演算器とメモリアクセス手段は、別々のレジスタバンクのそれぞれが用いるレジスタに並行してアクセスすることができる。従来のプロセッサにおいては、この演算器がレジスタにアクセスすることと、メモリにアクセスすることとを並行して行なうために、マルチポートのレジスタファイルを用いる必要があった。しかし、本発明の特徴であるレジスタをバンクに分けることによって、通常のレジスタファイルでこのように並行にアクセスすることが可能となる。これにより、メモリアクセスのレイテンシーが命令実行を一時停止させる時間においても他の演算を行なうことが可能となり、演算処理能力が向上する。また、パイプライン処理のストールの頻度や時間が減少する。
本発明のプロセッサの制御方法において、前記レジスタバンク指定命令は、レジスタバンク設定修飾命令であり、前記レジスタバンク設定修飾命令は、前記バンクスイッチ手段を制御することにより、該レジスタバンク設定修飾命令の直後にある通常の命令をオペランドを用いて実行するために、該直後にある通常の命令の該オペランドについての該オペランドのレジスタに割り当てるレジスタバンクを切り換える命令とすることができる。
レジスタバンク設定修飾命令は、直後の命令にのみ影響する修飾する作用を有するようなレジスタバンクを指定するための前述した命令である。プロセッサの制御方法においても、レジスタバンク設定修飾命令により、プログラム中で柔軟にレジスタバンクの指定をすることが可能となり、命令長を抑えつつプログラミングの自由度を確保することができる。
本発明のプロセッサの制御方法は、前記レジスタバンク設定修飾命令が直前にある場合には命令長が通常の命令の命令長から延長されて、該レジスタバンク設定修飾命令と直後の該通常の命令とが該延長された命令長を有する一命令になるものであってもよい。
また、この制御方法において、前記レジスタバンク指定命令がレジスタバンク設定命令であり、前記レジスタバンク指定命令より後にある通常の命令においては、明示的なレジスタバンクの指定を必要とせずに、前記オペランドが用いるレジスタに対して該レジスタバンク指定命令に基づくレジスタバンクの割り当てが適用される制御方法が提供される。
レジスタバンク指定命令を有することで、通常の命令にはバンク識別子を含めずに、短い命令長を保つことが可能となる。レジスタバンク指定命令もこの命令長に合わせることで、アーキテクチャ全体の最大命令長を短く保って多くのレジスタを取り扱うことが可能となる。
本発明のプロセッサの制御方法は、別の態様として、判定条件によって後にある命令を実行するかしないかの動作を制御する条件付実行切り換え修飾命令を有する制御方法とすることができる。
この条件付実行切り換え修飾命令とは、コンディショナルエグゼキューションプリフィクスということもある。この条件付実行切り替え修飾命令とは、分岐命令を伴わずに、判定条件に応じてその後の命令を実行するかしないかを切り換えることの出来る修飾命令（プリフィクス）である。この修飾命令を用いると、プログラム自体の分岐を削減することができ、並列処理時の分岐命令に伴うストールを回避できる効果がある。
また、本発明のプロセッサの制御方法は、レジスタバンク指定命令と、前記条件付実行切り換え修飾命令とを用いる制御方法とすることができる。
コンディショナルエグゼキューションプリフィクスと本発明のレジスタバンク指定命令を組み合わせて用いると、ほとんどストールなく並列処理することが出来るプログラムを記述することが出来るので、命令長を抑えつつ、処理速度が速いプロセッサを実現することが出来る。
さらに、本発明の制御方法でレジスタバンク指定修飾命令や条件付実行切り換え修飾命令は、命令デコーダにより実行される命令を先読みするプリフィクスデコーダによってデコードされ、該修飾命令に基づいて該プリフィクスデコーダが生成する命令デコーダ制御信号に応じて、該命令デコーダのデコード動作が変更されるものとすることが出来る。各修飾命令はそれ自身のみでは命令デコーダが命令として認識する一命令を構成するものではない。先読みした修飾命令がその修飾命令の直後の命令と組み合わされて一命令として命令の実行ステップを行なえば、命令の実行ステップを増やさずに目的とする動作が可能となる。この動作は、命令を先読みするプリフィクスデコーダを用いることで適切に実行することが出来る。
なお、本出願全体を通じて、「同時」という表現を用いているが、これは、コンピュータ分野で通常用いられるクロックパルスによる時刻の単位や数クロックからなるステップにおいて実質的に同時刻といい得る範囲内で２以上の事象が起きたり重なったりしているという意味である。ずれたタイミングで行なわれても次のクロックパルスまでに対象となっている二つの事象が行われること、あるステップ開始から終了までに対象となっている二つの事象が起きて終了すること、対象となっている事象同士の期間に重なりがあること、などが含まれる。これらの意味で両事象が実質的に同時であると認められれば「同時」に該当するものとする。
In the present invention, the registers in the processor are organized into a plurality of register banks. The processor architecture described above is realized by a mechanism that dynamically connects the arithmetic device and the memory access device of the processor to different register banks. In general, by using register bank switch means together with a register bank designation instruction, which may be a register bank setting instruction or a register bank setting modification instruction, the invention makes it possible to handle a large number of registers with a short instruction length such as 16 bits.
Specifically, the present invention provides a processor having multi-bank registers, comprising: an instruction decoder that decodes instructions, including register bank designation instructions and normal instructions, to generate control signals; an arithmetic unit that performs arithmetic processing based on the control signals from the instruction decoder; memory access means that inputs and outputs data to and from an external memory based on a signal from the arithmetic unit; a plurality of register banks, each having registers accessible from the arithmetic unit and the memory access means; and bank switch means that is controlled by a register bank designation control signal generated by the instruction decoder based on the register bank designation instruction, and that determines which register bank the arithmetic unit accesses and which register bank the memory access means accesses. The register bank designation instruction is an instruction that, by controlling the bank switch means, switches the register bank assigned to the registers used by the operands of a normal instruction that comes after it, so that the normal instruction can be executed using those operands.
The instruction decoder receives instructions fetched from an instruction storage device provided outside the processor, decodes them, and generates control signals. The memory access means is means for providing access to the external memory; it may be any technology capable of transferring data that provides access between memory and processor by means well known in the art, such as an external bus, cache memory, or an address generator. The memory access means provides access between the processor's registers and the external memory.
Each register bank contains a plurality of registers grouped into a bank of suitable size. The register banks are distinguished by a bank identifier, for example a number with a suitable number of bits. Because the invention uses a processor containing a plurality of register banks, specifying an arbitrary register requires both a bank identifier and a register address. However, as described in detail later, bank setting instructions and bank setting modification instructions are used to switch banks, so the other instructions executed by the processor (normal instructions) do not need to specify a bank identifier.
A register bank designation instruction contains, in some form, a bank identifier that explicitly designates a bank. For example, it can be an instruction whose operands are bank identifiers designating the banks of the source operand and destination operand of a two-operand instruction (referred to in this specification as a “bank setting instruction”). A register bank designation instruction does not designate only an individual register, such as a particular general-purpose register; it can instead designate the register bank assignment for each operand position used with an opcode, such as the source operand and the destination operand. The instruction decoder turns the register bank designation instruction into a register bank designation control signal, which operates the arithmetic unit.
Of the instructions executed by the processor, those other than register bank designation instructions contain no bank identifier. Such instructions are called “normal instructions” to distinguish them from register bank designation instructions. Normal instructions can use operands. Basically, each operand of a normal instruction executed by the processor is assigned the register bank that was explicitly designated by a register bank designation instruction before execution reached that normal instruction. For example, bank 0 is assigned to the source operand and bank 3 to the destination operand. The instruction length of a normal instruction therefore does not need to grow merely because register banks are used.
This register bank assignment is realized by the bank switch means. The bank switch means is controlled by the register bank designation control signal derived from the register bank designation instruction. The bank switch means is, for example, a register that designates the banks of the source operand and destination operand used by all subsequent instructions. The designation may differ from operand to operand or may be the same. When no register bank designation instruction has been given explicitly, the bank switch means can operate according to a suitable default value. Register bank designation instructions are not limited to instructions that designate a register bank on a per-subroutine basis.
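The relationship between bank identifiers, register addresses, and the bank switch means can be illustrated with a toy software model. This is only a sketch under assumptions: the class `BankedRegisterFile`, its methods, the bank and register counts, and the default of bank 0 are all hypothetical and are not taken from the specification. It mirrors the example in the text, with bank 0 assigned to the source operand and bank 3 to the destination operand.

```python
class BankedRegisterFile:
    """Toy model: registers grouped into banks, plus per-operand bank selectors."""

    def __init__(self, num_banks=4, regs_per_bank=16):
        self.banks = [[0] * regs_per_bank for _ in range(num_banks)]
        # Bank switch state. Bank 0 for both operands is an assumed default.
        self.src_bank = 0
        self.dst_bank = 0

    def set_banks(self, src_bank, dst_bank):
        """Bank setting instruction: stays in effect for later normal instructions."""
        if not (0 <= src_bank < len(self.banks) and 0 <= dst_bank < len(self.banks)):
            raise ValueError("no such bank")
        self.src_bank = src_bank
        self.dst_bank = dst_bank

    def mov(self, rd, rs):
        """Normal instruction: carries register addresses only, no bank identifier."""
        self.banks[self.dst_bank][rd] = self.banks[self.src_bank][rs]

    def add(self, rd, rs):
        """Normal two-operand add: rd <- rd + rs, each operand read via its own bank."""
        self.banks[self.dst_bank][rd] += self.banks[self.src_bank][rs]


rf = BankedRegisterFile()
rf.banks[0][2] = 7                    # preload register 2 of bank 0
rf.set_banks(src_bank=0, dst_bank=3)  # source operand: bank 0, destination: bank 3
rf.mov(rd=5, rs=2)                    # bank 3 r5 <- bank 0 r2
rf.add(rd=5, rs=2)                    # bank 3 r5 <- bank 3 r5 + bank 0 r2
print(rf.banks[3][5])
```

In this model `mov` and `add` take only register addresses, so their encoding would not need to grow as banks are added; only `set_banks` carries bank identifiers.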
When a plurality of register banks is used in this way, with switching performed by additional instructions as needed, a large number of registers can be used while the instruction length of normal instructions stays short. Because many registers are available, the frequency of memory accesses can be reduced by exploiting data locality, as has conventionally been done with cache memory. Moreover, registers can offer a larger transfer bandwidth than cache memory. Making greater use of registers therefore allows faster computation than relying heavily on cache memory.
In the present invention, the bank switch means can be controlled by the register bank designation control signal so as to determine which of the register banks the arithmetic unit accesses and which other register bank the memory access means accesses, so that a register access by the arithmetic unit and a register access by the memory access means can take place simultaneously.
With this configuration, a register bank different from the one connected to the arithmetic device can be connected to the memory access device. This allows an arithmetic instruction and a memory access instruction that transfers data between memory and registers to proceed in parallel. While the processor is executing an operation that uses the registers of one register bank, software can explicitly read data from memory into another register bank, or store to memory the data in a register bank that holds operation results. As a result, the loss of processor performance caused by pipeline stalls due to data dependencies between load instructions and arithmetic instructions can be readily suppressed with a simple hardware configuration. The memory access means can provide access between the external memory and the registers based on instructions from the instruction decoder.
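One way to picture the overlap described above is a double-buffering loop, in which the arithmetic unit consumes one bank while the memory access unit fills the other and the two banks then swap roles. The sketch below is an illustration under assumptions, not the patented mechanism itself: `load_bank`, `sum_stream`, the two-bank layout, and the abstract "step" counter are hypothetical, and a step is a modelling unit rather than a clock cycle.

```python
def load_bank(bank, chunk):
    """Memory access unit: fill a bank from memory, zero-padding a short chunk."""
    for i in range(len(bank)):
        bank[i] = chunk[i] if i < len(chunk) else 0


def sum_stream(memory, regs_per_bank=4):
    """Sum `memory`, overlapping each load with the computation on the other bank."""
    banks = [[0] * regs_per_bank for _ in range(2)]
    chunks = [memory[i:i + regs_per_bank] for i in range(0, len(memory), regs_per_bank)]
    if not chunks:
        return 0, 0
    alu_bank, mem_bank = 0, 1
    load_bank(banks[alu_bank], chunks[0])  # first load has nothing to overlap with
    steps = 1
    total = 0
    for k in range(len(chunks)):
        # Within one step the arithmetic unit reads alu_bank while the
        # memory access unit writes mem_bank, so the two do not conflict.
        if k + 1 < len(chunks):
            load_bank(banks[mem_bank], chunks[k + 1])
        total += sum(banks[alu_bank])
        steps += 1
        alu_bank, mem_bank = mem_bank, alu_bank  # bank switch
    return total, steps


total, steps = sum_stream(list(range(1, 13)))
print(total, steps)
```

For twelve values in three chunks, this model needs four steps, whereas issuing every load and every computation one after another would need six; the saving comes entirely from the two units touching different banks in the same step.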
In the processor of the present invention, the register bank designation instruction can be a register bank setting modification instruction. By controlling the bank switch means, the register bank setting modification instruction switches the register bank assigned to the registers for the operands of the normal instruction immediately following it, so that this normal instruction can be executed using those operands.
In this case, the register bank setting modification instruction switches the register bank assigned to the operand registers of only the instruction that immediately follows it. In other words, a register bank setting modification instruction designates a register bank with a modifying effect that applies only to the next instruction. It is also called a register bank setting prefix. With it, register banks can be designated flexibly within a program, preserving programming freedom while keeping the instruction length down.
In the present invention, when a register bank setting modification instruction immediately precedes a normal instruction, the instruction length can be extended beyond that of a normal instruction so that the register bank setting modification instruction and the normal instruction that follows form a single instruction of the extended length. When the register bank designation instruction is a register bank setting modification instruction, it is combined with a normal instruction to make up one instruction unit. A register bank setting modification instruction does not form a unit by itself, because it only modifies the registers for the operands of the normal instruction that follows and is not an executable instruction on its own.
Extending the instruction length means building an instruction set architecture in which the operation of the instruction that follows a register bank setting modification instruction, modified according to that modification instruction, is recognized as a single executable instruction, or building a decoder capable of decoding such an instruction set architecture. It is not a mere formal change to where instructions are delimited, nor does it correspond directly to writing the assembler code on one line.
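The prefix behaviour can be sketched as a decoder that fuses a prefix word with the normal instruction that follows it into one extended instruction, with the prefix affecting that instruction only. Everything named here is an assumption made for illustration: the `BANKPFX` mnemonic, the tuple encoding of instruction words, the default banks, and the `decode` function do not come from the specification.

```python
DEFAULT_BANKS = (0, 0)  # assumed (source bank, destination bank) when no prefix applies


def decode(stream):
    """Fuse each bank prefix with the normal instruction that follows it."""
    decoded = []
    pending = None  # banks named by a prefix that has not yet been consumed
    for word in stream:
        if word[0] == "BANKPFX":
            pending = (word[1], word[2])
            continue  # a prefix alone is not an executable instruction
        op, rd, rs = word
        prefixed = pending is not None
        src_bank, dst_bank = pending if prefixed else DEFAULT_BANKS
        decoded.append({
            "op": op, "rd": rd, "rs": rs,
            "src_bank": src_bank, "dst_bank": dst_bank,
            "words": 2 if prefixed else 1,  # extended length when prefixed
        })
        pending = None  # the prefix modifies only the very next instruction
    if pending is not None:
        raise ValueError("bank prefix with no following instruction")
    return decoded


program = [("ADD", 1, 2), ("BANKPFX", 0, 3), ("MOV", 5, 2), ("ADD", 1, 2)]
for instruction in decode(program):
    print(instruction)
```

The four words decode to three instructions: the prefixed `MOV` occupies two words and uses banks 0 and 3, while the `ADD` after it falls back to the assumed defaults.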
In these processors according to the present invention, the register bank designation instruction can be a register bank setting instruction, and in normal instructions after the register bank setting instruction, the register bank assignment based on that instruction can be applied to the registers used by the operands without the register bank being specified explicitly. The register bank setting instruction is a kind of register bank designation instruction that designates the bank assignment for the operands of subsequent normal instructions. Those subsequent normal instructions use the register bank assigned by the register bank setting instruction without specifying a register bank themselves.
When such a register bank setting instruction is used, a normal instruction can keep a short instruction length comparable to that of a processor architecture whose registers are not organized into banks, and the register bank setting instruction can be used only as necessary. This makes it possible to handle a large number of registers while keeping the average instruction length of the entire program composed of a plurality of instructions short.
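The effect on code size can be checked with simple arithmetic. The sketch below is only an illustration under assumed numbers: a 16-bit normal instruction and a 16-bit bank setting instruction (the lengths used in the embodiment), a hypothetical 2-bit bank identifier per operand for the alternative encoding, and a made-up workload of 1000 normal instructions with 50 bank switches. The function names are not from the specification.

```python
def program_bits(normal_count, bank_set_count, normal_bits=16, bank_set_bits=16):
    """Total bits of a program that uses separate bank setting instructions."""
    return normal_count * normal_bits + bank_set_count * bank_set_bits


def widened_program_bits(normal_count, normal_bits=16, bank_id_bits=2, operands=2):
    """Total bits if every normal instruction carried a bank identifier per operand."""
    return normal_count * (normal_bits + bank_id_bits * operands)


# Hypothetical workload: 1000 normal instructions and 50 bank switches.
with_setting = program_bits(1000, 50)        # 16800 bits
with_wide = widened_program_bits(1000)       # 20000 bits
average_bits = with_setting / (1000 + 50)    # 16.0 bits per instruction
```

Under these assumptions the program with occasional bank setting instructions is smaller than one in which every instruction is widened, and the advantage grows as bank switches become rarer.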
As another aspect of the present invention, there is provided a multiprocessor system comprising at least an instruction storage means, a first processor, a second processor, and an external memory. Each of the first processor and the second processor comprises: an instruction decoder that decodes instructions from the instruction storage means, including register bank designation instructions and normal instructions, to generate control signals; an arithmetic unit that performs arithmetic processing based on the control signals from the instruction decoder; memory access means for inputting and outputting data to and from the external memory based on a signal of the arithmetic unit; a plurality of register banks each comprising registers accessible from the arithmetic unit and the memory access means; and bank switch means that, according to a register bank designation control signal generated by the instruction decoder based on the register bank designation instruction, determines which register bank the arithmetic unit accesses and which register bank the memory access means accesses. The register bank designation instruction is an instruction that, by controlling the bank switch means, switches the register bank assigned to the registers used by operands so that normal instructions after the register bank designation instruction are executed using those operands. The external memory is connected so as to be accessible from both the memory access means of the first processor and the memory access means of the second processor.
This external memory is accessed by at least two processors included in the multiprocessor system. This makes it possible to realize, for example, a multiprocessor system having an SMP (symmetric multiprocessor) configuration or a multiprocessor configuration in which a CPU core and a DSP core are linked. With this feature, each processor constituting the multiprocessor system can use register banks. A large number of local registers can compensate for the gap in data transfer bandwidth caused by the bus between the processor and the memory. Further, since the registers of each bank can be observed and controlled from software, the problem of data coherency does not arise, and the data transfer bandwidth of the bus between the processor and the memory is not consumed by snooping. For this reason, the number of processors in an SMP configuration can be increased to improve computing performance, and the capability of cooperative processing between a CPU and a DSP incorporating special-purpose signal processing circuits can be increased.
In such a multiprocessor system, the bank switch means, controlled by the register bank designation control signal, can determine which of the register banks the arithmetic unit accesses and which other register bank the memory access means accesses, so that access to a register by the arithmetic unit and access to a register by the memory access means are performed simultaneously.
Using simultaneous access not only improves the processing capability of each processor constituting the multiprocessor system, but also reduces stalls caused by data transfer between processors, which are often a problem when multiple processors cooperate in arithmetic processing.
In the present invention, in such a multiprocessor system, at least a part of the memory area of the external memory accessed by the first processor can be shared with the memory area accessed by the second processor.
Even when so-called shared memory is used, increasing the number of registers reduces the data transfer required between the processor and the memory, with an effect similar to data caching that exploits data locality. Thereby, even when shared memory is used in the multiprocessor system, computing performance is improved compared with the case where the memory bank is not used.
As another aspect of the present invention, there is provided a method of controlling a processor having multi-bank registers, the processor comprising: an instruction decoder that decodes instructions including register bank designation instructions and normal instructions to generate control signals; an arithmetic unit that performs arithmetic processing based on the control signals from the instruction decoder; memory access means for inputting and outputting data to and from an external memory based on a signal from the arithmetic unit; a plurality of register banks each having registers accessible from the arithmetic unit and the memory access means; and bank switch means for determining which register bank each of the arithmetic unit and the memory access means accesses. The method includes: a step in which the instruction decoder generates a register bank designation control signal based on the register bank designation instruction; a step of controlling the bank switch means by the register bank designation control signal to switch which register bank is assigned to the registers used by operands of normal instructions after the register bank designation instruction, thereby controlling which register bank the arithmetic unit accesses; a step of controlling the bank switch means by the register bank designation control signal to switch the register bank assigned to the registers used by the operands, thereby controlling which register bank the memory access means accesses; and a step of executing the normal instruction using the operands.
The processor according to the present invention executes instructions while the register bank designation control signal switches the register bank accessed by the arithmetic unit or the memory access means. A control method including the above steps can be used to do this appropriately. Because the register bank assigned to the operand registers of subsequent instructions is thereby determined, such a control method makes it possible to handle a large number of registers while keeping the instruction length short.
Note that the above steps need not necessarily be performed in this order. Depending on the sequence of instructions in the program, they can be executed in various orders. Some steps may be performed simultaneously, and the execution frequency of the steps may differ greatly.
In this control method, the step of executing the normal instruction using operands can include, at the same time, an access to a register of one register bank by the arithmetic unit and an access to a register of another register bank by the memory access means.
While the arithmetic unit accesses one register bank, the memory access means can access another register bank. Both of these accesses are included in the instruction execution step. The range of this execution step can be defined in time, for example in units of the clock at which the processor operates, from the first clock required to execute a normal instruction to the last. Within this range, the arithmetic unit and the memory access means can each access the registers they use, in separate register banks, in parallel. In a conventional processor, a multi-port register file is needed for the arithmetic unit's register access and a memory access to proceed in parallel. However, by dividing the registers into banks, which is a feature of the present invention, such parallel access becomes possible using an ordinary register file. As a result, other operations can be performed even while memory access latency would otherwise temporarily halt instruction execution, and processing capacity is improved. The frequency and duration of pipeline stalls are also reduced.
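The parallel access described above can be pictured with a small model. This is a sketch under stated assumptions, not the circuit of the embodiment: each bank is assumed to have a single port, a clock is modelled as one `(alu_bank, mem_bank)` pair, and when both units request the same bank the arithmetic unit is assumed to win. The function names are hypothetical.

```python
def grant_accesses(alu_bank, mem_bank):
    """Return which units may access the register file in one clock.

    Assumes each bank has a single port, so two units can proceed in the
    same clock only when they target different banks.
    """
    if alu_bank is None or mem_bank is None or alu_bank != mem_bank:
        return {"alu": alu_bank is not None, "mem": mem_bank is not None}
    # Same bank requested by both: the arithmetic unit goes first (assumed policy).
    return {"alu": True, "mem": False}


def count_stalled_transfers(schedule):
    """Count clocks in which the memory access had to wait."""
    return sum(1 for alu_bank, mem_bank in schedule
               if mem_bank is not None
               and not grant_accesses(alu_bank, mem_bank)["mem"])


# The arithmetic unit works in bank 0 while the memory access fills bank 1.
banked = [(0, 1), (0, 1), (0, 1)]
# Both units contend for bank 0 on every clock.
contended = [(0, 0), (0, 0), (0, 0)]
```

In this model `count_stalled_transfers(banked)` is 0 and `count_stalled_transfers(contended)` is 3, which is the difference the bank split is meant to achieve without a multi-port register file.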
In the processor control method of the present invention, the register bank designation instruction can be a register bank setting modification instruction that, by controlling the bank switch means, switches the register bank assigned to the operand registers of the normal instruction immediately after it, so that that normal instruction is executed using those operands.
The register bank setting modification instruction is the above-described instruction that designates a register bank with a modifying effect limited to the immediately following instruction. In the processor control method as well, the register bank setting modification instruction allows the register bank to be specified flexibly within a program, so that programming freedom is preserved while the instruction length is kept short.
In the processor control method of the present invention, when a normal instruction is immediately preceded by the register bank setting modification instruction, the instruction length may be extended beyond that of a normal instruction, so that the register bank setting modification instruction and the normal instruction immediately after it form one instruction having the extended instruction length.
Further, in this control method, the register bank designation instruction can be a register bank setting instruction, and a control method is provided in which, for normal instructions after the register bank setting instruction, the register bank assignment based on that instruction is applied to the registers used by the operands without the register bank being specified explicitly.
By having a register bank designation instruction, it is possible to keep a short instruction length without including a bank identifier in a normal instruction. By adjusting the register bank designation instruction to this instruction length, it becomes possible to handle many registers while keeping the maximum instruction length of the entire architecture short.
As another aspect, the processor control method of the present invention can be a control method having a conditional execution switching modification instruction for controlling an operation of whether or not a subsequent instruction is executed depending on a determination condition.
This conditional execution switching modification instruction is sometimes referred to as a conditional execution prefix. It is a modification instruction (prefix) that switches whether the subsequent instruction is executed according to the determination condition, without involving a branch instruction. Using this modification instruction reduces the number of branches in the program itself and avoids the stalls that accompany branch instructions during parallel processing.
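A conditional execution prefix can be illustrated with a toy interpreter. The sketch below is an assumption-laden illustration: the mnemonic `CX`, the single Boolean `flag` used as the determination condition, and the register names are all hypothetical, and only ADD and SUB are modelled. It shows that the instruction after the prefix is either executed or skipped with no branch in the program.

```python
def run(program, regs, flag):
    """Execute a list of (mnemonic, *args) tuples without any branch.

    "CX" is a hypothetical conditional execution prefix: the next
    instruction runs only when `flag` equals the prefix argument.
    """
    skip_next = False
    for op, *args in program:
        if op == "CX":
            skip_next = (flag != args[0])
            continue
        if skip_next:
            skip_next = False      # the prefix covers exactly one instruction
            continue
        dst, src = args
        if op == "ADD":
            regs[dst] += regs[src]
        elif op == "SUB":
            regs[dst] -= regs[src]
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    return regs


program = [("CX", True), ("ADD", "r0", "r1"), ("SUB", "r2", "r1")]
taken = run(program, {"r0": 1, "r1": 2, "r2": 10}, flag=True)
skipped = run(program, {"r0": 1, "r1": 2, "r2": 10}, flag=False)
```

With `flag=True` the ADD executes and `r0` becomes 3; with `flag=False` the ADD is skipped and `r0` stays 1. In both runs the SUB that follows is unaffected by the prefix.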
The processor control method of the present invention may be a control method using a register bank designation instruction and the conditional execution switching modification instruction.
By combining the conditional execution prefix with the register bank designation instruction of the present invention, a program can be written that is processed in parallel with almost no stalls, so that a processor with high processing speed can be realized while the instruction length is kept short.
Further, in the control method of the present invention, the register bank setting modification instruction and the conditional execution switching modification instruction can be decoded by a prefix decoder that reads ahead of the instruction being decoded by the instruction decoder, and the decoding operation of the instruction decoder can be changed according to an instruction decoder control signal that the prefix decoder generates based on the modification instruction. A modification instruction by itself does not constitute an instruction that the instruction decoder recognizes as one instruction. If the modification instruction that was read ahead is combined with the instruction immediately after it and the instruction execution step is performed on them as one instruction, the intended operation can be carried out without increasing the number of instruction execution steps. A prefix decoder that reads ahead of the instruction stream makes it possible to execute this operation appropriately.
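The way a look-ahead prefix decoder fuses modification instructions with the following normal instruction can be sketched as a grouping pass over the instruction words. This is a simplified software model, not the decoder hardware: the prefix mnemonics `BP` and `CX` and the `(mnemonic, payload)` word format are assumptions made for illustration.

```python
PREFIXES = {"BP", "CX"}   # hypothetical prefix mnemonics


def group_units(words):
    """Fuse each run of prefixes with the normal instruction that follows.

    `words` is a list of (mnemonic, payload) pairs, one per 16-bit word.
    Returns a list of (prefix_list, instruction) units. A prefix that is
    not followed by a normal instruction is rejected as illegal.
    """
    units, pending = [], []
    for word in words:
        if word[0] in PREFIXES:
            pending.append(word)        # seen by the look-ahead prefix decoder
        else:
            units.append((pending, word))
            pending = []
    if pending:
        raise ValueError("prefix without a following instruction")
    return units


stream = [("BS", (0, 1)), ("BP", (1, 0)), ("ADD", (21, 1)), ("SUB", (19, 2))]
units = group_units(stream)
```

The four words yield three instruction units: the prefix and the ADD are handled as one unit, so the prefix adds no execution step of its own.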
Throughout this application, the expression “simultaneous” means that two or more events occur together or overlap within a range that can be regarded as substantially the same time, measured in the units normally used in the computer field, such as one clock or a step consisting of several clocks. Even when the events happen at different timings, the term covers cases in which both events are completed by the next clock pulse, both events occur between the start and the end of a given step, or the periods from the start to the end of each event overlap. In this sense, two events are considered “simultaneous” if they are recognized as occurring at substantially the same time.

図１は、本発明の実施の形態のプロセッサの概略を示す構成図である。
図２は、本発明の実施の形態のバンク指定命令を含むアセンブラコードを示す説明図である。
図３は、図２のアセンブラコードによるバンク切り換えの様子を示す説明図である。
図４は、本発明の実施の形態のバンク指定命令の実現例を表わす説明図である。
図５は、本発明の実施の形態によるプロセッサ構成例における命令フェッチステージの構成を示す機能ブロックレベルの構成図である。
図６は、本発明の実施の形態によるプロセッサ構成例における命令デコードステージおよび命令実行ステージの構成を示す機能ブロックレベルの構成図である。
図７は、本発明の実施の形態によるプロセッサ構成例におけるライトバックステージの構成を示す機能ブロックレベルの構成図である。
図８は、本発明の実施の形態によるプロセッサ構成例におけるバンクスイッチ手段の構成を示す機能ブロックレベルの構成図である。
FIG. 1 is a configuration diagram showing an outline of a processor according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an assembler code including a bank designation instruction according to the embodiment of this invention.
FIG. 3 is an explanatory diagram showing a state of bank switching by the assembler code of FIG. 2.
FIG. 4 is an explanatory diagram showing an implementation example of the bank designation instruction according to the embodiment of the present invention.
FIG. 5 is a functional block level configuration diagram showing the configuration of the instruction fetch stage in the processor configuration example according to the embodiment of the present invention.
FIG. 6 is a functional block level configuration diagram showing the configuration of the instruction decode stage and the instruction execution stage in the processor configuration example according to the embodiment of the present invention.
FIG. 7 is a functional block level configuration diagram showing the configuration of the write-back stage in the processor configuration example according to the embodiment of the present invention.
FIG. 8 is a functional block level configuration diagram showing the configuration of the bank switch means in the processor configuration example according to the embodiment of the present invention.

［アーキテクチャの実施の形態］
以下、図面を参照して本発明の実施の形態を説明する。
図１に本発明のプロセッサの概略を示す。演算器が使用するレジスタは、本発明では、例えばバンク０からバンクｎ−１までのｎ個のレジスタバンクと呼ばれる複数の単位にまとめられている。このレジスタバンクのいずれのバンクであるかを指定するには、バンク識別子を用いる。バンク識別子は適当なビット数を有している数字であり、例えば、ｎを表現するのに十分なビット数を有している。プロセッサ内にある任意のレジスタを特定するには、バンク識別子とレジスタアドレスとを指定する。
演算器は、命令デコーダ（図示せず）によってデコードされた命令に基づいて演算を実行する。この命令には、大別して２種類の命令が含まれている。
命令の第１の種類は、通常のプロセッサにおける命令と同様の命令である。例えば、外部メモリからデータをロードするロード命令（ニーモニック：ＬＤ）、加算を行なう加算（ＡＤＤ）や減算（ＳＵＢ）等、プロセッサの分野で周知の命令が含まれる。本明細書ではこの命令を「通常の命令」と呼ぶ。
命令の第２の種類は、アクセスするレジスタバンクを切り換える為の命令である。これらは、本発明に特有の命令であり、レジスタバンク指定命令と呼んでいる。この命令によって、それ以降の通常の命令に基づいて演算器がアクセスするレジスタバンクがいずれであるか、を決定することができる。例えば、プロセッサのアーキテクチャがオペランドを２つ用いるものである場合において、第１のオペランドと第２のオペランドがいずれのレジスタをアクセスするものであるかを独立して指定することができる。このレジスタバンクを切り換えるためのレジスタバンク指定命令は、特にその形式を問うものではない。通常のストアードプログラム方式の命令として、例えば、明示的なバンク識別子をオペランドとして有する命令（バンク設定命令）や、直後に後続する命令のオペランドが示すバンクを切り換えるプリフィクス（バンク設定修飾命令）とすることもできる。
このバンク切り換えのためのアセンブラコードにおける命令の例を図２に示す。行２０２は、バンク設定命令の一つであり、それ以降のプログラムで用いるバンクを指定するバンクセット命令（ニーモニック：ＢＳ）である。このバンクセット命令の第１および第２オペランドであるＢ０およびＢ１は、それぞれバンク０とバンク１を表わす。つまり、行２０２は、以降のプログラムにおいて、第１オペランド（デスティネーションオペランド）を「バンク０」とし、かつ、第２オペランド（ソースオペランド）を「バンク１」に設定する命令である。このように、行２０２に例示されるバンクセット命令は、このニーモニックＢＳに対応するオペコードと、上記オペランドによって構成される命令である。このバンクセット命令を代表とするバンク設定命令は、次にバンク設定命令が明示的に指定されるまでのレジスタバンクを割り当てる。なお、バンク設定命令が明示的にプログラムに現れるまでの命令行においてどのようなレジスタバンクの設定が行なわれるかは任意である。バンクが不定であるとしてそのような命令を受け付けないようにすること、バンクに何らかの既定値を割り当てるたり（例えば全てバンク０とする）、全てのバンクに対して並列して処理することや、デスティネーションオペランドとソースオペランドをそれぞれ別に割り当てること、などが可能である。
図２の行２０４は、直後に後続する命令に対してのみ有効なバンク設定プリフィクス（ニーモニック：ＢＰ）である。この例に記載のバンク設定プリフィクスは、オペランドとしてではなく、オペコードそれ自体にバンク識別子を内包している。バンク設定プリフィクス「ＢＰ１０」は、その直後に後続する命令（行２０６）にのみ影響し、デスティネーションオペランドをバンク１に、ソースオペランドをバンク０にする命令である。つまり、ｉ，ｊをバンク識別子を表わす添え字とすると、「ＢＰｉｊ」は、後続する命令のデスティネーションオペランドをバンクｉに、ソースオペランドをバンクｊに設定する。もちろん、前述のバンクセット命令のように、オペコードと、バンク識別子を表わすオペランドとを用いるようなものであって直後の命令行のみに影響するものでも良い。このバンク設定プリフィクスは、直後の命令にしか影響しないため、前述のバンク設定命令で指定されたバンクの割り当てを一時的にのみ上書きする。そして、バンク設定プリフィクスが影響しなくなる命令行（行２０８）では、バンク指定命令の指定が再び有効となる。
なお、バンク設定命令においては、その時点でのバンクの設定を保持しているバンクセレクトレジスタ（図示せず）というレジスタを用意しておいて、Ｂ０およびＢ１をそれらのレジスタとすることができる。そしてバンクセレクトレジスタの値は、明示的な命令によってプログラムの指定により参照することが可能である。バクセレクトレジスタの内容は、例えばサブルーチン構成のプログラムにおいては、グローバル変数としてサブルーチン間で共有されていても、ローカル変数としてサブルーチンごとに書き換えられても良い。ローカル変数である場合には、適当なデータ転送命令によって、メモリバンクに構成されたいずれかの汎用レジスタや外部メモリ等に書き出して保持されることができる。本発明のアーキテクチャにおいては、この転送を実現する命令を含んでいることができる。
図３に、図２に示したプログラムにおけるレジスタバンク指定命令の動作を模式的に示す。行２０６の命令（ＡＤＤ）は、直前の行２０４におけるバンク設定プリフィクスＢＰ１０の影響を受けるため、ソースオペランドとしてバンク１にあるＲ１を用い、デスティネーションオペランドとしてバンク０にあるＲ２１を用いて、Ｒ１とＲ２１の和の結果をＲ２１に書き戻す。行２０８の命令（ＳＵＢ）は、行２０２におけるバンク設定命令で指定されているように、ソースオペランドとしてバンク０にあるＲ２を用い、デスティネーションオペランドとしてバンク１にあるＲ１９を用いて、Ｒ２からＲ１９を減算してその結果をＲ１９に書き戻す。行２０８には、行２０４のバンク設定プリフィクスは影響しない。なお、図３ではソースオペランドを示す「ソースレジスターバンク」と「デスティネーションレジスターバンク」とを理解のために図の左右に分けて記載しているが、同じレジスタ識別子のレジスタバンクは、同じレジスタバンクを指し示しているものとすることができる。
以上のように、バンク指定命令を用いて、一連の命令のシーケンスから構成される任意のプログラムにおいて、少ない命令数でソースオペランドやデスティネーションオペランドのポイントするバンクを切り換えることが可能となる。また、直後の命令にのみ有効なバンク設定プリフィクス等を用いることで、柔軟にバンクを切り換えることが可能となる。
また、オペランドが２つの２オペランド命令である場合を例としているが、オペランドが１つのみの場合にはデスティネーションオペランドのみを利用することができる。また、オペランドを３つ用いる３オペランドのアーキテクチャであっても同様にバンク設定命令やバンク設定プリフィクスを定義することが可能である。
このようにして切り換えられるバンクは、図１における演算器からはアクセスするバンクを切り換えることにより、また、外部メモリとはメモリアクセス手段であるデータ転送装置１３、内部バス１４、外部バス１５によってアクセスすることができる。データ転送装置１３の動作は、命令に基づく制御信号によって変更することができる。
次に、本発明の命令長について説明する。以下、１６ビットを特に１ワードと呼ぶことがある。
図４に本発明の命令構成をビット配列にして示す。図４の通常の命令４０２とは、例えば、図２に説明した加算（ＡＤＤ）や減算（ＳＵＢ）等の命令であり、先に第１の種類の命令として説明した通常の命令であり、図２における行２０８に対応している。例えば、この命令４１２は、８ビットのオペコードと各４ビットの二つのオペランドによって構成され、１６ビットの命令長を有している。
また、図４のバンク設定修飾つき命令４０４は、バンク設定修飾命令であるバンク設定プリフィクス４１４と、それに後続する命令４１５から構成される。命令４１５は通常の命令である。図２においては、バンク設定修飾つき命令４０４は行２０４と行２０６の２行で表現されている。命令長は、バンク設定プリフィクス４１４、命令４１５ともに１６ビットであり、バンク設定修飾つき命令４０４は全体としては３２ビット（２ワード）である。このバンク設定修飾つき命令４０４に基づいて、バンク指定命令が命令デコーダで発行される。
また、図４のバンク設定命令４０６とは、例えば１６ビットのバンク設定命令４１６であり、図２における行２０２に対応する。例えば、通常の命令と同様に、４ビットのバンク設定命令オペコードと、各６ビットのバンク識別子オペランド２つを含む。バンク設定命令４０６に基づいて、バンク指定命令が命令デコーダで発行される。
以上に示したようなバンク設定命令やバンク設定修飾命令の命令長の設定は、バンク指定命令が必要でない演算が大部分を占める場合に特に好適である。つまり、バンク指定命令が発行されるときは命令長を２ワード等に長くして、通常は１ワード（１６ビット）等の短い命令を実行することにすれば、レジスタバンクの切り換えを要する場合以外の命令長を短くできる。これにより、プログラムコードの短縮や、プロセッサの実装における面積の縮小が可能としつつ、多数のレジスタが扱える。
なお、バンク設定命令が明示的に現れるまでに何らかの命令が実行される場合にどのようなバンクの割り当てを行なうかは、特に限定されない。バンクは不定であるとしてそのような命令を受け付けないようにすることや、既定のバンクの割り当てを用いて例えば全てバンク０を割り当てること、全てのバンクに対して並列して処理すること、デスティネーションオペランドとソースオペランドをそれぞれ別々に割り当てること、などが可能である。
本発明は、以上のような１ワードと２ワードの命令のみで構成されるものに限られない。例えば、最大命令長を３ワードとすることができる。この場合、第２ワードがバンク設定プリフィクス、第３ワードが通常の命令とすることができるが、第１ワードは他のプリフィクスとすることが出来る。このプリフィクスとしては、例えばコンディショナルエグゼキューションプリフィクスとすることができる。このコンディショナルエグゼキューションプリフィクスとは、分岐命令を伴わずに、判定条件に応じてその後の命令を実行するかしないかを切り換えることの出来るプリフィクスである。このプリフィクスは、これのみで、分岐命令を削減することができ、並列処理時の分岐命令に伴うストールを回避できる効果がある。また、コンディショナルエグゼキューションプリフィクスと本発明のレジスタバンク指定命令を組み合わせて用いると、ほとんどストールなく並列処理することが出来るプログラムを記述することが出来るので、命令長を抑えつつ、処理速度が速いプロセッサを実現することが出来る。なお、最大命令長が２ワードや３ワードであるときに最終ワードがレジスタ設定プリフィクスであると、一般不当命令であると認識されて演算処理されない。
以上のように、本発明により、プログラム全体の命令長を短く保ちつつ、レジスタ指定命令にかかわる命令にのみより長い命令長を割り当てて、多くのレジスタを扱うことできる。
［プロセッサ構成例］
図５〜８に、本発明のアーキテクチャを実装するプロセッサの構成例１０１を、機能ブロックレベルの構成図を各部分ごとに示す。図５は、命令フェッチステージ、図６は命令デコードステージと命令実行ステージ、図７はライトバックステージとその中に含まれるメモリアクセスステージをそれぞれ表わしている。
図５の命令フェッチステージは、命令記憶手段（図示せず）から命令フェッチブバッファ５４を含み、命令をロードする命令フェッチユニット５２と、進行するプログラムのカウントをしているプログラムカウンタ５８の値に基づいて、命令記憶手段をアクセスするアドレスを生成するフェッチアドレスユニット５６とからなる。
図６の命令デコードステージは、命令フェッチブバッファ５４としてロードされる命令を制御信号であるマイクロコード（マイクロ命令）へとデコードするメインデコーダ６２と、そのメインデコーダ６２に対してプリフィクスをデコードして出力するプリフィクスデコーダ６０とを含むデコーダを有する。本構成例では、１６ビットの命令長を有する各命令が９６ビット単位でデコーダに送られる。メインデコーダがある命令をデコードしている時に、その命令の後に続く命令はプリフィクスデコーダ６０でデコードされる。つまり、プリフィクスデコーダ６０は、事前に命令の先読みを行なうことにより、プリフィクスデコーダ６０は命令がプリフィクスを含むものかどうかを検出する。このプリフィクスデコーダ６０がプリフィクスを検出すると、そのプリフィクスの内容に応じて、メインデコーダ６２に制御信号を出力する。この制御信号により、メインデコーダ６０は、当該プリフィクスの後の命令についてのデコードの動作を切り換える。このように、プリフィクスデコーダとメインデコーダは、協働してプリフィクスの動作を実現する。そして、このプリフィクスが本発明のバンク設定プリフィクスであれば、メインデコーダ６２は直後にデコードする命令に含まれるオペランドに用いられるレジスタのバンクを切り換える。その結果、レジスタファイルのバンク割り当てを保持しているレジスタ（バンクセレクトレジスタ６６、図８）が変更される。このレジスタが本発明のバンクスイッチ手段となる。このように、バンク設定プリフィクスを用いることにより、後続する命令のオペランドのポイントするレジスタバンクが切り換えられる。つまり、プリフィクスデコーダとメインデコーダは、バンク指定命令となるマイクロ命令を生成することができる。
バンク設定プリフィクスではなくバンクセット命令等のバンク設定命令による場合には、メインデコーダでそのバンク設定命令がデコードされて、バンクセレクトレジスタを書き換えて、レジスタファイルのバンク割り当てを変更する。これにより、以後の命令におけるデスティネーションオペランドとソースオペランドが指し示すレジスタバンクが切り換わる。
命令デコードステージにはさらに、本発明の複数のレジスタバンクを含むレジスタファイル６４が含まれている。本実施の形態では、レジスタバンクは４個のバンクに整理されているデータ処理用レジスタとして記載されている。各レジスタバンクは、通常の命令において６ビットのフィールドで指定可能な６４ビット長を有するデータレジスタを１６本備えている。
命令デコードステージには、前述のように、バンクセレクトレジスタ６６が含まれている（図８）。このバンクセレクトレジスタは、例えばバンクセット命令による現在のバンク割り当てのステータスを保持している。
命令実行ステージには、命令デコードステージでデコードされた命令を具体的に実行するマイクロコード（マイクロ命令）制御装置６１０と、演算論理ユニット（ＡＬＵ）と、積和演算ユニット（ＭＡＵ）と、データプロセスユニット（ＤＰＵ）とを含んでなる演算器６８が含まれている。演算器６８は、マイクロ命令に基づいて、イミディエイト値、いずれかのレジスタバンク内のレジスタ値等に指定された演算処理を実行して、ライトバックステージに受け渡す。
命令デコードステージと命令実行ステージには、さらに、本発明の特徴的な要素が含まれている。レジスタファイルからの出力は信号６０１の一部であるソースレジスタＳＲＣ０とＳＲＣ１、デスティネーションレジスタＤＳＴ０とＤＳＴ１が用いられる。このうち、レジスタＳＲＣ０およびＤＳＴ０に接続されたレジスタファイルの各ポートは、通常のプロセッサにおいて用いられるものであり、レジスタＳＲＣ０およびＤＳＴ０を介して演算器と接続されている。一方、レジスタＳＲＣ１およびＤＳＴ１に接続されたレジスタファイルのポートは本発明の特徴的なポートであり、レジスタＳＲＣ１およびＤＳＴ１を介して外部メモリ（図示せず）への出力となるストア命令実行装置６２０に接続されている。本構成例ではＳＲＣ１およびＤＳＴ１の二つのレジスタを記載しているが、本発明は、このようなレジスタを少なくとも一つ有していれば良い。このストア命令実行装置６２０は、後述するメモリ出力装置７４（図７）と協働して本発明のメモリアクセス装置の一部をなすものである。これにより、レジスタバンク構成を有するレジスタファイルの一部のレジスタバンクに対してレジスタＳＲＣ０およびＤＳＴ０を介して演算器がアクセスし、同時に、他の一部のレジスタバンクに対しては、レジスタＳＲＣ１およびＤＳＴ１を介して外部メモリがアクセスすることができる。なお、以上の記載ではマイクロ命令を用いるプロセッサ構成を説明したが、ハードワイアード構成としても実現可能である。
図７のライトバックステージは、演算器６８の出力をレジスタファイルへとライトバックするライトバック装置７２や、演算器６８の出力をデータとして外部メモリ（図示せず）に書き出す３状態ドライバーを含むメモリ出力装置７４、当該外部メモリ内からデータを読み出してレジスタファイル６４にロードするメモリ入力装置７６を有している。この時、バンクが異なるので、メモリ入力装置７６は、ライトバックステージと並行してレジスタファイル６４にデータを書き込むことが出来る。
図８に、図５〜７に記載されていないが本発明のプロセッサで用いられるバンクスイッチ手段の構成例を示す。本構成例のバンクスイッチ手段は、バンクセレクトレジスタ６６と、ＳＳＲ（システムステータスレジスタ）８２中に含まれている、ソースオペランドのレジスタに割り当てられるレジスタバンクを保持するレジスタＳＲＣＢ（８４）と、デスティネーションオペランドのレジスタに割り当てられるレジスタバンクを保持するレジスタＤＳＴＢ（８６）とから構成されている。バンクセレクトレジスタ６６は、命令デコードステージにおけるソースオペランドとデスティネーションオペランドのそれぞれのレジスタに割り当てられるレジスタバンクを保持するレジスタＩＤ．ＳＲＣＢ（８４ａ）とＩＤ．ＤＳＴＢ（８６ａ）からなる。
まず、バンク指定命令がバンク設定命令である場合についてこれらのバンクスイッチ手段の動作について説明する。メインデコーダ６２からの直接書き換えられるレジスタＩＤ．ＳＲＣＢ（８４ａ）とＩＤ．ＤＳＴＢ（８６ａ）は、命令実行ステージを経ることなく書き換えられる。このため、これらは最新のレジスタバンクを保持しており、レジスタ・リード時に使用される。同じレジスタバンクの割り当ては、制御レジスタのレジスタファイル中のＳＳＲ８２の中にあるＳＲＣＢ（８４）とＤＳＴＢ（８６）にもその後に書き込まれる。このように動作させると、命令デコードステージの終了時には既にその命令に含まれるレジスタバンクの割り当てが反映されるため、例えば１クロックといった短時間でレジスタバンクの割り当てが可能となり、直後の命令についてもこのレジスタバンクの割り当てが有効となり、レジスタファイル６４にあるどのレジスタバンクのレジスタにアクセスするかを、ソースオペランド、デスティネーションオペランドごとに定めることができる。また、ライトバックステージの終了とともにレジスタファイル中のＳＳＲ８２にもこれが書き込まれるので、システムステータスとしてのレジスタバンクの割り当ての保持が可能となる。
次に、バンク指定命令がバンク設定修飾命令であり、直後の通常の命令と組み合わせて一命令となる場合について説明する。この場合には、その直後の命令にしかバンク設定修飾命令の作用が及ばないので、ＳＳＲ８２にレジスタバンクの一時的な割り当てを書き込む必要が無い。そこで、命令デコードステージでバンクセレクトレジスタ６６を書き換えるが、その書き込まれた内容は命令実行ステージ、ライトバックステージと伝えられることは無い。このようにして、バンク設定修飾命令の直後にある通常の命令にのみ有効な一時的なバンクの割り当てが実現する。
以上のようなプロセッサの構成によって、本発明のレジスタバンク構成されたレジスタバンクを用いることができる。つまり、レジスタバンク設定命令によるレジスタ切り換え動作や、レジスタバンク設定プリフィクスにより後続命令のオペランドポイント先のレジスタを切り換える動作を実装し得るプロセッサが、具体的なハードウエアとして提供される。また、演算器の演算中であっても外部メモリにアクセスすることができ、メモリのレイテンシーに起因する処理の停滞を防止することができる。
なお、図には示していないが、メモリ出力装置７４やメモリ入力装置７６には、周知のキャッシュメモリを用いて、外部メモリに起因するレイテンシーを削減することができる。
［マルチプロセッサシステム構成例］
図１〜図７を用いて説明した単体のプロセッサを複数用いて、マルチプロセッサ構成を実現することができる。この際、協働させる個々のプロセッサは命令記憶手段に接続されている。この命令記憶手段はそれぞれのマルチプロセッサを協働させる命令を記憶している。各プロセッサは、個々のプロセッサの持つプログラムカウンタに応じて、フェッチステージでそれぞれのプロセッサ自身が必要とする命令のみをフェッチする。この各プロセッサの命令には、本発明の単体のプロセッサで用いられるのと同様のレジスタバンク設定命令や、レジスタバンク修飾命令とすることができる。
また、外部メモリは、いくつかのプロセッサからアクセスされるものでも良い。すなわち、外部メモリを共有メモリシステムとして構成し、外部メモリ中のあるデータ領域を複数のプロセッサで共有してプロセッサ間でレジスタを共有することができる。この構成では、レイテンシーが大きい外部メモリにアクセスするため、例えば、各プロセッサのレジスタうち他のプロセッサと値を共有するレジスタ（以下「共有レジスタ」という）と、各プロセッサのみで用いるレジスタとを別々のレジスタバンクに割り当てることができる。各プロセッサは、例えば、共有レジスタに対する演算をするときのみ、その共有レジスタを含むレジスタバンクをデスティネーションオペランドまたはソースオペランドとしてアクセスし、他の演算ではそれ以外のレジスタバンクにアクセスする。そして、共有レジスタが書き換えられたときのみ、共有メモリにアクセスする。このようにマルチプロセッサを構成する各プロセッサが動作すると、各プロセッサで共有すべきレジスタの書き換え結果のみが、レイテンシーの大きな共有メモリを介してアクセスされる。そして、その共有メモリへのアクセスの間も、各プロセッサの演算器は、共有レジスタを扱わない命令を実行できる。このように、本発明における複数のプロセッサからなるマルチプロセッサシステムでは、データのコヒーレンシーを保証しつつ、各プロセッサにおける演算が外部メモリアクセスによるレイテンシーの影響を受けにくいマルチプロセッサシステムが実現する。
[Architecture embodiment]
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 shows an outline of the processor of the present invention. In the present invention, the registers used by the arithmetic unit are grouped into a plurality of units called register banks, for example n register banks from bank 0 to bank n−1. A bank identifier designates which of the register banks is meant. The bank identifier is a number with an appropriate number of bits, for example enough bits to express n. Any register in the processor is specified by giving a bank identifier and a register address.
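The addressing scheme of a bank identifier plus a register address can be sketched as follows. This is an illustration only: the helper names are hypothetical, the bit count shown is the minimum needed to distinguish the banks (an implementation may reserve more), and the flat index is just one possible way to picture the pair as a single register number.

```python
def bank_id_bits(n_banks):
    """Smallest number of bits that can distinguish n_banks banks."""
    return max(1, (n_banks - 1).bit_length())


def flat_index(bank, reg, regs_per_bank):
    """Map a (bank identifier, register address) pair to one flat index."""
    if not 0 <= reg < regs_per_bank:
        raise ValueError("register address out of range")
    return bank * regs_per_bank + reg


bits_for_4 = bank_id_bits(4)            # 2
bits_for_5 = bank_id_bits(5)            # 3
r3_in_bank_2 = flat_index(2, 3, 16)     # 35
```

With 16 registers per bank, for instance, register 3 of bank 2 corresponds to flat index 35, while a normal instruction still needs only the 4-bit register address.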
The arithmetic unit performs operations based on instructions decoded by an instruction decoder (not shown). These instructions fall broadly into two types.
The first type of instruction is an instruction similar to an instruction in a normal processor. For example, instructions well known in the field of processors are included, such as a load instruction (mnemonic: LD) for loading data from an external memory, an addition (ADD) for performing addition, and a subtraction (SUB). In this specification, this instruction is referred to as a “normal instruction”.
The second type of instruction is an instruction for switching the register bank to be accessed. These instructions are unique to the present invention and are called register bank designation instructions. Such an instruction determines which register bank the arithmetic unit accesses for the normal instructions that follow. For example, when the processor architecture uses two operands, it is possible to specify independently which registers the first operand and the second operand access. The format of the register bank designation instruction for switching the register bank is not particularly limited. As an instruction of an ordinary stored-program system, it can be, for example, an instruction having explicit bank identifiers as operands (a bank setting instruction) or a prefix that switches the bank indicated by the operands of the immediately following instruction (a bank setting modification instruction).
An example of instructions in the assembler code for this bank switching is shown in FIG. 2. Line 202 is one of the bank setting instructions, a bank set instruction (mnemonic: BS) that designates the banks to be used in the subsequent program. B0 and B1, the first and second operands of the bank set instruction, represent bank 0 and bank 1, respectively. That is, line 202 is an instruction that sets the first operand (destination operand) to “bank 0” and the second operand (source operand) to “bank 1” for the subsequent program. As described above, the bank set instruction exemplified in line 202 is an instruction composed of the opcode corresponding to the mnemonic BS and the above operands. A bank setting instruction, of which this bank set instruction is representative, assigns the register banks that remain in effect until the next bank setting instruction is explicitly given. How register banks are set for instruction lines that precede the first explicit bank setting instruction in the program is arbitrary. Possible choices include rejecting such instructions on the ground that the bank is undefined, assigning some default value to the bank (for example, bank 0 for all operands), processing all banks in parallel, and assigning the destination operand and the source operand to separate banks.
Line 204 in FIG. 2 is a bank setting prefix (mnemonic: BP) that is valid only for the immediately following instruction. The bank setting prefix in this example carries the bank identifiers in the opcode itself, not as operands. The bank setting prefix “BP10” affects only the instruction immediately after it (line 206), setting the destination operand to bank 1 and the source operand to bank 0. That is, if i and j are subscripts representing bank identifiers, “BPij” sets the destination operand of the subsequent instruction to bank i and the source operand to bank j. Of course, as with the bank set instruction described above, the prefix may instead use an opcode together with operands representing bank identifiers, as long as it affects only the instruction line immediately after it. Since this bank setting prefix affects only the immediately following instruction, it overrides the bank assignment specified by the bank setting instruction only temporarily. In the instruction line that the bank setting prefix no longer affects (line 208), the designation made by the earlier bank designation instruction (line 202) takes effect again.
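The scoping of the two kinds of bank designation can be traced with a small model. The sketch below is an illustration under assumptions: it follows the operand order stated in this passage for BS and BPij (first the destination bank, then the source bank), it assumes bank 0 as the default before any bank setting instruction, and the class and method names are hypothetical.

```python
class BankState:
    """Tracks the destination/source bank assignment seen by each instruction."""

    def __init__(self, dst=0, src=0):      # default of bank 0 is an assumption
        self.dst, self.src = dst, src
        self.override = None               # one-shot assignment set by a prefix

    def step(self, mnemonic, *banks):
        """Process one line; return the (dst, src) banks a normal instruction uses."""
        if mnemonic == "BS":               # persistent: BS dst_bank, src_bank
            self.dst, self.src = banks
            return None
        if mnemonic == "BP":               # BPij: applies to the next instruction only
            self.override = banks
            return None
        used = self.override if self.override is not None else (self.dst, self.src)
        self.override = None               # the prefix effect ends here
        return used


state = BankState()
trace = [state.step("BS", 0, 1),   # line 202
         state.step("BP", 1, 0),   # line 204 (BP10)
         state.step("ADD"),        # line 206
         state.step("SUB")]        # line 208
```

In this trace the ADD sees the prefix's assignment `(1, 0)` and the SUB falls back to `(0, 1)` from the bank set instruction, which is the temporary override behaviour described above.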
For the bank setting instruction, a register called a bank select register (not shown) that holds the bank setting in effect at that time can be provided, and B0 and B1 can be such registers. The value of the bank select register can be read under program control by an explicit instruction. In a program structured as subroutines, for example, the contents of the bank select register may be shared among subroutines as a global variable or rewritten for each subroutine as a local variable. When treated as a local variable, the contents can be written out and held, by an appropriate data transfer instruction, in any of the general-purpose registers organized in the memory bank, in the external memory, or the like. The architecture of the present invention can include instructions that implement this transfer.
FIG. 3 schematically shows the operation of the register bank designation instructions in the program shown in FIG. 2. Since the instruction (ADD) in line 206 is affected by the bank setting prefix BP10 in the immediately preceding line 204, it uses R1 in bank 1 as the source operand and R21 in bank 0 as the destination operand, and writes the sum of R1 and R21 back to R21. The instruction (SUB) in line 208, as specified by the bank setting instruction in line 202, uses R2 in bank 0 as the source operand and R19 in bank 1 as the destination operand, subtracts R19 from R2, and writes the result back to R19. Line 208 is not affected by the bank setting prefix of line 204. In FIG. 3, the “source register bank” and the “destination register bank” are drawn separately on the left and right of the figure for ease of understanding, but register banks shown with the same identifier can be taken to denote the same register bank.
As described above, it is possible to switch the bank pointed to by the source operand and the destination operand with a small number of instructions in an arbitrary program composed of a sequence of instructions using a bank designation instruction. Further, by using a bank setting prefix or the like that is effective only for the immediately following instruction, it becomes possible to switch banks flexibly.
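The interaction between a persistent bank setting and a one-shot bank setting prefix can be sketched as a small simulation. This is only an illustrative model: the class `BankedMachine`, its method names, the two-bank and 16-register sizes, and the register values are assumptions made for the example and do not come from the figures.

```python
class BankedMachine:
    """Toy model of a persistent bank setting plus a one-shot bank prefix."""

    def __init__(self, num_banks=2, regs_per_bank=16):
        # Sizes are assumptions chosen for the example.
        self.regs = [[0] * regs_per_bank for _ in range(num_banks)]
        self.dst_bank = 0      # persistent assignment (bank setting instruction)
        self.src_bank = 0
        self.prefix = None     # one-shot (dst, src) override, or None

    def bank_set(self, dst, src):
        """Bank setting instruction: stays in force until changed."""
        self.dst_bank, self.src_bank = dst, src

    def bank_prefix(self, dst, src):
        """Bank setting prefix BPij: applies to the next instruction only."""
        self.prefix = (dst, src)

    def _banks(self):
        """Return (dst, src) banks for this instruction and consume any prefix."""
        if self.prefix is not None:
            banks = self.prefix
        else:
            banks = (self.dst_bank, self.src_bank)
        self.prefix = None
        return banks

    def add(self, rd, rs):
        d, s = self._banks()
        self.regs[d][rd] += self.regs[s][rs]

    def sub(self, rd, rs):
        d, s = self._banks()
        self.regs[d][rd] -= self.regs[s][rs]


m = BankedMachine()
m.regs[0][1] = 7              # bank 0, register 1
m.regs[0][5] = 100            # bank 0, register 5
m.regs[1][5] = 10             # bank 1, register 5
m.bank_set(dst=0, src=0)      # persistent: both operands in bank 0
m.bank_prefix(dst=1, src=0)   # like BP10: destination bank 1, source bank 0
m.add(rd=5, rs=1)             # prefixed: bank 1 register 5 becomes 10 + 7
m.add(rd=5, rs=1)             # not prefixed: bank 0 register 5 becomes 100 + 7
```

The same `add` with identical operand fields lands in different banks depending only on whether a prefix preceded it, which is the behavior the text describes for lines 206 and 208.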
Although two-operand instructions have been taken as the example, when an instruction has only one operand, only the destination operand assignment is used. Similarly, a bank setting instruction and a bank setting prefix can be defined for a three-operand architecture.
The banks switched in this way are accessed from the arithmetic unit in FIG. 1 by switching the bank that the arithmetic unit accesses, and can also exchange data with the external memory through the memory access means, namely the data transfer device 13, the internal bus 14, and the external bus 15. The operation of the data transfer device 13 can be changed by a control signal based on an instruction.
Next, the instruction length of the present invention will be described. Hereinafter, 16 bits may be particularly referred to as one word.
FIG. 4 shows the instruction formats of the present invention as bit layouts. The normal instruction 402 in FIG. 4 is, for example, an instruction such as the addition (ADD) or subtraction (SUB) described with FIG. 2, that is, a normal instruction previously described as the first type of instruction. It corresponds to row 208 in FIG. 2. This instruction 412 is composed of, for example, an 8-bit opcode and two 4-bit operands, and has an instruction length of 16 bits.
The instruction with bank setting modification 404 in FIG. 4 consists of a bank setting prefix 414, which is a bank setting modification instruction, and an instruction 415 that follows the bank setting prefix 414. The instruction 415 is a normal instruction. In FIG. 2, the instruction with bank setting modification 404 is represented by two lines, line 204 and line 206. The bank setting prefix 414 and the instruction 415 are each 16 bits long, so the instruction with bank setting modification 404 is 32 bits (2 words) as a whole. Based on the instruction with bank setting modification 404, a bank designation instruction is issued by the instruction decoder.
The bank setting instruction 406 in FIG. 4 is, for example, a 16-bit bank setting instruction 416, and corresponds to row 202 in FIG. 2. Like a normal instruction, it has an opcode and two operands: for example, a 4-bit bank setting opcode and two 6-bit bank identifier operands. Based on the bank setting instruction 406, a bank designation instruction is issued by the instruction decoder.
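The 16-bit layouts given as examples above can be illustrated with a short packing sketch. The field widths (8-bit opcode with two 4-bit operands, and 4-bit opcode with two 6-bit bank identifiers) are taken from the text; the field order, the function names, and the opcode values `0x01` and `0xB` are assumptions made only for illustration.

```python
def encode_normal(opcode, dst, src):
    """Pack a 16-bit normal instruction: 8-bit opcode, two 4-bit operands.

    Placing the opcode in the high byte is an assumption."""
    if not (0 <= opcode < 256 and 0 <= dst < 16 and 0 <= src < 16):
        raise ValueError("field out of range")
    return (opcode << 8) | (dst << 4) | src


def decode_normal(word):
    """Unpack (opcode, dst, src) from a 16-bit normal instruction."""
    return (word >> 8) & 0xFF, (word >> 4) & 0xF, word & 0xF


def encode_bank_set(opcode, dst_bank, src_bank):
    """Pack a 16-bit bank setting instruction: 4-bit opcode, two 6-bit bank ids."""
    if not (0 <= opcode < 16 and 0 <= dst_bank < 64 and 0 <= src_bank < 64):
        raise ValueError("field out of range")
    return (opcode << 12) | (dst_bank << 6) | src_bank


def decode_bank_set(word):
    """Unpack (opcode, dst_bank, src_bank) from a bank setting instruction."""
    return (word >> 12) & 0xF, (word >> 6) & 0x3F, word & 0x3F


ADD = 0x01       # hypothetical opcode value
BANKSET = 0xB    # hypothetical opcode value

add_word = encode_normal(ADD, dst=5, src=1)
bank_word = encode_bank_set(BANKSET, dst_bank=1, src_bank=0)
```

Both formats fit in one word because 8 + 4 + 4 = 16 and 4 + 6 + 6 = 16. A real encoding would also have to keep the two opcode spaces from colliding, which this sketch does not attempt.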
Setting the instruction lengths of the bank setting instruction and the instruction with bank setting modification as described above is particularly suitable when operations that need no bank designation instruction are the majority. In other words, the instruction length grows to, for example, 2 words only when a bank designation is needed, and a short instruction of 1 word (16 bits) is executed otherwise, so the average instruction length stays short. As a result, a large number of registers can be handled while the program code is kept short and the area of the processor is reduced.
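The code-size argument can be made concrete with a rough calculation. The sketch below compares the prefix-on-demand scheme against a hypothetical alternative in which every operand field is widened to carry a bank number. That alternative, the 2-bit bank field (enough for four banks), and the instruction counts are assumptions introduced for the comparison, not figures from the text.

```python
def program_bits_prefix(n_instructions, n_prefixed, word_bits=16):
    """Size when only bank-switching instructions carry one extra prefix word."""
    return (n_instructions + n_prefixed) * word_bits


def program_bits_wide(n_instructions, opcode_bits=8, reg_bits=4,
                      bank_bits=2, operands=2):
    """Size when every operand field also encodes a bank number (hypothetical)."""
    return n_instructions * (opcode_bits + operands * (reg_bits + bank_bits))


n = 1000                                   # made-up program length
with_prefix = program_bits_prefix(n, n_prefixed=100)
always_wide = program_bits_wide(n)
break_even = program_bits_prefix(n, n_prefixed=250)
```

Under these assumptions the prefix scheme is smaller as long as fewer than a quarter of the instructions need a prefix, which matches the remark that the scheme suits programs where bank switching is the minority case.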
Note that there is no particular limitation on what bank assignment applies when an instruction is executed before a bank setting instruction explicitly appears. For example, such instructions may be rejected because the bank is undefined, all operands may be assigned to bank 0 as a default bank assignment, the instruction may be processed in parallel for all banks, or the destination operand and the source operand may be given separate assignments.
The present invention is not limited to the one-word and two-word instructions described above. For example, the maximum instruction length can be 3 words. In this case, the second word can be a bank setting prefix and the third word a normal instruction, while the first word can be another prefix, for example a conditional execution prefix. A conditional execution prefix switches whether or not the subsequent instruction is executed according to a condition, without requiring a branch instruction. This prefix alone reduces the number of branch instructions and avoids the stalls that branch instructions cause during parallel processing. Moreover, when the conditional execution prefix is combined with the register bank designation instruction of the present invention, programs can be written that run in parallel with almost no stalls, so a processor with high processing speed can be realized while the instruction length is kept short. When the maximum instruction length is 2 words or 3 words and the last word is a bank setting prefix, the instruction is recognized as a general illegal instruction and is not processed.
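The word grouping described above can be sketched as follows. The sketch assumes each 16-bit word has already been classified as a conditional execution prefix (`"COND"`), a bank setting prefix (`"BP"`), or a normal instruction (`"OP"`). These tags, the function name, and the strict ordering check are illustrative assumptions.

```python
class IllegalInstruction(Exception):
    """Raised when a prefix is not completed by a normal instruction."""


def group_instructions(words):
    """Group (kind, payload) words into instructions of at most 3 words.

    Assumed order inside one instruction: optional COND, optional BP, then OP.
    """
    groups, current = [], []
    for kind, payload in words:
        if kind == "OP":
            current.append((kind, payload))
            groups.append(current)
            current = []
        elif kind == "BP":
            if any(k == "BP" for k, _ in current):
                raise IllegalInstruction("bank prefix followed by another bank prefix")
            current.append((kind, payload))
        elif kind == "COND":
            if current:
                raise IllegalInstruction("conditional prefix must be the first word")
            current.append((kind, payload))
        else:
            raise ValueError("unknown word kind: %r" % (kind,))
    if current:
        raise IllegalInstruction("prefix is the last word of the stream")
    return groups


stream = [
    ("OP", "ADD"),                                    # 1-word instruction
    ("BP", (1, 0)), ("OP", "ADD"),                    # 2-word instruction
    ("COND", "EQ"), ("BP", (1, 0)), ("OP", "SUB"),    # 3-word instruction
]
lengths = [len(g) for g in group_instructions(stream)]
```

A stream that ends with a prefix, or that places one bank prefix directly after another, raises `IllegalInstruction`, mirroring the statement that such a sequence is treated as an illegal instruction and not processed.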
As described above, according to the present invention, a large number of registers can be handled by assigning a longer instruction length only to instructions that involve a register bank designation instruction, while the instruction length of the program as a whole is kept short.
[Processor configuration example]
FIGS. 5 to 8 show a configuration example 101 of a processor that implements the architecture of the present invention, with each figure showing one part of a functional-block-level configuration diagram. FIG. 5 shows the instruction fetch stage, FIG. 6 shows the instruction decode stage and the instruction execution stage, and FIG. 7 shows the write back stage and the memory access stage included in it.
The instruction fetch stage of FIG. 5 includes an instruction fetch unit 52 that loads instructions from an instruction storage means (not shown) into an instruction fetch buffer 54, and a fetch address unit 56 that generates the address for accessing the instruction storage means from the value of a program counter 58, which tracks the position of the program being executed.
The instruction decode stage in FIG. 6 includes a decoder made up of a main decoder 62, which decodes the instructions loaded into the instruction fetch buffer 54 into microcode (microinstructions) serving as control signals, and a prefix decoder 60, which decodes prefixes and outputs the result to the main decoder 62. In this configuration example, instructions, each 16 bits long, are sent to the decoder in units of 96 bits. While the main decoder 62 is decoding an instruction, the instruction that follows it is decoded by the prefix decoder 60. That is, the prefix decoder 60 reads ahead and detects whether the upcoming instruction includes a prefix. When the prefix decoder 60 detects a prefix, it outputs a control signal to the main decoder 62 according to the contents of the prefix. In response to this control signal, the main decoder 62 switches its decoding operation for the instruction after the prefix. In this way, the prefix decoder and the main decoder cooperate to realize the prefix operation. If the prefix is the bank setting prefix of the present invention, the main decoder 62 switches the register banks used for the operands of the instruction decoded immediately afterward. To do so, the register that holds the bank assignment of the register file (bank select register 66, FIG. 8) is changed. This register serves as the bank switch means of the present invention. As described above, the bank setting prefix switches the register bank to which the operands of the subsequent instruction point. That is, the prefix decoder and the main decoder can generate a microinstruction that acts as a bank designation instruction.
When a bank setting instruction such as a bank set instruction is used instead of the bank setting prefix, the bank setting instruction is decoded by the main decoder, the bank select register is rewritten, and the bank assignment of the register file is changed. As a result, the register bank indicated by the destination operand and the source operand in the subsequent instruction is switched.
The instruction decode stage further includes a register file 64 containing the plurality of register banks of the present invention. In the present embodiment, the register banks are described as data processing registers arranged in four banks. Each register bank includes 16 data registers, each 64 bits long, that can be specified by a 4-bit field in a normal instruction.
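One way to picture a register file of four banks with 16 registers each is as a flat array indexed by bank and register number. The linear mapping below is an assumption made for illustration; the text does not specify how the register file is physically organized.

```python
NUM_BANKS = 4        # four banks, as in the embodiment
REGS_PER_BANK = 16   # 16 data registers per bank


def physical_index(bank, reg):
    """Map (bank, register-in-bank) to a flat index (assumed linear layout)."""
    if not 0 <= bank < NUM_BANKS:
        raise ValueError("bank out of range")
    if not 0 <= reg < REGS_PER_BANK:
        raise ValueError("register out of range")
    return bank * REGS_PER_BANK + reg


def split_index(index):
    """Inverse mapping: flat index back to (bank, register-in-bank)."""
    if not 0 <= index < NUM_BANKS * REGS_PER_BANK:
        raise ValueError("index out of range")
    return divmod(index, REGS_PER_BANK)


total_registers = NUM_BANKS * REGS_PER_BANK
```

With this layout the instruction only ever encodes the register number within a bank, and the bank number is supplied separately by the current bank assignment.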
As described above, the instruction decode stage includes the bank select register 66 (FIG. 8). This bank select register holds the current bank assignment as set by, for example, a bank set instruction.
The instruction execution stage includes a microcode (microinstruction) control device 610 that carries out the instruction decoded in the instruction decode stage, and an arithmetic unit 68 that includes an arithmetic logic unit (ALU), a product-sum operation unit (MAU), and a data processing unit (DPU). Based on the microinstruction, the arithmetic unit 68 executes the specified arithmetic processing on an immediate value, a register value in one of the register banks, or the like, and passes the result to the write back stage.
The instruction decode stage and the instruction execution stage further include elements characteristic of the present invention. Source registers SRC0 and SRC1, which are part of the signal 601, and destination registers DST0 and DST1 receive the outputs from the register file. Of these, the register file ports connected to the registers SRC0 and DST0 are those found in an ordinary processor, and are connected to the arithmetic unit via the registers SRC0 and DST0. The register file ports connected to the registers SRC1 and DST1, on the other hand, are characteristic of the present invention, and are connected via the registers SRC1 and DST1 to the store instruction execution device 620, which serves as the output to the external memory (not shown). In this configuration example two such registers, SRC1 and DST1, are described, but the present invention requires only at least one. The store instruction execution device 620 forms part of the memory access means of the present invention in cooperation with a memory output device 74 (FIG. 7) described later. As a result, the arithmetic unit accesses some of the register banks of the banked register file via the registers SRC0 and DST0, while the external memory can be accessed via the registers SRC1 and DST1. The above description covers a processor configuration using microinstructions, but a configuration implemented directly in hardware logic can also be realized.
The write back stage of FIG. 7 includes a write back device 72 that writes the output of the arithmetic unit 68 back to the register file, a memory output device 74 including a tristate driver that writes the output of the arithmetic unit 68 as data to an external memory (not shown), and a memory input device 76 that reads data from the external memory and loads it into the register file 64. Because the two writes target different banks, the memory input device 76 can write data to the register file 64 in parallel with the write back.
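The parallel write described above can be modeled with a register file that accepts a write-back and a memory load in the same cycle when they target different banks. The class and method names are hypothetical, and serializing two writes to the same bank over two cycles is an assumption, since the text does not say how that case is handled.

```python
class RegisterFile:
    """Toy banked register file with one write per bank per cycle."""

    def __init__(self, num_banks=4, regs_per_bank=16):
        self.banks = [[0] * regs_per_bank for _ in range(num_banks)]
        self.cycles = 0

    def commit(self, writeback=None, mem_load=None):
        """Apply up to two writes, each given as (bank, reg, value).

        Returns the number of cycles used: 1 if the writes hit different
        banks (or only one write is present), 2 if they hit the same bank.
        """
        writes = [w for w in (writeback, mem_load) if w is not None]
        same_bank = len(writes) == 2 and writes[0][0] == writes[1][0]
        used = 2 if same_bank else (1 if writes else 0)
        for bank, reg, value in writes:
            self.banks[bank][reg] = value
        self.cycles += used
        return used


rf = RegisterFile()
parallel = rf.commit(writeback=(0, 1, 5), mem_load=(1, 2, 9))   # different banks
serial = rf.commit(writeback=(0, 3, 7), mem_load=(0, 4, 8))     # same bank
```

Keeping the operands of the arithmetic unit and the data arriving from memory in different banks is what lets the first case complete in a single cycle.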
FIG. 8 shows a configuration example of the bank switch means, which is not shown in FIGS. 5 to 7 but is used in the processor of the present invention. The bank switch means of this configuration example includes the bank select register 66 and, within an SSR (system status register) 82, a register SRCB (84) that holds the register bank assigned to the source operand register and a register DSTB (86) that holds the register bank assigned to the destination operand register. The bank select register 66 consists of registers ID.SRCB (84a) and ID.DSTB (86a), which hold the register banks assigned to the source operand and destination operand registers in the instruction decode stage.
First, the operation of the bank switch means is described for the case where the bank designation instruction is a bank setting instruction. The registers ID.SRCB (84a) and ID.DSTB (86a) are rewritten directly by the main decoder 62, without going through the instruction execution stage. They therefore hold the latest register bank assignment and are used at register read time. The same register bank assignment is subsequently written to SRCB (84) and DSTB (86) of the SSR 82 in the control register file. With this operation, the register bank assignment contained in the instruction takes effect at the end of the instruction decode stage, so the register bank can be assigned in a short time, for example one clock. Once the assignment is in effect, it determines, for the source operand and the destination operand individually, which register bank in the register file 64 is accessed. Further, since the assignment is written to the SSR 82 at the end of the write back stage, it can be held as part of the system status.
Next, the case is described where the bank designation instruction is a bank setting modification instruction that combines with the normal instruction immediately after it to form a single instruction. In this case, since the bank setting modification instruction affects only the instruction immediately after it, the temporary register bank assignment need not be written to the SSR 82. The bank select register 66 is therefore rewritten at the instruction decode stage, but the written contents are not passed on to the instruction execution stage or the write back stage. In this way, a temporary bank assignment that is effective only for the normal instruction immediately after the bank setting modification instruction is realized.
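The two update paths can be contrasted in a small model: a bank setting instruction updates the decode-stage copy immediately and the SSR copy later at write back, while a prefix touches only the decode-stage copy. The class below is a sketch. The attribute names follow ID.SRCB, ID.DSTB, SRCB and DSTB, but the method names and the save-and-restore mechanism used to undo the temporary assignment are assumptions, since the text does not describe how the previous assignment is recovered.

```python
class BankSwitch:
    """Toy model of the bank select register (ID copy) and the SSR copy."""

    def __init__(self):
        self.id_dstb = 0      # ID.DSTB: used at register read time
        self.id_srcb = 0      # ID.SRCB
        self.ssr_dstb = 0     # DSTB in the SSR: held as system status
        self.ssr_srcb = 0     # SRCB in the SSR
        self._saved = None    # assumed mechanism for undoing a prefix

    def decode_bank_set(self, dst, src):
        """Bank setting instruction: ID copy changes at decode time."""
        self.id_dstb, self.id_srcb = dst, src
        self._saved = None
        return (dst, src)     # carried down the pipeline to write back

    def write_back(self, pending):
        """End of the write back stage: SSR copy receives the assignment."""
        self.ssr_dstb, self.ssr_srcb = pending

    def decode_prefix(self, dst, src):
        """Bank setting prefix: temporary, never reaches the SSR."""
        self._saved = (self.id_dstb, self.id_srcb)
        self.id_dstb, self.id_srcb = dst, src

    def decode_normal(self):
        """Return the (dst, src) banks used by this normal instruction."""
        banks = (self.id_dstb, self.id_srcb)
        if self._saved is not None:
            self.id_dstb, self.id_srcb = self._saved
            self._saved = None
        return banks


bs = BankSwitch()
pending = bs.decode_bank_set(dst=1, src=0)
first = bs.decode_normal()                 # new assignment already visible
ssr_before = (bs.ssr_dstb, bs.ssr_srcb)    # SSR not yet updated
bs.write_back(pending)
bs.decode_prefix(dst=0, src=1)
second = bs.decode_normal()                # temporary assignment
third = bs.decode_normal()                 # persistent assignment again
ssr_after = (bs.ssr_dstb, bs.ssr_srcb)
```

In this model the SSR only ever records the persistent assignment, so the temporary assignment made by a prefix is never visible as system status.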
With the processor configuration described above, the register banks of the present invention can be used. That is, specific hardware is provided for a processor that can switch register banks with a register bank setting instruction and can switch the register bank pointed to by the operands of a subsequent instruction with a register bank setting prefix. In addition, the external memory can be accessed even while the arithmetic unit is operating, which prevents processing from stalling because of memory latency.
Although not shown in the figure, the memory output device 74 and the memory input device 76 can use a well-known cache memory to reduce latency caused by the external memory.
[Multiprocessor system configuration example]
A multiprocessor configuration can be realized by using a plurality of the single processors described above with reference to the figures. The individual processors that are to cooperate are connected to the instruction storage means, which stores the instructions that make the processors cooperate. Each processor fetches only the instructions it requires at its fetch stage, according to its own program counter. The instructions of each processor can include register bank setting instructions and register bank setting modification instructions like those used in the single processor of the present invention.
The external memory may be accessible from several processors. That is, the external memory can be configured as a shared memory system in which a certain data area in the external memory is shared by a plurality of processors, so that registers are shared among the processors. In this configuration, because accessing the external memory involves a large latency, registers that share a value with another processor (hereinafter “shared registers”) and registers used only by the individual processor can be assigned to separate register banks. For example, each processor accesses the register bank containing the shared registers as a destination operand or a source operand only when performing an operation on a shared register, and accesses the other register banks in other operations. The shared memory is accessed only when a shared register is rewritten. When the processors constituting the multiprocessor operate in this way, only the rewritten values of registers to be shared among the processors pass through the high-latency shared memory. While the shared memory is being accessed, the arithmetic unit of each processor can execute instructions that do not handle the shared registers. As described above, a multiprocessor system including a plurality of processors according to the present invention is one in which operations in each processor are hardly affected by the latency of external memory access, while data coherency is guaranteed.
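The separation between a shared register bank and private register banks can be sketched as follows. Everything in this model is an assumption made for illustration: the choice of bank 3 as the shared bank, the write-through on every shared register write, and the explicit `refresh_shared` call by which another processor picks up the new value are not specified in the text.

```python
class SharedMemory:
    """Toy external memory that counts how often it is accessed."""

    def __init__(self, size=16):
        self.data = [0] * size
        self.accesses = 0

    def store(self, index, value):
        self.accesses += 1
        self.data[index] = value

    def load(self, index):
        self.accesses += 1
        return self.data[index]


class Processor:
    SHARED_BANK = 3   # hypothetical: the bank that holds shared registers

    def __init__(self, mem, num_banks=4, regs_per_bank=16):
        self.mem = mem
        self.banks = [[0] * regs_per_bank for _ in range(num_banks)]

    def write(self, bank, reg, value):
        """Write a register; only shared-bank writes reach shared memory."""
        self.banks[bank][reg] = value
        if bank == self.SHARED_BANK:
            self.mem.store(reg, value)

    def refresh_shared(self, reg):
        """Load the current value of a shared register from shared memory."""
        self.banks[self.SHARED_BANK][reg] = self.mem.load(reg)


mem = SharedMemory()
p0, p1 = Processor(mem), Processor(mem)

p0.write(bank=0, reg=1, value=5)        # private work: no memory traffic
after_private = mem.accesses
p0.write(bank=3, reg=2, value=42)       # shared register rewritten
after_shared = mem.accesses
p1.refresh_shared(reg=2)                # second processor picks up the value
```

Operations on private banks never touch the shared memory in this model, so only the shared register traffic is exposed to its latency.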

Industrial applicability

With the processor configuration of the present invention, a large number of registers can be used while keeping the instruction length short. In addition, the stagnation of processing due to the memory latency can be prevented.
In the multiprocessor system including a plurality of processors according to the present invention, it is possible to realize a multiprocessor system in which operations in each processor are hardly affected by the latency due to external memory access while guaranteeing data coherency.

Claims

1. A processor having a multi-bank register, comprising:
an instruction decoder that decodes instructions including a register bank designation instruction and a normal instruction to generate control signals;
an arithmetic unit that performs arithmetic processing based on the control signals from the instruction decoder;
memory access means for inputting and outputting data to and from an external memory based on a signal from the arithmetic unit;
a plurality of register banks, each having registers accessible from the arithmetic unit and the memory access means; and
bank switch means, controlled by a register bank designation control signal generated by the instruction decoder based on the register bank designation instruction, for determining which register bank the arithmetic unit accesses and which register bank the memory access means accesses,
wherein the register bank designation instruction is an instruction that, by controlling the bank switch means, switches the register bank assigned to a register used by an operand so that a normal instruction after the register bank designation instruction is executed using that operand.

2. The processor according to claim 1, wherein the bank switch means is controlled by the register bank designation control signal so that the arithmetic unit accesses one of the register banks and the memory access means accesses another of the register banks, and
wherein the access to a register by the arithmetic unit and the access to a register by the memory access means can be performed simultaneously.

3. The processor according to claim 1 or 2, wherein the register bank designation instruction is a register bank setting modification instruction, and
the register bank setting modification instruction is an instruction that, by controlling the bank switch means, switches the register bank assigned to a register for an operand of the normal instruction immediately after the register bank setting modification instruction, so that this normal instruction is executed using that operand.

4. The processor according to claim 3, wherein, when the register bank setting modification instruction immediately precedes a normal instruction, the instruction length is extended beyond that of the normal instruction, and the register bank setting modification instruction and the immediately following normal instruction together form one instruction having the extended instruction length.

5. The processor according to claim 1 or 2, wherein the register bank designation instruction is a register bank setting instruction, and
in a normal instruction after the register bank setting instruction, the register bank assignment based on the register bank designation instruction is applied to the register used by the operand without requiring an explicit register bank designation.

6. A multiprocessor system comprising at least an instruction storage means, a first processor, a second processor, and an external memory,
wherein the first processor and the second processor each comprise:
an instruction decoder that decodes instructions from the instruction storage means, including a register bank designation instruction and a normal instruction, to generate control signals;
an arithmetic unit that performs arithmetic processing based on the control signals from the instruction decoder;
memory access means for inputting and outputting data to and from the external memory based on a signal from the arithmetic unit;
a plurality of register banks, each having registers accessible from the arithmetic unit and the memory access means; and
bank switch means, controlled by a register bank designation control signal generated by the instruction decoder based on the register bank designation instruction, for determining which register bank the arithmetic unit accesses and which register bank the memory access means accesses,
wherein the register bank designation instruction is an instruction that, by controlling the bank switch means, switches the register bank assigned to a register used by an operand so that a normal instruction after the register bank designation instruction is executed using that operand, and
wherein the external memory is connected so as to be accessible from both the memory access means of the first processor and the memory access means of the second processor.

7. The multiprocessor system according to claim 6, wherein the bank switch means is controlled by the register bank designation control signal so that the arithmetic unit accesses one of the register banks and the memory access means accesses another of the register banks, and
wherein access to a register by the arithmetic unit and access to a register by the memory access means can be performed simultaneously.

8. The multiprocessor system according to claim 6 or 7, wherein, in the external memory, at least part of the memory area accessed by the first processor and the memory area accessed by the second processor is shared.

9. A method for controlling a processor having a multi-bank register, the processor comprising an instruction decoder that decodes instructions including a register bank designation instruction and a normal instruction to generate control signals, an arithmetic unit that performs arithmetic processing based on the control signals from the instruction decoder, memory access means for inputting and outputting data to and from an external memory based on a signal from the arithmetic unit, a plurality of register banks each including registers accessible from the arithmetic unit and the memory access means, and bank switch means for determining which register bank the arithmetic unit and the memory access means access, the method comprising:
a step in which the instruction decoder generates a register bank designation control signal based on the register bank designation instruction;
a step of controlling the bank switch means with the register bank designation control signal to switch the register bank assigned to the register used by an operand of the normal instruction after the register bank designation instruction, thereby controlling which register bank the arithmetic unit accesses;
a step of controlling the bank switch means with the register bank designation control signal to switch the register bank assigned to the register used by the operand, thereby controlling which register bank the memory access means accesses; and
a step of executing the normal instruction using the operand.

10. The method for controlling a processor having a multi-bank register according to claim 9, wherein the step of executing the normal instruction using the operand comprises
a step in which an access by the arithmetic unit to a register in one register bank and an access by the memory access means to a register in another register bank are performed simultaneously.

11. The method for controlling a processor according to claim 9 or 10, wherein the register bank designation instruction is a register bank setting modification instruction, and
the register bank setting modification instruction is an instruction that, by controlling the bank switch means, switches the register bank assigned to a register for an operand of the normal instruction immediately after the register bank setting modification instruction, so that this normal instruction is executed using that operand.

12. The method for controlling a processor according to claim 11, wherein, when the register bank setting modification instruction immediately precedes a normal instruction, the instruction length is extended beyond that of the normal instruction, and the register bank setting modification instruction and the immediately following normal instruction together form a single instruction having the extended instruction length.

13. The method for controlling a processor according to claim 9 or 10, wherein the register bank designation instruction is a register bank setting instruction, and
in a normal instruction after the register bank designation instruction, the register bank assignment based on the register bank designation instruction is applied to the register used by the operand without requiring an explicit register bank designation.