JP2005165435A

JP2005165435A - Data transmission method

Info

Publication number: JP2005165435A
Application number: JP2003400391A
Authority: JP
Inventors: Kenji Ikeda; 顕士池田
Original assignee: IP Flex Inc
Current assignee: IP Flex Inc
Priority date: 2003-11-28
Filing date: 2003-11-28
Publication date: 2005-06-23
Anticipated expiration: 2023-11-28
Also published as: JP4359490B2

Abstract

PROBLEM TO BE SOLVED: To provide a data transmission method for transmitting configuration data, with a simple mechanism, to a plurality of processing elements (PE) formed of reconfigurable circuit areas. SOLUTION: Flip-flops (FF) 61, one for each of the PEs 21, are connected in series in advance to form a transfer path 51, and a plurality of data areas 75 are transferred between the FFs 61 sequentially and continuously. When a data area 75 relevant to a PE 21 arrives, the PE reads data from that data area 75 and/or writes data into it, so that data can be exchanged over the transfer path 51. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a method suitable for transferring data to, or between, a plurality of processing elements.

Japanese Patent Publication No. 10-505993 discloses several methods for supplying configuration data to an FPGA. When configuration data is supplied to the FPGA as serial data, in configuration mode the FPGA supplies a clock signal and selects a PROM, and the PROM is deselected once configuration of the FPGA is complete. When the FPGA is connected to a parallel ROM, the FPGA outputs address signals and receives data signals in configuration mode. The publication also describes using a microprocessor instead of the FPGA to address an EPROM and receive the configuration data in configuration mode. It further describes the memory side placing the FPGA in configuration mode and transmitting the configuration data, then stopping transmission when the FPGA indicates that configuration is complete.
Japanese National Patent Publication No. 10-505993

As an alternative to the FPGA (Field Programmable Gate Array), processors intended for dynamically reconfiguring circuits (dynamically reconfigurable processors) have been proposed. One example is International Publication WO02/095946. That publication describes a region called a matrix, in which a plurality of elements are arranged two-dimensionally. Switching the connections of the wiring groups laid out vertically and horizontally in the matrix lets the data flow (data path) formed by the elements be reconfigured flexibly. Dynamically reconfigurable architectures are not limited to this. Other configurations connect the elements in a tree, or connect adjacent elements to one another and use the elements themselves as communication paths.

When a circuit composed of a plurality of elements is to be reconfigured dynamically, it is important to supply each element, in a timely manner, with the configuration data that controls its function. A data transmission method that spends many cycles sending and receiving configuration data makes reconfiguration slow and is unsuited to a dynamically reconfigurable processor. A method that requires many steps is therefore not suitable for dynamically reconfigurable processors. Such steps include switching the circuit into configuration mode, outputting a clock or addresses, acquiring the configuration data, and stopping transmission once the configuration change is complete.

Instead of transferring serial data, data can be transferred in parallel over a bus to the elements that make up the circuit. This approach is attractive because it shortens data transfer time. However, when a large-scale circuit is implemented with a reconfigurable processor, the number of elements that must receive configuration data becomes enormous, and the bus width grows with it. The hardware resources for transferring configuration data therefore increase, and the processor becomes large and expensive. A shared bus alleviates the hardware-resource problem somewhat. However, each element must then be designated by address and the bus occupied during the transfer, so transfer time increases. Another alternative to a shared bus is wormhole routing, in which a data packet is divided into small pieces called flits and forwarded through a chain of elements. If the route to the destination element is not free, however, deadlock can occur, and data cannot always be transferred under stable conditions.

Furthermore, these data transmission methods require a handshake. The sending side announces to the receiving side that data is ready, and the receiving side announces to the sending side that it is ready to receive. This procedure costs processing time or hardware. When a large-scale reconfigurable circuit is to be implemented, the time or hardware spent on such pre- and post-processing for data transfer may no longer be negligible.

An object of the present invention is therefore to provide a transmission method and a data processing apparatus that can always transfer data stably with a simple configuration. Another object is to provide a transmission method and a data processing apparatus that eliminate the time or hardware otherwise required for pre- and post-processing of data transfers.

In the present invention, a register group includes a plurality of registers, each corresponding to one of a plurality of processing elements. These registers are connected in series in advance to form a shift-register-type transfer path. The present invention provides a data transmission method comprising two steps. In the first, the transfer path is used to transfer a plurality of data areas in order, and continuously, between the registers of the register group. In the second, an input/output step, a data area transferred to one register of the group may be usable by the processing element corresponding to that register; if so, that processing element reads data from the data area and/or writes data into it. The present invention also provides a data processing apparatus comprising a plurality of processing elements and a register group. The register group includes a plurality of registers, each corresponding to one of the processing elements; the registers are connected in series in advance so as to form a transfer path, and the register group transfers a plurality of data areas in order, and continuously, between them. In this apparatus, each processing element includes input/output means. When the data area transferred to the register corresponding to that processing element is usable by it, the input/output means reads data from the data area and/or writes data into it.

In this data transmission method and data processing apparatus, the transfer path is formed by a plurality of registers connected in series in advance. What is transferred is a data area rather than the data itself. In this specification, a data area means an area allocated for storing data. It may already hold data that is usable or meaningful to a processing element, such as configuration data. It may also hold meaningless dummy data, so that the area is reserved for future use. Because the transfer step moves data areas, there is no need to check whether data is ready or whether it is acceptable; each cycle, a data area simply moves from the connected upstream register to the downstream register. In addition, because the transfer path is formed in advance, no routing is needed. The time and hardware otherwise spent on routing, and on exchanging signals before and after each transfer, can therefore be eliminated.
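The circulating-data-area scheme above can be pictured with a small cycle-level simulation. This is a minimal Python sketch, not the publication's implementation. It assumes a closed path with one register per processing element and a hypothetical `DataArea` whose `dest` field is `None` for an unused (dummy) area. It also assumes that a processing element frees an area once it has read it. None of these field names or policies come from the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataArea:
    """One register-sized slot circulating on the transfer path (hypothetical fields)."""
    dest: Optional[int] = None  # None marks an unused (dummy) area
    payload: int = 0            # 32-bit word carried by the area


class TransferPath:
    """Closed shift-register transfer path with one register per processing element."""

    def __init__(self, n_pe: int):
        self.regs = [DataArea() for _ in range(n_pe)]
        self.received = {pe: [] for pe in range(n_pe)}

    def step(self, injected: Optional[DataArea] = None) -> DataArea:
        """Advance one clock: every register takes its upstream neighbour's area.

        There is no ready/acknowledge handshake; the shift always happens.
        The area leaving the last register is returned (the controller's view).
        If `injected` is given, it enters the head register instead of the
        recirculated tail area.
        """
        out = self.regs[-1]
        head = injected if injected is not None else out  # closed loop by default
        self.regs = [head] + self.regs[:-1]
        # Input/output step: each PE inspects only its own register.
        for pe, area in enumerate(self.regs):
            if area.dest == pe:
                self.received[pe].append(area.payload)
                self.regs[pe] = DataArea()  # assumed policy: free the area after reading
        return out
```

An area addressed to PE 2 and injected at the head becomes visible to that PE on the third clock, one cycle per register stage. No routing decision or handshake is involved.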

The registers forming the transfer path can also be placed together as one shift register and connected to the processing elements by suitable wiring. The simplest layout, and one with no wiring delay, connects registers placed inside processing elements that are adjacent in a suitable direction, forming a shift-register-type transfer path. This minimizes the wiring consumed by the transfer path. Even when the receiving point is far from the sending point, the latency of passing through the shift-register path must still be considered, but wiring delay no longer needs to be. When processing elements are physically far apart, a relay shift register can be inserted at any point, and doing so adds no control logic or other hardware.
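A short sketch makes the latency point concrete. Latency is simply the number of register stages between the two points, and a relay register adds exactly one cycle without any extra control. The stage labels and the `hop_latency` helper below are hypothetical, and the closed-path branch assumes the path wraps from its last register back to its first.

```python
def hop_latency(path: list[str], src: str, dst: str, closed: bool = True) -> int:
    """Cycles for a data area to move from register `src` to register `dst`.

    `path` lists register stages in shift order; relay registers are ordinary
    entries. Each stage costs one clock, and wiring delay is ignored.
    """
    i, j = path.index(src), path.index(dst)
    if j >= i:
        return j - i                  # downstream: one cycle per stage
    if not closed:
        raise ValueError("open path: data only flows downstream")
    return len(path) - i + j          # closed path: wrap around to reach upstream
```

Inserting a relay between `PE1` and `PE2` raises the `PE0`-to-`PE2` latency from 2 to 3 cycles. On a closed path, `PE2` can still reach `PE0` in one cycle by wrapping around.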

In the input/output step, each processing element looks at the data area transferred to its corresponding register. If that data area is usable by the processing element, the element reads data from it and/or writes data into it. Each processing element can thus receive data, such as messages addressed to it, from the transfer path. It can also place messages addressed to other processing elements, or to the control unit, onto the transfer path. Data can therefore be exchanged between processing elements, or between the control unit and the processing elements. Input and/or output processing in each processing element is independent, and no control signals between elements are needed.
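The independence of each element's input/output step can be shown with a minimal sketch. Each PE's `io` method sees only the area in its own register. It reads areas addressed to it and fills unused areas from a local outbox. The `Area` fields, the outbox, and the freeing policy are hypothetical choices made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Area:
    dest: Optional[int] = None  # None = unused slot
    src: Optional[int] = None
    payload: int = 0


class PE:
    def __init__(self, pe_id: int):
        self.pe_id = pe_id
        self.outbox = deque()  # (dest, payload) messages waiting for a free slot
        self.inbox = []        # (src, payload) messages received

    def io(self, area: Area) -> Area:
        """Input/output step that uses only this PE's own register contents."""
        if area.dest == self.pe_id:           # addressed to us: read it
            self.inbox.append((area.src, area.payload))
            area = Area()                     # slot becomes unused again
        if area.dest is None and self.outbox:  # unused slot: send a message
            dest, payload = self.outbox.popleft()
            area = Area(dest, self.pe_id, payload)
        return area


def run(pes: list, cycles: int) -> list:
    """Closed ring with one register per PE; unconditional one-stage shift per cycle."""
    regs = [Area() for _ in pes]
    for _ in range(cycles):
        regs = [regs[-1]] + regs[:-1]
        regs = [pe.io(a) for pe, a in zip(pes, regs)]
    return regs
```

For example, if PE 3 queues a message for PE 1, the message travels around the closed ring and lands in PE 1's inbox on the third cycle. No signal passes between the two elements other than the circulating area itself.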

The processing elements can control the properties of a data area, including whether it may be used. If a data area is addressed to a processing element, that element can download the data or message it carries. If the data area is unused, the element can use it to send its own data or message. Setting a priority yields a data transmission method in which an area already in use by another element can still be used to exchange a highly urgent message. A control unit connected to the transfer path can also determine the properties of the data areas. When the control unit is the source of data, it can send out data areas that designate the destination element. When one element transfers data to another, it can control the traffic on the transfer path by setting such properties in the data areas it sends.
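The publication does not say how priority is encoded or what happens to a displaced message. The sketch below therefore rests on labeled assumptions. A hypothetical `priority` field is carried in each area. A PE may claim an unused area, or preempt a busy one when its own message is strictly more urgent. Any preempted content is handed back to the caller, which is assumed to re-queue it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Area:
    dest: Optional[int] = None  # None = unused
    payload: int = 0
    priority: int = 0           # hypothetical encoding: higher = more urgent


def try_claim(area: Area, dest: int, payload: int, priority: int):
    """Decide whether a PE may place (dest, payload, priority) into `area`.

    Returns (new_area, displaced). `displaced` is the content pushed out by a
    higher-priority message, or None. Re-queuing displaced content is left to
    the caller (an assumption, not from the source).
    """
    if area.dest is None:               # unused area: always usable
        return Area(dest, payload, priority), None
    if priority > area.priority:        # urgent message preempts a busy area
        return Area(dest, payload, priority), area
    return area, None                   # otherwise leave the area untouched
```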

A closed (looped) transfer path is advantageous. On an open path, data can only be sent from upstream elements to downstream elements; on a closed path, data can also be transferred from downstream elements to upstream elements. When the control unit wants to obtain data from an element, it sends out a data area set up for use by that element and receives the element's response over the closed transfer path. For this purpose, the data area can carry data that specifies the operation to perform in the input/output step and the address that operation targets. To give a processing element enough time to process a data area, a transfer path can also assign several registers to one processing element.
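The read-back use of a closed path might be sketched as follows. The control unit injects a data area carrying an operation and an address. The target PE answers in the same slot, and the loop carries the reply back to the controller. The `READ`/`REPLY` opcodes, the `CONTROLLER` id, and the one-word-per-address memory are hypothetical. The sketch also uses one register per PE, not the multi-register variant.

```python
from dataclasses import dataclass
from typing import Optional

CONTROLLER = -1  # hypothetical id meaning "addressed to the control unit"


@dataclass
class Area:
    dest: Optional[int] = None  # None = unused slot
    op: str = ""                # hypothetical command field ("READ" or "REPLY")
    addr: int = 0               # address the command refers to
    data: int = 0


class PE:
    def __init__(self, pe_id: int, memory: list):
        self.pe_id = pe_id
        self.memory = memory

    def io(self, a: Area) -> Area:
        # Answer a READ addressed to this PE in the same slot, now addressed to the controller.
        if a.dest == self.pe_id and a.op == "READ":
            return Area(CONTROLLER, "REPLY", a.addr, self.memory[a.addr])
        return a


def request(pes: list, area: Area, max_cycles: int = 100):
    """Inject `area` into a closed path; return (cycle, reply) when the reply comes back."""
    regs = [Area() for _ in pes]          # one register per PE, initially unused
    pending = area
    for cycle in range(1, max_cycles + 1):
        tail = regs[-1]
        if tail.dest == CONTROLLER:       # the loop has brought the answer back
            return cycle, tail
        # The head register takes the new request once, otherwise the recirculated tail.
        head = pending if pending is not None else tail
        pending = None
        regs = [head] + regs[:-1]         # one-cycle shift, no handshake
        regs = [pe.io(a) for pe, a in zip(pes, regs)]
    raise TimeoutError("no reply within max_cycles")
```

In this model the reply always reaches the controller after one full lap plus one cycle, whichever PE answers. On a closed path with equal-length laps, the controller can therefore predict when each response will arrive.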

In principle, no register needs a capacity equal to or larger than the amount of data to be sent or received. When a unit of data is large, it can be transmitted in a plurality of consecutive data areas. Such a unit is one that each processing element processes independently in the input/output step. In other words, with the transfer path and data areas of the present invention, a data packet can be divided into register-sized pieces and transferred as a chain. When consecutive data areas are used, setting the destination in the first data area allows the following data areas to be managed. With this method, even a large data packet never has to be buffered in its entirety in order to be transferred. Adding more shift registers in series may increase latency, but it adds no buffering delay. Unlike a shared bus system, the sending and receiving sides do not need to buffer the whole packet first, so processing time falls accordingly.
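Splitting a packet across consecutive data areas, with the destination set only in the first one, can be sketched as below. The head-area layout (tag, destination, body length) is a hypothetical encoding. The receiver consumes one area per cycle, so the transfer path itself never holds more than one word per register.

```python
def to_areas(dest: int, words: list[int]) -> list[tuple]:
    """Split a packet into consecutive register-sized data areas.

    The head area names the destination and how many body areas follow
    (hypothetical layout); each body area carries one 32-bit word.
    """
    for w in words:
        if not 0 <= w < 2**32:
            raise ValueError("each body word must fit a 32-bit register")
    return [("HEAD", dest, len(words))] + [("BODY", w) for w in words]


class Receiver:
    """Collects the body areas that follow a head area addressed to this PE."""

    def __init__(self, pe_id: int):
        self.pe_id = pe_id
        self.remaining = 0
        self.words = []
        self.packets = []

    def on_area(self, area: tuple) -> None:
        if self.remaining:                        # inside a packet addressed to us
            self.words.append(area[1])
            self.remaining -= 1
            if self.remaining == 0:
                self.packets.append(self.words)
                self.words = []
        elif area[0] == "HEAD" and area[1] == self.pe_id:
            self.remaining = area[2]
            if self.remaining == 0:               # empty packet completes immediately
                self.packets.append([])
```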

In a data processing apparatus with many processing elements, however, latency may need to be reduced. In that case, a plurality of transfer paths can be provided and connected to the control unit through a distribution unit. Each data area is then routed to the transfer path containing the processing element that can use it. This distribution step reduces the number of shift-register stages on the path leading to the desired processing element, and therefore reduces latency.
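The latency benefit of several shorter paths can be estimated with a short sketch. The contiguous, even assignment of PEs to paths is purely an assumption for illustration; the actual segment sizes in the figures may differ. `distribute` stands in for the distribution unit.

```python
def split_paths(n_pe: int, n_paths: int) -> list[list[int]]:
    """Assign PE ids to transfer paths in contiguous blocks (hypothetical layout)."""
    size = -(-n_pe // n_paths)  # ceiling division
    return [list(range(i, min(i + size, n_pe))) for i in range(0, n_pe, size)]


def worst_case_stages(paths: list[list[int]]) -> int:
    """Register stages to the farthest PE, with one register per PE."""
    return max(len(p) for p in paths)


def distribute(dest_pe: int, paths: list[list[int]]) -> int:
    """Distribution unit: choose the path that contains the destination PE."""
    for idx, p in enumerate(paths):
        if dest_pe in p:
            return idx
    raise KeyError(dest_pe)
```

With 24 PEs, a single path has 24 stages to its farthest element. Four paths of 6 PEs each cut that to 6 stages, and a data area for PE 13 would be routed to path 2.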

Furthermore, if these transfer paths are closed paths with identical latency, the control unit can receive the data areas it sent out in turn through the distribution unit without any collisions. The present invention therefore provides a data transmission method that uses multiple transfer paths yet needs no special control.
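The no-collision argument can be checked with a few lines of arithmetic. If the controller sends one area per cycle and every closed path takes the same number of cycles to return it, arrivals come back one per cycle in send order. The helper names below are hypothetical.

```python
def arrival_cycles(send_order: list[tuple], latency: dict) -> dict:
    """send_order: (area_id, path_index) pairs, one area leaving the controller per cycle.
    latency: path_index -> round-trip cycles of that closed path."""
    return {area: t + latency[path] for t, (area, path) in enumerate(send_order)}


def has_collision(arrivals: dict) -> bool:
    """True if two areas would return to the controller in the same cycle."""
    times = list(arrivals.values())
    return len(times) != len(set(times))
```

With equal latencies, the arrival cycle is the send cycle plus a constant, so distinct send cycles always give distinct arrivals. Making one path two cycles longer is enough to produce a clash.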

A preferred application of the present invention is a data processing apparatus with a plurality of processing elements. Changing the functions of those elements reconfigures the circuit formed by connecting them. Each processing element includes a data path area whose function can be changed and a memory that stores multiple sets of configuration information for setting that data path area. Configuration data is stored in data areas that circulate around the transfer path like a merry-go-round and is delivered to the desired processing elements. This supplies configuration data in a timely manner with a simple mechanism. In the input/output step, data can be transferred from a data area to the memory, so the reconfigurable circuit area can be reconfigured. Data can also be sent to the control unit. The control unit sends out a data area for output, and data from the memory of the desired processing element is transferred into that data area.
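A PE that loads configuration sets from passing data areas, switches its data path between them, and dumps a stored set back to the controller might look like the sketch below. The tuple layout `(dest, op, slot, word)`, the opcodes `LOAD`/`SWITCH`/`DUMP`, and the slot count are hypothetical and not taken from the publication.

```python
EMPTY = (None, "", 0, 0)  # unused data area: (dest, op, slot, word)


class ReconfigurablePE:
    """Hypothetical PE whose data path is driven by one of several stored configurations."""

    def __init__(self, pe_id: int, n_slots: int = 4):
        self.pe_id = pe_id
        self.config_mem = [None] * n_slots  # stored configuration sets
        self.active = None                  # configuration currently driving the data path

    def io(self, area: tuple) -> tuple:
        dest, op, slot, word = area
        if dest != self.pe_id:
            return area                     # not for us: pass the area on unchanged
        if op == "LOAD":                    # data area -> configuration memory
            self.config_mem[slot] = word
            return EMPTY
        if op == "SWITCH":                  # reconfigure the data path from memory
            self.active = self.config_mem[slot]
            return EMPTY
        if op == "DUMP":                    # memory -> output data area for the controller
            return ("CTRL", "REPLY", slot, self.config_mem[slot])
        return area
```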

The information stored in the data areas need not be plaintext. An area for storing encrypted data can be provided, and the entire data area may also be encrypted. In the input/output step and by the input/output means, data in a data area can be read and decrypted, and encrypted data can be written into a data area.
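The source names no cipher, so the sketch below shows only where decryption and encryption sit in the input/output step. It uses a placeholder XOR pad derived from a shared key and the area index. This pad is NOT secure and merely stands in for a real cipher; the key, the index mixing, and the constant are all hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def keystream(key: int, index: int) -> int:
    """Placeholder per-area pad from a shared key and the area index.
    NOT a real cipher; it only marks where encryption would sit."""
    return (key * (index + 1) + 0x9E37_79B9) & MASK32


def encrypt_word(word: int, key: int, index: int) -> int:
    # Applied by the sending PE before writing into a data area.
    return (word ^ keystream(key, index)) & MASK32


def decrypt_word(word: int, key: int, index: int) -> int:
    # Applied by the receiving PE after reading a data area; XOR undoes itself.
    return encrypt_word(word, key, index)
```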

The present invention uses a transfer path in which registers corresponding to the processing elements are connected in series. Data areas usable by each processing element circulate around it like a merry-go-round, and data is exchanged between elements, and between elements and the control unit, through these areas. Because the data areas circulate through a shift-register-type transfer path and each element simply makes use of them, the data-transfer hardware is simple and no flow control is needed. Consider a data processing apparatus with many processing elements, such as a reconfigurable LSI. The hardware resources for supplying configuration information to each processing element individually can be reduced, and a compact, low-cost reconfigurable LSI can be provided.

FIG. 1 shows an example of a data processing apparatus. This data processing apparatus 1 is a processing unit (PU) implemented as a chip. It includes a reconfigurable area 19 and a fixed-configuration area. The fixed area supports peripheral functions, such as reconfiguring the reconfigurable area 19 and controlling input and output.

In FIG. 1, the reconfigurable circuit area 19 is drawn divided into segments 10 to 15. As FIG. 2 shows, however, it is a structure called a matrix, in which a plurality of elements are arranged two-dimensionally in an array. The matrix 19 of this example includes three parts: processing elements (PE) 21 arranged in rows and columns, wiring 22 laid out in a grid between them, and switching units 23. Each switching unit 23 can freely switch the connections between the row and column wiring 22 at a connection point. A PE 21 may be one whose function can be set freely, for example by a lookup table. In this example, however, the elements are divided to some extent into functional groups. These include arithmetic and logic elements, delay elements, memory elements, elements that generate addresses for data input or output, and data input/output elements. Each group uses an internal structure suited to its kind of processing, which improves the space efficiency of the matrix 19. Arranging the elements by functional group also reduces redundancy, which further improves AC characteristics and processing speed.

FIGS. 3 and 4 show examples of the PE 21. A PE 21 includes an internal data path area 29 whose function can be changed and a setting unit 60 that sets that function. The internal data path area 29a of the PE 21a shown in FIG. 3 includes a selector SEL and an address generation circuit 28 made up of counters and the like. An address generated under the conditions set by the setting unit 60 is output to the wiring 22 as output signal do. The output signal do is fed back to the PE 21a over the row and column wiring as input signal dix or diy, either directly or after processing by another PE 21. The address selected by the selector SEL, under the conditions set by the setting unit 60, is then output from the matrix 19 as an address for data input or output. The PE 21a also includes selectors (not shown) that pick input data from any of the wirings 22 and output the output data; the setting unit 60 sets these as well.
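A counter-based address generator like circuit 28 might be modeled as a generator over a start address, a stride, and a count. These are plausible settings for the setting unit 60 to load, but the parameter names and the wrap-around width are assumptions. The publication says only that the circuit is built from counters and the like.

```python
from typing import Iterator


def address_generator(start: int, stride: int, count: int, wrap: int = 2**32) -> Iterator[int]:
    """Counter-based address sequence with hypothetical settings (start, stride, count).

    Addresses wrap modulo `wrap`, as a fixed-width hardware counter would.
    """
    addr = start
    for _ in range(count):
        yield addr
        addr = (addr + stride) % wrap
```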

The PE 21b shown in FIG. 4 is configured for arithmetic and logic operations. Its internal data path section 29b includes a shift circuit SHIFT, a mask circuit MASK, and an arithmetic logic unit ALU. As with the PE 21a, the setting unit 60 sets the states of SHIFT, MASK, and ALU. The PE 21b can therefore add, subtract, or compare the input data dix and diy, or compute their logical OR or AND. The result is output to the wiring (bus) 22 as output signal do.
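How configuration selects the behavior of PE 21b can be sketched as a pure function of the two inputs and a configuration record. The `AluConfig` fields are a hypothetical encoding, as is the order of applying the shift and then the mask to `dix`. The publication names the SHIFT, MASK, and ALU blocks but not their wiring.

```python
from dataclasses import dataclass

MASK32 = 0xFFFF_FFFF


@dataclass
class AluConfig:
    """Hypothetical encoding of what setting unit 60 writes into PE 21b."""
    shift: int = 0       # left shift of dix (negative = right shift)
    mask: int = MASK32   # AND-mask applied to the shifted dix
    op: str = "add"      # one of: add, sub, lt, or, and


def pe21b(dix: int, diy: int, cfg: AluConfig) -> int:
    """Compute output do from inputs dix and diy under configuration cfg (32-bit)."""
    x = (dix << cfg.shift if cfg.shift >= 0 else dix >> -cfg.shift) & MASK32
    x &= cfg.mask
    y = diy & MASK32
    if cfg.op == "add":
        return (x + y) & MASK32
    if cfg.op == "sub":
        return (x - y) & MASK32  # wraps like a 32-bit subtractor
    if cfg.op == "lt":
        return int(x < y)        # comparison result as 0 or 1
    if cfg.op == "or":
        return x | y
    if cfg.op == "and":
        return x & y
    raise ValueError(f"unknown op {cfg.op!r}")
```

Reconfiguring the element then means writing a different `AluConfig`. The data path function changes while the inputs and outputs stay on the same wiring.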

As shown in FIGS. 1 and 2, the matrix 19 of the PU 1 includes 368 PEs 21. It is nominally divided into six segments 10 to 15 so that separate transfer paths can supply configuration data to them. These segments do not, however, group the PEs 21 when data flows are built from multiple PEs 21 to process input data. The PEs 21 can be connected flexibly through the wiring group 22, and data flows (data paths) that span multiple segments can be configured freely.

The matrix 19 provides two kinds of interface for inputting and outputting data processed by data paths built from multiple PEs 21. The first consists of direct inputs 31a to 31c and direct outputs 32a to 32c, through which data can be input to and output from the PEs 21 directly from outside. Connecting several PUs 1 through the direct inputs 31a to 31c and direct outputs 32a to 32c further increases the number of PEs 21 available to form a data path. An application that needs more circuit elements than a single chip (PU) 1 provides can therefore be handled by linking several chips 1.

As a second input method, the PU 1 can supply data to the matrix 19 through an input buffer 33 and an output buffer 34. The input buffer 33 has four input elements LDB, and configuration data sets how the buffer is structured and controlled. The output buffer 34 likewise has four output elements STB, whose structure and control are also set by configuration data.

Configuration data for the matrix 19 and the input/output buffers 33 and 34 is supplied from the RISC 35, or from another PU 1, through a shift-register-type data transmission mechanism 50. In this example, the mechanism 50 includes a transfer control unit (TCU) 59 and transfer paths 51 to 58, which are routed through the segments 10 to 15, the input buffer 33, and the output buffer 34. The TCU 59 is connected to a bus switching unit (bus interface, BSU) 36, and the RISC 35 supplies configuration data to the TCU 59 via the BSU 36.

As shown in FIG. 1, several components and interfaces are connected to the BSU 36, and any of them, not only the RISC 35, can send configuration data into the matrix 19. An SDRAM interface 37 is connected to the BSU 36, so configuration data can come from external memory. A PCI bus interface 38 is also connected, so an external processor on the PCI bus can supply configuration data. A DMAC 39 is connected to the BSU 36 as well and can control the supply of configuration data in place of the RISC 35. A general-purpose interface 40 is connected to the BSU 36 via a bus bridge circuit 41; an example is a universal asynchronous receiver-transmitter (UART) serving as a serial interface controller. Finally, the input buffer 33 and output buffer 34 of the matrix 19 are connected to the BSU 36, so data can be input to and output from the matrix 19 through the interfaces described above.

FIG. 5 shows the routing of the transfer paths 51 to 58 of the data transmission mechanism 50 in more detail. FIG. 6 shows how each element inputs and outputs data through a transfer path. Each of the transfer paths 51 to 56 is wiring that connects in series the one-word (32-bit) registers (flip-flops) provided in the PEs 21 of the corresponding segment 10 to 15 shown in FIG. 5. The transfer paths 51 to 56 are thus formed mainly by chaining the registers provided for the individual PEs 21. Independent registers can also be inserted between, before, or after the PE registers. This is useful when a transfer path becomes long or its latency needs adjusting.

Referring to FIG. 6, the structure of each PE 21 is described using the transfer path 51 as an example. The setting unit 60 of the PE 21 includes a 32-bit register (FF) 61, which is connected by the 32-bit transfer path 51 to the FFs 61 of the preceding and following PEs 21. On the transfer path 51, therefore, a one-word data item 75 held in the FF 61 of one PE 21 is passed to the FF 61 of the next PE 21 with a delay of one clock, that is, one cycle.
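As an illustration only, the one-cycle-per-stage behaviour of a transfer path can be modeled in Python as a list of 32-bit registers that all shift on every clock. The class name `ShiftRingPath` and its `clock` method are hypothetical; the sketch ignores the setting unit logic and shows only that a word entering the first FF leaves the last FF after as many cycles as there are stages.

```python
class ShiftRingPath:
    """Illustrative model of one transfer path: a chain of 32-bit FFs,
    one per PE, that shifts every clock with no stall or wait control."""

    def __init__(self, num_stages: int):
        # stages[i] is the word latched in the i-th FF; None means an empty slot
        self.stages = [None] * num_stages

    def clock(self, word_in=None):
        """Advance one cycle and return the word leaving the last FF
        (on the closed-loop paths of PU 1 it heads back to the TCU)."""
        if word_in is not None:
            word_in &= 0xFFFFFFFF  # each data area is one 32-bit word
        word_out = self.stages[-1]
        # every FF takes its upstream neighbour's word; nothing is held back
        self.stages = [word_in] + self.stages[:-1]
        return word_out


path = ShiftRingPath(4)
outputs = [path.clock(w) for w in [1, 2, 3, 4, None, None, None, None]]
print(outputs)  # [None, None, None, None, 1, 2, 3, 4]
```

Because there is no handshake, the model has no way to pause a word: each call to `clock` moves every occupied slot exactly one stage.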

The setting unit 60 further includes a decoder 62 that decodes the data 75 held in the FF 61, a background operation unit 63 that operates in the background to store data 75, and a foreground operation unit 64 that holds the configuration data setting up the local data path area 29. The background operation unit 63 comprises a three-bank background memory 65; a selector 66 that routes the data 75 in the FF 61 either to one of the banks of the background memory 65 or to a line that outputs it directly; and a selector 67 that selects a bank of the memory 65 and outputs its data. The foreground operation unit 64 comprises a foreground memory 68, which holds the setting data supplied to the data path area 29 and thereby maintains that area's current configuration, and a selector 69 that selects data from either the background memory 65 or the FF 61 of the transfer path 51 and supplies it to the foreground memory 68. The selectors 67 and 69, which choose the configuration (setting) data to load into the foreground memory 68, are controlled through second selectors 72 and 70 that choose their selection signals; control can therefore come either from data set in the FF 61 via the transfer path 51 or from configuration data held in the foreground memory 68. Alternatively, the selectors 67 and 69 may be controlled by signals supplied directly from the RISC 35.

The setting unit 60 also includes an output selector 71 that chooses the data 75 to pass to the FF 61 of the downstream PE 21: the data in its own FF 61, data from a bank of the background memory 65, or data from the foreground memory 68. The decoder 62 analyzes the data 75 transferred into its own FF 61 and switches the output selector 71 accordingly, thereby selecting the data 75 forwarded to the FF 61 of the downstream PE 21.

FIG. 7 outlines the processing performed by the decoder 62 to implement the data transmission method over the transfer path 51. In step 81, the decoder determines whether the data 75 transferred to the FF 61 is addressed to its own PE 21. If not, the selector 71 is set to pass-through in step 87. If the data 75 is addressed to its own PE 21 and is determined in step 82 to be control data, then in step 83 the decoder 62 switches the selectors associated with the foreground memory 68 and updates the contents of the foreground memory 68, which changes the configuration of the local data path area 29. In step 87, the selector 71 remains set to pass-through.

If the data 75 transferred to the FF 61 is not control data, it is either data to be stored in the PE 21 or a data area reserved with dummy data into which the PE 21 is to write its own data. Accordingly, if step 84 determines that the data 75 in the FF 61 is to be stored in the background memory 65 or the foreground memory 68, then in step 85 the data read from the FF 61 is stored in the designated memory. In step 87, the selector 71 remains set to pass-through.

If, on the other hand, step 84 determines that the data 75 in the FF 61 is dummy data, the selector 71 is switched in step 86 so that the contents of the background memory 65 or the foreground memory 68 are transferred to the FF 61 of the downstream PE 21. The contents of the foreground memory 68 reflect the processing state of the PE 21 and are used, for example by a debug unit or the RISC 35, to check that state and to determine whether an error has occurred. The contents of the background memory 65 describe the function assigned to the PE 21 and are used, for example, to transfer that configuration data to a substitute PE 21 when the PE 21 encounters an error or when the data flow containing it must be reconfigured.
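The decision flow of FIG. 7 (steps 81 to 87) can be sketched as a per-slot function. This is a simplified model under stated assumptions: the names `DataArea`, `SettingUnit`, and the memory keys `fg` and `bg0` to `bg2` are hypothetical, and each slot carries its owner and kind explicitly, whereas in the PU 1 the decoder 62 derives them from the header 76.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataArea:
    # Simplification: in PU 1 the owner, kind and target are derived from
    # header 76; here each one-word slot carries them explicitly.
    owner: tuple        # (segment, x, y) of the PE that owns this slot
    kind: str           # "control", "write" or "dummy"
    target: str         # "fg" or "bg0".."bg2" (hypothetical memory names)
    value: int = 0


class SettingUnit:
    """Hypothetical model of decoder 62 driving output selector 71."""

    def __init__(self, address):
        self.address = address
        self.memory = {"fg": 0, "bg0": 0, "bg1": 0, "bg2": 0}

    def decode(self, area: DataArea) -> DataArea:
        """Apply the Fig. 7 flow to one slot; return what goes downstream."""
        if area.owner != self.address:             # step 81: not ours
            return area                            # step 87: pass through
        if area.kind == "control":                 # step 82
            self.memory["fg"] = area.value         # step 83: reconfigure
            return area                            # step 87
        if area.kind == "write":                   # step 84: data to store
            self.memory[area.target] = area.value  # step 85
            return area                            # step 87
        # step 86: dummy slot, selector 71 substitutes the memory contents
        return replace(area, value=self.memory[area.target])


pe = SettingUnit((0, 1, 2))
pe.decode(DataArea((0, 1, 2), "write", "bg1", 0xABCD))
reply = pe.decode(DataArea((0, 1, 2), "dummy", "bg1"))
print(hex(reply.value))  # 0xabcd
```

Returning a new frozen `DataArea` for step 86 mirrors the fact that only the outgoing word changes; the slot itself keeps moving down the path.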

In the data transmission mechanism 50 of this example, the transfer path 51 formed by connecting the FFs 61 of the PEs 21 is a shift-register transfer path: data 75 moves from an upstream FF 61 to the downstream FF 61 every clock, or cycle. There is no control mechanism for stopping or stalling the transfer of data 75 between the FFs 61; the 32-bit data items 75 are transferred through the chain of FFs continuously and in order. The data 75 passing between the FFs 61 is, in principle, not modified; only when a read is requested is the data 75 forwarded to the downstream FF 61 replaced with the output of the PE 21's memory. Each word-sized data item 75 moving between the FFs 61 is thus a data area opened, or assigned, for the exclusive use of a particular PE 21. It can be read and written, but the data areas carried by the FFs 61 never disappear, and the transfer path 51 is never freed from carrying them.

A data area may be dedicated to one specific PE 21, or it may be shared by several PEs 21, each of which owns one of the FFs 61 connected by the transfer path 51. Data or messages common to several PEs 21 can therefore be distributed over the transfer path 51. If the PEs 21 on the transfer path 51 were able to change properties of a data area 75, such as its destination (owner) and whether it is read or written, a data area 75 that has served its purpose could be re-labeled and reused for another purpose. In this example, to keep the PE 21 simple, the properties of the data areas 75 are managed centrally by the transmission control unit (TCU) 59.

FIG. 5 also shows the general structure of the TCU 59. The TCU 59 includes a data area management unit 91 and a delivery unit 92, which distributes data areas 75 with the conditions set by the data area management unit 91 to the transfer paths 51 to 58 and collects them back. The data area management unit 91 includes a bus control unit 95 that manages data exchange with the BSU 36, a transmission unit 93, and a reception unit 94. The transmission unit 93 includes a buffer 96, an encryption processing circuit 97, and a parameter setting circuit 98. The encryption processing circuit 97 encrypts data to be stored in a data area 75 when encryption is required and, conversely, decrypts encrypted data supplied from the BSU 36 when decryption is required. The parameter setting circuit 98 attaches a header to the data areas 75 to be sent and thereby sets their properties.

In the PU 1, every transfer path 51 to 58 is a closed loop, so a data area 75 sent out by the TCU 59 returns to the TCU 59. The reception unit 94 of the TCU 59 accordingly includes a read FIFO 99 that receives the data areas 75 returning from the transfer paths 51 to 58, and a selector 105 that, based on the data contained in a received data area 75, either outputs that data area 75 to the bus control unit 95 or hands it to the transmission unit 93 so that it is supplied to another PE 21 over the same or a different transfer path 51 to 58.
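The routing role of the selector 105 might be modeled as below. The source only says the choice depends on data in the returned data area; the `forward_to` header key used here to trigger re-sending is a hypothetical stand-in, as are the class and attribute names.

```python
class ReceiveUnit:
    """Hypothetical model of reception unit 94 with selector 105."""

    def __init__(self):
        self.to_bus = []          # toward bus control unit 95 (and the RISC 35)
        self.to_transmitter = []  # back to transmission unit 93 for re-sending
        self._sink = self.to_bus

    def accept(self, area):
        # Header areas (dicts here) decide where the whole packet goes;
        # the payload words that follow inherit that decision.
        if isinstance(area, dict):
            forward_to = area.get("forward_to")
            if forward_to is None:
                self._sink = self.to_bus
            else:
                self._sink = self.to_transmitter
                # re-address the packet to the new destination PE
                area = {**area, "dest": forward_to, "forward_to": None}
        self._sink.append(area)


rx = ReceiveUnit()
for a in [{"dest": (0, 1, 1)}, 5, {"dest": (0, 1, 1), "forward_to": (3, 0, 2)}, 9]:
    rx.accept(a)
```

After this loop, the first packet (header plus word 5) sits in `to_bus`, while the second has been re-addressed to PE (3, 0, 2) and queued for the transmitter together with its payload word 9.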

The delivery unit 92 includes a selector 101 that distributes each data area 75 output by the transmission unit 93 to whichever of the transfer paths 51 to 56 serves the destination PE 21. Each PE 21 belongs to one of the segments 10 to 15, and the data areas 75 (at least the data area holding the header at the start of a data unit) carry segment information, so the transfer path can be selected from that information. In the PU 1, the input buffer 33 and output buffer 34 are also controlled through the data transmission mechanism 50, and transfer paths 57 and 58 connecting their constituent elements are provided for this purpose. The delivery unit 92 further includes a selector 102 that gathers the data areas 75 returning from the transfer paths 51 to 58 and supplies them to the reception unit 94. Because the transfer paths 51 to 58 of the PU 1 are designed to have identical latency, the collecting selector 102 does not need to choose among them: data areas 75 that have completed a circuit of any transfer path reach the reception unit 94 without colliding.

Roughly equal numbers of PEs 21 are placed in the segments 10 to 15. The FFs (registers) 61 belonging to the PEs 21 of each segment form a register group, and the transfer paths 51 to 56 passing serially through these groups are arranged to have the same latency; that is, every shift-register transfer path 51 to 56 contains the same number of FFs 61. The number of FFs in the buffer transfer paths 57 and 58 is likewise chosen so that their latency matches that of the segment paths. Data areas 75 sent out in sequence by the transmission unit 93 therefore travel through the transfer paths 51 to 58 without changing order and arrive at the collecting selector 102 in the order in which they were sent. When the reception unit 94 passes the data from the data areas 75 in arrival order to the RISC 35 via the bus control unit 95 and the BSU 36, the RISC 35 receives the data in the same order in which the transmission unit 93 sent it. The RISC 35 thus need not know in which segment, or where within it, each PE 21 is physically located: simply by attaching information identifying the segment and the address within the segment, it can supply data to, or obtain data from, any desired PE 21.
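The ordering argument can be checked with a small model in which each path simply delays a data area by its number of FFs. The function name and the `(send_cycle, path, tag)` schedule format are illustrative assumptions; the point is that with equal depths and at most one send per cycle, arrival cycles are distinct and preserve send order, whereas unequal depths can reorder or collide.

```python
def merge_at_selector_102(path_depths, schedule):
    """path_depths[p]: number of FFs on path p.
    schedule: (send_cycle, path, tag) tuples, at most one per cycle, as
    issued by transmission unit 93. Returns tags in arrival order and
    raises if two data areas would reach selector 102 in the same cycle."""
    arrivals = {}
    for cycle, path, tag in schedule:
        t = cycle + path_depths[path]
        if t in arrivals:
            raise RuntimeError(f"collision at cycle {t}: {arrivals[t]} and {tag}")
        arrivals[t] = tag
    return [arrivals[t] for t in sorted(arrivals)]


equal = [8] * 6  # six segment paths of identical depth
print(merge_at_selector_102(equal, [(0, 3, "a"), (1, 0, "b"), (2, 5, "c"), (3, 3, "d")]))
# ['a', 'b', 'c', 'd']
```

With depths such as `[8, 3]`, a word sent on the short path one cycle later overtakes the first one, which is exactly what matching the FF counts avoids.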

The data sent to a PE 21 is not limited to one word; in many cases it is several words or more. A data unit on which a PE performs an independent operation, that is, a read or write, can therefore rarely be carried in a single data area 75 and usually consumes several. A header could be placed in each of these data areas 75, but that would waste space on headers. In the PU 1 of this example, therefore, a data unit processed independently by a PE 21 is transmitted in a run of consecutive data areas 75, and management information such as parameters is packed as far as possible into the header in the first data area 75, improving transfer efficiency. A packet longer than one word is thus split into several data areas that pass through the FFs 61 of the transfer path 51 one after another, like beads on a string.
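Assuming, for illustration, a one-word header and a data unit of n payload words, the saving from placing the header only in the first data area can be written as a payload efficiency. The per-word alternative is a hypothetical comparison in which every payload word is preceded by its own one-word header.

```latex
\eta_{\text{single}} = \frac{n}{n+1},
\qquad
\eta_{\text{per-word}} = \frac{n}{2n} = \frac{1}{2}
```

For example, a 16-word configuration block occupies 17 data areas with a single leading header (about 94% payload), compared with 32 data areas under the per-word scheme (50% payload).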

FIG. 8(a) shows the structure of the header. In this example the header 76 occupies one word, so the first data area 75 sent along the transfer path 51 holds the header 76. Field 76a is the segment number; the delivery unit 92 of the TCU 59 uses its value to decide which of the transfer paths 51 to 58 receives the data areas. Fields 76b and 76c give the X and Y positions, within the segment, of the destination PE 21. Fields 76a to 76c thus form the address identifying the PE 21 that owns the data areas 75, and the setting unit 60 of each PE 21 decodes them to determine whether the data areas are addressed to it. Field 76d distinguishes write access from read access. For a write, the following data areas 75 hold the data to be written into the memory of the PE 21. For a read, the following data areas 75 hold dummy data; the PE 21 outputs the contents of the specified memory in place of the dummy data and sends them on to the FF 61 of the next PE 21.

Field 76e indicates that the data is carried in several data areas 75, and field 76f gives the number of data areas 75 holding data that follow the header's data area. As shown in FIG. 8(b), the PE 21 designated by the header therefore treats the data areas 75 that arrive at its FF 61 immediately after the header, carrying data 77 (for example configuration data), as its own for the number specified in the header, and inputs or outputs data through those data areas. Field 76g indicates that several data areas 75, rather than one, follow the header as a burst transfer. The background memory 65 and foreground memory 68 of the PE 21 support continuous input and output by burst access. A burst can be handled by specifying the burst length or by using a fixed burst length. Alternatively, instead of specifying a count, data can be stored successively starting from a specified address in the memory of the PE 21, with the burst ending when a preset address has been written; reads work the same way. This scheme specifies both the address and the amount of data to transfer at once, reducing the size of the header.

Field 76h gives the bank number at which each PE 21 stores or reads data. For a write access, the data 77 carried by the data areas 75 is written into the specified bank of the background memory 65 or foreground memory 68. For a read access, the dummy data sent to reserve the data areas 75 is replaced with data from the specified bank and forwarded to the next FF 61.

Field 76i selects encryption or no encryption; this bit indicates whether the packet is encrypted. When encrypted data has been sent, it is also output in encrypted form when it is read back. Which data is encrypted can be set freely, per chip, per segment, per element, or even per bank. Field 76j is used to specify a CRC, which improves the reliability of the data or messages stored in the data areas 75.
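The source names the header fields but gives no bit widths, so the layout below is purely hypothetical: it simply shows that fields 76a to 76j can be packed into and recovered from one 32-bit data area. The widths (for example 3 bits of segment number, enough to select among the eight paths 51 to 58) and the function names are assumptions.

```python
# Hypothetical layout, least significant field first; widths are assumptions.
FIELDS = [
    ("segment", 3),    # 76a: selects the transfer path in delivery unit 92
    ("x", 4),          # 76b: X position of the destination PE in its segment
    ("y", 4),          # 76c: Y position
    ("read", 1),       # 76d: 1 = read access, 0 = write access
    ("multi", 1),      # 76e: payload spans several data areas
    ("count", 8),      # 76f: number of data areas following the header
    ("burst", 1),      # 76g: burst transfer
    ("bank", 2),       # 76h: memory bank
    ("encrypted", 1),  # 76i
    ("crc", 1),        # 76j
]
assert sum(width for _, width in FIELDS) <= 32  # must fit in one data area


def pack_header(**values) -> int:
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word, shift = 0, 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word |= v << shift
        shift += width
    return word


def unpack_header(word: int) -> dict:
    fields, shift = {}, 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


h = pack_header(segment=2, x=5, y=7, read=1, multi=1, count=4, burst=1, bank=2)
print(unpack_header(h)["count"])  # 4
```

A fixed table like `FIELDS` keeps the encoder and the decoder (the role played by decoder 62) in agreement, which matters more than any particular choice of widths.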

The header 76 can also be configured in other ways. For example, a single header can function on its own as a control packet. Such a packet can be used for broadcasting data or messages to every PE 21 that the data area 75 passes through. It is well suited to per-segment control, for example powering down a segment as a unit. Control data can also be used to configure the transmission unit 93 of the TCU 59: if the RISC 35 outputs a control-packet header by the same procedure it uses for a PE 21, the control packet passes through the TCU 59 and can therefore configure the TCU 59 as well. For instance, by setting a bank number in the parameter setting circuit 98, all data carried by subsequent data areas 75 can be directed to the same bank.

FIG. 9 shows examples of packets that can be sent to a PE 21 over the transfer path 51. FIG. 9(a) is a control packet: a header 76 stored in a data area 75 and sent alone controls the PE 21 specified by that header. FIG. 9(b) is a single write access packet, in which a data area 75 holding the header 76 is followed through the FFs 61 of the transfer path 51 by one data area 75 holding write data. FIG. 9(c) is a burst write access packet, in which the data area 75 holding the header 76 is followed along the transfer path 51 by several data areas 75 holding write data.

FIG. 9(d) is a single read access request packet: the data area 75 holding the header 76 is followed along the transfer path 51 by a data area 75 holding dummy data. As shown in FIG. 9(e), a single read access response packet then returns over the closed-loop transfer path 51; it consists of the data area 75 holding the header 76 followed by the data area 75 whose dummy data the target PE 21 has replaced with read data 78. FIG. 9(f) is a burst read access request packet, in which the data area 75 holding the header 76 is followed by several data areas 75 holding dummy data. The burst read response packet shown in FIG. 9(g) then returns over the closed-loop transfer path 51: the data area 75 holding the header 76, followed by several data areas 75 whose dummy data the target PE 21 has replaced with read data 78.
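The packet shapes of FIG. 9 can be expressed as lists of one-word data areas. In this sketch the header is a Python dict standing in for the encoded header 76, `DUMMY` is an assumed filler value, and all function names are hypothetical.

```python
DUMMY = 0  # hypothetical filler word that reserves a read slot


def make_header(dest, *, read=False, count=0, bank=0):
    """Stand-in for header 76 (address 76a-76c, 76d, 76f, 76g, 76h)."""
    return {"dest": dest, "read": read, "count": count,
            "burst": count > 1, "bank": bank}


def control_packet(dest, **opts):
    # Fig. 9(a): a header on its own
    return [make_header(dest, **opts)]


def write_packet(dest, data, bank=0):
    # Fig. 9(b) for one word, Fig. 9(c) for a burst
    return [make_header(dest, count=len(data), bank=bank), *data]


def read_request(dest, n_words, bank=0):
    # Fig. 9(d) / 9(f): header followed by dummy slots reserving the reply
    return [make_header(dest, read=True, count=n_words, bank=bank)] + [DUMMY] * n_words


def read_response(request, read_data):
    # Fig. 9(e) / 9(g): what returns once the PE has replaced every dummy
    header, dummies = request[0], request[1:]
    if len(read_data) != len(dummies):
        raise ValueError("the PE fills exactly the reserved slots")
    return [header, *read_data]


req = read_request((1, 2, 3), 2)
print(read_response(req, [7, 8])[1:])  # [7, 8]
```

Note that a request and its response have the same length: the read reply travels in slots reserved by the request, so the path never carries more or fewer words than were sent.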

If the RISC 35 is the control unit that manages communication with the PEs 21 of the matrix 19, it sends the data to be written into a destination PE 21 to the TCU 59 as a packet with header data identifying that PE 21. The TCU 59 splits the data into data areas 75 and sends them out on a transfer path, for example the transfer path 51. When the PE 21 receives the data areas 75 addressed to it, it writes the data they contain into its memory. The RISC 35 can therefore write data into any desired PE 21 without knowing the PE's physical location.

Likewise, when the RISC 35 wants to read data from a given PE 21, it sends the TCU 59 a packet containing header data identifying that PE 21 and information giving the amount of data to be read. The TCU 59 reserves, with dummy data, enough data areas 75 to hold that amount and sends them out on the transfer path 51. When the PE 21 receives the data areas 75 addressed to it, it sends the requested data onto the transfer path 51 in place of the dummy data. Data can thus be supplied from the local side to the RISC 35 over the transfer path 51 using the time slots allocated in advance to that PE 21.
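Putting the pieces together, a write or read round trip can be simulated on a closed path of PEs. This is an illustrative model under simplifying assumptions, not the circuit itself: a header is represented as a dict with hypothetical keys `dest`, `op`, `count`, and `addr` (the last standing in for the start address of a burst), each PE holds a single memory bank, and a PE claims exactly `count` slots after seeing its own header.

```python
from dataclasses import dataclass

DUMMY = 0  # hypothetical filler word reserving a read slot


@dataclass
class PE:
    address: tuple
    memory: list          # one bank of PE memory, simplified to a list
    owned: int = 0        # slots still owned after our header went past
    op: str = ""
    ptr: int = 0

    def forward(self, word):
        """Inspect the word in this PE's FF; return what selector 71 sends on."""
        if self.owned:                          # a slot reserved for this PE
            self.owned -= 1
            if self.op == "read":
                word = self.memory[self.ptr]    # replace the dummy
            else:
                self.memory[self.ptr] = word    # store write data, pass it on
            self.ptr += 1
        elif isinstance(word, dict) and word["dest"] == self.address:
            self.owned, self.op, self.ptr = word["count"], word["op"], word["addr"]
        return word


def circulate(pes, packet):
    """Clock a packet once around a closed path of PEs (one FF each) and
    return the words arriving back at the TCU, in order."""
    stages = [None] * len(pes)
    returned = []
    for cycle in range(len(packet) + len(pes)):
        word_in = packet[cycle] if cycle < len(packet) else None
        outs = [pe.forward(w) if w is not None else None
                for pe, w in zip(pes, stages)]
        if outs[-1] is not None:
            returned.append(outs[-1])
        stages = [word_in] + outs[:-1]  # shift: no stalls, one word per cycle
    return returned


pes = [PE((0, i, 0), [10 * i + j for j in range(4)]) for i in range(3)]
req = [{"dest": (0, 1, 0), "op": "read", "count": 2, "addr": 1}, DUMMY, DUMMY]
print(circulate(pes, req)[1:])  # [11, 12]
```

The RISC-side code never mentions where PE (0, 1, 0) sits on the path; the reply simply comes back in the slots that were reserved for it.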

In this shift-register data transmission mechanism 50, the data areas 75 used by the PEs 21 on a transfer path are reserved in advance, so the amount of data moving along the path neither grows nor shrinks while data is sent and received, and input/output for different PEs 21 never conflict. Data areas 75 can therefore be transferred continuously, cycle by cycle, between the FFs 61 forming the path, with no need ever to halt or stall them. No hardware or software is needed to manage the timing of data input and output during transmission, and data can be written to and read from any desired PE 21 with an extremely simple structure. Each data area 75 moving through the FFs of a transfer path can be treated like a one-word window: the RISC 35 puts data onto the path by writing packets into these windows. At a PE 21, the window of a data area 75 addressed to that PE is open, so data can be read from and written into it; data areas 75 addressed to other PEs pass through to the FF of the next PE 21 with the window closed.

Furthermore, because the transfer paths of the data transmission mechanism 50 are shift registers, there is no need to buffer an entire packet output by the RISC 35. Data supplied to the TCU 59 by the RISC 35 can be output onto a transfer path one word at a time as it arrives. Apart from any effects of passing through the BSU 36 and the like, data can therefore be sent out on the transfer paths 51 to 58 at essentially the rate at which the RISC 35 outputs it, without making the RISC 35 wait and without spending time on buffering. The data transmission mechanism 50 is thus a high-speed data transfer system with a simple structure.

Because data passes through the many FFs that make up a transfer path, it may reach the desired PE 21 later than it otherwise would. Compared with buffering packet data in a FIFO or similar, however, there is no difference in arrival time unless the number of FFs in the transfer path is much larger than the number of words in the packet. Compared with a system that buffers repeatedly along the way, the data transmission mechanism 50 of this example even shortens the time until the first data arrives. When a transfer path's latency is large, it could be reduced by widening the path, but at a considerable cost in hardware resources. Instead, as in the PU 1 of this example, the transfer paths can be divided by segment to reduce the latency of each individual path. Dividing the paths by segment avoids the increase in hardware resources and also removes the need for each PE to decode the segment number, which simplifies the decoder 62.
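The arrival-time comparison can be made concrete with a deliberately simplified cycle count. The model, which is an assumption rather than anything specified in the source, charges the shift-register path one cycle per FF stage and charges each store-and-forward hop one cycle per packet word, since a buffering hop forwards nothing until the whole packet has arrived. Both function names are hypothetical.

```python
def first_word_shift_register(stages_to_target: int) -> int:
    """Cycles until the first word reaches a PE `stages_to_target` FFs
    downstream; words enter the path as soon as the TCU receives them."""
    if stages_to_target < 0:
        raise ValueError("stage count must be non-negative")
    return stages_to_target


def first_word_store_and_forward(packet_words: int, buffering_hops: int) -> int:
    """Simplified store-and-forward: each buffering hop receives the whole
    packet (one word per cycle) before it forwards the first word."""
    if packet_words < 1 or buffering_hops < 0:
        raise ValueError("need at least one word and a non-negative hop count")
    return packet_words * buffering_hops


# 8-word packet, target 10 FF stages downstream
print(first_word_shift_register(10))        # 10
print(first_word_store_and_forward(8, 1))   # 8: one buffer, comparable
print(first_word_store_and_forward(8, 3))   # 24: repeated buffering is slower
```

Under this model a single buffering stage only wins when the FF count clearly exceeds the packet length, and repeated buffering falls behind quickly, which is the qualitative trade-off described above.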

The data processing device 2 shown in FIG. 10 is a large-scale reconfigurable data processing device with four matrices 19, each the same size as that of the PU 1 described above. Even at this scale, with four times as many PEs, an increase in latency can be avoided by dividing the transfer paths per matrix and providing a distribution circuit 5 that routes data areas 75 to each matrix 19. Forming groups larger than a segment, for example macro-segments, and arranging the transfer paths hierarchically reduces the number of shift-register stages on each individual path. Hierarchical paths thus reduce latency, but they also increase the information needed to specify a path, which enlarges the header. If the header grows too large to fit in one data area 75, overhead increases and communication speed drops. In that case, a control packet that sets the route through the distribution circuit can be output to fix the transfer path that the circuit selects. Until the next control packet switches the path, no path-selection information needs to be sent, so the header can be kept small and the overhead of transmitting header information does not grow.
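A route-setting control packet that latches the selection of the distribution circuit 5 could be modeled as follows. The class, the `set_route` operation, and the `matrix` key are hypothetical; the sketch only illustrates that, once a route is latched, subsequent data areas need no path-selection bits until the next control packet.

```python
class DistributionCircuit:
    """Hypothetical model of distribution circuit 5 with a latched route."""

    def __init__(self, n_matrices: int):
        self.route = None
        self.outputs = [[] for _ in range(n_matrices)]

    def accept(self, area):
        # A route-setting control packet only updates the latch.
        if isinstance(area, dict) and area.get("op") == "set_route":
            self.route = area["matrix"]
            return
        if self.route is None:
            raise RuntimeError("no route latched yet")
        # Ordinary data areas carry no matrix number; they follow the latch.
        self.outputs[self.route].append(area)


dist = DistributionCircuit(4)
for area in [{"op": "set_route", "matrix": 2}, 5, 6,
             {"op": "set_route", "matrix": 0}, 7]:
    dist.accept(area)
print(dist.outputs)  # [[7], [], [5, 6], []]
```

The cost of this design is one extra control packet per route change, which pays off when many consecutive data areas go to the same matrix.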

The above describes updating the configuration data of each PE 21 from the RISC 35 via the data transmission mechanism 50, but configuration data can be supplied by any component connected to the BSU 36, not only by the RISC 35. For example, the DMAC 39 can transfer configuration data to the TCU 59 from the various memories (SDRAM, PCI bus) mapped into the RISC's memory space, for delivery to the PEs 21. When configuration data is supplied from external memory in encrypted form, a decryption circuit can be configured in the matrix 19; the encrypted configuration information is decrypted using the matrix 19 and supplied to the TCU 59 via the BSU 36, which then delivers the decrypted configuration data to each PE 21 of the matrix 19. It is also effective to provide the PU 1 with RAM for temporarily holding configuration data. By storing configuration data decrypted by the matrix 19 or supplied from external memory in this RAM, several sets of configuration data can be held regardless of the processing state of the matrix 19 or the BSU 36, so that when an instruction to swap configurations is issued, the TCU 59 can deliver the configuration data to the PEs 21 reliably and in time.

In the PE 21, it is desirable that the clock signals of the setting unit 60 and of the data path unit 29 can be managed separately. In particular, the data path unit 29 should have a function that stops or slows its clock signal. This must not stop the clock to those circuits of the setting unit 60 that belong to the data transmission mechanism 50, which transfers data over the transfer path 51. Each PE 21 can then cut its power consumption without affecting the transfer of configuration data, which allows fine-grained power control.
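The separation between the two clock domains can be shown with a cycle-level toy model. This sketch rests on assumptions not taken from the patent. The hypothetical `PEClockModel` uses a `divider` field, where 0 means the data path clock is stopped and n means it advances on every n-th cycle. The transfer register stands in for the flip-flop on the transfer path and is clocked on every cycle regardless of the divider.

```python
class PEClockModel:
    """Toy model: transfer register always clocked, data path clock gated or divided."""

    def __init__(self):
        self.divider = 1          # 1 = full speed, 0 = stopped, n = every n-th cycle (assumed)
        self.transfer_reg = None  # stands in for the FF on the transfer path
        self.datapath_cycles = 0  # how many cycles the data path actually ran

    def tick(self, cycle: int, incoming):
        # Transfer chain: shifts on every cycle, independent of the data path clock.
        outgoing, self.transfer_reg = self.transfer_reg, incoming
        # Data path: advances only on enabled edges of the divided clock.
        if self.divider and cycle % self.divider == 0:
            self.datapath_cycles += 1
        return outgoing


pe = PEClockModel()
pe.divider = 0                                   # data path clock stopped
print([pe.tick(c, c) for c in range(5)])         # [None, 0, 1, 2, 3]
print(pe.datapath_cycles)                        # 0
pe.divider = 2                                   # data path at half rate
for c in range(5, 9):
    pe.tick(c, None)
print(pe.datapath_cycles)                        # 2
```

Even with the data path stopped, values keep shifting through the transfer register. In this model, configuration traffic therefore passes a power-saving PE unaffected.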

The description above also covers examples in which data or messages are exchanged between the RISC 35 and the PEs 21. The PEs 21 can, however, also exchange data among themselves over the transfer path 51. In addition, the TCU 59 described above has a circuit 105 that can move a data area 75 from the downstream side of the transfer path 51 back to its upstream side, or onto any of the other transfer paths 52 to 56. Data can therefore flow from an upstream PE 21 to a downstream PE 21 and also from a downstream PE 21 to an upstream PE 21. PEs 21 on different transfer paths can exchange data as well.
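The core mechanism can be sketched as a toy slotted ring: registers connected in series, data areas shifting one stage per cycle, each PE reading or writing only the slot currently in its register, and a turnaround stage that feeds downstream slots back upstream. This is a minimal model under stated assumptions. The class `SlotRing`, the `(dest, payload)` slot format, and the single-register TCU stage are all hypothetical. The patent's header format, property determination, and TCU pipeline are not reproduced.

```python
from collections import deque


class SlotRing:
    """Toy slotted ring: PE registers in series, closed by a TCU turnaround stage.
    Each slot is None (free) or a (dest, payload) tuple (assumed format)."""

    def __init__(self, num_pes: int):
        self.regs = [None] * num_pes                       # regs[i]: register in front of PE i
        self.tcu_reg = None                                # TCU stage between last PE and PE 0
        self.outbox = [deque() for _ in range(num_pes)]    # data each PE wants to send
        self.inbox = [[] for _ in range(num_pes)]          # data each PE has received

    def step(self) -> None:
        # Shift every slot one stage: TCU -> PE0 -> ... -> PE(N-1) -> TCU.
        last = self.regs[-1]
        self.regs = [self.tcu_reg] + self.regs[:-1]
        self.tcu_reg = last                                # downstream slot fed back upstream
        # Each PE reads a slot addressed to it, or fills a free slot with pending data.
        for i, slot in enumerate(self.regs):
            if slot is not None and slot[0] == i:
                self.inbox[i].append(slot[1])
                self.regs[i] = None                        # slot is freed for reuse
            elif slot is None and self.outbox[i]:
                self.regs[i] = self.outbox[i].popleft()


ring = SlotRing(4)
ring.outbox[3].append((1, "cfg-A"))   # downstream PE 3 sends upstream to PE 1
ring.outbox[0].append((2, "cfg-B"))   # upstream PE 0 sends downstream to PE 2
for _ in range(6):
    ring.step()
print(ring.inbox[1], ring.inbox[2])   # ['cfg-A'] ['cfg-B']
```

No PE arbitrates for a shared bus here. Each PE only inspects its own register, which is what keeps the transfer mechanism simple as the number of PEs grows.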

FIG. 1 is a diagram showing an outline of a processing unit (PU).
FIG. 2 is a diagram showing an outline of a matrix.
FIG. 3 is a diagram showing an example of a processing element (PE).
FIG. 4 is a diagram showing another example of a PE.
FIG. 5 is a diagram showing PEs connected by a transfer path.
FIG. 6 is an enlarged view of the connection between PEs.
FIG. 7 is a flowchart showing the operation of the decoder in the setting unit of a PE.
FIG. 8(a) shows the structure of the header information, and FIG. 8(b) shows a configuration for transmitting packet data that includes header information.
FIG. 9 shows the structures of the following packets: (a) a control packet, (b) a single-write access packet, (c) a burst-write access packet, (d) a single-read access request packet, (e) a single-read access response packet, (f) a burst-read access request packet, and (g) a burst-read access response packet.
FIG. 10 is a diagram showing a different example of a PU.

Explanation of symbols

1, 2 Data processing unit (PU)
10-15 Segment
19 Matrix
21 Processing element (PE)
29 Internal data path area
50 Data transmission mechanism
51-58 Transfer path
59 Transfer control unit (TCU)
60 Setting unit
61 Flip-flop (FF, register)
75 Data transfer area

Claims

1. A data transmission method comprising:
a step of sequentially transferring a plurality of data areas in order over a transfer path formed in advance by connecting in series a register group that includes a plurality of registers corresponding respectively to a plurality of processing elements; and
an input/output step of reading data in the data area and/or writing data into the data area when the processing element corresponding to one register of the register group can use the data area transferred to that register.

2. The data transmission method according to claim 1, further comprising a step in which a control unit connected to the transfer path determines the properties of the plurality of data areas.

3. The data transmission method according to claim 1, wherein the transfer path is closed.

4. The data transmission method according to claim 2, wherein a plurality of the transfer paths are connected to the control unit via a distribution unit, the method further comprising a distribution step of distributing each of the plurality of data areas to the transfer path to which the processing elements that can use that data area belong.

5. The data transmission method according to claim 4, wherein the plurality of transfer paths are closed transfer paths having the same latency.

6. The data transmission method according to claim 1, wherein each processing element comprises a data path area whose function can be changed and a memory that stores a plurality of pieces of configuration information for setting the data path area, and in the input/output step, data is transferred from the data area to the memory or from the memory to the data area.

7. The data transmission method according to claim 1, wherein the data area includes data indicating the processing to be performed in the input/output step and the address to be processed.

8. The data transmission method according to claim 1, wherein a data unit on which each processing element performs independent processing in the input/output step is transmitted in a plurality of consecutive data areas.

9. The data transmission method according to claim 1, wherein the plurality of data areas include an area for storing encrypted data, and in the input/output step, data in the data area is read and decrypted and/or encrypted data is written into the data area.

10. A data processing apparatus comprising:
a plurality of processing elements;
a register group that includes a plurality of registers corresponding respectively to the plurality of processing elements, the registers being connected in series in advance so as to form a transfer path and sequentially transferring a plurality of data areas in order; and
input/output means by which, when the data area transferred to the register corresponding to a processing element among the plurality of registers is usable by that processing element, the processing element reads data in the data area and/or writes data into the data area.

11. The data processing apparatus according to claim 10, further comprising a control unit that is connected to the transfer path and determines the properties of the plurality of data areas.

12. The data processing apparatus according to claim 10, wherein the transfer path is closed.

13. The data processing apparatus according to claim 11, comprising a plurality of the transfer paths, and further comprising a distribution unit that connects the plurality of transfer paths to the control unit and distributes each of the plurality of data areas to the transfer path to which the processing elements that can use that data area belong.

14. The data processing apparatus according to claim 13, wherein the plurality of transfer paths are closed transfer paths having the same latency.

15. The data processing apparatus according to claim 10, wherein each processing element comprises a data path area whose function can be changed and a memory that stores a plurality of pieces of configuration information for setting the data path area, and the input/output means transfers data from the data area to the memory or from the memory to the data area.

16. The data processing apparatus according to claim 10, wherein the data area includes data indicating the processing to be performed by the input/output means and the address to be processed.

17. The data processing apparatus according to claim 10, wherein the input/output means inputs and outputs, to and from a plurality of consecutive data areas, a data unit on which the processing element performs independent processing.

18. The data processing apparatus according to claim 10, wherein the plurality of data areas include an area for storing encrypted data, and the input/output means reads and decrypts data in the data area and/or writes encrypted data into the data area.