JP2635031B2

JP2635031B2 - Mutual coupling of parallel computers

Info

Publication number: JP2635031B2
Application number: JP61269655A
Authority: JP
Inventors: 晃村松; 伸一郎宮岡; 誠寿舩橋; 和夫中尾
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1986-11-14
Filing date: 1986-11-14
Publication date: 1997-07-30
Anticipated expiration: 2012-07-30
Also published as: JPS63124162A

Description

[Detailed description of the invention]

[Industrial field of application]

The present invention relates to a scheme for interconnecting the element processors of a parallel computer, and in particular to a switch configuration suited to cases where interprocessor communication is irregular and high reliability is required.

[Conventional technology]

Most conventional devices either connect adjacent element processors arranged in a lattice, as described in JP-A-58-181168, or connect all element processors through a crossbar switch, as described in JP-A-58-181166. More recently, as described in Document 1 below, schemes that connect all processors through a multistage switch have attracted attention.

Document 1: Proceedings of the Nineteenth Annual Hawaii International Conference on System Sciences, 1986, pp. 214-221

[Problems to be solved by the invention]

Of the conventional techniques above, the lattice coupling scheme is very inefficient for irregular interprocessor communication. The complete crossbar coupling scheme allows arbitrary high-speed connections, but its hardware scale grows in proportion to the square of the number of element processors, so it cannot be applied to large-scale parallel computers. The multistage switch achieves arbitrary connections with relatively little hardware, but it cannot efficiently perform lattice computations (computations that obtain the state of a lattice-shaped target system by an iterative method), which are the mainstream of large-scale computation, and its communication time is long.

The object of the present invention is to provide an interprocessor interconnection scheme that achieves arbitrary interprocessor connections with relatively little hardware, communicates at high speed, and can also perform lattice computations with high efficiency.

[Means for solving the problem]

The above object is achieved by taking the crossbar switch, which has excellent basic performance as a coupling scheme, as the basic building block and providing one for each row and each column of a group of processors arranged in a two-dimensional array. Even when the total number of processors is large, the number per row or per column is small, so connecting them with a crossbar switch is readily feasible.

Furthermore, a processor can always deliver data even to a processor that it cannot reach through the (single) row or column crossbar switch it accesses directly. For example, it first sends the data to a suitable relay processor through the row switch, and the relay processor then resends the data through the column switch. Arbitrary interprocessor communication is therefore possible.

In addition, each processor has immediate access to its east-west neighbors through the row switch and to its north-south neighbors through the column switch, so lattice computations can also be performed with high efficiency.

[Action]

Each element processor is given an identification number of n = α + β bits (α, β: positive integers), such that the lower α bits indicate the vertical position (row number) within the processor group arranged in a two-dimensional array and the upper β bits indicate the horizontal position (column number). The switch serving a row decodes only the upper β bits of the destination processor number attached to the head of the communication packet, and the switch serving a column decodes only the lower α bits, before forwarding the packet. If the source and destination processors share a row number or a column number, a single use of the column or row switch suffices; even if both differ, one use of each (two in total) always delivers the packet to any destination.
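The bit split described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `split_id` and `switch_uses` are hypothetical and do not appear in the patent, and processor numbers are assumed to be held as plain integers.

```python
def split_id(pid: int, alpha: int, beta: int) -> tuple[int, int]:
    """Return (row, column) for an n = alpha + beta bit processor number."""
    row = pid & ((1 << alpha) - 1)            # lower alpha bits: vertical position
    col = (pid >> alpha) & ((1 << beta) - 1)  # upper beta bits: horizontal position
    return row, col


def switch_uses(src: int, dst: int, alpha: int, beta: int) -> int:
    """Number of crossbar traversals between two processors (0 if src == dst)."""
    src_row, src_col = split_id(src, alpha, beta)
    dst_row, dst_col = split_id(dst, alpha, beta)
    # One differing coordinate costs one switch use; two differing coordinates cost two.
    return int(src_row != dst_row) + int(src_col != dst_col)
```

With α = β = 2, `split_id(0b0111, 2, 2)` gives row 3 and column 1, and `switch_uses` never exceeds 2, which matches the "one use of each, two in total" bound stated in the text.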

[Example]

An embodiment of the present invention is described below with reference to FIG. 1.

The overall system is configured as a set of row crossbar switches 1, column crossbar switches 2, and element processors 3. The element processors are numbered in order as shown in the figure; in this example, 2⁴ processors, 0000 through 1111, are identified. As shown in FIG. 4, these processors can be viewed either as a one-dimensional array or as a two-dimensional array. To be viewed as a two-dimensional array, the array must contain a power-of-two number of processors both vertically and horizontally. When there are 2^α processors vertically and 2^β horizontally, the lower α bits of the processor number indicate the vertical position and the upper β bits indicate the horizontal position. Hereinafter, processors whose lower α bits are equal are said to belong to the same row processor group, and processors whose upper β bits are equal are said to belong to the same column processor group.
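As a rough illustration of the grouping just defined, the following Python sketch enumerates the row and column processor groups for the 16-processor example. The function name `processor_groups` and the dictionary layout are assumptions made for this sketch, not part of the patent.

```python
from collections import defaultdict


def processor_groups(alpha: int, beta: int):
    """Group processor numbers by their lower alpha bits (rows) and upper beta bits (columns)."""
    width = alpha + beta
    rows, cols = defaultdict(list), defaultdict(list)
    for pid in range(1 << width):
        low = pid & ((1 << alpha) - 1)   # lower alpha bits select the row group
        high = pid >> alpha              # upper beta bits select the column group
        label = format(pid, f"0{width}b")
        rows[low].append(label)
        cols[high].append(label)
    return dict(rows), dict(cols)


rows, cols = processor_groups(2, 2)
print(rows[0b11])  # row group sharing lower bits 11
print(cols[0b01])  # column group sharing upper bits 01
```

Each row group would be served by one row crossbar switch and each column group by one column crossbar switch, so processor 0111 belongs to both of the groups printed here.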

A crossbar switch that fully interconnects one row processor group is called a row crossbar switch, and a crossbar switch that fully interconnects one column processor group is called a column crossbar switch. Through any single row or column crossbar switch, each element processor can be connected to only one other element processor at a time. Consequently, when there is no contention, up to 2^β or 2^α independent transmission paths can exist.

As shown in FIG. 2, each element processor has a processing unit 31 and a memory 32 like an ordinary processor, and in addition has a communication unit 33 and input/output ports 34 for exchanging data with the row and column crossbar switches 1 and 2.

Next, the method of communicating through the row and column crossbar switches is described. The communication unit of each processor assembles the packet shown in FIG. 3. It places the processor number of the destination processor at the head of the packet as a switch address and stores the packet in the input/output port used to exchange data with the row or column crossbar switch. Hereinafter, the lower α bits of a processor number are called the SN bits and the upper β bits are called the EW bits.

The communication unit outputs the packet to the column crossbar switch when the EW bits of its own processor number equal the EW bits of the destination processor, and to the row crossbar switch otherwise. The column crossbar switch examines only the SN bits and sends the packet to the processor in its column processor group whose SN bits match. The row crossbar switch examines only the EW bits and sends the packet to the processor in its row processor group whose EW bits match.

On receiving a packet, the communication unit of each processor checks the switch address. If the address is not its own (in which case the packet arrived from the row crossbar switch), it sends the packet out again to the column crossbar switch. This time the SN bits are examined, and the packet reaches its final destination. An example follows.

Consider sending data from processor 0111 in FIG. 1 to processors 1101 and 0100. For processor 1101, the EW bits differ, so the packet shown in FIG. 3(b) is sent to the row crossbar switch. The row crossbar switch examines the EW bits, 11, and sends the packet to processor 1111. In the communication unit of processor 1111, the switch address 1101 of the packet does not equal its own number, so the packet is resent to the column crossbar switch. The column crossbar switch examines the SN bits, 01, and delivers the packet to processor 1101. For processor 0100, the EW bits are equal, so the packet is sent to the column crossbar switch (FIG. 3(c)). In this case it reaches processor 0100 directly.
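The routing procedure and the worked example above can be traced with a small Python sketch. It is a simplified model under stated assumptions: α = β = 2 as in FIG. 1, no contention or failures, a source different from the destination, and hypothetical helper names (`sn`, `ew`, `row_switch`, `column_switch`, `route`) that are not taken from the patent.

```python
ALPHA, BETA = 2, 2  # assumed sizes, matching the 16-processor example


def sn(pid: int) -> int:
    """SN bits: the lower ALPHA bits of a processor number."""
    return pid & ((1 << ALPHA) - 1)


def ew(pid: int) -> int:
    """EW bits: the upper BETA bits of a processor number."""
    return pid >> ALPHA


def row_switch(current: int, dest: int) -> int:
    """Row crossbar: stay in the current row, move to the destination's EW bits."""
    return (ew(dest) << ALPHA) | sn(current)


def column_switch(current: int, dest: int) -> int:
    """Column crossbar: stay in the current column, move to the destination's SN bits."""
    return (ew(current) << ALPHA) | sn(dest)


def route(src: int, dest: int) -> list[str]:
    """List the processors a packet visits from src to dest (src != dest assumed)."""
    path = [src]
    if ew(src) == ew(dest):
        path.append(column_switch(src, dest))    # same EW bits: column switch only
    else:
        relay = row_switch(src, dest)            # otherwise the row switch goes first
        path.append(relay)
        if relay != dest:                        # relay sees a foreign switch address
            path.append(column_switch(relay, dest))
    return [format(p, f"0{ALPHA + BETA}b") for p in path]


print(route(0b0111, 0b1101))  # ['0111', '1111', '1101']
print(route(0b0111, 0b0100))  # ['0111', '0100']
```

The two printed paths correspond to the two cases in the example: a relay through processor 1111 when the EW bits differ, and direct delivery through the column crossbar switch when they are equal.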

If the element processor at the relay point fails, or a deadlock occurs between it and the source element processor, a different transfer route is set up and the packet is sent again. In that case, the route reset bit in the packet is set to 1 and the packet is resent to the column crossbar switch. The communication unit of an element processor that receives a packet whose route reset bit is 1 resends the packet to the row crossbar switch, thereby bypassing the failed part automatically. In the example above, if processor 1111 has failed, the packet travels along the route 0111 → 0101 → 1101. FIG. 5 is a flowchart of the processing procedure of the communication unit.
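The bypass behavior can be sketched as follows. This is a simplified model, not the procedure of FIG. 5: the `Packet` class, the `deliver` function, and the `failed` set are hypothetical names, the `failed` set stands in for whatever failure or deadlock detection the hardware provides (the text does not specify it), and the sketch does not handle a second failure on the alternative route.

```python
from dataclasses import dataclass

ALPHA, BETA = 2, 2  # assumed sizes, matching the 16-processor example


@dataclass
class Packet:
    switch_address: int        # destination processor number at the head of the packet
    route_reset: bool = False  # the route reset bit
    data: bytes = b""


def sn(pid: int) -> int:
    return pid & ((1 << ALPHA) - 1)


def ew(pid: int) -> int:
    return pid >> ALPHA


def row_switch(current: int, dest: int) -> int:
    return (ew(dest) << ALPHA) | sn(current)


def column_switch(current: int, dest: int) -> int:
    return (ew(current) << ALPHA) | sn(dest)


def deliver(src: int, packet: Packet, failed=frozenset()) -> list[int]:
    """Return the processors visited, rerouting column-first if the relay has failed."""
    dest = packet.switch_address
    if ew(src) == ew(dest):
        return [src, column_switch(src, dest)]
    relay = row_switch(src, dest)
    if relay != dest and relay in failed:
        packet.route_reset = True                # set the bit and resend via the column switch
        relay = column_switch(src, dest)
        # The new relay sees route_reset == 1 and forwards through its row switch.
        return [src, relay, row_switch(relay, dest)]
    path = [src, relay]
    if relay != dest:
        path.append(column_switch(relay, dest))
    return path


packet = Packet(switch_address=0b1101)
print([format(p, "04b") for p in deliver(0b0111, packet, failed={0b1111})])
# ['0111', '0101', '1101']
```

The printed route matches the one given in the text for a failed processor 1111. Swapping the order of the row and column hops is what allows the packet to avoid the failed relay.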

The interconnection scheme of the present invention is now compared with two representative conventional schemes: lattice coupling (each element processor is connected only to its east, west, south, and north neighbors) and complete crossbar switch coupling (each processor is connected to any other through a crossbar switch). The comparison items are the inter-processor distance and the number of switching elements. The inter-processor distance is the number of switching operations required before communication becomes possible; it is 1 when a direct communication path exists. The number of switching elements is the number of switching elements present on all communication paths. The former is a measure of communication efficiency or speed, and the latter is a measure of hardware scale. For simplicity, consider an N × N array of element processors. The inter-processor distances are then as follows.

Complete crossbar switch coupling: 1
Present invention: 1 to 2

The numbers of switching elements are as follows.

Lattice coupling: $2(N-2)^2 + 4(N-1)$
Complete crossbar switch coupling: $N^4$
Present invention: $2(N-1)N^2$

That is, although the hardware scale of the present invention is one order lower than that of complete crossbar switch coupling, its communication capability is comparable. For example, with N = 100, the inter-processor distance and the number of switching elements are as follows.

Lattice coupling: distance 70, switching elements 19,604
Complete crossbar switch coupling: distance 1, switching elements $10^8$
Present invention: distance 1 to 2, switching elements $1.98 \times 10^6$

[Effects of the invention]

According to the present invention, the interconnection of element processors in a parallel computer achieves a degree of connection freedom close to that of complete crossbar switch coupling, which has the highest connection freedom, while keeping the hardware scale small. As a result, a large-scale parallel computer with many element processors can perform processing that involves irregular communication, which the conventionally mainstream lattice coupling scheme cannot handle. Applications for which the invention is particularly effective include advanced image processing centered on FFT (fast Fourier transform) processing, reactor safety analysis using the node-junction method, and the solution of simultaneous linear equations with large sparse matrices by elimination, all of which conventional parallel computers could not handle efficiently because of communication difficulties. Problems considered well suited to lattice coupling, such as fluid computations and particle tracking computations, can be handled equally well.
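To make the hardware-scale claim concrete, the switching-element formulas from the comparison in the Example section can be evaluated directly. The Python sketch below simply transcribes those three formulas for an N × N array; the function names are hypothetical.

```python
def lattice_elements(n: int) -> int:
    """Switching elements for lattice coupling: 2(N-2)^2 + 4(N-1)."""
    return 2 * (n - 2) ** 2 + 4 * (n - 1)


def full_crossbar_elements(n: int) -> int:
    """Switching elements for complete crossbar switch coupling: N^4."""
    return n ** 4


def row_column_crossbar_elements(n: int) -> int:
    """Switching elements for the row and column crossbar scheme: 2(N-1)N^2."""
    return 2 * (n - 1) * n ** 2


n = 100
print(lattice_elements(n))              # 19604
print(full_crossbar_elements(n))        # 100000000
print(row_column_crossbar_elements(n))  # 1980000
```

For N = 100 these reproduce the figures quoted in the comparison (19,604, 10⁸, and 1.98 × 10⁶), and the ratio of the last two is roughly N/2, which is the sense in which the scheme is one order lower in hardware scale than complete crossbar switch coupling.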

[Brief description of the drawings]

FIG. 1 is a conceptual configuration diagram of the row-column crossbar switch that constitutes the interconnection scheme of the present invention. FIG. 2 is a configuration diagram of an element processor that uses the row-column crossbar switch. FIG. 3 is a configuration diagram of a transmission packet. FIG. 4 is a diagram showing an example of processor numbering. FIG. 5 is a flowchart of the processing procedure in the communication unit.

Continuation of the front page

(72) Inventor: Kazuo Nakao, Systems Development Laboratory, Hitachi, Ltd., 1099 Ozenji, Asao-ku, Kawasaki City

(56) References:
JP-A-63-278170 (JP, A)
JP-A-64-4856 (JP, A)
JP-A-1-131950 (JP, A)
Yoichi Hamasaki, Yoshikuni Okada, "Kernel of Dialog.H", Proceedings of the 33rd National Convention of the Information Processing Society of Japan (second half of 1986) (I), (1986), pp. 357-358
Suzuki Setsu and 3 others, "Concept of a Massively Parallel AI Machine", Proceedings of the 35th National Convention of the Information Processing Society of Japan (second half of 1987) (I), (1987), pp. 135-136
Suzuki Setsu and 4 others, "Architecture of the Parallel AI Machine Prodigy", Proceedings of the 36th National Convention of the Information Processing Society of Japan (first half of 1988) (I), (1988), pp. 255-256

Claims

(57) [Claims]

1. A mutual coupling system for a parallel computer having a plurality of element processors arranged in a grid of 2^α vertically by 2^β horizontally (α and β being positive integers), comprising: for each row of the element processors, a row crossbar switch that fully interconnects the 2^β element processors belonging to that row; and, for each column of the element processors, a column crossbar switch that fully interconnects the 2^α element processors belonging to that column; wherein each of the element processors includes a communication unit which, when its own element processor is a data transmission source, organizes a communication packet by adding to the transmission data a destination address consisting of an α-bit vertical address and a β-bit horizontal address, selects one of the row crossbar switch and the column crossbar switch, and outputs the communication packet to the selected switch, and which, when a communication packet is input from one of the row crossbar switch and the column crossbar switch, determines whether or not its own element processor is the processor of the destination address and, if not, transfers the communication packet to the crossbar switch other than the one from which the communication packet was input.