JP2002244885A

JP2002244885A - Computer system monitoring system

Info

Publication number: JP2002244885A
Application number: JP2001043905A
Authority: JP
Inventors: Takayuki Hasegawa; 隆之長谷川
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2001-02-20
Filing date: 2001-02-20
Publication date: 2002-08-30

Abstract

PROBLEM TO BE SOLVED: To allow the processing of an application to continue automatically even when a fault occurs in the platform OS, by restarting the platform OS automatically. SOLUTION: The system has a timer mechanism 7 that uses the clocking function of the platform OS 3 of the computer system to write an initial value V into a monitoring region 9 at constant time intervals Ta. A monitoring device 8 decrements and rewrites the value set in the monitoring region 9 at constant time intervals Tb. If the value becomes negative, it judges that the platform OS 3 has an anomaly and notifies a DP-UX environment (H/W) 4. On this notification, the DP-UX environment (H/W) 4 suspends an application 6 running on a DP-UX environment (S/W) 5. When the platform OS 3 has been rebooted by the monitoring device 8, the DP-UX environment (H/W) 4 resumes the application 6.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to a computer system monitoring system, and more particularly to a method for avoiding abnormal termination of an application caused by a failure of the platform operating system in a computer system.

[0002]

[Description of the Related Art] FIG. 4 is a schematic configuration diagram of a conventional computer system. The conventional computer system is built in three layers. The lowest (first) layer consists of an operating system (OS) and hardware that perform input/output processing for input/output devices such as disk devices, network devices, and tape devices. The OS part of this layer is specifically called the platform operating system (platform OS); the hardware generally corresponds to a motherboard. FIG. 4 shows a disk device and a LAN card (a network device) as example input/output devices. The second layer consists of hardware and middleware that run under the control of the platform OS and control a group of applications; in this conventional example, it is implemented by the DP-UX environment provided by Mitsubishi Electric Corporation. The third layer consists of application programs running on the DP-UX environment. The DP-UX environment provides the input/output processing, scheduling, memory resource allocation, and other functions that the application programs need to do their processing.

[0003] Next, the operation of the conventional example is described, taking the flow of input/output processing as an example.

[0004] An input/output request from an application is passed to the DP-UX environment (S1). The DP-UX environment checks the resources the request requires and passes the request to the platform OS (S2). On receiving the request, the platform OS issues it to the requested destination, such as a disk device, the LAN, or a tape device (S3). The result of the input/output processing is returned to the application along the same path in reverse order (S4). As long as the application uses the functions provided by the DP-UX environment in the prescribed manner, it does not need to be aware of the platform OS.
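The layered request path S1 to S4 can be pictured as a chain of calls in which each layer only knows the layer directly below it. The following is a minimal sketch under stated assumptions: the class and method names (`Device`, `PlatformOS`, `DPUXEnvironment`, `Application`, `io`, `read_block`) are hypothetical, the "resource check" is reduced to checking that the target device exists, and nothing here reflects the actual DP-UX interface, which the patent does not describe.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Stand-in for an input/output device 1 (disk, LAN card, tape)."""
    name: str
    log: list = field(default_factory=list)

    def perform(self, op: str) -> str:
        self.log.append(op)
        return f"{self.name}:{op}:ok"


class PlatformOS:
    """First layer: issues the request to the target device (S3)."""

    def __init__(self, devices):
        self.devices = {d.name: d for d in devices}

    def io(self, device: str, op: str) -> str:
        return self.devices[device].perform(op)


class DPUXEnvironment:
    """Second layer: checks resources, then passes the request down (S2)."""

    def __init__(self, platform: PlatformOS):
        self.platform = platform

    def io(self, device: str, op: str) -> str:
        # Hypothetical resource check: the target device must exist.
        if device not in self.platform.devices:
            raise LookupError(f"no such device: {device}")
        return self.platform.io(device, op)


class Application:
    """Third layer: talks only to the middleware (S1) and receives the
    result back along the same path in reverse (S4)."""

    def __init__(self, env: DPUXEnvironment):
        self.env = env

    def read_block(self, n: int) -> str:
        return self.env.io("disk", f"read {n}")
```

For example, `Application(DPUXEnvironment(PlatformOS([Device("disk")]))).read_block(7)` returns `"disk:read 7:ok"`, and the application code never references `PlatformOS` directly, which is the property the paragraph above describes.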

[0005] If an unrecoverable failure occurs in the platform OS of this computer system, the platform OS can no longer process the input/output request passed to it in S2, and the DP-UX environment built on the platform OS also stops working. Recovering the system from this state requires restarting the platform OS, which in turn requires restarting the DP-UX environment running on it. Restarting the DP-UX environment further requires that the applications running on it be terminated and restarted. After the restart, the consistency of the files and other data held and managed by the computer system must be checked and any inconsistencies repaired, and only then can the applications be resubmitted.

[0006]

[Problems to Be Solved by the Invention] As described above, when an unrecoverable failure occurred in the platform OS of the conventional system, the platform OS had to be restarted and recovery processing performed after the restart, so resuming operations took a great deal of time. In some cases, operator intervention was also required to restore the system correctly.

[0007] The present invention has been made to solve the above problems. Its object is to provide a computer system monitoring system that, even when a failure occurs in the platform OS, restarts the platform OS without any operator intervention and lets the applications on the computer system continue their processing automatically.

[0008]

[Means for Solving the Problems] To achieve the above object, the present invention provides a computer system monitoring system that monitors the operation of a computer system. The computer system carries a platform operating system that accesses input/output devices and middleware that controls the execution of applications issuing input/output requests to those devices; an input/output request from an application is executed by passing it through the middleware to the platform operating system. The monitoring system comprises: area initialization means that, using the timekeeping function of the platform operating system, writes a predetermined initial value into a specific monitoring area at a predetermined first time interval; area updating means that, using its own built-in timekeeping function, updates the monitoring area at a predetermined second time interval; and system monitoring means that detects a stop of the platform operating system when the value set in the monitoring area reaches a predetermined value.

[0009] On receiving notification from the system monitoring means that the platform operating system has stopped, the middleware suspends the processing of the application that issued the input/output request.

[0010] Further, on receiving notification from the system monitoring means that the platform operating system has stopped, the middleware restarts the platform operating system.

[0011] Furthermore, after confirming that the platform operating system has restarted, the middleware resumes the processing of the suspended application.

[0012] Furthermore, when the middleware adjusts its internal clock, which stopped while application processing was suspended, after that processing resumes, it runs the clock faster than normal so that the clock converges gradually on real time.

[0013] The system may further include communication processing means that reports a stop of the platform operating system to the outside.

[0014]

[Embodiments of the Invention] Preferred embodiments of the present invention are described below with reference to the drawings.

[0015] Embodiment 1. FIG. 1 is a configuration diagram of Embodiment 1 of the computer system monitoring system according to the present invention. As in the conventional example, the computer system of this embodiment is built in three layers: a lowest (first) layer formed by the platform OS and hardware; a second layer formed by hardware and middleware that run under the control of the platform OS and control a group of applications; and a third layer formed by the applications running on the middleware. As in the conventional example, the second layer is implemented by the DP-UX environment provided by Mitsubishi Electric Corporation.

[0016] FIG. 1 shows an input/output device 1 (a disk device and a LAN card are illustrated); platform hardware 2, which forms the first layer, and a platform OS 3 running on the platform hardware 2; a DP-UX environment (hardware (H/W)) 4, which forms the second layer, and a DP-UX environment (software (S/W)) 5, which is middleware running on the DP-UX environment (H/W) 4; and an application 6, which forms the third layer. The first and third layers may be the same as in the conventional example. In the second layer, the DP-UX environment (S/W) 5 is given the additional processing functions described later. FIG. 1 also shows a clock mechanism 7 and a monitoring device 8. Using the timekeeping function of the platform OS 3, the clock mechanism 7 writes a predetermined initial value V, at a predetermined first time interval Ta, into a monitoring area 9 provided in the monitoring device 8. The monitoring area 9 is implemented as a specific register in the hardware that constitutes the monitoring device 8. Using its built-in timekeeping function, the monitoring device 8 updates the monitoring area 9 at a predetermined second time interval Tb. The intervals satisfy Ta < Tb × V. The monitoring device 8 detects a stop of the platform operating system when the value set in the monitoring area 9 reaches a predetermined value.

[0017] Next, the operation of this embodiment is described.

[0018] During normal operation of the computer system, the clock mechanism 7 uses the timekeeping function of the platform OS 3 to write the initial value V into the monitoring area 9 at the fixed interval Ta (S11). The monitoring device 8 decrements the value set in the monitoring area 9 at the fixed interval Tb and writes it back (S12). If the value to be written back becomes negative, the monitoring device 8 notifies the DP-UX environment (H/W) 4 (S13).
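Steps S11 to S13 behave like a hardware watchdog counter, and a small discrete-time simulation makes the interplay of Ta, Tb, and V concrete. This is a sketch under assumptions not stated in the patent: time advances in integer ticks, the monitoring area starts out holding V, and when a refresh and a decrement fall on the same tick the refresh is applied first. The function name `simulate` and the `stall_at` parameter (the tick at which the platform OS stops refreshing) are hypothetical.

```python
def simulate(ta: int, tb: int, v: int, stall_at=None, horizon: int = 10_000):
    """Discrete-time model of clock mechanism 7 and monitoring device 8.

    Returns the tick at which the monitoring device would raise its
    notification (S13), or None if it never does within `horizon` ticks.
    """
    region = v  # assumption: monitoring area 9 starts out holding V
    for t in range(1, horizon + 1):
        os_alive = stall_at is None or t < stall_at
        # S11: while the platform OS is alive, refresh every ta ticks.
        # Assumption: a refresh due on the same tick as a decrement goes first.
        if os_alive and t % ta == 0:
            region = v
        # S12: decrement and write back every tb ticks.
        if t % tb == 0:
            region -= 1
            # S13: a negative value means the refreshes have stopped.
            if region < 0:
                return t
    return None
```

With Ta = 100, Tb = 30, and V = 4 (so Ta < Tb × V holds), `simulate(100, 30, 4)` returns `None`, and `simulate(100, 30, 4, stall_at=1000)` reports the stall at tick 1020. With V = 3 the relation fails, and `simulate(100, 30, 3)` reports a perfectly healthy system at tick 390.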

[0019] As long as the clock mechanism 7 operates normally, it overwrites the monitoring area 9 with the initial value V at every interval Ta, so the set value never becomes negative even though the monitoring device 8 decrements it at every interval Tb. In other words, while the value in the monitoring area 9 keeps being overwritten correctly, the clock mechanism 7 is operating normally, and the platform OS 3, whose timekeeping function the clock mechanism 7 uses, can therefore be regarded as operating normally. Conversely, when the value in the monitoring area 9 becomes negative, the monitoring device 8 concludes that the clock mechanism 7 has not operated correctly, that is, that some fault or excessive congestion has occurred in the platform OS 3, and the hardware of the monitoring device 8 notifies the DP-UX environment (H/W) 4 by raising a machine check interrupt.
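The claim that a healthy clock mechanism keeps the value non-negative can be checked with a short derivation. It assumes, beyond what the patent states, that decrements occur at a fixed period Tb, that V is an integer, and that a refresh and a decrement falling at the same instant apply the refresh first. The symbols n (the number of decrements between two consecutive refreshes), t_0 (the time of the last refresh before a stall), and t_detect (the time the value first goes negative) are introduced here for illustration only.

```latex
\begin{aligned}
n &\le \left\lceil \frac{T_a}{T_b} \right\rceil \le V
  && \text{since } T_a / T_b < V \text{ and } V \in \mathbb{Z},\\
V - n &\ge 0
  && \text{so the value never goes negative while refreshes arrive},\\
t_0 + V\,T_b \;\le\; t_{\mathrm{detect}} &< t_0 + (V+1)\,T_b
  && \text{the } (V{+}1)\text{-th decrement after } t_0 \text{ drives it to } -1.
\end{aligned}
```

The first line uses the fact that a half-open window of length Ta contains at most the ceiling of Ta/Tb decrement instants; the last line follows because the first decrement at or after t_0 falls within one Tb of it.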

[0020] Depending on the load on the computer system, the timekeeping function of the platform OS 3 may be delayed somewhat. Although it also depends on the initial value V, in general, if the interval Tb is too short, a temporary, recoverable delay may be misjudged as a system failure; if Tb is too long, recovery from an unrecoverable state takes a long time. The values Ta, Tb, and V must therefore be set, subject to Ta < Tb × V, so that a negative value can safely be taken to mean that the platform OS 3 has entered an unrecoverable state, that is, that a failure requiring a restart has occurred. As is clear from the above, the failures addressed in this embodiment are unrecoverable ones.
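The trade-off in the paragraph above can be made quantitative with a small configuration helper. This is a hedged sketch: the class name `WatchdogTiming` and its methods are hypothetical, and the slack and detection-window formulas rest on the same assumptions as a fixed-period decrement model (decrements every Tb, integer V, refresh applied before a same-instant decrement), not on anything the patent specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchdogTiming:
    ta: float  # first time interval: refresh period of clock mechanism 7
    tb: float  # second time interval: decrement period of monitoring device 8
    v: int     # initial value V written on each refresh

    def is_valid(self) -> bool:
        """The relation required by the patent: Ta < Tb x V."""
        return self.ta < self.tb * self.v

    def delay_slack(self) -> float:
        """A refresh arriving late by strictly less than this amount
        (Ta + delay < Tb x V) cannot drive the value negative."""
        return self.tb * self.v - self.ta

    def detection_window(self) -> tuple:
        """Half-open window [V*Tb, (V+1)*Tb), measured from the last
        refresh, in which a stall is reported."""
        return (self.v * self.tb, (self.v + 1) * self.tb)
```

For Ta = 100, Tb = 30, V = 4 the relation holds, a load-induced refresh delay of under 20 units is tolerated, and a genuine stall is reported 120 to 150 units after the last refresh. Raising Tb × V widens the slack against load but pushes the detection window out by the same amount, which is exactly the tension the paragraph describes.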

[0021] When the DP-UX environment (H/W) 4 learns from this notification by the monitoring device 8 that the platform OS 3 is not operating normally, it suspends the application 6 running on the DP-UX environment (S/W) 5 (S14). The DP-UX environment (H/W) 4 also suspends the DP-UX environment (S/W) 5 itself, so that no processing requests, such as input/output requests, are issued to the platform OS 3. Meanwhile, the monitoring device 8 resets the platform hardware 2 (S15), which restarts the platform OS 3.
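The requirement that no processing request reaches the platform OS after S13 can be sketched as a gate on the middleware's request path. The names `DPUXGate`, `submit`, `suspend`, and `on_machine_check` are hypothetical, and parking requests in a list is an assumed safety net: in the patent the suspended application simply issues no further requests. The platform I/O and reset actions are passed in as callables so the sketch stays self-contained.

```python
from typing import Any, Callable, List, Optional


class DPUXGate:
    """Hypothetical model of the DP-UX environment's request path.

    While suspended, requests are parked instead of being passed to the
    platform OS, so nothing reaches the failed OS (S14).
    """

    def __init__(self, platform_io: Callable[[Any], Any]):
        self.platform_io = platform_io
        self.suspended = False
        self.parked: List[Any] = []

    def submit(self, request: Any) -> Optional[Any]:
        if self.suspended:
            self.parked.append(request)  # held, never forwarded
            return None
        return self.platform_io(request)

    def suspend(self) -> None:
        self.suspended = True


def on_machine_check(gate: DPUXGate, reset_platform: Callable[[], None]) -> None:
    """Reaction to the S13 notification: suspend first, then reset."""
    gate.suspend()    # S14: application 6 and DP-UX (S/W) 5 stop issuing requests
    reset_platform()  # S15: monitoring device 8 resets platform hardware 2
```

The sketch closes the gate before triggering the reset so that no request can slip through to an OS that is about to go down.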

[0022] On confirming that the platform OS 3 has finished restarting, the monitoring device 8 notifies the DP-UX environment (H/W) 4 (S16). In response, the DP-UX environment (H/W) 4 resumes the application 6 that was suspended on the DP-UX environment (S/W) 5 (S17). If the application 6 was suspended in the middle of input/output processing, the restart has caused the platform OS 3 to lose the input/output request it was executing. For such cases, the DP-UX environments 4 and 5 save in-progress input/output requests and automatically retry any that were lost. The application 6 therefore resumes processing without being aware that its input/output processing was interrupted and retried.
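Saving in-progress requests and retrying the lost ones can be sketched as a table of requests that have been handed to the platform OS but not yet completed. The class `InFlightTracker` and its methods are hypothetical, and `send` stands in for whatever path actually delivers a request to the platform OS. Requests are resent under their original identifiers, so the handle the application holds never changes.

```python
import itertools
from typing import Any, Callable, Dict


class InFlightTracker:
    """Hypothetical sketch: remember every request handed to the platform
    OS until its completion arrives, so requests lost in a platform reboot
    can be retried without the application noticing (S16-S17)."""

    def __init__(self, send: Callable[[int, Any], None]):
        self.send = send
        self.pending: Dict[int, Any] = {}  # req_id -> request awaiting completion
        self.results: Dict[int, Any] = {}  # req_id -> result for the application
        self._ids = itertools.count(1)

    def issue(self, request: Any) -> int:
        req_id = next(self._ids)
        self.pending[req_id] = request  # saved before it reaches the platform
        self.send(req_id, request)
        return req_id

    def complete(self, req_id: int, result: Any) -> None:
        self.pending.pop(req_id, None)
        self.results[req_id] = result

    def on_platform_restarted(self) -> None:
        # The reboot discarded whatever the platform OS was executing.
        # Dicts keep insertion order, so this resends in issue order; a
        # snapshot is taken in case send() completes requests synchronously.
        for req_id, request in list(self.pending.items()):
            self.send(req_id, request)
```

Whether a particular request is safe to repeat (for example, a write that may have partly executed before the reset) is not discussed in the patent and would need separate handling.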

[0023] While the application 6 is suspended, the internal clock (timer) that keeps real time in the DP-UX environments 4 and 5 is also suspended. The actual time (the time on the platform OS 3) and the time in the DP-UX environments 4 and 5 therefore diverge, and the internal clock of the DP-UX environments 4 and 5 must be adjusted after the platform OS 3 restarts. In this embodiment, rather than setting the time in the DP-UX environments 4 and 5 to the actual time in one step, the clock is run faster than normal so that it converges gradually on real time. This prevents problems that a sudden jump in the DP-UX time could cause; for example, an application 6 that times out after a fixed period can keep running without timing out.
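Running the clock faster than normal until it catches up is a form of clock slewing. The sketch below is a minimal model under an assumption the patent leaves open: while behind, the clock advances by (1 + rate) units per unit of real time, where `rate` is a hypothetical tuning parameter. The class name `SlewingClock` and its methods are likewise illustrative.

```python
class SlewingClock:
    """Hypothetical model of the DP-UX internal clock around a suspension.

    Real time always equals `now + lag`. While `lag` is positive, each tick
    adds up to `rate * real_dt` of catch-up on top of the normal advance,
    so the reading never jumps by more than (1 + rate) * real_dt.
    """

    def __init__(self, rate: float = 0.5):
        self.rate = rate  # assumed catch-up speed; the patent gives no value
        self.lag = 0.0    # how far the clock is behind real time
        self.now = 0.0    # current clock reading

    def suspend_for(self, seconds: float) -> None:
        # The clock is frozen while the application is suspended.
        self.lag += seconds

    def tick(self, real_dt: float) -> float:
        catch_up = min(self.lag, self.rate * real_dt)
        self.now += real_dt + catch_up
        self.lag -= catch_up
        return self.now
```

For instance, after `tick(10)` and a 4-second suspension, five calls to `tick(2)` with `rate=0.5` read 13, 16, 19, 22, and 24: the clock gains one extra second per tick until it matches real time, then runs normally. A timeout measured against this clock expires at most 1.5 times faster than usual instead of all at once.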

[0024] Embodiment 2. FIG. 2 is a configuration diagram of the computer system monitoring system according to this embodiment, in which a communication processing unit 10 is added to the monitoring device 8 of Embodiment 1. The communication processing unit 10 transmits fault information to a monitoring center 11, which centrally manages computer systems from a remote location, via a network connection device such as a modem or LAN connected to the monitoring device 8.

[0025] When the monitoring device 8 detects a failure of the platform OS 3 because the value in the monitoring area 9 has become negative, the communication processing unit 10 automatically reports this to the monitoring center 11, so the monitoring center 11 learns immediately that a failure has occurred in the computer system. After the platform OS 3 restarts, the communication processing unit 10 also sends the fault information it has collected to the monitoring center 11. Because that fault information contains data useful for analyzing the cause of the failure, the monitoring center 11 can investigate the cause and take countermeasures quickly. The operation of this embodiment differs from that of Embodiment 1 only in the added processing performed by the communication processing unit 10, so the remaining operations are not described again.

[0026] In this embodiment, the communication processing unit 10 sends information to the monitoring center 11 both when the failure occurs and when the fault information has been collected. Alternatively, it may report the occurrence of the failure together with the fault information once that information has been acquired.
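The two reporting modes of Embodiment 2 (an immediate fault report followed by the collected fault information, or a single combined report) can be sketched with one flag. The class `FaultReporter`, its method names, and the message fields are hypothetical; `send` stands in for the modem or LAN link to the monitoring center 11, whose protocol the patent does not specify.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FaultReporter:
    """Hypothetical model of communication processing unit 10."""
    send: Callable[[dict], None]  # link to monitoring center 11 (assumed)
    combined: bool = False        # True: single report after collection

    def on_fault_detected(self, host: str) -> None:
        # Two-phase mode: tell the center at once that the OS has stopped.
        if not self.combined:
            self.send({"event": "platform_os_stopped", "host": host})

    def on_info_collected(self, host: str, info: dict) -> None:
        # After the platform OS restarts, forward the collected fault data;
        # in combined mode this one message also carries the stop report.
        event = "platform_os_stopped+fault_info" if self.combined else "fault_info"
        self.send({"event": event, "host": host, "info": info})
```

The two-phase mode gets the center's attention sooner; the combined mode sends one message but only after the platform OS is back up and the information has been gathered.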

[0027] Embodiment 3. FIG. 3 is a configuration diagram of the computer system monitoring system according to this embodiment. As in Embodiment 2, a communication processing unit 10 is added to the monitoring device 8. In this embodiment, however, the communication processing unit 10 reports a stop of the platform OS 3 to another computer system 12 via a network connection device such as a modem or LAN connected to the monitoring device 8.

[0028] Once the platform OS 3 has stopped, restarting it takes some time, and some running operations cannot tolerate a system stop. In this embodiment, therefore, when a stop of the platform OS 3 is detected, the communication processing unit 10 automatically reports it to a predetermined other computer system 12. On receiving the report, the computer system 12 can promptly begin substitute processing on behalf of the failed computer system.
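The Embodiment 3 notification path can be sketched as a broadcast from the failed host to its predetermined standby systems. Everything here is illustrative: `StandbySystem`, `on_platform_stopped`, and `report_platform_stop` are hypothetical names, and ignoring duplicate reports for the same host is an assumed design choice, not something the patent describes.

```python
from typing import Iterable, List


class StandbySystem:
    """Hypothetical model of computer system 12 in Embodiment 3."""

    def __init__(self, name: str):
        self.name = name
        self.substituting_for: List[str] = []

    def on_platform_stopped(self, host: str) -> None:
        # Start substitute processing for the failed host right away,
        # rather than waiting for its platform OS to finish restarting.
        # Assumption: duplicate reports for the same host are ignored.
        if host not in self.substituting_for:
            self.substituting_for.append(host)


def report_platform_stop(host: str, standbys: Iterable[StandbySystem]) -> None:
    """Communication processing unit 10: notify every predetermined
    standby system that the host's platform OS 3 has stopped."""
    for standby in standbys:
        standby.on_platform_stopped(host)
```

Making the handler tolerate repeated reports keeps the takeover safe if the link retransmits a notification.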

[0029] As described above, each of the embodiments includes a device that detects unrecoverable failures of the platform OS 3. When such a failure occurs, the processing that the application 6 performs through the DP-UX environments 4 and 5 is temporarily suspended, only the platform OS 3 is restarted, and after the restart the suspended DP-UX environments 4 and 5 and application 6 are resumed. From the viewpoint of the application 6, the temporary interruption caused by the failure goes entirely unnoticed, and operations continue.

[0030]

[Effects of the Invention] According to the present invention, providing the system monitoring means makes it possible to detect automatically a failure in the platform operating system that performs input/output processing.

[0031] In addition, on receiving notification that the platform operating system has stopped, the middleware can suspend the application performing input/output without terminating it. Suspending the application prevents input/output requests from being issued to the platform operating system.

[0032] In addition, on receiving notification that the platform operating system has stopped, the middleware can automatically restart only the platform operating system, without operator intervention.

[0033] In addition, by resuming the processing of the suspended application after the platform operating system restarts, the application's processing can continue from where it left off.

[0034] In addition, because the middleware's time is not set to the actual time in one step but is brought gradually to real time by running the clock faster than normal, problems caused by a sudden jump in the middleware's time can be prevented.

[0035] In addition, providing the communication processing means makes it possible to report a failure of the platform operating system to the outside promptly.

[Brief description of the drawings]

FIG. 1 is a configuration diagram showing Embodiment 1 of the computer system monitoring system according to the present invention.

FIG. 2 is a configuration diagram showing Embodiment 2 of the computer system monitoring system according to the present invention.

FIG. 3 is a configuration diagram showing Embodiment 3 of the computer system monitoring system according to the present invention.

FIG. 4 is a schematic configuration diagram of a conventional computer system.

[Explanation of symbols]

1 input/output device, 2 platform hardware, 3 platform operating system (OS), 4 DP-UX environment (hardware (H/W)), 5 DP-UX environment (software (S/W)), 6 application, 7 clock mechanism, 8 monitoring device, 9 monitoring area, 10 communication processing unit, 11 monitoring center, 12 computer system.

Claims


1. A computer system monitoring system for monitoring the operation of a computer system that carries a platform operating system for accessing input/output devices and middleware for controlling the execution of applications that issue input/output requests to the input/output devices, and in which an input/output request from an application is executed by being passed through the middleware to the platform operating system, the computer system monitoring system comprising: area initialization means for writing a predetermined initial value into a specific monitoring area at a predetermined first time interval by using a timekeeping function of the platform operating system; area updating means for updating the monitoring area at a predetermined second time interval by using a built-in timekeeping function; and system monitoring means for detecting a stop of the platform operating system when the value set in the monitoring area reaches a predetermined value.

2. The computer system monitoring system according to claim 1, wherein the middleware suspends the processing of an application that has issued an input/output request upon receiving, from the system monitoring means, notification that the platform operating system has stopped.

3. The computer system monitoring system according to claim 2, wherein the middleware restarts the platform operating system upon receiving, from the system monitoring means, notification that the platform operating system has stopped.

4. The computer system monitoring system according to claim 3, wherein the middleware resumes processing of the suspended application after confirming restart of the platform operating system.

5. The computer system monitoring system according to claim 4, wherein, when adjusting the time of an internal clock that stopped because the processing of the application was suspended, after the processing of the application has resumed, the middleware runs the clock faster than normal so that the clock converges gradually on real time.

6. The computer system monitoring system according to claim 1, further comprising communication processing means for notifying the stop of the platform operating system to the outside.