CN102521062B - Software fault-tolerant method capable of comprehensive on-line self-detection of single event upsets



Publication number
CN102521062B
Authority
CN
China
Legal status
Active
Application number
CN201110387908.9A
Other languages
Chinese (zh)
Other versions
CN102521062A (en)
Inventor
吴国春
吴化军
陶晓霞
徐丽娜
钟兴旺
王一唯
林梦园
Current Assignee
Xian Institute of Space Radio Technology
Original Assignee
Xian Institute of Space Radio Technology
Application filed by Xian Institute of Space Radio Technology
Priority to CN201110387908.9A
Publication of CN102521062A
Application granted
Publication of CN102521062B


Abstract

A software fault-tolerant method capable of comprehensive on-line self-detection of single event upsets comprises a storage-address link configuration and the execution of a fault-tolerant processing parameter generation module, a fault-tolerant processing A module and a fault-tolerant processing B module. Program memory contents are read segment by segment by direct memory access (DMA), fault-tolerant processing parameters are generated dynamically with a checking algorithm, and the parameters are stored redundantly. The fault-tolerant processing B module autonomously monitors, in real time, the application programs and the operation of the fault-tolerant processing A module, while the A module monitors the operation of the B module. Once a single event upset occurs in the program code, the corresponding code segment is reloaded from read-only memory (ROM), correcting the application program code. Because the whole process is carried out by DMA, it occupies very little central processing unit (CPU) time, so the program keeps running in real time while errors are being corrected. The method improves the reliability and safety of on-orbit software operation, saves a large amount of hardware and time cost, and improves efficiency.

Description

Software fault-tolerant method capable of comprehensive on-line self-detection of single event upsets
Technical field
The present invention relates to a software fault-tolerant method capable of comprehensive on-line self-detection of single event upsets (SEUs), used to detect and correct program code errors caused by single event upsets in space applications or other environments.
Background art
There are currently two main approaches to designing against single event upsets. The first relies on hardware: either the hardware design works together with the software to perform EDAC (error detection and correction) on the application program, or the software runs directly on high-grade devices that are almost immune to single event upsets. The second does not rely on hardware design and instead uses a software design to detect and correct single event upsets in the application program.
In recent years, the SEU protection designs for RAM known at home and abroad from public publications and open channels are as follows:
Scheme (1) detects and corrects single event upsets with error-detecting and error-correcting data codes, such as parity codes, CRC, Hamming codes and Reed-Solomon (R-S) codes. Based on this measure, two approaches are mainly used at home and abroad:
A) The CPU chip itself has a built-in EDAC design, for example the TSC695 and the AT697 series.
B) EDAC checking implemented by the DSP chip's external memory plus an FPGA (or ASIC). However, this cannot detect or correct errors in the DSP's internal memory space.
This scheme mainly provides SEU error detection and correction for off-chip memory; its drawback is that it cannot do so for the processor's internal program memory space.
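As a sketch of the kind of coding scheme (1) relies on, the C fragment below implements a textbook Hamming(7,4) single-error-correcting code: three parity bits protect four data bits, and the syndrome of a corrupted codeword equals the position of the flipped bit. It is an illustration only, not part of the patented method, and the function names are hypothetical; hardware EDAC usually protects wider words with SEC-DED codes.

```c
#include <stdint.h>

/* Encode a 4-bit nibble into a Hamming(7,4) codeword.
 * Codeword positions 1..7 are stored in bits 0..6 as: p1 p2 d1 p4 d2 d3 d4. */
static uint8_t hamming74_encode(uint8_t nibble)
{
    uint8_t d1 = (nibble >> 0) & 1u;
    uint8_t d2 = (nibble >> 1) & 1u;
    uint8_t d3 = (nibble >> 2) & 1u;
    uint8_t d4 = (nibble >> 3) & 1u;
    uint8_t p1 = d1 ^ d2 ^ d4;   /* parity over positions 1, 3, 5, 7 */
    uint8_t p2 = d1 ^ d3 ^ d4;   /* parity over positions 2, 3, 6, 7 */
    uint8_t p4 = d2 ^ d3 ^ d4;   /* parity over positions 4, 5, 6, 7 */
    return (uint8_t)(p1 | (p2 << 1) | (d1 << 2) | (p4 << 3) |
                     (d2 << 4) | (d3 << 5) | (d4 << 6));
}

/* Decode a codeword, correcting at most one flipped bit. */
static uint8_t hamming74_decode(uint8_t cw)
{
    uint8_t syndrome = 0;
    for (unsigned pos = 1; pos <= 7; ++pos)
        if ((cw >> (pos - 1)) & 1u)
            syndrome ^= (uint8_t)pos;          /* XOR of positions of set bits */
    if (syndrome != 0)
        cw ^= (uint8_t)(1u << (syndrome - 1)); /* flip the erroneous bit back */
    return (uint8_t)(((cw >> 2) & 1u) | (((cw >> 4) & 1u) << 1) |
                     (((cw >> 5) & 1u) << 2) | (((cw >> 6) & 1u) << 3));
}
```

Flipping any one of the seven bits of a codeword still decodes to the original nibble, which is exactly the single-bit upset case that EDAC targets.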
Scheme (2) implements error detection and correction through structural redundancy (triple modular redundancy, TMR). Three processor chips run the same program simultaneously and are controlled through the selection port of the main control computer, and majority voting over the three is applied to the address bus and data. When the voting circuit detects a bit error, a correction service subroutine starts and, according to the error level identified by the voting circuit and the self-check program, repairs the system by error type and returns it to normal operation. This approach handles single event upsets well, but its drawback is a large hardware overhead: it requires hardware redundancy, and the complex hardware structure brings a series of reliability problems of its own. The method described here, by contrast, is a software design and provides SEU protection without modifying the hardware platform. See: David M. Hiemstra (Senior Member, IEEE), Bojan Miladinovic, and Fayez Chayab, "Single Event Upset Characterization of the SMJ320C6701 Digital Signal Processor Using Proton Irradiation";
Scheme (3) runs the program code directly from PROM, which is almost immune to single event upsets. However, because of the limited speed of PROM, most digital signal processing software cannot meet its performance requirements when run from PROM and must instead run at high speed from the DSP's internal RAM. The present method addresses precisely the SEU problem of DSP internal RAM.
Scheme (4) protects the program against single event upsets by periodically refreshing the application software, reloading it on a timer. Because the whole software must be reloaded each time, a long refresh interval leaves the program unprotected between refreshes, while a short interval means the program is frequently read from PROM and written to RAM, which both disturbs real-time execution and can create reliability problems of its own. Current on-board software is generally refreshed on a time scale of minutes to hours. In comparison, the present method reloads the corresponding code section from PROM only after a single event upset in the application program has been detected, so its correction timing is better than that of scheme (4). Moreover, because scheme (4) refreshes periodically, it cannot count single event upsets, whereas the present invention can, providing rich SEU information that can be downlinked to the ground.
Scheme (5) is an SEU-hardening technique based on partial software reconfiguration. It periodically loads an error-correcting routine stored in read-only ROM to check and correct the application program: while the application is running, the context is saved, the error-correction code is loaded to check and correct the application code, and after the check the context is restored and execution continues. Because the program must be loaded from PROM before every refresh, this scheme also suffers from the refresh-period problem and cannot detect or correct single event upsets between refreshes. In addition, because the error-correcting routine programs the program memory directly, it is not suitable for the high-speed digital signal processors now widely used in space, such as the DSP 6X series. See patent application 201010527687.6.
No software fault-tolerant method has yet been reported in the open literature that uses DMA and a software design, with dual redundant fault-tolerant processing, to perform comprehensive on-line self-detection of single event upsets in a processor's internal program code.
Summary of the invention
Problem solved by the invention: to overcome the shortcomings of the prior art by providing a software fault-tolerant method capable of comprehensive on-line self-detection of single event upsets, which performs comprehensive on-line self-detection and correction of SEU errors in the program memory of DMA-capable processors used in space applications, thereby improving the reliability and safety of software in orbit.
Technical solution of the invention: a software fault-tolerant method capable of comprehensive on-line self-detection of single event upsets, characterized by comprising a storage-address link configuration and the execution of a fault-tolerant processing parameter generation module, a fault-tolerant processing A module and a fault-tolerant processing B module, with the following steps:
(1) Storage address link configuration
The storage addresses are divided into a ROM_A region and a ROM_B region, which, according to the compile and link configuration, correspond to program memory regions RAM_A and RAM_B respectively. Only the program code of the fault-tolerant processing B module is placed in the ROM_A region; the program code of all application programs, together with that of the fault-tolerant processing parameter generation module and the fault-tolerant processing A module, is placed in the ROM_B region;
(2) Fault-tolerant processing parameter generation module
The two DMA channels DMA1 and DMA2 move the program code of RAM_A and RAM_B respectively to the corresponding data region. The check parameters used by the fault-tolerant processing A module and the fault-tolerant processing B module are generated dynamically with a checking algorithm, and each check parameter is stored redundantly.
After a DMA-capable processor chip powers on or resets, the fault-tolerant processing parameter generation function is called during initialization to generate the fault-tolerant parameters dynamically. This program is executed only once after each power-up or reset, and the generated parameters are stored in RAM with a redundant design;
(3) Fault-tolerant processing A module
Fault-tolerant processing A autonomously monitors, in real time, the program code running in program memory region RAM_A. Once a single event upset occurs in this code, it records the error information and starts error correction: program code read from ROM_A overwrites the code in RAM_A, ensuring that the RAM_A code runs safely and correctly;
(4) Fault-tolerant processing B module
Fault-tolerant processing B autonomously monitors, in real time, the program code running in program memory region RAM_B. Once a single event upset occurs in this code, it records the error information and starts error correction: program code read from ROM_B overwrites the code in RAM_B, ensuring that the RAM_B code runs safely and correctly;
(5) The start addresses of program memory regions ROM_A, RAM_A, ROM_B and RAM_B in step (1) are obtained from the MAP file generated by compiling and linking, which is read to show the program code information and to check the code lengths of regions RAM_A and RAM_B. This information is passed as formal parameters to the fault-tolerant processing parameter generation module, the fault-tolerant processing A module and the fault-tolerant processing B module, enabling on-line self-detection of SEU errors.
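A minimal sketch of step (5), assuming the region boundaries are copied by hand from the linker MAP file into constants. The code_region type, the REGION_A and REGION_B names, and every address and length are hypothetical placeholders, not values from a real DSP 6X memory map; UNIT32 is assumed to be a 32-bit unsigned type, and each ROM image is assumed to be as long as its RAM image. The check only confirms that the two run-time regions and the two ROM images do not overlap before the values are handed to the fault-tolerant modules.

```c
#include <stdint.h>

typedef uint32_t UNIT32;   /* assumed: the source's 32-bit unsigned type */

/* Hypothetical region descriptor, filled in from the MAP file.
 * All addresses and lengths below are placeholders. */
typedef struct {
    UNIT32 rom_start;      /* load address in ROM_A or ROM_B */
    UNIT32 ram_start;      /* run address in RAM_A or RAM_B */
    UNIT32 code_len;       /* code length in bytes (same in ROM and RAM) */
} code_region;

static const code_region REGION_A = { 0x90000000u, 0x00000000u, 0x00000800u };
static const code_region REGION_B = { 0x90000800u, 0x00000800u, 0x0000F800u };

/* Half-open ranges [start, start + len) must not overlap. */
static int ranges_disjoint(UNIT32 s1, UNIT32 n1, UNIT32 s2, UNIT32 n2)
{
    return s1 + n1 <= s2 || s2 + n2 <= s1;
}

/* Consistency check of the layout before the start addresses and lengths
 * are passed to xor_init, sig_code_To_Ram and all_code_To_Ram. */
static int layout_ok(const code_region *a, const code_region *b)
{
    return a->code_len != 0 && b->code_len != 0 &&
           ranges_disjoint(a->ram_start, a->code_len, b->ram_start, b->code_len) &&
           ranges_disjoint(a->rom_start, a->code_len, b->rom_start, b->code_len);
}
```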
Step (2) is implemented as follows:
(21) First start channel DMA1 to move the program code from program memory region RAM_B, which the CPU cannot read or write directly, to the data region;
(22) After the move, dynamically generate the check parameter of the moved RAM_B code with the checking algorithm;
(23) Store the check parameter in data RAM using the redundant storage method;
(24) Start channel DMA2 to move the program code from program memory region RAM_A, which the CPU cannot read or write directly, to the data region;
(25) After the move, dynamically generate the check parameter of the moved RAM_A code with the checking algorithm;
(26) Store the checksum in data RAM using the same redundant storage method.
Step (3), the fault-tolerant processing A module, is implemented as follows:
(31) Move function
Check whether channel DMA1 meets the program-move condition. If it does, set the DMA1 controller parameters, namely the source address, the destination address, and the DMA1 primary and secondary control registers: the source address is set to the start address of region RAM_A and the destination address to the data region, and the DMA1 primary and secondary control registers are written to start the program move. The program-move condition is then set to "cannot move", and the DMA1 program-repair condition is set to "can repair". DMA1 completes the move autonomously, after which the program code of the source region has been copied in full to the destination data region;
(32) Check-and-monitor function
a. After DMA1 completes the move, perform SEU error detection: dynamically generate the check parameter of the moved program code with the checking algorithm, which is the same as in step (2) of claim 1;
b. Refresh the redundantly stored check parameter of the fault-tolerant processing A module (generated at initialization) so that its redundant copies agree, obtaining the stored reference check parameter;
c. Compare the check parameter from step b with the one from step a; if they differ, a single event upset is deemed to have occurred. This provides the monitoring function;
(33) Error correction
a. If the comparison in step c shows a mismatch, set the source address of DMA1 to the start address of region ROM_A and the destination address to the start address of region RAM_A; set the DMA1 program-repair condition to "cannot repair"; and start the DMA1 transfer through its primary and secondary control registers. The original program code stored in ROM_A is loaded into RAM, overwriting the corrupted code and thereby correcting the SEU error;
b. After error correction completes, set DMA1 to the "can move" condition and its repair condition to "can repair", so that checking and correction of the RAM_A code repeat cyclically.
Step (4), the fault-tolerant processing B module, is implemented as follows:
(41) Move function
Check whether channel DMA2 meets the program-move condition. If it does, set the DMA2 controller parameters, namely the source address, the destination address, and the DMA2 primary and secondary control registers: the source address is set to the start address of region RAM_B and the destination address to the data region, and the DMA2 primary and secondary control registers are written to start the program move. The program-move condition is then set to "cannot move", and the DMA2 program-repair condition is set to "can repair". DMA2 completes the move autonomously, after which the program code of the source region has been copied in full to the destination data region;
(42) Check-and-monitor function
a. After DMA2 completes the move, perform SEU error detection: dynamically generate the check parameter of the moved program code with the checking algorithm, which is the same as in step (2) of claim 1;
b. Refresh the redundantly stored check parameter of the fault-tolerant processing B module (generated at initialization) so that its redundant copies agree, obtaining the stored reference check parameter;
c. Compare the check parameter from step b with the one from step a; if they differ, a single event upset is deemed to have occurred. This provides the monitoring function;
(43) Error correction
a. If the comparison in step c shows a mismatch, set the source address of DMA2 to the start address of region ROM_B and the destination address to the start address of region RAM_B; set the DMA2 program-repair condition to "cannot repair"; and start the DMA2 transfer through its primary and secondary control registers. The original program code stored in ROM_B is loaded into RAM, overwriting the corrupted code and thereby correcting the SEU error;
b. After error correction completes, set DMA2 to the "can move" condition and its repair condition to "can repair", so that checking and correction of the RAM_B code repeat cyclically.
The checking algorithm in step (2) is an XOR checksum, a cumulative sum, or a CRC check.
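The three checking algorithms named above can be sketched in C as follows. The XOR checksum and the cumulative sum work on 32-bit words; the CRC shown is the common reflected CRC-32 (polynomial 0xEDB88320), chosen here only as one example because the text does not specify a CRC variant. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR checksum over 32-bit words: any single flipped bit changes the result. */
static uint32_t check_xor(const uint32_t *w, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s ^= w[i];
    return s;
}

/* Cumulative (additive) sum modulo 2^32. */
static uint32_t check_sum(const uint32_t *w, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; ++i)
        s += w[i];
    return s;
}

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF). */
static uint32_t check_crc32(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; ++k)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

A single flipped bit changes all three results. The XOR checksum, however, is blind to the same bit flipping in two different words, because the two flips cancel; the cumulative sum is blind to a different set of double errors; and CRC-32 detects all two-bit errors in messages of moderate length, at the cost of more work per byte.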
The redundant storage in step (2) is 2-out-of-3 redundant storage.
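A minimal sketch of 2-out-of-3 redundant storage, assuming each check parameter is kept as three 32-bit copies in data RAM. The bitwise majority vote lets a single corrupted copy be outvoted, and rewriting all three copies with the voted value corresponds to the refresh of the redundantly stored check parameter in steps (32)b and (42)b. Function names are hypothetical.

```c
#include <stdint.h>

/* Bitwise 2-out-of-3 majority: each result bit takes the value held by at
 * least two of the three copies. */
static uint32_t vote_2of3(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & b) | (a & c) | (b & c);
}

/* Read a triple-stored parameter and rewrite all three copies with the voted
 * value, so a single upset in one copy does not accumulate over time. */
static uint32_t read_and_refresh(uint32_t copies[3])
{
    uint32_t v = vote_2of3(copies[0], copies[1], copies[2]);
    copies[0] = copies[1] = copies[2] = v;
    return v;
}
```

Because the vote is bitwise, upsets in different bit positions of different copies are also tolerated.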
Compared with the prior art, the present invention has the following advantages:
(1) The invention targets program memory that the CPU cannot read or write directly. Error detection and correction of the code in this memory is achieved purely in software, and the main processing is carried out by DMA, which takes little CPU time, so the program code keeps running in real time while errors are detected and corrected. No software fault-tolerance approach using DMA and dual redundant fault-tolerant processing for comprehensive on-line self-detection of single event upsets in processor-internal program code has yet been reported in the open literature, so the invention is novel and inventive; it also saves a large amount of hardware and time cost and improves efficiency;
(2) The prior art mainly uses periodic refreshing of application software or partial software reconfiguration for SEU hardening. These techniques are limited: with a long interval, there is no SEU protection between refreshes; with a short interval, frequent reloading affects the real-time execution of the program. In the present invention, the timing of SEU detection and correction in program memory is determined by the application's calling period, so its correction timing is better than that of the prior art;
(3) Because the prior art protects the program against single event upsets through periodic refreshing or software reconfiguration, which leaves gaps between refreshes, its statistics of single event upsets in space are inaccurate. In the present invention, checking is driven by the application's calling period, so the SEU counts are comprehensive and rich SEU information can be downlinked to the ground.
Description of drawings
Fig. 1 shows how the ROM space is mapped and moved into the internal memory space;
Fig. 2 shows the DMA move process;
Fig. 3 shows how the fault-tolerant processing modules are called;
Fig. 4 shows the implementation of the dynamic fault-tolerant parameter generation function;
Fig. 5 is the flowchart of fault-tolerant processing A;
Fig. 6 is the flowchart of fault-tolerant processing B.
Embodiment
The implementation is described below using the widely used DSP 6X series processors.
1. Storage address link configuration
After the DSP chip resets, the processor uses ROM boot loading to move all the code in ROM space to the program memory space addresses according to the memory-mapping relationship, as shown in Fig. 1.
Correspondence of load-address segments: the program code of all application programs, together with that of the fault-tolerant processing parameter generation module and the fault-tolerant processing A module, is placed in the ROM_B region and loaded into the corresponding program memory region RAM_B at run time; the program code of the fault-tolerant processing B module is placed in the ROM_A region and loaded into the corresponding program memory region RAM_A at run time. The sizes of ROM_B and ROM_A are determined by the size of the modules each contains.
2. Fault-tolerant processing parameter generation
The fault-tolerant processing parameter generation module generates the parameters dynamically only once, at CPU reset or power-up; this design keeps the redundant error-correction program as modular as possible. The module is called as void xor_init(UNIT32 *pAdd1, UNIT32 *pAdd2, UNIT32 pPro_len1, UNIT32 pPro_len2), where pAdd1 is the start address of region RAM_A, pAdd2 is the start address of region RAM_B, pPro_len1 is the code length of RAM_A, and pPro_len2 is the code length of RAM_B; the calling process is shown in Fig. 3. The module's implementation, shown in Fig. 4, consists mainly of the following steps:
(1) First configure channel DMA1: the source address is the start address of region RAM_A and the destination address is the data-region group address &xor_ram_data_sig. Set the DMA1 primary and secondary control registers and start DMA1 to move the program code from program memory region RAM_A, which the CPU cannot read or write directly, to the data region;
(2) After the move, compute the check parameter of the moved RAM_A code. The checking algorithm is an XOR checksum (a cumulative sum or CRC check may also be used to compute the check parameter);
(3) Store the check parameter in Xor_Single_Sum in data RAM using 2-out-of-3 redundant storage;
(4) Configure channel DMA2: the source address is the start address of region RAM_B and the destination address is the data-region group address &xor_ram_data_sig. Set the DMA2 primary and secondary control registers and start DMA2 to move the program code from program memory region RAM_B, which the CPU cannot read or write directly, to the data region;
(5) After the move, compute the check parameter of the moved RAM_B code. The checking algorithm is an XOR checksum (a cumulative sum or CRC check may also be used to compute the check parameter);
(6) Store the check parameter in Xor_All_Sum in data RAM using 2-out-of-3 redundant storage;
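The parameter-generation steps above can be approximated in portable C as follows. This is a host-side sketch, not the authors' code: memcpy stands in for the DMA1 and DMA2 transfers, UNIT32 is assumed to be a 32-bit unsigned type, region lengths are assumed to be in 32-bit words, and the staging-buffer size STAGE_WORDS is hypothetical. The names xor_init, xor_ram_data_sig, Xor_Single_Sum and Xor_All_Sum come from the text; storing each sum as a three-element array is an assumption about how the 2-out-of-3 storage is laid out. Moving each region through a fixed-size buffer mirrors the segment-by-segment DMA reads mentioned in the abstract.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t UNIT32;                 /* assumed 32-bit unsigned type */

#define STAGE_WORDS 256u                 /* hypothetical staging-buffer size */

static UNIT32 xor_ram_data_sig[STAGE_WORDS]; /* data-region destination buffer */
static UNIT32 Xor_Single_Sum[3];         /* RAM_A reference checksum, 3 copies */
static UNIT32 Xor_All_Sum[3];            /* RAM_B reference checksum, 3 copies */

/* Host stand-in for one DMA block transfer. On the DSP this would program the
 * channel's source, destination and control registers instead. */
static void dma_move(UNIT32 *dst, const UNIT32 *src, size_t words)
{
    memcpy(dst, src, words * sizeof *dst);
}

/* XOR checksum of a region, moved through the staging buffer chunk by chunk. */
static UNIT32 xor_region(const UNIT32 *start, UNIT32 len_words)
{
    UNIT32 sum = 0;
    for (UNIT32 off = 0; off < len_words; off += STAGE_WORDS) {
        UNIT32 n = len_words - off < STAGE_WORDS ? len_words - off : STAGE_WORDS;
        dma_move(xor_ram_data_sig, start + off, n);
        for (UNIT32 i = 0; i < n; ++i)
            sum ^= xor_ram_data_sig[i];
    }
    return sum;
}

/* Sketch of xor_init: called once after reset or power-up. */
void xor_init(UNIT32 *pAdd1, UNIT32 *pAdd2, UNIT32 pPro_len1, UNIT32 pPro_len2)
{
    UNIT32 sa = xor_region(pAdd1, pPro_len1);  /* steps (1)-(2): RAM_A */
    UNIT32 sb = xor_region(pAdd2, pPro_len2);  /* steps (4)-(5): RAM_B */
    for (int k = 0; k < 3; ++k) {              /* steps (3) and (6): 3 copies */
        Xor_Single_Sum[k] = sa;
        Xor_All_Sum[k] = sb;
    }
}
```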
3. Execution of the fault-tolerant processing A module
The application code calls the fault-tolerant processing A module periodically. Its call interface is void sig_code_To_Ram(UNIT32 *pAdd1, UNIT32 pPro_len1), where pAdd1 is the start address of the RAM_A code and pPro_len1 is the RAM_A code length; the calling process is shown in Fig. 3. The implementation of module A is shown in Fig. 5. It comprises program-code moving, check monitoring and error correction, implemented as follows:
(1) Program code move
a. Check whether channel DMA1 meets the program-move condition (the condition is met when the flag is 0xAA, and it is initialized to 0xAA). If it does, set the DMA1 controller parameters: the source address is the start address of region RAM_A and the destination address is the data region. Write the DMA1 primary and secondary control registers to start the move, and set the program-move condition to "cannot move" (0x0);
b. Set the DMA1 program-repair condition to "can repair" (0xAA). DMA1 completes the move autonomously, after which the program code of the source region has been copied in full to the destination data region;
(2) Program code check monitoring
a. After DMA1 completes the move, dynamically generate the check parameter of the moved RAM_A code with the checking algorithm, an XOR checksum (a cumulative sum or CRC check may also be used to compute the check parameter). The algorithm used in this step must be the same as the one used when the fault-tolerant parameters were generated;
b. Refresh the redundantly stored check parameter Xor_Single_Sum of the fault-tolerant processing A module so that its redundant copies agree, obtaining the stored reference check parameter;
c. Compare the check parameter from step b with the one from step a; if they differ, a single event upset is deemed to have occurred. This provides the monitoring function;
(3) Program code error correction
a. If the comparison in step c shows a mismatch, set the source address of DMA1 to the start address of region ROM_A and the destination address to the start address of region RAM_A; set the DMA1 program-repair condition to "cannot repair" (0x0); and start the DMA1 transfer through its primary and secondary control registers. The original program code stored in ROM_A is loaded into RAM, overwriting the corrupted code and thereby correcting the SEU error;
b. After error correction completes, set DMA1 to "can move" (0xAA) and its repair condition to "can repair" (0xAA), so that checking and correction of the RAM_A code repeat cyclically.
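One periodic call of the fault-tolerant processing A module, steps (1) to (3), might look like the sketch below. It is a host-side model under stated assumptions: memcpy replaces the DMA1 transfers, so each move finishes before the next statement and the 0xAA/0x0 conditions only mirror the handshake; g_rom_a, g_stage, dma1_move_cond, dma1_repair_cond and seu_count_a are hypothetical globals; the length is assumed to be in words. The module B routine, all_code_To_Ram, would be the same with DMA2, RAM_B, ROM_B and Xor_All_Sum.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t UNIT32;                  /* assumed 32-bit unsigned type */

#define COND_OK 0xAAu                     /* "condition met" value from the text */
#define COND_NO 0x00u

/* Hypothetical globals standing in for state the text implies. */
static const UNIT32 *g_rom_a;             /* golden copy of RAM_A in ROM_A */
static UNIT32 g_stage[64];                /* data-region staging buffer */
static UNIT32 Xor_Single_Sum[3];          /* reference checksum, 3 copies */
static unsigned dma1_move_cond = COND_OK; /* initialized to 0xAA */
static unsigned dma1_repair_cond = COND_NO;
static unsigned seu_count_a;              /* upsets detected in RAM_A */

static UNIT32 vote3(const UNIT32 v[3])
{
    return (v[0] & v[1]) | (v[0] & v[2]) | (v[1] & v[2]);
}

void sig_code_To_Ram(UNIT32 *pAdd1, UNIT32 pPro_len1)
{
    if (dma1_move_cond != COND_OK)
        return;                           /* (1)a: previous move not finished */
    if (pPro_len1 > sizeof g_stage / sizeof g_stage[0])
        return;                           /* sketch limit: region must fit */
    memcpy(g_stage, pAdd1, pPro_len1 * sizeof *g_stage);  /* RAM_A -> data */
    dma1_move_cond = COND_NO;
    dma1_repair_cond = COND_OK;           /* (1)b */

    UNIT32 sum = 0;                       /* (2)a: recompute the checksum */
    for (UNIT32 i = 0; i < pPro_len1; ++i)
        sum ^= g_stage[i];

    UNIT32 ref = vote3(Xor_Single_Sum);   /* (2)b: refresh the 3 copies */
    Xor_Single_Sum[0] = Xor_Single_Sum[1] = Xor_Single_Sum[2] = ref;

    if (sum != ref) {                     /* (2)c: mismatch means an upset */
        dma1_repair_cond = COND_NO;       /* (3)a: reload RAM_A from ROM_A */
        memcpy(pAdd1, g_rom_a, pPro_len1 * sizeof *pAdd1);
        seu_count_a++;
    }
    dma1_move_cond = COND_OK;             /* (3)b: re-arm (done on both paths) */
    dma1_repair_cond = COND_OK;
}
```

The counter seu_count_a illustrates the statistics advantage claimed in point (3): every detected upset is counted at the moment it is corrected.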
4. Execution of the fault-tolerant processing B module
The application code calls the fault-tolerant processing B module periodically. Its call interface is void all_code_To_Ram(UNIT32 *pAdd2, UNIT32 pPro_len2), where pAdd2 is the start address of the RAM_B code and pPro_len2 is the RAM_B code length; the calling process is shown in Fig. 3. The implementation of module B is shown in Fig. 6. It comprises program-code moving, check monitoring and error correction, implemented as follows:
(1) Program code move
a. Check whether channel DMA2 meets the program-move condition (the condition is met when the flag is 0xAA, and it is initialized to 0xAA). If it does, set the DMA2 controller parameters: the source address is the start address of region RAM_B and the destination address is the data region. Write the DMA2 primary and secondary control registers to start the move, and set the program-move condition to "cannot move" (0x0);
b. Set the DMA2 program-repair condition to "can repair" (0xAA). DMA2 completes the move autonomously, after which the program code of the source region has been copied in full to the destination data region;
(2) Program code check monitoring
a. After DMA2 completes the move, dynamically generate the check parameter of the moved program code with the checking algorithm, an XOR checksum (a cumulative sum or CRC check may also be used to compute the check parameter). The algorithm used in this step must be the same as the one used when the fault-tolerant parameters were generated;
b. Refresh the redundantly stored check parameter Xor_All_Sum of the fault-tolerant processing B module so that its redundant copies agree, obtaining the stored reference check parameter;
c. Compare the check parameter from step b with the one from step a; if they differ, a single event upset is deemed to have occurred. This provides the monitoring function;
(3) Program code error correction
a. If the comparison in step c shows a mismatch, set the source address of DMA2 to the start address of region ROM_B and the destination address to the start address of region RAM_B; set the DMA2 program-repair condition to "cannot repair" (0x0); and start the DMA2 transfer through its primary and secondary control registers. The original program code stored in ROM_B is loaded into RAM, overwriting the corrupted code and thereby correcting the SEU error;
b. After error correction completes, set DMA2 to "can move" (0xAA) and its repair condition to "can repair" (0xAA), so that checking and correction of the RAM_B code repeat cyclically.
The above describes SEU error detection and correction for the code area of DSP internal RAM. In fact, by changing the input parameters, the method applies to error detection and correction of code and data in any RAM region, including error correction of constants in the data region. When the application code is large, it can be checked and corrected segment by segment to preserve the real-time behavior of the application code while still achieving SEU detection and correction of the code.
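The segment-by-segment variant mentioned above can be sketched as a round-robin checker that verifies one segment per call and reloads only the damaged segment from ROM, which bounds the work done in each call. The segment size, the fixed MAX_SEGS limit, all type and function names, and the use of memcpy in place of DMA are assumptions for illustration. Per-segment reference checksums are computed once from the freshly loaded RAM image, as the parameter generation module does for whole regions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_WORDS 4u    /* hypothetical segment length in 32-bit words */
#define MAX_SEGS  16u   /* region must fit in MAX_SEGS * SEG_WORDS words */

typedef struct {
    const uint32_t *rom;            /* golden image in ROM */
    uint32_t       *ram;            /* running image in program RAM */
    size_t          words;          /* region length in words */
    uint32_t        ref[MAX_SEGS];  /* reference XOR checksum per segment */
    size_t          next;           /* segment to check on the next call */
} seg_guard;

static uint32_t seg_xor(const uint32_t *p, size_t n)
{
    uint32_t s = 0;
    while (n--)
        s ^= *p++;
    return s;
}

static size_t seg_count(const seg_guard *g)
{
    return (g->words + SEG_WORDS - 1) / SEG_WORDS;
}

/* Length of segment s; the last segment may be shorter. */
static size_t seg_len(const seg_guard *g, size_t s)
{
    size_t off = s * SEG_WORDS;
    return g->words - off < SEG_WORDS ? g->words - off : SEG_WORDS;
}

/* Run once after the code has been loaded into RAM. */
static void seg_init(seg_guard *g)
{
    for (size_t s = 0; s < seg_count(g); ++s)
        g->ref[s] = seg_xor(g->ram + s * SEG_WORDS, seg_len(g, s));
    g->next = 0;
}

/* Check one segment per call; returns 1 if that segment was reloaded. */
static int seg_step(seg_guard *g)
{
    size_t s = g->next;
    size_t off = s * SEG_WORDS;
    size_t n = seg_len(g, s);
    int repaired = 0;

    if (seg_xor(g->ram + off, n) != g->ref[s]) {
        memcpy(g->ram + off, g->rom + off, n * sizeof *g->ram); /* DMA on target */
        repaired = 1;
    }
    g->next = (s + 1) % seg_count(g);
    return repaired;
}
```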

Claims (4)

1. can survey the software fault-tolerant method of single-particle inversion by On-line self-diagnosis comprehensively, it is characterized in that comprising: the execution of storage address link configuration, fault-tolerant processing parameter generation module, fault-tolerant processing A module and fault-tolerant processing B module, step is as follows:
(1) storage address link configuration
Storage address is divided into ROM_A district, ROM_B, according to the RAM_A of the corresponding program storage of compiling link difference, RAM_B; The program code of fault-tolerant processing B module is only placed in described ROM_A district, and the program code of all application programs is placed in described ROM_B district, and the program code of fault-tolerant processing parameter generation module and fault-tolerant processing A module;
(2) fault-tolerant processing parameter generation module
Two DMA channels, DMA1 and DMA2, copy the program code of RAM_A and RAM_B, respectively, to the corresponding data areas. The check algorithm dynamically generates all check parameters for the fault-tolerant processing A module and the fault-tolerant processing B module, and each set of check parameters is stored redundantly;
(3) fault-tolerant processing A module
Fault-tolerant processing A autonomously monitors, in real time, the program code running in the RAM_A region of program memory. Once this code suffers a single event upset error, the module records the error information and starts error correction: program code read from ROM_A overwrites the RAM_A program code, ensuring that the RAM_A program code continues to run safely and correctly;
(4) fault-tolerant processing B module
Fault-tolerant processing B autonomously monitors, in real time, the program code running in the RAM_B region of program memory. Once this code suffers a single event upset error, the module records the error information and starts error correction: program code read from ROM_B overwrites the RAM_B program code, ensuring that the RAM_B program code continues to run safely and correctly;
(5) The start addresses of the ROM_A, RAM_A, ROM_B and RAM_B regions in step (1) are obtained from the MAP file generated by compilation and linking. The program code information is read from this file, and the program code lengths of the RAM_A and RAM_B regions are determined. These values are passed as arguments to the fault-tolerant processing parameter generation module, the fault-tolerant processing A module and the fault-tolerant processing B module, so that SEU errors anywhere in the program code can be detected and corrected online;
The specific implementation of step (3), the fault-tolerant processing A module, is as follows:
(31) transfer function
Check whether the DMA1 channel satisfies the program transfer condition. If it does, set the controller parameters of the DMA1 channel, namely its source address, destination address, and primary and secondary control registers. The source address is set to the start address of the RAM_A region and the destination address to the designated data-area address. The program transfer is started through the DMA1 primary and secondary control registers. The program transfer flag is set to "transferring", and the program repair flag of the DMA1 channel is set to "repairable". The DMA1 channel then completes the transfer on its own, after which all program code in the source address range has been copied to the destination data area;
(32) check-and-monitor function
A. After the DMA1 channel has completed the transfer, SEU error detection is performed. The check algorithm, the same as in step (2) of claim 1, dynamically generates the check parameter of the transferred program code;
B. The redundantly stored check parameters of the fault-tolerant processing A module are refreshed so that all redundant copies agree, yielding the stored reference check parameter;
C. The check parameter from step B is compared with the one from step A. If they differ, a single event upset error is deemed to have occurred, which provides the monitoring function;
(33) error correction
A. If the comparison in step C above shows a mismatch, set the source address of the DMA1 channel to the start address of the ROM_A region and its destination address to the start address of the RAM_A region. Set the program repair flag of the DMA1 channel to "not repairable", then start the DMA1 program-code transfer through the channel's primary and secondary control registers. The original program code stored in the ROM_A region is loaded into RAM and overwrites the corrupted program code, which corrects the single event upset error;
B. Once error correction is complete, set the DMA1 channel to the "transfer allowed" state and its repair flag to "repairable", then return to the cyclic checking and correction of the RAM_A program code;
The specific implementation of step (4), the fault-tolerant processing B module, is as follows:
(41) transfer function
Check whether the DMA2 channel satisfies the program transfer condition. If it does, set the controller parameters of the DMA2 channel, namely its source address, destination address, and primary and secondary control registers. The source address is set to the start address of the RAM_B region and the destination address to the designated data-area address. The program transfer is started through the DMA2 primary and secondary control registers. The program transfer flag is set to "transferring", and the program repair flag of the DMA2 channel is set to "repairable". The DMA2 channel then completes the transfer on its own, after which all program code in the source address range has been copied to the destination data area;
(42) check-and-monitor function
A. After the DMA2 channel has completed the transfer, SEU error detection is performed. The check algorithm, the same as in step (2) of claim 1, dynamically generates the check parameter of the transferred program code;
B. The redundantly stored check parameters of the fault-tolerant processing B module are refreshed so that all redundant copies agree, yielding the stored reference check parameter;
C. The check parameter from step B is compared with the one from step A. If they differ, a single event upset error is deemed to have occurred, which provides the monitoring function;
(43) error correction
A. If the comparison in step C above shows a mismatch, set the source address of the DMA2 channel to the start address of the ROM_B region and its destination address to the start address of the RAM_B region. Set the program repair flag of the DMA2 channel to "not repairable", then start the DMA2 program-code transfer through the channel's primary and secondary control registers. The original program code stored in the ROM_B region is loaded into RAM and overwrites the corrupted program code, which corrects the single event upset error;
B. Once error correction is complete, set the DMA2 channel to the "transfer allowed" state and its repair flag to "repairable", then return to the cyclic checking and correction of the RAM_B program code.
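Claim 1 pairs two checkers: module A guards the region holding module B, and module B guards the region holding module A and the applications. The sketch below models this interplay of steps (1) to (4) under stated assumptions:

* `Region`, `xor_check`, `vote` and `run_cycle` are hypothetical names;
* memory regions are lists of 16-bit words;
* a list copy stands in for the DMA transfer to the data area;
* the 2-out-of-3 vote is taken on whole words.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Tuple

ALLOWED = 0xAA   # flag value quoted in the description
BLOCKED = 0x00


def xor_check(words: List[int]) -> int:
    """16-bit XOR checksum, one of the algorithms listed in claim 3."""
    return reduce(lambda acc, w: acc ^ w, words, 0) & 0xFFFF


def vote(copies: List[int]) -> int:
    """2-out-of-3 vote on whole words (claim 4)."""
    a, b, c = copies
    return a if a in (b, c) else b


@dataclass
class Region:
    """Hypothetical model of a program-RAM region, its ROM image and its
    triple-stored check value."""
    rom: List[int]
    ram: List[int]
    refs: List[int] = field(default_factory=list)
    move_flag: int = ALLOWED
    repair_flag: int = ALLOWED
    upsets: int = 0

    def generate_params(self) -> None:
        # Step (2): compute the check of the loaded code and store it 3 times.
        self.refs = [xor_check(self.ram)] * 3

    def monitor(self) -> bool:
        """One pass of steps (31)-(33); returns True if a repair was made."""
        if self.move_flag != ALLOWED:
            return False
        data_area = list(self.ram)            # (31) simulated DMA copy
        current = xor_check(data_area)        # (32) A: recompute check
        ref = vote(self.refs)                 # (32) B: vote ...
        self.refs = [ref] * 3                 #         ... and refresh copies
        if current == ref:                    # (32) C: compare
            return False
        self.upsets += 1                      # record the error
        self.repair_flag = BLOCKED            # (33) A: reload from ROM
        self.ram[:] = self.rom
        self.move_flag = ALLOWED              # (33) B: resume checking
        self.repair_flag = ALLOWED
        return True


def run_cycle(ram_a: Region, ram_b: Region) -> Tuple[bool, bool]:
    """Module A checks RAM_A (which holds module B); module B checks RAM_B
    (which holds module A and the applications), so each checker is checked."""
    return ram_a.monitor(), ram_b.monitor()


# Made-up code words; step (1) places module B alone in RAM_A.
ram_a = Region(rom=[0x1111, 0x2222], ram=[0x1111, 0x2222])
ram_b = Region(rom=[0x00A0, 0x00B1, 0x00C2], ram=[0x00A0, 0x00B1, 0x00C2])
ram_a.generate_params()
ram_b.generate_params()
ram_b.ram[1] ^= 0x0008          # upset in module A's region
print(run_cycle(ram_a, ram_b))  # (False, True)
```

Because neither checker lives in the region it guards, an upset in one checker's own code is caught and repaired by the other.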
2. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the specific implementation of step (2) is as follows:
(21) first, start the DMA1 channel to copy program code from the RAM_B region of program memory, which the CPU cannot read or write directly, to the data area;
(22) after the transfer completes, the check algorithm dynamically generates the check parameter of the program code copied from the RAM_B region;
(23) the check parameter is stored in data RAM using the redundant storage method;
(24) start the DMA2 channel to copy program code from the RAM_A region of program memory, which the CPU cannot read or write directly, to the data area;
(25) after the transfer completes, the check algorithm dynamically generates the check parameter of the program code copied from the RAM_A region;
(26) the check parameter is stored in RAM using the redundant storage method.
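The sketch below illustrates two points of claim 2: its ordering, with RAM_B copied first over DMA1 and then RAM_A over DMA2, and the restriction that program memory is reachable only through DMA. `ProgramRam`, `dma_to_data`, `sum16` and `generate_parameters` are hypothetical names. The `log` list only records which channel performed each copy, and the 16-bit additive checksum is one of the options claim 3 allows.

```python
from typing import Dict, List


class ProgramRam:
    """Program memory the CPU cannot read as data (hypothetical model).

    Its contents are reachable only through `dma_to_data`, which stands in
    for a DMA channel copying the region into the data area.
    """

    def __init__(self, words: List[int]) -> None:
        self._words = list(words)

    def dma_to_data(self, channel: str, log: List[str]) -> List[int]:
        log.append(channel)           # record which channel did the copy
        return list(self._words)


def sum16(words: List[int]) -> int:
    """16-bit additive (cumulative-sum) checksum."""
    return sum(words) & 0xFFFF


def generate_parameters(ram_b: ProgramRam, ram_a: ProgramRam,
                        log: List[str]) -> Dict[str, List[int]]:
    """Steps (21)-(26): copy each region, compute its check, keep 3 copies."""
    params: Dict[str, List[int]] = {}
    params["RAM_B"] = [sum16(ram_b.dma_to_data("DMA1", log))] * 3  # (21)-(23)
    params["RAM_A"] = [sum16(ram_a.dma_to_data("DMA2", log))] * 3  # (24)-(26)
    return params


log: List[str] = []
params = generate_parameters(ProgramRam([0xFFFF, 0x0002]), ProgramRam([0x0010]), log)
print(log, params)  # ['DMA1', 'DMA2'] {'RAM_B': [1, 1, 1], 'RAM_A': [16, 16, 16]}
```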
3. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the check algorithm in step (2) is an XOR checksum, an additive (cumulative-sum) checksum, or a CRC check.
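The three check algorithms named in claim 3 are sketched below over byte data. The CRC variant chosen here is an assumption: CRC-16/CCITT with polynomial 0x1021 and initial value 0xFFFF. The claim does not fix a polynomial or word width.

```python
def xor_checksum(data: bytes) -> int:
    """8-bit XOR of all bytes."""
    acc = 0
    for byte in data:
        acc ^= byte
    return acc


def add_checksum(data: bytes) -> int:
    """Cumulative sum of all bytes, truncated to 16 bits."""
    return sum(data) & 0xFFFF


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT (poly 0x1021, init 0xFFFF, no reflection).

    These CRC parameters are an assumption; the claim leaves them open.
    """
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


print(hex(crc16_ccitt(b"123456789")))  # 0x29b1 (standard check value)
```

The three checks differ in what they miss:

* **XOR checksum:** misses two flips in the same bit position of different bytes. For example, `b"\x01\x01"` and `b"\x00\x00"` both give 0.
* **Additive sum:** catches that case but still misses compensating errors, such as the same bit set in one byte and cleared in another.
* **CRC:** detects both cases, at the cost of more work per byte.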
4. The software fault-tolerant method capable of comprehensive online self-detection of single event upsets according to claim 1, characterized in that the redundant storage in step (2) is 2-out-of-3 (triple) redundant storage.
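A 2-out-of-3 store can be voted word by word or bit by bit. The sketch below assumes a bitwise majority, `(a & b) | (a & c) | (b & c)`, which also recovers the value when two copies are corrupted in different bit positions. `TripleStore` is a hypothetical name. Writing the voted value back on every read mirrors the refresh described in steps (32) B and (42) B.

```python
from typing import List


class TripleStore:
    """Hypothetical 2-of-3 store: three copies of a check value.

    The vote is bitwise, so each bit takes the value that at least two
    copies agree on; the voted value is written back on every read.
    """

    def __init__(self, value: int) -> None:
        self.copies: List[int] = [value, value, value]

    def read(self) -> int:
        a, b, c = self.copies
        voted = (a & b) | (a & c) | (b & c)   # per-bit majority
        self.copies = [voted, voted, voted]   # refresh all three copies
        return voted


store = TripleStore(0x1234)
store.copies[0] ^= 0x0001     # two copies hit in different bits
store.copies[2] ^= 0x8000
print(hex(store.read()))      # 0x1234
```

If two copies are corrupted in the same bit, the vote returns the wrong value. The scheme relies on such coincident upsets being rare between refreshes.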
Application CN201110387908.9A; priority date 2011-11-29; filing date 2011-11-29; status: Active; CN102521062B (en): Software fault-tolerant method capable of comprehensively on-line self-detection single event upset

Publications (2)

Publication Number Publication Date
CN102521062A 2012-06-27
CN102521062B 2015-02-11

Family

ID=46291997

Country Status (1)

Country Link
CN (1) CN102521062B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant