CN109086880A - Arithmetic device and method - Google Patents

Arithmetic device and method Download PDF

Info

Publication number
CN109086880A
CN109086880A CN201710441977.0A CN201710441977A
Authority
CN
China
Prior art keywords
judgment
rule
data
multiplier
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710441977.0A
Other languages
Chinese (zh)
Other versions
CN109086880B (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201710441977.0A priority Critical patent/CN109086880B/en
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN202110597369.5A priority patent/CN113449855A/en
Priority to EP19217768.1A priority patent/EP3657403A1/en
Priority to PCT/CN2018/090901 priority patent/WO2018228399A1/en
Priority to EP18818258.8A priority patent/EP3637327B1/en
Publication of CN109086880A publication Critical patent/CN109086880A/en
Priority to US16/698,984 priority patent/US11544543B2/en
Priority to US16/698,976 priority patent/US11544542B2/en
Priority to US16/698,988 priority patent/US11537858B2/en
Application granted granted Critical
Publication of CN109086880B publication Critical patent/CN109086880B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

The present disclosure provides an arithmetic device, comprising: a computing module, including one or more arithmetic units; and a control module, including an operation control unit for controlling the shutdown of the arithmetic units according to a judgment condition. The disclosure also provides an operation method. The low-power arithmetic device and method of the disclosure have high flexibility and can be combined with software-level improvements, so as to further increase operation speed, reduce the amount of computation, and reduce the operating power consumption of an accelerator.

Description

Arithmetic device and method
Technical field
The present disclosure relates to the field of artificial intelligence, and in particular to an arithmetic device and method.
Background art
Deep neural networks are the foundation of many current artificial intelligence applications, and have achieved breakthrough applications in speech recognition, image processing, data analysis, advertisement recommendation systems, automated driving, and other areas, so that deep neural networks are applied in many aspects of life. However, the enormous amount of computation of deep neural networks has always constrained their faster development and wider application. When accelerator designs are considered to speed up the operation of deep neural networks, the enormous amount of computation inevitably brings large energy consumption overhead, which likewise constrains the further wide application of accelerators.
In terms of hardware, existing common accelerator architectures are mainly designed by analyzing the most time-consuming parts of the computation and then accelerating them in a targeted manner. Taking convolutional neural networks as an example, as shown in Figure 1, the common existing acceleration structure for inner-product operations is usually a "multiply-add" structure, i.e., a group of product values is obtained by a group of multipliers in one clock cycle and then accumulated in parallel to obtain the final result. However, this structure is not very flexible, and cannot further increase operation speed or reduce the amount of computation.
Summary of the invention
(1) Technical problems to be solved
In order to solve or at least partly alleviate the above technical problem, the present disclosure provides a low-power arithmetic device and method. The low-power arithmetic device and method of the disclosure have high flexibility and can be combined with software-level improvements, so as to further increase operation speed, reduce the amount of computation, and reduce the operating power consumption of an accelerator.
(2) Technical solutions
According to one aspect of the disclosure, an arithmetic device is provided, comprising:
A computing module, including one or more arithmetic units; and
A control module, including an operation control unit for controlling the shutdown of the arithmetic units of the computing module according to a judgment condition.
In some embodiments, each arithmetic unit includes one or more arithmetic components, each of which is an adder, a multiplier, a selector, or a temporary buffer.
In some embodiments, the computing module includes n multipliers at the first stage and an n-input adder tree at the second stage, where n is a positive integer.
In some embodiments, the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
In some embodiments, the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
In some embodiments, the judgment condition is a function-mapping judgment condition, i.e., judging whether a specified condition is met after a function transformation.
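As a minimal illustration, the threshold and function-mapping judgment conditions described above might be sketched as simple predicates. This is a behavioral sketch only; the function names (`below_threshold`, `in_range`, `mapped_condition`) are illustrative and not from the disclosure:

```python
def below_threshold(x, threshold):
    # Threshold judgment condition: true when |x| < threshold,
    # in which case the corresponding unit would be shut down.
    return abs(x) < threshold

def in_range(x, lo, hi):
    # Threshold judgment condition: the value lies within a given range.
    return lo <= x <= hi

def mapped_condition(x, f, threshold):
    # Function-mapping judgment condition: judge the value after a
    # function transformation f, e.g. f = lambda v: v * v.
    return f(x) < threshold
```

For example, with the mapping f(x) = x², `mapped_condition(0.1, lambda v: v * v, 0.05)` is true, so an input of 0.1 would be gated under that condition.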
In some embodiments, the n multipliers of the first stage are each connected to the operation control unit, and the operation control unit controls the shutdown of the multipliers according to the judgment condition.
In some embodiments, the operation control unit judges the data to be operated on according to the judgment condition; when it judges that the absolute value of the input data of a multiplier is less than the given threshold, it shuts down that multiplier.
In some embodiments, the adder tree includes k stages of adders; the first stage has n/2 adders, and the last (k-th) stage has 1 adder, where 2^k = n. The n/2 adders of stage 1 of the adder tree are connected to the n multipliers and receive the data signals and control signals sent by the multipliers; the adders of stages 2 through k are each connected to the adders of the preceding stage and receive the data signals and control signals those adders send.
In some embodiments, if a multiplier receives a shutdown signal sent by the operation control unit, it inputs control signal 0 to the stage-1 adder of the adder tree; otherwise, it sends its product to the stage-1 adder of the adder tree and inputs control signal 1. If the two control signals an adder receives are both 1, it accumulates its input values, sends the sum to the lower-stage adder, and sends control signal 1 downward; if one control signal it receives is 1 and the other is 0, the input data at the end whose signal is 1 is passed directly downward, with control signal 1 input to the lower stage; if the two control signals it receives are both 0, the adder is shut down, and control signal 0 is input to the lower-stage adder. This continues until the adder tree has accumulated the final result.
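Assuming a gated multiplier contributes control signal 0 and an active one contributes its product with control signal 1, the pairwise reduction rule of the adder tree might be simulated as follows (a behavioral sketch under a threshold condition, not a hardware description; n is assumed to be a power of two as stated above):

```python
def gate_multipliers(neurons, synapses, threshold):
    # Stage 1: each multiplier emits (product, ctrl=1) when active,
    # or (None, ctrl=0) when shut down by the judgment condition.
    return [(a * b, 1) if abs(a) >= threshold else (None, 0)
            for a, b in zip(neurons, synapses)]

def adder_tree(signals):
    # Reduce pairs stage by stage: ctrl 1 & 1 -> add and pass ctrl 1;
    # 1 & 0 -> pass the live input through; 0 & 0 -> adder shut down,
    # propagate ctrl 0. Assumes len(signals) is a power of two.
    while len(signals) > 1:
        nxt = []
        for (va, ca), (vb, cb) in zip(signals[::2], signals[1::2]):
            if ca and cb:
                nxt.append((va + vb, 1))
            elif ca:
                nxt.append((va, 1))
            elif cb:
                nxt.append((vb, 1))
            else:
                nxt.append((None, 0))
        signals = nxt
    value, ctrl = signals[0]
    return value if ctrl else 0

neurons  = [0.9, 0.01, 0.5, 0.02]   # two inputs fall below threshold 0.1
synapses = [1.0, 1.0, 2.0, 2.0]
result = adder_tree(gate_multipliers(neurons, synapses, 0.1))
```

Here the two gated lanes never perform a multiply or an add; the surviving products 0.9 and 1.0 are passed through and summed, so power is saved without changing the depth of the tree.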
In some embodiments, the n multipliers of the first stage and the n-input adder tree of the second stage are each connected to the operation control unit, which judges the data to be operated on according to the judgment condition; when it judges that the absolute value of the input data of a multiplier or adder is less than the given threshold, it shuts down that multiplier or adder.
In some embodiments, the arithmetic device further includes: a data processing module for expanding or compressing data; correspondingly, the control module includes a data control unit for controlling the data processing module to expand or compress data.
In some embodiments, the data processing module expands and compresses data. If the synapse values are in sparse mode, i.e., a sparse network expressed with sparse coding, the neuron data is compressed according to the sparsity of the synapse values, and the compression screens out the neuron data that does not need to be operated on; or, if the neuron data is in sparse mode, the synapses are correspondingly compressed according to the sparsity of the neuron data, and the compression screens out the synapse data that does not need to be operated on; or, given a compression judgment condition, the synapse and/or neuron data is compressed according to that compression judgment condition.
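The index-based compression described here might be sketched as below, assuming the sparse side is stored as a list of nonzero positions (the concrete sparse-coding format is an assumption for illustration, not specified by the disclosure):

```python
def compress_by_sparsity(dense, sparse_indices):
    # Keep only the dense-side entries whose positions have a nonzero
    # counterpart on the sparse side; all others are screened out
    # because they would never contribute to a product.
    return [dense[i] for i in sparse_indices]

def compress_by_threshold(data, threshold):
    # Compression judgment condition (threshold form): screen out
    # values whose absolute value is below the given threshold,
    # keeping indices so the other operand can be aligned later.
    return [(i, v) for i, v in enumerate(data) if abs(v) >= threshold]

neurons = [0.5, 0.0, 0.3, 0.0, 0.8]
synapse_indices = [0, 2, 4]          # sparse synapses: nonzero at 0, 2, 4
packed = compress_by_sparsity(neurons, synapse_indices)   # [0.5, 0.3, 0.8]
```

After compression, only the packed neuron values are fed to the arithmetic units, so the units never see operands that a sparse synapse would have zeroed out anyway.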
In some embodiments, the compression judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
In some embodiments, the threshold judgment condition comprises: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
In some embodiments, the n multipliers of the first stage are each connected to the data processing module and receive the neuron data and synapse data output by the data processing module.
In some embodiments, the computing module includes m arithmetic units, each arithmetic unit including one multiplier, one adder, and one temporary buffer;
The control module includes m operation control units, each connected to the multiplier and adder of one arithmetic unit and controlling the shutdown of that multiplier and adder according to the judgment condition, where m is a positive integer.
In some embodiments, the multiplier of each arithmetic unit has three input ends and one output end; two input ends receive the neuron data and synapse data respectively, the other input end is for the input control signal, and the output end outputs the multiplication result;
The adder has three input ends and one output end; two input ends receive the multiplication result and the data input from the temporary buffer respectively, the other input end is for the input control signal, and the output end outputs the addition result, which is then saved back to the temporary buffer to serve as the input data for the next layer of addition.
In some embodiments, the synapse data is sent to each arithmetic unit by broadcast. If the neuron data input to an arithmetic unit is less than the threshold, the multiplier and adder of that arithmetic unit are shut down through the control signal, and the partial sum stored in the temporary buffer remains unchanged; otherwise, the two data values coming into the arithmetic unit are multiplied by the multiplier, accumulated with the data in the temporary buffer, and stored back to the temporary buffer after accumulation.
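A single such arithmetic unit might be modeled as follows, where the temporary buffer holds the running partial sum and a gated input leaves it untouched (a behavioral sketch; the class and method names are illustrative):

```python
class GatedMacUnit:
    """One arithmetic unit: multiplier + adder + temporary buffer."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.buffer = 0.0            # partial sum, initially 0

    def step(self, neuron, synapse):
        # If the incoming neuron value is below the threshold, the
        # multiplier and adder are shut down for this cycle and the
        # buffer keeps its previous partial sum.
        if abs(neuron) < self.threshold:
            return self.buffer
        self.buffer += neuron * synapse
        return self.buffer

unit = GatedMacUnit(threshold=0.1)
unit.step(0.5, 2.0)    # buffer -> 1.0
unit.step(0.01, 2.0)   # gated: buffer stays 1.0
unit.step(0.3, 2.0)    # buffer -> 1.6
```

The gated cycle performs no multiply and no add, which is the mechanism by which the device trades a small amount of judgment logic for reduced arithmetic power.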
In some embodiments, the arithmetic device further includes a data processing module and a memory module; one input end of the multiplier is connected to the data processing module to receive the compressed synapse data, and another input end is connected to the memory module to receive the neuron data.
In some embodiments, the computing module includes p arithmetic units, each arithmetic unit including one multiplier, one adder, and one selector;
The control module includes p operation control units, each connected to the multiplier and adder of one arithmetic unit and controlling the shutdown of that multiplier and adder according to the judgment condition, where p is a positive integer.
In some embodiments, the multiplier of each arithmetic unit has three input ends and one output end; two input ends receive the neuron data and synapse data respectively, the other input end is for the input control signal, and the output end outputs the multiplication result;
The adder of the 1st arithmetic unit has three input ends and one output end; two input ends receive the multiplication result and the data input by the selector of the same arithmetic unit respectively, the other input end is for the input control signal, and the output end outputs the addition result, which is sent through the selector into the next arithmetic unit to serve as the input data for the next stage of addition;
The adders of the 2nd through p-th arithmetic units each have three input ends and one output end; two input ends receive the multiplication result and the data input by the selector of the preceding arithmetic unit respectively, the other input end is for the input control signal, and the output end outputs the addition result, which is sent through the selector into the next arithmetic unit to serve as the input data for the next stage of addition.
In some embodiments, the arithmetic device further includes:
A memory module, connected to the control module, which controls the memory module to store or read the required data; the memory module is also connected to the computing module, inputs the data to be operated on into the computing module, and receives and stores the data after operation by the computing module.
In some embodiments, the control module includes a storage control unit for controlling the memory module to store or read the required data.
According to another aspect of the present disclosure, an operation method is provided, comprising:
Setting a judgment condition;
Controlling the shutdown of the arithmetic units of the computing module according to the judgment condition.
In some embodiments, the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
In some embodiments, the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
In some embodiments, the judgment condition is a function-mapping judgment condition, i.e., judging whether a specified condition is met after a function transformation.
In some embodiments, according to the judgment condition, if the absolute value of the input data of a multiplier is less than the given threshold, that multiplier is shut down.
In some embodiments, a multiplier that receives a shutdown signal inputs control signal 0 to the stage-1 adder of the adder tree; otherwise, it sends its product to the stage-1 adder of the adder tree and inputs control signal 1;
The stage-1 adders receive the control signals sent by the multipliers, and the adders of stages 2 through k each receive the control signals sent by the preceding stage of adders, until the adder tree has accumulated the final result; wherein,
If the two control signals an adder receives are both 1, it accumulates its input values, sends the sum to the lower-stage adder, and sends control signal 1 downward; if one control signal received is 1 and the other is 0, the input data at the end whose signal is 1 is passed directly downward, with control signal 1 input to the lower stage; if the two control signals received are both 0, the adder is shut down, and control signal 0 is input to the next-stage adder.
In some embodiments, the judgment condition is a threshold judgment condition; the operation control unit sets a threshold and compares the absolute value of the data input to the multiplier/adder with the threshold, and if it is less than the threshold, controls the shutdown of that multiplier/adder.
In some embodiments, if the multiplier of an arithmetic unit is not shut down, it performs multiplication on the input neuron data and synapse data and outputs the multiplication result; otherwise, the adder of that arithmetic unit is also shut down, and the partial sum stored in the temporary buffer remains unchanged;
If the adder of an arithmetic unit is not shut down, it receives the multiplication result and the data input from the temporary buffer, performs addition, and outputs the addition result, which is then saved back to the temporary buffer to serve as the input data for the next layer of addition, until the operation ends; otherwise, the partial sum stored in the temporary buffer remains unchanged.
In some embodiments, if the multiplier of an arithmetic unit is not shut down, it performs multiplication on the input neuron data and synapse data and outputs the multiplication result; otherwise, the adder of that arithmetic unit is also shut down, and the selector passes the upper-stage data directly to the selector of the next stage;
If the adder of an arithmetic unit is not shut down, it receives the multiplication result and the input data of the same stage or the data output by the upper-stage selector, performs addition, and outputs the addition result, which passes through the selector and is sent into the next arithmetic unit;
Otherwise, when the adder and multiplier are shut down, the selector selects the incoming input data of the same stage or the data output by the upper-stage selector as the output result of that arithmetic unit.
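The selector chain of this arrangement might be sketched as a fold over the units: a gated unit's selector simply forwards the upstream partial sum, while an active unit's adder accumulates its product onto it (a behavioral sketch under the threshold condition; the function name is illustrative):

```python
def selector_chain(neurons, synapses, threshold):
    # Partial sums flow through the chain of arithmetic units.
    # A shut-down unit's selector forwards the upstream value
    # unchanged; an active unit's adder accumulates its product
    # and the selector forwards the adder output instead.
    partial = 0.0
    for neuron, synapse in zip(neurons, synapses):
        if abs(neuron) < threshold:
            continue                  # selector bypasses this unit
        partial += neuron * synapse   # adder output chosen by selector
    return partial

total = selector_chain([0.9, 0.01, 0.5], [1.0, 1.0, 2.0], 0.1)
```

With threshold 0.1, the middle unit is bypassed entirely, so the chain produces 0.9 + 1.0 while the gated unit's multiplier and adder stay idle.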
In some embodiments, the method further includes: expanding or compressing the data to be operated on.
In some embodiments, the data processing module expands and compresses data. If the synapse values are in sparse mode, the neuron data is compressed according to the sparsity of the synapse values, and the compression screens out the neuron data that does not need to be operated on; or, if the neuron data is in sparse mode, the synapses are correspondingly compressed according to the sparsity of the neuron data, and the compression screens out the synapse data that does not need to be operated on; or, given a compression judgment condition, the synapse and/or neuron data is compressed according to that compression judgment condition.
(3) Beneficial effects
It can be seen from the above technical solutions that the low-power arithmetic device and method of the disclosure have at least one of the following beneficial effects:
(1) Data is differentiated, and according to the differences in the data, the corresponding components of the arithmetic device can be reasonably selected and configured. For example, whether to configure the data processing module can be chosen according to the storage mode of the data, such as whether it is a sparse representation; for another example, the number of operation groups to configure, and the number of multipliers and adders in each operation group, can be chosen on demand, providing high flexibility.
(2) When the operation data meets the given judgment condition, the corresponding multipliers and adders are shut down, so that the power consumption of the accelerator can be reduced while still meeting the demand, without affecting the operation speed of the accelerator.
(3) By adjusting the judgment condition, the proportion of adders and multipliers that are shut down can be controlled, thereby controlling how much energy consumption is reduced.
Brief description of the drawings
Fig. 1 is a functional block diagram of a prior-art arithmetic device.
Fig. 2 is a functional block diagram of a low-power arithmetic device according to the disclosure.
Fig. 3 is another functional block diagram of a low-power arithmetic device according to the disclosure.
Fig. 4 is a structural schematic diagram of the low-power arithmetic device proposed in Embodiment 1 of the disclosure.
Fig. 5 is a structural schematic diagram of the low-power arithmetic device proposed in Embodiment 2 of the disclosure.
Fig. 6 is a structural schematic diagram of the low-power arithmetic device proposed in Embodiment 3 of the disclosure.
Specific embodiments
To make the purposes, technical solutions, and advantages of the disclosure clearer, the disclosure is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
It should be noted that similar or identical parts use the same figure numbers in the drawings and the description. Implementations not shown or described in the drawings are forms known to a person of ordinary skill in the art. In addition, although examples containing particular parameter values may be provided herein, it should be understood that the parameters need not be exactly equal to the corresponding values, but may approximate them within acceptable error margins or design constraints. Furthermore, the directional terms mentioned in the following embodiments, such as "upper", "lower", "front", "rear", "left", "right", etc., are only directions with reference to the drawings; they are used for illustration and are not intended to limit the disclosure.
The present disclosure provides a low-power neural network operation device. Fig. 2 is a functional block diagram of the low-power arithmetic device according to the disclosure. As shown in Fig. 2, the low-power neural network operation device mainly includes: a control module, a memory module, and a computing module. The control module is connected to the memory module and controls the memory module to store or read the required data; the memory module is connected to the computing module, inputs the data to be operated on into the computing module, and receives and stores the data after operation by the computing module; the control module is also connected to the computing module and controls the working mode of the computing module according to the operation type and the operation data.
Specifically, the control module includes a storage control unit and an operation control unit. The storage control unit controls the memory module to store or read the required data; the operation control unit controls, according to the operation type and the data to be operated on, whether each component in the arithmetic device works and how it works. The operation control unit may be decoupled into each operation group and control each of them, or it may act as a whole and control them externally, either together or individually.
Further, as shown in Fig. 3, the low-power neural network operation device may also include a data processing module for expanding and compressing data; correspondingly, the control module includes a data control unit for controlling whether to expand or compress the data. The data processing module and the data control unit either exist together or are both absent.
The data processing module expands and compresses data. Specifically, when the synapse values are in sparse mode, i.e., expressed with a sparse representation, the neuron data is compressed according to the sparsity of the synapse values, screening out the neuron data that does not need to be operated on; or, when the neuron data is in sparse mode, the synapses are correspondingly compressed according to the sparsity of the neuron data, screening out the synapse data that does not need to be operated on; or, given a compression judgment condition, the synapse and/or neuron data is compressed according to that condition, screening out the data that meets the compression judgment condition.
The compression judgment condition includes a threshold judgment condition or a function-mapping judgment condition. The threshold judgment condition comprises: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
The memory module includes a data storage unit and a temporary buffer unit. One or more data storage units can be arranged on demand, i.e., the data to be operated on may be stored in the same region or stored separately; intermediate result values may likewise be stored in the same region or stored separately.
The computing module can take various structures and may include one or more arithmetic units, each arithmetic unit including one or more multipliers and one or more adders. Data to be operated on or intermediate result values are transmitted between the arithmetic units in a certain direction.
The main operation process of the low-power neural network operation device of the disclosure is as follows. The storage control unit issues a read control signal to the storage part to read the data to be operated on. If the data read includes both compressed-mode and expanded data, the data control unit controls the data processing module to expand the compressed data or to correspondingly compress the data to be operated on. Then, the operation control unit prepares to issue the operation signal and judges the value of the data to be operated on that has been read: if its absolute value is less than the given threshold, it issues a shutdown signal to the corresponding arithmetic component; otherwise, it issues the operation signal to the corresponding arithmetic component. After the operation, if the data is to be compressed or expanded, the data control part receives the data and controls the data processing part to compress or expand it, i.e., compression screens out the data that does not need operation, or data in sparse representation is expanded into non-sparse representation. The result is then stored into the storage part under the control of the storage control part; if no data processing is needed, the storage control unit can directly store the result into the storage part.
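The overall flow just described (read, judge, operate or shut down, write back) might be summarized as a short control loop. This is a behavioral sketch only; the dictionary-based "memory module" and the function name are illustrative assumptions:

```python
def run_operation(memory, threshold):
    # 1. Storage control unit: read the data to be operated on.
    neurons, synapses = memory["neurons"], memory["synapses"]
    # 2. Operation control unit: judge each value; a shutdown signal
    #    (here: skipping the multiply-add) is issued when the absolute
    #    value is below the given threshold, otherwise the operation
    #    signal is issued and the multiply-add is performed.
    result = sum(n * s for n, s in zip(neurons, synapses)
                 if abs(n) >= threshold)
    # 3. Storage control unit: write the result back to storage.
    memory["result"] = result
    return result

mem = {"neurons": [0.9, 0.01, 0.5], "synapses": [1.0, 1.0, 2.0]}
run_operation(mem, 0.1)
```

The loop makes explicit that the judgment happens before the arithmetic is dispatched, which is what allows the gated lanes to consume no arithmetic power at all.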
The structure of the arithmetic device of the disclosure is described in detail below in conjunction with specific embodiments. Those skilled in the art will appreciate that the structure of the arithmetic device of the disclosure is not limited to the several exemplified below.
Embodiment 1
Referring to Fig. 4, in Embodiment 1, the arithmetic unit of the computing module includes n (n is a positive integer) multipliers and an n-input adder tree. The operation control unit controls the computing module as a whole and can issue a control signal to each multiplier. Each adder and multiplier can receive a control signal in addition to a data signal, i.e., besides receiving the data to be operated on, it can receive a shutdown control signal and issue a control signal to the next stage.
The concrete operation process of the low-power neural network operation device using the computing module of Embodiment 1 is as follows. First, the storage control unit controls the memory module to read out the neuron data and synapse data. If the synapse data is stored in sparse-coding mode, the neuron data and the index values of the synapses need to be passed together into the data processing module, and the neuron data is correspondingly compressed according to the index values, i.e., only the neuron data that will be operated on with the synapse data fed into the arithmetic unit is filtered out. The data processing module then transfers the processed neuron data and synapse data together to each arithmetic unit. Each multiplier receives one neuron datum and the corresponding synapse datum. The result of each multiplier is then fed into the next-stage adder for accumulation, and the accumulated sum continues to be fed into the next-stage adder, until the final result is completed.
For the operation of the first-stage multipliers, the data to be operated on is input into the first-stage multipliers and the operation control unit at the same time, and the operation control unit performs a threshold judgment on the data to be operated on. When it judges that the absolute value of the neuron data input to a multiplier is less than the given threshold, that multiplier is shut down: the multiplier receives the shutdown signal, inputs control signal 0 to the adder, and shuts down; otherwise, the product is passed to the adder, with input control signal 1. When an adder receives two control signals that are both 1, it accumulates its input values, passes the sum to the next stage, and inputs control signal 1 to the next stage; when an adder receives one control signal that is 1 and one that is 0, the input data at the end whose signal is 1 is passed directly to the next stage, with input control signal 1; when an adder receives two control signals that are both 0, the adder is shut down, with input control signal 0 to the next-stage adder. And so on, until the adder tree accumulates the final result, and the storage control unit controls writing the result into the memory module. Clearly, without losing the operation speed of parallel operation, this scheme can further reduce power consumption by opening or closing the multipliers or adders whose inputs are judged to be less than the given threshold. The judgment condition for shutting down a multiplier, besides the absolute value of the judged value being less than a given threshold, can also be: the judged value being greater than a given threshold; the judged value being within a given value interval, i.e., between a smaller and a larger threshold; the judged value being outside a given value interval, i.e., greater than the larger threshold or less than the smaller threshold; or the judged value meeting a certain condition after a function mapping, e.g., the mapped value being equal to a given threshold, greater/less than a given threshold, within a given value interval, etc.
Embodiment 2
Referring to Fig. 5, in Embodiment 2, the computing module includes n (n is a positive integer) arithmetic units, each arithmetic unit including one multiplier, one adder, and one temporary buffer, while the operation control units are likewise split out to the individual operation groups. The synapse data is transmitted to each arithmetic unit by broadcast, and the neurons are fed in directly from the storage part.
The concrete operation process of the low-power-consumption neural network computing device using the computing module of Embodiment 2 is as follows. First, the temporary buffer in each arithmetic unit is initialized to 0. The storage control unit controls the memory module to read out the neuron data. If the synapse data are stored in a sparse coding mode, the index values of the synapses are passed to the data processing module together with the neuron data, and the neuron data are compressed according to the index values; that is, the neuron data that will participate in operations with the incoming synapse data values are screened out. The neuron data and synapse data are then transferred to each arithmetic unit. Meanwhile, the storage control unit controls the memory module to read out the synapse data, and the memory module broadcasts the first synapse datum to all arithmetic units. When an arithmetic unit receives a neuron datum, the operation control unit judges whether it is less than a given threshold. If so, the multiplier and adder of that arithmetic unit are closed, and the partial sum stored in the temporary buffer remains unchanged; otherwise, the two data input into the arithmetic unit are multiplied by the multiplier, accumulated with the data in the temporary buffer, and the accumulated sum is stored back into the temporary buffer. Then the second synapse datum is broadcast and the above operations are repeated, until all synapse data have been transmitted and the storage control unit saves the n operation results to the memory module. The next round of operation is then performed, until all operations are finished. When the synapse data are dense, the neuron data may skip the data processing module after being read out from the memory module and be transferred directly to the computing module. The benefit of this device is that it takes full advantage of a characteristic of neural networks, namely the reusability of synapses, reducing repeated reads of synapse data. At the same time, according to the value of a neuron datum, when its absolute value is less than a given threshold, the corresponding multiplier and adder are closed and no operation is performed, thereby reducing power consumption. The judgment condition for closing the multiplier and adder, besides the absolute value of the value to be judged being less than a given threshold, may also be: the value to be judged is greater than a given threshold; the value to be judged lies within a given value interval, i.e., between a smaller and a larger threshold; the value to be judged lies outside a given value interval, i.e., is greater than the larger threshold or less than the smaller threshold; or the value to be judged satisfies a certain condition after a function mapping, e.g., the mapped value is equal to, greater than, or less than a given threshold, lies within a given value interval, and so on.
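The broadcast-and-gate flow of Embodiment 2 can be sketched behaviorally as follows. This is an illustrative software model, not the disclosed hardware; the function and parameter names (`gated_mac_rounds`, `gate_threshold`) are assumptions for the sketch. Each arithmetic unit holds one neuron datum and a temporary buffer initialized to 0; synapse values are broadcast one at a time, and a unit whose neuron datum falls below the threshold skips (i.e., "closes") its multiplier and adder, leaving its buffer unchanged:

```python
def gated_mac_rounds(neurons, synapses, gate_threshold):
    """neurons: one datum per arithmetic unit; synapses: values broadcast
    in sequence to all units. Returns the n accumulated partial sums."""
    buffers = [0.0] * len(neurons)       # temporary buffers initialized to 0
    for s in synapses:                   # broadcast first, second, ... synapse
        for i, x in enumerate(neurons):
            if abs(x) < gate_threshold:  # judgment: close multiplier/adder;
                continue                 # buffer keeps its partial sum
            buffers[i] += x * s          # multiply and accumulate into buffer
    return buffers

sums = gated_mac_rounds([0.001, 0.5, -2.0], [1.0, 2.0], gate_threshold=0.01)
```

In this toy run, unit 0 is gated off in every round, so its buffer stays at 0 while the other units accumulate normally.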
Embodiment 3
Referring to Fig. 6, in Embodiment 3 the computing module includes n arithmetic units. Each arithmetic unit includes one multiplier, one adder, and one selector. The operation control unit controls each arithmetic unit individually, and partial sums are transferred between arithmetic units.
The concrete operation process of the low-power-consumption neural network computing device using the computing module of Embodiment 3 is as follows. First, the storage control unit reads synapse values from the memory module and sends them to the data processing module. The data processing module compresses the synapses according to a given compression threshold, i.e., it selects only the synapse values whose absolute values are not less than the given compression threshold, transfers them to the respective arithmetic units, and keeps the synapse values there unchanged. Neuron values are then read from the memory module and, after corresponding compression by the data processing module (i.e., selecting only the neuron values whose absolute values are not less than the given threshold), are transferred to the operation control unit in each arithmetic unit. The operation control unit receives a neuron datum, takes its absolute value, and compares it with the given threshold. If it is less than the given closing threshold, the multiplier and adder are closed, and the selector directly passes the data from the previous stage to the selector of the next stage; otherwise, the neuron datum is multiplied with the synapse datum by the multiplier and fed into the adder, which accumulates the product with the data transferred from the previous stage. The result is input to the selector, which selects the adder's output as the output of this stage and passes it to the adder and selector of the next stage. This continues until the operation ends and the final result is obtained; the final result is stored into the memory module by the storage control unit. By screening both synapses and neurons multiple times, this method maximizes the proportion of valid data in the algorithm, further reduces the amount of computation, and improves operation speed. It fully exploits the shared nature of neural network synapses, avoiding the memory-access power consumption caused by repeatedly reading synapses; meanwhile, when the data to be operated on are less than the given threshold, the unneeded multipliers and adders can be closed, further reducing power consumption.
Here, the judgment condition for closing the multiplier and adder, besides the absolute value of the value to be judged being less than a given threshold, may also be: the value to be judged is greater than a given threshold; the value to be judged lies within a given value interval, i.e., between a smaller and a larger threshold; the value to be judged lies outside a given value interval, i.e., is greater than the larger threshold or less than the smaller threshold; or the value to be judged satisfies a certain condition after a function mapping, e.g., the mapped value is equal to, greater than, or less than a given threshold, lies within a given value interval, and so on.
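The chained selector/adder pipeline of Embodiment 3 can be sketched behaviorally as follows. This is an illustrative model, not the disclosed circuit; the names (`chained_dot`, `close_threshold`) are assumptions, and the up-front compression step is modeled by the same threshold test applied per stage. Each stage either contributes its product to the running partial sum or, when its operand is below the threshold, closes its multiplier/adder and lets the selector forward the previous stage's partial sum unchanged:

```python
def chained_dot(neurons, synapses, close_threshold):
    partial = 0.0                        # partial sum carried stage to stage
    for x, w in zip(neurons, synapses):  # one (neuron, synapse) pair per stage
        if abs(x) < close_threshold:     # close this stage's multiplier/adder;
            continue                     # selector passes partial sum through
        partial = partial + x * w        # adder: previous partial + product
    return partial                       # final result, saved by control unit

result = chained_dot([0.25, 0.0001, -1.0], [4.0, 5.0, 2.0], close_threshold=0.01)
```

In this toy run the middle stage is gated off, so only the first and last stages contribute to the final sum.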
In addition, although compression in the above embodiments is performed by a given threshold, the compression judgment condition of the present disclosure is not limited to threshold judgment conditions; it may also be a function-mapping judgment condition. The threshold judgment condition may include: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range, etc. Moreover, the judgment condition for closing arithmetic units and the compression judgment condition may use the same kind of judgment condition (e.g., both may adopt threshold judgment, in which case the chosen threshold values may likewise be the same or different) or different kinds of judgment conditions (e.g., one a threshold judgment, the other a mapping judgment), without affecting the realization of the present disclosure.
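The families of judgment conditions enumerated above — threshold tests (below/above a threshold, inside/outside an interval) and function-mapping tests (transform the value, then test the mapped result) — can be sketched as composable predicates. This is an illustrative sketch, not the patent's implementation; all names (`below`, `mapped`, `close_cond`, `compress_cond`) are assumptions. Note that the closing condition and the compression condition may reuse the same predicate or use different ones:

```python
def below(t):        return lambda v: abs(v) < t   # |v| under a threshold
def above(t):        return lambda v: v > t        # v over a threshold
def inside(lo, hi):  return lambda v: lo < v < hi  # within a value interval
def outside(lo, hi): return lambda v: v < lo or v > hi  # outside the interval

def mapped(f, predicate):
    # function-mapping judgment: test the value after transforming it by f
    return lambda v: predicate(f(v))

close_cond    = below(0.01)               # gates the multiplier/adder off
compress_cond = mapped(abs, above(0.01))  # a mapping condition for compression
```

A value such as 0.005 would then close the arithmetic unit (`close_cond(0.005)` is true) while failing the compression condition, illustrating that the two conditions need not agree.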
The specific embodiments described above further explain in detail the purpose, technical solutions, and beneficial effects of the present disclosure. It should be understood that the foregoing are merely specific embodiments of the present disclosure and are not intended to limit it; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present disclosure shall be included within its scope of protection.

Claims (10)

1. An arithmetic device, comprising:
a computing module, including one or more arithmetic units; and
a control module, including an operation control unit, configured to control the closing of the arithmetic units of the computing module according to a judgment condition.
2. The arithmetic device according to claim 1, wherein each arithmetic unit includes one or more arithmetic components, the arithmetic components being adders, multipliers, selectors, or temporary buffers.
3. The arithmetic device according to claim 2, wherein the computing module includes n multipliers located at a first stage and an n-input adder tree located at a second stage, where n is a positive integer.
4. The arithmetic device according to claim 1, wherein the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
5. The arithmetic device according to claim 4, wherein the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
6. An operation method, comprising:
setting a judgment condition; and
controlling the closing of arithmetic units of a computing module according to the judgment condition.
7. The operation method according to claim 6, wherein the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
8. The operation method according to claim 7, wherein the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
9. The operation method according to claim 7, wherein the judgment condition is a function-mapping judgment condition, namely judging whether a specified condition is met after a function transformation.
10. The operation method according to claim 6, wherein, according to the judgment condition, if the absolute value of the input data of a multiplier is less than a given threshold, the multiplier is closed.
CN201710441977.0A 2017-06-13 2017-06-13 Arithmetic device and method Active CN109086880B (en)

Priority Applications (8)

Application Number Priority Date Filing Date Title
CN202110597369.5A CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method
CN201710441977.0A CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method
PCT/CN2018/090901 WO2018228399A1 (en) 2017-06-13 2018-06-12 Computing device and method
EP18818258.8A EP3637327B1 (en) 2017-06-13 2018-06-12 Computing device and method
EP19217768.1A EP3657403A1 (en) 2017-06-13 2018-06-12 Computing device and method
US16/698,984 US11544543B2 (en) 2017-06-13 2019-11-28 Apparatus and method for sparse training acceleration in neural networks
US16/698,976 US11544542B2 (en) 2017-06-13 2019-11-28 Computing device and method
US16/698,988 US11537858B2 (en) 2017-06-13 2019-11-28 Computing device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710441977.0A CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110597369.5A Division CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method

Publications (2)

Publication Number Publication Date
CN109086880A true CN109086880A (en) 2018-12-25
CN109086880B CN109086880B (en) 2021-06-29

Family

ID=64839078

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201710441977.0A Active CN109086880B (en) 2017-06-13 2017-06-13 Arithmetic device and method
CN202110597369.5A Pending CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202110597369.5A Pending CN113449855A (en) 2017-06-13 2017-06-13 Arithmetic device and method

Country Status (1)

Country Link
CN (2) CN109086880B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1838031A (en) * 2005-04-13 2006-09-27 威盛电子股份有限公司 Closing non-acting numeric value logical operation unit to save power
CN101527010A (en) * 2008-03-06 2009-09-09 上海理工大学 Hardware realization method and system for artificial neural network algorithm
CN104539263A (en) * 2014-12-25 2015-04-22 电子科技大学 Reconfigurable low-power dissipation digital FIR filter

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8872688B2 (en) * 2010-07-13 2014-10-28 University Of Washington Through Its Center For Commercialization Methods and systems for compressed sensing analog to digital conversion
US20130318020A1 (en) * 2011-11-03 2013-11-28 Georgia Tech Research Corporation Analog programmable sparse approximation system
CN109086880B (en) * 2017-06-13 2021-06-29 上海寒武纪信息科技有限公司 Arithmetic device and method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHIJIN ZHANG,ETC: "Cambricon-X: An Accelerator for Sparse Neural Networks", 《2016 49TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE》 *
YU-HSIN CHEN,ETC: "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks", 《IEEE JOURNAL OF SOLID-STATE CIRCUITS》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113449855A (en) * 2017-06-13 2021-09-28 上海寒武纪信息科技有限公司 Arithmetic device and method
US11537858B2 (en) 2017-06-13 2022-12-27 Shanghai Cambricon Information Technology Co., Ltd. Computing device and method
US11544542B2 (en) 2017-06-13 2023-01-03 Shanghai Cambricon Information Technology Co., Ltd. Computing device and method
US11544543B2 (en) 2017-06-13 2023-01-03 Shanghai Cambricon Information Technology Co., Ltd. Apparatus and method for sparse training acceleration in neural networks
US11727268B2 (en) 2017-06-21 2023-08-15 Shanghai Cambricon Information Technology Co., Ltd. Sparse training in neural networks
US11544526B2 (en) 2017-06-26 2023-01-03 Shanghai Cambricon Information Technology Co., Ltd. Computing device and method

Also Published As

Publication number Publication date
CN113449855A (en) 2021-09-28
CN109086880B (en) 2021-06-29

Similar Documents

Publication Publication Date Title
CN109086880A (en) A kind of arithmetic unit and method
CN106126481B (en) A kind of computing system and electronic equipment
CN108090560A (en) The design method of LSTM recurrent neural network hardware accelerators based on FPGA
CN110035410A (en) Federated resource distribution and the method and system of unloading are calculated in a kind of vehicle-mounted edge network of software definition
CN107391317A (en) A kind of method, apparatus of data recovery, equipment and computer-readable recording medium
CN110764885B (en) Method for splitting and unloading DNN tasks of multiple mobile devices
EP3637327B1 (en) Computing device and method
US20220004858A1 (en) Method for processing artificial neural network, and electronic device therefor
Hashemi et al. On the benefits of multiple gossip steps in communication-constrained decentralized federated learning
CN111339027A (en) Automatic design method of reconfigurable artificial intelligence core and heterogeneous multi-core chip
CN108345934A (en) A kind of activation device and method for neural network processor
CN115374853A (en) Asynchronous federal learning method and system based on T-Step polymerization algorithm
CN109409505A (en) A method of the compression gradient for distributed deep learning
CN109615071A (en) A kind of neural network processor of high energy efficiency, acceleration system and method
CN114885420A (en) User grouping and resource allocation method and device in NOMA-MEC system
CN110909870A (en) Training device and method
Abouhawwash et al. Evolutionary multi-objective optimization using benson’s karush-kuhn-tucker proximity measure
CN109491956B (en) Heterogeneous collaborative computing system
Li et al. Robust and efficient quantization and coding for control of multidimensional linear systems under data rate constraints
CN114005458A (en) Voice noise reduction method and system based on pipeline architecture and storage medium
Sharara et al. A recurrent neural network based approach for coordinating radio and computing resources allocation in cloud-ran
CN102209369B (en) Method based on wireless network interface selection to improve a smart phone user experience
CN110647396A (en) Method for realizing intelligent application of end cloud cooperative low-power consumption and limited bandwidth
CN104933110A (en) MapReduce-based data pre-fetching method
CN110113193A (en) Data transmission method, system and medium based on hierarchical agent

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information
CB03 Change of inventor or designer information

Inventor after: Request for anonymity

Inventor before: Request for anonymity