CN109086880A - A kind of arithmetic unit and method - Google Patents
A kind of arithmetic unit and method

- Publication number: CN109086880A
- Application number: CN201710441977.0A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption, not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The present disclosure provides an arithmetic device, comprising: a computing module including one or more arithmetic units; and a control module including an operation control unit for controlling the closing of an arithmetic unit according to a judgment condition. The disclosure also provides an operation method. The disclosed low-power arithmetic device and method offer high flexibility and can be combined with software-level optimizations, so that operation speed can be further improved, the amount of computation reduced, and the operating power consumption of the accelerator lowered.
Description
Technical field
The present disclosure relates to the field of artificial intelligence, and more particularly to an arithmetic device and method.
Background
Deep neural networks are the foundation of many current artificial-intelligence applications and have achieved breakthrough results in speech recognition, image processing, data analysis, advertisement recommendation, autonomous driving, and other areas, bringing deep neural networks into many aspects of daily life. However, the enormous amount of computation required by deep neural networks has always constrained their faster development and wider application. When accelerator designs are used to speed up deep neural network operations, the huge amount of computation inevitably comes with a large energy cost, which likewise constrains the further widespread adoption of accelerators.

On the hardware side, existing accelerator architectures are mainly designed by analyzing the most time-consuming parts of the computation and then accelerating them in a targeted way. Taking convolutional neural networks as an example, as shown in Figure 1, a common inner-product acceleration structure adopts a "multiply-accumulate" design: a group of multipliers produces a group of products in one clock cycle, which are then accumulated in parallel to obtain the final result. However, this structure is not very flexible and cannot further improve operation speed or reduce the amount of computation.
Summary of the invention
(1) Technical problem to be solved

In order to solve, or at least partially alleviate, the above technical problem, the present disclosure provides a low-power arithmetic device and method. The disclosed device and method offer high flexibility and can be combined with software-level optimizations, so that operation speed can be further improved, the amount of computation reduced, and the operating power consumption of the accelerator lowered.
(2) Technical solution

According to one aspect of the disclosure, an arithmetic device is provided, comprising:

a computing module, including one or more arithmetic units; and

a control module, including an operation control unit for controlling the closing of an arithmetic unit of the computing module according to a judgment condition.
In some embodiments, each arithmetic element includes one or more arithmetic units, which is to add
Musical instruments used in a Buddhist or Taoist mass, multiplier, selector or temporary buffer.
In some embodiments, the computing module includes n multipliers in a first stage and an n-input adder tree in a second stage, where n is a positive integer.
In some embodiments, the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.

In some embodiments, the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.

In some embodiments, the judgment condition is a function-mapping judgment condition, i.e., judging whether a specified condition is satisfied after a function transformation.
In some embodiments, the n first-stage multipliers are each connected to the operation control unit, which controls the closing of the multipliers according to the judgment condition.

In some embodiments, the operation control unit judges the data to be operated on according to the judgment condition; when it determines that the absolute value of the input data of a multiplier is less than the given threshold, it closes that multiplier.
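The gating rule just described can be modeled in software. Below is a minimal sketch under the assumption of the simple absolute-value threshold condition; the function and variable names are illustrative, not taken from the patent:

```python
def gated_products(neurons, synapses, threshold):
    """Model n first-stage multipliers with threshold gating.

    A multiplier whose neuron input has absolute value below the
    threshold is 'closed': it contributes no product (control bit 0).
    Returns (product, control_bit) pairs, one per multiplier.
    """
    outputs = []
    for x, w in zip(neurons, synapses):
        if abs(x) < threshold:          # judgment condition: |input| < threshold
            outputs.append((0, 0))      # multiplier closed, control signal 0
        else:
            outputs.append((x * w, 1))  # product forwarded, control signal 1
    return outputs

# Example: small neuron values are gated off rather than multiplied.
print(gated_products([0.01, 2.0, -0.005, -3.0], [1.0, 0.5, 4.0, 2.0], 0.1))
# → [(0, 0), (1.0, 1), (0, 0), (-6.0, 1)]
```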
In some embodiments, the adder tree includes k stages of adders; the first stage has n/2 adders and the last (k-th) stage has 1 adder, where 2^k = n. The n/2 first-stage adders of the adder tree are connected to the n multipliers and receive the data signals and control signals sent by the multipliers; the 2nd through k-th stage adders are each connected to the adders of the preceding stage and receive the data signals and control signals those adders send.

In some embodiments, if a multiplier receives a shutdown signal from the operation control unit, it inputs control signal 0 to the first-stage adder of the adder tree; otherwise it sends its product to the first-stage adder and inputs control signal 1. If an adder receives two control signals that are both 1, it accumulates the input values, sends the sum to the next-stage adder, and sends control signal 1 downstream. If one received control signal is 1 and the other is 0, the input data on the side whose control signal is 1 is passed directly downstream, with control signal 1. If both received control signals are 0, the adder is closed and control signal 0 is input to the next-stage adder, and so on, until the adder tree accumulates the final result.
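The control-signal protocol of the adder tree amounts to a pairwise reduction over (value, control) pairs. A sketch of that reduction, with illustrative names, assuming the pairs come from the gated multipliers of the first stage:

```python
def adder_tree(pairs):
    """Reduce (value, control) pairs through a k-stage adder tree.

    control 1/1 -> add the two values and forward control 1
    control 1/0 -> pass the live value straight through, control 1
    control 0/0 -> this adder is closed; forward control 0
    Requires len(pairs) to be a power of two (n = 2**k inputs).
    """
    while len(pairs) > 1:
        nxt = []
        for (a, ca), (b, cb) in zip(pairs[::2], pairs[1::2]):
            if ca and cb:
                nxt.append((a + b, 1))           # both live: accumulate
            elif ca or cb:
                nxt.append((a if ca else b, 1))  # one live: pass through
            else:
                nxt.append((0, 0))               # both closed: close adder
        pairs = nxt
    return pairs[0]

# Feeding in the gated products from four multipliers (two closed):
print(adder_tree([(0, 0), (1.0, 1), (0, 0), (-6.0, 1)]))  # → (-5.0, 1)
```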
In some embodiments, the n first-stage multipliers and the second-stage n-input adder tree are each connected to the operation control unit, which judges the data to be operated on according to the judgment condition; when it determines that the absolute value of the input data of a multiplier or adder is less than the given threshold, it closes that multiplier or adder.
In some embodiments, the arithmetic device further includes a data processing module for expanding or compressing data; correspondingly, the control module includes a data control unit for controlling the data processing module to expand or compress data.

In some embodiments, the data processing module expands and compresses data as follows. If the synapse values are in sparse mode, i.e., a sparse network expressed with sparse coding, the neuron data are compressed according to the sparsity of the synapse values, the compression screening out the neuron data that do not need to take part in the operation. Or, if the neuron data are in sparse mode, the synapse data are compressed accordingly according to the sparsity of the neuron data, the compression screening out the synapse data that do not need to take part in the operation. Or, given a compression judgment condition, the synapse and/or neuron data are compressed according to that condition.
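The index-based compression can be illustrated as follows, assuming the synapses are stored in a sparse (index, value) coding; the helper name is hypothetical:

```python
def compress_neurons(neurons, sparse_synapses):
    """Keep only the neuron entries that will actually meet a stored
    synapse; sparse_synapses is a list of (index, value) pairs, so the
    screened-out neurons are those at indices with no synapse entry."""
    kept_neurons = [neurons[i] for i, _ in sparse_synapses]
    synapse_values = [v for _, v in sparse_synapses]
    return kept_neurons, synapse_values

# Dense neuron vector; sparse synapse row with entries at indices 1 and 3.
neurons = [5.0, 2.0, 7.0, -1.0]
sparse_synapses = [(1, 0.5), (3, 2.0)]
print(compress_neurons(neurons, sparse_synapses))
# → ([2.0, -1.0], [0.5, 2.0])
```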
In some embodiments, the compression judgment condition includes a threshold judgment condition or a function-mapping judgment condition.

In some embodiments, the threshold judgment condition comprises: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
In some embodiments, the n first-stage multipliers are each connected to the data processing module and receive the neuron data and synapse data output by the data processing module.
In some embodiments, the computing module includes m arithmetic units, each including a multiplier, an adder, and a temporary buffer; the control module includes m operation control units, each connected to the multiplier and adder of one arithmetic unit and controlling their closing according to the judgment condition, where m is a positive integer.
In some embodiments, the multiplier of each arithmetic unit has three input terminals and one output terminal: two input terminals receive the neuron data and synapse data respectively, the third receives the control signal, and the output terminal outputs the multiplication result.

The adder has three input terminals and one output terminal: two input terminals receive the multiplication result and the data input from the temporary buffer respectively, the third receives the control signal, and the output terminal outputs the addition result, which is then saved back to the temporary buffer as input data for the next addition.
In some embodiments, synapse data are broadcast to each arithmetic unit. If the neuron datum input to an arithmetic unit is less than the threshold, the control signal closes the multiplier and adder of that unit and the partial sum stored in the temporary buffer remains unchanged; otherwise, the multiplier multiplies the two incoming data, the product is accumulated with the data in the temporary buffer, and the sum is stored back into the temporary buffer.
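The broadcast scheme can be modeled as follows: each synapse value is broadcast in turn, and each arithmetic unit either multiplies and accumulates into its temporary buffer or, when its neuron input falls below the threshold, is gated off and leaves the buffer unchanged (a sketch with illustrative names):

```python
def broadcast_accumulate(neuron_rows, synapses, threshold):
    """m arithmetic units, each with a temporary buffer initialized to 0.

    Synapse values are broadcast one at a time; unit j receives
    neuron_rows[t][j] on broadcast step t. A unit whose neuron input
    is below the threshold is closed and keeps its partial sum.
    """
    m = len(neuron_rows[0])
    buffers = [0.0] * m
    for step, w in enumerate(synapses):
        for j in range(m):
            x = neuron_rows[step][j]
            if abs(x) >= threshold:   # unit stays open
                buffers[j] += x * w   # multiply and accumulate
            # else: multiplier and adder closed; buffer unchanged
    return buffers

# Two units, two broadcast steps; unit 1 is gated off on the first step.
print(broadcast_accumulate([[1.0, 0.01], [2.0, 3.0]], [2.0, 1.0], 0.1))
# → [4.0, 3.0]
```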
In some embodiments, the arithmetic device further includes a data processing module and a storage module. One input terminal of the multiplier is connected to the data processing module to receive the compressed synapse data; another input terminal is connected to the storage module to receive the neuron data.
In some embodiments, the computing module includes p arithmetic units, each including a multiplier, an adder, and a selector; the control module includes p operation control units, each connected to the multiplier and adder of one arithmetic unit and controlling their closing according to the judgment condition, where p is a positive integer.
In some embodiments, the multiplier of each arithmetic unit has three input terminals and one output terminal: two input terminals receive the neuron data and synapse data respectively, the third receives the control signal, and the output terminal outputs the multiplication result.

The adder of the 1st arithmetic unit has three input terminals and one output terminal: two input terminals receive the multiplication result and the data input by the selector of the same unit respectively, the third receives the control signal, and the output terminal outputs the addition result, which is sent through the selector to the next arithmetic unit as input data for the next-stage addition.

The adders of the 2nd through p-th arithmetic units have three input terminals and one output terminal: two input terminals receive the multiplication result and the data input by the selector of the previous unit respectively, the third receives the control signal, and the output terminal outputs the addition result, which is sent through the selector to the next arithmetic unit as input data for the next-stage addition.
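The selector chaining amounts to a linear pipeline of p units, where a closed unit's selector bypasses it and forwards the incoming partial sum unchanged. A sketch, with illustrative names:

```python
def selector_chain(neurons, synapses, threshold):
    """p arithmetic units in a chain; each holds one multiplier, one
    adder, and a selector that can bypass the unit when it is closed."""
    running = 0.0  # partial sum carried along the selector chain
    for x, w in zip(neurons, synapses):
        if abs(x) >= threshold:
            running = running + x * w   # adder open: accumulate product
        # else: unit closed; selector passes `running` straight through
    return running

# Three units; the middle one is gated off by the threshold condition.
print(selector_chain([1.0, 0.02, -2.0], [3.0, 5.0, 1.0], 0.1))
# → 1.0
```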
In some embodiments, the arithmetic device further includes a storage module connected to the control module, which controls the storage module to store or read the required data; the storage module is also connected to the computing module, inputs the data to be operated on into the computing module, and receives and stores the data produced by the computing module.

In some embodiments, the control module includes a storage control unit for controlling the storage module to store or read the required data.
According to another aspect of the present disclosure, an operation method is provided, comprising:

setting a judgment condition; and

controlling the closing of an arithmetic unit of the computing module according to the judgment condition.
In some embodiments, the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.

In some embodiments, the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.

In some embodiments, the judgment condition is a function-mapping judgment condition, i.e., judging whether a specified condition is satisfied after a function transformation.
In some embodiments, according to the judgment condition, if the absolute value of the input data of a multiplier is less than the given threshold, that multiplier is closed.

In some embodiments, a multiplier that receives a shutdown signal inputs control signal 0 to the first-stage adder of the adder tree; otherwise it sends its product to the first-stage adder and inputs control signal 1. The first-stage adders receive the control signals sent by the multipliers, and the 2nd through k-th stage adders each receive the control signals sent by the preceding stage, until the adder tree accumulates the final result. Specifically: if an adder receives two control signals that are both 1, it accumulates the input values, sends the sum to the next-stage adder, and sends control signal 1 downstream; if one received control signal is 1 and the other is 0, the input data on the side whose control signal is 1 is passed directly downstream with control signal 1; if both received control signals are 0, the adder is closed and control signal 0 is input to the next-stage adder.
In some embodiments, the judgment condition is a threshold judgment condition: the operation control unit sets a threshold and compares the absolute value of the data input to a multiplier or adder with the threshold; if it is less than the threshold, the multiplier or adder is closed.
In some embodiments, if the multiplier of an arithmetic unit is not closed, it performs multiplication on the input neuron data and synapse data and outputs the result; otherwise the adder of that unit is closed as well, and the partial sum stored in the temporary buffer remains unchanged.

If the adder of an arithmetic unit is not closed, it receives the multiplication result and the data input from the temporary buffer, performs the addition, and outputs the result, which is then saved back to the temporary buffer as input data for the next addition, until the operation ends; otherwise the partial sum stored in the temporary buffer remains unchanged.
In some embodiments, if the multiplier of an arithmetic unit is not closed, it performs multiplication on the input neuron data and synapse data and outputs the result; otherwise the adder of that unit is closed as well, and the selector passes the upstream data directly to the selector of the next stage.

If the adder of an arithmetic unit is not closed, it receives the multiplication result together with the input data of the same stage or the data output by the selector of the previous stage, performs the addition, and outputs the result, which passes through the selector and is sent to the next arithmetic unit.

Otherwise, when the adder and multiplier are closed, the selector selects the incoming input data of the same stage or the data output by the previous-stage selector and outputs it as the result of this arithmetic unit.
In some embodiments, the method further includes expanding or compressing the data to be operated on.

In some embodiments, the data processing module expands and compresses the data as follows: if the synapse values are in sparse mode, the neuron data are compressed according to the sparsity of the synapse values, screening out the neuron data that do not need to take part in the operation; or, if the neuron data are in sparse mode, the synapse data are compressed accordingly according to the sparsity of the neuron data, screening out the synapse data that do not need to take part in the operation; or, given a compression judgment condition, the synapse and/or neuron data are compressed according to that condition.
(3) Beneficial effects

It can be seen from the above technical solutions that the disclosed low-power arithmetic device and method have at least one of the following advantages:

(1) Data are differentiated, and depending on the data, the corresponding components of the arithmetic device can be configured sensibly. For example, whether a data processing module is configured can be chosen according to the storage mode of the data, such as whether a sparse representation is used; likewise, the number of operation groups, and the numbers of multipliers and adders within each group, can be chosen on demand, giving high flexibility.

(2) When the operational data meet the given judgment condition, the corresponding multipliers and adders are closed, so the power consumption of the accelerator can be reduced without affecting its operation speed.

(3) By adjusting the judgment condition, the proportion of closed adders and multipliers can be controlled, thereby controlling how much energy consumption is reduced.
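The third point can be made concrete: raising the threshold closes a larger fraction of operators, trading accuracy for energy. A small illustrative sketch (the data and thresholds are made up):

```python
def closed_ratio(values, threshold):
    """Fraction of operator inputs that would be gated off at a given
    threshold (|value| < threshold means the unit is closed)."""
    closed = sum(1 for v in values if abs(v) < threshold)
    return closed / len(values)

data = [0.01, 0.2, 0.05, 1.5, 0.005, 0.8, 0.3, 0.02]
for t in (0.05, 0.25, 1.0):
    print(t, closed_ratio(data, t))
# The closed-unit ratio rises monotonically as the threshold grows.
```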
Brief description of the drawings
Fig. 1 is a functional block diagram of a prior-art arithmetic device.
Fig. 2 is a functional block diagram of the low-power arithmetic device according to the present disclosure.
Fig. 3 is another functional block diagram of the low-power arithmetic device according to the present disclosure.
Fig. 4 is a structural schematic diagram of the low-power arithmetic device proposed in Embodiment 1 of the present disclosure.
Fig. 5 is a structural schematic diagram of the low-power arithmetic device proposed in Embodiment 2 of the present disclosure.
Fig. 6 is a structural schematic diagram of the low-power arithmetic device proposed in Embodiment 3 of the present disclosure.
Specific embodiments

To make the objectives, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that similar or identical parts use the same reference numerals throughout the drawings and the description. Implementations not shown or described in the drawings take forms known to those of ordinary skill in the art. In addition, although examples of parameters with particular values may be given herein, it should be understood that a parameter need not exactly equal the corresponding value, but may approximate it within an acceptable error margin or design constraint. Directional terms mentioned in the following embodiments, such as "upper", "lower", "front", "rear", "left", and "right", refer only to the directions in the drawings; they are used for illustration and do not limit the disclosure.
The present disclosure provides a low-power neural network arithmetic device. Fig. 2 is a functional block diagram of the low-power arithmetic device according to the disclosure. As shown in Fig. 2, the low-power neural network arithmetic device mainly includes a control module, a storage module, and a computing module. The control module is connected to the storage module and controls it to store or read the required data. The storage module is connected to the computing module, inputs the data to be operated on into the computing module, and receives and stores the data produced by the computing module. The control module is also connected to the computing module and controls the working mode of the computing module according to the operation type and the operational data.
Specifically, the control module includes a storage control unit and an operation control unit. The storage control unit controls the storage module to store or read the required data. The operation control unit controls, according to the operation type and the data to be operated on, whether each component of the arithmetic device works and how it works; it can be decoupled into units that control each operation group separately, or act as a whole that controls the groups globally or individually from outside.
Further, as shown in Fig. 3, the low-power neural network arithmetic device may also include a data processing module for expanding and compressing data; correspondingly, the control module includes a data control unit for controlling whether the data are expanded or compressed. The data processing module and the data control unit are either both present or both absent.

The data processing module expands and compresses data as follows. When the synapse values are in sparse mode, i.e., represented sparsely, the neuron data are compressed according to the sparsity of the synapse values, screening out the neuron data that do not need to take part in the operation. Or, when the neuron data are in sparse mode, the synapse data are compressed accordingly according to the sparsity of the neuron data, screening out the synapse data that do not need to take part in the operation. Or, given a compression judgment condition, the synapse and/or neuron data are compressed according to that condition, screening out the data that meet it.

The compression judgment condition includes a threshold judgment condition or a function-mapping judgment condition. The threshold judgment condition comprises: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
The storage module includes a data storage unit and a temporary buffer unit. One or more data storage units can be arranged as needed; that is, the data to be operated on may be stored in the same region or stored separately, and intermediate result values may likewise be stored in the same region or separately.

The computing module can take various structures and may include one or more arithmetic units, each including one or more multipliers and one or more adders. Data to be operated on, or intermediate result values, are passed between arithmetic units in a fixed direction.
The main operation flow of the disclosed low-power neural network arithmetic device is as follows. The storage control unit issues a read control signal to the storage part to read the data to be operated on; if the data read out exist in both compressed and expanded forms, the data control unit controls the data processing module to expand the compressed data or to compress the corresponding data. Then the operation control unit prepares to issue the operation signal and judges the values of the data read out for operation: if the absolute value of a datum is less than the given threshold, it issues a shutdown signal to the corresponding operator; otherwise it issues the operation signal to the corresponding operator. After the operation, if the data are to be compressed or expanded, the data control part receives the data and controls the data processing part to compress or expand them, i.e., to compress by screening out the data not needed for operation, or to expand sparsely represented data into a non-sparse representation. The result is then stored into the storage part under the control of the storage control unit; if no data processing is needed, the storage control unit can store the result into the storage part directly.
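The expansion direction mentioned above, turning a sparse representation back into a dense one after the operation, can be sketched as the inverse of the compression step (the helper name is hypothetical):

```python
def expand_sparse(sparse_pairs, length):
    """Expand a sparse (index, value) coding back into a dense vector,
    filling the screened-out positions with zero."""
    dense = [0.0] * length
    for i, v in sparse_pairs:
        dense[i] = v
    return dense

print(expand_sparse([(1, 0.5), (3, 2.0)], 5))
# → [0.0, 0.5, 0.0, 2.0, 0.0]
```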
The structure of the disclosed arithmetic device is described in detail below with reference to specific embodiments. Those skilled in the art will appreciate that the structure of the disclosed arithmetic device is not limited to the examples given below.
Embodiment 1
Referring to Fig. 4, in Embodiment 1 the arithmetic unit of the computing module includes n multipliers (n being a positive integer) and an n-input adder tree. The operation control unit controls the computing module as a whole and can issue a control signal to each multiplier. Besides data signals, each adder and multiplier can also receive control signals; that is, in addition to receiving the data to be operated on, they can receive a shutdown control signal and issue control signals to the next stage.
The concrete operation flow of a low-power neural network arithmetic device using the computing module of Embodiment 1 is as follows. First, the storage control unit controls the storage module to read out the neuron data and synapse data. If the synapse data are stored in sparse coding mode, the neuron data and the index values of the synapses are passed together to the data processing module, which compresses the neuron data according to the index values, i.e., filters out exactly those neuron data that will be operated on with the synapse data fed into the arithmetic unit. The data processing module then transfers the processed neuron data and synapse data together to each arithmetic unit. Each multiplier receives one neuron datum and the corresponding synapse datum. The result of each multiplier is fed into the next-stage adder for accumulation, and each partial sum continues to be fed into the next-stage adder, until the final result is complete.

For the first-stage multipliers, the data to be operated on are input simultaneously to the multipliers and to the operation control unit, which applies the threshold judgment to them. When it determines that the absolute value of the neuron datum input to a multiplier is less than the given threshold, it closes that multiplier: the multiplier receives the shutdown signal, inputs control signal 0 to the adder, and closes; otherwise the multiplier passes its product to the adder and inputs control signal 1. When an adder receives two control signals that are both 1, it accumulates the input values, passes the sum to the next stage, and sends control signal 1 downstream; when it receives one control signal of 1 and one of 0, the input data on the side whose signal is 1 is passed directly to the next stage with control signal 1; when it receives two control signals that are both 0, the adder is closed and control signal 0 is input to the next-stage adder. And so on, until the adder tree accumulates the final result, which the storage control unit writes into the storage module. Clearly, without losing the speed of the parallel operation, this scheme opens or closes multipliers and adders according to whether their inputs fall below the given threshold, and can thus further reduce power consumption during operation.

The judgment condition for closing a multiplier, besides the absolute value of the judged datum being less than a given threshold, may also be: the judged datum being greater than a given threshold; the judged datum lying within a given value interval, i.e., between a smaller and a larger threshold; the judged datum lying outside a given value interval, i.e., greater than the larger threshold or less than the smaller one; or the judged datum satisfying some condition after a function mapping, e.g., the mapped value being equal to, greater than, or less than a given threshold, or lying within a given interval.
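The alternative judgment conditions enumerated above can all be expressed as predicates over the value to be judged. A sketch covering the variants, with illustrative names and made-up thresholds:

```python
def make_condition(kind, lo=None, hi=None, mapping=None):
    """Build a closing predicate for the judgment-condition variants.
    For the single-threshold kinds, `hi` serves as the threshold."""
    if kind == "below":      # |v| < threshold
        return lambda v: abs(v) < hi
    if kind == "above":      # v > threshold
        return lambda v: v > hi
    if kind == "inside":     # lo <= v <= hi
        return lambda v: lo <= v <= hi
    if kind == "outside":    # v < lo or v > hi
        return lambda v: v < lo or v > hi
    if kind == "mapped":     # condition applied after a function mapping f(v)
        return lambda v: abs(mapping(v)) < hi
    raise ValueError(kind)

inside = make_condition("inside", lo=-1.0, hi=1.0)
mapped = make_condition("mapped", hi=0.5, mapping=lambda v: v * v)
print(inside(0.3), inside(2.0))   # → True False
print(mapped(0.5), mapped(1.0))   # → True False
```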
Embodiment 2
Referring to Fig. 5, in Embodiment 2 the computing module includes n arithmetic units (n being a positive integer), each including a multiplier, an adder, and a temporary buffer, and each operation control unit is likewise split off to its own operation group. Synapse data are delivered to each arithmetic unit by broadcast, while neurons are fed in directly from the storage part.
The concrete operation flow of the low-power-consumption neural network computing device using the computing module of Embodiment 2 is as follows. First, the temporary buffer in each arithmetic unit is initialized to 0. The storage control unit directs the memory module to read out the neuron data. If the synapse data are stored in a sparse-coded format, the synapse index values and the neuron data are passed together to the data processing module, which compresses the neuron data according to the index values, i.e., filters out exactly those neuron data that will be operated on with the incoming synapse values in the arithmetic units. The compressed neuron data and the synapse data are then transmitted to each arithmetic unit. Meanwhile, the storage control unit directs the memory module to read out the synapse data, and the memory module broadcasts the first synapse value to all arithmetic units. When an arithmetic unit receives a neuron value, the operation control unit judges whether it is less than a given threshold. If so, the multiplier and adder of that arithmetic unit are closed and the partial sum stored in the temporary buffer remains unchanged; otherwise, the two data arriving at the arithmetic unit are multiplied by the multiplier, the product is accumulated with the data in the temporary buffer, and the accumulated sum is written back to the temporary buffer. The second synapse value is then broadcast and the above operations are repeated until the synapse data have been traversed, at which point the storage control unit saves the n operation results to the memory module. The next round of operation then begins, until all operations are finished. When the synapse data are dense, the neuron data may skip the data processing module after being read out from the memory module and be transferred directly to the computing module. The benefit of this device is that it fully exploits a characteristic of neural networks, namely the reusability of synapses, reducing repeated reads of the synapse data. At the same time, when the absolute value of a neuron value is below the given threshold, the corresponding multiplier and adder can be closed and the operation skipped, thereby reducing power consumption. The judgment condition for closing the multiplier and adder is not limited to the absolute value of the value under judgment being less than a given threshold; it may also be that the value under judgment is greater than a given threshold; lies within a given value interval, i.e., between a smaller and a larger threshold; lies outside a given value interval, i.e., is greater than the larger threshold or less than the smaller one; or satisfies a certain condition after a function mapping, e.g., the mapped value is equal to a given threshold, greater/less than a given threshold, within a given value interval, and so on.
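The broadcast-and-gate flow of Embodiment 2 can be sketched as a software model (a minimal behavioral sketch under illustrative names such as `run_layer` and `threshold`; it models the device's behavior, not the hardware itself):

```python
# Software model of Embodiment 2: n arithmetic units share a broadcast
# synapse value; each unit gates its multiply-accumulate on a threshold.
def run_layer(neurons, synapses, n_units, threshold):
    """neurons: one list of values per unit; synapses: shared weight list."""
    temp = [0.0] * n_units                  # temporary buffers, initialized to 0
    for j, w in enumerate(synapses):        # synapse values broadcast one by one
        for u in range(n_units):
            x = neurons[u][j]
            if abs(x) < threshold:          # judgment condition met: close the
                continue                    # multiplier/adder, keep partial sum
            temp[u] += x * w                # multiply and accumulate otherwise
    return temp                             # n results saved back to memory

# Example: two units; the values 0.01 and 0.0 are gated out
out = run_layer([[0.01, 2.0], [3.0, 0.0]], [1.0, 0.5], 2, 0.1)
# out == [1.0, 3.0]
```

In hardware the skipped iterations translate into clock-gated multipliers and adders, which is where the power saving comes from; the software model only mirrors the control decision.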
Embodiment 3
Referring to FIG. 6, in Embodiment 3 the computing module includes n arithmetic units. Each arithmetic unit includes a multiplier, an adder, and a selector. The operation control unit controls each arithmetic unit separately, and partial sums are passed between the arithmetic units.
The concrete operation flow of the low-power-consumption neural network computing device using the computing module of Embodiment 3 is as follows. First, the storage control unit reads the synapse values from the memory module and sends them to the data processing module. The data processing module compresses the synapses by a given compression threshold, i.e., it selects only the synapse values whose absolute value is not less than the given compression threshold, transmits them to the respective arithmetic units, and holds the synapse values fixed there. The neuron values are then read from the memory module and likewise compressed by the data processing module, i.e., only the neuron values whose absolute value is not less than the given threshold are selected and transferred to the operation control unit in each arithmetic unit. The operation control unit receives a neuron value, takes its absolute value, and judges it against the given threshold. If it is less than the given closing threshold, the multiplier and adder are closed and the selector passes the data from the previous stage directly on to the selector of the next stage; otherwise, the neuron value is multiplied with the synapse value by the multiplier and the product is fed into the adder, which accumulates it with the data passed down from the previous stage; the result enters the selector, which selects the adder output as the output of this stage and passes it to the adder and selector of the next stage. This continues until the operation finishes and the final result is obtained, which the storage control unit stores in the memory module. By screening both the synapses and the neurons multiple times, this method maximizes the proportion of valid data in the algorithm, further reducing the amount of computation and improving operation speed; it fully exploits the shared nature of neural network synapses, avoiding the memory-access power consumption caused by repeated synapse reads; and it closes idle multipliers and adders whenever the data under operation are below the given threshold, further reducing power consumption.
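The screening-and-chaining flow described above can be sketched as a software model (a minimal sketch under illustrative names such as `compress` and `chain_accumulate`; the real device realizes this with hardware selectors, not function calls):

```python
# Software model of Embodiment 3: synapses and neurons are first compressed
# by a threshold; then a chain of arithmetic units (each with a multiplier,
# an adder, and a selector) accumulates a partial sum, bypassing gated stages.
def compress(values, compression_threshold):
    # data processing module: keep only values with |v| >= threshold
    return [v for v in values if abs(v) >= compression_threshold]

def chain_accumulate(pairs, close_threshold):
    """pairs: one (neuron, synapse) tuple per stage; returns the final sum."""
    partial = 0.0                    # partial sum handed from stage to stage
    for x, w in pairs:
        if abs(x) < close_threshold:
            continue                 # multiplier/adder closed; the selector
                                     # forwards the previous partial sum
        partial += x * w             # adder output chosen by the selector
    return partial

# Example: the middle stage (0.01) is bypassed
result = chain_accumulate([(2.0, 0.5), (0.01, 4.0), (1.0, 3.0)], 0.1)
# result == 4.0
```

Compression happens once per layer in the data processing module, while the closing judgment is made per stage, which is why the two thresholds can be chosen independently.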
Here, too, the judgment condition for closing the multiplier and adder is not limited to the absolute value of the value under judgment being less than a given threshold; it may also be that the value under judgment is greater than a given threshold; lies within a given value interval, i.e., between a smaller and a larger threshold; lies outside a given value interval, i.e., is greater than the larger threshold or less than the smaller one; or satisfies a certain condition after a function mapping, e.g., the mapped value is equal to a given threshold, greater/less than a given threshold, within a given value interval, and so on.
In addition, although the above embodiments compress by a given threshold, the compression judgment condition of the present disclosure is not limited to threshold judgment conditions; it may also be a function-mapping judgment condition. The threshold judgment condition may include: less than a given threshold, greater than a given threshold, within a given value range, outside a given value range, etc. Moreover, the judgment condition for closing an arithmetic unit and the compression judgment condition may use the same kind of judgment condition (e.g., both may use threshold judgment, in which case the chosen threshold values may be the same or different) or different kinds (e.g., one uses threshold judgment and the other uses mapping judgment), without affecting the realization of the present disclosure.
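As a sketch of how these judgment-condition variants might look in software (all function names are illustrative; the disclosure does not prescribe any particular implementation):

```python
# Judgment-condition variants named in the text: threshold, interval, and
# function-mapping forms. The closing condition and the compression
# condition can be built from the same or from different variants.
def less_than(threshold):
    # plain threshold judgment on the absolute value
    return lambda v: abs(v) < threshold

def in_interval(small, big):
    # value lies between the smaller and the larger threshold
    return lambda v: small < v < big

def after_mapping(f, threshold):
    # judge the value only after applying a function mapping f
    return lambda v: f(v) < threshold

# e.g. close a unit by threshold, compress by a mapped (squared) value
close_condition = less_than(0.1)
compress_condition = after_mapping(lambda x: x * x, 0.25)
```

Representing each condition as a predicate makes the point of the paragraph concrete: the two decisions are independent knobs, so swapping one condition never requires touching the other.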
The specific embodiments described above further elaborate the purpose, technical solutions, and beneficial effects of the present disclosure. It should be understood that the foregoing are merely specific embodiments of the present disclosure and are not intended to limit it; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present disclosure shall be included within its scope of protection.
Claims (10)
1. An arithmetic device, comprising:
a computing module, including one or more arithmetic units; and
a control module, including an operation control unit, for controlling, according to a judgment condition, the closing of an arithmetic unit of the computing module.
2. The arithmetic device according to claim 1, wherein each arithmetic unit includes one or more arithmetic components, the arithmetic components being an adder, a multiplier, a selector, or a temporary buffer.
3. The arithmetic device according to claim 2, wherein the computing module includes n multipliers located at a first stage and an n-input adder tree located at a second stage, where n is a positive integer.
4. The arithmetic device according to claim 1, wherein the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
5. The arithmetic device according to claim 4, wherein the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
6. An operation method, comprising:
setting a judgment condition; and
controlling, according to the judgment condition, the closing of an arithmetic unit of a computing module.
7. The operation method according to claim 6, wherein the judgment condition includes a threshold judgment condition or a function-mapping judgment condition.
8. The operation method according to claim 7, wherein the judgment condition is a threshold judgment condition, comprising: less than a given threshold, greater than a given threshold, within a given value range, or outside a given value range.
9. The operation method according to claim 7, wherein the judgment condition is a function-mapping judgment condition, i.e., judging whether a specified condition is met after a functional transformation.
10. The operation method according to claim 6, wherein, according to the judgment condition, if the absolute value of input data input to a multiplier is less than a given threshold, the multiplier is closed.
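A minimal software sketch of the method of claims 6 and 10 (the names and the threshold value are illustrative and are not part of the claims):

```python
# Claims 6 and 10 in miniature: set a judgment condition, then close
# (skip) a multiplier whenever the absolute value of its input data
# is below the given threshold.
def gated_multiply(a, b, condition):
    if condition(a):          # claim 10: |input| < threshold closes it
        return None           # multiplier closed; no operation performed
    return a * b              # multiplier active otherwise

threshold = 0.5
condition = lambda v: abs(v) < threshold   # claim 6: set judgment condition

small = gated_multiply(0.1, 3.0, condition)   # closed -> None
large = gated_multiply(2.0, 3.0, condition)   # active -> 6.0
```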
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110597369.5A CN113449855A (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
CN201710441977.0A CN109086880B (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
PCT/CN2018/090901 WO2018228399A1 (en) | 2017-06-13 | 2018-06-12 | Computing device and method |
EP18818258.8A EP3637327B1 (en) | 2017-06-13 | 2018-06-12 | Computing device and method |
EP19217768.1A EP3657403A1 (en) | 2017-06-13 | 2018-06-12 | Computing device and method |
US16/698,984 US11544543B2 (en) | 2017-06-13 | 2019-11-28 | Apparatus and method for sparse training acceleration in neural networks |
US16/698,976 US11544542B2 (en) | 2017-06-13 | 2019-11-28 | Computing device and method |
US16/698,988 US11537858B2 (en) | 2017-06-13 | 2019-11-28 | Computing device and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710441977.0A CN109086880B (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110597369.5A Division CN113449855A (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109086880A true CN109086880A (en) | 2018-12-25 |
CN109086880B CN109086880B (en) | 2021-06-29 |
Family
ID=64839078
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710441977.0A Active CN109086880B (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
CN202110597369.5A Pending CN113449855A (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110597369.5A Pending CN113449855A (en) | 2017-06-13 | 2017-06-13 | Arithmetic device and method |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109086880B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1838031A (en) * | 2005-04-13 | 2006-09-27 | 威盛电子股份有限公司 | Closing non-acting numeric value logical operation unit to save power |
CN101527010A (en) * | 2008-03-06 | 2009-09-09 | 上海理工大学 | Hardware realization method and system for artificial neural network algorithm |
CN104539263A (en) * | 2014-12-25 | 2015-04-22 | 电子科技大学 | Reconfigurable low-power dissipation digital FIR filter |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8872688B2 (en) * | 2010-07-13 | 2014-10-28 | University Of Washington Through Its Center For Commercialization | Methods and systems for compressed sensing analog to digital conversion |
US20130318020A1 (en) * | 2011-11-03 | 2013-11-28 | Georgia Tech Research Corporation | Analog programmable sparse approximation system |
CN109086880B (en) * | 2017-06-13 | 2021-06-29 | 上海寒武纪信息科技有限公司 | Arithmetic device and method |
2017
- 2017-06-13 CN CN201710441977.0A patent/CN109086880B/en active Active
- 2017-06-13 CN CN202110597369.5A patent/CN113449855A/en active Pending
Non-Patent Citations (2)
Title |
---|
SHIJIN ZHANG et al.: "Cambricon-X: An Accelerator for Sparse Neural Networks", 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture *
YU-HSIN CHEN et al.: "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks", IEEE Journal of Solid-State Circuits *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449855A (en) * | 2017-06-13 | 2021-09-28 | 上海寒武纪信息科技有限公司 | Arithmetic device and method |
US11537858B2 (en) | 2017-06-13 | 2022-12-27 | Shanghai Cambricon Information Technology Co., Ltd. | Computing device and method |
US11544542B2 (en) | 2017-06-13 | 2023-01-03 | Shanghai Cambricon Information Technology Co., Ltd. | Computing device and method |
US11544543B2 (en) | 2017-06-13 | 2023-01-03 | Shanghai Cambricon Information Technology Co., Ltd. | Apparatus and method for sparse training acceleration in neural networks |
US11727268B2 (en) | 2017-06-21 | 2023-08-15 | Shanghai Cambricon Information Technology Co., Ltd. | Sparse training in neural networks |
US11544526B2 (en) | 2017-06-26 | 2023-01-03 | Shanghai Cambricon Information Technology Co., Ltd. | Computing device and method |
Also Published As
Publication number | Publication date |
---|---|
CN113449855A (en) | 2021-09-28 |
CN109086880B (en) | 2021-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109086880A (en) | A kind of arithmetic unit and method | |
CN106126481B (en) | A kind of computing system and electronic equipment | |
CN108090560A (en) | The design method of LSTM recurrent neural network hardware accelerators based on FPGA | |
CN110035410A (en) | Federated resource distribution and the method and system of unloading are calculated in a kind of vehicle-mounted edge network of software definition | |
CN107391317A (en) | A kind of method, apparatus of data recovery, equipment and computer-readable recording medium | |
CN110764885B (en) | Method for splitting and unloading DNN tasks of multiple mobile devices | |
EP3637327B1 (en) | Computing device and method | |
US20220004858A1 (en) | Method for processing artificial neural network, and electronic device therefor | |
Hashemi et al. | On the benefits of multiple gossip steps in communication-constrained decentralized federated learning | |
CN111339027A (en) | Automatic design method of reconfigurable artificial intelligence core and heterogeneous multi-core chip | |
CN108345934A (en) | A kind of activation device and method for neural network processor | |
CN115374853A (en) | Asynchronous federal learning method and system based on T-Step polymerization algorithm | |
CN109409505A (en) | A method of the compression gradient for distributed deep learning | |
CN109615071A (en) | A kind of neural network processor of high energy efficiency, acceleration system and method | |
CN114885420A (en) | User grouping and resource allocation method and device in NOMA-MEC system | |
CN110909870A (en) | Training device and method | |
Abouhawwash et al. | Evolutionary multi-objective optimization using benson’s karush-kuhn-tucker proximity measure | |
CN109491956B (en) | Heterogeneous collaborative computing system | |
Li et al. | Robust and efficient quantization and coding for control of multidimensional linear systems under data rate constraints | |
CN114005458A (en) | Voice noise reduction method and system based on pipeline architecture and storage medium | |
Sharara et al. | A recurrent neural network based approach for coordinating radio and computing resources allocation in cloud-ran | |
CN102209369B (en) | Method based on wireless network interface selection to improve a smart phone user experience | |
CN110647396A (en) | Method for realizing intelligent application of end cloud cooperative low-power consumption and limited bandwidth | |
CN104933110A (en) | MapReduce-based data pre-fetching method | |
CN110113193A (en) | Data transmission method, system and medium based on hierarchical agent |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | |
Inventor after: Request for anonymity Inventor before: Request for anonymity |