CN108090565A - A convolutional neural network parallelized training acceleration method
- Publication number: CN108090565A (application CN201810037896.9A)
- Authority: CN (China)
- Prior art keywords: layer, batch, local error, error, sample
- Prior art date: 2018-01-16
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G06N3/08 — Learning methods (G: Physics; G06: Computing, calculating or counting; G06N: Computing arrangements based on specific computational models; G06N3/00: based on biological models; G06N3/02: Neural networks)
- G06N3/045 — Combinations of networks (G06N3/04: Architecture, e.g. interconnection topology)
Abstract
The present invention provides a convolutional neural network parallelized training acceleration method based on a mixed-batch idea, applied to a heterogeneous system composed of a CPU and an FPGA. It mainly addresses the problem that, for large-scale convolutional neural network structures, FPGA memory is insufficient when the samples of a batch are trained in parallel on the FPGA, and it can be applied to image recognition and object detection in computer vision. The method comprises the following steps: 1. In the data preprocessing stage, the samples of the original training set are randomly reshuffled. 2. In the feed-forward stage, data are written to shared memory in batches and processed in parallel by each layer of the convolutional neural network implemented in the OpenCL language; the first fully connected layer randomly reads the data of one sample from the previous layer's batch and computes the output of this layer. 3. In the local-error update stage, the local error of the first fully connected layer randomly updates the local error of one sample in the previous layer's batch, and the remaining layers compute their local errors in parallel.
Description
Technical field
The invention belongs to the field of computing, and in particular relates to an FPGA-based convolutional neural network parallelized training acceleration method.
Background technology
An FPGA (field-programmable gate array) is a high-performance, low-power, programmable digital circuit chip. An FPGA mainly consists of an array of configurable logic blocks (CLBs) and interconnect, together with modules such as DSP blocks and block RAM (BRAM). The logic blocks can be configured to perform complex combinational logic functions, and the interconnect links the logic blocks, DSP blocks and inputs into a complete circuit. For computation-intensive algorithms, a general-purpose processor relies on the von Neumann architecture and must fetch instructions, decode them and finally execute machine code; its computing resources are built from hardware units at the granularity of multipliers and adders, so if the architecture does not match the mathematical model of the algorithm, hardware resources are wasted. An FPGA, by contrast, is programmable: developers can repeatedly reconfigure the underlying circuitry and provision exactly the hardware resources the computation needs, achieving higher utilization. For a given application, an FPGA therefore usually offers better energy efficiency than a general-purpose processor.
Traditional FPGA development uses hardware description languages (Verilog, VHDL, etc.) and requires completing the RTL logic design. Developers need a deep understanding of hardware circuits, so the approach has a high entry barrier, long development cycles, and is difficult to upgrade and maintain. Moreover, deep learning algorithms currently evolve and are updated continuously, making development in the traditional way costly. A technique is therefore needed that can quickly implement the training of convolutional neural networks and keep pace with constantly changing algorithms.
Convolutional neural networks are a classical type of artificial neural network and are widely used in image classification, object detection, speech recognition, video recognition, natural language processing and other fields. In recent years, with the rapid development of artificial intelligence, both the generalization ability and the recognition accuracy of convolutional neural networks have greatly improved. The document "Wang D, An J, Xu K. PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks [J]. arXiv preprint arXiv:1611.02450, 2016." proposes executing OpenCL kernel functions in a pipelined fashion, but the drawback is that the kernel functions can only execute single-threaded. The document "Liu L, Luo J, Deng X, et al. FPGA-based Acceleration of Deep Neural Networks Using High Level Method [C]// P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2015 10th International Conference on. IEEE, 2015: 824-827." describes applying the mini-batch stochastic gradient descent method to the parallel training of deep neural networks on an FPGA. However, that document only studies the mini-batch gradient descent method for neural networks; as network structures grow more complex, networks become deeper and the variety of layer types increases, the input data of a batch under the mini-batch gradient descent method can exceed the FPGA's global memory capacity and increase memory read/write time, while plain stochastic gradient descent, which trains on a single sample at a time, is inefficient. A training method applicable to FPGA devices is therefore needed that reduces training time without significantly sacrificing training accuracy.
The content of the invention
The object of the invention is to address the above problems in the prior art by providing a convolutional neural network training method that can complete the fast training of a convolutional neural network model under relatively low memory bandwidth, where memory bandwidth refers to the number of bytes read and written per unit time.
The present invention provides a training method of a convolutional neural network model, the method comprising:
On an embedded FPGA platform, the CPU acts as the control device and the FPGA as the computing device. Each layer of the convolutional neural network is processed in parallel on the FPGA, and shared memory accessible to both the CPU and the FPGA is allocated for the model's structural parameters and trainable parameters. The structural parameters include the number of convolution kernels, the convolution kernel size, the average-pooling factor size and similar parameters; the trainable parameters refer to the network weights, biases and similar parameters.
According to the type of each layer in the convolutional neural network to be trained, feature-image outputs and local errors of different batch scales are defined, and memory space is allocated for them. The batch scale refers to the number of samples selected from the training set at a time; multiple samples form one batch.
The shared memory is allocated in an aligned manner, and data are transferred from the host to the FPGA device by DMA (direct memory access). Throughout the entire training process, the data in shared memory are continuously computed on and passed between network layers.
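As an illustration of this step, the following is a minimal host-side sketch in C using the OpenCL API, assuming a valid context and command queue already exist; the function name, the 64-byte alignment and the buffer handling are illustrative choices, not details fixed by the invention.

```c
#include <CL/cl.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: 64-byte-aligned staging memory plus a blocking write, which lets
 * the FPGA OpenCL runtime move the batch by DMA. ctx and queue are assumed
 * to be a valid context and command queue; names and sizes are illustrative. */
cl_mem create_shared_batch_buffer(cl_context ctx, cl_command_queue queue,
                                  const float *samples, size_t bytes)
{
    void *staging = NULL;
    posix_memalign(&staging, 64, bytes);   /* alignment enables DMA transfers */
    memcpy(staging, samples, bytes);

    cl_int err;
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, bytes, staging, 0, NULL, NULL);

    free(staging);
    return buf;
}
```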
During the feed-forward computation, the fully connected layer randomly reads the data of one feature image from the previous layer's batch and records its index within the batch; during the back-propagation computation, the output-layer error is calculated using the label data corresponding to that index.
When updating local errors, according to the chain rule of the error back-propagation algorithm, the network layers that hold a single sample directly update the local error back-propagated from the output layer; the last batch-scale network layer uses the local error of the following layer to randomly update the local error of one of its current samples; and the preceding batch-scale network layers update the local errors of their corresponding multiple samples in parallel, layer by layer.
When calculating the local error of a convolutional layer whose next layer is a pooling layer, average pooling is used, and the local error of the pooling layer is multiplied by an error scaling factor λ to obtain the local error value of the corresponding convolutional-layer neuron, so as to fine-tune the convolution kernel parameters and biases.
For a batch-scale convolutional layer, the average gradient over the batch is calculated and the convolution kernel parameters are updated in parallel; the average local error over the batch is calculated and the bias parameters are updated in parallel.
For a single-sample fully connected layer, the gradient of the single feature image is calculated and the weight parameters are updated in parallel; the local error of the single feature image is calculated and the bias parameters are updated in parallel.
After the current batch has been updated, the data of the next batch are transferred, and this is repeated until a preset number of iterations is reached or the error falls below a threshold, at which point training stops.
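A condensed host-side sketch of this control flow is given below; the helper functions (load_next_batch, feed_forward, back_propagate, update_parameters) are hypothetical stand-ins for the OpenCL kernel launches detailed in the embodiment section.

```c
/* Hypothetical helpers standing in for the OpenCL kernel launches
 * described in the detailed embodiment below. */
void  load_next_batch(void);       /* DMA one batch into shared memory         */
int   feed_forward(void);          /* returns the randomly chosen sample index */
float back_propagate(int sample);  /* output error + layer-wise local errors   */
void  update_parameters(void);     /* batch-averaged conv / single-sample FC   */

void train(int max_iters, float err_threshold)
{
    for (int n = 0; n < max_iters; ++n) {
        load_next_batch();
        int s = feed_forward();            /* index recorded for the FC layers */
        float err = back_propagate(s);     /* uses the label of sample s       */
        update_parameters();
        if (err < err_threshold)
            break;                         /* stop once the error is small enough */
    }
}
```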
Description of the drawings
Fig. 1 is the overall flow chart of the convolutional neural network parallelized training method of the present invention;
Fig. 2 is the flow chart of a single iteration in the convolutional neural network parallelized training method of the present invention;
Fig. 3 is a schematic diagram of the data flow in the convolutional neural network parallelized training method of the present invention;
Fig. 4 is a schematic diagram of the implementation principle of a convolutional-layer local-error update method according to an exemplary embodiment.
Specific embodiment
The method of the present invention is described in further detail below with reference to the accompanying drawings.
The implementation flow of the FPGA-based convolutional neural network parallelized training method in the embodiment of the present invention, shown in Fig. 1, comprises the following steps:
The FPGA device communicates with the CPU over the PCIe bus. On the CPU side, the samples of the training set are randomly rearranged. Following the OpenCL standard, shared memory accessible to both the CPU and the FPGA is allocated for the outputs and local errors of each layer of the convolutional neural network model to be trained. The allocated memory sizes fall into two batch scales: for convolutional layers and pooling layers, each neuron stores the outputs and local errors of a fixed number (greater than 1) of samples, while for fully connected layers each neuron only stores the output and local error of a single sample. In addition, for each convolutional layer, memory space must also be allocated for the convolution kernels and biases, its size computed from the previous layer's image size, the convolution kernel size and the stride. For each fully connected layer, memory space must be allocated for the weights and biases, its size computed from the number of neurons in the previous layer and in the current layer. For the output layer, memory space must also be allocated for the label data.
For the implementation of a single parallel training iteration, refer to Fig. 2; for the feature-image outputs and local errors of each layer, refer to Fig. 3. The specific implementation is as follows:
A fixed number of samples is defined as one batch, and the sample data of one batch are read in. Random numbers in [-0.5, 0.5] are used to initialize the initial convolution kernels and biases of each convolutional layer and the initial weights and biases of each fully connected layer.
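A minimal C sketch of this initialization, assuming the parameters are stored as flat float arrays:

```c
#include <stdlib.h>

/* Sketch: fill a parameter array with uniform random values in [-0.5, 0.5],
 * as used to initialise convolution kernels, FC weights and biases. */
void init_uniform(float *p, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        p[i] = (float)rand() / (float)RAND_MAX - 0.5f;
}
```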
For the feed-forward computation of a convolutional layer, an OpenCL kernel function over a three-dimensional index space performs the convolution and activation operations on the feature images of one batch in parallel; the parallel granularity is that each neuron independently fetches and computes on the data of its own local receptive field, producing the output feature images.
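The following OpenCL C kernel is a sketch of this step under assumed conventions: flat NCHW-style layouts, stride 1, no padding, and tanh as the activation function; one work-item per output neuron, launched over a three-dimensional index space (output width, output height, batch × output maps).

```c
/* OpenCL C sketch of the batched convolution + activation kernel.
 * One work-item per output neuron; names and layouts are illustrative. */
__kernel void conv_forward(__global const float *in,      /* B x IC x in_h x in_w   */
                           __global const float *weights, /* OC x IC x K x K        */
                           __global const float *bias,    /* OC                     */
                           __global float *out,           /* B x OC x out_h x out_w */
                           const int in_w, const int in_h,
                           const int in_maps, const int out_maps,
                           const int K)                    /* kernel size, stride 1 */
{
    const int x  = get_global_id(0);
    const int y  = get_global_id(1);
    const int z  = get_global_id(2);
    const int oc = z % out_maps;            /* output feature map  */
    const int b  = z / out_maps;            /* sample within batch */
    const int out_w = in_w - K + 1;
    const int out_h = in_h - K + 1;

    float acc = bias[oc];
    for (int ic = 0; ic < in_maps; ++ic)            /* loop over input maps  */
        for (int ky = 0; ky < K; ++ky)              /* local receptive field */
            for (int kx = 0; kx < K; ++kx) {
                int in_idx = ((b * in_maps + ic) * in_h + (y + ky)) * in_w + (x + kx);
                int w_idx  = ((oc * in_maps + ic) * K + ky) * K + kx;
                acc += in[in_idx] * weights[w_idx];
            }

    int o_idx = ((b * out_maps + oc) * out_h + y) * out_w + x;
    out[o_idx] = tanh(acc);                         /* activation (assumed tanh) */
}
```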
For the feed-forward computation of a pooling layer, an OpenCL kernel function over a three-dimensional index space performs average pooling in parallel on the convolved and activated feature images of one batch; the parallel granularity is that each neuron independently fetches and computes on the data of its own local receptive field, producing the output feature images.
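A corresponding OpenCL C sketch of the batched average-pooling kernel, under the same assumed layout, with one work-item per pooled output neuron:

```c
/* OpenCL C sketch of batched average pooling. One work-item per pooled
 * output neuron; global dims (out_w, out_h, batch * maps). Names illustrative. */
__kernel void avg_pool_forward(__global const float *in,   /* B x M x in_h x in_w   */
                               __global float *out,         /* B x M x out_h x out_w */
                               const int in_w, const int in_h,
                               const int maps,
                               const int pool)               /* window = stride = pool */
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    const int z = get_global_id(2);          /* encodes (sample, map) */
    const int out_w = in_w / pool;
    const int out_h = in_h / pool;

    float acc = 0.0f;
    for (int py = 0; py < pool; ++py)        /* local receptive field */
        for (int px = 0; px < pool; ++px) {
            int ix = x * pool + px;
            int iy = y * pool + py;
            acc += in[(z * in_h + iy) * in_w + ix];
        }

    out[(z * out_h + y) * out_w + x] = acc / (pool * pool);
}
```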
For the feed-forward computation of a fully connected layer, one feature image of the previous layer's batch is selected at random and its index within the current batch is recorded; an OpenCL kernel function over a one-dimensional index space processes it in parallel, and the parallel granularity is that each neuron independently fetches and computes on the outputs of all previous-layer neurons connected to it, producing the neuron outputs.
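A sketch of this step in OpenCL C: the host draws the sample index s at random (for example s = rand() % B) and records it, and the one-dimensional kernel below, with one work-item per output neuron, reads only that sample's slice of the previous layer's batch output; tanh is assumed as the activation.

```c
/* OpenCL C sketch of the fully connected feed-forward step for one
 * randomly chosen sample of the previous layer's batch. */
__kernel void fc_forward(__global const float *prev_out, /* B x prev_n            */
                         __global const float *weights,  /* cur_n x prev_n        */
                         __global const float *bias,     /* cur_n                 */
                         __global float *out,            /* cur_n, single sample  */
                         const int prev_n,
                         const int s)                     /* sample index chosen on host */
{
    const int j = get_global_id(0);                       /* output neuron j        */
    float acc = bias[j];
    for (int i = 0; i < prev_n; ++i)                      /* all prev-layer neurons */
        acc += weights[j * prev_n + i] * prev_out[s * prev_n + i];
    out[j] = tanh(acc);
}
```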
For the feed-forward computation of the output layer, an OpenCL kernel function over a one-dimensional index space processes it in parallel; the parallel granularity is that each neuron independently fetches and computes on the outputs of all previous-layer neurons connected to it, producing the neuron outputs. At the same time, the label data of the corresponding sample are read according to the recorded index, and the output error is computed in parallel by an OpenCL kernel function over a one-dimensional index space, the parallel granularity being the computation of the local error of each individual neuron.
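A sketch of the output-error kernel, assuming a squared-error loss and a tanh output activation, so that the local error is (y − t)·(1 − y²); this sign convention matches the minus signs used in the update formulas (4)-(7) below.

```c
/* OpenCL C sketch of the output-layer error: one work-item per output
 * neuron, using the label vector of the recorded sample index. The local
 * error is defined as the derivative of the squared-error loss, so the
 * parameter updates later subtract the gradient. */
__kernel void output_error(__global const float *out,    /* network output       */
                           __global const float *label,  /* label for sample s   */
                           __global float *delta)        /* local error per neuron */
{
    const int j = get_global_id(0);
    const float y = out[j];
    delta[j] = (y - label[j]) * (1.0f - y * y);           /* (y - t) * f'(y) */
}
```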
For the local-error update of a single-sample fully connected layer, an OpenCL kernel function over a one-dimensional index space directly updates the local error of the fully connected layer. The update uses the following formula (1):

    δ_i^k = f'(y_i^k) · Σ_j w_ij · δ_j^(k+1)    (1)

where δ_i^k denotes the local error of the i-th neuron of layer k, δ_j^(k+1) denotes the local error of the j-th neuron of layer k+1, w_ij denotes the weight connecting the two neurons, and f'(y_i^k) denotes the derivative of the layer-k activation function with respect to the output value.
For a batch-scale convolutional layer whose following layer is a single-sample fully connected layer, the local error of the following layer is used to randomly update the local error of one of the current samples, and an OpenCL kernel function over a one-dimensional index space updates the local error of the current layer.
For the local-error update of a convolutional layer whose next layer is a pooling layer (see Fig. 4), average pooling is used, and the local error of the corresponding pooling-layer neuron is multiplied by the error scaling factor λ to obtain the local error value of the corresponding convolutional-layer neuron. The update uses the following formula (2):

    δ_i^k = λ · (δ_j^(k+1) ⊗ 1_(n×n)) ∘ f'(y_i^k)    (2)

where δ_i^k denotes the local error of the i-th feature image of layer k, δ_j^(k+1) denotes the local error of the corresponding j-th feature image of layer k+1, ⊗ denotes the Kronecker product (here upsampling the pooled error by the pooling factor n), ∘ denotes element-wise multiplication, and f'(y_i^k) denotes the derivative of the layer-k activation function with respect to the feature-image output value.
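An OpenCL C sketch of formula (2): each work-item handles one convolutional-layer neuron, reads the pooled error that covers it (the Kronecker upsampling by the pooling factor), scales it by λ and multiplies by the activation derivative; a one-to-one correspondence between convolutional and pooling feature maps and a tanh activation are assumed.

```c
/* OpenCL C sketch of formula (2). Global dims: (conv_w, conv_h, batch * maps). */
__kernel void conv_error_from_pool(__global const float *delta_pool, /* B x M x ph x pw */
                                   __global const float *conv_out,   /* B x M x ch x cw */
                                   __global float *delta_conv,       /* B x M x ch x cw */
                                   const int conv_w, const int conv_h,
                                   const int pool,                    /* pooling factor n */
                                   const float lambda)                /* error scale      */
{
    const int x = get_global_id(0);
    const int y = get_global_id(1);
    const int z = get_global_id(2);                 /* encodes (sample, map) */
    const int pw = conv_w / pool;
    const int ph = conv_h / pool;

    const float d = delta_pool[(z * ph + y / pool) * pw + x / pool];
    const int idx = (z * conv_h + y) * conv_w + x;
    const float o = conv_out[idx];
    delta_conv[idx] = lambda * d * (1.0f - o * o);  /* lambda * up(delta) * f'(y) */
}
```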
The remaining batch-scale network layers successively update the local errors of their corresponding multiple samples in parallel using OpenCL kernel functions over a three-dimensional index space. For a pooling layer whose next layer is a convolutional layer, the following formula (3) is used:

    δ_i^k = (Σ_j extend(δ_j^(k+1)) * rot180(W_ij)) ∘ f'(y_i^k)    (3)

where δ_i^k denotes the local error of the i-th feature image of layer k, δ_j^(k+1) denotes the local error of the j-th feature image of layer k+1, the extend function zero-pads the feature-image local error to the expanded size, the rot180 function rotates the convolution kernel by 180 degrees, * denotes the convolution operation, ∘ denotes element-wise multiplication, W_ij denotes the convolution kernel connecting the two feature images, and f'(y_i^k) denotes the derivative of the layer-k activation function with respect to the feature-image output value.
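An OpenCL C sketch of formula (3): one work-item per pooling-layer neuron performs the full convolution of the next layer's zero-padded ("extend") error maps with the 180-degree-rotated kernels; the rotation is realised here through the reversed index arithmetic rather than by materialising rot180(W). A tanh-style derivative is assumed; if the pooling layer applies no activation, the factor f'(y) is simply 1.

```c
/* OpenCL C sketch of formula (3). Global dims: (w, h, batch * in_maps). */
__kernel void pool_error_from_conv(__global const float *delta_next, /* B x OC x oh x ow */
                                   __global const float *weights,    /* OC x IC x K x K  */
                                   __global const float *pool_out,   /* B x IC x h x w   */
                                   __global float *delta_pool,       /* B x IC x h x w   */
                                   const int w, const int h,         /* pooling map size */
                                   const int in_maps, const int out_maps,
                                   const int K)
{
    const int x  = get_global_id(0);
    const int y  = get_global_id(1);
    const int z  = get_global_id(2);
    const int ic = z % in_maps;               /* pooling-layer map    */
    const int b  = z / in_maps;               /* sample within batch  */
    const int ow = w - K + 1;                 /* next conv layer size */
    const int oh = h - K + 1;

    float acc = 0.0f;
    for (int oc = 0; oc < out_maps; ++oc)
        for (int ky = 0; ky < K; ++ky)
            for (int kx = 0; kx < K; ++kx) {
                int ox = x - kx, oy = y - ky;          /* full convolution   */
                if (ox < 0 || oy < 0 || ox >= ow || oy >= oh)
                    continue;                          /* zero-padded region */
                int d_idx = ((b * out_maps + oc) * oh + oy) * ow + ox;
                int w_idx = ((oc * in_maps + ic) * K + ky) * K + kx;
                acc += delta_next[d_idx] * weights[w_idx];
            }

    int idx = ((b * in_maps + ic) * h + y) * w + x;
    const float o = pool_out[idx];
    delta_pool[idx] = acc * (1.0f - o * o);            /* multiply by f'(y) */
}
```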
For a batch-scale convolutional layer, the average gradient over the feature images of the batch is calculated, and an OpenCL kernel function over a three-dimensional index space updates the convolution kernel parameters in parallel. The update uses the following formula (4):

    W_ij(n+1) = W_ij(n) − (α / B) · Σ_(b=1..B) (x_i^b * δ_j^b)    (4)
For a batch-scale convolutional layer, the average local error over the feature images of the batch is calculated, and an OpenCL kernel function over a one-dimensional index space updates the biases in parallel. The update uses the following formula (5):

    b_j(n+1) = b_j(n) − (α / B) · Σ_(b=1..B) Σ_(u,v) δ_j^b(u,v)    (5)
For a single-sample fully connected layer, the gradient of the single feature image is calculated, and an OpenCL kernel function over a two-dimensional index space updates the weight parameters in parallel. The update uses the following formula (6):

    W_ij(n+1) = W_ij(n) − α · x_i · δ_j    (6)
For a single-sample fully connected layer, the local error of the single feature image is calculated, and an OpenCL kernel function over a one-dimensional index space updates the bias parameters in parallel. The update uses the following formula (7):

    b_j(n+1) = b_j(n) − α · δ_j    (7)
In formulas (4)-(7), n denotes the iteration number, α denotes the network learning rate, and B denotes the number of samples in one batch. For a convolutional layer, W_ij denotes the convolution kernel connecting the i-th feature map of the previous layer and the j-th feature map of the current layer, and x_i^b * δ_j^b denotes the convolution of the output of the i-th feature map of the previous layer with the local error of the j-th feature map of the current layer for sample b. For a fully connected layer, W_ij denotes the weight connecting the i-th neuron of the previous layer and the j-th neuron of the current layer, and x_i · δ_j denotes the product of the output of the i-th neuron of the previous layer and the local error of the j-th neuron of the current layer.
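An OpenCL C sketch of the batch-averaged convolution-kernel update of formula (4); one work-item per kernel weight accumulates the gradient over all B samples and then applies the averaged update. The bias update of formula (5) and the single-sample fully connected updates of formulas (6) and (7) follow the same pattern with one- and two-dimensional index spaces. Layouts and names are illustrative.

```c
/* OpenCL C sketch of formula (4). Global dims: (K, K, in_maps * out_maps). */
__kernel void update_conv_weights(__global float *w,              /* OC x IC x K x K  */
                                  __global const float *prev_out, /* B x IC x ih x iw */
                                  __global const float *delta,    /* B x OC x oh x ow */
                                  const int in_w, const int in_h,
                                  const int out_w, const int out_h,
                                  const int in_maps, const int out_maps,
                                  const int K, const int B,
                                  const float alpha)              /* learning rate */
{
    const int kx = get_global_id(0);
    const int ky = get_global_id(1);
    const int z  = get_global_id(2);
    const int ic = z % in_maps;
    const int oc = z / in_maps;

    float grad = 0.0f;
    for (int b = 0; b < B; ++b)                       /* accumulate over the batch */
        for (int y = 0; y < out_h; ++y)
            for (int x = 0; x < out_w; ++x) {
                int d_idx = ((b * out_maps + oc) * out_h + y) * out_w + x;
                int i_idx = ((b * in_maps + ic) * in_h + (y + ky)) * in_w + (x + kx);
                grad += delta[d_idx] * prev_out[i_idx];
            }

    int w_idx = ((oc * in_maps + ic) * K + ky) * K + kx;
    w[w_idx] -= alpha * grad / (float)B;              /* average-gradient update */
}
```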
After the current batch has been updated, the data of the next batch are transferred, and this is repeated until a preset number of iterations is reached or the error falls below a threshold, at which point training stops.
The training method of the convolutional neural network is applicable to, but not limited to, any of the following models: LeNet, AlexNet, VGG-Net, GoogLeNet, ResNet.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. For those skilled in the art, the invention may be subject to various modifications and variations. Any modification, equivalent substitution, improvement or the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (2)
1. A convolutional neural network parallelized training method, characterized by comprising the following steps:
1) implementing the parallel processing of each layer of the convolutional neural network on an FPGA (field-programmable gate array), and creating shared memory accessible to both the CPU and the FPGA for the model's structural parameters and trainable parameters, the structural parameters including the outputs and local errors of the network layers at all levels, and the trainable parameters including the convolution kernels and bias vectors of the convolutional layers at all levels and the weight matrices and bias vectors of the fully connected layers;
2) creating, according to the type of each layer in the convolutional neural network to be trained, memory space for the feature-image outputs and local errors of different batch scales;
3) creating the shared memory in an aligned manner and transferring data between the host and the FPGA device by DMA (direct memory access), the data in shared memory being continuously computed on and passed between network layers throughout the training process;
4) during the feed-forward computation, randomly reading in the fully connected layer the data of one feature image from the previous layer's batch and recording its index within the batch, and during back-propagation calculating the output-layer error using the label data corresponding to that index;
5) when updating local errors, according to the chain rule of the error back-propagation algorithm, having the network layers that hold a single sample directly update the local error back-propagated from the output layer, having the last batch-scale network layer use the local error of the following layer to randomly update the local error of one of its current samples, and having the preceding batch-scale network layers update the local errors of their corresponding multiple samples in parallel, layer by layer;
6) for a batch-scale convolutional layer, calculating the average gradient over the feature images of the batch and updating the convolution kernel parameters in parallel, and calculating the average local error over the batch and updating the bias parameters in parallel;
7) for a single-sample fully connected layer, calculating the gradient of the single feature image and updating the weight parameters in parallel, and calculating the local error of the single feature image and updating the bias parameters in parallel;
8) after the current batch has been updated, transferring the data of the next batch, and repeating until a preset number of iterations is reached or the error falls below a threshold, whereupon training stops.
2. The method according to claim 1, characterized in that the layers of the convolutional neural network store outputs and local errors of different batch scales, the batch scale referring to the number of samples selected from the training set at a time; in the training method, the convolutional layers and pooling layers store the outputs and local errors of batch-scale samples, while the fully connected layers store the output and local error of a single sample;
during the feed-forward computation of the convolutional neural network, when computation passes from a batch-scale layer to a single-sample layer, one sample is selected at random and its index is recorded, and the output-layer error is calculated using the label data corresponding to that index;
during the backward computation of the convolutional neural network, when computation passes from a single-sample layer to a batch-scale layer, the local-error update of the batch-scale layer is completed according to the sample index recorded during the feed-forward computation;
when the local error of a convolutional layer is calculated and the next layer is a pooling layer, average pooling is used, and the local error of the pooling layer is multiplied by the error scaling factor λ to obtain the local error value of the corresponding convolutional-layer neuron, so as to fine-tune the convolution kernel parameters and biases.
Priority Application (1)
- Application number: CN201810037896.9A; priority date: 2018-01-16; filing date: 2018-01-16; title: A convolutional neural network parallelized training acceleration method
Publication (1)
- Publication number: CN108090565A; publication date: 2018-05-29
Family
- ID: 62182295; country: CN; status: Pending
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- WD01: Invention patent application deemed withdrawn after publication (application publication date: 2018-05-29)