CN108090565A - Parallelized training acceleration method for convolutional neural networks - Google Patents

Parallelized training acceleration method for convolutional neural networks

Info

Publication number
CN108090565A
CN108090565A (application CN201810037896.9A)
Authority
CN
China
Prior art keywords
layer
batch
local error
error
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810037896.9A
Other languages
Chinese (zh)
Inventor
洪启飞
阮爱武
史傲凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201810037896.9A priority Critical patent/CN108090565A/en
Publication of CN108090565A publication Critical patent/CN108090565A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The present invention provides a parallelized training acceleration method for convolutional neural networks. It proposes a mixed-batch idea and is applied to a heterogeneous system composed of a CPU and an FPGA. It mainly solves the problem that, for large-scale convolutional neural network structures, the FPGA runs out of memory when the samples of a batch are trained in parallel, and it can be applied to image recognition and object detection in the field of computer vision. The method comprises the following steps: 1. In the data preprocessing stage, the samples in the original training set are randomly reordered. 2. In the feedforward computation stage, data is written to shared memory in batches, and the parallel processing of each layer of the convolutional neural network is implemented in the OpenCL language; the first fully connected layer randomly reads the data of one sample from the previous layer's batch and computes the output of this layer. 3. In the local-error update stage, the local error of the first fully connected layer randomly updates the local error of one sample in the previous layer's batch, and the remaining layers compute their local errors in parallel.

Description

Parallelized training acceleration method for convolutional neural networks
Technical field
The invention belongs to the field of computers, and more particularly to an FPGA-based parallelized training acceleration method for convolutional neural networks.
Background technology
An FPGA, i.e. field-programmable gate array, is a high-performance, low-power, programmable digital circuit chip. An FPGA mainly contains a set of configurable logic blocks (CLBs) and interconnect wiring, and in addition includes modules such as DSP blocks and BRAM. After configuration, a logic block can perform a complex combinational logic function, and the interconnect connects the logic blocks, DSPs and inputs into a complete circuit. For computation-intensive algorithms, a general-purpose processor depends on the von Neumann architecture and must fetch instructions, decode them, and finally execute the machine code; its computing resources are composed of hardware units at the granularity of multipliers and adders. If the configuration of the architecture differs greatly from the mathematical model of the algorithm, hardware resources are wasted. An FPGA, by contrast, is programmable: developers can repeatedly reprogram the underlying circuitry and configure just enough hardware resources for the computation, so the utilization of the hardware is higher. Therefore, for a specific application, an FPGA achieves better energy efficiency than a general-purpose processor.
Traditional FPGA application development uses hardware description languages (Verilog, VHDL, etc.) and requires completing the RTL logic design. Developers need a deep understanding of hardware circuits, so the approach has a high entry barrier, a long development cycle, and is difficult to upgrade and maintain. Meanwhile, deep learning algorithms keep evolving and being updated, so developing with the traditional approach is costly. Therefore, a technology is needed that can quickly implement the training of convolutional neural networks and keep up with constantly changing algorithms.
Convolutional neural networks are a classic kind of artificial neural network and are widely used in fields such as image classification, object detection, speech recognition, video recognition and natural language processing. In recent years, with the rapid development of artificial intelligence, both the generalization ability and the recognition accuracy of convolutional neural networks have improved greatly. The document "Wang D, An J, Xu K. PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks [J]. arXiv preprint arXiv:1611.02450, 2016." proposes executing OpenCL kernel functions in a pipelined fashion, but the drawback is that each kernel function can only execute single-threaded. The document "Liu L, Luo J, Deng X, et al. FPGA-based Acceleration of Deep Neural Networks Using High Level Method [C] // P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), 2015 10th International Conference on. IEEE, 2015: 824-827." describes a method that applies mini-batch stochastic gradient descent to the parallel training of deep neural networks on an FPGA. However, that document only studies the mini-batch gradient descent method for neural networks. As network structures become increasingly complex, the network depth keeps growing and the variety of network layers keeps increasing; when the mini-batch gradient descent method is used, the scale of a batch of input sample data grows and can exceed the FPGA's global memory capacity, increasing the memory read/write time, while plain stochastic gradient descent trains on a single sample each time and is less efficient. Therefore, a training method applicable to FPGA devices is needed that reduces the training time without significantly sacrificing training accuracy.
The content of the invention
The object of the present invention is to address the above problems in the prior art by providing a training method for convolutional neural networks that can complete the fast training of a convolutional neural network model under a relatively low memory bandwidth, where memory bandwidth refers to the number of bytes read and written per unit time.
The present invention provides a training method for a convolutional neural network model, the method comprising:
On an embedded FPGA platform, the CPU serves as the control device and the FPGA as the computing device. The parallel processing of each layer of the convolutional neural network is implemented on the FPGA, and shared memory accessible to both the CPU and the FPGA is allocated for the structural parameters and trainable parameters of the model. The structural parameters include the number of convolution kernels, the convolution kernel size, the average pooling factor size, and so on; the trainable parameters refer to the network weights, biases, and so on.
According to the type of each layer in the convolutional neural network to be trained, the feature-image outputs and local errors are set at different batch scales, and memory space is allocated for them. The batch scale refers to the number of samples selected from the training set each time; multiple samples form one batch.
Shared memory is allocated in an aligned manner, and data is transferred from the host to the FPGA device by DMA (direct memory access). Throughout the training process, the data in shared memory is continuously computed on and passed between the network layers.
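As a purely illustrative host-side sketch of this allocation scheme (assuming an OpenCL 1.2 runtime and a single FPGA device; the function names, sizes and the use of CL_MEM_ALLOC_HOST_PTR to obtain pinned, DMA-friendly memory are assumptions of the sketch, not requirements of the method):

    /* Illustrative host-side allocation of CPU/FPGA shared memory. */
    #include <CL/cl.h>

    cl_mem alloc_shared(cl_context ctx, size_t bytes, cl_int *err)
    {
        /* Host-accessible (pinned) buffer used for weights, biases and the
         * per-layer outputs / local errors shared by CPU and FPGA. */
        return clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                              bytes, NULL, err);
    }

    void write_batch(cl_command_queue q, cl_mem buf, const float *src, size_t n)
    {
        cl_int err;
        /* Map on the CPU side, fill one batch of samples, then unmap so the
         * FPGA kernels can read the data; the runtime may move it by DMA. */
        float *p = (float *)clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                               0, n * sizeof(float),
                                               0, NULL, NULL, &err);
        for (size_t i = 0; i < n; ++i)
            p[i] = src[i];
        clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);
        clFinish(q);
    }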
During the feedforward computation, the fully connected layer randomly reads the data of one feature image from the previous layer's batch and records its index within the batch; during the backpropagation computation, the output-layer error is computed using the label data corresponding to that index.
When updating the local errors, according to the chain rule of the error backpropagation algorithm, the single-sample network layers directly update the local error backpropagated from the output layer; the last batch-scale network layer uses the following layer's local error to randomly update the local error of one of its current samples; and the preceding batch-scale network layers update, layer by layer, the local errors of the corresponding multiple samples in parallel.
When computing the local error of a convolutional layer, if the next layer is a pooling layer, average pooling is used, and the pooling layer's local error is multiplied by an error scaling factor λ to obtain the local error values of the corresponding neurons of the convolutional layer, achieving the goal of fine-tuning the whole set of convolution kernel parameters and biases.
For a batch-scale convolutional layer, the average gradient of the batch is computed and the convolution kernel parameters are updated in parallel; the average local error of the batch is computed and the bias parameters are updated in parallel.
For a single-sample fully connected layer, the gradient of the single feature image is computed and the weight parameters are updated in parallel; the local error of the single feature image is computed and the bias parameters are updated in parallel.
After the current batch has been updated, the data of the next batch is transferred, and so on until the preset number of iterations is reached or the error falls below a threshold, at which point training stops.
Description of the drawings
Fig. 1 is the overall flow chart of the convolutional neural network parallelized training method of the present invention;
Fig. 2 is the flow chart of a single iteration in the convolutional neural network parallelized training method of the present invention;
Fig. 3 is a schematic diagram of the data flow in the convolutional neural network parallelized training method of the present invention;
Fig. 4 is a schematic diagram of the implementation principle of a convolutional-layer local-error update method according to an exemplary embodiment.
Specific embodiment
The method of the present invention is described in further detail below with reference to the accompanying drawings.
The implementation flow of the FPGA-based parallelized training method for convolutional neural networks shown in Fig. 1, according to an embodiment of the present invention, comprises the following steps:
The FPGA device communicates with the CPU over the PCIe bus. On the CPU side, the samples in the training set are randomly rearranged, and, following the OpenCL standard, shared memory accessible to both the CPU and the FPGA is allocated for the outputs and local errors of each layer of the convolutional neural network model to be trained. The allocated memory sizes fall into two batch scales: for convolutional layers and pooling layers, each neuron stores the outputs and local errors of a fixed number (greater than 1) of samples; for fully connected layers, each neuron stores only the output and local error of a single sample. In addition, for a convolutional layer, memory space must also be allocated for the convolution kernels and biases, its size computed from the previous layer's image size, the convolution kernel size and the stride. For a fully connected layer, memory space must also be allocated for the weights and biases, its size computed from the number of neurons in the previous layer and in the current layer. For the output layer, memory space must also be allocated for the label data.
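For illustration only, the size computations described above can be mirrored by a small C helper such as the one below; the structure fields, the square-image and no-padding conventions, and the 4-byte float element type are assumptions of this sketch:

    #include <stddef.h>

    typedef struct {
        int in_size;      /* previous layer's image width/height */
        int kernel_size;  /* convolution kernel size             */
        int stride;       /* convolution stride                  */
        int in_maps;      /* number of input feature maps        */
        int out_maps;     /* number of output feature maps       */
    } conv_params_t;

    /* Bytes needed for the batch-scale outputs (or local errors) of a
     * convolutional layer: B samples, out_maps maps of out x out pixels. */
    size_t conv_output_bytes(const conv_params_t *p, int batch)
    {
        int out = (p->in_size - p->kernel_size) / p->stride + 1;
        return (size_t)batch * p->out_maps * out * out * sizeof(float);
    }

    /* Bytes needed for the convolution kernels plus one bias per output map. */
    size_t conv_weight_bytes(const conv_params_t *p)
    {
        return ((size_t)p->in_maps * p->out_maps
                * p->kernel_size * p->kernel_size + p->out_maps) * sizeof(float);
    }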
For the implementation of a single parallel training pass, refer to Fig. 2; for the feature-image outputs and local errors of each layer, refer to Fig. 3. The specific implementation is as follows:
A fixed number of samples is set as one batch, and the sample data of one batch is read in. Random numbers in [-0.5, 0.5] are used to initialize the initial convolution kernels and biases of each convolutional layer and the initial weights and biases of each fully connected layer.
For the feedforward computation of a convolutional layer, an OpenCL kernel function over a three-dimensional NDRange performs the convolution and activation operations on the feature images of a batch in parallel; the parallel granularity is that each work-item independently addresses and computes the local receptive field data corresponding to one neuron, producing the output feature images.
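For illustration, one possible form of such a kernel is sketched below in OpenCL C; the data layouts, the stride-1/no-padding convention, the sigmoid activation and all identifiers are assumptions of the sketch rather than part of the claimed method:

    /* One work-item per output neuron of the whole batch:
     * NDRange = (out_w, out_w, batch * out_maps). */
    __kernel void conv_forward(__global const float *in,      /* batch inputs */
                               __global const float *weights, /* kernels      */
                               __global const float *bias,
                               __global float *out,
                               const int in_w, const int out_w,
                               const int k, const int in_maps, const int out_maps)
    {
        const int x  = get_global_id(0);          /* output column           */
        const int y  = get_global_id(1);          /* output row              */
        const int z  = get_global_id(2);          /* sample * out_maps + map */
        const int b  = z / out_maps;              /* sample index in batch   */
        const int om = z % out_maps;              /* output feature map      */

        float acc = bias[om];
        for (int im = 0; im < in_maps; ++im)      /* local receptive field   */
            for (int i = 0; i < k; ++i)
                for (int j = 0; j < k; ++j)
                    acc += in[((b * in_maps + im) * in_w + (y + i)) * in_w + (x + j)]
                         * weights[((om * in_maps + im) * k + i) * k + j];

        /* activation (sigmoid assumed) */
        out[((b * out_maps + om) * out_w + y) * out_w + x] = 1.0f / (1.0f + exp(-acc));
    }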
For the feedforward computation of a pooling layer, an OpenCL kernel function over a three-dimensional NDRange performs average pooling in parallel on the batch's convolved and activated feature images; the parallel granularity is that each work-item independently addresses and computes the local receptive field data corresponding to one neuron, producing the output feature images.
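An illustrative average-pooling kernel under the same assumptions (non-overlapping p x p windows; all names are hypothetical):

    /* One work-item per pooling output: NDRange = (out_w, out_w, batch * maps). */
    __kernel void avg_pool_forward(__global const float *in, __global float *out,
                                   const int in_w, const int out_w, const int p)
    {
        const int x = get_global_id(0);
        const int y = get_global_id(1);
        const int z = get_global_id(2);           /* sample * maps + map */

        float acc = 0.0f;
        for (int i = 0; i < p; ++i)
            for (int j = 0; j < p; ++j)
                acc += in[(z * in_w + (y * p + i)) * in_w + (x * p + j)];

        out[(z * out_w + y) * out_w + x] = acc / (float)(p * p);
    }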
For the feedforward computation of a fully connected layer, one feature image in the previous layer's batch is randomly selected and its index within the current batch is recorded; an OpenCL kernel function over a one-dimensional NDRange processes it in parallel, the parallel granularity being that each work-item independently addresses and computes, for one neuron, all the connected neurons of the previous layer, producing that neuron's output.
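A sketch of one possible fully connected forward kernel: the host draws the random index sel in [0, B) on the CPU side and records it before launching the kernel; the data layouts and the sigmoid activation are assumptions of this sketch:

    /* 1D NDRange over the neurons of the current layer. */
    __kernel void fc_forward(__global const float *prev_out, /* B x prev_n    */
                             __global const float *weights,  /* prev_n x n    */
                             __global const float *bias,
                             __global float *out,            /* single sample */
                             const int prev_n, const int n, const int sel)
    {
        const int j = get_global_id(0);           /* current-layer neuron */
        float acc = bias[j];
        for (int i = 0; i < prev_n; ++i)
            acc += prev_out[sel * prev_n + i] * weights[i * n + j];
        out[j] = 1.0f / (1.0f + exp(-acc));       /* sigmoid assumed */
    }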
For the feedforward computation of the output layer, an OpenCL kernel function over a one-dimensional NDRange processes it in parallel, the parallel granularity being that each work-item independently addresses and computes, for one neuron, all the connected neurons of the previous layer, producing that neuron's output. Meanwhile, the corresponding sample label data is read according to the recorded index, and the output error is computed in parallel by a one-dimensional OpenCL kernel function, the parallel granularity being that each work-item computes the local error of one neuron.
For the local-error update of a single-sample fully connected layer, an OpenCL kernel function over a one-dimensional NDRange directly updates the local error of the fully connected layer; the update uses the following formula (1):

$\delta_i^{k} = \Big( \sum_{j} w_{ij}^{k+1} \, \delta_j^{k+1} \Big) \cdot f'(y_i^{k}) \qquad (1)$

where $\delta_i^{k}$ denotes the local error of the i-th neuron of layer k, $\delta_j^{k+1}$ denotes the local error of the j-th neuron of layer k+1, $w_{ij}^{k+1}$ denotes the weight connecting them, and $f'(y_i^{k})$ denotes the derivative of the layer-k activation function with respect to the output value.
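An illustrative one-dimensional kernel for formula (1), assuming a sigmoid activation so that f'(y) = y(1 - y); the NDRange is sized to the neuron count of layer k and all identifiers are hypothetical:

    __kernel void fc_backward(__global const float *next_delta, /* layer k+1 errors  */
                              __global const float *weights,    /* layer k+1 weights */
                              __global const float *out,        /* layer k outputs   */
                              __global float *delta,            /* layer k errors    */
                              const int n_k1)                   /* neurons in k+1    */
    {
        const int i = get_global_id(0);           /* neuron i of layer k */
        float acc = 0.0f;
        for (int j = 0; j < n_k1; ++j)
            acc += weights[i * n_k1 + j] * next_delta[j];
        delta[i] = acc * out[i] * (1.0f - out[i]);  /* formula (1) with sigmoid f' */
    }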
For a batch-scale convolutional layer whose following layer is a single-sample fully connected layer, the local error of one randomly chosen current sample is updated using the following layer's local error; an OpenCL kernel function over a one-dimensional NDRange updates the local error of the current layer.
For the local-error update of a convolutional layer, if the next layer is a pooling layer (see Fig. 4), average pooling is used, and the pooling layer's local error at the corresponding neurons is multiplied by the error scaling factor λ to obtain the local error values of the corresponding neurons of the convolutional layer; the update uses the following formula (2):

$\delta_i^{k} = \lambda \, \big( \delta_j^{k+1} \otimes \mathbf{1}_{p \times p} \big) \circ f'(y_i^{k}) \qquad (2)$

where $\delta_i^{k}$ denotes the local error of the i-th feature image of layer k, $\delta_j^{k+1}$ denotes the local error of the j-th feature image of layer k+1 (the pooling map corresponding to the i-th convolutional map), the $\otimes$ symbol is the Kronecker product (expanding each pooling-layer error value over its p x p pooling window), and $f'(y_i^{k})$ denotes the derivative of the layer-k activation function with respect to the feature-image output values.
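An illustrative kernel for formula (2): each work-item handles one neuron of the convolutional layer, reads the pooling-layer error of its p x p window, and scales it by λ and the (assumed sigmoid) activation derivative; the layouts and names are assumptions of the sketch:

    /* 3D NDRange over the convolutional layer's neurons of the whole batch. */
    __kernel void conv_backward_from_pool(__global const float *pool_delta,
                                          __global const float *conv_out,
                                          __global float *conv_delta,
                                          const int conv_w, const int pool_w,
                                          const int p, const float lambda)
    {
        const int x = get_global_id(0);
        const int y = get_global_id(1);
        const int z = get_global_id(2);           /* sample * maps + map */

        const float d = pool_delta[(z * pool_w + y / p) * pool_w + x / p];
        const float o = conv_out[(z * conv_w + y) * conv_w + x];
        /* Kronecker expansion, error scaling factor lambda, sigmoid f'. */
        conv_delta[(z * conv_w + y) * conv_w + x] = lambda * d * o * (1.0f - o);
    }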
The remaining batch-scale network layers update the local errors of the corresponding multiple samples in parallel, layer by layer, using OpenCL kernel functions over a three-dimensional NDRange. For a pooling layer whose next layer is a convolutional layer, the following formula (3) is used:

$\delta_i^{k} = \Big( \sum_{j} \mathrm{extend}\big(\delta_j^{k+1}\big) \ast \mathrm{rot180}\big(W_{ij}^{k+1}\big) \Big) \circ f'(y_i^{k}) \qquad (3)$

where $\delta_i^{k}$ denotes the local error of the i-th feature image of layer k, $\delta_j^{k+1}$ denotes the local error of the j-th feature image of layer k+1, the extend function expands the feature image's local error with zero-initialized padding, the rot180 function rotates the convolution kernel by 180 degrees, the $\ast$ symbol is the convolution operation, and $f'(y_i^{k})$ denotes the derivative of the layer-k activation function with respect to the feature-image output values.
For a batch-scale convolutional layer, the average gradient of the batch's feature images is computed and the convolution kernel parameters are updated in parallel by an OpenCL kernel function over a three-dimensional NDRange; the update uses the following formula (4):

$W_{ij}(n+1) = W_{ij}(n) - \frac{\alpha}{B} \sum_{b=1}^{B} x_{ij}^{(b)} \qquad (4)$

For a batch-scale convolutional layer, the average local error of the batch's feature images is computed and the biases are updated in parallel by an OpenCL kernel function over a one-dimensional NDRange; the update uses the following formula (5):

$b_{j}(n+1) = b_{j}(n) - \frac{\alpha}{B} \sum_{b=1}^{B} \sum_{u,v} \delta_j^{(b)}(u,v) \qquad (5)$

For a single-sample fully connected layer, the gradient of the single feature image is computed and the weight parameters are updated in parallel by an OpenCL kernel function over a two-dimensional NDRange; the update uses the following formula (6):

$W_{ij}(n+1) = W_{ij}(n) - \alpha \, x_{ij} \qquad (6)$

For a single-sample fully connected layer, the local error of the single feature image is computed and the bias parameters are updated in parallel by an OpenCL kernel function over a one-dimensional NDRange; the update uses the following formula (7):

$b_{j}(n+1) = b_{j}(n) - \alpha \, \delta_j \qquad (7)$

In formulas (4), (5), (6) and (7), n denotes the iteration number, α denotes the network learning rate, and B denotes the number of samples in a batch. For a convolutional layer, $W_{ij}$ denotes the convolution kernel connecting the i-th feature map of the previous layer and the j-th feature map of the current layer, and $x_{ij}^{(b)}$ denotes the convolution of the output of the previous layer's i-th feature map with the local error of the current layer's j-th feature map for sample b. For a fully connected layer, $W_{ij}$ denotes the weight connecting the i-th neuron of the previous layer and the j-th neuron of the current layer, and $x_{ij}$ denotes the product of the output of the previous layer's i-th neuron and the local error of the current layer's j-th neuron.
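As an illustration of the batch-averaged kernel update of formula (4), the following sketch assigns one work-item per convolution-kernel weight; the memory layouts, the stride-1 correlation and all identifiers are assumptions of this sketch:

    /* NDRange = (k, k, out_maps * in_maps): one work-item per kernel weight. */
    __kernel void conv_weight_update(__global const float *prev_out, /* B x in_maps x in_w^2   */
                                     __global const float *delta,    /* B x out_maps x out_w^2 */
                                     __global float *weights,        /* out_maps x in_maps x k^2 */
                                     const int in_w, const int out_w, const int k,
                                     const int in_maps, const int out_maps,
                                     const int B, const float alpha)
    {
        const int u  = get_global_id(0);          /* kernel row                 */
        const int v  = get_global_id(1);          /* kernel column              */
        const int z  = get_global_id(2);          /* out_map * in_maps + in_map */
        const int om = z / in_maps;
        const int im = z % in_maps;

        /* Gradient = correlation of previous-layer output with current-layer
         * local error, summed over the batch, then averaged (divide by B). */
        float grad = 0.0f;
        for (int b = 0; b < B; ++b)
            for (int y = 0; y < out_w; ++y)
                for (int x = 0; x < out_w; ++x)
                    grad += prev_out[((b * in_maps + im) * in_w + (y + u)) * in_w + (x + v)]
                          * delta[((b * out_maps + om) * out_w + y) * out_w + x];

        weights[((om * in_maps + im) * k + u) * k + v] -= alpha * grad / (float)B;
    }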
After the current batch has been updated, the data of the next batch is transferred, and so on until the preset number of iterations is reached or the error falls below a threshold, at which point training stops.
The training method for convolutional neural networks is applicable to, but not limited to, any of the following models:
LeNet, AlexNet, VGG-Net, GoogleNet, ResNet.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (2)

1. A parallelized training method for convolutional neural networks, characterized by comprising the following steps:
1) implementing the parallel processing of each layer of the convolutional neural network on an FPGA (field-programmable gate array), and creating shared memory accessible to both the CPU and the FPGA for the structural parameters and trainable parameters of the model, the structural parameters including the outputs and local errors of the network layers at each level, and the trainable parameters including the convolution kernels of the convolutional layers at each level, the bias vectors of the convolutional layers at each level, the weight matrices of the fully connected layers and the bias vectors of the fully connected layers;
2) creating, according to the type of each layer in the convolutional neural network to be trained, the memory space for the feature-image outputs and local errors at different batch scales;
3) creating the shared memory in an aligned manner, transferring data between the host and the FPGA device by DMA (direct memory access), and, throughout the training process, continuously computing on the data in shared memory and passing it between the network layers;
4) during the feedforward computation, randomly reading in the fully connected layer the data of one feature image from the previous layer's batch and recording its index within the batch, and, during the backpropagation computation, computing the output-layer error using the label data corresponding to that index;
5) when updating the local errors, according to the chain rule of the error backpropagation algorithm, the single-sample network layers directly updating the local error backpropagated from the output layer, the last batch-scale network layer using the following layer's local error to randomly update the local error of one of its current samples, and the preceding batch-scale network layers updating, layer by layer, the local errors of the corresponding multiple samples in parallel;
6) for a batch-scale convolutional layer, computing the average gradient of the batch's feature images and updating the convolution kernel parameters in parallel, and computing the average local error of the batch and updating the bias parameters in parallel;
7) for a single-sample fully connected layer, computing the gradient of the single feature image and updating the weight parameters in parallel, and computing the local error of the single feature image and updating the bias parameters in parallel;
8) after the current batch has been updated, transferring the data of the next batch, and so on until the preset number of iterations is reached or the error falls below a threshold, at which point training stops.
2. The method according to claim 1, characterized in that the layers in the convolutional neural network have outputs and local errors of different batch scales, the batch scale referring to the number of samples selected from the training set each time; in the training method, the convolutional layers and pooling layers store the outputs and local errors of a batch of samples, while the fully connected layers store the output and local error of a single sample;
in the feedforward computation of the convolutional neural network, when the computation passes from a batch-scale layer to a single-sample layer, a sample is selected at random and its index is recorded, and the output-layer error is computed using the label data corresponding to that index;
in the backward computation of the convolutional neural network, when the computation passes from a single-sample layer to a batch-scale layer, the local-error update of the batch-scale layer is completed according to the sample index recorded during the feedforward computation;
when computing the local error of a convolutional layer, if the next layer is a pooling layer, average pooling is used, and the pooling layer's local error is multiplied by the error scaling factor λ to obtain the local error values of the corresponding neurons of the convolutional layer, achieving the goal of fine-tuning the whole set of convolution kernel parameters and biases.
CN201810037896.9A 2018-01-16 2018-01-16 Parallelized training acceleration method for convolutional neural networks Pending CN108090565A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810037896.9A CN108090565A (en) 2018-01-16 2018-01-16 Parallelized training acceleration method for convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810037896.9A CN108090565A (en) 2018-01-16 2018-01-16 Parallelized training acceleration method for convolutional neural networks

Publications (1)

Publication Number Publication Date
CN108090565A true CN108090565A (en) 2018-05-29

Family

ID=62182295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810037896.9A Pending CN108090565A (en) Parallelized training acceleration method for convolutional neural networks

Country Status (1)

Country Link
CN (1) CN108090565A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462800A (en) * 2014-04-11 2017-02-22 谷歌公司 Parallelizing the training of convolutional neural networks
CN203950307U (en) * 2014-06-06 2014-11-19 中国电子科技集团公司第三十八研究所 Based on the SAR parallel processing apparatus of high-performance BW100 chip
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN104915322A (en) * 2015-06-09 2015-09-16 中国人民解放军国防科学技术大学 Method for accelerating convolution neutral network hardware and AXI bus IP core thereof

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830195A (en) * 2018-05-31 2018-11-16 西安电子科技大学 Image classification method based on on-site programmable gate array FPGA
CN110717574A (en) * 2018-07-11 2020-01-21 杭州海康威视数字技术股份有限公司 Neural network operation method and device and heterogeneous intelligent chip
CN110717574B (en) * 2018-07-11 2023-07-07 杭州海康威视数字技术股份有限公司 Neural network operation method and device and heterogeneous intelligent chip
CN111788567B (en) * 2018-08-27 2024-04-26 华为技术有限公司 Data processing equipment and data processing method
CN111788567A (en) * 2018-08-27 2020-10-16 华为技术有限公司 Data processing equipment and data processing method
CN112789627B (en) * 2018-09-30 2023-08-22 华为技术有限公司 Neural network processor, data processing method and related equipment
CN112789627A (en) * 2018-09-30 2021-05-11 华为技术有限公司 Neural network processor, data processing method and related equipment
CN109376843B (en) * 2018-10-12 2021-01-08 山东师范大学 FPGA-based electroencephalogram signal rapid classification method, implementation method and device
CN109376843A (en) * 2018-10-12 2019-02-22 山东师范大学 EEG signals rapid classification method, implementation method and device based on FPGA
CN109711358A (en) * 2018-12-28 2019-05-03 四川远鉴科技有限公司 Neural network training method, face identification method and system and storage medium
CN109740748A (en) * 2019-01-08 2019-05-10 西安邮电大学 A kind of convolutional neural networks accelerator based on FPGA
CN109783412A (en) * 2019-01-18 2019-05-21 电子科技大学 Deep reinforcement learning acceleration training method
CN109784096B (en) * 2019-01-18 2023-04-18 电子科技大学 Hardware Trojan horse detection and elimination method based on clustering algorithm
CN109783412B (en) * 2019-01-18 2022-04-22 电子科技大学 Deep reinforcement learning acceleration training method
CN109784096A (en) * 2019-01-18 2019-05-21 电子科技大学 Hardware Trojan horse detection and elimination method based on clustering algorithm
CN109816108A (en) * 2019-02-15 2019-05-28 领目科技(上海)有限公司 Deep learning accelerator, device and method
CN110188863B (en) * 2019-04-30 2021-04-09 杭州电子科技大学 Convolution kernel compression method of convolution neural network suitable for resource-limited equipment
CN110188863A (en) * 2019-04-30 2019-08-30 杭州电子科技大学 A kind of convolution kernel and its compression algorithm of convolutional neural networks
CN110263833A (en) * 2019-06-03 2019-09-20 韩慧慧 Image semantic segmentation method based on encoding-decoding structure
CN110543939B (en) * 2019-06-12 2022-05-03 电子科技大学 Hardware acceleration realization device for convolutional neural network backward training based on FPGA
CN110543939A (en) * 2019-06-12 2019-12-06 电子科技大学 hardware acceleration implementation framework for convolutional neural network backward training based on FPGA
CN112396154A (en) * 2019-08-16 2021-02-23 华东交通大学 Parallel method based on convolutional neural network training
CN110852428A (en) * 2019-09-08 2020-02-28 天津大学 Neural network acceleration method and accelerator based on FPGA
CN110852428B (en) * 2019-09-08 2023-10-27 天津大学 Neural network acceleration method and accelerator based on FPGA
CN112561028A (en) * 2019-09-25 2021-03-26 华为技术有限公司 Method for training neural network model, and method and device for data processing
CN112836787A (en) * 2019-11-04 2021-05-25 百度(美国)有限责任公司 Reducing deep neural network training times through efficient hybrid parallelization
CN111178288A (en) * 2019-12-31 2020-05-19 南京师范大学 Human body posture recognition method and device based on local error layer-by-layer training
CN111178288B (en) * 2019-12-31 2024-03-01 南京师范大学 Human body posture recognition method and device based on local error layer-by-layer training
CN111210019B (en) * 2020-01-16 2022-06-24 电子科技大学 Neural network inference method based on software and hardware cooperative acceleration
CN111210019A (en) * 2020-01-16 2020-05-29 电子科技大学 Neural network inference method based on software and hardware cooperative acceleration
CN111325327A (en) * 2020-03-06 2020-06-23 四川九洲电器集团有限责任公司 Universal convolution neural network operation architecture based on embedded platform and use method
CN111610963A (en) * 2020-06-24 2020-09-01 上海西井信息科技有限公司 Chip structure and multiply-add calculation engine thereof
CN112101537A (en) * 2020-09-17 2020-12-18 广东高云半导体科技股份有限公司 CNN accelerator and electronic device
CN112101537B (en) * 2020-09-17 2021-08-03 广东高云半导体科技股份有限公司 CNN accelerator and electronic device
CN111931937B (en) * 2020-09-30 2021-01-01 深圳云天励飞技术股份有限公司 Gradient updating method, device and system of image processing model
CN111931937A (en) * 2020-09-30 2020-11-13 深圳云天励飞技术股份有限公司 Gradient updating method, device and system of image processing model
WO2022136977A1 (en) * 2020-12-26 2022-06-30 International Business Machines Corporation Filtering hidden matrix training dnn
GB2621692A (en) * 2020-12-26 2024-02-21 Ibm Filtering hidden matrix training DNN
CN112819140B (en) * 2021-02-02 2022-06-24 电子科技大学 OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method
CN112819140A (en) * 2021-02-02 2021-05-18 电子科技大学 OpenCL-based FPGA one-dimensional signal recognition neural network acceleration method
CN113239223A (en) * 2021-04-14 2021-08-10 浙江大学 Image retrieval method based on input gradient regularization
CN113254215A (en) * 2021-06-16 2021-08-13 腾讯科技(深圳)有限公司 Data processing method and device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN108090565A (en) Parallelized training acceleration method for convolutional neural networks
Park et al. 7.6 A 65nm 236.5 nJ/classification neuromorphic processor with 7.5% energy overhead on-chip learning using direct spike-only feedback
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
Khodamoradi et al. S2n2: A fpga accelerator for streaming spiking neural networks
CN110321997B (en) High-parallelism computing platform, system and computing implementation method
CN110163356A (en) A kind of computing device and method
WO2021089009A1 (en) Data stream reconstruction method and reconstructable data stream processor
CN110383300A (en) A kind of computing device and method
Liu et al. FPGA-NHAP: A general FPGA-based neuromorphic hardware acceleration platform with high speed and low power
CN109272110A (en) Photoelectricity based on photon neural network chip merges intelligent signal processing system
Zhang et al. An asynchronous reconfigurable SNN accelerator with event-driven time step update
CN109359730A (en) Neural network processor towards fixed output normal form Winograd convolution
CN110163350A (en) A kind of computing device and method
Momose et al. Systems and circuits for AI chips and their trends
CN108595379A (en) A kind of parallelization convolution algorithm method and system based on multi-level buffer
CN110689045A (en) Distributed training method and device for deep learning model
Chen et al. A 67.5 μJ/prediction accelerator for spiking neural networks in image segmentation
Chen et al. Cerebron: A reconfigurable architecture for spatiotemporal sparse spiking neural networks
Xiao et al. FPGA-based scalable and highly concurrent convolutional neural network acceleration
Sommer et al. Efficient hardware acceleration of sparsely active convolutional spiking neural networks
Chen et al. Rgp: Neural network pruning through regular graph with edges swapping
CN109359542A (en) The determination method and terminal device of vehicle damage rank neural network based
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
CN107122472A (en) Extensive unstructured data extracting method, its system, DDM platform
Ogbogu et al. Accelerating Graph Neural Network Training on ReRAM-based PIM Architectures via Graph and Model Pruning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180529

WD01 Invention patent application deemed withdrawn after publication