CN107817708B - High-compatibility programmable neural network acceleration array - Google Patents
- Publication number: CN107817708B (application CN201711131564.9A)
- Authority
- CN
- China
- Prior art keywords
- programmable
- neural network
- unit
- multiply
- network computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/04—Programme control other than numerical control, i.e. in sequence controllers or logic controllers
Abstract
The invention belongs to the technical field of integrated circuits, and particularly relates to a high-compatibility programmable neural network acceleration array. The array adopts a reconfigurable architecture and comprises a central controller, a feature vector transmitter and a plurality of neural network computing unit chips. Each computing unit chip contains the basic neural network computing modules, such as a programmable multiply-add unit, a programmable activation unit and a unit chip controller, and any two unit chips in the array can communicate through a programmable communication route. The programmable neural network acceleration array is compatible with a variety of neural network algorithms without sacrificing high energy efficiency, and is suitable for a wide range of deep learning intelligent systems.
Description
Technical Field
The invention belongs to the technical field of integrated circuits, and particularly relates to a high-compatibility programmable neural network acceleration array.
Background
The development of customized deep learning acceleration chips for mobile devices is a very active area today. The challenge is that a chip's performance is tied to the type of deep learning network it runs, such as a CNN (convolutional neural network) or an RNN (recurrent neural network). To achieve high energy efficiency, a customized accelerator is usually optimized for particular networks: it performs well on those networks but poorly on others. Because the deep learning field is evolving rapidly, improved CNN or RNN variants, or entirely new deep learning algorithms, may appear in the future that existing special-purpose accelerators cannot serve with the required performance, which fundamentally limits the development of deep learning intelligence.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, it is an object of the present invention to provide a highly compatible programmable neural network acceleration array.
The invention provides a high-compatibility programmable neural network acceleration array, which adopts a reconfigurable architecture and comprises a central controller, a feature vector transmitter and a plurality of neural network computing units. Wherein:
the central controller is responsible for the global control of the deep learning neural network;
the feature vector transmitter is responsible for transmitting the required feature vectors to all the neural network computing units;
the neural network computing unit chip comprises basic neural network computing modules, including but not limited to a programmable multiply-add unit, a programmable activation unit, a unit chip controller and an optional cache;
the neural network computing unit chips can communicate with one another through a programmable communication route.
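The division of labor among these components can be pictured as a minimal behavioral model; all class and function names below (`ComputeUnit`, `AccelerationArray`, etc.) are illustrative assumptions for the sketch, not terms from the patent.

```python
# Behavioral sketch: a feature-vector transmitter broadcasts each input
# vector to every compute unit; each unit applies its own weights and a
# programmable activation function.

class ComputeUnit:
    """One neural network computing unit chip: multiply-add, then activation."""
    def __init__(self, weights, activation):
        self.weights = weights          # per-unit weight vector
        self.activation = activation    # programmable activation function

    def process(self, feature_vec):
        acc = sum(w * x for w, x in zip(self.weights, feature_vec))
        return self.activation(acc)

class AccelerationArray:
    """Central controller + feature vector transmitter + unit chip array."""
    def __init__(self, units):
        self.units = units

    def broadcast(self, feature_vec):
        # The transmitter sends the same feature vector to all units
        # ("in a broadcast mode", claim 1); each unit computes one output.
        return [u.process(feature_vec) for u in self.units]

relu = lambda v: max(0, v)
array = AccelerationArray([ComputeUnit([1, -2], relu), ComputeUnit([3, 1], relu)])
print(array.broadcast([2, 1]))  # → [0, 7]
```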
The technical effect of the invention is that the reconfigurability of the acceleration array architecture, the programmability of the computing modules within each unit chip, and the programmability of the inter-chip communication together provide great flexibility. The computing mode, data storage and data flow can be combined freely through programming, so the array is compatible with a variety of deep learning network topologies as well as new algorithms that may appear in the future, while maintaining high energy efficiency. The invention therefore has broad application prospects in artificial intelligence systems that employ deep learning algorithms.
Drawings
FIG. 1 is a schematic diagram of a high-compatibility programmable neural network acceleration array architecture according to the present invention.
Fig. 2 is a schematic structural diagram of a neural network computing unit chip of the present invention.
FIG. 3 is a schematic diagram of the inter-chip communication routing between the neural network computing units of the present invention.
FIG. 4 is a schematic diagram of a neural network computing unit in accordance with an embodiment of the present invention.
Reference numbers in the figures: 11 is the central controller, 12 is the feature vector transmitter, and 13 is a neural network computing unit chip; 21 is the programmable multiply-add unit, 22 is the programmable activation unit, 23 is the unit chip controller, and 24 is the cache; 31 is the inter-chip communication route.
Detailed Description
The present invention will be described more fully hereinafter with reference to the accompanying drawings, which illustrate preferred embodiments of the invention; the invention is not to be considered limited to the embodiments set forth herein.
Fig. 1 is a schematic diagram of an architecture of a high-compatibility programmable neural network acceleration array of the present invention, in which a central controller 11 is responsible for global control of a deep learning neural network, and a feature vector transmitter 12 is responsible for transmitting required feature vectors to all neural network computing units 13.
The neural network computing unit chip of the present invention comprises all the basic deep learning computing modules. As shown in fig. 2, the computing unit chip comprises a programmable multiply-add unit 21, a programmable activation unit 22, a unit chip controller 23, and a cache 24. The programmable multiply-add unit 21 performs the vector multiply and add calculations generally required in deep learning algorithms; its programmability can be realized by a switch array, so that multiply-add operations of different precisions can be selected by online programming. The programmable activation unit 22 performs the activation calculations commonly required in deep learning algorithms, such as nonlinear functions like sigmoid and ReLU. The unit chip controller 23 is responsible for control within the unit chip. The cache 24 is optional and may be used to store intermediate computation values.
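As a rough behavioral analogue of the activation unit's online programmability (not the hardware implementation), one can picture a nonlinearity being selected by a loaded program word; the names below are illustrative assumptions.

```python
import math

# Sketch: a programmable activation unit selecting among the nonlinearities
# the text mentions (sigmoid, ReLU). In hardware this selection would
# reconfigure a datapath; here it just swaps the applied function.

ACTIVATIONS = {
    "relu":     lambda x: max(0.0, x),
    "sigmoid":  lambda x: 1.0 / (1.0 + math.exp(-x)),
    "identity": lambda x: x,
}

class ProgrammableActivationUnit:
    def __init__(self):
        self.fn = ACTIVATIONS["identity"]

    def program(self, name):
        # Online (re)programming: pick a different nonlinearity at run time.
        self.fn = ACTIVATIONS[name]

    def apply(self, x):
        return self.fn(x)

pau = ProgrammableActivationUnit()
pau.program("relu")
print(pau.apply(-3.0), pau.apply(2.5))   # → 0.0 2.5
pau.program("sigmoid")
print(pau.apply(0.0))                    # → 0.5
```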
Fig. 4 illustrates an embodiment of the computing unit chip. The programmable multiply-add unit is divided into a multiply part and an add part. The multiply part is composed of a switch array and multipliers, and can perform vector multiplication at two precisions, 8 bits and 4 bits; each precision has a plurality of identical computing modules that multiply a feature vector with a weight vector. The result is processed by a programmable accumulation stage and finally enters the programmable activation unit, which comprises a controller, a shift-and-piecewise-linear calculation block, an arithmetic logic unit (ALU) and a multiplexer (MUX). To accommodate the special transitive operations required by recurrent neural networks, combinations of nonlinear calculations and basic operations on other variables can be realized by online programming. The computing unit chip thus provides, at the architectural level, a flexible computing mode for deep learning algorithms.
FIG. 3 illustrates the inter-chip communication routing between the neural network computing units. The inter-chip communication route 31, used in conjunction with the unit chips 13, allows communication between any two chips, giving the acceleration array architecture a flexible data flow.
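A minimal sketch of what such a programmable route might look like behaviorally, assuming a simple source-to-destinations route table; the patent does not specify the routing mechanism, so the table-based scheme and all names here are illustrative.

```python
# Sketch: a route table, loaded at program time, forwards any unit chip's
# output to any other unit chip, so layer-to-layer data flow can be
# reconfigured without rewiring.

class ProgrammableRouter:
    def __init__(self, num_units):
        self.num_units = num_units
        self.routes = {}          # src unit id -> list of dst unit ids

    def program(self, src, dsts):
        assert src < self.num_units and all(d < self.num_units for d in dsts)
        self.routes[src] = list(dsts)

    def deliver(self, src, value, mailboxes):
        # Push src's output into the inbox of every programmed destination.
        for dst in self.routes.get(src, []):
            mailboxes[dst].append(value)

router = ProgrammableRouter(num_units=4)
router.program(0, [2, 3])                 # unit 0 feeds units 2 and 3
mailboxes = {i: [] for i in range(4)}
router.deliver(0, 1.5, mailboxes)
print(mailboxes)                          # unit 2 and unit 3 received 1.5
```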
While the embodiments of the present invention have been described with reference to specific examples, those skilled in the art will readily appreciate that many other embodiments, advantages and features are possible. The invention is capable of other and different embodiments, and its details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Claims (1)
1. A high-compatibility programmable neural network acceleration array is characterized in that a reconfigurable architecture is adopted, and the high-compatibility programmable neural network acceleration array comprises a central controller, a feature vector transmitter and a plurality of neural network computing unit chips; wherein:
the central controller is responsible for the global control of the deep learning neural network;
the feature vector transmitter is responsible for transmitting the required feature vectors to all the neural network computing units in a broadcast mode;
the neural network computing unit chip comprises a basic neural network computing module;
the neural network computing unit chip carries out communication between any two units through a programmable communication route;
the basic neural network computing module comprises a programmable multiply-add unit, a programmable activation unit and a unit chip controller;
the basic neural network computing module also comprises a cache used for storing and computing the intermediate value;
the programmable multiply-add unit is used for completing vector multiply and add calculations in a deep learning algorithm, the programmability of the programmable multiply-add unit is realized by a switch array, and multiply-add operations of different precisions are realized in an online programmable mode; the programmable activation unit is used for completing activation calculations in a deep learning algorithm; the unit chip controller is responsible for the control function in the unit chip;
the programmable multiply-add unit is divided into a multiply part and an add part, the multiply part is composed of a switch array and multipliers and realizes vector multiplication at 8-bit and 4-bit precision, each precision is provided with a plurality of identical computing modules realizing the product of a feature vector and a weight vector, programmable accumulation calculation is carried out on the obtained result, and finally the result enters the programmable activation unit, which comprises a controller, a shift-and-piecewise-linear calculation block, an arithmetic logic unit (ALU) and a multiplexer (MUX).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711131564.9A CN107817708B (en) | 2017-11-15 | 2017-11-15 | High-compatibility programmable neural network acceleration array |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107817708A CN107817708A (en) | 2018-03-20 |
CN107817708B true CN107817708B (en) | 2020-07-07 |
Family
ID=61609167
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711131564.9A Active CN107817708B (en) | 2017-11-15 | 2017-11-15 | High-compatibility programmable neural network acceleration array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107817708B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106558146A (en) * | 2015-09-29 | 2017-04-05 | 广东工业大学 | A kind of network transmission pattern of Intelligent bus card accounts information |
US20200034699A1 (en) * | 2018-07-24 | 2020-01-30 | SK Hynix Inc. | Accelerating appratus of neural network and operating method thereof |
CN110572593B (en) * | 2019-08-19 | 2022-03-04 | 上海集成电路研发中心有限公司 | 3D heap image sensor |
CN111126580B (en) * | 2019-11-20 | 2023-05-02 | 复旦大学 | Multi-precision weight coefficient neural network acceleration chip arithmetic device adopting Booth coding |
CN111062471B (en) * | 2019-11-23 | 2023-05-02 | 复旦大学 | Deep learning accelerator for accelerating BERT neural network operation |
CN114722751B (en) * | 2022-06-07 | 2022-09-02 | 深圳鸿芯微纳技术有限公司 | Framework selection model training method and framework selection method for operation unit |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102736684A (en) * | 2011-04-05 | 2012-10-17 | 温保成 | Programmable hard accelerator chip array |
CN105740946B (en) * | 2015-07-29 | 2019-02-12 | 上海磁宇信息科技有限公司 | A kind of method that application cell array computation system realizes neural computing |
CN105488565A (en) * | 2015-11-17 | 2016-04-13 | 中国科学院计算技术研究所 | Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | 北京深鉴智能科技有限公司 | Apparatus and method for realizing sparse convolution neutral net accelerator |
CN106940815B (en) * | 2017-02-13 | 2020-07-28 | 西安交通大学 | Programmable convolutional neural network coprocessor IP core |
CN107169560B (en) * | 2017-04-19 | 2020-10-16 | 清华大学 | Self-adaptive reconfigurable deep convolutional neural network computing method and device |
- 2017-11-15: CN application CN201711131564.9A filed; granted as patent CN107817708B (status: Active)
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||