CN117634577A — Vector processor, neural network accelerator, chip and electronic equipment


Info

Publication number
CN117634577A
CN117634577A (application CN202410101510.1A)
Authority
CN
China
Prior art keywords
input
module
vector
data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410101510.1A
Other languages
Chinese (zh)
Inventor
李兆钫
刘洪杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Jiutian Ruixin Technology Co., Ltd.
Original Assignee
Shenzhen Jiutian Ruixin Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Jiutian Ruixin Technology Co., Ltd.
Priority to CN202410101510.1A
Publication of CN117634577A
Legal status: Pending

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a vector processor, a neural network accelerator, a chip, and electronic equipment, relates to the technical field of neural networks, and addresses the technical problems of the large area and high power consumption of vector processors. The vector processor comprises a top-level controller and an element processing unit; the element processing unit comprises a vector element data exchanger, a vector calculation module, a quantization module, and an inverse quantization module. The inverse quantization module converts low-bit input data into high-bit data and feeds it into the vector element data exchanger; the vector element data exchanger routes the input data to at least one calculation type and outputs the calculation result to the quantization module; the quantization module converts the calculation result into output data; and the top-level controller controls the vector element data exchanger. Through the top-level controller and the vector element data exchanger, the invention realizes sharing of the quantization module and the inverse quantization module, reducing the area and power consumption required by the vector processor.

Description

Vector processor, neural network accelerator, chip and electronic equipment
Technical Field
The present invention relates to the field of neural networks, and in particular, to a vector processor, a neural network accelerator, a chip, and an electronic device.
Background
Deep neural networks (Deep Neural Network, DNN) are a machine learning method based on the artificial neural network architecture; an artificial neural network (Artificial Neural Networks, ANN) uses layers of interconnected nodes, called neurons, to process and learn from input data. A deep neural network is an artificial neural network with multiple layers between the input layer and the output layer. Neural networks are built from the same basic components: neurons, synapses, weights, biases, and functions, which in practical applications are commonly referred to as operators. Common operators include convolution, pooling, up/down sampling, activation functions, and element-wise operations (element addition, subtraction, multiplication, and division). Deep learning uses multiple layers to represent different levels of abstraction of the data, thereby improving the accuracy and generalization ability of the model, and has been widely applied to computer vision, speech recognition, natural language processing, machine translation, bioinformatics, drug design, medical image analysis, and other fields, producing results comparable to or even exceeding the level of human experts. As data volumes continue to grow, neural-network-based artificial intelligence techniques are becoming increasingly popular. Although neural networks have been proven to successfully solve practical problems such as automatic driving and face recognition, they are difficult to deploy efficiently on traditional hardware because of the limited operational performance of traditional hardware platforms. Therefore, there is a need to design custom hardware platforms specifically for neural network algorithms; such a hardware platform is referred to as a neural network accelerator, and its core is typically a set of application-specific integrated circuits referred to as neural network accelerator chips.
As the number of layers, branches, and the scale of deep neural network algorithms increase, the input feature maps of deep layers need to fuse the output feature maps of earlier layers or other branches, and this fusion requires element-wise operations performed by a vector processor. Specifically, an element-wise operation performs a point-to-point operation on each pair of corresponding points of two feature maps; specific operations include addition, multiplication, subtraction, division, and the like. Existing element-wise operation flows generally use a 32-bit floating-point data format, and the vector processor must provide storage modules, transmission circuits, and the like matched to that format, which is why quantization algorithms emerged to reduce storage and transmission-circuit requirements. A neural network quantization algorithm is a technique that compresses 32-bit floating-point inputs, outputs, and weights into lower-bit inputs, outputs, and weights while keeping the effect of the neural network essentially unchanged. In general, each quantized layer has its own set of quantization parameters, representing different ranges of values. When performing a point-to-point element operation on two feature maps, inverse quantization is usually also required: the low-bit data are first converted back to high-bit data, the operation is performed, and the result is quantized back to low bits after the operation completes. When each calculation module of a vector processor supports operations at different bit widths, a bit-width conversion module must be implemented in each calculation module; that is, each calculation module needs its own quantization module and inverse quantization module, which increases the circuit area and power consumption of the vector processor. Accordingly, there is a need for a new design that improves vector processors so as to reduce the area and power consumption they require.
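To make the inverse-quantize → operate → quantize round trip concrete, the following is a minimal sketch assuming per-tensor affine quantization with a scale and zero point; the function names and parameter values are illustrative, not taken from the patent:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Inverse quantization: map low-bit integer codes back to high-bit values."""
    return scale * (q.astype(np.float32) - zero_point)

def quantize(x, scale, zero_point, bits=8):
    """Quantization: map high-bit values to low-bit integer codes."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

# Point-to-point addition of two int8 feature maps, each with its own
# quantization parameters: dequantize to float32, add, requantize.
a = np.array([10, -3, 7], dtype=np.int8)
b = np.array([5, 2, -8], dtype=np.int8)
a_f = dequantize(a, scale=0.1, zero_point=0)
b_f = dequantize(b, scale=0.05, zero_point=0)
out = quantize(a_f + b_f, scale=0.12, zero_point=0)  # int8 result
```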
Disclosure of Invention
The invention aims to provide a vector processor, a neural network accelerator, a chip, and electronic equipment, so as to at least solve the above technical problems of large area and high power consumption. Preferred implementations of the technical solutions provided by the present invention can produce the technical effects described below.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the invention provides a vector processor, comprising: a top-level controller and at least one element processing unit connected with the top-level controller, wherein the element processing unit is used for performing element operations in vector operations; the element processing unit comprises a vector element data exchanger, a vector calculation module, a quantization module, and an inverse quantization module; the vector calculation module is used for carrying out at least two types of vector calculation; the inverse quantization module is used for converting low-bit input data into high-bit input data and inputting the high-bit input data into the vector element data exchanger; the vector element data exchanger enables the high-bit input data to select at least one calculation type from the vector calculation module, and outputs the high-bit calculation result to the quantization module; the quantization module is used for converting the high-bit calculation result into low-bit output data; and the top-level controller is used for controlling the vector element data exchanger so that the high-bit input data selects the calculation type.
Preferably, the vector element data exchanger comprises a first input selector and a second input selector, both connected with the top-level controller and the vector calculation module; the top-level controller controls the first input selector so that input data enter the vector calculation module for different types of calculation; and the top-level controller controls the second input selector so that an intermediate calculation result of the vector calculation module is fed back, or input data enter the vector calculation module, for different types of calculation.
Preferably, the vector element data exchanger further comprises an output selector connected with the top-level controller and the vector calculation module; the top-level controller controls the output selector so that the calculation result is output through the quantization module or output directly.
Preferably, at least two input switches are connected between the first input selector, the second input selector, and the vector calculation module; different input switches are connected one-to-one with calculation modules of different types in the vector calculation module and are connected with the top-level controller; the top-level controller enables or disables the different types of calculation modules through the input switches and controls input data entering the vector calculation module.
Preferably, the vector element data exchanger further comprises an output buffer and at least two input buffers; the input buffers are connected one-to-one with the input switches, and each input buffer is connected with the first input selector and the second input selector; the output buffer is connected with the output selector and the vector calculation module; the input buffers and the output buffer buffer the input data and output data of the vector calculation module, respectively.
Preferably, the vector calculation module comprises at least any two of a vector addition module, a vector multiplication module, and a vector division module, which are respectively used for performing addition, multiplication, and division between two vectors or between a vector and a scalar.
Preferably, the element processing unit further comprises a direct input module and a direct output module; the high and low bit widths are H and L bits respectively, with H > L; if the input data is L-bit data, it is input to the vector element data exchanger through the direct input module; and if the output data is H-bit data, it is output through the direct output module.
Preferably, the element processing unit further comprises a first input selection module and a second input selection module, both connected with the top-level controller; the first and second input selection modules are each independently connected with a different inverse quantization module and a different direct input module; the top-level controller selects and controls the data flow direction of the input data through the first and second input selection modules.
Preferably, the element processing unit further comprises an output selection module connected with the top-level controller; the top-level controller selects and controls the data flow direction of the output data; the output selection module is connected with the quantization module and the direct output module.
Preferably, the vector processor further comprises an input module and an input/output module; the input module is connected with the first input selection module and is used for feeding in input data; the input/output module is connected with the second input selection module and the output selection module, and is used for receiving the output data of the output selection module and feeding input data to the second input selection module.
Preferably, the vector processor further includes a register, and the top-level controller can configure the register so that the input data of the first and second input selection modules enter an inverse quantization module or a direct input module, respectively, and the output data of the output selection module comes from the quantization module or the direct output module.
A neural network accelerator comprising the vector processor of any of the above.
A chip comprising the vector processor of any of the above.
An electronic device comprising the vector processor of any one of the above or the chip of the above.
By implementing one of the above technical solutions, the present invention has the following advantages or beneficial effects:
In the invention, at least two types of vector calculation modules share the same quantization module and inverse quantization module through the top-level controller and the vector element data exchanger, whereas in the prior art each type of vector calculation module needs its own quantization and inverse quantization modules; the number of quantization and inverse quantization modules used is therefore reduced, and with it the area and power consumption required by the vector processor. At the same time, the invention saves bandwidth between the memory and the vector calculation module or calculation unit: for the same data amount m and calculation parallelism n, the data bandwidth between the memory and the vector calculation module is reduced from 32 bits in the prior art to 8 bits, saving transmission bandwidth and storage space and further reducing the power consumption and cost of the vector processor.
Drawings
For a clearer description of the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort. In the drawings:
FIG. 1 is a schematic diagram illustrating the connection of an element processing unit to an input module, an input/output module and a top level controller in a vector processor according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a vector processor according to an embodiment of the present invention;
FIG. 3 is a schematic diagram showing a connection between an element processing unit and an input module, an input/output module and a top-level controller in a vector processor according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the element operation top-level module in a vector processor connected to a corpus switching module and the top-level controller of the vector processor according to an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating the connection of a vector processing unit in a vector processor with a top level controller, a line buffer, and an I/O interface according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a connection between a vector processor and CIM clusters, register buses, and shared memory according to a first embodiment of the present invention;
FIG. 7 is a schematic diagram of the connection between a neural network accelerator and an external memory according to a second embodiment of the present invention.
Detailed Description
For a better understanding of the objects, technical solutions, and advantages of the present invention, reference should be made to the various exemplary embodiments described hereinafter with reference to the accompanying drawings, which form a part hereof and in which exemplary embodiments that may be employed in practicing the present invention are described. The same reference numbers in different drawings identify the same or similar elements unless expressly stated otherwise. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; they are merely examples of processes, methods, and apparatuses consistent with certain aspects of the present disclosure as detailed in the appended claims. Other embodiments may be utilized, and structural and functional modifications may be made to the embodiments set forth herein, without departing from the scope and spirit of the present disclosure.
In the description of the present invention, it should be understood that terms such as "center," "longitudinal," and "transverse" indicate orientations or positional relationships based on those shown in the drawings, and are merely for convenience in describing the present invention and simplifying the description, rather than indicating or implying that the elements referred to must have a particular orientation or be constructed and operated in a particular orientation. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. The term "plurality" means two or more. The terms "installed," "coupled," and "connected" are to be construed broadly: for example, fixedly connected, detachably connected, integrally connected, mechanically connected, electrically connected, communicatively connected, directly connected, indirectly connected via intermediaries, in communication between two elements, or in an interaction relationship between two elements. The term "and/or" includes any and all combinations of one or more of the associated listed items. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In order to illustrate the technical solutions of the present invention, the following description proceeds through specific embodiments; only the portions relevant to the embodiments of the present invention are shown.
Embodiment one: The present invention provides a vector processor, as shown in FIG. 1, comprising: a top-level controller and at least one element processing unit connected with the top-level controller. The element processing unit is used for executing element operations (element-wise operations) in vector operations; an element-wise operation is a tensor operation common in neural network programming, in which a corresponding operation is carried out on each pair of elements at corresponding positions of two tensors, generating a new tensor with the same shape. All arithmetic operations, i.e., addition, subtraction, multiplication, and division, are element operations. The top-level controller controls the quantization and inverse quantization of the data flow within the element operation. For example, the top-level controller can distribute scalars to the vector addition module, vector multiplication module, and vector division module in the vector calculation module to perform operations between a scalar and a vector; the top-level controller can control the two input selection modules through register configuration, so that the two input data streams enter the inverse quantization module or the direct input module respectively; and the top-level controller can control the output module through register configuration, so that the output comes from the quantization module or the direct output module. The main function of the element processing unit is to process addition, multiplication, and division between two vectors, and between one vector and one scalar, in the vector processor. The element processing unit includes a vector element data exchanger (which controls the input and output data flows of the vector calculation module), a vector calculation module, a quantization module, and an inverse quantization module. The vector calculation module is used for carrying out at least two types of vector calculation, and the specific types can be selected and configured according to requirements. The inverse quantization module converts low-bit input data into high-bit input data and feeds it into the vector element data exchanger, so that the vector calculation module can compute on it directly. The top-level controller controls the vector element data exchanger so that the high-bit input data selects one calculation type from the vector calculation module; that is, the vector element data exchanger is responsible for controlling the data flow direction, sending the high-bit input data into the calculation module of the corresponding type. The top-level controller then controls, through the vector element data exchanger, the output of the high-bit calculation result to the quantization module, where it is converted into low-bit output data, thereby realizing directional flow control of the output data and facilitating subsequent calculation and use by the vector calculation module.
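As a behavioural illustration only (not the patented circuit), the following sketch models how a single shared inverse quantization step and a single shared quantization step can wrap whichever calculation type the exchanger selects; `Op`, `ElementProcessingUnit`, and the scale-only quantization are assumptions made for the example:

```python
from enum import Enum
import numpy as np

class Op(Enum):
    ADD = 0
    MUL = 1
    DIV = 2

class ElementProcessingUnit:
    """Behavioural model: one shared dequantizer and one shared quantizer
    serve every calculation type selected by the vector element data exchanger."""
    CALC = {Op.ADD: np.add, Op.MUL: np.multiply, Op.DIV: np.divide}

    def run(self, a_q, b_q, op, s_a, s_b, s_out):
        a = a_q.astype(np.float32) * s_a   # shared inverse quantization (low -> high bit)
        b = b_q.astype(np.float32) * s_b
        r = self.CALC[op](a, b)            # exchanger routes data to one calculation module
        q = np.round(r / s_out)            # shared quantization (high -> low bit)
        return np.clip(q, -128, 127).astype(np.int8)

epu = ElementProcessingUnit()
out = epu.run(np.array([10, 20], dtype=np.int8),
              np.array([3, 4], dtype=np.int8), Op.MUL, 0.1, 0.1, 0.01)
```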
In the invention, the top-level controller, the vector element data exchanger, the vector calculation module, the quantization module, and the inverse quantization module together allow at least two types of vector calculation modules to share the same quantization module and inverse quantization module, whereas in the prior art each type of vector calculation module needs its own quantization and inverse quantization modules; the number of quantization and inverse quantization modules is therefore reduced, and with it the area and power consumption of the vector processor. As shown in FIG. 2, a further advantage of the architecture of the present invention is that bandwidth from the memory to the vector calculation module or unit is saved, which saves transmission bandwidth and storage space in the vector processor and further reduces its power consumption and cost. Whereas the prior art generally carries 32-bit data directly from the memory to the calculation unit, the invention can reduce the data to 8 bits or even fewer through the vector element data exchanger, the vector calculation module, the quantization module, and the inverse quantization module, saving 75 percent of the data transmission bandwidth and storage space; other bit widths, such as 4 bits or 16 bits, can also be used as required. In FIG. 2, n represents the calculation parallelism, that is, the number of data elements that can be fetched and calculated at one time, and m represents the amount of data stored. For the same m and n, the invention reduces the data bandwidth between the memory and the vector calculation module from 32 bits to 8 bits; using fewer bits means less area and power consumption, so the area consumed by the memory and the transmission circuit is one quarter of that of the existing scheme, and with the same memory and transmission-circuit architecture the static power consumption can likewise be one quarter of the existing scheme. Meanwhile, the area and power cost paid for the inverse quantization module is far smaller than that of m × n × 1 bits of memory, so the cost of the whole vector processor is reduced.
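The bandwidth and storage arithmetic behind these claims can be checked in a few lines; m and n below are illustrative values, not values from the patent:

```python
# Per-cycle bandwidth and total storage: 32-bit scheme vs. 8-bit scheme,
# for the same stored data amount m and calculation parallelism n.
m, n = 4096, 16                      # illustrative values
old_bw, new_bw = n * 32, n * 8       # bits moved per cycle
saving = 1 - new_bw / old_bw         # 0.75 -> 75% of transmission bandwidth saved
old_mem, new_mem = m * 32, m * 8     # storage in bits: likewise a 4x reduction
```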
The invention incorporates the design of the quantization module and the inverse quantization module into the improved vector processor: data is transmitted as low-bit data before vector operations, converted into high-bit data by the inverse quantization module, and the vector operation is then carried out; the result of a vector operation on high-bit data is also high-bit data, which is converted back into low-bit data by the quantization module for transmission and storage. Therefore, through the design of the quantization module and the inverse quantization module, most of the storage and transmission in the operation of the neural network needs only low-bit data; low-bit data enables a smaller area and lower power consumption, reduces the storage and transmission-circuit requirements of the calculation, further reduces the area and power consumption required by the vector processor during data transmission and storage, and improves system-level computing power.
As an alternative embodiment, as shown in FIG. 3, the vector element data exchanger includes a first input selector and a second input selector, both connected to the top-level controller and the vector calculation module; the first and second input selectors feed in, respectively, the two sets of input data required by the vector calculation module, the two sets coming from different sources. The top-level controller controls the first input selector so that input data arriving from outside the element processing unit, for example from a line buffer, enters the vector calculation module for different types of calculation. The top-level controller controls the second input selector so that an intermediate calculation result of the vector calculation module is fed back, or new input data enters, the vector calculation module for different types of calculation; the input of the present calculation may be the calculation result corresponding to the first input selector, and the next calculation may be of the same type as the previous one or of a different type, thereby realizing cyclic calculation of data within the vector calculation module; alternatively, the input may be new input data fed into the element processing unit. Through the first and second input selectors, the two sets of data inputs required by different calculation types in the vector calculation module are realized, and both sets of data can freely select any calculation type in the vector calculation module, so that the element processing unit achieves efficient sharing of the quantization module and the inverse quantization module and saves the corresponding area and power consumption.
As an alternative embodiment, as shown in FIG. 3, the vector element data exchanger further includes an output selector connected with the top-level controller and the vector calculation module; the top-level controller controls the output selector so that the calculation result is output through the quantization module or output directly. Through the output selector, an intermediate or final calculation result of the vector calculation module can be selectively quantized before output or output directly as required, facilitating subsequent use of the result data.
As an alternative embodiment, as shown in FIG. 3, at least two input switches are connected between the first input selector, the second input selector, and the vector calculation module; different input switches are connected one-to-one with the different types of calculation modules in the vector calculation module, i.e., each type of calculation module has its own input switch, and each switch is connected with the top-level controller so that the top-level controller can control its operation. The top-level controller enables or disables the different types of vector calculation modules through the input switches; that is, an input switch can start or stop the clock signal of a calculation module (when a calculation module of a given type is not enabled, its clock signal is held at a low level), so that the module does or does not participate in the corresponding calculation, and input data entering the vector calculation module is controlled accordingly. Therefore, according to the type of calculation required by the input data, the top-level controller enables the corresponding calculation module and disables the others, realizing input control of the input data stream, so that the quantization module is shared among multiple calculation modules, saving the area and power consumption of the element processing unit.
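A software analogue of this gating, assuming one switch per calculation module (the class and method names are invented for illustration):

```python
class InputSwitch:
    """Sketch of a per-module input switch: when deselected, the module's
    clock is held low, so it neither computes nor burns dynamic power."""
    def __init__(self, module):
        self.module = module
        self.enabled = False

    def select(self, on: bool):
        self.enabled = on

    def feed(self, a, b):
        # Data only reaches the calculation module while its switch is enabled.
        return self.module(a, b) if self.enabled else None

# The top-level controller enables only the module the operation needs:
add_switch = InputSwitch(lambda x, y: x + y)
mul_switch = InputSwitch(lambda x, y: x * y)
add_switch.select(True)
mul_switch.select(False)
result = add_switch.feed(3, 4)   # 7; the multiplier stays idle
```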
As an alternative embodiment, as shown in FIG. 3, the vector element data exchanger further includes an output buffer and at least two input buffers; the input buffers are connected one-to-one with the input switches, so that the number of input buffers equals the number of input switches and of calculation types in the vector calculation module, and the input buffers can buffer the input data of each type of calculation module. Each input buffer is connected to the first input selector and the second input selector, so that the first and second input selectors can feed data into any type of calculation module. The output buffer is connected with the output selector and the vector calculation module and buffers the calculation results of the vector calculation module. The input buffers and the output buffer buffer the input and output data of the vector processor respectively; buffering data for one or more cycles can resolve setup-time or hold-time violations that may arise under different frequency targets, thereby improving the operating efficiency of the whole vector processor.
As an alternative embodiment, as shown in FIG. 3, the vector calculation module includes at least any two of a vector addition module, a vector multiplication module, and a vector division module, preferably all three; other calculation modules may of course be added according to actual operational needs. The vector addition, multiplication, and division modules perform addition, multiplication, and division respectively between two vectors or between a vector and a scalar; that is, all three calculation modules can compute both vector-vector and vector-scalar operations, better meeting the calculation requirements of various neural network application scenarios.
As an alternative embodiment, as shown in FIG. 3, the input switches include a first, a second, and a third input switch, and the input buffers include a first, a second, and a third input buffer; the first, second, and third input buffers are connected respectively with the first, second, and third input switches. The first, second, and third input switches are connected respectively with the vector addition module, the vector multiplication module, and the vector division module, so that the three input buffers, three input switches, and three calculation modules are connected one-to-one, and directional control of the input data stream can be realized through the top-level controller. Of course, "first," "second," and "third" are merely naming distinctions, and other combinations may actually be connected as needed, such as the first input buffer, the second input switch, and the vector division module connected in sequence.
As an alternative embodiment, as shown in FIG. 3, the element processing unit further includes a direct input module and a direct output module; the high and low bit widths are H and L bits respectively, H > L, preferably H = 32 and L = 8, although other values of H and L can be set according to actual calculation needs, such as H = 64 and L = 4. If the input data is H-bit data, i.e., already high-bit data, it is fed to the vector element data exchanger through the direct input module, and the vector calculation module can perform vector calculation on it directly. If the output data is L-bit data, i.e., the calculation result of the vector calculation module is low-bit data, it is output through the direct output module. By providing the direct input module and the direct output module, data that need neither quantization nor inverse quantization are input or output quickly, improving the processing efficiency of input and output data; moreover, the quantization and inverse quantization modules can work at the same time, in parallel with the direct input or direct output module, enabling a higher-speed data flow and improving module utilization.
As an alternative embodiment, as shown in FIG. 3, the element processing unit further includes a first input selection module and a second input selection module, both connected to the top-level controller. The first and second input selection modules are each independently connected with a different inverse quantization module and a different direct input module; that is, the first input selection module is connected with one group consisting of an inverse quantization module and a direct input module, and the second input selection module is connected with another such group. The top-level controller selects and controls the data flow direction of the input data through the first and second input selection modules; that is, the top-level controller judges the bit width of the input data, high-bit or low-bit, and selects the corresponding input channel. If the input data is high-bit, the input data stream selects the direct input module; if it is low-bit, the input data stream selects the inverse quantization module. Data are thus distributed according to the bit width of the input stream, improving data flow efficiency.
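A sketch of that routing decision, assuming H = 32 and L = 8 and a caller-supplied dequantizer (all names here are illustrative):

```python
import numpy as np

def route_input(data, width_bits, dequant, qparams, H=32, L=8):
    """Input selection module sketch: low-bit input takes the inverse
    quantization path, high-bit input takes the direct input path."""
    if width_bits == L:
        return dequant(data, *qparams)    # L-bit -> H-bit before computation
    assert width_bits == H, "unexpected input width"
    return data                           # already high-bit: direct input

# Low-bit input is dequantized; high-bit input passes straight through.
deq = lambda q, scale: q.astype(np.float32) * scale
x_low = route_input(np.array([4, -2], dtype=np.int8), 8, deq, (0.5,))     # -> [2.0, -1.0]
x_high = route_input(np.array([2.0, -1.0], dtype=np.float32), 32, deq, (0.5,))
```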
As an alternative embodiment, as shown in FIG. 3, the element processing unit further includes an output selection module connected to the top-level controller. The top-level controller selects and controls the data flow direction of the output data; the output selection module is connected with the quantization module and the direct output module. When the output data stream (the calculation result of the vector calculation module) is high-bit, the quantization module is selected for data output; when the output data stream is low-bit, the direct output module is selected. Data are thus distributed according to the bit width of the output stream, improving data flow efficiency.
As an alternative embodiment, as shown in FIGS. 3, 4, and 6, the vector processor further includes an input module and an input/output module, where the element processing unit, the input module, and the input/output module together serve as a unit module in the element operation top-level module, and the element operation top-level module is a functional module in the vector processing unit of the vector processor. The input module is connected with the first input selection module and is used for feeding in input data. The input/output module is connected with the second input selection module and the output selection module, and is used for receiving the output data of the output selection module and feeding input data to the second input selection module, its input data and that of the input module together serving as the input for the next vector calculation. In FIG. 5, N × 32 bits denotes N 32-bit data lines, and N × 8 bits denotes N 8-bit data lines.
As an alternative embodiment, as shown in FIG. 5, the vector processor further includes a register, and the top-level controller can configure the register so that the input data of the first and second input selection modules enter the inverse quantization module or the direct input module respectively, and the output data of the output selection module comes from the quantization module or the direct output module. The register can distinguish high-bit from low-bit, so that input and output data select the corresponding data channels according to bit width; meanwhile, the rule by which the register divides high and low bits can be adjusted according to calculation requirements, to suit a wider range of calculation scenarios.
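One way such a configuration register could be laid out, purely as an assumption for illustration (the patent does not specify field names or bit positions):

```python
from dataclasses import dataclass

@dataclass
class EpuConfig:
    """Hypothetical configuration-register layout: one select bit per path,
    written by the top-level controller."""
    in0_dequant: bool  # first input selection: 1 = inverse quantization, 0 = direct input
    in1_dequant: bool  # second input selection: 1 = inverse quantization, 0 = direct input
    out_quant: bool    # output selection: 1 = quantization, 0 = direct output

    def encode(self) -> int:
        # Pack the three select bits into one register word.
        return (int(self.in0_dequant) << 0) | (int(self.in1_dequant) << 1) | (int(self.out_quant) << 2)

cfg = EpuConfig(in0_dequant=True, in1_dequant=False, out_quant=True)
reg_value = cfg.encode()  # 0b101: the word the controller writes to the register
```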
This embodiment is only a specific example and does not suggest that the invention is limited to this one implementation.
Embodiment two: A neural network accelerator comprising the vector processor of embodiment one. As shown in FIG. 7, the neural network accelerator further includes a preprocessing module, an in-memory computing matrix, and a shared memory, where the preprocessing module is connected with the in-memory computing matrix, the in-memory computing matrix is connected with the vector processor, and the shared memory is unidirectionally connected with the preprocessing module and the in-memory computing matrix and is also connected with the vector processor. The in-memory computing matrix may be a matrix formed of multiple CIMs (computing in memory); by adopting the vector processor of embodiment one (a multi-operator-fusion vector processor capable of fusing multiple operators), the area and power consumption of the neural network accelerator are effectively reduced, and the accelerator is convenient to use.
In addition, in-memory computing can alleviate the memory wall problem. A von Neumann computer system separates the memory and the processor, and the overhead of the processor frequently accessing the memory forms a memory wall; high-frequency data movement is often the primary source of chip power consumption, especially for chips in the AI field, affecting their computing power, efficiency, and power consumption. A neural network accelerator using integrated sensing-storage-computing technology (integrating sensing, storage, and operation) can achieve very high computing power, efficiency, and energy-efficiency ratio, so the present invention improves the area and power consumption of the neural network accelerator without affecting its function.
Embodiment three: A chip comprising the vector processor of embodiment one. By adopting the vector processor of embodiment one, the area, cost, and power consumption of the chip are reduced, so that more diverse and complex functions can be integrated on the chip, the chip is more widely applicable, and it is convenient to use in more complex working scenarios. The chip provided by the invention may be an AI vision chip, and each module in the chip may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules can be embedded in, or independent of, a processor in a computing device in hardware form, or stored in a memory of the computing device in software form so that the processor can invoke and execute the operations corresponding to the above modules, effectively reducing chip area, cost, and power consumption.
Embodiment four: An electronic device including the vector processor of embodiment one or the chip of embodiment three, so as to reduce the power consumption and cost of the electronic device. The chip provided by the invention can be applied to automatic driving, AR, VR, lidar, and a range of electronic equipment with strict requirements on low power consumption and high energy-efficiency ratio, such as smartphones, tablet computers, wearable electronic devices, smart home electronics, and industrial, medical, or battery-powered equipment.
The foregoing is only illustrative of the preferred embodiments of the invention, and it will be appreciated by those skilled in the art that various changes in the features and embodiments may be made and equivalents may be substituted without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (14)

1. A vector processor, comprising: a top-level controller and at least one element processing unit connected with the top-level controller, wherein the element processing unit is used for performing element operations in vector operations; the element processing unit comprises a vector element data exchanger, a vector calculation module, a quantization module, and an inverse quantization module; the vector calculation module is used for carrying out at least two types of vector calculation; the inverse quantization module is used for converting low-bit input data into high-bit input data and inputting the high-bit input data into the vector element data exchanger; the vector element data exchanger enables the high-bit input data to select at least one calculation type from the vector calculation module, and outputs the high-bit calculation result to the quantization module; the quantization module is used for converting the high-bit calculation result into low-bit output data; and the top-level controller is used for controlling the vector element data exchanger so that the high-bit input data selects the calculation type.
2. The vector processor of claim 1, wherein the vector element data exchanger comprises a first input selector and a second input selector, both connected to the top-level controller and the vector calculation module; the top-level controller controls the first input selector so that input data enter the vector calculation module for different types of calculation; and the top-level controller controls the second input selector so that an intermediate calculation result of the vector calculation module is fed back, or input data enter the vector calculation module, for different types of calculation.
3. The vector processor of claim 2, wherein the vector element data exchanger further comprises an output selector connected to the top-level controller and the vector calculation module; and the top-level controller controls the output selector so that the calculation result is output through the quantization module or output directly.
4. The vector processor of claim 2, wherein at least two input switches are connected between the first input selector, the second input selector, and the vector calculation module; different input switches are connected one-to-one with calculation modules of different types in the vector calculation module and are connected with the top-level controller; and the top-level controller enables or disables the different types of calculation modules through the input switches and controls input data entering the vector calculation module.
5. The vector processor of claim 4, wherein the vector element data exchanger further comprises an output buffer and at least two input buffers; the input buffers are connected one-to-one with the input switches, and each input buffer is connected with the first input selector and the second input selector; the output buffer is connected with the output selector and the vector calculation module; and the input buffers and the output buffer buffer the input data and output data of the vector calculation module, respectively.
6. The vector processor of claim 5, wherein the vector calculation module comprises at least any two of a vector addition module, a vector multiplication module, and a vector division module, each for performing addition, multiplication, and division respectively between two vectors or between a vector and a scalar.
7. The vector processor of any one of claims 1-6, wherein the element processing unit further comprises a direct input module and a direct output module; the high and low bit widths are H and L bits respectively, with H > L; if the input data is L-bit data, it is input to the vector element data exchanger through the direct input module; and if the output data is H-bit data, it is output through the direct output module.
8. The vector processor of claim 7, wherein the element processing unit further comprises a first input selection module and a second input selection module, both connected to the top-level controller; the first and second input selection modules are each independently connected with a different inverse quantization module and a different direct input module; and the top-level controller selects and controls the data flow direction of the input data through the first and second input selection modules.
9. The vector processor of claim 8, wherein the element processing unit further comprises an output selection module connected to the top-level controller; the top-level controller selects and controls the data flow direction of the output data; and the output selection module is connected with the quantization module and the direct output module.
10. The vector processor of claim 9, further comprising an input module and an input/output module; the input module is connected with the first input selection module and is used for feeding in input data; and the input/output module is connected with the second input selection module and the output selection module, and is used for receiving the output data of the output selection module and feeding input data to the second input selection module.
11. The vector processor of claim 8, further comprising a register, wherein the top-level controller can configure the register so that the input data of the first and second input selection modules enter an inverse quantization module or a direct input module, respectively, and the output data of the output selection module comes from the quantization module or the direct output module.
12. A neural network accelerator comprising the vector processor of any of claims 1-11.
13. A chip comprising the vector processor of any one of claims 1-11.
14. An electronic device comprising the vector processor of any one of claims 1-11 or the chip of claim 13.
CN202410101510.1A 2024-01-25 2024-01-25 Vector processor, neural network accelerator, chip and electronic equipment Pending CN117634577A (en)

Priority Applications (1)

Application CN202410101510.1A (publication CN117634577A) — priority date 2024-01-25, filing date 2024-01-25 — Vector processor, neural network accelerator, chip and electronic equipment


Publications (1)

Publication Number: CN117634577A — Publication Date: 2024-03-01

Family

ID=90018448

Family Applications (1)

CN202410101510.1A (pending; publication CN117634577A) — priority date 2024-01-25, filing date 2024-01-25 — Vector processor, neural network accelerator, chip and electronic equipment

Country Status (1)

CN — CN117634577A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111937010A (en) * 2018-03-23 2020-11-13 亚马逊技术股份有限公司 Accelerated quantized multiplication and addition operations
CN114418057A (en) * 2020-10-28 2022-04-29 华为技术有限公司 Operation method of convolutional neural network and related equipment
WO2022088063A1 (en) * 2020-10-30 2022-05-05 华为技术有限公司 Method and apparatus for quantizing neural network model, and method and apparatus for processing data
CN114781618A (en) * 2022-04-29 2022-07-22 苏州浪潮智能科技有限公司 Neural network quantization processing method, device, equipment and readable storage medium
CN116611488A (en) * 2023-05-24 2023-08-18 奥比中光科技集团股份有限公司 Vector processing unit, neural network processor and depth camera
CN117195989A (en) * 2023-11-06 2023-12-08 深圳市九天睿芯科技有限公司 Vector processor, neural network accelerator, chip and electronic equipment



Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination