WO2022170811A1

WO2022170811A1 - Fixed-point multiply-add operation unit and method suitable for mixed-precision neural network

Info

Publication number: WO2022170811A1
Application number: PCT/CN2021/131800
Authority: WO
Inventors: 毛伟; 余浩; 安丰伟; 李凯; 周俊卓; 王宇航; 王祥龙; 石港
Original assignee: 南方科技大学
Priority date: 2021-02-09
Filing date: 2021-11-19
Publication date: 2022-08-18
Also published as: CN113010148A; CN113010148B

Abstract

Disclosed are a fixed-point multiply-add operation unit and method suitable for a mixed-precision neural network. A mixed-precision dot-product operation is achieved by inputting input data of different precisions into a multiplier from different positions, controlling the multiplier, according to a mode signal, to mask the partial products of a specified area and then output a partial product generation portion, and performing a summation operation on the outputted partial product generation portion according to the method corresponding to each precision. The multiplier can realize the dot-product operation of a mixed-precision neural network, solving problems in the prior art, such as excessive hardware overhead and redundant idle resources, that are caused by processing mixed-precision operations with a plurality of processing units of different precisions.

Description

A fixed-point multiply-add operation unit and method suitable for mixed-precision neural network

Technical field

The invention relates to the field of digital circuits, in particular to a fixed-point multiply-add operation unit and method suitable for a mixed-precision neural network.

Background art

At present, artificial intelligence algorithms are widely used in many commercial fields. Quantizing different layers of a network is one of the important methods for improving the efficiency of network computation. As the computing carrier on which algorithms are implemented, artificial intelligence chips face a growing demand for mixed-precision computing during data processing in order to match the characteristics of network design. Conventional processors use a variety of processing units with different precisions to process mixed-precision operations. This approach leads to excessive hardware overhead, redundant idle resources, and excessive delay when switching between hardware of different precisions, which reduces throughput. It also cannot be configured according to application requirements to maximize the utilization of hardware resources and thereby improve the energy efficiency ratio and throughput rate, resulting in wasted operating time and operating area.

Therefore, the existing technology still needs to be improved and developed.

SUMMARY OF THE INVENTION

The technical problem to be solved by the present invention is to provide, in view of the above-mentioned defects of the prior art, a fixed-point multiply-add operation unit and method suitable for a mixed-precision neural network, aiming to solve the problems of excessive hardware overhead and redundant idle resources caused in the prior art by the need to process mixed-precision operations with a variety of processing units of different precisions.

The technical scheme adopted by the present invention to solve the problem is as follows:

In a first aspect, an embodiment of the present invention provides a fixed-point multiply-add operation method suitable for mixed-precision neural networks, wherein the method includes:

acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into a multiplier from the data input position;

acquiring a mode signal, processing the partial products generated by the multiplier according to the mode signal, performing a summation operation, and using the data obtained after the summation operation as a target sum;

truncating the target sum, and using the data obtained after truncation as the result of the dot product of the input data.

In one embodiment, the acquiring the mode signal and the input data, determining the data input position according to the mode signal, and inputting the input data from the data input position into the multiplier includes:

Acquire the mode signal and input data, and determine the number of multipliers called according to the precision of the input data;

When the highest precision of the input data is higher than the highest bit of the multiplier, the number of called multipliers is greater than 1;

Determine the data input position according to the mode signal, split the data with the highest precision in the input data, and input the input data obtained after the splitting into the multiplier from the data input position;

When the highest precision of the input data is lower than or equal to the highest bit of the multiplier, the number of called multipliers is 1;

A data input location is determined based on the mode signal, and the input data is input into a multiplier from the data input location.

In one embodiment, acquiring the mode signal, processing the partial product generated by the multiplier according to the mode signal, performing a summation operation, and using the data obtained after the summation operation as the target sum includes:

obtaining a mode signal, and processing the partial product generated by the multiplier according to the mode signal;

Splitting the partial product generation part obtained after processing into a first partial product generation part and a second partial product generation part;

A summation operation is performed on the first partial product generation part and the second partial product generation part, and the data obtained after the summation operation is used as a target sum.

In one embodiment, the mode signal is determined by the precision of the input data; the processing includes at least one of the following operations:

performing masking processing on the partial product of the preset area generated by the multiplier;

When the number of called multipliers is greater than 1, a shift process is performed on the partial product generation part output by the multipliers that perform lower order operations.

In an implementation manner, when the input data are of the same precision and the highest bit number of the input data is less than or equal to the highest bit number of the multiplier, performing a summation operation on the first partial product generation part and the second partial product generation part and using the data obtained after the summation operation as the target sum includes:

inputting the first partial product generating part and the second partial product generating part into the first-stage compressor a and the first-stage compressor b respectively;

inputting the output results of the first-stage compressor a and the first-stage compressor b together into the second-stage compressor c;

The output result of the second stage compressor c is input into an adder, and the output result of the adder is used as a target sum.

In one embodiment, when the input data is of mixed precision, performing a summation operation on the first partial product generation part and the second partial product generation part and using the data obtained after the summation operation as the target sum includes:

Obtain the highest bit number of the input data, and compare the highest bit number of the input data with the highest bit number of the multiplier;

When the highest bit number of the input data is equal to the highest bit number of the multiplier, the first partial product generation part and the second partial product generation part are input into the first-stage compressor a and the first-stage compressor b respectively;

The output results of the first-stage compressor a and the first-stage compressor b are input into the first adder and the second adder respectively, and the sum of the output results of the first adder and the second adder is used as the target sum.

When the highest bit number of the input data is greater than the highest bit number of the multiplier, the multiplier includes a first multiplier and a second multiplier, and the second multiplier is a low-order operation multiplier; the first multiplier outputs the first partial product generating part, and the second multiplier outputs the second partial product generating part;

inputting the first partial product generation part directly into the first adder;

splitting the second partial product generation part and inputting them into the first adder and the second adder respectively;

The sum of the output results of the first adder and the second adder is used as the target sum.

In an implementation manner, truncating the target sum and using the data obtained after truncation as the result of the dot product of the input data comprises:

Determine the truncation bit width according to the precision of the input data;

According to the truncation bit width, a truncation operation is performed on the target sum starting from the 0th bit, and the data obtained after the truncation operation is used as the result of the dot product of the input data.
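The truncation step can be illustrated with a minimal sketch. It assumes the target sum is held as a non-negative integer and that the truncation bit width is supplied by the caller; the rule that maps input precision to a width is not spelled out in this passage, and `truncate_target_sum` is a hypothetical helper name.

```python
def truncate_target_sum(target_sum: int, width: int) -> int:
    """Keep `width` bits of the target sum, counting from bit 0."""
    if width <= 0:
        raise ValueError("width must be positive")
    return target_sum & ((1 << width) - 1)


# Illustrative values only: keep the low 6 bits of a 16-bit sum.
result = truncate_target_sum(0b1010_1100_0011_0101, 6)
```

With these illustrative values the retained field is `0b110101`.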

In one embodiment, the method further includes:

determining the partial product generating part corresponding to the highest bit of the input data, and using the partial product generating part as the partial product generating part to be adjusted;

When the most significant bit of the input data indicates a negative number, the to-be-adjusted partial product generation part is inverted and incremented by one.
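The invert-and-add-one adjustment can be sketched at word level as follows. This is a behavioural illustration under stated assumptions, not the circuit of the embodiment: operands are assumed to be n-bit two's-complement integers, the multiplicand is sign-extended to 2n bits, and `signed_multiply_via_rows` is a hypothetical name.

```python
def signed_multiply_via_rows(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit two's-complement integers row by row.

    The row selected by the sign bit of `b` carries a negative weight, so it
    is inverted and incremented (two's-complement negation) before summing.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in n signed bits")
    mask = (1 << (2 * n)) - 1          # all arithmetic is modulo 2^(2n)
    a_ext = a & mask                   # sign-extended multiplicand
    b_bits = b & ((1 << n) - 1)        # raw bit pattern of the multiplier
    total = 0
    for i in range(n):
        row = (a_ext << i) & mask if (b_bits >> i) & 1 else 0
        if i == n - 1:                 # row of the most significant (sign) bit
            row = (~row + 1) & mask    # invert and add one
        total = (total + row) & mask
    # Reinterpret the 2n-bit pattern as a signed value.
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total
```

When the sign bit of `b` is 0 the adjusted row is all zeros, so the negation leaves it unchanged and the same datapath serves both signs.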

In a second aspect, an embodiment of the present invention further provides a fixed-point multiply-add operation unit suitable for a mixed-precision neural network, wherein the operation unit includes:

a position determination module for acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into a multiplier from the data input position;

a partial product processing module for processing the partial product generated by the multiplier according to the mode signal, and performing a summation operation, using the data obtained after the summation operation as a target sum;

a result generation module for truncating the target sum, and using the data obtained after truncation as the result of the dot product of the input data.

Beneficial effects of the present invention: the present invention inputs input data of different precisions into the multiplier from different positions, controls the multiplier according to the mode signal to mask the partial products of the designated area and then output the partial product generation part, and performs the summation operation on the output partial product generation part according to the method corresponding to each precision, thereby realizing a mixed-precision dot-product operation. In the present invention, a unified multiplier can be used to realize the dot-product operation of a mixed-precision neural network, which solves the problems of excessive hardware overhead and redundant idle resources caused in the prior art by the need to process mixed-precision operations with a variety of processing units of different precisions.

Description of drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description are only some of the embodiments described in the present invention. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without any creative effort.

FIG. 1 is a schematic flowchart of a fixed-point multiply-add operation method suitable for a mixed-precision neural network provided by an embodiment of the present invention.

FIG. 2 is a schematic diagram of a partial product generated in a conventional 8bit×8bit multiplier provided by an embodiment of the present invention.

FIG. 3 is an addition tree structure used by a conventional 8bit×8bit multiplier provided by an embodiment of the present invention.

FIG. 4 is a reference diagram for implementing multiplication operations of four groups of input data with a precision of 2bit×2bit based on a group of 8bit×8bit multipliers provided by an embodiment of the present invention.

FIG. 5 is a reference diagram for implementing multiplication operations of two groups of input data with a precision of 4bit×4bit based on a group of 8bit×8bit multipliers provided by an embodiment of the present invention.

FIG. 6 is a reference diagram for implementing a multiplication operation of input data with a precision of 1 bit×1 bit based on an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 7 is a reference diagram for implementing a multiplication operation of input data with a precision of 3 bits×3 bits based on an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 8 is a reference diagram for implementing a multiplication operation of input data with a precision of 5bit×5bit based on an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 9 is a reference diagram for implementing a multiplication operation of input data with a precision of 6bit×6bit based on an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 10 is a reference diagram for implementing a multiplication operation of input data with a precision of 7bit×7bit based on an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 11 is a reference diagram for implementing multiplication of two 4bit×8bit mixed-precision input data by dividing and summing the partial product generation part based on an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 12 is a reference diagram for implementing a multiplication operation of input data with a mixed precision of 8bit×16bit based on two sets of 8bit×8bit multipliers according to an embodiment of the present invention.

FIG. 13 is a schematic diagram of accumulating the output data of the first multiplier and the second multiplier under mixed precision provided by an embodiment of the present invention.

FIG. 14 is a schematic diagram of implementing 8bit×xbit multiplication based on two groups of 8bit×8bit multiplier architectures provided by an embodiment of the present invention, where x=9˜15bit.

FIG. 15 is a reference diagram for implementing a multiplication operation of input data with a mixed precision of 8bit×15bit based on two sets of 8bit×8bit multipliers provided by an embodiment of the present invention.

FIG. 16 is a schematic diagram of a partial product including a sign bit in an 8bit×8bit multiplier according to an embodiment of the present invention.

FIG. 17 is a reference diagram of an internal module of an arithmetic unit provided by an embodiment of the present invention.

Detailed description

In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are only used to explain the present invention, not to limit it.

It should be noted that if there are directional indications (such as up, down, left, right, front, back, etc.) involved in the embodiments of the present invention, the directional indications are only used to explain a certain posture (as shown in the accompanying drawings). If the specific posture changes, the directional indication also changes accordingly.

At present, artificial intelligence algorithms are widely used in many commercial fields. Quantizing different layers of a network is one of the important methods for improving the efficiency of network computation. As the computing carrier on which algorithms are implemented, artificial intelligence chips face a growing demand for mixed-precision computing during data processing in order to match the characteristics of network design. Conventional processors use a variety of processing units with different precisions to process mixed-precision operations. This approach leads to excessive hardware overhead, redundant idle resources, and excessive delay when switching between hardware of different precisions, which reduces throughput. It also cannot be configured according to application requirements to maximize the utilization of hardware resources and thereby improve the energy efficiency ratio and throughput rate, resulting in wasted operating time and operating area.

In view of the above-mentioned defects of the prior art, the present invention provides a fixed-point multiply-add operation method suitable for mixed-precision neural networks. Input data of different precisions are input into the multiplier from different positions, the multiplier is controlled according to the mode signal to mask the partial products of the specified area and then output the partial product generation part, and the summation operation is performed on the output partial product generation part according to the method corresponding to each precision, thereby realizing a mixed-precision dot-product operation. In the present invention, a unified multiplier can be used to realize the dot-product operation of a mixed-precision neural network, which solves the problems of excessive hardware overhead and redundant idle resources caused in the prior art by the need to process mixed-precision operations with a variety of processing units of different precisions.

As shown in Figure 1, the method includes the following:

Step S100: Acquire a mode signal and input data, determine a data input position according to the mode signal, and input the input data into the multiplier from the data input position.

Since this embodiment uses a unified multiplier to perform the dot-product operation of the mixed-precision neural network, and the number of bits at the input positions of the multiplier is fixed, the precision of the input data may not match the highest bit number of the multiplier. In order to make the multiplier suitable for input data of different precisions, this embodiment acquires a mode signal and input data, determines the data input position according to the mode signal, and then inputs the input data into the multiplier from the data input position. Input data of different precisions are thus input into the multiplier from different data input positions, so that the dot-product operation of the mixed-precision neural network is implemented with a unified multiplier.

In an implementation manner, the step S100 specifically includes the following steps:

Step S110, acquiring a mode signal and input data, and determining the number of multipliers called according to the precision of the input data;

Step S120, when the highest precision of the input data is higher than the highest bit of the multiplier, the number of called multipliers is greater than 1;

Step S130, determining the data input position according to the mode signal, splitting the data with the highest precision in the input data, and inputting the input data obtained after the splitting into the multiplier from the data input position;

Step S140, when the highest precision of the input data is lower than or equal to the highest bit of the multiplier, the number of called multipliers is 1;

Step S150: Determine the data input position according to the mode signal, and input the input data into the multiplier from the data input position.

Since this embodiment adopts a unified multiplier whose highest bit number is fixed, the precision of the multiplier may not match the precision of the input data. For example, the multiplier is an 8bit×8bit multiplier and the precision of the input data is 3bit×3bit, or the multiplier is an 8bit×8bit multiplier and the precision of the input data is 8bit×16bit. Therefore, the number of multipliers called needs to be determined according to the precision of the input data. It can be understood that if the precision of the input data exceeds the precision of the multiplier, the input data cannot be multiplied by a single multiplier, and multiple multipliers need to be called.

Specifically, when the highest precision of the input data is higher than the highest bit of the multiplier, the number of called multipliers is greater than 1. The data input position is then determined according to the mode signal, the data with the highest precision in the input data is split, and the input data obtained after the split is input into the multipliers from the data input position. For example, assuming that the input data is mixed-precision 8bit×16bit and the multiplier is an 8bit×8bit multiplier, two 8bit×8bit multipliers need to be called to multiply the input data. The 8-bit part of the data can be input directly into the multipliers from the specified data input position, while the 16-bit part needs to be split before being input into the two multipliers respectively (as shown in Figure 12).

When the highest precision of the input data is lower than or equal to the highest bit of the multiplier, the number of called multipliers is 1, the data input position is determined according to the mode signal, and the input data is input into the multiplier from the data input position. For example, when the precision of the input data is 3bit×3bit and the multiplier is an 8bit×8bit multiplier, only one 8bit×8bit multiplier needs to be called to multiply the input data. Because the highest precision of the input data does not exceed the highest bit of the multiplier, the input data can be input directly into the multiplier from the specified data input position for operation (as shown in Figure 7).
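The decision made in steps S110 to S150 can be sketched as below. The text only states that one multiplier is called when the input fits and more than one otherwise; generalising this to a ceiling division, and splitting the wide operand into consecutive 8-bit chunks, are assumptions made for illustration. `MULT_BITS`, `multipliers_needed` and `split_operand` are hypothetical names.

```python
MULT_BITS = 8  # input width of one multiplier, as in the 8bit x 8bit example


def multipliers_needed(precision_a: int, precision_b: int) -> int:
    """Number of multipliers to call for one precision_a x precision_b product."""
    highest = max(precision_a, precision_b)
    return -(-highest // MULT_BITS)  # ceiling division (assumed generalisation)


def split_operand(value: int, precision: int) -> list:
    """Split an unsigned operand into MULT_BITS-wide chunks, low chunk first."""
    chunks = []
    for _ in range(-(-precision // MULT_BITS)):
        chunks.append(value & ((1 << MULT_BITS) - 1))
        value >>= MULT_BITS
    return chunks
```

For the two examples in the text, a 3bit×3bit input needs one multiplier and no splitting, while an 8bit×16bit input needs two multipliers and has its 16-bit operand split into a low byte and a high byte.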

After that, the output result of the multiplier needs to be obtained, as shown in FIG. 1 , the method further includes the following steps:

Step S200: Obtain a mode signal, process the partial product generated by the multiplier according to the mode signal, perform a summation operation, and use the data obtained after the summation operation as a target sum.

Specifically, since a unified multiplier is used in this embodiment to calculate input data of different precisions, the highest bit number of the multiplier may not be equal to the precision of the input data. In order to make the output result of the multiplier consistent with the input data, this embodiment introduces a mode signal, and the partial products generated by the multiplier are processed according to the mode signal so that only the partial product generation part corresponding to the input data is left. In short, the mode signal is equivalent to a control command that directs the system to perform different processing on the partial products of different regions generated by the multiplier.

In an implementation manner, the mode signal is determined by the precision of the input data, and the processing includes at least one of the following two operations: 1. Masking the partial products of the preset area generated by the multiplier. For example, assuming that the multiplier is an 8bit×8bit multiplier, the partial products generated by the 8bit×8bit multiplier are gated by the mode signal, and the partial products that are not needed under that mode signal are masked. In one implementation, the masking can be implemented by setting the output of the unneeded partial product generation part to 0 or 1 (the latter for padding the high-order bits in two's complement). Figure 4 shows the multiply-accumulate operation of 4 groups of 2bit×2bit input data. Blocks of the same shade represent the multiplier input data, the multiplicand input data, or the corresponding partial product generation part of the same group. For these 4 groups of input data, a specific mode signal is generated, and all partial products other than those corresponding to the four groups of input data are masked.
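The masking operation can be modelled behaviourally as follows. This is a sketch under assumptions rather than the layout of Figure 4: operands are unsigned, each group of `w` bits of one operand is paired with the same-numbered group of the other, the retained region is taken to be block-diagonal, and bit weights are counted relative to each group so that all products align at bit 0. All function names are hypothetical.

```python
def partial_product_bits(a: int, b: int, n: int = 8):
    """n x n array of AND terms: pp[i][j] = b_i AND a_j."""
    return [[((b >> i) & 1) & ((a >> j) & 1) for j in range(n)] for i in range(n)]


def block_diagonal_mask(w: int, n: int = 8):
    """Assumed mode mask: keep terms whose row and column lie in the same w-bit group."""
    return [[1 if i // w == j // w else 0 for j in range(n)] for i in range(n)]


def masked_dot_product(a: int, b: int, w: int, n: int = 8) -> int:
    """Sum the products of the n // w operand pairs packed into a and b."""
    if n % w:
        raise ValueError("w must divide n in this sketch")
    pp = partial_product_bits(a, b, n)
    mask = block_diagonal_mask(w, n)
    total = 0
    for i in range(n):
        for j in range(n):
            # Masked terms contribute 0; kept terms are weighted within their group.
            total += (pp[i][j] & mask[i][j]) << ((i % w) + (j % w))
    return total


def pack(fields, w: int) -> int:
    """Pack a list of w-bit fields into one operand, first field lowest."""
    value = 0
    for k, f in enumerate(fields):
        value |= (f & ((1 << w) - 1)) << (k * w)
    return value
```

For example, `masked_dot_product(pack([1, 2, 3, 0], 2), pack([3, 3, 1, 2], 2), 2)` evaluates 1×3 + 2×3 + 3×1 + 0×2, and with `w = 8` the mask keeps every term so the function reduces to an ordinary 8bit×8bit product.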

2. When the number of called multipliers is greater than 1, shift processing is performed on the partial product generation part output by the multiplier that performs the lower-order operation. For example, when the multiplier used is an 8bit×8bit multiplier and the precision of the input data is 8bit×16bit, the maximum precision of the input data is greater than the highest bit number of the multiplier, so a single multiplier cannot complete the multiplication and two multipliers must be called. A specific mode signal is generated according to the precision of the input data, and under that mode signal the partial product generation part output by the low-order multiplier is shifted (as shown in Figure 12).

After the processing is completed, the partial product generating part obtained after the processing needs to be split into a first partial product generating part and a second partial product generating part. Then, a summation operation is performed on the first partial product generation part and the second partial product generation part, and the data obtained after the summation operation is used as a target sum. Specifically, in this embodiment, the sum operation performed on the first partial product generation part and the second partial product generation part is mainly divided into the following three cases:

When the input data has the same precision and the highest bit number of the input data is less than or equal to the highest bit number of the multiplier, the first partial product generation part and the second partial product generation part are input into the first compressor and the second compressor respectively, the output results of the first compressor and the second compressor are then input into the same adder, and the output result of the adder is used as the target sum. Specifically, in practical applications, the speed of floating-point multiplication is largely determined by the speed of mantissa processing, and a large number of partial products are generated during mantissa processing. Accumulating these partial products directly would greatly prolong the mantissa processing time. The partial products are therefore compressed first, from n down to 2, and the 2 partial products obtained after compression are then accumulated. The result obtained after accumulation is the target sum required by this embodiment. It should be noted that the compressor in this embodiment is in fact a special kind of adder.

For example, assuming that the multiplier used in this embodiment is an 8bit×8bit multiplier, as shown in Figure 2 and Figure 3, a conventional 8bit×8bit multiplier generates a total of 8 groups of progressively shifted partial products. The 8 partial products PP₀-PP₇ are divided into two parts, each of which passes through one of two 4-2 compressors (CSA42) in the first stage. The output results of the two 4-2 compressors are jointly input into a 4-2 compressor (CSA42) in the second stage, and the output result of the second-stage 4-2 compressor is then input into a single-stage carry-propagate adder (CPA) to obtain the final sum, that is, the target sum. As shown in Figure 5, assume that the two compressors of the first stage are a and b and that the compressor of the second stage is c. Assuming that the input data are two 4bit×4bit floating-point numbers, the eight partial product generation parts in Figure 5 are split into two parts. From top to bottom, the first 4 partial product generation parts form the first partial product generation part, and the last 4 form the second partial product generation part. The first partial product generation part and the second partial product generation part are input into the first-stage compressor a and the first-stage compressor b respectively, the output results of the first-stage compressor a and the first-stage compressor b are jointly input into the second-stage compressor c, the output result of the second-stage compressor c is input into the adder, and finally the output result of the adder is used as the target sum.
Figure 6 shows the distribution of the partial product generation part in the multiplier when the input data is 1bit×1bit; Figure 7 shows the distribution of the partial product generation part in the multiplier when the input data is 3bit×3bit; Figure 8 shows the distribution of the partial product generation part in the multiplier when the input data is 5bit×5bit; Figure 9 shows the distribution of the partial product generation part in the multiplier when the input data is 6bit×6bit; Figure 10 shows the distribution of the partial product generation part in the multiplier when the input data is 7bit×7bit. The embodiments corresponding to these figures all meet the condition that the precision of the input data is the same, so the steps of splitting, compressing and summing the partial product generation part are similar to those of the embodiment shown in FIG. 5 .
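The two-stage compression followed by a single adder can be checked with a word-level model. This sketch assumes unsigned operands and models each 4-2 compressor as two cascaded 3:2 carry-save stages acting on whole integers; a gate-level 4-2 cell with explicit carry-in and carry-out is not reproduced. The function names are hypothetical.

```python
def csa32(x: int, y: int, z: int):
    """3:2 carry-save adder on whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def csa42(p0: int, p1: int, p2: int, p3: int):
    """4-2 compressor modelled as two cascaded 3:2 stages."""
    s, c = csa32(p0, p1, p2)
    return csa32(s, c, p3)


def multiply_8x8(a: int, b: int) -> int:
    """Unsigned 8bit x 8bit product via two compressor stages and one adder."""
    # Eight progressively shifted partial products PP0..PP7.
    pp = [(a << i) if (b >> i) & 1 else 0 for i in range(8)]
    sa, ca = csa42(*pp[:4])          # first-stage compressor a
    sb, cb = csa42(*pp[4:])          # first-stage compressor b
    sc, cc = csa42(sa, ca, sb, cb)   # second-stage compressor c
    return sc + cc                   # carry-propagate adder
```

Each compressor stage preserves the arithmetic sum of its inputs while halving the number of operands, so only the final addition needs to propagate carries across the full width.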

When the input data is of mixed precision, this embodiment adopts another method to obtain the target sum corresponding to the input data. First, the highest bit number of the input data is obtained and compared with the highest bit number of the multiplier. When the highest bit number of the input data is equal to the highest bit number of the multiplier, the highest precision of the input data does not exceed the highest bit of the multiplier, and only one multiplier needs to be called for the multiplication. After the first partial product generation part and the second partial product generation part are obtained, they are input into the first-stage compressor a and the first-stage compressor b respectively, the output results of the first-stage compressor a and the first-stage compressor b are then input into the first adder and the second adder respectively, and finally the sum of the output results of the first adder and the second adder is used as the target sum. In short, for mixed-precision input data, this embodiment sums the compressed partial product generation parts separately, that is, the two partial product generation parts obtained after compression are input into different adders for summation.

For example, as shown in FIG. 11, assume that this embodiment uses a conventional 8bit×8bit multiplier to multiply two 4bit×8bit mixed-precision input data. The 8 partial product generation parts generated in this case are divided, from top to bottom, into two parts. The first four partial product generation parts form the first partial product generation part and are summed separately, that is, the first partial product generation part is input into one compressor for compression and then input into one adder for summation. The last four partial product generation parts form the second partial product generation part and are also summed separately, that is, the second partial product generation part is input into another compressor for compression and then input into another adder for summation. The output results of the two adders are then summed.
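The separate-summation path for two 4bit×8bit products can be sketched as follows. The sketch assumes unsigned operands, that each group of four rows is driven by its own 8-bit operand, and that both products are aligned at bit 0 before the final addition; these details are not confirmed by the text and FIG. 11 may arrange them differently. The names are hypothetical.

```python
def csa32(x: int, y: int, z: int):
    """3:2 carry-save adder on whole words: x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1


def csa42(p0: int, p1: int, p2: int, p3: int):
    """4-2 compressor modelled as two cascaded 3:2 stages."""
    s, c = csa32(p0, p1, p2)
    return csa32(s, c, p3)


def two_4x8_products(x0: int, w0: int, x1: int, w1: int) -> int:
    """Return x0*w0 + x1*w1 for 8-bit x0, x1 and 4-bit w0, w1 (all unsigned)."""
    # First partial product generation part: four rows selected by the bits of w0.
    first = [(x0 << i) if (w0 >> i) & 1 else 0 for i in range(4)]
    # Second partial product generation part: four rows selected by the bits of w1.
    second = [(x1 << i) if (w1 >> i) & 1 else 0 for i in range(4)]
    sa, ca = csa42(*first)     # compressor for the first part
    sb, cb = csa42(*second)    # compressor for the second part
    sum_first = sa + ca        # first adder
    sum_second = sb + cb       # second adder
    return sum_first + sum_second
```

Compared with the same-precision path, the second-stage compressor is bypassed and each half of the array gets its own adder, which keeps the two products independent until the final sum.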

However, for mixed-precision input data, the highest bit number of the input data may also be greater than the highest bit number of the multiplier. In this case, the input data cannot be multiplied with only one multiplier, and two multipliers must be called. As shown in FIG. 13, the highest bit number of the input data is obtained and compared with the highest bit number of the multiplier. When the highest bit number of the input data is greater than the highest bit number of the multiplier, this embodiment divides the two called multipliers into a first multiplier and a second multiplier, where the second multiplier is the multiplier that performs the low-order operation. To distinguish the partial product generating parts generated by the two multipliers, in this embodiment the partial product generating part generated by the first multiplier is used as the first partial product generating part, and the partial product generating part generated by the second multiplier is used as the second partial product generating part. The first partial product generating part can then be input directly into the first adder (CPA1), while the second partial product generating part must be split and input into the first adder and the second adder (CPA2) respectively. The sum of the output results of the first adder and the second adder is then used as the target sum. In short, for mixed-precision input data whose highest bit number is greater than the highest bit number of the multiplier, the data can be input directly into the adders without being compressed by a compressor, in order to avoid excessive timing delay.
Since two multipliers need to be called in this case, at the system accumulation level the partial product generating part generated by the multiplier that performs the low-order operation must be shifted right so that the subsequent summation is correct. Therefore, in addition to the one adder conventionally used, another adder needs to be called to sum the excess part produced by the right shift. For example, FIG. 12 shows the multiplication of 8bit×16bit input data based on an architecture of two 8bit×8bit multipliers, FIG. 14 shows the multiplication of 8bit×xbit input data (x = 9 to 15) based on the same architecture, and FIG. 15 shows the multiplication of 8bit×15bit input data. In all of these cases the highest bit of the input data is greater than the highest bit of the multiplier, so the above method is needed to sum the partial product generating parts.
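The two-multiplier case can be sketched in the same behavioural style. The following is one plausible reading of the text, not the circuit itself: operands are assumed unsigned, the wide operand is split into a high byte and a low byte, the first adder (CPA1) receives the first partial product generating part plus the overlapping upper bits of the right-shifted second part, and the second adder (CPA2) receives the bits that fall off on the right. The function name `mul_8x16` and the final weighting of the two adder outputs are assumptions made for illustration.

```python
def mul_8x16(a, b):
    """Unsigned 8-bit x (up to) 16-bit product built from two 8x8 products."""
    assert 0 <= a < (1 << 8) and 0 <= b < (1 << 16)

    b_high, b_low = b >> 8, b & 0xFF

    first_part = a * b_high   # first multiplier: high-order operation
    second_part = a * b_low   # second multiplier: low-order operation

    # Aligning the second part to the first is a right shift by 8 bits:
    # its upper bits overlap the first part, its lower 8 bits are the excess.
    cpa1 = first_part + (second_part >> 8)  # first adder (CPA1)
    cpa2 = second_part & 0xFF               # second adder (CPA2) handles the excess

    # Recombine the two adder outputs with their relative weights.
    return (cpa1 << 8) | cpa2


if __name__ == "__main__":
    print(mul_8x16(200, 40000) == 200 * 40000)
```

Because an 8bit×xbit operand with x = 9 to 15 is simply a 16-bit value with its upper bits clear, the same model covers those cases without change.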

After the target sum is obtained, in order to obtain the required dot product result, as shown in FIG. 1, the method further includes the following step:

Step S300, truncating the target sum, and using the data obtained after the truncation as the dot product result of the input data.

Specifically, in this embodiment, after the target sum is obtained, it needs to be truncated with different bit widths, so that a dot product result consistent with the mode signal and the input data is finally obtained.

In an implementation manner, the step S300 specifically includes the following steps:

Step S310, determining the truncation bit width according to the precision of the input data;

Step S320, performing a truncation operation on the target sum starting from the 0th bit according to the truncation bit width, and using the data obtained after the truncation operation as the dot product result of the input data.

In this embodiment, the truncation bit width is related to the precision of the input data. Specifically, for input data of the same precision, the truncation bit width runs from the 0th bit to the (8-n)th bit, where n is the precision of the input data; for example, for 3bit×3bit input data, the truncation bit width runs from the 0th bit to the 5th bit. For input data of different precisions, the truncation bit width runs from the 0th bit to the (16-x)th bit, where x is the highest bit number of the input data and takes a value from 9 to 15; for example, for 8bit×12bit input data, the truncation bit width runs from the 0th bit to the 4th bit. After the truncation bit width is determined, the truncation operation is performed on the target sum starting from the 0th bit according to the truncation bit width, and the data obtained after the truncation operation is used as the dot product result of the input data.
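The two index rules quoted above can be written down directly. The sketch below encodes them literally, exactly as stated in the text, and makes no attempt to interpret how the bit positions relate to the adder outputs in the figures; the function names `truncation_msb` and `truncate` are hypothetical, as is the choice to model truncation as a bit mask on an integer.

```python
def truncation_msb(bits_a, bits_b):
    """Index of the highest bit kept, following the ranges quoted in the text."""
    if bits_a == bits_b:
        n = bits_a
        if not 1 <= n <= 8:
            raise ValueError("same-precision rule is stated for an 8-bit multiplier")
        return 8 - n                      # bits 0 .. (8 - n)
    x = max(bits_a, bits_b)
    if not 9 <= x <= 15:
        raise ValueError("mixed-precision rule is stated for x = 9..15")
    return 16 - x                         # bits 0 .. (16 - x)


def truncate(target_sum, bits_a, bits_b):
    """Keep bits 0..msb of the target sum (assumed to be a non-negative int)."""
    msb = truncation_msb(bits_a, bits_b)
    return target_sum & ((1 << (msb + 1)) - 1)


if __name__ == "__main__":
    print(truncation_msb(3, 3), truncation_msb(8, 12))
```

The two calls in the usage line correspond to the 3bit×3bit and 8bit×12bit examples given in the paragraph above.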

In one implementation, this embodiment not only supports dot product operations of different precisions, but also supports both signed and unsigned operations. Therefore, the method further comprises the following steps:

Step S1, determining the partial product generating part corresponding to the highest bit of the input data, and using it as the partial product generating part to be adjusted;

Step S2, when the highest bit of the input data indicates a negative number, performing invert-and-add-one processing on the partial product generating part to be adjusted.

Specifically, to support signed operations, this embodiment first determines the partial product generating part related to the sign bit. In practical applications, a signed fixed-point multiplier operates on two's complement inputs, where the complement of a positive number is the number itself, and the complement of a negative number is obtained by inverting the signed binary code (including the sign bit) bit by bit and then adding one. In this embodiment, the partial product generating part corresponding to the most significant bit of the input data is used as the partial product generating part to be adjusted. When the most significant bit of the input data indicates a negative number, invert-and-add-one processing is performed on the partial product generating part to be adjusted, thereby realizing the signed operation.

For example, FIG. 16 shows a schematic diagram of the generation of the partial product generating parts of an 8bit×8bit multiplier. The first 7 partial product generating parts PP₀-PP₆ are generated in the usual manner, while the generation of the last partial product generating part (PP₇) requires special processing: when the sign bit B7 is 0, the number is positive and PP₇ is 0; when the sign bit B7 is 1, the number is negative and PP₇ is the result of inverting A7A6A5A4A3A2A1A0 and adding one. Similarly, in the 2bit×2bit operation, PP₁, PP₃, PP₅ and PP₇ need to be processed, and in the 4bit×4bit and 4bit×8bit operations, PP₃ and PP₇ need to be processed: when the sign bit is 0, the partial product generating part takes 0, and when the sign bit is 1, the partial product generating part is obtained by inversion and adding one. It should be noted, however, that the 8bit×16bit operation needs fewer such operations: the generation of PP₇ in the second multiplier, which performs the low-order operation, does not require this processing, and only the generation of PP₇ in the first multiplier does. In addition, because the calculation uses complement representation, when the bit width of the data needs to be extended on the left in an addition operation, the added bits must be the same as the highest bit of the original data so that the value is unchanged. Likewise, as shown in FIGS. 4 and 5, in the 2bit×2bit and 4bit×4bit operations, when the unused data positions on the left side of the two figures are input to the adder tree, the input value must also be the same as the highest bit of the actual valid data, rather than simply being filled with 0.
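The sign handling for the 8bit×8bit case can be checked with a short behavioural model. It is a sketch under stated assumptions rather than the disclosed circuit: both operands are two's complement values in the range -128 to 127, the multiplicand is sign-extended to a 16-bit working width (an assumed width), rows PP₀ to PP₆ are ordinary shifted copies, and PP₇ is the inverted-plus-one multiplicand when B7 is 1. The names `to_signed` and `signed_mul_8x8` are hypothetical.

```python
WIDTH = 16                 # assumed working width for an 8bit x 8bit product
MASK = (1 << WIDTH) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def signed_mul_8x8(a, b):
    """Signed product from partial products; a and b lie in [-128, 127]."""
    assert -128 <= a <= 127 and -128 <= b <= 127

    a_ext = a & MASK       # sign-extended multiplicand: left bits copy the sign bit
    b_bits = b & 0xFF      # two's complement bit pattern B7..B0

    total = 0
    for i in range(7):     # PP0..PP6: ordinary rows
        if (b_bits >> i) & 1:
            total += a_ext << i

    if (b_bits >> 7) & 1:  # B7 = 1: PP7 is the multiplicand inverted plus one
        pp7 = (~a_ext + 1) & MASK
        total += pp7 << 7
    # B7 = 0: PP7 is 0, nothing to add

    return to_signed(total, WIDTH)


if __name__ == "__main__":
    print(signed_mul_8x8(-3, -5), signed_mul_8x8(127, -128))
```

Replacing the sign extension in `a_ext` with zero filling makes the model return wrong results for negative multiplicands, which is the reason the text requires the left-hand padding to match the highest bit of the valid data.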

Based on the above embodiment, the present invention also provides a fixed-point multiply-add operation unit suitable for mixed-precision neural network. As shown in FIG. 17 , the operation unit includes:

A position determination module 01, for acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into the multiplier from the data input position;

A partial product processing module 02, for processing the partial product generated by the multiplier according to the mode signal, performing a summation operation, and using the data obtained after the summation operation as a target sum;

A result generation module 03, for truncating the target sum, and using the data obtained after the truncation as the dot product result of the input data.

Specifically, in this embodiment, a unified multiplier is used for the operation, but the number of multipliers is not fixed: the number of multipliers called by the operation unit can change adaptively according to the precision of the input data. When the highest bit of the input data is less than or equal to the highest bit of the multiplier, the operation unit needs to call only one multiplier to operate on the input data. When the highest bit of the input data is greater than the highest bit of the multiplier, the operation unit needs to call more than one multiplier. For example, when the multiplier in the operation unit is a conventional 8bit×8bit multiplier and 3bit×3bit or 4bit×8bit input data is obtained, the operation unit needs to call only one multiplier; it then controls the multiplier according to the mode signal to mask the partial products of the specified area and output the partial product generating parts, and performs a summation operation on the output partial product generating parts using the method corresponding to the precision. When 8bit×16bit input data is obtained, the operation unit needs to call two multipliers; it controls the two multipliers according to the mode signal to mask the partial products of the specified area and output the partial product generating parts, and performs a summation operation on the output partial product generating parts using the method corresponding to the precision.
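The choice of how many multipliers to call can be summarised in a few lines. This is a minimal sketch assuming 8bit×8bit multipliers and operands of at most 16 bits, which are the only cases the text describes; the function name `multipliers_needed` and its parameters are hypothetical.

```python
def multipliers_needed(bits_a, bits_b, multiplier_bits=8):
    """Number of multipliers called for one bits_a x bits_b product."""
    highest = max(bits_a, bits_b)
    if highest < 1 or highest > 2 * multiplier_bits:
        raise ValueError("only precisions up to twice the multiplier width are modelled")
    # One multiplier when the operands fit, two when the wider operand must be split.
    return 1 if highest <= multiplier_bits else 2


if __name__ == "__main__":
    for shape in [(3, 3), (4, 8), (8, 12), (8, 16)]:
        print(shape, multipliers_needed(*shape))
```

Precisions beyond twice the multiplier width are rejected rather than extrapolated, because the text only states that more than one multiplier is needed and gives no rule for such cases.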

To sum up, the present invention discloses a fixed-point multiply-add operation unit and method suitable for a mixed-precision neural network. Input data of different precisions is input into the multiplier from different positions, the multiplier is controlled according to the mode signal to mask the partial products of the specified area and output the partial product generating parts, and the output partial product generating parts are summed using the method corresponding to the precision, so as to realize the mixed-precision dot product operation. In the present invention, a single kind of multiplier can be used to realize the dot product operation of a mixed-precision neural network, which solves the problems in the prior art, such as excessive hardware overhead and redundant idle resources, caused by using a variety of processing units with different precisions to process mixed-precision operations.

It should be understood that the application of the present invention is not limited to the above-mentioned examples, and for those of ordinary skill in the art, improvements or transformations can be made according to the above-mentioned descriptions, and all these improvements and transformations should belong to the protection scope of the appended claims of the present invention.

Claims

A fixed-point multiply-add operation method suitable for mixed-precision neural networks, characterized in that the method comprises:

acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into a multiplier from the data input position;

processing the partial product generated by the multiplier according to the mode signal, performing a summation operation, and using the data obtained after the summation operation as a target sum;

truncating the target sum, and using the data obtained after the truncation as the dot product result of the input data.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 1, characterized in that said acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into a multiplier from the data input position comprises:

acquiring the mode signal and the input data, and determining the number of multipliers called according to the precision of the input data;

when the highest precision of the input data is higher than the highest bit of the multiplier, the number of called multipliers is greater than 1;

determining the data input position according to the mode signal, splitting the data with the highest precision in the input data, and inputting the input data obtained after the splitting into the multipliers from the data input position;

when the highest precision of the input data is lower than or equal to the highest bit of the multiplier, the number of called multipliers is 1;

determining the data input position according to the mode signal, and inputting the input data into the multiplier from the data input position.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 2, characterized in that said processing the partial product generated by the multiplier according to the mode signal, performing a summation operation, and using the data obtained after the summation operation as a target sum comprises:

acquiring the mode signal, and processing the partial product generated by the multiplier according to the mode signal;

splitting the partial product generation part obtained after the processing into a first partial product generation part and a second partial product generation part;

performing a summation operation on the first partial product generation part and the second partial product generation part, and using the data obtained after the summation operation as the target sum.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 3, wherein the mode signal is determined by the precision of the input data; the processing includes at least one of the following operations:

performing masking processing on the partial product of the preset area generated by the multiplier;

when the number of called multipliers is greater than 1, performing shift processing on the partial product generation part output by the multiplier that performs the low-order operation.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 3, characterized in that, when the input data is of the same precision and the highest bit of the input data is less than or equal to the highest bit of the multiplier, performing a summation operation on the first partial product generation part and the second partial product generation part, and obtaining a target sum based on the summation operation, comprises:

inputting the first partial product generating part and the second partial product generating part into the first-stage compressor a and the first-stage compressor b respectively;

inputting the output results of the first-stage compressor a and the first-stage compressor b together into the second-stage compressor c;

The output result of the second stage compressor c is input into an adder, and the output result of the adder is used as a target sum.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 3, characterized in that, when the input data is of mixed precision, performing a summation operation on the first partial product generation part and the second partial product generation part, and obtaining the target sum based on the summation operation, comprises:

obtaining the highest bit number of the input data, and comparing the highest bit number of the input data with the highest bit number of the multiplier;

when the highest bit number of the input data is equal to the highest bit number of the multiplier, inputting the first partial product generation part and the second partial product generation part into the first-stage compressor a and the first-stage compressor b respectively;

inputting the output results of the first-stage compressor a and the first-stage compressor b into the first adder and the second adder respectively, and using the sum of the output results of the first adder and the second adder as the target sum.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 3, characterized in that, when the input data is of mixed precision, performing a summation operation on the first partial product generation part and the second partial product generation part, and obtaining the target sum based on the summation operation, comprises:

obtaining the highest bit number of the input data, and comparing the highest bit number of the input data with the highest bit number of the multiplier;

when the highest bit number of the input data is greater than the highest bit number of the multiplier, the multiplier includes a first multiplier and a second multiplier, and the second multiplier is the multiplier that performs the low-order operation; the first multiplier outputs the first partial product generation part, and the second multiplier outputs the second partial product generation part;

inputting the first partial product generation part directly into the first adder;

splitting the second partial product generation part and inputting the resulting parts into the first adder and the second adder respectively;

using the sum of the output results of the first adder and the second adder as the target sum.
A fixed-point multiply-add operation method suitable for mixed-precision neural networks according to claim 1, characterized in that said truncating the target sum, and using the data obtained after the truncation as the dot product result of the input data comprises:

determining the truncation bit width according to the precision of the input data;

performing a truncation operation on the target sum starting from the 0th bit according to the truncation bit width, and using the data obtained after the truncation operation as the dot product result of the input data.
A fixed-point multiply-add operation method suitable for mixed-precision neural network according to claim 1, characterized in that, the method further comprises:

determining the partial product generating part corresponding to the highest bit of the input data, and using the partial product generating part as the partial product generating part to be adjusted;

when the highest bit of the input data indicates a negative number, performing invert-and-add-one processing on the partial product generating part to be adjusted.
A fixed-point multiply-add operation unit suitable for mixed-precision neural network, characterized in that, the operation unit includes:

a position determination module for acquiring a mode signal and input data, determining a data input position according to the mode signal, and inputting the input data into a multiplier from the data input position;

a partial product processing module, configured to process the partial product generated by the multiplier according to the mode signal, and perform a summation operation, using the data obtained after the summation operation as a target sum;

a result generation module, configured to truncate the target sum, and use the data obtained after the truncation as the dot product result of the input data.