CN111258544B

CN111258544B - Multiplier, data processing method, chip and electronic equipment

Info

Publication number: CN111258544B
Application number: CN201811450728.9A
Authority: CN
Inventors: Inventor not disclosed
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2022-10-04
Anticipated expiration: 2038-11-30
Also published as: CN111258544A

Abstract

The application provides a multiplier, a data processing method, a chip and an electronic device. The multiplier comprises a judgment circuit, a data expansion circuit, a coding circuit and a compression circuit, wherein the output end of the judgment circuit is connected with the input end of the data expansion circuit and with the first input end of the coding circuit, the output end of the data expansion circuit is connected with the second input end of the coding circuit, and the output end of the coding circuit is connected with the input end of the compression circuit. The multiplier can expand received low-bit-width data so that the expanded data meets the bit-width requirement of the multiplier for processing data, while the final multiplication result is still the result of multiplying the original-bit-width data. This ensures that the multiplier can process low-bit-width data and effectively reduces the area of the AI chip occupied by the multiplier.

Description

Multiplier, data processing method, chip and electronic equipment

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a multiplier, a data processing method, a chip, and an electronic device.

Background

With the continuous development of digital electronic technology, the rapid development of various Artificial Intelligence (AI) chips places higher and higher requirements on high-performance digital multipliers. Neural network algorithms are among the algorithms most widely used by intelligent chips, and multiplication performed by a multiplier is a common operation in them.

In general, when data with different bit widths are multiplied, an existing multiplier with the corresponding number of bits is needed for the operation. However, a conventional multiplier capable of processing data with a high bit width cannot be used to multiply data with a low bit width, so that the multipliers occupy a large area of the AI chip.

Disclosure of Invention

In view of the above, it is desirable to provide a multiplier, a data processing method, a chip and an electronic device.

An embodiment of the present invention provides a multiplier, where the multiplier includes: a judgment circuit, a data expansion circuit, a coding circuit and a compression circuit; the output end of the judgment circuit is connected with the input end of the data expansion circuit, the output end of the judgment circuit is connected with the first input end of the coding circuit, the output end of the data expansion circuit is connected with the second input end of the coding circuit, and the output end of the coding circuit is connected with the input end of the compression circuit;

the judgment circuit is used for judging whether the received data needs to be processed through a data expansion circuit connected with the output end of the judgment circuit, the data expansion circuit is used for carrying out expansion processing on the received data, the coding circuit is used for carrying out coding processing on the received data to obtain a partial product of a target code, and the compression circuit is used for carrying out accumulation processing on the partial product of the target code.

In one embodiment, the encoding circuit comprises a third input terminal for receiving an input function selection mode signal; the compression circuit comprises a first input end for receiving an input function selection mode signal.

In one embodiment, the determining circuit includes: a data input port and a data output port; the data input port is used for receiving data for multiplication operation, and the data output port is used for outputting the received data.

In one embodiment, the data expansion circuit includes: a data input port, a data expansion mode selection signal input port, a function selection mode signal output port and an expanded data output port; the data input port is used for receiving the data output by the judgment circuit, the data expansion mode selection signal input port is used for receiving a data expansion mode selection signal that determines how the received data is expanded, the function selection mode signal output port is used for outputting a function selection mode signal determined according to the mode in which the data expansion circuit expands the received data, and the expanded data output port is used for outputting the data after the expansion processing.

In one embodiment, the encoding circuit includes: a Booth encoding sub-circuit and a partial product obtaining sub-circuit, wherein the output end of the Booth encoding sub-circuit is connected with the first input end of the partial product obtaining sub-circuit;

the Booth coding sub-circuit is used for carrying out Booth coding on the received data to obtain a coded signal, and the partial product obtaining sub-circuit is used for obtaining a partial product of a target code according to the coded signal.

In one embodiment, the Booth encoding sub-circuit comprises: a data input port and a coding signal output port; the data input port is used for receiving the data to be Booth encoded, and the coding signal output port is used for outputting the coding signal obtained by Booth encoding the received data.

In one embodiment, the partial product obtaining sub-circuit comprises: an encoding signal input port, a data input port and a partial product output port, wherein the encoding signal input port is used for receiving the encoding signal, the data input port is used for receiving the data, and the partial product output port is used for outputting a partial product of a target code obtained according to the encoding signal and the received data.

In one embodiment, the compression circuit comprises: a Wallace tree group sub-circuit and an accumulation sub-circuit; the output end of the Wallace tree group sub-circuit is connected with the input end of the accumulation sub-circuit; the Wallace tree group sub-circuit is configured to accumulate the partial products of the target code, and the accumulation sub-circuit is configured to accumulate the received input data.

In one embodiment, the Wallace tree group sub-circuit includes: a Wallace tree unit, which is used to accumulate each column of the partial products of the target code.

In one embodiment, the accumulation sub-circuit comprises: an adder, which is used for adding two received data with the same bit width.

In one embodiment, the adder comprises: a carry signal input port, a sum signal input port and an operation result output port; the carry signal input port is used for receiving a carry signal, the sum signal input port is used for receiving a sum signal, and the operation result output port is used for outputting the result of accumulating the carry signal and the sum signal.
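The compression path described above (Wallace tree units followed by an adder that combines a carry signal and a sum signal) can be illustrated with a small behavioural model. The sketch below is not the circuit of this application: it assumes the Wallace tree units are built from 3:2 compressors (full adders) and applies them to whole words rather than column by column, which is arithmetically equivalent. The names `compress_3_2` and `wallace_accumulate` are hypothetical.

```python
def compress_3_2(a, b, c, width):
    """Bitwise full adder on three words: returns (sum_word, carry_word)."""
    mask = (1 << width) - 1
    sum_word = (a ^ b ^ c) & mask
    # The carry out of each column moves one column to the left.
    carry_word = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_word, carry_word


def wallace_accumulate(partial_products, width):
    """Reduce the partial products to a sum word and a carry word, then add them."""
    mask = (1 << width) - 1
    rows = [p & mask for p in partial_products]
    if not rows:
        return 0
    while len(rows) > 2:
        next_rows = []
        # Compress rows three at a time; leftover rows pass through unchanged.
        for i in range(0, len(rows) - 2, 3):
            s, c = compress_3_2(rows[i], rows[i + 1], rows[i + 2], width)
            next_rows += [s, c]
        next_rows += rows[len(rows) - len(rows) % 3:]
        rows = next_rows
    sum_word = rows[0]
    carry_word = rows[1] if len(rows) > 1 else 0
    # Final adder: combines the sum and carry signals of equal bit width.
    return (sum_word + carry_word) & mask
```

Only the last step performs a carry-propagating addition; every earlier stage keeps the running total as two words, which is the usual motivation for placing a compressor tree in front of a single adder.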

According to the multiplier provided by the embodiment, the multiplier can be used for expanding received low-bit-width data, the expanded data meets the bit-width requirement of the multiplier for data processing, and the final multiplication result is still the result of multiplication of the original bit-width data, so that the multiplier can be used for processing the low-bit-width data, and the area of an AI chip occupied by the multiplier is effectively reduced.

The embodiment of the invention provides a data processing method, which comprises the following steps:

receiving data to be processed;

judging whether the bit width of the data to be processed is equal to the bit width of the data which can be processed by the multiplier or not;

if they are not equal, performing data expansion processing on the data to be processed to obtain expanded data;

coding the expanded data to obtain a partial product after sign bit expansion;

and accumulating the partial products after the sign bit is expanded to obtain an operation result.

In one embodiment, after determining whether the bit width of the data to be processed is equal to the bit width of the data processable by the multiplier, the method further includes: if they are equal, coding the data to be processed to obtain a partial product after sign bit expansion.

In one embodiment, the encoding the extended data to obtain a sign-bit-extended partial product includes:

performing Booth coding processing on the expanded data to obtain a coded signal;

and obtaining the partial product after the sign bit is expanded according to the data to be processed and the coding signal.

In one embodiment, the obtaining the sign-bit-extended partial product according to the data to be processed and the encoded signal includes:

obtaining an original partial product according to the data to be processed and the coded signal;

and sign bit expansion processing is carried out on the original partial product to obtain the partial product after sign bit expansion.
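The two steps above (form an original partial product, then extend its sign bit) can be sketched in software. The model below assumes standard radix-4 Booth recoding with digits in {-2, -1, 0, 1, 2} and assumes each partial product is extended to twice the operand width; neither detail is fixed by the steps listed here. The function names are hypothetical.

```python
def booth_digits(y, n):
    """Radix-4 Booth digits of an n-bit two's-complement value (n even)."""
    padded = (y & ((1 << n) - 1)) << 1   # append a 0 below the lowest bit
    digits = []
    for i in range(0, n, 2):
        b0 = (padded >> i) & 1
        b1 = (padded >> (i + 1)) & 1
        b2 = (padded >> (i + 2)) & 1
        digits.append(b0 + b1 - 2 * b2)
    return digits


def sign_extended_partial_products(x, y, n):
    """Return the partial products as 2n-bit words, shifted into position."""
    mask = (1 << (2 * n)) - 1
    products = []
    for i, digit in enumerate(booth_digits(y, n)):
        original = digit * x             # original partial product (signed)
        # Masking a negative Python int yields its two's-complement pattern,
        # i.e. the sign bit replicated up to bit 2n-1.
        products.append((original << (2 * i)) & mask)
    return products


def to_signed(pattern, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    if (pattern >> (width - 1)) & 1:
        return pattern - (1 << width)
    return pattern


def booth_multiply(x, y, n):
    """Accumulate the sign-extended partial products modulo 2**(2n)."""
    mask = (1 << (2 * n)) - 1
    total = sum(sign_extended_partial_products(x, y, n)) & mask
    return to_signed(total, 2 * n)
```

Because every partial product is already a full-width two's-complement word, plain modular addition of the rows gives the signed product without any further correction.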

In one embodiment, the performing data expansion processing on the data to be processed to obtain expanded data includes: and performing data expansion processing on the data to be processed through 0 or the sign bit value of the data to be processed to obtain expanded data.

In one embodiment, the bit width of the expanded data is equal to the bit width of the data currently processed by the multiplier.
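A brief sketch may help show why expanding with 0 or with the sign bit value leaves the product unchanged. It assumes that sign-bit filling is used for signed data and zero filling for unsigned data; the method above only says that either value may be used. The `signed` flag and all function names are hypothetical, and the multiplication itself is a stand-in for the encoding and accumulation steps.

```python
def expand(pattern, from_width, to_width, signed=True):
    """Extend a from_width-bit pattern to to_width bits."""
    pattern &= (1 << from_width) - 1
    if signed and (pattern >> (from_width - 1)) & 1:
        # Fill the new high bits with the sign bit value (1).
        pattern |= ((1 << to_width) - 1) & ~((1 << from_width) - 1)
    return pattern                       # otherwise the new high bits are 0


def to_signed(pattern, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    if (pattern >> (width - 1)) & 1:
        return pattern - (1 << width)
    return pattern


def multiply_low_width(a, b, data_width, multiplier_width, signed=True):
    """Judge the bit width, expand if needed, then multiply."""
    if data_width != multiplier_width:   # judgment step
        a = expand(a, data_width, multiplier_width, signed)
        b = expand(b, data_width, multiplier_width, signed)
    else:
        a &= (1 << multiplier_width) - 1
        b &= (1 << multiplier_width) - 1
    if signed:
        a = to_signed(a, multiplier_width)
        b = to_signed(b, multiplier_width)
    return a * b                         # stand-in for encoding and compression
```

For instance, the 8-bit pattern 0xF9 (the value -7) becomes 0xFFF9 after sign-bit expansion to 16 bits, which is still -7, so the wider multiplier returns the same product the narrow data would have produced.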

In one embodiment, the accumulating the partial products after sign bit extension to obtain an operation result includes:

accumulating the partial products after the sign bit is expanded through a Wallace tree group sub-circuit to obtain a first operation result;

and performing accumulation processing on the first operation result through an accumulation sub-circuit to obtain an operation result.

According to the data processing method provided by the embodiment, the received low-bit-width data can be expanded, the expanded data meets the bit width requirement of the data which can be processed by the multiplier, and the final multiplication result is still the result of the multiplication of the original bit-width data, so that the operation of the multiplier for processing the low-bit-width data is ensured, and the area of an AI chip occupied by the multiplier is effectively reduced.

The embodiment of the invention provides a machine learning arithmetic device, which comprises one or more multipliers described in the first aspect; the machine learning arithmetic device is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning arithmetic and transmitting an execution result to other processing devices through an I/O interface;

when the machine learning arithmetic device comprises a plurality of multipliers, the multipliers can be linked through a specific structure and transmit data;

the multipliers are interconnected through a PCIE bus and transmit data so as to support larger-scale machine learning operation; a plurality of multipliers share the same control system or own respective control systems; a plurality of multipliers share a memory or own respective memories; the interconnection mode of a plurality of multipliers is any interconnection topology.

The combined processing device provided by the embodiment of the invention comprises the machine learning processing device, a universal interconnection interface and other processing devices; the machine learning arithmetic device interacts with the other processing devices to jointly complete the operation designated by the user; the combined processing device may further include a storage device, which is connected to the machine learning arithmetic device and the other processing device, respectively, and is configured to store data of the machine learning arithmetic device and the other processing device.

The neural network chip provided by the embodiment of the invention comprises the multiplier, the machine learning arithmetic device or the combined processing device.

The neural network chip packaging structure provided by the embodiment of the invention comprises the neural network chip.

The board card provided by the embodiment of the invention comprises the neural network chip packaging structure.

The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.

An embodiment of the present invention provides a chip, including at least one multiplier as described in any one of the above.

The electronic equipment provided by the embodiment of the invention comprises the chip.

Drawings

FIG. 1 is a schematic structural diagram of a multiplier according to an embodiment;

FIG. 2 is a schematic diagram of another multiplier according to another embodiment;

FIG. 3 is a circuit diagram of an embodiment of a multiplier;

FIG. 4 is a schematic diagram illustrating a distribution rule of partial products obtained by 16-bit data multiplication according to an embodiment;

FIG. 5 is a circuit diagram of another embodiment of a multiplier;

FIG. 6 is a specific circuit diagram of the compression circuit for 8-bit data operation according to another embodiment;

FIG. 7 is a flowchart illustrating a data processing method according to an embodiment;

FIG. 8 is a flowchart illustrating a method for obtaining an encoded signal according to an embodiment;

FIG. 9 is a flowchart illustrating a method for obtaining a partial product of a target code according to an embodiment;

FIG. 10 is a flowchart illustrating a method for obtaining an operation result according to an embodiment;

FIG. 11 is a flowchart illustrating a specific method for obtaining operation results according to an embodiment;

FIG. 12 is a flow diagram illustrating another exemplary data processing method according to an embodiment;

FIG. 13 is a flowchart illustrating a method for obtaining a partial product after sign bit expansion according to another embodiment;

FIG. 14 is a flowchart illustrating a specific method for obtaining a partial product after sign bit extension according to another embodiment;

FIG. 15 is a block diagram of a combined processing device according to an embodiment;

FIG. 16 is a block diagram of another integrated processing device according to an embodiment;

FIG. 17 is a schematic structural diagram of a board card according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The multiplier provided by the application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip or other hardware circuit devices for multiplication processing, and the specific structural schematic diagram of the multiplier is shown in fig. 1 and 2.

Fig. 1 is a structural diagram of a multiplier according to an embodiment. As shown in fig. 1, the multiplier includes: a correction encoding circuit 11 and a correction compression circuit 12; the output end of the correction encoding circuit 11 is connected with the input end of the correction compression circuit 12; the correction encoding circuit 11 is configured to perform encoding processing on the received data to obtain a partial product after sign bit extension, and to obtain a partial product of a target code according to the partial product after sign bit extension, and the correction compression circuit 12 is configured to perform accumulation processing on the partial product of the target code.

Specifically, the correction encoding circuit 11 may include a plurality of data processing units having different functions, and the data received by the correction encoding circuit 11 may be used as a multiplier in a multiplication operation or may be used as a multiplicand in a multiplication operation. Optionally, the data may be fixed point numbers. Optionally, the modified encoding circuit 11 may receive data with a plurality of different bit widths, that is, the multiplier provided in this embodiment may process multiplication operations of data with a plurality of different bit widths. However, in the same multiplication, the multiplier and the multiplicand received by the correction encoding circuit 11 may be data having the same bit width, that is, the multiplier and the multiplicand have the same bit width. For example, the multiplier provided in this embodiment may process a multiplication operation of 8 bits by 8 bits data, a multiplication operation of 16 bits by 16 bits, a multiplication operation of 32 bits by 32 bits data, and a multiplication operation of 64 bits by 64 bits data, which is not limited in this embodiment.

Optionally, the correction encoding circuit 11 may perform binary encoding on the received data, which is equivalent to performing binary encoding on the received multiplier, and obtain a partial product after sign bit expansion according to the received multiplicand, where the bit width of the partial product after sign bit expansion may be equal to 2 times the bit width of the data currently processed by the multiplier. Illustratively, suppose the correction encoding circuit 11 receives data with a bit width of 16 bits. If the multiplier is currently processing multiplication of 8-bit data, the correction encoding circuit 11 needs to divide the 16-bit data into two groups, the high 8 bits and the low 8 bits, and the bit width of the partial product after sign bit expansion may be equal to 2 times the bit width of the data currently processed by the multiplier, that is, 16 bits. If the multiplier is currently processing multiplication of 16-bit data, the correction encoding circuit 11 needs to operate on the 16-bit data as a whole, and the bit width of the partial product after sign bit expansion may be equal to 2 times the bit width of the data currently processed by the multiplier, that is, 32 bits.

Optionally, the modified encoding circuit 11 includes a first input end for receiving an input function selection mode signal; the modified compression circuit 12 includes a first input terminal for receiving the input function selection mode signal. Optionally, the function selection mode signal is used to determine a data bit width processed by the multiplier.

It should be noted that the function selection mode signal may be various, and different function selection mode signals correspond to multiplication operations of the multiplier that can currently process data with different bit widths. Alternatively, the function selection mode signals received by the correction encoding circuit 11 and the correction compressing circuit 12 may be equal in the same multiplication.

For example, suppose the correction encoding circuit 11 and the correction compression circuit 12 can receive a plurality of function selection mode signals, for example the three signals mode=00, mode=01 and mode=10. Then mode=00 may indicate that the multiplier can process 16-bit data, mode=01 may indicate that the multiplier can process 32-bit data, and mode=10 may indicate that the multiplier can process 64-bit data. Alternatively, mode=00 may indicate that the multiplier can process 64-bit data, mode=01 may indicate that the multiplier can process 16-bit data, and mode=10 may indicate that the multiplier can process 32-bit data.

In the multiplier provided by this embodiment, the sign bit extended partial product is obtained by encoding the received data through the correction encoding circuit, the target encoded partial product is obtained according to the sign bit extended partial product, and the target encoded partial product is accumulated through the correction compression circuit to obtain the multiplication result.

Fig. 2 is a structural diagram of a multiplier according to another embodiment. As shown in fig. 2, the multiplier includes: a judgment circuit 11, a data expansion circuit 12, an encoding circuit 13, and a compression circuit 14; the output end of the judging circuit 11 is connected with the input end of the data expanding circuit 12, the output end of the judging circuit 11 is connected with the first input end of the coding circuit 13, the output end of the data expanding circuit 12 is connected with the second input end of the coding circuit 13, and the output end of the coding circuit 13 is connected with the input end of the compressing circuit 14. The judging circuit 11 is configured to judge whether the received data needs to be processed by a data expansion circuit 12 connected to an output end of the judging circuit 11, the data expansion circuit 12 is configured to perform expansion processing on the received data, the encoding circuit 13 is configured to perform encoding processing on the received data to obtain a partial product of a target code, and the compressing circuit 14 is configured to perform accumulation processing on the partial product of the target code.

Specifically, the judging circuit 11 may be a circuit that compares the bit width of the received data with the bit width of the data processable by the multiplier, which is 2N. Optionally, the encoding circuit 13 may include a plurality of data processing units with different functions, and the data received by the encoding circuit 13 may be used as a multiplier in a multiplication operation, and may also be used as a multiplicand in a multiplication operation. The data received by the encoding circuit 13 may be two data output by the judgment circuit 11, or may be data obtained by performing expansion processing on the two received data by the data expansion circuit 12. Optionally, the data processing unit with different functions may be a data processing unit with a binary encoding function. Optionally, the multiplier and the multiplicand may be multi-bit wide floating point numbers. Optionally, the compression circuit 14 may perform accumulation processing on the partial product of the target code obtained by the encoding circuit 13 to obtain a multiplication result.

It should be noted that the multiplier may perform multiplication on data with a fixed 2N-bit width, and it is also understood that the encoding circuit and the compression circuit in the multiplier may perform multiplication on data with a 2N-bit width. However, in the same multiplication, the multiplier and the multiplicand received by the encoding circuit 13 are data having the same bit width. For example, the multiplier provided in this embodiment may process an 8bit by 8bit data multiplication operation, a 16 bit by 16 bit data multiplication operation, a 32 bit by 32 bit data multiplication operation, and a 64bit by 64bit data multiplication operation, which is not limited in this embodiment. Optionally, there may be one input port of the data processing unit with different functions, the function of each input port of each data processing unit may be the same, there may also be one output port, the function of each output port of each data processing unit may be different, and the circuit structures of the data processing units with different functions may be different.

Optionally, the encoding circuit 13 includes a third input end, configured to receive an input function selection mode signal; the compression circuit 14 includes a first input terminal for receiving an input function selection mode signal.

In the multiplier provided by this embodiment, the judging circuit determines whether the received data needs to be processed by the data expansion circuit that follows it. If not, the judging circuit inputs the received data directly to the encoding circuit, which encodes it to obtain the partial product of the target code. Otherwise, the received data is input to the data expansion circuit for expansion, and the data expansion circuit inputs the expanded data to the encoding circuit, which encodes it to obtain the partial product of the target code. The compression circuit then performs accumulation processing on the partial product of the target code to obtain the final operation result.

Fig. 3 is a schematic structural diagram of a multiplier according to another embodiment, where the multiplier includes the correction coding circuit 11, and the correction coding circuit 11 includes: a low-order Booth encoding unit 111, a low-order partial product acquisition unit 112, a selector 113, a high-order Booth encoding unit 114, a high-order partial product acquisition unit 115, a low-order selector group unit 116, and a high-order selector group unit 117. A first output terminal of the low-order Booth encoding unit 111 is connected to an input terminal of the selector 113, a second output terminal of the low-order Booth encoding unit 111 is connected to a first input terminal of the low-order partial product acquisition unit 112, an output terminal of the selector 113 is connected to a first input terminal of the high-order Booth encoding unit 114, an output terminal of the high-order Booth encoding unit 114 is connected to a first input terminal of the high-order partial product acquisition unit 115, an output terminal of the low-order selector group unit 116 is connected to a second input terminal of the low-order partial product acquisition unit 112, and an output terminal of the high-order selector group unit 117 is connected to a second input terminal of the high-order partial product acquisition unit 115.
The low-order Booth encoding unit 111 is configured to perform Booth encoding on the low-order data in the received data to obtain a low-order encoded signal, the low-order partial product acquisition unit 112 is configured to obtain a low-order partial product of a target code according to the low-order encoded signal, the selector 113 is configured to gate the complement value used when Booth encoding the high-order data, the high-order Booth encoding unit 114 is configured to perform Booth encoding on the received high-order data and the complement value to obtain a high-order encoded signal, the high-order partial product acquisition unit 115 is configured to obtain a high-order partial product of the target code according to the high-order encoded signal, the low-order selector group unit 116 is configured to gate the values in the low-order partial product of the target code, and the high-order selector group unit 117 is configured to gate the values in the high-order partial product of the target code.

Specifically, the correction encoding circuit 11 may receive a multiplier and a multiplicand in the multiplication, perform Booth encoding on the multiplier to obtain an encoded signal, and obtain a partial product of a target code from the encoded signal and the received multiplicand. Before the low-order data is Booth encoded, the low-order Booth encoding unit 111 may automatically perform a bit complementing process on the low-order data in the data received by the correction encoding circuit 11, and then Booth encode the bit-complemented low-order data to obtain a low-order encoded signal, where the data may be a multiplier in a multiplication operation. Optionally, if the bit width of the multiplier received by the correction encoding circuit 11 is N, the low-order data may be the low N/2 bits, and the bit complementing process appends a bit of value 0 below the lowest bit of the low-order data. Illustratively, if the multiplier can currently handle 8-bit by 8-bit fixed point multiplication and the multiplier is "y₇y₆y₅y₄y₃y₂y₁y₀", then before performing the Booth encoding process, the low-order Booth encoding unit 111 may automatically perform the bit complementing process on the multiplier, converting it into the bit-complemented data "y₇y₆y₅y₄y₃y₂y₁y₀0". Optionally, the number of low-order encoded signals may be equal to 1/2 of the bit width of the low-order data, and may be equal to the number of sign-bit-extended partial products corresponding to the low-order data.
It should be noted that, regardless of whether the bit width of the data currently processed by the multiplier is the same as the bit width of the data received by the multiplier, the low-order Booth encoding unit 111 needs to perform the bit complementing process on the low-order data when performing Booth encoding.

Meanwhile, the high-order Booth encoding unit 114 may perform Booth encoding on the high-order data in the multiplier received by the correction encoding circuit 11 to obtain a high-order encoded signal. Before the high-order data is Booth encoded, however, the selector 113 needs to obtain a gated value, which is used as the complement value when Booth encoding the high-order data. The high-order data is then combined with the complement value to obtain the bit-complemented high-order data, and the high-order Booth encoding unit 114 performs Booth encoding on the bit-complemented high-order data to obtain the high-order encoded signal. Optionally, the selector 113 may be a two-way selector, and the gated value may be 0, or may be the highest bit value of the low-order data in the multiplier. Illustratively, a multiplier may process multiplication of data with bit widths of N bits and 2N bits, where the bit width of the data received by the correction encoding circuit 11 is 2N bits. If the multiplier is currently processing data with a bit width of N bits, the value gated by the selector 113 is 0, that is, the multiplier needs to divide the received 2N-bit data into the high N bits and the low N bits and process them separately. If the multiplier is currently processing data with a bit width of 2N bits, the value gated by the selector 113 is the highest bit value of the low-order data, which corresponds to the multiplier Booth encoding the received 2N-bit data as a whole. In addition, the selector 113 may also determine the gated complement value according to the different function selection mode signals it receives.
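The effect of the selector can be checked with a short behavioural sketch. It assumes standard radix-4 Booth recoding (digits in {-2, -1, 0, 1, 2}) and models only the choice of the complement value for the high-order unit. The names `encode_halves` and `whole_mode` are hypothetical and do not correspond to signals named in this application.

```python
def booth_digits(padded, count):
    """Recode `count` overlapping 3-bit windows of an already padded word."""
    digits = []
    for i in range(count):
        group = (padded >> (2 * i)) & 0b111
        digits.append((group & 1) + ((group >> 1) & 1) - 2 * (group >> 2))
    return digits


def encode_halves(data, n, whole_mode):
    """Booth encode the low and high n-bit halves of a 2n-bit word."""
    mask = (1 << n) - 1
    low, high = data & mask, (data >> n) & mask
    low_padded = low << 1                # low-order unit: complement value 0
    # Selector: 0 in N-bit mode, top bit of the low-order data in 2N-bit mode.
    gate = (low >> (n - 1)) & 1 if whole_mode else 0
    high_padded = (high << 1) | gate
    return booth_digits(low_padded, n // 2), booth_digits(high_padded, n // 2)


def digits_value(digits):
    """Value represented by a list of radix-4 digits, lowest digit first."""
    return sum(d * 4 ** i for i, d in enumerate(digits))
```

With the 16-bit word 0x85F3 and n = 8, gating 0 makes the two digit lists represent the independent 8-bit values -13 (0xF3) and -123 (0x85). Gating the top bit of the low half instead makes the concatenated digits represent the single 16-bit value -31245.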

It should be noted that the low-order partial product obtaining unit 112 may obtain the low-order partial product of the target code from the sign-bit-extended partial product corresponding to the low-order data, which is obtained according to each low-order encoded signal, and from the values in the low-order partial product of the target code gated by the low-order selector group unit 116. Optionally, the high-order partial product obtaining unit 115 may obtain the high-order partial product of the target code from the sign-bit-extended partial product corresponding to the high-order data, which is obtained according to each high-order encoded signal, and from the values in the high-order partial product of the target code gated by the high-order selector group unit 117. Optionally, in the Booth encoding process, the number of low-order encoded signals obtained may be equal to the number of high-order encoded signals obtained, and may also be equal to the number of sign-bit-extended partial products corresponding to the low-order data, or to the number of sign-bit-extended partial products corresponding to the high-order data. Optionally, the correction encoding circuit 11 may include N/4 low-order Booth encoding units 111 and N/4 high-order Booth encoding units 114. Optionally, the correction encoding circuit 11 may include N/4 low-order partial product obtaining units 112, and may further include N/4 high-order partial product obtaining units 115. Optionally, each low-order partial product obtaining unit 112 and each high-order partial product obtaining unit 115 may include 2N value generating sub-units, and each value generating sub-unit may obtain one value of the partial product after sign bit extension. Here, N may represent the bit width of the data received by the multiplier.

In the multiplier provided by this embodiment, the low-order booth encoding unit, the selector, and the high-order booth encoding unit in the modified encoding circuit perform booth encoding processing on received data to obtain low-order and high-order encoded signals, and the low-order partial product obtaining unit and the high-order partial product obtaining unit obtain a partial product of a target code according to the low-order and high-order encoded signals, and then accumulate the partial product of the target code to obtain a multiplication result.

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 3, the multiplier includes the low-order booth encoding unit 111, and the low-order booth encoding unit 111 includes: a lower data input port 1111 and a lower encoded signal output port 1112. The low-order data input port 1111 is configured to receive low-order data subjected to booth encoding processing, and the low-order encoded signal output port 1112 is configured to output a low-order encoded signal obtained by performing booth encoding processing on the low-order data.

Specifically, in the multiplication, the correction coding circuit 11 in the multiplier needs to perform booth coding processing on the multiplier, and each low-order booth coding unit 111 in the correction coding circuit 11 may receive three bit values of the low-order data of the multiplier through the low-order data input port 1111. These three bit values, which may be three adjacent bit values of the low-order data, form one group of data to be coded. Each low-order booth coding unit 111 processes the received data to be coded and outputs the resulting low-order coded signal through the low-order coded signal output port 1112. In addition, the first low-order booth coding unit 111 in the correction coding circuit 11 may receive the bit-complement value 0 and the lowest two bits of the low-order data through the low-order data input port 1111.

Illustratively, if the multiplier receives 16-bit data "y₁₅y₁₄y₁₃y₁₂y₁₁y₁₀y₉y₈y₇y₆y₅y₄y₃y₂y₁y₀", whose bit values from lowest to highest are numbered 0, …, 15, the low-order booth encoding unit 111 may booth encode the low-order data y₇y₆y₅y₄y₃y₂y₁y₀. Before booth encoding, the 8-bit low-order data is bit-complemented to obtain the 9-bit data y₇y₆y₅y₄y₃y₂y₁y₀0, and the low-order booth encoding units 111 may respectively booth encode the four groups y₇y₆y₅, y₅y₄y₃, y₃y₂y₁ and y₁y₀0 of this 9-bit data. The three adjacent bit values of each of the four groups may be received through the low-order data input port 1111 of a low-order booth encoding unit 111.

It should be noted that, each time booth encoding is performed, the data obtained by bit-complementing the low-order data may be divided into multiple groups of data to be encoded, and the low-order booth encoding units 111 may booth encode these groups simultaneously. Optionally, the groups may be divided such that every three adjacent bit values of the bit-complemented data form one group of data to be encoded, and the highest bit value of each group serves as the lowest bit value of the next group. Optionally, the encoding rules of booth encoding are given in Table 1, where y_{2i+1}, y_{2i} and y_{2i-1} represent the bit values of each group of data to be encoded, X represents the multiplicand received by the correction coding circuit 11, and PP_i (i = 0, 1, 2, ..., n) is the coded signal obtained after booth encoding the corresponding group. Optionally, the coded signals obtained after booth encoding may fall into five classes, namely -2X, 2X, -X, X and 0, as shown in Table 1. Illustratively, if the multiplicand received by the correction coding circuit 11 is "x₇x₆x₅x₄x₃x₂x₁x₀", then X may be represented as "x₇x₆x₅x₄x₃x₂x₁x₀".

TABLE 1

Exemplarily, continuing with the above example, when i = 0, y_{2i+1} = y₁, y_{2i} = y₀ and y_{2i-1} = y_{-1}, where y_{-1} represents the bit-complement value 0 appended after y₀ (that is, the multiplier after bit-complementing is expressed as y₇y₆y₅y₄y₃y₂y₁y₀y_{-1}). In the booth encoding process, the four groups of data to be encoded y₁y₀y_{-1}, y₃y₂y₁, y₅y₄y₃ and y₇y₆y₅ are encoded respectively to obtain 4 low-order coded signals, where the highest bit value of each group serves as the lowest bit value of the next adjacent group.
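The grouping described above can be illustrated with a short software sketch. It assumes that Table 1 follows the standard radix-4 booth rule, in which each group y_{2i+1} y_{2i} y_{2i-1} selects the multiple (-2·y_{2i+1} + y_{2i} + y_{2i-1}) of X; the function names booth_radix4_digits and to_signed are hypothetical and do not appear in the embodiment.

```python
def booth_radix4_digits(y, width=8, pad_bit=0):
    """Radix-4 booth digits (each in {-2, -1, 0, 1, 2}), lowest group first.

    Assumption: the standard radix-4 rule stands in for Table 1.
    """
    # padded[k + 1] holds bit y_k; padded[0] holds the appended bit y_{-1}.
    padded = [pad_bit] + [(y >> k) & 1 for k in range(width)]
    digits = []
    for i in range(width // 2):
        y_prev = padded[2 * i]       # y_{2i-1}
        y_mid = padded[2 * i + 1]    # y_{2i}
        y_top = padded[2 * i + 2]    # y_{2i+1}
        digits.append(-2 * y_top + y_mid + y_prev)
    return digits


def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


# Low-order data 1011 0110 (-74 as a signed 8-bit value) gives four digits,
# one per group y1y0y-1, y3y2y1, y5y4y3, y7y6y5.
print(booth_radix4_digits(0b10110110))  # [-2, 2, -1, -1]
```

A digit d at position i corresponds to the coded signal d·X weighted by 4^i, so the four digits together reproduce the signed value of the 8-bit low-order data.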

In the multiplier provided by this embodiment, the low-order booth coding unit performs booth coding on the low-order data to obtain the low-order coded signal corresponding to the low-order data, the low-order partial product obtaining unit obtains the low-order partial product of the target code according to the low-order coded signal, and the low-order and high-order partial products of the target code are then accumulated to obtain the multiplication result.

In one embodiment, continuing with the specific structural diagram of the multiplier shown in fig. 3, the multiplier includes the low-order partial product obtaining unit 112, which includes: a low-order coded signal input port 1121, a strobe value input port 1122, a data input port 1123, and a partial product value output port 1124. The low-order coded signal input port 1121 is configured to receive the low-order coded signal output by the low-order booth coding unit 111; the strobe value input port 1122 is configured to receive the values in the low-order partial product of the target code that are output after gating by the low-order selector group unit 116; the data input port 1123 is configured to receive the data of the multiplication operation; and the partial product value output port 1124 is configured to output the values in the low-order partial product of the target code.

Specifically, the low-order partial product obtaining unit 112 may receive the low-order coded signal output by the low-order booth coding unit 111 through the low-order coded signal input port 1121, and may receive the multiplicand of the multiplication operation through the data input port 1123. Optionally, the low-order partial product obtaining unit 112 may obtain the sign-bit-extended partial product corresponding to the low-order data according to the received low-order coded signal and the received multiplicand. Optionally, if the bit width of the multiplicand received by the data input port 1123 is N, the bit width of the sign-bit-extended partial product may be equal to 2N. For example, if the low-order partial product obtaining unit 112 receives a multiplicand X with a bit width of N bits, it may directly obtain the corresponding sign-bit-extended partial product from the multiplicand X and the five types of coded signals -2X, 2X, -X, X and 0. The low (N+1) bit values of the sign-bit-extended partial product may be equal to the values of the original partial product, and its high (N-1) bit values may each be equal to the sign bit value of the original partial product, that is, the highest bit value of the original partial product. When the coded signal is -2X, the original partial product may be obtained by shifting X left by one bit, inverting the result, and adding 1; when the coded signal is 2X, by shifting X left by one bit; when the coded signal is -X, by inverting X bit by bit and adding 1; when the coded signal is X, by combining X with its sign bit value, which is used as the highest bit value of the original partial product; and when the coded signal is +0, the original partial product may be 0, that is, each bit value of the 9-bit original partial product is equal to 0.
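As an illustration of the rules above, the following sketch builds the (N+1)-bit original partial product and extends it to 2N bits. The coded signal is modeled as an integer digit in {-2, -1, 0, 1, 2}; that representation and the function names are assumptions made for this example, not the circuit of the embodiment. Negating the (N+1)-bit form of X is likewise an assumption about how the -X case is realized.

```python
def original_partial_product(x, digit, n=8):
    """(n+1)-bit original partial product for an n-bit multiplicand pattern x.

    `digit` is a hypothetical integer stand-in for the coded signal:
    -2, 2, -1, 1 or 0 for -2X, 2X, -X, X and 0.
    Corner case not modeled: -2X for X = -2**(n-1) does not fit in n+1 bits.
    """
    mask = (1 << (n + 1)) - 1
    sign = (x >> (n - 1)) & 1
    x_ext = x | (sign << n)           # X with its sign bit placed on top
    doubled = (x << 1) & mask         # X shifted left by one bit
    if digit == 2:
        return doubled
    if digit == -2:
        return (~doubled + 1) & mask  # invert the shifted value, then add 1
    if digit == 1:
        return x_ext
    if digit == -1:
        return (~x_ext + 1) & mask    # invert, then add 1
    return 0


def sign_extend_partial_product(pp, n=8):
    """Extend an (n+1)-bit partial product to 2n bits.

    The low n+1 bits are kept; the high n-1 bits repeat the sign bit.
    """
    sign = (pp >> n) & 1
    high = ((1 << (n - 1)) - 1) if sign else 0
    return pp | (high << (n + 1))


# X = 1111 1101 (-3) with coded signal -2X gives +6.
pp = original_partial_product(0b11111101, -2)
print(hex(sign_extend_partial_product(pp)))  # 0x6
```

With N = 8 the extended partial product is 16 bits wide: 9 bits of original partial product followed by 7 copies of its sign bit.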

It should be noted that the low-order partial product obtaining unit 112 may receive, through the gated value input port 1122, a corresponding bit value in the partial product after sign bit extension corresponding to the data with different bit widths gated by the low-order selector group unit 116, and obtain the low-order partial product of the target code according to the partial product after sign bit extension corresponding to the low-order data currently obtained by the multiplier and the corresponding bit value after gating.

In the multiplier provided by this embodiment, the low-order partial product obtaining unit obtains the low-order partial product of the target code according to each low-order coded signal, and the low-order and high-order partial products of the target code are then accumulated to obtain the multiplication result.

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 3, the multiplier includes the selector 113, and the selector 113 includes: a function selection mode signal input port 1131 (mode), a first strobe value input port 1132, a second strobe value input port 1133, and an operation result output port 1134; the function selection mode signal input port 1131 is configured to receive a function selection mode signal corresponding to data with different bit widths that needs to be processed by a multiplier, the first strobe value input port 1132 is configured to receive a first strobe value, the second strobe value input port 1133 is configured to receive a second strobe value, and the operation result output port 1134 outputs the first strobe value or the second strobe value after being strobed.

Specifically, the selector 113 may determine, from the function selection mode signal received through the function selection mode signal input port 1131, the data bit width currently processed by the multiplier, and accordingly determine whether the operation result output port 1134 outputs the first strobe value or the second strobe value. Optionally, the first strobe value may be 0 or the highest bit value of the low-order data, and the second strobe value may likewise be 0 or the highest bit value of the low-order data.

For example, suppose that during the multiplication the multiplier and the multiplicand received by the correction coding circuit 11 are both 16-bit data, and that the function selection mode signal input port 1131 (mode) of the selector 113 can receive two different function selection mode signals, mode = 0 and mode = 1, where mode = 0 may indicate that the multiplier processes 8-bit data and mode = 1 may indicate that the multiplier processes 16-bit data. When the mode received by the function selection mode signal input port 1131 (mode) of the selector 113 is 0, the multiplier currently processes 8-bit data operations, and the selector 113 may receive a second strobe value through the second strobe value input port 1133, where the second strobe value may be equal to 0. When the mode received by the function selection mode signal input port 1131 (mode) of the selector 113 is 1, the multiplier currently processes 16-bit data operations, and the selector 113 may receive a first strobe value through the first strobe value input port 1132, where the first strobe value may be equal to the most significant bit value of the low-order data.

It should be noted that, if the multiplier currently processes 8-bit data multiplication, it may multiply the high 8 bits and the low 8 bits of the 16-bit multiplier and the 16-bit multiplicand separately; that is, the multiplication of the high 8-bit multiplier and the high 8-bit multiplicand is handled through the high-order booth encoding unit 114, and the multiplication of the low 8-bit multiplier and the low 8-bit multiplicand is handled through the low-order booth encoding unit 111. In this case the selector 113 may receive the second strobe value 0 through the second strobe value input port 1133, so that the bit-complement value used for the high-order data is equal to 0. If the multiplier currently processes 16-bit data multiplication, it may directly multiply the 16-bit multiplier and the 16-bit multiplicand, that is, the correction coding circuit 11 directly booth codes the 16-bit multiplier; in this case the selector 113 may receive the first strobe value through the first strobe value input port 1132, where the first strobe value is the highest bit value of the low 8-bit data.
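The effect of the selector 113 on booth coding can be sketched in software as follows. The sketch assumes the standard radix-4 booth rule and models the coded signals as integer digits; select_pad_bit, encode_multiplier and digits_value are hypothetical names introduced only for this illustration.

```python
def booth_digits(value, pad_bit, width=8):
    """Radix-4 booth digits of a width-bit value with the given appended bit."""
    padded = [pad_bit] + [(value >> k) & 1 for k in range(width)]
    return [-2 * padded[2 * i + 2] + padded[2 * i + 1] + padded[2 * i]
            for i in range(width // 2)]


def select_pad_bit(mode, low_msb):
    """Two-way selector: 0 when mode = 0 (8-bit), low-data MSB when mode = 1."""
    return low_msb if mode == 1 else 0


def encode_multiplier(y16, mode):
    """Return (low-order digits, high-order digits) for a 16-bit multiplier."""
    low, high = y16 & 0xFF, (y16 >> 8) & 0xFF
    pad_high = select_pad_bit(mode, (low >> 7) & 1)
    # The first low-order group always uses the bit-complement value 0.
    return booth_digits(low, 0), booth_digits(high, pad_high)


def digits_value(digits):
    """Signed value represented by a list of radix-4 digits."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


low_d, high_d = encode_multiplier(0x12F0, mode=0)
print(digits_value(low_d), digits_value(high_d))      # -16 18
low_d, high_d = encode_multiplier(0x12F0, mode=1)
print(digits_value(low_d) + 256 * digits_value(high_d))  # 4848
```

In mode 0 the two digit lists encode the low byte and the high byte as independent signed 8-bit values; in mode 1 the same hardware encodes one signed 16-bit value, because the most significant bit of the low-order data is fed into the first high-order group.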

According to the multiplier provided by the embodiment, the function selection mode signal received by the selector can determine the bit complement value of the high-order data during Booth encoding processing, so that Booth encoding processing is performed on the data after bit complement, the multiplier can perform multiplication operation on data with various bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.

Fig. 3 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the high-bit booth encoding unit 114, and the high-bit booth encoding unit 114 includes: a high-order data input port 1141 and a high-order coded signal output port 1142; the high-order data input port 1141 is configured to receive high-order data subjected to booth coding processing, and the high-order coded signal output port 1142 is configured to output a high-order coded signal obtained by performing booth coding processing on the high-order data.

Specifically, in the multiplication operation, the correction coding circuit 11 in the multiplier needs to perform booth coding processing on the multiplier, and the high-order booth coding unit 114 in the correction coding circuit 11 may receive three-bit values in high-order data in the multiplier through the high-order data input port 1141, where the three-bit values are used as a group of data to be coded, and the three values may be adjacent three-bit values in the high-order data.

Illustratively, continuing with the example of a 16-bit data multiplication operation, the high-order booth encoding units 114 may respectively booth encode the four groups y₇y₆y₅, y₅y₄y₃, y₃y₂y₁ and y₁y₀0 of the 9-bit data y₇y₆y₅y₄y₃y₂y₁y₀0, and the three adjacent bit values of each of the four groups may be received through the high-order data input port 1141 of a high-order booth encoding unit 114.

It should be noted that the principle of the higher booth encoding unit 114 processing the higher data to be encoded at each booth encoding process may be the same as the principle of the lower booth encoding unit 111 processing the lower data to be encoded. The internal circuit configuration of the higher booth encoding unit 114 and the lower booth encoding unit 111 may be the same, and the function of the external output port may be the same.

In the multiplier provided by this embodiment, the high-order booth coding unit performs booth coding on the high-order data to obtain the high-order coded signal corresponding to the high-order data, the high-order partial product obtaining unit obtains the high-order partial product of the target code according to the high-order coded signal, and the high-order and low-order partial products of the target code are then accumulated to obtain the multiplication result.

In one embodiment, continuing with the specific structural diagram of the multiplier shown in fig. 3, the multiplier includes the high-order partial product obtaining unit 115, which includes: a high-order coded signal input port 1151, a strobe value input port 1152, a data input port 1153, and a partial product value output port 1154. The high-order coded signal input port 1151 is configured to receive the high-order coded signal output by the high-order booth coding unit 114; the strobe value input port 1152 is configured to receive the values in the high-order partial product of the target code that are output after gating by the high-order selector bank unit 117; the data input port 1153 is configured to receive the data of the multiplication operation; and the partial product value output port 1154 is configured to output the values in the high-order partial product of the target code.

Specifically, the high-order partial product obtaining unit 115 may receive the high-order coded signal output by the high-order booth coding unit 114 through the high-order coded signal input port 1151, and may receive a multiplicand in the multiplication operation through the data input port 1153. Optionally, the high-order partial product obtaining unit 115 may obtain a partial product after sign bit extension corresponding to the high-order data according to the received high-order coded signal and the received multiplicand in the multiplication operation. Optionally, if the multiplicand bit width received by the data input port 1153 is N, the bit width of the partial product after sign bit extension may be equal to 2N.

It should be noted that the high-order partial product obtaining unit 115 may receive, through the strobe value input port 1152, the corresponding bit values of the sign-bit-extended partial product for the different data bit widths as gated by the high-order selector group unit 117, and may obtain the high-order partial product of the target code according to the sign-bit-extended partial product corresponding to the high-order data currently obtained by the multiplier and the gated bit values.

In the multiplier provided by this embodiment, the high-order partial product obtaining unit obtains the high-order partial product of the target code according to each high-order coded signal, and the high-order and low-order partial products of the target code are then accumulated to obtain the multiplication result.

In one embodiment, continuing with the specific structure diagram of the multiplier shown in fig. 3, the multiplier includes the low selector bank unit 116, and the low selector bank unit 116 includes: a plurality of low selectors 1161, the plurality of low selectors 1161 being configured to gate values in a low bit product of a target code.

Specifically, the number of low selectors 1161 in the low selector bank unit 116 may be equal to 3/8 of the square of the bit width of the data currently received by the multiplier, and the internal circuit structures of the low selectors 1161 in the low selector bank unit 116 may be the same. Optionally, during the multiplication, the low-order partial product obtaining unit 112 corresponding to each low-order booth encoding unit 111 may include 2N value generating subunits, of which N value generating subunits may be connected to N low selectors 1161, one low selector 1161 per value generating subunit, where N represents the bit width of the data currently received by the multiplier. Optionally, the N value generating subunits connected to the N low selectors 1161 may be those corresponding to the high N bit values of the low-order partial product of the target code. The internal circuit structure of the N low selectors 1161 may be identical to that of the selector 113, and each of the N low selectors 1161 has two other external input ports in addition to the function selection mode signal input port (mode). Optionally, if the multiplier can process N data operations with different bit widths and the bit width of the data received by the multiplier is N, the signals received by the two other input ports of a low selector 1161 may respectively be 0 and the sign bit value of the corresponding sign-bit-extended partial product obtained by the low-order booth encoding unit 111 when the multiplier performs an N-bit data operation. The N/4 low-order partial product obtaining units 112 may be connected to N/4 groups of N low selectors 1161; the sign bit values received by different groups may be the same or different, but the N low selectors 1161 within one group receive the same sign bit value, which may be obtained from the sign bit value of the sign-bit-extended partial product obtained by the low-order partial product obtaining unit 112 connected to that group.

In addition, among the 2N value generating subunits included in each low-order partial product obtaining unit 112, N/2 value generating subunits may not be connected to any low selector 1161. The values obtained by these N/2 value generating subunits are then the corresponding bit values of the sign-bit-extended partial product of the low-order data, whichever bit width the multiplier is currently processing; in other words, they may be all the values from the lowest bit to the low (N/2-1)th bit of the corresponding sign-bit-extended partial product.

It should be noted that, among the 2N value generating subunits included in each low-order partial product obtaining unit 112, the remaining N/2 value generating subunits may be connected to N/2 low selectors 1161, one low selector 1161 per value generating subunit. The internal circuit structure of these N/2 low selectors 1161 may be the same as that of the selector 113, and each has two other external input ports in addition to the function selection mode signal input port (mode). The signals received by the two other input ports may respectively be the sign bit value of the sign-bit-extended partial product obtained when the multiplier performs an N/2-bit data operation, and the corresponding bit value of the sign-bit-extended partial product obtained when the multiplier performs an N-bit data operation. The N/4 low-order partial product obtaining units 112 may be connected to N/4 groups of N/2 low selectors 1161; the sign bit values received by different groups may be the same or different, but the N/2 low selectors 1161 within one group receive the same sign bit value, which may be obtained from the sign bit value of the sign-bit-extended partial product obtained by the low-order partial product obtaining unit 112 connected to that group.

In addition, the corresponding bit values of the sign-bit-extended partial product received by each group of N/2 low selectors 1161 may be determined from the corresponding bit values of the sign-bit-extended partial product obtained by the low-order partial product obtaining unit 112 connected to that group, and the corresponding bit values received by the individual low selectors 1161 within a group may be the same or different. The positions of the 2N value generating subunits in each low-order partial product obtaining unit 112 may be shifted left by two value generating subunits relative to those in the previous low-order partial product obtaining unit 112. Optionally, among the low-order partial products of the target code, only the first may have a bit width equal to 2N; each subsequent low-order partial product has two fewer high-order bits than the previous one, and the bit width of the last low-order partial product may be equal to (3N/2 + 2).
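The counts stated above can be cross-checked with a few lines of arithmetic. The sketch below only restates the figures given in this embodiment (N/4 low-order partial product obtaining units, each with N plus N/2 low selectors, and partial products that lose two high-order bits per row); the function names are hypothetical.

```python
def low_selector_count(n):
    """Total low selectors 1161 for received data of bit width n."""
    units = n // 4            # N/4 low-order partial product obtaining units
    per_unit = n + n // 2     # N selectors plus N/2 selectors per unit
    return units * per_unit


def low_partial_product_widths(n):
    """Bit widths of the N/4 low-order partial products, first row first."""
    return [2 * n - 2 * k for k in range(n // 4)]


print(low_selector_count(16))          # 96
print(low_partial_product_widths(16))  # [32, 30, 28, 26]
```

For N = 16 this gives 96 = 3·16²/8 selectors, and the last low-order partial product is 26 = 3·16/2 + 2 bits wide, in agreement with the text.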

In the multiplier provided by this embodiment, the low selector set unit in the multiplier may gate the value in the low-order partial product to obtain the low-order partial product of the target code, and then accumulate the low-order partial product and the high-order partial product of the target code by the correction compression circuit to obtain the multiplication result.

In one embodiment, continuing with the specific structure diagram of the multiplier shown in fig. 3, the multiplier includes the high selector bank unit 117, and the high selector bank unit 117 includes: a plurality of high selectors 1171, the plurality of high selectors 1171 being configured to gate values in a high-order partial product of a target code.

Specifically, the number of high selectors 1171 in the high selector bank unit 117 may be equal to 3/8 of the square of the bit width of the data currently received by the multiplier, and the internal circuit structures of the high selectors 1171 in the high selector bank unit 117 may be the same. Optionally, during the multiplication, the high-order partial product obtaining unit 115 corresponding to each high-order booth encoding unit 114 may include 2N value generating subunits, of which N value generating subunits may be connected to N high selectors 1171, one high selector 1171 per value generating subunit, where N represents the bit width of the data currently received by the multiplier. Optionally, the N value generating subunits connected to the N high selectors 1171 may be those corresponding to the low N bit values of the high-order partial product of the target code. The internal circuit structure of the N high selectors 1171 may be identical to that of the selector 113, and each of the N high selectors 1171 has two other external input ports in addition to the function selection mode signal input port (mode). Optionally, if the multiplier can process N data operations with different bit widths and the bit width of the data received by the multiplier is N, the signals received by the two other input ports of a high selector 1171 may respectively be 0 and the corresponding bit value of the sign-bit-extended partial product obtained by the high-order booth encoding unit 114 when the multiplier performs an N-bit data operation. The N/4 high-order partial product obtaining units 115 may be connected to N/4 groups of N high selectors 1171, and the corresponding bit values received by the N high selectors 1171 of each group may be the same or different.

In addition, in the 2N number of value generation subunits included in each high-order partial product obtaining unit 115, N/2 number of high-order selectors 1171 may be connected to each corresponding N/2 number of value generation subunits, each number of value generation subunit may be connected to 1 number of high-order selectors 1171, the internal circuit structures of the N/2 number of high-order selectors 1171 and the selector 113 may be the same, and an external input port of the N/2 number of high-order selectors 1171 may further include two other input ports in addition to the function selection mode signal input port (mode), signals respectively received by the two other input ports may be subjected to N/2 bit data operation for the multiplier, so as to obtain a sign bit value in the partial product after corresponding sign bit expansion, and the sign bit value in the partial product after corresponding sign bit expansion is obtained by performing N bit data operation for the multiplier. The N/4 high-order partial product obtaining units 115 may be connected to N/4 sets of N/2 high-order selectors 1171, sign bit values received by the N/2 high-order selectors 1171 of each set may be the same or different, but sign bit values received by the N/2 high-order selectors 1171 of the same set are the same, and the sign bit value may be obtained according to each set of N/2 high-order selectors 1171, corresponding to the sign bit value in the partial product obtained by the connected high-order partial product obtaining unit 115 after sign bit expansion. 
In addition, the corresponding bit value in the sign bit expanded partial product received by the N/2 upper selectors 1171 of each group may be determined by the sign bit value in the sign bit expanded partial product obtained by the upper partial product obtaining unit 115 to which the group of upper selectors 1171 is connected, and the corresponding bit value received by each of the N/2 upper selectors 1171 of each group may be the same or different.

It should be noted that, in the 2N number of value generation subunits included in each high-order partial product obtaining unit 115, the remaining N/2 number of value generation subunits may not be connected to the high-order selector 1171, at this time, the value obtained by the N/2 number of value generation subunit may be data with different bit widths currently processed by the multiplier, and a corresponding bit value in a partial product after sign bit expansion obtained by a corresponding high-order data, or it may be understood that the value obtained by the N/2 number of value generation subunit may be all values corresponding to bits from 3N/2-1 bit higher to N +1 bit lower in the partial product after sign bit expansion. The position of the 2N number generation subunit in each high-order partial product obtaining unit 115 may be shifted to the left by two number generation subunits based on the position of the 2N number generation subunit in the last high-order partial product obtaining unit 115. Optionally, only the bit width of the first high-order partial product in the high-order partial products of the target code may be equal to 3N/2, and the remaining high-order partial products have two less high values based on the last high-order partial product.

In the multiplier provided by this embodiment, the high selector set unit in the multiplier can gate the value in the high-order partial product to obtain the high-order partial product of the target code, and then the high-order partial product and the low-order partial product of the target code are accumulated by the modified compression circuit to obtain the multiplication result.

Fig. 3 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the modified compression circuit 12, and the modified compression circuit 12 includes: a modified Wallace tree group circuit 121 and an accumulation circuit 122, wherein the output end of the modified Wallace tree group circuit 121 is connected with the input end of the accumulation circuit 122; the modified wallace tree group circuit 121 is configured to accumulate values in each column of a partial product of a target code obtained when data with different bit widths are calculated, and the accumulation circuit 122 is configured to accumulate received input data.

Specifically, the modified wallace tree group circuit 121 may perform accumulation processing on each column number value in the partial product of the target code obtained by the modified encoding circuit 11, and the accumulation circuit 122 may perform accumulation processing on two operation results obtained by the modified wallace tree group circuit 121 to obtain a final result of multiplication. When the modified wallace tree group circuit 121 performs the accumulation processing, the distribution rule of all partial products of the target code can be characterized in that the position of the lowest bit value of the corresponding partial product of each row is staggered by two bits to the right compared with the position of the lowest bit value of the corresponding partial product of the next row, and the modified wallace tree group circuit 121 performs the accumulation processing on each column number value in all partial products of the target code according to the distribution rule. Optionally, the partial product of the target code may include a lower bit partial product of the target code and an upper bit partial product of the target code. Optionally, the two operation results obtained by the modified wallace tree group circuit 121 may include a sum output signal S and a carry output signal C.
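The two-stage accumulation described above can be sketched with carry-save addition. This is a behavioral illustration, not the circuit of the embodiment: it assumes 3:2 compression, models each partial product as a 2N-bit integer already shifted by two bits per row, and uses hypothetical function names throughout.

```python
def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


def booth_digits(y, width):
    """Standard radix-4 booth digits with appended bit 0 (assumed rule)."""
    padded = [0] + [(y >> k) & 1 for k in range(width)]
    return [-2 * padded[2 * i + 2] + padded[2 * i + 1] + padded[2 * i]
            for i in range(width // 2)]


def carry_save_add(a, b, c, mask):
    """3:2 compression applied to every column at once."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def wallace_reduce(rows, width):
    """Reduce partial product rows to a sum word S and a carry word C."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        nxt = []
        for k in range(0, full, 3):
            nxt.extend(carry_save_add(rows[k], rows[k + 1], rows[k + 2], mask))
        nxt.extend(rows[full:])   # rows that did not fill a group of three
        rows = nxt
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def multiply(x, y, n=8):
    """Signed n-bit by n-bit product as a 2n-bit pattern."""
    width = 2 * n
    mask = (1 << width) - 1
    xs = to_signed(x, n)
    # Row i is staggered two bits to the left of row i - 1.
    rows = [((d * xs) << (2 * i)) & mask
            for i, d in enumerate(booth_digits(y, n))]
    s, c = wallace_reduce(rows, width)
    return (s + c) & mask         # final accumulation of S and C


print(to_signed(multiply(0xFD, 0x07), 16))  # -21, i.e. (-3) * 7
```

The reduction never propagates a carry along a row; only the final addition of S and C does, which corresponds to the role of the accumulation circuit 122.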

For example, if the multiplier currently handles 16-bit by 16-bit fixed-point multiplication, the distribution rule of the 4 low-order partial products and the 4 high-order partial products of the target code obtained by the modified encoding circuit 11 is shown in fig. 4, in which separate markers represent each bit value in the low-order partial products and each bit value in the high-order partial products, and "●" represents the sign-extension bit value of either a low-order partial product or a high-order partial product.

In the multiplier provided by this embodiment, the modified wallace tree group circuit may accumulate the low-order part and the high-order part of the target code, and the accumulation circuit may accumulate the accumulated result again to obtain the final result of the multiplication operation.
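The two-stage accumulation described above (reduction to a sum signal S and a carry signal C, followed by one final addition) can be sketched in Python as follows. This is a behavioural model under stated assumptions rather than the circuit of fig. 3: the function names are hypothetical, each partial product row is held as an integer, and bitwise 3:2 compression is used as a stand-in for the column-wise Wallace tree sub-circuits.

```python
def compress_3_to_2(a, b, c, mask):
    """Apply a full adder to every column of three rows at once (no carry propagation)."""
    sum_bits = (a ^ b ^ c) & mask
    # A carry generated in column i has weight 2**(i+1), hence the shift by one.
    carry_bits = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return sum_bits, carry_bits


def reduce_to_sum_and_carry(rows, width):
    """Reduce any number of rows to two rows, S and C, whose sum equals the row total."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        next_rows = []
        # Compress complete groups of three rows; leftover rows pass through unchanged.
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2], mask))
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


# Eight rows, each staggered two bits to the left of the previous one (hypothetical values).
rows = [0x00AD << (2 * k) for k in range(8)]
s, c = reduce_to_sum_and_carry(rows, 16)
result = (s + c) & 0xFFFF  # the single carry-propagating addition of the second stage
```

In this model only the last line propagates carries across the full width, which corresponds to the role the text assigns to the accumulation circuit 122.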

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 3, the multiplier includes the modified Wallace tree group circuit 121, and the modified Wallace tree group circuit 121 includes: low-order Wallace tree sub-circuits 1211, a selector 1212 and high-order Wallace tree sub-circuits 1213, wherein an output terminal of the low-order Wallace tree sub-circuits 1211 is connected with an input terminal of the selector 1212, and an output terminal of the selector 1212 is connected with an input terminal of the high-order Wallace tree sub-circuits 1213; the plurality of low-order Wallace tree sub-circuits 1211 are configured to accumulate each column of values in the partial products of the target code, the selector 1212 is configured to gate the carry input signal received by the high-order Wallace tree sub-circuits 1213, and the plurality of high-order Wallace tree sub-circuits 1213 are configured to accumulate each column of values in the partial products of the target code.

Specifically, the circuit structures of the plurality of low-order Wallace tree sub-circuits 1211 and the plurality of high-order Wallace tree sub-circuits 1213 may be implemented by a combination of full adders and half adders, or by a combination of 4-2 compressors; each sub-circuit may be understood as a circuit capable of processing a multi-bit input signal and adding the bits of that signal to obtain a two-bit output signal. Optionally, the number of high-order Wallace tree sub-circuits 1213 in the modified Wallace tree group circuit 121 may be equal to the data bit width N currently received by the multiplier, and may also be equal to the number of low-order Wallace tree sub-circuits 1211; the low-order Wallace tree sub-circuits 1211 may be connected in series, and the high-order Wallace tree sub-circuits 1213 may also be connected in series. Optionally, the output terminal of the last low-order Wallace tree sub-circuit 1211 is connected to the input terminal of the selector 1212, and the output terminal of the selector 1212 is connected to the input terminal of the first high-order Wallace tree sub-circuit 1213. Optionally, each low-order Wallace tree sub-circuit 1211 of the modified Wallace tree group circuit 121 may add the values in one column of all partial products of the target code, and may output two signals, namely a carry signal Carry_i and a sum signal Sum_i, where i may represent the number corresponding to each low-order Wallace tree sub-circuit 1211, and the number of the first low-order Wallace tree sub-circuit 1211 is 0. Optionally, the number of input signals received by each low-order Wallace tree sub-circuit 1211 may be equal to the number of encoded signals or the number of partial products of the target code.
The sum of the numbers of the upper Wallace tree sub-circuits 1213 and the lower Wallace tree sub-circuits 1211 in the modified Wallace tree group circuit 121 may be equal to 2N, the total number of columns from the lowest column to the highest column in all partial products of the target code may be equal to 2N, the N lower Wallace tree sub-circuits 1211 may perform the accumulation operation on each of the lower N columns of all partial products of the target code, and the N upper Wallace tree sub-circuits 1213 may perform the accumulation operation on each of the upper N columns of all partial products of the target code.

For example, if the data bit width received by the multiplier is N bits and the multiplier currently performs an N-bit data multiplication, the selector 1212 may gate the carry output signal Cout_N output by the last low-order Wallace tree sub-circuit 1211 in the modified Wallace tree group circuit 121 as the carry input signal Cin_{N+1} received by the first high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121; this can also be understood as the multiplier operating on the received N-bit data as a whole. If the multiplier currently performs an N/2-bit data multiplication, the selector 1212 may gate 0 as the carry input signal Cin_{N+1} received by the first high-order Wallace tree sub-circuit 1213 in the modified Wallace tree group circuit 121; this can also be understood as the multiplier dividing the received N-bit data into upper N/2-bit data and lower N/2-bit data and multiplying them separately. Here, the numbers i corresponding to the first through the last low-order Wallace tree sub-circuits 1211 are 1, 2, …, N, and the numbers i corresponding to the first through the last high-order Wallace tree sub-circuits 1213 are N+1, N+2, …, 2N, respectively.
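The gating performed by the selector 1212 can be illustrated with a deliberately simplified Python model. It is only a sketch under stated assumptions: each half of the column chain is modelled as a plain n-bit adder with a single carry bit (the patent's sub-circuits compress whole partial-product columns and may pass multi-bit carries), and the function and parameter names are hypothetical.

```python
def split_or_full_add(a, b, n, full_width_mode):
    """Add two 2n-bit words using an n-bit low half and an n-bit high half.

    full_width_mode=True : the low half's carry-out feeds the high half,
                           so the two halves act as one 2n-bit datapath.
    full_width_mode=False: the selector forces the high half's carry-in to 0,
                           so the halves act as two independent n-bit datapaths.
    """
    mask = (1 << n) - 1
    low = (a & mask) + (b & mask)
    carry_out_low = low >> n                                  # carry-out of the low half
    carry_in_high = carry_out_low if full_width_mode else 0   # the selector
    high = ((a >> n) & mask) + ((b >> n) & mask) + carry_in_high
    return ((high & mask) << n) | (low & mask)


whole = split_or_full_add(0x01FF, 0x0001, 8, True)    # carry crosses the boundary
halves = split_or_full_add(0x01FF, 0x0001, 8, False)  # carry is blocked at the boundary
```

With the carry blocked, the overflow of the low half never reaches the high half, which is what keeps the two narrower results from corrupting each other.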

It should be noted that, for each of the low-order Wallace tree sub-circuits 1211 and the high-order Wallace tree sub-circuits 1213 in the modified Wallace tree group circuit 121, the signals involved may include a carry input signal Cin_i, a partial product value input signal, and a carry output signal Cout_i. Optionally, the partial product value input signals received by each of the low-order Wallace tree sub-circuits 1211 and the high-order Wallace tree sub-circuits 1213 may be the values of the corresponding column in all partial products of the target code, and the number of bits of the carry output signal Cout_i output by each of the low-order Wallace tree sub-circuits 1211 and the high-order Wallace tree sub-circuits 1213 may be $N_{Cout} = \mathrm{floor}((N_I + N_{Cin})/2) - 1$, where $N_I$ may represent the number of data input bits of the Wallace tree sub-circuit, $N_{Cin}$ may represent the number of carry input bits of the Wallace tree sub-circuit, $N_{Cout}$ may represent the minimum number of carry output bits of the Wallace tree sub-circuit, and floor(·) may represent the round-down (floor) function. Optionally, the carry input signal received by each of the low-order Wallace tree sub-circuits 1211 and the high-order Wallace tree sub-circuits 1213 in the modified Wallace tree group circuit 121 may be the carry output signal output by the preceding low-order Wallace tree sub-circuit 1211 or high-order Wallace tree sub-circuit 1213, and the carry input signal received by the first low-order Wallace tree sub-circuit 1211 is 0. The carry input signal received by the first high-order Wallace tree sub-circuit 1213 may be determined by the bit width of the data currently processed by the multiplier and the bit width of the data received by the multiplier.
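The carry-count relation above can be evaluated column by column with a few lines of Python. This is a sketch with hypothetical function names; it applies the formula exactly as stated (assuming at least two inputs per column) and assumes, as the paragraph describes, that the first column receives no carry input and that each column's carry outputs become the next column's carry inputs.

```python
def min_carry_out_bits(n_inputs, n_carry_in):
    """N_Cout = floor((N_I + N_Cin) / 2) - 1, as given in the description."""
    return (n_inputs + n_carry_in) // 2 - 1


def carry_chain(inputs_per_column):
    """Carry-out bit counts for a chain of columns, lowest column first."""
    carry_in = 0                      # the first column receives no carry input
    carry_outs = []
    for n_inputs in inputs_per_column:
        carry_out = min_carry_out_bits(n_inputs, carry_in)
        carry_outs.append(carry_out)
        carry_in = carry_out          # passed on to the next column
    return carry_outs


four_two = min_carry_out_bits(4, 1)   # 4 data inputs plus 1 carry input
chain = carry_chain([8, 8, 8, 8])     # four columns with 8 partial-product bits each
```

For 4 data inputs and 1 carry input the relation yields one carry output, which agrees with the port count of a 4-2 compressor (one carry-out in addition to its sum and carry outputs).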

According to the multiplier provided by the embodiment, the partial product of the target code can be accumulated by the modified Wallace tree group circuit to obtain two paths of output signals, and the two paths of output signals are accumulated again by the accumulation circuit to obtain a multiplication result.

Fig. 3 is a schematic diagram of a specific structure of a multiplier according to another embodiment, in which the multiplier includes the accumulation circuit 122, and the accumulation circuit 122 includes: an adder 1221, where the adder 1221 is configured to add two received data of the same bit width.

Specifically, the adder 1221 may be a carry adder with different bit widths. Optionally, the adder 1221 may receive the two paths of signals output by the modified wallace tree group circuit 121, perform addition operation on the two paths of output signals, and output a multiplication result. Alternatively, the adder 1221 may be a carry look ahead adder.

According to the multiplier provided by the embodiment, the two paths of signals output by the modified Wallace tree group circuit can be accumulated through the accumulation circuit, and a multiplication result is output.

In one embodiment, continuing with the specific structural diagram of the multiplier shown in fig. 3, the multiplier includes the adder 1221, and the adder 1221 includes: a carry signal input port 1221a, a sum bit signal input port 1221b, and an operation result output port 1221c; the carry signal input port 1221a is configured to receive a carry signal, the sum bit signal input port 1221b is configured to receive a sum bit signal, and the operation result output port 1221c is configured to output a result of performing accumulation processing on the carry signal and the sum bit signal.

Specifically, the adder 1221 may receive the carry signal Carry output by the modified Wallace tree group circuit 121 through the carry signal input port 1221a, receive the sum bit signal Sum output by the modified Wallace tree group circuit 121 through the sum bit signal input port 1221b, add the carry signal Carry and the sum bit signal Sum, and output the result through the operation result output port 1221c.

It should be noted that, during multiplication, the multiplier may use adders 1221 with different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the modified Wallace tree group circuit 121, where the bit width of the data that the adder 1221 can process may be equal to 2 times the bit width M of the data currently processed by the multiplier. Optionally, each of the low-order Wallace tree sub-circuits 1211 and the high-order Wallace tree sub-circuits 1213 in the modified Wallace tree group circuit 121 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 1, …, 2M, where i is the number corresponding to each low-order or high-order Wallace tree sub-circuit, starting from 1). Optionally, the adder 1221 receives Carry = {[Carry_1 : Carry_{2M-1}], 0}; that is, the bit width of the carry output signal Carry received by the adder 1221 is 2M, the first 2M-1 bit values in Carry correspond to the carry output signals of the first 2M-1 low-order and high-order Wallace tree sub-circuits in the modified Wallace tree group circuit 121, and the last bit value in Carry may be replaced by 0. Optionally, the sum bit output signal Sum received by the adder 1221 has a bit width of 2M, and the values in Sum may be equal to the sum bit output signals of the low-order and high-order Wallace tree sub-circuits in the modified Wallace tree group circuit 121.

For example, if the multiplier is currently processing 8-bit by 8-bit fixed-point multiplication, the adder 1221 may be a 16-bit carry adder. As shown in fig. 11, the modified Wallace tree group circuit 121 may output the sum bit output signals Sum and the carry output signals Carry of the 16 low-order and high-order Wallace tree sub-circuits; however, the sum bit signal received by the 16-bit carry adder may be the complete sum bit signal Sum output by the modified Wallace tree group circuit 121, whereas the received carry signal may be the carry output signals of the modified Wallace tree group circuit 121, excluding the carry output signal of the last high-order Wallace tree sub-circuit 1213, combined with 0.
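The way the adder's two operands are assembled can be sketched at bit level in Python. The sketch rests on assumptions that the text does not spell out: columns are listed lowest first, each column is modelled as a single full adder over three hypothetical input words, and the 0 that is "combined" with the carries is taken to occupy the lowest bit position, since a carry produced in column i has the weight of column i+1. All function names are hypothetical.

```python
def column_outputs(a, b, c, width):
    """Per-column sum and carry bits of three words (one full adder per column)."""
    sum_bits, carry_bits = [], []
    for i in range(width):
        x, y, z = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        sum_bits.append(x ^ y ^ z)
        carry_bits.append((x & y) | (x & z) | (y & z))
    return sum_bits, carry_bits


def assemble_and_add(sum_bits, carry_bits):
    """Build the Sum and Carry words from column outputs and add them."""
    width = len(sum_bits)
    sum_word = sum(bit << i for i, bit in enumerate(sum_bits))
    # Carry word: a 0 in the lowest position, then the carries of all columns
    # except the last one, whose carry falls outside the result width.
    shifted = [0] + carry_bits[:-1]
    carry_word = sum(bit << i for i, bit in enumerate(shifted))
    return (sum_word + carry_word) & ((1 << width) - 1)


a, b, c = 0x00F3, 0x0A5C, 0x1234          # hypothetical 16-bit rows
sums, carries = column_outputs(a, b, c, 16)
total = assemble_and_add(sums, carries)
```

Dropping the last column's carry loses nothing in this model, because that carry would land one position above the 16-bit result.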

According to the multiplier provided by the embodiment, the accumulation circuit can accumulate two paths of signals output by the modified Wallace tree group circuit and output a multiplication result, the process can multiply data with different bit widths, and the area of an AI chip occupied by the multiplier is effectively reduced.

Fig. 5 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the determining circuit 11, and the determining circuit 11 includes: a data input port 111 and a data output port 112; the data input port 111 is used for receiving data to be multiplied, and the data output port 112 is used for outputting the received data.

Specifically, the judgment circuit 11 receives two data to be multiplied through the data input port 111. Optionally, the data received by the determining circuit 11 may be a multiplier and a multiplicand in a multiplication operation, and bit widths of the multiplier and the multiplicand may be the same. Alternatively, the judgment circuit 11 may output the received two data through the data output port 112 and input the two data to the data expansion circuit 12 at the same time, or input the two data to the encoding circuit 13 at the same time.

It should be noted that, if the determining circuit 11 determines that the bit width of the two received data is N and is smaller than the bit width 2N of the data that can be processed by the multiplier, at this time, the determining circuit 11 needs to input the two received data with the bit width of N bits to the data expanding circuit 12 for expansion processing, so as to obtain two data with the bit width of 2N bits; if the determining circuit 11 determines that the bit width of the two received data is 2N and is equal to the bit width 2N of the data that can be processed by the multiplier, at this time, the determining circuit 11 may directly input the two received data with a bit width of 2N bits to the encoding circuit 13 for encoding.

In the multiplier provided by this embodiment, the determining circuit determines whether the received data needs to be processed by a next data expansion circuit, if the received data does not need to be processed by the next data expansion circuit, the determining circuit directly inputs the received data to the encoding circuit for encoding to obtain the partial product of the target code, otherwise, the received data is input to the data expansion circuit for expansion, the data expansion circuit inputs the expanded data to the encoding circuit for encoding to obtain the partial product of the target code, and the compression circuit performs accumulation on the partial product of the target code to obtain a final operation result.

Fig. 5 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the data expansion circuit 12, and the data expansion circuit 12 includes: a data input port 121, a data expansion mode selection signal input port 122, a function selection mode signal output port 123, and an expanded data output port 124; the data input port 121 is configured to receive the data output by the determining circuit 11, the data expansion mode selection signal input port 122 is configured to receive a data expansion mode selection signal corresponding to performing expansion processing on the received data, the function selection mode signal output port 123 is configured to output a function selection mode signal determined according to a mode in which the data expansion circuit 12 performs expansion processing on the received data, and the expanded data output port 124 is configured to output data after expansion processing.

Specifically, the data expansion mode selection signal received by the data expansion mode selection signal input port 122 may take one of three values: 00, 01, and 10. The signal 00 indicates that the data expansion circuit 12 may expand the received N-bit data into 2N-bit data in which the high N bits are equal to the received N-bit data and the low N bits are all equal to the expansion value 0; in this case, the function selection mode signal output port 123 may output the function selection mode signal 00, and the high 2N bits of the 4N-bit-wide operation result obtained by the multiplier may be the final operation result. The signal 01 indicates that the data expansion circuit 12 may expand the received N-bit data into 2N-bit data in which the low N bits are equal to the received N-bit data and the high N bits are all equal to the expansion value 0; in this case, the function selection mode signal output port 123 may output the function selection mode signal 00, and the low 2N bits of the 4N-bit-wide operation result obtained by the multiplier may be the final operation result. The signal 10 indicates that the data expansion circuit 12 may expand the received N-bit data into 2N-bit data in which the low N bits are equal to the received N-bit data and the high N bits are all equal to the sign bit value of the data received by the data expansion circuit 12; in this case, the function selection mode signal output port 123 may output the function selection mode signal 01, and the low 2N bits of the 4N-bit-wide operation result obtained by the multiplier may be the final operation result.
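The three expansion modes and the matching choice of result bits can be checked with a short Python model. It is a sketch under assumptions: the function names are hypothetical, operands are treated as raw bit patterns multiplied modulo 2^(4N), and the function selection mode signal is not modelled.

```python
def expand(value, n, mode):
    """Expand an n-bit pattern to 2n bits according to the selection signal."""
    value &= (1 << n) - 1
    if mode == "00":                  # data in the high n bits, zeros below
        return value << n
    if mode == "01":                  # data in the low n bits, zeros above
        return value
    if mode == "10":                  # data in the low n bits, sign bit copied above
        sign = (value >> (n - 1)) & 1
        high = ((1 << n) - 1) if sign else 0
        return (high << n) | value
    raise ValueError("unknown data expansion mode")


def multiply_expanded(a, b, n, mode):
    """Multiply two n-bit patterns on a 2n-bit datapath and pick the 2n-bit result."""
    wide = 2 * n
    product = (expand(a, n, mode) * expand(b, n, mode)) & ((1 << (2 * wide)) - 1)
    if mode == "00":
        return product >> wide            # high 2n bits of the 4n-bit result
    return product & ((1 << wide) - 1)    # low 2n bits of the 4n-bit result


unsigned_high = multiply_expanded(200, 3, 8, "00")
unsigned_low = multiply_expanded(200, 3, 8, "01")
signed_low = multiply_expanded(0xF6, 3, 8, "10")   # 0xF6 is -10 as an 8-bit value
```

In mode 00 both operands carry a factor of 2^N, so the product carries 2^(2N) and the wanted value sits in the high half; in modes 01 and 10 the operands keep their numeric value, so the wanted value sits in the low half.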

It should be noted that, if the bit width of the two data received by the multiplier is 2N and is equal to the bit width 2N of the data that can be processed by the multiplier, the determining circuit 11 may directly input the two received data into the encoding circuit 13 for booth encoding; if the bit width of the two data received by the multiplier is N, which is smaller than the bit width 2N of the data that can be processed by the multiplier, and the data expansion mode selection signal received by the data expansion circuit 12 is 10, the judgment circuit 11 may input the two received data to the data expansion circuit 12 for expansion processing, and input the expanded data to the encoding circuit 13 for booth encoding processing.

In the multiplier provided by this embodiment, the data expansion circuit may perform expansion processing on received data, input the expanded data to the encoding circuit to perform encoding processing to obtain a partial product of a target code, and perform accumulation processing on the partial product of the target code by using the compression circuit to obtain a final operation result.

Fig. 5 is a schematic structural diagram of a multiplier according to another embodiment, where the multiplier includes the encoding circuit 13, and the encoding circuit 13 includes: a booth coding sub-circuit 131 and a partial product acquisition sub-circuit 132, an output of the booth coding sub-circuit 131 being connected to a first input of the partial product acquisition sub-circuit 132. The booth coding sub-circuit 131 is configured to perform booth coding on the received data to obtain a coded signal, and the partial product obtaining sub-circuit 132 is configured to obtain a partial product of the target code according to the coded signal.

Specifically, the data received by the Booth encoding sub-circuit 131 may be input by the judgment circuit 11 or by the data expansion circuit 12; the received data may be the multiplier in the multiplication operation, and Booth encoding may be performed on the multiplier to obtain encoded signals. Before the Booth encoding, the Booth encoding sub-circuit 131 may automatically perform bit padding on the received multiplier, where the bit padding may consist of appending one bit value 0 after the lowest bit of the data. Illustratively, if 8-bit by 8-bit fixed-point multiplication is currently being processed and the multiplier operand is y₇y₆y₅y₄y₃y₂y₁y₀, then before Booth encoding the Booth encoding sub-circuit 131 may automatically pad the operand to obtain y₇y₆y₅y₄y₃y₂y₁y₀0. Optionally, the number of encoded signals may be equal to 1/2 of the bit width of the data currently processed by the multiplier, the number of encoded signals may be equal to the number of original partial products, and the partial product obtaining sub-circuit 132 may obtain the corresponding sign-extended partial product according to each encoded signal.
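The padding and grouping steps can be made concrete with a small Python sketch. It assumes standard radix-4 Booth recoding (overlapping groups of three bits taken two positions apart), which matches the stated count of one encoded signal per two operand bits; the function name is hypothetical and the operand width is assumed to be even.

```python
def booth_digits(y, width):
    """Radix-4 Booth digits of a width-bit two's-complement operand, lowest group first."""
    padded = (y & ((1 << width) - 1)) << 1      # append a 0 after the lowest bit
    digits = []
    for i in range(0, width, 2):
        low = (padded >> i) & 1
        mid = (padded >> (i + 1)) & 1
        high = (padded >> (i + 2)) & 1
        digits.append(low + mid - 2 * high)     # one of -2, -1, 0, 1, 2
    return digits


digits = booth_digits(0b10110110, 8)            # 0b10110110 is -74 as an 8-bit value
value = sum(d * 4 ** k for k, d in enumerate(digits))
```

Weighting the digits by powers of four reproduces the signed value of the operand, which is why one partial product per digit is sufficient.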

According to the multiplier provided by the embodiment, the coding circuit can be used for coding the received data to obtain the partial product of the target code, the compression circuit is used for accumulating the partial product of the target code to obtain the final operation result, the process can be used for expanding the received low-bit-width data, and the expanded data meets the bit-width requirement of the multiplier for processing the data, so that the final multiplication result is still the result of multiplying the original bit-width data, the multiplier can be used for processing the low-bit-width data, and the area of an AI chip occupied by the multiplier is effectively reduced.

Fig. 5 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the booth encoding sub-circuit 131, and the booth encoding sub-circuit 131 includes: a data input port 1311 and an encoded signal output port 1312; the data input port 1311 is configured to receive data subjected to booth encoding processing, and the encoded signal output port 1312 is configured to output an encoded signal obtained by performing booth encoding processing on the received data.

Specifically, if the data input port 1311 receives a piece of data, the booth coding sub-circuit 131 may automatically perform bit padding on the piece of data to obtain a piece of data having a bit width that is greater than the bit width of the original data by one bit, and at the same time, the booth coding sub-circuit 131 may perform booth coding on the piece of data after bit padding to obtain a plurality of coded signals, and output the plurality of coded signals through the coded signal output port 1312. Optionally, the booth encoding sub-circuit 131 may receive a multiplier in the multiplication operation through the data input port 1311, and the booth encoding sub-circuit 131 may perform booth encoding processing on the multiplier.

In the multiplier provided by this embodiment, the booth coding sub-circuit may perform booth coding on received data to obtain coded signals, then the partial product obtaining sub-circuit may obtain a corresponding partial product of a target code according to each coded signal, and may perform accumulation processing on the partial product of the target code through the compression circuit to obtain a multiplication result, the multiplier may perform expansion processing on received low-bit-width data, the expanded data satisfies a bit-width requirement of the multiplier for being able to process the data, so that a final multiplication result is still a result of multiplication on the original bit-width data, thereby ensuring that the multiplier can process operation on the low-bit-width data, and effectively reducing an area of an AI chip occupied by the multiplier.

Fig. 5 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the partial product obtaining sub-circuit 132, and the partial product obtaining sub-circuit 132 includes: an encoded signal input port 1321, a data input port 1322, and a partial product output port 1323; the code signal input port 1321 is configured to receive the code signal, the data input port 1322 is configured to receive the data, and the partial product output port 1323 is configured to output a partial product of a target code obtained from the code signal and the received data.

Specifically, as can be seen from table 1, the partial product obtaining sub-circuit 132 may receive, through the encoded signal input port 1321, five different types of encoded signals output by the Booth encoding sub-circuit 131, namely -2X, 2X, -X, X, and 0, and may obtain the partial product of the corresponding target code according to the received encoded signals. Optionally, the data input port 1322 may receive data in the multiplication operation, which may be the multiplicand. Optionally, the partial product obtaining sub-circuit 132 may obtain a corresponding original partial product according to the encoded signal, and perform sign bit extension on the original partial product to obtain a sign-extended partial product. The bit width of the sign-extended partial product may be equal to 2 times the bit width 2N of the data currently processed by the multiplier, the bit width of the original partial product may be equal to 2N+1, and the upper 2N-1 bits of the sign-extended partial product may all be equal to the sign bit value of the original partial product. Optionally, the partial product of the target code may be the sign-extended partial product, and the original partial product may be the partial product without sign bit extension.

Optionally, in the distribution rule of all partial products of the target code acquired by the partial product obtaining sub-circuit 132, starting from the partial product of the second target code, the partial product of each target code may be shifted two bits to the left relative to the partial product of the previous target code, and, from the partial product of the second target code onward, the two high-order bit values shifted out by this offset are not accumulated.
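The generation and staggering of the partial products can be sketched in Python as follows. The sketch assumes standard radix-4 Booth recoding and models sign extension and the discarding of shifted-out bits by masking every row to twice the operand width; the function names are hypothetical, and summing the rows with ordinary integer addition stands in for the compression circuit.

```python
def booth_digits(y, width):
    """Radix-4 Booth digits of a width-bit two's-complement operand, lowest group first."""
    padded = (y & ((1 << width) - 1)) << 1
    return [((padded >> i) & 1) + ((padded >> (i + 1)) & 1) - 2 * ((padded >> (i + 2)) & 1)
            for i in range(0, width, 2)]


def target_partial_products(x, y, width):
    """Sign-extended partial products, each staggered two bits left of the previous one."""
    out_mask = (1 << (2 * width)) - 1
    # Interpret the multiplicand pattern as a signed two's-complement value.
    x_signed = x - (1 << width) if x & (1 << (width - 1)) else x
    rows = []
    for k, digit in enumerate(booth_digits(y, width)):
        original = digit * x_signed            # one of -2X, -X, 0, X, 2X
        # Masking a negative Python int yields its sign-extended bit pattern and
        # also drops the bits shifted above the 2*width-bit result.
        rows.append((original << (2 * k)) & out_mask)
    return rows


rows = target_partial_products(0x9C, 0xB6, 8)   # -100 times -74 as 8-bit values
product = sum(rows) & 0xFFFF
```

Because every row is reduced modulo 2^(2·width), the bits that the left shift pushes above the result width never take part in the sum, yet the total still equals the signed product modulo 2^(2·width).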

In the multiplier provided by this embodiment, the partial product obtaining sub-circuit can obtain the corresponding partial product of the target code according to each code signal, and the compression circuit can accumulate the partial products of the target code to obtain the multiplication result, the multiplier can perform expansion processing on the received low-bit-width data, and the expanded data meets the bit-width requirement for processing the data of the multiplier, so that the final multiplication result is still the result of performing multiplication on the original bit-width data, thereby ensuring that the multiplier can process the operation on the low-bit-width data, and effectively reducing the area of the AI chip occupied by the multiplier.

Fig. 5 is a schematic diagram of a specific structure of a multiplier according to another embodiment, where the multiplier includes the compression circuit 14, and the compression circuit 14 includes: a wallace tree group sub-circuit 141 and an accumulation sub-circuit 142; wherein, the output terminal of the wallace tree group sub-circuit 141 is connected with the input terminal of the accumulation sub-circuit 142; the wallace tree group sub-circuit 141 is configured to perform an accumulation process on the partial product of the target code, and the accumulation sub-circuit 142 is configured to perform an accumulation process on the received input data.

Specifically, the wallace tree group sub-circuit 141 may accumulate the values in all partial products of the target code obtained by the encoding circuit 13, and accumulate two output results obtained by the wallace tree group sub-circuit 141 through the accumulation sub-circuit 142 to obtain the final result of the multiplication.

According to the multiplier provided by the embodiment, the Wallace tree group sub-circuit can accumulate partial products of target codes, and the accumulation sub-circuit accumulates the accumulated results again to obtain the final result of multiplication, the multiplier can expand the received low-bit-width data, and the expanded data meets the bit-width requirement of the multiplier for processing the data, so that the final result of multiplication is still the result of multiplication of the original bit-width data, the multiplier can process the low-bit-width data, and the area of an AI chip occupied by the multiplier is effectively reduced.

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 5, the multiplier includes the Wallace tree group sub-circuit 141, and the Wallace tree group sub-circuit 141 includes: a plurality of Wallace tree units 1411 to 141n, which are configured to perform accumulation processing on each column of values in the partial products of the target code.

Specifically, the circuit structure of the Wallace tree units 1411 to 141n may be implemented by a combination of full adders and half adders, or by a combination of 4-2 compressors; each of the Wallace tree units 1411 to 141n may be understood as a circuit capable of processing a multi-bit input signal and adding the bits of that signal to obtain a two-bit output signal. Optionally, the number n of Wallace tree units included in the Wallace tree group sub-circuit 141 may be equal to 2 times the bit width of the data currently processed by the multiplier, and the Wallace tree units may be connected in series. Optionally, each Wallace tree unit in the Wallace tree group sub-circuit 141 may add the values in one column of all partial products of the target code, and may output two signals, namely a carry signal Carry_i and a sum signal Sum_i, where i may represent the number corresponding to each Wallace tree unit, and the number of the first Wallace tree unit is 0. Optionally, the number of input signals received by each Wallace tree unit may be equal to the number of encoded signals or the number of partial products of the target code.

It should be noted that the signals of each Wallace tree unit in the Wallace tree group sub-circuit 141 may include a carry input signal Cin_i, a partial product input signal, and a carry output signal Cout_i. Optionally, the partial product input signal received by each Wallace tree unit may be the values of the corresponding column in the partial products of all target codes, and the number of bits of the carry output signal Cout_i output by each Wallace tree unit may be $N_{Cout} = \mathrm{floor}((N_I + N_{Cin})/2) - 1$, where $N_I$ may represent the number of data input bits of the Wallace tree unit, $N_{Cin}$ may represent the number of carry input bits of the Wallace tree unit, $N_{Cout}$ may represent the minimum number of carry output bits of the Wallace tree unit, and floor(·) may represent the round-down (floor) function. Optionally, the carry input signal received by each Wallace tree unit in the Wallace tree group sub-circuit 141 may be the carry output signal output by the preceding Wallace tree unit, and the carry input signal received by the first Wallace tree unit is 0.

In one embodiment, continuing with the detailed structural diagram of the multiplier shown in fig. 5, the multiplier includes the accumulation sub-circuit 142, and the accumulation sub-circuit 142 includes: an adder 1421, where the adder 1421 is configured to add two received data of the same bit width.

Specifically, the adder 1421 can be an adder with different bit widths. Optionally, the adder 1421 may receive two signals output by the wallace tree group sub-circuit 141, perform addition operation on the two output signals, and output a multiplication result. Optionally, the adder 1421 may be a carry look ahead adder.

Optionally, the adder 1421 includes: a carry signal input port 1421a, a sum bit signal input port 1421b, and an operation result output port 1421c; the carry signal input port 1421a is configured to receive a carry signal, the sum bit signal input port 1421b is configured to receive a sum bit signal, and the operation result output port 1421c is configured to output a result of performing accumulation processing on the carry signal and the sum bit signal.

Optionally, the adder 1421 may receive the carry signal Carry output by the Wallace tree group sub-circuit 141 through the carry signal input port 1421a, receive the sum bit signal Sum output by the Wallace tree group sub-circuit 141 through the sum bit signal input port 1421b, add the carry signal Carry and the sum bit signal Sum, and output the result through the operation result output port 1421c.

It should be noted that, during multiplication, the multiplier may use adders 1421 of different bit widths to add the carry output signal Carry and the sum bit output signal Sum output by the Wallace tree group sub-circuit 141, where the bit width of the data that the adder 1421 can process may be equal to twice the bit width 2N of the data currently processed by the multiplier. Optionally, each Wallace tree unit in the Wallace tree group sub-circuit 141 may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, 4N-1, where i is the number of each Wallace tree unit, starting from 0). Optionally, the Carry received by the adder 1421 is {[Carry_0 : Carry_4N-2], 0}; that is, the bit width of the carry output signal Carry received by the adder 1421 is 4N, the first 4N-1 bit values of Carry correspond to the carry output signals of the first 4N-1 Wallace tree units in the Wallace tree group sub-circuit 141, and the last bit value of Carry may be replaced by 0. Optionally, the sum bit output signal Sum received by the adder 1421 may have a bit width of M, and the values in Sum may be equal to the sum bit output signals of the Wallace tree units in the Wallace tree group sub-circuit 141.

For example, if the multiplier is currently processing an 8 × 8 multiplication, the adder 1421 may be a 16-bit carry look ahead adder. As shown in fig. 6, the Wallace tree group sub-circuit 141 may output the sum bit output signals Sum and the carry output signals Carry of 16 Wallace tree units; however, the sum bit signal received by the 16-bit carry look ahead adder may be the complete sum bit signal Sum output by the Wallace tree group sub-circuit 141, while the carry signal it receives may be the carry output signals of all Wallace tree units except the last one, combined with 0. In fig. 6, Wallace_i represents a Wallace tree unit, where i is the number of the Wallace tree unit starting from 0; a solid line connecting two Wallace tree units indicates that the Wallace tree unit with the higher number has a carry output signal, a dotted line indicates that the Wallace tree unit with the higher number has no carry output signal, and the trapezoid symbol indicates a two-way selector.
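The way the adder combines Sum with the shifted Carry vector can be illustrated with a minimal sketch. It assumes that each column is compressed by a plain full adder (a simplification of a Wallace tree unit) and that the 0 in {[Carry_0 : Carry_4N-2], 0} fills the least significant position, so that the carry of column i lands in column i+1 and the carry of the last column is dropped. The function names are hypothetical.

```python
def carry_save_3to2(a: int, b: int, c: int, width: int):
    """Compress three operands column by column with full adders.

    Returns two lists (least significant column first): the sum bit and
    the carry bit produced by each column.
    """
    sum_bits, carry_bits = [], []
    for i in range(width):
        x, y, z = (a >> i) & 1, (b >> i) & 1, (c >> i) & 1
        sum_bits.append(x ^ y ^ z)
        carry_bits.append((x & y) | (x & z) | (y & z))
    return sum_bits, carry_bits


def final_add(sum_bits, carry_bits) -> int:
    """Add Sum and the shifted Carry vector, as the final adder would."""
    width = len(sum_bits)
    # Assumed layout: 0 in the lowest position, carry of the last column dropped.
    carry_vec = [0] + carry_bits[: width - 1]
    s = sum(bit << i for i, bit in enumerate(sum_bits))
    c = sum(bit << i for i, bit in enumerate(carry_vec))
    return (s + c) & ((1 << width) - 1)


sum_bits, carry_bits = carry_save_3to2(0x1234, 0x0F0F, 0x00FF, 16)
result = final_add(sum_bits, carry_bits)
```

Under these assumptions, `result` equals the 16-bit truncated sum of the three operands, which is why a single two-input adder suffices after the compression stage.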

According to the multiplier provided by the embodiment, the two paths of signals output by the Wallace tree group sub-circuit can be subjected to accumulation operation through the accumulation sub-circuit, and a multiplication operation result is output.

Fig. 7 is a flowchart illustrating a data processing method according to an embodiment, where the method may be processed by the multipliers shown in fig. 1 and fig. 3, and this embodiment relates to a process of performing a multiplication operation on data with different bit widths. As shown in fig. 7, the method includes:

S101, receiving data to be processed.

Specifically, the multiplier may receive the data to be processed, which may be the multiplier and the multiplicand of a multiplication operation, through the correction coding circuit. At each multiplication, the multiplier may further receive a function selection mode signal through all the selectors in the correction coding circuit and the correction compression circuit; within the same operation, the function selection mode signals received by all the selectors in the correction coding circuit and in the correction compression circuit may be the same. Optionally, the data may be fixed-point numbers. When the multiplier receives different function selection mode signals, it can process data of different bit widths; the correspondence between the function selection mode signals and the bit widths processed by the multiplier can be set flexibly, which is not limited in this embodiment.

It should be noted that, if the bit width of the multiplier and multiplicand to be processed received by the correction coding circuit is not equal to the bit width of the processable data corresponding to the function selection mode signal received by the multiplier, the multiplier divides the received data to be processed into a plurality of groups of data, each having the bit width that the multiplier can currently process, and processes the groups in parallel; in this case the bit width of the data to be processed received by the correction coding circuit may be greater than the bit width of the data currently processable by the multiplier. Optionally, parallel processing means that all divided groups of data to be processed are processed at the same time. If the bit width of the data to be processed received by the correction coding circuit is equal to the bit width of the processable data corresponding to the function selection mode signal received by the multiplier, the multiplier directly processes the received data to be processed. Optionally, the data to be processed may include high-order data to be processed and low-order data to be processed. If the bit width of the data to be processed is 2N, the upper N bits are the high-order data to be processed, and the lower N bits are the low-order data to be processed.
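The division of 2N-bit data into high-order and low-order halves can be sketched as follows. This is only an illustration of the bit slicing; the helper name `split_operand` is hypothetical.

```python
def split_operand(value: int, n: int):
    """Split a 2N-bit value into (high N bits, low N bits)."""
    mask = (1 << n) - 1
    low = value & mask           # low-order data to be processed
    high = (value >> n) & mask   # high-order data to be processed
    return high, low


# A 16-bit operand (2N = 16) handled as two 8-bit groups (N = 8).
high, low = split_operand(0xA53C, 8)
print(hex(high), hex(low))  # 0xa5 0x3c
```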

Optionally, the bit width of the multiplier and the multiplicand to be processed received by the correction coding circuit may be 8 bits, 16 bits, 32 bits, or 64 bits, which is not limited in this embodiment. Wherein, the bit width of the multiplier to be processed can be equal to the bit width of the multiplicand to be processed.

S102, gating a signal to be coded, and performing Booth coding processing on the data to be processed according to the signal to be coded to obtain a coded signal.

Specifically, the multiplier may determine the signal to be encoded that is gated by the selector according to the function selection mode signal received by the correction coding circuit, and perform Booth encoding on the data to be processed according to the determined signal to be encoded to obtain the encoded signal. Optionally, the data to be processed may be the multiplier of a multiplication operation, and may include high-order data to be processed and low-order data to be processed, where if the bit width of the data to be processed is 2N, the upper N bits may be the high-order data to be processed, and the lower N bits may be the low-order data to be processed. Optionally, the signal to be encoded may be 0, or may be the highest bit value of the low-order data to be processed.

It should be noted that, if the bit width of the data received by the multiplier is 2N, and the bit width of the data currently processed by the multiplier is also 2N, the correction coding circuit may gate the highest bit value in the lower bit data to be processed through the selector, as the complement bit value in the higher bit data, and at this time, the multiplier may perform multiplication operation on the received 2N bit data as a whole; if the bit width of the data currently processed by the multiplier is N, the multiplier needs to divide the received 2N-bit data into high N-bit data and low N-bit data for parallel processing, and at this time, the correction coding circuit may gate 0 through the selector as a complementary bit value in the high-bit data.

S103, obtaining a partial product of the target code according to the code signal and the data to be processed.

Specifically, the partial product obtaining unit in the multiplier may obtain a partial product of a target code corresponding to the function selection mode signal received by the current multiplier according to the multiplicand to be processed and the code signal. Alternatively, the partial products of the target codes may be expanded partial products of corresponding sign bits obtained by the multiplier, and the number of the expanded partial products of the sign bits may be equal to the number of the coded signals.

For example, if the bit width of the data received by the multiplier is 2N and the multiplier processes N-bit wide data currently, the partial product of the target code may be a partial product obtained by expanding a corresponding sign bit of the upper N-bit data and a partial product obtained by expanding a corresponding sign bit of the lower N-bit data.

S104, accumulating the partial product of the target code to obtain an operation result.

Specifically, the multiplier may perform accumulation processing on the partial product of the target code by the correction compression circuit, and obtain an operation result.

In the data processing method provided by this embodiment, data to be processed is received, a signal to be encoded is gated, booth encoding processing is performed on the data to be processed according to the signal to be encoded to obtain an encoded signal, a partial product of a target code is obtained according to the encoded signal and the data to be processed, and the partial product of the target code is accumulated to obtain an operation result.
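Steps S101 to S104 can be illustrated end to end with a minimal behavioural sketch for the case in which the received bit width equals the bit width currently processed. It assumes signed two's complement operands and standard radix-4 Booth recoding, and it accumulates the partial products with ordinary integer addition instead of a Wallace tree. The function names are hypothetical.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Signed width x width multiply; returns a 2*width-bit pattern."""
    out_mask = (1 << (2 * width)) - 1
    x = to_signed(multiplicand, width)
    # S102: append a 0 below the lowest bit, then read overlapping 3-bit groups.
    padded = (multiplier & ((1 << width) - 1)) << 1
    acc = 0
    for k in range(width // 2):
        group = (padded >> (2 * k)) & 0b111
        hi, mid, lo = (group >> 2) & 1, (group >> 1) & 1, group & 1
        digit = -2 * hi + mid + lo        # one of -2, -1, 0, 1, 2
        # S103: partial product for this group, weighted by 4**k.
        partial = (digit * x) << (2 * k)
        # S104: accumulate, keeping 2*width bits.
        acc = (acc + partial) & out_mask
    return acc


product = booth_multiply(0xFB, 0x07, 8)   # (-5) * 7
print(to_signed(product, 16))             # -35
```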

As shown in fig. 8, which is a data processing method according to another embodiment, the gating of the signal to be encoded in S102 and the booth encoding of the data to be processed according to the signal to be encoded to obtain an encoded signal includes:

S1021, obtaining high-order data and low-order data to be encoded according to the signal to be encoded and the data to be processed.

Specifically, the correction coding circuit may determine a plurality of groups of high-order data to be encoded corresponding to the high-order data to be processed according to the signal to be encoded. Optionally, before performing Booth encoding on the data to be processed, the correction coding circuit needs to perform bit-complementing on the received multiplier to be processed, that is, to append one bit of value 0 below the lowest bit of the multiplier. Optionally, a plurality of groups of low-order data to be encoded may be obtained from the low-order data to be processed and the complement value 0, and a plurality of groups of high-order data to be encoded may be obtained from the high-order data to be processed and the signal to be encoded obtained after gating. Optionally, the number of groups of low-order data to be encoded may be equal to the number of groups of high-order data to be encoded, and may also be equal to 1/4 of the bit width of the data received by the multiplier.

It should be noted that the groups of low-order data to be encoded may be divided as follows: every 3 adjacent bit values of the low-order data after bit-complementing form one group of low-order data to be encoded, and the highest bit value of each group may serve as the lowest bit value of the next group. Optionally, the groups of high-order data to be encoded may be divided as follows: the signal to be encoded obtained by gating is used as the complement bit value of the high-order data, every 3 adjacent bit values of the high-order data after bit-complementing form one group of high-order data to be encoded, and the highest bit value of each group may serve as the lowest bit value of the next group.
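The grouping rule, including the gated complement bit of the high-order data, can be sketched as follows. The boolean `full_width_mode` is a hypothetical stand-in for the function selection mode signal: when it is true the multiplier treats the 2N-bit data as a whole and the gated bit is the highest bit of the low-order data; otherwise the gated bit is 0.

```python
def booth_groups(data: int, n: int, full_width_mode: bool):
    """Split a 2N-bit multiplier into overlapping 3-bit groups.

    Returns (low_groups, high_groups); each list holds N/2 values in 0..7,
    lowest group first.
    """
    mask = (1 << n) - 1
    low, high = data & mask, (data >> n) & mask
    # Signal to be encoded: MSB of the low half in 2N mode, 0 in N mode.
    gated = ((low >> (n - 1)) & 1) if full_width_mode else 0

    def groups(half: int, pad_bit: int):
        padded = (half << 1) | pad_bit   # complement bit below the lowest bit
        return [(padded >> (2 * k)) & 0b111 for k in range(n // 2)]

    return groups(low, 0), groups(high, gated)


# 8-bit data (N = 4): high half 1011, low half 1101.
print(booth_groups(0b10111101, 4, True))   # ([2, 6], [7, 5])
print(booth_groups(0b10111101, 4, False))  # ([2, 6], [6, 5])
```

Only the first high-order group differs between the two modes, which is the single point where the selector changes the encoding.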

S1022, performing Booth encoding processing on the high-order data and the low-order data to be encoded to obtain a high-order encoded signal and a low-order encoded signal.

Specifically, the encoding rule of the Booth encoding process can be seen in table 1. As table 1 shows, Booth encoding of the divided low-order data and high-order data to be encoded by the low-order Booth encoding unit and the high-order Booth encoding unit yields five different types of encoded signals, namely -2X, 2X, -X, X and 0.
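A minimal sketch of such an encoding follows, assuming that table 1 corresponds to the standard radix-4 Booth rule (table 1 itself is not reproduced here, so this mapping is an assumption). The names `BOOTH_TABLE`, `SIGNAL_NAME` and `booth_encode` are hypothetical.

```python
# Standard radix-4 Booth rule: group (y[i+1], y[i], y[i-1]) -> digit.
BOOTH_TABLE = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}
SIGNAL_NAME = {-2: "-2X", -1: "-X", 0: "0", 1: "X", 2: "2X"}


def booth_encode(multiplier: int, width: int):
    """Return the encoded signals of a width-bit multiplier, lowest group first."""
    padded = (multiplier & ((1 << width) - 1)) << 1   # append 0 below the LSB
    return [SIGNAL_NAME[BOOTH_TABLE[(padded >> (2 * k)) & 0b111]]
            for k in range(width // 2)]


print(booth_encode(0b01101001, 8))  # ['X', '-2X', '-X', '2X']
```

The digits reconstruct the multiplier: 1 - 2·4 - 1·16 + 2·64 = 105 = 0b01101001.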

The data processing method provided by this embodiment receives data to be processed, obtains high-order data and low-order data to be encoded according to the signal to be encoded and the data to be processed, performs booth encoding on the high-order data and the low-order data to be encoded, obtains a high-order encoded signal and a low-order encoded signal, obtains a partial product of a target code according to the low-order encoded signal, the high-order encoded signal, and the data to be processed, and performs accumulation processing on the partial product of the target code to obtain an operation result.

With reference to fig. 8, the step of obtaining the partial product of the target code according to the code signal and the data to be processed in S103 includes:

S1031, obtaining a low-order partial product of the target code according to the low-order coded signal and the data to be processed.

It should be noted that, if the bit width of the data to be processed received by the multiplier is 2N, and the multiplier can process N-bit data currently, the multiplier needs to divide the data to be processed with 2N bits into high N-bit data and low N-bit data to be processed for parallel operation, and at this time, the multiplier can obtain a low-bit partial product of the target code according to the low-bit coded signal and the low N-bit data to be processed through the correction coding circuit; if the multiplier can process the data of 2N bits currently, the multiplier needs to obtain the low-bit partial product of the target code according to the low-bit coded signal and the to-be-processed 2N-bit data. Wherein, the bit width of the lower bit product of the target code may be 4N, and the number of the lower bit products of the target code may be equal to N/2.

S1032, obtaining a high-order partial product of the target code according to the high-order coded signal and the data to be processed.

It should be noted that, if the bit width of the to-be-processed data received by the multiplier is 2N, and the multiplier can process N bits of data currently, the multiplier needs to divide the 2N bits of to-be-processed data into high N bit data and low N bit data to be processed for parallel operation, and at this time, the multiplier can obtain a high bit partial product of the target code according to the high bit coded signal and the high N bit data to be processed through the correction coding circuit; if the multiplier can process the data of 2N bits currently, the multiplier needs to obtain the high-bit partial product of the target code according to the high-bit coded signal and the to-be-processed 2N-bit data. Wherein, the bit width of the upper bit product of the target code may be 4N, and the number of the upper bit products of the target code may be equal to N/2.

According to the data processing method provided by this embodiment, a low-order partial product of a target code is obtained according to the low-order coded signal and the data to be processed, a high-order partial product of the target code is obtained according to the high-order coded signal and the data to be processed, and the low-order partial product and the high-order partial product of the target code are accumulated to obtain an operation result.

In one embodiment, as shown in fig. 9, the step of obtaining the lower bit product of the target code according to the lower bit coded signal and the data to be processed in S1031 includes:

S1031a, obtaining a sign-bit-extended low-order partial product according to the low-order coded signal and the data to be processed.

Specifically, the multiplier obtains, according to the received function selection mode signal, the low-order coded signal and the data to be processed, the original low-order partial product corresponding to the bit width currently processed by the multiplier, and performs sign bit extension on the original low-order partial product to obtain the sign-bit-extended low-order partial product. Optionally, the original low-order partial product may be a low-order partial product without sign bit extension, that is, the partial product obtained from the corresponding low-order data before sign bit extension. Optionally, the bit width of the sign-bit-extended low-order partial product may be equal to 2 times the bit width M of the data received by the multiplier, and the bit width of the original low-order partial product may be equal to M+1. Optionally, the sign-bit-extended low-order partial product may consist of the M+1 bit values of the original low-order partial product and M-1 copies of the sign bit value of the original low-order partial product.

It should be noted that, if the low-order partial product obtaining unit receives an 8-bit multiplicand x₇x₆x₅x₄x₃x₂x₁x₀ (i.e., X), the low-order partial product obtaining unit may directly obtain the corresponding original low-order partial product from the multiplicand X and the five types of low-order coded signals -2X, 2X, -X, X and 0. When the low-order coded signal is -2X, the original low-order partial product may be obtained by shifting X left by one bit, inverting the result bit by bit, and adding 1; when the low-order coded signal is 2X, the original low-order partial product may be obtained by shifting X left by one bit; when the low-order coded signal is -X, the original low-order partial product may be obtained by inverting X bit by bit and adding 1; when the low-order coded signal is X, the original low-order partial product may be the data formed by combining X with one bit above the highest bit of X, where that bit may be equal to the sign bit value of X; when the low-order coded signal is 0, the original low-order partial product may be 0, that is, each bit value of the 9-bit partial product is equal to 0.
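The rules above, together with the sign bit extension from M+1 to 2M bits, can be sketched as follows for M = 8. The function names are hypothetical, and the sketch assumes two's complement arithmetic; it does not treat the corner case X = -2^(M-1) with signal -2X, whose value does not fit in M+1 signed bits.

```python
def original_partial_product(x: int, signal: str, m: int = 8) -> int:
    """(M+1)-bit pattern of the original partial product for one coded signal."""
    out_mask = (1 << (m + 1)) - 1
    x &= (1 << m) - 1
    sign = (x >> (m - 1)) & 1
    x_ext = (sign << m) | x        # X with its sign bit repeated one bit higher
    x2 = (x << 1) & out_mask       # X shifted left by one bit
    if signal == "X":
        return x_ext
    if signal == "2X":
        return x2
    if signal == "-X":
        return (~x_ext + 1) & out_mask   # invert bit by bit, add 1
    if signal == "-2X":
        return (~x2 + 1) & out_mask      # shift, invert bit by bit, add 1
    if signal == "0":
        return 0
    raise ValueError(f"unknown coded signal: {signal}")


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Fill bits from_bits..to_bits-1 with copies of the sign bit."""
    if (value >> (from_bits - 1)) & 1:
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


pp = original_partial_product(0xFB, "X")   # X = -5 as an 8-bit value
print(hex(pp), hex(sign_extend(pp, 9, 16)))  # 0x1fb 0xfffb
```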

S1031b, gating the values in the low-order partial product of the target code through the low-order selector bank unit.

Specifically, each of the low selectors in the low selector bank unit may gate the corresponding bit value in the low bit product of the target code according to the received different function selection signals.

S1031c, obtaining the low-order partial product of the target code according to the values in the low-order partial product of the target code and the values in the sign-bit-extended low-order partial product.

Specifically, the low-order partial product obtaining unit may obtain, according to the value in the low-order partial product of the target code obtained after the gating by the low-order selector bank unit and the partial bit value in the low-order partial product after the sign bit expansion obtained by the multiplier currently processing the corresponding bit width data, the low-order partial product of the target code corresponding to the bit width data currently processed by the multiplier.

According to the data processing method provided by this embodiment, a low-order partial product after sign bit extension is obtained according to the low-order coded signal and the data to be processed, a value in the low-order partial product of a target code is gated through a low-order selector bank unit, a low-order partial product of the target code is obtained according to the value in the low-order partial product of the target code and the value in the low-order partial product after sign bit extension, and the low-order partial product of the target code and the high-order partial product of the target code are accumulated to obtain an operation result.

In one embodiment, with reference to fig. 9, the step of obtaining the upper bit product of the target code according to the upper bit coded signal and the data to be processed in S1032 includes:

S1032a, obtaining a sign-bit-extended high-order partial product according to the high-order coded signal and the data to be processed.

Specifically, the multiplier obtains, according to the received function selection mode signal, the high-order coded signal and the data to be processed, the original high-order partial product corresponding to the bit width currently processed by the multiplier, and performs sign bit extension on the original high-order partial product to obtain the sign-bit-extended high-order partial product. Optionally, the original high-order partial product may be a high-order partial product without sign bit extension, that is, the partial product obtained from the corresponding high-order data before sign bit extension. Optionally, the bit width of the sign-bit-extended high-order partial product may be equal to 2 times the bit width M of the data received by the multiplier, and the bit width of the original high-order partial product may be equal to M+1. Optionally, the sign-bit-extended high-order partial product may consist of the M+1 bit values of the original high-order partial product and M-1 copies of the sign bit value of the original high-order partial product.

S1032b, gating the value in the upper partial product of the target code by the upper selector bank unit.

Specifically, each of the high selectors in the high selector bank unit may gate the corresponding bit value in the high partial product of the target code according to the received different function selection signals.

S1032c, obtaining the high-order partial product of the target code according to the numerical value in the high-order partial product of the target code and the numerical value in the high-order partial product after sign bit expansion.

Specifically, the high-order bit product obtaining unit may obtain, according to the value in the high-order bit product of the target code obtained after the gating by the high-order selector unit and the value in the high-order bit product after the sign bit extension obtained by the multiplier currently processing the corresponding bit width data, the high-order bit product of the target code corresponding to the bit width data currently processed by the multiplier.

According to the data processing method provided by this embodiment, the high-order partial product of the target code after sign bit extension is obtained according to the high-order coded signal and the data to be processed, the value in the high-order partial product of the target code is gated through the high-order selector bank unit, the high-order partial product of the target code is obtained according to the value in the high-order partial product of the target code and the value in the high-order partial product of the target code after sign bit extension, and the high-order partial product of the target code and the low-order partial product of the target code are accumulated to obtain the operation result.

As shown in fig. 10, a data processing method according to another embodiment, the step of performing accumulation processing on the partial product of the target code in S104 to obtain an operation result includes:

S1041, accumulating the low-order partial product and the high-order partial product of the target code through a modified Wallace tree group circuit to obtain a first operation result.

Specifically, the multiplier may, through the modified Wallace tree group circuit, accumulate the values of each column of all low-order partial products and all high-order partial products of the target code, arranged according to their distribution rule, to obtain a first operation result. Optionally, the first operation result may include a sum bit output signal Sum and a carry output signal Carry, where the bit widths of Sum and Carry may be the same.

S1042, accumulating the first operation result through an accumulation circuit to obtain an operation result.

Specifically, the multiplier may add the carry output signal Carry and the sum bit output signal Sum output by the modified Wallace tree group circuit through an adder in the accumulation circuit, and output the addition result. Optionally, each Wallace tree unit in the modified Wallace tree group circuit may output a carry output signal Carry_i and a sum bit output signal Sum_i (i = 0, …, N-1, where i is the number of each Wallace tree unit, starting from 0). Optionally, the Carry received by the adder is {[Carry_0 : Carry_N-2], 0}; that is, the bit width of the carry output signal Carry received by the adder is N, the first N-1 bit values of Carry correspond to the carry output signals of the first N-1 Wallace tree units in the modified Wallace tree group circuit, and the last bit value of Carry may be replaced by 0. Optionally, the sum bit output signal Sum received by the adder has a bit width of N, and the values in Sum may be equal to the sum bit output signals of the Wallace tree units in the modified Wallace tree group circuit.

For example, if the multiplier processes 8 × 8 multiplication operation currently, the adder may be a 16-bit Carry look ahead adder, as shown in fig. 6, the modified wallace tree group circuit may output Sum output signals Sum and Carry output signals Carry of 16 wallace tree units, but the Sum output signal received by the 16-bit Carry look ahead adder may be a complete Sum bit signal Sum output by the modified wallace tree group circuit, and the received Carry output signal may be a Carry signal Carry obtained by combining all Carry output signals except the Carry output signal output by the last wallace tree unit in the modified wallace tree group circuit with 0.

In the data processing method provided by this embodiment, the modified wallace tree group circuit performs accumulation processing on the low-order part and the high-order part of the target code to obtain a first operation result, and the accumulation circuit performs accumulation processing on the first operation result to obtain an operation result.

As shown in fig. 11, in a multiplication method according to another embodiment, the step in S1041 of accumulating the low-order partial product and the high-order partial product of the target code through the modified Wallace tree group circuit to obtain a first operation result includes:

S1041a, accumulating the column values in the partial products of the target code through a low-order Wallace tree group sub-circuit to obtain an accumulation operation result.

Specifically, according to the distribution rule of all the low-order partial products and all the high-order partial products of the target code, the total number of columns of values across all partial products of the target code is 2N (N is the bit width of the data currently processed by the multiplier), and the columns may be numbered 0, …, 2N-1 starting from the lowest bit, where the columns numbered 0 to N-1 may be referred to as the low N columns of values. Optionally, the accumulation operation result may be the carry output signal Cout output by the last Wallace tree unit in the low-order Wallace tree group sub-circuit.

It should be noted that N wallace tree units included in the lower wallace tree group sub-circuit may perform an accumulation operation on the low N column numbers according to the numbering sequence to obtain an accumulation operation result. Optionally, the accumulation operation result may include Carry output signals Carry, sum of each wallace tree unit, and output signal Cout of the last wallace tree unit in the lower wallace tree group sub-circuit.

S1041b, gating the accumulation operation result through a selector to obtain a carry gating signal.

Specifically, the selector in the modified compression circuit may gate the output signal Cout or 0 of the last wallace tree unit in the low-order wallace tree group circuit according to the received function selection mode signal to obtain a carry gate signal.

S1041c, accumulating, through a high-order Wallace tree group circuit, according to the carry gating signal and the column values in the partial products of the target code to obtain an operation result.

Specifically, according to the distribution rule of all partial products of the target code, the total number of columns of values across all partial products of the target code is 2N (N is the bit width of the data currently processed by the multiplier), and the columns may be numbered 0, …, 2N-1 starting from the lowest bit, where the columns numbered N to 2N-1 may be referred to as the high N columns of values.

It should be noted that N wallace tree units included in the high-order wallace tree group circuit may perform accumulation operation on the high N column numbers according to the numbering order, and output a second operation result. The carry input signal received by the first wallace tree unit in the high-order wallace tree group circuit may be a carry strobe signal output by the selector.

In the data processing method provided by this embodiment, the low-order wallace tree group sub-circuit performs accumulation processing on the column number values in the partial products of the target codes to obtain accumulated operation results, the selector gates the accumulated operation results to obtain carry gating signals, and the high-order wallace tree group circuit performs accumulation processing on the column number values in the partial products of the target codes according to the carry gating signals and the column number values in the partial products of the target codes to obtain operation results.
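The effect of the selector between the low-order and high-order groups can be isolated with a deliberately simplified sketch. It reduces each Wallace tree group to a two-operand adder over n columns, so that only the carry gating is modelled; the function name `gated_add` and the boolean `full_width_mode`, which stands in for the function selection mode signal, are hypothetical.

```python
def gated_add(a: int, b: int, n: int, full_width_mode: bool) -> int:
    """Add two 2n-column values with a gated carry between the halves."""
    mask = (1 << n) - 1
    low_total = (a & mask) + (b & mask)
    cout = low_total >> n                          # carry out of the low n columns
    carry_gate = cout if full_width_mode else 0    # selector: Cout or 0
    high_total = ((a >> n) & mask) + ((b >> n) & mask) + carry_gate
    return ((high_total & mask) << n) | (low_total & mask)


# Whole-width operation: the carry crosses from the low half to the high half.
print(hex(gated_add(0x00FF, 0x0001, 8, True)))   # 0x100
# Two independent half-width operations: the carry is gated to 0.
print(hex(gated_add(0x00FF, 0x0001, 8, False)))  # 0x0
```

Gating the carry to 0 keeps the two half-width results from interfering with each other, while passing Cout through lets the same columns serve a single full-width accumulation.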

Fig. 12 is a flowchart illustrating a data processing method according to another embodiment, which can be processed by the multipliers shown in fig. 2 and fig. 5, where the embodiment relates to a process of performing a multiplication operation on data with different bit widths. As shown in fig. 12, the method includes:

S201, receiving data to be processed.

Specifically, the number of the data to be processed received by the multiplier may be two, and the data is a multiplier and a multiplicand in a multiplication operation.

S202, judging whether the bit width of the data to be processed is equal to the bit width of the data which can be processed by the multiplier.

Specifically, the multiplier determines whether the bit width of the received two pieces of data to be processed is equal to the bit width of the data that can be processed by the multiplier. In this embodiment, the bit width of the data that can be processed by the multiplier is fixed, i.e., 2N bits. Optionally, the bit width of the data to be processed received by the determining circuit may be N, or may also be 2N.

S203, if the bit widths are not equal, performing data expansion processing on the data to be processed to obtain expanded data.

Specifically, if the bit width of the data to be processed received by the determining circuit is not equal to the bit width 2N of the data that can be processed by the multiplier, the multiplier may perform data expansion processing on the data to be processed through the data expansion circuit, and expand the data to be processed into data with a bit width of 2N.

Optionally, the performing data expansion processing on the data to be processed to obtain expanded data includes: and performing data expansion processing on the data to be processed through 0 or the sign bit value of the data to be processed to obtain expanded data. Optionally, the bit width of the expanded data is equal to the bit width of the data currently processed by the multiplier.

It should be noted that the data expansion circuit may receive three data expansion mode selection signals, which are respectively denoted as 00, 01, and 10, where the signal 00 denotes that the data expansion circuit may expand the received N bits of data to be processed into 2N bits of data, a high N-bit value of the 2N bits of data may be equal to a value of the received N bits of data, and low N-bit values may all be equal to an expanded value of 0, at this time, the data expansion circuit may output the function selection mode signal 00, and in an operation result of a 4N-bit width obtained by the multiplier, the high 2N-bit value may be a final operation result; signal 01 indicates that the data expansion circuit can expand the received N-bit data into 2N-bit data, the lower N-bit value in the 2N-bit data can be equal to the value of the received N-bit data, and the upper N-bit values can be equal to the expanded value 0, at this time, the data expansion circuit can output a function selection mode signal 00, and in the operation result with a 4N-bit width obtained by the multiplier, the lower 2N-bit value can be the final operation result; the signal 10 indicates that the data expansion circuit can expand the received N-bit data into 2N-bit data, the lower N-bit value of the 2N-bit data can be equal to the value of the received N-bit data, and the upper N-bit values can be equal to the sign bit value of the data received by the data expansion circuit, at this time, the data expansion circuit can output the function selection mode signal 01, and the lower 2N-bit value of the operation result with 4N-bit width obtained by the multiplier can be the final operation result.
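The three expansion modes can be sketched as plain bit manipulation. The mode strings follow the data expansion mode selection signals 00, 01 and 10 described above; the function name `expand` is hypothetical.

```python
def expand(value: int, n: int, mode: str) -> int:
    """Expand an N-bit value to 2N bits according to the expansion mode."""
    mask = (1 << n) - 1
    value &= mask
    if mode == "00":    # data in the high N bits, zeros in the low N bits
        return value << n
    if mode == "01":    # data in the low N bits, zeros in the high N bits
        return value
    if mode == "10":    # data in the low N bits, sign bit copies above
        sign = (value >> (n - 1)) & 1
        return ((mask if sign else 0) << n) | value
    raise ValueError(f"unknown expansion mode: {mode}")


for mode in ("00", "01", "10"):
    print(mode, hex(expand(0xF0, 8, mode)))
# 00 0xf000
# 01 0xf0
# 10 0xfff0
```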

S204, encoding the expanded data to obtain a partial product after sign bit expansion.

Specifically, the multiplier may perform binary encoding processing on the expanded data through the encoding circuit, and obtain the partial product after sign bit expansion according to the received multiplicand to be processed and the binary encoding result. Optionally, the number of partial products after sign bit extension may be equal to N.

S205, accumulating the partial product after sign bit expansion to obtain an operation result.

Specifically, the multiplier may accumulate the partial product after sign bit extension by using a compression circuit, and obtain an operation result.

For example, suppose a multiplier can process data with a bit width of 16 bits and receives two data with a bit width of 8 bits. The multiplier may expand the two received 8-bit data into two 16-bit data through the data expansion circuit and, after performing a multiplication operation on them, may obtain one datum with a bit width of 32 bits. If the data expansion circuit expands each 8-bit datum so that the low 8 bits are 0 and the high 8 bits are the received data, the data expansion mode selection signal received by the data expansion circuit is 00, the output function selection mode signal is also 00, and the multiplier may take the high 16 bits of the 32-bit data as the final operation result. If the data expansion circuit expands each 8-bit datum so that the high 8 bits are 0 and the low 8 bits are the received data, the data expansion mode selection signal received by the data expansion circuit is 01, the output function selection mode signal is 00, and the multiplier may take the low 16 bits of the 32-bit data as the final operation result. If the data expansion circuit expands each 8-bit datum so that the high 8 bits are the sign bit value of the received 8-bit data and the low 8 bits are the received data, the data expansion mode selection signal received by the data expansion circuit is 10, the output function selection mode signal is 01, and the multiplier may take the low 16 bits of the 32-bit data as the final operation result.
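The 8-bit example above may be checked numerically with the following sketch. It assumes, as a labeled simplification not stated in the text, that the 16-bit multiplier behaves as a two's-complement (signed) multiplier; the helper names `to_signed`, `expand8`, and `multiply_8bit_on_16bit` are hypothetical.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret a bit pattern as a two's-complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def expand8(value: int, mode: str) -> int:
    """Expand an 8-bit pattern to 16 bits according to the mode selection signal."""
    value &= 0xFF
    if mode == "00":                               # data high, zeros low
        return value << 8
    if mode == "01":                               # zeros high, data low
        return value
    if mode == "10":                               # sign bits high, data low
        return (0xFF00 if value & 0x80 else 0) | value
    raise ValueError(f"unknown data expansion mode: {mode}")


def multiply_8bit_on_16bit(a: int, b: int, mode: str) -> int:
    """Return the 16-bit result pattern taken from the 32-bit product."""
    # Assumption: the 16-bit multiplier multiplies two's-complement operands.
    ea = to_signed(expand8(a, mode), 16)
    eb = to_signed(expand8(b, mode), 16)
    product32 = (ea * eb) & 0xFFFFFFFF
    if mode == "00":
        return product32 >> 16                     # high 16 bits are the result
    return product32 & 0xFFFF                      # low 16 bits are the result
```

Under this assumption, the inputs 0xFD and 0x05 give the pattern 0xFFF1 (that is, -3 x 5 = -15) in modes 00 and 10, and 0x04F1 (that is, 253 x 5 = 1265) in mode 01, so the truncated result equals the product of the original 8-bit data in each case.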

The data processing method provided by this embodiment receives data to be processed, determines whether a bit width of the data to be processed is equal to a bit width of data processable by a multiplier, and if not, performs data expansion processing on the data to be processed to obtain expanded data, encodes the expanded data to obtain a partial product after sign bit expansion, and accumulates the partial product after sign bit expansion to obtain an operation result.

In a data processing method according to another embodiment, after determining whether the bit width of the data to be processed is equal to the bit width of the data processable by the multiplier, the method further includes: if the bit widths are equal, encoding the data to be processed to obtain a partial product after sign bit expansion.

Specifically, if the bit width of the data to be processed received by the multiplier is equal to the bit width 2N of the data currently processed by the multiplier, the judgment circuit in the multiplier may input the received data to be processed to the encoding circuit, and the encoding circuit directly performs binary encoding on the data to be processed to obtain the partial product after sign bit expansion. In this case, the multiplier does not need to perform data expansion processing on the data to be processed.

According to the data processing method provided by the embodiment, if the bit width of the data to be processed received by the multiplier is equal to the bit width of the data currently processed by the multiplier, the coding circuit can directly code the data to be processed to obtain the partial product after sign bit expansion, and accumulate the partial product after sign bit expansion to obtain the operation result.

As shown in fig. 13, in a multiplication method according to another embodiment, the step of encoding the expanded data in S204 to obtain the partial product after sign bit expansion includes:

S2041, performing Booth encoding processing on the expanded data to obtain an encoded signal.

Specifically, the multiplier may perform Booth encoding processing on the expanded multiplier to be processed through a Booth encoding sub-circuit to obtain encoded signals. Optionally, in the Booth encoding process, each group of 3 bits of the input multiplier may yield one encoded signal. The encoding rule of the Booth encoding process may refer to Table 1, from which it can be seen that Booth encoding of the multiplier yields five different types of encoded signals, defined as -2X, 2X, -X, X, and 0, respectively.
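As a hedged illustration of the encoding step, the sketch below implements the standard radix-4 Booth recoding, on the assumption that Table 1 follows that standard rule. Each overlapping 3-bit group of the multiplier yields one digit in {-2, -1, 0, 1, 2}, corresponding to the encoded signals -2X, -X, 0, X, and 2X. The function name `booth_encode` is hypothetical.

```python
def booth_encode(y: int, bits: int) -> list:
    """Radix-4 Booth recoding of a two's-complement multiplier pattern.

    Returns one digit in {-2, -1, 0, 1, 2} per 3-bit group, least
    significant group first. Assumes an even bit width.
    """
    if bits % 2:
        raise ValueError("bit width must be even for radix-4 recoding")
    y &= (1 << bits) - 1
    padded = y << 1                        # append the implicit 0 below the LSB
    digits = []
    for i in range(0, bits, 2):
        group = (padded >> i) & 0b111      # bits y[i+1], y[i], y[i-1]
        hi, mid, lo = (group >> 2) & 1, (group >> 1) & 1, group & 1
        digits.append(-2 * hi + mid + lo)
    return digits
```

The digits satisfy the identity that the sum of digit_i x 4^i equals the signed value of the multiplier, which is why summing the correspondingly shifted partial products reproduces the full product.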

S2042, according to the data to be processed and the coded signal, obtaining a partial product after sign bit expansion.

Specifically, the partial product obtaining sub-circuit may obtain the partial product after sign bit expansion by data expansion processing according to the expanded multiplicand to be processed and the encoded signal.

The data processing method provided by this embodiment performs booth coding processing on the expanded data to obtain a coded signal, obtains a partial product after sign bit expansion according to the data to be processed and the coded signal, and performs accumulation processing on the partial product after sign bit expansion to obtain an operation result.

In one embodiment, as shown in fig. 14, the step of obtaining the partial product after sign bit extension according to the data to be processed and the coded signal in S2042 includes:

S2042a, obtaining an original partial product according to the data to be processed and the encoded signal.

In particular, the number of original partial products may be equal to the number of encoded signals. Optionally, the original partial product may be a partial product without sign bit extension.

Illustratively, if the partial product obtaining sub-circuit receives an 8-bit multiplicand x₇x₆x₅x₄x₃x₂x₁x₀ (i.e., X), the partial product obtaining sub-circuit may directly obtain the corresponding original partial products according to the multiplicand X and the five types of encoded signals -2X, 2X, -X, X, and 0. When the encoded signal is -2X, the original partial product may be obtained by shifting X left by one bit, inverting the result bit by bit, and adding 1; when the encoded signal is 2X, the original partial product may be obtained by shifting X left by one bit; when the encoded signal is -X, the original partial product may be obtained by inverting X bit by bit and adding 1; when the encoded signal is X, the original partial product may be obtained by combining X with one additional most significant bit, where the value of that most significant bit may be equal to the sign bit value of X; and when the encoded signal is 0, the original partial product may be 0, that is, each bit value in the 9-bit partial product is equal to 0.
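The generation of the (N+1)-bit original partial product may be sketched as follows. This is an illustrative model only: the function name `original_partial_product` and the use of an integer digit in {-2, -1, 0, 1, 2} to stand for the encoded signals -2X, -X, 0, X, and 2X are assumptions of this example. Note that for the most negative multiplicand (0x80 when N = 8), -2X does not fit in N+1 bits; this sketch does not treat that corner case specially.

```python
def original_partial_product(x: int, digit: int, n: int = 8) -> int:
    """Return the (n+1)-bit two's-complement pattern of digit * X.

    x is the n-bit multiplicand pattern; digit stands for the encoded signal.
    """
    width = n + 1
    mask = (1 << width) - 1
    x &= (1 << n) - 1
    sign = x >> (n - 1)
    x_ext = (sign << n) | x                # X with its sign bit prepended
    if digit == 0:
        return 0                           # every bit of the partial product is 0
    if digit == 1:
        return x_ext                       # X
    if digit == 2:
        return (x << 1) & mask             # 2X: shift left by one bit
    if digit == -1:
        return (~x_ext + 1) & mask         # -X: invert bitwise and add 1
    if digit == -2:
        return (~(x << 1) + 1) & mask      # -2X: shift left, invert, add 1
    raise ValueError(f"unsupported encoded digit: {digit}")
```

For X = 0x05, the encoded signals -X and -2X give the 9-bit patterns 0x1FB and 0x1F6, which read as -5 and -10 in two's complement.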

S2042b, sign bit extension processing is carried out on the original partial product, and the partial product after sign bit extension is obtained.

Specifically, the partial product obtaining sub-circuit may perform sign bit extension processing on the original partial product according to a sign bit value of the original partial product, so as to obtain the partial product after sign bit extension. Optionally, the bit width of the original partial product may be equal to N +1, and the bit width of the partial product after sign bit extension may be equal to 2N. Optionally, the low N +1 bit value in the partial product after the sign bit extension is the N +1 bit value of the original partial product, and the high N-1 bit value in the partial product after the sign bit extension is the sign bit value of the original partial product.
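The sign bit extension of an (N+1)-bit original partial product to 2N bits may be illustrated by the short sketch below; the function name `sign_extend` is hypothetical, and values are held as unsigned bit patterns.

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of a from_bits-wide pattern into the added high bits."""
    pattern &= (1 << from_bits) - 1
    sign = (pattern >> (from_bits - 1)) & 1
    if sign:
        high = ((1 << (to_bits - from_bits)) - 1) << from_bits
        return high | pattern              # high bits all equal to the sign bit (1)
    return pattern                         # high bits all equal to the sign bit (0)
```

With N = 8, the 9-bit pattern 0x1FB (that is, -5) would extend to the 16-bit pattern 0xFFFB, whose low 9 bits are unchanged and whose high 7 bits all equal the sign bit.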

According to the data processing method provided by the embodiment, the original partial product is obtained according to the data to be processed and the coded signal, sign bit expansion processing is performed on the original partial product to obtain the partial product after sign bit expansion, and accumulation processing is performed on the partial product after sign bit expansion to obtain the operation result.

In another embodiment of the data processing method, the step of accumulating the partial product after sign bit extension in S205 to obtain an operation result includes:

S2051, accumulating the partial product after sign bit expansion through the Wallace tree group sub-circuit to obtain a first operation result.

Specifically, the multiplier may accumulate all partial products after sign bit expansion by the wallace tree group sub-circuit according to a distribution rule to obtain a first operation result. Optionally, the first operation result may include a Sum output signal Sum and a Carry output signal Carry, where bit widths of the Sum output signal Sum and the Carry output signal Carry may be the same.

S2052, accumulating the first operation result through the accumulation sub-circuit to obtain an operation result.

Specifically, the multiplier may add the Carry output signal Carry and the Sum output signal Sum output by the wallace tree group sub-circuit by an adder in the accumulation sub-circuit, and output an addition result.
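The two accumulation stages may be illustrated with a carry-save sketch. It is a simplified software model: it applies 3:2 compression (a full adder in every bit column) to whole rows until only a Sum row and a Carry row remain, and then adds those two rows. It does not model the actual distribution rule or structure of the Wallace tree group sub-circuit, and the function names are hypothetical.

```python
def compress_3_to_2(a: int, b: int, c: int, width: int):
    """Full adder applied to every bit column: three rows in, (sum, carry) out."""
    mask = (1 << width) - 1
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def wallace_reduce(partial_products, width: int):
    """Reduce the rows to a Sum row and a Carry row of equal bit width."""
    mask = (1 << width) - 1
    rows = [p & mask for p in partial_products]
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3       # rows that form complete triples
        for i in range(0, full, 3):
            next_rows.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2], width))
        next_rows.extend(rows[full:])          # leftover rows pass through
        rows = next_rows
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def final_add(sum_row: int, carry_row: int, width: int) -> int:
    """Accumulation stage: add Sum and Carry, keeping width bits."""
    return (sum_row + carry_row) & ((1 << width) - 1)
```

Because each compression step preserves the total modulo 2^width, adding the final Sum and Carry rows gives the same value as adding all sign-extended partial products directly.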

According to the data processing method provided by the embodiment, the Wallace tree group sub-circuit is used for accumulating the partial product after sign bit expansion to obtain a first operation result, and the accumulation sub-circuit is used for accumulating the first operation result to obtain an operation result.

The embodiment of the application also provides a machine learning operation device, which comprises one or more multipliers mentioned in the application, and is used for acquiring data to be operated and control information from other processing devices, executing specified machine learning operation, and transmitting the execution result to peripheral equipment through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, WiFi interfaces, and servers. When more than one multiplier is included, the multipliers can be linked and transmit data through a specific structure, for example, interconnected and transmitting data through a PCIE bus, so as to support larger-scale machine learning operations. In this case, the multipliers may share the same control system or have separate control systems, and may share a memory or each have its own memory. In addition, the interconnection mode can be any interconnection topology.

The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.

The embodiment of the application also provides a combined processing device which comprises the machine learning arithmetic device, the universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 15 is a schematic view of a combined processing apparatus.

Other processing devices include one or more of general purpose/special purpose processors such as Central Processing Units (CPUs), graphics Processing Units (GPUs), neural network processors, and the like. The number of processors included in the other processing devices is not limited. The other processing devices are used as interfaces of the machine learning arithmetic device and external data and control, including data transportation, and finish basic control of starting, stopping and the like of the machine learning arithmetic device; other processing devices can cooperate with the machine learning calculation device to complete calculation tasks.

And the universal interconnection interface is used for transmitting data and control instructions between the machine learning arithmetic device and other processing devices. The machine learning arithmetic device obtains the required input data from other processing devices and writes the input data into a storage device on the machine learning arithmetic device; control instructions can be obtained from other processing devices and written into a control cache on a machine learning arithmetic device chip; the data in the storage module of the machine learning arithmetic device can also be read and transmitted to other processing devices.

Alternatively, as shown in fig. 16, the configuration may further include a storage device, and the storage device is connected to the machine learning arithmetic device and the other processing device, respectively. The storage device is used for storing data in the machine learning arithmetic device and the other processing device, and is particularly suitable for data which is required to be calculated and cannot be stored in the internal storage of the machine learning arithmetic device or the other processing device.

The combined processing device can be used as an SOC (system on chip) for equipment such as a mobile phone, a robot, an unmanned aerial vehicle, or video monitoring equipment, which effectively reduces the core area of the control part, increases the processing speed, and reduces the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a WiFi interface.

In some embodiments, a chip is also claimed, which includes the above machine learning arithmetic device or the combined processing device.

In some embodiments, a chip package structure is provided, which includes the above chip.

In some embodiments, a board card is provided, which includes the above chip package structure. As shown in fig. 17, the board card may include other components in addition to the chip 389, including but not limited to: a memory device 390, a receiving device 391, and a control device 392;

the memory device 390 is connected to the chip in the chip package through a bus for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the storage units is connected with the chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).

DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read out on both the rising and falling edges of the clock pulse; DDR is therefore twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of the storage units. Each group of the storage units may include a plurality of DDR4 chips (dies). In one embodiment, the chip may include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.

In one embodiment, each group of the storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling data transmission and data storage of each storage unit.
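The 25600 MB/s figure follows from simple arithmetic, sketched below on the understanding that DDR4-3200 denotes 3200 million transfers per second and that only the 64 data bits of the 72-bit controller carry payload. The function name is hypothetical.

```python
def ddr_bandwidth_mb_per_s(mega_transfers_per_s: int, data_bits: int) -> float:
    """Theoretical bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * data_bits / 8


# DDR4-3200 with a 64-bit data path (the 8 ECC bits carry no payload).
bandwidth = ddr_bandwidth_mb_per_s(3200, 64)
```

Here `bandwidth` evaluates to 25600.0, matching the theoretical bandwidth stated above for one group of storage units.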

The receiving device is electrically connected with the chip in the chip package structure. The receiving device is used for realizing data transmission between the chip and external equipment (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, and the data to be processed is transmitted from the server to the chip through the standard PCIE interface, so as to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may also be another interface; the present application does not limit the specific form of such an interface, provided that the interface unit can implement the transfer function. In addition, the calculation result of the chip is transmitted back to the external device (e.g., a server) by the receiving device.

The control device is electrically connected with the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include a plurality of processing chips, a plurality of processing cores, or a plurality of processing circuits, and may drive a plurality of loads. Therefore, the chip can be in different working states such as heavy load and light load. The control device can regulate and control the working states of the plurality of processing chips, processing cores and/or processing circuits in the chip.
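The 16000 MB/s figure may be reproduced with the sketch below. The per-lane rate of 8 GT/s and the 128b/130b line encoding are standard PCIE 3.0 parameters that are not stated in the text and are introduced here as assumptions; the function name is hypothetical.

```python
def pcie_bandwidth_mb_per_s(gt_per_s: float, lanes: int,
                            payload_bits: int = 128, encoded_bits: int = 130):
    """Return (raw, effective) one-direction bandwidth in MB/s."""
    raw = gt_per_s * 1000 * lanes / 8                  # line rate, ignoring encoding
    effective = raw * payload_bits / encoded_bits      # after 128b/130b encoding
    return raw, effective


raw, effective = pcie_bandwidth_mb_per_s(8, 16)
```

Under these assumptions, `raw` is 16000.0 MB/s, which corresponds to the theoretical figure above, while `effective` is roughly 15754 MB/s once the line encoding overhead is taken into account.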

In some embodiments, an electronic device is provided that includes the above board card.

The electronic device may be a data processor, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.

The vehicle comprises an airplane, a ship and/or a car; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus and/or an electrocardiograph.

It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required for this application.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The above-mentioned embodiments only express several embodiments of the present invention, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, and these fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A multiplier, characterized in that the multiplier comprises: the output end of the judgment circuit is connected with the input end of the data expansion circuit, the output end of the judgment circuit is connected with the first input end of the coding circuit, the output end of the data expansion circuit is connected with the second input end of the coding circuit, and the output end of the coding circuit is connected with the input end of the compression circuit;

the judging circuit is used for judging whether bit widths of the received data and the processable data of the multiplier are equal or not, and determining whether the data needs to be processed through a data expansion circuit connected with the output end of the judging circuit according to a judging result, the data expansion circuit is used for carrying out expansion processing on the received data, the coding circuit is used for carrying out coding processing on the received data to obtain a partial product of a target code, and the compression circuit is used for carrying out accumulation processing on the partial product of the target code.

2. The multiplier of claim 1, wherein the coding circuit comprises a third input terminal for receiving an input function selection mode signal; the compression circuit comprises a first input end for receiving an input function selection mode signal.

3. The multiplier of claim 1, wherein the decision circuit comprises: a data input port and a data output port; the data input port is used for receiving data for multiplication operation, and the data output port is used for outputting the received data.

4. The multiplier of claim 1, wherein the data expansion circuit comprises: a data input port, a data expansion mode selection signal input port, a function selection mode signal output port and an expanded data output port; the data input port is used for receiving the data output by the judging circuit, the data expansion mode selection signal input port is used for receiving a data expansion mode selection signal corresponding to the expansion processing of the received data, the function selection mode signal output port is used for outputting a function selection mode signal determined according to the mode in which the data expansion circuit performs expansion processing on the received data, and the expanded data output port is used for outputting the data after the expansion processing.

5. The multiplier of claim 1, wherein the encoding circuit comprises: the output end of the Booth coding sub-circuit is connected with the first input end of the partial product acquisition sub-circuit;

6. The multiplier of claim 5, wherein the Booth encoding sub-circuit comprises: a data input port and an encoded signal output port, wherein the data input port is used for receiving data to be subjected to Booth encoding processing, and the encoded signal output port is used for outputting an encoded signal obtained after the Booth encoding processing is performed on the received data.

7. The multiplier of claim 5, wherein the partial product acquisition sub-circuit comprises: an encoded signal input port, a data input port and a partial product output port, wherein the encoded signal input port is used for receiving the encoded signal, the data input port is used for receiving the data, and the partial product output port is used for outputting a partial product of a target code acquired according to the encoded signal and the received data.

8. The multiplier of claim 1, wherein the compression circuit comprises: a Wallace tree group sub-circuit and an accumulation sub-circuit; the output end of the Wallace tree group sub-circuit is connected with the input end of the accumulation sub-circuit; the Wallace tree group sub-circuit is configured to accumulate the partial products of the target code, and the accumulation sub-circuit is configured to accumulate the received input data.

9. The multiplier of claim 8, wherein the wallace tree group subcircuit comprises: a Wallace tree unit to accumulate each column of the partial product of the target code.

10. The multiplier of claim 8, wherein the accumulation sub-circuit comprises: and the adder is used for performing addition operation on the two received data with the same bit width.

11. The multiplier of claim 10, wherein the adder comprises: the input port of the carry signal is used for receiving the carry signal, the input port of the sum signal is used for receiving the sum signal, and the output port of the operation result is used for outputting the result of the accumulation processing of the carry signal and the sum signal.

12. A method of data processing, the method comprising:

receiving data to be processed;

coding the expanded data to obtain a partial product after sign bit expansion;

13. The method according to claim 12, further comprising, after determining whether the bit width of the data to be processed is equal to the bit width of the data processable by the multiplier: if the bit widths are equal, encoding the data to be processed to obtain a partial product after sign bit expansion.

14. The method of claim 12, wherein said encoding the expanded data to obtain a partial product after sign bit expansion comprises:

and obtaining the partial product after the sign bit is expanded according to the data to be processed and the coded signal.

15. The method of claim 14, wherein obtaining the sign-bit-extended partial product based on the data to be processed and the encoded signal comprises:

16. The method according to claim 12, wherein the performing data expansion processing on the data to be processed to obtain expanded data includes: and performing data expansion processing on the data to be processed through 0 or the sign bit value of the data to be processed to obtain expanded data.

17. The method of claim 16, wherein the bit width of the expanded data is equal to the bit width of data currently processed by the multiplier.

18. The method of claim 12, wherein accumulating the sign-bit-extended partial product to obtain an operation result comprises:

accumulating the partial product after the sign bit is expanded through a Wallace tree group sub-circuit to obtain a first operation result;

19. A machine learning arithmetic device, characterized in that the machine learning arithmetic device comprises one or more multipliers as claimed in any one of claims 1 to 11, and is used for acquiring input data and control information to be operated from other processing devices, executing specified machine learning operation, and transmitting the execution result to other processing devices through an I/O interface;

when the machine learning arithmetic device comprises a plurality of multipliers, the multipliers can be connected through a specific structure and transmit data;

the plurality of multipliers are interconnected through a PCIE bus and transmit data so as to support operation of machine learning in a larger scale; a plurality of multipliers share the same control system or own respective control systems; a plurality of multipliers share a memory or own respective memories; the interconnection mode of a plurality of multipliers is any interconnection topology.

20. A combined processing apparatus, characterized in that the combined processing apparatus comprises the machine learning arithmetic apparatus according to claim 19, a universal interconnect interface and other processing apparatus;

and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.

21. The combination processing device of claim 20, further comprising: and a storage device connected to the machine learning arithmetic device and the other processing device, respectively, for storing data of the machine learning arithmetic device and the other processing device.

22. A neural network chip, comprising the machine learning computation device of claim 19 or the combined processing device of claim 20.

23. An electronic device, characterized in that it comprises a chip according to claim 22.

24. A board card, characterized in that the board card comprises: a memory device, a receiving device, a control device and a neural network chip as claimed in claim 22;

wherein the neural network chip is respectively connected with the storage device, the control device and the receiving device;

the storage device is used for storing data;

the receiving device is used for realizing data transmission between the chip and external equipment;

and the control device is used for monitoring the state of the chip.

25. The board of claim 24,

the memory device includes: a plurality of groups of memory cells, each group of memory cells is connected with the chip through a bus, and the memory cells are: DDR SDRAM;

the chip includes: the DDR controller is used for controlling data transmission and data storage of each memory unit; the receiving device is as follows: a standard PCIE interface.