CN114063973B - Galois field multiplier and erasure coding and decoding system

Info

Publication number: CN114063973B
Application number: CN202210039878.0A
Authority: CN
Inventors: 张磊; 王明明; 王凛
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2022-01-14
Filing date: 2022-01-14
Publication date: 2022-04-22
Anticipated expiration: 2042-01-14
Also published as: CN114063973A; WO2023134130A1

Abstract

The application discloses a Galois field multiplier and an erasure coding and decoding system. The Galois field multiplier comprises a plurality of basic operation units connected in series and a plurality of cyclic processing units connected in series, wherein the total number of basic operation units and the total number of cyclic processing units are determined according to the data bit width of the input data of the Galois field multiplier. Each basic operation unit performs a Galois field multiplication of the data it receives by the target generator and outputs the product to the next operation unit and to the corresponding cyclic processing unit. The cyclic processing unit group determines the current loop count according to the input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and outputs the final calculation result. The application can effectively reduce the hardware resources and area consumed by the storage system, and also supports flexible configuration of the Galois field polynomial.

Description

Galois field multiplier and erasure coding and decoding system

Technical Field

The present application relates to the field of computer technologies, and in particular to a Galois field multiplier and an erasure coding and decoding system.

Background

In the fields of data transmission and data storage, erasure codes are favored for their lower storage cost. The RS code (Reed-Solomon code) is a common EC (erasure code) that calculates N parity data blocks from M data blocks. Of the M + N blocks in total, any M intact blocks suffice to recover all of the original data. In data storage in particular, erasure codes are an extremely important means of ensuring data reliability. The RS erasure coding process is shown in fig. 1, where B is the coding matrix, the gray lower half (B11 and so on) is a Cauchy or Vandermonde matrix, D is the disk data to be erasure-coded, and the resulting C is the coded data. When some data blocks are lost, the remaining rows form a new matrix relation; multiplying this relation by the inverse matrix recovers the original data. This is the RS erasure decoding process shown in fig. 2, where Survivors is the data that remains intact after a disk failure, the reduced matrix (written here as B') is formed from the rows of the coding matrix B that correspond to the intact data, and B'^-1 is the inverse matrix of B'.

GF (Galois field) multiplication is widely used in RS encoding and decoding. As distributed storage systems use more and more disks and each disk stores more and more data, high-speed RS erasure calculation has become the main design challenge for high-speed, high-throughput distributed storage, which motivates implementing the Galois field multiplier as a hardware circuit. Multiplication in a Galois field draws on the minimal-polynomial theory used in linear algebra to simplify higher-order matrix operations. The basic idea is as follows: the two vectors are first converted into two polynomials, the polynomials are multiplied, and the product, reduced modulo the primitive polynomial, is converted back into a vector. The traditional Galois field multiplier multiplies first and then takes the modulus, which takes many cycles and is complex to implement. To overcome this drawback, the related art replaces the modulo step with a table lookup, which greatly shortens the operation period.

A generator is a special element of a field: the powers of a generator traverse all nonzero elements of the field. For example, if g is a generator of the field GF(2^w), then the set {g^0, g^1, …, g^(2^w-2)} contains all nonzero elements of GF(2^w). In a field GF(2^w) defined by a primitive polynomial, 2 is always a generator. Applying the generator to polynomials, every nonzero polynomial in GF(2^w) can be obtained as a power of a polynomial generator g; that is, any nonzero element z of the field can be expressed as z = g^k. GF(2^w) is a finite field but the exponent k is unbounded, so the powers must cycle, with a period of 2^w-1; g cannot generate the polynomial 0. When k is greater than or equal to 2^w-1, g^k = g^(k % (2^w-1)), where % denotes the remainder operation. For z = g^k there is a forward process and an inverse process: computing z from a known exponent k is the forward process, and computing k from a known z is the inverse process. For multiplication, assume a = g^i and b = g^j; then a*b = g^i * g^j = g^(i+j). The table lookup method looks up i and j from a and b, and then looks up g^(i+j). Two tables therefore have to be constructed over GF(2^w): a forward table and an inverse table, denoted gflog and gfilog respectively. The forward table gflog maps the binary form to the polynomial (exponent) form, and the inverse table gfilog maps the polynomial form back to the binary form. The formula for table lookup GF multiplication is:

c = a*b = gfilog[(gflog[a] + gflog[b]) mod (2^w-1)];

when the table lookup GF multiplier is implemented in hardware, the calculation can be divided into three steps:

the first step: look up the forward table values corresponding to a and b;

the second step: add the looked-up values and take the remainder;

the third step: look up the inverse table with the remainder to obtain the final result.
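The three steps above can be sketched in software as follows. This is a minimal illustration, not the patented circuit: it assumes w = 8, the primitive polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) and the generator g = 2, none of which are fixed by the text, and the names `build_tables` and `gf_mul_lut` are hypothetical.

```python
# Assumptions (not from the source): w = 8, primitive polynomial 0x11D, generator 2.
W = 8
ORDER = (1 << W) - 1      # 2^w - 1 nonzero elements, also the cycle period
PRIM_POLY = 0x11D         # x^8 + x^4 + x^3 + x^2 + 1


def build_tables():
    """Build the forward table gflog (z -> k) and the inverse table gfilog (k -> z)."""
    gflog = [0] * (ORDER + 1)   # entry 0 is unused: 0 has no logarithm
    gfilog = [0] * ORDER
    z = 1                       # g^0
    for k in range(ORDER):
        gfilog[k] = z
        gflog[z] = k
        z <<= 1                 # multiply by the generator 2
        if z & (1 << W):        # degree reached w: reduce modulo the polynomial
            z ^= PRIM_POLY
    return gflog, gfilog


GFLOG, GFILOG = build_tables()


def gf_mul_lut(a, b):
    """c = gfilog[(gflog[a] + gflog[b]) mod (2^w - 1)], with 0 handled separately."""
    if a == 0 or b == 0:
        return 0
    i, j = GFLOG[a], GFLOG[b]   # step 1: two forward-table lookups
    k = (i + j) % ORDER         # step 2: add and take the remainder
    return GFILOG[k]            # step 3: one inverse-table lookup
```

Each call performs two forward lookups and one inverse lookup, which is why a hardware version of this scheme has to hold several tables per multiplier.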

For real-time operation, the hardware multiplier architecture is shown in fig. 3: a hardware GF multiplier needs to store two forward tables and one inverse table. In general, once the data bit width w of the GF multiplier is fixed, the remainder operation reduces to subtracting a constant, so resources are consumed mainly by the table lookups. As the algorithm and its hardware implementation show, the table lookup GF multiplier is simple in principle, low in computational complexity and fast, but its use of several LUTs (lookup tables) costs considerable hardware resources and chip area. When an erasure coding and decoding system handles a large volume of data, a large number of GF multipliers are needed, and the number of lookup tables grows in proportion to the number of GF multipliers. The hardware resource overhead then increases, so the original advantages of the table lookup GF multiplier become less pronounced, and chip area and space resources grow as well.

In view of this, how to reduce the hardware resources consumed in the GF multiplier lookup table process is a technical problem to be solved by those skilled in the art.

Disclosure of Invention

The application provides a Galois field multiplier and an erasure coding and decoding system, which not only effectively reduce hardware resources consumed by a GF multiplier lookup table, but also support flexible configuration of Galois field polynomials.

In order to solve the above technical problems, embodiments of the present invention provide the following technical solutions:

one aspect of the embodiments of the present invention provides a Galois field multiplier, which includes a basic operation unit group and a cyclic processing unit group;

the basic operation unit group comprises a start operation unit, a plurality of intermediate operation units and a termination operation unit which are connected in series, and the cyclic processing unit group comprises a plurality of cyclic processing units which are connected in series; the total number of operation units contained in the basic operation unit group and the total number of cyclic processing units contained in the cyclic processing unit group are determined according to the data bit width of the input data of the Galois field multiplier;

the start operation unit is used for performing a Galois field multiplication of the first input data by the target generator and outputting the multiplication result to the next operation unit and the corresponding cyclic processing unit; each intermediate operation unit is used for performing a Galois field multiplication of the received data by the target generator and outputting the multiplication result to the next operation unit and the corresponding cyclic processing unit; the termination operation unit is used for performing a Galois field multiplication of the received data by the target generator and outputting the multiplication result to the corresponding cyclic processing unit;

and the cyclic processing unit group is used for determining the current loop count according to the second input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and outputting the final calculation result.

Optionally, the total number of cyclic processing units included in the cyclic processing unit group is the same as the data bit width of the input data of the Galois field multiplier; each bit of the second input data corresponds to one cyclic processing unit.

Optionally, the total number of operation units included in the basic operation unit group equals the data bit width of the input data of the Galois field multiplier minus 1.

Optionally, each loop processing unit of the loop processing unit group includes a register, an exclusive or gate, and a selector; the register is respectively connected with the exclusive-OR gate and the selector, and the exclusive-OR gate is connected with the selector;

the register is used for storing the original data received by the cyclic processing unit and for timing alignment; the original data is either the initialization data or the previous data result output by the preceding cyclic processing unit;

the exclusive-OR gate is used for performing exclusive-OR calculation on the multiplication calculation result output by the arithmetic unit corresponding to the cyclic processing unit and the original data and outputting the exclusive-OR calculation result to the selector;

the selector is configured to select target data as an output result from the original data and the xor calculation result according to a bit value of the second input data.

Optionally, the register is a D-type flip-flop.

Another aspect of the embodiments of the present invention provides an erasure coding and decoding system, including:

the system comprises a data distribution module, an operation module and a reordering module;

the data distribution module is used for distributing the data to be erasure-coded to obtain a plurality of rows of data to be calculated;

the operation module comprises a plurality of operation sub-modules, and each operation sub-module is used for performing multiplication and accumulation calculation on a line of data to be calculated; each operational submodule comprises an adder and a plurality of Galois field multipliers as described in any of the previous paragraphs; the total number of the Galois field multipliers is determined according to the number of bus bytes; the adder is used for performing accumulation operation on the multiplication calculation result output by each Galois field multiplier;

and the reordering module is used for splicing the multiply-accumulate calculation results output by each operation submodule according to the distribution sequence.

Optionally, the operation sub-module further includes a PE controller, and the PE controller is respectively connected to the adder and each galois field multiplier;

and the PE controller is used for determining the number of accumulation iterations of the adder and the number of times the data to be erasure-coded is used, according to the total number of multiply-accumulate calculations.

Optionally, the operation submodule further includes an EC block output unit;

the EC block output unit is used for controlling the backpressure of the operation submodule, and whether the reordering module performs an output operation, according to the operation state of the operation submodule and the output state of the reordering module.

Optionally, the total number of Galois field multipliers included in the operation submodule is the same as the number of bus bytes; the total number of Galois field multipliers contained in the operation module is the product of the number of matrix rows of the data to be erasure-coded and the number of bus bytes.

Optionally, the adder is a galois field adder.

The technical solution provided by the application has the following advantages. The Galois field multiplier is structured as a pipeline, so the Galois field polynomial can be changed in real time as required; the polynomial is configurable, the fixed forward and inverse tables of the table lookup method are no longer needed, and the flexibility of the GF multiplier is effectively improved. Replacing the original table lookup structure with the pipelined GF multiplier structure avoids the several lookup tables on which a traditional table lookup GF multiplier spends hardware resources. This greatly reduces the space resources and chip area taken by GF multipliers in RS erasure coding and decoding, lowers the hardware resources and area consumed by the storage system, and does not affect the timeliness of the calculation.

In addition, the embodiment of the invention also provides an erasure coding and decoding system that uses the Galois field multiplier, which makes the Galois field multiplier more practical; the erasure coding and decoding system has the corresponding advantages.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the related art, the drawings required to be used in the description of the embodiments or the related art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic diagram of an RS erasure coding process in an exemplary application scenario according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an RS erasure decoding process in an exemplary application scenario according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a hardware implementation of a table lookup GF multiplier in an exemplary application scenario according to an embodiment of the present invention;

Fig. 4 is a structural diagram of a specific implementation of a Galois field multiplier according to an embodiment of the present invention;

Fig. 5 is a structural diagram of another specific implementation of a Galois field multiplier according to an embodiment of the present invention;

Fig. 6 is a structural diagram of a specific implementation of a Galois field multiplier in an illustrative example according to an embodiment of the present invention;

Fig. 7 is a structural diagram of a specific implementation of a loop processing unit in an illustrative example according to an embodiment of the present invention;

Fig. 8 is a structural diagram of another specific implementation of a loop processing unit in an illustrative example according to an embodiment of the present invention;

Fig. 9 is a structural diagram of a specific implementation of an erasure coding and decoding system according to an embodiment of the present invention;

Fig. 10 is a structural diagram of another specific implementation of an erasure coding and decoding system according to an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," "fourth," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may include other steps or elements not expressly listed.

Having described the technical solutions of the embodiments of the present invention, various non-limiting embodiments of the present application are described in detail below.

Referring to fig. 4, fig. 4 is a schematic diagram of a structural framework of a galois field multiplier according to an implementation manner, where the galois field multiplier according to an embodiment of the present invention includes the following contents:

The Galois field multiplier of this embodiment includes a basic operation unit group 41 and a loop processing unit group 42 (the cyclic processing unit group described above). All operation units in the basic operation unit group 41 have the same structure, and all loop processing units in the loop processing unit group 42 have the same structure. To describe the connections and the data flow more clearly, the basic operation unit group 41 may be regarded as a start operation unit, a plurality of intermediate operation units and a termination operation unit connected in series; likewise, the loop processing unit group 42 may be regarded as a start loop processing unit, a plurality of intermediate loop processing units and a termination loop processing unit connected in series. The start operation unit is the first operation unit in the series of the basic operation unit group 41 and receives original data, namely one multiplier (polynomial) of the Galois field calculation, referred to in this embodiment as the first input data. The termination operation unit is the last operation unit in the series, and the intermediate operation units are those connected between the start and termination operation units. Similarly, the start loop processing unit is the first loop processing unit in the series of the loop processing unit group 42 and receives original data, namely the other multiplier (polynomial) of the Galois field calculation, referred to in this embodiment as the second input data. The termination loop processing unit is the last loop processing unit in the series, and the intermediate loop processing units are those connected between the start and termination loop processing units.
The total number of operation units in the basic operation unit group 41 and the total number of loop processing units in the loop processing unit group 42 are determined according to the data bit width of the input data of the Galois field multiplier, that is, of the first input data and the second input data. Optionally, the total number of loop processing units in the loop processing unit group 42 equals the data bit width of the input data of the Galois field multiplier; accordingly, each bit of the second input data corresponds to one loop processing unit. The total number of operation units in the basic operation unit group 41 equals the data bit width minus 1. For example, if the input data is 8 bits wide, the loop processing unit group 42 contains 8 loop processing units and the basic operation unit group 41 contains 7 operation units. If the input data is 16 bits wide, there are 16 loop processing units and 15 operation units. In general, if the input data is N bits wide, there are N loop processing units and N-1 operation units. If the operation units in the basic operation unit group 41 are gmul2 modules and the loop processing units in the loop processing unit group 42 are cacu&sel modules, the structure of the Galois field multiplier can be as shown in fig. 5 and fig. 6. Fig. 5 shows the case where the input data is 8 bits wide, with 7 gmul2 modules and 8 cacu&sel modules. Fig. 6 shows the case where the input data is N bits wide, with N-1 gmul2 modules and N cacu&sel modules.

In this embodiment, the start operation unit performs a Galois field multiplication of the first input data by the target generator and outputs the product to the next operation unit and to the corresponding loop processing unit. The target generator may be any generator, for example 2. Each intermediate operation unit performs a Galois field multiplication of the data it receives by the target generator and outputs the product to the next operation unit and to the corresponding loop processing unit; the data received by each intermediate operation unit is the product output by the preceding operation unit. The termination operation unit performs a Galois field multiplication of the data it receives by the target generator and outputs the product to the corresponding loop processing unit. The loop processing unit group determines the current loop count according to the second input data, the initialization data and the Galois field multiplication results output by the basic operation unit group, and outputs the final calculation result. The initialization data may be an initial value with the same data bit width as the input data; for example, for input data 8 bits wide the initialization data may be 8'd0. Specifically, the start loop processing unit determines the current loop count according to the corresponding bit value of the second input data, the initialization data and the first input data, and outputs the current calculation result to the next loop processing unit. Each intermediate loop processing unit determines the current loop count according to the calculation result input by the preceding loop processing unit, the product input by the corresponding operation unit and the corresponding bit value of the second input data, and outputs the current calculation result to the next loop processing unit. The termination loop processing unit determines the final calculation result according to the calculation result input by the preceding loop processing unit, the product input by the corresponding operation unit and the corresponding bit value of the second input data, and outputs it. The final calculation result is the Galois field product of the first input data and the second input data.
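The data flow described above can be modelled behaviourally as follows. This is only a sketch under stated assumptions: registers and clock timing are ignored, the generator is taken to be 2, and the default polynomial 0x11B is an assumption chosen to match the AES polynomial used in the illustrative example of this description. The function names mirror the gmul2 and cacu&sel module names but are otherwise hypothetical.

```python
def gmul2(v, poly=0x11B, width=8):
    """One basic operation unit: multiply v by the generator 2 in GF(2^width)."""
    v <<= 1
    if v & (1 << width):   # overflow past the top bit: reduce by the field polynomial
        v ^= poly          # poly includes the x^width term, so this also clears that bit
    return v


def cacu_sel(pre_result, operand, bit):
    """One loop processing unit: XOR the operand in when the controlling bit is 1."""
    return pre_result ^ operand if bit else pre_result


def gf_mul_pipeline(data_a, data_b, poly=0x11B, width=8):
    """Chain of width-1 operation units feeding width loop processing units."""
    # The first loop processing unit sees data_b itself; unit n sees data_b * 2^n.
    operands = [data_b]
    for _ in range(width - 1):
        operands.append(gmul2(operands[-1], poly, width))

    result = 0                                   # initialization data, e.g. 8'd0
    for n in range(width):
        bit = (data_a >> n) & 1                  # bit n of the second input data
        result = cacu_sel(result, operands[n], bit)
    return result
```

Because the polynomial is an ordinary argument rather than a pair of precomputed tables, the same structure works for another field polynomial, for example `gf_mul_pipeline(0x80, 2, poly=0x11D)`.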

In the technical solution provided by the embodiment of the invention, the Galois field multiplier is structured as a pipeline, so the Galois field polynomial can be changed in real time as required; the polynomial is configurable, the fixed forward and inverse tables of the table lookup method are no longer needed, and the flexibility of the GF multiplier is effectively improved. Replacing the original table lookup structure with the pipelined GF multiplier structure avoids the several lookup tables on which a traditional table lookup GF multiplier spends hardware resources. This greatly reduces the space resources and area taken by GF multipliers in RS erasure coding and decoding, lowers the hardware resources and area consumed by the storage system, and does not affect the timeliness of the calculation.

The above embodiment does not limit the structure of the loop processing unit. This embodiment further provides an optional implementation of the loop processing unit, shown in fig. 7, in which the loop processing unit determines the loop count from the bit value (0 or 1) of the second input data by selecting whether the result of the operation unit, such as gmul2 in fig. 5, is used. It may include the following:

in the present embodiment, each loop processing unit of the loop processing unit group 42 includes a register, an exclusive or gate, and a selector; the register is respectively connected with the exclusive-OR gate and the selector, and the exclusive-OR gate is connected with the selector.

The register stores the original data received by the loop processing unit and performs timing alignment. The original data is either the initialization data or the previous data result: for the register of the start loop processing unit it is the initialization data, and for the registers of the intermediate and termination loop processing units it is the calculation result output by the preceding loop processing unit, which this embodiment calls the previous data result. The register may be, for example, a D-type flip-flop (DFF). Besides serving as the storage element that holds data throughout the GF multiplier, the DFF register also provides the sequential logic of the hardware design, that is, the timing alignment required by the pipeline. Other types of device may also be used without affecting the implementation of the present application.

The exclusive-OR gate performs an exclusive-OR of the multiplication result output by the operation unit corresponding to the loop processing unit with the original data, and outputs the exclusive-OR result to the selector. In this embodiment the register supplies the original data to the exclusive-OR gate, delayed by one clock cycle so that it arrives together with the result computed by the corresponding operation unit.

The selector selects the target data, as the output result, from the original data and the exclusive-OR result according to the bit value of the second input data. Optionally, when the bit value of the second input data is 0 the original data is selected as the target data, and when the bit value is 1 the exclusive-OR result is selected. The opposite convention is also possible: the original data is selected when the bit value is 1 and the exclusive-OR result when it is 0. The selector of this embodiment is a two-to-one selector, which chooses whether the calculation uses the previous data result XORed with the output of the operation unit or only the previous data result. With reference to fig. 7 and fig. 8, the two-to-one selector chooses between pre_result XOR gmul2 and pre_result alone: when the bit data_a[n] is 1, pre_result XOR gmul2 is selected; when data_a[n] is 0, pre_result is selected.
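The register, exclusive-OR gate and selector of one loop processing unit can be modelled as a small class. This is a hedged sketch: the class name `LoopProcessingUnit` and its methods are hypothetical, the clock is reduced to an explicit `clock()` call, and the convention that bit value 1 selects the exclusive-OR result is the first of the two options given above.

```python
class LoopProcessingUnit:
    """Behavioural model of one cacu&sel unit: register, XOR gate, 2-to-1 selector."""

    def __init__(self):
        self.reg = 0  # D flip-flop contents; 0 doubles as the initialization data

    def clock(self, pre_result):
        """Clock edge: latch the previous data result (or the initialization data)."""
        self.reg = pre_result

    def output(self, gmul2_result, bit):
        """Combinational part: XOR gate followed by the two-to-one selector."""
        xor_result = self.reg ^ gmul2_result       # exclusive-OR gate
        return xor_result if bit else self.reg     # selector driven by data_a[n]
```

For instance, after `unit.clock(0x57)`, `unit.output(0xAE, 1)` yields 0x57 XOR 0xAE, while `unit.output(0xAE, 0)` passes 0x57 through unchanged.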

As can be seen from the above, the pipelined Galois field multiplier of this embodiment can be implemented with simple exclusive-OR gates and selectors, which improves the flexibility of the GF multiplier, reduces the hardware resources consumed by table lookup, and supports a configurable polynomial.

To make the technical solutions of the present application clearer to those skilled in the art, an illustrative example of a Galois field multiplier is given with reference to fig. 6, which may include the following:

This embodiment takes the irreducible polynomial specified by the AES (Advanced Encryption Standard) algorithm, $m(x) = x^8 + x^4 + x^3 + x + 1$, as an example for the analysis. To facilitate implementation, the pattern is identified first. Assume a function that represents Galois field multiplication of two numbers in the field. First consider the Galois field calculation of multiplication by 2, i.e. a left shift.

When the polynomial corresponding to the result has a degree of at most 7, it can be seen that multiplying a number by 2 in the Galois field equals shifting this number left by one bit. When the degree of the corresponding polynomial is greater than 7, i.e. the highest bit of the number is 1, the shifted result is further reduced modulo the irreducible polynomial.

From these cases the calculation rule for multiplication by 2 can be summarized: the number is shifted left by one bit and, if its highest bit was 1, the shifted value is XORed with the irreducible polynomial. Expanding the other multiplier bit by bit, a general multiplication is therefore equivalent to the XOR of the results of applying this rule 7 times, 6 times, …, 1 time and 0 times, where each term is included only if the corresponding bit of the other multiplier is 1. Here ^ represents an exclusive-or operation, << indicates a left shift, and >> indicates a right shift.

Based on the above derivation of the Galois field polynomial multiplier, the module is designed as the pipelined Galois field multiplier architecture shown in figs. 5 and 6. That is, in this embodiment the operation formula of the GF multiplier is result = data_a × data_b, where data_a and data_b are the two multipliers, i.e. the two input data of the GF multiplier. The external interface of the GF multiplier has only two 8-bit inputs. The initialization data 8'd0 is defined according to the 8-bit width of the multiplicand; if the bit width were 16 bits, the initialization data would be 16'd0. The initialization value of 0 is fixed and is not visible at the external interface. With an input data bit width of 8 bits and initialization data 8'd0, the Galois field multiplier includes 7 serially connected basic operation units and 8 serially connected cacu&sel units. The basic operation units serve as the arithmetic units, cacu&sel is the cyclic processing unit, and each bit of the 8-bit input data is input into one cacu&sel.
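The derivation above can be written out explicitly. The following is a reconstruction under assumed notation, not the formulas of the original figures: $a$ and $b$ stand for data_a and data_b, $a_7$ and $b_i$ denote individual bits, and $\operatorname{xt}$ is a hypothetical name for multiplication by 2.

```latex
\begin{align*}
m(x) &= x^8 + x^4 + x^3 + x + 1 \\
\operatorname{xt}(a) &=
  \begin{cases}
    a \ll 1, & a_7 = 0 \\
    (a \ll 1) \oplus \mathtt{0x11B}, & a_7 = 1
  \end{cases} \\
a \cdot b &= \bigoplus_{i=0}^{7} b_i \, \operatorname{xt}^{\,i}(a),
  \qquad b_i = (b \gg i) \,\&\, 1
\end{align*}
```

Here $\operatorname{xt}^{\,i}$ means applying $\operatorname{xt}$ $i$ times (with $\operatorname{xt}^{\,0}(a) = a$), and a term contributes only when $b_i = 1$. The seven applications of $\operatorname{xt}$ correspond to the 7 basic operation units, and the eight selected terms correspond to the 8 cacu&sel units.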

From the above, the pipelined GF multiplier implemented by the embodiment of the present invention has the characteristics of low resource consumption, fast performance, high throughput, good flexibility, novelty, creativity, simplicity, and practicality.
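The structure of 7 basic operation units and 8 cacu&sel units can be modelled in software as a minimal sketch. The function names `mul_by_generator`, `cacu_sel` and `gf_multiply` are hypothetical, and the model is purely sequential: it assumes the AES polynomial and an 8-bit width and does not reproduce the registers or timing of the hardware pipeline.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def mul_by_generator(value):
    """Model of one basic operation unit: multiply by 2 in GF(2^8)."""
    value <<= 1
    if value & 0x100:        # highest bit was 1 before the shift
        value ^= AES_POLY    # reduce modulo the irreducible polynomial
    return value


def cacu_sel(acc, term, select_bit):
    """Model of one cacu&sel unit: XOR the term in only if the bit is 1."""
    return acc ^ term if select_bit else acc


def gf_multiply(data_a, data_b):
    """result = data_a x data_b in GF(2^8), built from the two unit types."""
    acc = 0                  # initialization data, 8'd0
    term = data_a
    for i in range(8):       # 8 cacu&sel stages, one per bit of data_b
        acc = cacu_sel(acc, term, (data_b >> i) & 1)
        if i < 7:            # 7 basic operation units between the stages
            term = mul_by_generator(term)
    return acc
```

For instance, `gf_multiply(0x57, 0x83)` gives `0xC1`, which matches the worked multiplication example in the AES specification.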

The embodiment of the present invention also provides a system for a corresponding application scenario of the Galois field multiplier, which makes the Galois field multiplier more practical. Referring to fig. 9, the erasure coding and decoding system provided by the embodiment of the present invention is described as follows and includes the following contents:

The erasure coding and decoding system may include a data distribution module 91, an operation module 92, and a reordering module 93. The data distribution module 91, the operation module 92, and the reordering module 93 are connected to each other via a bus.

The data distribution module 91 may be configured to distribute the data to be erasure-corrected to obtain multiple rows of data to be calculated. The data to be erasure-corrected includes matrix data and disk data. For the erasure encoding process, it is the encoding matrix data and the original disk data, such as the B matrix and the D data in fig. 1. For the erasure decoding process, it is the inverse of the matrix re-formed from the rows of the encoding matrix that correspond to the normal data, together with the normal data remaining after a disk becomes abnormal, such as the inverse matrix and the Survivors data in fig. 2. The whole erasure coding and decoding process can be understood as matrix multiplication, whose fundamental calculation is multiply-accumulate. Because corresponding rows need to be multiplied and accumulated, the data distribution module 91 splits the matrix data into multiple rows of data, multiplication is performed on each row of data, and finally the multiplication results of each row are accumulated to obtain the final result.

The operation module 92 may include a plurality of operation submodules, each of which performs multiplication based on Galois field multipliers. A Galois field multiplier does not process all bytes of a row of data at once; it processes only the number of bytes it supports per operation. Each operation submodule therefore includes a plurality of Galois field multipliers and an adder that accumulates the calculation results of the Galois field multipliers. That is, each operation submodule is configured to perform the multiply-accumulate calculation on one row of data to be calculated, and comprises an adder and a plurality of Galois field multipliers as described in any of the above embodiments. The total number of Galois field multipliers is determined by the number of bus bytes. Optionally, if the Galois field multiplier operates on a single byte, the total number of Galois field multipliers in each operation submodule equals the number of bus bytes, and the total number of Galois field multipliers in the operation module is the product of the number of matrix rows of the data to be erasure-corrected and the number of bus bytes. The adder accumulates the multiplication results output by the Galois field multipliers and may be, for example, a Galois field adder. The reordering module 93 is configured to splice the multiply-accumulate results output by the operation submodules according to the order in which the data distribution module 91 distributed the data to be erasure-corrected.
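As a rough illustration of one operation submodule, the sketch below multiplies a matrix row and a data block byte by byte and accumulates the products with a Galois field adder (XOR). The names `gf_mul` and `operation_submodule` are hypothetical. Reducing the per-byte products to a single byte is one possible reading of the accumulation described above, and the AES polynomial is assumed.

```python
from functools import reduce
from operator import xor


def gf_mul(a, b, poly=0x11B):
    """Shift-and-XOR multiplication of two bytes in GF(2^8)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:        # reduce when the shift overflows 8 bits
            a ^= poly
    return result


def operation_submodule(matrix_row, data_block):
    """One multiplier per bus byte, followed by XOR accumulation."""
    if len(matrix_row) != len(data_block):
        raise ValueError("row and data block must have the same length")
    # In hardware these products would be computed in parallel.
    products = [gf_mul(m, d) for m, d in zip(matrix_row, data_block)]
    return reduce(xor, products, 0)   # Galois field addition is XOR
```

With a 16-byte bus, `matrix_row` and `data_block` would each hold 16 bytes, so 16 products are formed per call.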

Furthermore, because each operation submodule needs to execute a plurality of multiply-accumulate calculations, and in order to ensure that they are executed correctly, each operation submodule may, based on the above embodiment, further include a PE controller connected to the adder and to each Galois field multiplier. The PE controller is used for determining the number of accumulation iterations of the adder and the number of times the data to be erasure-corrected is used, according to the total number of multiply-accumulate calculations.

Since the operation module includes a plurality of operation submodules and the reordering module 93 outputs the final calculation result, each operation submodule may, based on the above embodiment, further include an EC block output unit in order to ensure that the result is output without error. The EC block output unit is used for controlling, according to the operation state of the operation submodule and the output state of the reordering module, whether back pressure is applied to the operation submodule and whether the reordering module performs an output operation.

In order to make the technical solutions of the present application more obvious to those skilled in the art, the present application also provides an illustrative example in conjunction with fig. 10, which may include the following:

In this embodiment, the bus is 16 bytes wide, the Galois field multiplier operates on a single byte, and the adder is a Galois field adder. The whole erasure coding and decoding process can be understood as matrix multiplication, whose fundamental calculation is multiply-accumulate, in which corresponding rows need to be multiplied and accumulated. The bus data of each row is divided into 16 bytes, and since the multiplier operates on a single byte, the bus data can be processed by 16 multipliers in parallel. That is, each operation submodule includes 16 Galois field multipliers operating in parallel, and the total number of Galois field multipliers in the operation module is the number of matrix rows of the data to be erasure-corrected multiplied by the number of bus bytes.

In this embodiment, the data distribution module distributes the matrix and the data for the erasure calculation according to the number of matrix rows to be calculated. Specifically, the matrix in the data to be erasure-corrected is an RS/RS^-1 matrix of 4 × 16 bytes, which is split into 4 rows of 16 bytes each. A 16-byte block of data is fed into all rows, so the input to each row is a 16-byte data block and a 16-byte matrix block. Using 16 GF multipliers, each byte of the data block and each byte of the matrix block are respectively used as the two inputs data_a and data_b of a GF multiplier. The PE controller calculates the number of accumulation iterations and the time at which the operation finishes according to the number of multiply-accumulate calculations; the number of times the RS/RS^-1 matrix and the adder are used is determined by the scheduling of the PE controller. The GF adder is in essence an exclusive-or operation and is used in this embodiment for the accumulation in the matrix multiply-accumulate. The EC block output unit controls whether back pressure is applied to the preceding stage and whether the subsequent stage outputs, according to the operation state of the preceding-stage operation module and the output state of the subsequent-stage module. As for the EC data block reordering module, each row of the matrix operation has its own EC block output unit, so the position of each row's calculated result must be determined and the results spliced together again before being sent out.
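The flow of this example (distribute 4 matrix rows, multiply-accumulate each row against the same 16-byte data block, then splice the results back in order) can be sketched as follows. All function names are hypothetical, and the sketch makes simplifying assumptions: each row yields one accumulated byte, rows are processed one after another rather than in parallel, and the PE controller and back pressure are not modelled.

```python
def gf_mul(a, b, poly=0x11B):
    """Shift-and-XOR multiplication of two bytes in GF(2^8)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= poly
    return result


def distribute(matrix):
    """Data distribution: tag every matrix row with its row index."""
    return list(enumerate(matrix))


def multiply_accumulate(row, data_block):
    """One operation submodule: per-byte products accumulated by XOR."""
    acc = 0
    for m, d in zip(row, data_block):
        acc ^= gf_mul(m, d)
    return acc


def reorder(tagged_results):
    """Reordering: splice the row results back in distribution order."""
    return bytes(value for _, value in sorted(tagged_results))


def erasure_step(matrix, data_block):
    tagged = [(i, multiply_accumulate(row, data_block))
              for i, row in distribute(matrix)]
    tagged.reverse()          # simulate rows finishing out of order
    return reorder(tagged)
```

With a 4 × 16 matrix whose row i holds a single 1 in column i, the four output bytes are simply the first four bytes of the data block, which gives a quick sanity check of the ordering.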

The functions of each functional module of the Galois field multiplier according to the embodiment of the present invention may refer to the description related to the implementation process in the above embodiments, and are not described herein again.

From the above, in the embodiment of the present invention, from the overall system perspective, the pipelined GF multiplier fully meets the functional requirements of erasure coding and decoding, is easy to implement in hardware, and can ensure high computational efficiency and data throughput. Compared with a table-lookup GF multiplier, the resource consumption of the hardware implementation is greatly reduced, and the Galois field polynomial can be changed flexibly in real time according to the requirements of the various system applications to be computed.
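To illustrate why a shift-and-XOR multiplier makes the polynomial configurable where a lookup table would have to be regenerated, the following sketch takes the polynomial and the bit width as parameters. The function name and signature are hypothetical; 0x11B is the AES polynomial used in the example above, and 0x11D is another degree-8 polynomial that is common in Reed-Solomon implementations.

```python
def gf_mul(a, b, poly, width=8):
    """Multiply a and b in GF(2^width) modulo the given polynomial."""
    mask = (1 << width) - 1
    high_bit = 1 << (width - 1)
    reduction = poly & mask          # polynomial without its top term
    result = 0
    for _ in range(width):
        if b & 1:
            result ^= a
        b >>= 1
        carry = a & high_bit         # highest bit before the shift
        a = (a << 1) & mask
        if carry:
            a ^= reduction           # reduce modulo the polynomial
    return result
```

Changing the field only means passing a different `poly` (and `width`), for example `gf_mul(0x80, 2, 0x11B)` versus `gf_mul(0x80, 2, 0x11D)`; no table has to be rebuilt.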

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

A Galois field multiplier and an erasure coding/decoding system provided by the present application are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present application.

Claims

1. A Galois field multiplier is characterized by comprising a basic operation unit group and a cyclic processing unit group;

the loop processing unit group is used for determining the current loop times according to second input data, initialization data and a Galois field multiplication result output by the basic operation unit group and outputting a final calculation result;

each loop processing unit of the loop processing unit group comprises a register, an exclusive-OR gate and a selector; the register is respectively connected with the exclusive-OR gate and the selector, and the exclusive-OR gate is connected with the selector;

2. The Galois field multiplier of claim 1, wherein the total number of loop processing units contained in the loop processing unit group is the same as the data bit width value of the input data of the Galois field multiplier; each bit of the second input data corresponds to one loop processing unit.

3. The Galois field multiplier of claim 2, wherein the total number of operation units contained in the basic operation unit group is the difference between the data bit width value of the input data of the Galois field multiplier and 1.

4. The Galois field multiplier of claim 1, wherein the registers are D-type flip-flops.

5. An erasure coding and decoding system is characterized by comprising a data distribution module, an operation module and a reordering module;

the operation module comprises a plurality of operation submodules, and each operation submodule is used for performing multiply-accumulate calculation on a row of data to be calculated; each operation submodule comprises an adder and a plurality of Galois field multipliers as claimed in any one of claims 1 to 4; the total number of the Galois field multipliers is determined according to the number of bus bytes; the adder is used for performing an accumulation operation on the multiplication results output by the Galois field multipliers;

6. An erasure coding and decoding system according to claim 5, wherein said operation submodule further comprises a PE controller, said PE controller being respectively connected to said adder and each of said Galois field multipliers;

7. The erasure coding and decoding system according to claim 6, wherein the operation sub-module further comprises an EC-block output unit;

8. The erasure coding and decoding system according to any one of claims 5 to 7, wherein the total number of Galois field multipliers contained in said operation submodule is the same as the number of bus bytes; the total number of Galois field multipliers contained in the operation module is the product of the number of matrix rows of the data to be erasure-corrected and the number of bus bytes.

9. An erasure coding and decoding system according to claim 8, wherein said adder is a Galois field adder.