CN112712456A - GPU processing circuit structure
- Publication number: CN112712456A (application CN202110200065.0A)
- Authority: CN (China)
- Prior art keywords: unit, forwarding, data, computing, composite
- Legal status: Withdrawn (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T1/20 - Processor architectures; processor configuration, e.g. pipelining (G06T1/00 - general purpose image data processing)
- G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (G06F17/10 - complex mathematical operations)
Abstract
The present application provides a GPU processing circuit structure for use in an electronic device. The structure includes a storage unit, n control units, computing units, signaling forwarding units, data forwarding units, and composite forwarding units; the total number of computing units, signaling forwarding units, data forwarding units, and composite forwarding units is m × n. The technical scheme provided by the application has the advantage of high image processing speed.
Description
Technical Field
The present application relates to the field of data processing technology, and in particular to a GPU processing circuit structure.
Background
A graphics processing unit (GPU), also called a display core, visual processor, or display chip, is a microprocessor dedicated to image- and graphics-related computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones).
Conventional GPU structures suffer from low image processing speed, which limits overall performance.
Disclosure of Invention
An embodiment of the present application provides a GPU processing circuit structure that has the advantage of high image processing speed.
In a first aspect, an embodiment of the present application provides a GPU processing circuit structure for use in an electronic device, the GPU processing circuit structure comprising:
a storage unit, n control units, computing units, signaling forwarding units, data forwarding units, and composite forwarding units, wherein the total number of computing units, signaling forwarding units, data forwarding units, and composite forwarding units is m × n;
wherein the n control units are arranged in a column and the m × n units are arranged in a matrix, m being the number of rows and n the number of columns of the matrix; the n control units are respectively connected to the n units in the first column of the matrix, and the storage unit is respectively connected to the m units in the last row of the matrix;
the m × n units arranged in the matrix comprise a common area, a signaling forwarding area, a data forwarding area, and a composite forwarding area; the common area contains only computing units; the signaling forwarding area contains computing units and signaling forwarding units; the data forwarding area contains computing units and data forwarding units; the composite forwarding area contains computing units and composite forwarding units;
the control units are configured to send computation instructions to the computing units, the signaling forwarding units, and the composite forwarding units;
the storage unit is configured to store computation data and computation results;
the storage unit has a plurality of IO interfaces, which are respectively connected to the m computing units, data forwarding units, and composite forwarding units in the last row of the matrix;
the computing units are configured to perform operations on the computation data according to the computation instructions to obtain computation results, and to send the computation results to the storage unit;
the signaling forwarding unit is configured to receive a computation instruction sent by the control unit and forward it to the 8 computing units at the edge of its 3 × 3 array;
the data forwarding area comprises a plurality of data sub-regions, each arranged as a 3 × 3 array; the data forwarding unit is located at the center of the 3 × 3 array and is respectively connected to the 8 computing units at its edge; the data forwarding unit is configured to extract computation data from the storage unit and forward it to the 8 computing units at the edge of its 3 × 3 array;
the composite forwarding area comprises a plurality of composite sub-regions, each arranged as a 3 × 3 array; the composite forwarding unit is located at the center of the 3 × 3 array and is respectively connected to the 8 computing units at its edge; the composite forwarding unit is configured to receive computation instructions sent by the control unit, extract computation data from the storage unit, and forward both to the 8 computing units at the edge of its 3 × 3 array.
m and n are integers greater than or equal to 5, and m is greater than or equal to n.
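The layout above can be sketched in code. This is a minimal illustrative model, not the patented circuit: the function name `build_grid`, the cyclic assignment of forwarding kinds to tiles, and the exact tile positions are assumptions made for this sketch; the patent fixes only the four area types, the 3 × 3 sub-arrays with a forwarding unit at each center, and the constraints on m and n.

```python
def build_grid(m, n):
    """Return an m x n grid where each cell is 'compute' or a forwarding kind.

    Illustrative tiling: the grid is cut into 3x3 tiles; a 'common' tile
    keeps all 9 compute units, while the other tiles place one forwarding
    unit at their center and keep the 8 edge cells as compute units.
    """
    assert m >= 5 and n >= 5 and m >= n  # constraints stated in the patent
    grid = [["compute"] * n for _ in range(m)]
    kinds = ["common", "signaling", "data", "composite"]  # assumed cycle
    k = 0
    for r0 in range(0, m - 2, 3):
        for c0 in range(0, n - 2, 3):
            kind = kinds[k % 4]
            if kind != "common":             # common tiles stay all-compute
                grid[r0 + 1][c0 + 1] = kind  # center of the 3x3 tile
            k += 1
    return grid

grid = build_grid(6, 6)  # the m = n = 6 example of fig. 2
centers = [cell for row in grid for cell in row if cell != "compute"]
```

For m = n = 6 this tiling yields four 3 × 3 tiles: one common tile and one tile each for the signaling, data, and composite forwarding areas, so three forwarding centers among the 36 units.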
In a second aspect, an electronic device is provided that includes the GPU processing circuit structure.
Embodiments of the present application have the following beneficial effect:
the technical scheme reduces the number of times computation data and computation instructions are forwarded: by placing a dedicated forwarding unit in each area, fewer forwarding hops are needed, which reduces the latency of computation data and instructions and thereby increases image processing speed.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an electronic device.
Fig. 2 is a schematic diagram of a GPU processing circuit according to an embodiment of the present disclosure.
Fig. 3 is a schematic distribution diagram of a signaling sub-region, a data sub-region, and a composite sub-region provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood, explicitly and implicitly, by those skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 provides an electronic device, which may specifically include: a processor, a memory, a camera, and a display screen; these components may be connected through a bus or in other ways, and the application does not limit the specific connection method. In practical applications, the electronic device may be a smartphone, a personal computer, a server, a tablet computer, or the like.
The processor may specifically include a general-purpose processor and a graphics processor (GPU).
Referring to fig. 2 (taking m = 6, n = 6 as an example; m may also be greater than 6, such as 12, 18, and so on), fig. 2 provides a GPU processing circuit structure, which may specifically include: a storage unit 201, n control units 202, computing units 203, signaling forwarding units 204, data forwarding units 205, and composite forwarding units 206; the total number of computing units 203, signaling forwarding units 204, data forwarding units 205, and composite forwarding units 206 is m × n;
wherein the n control units 202 are arranged in a column and the m × n units are arranged in a matrix, m being the number of rows and n the number of columns of the matrix; the n control units 202 are respectively connected to the n units in the first column of the matrix; the storage unit 201 is respectively connected to the m units in the last row of the matrix;
the m × n units arranged in the matrix comprise a common area 301, a signaling forwarding area 302, a data forwarding area 303, and a composite forwarding area 304; the common area 301 contains only computing units 203; the signaling forwarding area 302 contains computing units and signaling forwarding units; the data forwarding area 303 contains computing units and data forwarding units; the composite forwarding area 304 contains computing units and composite forwarding units;
the control unit is configured to send computation instructions to the computing units, the signaling forwarding units, and the composite forwarding units; for clarity of the drawing, the connections between the control unit and the signaling forwarding and composite forwarding units are indicated by dotted lines;
a storage unit 201, configured to store computation data and computation results;
the storage unit 201 has a plurality of IO (input/output) interfaces, which are respectively connected to the m computing units, data forwarding units, and composite forwarding units in the last row of the matrix; for clarity of the drawing, the connections between the storage unit and the data forwarding and composite forwarding units are indicated by dotted lines;
a computing unit, configured to perform operations (arithmetic operations such as addition, subtraction, multiplication, and division) on the computation data (which may be read or received) according to the computation instruction to obtain a computation result, and to send the computation result to the storage unit: if the computing unit is connected to an IO interface of the storage unit, it sends the result directly; otherwise, it sends the result by forwarding (a computing unit may be directly connected to a composite forwarding unit or to the storage unit);
the signaling forwarding unit is configured to receive the computation instruction sent by the control unit and forward it to the 8 computing units at the edge of its 3 × 3 array;
the data forwarding area comprises a plurality of data sub-regions, each arranged as a 3 × 3 array (as shown in fig. 3); the data forwarding unit is located at the center of the 3 × 3 array and is respectively connected to the 8 computing units at its edge; the data forwarding unit is configured to extract computation data from the storage unit and forward it to the 8 computing units at the edge of its 3 × 3 array;
the composite forwarding area comprises a plurality of composite sub-regions, each arranged as a 3 × 3 array (as shown in fig. 3); the composite forwarding unit is located at the center of the 3 × 3 array and is respectively connected to the 8 computing units at its edge; the composite forwarding unit is configured to receive computation instructions sent by the control unit, extract computation data from the storage unit, and forward both to the 8 computing units at the edge of its 3 × 3 array.
m and n are integers greater than or equal to 5, and m is greater than or equal to n.
As shown in fig. 2, adjacent computing units are connected to each other; these connections may carry data or signaling. Adjacent here means vertically adjacent or horizontally adjacent.
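The 3 × 3 fan-out described above can be illustrated with a small sketch. The class and method names (`ComputeUnit`, `CompositeForwardingUnit`, `receive`, `forward`) are assumptions made for illustration; the patent specifies only the connectivity: one forwarding unit at the center of each 3 × 3 sub-array, wired to the 8 computing units at its edge, forwarding instructions and/or data outward.

```python
class ComputeUnit:
    """Edge cell of a 3x3 sub-array; just records what it receives."""
    def __init__(self):
        self.instruction = None
        self.data = None

    def receive(self, instruction=None, data=None):
        if instruction is not None:
            self.instruction = instruction
        if data is not None:
            self.data = data


class CompositeForwardingUnit:
    """Center of a 3x3 sub-array, wired to its 8 edge compute units."""
    def __init__(self, neighbors):
        assert len(neighbors) == 8  # the 8 edge cells of the 3x3 array
        self.neighbors = neighbors

    def forward(self, instruction, data):
        # One hop from the center replaces 8 separate links from the
        # control unit and the storage unit to the edge compute units.
        for unit in self.neighbors:
            unit.receive(instruction=instruction, data=data)


edge = [ComputeUnit() for _ in range(8)]
center = CompositeForwardingUnit(edge)
center.forward(instruction="add", data=[1, 2])
```

A signaling forwarding unit would call the same fan-out with only an instruction, and a data forwarding unit with only data; the composite unit forwards both.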
Computation speed is influenced in two main ways. The first is raw computation speed, i.e., how fast the same data can be computed, which depends mainly on the clock frequency of the processing circuit. The second is IO overhead, i.e., how many times the same data must be forwarded. For a GPU structure with a large number of computing units, connecting every computing unit directly to the control unit and the storage unit would greatly increase the number of interfaces on those units, and hence the cost. To avoid adding a large number of IO interfaces while still reducing the number of forwarding hops, the structure is divided into 4 areas, and a different forwarding unit (forwarding circuit) is provided for the characteristics of each area, thereby increasing image processing speed.
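A back-of-envelope calculation makes the interface-count argument concrete. The cost model below (two direct links per computing unit versus one external link plus eight local links per 3 × 3 tile) is an illustrative assumption, not a figure from the patent.

```python
def direct_links(m, n):
    # Every computing unit wired straight to both the control side and
    # the storage side: two external links per unit.
    return 2 * m * n

def forwarded_links(m, n):
    # One external link per 3x3 tile center, plus the 8 local links
    # from the center to the edge computing units of that tile.
    tiles = (m // 3) * (n // 3)
    return tiles * (1 + 8)

m, n = 6, 6  # the example of fig. 2
saving = direct_links(m, n) - forwarded_links(m, n)
```

Under this model a 6 × 6 grid needs 72 links when fully direct-wired but only 36 when routed through tile centers, and the external interface count on the control and storage units drops from 36 to 4, which is the cost reduction the scheme targets.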
The technical effect of the scheme provided by the application is mainly to reduce the number of times computation data and computation instructions are forwarded: by placing a dedicated forwarding unit in each area, fewer forwarding hops are needed, which reduces the latency of computation data and instructions and thereby increases image processing speed.
Each of the above units may be implemented by a hardware circuit, including but not limited to an FPGA, a CGRA, an ASIC, an analog circuit, or a memristor.
In an optional scheme, the data forwarding unit and the composite forwarding unit are each provided with a register for storing data.
The register caches data computed by the computing units, improving the efficiency of reading and storing data.
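The caching role of the register can be sketched as follows. The class name, the dict-backed storage, and the `fetch` method are assumptions made for this sketch; the patent states only that the forwarding units optionally carry a register that buffers data to make reads and stores more efficient.

```python
class DataForwardingUnit:
    """Forwarding unit whose register buffers values fetched from storage."""
    def __init__(self, storage):
        self.storage = storage       # models the storage unit (address -> value)
        self.register = {}           # buffered address -> value
        self.storage_reads = 0       # count of actual storage accesses

    def fetch(self, addr):
        if addr not in self.register:
            self.register[addr] = self.storage[addr]
            self.storage_reads += 1  # only the first access hits storage
        return self.register[addr]


unit = DataForwardingUnit(storage={0: 42})
# One storage read serves all 8 compute units of the 3x3 sub-array.
values = [unit.fetch(0) for _ in range(8)]
```

Without the register, each of the 8 edge computing units would trigger its own storage access; with it, the storage unit is read once per value.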
The application also provides an electronic device, which comprises the GPU processing circuit structure.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.
Claims (4)
1. A GPU processing circuit structure, applied in an electronic device, the structure comprising:
a storage unit, n control units, computing units, signaling forwarding units, data forwarding units, and composite forwarding units, wherein the total number of computing units, signaling forwarding units, data forwarding units, and composite forwarding units is m × n;
wherein the n control units are arranged in a column and the m × n units are arranged in a matrix, m being the number of rows and n the number of columns of the matrix; the n control units are respectively connected to the n units in the first column of the matrix; the storage unit is respectively connected to the m units in the last row of the matrix;
the m × n units arranged in the matrix comprise a common area, a signaling forwarding area, a data forwarding area, and a composite forwarding area; the common area contains only computing units; the signaling forwarding area contains computing units and signaling forwarding units; the data forwarding area contains computing units and data forwarding units; the composite forwarding area contains computing units and composite forwarding units;
the control units are configured to send computation instructions to the computing units, the signaling forwarding units, and the composite forwarding units;
the storage unit is configured to store computation data and computation results;
the storage unit has a plurality of IO interfaces, which are respectively connected to the m computing units, data forwarding units, and composite forwarding units in the last row of the matrix;
the computing units are configured to perform operations on the computation data according to the computation instructions to obtain computation results, and to send the computation results to the storage unit;
the signaling forwarding unit is configured to receive a computation instruction sent by the control unit and forward it to the 8 computing units at the edge of its 3 × 3 array;
the data forwarding area comprises a plurality of data sub-regions, each arranged as a 3 × 3 array; the data forwarding unit is located at the center of the 3 × 3 array and is respectively connected to the 8 computing units at its edge; the data forwarding unit is configured to extract computation data from the storage unit and forward it to the 8 computing units at the edge of its 3 × 3 array;
the composite forwarding area comprises a plurality of composite sub-regions, each arranged as a 3 × 3 array; the composite forwarding unit is located at the center of the 3 × 3 array and is respectively connected to the 8 computing units at its edge; the composite forwarding unit is configured to receive computation instructions sent by the control unit, extract computation data from the storage unit, and forward both to the 8 computing units at the edge of its 3 × 3 array;
m and n are integers greater than or equal to 5, and m is greater than or equal to n.
2. The GPU processing circuit structure according to claim 1, wherein the data forwarding unit and the composite forwarding unit are each provided with a register.
3. An electronic device, comprising the GPU processing circuit structure of claim 1 or 2.
4. The electronic device of claim 3, wherein the electronic device is a smartphone, a tablet computer, or a notebook computer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110200065.0A (published as CN112712456A) | 2021-02-23 | 2021-02-23 | GPU processing circuit structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112712456A (en) | 2021-04-27 |
Family
ID=75550223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110200065.0A (published as CN112712456A; withdrawn) | GPU processing circuit structure | 2021-02-23 | 2021-02-23 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112712456A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103580890A (en) * | 2012-07-26 | 2014-02-12 | 深圳市中兴微电子技术有限公司 | Reconfigurable on-chip network structure and configuration method thereof |
US20170228344A1 (en) * | 2016-02-05 | 2017-08-10 | Google Inc. | Matrix processing apparatus |
CN109978149A (en) * | 2017-12-28 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Dispatching method and relevant apparatus |
CN110785778A (en) * | 2018-08-14 | 2020-02-11 | 深圳市大疆创新科技有限公司 | Neural network processing device based on pulse array |
CN111079908A (en) * | 2018-10-18 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Network-on-chip data processing method, storage medium, computer device and apparatus |
- 2021-02-23: application CN202110200065.0A filed; published as CN112712456A; status: not active (withdrawn)
Non-Patent Citations (1)
Title |
---|
Leng Zhenyu: "Design and Research of Heterogeneous Networks-on-Chip Based on GPU-like and GPU-CPU Architectures", China Master's Theses Full-text Database * |
Legal Events
Date | Code | Title |
---|---|---|
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | WW01 | Invention patent application withdrawn after publication |
Application publication date: 20210427 |