CN112712456A - GPU processing circuit structure - Google Patents

GPU processing circuit structure

Info

Publication number: CN112712456A
Application number: CN202110200065.0A
Authority: CN (China)
Prior art keywords: unit, forwarding, data, computing, composite
Legal status: Withdrawn (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Other languages: Chinese (zh)
Inventor: 黄永
Current Assignee: Zhongtian Xingxing Shanghai Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Zhongtian Xingxing Shanghai Technology Co ltd
Priority date: 2021-02-23 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2021-02-23
Publication date: 2021-04-27
Application filed by Zhongtian Xingxing Shanghai Technology Co ltd
Priority to: CN202110200065.0A
Publication of: CN112712456A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The application provides a GPU processing circuit structure applied in an electronic device. The structure includes: a storage unit, n control units, computing units, signaling forwarding units, data forwarding units and composite forwarding units; the total number of computing units, signaling forwarding units, data forwarding units and composite forwarding units is m × n. The technical solution provided by the application has the advantage of high image processing speed.

Description

GPU processing circuit structure
Technical Field
The application relates to the technical field of data processing, and in particular to a GPU processing circuit structure.
Background
A graphics processing unit (GPU), also called a display core, visual processor or display chip, is a microprocessor dedicated to image- and graphics-related operations on personal computers, workstations, game consoles and some mobile devices (such as tablet computers and smart phones).
Conventional GPUs have a low image processing speed, which limits overall image processing performance.
Disclosure of Invention
The embodiment of the application provides a GPU processing circuit structure which has the advantage of high image processing speed.
In a first aspect, an embodiment of the present application provides a GPU processing circuit structure applied in an electronic device, the structure including:
a storage unit, n control units, computing units, signaling forwarding units, data forwarding units and composite forwarding units; the total number of computing units, signaling forwarding units, data forwarding units and composite forwarding units is m × n;
wherein the n control units are arranged in a column, and the m × n units are arranged in a matrix, m being the number of units in each row and n being the number of units in each column; the n control units are respectively connected with the n units in the first column of the matrix, and the storage unit is respectively connected with the m units in the last row of the matrix;
the m × n units arranged in the matrix comprise a common area, a signaling forwarding area, a data forwarding area and a composite forwarding area; the common area includes only computing units; the signaling forwarding area includes computing units and signaling forwarding units; the data forwarding area includes computing units and data forwarding units; the composite forwarding area includes computing units and composite forwarding units;
the control unit is used for sending a calculation instruction to the computing unit, the signaling forwarding unit and the composite forwarding unit;
the storage unit is used for storing calculation data or calculation results;
the storage unit is provided with a plurality of IO interfaces, which are respectively connected with the m computing units, data forwarding units and composite forwarding units in the last row of the matrix;
the computing unit is used for performing an operation on the calculation data according to the calculation instruction to obtain a calculation result, and for sending the calculation result to the storage unit;
the signaling forwarding unit is used for receiving the calculation instruction sent by the control unit and forwarding the calculation instruction to the 8 computing units at the edge of its 3 × 3 array;
the data forwarding area comprises a plurality of data sub-regions, each arranged as a 3 × 3 array; the data forwarding unit is located at the center of the 3 × 3 array and is respectively connected with the 8 computing units at the edge of the array; the data forwarding unit is used for extracting calculation data from the storage unit and forwarding the calculation data to the 8 computing units at the edge of its 3 × 3 array;
the composite forwarding area comprises a plurality of composite sub-regions, each arranged as a 3 × 3 array; the composite forwarding unit is located at the center of the 3 × 3 array and is respectively connected with the 8 computing units at the edge of the array; the composite forwarding unit is used for receiving the calculation instruction sent by the control unit, extracting calculation data from the storage unit, and forwarding the calculation instruction and the calculation data to the 8 computing units at the edge of its 3 × 3 array.
m and n are integers greater than or equal to 5, and m is greater than or equal to n.
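For illustration only, the counting constraint above can be checked with a short Python sketch; the choice of one sub-region per forwarding area is an assumed example and not a value given in this application:

```python
# Worked count for the m = n = 6 example of fig. 2; one sub-region per
# forwarding area is an assumption made only for illustration.
m, n = 6, 6
signaling_subregions = data_subregions = composite_subregions = 1

# Each 3 x 3 sub-region contributes exactly one forwarding unit at its center;
# every other position in the matrix holds a computing unit.
forwarding_units = signaling_subregions + data_subregions + composite_subregions
computing_units = m * n - forwarding_units

assert computing_units + forwarding_units == m * n  # the stated sum constraint
print(computing_units, forwarding_units)            # 33 computing, 3 forwarding
```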
In a second aspect, an electronic device is provided that includes the above GPU processing circuit structure.
The embodiments of the application have the following beneficial effects:
the technical effect of the technical scheme provided by the application is mainly to reduce the forwarding times of the calculation data or the calculation signaling, that is, the forwarding times of the calculation data or the calculation signaling are reduced by arranging the corresponding forwarding units in different areas, so that the time delay of the calculation data and the calculation signaling is reduced, and the processing speed of the image is further improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic structural diagram of an electronic device.
Fig. 2 is a schematic diagram of a GPU processing circuit structure according to an embodiment of the present application.
Fig. 3 is a schematic distribution diagram of a signaling sub-region, a data sub-region, and a composite sub-region provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description, the claims and the accompanying drawings of this application are used for distinguishing between different objects, not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may optionally include other steps or elements not listed or inherent to such process, method, article or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is understood explicitly and implicitly by those skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1, fig. 1 provides an electronic device, which may specifically include: a processor, a memory, a camera and a display screen; these components may be connected through a bus or in other ways, and the specific connection manner is not limited in this application. In practical applications, the electronic device may be a smart phone, a personal computer, a server, a tablet computer, or the like.
The processor may specifically include a general-purpose processor and a graphics processor (GPU).
Referring to fig. 2 (taking m = 6 and n = 6 as an example; m may also be greater than 6, such as 12, 18, and so on), fig. 2 provides a GPU processing circuit structure, which may specifically include: a storage unit 201, n control units 202, computing units 203, signaling forwarding units 204, data forwarding units 205 and composite forwarding units 206; the total number of computing units 203, signaling forwarding units 204, data forwarding units 205 and composite forwarding units 206 is m × n;
wherein the n control units 202 are arranged in a column, and the m × n units are arranged in a matrix, m being the number of units in each row and n being the number of units in each column; the n control units 202 are respectively connected with the n units in the first column of the matrix, and the storage unit 201 is respectively connected with the m units in the last row of the matrix;
the m × n units arranged in the matrix comprise a common area 301, a signaling forwarding area 302, a data forwarding area 303 and a composite forwarding area 304; the common area 301 includes only computing units 203; the signaling forwarding area 302 includes computing units and signaling forwarding units; the data forwarding area 303 includes computing units and data forwarding units; the composite forwarding area 304 includes computing units and composite forwarding units;
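Fig. 3 is not reproduced here, so the following Python sketch uses assumed coordinates purely to illustrate how an m × n matrix of units can be partitioned into a common area and 3 × 3 forwarding sub-regions; the placement, symbols and function names are illustrative assumptions, not part of the application:

```python
# 'C' = computing unit, 'S' = signaling forwarding unit,
# 'D' = data forwarding unit, 'X' = composite forwarding unit.
M, N = 6, 6  # units per row and per column, matching the fig. 2 example

def make_grid(m, n):
    """Start from a matrix holding only computing units (the common area)."""
    return [['C'] * m for _ in range(n)]

def place_subregion(grid, top, left, kind):
    """Mark one 3 x 3 sub-region: the forwarding unit sits at the center,
    the 8 surrounding positions remain computing units."""
    grid[top + 1][left + 1] = kind
    return grid

grid = make_grid(M, N)
grid = place_subregion(grid, 0, 3, 'S')  # assumed signaling sub-region
grid = place_subregion(grid, 3, 0, 'D')  # assumed data sub-region
grid = place_subregion(grid, 3, 3, 'X')  # assumed composite sub-region

for row in grid:
    print(' '.join(row))
```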
the control unit is used for sending a calculation instruction to the computing unit, the signaling forwarding unit and the composite forwarding unit; for convenience of drawing, the connections between the control unit and the signaling forwarding unit and the composite forwarding unit are indicated by dotted lines in fig. 2;
the storage unit 201 is used for storing calculation data or calculation results;
the storage unit 201 is provided with a plurality of IO (input/output) interfaces, which are respectively connected to the m computing units, data forwarding units and composite forwarding units in the last row of the matrix; for convenience of drawing, the connections between the storage unit and the data forwarding unit and the composite forwarding unit are also indicated by dotted lines;
the computing unit is used for performing an operation (an arithmetic operation such as addition, subtraction, multiplication or division) on the calculation data (which may be read or received calculation data) according to the calculation instruction to obtain a calculation result, and for sending the calculation result to the storage unit; if the computing unit is connected with an IO interface of the storage unit, the calculation result is sent to the storage unit directly; if it is not directly connected with an IO interface of the storage unit, the calculation result is sent to the storage unit by forwarding (a computing unit may be directly connected with a composite forwarding unit or with the storage unit);
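As a purely behavioural sketch of the paragraph above (the application describes hardware circuits, not software; the class and method names below are illustrative assumptions), a computing unit applies one arithmetic operation per instruction and delivers the result either directly to the storage unit or through a forwarding unit:

```python
import operator

OPS = {'add': operator.add, 'sub': operator.sub,
       'mul': operator.mul, 'div': operator.truediv}

class SimpleForwarder:
    """Stand-in for a data/composite forwarding unit on the path to storage."""
    def forward(self, value, storage):
        storage.append(value)  # one extra hop before the value reaches storage

class ComputingUnit:
    def __init__(self, has_storage_io, forwarder=None):
        self.has_storage_io = has_storage_io  # wired to a storage IO interface?
        self.forwarder = forwarder            # used when there is no direct IO link

    def execute(self, instruction, a, b, storage):
        result = OPS[instruction](a, b)       # add / sub / mul / div
        if self.has_storage_io:
            storage.append(result)            # direct write to the storage unit
        else:
            self.forwarder.forward(result, storage)  # relayed via a forwarding unit
        return result

storage = []
cu = ComputingUnit(has_storage_io=False, forwarder=SimpleForwarder())
cu.execute('mul', 3, 4, storage)              # storage == [12]
```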
the signaling forwarding unit is used for receiving the calculation instruction sent by the control unit and forwarding the calculation instruction to the 8 computing units at the edge of its 3 × 3 array;
the data forwarding area includes a plurality of data sub-regions, each arranged as a 3 × 3 array (as shown in fig. 3); the data forwarding unit is located at the center of the 3 × 3 array and is respectively connected with the 8 computing units at the edge of the array; the data forwarding unit is used for extracting calculation data from the storage unit and forwarding the calculation data to the 8 computing units at the edge of its 3 × 3 array;
the composite forwarding area includes a plurality of composite sub-regions, each arranged as a 3 × 3 array (as shown in fig. 3); the composite forwarding unit is located at the center of the 3 × 3 array and is respectively connected with the 8 computing units at the edge of the array; the composite forwarding unit is used for receiving the calculation instruction sent by the control unit, extracting calculation data from the storage unit, and forwarding the calculation instruction and the calculation data to the 8 computing units at the edge of its 3 × 3 array.
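The signaling, data and composite forwarding units all share the same 3 × 3 fan-out pattern: the forwarding unit at the center is wired to the 8 computing units around it. A minimal sketch, assuming 0-based (row, column) coordinates:

```python
def subregion_edge(center_row, center_col):
    """Coordinates of the 8 computing units on the edge of a 3 x 3 sub-region
    whose center (the forwarding unit) sits at (center_row, center_col)."""
    return [(center_row + dr, center_col + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]

# A composite forwarding unit at (4, 4) broadcasts the calculation instruction
# and the calculation data it fetched from the storage unit to these 8 units:
print(subregion_edge(4, 4))  # 8 coordinate pairs
```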
m and n are integers greater than or equal to 5, and m is greater than or equal to n.
As shown in fig. 2, adjacent computing units are connected to each other, and these connections may be used to transfer data or signaling. Adjacent here means vertically adjacent or horizontally adjacent.
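Because only vertically and horizontally adjacent units are wired together, every unit has at most four direct links; the following sketch (assuming 0-based indexing and a grid of n rows by m columns) lists them:

```python
def mesh_neighbours(row, col, m, n):
    """Directly connected positions of the unit at (row, col) in a grid with
    n rows and m columns; only vertical and horizontal neighbours count."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in candidates if 0 <= r < n and 0 <= c < m]

print(mesh_neighbours(0, 0, 6, 6))  # corner unit: 2 links
print(mesh_neighbours(2, 3, 6, 6))  # interior unit: 4 links
```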
Two factors mainly influence the calculation speed. The first is the raw computation speed, that is, how quickly the same data can be computed, which mainly depends on the frequency of the processing circuit. The second is the IO overhead, that is, how many times the same data must be forwarded. For the structure of the GPU, because the number of computing units is large, directly connecting all computing units to the control unit and the storage unit would greatly increase the number of interfaces of the control unit and the storage unit, and therefore the cost. To avoid adding a large number of IO interfaces, the number of forwarding operations must be reduced instead. Based on this idea, the matrix is divided into 4 areas, and different forwarding units (forwarding circuits) are arranged according to the characteristics of the 4 areas to realize different functions, thereby improving the image processing speed.
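This trade-off can be made concrete with a rough hop-count comparison under assumed Manhattan-style routing through the mesh; the numbers below are an illustration, not figures from the application:

```python
# Hop-count illustration (assumptions: data enters the matrix through the last row,
# is relayed unit-by-unit along one column, and the data/composite forwarding units
# have their own IO links to the storage unit as described above).

def hops_via_mesh(target_row, last_row):
    """storage -> last-row unit -> ... -> target unit, one hop per relay."""
    return 1 + (last_row - target_row)

def hops_via_data_forwarder():
    """storage -> data (or composite) forwarding unit -> target computing unit."""
    return 2

last_row = 5  # 0-based index of the last row in the 6-row example of fig. 2
for target_row in range(6):
    print(target_row, hops_via_mesh(target_row, last_row), hops_via_data_forwarder())
```

Under these assumptions, a unit far from the last row needs several relays through neighbouring computing units, while a unit inside a data or composite sub-region is always two hops away from the storage unit.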
In summary, the main technical effect of the technical solution provided by the application is to reduce the number of times calculation data or calculation signaling must be forwarded: by arranging corresponding forwarding units in different areas, the number of forwarding operations is reduced, which lowers the latency of the calculation data and calculation signaling and thereby improves the image processing speed.
Each of the above units may be implemented by a hardware circuit, including but not limited to an FPGA, a CGRA, an ASIC, an analog circuit or a memristor.
In an optional solution, the data forwarding unit and the composite forwarding unit are each provided with a register, and the register is used for storing data.
Providing the register here allows the data calculated by the computing units to be cached, which improves the efficiency of reading or storing data.
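A behavioural sketch of how such a register might be used is given below; the single-entry, write-buffering policy and all names are assumptions chosen only for illustration, since the application states only that the register stores data:

```python
class BufferedForwardingUnit:
    """Data/composite forwarding unit whose register caches a result produced by
    a computing unit until the storage unit's IO interface can accept it."""
    def __init__(self, storage):
        self.storage = storage  # the storage unit, modelled here as a plain list
        self.register = None    # the optional register described above

    def accept(self, result):
        """A computing unit hands over its calculation result and can continue working."""
        self.register = result

    def flush(self):
        """Write the cached result to the storage unit when the IO interface is free."""
        if self.register is not None:
            self.storage.append(self.register)
            self.register = None

storage = []
fwd = BufferedForwardingUnit(storage)
fwd.accept(42)  # result cached in the register
fwd.flush()     # later written to storage; storage == [42]
```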
The application also provides an electronic device, which comprises the GPU processing circuit structure.
It should be noted that, for simplicity of description, the foregoing embodiments are described as a series of actions or combinations of actions, but those skilled in the art will recognize that the present application is not limited by the described order of actions, as some steps may be performed in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments, and the actions and modules involved are not necessarily required by this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The embodiments of the present application have been described above in detail to illustrate the principles and implementations of the present application; the description of the embodiments is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (4)

1. A GPU processing circuit structure, which is applied in an electronic device, the structure comprising:
a storage unit, n control units, computing units, signaling forwarding units, data forwarding units and composite forwarding units; the total number of computing units, signaling forwarding units, data forwarding units and composite forwarding units is m × n;
wherein the n control units are arranged in a column, and the m × n units are arranged in a matrix, m being the number of units in each row and n being the number of units in each column; the n control units are respectively connected with the n units in the first column of the matrix, and the storage unit is respectively connected with the m units in the last row of the matrix;
the m × n units arranged in the matrix comprise a common area, a signaling forwarding area, a data forwarding area and a composite forwarding area; the common area includes only computing units; the signaling forwarding area includes computing units and signaling forwarding units; the data forwarding area includes computing units and data forwarding units; the composite forwarding area includes computing units and composite forwarding units;
the control unit is used for sending a calculation instruction to the calculation unit, the signaling forwarding unit and the composite forwarding unit;
a storage unit for storing calculation data or calculation results;
the storage unit is provided with a plurality of IO interfaces, and the plurality of IO interfaces are respectively connected with the m computing units, the data forwarding units and the composite forwarding units in the last row of the matrix arrangement;
the computing unit is used for performing an operation on the calculation data according to the calculation instruction to obtain a calculation result, and for sending the calculation result to the storage unit;
the signaling forwarding unit is used for receiving the calculation instruction sent by the control unit and forwarding the calculation instruction to the 8 computing units at the edge of its 3 × 3 array;
the data forwarding area comprises a plurality of data sub-regions, each arranged as a 3 × 3 array; the data forwarding unit is located at the center of the 3 × 3 array and is respectively connected with the 8 computing units at the edge of the array; the data forwarding unit is used for extracting calculation data from the storage unit and forwarding the calculation data to the 8 computing units at the edge of its 3 × 3 array;
the composite forwarding area comprises a plurality of composite sub-regions, each arranged as a 3 × 3 array; the composite forwarding unit is located at the center of the 3 × 3 array and is respectively connected with the 8 computing units at the edge of the array; the composite forwarding unit is used for receiving the calculation instruction sent by the control unit, extracting calculation data from the storage unit, and forwarding the calculation instruction and the calculation data to the 8 computing units at the edge of its 3 × 3 array;
m and n are integers greater than or equal to 5, and m is greater than or equal to n.
2. The GPU processing circuit structure according to claim 1, wherein the data forwarding unit and the composite forwarding unit are each provided with a register.
3. An electronic device, characterized in that it comprises the GPU processing circuit structure of claim 1 or 2.
4. The electronic device of claim 3, wherein the electronic device is a smart phone, a tablet computer or a notebook computer.
CN202110200065.0A (priority date 2021-02-23, filing date 2021-02-23) GPU processing circuit structure. Status: Withdrawn. Publication: CN112712456A (en).

Priority Applications (1)

Application Number: CN202110200065.0A; Priority Date: 2021-02-23; Filing Date: 2021-02-23; Title: GPU processing circuit structure


Publications (1)

Publication Number: CN112712456A; Publication Date: 2021-04-27

Family

ID=75550223

Family Applications (1)

Application Number: CN202110200065.0A; Priority Date: 2021-02-23; Filing Date: 2021-02-23; Status: Withdrawn; Publication: CN112712456A (en)

Country Status (1)

Country: CN (1); Link: CN112712456A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103580890A (en) * 2012-07-26 2014-02-12 深圳市中兴微电子技术有限公司 Reconfigurable on-chip network structure and configuration method thereof
US20170228344A1 (en) * 2016-02-05 2017-08-10 Google Inc. Matrix processing apparatus
CN109978149A (en) * 2017-12-28 2019-07-05 北京中科寒武纪科技有限公司 Dispatching method and relevant apparatus
CN110785778A (en) * 2018-08-14 2020-02-11 深圳市大疆创新科技有限公司 Neural network processing device based on pulse array
CN111079908A (en) * 2018-10-18 2020-04-28 上海寒武纪信息科技有限公司 Network-on-chip data processing method, storage medium, computer device and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
冷镇宇: "Design and Research of Heterogeneous Networks-on-Chip Based on GPU-like and GPU-CPU Architectures" (基于GPU-like和GPU-CPU架构的异构片上网络的设计与研究), China Master's Theses Full-text Database (《中国优秀硕士学位论文全文数据库》) *

Similar Documents

Publication Publication Date Title
CN112214726B (en) Operation accelerator
CN109725936B (en) Method for implementing extended computing instruction and related product
US20220114708A1 (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN100412821C (en) An apparatus and method for facilitating memory data access with generic read/write patterns
KR880011681A (en) Memory-Connected Wavefront Array Processors
US9489342B2 (en) Systems, methods, and computer program products for performing mathematical operations
CN110597559A (en) Computing device and computing method
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN110147249B (en) Network model calculation method and device
US20210182025A1 (en) Accelerating 2d convolutional layer mapping on a dot product architecture
US7519646B2 (en) Reconfigurable SIMD vector processing system
CN110163349B (en) Network model calculation method and device
CN111814957A (en) Neural network operation method and related equipment
CN111381808B (en) Multiplier, data processing method, chip and electronic equipment
CN112712456A (en) GPU processing circuit structure
WO2019067934A1 (en) System and method of feature descriptor processing
Hariyama et al. VLSI processor for reliable stereo matching based on window-parallel logic-in-memory architecture
US6748514B2 (en) Parallel processor and image processing system for simultaneous processing of plural image data items without additional circuit delays and power increases
CN112915534B (en) Game image calculation method and device
CN114661634A (en) Data caching device and method, integrated circuit chip, computing device and board card
CN109558109B (en) Data operation device and related product
CN113052751A (en) Artificial intelligence processing method
CN113012024A (en) Image processing method and device
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN109885271A (en) Data display and treating method, device and electronic equipment

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 2021-04-27)