CN101645052A - Quick direct memory access (DMA) ping-pong caching method - Google Patents
- Publication number
- CN101645052A (application CN200810142301A; granted publication CN101645052B)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention relates to a quick direct memory access (DMA) ping-pong caching method for moving data blocks in which adjacent blocks share part of their data. The method comprises the following steps: the DMA first moves into the target buffer a data block of the size the CPU can process at a time, and then successively moves further data blocks into the target buffer until the target buffer is completely filled and the current round of data moving is complete, the size of each successively moved block being equal to the differing portion of adjacent data blocks among the blocks to be moved. The method reduces the redundancy of processing the identical portion of adjacent data, decreasing the amount of data the DMA moves each time and thereby shortening the CPU's wait time.
Description
Technical field
The present invention relates to the technical field of information processing, and in particular to a quick DMA (Direct Memory Access) ping-pong buffering method.
Background technology
In mainstream chip processors (DSP, FPGA, ASIC, etc.), on-chip memory is limited, and the access-speed gap between on-chip and off-chip memory is considerable. When processing data-intensive signals such as audio and video, the data volume is so large that the slower off-chip memory must be used; moreover, having the CPU (central processing unit) access off-chip memory directly causes read/write stalls and low processing efficiency. The DMA is a component that can work independently of the CPU, and it is indispensable in mainstream processors.
To reduce read/write stalls, a common approach is the DMA ping-pong procedure: while the CPU processes the data in one region of memory, the DMA moves the data the CPU will need next from external memory into the region the CPU will process next. Because the DMA runs in the background, its transfer time is hidden behind the CPU's current processing. The CPU can then fetch the next batch of data directly from on-chip memory, reducing the time lost to read/write stalls.
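A minimal sketch of this classic ping-pong procedure in Python (a sequential simulation only; in hardware the DMA fill of one buffer genuinely overlaps the CPU's work on the other, and the function and variable names here are illustrative, not from the patent):

```python
def pingpong_process(blocks, process):
    """Simulate classic DMA ping-pong buffering: while the CPU works on
    one buffer, the 'DMA' fills the other; the roles swap each round."""
    buffers = [None, None]          # BUFFER_A and BUFFER_B
    results = []
    buffers[0] = blocks[0]          # initial DMA fill of BUFFER_A
    for i in range(len(blocks)):
        cur = i % 2                 # buffer the CPU processes this round
        nxt = (i + 1) % 2           # buffer the 'DMA' fills meanwhile
        if i + 1 < len(blocks):
            buffers[nxt] = blocks[i + 1]       # background transfer
        results.append(process(buffers[cur]))  # CPU work on current buffer
    return results
```

For example, `pingpong_process([[1, 2], [3, 4], [5, 6]], sum)` returns `[3, 7, 11]`: each block is processed out of the buffer that was filled while the previous block was being handled.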
In some cases, the data of two consecutive DMA transfers overlap. For example, in video compression, the existing ping-pong buffering method starts a DMA transfer of the 48 rows of reference data needed by the next macroblock row into on-chip BUFFER_B while motion estimation runs on the current macroblock row, as shown in Figure 1; the current macroblock row uses BUFFER_A, which was filled by DMA during motion estimation of the previous macroblock row. From the data structure of the reference frame used by motion estimation in video coding, it can be seen that 2/3 of the data (32 rows of pixels) is identical between two consecutive DMA transfers, so the existing DMA ping-pong scheme introduces a great deal of redundancy.
Summary of the invention
The technical problem to be solved by the invention is to provide a quick DMA ping-pong buffering method that reduces the redundancy of processing the identical portion of adjacent data blocks, decreasing the amount of data each DMA transfer moves and thereby shortening the CPU's wait time.
A quick direct memory access (DMA) ping-pong caching method, used for moving data blocks in which adjacent blocks share part of their data. In the method, the DMA first moves into the target buffer a data block of the size the CPU can process at a time, and then successively moves, into the following part of the target buffer, data blocks whose size equals the differing portion of adjacent data blocks among the blocks to be moved, until the target buffer is completely filled, which completes the current round of data moving.
After each DMA transfer finishes, the CPU begins processing the data in the target buffer and simultaneously starts the next DMA transfer; this continues until all data in the target buffer have been processed.
In each round of data moving, the destination address of the first transfer is the base address of the target buffer, and the destination address of each subsequent transfer is the previous destination address offset by the byte count of the previously moved data block.
The start address of the data the CPU processes first is the base address of the target buffer, and the start address of each subsequent batch of data is the previous start address offset by the byte count of the data block the DMA moved previously.
Compared with the existing ping-pong buffering method, for adjacent data blocks containing identical data and using the same memory space, the method of the invention reduces the DMA data-transfer volume by 50%.
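As a rough check of this claim, the sketch below (illustrative names, not from the patent; M and B follow the notation of the embodiment: M bytes per CPU-processable block, B identical bytes shared by consecutive transfers) compares the bytes moved over one round by conventional ping-pong buffering, which re-transfers the full block every time, with this method, which transfers the full block once and only the differing M - B bytes thereafter:

```python
def transfer_volumes(M, B, steps):
    """Bytes moved over `steps` transfers: conventional scheme vs. the
    overlap-aware scheme (full block once, then only the M - B delta)."""
    conventional = steps * M
    overlap_aware = M + (steps - 1) * (M - B)
    return conventional, overlap_aware

conv, fast = transfer_volumes(M=16896, B=11264, steps=4)
```

With the video example's values (M = 16896, B = 11264, four transfers per round), the conventional scheme moves 67584 bytes against 33792 here, the 50% reduction stated above.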
Description of drawings
Fig. 1 is a schematic diagram of the existing ping-pong buffer structure;
Fig. 2 is a schematic diagram of the ping-pong buffer structure of the present invention;
Fig. 3 is a flowchart of the method of the present invention.
Embodiment
The method of the present invention is described in further detail below with reference to the accompanying drawings.
Referring to Fig. 2 and Fig. 3: for simplicity, assume that the size of the data block the CPU can process at a time is M bytes, the base address of the target buffer is Addr and its size is N bytes, the amount of identical data between the blocks moved by two consecutive DMA transfers in ping-pong buffering is B bytes, and the ratio of B to M is a = B/M. Then the number of DMA transfers required to fill the N-byte target buffer is
L = floor((N - M) / (M*(1 - a))) + 1, where floor() denotes rounding down.
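A quick numeric check of this formula (a hypothetical helper, not part of the patent; it takes B rather than a so the division stays in exact integer arithmetic, using M*(1 - a) = M - B since a = B/M):

```python
def num_transfers(N, M, B):
    """L = floor((N - M) / (M*(1 - a))) + 1, computed as integers."""
    return (N - M) // (M - B) + 1
```

With the embodiment's values below (N = 2M = 33792, M = 16896, B = 2M/3 = 11264) this yields L = 4, matching the worked example.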
The invention proceeds as follows:
Step 1: start a DMA transfer of an M-byte data block; the destination address is Addr, the base address of the target buffer, so the interval transferred into the target buffer is [Addr, Addr+M).
Step 2: wait for the DMA transfer of Step 1 to finish. While the CPU processes the data in [Addr, Addr+M), start the DMA again to move a block of M*(1-a) bytes; the destination address is Addr+M, and the interval transferred into the target buffer is [Addr+M, Addr+M*(1-a)+M).
Step 3: wait for the DMA transfer of Step 2 to finish. While the CPU processes the data in [Addr+M*(1-a), Addr+M*(1-a)+M), start the DMA to move the next M*(1-a)-byte block; the destination address is Addr+M*(1-a)+M, and the interval transferred into the target buffer is [Addr+M*(1-a)+M, Addr+2M*(1-a)+M).
Step 4: keep repeating Step 3: each started DMA transfer moves an M*(1-a)-byte block, each destination address is offset by M*(1-a), and the start address of the data the CPU processes is likewise offset by M*(1-a). This continues until the CPU begins processing the data of the (L-1)-th DMA transfer, whose interval is [Addr+(L-2)*M*(1-a), Addr+M+(L-2)*M*(1-a)); meanwhile the L-th DMA transfer is started, with destination address Addr+M+(L-2)*M*(1-a) and transferred interval [Addr+M+(L-2)*M*(1-a), Addr+M+(L-1)*M*(1-a)), i.e. [Addr+M+(L-2)*M*(1-a), Addr+2M) when N = 2M.
Step 5: wait for the L-th DMA transfer to finish. The CPU begins processing the data in [Addr+(L-1)*M*(1-a), Addr+M+(L-1)*M*(1-a)), and the current round of filling the target buffer is complete. Meanwhile, start the DMA to move the next M-byte block, with the destination address wrapping back to Addr to begin a new round of filling the target buffer; this repeats until all data have been processed.
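The five steps above can be condensed into a small simulation (illustrative names, not from the patent) that lists each DMA transfer's destination address and size together with the address window the CPU processes once that transfer has landed:

```python
def schedule(addr, M, B, N):
    """One round of the transfer plan: returns (transfers, cpu_windows),
    where transfers are (destination, size) pairs and cpu_windows are
    [start, end) intervals, both sliding by M - B = M*(1 - a) each step."""
    delta = M - B                      # bytes per follow-up transfer
    L = (N - M) // delta + 1           # transfers needed to fill the buffer
    transfers, windows = [], []
    dest = addr
    for i in range(L):
        size = M if i == 0 else delta  # full block first, deltas afterwards
        transfers.append((dest, size))
        windows.append((addr + i * delta, addr + i * delta + M))
        dest += size
    return transfers, windows
```

With toy values addr = 0, M = 3, B = 2, N = 6 this gives transfers (0, 3), (3, 1), (4, 1), (5, 1) and CPU windows [0, 3), [1, 4), [2, 5), [3, 6): every window after the first reuses B bytes already present in the buffer.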
A concrete application of the invention is illustrated below using motion estimation in video coding as an example; other applications can be handled similarly.
Motion estimation requires reference-frame data. The usual practice is to use DMA to move into on-chip memory the 48 rows of pixels of the reference frame located at the positions of the previous, current, and next macroblock rows relative to the macroblock row currently being coded. From the data structure of the reference frame used by motion estimation in video coding, 2/3 of the 48 rows of pixels to be moved (32 rows) are identical between two consecutive DMA transfers.
The invention is illustrated below with the motion-estimation process for one frame at CIF resolution. In this embodiment M = 48 x 352 = 16896 bytes, the base address of the target buffer is Addr, N = 2M = 33792 bytes, B = 2M/3, and a = B/M = 2/3, so L = 4.
S01: use DMA to move the M bytes of reference data needed by the first macroblock row; the destination address is Addr, and the interval transferred into the target buffer is [Addr, Addr+M).
S02: wait for the DMA transfer of S01 to finish. The CPU performs motion estimation on the first macroblock row (the one of S01), using reference data in [Addr, Addr+M); meanwhile, start the DMA to move M/3 bytes of reference data, i.e. 16 rows of pixels; the destination address is Addr+M, and the transferred interval is [Addr+M, Addr+4M/3).
S03: wait for the DMA transfer of S02 to finish. The CPU performs motion estimation on the second macroblock row (counting from S02), using reference data in [Addr+M/3, Addr+4M/3); meanwhile, start the DMA to move the next M/3 bytes of reference data; the destination address is Addr+4M/3, and the transferred interval is [Addr+4M/3, Addr+5M/3).
S04: wait for the DMA transfer of S03 to finish. The CPU performs motion estimation on the third macroblock row (counting from S02), using reference data in [Addr+2M/3, Addr+5M/3); meanwhile, start the DMA to move the next M/3 bytes of reference data; the destination address is Addr+5M/3, and the transferred interval is [Addr+5M/3, Addr+2M).
S05: wait for the DMA transfer of S04 to finish. The CPU performs motion estimation on the fourth macroblock row (counting from S02), using reference data in [Addr+M, Addr+2M); the current round of filling the target buffer is complete. Meanwhile, start the DMA to move an M-byte block of reference data, with the destination address wrapping back to Addr; the macroblock row of S02 is offset by four macroblock rows and taken as the first macroblock row of the next round. Repeat S02 to S04 until motion estimation for all macroblock rows of the whole frame is complete.
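The intervals of S01 to S05 can be reproduced numerically; taking Addr as 0 is an assumption for this sketch only:

```python
# Reproduce the S01-S05 numbers: M = 16896 bytes (48 rows of 352 pixels),
# B = 11264 identical bytes (32 rows), so each follow-up transfer moves
# delta = M - B = 5632 bytes (16 rows).
M, B = 16896, 11264
delta = M - B
# DMA transfers started in S01..S04: (destination address, size)
transfers = [(0, M)] + [(M + k * delta, delta) for k in range(3)]
# Reference windows used by the CPU in S02..S05: [start, end)
windows = [(k * delta, k * delta + M) for k in range(4)]
```

This yields transfers (0, 16896), (16896, 5632), (22528, 5632), (28160, 5632) and windows [0, 16896), [5632, 22528), [11264, 28160), [16896, 33792), matching the intervals above with Addr = 0.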
Compared with the existing ping-pong buffering method, the caching method of the invention, using the same amount of memory, reduces the DMA data-transfer volume by 50% when performing video motion estimation.
Claims (4)
1. A quick direct memory access (DMA) ping-pong caching method for moving data blocks in which adjacent blocks share part of their data, characterized in that the DMA first moves into the target buffer a data block of the size the CPU can process at a time, and then successively moves, into the following part of the target buffer, data blocks whose size equals the differing portion of adjacent data blocks among the blocks to be moved, until the target buffer is completely filled, which completes the current round of data moving.
2. The quick direct memory access (DMA) ping-pong caching method of claim 1, characterized in that after each DMA transfer finishes, the CPU begins processing the data in the target buffer and simultaneously starts the next DMA transfer; this continues until all data in the target buffer have been processed.
3. The quick direct memory access (DMA) ping-pong caching method of claim 1, characterized in that in each round of data moving, the destination address of the first transfer is the base address of the target buffer, and the destination address of each subsequent transfer is the previous destination address offset by the byte count of the previously moved data block.
4. The quick direct memory access (DMA) ping-pong caching method of claim 2, characterized in that the start address of the data the CPU processes first is the base address of the target buffer, and the start address of each subsequent batch of data processed is the previous start address offset by the byte count of the data block the DMA moved previously.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2008101423012A CN101645052B (en) | 2008-08-06 | 2008-08-06 | Quick direct memory access (DMA) ping-pong caching method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101645052A (en) | 2010-02-10 |
CN101645052B (en) | 2011-10-26 |
Family
ID=41656941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2008101423012A Expired - Fee Related CN101645052B (en) | 2008-08-06 | 2008-08-06 | Quick direct memory access (DMA) ping-pong caching method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101645052B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101820543A (en) * | 2010-03-30 | 2010-09-01 | 北京蓝色星河软件技术发展有限公司 | Ping-pong structure fast data access method combined with direct memory access (DMA) |
CN105516547A (en) * | 2015-12-10 | 2016-04-20 | 中国科学技术大学 | Video dehazing optimization method based on DSP (Digital Signal Processor) |
CN110399322A (en) * | 2019-06-28 | 2019-11-01 | 苏州浪潮智能科技有限公司 | A kind of data transmission method and DMA framework of rattling |
CN111615692A (en) * | 2019-05-23 | 2020-09-01 | 深圳市大疆创新科技有限公司 | Data transfer method, calculation processing device, and storage medium |
CN112506437A (en) * | 2020-12-10 | 2021-03-16 | 上海阵量智能科技有限公司 | Chip, data moving method and electronic equipment |
CN115802236A (en) * | 2023-01-04 | 2023-03-14 | 成都市安比科技有限公司 | Method for shortening delay of earphone with auxiliary hearing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020016873A1 (en) * | 2000-07-31 | 2002-02-07 | Gray Donald M. | Arbitrating and servicing polychronous data requests in direct memory access |
KR20050000927A (en) * | 2003-06-25 | 2005-01-06 | 삼성전자주식회사 | a video data control unit and a loading/storing method of video data thereof |
CN101043282A (en) * | 2006-03-24 | 2007-09-26 | 中兴通讯股份有限公司 | Data storage means for multi-channel voice process |
CN101060627A (en) * | 2007-04-13 | 2007-10-24 | 深圳安凯微电子技术有限公司 | A high definition signal decoder |
Also Published As
Publication number | Publication date |
---|---|
CN101645052B (en) | 2011-10-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20111026; Termination date: 20160806 |