WO2009133536A1 - Parallel table lookup using content identifiers - Google Patents

Parallel table lookup using content identifiers

Info

Publication number
WO2009133536A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
contents
tag
ptlu
lut
Prior art date
Application number
PCT/IB2009/051789
Other languages
French (fr)
Inventor
Jan-Willem Van De Waerdt
Original Assignee
Nxp B.V.
Priority date
Filing date
Publication date
Application filed by Nxp B.V.
Publication of WO2009133536A1

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02: Addressing or allocation; Relocation
    • G06F12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0875: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F1/00: Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F1/02: Digital function generators
    • G06F1/03: Digital function generators working, at least partly, by table look-up
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F2212/00: Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10: Providing a specific technical effect
    • G06F2212/1016: Performance improvement

Definitions

  • the present invention relates generally to efficient data processing circuits employing lookup tables and tags stored in parallel table lookup modules to identify the lookup table contents.
  • a lookup table is a data structure that can be used to replace a runtime computation with a simpler lookup operation.
  • the speed gain can be significant, since retrieving a value from memory is often faster than undergoing a computation.
  • lookup tables provide an efficient means of data permutation and/or transformation.
  • processor architectures that support single-instruction, multiple-data (SIMD) or sub-word parallel processing
  • permutation and transformation operations of sub-word data elements are often the most time-consuming part of many algorithms.
  • Algorithms that may benefit from these operation types can be found in video domain transformation (for example, gamma correction), video enhancement (for example, peaking filters) or decryption/encryption algorithms.
  • Parallel lookup tables provide the parallelism required by the multiple sub-words in an operand word.
  • a traditional parallel lookup table can be described as follows.
  • the operation ptrd (parallel table read) performs 8 independent reads for 8-bit addresses on a parallel table lookup (PTLU) module.
  • the PTLU has eight independently accessible tables. For example, for a table size of 256 by 8, each table has 256, or 2^8, entries, each 8 bits wide.
  • the operation has a 64-bit input and a 64-bit output, and uses sub-word parallelism: the 64-bit operand words are organized as 8 independent 8-bit sub-words.
  • the data in the 8 tables in the PTLU is written with a dedicated operation, sometimes referred to as a ptw ("parallel table write").
  • the functionality provided by traditional parallel table lookups can be used for byte level permutations and transformation, resulting in faster execution of algorithms that frequently use these types of operations. It should be noted, however, that to build up the table contents, 256 ptw operations are required to fill all 256 entries of all 8 tables. This is a significant amount of operations, particularly in multi-task environments where the contents may undergo frequent changes, for example as a result of task switching.
  • Various aspects of the present invention are directed to methods for use in a hardware-implemented parallel table lookup (PTLU) module that stores a set of lookup table (LUT) contents and includes a stored tag that identifies a currently-loaded set of LUT contents.
  • Such methods include, for a LUT access request, comparing the stored tag to an access request tag identifying a set of LUT contents associated with the access request, and in response to a mismatch between the access request tag and the stored tag, loading the set of LUT contents associated with the access request tag into the PTLU, and updating the stored tag with the access request tag.
  • the replaced LUT contents can be stored, for example in response to a determination that the contents had changed.
  • Various aspects of the present invention are further directed to methods for use in a hardware-implemented lookup module operating in a multiple-task environment, the lookup module being loaded with task-specific contents.
  • Such methods include storing a current-address tag identifying a main memory address associated with current task-specific contents of the lookup module, in response to executing a task comparing the current address tag with the task, thereby generating a match or a mismatch, maintaining the current address tag and current task-specific contents in response to a match, and in response to a mismatch, storing the current task-specific contents in the main memory address identified by the current address tag, loading new task-specific contents into the lookup module from a new local memory address associated with the task, and updating the current address tag to identify the new local memory address.
  • PTLU modules that include a plurality of independently accessible tables for storing task-specific content for use in a multiple task environment, a register for storing a tag representing a task-specific main memory address for storing the task-specific content, a circuit arrangement for comparing the tag stored in the register with the main memory address associated with the task being executed, and a memory load mechanism for loading task-specific content from the main memory address associated with the task being executed.
  • Still other aspects of the present invention are directed to methods for use with a processor operating in a multiple-task environment where tasks are executed using task-specific contents loaded into a parallel table lookup (PTLU) module.
  • Such methods include, in response to switching from executing a first-task to executing a second-task, storing first-task-specific contents of the PTLU module to a first main memory address associated with the first-task, the first main memory address identified by an address tag stored in the PTLU module, loading second-task-specific contents into the PTLU module from a second main memory address associated with the second-task, and updating the address tag stored in the PTLU module to identify the second main memory address.
  • FIG. 1 illustrates lookup table comparison and loading using contents identifier tags in accordance with certain embodiments of the present invention
  • FIG. 2 illustrates PTLU comparison and loading using contents address tags in accordance with certain embodiments of the present invention.
  • FIG. 3 illustrates a flow diagram of steps that can be implemented in accordance with certain embodiments of the present invention. While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the invention including aspects defined by the appended claims.
  • Embodiments of the present invention relate to storing a tag in a PTLU ("parallel table lookup") module identifying the currently loaded contents. In certain applications, when a LUT access request is processed, the stored tag can be checked to determine whether the correct contents reside in the PTLU module.
  • certain embodiments of the present invention relate to a hardware mechanism for loading the correct contents into the PTLU module.
  • the correct contents may reside in a main memory address identified by the LUT access request, allowing the PTLU module to perform a contiguous load operation.
  • loading of the entire PTLU contents can be made responsive to a single access request rather than by multiple ptw operations.
  • a PTLU module is used to store a set of LUT contents and a tag that identifies the currently-loaded LUT contents.
  • Different sets of LUT contents may be used by different operations (for example separate tasks, or different operations within the same task).
  • changes to the LUT contents may be required.
  • the tag stored in the PTLU module can be used to determine whether the current LUT contents are the contents that are now needed, or whether the current LUT contents are to be replaced. When replacing the current contents, the needed contents can be loaded into the PTLU module, for example from a main memory location.
  • the tag stored in the PTLU module identifies the memory address from which the current contents were retrieved and to which the current contents are stored (for example, if the current contents were changed due to a ptw operation after being loaded into the PTLU, such situation being identifiable by use of a dirty bit).
  • the tag is updated to identify the new contents.
  • Fig. 1 illustrates an exemplary embodiment that includes a PTLU 110 having lookup tables T1, T2, T3, T4, and a stored tag 112 that identifies the currently-loaded LUT contents.
  • the number of tables shown in PTLU 110 is merely illustrative in that any suitable number of tables can be used, including one.
  • the PTLU 110 can include more than one set of LUT contents, and each set of contents can include its own tag.
  • the lookup tables are grouped into a single set 110A, and the tag 112 identifies the contents of the set.
  • a LUT access request includes an access request tag 122 that identifies LUT contents to be used.
  • the access request tag 122 can be compared to the current LUT contents tag 112. If the tags match, the current PTLU contents are used. If the tags do not match, then the PTLU is loaded with the correct contents as identified by the access request tag 122, which in turn becomes the new current LUT contents tag.
  • the LUT access request 120 is generally in response to executing an instruction that requires use of the PTLU contents. For example, when switching between tasks in a multi-task environment, the PTLU contents might or might not be appropriate for the new task. As such, a LUT access request can be triggered by or responsive to task switching.
  • the LUT access request may be independent of task switching.
  • task switching may result in executing tasks that do not require access of the PTLU contents, and as such the correct contents may still reside in the PTLU upon switching back to the original task, as would be revealed by a tag comparison.
  • the tag stored in the PTLU can be any identifier associated with the appropriate LUT contents that is capable of suitably distinguishing from other LUT contents, and the association can be implemented in any suitable manner including software or hardware.
  • the tags identify specific LUT contents by identifying a memory address where the relevant LUT contents are stored (and from which they can be retrieved).
  • the tags can be implemented as task identifiers. The various possible tag identifiers can be implemented individually or in combination.
  • embodiments of the present invention can be used advantageously in a multi-tasking environment where different tasks utilize different LUT contents and/or where the same task utilizes different LUT contents.
  • embodiments of the present invention provide identifier mechanisms to determine whether the correct LUT contents are in place (thereby avoiding ptw operations or initialization operations when not needed), and loading mechanisms to automatically load the correct contents into the PTLU without requiring ptw operations.
  • task A is executed and starts by initializing the lookup table using ptw operations, for example writing to the PTLU from an initialization data field called by the task A algorithm.
  • the task A LUT contents can be automatically loaded into the PTLU, for example from a main memory address identified by a tag provided in response to executing task A. Any suitable initialization procedures can be used.
  • a task switch decision is made to interrupt task A and start a new task B.
  • task B may initialize the lookup table using ptw operations, thereby overwriting the contents used by task A.
  • the task B LUT contents can be automatically loaded into the PTLU, for example from a main memory address identified by a tag provided in response to executing task B.
  • another task switch decision is made to interrupt task B and switch back to task A.
  • task A is confronted with lookup table content for task B, causing a potential conflict.
  • It is possible to avoid the conflict by saving the current lookup table contents using ptrd operations and restoring the lookup table contents for the new task using ptw operations, which can add at least 256 ptrd and 256 ptw operations to the task switching software for each lookup table, assuming 256 entries.
  • a single software operation in the form of a LUT access request and identifier tag can trigger a LUT contents comparison, and if necessary an automatic loading of the correct contents.
  • certain embodiments of the present invention map the lookup table contents to a memory address associated with the contents, and store an address tag in the PTLU that refers to that memory address. This can serve multiple purposes.
  • the tag stored in the PTLU can be checked to determine whether the correct contents are loaded into the PTLU, thereby avoiding unnecessary loading operations when the correct contents are currently loaded.
  • the required contents can be automatically recalled and restored from the associated main memory address using processor hardware without performing ptw software operations.
  • the ptrd operation is extended with an additional 32-bit input, called "Rs2," that contains the base address of the lookup table data structure.
  • the ptrd operation is called as ptrd(Rd, Rs1, Rs2), where "Rd" is the destination register, "Rs1" is a 64-bit input used to index the independent tables (e.g., eight 8-bit indices into T0, T1, ..., T7), and "Rs2" is the base address of the lookup table structure.
  • the functionality of the ptrd operation is largely unchanged, but additionally includes reference to the PTLU address tag that associates the current table contents to a data structure at a certain address in main memory.
  • the operation ptrd(Rs1, A) performs a parallel table lookup on the PTLU, and the contents of the PTLU lookup table should be provided by the data structure at main memory address A. If the PTLU address tag matches A, the current PTLU content is correct and the lookup data is correct. If the PTLU address tag does not match A, the current PTLU content is incorrect, and the correct content is automatically loaded from the main memory at address A. Consequently, the PTLU address tag is updated to accurately reflect the newly loaded contents. Note that loading of table content into the PTLU can be performed on demand.
  • the PTLU upon switching to a new task with an algorithm using different lookup table contents, the PTLU is loaded only when the new task performs a ptrd operation. Thus, should a given task not require access to the PTLU, no content loading is necessary. Furthermore, explicit initialization code for the PTLU may no longer be necessary. A task can start an algorithm without an explicit initialization step due to automatic loading of the PTLU content as soon as the algorithm starts using the ptrd operations.
  • the current contents that are being replaced can be automatically stored back to the identified memory address in a manner similar to the automatic loading procedures.
  • Such storing back may not be needed, for example when the LUT contents are static (i.e., do not change during operation of the task), or when the LUT contents can be changed but have not been changed.
  • a dirty bit can be included, for example in each lookup table, that signals whether or not the content has been changed since last being loaded into the PTLU. Checking the dirty bit can thus save unnecessary store-back steps.
  • Fig. 2 illustrates an example embodiment implemented in an eight table (T0 through T7) PTLU 210 that transforms an 8-bit input vector 250 (b0 through b7) into an 8-bit output vector 260 (v0 through v7) based on the contents of the tables.
  • the PTLU includes an address tag 212 that identifies the currently-loaded LUT contents by the main memory address where those contents are stored, for example, within a processor main memory module 230.
  • the memory address of the current contents is denoted "address Y".
  • the requested contents are identified by the base address 222, in this case denoted "address X".
  • the PTLU includes a comparison mechanism 214 to determine whether address X matches address Y. If so, the current contents are used. If not, a memory load mechanism 216 loads the contents from address X into the tables, and the PTLU stored address tag 212 is updated to reflect address X.
  • Fig. 3 illustrates an exemplary flow diagram of steps that can be implemented in accordance with certain embodiments of the present invention.
  • a LUT access request is generated, for example in response to switching tasks, executing a new task, or in the process of running a task.
  • an identifier tag stored in the PTLU and which identifies the current contents is compared to an identifier tag associated with the LUT access request.
  • If the tags match, the current contents are correct. If the tags do not match, the current contents are replaced by the correct contents, and the PTLU stored tag is updated to reflect the new contents.
  • the replaced contents can be stored. For example, if the replaced contents were changed after being loaded into the PTLU, they can be saved back to a main memory address identified by the associated tag.
  • aspects of the present invention can support: multiple lookup table structures (e.g., multiple sets of LUT contents) with multiple identifier tags; operations that perform a parallel write to the independent tables of a PTLU; different identifier tags for the different independent tables (e.g., T0, T1, ..., T7), such that they may be loaded from different locations in main memory rather than being organized in a sequential nature; independent tables (e.g., T0, T1, ..., T7) having different numbers of entries (e.g., not necessarily all 256) and/or having different table entry widths (e.g., not necessarily 8 bits wide); and PTLUs having different numbers of independent tables (e.g., T0, T1, ..., T3).
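The address-tagged ptrd operation, the on-demand load, and the dirty-bit store-back described in the list above can be sketched in Python. This is a minimal software model under stated assumptions, not the patented hardware: the class name `AddressTaggedPTLU`, the `bytearray` standing in for main memory, the contiguous layout (table t at base + t * 256), a single dirty bit for the whole set, and a `ptw` that takes an entry index plus a base address are all hypothetical choices made for illustration.

```python
NUM_TABLES = 8                         # T0..T7
TABLE_SIZE = 256                       # 2**8 entries, each 8 bits wide
SET_BYTES = NUM_TABLES * TABLE_SIZE    # one contiguous set of LUT contents


class AddressTaggedPTLU:
    """Software model of a PTLU whose stored tag is a main memory address."""

    def __init__(self, memory):
        self.memory = memory               # bytearray standing in for main memory (assumption)
        self.tables = bytearray(SET_BYTES)
        self.address_tag = None            # base address of the currently loaded contents
        self.dirty = False                 # one dirty bit for the whole set (assumption)
        self.loads = 0
        self.stores = 0

    def _ensure_loaded(self, base):
        """Compare the stored tag with the requested base; load on mismatch."""
        if self.address_tag == base:
            return                         # match: current contents are correct
        if self.dirty and self.address_tag is not None:
            old = self.address_tag         # store changed contents back first
            self.memory[old:old + SET_BYTES] = self.tables
            self.stores += 1
        self.tables[:] = self.memory[base:base + SET_BYTES]  # contiguous load
        self.address_tag = base            # update the stored tag
        self.dirty = False
        self.loads += 1

    def ptrd(self, rs1, rs2):
        """Byte t of the result is table t indexed by byte t of rs1."""
        self._ensure_loaded(rs2)
        rd = 0
        for t in range(NUM_TABLES):
            idx = (rs1 >> (8 * t)) & 0xFF
            rd |= self.tables[t * TABLE_SIZE + idx] << (8 * t)
        return rd

    def ptw(self, index, word, rs2):
        """Write entry `index` of every table from the bytes of `word` (assumed semantics)."""
        self._ensure_loaded(rs2)
        for t in range(NUM_TABLES):
            self.tables[t * TABLE_SIZE + index] = (word >> (8 * t)) & 0xFF
        self.dirty = True


# Two sets of contents in "main memory": identity at X, byte inversion at Y.
memory = bytearray(2 * SET_BYTES)
ADDR_X, ADDR_Y = 0, SET_BYTES
for t in range(NUM_TABLES):
    for i in range(TABLE_SIZE):
        memory[ADDR_X + t * TABLE_SIZE + i] = i
        memory[ADDR_Y + t * TABLE_SIZE + i] = 255 - i

ptlu = AddressTaggedPTLU(memory)
a1 = ptlu.ptrd(0x0102030405060708, ADDR_X)   # mismatch: load from X
ptlu.ptw(0, 0xAAAAAAAAAAAAAAAA, ADDR_X)      # match: write entry 0, set dirty
b1 = ptlu.ptrd(0x0102030405060708, ADDR_Y)   # mismatch: store X back, load Y
a2 = ptlu.ptrd(0x0000000000000000, ADDR_X)   # mismatch: Y is clean, only a load
```

In this run the first lookup returns its input unchanged, the lookup against address Y returns the byte-inverted word, and the final lookup sees the entry that was written and stored back. Only one store-back occurs because the contents loaded from Y were never modified.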

Abstract

Hardware-implemented parallel table lookup (PTLU) modules (110, 210) store sets of lookup table (LUT) contents (110A, T0-T7) and include a stored tag (112, 212) that identifies the currently-loaded set of LUT contents. In response to a LUT access request (120), for example when executing a task or switching between tasks, the stored tag is compared to an access request tag (122, 222) identifying a set of LUT contents associated with the access request. In response to a mismatch between the access request tag and the stored tag, the new set of LUT contents associated with the access request tag are loaded (214) into the PTLU, and the stored tag is updated (214) to reflect the new PTLU contents. Loading of the new contents can be performed automatically based on the LUT access request and mismatch determination, for example by a continuous load operation starting from a base address identified by the LUT access request tag.

Description

PARALLEL TABLE LOOKUP USING CONTENT IDENTIFIERS
The present invention relates generally to efficient data processing circuits employing lookup tables and tags stored in parallel table lookup modules to identify the lookup table contents.
A lookup table is a data structure that can be used to replace a runtime computation with a simpler lookup operation. The speed gain can be significant, since retrieving a value from memory is often faster than undergoing a computation. As such, lookup tables provide an efficient means of data permutation and/or transformation. For processor architectures that support single-instruction, multiple-data (SIMD) or sub-word parallel processing, permutation and transformation operations of sub- word data elements are often the most time-consuming part of many algorithms. Algorithms that may benefit from these operation types can be found in video domain transformation (for example, gamma correction), video enhancement (for example, peaking filters) or decryption/encryption algorithms.
Parallel lookup tables provide the parallelism required by the multiple sub-words in an operand word. In simplified form, a traditional parallel lookup table can be described as follows. The operation ptrd (parallel table read) performs 8 independent reads for 8-bit addresses on a parallel table lookup (PTLU) module. The PTLU has eight independently accessible tables. For example, for a table size of 256 by 8, each table has 256, or 2^8, entries, each 8 bits wide. The operation has a 64-bit input and a 64-bit output, and uses sub-word parallelism: the 64-bit operand words are organized as 8 independent 8-bit sub-words. The data in the 8 tables in the PTLU is written with a dedicated operation, sometimes referred to as a ptw ("parallel table write"). The functionality provided by traditional parallel table lookups can be used for byte level permutations and transformation, resulting in faster execution of algorithms that frequently use these types of operations. It should be noted, however, that to build up the table contents, 256 ptw operations are required to fill all 256 entries of all 8 tables. This is a significant number of operations, particularly in multi-task environments where the contents may undergo frequent changes, for example as a result of task switching.
Various aspects of the present invention are directed to methods for use in a hardware-implemented parallel table lookup (PTLU) module that stores a set of lookup table (LUT) contents and includes a stored tag that identifies a currently-loaded set of LUT contents. Such methods include, for a LUT access request, comparing the stored tag to an access request tag identifying a set of LUT contents associated with the access request, and in response to a mismatch between the access request tag and the stored tag, loading the set of LUT contents associated with the access request tag into the PTLU, and updating the stored tag with the access request tag.
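As an illustration of the traditional ptrd and ptw operations described in the background above, the following Python sketch models eight 256-entry, 8-bit tables operating on a 64-bit word. The class and function names are hypothetical, and the ptw semantics (one call writes the same entry index in all eight tables, taking one byte per table from a 64-bit word) are an assumption chosen to be consistent with the statement that 256 ptw operations fill all 8 tables.

```python
NUM_TABLES = 8      # eight independently accessible tables
TABLE_SIZE = 256    # 2**8 entries per table, each 8 bits wide
MASK8 = 0xFF


class TraditionalPTLU:
    """Software model of a PTLU without any contents tag."""

    def __init__(self):
        self.tables = [[0] * TABLE_SIZE for _ in range(NUM_TABLES)]

    def ptw(self, index, word):
        """Write entry `index` of all 8 tables from the 8 bytes of `word` (assumed semantics)."""
        for t in range(NUM_TABLES):
            self.tables[t][index] = (word >> (8 * t)) & MASK8

    def ptrd(self, rs1):
        """Eight independent reads: byte t of the result is tables[t][byte t of rs1]."""
        rd = 0
        for t in range(NUM_TABLES):
            idx = (rs1 >> (8 * t)) & MASK8
            rd |= self.tables[t][idx] << (8 * t)
        return rd


def fill_with(ptlu, fn):
    """Fill every table with fn(i) and return the number of ptw operations used."""
    count = 0
    for i in range(TABLE_SIZE):
        word = 0
        for t in range(NUM_TABLES):
            word |= (fn(i) & MASK8) << (8 * t)
        ptlu.ptw(i, word)
        count += 1
    return count


ptlu = TraditionalPTLU()
ptw_count = fill_with(ptlu, lambda i: 255 - i)   # byte inversion as a sample transform
result = ptlu.ptrd(0x0001020304050607)
```

With the inversion table loaded, each byte of the operand is replaced by its complement, so the lookup returns 0xFFFEFDFCFBFAF9F8, and initialization takes 256 ptw calls in this model.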
Optionally, the replaced LUT contents can be stored, for example in response to a determination that the contents had changed.
Various aspects of the present invention are further directed to methods for use in a hardware-implemented lookup module operating in a multiple-task environment, the lookup module being loaded with task-specific contents. Such methods include storing a current-address tag identifying a main memory address associated with current task-specific contents of the lookup module, in response to executing a task comparing the current address tag with the task, thereby generating a match or a mismatch, maintaining the current address tag and current task-specific contents in response to a match, and in response to a mismatch, storing the current task-specific contents in the main memory address identified by the current address tag, loading new task-specific contents into the lookup module from a new local memory address associated with the task, and updating the current address tag to identify the new local memory address.
Various aspects of the present invention are further directed to PTLU modules that include a plurality of independently accessible tables for storing task-specific content for use in a multiple task environment, a register for storing a tag representing a task-specific main memory address for storing the task-specific content, a circuit arrangement for comparing the tag stored in the register with the main memory address associated with the task being executed, and a memory load mechanism for loading task-specific content from the main memory address associated with the task being executed.
Still other aspects of the present invention are directed to methods for use with a processor operating in a multiple-task environment where tasks are executed using task-specific contents loaded into a parallel table lookup (PTLU) module. Such methods include, in response to switching from executing a first-task to executing a second-task, storing first-task-specific contents of the PTLU module to a first main memory address associated with the first-task, the first main memory address identified by an address tag stored in the PTLU module, loading second-task-specific contents into the PTLU module from a second main memory address associated with the second-task, and updating the address tag stored in the PTLU module to identify the second main memory address.
The above summary is not intended to describe each embodiment or every implementation of the present disclosure. The figures and detailed description that follow more particularly exemplify various embodiments.
The invention may be more completely understood in consideration of the following detailed description of various embodiments of the invention in connection with the accompanying drawings, in which:
FIG. 1 illustrates lookup table comparison and loading using contents identifier tags in accordance with certain embodiments of the present invention;
FIG. 2 illustrates PTLU comparison and loading using contents address tags in accordance with certain embodiments of the present invention; and
FIG. 3 illustrates a flow diagram of steps that can be implemented in accordance with certain embodiments of the present invention.
While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the scope of the invention including aspects defined by the appended claims.
Embodiments of the present invention relate to storing a tag in a PTLU ("parallel table lookup") module identifying the currently loaded contents. In certain applications, when a LUT access request is processed, the stored tag can be checked to determine whether the correct contents reside in the PTLU module. In response to a determination that the required LUT contents are not loaded into the PTLU module, certain embodiments of the present invention relate to a hardware mechanism for loading the correct contents into the PTLU module. For example, the correct contents may reside in a main memory address identified by the LUT access request, allowing the PTLU module to perform a contiguous load operation. As such, in certain embodiments, loading of the entire PTLU contents can be made responsive to a single access request rather than by multiple ptw operations.
In various embodiments of the present invention, a PTLU module is used to store a set of LUT contents and a tag that identifies the currently-loaded LUT contents. Different sets of LUT contents may be used by different operations (for example separate tasks, or different operations within the same task). As a processor switches between or among various operations, changes to the LUT contents may be required. In various embodiments of the present invention, the tag stored in the PTLU module can be used to determine whether the current LUT contents are the contents that are now needed, or whether the current LUT contents are to be replaced. When replacing the current contents, the needed contents can be loaded into the PTLU module, for example from a main memory location. In certain embodiments, the tag stored in the PTLU module identifies the memory address from which the current contents were retrieved and to which the current contents are stored (for example, if the current contents were changed due to a ptw operation after being loaded into the PTLU, such situation being identifiable by use of a dirty bit). Upon replacing the contents of the PTLU, the tag is updated to identify the new contents.
Fig. 1 illustrates an exemplary embodiment that includes a PTLU 110 having lookup tables T1, T2, T3, T4, and a stored tag 112 that identifies the currently-loaded LUT contents. As will be appreciated, the number of tables shown in PTLU 110 is merely illustrative in that any suitable number of tables can be used, including one. Additionally, the PTLU 110 can include more than one set of LUT contents, and each set of contents can include its own tag. In the example shown in Fig. 1, the lookup tables are grouped into a single set 110A, and the tag 112 identifies the contents of the set.
A LUT access request includes an access request tag 122 that identifies LUT contents to be used. Thus, when a lookup table access request 120 is received, the access request tag 122 can be compared to the current LUT contents tag 112. If the tags match, the current PTLU contents are used. If the tags do not match, then the PTLU is loaded with the correct contents as identified by the access request tag 122, which in turn becomes the new current LUT contents tag. The LUT access request 120 is generally in response to executing an instruction that requires use of the PTLU contents. For example, when switching between tasks in a multi-task environment, the PTLU contents might or might not be appropriate for the new task. As such, a LUT access request can be triggered by or responsive to task switching. In other cases, the LUT access request may be independent of task switching. In still other cases, task switching may result in executing tasks that do not require access of the PTLU contents, and as such the correct contents may still reside in the PTLU upon switching back to the original task, as would be revealed by a tag comparison.
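The tag comparison of Fig. 1 can be sketched as follows. This is an illustrative model only: the class `TaggedPTLU`, the dictionary `backing_store` mapping a tag to a set of table contents, and the task names are hypothetical stand-ins, and the tag here is an arbitrary identifier rather than any specific encoding from the source.

```python
class TaggedPTLU:
    """Model of a PTLU holding one set of LUT contents plus a stored tag."""

    def __init__(self, backing_store):
        # backing_store: tag -> list of tables; a stand-in for wherever the
        # contents identified by a tag actually live (assumption).
        self.backing_store = backing_store
        self.stored_tag = None     # corresponds to stored tag 112
        self.tables = None         # corresponds to the set of lookup tables
        self.loads = 0

    def access(self, request_tag):
        """Handle a LUT access request; return True if a load was needed."""
        if request_tag == self.stored_tag:
            return False           # tags match: current contents are used
        # Tags differ: load the requested contents and update the stored tag.
        self.tables = [list(t) for t in self.backing_store[request_tag]]
        self.stored_tag = request_tag
        self.loads += 1
        return True


# Two tasks with different contents for four small tables (sizes are illustrative).
store = {
    "task_A": [[i for i in range(4)] for _ in range(4)],
    "task_B": [[3 - i for i in range(4)] for _ in range(4)],
}
ptlu = TaggedPTLU(store)

# Task A runs, an unrelated task that never touches the PTLU runs (no request
# is issued), task A resumes, then task B runs, then task A again.
trace = [ptlu.access(tag) for tag in ["task_A", "task_A", "task_B", "task_A"]]
```

The second request for `task_A` finds its contents still in place, which mirrors the case in the text where an intervening task does not use the PTLU and the tag comparison reveals that no reload is needed.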
The tag stored in the PTLU (as well as the LUT access request tag) can be any identifier associated with the appropriate LUT contents that is capable of suitably distinguishing from other LUT contents, and the association can be implemented in any suitable manner including software or hardware. In certain embodiments, the tags identify specific LUT contents by identifying a memory address where the relevant LUT contents are stored (and from which they can be retrieved). In certain embodiments where different LUT contents are associated with different tasks, the tags can be implemented as task identifiers. The various possible tag identifiers can be implemented individually or in combination.
As discussed, embodiments of the present invention can be used advantageously in a multi-tasking environment where different tasks utilize different LUT contents and/or where the same task utilizes different LUT contents. In such environments, embodiments of the present invention provide identifier mechanisms to determine whether the correct LUT contents are in place (thereby avoiding ptw operations or initialization operations when not needed), and loading mechanisms to automatically load the correct contents into the PTLU without requiring ptw operations.
To illustrate a LUT access request in response to task switching, consider two tasks A and B performing two different algorithms, both of which require lookup table functionality where the LUT data structures or content requirements may be different. In an exemplary case, task A is executed and starts by initializing the lookup table using ptw operations, for example writing to the PTLU from an initialization data field called by the task A algorithm. Alternatively, the task A LUT contents can be automatically loaded into the PTLU, for example from a main memory address identified by a tag provided in response to executing task A. Any suitable initialization procedures can be used. At some later time, a task switch decision is made to interrupt task A and start a new task B. Assuming that new task B requires different LUT contents, task B may initialize the lookup table using ptw operations, thereby overwriting the contents used by task A. Alternatively, the task B LUT contents can be automatically loaded into the PTLU, for example from a main memory address identified by a tag provided in response to executing task B. At some later time, another task switch decision is made to interrupt task B and switch back to continuing task A. At this point, task A is confronted with lookup table content for task B, causing a potential conflict. It is possible to avoid the conflict by saving the current lookup table contents using ptrd operations and restoring the lookup table contents for the new task using ptw operations, which can add at least 256 ptrd and 256 ptw operations to the task switching software for each lookup table, assuming 256 entries. In embodiments of the present invention, a single software operation in the form of a LUT access request and identifier tag can trigger a LUT contents comparison, and if necessary an automatic loading of the correct contents.
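To make the cost in the preceding paragraph concrete, the following sketch counts the software operations added by explicit save and restore for the task sequence A, B, A. The function name and its parameters are assumptions introduced for illustration; the figure of 256 ptrd plus 256 ptw operations per lookup table per switch is taken from the text above and is a lower bound.

```python
def save_restore_ops(num_switches, num_tables=1, entries_per_table=256):
    """Lower bound on ptrd + ptw operations added by explicit save/restore."""
    ptrd_ops = num_switches * num_tables * entries_per_table  # save outgoing contents
    ptw_ops = num_switches * num_tables * entries_per_table   # restore incoming contents
    return ptrd_ops + ptw_ops


# The sequence A -> B -> A contains two task switches.
print(save_restore_ops(num_switches=2))                # 1024
print(save_restore_ops(num_switches=2, num_tables=8))  # 8192
```

With the tag mechanism, by contrast, the task switching software needs no such operations; contents are loaded only if a later LUT access request reveals a tag mismatch.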
To facilitate these mechanisms, certain embodiments of the present invention map the lookup table contents to a memory address associated with the contents, and store an address tag in the PTLU that refers to that memory address. This can serve multiple purposes. First, the tag stored in the PTLU can be checked to determine whether the correct contents are loaded into the PTLU, thereby avoiding unnecessary loading operations when the correct contents are currently loaded. In addition, when it is determined that the current PTLU contents need to be replaced, the required contents can be automatically recalled and restored from the associated main memory address using processor hardware without performing ptw software operations.
For example, assume a 32-bit address space. In certain embodiments, the ptrd operation is extended with an additional 32-bit input, called "Rs2," that contains the base address of the lookup table data structure. As such, the ptrd operation is called as "ptrd(Rd, Rs1, Rs2)" where "Rd" is the destination register, "Rs1" is a 64-bit input used to index the independent tables (e.g., eight 8-bit indices into T0, T1, ..., T7), and "Rs2" is the base address of the lookup table structure. The functionality of the ptrd operation is largely unchanged, but additionally includes reference to the PTLU address tag that associates the current table contents to a data structure at a certain address in main memory.
Following this example for a given PTLU, the operation ptrd(Rs1, A) performs a parallel table lookup on the PTLU, and the contents of the PTLU lookup table should be provided by the data structure at main memory address A. If the PTLU address tag matches A, the current PTLU content is correct and the lookup data is correct. If the PTLU address tag does not match A, the current PTLU content is incorrect, and the correct content is automatically loaded from the main memory at address A. Consequently, the PTLU address tag is updated to accurately reflect the newly loaded contents. Note that loading of table content into the PTLU can be performed on demand. That is, upon switching to a new task with an algorithm using different lookup table contents, the PTLU is loaded only when the new task performs a ptrd operation. Thus, should a given task not require access to the PTLU, no content loading is necessary. Furthermore, explicit initialization code for the PTLU may no longer be necessary. A task can start an algorithm without an explicit initialization step due to automatic loading of the PTLU content as soon as the algorithm starts using the ptrd operations.
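The extended ptrd operation can be modeled in software roughly as follows. This is a sketch under assumptions that the text does not specify: the eight tables are taken to be stored back to back in main memory starting at the base address, each with 256 one-byte entries, and byte t of Rs1 is taken to index table t. The class name `AddressTaggedPtlu` and the `loads` counter are hypothetical.

```python
NUM_TABLES = 8
ENTRIES = 256


class AddressTaggedPtlu:
    """Behavioral model of ptrd with an address tag and on-demand loading."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # hypothetical flat, byte-addressed memory
        self.address_tag = None         # no contents loaded yet
        self.tables = [[0] * ENTRIES for _ in range(NUM_TABLES)]
        self.loads = 0

    def _load(self, base):
        # Assumed layout: T0..T7 stored contiguously, 256 one-byte entries each.
        for t in range(NUM_TABLES):
            start = base + t * ENTRIES
            self.tables[t] = list(self.main_memory[start:start + ENTRIES])
        self.address_tag = base
        self.loads += 1

    def ptrd(self, rs1, rs2):
        """Return the 64-bit result Rd for index word rs1 and table base address rs2."""
        if self.address_tag != rs2:
            self._load(rs2)  # tag mismatch: load on demand, then update the tag
        rd = 0
        for t in range(NUM_TABLES):
            index = (rs1 >> (8 * t)) & 0xFF  # assumed: byte t of Rs1 indexes table t
            rd |= self.tables[t][index] << (8 * t)
        return rd


# Two table sets: identity at address 0, bytewise complement at address 2048.
memory = bytearray(2 * NUM_TABLES * ENTRIES)
for t in range(NUM_TABLES):
    for i in range(ENTRIES):
        memory[t * ENTRIES + i] = i
        memory[2048 + t * ENTRIES + i] = 255 - i

ptlu = AddressTaggedPtlu(memory)
x = 0x0123456789ABCDEF
first = ptlu.ptrd(x, 0)      # loads the identity tables, returns x unchanged
second = ptlu.ptrd(x, 2048)  # tag mismatch: loads the complement tables
```

No initialization call appears in the usage: the first ptrd itself triggers the load, which mirrors the on-demand behavior described above.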
In certain embodiments, when new LUT contents are loaded into the PTLU, the current contents that are being replaced can be automatically stored back to the identified memory address in a manner similar to the automatic loading procedures. Such storing back may not be needed, for example when the LUT contents are static (i.e., do not change during operation of the task), or when the LUT contents can change but have not been changed. In the latter case, a dirty bit can be included, for example in each lookup table, that signals whether or not the content has been changed since last being loaded into the PTLU. Checking the dirty bit can thus save unnecessary store-back steps.
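The store-back behavior with a dirty bit can be sketched as below. For brevity the model holds a single table with a single dirty bit, whereas the text suggests a dirty bit per lookup table; the class name `WriteBackPtlu`, the dictionary used as main memory, and the `store_backs` counter are assumptions introduced for illustration.

```python
class WriteBackPtlu:
    """Behavioral model of a PTLU that stores back replaced contents only if dirty."""

    def __init__(self, memory):
        self.memory = memory     # hypothetical: maps address -> list of table entries
        self.address_tag = None
        self.table = None
        self.dirty = False       # set when contents change after being loaded
        self.store_backs = 0

    def _switch_to(self, address):
        if self.address_tag == address:
            return               # tags match: keep current contents
        if self.address_tag is not None and self.dirty:
            # Replaced contents were modified: save them to their own address.
            self.memory[self.address_tag] = list(self.table)
            self.store_backs += 1
        self.table = list(self.memory[address])
        self.address_tag = address
        self.dirty = False       # freshly loaded contents are clean

    def read(self, address, index):
        self._switch_to(address)
        return self.table[index]

    def write(self, address, index, value):
        self._switch_to(address)
        self.table[index] = value
        self.dirty = True


memory = {0x1000: [0] * 256, 0x2000: [7] * 256}
ptlu = WriteBackPtlu(memory)
ptlu.write(0x1000, 3, 42)         # modifies the contents loaded from 0x1000
ptlu.read(0x2000, 0)              # dirty contents are stored back before replacement
restored = ptlu.read(0x1000, 3)   # clean contents of 0x2000 are dropped without a store
```

Only the first replacement performs a store-back; the second replaces unmodified contents and skips it.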
Fig. 2 illustrates an example embodiment implemented in an eight table (T0 through T7) PTLU 210 that transforms an 8-bit input vector 250 (b0 through b7) into an 8-bit output vector 260 (v0 through v7) based on the contents of the tables. In addition to the LUT contents, the PTLU includes an address tag 212 that identifies the currently-loaded LUT contents by the main memory address where those contents are stored, for example, within a processor main memory module 230. In the case illustrated, the memory address of the current contents is denoted "address Y". When a LUT access request is received, the requested contents are identified by the base address 222, in this case denoted "address X". The PTLU includes a comparison mechanism 214 to determine whether address X matches address Y. If so, the current contents are used. If not, a memory load mechanism 216 loads the contents from address X into the tables, and the PTLU stored address tag 212 is updated to reflect address X.
By way of summary, Fig. 3 illustrates an exemplary flow diagram of steps that can be implemented in accordance with certain embodiments of the present invention. A LUT access request is generated, for example in response to switching tasks, executing a new task, or in the process of running a task. In response to the LUT access request, an identifier tag that is stored in the PTLU and identifies the current contents is compared to an identifier tag associated with the LUT access request. If the tags match, the current contents are correct. If the tags do not match, the current contents are replaced by the correct contents, and the PTLU stored tag is updated to reflect the new contents. Optionally, the replaced contents can be stored. For example, if the replaced contents were changed after being loaded into the PTLU, they can be saved back to a main memory address identified by the associated tag.
It will be appreciated from the present disclosure that the present invention is not limited by any exemplary embodiments, and can include various alternatives and additional features. For example, aspects of the present invention can support: multiple lookup table structures (e.g., multiple sets of LUT contents) with multiple identifier tags; operations that perform a parallel write to the independent tables of a PTLU; different identifier tags for the different independent tables (e.g., T0, T1, ..., T7), such that they may be loaded from different locations in main memory rather than being organized in a sequential nature; independent tables (e.g., T0, T1, ..., T7) having different numbers of entries (e.g., not necessarily all 256) and/or having different table entry widths (e.g., not necessarily 8-bit wide); and PTLUs having different numbers of independent tables (e.g., T0, T1, ..., T3).
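One of the alternatives listed above, a separate identifier tag for each independent table, can be sketched as follows. The class name `PerTableTagPtlu`, the dictionary standing in for main memory, and the `loads` counter are assumptions made for illustration; the disclosure does not prescribe this interface.

```python
class PerTableTagPtlu:
    """Behavioral model of a PTLU in which every table carries its own address tag."""

    def __init__(self, memory, num_tables=8):
        self.memory = memory               # hypothetical: maps address -> table entries
        self.tags = [None] * num_tables    # one stored tag per independent table
        self.tables = [None] * num_tables
        self.loads = 0

    def lookup(self, indices, addresses):
        """Look up indices[t] in table t, whose contents live at addresses[t]."""
        results = []
        for t, (index, address) in enumerate(zip(indices, addresses)):
            if self.tags[t] != address:
                # Only the mismatching table is reloaded; the others are untouched.
                self.tables[t] = list(self.memory[address])
                self.tags[t] = address
                self.loads += 1
            results.append(self.tables[t][index])
        return results


memory = {0x100: list(range(256)), 0x900: [255 - i for i in range(256)]}
ptlu = PerTableTagPtlu(memory, num_tables=2)
print(ptlu.lookup([5, 5], [0x100, 0x900]))  # [5, 250]
print(ptlu.lookup([6, 6], [0x100, 0x100]))  # [6, 6]
```

In the second lookup only table 1 sees a tag mismatch, so a single table is reloaded while table 0 keeps its contents. The table contents also need not be adjacent in memory.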
While the present invention has been described above and in the claims that follow, those skilled in the art will recognize that many changes may be made thereto without departing from the spirit and scope of the present invention.

Claims

What is claimed is:
1. For use in a hardware-implemented parallel table lookup (PTLU) module (110, 210) for storing a set of lookup table (LUT) contents (110A, T0-T7) and having a stored tag (112, 212) that identifies a currently-loaded set of LUT contents, a method comprising: for a LUT access request (120), comparing the stored tag to an access request tag (122, 222) identifying a set of LUT contents associated with the access request; and in response to a mismatch between the access request tag and the stored tag, loading (214) the set of LUT contents associated with the access request tag into the PTLU, and updating (214) the stored tag with the access request tag.
2. The method of claim 1, wherein the LUT access request is responsive to executing a task.
3. The method of claim 1, wherein loading the set of LUT contents is performed automatically without additional software operation other than the LUT access request.
4. The method of claim 1, wherein the access request tag identifies a local memory address for storing and retrieving the set of LUT contents associated with the access request tag.
5. The method of claim 4, wherein the set of LUT contents associated with the access request tag are loaded into the PTLU from the local memory address.
6. The method of claim 1, wherein the stored tag identifies a local memory address for storing and retrieving the currently-loaded set of LUT contents.
7. The method of claim 6, further comprising storing the currently-loaded set of LUT contents from the PTLU to the local memory address in response to loading the set of LUT contents associated with the access request tag into the PTLU.
8. The method of claim 7, wherein storing the currently-loaded set of LUT contents is further in response to determining whether the currently-loaded set of LUT contents were changed.
9. The method of claim 8, wherein determining whether the currently-loaded set of LUT contents were changed is performed using a dirty bit included in the LUTs.
10. For use in a hardware-implemented lookup module (110, 210) operating in a multiple-task environment, the lookup module being loaded with task-specific contents (T0-T7), a method comprising: storing a current-address tag (112, 212) identifying a main memory address associated with current task-specific contents of the lookup module; in response to executing a task, comparing the current address tag with the task, thereby generating a match or a mismatch; maintaining the current address tag and current task-specific contents in response to a match; and in response to a mismatch, storing the current task-specific contents in the main memory address identified by the current address tag, loading new task-specific contents into the lookup module from a new local memory address associated with the task, and updating the current address tag to identify the new local memory address.
11. The method of claim 10, wherein the lookup module is a parallel table lookup module.
12. The method of claim 10, wherein storing the current task-specific contents in the main memory address identified by the current address tag is responsive to the current task-specific contents having been changed, as indicated by a dirty bit.
13. A PTLU module (110, 210) comprising: a plurality of independently accessible tables (T0-T7) for storing task-specific content for use in a multiple task environment; a register for storing a tag (112, 212) representing a task-specific main memory address for storing the task-specific content; a circuit arrangement (214) for comparing the tag stored in the register with the main memory address associated with the task being executed; and a memory load mechanism (216) for loading task-specific content from the main memory address associated with the task being executed.
14. The PTLU module of claim 13, wherein the plurality of independently accessible tables are grouped into sets of tables.
15. The PTLU of claim 14, further comprising additional registers for storing additional tags such that each set of tables stores a respective tag.
16. For use with a processor operating in a multiple-task environment where tasks are executed using task-specific contents loaded into a parallel table lookup (PTLU) module (110, 210), a method comprising, in response to switching from executing a first-task to executing a second-task: storing first-task-specific contents of the PTLU module to a first main memory address associated with the first-task, the first main memory address identified by an address tag (112, 212) stored in the PTLU module; loading (214) second-task-specific contents into the PTLU module from a second main memory address associated with the second-task; and updating (214) the address tag stored in the PTLU module to identify the second main memory address.
17. The method of claim 16, wherein storing first-task-specific contents of the PTLU module to a first main memory address associated with the first-task is responsive to detecting that the first-task-specific contents were changed.
PCT/IB2009/051789 2008-05-02 2009-05-01 Parallel table lookup using content identifiers WO2009133536A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US5015808P 2008-05-02 2008-05-02
US61/050,158 2008-05-02

Publications (1)

Publication Number Publication Date
WO2009133536A1 (en) 2009-11-05

Family

ID=40957581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2009/051789 WO2009133536A1 (en) 2008-05-02 2009-05-01 Parallel table lookup using content identifiers

Country Status (1)

Country Link
WO (1) WO2009133536A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8856197B2 (en) 2008-10-16 2014-10-07 Nxp, B.V. System and method for processing data using a matrix of processing units

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5640533A (en) * 1992-12-21 1997-06-17 Intel Corporation Translation lookaside buffer (TLB) arrangement wherein the TLB contents are retained from task when it is swapped out and reloaded when the task is rescheduled
WO2006027021A1 (en) * 2004-09-10 2006-03-16 Freescale Semiconductor, Inc. Memory management unit and a method for memory management
US20070067602A1 (en) * 2005-09-16 2007-03-22 Callister James R Mitigating context switch cache miss penalty

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FISKIRAN A M ET AL: "On-Chip Lookup Tables for Fast Symmetric-Key Encryption", APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURE PROCESSORS, 2005. ASAP 2005 . 16TH IEEE INTERNATIONAL CONFERENCE ON SAMOS, GREECE 23-25 JULY 2005, PISCATAWAY, NJ, USA,IEEE, 23 July 2005 (2005-07-23), pages 356 - 363, XP010854367, ISBN: 978-0-7695-2407-8 *

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09738549

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09738549

Country of ref document: EP

Kind code of ref document: A1